The Cornell Veterinarian: A Metadata Perspective

Upload: branden-bond

Post on 23-Dec-2015

The Challenge (Reprise)

Hathi Volume Interface

Hathi Data API

Hathi METS File

Hathi METS File (Continued)

Hathifile Record Elements

Hathi Volume ID: mdp.39015076694507

Access: allow [Notes on mapping for rights attributes where contextual user data would affect access]

Rights: pd [public domain]

HathiTrust record number: 000529434

Enumeration/Chronology: v.33 no.11 1900

Source: MIU

Source institution record number: 000529434

OCLC number: 1554176

Title: The Chicago medical times.
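
As a rough illustration of how record elements like these could be used, the sketch below reads tab-delimited rows and selects volume IDs by HathiTrust record number and volume. The column order in `FIELDS` is an assumption that simply mirrors the elements listed above (the real Hathifile layout may contain additional columns), and the function names are hypothetical.

```python
import csv
import io
import re

# Assumed column order for this sketch only; it mirrors the elements listed
# on the slide, not necessarily the full Hathifile layout.
FIELDS = ["htid", "access", "rights", "ht_bib_key", "description",
          "source", "source_bib_num", "oclc_num", "title"]


def read_records(text):
    """Parse tab-delimited rows into dicts keyed by FIELDS."""
    reader = csv.reader(io.StringIO(text), delimiter="\t",
                        quoting=csv.QUOTE_NONE)
    return [dict(zip(FIELDS, row)) for row in reader if row]


def volume_number(description):
    """Pull the volume number out of an enumeration like 'v.33 no.11 1900'."""
    match = re.search(r"\bv\.\s*(\d+)", description)
    return int(match.group(1)) if match else None


def find_volume_ids(records, bib_key, volume):
    """Return volume IDs for one record number and one volume number."""
    return [r["htid"] for r in records
            if r["ht_bib_key"] == bib_key
            and volume_number(r["description"]) == volume]


# One row built from the values shown on the slide.
SAMPLE = "\t".join([
    "mdp.39015076694507", "allow", "pd", "000529434", "v.33 no.11 1900",
    "MIU", "000529434", "1554176", "The Chicago medical times.",
])

records = read_records(SAMPLE)
print(find_volume_ids(records, "000529434", 33))
```

Matching on the enumeration string is fragile, since enumeration formats vary between volumes; a real run would probably need to review unmatched rows by hand.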

What I [naively] thought was the solution…

1. Use the Hathi Data API to find the table of contents for each volume

2. Gather the related OCR

3. Parse out the article citation values from the OCR (hopefully in a mostly automated way)

4. Use the pagination data from the TOC to build links

5. What couldn’t be automated could be done manually

Goal: a citation index with Hathi URLs that could be used to build an interface or given to an index like PubMed
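
Step 3 of the plan above might look something like the following sketch, which pulls a title and start page out of each table-of-contents line. It assumes TOC lines of the form "title, optional dot leaders, page number"; real OCR is far messier than that, and the sample titles are invented for illustration.

```python
import re

# Assumed TOC line shape: a title, then spaces and/or dot leaders, then a
# page number at the end of the line.
TOC_LINE = re.compile(r"^(?P<title>.+?)[\s.]+(?P<page>\d{1,4})\s*$")


def parse_toc(ocr_text):
    """Return (title, start_page) pairs for lines that look like TOC entries."""
    entries = []
    for line in ocr_text.splitlines():
        match = TOC_LINE.match(line.strip())
        if match:
            entries.append((match.group("title").strip(),
                            int(match.group("page"))))
    return entries


# Invented sample text, not taken from any HathiTrust volume.
sample = """CONTENTS
Bovine mastitis . . . . . 12
Notes on anthrax.....45
"""

print(parse_toc(sample))
```

Lines without a trailing page number, such as the "CONTENTS" heading, are skipped, which is also where OCR errors in the page numbers would silently drop entries.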

HathiTrust OCR for TOC

PubMed Indexing and API

Path for automation

(For citations in PubMed for which the HathiTrust has a single volume)

Query: PubMed volume AND Hathi catalog ID against the Hathifile to get all corresponding object IDs from the METS.

Query: METS object IDs AND the PubMed start page for each citation to find the matching ORDERLABEL and, from it, the ORDER number in the METS files.

Create each URL: The Hathi METS object ID and ORDER number are combined to create the URL, e.g., http://babel.hathitrust.org/cgi/pt?id=coo.31924051143075;view=1up;seq=11
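
The last two steps can be sketched roughly as follows: look up the page `div` whose `ORDERLABEL` equals the PubMed start page, read its `ORDER`, and plug both values into the URL pattern shown above. The METS fragment is a hand-made stand-in, not a real HathiTrust file, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

METS_NS = {"METS": "http://www.loc.gov/METS/"}

# Hand-made fragment for illustration; a real HathiTrust METS file is much
# larger and its page numbering will differ.
SAMPLE_METS = """<METS:mets xmlns:METS="http://www.loc.gov/METS/"
    OBJID="coo.31924051143075">
  <METS:structMap TYPE="physical">
    <METS:div TYPE="volume">
      <METS:div ORDER="10" TYPE="page"/>
      <METS:div ORDER="11" ORDERLABEL="1" TYPE="page"/>
      <METS:div ORDER="12" ORDERLABEL="2" TYPE="page"/>
    </METS:div>
  </METS:structMap>
</METS:mets>"""


def find_order(mets_xml, start_page):
    """Return (object_id, order) for the page labelled start_page, or None."""
    root = ET.fromstring(mets_xml)
    for div in root.iterfind(".//METS:div[@ORDERLABEL]", METS_NS):
        if div.get("ORDERLABEL") == str(start_page):
            return root.get("OBJID"), int(div.get("ORDER"))
    return None


def page_url(object_id, order):
    """Build a page URL following the pattern shown on the slide."""
    return (f"http://babel.hathitrust.org/cgi/pt?id={object_id}"
            f";view=1up;seq={order}")


object_id, order = find_order(SAMPLE_METS, 1)
print(page_url(object_id, order))
```

This takes the first matching label. A volume in which the same printed page number occurs more than once (for example, separately paginated issues) would need extra disambiguation.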

The Metadata that Got Away…

Articles not indexed by PubMed (1991-1914)

Supplemental volumes

What we hope to do about it:

Still working to see if we can programmatically create URLs for Supplemental Volumes

Manually capture citation data and URLs for pre-1945 articles using OCR.

PubMed Data Requirements

Linking Format (when we’re only contributing URLs)

PubMed IDs and corresponding URLs

Administrative metadata, e.g. access restrictions, contributing source.

Required data elements for contributing citations

Journal ISSN

Journal ID or Journal title abbreviation

Journal Publisher

Copyright statement, where applicable

Volume/Issue/Article sequence or pagination

Issue publication date

Article electronic publication date?

AND URLs
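
Before sending citations anywhere, a quick completeness check along these lines could flag records that lack a required element. This is only one reading of the list above: the key names are hypothetical, the copyright statement and electronic publication date are treated as optional, and the sample values are placeholders rather than real journal data.

```python
# Hypothetical key names; one possible reading of the required elements.
REQUIRED_KEYS = ["issn", "publisher", "volume", "issue",
                 "pagination", "issue_date", "url"]


def missing_elements(citation):
    """List the required elements that are absent or empty in a citation."""
    missing = [key for key in REQUIRED_KEYS if not citation.get(key)]
    # Either a journal ID or a title abbreviation satisfies the requirement.
    if not (citation.get("journal_id") or citation.get("title_abbreviation")):
        missing.append("journal_id or title_abbreviation")
    return missing


# Placeholder values for illustration only.
example = {
    "issn": "0000-0000",
    "title_abbreviation": "Example J",
    "publisher": "Example Publisher",
    "volume": "35",
    "issue": "1",
    "pagination": "1-10",
    "url": "http://babel.hathitrust.org/cgi/pt?id=coo.31924051143075;view=1up;seq=11",
}

print(missing_elements(example))
```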

What does it all mean?

For the project: The Cornell Veterinarian should soon be available via PubMed for the years already indexed.

We’re still scoping out what it would take to capture the remaining citations manually. If funded, those citations will be sent to PubMed to complete the backfile.

Larger picture: Potential for improved access to other titles currently lacking full-text linking in PubMed [if in HathiTrust]

Consider suggesting improvements to the Hathi workflows.