NATIONAL LIBRARY OF MEDICINE PubMed Central and the NLM DTDs

Upload: fay-hodge

Post on 16-Dec-2015

TRANSCRIPT

Page 1: NATIONAL LIBRARY OF MEDICINE PubMed Central and the NLM DTDs

NATIONAL LIBRARY OF MEDICINE

PubMed Central

and the

NLM DTDs

Page 2: NATIONAL LIBRARY OF MEDICINE PubMed Central and the NLM DTDs

PubMed Central

PubMed Central (PMC) is NLM's digital archive of life sciences journal literature.

Dual Purpose:

• Archiving journals

• Display of “free” full-text journal articles

PMC contains over 100,000 articles from more than 100 titles.

(the BMC disclaimer)

Page 3: NATIONAL LIBRARY OF MEDICINE PubMed Central and the NLM DTDs

PMC Workflow

Page 4: NATIONAL LIBRARY OF MEDICINE PubMed Central and the NLM DTDs

Back Issue Scanning

PMC has started a pilot project to digitize back issues of journals.

• start with journals participating in PMC (JMLA, PNAS, ASM titles)

• journal is scanned cover to cover (including front matter and ads).

• article headers and abstracts (that are not available through PubMed) are being keyed in XML

• articles will be displayed as HTML headers with PDF or TIFF representations of the pages.

• 4C and halftone images will be scanned and displayed with the article.

All current journals should be scanned by Spring 2004.

Page 5: NATIONAL LIBRARY OF MEDICINE PubMed Central and the NLM DTDs

Intermission

The NLM DTDs

Page 6: NATIONAL LIBRARY OF MEDICINE PubMed Central and the NLM DTDs

PubMed Central DTD History

pmc-1.dtd

DTD currently in production.

Derived from keton.dtd and BMC article.dtd.

Designed to be a simple DTD for online display and archive.

Written with samples from PNAS, MBC, and BMC.

Why a new DTD?

Elements/attributes had to be added to accommodate new journals.

DTD would become cumbersome quickly if we had to keep making changes for each new title.

Original “simplicity” of design would lead to confusing data structures as the DTD expanded.

Moved away from standard XML practices to accommodate source SGML.

Needed an independent review.

Page 7: NATIONAL LIBRARY OF MEDICINE PubMed Central and the NLM DTDs

The Reviewers

Mulberry Technologies, Inc.

An electronic publishing consultancy specializing in SGML- and XML-based systems.

Has been active in SGML since 1984 and in XML since 1996.

Has extensive experience in the development and maintenance of SGML and XML applications for STM publishers.

The Task

Review the pmc-1.dtd for XML best practices, applicability to archive and online retrieval use, and completeness in application to STM journals.

Create an updated version of the DTD.

Document the new DTD.

Page 8: NATIONAL LIBRARY OF MEDICINE PubMed Central and the NLM DTDs

The Results

pmc-2.dtd

Mulberry’s Suggestions

Create two DTDs:

• one for archiving to allow us to convert data from multiple sources to our DTD.

• a subset for authoring to allow us to retain some control when publishers create articles to the DTD.

Use proven solutions like XLink and the XHTML table standard.

Use data models to simplify the DTD.

Page 9: NATIONAL LIBRARY OF MEDICINE PubMed Central and the NLM DTDs

Harvard E-Journal Archiving Project

• The Mellon Foundation funded the Harvard Library to study the feasibility of using one DTD for archiving journal articles.

• Harvard commissioned Inera, Inc. for the E-Journal Archive DTD Feasibility Study.

• Conclusion – yes, it is feasible, but the right DTD does not exist.

• A meeting was held in April 2002 to discuss the changes needed to the PMC2 DTD to expand its range to include almost any journal. Attendees included PMC, Mulberry Technologies, Inc. (consultant to PMC), The Mellon Foundation, The Harvard Library, and Inera (consultant to Harvard-Mellon).

Page 10: NATIONAL LIBRARY OF MEDICINE PubMed Central and the NLM DTDs

Conclusions

1. PMC and Harvard-Mellon had different ideas about what the DTD should do.

Harvard was interested in an Interchange DTD, which would allow publishers to submit in multiple formats, which would all be valid. PMC was interested in an Archive DTD, which would be open enough to allow conversion of multiple sources into one single format.

2. If the PMC2 DTD was modularized, and some pieces were added (like the OASIS table model), many DTDs could be built using the same elements, giving both flexibility and consistency.
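The modular idea in conclusion 2 can be pictured with a small sketch. This is only an illustration, not how the DTD Suite is actually packaged: the module names, the heavily simplified content models, and the choice of which modules go into which DTD are all assumptions made for the example.

```python
# Hypothetical modules: each maps element names to simplified content models.
# Neither the module names nor the models are taken from the real DTD Suite.
MODULES = {
    "common": {
        "p": "(#PCDATA)",
        "article-title": "(#PCDATA)",
    },
    "xhtml-table": {
        "table": "(tr+)",
        "tr": "(td+)",
        "td": "(#PCDATA)",
    },
    "oasis-table": {
        "oasis:table": "(oasis:row+)",
        "oasis:row": "(oasis:entry+)",
        "oasis:entry": "(#PCDATA)",
    },
}


def build_dtd(module_names):
    """Assemble element declarations from the named modules into one DTD text."""
    declarations = {}
    for name in module_names:
        declarations.update(MODULES[name])
    return "\n".join(
        f"<!ELEMENT {element} {model}>"
        for element, model in sorted(declarations.items())
    )


# Two DTDs built from the same pool of modules (the split is an assumption).
archiving_dtd = build_dtd(["common", "xhtml-table", "oasis-table"])
publishing_dtd = build_dtd(["common", "xhtml-table"])

print(publishing_dtd)
```

Both DTDs declare `p` and `article-title` identically because they draw on the same module, which is the consistency the conclusion describes; the flexibility comes from choosing a different set of modules per DTD.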

Page 11: NATIONAL LIBRARY OF MEDICINE PubMed Central and the NLM DTDs

Status

• The “NLM Archiving and Interchange DTD Suite” has been created and released.

Mulberry and Inera analyzed hundreds of journals across subjects to ensure that the DTD Suite was powerful enough to tag them.

• The “NLM Journal Archiving DTD” and the “Journal Publishing DTD” have been created from the DTD Suite.

The Archiving DTD and the Suite were circulated through Mulberry’s and Inera’s contacts in the electronic publishing world for comments and suggestions. Suggestions that made the DTD more usable were incorporated.

Page 12: NATIONAL LIBRARY OF MEDICINE PubMed Central and the NLM DTDs

Archiving / Publishing DTDs

• PLoS is using the DTD for their journals

• TechBooks is using Journal Publishing DTD to send PMC content for J. Athletic Training

• HighWire Press is analyzing the DTDs for its own use

• JSTOR will use the DTD for its E-Journal Archive

• CSIRO (Australia's Commonwealth Scientific & Industrial Research Organisation) will tag its journals with the new DTD

• Several other small journals are trying to use the DTD to submit content to PMC

Page 13: NATIONAL LIBRARY OF MEDICINE PubMed Central and the NLM DTDs

The Metadata

The DTDs are article-based.

Metadata in each article is broken down into two parts:

• Journal Metadata

• Article Metadata

Page 14: NATIONAL LIBRARY OF MEDICINE PubMed Central and the NLM DTDs

Journal Metadata

Journal Metadata carries all information about the journal that the article is (was) published in.

• Journal Identifier(s) (by archive name, doi, nlm title abbreviation, publisher ids)

• Journal Title

• Abbreviated Journal Title

• ISSN(s) – print and/or electronic

• Publisher information
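As a rough illustration of the journal metadata listed above, the sketch below parses a small sample fragment with Python's standard library. The element and attribute names (`journal-meta`, `journal-id`, `issn`, and so on) are assumed from the NLM tag set and should be checked against the published DTD documentation; all values are invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: tag names are assumed from the NLM tag set, values are invented.
JOURNAL_META = """
<journal-meta>
  <journal-id journal-id-type="nlm-ta">J Example Med</journal-id>
  <journal-id journal-id-type="publisher-id">JEM</journal-id>
  <journal-title>Journal of Example Medicine</journal-title>
  <abbrev-journal-title>J. Example Med.</abbrev-journal-title>
  <issn pub-type="ppub">0000-0000</issn>
  <issn pub-type="epub">0000-0001</issn>
  <publisher>
    <publisher-name>Example Society Press</publisher-name>
  </publisher>
</journal-meta>
"""


def read_journal_meta(xml_text):
    """Collect the journal-level fields into a plain dictionary."""
    root = ET.fromstring(xml_text)
    return {
        # Several identifiers may coexist, keyed by their type.
        "ids": {e.get("journal-id-type"): e.text for e in root.findall("journal-id")},
        "title": root.findtext("journal-title"),
        "abbrev_title": root.findtext("abbrev-journal-title"),
        # Print and electronic ISSNs are told apart by the pub-type attribute.
        "issns": {e.get("pub-type"): e.text for e in root.findall("issn")},
        "publisher": root.findtext("publisher/publisher-name"),
    }


journal = read_journal_meta(JOURNAL_META)
print(journal["title"], journal["issns"])
```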

Page 15: NATIONAL LIBRARY OF MEDICINE PubMed Central and the NLM DTDs

Article Metadata

Article Metadata carries information about the article (and its ‘address’ within the journal).

• Article Id(s)

• Article Categories – subject categories or TOC sections

• Article Titles – includes title, subtitle, translated title and alternate title.

• Contributors – authors and editors and their affiliations

• Author notes – ‘footnotes’ specific to authors

• Publication Dates – print, electronic, preprint, collection

• Volume and Issue
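A minimal sketch of how the article-level items above might look in tagged form, and how a consumer could read them. The tag names (`article-id`, `title-group`, `contrib-group`, `pub-date`, and so on) are assumed from the NLM tag set rather than quoted from these slides, and the sample values are invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: tag names are assumed from the NLM tag set, values are invented.
ARTICLE_META = """
<article-meta>
  <article-id pub-id-type="doi">10.0000/example.0001</article-id>
  <article-categories>
    <subj-group>
      <subject>Research Article</subject>
    </subj-group>
  </article-categories>
  <title-group>
    <article-title>An Example Article</article-title>
    <subtitle>A Hypothetical Subtitle</subtitle>
  </title-group>
  <contrib-group>
    <contrib contrib-type="author">
      <name><surname>Doe</surname><given-names>Jane</given-names></name>
    </contrib>
    <contrib contrib-type="editor">
      <name><surname>Roe</surname><given-names>Richard</given-names></name>
    </contrib>
  </contrib-group>
  <pub-date pub-type="ppub"><month>11</month><year>2003</year></pub-date>
  <pub-date pub-type="epub"><day>1</day><month>10</month><year>2003</year></pub-date>
  <volume>12</volume>
  <issue>4</issue>
</article-meta>
"""


def read_article_meta(xml_text):
    """Pull identifiers, titles, contributors, dates, and the issue 'address'."""
    root = ET.fromstring(xml_text)
    return {
        "ids": {e.get("pub-id-type"): e.text for e in root.findall("article-id")},
        "categories": [s.text for s in root.findall("article-categories/subj-group/subject")],
        "title": root.findtext("title-group/article-title"),
        "subtitle": root.findtext("title-group/subtitle"),
        # Each contributor is (role, given names, surname).
        "contributors": [
            (c.get("contrib-type"), c.findtext("name/given-names"), c.findtext("name/surname"))
            for c in root.findall("contrib-group/contrib")
        ],
        # One publication year per publication type (print, electronic, ...).
        "pub_years": {d.get("pub-type"): d.findtext("year") for d in root.findall("pub-date")},
        "volume": root.findtext("volume"),
        "issue": root.findtext("issue"),
    }


article = read_article_meta(ARTICLE_META)
print(article["title"], article["contributors"])
```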

Page 16: NATIONAL LIBRARY OF MEDICINE PubMed Central and the NLM DTDs

More Article Metadata

• Pagination – first/last page or ‘elocation-id’

• Article-level links

• Product information – for book, software, or hardware reviews

• Article History – dates received, accepted, etc

• Copyright information

• Related article information

• Abstracts

• Keywords

• Contract/Grant information

• Counts – figures, tables, equations, pages
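Three of the items above lend themselves to a short sketch: pagination that is either a page range or an ‘elocation-id’, article history dates, and counts. The tag and attribute names (`fpage`, `lpage`, `elocation-id`, `history`, `date-type`, `counts`, `count`) are assumed from the NLM tag set, and the sample values are invented.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Hypothetical sample: tag names are assumed from the NLM tag set, values are invented.
MORE_ARTICLE_META = """
<article-meta>
  <fpage>101</fpage>
  <lpage>108</lpage>
  <history>
    <date date-type="received"><day>3</day><month>2</month><year>2003</year></date>
    <date date-type="accepted"><day>20</day><month>5</month><year>2003</year></date>
  </history>
  <counts>
    <fig-count count="3"/>
    <table-count count="2"/>
    <page-count count="8"/>
  </counts>
</article-meta>
"""


def article_location(meta):
    """Return 'first-last' pages, a single page, or the elocation-id if no pages are tagged."""
    fpage = meta.findtext("fpage")
    lpage = meta.findtext("lpage")
    if fpage:
        return f"{fpage}-{lpage}" if lpage and lpage != fpage else fpage
    return meta.findtext("elocation-id")


def history_dates(meta):
    """Map each history date type (received, accepted, ...) to a date object."""
    dates = {}
    for d in meta.findall("history/date"):
        # A missing day or month falls back to 1 so a date can still be built.
        year, month, day = (int(d.findtext(tag, "1")) for tag in ("year", "month", "day"))
        dates[d.get("date-type")] = date(year, month, day)
    return dates


def read_counts(meta):
    """Map each count element name to its integer count attribute."""
    counts = meta.find("counts")
    if counts is None:
        return {}
    return {c.tag: int(c.get("count")) for c in counts}


meta = ET.fromstring(MORE_ARTICLE_META)
print(article_location(meta), history_dates(meta), read_counts(meta))
```

An article published online only would carry an `elocation-id` in place of the page elements, and `article_location` then returns that identifier instead of a page range.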

Page 17: NATIONAL LIBRARY OF MEDICINE PubMed Central and the NLM DTDs

What’s Next?: Working Group

To keep the DTD relevant to the publishing and archiving communities, we have created the XML Interchange Structure Working Group. This group advises NLM on recommended changes in and/or additions to the tagset.

The Working Group met for the first time on August 18, 2003.

The recommendations from this meeting led to version 1.1 of the DTDs, released on November 1, 2003.

Page 18: NATIONAL LIBRARY OF MEDICINE PubMed Central and the NLM DTDs

What’s Next?: Other DTDs

Because the DTD is built as a set of DTD modules, other document types can be created (relatively) easily using the same content models.

We are building a Books DTD and planning an Online Documentation DTD.

Page 19: NATIONAL LIBRARY OF MEDICINE PubMed Central and the NLM DTDs

Links

PubMed Central –

http://www.pubmedcentral.gov

NLM DTDs and documentation

http://dtd.nlm.nih.gov

[email protected]

Page 20: NATIONAL LIBRARY OF MEDICINE PubMed Central and the NLM DTDs

The PMC Team

Andrei Kolotev Marla Fogelman

Anh Nguyen Morais Burge

Brooke Dine Sergey Koshelkov

Ed Sequeira Sergey Krasnov

Jane Davenport Vladimir Sarkisov

Jeff Beck Vladislav Merker

Laura Kelly