jane greenberg, professor and

18
Jane Greenberg, Professor and Director, Metadata Research Center <MRC> School of Information And Library Science University of North Carolina at Chapel Hill

Upload: bona

Post on 23-Feb-2016

46 views

Category:

Documents


1 download

DESCRIPTION

The Dryad Repository: A Metadata Best Practice for a Scientific Data Repository ASIST RESEARCH DATA ACCESS & PRESERVATION SUMMIT April 9-10, 2010, Phoenix, Arizona. Jane Greenberg, Professor and Director, Metadata Research Center School of Information And Library Science - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Jane Greenberg, Professor and

Jane Greenberg, Professor and Director, Metadata Research Center <MRC>School of Information And Library Science University of North Carolina at Chapel Hill

Page 2: Jane Greenberg, Professor and

2

EcologyEcology PaleontologyPaleontology Population Population geneticsgeneticsPhysiologyPhysiologySystematics Systematics GenomicsGenomics

Dryad GoalsOne-stop deposition/access for data

objects supporting published research…Acquisition, preservation, discovery, and

reuse of heterogeneous digital datasets

Page 3: Jane Greenberg, Professor and
Page 4: Jane Greenberg, Professor and

American Society of NaturalistsAmerican Naturalist

Ecological Society of AmericaEcology, Ecological Letters, Ecological Monographs, etc.

European Society for Evolutionary BiologyJournal of Evolutionary Biology

Society for Integrative and Comparative BiologyIntegrative and Comparative Biology

Society for Molecular Biology and EvolutionMolecular Biology and Evolution

Society for the Study of Evolution EvolutionSociety for Systematic Biology

Systematic BiologyCommercial journals

Molecular EcologyMolecular Phylogenetics and Evolution

Page 5: Jane Greenberg, Professor and
Page 6: Jane Greenberg, Professor and

Dyrad’s functional requirements: Support resource discovery and use; data interoperability; computer-

aided metadata generation, augmentation, and quality control; linking publications and underlying datasets; and data security (Dube et al, 2006, White et al 2008)

Dryad’s high-level metadata requirements Simple: depositor created metadata; automation of

metadata gen; discovery of heterogeneous datasets Interoperable: harvesting, cross-system searching Semantic Web Compatible: sustainable and adaptable

metadata architecture, supporting machine processing

Page 7: Jane Greenberg, Professor and

Digital data repositories ought to support immediate operational needs and long term goals to be sustainable. ~ a two-pronged approach~ a two-pronged approach

1. Sustainable in the evolving information environment (DCAM application profile)

2. Immediate need to make content available in Dspace (XML schema)

Page 8: Jane Greenberg, Professor and

SynthesisSynthesis

SharingSharing

DiscoveryDiscovery

PreservationPreservation

Page 9: Jane Greenberg, Professor and

Dublin Core based• Circumvents limitations of using a single scheme• Interoperable with other schemes• Why reinvent the wheel?Modular scheme:

1. Data package2. Journal citation3. Data object(Carrier, et al., 2007; White, et al, 2008; Greenberg, 2009; Greenberg et al; in press JLM, 2010)

Namespaces: 1.Dublin Core2.Data Documentation Initiative (DDI)3.Ecological Metadata Language (EML)4.PREMIS Data Dictionary for Preservation Metadata 5.Publishing Requirements for Industry Standard Metadata (PRISM)6.Darwin Core (DwC)7.Dryad

Page 10: Jane Greenberg, Professor and

Singapore Framework(machine processing, long term quality control Semantic web/linked data)

A loose standard for DC endorsed application profiles

Page 11: Jane Greenberg, Professor and

A “loose” standard for Dublin Core “endorsed” application profiles

Singapore framework provides guidelines for creating a DCAM-conformant Application Profile (“DC Application Profile”)

A packet of documentation which consists of:

1. Functional requirements (desirable)2. Domain model (mandatory)3. Description Set Profile (DSP) (mandatory)4. Usage guidelines (optional)5. Encoding syntax guidelines (optional)

Stuff we do any how…just better! ( JLM paper)

Page 12: Jane Greenberg, Professor and

DSP is “an information model and XML expression” (http://www.unc.edu/~scarrier/dryad/DSPLevelOneAppProfDraft.xml) Obligation (optional, mandatory) Non-literal (thing – philosophically – things in the real

world, known in different ways)http://purl.org/dc/elements/1.1/rights (mandatory), there

are different rightsSubject, creator, description…

Literals (strings): http://purl.org/dc/elements/1.1/identifier =

http://purl.org/dc/terms/URI,http://purl.org/dc/terms/available =

http://purl.org/dc/terms/W3CDTF

12

Toward the Semantic Web… linked data….

Potential for metadata and …data reuse in different contexts

Page 13: Jane Greenberg, Professor and

22/04/23 Titel (edit in slide master) 13

<AMG> approach for integrating discipline CVs Model addressing C V cost, interoperability, and usability constraints (interdisciplinary environment)

Building, Sharing, Evaluation the HIVE….

Page 14: Jane Greenberg, Professor and

Dryad development—reversed engineered long-term goals first did not have to hit the ground running

Commencement of data exchange plans DCAP work refocus XML schema

Harvesting EML records from LTER Network’s Metacat

Data exchange plans Dryad TreeBASE Discouraged …approaches not at odds

App. profile work instrumental in creating a sufficient XML schema

Renderings of the same intellectual work on a continuum

GRDDL (Gleaning Resource Descriptions from Dialects of Languages) – enabling technology to create RDF

Page 15: Jane Greenberg, Professor and

Positive aspects- Intellectually

engaging- Think we are making

a contribution, have to start somewhere…

- Machine capabilities- eScience/data

synthesis

Challenges- Infrastructure not all

there… (a lot is not in RDF)- Registered Dryad

“purl”- Proof of concept

difficult- Time consuming- Documentation

lacking

Page 16: Jane Greenberg, Professor and

Conclusions: Dryad development team has discovered that a two

prong hybrid philosophy works well in this environment, keeps legacy practices from impeding development

A bridge… Development has been consensus driven

biologists, computer scientists, librarians New models need to be explored No exploration, no discovery…unsuccessful results are Not always BAD DCAPDCAP

Semantic Semantic WebWeb~ linked ~ linked data…data…

Page 17: Jane Greenberg, Professor and

HIVE Advisory Board Jim Balhoff, NESCent Libby Dechman, LCSH Mike Frame, USGS Alistair Miles, CCLRC Rutherford

Appleton Laboratory William Moen, University of

North Texas Eva Méndez Rodríguez,

University Carlos III of Madrid Joseph Shubitowski, Getty

Research Institute Ed Summers, LCSH Barbara Tillett, Library of

Congress Kathy Wisser, UNC Chapel Hill Lisa Zolly, USGS

Vocabulary Partners Library of Congress: LCSH the Getty Research Institute

(GRI): TGN (Thesaurus of Geographic Names )

United States Geological Survey (USGS): NBII Thesaurus

Dryad TeamNESCEnt Hilmar Lapp

Assistant Director for Informatics Ryan Scherle

Data Repository Architect Todd Vision, Associate Director of

Informatics

UNC/SILS/MRC Jane Greenberg, Professor Lina Huang, MSIS Student,

Research Assistant Robert M. Losee, Professor Jose R. Pérez-Agüera, Clinical

Assistant Professor Hollie White, Metadata Research

Center Doctoral FellowCollaborators North Carolina State University University of New Mexico/LTER Yale University Partner journals and societies

Page 18: Jane Greenberg, Professor and

For Information Dryad

http://datadryad.org/ Dryad Wiki

https://www.nescent.org/wg_digitaldata/Main_Page Includes links to publications, the application

profile, and lists Dryad team members Metadata Research Center <MRC>

http://www.ils.unc.edu/mrc/ National Evolutionary Synthesis Center

(NESCent) http://www.nescent.org/index.php

The Dryad Data Repository

22/04/23 18