Sharing Between Data Repositories

Posted on 21-Dec-2014


DESCRIPTION

Dryad is a generic subject repository that shares author-submitted data with other scientific repositories. The talk is part "how we did it" and part "things to consider". I'll discuss 1) why we chose BagIt and OAI-ORE as mechanisms for sharing our data, 2) how we've integrated with TreeBASE, a subject repository of phylogenetic information, and 3) whether other repositories in the larger DataONE community might adopt this method of data sharing. There will be cake.

TRANSCRIPT

Kevin S. Clarke
ksclarke@nescent.org

Thanks to the Dryad Data Repository contributors and funders:

Ryan Scherle, Todd J. Vision, Hilmar Lapp (NESCent)
Ben Bosman, Mark Diggory, Kevin Van de Velde (@mire, Inc.)

Sharing Between Data Repositories

NESCent

The Bio-Reposphere

(Generic Subject Repository)

(Subject-Specific Repository)

(General Scholarly Repository)

Generic vs. Specific Repos

Generic:
✔ Easy submission
✔ Simple metadata
✔ Data is a “black box”
✔ No “orphaned” data

Specific:
✔ Complex submission
✔ More useful metadata
✔ Well-structured data
✔ Specific type of data

A Dryad Data Package

One Possible Workflow

“Save the Time of the User” #1

“Save the Time of the User” #2

Three Simple Steps

Case 1: TreeBASE Data Import

Harvesting and Web Services

OAI-PMH

PhyloWS
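The harvesting side of the import can be sketched in Python. The sketch below is minimal: it pages through an OAI-PMH `ListRecords` response by following `resumptionToken`. The base URL, the `oai_dc` metadata prefix and all helper names are assumptions for illustration. They are not TreeBASE's actual endpoint or Dryad's harvester, and PhyloWS is not covered.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request; resumptionToken is an exclusive argument in OAI-PMH."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urllib.parse.urlencode(params)

def parse_list_records(xml_text):
    """Return (record identifiers, resumption token or None) for one response page."""
    root = ET.fromstring(xml_text)
    ids = [el.text for el in root.iterfind(".//oai:record/oai:header/oai:identifier", OAI_NS)]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    # An empty resumptionToken element marks the last page of a list.
    token = token_el.text.strip() if token_el is not None and token_el.text else None
    return ids, token or None

def fetch_url(url, timeout=60):
    """Default fetcher: plain HTTP GET returning the decoded body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")

def harvest(base_url, fetch=fetch_url):
    """Yield every record identifier, following resumption tokens page by page."""
    token = None
    while True:
        ids, token = parse_list_records(fetch(list_records_url(base_url, resumption_token=token)))
        yield from ids
        if token is None:
            break
```

The `fetch` parameter is a design choice, not part of OAI-PMH. It keeps the paging logic testable without a live endpoint.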

Case 2: Data Uploaded to Dryad

Partner Repository Upload

BagIt Disseminator (implements DSpace PackageDisseminator)

[Diagram: DSpace Metadata, XSLT Crosswalk, Dryad Application Profile, Dryad Data Package (Dryad Publication plus three Dryad Data Files), Data from DSpace, Bag]
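The crosswalk step maps DSpace metadata fields onto the Dryad Application Profile. As a rough illustration only, the sketch below swaps XSLT for Python's ElementTree and reads DSpace's DIM (intermediate metadata) format. The following are hypothetical stand-ins, not the real profile:
- the field mapping
- the output namespace `urn:example:dryad-profile`
- the `DataPackage` root element

```python
import xml.etree.ElementTree as ET

DIM = "http://www.dspace.org/xmlns/dspace/dim"
DCTERMS = "http://purl.org/dc/terms/"
# Hypothetical namespace standing in for the Dryad Application Profile.
PROFILE = "urn:example:dryad-profile"

# (schema, element, qualifier) in DSpace DIM -> DCMI term in the output.
# Illustrative assumption only; this is not Dryad's actual crosswalk table.
FIELD_MAP = {
    ("dc", "title", None): "title",
    ("dc", "contributor", "author"): "creator",
    ("dc", "identifier", "uri"): "identifier",
}

def crosswalk(dim_xml):
    """Map DSpace DIM fields to profile elements, dropping unmapped fields."""
    src = ET.fromstring(dim_xml)
    out = ET.Element(f"{{{PROFILE}}}DataPackage")  # hypothetical root element
    for field in src.iter(f"{{{DIM}}}field"):
        key = (field.get("mdschema"), field.get("element"), field.get("qualifier"))
        term = FIELD_MAP.get(key)
        if term is None:
            continue  # no mapping: the field is not carried into the profile
        ET.SubElement(out, f"{{{DCTERMS}}}{term}").text = field.text
    return ET.tostring(out, encoding="unicode")
```

A declarative XSLT stylesheet keeps the mapping outside the application code. This is presumably one reason to use one in a DSpace pipeline. The Python version just makes the mapping table explicit.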

A BagIt Bag

bagit.txt
bag-info.txt
manifest-md5.txt
tagmanifest-md5.txt
data/
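A bag like this can be written with only the standard library. The sketch below is minimal and assumes the payload is already in `bag_dir/data`. Three things are assumptions for illustration:
- the helper names
- the BagIt version string written to bagit.txt
- the extra bag-info.txt fields

```python
import hashlib
from datetime import date
from pathlib import Path

def md5_of(path):
    """MD5 hex digest of a file, read in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def make_bag(bag_dir, info=None):
    """Turn a directory whose payload already sits in bag_dir/data into a bag."""
    bag = Path(bag_dir)
    payload = sorted(p for p in (bag / "data").rglob("*") if p.is_file())
    # Payload manifest: "<checksum>  <path relative to the bag root>" per file.
    lines = [f"{md5_of(p)}  {p.relative_to(bag).as_posix()}\n" for p in payload]
    (bag / "manifest-md5.txt").write_text("".join(lines), encoding="utf-8")
    # Bag declaration; the version number here is an assumption (0.97 era).
    (bag / "bagit.txt").write_text(
        "BagIt-Version: 0.97\nTag-File-Character-Encoding: UTF-8\n", encoding="utf-8")
    # Free-form metadata; Bagging-Date is a reserved field, the rest come from `info`.
    fields = {"Bagging-Date": date.today().isoformat(), **(info or {})}
    (bag / "bag-info.txt").write_text(
        "".join(f"{k}: {v}\n" for k, v in fields.items()), encoding="utf-8")
    # Tag manifest: checksums of the tag files written above.
    tags = ["bagit.txt", "bag-info.txt", "manifest-md5.txt"]
    (bag / "tagmanifest-md5.txt").write_text(
        "".join(f"{md5_of(bag / t)}  {t}\n" for t in tags), encoding="utf-8")
    return bag
```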

Dryad Data in the Bag

dryadpkg.xml
dryadpub.xml
datafile-1/
  ApineCYTB.nexus
  dryadfile-1.xml
datafile-2/
  ApineDNA.nexus
  dryadfile-2.xml
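Every payload file in a bag is listed in manifest-md5.txt, including the package and publication XML and the per-file directories. A receiving repository can therefore check the package before ingesting it. A minimal validation sketch follows. It checks only payload checksums and completeness, not the tag manifest. `validate_bag` is a hypothetical helper name.

```python
import hashlib
from pathlib import Path

def md5_of(path):
    """MD5 hex digest of a file, read in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def validate_bag(bag_dir):
    """Return a list of problems; an empty list means the payload matches manifest-md5.txt.

    Raises FileNotFoundError if manifest-md5.txt itself is absent."""
    bag = Path(bag_dir)
    problems = []
    if not (bag / "bagit.txt").is_file():
        problems.append("missing: bagit.txt")
    listed = set()
    manifest = (bag / "manifest-md5.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        checksum, rel = line.split(maxsplit=1)
        listed.add(rel)
        target = bag / rel
        if not target.is_file():
            problems.append(f"missing: {rel}")
        elif md5_of(target) != checksum.lower():
            problems.append(f"checksum mismatch: {rel}")
    # BagIt requires every payload file to appear in the payload manifest.
    for p in sorted((bag / "data").rglob("*")):
        rel = p.relative_to(bag).as_posix()
        if p.is_file() and rel not in listed:
            problems.append(f"not in manifest: {rel}")
    return problems
```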

HTTP PUT Handshake

BagIt Upload

Email

TreeBASE URL
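The upload half of the handshake can be sketched as zipping the bag and sending it with an HTTP PUT. The following are all hypothetical:
- the target URL
- the `application/zip` content type
- the assumption that the partner's reply body carries something useful, such as the new TreeBASE URL

The email step is not modeled.

```python
import io
import urllib.request
import zipfile
from pathlib import Path

def zip_bag(bag_dir):
    """Serialize a bag directory to zip bytes, keeping paths relative to the bag root."""
    bag = Path(bag_dir)
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for p in sorted(bag.rglob("*")):
            if p.is_file():
                zf.write(p, p.relative_to(bag).as_posix())
    return buf.getvalue()

def build_put_request(url, bag_dir):
    """Prepare (but do not send) a PUT carrying the zipped bag."""
    body = zip_bag(bag_dir)
    return urllib.request.Request(
        url, data=body, method="PUT",
        headers={"Content-Type": "application/zip", "Content-Length": str(len(body))})

def upload_bag(url, bag_dir, timeout=60):
    """Send the PUT; hypothetically, the reply body carries the partner's record URL."""
    with urllib.request.urlopen(build_put_request(url, bag_dir), timeout=timeout) as resp:
        return resp.status, resp.read().decode("utf-8")
```

Building the request separately from sending it keeps the packaging step testable. It also leaves room for whatever authentication a partner repository requires.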

Lessons Learned

✔ Just enough to get the job done and no more

✔ Fewer local conventions and more “standards”

✔ There will always be custom solutions

✔ Options are developing quickly in this space

Future Directions

Less reliance on local conventions
✔ Plan to use OAI-ORE and Pairtree(s) within BagIt

OAI-ORE: Because it's Linked Data
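A Resource Map is what makes the package Linked Data. It names an Aggregation and lists the resources that Aggregation groups. The sketch below writes a minimal map in RDF/XML with ElementTree, using the `ore:`, `rdf:` and `dcterms:` vocabularies. The resource map URI and file URIs in the test are hypothetical. A production map would likely carry more metadata, such as a creator.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ORE = "http://www.openarchives.org/ore/terms/"
DCTERMS = "http://purl.org/dc/terms/"
for ns_prefix, ns_uri in (("rdf", RDF), ("ore", ORE), ("dcterms", DCTERMS)):
    ET.register_namespace(ns_prefix, ns_uri)

def _res(uri):
    """Attribute dict pointing an RDF property at another resource."""
    return {f"{{{RDF}}}resource": uri}

def resource_map(rem_uri, aggregation_uri, resource_uris, modified=None):
    """Return a minimal ORE Resource Map (RDF/XML) describing one Aggregation."""
    modified = modified or datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    root = ET.Element(f"{{{RDF}}}RDF")
    # The Resource Map itself: its type, what it describes, when it changed.
    rem = ET.SubElement(root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": rem_uri})
    ET.SubElement(rem, f"{{{RDF}}}type", _res(ORE + "ResourceMap"))
    ET.SubElement(rem, f"{{{ORE}}}describes", _res(aggregation_uri))
    ET.SubElement(rem, f"{{{DCTERMS}}}modified").text = modified
    # The Aggregation and the resources (data files, metadata) it groups.
    agg = ET.SubElement(root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": aggregation_uri})
    ET.SubElement(agg, f"{{{RDF}}}type", _res(ORE + "Aggregation"))
    ET.SubElement(agg, f"{{{ORE}}}isDescribedBy", _res(rem_uri))
    for uri in resource_uris:
        ET.SubElement(agg, f"{{{ORE}}}aggregates", _res(uri))
    return ET.tostring(root, encoding="unicode")
```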

Pairtree Filesystem
✔ So we can dereference URIs in ORE Resource Maps
http://dx.doi.org/10.5061/dryad.8343

URI prefix: http://dx.doi.org/10.5061/dryad.
Path: 83/43
File: 83/43/Arctostaphylos.nex
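This identifier-to-path mapping follows the Pairtree convention: strip the URI prefix, clean the remaining identifier, then split it into two-character directory names. The sketch below summarizes the specification's cleaning rules as we read them:
- a few reserved characters, and anything outside visible ASCII, are hex-encoded as `^hh`;
- then `/`, `:` and `.` become `=`, `+` and `,`.

`id_to_ppath` and `clean_id` are hypothetical names.

```python
# Characters hex-encoded as ^hh, in addition to anything outside visible ASCII (0x21-0x7E).
_HEX_ENCODE = set('"*+,<=>?\\^|')
# Single-character substitutions applied after hex encoding.
_SUBST = str.maketrans({"/": "=", ":": "+", ".": ","})

def clean_id(identifier):
    """Apply Pairtree-style character cleaning to an identifier string."""
    out = []
    for byte in identifier.encode("utf-8"):
        ch = chr(byte)
        if byte < 0x21 or byte > 0x7E or ch in _HEX_ENCODE:
            out.append(f"^{byte:02x}")
        else:
            out.append(ch)
    return "".join(out).translate(_SUBST)

def id_to_ppath(identifier, prefix=""):
    """Strip the URI prefix, clean the rest, and split it into two-character path segments."""
    if prefix and identifier.startswith(prefix):
        identifier = identifier[len(prefix):]
    cleaned = clean_id(identifier)
    return "/".join(cleaned[i:i + 2] for i in range(0, len(cleaned), 2))
```

With the prefix from the slide, `id_to_ppath("http://dx.doi.org/10.5061/dryad.8343", prefix="http://dx.doi.org/10.5061/dryad.")` gives `83/43`. Arctostaphylos.nex then sits under that directory.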

Other Interesting Developments

DataONE
✔ Distributing data files and metadata
✔ May support packages in the future

“Dropbox of Bags” or Bag replication network (BagNet?)

METS in Bags (in contrast to ORE)

The End

The cake was a lie

References

Dryad Code: http://dryad.googlecode.com

Dryad Data Repository: http://datadryad.org

BagIt: http://en.wikipedia.org/wiki/BagIt

OAI-ORE Primer: http://www.openarchives.org/ore/1.0/primer

OAI-ORE in BagIt: http://groups.google.com/group/oai-ore/browse_thread/thread/3ebfa7fcb4588048

ADMIRAL Data Packages (Planning ORE in BagIt): http://imageweb.zoo.ox.ac.uk/wiki/index.php/ADMIRAL_data_packages

DSpace Packagers: https://wiki.duraspace.org/display/DSPACE/PackagerPlugins
