Sharing Between Data Repositories

Kevin S. Clarke [email protected]

Upload: kevin-clarke

Post on 21-Dec-2014


Category: Technology



DESCRIPTION

Dryad is a generic subject repository that shares author-submitted data with other scientific repositories. In a part "how we done it" and part "things to consider" talk, I'll discuss 1) why we chose BagIt and OAI-ORE as mechanisms for sharing our data, 2) how we've integrated with TreeBASE (a subject repository of phylogenetic information), and 3) the possibility of this method of data sharing being adopted by other repositories within the larger DataONE community. There will be cake.

TRANSCRIPT

Page 1: Sharing Between Data Repositories

Kevin S. Clarke [email protected]

Thanks to the Dryad Data Repository contributors and funders:

Ryan Scherle, Todd J. Vision, Hilmar Lapp (NESCent)
Ben Bosman, Mark Diggory, Kevin Van de Velde (@mire, Inc.)

Sharing Between Data Repositories

NESCent

Page 2: Sharing Between Data Repositories

The Bio-Reposphere

(Generic Subject Repository)

(Subject Specific Repository)

(General Scholarly Repository)

Page 3: Sharing Between Data Repositories

Generic vs. Specific Repos

Generic repositories:
✔ Easy submission
✔ Simple metadata
✔ Data is a “black box”
✔ No “orphaned” data

Specific repositories:
✔ Complex submission
✔ More useful metadata
✔ Well structured data
✔ Specific type of data

Page 4: Sharing Between Data Repositories

A Dryad Data Package

Page 5: Sharing Between Data Repositories

One Possible Workflow

Page 6: Sharing Between Data Repositories

“Save the Time of the User” #1

Page 7: Sharing Between Data Repositories

“Save the Time of the User” #2

Page 8: Sharing Between Data Repositories

Three Simple Steps

Page 9: Sharing Between Data Repositories

Case 1: TreeBASE Data Import

Page 10: Sharing Between Data Repositories

Harvesting and Web Services

OAI-PMH

PhyloWS
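As a rough illustration of the harvesting side, an OAI-PMH request is an ordinary HTTP GET whose query string carries a verb and its arguments. The sketch below only builds such a URL; the endpoint is a hypothetical placeholder, and `oai_pmh_url` is an illustrative helper, not part of Dryad or TreeBASE.

```python
from urllib.parse import urlencode


def oai_pmh_url(base_url, verb="ListRecords", arguments=None):
    """Build an OAI-PMH request URL from a base URL, a verb, and its arguments."""
    query = {"verb": verb}
    query.update(arguments or {})
    return base_url + "?" + urlencode(query)


# Hypothetical endpoint, used only for illustration.
url = oai_pmh_url("http://example.org/oai", arguments={"metadataPrefix": "oai_dc"})
print(url)
```

`ListRecords` and the `oai_dc` metadata prefix are defined by the OAI-PMH protocol itself; which sets and formats a given repository exposes is something to check against that repository's `Identify` and `ListMetadataFormats` responses.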

Page 11: Sharing Between Data Repositories

Case 2: Data Uploaded to Dryad

Page 12: Sharing Between Data Repositories

Partner Repository Upload

Page 13: Sharing Between Data Repositories

BagIt Disseminator (implements DSpace PackageDisseminator)

[Diagram labels: DSpace Metadata, XSLT Crosswalk, Dryad Application Profile, Dryad Data Package, Dryad Publication, Dryad Data File (×3), Data from DSpace, Bag]

Page 14: Sharing Between Data Repositories

A BagIt Bag

data/
bag-info.txt
bagit.txt
manifest-md5.txt
tagmanifest-md5.txt
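The layout above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the Dryad disseminator: `make_bag` is a hypothetical helper, the version string assumes BagIt 0.97, and the optional `bag-info.txt` and `tagmanifest-md5.txt` files are left out.

```python
import hashlib
from pathlib import Path


def make_bag(bag_dir, payload):
    """Write a minimal bag: bagit.txt, a data/ directory, and manifest-md5.txt.

    payload maps file names (relative to data/) to their bytes.
    """
    bag = Path(bag_dir)
    data = bag / "data"
    data.mkdir(parents=True, exist_ok=True)

    # Bag declaration: version and tag-file encoding (0.97 is assumed here).
    (bag / "bagit.txt").write_text(
        "BagIt-Version: 0.97\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )

    # One manifest line per payload file: checksum, whitespace, path.
    manifest_lines = []
    for name, content in sorted(payload.items()):
        target = data / name
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(content)
        digest = hashlib.md5(content).hexdigest()
        manifest_lines.append(f"{digest}  data/{name}")
    (bag / "manifest-md5.txt").write_text(
        "\n".join(manifest_lines) + "\n", encoding="utf-8"
    )
    return bag
```

Calling `make_bag("example-bag", {"readme.txt": b"hello"})` would produce `example-bag/bagit.txt`, `example-bag/manifest-md5.txt`, and `example-bag/data/readme.txt`.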

Page 15: Sharing Between Data Repositories

Dryad Data in the Bag

dryadpkg.xml
dryadpub.xml
datafile-1: ApineCYTB.nexus, dryadfile-1.xml
datafile-2: ApineDNA.nexus, dryadfile-2.xml
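A receiving repository would typically check the payload against the bag's manifest before ingesting it. The following is a hedged sketch of that check, assuming an MD5 manifest in the usual "checksum path" form; `verify_manifest` is an illustrative helper, not code from Dryad or TreeBASE.

```python
import hashlib
from pathlib import Path


def verify_manifest(bag_dir, manifest_name="manifest-md5.txt"):
    """Return the manifest paths that are missing or whose MD5 does not match."""
    bag = Path(bag_dir)
    failures = []
    manifest = (bag / manifest_name).read_text(encoding="utf-8")
    for line in manifest.splitlines():
        if not line.strip():
            continue
        # Each line is "<checksum> <path relative to the bag root>".
        expected, rel_path = line.split(None, 1)
        target = bag / rel_path
        if not target.is_file():
            failures.append(rel_path)
            continue
        actual = hashlib.md5(target.read_bytes()).hexdigest()
        if actual != expected.lower():
            failures.append(rel_path)
    return failures
```

An empty list means every file named in the manifest is present and unchanged; files in `data/` that the manifest does not mention are not detected by this sketch.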

Page 16: Sharing Between Data Repositories

HTTP PUT Handshake

BagIt Upload

Email

TreeBASE URL
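The upload step of the handshake can be pictured as a single HTTP PUT carrying the serialized bag. The sketch below only prepares such a request with Python's standard library and does not send it; the endpoint URL, the ZIP serialization, and the `build_bag_upload` helper are all assumptions made for illustration, not the actual Dryad or TreeBASE interface.

```python
from urllib.request import Request


def build_bag_upload(endpoint_url, bag_bytes, content_type="application/zip"):
    """Prepare (but do not send) an HTTP PUT request carrying a serialized bag."""
    return Request(
        endpoint_url,
        data=bag_bytes,
        method="PUT",
        headers={"Content-Type": content_type},
    )


# Hypothetical partner endpoint; not a real TreeBASE URL.
request = build_bag_upload(
    "http://partner.example.org/handshake/bag.zip", b"serialized bag bytes"
)
print(request.get_method(), request.full_url)
```

Sending the request would be a call to `urllib.request.urlopen(request)`. How the partner's response relates to the email and the TreeBASE URL shown on the slide is not spelled out in the transcript, so that part is left out here.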

Page 17: Sharing Between Data Repositories

Lessons Learned

✔ Just enough to get the job done and no more

✔ Fewer local conventions and more “standards”

✔ There will always be custom solutions

✔ Options are developing quickly in this space

Page 18: Sharing Between Data Repositories

Future Directions

Less reliance on local conventions
✔ Plan to use OAI-ORE and Pairtree(s) within BagIt

OAI-ORE: Because it's Linked Data

Pairtree Filesystem
✔ So we can dereference URIs in ORE Resource Maps

http://dx.doi.org/10.5061/dryad.8343
URI prefix: http://dx.doi.org/10.5061/dryad.
Path: 83/43
83/43/Arctostaphylos.nex
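The mapping from the DOI-based URI to a Pairtree path can be sketched as follows. This is a simplified illustration: it strips the URI prefix and splits the remaining identifier into two-character segments, and it omits the character-cleaning step that the full Pairtree specification applies to identifiers first. The function names are hypothetical.

```python
def strip_prefix(uri, prefix):
    """Return the part of the URI that follows the agreed URI prefix."""
    if not uri.startswith(prefix):
        raise ValueError(f"{uri!r} does not start with {prefix!r}")
    return uri[len(prefix):]


def pairtree_path(identifier, segment_length=2):
    """Split an identifier into fixed-length segments joined by '/'."""
    segments = [
        identifier[i:i + segment_length]
        for i in range(0, len(identifier), segment_length)
    ]
    return "/".join(segments)


uri = "http://dx.doi.org/10.5061/dryad.8343"
prefix = "http://dx.doi.org/10.5061/dryad."
path = pairtree_path(strip_prefix(uri, prefix))
print(path)                              # 83/43
print(path + "/Arctostaphylos.nex")      # 83/43/Arctostaphylos.nex
```

Because the path is derived mechanically from the URI, a recipient that knows the prefix can locate a file inside the bag from the URI in a Resource Map without any extra lookup table.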

Page 19: Sharing Between Data Repositories

Other Interesting Developments

DataONE
✔ Distributing data files and metadata
✔ May support packages in the future

“Dropbox of Bags” or Bag replication network (BagNet?)

METS in Bags (in contrast to ORE)

Page 20: Sharing Between Data Repositories

The End

The cake was a lie

Page 21: Sharing Between Data Repositories

References

Dryad Code http://dryad.googlecode.com

Dryad Data Repository http://datadryad.org

BagIt http://en.wikipedia.org/wiki/BagIt

OAI-ORE Primer http://www.openarchives.org/ore/1.0/primer

OAI-ORE in BagIt http://groups.google.com/group/oai-ore/browse_thread/thread/3ebfa7fcb4588048

ADMIRAL Data Packages (Planning ORE in BagIt) http://imageweb.zoo.ox.ac.uk/wiki/index.php/ADMIRAL_data_packages

DSpace Packagers https://wiki.duraspace.org/display/DSPACE/PackagerPlugins