10-15-13 “metadata and repository services for research data curation” presentation slides
DESCRIPTION
“Hot Topics: The DuraSpace Community Webinar Series,” Series Six: “Research Data in Repositories.” Curated by David Minor, Research Data Curation Program, UC San Diego Library. Webinar 2: “Metadata and Repository Services for Research Data Curation.” Presented by Declan Fleming, Chief Technology Strategist; Arwen Hutt, Metadata Librarian; and Matt Critchlow, Manager of Development and Web Services, UC San Diego Library.
TRANSCRIPT
![Page 1: 10-15-13 “Metadata and Repository Services for Research Data Curation” Presentation Slides](https://reader034.vdocuments.site/reader034/viewer/2022051818/54945127b47959604d8b4ac4/html5/thumbnails/1.jpg)
October 15, 2013 Hot Topics: DuraSpace Community Webinar Series
Hot Topics: The DuraSpace Community Webinar Series
Series Six: “Research Data in Repositories”
Curated by David Minor
![Page 2: 10-15-13 “Metadata and Repository Services for Research Data Curation” Presentation Slides](https://reader034.vdocuments.site/reader034/viewer/2022051818/54945127b47959604d8b4ac4/html5/thumbnails/2.jpg)
October 15, 2013 Hot Topics: DuraSpace Community Webinar Series
Webinar 2: Metadata & Repository Services for Research Data Curation
Presented by:
• Declan Fleming, Chief Technology Strategist, UC San Diego Library
• Matt Critchlow, Manager of Development and Web Services, UC San Diego Library
• Arwen Hutt, Metadata Librarian, UC San Diego Library
![Page 3: 10-15-13 “Metadata and Repository Services for Research Data Curation” Presentation Slides](https://reader034.vdocuments.site/reader034/viewer/2022051818/54945127b47959604d8b4ac4/html5/thumbnails/3.jpg)
Hot Topics Web Seminar Series: Research Data in Repositories
The UC San Diego Experience
Second Webinar: Metadata and Repository Services for Research Data Curation
![Page 4: 10-15-13 “Metadata and Repository Services for Research Data Curation” Presentation Slides](https://reader034.vdocuments.site/reader034/viewer/2022051818/54945127b47959604d8b4ac4/html5/thumbnails/4.jpg)
General Series Intro
• First webinar: Intro and Framing: UC San Diego decisions and planning
• Second Webinar: Deep dive into technology and metadata
• Third Webinar: The perspective from researchers, next steps
![Page 5: 10-15-13 “Metadata and Repository Services for Research Data Curation” Presentation Slides](https://reader034.vdocuments.site/reader034/viewer/2022051818/54945127b47959604d8b4ac4/html5/thumbnails/5.jpg)
Your esteemed presenters …
First webinar:
• David Minor, Program Director, Research Data Curation
• Declan Fleming, Chief Technology Strategist
Second webinar:
• Declan Fleming, Chief Technology Strategist
• Arwen Hutt, Metadata Librarian
• Matt Critchlow, Manager of Development and Web Services
Third webinar:
• Dick Norris, Professor, Scripps Institution of Oceanography
• Rick Wagner, Data Scientist at San Diego Supercomputer Center
![Page 6: 10-15-13 “Metadata and Repository Services for Research Data Curation” Presentation Slides](https://reader034.vdocuments.site/reader034/viewer/2022051818/54945127b47959604d8b4ac4/html5/thumbnails/6.jpg)
Today we will …
• Discuss real-world researcher interaction
• Document how metadata and files combine to make digital objects
• Describe the DAMS data model and how it supports complex research objects
• Detail the technology driving the DAMS
• Point to the future
![Page 7: 10-15-13 “Metadata and Repository Services for Research Data Curation” Presentation Slides](https://reader034.vdocuments.site/reader034/viewer/2022051818/54945127b47959604d8b4ac4/html5/thumbnails/7.jpg)
Working with Researchers: Pilots
• The Brain Observatory
• NSF OpenTopography Facility
• Levantine Archaeology Laboratory
• Scripps Institution of Oceanography Geological Collections
• The Laboratory for Computational Astrophysics
![Page 8: 10-15-13 “Metadata and Repository Services for Research Data Curation” Presentation Slides](https://reader034.vdocuments.site/reader034/viewer/2022051818/54945127b47959604d8b4ac4/html5/thumbnails/8.jpg)
Working with Researchers: Process
• Introductory meeting
• Metadata point person
• Ongoing discussions
• One-on-one work
Iterative, collaborative, customized, experimental…pilot!
![Page 9: 10-15-13 “Metadata and Repository Services for Research Data Curation” Presentation Slides](https://reader034.vdocuments.site/reader034/viewer/2022051818/54945127b47959604d8b4ac4/html5/thumbnails/9.jpg)
Working with Researchers: Data management
• Collocation
• Clean up
• Identifiers
• Metadata
![Page 10: 10-15-13 “Metadata and Repository Services for Research Data Curation” Presentation Slides](https://reader034.vdocuments.site/reader034/viewer/2022051818/54945127b47959604d8b4ac4/html5/thumbnails/10.jpg)
Working with Researchers: What is an object?
• What are the boundaries on a discrete set or subset of data? What is required to make the data intelligible, usable and reusable?
• What needs to be preserved?
• What do they want to display and/or share?
• What do they want to be able to refer to or cite?
![Page 11: 10-15-13 “Metadata and Repository Services for Research Data Curation” Presentation Slides](https://reader034.vdocuments.site/reader034/viewer/2022051818/54945127b47959604d8b4ac4/html5/thumbnails/11.jpg)
Working with Researchers: What is an object?
[Diagram: possible object boundaries, such as Brain or Slice, and Site or Artifact, etc.]
![Page 12: 10-15-13 “Metadata and Repository Services for Research Data Curation” Presentation Slides](https://reader034.vdocuments.site/reader034/viewer/2022051818/54945127b47959604d8b4ac4/html5/thumbnails/12.jpg)
Working with Researchers: Takeaways
They are the subject experts
There are a lot of broad-level similarities
But no such thing as one size fits all
![Page 13: 10-15-13 “Metadata and Repository Services for Research Data Curation” Presentation Slides](https://reader034.vdocuments.site/reader034/viewer/2022051818/54945127b47959604d8b4ac4/html5/thumbnails/13.jpg)
We want a new data model…
• One that is flexible and accommodates disparate metadata from a variety of sources
• While promoting consistency within the data store
• One that supports relationships within and between objects
• One that is more community engaged, both sharing vocabularies and technology and using others' shared vocabularies and technologies
• One that supports improved management of objects and metadata
![Page 14: 10-15-13 “Metadata and Repository Services for Research Data Curation” Presentation Slides](https://reader034.vdocuments.site/reader034/viewer/2022051818/54945127b47959604d8b4ac4/html5/thumbnails/14.jpg)
DAMS Data Model Development Process
• Five people, in a room, 16 hours a week for 4 months
• Worked through existing data, use case scenarios, known data requirements, investigated known ontologies, etc.
• Lots and lots and lots of discussion
• Utilizes MADS (Metadata Authority Description Schema)
• Results = a data dictionary and an OWL ontology
• Living document
![Page 15: 10-15-13 “Metadata and Repository Services for Research Data Curation” Presentation Slides](https://reader034.vdocuments.site/reader034/viewer/2022051818/54945127b47959604d8b4ac4/html5/thumbnails/15.jpg)
DAMS Data Model: Flexibility
• The data model provides enough flexibility that we can accommodate a wide variety of data within the schema
  – Vocabularies
  – Use of “types” or “display labels” to distinguish specific subtypes of a data field
  – Flexible structures and relationships
  – Extensible
![Page 16: 10-15-13 “Metadata and Repository Services for Research Data Curation” Presentation Slides](https://reader034.vdocuments.site/reader034/viewer/2022051818/54945127b47959604d8b4ac4/html5/thumbnails/16.jpg)
DAMS Data Model: Consistency
• But enough consistency that searching and display rules do not need to be customized for each individual collection of material
  – Rules can be applied at the level of the broader concept
• As well as establishing the organizational structure necessary for maintaining consistency over time
  – Evaluation and approval of modifications
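The interplay between flexible subtypes and consistent display rules can be sketched as follows. This is an illustration only: the `Note` class, its `type` and `display_label` fields, the `render` function, and the sample values are hypothetical and are not taken from the DAMS ontology or code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Note:
    """A generic note field; subtypes are expressed as data, not as new classes."""
    value: str
    type: Optional[str] = None           # hypothetical subtype, e.g. "methods"
    display_label: Optional[str] = None  # hypothetical per-record label override


def render(note: Note) -> str:
    """One display rule for every Note, regardless of the collection it came from."""
    label = note.display_label or (note.type.title() if note.type else "Note")
    return f"{label}: {note.value}"


# Made-up sample values, for illustration only.
notes = [
    Note("Sample text A", type="methods"),
    Note("Sample text B", type="provenance", display_label="Collection history"),
    Note("Sample text C"),
]
for n in notes:
    print(render(n))
```

A new subtype then needs only a new `type` value; the rule attached to the broader concept (`Note`) keeps working unchanged.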
![Page 17: 10-15-13 “Metadata and Repository Services for Research Data Curation” Presentation Slides](https://reader034.vdocuments.site/reader034/viewer/2022051818/54945127b47959604d8b4ac4/html5/thumbnails/17.jpg)
DAMS Data Model: Relationships
• It allows us to create a number of different relationships
  – Collections and sub-collections
  – Collections and objects
  – Objects and components (complex hierarchical objects)
  – Other related resources internal or external to the DAMS
[Figure: complex object example]
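A complex hierarchical object can be pictured as a tree of components, each of which may carry files and further components. The sketch below assumes a hypothetical `Component` class and invented file names; it is not the DAMS data model itself.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Component:
    """A node in a complex object: it may hold files and nested components."""
    title: str
    files: List[str] = field(default_factory=list)
    components: List["Component"] = field(default_factory=list)


def walk(node: Component, depth: int = 0) -> Iterator[Tuple[int, Component]]:
    """Yield (depth, component) pairs in depth-first order."""
    yield depth, node
    for child in node.components:
        yield from walk(child, depth + 1)


# Hypothetical example: a brain object made of slices, one with a sub-component.
brain = Component("Brain", components=[
    Component("Slice 1", files=["slice-0001.tif"]),
    Component("Slice 2", files=["slice-0002.tif"], components=[
        Component("Region of interest", files=["roi.tif"]),
    ]),
])

outline = [("  " * depth) + comp.title for depth, comp in walk(brain)]
file_count = sum(len(comp.files) for _, comp in walk(brain))
print("\n".join(outline))
```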
![Page 18: 10-15-13 “Metadata and Repository Services for Research Data Curation” Presentation Slides](https://reader034.vdocuments.site/reader034/viewer/2022051818/54945127b47959604d8b4ac4/html5/thumbnails/18.jpg)
DAMS Data Model: Vocabularies
• Allow management of local & community vocabularies
  – Vocabulary terms as entities
  – Ability to encode authority data (vocabulary source, value URI, etc.) as well as sameAs relationships between the same term expressed in multiple sources
  – Ability to update authority records as community vocabularies become more formalized
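Treating vocabulary terms as entities means each term is a record of its own, with authority data and sameAs links, and that record can be updated later without touching the objects that point to it. The following is a minimal sketch under that reading; the `Term` class, the helper functions, the identifiers, and the `example.org` URIs are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Term:
    """A vocabulary term stored as its own entity (hypothetical structure)."""
    term_id: str
    label: str
    source: Optional[str] = None      # vocabulary source
    value_uri: Optional[str] = None   # URI of the term in that source
    same_as: Set[str] = field(default_factory=set)  # ids of equivalent terms


terms: Dict[str, Term] = {}


def add_term(term: Term) -> None:
    terms[term.term_id] = term


def link_same_as(a: str, b: str) -> None:
    """Record a symmetric sameAs relationship between two terms."""
    terms[a].same_as.add(b)
    terms[b].same_as.add(a)


def equivalents(term_id: str) -> Set[str]:
    """Follow sameAs links transitively and return all equivalent term ids."""
    seen: Set[str] = set()
    stack = [term_id]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(terms[current].same_as - seen)
    return seen - {term_id}


add_term(Term("t1", "Foraminifera", source="local"))
add_term(Term("t2", "Foraminifera", source="vocab-a",
              value_uri="http://example.org/vocab-a/123"))
add_term(Term("t3", "Forams", source="vocab-b",
              value_uri="http://example.org/vocab-b/xyz"))
link_same_as("t1", "t2")
link_same_as("t2", "t3")

# Later, the local term is formalized in a community vocabulary:
# only the authority record changes, not the objects that reference "t1".
terms["t1"].source = "vocab-c"
terms["t1"].value_uri = "http://example.org/vocab-c/456"

print(sorted(equivalents("t1")))
```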
![Page 19: 10-15-13 “Metadata and Repository Services for Research Data Curation” Presentation Slides](https://reader034.vdocuments.site/reader034/viewer/2022051818/54945127b47959604d8b4ac4/html5/thumbnails/19.jpg)
DAMS Data Model: Management
• One that supports improved management of objects and metadata
  – Authority management of vocabulary terms
  – Event metadata!
![Page 20: 10-15-13 “Metadata and Repository Services for Research Data Curation” Presentation Slides](https://reader034.vdocuments.site/reader034/viewer/2022051818/54945127b47959604d8b4ac4/html5/thumbnails/20.jpg)
DAMS Architecture
![Page 21: 10-15-13 “Metadata and Repository Services for Research Data Curation” Presentation Slides](https://reader034.vdocuments.site/reader034/viewer/2022051818/54945127b47959604d8b4ac4/html5/thumbnails/21.jpg)
Preservation: Chronopolis
Current DAMS Process
1. Create BagIt bags for all objects
2. Host via HTTP(S)
3. Bags are retrieved and ingested into Chronopolis
DAMS4 Process
1. Create BagIt bags for Δ objects using Event metadata
2. Host via HTTP(S) or enqueue on messaging queue for ingestion
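The DAMS4 idea of bagging only changed (Δ) objects can be sketched as: pick the objects that have an event newer than the last export, then write a bag for each. The code below is an assumption-laden illustration using only the Python standard library; the event tuples, object ids, and payload contents are hypothetical, and the bag is a minimal BagIt layout (`bagit.txt`, a `data/` payload, and a SHA-256 manifest) rather than the project's actual tooling.

```python
import hashlib
import tempfile
from datetime import datetime
from pathlib import Path
from typing import Dict, Iterable, List, Tuple

# Hypothetical event record: (object id, event type, timestamp)
Event = Tuple[str, str, datetime]


def changed_since(events: Iterable[Event], last_export: datetime) -> List[str]:
    """Return the ids of objects with any event after the last export, sorted."""
    return sorted({obj for obj, _kind, when in events if when > last_export})


def write_bag(bag_dir: Path, payload: Dict[str, bytes]) -> None:
    """Write a minimal BagIt bag: declaration, data/ payload, SHA-256 manifest."""
    data_dir = bag_dir / "data"
    data_dir.mkdir(parents=True)
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 0.97\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    manifest_lines = []
    for name, content in sorted(payload.items()):
        (data_dir / name).write_bytes(content)
        digest = hashlib.sha256(content).hexdigest()
        manifest_lines.append(f"{digest}  data/{name}\n")
    (bag_dir / "manifest-sha256.txt").write_text(
        "".join(manifest_lines), encoding="utf-8"
    )


# Made-up events, for illustration only.
events = [
    ("obj-1", "ingest", datetime(2013, 9, 1)),
    ("obj-2", "ingest", datetime(2013, 9, 1)),
    ("obj-2", "metadata update", datetime(2013, 10, 10)),
    ("obj-3", "ingest", datetime(2013, 10, 12)),
]
delta = changed_since(events, last_export=datetime(2013, 10, 1))

bag_root = Path(tempfile.mkdtemp())
for obj_id in delta:
    write_bag(bag_root / obj_id, {"metadata.rdf": f"<!-- {obj_id} -->".encode()})
print(delta)
```

Only `obj-2` and `obj-3` are bagged here; `obj-1` has no event after the last export and is skipped.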
![Page 22: 10-15-13 “Metadata and Repository Services for Research Data Curation” Presentation Slides](https://reader034.vdocuments.site/reader034/viewer/2022051818/54945127b47959604d8b4ac4/html5/thumbnails/22.jpg)
Storage
![Page 23: 10-15-13 “Metadata and Repository Services for Research Data Curation” Presentation Slides](https://reader034.vdocuments.site/reader034/viewer/2022051818/54945127b47959604d8b4ac4/html5/thumbnails/23.jpg)
Storage: EMC Isilon 72NL
Storage For Library Collections
1 cluster of 5 Nodes
1 Node = 36 x 2TB Drives
Total Current Usable Storage of 320TB
OneFS 7.0.2.1
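As a quick check on these figures, the raw capacity implied by the node and drive counts can be compared with the stated usable capacity. The slide does not say what accounts for the difference (data protection overhead and unit conventions are plausible factors), so the sketch below only does the arithmetic.

```python
nodes = 5
drives_per_node = 36
tb_per_drive = 2
usable_tb = 320  # as stated on the slide

raw_tb = nodes * drives_per_node * tb_per_drive   # total raw capacity
overhead_tb = raw_tb - usable_tb                  # raw capacity not available as usable
usable_pct = round(100 * usable_tb / raw_tb, 1)   # usable share of raw capacity

print(raw_tb, overhead_tb, usable_pct)
```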
![Page 24: 10-15-13 “Metadata and Repository Services for Research Data Curation” Presentation Slides](https://reader034.vdocuments.site/reader034/viewer/2022051818/54945127b47959604d8b4ac4/html5/thumbnails/24.jpg)
Storage: OpenStack
Storage For Research Data Collections
Testing:
• Performance versus Local Storage
• Large Files (up to 1TB)
  – Segmenting files > 5GB
  – Lexical order bug fix: 1,10,2 -> 0001,0002,…0010
• Rackspace CloudFiles API vs. OpenStack REST API
Testing Notes: https://libraries.ucsd.edu/blogs/dams/openstack-testing-notes/
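The lexical-order issue arises when segment names are sorted as strings: "10" sorts before "2". Zero-padding the segment index makes string order match numeric order. The sketch below shows the effect and a simple segment plan; the function name `plan_segments`, the 5 GiB segment size, and the four-digit width are assumptions for illustration, not the DAMS implementation.

```python
from typing import List, Tuple

SEGMENT_BYTES = 5 * 1024**3  # assumed segment size of 5 GiB


def plan_segments(total_bytes: int, segment_bytes: int = SEGMENT_BYTES,
                  width: int = 4) -> List[Tuple[str, int, int]]:
    """Return (zero-padded segment name, offset, length) for each segment."""
    plan = []
    offset = 0
    index = 1
    while offset < total_bytes:
        length = min(segment_bytes, total_bytes - offset)
        plan.append((f"{index:0{width}d}", offset, length))
        offset += length
        index += 1
    return plan


# Unpadded names sort lexically in the wrong order; padded names do not.
unpadded = sorted(str(i) for i in (1, 2, 10))
padded = sorted(f"{i:04d}" for i in (1, 2, 10))
print(unpadded)
print(padded)

# A hypothetical 12 GiB file splits into two full segments and a remainder.
plan = plan_segments(12 * 1024**3)
print([name for name, _offset, _length in plan])
```

With a fixed width of four digits, string order and numeric order agree for up to 9,999 segments.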
![Page 25: 10-15-13 “Metadata and Repository Services for Research Data Curation” Presentation Slides](https://reader034.vdocuments.site/reader034/viewer/2022051818/54945127b47959604d8b4ac4/html5/thumbnails/25.jpg)
DAMS Repository
![Page 26: 10-15-13 “Metadata and Repository Services for Research Data Curation” Presentation Slides](https://reader034.vdocuments.site/reader034/viewer/2022051818/54945127b47959604d8b4ac4/html5/thumbnails/26.jpg)
DAMS Repository
Core Repository Application: Create, Read, Update, Delete (CRUD)
Uses: Jena, ActiveMQ, JHOVE, Apache Tika, FFmpeg, ImageMagick
Manages:
• Metadata Triplestore
• Storage
• Solr
![Page 27: 10-15-13 “Metadata and Repository Services for Research Data Curation” Presentation Slides](https://reader034.vdocuments.site/reader034/viewer/2022051818/54945127b47959604d8b4ac4/html5/thumbnails/27.jpg)
DAMS Repository: Metadata Triplestore
![Page 28: 10-15-13 “Metadata and Repository Services for Research Data Curation” Presentation Slides](https://reader034.vdocuments.site/reader034/viewer/2022051818/54945127b47959604d8b4ac4/html5/thumbnails/28.jpg)
DAMS Repository: Metadata Triplestore
Triplestore was: AllegroGraph
Triplestore is: PostgreSQL DB + Jena
• Schema: (ID), Parent, Subject, Predicate, Object
Jena Usage:
• Core/RDF API – Parsing, loading, updating, serializing RDF
• ARQ API – SPARQL queries
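A relational table with the columns listed above might look like the sketch below. It uses an in-memory SQLite database as a stand-in for PostgreSQL so that it runs anywhere, and it assumes that `parent` holds the id of the top-level object a triple belongs to, so that all triples for an object (including nested nodes) can be fetched in one query. That reading of `parent`, the table name, and the sample triples are assumptions, not details from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the PostgreSQL database
conn.execute("""
    CREATE TABLE triples (
        id        INTEGER PRIMARY KEY,
        parent    TEXT NOT NULL,   -- assumed: id of the owning top-level object
        subject   TEXT NOT NULL,
        predicate TEXT NOT NULL,
        object    TEXT NOT NULL
    )
""")

# Hypothetical triples; "_:t1" is a nested (blank) node owned by obj1.
rows = [
    ("obj1", "obj1", "rdf:type", "ex:Object"),
    ("obj1", "obj1", "ex:title", "_:t1"),
    ("obj1", "_:t1", "ex:value", '"Example title"'),
    ("obj2", "obj2", "rdf:type", "ex:Object"),
]
conn.executemany(
    "INSERT INTO triples (parent, subject, predicate, object) VALUES (?, ?, ?, ?)",
    rows,
)


def triples_for(parent: str):
    """Fetch every triple belonging to one top-level object, in insertion order."""
    cur = conn.execute(
        "SELECT subject, predicate, object FROM triples "
        "WHERE parent = ? ORDER BY id",
        (parent,),
    )
    return cur.fetchall()


obj1 = triples_for("obj1")
for triple in obj1:
    print(triple)
```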
![Page 29: 10-15-13 “Metadata and Repository Services for Research Data Curation” Presentation Slides](https://reader034.vdocuments.site/reader034/viewer/2022051818/54945127b47959604d8b4ac4/html5/thumbnails/29.jpg)
DAMS Repository: REST API
![Page 30: 10-15-13 “Metadata and Repository Services for Research Data Curation” Presentation Slides](https://reader034.vdocuments.site/reader034/viewer/2022051818/54945127b47959604d8b4ac4/html5/thumbnails/30.jpg)
Hydra Framework
Source: https://wiki.duraspace.org/display/hydra/Technical+Framework+and+its+Parts
![Page 31: 10-15-13 “Metadata and Repository Services for Research Data Curation” Presentation Slides](https://reader034.vdocuments.site/reader034/viewer/2022051818/54945127b47959604d8b4ac4/html5/thumbnails/31.jpg)
DAMS Repository: Fedora API-ish
![Page 32: 10-15-13 “Metadata and Repository Services for Research Data Curation” Presentation Slides](https://reader034.vdocuments.site/reader034/viewer/2022051818/54945127b47959604d8b4ac4/html5/thumbnails/32.jpg)
Fedora API – Next PID
![Page 33: 10-15-13 “Metadata and Repository Services for Research Data Curation” Presentation Slides](https://reader034.vdocuments.site/reader034/viewer/2022051818/54945127b47959604d8b4ac4/html5/thumbnails/33.jpg)
Fedora API – Next PID
![Page 34: 10-15-13 “Metadata and Repository Services for Research Data Curation” Presentation Slides](https://reader034.vdocuments.site/reader034/viewer/2022051818/54945127b47959604d8b4ac4/html5/thumbnails/34.jpg)
DAMS Manager
![Page 35: 10-15-13 “Metadata and Repository Services for Research Data Curation” Presentation Slides](https://reader034.vdocuments.site/reader034/viewer/2022051818/54945127b47959604d8b4ac4/html5/thumbnails/35.jpg)
DAMS Manager
Java application using Spring MVC framework
• Collection Management
  – Metadata Ingest and Export
  – File Ingest
  – Derivative Generation
  – Solr indexing by Collection
• Administrative Reporting and Statistics
![Page 36: 10-15-13 “Metadata and Repository Services for Research Data Curation” Presentation Slides](https://reader034.vdocuments.site/reader034/viewer/2022051818/54945127b47959604d8b4ac4/html5/thumbnails/36.jpg)
DAMS Hydra Head
![Page 37: 10-15-13 “Metadata and Repository Services for Research Data Curation” Presentation Slides](https://reader034.vdocuments.site/reader034/viewer/2022051818/54945127b47959604d8b4ac4/html5/thumbnails/37.jpg)
DAMS Hydra Head
![Page 38: 10-15-13 “Metadata and Repository Services for Research Data Curation” Presentation Slides](https://reader034.vdocuments.site/reader034/viewer/2022051818/54945127b47959604d8b4ac4/html5/thumbnails/38.jpg)
DAMS Hydra Head: Blacklight
![Page 39: 10-15-13 “Metadata and Repository Services for Research Data Curation” Presentation Slides](https://reader034.vdocuments.site/reader034/viewer/2022051818/54945127b47959604d8b4ac4/html5/thumbnails/39.jpg)
RDF in Hydra
![Page 40: 10-15-13 “Metadata and Repository Services for Research Data Curation” Presentation Slides](https://reader034.vdocuments.site/reader034/viewer/2022051818/54945127b47959604d8b4ac4/html5/thumbnails/40.jpg)
RDF in Hydra: (Read) Nested Attributes
![Page 41: 10-15-13 “Metadata and Repository Services for Research Data Curation” Presentation Slides](https://reader034.vdocuments.site/reader034/viewer/2022051818/54945127b47959604d8b4ac4/html5/thumbnails/41.jpg)
RDF in Hydra: (Create) Nested Attributes
![Page 42: 10-15-13 “Metadata and Repository Services for Research Data Curation” Presentation Slides](https://reader034.vdocuments.site/reader034/viewer/2022051818/54945127b47959604d8b4ac4/html5/thumbnails/42.jpg)
DAMS Hydra Head: Complex Objects
![Page 43: 10-15-13 “Metadata and Repository Services for Research Data Curation” Presentation Slides](https://reader034.vdocuments.site/reader034/viewer/2022051818/54945127b47959604d8b4ac4/html5/thumbnails/43.jpg)
Next Steps
Beta Release: Late October
Production Release: January
Future:
• Sufia/Curate Integration for administrative functionality
• Additional Linked Data Integration and Crosswalks
  – Schema.org, OpenURL, Dublin Core, ResourceSync
• Fedora 4
![Page 44: 10-15-13 “Metadata and Repository Services for Research Data Curation” Presentation Slides](https://reader034.vdocuments.site/reader034/viewer/2022051818/54945127b47959604d8b4ac4/html5/thumbnails/44.jpg)
More Information
DAMS Overview: https://github.com/ucsdlib/dams/wiki/DAMS-Manual
DAMS Hydra Head: https://github.com/ucsdlib/damspas
DAMS Ontology: https://github.com/ucsdlib/dams/tree/master/ontology
DAMS REST API: https://github.com/ucsdlib/dams/wiki/REST-API
Hot Topics Series 3: Get a Head on the Repository with Hydra: http://duraspace.org/hot-topics
Hydra Technical Overview: https://wiki.duraspace.org/display/hydra/Technical+Framework+and+its+Parts
OneFS Technical Overview: http://www.emc.com/collateral/hardware/white-papers/h10719-isilon-onefs-technical-overview-wp.pdf
Isilon Overview: http://www.emc.com/collateral/software/data-sheet/h10541-ds-isilon-platform.pdf
![Page 45: 10-15-13 “Metadata and Repository Services for Research Data Curation” Presentation Slides](https://reader034.vdocuments.site/reader034/viewer/2022051818/54945127b47959604d8b4ac4/html5/thumbnails/45.jpg)
Coming Up Next
Final Webinar (October 31)
The researcher perspective from two of our pilot participants
• Dick Norris, Professor, Scripps Institution of Oceanography
• Rick Wagner, Data Scientist at San Diego Supercomputer Center
![Page 46: 10-15-13 “Metadata and Repository Services for Research Data Curation” Presentation Slides](https://reader034.vdocuments.site/reader034/viewer/2022051818/54945127b47959604d8b4ac4/html5/thumbnails/46.jpg)
Questions?
Thanks!
Declan Fleming @declan | [email protected]
Arwen Hutt @arwenh | [email protected]
Matt Critchlow @mattcritchlow | [email protected]