Eureka Research Workbench: Semantic Capture of the Scientific Process
Stuart J. Chalk
Department of Chemistry, University of North Florida
Jacksonville, FL
Liberating Laboratory Data – Day 2
Capturing Science Data
Data is a fundamental output of science, but…
Data is not useful if it does not have context
Big data analytics needs detailed, well-structured metadata and relationships to assemble aggregated datasets for useful interpretation
Existing options:
LabArchives http://www.labarchives.com
eCAT http://www.researchspace.com/electronic-lab-notebook/
LabTrove http://www.labtrove.org/
Dryad data publishing http://datadryad.org/
or …
Eureka Research Workbench
Started in 2006 as an offshoot of involvement in the Analytical Information Markup Language (AnIML) project
No way to store all research notes in a digital format
No way to capture the workflow of scientists
Realized that writing in a lab notebook is equivalent to “multi-type” blogging in the digital world
How to capture information? Many datatypes -> ExptML
How to store files and make them available through a web interface? Fedora-Commons
How to link data together? RDF (in Fedora-Commons)
Experiment Markup Language (ExptML)
A specification (written in XML) that describes the different types of information recorded during the scientific process (http://exptml.sourceforge.net)
Many datatypes (more will be added):
Annotation, Api, Calculation, Chemical, Citation, Communication, Customer, Data, Dataset, Definition, Element, Equipment, Event, Experiment, Group, Project, Protocol, Quote, Report, Result, Sample, Solution, Space, Specimen, Substance, Task, Template, Timeline, User, Vendor
ExptML Chemical Schema
ExptML Chemical Instance
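The ExptML chemical schema itself is not reproduced in this text. As a rough illustration of what a typed XML record looks like, and of extracting data back out of one, the Python sketch below builds and re-reads a chemical record with the standard-library ElementTree module. The namespace `urn:example:exptml` and the element names `chemical`, `name`, `formula` and `casrn` are hypothetical placeholders, not the published ExptML schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace and element names: NOT the published ExptML
# chemical schema, only an illustration of a typed XML record.
NS = "urn:example:exptml"
ET.register_namespace("ex", NS)


def make_chemical(chem_id: str, name: str, formula: str, cas: str) -> ET.Element:
    """Build a small XML record for one chemical (hypothetical layout)."""
    root = ET.Element(f"{{{NS}}}chemical", {"id": chem_id})
    for tag, value in (("name", name), ("formula", formula), ("casrn", cas)):
        child = ET.SubElement(root, f"{{{NS}}}{tag}")
        child.text = value
    return root


def extract_fields(xml_text: str) -> dict:
    """Read the child elements of a chemical record back into a dict."""
    root = ET.fromstring(xml_text)
    ns = {"ex": NS}
    return {
        "id": root.get("id"),
        "name": root.findtext("ex:name", namespaces=ns),
        "formula": root.findtext("ex:formula", namespaces=ns),
        "casrn": root.findtext("ex:casrn", namespaces=ns),
    }


record = make_chemical("chm1", "ethanol", "C2H6O", "64-17-5")
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
print(extract_fields(xml_text))
# {'id': 'chm1', 'name': 'ethanol', 'formula': 'C2H6O', 'casrn': '64-17-5'}
```

Because every field lives in a named element, the same lookup works on any record that follows the schema, which is what makes data extraction from stored files straightforward.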
Related Data - ExptML Ontology
In computer science, an ontology “formally represents knowledge as a set of concepts within a domain, and the relationships between those concepts. It can be used to model a domain and support reasoning about concepts.”*
In essence, an ontology lets us define relationships between concepts and make assertions about them
For substances represented in ExptML we define:
isSubstance (an assertion)
hasSubstance and isSubstanceOf (relationships)
*https://en.wikipedia.org/wiki/Ontology_(information_science)
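The split between assertions and relationships can be sketched with a tiny in-memory triple store in plain Python. Several details are assumptions made for illustration: isSubstanceOf is treated as the inverse of hasSubstance, the isSubstance assertion is encoded as a triple with the literal "true", and the identifiers exptml:sub1, exptml:sol1 and exptml:sol2 are hypothetical. In Eureka itself these statements would be stored as RDF in Fedora-Commons.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]

# Assumption: isSubstanceOf is the inverse of hasSubstance (inferred from
# the predicate names, not from a published ExptML ontology file).
INVERSES = {"hasSubstance": "isSubstanceOf", "isSubstanceOf": "hasSubstance"}


def infer_inverses(triples: Set[Triple]) -> Set[Triple]:
    """Return the input triples plus the inverse of every invertible one."""
    inferred = set(triples)
    for s, p, o in triples:
        if p in INVERSES:
            inferred.add((o, INVERSES[p], s))
    return inferred


def objects(triples: Set[Triple], subject: str, predicate: str) -> List[str]:
    """All objects linked to a subject by a predicate, sorted."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


# Hypothetical identifiers; encoding the assertion as "true" is an assumption.
asserted = {
    ("exptml:sub1", "isSubstance", "true"),
    ("exptml:sol1", "hasSubstance", "exptml:sub1"),
    ("exptml:sol2", "hasSubstance", "exptml:sub1"),
}
graph = infer_inverses(asserted)
print(objects(graph, "exptml:sub1", "isSubstanceOf"))
# ['exptml:sol1', 'exptml:sol2']
```

Declaring the inverse once means only one direction has to be recorded; queries can still follow the link from either end.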
ExptML Ontology
Fedora Commons
Digital repository software for creating and managing online digital libraries
Stores the ExptML files
Stores any other files (PDFs, images, Word documents, etc.)
Stores relationships as RDF
Version control
Checksumming
Built-in search of content and relationships
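Checksumming means recording a digest of each stored file or stream and recomputing it later to detect corruption. Below is a minimal sketch of that idea using Python's hashlib, assuming SHA-256 as the algorithm. It does not show how Fedora Commons configures or stores its own checksums.

```python
import hashlib


def checksum(content: bytes, algorithm: str = "sha256") -> str:
    """Hex digest of a stream's bytes, using a hashlib algorithm name."""
    return hashlib.new(algorithm, content).hexdigest()


def verify(content: bytes, expected: str, algorithm: str = "sha256") -> bool:
    """True if the stored bytes still match the digest recorded earlier."""
    return checksum(content, algorithm) == expected


stored = b"<chemical id='chm1'/>"
digest = checksum(stored)               # recorded when the stream is stored
print(verify(stored, digest))           # True: content unchanged
print(verify(stored + b" ", digest))    # False: content altered
```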
File Storage
Fedora-Commons treats each ExptML file as an object
In the definition of a Fedora object, the file is just one of many streams. By default, each object also has a “DC” (Dublin Core) metadata stream and a “RELS-EXT” relationships stream
Each Fedora object can have any number of additional streams for paper PDFs, product/sample pictures, original file formats (if a conversion has been done), video, audio, anything
You can export individual streams, or the whole Fedora object with its streams binary-encoded (for sharing/archiving)
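A RELS-EXT stream holds an object's relationships as RDF/XML. The sketch below writes one with Python's ElementTree. The rdf namespace is the W3C RDF namespace and isMemberOf comes from Fedora 3's relations-external namespace; the ExptML predicate namespace `urn:example:exptml-ontology#`, the use of hasSubstance as a RELS-EXT predicate, and the object PIDs are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Fedora 3 relations namespace; the ExptML predicate namespace is hypothetical.
REL = "info:fedora/fedora-system:def/relations-external#"
EXPTML = "urn:example:exptml-ontology#"

for prefix, uri in (("rdf", RDF), ("rel", REL), ("ex", EXPTML)):
    ET.register_namespace(prefix, uri)


def rels_ext(pid: str, relations: list[tuple[str, str, str]]) -> str:
    """Serialize (namespace, predicate, target PID) tuples as RDF/XML."""
    rdf = ET.Element(f"{{{RDF}}}RDF")
    desc = ET.SubElement(rdf, f"{{{RDF}}}Description",
                         {f"{{{RDF}}}about": f"info:fedora/{pid}"})
    for ns, predicate, target in relations:
        # Each relationship is an empty element pointing at another object.
        ET.SubElement(desc, f"{{{ns}}}{predicate}",
                      {f"{{{RDF}}}resource": f"info:fedora/{target}"})
    return ET.tostring(rdf, encoding="unicode")


xml_text = rels_ext("exptml:sol1", [
    (EXPTML, "hasSubstance", "exptml:sub1"),   # hypothetical predicate URI
    (REL, "isMemberOf", "exptml:project1"),    # hypothetical project PID
])
print(xml_text)
```

Keeping relationships in their own stream means they can be indexed and searched without parsing the ExptML file itself.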
Eureka Interface
So, finally, to the Eureka Research Workbench!
Web interface written in PHP using the CakePHP framework
Communicates with the Fedora-Commons API to create, retrieve, update and delete (CRUD) ExptML and other files
Representational State Transfer (REST) format for URLs, e.g. http://web.server/chemicals/view/exptml:chm1
Allows searching of all files in Fedora
Can also search based on relationships
Can extract data out of XML files
Can gather data from other websites (via an API controller) and add it to ExptML files
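The REST-style URL above follows the controller/action/id pattern of CakePHP routing. Here is a hypothetical Python sketch of how such a URL might be split and mapped onto a CRUD request against the repository. It assumes a Fedora 3.x-style REST API at `http://localhost:8080/fedora`, CakePHP's conventional `add`/`view`/`edit`/`delete` action names, and a datastream ID of `EXPTML`; none of these details are taken from Eureka's source.

```python
from urllib.parse import urlparse

# Assumption: a Fedora 3.x-style REST API at this base URL.
FEDORA = "http://localhost:8080/fedora"


def route(url: str) -> tuple[str, str, str]:
    """Split a Eureka-style URL into (controller, action, object id)."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    controller, action, pid = parts[:3]
    return controller, action, pid


# CRUD action -> (HTTP method, Fedora REST path template); hypothetical mapping.
CRUD = {
    "add": ("POST", "/objects/{pid}"),                             # create
    "view": ("GET", "/objects/{pid}/datastreams/{ds}/content"),    # retrieve
    "edit": ("PUT", "/objects/{pid}/datastreams/{ds}"),            # update
    "delete": ("DELETE", "/objects/{pid}"),                        # delete
}


def fedora_request(url: str, datastream: str = "EXPTML") -> tuple[str, str]:
    """Translate a Eureka URL into the repository request it would trigger."""
    _, action, pid = route(url)
    method, template = CRUD[action]
    # str.format ignores the unused 'ds' argument for object-level paths.
    return method, FEDORA + template.format(pid=pid, ds=datastream)


print(route("http://web.server/chemicals/view/exptml:chm1"))
# ('chemicals', 'view', 'exptml:chm1')
print(fedora_request("http://web.server/chemicals/view/exptml:chm1"))
# ('GET', 'http://localhost:8080/fedora/objects/exptml:chm1/datastreams/EXPTML/content')
```

Keeping the mapping in one table means a new action needs only one new entry, and the object ID in the Eureka URL is passed straight through as the repository identifier.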
Eureka Website – Notebook
Figure: Typical things we record in our notebook
Conclusion
Eureka uses ExptML for representing science data
Reliable storage system for ExptML files (Fedora)
Method for storage of relationships (RDF in Fedora)
Web application to create ExptML files (Eureka)
To do:
Provide web functionality to process data
Provide a mechanism for (authenticated) sharing of data
Integration into the RDA model for sharing research data
Integrate with many other websites, e.g. ChemSpider
Support enlItemManifest and future RDA specifications
References
Eureka – http://sourceforge.net/projects/eureka
Fedora-Commons – http://fedora-commons.org
XML – http://www.w3.org/standards/xml
ExptML – http://exptml.sourceforge.net/
JSON – http://www.json.org/
UnitsML – http://unitsml.nist.gov/
RDF – http://www.w3.org/RDF/
CIR – http://cactus.nci.nih.gov/chemical/structure
RDA – http://rd-alliance.org