Converting an Existing Taxonomic Data Resource to Employ an Ontology and LSIDs
Jessie Kennedy, Rob Gales, Robert Kukla
Introduction
Data sharing is fundamental to biodiversity and taxonomic data applications.
Previous attempts to facilitate sharing have had limited success:
  lack of take-up of data exchange standards (now slowly happening due to the TDWG standards initiative)
  the absence of a common terminology or vocabulary for use in taxonomic data
  the lack of reference database systems for serving authoritative data
Proposed new technologies:
  a Core Ontology for taxonomic data to model the biodiversity domain
  adoption of Life Science Identifiers (LSIDs) by the TDWG GUID group for uniquely identifying taxonomic data objects, e.g. specimens, names, concepts, etc.
  LSIDs can make use of an ontology to define the data to be returned
Need a mechanism for migrating existing data to the new technologies:
  explore the issues in using LSIDs and RDF according to an ontology
Re-using LSIDs
Using LSIDs per se will not address the issue of data sharing.
Repositories must re-use LSIDs to cross-reference data within and outwith their own repository.
  It is important that we use the same LSID to refer to the same entity.
  If multiple LSIDs exist for the same entity, we would be required to decide whether or not two LSIDs were really the same thing.
  We would be in a similar situation to the one we are in today, for example, trying to decide whether two taxonomic names are really the same.
Generating LSIDs for any self-contained data set is a fairly trivial task.
Appointing LSIDs to existing data from an authoritative repository, in order to re-use them, is more challenging.
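To make the point about re-use concrete, the following is a minimal sketch of parsing LSIDs of the general form urn:lsid:authority:namespace:object[:revision] and checking whether two references use the same identifier. The authority names (example.org, other.example) and the helper names are hypothetical, and treating the urn:lsid prefix and the authority as case-insensitive is an assumption of this sketch.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(text: str) -> LSID:
    """Split an LSID of the form urn:lsid:authority:namespace:object[:revision]."""
    parts = text.strip().split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"not an LSID: {text!r}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    authority, namespace, object_id = parts[2], parts[3], parts[4]
    if not (authority and namespace and object_id):
        raise ValueError(f"LSID has an empty component: {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    # Assumption: the authority is compared case-insensitively,
    # while namespace, object and revision are compared exactly.
    return LSID(authority.lower(), namespace, object_id, revision)


def same_identifier(a: str, b: str) -> bool:
    """True only when both strings are the same LSID after normalisation."""
    return parse_lsid(a) == parse_lsid(b)


# Two repositories that re-use one LSID can be joined mechanically.
print(same_identifier("URN:LSID:Example.org:names:12345",
                      "urn:lsid:example.org:names:12345"))
# Two different LSIDs need a human (or a linking tool) to decide.
print(same_identifier("urn:lsid:example.org:names:12345",
                      "urn:lsid:other.example:names:12345"))
```

The second comparison is the situation the slide warns about: the identifiers differ, so nothing in the identifiers themselves says whether the two names are the same entity.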
Project Overview
Imagining the future
  Assume we have authority providers for certain data: publications, names, etc., e.g. IPNI, ZooBank, IF, Pubbank…
Want to convert an existing data repository
  A relational database: the Hexacorallians of the World
  Represent existing data as RDF triples
  Use LSIDs to uniquely identify entities in the data, according to a domain ontology which extends the TDWG core ontology
  Use LSIDs to cross-reference between the data in the repository
    Some LSIDs re-used from external sources
    Some LSIDs generated locally (owned data)
Development of a tool to
  aid the process of converting internal database keys to LSIDs
  aid users in appointing the appropriate LSID from some external LSID authority
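The key-to-LSID conversion outlined above can be sketched as follows: each internal database key receives either an LSID re-used from an external authority, where one has already been appointed, or a locally generated one for owned data. The authority name hexacorals.example.org, the external LSID shown and the function names are all hypothetical; this is an illustration of the idea, not the project's tool.

```python
from typing import Dict, Iterable

# Hypothetical authority name for locally owned data.
LOCAL_AUTHORITY = "hexacorals.example.org"


def local_lsid(namespace: str, key: int) -> str:
    """Generate an LSID for a record the repository owns."""
    return f"urn:lsid:{LOCAL_AUTHORITY}:{namespace}:{key}"


def assign_lsids(keys: Iterable[int], namespace: str,
                 external: Dict[int, str]) -> Dict[int, str]:
    """Map each internal key to an LSID, preferring a re-used external one."""
    return {key: external.get(key, local_lsid(namespace, key)) for key in keys}


# Publication 2 has already been matched to an (invented) external authority LSID.
matched = {2: "urn:lsid:authority.example.org:publications:987"}
for key, lsid in assign_lsids([1, 2, 3], "publications", matched).items():
    print(key, lsid)
```

Keeping the mapping in one place means foreign keys elsewhere in the database can be rewritten consistently once a record's LSID is decided.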
Creating Domain Ontology
Draft Core Ontology
  Core and BDI ontology
  Classes and optional relationships between classes
Extend to Domain Ontology
  Domain classes inherit from the core classes
  Extended with additional classes
    Re-use existing ontologies where possible
  Specify additional literal properties
    Where necessary
Straightforward for developer
  For Hexacorallia data
Creating RDF triples
Manual mapping of relational data to RDF triples according to the OWL specification
Used WASABI mapping extensions and custom code for generation
[Diagram: the Hexacorallian Database is mapped (Map + AutoLSID) into five triple stores: Specimen, Publication, Concept, Name and Person]
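As an illustration of the relational-to-RDF mapping on the Creating RDF triples slide, the sketch below turns one database row into N-Triples lines, with the record's LSID as the subject. It does not use the WASABI mapping extensions; the ontology namespace URI, the class and property names, and the sample row are all assumptions made for the example.

```python
from typing import Dict, List, Optional

# Hypothetical namespace for the domain ontology.
ONT = "http://example.org/hexacorallia-ontology#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def escape_literal(value: str) -> str:
    """Escape a string for use inside an N-Triples literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def row_to_triples(lsid: str, rdf_class: str,
                   row: Dict[str, Optional[object]],
                   column_to_property: Dict[str, str]) -> List[str]:
    """Emit one rdf:type triple plus one literal triple per non-null column."""
    triples = [f"<{lsid}> <{RDF_TYPE}> <{ONT}{rdf_class}> ."]
    for column, prop in column_to_property.items():
        value = row.get(column)
        if value is None:
            continue  # NULL columns produce no triple
        literal = escape_literal(str(value))
        triples.append(f'<{lsid}> <{ONT}{prop}> "{literal}" .')
    return triples


# Invented sample row from a person table.
row = {"surname": "Smith", "forename": "Ann", "died": None}
mapping = {"surname": "surname", "forename": "forename", "died": "yearOfDeath"}
for line in row_to_triples("urn:lsid:hexacorals.example.org:persons:7",
                           "Person", row, mapping):
    print(line)
```

Foreign-key columns would be handled differently from literals: the object of the triple would be the LSID of the referenced record rather than a quoted string.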
Simulated Authority Data Providers
e.g. IPNI/ZooBank, Pubbank, museum specimens
Test data set
Simulate authority providers
  Generate LSIDs and RDF instances according to the classes in the ontology appropriate to each “authority”
[Diagram: Specimen, Publication, Concept, Name and Person triple stores; the Hexacorallia Thematic Provider with the Hexacorallia Thematic Triple Store and an Observation Triple Store (LSID observation subset); label: Map to ontology]
Convert Existing Thematic Data Provider to use existing LSIDs and ontology
[Diagram: data in the thematic provider is matched to LSIDs (Match + -> LSID) via the (simulated) authority LSID resolution services; label: LSID match with linking tool]
Convert Existing Provider
Original data repository
RDF data to be updated with LSIDs from “authority” providers
Linker tool
Linking…
[Diagram: an authoritative (“source”) provider and linker, and a local (“target”) provider, each behind a WASABI Service Request Dispatcher exposing LSID, SPARQL and OAI services, one dispatcher also exposing a Linker service; a Linker Client; the Hexacorallia Thematic Triple Store and the Person Triple Store]
Configure provider for update
  Select class to be linked
  Name the local repository
Linking…
[Diagram: same provider and linker architecture as on the previous slide]
Configure the linker
  Select class to link on
  Name authority provider with linking service
Linking…
[Diagram: same provider and linker architecture as on the previous slide]
Request annotations
WASABI Linking Service
[Diagram: components of the linking service: Data Cache, Status Manager, RDF Handler, Linking Pipeline, Metadata Handler, Poll Handler, Linker Bootstrap and Ontology Cache; messages exchanged: RDF model, RDF status + polling key, polling key, RDF metadata]
Linking Service…
Communication between linking service and linking client
[Diagram: an application or service exchanges RDF with the linking service over the Internet; the Linker Bootstrap examines the OWL/RDFS ontology and caches it in the Ontology Cache]
Linking Service
[Diagram: Linking Pipeline taking RDF as input and producing linking suggestions; stages: Linkable Property Determiner, Weighting Filter (against application’s dataset), Post-weighting Filter]
Determines properties for matching
Weights possible matches
Returns suggestions to the client
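The three pipeline steps listed above (determine properties for matching, weight possible matches, return suggestions) can be sketched roughly as follows. The slides do not describe the actual weighting used by the linking service, so the string similarity from Python's difflib is a stand-in, and the record layout, the min_weight cut-off and the function names are assumptions of this sketch.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple

Record = Dict[str, str]


def linkable_properties(local: Record, candidate: Record) -> List[str]:
    """Properties that have a value in both records and can be compared."""
    return [p for p in local
            if p != "lsid" and local.get(p) and candidate.get(p)]


def weight(local: Record, candidate: Record) -> float:
    """Average string similarity over the linkable properties (0.0 to 1.0)."""
    props = linkable_properties(local, candidate)
    if not props:
        return 0.0
    scores = [SequenceMatcher(None, local[p].lower(),
                              candidate[p].lower()).ratio()
              for p in props]
    return sum(scores) / len(scores)


def suggest(local: Record, authority_records: List[Record],
            min_weight: float = 0.6, limit: int = 5) -> List[Tuple[float, str]]:
    """Weight every candidate, drop weak ones, return the best few."""
    weighted = [(weight(local, c), c["lsid"]) for c in authority_records]
    kept = [pair for pair in weighted if pair[0] >= min_weight]  # post-weighting filter
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return kept[:limit]


# Invented authority records for illustration.
authority = [
    {"lsid": "urn:lsid:people.example.org:persons:1",
     "surname": "Kennedy", "forename": "Jessie"},
    {"lsid": "urn:lsid:people.example.org:persons:2",
     "surname": "Kukla", "forename": "Robert"},
]
print(suggest({"surname": "Kennedy", "forename": "Jessie"}, authority))
```

The suggestions are returned as (weight, LSID) pairs so that a client can show them to a user for confirmation rather than applying them blindly.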
Confirm/Skip Annotations
[Screenshot: person to find LSID for, with a suggested match]
Confirm/Skip Annotations
[Screenshot: person to find LSID for, with a choice of possible persons with LSIDs]
Research Questions
How effective is the draft ontology for representing existing data sources?
  Can suitable extensions be easily defined?
    Straightforward for developer
    Need independent verification…
What are the issues for an existing data provider in converting their data to use the ontology and LSIDs?
  Replace or annotate existing data
    If, for example, I replace an author with a person LSID, what I get when I resolve the person is unlikely to be what I had when I held the data for an author.
  Dependencies between LSID’able objects
    If you link via a taxon name LSID, the resolved name should have an embedded LSID for a publication, so there shouldn’t be any need (in principle) to match publications for names.
    What about authorities that issue LSIDs but don’t map to other authorities, e.g. name providers that do not map to either publication or specimen providers and don’t want to?
Research Questions…
What support would a linking tool need to provide end users?
  How would users want to process this data?
  How much automation?
    E.g. above a certain confidence level
    Would this be trusted?
  Order of matching
    E.g. match all instances of persons at once
    Match persons by publication?
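One way to picture the automation question is a triage step that accepts a suggestion automatically only above a confidence level and leaves everything else for the user. The sketch below assumes ranked (weight, LSID) suggestions per local record; the 0.95 threshold, the rule that a second strong candidate forces manual review, and all names and data are assumptions, not behaviour of the existing tool.

```python
from typing import Dict, List, Tuple

Ranked = List[Tuple[float, str]]  # (weight, LSID), best first


def triage(suggestions: Dict[str, Ranked], auto_threshold: float = 0.95):
    """Split records into auto-accepted links and those needing a user."""
    accepted: Dict[str, str] = {}
    review: Dict[str, Ranked] = {}
    for local_id, ranked in suggestions.items():
        confident = bool(ranked) and ranked[0][0] >= auto_threshold
        # A second candidate above the threshold makes the match ambiguous.
        ambiguous = len(ranked) > 1 and ranked[1][0] >= auto_threshold
        if confident and not ambiguous:
            accepted[local_id] = ranked[0][1]
        else:
            review[local_id] = ranked
    return accepted, review


# Invented suggestions for three local person records.
accepted, review = triage({
    "person/1": [(0.99, "urn:lsid:people.example.org:persons:1")],
    "person/2": [(0.97, "urn:lsid:people.example.org:persons:8"),
                 (0.96, "urn:lsid:people.example.org:persons:9")],
    "person/3": [(0.70, "urn:lsid:people.example.org:persons:4")],
})
print(sorted(accepted), sorted(review))
```

Whether users would trust links accepted this way is exactly the open question on the slide; the threshold only controls how much is left for manual confirmation.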
Other Issues…
Performance of existing linking tool approach
  Lots of data passing going on
  Need better batch or one-at-a-time processing
Finding authorities that provide linking services
  How do you find out about authorities with linking services?
  How do you know which ones to use?
Acknowledgements
TDWG / Gordon and Betty Moore Foundation