Converting an Existing Taxonomic Data Resource to Employ an Ontology and LSIDS Jessie Kennedy Rob Gales, Robert Kukla

Post on 18-Jan-2018

Page 1

Converting an Existing Taxonomic Data Resource to Employ an Ontology and LSIDs

Jessie Kennedy, Rob Gales, Robert Kukla

Page 2

Introduction

- Data sharing is fundamental to biodiversity and taxonomic data applications.
- Previous attempts to facilitate sharing have had limited success:
  - lack of take-up of data exchange standards (now slowly happening due to the TDWG standards initiative)
  - the absence of a common terminology or vocabulary for use in taxonomic data
  - the lack of reference database systems for serving authoritative data
- Proposed new technologies:
  - a Core Ontology for taxonomic data, to model the biodiversity domain
  - adoption of Life Science Identifiers (LSIDs) by the TDWG GUID group for uniquely identifying taxonomic data objects, e.g. specimens, names, concepts, etc.
  - LSIDs can make use of an ontology to define the data to be returned
- Need a mechanism for migrating existing data to the new technologies:
  - explore the issues in using LSIDs and RDF according to an ontology

Page 3

Re-using LSIDs

- Using LSIDs per se will not address the issue of data sharing.
- Repositories must reuse LSIDs to cross-reference data within and outwith their own repository.
- It is important that we use the same LSID to refer to the same entity.
  - If multiple LSIDs exist for the same entity, we would be required to decide whether or not two LSIDs were really the same thing.
  - We would be in a similar situation to the one we are in today: for example, trying to decide whether two taxonomic names are really the same.
- Generating LSIDs for any self-contained data set is a fairly trivial task.
- Appointing LSIDs to existing data from an authoritative repository, in order to re-use them, is more challenging.
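To make the point about "the same LSID for the same entity" concrete, here is a minimal sketch of parsing and comparing LSIDs. It assumes the usual `urn:lsid:authority:namespace:object[:revision]` layout and treats the `urn:lsid:` prefix and the authority as case-insensitive; the `example.org` authority and the helper names are hypothetical and not taken from the project.

```python
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(text: str) -> Lsid:
    """Split 'urn:lsid:authority:namespace:object[:revision]' into parts."""
    parts = text.strip().split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"not an LSID: {text!r}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    if not all(parts[2:5]):
        raise ValueError(f"empty LSID component: {text!r}")
    revision = parts[5] if len(parts) == 6 and parts[5] else None
    # Assumption: the authority is compared case-insensitively, while
    # namespace, object and revision are kept exactly as written.
    return Lsid(parts[2].lower(), parts[3], parts[4], revision)


def same_entity(a: str, b: str) -> bool:
    """True only when two strings denote the identical LSID."""
    return parse_lsid(a) == parse_lsid(b)
```

Note that this only detects two spellings of one identifier. Two different LSIDs minted for the same real-world entity still compare as different, which is exactly the situation the slide warns about.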

Page 4

Project Overview

- Imagining the future
  - Assume we have authority providers for certain data: publications, names, etc., e.g. IPNI, ZOObank, IF, Pubbank…
- Want to convert an existing data repository
  - Relational database: the Hexacorallians of the World
  - Represent existing data as RDF triples
  - Use LSIDs to uniquely identify entities in the data, according to a domain ontology which extends the TDWG core ontology
  - Use LSIDs to cross-reference between the data in the repository
    - Some LSIDs re-used from external sources
    - Some LSIDs generated locally (owned data)
- Development of a tool to
  - aid the process of converting internal database keys to LSIDs
  - aid users in appointing the appropriate LSID from some external LSID authority

Page 5

Creating Domain Ontology

- Draft Core Ontology
  - Core and BDI ontology
  - Classes and optional relationships between classes
- Extend to Domain Ontology
  - Domain classes inherit from the core classes
  - Extended with additional classes
    - Re-use existing ontologies where possible
  - Specify additional literal properties
    - Where necessary
  - Straightforward for developer
    - For Hexacorallia data
- Creating RDF triples
  - Manual mapping of relational data to RDF triples according to the OWL specification
  - Used WASABI mapping extensions and custom code for generation
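The manual mapping from relational rows to RDF triples might look roughly like the sketch below. It is not the WASABI mapping extension or the project's custom code: the namespaces, class name, column names and property names are all hypothetical, and triples are plain Python tuples rather than objects from an RDF library.

```python
# Hypothetical namespaces standing in for the core and domain ontologies.
CORE = "http://example.org/core#"
HEXA = "http://example.org/hexacorallia#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical mapping: column of a 'name' table -> ontology property.
NAME_MAPPING = {
    "name_string": CORE + "nameComplete",
    "rank": CORE + "rank",
    "year": HEXA + "yearPublished",
}


def row_to_triples(row, subject_lsid, rdf_class, mapping):
    """Turn one relational row (a dict) into (subject, predicate, object) triples."""
    triples = [(subject_lsid, RDF_TYPE, rdf_class)]
    for column, prop in mapping.items():
        value = row.get(column)
        if value is not None:  # NULL columns produce no triple
            triples.append((subject_lsid, prop, str(value)))
    return triples


# Illustrative row only; the values are not taken from the real database.
row = {"id": 7, "name_string": "Actinia equina", "rank": "species", "year": None}
triples = row_to_triples(
    row, "urn:lsid:example.org:names:7", CORE + "TaxonName", NAME_MAPPING
)
```

Keeping the column-to-property table separate from the conversion function mirrors the slide's split between a mapping that is written by hand and generation that is done by code.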

Page 6

Simulate Authority Providers

[Diagram labels: Hexacorallian Database; five "Map + Auto LSID" steps; Specimen, Publication, Concept, Name and Person triple stores; "Simulated Authority Data providers, e.g. IPNI/Zoobank, Pubbank, Museum_specimens"; "Test Data set"]

- Generate LSID and RDF instances according to the classes in the ontology appropriate to each "authority"
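The "Auto LSID" step for a simulated authority can be pictured as minting one LSID per internal database key. The sketch below is an assumption about how that might be done, with a hypothetical `example.org` authority and hypothetical function names; it is not the project's generator.

```python
def mint_lsid(authority, namespace, db_key):
    """Build an LSID string from an internal database key."""
    parts = [str(authority).lower(), str(namespace), str(db_key)]
    for part in parts:
        # ':' separates LSID components, so it cannot appear inside one.
        if not part or ":" in part or any(ch.isspace() for ch in part):
            raise ValueError(f"invalid LSID component: {part!r}")
    return "urn:lsid:" + ":".join(parts)


def build_key_map(authority, namespace, db_keys):
    """Map each internal key to its LSID so foreign keys can be rewritten."""
    return {key: mint_lsid(authority, namespace, key) for key in db_keys}


person_lsids = build_key_map("example.org", "person", [1, 2, 42])
```

Retaining the key-to-LSID map is what lets cross-references inside the repository (foreign keys) be replaced by LSIDs consistently.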

Page 7

Convert Existing Provider

[Diagram: "Convert Existing Thematic Data Provider to use existing LSIDs and ontology". Labels: Hexacorallia Thematic Provider; "Map to ontology"; Hexacorallia Thematic Triple Store; Observation Triple Store; "LSID Observation subset"; five "Match + -> LSID" steps; "LSID Match with linking tool"; Linker Tool; "Authority (simulated) LSID Resolution Services" with Specimen, Publication, Concept, Name and Person triple stores]

- Original data repository: RDF data to be updated with LSIDs from "authority" providers

Page 8

Linking…

[Diagram labels: authoritative ("source") provider and linker, with a WASABI Service Request Dispatcher offering LSID, SPARQL, Linker and OAI services; local ("target") provider, with a WASABI Service Request Dispatcher offering LSID, SPARQL and OAI services; Linker Client; Hexacorallia Thematic Triple Store; Person Triple Store]

Page 9

Configure Provider for Update

[Screenshot callouts:]

- Select class to be linked
- Name the local repository

Page 10

Linking…

[Diagram repeated from Page 8: authoritative ("source") provider and linker, local ("target") provider, Linker Client, Hexacorallia Thematic Triple Store, Person Triple Store]

Page 11

Configure the linker

[Screenshot callouts:]

- Select class to link on
- Name authority provider with linking service

Page 12

Linking…

[Diagram repeated from Page 8: authoritative ("source") provider and linker, local ("target") provider, Linker Client, Hexacorallia Thematic Triple Store, Person Triple Store]

Page 13

Request Annotations

Page 14

Linking Service…

[Diagram: communication between linking service and linking client. Labels: Wasabi; Linking Service; Data Cache; Status Manager; RDF Handler; Metadata Handler; Poll Handler; Linking Pipeline; Linker Bootstrap; Ontology Cache. Message labels: "RDF Model"; "RDF Status + Polling Key"; "Polling Key"; "RDF Metadata"]

Page 15

Linking Service

[Diagram labels: Application or Service; Internet; RDF; OWL/RDFS; Linker Bootstrap ("Examine and cache ontology"); Ontology Cache; Linking Pipeline with Linkable Property Determiner, Weighting, Post-weighting Filter and Filter (against application's dataset); RDF Linking Suggestions]

- Determines properties for matching
- Weights possible matches
- Returns suggestions to the client
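The three pipeline stages named on this slide (determine linkable properties, weight possible matches, filter and return suggestions) can be sketched as follows. The slides do not say how weights are computed, so the string-similarity scoring from Python's `difflib` is a stand-in assumption, as are the record layout, the threshold and the function names.

```python
from difflib import SequenceMatcher


def linkable_properties(local, candidate):
    """Properties that both records carry with a non-empty value."""
    return sorted(p for p in local if p in candidate and local[p] and candidate[p])


def weight(local, candidate):
    """Average string similarity over the linkable properties (0.0 to 1.0)."""
    props = linkable_properties(local, candidate)
    if not props:
        return 0.0
    scores = [
        SequenceMatcher(None, str(local[p]).lower(), str(candidate[p]).lower()).ratio()
        for p in props
    ]
    return sum(scores) / len(scores)


def suggest(local, authority_records, min_weight=0.6, limit=5):
    """Return up to `limit` (lsid, weight) pairs, best first, above min_weight."""
    scored = [(weight(local, rec), lsid) for lsid, rec in authority_records.items()]
    kept = [(w, lsid) for w, lsid in scored if w >= min_weight]  # post-weighting filter
    kept.sort(key=lambda item: (-item[0], item[1]))
    return [(lsid, round(w, 3)) for w, lsid in kept[:limit]]


# Made-up records for illustration only.
authority = {
    "urn:lsid:example.org:person:1": {"name": "A. Smith"},
    "urn:lsid:example.org:person:2": {"name": "Zzzz"},
}
suggestions = suggest({"name": "A. Smith", "email": None}, authority)
```

In the real service the set of linkable properties is presumably derived from the cached ontology rather than from whichever keys two records happen to share; the dictionary intersection here only approximates that step.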

Page 16

Confirm/Skip Annotations

[Screenshot callouts:]

- Person to find LSID for
- Suggested match

Page 17

Confirm/Skip Annotations

[Screenshot callouts:]

- Person to find LSID for
- Choice of possible persons with LSIDs

Page 18

Research Questions

- How effective is the draft ontology for representing existing data sources?
  - Can suitable extensions be easily defined?
    - Straightforward for developer
    - Need independent verification…
- What are the issues for an existing data provider in converting their data to use the ontology and LSIDs?
  - Replace or annotate existing data?
    - If, for example, I replace an author with a person LSID, what I get when I resolve the person is unlikely to be what I had when I held the data for an author.
  - Dependencies between LSID-able objects
    - If you link via a taxon name LSID, the resolved name should have an LSID for a publication embedded in it, so there shouldn't be any need (in principle) to match publications for names.
    - What about authorities that issue LSIDs but don't map to other authorities, e.g. name providers that do not map to either publication or specimen providers and don't want to?
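The dependency point can be illustrated with a small sketch: once a name is linked, any LSIDs embedded in its resolved record can be followed instead of being matched separately. Resolution is simulated with an in-memory dictionary; the records, property names and LSIDs are hypothetical, and the person LSID is deliberately left unresolvable to mimic an authority that does not map to others.

```python
# Simulated resolution results; a real client would call an LSID resolver.
FAKE_AUTHORITY = {
    "urn:lsid:example.org:names:7": {
        "nameComplete": "Actinia equina",
        "publishedIn": "urn:lsid:example.org:pubs:3",
    },
    "urn:lsid:example.org:pubs:3": {
        "title": "(illustrative title)",
        "author": "urn:lsid:example.org:person:9",
    },
}


def embedded_lsids(record):
    """LSID-valued properties found in one resolved record."""
    return [
        v for v in record.values()
        if isinstance(v, str) and v.lower().startswith("urn:lsid:")
    ]


def reachable_lsids(start, resolve):
    """All LSIDs obtainable by following embedded LSIDs from `start`."""
    seen = set()
    stack = [start]
    while stack:
        record = resolve(stack.pop())
        if record is None:  # authority does not resolve or map this LSID
            continue
        for linked in embedded_lsids(record):
            if linked != start and linked not in seen:
                seen.add(linked)
                stack.append(linked)
    return seen


found = reachable_lsids("urn:lsid:example.org:names:7", FAKE_AUTHORITY.get)
```

Here matching the name alone yields the publication and person LSIDs; anything not reachable this way is what would still need explicit matching.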

Page 19

Research Questions…

- What support would a linking tool need to provide end users?
  - How would users want to process this data?
  - How much automation?
    - E.g. above a certain confidence level
    - Would this be trusted?
  - Order of matching
    - E.g. match all instances of persons at once
    - Match persons by publication?
- Other issues…
  - Performance of the existing linking tool approach
    - Lots of data passing going on
    - Need better batch or one at a time
  - Finding authorities that provide linking services
    - How do you find out about authorities with linking services?
    - How do you know which ones to use?
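One way to picture the "how much automation?" question is a triage rule that accepts a match automatically only above a confidence level and otherwise leaves it for a person to confirm or skip. This is a sketch of one possible policy, not something the slides prescribe; the threshold, the margin over the runner-up and the function name are assumptions.

```python
def triage(suggestions, auto_threshold=0.95, margin=0.1):
    """Decide what to do with ranked (lsid, weight) suggestions, best first.

    Returns ("auto", lsid), ("review", lsid) or ("unmatched", None).
    """
    if not suggestions:
        return ("unmatched", None)
    best_lsid, best_weight = suggestions[0]
    runner_up = suggestions[1][1] if len(suggestions) > 1 else 0.0
    # Accept automatically only when the best match is both strong and
    # clearly ahead of the next candidate; otherwise ask the user.
    if best_weight >= auto_threshold and best_weight - runner_up >= margin:
        return ("auto", best_lsid)
    return ("review", best_lsid)
```

Whether users would trust the automatic branch is the open question on the slide; keeping a review queue leaves that decision with the data owner.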

Page 20

Acknowledgements

- TDWG / Gordon and Betty Moore Foundation