bio2rdf/virtuoso

Bio2RDF linked data about Human and Mouse genome François Belleau (Bio2RDF project) Kingsley Idehen (OpenLink Software)

Upload: francois-belleau

Post on 02-Jul-2015


DESCRIPTION

A presentation about the potential of SPARQL querying with Virtuoso over 65 million triples about the human and mouse genomes from http://bio2rdf.org.

TRANSCRIPT

Page 1: Bio2RDF/Virtuoso

Bio2RDF linked data about Human and Mouse genome

François Belleau (Bio2RDF project)
Kingsley Idehen (OpenLink Software)

Page 2: Bio2RDF/Virtuoso

Bioinformatics creepy world of data

From Carole Goble's presentation at ISWC2005, "Using the Semantic Web for e-Science: inspiration, incubation, irritation"
http://iswc2005.semanticweb.org/keynoteabstracts.html

Page 3: Bio2RDF/Virtuoso

http://bio2rdf.org

● The goal of the Bio2RDF project is to make as much bioinformatics data as possible available to the scientific community in semantic web RDF format.

● The main service of the Bio2RDF server is to give access to millions of well-formed RDF documents, extracted from public databases, each identified by a normalized URI that is also a dereferenceable URL.

● Our vision is to use the semantic web paradigm to realize data integration in bioinformatics.

Page 4: Bio2RDF/Virtuoso

HTML document about Paget disease

Page 5: Bio2RDF/Virtuoso

HTML to RDF conversion : the art of rdfizing

http://www.ncbi.nlm.nih.gov/entrez/dispomim.cgi?id=602080

http://bio2rdf.org/omim:602080

Page 6: Bio2RDF/Virtuoso

Paget page seen with Tabulator

Page 7: Bio2RDF/Virtuoso

Bio2RDF URI syntax

● http://bio2rdf.org/public_namespace:private_id
● For example:
– http://bio2rdf.org/omim:602080
– http://bio2rdf.org/go:0032283
– http://bio2rdf.org/geneid:15275
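The URI syntax above can be sketched as a pair of small helpers. This is only an illustration, not code from the Bio2RDF project: the function names are hypothetical, and lowercasing the namespace is an assumption based on the three examples shown.

```python
import re

# Base of every Bio2RDF URI, as shown on the slide.
BASE = "http://bio2rdf.org/"


def bio2rdf_uri(namespace: str, identifier: str) -> str:
    """Build a URI of the form http://bio2rdf.org/public_namespace:private_id."""
    # Assumption: namespaces are written in lowercase, as in the slide's examples.
    ns = namespace.strip().lower()
    ident = identifier.strip()
    if not ns or not ident:
        raise ValueError("namespace and identifier must both be non-empty")
    return f"{BASE}{ns}:{ident}"


def split_bio2rdf_uri(uri: str) -> tuple[str, str]:
    """Split a Bio2RDF URI back into (public_namespace, private_id)."""
    match = re.fullmatch(r"http://bio2rdf\.org/([^:/]+):(.+)", uri)
    if match is None:
        raise ValueError(f"not a Bio2RDF URI: {uri!r}")
    return match.group(1), match.group(2)
```

For instance, `bio2rdf_uri("omim", "602080")` yields the first example URI, and `split_bio2rdf_uri` recovers the namespace and identifier from it.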

Page 8: Bio2RDF/Virtuoso

● #1: URIs are normalized and dereferenceable
● #2: Authoritative public namespaces are used
● #3: Mandatory predicates are used
● #4: Resource predicates are prefixed with an "x"
● #5: Blank nodes are forbidden
● #6: RDFizer programs are made available according to the GNU licence for open source
● #7: Ontologies are dereferenceable

Banff Manifesto: rules of thumb used to design RDF documents from existing web pages
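Two of these rules lend themselves to a mechanical check. The sketch below covers only rule #1 (subject URIs follow the normalized pattern) and rule #5 (no blank nodes). It is an illustration under stated assumptions, not a Bio2RDF tool: triples are plain string tuples in N-Triples-like notation, blank nodes are written `_:label`, namespaces are lowercase, and the function name `check_triples` is hypothetical.

```python
import re

# Assumption: a normalized Bio2RDF URI is http://bio2rdf.org/<lowercase namespace>:<id>.
NORMALIZED_URI = re.compile(r"http://bio2rdf\.org/[a-z0-9_.-]+:\S+")


def is_blank_node(term: str) -> bool:
    """Blank nodes are written _:label in N-Triples-like notation."""
    return term.startswith("_:")


def check_triples(triples):
    """Return a list of (rule, triple) pairs for triples that break rule #1 or #5."""
    problems = []
    for subject, predicate, obj in triples:
        if is_blank_node(subject) or is_blank_node(obj):
            # Rule #5: blank nodes are forbidden.
            problems.append(("rule 5", (subject, predicate, obj)))
        elif (subject.startswith("http://bio2rdf.org/")
              and NORMALIZED_URI.fullmatch(subject) is None):
            # Rule #1: Bio2RDF subject URIs must be normalized.
            problems.append(("rule 1", (subject, predicate, obj)))
    return problems
```

The other rules (authoritative namespaces, mandatory predicates, dereferenceable ontologies) depend on registries and live lookups that are not described in the slides, so they are left out of this sketch.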

Page 9: Bio2RDF/Virtuoso

So what, just another look?

Not really: with Virtuoso, if you can browse RDF from dereferenceable URIs,

you can query the web with SPARQL!

Page 10: Bio2RDF/Virtuoso

A first query

Page 11: Bio2RDF/Virtuoso

The Bio2RDF semantic mashup of 65 million triples was built from 30 different sources; each node is

a public bioinformatics database

Page 12: Bio2RDF/Virtuoso

Let's ask this 65-million-node graph a real question:

Which protein characteristics were assigned to genes involved in Paget disease?

Page 13: Bio2RDF/Virtuoso

● Three databases are needed to answer this question:
● http://www.ncbi.nlm.nih.gov/sites/entrez?db=omim
● http://www.pir.uniprot.org/
● http://www.geneontology.org/

Page 14: Bio2RDF/Virtuoso

The SPARQL query submitted to Virtuoso

SELECT ?lab1, ?sub, ?lab3
WHERE {
  <http://bio2rdf.org/omim:602080> <http://bio2rdf.org/bio2rdf#xGeneticDisorder> ?omim .
  ?omim <http://www.w3.org/2002/07/owl#sameAs> ?mim .
  ?omim <http://www.w3.org/2000/01/rdf-schema#label> ?lab1 .
  ?sub ?p ?mim .
  ?sub <http://www.w3.org/2000/01/rdf-schema#label> ?lab2 .
  ?sub <http://bio2rdf.org/uniprot:classifiedWith> ?clas .
  ?clas <http://www.w3.org/2000/01/rdf-schema#label> ?lab3 .
}
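To submit such a query from a program rather than a web form, one option is to build an HTTP GET URL in the style of the SPARQL protocol. The sketch below only constructs the URL and does not send it. The endpoint `http://localhost:8890/sparql` assumes a locally installed Virtuoso with its default settings, the `format` parameter is a Virtuoso convenience rather than part of the SPARQL protocol, and `build_sparql_url` is a hypothetical helper name. The slide's SELECT list uses commas; they are dropped here because standard SPARQL separates variables with whitespace.

```python
from urllib.parse import urlencode

# Assumption: a local Virtuoso instance with its default SPARQL endpoint.
ENDPOINT = "http://localhost:8890/sparql"

# The Paget disease query from the slide, laid out one triple pattern per line.
PAGET_QUERY = """\
SELECT ?lab1 ?sub ?lab3
WHERE {
  <http://bio2rdf.org/omim:602080> <http://bio2rdf.org/bio2rdf#xGeneticDisorder> ?omim .
  ?omim <http://www.w3.org/2002/07/owl#sameAs> ?mim .
  ?omim <http://www.w3.org/2000/01/rdf-schema#label> ?lab1 .
  ?sub ?p ?mim .
  ?sub <http://www.w3.org/2000/01/rdf-schema#label> ?lab2 .
  ?sub <http://bio2rdf.org/uniprot:classifiedWith> ?clas .
  ?clas <http://www.w3.org/2000/01/rdf-schema#label> ?lab3 .
}
"""


def build_sparql_url(endpoint: str, query: str,
                     result_format: str = "application/sparql-results+json") -> str:
    """Return a GET URL carrying the query as the 'query' parameter."""
    # urlencode percent-encodes the query text so it is safe inside a URL.
    params = urlencode({"query": query, "format": result_format})
    return f"{endpoint}?{params}"


if __name__ == "__main__":
    print(build_sparql_url(ENDPOINT, PAGET_QUERY))
```

The resulting URL can be opened with any HTTP client once the Bio2RDF data is loaded into the server behind the chosen endpoint.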

Page 15: Bio2RDF/Virtuoso

This knowledge map represents links that could have been visited by the last query.

Results are almost instantaneous.

Page 16: Bio2RDF/Virtuoso

Try it ...

● We invite you to discover the potential of bioinformatics linked data queried with SPARQL using Virtuoso.

● Install your own copy of the Virtuoso server and download our Bio2RDF data about the human and mouse genomes.

Page 17: Bio2RDF/Virtuoso

Thank you

● Virtuoso Open-Source Edition
– http://virtuoso.openlinksw.com/wiki/main/
● Bio2RDF web site
– http://www.bio2rdf.org/
● Bio2RDF download page
– http://www.bio2rdf.org/download
– http://sourceforge.net/projects/bio2rdf/