bio2rdf/virtuoso
DESCRIPTION
A presentation about the potential of SPARQL querying with Virtuoso over 65 million triples about the human and mouse genomes from http://bio2rdf.org.
TRANSCRIPT
![Page 1: Bio2RDF/Virtuoso](https://reader033.vdocuments.site/reader033/viewer/2022060122/559511a91a28ab16108b47aa/html5/thumbnails/1.jpg)
Bio2RDF linked data about Human and Mouse genome
François Belleau (Bio2RDF project)
Kingsley Idehen (OpenLink Software)
![Page 2: Bio2RDF/Virtuoso](https://reader033.vdocuments.site/reader033/viewer/2022060122/559511a91a28ab16108b47aa/html5/thumbnails/2.jpg)
Bioinformatics creepy world of data
From Carole Goble's presentation at ISWC2005:
Using the Semantic Web for e-Science: inspiration, incubation, irritation
http://iswc2005.semanticweb.org/keynoteabstracts.html
![Page 3: Bio2RDF/Virtuoso](https://reader033.vdocuments.site/reader033/viewer/2022060122/559511a91a28ab16108b47aa/html5/thumbnails/3.jpg)
http://bio2rdf.org
● The goal of the Bio2RDF project is to make as much bioinformatics data as possible available to the scientific community in semantic web RDF format.
● The main service of the Bio2RDF server is to give access to millions of well-formed RDF documents, extracted from public databases, each identified by a normalized URI that is also a dereferenceable URL.
● Our vision is to use the semantic web paradigm to achieve data integration in bioinformatics.
![Page 4: Bio2RDF/Virtuoso](https://reader033.vdocuments.site/reader033/viewer/2022060122/559511a91a28ab16108b47aa/html5/thumbnails/4.jpg)
HTML document about Paget disease
![Page 5: Bio2RDF/Virtuoso](https://reader033.vdocuments.site/reader033/viewer/2022060122/559511a91a28ab16108b47aa/html5/thumbnails/5.jpg)
HTML to RDF conversion : the art of rdfizing
http://www.ncbi.nlm.nih.gov/entrez/dispomim.cgi?id=602080
http://bio2rdf.org/omim:602080
![Page 6: Bio2RDF/Virtuoso](https://reader033.vdocuments.site/reader033/viewer/2022060122/559511a91a28ab16108b47aa/html5/thumbnails/6.jpg)
Paget page seen with Tabulator
![Page 7: Bio2RDF/Virtuoso](https://reader033.vdocuments.site/reader033/viewer/2022060122/559511a91a28ab16108b47aa/html5/thumbnails/7.jpg)
Bio2RDF URI syntax
● http://bio2rdf.org/public_namespace:private_id
● For example:
– http://bio2rdf.org/omim:602080
– http://bio2rdf.org/go:0032283
– http://bio2rdf.org/geneid:15275
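The URI pattern above can be illustrated with a small Python sketch. The helper names `make_uri` and `split_uri` are hypothetical and are not part of any Bio2RDF tooling; the sketch only assumes that every identifier has the form `http://bio2rdf.org/namespace:id`, as in the three examples on the slide.

```python
from typing import Tuple

# Base of every Bio2RDF URI, as shown on the slide.
BASE = "http://bio2rdf.org/"


def make_uri(namespace: str, private_id: str) -> str:
    """Join a public namespace and a private id into a Bio2RDF-style URI."""
    return f"{BASE}{namespace}:{private_id}"


def split_uri(uri: str) -> Tuple[str, str]:
    """Split a Bio2RDF-style URI back into (namespace, private_id)."""
    if not uri.startswith(BASE):
        raise ValueError(f"not a Bio2RDF URI: {uri}")
    # Only the first colon separates namespace from id.
    namespace, sep, private_id = uri[len(BASE):].partition(":")
    if not sep or not namespace or not private_id:
        raise ValueError(f"expected namespace:id after the base: {uri}")
    return namespace, private_id


print(make_uri("omim", "602080"))
print(split_uri("http://bio2rdf.org/geneid:15275"))
```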
![Page 8: Bio2RDF/Virtuoso](https://reader033.vdocuments.site/reader033/viewer/2022060122/559511a91a28ab16108b47aa/html5/thumbnails/8.jpg)
Banff Manifesto: rules of thumb used to design RDF documents from existing web pages
● #1: URIs are normalized and dereferenceable
● #2: Authoritative public namespaces are used
● #3: Mandatory predicates are used
● #4: Resource predicates are prefixed with an "x"
● #5: Blank nodes are forbidden
● #6: RDFizer programs are made available according to the GNU licence for open source
● #7: Dereferenceable ontologies
![Page 9: Bio2RDF/Virtuoso](https://reader033.vdocuments.site/reader033/viewer/2022060122/559511a91a28ab16108b47aa/html5/thumbnails/9.jpg)
So what, just another look?
Not really: with Virtuoso, if you can browse RDF from dereferenceable URIs,
you can query the web with SPARQL!
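As a rough illustration of what dereferencing a URI means in practice, the Python sketch below prepares an HTTP request that asks for RDF/XML. It is an assumption that the server honours the `Accept` header in this way; the helper name `rdf_request` is hypothetical, and nothing is sent over the network until the request is passed to `urllib.request.urlopen`.

```python
from urllib.request import Request


def rdf_request(uri: str) -> Request:
    """Prepare a GET request that asks for an RDF/XML representation of uri."""
    # Assumption: the server uses the Accept header to choose the format.
    return Request(uri, headers={"Accept": "application/rdf+xml"})


req = rdf_request("http://bio2rdf.org/omim:602080")
print(req.get_method(), req.full_url, req.get_header("Accept"))
```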
![Page 10: Bio2RDF/Virtuoso](https://reader033.vdocuments.site/reader033/viewer/2022060122/559511a91a28ab16108b47aa/html5/thumbnails/10.jpg)
A first query
![Page 11: Bio2RDF/Virtuoso](https://reader033.vdocuments.site/reader033/viewer/2022060122/559511a91a28ab16108b47aa/html5/thumbnails/11.jpg)
The Bio2RDF semantic mashup of 65 million triples was built from 30 different sources; each node is
a public bioinformatics database.
![Page 12: Bio2RDF/Virtuoso](https://reader033.vdocuments.site/reader033/viewer/2022060122/559511a91a28ab16108b47aa/html5/thumbnails/12.jpg)
Let's ask this graph of 65 million triples a real question:
Which protein characteristics were assigned to genes involved in Paget disease?
![Page 13: Bio2RDF/Virtuoso](https://reader033.vdocuments.site/reader033/viewer/2022060122/559511a91a28ab16108b47aa/html5/thumbnails/13.jpg)
● Three databases are needed to answer this question:
– http://www.ncbi.nlm.nih.gov/sites/entrez?db=omim
– http://www.pir.uniprot.org/
– http://www.geneontology.org/
![Page 14: Bio2RDF/Virtuoso](https://reader033.vdocuments.site/reader033/viewer/2022060122/559511a91a28ab16108b47aa/html5/thumbnails/14.jpg)
The SPARQL query submitted to Virtuoso
SELECT ?lab1, ?sub, ?lab3
WHERE {
  <http://bio2rdf.org/omim:602080> <http://bio2rdf.org/bio2rdf#xGeneticDisorder> ?omim .
  ?omim <http://www.w3.org/2002/07/owl#sameAs> ?mim .
  ?omim <http://www.w3.org/2000/01/rdf-schema#label> ?lab1 .
  ?sub ?p ?mim .
  ?sub <http://www.w3.org/2000/01/rdf-schema#label> ?lab2 .
  ?sub <http://bio2rdf.org/uniprot:classifiedWith> ?clas .
  ?clas <http://www.w3.org/2000/01/rdf-schema#label> ?lab3 .
}
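One way to submit this query from a script is through the standard SPARQL protocol, passing the query text as the `query` parameter of a GET request. The Python sketch below only builds the request. The endpoint URL `http://localhost:8890/sparql` is an assumption about a locally installed Virtuoso, and the helper name `build_request` is hypothetical. The variable list is written without commas, as the standard SPARQL grammar has it; the slide's version uses commas.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: a local Virtuoso instance exposes its SPARQL endpoint here.
ENDPOINT = "http://localhost:8890/sparql"

# The query from the slide, with a comma-free SELECT list.
QUERY = """
SELECT ?lab1 ?sub ?lab3
WHERE {
  <http://bio2rdf.org/omim:602080> <http://bio2rdf.org/bio2rdf#xGeneticDisorder> ?omim .
  ?omim <http://www.w3.org/2002/07/owl#sameAs> ?mim .
  ?omim <http://www.w3.org/2000/01/rdf-schema#label> ?lab1 .
  ?sub ?p ?mim .
  ?sub <http://www.w3.org/2000/01/rdf-schema#label> ?lab2 .
  ?sub <http://bio2rdf.org/uniprot:classifiedWith> ?clas .
  ?clas <http://www.w3.org/2000/01/rdf-schema#label> ?lab3 .
}
"""


def build_request(endpoint: str, query: str) -> Request:
    """Build a SPARQL-protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_request(ENDPOINT, QUERY)
print(request.full_url[:60] + "...")
```

To actually run it against a server loaded with the Bio2RDF data, pass `request` to `urllib.request.urlopen` and decode the response with `json.load`; in the standard SPARQL JSON results format, the rows are under `results` then `bindings`.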
![Page 15: Bio2RDF/Virtuoso](https://reader033.vdocuments.site/reader033/viewer/2022060122/559511a91a28ab16108b47aa/html5/thumbnails/15.jpg)
This knowledge map represents links that could have been visited by the last query.
Results are almost instantaneous.
![Page 16: Bio2RDF/Virtuoso](https://reader033.vdocuments.site/reader033/viewer/2022060122/559511a91a28ab16108b47aa/html5/thumbnails/16.jpg)
Try it ...
● We invite you to discover the potential of bioinformatics linked data queried with SPARQL using Virtuoso.
● Install your own copy of Virtuoso server and download our Bio2RDF data about the human and mouse genomes.
![Page 17: Bio2RDF/Virtuoso](https://reader033.vdocuments.site/reader033/viewer/2022060122/559511a91a28ab16108b47aa/html5/thumbnails/17.jpg)
Thank you
● Virtuoso Open-Source Edition
– http://virtuoso.openlinksw.com/wiki/main/
● Bio2RDF web site
– http://www.bio2rdf.org/
● Bio2RDF download page
– http://www.bio2rdf.org/download
– http://sourceforge.net/projects/bio2rdf/