bio2rdf@bh2010

Bio2RDF Cognoscope: A killer app for the life sciences, François Belleau

Upload: francois-belleau

Post on 15-Jul-2015


Bio2RDF Cognoscope: A killer app for the life sciences

François Belleau

Agenda

The problem

What is RDF ?

The vision

What is known about hexokinase ?

A new approach: The Cognoscope

http://www.pcworld.idg.com.au/article/132245/berners-lee_seeks_killer_app_semantic_web

"Similarly, if we could get critical mass in life sciences, if we get a half a dozen or a dozen set of ontologies, the core ones for drug discovery out there, then suddenly the Semantic Web within life sciences would have a critical mass. It'll snowball much more rapidly and it will be copied. Other areas will realize: Oh it's worth investing in this,"

Tim Berners-Lee, WWW inventor

The problem: How to do data integration in Bioinformatics ?

Carole Goble (ISWC 2005)

http://www.biopax.org/Docs/2004-10-28_SWLS-SessionVII.pdf

Tokyo subway map

Montreal subway map

http://informationarchitects.jp/ia-trendmap-2007v2/

Web Trend Map 2007

The proposed solution

Bio2RDF solves the problem of data integration in bioinformatics by applying the Semantic Web approach, based on the RDF, OWL and SPARQL technologies.

Web of data subway map from W3C

http://www.w3.org/2007/Talks/0130-sb-W3CTechSemWeb/#(1)

Bio2RDF inspiration in 2005

What is RDF ?

"Wouldn't it be great if you were able to organize all this information based on your own terms, instead of based on the application you use to access the information ?"

Ramanathan V. Guha, RDF initiator

http://cgi.netscape.com/columns/techvision/innovators_rg.html

Resource Description Framework

It is triples...

<subject> <predicate> <object_uri> .

OR

<subject> <predicate> "object_literal" .

A triple

The same in RDF/XML

<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:exterms="http://www.example.org/terms/">
  <rdf:Description rdf:about="http://www.example.org/index.html">
    <exterms:creation-date>August 16, 1999</exterms:creation-date>
  </rdf:Description>
</rdf:RDF>

The same in N-Triples

<http://www.example.org/index.html> <http://www.example.org/terms/creation-date> "August 16, 1999" .
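As a minimal sketch of the triple model shown above, a triple can be held as a small record and written out as one N-Triples line. The `Triple` class and the `to_ntriples` helper are illustrative names, not part of Bio2RDF, and the escaping covers only backslashes and double quotes.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """Hypothetical container for one RDF statement."""
    subject: str                  # always a URI
    predicate: str                # always a URI
    obj: str                      # a URI, or the text of a literal
    obj_is_literal: bool = False


def to_ntriples(t: Triple) -> str:
    """Serialize one triple as a single N-Triples line."""
    if t.obj_is_literal:
        # Simplified escaping: backslash first, then double quote.
        escaped = t.obj.replace("\\", "\\\\").replace('"', '\\"')
        obj = f'"{escaped}"'
    else:
        obj = f"<{t.obj}>"
    return f"<{t.subject}> <{t.predicate}> {obj} ."


example = Triple(
    "http://www.example.org/index.html",
    "http://www.example.org/terms/creation-date",
    "August 16, 1999",
    obj_is_literal=True,
)
line = to_ntriples(example)
print(line)
```

The object is the only position that can be either a URI or a literal, which is why the record carries a flag for it.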

It is a technology stack

http://www.w3.org/2007/Talks/0130-sb-W3CTechSemWeb/

It is a distributed architecture

http://www.w3.org/2007/Talks/0130-sb-W3CTechSemWeb/

Goal #1

Convert many public bioinformatic databases to RDF.

Banff, May 8, 2007, CHUL research center, Laval University

Bio2RDF rdfised public databases

Bio2RDF first map in 2007

Bio2RDF Mouse and Human Atlas map in 2008: 65 million triples

Linked Data cloud evolution

http://linkeddata.org/

http://esw.w3.org/topic/TaskForces/CommunityProjects/LinkingOpenData/DataSets/Statistics

Linked data cloud in March 2009

Linked data cloud in May 2007

http://www.ted.com/talks/tim_berners_lee_on_the_next_web.html

LODD wins the 2009 Triplify challenge

http://triplify.org/files/challenge_2009/LODD.pdf

Bio2RDF cloud map of namespaces from 2.3 billion triples

How did we do it ?

http://www.w3.org/DesignIssues/LinkedData

http://bio2rdf.wiki.sourceforge.net/Banff%20Manifesto

Bio2RDF realtime rdfiser in 2007

Current architecture (2010)

Offline rdfising process

Virtuoso SPARQL endpoints network

Namespace resolution through DNS subdomain

Bio2RDF has 3 mirror sites

http://cu.bio2rdf.org/

http://qut.bio2rdf.org/

http://quebec.bio2rdf.org/

Main REST services

Describe a resource by a dereferenceable URI

http://bio2rdf.org/ns:id

Global services over federated endpoints

http://bio2rdf.org/links/ns:id

http://bio2rdf.org/search/searchedTerm

Targeted services to a specific endpoint

http://bio2rdf.org/linksns/ns2/ns:id

http://bio2rdf.org/searchns/ns/searchedTerm
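The URL patterns above can be sketched as a few string-building helpers. This is only an illustration of the patterns on the slides: the function names are invented here, the namespace and identifier in the usage lines are example values not taken from the slides, and reading the first path segment of `linksns` and `searchns` as the targeted endpoint's namespace is an assumption.

```python
from urllib.parse import quote

BASE = "http://bio2rdf.org"  # base URL as shown on the slides


def describe_url(ns: str, identifier: str) -> str:
    """Dereferenceable URI describing one resource."""
    return f"{BASE}/{ns}:{identifier}"


def links_url(ns: str, identifier: str) -> str:
    """Global links service over the federated endpoints."""
    return f"{BASE}/links/{ns}:{identifier}"


def search_url(term: str) -> str:
    """Global full-text search over the federated endpoints."""
    return f"{BASE}/search/{quote(term)}"


def links_ns_url(endpoint_ns: str, ns: str, identifier: str) -> str:
    """Links service targeted at one endpoint (assumed meaning of ns2)."""
    return f"{BASE}/linksns/{endpoint_ns}/{ns}:{identifier}"


def search_ns_url(endpoint_ns: str, term: str) -> str:
    """Full-text search targeted at one endpoint."""
    return f"{BASE}/searchns/{endpoint_ns}/{quote(term)}"


# Illustrative values only.
print(describe_url("geneid", "15275"))
print(search_url("hexokinase"))
print(links_ns_url("kegg", "geneid", "15275"))
```

Search terms are percent-encoded with `quote` so that a term containing spaces still yields a valid URL.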

Goal #2

Ask a useful question to the network of SPARQL endpoints.

What is known about hexokinase ?

Existing integrated search services

NCBI/Entrez EBI/EB-eye

KEGG/DBGET Riken/OmicScan

Ask http://atlas.bio2rdf.org/fct

Submit a SPARQL query http://atlas.bio2rdf.org/sparql

Ask it to each SPARQL endpoint http://NAMESPACE.bio2rdf.org/fct

Ask Bio2RDF REST federated search http://bio2rdf.org/search/hexokinase

Or use the Cognoscope...
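For reference, the SPARQL endpoint option listed above can be sketched as building a standard SPARQL protocol GET request, where the query text travels in the `query` parameter. Several things here are assumptions: that each namespace subdomain exposes a `/sparql` path like the atlas endpoint does, that labels are stored under `rdfs:label`, and the helper names themselves. The code only builds the URL and does not contact any server.

```python
from urllib.parse import urlencode


def endpoint_for(namespace: str) -> str:
    """Endpoint URL under the DNS-subdomain scheme (the /sparql path is assumed)."""
    return f"http://{namespace}.bio2rdf.org/sparql"


def sparql_get_url(endpoint: str, query: str) -> str:
    """URL for a SPARQL protocol GET request carrying the query text."""
    return endpoint + "?" + urlencode({"query": query})


# Hypothetical query: resources whose label mentions hexokinase.
QUERY = """PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?s ?label WHERE {
  ?s rdfs:label ?label .
  FILTER regex(?label, "hexokinase", "i")
}
LIMIT 10"""

url = sparql_get_url(endpoint_for("atlas"), QUERY)
print(url)
```

Sending the same query to several endpoints then amounts to calling `endpoint_for` with each namespace in turn.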

The mashup principle

To answer a complex question we first need to build a specific database, a mashup, to which we submit the appropriate query.

Cognoscope new definition

A Cognoscope is an instrument to explore and collect topics from the Linked Data cloud of SPARQL endpoints. It permits querying over a distributed network of knowledge resources.

Cognoscope definition

The magnifying effect depends on the density of links between resources (entity links), which is a by-product of human intellectual activity in the social network.

The filtering effect is based on the inherent semantics of the RDF graph, described using types and predicates.

Facet browsing is used to zoom in and out of the observed graph.

Full text search is used to discover concepts.

Cognoscope function

How can we submit a complex query over the network of SPARQL endpoints ?

By using a workflow that fetches data from individual SPARQL endpoints.

We use a workflow to build the mashup.

Bio2RDF Cognoscope architecture

Linked Data cloud of SPARQL endpoints

Triplestore

Virtuoso 6

Workflow engine

Taverna 2.1

By building a mashup with Taverna

Write your complex SPARQL query as if a global graph were available

Identify the needed namespaces and split the query to fetch each data source separately

Build a mashup using a Taverna workflow that instantiates a local triplestore

Execute your complex query locally on the mashup

The SPARQL query needed (don't try this at home, do it on the web !)
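The four steps above can be sketched in plain Python, with an in-memory set of triples standing in for the local triplestore and stub functions standing in for the per-namespace fetches. This is not the Taverna workflow itself: every name below is hypothetical, and the two stub data sources contain made-up example.org triples.

```python
from typing import Callable, Dict, Iterable, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]

LABEL = "http://example.org/label"   # made-up predicates
XREF = "http://example.org/xref"


def fetch_ns_a() -> List[Triple]:
    """Stub for the sub-query sent to one endpoint."""
    return [("http://example.org/a:1", LABEL, "hexokinase")]


def fetch_ns_b() -> List[Triple]:
    """Stub for the sub-query sent to a second endpoint."""
    return [("http://example.org/b:9", XREF, "http://example.org/a:1")]


def build_mashup(fetchers: Dict[str, Callable[[], Iterable[Triple]]]) -> Set[Triple]:
    """Step 3: merge what each source returned into one local store."""
    mashup: Set[Triple] = set()
    for fetch in fetchers.values():
        mashup.update(fetch())
    return mashup


def match(store: Set[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> List[Triple]:
    """Triple-pattern lookup; None acts as a wildcard."""
    return sorted(t for t in store
                  if (s is None or t[0] == s)
                  and (p is None or t[1] == p)
                  and (o is None or t[2] == o))


def linked_to_label(store: Set[Triple], label: str) -> List[str]:
    """Step 4: a join that only works once both sources are merged."""
    targets = {s for s, _, _ in match(store, p=LABEL, o=label)}
    return sorted(s for s, _, o in match(store, p=XREF) if o in targets)


mashup = build_mashup({"ns_a": fetch_ns_a, "ns_b": fetch_ns_b})
result = linked_to_label(mashup, "hexokinase")
print(result)
```

The join in `linked_to_label` needs triples from both sources, which is the point of building the mashup before running the complex query.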

Bio2RDF Cognoscope using Taverna 2.1

Cognoscope query for What is known about hexokinase ?

Et voilà !

Where to get Bio2RDF Cognoscope http://www.myexperiment.org/search?query=cognoscope

Bio2RDF SPARQL endpoints http://delicious.com/tag/bio2rdf:sparql

Thanks

The Bio2RDF community

Centre de recherche du CHUL

Dumontier Lab

QUT eResearch Center

The software providers

OpenLink Virtuoso

Taverna community

My colleagues

Marc-Alexandre Nolin

Michel Dumontier

Peter Ansell

Can you help ?