Graph Analytics in Pharmacology over the Web of Life Sciences Linked Open Data
26th World Wide Web Conference (WWW)
Perth, 4th – 8th April 2017
Maulik R. Kamdar and Mark A. Musen
Stanford Center for Biomedical Informatics
Semantic Web: Publishing Data as a Graph
Gleevec (Mol. Wt.: 589.25 g/mol, Half-Life: 18 hours) inhibits PDGFR, involved in signal transduction.
[Figure: the statement drawn as a Resource Description Framework (RDF) graph. Node labels: Gleevec, DrugBank: DB00619, KEGG: D01441, PDGFR, GO:0007165 (Signal Transduction), 589.25, "18 hours". Edge labels: mol_weight, half-life, x-ref, Inhibits, target, name, type, process. Each node is identified by a Uniform Resource Identifier, e.g. http://bio2rdf.org/drugbank:DB00619 and http://bio2rdf.org/kegg:D01441.]
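The Gleevec statement on this slide can be written down as subject-predicate-object triples. A minimal sketch in plain Python follows. Only the two bio2rdf URIs come from the slide; the `EX` namespace, the predicate URIs built from it, the URI for PDGFR, and the helper `objects` are hypothetical names chosen for illustration.

```python
# Hypothetical namespace for predicates; only the two bio2rdf URIs are from the slide.
EX = "http://example.org/vocab#"
GLEEVEC = "http://bio2rdf.org/drugbank:DB00619"
GLEEVEC_KEGG = "http://bio2rdf.org/kegg:D01441"
PDGFR = EX + "PDGFR"  # hypothetical URI for the target protein
SIGNAL_TRANSDUCTION = "GO:0007165"

# Each fact is one (subject, predicate, object) triple, i.e. one edge of the graph.
triples = {
    (GLEEVEC, EX + "name", "Gleevec"),
    (GLEEVEC, EX + "mol_weight", 589.25),
    (GLEEVEC, EX + "half-life", "18 hours"),
    (GLEEVEC, EX + "x-ref", GLEEVEC_KEGG),
    (GLEEVEC, EX + "inhibits", PDGFR),
    (PDGFR, EX + "name", "PDGFR"),
    (PDGFR, EX + "type", "target"),
    (PDGFR, EX + "process", SIGNAL_TRANSDUCTION),
}


def objects(subject, predicate):
    """Return every object attached to `subject` through `predicate`."""
    return [o for (s, p, o) in triples if s == subject and p == predicate]


print(objects(GLEEVEC, EX + "mol_weight"))
```

Because every node and edge label is a URI, two sources that use the same URI for Gleevec are talking about the same node, which is what makes the graphs linkable.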
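Semantic Web: Querying the Graph
What are the half-lives of drugs that have Mol. Wt < 1000 g/mol and inhibit proteins involved in signal transduction?
[Figure: the question drawn as a SPARQL Query Language graph pattern. Unknown values are marked with ? (for example ?half-life); the constraints are mol_weight < 1000 and process GO:0007165 (Signal Transduction). Edge labels: mol_weight, half-life, x-ref, Inhibits, target, name, type, process.]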
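The question on this slide is a graph pattern with unknowns, plus a filter on the molecular weight. The sketch below is a toy pattern matcher in plain Python, not a SPARQL engine; it only illustrates how variables are bound against triples. The predicate names mirror the edge labels on the slide, while `DrugX`, `DrugY`, `ProteinZ`, the process identifier `GO:0000000`, and the functions `match` and `query` are made up for the example.

```python
# Toy triple store; everything except the Gleevec/PDGFR facts is hypothetical.
triples = [
    ("Gleevec", "mol_weight", 589.25),
    ("Gleevec", "half-life", "18 hours"),
    ("Gleevec", "inhibits", "PDGFR"),
    ("PDGFR", "process", "GO:0007165"),
    ("DrugX", "mol_weight", 1500.0),        # too heavy for the filter
    ("DrugX", "half-life", "2 hours"),
    ("DrugX", "inhibits", "PDGFR"),
    ("DrugY", "mol_weight", 300.0),         # inhibits a protein in another process
    ("DrugY", "half-life", "5 hours"),
    ("DrugY", "inhibits", "ProteinZ"),
    ("ProteinZ", "process", "GO:0000000"),  # placeholder identifier
]


def match(pattern, binding):
    """Yield extensions of `binding` under which one triple pattern holds."""
    for triple in triples:
        new = dict(binding)
        ok = True
        for term, value in zip(pattern, triple):
            if isinstance(term, str) and term.startswith("?"):
                # Variable: bind it, or check it agrees with an earlier binding.
                if new.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                # Constant: must equal the value in the triple.
                ok = False
                break
        if ok:
            yield new


def query(patterns, keep=lambda b: True):
    """Join all triple patterns, then apply a filter to the bindings."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [b2 for b in bindings for b2 in match(pattern, b)]
    return [b for b in bindings if keep(b)]


results = query(
    [
        ("?drug", "mol_weight", "?mw"),
        ("?drug", "half-life", "?hl"),
        ("?drug", "inhibits", "?protein"),
        ("?protein", "process", "GO:0007165"),
    ],
    keep=lambda b: b["?mw"] < 1000,
)
print([(b["?drug"], b["?hl"]) for b in results])
```

Only Gleevec satisfies both the weight filter and the process constraint in this toy data.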
What this talk is about …
Life Sciences Linked Open Data Cloud – query federation
• Challenges associated with retrieving information from LSLOD sources
• Pattern-based method to rewrite queries across LSLOD sources
• An application in mechanism-based pharmacovigilance - PhLeGrA
Query Federation: Rewriting and executing queries across different sources
What are the half-lives of drugs that have Mol. Wt < 1000 g/mol and inhibit proteins involved in signal transduction?
[Figure: QUERY FEDERATION. The full query (Drug, molecular-weight < 1000, target, process = "GO:0007165", half-life) is split into two sub-queries, one per source: (Drug, molecular-weight < 1000, target, half-life) and (Drug, molecular-weight < 1000, target, process = "GO:0007165").]
Schwarte, et al. ISWC 2012
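Federation as described on this slide amounts to sending each source the part of the query it can answer and joining the partial results. The sketch below shows that idea with two in-memory lists standing in for remote endpoints. It is a simplification: each toy source answers only one half of the query, whereas the slide's sub-queries overlap. The source contents, `DrugX`, `DrugY`, `ProteinZ`, `GO:0000000`, and all function names are hypothetical.

```python
# Two toy "endpoints"; contents are illustrative, not real query results.
SOURCE_A = [  # knows molecular weights and half-lives
    {"drug": "Gleevec", "mw": 493.61, "half_life": "18 hours"},
    {"drug": "DrugX", "mw": 1500.0, "half_life": "2 hours"},
]
SOURCE_B = [  # knows targets and their biological processes
    {"drug": "Gleevec", "target": "PDGFR", "process": "GO:0007165"},
    {"drug": "DrugX", "target": "PDGFR", "process": "GO:0007165"},
    {"drug": "DrugY", "target": "ProteinZ", "process": "GO:0000000"},
]


def subquery_a(max_weight):
    """Sub-query for source A: drugs below a molecular-weight threshold."""
    return [row for row in SOURCE_A if row["mw"] < max_weight]


def subquery_b(process):
    """Sub-query for source B: drugs whose target takes part in `process`."""
    return [row for row in SOURCE_B if row["process"] == process]


def federated_query(max_weight, process):
    """Run both sub-queries and join their results on the shared drug."""
    by_drug = {row["drug"]: row for row in subquery_a(max_weight)}
    return [
        {
            "drug": row["drug"],
            "target": row["target"],
            "half_life": by_drug[row["drug"]]["half_life"],
        }
        for row in subquery_b(process)
        if row["drug"] in by_drug
    ]


print(federated_query(1000, "GO:0007165"))
```

The join only works here because both toy sources use the same drug identifier; the following slides describe what happens when they do not agree on labels or graph shapes.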
Heterogeneity in the LSLOD Cloud
Label Mismatch: Different labels for classes, relations and attributes
[Figure: two sources, labelled (clinical features) and (biological features), describe the same drug as Gleevec molecular-weight 493.61 and Gleevec mol_weight 589.25.]
Model Mismatch: Different graph patterns to capture granularity
[Figure: one source links Gleevec to PDGFR with a single drug-target edge; the other links Gleevec through an Inhibits edge to an intermediate node with the labels target, name (PDGFR), type, and source (PubMed: 21152856).]
Heterogeneity in the LSLOD Cloud
• Inconsistent Meanings
• Inconsistent URI labels for classes, relations and attributes
• Inconsistent Attribute values for entities
• Inconsistent Graph patterns for SPARQL queries
• Incomplete Relations between entities
Query Rewriting fails over the LSLOD Cloud
What are the half-lives of drugs that have Mol. Wt < 1000 g/mol and inhibit proteins involved in signal transduction?
Query:
    ?s a <Drug>
    ?s <molecular-weight> ?mw
    ?s <target> ?protein
    ?s <half-life> ?hl
    ?mw < 1000 g/mol
    ?protein <hasGO> <GO:0007165>
Query Rewriting
Sub-query 1:
    ?s a <Drug>
    {?s <molecular-weight> ?mw}
    {?s <half-life> ?hl}
    ?mw < 1000 g/mol
Sub-query 2:
    ?s a <Drug>
    {?s <target> ?protein}
    ?protein <hasGO> <GO:0007165>
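A small illustration of why rewriting that only splits the query can fail: if the same predicate label is sent unchanged to every source, a source that uses a different label for the same attribute returns nothing. The sketch assumes the two labels and values from the Label Mismatch slide; the source names `source_1` and `source_2` and the function `ask` are hypothetical.

```python
# Two toy sources that describe the same attribute under different labels.
SOURCES = {
    "source_1": [("Gleevec", "molecular-weight", 493.61)],
    "source_2": [("Gleevec", "mol_weight", 589.25)],
}


def ask(source, predicate):
    """Return (subject, value) pairs for one predicate label in one source."""
    return [(s, o) for (s, p, o) in SOURCES[source] if p == predicate]


# Naive rewriting: the label from the user's query is reused for every source.
naive = {name: ask(name, "molecular-weight") for name in SOURCES}
print(naive)
```

The second source holds the value, but under a label the unchanged sub-query never asks for.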
Using Graph Patterns for Query Rewriting
Mapping Rules:
?Drug hasTarget ?Protein rewrites to:
    ?Drug DrugBank:drug-target ?Protein
    ?Drug KEGG:target ?blank KEGG:link ?Protein
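The mapping rules on this slide can be read as a lookup from one common pattern to a source-specific graph pattern. The sketch below encodes the two rules shown for hasTarget and expands a query pattern for a chosen source. It is an illustration under assumptions, not the authors' implementation; the dictionary layout, the placeholder names `?s`, `?o`, `?blank`, and the function `rewrite` are hypothetical.

```python
import itertools

# Mapping rules from the slide: one common pattern, one graph pattern per source.
# "?s" and "?o" stand for the subject and object of the common pattern.
MAPPING_RULES = {
    "hasTarget": {
        "DrugBank": [("?s", "DrugBank:drug-target", "?o")],
        "KEGG": [("?s", "KEGG:target", "?blank"), ("?blank", "KEGG:link", "?o")],
    },
}

_fresh = itertools.count()  # used to generate unique intermediate variables


def rewrite(pattern, source):
    """Expand one triple pattern into the graph pattern a source understands."""
    subject, predicate, obj = pattern
    rule = MAPPING_RULES.get(predicate, {}).get(source)
    if rule is None:
        return [pattern]  # no rule known: leave the pattern unchanged
    blank = f"?blank{next(_fresh)}"
    substitution = {"?s": subject, "?o": obj, "?blank": blank}
    return [tuple(substitution.get(term, term) for term in triple) for triple in rule]


common = ("?Drug", "hasTarget", "?Protein")
print(rewrite(common, "DrugBank"))
print(rewrite(common, "KEGG"))
```

For KEGG the single pattern becomes a two-step path through an intermediate variable, which is how a rule of this kind handles the model mismatch as well as the label mismatch.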
What are the half-lives of drugs that have Mol. Wt < 1000 g/mol and inhibit proteins involved in signal transduction?
Query:
    ?s a <Drug>
    ?s <hasMolWt> ?mw
    ?s <hasTarget> ?protein
    ?s <hasHalfLife> ?hl
    ?mw < 1000 g/mol
    ?protein <hasGO> <GO:0007165>
Query Rewriting
Sub-query 1:
    ?s a <Drug>
    {?s <molecular-weight> ?mw}
    ?s <drug-target> ?protein
    {?s <half-life> ?hl}
    ?mw < 1000 g/mol
Sub-query 2:
    ?s a <Drug>
    ?s <mol_wt> ?mw
    {?s <target> ?protein_blank
     ?protein_blank <link> ?protein}
    ?protein <hasGO> <GO:0007165>
PhLeGrA – Linked Graph Analytics in Pharmacology
Phlegra is a spider genus of the Salticidae family, commonly termed jumping spiders.
Entities and Relations from 4 different sources are retrieved to create the k-partite Network
This k-partite network is generated in < 1 day
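A k-partite network groups nodes by entity type and only allows edges between nodes of different types. The sketch below builds such a structure from a list of retrieved relations. The two example relations reuse names from earlier slides (hasTarget, hasGO); the tuple layout, the type name `Process`, and the function `build_k_partite` are hypothetical, and the real network draws on 4 sources rather than a hard-coded list.

```python
from collections import defaultdict

# Hypothetical retrieved relations:
# (subject, subject type, relation, object, object type)
RELATIONS = [
    ("Gleevec", "Drug", "hasTarget", "PDGFR", "Protein"),
    ("PDGFR", "Protein", "hasGO", "GO:0007165", "Process"),
]


def build_k_partite(relations):
    """Group entities into one partition per type and collect typed edges."""
    partitions = defaultdict(set)
    edges = set()
    for subject, s_type, relation, obj, o_type in relations:
        if s_type == o_type:
            # An edge inside one partition would break the k-partite property.
            raise ValueError("edges must connect different entity types")
        partitions[s_type].add(subject)
        partitions[o_type].add(obj)
        edges.add((subject, relation, obj))
    return dict(partitions), edges


partitions, edges = build_k_partite(RELATIONS)
print(sorted(partitions), len(edges))
```

Here k is simply the number of distinct entity types seen in the retrieved relations.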
Query Federation overcomes heterogeneous Distribution of Entities and Relations
[Figure panels: E1: Drug; R1: Drug hasTarget Protein]
• Similar and completely unique entities and relations exist across the data sources
• Federation is necessary to get the complete picture, but also to determine sources of noise
Several underlying mechanisms are possible …
http://onto-apps.stanford.edu/phlegra
The story so far …
Pattern-based federation methods can retrieve data from multiple sources in the Life Sciences Linked Open Data Cloud and can enable development of advanced methods for mechanism-based pharmacovigilance.
…
Acknowledgments
Musen Lab, Stanford
Biomedical Informatics Training Program
Michel Dumontier
US NIH Grant HG004028
PhLeGrA – Linked Graph Analytics in Pharmacology
www.stanford.edu/~maulikrk/research.html
www.onto-apps.stanford.edu/phlegra