Graph Analytics in Pharmacology over the Web of Life Sciences Linked Open Data

26th World Wide Web Conference (WWW), Perth, 4th–8th April 2017

Maulik R. Kamdar and Mark A. Musen

Stanford Center for Biomedical Informatics Research [email protected]



Linked Open Data (LOD) Cloud

Cyganiak, Richard et al. 2014

Life Sciences Linked Open Data (LSLOD) Cloud

Semantic Web: Publishing Data as a Graph

Gleevec (Mol. Wt.: 589.25 g/mol, Half-Life: 18 hours) inhibits PDGFR, involved in signal transduction.

Resource Description Framework (RDF) graph (figure):
• Node labels: Gleevec; DrugBank: DB00619; KEGG: D01441; 589.25; “18 hours”; PDGFR; Inhibits; GO:0007165 (Signal Transduction)
• Edge labels: mol_weight, half-life, x-ref, target, name, type, process

Uniform Resource Identifier:
• http://bio2rdf.org/drugbank:DB00619
• http://bio2rdf.org/kegg:D01441
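The graph on this slide can be written down as plain (subject, predicate, object) triples. The following is a minimal Python sketch, not the authors' code: the two bio2rdf.org URIs come from the slide, while the blank-node name `_:target1`, the exact attachment of the `name`, `type` and `process` edges, and the helper `objects` are assumptions made for illustration.

```python
# Prefixes taken from the two URIs shown on the slide.
DRUGBANK = "http://bio2rdf.org/drugbank:"
KEGG = "http://bio2rdf.org/kegg:"

gleevec = DRUGBANK + "DB00619"

# Each statement is one (subject, predicate, object) triple.
# "_:target1" is a hypothetical blank node; how the target, name, type and
# process edges connect is assumed here, not read off the original figure.
triples = [
    (gleevec, "mol_weight", 589.25),
    (gleevec, "half-life", "18 hours"),
    (gleevec, "x-ref", KEGG + "D01441"),
    (gleevec, "target", "_:target1"),
    ("_:target1", "name", "PDGFR"),
    ("_:target1", "type", "Inhibits"),
    ("PDGFR", "process", "GO:0007165"),
]


def objects(graph, subject, predicate):
    """Return every object attached to `subject` through `predicate`."""
    return [o for s, p, o in graph if s == subject and p == predicate]


molecular_weights = objects(triples, gleevec, "mol_weight")
```

Using URIs as node identifiers is what lets a second source refer to the same drug without ambiguity.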

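Semantic Web: Querying the Graph

What are the half-lives of drugs that have Mol. Wt < 1000 g/mol and inhibit proteins involved in signal transduction?

SPARQL Query Language

Query graph (figure):
• Constraints and variables: mol_weight < 1000; ?half-life; ?target; GO:0007165 (Signal Transduction); two unlabeled nodes shown as “?”
• Edge labels: mol_weight, half-life, x-ref, target, name, type, process
• Node label: Inhibits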
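Answering such a question amounts to matching a graph pattern with variables against the stored triples. The sketch below is a toy matcher in plain Python, not a SPARQL engine: the `inhibits` edge is a simplification of the slide's target node, and `DrugX` is a made-up entry added so the molecular-weight filter has something to reject.

```python
# Toy graph of (subject, predicate, object) triples.
GRAPH = [
    ("Gleevec", "mol_weight", 589.25),
    ("Gleevec", "half-life", "18 hours"),
    ("Gleevec", "inhibits", "PDGFR"),
    ("PDGFR", "process", "GO:0007165"),
    ("DrugX", "mol_weight", 1500.0),      # hypothetical heavy drug
    ("DrugX", "half-life", "2 hours"),
    ("DrugX", "inhibits", "PDGFR"),
]


def is_var(term):
    """Variables are strings starting with '?', as in SPARQL."""
    return isinstance(term, str) and term.startswith("?")


def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` equals `triple`, or return None."""
    new = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            if p_term in new and new[p_term] != t_term:
                return None
            new[p_term] = t_term
        elif p_term != t_term:
            return None
    return new


def query(graph, patterns, keep=lambda b: True):
    """Join all triple patterns, then apply the filter `keep`."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for b in bindings
            for triple in graph
            if (extended := match(pattern, triple, b)) is not None
        ]
    return [b for b in bindings if keep(b)]


PATTERNS = [
    ("?drug", "mol_weight", "?mw"),
    ("?drug", "half-life", "?hl"),
    ("?drug", "inhibits", "?target"),
    ("?target", "process", "GO:0007165"),
]

answers = query(GRAPH, PATTERNS, keep=lambda b: b["?mw"] < 1000)
half_lives = {b["?drug"]: b["?hl"] for b in answers}
```

Here `half_lives` holds one entry per drug that satisfies every pattern and the weight filter.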

Life Sciences Linked Open Data Cloud – query federation

• Challenges associated with retrieving information from LSLOD sources
• Pattern-based method to rewrite queries across LSLOD sources
• An application in mechanism-based pharmacovigilance - PhLeGrA

What this talk is about …

Query Federation: Rewriting and executing queries across different sources

What are the half-lives of drugs that have Mol. Wt < 1000 g/mol and inhibit proteins involved in signal transduction?

Query federation (figure; Schwarte, et al. ISWC 2012):
• Input query: Drug, molecular-weight < 1000, target, process = “GO:0007165”, half-life
• Rewritten sub-query: Drug, molecular-weight < 1000, target, half-life
• Rewritten sub-query: Drug, molecular-weight < 1000, target, process = “GO:0007165”
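The idea of splitting one question into per-source sub-queries and joining the partial answers can be sketched as follows. This is an illustration only: the two in-memory lists stand in for remote sources, every entry except Gleevec, PDGFR and GO:0007165 is made up, and the function names are hypothetical.

```python
# Hypothetical stand-in for a drug-centric source.
DRUG_SOURCE = [
    {"drug": "Gleevec", "mw": 589.25, "half_life": "18 hours", "target": "PDGFR"},
    {"drug": "DrugX", "mw": 1500.0, "half_life": "2 hours", "target": "PDGFR"},
    {"drug": "DrugY", "mw": 300.0, "half_life": "5 hours", "target": "ProteinZ"},
]

# Hypothetical stand-in for a protein-centric source.
PROTEIN_SOURCE = [
    {"protein": "PDGFR", "process": "GO:0007165"},
    {"protein": "ProteinZ", "process": "some-other-process"},
]


def drugs_below(max_mw):
    """Sub-query answered by the drug source alone."""
    return [row for row in DRUG_SOURCE if row["mw"] < max_mw]


def proteins_in(process):
    """Sub-query answered by the protein source alone."""
    return {row["protein"] for row in PROTEIN_SOURCE if row["process"] == process}


def federated_half_lives(max_mw, process):
    """Run both sub-queries, then join the partial results on the protein."""
    proteins = proteins_in(process)
    return {
        row["drug"]: row["half_life"]
        for row in drugs_below(max_mw)
        if row["target"] in proteins
    }


result = federated_half_lives(1000, "GO:0007165")
```

Neither source can answer the question alone; the join on the shared protein is what federation contributes.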

Heterogeneity in the LSLOD Cloud

Label Mismatch: Different labels for classes, relations and attributes
• Gleevec, molecular-weight: 493.61
• Gleevec, mol_weight: 589.25
The two sources are labeled (clinical features) and (biological features).

Model Mismatch: Different graph patterns to capture granularity
• Gleevec, drug-target: PDGFR
• Gleevec, target, name: PDGFR, type: Inhibits, source: PubMed: 21152856

Heterogeneity in the LSLOD Cloud


• Inconsistent Meanings

• Inconsistent URI labels for classes, relations and attributes

• Inconsistent Attribute values for entities

• Inconsistent Graph patterns for SPARQL queries

• Incomplete Relations between entities

Query Rewriting fails over the LSLOD Cloud

What are the half-lives of drugs that have Mol. Wt < 1000 g/mol and inhibit proteins involved in signal transduction?

Input query:

?s a <Drug>
?s <molecular-weight> ?mw
?s <target> ?protein
?s <half-life> ?hl
?mw < 1000 g/mol
?protein <hasGO> <GO:0007165>

Query Rewriting produces two sub-queries:

?s a <Drug>
{?s <molecular-weight> ?mw}
{?s <half-life> ?hl}
?mw < 1000 g/mol

?s a <Drug>
{?s <target> ?protein}
?protein <hasGO> <GO:0007165>

Using Graph Patterns for Query Rewriting

Mapping Rules: ?Drug hasTarget ?Protein maps to
• ?Drug DrugBank:drug-target ?Protein
• ?Drug KEGG:target ?blank KEGG:link ?Protein

What are the half-lives of drugs that have Mol. Wt < 1000 g/mol and inhibit proteins involved in signal transduction?

Input query:

?s a <Drug>
?s <hasMolWt> ?mw
?s <hasTarget> ?protein
?s <hasHalfLife> ?hl
?mw < 1000 g/mol
?protein <hasGO> <GO:0007165>

Query Rewriting produces two sub-queries:

?s a <Drug>
{?s <molecular-weight> ?mw}
?s <drug-target> ?protein
{?s <half-life> ?hl}
?mw < 1000 g/mol

?s a <Drug>
?s <mol_wt> ?mw
{?s <target> ?protein_blank
?protein_blank <link> ?protein}
?protein <hasGO> <GO:0007165>
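The mapping rules above can be read as templates that expand one generic triple pattern into a source-specific graph pattern. The sketch below shows that expansion in Python under stated assumptions: the table layout and the `rewrite` function are hypothetical, only the `hasTarget` rule from the slide is encoded, and the fixed variable name `?blank` would need to be made unique if a rule were applied more than once in one query.

```python
# Generic predicate -> source -> list of source-specific triple patterns.
# "?Drug" and "?Protein" are the placeholders used in the slide's rule.
MAPPING_RULES = {
    "hasTarget": {
        "DrugBank": [("?Drug", "DrugBank:drug-target", "?Protein")],
        "KEGG": [
            ("?Drug", "KEGG:target", "?blank"),
            ("?blank", "KEGG:link", "?Protein"),
        ],
    },
}


def rewrite(pattern, source, rules=MAPPING_RULES):
    """Expand one generic triple pattern into the patterns used by `source`.

    Patterns without a matching rule are returned unchanged.
    """
    subject, predicate, obj = pattern
    templates = rules.get(predicate, {}).get(source)
    if templates is None:
        return [pattern]
    rename = {"?Drug": subject, "?Protein": obj}
    return [tuple(rename.get(term, term) for term in t) for t in templates]


generic = ("?s", "hasTarget", "?protein")
for_drugbank = rewrite(generic, "DrugBank")
for_kegg = rewrite(generic, "KEGG")
```

The KEGG expansion yields two patterns linked through the intermediate variable, which is the model mismatch the rule is meant to absorb.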

Life Sciences Linked Open Data Cloud – query federation

• Challenges associated with retrieving information from LSLOD sources
• Pattern-based method to rewrite queries across LSLOD sources
• An application in mechanism-based pharmacovigilance - PhLeGrA

What this talk is about …

PhLeGrA – Linked Graph Analytics in Pharmacology


Phlegra is a spider genus of the Salticidae family, commonly termed jumping spiders.

A k-partite network is generated as output

Entities and relations from 4 different sources are retrieved to create the k-partite network

This k-partite network is generated in < 1 day
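A k-partite network groups nodes into k partitions and only allows edges between different partitions. The following sketch builds such a structure from typed edges in Python; the partition names (`Drug`, `Protein`, `Process`) and the function `build_k_partite` are assumptions for illustration, since the transcript does not list the entity types PhLeGrA actually uses.

```python
from collections import defaultdict

# Each node is a (partition, identifier) pair; each edge joins two nodes.
# The partition names are hypothetical.
EDGES = [
    (("Drug", "Gleevec"), ("Protein", "PDGFR")),
    (("Protein", "PDGFR"), ("Process", "GO:0007165")),
]


def build_k_partite(edges):
    """Return (partitions, adjacency), rejecting edges inside one partition."""
    partitions = defaultdict(set)
    adjacency = defaultdict(set)
    for u, v in edges:
        if u[0] == v[0]:
            raise ValueError(f"edge within one partition: {u} - {v}")
        partitions[u[0]].add(u[1])
        partitions[v[0]].add(v[1])
        adjacency[u].add(v)
        adjacency[v].add(u)
    return dict(partitions), dict(adjacency)


partitions, adjacency = build_k_partite(EDGES)
k = len(partitions)
```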

Query Federation overcomes heterogeneous distribution of entities and relations

E1: Drug
R1: Drug hasTarget Protein

• Similar and completely unique entities and relations exist across data sources
• Federation is necessary to get the complete picture, but also to determine sources of noise
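How a relation such as R1 is distributed across sources can be summarized with set operations: the union gives the complete picture, the intersection shows what sources agree on, and the per-source remainder shows what only one source asserts. The sketch below uses made-up source names and pairs (only Gleevec and PDGFR come from the slides), and `distribution` is a hypothetical helper.

```python
# Hypothetical results for R1 (Drug hasTarget Protein), one set per source.
R1_BY_SOURCE = {
    "SourceA": {("Gleevec", "PDGFR"), ("Gleevec", "ProteinQ")},
    "SourceB": {("Gleevec", "PDGFR"), ("DrugX", "PDGFR")},
}


def distribution(by_source):
    """Return (all items, items shared by every source, items unique to each)."""
    all_items = set().union(*by_source.values())
    shared = set.intersection(*by_source.values())
    unique = {
        name: items - set().union(
            *(other for n, other in by_source.items() if n != name)
        )
        for name, items in by_source.items()
    }
    return all_items, shared, unique


all_items, shared, unique = distribution(R1_BY_SOURCE)
```

Pairs that appear in only one source are candidates both for new information and for noise, which is why they are worth separating out.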

Several underlying mechanisms are possible …

http://onto-apps.stanford.edu/phlegra

A graph analytics module to rank the mechanisms

Preliminary results using a network-based Apriori algorithm for ranking mechanisms

The story so far …

Pattern-based federation methods can retrieve data from multiple sources in the Life Sciences Linked Open Data Cloud, and can enable development of advanced methods for mechanism-based pharmacovigilance.

Acknowledgments

Musen Lab, Stanford

Biomedical Informatics Training Program

Michel Dumontier

US NIH Grant HG004028

PhLeGrA – Linked Graph Analytics in Pharmacology

www.stanford.edu/~maulikrk/research.html

www.onto-apps.stanford.edu/phlegra