computing with knowledge-20070717



8/12/2019 Computing With Knowledge-20070717

http://slidepdf.com/reader/full/computing-with-knowledge-20070717 1/23

 

Computing with Knowledge 

 Alan Ruttenberg and Jonathan Rees

Informatics and Interactomes in Huntington’s Disease Research

July 17, 2007


Science Commons 

•  Accelerating the scientific research cycle through targeted projects
 – Publishing: helping authors retain some rights
 – Materials transfer: lowering transaction costs
 – Knowledge management: enabling automated manipulation of data and curated findings
•  Open source KM: using a ‘semantic web’ approach to cultivate network effects


Using knowledge in data analysis 

•  Effective work depends on use of previous scientific results
•  Researchers are constantly hunting for papers relevant to their problems - this is time consuming and error-prone
•  Use of prior knowledge is uneven and unsystematic
•  Computing with the interactome is proving to be a useful tool
•  How can we improve on its use, and extend the lesson to other forms of knowledge?


What worked at Millennium?  

•  Collecting structured knowledge
  •  Integrated public, licensed, and internal KBs
  •  The best licensable KB: Ingenuity Systems
•  Developing and applying methods that exploited the knowledge base to analyze experimental data
  •  Network based algorithms, such as PARIS
  •  Tools for working with sets (categories)
•  Ran targeted queries against collected knowledge to supply scientists with answers to specific questions
  •  What is known about the cell lines we use?
  •  What are transcription factors and targets in pathways of interest?
  •  What molecular processes are known to be disease specific?


The rest of this talk  

•  Present examples of how we compute with knowledge now
 – Activity center algorithm for microarrays
 – Working with network statistics
 – Query across integrated databases
•  Discuss limitations and where we want to go
•  Talk about what’s needed to get there


PARIS: Activity center analysis  

•  Goal: Use prior knowledge to extract higher quality signal from expression data.
•  Knowledge used: Pairs of interacting proteins, as inferred from human, mouse and rat findings in KB, define a network where nodes are proteins and edges are interactions.
•  Strategy: Score each gene using its activity combined with activities of its neighbors; obtain P-values by testing significance; display using network layout based on distance between genes in functional network.

Method described in Pradines et al., J Biopharm. Stat., 14 (3) 2004, 701-721 
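
The "knowledge used" step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the protein names are placeholders, the function names `build_network` and `distances_from` are invented here, and the edge-count distance from breadth-first search is just one plausible reading of "distance between genes in functional network", not the layout method of the cited paper.

```python
from collections import defaultdict, deque

def build_network(pairs):
    """Undirected network: nodes are proteins, edges are interactions."""
    adj = defaultdict(set)
    for a, b in pairs:
        if a != b:  # ignore self-interactions in this sketch
            adj[a].add(b)
            adj[b].add(a)
    return dict(adj)

def distances_from(adj, source):
    """Breadth-first search: edge-count distance from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

# Placeholder interaction pairs (hypothetical, not taken from any knowledge base).
pairs = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "E")]
network = build_network(pairs)
dist_from_a = distances_from(network, "A")
```

A layout step could then place genes that are few edges apart close together.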


 Activity center analysis 

Full Interaction Network + Data, defining activity = Active Sub-network

Full Interaction Network: functional interactions involving gene products
• Binds
• Phosphorylates
• Regulates
• Cleaves…

Data, defining activity:
• Compound vs. Normal
• Knockout vs. Wild Type
• Responders vs. Non-responders

Active Sub-network: hints on the cellular processes
• Perturbed by a compound
• Downstream of a target
• Involved in drug resistance


Scoring activity  

•  Compute activity score s_i for each gene in the network
[Formula for s_i, with a neighborhood term a_i and an overlap term indexed by ij]
•  Use Monte Carlo simulation to assess significance of scores, to yield a p-value answering: how unusual is this level of activity?
[Figure: histogram of Frequency against Score, with s_i marked]
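
A minimal sketch of the Monte Carlo step, assuming a simplified stand-in score (a gene's own activity plus the mean activity of its neighbors). The actual PARIS score, with its neighborhood and overlap terms, is not reproduced here; the gene names, activities, and function names are all hypothetical.

```python
import random

def activity_score(adj, activity, gene):
    """Stand-in score: own activity plus mean activity of the gene's neighbors."""
    neighbors = adj.get(gene, ())
    if not neighbors:
        return activity[gene]
    return activity[gene] + sum(activity[n] for n in neighbors) / len(neighbors)

def monte_carlo_p_value(adj, activity, gene, n_trials=2000, seed=0):
    """Fraction of random relabelings scoring at least as high as observed."""
    rng = random.Random(seed)
    observed = activity_score(adj, activity, gene)
    genes = list(activity)
    values = [activity[g] for g in genes]
    hits = 0
    for _ in range(n_trials):
        rng.shuffle(values)  # reassign activities to genes at random
        shuffled = dict(zip(genes, values))
        if activity_score(adj, shuffled, gene) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from an impossible p = 0.
    return (hits + 1) / (n_trials + 1)

# Toy network: an active hub with three active neighbors, plus inactive pairs.
adj = {
    "hub": {"n1", "n2", "n3"}, "n1": {"hub"}, "n2": {"hub"}, "n3": {"hub"},
    "x1": {"x2"}, "x2": {"x1"}, "x3": {"x4"}, "x4": {"x3"},
    "x5": {"x6"}, "x6": {"x5"},
}
activity = {g: (1.0 if g in ("hub", "n1", "n2", "n3") else 0.0) for g in adj}
p_hub = monte_carlo_p_value(adj, activity, "hub")
p_x1 = monte_carlo_p_value(adj, activity, "x1")
```

In this toy case the hub's neighborhood is unusually active, so `p_hub` comes out small, while the inactive gene `x1` gets a p-value of 1.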


Exploring an activity center in an inflammation experiment using PARIS


Edge-count statistics 

•  Goals: Exploit interaction network structure to analyze connectivity between and within sets; mine the network itself for novel relationships and structure.
•  Knowledge used: Combinations of networks and sets.
•  Strategy: Apply theory of random graphs to category scoring, module discovery, and list expansion.


The problem with counting edges  

About 2 edges/node

About 5 edges/node

Do the 3 edges that link these groups have the same significance?


Null model: Random network with fixed degree sequence

At each step pick two edges and swap end nodes

Each node has the same number of edges after a swap

25 swaps later: in this network there are four edges between pink and blue sets compared to one in the initial network
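
The swap procedure can be sketched as follows. This is an illustrative implementation of degree-preserving edge swapping on a toy graph, not the code behind the slides; the function names, the toy ring network, and the "pink" and "blue" node sets are assumptions made for the example.

```python
import random

def degree_sequence(edges):
    """Number of edges touching each node."""
    deg = {}
    for a, b in edges:
        deg[a] = deg.get(a, 0) + 1
        deg[b] = deg.get(b, 0) + 1
    return deg

def swap_edges(edges, n_swaps, seed=0, max_tries=100000):
    """Randomize an undirected simple graph while keeping every node's degree."""
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    done = tries = 0
    while done < n_swaps and tries < max_tries:
        tries += 1
        i, j = rng.sample(range(len(edges)), 2)
        a, b = edges[i]
        c, d = edges[j]
        if rng.random() < 0.5:  # random orientation so both rewirings are reachable
            c, d = d, c
        # Proposed rewiring: (a-b, c-d) becomes (a-d, c-b).
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if len(new1) < 2 or len(new2) < 2:      # would create a self-loop
            continue
        if new1 in present or new2 in present:  # would duplicate an existing edge
            continue
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new1, new2}
        edges[i], edges[j] = (a, d), (c, b)
        done += 1
    return edges

def edges_between(edges, set1, set2):
    """Count edges with one end in each set."""
    return sum(1 for a, b in edges
               if (a in set1 and b in set2) or (a in set2 and b in set1))

# Toy network: a 12-node ring with two chords.
edges = [(i, (i + 1) % 12) for i in range(12)] + [(0, 6), (3, 9)]
randomized = swap_edges(edges, n_swaps=25)
pink, blue = {0, 1, 2, 3}, {6, 7, 8, 9}
before = edges_between(edges, pink, blue)
after = edges_between(randomized, pink, blue)
```

Repeating the randomization many times and comparing `before` with the distribution of `after` gives an empirical significance estimate for the observed edge count.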


 Approximate (but fast) analytic formulas exist 

[Figure: lists L1 and L2 and a node X, labeled a=3 and k=2]

Fast enough to interactively score 10,000s of gene sets


Three statistics: P_a, P_b, P_l

P_a: Edges from a single node to a list (a = attachment)

P_b: Edges between two lists of genes (b = bipartite)

P_l: Number of edges within a list (l = list)

Pradines, Farutin, Rowley & Dancik, J. Comp. Biol 12(2), 2005, 113-128
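
Each statistic starts from an observed edge count. The sketch below computes only those counts on a toy adjacency map; the significance formulas from the cited paper are not reproduced, and the function names and toy network are assumptions.

```python
def attachment_edges(adj, node, gene_list):
    """Count behind P_a: edges from a single node to a list."""
    return len(adj.get(node, set()) & (set(gene_list) - {node}))

def bipartite_edges(adj, list1, list2):
    """Count behind P_b: edges with one end in each list (lists assumed disjoint)."""
    targets = set(list2)
    return sum(len(adj.get(g, set()) & targets) for g in set(list1))

def within_list_edges(adj, gene_list):
    """Count behind P_l: edges with both ends inside the list."""
    members = set(gene_list)
    # Each internal edge is seen from both of its ends, so halve the total.
    return sum(len(adj.get(g, set()) & members) for g in members) // 2

# Toy network with edges A-B, B-C, A-C, C-D, D-E.
adj = {
    "A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"},
    "D": {"C", "E"}, "E": {"D"},
}
n_attach = attachment_edges(adj, "D", ["A", "B", "C"])
n_bipartite = bipartite_edges(adj, ["A", "B", "C"], ["D", "E"])
n_within = within_list_edges(adj, ["A", "B", "C"])
```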


P_l profile

•  Sort genes by expression data and evaluate how well the top n genes map to known pathways.
[Figure: log(P_l) against number of genes over the time course of treatment of model cells, with the optimal number of genes for mapping to pathways marked]
•  Conclusion: perturbed pathways are best represented by 300 genes at 1h and 3000 genes at 3h, so it is important to take early (or many) time points to study compound effect
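
The profile idea can be sketched as a loop over list sizes. Two assumptions are made here: genes are ranked by absolute expression change, and the number of edges inside the top-n list stands in for log(P_l), which would need the formulas from the cited paper. The names and toy data are hypothetical.

```python
def top_n_profile(expression, score_fn, sizes):
    """Score the top-n genes for each n, ranking genes by absolute expression change."""
    ranked = sorted(expression, key=lambda g: abs(expression[g]), reverse=True)
    return [(n, score_fn(ranked[:n])) for n in sizes if n <= len(ranked)]

def make_edge_counter(adj):
    """Stand-in scorer: number of network edges inside a gene list."""
    def count(gene_list):
        members = set(gene_list)
        return sum(len(adj.get(g, set()) & members) for g in members) // 2
    return count

# Toy expression changes and a toy network (edges A-B, B-C, A-C, C-D, D-E).
expression = {"A": 3.0, "B": -2.5, "C": 2.0, "D": 0.5, "E": -0.1}
adj = {
    "A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"},
    "D": {"C", "E"}, "E": {"D"},
}
profile = top_n_profile(expression, make_edge_counter(adj), sizes=[2, 3, 5])
```

With a real significance score in place of the edge counter, the list size at which the score is best would play the role of the optimal number of genes.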


 Answering questions 

•  Goals: Get answers to questions posed to the body of collected knowledge in an effective way.
•  Knowledge used: Publicly available databases, text mining!
•  Strategy: Integrate knowledge using careful modeling, exploiting open Semantic Web standards and technologies


A simple target discovery question

Signal transduction pathways are considered to be rich in “druggable” targets - proteins that might respond to chemical therapy

CA1 Pyramidal Neurons are known to be particularly damaged in Alzheimer’s disease.

Casting a wide net, can we find candidate genes known to be involved in signal transduction and active in Pyramidal Neurons?


There are a lot of high quality public databases

NeuronDB

BAMS

NC Annotations

Homologene

SWAN

EntrezGene

Gene Ontology

Mammalian Phenotype

PDSP Ki

BrainPharm

AlzGene

Antibodies

PubChem

MESH

Reactome

 Allen Brain Atlas

Publications


 A SPARQL query spanning four sources 

prefix go: <http://purl.org/obo/owl/GO#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix owl: <http://www.w3.org/2002/07/owl#>
prefix mesh: <http://purl.org/commons/record/mesh/>
prefix sc: <http://purl.org/science/owl/sciencecommons/>
prefix ro: <http://www.obofoundry.org/ro/ro.owl#>

select ?genename ?processname
where
{ graph <http://purl.org/commons/hcls/pubmesh>
    { ?paper ?p mesh:D017966 .
      ?article sc:identified_by_pmid ?paper.
      ?gene sc:describes_gene_or_gene_product_mentioned_by ?article.
    }
  graph <http://purl.org/commons/hcls/goa>
    { ?protein rdfs:subClassOf ?res.
      ?res owl:onProperty ro:has_function.
      ?res owl:someValuesFrom ?res2.
      ?res2 owl:onProperty ro:realized_as.
      ?res2 owl:someValuesFrom ?process.
      graph <http://purl.org/commons/hcls/20070416/classrelations>
        { {?process <http://purl.org/obo/owl/obo#part_of> go:GO_0007166}
          union
          {?process rdfs:subClassOf go:GO_0007166 }
        }
      ?protein rdfs:subClassOf ?parent.
      ?parent owl:equivalentClass ?res3.
      ?res3 owl:hasValue ?gene.
    }
  graph <http://purl.org/commons/hcls/gene>
    { ?gene rdfs:label ?genename }
  graph <http://purl.org/commons/hcls/20070416>
    { ?process rdfs:label ?processname}
}

Mesh: Pyramidal Neurons

Pubmed: Journal Articles

Entrez Gene: Genes

GO: Signal Transduction

Inference required  
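
To show what the query asks for, the join and the inference step can be mimicked in plain Python over toy data. This is not how a SPARQL engine evaluates the query; every identifier below (`mesh:neuron`, `pmid:1`, `geneA`, `proc:signal`, and so on) is a made-up placeholder, and the function names are assumptions.

```python
def descendants(parents_of, root):
    """All terms that reach `root` by following subClassOf / part_of links."""
    children_of = {}
    for child, parents in parents_of.items():
        for parent in parents:
            children_of.setdefault(parent, set()).add(child)
    found, stack = set(), [root]
    while stack:
        term = stack.pop()
        for child in children_of.get(term, ()):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found

def candidate_genes(mesh_papers, paper_genes, gene_processes,
                    parents_of, mesh_term, root_process):
    """Genes mentioned in papers tagged with mesh_term that take part in a
    process below root_process, returned as sorted (gene, process) pairs."""
    processes = descendants(parents_of, root_process)  # the inference step
    genes = set()
    for paper in mesh_papers.get(mesh_term, ()):
        genes |= paper_genes.get(paper, set())
    return sorted((g, p) for g in genes
                  for p in gene_processes.get(g, ()) if p in processes)

# Toy stand-ins for the four sources (MeSH, PubMed, Entrez Gene, GO).
parents_of = {"proc:gpcr": {"proc:signal"}, "proc:dopamine": {"proc:gpcr"},
              "proc:metabolism": set()}
mesh_papers = {"mesh:neuron": {"pmid:1", "pmid:2"}}
paper_genes = {"pmid:1": {"geneA"}, "pmid:2": {"geneB"}, "pmid:3": {"geneC"}}
gene_processes = {"geneA": {"proc:dopamine"}, "geneB": {"proc:metabolism"},
                  "geneC": {"proc:gpcr"}}
hits = candidate_genes(mesh_papers, paper_genes, gene_processes,
                       parents_of, "mesh:neuron", "proc:signal")
```

Only `geneA` survives: it is mentioned in a tagged paper and its process sits two levels below the root, which is found only by following the hierarchy transitively.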


Results: genes, processes 

DRD1, 1812     adenylate cyclase activation
ADRB2, 154     adenylate cyclase activation
ADRB2, 154     arrestin mediated desensitization of G-protein coupled receptor protein signaling pathway
DRD1IP, 50632  dopamine receptor signaling pathway
DRD1, 1812     dopamine receptor, adenylate cyclase activating pathway
DRD2, 1813     dopamine receptor, adenylate cyclase inhibiting pathway
GRM7, 2917     G-protein coupled receptor protein signaling pathway
GNG3, 2785     G-protein coupled receptor protein signaling pathway
GNG12, 55970   G-protein coupled receptor protein signaling pathway
DRD2, 1813     G-protein coupled receptor protein signaling pathway
ADRB2, 154     G-protein coupled receptor protein signaling pathway
CALM3, 808     G-protein coupled receptor protein signaling pathway
HTR2A, 3356    G-protein coupled receptor protein signaling pathway
DRD1, 1812     G-protein signaling, coupled to cyclic nucleotide second messenger
SSTR5, 6755    G-protein signaling, coupled to cyclic nucleotide second messenger
MTNR1A, 4543   G-protein signaling, coupled to cyclic nucleotide second messenger
CNR2, 1269     G-protein signaling, coupled to cyclic nucleotide second messenger
HTR6, 3362     G-protein signaling, coupled to cyclic nucleotide second messenger
GRIK2, 2898    glutamate signaling pathway
GRIN1, 2902    glutamate signaling pathway
GRIN2A, 2903   glutamate signaling pathway
GRIN2B, 2904   glutamate signaling pathway
ADAM10, 102    integrin-mediated signaling pathway
GRM7, 2917     negative regulation of adenylate cyclase activity
LRP1, 4035     negative regulation of Wnt receptor signaling pathway
ADAM10, 102    Notch receptor processing
ASCL1, 429     Notch signaling pathway
HTR2A, 3356    serotonin receptor signaling pathway
ADRB2, 154     transmembrane receptor protein tyrosine kinase activation (dimerization)
PTPRG, 5793    transmembrane receptor protein tyrosine kinase signaling pathway
EPHA4, 2043    transmembrane receptor protein tyrosine kinase signaling pathway
NRTN, 4902     transmembrane receptor protein tyrosine kinase signaling pathway
CTNND1, 1500   Wnt receptor signaling pathway

Many of the genes are indeed related to Alzheimer’s Disease through gamma secretase (presenilin) activity


What we’d like to do better

•  Broader knowledge base - cells, anatomy, physiology, behavior, protocols, reagents
•  Beyond simple interaction: More precise representations of mechanism to be able to query and exploit computationally
•  Built in an open, scalable, scientifically credible way, to encourage sustained contribution, and to take advantage of “web effects”


How do we get there?  

•  Interoperation is paramount, but modeling is hard: Work with the OBO Foundry
•  Build a skilled community
•  Use (open!) Semantic Web Technologies to enable web effects
•  Support and nurture a growing and vigorous community (SWAN, BIRN, OBI) all of whom build on the rest and enable others to build more
•  Work to advance key technologies and infrastructure - text mining, structured abstracts, query, reasoning.


What KR in the trenches looks like