8/12/2019 Computing With Knowledge-20070717
http://slidepdf.com/reader/full/computing-with-knowledge-20070717 1/23
Computing with Knowledge
Alan Ruttenberg and Jonathan Rees
Informatics and Interactomes in Huntington’s Disease Research
July 17, 2007
Science Commons
• Accelerating the scientific research cycle through targeted projects
– Publishing: helping authors retain some rights
– Materials transfer: lowering transaction costs
– Knowledge management: enabling automated manipulation of data and curated findings
• Open source KM: using a ‘semantic web’ approach to cultivate network effects
Using knowledge in data analysis
• Effective work depends on use of previous scientific results
• Researchers are constantly hunting for papers relevant to their problems - this is time consuming and error-prone
• Use of prior knowledge is uneven and unsystematic
• Computing with the interactome is proving to be a useful tool
• How can we improve on its use, and extend the lesson to other forms of knowledge?
What worked at Millennium?
• Collecting structured knowledge
– Integrated public, licensed, and internal KBs
– The best licensable KB: Ingenuity Systems
• Developing and applying methods that exploited the knowledge base to analyze experimental data
– Network based algorithms, such as PARIS
– Tools for working with sets (categories)
• Running targeted queries against collected knowledge to supply scientists with answers to specific questions
– What is known about the cell lines we use?
– What are transcription factors and targets in pathways of interest?
– What molecular processes are known to be disease specific?
The rest of this talk
• Present examples of how we compute with knowledge now
– Activity center algorithm for microarrays
– Working with network statistics
– Query across integrated databases
• Discuss limitations and where we want to go
• Talk about what’s needed to get there
PARIS: Activity center analysis
• Goal: Use prior knowledge to extract higher quality signal from expression data.
• Knowledge used: Pairs of interacting proteins, as inferred from human, mouse and rat findings in KB, define a network where nodes are proteins and edges are interactions.
• Strategy: Score each gene using its activity combined with activities of its neighbors; obtain P-values by testing significance; display using network layout based on distance between genes in functional network.
Method described in Pradines et al., J Biopharm. Stat., 14 (3) 2004, 701-721
Activity center analysis
Full Interaction Network + Data, defining activity = Active Sub-network
Full Interaction Network: Functional Interactions involving Gene Products
• Binds
• Phosphorylates
• Regulates
• Cleaves…
Data, defining activity: Activity
• Compound vs. Normal
• Knockout vs. Wild Type
• Responders vs. Non-responders
Active Sub-network: Hints on the Cellular Processes
• Perturbed by a compound
• Downstream of a target
• Involved in drug resistance
Scoring activity
• Compute activity score si for each gene in the network
[Formula for si; its labelled parts are a neighborhood term (ai) and an overlap term (ij)]
• Use Monte Carlo simulation to assess significance of scores, to yield a p-value answering: how unusual is this level of activity?
[Figure: frequency distribution of scores, with si marked on the score axis]
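The scoring and significance steps on this slide can be sketched in Python. This is a minimal illustration and not the PARIS method itself: the slide's formula is not preserved here, so the score below (a gene's own activity plus the mean activity of its neighbors) is an assumed stand-in, the null model (shuffling activity values across genes) is likewise an assumption, and the gene names and activity values are hypothetical.

```python
import random

def activity_scores(activity, neighbors):
    """Assumed score: own activity plus the mean activity of network neighbors."""
    scores = {}
    for gene, own in activity.items():
        nbrs = neighbors.get(gene, [])
        nbr_mean = sum(activity[n] for n in nbrs) / len(nbrs) if nbrs else 0.0
        scores[gene] = own + nbr_mean
    return scores

def monte_carlo_p_values(activity, neighbors, n_iter=1000, seed=0):
    """Empirical p-value per gene: how often a shuffled network scores at least as high."""
    rng = random.Random(seed)
    observed = activity_scores(activity, neighbors)
    genes = list(activity)
    values = [activity[g] for g in genes]
    exceed = {g: 0 for g in genes}
    for _ in range(n_iter):
        rng.shuffle(values)  # assumed null: activities reassigned to genes at random
        null = activity_scores(dict(zip(genes, values)), neighbors)
        for g in genes:
            if null[g] >= observed[g]:
                exceed[g] += 1
    # Add-one correction keeps every p-value strictly above zero.
    return {g: (exceed[g] + 1) / (n_iter + 1) for g in genes}

# Hypothetical toy data: g1, g2, g3 form an active, mutually connected cluster.
activity = {"g1": 3.0, "g2": 2.5, "g3": 2.8, "g4": 0.1,
            "g5": 0.2, "g6": 0.0, "g7": 0.1, "g8": 0.05}
neighbors = {"g1": ["g2", "g3"], "g2": ["g1", "g3"], "g3": ["g1", "g2"],
             "g4": ["g5"], "g5": ["g4", "g6"], "g6": ["g5"],
             "g7": ["g8"], "g8": ["g7"]}

p_values = monte_carlo_p_values(activity, neighbors)
for gene in sorted(p_values, key=p_values.get):
    print(gene, round(p_values[gene], 4))
```

Genes in the active cluster should come out with small p-values because few random reassignments place high activity on a gene and on all of its neighbors at once.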
Exploring an activity center in an
inflammation experiment using PARIS
Edge-count statistics
• Goals: Exploit interaction network structure to analyze connectivity between and within sets; mine the network itself for novel relationships and structure.
• Knowledge used: Combinations of networks and sets.
• Strategy: Apply theory of random graphs to category scoring, module discovery, and list expansion.
The problem with counting edges
About 2 edges/node
About 5 edges/node
Do the 3 edges that link these groups have the same significance?
Null model: Random network
with fixed degree sequence
At each step pick two edges and swap end nodes
Each node has the same number of edges after a swap
[Figure: (1) the initial network; (2) the network 25 swaps later]
In this network there are four edges between pink and blue sets compared to one in the initial network
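The swap procedure can be sketched as follows. This is an illustrative Python version under stated assumptions: the graph is simple and undirected, swaps that would create a self-loop or a duplicate edge are rejected, and the node names, set sizes, and the simulation-based p-value are hypothetical additions (the talk itself goes on to use analytic formulas).

```python
import random
from collections import Counter

def degree_preserving_swaps(edges, n_swaps, seed=0):
    """Randomize an undirected simple graph by swapping end nodes of edge pairs."""
    rng = random.Random(seed)
    edge_list = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edge_list}
    done = attempts = 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        a, b = edge_list[i]
        c, d = edge_list[j]
        if rng.random() < 0.5:
            c, d = d, c  # choose the orientation of the second edge at random
        if a == d or c == b:
            continue  # would create a self-loop
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in present or new2 in present:
            continue  # would create a duplicate edge
        present.discard(frozenset((a, b)))
        present.discard(frozenset((c, d)))
        present.update((new1, new2))
        edge_list[i], edge_list[j] = (a, d), (c, b)
        done += 1
    return edge_list

def degrees(edges):
    """Number of edges touching each node."""
    return Counter(node for edge in edges for node in edge)

def count_between(edges, set_a, set_b):
    """Number of edges with one end in set_a and the other in set_b."""
    return sum(1 for u, v in edges
               if (u in set_a and v in set_b) or (u in set_b and v in set_a))

def empirical_p_between(edges, set_a, set_b, n_networks=200, swaps=50, seed=0):
    """Fraction of randomized networks with at least the observed edge count."""
    observed = count_between(edges, set_a, set_b)
    hits = sum(
        count_between(degree_preserving_swaps(edges, swaps, seed + k), set_a, set_b)
        >= observed
        for k in range(n_networks)
    )
    return (hits + 1) / (n_networks + 1)

# Hypothetical toy network: two four-node rings joined by a single edge.
pink = {"p1", "p2", "p3", "p4"}
blue = {"b1", "b2", "b3", "b4"}
edges = [("p1", "p2"), ("p2", "p3"), ("p3", "p4"), ("p4", "p1"),
         ("b1", "b2"), ("b2", "b3"), ("b3", "b4"), ("b4", "b1"),
         ("p1", "b1")]

randomized = degree_preserving_swaps(edges, 25)
print(count_between(edges, pink, blue), count_between(randomized, pink, blue))
print(empirical_p_between(edges, pink, blue))
```

Because each accepted swap removes one edge from and adds one edge to all four end nodes involved, the degree sequence of the randomized network matches the original, which is what makes the comparison fair for sets with different edges-per-node densities.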
Approximate (but fast) analytic formulas exist
[Figure: two lists, L1 and L2, and a node X; labels a=3 and k=2]
Fast enough to interactively score 10,000s of gene sets
Three statistics: Pa, Pb, Pl
Pa: Edges from a single node to a list (a=attachment)
Pb: Edges between two lists of genes (b=bipartite)
Pl: Number of edges within a list (l=list)
Pradines, Farutin, Rowley & Dancik, J. Comp. Biol 12(2), 2005, 113-128
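The three edge counts that these statistics are built on can be sketched in Python. This only computes the raw counts; turning a count into a P-value requires the random-network null model or the analytic formulas from the cited paper, neither of which is reproduced here. The function names and the toy network are hypothetical, and the two lists are assumed to be disjoint.

```python
def edges_to_list(edges, node, gene_list):
    """Attachment count: edges from a single node to members of a list."""
    members = set(gene_list)
    return sum(1 for u, v in edges
               if (u == node and v in members) or (v == node and u in members))

def edges_between_lists(edges, list_1, list_2):
    """Bipartite count: edges with one end in each list (lists assumed disjoint)."""
    s1, s2 = set(list_1), set(list_2)
    return sum(1 for u, v in edges
               if (u in s1 and v in s2) or (u in s2 and v in s1))

def edges_within_list(edges, gene_list):
    """List count: edges whose two ends are both in the list."""
    members = set(gene_list)
    return sum(1 for u, v in edges if u in members and v in members)

# Hypothetical toy network with a node x and two gene lists.
edges = [("x", "g1"), ("x", "g2"), ("x", "g3"), ("g1", "g2"),
         ("g3", "h1"), ("h1", "h2"), ("g2", "h2")]
list_1 = ["g1", "g2", "g3"]
list_2 = ["h1", "h2"]

print(edges_to_list(edges, "x", list_1))            # attachment count
print(edges_between_lists(edges, list_1, list_2))   # bipartite count
print(edges_within_list(edges, list_1))             # within-list count
```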
Pl profile
• Sort genes by expression data and evaluate how well the top n genes map to known pathways.
[Figure: Log(Pl) against number of genes, over the time course of treatment of model cells, annotated with the optimal number of genes for mapping to pathways]
• Conclusion: perturbed pathways are best represented by 300 genes at 1h and 3000 genes at 3h, so it is important to take early (or many) time points to study compound effect
Answering questions
• Goals: Get answers to questions posed to the body of collected knowledge in an effective way.
• Knowledge used: Publicly available databases, text mining!
• Strategy: Integrate knowledge using careful modeling, exploiting open Semantic Web standards and technologies
A simple target discovery question
Signal transduction pathways are considered to be rich in “druggable” targets - proteins that might respond to chemical therapy
CA1 Pyramidal Neurons are known to be particularly damaged in Alzheimer’s disease.
Casting a wide net, can we find candidate genes known to be involved in signal transduction and active in Pyramidal Neurons?
There are a lot of high quality public databases
NeuronDB
BAMS
NC Annotations
Homologene
SWAN
EntrezGene
Gene Ontology
Mammalian Phenotype
PDSP Ki
BrainPharm
AlzGene
Antibodies
PubChem
MESH
Reactome
Allen Brain Atlas
Publications
A SPARQL query spanning four sources
prefix go: <http://purl.org/obo/owl/GO#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix owl: <http://www.w3.org/2002/07/owl#>
prefix mesh: <http://purl.org/commons/record/mesh/>
prefix sc: <http://purl.org/science/owl/sciencecommons/>
prefix ro: <http://www.obofoundry.org/ro/ro.owl#>

select ?genename ?processname
where
{ graph <http://purl.org/commons/hcls/pubmesh>
    { ?paper ?p mesh:D017966 .
      ?article sc:identified_by_pmid ?paper.
      ?gene sc:describes_gene_or_gene_product_mentioned_by ?article.
    }
  graph <http://purl.org/commons/hcls/goa>
    { ?protein rdfs:subClassOf ?res.
      ?res owl:onProperty ro:has_function.
      ?res owl:someValuesFrom ?res2.
      ?res2 owl:onProperty ro:realized_as.
      ?res2 owl:someValuesFrom ?process.
      graph <http://purl.org/commons/hcls/20070416/classrelations>
        { {?process <http://purl.org/obo/owl/obo#part_of> go:GO_0007166}
          union
          {?process rdfs:subClassOf go:GO_0007166 }
        }
      ?protein rdfs:subClassOf ?parent.
      ?parent owl:equivalentClass ?res3.
      ?res3 owl:hasValue ?gene.
    }
  graph <http://purl.org/commons/hcls/gene>
    { ?gene rdfs:label ?genename }
  graph <http://purl.org/commons/hcls/20070416>
    { ?process rdfs:label ?processname }
}
Mesh: Pyramidal Neurons
Pubmed: Journal Articles
Entrez Gene: Genes
GO: Signal Transduction
Inference required
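The logic of this query, including the step marked "Inference required", can be illustrated with a small in-memory join in Python. This is a toy sketch, not a SPARQL engine: only mesh term D017966 and GO_0007166 come from the query above, while every paper, gene, and other GO identifier is hypothetical. It also assumes that the inference in question is following part_of and subClassOf links transitively up to GO_0007166.

```python
def ancestors(term, parents):
    """All terms reachable by repeatedly following parent (is_a / part_of) links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def find_genes(mesh_term, go_root, paper_mesh, gene_papers, gene_processes, go_parents):
    """Genes mentioned in papers tagged with mesh_term and annotated below go_root."""
    papers = {p for p, terms in paper_mesh.items() if mesh_term in terms}
    results = set()
    for gene, mentioned_in in gene_papers.items():
        if not papers & mentioned_in:
            continue  # gene is not mentioned by any matching paper
        for process in gene_processes.get(gene, ()):
            if go_root in ancestors(process, go_parents):
                results.add((gene, process))
    return sorted(results)

# Hypothetical toy data standing in for the four sources.
PAPER_MESH = {"pmid:1": {"D017966"}, "pmid:2": {"D_other"}, "pmid:3": {"D017966"}}
GENE_PAPERS = {"geneA": {"pmid:1"}, "geneB": {"pmid:2"}, "geneC": {"pmid:3"}}
GENE_PROCESSES = {"geneA": {"GO_hyp_1"}, "geneB": {"GO_hyp_1"}, "geneC": {"GO_hyp_3"}}
GO_PARENTS = {"GO_hyp_1": {"GO_hyp_2"}, "GO_hyp_2": {"GO_0007166"},
              "GO_hyp_3": {"GO_hyp_4"}}

matches = find_genes("D017966", "GO_0007166",
                     PAPER_MESH, GENE_PAPERS, GENE_PROCESSES, GO_PARENTS)
print(matches)
```

In the toy data only geneA qualifies: geneB is annotated to a qualifying process but appears in no matching paper, and geneC appears in a matching paper but its process does not lead up to GO_0007166.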
Results: genes, processes
DRD1, 1812 adenylate cyclase activation
ADRB2, 154 adenylate cyclase activation
ADRB2, 154 arrestin mediated desensitization of G-protein coupled receptor protein signaling pathway
DRD1IP, 50632 dopamine receptor signaling pathway
DRD1, 1812 dopamine receptor, adenylate cyclase activating pathway
DRD2, 1813 dopamine receptor, adenylate cyclase inhibiting pathway
GRM7, 2917 G-protein coupled receptor protein signaling pathway
GNG3, 2785 G-protein coupled receptor protein signaling pathway
GNG12, 55970 G-protein coupled receptor protein signaling pathway
DRD2, 1813 G-protein coupled receptor protein signaling pathway
ADRB2, 154 G-protein coupled receptor protein signaling pathway
CALM3, 808 G-protein coupled receptor protein signaling pathway
HTR2A, 3356 G-protein coupled receptor protein signaling pathway
DRD1, 1812 G-protein signaling, coupled to cyclic nucleotide second messenger
SSTR5, 6755 G-protein signaling, coupled to cyclic nucleotide second messenger
MTNR1A, 4543 G-protein signaling, coupled to cyclic nucleotide second messenger
CNR2, 1269 G-protein signaling, coupled to cyclic nucleotide second messenger
HTR6, 3362 G-protein signaling, coupled to cyclic nucleotide second messenger
GRIK2, 2898 glutamate signaling pathway
GRIN1, 2902 glutamate signaling pathway
GRIN2A, 2903 glutamate signaling pathway
GRIN2B, 2904 glutamate signaling pathway
ADAM10, 102 integrin-mediated signaling pathway
GRM7, 2917 negative regulation of adenylate cyclase activity
LRP1, 4035 negative regulation of Wnt receptor signaling pathway
ADAM10, 102 Notch receptor processing
ASCL1, 429 Notch signaling pathway
HTR2A, 3356 serotonin receptor signaling pathway
ADRB2, 154 transmembrane receptor protein tyrosine kinase activation (dimerization)
PTPRG, 5793 transmembrane receptor protein tyrosine kinase signaling pathway
EPHA4, 2043 transmembrane receptor protein tyrosine kinase signaling pathway
NRTN, 4902 transmembrane receptor protein tyrosine kinase signaling pathway
CTNND1, 1500 Wnt receptor signaling pathway

Many of the genes are indeed related to Alzheimer’s Disease through gamma secretase (presenilin) activity
What we’d like to do better
• Broader knowledge base - cells, anatomy, physiology, behavior, protocols, reagents
• Beyond simple interaction: More precise representations of mechanism to be able to query and exploit computationally
• Built in an open, scalable, scientifically credible way, to encourage sustained contribution, and to take advantage of “web effects”
How do we get there?
• Interoperation is paramount, but modeling is hard: Work with the OBO Foundry
• Build a skilled community
• Use (open!) Semantic Web Technologies to enable web effects
• Support and nurture a growing and vigorous community (SWAN, BIRN, OBI) all of whom build on the rest and enable others to build more
• Work to advance key technologies and infrastructure - text mining, structured abstracts, query, reasoning.
What KR in the trenches looks like