Name            Proba non-random   Count in case   Expected count in random pool   StDev of random pool
PYGL_HUMAN      96.58%             4               0.7968                          0.35122176
RHOC_HUMAN      95.77%             4               0.768                           0.442368
GLTP_HUMAN      95.77%             4               0.768                           0.442368
C43BP_HUMAN     95.77%             4               0.768                           0.442368
FUT8_HUMAN      95.77%             4               0.768                           0.442368
RET7_HUMAN      95.77%             4               0.768                           0.442368
CP2E1_HUMAN     94.43%             5               1.4304                          0.70953984
2ABA_HUMAN      93.88%             4               0.9984                          0.55148544
AUHM_HUMAN      93.88%             4               0.9984                          0.55148544
DX39B_HUMAN     93.66%             4               1.3536                          0.44411904
NGF_HUMAN       93.49%             11              5.0112                          2.33312256
NTRK1_HUMAN     93.49%             11              5.0112                          2.33312256
KIF11_HUMAN     93.03%             5               1.5648                          0.82308096
Systems biology in polypharmacology: predicting and explaining off-target effects
Bourne lab at UCSD
Under the supervision of Pr. Bourne
Under the direction of Pr. Bart Deplancke
Andrei Kucharavy, EPFL SV 2013, Computational Biology minor
Problem
Image courtesy of Scannell et al. (2012). Diagnosing the decline in pharmaceutical R&D efficiency. Nature Reviews Drug Discovery 11, 191-200.
AstraZeneca         $11.8 billion per drug
GlaxoSmithKline     $8.2 billion per drug
Sanofi              $7.9 billion per drug
Roche Holding AG    $7.8 billion per drug
Pfizer Inc          $7.7 billion per drug
Pharma Big 5 drug design expenditures per drug as of 2012 (Matthew Herper @ Forbes)
Problem
95% of remaining leads fail here
>95% of leads fail here
Image courtesy of Alzheimer's Drug Discovery Foundation
One disease, one gene, one drug
Step 1: find a gene relevant to a disease
Step 2: design small molecule inhibitor for it
Step 3: test it on cellular / animal models
Step 4: discover secondary effects or absence of therapeutic effect
Step 5: modify lead to control toxicity
Repeat steps 3-5 until no more funds are available
If you are lucky, secondary effects are minor or absent: get more funds and move to human trials.
Pay attention to unexpected sec. effects
Pay attention to absence of therapeutic effect in humans
Unexpected pharmacological effects
Absence of therapeutic effect: main cause of rational drug design failure in the 1990s
Has been overcome with a better understanding of biology
Secondary effects: Cyt-c: well understood and controlled
Unspecific binding: very frequent, hard to predict, hard to interpret
Binding: absolutely no idea whatsoever about what is going on. The drug was designed to bind one single target, but often binds many others. Due to protein conformational variation, complex catalytic sites and post-translational modifications of different proteins, predicting off-target binding is a nightmarish job.
Polypharmacology
Specific agonist / antagonist designs are rare: protein site similarity
catalytic sites within complexes
Some drugs owe their pharmacological action to their lack of specificity: Entacapone
Ibogaine
Chlorpromazine
Kanamycin
Polypharmacology: use computational methods to predict all the targets a small molecule is likely to perturb
Use systems biology to predict the consequences of such perturbations: secondary effects
Unexpected therapeutic effect (repositioning)
Unexpected absence of therapeutic effect (animal model / human differences)
Scope of the master project
Prediction of the perturbed target set: Drugdesigntech since 2009
Bourne lab since 2007-2008
Analysis and interpretation of the perturbed target set is still largely manual. The goal of this master project is to automate it.
Image courtesy of Xie et al. (2011). Drug discovery using chemical systems biology: weak inhibition of multiple kinases may contribute to the anti-cancer effect of nelfinavir. PLoS Computational Biology
Master project environment
EPFL engineering internship
Tool for integrative bioinformatics platform
Used for biotech consulting (polypharmacological effect prediction)
> 1000 drugs and ~1300 human proteins
The Bourne Lab: RCSB PDB, SuperTarget, IEDB, BioLit
Reliable pipeline for drug off-target effect prediction (4530 protein models, 140 approved drugs)
7 publications in polypharmacology
Polypharmacological action mechanisms recovery
Source: list of proteins perturbed by a drug
Wanted: mechanisms of unexpected pharmacological action, understandable for a biologist (pathways, biological entities, mechanism names)
Ordered by relevance
Unexpected pharmacological action mechanism model, usable for prediction on new drugs
Rigid structure of Interactions = Interactome
Knowledge access structure = GO + pathway names
Global idea
Platelet activation
Immune response onset
Th17 activation
Polypharmacological effect model suited for prediction on new drugs
Polypharmacological effect mechanism understandable for biology expert
The devil is in the details
How to retrieve relevant annotation and sort it by relevance?
How to determine which targets are to be included in the model?
Missiuro's information flow and protein informativity
Image courtesy of Missiuro et al. (2009). Information flow analysis of interactome networks. PLoS Computational Biology 5, e1000350.
Each protein transmits some information to all the other proteins within the interactome / set of interest (otherwise evolution would have eliminated it)
Information can only be transmitted through direct interaction (contact, co-complex, participation in the same biochemical reaction)
The information conductance of an edge is proportional to the interaction importance or confidence
The information flow is computed between all pairs of proteins within the set (Kirchhoff's laws + matrix operations).
A set-specific informativity score is defined for each element of the interactome as the sum of all pairwise information flows through it
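A minimal sketch of the computation described above, in the spirit of Missiuro et al. (2009) but not their code. It assumes a connected graph whose edges carry conductances (interaction confidences), builds the conductance (Laplacian) matrix, grounds the sink because that matrix is singular, and takes a node's flow as half the summed |current| on its incident edges (1 for the source and the sink). The function names `node_flows` and `informativity` are hypothetical.

```python
from itertools import combinations


def solve(a, b):
    """Solve the dense linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def node_flows(nodes, edges, source, sink):
    """Current through every node for a unit current injected at source and removed at sink.

    edges maps (u, v) pairs to conductances; the graph is assumed to be connected.
    """
    idx = {v: i for i, v in enumerate(nodes)}
    lap = [[0.0] * len(nodes) for _ in nodes]  # conductance matrix M
    for (u, v), g in edges.items():
        i, j = idx[u], idx[v]
        lap[i][i] += g
        lap[j][j] += g
        lap[i][j] -= g
        lap[j][i] -= g
    # M is singular (rows sum to zero): ground the sink (V_sink = 0) by dropping its row/column.
    keep = [i for i in range(len(nodes)) if i != idx[sink]]
    sol = solve([[lap[i][j] for j in keep] for i in keep],
                [1.0 if i == idx[source] else 0.0 for i in keep])
    volt = [0.0] * len(nodes)
    for pos, i in enumerate(keep):
        volt[i] = sol[pos]
    flow = {v: 0.0 for v in nodes}
    for (u, v), g in edges.items():
        current = abs(g * (volt[idx[u]] - volt[idx[v]]))  # Ohm's law on the edge
        # At interior nodes inflow equals outflow, so half the summed |current| is the throughput.
        flow[u] += current / 2
        flow[v] += current / 2
    flow[source] = flow[sink] = 1.0  # the whole unit current enters at source and leaves at sink
    return flow


def informativity(nodes, edges, target_set):
    """Set-specific informativity: sum of node flows over all pairs of proteins in target_set."""
    score = {v: 0.0 for v in nodes}
    for s, t in combinations(target_set, 2):
        for v, f in node_flows(nodes, edges, s, t).items():
            score[v] += f
    return score
```

On a four-node "diamond" with unit conductances, `node_flows([1, 2, 3, 4], {(1, 2): 1.0, (1, 3): 1.0, (2, 4): 1.0, (3, 4): 1.0}, 1, 4)` sends half of the current through each of nodes 2 and 3; doubling the conductances on the 1-3-4 route shifts the split to 1/3 and 2/3.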
If Time 1:
Math Behind Information Flow
Kirchhoff's current law: for each node, except for the sink and the source, the sum of entering currents equals the sum of exiting currents
Ohm's law: for each edge, V = I*R = I/G, where V is the voltage drop across the edge
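Example network: nodes 1-4 with edges 1-2 (G1), 1-3 (G2), 2-4 (G3), 3-4 (G4); a unit current enters at node 1 (source) and leaves at node 4 (sink).
Conductance matrix M, current vector J, voltage vector V:
$$M = \begin{pmatrix} G_1+G_2 & -G_1 & -G_2 & 0 \\ -G_1 & G_1+G_3 & 0 & -G_3 \\ -G_2 & 0 & G_2+G_4 & -G_4 \\ 0 & -G_3 & -G_4 & G_3+G_4 \end{pmatrix}, \qquad J = \begin{pmatrix} 1 \\ 0 \\ 0 \\ -1 \end{pmatrix}, \qquad V = \begin{pmatrix} V_1 \\ V_2 \\ V_3 \\ V_4 \end{pmatrix}$$
Solve M*V = J; since every row of M sums to zero, M is singular, so fix a reference potential (e.g. V4 = 0) before solving; then use V to determine the information flow through each node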
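For the four-node example network (edges 1-2: G1, 1-3: G2, 2-4: G3, 3-4: G4; unit current from node 1 to node 4), the system can also be solved by hand, because the two routes through nodes 2 and 3 are series pairs connected in parallel. A worked derivation, taking V4 = 0 as the reference potential:

$$G_{\mathrm{eff}} = \frac{G_1 G_3}{G_1 + G_3} + \frac{G_2 G_4}{G_2 + G_4}, \qquad V_1 - V_4 = \frac{1}{G_{\mathrm{eff}}}$$

$$I_{1\to 2\to 4} = \frac{1}{G_{\mathrm{eff}}}\,\frac{G_1 G_3}{G_1 + G_3}, \qquad I_{1\to 3\to 4} = \frac{1}{G_{\mathrm{eff}}}\,\frac{G_2 G_4}{G_2 + G_4}, \qquad I_{1\to 2\to 4} + I_{1\to 3\to 4} = 1$$

$$V_4 = 0, \qquad V_1 = \frac{1}{G_{\mathrm{eff}}}, \qquad V_2 = \frac{I_{1\to 2\to 4}}{G_3}, \qquad V_3 = \frac{I_{1\to 3\to 4}}{G_4}$$

Substituting back, each interior row of M*V = J gives zero net current; for row 2, G1(V2 - V1) + G3(V2 - V4) = -I(1→2→4) + I(1→2→4) = 0. With all conductances equal to 1, G_eff = 1, V = (1, 1/2, 1/2, 0), and nodes 2 and 3 each carry half of the information flow.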
Missiuro's information flow and protein informativity
Advantages over betweenness, degree and edge degree: recovers weak multi-hub regulators
Better at predicting essential genes
Better at predicting genes essential for a specific function (organ development)
Advantage over stoichiometric methods: no need to solve 64k differential equations (unstable!)
Reflects not only metabolism, but also regulation
Image courtesy of Missiuro et al. (2009). Information flow analysis of interactome networks. PLoS Computational Biology 5, e1000350.
Model creation
Recover targets affected by drugs with a given polypharmacological effect
Compute the information circulation within interactome for these drugs
Include all the targets with significant informativity => hidden targets
Retrieve relevant annotation
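The slides do not say how "significant informativity" is decided. A minimal sketch, assuming informativity scores have already been computed for the drugs' real target set and for several random target sets of the same size (a hypothetical null model), and that a protein counts as a hidden target when its z-score against those random runs reaches a hypothetical cut-off `z_min`:

```python
import math
from statistics import mean, stdev


def hidden_targets(observed, random_runs, known_targets, z_min=2.0):
    """Proteins outside the known target set whose informativity is unusually high.

    observed:      {protein: informativity} for the drugs' actual target set
    random_runs:   list (at least 2) of {protein: informativity} for random target sets
    known_targets: proteins already in the target set (excluded from the result)
    z_min:         hypothetical significance cut-off on the z-score
    """
    z_scores = {}
    for protein, score in observed.items():
        if protein in known_targets:
            continue
        null = [run.get(protein, 0.0) for run in random_runs]
        mu, sd = mean(null), stdev(null)
        if sd > 0:
            z = (score - mu) / sd
        else:
            z = math.inf if score > mu else 0.0
        if z >= z_min:
            z_scores[protein] = z
    # Most surprising proteins first.
    return sorted(z_scores, key=z_scores.get, reverse=True)
```

A z-score is only one possible choice; an empirical p-value (fraction of random runs scoring at least as high) would avoid the normality assumption at the cost of needing many more random runs.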
Not all GO terms are equivalent
GO term informativity (analogous to protein informativity in Missiuro et al.)
Expand annotations: T-cell apoptosis regulation => T cell + apoptosis + immune system + ...
Define term informativity:
Use it to compute the flow through each term in a pair of proteins:
Informativity = conductance
Compute total informativity within a group as the sum of flows through each term in each pair, divided by the number of targets squared
Total targets
Targets annotated with a given GO term
Same secondary effect might have distinct mechanisms
Cluster affected targets by their annotation similarity
Compute GO-based information circulation within each cluster and sort GO terms by informativity
Use clusters as additional polypharmacological action models
Fixed tension between sink and source: each GO term shared by the sink and the source passes information current
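The exact term-informativity formula is not recoverable from the slides, only its ingredients (total targets, targets annotated with a given GO term). A minimal sketch, assuming informativity is the information content -ln(n_t / N) over a background set of targets, that annotations are expanded to all ancestor terms through a hypothetical `parents` map, and that each shared term acts as a parallel conductor under a fixed unit tension, so the current through it equals its informativity. Group totals are divided by the number of targets squared, as stated above. All names and the toy ontology are illustrative.

```python
from itertools import combinations
from math import log


def expand(terms, parents):
    """Return the terms plus all their ancestors (parents: hypothetical term -> parent terms map)."""
    seen, stack = set(), list(terms)
    while stack:
        term = stack.pop()
        if term not in seen:
            seen.add(term)
            stack.extend(parents.get(term, ()))
    return seen


def rank_go_terms(group, background, parents):
    """Rank GO terms by the information flowing through them within a group of targets.

    group:      proteins of one cluster / secondary-effect model
    background: {protein: set of GO terms} for all targets (defines term frequencies)
    """
    expanded = {p: expand(terms, parents) for p, terms in background.items()}
    total = len(expanded)
    counts = {}
    for terms in expanded.values():
        for term in terms:
            counts[term] = counts.get(term, 0) + 1
    # Assumed informativity: information content -ln(n_t / N); rare terms conduct more.
    info = {term: -log(c / total) for term, c in counts.items()}
    flow = {}
    for a, b in combinations(group, 2):
        # Unit tension across the pair: shared terms are parallel conductors,
        # so the current through each one equals its conductance (= informativity).
        for term in expanded[a] & expanded[b]:
            flow[term] = flow.get(term, 0.0) + info[term]
    n2 = len(group) ** 2
    return sorted(((t, f / n2) for t, f in flow.items()), key=lambda tf: tf[1], reverse=True)
```

With a toy ontology where "T-cell apoptosis regulation" expands to "T cell", "apoptosis" and "immune system", a generic term such as "apoptosis" that also annotates unrelated background proteins ends up below the more specific shared terms.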
Clustering
Advantages of HSC clustering: no a priori assumption on the number of clusters: depending on the secondary effect there might be 2 possible modes of action, or 20
Based on the topological features of the network
Disadvantages of HSC clustering: naive implementation is doubly exponential (NP-hard)
Probabilistic form required for > 20 targets
Clustering
Bond Energy Algorithm: rapid & simple
Markov clustering: derived from random walks (same foundation as PageRank)
Alternates expansion (matrix powers) and inflation (element-wise powers + renormalization) of a random-walk transition matrix
Rapid and stable, but requires two arbitrarily chosen parameters (expansion and inflation)
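To make the Markov clustering step concrete, here is a minimal dependency-free sketch of the standard MCL loop (not the pipeline's own implementation). It assumes a symmetric 0/1 adjacency matrix of affected targets; `expansion` and `inflation` are the two arbitrary parameters mentioned above, and their default values here are only common choices.

```python
def _normalize_columns(m):
    """Make every column sum to 1 (column-stochastic transition matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)] for i in range(n)]


def _matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def markov_cluster(adj, expansion=2, inflation=2.0, max_iter=100, tol=1e-9):
    """Markov clustering (MCL) of a symmetric adjacency matrix; returns clusters as frozensets."""
    n = len(adj)
    # Self-loops are a common MCL preprocessing step (they damp odd-period oscillations).
    m = _normalize_columns([[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
                            for i in range(n)])
    for _ in range(max_iter):
        prev = m
        for _step in range(expansion - 1):  # expansion: M <- M^e (longer random walks)
            m = _matmul(m, prev)
        # Inflation: element-wise power, then re-normalize (strong flows win, weak ones fade).
        m = _normalize_columns([[v ** inflation for v in row] for row in m])
        if max(abs(m[i][j] - prev[i][j]) for i in range(n) for j in range(n)) < tol:
            break
    # In the converged matrix, each attractor row lists the nodes whose flow ends there.
    clusters = {frozenset(j for j in range(n) if m[i][j] > tol) for i in range(n)}
    clusters.discard(frozenset())
    return sorted(clusters, key=min)
```

Raising `inflation` yields more, smaller clusters; this sensitivity is why the parameters have to be chosen with care.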
GO term informativity advantages
Maps to biological concepts: interpretation by expert biologists => biological sense? (cf. the 2010 Potti scandal at Duke over metagene signatures)
Molecular relation databases typically do badly in some cases: systemic effects (T-cell maturation, circadian rhythm, ...), endocrine regulation, central nervous system (GO, however, isn't the best ontology for this)
Ability to plug in additional data from literature analysis (just account for confidence)
Render bioinformatics disease signatures (vectors of ~100 protein names) readable and understandable for biologists: cf. the Nature Medicine 2010 retraction scandal
Complementarity with pure information circulation methods for the endocrine system: concepts such as an increase in blood pressure might be pretty good signals interpreted by cell membranes, but are impossible to encode in conventional interactomes
Implementation: case of pancreatitis and cirrhosis
Secondary effects from SIDER (EMBL)
Drug-target interactions from Bourne lab and Drugdesigntech simulation results
Group drugs by secondary effect
Filter out targets that are frequently affected in random drug collections (Student's t-test)
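The slides name a Student's t-test but not the exact statistic. A minimal sketch, assuming that for each target we have the number of side-effect drugs hitting it and its hit counts in k random drug collections of the same size, and using the Student-type statistic for one new observation against a sample, t = (x - mean) / (s * sqrt(1 + 1/k)) with k - 1 degrees of freedom. The threshold `t_min` and the function names are hypothetical; a real pipeline would convert t to a p-value.

```python
import math
from statistics import mean, stdev


def t_statistic(case_count, random_counts):
    """Student-type statistic for one observation x against k random-pool counts (k >= 2)."""
    k = len(random_counts)
    mu, s = mean(random_counts), stdev(random_counts)
    diff = case_count - mu
    if s == 0:  # every random collection gave the same count
        return math.copysign(math.inf, diff) if diff else 0.0
    return diff / (s * math.sqrt(1 + 1 / k))


def filter_targets(case_counts, random_counts, t_min=3.0):
    """Keep targets hit more often in the side-effect drug group than in random drug collections.

    case_counts:   {target: number of side-effect drugs hitting it}
    random_counts: {target: [hits in random collection 1, 2, ...]}
    t_min:         hypothetical threshold on t
    """
    scores = {target: t_statistic(count, random_counts[target])
              for target, count in case_counts.items()}
    return sorted((t for t in scores if scores[t] >= t_min), key=scores.get, reverse=True)
```

Targets that many unrelated drugs also hit (high random-pool mean) get a small t and are dropped, which is the point of the filter.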
Pancreatitis: clustering results
Clustering: BEA at UCSD
1 major cluster: RHOC, NGF, NTRK1
Cirrhosis: clustering results
Clustering: HSC at Drugdesigntech
4 major clusters (all 4 were informative and relevant); the most informative of them: KSYK_HUMAN, CSK_HUMAN
Quantitative polypharmacological effect prediction
Outline: compute the information circulation for pharmacological-effect-specific targets
Measure the decrease in information circulation within the all-targets model and the cluster models
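This step is only outlined on the slides. A minimal sketch under two labelled assumptions: (a) a fixed unit tension is applied between each pair of model proteins, so the information a pair exchanges equals its effective conductance, and (b) a drug scales the conductance of every interaction touching one of its targets by (1 - inhibition), which is a hypothetical perturbation model. The decrease is the relative drop in total pairwise circulation. Function names are illustrative.

```python
from itertools import combinations


def _solve(a, b):
    """Gaussian elimination with partial pivoting for a small dense system a @ x = b."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def effective_conductance(nodes, edges, s, t):
    """Conductance between s and t: with a unit current and V_t = 0, G_eff = 1 / V_s."""
    idx = {v: i for i, v in enumerate(nodes)}
    lap = [[0.0] * len(nodes) for _ in nodes]
    for (u, v), g in edges.items():
        i, j = idx[u], idx[v]
        lap[i][i] += g
        lap[j][j] += g
        lap[i][j] -= g
        lap[j][i] -= g
    keep = [i for i in range(len(nodes)) if i != idx[t]]  # ground t
    volt = _solve([[lap[i][j] for j in keep] for i in keep],
                  [1.0 if i == idx[s] else 0.0 for i in keep])
    return 1.0 / volt[keep.index(idx[s])]


def circulation(nodes, edges, model):
    """Total information exchanged by all pairs of model proteins under unit tension."""
    return sum(effective_conductance(nodes, edges, s, t) for s, t in combinations(model, 2))


def circulation_decrease(nodes, edges, model, drug_targets, inhibition):
    """Relative drop in circulation when every interaction of a drug target is damped."""
    if not 0.0 <= inhibition < 1.0:
        raise ValueError("inhibition must be in [0, 1); full inhibition can disconnect the graph")
    damped = {(u, v): g * (1.0 - inhibition) if (u in drug_targets or v in drug_targets) else g
              for (u, v), g in edges.items()}
    return 1.0 - circulation(nodes, damped, model) / circulation(nodes, edges, model)
```

For a four-node diamond with unit conductances and model proteins 1 and 4, half-inhibiting node 2 lowers the effective conductance from 1 to 0.75, i.e. a 25% decrease in circulation.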
Backbone for the interactome information flow computation
NCI-Nature Pathway Interaction Database: no, too small coverage
KEGG Pathway database: no, pathway-oriented and not connected at the level of atomic interactions
UniPathway: no, too small coverage
Reactome.org: yay
Reactome.org : idea
Reactome.org structure: BioPAX: XML / RDF / OWL
Physical entities: proteins, small molecules, complexes, RNA, DNA
Fragments of physical entities
Interactions: degradation / polymerisation / biochemical reactions
Molecular interaction
Genetic interaction
Pathways, Genes, Post-translational modifications...
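A minimal sketch of how reaction records can be turned into interactome edges, assuming a BioPAX Level 3 RDF/XML export in which each `bp:BiochemicalReaction` lists its participants through `bp:left` / `bp:right` elements with `rdf:resource` references. Following the "participation in the same biochemical reaction" rule of the information-flow model, all participants of one reaction are connected pairwise. Real Reactome exports contain many more classes (complexes, catalysis, controls) that this ignores; the function name and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET
from itertools import combinations

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BP = "http://www.biopax.org/release/biopax-level3.owl#"


def reaction_edges(biopax_xml):
    """Connect all participants (left and right sides) of every BiochemicalReaction pairwise."""
    root = ET.fromstring(biopax_xml)
    edges = set()
    for reaction in root.iter(f"{{{BP}}}BiochemicalReaction"):
        participants = set()
        for side in ("left", "right"):
            for ref in reaction.findall(f"{{{BP}}}{side}"):
                participants.add(ref.get(f"{{{RDF}}}resource").lstrip("#"))
        for a, b in combinations(sorted(participants), 2):
            edges.add((a, b))
    return edges


# Hand-written toy document, not real Reactome content.
SAMPLE = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:bp="{BP}">
  <bp:BiochemicalReaction rdf:ID="reaction1">
    <bp:left rdf:resource="#ATP"/>
    <bp:left rdf:resource="#ProteinA"/>
    <bp:right rdf:resource="#ADP"/>
    <bp:right rdf:resource="#ProteinA_p"/>
  </bp:BiochemicalReaction>
</rdf:RDF>"""
```

`reaction_edges(SAMPLE)` returns the six pairwise edges among ATP, ADP, ProteinA and ProteinA_p.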
Reactome.org : reality
Reality of Reactome.org: main connected component: ~22,000 entities, but 3 others with >50 elements
Presence of generic classes: groups of objects
Proteins = mix of proteins, domains, groups, groups of domains
15,000 proteins, 5,000 UniProt references
156 genes, 56 RNA molecules: translation / transcription regulation is not well described
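Statements such as "main connected component: ~22,000 entities, but 3 others with >50 elements" come from a connected-component count over the extracted interaction graph. A minimal breadth-first-search sketch, assuming the graph is given as a list of undirected edges (isolated entities with no edge are not counted); the function name is hypothetical.

```python
from collections import defaultdict, deque


def component_sizes(edges):
    """Sizes of the connected components of an undirected edge list, largest first."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    seen, sizes = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            node = queue.popleft()
            size += 1
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        sizes.append(size)
    return sorted(sizes, reverse=True)
```

A backbone suitable for information-flow computation should have one dominant component; every secondary component is a region where no current can reach the rest of the interactome.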
Reactome.org: incompleteness
Still incomplete and reliant on comments: case of SRC => HiNT database added
Verification of pipeline:
Information routing decay
Image courtesy Wintermute et al. (2010). Emergent cooperation in microbial metabolism. Molecular Systems Biology 6, 407.
Verification of pipeline
Predicting target drugability
186 oral small-molecule drug targets from Overington et al. (2006), How many drug targets are there?
77 plasma membrane targets
1289 total plasma membrane proteins with UniProt references in Reactome.org
Use the following to predict drugability: overall informativity, GO-term-specific informativity, target abundance (higher abundance, more off-target action in case of total inhibition)
Valid targets
Non-targets
Drugability prediction with some complexity
Raw prediction is little better than random: 65% specificity, 60% selectivity
However, if we account for: non-oral, non-small-molecule drugs
Drugs developed or in development since 2006
GO-specific informativity
The fact that Reactome.org / HiNT are bad at representing CNS functions
The prediction results are rather encouraging:75% specificity, 90% selectivity
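To make the quoted percentages concrete, a minimal sketch of how they are computed from a labelled set of valid targets and non-targets. It assumes that "selectivity" on these slides means sensitivity (recall on the valid targets), and uses a hypothetical thresholding rule on a combined score in place of the real predictor.

```python
def specificity_sensitivity(labels, predictions):
    """Return (specificity, sensitivity) for boolean labels (True = known drug target)."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y and p)
    tn = sum(1 for y, p in pairs if not y and not p)
    fp = sum(1 for y, p in pairs if not y and p)
    fn = sum(1 for y, p in pairs if y and not p)
    return tn / (tn + fp), tp / (tp + fn)


def predict_drugable(scores, threshold):
    """Hypothetical rule: call a protein drugable if its combined score reaches the threshold."""
    return [s >= threshold for s in scores]
```

Moving the threshold trades specificity against sensitivity, so a single (specificity, sensitivity) pair describes one operating point rather than the predictor as a whole.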
Before we can conclude
The methods required for information circulation have been coded: information circulation for the target set
Calculation of information variation in case of interactome perturbation
However, before this project can be deemed concluded:
Model creation and model utilization parts have to be assembled into a single pipeline (right now they are separate)
Run model creation and prediction on several secondary effects with random training / testing set validation
Conclusions
GO-based information circulation method seems to work well for secondary effect mechanism retrieval
Reactome.org / HiNT dataset based information circulation method seems to be potentially useful for computationally assisted drug design
Information circulation methods for secondary effects quantitative prediction must be tested before this project can be concluded
Moving further
Finding datasets and people interested in further development of the method:
SNP cumulative effect: requires the ability to project onto the protein 3D structure and estimate protein activity inhibition in different contexts
Drug design: secondary effect prediction: typical pharmaceutical firms' datastores contain way more information about the toxicity of different compounds and allow much more finely tuned modeling of pharmacological effects
Difference between animal and human interactomes: predict unexpected polypharmacological effects upon transition from animal to human trials
Acknowledgements
Pr. Philip Bourne, Pr. Bart Deplancke, Cedric Merlot, Li Xie, Spencer Bliven, Roland Diggelmann
Andreas Prlic, Julia Ponomarenko, Lilia Iakoucheva, Jiang Wang, Cole Christie, Audrey Schenker
THE END
QUESTIONS?
Graph databases
Random matrix theory
Method improvement
If time: Improvements
For retrieving statistically significant targets, abandon naive statistical drug target filtering:
build drug-specific information flows
recover all sufficiently informative proteins for each drug
use those proteins to get statistically significant targets => avoids near-miss errors
When sorting targets: sort the most significant GO terms not by their informativity, but by how much of the information flow associated with them is perturbed by the given target set => avoids the need to tune GO term informativity => better interpretability
If time: Improvements
When computing the information flow: do not consider the information flow between any pair of proteins as constant
Consider the associated tension (voltage) as constant
Unrelated proteins are likely to exchange less information
To avoid information circulation distortion due to GO term correlation: don't use the Tanimoto distance / conductance model for GO-based term circulation
Use the real point-to-point routing within the GO terms graph
If time 1:
Random matrix theory
Molecular evolution: adaptive mutations = survival of the fittest; random mutations = Kimura's drift; tools exist to separate the two
Protein interaction network evolution: adaptive topology modifications; random topology artefacts (phosphorylation pattern modifications due to random mutations)
Separating the two = ????
Nothing in biology makes sense except in the light of evolution. (Theodosius Dobzhansky)
If time 1:
Random matrix theory
In sparse matrices (~ graphs): random matrices have characteristic eigenvalues; all eigenvalues exceeding these values are non-random
Clustering can later be performed in the space generated by the associated eigenvectors of non-random eigenvalues
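The slide states the idea without a concrete null model. A minimal sketch, assuming an Erdos-Renyi G(n, p) null model with p equal to the observed edge density: for dense-enough random graphs the bulk of the adjacency spectrum lies within ±2·sqrt(n·p·(1-p)) (Wigner semicircle), so eigenvalues above that edge are flagged as non-random. Caveats: the largest eigenvalue of any such graph sits near n·p and mostly reflects average degree, and for very sparse graphs the semicircle approximation breaks down. Eigenvalues are computed with a plain cyclic Jacobi routine to stay dependency-free; function names are hypothetical.

```python
import math


def jacobi_eigenvalues(a, max_sweeps=100, tol=1e-12):
    """Eigenvalues of a real symmetric matrix by cyclic Jacobi rotations (small matrices only)."""
    n = len(a)
    a = [list(map(float, row)) for row in a]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-300:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(n):  # A <- A P (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- P^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))


def non_random_eigenvalues(adj):
    """Adjacency eigenvalues lying above the assumed G(n, p) bulk edge 2*sqrt(n*p*(1-p))."""
    n = len(adj)
    edges = sum(adj[i][j] for i in range(n) for j in range(i + 1, n))
    p = edges / (n * (n - 1) / 2)
    bulk_edge = 2 * math.sqrt(n * p * (1 - p))
    return [lam for lam in jacobi_eigenvalues(adj) if lam > bulk_edge]
```

For two disjoint 10-node cliques, the spectrum is 9 (twice) and -1 (18 times); the bulk edge is about 4.47, so exactly two eigenvalues are flagged, matching the two planted groups. Their eigenvectors would then span the space in which clustering is performed.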
If time 2:
Graph Databases
neo4j
Titan DB
If time 2:
Graph Databases
TinkerPop stack: ~SQL for graph databases
If time 3:
General conclusions
Graph databases are worth a try for systems biology applications
We need to assemble one comprehensive, complete and WELL DOCUMENTED resource for computational systems biology
If time 2: Non-scientific lessons learned
Project advancement: never try to reach one huge goal at once
Split the goal in series of small subgoals, each goal resulting in a publication / conference paper
Large bioinformatics resource maintenance
Mechanisms of inter-laboratory collaborations and PhD student supervision