systems biology in polypharmacology: explaining and predicting drug secondary effects. - master...

Name          Proba non-random   Count   Expected count in random poll   StDev in case of random poll
PYGL_HUMAN    96.58%             4       0.7968                          0.35122176
RHOC_HUMAN    95.77%             4       0.768                           0.442368
GLTP_HUMAN    95.77%             4       0.768                           0.442368
C43BP_HUMAN   95.77%             4       0.768                           0.442368
FUT8_HUMAN    95.77%             4       0.768                           0.442368
RET7_HUMAN    95.77%             4       0.768                           0.442368
CP2E1_HUMAN   94.43%             5       1.4304                          0.70953984
2ABA_HUMAN    93.88%             4       0.9984                          0.55148544
AUHM_HUMAN    93.88%             4       0.9984                          0.55148544
DX39B_HUMAN   93.66%             4       1.3536                          0.44411904
NGF_HUMAN     93.49%             11      5.0112                          2.33312256
NTRK1_HUMAN   93.49%             11      5.0112                          2.33312256
KIF11_HUMAN   93.03%             5       1.5648                          0.82308096


Systems biology in polypharmacology: predicting and explaining off-target effects

Bourne lab at UCSD
Under the supervision of Pr. Bourne
Under the direction of Pr. Bart Deplancke

Andrei Kucharavy, EPFL SV 2013, Computational Biology minor

Problem

Image courtesy of Scannell et al. (2012). Diagnosing the decline in pharmaceutical R&D efficiency. Nature Reviews Drug Discovery 11, 191-200.

AstraZeneca         11.8 b$/drug
GlaxoSmithKline     8.2 b$/drug
Sanofi              7.9 b$/drug
Roche Holding AG    7.8 b$/drug
Pfizer Inc          7.7 b$/drug

Pharma Big 5 drug design expenditures as of 2012 (Matthew Herper @ Forbes)

Problem

95% of remaining leads fail here

>95% leads fail here

Image courtesy of Alzheimer's Drug Discovery Foundation

One disease, one gene, one drug

Step 1: find a gene relevant to a disease

Step 2: design a small-molecule inhibitor for it

Step 3: test it on cellular and animal models

Step 4: discover secondary effects or absence of therapeutic effect

Step 5: modify the lead to control toxicity

Repeat steps 3-5 until no more funds are available

If you are lucky and secondary effects are minor or absent, get more funds and move to human trials.

Pay attention to unexpected secondary effects

Pay attention to absence of therapeutic effect in humans

Unexpected pharmacological effects

Absence of therapeutic effect:
Main cause of rational drug design failure in the 1990s

Has been overcome with a better understanding of biology

Secondary effects:
Cyt-c: well understood and controlled

Unspecific binding:
Very frequent
Hard to predict
Hard to interpret

Binding: we have essentially no idea what is going on. The drug was designed to bind one single target, but it often binds many others. Because of protein conformation variation, complex catalytic sites and post-translational modifications of different proteins, predicting off-target binding is a nightmarish job.

Polypharmacology

Specific agonist / antagonist designs are rare:
protein site similarity

catalytic sites within complexes

Some drugs owe their pharmacological action to their unspecificity:
Encaptone

Ibogaine

Chlorpromazine

Kanamycin

Polypharmacology:
Use computational methods to predict all the targets a small molecule is likely to perturb

Use systems biology to predict the consequences of such perturbation:
Secondary effects

Unexpected therapeutic effect (repositioning)

Unexpected absence of therapeutic effect (animal model vs. human difference)

Scope of master project

Prediction of the perturbed target set:
Drugdesigntech since 2009

Bourne lab since 2007-2008

Analysis and interpretation of the perturbed target set is still largely manual. The goal of this master project is to reduce that manual work.

Image courtesy of Xie et al. (2011). Drug discovery using chemical systems biology: weak inhibition of multiple kinases may contribute to the anti-cancer effect of nelfinavir. PLoS Computational Biology

Master project environment

EPFL engineering internship

Tool for integrative bioinformatics platform

Used for biotech consulting (polypharmacological effect prediction)

> 1000 drugs and ~1300 human proteins

The Bourne Lab
RCSB PDB, Supertarget, IEDB, BioLit

Reliable pipeline for drug off-target effect prediction (4530 protein models, 140 approved drugs)

7 publications in polypharmacology

Polypharmacological action mechanisms recovery

Source:
List of proteins perturbed by a drug

Wanted:
Mechanisms of unexpected pharmacological action, understandable for a biologist
Pathway, biological entities, mechanism names

Ordered by relevance

Unexpected pharmacological action mechanism model, usable for prediction on new drugs

Rigid structure of Interactions = Interactome

Knowledge access structure = GO + pathway names

Global idea


Platelet activation

Immune response onset

Th17 activation

Polypharmacological effect model suited for prediction on new drugs

Polypharmacological effect mechanism understandable for biology expert

Devil is in the details

How to retrieve relevant annotation and sort it by relevance?

How to determine which targets are to be included in the model?

Missiuro's information flow and protein informativity

Image courtesy of Missiuro et al. (2009). Information flow analysis of interactome networks. PLoS Computational Biology 5, e1000350.

Each protein transmits some information to all the other proteins within the interactome / set of interest (otherwise evolution would have eliminated it)

Information can only be transmitted through direct interaction (contact, co-complex, participation in the same biochemical reaction)

The information conductance of an edge is proportional to the interaction importance or confidence

The information flow is computed between all pairs of proteins within the set (Kirchhoff's laws + matrix operations).

A set-specific informativity score is defined for each element of the interactome as the sum of all pairwise information flows through it

If Time 1:
Math Behind Information Flow

Kirchhoff's law:
For each node except the sink and the source, the sum of entering currents equals the sum of exiting currents

For each edge, V = I*R = I/G

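Example circuit (figure): nodes 1-4; G1 links nodes 1 and 2, G2 links 1 and 3, G3 links 2 and 4, G4 links 3 and 4; unit current enters at node 1 and leaves at node 4.

Conductance matrix M, current vector J, voltage vector V:

\[
M = \begin{pmatrix}
G_1 + G_2 & -G_1 & -G_2 & 0 \\
-G_1 & G_1 + G_3 & 0 & -G_3 \\
-G_2 & 0 & G_2 + G_4 & -G_4 \\
0 & -G_3 & -G_4 & G_3 + G_4
\end{pmatrix},
\quad
J = \begin{pmatrix} I_1 \\ I_2 \\ I_3 \\ I_4 \end{pmatrix}
  = \begin{pmatrix} 1 \\ 0 \\ 0 \\ -1 \end{pmatrix},
\quad
V = \begin{pmatrix} V_1 \\ V_2 \\ V_3 \\ V_4 \end{pmatrix}
\]

Solve M*V=J; use V to determine information flow through each node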
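A minimal numerical sketch of the M*V=J step, assuming NumPy and a four-node circuit with edges 1-2, 1-3, 2-4 and 3-4 (as read from the conductance matrix), here renumbered 0-3 with every conductance set to 1 for illustration. The function names `pair_flow` and `informativity` are hypothetical, and grounding the sink is one common way to handle the fact that M is singular; it is not necessarily how the project code does it.

```python
import numpy as np

def pair_flow(G, source, sink):
    """Voltages and per-node current for a unit current from source to sink.

    G is a symmetric matrix of edge conductances with a zero diagonal.
    """
    n = G.shape[0]
    M = np.diag(G.sum(axis=1)) - G            # conductance matrix M
    J = np.zeros(n)
    J[source], J[sink] = 1.0, -1.0            # current vector J
    # M is singular (voltages are defined up to a constant), so the sink
    # is grounded (V = 0) and the system is solved for the other nodes.
    keep = [i for i in range(n) if i != sink]
    V = np.zeros(n)
    V[keep] = np.linalg.solve(M[np.ix_(keep, keep)], J[keep])
    edge_current = G * (V[:, None] - V[None, :])   # I = G * (V_i - V_j)
    # Current passing through a node: half of (edge currents + injected current).
    through = 0.5 * (np.abs(edge_current).sum(axis=1) + np.abs(J))
    return V, through

def informativity(G):
    """Sum of per-node currents over all source/sink pairs."""
    n = G.shape[0]
    total = np.zeros(n)
    for s in range(n):
        for t in range(s + 1, n):
            total += pair_flow(G, s, t)[1]
    return total

# Illustrative circuit: unit conductances on edges 0-1, 0-2, 1-3, 2-3.
G = np.zeros((4, 4))
for i, j in [(0, 1), (0, 2), (1, 3), (2, 3)]:
    G[i, j] = G[j, i] = 1.0

V, through = pair_flow(G, source=0, sink=3)
scores = informativity(G)
```

With equal conductances the current splits evenly between the two branches, so the two intermediate nodes each carry half of the unit current for the pair (0, 3).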

Missiuro's information flow and protein informativity

Advantages over betweenness and degree:
Recovers weak multi-hub regulators

Better at predicting essential genes

Better at predicting genes essential for a specific function (organ development)

Advantages over stoichiometric methods:
No need to solve 64k differential equations (unstable!)

Reflects not only metabolism, but also regulation

Image courtesy of Missiuro et al. (2009). Information flow analysis of interactome networks. PLoS Computational Biology 5, e1000350.

Model creation

Recover targets affected by drugs with a given polypharmacological effect

Compute the information circulation within interactome for these drugs

Include all the targets with significant informativity => hidden targets

Retrieve relevant annotation

Not all the GO terms are equivalent

GO term informativity (~ protein informativity in Missiuro et al.)
Expand annotations:
T-cell apoptosis regulation => T cell + apoptosis + immune system + ...

Define term informativity:

Use it to compute the flow through each term in a pair of proteins:
Informativity = conductance

Compute total informativity within a group as a sum of flows through each term in each pair, divided by the number of targets squared

Total targets

Targets annotated with a given GO term
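The GO-based flow can be sketched as follows. The exact informativity formula is not recoverable from the slide, so this sketch ASSUMES an information-content form, -log2(targets annotated with the term / total targets), computed over a background annotation set. The function names, the toy annotations and the unit voltage are hypothetical and only illustrate the "informativity = conductance" idea.

```python
import math
from collections import Counter
from itertools import combinations

def term_informativity(background):
    """ASSUMED form: -log2(fraction of proteins annotated with the term)."""
    total = len(background)
    counts = Counter(term for terms in background.values() for term in terms)
    return {term: -math.log2(n / total) for term, n in counts.items()}

def go_term_flows(targets, background, voltage=1.0):
    """Flow through each GO term, summed over target pairs, divided by n^2."""
    info = term_informativity(background)
    flows = Counter()
    for a, b in combinations(sorted(targets), 2):
        # Each shared term acts as a conductor between the two proteins:
        # current = voltage * conductance, with conductance = informativity.
        for term in background[a] & background[b]:
            flows[term] += voltage * info[term]
    n = len(targets)
    return {term: flow / n ** 2 for term, flow in flows.items()}

# Toy annotations (hypothetical), already expanded into elementary terms.
background = {
    "P1": {"apoptosis", "immune system"},
    "P2": {"apoptosis"},
    "P3": {"apoptosis", "T cell"},
    "P4": {"immune system"},
}
flows = go_term_flows({"P1", "P2", "P4"}, background)
ranking = sorted(flows, key=flows.get, reverse=True)
```

In this toy case the rarer term outranks the more common one even though each is shared by a single pair, which is the behaviour the informativity weighting is meant to give.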

Same secondary effect might have distinct mechanisms

Cluster affected targets by their annotation similarity

Compute GO-based information circulation within each cluster and sort GO terms by informativity

Use clusters as additional polypharmacological action models

Fixed voltage (tension) between sink and source
Each GO term shared by the sink and the source passes an information current

Clustering


Advantages of the HSC clustering
No a priori on the cluster number: depending on the secondary effect, there might be 2 possible modes of action or 20

Based on the topological features of the network

Disadvantages of the HSC clustering
Naive implementation is doubly exponential, NP-complete

Probabilistic form required for > 20 targets

Clustering

Bond Energy Algorithm
Rapid & simple

Markov clustering
Derived from random walk (same foundation as PageRank)

Uses eigenvectors/eigenvalues of a random transition matrix

Rapid and stable, but requires two arbitrarily chosen parameters
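For reference, a compact sketch of the textbook Markov clustering iteration, assuming NumPy. It alternates expansion (matrix power) and inflation (element-wise power followed by column normalisation); these are presumably the two arbitrarily chosen parameters mentioned above. This is the generic algorithm, not necessarily the variant used in the project, and the function name and defaults are illustrative.

```python
import numpy as np

def markov_clustering(adjacency, expansion=2, inflation=2.0,
                      max_iter=100, tol=1e-6):
    """Return clusters (lists of node indices) found by basic MCL."""
    # Self-loops keep every column sum positive and stabilise the iteration.
    M = np.asarray(adjacency, dtype=float) + np.eye(len(adjacency))
    M /= M.sum(axis=0, keepdims=True)          # column-stochastic matrix
    for _ in range(max_iter):
        previous = M
        M = np.linalg.matrix_power(M, expansion)   # expansion: longer walks
        M = M ** inflation                         # inflation: favour strong flows
        M /= M.sum(axis=0, keepdims=True)
        if np.abs(M - previous).max() < tol:
            break
    # Rows that keep non-zero mass list the members of a cluster.
    clusters = set()
    for row in M:
        members = frozenset(int(i) for i in np.flatnonzero(row > 1e-8))
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)

# Two disconnected triangles should come out as two clusters.
adjacency = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]:
    adjacency[i, j] = adjacency[j, i] = 1.0

clusters = markov_clustering(adjacency)
```

Higher inflation values tend to produce more, smaller clusters, which is why the choice of parameters matters for interpretation.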

GO term informativity advantages

Maps to biological concepts
Interpretation by expert biologists => does it make biological sense? (cf. the 2010 Potti scandal at Duke over metagene signatures)

Molecular relation databases typically do badly in some cases:
Systemic effects (T-cell maturation, circadian rhythm, ...)
Endocrine regulation
Central Nervous System (GO however isn't the best ontology for this)

Ability to plug in additional data from literature analysis (just account for confidence)

Renders bioinformatics disease signatures (vectors of ~100 protein names) readable and understandable for biologists: cf. the Nature Medicine 2010 retraction scandal
Complementarity with pure information circulation methods for the endocrine system: concepts such as an increase in blood pressure might be perfectly good signals sensed by cell membranes, but are impossible to encode in conventional interactomes

Implementation: case of pancreatitis and cirrhosis

Secondary effects from SIDER (EMBL)

Drug-target interaction from Bourne lab and Drugdesigntech simulation results

Group drugs by secondary effect

Filter out targets that are frequently affected in random drug collections (Student's t-test)
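The filtering step can be illustrated with a resampling sketch: draw random drug collections of the same size as the secondary-effect group, record how often a target is hit, and compare the observed count with the expected count and standard deviation of those random polls. This uses a plain z-score on simulated polls instead of the Student's t-test named above, and all names and data below are hypothetical.

```python
import math
import random
import statistics

def random_poll_stats(drug_targets, target, group_size, n_polls=1000, seed=0):
    """Expected count and standard deviation of `target` hits in random polls."""
    rng = random.Random(seed)
    drugs = sorted(drug_targets)
    counts = []
    for _ in range(n_polls):
        poll = rng.sample(drugs, group_size)     # random drug collection
        counts.append(sum(target in drug_targets[d] for d in poll))
    return statistics.mean(counts), statistics.pstdev(counts)

def z_score(observed, expected, stdev):
    """Distance of the observed count from the random expectation."""
    if stdev == 0:
        if observed == expected:
            return 0.0
        return math.copysign(math.inf, observed - expected)
    return (observed - expected) / stdev

# Hypothetical data: "HUB" is hit by every drug, "X" by a single drug.
drug_targets = {f"drug{i}": {"HUB"} for i in range(10)}
drug_targets["drug0"].add("X")

hub_mean, hub_sd = random_poll_stats(drug_targets, "HUB", group_size=4)
x_mean, x_sd = random_poll_stats(drug_targets, "X", group_size=4)
```

A target such as "HUB", hit in every random poll, gets a z-score of 0 however many group drugs affect it, so it is filtered out as non-specific.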

Pancreatitis: clustering results

Clustering: BEA at UCSD
1 major cluster
RHOC, NGF, NTRK1

Cirrhosis: clustering results

Clustering: HSC at Drugdesigntech
4 major clusters (all 4 were informative and relevant)
Most informative of them: KSYK_HUMAN, CSK_HUMAN

Quantitative polypharmacological effect prediction

Outline:
Compute the information circulation for pharmacological-effect-specific targets

Measure the decrease of information circulation within the all-targets model and the cluster models
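One possible way to quantify such a decrease, shown purely as a sketch under stated assumptions: a fixed unit voltage is applied between every pair of proteins, the resulting current is 1 / effective resistance (obtained from the pseudo-inverse of the conductance matrix), and a drug is modelled by scaling the conductances of all edges touching an inhibited protein. The scaling factor `residual`, the function names and the toy network are hypothetical, and the network is assumed to be connected.

```python
import numpy as np

def total_circulation(G):
    """Sum over all node pairs of the current driven by a unit voltage."""
    n = G.shape[0]
    L = np.diag(G.sum(axis=1)) - G            # conductance (Laplacian) matrix
    Lp = np.linalg.pinv(L)
    d = np.diag(Lp)
    R = d[:, None] + d[None, :] - 2.0 * Lp    # effective resistance per pair
    upper = np.triu_indices(n, k=1)
    return float(np.sum(1.0 / R[upper]))      # I = V / R with V = 1

def perturb(G, inhibited, residual=0.5):
    """Scale the conductance of every edge touching an inhibited node.

    An edge between two inhibited nodes is scaled twice (residual squared).
    """
    P = G.copy()
    for i in inhibited:
        P[i, :] *= residual
        P[:, i] *= residual
    return P

# Toy path network 0 - 1 - 2 with unit conductances; node 1 is inhibited.
G = np.zeros((3, 3))
for i, j in [(0, 1), (1, 2)]:
    G[i, j] = G[j, i] = 1.0

before = total_circulation(G)
after = total_circulation(perturb(G, inhibited=[1]))
relative_decrease = 1.0 - after / before
```

In a real model the sum would be restricted to the targets of the all-targets model or of one cluster model, so that each model yields its own decrease.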


Backbone for the interactome information flow computation

NCI-Nature Pathway Interaction Database
No, coverage too small

KEGG Pathway database
No, pathway-oriented and not connected for atomic interactions

UniPathway
No, coverage too small

Reactome.org
Yay

Reactome.org : idea

Reactome.org structure: BioPAX: XML / RDF / OWL

Physical entities:
Proteins, small molecules, complexes, RNA, DNA

Fragments of physical entities

Interaction:
Degradation / polymerisation / biochemical reactions

Molecular interaction

Genetic interaction

Pathways, Genes, Post-translational modifications...

Reactome.org : reality

Reality of Reactome.org:
Main connected component: ~22 000 entities, but 3 others with > 50 elements

Presence of generic classes: groups of objects

Proteins = mix of proteins, domains, groups, groups of domains

15 000 proteins, 5000 UniProt references

156 genes, 56 RNA molecules
Translation / transcription regulation is not well described
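Component sizes like the ones quoted above can be obtained with a plain breadth-first search over the interaction graph. The sketch below assumes the interactions have already been flattened into pairs of entity identifiers; the identifiers and the function name are hypothetical.

```python
from collections import defaultdict, deque

def component_sizes(edges):
    """Sizes of the connected components of an undirected graph, largest first."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, sizes = set(), []
    for start in neighbours:
        if start in seen:
            continue
        # Breadth-first search from an unvisited entity.
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            node = queue.popleft()
            size += 1
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        sizes.append(size)
    return sorted(sizes, reverse=True)

# Hypothetical entity pairs: one component of 3 entities and one of 2.
sizes = component_sizes([("A", "B"), ("B", "C"), ("D", "E")])
```

Only the main component can be used as a backbone for information flow, since no current can pass between disconnected components.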

Reactome.org: incompleteness

Still incomplete and reliant on comments:
Case of SRC => HiNT database added

Verification of pipeline:
Information routing decay

Image courtesy Wintermute et al. (2010). Emergent cooperation in microbial metabolism. Molecular Systems Biology 6, 407.

Verification of pipeline:
Predicting target druggability

186 oral small-molecule drug targets from Overington et al. (2006), How many drug targets are there?

77 plasma membrane targets

1289 total plasma membrane proteins with UniProt references in Reactome.org

Use the following to predict druggability:
Overall informativity
GO-term specific informativity
Target abundance (higher abundance, more off-target action in case of total inhibition)

Valid targets

Non-targets

Druggability prediction with some complexity

Raw prediction is little better than random:
65% specificity, 60% selectivity

However, if we account for:
Non-oral, non-small-molecule drugs

Drugs developed or in development since 2006

GO-specific informativity

The fact that Reactome.org / HiNT are bad at representing CNS functions

The prediction results are rather encouraging:
75% specificity, 90% selectivity

Before we can conclude

The methods required for the information circulation have been coded:
Information circulation for the target set

Calculation of the variation of information circulation when the interactome is perturbed

However, before this project can be deemed concluded:
Model creation and model utilization parts have to be assembled into a single pipeline (right now they are separate)

Run model creation and prediction on several secondary effects with random training / testing set validation

Conclusions

GO-based information circulation method seems to work well for secondary effect mechanism retrieval

Reactome.org / HiNT dataset based information circulation method seems to be potentially useful for computationally assisted drug design

Information circulation methods for secondary effects quantitative prediction must be tested before this project can be concluded

Moving further

Finding datasets and people interested in further development of the method:
SNP cumulative effect
Requires the ability to project onto the protein 3D structure and estimate protein activity inhibition in different contexts

Drug design: secondary effect prediction
Typical pharmaceutical firms' data stores contain far more information about the toxicity of different compounds and allow much more finely tuned modeling of pharmacological effects

Difference between animal and human interactomes:
Predict unexpected polypharmacological effects upon transition from animal to human trials

Acknowledgements

Pr. Philip Bourne
Pr. Bart Deplancke
Cedric Merlot
Li Xie
Spencer Blieven
Roland Diggelmann

Andreas Prlic
Julia Ponomarenko
Lilia Iakoucheva
Jiang Wang
Cole Christie
Audrey Schenker

THE END

QUESTIONS?


Graph databases

Random matrix theory

Method improvement

If time: Improvements

For retrieving statistically significant targets, abandon naive statistical drug target filtering:

build drug-specific information flows

recover all sufficiently informative proteins for each drug

use those proteins to get statistically significant targets => avoids near-miss errors

When sorting targets:
Sort the most significant GO terms not by their informativity,

but by how much of the information flow associated with them is perturbed by the given target set
=> avoids the need to tune GO term informativity
=> better interpretability

If time: Improvements

When computing the information flow:
Do not consider the information flow between any pair of proteins as constant

Consider the associated tension (voltage) as constant

Unrelated proteins are likely to exchange less information

To avoid information circulation distortion due to correlation between GO terms:
Don't use the Tanimoto distance / conductance model for GO-term-based circulation

Use the real point-to-point routing within the GO terms graph

If time 1:

Molecular evolution:
Adaptive mutations = survival of the fittest
Random mutations = Kimura's drift
Tools to separate the two

Protein interaction network evolution:
Adaptive topology modifications
Random topology artefacts: phosphorylation pattern modification due to random mutations

Separating the two = ???

"Nothing in biology makes sense except in the light of evolution." Theodosius Dobzhansky

If time 1:

In sparse matrices (~= graphs):
Random matrices have specific eigenvalues
All eigenvalues exceeding these values are non-random

Clustering can later be performed in the space generated by the associated eigenvectors of non-random eigenvalues

If time 2:
Graph Databases

neo4j

Titan DB

If time 2:
Graph Databases

Tinkerpop stack: ~ SQL for Graph databases

If time 3:
General conclusions

Graph databases are worth a try for systems biology applications

We need to assemble one comprehensive, complete and WELL DOCUMENTED resource for computational systems biology

If time2: Non-scientific lessons learned

Project advancement:
Never try to reach one huge goal at once

Split the goal into a series of small subgoals, each resulting in a publication / conference paper

Large bioinformatics resource maintenance

Mechanisms of inter-laboratory collaborations and PhD student supervision
