Multi-target bioactivity models in Pipeline Pilot

Post on 16-Apr-2017

Multi-target bioactivity models in Pipeline Pilot
Using ligand and target information

Gerard JP van Westen

Pipeline Pilot UGM (17-1-2013)

Cool things to do with PP
• Multi-target bioactivity models
▫ The why…
▫ The how…
▫ The results… (time permitting)

The why… a target is never alone…
• Drug targets often have similar paralogs
▫ Selectivity is required
• Viral targets often mutate, leading to resistance
▫ Broad activity is required
• Non-similar proteins have been shown to share ligands
▫ E.g. acetylcholine and serotonin

Molecular Similarity

Efavirenz, EFV (NNRTI)

Emtricitabine, FTC (NRTI)

Lamivudine, 3TC (NRTI)

[Figure: pairwise similarity matrix for the three compounds]

1.0 0.9 0.3
0.9 1.0 0.4
0.3 0.4 1.0
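Pairwise values like those in the matrix can be obtained by comparing fingerprint feature sets. The slides do not state which similarity measure was used; the sketch below assumes the Tanimoto coefficient, a common choice for fingerprints. The feature sets and the names `mol_A`, `mol_B`, `mol_C` are made-up placeholders, not fingerprints of the three drugs.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity: shared features / all features."""
    if not a and not b:
        return 1.0  # convention chosen here for two empty sets
    return len(a & b) / len(a | b)

# Hypothetical feature sets (placeholders, not real fingerprints).
features = {
    "mol_A": {1, 2, 3, 4, 5},
    "mol_B": {1, 2, 3, 4, 6},
    "mol_C": {1, 7, 8, 9},
}

names = list(features)
matrix = [[tanimoto(features[r], features[c]) for c in names] for r in names]

for name, row in zip(names, matrix):
    print(name, " ".join(f"{value:.2f}" for value in row))
```

The diagonal is always 1.0 and the matrix is symmetric, as in the slide.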

Sequence Similarity

Amino acids: Phenylalanine, Tyrosine, Arginine

Sequences: FYI, IYF, WTF

[Figure: pairwise similarity matrix]

1.0 0.9 0.3
0.9 1.0 0.4
0.3 0.4 1.0

The how… what is PCM?
• Proteochemometric modeling combines both a ligand descriptor and a target descriptor

GJP van Westen, JK Wegner et al. MedChemComm (2011), 16-30, 10.1039/C0MD00165A
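To make the idea concrete, here is a minimal sketch of how a ligand descriptor and a target descriptor can be joined into one feature row per ligand-target pair. It assumes simple concatenation; the function name `pcm_row`, the descriptor values, and the activity numbers are all hypothetical placeholders, not data from the talk.

```python
def pcm_row(ligand_descriptor, target_descriptor):
    """Join one ligand descriptor and one target descriptor into a single row."""
    return list(ligand_descriptor) + list(target_descriptor)

# Hypothetical toy descriptors (placeholders, not real data).
ligands = {
    "ligand_1": [1, 0, 1, 1],  # e.g. fingerprint bits
    "ligand_2": [0, 1, 1, 0],
}
targets = {
    "target_A": [0.5, -1.2],   # e.g. properties of binding pocket residues
    "target_B": [0.5, 2.0],
}

# Hypothetical activities for measured ligand-target pairs (made-up numbers).
activities = {
    ("ligand_1", "target_A"): 7.1,
    ("ligand_1", "target_B"): 5.3,
    ("ligand_2", "target_A"): 6.0,
}

# One training row per measured pair; any machine learning method can consume X, y.
X = [pcm_row(ligands[lig], targets[tgt]) for (lig, tgt) in activities]
y = list(activities.values())

print(X[0], y[0])
```

Because the target is part of the input, the same ligand appears in several rows with different target descriptors, which is what lets one model cover multiple targets.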


PCM using Pipeline Pilot
• For this work we use mostly:
▫ Chemistry (circular fingerprints)
▫ Data modeling
▫ R statistics components (machine learning)
• Lacking was a protein descriptor type component…
• (In addition I missed some validation components…)
▫ Matthews Correlation Coefficient
▫ R² to a line through the origin (R² zero)

Target descriptors
• Simple way to derive protein descriptors
1. Select the binding pocket
2. Align the relevant residues
3. Convert to physicochemical properties
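The three steps above can be sketched as follows. This is an illustration under stated assumptions, not the Pipeline Pilot component: the sequences are assumed to be already aligned, the pocket is given as a list of column indices, and `TOY_SCALE` holds made-up numbers rather than any published amino acid scale.

```python
# Placeholder property table: three made-up values per amino acid.
# These are NOT the published Z-scales or any other real scale.
TOY_SCALE = {
    "F": (-1.0, 0.8, 0.1),
    "Y": (-0.6, 0.9, 0.3),
    "I": (-1.2, 0.2, 0.0),
    "K": (0.9, 0.4, 1.0),
    "R": (1.0, 0.7, 1.0),
    "L": (-1.1, 0.2, 0.0),
    "-": (0.0, 0.0, 0.0),  # alignment gap
}

def target_descriptor(aligned_sequence, pocket_positions, scale=TOY_SCALE):
    """Turn the pocket residues of one aligned sequence into a numeric vector."""
    descriptor = []
    for position in pocket_positions:         # step 1: keep only pocket columns
        residue = aligned_sequence[position]  # step 2: same column in every sequence
        descriptor.extend(scale[residue])     # step 3: residue -> property values
    return descriptor

# Hypothetical aligned sequences and pocket columns.
aligned = {"wild_type": "FYIKL", "mutant": "FYIRL"}
pocket = [0, 3]

for name, sequence in aligned.items():
    print(name, target_descriptor(sequence, pocket))
```

Because the sequences are aligned, every descriptor has the same length and each vector position always refers to the same pocket residue.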

Target descriptors

• PP component can create different protein descriptors
1. ProtFP Feature: J. Med. Chem. 2012, 55, 7010-7020; BMC Bioinformatics 2012, Submitted
2. ProtFP PCA: BMC Bioinformatics 2012, Submitted
3. Z-Scales: J. Med. Chem. 1998, 41, 2481-2491
4. VHSE: Biopolymers 2005, 80, 775-786
5. ST-Scales: Amino Acids 2010, 38, 805-816
6. T-Scales: J. Mol. Struct. 2007, 830, 106-115
7. MS-WHIM: J. Chem. Inf. Comp. Sci. 1999, 39, 525-533
8. FASGAI: Eur. J. Med. Chem. 2009, 44, 1144-1154
9. Blosum62: J. Comp. Biol. 2009, 16, 5, 703-723

Target Descriptors

Revised version of paper to be submitted

Visualized in PP

[Figure: numbered positions 1 to 5; panels labeled "Feature Based" and "Physicochemical Properties"]

Target Descriptors

• The example uses Z-scales by Sandberg et al.
• A PCA is used to derive 5 principal components that describe amino acid similarity
▫ Based on side chain physicochemical properties
• We use the first 3
▫ 1 – Lipophilicity
▫ 2 – Size
▫ 3 – Charge / Polarity

M Sandberg, L Eriksson J Med Chem (1998) 41: 2481-2491

Example Data set

• Dataset provided by Tibotec and Virco
• Antivirogram® assay
• Patient data
• Reverse Transcriptase and Protease sequences
• Fold change in –logIC50

Target | Amino acids | Binding Site | Drug Class | Drugs | Mutant Sequences | Data points
Reverse Transcriptase | 400* | Orthosteric | NRTI | 8 | 10,501 | 72,727
Reverse Transcriptase | 400* | Allosteric | NNRTI | 4 | 10,723 | 35,249
Protease | 99 | Orthosteric | PI | 9 | 27,081 | 180,162

GJP van Westen, A Hendriks et al. PLoS Comp Biol (2013) Accepted / In press

Methods

Results

Feature Importance

• What is important to our models?
• What residue position?
• What mutation is present at that position?
• How much is contributed to resistance?
• Bioactivity spectra can be obtained from these models

Data sets

• Currently we have applied the technique using PP to:
• Adenosine receptors (human + rat)
• HIV inhibitors (preclinical lead optimization)
• HIV inhibitors (clinical drugs)
• OATP1 inhibitors
• Aminergic GPCRs
• …

Acknowledgements

• Ad IJzerman
• Andreas Bender
• Alwin Hendriks
• Herman van Vlijmen
• Joerg Wegner
• Anik Peeters
• John Overington
• George Papadatos

Multi-target bioactivity models in Pipeline Pilot
Using ligand and target information

Gerard JP van Westen
www.gjpvanwesten.nl

Pipeline Pilot UGM (17-1-2013)

Model validation (classification)
• PP lacked a component to calculate correlation coefficients between two properties in the data stream in (binary) classification.
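The Matthews Correlation Coefficient, named earlier in the talk as a missing validation component, is one such coefficient for binary classification. A minimal pure-Python sketch of the standard formula follows; it is not the Pipeline Pilot component, and returning 0.0 for a zero denominator is a convention assumed here.

```python
from math import sqrt

def matthews_corrcoef(y_true, y_pred):
    """MCC for binary labels coded as 1 (positive) and 0 (negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # assumed convention when a row or column of the table is empty
    return (tp * tn - fp * fn) / denominator

observed = [1, 1, 0, 0]
predicted = [1, 0, 0, 0]
print(round(matthews_corrcoef(observed, predicted), 3))
```

The value ranges from -1 (every prediction wrong) through 0 (no better than chance) to +1 (perfect prediction).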

Model validation (regression)
• PP lacked a component to calculate correlation coefficients between two properties in the data stream in regression (R² zero, etc.).

A. Tropsha, Predictive Quantitative Structure-Activity Relationships Modeling, in Handbook of Chemoinformatics Algorithms (2010), J. Faulon and A. Bender, Editors.
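As an illustration of R² zero, the sketch below fits a line through the origin and computes the coefficient of determination for that line. It assumes one common formulation (observed regressed on predicted, observed ≈ k · predicted); conventions differ on which variable is regressed on which, and the function name is hypothetical.

```python
def r2_through_origin(observed, predicted):
    """R² for the least-squares line through the origin: observed ~ k * predicted."""
    sum_pp = sum(p * p for p in predicted)
    if sum_pp == 0:
        raise ValueError("predicted values are all zero")
    # Slope of the least-squares line forced through the origin.
    k = sum(o * p for o, p in zip(observed, predicted)) / sum_pp
    mean_observed = sum(observed) / len(observed)
    ss_residual = sum((o - k * p) ** 2 for o, p in zip(observed, predicted))
    ss_total = sum((o - mean_observed) ** 2 for o in observed)
    if ss_total == 0:
        raise ValueError("observed values have no variance")
    return 1 - ss_residual / ss_total

print(r2_through_origin([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
print(r2_through_origin([1.0, 2.0, 3.0], [1.0, 2.0, 4.0]))
```

Comparing this value with the ordinary R² shows whether the predictions need an intercept to match the observations, which is why it is used as an extra validation criterion.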

Ligand Descriptors
• Scitegic Circular Fingerprints
▫ Circular, substructure-based fingerprints
▫ Maximal radius of 3 bonds from central atom
▫ Each substructure is converted to a molecular feature
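The idea of growing a substructure around each atom and turning it into a feature can be sketched in a few lines. This is a simplified illustration, not the Scitegic algorithm: it ignores bond orders, hydrogens, and duplicate-environment removal, and the function name and the CRC32 hash are choices made for this example only.

```python
import zlib

def circular_features(elements, bonds, max_radius=3):
    """Return hashed atom-environment identifiers for radii 0..max_radius."""
    neighbors = {index: [] for index in range(len(elements))}
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)

    def stable_hash(text):
        # CRC32 is deterministic across runs, unlike Python's built-in str hash.
        return zlib.crc32(text.encode("utf-8"))

    # Radius 0: each atom is described by its element only.
    identifiers = [stable_hash(element) for element in elements]
    features = set(identifiers)

    # Each iteration extends every atom environment by one bond.
    for _ in range(max_radius):
        identifiers = [
            stable_hash(
                f"{identifiers[i]}|"
                + ",".join(str(n) for n in sorted(identifiers[j] for j in neighbors[i]))
            )
            for i in range(len(elements))
        ]
        features.update(identifiers)
    return features

# Heavy atoms of ethanol (C-C-O), given as element list plus bonded index pairs.
ethanol = circular_features(["C", "C", "O"], [(0, 1), (1, 2)])
print(len(ethanol), "features")
```

Sorting the neighbor identifiers before hashing makes the feature set independent of atom numbering, so the same molecule always yields the same features.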

[Figure: stepwise construction of circular fingerprint features around a central atom; labels: Carbon, Oxygen, Substructure, Fingerprints]
