MACHINE LEARNING FOR PREDICTION OF POLY-SPECIFICITY FROM SEQUENCE Tushar Jain June 27, 2018


Page 1: MACHINE LEARNING FOR PREDICTION OF POLY-SPECIFICITY FROM SEQUENCE Tushar … · 2018. 7. 17. · machine learning for prediction of poly-specificity from sequence tushar jain june

MACHINE LEARNING FOR PREDICTION OF POLY-SPECIFICITY FROM SEQUENCE

Tushar Jain

June 27, 2018


LESSONS FROM CLINICAL ANTIBODIES

HIGH THROUGHPUT EXPERIMENTAL SURROGATES FOR PK

MACHINE LEARNING METHODOLOGY

PREDICTING POLY-SPECIFICITY

CONCLUSIONS


COPYRIGHT | © 2018 Adimab, LLC

PROPERTIES OF CLINICAL ANTIBODIES

• Twelve biophysical measurements cluster into related subgroups

  • Poly-specificity - PSR, CSI, ACSINS, CIC

  • Poly-specificity - BVP, ELISA

  • Hydrophobicity - HIC, SMAC, SGAC100

  • Stability - Titer, Tm

Jain et al., PNAS, 2017


SOLUBILIZED MEMBRANE PROTEINS AS A POLY-SPECIFICITY REAGENT (PSR)

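[Figure: PSR preparation workflow. Non-target cells are lysed without detergent, separating the cell extract (cytosolic proteins) from the enriched membranes. Detergent treatment of the enriched membranes gives solubilized membrane proteins (SMP), which are then randomly biotinylated.]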

Biotinylated SMPs can be used as a screening (one-off) or selection (batch) tool

Xu et al., PEDS, 2013


LESSONS FROM CLINICAL ANTIBODIES

HIGH THROUGHPUT EXPERIMENTAL SURROGATES FOR PK

MACHINE LEARNING METHODOLOGY

PREDICTING POLY-SPECIFICITY

CONCLUSIONS


POLY-SPECIFICITY AS AN INDICATOR FOR POOR PK

Kelly et al., mAbs, 2015

Hotzel et al., mAbs, 2012


FCRN BINDING AS A PREDICTOR OF POOR PK

BRIAKINUMAB VS USTEKINUMAB

Schoch et al., PNAS, 2015

Jain et al., PNAS, 2017

FcRn retention time (RT) data from Kettenberger et al.

Kelly et al., mAbs, 2016

FcRn knockout mouse


BEHAVIOR IN SEVERAL ASSAYS CORRELATED WITH ACCELERATED CLEARANCE

Avery et al., mAbs, 2018


PREDICTION OF 25% BOTTOM MABS IN AN ASSAY USING ANOTHER MEASUREMENT

Areas under ROC curve

Predict bottom 25% of PSR using other assay measurements

Predict bottom 25% of other assays using PSR

Jain et al., PNAS, 2017

FcRn and Heparin retention time (RT) data from Kettenberger et al.


PREDICTION OF 25% BOTTOM MABS IN AN ASSAY USING ANOTHER MEASUREMENT

Areas under ROC curve

ROC for predicting bottom 25% FcRn RT using PSR assay: AUC = 0.85, N = 133
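As a rough illustration of the analysis on this slide, the sketch below flags the worst quartile of one assay and computes the ROC AUC obtained when a second assay is used as the score. It is not the analysis code behind the talk. The function names, the eight data points, and the convention that a longer FcRn retention time is the unfavorable direction are all assumptions made for the example.

```python
def flag_worst_quartile(values, higher_is_worse=True):
    """Return True for antibodies falling in the worst 25% of an assay."""
    k = max(1, len(values) // 4)
    ordered = sorted(values, reverse=higher_is_worse)
    cutoff = ordered[k - 1]
    if higher_is_worse:
        return [v >= cutoff for v in values]
    return [v <= cutoff for v in values]


def roc_auc(scores, labels):
    """Rank-based AUC: the probability that a flagged antibody has a higher
    score than an unflagged one, counting ties as one half."""
    pos = [s for s, flagged in zip(scores, labels) if flagged]
    neg = [s for s, flagged in zip(scores, labels) if not flagged]
    if not pos or not neg:
        raise ValueError("need at least one flagged and one unflagged antibody")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical values for eight antibodies (not data from the talk).
psr_score = [0.02, 0.05, 0.10, 0.30, 0.45, 0.01, 0.60, 0.08]
fcrn_rt = [5.9, 6.0, 6.1, 6.8, 6.4, 5.8, 7.2, 6.3]

# Assumption: longer FcRn retention time is the unfavorable direction.
worst_fcrn = flag_worst_quartile(fcrn_rt, higher_is_worse=True)
auc = roc_auc(psr_score, worst_fcrn)
print(f"AUC for PSR predicting worst-quartile FcRn RT: {auc:.2f}")
```

Swapping the two lists gives the reverse direction shown on the previous slide, where other assays are used to predict the worst quartile of PSR.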


LESSONS FROM CLINICAL ANTIBODIES

HIGH THROUGHPUT EXPERIMENTAL SURROGATES FOR PK

MACHINE LEARNING METHODOLOGY

PREDICTING POLY-SPECIFICITY

CONCLUSIONS


MODELS FOR DEVELOPABILITY PREDICTION

INPUT ANTIBODY DATA

SEQUENCE

• Aligned antibody sequences

• Germline information

• CDR lengths, etc.

• Amino-acid property scales

  • Hydrophobicity

  • Size, charge, etc.

STRUCTURAL PROPERTIES

• Structural metrics important for the developability assay under consideration

  • Solvent-accessible surface area (SASA)

  • Residue contact probabilities

  • Local flexibility

  • Isoelectric point, etc.

MACHINE LEARNING ALGORITHMS

• Logistic Regression with LASSO regularization

• Tree-based method: XGBoost

• Feed-forward neural networks
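To make the first algorithm in the list concrete, here is a minimal sketch of logistic regression with a LASSO (L1) penalty, fitted by proximal gradient descent in plain Python. It is an illustration only, not the implementation used in the talk; in practice an established library would be the natural choice. The toy data, the function names and the hyperparameters (`lam`, `lr`, `n_iter`) are assumptions.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(x, t):
    """Proximal operator of the L1 penalty: shrink x toward zero by t."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def fit_l1_logistic(X, y, lam=0.05, lr=0.5, n_iter=2000):
    """Minimise mean log-loss + lam * sum(|w|); the intercept is unpenalised."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        # Gradient step followed by soft-thresholding (the LASSO part).
        w = [soft_threshold(wj - lr * gj / n, lr * lam)
             for wj, gj in zip(w, grad_w)]
    return w, b


# Toy data (hypothetical): feature 0 tracks the label, feature 1 does not.
X = [[1, 1], [1, 0], [1, 1], [1, 0], [0, 1], [0, 0], [0, 1], [0, 0]]
y = [1, 1, 1, 0, 0, 0, 0, 1]
w, b = fit_l1_logistic(X, y)
print("weights:", w, "intercept:", b)
```

The L1 penalty drives the weight of the uninformative feature to exactly zero, which is why LASSO-regularized models are convenient when the coefficients themselves are to be interpreted.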


EXAMPLE OF PREDICTING SASA FROM SEQUENCE

[Figure: estimated fractional SASA (y-axis) plotted against fractional SASA computed from PDBs (x-axis), six panels. RMSE per panel: 9.8%, 9%, 14.6%, 8.2%, 8.3%, 8.9%.]

Jain et al., Bioinformatics, 2017

Yang et al., mAbs, 2017


ENCODING ANTIBODY DATA FOR MACHINE LEARNING

[Figure: schematic of the feature encoding, one row per antibody. Blocks shown for the heavy chain (HC information): aligned sequences (positions H1 to H113); SASA by amino-acid type (A, R, N, D, …, V, W, Y) for each of CDR H1, CDR H2 and CDR H3, with example values such as 0.1, 3.2, 0.4, 2.2, 1.9, 4.1, 2.2; VHF (examples: VH1, VH4); and CDR lengths (examples: 10, 17, 14 and 12, 17, 11). The same blocks are added for the light chain (+ LC information). The label column, "Desirable?", takes the values 0 or 1.]
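The encoding in the schematic can be sketched roughly as follows: a one-hot block for the aligned sequence, a block of SASA summed by amino-acid type for each CDR, then CDR lengths and a germline-family indicator. This is an assumed layout for illustration, not the exact feature order used in the talk. The record format, the field names and the list of VH families are hypothetical, and only the heavy chain is shown.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
VH_FAMILIES = ["VH1", "VH2", "VH3", "VH4", "VH5", "VH6", "VH7"]  # assumed list


def one_hot_aligned(aligned_seq):
    """One-hot encode an aligned sequence; gap characters become all zeros."""
    features = []
    for residue in aligned_seq:
        features.extend(1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS)
    return features


def sasa_by_amino_acid(cdr_seq, cdr_sasa):
    """Sum per-residue SASA within one CDR, grouped by amino-acid type."""
    totals = dict.fromkeys(AMINO_ACIDS, 0.0)
    for residue, sasa in zip(cdr_seq, cdr_sasa):
        totals[residue] += sasa
    return [totals[aa] for aa in AMINO_ACIDS]


def encode_heavy_chain(record):
    """Concatenate sequence, per-CDR SASA, CDR lengths and VH family."""
    cdr_names = ("H1", "H2", "H3")
    features = one_hot_aligned(record["aligned_vh"])
    for name in cdr_names:
        seq, sasa = record["cdrs"][name]
        features.extend(sasa_by_amino_acid(seq, sasa))
    features.extend(float(len(record["cdrs"][name][0])) for name in cdr_names)
    features.extend(1.0 if record["vh_family"] == fam else 0.0
                    for fam in VH_FAMILIES)
    return features


# Hypothetical, heavily shortened record (real alignments span ~113 positions).
record = {
    "aligned_vh": "QV-L",
    "cdrs": {
        "H1": ("GYT", [1.0, 2.0, 0.5]),
        "H2": ("YY", [1.5, 2.5]),
        "H3": ("ARW", [0.2, 3.0, 4.0]),
    },
    "vh_family": "VH1",
}
features = encode_heavy_chain(record)
print(len(features), "features")
```

The light chain would be handled the same way and appended, and the 0/1 "Desirable?" label is kept separately as the training target.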


LESSONS FROM CLINICAL ANTIBODIES

HIGH THROUGHPUT EXPERIMENTAL SURROGATES FOR PK

MACHINE LEARNING METHODOLOGY

PREDICTING POLY-SPECIFICITY

CONCLUSIONS


PREDICTION OF ANTIBODIES WITH POOR PSR SCORES

• Different machine learning methods perform comparably, though XGBoost is slightly better

• Simpler models using only aggregated SASA in CDRs by amino-acid type show reasonable performance when compared to models with full sequence information

• Logistic regression on the SASA models enables assessment of amino-acid propensities for poly-specificity

Experimental PSR data

• ~30,000 antibodies with ~13,000 distinct H3s

• Training and test splits done on the basis of H3s
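Splitting on the basis of H3s means that antibodies sharing a CDR H3 never appear on both sides of a train/test split, so the test folds measure generalization to unseen H3s. The sketch below shows one way to build such grouped folds. The slides do not describe the assignment scheme, so the greedy balancing, the function name and the example H3 strings are assumptions.

```python
from collections import Counter


def group_kfold(groups, n_splits=10):
    """Return (train_idx, test_idx) pairs in which no group (here, an H3
    sequence) is shared between the training and test sides of a fold."""
    counts = Counter(groups)
    fold_sizes = [0] * n_splits
    fold_of = {}
    # Greedy balancing: place the largest groups first, each into the
    # currently smallest fold.
    for group, size in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        fold = fold_sizes.index(min(fold_sizes))
        fold_of[group] = fold
        fold_sizes[fold] += size
    splits = []
    for fold in range(n_splits):
        test = [i for i, g in enumerate(groups) if fold_of[g] == fold]
        train = [i for i, g in enumerate(groups) if fold_of[g] != fold]
        splits.append((train, test))
    return splits


# Hypothetical H3 sequences for eight antibodies.
h3s = ["ARDY", "ARDY", "AKGW", "ARSF", "AKGW", "ARTL", "ARDY", "ARNV"]
splits = group_kfold(h3s, n_splits=3)
for train_idx, test_idx in splits:
    print("test fold:", sorted({h3s[i] for i in test_idx}))
```

A random split over antibodies would instead let near-identical clones leak across the split and would tend to inflate the reported AUC.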

Area under ROC curve, 10-fold cross-validation, PSR score > 0.1

Model                        XGBoost   Logistic Regression   Neural Network
Sequence                     0.74      0.74                  0.74
Sequence + AA SASA per CDR   0.77      0.74                  0.76
AA SASA per CDR              0.76      0.72                  0.72


AMINO-ACID COEFFICIENTS FROM LOGISTIC REGRESSION FOR PREDICTION OF PSR>0.1

Aromatic and positively-charged amino-acids show propensity for poor PSR
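One way to read such coefficients, assuming the SASA-only model has the standard logistic-regression form, is written out below. The symbols are introduced here for illustration and do not appear on the slides: p is the predicted probability that PSR > 0.1, S_{c,a} is the aggregated SASA of amino-acid type a in CDR c, and the β are the fitted coefficients.

```latex
\[
\log\frac{p}{1-p} \;=\; \beta_0 \;+\; \sum_{c \,\in\, \mathrm{CDRs}} \;\sum_{a=1}^{20} \beta_{c,a}\, S_{c,a}
\]
```

Under this form, exposing an additional ΔS of amino-acid type a in CDR c multiplies the odds of a poor PSR score by exp(β_{c,a} ΔS), so a positive coefficient marks a residue type whose exposure is associated with poly-specificity.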


ELECTROSTATIC POTENTIAL MAPPED ONTO MAB SURFACE

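[Figure: electrostatic potential (APBS electrostatics) mapped onto the surfaces of six mAbs, shown in two rows labeled High PSR and Low PSR: basiliximab, bococizumab, guselkumab; gevokizumab, ibalizumab, ranibizumab.]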

Large positive patches seen in mAbs showing binding to PSR


CONCLUSIONS

• Training on known crystal structures enables prediction of structural metrics from sequence

• Machine learning methods can successfully predict, from sequence, antibodies exhibiting poor behavior in these assays

• Cross-validation AUCs for the PSR assay are 0.72 to 0.77

• Amino-acid propensities determined for PSR correlate with observations from other studies

• Predictions from sequence enable:

• Rapid predictions on millions of sequences to help design libraries enriched in the desired biophysical properties

• Improving lead clones, since determined amino-acid coefficients can identify individual positions that contribute to unfavorable developability


THANK YOU


LEARNING STRUCTURAL PROPERTIES FROM SEQUENCE

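For each position i along the sequence,

\[
P_i = f\left(aa_i,\ aa_{n_1}, \ldots, aa_{n_N},\ \mathrm{VHF},\ \mathrm{VLF},\ \text{CDR lengths}\right)
\]

where P_i = structural property of amino acid i, e.g. SASA

aa_i = amino-acid type at i (local sequence information)

aa_{n_1}, …, aa_{n_N} = amino-acid types at N neighbors (local sequence information)

VHF, VLF = heavy and light chain germline family (global sequence information)

CDR lengths (global sequence information)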

Train models using a database of ~1200 antibody structures curated from the PDB
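The inputs to f can be assembled per position roughly as sketched below: a one-hot block for the residue at i and for each neighbor, followed by the global features. The slide does not say how the N neighbors are chosen, so this sketch assumes sequence neighbors in a symmetric window. The function names, the padding rule and the example values are hypothetical, and the regression model f itself is left out.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(residue):
    """20-dimensional indicator; gaps or padding ('-') give all zeros."""
    return [1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS]


def position_features(seq, i, n_each_side, global_features):
    """Local window around position i, followed by the global features
    (germline-family indicators and CDR lengths)."""
    features = []
    for j in range(i - n_each_side, i + n_each_side + 1):
        residue = seq[j] if 0 <= j < len(seq) else "-"  # pad past the ends
        features.extend(one_hot(residue))
    return features + list(global_features)


def training_rows(seq, targets, n_each_side, global_features):
    """One (features, P_i) pair per position, e.g. with P_i = fractional SASA."""
    return [(position_features(seq, i, n_each_side, global_features), targets[i])
            for i in range(len(seq))]


# Hypothetical example: a five-residue stretch with made-up fractional SASA.
seq = "ARDYW"
frac_sasa = [0.10, 0.45, 0.30, 0.05, 0.60]
# Hypothetical global block: two germline indicators, then three CDR lengths.
global_block = [1.0, 0.0, 10.0, 17.0, 14.0]
rows = training_rows(seq, frac_sasa, n_each_side=1, global_features=global_block)
print(len(rows), "rows of", len(rows[0][0]), "features")
```

Rows built this way from the curated structures would form the training set, with the property computed from each structure as the target.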