MACHINE LEARNING FOR
PREDICTION OF POLY-SPECIFICITY
FROM SEQUENCE
Tushar Jain
June 27, 2018
LESSONS FROM CLINICAL ANTIBODIES
HIGH THROUGHPUT EXPERIMENTAL SURROGATES FOR PK
MACHINE LEARNING METHODOLOGY
PREDICTING POLY-SPECIFICITY
CONCLUSIONS
COPYRIGHT | © 2018 Adimab, LLC
PROPERTIES OF CLINICAL ANTIBODIES
• Twelve biophysical measurements cluster into related subgroups
• Poly-specificity - PSR, CSI, ACSINS, CIC
• Poly-specificity - BVP, ELISA
• Hydrophobicity - HIC, SMAC, SGAC100
• Stability - Titer, Tm
Jain et al., PNAS, 2017
SOLUBILIZED MEMBRANE PROTEINS AS A POLY-SPECIFICITY
REAGENT (PSR)
[Figure: preparation of the poly-specificity reagent. Labels shown: Non-target Cell; Lyse the cells without detergent; Cytosolic Proteins; Enriched Membranes; Detergent; Solubilized Membrane Proteins (SMP); Random biotinylation; Cell Extract; Membrane Proteins]
Biotinylated SMPs can be used as a screening (one-off) or selection (batch) tool
Xu et al., PEDS, 2013
POLY-SPECIFICITY AS AN INDICATOR FOR POOR PK
Kelly et al., mAbs, 2015
Hotzel et al., mAbs, 2012
FCRN BINDING AS A PREDICTOR OF POOR PK
BRIAKINUMAB VS USTEKINUMAB
Schoch et al., PNAS, 2015
Jain et al., PNAS, 2017
FcRn retention time (RT) data from Kettenberger et al.
Kelly et al., mAbs, 2016
FcRn knockout mouse
BEHAVIOR IN SEVERAL ASSAYS CORRELATED WITH
ACCELERATED CLEARANCE
Avery et al., mAbs, 2018
PREDICTION OF THE BOTTOM 25% OF MABS IN AN ASSAY USING
ANOTHER MEASUREMENT
Areas under ROC curve
• Predict bottom 25% of PSR using other assay measurements
• Predict bottom 25% of other assays using PSR
Jain et al., PNAS, 2017
FcRn and Heparin retention time (RT) data from Kettenberger et al.
PREDICTION OF THE BOTTOM 25% OF MABS IN AN ASSAY USING
ANOTHER MEASUREMENT
Areas under ROC curve
ROC for predicting bottom 25% FcRn RT using PSR assay
AUC: 0.85
N = 133
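The analysis on this slide (flag the worst quartile in one assay, then ask how well a second assay ranks those antibodies) can be sketched as follows. This is a minimal illustration, not the code used in the talk: the function names, the toy values, and the assumption that a higher reading is the undesirable direction are all hypothetical.

```python
# Sketch: how well does one assay flag the worst quartile of another?
# All numbers below are made-up placeholders, not data from the talk.

def flag_worst_quartile(values, higher_is_worse=True):
    """Label the worst 25% of antibodies in an assay as 1, the rest as 0.

    Assumption: "worst" means highest reading unless higher_is_worse=False.
    """
    n_flagged = max(1, len(values) // 4)
    order = sorted(range(len(values)), key=lambda i: values[i],
                   reverse=higher_is_worse)
    flagged = set(order[:n_flagged])
    return [1 if i in flagged else 0 for i in range(len(values))]


def roc_auc(labels, scores):
    """Area under the ROC curve, computed as the probability that a random
    positive outscores a random negative (ties count one half)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical readings for eight antibodies.
fcrn_rt = [6.1, 6.3, 6.0, 7.9, 6.2, 8.4, 6.4, 6.1]          # target assay
psr = [0.02, 0.05, 0.01, 0.40, 0.45, 0.55, 0.03, 0.04]      # predictor assay

labels = flag_worst_quartile(fcrn_rt)   # 1 = worst 25% by FcRn RT
auc = roc_auc(labels, psr)              # how well PSR ranks those antibodies
print(f"AUC = {auc:.2f}")
```

The pairwise definition is quadratic in the number of antibodies, which is fine for a panel of about a hundred; a rank-based formula would be preferable for much larger sets.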
MODELS FOR DEVELOPABILITY PREDICTION
INPUT ANTIBODY DATA
SEQUENCE
• Aligned antibody sequences
• Germline information
• CDR lengths, etc
• Amino-acid property scales
• Hydrophobicity
• Size, charge, etc
STRUCTURAL PROPERTIES
• Structural metrics important for developability assay under consideration
• Solvent-accessible surface-area (SASA)
• Residue contact probabilities
• Local flexibility
• Isoelectric point, etc
MACHINE LEARNING ALGORITHMS
• Logistic Regression with LASSO regularization
• Tree-based method: XGBoost
• Feed-forward neural networks
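As an illustration of the first algorithm in the list, here is a minimal sketch of logistic regression with an L1 (LASSO) penalty, fitted by proximal gradient descent. It is not the implementation used in the talk; the function names, step size, iteration count and toy data are hypothetical choices. The point it shows is that the L1 penalty can drive coefficients to exactly zero, which is what makes the fitted model easy to read.

```python
import math


def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrink toward zero, clip at zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_lasso_logistic(X, y, l1_strength, step=0.5, n_iter=500):
    """Fit weights w and intercept b by proximal gradient descent.

    Minimises mean log-loss + l1_strength * sum(|w_j|); b is not penalised.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(n_iter):
        grad_w = [0.0] * d
        grad_b = 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            p = 1.0 / (1.0 + math.exp(-z))      # predicted probability
            err = p - yi
            grad_b += err / n
            for j in range(d):
                grad_w[j] += err * xi[j] / n
        b -= step * grad_b
        # gradient step followed by soft-thresholding (the L1 proximal step)
        w = [soft_threshold(wj - step * gj, step * l1_strength)
             for wj, gj in zip(w, grad_w)]
    return w, b


# Toy data: feature 0 tracks the label, feature 1 is mostly noise.
X = [[1, 0], [1, 1], [1, 0], [0, 1], [0, 0], [0, 1]]
y = [1, 1, 1, 0, 0, 0]

w_weak, b_weak = fit_lasso_logistic(X, y, l1_strength=0.01)
w_strong, b_strong = fit_lasso_logistic(X, y, l1_strength=2.0)
print("weak penalty:  ", w_weak)
print("strong penalty:", w_strong)   # every weight is shrunk to zero
```

In practice a library implementation with a tuned penalty would normally be used; the hand-written loop is only meant to make the shrinkage step visible.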
EXAMPLE OF PREDICTING SASA FROM SEQUENCE
[Figure: estimated fractional SASA plotted against computed fractional SASA from PDBs. RMSE values shown: 9.8%, 9%, 14.6%, 8.2%, 8.3%, 8.9%]
Jain et al., Bioinformatics, 2017
Yang et al., mAbs, 2017
ENCODING ANTIBODY DATA FOR MACHINE LEARNING
[Figure: feature matrix with one row per antibody]
HC information
• Aligned sequences: positions H1 to H113
• CDR H1 SASA, CDR H2 SASA, CDR H3 SASA: one column per amino-acid type (A R N D … V W Y) for each CDR
• CDR lengths: e.g. 10 17 14; 12 17 11
• VHF: e.g. VH1; VH4
+ LC information
Label: Desirable? (0 or 1)
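The encoding described on this slide can be sketched roughly as below. This is an illustrative reading of the figure, not the talk's code: the function names, the choice to sum SASA per amino-acid type, the all-zero encoding of alignment gaps, and the toy inputs are assumptions. Only the heavy chain is shown; light-chain features would be appended in the same way.

```python
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"   # 20 standard residues, fixed column order


def one_hot_aligned(aligned_seq):
    """One-hot encode an aligned sequence; gaps ('-') become all-zero blocks."""
    vec = []
    for res in aligned_seq:
        vec.extend(1.0 if res == aa else 0.0 for aa in AMINO_ACIDS)
    return vec


def sasa_by_amino_acid(cdr_seq, cdr_sasa):
    """Aggregate per-residue SASA within one CDR by amino-acid type.

    Assumption: aggregation is a plain sum; residues must be standard.
    """
    totals = dict.fromkeys(AMINO_ACIDS, 0.0)
    for res, sasa in zip(cdr_seq, cdr_sasa):
        totals[res] += sasa
    return [totals[aa] for aa in AMINO_ACIDS]


def one_hot_category(value, categories):
    """One-hot encode a categorical value such as a germline family."""
    return [1.0 if value == c else 0.0 for c in categories]


def encode_heavy_chain(aligned_seq, cdrs, germline, germline_families):
    """Concatenate sequence, per-CDR SASA, CDR lengths and germline family."""
    features = one_hot_aligned(aligned_seq)
    for cdr_seq, cdr_sasa in cdrs:                      # order: H1, H2, H3
        features.extend(sasa_by_amino_acid(cdr_seq, cdr_sasa))
    features.extend(float(len(cdr_seq)) for cdr_seq, _ in cdrs)
    features.extend(one_hot_category(germline, germline_families))
    return features


# Toy example (hypothetical values; real alignments are far longer).
cdrs = [("GYT", [0.1, 0.5, 0.2]),
        ("IN", [0.3, 0.4]),
        ("ARWY", [0.0, 0.6, 0.9, 0.7])]
row = encode_heavy_chain("QVQ-LV", cdrs, "VH1", ["VH1", "VH3", "VH4"])
print(len(row))   # 6*20 sequence + 3*20 SASA + 3 lengths + 3 germline columns
```

Each antibody becomes one fixed-length row, so rows for many antibodies can be stacked into a matrix and paired with the 0/1 "Desirable?" label.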
PREDICTION OF ANTIBODIES WITH POOR PSR SCORES
• Different machine learning methods perform comparably, though XGBoost is slightly better
• Simpler models using only aggregated SASA in CDRs by amino-acid type show reasonable performance when compared to models with full sequence information
• Logistic regression on the SASA models enables assessment of amino-acid propensities for poly-specificity
Experimental PSR data
• ~30000 antibodies with ~13000 distinct H3s
• Training and test splits done on the basis of H3s
Area under ROC curve, 10-fold cross-validation, PSR Score >0.1

Model                         XGBoost   Logistic Regression   Neural Network
Sequence                      0.74      0.74                  0.74
Sequence + AA SASA per CDR    0.77      0.74                  0.76
AA SASA per CDR               0.76      0.72                  0.72
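Splitting on the basis of H3s means that antibodies sharing an H3 never appear on both sides of a train/test split, which avoids scoring the model on near-duplicates of its training data. A minimal sketch of such a grouped fold assignment follows. It assumes that the grouping key is the exact H3 sequence, which the slide does not spell out, and the function names and toy sequences are hypothetical.

```python
from collections import defaultdict


def assign_folds_by_group(groups, n_folds=10):
    """Give every distinct group (here: H3 sequence) a single fold index.

    Groups are placed largest first, each into the currently smallest fold,
    so that fold sizes stay roughly balanced.
    """
    sizes = defaultdict(int)
    for g in groups:
        sizes[g] += 1
    fold_sizes = [0] * n_folds
    fold_of_group = {}
    for g, size in sorted(sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        target = fold_sizes.index(min(fold_sizes))
        fold_of_group[g] = target
        fold_sizes[target] += size
    return [fold_of_group[g] for g in groups]


def train_test_indices(folds, test_fold):
    """Row indices for one cross-validation round."""
    train = [i for i, f in enumerate(folds) if f != test_fold]
    test = [i for i, f in enumerate(folds) if f == test_fold]
    return train, test


# Toy example: seven antibodies, four distinct (made-up) H3 sequences.
h3s = ["ARDYW", "ARDYW", "AKGGF", "ARDYW", "TRSSY", "AKGGF", "ARWLL"]
folds = assign_folds_by_group(h3s, n_folds=3)

for k in range(3):
    train, test = train_test_indices(folds, k)
    shared = {h3s[i] for i in train} & {h3s[i] for i in test}
    print(k, test, "shared H3s:", shared)   # the shared set is always empty
```

A stricter variant would cluster similar H3s before assigning folds; that is a design choice beyond what the slide states.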
AMINO-ACID COEFFICIENTS FROM LOGISTIC REGRESSION FOR
PREDICTION OF PSR>0.1
Aromatic and positively-charged amino-acids show propensity for poor PSR
ELECTROSTATIC POTENTIAL MAPPED ONTO MAB SURFACE
[Figure: APBS electrostatics mapped onto mAb surfaces, grouped as High PSR and Low PSR. mAbs shown: basiliximab, bococizumab, guselkumab; gevokizumab, ibalizumab, ranibizumab]
Large positive patches seen in mAbs showing binding to PSR
CONCLUSIONS
• Training on known crystal structures enables prediction of structural metrics from sequence
• Machine learning methods can successfully predict, from sequence, antibodies exhibiting poor behavior in these assays
• Cross-validation AUCs for the PSR assay are 0.72 to 0.77
• Amino-acid propensities determined for PSR correlate with observations from other studies
• Predictions from sequence enable:
• Rapid predictions on millions of sequences to help design libraries enriched in the desired biophysical properties
• Improving lead clones, since determined amino-acid coefficients can identify individual positions that contribute to unfavorable developability
THANK YOU
LEARNING STRUCTURAL PROPERTIES FROM SEQUENCE
For each position i along the sequence,
$P_i = f(aa_i, aa_{n_1}, \ldots, aa_{n_N}, \mathrm{VHF}, \mathrm{VLF}, \text{CDR lengths})$
where, $P_i$ = structural property of amino-acid i, e.g. SASA
$aa_i$ = amino-acid type at i,
$aa_{n_1}, \ldots, aa_{n_N}$ = amino-acid types at N neighbors,
VHF, VLF = heavy and light chain germline family
Local sequence information: $aa_i$, $aa_{n_1}, \ldots, aa_{n_N}$
Global sequence information: VHF, VLF, CDR lengths
Train models using a database of ~1200 antibody structures curated from the PDB
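To make the inputs of the per-position model concrete, the sketch below builds the feature vector for one position from local and global sequence information. It is an assumption-laden illustration: the slide does not say whether the N neighbors are adjacent in sequence or in space, so a symmetric sequence window is used here, and the function names, family labels and toy values are hypothetical. The regressor f itself is left out; any supervised model could be trained on these vectors against per-residue properties computed from structures.

```python
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"


def one_hot(symbol, alphabet):
    """One-hot encode symbol over alphabet; unknown symbols give all zeros."""
    return [1.0 if symbol == a else 0.0 for a in alphabet]


def position_features(seq, i, n_neighbors, vhf, vlf, cdr_lengths,
                      vh_families, vl_families):
    """Inputs for predicting a structural property at position i.

    Local part: residue i plus n_neighbors flanking residues, split evenly
    between the two sides (assumption: neighbors are sequence-adjacent).
    Global part: germline families and CDR lengths.
    """
    half = n_neighbors // 2
    features = []
    for k in range(i - half, i + half + 1):
        # window slots that fall off either end of the chain are all zeros
        residue = seq[k] if 0 <= k < len(seq) else None
        features.extend(one_hot(residue, AMINO_ACIDS))
    features.extend(one_hot(vhf, vh_families))
    features.extend(one_hot(vlf, vl_families))
    features.extend(float(n) for n in cdr_lengths)
    return features


# Toy example with hypothetical family labels and CDR lengths.
x0 = position_features("QVQLV", i=0, n_neighbors=2,
                       vhf="VH3", vlf="VK1",
                       cdr_lengths=(10, 17, 14, 11, 7, 9),
                       vh_families=["VH1", "VH3", "VH4"],
                       vl_families=["VK1", "VL2"])
print(len(x0))   # 3*20 window + 3 + 2 family columns + 6 CDR lengths
```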