physics based protein ionization and pk estimation in...
TRANSCRIPT
Physics Based Protein Ionization and pK Estimation in
Discovery StudioDiscovery Studio®®
June 19, 2008
Francisco Hernandez-Guzman, Ph.D.Lead Solutions [email protected]
Start time: 7am PST and 10am PST
This presentation and/or any related documents contains statements regarding our plans or expectations for future features, enhancements or functionalities of current or future products (collectively "Enhancements"). Our plans or expectations are subject to change at any time at our discretion. Accordingly, Accelrys is making no representation, undertaking no commitment or legal obligation to create, develop or license any product or Enhancements. The presentation, documents or any related statements are not intended to, nor shall, create any legal obligation upon Accelrys, and shall not be relied upon in purchasing any product. Any such obligation shall only result from a written agreement executed by both parties. In addition, information disclosed in this presentation and related documents, whether oral or written, is confidential or proprietary information of Accelrys. It shall be used only for the purpose of furthering our business relationship, and shall not be disclosed to third parties.
© 2008 Accelrys, Inc. 2
Agenda:
• What is Discovery Studio?• Protein ionization and pK prediction
– Why do we care about accurate pK prediction? • General methodology
– Overall workflow for pK prediction• System preparation• Parameter and topology definition
– General Validation– Application areas
• Demo– Setting up a pK prediction job– Analysis of the results
• Validation test case OMTKY3
• Q&A
© 2008 Accelrys, Inc. 3
What is Discovery Studio?
• Discovery Studio is a unified Life Science integrated environment to perform molecular modeling and simulations calculations such as:
– In silico Structure Based Drug Design– Rational Drug Design– Protein Modeling– Sequence Analysis– Molecular Dynamics Simulations– X-ray crystallography refinement
• Discovery Studio is tightly integrated to the Pipeline Pilot platform technology for superior technology development and customization
Discovery Studio Client
Discovery Studio Client ISV
Client
ISV Client
PP Web Client
PP Web Client PP Editor
Client
PP Editor Client
Statistics Chemistry Biology ISV CollectionMat SciReportingIntegration
Workflow PlatformWorkflow Platform
Scientific Workflow Solutions 3rdParty Applications
ScientificApplications
(Pipeline Pilot)
Discovery Studio®
Pipeline Pilot
© 2008 Accelrys, Inc. 4
Protein ionization and pK prediction
Why do we care about accurate pK prediction?General Background:• Titratable residues: exist in protonated and
deprotonated forms
• Protonation state can be predicted from a titration curve (pK)
• Electrostatic properties of proteins change as a function of pH
Scientific implications in:
Electrostatic component of protein-ligand, protein-protein binding energy
pH dependent folding stability
Unusual titration curves functional relevant residues
Stable MD simulations
Site directed mutagenesis studies
pI estimation for crystallography
Example of titratable groups in proteins
Deprotonated Protonated Deprotonated Protonated
Arg Lys
Asp
N-terCys
Glu
Tyr His
C-ter
0
0.2
0.4
0.6
0.8
1
1.2
0 2 4 6 8 10 12 14
*:HIS26*:HIS95*:HIS100*:HIS115*:HIS185*:HIS195*:HIS224*:HIS248
His 95
© 2008 Accelrys, Inc. 5
Protein ionization and pK prediction method
A B C
o o o
+ o o
o + o
o o +
+ + o
+ o +
o + +
+ + +
• Exp. pK determination of penta-peptides model compounds
N-terminal –Ala- Ala – X – Ala – Ala - C-terminal
X = Asp, Glu, Arg, Lys, His, Tyr, or Lys
• N titratable sites 2N titration states (E.g. 3 sites => 8 titration states)
• Consider interaction of the titratable sites GBORN energy cutoff– Interaction of remaining titratable sites mean-field potential1
• Iterate through all N sites & calculate partial protonation until converge1
Titratable site
Other sites within cluster treated with full Boltzmann statistics
Sites outside clusterApproximated by mean-field potential
1. Spassov, V.Z. and Bashford, D. J. Comput. Chem. 1999, 20, 1091-1111
© 2008 Accelrys, Inc. 6
Protein ionization and pK prediction method (cnt’d)
• New protocol to “Calculate Protein Ionization and pK” a.k.a. GBpK
– Based on CHARMm Generalized-Born methods
– Creates titration curves and pK estimates for residues using 3D environment of protein structure
– Automatically sets the protonation state of each residue at given pH, based on pK
– Calculates pH dependent electrostatic energy
– Calculates pH dependent folding energy
0
0.2
0.4
0.6
0.8
1
1.2
0 2 4 6 8 10 12 14
*:GLU23*:GLU38*:GLU77*:GLU97*:GLU104*:GLU107*:GLU119*:GLU129*:GLU133*:GLU135*:GLU140*:GLU145*:GLU165*:GLU183*:GLU186*:GLU219*:GLU239
Reference including validation: Spassov. et al, 2008, manuscript submitted (available upon request)
© 2008 Accelrys, Inc. 7
pK prediction of selected Proteins
• Comparison of experimental pK1/2 with calculated values for select PDB files
• All computations about 1 minute per system on a single CPU
PDB code System Resolution
Å Sites GBpK Polar H
GBpKAll H
4pti 0.36
0.57
0.68
0.71
0.57
0.35
0.53
0.94
0.70
0.65
0.59
0.53
0.27
0.99
206
0.60
2lzt
2rn2
3rn3
1pga
3icb
1hng
1a2p
1omu
9rnt
3rnt
1bi6
1bi6
1rgg
PROPKA* MCCE**
(ε = 8)
1 1.50
1.97
1.48
1.45
2.07
2.30
2.80
1.50
50 mods
1.50
1.80
1 mod
1 mod
1.20
Trypsin 14 0.36 0.6 0.47
2 Lysozyme 21 0.45 0.66 0.74
3 Ribonuclease H (E.Coli) 25 0.59 0.72 0.87
4 Robonuclease A (Bovine) 16 0.47 0.67 0.66
5 Streptococcal Prot. G B1 binding domain - 15 0.50 0.72 0.63
6 Calcium binding protein (bovine) - 10 0.33 0.9 0.38
7 Cell adhesion molecule CD2 - 14 0.55 0.83 0.76
8 Barnase 14 0.86 0.68 0.89
9 OMTKY3 15 0.64 0.44 1.10
10 Ribonuclease T1 - 15 0.54 1.51
Ribonuclease T1 - 4 0.28 0.54
11 Bromelain Inhibitor IV - heavy chain 18 0.54 0.56
12 Bromelain Inhibitor IV -light chain 4 0.18 0.38
13 HydrolaseGuanyloribonuclease 24 0.84 0.97
Total sites: 206 206 184
Average RMSD 0.52 0.74 0.72
* Li et al. 2005 Proteins, 61, 704-721.** Georgescu et al. 2002 Biophysical Journal, 83, 1731-1748.
RMSD = SQRT{SUM [ [ pKexp(i)- pKcalc(i) ]^2] /Nsites]}Sum is over all residues for which experimental pK values are known
0.520.60
0.74 0.72
00.10.20.30.40.50.60.70.80.9
1
RMSD Comparison of pK prediction
GBpK (polar H)
GBpK (all H)
PROPKA
MCCE
© 2008 Accelrys, Inc. 8
Lysozyme Validation1
pKhalf
Residue Experimental CHARMm polar H
CHARMm
NTR1 7.9 7.81 8.00 LYS1 10.6 10.01 10.01 GLU7 2.9 3.17 3.39 LYS13 10.3 10.49 10.56 HIS15 5.4 6.20 5.87 ASP18 2.7 2.87 3.11 TYR20 10.3 10.85 11.18 TYR23 9.8 10.16 10.87 LYS33 10.4 10.58 10.79 GLU35 6.2 5.05 5.90 ASP48 2.5 2.96 2.91 ASP52 3.7 4.32 4.67 TYR53 >12 11.71 >12 ASP66 <2.0 2.15 2.87 ASP87 2.1 2.43 2.97 LYS96 10.7 11.18 11.42 LYS97 10.1 10.79 10.85 ASP101 4.1 3.89 3.92 LYS116 10.2 10.12 10.09 ASP119 3.2 3.08 3.28 CTR129 2.8 2.73 2.83 RMSD 0.45 0.57
Comparison of the computed pKhalf values with the experimental pKa values of the acidic and basicgroups of hen-egg white lysozyme (PDB ID 2lzt). Experimental data from Kuramitsu and
Hamaguchi,1980 and Bratik et al., 1994.
1. Results are from “A Fast and Accurate Computational Approach to Protein Ionization”, Spassov et al, submitted
© 2008 Accelrys, Inc. 9
Lysozyme and HIV-1 protease Validation1
-2
0
2
4
6
8
10
12
14
16
18
20
0 2 4 6 8 10 12pH
Tota
l cha
rge
The pH-dependence of lysozyme total charge at different ionic strengths. Open circles and solid line: experiment and computed values at 0.1 M ionic strength; filled circles and broken line: the same at 1.0 M ionic strength.
12
13
14
15
16
0 2 4 6 8 10pH
Ene
rgy
[kca
l/mol
]The pH dependent contribution to the binding energy of KNI-272 inhibitor to HIV-1 protease, compared to the experimental values of association constant Ka taken from Velazquez-Campoy et al.2. The solid line represents the computed values of -∆∆Gbind . The triangles correspond to the values of 2.303RTlogKa
Lysozyme HIV-1 protease
1. Results are from “A Fast and Accurate Computational Approach to Protein Ionization”, Spassov et al, submitted2. Protein Sci. 2000 Sep;9(9):1801-9
© 2008 Accelrys, Inc. 10
Comparative Validation1
Ngr
GB/IMC
MCCE Const. pH FD/DH SCP PROPKA
4pti 14 0.36 0.47 NA 0.35 0.33 0.6 2lzt 21 0.45 0.76 0.6 0.47 0.49 0.66 2rn2 25 0.59 0.87 0.9 1.17 0.57 0.72 3rn3 16 0.44 0.66 1.2 0.87 0.55 0.94 1pga 15 0.42 0.63 NA 0.80 0.59 0.72 3icb 10 0.33 0.38 NA 0.37 0.39 0.9 3rnt 4 0.28 0.54 NA NA 0.41 NA
average 0.41 0.63 - 0.67 0.49 0.76
MCCE: Georgescu et al. 2002;Const. pH: Khandogin and Brooks,2006; FD/DH: Warwiker,2004; SCP: Mehler,1999; PROPKA:Li et al., 2005.
comparison of the accuracy of Accelrys pK predictions with other methods.
0
1
2
3
4
5
6
0 100 200 300 400 500 600 700 800
residues
Tim
e [m
in]
The CPU time used to calculate the ionization of proteins with different chain lengths. The data were generated on an
Intel Pentium4 3.0 GHz machine
MCCE – FDPB based method with Monte Carlo
Const. pH – Molecular Dynamics
FD/DH – Finite Difference/Debye-Hückel interactions
SCP – Sigmoidally Screened Columb potentials
PROPKA – Empirical relationships of position and chemistry of residues closed to pK sites
1. Results are from “A Fast and Accurate Computational Approach to Protein Ionization”, Spassov et al, submitted
© 2008 Accelrys, Inc. 11
Application 1: Accurate Protein-Ligand Binding Energies
• Estimate the electrostatic contribution to binding energy for protein-ligand and protein-protein docking– Protonation states of proteins may differ in docked
and undocked form• This method predicts protonation states
accurately taking into account of the local environment changes upon ligand docking
– Faster than DelPhi methods• Only two calculations required
Ligand
Fast and accurate electrostatic component of binding energy due to consideration of proper
protonation states
Fast and accurate electrostatic component of binding energy due to consideration of proper
protonation states
pH-Dependent binding energy of HIV protease
© 2008 Accelrys, Inc. 12
Application 2: pH Dependent Folding Energy
• Calculation done for HIV protease
• Optimal pH for protein stability– Experimental ~ 5– Predicted ~ 4.6
• Folding free energy change from pH 3.5 to 5– ∆∆Gcalc(pH=3.4->5.0) = 3.5
kcal/mol– ∆∆Gexp (pH=3.4->5.0) ~ 4.5
kcal/mol
0
5
10
15
20
25
30
35
40
0 1 2 3 4 5 6 7 8 9 10 11 12
pH
∆∆
Gfo
ld
[kca
l/mol
]
Ile50His
wild type
folded
unfolded
Optimal pH for stability critical for all biochemical assays
Optimal pH for stability critical for all biochemical assays
© 2008 Accelrys, Inc. 13
Application 2: pH Dependent Folding Energy
• Calculation done for HIV protease
• Optimal pH for protein stability– Experimental ~ 5– Predicted ~ 4.6
• Folding free energy change from pH 3.5 to 5– ∆∆Gcalc(pH=3.4->5.0) = 3.5
kcal/mol– ∆∆Gexp (pH=3.4->5.0) ~ 4.5
kcal/mol
0
5
10
15
20
25
30
35
40
0 1 2 3 4 5 6 7 8 9 10 11 12
pH
∆∆
Gfo
ld
[kca
l/mol
]
Ile50His
wild type
folded
unfolded
Optimal pH for stability critical for all biochemical assays
Optimal pH for stability critical for all biochemical assays
© 2008 Accelrys, Inc. 14
Application 3: Active Site Residue prediction
• Predict potential active sites based on analysis of titration curve for specific residues
– A cluster of two or more perturbed residues in physical proximity is a reliable predictor of active site location1
• pK prediction experiment with Triose Isomerase (TIM) structure (PDB ID 1tph) – (demo)
1. MJ. Ondrechen, JG. Clifton, and D. Ringe, PNAS, 98: 12473-12478 (2001)
© 2008 Accelrys, Inc. 15
Protein Ionization and pK prediction
Overall workflow for pK prediction1. System preparation
– Extremely rare to find experimentally determined protein structures ready for simulations calculations (main reason – missing hydrogens)
– Protein Report useful to find out what is “missing”– Automatic cleaning available
• Need to optimize newly built side-chains– Parameter and topology definition – “Typing”
• Addition of hydrogens• Atom type assignment
2. Experiment options and submission– Protein dielectric constant– Ionic strength
© 2008 Accelrys, Inc. 16
DEMO
– Setting up a pK prediction job• Report, cleaning and typing
– Analysis of the results• Titration plots• 1tph example
© 2008 Accelrys, Inc. 17
Validation Case Study
• Turkey Ovomucoid Third Domain (OMTKY3) is a small natural serine protease inhibitor.
– Inhibits: bovine chymotrypsins A& B, porcine pancreatic elastase I, protease K, Streptomyces griseus protease A & B and substilisins BPN’ & Carlsberg.(Biochemistry 1985, 24, 53-13 – 5320)
– Well characterized structure (56 residues, 814 atoms)(PNAS 1982, 79, 4868 – 4872)(Protein Science 1995, 4, 2289 – 2299)
– Widely studied small protein for pKa studies, including QM/MM methods. (J. Phys. Chem. B 2002, 106(13), 3486 – 3494)
– Lots of experimentally available pKa data for most of its ionizable residues (PPD Protein pKa Database). Very stable at high pH.
© 2008 Accelrys, Inc. 18
Accuracy - Experimental vs Calculated (GBpK)
• Experimental pKa data available for 1omu.pdb (NMR structure – 50models)
Experimental
N-term
ASP 7
GLU 10
TYR 11
LYS 13
GLU 19
TYR 20
ARG 21
ASP 27
LYS 29
TYR 31
LYS 34
GLU 43
HIS 52
LYS 55
CYS 56
Calculated (CHARMm) – model 01
0.015M 0.2M 0.5M diff 0.5M diff
7.74
1.45
0.81
1.48
0.21
0.04
0.02
3.31
4.27
11.58
10.68
4.13
9.62
12.53
3.85
11.12
11.36
10.16
4.55
6.83
10.77
3.54
0.90
0.73
10.13
9.87
11.1
-
10.91
10.12
10.79
-
12.5
7.5
0.015M diff 0.2M
7.96 7.70 0.26 7.71
2.7 2.73 0.03 3.18
4.2 3.91 0.29 4.18
10.16 12.45 2.29 11.80
9.86 11.05 1.19 10.74
3.2 3.57 0.37 3.98
11.07 9.70 1.37 9.62
- 13.29 12.70
2.3 3.36 1.06 3.73
11.12 11.76 0.64 11.28
12.26 11.60
10.13 10.23 0.10 10.12
4.8 4.52 0.28 4.58
6.65 6.77
11.1 11.17 0.07 10.86
2.5 2.92 0.42 3.39
Absolute difference in prediction(GBpK all H)
0.00
0.50
1.00
1.50
2.00
2.50
3.00
3.50
titratable residues
Bold indicates predictions >1.5 pH units from experiment.
© 2008 Accelrys, Inc. 19
Accuracy - Experimental vs Calculated (GBpK) cnt’d
Exp to Calc pKa prediction (0.015M) - model 01CHARMm all H
0.00
2.00
4.00
6.00
8.00
10.00
12.00
14.00
0 2 4 6 8 10 12
Experimental
Cal
cula
ted
1omu.pdb (model 01/50)
R = 0.973
R2 = 0.947
Slope = 1.006Y-intercept = 0.261
RMSD = 0.90 pH unitsObservation:Higher variation amongst basic groups…
© 2008 Accelrys, Inc. 20
Accuracy - Experimental vs Calculated (GBpK) cnt’d
1omu.pdb (model 01/50)
R = 0.985
R2 = 0.971
Slope = 1.017Y-intercept = 0.046
RMSD = 0.66 pH unitsObservation:Better correlation with CHARMm polar H
Exp to Calc pKa prediction (0.015M) - model 01CHARMm - polar-H
0.00
2.00
4.00
6.00
8.00
10.00
12.00
14.00
0 2 4 6 8 10 12Experimental
Cal
cula
ted
© 2008 Accelrys, Inc. 21
Accuracy - Experimental vs Calculated (GBpK) cnt’d
• Other interesting observations 1omu.pdb (model 01):
– Significant pKa shifts: (>1)
Residue: Standard pKa*
Experimental
pKa (15mM)
Predicted
Forsyth, et al. 1998
GBpK (CHARMm H)
3.3 2.73
3.57
3.366.77
2.83.46.2
Asp 7 3.67 2.7 2.63
Asp 27 3.67 2.3 3.13
Glu 19
His 52
GBpK (CHARMm
polar-H)
4.25 3.2 3.52
6.54 7.5 6.42
GBpK (full H and polar-H) predicts 3/4 correctly within 1 pH unit!!!
* Updated values according to Thurlkill, et al (2006)
© 2008 Accelrys, Inc. 22
LYS 55 – QM/MM vs GBpK (model 01)
• Experimental value for LYS 55 = 11.10
• QM/MM calculated value = 11.00
(model 01)
10 days computation time (4GB, 3/4 nodes RS/6000 44P 270 workstation)
• GBpK calculated value (model 01): CHARMm full H = 11.17
CHARMm polar H = 11.24
~30 secs/run computation time (2GB, 1 node, 2.8Ghz, Win XP IBM workstation)
© 2008 Accelrys, Inc. 23
LYS 55 –GBpK (CHARMm, all models)
• Experimental value for LYS 55 = 11.10
Histogram - LYS 55 (GBpK)
0
2
4
6
8
10
12
14
16
10 10.2
10.4
10.6
10.8 11 11.2
11.4
11.6
11.8 12
pH
Freq
uenc
y
Histogram - LYS 55 (GBpK - polar-H)
0
2
4
6
8
10
12
10 10.2
10.4
10.6
10.8 11 11.2
11.4
11.6
11.8 12
pHFr
eque
ncy
Average = 10.99 ± 0.27 Average = 11.02 ± 0.23
© 2008 Accelrys, Inc. 24
Conclusion
• GBpK is a GBORN-based method that estimates pK within just a few minutes– More accurate and rigorous than template-based methods– Faster than existing Poison-Boltzmann methods
• Very simple to use– Requires a known structure– Only one parameterized variable: dielectric constant (ε) of protein interior
• Method can be used to identify optimal pH for proteins
• Correct protein protonation based on calculated pK can improve quality of:– Molecular dynamics simulations– Protein-ligand docking – Protein-ligand binding energy calculation (electrostatic part)
• Methods can be used to identify and characterize active site residues
• Predictions (OMTKY3) show high correlation with exp. data & good RMSD deviation (< than 1 pH unit)
• Calculations for OMTKY3 - LYS 55 are in good agreement with experimental data and QM/MM predicted values
• Unusual pK shifts of OMTKY3 are well predicted
© 2008 Accelrys, Inc. 25
Acknowledgements
• Velin Spassov
• Lisa Yan
• Paul Flook
• Dipesh Risal
• Shikha O’Brien
© 2008 Accelrys, Inc. 26
Thank you!!!
• Thank You for attending today’s webinar. If you have any further questions please e-mail me at: [email protected]
• You can also contact us using the form on our website: http://accelrys.com/company/contact/
• We will be exhibiting at the following upcoming events:– CHI Protein Kinase Targets (June 23 – 25, Boston, Booth #4)– CHI Structure Based Design (June 25 – 27, Boston, Booth #7)– Drug Discovery Technology and Development (August 4 – 7, Boston, Booth #512)– ACS Fall 2008 (August 17 – 21, Philadelphia, Booth #211)
• Reminder: the next webinar in this series will be:
“Towards Increased Accuracy in Computational Drug Discovery with QM/MM”
June 26, 2008 at 7am PST and 10am PST
© 2008 Accelrys, Inc. 27
References:
• Bashford D. and Karplus M. Multiple-site titration curves of proteins: an analysis of exact and approximate methods for their calculation. J. Phys. Chem 1991,95: 9556 –9561
• Brooks,B.R.,Bruccoleri,R.E.,Olafson,B.D.,States,D.J.,Swaminathan,S.,and Karplus, M., “CHARMM: A program for macromolecular energy, minimization, and dynamics calculations,” J. Comput. Chem. , 1983 , 4 , 187-217
• Hui Li, et al. The Prediction of Protein pKa’s Using QM/MM: the pKa of Lysine 55 in Turkey Ovomucoid Third Domain. J. Phys. Chem. B. 2002, 106(13): 3486 – 3494
• Thurlkill R, et al. pK values of the ionizable groups of proteins. Protein Science 2006, 15: 1214 – 1218
• Spassov, V., Yan, L. and Flook, P. “The Dominant Role of Side-chain Backbone Interactions in Structural Realization of Amino-acid Code. ChiRotor: a Side-chain Prediction Algorithm Based on Side-chain Backbone Interactions,” Protein Science , 2007
• Spassov, V. et al. “LOOPER: A Molecular Mechanics Based Algorithm for Protein Loop Prediction”, Accepted, Protein Engineering, Design and Selection
• Spassov, V. and Yan, L. “A Fast and Accurate Computational Approach to Protein Ionization” Submitted, 2008