Poisson-Boltzmann, Generalized Born, and Calculating pKa Values

Lecture 19

Outline

Part 1: Macromolecular electrostatics
Part 2: Coulomb’s law
Part 3: Poisson equation
Part 4: Poisson-Boltzmann equation
Part 5: Calculating pKa values via continuum electrostatics
Part 6: The generalized Born solvent model

Macromolecular Electrostatics

Why do we care about electrostatics in the first place?

(1) Electrostatic interactions are very long-ranged. In general, they are the only chemical interaction that can extend past a few Angstroms.

(2) They can accelerate the rates at which molecules associate (e.g. Cu,Zn-superoxide dismutase).

(3) Proteins and DNA (our favorite macromolecules) are highly charged.

(4) The cytoplasm of cells contains a relatively high concentration of dissolved ions.

Intramolecular electrostatic interactions are critical to protein stability; nevertheless, they must compete with solvent interactions

Electrostatics define substrate recognition, and they are evolutionarily selected for

Electrostatic potentials and fields

Q: What is an electrostatic potential?

The electrostatic potential is a scalar quantity, meaning it has no direction.

Suppose we place a charged particle into an electric field. The electrostatic potential is the quantity that, when multiplied by the charge on the particle, tells us the energy required to place the particle in the field.

Q: Then what is an electrostatic field?

The electrostatic field is a vector (which does have direction) given by the negative gradient of the electrostatic potential.

When multiplied by the charge on the particle, it tells us the force acting on the particle.

Couldn’t I just calculate the electrostatic potential from an MD simulation?

In principle, this is feasible. For example, we could calculate the work required to bring a unit charge from far away towards the protein. The work required would provide the electrostatic potential.

In practice, this would be tough to do because of the large number of solvent configurations that must be sampled during the simulation.

Instead of an explicit solvent simulation, we use an implicit (continuum) model. Here the molecular details of the solvent are ignored. The solvent is modeled as a uniform high dielectric. (The dielectric constant of water is about 80, so that’s what we use.)

Study Hint: Make sure you understand what a dielectric is!

Solvation model hierarchy

Polarizable explicit solvent

Fixed charge explicit solvent

Nonlinear Poisson-Boltzmann

Linear Poisson-Boltzmann

Generalized Born

Distance dependent dielectric

Surface-area based models

[Figure: the list above is annotated with two axes, “Computational Expense” and “Physical Reality”.]

Part 2: Coulomb’s law


Electrostatic potential vs. force via Coulomb’s law

Coulomb’s law is most commonly expressed as:

$$F = \frac{q_1 q_2}{4\pi\varepsilon_0 r^2}$$

where the electrostatic force (as presented) is a scalar quantity. Note that this expression scales as 1/r². The electrostatic force is the product of the charge and the electric field (F = qE). (Quick aside: vector descriptions of the above expression can also be constructed.)

In contrast to typical E&M problems, molecular modeling generally requires the electrostatic potential, which scales as 1/r. In the same way as above, the electrostatic energy is the product of the charge and the electrostatic potential (U = qφ).

Electrostatic potential via Coulomb’s law

The first expression we have seen already; the second gives the Coulomb potential at a distance r from an ion of charge z_i e:

$$\phi_i(r) = \frac{z_i e}{4\pi\varepsilon_0 r}$$

The above expressions give the potential due to isolated charges in a vacuum.

Q: How do we then calculate potentials in solution?

Electrostatic potential via Coulomb’s law

How does in vacuo behavior translate to solution?

Study Question: Above we stated that Coulomb’s law (as implemented in a MM force field) is based on a vacuum. Then how can we calculate a realistic electrostatic potential via an MD simulation (which uses Coulomb’s law for its electrostatic contributions)?

Electrostatic potential via Coulomb’s law

The Coulomb potential at a distance r from an ion of charge z_i e is

$$\phi_i(r) = \frac{z_i e}{4\pi\varepsilon_0 r}$$

This is the potential due to an isolated ion in a vacuum.

In solution, two modifications are needed: (1) PERMITTIVITY: the solvent decreases the strength of the potential, and (2) DEBYE LENGTH: the ionic atmosphere increasingly shields the interaction at larger distances.

Relative permittivity

SUBSTANCE    RELATIVE PERMITTIVITY (ε_r)
Water        78.54
Ammonia      16.90
Ethanol      24.30
Benzene       2.27
Vacuum        1.00

Often, the observed permittivity is expressed in terms of the relative permittivity (aka dielectric constant), through ε = ε_r ε_0.

Note: you should approximately know these numbers!

The Debye length describes the maximal separation at which a given electron will be influenced by the electric field of a given positive ion.

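Coulomb’s law:

$$\phi_i(r) = \frac{z_i e}{4\pi\varepsilon_0 r}$$

Accounting for relative permittivity (ε = ε_r ε_0):

$$\phi_i(r) = \frac{z_i e}{4\pi\varepsilon r}$$

Accounting for the Debye length (r_D):

$$\phi_i(r) = \frac{z_i e}{4\pi\varepsilon r}\, e^{-r/r_D}$$

One equation, two unknowns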

Recall, before we had one equation, two unknowns…

I’m not going to go through the algebra (it’s in any P-Chem text), but the Debye length can be defined as…

meaning we can easily solve for the Debye length in terms of quantities we already know. I’m also not going to go into all the terms, but they are also in any P-Chem text.

The Debye length dampens the electrostatic potential at a given distance

• The above equations represent the Coulomb potential in a medium of permittivity ε.

• However, we rarely only concern ourselves with a single charged particle.

• In electrostatics, the potential arising from a charge distribution is given by Poisson’s equation.
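As a rough illustration of the two solution corrections (permittivity and Debye screening), the sketch below evaluates the potential of a single point charge in vacuum, in water, and in salty water. The function name, the use of kcal/mol/e units, and the 8 Å Debye length are illustrative assumptions, not values taken from the lecture.

```python
import math

# e^2 / (4*pi*eps0) expressed in kcal*Angstrom/(mol*e^2)
COULOMB_KCAL = 332.06371


def screened_potential(z, r, eps_r=1.0, debye_length=None):
    """Potential (kcal/mol/e) at distance r (Angstrom) from a point charge z (units of e).

    eps_r scales the potential down by the relative permittivity;
    debye_length (Angstrom), if given, adds the exp(-r/r_D) screening factor.
    """
    phi = COULOMB_KCAL * z / (eps_r * r)
    if debye_length is not None:
        phi *= math.exp(-r / debye_length)
    return phi


# Hypothetical comparison at 10 Angstrom from a +1 charge
for label, kwargs in [
    ("vacuum", {}),
    ("water", {"eps_r": 78.54}),
    ("water + salt", {"eps_r": 78.54, "debye_length": 8.0}),
]:
    print(f"{label:>12s}: {screened_potential(1, 10.0, **kwargs):8.4f} kcal/mol/e")
```

The ordering vacuum > water > water + salt is the point of the example; the absolute numbers depend on the assumed Debye length.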

Part 3: Poisson equation


The Poisson equation

The Poisson equation is one of the fundamental equations of classical electrostatics. In Gaussian units:

$$\nabla \cdot \left[ \varepsilon(\mathbf{r}) \nabla \phi(\mathbf{r}) \right] = -4\pi \rho(\mathbf{r})$$

φ is the electrostatic potential, ε is the dielectric constant of the medium, ρ is the charge density, and r is the position vector. Note that the electrostatic potential, dielectric constant, and charge density are all position dependent.

Finite-difference methods

Finite-difference methods approximate the solutions to differential equations by replacing the derivative expressions with approximately equivalent difference quotients.

Without going into the details, FD methods do a first order Taylor expansion on the problem.

The error of FD is related to: (i.) machine precision, and --more importantly-- (ii.) truncation of the expansion.

To use a FD method to approximate a problem, one must first discretize the problem’s domain. Note that this means that FD methods produce sets of discrete numerical approximations to the derivative.
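To make the discretization idea concrete, here is a minimal sketch that solves a one-dimensional Poisson problem on a grid by Gauss-Seidel relaxation. The 1D setting, the uniform dielectric, the reduced units (d²φ/dx² = -ρ), and the choice of Gauss-Seidel are all simplifying assumptions for illustration; production PB solvers work in 3D with a position-dependent dielectric.

```python
def solve_poisson_1d(rho, h, n_sweeps=5000):
    """Relax d2(phi)/dx2 = -rho on a uniform grid with phi = 0 at both ends.

    The second derivative is replaced by the central difference
    (phi[i-1] - 2*phi[i] + phi[i+1]) / h**2, which is then solved for phi[i].
    """
    n = len(rho)
    phi = [0.0] * n
    for _ in range(n_sweeps):
        for i in range(1, n - 1):
            phi[i] = 0.5 * (phi[i - 1] + phi[i + 1] + h * h * rho[i])
    return phi


n = 21                      # number of grid points on the interval [0, 1]
h = 1.0 / (n - 1)           # grid spacing
rho = [1.0] * n             # uniform charge density (reduced units)
phi = solve_poisson_1d(rho, h)

# Analytical solution for this test problem: phi(x) = x * (1 - x) / 2
exact = [0.5 * (i * h) * (1.0 - i * h) for i in range(n)]
max_err = max(abs(a - b) for a, b in zip(phi, exact))
print(f"largest deviation from the analytical solution: {max_err:.2e}")
```

Each grid value is repeatedly replaced by an average of its neighbors plus a source term, which is the same basic update that 3D finite-difference solvers apply at every grid point.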


Assigning charges to the grid: It is very probable that the positions of charges do NOT correspond to positions on the grid. In order to overcome this, a given charge (partial or otherwise) is split up and parts of it are assigned to the 8 nearest grid points.
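One common way to split a charge over the 8 surrounding grid points is trilinear weighting, sketched below. The weighting scheme and the function name are assumptions for illustration; the lecture does not say which scheme its software uses.

```python
import math


def spread_charge(pos, q, h):
    """Split charge q at pos = (x, y, z) over the 8 corners of its grid cell.

    h is the grid spacing. Each corner receives a fraction of q that grows
    as the charge gets closer to it (trilinear weights), and the fractions
    sum to q.
    """
    base = [math.floor(c / h) for c in pos]          # lower corner indices
    frac = [c / h - b for c, b in zip(pos, base)]    # fractional offsets in [0, 1)
    weights = {}
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                w = ((frac[0] if dx else 1.0 - frac[0])
                     * (frac[1] if dy else 1.0 - frac[1])
                     * (frac[2] if dz else 1.0 - frac[2]))
                weights[(base[0] + dx, base[1] + dy, base[2] + dz)] = q * w
    return weights


# Hypothetical partial charge of -1 placed off-grid on a 1 Angstrom grid
grid_charges = spread_charge((0.25, 0.5, 0.75), -1.0, 1.0)
for point, value in sorted(grid_charges.items()):
    print(point, round(value, 5))
```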

Assigning the dielectric values: A sphere is “rolled” over the surface of the protein. Every atom that the sphere “touches” is considered to be solvent accessible and part of the high dielectric (80). All others are considered solvent inaccessible and are assigned to the low dielectric. This is typically what is known as a Connolly Surface.


Focusing: The accuracy of the finite-difference method is dependent on grid spacing. Generally speaking, the smaller the grid spacing, the more accurate the result.

However, as the number of grid points increases, so does CPU time. Focusing is used to circumvent this problem.

…for example, using a 50³ grid, first we calculate using a spacing of 2 Å, then we focus in on the molecule with a finer grid (say 1 Å), and finally we use still finer grid(s) to improve the results.

This is actually computationally less expensive than doing a single calculation with, say, a 150³ grid with a fine resolution.
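The saving from focusing is easy to check by counting grid points. The sketch below assumes three focusing passes on a 50³ grid and assumes cost grows at least in proportion to the number of grid points; both are illustrative assumptions.

```python
def grid_points(n_per_side):
    """Number of points in a cubic grid with n_per_side points along each edge."""
    return n_per_side ** 3


focused = 3 * grid_points(50)   # three successive 50^3 calculations
single = grid_points(150)       # one 150^3 calculation

print(f"focusing (3 x 50^3): {focused:,} grid points")
print(f"single 150^3 grid:   {single:,} grid points")
print(f"ratio: {single / focused:.0f}x")
```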

The Poisson equation

So, for a given ε and ρ, the finite-difference solutions to the PE allow you to efficiently determine the electrostatic potential.

Brilliant, we can now do almost everything we need. However, something critical is still missing. What is it?

[Figure: the charged macromolecule embedded in a high dielectric continuum.]

Part 4: Poisson-Boltzmann equation


The PE lacks counter ion information!

[Figure: the charged macromolecule surrounded by mobile counter ions (+ and -).]

Note that the counter-ion concentrations are equal, [+] = [-], whereas this is not true for the protein charges.

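The Poisson-Boltzmann equation

$$\nabla \cdot \left[ \varepsilon(\mathbf{r}) \nabla \phi(\mathbf{r}) \right] = -4\pi \left[ \rho_{\mathrm{macro}}(\mathbf{r}) + \sum_i q_i n_i^0 \exp\left( -\frac{q_i \phi(\mathbf{r})}{k_B T} \right) \right]$$

First term in brackets: charge distribution due to the macromolecule.

Second term in brackets: (implicit) charge distribution due to counter ions.

What’s in a name?

Poisson – as in the Poisson equation, which relates the electrostatic potential to the charge density.

Boltzmann – as in the Boltzmann distribution, which is employed here to provide the density of mobile counter ions.

These two equations are “married” together to relate the electrostatic potential in a dielectric to the charge density in the presence of mobile counter ions.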



The basic principle: recall that in biological settings, macromolecules are not dissolved in pure water, but instead are immersed in (dilute) saline solutions.

Now, to solve the Poisson equation, we need to know (among other things) how the charges are distributed – where they’re at.

For the macromolecule, that’s easy – the structure tells the location of every atom and our force field assigns the charges in an appropriate way.

But what about the counter ions? I don’t necessarily need a force field to tell me the charge of a sodium ion (+1), but I do need some help locating it.

The answer to this seemingly unanswerable conundrum again lies in statistical mechanics. The number of ions (of type i) per unit volume of a region of space is given by the Boltzmann distribution:

A Boltzmann probability distribution is used to describe ion density

$$n_i = n_i^0 \exp\left( -\frac{q_i \phi}{k_B T} \right)$$

where n_i^0 is the number density of ions in bulk solution, q_i is the charge on the ion, φ is the electrostatic potential in that region of space, k_B is Boltzmann’s constant, and T is the temperature.

The Boltzmann distribution tells us…

…the anions accumulate where the potential is positive, and…

…the cations accumulate where the potential is negative.

Multiplying the counter ion number density and charge results in the desired charge density ρ(r).

Note: recall charge x potential = energy.
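A small numerical sketch of the Boltzmann ion density, n_i = n_i^0 exp(-q_i φ / k_B T). To keep it unit-free, the potential is passed as the reduced quantity u = eφ/(k_B T) and the ion charge as a valence z (so q = z·e); these conventions and the function names are assumptions made for illustration.

```python
import math


def ion_density(n_bulk, z, reduced_potential):
    """Local ion density for valence z at reduced potential u = e*phi/(k_B*T)."""
    return n_bulk * math.exp(-z * reduced_potential)


def mobile_charge_density(n_bulk, reduced_potential):
    """Net mobile charge density (in units of e per volume) for a 1:1 salt."""
    cations = ion_density(n_bulk, +1, reduced_potential)
    anions = ion_density(n_bulk, -1, reduced_potential)
    return (+1) * cations + (-1) * anions


# Where the potential is positive (u = +1), anions are enriched and cations depleted
print("anions: ", ion_density(1.0, -1, 1.0))
print("cations:", ion_density(1.0, +1, 1.0))
print("net mobile charge:", mobile_charge_density(1.0, 1.0))
```

For a 1:1 salt the net mobile charge equals -2 n⁰ sinh(u), which is where the hyperbolic sine in the PBE comes from.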

The linear and nonlinear forms of the PBE

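For a 1:1 electrolyte (ion charges ±e with equal bulk densities n⁰), the two Boltzmann terms combine into a hyperbolic sine. Inserting the sinh term gives…

$$\nabla \cdot \left[ \varepsilon(\mathbf{r}) \nabla \phi(\mathbf{r}) \right] - 8\pi e n^0 \sinh\left( \frac{e \phi(\mathbf{r})}{k_B T} \right) = -4\pi \rho_{\mathrm{macro}}(\mathbf{r})$$

Note, the hyperbolic sine term can be expanded as a Taylor series…

$$\sinh x = x + \frac{x^3}{3!} + \frac{x^5}{5!} + \cdots$$

If x (in our case, eφ/k_BT) is small, we can ignore all except the first term of the series. This linearization approximation results in the Linear Poisson-Boltzmann Equation (LPBE).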

Note: this can only be done for small values of potential. For highly charged systems, the potential is likely to be large and the full Non-Linear Poisson-Boltzmann Equation (NLPBE) is required.
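To get a feel for when the linearization is acceptable, the sketch below compares sinh(x) with its first Taylor term x, where x stands for the reduced potential eφ/(k_B T). The sample values of x are arbitrary.

```python
import math


def linearization_error(x):
    """Relative error made by replacing sinh(x) with x (for x > 0)."""
    return abs(math.sinh(x) - x) / math.sinh(x)


for x in (0.1, 0.5, 1.0, 3.0):
    print(f"x = {x:3.1f}: sinh(x) = {math.sinh(x):7.4f}, "
          f"relative error of linear form = {100 * linearization_error(x):5.1f}%")
```

The error is negligible for potentials well below k_B T/e and grows rapidly beyond it, which is why highly charged systems call for the nonlinear equation.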

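Rewriting the PBE…

Often, the PBE is seen in the following form (with the potential expressed in units of k_BT/e):

$$\nabla \cdot \left[ \varepsilon(\mathbf{r}) \nabla \phi(\mathbf{r}) \right] - \kappa'^2 \sinh\left[ \phi(\mathbf{r}) \right] = -4\pi \rho(\mathbf{r})$$

κ′ is related to the Debye-Hückel inverse length (κ) by:

$$\kappa^2 = \frac{\kappa'^2}{\varepsilon} = \frac{8\pi N_A e^2 I}{1000\, \varepsilon k_B T}$$

where e is the electronic charge, I is the ionic strength of the solution, and N_A is Avogadro’s number.

1/κ is known as the Debye-Hückel length, which gives a measure of how far the electrostatic effects due to a charged molecule extend into solution. The Debye-Hückel length is related to the Debye length (r_D) that we discussed previously.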
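As a sanity check on the magnitude of 1/κ, the sketch below evaluates the screening length in SI units, using κ² = 2 N_A e² I / (ε_0 ε_r k_B T) with I converted to mol/m³. This SI form, the function name, and the default water permittivity and temperature are assumptions chosen for the example.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge, C
N_A = 6.02214076e23          # Avogadro's number, 1/mol
K_B = 1.380649e-23           # Boltzmann's constant, J/K
EPS_0 = 8.8541878128e-12     # vacuum permittivity, F/m


def debye_length_angstrom(ionic_strength_molar, eps_r=78.54, temperature=298.15):
    """Screening length 1/kappa in Angstrom for a given ionic strength (mol/L)."""
    ionic_strength_si = ionic_strength_molar * 1000.0   # mol/L -> mol/m^3
    kappa_sq = (2.0 * N_A * E_CHARGE ** 2 * ionic_strength_si) / (
        EPS_0 * eps_r * K_B * temperature)
    return 1e10 / math.sqrt(kappa_sq)                   # m -> Angstrom


for ionic_strength in (0.015, 0.15, 1.5):
    print(f"I = {ionic_strength:5.3f} M: 1/kappa = "
          f"{debye_length_angstrom(ionic_strength):5.1f} Angstrom")
```

At roughly physiological ionic strength (0.15 M) the screening length comes out just under 8 Å, and it scales as 1/√I.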

Study Question: Which would be greater, κ of chloride in water or gasoline?

Okay, we know how to solve the PBE, but so what???

• Most commonly, the PBE is used to solve for the electrostatic component of the solvation free energy.

• Also, it is common to use the PBE to compute the electrostatic energy of a particular molecular system. Sometimes this is converted into an electrostatic force for use in a stochastic dynamics simulation (note: we will see this soon).

• And at other times, we simply use it to describe the spatial distribution of charge about a molecular system.

Calculating the solvation free energy

Calculating the (electrostatic contribution to the) solvation free energy for the following ionization reaction is relatively straightforward…

CH3OH → CH3O-

Begin by constructing a free energy cycle…

CH3OH(vacuum)   →   CH3OH(solution)
      ↓                   ↓
CH3O-(vacuum)   →   CH3O-(solution)

Thus, ΔSFE = ΔG(CH3OH) - ΔG(CH3O-), where a PBE is solved for each of the four states.

Note that SFE is related to the pKa through the ubiquitous relation ΔG = -RT ln K.

Relative mutant stability: mutation of a single surface residue can confer an appreciable increase in stability

Several recent studies have successfully increased mesophilic protein stability by mutagenesis of a single solvent exposed residue, presumably through optimization of the protein’s electrostatic surface (Grimsley et al, 1999; Loladze et al, 1999; Loladze & Makhatadze, 2002; Martin et al, 2001; Pedone et al, 2001; Perl et al, 2000; Perl & Schmid, 2001; Spector et al, 2000; Strop & Mayo, 2000).

From these experimental results, it is apparent that surface electrostatics are intimately related to protein stability, and, in some instances, mutation of only a few solvent exposed residues is sufficient for conferring thermostability to mesophilic proteins.

The observed stability gains often fail to reach those predicted by simple Coulombic interactions, and can, at times, lead to the opposite of the predicted effect.

For example, the stability of T4 lysozyme is generally decreased, despite a projected increase, by charge changing mutations on the protein surface (Dao-pin et al, 1991).

Pace et al. (2000) correctly point out that favorable charge-charge interactions are as important in determining the conformations of the denatured state ensemble as they are in determining the native protein structure.

[Figure labels: unfolded, native]

Begin by constructing a free energy cycle…

WT(native)       ↔   WT(unfolded)
     ↓                    ↓
Mutant(native)   ↔   Mutant(unfolded)

Thus, ΔΔG = ΔG(WT) - ΔG(Mut), meaning ΔΔG > 0 indicates a stabilizing mutation.

Again, a PBE is solved for each of the four states.

However, this poses a real computational challenge. What is it???

Predicting mutation stability

Modeling the denatured ensemble

REF: Torrez et al., Biophys J (2003) 85, 2845-2853

The denatured state ensemble, and especially its electrostatic interactions, is strongly influenced by the native structure.

As such, it might be possible to construct a single (denatured) structure that captures the average electrostatic properties of the denatured state.

To do so, we use the method of Elcock that “blows up” the native structure by performing a series of MM minimizations where σ_ij from the LJ potential is systematically increased by 1 Å (up to 6 Å).

Testing the model: relative stability of mesophilic vs. thermophilic cold shock proteins

MLEGKVKWFNSEKGFGFIEVEGQDDVFVHFSAIQGEGFKTLEEGQVSFEIVEGNRGPQAANVTKEA
MQRGKVKWFNNEKGYGFIEVEGGSDVFVHFTAIQGEGFKTLEEGQVSFEIVQGNRGPQAANVVKL-

Experimental data from Perl and Schmid, J Mol Biol (2001) 313, 343-357

R = 0.86

Electrostatic potential maps can aid in understanding of relative stabilities

Insights into the relative mutant stabilities can be explained by scrutinizing electrostatic potentials.

The E3A/A46E mutant is destabilized (vs. E3A), largely due to the E46:E66 and E46:CT repulsions.

The E3A/A46K mutant is one of the most stable CSP mutants investigated. The stability gain arises from the K46:CT ion pair on the protein surface.

Electrostatic potentials are rendered in blue and red at ±3.0 kcal/mol/e, respectively. The above results are quantified using UHBD calculated electrostatic energies (next slide).


Without accompanying numbers, the electrostatic potential maps are virtually useless

[Figure labels: NT, K5, CT, E66]

Site     Glu46:X          Lys46:X
         (A3E-A3E/A46E)   (A3E-A3E/A46K)
NT        -0.1              0.2
Lys5      -0.8              0.4
Lys7      -0.1              0.0
Glu12      0.0              0.0
Lys13      0.0              0.0
Glu19      0.3             -0.1
Asp24      0.0              0.0
Asp35      0.1             -0.1
His29      0.0              0.0
Lys39      0.0              0.0
Glu42      0.1             -0.1
Glu43      0.1             -0.1
Glu50      0.1             -0.1
Glu53      0.0              0.0
Arg56      0.0              0.0
Lys65     -0.1              0.1
Glu66      1.1             -3.7
CT         0.8             -0.8
SUM       +1.5             -4.3


Quick aside: empirical studies reinforce the notion that surface electrostatics are critical to protein stability

REF: Alsop et al., Prot Engr (2003) 16, 871-874.

Average occurrence vs. distance (Å) for all mesophilic and thermophilic charge-charge ion pairs. The increase in acid-base pairs in thermophilic protein structures suggests that optimizing the electrostatic surface of the protein is a robust evolutionary theme. 95% confidence intervals are presented as error estimates.

Because there are more charged residues in thermophilic proteomes, one would expect more acid-base pairs...

The drastic increase in thermophilic acid-base pairs (compared to the small acid-acid and base-base increase) indicates that their location optimizes their stabilizing effect.

Quick aside continued: local optimization of surfaces not directly related to function

Mesophilic vs Thermophilic

REF: Alsop et al., Prot Engr (2003) 16, 871-874.

More electrostatic potential maps

Electrostatic interactions are very long-ranged. They can accelerate the rates at which molecules associate (e.g. acetylcholinesterase or antibodies).

Electrostatic potential maps can also be very informative when applied to small molecules.

Note that the red/white/blue coloring scheme is generally used, but this is an arbitrary choice. Sometimes using a larger palette is more informative.

More electrostatic potential maps

Visualizing the formation of the CH3NH3+/18-crown-6 complex

CuZnSOD: a case study on the importance of protein electrostatics to function

Cu++ + O2- → Cu+ + O2

Cu+ + O2- + 2H+ → Cu++ + H2O2

2O2- + 2H+ → O2 + H2O2

Electrostatics make CuZnSOD “better than perfect”

Electrostatic potential maps across the CuZnSOD protein family are evolutionarily conserved

REF: Livesay et al., Biochemistry, 2003, 42(12):3464-3473.

The conserved electrostatic potentials result in quantitatively conserved encounter rate constants (calculated via BD [1])

REF: Livesay et al., Biochemistry, 2003, 42(12):3464-3473.

[1] Brownian dynamics simulation will be our next topic.

Part 5: Calculating pKa values via continuum electrostatics


“Houston, we have a problem.”

In all previous examples (except for methanol/methoxide), we have assumed a fixed ionization state. The ionization state was assumed based on the pKa values of individual amino acids in solvent (see table to right).

Q: Is this always a good assumption? (Hint: NO!!!)

Q: When is this approximation the most critical?

Residue pKa values
CT: 3.8
Asp: 4.0
Glu: 4.4
His: 6.5
NT: 8.0
Cys: 8.5
Tyr: 10.0
Lys: 10.0
Arg: 12.0

Let’s back up a bit…

What is a pKa? The negative log (p) of a Ka, which is the acid dissociation equilibrium constant.

I know this seems like boring chemistry stuff, but understanding what we are calculating is essential to doing it correctly!

Why do we care about pKa values?

Knowing the pKa values of amino acid residues is essential for accurately calculating electrostatic potentials around proteins; if we have an unrealistic ionization state, the calculated potential will be unrealistic too.

Also, we can use pKa shifts (ΔpKa) to understand the chemical significance of particular residues to certain types of protein chemical reactions (e.g., protein folding, protein/ligand binding, etc.).

Note that the methods are generally too crude to understand the precise electronic rearrangements involved in most enzyme catalyzed reactions --- for such an understanding we must employ more computationally expensive (ab initio) methods.

As a rule of thumb, the error on the methods that we’re about to discuss is +/- 1 pH unit, which is a lot! However, the situation can get even worse. Errors > 5 pH units in difficult cases are not uncommon.

The “p” function

p is the -log10 of something.

For example, pH = -log [H+] and pKa = -log Ka.

Often we express the pKa in terms of natural logs:

pKa = -log Ka = -(ln Ka) / 2.303

The Henderson-Hasselbalch equation

$$\mathrm{pH} = \mathrm{p}K_a + \log \frac{[\mathrm{A}^-]}{[\mathrm{HA}]}$$
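Rearranged for the deprotonated fraction, the Henderson-Hasselbalch relation gives f = 1 / (1 + 10^(pKa - pH)). The short sketch below evaluates it for an isolated group; the function name is illustrative, and the pKa of 4.0 is the Asp model value from the residue table.

```python
def fraction_deprotonated(pH, pKa):
    """Fraction of an isolated titratable group in its deprotonated (A-) form."""
    return 1.0 / (1.0 + 10.0 ** (pKa - pH))


ASP_MODEL_PKA = 4.0
for pH in (2.0, 4.0, 7.0):
    print(f"pH {pH:3.1f}: {100 * fraction_deprotonated(pH, ASP_MODEL_PKA):5.1f}% deprotonated")
```

This is the titration behavior of a single, non-interacting site; coupled sites in a protein deviate from it.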

Relating pKa values to free energy

Now, because we can relate pKa values to changes in free energy, we can build an energy cycle that relates changes in pKa values to changes in free energy.
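The conversion between pKa and free energy follows from ΔG = -RT ln Ka = 2.303 RT pKa. The sketch below applies it in kcal/mol; the function names and the 298.15 K default are assumptions for illustration.

```python
import math

R_KCAL = 1.987204e-3   # gas constant, kcal/(mol*K)


def pka_to_free_energy(pka, temperature=298.15):
    """Standard free energy of ionization (kcal/mol) corresponding to a pKa."""
    return math.log(10.0) * R_KCAL * temperature * pka


def free_energy_to_pka_shift(delta_delta_g, temperature=298.15):
    """pKa shift produced by a free energy change (kcal/mol)."""
    return delta_delta_g / (math.log(10.0) * R_KCAL * temperature)


print(f"1 pK unit corresponds to {pka_to_free_energy(1.0):.3f} kcal/mol at 298 K")
print(f"2.0 kcal/mol corresponds to a shift of {free_energy_to_pka_shift(2.0):.2f} pK units")
```

Because one pK unit is only about 1.4 kcal/mol at room temperature, modest errors in computed electrostatic energies translate into sizable pKa errors.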

[Figure: free energy cycle with two legs labeled “Given” and one labeled “???”.]

Relating pKa values to free energy

Let’s (again!) build a free energy cycle…


model pKa (empirical data)

→ Intrinsic pKa: change from aqueous amino acid to hypothetical neutral protein...
   - desolvation
   - dipole/dipole interactions

The intrinsic pKa value is calculated as a perturbation away from model (analogous to ideal) behavior

ΔG is calculated from PE electrostatic potentials

φ_1i is the potential at the location of charge i created by a unit charge at site 1; q*_1 and q_1 are the charges before and after ionization, respectively; and n and N are the numbers of atoms in the model amino acid and the protein.

Ionization of multiple residues

So far, we have investigated the combined effects of desolvation and electrostatic interactions with nonionizable residues on (intrinsic) pKa values.

However, this is far from a realistic model of a protein b/c it ignores the interactions of other ionized groups on our particular site.

Consider our ASP (now buried within the core of the protein). Suppose that it interacts with a LYS. Is that LYS charged or neutral? We don’t know!

The ionization states of the two residues are linked, so in principle we need to consider four different ionization states:

ASP0/LYS0 ; ASP-1/LYS0 ; ASP0/LYS+1 ; ASP-1/LYS+1


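[Figure: the four ionization states: ASP0, LYS0; ASP0, LYS+1; ASP-1, LYS0; ASP-1, LYS+1]

The average charge of ASP is a Boltzmann-weighted sum over the four possible states s (see to the right):

$$\langle q_{\mathrm{ASP}} \rangle = \frac{\sum_s q_{\mathrm{ASP}}(s)\, e^{-G_s / k_B T}}{\sum_s e^{-G_s / k_B T}}$$

For N ionizable residues, we must consider 2^N possible ionization states.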

This is computationally intractable for moderately-sized proteins so smart (heuristic) methods are required to avoid this problem.
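For a handful of sites, the sum over all 2^N ionization states can be done by brute force, which makes the bookkeeping explicit. The sketch below assumes a commonly used state energy, G/k_BT = Σ x_i ln(10)(pH - pKa_i) + Σ_{i<j} W_ij q_i q_j, where x_i = 1 means protonated; the function name, the 2 k_BT coupling, and the use of model pKa values as intrinsic pKa values are all hypothetical choices for illustration.

```python
import itertools
import math

LN10 = math.log(10.0)


def mean_protonation(pH, pka_intr, is_acid, w_kT):
    """Boltzmann-averaged protonation of each site by enumerating all 2^N states.

    pka_intr : intrinsic pKa of each site
    is_acid  : True for acids (charge -1 when deprotonated), False for bases
               (charge +1 when protonated)
    w_kT     : matrix of site-site interaction energies between unit charges,
               in units of k_B*T (positive = repulsive for like charges)
    """
    n = len(pka_intr)
    partition = 0.0
    occupancy = [0.0] * n
    for state in itertools.product((0, 1), repeat=n):   # 1 = protonated
        charges = [x - 1 if acid else x for x, acid in zip(state, is_acid)]
        g = sum(x * LN10 * (pH - pk) for x, pk in zip(state, pka_intr))
        for i in range(n):
            for j in range(i + 1, n):
                g += w_kT[i][j] * charges[i] * charges[j]
        weight = math.exp(-g)
        partition += weight
        for i, x in enumerate(state):
            occupancy[i] += x * weight
    return [o / partition for o in occupancy]


# Hypothetical ASP/LYS pair at pH 4, with and without a 2 k_B*T coupling
pkas, acids = [4.0, 10.0], [True, False]
isolated = mean_protonation(4.0, pkas, acids, [[0.0, 0.0], [0.0, 0.0]])
coupled = mean_protonation(4.0, pkas, acids, [[0.0, 2.0], [2.0, 0.0]])
print("ASP protonated fraction, no coupling:  ", round(isolated[0], 3))
print("ASP protonated fraction, with coupling:", round(coupled[0], 3))
```

The loop over `itertools.product` visits every one of the 2^N states, which is exactly why this approach stops being practical beyond a few dozen sites.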


What we really require is the interaction energy between the target residue and each of the other N-1 ionizable residues.

We obtain this with an electrostatics calculation with only the target residue ionized and then reading the electrostatic potential generated by this residue at all other N-1 sites.

We repeat this process at each of the next N-1 sites.

Obviously, the closer the two sites are, the stronger their interaction (whether it be attractive or repulsive). If they are far away, we can treat them as independent.


model pKa (empirical data)

→ Intrinsic pKa: change from aqueous amino acid to hypothetical neutral protein...
   - desolvation
   - dipole/dipole interactions

→ Apparent pKa: realistic charge environment.
   - the effects of every titratable residue i on j.

The apparent pKa value is calculated as a perturbation away from intrinsic behavior

Calculating ΔG of the charged protein

Once the potentials have been calculated, the ionization polynomial must be solved…

[Figure: workflow with the elements dosbs.sc, UHBD, potentials file, “black box”, and pKa output.]

• Monte Carlo sampling

• Mean-field (hybrid) approach

The mean-field approach, as implemented within hybrid.h, exactly enumerates the ionization polynomial within small clusters of residues, and treats cluster to cluster interaction with a mean-field approximation.

Q: Based on what we learned while discussing Monte Carlo methods, how might the MC sampling approach work? Note: this will be a homework question.

An easy to use implementation of the pKa calculation

http://biophysics.cs.vt.edu/H++

Dielectric values

Recall that we have a high dielectric exterior & low dielectric interior. What values should we use?

When calculating electrostatic potential maps, it is common to use ε_in = 4 and ε_out = 78. However, errors (compared to known pKa values) are reduced when a higher ε_in is used.

The default value of H++ is ε_in = 6; however, others use much larger ε_in values. In fact, I generally use a value of ε_in = 20! Q: So, which value is correct? (Answer: neither!)

However, a value of 20 (which admittedly seems quite high) gives the best overall results on tyrosine residues, which are generally (in part) buried.

How sensitive are calculated pKa values to input structure?

The answer is quite interesting, but not totally unexpected… The most solvent exposed residues (which have the largest structural RMSD) have the lowest pKa variability, whereas buried positions (that don’t move around too much) have the largest pKa variability. This result arises due to the complex electrostatic microenvironments present within the core of a native protein structure. REF: Livesay et al., JCTC, 2006, 2:927-938.

With respect to its model value, would you expect the pKa value of Asp to be shifted up or down?

Application of the pKa calculation

Common uses of the pKa calculation include:

• More accurate description of the ionization state of a macromolecule

• Identification (and quantification) of the specific residues involved in the electrostatic aspects of receptor-substrate association

• Improved understanding of functional mechanisms

• Functional site prediction

The ionization state can vary much more than you think…


Despite all the conservation discussed earlier within the CuZnSOD protein family, there is much diversity as well


Quantifying electrostatic interactions between receptor and substrate

REF: Livesay et al., Mol Immunol, 1999, 36:397-410.

pKa calculation example: Antibody-hapten interaction


pKa shifts on association

Titration curves for N-(p-cyanophenyl)-N’-(diphenylmethyl)-guanidiniumacetic acid


Quantitating electrostatic interactions between protein and ligand in a pH-dependent way via pair potentials


Quantitating electrostatic interactions between protein and ligand in a pH-dependent way via ΔG_elec


Applying the previous techniques to a family of Ab’s reveals electrostatic “hot-spots”

REF: Livesay & Subramaniam, PEDS, 2004, 17:463-472.

Identification of “electrostatic hot-spots” via pKa calculations


Schematic representation of combining site region for each antibody–hapten complex. Analysis of the antibody–hapten interface reveals similar numbers of interfacial tyrosine residues, hydrogen bonds and salt bridges.

Phylogenetic trees of known antibody structural proteome. The distribution of protein structures analyzed here, indicated by hash marks to the right of the trees, clearly demonstrates the representative nature of our dataset.

Identification of “electrostatic hot-spots” via pKa calculations


pKa values are frequently evolutionarily conserved

REF: Livesay & La, Prot Sci, 2005, 14:1158-1170.

The TIM reaction mechanism


Triosephosphate isomerase


The conserved pKa of Glu165 is defined by an evolutionarily conserved electrostatic network


Aside: In fact, my lab’s phylogenetic motif functional site prediction method predicts all of these “secondary catalytic sites” but one (His185).

THEMATICS: theoretical microscopic titration curves

REF: Ondrechen et al, PNAS, 2001, 98:12473-12478.

Atypical calculated titration curves have been suggested as a means to reliably predict protein functional sites

REF: Ondrechen et al, PNAS, 2001, 98:12473-12478.

Theoretical Microscopic Titration Curves:

• Starting from known structure, calculate per-residue titration curves.

• Identify (manually or automatically) perturbed curves.

• A structural cluster of two or more perturbed curves is used to predict active site location.

Advantages of the THEMATICS approach:

• Gives chemical information that indicates why a particular residue might be functional.

• Claimed to be reliable for active site identification (but, to best of my knowledge, this hasn’t been demonstrated in a satisfactory way).

• Conceptually simple, and (relatively) compute efficient.

Electrostatic strain energy

REF: Elcock, JMB, 2001, 312:885-896.

Similarly, electrostatically destabilizing residues have been shown to also reliably predict protein functional sites

REF: Elcock, JMB, 2001, 312:885-896.

• Catalytic and other functionally important residues in proteins can often be mutated to yield more stable proteins.

• Many of these residues are charged and located in electrostatically destabilizing microenvironments.

• Because PB continuum theory can identify these destabilizing residues, the same methods should be able to identify functionally important residues in otherwise uncharacterized proteins.

• Application of the method to six proteins for which good structural and mutation data is available suggests that the approach has merit (alanine racemase is shown to the right).

• This approach highlights the notion that multiple evolutionary constraints are acting upon a protein family throughout the course of evolution!

Part 6: The generalized Born solvent model


The generalized Born solvent model

The generalized Born (GB) solvent model is widely used to represent the electrostatic contribution to the solvation free energy.

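The GB model is an approximation to the exact LPBE.

The model comprises a system of particles with radii a_i and charges q_i.

In a medium of permittivity ε, the total electrostatic free energy of a GB system is given by the sum of the Coulomb energy and the Born solvation free energy…

$$G_{\mathrm{elec}} = \sum_{i=1}^{N} \sum_{j=i+1}^{N} \frac{q_i q_j}{\varepsilon r_{ij}} - \frac{1}{2} \left( 1 - \frac{1}{\varepsilon} \right) \sum_{i=1}^{N} \frac{q_i^2}{a_i}$$

Labels on the slide: Coulombic term; correction to Coulombic term to account for screening.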

The generalized Born solvent model

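The Coulombic term can be written as the sum of the in vacuo and solvent interactions:

$$\sum_{i=1}^{N} \sum_{j=i+1}^{N} \frac{q_i q_j}{\varepsilon r_{ij}} = \sum_{i=1}^{N} \sum_{j=i+1}^{N} \frac{q_i q_j}{r_{ij}} - \left( 1 - \frac{1}{\varepsilon} \right) \sum_{i=1}^{N} \sum_{j=i+1}^{N} \frac{q_i q_j}{r_{ij}}$$

Subtracting the in vacuo term gives the solvation free energy:

$$\Delta G_{\mathrm{elec}} = -\left( 1 - \frac{1}{\varepsilon} \right) \sum_{i=1}^{N} \sum_{j=i+1}^{N} \frac{q_i q_j}{r_{ij}} - \frac{1}{2} \left( 1 - \frac{1}{\varepsilon} \right) \sum_{i=1}^{N} \frac{q_i^2}{a_i}$$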

Note: The model is called Generalized Born because the Born equation that describes the solvation FE of a single ion is generalized to N charges.

A closer look at the GB part

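Combining the two parts, the GB model now has the common form

$$\Delta G_{\mathrm{elec}} = -\frac{1}{2} \left( 1 - \frac{1}{\varepsilon} \right) \sum_{i=1}^{N} \sum_{j=1}^{N} \frac{q_i q_j}{f(r_{ij}, a_{ij})}$$

where…

$$f(r_{ij}, a_{ij}) = \sqrt{r_{ij}^2 + a_{ij}^2 e^{-D}}, \qquad a_{ij} = \sqrt{a_i a_j}, \qquad D = \frac{r_{ij}^2}{(2 a_{ij})^2}$$

r_ij is the distance between particles i and j, and a_i is a quantity (with the dimension of length) known as the effective Born radius.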

The effective Born radius of an atom characterizes its degree of burial inside the solute; qualitatively, it can be thought of as the distance from the atom to the molecular surface.

Accurate estimation of the effective Born radii is absolutely critical for the GB model.
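A minimal sketch of the pairwise GB sum, ΔG_elec = -½ (1 - 1/ε) Σ_i Σ_j q_i q_j / f, with f = sqrt(r_ij² + a_i a_j exp(-r_ij² / (4 a_i a_j))). The effective Born radii are simply supplied as inputs here (computing them is the hard part in practice), and the function name, units (kcal/mol, Å, e), and example values are assumptions for illustration.

```python
import math

COULOMB_KCAL = 332.06371   # e^2 / (4*pi*eps0) in kcal*Angstrom/(mol*e^2)


def gb_solvation_energy(coords, charges, born_radii, eps=78.54):
    """Generalized Born electrostatic solvation free energy in kcal/mol.

    coords     : list of (x, y, z) positions in Angstrom
    charges    : partial charges in units of e
    born_radii : effective Born radii in Angstrom (assumed to be known)
    """
    prefactor = -0.5 * COULOMB_KCAL * (1.0 - 1.0 / eps)
    total = 0.0
    n = len(charges)
    for i in range(n):
        for j in range(n):   # full double sum; the i == j terms are the Born self-energies
            r2 = sum((a - b) ** 2 for a, b in zip(coords[i], coords[j]))
            aij2 = born_radii[i] * born_radii[j]
            f = math.sqrt(r2 + aij2 * math.exp(-r2 / (4.0 * aij2)))
            total += charges[i] * charges[j] / f
    return prefactor * total


# A single ion recovers the Born equation: -0.5 * (1 - 1/eps) * q^2 / a
single_ion = gb_solvation_energy([(0.0, 0.0, 0.0)], [1.0], [2.0])
print(f"single +1 ion, Born radius 2 Angstrom: {single_ion:.2f} kcal/mol")

# Hypothetical ion pair separated by 4 Angstrom
ion_pair = gb_solvation_energy([(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)], [1.0, -1.0], [2.0, 2.0])
print(f"+1/-1 pair at 4 Angstrom: {ion_pair:.2f} kcal/mol")
```

The cost is a simple double loop over atoms with no grid, which is why GB is cheap enough to evaluate at every step of a molecular dynamics simulation.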

Application of GB

Based on compute cost, calculating ΔG_elec from exact solutions of the (L or NL)PBE is limited to a finite number of structures (i.e., those from a free energy cycle).

However, this limitation is removed in GB.

As such, GB can even be used in a molecular dynamics simulation (actually, this is quite common).

Additionally, GB is commonly used within the calculation of pKa values.

In fact, GB can be used in any problem for which PB is relevant (assuming the increased error can be tolerated).

PB vs. GB

Performance of GB and PB methods as measured by average percent error against a test set (the details are unimportant) versus the time required for a single calculation of the solvation free energy on the native structure. Both time and percent error axes are shown in logarithmic scale.

REF: Feig et al., J Comp Chem, 2003, 25:265-284.

Effective Born radii obtained from PBE (x-axis) and various GB models (y-axis). Note that there is generally very poor agreement; however, as the GB models get more sophisticated, the agreement improves.

REF: Zhu et al., J Phys Chem B, 2005, 109:3008-3022.

PB vs. GB

H++ calculated pKa values on 1RGG using GB vs. PB. The black solid line indicates the diagonal (perfect agreement), and the grey lines indicate +/- 1 pH unit, which is a commonly accepted error estimate of the PB pKa predictions.

PB vs. GB

Residue accessibility (1RGG) vs. the difference between PB- and GB-calculated pKa values. Note that the largest (and thus most significant) differences occur within buried residues where there is the most complexity within the electrostatic microenvironments.

1RGG residues are color-coded by pKa.
