Molecular Modeling The compendium of methods for mimicking the behavior of molecules or molecular systems

Upload: ethan-marshall

Post on 17-Dec-2015



Points for Consideration

Remember
• Molecular modeling forms a model of the real world.
• Thus we are studying the model, not the world.
• A model is valid as long as it reproduces the real world.

Why Use Molecular Modeling? (and not deal directly with the real world?)

Fast, accurate and relatively cheap way to:
• Study molecular properties
• Rationalize and interpret experimental results
• Make predictions for yet unstudied systems
• Study hypothetical systems
• Design new molecules

And
• Understand

[Figure: reactants, products, probe]

Some Molecular Properties

Energy
• Free energy (G)
• Enthalpy (H)
• Entropy (S)
• Steric energy (E)

Molecular Spectroscopy
• NMR (Nuclear Magnetic Resonance)
• IR (Infrared)
• UV (Ultraviolet)
• MW (Microwave)

Kinetics
• Reaction mechanisms
• Rate constants

Electronic Properties
• Molecular orbitals
• Charge distributions
• Dipole moments

3D Structure
• Distances
• Angles
• Torsions

Molecular Structure: Saccharin

C7H5NO3S: a single 1D structure

[Figure: structural drawing of saccharin] A single 2D structure

Many 3D structures

Descriptors
• 1D: e.g., molecular weight
• 2D: e.g., number of rotatable bonds
• 3D: e.g., molecular volume

Structure → Property (activity, cell permeability, toxicity)

Molecular Structure and Molecular Properties

Molecular mechanics

Introduction to Force Fields

Molecular mechanics

• Ball and spring description of molecules
• Better representation of equilibrium geometries than plastic models
• Able to compute relative strain energies
• Cheap to compute
• Lots of empirical parameters that have to be carefully tested and calibrated
• Limited to equilibrium geometries
• Does not take electronic interactions into account
• No information on properties or reactivity
• Cannot readily handle reactions involving the making and breaking of bonds

Energy as a function of geometry
• Polyatomic molecule:
– N degrees of freedom
– N-dimensional potential energy surface

http://www.chem.wayne.edu/~hbs/chm6440/PES.html

A Molecule is a Collection of Atoms Held Together by Forces

Intuitively, forces act between bonded atoms.
Forces act to return structural parameters to their equilibrium values.

[Figure: stretch, bend, torsion]

And More Forces…

Forces also act between non-bonded atoms.
Cross terms couple the different types of interactions.

[Figure: stretch-bend, non-bonded]

Forces are Described by Potential Energy Functions

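Stretch:
$$v(l) = \frac{k}{2}(l - l_0)^2$$

Bend:
$$v(\theta) = \frac{k}{2}(\theta - \theta_0)^2$$

Torsion:
$$v(\omega) = \sum_{n=0}^{N} \frac{V_n}{2}\,[1 + \cos(n\omega - \gamma)]$$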

And More Functions…

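Stretch-bend:
$$v(l_1, l_2, \theta) = \frac{k_{l_1,l_2,\theta}}{2}\,[(l_1 - l_{1,0}) + (l_2 - l_{2,0})]\,(\theta - \theta_0)$$

Non-bonded:
$$v(r) = \sum_{i=1}^{N}\sum_{j=i+1}^{N}\left(4\varepsilon_{ij}\left[\left(\frac{\sigma_{ij}}{r_{ij}}\right)^{12} - \left(\frac{\sigma_{ij}}{r_{ij}}\right)^{6}\right] + \frac{q_i q_j}{4\pi\varepsilon_0 r_{ij}}\right)$$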

A Force Field is a Collection of Potential Energy Functions

A force field is defined by the functional forms of the energy functions and by the values of their parameters.

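$$V(r^N) = \sum_{\text{bonds}} \frac{k_i}{2}(l_i - l_{i,0})^2 + \sum_{\text{angles}} \frac{k_i}{2}(\theta_i - \theta_{i,0})^2 + \sum_{\text{torsions}} \frac{V_n}{2}\,(1 + \cos(n\omega - \gamma))$$
$$\qquad + \sum_{i=1}^{N}\sum_{j=i+1}^{N}\left(4\varepsilon_{ij}\left[\left(\frac{\sigma_{ij}}{r_{ij}}\right)^{12} - \left(\frac{\sigma_{ij}}{r_{ij}}\right)^{6}\right] + \frac{q_i q_j}{4\pi\varepsilon_0 r_{ij}}\right) + \text{cross terms}$$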

Molecular Mechanics Energy (Steric Energy) is Calculated from the Force Field

A molecular mechanics program will return an energy value for every conformation of the system.

Steric energy is the energy of the system relative to a reference point. This reference point depends on the bonded interactions and is both force field dependent and molecule dependent.

Thus, steric energy can only be used to compare the relative stabilities of different conformations of the same molecule and cannot be used to compare the relative stabilities of different molecules.

Further, all conformational energies must be calculated with the same force field.

Steric Energy

Steric Energy = Estretch + Ebend + Etorsion + EVdW + Eelectrostatic + Estretch-bend + Etorsion-stretch + …

Estretch           Stretch energy (over all bonds)
Ebend              Bending energy (over all angles)
Etorsion           Torsional (dihedral) energy (over all dihedral angles)
EVdW               Van der Waals energy (over all atom pairs > 1,3)
Eelectrostatic     Electrostatic energy (over all charged atom pairs > 1,3)
Estretch-bend      Stretch-bend energy
Etorsion-stretch   Torsion-stretch energy

EVdW + Eelectrostatic are often referred to as non-bonded energies.

Force Field and Potential Energy Surface

A force field defines for each molecule a unique PES.
Each point on the PES represents a molecular conformation characterized by its structure and energy.
Energy is a function of the coordinates.
Coordinates are a function of the energy.

[Figure: energy versus coordinates]

Moving on (Sampling) the PES

Each point on the PES represents a molecular conformation characterized by its structure and energy.

By sampling the PES we can derive molecular properties.

Sampling energy minima only (energy minimization) will lead to molecular properties reflecting the enthalpy only.

Sampling the entire PES (molecular simulations) will lead to molecular properties reflecting the free energy.

In both cases, molecular properties will be derived from the PES.

Force Fields: General Features

Force field definition
• Functional form (usually a compromise between accuracy and ease of calculation).
• Parameters (transferability assumed).

Force fields are empirical
• There is no “correct” form of a force field.
• Force fields are evaluated based solely on their performance.

Force fields are parameterized for specific properties
• Structural properties
• Energy
• Spectra

Force Fields: Atom Types

[Figure: carbon atom types; the carbon of a C=C bond is MM2 type 2, the carbon of a C=O bond is MM2 type 3]

In molecular mechanics atoms are given types - there are often several types for each element.

Atom types depend on:
• Atomic number (e.g., C, N, O, H).
• Hybridization (e.g., sp3, sp2, sp).
• Environment (e.g., cyclopropane, cyclobutane).

“Transferability” is assumed - for example that a C=O bond will behave more or less the same in all molecules.

A General Force Field Calculation

Input
• Atom types
• Starting geometry
• Connectivity

Energy minimization / geometry optimization

Calculate molecular properties at final geometry

Output
• Molecular structure
• Molecular energy
• Dipole moments
• etc.

A Simple Molecular Mechanics Force Field Calculation

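$$V(r^N) = \sum_{\text{bonds}} \frac{k_i}{2}(l_i - l_{i,0})^2 + \sum_{\text{angles}} \frac{k_i}{2}(\theta_i - \theta_{i,0})^2 + \sum_{\text{torsions}} \frac{V_n}{2}\,(1 + \cos(n\omega - \gamma))$$
$$\qquad + \sum_{i=1}^{N}\sum_{j=i+1}^{N}\left(4\varepsilon_{ij}\left[\left(\frac{\sigma_{ij}}{r_{ij}}\right)^{12} - \left(\frac{\sigma_{ij}}{r_{ij}}\right)^{6}\right] + \frac{q_i q_j}{4\pi\varepsilon_0 r_{ij}}\right)$$

Bonds
• C-C x 2
• C-H x 8

Angles
• C-C-C x 1
• C-C-H x 10
• H-C-H x 7

Torsions
• H-C-C-H x 12
• H-C-C-C x 6

Non-bonded
• H-H x 21
• H-C x 6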
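The term counts above are consistent with propane (C3H8), although the slide does not name the molecule. As a minimal sketch under that assumption, the counts can be reproduced from the connectivity alone. The atom labels are hypothetical, and the non-bonded list follows the "> 1,3" convention used for the steric energy terms.

```python
from itertools import combinations

# Hypothetical atom labels for propane (assumed molecule); only connectivity is used.
bonds = [("C1", "C2"), ("C2", "C3"),
         ("C1", "H1"), ("C1", "H2"), ("C1", "H3"),
         ("C2", "H4"), ("C2", "H5"),
         ("C3", "H6"), ("C3", "H7"), ("C3", "H8")]

# Build a neighbor table from the bond list.
neighbors = {}
for a, b in bonds:
    neighbors.setdefault(a, set()).add(b)
    neighbors.setdefault(b, set()).add(a)

# Angles: every pair of neighbors around a central atom j.
angles = [(a, j, c)
          for j in neighbors
          for a, c in combinations(sorted(neighbors[j]), 2)]

# Torsions: i-j-k-l where j-k is a bond, i is bonded to j and l is bonded to k.
# (Valid for acyclic molecules such as this one.)
torsions = [(i, j, k, l)
            for j, k in bonds
            for i in neighbors[j] - {k}
            for l in neighbors[k] - {j}]

# Non-bonded pairs: all atom pairs that are neither 1,2 (bonded) nor 1,3 (angle ends).
excluded = {frozenset(b) for b in bonds} | {frozenset((a, c)) for a, _, c in angles}
nonbonded = [p for p in combinations(sorted(neighbors), 2)
             if frozenset(p) not in excluded]

print(len(bonds), len(angles), len(torsions), len(nonbonded))
```

Even for an 11-atom molecule the non-bonded list is the longest one, which is why non-bonded evaluation dominates the cost for larger systems.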

Bond Stretching

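Morse potential
$$v(l) = D_e\,\{1 - \exp[-a(l - l_0)]\}^2$$
$$a = \omega\sqrt{\frac{\mu}{2D_e}}, \qquad \omega = \sqrt{\frac{k}{\mu}}$$

[Figure: Morse potential, energy E versus distance r]

• Accurate.
• Computationally inefficient (form, parameters).
• Catastrophic bond elongation.

De = depth of the potential energy minimum
l0 = equilibrium bond length (v(l) = 0)
ω = frequency of the bond vibration
μ = reduced mass
k = stretch constant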

Bond Stretching

Harmonic potential (AMBER)
$$v(l) = \frac{k}{2}(l - l_0)^2$$

• Inaccurate.
• Coincides with the Morse potential at the bottom of the well.
• Computationally efficient.

[Figure: harmonic potential, energy E versus distance r]
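A minimal sketch comparing the Morse and harmonic stretching potentials. The numerical values of De, k and l0 are illustrative assumptions, not fitted parameters. The Morse width a is taken as sqrt(k / 2De), which follows from combining a = ω·sqrt(μ / 2De) with ω = sqrt(k / μ), so both curves share the same curvature at l0.

```python
import math

def morse(l, d_e, a, l0):
    """Morse potential: D_e * {1 - exp[-a (l - l0)]}^2."""
    return d_e * (1.0 - math.exp(-a * (l - l0))) ** 2

def harmonic(l, k, l0):
    """Harmonic potential: (k / 2) * (l - l0)^2."""
    return 0.5 * k * (l - l0) ** 2

# Illustrative values (assumptions): kcal/mol, kcal mol^-1 A^-2, A.
d_e, k, l0 = 100.0, 600.0, 1.5
a = math.sqrt(k / (2.0 * d_e))  # same curvature as the harmonic form at l0

for dl in (0.01, 0.1, 0.5, 2.0):
    l = l0 + dl
    print(f"dl = {dl:4.2f}  Morse = {morse(l, d_e, a, l0):8.3f}  "
          f"harmonic = {harmonic(l, k, l0):8.3f}")
```

Close to l0 the two curves agree; at large extension the harmonic energy keeps growing while the Morse energy levels off at De.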

Bond Stretching

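Cubic (MM2) and quartic (MM3) potentials

MM2:
$$v(l) = \frac{k_1}{2}(l - l_0)^2 + \frac{k_2}{2}(l - l_0)^3$$

MM3:
$$v(l) = \frac{k_1}{2}(l - l_0)^2 + \frac{k_2}{2}(l - l_0)^3 + \frac{k_3}{2}(l - l_0)^4$$

[Figure: harmonic, cubic and quartic stretching potentials]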

Bond Stretching Parameters (MM2)

Bond           l0 (Å)    k (kcal mol-1 Å-2)
Csp3-Csp3      1.523     317
Csp3-Csp2      1.497     317
Csp2=Csp2      1.337     690
Csp2=O         1.208     777
Csp3-Nsp3      1.438     367
C-N (amide)    1.345     719

Hard mode.
Bond types correlate with l0 and k values.
A 0.2 Å deviation from l0 when k = 300 kcal mol-1 Å-2 raises the energy by 12 kcal/mol if v(l) = k(l - l0)^2 is used, or by 6 kcal/mol with the k/2 prefactor of the harmonic form.

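Angle Bending

[Figure: energy versus angle for the MM3, MM2 and AMBER bending potentials]

AMBER:
$$v(\theta) = \frac{k}{2}(\theta - \theta_0)^2$$

MM2:
$$v(\theta) = \frac{k_1}{2}(\theta - \theta_0)^2 + \frac{k_2}{2}(\theta - \theta_0)^6$$

MM3:
$$v(\theta) = \frac{k_1}{2}(\theta - \theta_0)^2 + \frac{k_2}{2}(\theta - \theta_0)^3 + \frac{k_3}{2}(\theta - \theta_0)^4 + \frac{k_4}{2}(\theta - \theta_0)^5 + \frac{k_5}{2}(\theta - \theta_0)^6$$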

Angle Bending Parameters (MM2)

Hard mode (softer than bond stretching)

Angle             θ0 (deg)   k (kcal mol-1 deg-2)
Csp3-Csp3-Csp3    109.47     0.0099
Csp3-Csp3-H       109.47     0.0079
H-Csp3-H          109.47     0.007
Csp3-Csp2-Csp3    117.2      0.0099
Csp3-Csp2=Csp2    121.4      0.0121
Csp3-Csp2=O       122.5      0.0101

Torsional (Dihedral) Terms

Reflect the existence of barriers to rotation around chemical bonds.

Used to set the relative energies of the rotational minima and maxima.

Together with the non-bonded terms are responsible for most of the structural and energetic changes (soft mode).

Usually parameterized last.

Soft mode.

General Functional Form
$$v(\omega) = \sum_{n=0}^{N} \frac{V_n}{2}\,[1 + \cos(n\omega - \gamma)]$$

ω: Torsional angle.
n (multiplicity): Number of minima in a 360° cycle.
Vn: Correlates with the barrier height.
γ (phase factor): Determines where the torsion passes through its minimum value.

[Figure: torsion energy (0 to 4.5) versus torsion angle (0 to 360°) for two potentials: Vn = 4, n = 2, γ = 180°; and Vn = 2, n = 3, γ = 60°]
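The two plotted curves can be regenerated from the general functional form. This is a minimal sketch; the function names are hypothetical and the one-degree grid is an arbitrary choice.

```python
import math

def torsion_energy(omega_deg, terms):
    """Sum of (V_n / 2) * [1 + cos(n*omega - gamma)] over (V_n, n, gamma_deg) terms."""
    w = math.radians(omega_deg)
    return sum(0.5 * v * (1.0 + math.cos(n * w - math.radians(g)))
               for v, n, g in terms)

def count_minima(terms, step_deg=1):
    """Count grid points lower than both neighbors over one 360 degree cycle."""
    grid = range(0, 360, step_deg)
    e = [torsion_energy(w, terms) for w in grid]
    return sum(e[i] < e[i - 1] and e[i] < e[(i + 1) % len(e)]
               for i in range(len(e)))

two_fold = [(4.0, 2, 180.0)]    # Vn = 4, n = 2, gamma = 180
three_fold = [(2.0, 3, 60.0)]   # Vn = 2, n = 3, gamma = 60

for name, terms in (("two-fold", two_fold), ("three-fold", three_fold)):
    energies = [torsion_energy(w, terms) for w in range(0, 361, 60)]
    print(name, [round(e, 3) for e in energies], "minima:", count_minima(terms))
```

The number of minima per cycle equals the multiplicity n, and the phase factor shifts where they fall.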

Functional Form: AMBER

$$v(\omega) = \sum_{n=0}^{N} \frac{V_n}{2}\,[1 + \cos(n\omega - \gamma)]$$

Preference for single terms.
Usage of general torsional parameters (i.e., the torsional potential depends solely on the two central atoms of the torsion).
• C-C-C-C = O-C-C-C = O-C-C-N = …

Example: Butane

#   Type       V1      V2      V3
1   C-C-C-C    0.200   0.270   0.093
4   C-C-C-H    0.000   0.000   0.267
4   H-C-C-H    0.000   0.000   0.237

The barrier to rotation around the C-C bond in butane is ~20 kJ/mol.
All 9 torsional interactions around the central C-C bond should be considered for an appropriate reproduction of the torsional barrier.

Out-of-Plane Bending
• Allows for non-planarity when required (e.g., cyclobutanone).
• Prevents inversion about chiral centers (e.g., for united atoms).
• Induces planarity when required.

United Atoms
• Absorb non-polar hydrogen atoms into their respective carbons.
• Greatly reduces computational cost.

[Figure: two cyclobutanone geometries, one labeled "correct"]

Out-of-Plane Bending

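[Figure: atoms i, j, k, l illustrating the three out-of-plane definitions]

Wilson angle: between the ijk plane and the i-l bond
$$v(\theta) = k(\theta - \theta_0)^2$$

Improper torsion: 1-2-3-4 torsion
$$v(\omega) = k\,[1 + \cos(n\omega - \omega_0)]$$

Pyramidal distance: between atom i and the jkl plane
$$v(d) = k(d - d_0)^2$$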

Cross Terms: Stretch-Bend

Coupling between internal coordinates.
Important for reproducing structures of unusual (e.g., highly strained) systems and vibrational spectra.

Stretch-Bend: As a bond angle decreases, the adjacent bonds stretch to reduce the interaction between the 1,3 atoms.

$$v(l_1, l_2, \theta) = \frac{k_{l_1,l_2,\theta}}{2}\,[(l_1 - l_{1,0}) + (l_2 - l_{2,0})]\,(\theta - \theta_0)$$

Cross Terms: Stretch-Torsion

Stretch-Torsion: For an A-B-C-D torsion, the central B-C bond elongates in response to eclipsing of the A-B and C-D terminal bonds.

$$v(l, \omega) = k\,(l - l_0)\cos n\omega$$

Non-Bonded Interactions

Operate within molecules and between molecules.

Through space interactions.

Modeled as a function of an inverse power of the distance.

Soft mode.

Divided into:
• Electrostatic interactions.
• VdW interactions.

Electrostatic Interactions

$$V_{\text{intermolecular}} = \sum_{i=1}^{A}\sum_{j=1}^{B} \frac{q_i q_j}{4\pi\varepsilon_0 r_{ij}}$$

$$V_{\text{intramolecular}} = \sum_{i=1}^{A}\sum_{j=i+1}^{A} \frac{q_i q_j}{4\pi\varepsilon_0 r_{ij}}$$

qi, qj are point charges.
Charge-charge interactions are long ranged (decay as r^-1).
When qi, qj are centered on the nuclei they are referred to as partial atomic charges.

Sources of partial atomic charges:
• Fit to known electric moments (e.g., dipole, quadrupole, etc.)
• Fit to thermodynamic properties.
• Ab initio calculations
– Electrostatic potential

Rapid Methods for Calculating Atomic Charges

Partial Equalization of Orbital Electronegativity (Gasteiger and Marsili)

http://www2.chemie.uni-erlangen.de/software/petra/manual/manual.pdf

Electronegativity (Pauling): “The power of an atom to attract an electron to itself”

Orbital electronegativity depends upon:
• Valence state (e.g., sp > sp3).
• Occupancy (e.g., empty > single > double).
• Charges in other orbitals.

Electrons flow from the less electronegative atoms to the more electronegative atoms thereby equalizing the electronegativities.

For the dependency of orbital electronegativity on charge assume:

$$\chi_A = a_A + b_A Q_A + c_A Q_A^2$$

Theoretically correct for orbitals but formally applied to atoms.
Values of a, b and c were obtained for common elements in their usual valence states (e.g., for atom types).

Apply an iterative process:
• Assign each atom its formal charge.
• Calculate atomic electronegativities based on the above assumption.
• Calculate the electron charge transferred from atom A to the more electronegative atom B bonded to it by:

$$Q^{(k)} = \frac{\chi_B^{(k)} - \chi_A^{(k)}}{\chi_A^{+}}\,\alpha^{k}$$

α: Damping factor.
χA+: Cation electronegativity of the less electronegative atom.
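A minimal sketch of the iterative charge transfer described above. The atom types "X" and "Y" and their a, b, c and cation electronegativity values are hypothetical placeholders, not published Gasteiger-Marsili parameters. The damping factor of 0.5, the six iterations and the neutral starting charges are also assumptions.

```python
def peoe_charges(atom_types, bonds, params, chi_plus, damping=0.5, n_iter=6):
    """Iteratively shift charge along bonds toward the more electronegative atom."""
    q = [0.0] * len(atom_types)  # formal charges assumed to be zero
    for k in range(1, n_iter + 1):
        # chi_A = a + b*Q + c*Q^2 evaluated with the current charges
        chi = []
        for t, qi in zip(atom_types, q):
            a, b, c = params[t]
            chi.append(a + b * qi + c * qi * qi)
        dq = [0.0] * len(q)
        for i, j in bonds:
            lo, hi = (i, j) if chi[i] < chi[j] else (j, i)
            # electrons flow from the less (lo) to the more (hi) electronegative atom
            transfer = (chi[hi] - chi[lo]) / chi_plus[atom_types[lo]] * damping ** k
            dq[lo] += transfer   # loses electron density, becomes more positive
            dq[hi] -= transfer   # gains electron density, becomes more negative
        q = [qi + d for qi, d in zip(q, dq)]
    return q

# Hypothetical parameters for illustration only.
params = {"X": (7.0, 6.0, 0.0), "Y": (9.0, 8.0, 0.0)}
chi_plus = {"X": 13.0, "Y": 17.0}

charges = peoe_charges(["X", "Y"], [(0, 1)], params, chi_plus)
print([round(c, 4) for c in charges])
```

Because the damping factor shrinks each successive transfer, the electronegativities are only partially equalized, which is what gives the method its name.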

Solvent Dielectric Models

$$V = \sum_{i=1}^{A}\sum_{j=i+1}^{A} \frac{q_i q_j}{4\pi\varepsilon_0 r_{ij}}$$

ε0 = Dielectric constant of vacuum (1).
For a given set of charges and distance, ε0 determines the strength of the electrostatic interactions.
Solvent effects dampen the electrostatic interactions and so can be modeled by varying ε0: εeff = ε0 εr

εr (protein interior) = 2-4
εr (water) = 80
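A minimal sketch of the pairwise Coulomb sum with a constant relative permittivity. The conversion factor of about 332.06 kcal Å mol^-1 e^-2 (for charges in units of e and distances in Å) is a standard constant, but the function name, the charges and the coordinates are illustrative assumptions.

```python
import math
from itertools import combinations

COULOMB_KCAL = 332.0637  # e^2 / (4*pi*eps0) in kcal * A / mol

def coulomb_energy(charges, coords, eps_r=1.0):
    """Sum q_i q_j / (4 pi eps0 eps_r r_ij) over all pairs i < j, in kcal/mol."""
    total = 0.0
    for i, j in combinations(range(len(charges)), 2):
        r = math.dist(coords[i], coords[j])
        total += COULOMB_KCAL * charges[i] * charges[j] / (eps_r * r)
    return total

# Hypothetical ion pair separated by 3 A.
q = [1.0, -1.0]
xyz = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)]

for label, eps in (("vacuum", 1.0), ("protein interior", 4.0), ("water", 80.0)):
    print(f"{label:17s} eps_r = {eps:4.0f}  V = {coulomb_energy(q, xyz, eps):9.3f} kcal/mol")
```

The same pair of charges interacts 80 times more weakly with the water value than in vacuum, which is the damping effect the dielectric model is meant to capture.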

Solvent Dielectric Models: Distance Dependence

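Dielectric constant increases as a function of distance:

$$\varepsilon_{eff}(r) = \varepsilon_0 r \quad\Rightarrow\quad V = \sum_{i=1}^{A}\sum_{j=i+1}^{A} \frac{q_i q_j}{4\pi\varepsilon_0 r_{ij}^{2}}$$

Smooth increase in εeff from 1 to εr as the distance increases:

$$\varepsilon_{eff}(r) = \varepsilon_r - \frac{\varepsilon_r - 1}{2}\,[(rS)^2 + 2rS + 2]\,e^{-rS}$$

[Figure: εeff versus distance]

εeff varies from 1 at zero separation to the bulk permittivity of the solvent at large separation. As the separation between the interacting atoms increases, more solvent can “penetrate” between them.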
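A minimal sketch of the smooth distance-dependent dielectric function, useful for checking its limits numerically. The value S = 0.3 Å^-1 is an illustrative assumption; the slide does not give one.

```python
import math

def eps_eff(r, eps_r=80.0, s=0.3):
    """eps_r - (eps_r - 1)/2 * [(rS)^2 + 2rS + 2] * exp(-rS); s is an assumed value."""
    x = r * s
    return eps_r - 0.5 * (eps_r - 1.0) * (x * x + 2.0 * x + 2.0) * math.exp(-x)

for r in (0.0, 2.0, 5.0, 10.0, 20.0, 50.0):
    print(f"r = {r:5.1f} A   eps_eff = {eps_eff(r):7.3f}")
```

At r = 0 the bracket equals 2, so the function returns exactly 1; at large r the exponential vanishes and the function approaches εr.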

Van Der Waals (VdW) Interactions

Electrostatic interactions can’t account for all non-bonded interactions within a system (e.g., rare gases).

VdW interactions:
• Attractive (dispersive) contribution (London forces)
– Instantaneous dipoles due to electron cloud fluctuations.
– Decays as r^-6.
• Repulsive contribution
– Nuclei repulsion.
– At short distances (r < 1) rises as 1/r.
– At large distances decays as exp(-2r/a0), where a0 is the Bohr radius.

Van Der Waals (VdW) Interactions

The observed VdW potential results from a balance between the attractive and repulsive forces.

Modeling VdW Interactions

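Lennard-Jones Potential

$$v(r) = 4\varepsilon\left[\left(\frac{\sigma}{r}\right)^{12} - \left(\frac{\sigma}{r}\right)^{6}\right]$$

$$r_m = 2^{1/6}\sigma = \text{sum of VdW radii}$$

Alternative forms

$$v(r) = \varepsilon\left\{\left(\frac{r_m}{r}\right)^{12} - 2\left(\frac{r_m}{r}\right)^{6}\right\}$$

$$v(r) = \frac{A}{r^{12}} - \frac{C}{r^{6}}, \qquad A = \varepsilon r_m^{12} = 4\varepsilon\sigma^{12}, \qquad C = 2\varepsilon r_m^{6} = 4\varepsilon\sigma^{6}$$

[Figure: energy versus separation, with the minimum at rm]

• Rapid to calculate
• Attractive part theoretically sound
• Repulsive part easy to calculate but too steep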
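A minimal sketch checking that the three forms of the Lennard-Jones potential agree. The values of ε and σ are illustrative assumptions, not parameters from any particular force field.

```python
def lj_sigma_form(r, eps, sigma):
    """4*eps*[(sigma/r)^12 - (sigma/r)^6]."""
    sr6 = (sigma / r) ** 6
    return 4.0 * eps * (sr6 * sr6 - sr6)

def lj_rm_form(r, eps, r_m):
    """eps*[(r_m/r)^12 - 2*(r_m/r)^6]."""
    x6 = (r_m / r) ** 6
    return eps * (x6 * x6 - 2.0 * x6)

def lj_ac_form(r, a, c):
    """A/r^12 - C/r^6."""
    return a / r ** 12 - c / r ** 6

eps, sigma = 0.2, 3.4            # illustrative values (kcal/mol, A)
r_m = 2.0 ** (1.0 / 6.0) * sigma  # position of the minimum
a, c = 4.0 * eps * sigma ** 12, 4.0 * eps * sigma ** 6

for r in (3.0, sigma, r_m, 5.0, 8.0):
    print(f"r = {r:5.3f}  v = {lj_sigma_form(r, eps, sigma):9.5f}")
```

The energy is zero at r = σ and reaches its minimum value of -ε at r = rm. The A/C form is the cheapest to evaluate because A and C can be precomputed per atom-type pair.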

Hydrogen Bonding

H-bond geometry independent:

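$$v(r) = \frac{A}{r^{12}} - \frac{C}{r^{10}}$$

• r is the distance between the H-bond donor and H-bond acceptor.

H-bond geometry dependent (co-linearity with the lone pair, LP, preferred):

$$v_{HB} = \left(\frac{A}{r_{H\cdots Acc}^{12}} - \frac{C}{r_{H\cdots Acc}^{10}}\right)\cos^{2}\theta_{Don-H\cdots Acc}\,\cos^{4}\omega_{H\cdots Acc-LP}$$

[Figure: N-H donor (Don) and O=C acceptor (Acc)]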

Force Field Parameterization

Choosing values for the parameters in the potential function equations to best reproduce experimental data.

Parameterization techniques
• Trial and error
• Least-squares methods

Types of parameters
• Stretch: natural bond lengths (l0) and force constants (k).
• Bend: natural bond angles (θ0) and force constants (k).
• Torsions: Vi’s.
• VdW: (ε, VdW radii).
• Electrostatic: Partial atomic charges.
• Cross-terms: Cross term parameters.

Parameters - Quantity

For N atom types require:
• N non-bonded parameters
• N*N stretch parameters
• N*N*N bend parameters
• N*N*N*N torsion parameters

Example
MacroModel MM2*: 39 atom types

           Required     Actual
Stretch    1,521        164
Bend       59,319       357
Torsion    2,313,441    508

Parameters - Source

Experiment: geometries and non-bonded parameters
• X-ray crystallography
• Electron diffraction
• Microwave spectroscopy
• Lattice energies

Advantages
• Real

Disadvantages
• Hard to obtain
• Non-uniform
• Limited availability

A force field parameterized according to data from one source (e.g., experimental gas phase, experimental solid phase, ab initio) will fit data from other sources only qualitatively.

Parameterization - Example

[Figure: two plots of energy (kcal/mol, 0.00 to 5.00) versus torsion (0 to 360°), with structural drawings of the molecule]

Relative energy (kcal/mol)
ab initio              -0.69
AMBER                  -1.25
Parameterized AMBER    -0.68

Transferability of Parameters

Assumption: The same set of parameters can be used to model a related series of compounds:
• Prediction
• Missing parameters
– Educated guess
– Parameters derived from atomic properties (e.g., UFF)
• Reducing the number of parameters
– Generalized torsions (e.g., depend only on the two central atoms)
– Generalized VdW parameters (same for sp, sp2, sp3)

The above assumption breaks down for closely interacting functional groups:

[Figure: structures with closely interacting functional groups]

Existing Force Fields

AMBER (Assisted Model Building with Energy Refinement)
• Parameterized specifically for proteins and nucleic acids.
• Uses only 5 bonding and non-bonding terms along with a sophisticated electrostatic treatment.
• No cross terms are included.
• Results can be very good for proteins and nucleic acids, less so for other systems.

CHARMM (Chemistry at Harvard Macromolecular Mechanics)
• Originally devised for proteins and nucleic acids.
• Now used for a range of macromolecules, molecular dynamics, solvation, crystal packing, vibrational analysis and QM/MM studies.
• Uses 5 valence terms, one of which is an electrostatic term.
• Basis for other force fields (e.g., MOIL).

Existing Force Fields

GROMOS (Groningen Molecular Simulation)
• Popular for predicting the dynamical motion of molecules and bulk liquids.
• Also used for modeling biomolecules.
• Uses 5 valence terms, one of which is an electrostatic term.

MM1, 2, 3, 4
• General purpose force fields for (mono-functional) organic molecules.
• MM2 was parameterized for a lot of functional groups.
• MM3 is probably one of the most accurate ways of modeling hydrocarbons.
• MM4 is very new and little is known about its performance.
• The force fields use 5 to 6 valence terms, one of which is an electrostatic term, and one to nine cross terms.

Existing Force Fields

MMFF (Merck Molecular Force Field)
• General purpose force field mainly for organic molecules.
• MMFF94 was originally designed for molecular dynamics simulations but is also widely used for geometry optimization.
• Uses 5 valence terms, one of which is an electrostatic term, and one cross term.
• MMFF was parameterized based on high level ab initio calculations.

OPLS (Optimized Potential for Liquid Simulations)
• Designed for modeling bulk liquids.
• Has been extensively used for modeling the molecular dynamics of biomolecules.
• Uses 5 valence terms, one of which is an electrostatic term, but no cross terms.

Existing Force Fields

Tripos (SYBYL force field)
• Designed for modeling organic molecules and biomolecules.
• Often used for CoMFA analysis (QSAR method).
• Uses 5 valence terms, one of which is an electrostatic term.

MOMEC
• A force field for describing transition metal coordination compounds.
• Originally parameterized to use 4 valence terms but not an electrostatic term.
• Metal-ligand interactions consist of bond stretch only.
• Coordination sphere is maintained by non-bonding interactions between ligands.
• Works reasonably well for octahedrally coordinated compounds.

Existing Force Fields

UFF (Universal Force Field)
• Designed for coverage of the entire periodic table.
• All force field parameters are atomic based.
• Parameters for specific interactions are derived from atomic parameters based on a series of mixing rules, thereby addressing the problem of parameter explosion.
• Uses 4 valence terms but not an electrostatic term.

YETI
• Designed for the accurate representation of non-bonded interactions.
• Most often used for modeling interactions between biomolecules and small substrate molecules.

Existing Force Fields

CVFF (Consistent Valence Force Field)
• Parameterized for small organic (amides, carboxylic acids, etc.) crystals and gas phase structures.
• Handles peptides, proteins, and a wide range of organic systems.
• Primarily intended for studies of structures and binding energies, although it predicts vibrational frequencies and conformational energies reasonably well.

Existing Force Fields

CFF, CFF91, CFF95 (Consistent Force Field)
• Parameterized (based on quantum mechanics calculations and molecular simulations) for acetals, acids, alcohols, alkanes, alkenes, amides, amines, aromatics, esters, and ethers.
• Useful for predicting:
– Small molecules: gas-phase geometries, vibrational frequencies, conformational energies, torsion barriers, crystal structures.
– Liquids: cohesive energy densities; for crystals: lattice parameters, RMS atomic coordinates, sublimation energies.
– Macromolecules: protein crystal structures, polycarbonates and polysaccharides (CFF91, 95).

Existing Force Fields

Dreiding
• Parameters derived by a rule based approach.
• Good coverage of the periodic table.
• Good coverage for organic, biological and main-group inorganic molecules.
• Only moderately accurate for geometries, conformational energies, intermolecular binding energies, and crystal packing.

Energy Minimization

Introduction

A system of N atoms is defined by 3N Cartesian coordinates or 3N-6 internal coordinates. These define a multi-dimensional potential energy surface (PES).

A PES is characterized by stationary points:
• Minima (stable conformations)
• Maxima
• Saddle points (transition states)

[Figure: energy versus coordinates]

Classification of Stationary Points

[Figure: energy (0.0 to 20.0) versus coordinate (0 to 360), with a transition state, a local minimum and the global minimum marked]

Type            1st Derivative    2nd Derivative*
Minimum         0                 positive
Maximum         0                 negative
Saddle point    0                 1 negative

* Refers to the eigenvalues of the second derivatives (Hessian) matrix
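A minimal sketch of the classification rule in the table, restricted to two coordinates so that the Hessian eigenvalues can be written in closed form without a linear algebra library. The function names and the tolerance are assumptions.

```python
import math

def hessian_eigenvalues_2x2(hxx, hxy, hyy):
    """Eigenvalues of the symmetric matrix [[hxx, hxy], [hxy, hyy]]."""
    mean = 0.5 * (hxx + hyy)
    half_gap = math.hypot(0.5 * (hxx - hyy), hxy)
    return mean - half_gap, mean + half_gap

def classify_stationary_point(eigenvalues, tol=1e-8):
    """Classify a point where the gradient is already known to be zero."""
    n_neg = sum(e < -tol for e in eigenvalues)
    n_pos = sum(e > tol for e in eigenvalues)
    if n_pos == len(eigenvalues):
        return "minimum"
    if n_neg == len(eigenvalues):
        return "maximum"
    if n_neg == 1 and n_pos == len(eigenvalues) - 1:
        return "saddle point"
    return "higher-order or degenerate"

# f(x, y) = x^2 - y^2 has a stationary point at the origin with Hessian diag(2, -2).
print(classify_stationary_point(hessian_eigenvalues_2x2(2.0, 0.0, -2.0)))
```

For a real molecule the same test is applied to all 3N-6 internal eigenvalues; a transition state has exactly one negative eigenvalue.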

Population of Minima

Most minimization methods can only go downhill and so locate the closest (in the downhill sense) minimum.
No minimization method can guarantee the location of the global energy minimum.
No method has proven the best for all problems.

[Figure: PES with labels for the global minimum, the most populated minimum and the active structure]

A General Minimization Scheme

[Flowchart: starting point x0 → Minimum? → yes: stop; no: calculate xk+1 = f(xk) and test again]

Sequential Univariate Method

For each coordinate:
• Generate two new points (xi + δxi, xi + 2δxi) and calculate their energies.
• Fit a parabola to xi (1), xi + δxi (2), xi + 2δxi (3) and locate its minimum (4).
• Set the coordinate of xi to the parabola minimum.
• When the change in all coordinates is small enough, stop.

This method requires fewer function evaluations than the Simplex method but is still slow to converge.
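The steps above can be sketched as follows. The step size, the tolerance, the sweep limit and the fallback used when the fitted parabola has no minimum are assumptions added for illustration; the slide does not specify them.

```python
def univariate_minimize(f, x0, step=0.1, tol=1e-8, max_sweeps=200):
    """Sequential univariate search: one parabola fit per coordinate per sweep."""
    x = list(x0)
    for _ in range(max_sweeps):
        largest_change = 0.0
        for i in range(len(x)):
            def along(t, i=i):
                trial = list(x)
                trial[i] = t
                return f(trial)
            x1, x2 = x[i], x[i] + step
            f1, f2, f3 = along(x1), along(x2), along(x1 + 2.0 * step)
            curvature = f1 - 2.0 * f2 + f3   # proportional to the second derivative
            if curvature > 0.0:
                # vertex of the parabola through the three equally spaced points
                new = x2 - 0.5 * step * (f3 - f1) / curvature
            else:
                # assumed fallback: no parabola minimum, take one step downhill
                new = x1 + step if f2 < f1 else x1 - step
            largest_change = max(largest_change, abs(new - x[i]))
            x[i] = new
        if largest_change < tol:
            break
    return x

def separable(p):
    return (p[0] - 1.0) ** 2 + 2.0 * (p[1] + 2.0) ** 2

def coupled(p):
    return (p[0] - 1.0) ** 2 + 2.0 * (p[1] + 2.0) ** 2 + 1.5 * p[0] * p[1]

print(univariate_minimize(separable, [5.0, 5.0]))
print(univariate_minimize(coupled, [5.0, 5.0]))
```

On the separable surface each coordinate is solved in a single fit. When the coordinates are coupled, each fit disturbs the others and several sweeps are needed, which illustrates the slow convergence noted above.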

Summary

Simplex, Sequential Univariate (memory requirements scale as N)
• Robust (i.e., can be initiated from poor starting geometries).
• Slow to converge even on quadratic surfaces.
• Require many function evaluations (especially Simplex).

Steepest Descent (memory requirements scale as 3N)
• Robust.
• Convergence guaranteed on quadratic surfaces.
• Slow to converge especially near the minimum.

Conjugate gradient, Quasi NR (memory requirements scale as 3N for conjugate gradient and as (3N)^2 for Quasi NR)
• Converges in approximately N steps for N degrees of freedom.
• Convergence guaranteed on quadratic surfaces.
• Unstable away from the minimum (i.e., when the surface is not quadratic).
• CG is the method of choice for large systems.

Newton-Raphson (memory requirements scale as (3N)^2)
• Converges in one step on quadratic surfaces.
• Unstable away from the minimum.
• Computationally expensive.
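A minimal sketch contrasting steepest descent with Newton-Raphson on a two-dimensional quadratic surface f(x) = ½ xᵀAx - bᵀx. The matrix A, the vector b, the starting point and the use of an exact line search for steepest descent are illustrative assumptions.

```python
# Quadratic test surface (assumed for illustration): gradient is A x - b, Hessian is A.
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]

def gradient(x):
    return [A[0][0] * x[0] + A[0][1] * x[1] - b[0],
            A[1][0] * x[0] + A[1][1] * x[1] - b[1]]

def newton_step(x):
    """x - H^-1 g, with the 2x2 Hessian inverted analytically."""
    g = gradient(x)
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    dx = [(A[1][1] * g[0] - A[0][1] * g[1]) / det,
          (A[0][0] * g[1] - A[1][0] * g[0]) / det]
    return [x[0] - dx[0], x[1] - dx[1]]

def steepest_descent(x, tol=1e-10, max_iter=1000):
    """Follow -g with an exact line search (valid for a quadratic surface)."""
    for iteration in range(max_iter):
        g = gradient(x)
        gg = g[0] * g[0] + g[1] * g[1]
        if gg ** 0.5 < tol:
            return x, iteration
        Ag = [A[0][0] * g[0] + A[0][1] * g[1],
              A[1][0] * g[0] + A[1][1] * g[1]]
        alpha = gg / (g[0] * Ag[0] + g[1] * Ag[1])
        x = [x[0] - alpha * g[0], x[1] - alpha * g[1]]
    return x, max_iter

start = [5.0, -3.0]
x_newton = newton_step(start)
x_sd, n_sd = steepest_descent(start)
print("Newton-Raphson, 1 step:", x_newton)
print("Steepest descent,", n_sd, "steps:", x_sd)
```

Newton-Raphson lands on the minimum in a single step because the surface is exactly quadratic, at the price of building and inverting the Hessian. Steepest descent needs only gradients but takes many progressively smaller steps near the minimum.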