
CS612 - Algorithms in Bioinformatics

Sampling

April 23, 2019

From a Rigid Ligand to a Flexible Ligand

Torsional (Dihedral) Degrees of Freedom (DOF)

Nurit Haspel CS612 - Algorithms in Bioinformatics

Kinematics

Kinematics is a branch of classical mechanics that describes the motion of points, bodies (objects), and systems of bodies (groups of objects) without considering the forces that caused the motion.

A kinematics problem begins by describing the geometry of a system and the initial conditions of any known values of position, velocity and/or acceleration of points in the system.

Then, using geometric methods, the position, velocity and acceleration of any unknown parts of the system can be determined.

Forward kinematics is the use of the kinematic equations of a robot to compute the position of the end-effector from specified values for the joint parameters.

In protein motion, the problem becomes computing the new locations of the atoms given a set of dihedral rotations.
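A minimal sketch of one forward-kinematics step for a molecule, assuming atoms are stored as rows of a NumPy array and that the caller already knows which atoms lie downstream of the rotated bond. The function name `rotate_about_bond` and its parameters are illustrative, not taken from the course material; the rotation itself uses Rodrigues' formula.

```python
import numpy as np

def rotate_about_bond(coords, a, b, angle, moving):
    """Rotate the atoms indexed by `moving` by `angle` radians about the bond a -> b.

    coords : (n_atoms, 3) array-like of Cartesian coordinates
    a, b   : indices of the two atoms that define the rotation axis
    moving : indices of the atoms downstream of the bond (assumed known)
    """
    coords = np.asarray(coords, dtype=float).copy()
    origin = coords[b]
    axis = coords[b] - coords[a]
    axis /= np.linalg.norm(axis)          # unit vector along the bond
    v = coords[moving] - origin           # positions relative to the axis
    cos_t, sin_t = np.cos(angle), np.sin(angle)
    # Rodrigues' rotation formula, applied to every moving atom at once.
    rotated = (v * cos_t
               + np.cross(axis, v) * sin_t
               + np.outer(v @ axis, axis) * (1.0 - cos_t))
    coords[moving] = rotated + origin
    return coords
```

Applying a full set of dihedral rotations amounts to calling this once per rotatable bond, each time with the atoms on the far side of that bond.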


Robotics-inspired Approach to Protein Flexibility

Similarity between proteins and robots: exploration of complex high-dimensional space

Similarity exploited to sample conformations with spatial constraints

[Figure: an articulated manipulator alongside an extended protein backbone]



Exploration of protein conformational space has parallels in robotics

0/1 collisions for robots versus energy field for proteins

adapted from J.-C. Latombe, Stanford

adapted from P. Smith, KSU



Dimensionality of configuration space

DOFs (rigid-body transformations and DOFs of the ligand)

Too many DOFs mean that the configuration space of the ligand is high-dimensional and difficult to search

Similar issue when planning motions for an articulated robotic chain in a cluttered environment

Geometric complexity of the free space

Difficult to determine whether a ligand conformation and specific position and orientation result in a good fit

Similar issue for an articulated robot

Address: Plan motions in the configuration space but compute in workspace (protein surface or cavity)!


Probabilistic Roadmap Motion Planning (PRM)

[Figure: configuration space, divided into forbidden space and free space]



Configurations are sampled by picking coordinates at random



Sampled configurations are tested for collision (in workspace!)



The collision-free configurations are retained as “milestones”



Each milestone is linked by straight paths to its k-nearest neighbors



The collision-free links are retained to form the PRM



Finding paths in the map.
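The construction steps above can be sketched on a toy problem. This is only an illustration under simplifying assumptions: the configuration space is the 2D unit square, obstacles are discs, and the helper names (`is_free`, `edge_is_free`, `build_prm`) are hypothetical.

```python
import math
import random

def is_free(p, obstacles):
    """Collision test: p is free if it lies outside every disc (center, radius)."""
    return all(math.dist(p, center) > radius for center, radius in obstacles)

def edge_is_free(p, q, obstacles, step=0.01):
    """Test a straight path by checking points spaced at most `step` apart."""
    n = max(1, math.ceil(math.dist(p, q) / step))
    return all(
        is_free((p[0] + (q[0] - p[0]) * t / n,
                 p[1] + (q[1] - p[1]) * t / n), obstacles)
        for t in range(n + 1)
    )

def build_prm(n_samples, k, obstacles, rng):
    """Return (milestones, adjacency) for a roadmap in the unit square."""
    # Sample at random and keep only collision-free configurations.
    milestones = []
    while len(milestones) < n_samples:
        p = (rng.random(), rng.random())
        if is_free(p, obstacles):
            milestones.append(p)
    # Link each milestone to its k nearest neighbors by straight paths,
    # keeping only the collision-free links.
    adjacency = {i: set() for i in range(n_samples)}
    for i, p in enumerate(milestones):
        nearest = sorted((j for j in range(n_samples) if j != i),
                         key=lambda j: math.dist(p, milestones[j]))[:k]
        for j in nearest:
            if edge_is_free(p, milestones[j], obstacles):
                adjacency[i].add(j)
                adjacency[j].add(i)
    return milestones, adjacency

milestones, adjacency = build_prm(50, 5, [((0.5, 0.5), 0.2)], random.Random(0))
```

Finding a path then reduces to a graph search over `adjacency` between the milestones nearest to the start and goal configurations.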


Application of PRM to Protein-Ligand Docking

Protein is assumed to be rigid

A fixed coordinate system P is attached to the protein

Ligand is a small flexible molecule

A moving coordinate system L is defined using three bonded atoms in the ligand

A conformation of the ligand is defined by the position and orientation of L relative to P and the torsional angles of the ligand

[Figure: coordinate frames (x, y, z) attached to the protein and to the ligand]

A.P. Singh, J.C. Latombe, and D.L. Brutlag. A Motion Planning Approach to Flexible Ligand Binding. Proc. 7th ISMB, pp. 252-261, 1999


Roadmap Construction: Node Generation

The nodes of the roadmap are generated by sampling conformations of the ligand uniformly at random in the parameter space (around the protein)

The energy of each sampled conformation is $E = E_{\text{interaction}}$ (electrostatic) $+\ E_{\text{internal}}$ (vdW)

A sampled conformation is retained with probability:

$$
p = \begin{cases}
0 & \text{if } E > E_{\max} \\
\dfrac{E_{\max} - E}{E_{\max} - E_{\min}} & \text{if } E_{\min} \le E \le E_{\max} \\
1 & \text{if } E < E_{\min}
\end{cases}
$$

Results in denser distribution of nodes in low-energy regions of conformational space
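The retention rule can be written directly as a function. A short sketch, assuming `e_min < e_max` and that a caller-supplied `energy_fn` (a hypothetical name) evaluates the energy of a candidate conformation:

```python
import random

def retention_probability(energy, e_min, e_max):
    """Probability of keeping a sampled conformation (assumes e_min < e_max)."""
    if energy > e_max:
        return 0.0
    if energy < e_min:
        return 1.0
    return (e_max - energy) / (e_max - e_min)

def sample_nodes(candidates, energy_fn, e_min, e_max, rng):
    """Keep each candidate with its energy-dependent retention probability."""
    return [q for q in candidates
            if rng.random() < retention_probability(energy_fn(q), e_min, e_max)]
```

Because the probability falls linearly from 1 at `e_min` to 0 at `e_max`, low-energy candidates survive more often, which is what concentrates the nodes in low-energy regions.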


Roadmap Construction: Edge Generation

[Figure: an edge from q to q′, discretized into intermediate conformations q_i, q_{i+1}]

Each node is connected to its closest neighbors by straight edges

Each edge is discretized so that between q_i and q_{i+1} no atom moves by more than some ε = 1 Å.

Results in denser distribution of nodes in low-energy regions of conformational space
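One way to discretize an edge so that no atom moves more than ε between consecutive conformations is recursive bisection. This is a sketch of that idea, not necessarily the scheme used in the cited work. It assumes a caller-supplied function `atom_positions(q)` (hypothetical) that returns an `(n_atoms, 3)` array for a configuration vector `q`, and it interpolates linearly, ignoring angle wrap-around.

```python
import numpy as np

def max_atom_displacement(q_a, q_b, atom_positions):
    """Largest distance any single atom moves between two configurations."""
    diff = atom_positions(q_a) - atom_positions(q_b)
    return float(np.max(np.linalg.norm(diff, axis=1)))

def discretize_edge(q, q_prime, atom_positions, eps=1.0):
    """Return configurations along the straight edge q -> q_prime such that
    no atom moves more than eps between consecutive entries."""
    q = np.asarray(q, dtype=float)
    q_prime = np.asarray(q_prime, dtype=float)
    if max_atom_displacement(q, q_prime, atom_positions) <= eps:
        return [q, q_prime]
    mid = 0.5 * (q + q_prime)              # bisect the edge and recurse
    left = discretize_edge(q, mid, atom_positions, eps)
    right = discretize_edge(mid, q_prime, atom_positions, eps)
    return left + right[1:]                # drop the duplicated midpoint
```

Each intermediate conformation can then be evaluated (for collisions or energy) to decide whether the edge is kept.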


Querying the Roadmap

For a given goal node q_g (e.g., binding conformation), Dijkstra's single-source shortest-path algorithm computes the lowest-weight paths from q_g to each node (in either direction) in O(N log N) time, where N = number of nodes

Various quantities can then be easily computed in O(N) time, e.g., average weights of all paths entering q_g and of all paths leaving q_g (binding and dissociation rates Kon and Koff)
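A compact sketch of the query step using Python's `heapq`. The O(N log N) bound relies on each node having only a bounded number of neighbors, so the number of edges is O(N). The graph representation (a dict of `(neighbor, weight)` lists with non-negative weights) and the helper `reverse_graph` are assumptions made for this example.

```python
import heapq

def dijkstra(adjacency, source):
    """Lowest path weight from `source` to every reachable node.

    adjacency : dict mapping node -> list of (neighbor, weight), weight >= 0
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                       # stale heap entry
        for v, w in adjacency.get(u, ()):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def reverse_graph(adjacency):
    """Flip every directed edge, so a search from q_g yields paths entering q_g."""
    reverse = {u: [] for u in adjacency}
    for u, edges in adjacency.items():
        for v, w in edges:
            reverse.setdefault(v, []).append((u, w))
    return reverse
```

Running `dijkstra(graph, q_g)` gives the paths leaving the goal node; running it on `reverse_graph(graph)` gives the paths entering it. Averages over either set are then a single pass over the returned dictionary.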


Computing Binding Conformations

Sample many (several thousand) ligand conformations at random around the protein

Repeat several times:

Select the lowest-energy conformations that are close to the protein surface

Re-sample around them

Retain k (approx. 10) lowest-energy conformations whose centers of mass are at least 5 Å apart

[Figure: lactate dehydrogenase, with the active site marked]
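The final retention step can be sketched as a greedy filter: walk the conformations in order of increasing energy and keep one only if its center of mass is far enough from everything already kept. The greedy order is an assumption here, since the slide does not say how ties between the energy and distance criteria are resolved, and `select_separated` is a hypothetical name.

```python
import numpy as np

def select_separated(centers, energies, k=10, min_dist=5.0):
    """Indices of up to k lowest-energy conformations whose centers of mass
    are pairwise at least min_dist apart (greedy, lowest energy first)."""
    centers = np.asarray(centers, dtype=float)
    chosen = []
    for i in np.argsort(energies):
        far_enough = all(np.linalg.norm(centers[i] - centers[j]) >= min_dist
                         for j in chosen)
        if far_enough:
            chosen.append(int(i))
            if len(chosen) == k:
                break
    return chosen
```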


Testing on Three Complexes

PDB ID: 1ldm Receptor: Lactate Dehydrogenase (2386 atoms, 309 residues) Ligand: Oxamate (6 atoms, 7 dofs)

PDB ID: 4ts1 Receptor: Mutant of tyrosyl-transfer-RNA synthetase (2423 atoms, 319 residues) Ligand: L-leucyl-hydroxylamine (13 atoms, 9 dofs)

PDB ID: 1stp Receptor: Streptavidin (901 atoms, 121 residues) Ligand: Biotin (16 atoms, 11 dofs)


Finding Folding Pathways Using PRM

Degrees of freedom: number of rotatable backbone dihedral angles (approx. 2N, where N is the number of amino acids)

Nodes are generated in a similar manner to the docking scheme above.

Sampling cannot be done uniformly at random due to high dimensionality; instead, sampling is done from a set of distributions around the native state.

Edges connect neighboring nodes in a similar manner to the one described above.

Can be used to discover folding pathways, intermediate structures and other folding events.

G. Song, N. Amato, RECOMB 2001


From Flexible Ligand to Flexible Receptor?

Modeling full receptor flexibility is very difficult!

For this process to become efficient, we must find a representation for protein flexibility that avoids the direct search of a solution space comprising thousands of degrees of freedom.

There are several methods available, and the accuracy of the results usually increases with the computational complexity of the representation.


From Flexible Ligand to Flexible Receptor?

The dimensionality of the protein conformational space is much larger than that of a small ligand

PRM-based methods that sample thousands of conformations to get a good view of the ligand conformational space are not sufficient

Challenge: from 7-10 DOFs to thousands of DOFs

Goal: Model protein flexibility to capture relevant conformations of the flexible receptor


Receptor Flexibility – Soft Receptor

Soft receptors can be easily generated by relaxing the high VdW energy penalty

The rationale is that the receptor structure has some inherent flexibility which allows it to adapt to slightly differently shaped ligands.

If the change in the receptor conformation is small enough, it is assumed that the receptor is capable of such a conformational change.

It is also assumed that the change in protein conformation does not incur a sufficiently high energetic penalty to offset the improved interaction energy between the ligand and the receptor.

It is also quite easy to implement (relax the collision component).
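As a rough illustration of "relaxing the collision component", the sketch below caps the repulsive wall of a 12-6 Lennard-Jones term. Capping is only one of several possible softening schemes, and the parameter values (`epsilon`, `sigma`, `cap`) are placeholders chosen for the example, not values from any particular force field.

```python
def lennard_jones(r, epsilon=0.2, sigma=3.5):
    """Standard 12-6 Lennard-Jones energy at separation r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

def soft_lennard_jones(r, epsilon=0.2, sigma=3.5, cap=1.0):
    """Softened variant: the steep repulsive penalty is limited to `cap`,
    so slight ligand-receptor overlaps are tolerated rather than rejected."""
    return min(lennard_jones(r, epsilon, sigma), cap)
```

The attractive region is unchanged; only close contacts, where the unsoftened energy would exceed `cap`, are affected.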


Receptor Flexibility – Selecting Specific DOFs

It is possible to select only a few degrees of freedom to model explicitly.

They usually correspond to rotations around single bonds.

These degrees of freedom are usually considered the natural degrees of freedom in molecules.

Rotations around bonds lead to deviations from ideal geometry that result in a small energy penalty when compared to deviations from ideality in bond lengths and bond angles.

Selection of which torsional degrees of freedom to model is usually the most difficult part of this method because it requires a considerable amount of a priori knowledge.

The torsions chosen are usually rotations of side chains in the binding site of the receptor protein.

It is also common to further reduce the search space by using rotamer libraries.


Receptor Flexibility – Ensemble Docking

One possible way to represent a flexible receptor for drug design applications is the use of multiple static receptor structures

The best description for a protein structure is that of a conformational ensemble of slightly different protein structures coexisting in a low energy region of the potential energy surface.

The structures can be determined experimentally either from X-ray crystallography or NMR, or generated via computational methods such as Monte Carlo or MD simulations.


Modeling Limited Receptor Flexibility

Selection of specific degrees of freedom, such as those of designated amino acids in the binding site

Shown here: Acetylcholinesterase: Phe330 flexible, acts as a swinging gate


Modeling Limited Receptor Flexibility

Moving a larger number of amino acids (illustration on acetylcholinesterase)


Receptor Flexibility – Collective DOF

Collective DOF allows the representation of full protein flexibility without a dramatic increase in computational cost.

One method is the calculation of normal modes for the receptor.

Alternatively, we can use dimensionality reduction methods.

The most commonly used method for the study of protein motions is principal component analysis (PCA).
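A minimal PCA sketch using NumPy's SVD, assuming the conformations have already been superimposed on a common reference and flattened to one row of 3 × (number of atoms) coordinates each. The function name `pca_modes` is illustrative.

```python
import numpy as np

def pca_modes(conformations, n_modes=2):
    """Principal components of a set of aligned conformations.

    conformations : (n_confs, 3 * n_atoms) array-like
    Returns (mean structure, top modes as rows, variance along each mode).
    """
    x = np.asarray(conformations, dtype=float)
    mean = x.mean(axis=0)
    centered = x - mean
    # Rows of vt are the orthonormal principal directions.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    variances = s ** 2 / (x.shape[0] - 1)
    return mean, vt[:n_modes], variances[:n_modes]
```

A conformation can then be described by a handful of coordinates along the top modes, `mean + coeffs @ modes`, instead of thousands of individual DOFs.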


Inverse Kinematics (IK)

Inverse kinematics is the problem of finding the right values for the underlying degrees of freedom of a chain.

In the case of a protein chain, these degrees of freedom are the dihedral angles, and they are chosen so that the chain satisfies certain spatial constraints.

For example, in some applications, it is necessary to find rotations that can steer certain atoms to desired locations in space.

The applications of inverse kinematics to protein structure include mainly loop modeling and generating ensembles of structures.

In this case, we manipulate the rotational degrees of freedom of a loop region to find possible loop conformations that attach to the rest of the protein.


Modeling Loops Using Inverse Kinematics

Goal: Model the ensemble of conformations of a protein.

It is known that proteins are not rigid but fluctuate about an ensemble of structures under equilibrium conditions.

Focus mostly on loop regions, as they are the most flexible ones.

Inverse kinematics: Manipulate the degrees of freedom of an articulated chain to satisfy some end-constraints.

In this case, we manipulate the rotational degrees of freedom of a loop region to find possible loop conformations that attach to the rest of the protein.

Cyclic Coordinate Descent (CCD): solve for and rotate one dihedral at a time.

Canutescu A. A., and Dunbrack R. L. Protein Science 12, 2003


CCD for Inverse Kinematics

Goal: find optimal values to simultaneously steer the three backbone atoms of the end of the fragment to their target positions.

Current positions before rotation: $M^0$; after rotation: $M$; target positions: $F$.

$S$ is the sum of squared distances between current positions and target positions

Steering these three atoms to their target positions requires minimizing $S$.



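S is defined as:

$$S = |\overrightarrow{F_1M_1}|^2 + |\overrightarrow{F_2M_2}|^2 + |\overrightarrow{F_3M_3}|^2$$

where $\overrightarrow{F_1M_1} = \overrightarrow{O_1M_1} - \overrightarrow{O_1F_1}$

Notice that this is a 2D rotation in the plane defined by the $\hat{r}$ and $\hat{s}$ local axes.

The squared norm of the vector $M - F$ (denoted $FM$) has this form for each of the three atoms, so we can sum the three contributions to $S$.

We can express the rotation with respect to the $\hat{r}$, $\hat{s}$ plane as:

$$\overrightarrow{O_1M_1} = r_1 \cos\theta\,\hat{r}_1 + r_1 \sin\theta\,\hat{s}_1$$

$\vec{r}_1$ is the vector from $O_1$ to $M_1^0$, which we want to rotate by $\theta$; $r_1$ is its length and $\hat{r}_1$ its direction.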



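From the previous equations it follows that, writing $\vec{f}_i = \overrightarrow{O_iF_i}$:

$$\overrightarrow{F_iM_i} = r_i \cos\theta\,\hat{r}_i + r_i \sin\theta\,\hat{s}_i - \vec{f}_i \equiv \vec{d}_i, \quad i = 1, 2, 3$$

Calculating the squared distances between the moving atoms and the fixed target atoms, we obtain:

$$|\vec{d}_i|^2 = r_i^2 + f_i^2 - 2 r_i \cos\theta\,(\vec{f}_i \cdot \hat{r}_i) - 2 r_i \sin\theta\,(\vec{f}_i \cdot \hat{s}_i)$$

Putting it all together, we can express $S$ as the sum of the squared distances above.

Differentiating with respect to $\theta$ gives us:

$$\frac{dS}{d\theta} = \frac{d|\vec{d}_1|^2}{d\theta} + \frac{d|\vec{d}_2|^2}{d\theta} + \frac{d|\vec{d}_3|^2}{d\theta}$$

where

$$\frac{d|\vec{d}_i|^2}{d\theta} = 2 r_i \sin\theta\,(\vec{f}_i \cdot \hat{r}_i) - 2 r_i \cos\theta\,(\vec{f}_i \cdot \hat{s}_i)$$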



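After a little bit of math, $S$ can be written as:

$$S = a - \sqrt{b^2 + c^2}\,\cos(\theta - \alpha)$$

Here $a = \sum_i (r_i^2 + f_i^2)$, $b = 2\sum_i r_i\,(\vec{f}_i \cdot \hat{r}_i)$, $c = 2\sum_i r_i\,(\vec{f}_i \cdot \hat{s}_i)$, and $\alpha$ is defined by $\cos\alpha = b/\sqrt{b^2 + c^2}$, $\sin\alpha = c/\sqrt{b^2 + c^2}$.

$S$ is minimum when $\theta = \alpha$. Now we have explicit values for sine and cosine.

Notice that the time complexity is linear in the number of DOFs to solve for all dihedrals of a chain.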
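A sketch of the single-dihedral CCD update. Since $S$ depends on $\theta$ only through $-b\cos\theta - c\sin\theta$ with $b = 2\sum_i r_i(\vec{f}_i \cdot \hat{r}_i)$ and $c = 2\sum_i r_i(\vec{f}_i \cdot \hat{s}_i)$, the minimizing angle is `atan2(c, b)`. The code assumes $O_i$ is the foot of the perpendicular from $M_i^0$ onto the rotation axis and takes $\hat{s}_i$ as the cross product of the axis direction with $\hat{r}_i$; the function name and argument layout are illustrative.

```python
import numpy as np

def ccd_optimal_angle(axis_point, axis_dir, moving, targets):
    """Rotation angle about one axis that minimizes the sum of squared
    distances S between the moving atoms and their targets.

    axis_point, axis_dir : a point on the rotation axis and its direction
    moving, targets      : (n, 3) arrays of current (M0) and target (F) positions
    """
    axis_point = np.asarray(axis_point, dtype=float)
    u = np.asarray(axis_dir, dtype=float)
    u = u / np.linalg.norm(u)
    b = c = 0.0
    for m0, f in zip(np.asarray(moving, float), np.asarray(targets, float)):
        o = axis_point + np.dot(m0 - axis_point, u) * u   # O_i on the axis
        r_vec = m0 - o
        r = np.linalg.norm(r_vec)
        if r < 1e-12:
            continue                      # an atom on the axis does not move
        r_hat = r_vec / r
        s_hat = np.cross(u, r_hat)        # completes the local rotation plane
        f_vec = f - o
        b += 2.0 * r * np.dot(f_vec, r_hat)
        c += 2.0 * r * np.dot(f_vec, s_hat)
    return float(np.arctan2(c, b))        # theta = alpha minimizes S
```

One CCD pass calls this once per dihedral and applies the resulting rotation before moving to the next one, which is why the cost of a pass grows linearly with the number of DOFs.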



Cyclic Coordinate Descent: solve for and rotate one dihedral at a time

Given: atom at current position $M$, target position $F$

Goal: Solve for dihedral $\theta$ s.t. $|F - M|^2 = S(\theta) < \epsilon_{\text{threshold}}$

Time complexity: Linear in the number of DOFs to solve for all dihedrals of a chain



Since there is redundancy, many solutions are feasible.

Find rotations to satisfy spatial constraints on atoms

Combine with energy minimization to obtain physical structures

Example: Chymotrypsin inhibitor 2


Equilibrium Fluctuations

More DOFs than spatial constraints can be exploited to generate fragment fluctuations

Example: Chymotrypsin inhibitor 2



Sample equilibrium fluctuations:

Spatially constrained through Cyclic Coordinate Descent

Energetically constrained to be feasible

Local Fluctuations in α-Lactalbumin

Boltzmann ensemble average

$$\mathrm{RMSD}_x = \sum_{\text{Confs}} \frac{\mathrm{RMSD}(C, C_{\text{native}})\, e^{-\beta \Delta E_c}}{Q}$$

$$\Delta E_c = E_c - E_{\text{native}}, \qquad Q = \sum_{\text{Confs}} e^{-\beta \Delta E_c}$$
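The ensemble average can be computed in a few lines. In this sketch, `values` holds the per-conformation RMSD to the native structure and `energies` the corresponding $E_c$; both names are assumptions of the example. The shift by the minimum energy difference is a numerical-stability device that cancels between numerator and $Q$.

```python
import numpy as np

def boltzmann_average(values, energies, e_native, beta):
    """Boltzmann-weighted average of `values` over sampled conformations."""
    values = np.asarray(values, dtype=float)
    delta_e = np.asarray(energies, dtype=float) - e_native
    # Shifting by the minimum keeps the exponentials from overflowing;
    # the common factor cancels in the ratio.
    weights = np.exp(-beta * (delta_e - delta_e.min()))
    return float(np.sum(values * weights) / np.sum(weights))
```

With `beta = 0` every conformation is weighted equally and the result is the plain mean; larger `beta` concentrates the average on the low-energy conformations.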



α-Lactalbumin (α-Lac)

123 residues

Hydrogen exchange protection factors available

Ubiquitin

76 residues

NMR information on fluctuations available

