Protein modelling
● Protein structure is the key to understanding protein function
● Protein structure
● Topics in modelling and computational methods
  – Comparative/homology modelling
  – Fold recognition
  – Fold prediction
  – Dynamics of proteins
Motivation
● Protein structure determines protein function
● For the majority of proteins the structure is not known
[Figure: Structural coverage. Bar chart comparing the number of known structures with the number of known sequences; axis from 0 to 1,500,000]
Correlation structure & sequence
● Chothia & Lesk (1986): Correlation between structural divergence and sequence similarity
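The relation from Chothia & Lesk (1986) is usually quoted as Δ = 0.40 · exp(1.87 · H), where H is the fraction of non-identical residues and Δ is the RMSD (in Å) of the core backbone atoms. The sketch below simply evaluates that expression; the function name is an assumption made for illustration.

```python
import math


def core_rmsd(fraction_mutated: float) -> float:
    """Expected core backbone RMSD in Angstrom for a given fraction of
    non-identical residues, using the commonly quoted fit 0.40 * exp(1.87 * H)."""
    if not 0.0 <= fraction_mutated <= 1.0:
        raise ValueError("fraction_mutated must lie between 0 and 1")
    return 0.40 * math.exp(1.87 * fraction_mutated)


# Structural divergence grows quickly as sequence identity drops.
for identity in (100, 50, 20):
    h = 1.0 - identity / 100.0
    print(f"{identity:3d}% identity -> about {core_rmsd(h):.2f} A")
```

At 50% identity the expression gives roughly 1 Å, which is why templates above that level are generally considered safe for modelling.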
[Figure: Fold space. Labels: Time (axis), Fold 1, Fold 2, Evolution]
Comparative/homology modelling
Template sequence
Template structure
Target sequence
Alignment
Model
The crucial importance of the alignment
● An alignment defines structurally equivalent positions!
Template sequence
Template structure
Target sequence
Alignment
Model
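To make "structurally equivalent positions" concrete, the sketch below turns a pairwise alignment into a map from target residues to template residues. The function name and the toy sequences are made up for illustration; target residues missing from the map have no template coordinates and would need loop modelling.

```python
def equivalent_positions(target_aln: str, template_aln: str) -> dict:
    """Map 0-based target residue indices to template residue indices.

    Both arguments are rows of one pairwise alignment, with '-' marking gaps.
    """
    if len(target_aln) != len(template_aln):
        raise ValueError("aligned rows must have equal length")
    mapping = {}
    t_idx = 0  # residue counter in the target sequence
    s_idx = 0  # residue counter in the template sequence
    for t_char, s_char in zip(target_aln, template_aln):
        if t_char != "-" and s_char != "-":
            mapping[t_idx] = s_idx  # aligned column: equivalent positions
        if t_char != "-":
            t_idx += 1
        if s_char != "-":
            s_idx += 1
    return mapping


# Toy alignment (made-up sequences)
target = "MKT-AYIAK"
template = "MRTLAY--K"
print(equivalent_positions(target, template))
# {0: 0, 1: 1, 2: 2, 3: 4, 4: 5, 7: 6}
```

Target residues 5 and 6 face gaps in the template, so a shifted or wrong alignment would copy coordinates from the wrong template residues.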
Steps in comparative modelling
● Find suitable template(s)
● Build alignment between target and template(s)
● Build model(s)
  – Replace sidechains
  – Resolve conflicts in the structure
  – Model loops (regions without an alignment)
● Evaluate and select model(s)
State of the art in homology modelling
● Template search
  – (iterative) sequence database searches (PSI-BLAST)
● Alignment step
  – multiple alignment of close to fairly distant homologues
● Modelling step
  – rigid body assembly
  – segment matching
  – satisfaction of spatial constraints
Modelling by spatial restraints
● Generate many constraints:
  – Homology derived constraints
    ● Distances and angles between aligned positions should be similar
  – Stereochemical constraints
    ● Bond lengths, bond angles, dihedral angles, nonbonded atom-atom contacts
● Model derived by minimizing the violations of the restraints
Modeller: Sali & Blundell (1993)
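As a toy illustration of the idea (not MODELLER's actual objective function or optimizer), the sketch below places three points so that they satisfy a handful of target distances by plain gradient descent on the summed squared violations. All names, the step size, and the restraint values are assumptions.

```python
import math


def restraint_energy(coords, restraints):
    """Sum of squared violations; each restraint is (i, j, target_distance)."""
    return sum((math.dist(coords[i], coords[j]) - d0) ** 2
               for i, j, d0 in restraints)


def minimize(coords, restraints, step=0.05, n_steps=2000):
    """Plain gradient descent on restraint_energy (illustrative only)."""
    coords = [list(p) for p in coords]
    for _ in range(n_steps):
        grads = [[0.0, 0.0, 0.0] for _ in coords]
        for i, j, d0 in restraints:
            d = math.dist(coords[i], coords[j])
            if d == 0.0:
                continue  # direction undefined for coincident points
            factor = 2.0 * (d - d0) / d
            for k in range(3):
                g = factor * (coords[i][k] - coords[j][k])
                grads[i][k] += g
                grads[j][k] -= g
        for p, g in zip(coords, grads):
            for k in range(3):
                p[k] -= step * g[k]
    return coords


# Three pseudo-atoms with made-up distance restraints (in Angstrom)
start = [(0.0, 0.0, 0.0), (1.0, 0.5, 0.0), (2.0, 0.0, 1.0)]
restraints = [(0, 1, 3.8), (1, 2, 3.8), (0, 2, 5.5)]
model = minimize(start, restraints)
print(restraint_energy(start, restraints), restraint_energy(model, restraints))
```

A real modelling program combines many more restraint types (angles, dihedrals, non-bonded contacts) and uses more robust optimizers than fixed-step gradient descent.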
Loop modelling
● Exposed loop regions are usually more variable than the protein core
● Often very important for protein function
● Loops longer than 5 residues are difficult to build
● Mini-protein folding problem
Model evaluation
● Check of stereochemistry
  – bond lengths & angles, peptide bond planarity, side-chain ring planarity, chirality, torsion angles, clashes
● Check of spatial features
  – hydrophobic core, solvent accessibility, distribution of charged groups, atom-atom distances, atomic volumes, main-chain hydrogen bonding
● 3D profiles/mean force potentials
  – residue environment
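Two of the simplest stereochemical checks can be sketched on a Cα trace: consecutive Cα atoms should be about 3.8 Å apart (trans peptide bond), and non-adjacent Cα atoms should not come too close. The function name, the tolerance, and the clash cutoff below are illustrative assumptions, not the settings of any particular validation program.

```python
import math


def check_ca_trace(ca_coords, ideal=3.8, tolerance=0.2, clash_cutoff=3.0):
    """Return (bad_bonds, clashes) as lists of residue index pairs.

    bad_bonds: consecutive CA pairs whose distance deviates from `ideal`
               by more than `tolerance` (Angstrom).
    clashes:   non-adjacent CA pairs closer than `clash_cutoff` (Angstrom).
    """
    bad_bonds = []
    clashes = []
    n = len(ca_coords)
    for i in range(n - 1):
        if abs(math.dist(ca_coords[i], ca_coords[i + 1]) - ideal) > tolerance:
            bad_bonds.append((i, i + 1))
    for i in range(n):
        for j in range(i + 2, n):
            if math.dist(ca_coords[i], ca_coords[j]) < clash_cutoff:
                clashes.append((i, j))
    return bad_bonds, clashes


# Made-up trace: the last pseudo-bond is stretched to 5.0 Angstrom
trace = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (7.6, 5.0, 0.0)]
print(check_ca_trace(trace))
```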
Knowledge-based mean force potentials
Melo & Feytmans (1997)
● Compute typical atomic/residue environments based on known protein structures
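A common way to turn observed environments into a potential is the inverse Boltzmann relation E(a,b) = -kT · ln(f_obs(a,b) / f_exp(a,b)). The sketch below applies it to residue-residue contacts; it is a generic illustration, not the atom-level potential of the cited paper, and the function name, the reference state, and the toy contact list are assumptions.

```python
import math
from collections import Counter


def contact_potential(contacts, kT=1.0):
    """Inverse-Boltzmann pair potential from a list of observed contacts.

    contacts: iterable of (residue_a, residue_b) pairs seen in known structures.
    Returns {(a, b): energy} with each key sorted alphabetically.
    """
    contacts = list(contacts)
    pair_counts = Counter(tuple(sorted(pair)) for pair in contacts)
    residue_counts = Counter(res for pair in contacts for res in pair)
    n_pairs = sum(pair_counts.values())
    n_residues = sum(residue_counts.values())
    potential = {}
    for (a, b), count in pair_counts.items():
        f_obs = count / n_pairs
        f_a = residue_counts[a] / n_residues
        f_b = residue_counts[b] / n_residues
        # Reference state: residues pair at random according to composition.
        f_exp = f_a * f_b * (1 if a == b else 2)
        potential[(a, b)] = -kT * math.log(f_obs / f_exp)
    return potential


# Made-up contact statistics
toy_contacts = [("L", "V")] * 4 + [("K", "E")] * 4 + [("L", "K"), ("V", "E")]
pot = contact_potential(toy_contacts)
print(pot)
```

Pairs seen more often than expected (here L-V) get a negative pseudo-energy, and rare pairs (here K-L) get a positive one.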
● Sequence from different species
● Is binding to ligand conserved?
Ligand
DNA
Modelling a transcription factor
Ligand binding domain
hydrogen bonds to ligand homo-serine lactone moiety binding acyl moiety binding
DNA binding domain
Linker DNA binding domain
Template
Target
Variable loops
New Loop
MODELLER output
Ligand binding pocket
Errors in comparative modelling
Marti-Renom et al. (2000)
a) Side chain packing
b) Distortions and shifts
c) Loops
d) Misalignments
e) Incorrect template
Template
Model
True structure
Modelling accuracy
Marti-Renom et al. (2000)
Applications of homology modelling
Marti-Renom et al. (2000)
Structural genomics
● Post-genomics:
  – many new sequences, no function
● Aim: a structure for every protein
● High-throughput structure determination
  – robotics
  – standard protocols for cloning/expression/crystallization
Structural coverage
Vitkup et al. (2001)
high quality models
Complete models
Total = 43 %
Target selection
Fold recognition
● Structure is more conserved than sequence
Limit of sequence similarity searches
Structural similarity
Fold space
Target
Protein structures
Fold recognition / Threading
● Is a sequence compatible with a structure?
● The idea: evolutionarily related proteins share common folding motifs
● Contact matrix = motif
● Mean-force potentials to score every contact
● Optimize alignment to minimize pseudo-energy
AAGGT YAAT YAAGGTYAATY
Fold prediction – Rosetta method
● Knowledge based scoring function
Bayes' law:
\[ P(\text{structure} \mid \text{sequence}) = \frac{P(\text{structure}) \, P(\text{sequence} \mid \text{structure})}{P(\text{sequence})} \]
P(structure) = probability of a protein-like structure (no clashes, globular shape)
P(sequence|structure) = f(residue contacts in native structures)
Simons et al. (1997)
[Figure: protein-like structures; sequence-consistent local structure; near-native structures]
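Because P(sequence) is the same for every candidate structure of one sequence, candidates can be ranked by P(structure) · P(sequence|structure) alone. The sketch below shows this with made-up probabilities for three hypothetical decoys; the function name and all numbers are illustrative assumptions.

```python
import math


def posterior_over_candidates(candidates):
    """Normalised P(structure | sequence) over a finite candidate set.

    candidates: {name: (P(structure), P(sequence | structure))}
    """
    log_scores = {name: math.log(prior) + math.log(likelihood)
                  for name, (prior, likelihood) in candidates.items()}
    # P(sequence) is a shared constant, so normalising over the candidate
    # set is enough to compare the structures with each other.
    log_norm = math.log(sum(math.exp(s) for s in log_scores.values()))
    return {name: math.exp(s - log_norm) for name, s in log_scores.items()}


# Made-up numbers for three hypothetical decoys
decoys = {
    "decoy_A": (0.5, 0.001),
    "decoy_B": (0.3, 0.004),
    "decoy_C": (0.2, 0.0005),
}
post = posterior_over_candidates(decoys)
print(max(post, key=post.get), post)
```

Here decoy_B wins although decoy_A looks more protein-like, because its sequence fits the structure better.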
Environment specific scoring function
● Environment E_i specific interactions
● Environment
  – defined by the number of neighbours
  – implicitly distinguishes between buried and exposed residues
\[ P(aa_1, aa_2, \ldots, aa_n \mid \text{structure}) = \prod_i P(aa_i \mid E_i) \prod_{i<j} \frac{P(aa_i, aa_j \mid r_{ij}, E_i, E_j)}{P(aa_i \mid r_{ij}, E_i, E_j) \, P(aa_j \mid r_{ij}, E_i, E_j)} \]
cf. mean force potential
Simons et al. (1997)
Collection of putative backbone conformations
Protein sequence
Library of small segments (sequences and structures)
For each window of 9 residues:
look up the 25 closest (sequence) neighbours in the library
Simons et al. (1997)
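The lookup step can be sketched as below. The identity count used as the similarity measure is a stand-in assumption; the published method may use a different sequence comparison. The library format, the function names, and the toy data are also hypothetical.

```python
def window_similarity(a, b):
    """Number of identical residues at matching positions (stand-in measure)."""
    return sum(x == y for x, y in zip(a, b))


def closest_fragments(sequence, library, window=9, n_neighbours=25):
    """For each sequence window, return the most similar library fragments.

    library: list of (fragment_sequence, fragment_conformation) pairs,
             each fragment having length `window`.
    Returns {window_start: [fragments sorted by decreasing similarity]}.
    """
    result = {}
    for start in range(len(sequence) - window + 1):
        query = sequence[start:start + window]
        ranked = sorted(library,
                        key=lambda frag: window_similarity(query, frag[0]),
                        reverse=True)
        result[start] = ranked[:n_neighbours]
    return result


# Tiny made-up library with 3-residue fragments to keep the output readable
toy_library = [("ACD", "HHH"), ("CDE", "EEE"), ("GGG", "LLL")]
hits = closest_fragments("ACDE", toy_library, window=3, n_neighbours=2)
print(hits)
```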
MC-SA optimization
Simons et al. (1997)
● for each random position
  – pick a random neighbour
  – replace backbone conformation
  – calculate probability of new structure
● MC: Monte-Carlo
  – accept up-hill moves with a certain probability
● SA: simulated annealing
  – first allow many changes, later fewer changes
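A minimal sketch of such a Monte-Carlo simulated-annealing loop is given below. It treats the negative log-probability as an energy, uses the Metropolis rule for up-hill moves, and lowers the temperature geometrically. For simplicity it replaces one position per move instead of a whole 9-residue fragment; all names, the cooling schedule, and the toy energy are assumptions.

```python
import math
import random


def mc_sa(initial, neighbours, energy, n_steps=5000,
          t_start=2.0, t_end=0.01, seed=0):
    """Monte-Carlo search with simulated annealing.

    neighbours[pos] lists the candidate conformations for position pos.
    Returns (best_state, best_energy).
    """
    rng = random.Random(seed)
    state = list(initial)
    e = energy(state)
    best_state, best_e = list(state), e
    for step in range(n_steps):
        # Geometric cooling: many changes accepted early, fewer later.
        t = t_start * (t_end / t_start) ** (step / (n_steps - 1))
        pos = rng.randrange(len(state))
        candidate = list(state)
        candidate[pos] = rng.choice(neighbours[pos])
        e_new = energy(candidate)
        # Metropolis rule: always accept down-hill moves, accept up-hill
        # moves with probability exp(-dE / t).
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / t):
            state, e = candidate, e_new
            if e < best_e:
                best_state, best_e = list(state), e
    return best_state, best_e


# Toy problem: recover a hidden conformation string, energy = mismatches
hidden = [1, 0, 2, 1, 0, 2, 1, 0, 2, 1]
options = [[0, 1, 2] for _ in hidden]


def mismatches(state):
    return sum(a != b for a, b in zip(state, hidden))


best_state, best_e = mc_sa([0] * len(hidden), options, mismatches)
print(best_state, best_e)
```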
Results
● Small molecules: ok
● Proteins with mostly α-helices: ok
● Proteins with mostly β-sheets: not so ok
Simons et al. (1997)
Dynamics of proteins
● Local Motions (0.01 to 5 Å, 10^-15 to 10^-1 s)
  – Atomic fluctuations
  – Sidechain Motions
  – Loop Motions
● Rigid Body Motions (1 to 10 Å, 10^-9 to 1 s)
  – Helix Motions
  – Domain Motions (hinge bending)
  – Subunit motions
● Large-Scale Motions (> 5 Å, 10^-7 to 10^4 s)
  – Helix coil transitions
  – Dissociation/Association
  – Folding and Unfolding
Molecular dynamics/molecular modelling
● Molecular mechanics
● Normal mode analysis
● Quantum mechanical simulations
● ...
Molecular mechanics
● Atom representation
  – sphere
  – charge
  – topology
● Forces
  – Bonded interactions
  – Non-bonded interactions
    ● Electrostatic interactions
    ● Van der Waals interactions
  – Forcefields: AMBER, GROMOS, ...
● Newton's law of mechanics
http://cmm.info.nih.gov/modeling/guide_documents/molecular_mechanics_document.html
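How forces and Newton's law (F = m·a) combine into a simulation can be sketched with a single harmonic "bond" in one dimension, integrated with the velocity Verlet scheme. This is a toy with reduced units and made-up parameters, not the setup of AMBER or GROMOS; the function names are assumptions.

```python
import math


def harmonic_force(x, k=1.0, x0=0.0):
    """Force of a harmonic bond with force constant k and rest length x0."""
    return -k * (x - x0)


def velocity_verlet(x, v, mass, dt, n_steps, force=harmonic_force):
    """Integrate Newton's equation of motion; returns (trajectory, final_velocity)."""
    trajectory = [x]
    f = force(x)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass    # half-step velocity update
        x = x + dt * v_half                 # full-step position update
        f = force(x)                        # force at the new position
        v = v_half + 0.5 * dt * f / mass    # complete the velocity update
        trajectory.append(x)
    return trajectory, v


# Reduced units: mass = 1, k = 1, start stretched by 1 and at rest
traj, v_end = velocity_verlet(x=1.0, v=0.0, mass=1.0, dt=0.01, n_steps=1000)
total_energy = 0.5 * v_end ** 2 + 0.5 * traj[-1] ** 2
print(traj[-1], math.cos(10.0), total_energy)
```

The final position closely follows the analytic solution cos(t), and the total energy stays near its initial value of 0.5, which is the property that makes this integrator popular in molecular dynamics.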
Molecular mechanics
● Molecular mechanics simulations take long!
  – because of the size of the system
    ● Proteins are large
    ● Water molecules to consider solvent effects
    ● 10,000 to millions of atoms
  – because of the number of iterations
    ● update atom positions according to time-scale of fastest fluctuations: bond vibrations ca. 1 fs
    ● movements of interest frequently have long time-scales, e.g. folding
    ● 1 s => 10^15 iterations!
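The iteration count follows directly from dividing the simulated time by the 1 fs time step. The sketch below does this arithmetic and converts it to wall-clock time for a hypothetical throughput; the value of one million steps per second is an arbitrary assumption for illustration, not a benchmark.

```python
def n_iterations(simulated_time_s, timestep_s=1e-15):
    """Number of integration steps needed to cover the simulated time."""
    return simulated_time_s / timestep_s


def wall_clock_years(simulated_time_s, steps_per_second, timestep_s=1e-15):
    """Wall-clock time in years at a given (assumed) simulation throughput."""
    seconds = n_iterations(simulated_time_s, timestep_s) / steps_per_second
    return seconds / (3600 * 24 * 365)


print(f"{n_iterations(1.0):.1e} steps for 1 s of simulated time")
# Hypothetical throughput of 1e6 steps per second
print(f"about {wall_clock_years(1.0, 1e6):.1f} years of computing")
```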
Benefit of simulations
● Result is an ensemble of structures
  – Time-averaged statistical quantities
  – e.g., relative free energies of different conformations
● Protein engineering
  – e.g., relative free energies of different mutants
● Physical accuracy of models?
  – chemical reactions?
  – cutoff and long-range interactions?
  – dielectric constant?
movie from: C. Letner, G. Alter Journal of Molecular Structure (Theochem) 368 (1996) 205–212
The end
Proteins are beautiful!
www.holmgroup.org