
Post on 27-Dec-2015


5. Ab initio modeling

And today…

• Introduction to ab initio modeling: the basic principles

• Rosetta ab initio modeling protocol

• Grid-based large-scale modeling & FOLDIT

• I-Tasser

• CASP

Types of structure prediction

• Comparative modeling
– Structural template detected from sequence similarity

• Fold recognition
– Structural template detected from fitness to fold (threading)

• Ab initio modeling (Free Modeling)
– No obvious structural template: model whole folding process (Rosetta, I-Tasser)

Figure axis: similarity to known structure

1. Select fragments consistent with local sequence preferences

2. Assemble fragments into models with native-like global properties

3. Identify the best model from the population of decoys

Basic Ab Initio Rosetta protocol

Figures adapted from Charlie Strauss; Protein structure prediction using ROSETTA. Rohl et al (2004) Methods in Enzymology, 383:66

Fragment libraries

• 25-200 fragments for each 3-mer and 9-mer

• Recent improvement was obtained by using fragments of additional sizes:
– For α helix: length 5-19 & 3-12
– For β sheet: length 4-10 & 3-7

• Selected from PDB < 2.5Å resolution & < 50% seq id

• Ranked by sequence similarity and similarity of predicted and known secondary structure

• Discard improbable conformations

1. Select fragments: local sampling

2. Create compact decoys using fragment assembly

Advantages of approach

• Fragment library approximates Gibbs sampling

• Fragments allow an accurate, but implicit, representation of the potential energy surface for local interactions.

• Computer power can be invested in optimization of global features (e.g. compactness)

Figure labels: local, global

Structure Representation:

• Equilibrium bonds and angles (Engh & Huber 1991)

• Centroid: average location of center of mass of side chain (Centroid | aa, φ, ψ)

• No modeling of side chains

• Fast

Low-resolution step

• Sss + Shs: sheet and helix-sheet geometries

• Scb: compactness of structure

• Svdw: no clashes

• Srgyr: globular structure

Low-resolution parameters

• Senv - burial preference (number of neighbors)

• Spair - preferred amino acid pairs (e.g. cys-cys, glu-arg, etc)

Figure: small vs. large radius of gyration (Rgyr)
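As an illustration of the Srgyr idea, the radius of gyration can be computed directly from coordinates. This is a minimal sketch, not Rosetta's score term: the function name and the toy coordinates are hypothetical, and the only claim is that a compact set of points has a smaller Rgyr than an extended one.

```python
import numpy as np

def radius_of_gyration(coords):
    """Root-mean-square distance of the points from their centroid."""
    coords = np.asarray(coords, dtype=float)
    center = coords.mean(axis=0)
    return float(np.sqrt(((coords - center) ** 2).sum(axis=1).mean()))

# Toy coordinates (hypothetical): four points packed in a square
# versus four points stretched along a line.
compact = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [1, 1, 0]]
extended = [[0, 0, 0], [3, 0, 0], [6, 0, 0], [9, 0, 0]]

print(radius_of_gyration(compact) < radius_of_gyration(extended))
```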

MC search with simulated annealing – start with extended conformation

1. 28K-36K random 9-mer fragment insertions (from top 25 fragments)
– XK: only vdW score (until all φ,ψ have changed)
– 2K: add strand pairing score (0.3 weight)
– 20K: compactness: increase pairing score + add Cβ and Rgyr; ± local strand pairing weight
– 6K/4K: full strand pairing; full centroid function

2. 3-mer fragment insertions
– 8K: Gunn-type (select among least perturbing fragments)
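The search loop above can be sketched as a Metropolis Monte Carlo run with a cooling temperature. Everything specific here is an assumption made for illustration: `toy_energy` is a made-up score that simply favors helical torsions, the fragment pool is shared by all positions (a real library is position-specific), and the geometric cooling schedule and step counts are arbitrary.

```python
import math
import random

def toy_energy(torsions):
    # Hypothetical score: lowest when every residue has phi=-60, psi=-45.
    return sum((phi + 60.0) ** 2 + (psi + 45.0) ** 2 for phi, psi in torsions) / 1000.0

def insert_fragment(torsions, fragment, start):
    # Copy the chain and overwrite a window of (phi, psi) pairs.
    new = list(torsions)
    new[start:start + len(fragment)] = fragment
    return new

def fragment_assembly(n_res, fragments, n_steps, t_start=2.0, t_end=0.1, seed=0):
    rng = random.Random(seed)
    frag_len = len(fragments[0])
    current = [(-150.0, 150.0)] * n_res  # extended starting conformation
    e_cur = toy_energy(current)
    best, e_best = current, e_cur
    for step in range(n_steps):
        # Geometric cooling from t_start to t_end (simulated annealing).
        temp = t_start * (t_end / t_start) ** (step / max(1, n_steps - 1))
        start = rng.randrange(n_res - frag_len + 1)
        trial = insert_fragment(current, rng.choice(fragments), start)
        e_trial = toy_energy(trial)
        # Metropolis criterion: always accept downhill, sometimes uphill.
        if e_trial <= e_cur or rng.random() < math.exp(-(e_trial - e_cur) / temp):
            current, e_cur = trial, e_trial
            if e_cur < e_best:
                best, e_best = current, e_cur
    return best, e_best

# Two toy 3-mer fragments: one helical, one strand-like.
fragments = [[(-60.0, -45.0)] * 3, [(-120.0, 130.0)] * 3]
best, e_best = fragment_assembly(n_res=12, fragments=fragments, n_steps=500)
```

In the staged protocol on the slide, the score function itself changes between stages; in a sketch like this that would correspond to swapping `toy_energy` for a different function at fixed step counts.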

2. Create compact decoys using fragment assembly

Further local refinement strategies

Local moves: how to perturb the backbone with minimal effect on remote regions

1. Random torsion angle perturbation (helix: 0°, strand: <2°, rest: <3°)
– Small move: random φi, ψi pair
– Shear move: Δψi-1, -Δφi; compensatory movements, move peptide plane

2. Selection of globally non-perturbing fragments
– Chuck move: fragments that minimize atom MSD
– Gunn move: fragments that minimize Δφ, Δψ
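The small and shear moves can be sketched on a list of (φ, ψ) pairs. This is an illustrative sketch only: the function names, the uniform sampling of the perturbation, and the toy torsions are assumptions, not Rosetta's implementation. The shear move applies +Δ to ψ of residue i-1 and -Δ to φ of residue i, so the two changes compensate each other.

```python
import random

def small_move(torsions, i, max_delta, rng):
    """Perturb phi_i and psi_i independently by at most max_delta degrees."""
    new = [list(pair) for pair in torsions]
    new[i][0] += rng.uniform(-max_delta, max_delta)
    new[i][1] += rng.uniform(-max_delta, max_delta)
    return new

def shear_move(torsions, i, max_delta, rng):
    """Change psi_(i-1) by +delta and phi_i by -delta (compensating pair)."""
    if i < 1:
        raise ValueError("shear move needs a preceding residue")
    delta = rng.uniform(-max_delta, max_delta)
    new = [list(pair) for pair in torsions]
    new[i - 1][1] += delta
    new[i][0] -= delta
    return new

rng = random.Random(1)
torsions = [(-60.0, -45.0), (-120.0, 130.0), (-70.0, 140.0)]
after_small = small_move(torsions, 2, 2.0, rng)
after_shear = shear_move(torsions, 1, 3.0, rng)
```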

Further local refinement strategies

Local moves: how to perturb the backbone with minimal effect on remote regions

3. Adjacent φ, ψ variation to offset global effect of fragment insertion
– Wobble move: fast analytical gradient calculation
– Crank shaft: combination of several wobble moves

Smaller moves are accepted with higher frequency

Figure panels: wobble, crank shaft; before insertion, after insertion, no changes, final conformation

Fragment exchange

Local moves

Initial global changes

Further refinement

Movie by Jens Meiler

Global sampling

3. Identify best structure

• Generate decoy population (10^3-10^5)

• Filter to correct sampling biases

• Cluster analysis identifies broadest minimum

• Fullatom refinement will identify lowest energy minimum
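Finding the broadest minimum by clustering can be sketched as picking the decoy with the most neighbors within a distance radius. This is a simplified sketch under stated assumptions: the input is a precomputed pairwise distance matrix (in practice pairwise RMSD between decoys), the function name and radius are hypothetical, and the toy distances come from points on a line.

```python
import numpy as np

def largest_cluster(dist, radius):
    """Return the decoy with the most neighbors within radius, and its members."""
    dist = np.asarray(dist, dtype=float)
    neighbors = dist <= radius
    counts = neighbors.sum(axis=1)
    center = int(np.argmax(counts))
    members = np.flatnonzero(neighbors[center]).tolist()
    return center, members

# Toy "decoys": positions on a line stand in for structures,
# and absolute differences stand in for pairwise RMSD.
pos = np.array([0.0, 1.0, 2.0, 6.0, 9.0])
dist = np.abs(pos[:, None] - pos[None, :])
center, members = largest_cluster(dist, radius=1.0)
print(center, members)
```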

High-resolution step: parameters

• VdW
– 12-6 Lennard-Jones
– Linear repulsion
– Cutoff within 5.0-5.5 Å

• Solvation (Lazaridis-Karplus)

• Hydrogen bonds

Figure: hydrogen bond between polar groups (N-H and O-C), distance rij

• Weak pair potential
– Electrostatic interactions
– π-π, π-+

• Backbone torsions (rama score)
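The van der Waals term above can be sketched as a 12-6 Lennard-Jones function whose steep repulsive wall is replaced by a straight line at short distances. The parameter choices are assumptions for illustration: `sigma` is taken as the distance at the energy minimum, the switch to the linear regime at 0.6 × sigma and the hard cutoff at 5.5 Å are example values, and the names are hypothetical.

```python
def lj_12_6(r, sigma, eps):
    """12-6 Lennard-Jones energy; minimum of -eps at r = sigma."""
    x = sigma / r
    return eps * (x ** 12 - 2.0 * x ** 6)

def lj_linear_repulsion(r, sigma, eps, switch=0.6, cutoff=5.5):
    """LJ energy with a linear repulsive wall below switch * sigma."""
    if r >= cutoff:
        return 0.0  # hard cutoff (assumption; no smoothing applied)
    r_switch = switch * sigma
    if r >= r_switch:
        return lj_12_6(r, sigma, eps)
    # Below the switch distance, continue along the tangent line
    # so the energy stays finite for overlapping atoms.
    x = sigma / r_switch
    e_switch = eps * (x ** 12 - 2.0 * x ** 6)
    slope = eps * (-12.0 * x ** 12 + 12.0 * x ** 6) / r_switch
    return e_switch + slope * (r - r_switch)

print(lj_linear_repulsion(4.0, sigma=4.0, eps=0.1))
```

The linear wall keeps clashes penalized without the extremely large gradients of the pure r^-12 term.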


High-resolution refinement of models

MCM protocol:

• 120 steps of small & shear moves
– Random perturbation of 5/10 torsion angles (2-3°)
– Side chain optimization: rotamer trial (full repacking every 10 steps)
– Minimization

• Steps 1-60: gradually ramp up vdW repulsive term

• Steps 60-120: add side chain minimization
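The step schedule above can be written out as a small table-generating function. This is only a sketch of the bookkeeping: the linear shape of the ramp, the starting repulsive weight of 0.1, and the dictionary keys are assumptions, and the actual moves (perturbation, repacking, minimization) are not implemented.

```python
def mcm_schedule(n_steps=120, ramp_end=60, repack_every=10, rep_start=0.1):
    """List what each MCM step does: repulsive weight and side-chain treatment."""
    schedule = []
    for step in range(1, n_steps + 1):
        if step <= ramp_end:
            # Linear ramp of the repulsive weight (assumed shape).
            frac = (step - 1) / (ramp_end - 1)
            rep_weight = rep_start + (1.0 - rep_start) * frac
        else:
            rep_weight = 1.0
        schedule.append({
            "step": step,
            "rep_weight": rep_weight,
            # Full repacking every repack_every steps, rotamer trials otherwise.
            "side_chains": "full_repack" if step % repack_every == 0 else "rotamer_trials",
            # Side chain minimization is added from step ramp_end onward.
            "minimize_side_chains": step >= ramp_end,
        })
    return schedule

schedule = mcm_schedule()
print(schedule[0], schedule[59], sep="\n")
```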

Side chain optimization

Backbone optimization

Small backbone moves and MCM

vdW repulsive

Side chain optimization + minimization

Backbone optimization

Target 0281 CASP6

• Topology sampled by ab initio trajectory of homolog sequence (rmsd=2.2Å)

• Full atom refinement reduces rmsd to 1.5Å

• Side chain packing accurately recovered

First atom-resolution model

Atom-resolution Ab Initio (I)

• Challenge: Sample near-native conformations (<~2.5 Å)

• Approach: Model set of homologs → diverse population samples basin of attraction

Example: exposed Leucine

Models starting from extended conf

Models starting from native conf

Toward high-resolution de novo structure prediction. Bradley et al (2005) Science 309:1868

Low-resolution homolog folding improves prediction

• Collect 50 homologs (psi-blast 2 rounds; 60% non-redundant)

• For each
– create 2000 low-resolution models
– cluster, retain large clusters (n>5), and select 500 models

• Thread query sequence back onto ~20-30K models

• Proceed to fullatom refinement: evaluate also homolog sequences (2 rounds of MCM protocol)

… … …

Atom-resolution Ab Initio (II)

Step 1: low resolution
– Model homologs

Step 2: atom resolution
– 10^3-10^4 models
– Energy-based model selection

Results:
11/16 proteins of length <88 residues are modeled within <5Å

Hox-B1 Ubiquitin

Sampling of β sheet topologies

• Fold-tree representation of protein allows tailored optimization

• BOINC – donate idle time of many home computers for Rosetta runs

Tera = 10^12; strong desktop: ~gigaflop (10^9)

How can we improve? (1) More computer time

More computer time – is sampling an issue?

Perform very long runs on the grid (>10^6 decoys)

3 categories

(a) Near-native lowest energy model (<3.5Å) ✔

(b) Problem with sampling (E near-native structures << E decoys)

(c) Problem with energy function (E near-native structures > E decoys)
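The three categories can be expressed as a simple decision rule. This is a sketch under assumptions: the function and argument names are hypothetical, `e_near_native` stands for the energy of near-native structures (for example obtained by refining the native structure), and a plain "lower than" comparison is used where the slide writes "<<".

```python
def classify_run(lowest_energy_decoy_rmsd, e_near_native, e_lowest_decoy, rmsd_cutoff=3.5):
    """Assign a long folding run to category (a), (b) or (c)."""
    if lowest_energy_decoy_rmsd < rmsd_cutoff:
        return "a: near-native lowest energy model"
    if e_near_native < e_lowest_decoy:
        # Near-native structures score better but were never reached.
        return "b: sampling problem"
    # Decoys score at least as well as near-native structures.
    return "c: energy function problem"

print(classify_run(2.1, -210.0, -205.0))
print(classify_run(8.4, -210.0, -180.0))
print(classify_run(8.4, -170.0, -180.0))
```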

Sampling bottlenecks in de novo protein structure prediction. Kim et al (2009) JMB 393:249

Why don’t we sample these conformations (b)?

“linchpin features” are rarely sampled

• Describe models as feature vectors

• Identify native features not sampled in low-energy models

Sampling bottlenecks in de novo protein structure prediction. Kim et al (2009) JMB 393:249

Figure: torsion bins vs. residue position. Legend: native torsion bin; frequently sampled native torsion bin; never sampled native torsion bin

Position 23 never samples the native helix conformation -> simulations never succeed

O: ω=cis
E: left-handed strand
G: left-handed helix
B: right-handed strand
A: right-handed helix
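The feature-vector idea can be sketched with the torsion bins listed above: each model becomes a string of bins, and positions whose native bin never appears in the models are candidate linchpin features. The numeric bin boundaries below are assumptions chosen for illustration (the slides do not give them), and the function names and toy bin strings are hypothetical.

```python
def torsion_bin(phi, psi, omega=180.0):
    """Map backbone torsions (degrees) to one of the bins A, B, E, G, O."""
    if abs(omega) < 90.0:
        return "O"  # cis peptide bond
    if phi < 0.0:
        # Assumed boundary between right-handed helix (A) and strand (B).
        return "A" if -75.0 <= psi < 50.0 else "B"
    # Assumed boundary between left-handed helix (G) and strand (E).
    return "G" if -100.0 <= psi < 100.0 else "E"

def native_bin_frequency(native_bins, model_bins):
    """Fraction of models that match the native bin at each position."""
    n_models = len(model_bins)
    return [sum(m[i] == nb for m in model_bins) / n_models
            for i, nb in enumerate(native_bins)]

def never_sampled(native_bins, model_bins):
    """Positions whose native bin is absent from every model."""
    freq = native_bin_frequency(native_bins, model_bins)
    return [i for i, f in enumerate(freq) if f == 0.0]

native = "AABB"
models = ["AABA", "ABBA", "AAEA"]
print(never_sampled(native, models))
```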

Enforcement of native-like value for feature -> some simulations now succeed

Examples for “linchpin features”

• Near active site

• Regions that form late in folding

• Irregular β strand pairing (mostly in edge strands)

How can we improve? More brains

• FOLDIT – folding game

• Donate idle time of many brains to improve structure prediction

• Now as Android application!

“Win the Nobel prize by just playing a game”
http://vimeo.com/focusforwardfilms/semifinalists/51888393

http://www.youtube.com/user/UWfoldit Look also for “black belt” Foldit lessons

Foldit

Predicting protein structures with a multiplayer online game Cooper et al (2010) Nature 466:756

Players: human spatial reasoning

• Explore also strategy space: new search algorithms

• Excel in solving problems where substantial backbone rearrangement is needed to bury hydrophobic residues

Challenge: formulate problem as game

• Easy to understand for non-scientists

• Competition/collaboration

Native structure

Starting structure

Foldit Model

LLG: log likelihood of a model: useful models must have better LLG than best random models (in shade)

Starting model

Solved structure

Nature Structural & Molecular Biology 2011

Example 1: help in structure determination

Example 2: Foldit Puzzle #986875

Predicting protein structures with a multiplayer online game Cooper et al (2010) Nature 466:756

• Foldit detects better structures,

• … using trajectories that visit high energy structures on the way

Native structure

Starting structure

Foldit Model

Algorithm discovery by Foldit players

Algorithm discovery by protein folding game players. Khatib et al (2011) PNAS 108:18949

• Added ability to create, edit, share and rate “recipes” (each player can create their own “cookbook”)

• Evaluated what strategies evolve and how they spread among players

Main strategies

Used at different stages during modeling

Top Players

All Players

Algorithm discovery by protein folding game players. Khatib et al (2011) PNAS 108:18949

“Blue Fuse” and “Quake” are most popular

Many new recipes evolve from “Blue Fuse”

Algorithm discovery by Foldit players

Foldit players detect algorithms that are similar to those used in Rosetta

Algorithm discovery by protein folding game players. Khatib et al (2011) PNAS 108:18949

Foldit “Blue Fuse”:

• Very similar to new Rosetta protocol “Fast Relax” (repeated decrease/increase of repulsive term)

• Comparable efficiency for short runs
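The "repeated decrease/increase of repulsive term" can be pictured as a weight schedule that drops to a low value at the start of each cycle and ramps back up. The sketch below only generates such a schedule; the number of cycles and the weight values are illustrative assumptions, not values taken from the slides, and the repacking and minimization that would run at each stage are omitted.

```python
def ramped_repulsion_schedule(n_cycles=5, ramp=(0.02, 0.25, 0.55, 1.0)):
    """Return (cycle, repulsive_weight) pairs: ramp up, then reset each cycle."""
    return [(cycle, weight)
            for cycle in range(1, n_cycles + 1)
            for weight in ramp]

stages = ramped_repulsion_schedule()
print(stages[:5])
```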

ab initio modeling – summary:

Roy, Kucukural, Zhang (2010) I-TASSER: a unified platform for automated protein structure and function prediction. Nat Protoc. 5:725

• Highly accurate

• Computationally expensive (~150 CPU hours/protein)

Server of Rosetta http://robetta.bakerlab.org/

Good alternative: I-Tasser

• Protocol developed by Zhang and Skolnick

• Based on threading of parts of sequence onto parts of known structures

• Very efficient and accurate (~5 CPU hours/protein)

Server of iTasser http://zhanglab.ccmb.med.umich.edu/I-tasser

I-Tasser Iterative Threading Assembly Refinement (Zhang, & Skolnick)

Separate training of protocol for: easy/ medium/ hard targets

i-Tasser (Zhang & Skolnick)

Threading:

1. Create profile:
– Psiblast -> sequence profile
– Psipred -> secondary structure profile

2. LOMETS: Metaserver for threading (FUGUE, HHSEARCH, MUSTER, PROSPECT, PPA, SP3 & SPARKS)

3. Excise aligned structure elements from top-scoring templates for next step

(20/30/50, depending on difficulty of target)

Wu, Skolnick, Zhang (2007) Ab initio modeling of small proteins by iterative TASSER simulations. BMC Biology 5:17

Roy, Kucukural, Zhang (2010) I-TASSER: a unified platform for automated protein structure and function prediction. Nat Protoc. 5:725

i-Tasser (Zhang & Skolnick)

Tasser: Schematic representation of polypeptide chain in on- and off-lattice Cα model

Zhang Y., Skolnick J. PNAS 2004;101:7594-7599

©2004 by National Academy of Sciences

Structure assembly - efficient modeling:

• 2 points/residue (Cα + SG)

• on-lattice ab initio for unaligned regions

• off-lattice for aligned regions

i-Tasser (Zhang & Skolnick)

Monte Carlo search by replica exchange

• Exchange between simulations at different temperatures: better sampling

Scoring function: separately trained for easy, medium and hard targets
– Secondary structure (PSIPRED & SAM)
– Statistical terms: backbone hydrogen bonds; hydrophobicity and Cα/side chain correlations
– Spatial restraints from threading templates
– Sequence-based contact predictions (SVM) (and accessible surface area prediction; NN)
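The exchange step between temperatures can be sketched with the standard replica-exchange Metropolis criterion, p = min(1, exp[(1/T_i - 1/T_j)(E_i - E_j)]). This is the textbook form, not I-Tasser's code: the function names, the unit Boltzmann constant, and the toy energies and temperatures are assumptions.

```python
import math
import random

def swap_probability(e_i, e_j, t_i, t_j, k_b=1.0):
    """Acceptance probability for exchanging replicas at temperatures t_i and t_j."""
    delta = (1.0 / (k_b * t_i) - 1.0 / (k_b * t_j)) * (e_i - e_j)
    return 1.0 if delta >= 0.0 else math.exp(delta)

def exchange_neighbors(energies, temperatures, rng):
    """Attempt swaps between neighboring temperatures.

    Returns order, where order[k] is the replica now running at temperatures[k].
    """
    order = list(range(len(energies)))
    for k in range(len(temperatures) - 1):
        i, j = order[k], order[k + 1]
        p = swap_probability(energies[i], energies[j], temperatures[k], temperatures[k + 1])
        if rng.random() < p:
            order[k], order[k + 1] = j, i
    return order

rng = random.Random(0)
order = exchange_neighbors([10.0, 5.0, 1.0], [1.0, 2.0, 4.0], rng)
print(order)
```

Swaps that move a lower-energy conformation to a lower temperature are always accepted, which lets high-temperature replicas cross barriers and hand good conformations down.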

i-Tasser (Zhang)

Example for improvement over template

Constraints from threading and from contact prediction are located at different sites and complement each other

i-Tasser (Zhang)

Clustering

Additional iteration of MC simulation starting from cluster centers

Final model created by optimizing hydrogen bonds

Contact-assisted structure prediction

Kim et al. (2013) One contact for every twelve residues allows robust and accurate topology-level protein structure modeling Proteins 82:208

Ab initio restricted to small (100 aa), single-domain proteins

• + information about contacts -> dramatic increase of scope (… 500 aa)

• Info from:
– Contact prediction (bioinfo)
– Experiments: e.g. NMR chemical shifts, mutagenesis, etc.

Contacts may assist in

1. Determination of topology:
• Filter fragments
• Find fragment pairs

2. Refinement of topology:
• Refine structure by imposing constraints

Assessment on CASP10 of Rosetta ab initio modeling: one reliable non-local contact per 12 aa (on average) is needed for reliable modeling
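As a worked example of the "one contact per twelve residues" rule of thumb, the number of reliable non-local contacts needed scales with chain length. The helper below is a hypothetical convenience function; rounding up is an assumption.

```python
import math

def contacts_needed(n_residues, residues_per_contact=12):
    """Approximate number of reliable non-local contacts for a chain."""
    return math.ceil(n_residues / residues_per_contact)

print(contacts_needed(100), contacts_needed(500))
```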

Contact-assisted structure prediction

Kim et al. (2013) One contact for every twelve residues allows robust and accurate topology-level protein structure modeling Proteins 82:208

Flowchart of protocol

Topology determination: from partial threading (SPARKS, Rosetta)

Topology refinement: RosettaCM recombination protocol (next week)

Contact-assisted structure prediction

Kim et al. (2013) One contact for every twelve residues allows robust and accurate topology-level protein structure modeling Proteins 82:208

Improved models for large structures using contacts:

native

Ab initio

Assisted ab initio

CASP

• Double-blind structure prediction experiment allows assessment of different approaches

• Every 2 years; summer 2014: CASP11

• Steady improvement of methodology

Categories:
• Template based modeling (TBM)
• Free modeling (FM)
• Refinement of initial models

http://www.predictioncenter.org/casp10/meeting/talks.html
Proteins special issue vol:82, S2

Identification of “winner strategies”:
• Rosetta in CASP4-6
• iTasser in CASP7 & CASP8
• Servers
• Improved combination of multiple templates in CASP9
• CASP10: refinement with MD
• CASP11: contact prediction methods & contact-assisted modeling
• New: prediction of contacts, unstructured regions, ligand binding sites

CASP

Around 130 targets in last rounds

Until CASP9: target difficulty decreases

CASP10:
• 131 domains (20 free modeling)
• Targets now more difficult than previous CASPs

Proteins special issue vol:79, S10

Kryshtafovych et al. (2011). Proteins 79:S196–207

Measure of performance

Compare predicted to solved structure

• Superimpose short fragments (length n=3,5,7 residues; iteratively)

• Find maximal superimposed part N, where
– N Cα atom pairs are within x Å
– 4 thresholds: x=1.0, 2.0, 4.0, 8.0

• GDT_TS = ¼ (N1+N2+N4+N8)
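The GDT_TS formula can be sketched for a model that is already superimposed on the solved structure. This is a simplification: the real measure searches over many superpositions to maximize each N, whereas this sketch takes one fixed superposition as given. The function name and toy coordinates are hypothetical, and each N is expressed as the percentage of Cα pairs within the threshold.

```python
import numpy as np

def gdt_ts(model_ca, native_ca, thresholds=(1.0, 2.0, 4.0, 8.0)):
    """GDT_TS (0-100) for pre-superimposed, equally long C-alpha traces."""
    model_ca = np.asarray(model_ca, dtype=float)
    native_ca = np.asarray(native_ca, dtype=float)
    dist = np.linalg.norm(model_ca - native_ca, axis=1)
    # Fraction of residue pairs within each distance threshold.
    fractions = [(dist <= t).mean() for t in thresholds]
    return 100.0 * float(np.mean(fractions))

native = np.zeros((4, 3))
# Toy model: residues displaced by 0.5, 1.5, 3.0 and 10.0 Angstrom along x.
model = np.array([[0.5, 0, 0], [1.5, 0, 0], [3.0, 0, 0], [10.0, 0, 0]])
print(gdt_ts(model, native))
```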

• 18 newly solved structures predicted prior to publication of structure.

• none recognized by sequence similarity

• none with close structural homologs

Independently assessed scoring: 2=“Well Above Average”, 1=“okay”, 0=“lousy”

Rosetta

CASP4 ab initio summary

Improvement over the years

Improvement in each round

• CASP7: in difficult region

• CASP8: accuracy in template-based modeling (few difficult cases)

• CASP9: intermediate difficulty targets

• CASP10: refinement using MD (Michael Feig)


Free Modeling with Rosetta in CASP8

T0581

• Server model: predicts kinked helix

• Only model with 4 beta strands (most predictions: all helical protein)

Free Modeling with Rosetta in CASP9

model best template

Kinch et al. (2011). Proteins, 79:S59–73


T0806

Free Modeling with Rosetta in CASP11

model

best template

http://www.predictioncenter.org/casp11/doc/presentations/CASP11_FM_NG.pdf

• longer fragments for alpha helical proteins (5-19; 3-12)

• shorter fragments for beta sheets (4-10; 3-7)

Rosetta in CASP8: modification of fragment size improves prediction


FM with ITasser

• Increased contribution of automatic servers
– Predictions of mostly similar quality

• Now also improve difficult targets

Improved automatic servers

Figure label: Human + server

CASP10: Foldit platform joins the game for coopetition

• Start from Foldit models; proceed with different approaches

• Joint forces produce best model


• Steady improvement of structure prediction over the years

• Impressive quality of current ab initio modeling
– Efficient combination of appropriate sampling strategies and a tailored energy function

• Models now often better than template

• Automatic servers now also outperform in FM

Summary
