11/11/05 D Dobbs ISU - BCB 444/544X: Protein Structure Prediction & Modeling

Protein Structure Prediction & Modeling

Bioinformatics Seminars

Nov 11 Fri 12:10 BCB Seminar in E164 Lago
Building Supertrees Using Distances
Steve Willson, Dept of Mathematics
http://www.bcb.iastate.edu/courses/BCB691-F2005.html

Next week - Baker Center/BCB Seminars: (seminar abstracts available at above link)

Nov 14 Mon 1:10 PM Doug Brutlag, Stanford
Discovering transcription factor binding sites

Nov 15 Tues 1:10 PM Ilya Vakser, Univ Kansas
Modeling protein-protein interactions

Both seminars will be in Howe Hall Auditorium

Protein Structure & Function: Analysis & Prediction

Mon: Protein structure: basics; classification, databases, visualization
Wed: Protein structure databases - cont.
Thurs Lab: Protein structure databases; visualization software; secondary structure prediction
Fri: Protein structure prediction; protein-nucleic acid interactions

Reading Assignment (for Mon-Fri)

Mount Bioinformatics
• Chp 10 Protein classification & structure prediction
  http://www.bioinformaticsonline.org/ch/ch10/index.html
• pp. 409-491
• Check Errata: http://www.bioinformaticsonline.org/help/errata2.html

BCB 544 Additional Reading

Required:
• Gene Prediction:
  • Burge & Karlin 1997 JMB 268:78
    Prediction of complete gene structures in human genomic DNA

Optional:
• Structure Prediction:
  • Schueler-Furman…Baker 2005 Science 310:638
    Progress in modeling of protein structures and interactions

Review last lecture:

Protein Structure: Databases, Classification & Visualization

Protein sequence databases

• UniProt (SwissProt, PIR, EBI) http://www.pir.uniprot.org

• NCBI Protein http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Protein

More on these later: protein function prediction

Protein sequence & structure: analysis

• Diamond STING Millennium - many useful structure analysis tools, including Protein Dossier
  http://trantor.bioc.columbia.edu/SMS/

• SwissProt (UniProt) protein knowledgebase
  http://us.expasy.org/sprot

• InterPro sequence analysis tools
  http://www.ebi.ac.uk/interpro

Protein structure databases

• PDB Protein Data Bank (RCSB) - THE protein structure database
  http://www.rcsb.org/pdb/

• MMDB Molecular Modeling Database (NCBI Entrez) - has "added" value
  http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Structure

• MSD Macromolecular Structure Database (EBI) - especially good for interactions, binding sites
  http://www.ebi.ac.uk/msd

Protein structure classification

• SCOP = Structural Classification of Proteins
  Levels reflect both evolutionary and structural relationships
  http://scop.mrc-lmb.cam.ac.uk/scop

• CATH = Classification by Class, Architecture, Topology & Homology
  http://cathwww.biochem.ucl.ac.uk/latest/

• DALI/FSSP (recently moved to EBI & reorganized)
  • fully automated structure alignments
  • DALI server
    http://www.ebi.ac.uk/dali/index.html
  • DALI Database (fold classification)
    http://ekhidna.biocenter.helsinki.fi/dali/start

Protein structure visualization

• Molecular Visualization Freeware:
  http://www.umass.edu/microbio/rasmol

• MolviZ.Org
  http://www.umass.edu/microbio/chime

• Protein Explorer
  http://www.umass.edu/microbio/chime/pe/protexpl/frntdoor.htm

• RASMOL (& many descendants: Protein Explorer, PyMol, MolMol, etc.)
  http://www.umass.edu/microbio/rasmol/index2.htm

• CHIME
  http://www.umass.edu/microbio/chime/getchime.htm

• Cn3D
  http://www.biosino.org/mirror/www.ncbi.nlm.nih.gov/Structure/cn3d/

• Deep View = Swiss-PDB Viewer
  http://www.expasy.org/spdbv

Protein structure visualization

Superb interactive structure visualization software by Jane & Dave Richardson, Duke University

• KINEMAGE
  http://kinemage.biochem.duke.edu/

• Fantastic research tools for structure analysis & refinement

RCSB PDB - Beta site http://pdbbeta.rcsb.org/pdb/Welcome.do

MMDB http://www.ncbi.nlm.nih.gov/Structure/MMDB/mmdb.shtml

Cn3D http://www.ncbi.nlm.nih.gov/Structure/CN3D/cn3d.shtml

Cn3D: Displaying 2° (Secondary) Structures

Cn3D: Structural Alignments

SCOP - Structure Classification

http://scop.mrc-lmb.cam.ac.uk/scop

6 main classes of protein structure

1) α Domains
   • Bundles of helices connected by loops
2) β Domains
   • Mainly antiparallel sheets, usually with 2 sheets forming sandwich
3) α/β Domains
   • Mainly parallel sheets with intervening helices, also mixed sheets
4) α+β Domains
   • Mainly segregated helices and sheets
5) Multidomain (α and β)
   • Containing domains from more than one class
6) Membrane & cell-surface proteins

CATH - Structure Classification

http://cathwww.biochem.ucl.ac.uk/latest/

Structural Genomics

~ 30,000 "traditional" genes in human genome (not counting: ???)
~ 3,000 proteins in a typical cell
> 2 million sequences in UniProt
> 33,000 protein structures in the PDB

Experimental determination of protein structure lags far behind sequence determination!

Goal: Determine structures of "all" protein folds in nature, using a combination of experimental structure determination methods (X-ray crystallography, NMR, mass spectrometry) & structure prediction

Structural Genomics Projects

TargetDB: database of structural genomics targets
http://targetdb.pdb.org

Protein Structure Prediction?

Protein Folding

"Major unsolved problem in molecular biology"

In cells:
• spontaneous
• assisted by enzymes
• assisted by chaperones

In vitro: many proteins fold spontaneously & many do not!

Steps in Protein Folding

1- "Collapse" - driving force is burial of hydrophobic aa’s (fast - msecs)
2- Molten globule - helices & sheets form, but "loose" (slow - secs)
3- "Final" native folded state - compaction, some 2° structures rearranged

Native state? - assumed to be lowest free energy - may be an ensemble of structures

Protein Dynamics

• Protein in native state is NOT static
• Function of many proteins depends on conformational changes, sometimes large, sometimes small
• Globular proteins are inherently "unstable" (NOT evolved for maximum stability)
• Energy difference between native and denatured state is very small (5-15 kcal/mol)
  (this is equivalent to 1 or 2 H-bonds!)
• Folding involves changes in both entropy & enthalpy
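To get a feel for how marginal a 5-15 kcal/mol stability is, a two-state estimate can help. The sketch below is an illustration added here, not from the lecture: it assumes a simple two-state (native vs. denatured) equilibrium at about 25 °C, and the function name `fraction_denatured` is made up for this example.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)

def fraction_denatured(delta_g_kcal, temp_k=298.15):
    """Fraction of molecules denatured in a two-state model (an assumption).

    delta_g_kcal: free energy of unfolding (denatured minus native), kcal/mol.
    """
    k_unfold = math.exp(-delta_g_kcal / (R_KCAL * temp_k))  # ratio [D]/[N]
    return k_unfold / (1.0 + k_unfold)

# The range quoted on the slide: 5-15 kcal/mol
for dg in (5.0, 10.0, 15.0):
    print(f"dG = {dg:4.1f} kcal/mol -> fraction denatured = {fraction_denatured(dg):.2e}")
```

Because the dependence is exponential, losing only a few kcal/mol of stabilization (a mutation, a temperature shift) changes the denatured fraction by orders of magnitude.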

Protein Structure Prediction

• Structure is largely determined by sequence

BUT:
• Similar sequences can assume different structures
• Dissimilar sequences can assume similar structures
• Many proteins are multi-functional
• Protein folding:
  • determination of folding pathways
  • prediction of tertiary structure
  are still largely unsolved problems

New today:

Protein Structure Prediction

Secondary structure (text focuses on this - I won't)

Tertiary structure (let's do this instead!)

Deciphering the Protein Folding Code

• Protein Structure Prediction or "Protein Folding" Problem
  given the amino acid sequence of a protein, predict its 3-dimensional structure (fold)

• "Inverse Folding" Problem
  given a protein fold, identify every amino acid sequence that can adopt its 3-dimensional structure

Protein Structure Determination?

High-resolution structure determination
• X-ray crystallography (<1 Å)
• Nuclear magnetic resonance (NMR) (~1-2.5 Å)

Lower-resolution structure determination
• Cryo-EM (electron microscopy) ~10-15 Å

Theoretical Models?
• Highly variable - now, some equiv to X-ray!

Tertiary Structure Prediction

Fold or tertiary structure prediction problem can be formulated as a search for minimum energy conformation
• search space is defined by psi/phi angles of backbone and side-chain rotamers
• search space is enormous even for small proteins!
• number of local minima increases exponentially with the number of residues

Computationally it is an exceedingly difficult problem!
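A back-of-the-envelope count shows why exhaustive search is hopeless. This sketch counts discrete backbone conformations rather than local minima, and its numbers are assumptions chosen for illustration only: 3 allowed states per residue and 10^13 conformations sampled per second.

```python
def conformations(n_residues, states_per_residue=3):
    """Number of backbone conformations if each residue independently
    takes one of a few discrete phi/psi states (a crude assumption)."""
    return states_per_residue ** n_residues

def years_to_enumerate(n_residues, states_per_residue=3, rate_per_second=1e13):
    """Time to visit every conformation once at an assumed sampling rate."""
    seconds = conformations(n_residues, states_per_residue) / rate_per_second
    return seconds / (3600 * 24 * 365)

for n in (10, 50, 100):
    print(f"{n:3d} residues: {conformations(n):.2e} conformations, "
          f"{years_to_enumerate(n):.2e} years to enumerate")
```

Adding side-chain rotamers only multiplies these counts further, which is why practical methods search heuristically instead of enumerating.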

Ab Initio Prediction

1. Develop energy function
   • bond energy
   • bond angle energy
   • dihedral angle energy
   • van der Waals energy
   • electrostatic energy
2. Calculate structure by minimizing energy function (usually Molecular Dynamics or Monte Carlo methods)

Ab initio prediction - not practical in general
• Computationally? very expensive
• Accuracy? Usually poor for all but short peptides (but see Baker review!)

Provides both folding pathway & folded structure
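The two steps above can be sketched on a toy system. Everything here is a stand-in chosen for illustration and is not a real force field: a 2D bead chain with unit bonds, a quadratic bending term in place of bond-angle energy, a Lennard-Jones term in place of van der Waals energy, and a plain Metropolis Monte Carlo search. Bond, dihedral and electrostatic terms are omitted.

```python
import math
import random

def coords(angles):
    """Place beads in 2D with unit bond lengths; angles are turn angles at each joint."""
    x, y, heading = 0.0, 0.0, 0.0
    points = [(x, y)]
    for turn in [0.0] + list(angles):
        heading += turn
        x += math.cos(heading)
        y += math.sin(heading)
        points.append((x, y))
    return points

def energy(angles):
    """Toy energy: bending penalty per joint + Lennard-Jones between non-bonded beads."""
    e = sum(0.1 * a * a for a in angles)             # stand-in for bond angle energy
    pts = coords(angles)
    for i in range(len(pts)):
        for j in range(i + 2, len(pts)):             # skip directly bonded neighbours
            r = max(math.dist(pts[i], pts[j]), 0.3)  # clamp so overlaps stay finite
            e += 4.0 * ((1.0 / r) ** 12 - (1.0 / r) ** 6)  # stand-in for van der Waals
    return e

def monte_carlo(n_angles=8, steps=5000, kT=0.5, seed=0):
    """Metropolis search; returns (starting energy, best energy, best angles)."""
    rng = random.Random(seed)
    angles = [0.0] * n_angles                        # start fully extended
    e = energy(angles)
    start_e, best_e, best = e, e, list(angles)
    for _ in range(steps):
        i = rng.randrange(n_angles)
        old = angles[i]
        angles[i] = old + rng.uniform(-0.5, 0.5)     # perturb one joint
        new_e = energy(angles)
        if new_e <= e or rng.random() < math.exp(-(new_e - e) / kT):
            e = new_e                                # accept the move
            if e < best_e:
                best_e, best = e, list(angles)
        else:
            angles[i] = old                          # reject: restore previous angle
    return start_e, best_e, best

start_e, best_e, _ = monte_carlo()
print(f"extended chain: {start_e:.3f}   best found: {best_e:.3f}")
```

Even this 8-joint toy needs thousands of energy evaluations, and each evaluation scales with the square of the chain length, which hints at why the approach is so expensive for real proteins.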

Comparative Modeling

Provides folded structure only

Two primary methods
1) Homology modeling
2) Threading (fold recognition)

Note: both rely on availability of experimentally determined structures that are "homologous" or at least structurally very similar to target

Homology Modeling

1. Identify homologous protein sequences
   • PSI-BLAST
   • multiple sequence alignment (MSA)
2. Among those with available structures, choose closest sequence match for template
3. Build model by placing residues into corresponding positions of the homologous structure & refine by "tweaking"

Homology modeling - works "well"
• Computationally? not very expensive
• Accuracy? higher sequence identity → better model
  Requires >30% sequence identity
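Step 2 (template choice) and the >30% rule can be sketched as follows. This assumes the target has already been aligned to each candidate template (e.g. by the tools in step 1). Percent identity is computed over aligned, non-gap columns, which is only one of several conventions in use. The function names and template IDs are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over columns where neither sequence has a gap ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)

def choose_template(alignments, min_identity=30.0):
    """Pick the template with the highest identity to the target.

    alignments: {template_id: (aligned_target, aligned_template)}
    Returns (template_id, identity), or None if nothing exceeds min_identity.
    """
    scored = [(percent_identity(t, s), name) for name, (t, s) in alignments.items()]
    if not scored:
        return None
    identity, name = max(scored)
    return (name, identity) if identity > min_identity else None

# Made-up alignments, for illustration only
candidates = {
    "tmplA": ("ACDEFGHIK", "ACDEFGHLK"),   # 8 of 9 identical
    "tmplB": ("ACDEFGHIK", "TTTEF-HTT"),   # 3 of 8 aligned columns identical
}
print(choose_template(candidates))
```

Returning None when no template clears the threshold is the point where one would fall back to threading or ab initio methods.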

Threading - Fold Recognition

Identify “best” fit between target sequence & template structure

1. Develop energy function
2. Develop template library
3. Align target sequence with each template & score
4. Determine best score (1D to 3D alignment)
5. Build & refine structure as in homology modeling

Threading - works "sometimes"
1. Computationally? Can be expensive or cheap, depends on energy function & whether "all atom" or "backbone only" threading
2. Accuracy? in theory, should not depend on sequence identity (should depend on quality of template library & "luck")
3. But, usually higher sequence identity → better model

Threading - a "local" example

1. Align target sequence with template structures (fold library) from the Protein Data Bank (PDB)
2. Calculate energy (score) to evaluate goodness of fit between target sequence & template structure
3. Rank models based on energy scores

Target Sequence: ALKKGF…HFDTSE

Structure Templates
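The align / score / rank loop can be sketched in a few lines. This is a deliberately crude illustration and not the method used in the lecture's example: templates are reduced to a string of buried (B) or exposed (E) positions, alignment is gapless sliding only, and the energy simply rewards hydrophobic residues in buried positions and polar residues in exposed ones. The target sequence, template names and hydrophobic set are all made up.

```python
HYDROPHOBIC = set("AVLIMFWC")  # assumed hydrophobic set, for illustration

def fit_energy(residue, environment):
    """Toy environment energy: -1 for a good residue/environment match, +1 otherwise."""
    buried = environment == "B"
    return -1.0 if (residue in HYDROPHOBIC) == buried else 1.0

def thread(target, template_env):
    """Step 1+2: try every gapless placement; return (best energy, offset)."""
    best = (float("inf"), None)
    for offset in range(len(template_env) - len(target) + 1):
        e = sum(fit_energy(r, template_env[offset + k]) for k, r in enumerate(target))
        if e < best[0]:
            best = (e, offset)
    return best  # stays (inf, None) when the template is shorter than the target

def rank_templates(target, library):
    """Step 3: rank templates by their best threading energy (lowest first)."""
    return sorted((thread(target, env)[0], name) for name, env in library.items())

# Hypothetical target and fold library
library = {"fold_a": "BEBEBE", "fold_b": "EEEEEEEE", "fold_c": "EBEBEBE"}
for e, name in rank_templates("LKLKLK", library):
    print(f"{name}: {e:+.1f}")
```

Real threading programs allow gaps, which turns the placement loop into a dynamic-programming (or harder) alignment problem.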

Threading Goals & Issues

Goal: find “correct” sequence-structure alignment of a target sequence with its native-like fold in PDB

Structure database - must be complete: no decent model if no good template in library!

Sequence-structure alignment algorithm: bad alignment → bad score!

Energy function (scoring scheme):
• must distinguish correct sequence-fold alignment from incorrect sequence-fold alignments
• must distinguish “correct” fold from close decoys

Prediction reliability assessment - how to determine whether predicted structure is correct (or even close)?

Threading – Structure database

Build a template database (e.g., ASTRAL domain library derived from PDB)

Supplement with additional decoys, e.g., generated using ab initio approach such as Rosetta (Baker)

Threading - Energy function

Two main methods (and combinations of these)

• Structural profile (environmental)
  physico-chemical properties of aa’s
• Contact potential (statistical)
  based on contact statistics from PDB (Miyazawa & Jernigan - Jernigan now at ISU)

Protein Threading – typical energy function

MTYKLILNGKTKGETTTEAVDAATAEKVFQYANDNGVDGEWTYTE

How well does a specific residue fit structural environment?

What is "probability" that two specific residues are in contact?

Alignment gap penalty?

Total energy: E_p + E_s + E_g

Find a sequence-structure alignment that minimizes the energy function
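A three-term energy of this shape can be evaluated for one candidate alignment as sketched below. The mapping of symbols to terms is an assumption (E_s = residue/environment fit, E_p = pairwise contacts, E_g = gap penalty); the slide does not spell it out. The scoring values are toy stand-ins, not a real statistical potential such as Miyazawa-Jernigan.

```python
HYDROPHOBIC = set("AVLIMFWC")  # assumed hydrophobic set, for illustration

def singleton_energy(seq, env, alignment):
    """E_s (assumed): how well each aligned residue fits its environment (B/E)."""
    e = 0.0
    for s, t in alignment:
        match = (seq[s] in HYDROPHOBIC) == (env[t] == "B")
        e += -1.0 if match else 1.0
    return e

def pairwise_energy(seq, contacts, alignment):
    """E_p (assumed): reward hydrophobic pairs placed at contacting template positions."""
    residue_at = {t: seq[s] for s, t in alignment}
    e = 0.0
    for i, j in contacts:
        if i in residue_at and j in residue_at:
            if residue_at[i] in HYDROPHOBIC and residue_at[j] in HYDROPHOBIC:
                e -= 1.0
    return e

def gap_energy(alignment, gap_penalty=2.0):
    """E_g: penalty per skipped position in either the sequence or the template."""
    gaps = 0
    for (s1, t1), (s2, t2) in zip(alignment, alignment[1:]):
        gaps += (s2 - s1 - 1) + (t2 - t1 - 1)
    return gap_penalty * gaps

def total_energy(seq, env, contacts, alignment):
    return (pairwise_energy(seq, contacts, alignment)
            + singleton_energy(seq, env, alignment)
            + gap_energy(alignment))

# Hypothetical 4-residue target on a 5-position template; pairs are (seq index, template index)
seq, env, contacts = "LKLV", "BEBEB", {(0, 2), (2, 4)}
gapped = [(0, 0), (1, 1), (2, 2), (3, 4)]    # skips template position 3
ungapped = [(0, 0), (1, 1), (2, 2), (3, 3)]
print(total_energy(seq, env, contacts, gapped), total_energy(seq, env, contacts, ungapped))
```

Here the gapped alignment scores lower (better) despite its penalty, because skipping one template position puts V into a buried, contacting site. The threading search has to explore exactly this kind of trade-off over all alignments.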

A Rapid Threading Approach for Protein Structure Prediction

Kai-Ming Ho, Physics
Haibo Cao
Yungok Ihm
Zhong Gao
James Morris
Cai-zhuang Wang

Drena Dobbs, GDCB
Jae-Hyung Lee
Michael Terribilini
Jeff Sander

Performance Evaluation? "Blind Test"

CASP5 Competition (Critical Assessment of Protein Structure Prediction)

Given: Amino acid sequence
Goal: Predict 3-D structure (before experimental results published)

Typical Results: well, actually, BEST Results: HO = #1 ranked CASP prediction for this target

Target 174

PDB ID = 1MG7

Actual Structure

Predicted Structure

T174_1

T174_2

FR Fold Recognition (targets manually assessed by Nick Grishin)

-----------------------------------------------------------
Rank  Z-Score  Ngood  Npred  NgNW  NpNW  Group-name
   1    24.26   9.00  12.00     9    12  Ginalski
   2    21.64   7.00  12.00     7    12  Skolnick Kolinski
   3    19.55   8.00  12.50     9    14  Baker
   4    16.88   6.00  10.00     6    10  BIOINFO.PL
   5    15.25   7.00   7.00     7     7  Shortle
   6    14.56   6.50  11.50     7    13  BAKER-ROBETTA
   7    13.49   4.00  11.00     4    11  Brooks
   8    11.34   3.00   6.00     3     6  Ho-Kai-Ming
   9    10.45   3.00   5.50     3     6  Jones-NewFold
-----------------------------------------------------------

FR NgNW - number of good predictions without weighting for multiple models
FR NpNW - number of total predictions without weighting for multiple models

Overall Performance in CASP5 Contest (M. Levitt, Stanford)