11/11/05 D Dobbs ISU - BCB 444/544X: Protein Structure Prediction 1
11/11/05
Protein Structure Prediction & Modeling
11/11/05 D Dobbs ISU - BCB 444/544X: Protein Structure Prediction 2
Bioinformatics Seminars
Nov 11 Fri 12:10 BCB Seminar in E164 Lago
Building Supertrees Using Distances
Steve Willson, Dept of Mathematics
http://www.bcb.iastate.edu/courses/BCB691-F2005.html
Next week - Baker Center/BCB Seminars: (seminar abstracts available at above link)
Nov 14 Mon 1:10 PM Doug Brutlag, Stanford
Discovering transcription factor binding sites
Nov 15 Tues 1:10 PM Ilya Vakser, Univ Kansas
Modeling protein-protein interactions
Both seminars will be in Howe Hall Auditorium
11/11/05 D Dobbs ISU - BCB 444/544X: Protein Structure Prediction 3
Protein Structure & Function: Analysis & Prediction
Mon: Protein structure: basics; classification, databases, visualization
Wed: Protein structure databases - cont.
Thurs Lab: Protein structure databases; Visualization software; Secondary structure prediction
Fri: Protein structure prediction; Protein-nucleic acid interactions
11/11/05 D Dobbs ISU - BCB 444/544X: Protein Structure Prediction 4
Reading Assignment (for Mon-Fri)
Mount Bioinformatics
• Chp 10 Protein classification & structure prediction
http://www.bioinformaticsonline.org/ch/ch10/index.html
• pp. 409-491
• Check Errata: http://www.bioinformaticsonline.org/help/errata2.html
11/11/05 D Dobbs ISU - BCB 444/544X: Protein Structure Prediction 5
BCB 544 Additional Reading
Required:
• Gene Prediction:
  • Burge & Karlin 1997 JMB 268:78
    Prediction of complete gene structures in human genomic DNA
Optional:
• Structure Prediction:
  • Schueler-Furman…Baker 2005 Science 310:638
    Progress in modeling of protein structures and interactions
11/11/05 D Dobbs ISU - BCB 444/544X: Protein Structure Prediction 6
Review last lecture:
Protein Structure: Databases, Classification & Visualization
11/11/05 D Dobbs ISU - BCB 444/544X: Protein Structure Prediction 7
Protein sequence databases
• UniProt (SwissProt, PIR, EBI) http://www.pir.uniprot.org
• NCBI Protein http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Protein
More on these later: protein function prediction
11/11/05 D Dobbs ISU - BCB 444/544X: Protein Structure Prediction 8
Protein sequence & structure: analysis
• Diamond STING Millennium - many useful structure analysis tools, including Protein Dossier http://trantor.bioc.columbia.edu/SMS/
• SwissProt (UniProt) protein knowledgebase http://us.expasy.org/sprot
• InterPro sequence analysis tools http://www.ebi.ac.uk/interpro
11/11/05 D Dobbs ISU - BCB 444/544X: Protein Structure Prediction 9
Protein structure databases
• PDB Protein Data Bank http://www.rcsb.org/pdb/ (RCSB) - THE protein structure database
• MMDB Molecular Modeling Database http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Structure (NCBI Entrez) - has "added" value
• MSD Macromolecular Structure Database http://www.ebi.ac.uk/msd - especially good for interactions, binding sites
11/11/05 D Dobbs ISU - BCB 444/544X: Protein Structure Prediction 10
Protein structure classification
• SCOP = Structural Classification of Proteins
  Levels reflect both evolutionary and structural relationships
  http://scop.mrc-lmb.cam.ac.uk/scop
• CATH = Classification by Class, Architecture, Topology & Homology
  http://cathwww.biochem.ucl.ac.uk/latest/
• DALI/FSSP (recently moved to EBI & reorganized)
  • fully automated structure alignments
  • DALI server http://www.ebi.ac.uk/dali/index.html
  • DALI Database (fold classification) http://ekhidna.biocenter.helsinki.fi/dali/start
11/11/05 D Dobbs ISU - BCB 444/544X: Protein Structure Prediction 11
Protein structure visualization
• Molecular Visualization Freeware: http://www.umass.edu/microbio/rasmol
• MolviZ.Org http://www.umass.edu/microbio/chime
• Protein Explorer http://www.umass.edu/microbio/chime/pe/protexpl/frntdoor.htm
• RASMOL (& many descendants: Protein Explorer, PyMol, MolMol, etc.) http://www.umass.edu/microbio/rasmol/index2.htm
• CHIME http://www.umass.edu/microbio/chime/getchime.htm
• Cn3D http://www.biosino.org/mirror/www.ncbi.nlm.nih.gov/Structure/cn3d/
• Deep View = Swiss-PDB Viewer http://www.expasy.org/spdbv
11/11/05 D Dobbs ISU - BCB 444/544X: Protein Structure Prediction 12
Protein structure visualization
Superb interactive structure visualization software by Jane & Dave Richardson, Duke University
• KINEMAGE http://kinemage.biochem.duke.edu/
• Fantastic research tools for structure analysis & refinement
11/11/05 D Dobbs ISU - BCB 444/544X: Protein Structure Prediction 13
RCSB PDB - Beta site http://pdbbeta.rcsb.org/pdb/Welcome.do
11/11/05 D Dobbs ISU - BCB 444/544X: Protein Structure Prediction 14
MMDB http://www.ncbi.nlm.nih.gov/Structure/MMDB/mmdb.shtml
11/11/05 D Dobbs ISU - BCB 444/544X: Protein Structure Prediction 15
Cn3D http://www.ncbi.nlm.nih.gov/Structure/CN3D/cn3d.shtml
11/11/05 D Dobbs ISU - BCB 444/544X: Protein Structure Prediction 16
Cn3D: Displaying 2° Structures
11/11/05 D Dobbs ISU - BCB 444/544X: Protein Structure Prediction 17
Cn3D: Structural Alignments
11/11/05 D Dobbs ISU - BCB 444/544X: Protein Structure Prediction 18
SCOP - Structure Classification
http://scop.mrc-lmb.cam.ac.uk/scop
11/11/05 D Dobbs ISU - BCB 444/544X: Protein Structure Prediction 19
6 main classes of protein structure
1) α Domains
   • Bundles of helices connected by loops
2) β Domains
   • Mainly antiparallel sheets, usually with 2 sheets forming sandwich
3) α/β Domains
   • Mainly parallel sheets with intervening helices, also mixed sheets
4) α+β Domains
   • Mainly segregated helices and sheets
5) Multidomain (α and β)
   • Containing domains from more than one class
6) Membrane & cell-surface proteins
11/11/05 D Dobbs ISU - BCB 444/544X: Protein Structure Prediction 20
CATH - Structure Classification
http://cathwww.biochem.ucl.ac.uk/latest/
11/11/05 D Dobbs ISU - BCB 444/544X: Protein Structure Prediction 21
Structural Genomics
~ 30,000 "traditional" genes in human genome (not counting: ???)
~ 3,000 proteins in a typical cell
> 2 million sequences in UniProt
> 33,000 protein structures in the PDB
Experimental determination of protein structure lags far behind sequence determination!
Goal: Determine structures of "all" protein folds in nature, using combination of experimental structure determination methods (X-ray crystallography, NMR, mass spectrometry) & structure prediction
11/11/05 D Dobbs ISU - BCB 444/544X: Protein Structure Prediction 22
Structural Genomics Projects
TargetDB: database of structural genomics targets http://targetdb.pdb.org
Protein Structure Prediction?
11/11/05 D Dobbs ISU - BCB 444/544X: Protein Structure Prediction 23
Protein Folding
"Major unsolved problem in molecular biology"
In cells:
• spontaneous
• assisted by enzymes
• assisted by chaperones
In vitro: many proteins fold spontaneously & many do not!
11/11/05 D Dobbs ISU - BCB 444/544X: Protein Structure Prediction 24
Steps in Protein Folding
1- "Collapse" - driving force is burial of hydrophobic aa’s (fast - msecs)
2- Molten globule - helices & sheets form, but "loose" (slow - secs)
3- "Final" native folded state - compaction, some 2° structures rearranged
Native state? - assumed to be lowest free energy - may be an ensemble of structures
11/11/05 D Dobbs ISU - BCB 444/544X: Protein Structure Prediction 25
Protein Dynamics
• Protein in native state is NOT static
• Function of many proteins depends on conformational changes, sometimes large, sometimes small
• Globular proteins are inherently "unstable" (NOT evolved for maximum stability)
• Energy difference between native and denatured state is very small (5-15 kcal/mol) (this is equivalent to 1 or 2 H-bonds!)
• Folding involves changes in both entropy & enthalpy
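A back-of-envelope sketch of what a 5-15 kcal/mol stability gap means in practice. It assumes a simple two-state model (native ⇌ denatured) at 25 °C, where the equilibrium constant follows from ΔG = -RT ln K. The two-state assumption and the function name `fraction_folded` are illustrative choices, not part of the lecture.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)

def fraction_folded(delta_g_unfold, temp_k=298.15):
    """Fraction of molecules in the native state for a two-state model.

    delta_g_unfold: free energy of unfolding in kcal/mol (positive = native favored).
    """
    # Ratio of denatured to native populations, [D]/[N] = exp(-dG / RT)
    k_unfold = math.exp(-delta_g_unfold / (R_KCAL * temp_k))
    return 1.0 / (1.0 + k_unfold)

for dg in (5.0, 10.0, 15.0):
    print(f"dG = {dg:4.1f} kcal/mol -> fraction folded = {fraction_folded(dg):.6f}")
```

Even at the low end of the range, nearly all molecules are folded under this model, yet losing only a few kcal/mol of stabilizing interactions shifts the balance noticeably.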
11/11/05 D Dobbs ISU - BCB 444/544X: Protein Structure Prediction 26
Protein Structure Prediction
• Structure is largely determined by sequence
BUT:
• Similar sequences can assume different structures
• Dissimilar sequences can assume similar structures
• Many proteins are multi-functional
• Protein folding:
  • determination of folding pathways
  • prediction of tertiary structure
  still largely unsolved problems
11/11/05 D Dobbs ISU - BCB 444/544X: Protein Structure Prediction 27
New today:
Protein Structure Prediction
Secondary structure (text focuses on this - I won't)
Tertiary structure (let's do this instead!)
11/11/05 D Dobbs ISU - BCB 444/544X: Protein Structure Prediction 28
Deciphering the Protein Folding Code
• Protein Structure Prediction or "Protein Folding" Problem
given the amino acid sequence of a protein, predict its 3-dimensional structure (fold)
• "Inverse Folding" Problem
given a protein fold, identify every amino acid sequence that can adopt its 3-dimensional structure
11/11/05 D Dobbs ISU - BCB 444/544X: Protein Structure Prediction 29
Protein Structure Determination?
High-resolution structure determination
• X-ray crystallography (<1 Å)
• Nuclear magnetic resonance (NMR) (~1-2.5 Å)
Lower-resolution structure determination
• Cryo-EM (electron microscopy) ~10-15 Å
Theoretical Models?
• Highly variable - now, some equiv to X-ray!
11/11/05 D Dobbs ISU - BCB 444/544X: Protein Structure Prediction 30
Tertiary Structure Prediction
Fold or tertiary structure prediction problem can be formulated as a search for minimum energy conformation
• search space is defined by psi/phi angles of backbone and side-chain rotamers
• search space is enormous even for small proteins!
• number of local minima increases exponentially with the number of residues
Computationally it is an exceedingly difficult problem!
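A rough count shows how quickly the search space grows. This sketch assumes each residue's backbone (phi, psi) pair is limited to a small number of discrete states (3 here, an illustrative choice) and ignores side-chain rotamers entirely, so it understates the real problem.

```python
def backbone_conformations(n_residues, states_per_residue=3):
    """Number of backbone conformations if each residue independently
    adopts one of `states_per_residue` discrete (phi, psi) states."""
    return states_per_residue ** n_residues

for n in (10, 50, 100):
    count = backbone_conformations(n)
    # len(str(count)) - 1 is the order of magnitude of the integer
    print(f"{n:3d} residues: ~10^{len(str(count)) - 1} conformations")
```

Exhaustive enumeration is out of the question even for a 100-residue protein, which is why practical methods sample the space or borrow information from known structures.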
11/11/05 D Dobbs ISU - BCB 444/544X: Protein Structure Prediction 31
Ab Initio Prediction
1. Develop energy function
   • bond energy
   • bond angle energy
   • dihedral angle energy
   • van der Waals energy
   • electrostatic energy
2. Calculate structure by minimizing energy function (usually Molecular Dynamics or Monte Carlo methods)
Ab initio prediction - not practical in general
• Computationally? very expensive
• Accuracy? Usually poor for all but short peptides (but see Baker review!)
Provides both folding pathway & folded structure
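The two steps above (define an energy function, then minimize it) can be sketched with a Metropolis Monte Carlo search. Everything here is a toy: `toy_energy` is a hypothetical dihedral-only function (a 3-fold torsion term plus a weak bias toward -60°), not a real force field, and the step size, temperature `kt`, and step count are arbitrary illustrative values.

```python
import math
import random

def toy_energy(angles):
    """Hypothetical dihedral-only energy (arbitrary units); minimum is 0
    when every angle equals -60 degrees."""
    target = math.radians(-60.0)
    return sum((1.0 + math.cos(3.0 * a)) + 0.5 * (1.0 - math.cos(a - target))
               for a in angles)

def metropolis_minimize(n_angles=10, n_steps=5000, kt=0.3, seed=0):
    """Random-start Metropolis search; returns (start_energy, best_energy, best_angles)."""
    rng = random.Random(seed)
    angles = [rng.uniform(-math.pi, math.pi) for _ in range(n_angles)]
    energy = toy_energy(angles)
    start_energy = energy
    best_energy, best_angles = energy, list(angles)
    for _ in range(n_steps):
        i = rng.randrange(n_angles)
        old = angles[i]
        angles[i] = old + rng.gauss(0.0, 0.5)  # perturb one dihedral angle
        new_energy = toy_energy(angles)
        delta = new_energy - energy
        # Metropolis criterion: always accept downhill moves,
        # accept uphill moves with probability exp(-delta / kt)
        if delta <= 0 or rng.random() < math.exp(-delta / kt):
            energy = new_energy
            if energy < best_energy:
                best_energy, best_angles = energy, list(angles)
        else:
            angles[i] = old  # reject the move
    return start_energy, best_energy, best_angles

start, best, _ = metropolis_minimize()
print(f"start energy = {start:.3f}, best energy found = {best:.3f}")
```

Accepting some uphill moves is what lets the search escape local minima; a pure downhill minimizer would stall in the nearest torsion well.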
11/11/05 D Dobbs ISU - BCB 444/544X: Protein Structure Prediction 32
Comparative Modeling
Provides folded structure only
Two primary methods
1) Homology modeling
2) Threading (fold recognition)
Note: both rely on availability of experimentally determined structures that are "homologous" or at least structurally very similar to target
11/11/05 D Dobbs ISU - BCB 444/544X: Protein Structure Prediction 33
Homology Modeling
1. Identify homologous protein sequences
   • PSI-BLAST
   • multiple sequence alignment (MSA)
2. Among those with available structures, choose closest sequence match for template
3. Build model by placing residues into corresponding positions of homologous structure models & refine by "tweaking"
Homology modeling - works "well"
• Computationally? not very expensive
• Accuracy? higher sequence identity → better model
Requires >30% sequence identity
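Step 2 (choose the closest sequence match) and the >30% identity rule of thumb can be sketched as follows. The sketch assumes pairwise alignments are already in hand as equal-length gapped strings; the template names and sequences are made up, and identity is computed over columns where neither sequence has a gap, which is only one of several conventions in use.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)

def choose_template(alignments, min_identity=30.0):
    """alignments: {template_name: (aligned_target, aligned_template)}.
    Returns (name, identity) of the closest match above the cutoff, else None."""
    scored = [(percent_identity(t, s), name) for name, (t, s) in alignments.items()]
    scored = [item for item in scored if item[0] > min_identity]
    if not scored:
        return None
    identity, name = max(scored)
    return name, identity

# Hypothetical alignments of one target against three templates
alignments = {
    "tmplA": ("MKTAYIAKQR", "MKSAYLAKQR"),
    "tmplB": ("MKTAYIAKQR", "MRT-WIGKNR"),
    "tmplC": ("MKTAYIAKQR", "GSSPLVDEQW"),
}
print(choose_template(alignments))
```

If no template clears the cutoff the function returns `None`, which is the point at which one would fall back to threading.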
11/11/05 D Dobbs ISU - BCB 444/544X: Protein Structure Prediction 34
Threading - Fold Recognition
Identify “best” fit between target sequence & template structure
1. Develop energy function
2. Develop template library
3. Align target sequence with each template & score
4. Determine best score (1D to 3D alignment)
5. Build & refine structure as in homology modeling
Threading - works "sometimes"
1. Computationally? Can be expensive or cheap, depends on energy function & whether "all atom" or "backbone only" threading
2. Accuracy? in theory, should not depend on sequence identity (should depend on quality of template library & "luck")
3. But, usually higher sequence identity → better model
11/11/05 D Dobbs ISU - BCB 444/544X: Protein Structure Prediction 35
Threading - a "local" example
1. Align target sequence with template structures (fold library) from the Protein Data Bank (PDB)
2. Calculate energy (score) to evaluate goodness of fit between target sequence & template structure
3. Rank models based on energy scores
[Figure: Target Sequence (ALKKGF…HFDTSE) threaded onto Structure Templates]
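The align / score / rank loop can be sketched in a few lines. This is a deliberately crude stand-in, not the method used in the lecture's example: each template is reduced to a string of per-position environments ('B' buried, 'E' exposed), the target is slid along it without gaps, and the score simply rewards hydrophobic residues in buried positions and polar residues in exposed ones. The fold names, environment strings, and target sequence are all made up.

```python
HYDROPHOBIC = set("AVLIMFWC")

def position_energy(residue, environment):
    """-1 for a favorable match (hydrophobic/buried or polar/exposed), +1 otherwise."""
    is_hydrophobic = residue in HYDROPHOBIC
    is_buried = environment == "B"
    return -1 if is_hydrophobic == is_buried else 1

def thread(target, template_env):
    """Best gapless placement of target on one template.
    Returns (energy, offset), or None if the template is shorter than the target."""
    best = None
    for offset in range(len(template_env) - len(target) + 1):
        energy = sum(position_energy(res, template_env[offset + i])
                     for i, res in enumerate(target))
        if best is None or energy < best[0]:
            best = (energy, offset)
    return best

def rank_templates(target, library):
    """Score the target against every template; lowest energy first."""
    scored = []
    for name, env in library.items():
        result = thread(target, env)
        if result is not None:
            scored.append((result[0], name, result[1]))
    return sorted(scored)

library = {"fold1": "EBEBEBEE", "fold2": "BBBEEEBB", "fold3": "EEEE"}
for energy, name, offset in rank_templates("LKVEAD", library):
    print(f"{name}: energy = {energy}, offset = {offset}")
```

Real threading programs allow gaps and use much richer energy terms, but the overall shape (score every template, then rank) is the same.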
11/11/05 D Dobbs ISU - BCB 444/544X: Protein Structure Prediction 36
Threading Goals & Issues
Structure database - must be complete: no decent model if no good template in library!
Sequence-structure alignment algorithm: Bad alignment → bad score!
Energy function (scoring scheme):
• must distinguish correct sequence-fold alignment from incorrect sequence-fold alignments
• must distinguish “correct” fold from close decoys
Prediction reliability assessment - how to determine whether predicted structure is correct (or even close?)
Find “correct” sequence-structure alignment of a target sequence with its native-like fold in PDB
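One common heuristic for the reliability question is to ask how far the best template's score stands out from the rest of the library, expressed as a Z-score. This is offered as an assumption about how one might approach it, not as the assessment method used in the lecture; the energies below are made-up numbers.

```python
import statistics

def z_scores(energies):
    """Z-score of each template's energy relative to the whole library.
    Sign is flipped so that lower (better) energy gives a higher Z-score."""
    mean = statistics.mean(energies)
    sd = statistics.pstdev(energies)
    if sd == 0:
        return [0.0] * len(energies)
    return [(mean - e) / sd for e in energies]

# Hypothetical threading energies for five templates
energies = [-6.0, -2.0, -1.0, 0.0, -1.0]
for e, z in zip(energies, z_scores(energies)):
    print(f"energy = {e:5.1f}  Z = {z:5.2f}")
```

A top hit with a Z-score well separated from the others is more believable than one sitting in a crowd of near-equal scores, though a high Z-score alone does not prove the fold is correct.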
11/11/05 D Dobbs ISU - BCB 444/544X: Protein Structure Prediction 37
Threading – Structure database
Build a template database (e.g., ASTRAL domain library derived from PDB)
Supplement with additional decoys, e.g., generated using ab initio approach such as Rosetta (Baker)
11/11/05 D Dobbs ISU - BCB 444/544X: Protein Structure Prediction 38
Threading - Energy function
Two main methods (and combinations of these)
• Structural profile (environmental)
  physico-chemical properties of aa’s
• Contact potential (statistical)
  based on contact statistics from PDB (Miyazawa & Jernigan - Jernigan now at ISU)
11/11/05 D Dobbs ISU - BCB 444/544X: Protein Structure Prediction 39
Protein Threading – typical energy function
MTYKLILNGKTKGETTTEAVDAATAEKVFQYANDNGVDGEWTYTE
How well does a specific residue fit structural environment?
What is "probability" that two specific residues are in contact?
Alignment gap penalty?
Total energy: E_p + E_s + E_g
Find a sequence-structure alignment that minimizes the energy function
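A toy version of the three-term energy, for illustration only. The slide does not define the subscripts, so this sketch assumes E_s is the singleton (residue-environment fit) term, E_p the pairwise contact term, and E_g the gap penalty, matching the three questions above. The scoring values, the linear gap penalty, and the tiny 5-position template are all made up.

```python
HYDROPHOBIC = set("AVLIMFWC")

def singleton_energy(residue, environment):
    """E_s contribution: fit of one residue to its environment ('B' buried, 'E' exposed)."""
    return -1.0 if (residue in HYDROPHOBIC) == (environment == "B") else 1.0

def pair_energy(res_i, res_j):
    """E_p contribution: toy contact potential favoring hydrophobic-hydrophobic contacts."""
    return -1.0 if res_i in HYDROPHOBIC and res_j in HYDROPHOBIC else 0.0

def total_energy(alignment, environments, contacts, gap_penalty=2.0):
    """Total energy E_p + E_s + E_g for one sequence-structure alignment.

    alignment[k]   : target residue placed at template position k, or '-' for a gap
    environments[k]: 'B' or 'E' for template position k
    contacts       : (i, j) pairs of template positions that are in contact
    """
    e_s = sum(singleton_energy(res, env)
              for res, env in zip(alignment, environments) if res != "-")
    e_p = sum(pair_energy(alignment[i], alignment[j])
              for i, j in contacts
              if alignment[i] != "-" and alignment[j] != "-")
    e_g = gap_penalty * alignment.count("-")
    return e_p + e_s + e_g

environments = "BEBEB"
contacts = [(0, 2), (2, 4)]
for alignment in ("LKVDA", "LK-DA"):
    print(alignment, total_energy(alignment, environments, contacts))
```

The gapped alignment loses both contacts through position 2 and pays the gap penalty, so it scores worse; a threading algorithm searches over such alignments for the one with the lowest total.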
11/11/05 D Dobbs ISU - BCB 444/544X: Protein Structure Prediction 40
A Rapid Threading Approach for Protein Structure Prediction
Kai-Ming Ho, Physics
Haibo Cao
Yungok Ihm
Zhong Gao
James Morris
Cai-zhuang Wang
Drena Dobbs, GDCB
Jae-Hyung Lee
Michael Terribilini
Jeff Sander
11/11/05 D Dobbs ISU - BCB 444/544X: Protein Structure Prediction 41
Performance Evaluation? "Blind Test"
CASP5 Competition (Critical Assessment of Protein Structure Prediction)
Given: Amino acid sequence
Goal: Predict 3-D structure (before experimental results published)
Typical Results: well, actually, BEST Results: HO = #1 ranked CASP prediction for this target
[Figure: Target 174 (PDB ID = 1MG7), Actual Structure vs. Predicted Structure; labels T174_1, T174_2]
Overall Performance in CASP5 Contest (M. Levitt, Stanford)
FR Fold Recognition (targets manually assessed by Nick Grishin)
-----------------------------------------------------------
Rank  Z-Score  Ngood  Npred  NgNW  NpNW  Group-name
   1    24.26   9.00  12.00     9    12  Ginalski
   2    21.64   7.00  12.00     7    12  Skolnick Kolinski
   3    19.55   8.00  12.50     9    14  Baker
   4    16.88   6.00  10.00     6    10  BIOINFO.PL
   5    15.25   7.00   7.00     7     7  Shortle
   6    14.56   6.50  11.50     7    13  BAKER-ROBETTA
   7    13.49   4.00  11.00     4    11  Brooks
   8    11.34   3.00   6.00     3     6  Ho-Kai-Ming
   9    10.45   3.00   5.50     3     6  Jones-NewFold
-----------------------------------------------------------
FR NgNW - number of good predictions without weighting for multiple models
FR NpNW - number of total predictions without weighting for multiple models