proteins secondary structure predictions
Proteins Secondary Structure Predictions
Structural Bioinformatics
2
Structure Prediction Motivation
• Better understand protein function
• Broaden homology
  – Detect similar function where sequence differs (only ~50% of remote homologies can be detected based on sequence)
• Explain disease
  – Explain the effect of mutations
  – Design drugs
3
Myoglobin – the first high resolution protein structure
“Perhaps the most remarkable features of the molecule are its complexity and its lack of symmetry. The arrangement seems to be almost totally lacking in the kind of regularities which one instinctively anticipates.”
Solved in 1958 by Max Perutz and John Kendrew of Cambridge University, who won the 1962 Nobel Prize in Chemistry.
4
Predicting the three-dimensional structure of a protein from its sequence is very hard (sometimes impossible).
However, we can predict the secondary structure with relatively high precision.
MERFGYTRAANCEAP….
What do we mean by Secondary Structure?
Secondary structure elements are the building blocks of the protein structure.
6
What do we mean by Secondary Structure?
Secondary structure is usually divided into three categories:
• Alpha helix
• Beta strand (sheet)
• Anything else: turn/loop
7
Alpha Helix: Pauling (1951)
• A consecutive stretch of 5-40 amino acids (average 10).
• A right-handed spiral conformation.
• 3.6 amino acids per turn, with a pitch of 5.4 Å.
• Stabilized by hydrogen bonds between the backbone C=O of residue i and the N-H of residue i+4.
8
Beta Strand: Pauling and Corey (1951)
• Different polypeptide segments (from the same chain or from different chains) run alongside each other and are linked together by hydrogen bonds.
• Each segment is called a β-strand and consists of 5-10 amino acids.
9
Beta Sheet
The strands lie adjacent to each other, forming a beta-sheet.
[Figure: antiparallel and parallel beta sheets, with labeled distances of 3.47 Å and 4.6 Å, and 3.25 Å and 4.6 Å]
10
Loops
• Connect the secondary structure elements.
• Vary in length and shape.
• Are located at the surface of the folded protein and may therefore play an important role in biological recognition processes.
11
Three-dimensional Tertiary Structure
Describes the packing of alpha-helices, beta-sheets and random coils with respect to each other on the level of one whole polypeptide chain.
12
[Figure: secondary and tertiary structure views of RBP and globin]
13
How do the (secondary and tertiary) structures relate to the primary protein sequence?
14
- Early experiments have shown that the sequence of the protein is sufficient to determine its structure (Anfinsen).
- Protein structure is more conserved than protein sequence and more closely related to function.
[Figure labels: STRUCTURE, SEQUENCE]
15
How (can) different amino acid sequences determine similar protein structures?
Lesk and Chothia, 1980
16
The Globin Family
17
Different sequences can result in similar structures
1ecd 2hhd
18
We can learn about the important features which determine structure and function by comparing the sequences and structures.
19
The Globin Family
20
Why is Proline 36 conserved throughout the globin family?
21
Where are the gaps?
The gaps in the pairwise alignment are mapped to the loop regions
22
How are remote homologs related in terms of their structure?
retinol-binding protein
odorant-binding protein
apolipoprotein D
β-lactoglobulin
RBD
23
PSI-BLAST alignment of RBP and β-lactoglobulin: iteration 3
Score = 159 bits (404), Expect = 1e-38
Identities = 41/170 (24%), Positives = 69/170 (40%), Gaps = 19/170 (11%)
Query: 3   WVWALLLLAAWAAAERD--------CRVSSFRVKENFDKARFSGTWYAMAKKDPEGLFLQ 54
Match:     V L+ LA A + S V+ENFD ++ G WY + K
Sbjct: 1   MVTMLMFLATLAGLFTTAKGQNFHLGKCPSPPVQENFDVKKYLGRWYEIEKIPASFE-KG 59

Query: 55  DNIVAEFSVDETGQMSATAKGRVRLLNNWDVCADMVGTFTDTEDPAKFKMKYWGVASFLQ 114
Match:     + I A +S+ E G + K V + ++ +PAK +++++ +
Sbjct: 60  NCIQANYSLMENGNIEVLNKELSPDGTMNQVKGE--AKQSNVSEPAKLEVQFFPL----- 112

Query: 115 KGNDDHWIVDTDYDTYAVQYSCRLLNLDGTCADSYSFVFSRDPNGLPPEA 164
Match:     +WI+ TDY+ YA+ YSC + ++ R+P LPPE
Sbjct: 113 MPPAPYWILATDYENYALVYSCTTFFWL--FHVDFFWILGRNPY-LPPET 159
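The identity and gap counts in the summary line can be recomputed from the aligned rows. The following is a minimal sketch: the three Query and Sbjct segments above are joined into two strings, and the helper name `alignment_stats` is hypothetical. Integer division is used for the percentages because the figures shown appear to be truncated rather than rounded (69/170 is 40.6% but is reported as 40%); that is an inference from this output only. The Positives count is not recomputed, since it depends on the scoring profile PSI-BLAST used.

```python
# Aligned rows copied from the PSI-BLAST output above (three blocks joined).
query = (
    "WVWALLLLAAWAAAERD--------CRVSSFRVKENFDKARFSGTWYAMAKKDPEGLFLQ"
    "DNIVAEFSVDETGQMSATAKGRVRLLNNWDVCADMVGTFTDTEDPAKFKMKYWGVASFLQ"
    "KGNDDHWIVDTDYDTYAVQYSCRLLNLDGTCADSYSFVFSRDPNGLPPEA"
)
sbjct = (
    "MVTMLMFLATLAGLFTTAKGQNFHLGKCPSPPVQENFDVKKYLGRWYEIEKIPASFE-KG"
    "NCIQANYSLMENGNIEVLNKELSPDGTMNQVKGE--AKQSNVSEPAKLEVQFFPL-----"
    "MPPAPYWILATDYENYALVYSCTTFFWL--FHVDFFWILGRNPY-LPPET"
)


def alignment_stats(a, b):
    """Count columns, identical residue pairs, and gapped columns in two aligned rows."""
    if len(a) != len(b):
        raise ValueError("aligned rows must have equal length")
    columns = len(a)
    identities = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    gaps = sum(1 for x, y in zip(a, b) if x == "-" or y == "-")
    return columns, identities, gaps


columns, identities, gaps = alignment_stats(query, sbjct)
print(f"Identities = {identities}/{columns} ({100 * identities // columns}%)")
print(f"Gaps = {gaps}/{columns} ({100 * gaps // columns}%)")
```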
24
The Retinol Binding Protein and β-lactoglobulin
Structure Prediction: Motivation
• Hundreds of thousands of gene sequences translated to proteins (GenBank, SwissProt, PIR)
• Only about 50,000 solved protein structures
• Experimental methods are time consuming and not always possible
• Goal: Predict protein structure based on sequence information
26
Prediction Approaches
• Two-stage
  1. Primary (sequence) to secondary structure
  2. Secondary to tertiary structure
• One-stage
  – Primary to tertiary structure
27
According to the most simplified model:
• In a first step, the secondary structure is predicted based on the sequence.
• The secondary structure elements are then arranged to produce the tertiary structure, i.e. the structure of a protein chain.
• For molecules which are composed of different subunits, the protein chains are arranged to form the quaternary structure.
Secondary Structure Prediction
• Given a primary sequence
ADSGHYRFASGFTYKKMNCTEAA
what secondary structure will it adopt?
28
29
Secondary Structure Prediction Methods
• Chou-Fasman / GOR Method
  – Based on amino acid frequencies
• Machine learning methods
  – PHDsec and PSIpred
• HMM (Hidden Markov Model)
30
Chou and Fasman (1974)
Name            P(a)   P(b)   P(turn)
Alanine         142     83     66
Arginine         98     93     95
Aspartic Acid   101     54    146
Asparagine       67     89    156
Cysteine         70    119    119
Glutamic Acid   151     37     74
Glutamine       111    110     98
Glycine          57     75    156
Histidine       100     87     95
Isoleucine      108    160     47
Leucine         121    130     59
Lysine          114     74    101
Methionine      145    105     60
Phenylalanine   113    138     60
Proline          57     55    152
Serine           77     75    143
Threonine        83    119     96
Tryptophan      108    137     96
Tyrosine         69    147    114
Valine          106    170     50
The table gives the propensity of each amino acid to be part of a certain secondary structure (e.g. Proline has a low propensity for being in an alpha helix or beta sheet, so it acts as a breaker).
Success rate of 50%
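As a minimal sketch of how the propensity table can be read, the snippet below stores the values from the slide and labels each residue with the state that has the highest propensity. This is a deliberate simplification for illustration, not the full Chou-Fasman procedure; the function name `naive_states` and the one-letter state codes (H for helix, B for beta strand, T for turn) are assumptions.

```python
# Chou-Fasman propensities from the table above: (P(a), P(b), P(turn)).
PROPENSITY = {
    "A": (142, 83, 66),  "R": (98, 93, 95),   "D": (101, 54, 146),
    "N": (67, 89, 156),  "C": (70, 119, 119), "E": (151, 37, 74),
    "Q": (111, 110, 98), "G": (57, 75, 156),  "H": (100, 87, 95),
    "I": (108, 160, 47), "L": (121, 130, 59), "K": (114, 74, 101),
    "M": (145, 105, 60), "F": (113, 138, 60), "P": (57, 55, 152),
    "S": (77, 75, 143),  "T": (83, 119, 96),  "W": (108, 137, 96),
    "Y": (69, 147, 114), "V": (106, 170, 50),
}
STATES = "HBT"  # helix, beta strand, turn (codes chosen for this sketch)


def naive_states(seq):
    """Label each residue with its highest-propensity state (ties go to the first)."""
    labels = []
    for aa in seq:
        scores = PROPENSITY[aa]
        labels.append(STATES[scores.index(max(scores))])
    return "".join(labels)


seq = "ADSGHYRFASGFTYKKMNCTEAA"
print(seq)
print(naive_states(seq))
```

Looking at one residue at a time ignores its neighbours, which is one reason a per-residue reading of the table performs poorly on its own.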
31
Secondary Structure Method Improvements
‘Sliding window’ approach
• Most alpha helices are ~12 residues long
• Most beta strands are ~6 residues long
• Look at all windows of size 6/12
• Calculate a score for each window. If the score is above the threshold, predict that this is an alpha helix/beta sheet
TGTAGPOLKCHIQWMLPLKK
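The sliding-window idea can be sketched as follows, under several stated assumptions: the window score is the mean Chou-Fasman propensity, the threshold of 105 is an illustrative value, letters missing from the table (such as the "O" in the example sequence) get a neutral score of 100, and helix calls take precedence over strand calls. The names `window_hits` and `predict` are hypothetical.

```python
# (P(a), P(b)) per residue, taken from the Chou-Fasman table.
P = {
    "A": (142, 83),  "R": (98, 93),   "D": (101, 54),  "N": (67, 89),
    "C": (70, 119),  "E": (151, 37),  "Q": (111, 110), "G": (57, 75),
    "H": (100, 87),  "I": (108, 160), "L": (121, 130), "K": (114, 74),
    "M": (145, 105), "F": (113, 138), "P": (57, 55),   "S": (77, 75),
    "T": (83, 119),  "W": (108, 137), "Y": (69, 147),  "V": (106, 170),
}
NEUTRAL = (100, 100)  # assumption: score for letters missing from the table


def window_hits(seq, size, column, threshold):
    """Return start indices of windows whose mean propensity exceeds the threshold."""
    hits = []
    for start in range(len(seq) - size + 1):
        window = seq[start:start + size]
        mean = sum(P.get(aa, NEUTRAL)[column] for aa in window) / size
        if mean > threshold:
            hits.append(start)
    return hits


def predict(seq, threshold=105):
    """Mark residues covered by a high-scoring helix (12) or strand (6) window."""
    labels = ["-"] * len(seq)
    for start in window_hits(seq, 12, 0, threshold):
        labels[start:start + 12] = "H" * 12
    for start in window_hits(seq, 6, 1, threshold):
        for i in range(start, start + 6):
            if labels[i] == "-":  # helix calls take precedence in this sketch
                labels[i] = "B"
    return "".join(labels)


seq = "TGTAGPOLKCHIQWMLPLKK"
print(seq)
print(predict(seq))
```

Residues left as "-" are those not covered by any window above the threshold, which corresponds to the "anything else" (turn/loop) category.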
32
Improvements since the 1980s
• Adding information from conservation in MSA
• Smarter algorithms (e.g. Machine learning, HMM).
Success rate: 75%-80%
33
Machine learning approach for predicting Secondary Structure (PHD, PSIpred)
Step 1: Generating a multiple sequence alignment
[Figure: Query, SwissProt, and the resulting alignment of the query with subject sequences]
34
Step 2: Additional sequences are added using a profile. We end up with an MSA which represents the protein family.
[Figure: Query, seed alignment of the query with subject sequences, and the resulting MSA]
35
Step 3: The sequence profile of the protein family is compared (by machine learning methods) to sequences with known secondary structure.
[Figure: Query, seed alignment, MSA, machine learning approach, known structures]
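To make the notion of a sequence profile concrete, here is a minimal sketch that turns an alignment into per-column residue frequencies. The toy alignment and the function name `column_profile` are hypothetical, and the sketch covers only the profile step; the machine learning step that compares profiles with known structures is not shown.

```python
from collections import Counter

# Hypothetical toy alignment: rows are aligned family members, "-" is a gap.
msa = [
    "MKV-LA",
    "MRVALA",
    "MKI-LG",
    "MKVALA",
]


def column_profile(alignment):
    """Return, for each column, the relative frequency of each residue (gaps ignored)."""
    profile = []
    for column in zip(*alignment):
        counts = Counter(aa for aa in column if aa != "-")
        total = sum(counts.values())
        if total == 0:
            profile.append({})  # column made only of gaps
        else:
            profile.append({aa: n / total for aa, n in counts.items()})
    return profile


for position, freqs in enumerate(column_profile(msa), start=1):
    print(position, freqs)
```

A fully conserved column yields a single residue with frequency 1.0, while variable columns spread the frequency over several residues; this is the conservation signal that a plain single sequence cannot provide.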
36
HMM approach for predicting Secondary Structure (SAM)
• HMM enables us to calculate the probability of assigning a sequence to a secondary structure
TGTAGPOLKCHIQWML
HHHHHHHLLLLBBBBB
p = ?
37
Table built according to a large database of known secondary structures. Example entries:
• The probability of beginning with an α-helix
• The probability of an α-helix residue being followed by an α-helix residue
• The probability of observing a residue which belongs to an α-helix followed by a residue belonging to a turn = 0.15
• The probability of observing Alanine as part of a β-sheet
38
• The above table enables us to calculate the probability of assigning secondary structure to a protein
• Example
TGQ
HHH
p = 0.45 x 0.041 x 0.8 x 0.028 x 0.8 x 0.0635 ≈ 0.000020995
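The example product can be checked with a short sketch. The six numbers are those quoted on the slide; their roles are inferred from the order of the factors (0.45 as the probability of beginning with a helix, 0.8 as the helix-to-helix transition, and 0.041, 0.028 and 0.0635 as the probabilities of observing T, G and Q in a helix), so treat that mapping as an assumption. The function name `path_probability` is hypothetical.

```python
def path_probability(seq, states, start, transition, emission):
    """Joint probability of a residue sequence and one state path under an HMM."""
    p = start[states[0]] * emission[states[0]][seq[0]]
    for i in range(1, len(seq)):
        p *= transition[states[i - 1]][states[i]] * emission[states[i]][seq[i]]
    return p


# Values quoted on the slide; only the helix state (H) is needed for this path.
start = {"H": 0.45}
transition = {"H": {"H": 0.8}}
emission = {"H": {"T": 0.041, "G": 0.028, "Q": 0.0635}}

p = path_probability("TGQ", "HHH", start, transition, emission)
print(f"{p:.4e}")
```

Comparing this value across alternative state paths for the same sequence is what lets an HMM rank candidate secondary structure assignments.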
39
Secondary structure prediction
• AGADIR - An algorithm to predict the helical content of peptides
• APSSP - Advanced Protein Secondary Structure Prediction Server
• GOR - Garnier et al, 1996
• HNN - Hierarchical Neural Network method (Guermeur, 1997)
• Jpred - A consensus method for protein secondary structure prediction at University of Dundee
• JUFO - Protein secondary structure prediction from sequence (neural network)
• nnPredict - University of California at San Francisco (UCSF)
• PredictProtein - PHDsec, PHDacc, PHDhtm, PHDtopology, PHDthreader, MaxHom, EvalSec from Columbia University
• Prof - Cascaded Multiple Classifiers for Secondary Structure Prediction
• PSA - BioMolecular Engineering Research Center (BMERC) / Boston
• PSIpred - Various protein structure prediction methods at Brunel University
• SOPMA - Geourjon and Deléage, 1995
• SSpro - Secondary structure prediction using bidirectional recurrent neural networks at University of California
• DLP - Domain linker prediction at RIKEN