Predicting Protein Properties and Structure
Rui Alves
Organization of the Talk
• From cDNA sequence to protein sequence.
• Analyzing the information in the protein sequence
• Predicting the fold (secondary structure) of a protein
• Predicting the (tertiary) structure of a protein
Predicting protein sequence from DNA sequence
• Protein sequence can be predicted by translating the cDNA and using the genetic code.
Translating cDNA into protein sequence
ATGTCTCTTATATGA…
MetSerLeuIleTer
No Gene!!!!!
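The translation step above can be sketched in a few lines of Python. This is a minimal illustration, assuming the standard genetic code (NCBI translation table 1) and reading frame 1; the names `codon_table` and `translate` are made up for this example, and amino acids are printed in one-letter code with `*` for Ter.

```python
BASES = "TCAG"
# Standard genetic code (NCBI translation table 1).
# Codons are ordered TTT, TTC, TTA, TTG, TCT, ... (each base cycling through T, C, A, G).
STANDARD_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def codon_table(aas):
    """Map each of the 64 codons to a one-letter amino acid ('*' = stop)."""
    codons = [a + b + c for a in BASES for b in BASES for c in BASES]
    return dict(zip(codons, aas))


def translate(cdna, table):
    """Translate a cDNA string in frame 1, stopping at the first stop codon."""
    cdna = cdna.upper()
    protein = []
    for i in range(0, len(cdna) - 2, 3):
        aa = table[cdna[i:i + 3]]
        protein.append(aa)
        if aa == "*":
            break
    return "".join(protein)


STANDARD = codon_table(STANDARD_AAS)
print(translate("ATGTCTCTTATATGA", STANDARD))  # MSLI*  (Met Ser Leu Ile Ter)
```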
Translating cDNA to Protein
Translating yeast mitochondrial cDNA into protein sequence
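ATGTCTCTTATATGA………SECIS sequence
Universal code: MetSerLeuIleTer
Yeast mitochondrial code: MetSerThrMetTrp (CTT read as Thr, ATA as Met, TGA as Trp)
With a downstream SECIS sequence, TGA can instead be read as selenocysteine (sCys)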
There is a Gene with a considerably different protein sequence from the one we would
predict from the universal genetic code!!!!!
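To make the difference concrete, the same cDNA can be translated with two codon tables. The sketch below assumes NCBI translation table 1 (standard) and table 3 (yeast mitochondrial); the helper names are hypothetical, and SECIS-dependent selenocysteine insertion is not modeled.

```python
BASES = "TCAG"
# Amino acids for codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
STANDARD_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Yeast mitochondrial code (NCBI table 3): CTN -> Thr, ATA -> Met, TGA -> Trp.
YEAST_MITO_AAS = "FFLLSSSSYY**CCWWTTTTPPPPHHQQRRRRIIMMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def codon_table(aas):
    """Map each of the 64 codons to a one-letter amino acid."""
    codons = [a + b + c for a in BASES for b in BASES for c in BASES]
    return dict(zip(codons, aas))


def translate(cdna, table):
    """Translate in frame 1, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(cdna) - 2, 3):
        aa = table[cdna[i:i + 3].upper()]
        protein.append(aa)
        if aa == "*":
            break
    return "".join(protein)


cdna = "ATGTCTCTTATATGA"
print(translate(cdna, codon_table(STANDARD_AAS)))    # MSLI*
print(translate(cdna, codon_table(YEAST_MITO_AAS)))  # MSTMW
```

Three of the five codons change meaning, and the stop codon disappears, so the open reading frame continues past TGA.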
Organization of the Talk
• From cDNA sequence to protein sequence.
• Analyzing the information in the protein sequence
• Predicting the fold (secondary structure) of a protein
• Predicting the (tertiary) structure of a protein
Inferring function from sequence
Your Sequence
Protein Sequence Database
No Known Homologues in the Database
Oh, $#!¥!!!
Go to the Protein Data Bank to get the structure & live happily ever after
Analyzing the information in the protein sequence
• Physical-Chemical Information
Why are these properties useful?
For example, they help identify your protein on an electrophoresis gel
Analyzing the physical chemical information in the protein sequence
How to predict hydrophobicity
How to predict molecular mass
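Ala residue mass: 71.09
Cys residue mass: 103.15
Ala-Cys: 71.09 + 103.15 + 18.02
Residue masses already exclude the H2O lost on forming each peptide bond, so sum the residues and add one H2O for the free termini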
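A worked version of the mass calculation, as a minimal sketch. It assumes average residue masses (amino acid minus one water), so the peptide mass is the sum of the residues plus a single water for the free N- and C-termini. The table values are approximate, and `peptide_mass` is a name invented for this example.

```python
# Approximate average residue masses in Da (free amino acid minus H2O).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.0153


def peptide_mass(seq):
    """Average mass of an unmodified peptide: residues plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in seq.upper()) + WATER


print(round(peptide_mass("AC"), 2))  # 192.23
```

Post-translational modifications and disulfide bonds shift the real mass, so this is only a first estimate for locating a band on a gel.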
How to predict isoelectric point
[Figure: protein charge (+ to -) plotted against pH for an example sequence (Ala, Cys, …)]
Isoelectric point is the pH at which the protein has no net charge
At each value of pH, calculate the protonation state of each residue and thus the charge of the whole protein
Amino acid pKa is dependent upon environment
Buried amino acids do not gain/lose protons as easily as exposed amino acids
…
So this calculation does not work very well
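The pH-by-pH charge calculation can be sketched as follows. This assumes the Henderson-Hasselbalch equation with one fixed pKa per group type; the pKa values are illustrative (published sets differ by several tenths of a unit), and the function names are hypothetical. Because the pKa values ignore the residue's environment, the result inherits the limitation noted above.

```python
# Illustrative pKa values (assumed; published tables differ).
POSITIVE = {"K": 10.8, "R": 12.5, "H": 6.5}             # +1 when protonated
NEGATIVE = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}    # -1 when deprotonated
N_TERM, C_TERM = 8.6, 3.6


def net_charge(seq, ph):
    """Net charge of the peptide at a given pH (Henderson-Hasselbalch)."""
    charge = 1 / (1 + 10 ** (ph - N_TERM)) - 1 / (1 + 10 ** (C_TERM - ph))
    for aa in seq.upper():
        if aa in POSITIVE:
            charge += 1 / (1 + 10 ** (ph - POSITIVE[aa]))
        elif aa in NEGATIVE:
            charge -= 1 / (1 + 10 ** (NEGATIVE[aa] - ph))
    return charge


def isoelectric_point(seq, lo=0.0, hi=14.0, tol=1e-4):
    """Bisection on pH: net charge falls monotonically as pH rises."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(round(isoelectric_point("ACDK"), 2))
```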
Analyzing the information in the protein sequence
• Physical-Chemical Information
• E.g. http://prowl.rockefeller.edu/prowl-cgi/sequence.exe/.fsa
• Localization, modifications & secondary structure Information
• E.g. http://seq.cbrc.jp/proteinLocalizationResources/localizationLinks.html
Predicting the localization of your protein
• Search for homology to the relevant targeting sequence (TS) in your protein
• Complications:
•Short sequences, divergence, change between organisms
• Signal Peptides
•Nuclear localization signals at the N-terminus
•Mitochondrial TS
•Peroxisomal TS
•…
How is the localization of a protein predicted?
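As a toy illustration of searching a sequence for targeting signals, the sketch below scans for two simplified consensus patterns: a C-terminal peroxisomal signal resembling SKL, and a short basic stretch resembling the SV40 nuclear localization signal PKKKRKV. The patterns and the function name `scan_targeting_signals` are assumptions made for this example; real predictors use trained models, precisely because such short motifs diverge between organisms.

```python
import re

# Simplified, assumed consensus patterns (not a complete predictor).
PATTERNS = {
    "peroxisomal (C-terminal SKL-like)": re.compile(r"[SAC][KRH][LM]$"),
    "nuclear (basic K-K/R-x-K/R stretch)": re.compile(r"K[KR].[KR]"),
}


def scan_targeting_signals(seq):
    """Return, for each toy pattern, whether it occurs in the sequence."""
    seq = seq.upper()
    return {name: bool(pattern.search(seq)) for name, pattern in PATTERNS.items()}


for name, found in scan_targeting_signals("MAAAPKKKRKVAAASKL").items():
    print(name, found)
```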
Predicting post translational modifications to your protein
How are post translational modifications to a protein predicted?
• Signal sequences
• Search for homology to pattern peptides
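One well-known pattern peptide is the N-glycosylation sequon N-{P}-[ST]-{P}: Asn, then anything but Pro, then Ser or Thr, then anything but Pro. A minimal sketch of scanning for it is shown below; the function name is hypothetical, and a match only marks a potential site, not a confirmed modification.

```python
import re

# N-glycosylation sequon N-{P}-[ST]-{P}, written as a regular expression.
# The lookahead lets overlapping sites be reported.
SEQUON = re.compile(r"(?=N[^P][ST][^P])")


def n_glycosylation_sites(seq):
    """Return 1-based positions of Asn residues that start a sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(seq.upper())]


print(n_glycosylation_sites("AANGSAANPTA"))  # [3]
```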
Training set of known structures
Training set of corresponding sequences
Test set of known structures
Test set of corresponding sequences
How is 2ndary structure predicted?
     p(α-helix)   p(coil)   p(β-strand)
A    0.23         0.28      0.5
Database of known structures
Database of corresponding sequences
ACDEFGTYAEE……
α-helix, coil, β-strand
     p(α-helix)    p(coil)         p(β-strand)
     A … C …       A … C …         A … C …
A    0.1 … 0.03    0.04 … 0.002    0.1 … 0.21
p(aa1-helix), p(aa1-coil), p(aa1-strand), …
Predict 2ndary structure
Compare
Bad Predictions:
Reshuffle training set and test set and repeat until predictions are correct
Good Predictions:
Method ready for new sequence 2ndary structure prediction
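The train, predict and compare loop can be sketched with single-residue probabilities. This is a deliberately simplified illustration: it assumes each residue is assigned the state it most often adopts in the training set, ignores neighbouring residues, and uses invented toy data. H, E and C stand for α-helix, β-strand and coil.

```python
from collections import Counter, defaultdict


def train(sequences, structures):
    """Estimate p(state | amino acid) from aligned sequence/structure strings."""
    counts = defaultdict(Counter)
    for seq, ss in zip(sequences, structures):
        for aa, state in zip(seq, ss):
            counts[aa][state] += 1
    return {aa: {s: n / sum(c.values()) for s, n in c.items()}
            for aa, c in counts.items()}


def predict(seq, probs, default="C"):
    """Assign each residue its most probable state (coil if never seen)."""
    return "".join(max(probs[aa], key=probs[aa].get) if aa in probs else default
                   for aa in seq)


def accuracy(predicted, observed):
    """Fraction of residues whose predicted state matches the known one."""
    return sum(p == o for p, o in zip(predicted, observed)) / len(observed)


# Hypothetical toy training set, for illustration only.
probs = train(["AAEEGG", "VVVAGG"], ["HHHHCC", "EEEHCC"])
prediction = predict("AVGE", probs)
print(prediction, accuracy(prediction, "HECC"))  # HECH 0.75
```

Comparing predictions against a held-out test set of known structures, as in the slide, is what tells you whether the probabilities generalize.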
Predicting transmembrane helices
How are transmembrane regions predicted?
• Transmembrane segments are 17 residues long
[Figure: two hydrophobic stretches of 17 aa residues each, forming two transmembrane helices]
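A common way to find such hydrophobic stretches is a sliding-window hydropathy average. The sketch below uses the Kyte-Doolittle hydropathy scale with the 17-residue window from the slide; the 1.6 cutoff is an assumption borrowed from common practice, and the function names are made up for this example.

```python
# Kyte-Doolittle hydropathy scale.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=17):
    """Average hydropathy of every window of the given length."""
    scores = [KD[aa] for aa in seq.upper()]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]


def candidate_tm_windows(seq, window=17, threshold=1.6):
    """0-based start positions of windows hydrophobic enough to span a membrane."""
    return [i for i, avg in enumerate(hydropathy_profile(seq, window))
            if avg >= threshold]


seq = "DDKKDDKK" + "L" * 17 + "DDKKDDKK"
print(candidate_tm_windows(seq))
```

Adjacent hits belong to the same candidate segment, so in practice the start positions are merged into one region per helix.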
How is membrane orientation predicted?
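[Figure: membrane orientation, showing the Outside and Cytosol sides, the NH (N-terminal) ends, a Signal Peptide, a 17 aa membrane segment with 15 aa flanks on each side, and the +++ / --- charges on the flanks]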
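The charge comparison in the figure can be sketched as below. It assumes the positive-inside rule: the flank carrying more positive charge tends to stay on the cytosolic side. The 15-residue flanks and 17-residue segment follow the slide; counting only K/R as +1 and D/E as -1 is a simplification, and `predict_orientation` is a hypothetical name.

```python
def flank_charge(fragment):
    """Crude net charge: +1 for each Lys/Arg, -1 for each Asp/Glu."""
    return (sum(fragment.count(aa) for aa in "KR")
            - sum(fragment.count(aa) for aa in "DE"))


def predict_orientation(seq, tm_start, tm_length=17, flank=15):
    """Guess which flank of a transmembrane segment faces the cytosol."""
    seq = seq.upper()
    n_flank = seq[max(0, tm_start - flank):tm_start]
    c_flank = seq[tm_start + tm_length:tm_start + tm_length + flank]
    n_charge, c_charge = flank_charge(n_flank), flank_charge(c_flank)
    if n_charge > c_charge:
        return "N-terminal flank cytosolic"
    if c_charge > n_charge:
        return "C-terminal flank cytosolic"
    return "undetermined"


print(predict_orientation("KKRKK" + "L" * 17 + "DDEDD", tm_start=5))
# N-terminal flank cytosolic
```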
Organization of the Talk
• From cDNA sequence to protein sequence.
• Analyzing the information in the protein sequence
• Predicting the fold (secondary structure) of a protein
• Predicting the (tertiary) structure of a protein
What is fold?
• Fold can be roughly defined as the succession of α-β-coil structures in a protein
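Under this rough definition, a fold can be written down by collapsing a per-residue secondary structure string into its succession of elements. A minimal sketch, assuming the one-letter states H (α-helix), E (β-strand) and C (coil); `fold_string` is a name chosen for this example.

```python
from itertools import groupby

ELEMENT = {"H": "α", "E": "β", "C": "coil"}


def fold_string(secondary_structure):
    """Collapse runs of identical states into a succession of elements."""
    return "-".join(ELEMENT[state] for state, _ in groupby(secondary_structure))


print(fold_string("CCHHHHCCEEEECC"))  # coil-α-coil-β-coil
```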
Predicting protein folding
How is fold predicted?
Database of known structures
Database of corresponding sequences
Database of probabilities of aa in 2ndary structure
YOUR SEQUENCE
Homology-based helix-coil-strand profile folds database
Server
Strong Homology: … Fold Prediction
Weak/No Homology: Helix-coil-strand profile prediction … Fold Prediction
Organization of the Talk
• From cDNA sequence to protein sequence.
• Analyzing the information in the protein sequence
• Predicting the fold (secondary structure) of a protein
• Predicting the (tertiary) structure of a protein
Predicting protein structure
• Homology Modeling: 3D-JIGSAW, SWISSMODEL
• Ab initio Modeling: ROBETTA
Predicting protein structure by homology
How does homology modeling work?
Database of known structures
Database of corresponding sequences
…YDVRSEQVENCE…
Server/Program
Strong Homologues
Best possible sequence alignment:
…YDVR-SEQVENCE…
…YDVRMSD-VDNCD…
…
Thread the sequence to be predicted over the known structure according to the alignment
…
Optimization via energy minimization, etc.
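The alignment step can be illustrated with a small global alignment routine. This is a sketch of the Needleman-Wunsch algorithm under an assumed toy scoring scheme (match +1, mismatch -1, gap -1); modeling servers use substitution matrices and more elaborate gap penalties, so the alignment it returns need not match the one on the slide.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of two sequences; returns (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


top, bottom, total = needleman_wunsch("YDVRSEQVENCE", "YDVRMSDVDNCD")
print(top)
print(bottom)
```

Each aligned column tells the modeling program which residue of the known structure supplies the coordinates for a residue of the query.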
Predicting protein structure
• Homology Modeling: 3D-JIGSAW, SWISSMODEL
• Ab initio Modeling: ROBETTA
Predicting protein structure by ab initio methods
Database of corresponding sequences
…YDVRSEQVENCE…
Server/
Program
NO Homologues
Database of structures for smaller amino acid runs
…YDVR-SEQ
…YDVRMSD-…
…YPVRMSD-…
…
…VENCE…
…YDNCD…
…VEQCE…
…
Assemble
Energy minimization & optimization
…
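The fragment lookup that precedes assembly can be sketched as a search for short library sequences resembling each window of the query. Everything here is an assumption for illustration: the 5-residue window, the mismatch limit, the toy library and the function names. Assembly and energy minimization are not modeled.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def fragment_candidates(query, library, size=5, max_mismatches=2):
    """For each query window, list library fragments within the mismatch limit."""
    hits = {}
    for start in range(len(query) - size + 1):
        window = query[start:start + size]
        matches = [f for f in library
                   if len(f) == size and hamming(window, f) <= max_mismatches]
        if matches:
            hits[start] = sorted(matches, key=lambda f: hamming(window, f))
    return hits


# Hypothetical fragment library; in practice each entry carries a known structure.
library = ["VENCE", "VDNCD", "VEQCE", "YDVRM"]
hits = fragment_candidates("YDVRSEQVENCE", library)
print(hits[7])  # ['VENCE', 'VEQCE', 'VDNCD']
```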
Summary
• From cDNA sequence to protein sequence.
• Analyzing the information in the protein sequence
• Predicting the fold (secondary structure) of a protein
• Predicting the (tertiary) structure of a protein