Secondary Structure Prediction and Signal Peptides
Protein Analysis Workshop 2012. Bioinformatics Group, Institute of Biotechnology, University of Helsinki. Earlier version: Hung Ta. Current version: Petri Törönen.
![Page 1: Secondary Structure Prediction and Signal Peptides](https://reader035.vdocuments.site/reader035/viewer/2022062408/568134b0550346895d9bca3e/html5/thumbnails/1.jpg)
Secondary Structure Prediction and Signal Peptides
Protein Analysis Workshop 2012
Bioinformatics Group, Institute of Biotechnology, University of Helsinki
Earlier version: Hung Ta
Current: Petri Törönen
![Page 2: Secondary Structure Prediction and Signal Peptides](https://reader035.vdocuments.site/reader035/viewer/2022062408/568134b0550346895d9bca3e/html5/thumbnails/2.jpg)
Why Secondary Structure Predictions and Signal Peptides?
Sequence homology is usually a good source of information.
However, sometimes no good homology is found.
We then need other sources of information to aid us:
• Domain (profile) homologies (later lectures)
• Secondary structure
• Signal peptides
• Transmembrane regions
Secondary structure and signal peptides are also useful input for other bioinformatics tools.
![Page 3: Secondary Structure Prediction and Signal Peptides](https://reader035.vdocuments.site/reader035/viewer/2022062408/568134b0550346895d9bca3e/html5/thumbnails/3.jpg)
Secondary Structure
An alternative when there is only weak sequence homology: structure is more conserved than sequence.
Similar secondary structure gives extra support for a weak sequence homology.
Special cases of secondary structure can suggest function or localization.
![Page 4: Secondary Structure Prediction and Signal Peptides](https://reader035.vdocuments.site/reader035/viewer/2022062408/568134b0550346895d9bca3e/html5/thumbnails/4.jpg)
Hierarchy of Protein Structure
![Page 5: Secondary Structure Prediction and Signal Peptides](https://reader035.vdocuments.site/reader035/viewer/2022062408/568134b0550346895d9bca3e/html5/thumbnails/5.jpg)
Primary Structure: a Linear Arrangement of Amino Acids
An amino acid has several structural components: a central carbon atom (C), an amino group (NH2), a carboxyl group (COOH), a hydrogen atom (H), and a side chain (R). There are 20 standard amino acids.
The peptide bond is formed when the carboxyl group of one amino acid binds to the amino group of the adjacent amino acid.
The primary structure of a protein is simply the linear arrangement, or sequence, of the amino acid residues that compose it.
![Page 6: Secondary Structure Prediction and Signal Peptides](https://reader035.vdocuments.site/reader035/viewer/2022062408/568134b0550346895d9bca3e/html5/thumbnails/6.jpg)
Secondary Structure: Core Elements of Protein Architecture
Results from the folding of localized parts of a polypeptide chain.
• α-helix and β-sheet: the major internal supportive elements, about 60 percent of the polypeptide chain
• Coils, turns
![Page 7: Secondary Structure Prediction and Signal Peptides](https://reader035.vdocuments.site/reader035/viewer/2022062408/568134b0550346895d9bca3e/html5/thumbnails/7.jpg)
α-Helix
Hydrogen-bonded
3.6 residues per turn
Axial dipole moment
Side chains point outward
Average length is 10 amino acids (3 turns).
Typically rich in alanine, glutamine, leucine, and methionine; and poor in proline, glycine, tyrosine, and serine.
![Page 8: Secondary Structure Prediction and Signal Peptides](https://reader035.vdocuments.site/reader035/viewer/2022062408/568134b0550346895d9bca3e/html5/thumbnails/8.jpg)
β-Sheet
Formed by hydrogen bonds between β-strands, which are short polypeptide segments (5-8 residues).
Adjacent β-strands running in the same direction -> parallel sheet.
Adjacent β-strands running in opposite directions -> anti-parallel sheet.
Ribbon diagram
![Page 9: Secondary Structure Prediction and Signal Peptides](https://reader035.vdocuments.site/reader035/viewer/2022062408/568134b0550346895d9bca3e/html5/thumbnails/9.jpg)
Turns, loops, coils…
A turn, composed of 3-4 residues, forms a sharp bend that redirects the polypeptide backbone back toward the interior.
A loop is similar to a turn but can form a longer bend.
Turns and loops help large proteins fold into compact structures.
A random coil is a class of conformations that indicates an absence of regular secondary structure.
Turn
![Page 10: Secondary Structure Prediction and Signal Peptides](https://reader035.vdocuments.site/reader035/viewer/2022062408/568134b0550346895d9bca3e/html5/thumbnails/10.jpg)
Secondary Structure Prediction
Primary: MSEGEDDFPRKRTPWCFDDEHMC
Secondary: CCHHHHHHCCCCEEEEEECCCCC
Why: secondary structure is the first level of structural organization.
The task: for each amino acid (aa), predict its state:
• H: α-helix
• E: β-strand
• T: turn
• C: coil
![Page 11: Secondary Structure Prediction and Signal Peptides](https://reader035.vdocuments.site/reader035/viewer/2022062408/568134b0550346895d9bca3e/html5/thumbnails/11.jpg)
Secondary Structure Prediction
Single-residue statistical analysis (Chou-Fasman, 1974):
For each amino acid type, assign its ‘propensity’ to be in a helix, β-sheet, or coil.
Based on 15 proteins of known conformation, 2473 amino acids in total.
Limited accuracy: ~55-60% on average.
E.g. Chou-Fasman (1974), which is no longer used.
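The propensity idea can be sketched in a few lines of Python. This is an illustrative sketch only, not the original Chou-Fasman procedure or its published values: it assumes a propensity is estimated as the fraction of a residue type found in a state divided by the overall fraction of residues in that state, and it uses the toy sequence/structure pair from the earlier example slide as its only "training data". The function name `propensities` is hypothetical.

```python
from collections import Counter

def propensities(seq, ss, state):
    """Propensity of each residue type for `state`:
    P(state | residue) / P(state), estimated from one labelled sequence."""
    if len(seq) != len(ss):
        raise ValueError("sequence and structure strings must have equal length")
    background = ss.count(state) / len(ss)        # overall fraction of `state`
    if background == 0:
        raise ValueError("state does not occur in the structure string")
    total = Counter(seq)                          # occurrences of each residue type
    in_state = Counter(a for a, s in zip(seq, ss) if s == state)
    return {a: (in_state[a] / total[a]) / background for a in total}

# Toy data from the example slide; far too small for meaningful statistics.
seq = "MSEGEDDFPRKRTPWCFDDEHMC"
ss = "CCHHHHHHCCCCEEEEEECCCCC"
helix = propensities(seq, ss, "H")
for residue, value in sorted(helix.items(), key=lambda kv: -kv[1])[:3]:
    print(residue, round(value, 2))
```

A value above 1 means the residue type occurs in the state more often than expected by chance; with a realistic data set one would pool counts over many proteins.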
![Page 12: Secondary Structure Prediction and Signal Peptides](https://reader035.vdocuments.site/reader035/viewer/2022062408/568134b0550346895d9bca3e/html5/thumbnails/12.jpg)
Secondary Structure Prediction
Segment-based statistics: look for correlations within windows of 11-21 amino acids.
Many algorithms have been tried.
Best performing: neural networks.
Input: a number of protein sequences with their known secondary structure.
Output: a trained network that predicts secondary structure elements for given query sequences.
Accuracy < 70%.
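How a sequence window might be turned into network input can be sketched as follows. This is a minimal sketch under assumptions: one-hot encoding and a window of 13 residues (an arbitrary value inside the 11-21 range) are common choices, not necessarily what any particular server uses, and the name `window_features` is hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def window_features(seq, center, window=13):
    """One-hot encode the residues in a window centred on `center`.
    Positions that fall outside the sequence stay all-zero (padding)."""
    if window % 2 == 0:
        raise ValueError("window must be odd so that it has a central residue")
    half = window // 2
    features = []
    for pos in range(center - half, center + half + 1):
        one_hot = [0] * len(AMINO_ACIDS)
        if 0 <= pos < len(seq):
            idx = AMINO_ACIDS.find(seq[pos])
            if idx >= 0:              # unknown letters (e.g. X) stay all-zero
                one_hot[idx] = 1
        features.extend(one_hot)
    return features

seq = "MSEGEDDFPRKRTPWCFDDEHMC"
x = window_features(seq, center=0)
print(len(x), sum(x))   # vector length and number of residues actually encoded
```

Each residue of a training protein yields one such vector, paired with the known state (H, E, or C) of the central residue.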
![Page 13: Secondary Structure Prediction and Signal Peptides](https://reader035.vdocuments.site/reader035/viewer/2022062408/568134b0550346895d9bca3e/html5/thumbnails/13.jpg)
Popular Servers for Secondary Structure Prediction
Jpred (http://www.compbio.dundee.ac.uk/www-jpred/)
Psipred (http://bioinf.cs.ucl.ac.uk/psipred/)
Metaserver PredictProtein (http://www.predictprotein.org/)
![Page 14: Secondary Structure Prediction and Signal Peptides](https://reader035.vdocuments.site/reader035/viewer/2022062408/568134b0550346895d9bca3e/html5/thumbnails/14.jpg)
PSIPRED and JPRED
Test with uniprot|P00772|ELA1_PIG Elastase-1 precursor
Correct answer: http://www.uniprot.org/uniprot/P00772
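Comparing a server's prediction with the annotated structure is usually summarized as per-residue accuracy (often called Q3 when three states are used). The sketch below shows the calculation; the observed string is the toy example from the earlier slide and the predicted string is a made-up illustration, not real server output.

```python
def per_residue_accuracy(predicted, observed):
    """Fraction of residues whose predicted state equals the observed state."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("strings must be non-empty and of equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)

observed = "CCHHHHHHCCCCEEEEEECCCCC"    # toy annotation from the example slide
predicted = "CCCHHHHHHCCCCEEEEECCCCC"   # hypothetical prediction, shifted by one residue
accuracy = per_residue_accuracy(predicted, observed)
print(f"{accuracy:.2f}")
```

Note that a prediction that finds every element but misplaces its boundaries by one residue still loses accuracy at each boundary.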
![Page 15: Secondary Structure Prediction and Signal Peptides](https://reader035.vdocuments.site/reader035/viewer/2022062408/568134b0550346895d9bca3e/html5/thumbnails/15.jpg)
PSIPRED (http://bioinf.cs.ucl.ac.uk/psipred/result/351083)
![Page 16: Secondary Structure Prediction and Signal Peptides](https://reader035.vdocuments.site/reader035/viewer/2022062408/568134b0550346895d9bca3e/html5/thumbnails/16.jpg)
JPRED (http://www.compbio.dundee.ac.uk/www-jpred/results/jp_Pt7zBV4/jp_Pt7zBV4.results.html)
• Above: the summary
• On the right: the detailed view
![Page 17: Secondary Structure Prediction and Signal Peptides](https://reader035.vdocuments.site/reader035/viewer/2022062408/568134b0550346895d9bca3e/html5/thumbnails/17.jpg)
Special Cases of Secondary Structure
Informative special cases of secondary structure include:
• Coiled-coil regions
• Transmembrane regions
![Page 18: Secondary Structure Prediction and Signal Peptides](https://reader035.vdocuments.site/reader035/viewer/2022062408/568134b0550346895d9bca3e/html5/thumbnails/18.jpg)
Prediction of coiled-coils
• Coiled-coil proteins are often biologically relevant regulators (e.g. transcription factors).
• Coiled-coils are generally solvent-exposed, multi-stranded helix structures (here: two-stranded).
Helix periodicity and solvent exposure impose a special pattern, the heptad repeat:
… abcdefg … (color legend: hydrophobic residues, hydrophilic residues)
Helical diagram of 2 interacting helices (from the Wikipedia Leucine zipper article):
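The heptad periodicity can be probed with a very simple scan. In the heptad repeat, positions a and d are typically hydrophobic; the sketch below tries all seven registers and reports the one in which the a/d positions are most hydrophobic. It is an illustration only, not the COILS algorithm, and the hydrophobic alphabet and the test segment are assumptions.

```python
HYDROPHOBIC = set("AILMFVWY")   # assumption: a simple hydrophobic alphabet

def best_heptad_frame(segment):
    """Return (frame, fraction): the register whose 'a' and 'd' positions
    contain the largest share of hydrophobic residues."""
    best = (0, 0.0)
    for frame in range(7):
        # residue i sits at heptad position (i + frame) % 7; 0 is 'a', 3 is 'd'
        core = [aa for i, aa in enumerate(segment) if (i + frame) % 7 in (0, 3)]
        if not core:
            continue
        fraction = sum(aa in HYDROPHOBIC for aa in core) / len(core)
        if fraction > best[1]:
            best = (frame, fraction)
    return best

idealized = "LEELKKK" * 4       # hypothetical segment with Leu at every a and d
print(best_heptad_frame(idealized))
```

A real coiled-coil segment gives a less clean signal, which is why the dedicated servers use position-specific residue statistics rather than a two-class alphabet.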
![Page 19: Secondary Structure Prediction and Signal Peptides](https://reader035.vdocuments.site/reader035/viewer/2022062408/568134b0550346895d9bca3e/html5/thumbnails/19.jpg)
The COILS server at EMBnet
COILS compares a sequence to a database of known parallel two-stranded coiled-coils and derives a similarity score.
By comparing this score to the distribution of scores in globular and coiled-coil proteins, the program then calculates the probability that the sequence will adopt a coiled-coil conformation.
Options:
• scoring matrices,
• window size (score may vary),
• weighting options.
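The step from a score to a probability can be illustrated with Bayes' rule. The sketch assumes that the scores of coiled-coil and globular proteins each follow a Gaussian distribution; every number below (means, standard deviations, prior) is a hypothetical placeholder chosen for illustration, not a parameter of the COILS program.

```python
import math

def gaussian(x, mean, sd):
    """Gaussian probability density."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def coiled_coil_probability(score, cc=(1.6, 0.2), glob=(0.8, 0.2), prior_cc=0.05):
    """Posterior probability of 'coiled coil' given a window score.
    cc and glob are (mean, sd) of the two score distributions; all defaults
    are hypothetical placeholders."""
    p_cc = prior_cc * gaussian(score, *cc)
    p_glob = (1 - prior_cc) * gaussian(score, *glob)
    return p_cc / (p_cc + p_glob)

for score in (0.8, 1.2, 1.6):
    print(score, round(coiled_coil_probability(score), 3))
```

The prior matters: because most residues in a proteome are not in coiled coils, a score has to lie well inside the coiled-coil distribution before the probability becomes high.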
![Page 20: Secondary Structure Prediction and Signal Peptides](https://reader035.vdocuments.site/reader035/viewer/2022062408/568134b0550346895d9bca3e/html5/thumbnails/20.jpg)
COILS Limitations
The program works well for parallel two-stranded structures that are solvent-exposed, but runs progressively into problems with the addition of more helices, their antiparallel orientation, and their decreasing length.
The program fails entirely on buried structures.
![Page 21: Secondary Structure Prediction and Signal Peptides](https://reader035.vdocuments.site/reader035/viewer/2022062408/568134b0550346895d9bca3e/html5/thumbnails/21.jpg)
COILS Demo
Let us submit the sequence below to the COILS server at EMBnet:
http://www.ch.embnet.org/software/COILS_form.html
>1jch_A
VAAPVAFGFPALSTPGAGGLAVSISAGALSAAIADIMAALKGPFKFGLWGVALYGVLPSQIAKDDPNMMSKIVTSLPADDITESPVSSLPLDKATVNVNVRVVDDVKDERQNISVVSGVPMSVPVVDAKPTERPGVFTASIPGAPVLNISVNNSTPAVQTLSPGVTNNTDKDVRPAFGTQGGNTRDAVIRFPKDSGHNAVYVSVSDVLSPDQVKQRQDEENRRQQEWDATHPVEAAERNYERARAELNQANEDVARNQERQAKAVQVYNSRKSELDAANKTLADAIAEIKQFNRFAHDPMAGGHRMWQMAGLKAQRAQTDVNNKQAAFDAAAKEKSDADAALSSAMESRKKKEDKKRSAENNLNDEKNKPRKGFKDYGHDYHPAPKTENIKGLGDLKPGIPKTPKQNGGGKRKRWTGDKGRKIYEWDSQHGELEGYRASDGQHLGSFDPKTGNQLKGPDPKRNIKKYL
![Page 22: Secondary Structure Prediction and Signal Peptides](https://reader035.vdocuments.site/reader035/viewer/2022062408/568134b0550346895d9bca3e/html5/thumbnails/22.jpg)
Correct answer: http://www.rcsb.org/pdb/explore/explore.do?structureId=1JCH
![Page 23: Secondary Structure Prediction and Signal Peptides](https://reader035.vdocuments.site/reader035/viewer/2022062408/568134b0550346895d9bca3e/html5/thumbnails/23.jpg)
Correct answer: http://www.rcsb.org/pdb/explore/explore.do?structureId=1JCH
![Page 24: Secondary Structure Prediction and Signal Peptides](https://reader035.vdocuments.site/reader035/viewer/2022062408/568134b0550346895d9bca3e/html5/thumbnails/24.jpg)
Transmembrane Region Prediction
Transmembrane proteins are important receptor or transport proteins.
Transmembrane regions:
• Usually contain residues with hydrophobic side chains (the surface must be hydrophobic).
• Usually ~20 residues long; can be up to 30 if the helix does not cross the membrane perpendicularly.
Methods:
• Hydropathy plots (historical; better methods are now available)
• Threading (TMpred, MEMSAT)
• Hidden Markov Model (TMHMM)
• Neural Network (PHDhtm)
![Page 25: Secondary Structure Prediction and Signal Peptides](https://reader035.vdocuments.site/reader035/viewer/2022062408/568134b0550346895d9bca3e/html5/thumbnails/25.jpg)
Hydropathy Plots (Kyte-Doolittle)
The hydropathy index of an amino acid is a number representing the hydrophobic or hydrophilic properties of its side chain.
Compute an average hydropathy value for each position in the query sequence.
A window length of 19 is usually chosen for membrane-spanning region prediction.
•Skip this
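The sliding-window average behind a hydropathy plot can be sketched as follows. The scale values are the Kyte-Doolittle hydropathy indices; the function name `hydropathy_profile` and the short test segment are illustrative choices, and no cutoff for calling a membrane-spanning region is implied.

```python
# Kyte-Doolittle hydropathy index per amino acid
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
      "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
      "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
      "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}

def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full window; entry i covers seq[i:i + window].
    Raises KeyError for letters outside the 20 standard amino acids."""
    if window > len(seq):
        return []
    values = [KD[aa] for aa in seq]
    return [sum(values[i:i + window]) / window
            for i in range(len(seq) - window + 1)]

# Start of the RCEM_RHOVI sequence used in the demo slides
segment = "ADYQTIYTQIQARGPHITVSGEWGDNDRVGKPFYSYWLGKIGDAQIGPIYLGASGIAAFAFGSTAILIILFNMAAEV"
profile = hydropathy_profile(segment)
peak = max(range(len(profile)), key=profile.__getitem__)
print(peak, round(profile[peak], 2))   # start of the most hydrophobic window
```

Plotting `profile` against position gives the hydropathy plot; broad peaks about 20 residues wide are candidates for membrane-spanning helices.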
![Page 26: Secondary Structure Prediction and Signal Peptides](https://reader035.vdocuments.site/reader035/viewer/2022062408/568134b0550346895d9bca3e/html5/thumbnails/26.jpg)
Hydropathy Plot Servers
Let us submit the sequence below to Membrane Explorer (also available as standalone MPEx) or Grease (http://fasta.bioch.virginia.edu/fasta_www2/fasta_www.cgi?rm=misc1).
Remove the FASTA header if sequence reading does not work.
>sp|P06010|RCEM_RHOVI Reaction center protein M chain (Photosynthetic reaction center M subunit) - Rhodopseudomonas viridis.
ADYQTIYTQIQARGPHITVSGEWGDNDRVGKPFYSYWLGKIGDAQIGPIYLGASGIAAFAFGSTAILIILFNMAAEVHFDPLQFFRQFFWLGLYPPKAQYGMGIPPLHDGGWWLMAGLFMTLSLGSWWIRVYSRARALGLGTHIAWNFAAAIFFVLCIGCIHPTLVGSWSEGVPFGIWPHIDWLTAFSIRYGNFYYCPWHGFSIGFAYGCGLLFAAHGATILAVARFGGDREIEQITDRGTAVERAALFWRWTIGFNATIESVHRWGWFFSLMVMVSASVGILLTGTFVDNWYLWCVKHGAAPDYPAYLPATPDPASLPGAPK
•Skip this
![Page 27: Secondary Structure Prediction and Signal Peptides](https://reader035.vdocuments.site/reader035/viewer/2022062408/568134b0550346895d9bca3e/html5/thumbnails/27.jpg)
Hydropathy Plot
The larger the number, the more hydrophobic the amino acid.
Correct answer (http://pir.uniprot.org/uniprot/P06010)
•Skip this
![Page 28: Secondary Structure Prediction and Signal Peptides](https://reader035.vdocuments.site/reader035/viewer/2022062408/568134b0550346895d9bca3e/html5/thumbnails/28.jpg)
TMpred
Method summary:
Scans a candidate sequence for matches to a sequence scoring matrix obtained by aligning the sequences of all transmembrane alpha-helical regions that are known from structures.
These sequences are collected in a database called TMbase.
Remark: The authors do not recommend this method for genomic sequences. Automatic methods are recommended instead, e.g. TMHMM, PHDhtm.
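Scanning a sequence against a scoring matrix can be sketched as follows. This is a generic position-specific scoring illustration, not TMpred's actual matrices or scoring scheme: the aligned segments are made up, and the uniform background frequency and the pseudocount are assumptions.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"

def build_matrix(aligned, pseudocount=1.0):
    """Log-odds scoring matrix from equal-length aligned segments,
    assuming a uniform background frequency of 1/20 per residue."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("aligned segments must have equal length")
    denom = len(aligned) + pseudocount * len(ALPHABET)
    matrix = []
    for col in range(length):
        counts = Counter(s[col] for s in aligned)
        matrix.append({a: math.log((counts[a] + pseudocount) / denom * len(ALPHABET))
                       for a in ALPHABET})
    return matrix

def scan(seq, matrix):
    """Score every window of the sequence; returns (start, score) pairs."""
    width = len(matrix)
    return [(i, sum(matrix[j][seq[i + j]] for j in range(width)))
            for i in range(len(seq) - width + 1)]

segments = ["LLVLA", "LIVLA", "ILVLG"]      # hypothetical aligned segments
matrix = build_matrix(segments)
hits = scan("KKDLLVLAKK", matrix)
best_start, best_score = max(hits, key=lambda hit: hit[1])
print(best_start, round(best_score, 2))
```

In practice the matrix would span a full transmembrane helix and be built from many known segments, and windows scoring above a chosen cutoff would be reported as candidate regions.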
![Page 29: Secondary Structure Prediction and Signal Peptides](https://reader035.vdocuments.site/reader035/viewer/2022062408/568134b0550346895d9bca3e/html5/thumbnails/29.jpg)
TMpred Server
Let us submit RCEM_RHOVI again to the TMpred server at EMBnet:
http://www.ch.embnet.org/software/TMPRED_form.html
>sp|P06010|RCEM_RHOVI Reaction center protein M chain (Photosynthetic reaction center M subunit) - Rhodopseudomonas viridis.
ADYQTIYTQIQARGPHITVSGEWGDNDRVGKPFYSYWLGKIGDAQIGPIYLGASGIAAFAFGSTAILIILFNMAAEVHFDPLQFFRQFFWLGLYPPKAQYGMGIPPLHDGGWWLMAGLFMTLSLGSWWIRVYSRARALGLGTHIAWNFAAAIFFVLCIGCIHPTLVGSWSEGVPFGIWPHIDWLTAFSIRYGNFYYCPWHGFSIGFAYGCGLLFAAHGATILAVARFGGDREIEQITDRGTAVERAALFWRWTIGFNATIESVHRWGWFFSLMVMVSASVGILLTGTFVDNWYLWCVKHGAAPDYPAYLPATPDPASLPGAPK
![Page 30: Secondary Structure Prediction and Signal Peptides](https://reader035.vdocuments.site/reader035/viewer/2022062408/568134b0550346895d9bca3e/html5/thumbnails/30.jpg)
![Page 31: Secondary Structure Prediction and Signal Peptides](https://reader035.vdocuments.site/reader035/viewer/2022062408/568134b0550346895d9bca3e/html5/thumbnails/31.jpg)
![Page 32: Secondary Structure Prediction and Signal Peptides](https://reader035.vdocuments.site/reader035/viewer/2022062408/568134b0550346895d9bca3e/html5/thumbnails/32.jpg)
Meta-Servers
A meta-server allows you to obtain a wide range of information based on your sequence, including structure predictions and motif or domain searches. The predictions are based on several methods.
PredictProtein: http://predictprotein.org
![Page 33: Secondary Structure Prediction and Signal Peptides](https://reader035.vdocuments.site/reader035/viewer/2022062408/568134b0550346895d9bca3e/html5/thumbnails/33.jpg)
The PredictProtein meta-server
For sequence analysis and structure and function prediction. When you submit any protein sequence, PredictProtein retrieves similar sequences from the database and predicts aspects of protein structure and function.
• SEG: finds low-complexity regions.
• ProSite: database of functional motifs, i.e. biologically relevant short patterns.
• ProDom: a comprehensive set of protein domain families automatically generated from the SWISS-PROT and TrEMBL sequence databases.
• PROFsec (PHDsec): secondary structure.
• PROFacc (PHDacc): solvent accessibility.
• PHDhtm: transmembrane helices.
The sequence database is scanned for similar sequences (Blast, Psi-Blast).
Multiple sequence alignment profiles are generated by weighted dynamic programming (MaxHom).
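Searching for a short functional motif can be sketched by translating a PROSITE-style pattern into a regular expression. The sketch handles only the basic syntax elements (x, [..], {..}, repeat counts, and the < and > terminus anchors) and is not the scanner used by PredictProtein; the helper names and the test sequence are hypothetical. The example pattern N-{P}-[ST]-{P} is the PROSITE N-glycosylation site motif.

```python
import re

def prosite_to_regex(pattern):
    """Translate a basic PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        element = element.replace("{", "[^").replace("}", "]")  # {P}: any residue but P
        element = element.replace("(", "{").replace(")", "}")   # x(2,4): repeat counts
        element = element.replace("x", ".")                     # x: any residue
        element = element.replace("<", "^").replace(">", "$")   # terminus anchors
        parts.append(element)
    return "".join(parts)

def find_matches(seq, pattern):
    """Return (start, matched text) for all matches, including overlapping ones."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start(), m.group(1)) for m in regex.finditer(seq)]

print(prosite_to_regex("N-{P}-[ST]-{P}"))
print(find_matches("ANGTANPSA", "N-{P}-[ST]-{P}"))   # toy sequence
```

Short patterns like this one match by chance very often, so a hit is only a hint that needs support from other evidence.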
![Page 34: Secondary Structure Prediction and Signal Peptides](https://reader035.vdocuments.site/reader035/viewer/2022062408/568134b0550346895d9bca3e/html5/thumbnails/34.jpg)
PredictProtein Demo
Let's submit the sequence below again to http://predictprotein.org/
>uniprot|P00772|ELA1_PIG Elastase-1 precursor
MLRLLVVASLVLYGHSTQDFPETNARVVGGTEAQRNSWPSQISLQYRSGSSWAHTCGGTLIRQNWVMTAAHCVDRELTFRVVVGEHNLNQNDGTEQYVGVQKIVVHPYWNTDDVAAGYDIALLRLAQSVTLNSYVQLGVLPRAGTILANNSPCYITGWGLTRTNGQLAQTLQQAYLPTVDYAICSSSSYWGSTVKNSMVCAGGDGVRSGCQGDSGGPLHCLVNGQYAVHGVTSFVSRLGCNVTRKPTVFTRVSAYISWINNVIASN
For a list of mirror sites: http://predictprotein.org/newwebsite/doc/mirrors.html
![Page 35: Secondary Structure Prediction and Signal Peptides](https://reader035.vdocuments.site/reader035/viewer/2022062408/568134b0550346895d9bca3e/html5/thumbnails/35.jpg)
![Page 36: Secondary Structure Prediction and Signal Peptides](https://reader035.vdocuments.site/reader035/viewer/2022062408/568134b0550346895d9bca3e/html5/thumbnails/36.jpg)
Detailed results Summary view
![Page 37: Secondary Structure Prediction and Signal Peptides](https://reader035.vdocuments.site/reader035/viewer/2022062408/568134b0550346895d9bca3e/html5/thumbnails/37.jpg)
Results
![Page 38: Secondary Structure Prediction and Signal Peptides](https://reader035.vdocuments.site/reader035/viewer/2022062408/568134b0550346895d9bca3e/html5/thumbnails/38.jpg)
References
Documentation:
• COILS: http://www.ch.embnet.org/software/coils/COILS_doc.html
• TMPred: http://www.ch.embnet.org/software/tmbase/TMBASE_doc.html
• MPEx: http://blanco.biomol.uci.edu/mpex/MPEXdoc.html
Articles:
• B. Rost: Evolution teaches neural networks. In Scientific applications of neural nets. Ed. J.W. Clark, T. Lindenau, M.L. Ristig, 207-223 (1999).
• D.T. Jones: Protein Secondary Structure Prediction Based on Position-specific Scoring Matrices. J. Mol. Biol. 292, 195-202 (1999).
• B. Rost: Prediction in 1D: Secondary Structure, Membrane Helices, and Accessibility. In Structural Bioinformatics (reference below).
Books:
• P.E. Bourne, H. Weissig: Structural Bioinformatics. Wiley-Liss, 2003.
• A. Tramontano: Protein Structure Prediction. Wiley-VCH, 2006.
•Skip this
![Page 39: Secondary Structure Prediction and Signal Peptides](https://reader035.vdocuments.site/reader035/viewer/2022062408/568134b0550346895d9bca3e/html5/thumbnails/39.jpg)
Signal Peptides
A signal peptide is a short peptide chain that directs the transport of a protein.
The peptide is usually located at the N- or C-terminus.
Targets in eukaryotes: ER, nucleus, nucleolus, mitochondrion, peroxisome.
Bacteria use them to secrete proteins.
When no sequence homology is available, signal peptides can still indicate the potential location of the protein => a hint to function.
![Page 40: Secondary Structure Prediction and Signal Peptides](https://reader035.vdocuments.site/reader035/viewer/2022062408/568134b0550346895d9bca3e/html5/thumbnails/40.jpg)
Prediction of signal peptides
The challenge is to distinguish a weak signal from the background noise.
Various machine learning methods are used:
• Hidden Markov Models (HMM)
• Neural Networks
Most popular tool: SignalP http://www.cbs.dtu.dk/services/SignalP/
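Why the signal is weak can be seen from a crude heuristic. N-terminal secretory signal peptides contain a hydrophobic core, so the sketch below reports the most hydrophobic window near the N-terminus. It is a toy illustration and not the SignalP method; the hydrophobic alphabet, the region length of 30, and the window of 10 are assumptions. The test sequence is the start of the elastase precursor used in the PredictProtein demo.

```python
HYDROPHOBIC = set("AILMFVWC")   # assumption: a simple hydrophobic alphabet

def n_terminal_hydrophobic_fraction(seq, region=30, window=10):
    """Highest fraction of hydrophobic residues found in any window
    within the first `region` residues of the sequence."""
    head = seq[:region]
    if len(head) < window:
        return 0.0
    best = 0.0
    for i in range(len(head) - window + 1):
        hits = sum(aa in HYDROPHOBIC for aa in head[i:i + window])
        best = max(best, hits / window)
    return best

elastase_start = "MLRLLVVASLVLYGHSTQDFPETNARVVGG"   # first 30 residues of ELA1_PIG
print(n_terminal_hydrophobic_fraction(elastase_start))
```

Such a measure cannot tell a signal peptide from an N-terminal transmembrane helix and says nothing about the cleavage site, which is why trained models are used in practice.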
![Page 41: Secondary Structure Prediction and Signal Peptides](https://reader035.vdocuments.site/reader035/viewer/2022062408/568134b0550346895d9bca3e/html5/thumbnails/41.jpg)
Prediction of cellular localization
Tools that predict the cellular localization automatically:
• Wolf Psort: http://wolfpsort.org/
• TargetP: http://www.cbs.dtu.dk/services/TargetP/
![Page 42: Secondary Structure Prediction and Signal Peptides](https://reader035.vdocuments.site/reader035/viewer/2022062408/568134b0550346895d9bca3e/html5/thumbnails/42.jpg)
Signal Peptide Database
http://www.signalpeptide.de/
• Collection of information on known and predicted signal peptide - protein pairs
• Allows search by sequence name and keywords
• Advanced search allows limiting hits to a single species
This is useful when looking for extra information on a known protein.