© Burkhard Rost (TUM Munich)
title: Intro protein function - motifs
short title: pp2_introfunc3
lecture: Protein Prediction 2 - Protein function, TUM winter 2013
Monday November 4, 2013
Announcements
Videos: SciVe / www.rostlab.org
THANKS: Tim Karl + Jona Reeb
Special lectures:
• Oct 29: Tobias Hamp
• Nov 21: Tanya Goldberg
• Dec 19 + Jan 09: Andrea Schafferhans
• Jan 28: Arthur Dong
• Jan 21+23: Marco De Vivo/Marco Punta
No lecture:
• Oct 10 Thu (Reformation)
• Nov 12 Tue (Student assembly)
• Dec 12 Thu (TUM Dies Academicus)
LAST lecture: Jan 30
Exam: Feb 4 (likely this room)
• Makeup: Apr 9 - morning
CONTACT: Marlena Drabik [email protected]
Past - TOC today - Next
So far: Function introduction
• Molecular biology is just at an exciting beginning
• We can compute some aspects of molecular life
• Most accurate inference of function: based on homology
Today
• Motifs
• Function by association
NEXT
• Localization
I.2b Function Intro: Sequence motifs
Homology-based inference of function
Homology-based inference
P1 is a Cytochrome P450
P2 is sequence-similar to P1 -> also a Cytochrome P450?
MELLQLWSALIILVVTYTISLLINQWRKPKPQGKFPPGPPKLPLIGHLHLLWGKLPQHALASVAKEYGPVAHVQLGEVFSVVLSSREATKEAMKLVDPACANRFESIGTRIMWYDNEDIIFSPYSEHWRQMRKICVSELLSSRNVRSFGFIRQDEVSRLLRHLRSSAGAAVDMTERIETLTCSIICRAAFGSVIRDNAELVGLVKDALSMASGFELADMFPSSKLLNLLCWNKSKLWRMRRRVDTILEAIVDEHKFKKSGEFGGEDIIDVLFRMQKDTQIKVPITTNSIKAFIFDTFSAGTETSSTTTLWVLAELMRNPAVMAKAQAEVRAALKEKTNWDVDDVQELKYMKSVVKETMRMHPPIPLIPRSCREECVVNGYTIPNKARIMINVWSMGRNPLYWEKPDTFWPERFDQVSKDFMGNDFEFVPFGAGRRICPGLNFGLANVEVPLAQLLYHFDWKLAEGMKPSDMDMSEAEGLTGILKNNLLLVPTPYDPSS
MELDLLSAIIILVATYIVSLLINQWRKSKSQQNLPPSPPKLPVIGHLHFLWGGLPQHVFRSIAQKYGPVAHVQLGEVYSVVLSSAEAAKQAMKVLDPNFADRFDGIGSRTMWYDKDDIIFSPYNDHWRQMRRICVTELLSPKNVRSFGYIRQEEIERLIRLLGSSGGAPVDVTEEVSKMSCVVVCRAAFGSVLKDQGSLAELVKESLALASGFELADLYPSSWLLNLLSLNKYRLQRMRRRLDHILDGFLEEHREKKSGEFGGEDIVDVLFRMQKGSDIKIPITSNCIKGFIFDTFSAGAETSSTTISWALSELMRNPAKMAKVQAEVREALKGKTVVDLSEVQELKYLRSVLKETLRLHPPFPLIPRQSREECEVNGYTIPAKTRIFINVWAIGRDPQYWEDPDTFRPERFDEVSRDFMGNDFEFIPFGAGRRICPGLHFGLANVEIPLAQLLYHFDWKLPQGMTDADLDMTETPGLSGPKKKNVCLVPTLYKSP
similar sequence -> similar function? (annotation transfer)
Important, but: HANDLE WITH CARE!!
Target selection
What else to use (in absence of structure)?
What could it be?
Serine protease (ia46): trypsin domain of human prothrombin
Histidine - Aspartate - Serine
CJA Sigrist, L Cerutti, N Hulo, A Gattiker, L Falquet, M Pagni, Amos Bairoch & Philipp Bucher (2002) Briefings in Bioinformatics 3:265-274 (Fig. 1)
Motifs - intro
Sequence vs motif
Full sequence (ADH1_human, 95 aa): MANEVIKCKAAVAWEAGKPLSIEEIEVAPPKAHEVRIKIIATAVCHTDAY TLSGADPEGCFPVILGHEGAGIVESVGEGVTKLKAVWRMQILSKS
Motifs could be:
MANEVIKCKAA
Or: MAN[ED]hh[KR]C[KR]
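A minimal Python sketch of how such a degenerate motif could be scanned against a sequence. It assumes that the lowercase `h` in the motif stands for "any hydrophobic residue" and expands it to the residue set `AVILMFWC`; that set and the helper name `motif_to_regex` are illustrative assumptions, not taken from the slide.

```python
import re

# Assumption: 'h' means "hydrophobic"; this residue set is an illustrative choice.
HYDROPHOBIC = "AVILMFWC"

def motif_to_regex(motif: str) -> str:
    """Expand the shorthand 'h' into a character class; the rest is already regex-like."""
    return motif.replace("h", f"[{HYDROPHOBIC}]")

# ADH1_human fragment from the slide (95 residues, whitespace removed).
adh1 = ("MANEVIKCKAAVAWEAGKPLSIEEIEVAPPKAHEVRIKIIATAVCHTDAY"
        "TLSGADPEGCFPVILGHEGAGIVESVGEGVTKLKAVWRMQILSKS")

pattern = re.compile(motif_to_regex("MAN[ED]hh[KR]C[KR]"))
hit = pattern.search(adh1)
if hit:
    # Report the 1-based start position and the matched segment.
    print(hit.start() + 1, hit.group())
```

The exact string `MANEVIKCKAA` only finds identical copies, whereas the degenerate form also accepts conservative substitutions at the bracketed positions.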
How can we use this concept to search?
Resources for motifs/patterns
PROSITE: http://us.expasy.org/prosite/ [Hulo et al., Nucl. Acids Res. 32:D134-D137 (2004)]
PRINTS: http://www.bioinf.manchester.ac.uk/dbbrowser/PRINTS/ [Attwood, Briefings in Bioinformatics 3(3):252-263 (2002)]
BLOCKS: http://www.blocks.fhcrc.org/ [Henikoff et al., Nucl. Acids Res. 28:228-230 (2000)]
PROSITE
Amos Bairoch
1986 starts SWISS-PROT
1988 starts PROSITE
1993 starts ExPASy (with Ron Appel)
1998 SIB: Swiss Institute of Bioinformatics
2009 CALIPHO: Computer and Laboratory Investigation of Proteins of Human Origin
Amos Bairoch
FILL IN papers asf!!!!papers: • >300 papers (Nov 2011)• 3 >1,000 citations (end 2011)• 72 over 100• H-index 83 (ISI Nov 2011)xother• x
Shapers and Shakers
Motifs and patterns
Manually align family + annotate motifs
Use motifs for automatic alignment and annotation of unknown
Find a motif or a pattern in a functionally characterized family -> Search for the motif pattern in a new protein -> Transfer function annotation
© Marco Punta & Yanay Ofran & Burkhard Rost (Columbia New York)
PROSITE: Concepts for DB
• completeness: DB as many motifs as possible
• high specificity: no false positives at a level at which most are found
• documentation
• periodic reviewing
PROSITE history
Bairoch A (1991) NAR 19:2241-5. PROSITE: a dictionary of sites and patterns in proteins (repeated: 1992, 1993)
Solution:
GxxGxxG (membrane)
[RK](2)-x-[ST] (phosphorylation)
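Patterns written this way can be scanned mechanically once they are translated into regular expressions. The sketch below is a simplified translator for hyphen-separated PROSITE-style patterns (`x` for any residue, `[..]` for allowed residues, `{..}` for forbidden residues, `(n)` or `(n,m)` for repeats, `<` and `>` for the termini). The function name and the toy sequence are hypothetical, and this is not the ScanProsite implementation; the membrane pattern is rewritten with hyphens as `G-x-x-G-x-x-G` so that it fits the same syntax.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a hyphen-separated PROSITE-style pattern into a Python regex (simplified)."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        element = element.replace("<", "^").replace(">", "$")      # terminus anchors
        element = element.replace("x", ".")                         # x = any residue
        element = element.replace("{", "[^").replace("}", "]")      # {..} = forbidden residues
        element = re.sub(r"\((\d+(?:,\d+)?)\)", r"{\1}", element)   # (n) or (n,m) = repeat count
        parts.append(element)
    return "".join(parts)

phospho = prosite_to_regex("[RK](2)-x-[ST]")
membrane = prosite_to_regex("G-x-x-G-x-x-G")

toy_sequence = "AAKRASAA"  # hypothetical sequence for illustration only
match = re.search(phospho, toy_sequence)
print(phospho, membrane, match.group() if match else None)
```

Short patterns such as these match many unrelated proteins by chance, which is why the hits still need the "handle with care" treatment.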
PROSITE history
A Bairoch (1991) NAR 19:2241-5. PROSITE: a dictionary of sites and patterns in proteins (repeated: 1992, 1993)
A Bairoch & P Bucher (1994) NAR 22:3583-9. PROSITE: recent developments (profiles)
A Bairoch, P Bucher & K Hofmann (1996) NAR 24:189-96 (repeated 1997, 1999: Hofmann, Bucher, Falquet, Bairoch)
L Falquet, M Pagni, P Bucher, N Hulo, CJ Sigrist, K Hofmann & A Bairoch (2002) NAR 30:235-8
Philipp Bucher
PROSITE history
CJ Sigrist, L Cerutti, N Hulo, A Gattiker, L Falquet, M Pagni, A Bairoch, P Bucher (2002) Brief Bioinform 3:265-74
N Hulo, CJ Sigrist, V Le Saux, PS Langendijk-Genevaux, L Bordoli, A Gattiker, E De Castro, P Bucher, A Bairoch (2004) NAR 32:D134-7
A Gattiker, E Gasteiger, A Bairoch (2002) Appl Bioinformatics 1:107-8. ScanProsite: a reference implementation of a PROSITE scanning tool
PROSITE - evolution of method
A Bairoch (1991) NAR 19 Suppl: 2241-5
prev (1992) NAR 20 Suppl: 2013-8
x (1993) NAR 21: 3097-103
A Bairoch and P Bucher (1994) NAR 22: 3583-9
A Bairoch, P Bucher and K Hofmann (1996) NAR 24: 189-96
prev (1997) NAR 25: 217-21
K Hofmann, P Bucher, L Falquet and A Bairoch (1999) NAR 27: 215-9
L Falquet, M Pagni, P Bucher, N Hulo, CJ Sigrist, K Hofmann and A Bairoch (2002) NAR 30: 235-8
A Gattiker, E Gasteiger and A Bairoch (2002) Appl Bioinformatics 1: 107-8
CJ Sigrist, L Cerutti, N Hulo, A Gattiker, L Falquet, M Pagni, A Bairoch and P Bucher (2002) Brief Bioinform 3: 265-74
N Hulo, CJ Sigrist, V Le Saux, PS Langendijk-Genevaux, L Bordoli, A Gattiker, E De Castro, P Bucher and A Bairoch (2004) NAR 32: D134-7
CJ Sigrist, E De Castro, PS Langendijk-Genevaux, V Le Saux, A Bairoch and N Hulo (2005) Bioinformatics 21: 4060-6
E de Castro, CJ Sigrist, A Gattiker, V Bulliard, PS Langendijk-Genevaux, E Gasteiger, A Bairoch and N Hulo (2006) NAR 34: W362-5
N Hulo, A Bairoch, V Bulliard, L Cerutti, E De Castro, PS Langendijk-Genevaux, M Pagni and CJ Sigrist (2006) NAR 34: D227-30
N Hulo, A Bairoch, V Bulliard, L Cerutti, BA Cuche, E de Castro, C Lachaize, PS Langendijk-Genevaux and CJ Sigrist (2008) NAR 36: D245-9
PROSITE / ScanProsite
© Marco Punta & Yanay Ofran & Burkhard Rost (Columbia New York)
K Hofmann, P Bucher, L Falquet & A Bairoch (1999) Nucl Acids Res 27: 215-9
N Hulo et al. (2004) Nucleic Acids Res 32: D134-7
PRINTS
Terry K Attwood
University of Manchester (Faculty of Life Sciences & School of Computer Sciences)
PRINTS: diagnostic fingerprint database
TK Attwood & ME Beck (1994) PRINTS - a protein motif fingerprint database
PRINTS concept
Motifs are stretches of evolutionary conserved fingerprints
version 42.0 (Manchester Univ, Feb 2012): 2,156 FINGERPRINTS encoding 12,444 single motifs
TK Attwood, P Bradley, DR Flower, A Gaulton, N Maudling, A Mitchell, G Moulton, A Nordle, K Paine, P Taylor, A Uddin, C Zygouri (2003) NAR 31:400-2
PRINTS: example homeobox
The homeobox is a 60-residue motif first identified in a number of Drosophila homeotic and segmentation proteins, but now known to be well-conserved in many other animals, including vertebrates [1-3]. Proteins containing homeobox domains are likely to play an important role in development - most are known to be sequence-specific DNA-binding transcription factors. The domain binds DNA through a helix-turn-helix (HTH) structure.
BLOCKS
Jorja & Steven Henikoff
Fred Hutchinson Cancer Center, Seattle
HHMI (Howard Hughes Medical Institute)
papers:
• >300 papers (Nov 2011)
• 3 >1,000 citations (end 2011)
• 72 over 100
• H-index 83 (ISI Nov 2011)
Paradigm changes
• gene in gene - in intron (1986)
• histones NOT only in octamers (2004)
• DNA-methylation in histones: H2A.Z in histone spool promotes gene expression (2008): NOT DNA-methylation shuts off genes (important for cancer drug development)
Shapers and Shakers
BLOSUM
compile log-odds ratios
BLOSUMn: threshold at n% pairwise sequence identity
S Henikoff & JG Henikoff (1992) PNAS 89:10915-9
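The log-odds idea can be made concrete with a small Python sketch. It counts residue pairs in the columns of one ungapped block, converts the counts into observed pair frequencies, and scores each pair as 2·log2(observed/expected), i.e. in half-bit units. The toy block is hypothetical, and the sketch deliberately skips the clustering of sequences at n% identity that distinguishes BLOSUMn matrices, so the numbers it produces are illustrative only.

```python
import math
from collections import Counter
from itertools import combinations

def log_odds_from_block(block):
    """Return {(a, b): score in half-bits}, with a <= b, from one ungapped block."""
    pair_counts = Counter()
    for column in zip(*block):
        for a, b in combinations(column, 2):
            pair_counts[tuple(sorted((a, b)))] += 1
    total = sum(pair_counts.values())
    observed = {pair: n / total for pair, n in pair_counts.items()}

    # Background frequency of each residue, derived from the pair frequencies.
    background = Counter()
    for (a, b), freq in observed.items():
        if a == b:
            background[a] += freq
        else:
            background[a] += freq / 2
            background[b] += freq / 2

    scores = {}
    for (a, b), freq in observed.items():
        expected = background[a] * background[b] * (1 if a == b else 2)
        scores[(a, b)] = round(2 * math.log2(freq / expected))
    return scores

toy_block = ["AVL", "AIL", "AVL", "SVL"]  # hypothetical aligned block, not real data
scores = log_odds_from_block(toy_block)
print(scores)
```

With only four sequences the resulting scores are not meaningful; real matrices are compiled from many conserved blocks.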
Steven Henikoff
BLOSUM
BLOcks of amino acid SUbstitution Matrices
Align only conserved regions
JG Henikoff and S Henikoff (1996) Meth Enzymology 266: 88-104
S Pietrokovski, JG Henikoff & S Henikoff (1996) NAR 24: 197-201
BLOCKS
idea taken from multiple alignments
BLOCKS: length distribution
J Liu & B Rost (2003) Current Opinion in Chemical Biology 7, 5-11
Pfam
Alex Bateman
classify all proteins and RNA into families to better understand their function and evolution
1997 starts Pfam (Protein families)
2003 Rfam (RNA families)
Citation giant:
• 229 papers (Nov 2011)
• 1 with >8,800 citations (Nov 2011)
• 6 with >1,000 citations (11/2011)
• 32 with >100 citations (11/2011)
• Hirsch index: 48
Shapers and Shakers
Pfam: Protein families
EL Sonnhammer, SR Eddy, R Durbin (1997) Pfam: a comprehensive database of protein families based on seed alignments. Proteins 28:405-20
EL Sonnhammer, SR Eddy, E Birney, A Bateman, R Durbin (1998) NAR 26:320-2
A Bateman, E Birney, R Durbin, SR Eddy, RD Finn, EL Sonnhammer (1999) Pfam 3.1: 1313 multiple alignments and profile HMMs match the majority of proteins. NAR 27:260-2
SJ Sammut, RD Finn, A Bateman (2008) Pfam 10 years on: 10,000 families and still growing. Brief. Bioinform 9:210-9
Pfam: how it's done
manual alignment
Pfam - current stats
[Figure: version / families]
Pfam-7TM
A Bateman, et al. (2004) Nucleic Acids Res 32: D138-41
© Marco Punta & Yanay Ofran & Burkhard Rost (Columbia New York)
Clusters & Families
DB/Method   Version  Latest update  Entries  Update       URL (all begin with http://)
Short sequence motifs
PROSITE     17.23    10/2002        1573     manual       www.expasy.ch/prosite/
Blocks+     -        8/2001         8656     manual       blocks.fhcrc.org/blocks/
PRINTS      35.0     7/2002         1750     manual       www.bioinf.man.ac.uk/dbbrowser/PRINTS/
Structural domain-like regions
Pfam-A      7.6      9/2002         4463     manual       pfam.wustl.edu
TIGRFAM     2.1      9/2002         1622     manual       www.tigr.org/TIGRFAMs/
SMART       3.4      10/2002        654      manual       smart.embl-heidelberg.de
SBASE       9.0      10/2002        483      semi-manual  hydra.icgeb.trieste.it/~kristian/SBASE/
DOMO        2.0      4/1998         -        automatic    www.infobiogen.fr/services/domo/
ProDom      2001.3   12/2001        -        automatic    prodes.toulouse.inra.fr/prodom/doc/prodom.htm
GeneRAGE    -        -              -        automatic    www.ebi.ac.uk/research/cgg/services/rage/
TribeMCL    -        -              -        automatic    www.ebi.ac.uk/research/cgg/tribe/
CHOP        -        10/2002        -        automatic    cubic.bioc.columbia.edu/db/chop/
Integration
InterPro    5.2      9/2002         5875     N/A          www.ebi.ac.uk/interpro/
MetaFam     4.1      9/2002         -        N/A          metafam.ahc.umn.edu
Clusters of proteins
CluSTr      -        -              -        automatic    www.ebi.ac.uk/clustr/
SYSTERS     3.0      -              -        automatic    systers.molgen.mpg.de
PICASSO     0        3/1998         -        automatic    systers.molgen.mpg.de
ProtoNet    1.4      9/2002         -        automatic    www.protonet.cs.huji.ac.il/protonet/
ProClust    1.0      -              -        automatic    promoter.mi.uni-koeln.de/~proclust/
J Liu & B Rost (2003) Cur Op Chem Biol 7, 5-11
Some overlap between databases
J Liu & B Rost (2003) Cur Op Chem Biol 7, 5-11
… not everything that shines is copper
J Liu & B Rost (2003) Cur Op Chem Biol 7, 5-11
Localization motifs
Motif-based inference of localization
Rajesh Nair (now: FDA, Washington)
Similar proteins may differ in localization
R Nair & B Rost (2002) Protein Science 11: 2836-47
Shuttle into the nucleus
[Figure labels: CYTOPLASM, NUCLEUS, NLS, M9, Transportin, Importin]
M Cokol, R Nair & B Rost (2000) EMBO Rep 1: 411-415
Types of zip-codes
following: B Alberts, D Bray, J Lewis, M Raff, K Roberts, JD Watson: The Cell, Garland, 1994
How many NLS motifs in databases?
ONE in PROSITE: bi-partite motif
Set [A]            N NLS [B]  Nprot nuc [C]  Nfam nuc [D]  Accuracy [E]  Coverage [F]
PROSITE                    1             96            31          90 %           3 %
SWISS-PROT               322            290             -          n.a.           9 %
NLS-lit cleaned           91            309            35         100 %          10 %
NLS-lit consensus         91            537            35         100 %          17 %
PredictNLS_DB            214           1354           186         100 %          43 %
Experimental NLS: positive charges
NLS                     Protein         Reference
RKRKK                   YstDNApolalpha  Hsieh et al., 1998
RKRRR                   Amida           Irie et al., 2000
KKKKRKREK               LEF-1           Prieve et al., 1998
KKKRRSREK               TCF-1           Prieve et al., 1998
RQARRNRRRRWR            HIV-1 Rev       Truant et al., 1999
RRMKWKK                 PDX-1           Moede et al., 1999
PKKKRKV                 SV40 LrgT       Kalderon et al., 1984
PRRRK                   SRY             Sudbeck and Scherer, 1997
GKKRSKA                 H2B             Moreland et al., 1987
KAKRQR                  v-Rel           Gilmore and Temin, 1988
RGRRRRQR                Amida           Irie et al., 2000
PPVKRERTS               RanBP3          Welch et al., 1999
PYLNKRKGKP              Pho4p           Welch et al., 1999
KRx{7,9}PQPKKKP         p53-NLS1        Liang and Clarke, 1999
KVTKRKHDNEGSGSKRPK      Hum-Ku70        Koike et al., 1999
RLKKLKCSKx{19}KTKR      GAL4            Chan et al., 1998
RKRIREDRKx{18}RKRKR     TCPTP           Chan et al., 1998
RRERx{4}RPRKIPR         BDV-P           Schwemmle et al., 1999
KKKKKEEEGEGKKK          act/inh betaA   Blauer et al., 1999
PRPRKIPR                BDV-P           Shoya et al., 1998
PPRIYPQLPSAPT           BDV-P           Shoya et al., 1998
KDCVINKHHRNRCQYCRLQR    TR2             Yu et al., 1998
APKRKSGVSKC             PolyomaVP1      Chang et al., 1992
RKKRRQRRR               HIV-1 Tat       Truant et al., 1999
MPKTRRRPRRSQRKRPPT      Rex             Palmeri and Malim, 1999
KRPMNAFIVWSRDQRRK       SRY             Sudbeck and Scherer, 1997
KRPMNAFMVWAQAARRK       SOX9            Sudbeck and Scherer, 1997
PPRKKRTVV               NS5A            Ide et al., 1996
YKRPCKRSFIRFI           DNAse EBV       Liu et al., 1998
LKDVRKRKLGPGH           DNAse EBV       Lyons et al., 1987
KRPRP                   AdenovE1a       Bouvier and Baldacci, 1995
RRSMKRK                 hVDR            Vihinen-Ranta et al., 1997
PAKRARRGYK              CPV capsid      Kaneko et al., 1997
RKCLQAGMNLEARKTKK       hGlu.cort.      Kaneko et al., 1997
RRERNKMAAAKCRNRRR       CFOS            Kaneko et al., 1997
KRMRNRIAASKCRKRKL       CJUN            Kaneko et al., 1997
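The slide title's point, that these experimental NLSs are rich in positive charges, can be checked with a few lines of Python. The sketch below counts lysine (K) and arginine (R) only; leaving out histidine is a simplifying assumption, and the function name is hypothetical.

```python
def positive_fraction(nls: str) -> float:
    """Fraction of residues in the segment that are lysine (K) or arginine (R)."""
    return sum(residue in "KR" for residue in nls) / len(nls)

# Three gap-free entries taken from the table above.
examples = {"SV40 LrgT": "PKKKRKV", "HIV-1 Tat": "RKKRRQRRR", "SRY": "PRRRK"}
for protein, nls in examples.items():
    print(f"{protein:10s} {nls:10s} {positive_fraction(nls):.2f}")
```

Entries containing spacers such as `x{19}` would need the spacer removed or expanded before being passed to this function.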
Experimental NLS: more complicated
NLS                                     Protein        Reference
CYGSKNTGAKKRKIDDA                       DNAhelicaseQ1  Miyamoto et al., 1997
[AKR]TPIQKHWRPTVLTEGPPVKIRIETGEWE[KA]   ASVintegrase   Kukolj G. 1998
GGGx{3}KNRRx{6}RGGRN                    Nab2           Truant et al., 1998
KRxxxxxxxxxKTKK                         THOV NP        Weber et al., 1998
EYLSRKGKLEL                             VirD2-Nterm    Tinland et al., 1992
KRPACTLKPECVQQLLVCSQEAKK                HCDA           Somasekaram et al., 1999
RVHPYQR                                 QKI-5          Wu et al., 1999
                                        HARNT          Eguchi et al., 1997
YNNQSSNFGPMKGGN                         M9             Bonifaci et al., 1997
SxGTKRSYxxM                             InfluenzaNP    Wang et al., 1997
TKRSxxxM                                InfluenzaNP    Wang et al., 1997
VNEAFETLKRC                             MyoD           Vandromme et al., 1995
MNKIPIKDLLNPG                           Mat-alpha      Hall et al., 1984
In silico mutagenesis
Increasing accuracy and coverage
Types of zip-codes
Sarah Gilman
Kaz Wrzeszczynski
ER & Golgi retention signals
Sequence motif [1]                     ER/Golgi N  ER/Golgi %  Non-ER/Golgi N  Non-ER/Golgi %
Endoplasmic reticulum (ER) motifs [2]
KDEL-C-term                                    56          92               5               8
KDEL                                           61           7             714              92
HDEL-C-term                                    45          92               4               8
HDEL                                           46          15             269               2
HDEF-C-term                                     2          50               2              50
HDEF                                            2           2              89              98
Golgi apparatus motifs [3]
YQRL                                            3           1             270              99
YKGL                                            5           1             442              99
YHPL                                            4           5              76              95
YXXZ                                          477           1           83112              99
NPFKD                                           0           0              14             100
FXFXD                                          31           1            3169              99
FQFND                                           1          25               3              75
PXPXP                                          65           1            8477              99
X                                             479           1           80461              99
GRIP-motif [5]                                  1          50               1              50
GRIP-motif (shortened) [6]                      1           3              28              97
C-term variations [4]
PROSITE Pattern [7]                           134          77              39              23
{KH}DEL                                        86          78               5               4
{KHR}{DENQ}EL                                 125          80              32              20
{KHR}{DENQ}L                                  125          71              49              29
{KHRDENQAS}{DENQIYCV}{DENQ}L                  156          25             477              75
{KRDEAVYF}{KRDEVYFMQ}{KHED}{DK}EL              39          89               5              11
KO Wrzeszczynski & B Rost (2004) CMLS 61: 1341-53
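The table shows why position matters: KDEL at the C-terminus is found mostly in ER/Golgi proteins (92 %), while KDEL anywhere in the sequence is found mostly outside (only 7 % ER/Golgi). A minimal Python sketch of the two tests follows. It reads the table's `{KH}DEL` as "K or H, followed by DEL", which is an interpretation; the function names and toy sequences are hypothetical.

```python
import re

# Assumption: {KH}DEL in the table means "K or H, then DEL".
# '$' anchors the match to the C-terminal end of the sequence.
C_TERMINAL_SIGNAL = re.compile(r"[KH]DEL$")
ANYWHERE_SIGNAL = re.compile(r"[KH]DEL")

def has_c_terminal_signal(sequence: str) -> bool:
    """True if the sequence ends in KDEL or HDEL."""
    return C_TERMINAL_SIGNAL.search(sequence) is not None

def has_signal_anywhere(sequence: str) -> bool:
    """True if KDEL or HDEL occurs at any position."""
    return ANYWHERE_SIGNAL.search(sequence) is not None

# Hypothetical toy sequences, not real proteins.
retained = "MKTAYIAKQRKDEL"
internal = "MKDELTAYIAKQR"
for seq in (retained, internal):
    print(seq, has_c_terminal_signal(seq), has_signal_anywhere(seq))
```

Anchoring the pattern to the C-terminus is what turns an unspecific four-residue match into a useful predictor.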
Open challenges - motifs and patterns
Automate
Unify
Remote homologues
© Marco Punta & Yanay Ofran & Burkhard Rost (Columbia New York)
Structural motifs
Manual identification of active site
Automatic structural alignment?
Identify active site / functional element -> Search for this structural pattern in a new protein -> Transfer function annotation
S Jones & J Thornton (2004) Curr Opin Struc Biol 8:3-7
© Marco Punta & Yanay Ofran & Burkhard Rost (Columbia New York)
Open challenges - structural motifs
Find
Search
Add biophysics of the site to the spatial search
© Marco Punta & Yanay Ofran & Burkhard Rost (Columbia New York)
Example 3: Voltage-gated potassium channel
V Ruta et al. & R MacKinnon (2003) Nature, 422:180-5
• Eukaryotic voltage-gated potassium channel (VG-K+)
• Prokaryotic membrane proteins are easier to crystallize than eukaryotic ones
• Find a prokaryotic VG-K+ having functional and structural features similar to the eukaryotic one
© Marco Punta
Voltage-gated K+ channel: sequence
1 MAAVAGLYGLGEDRQHRKKQQQQQQHQKEQLEQKEEQKKIAERKLQLREQQLQRNSLDGY
GSLPKLSSQDEEGGAGHGFGGGPQHFEPIPHDHDFCERVVINVSGLRFETQLRTLNQFPD
TLLGDPARRLRYFDPLRNEYFFDRSRPSFDAILYYYQSGGRLRRPVNVPLDVFSEEIKFY
ELGDQAINKFREDEGFIKEEERPLPDNEKQRKVWLLFEYPESSQAARVVAIISVFVILLS
IVIFCLETLPEFKHYKVFNTTTNGTKIEEDEVPDITDPFFLIETLCIIWFTFELTVRFLA
CPNKLNFCRDVMNVIDIIAIIPYFITLATVVAEEEDTLNLPKAPVSPQDKSSNQAMSLAI
LRVIRLVRVFRIFKLSRHSKGLQILGRTLKASMRELGLLIFFLFIGVVLFSSAVYFAEAG
SENSFFKSIPDAFWWAVVTMTTVGYGDMTPVGVWGKIVGSLCAIAGVLTIALPVPVIVSN
FNYFYHRETDQEEMQSQNFNHVTSCPYLPGTLGQHMKKSSLSESSSDMMDLDDGVESTPG
LTETHPGRSAVAPFLGAQQQQQQQPVASSLSMSIDKQLQHPLQHVTQTQLYQQQQQQQQQ
QQNGFKQQQQQTQQQLQQQQSHTINASAAAATSGSGSSGLTMRHNNALAVSIETDV
The template: voltage-gated potassium channel from Shaker
Voltage-gated K+ channel: search
PSI-BLAST: http://www.ncbi.nih.gov/BLAST/
Voltage-gated K+ channel: alignment
Shaker: 413 AVYFAEAGSENSFFKSIPDAFWWAVVTMTTVGYGDMTPVGVWGKIVGSLCAIAGVLTIAL 472
            A+Y  E    NS  KS+ DA WWAVVT TTVGYGD+ P    GK++G    + G+  + L
Target: 150 AIYIVEYPDPNSSIKSVFDALWWAVVTATTVGYGDVVPATPIGKVIGIAVMLTGISALTL 209

Shaker: 473 PVPVIVSNF 481
             +  + + F
Target: 210 LIGTVSNMF 218
the alignment
~ 30% SI and 80 aligned residues
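Percentage sequence identity (SI) can be computed directly from an alignment. The sketch below defines it as identical pairs divided by aligned (gap-free) columns, which is one of several definitions in use and an assumption here. It is applied only to the 60-residue block shown on the slide, so the value it prints describes that excerpt rather than the full 80-residue alignment.

```python
def percent_identity(a: str, b: str) -> float:
    """Identical positions as a percentage of columns in which neither row has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    identical = sum(x == y for x, y in columns)
    return 100.0 * identical / len(columns)

# First alignment block from the slide (Shaker 413-472 vs. target 150-209).
shaker_block = "AVYFAEAGSENSFFKSIPDAFWWAVVTMTTVGYGDMTPVGVWGKIVGSLCAIAGVLTIAL"
target_block = "AIYIVEYPDPNSSIKSVFDALWWAVVTATTVGYGDVVPATPIGKVIGIAVMLTGISALTL"
print(f"{percent_identity(shaker_block, target_block):.1f} % over {len(shaker_block)} columns")
```

The conserved pore region shown here is more similar than the alignment as a whole, which is why identity should always be quoted together with the alignment length.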
Voltage-gated K+ channel: filter
Voltage-gated K+ channel: alignment
Target (residues 1-295): the entire sequence of the identified protein
MSVERWVFPGCSVMARFRRGLSDLGGRVRNIGDVMEHPLVELGVSYAALLSVIVVVVEYT
MQLSGEYLVRLYLVDLILVIILWADYAYRAYKSGDPAGYVKKTLYEIPALVPAGLLALIE
GHLAGLGLFRLVRLLRFLRILLIISRGSKFLSAIADAADKIRFYHLFGAVMLTVLYGAFA
IYIVEYPDPNSSIKSVFDALWWAVVTATTVGYGDVVPATPIGKVIGIAVMLTGISALTLL
IGTVSNMFQKILVGEPEPSCSPAKLAEMVSSMSEEEFEEFVRTLKNLRRLENSMK
Voltage-gated K+ channel: function?
Shaker channel
• Membrane protein
[Figure: membrane proteins, Out / In: α-bundle and β-barrel]
Voltage-gated K+ channel: TMH predicted
[Shaker sequence (residues 1-656) with predicted transmembrane helices marked: S1, S2, S3, S4, S5, P, S6]
[Figure: side view, single subunit; top view, tetramer]
Programs available for membrane helices
PHDpsihtm: http://www.embl-heidelberg.de/predictprotein/predictprotein.html
DAS: http://www.sbc.su.se/~miklos/DAS/maindas.html
HMMTOP2: http://www.enzim.hu/hmmtop/
SOSUI: http://sosui.proteome.bio.tuat.ac.jp/sosuiframe0.html
TMHMM2: http://www.cbs.dtu.dk/services/TMHMM/
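To give a feel for what such predictors look for, here is a deliberately simple hydropathy-window sketch in Python. It is not the method used by any of the programs listed above, which rely on profiles, neural networks or hidden Markov models. It averages Kyte-Doolittle hydropathy values over a sliding window; the window of 19 residues and the threshold of 1.6 are conventional choices and should be treated as assumptions, as are the function names.

```python
# Kyte-Doolittle hydropathy scale.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(seq: str, window: int = 19) -> list:
    """Mean hydropathy for every full window; entry k is centred on 0-based residue k + window // 2."""
    half = window // 2
    return [
        sum(KD[r] for r in seq[i - half:i + half + 1]) / window
        for i in range(half, len(seq) - half)
    ]

def candidate_tm_centers(seq: str, window: int = 19, threshold: float = 1.6) -> list:
    """1-based positions whose surrounding window is hydrophobic enough to suggest a membrane helix."""
    half = window // 2
    profile = hydropathy_profile(seq, window)
    return [k + half + 1 for k, score in enumerate(profile) if score >= threshold]

# First 60 residues of the target sequence from the slides.
target_start = "MSVERWVFPGCSVMARFRRGLSDLGGRVRNIGDVMEHPLVELGVSYAALLSVIVVVVEYT"
print(candidate_tm_centers(target_start))
```

Consecutive positions in the output would be merged into one candidate helix in a fuller implementation.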
Voltage-gated K+ channel: TMHs predicted
[Target sequence (residues 1-295) with predicted transmembrane helices marked: S1, S2, S3, S4, S5, P, S6]
TMH predictions on the target sequence
Voltage-gated K+ channel: function of template
Shaker channel
• Membrane protein
• K+ selectivity
[Figure: channel in the membrane, Out / In, with + and - charges]
Voltage-gated K+ channel: selectivity motif
[Shaker sequence (residues 1-656) with S1, S2, S3, S4, S5, P, S6 marked; selectivity motif:]
TxxTxGxG
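A short Python sketch of searching for this selectivity motif in the two aligned fragments from the earlier alignment slide. The motif is written as the regular expression `T..T.G.G`; the helper name `find_motif` is hypothetical, and residue numbers are derived from the fragment start positions given in the alignment (413 for Shaker, 150 for the target).

```python
import re

SELECTIVITY_MOTIF = re.compile(r"T..T.G.G")  # TxxTxGxG, with x = any residue

def find_motif(fragment: str, first_residue: int):
    """Return (residue number of the first motif position, matched segment), or None."""
    hit = SELECTIVITY_MOTIF.search(fragment)
    if hit is None:
        return None
    return first_residue + hit.start(), hit.group()

shaker_413_472 = "AVYFAEAGSENSFFKSIPDAFWWAVVTMTTVGYGDMTPVGVWGKIVGSLCAIAGVLTIAL"
target_150_209 = "AIYIVEYPDPNSSIKSVFDALWWAVVTATTVGYGDVVPATPIGKVIGIAVMLTGISALTL"

shaker_hit = find_motif(shaker_413_472, 413)
target_hit = find_motif(target_150_209, 150)
print("Shaker:", shaker_hit)
print("Target:", target_hit)
```

Finding the motif in both sequences at equivalent alignment positions is the kind of evidence that supports transferring the K+ selectivity annotation.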
Voltage-gated K+ channel: conservation of outer pore
[Alignment of Shaker 413-481 with target 150-218, covering P and S6; the selectivity filter TxxTxGxG is marked]
Voltage-gated K+ channel: functional characterization of target
Shaker channel
• Membrane protein
• K+ selectivity
• Voltage gating
[Figure: channel in the membrane (Out / In), closed]
[Figure: channel in the membrane (Out / In), open, with + and - charges]
Voltage-gated K+ channel: residues related to gating
91
1 MAAVAGLYGLGEDRQHRKKQQQQQQHQKEQLEQKEEQKKIAERKLQLREQQLQRNSLDGY
GSLPKLSSQDEEGGAGHGFGGGPQHFEPIPHDHDFCERVVINVSGLRFETQLRTLNQFPD
TLLGDPARRLRYFDPLRNEYFFDRSRPSFDAILYYYQSGGRLRRPVNVPLDVFSEEIKFY
ELGDQAINKFREDEGFIKEEERPLPDNEKQRKVWLLFEYPESSQAARVVAIISVFVILLS
IVIFCLETLPEFKHYKVFNTTTNGTKIEEDEVPDITDPFFLIETLCIIWFTFELTVRFLA
CPNKLNFCRDVMNVIDIIAIIPYFITLATVVAEEEDTLNLPKAPVSPQDKSSNQAMSLAI
LRVIRLVRVFRIFKLSRHSKGLQILGRTLKASMRELGLLIFFLFIGVVLFSSAVYFAEAG
SENSFFKSIPDAFWWAVVTMTTVGYGDMTPVGVWGKIVGSLCAIAGVLTIALPVPVIVSN
FNYFYHRETDQEEMQSQNFNHVTSCPYLPGTLGQHMKKSSLSESSSDMMDLDDGVESTPG
LTETHPGRSAVAPFLGAQQQQQQQPVASSLSMSIDKQLQHPLQHVTQTQLYQQQQQQQQQ
QQNGFKQQQQQTQQQLQQQQSHTINASAAAATSGSGSSGLTMRHNNALAVSIETDV
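[Sequence annotation: segments S1, S2, S3, S4, S5, pore helix P and S6 are marked along the sequence; numbering runs from 1 to 656]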

Voltage-gated K+ channel: Voltage sensor
[Figure: Shaker sequence (residues 1 to 656) with segments S1 to S6 and P marked, as on the slide "residues related with gating"; the S4 motif RxxRxxRxxR is highlighted]
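Motifs written in this notation (x = any residue) translate directly into regular expressions. Below is a minimal sketch using Python's standard re module. To keep it short it scans only the stretch of the Shaker sequence shown on the slides that corresponds to residues 361-480; the helper names are made up for illustration.

```python
import re


def motif_to_regex(motif):
    """Translate a slide-style motif ('x' = any residue) into a regex string."""
    return "".join("." if ch == "x" else ch for ch in motif)


def find_motif(sequence, motif, offset=1):
    """Return (first, last, match) for every hit, numbered from `offset`.

    A lookahead is used so that overlapping hits are reported as well.
    """
    pattern = re.compile("(?=(" + motif_to_regex(motif) + "))")
    hits = []
    for m in pattern.finditer(sequence):
        first = m.start() + offset
        hits.append((first, first + len(m.group(1)) - 1, m.group(1)))
    return hits


# Residues 361-480 of the Shaker sequence as printed on the slides.
SHAKER_361_480 = (
    "LRVIRLVRVFRIFKLSRHSKGLQILGRTLKASMRELGLLIFFLFIGVVLFSSAVYFAEAG"
    "SENSFFKSIPDAFWWAVVTMTTVGYGDMTPVGVWGKIVGSLCAIAGVLTIALPVPVIVSN"
)

print(find_motif(SHAKER_361_480, "RxxRxxRxxR", offset=361))
print(find_motif(SHAKER_361_480, "TxxTxGxG", offset=361))
```

On this stretch the voltage-sensor motif hits once (RVIRLVRVFR, residues 362-371) and the selectivity-filter motif hits once (TMTTVGYG, residues 439-446). Such short patterns can also match by chance in unrelated proteins, so a hit is a hint and not a proof of function.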

Voltage-gated K+ channel: Other voltage sensing residues
[Figure: same annotated Shaker sequence (residues 1 to 656, segments S1 to S6 and P); further voltage sensing residues are highlighted (labels as extracted: DE E R)]

Voltage-gated K+ channel: Gating hinge glycine
[Figure: same annotated Shaker sequence (segments S1 to S6 and P); the gating hinge glycine (G) in S6 is highlighted]

Voltage-gated K+ channel: Conservation of functional residues in target
Shaker: 413 AVYFAEAGSENSFFKSIPDAFWWAVVTMTTVGYGDMTPVGVWGKIVGSLCAIAGVLTIAL 472
            A+Y  E    NS  KS+ DA WWAVVT TTVGYGD+ P    GK++G    + G+  + L
Target: 150 AIYIVEYPDPNSSIKSVFDALWWAVVTATTVGYGDVVPATPIGKVIGIAVMLTGISALTL 209

Shaker: 473 PVPVIVSNF 481
             +  + + F
Target: 210 LIGTVSNMF 218
[Figure: channel topology with segments S1 to S6 and pore helix P; the gating hinge is marked in the S6 part of the alignment]

Voltage-gated K+ channel: Conservation of functional residues in target
Shaker: 413 AVYFAEAGSENSFFKSIPDAFWWAVVTMTTVGYGDMTPVGVWGKIVGSLCAIAGVLTIAL 472
            A+Y  E    NS  KS+ DA WWAVVT TTVGYGD+ P    GK++G    + G+  + L
Target: 150 AIYIVEYPDPNSSIKSVFDALWWAVVTATTVGYGDVVPATPIGKVIGIAVMLTGISALTL 209

Shaker: 473 PVPVIVSNF 481
             +  + + F
Target: 210 LIGTVSNMF 218
[Figure: channel topology with segments S1 to S6 and pore helix P; S4, drawn with positive charges (+), is labeled as the voltage sensor]

Voltage-gated K+ channel: Conservation of functional residues in target
Shaker:              PLPDNEKQRKVWLLFEYPESSQAARVVAIISVFVILLSIVIFCLETL
Target: MSVERWVFPGCSVMARFRRGLSDLGGRVRNIGDVMEHPLVELGVSYAALLSVIVVVVEYT

Shaker: PEFKHYKVF≠PFFLIETLCIIWFTFELTVRFLACPNKLNFC≠VMNVIDIIAIIPYFITLAA
Target: MQLSGEYLV≠RLYLVDLILVIILWADYAYRAYKSGDPAGYV≠KKTLYEIPALVPAGLLALI

Shaker: IIPYFIT≠ILRVIRLVRVFRIFKLSRHSKGLQILGRTLKAMRELGLLIFFLFIGVVLFSSA
Target: EGHLAGL≠LFRLVRLLRFLRILLIISRGSKFLSAIADAADKIRFYHLFGAVMLTVLYGAFA

Shaker: VYFAEAGSENSFFKSIPDAFWWAVVTMTTVGYGDMTPVGVWGKIVGSLCAIAGVLTIALPV
Target: IYIVEYPDPNSSIKSVFDALWWAVVTATTVGYGDVVPATPIGKVIGIAVMLTGISALTLLI

Shaker: PVIVSNFNYFYH
Target: GTVSNMFQKILVGEPEPSCSPAKLAEMVSSMSEEEFEEFVRTLKNLRRLENSMK
[Figure labels: S1, S2, S3, S4, S5, P and S6 mark the segments along the alignment; the voltage sensor is highlighted]
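One way to make the "voltage sensor" comparison concrete is to look for positively charged residues (R or K) that sit in the same alignment column in Shaker and in the target. The sketch below assumes that the two 22-residue stretches, copied from the slide's S4 region (the residues that directly follow the ≠ marker in the Shaker and target rows), are aligned column by column without gaps. The names are illustrative.

```python
POSITIVE = set("RK")

# S4 stretches copied from the slide; assumed to be aligned without gaps.
SHAKER_S4 = "ILRVIRLVRVFRIFKLSRHSKG"
TARGET_S4 = "LFRLVRLLRFLRILLIISRGSK"


def shared_positive_columns(row_a, row_b):
    """Columns (0-based) where both aligned rows carry R or K."""
    return [i for i, (a, b) in enumerate(zip(row_a, row_b))
            if a in POSITIVE and b in POSITIVE]


columns = shared_positive_columns(SHAKER_S4, TARGET_S4)
# Distance between consecutive shared charges.
spacings = [b - a for a, b in zip(columns, columns[1:])]

print(columns)
print(spacings)
```

Under these assumptions four positive charges are shared, each three residues apart, which is the RxxRxxRxxR pattern of the Shaker voltage sensor.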

Voltage-gated K+ channel: Conservation of functional residues in target
Shaker: 413 AVYFAEAGSENSFFKSIPDAFWWAVVTMTTVGYGDMTPVGVWGKIVGSLCAIAGVLTIAL 472
            A+Y  E    NS  KS+ DA WWAVVT TTVGYGD+ P    GK++G    + G+  + L
Target: 150 AIYIVEYPDPNSSIKSVFDALWWAVVTATTVGYGDVVPATPIGKVIGIAVMLTGISALTL 209

Shaker: 473 PVPVIVSNF 481
             +  + + F
Target: 210 LIGTVSNMF 218
[Figure: channel topology with segments S1 to S6 and pore helix P; other voltage sensing residues are marked]

Voltage-gated K+ channel: Conservation of functional residues in target
[Figure: same Shaker/target alignment as on the earlier slide captioned "voltage sensor", with labels S1, S2, S3, S4, S5, P and S6; caption: voltage sensor sensitive residues]

Voltage-gated K+ channel: Function of target
Shaker channel
• Membrane protein
• K+ selectivity
• Voltage gating

MacKinnon’s Nobel Prize

I.2c Function Intro:
Function by association
Co-expression
Expression data -> Machine Learning / Clustering -> Functional classes
For example: P Brown et al. (2000) PNAS 97:262-267
© Marco Punta & Yanay Ofran & Burkhard Rost (Columbia New York)
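The idea behind co-expression can be illustrated with a toy example: genes whose expression profiles rise and fall together are assumed to share a functional class. The sketch below uses a simple nearest-neighbour transfer by Pearson correlation, which is a stand-in chosen for brevity and not the clustering or machine-learning methods the slide refers to. All gene names, expression values and class labels are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical expression profiles (one value per condition); not real data.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.1, 3.9, 6.2, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneX": [0.9, 2.2, 2.9, 4.1],  # unannotated query
}
# Hypothetical functional classes of the annotated genes.
annotation = {"geneA": "ribosome", "geneB": "ribosome", "geneC": "proteasome"}


def predict_class(query):
    """Transfer the class of the annotated gene most correlated with query."""
    best = max(annotation, key=lambda g: pearson(profiles[query], profiles[g]))
    return best, annotation[best]


print(predict_class("geneX"))
```

Here geneX follows the profiles of geneA and geneB and runs opposite to geneC, so it inherits the class "ribosome". Real analyses work with thousands of genes and many conditions, and a high correlation suggests a functional link without proving one.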

Interactions / networks
For example: AH Tong et al. (2002) Science 295:321-324
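Function by association in an interaction network is often described as "guilt by association": an unannotated protein is assigned the function that is most common among its interaction partners. A minimal sketch of such a majority vote follows. The interaction pairs, protein names and annotations are hypothetical and are not taken from the cited study.

```python
from collections import Counter

# Hypothetical interaction pairs and annotations; not real data.
interactions = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P4", "P5")]
known_function = {"P2": "DNA repair", "P3": "DNA repair", "P4": "transport"}


def neighbours(protein):
    """All direct interaction partners of a protein."""
    partners = set()
    for a, b in interactions:
        if a == protein:
            partners.add(b)
        elif b == protein:
            partners.add(a)
    return partners


def vote(protein):
    """Most frequent annotated function among the partners.

    Returns (function, votes for it, annotated partners) or None
    when no partner is annotated.
    """
    counts = Counter(known_function[p] for p in neighbours(protein)
                     if p in known_function)
    if not counts:
        return None
    function, support = counts.most_common(1)[0]
    return function, support, sum(counts.values())


print(vote("P1"))
print(vote("P5"))
```

In this toy network P1 gets "DNA repair" with two of three annotated partners in agreement, while P5 gets "transport" from a single partner. The support counts matter because a physical interaction does not by itself imply a shared function.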

A Bairoch (2000) Nucleic Acids Res 28:304-305
Differentiate functional and physical interaction
Improve accuracy and coverage (data, algorithm)
Ab initio/de novo prediction

Predict aspects of function
Sub-cellular localization (nucleus, membrane, etc.)
Post-translational modifications
Functionally important residues
Interaction sites

Conclusions today
Function introduction
• Molecular biology is just at an exciting beginning
• We can compute some aspects of molecular life
• Most accurate inference of function: based on homology
• Homology-based inference of function can be improved by motifs
  problem: definition of motifs still not fully automated
NEXT
• Prediction of subcellular localization

Lecture plan (PP2 function)
01: 2013/10/15: no lecture
02: 2013/10/17: welcome: who we are
03: 2013/10/22: Intro - function 1: concepts
04: 2013/10/24: Intro - function 2: homology
05: 2013/10/29: Tobias Hamp: Homology-based prediction of function
06: 2013/10/31: no lecture Reformation
07: 2013/11/05: Intro - function 2: inference
08: 2013/11/07: Intro - function 3: motifs
09: 2013/11/12: no lecture: SVV (student reps)
10: 2013/11/14: Localization 1
11: 2013/11/19: Localization 2
12: 2013/11/21: Localization 3 - Tatyana Goldberg
13: 2013/11/26: SNP effect 1
14: 2013/11/28: SNP effect 2
15: 2013/12/03: SNP effect 3
16: 2013/12/05: Protein-protein interaction 1
17: 2013/12/10: Exercise/Presentations
18: 2013/12/12: no lecture: Dies Academicus / Protein-protein interaction 1
19: 2013/12/17: Protein-protein interaction 2
20: 2013/12/19: Andrea Schafferhans: 3D function prediction
21-24: no lectures - winter break (2013/12/23 - 2014/01/06)
25: 2014/01/07: Protein-protein interaction 3
26: 2014/01/09: Andrea Schafferhans: Docking
27: 2014/01/14: Protein-DNA interaction 1
28: 2014/01/16: Protein-DNA/RNA interaction 2
29: 2014/01/21: Marco De Vivo (ISS Genoa)
30: 2014/01/23: Marco Punta (Pfam)
31: 2014/01/28: Arthur Dong: networks
32: 2014/01/30: wrap-up
33: 2014/02/04: examen
34: 2014/02/06: no lecture

Lecture plan (PP2 function-should!)
01: 2013/10/15: no lecture
02: 2013/10/17: welcome: who we are
03: 2013/10/22: Intro - function 1: concepts
04: 2013/10/24: Intro - function 2: homology
05: 2013/10/29: Tobias Hamp: Homology-based prediction of function
06: 2013/10/31: no lecture Reformation
07: 2013/11/05: Intro - function 3: motifs
08: 2013/11/07: Localization 1
09: 2013/11/12: no lecture: SVV (student reps)
10: 2013/11/14: Localization 2
11: 2013/11/19: Individualized medicine
12: 2013/11/21: Localization 3 - Tatyana Goldberg
13: 2013/11/26: SNP effect 1
14: 2013/11/28: SNP effect 2
15: 2013/12/03: SNP effect 3
16: 2013/12/05: Protein-protein interaction 1
17: 2013/12/10: Exercise/Presentations
18: 2013/12/12: no lecture: Dies Academicus / Protein-protein interaction 1
19: 2013/12/17: Protein-protein interaction 2
20: 2013/12/19: Andrea Schafferhans: 3D function prediction
21-24: no lectures - winter break (2013/12/23 - 2014/01/06)
25: 2014/01/07: Protein-protein interaction 3
26: 2014/01/09: Andrea Schafferhans: Docking
27: 2014/01/14: Protein-DNA interaction 1
28: 2014/01/16: Protein-DNA/RNA interaction 2
29: 2014/01/21: Marco De Vivo (ISS Genoa)
30: 2014/01/23: Marco Punta (Pfam)
31: 2014/01/28: Arthur Dong: networks
32: 2014/01/30: wrap-up
33: 2014/02/04: examen
34: 2014/02/06: no lecture
should have