© Burkhard Rost (TU Munich)
title: Alignments - Profile-based
short title: alignments_2
lecture: Protein Prediction I - Protein Structure / Burkhard Rost, TUM, 2011 summer
Thursday June 9, 2011
Announcements
• Videos: SciVee, www.rostlab.org
• THANKS: Tim Karl + Haitham Sohby
• NO lectures: ?
• LAST lecture: Jul 7
• Exam: Jul 12 (?), 10:30 (likely this room)
• Makeup: likely October 13, morning
CONTACT: Marlena Drabik [email protected]
Today: Secondary structure prediction 1
• LAST WEEKS: Secondary structure prediction
• THIS WEEK: Alignments and “reach of comparative modeling”
• NEXT WEEK: Comparative modeling
Sequence comparisons: multiple alignment/profile-based
Profile-based comparison
            1                                                   50
fyn_human   VTLFVALYDY EARTEDDLSF HKGEKFQILN SSEGDWWEAR SLTTGETGYI
yrk_chick   VTLFIALYDY EARTEDDLSF QKGEKFHIIN NTEGDWWEAR SLSSGATGYI
fgr_human   VTLFIALYDY EARTEDDLTF TKGEKFHILN NTEGDWWEAR SLSSGKTGCI
yes_chick   VTVFVALYDY EARTTDDLSF KKGERFQIIN NTEGDWWEAR SIATGKTGYI
src_avis2   VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYI
src_aviss   VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYI
src_avisr   VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYI
src_chick   VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYI
stk_hydat   VTIFVALYDY EARISEDLSF KKGERLQIIN TADGDWWYAR SLITNSEGYI
src_rsvpa   .......... ESRIETDLSF KKRERLQIVN NTEGTWWLAH SLTTGQTGYI
hck_human   ..IVVALYDY EAIHHEDLSF QKGDQMVVLE ES.GEWWKAR SLATRKEGYI
blk_mouse   ..FVVALFDY AAVNDRDLQV LKGEKLQVLR .STGDWWLAR SLVTGREGYV
hck_mouse   .TIVVALYDY EAIHREDLSF QKGDQMVVLE .EAGEWWKAR SLATKKEGYI
lyn_human   ..IVVALYPY DGIHPDDLSF KKGEKMKVLE .EHGEWWKAK SLLTKKEGFI
lck_human   ..LVIALHSY EPSHDGDLGF EKGEQLRILE QS.GEWWKAQ SLTTGQEGFI
ss81_yeast  .....ALYPY DADDDdeISF EQNEILQVSD .IEGRWWKAR R.ANGETGII
abl_mouse   ..LFVALYDF VASGDNTLSI TKGEKLRVLG YnnGEWCEAQ ..TKNGQGWV
abl1_human  ..LFVALYDF VASGDNTLSI TKGEKLRVLG YnnGEWCEAQ ..TKNGQGWV
src1_drome  ..VVVSLYDY KSRDESDLSF MKGDRMEVID DTESDWWRVV NLTTRQEGLI
mysd_dicdi  .....ALYDF DAESSMELSF KEGDILTVLD QSSGDWWDAE L..KGRRGKV
yfj4_yeast  ....VALYSF AGEESGDLPF RKGDVITILK ksQNDWWTGR V..NGREGIF
abl2_human  ..LFVALYDF VASGDNTLSI TKGEKLRVLG YNQNGEWSEV RSKNG.QGWV
tec_human   .EIVVAMYDF QAAEGHDLRL ERGQEYLILE KNDVHWWRAR D.KYGNEGYI
abl1_caeel  ..LFVALYDF HGVGEEQLSL RKGDQVRILG YNKNNEWCEA RlrLGEIGWV
txk_human   .....ALYDF LPREPCNLAL RRAEEYLILE KYNPHWWKAR D.RLGNEGLI
yha2_yeast  VRRVRALYDL TTNEPDELSF RKGDVITVLE QVYRDWWKGA L..RGNMGIF
abp1_sacex  .....AEYDY EAGEDNELTF AENDKIINIE FVDDDWWLGE LETTGQKGLF
Sequence-profile methods
PSI-BLAST: fast, partial dynamic programming
Stephen F Altschul, TL Madden, Alejandro A Schaeffer, Jinghui Zhang, Zheng Zhang, Webb Miller & David J Lipman (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. NAR 25:3389-3402
More than 32,000 citations in Google Scholar (May 2010)
concept of PSI-BLAST
PSI-BLAST in steps
1. fast hashing
Like BLAST match ‘words’
TTYKLILNGKTLKGETTTEAVDAATAEKVFKQYANDNGVDGEWTYDDATKTFTVTEKTTYKLILLLLLLLLLLLLLLLLAWTVEKAFKTFAAAAAAAAAWTVEKAFKTFAAAAA
TTYKLIL / TTYKLIL
WTYDDATKTF / WTVEKAFKTF
AATAEKVFKQYA / AWTVEKAFKTFA
Default “word” size for “seeds” = 3
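The word-matching step can be sketched as an index of all length-3 words of one sequence, looked up with every length-3 word of the other. This is a minimal sketch that assumes exact word identity; BLAST itself also seeds on “neighbourhood” words that score above a threshold under a substitution matrix, which this toy omits. The function names `index_words` and `seed_hits` are hypothetical, and the two strings are short fragments of the example sequences above.

```python
from collections import defaultdict

def index_words(seq, k=3):
    """Map every length-k word of seq to the list of its start positions."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index

def seed_hits(query, target, k=3):
    """All (query_pos, target_pos) pairs where both sequences share an identical k-letter word."""
    index = index_words(target, k)
    hits = []
    for i in range(len(query) - k + 1):
        for j in index.get(query[i:i + k], []):
            hits.append((i, j))
    return hits

print(seed_hits("TTYKLILNGKTLKGE", "KTTYKLILLLLL"))
# [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
```

All five hits lie on the same diagonal (target position minus query position is 1); together they cover the shared segment TTYKLIL, which is the kind of region the next step extends.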
PSI-BLAST in steps
1. fast hashing
2. extend in between matches by dynamic programming
BLAST + Smith-Waterman
dynamic programming to extend
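The extension step uses Smith-Waterman local alignment. A minimal sketch, assuming a toy scoring scheme (match +2, mismatch -1, linear gap -2) rather than the substitution matrices and affine gap costs BLAST actually uses; `smith_waterman` is a hypothetical name. It returns the best local score and one optimal local alignment.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score and one optimal local alignment (linear gap cost)."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores never drop below zero.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best cell until a zero cell is reached.
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))

print(smith_waterman("TTYKLIL", "KTTYKLILLL"))  # (14, 'TTYKLIL', 'TTYKLIL')
```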
PSI-BLAST in steps
1. fast hashing
2. extend in between matches by dynamic programming
3. compile statistics
Significance of match (e.g. BLAST E-values)
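BLAST significance rests on Karlin-Altschul statistics: for a local alignment score S between a query of length m and a database of total length n, the expected number of chance hits scoring at least S is E = K m n e^(-λS). A minimal sketch, assuming λ = 0.267 and K = 0.041 (values commonly quoted for BLOSUM62 with gap costs 11/1; treat them as assumptions) and ignoring BLAST's edge-effect length correction; m and n are hypothetical.

```python
import math

def evalue(raw_score, m, n, lam=0.267, K=0.041):
    """Karlin-Altschul expectation E = K*m*n*exp(-lambda*S), no edge correction."""
    return K * m * n * math.exp(-lam * raw_score)

def bit_score(raw_score, lam=0.267, K=0.041):
    """Normalised score S' = (lambda*S - ln K) / ln 2, so that E = m*n*2**(-S')."""
    return (lam * raw_score - math.log(K)) / math.log(2)

m, n = 60, 50_000_000  # hypothetical query length and database size (residues)
for s in (40, 60, 80):
    print(s, round(bit_score(s), 1), f"{evalue(s, m, n):.2e}")
```

Because E shrinks exponentially with S, a few extra score units separate a chance hit from a significant one; the bit score makes scores from different scoring systems comparable.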
PSI-BLAST in steps
1. fast hashing
2. extend in between matches by dynamic programming
3. compile statistics
4. collect all pairs and build profile
Sequence-profile comparison
(multiple alignment as on the “Profile-based comparison” slide)
PSI-BLAST SF Altschul 1997 Nucl Acids Res 25 3389-3402
PS = position-specific
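A position-specific scoring matrix (PSSM) gives every alignment column its own score for each amino acid. A simplified sketch, assuming a uniform background (1/20) and a fixed pseudocount weight; PSI-BLAST itself adds sequence weighting, observed background frequencies and substitution-matrix-based pseudocounts. The rows are columns 5 to 10 of fyn_human, yrk_chick, blk_mouse, lyn_human, lck_human and abl_mouse from the alignment; `build_pssm` and `score` are hypothetical names.

```python
import math

AMINO = "ACDEFGHIKLMNPQRSTVWY"

def build_pssm(rows, pseudo=1.0):
    """Log-odds PSSM in bits from gap-free aligned rows (uniform background assumed)."""
    bg = 1.0 / len(AMINO)
    pssm = []
    for col in zip(*rows):                      # one tuple of residues per column
        counts = {aa: col.count(aa) for aa in AMINO}
        total = len(col) + pseudo
        # q = (count + pseudo*bg) / (N + pseudo); score = log2(q / bg)
        pssm.append({aa: math.log2((counts[aa] + pseudo * bg) / total / bg) for aa in AMINO})
    return pssm

def score(pssm, segment):
    """Sum of position-specific scores for a segment as long as the profile."""
    return sum(col[aa] for col, aa in zip(pssm, segment))

rows = ["VALYDY", "IALYDY", "VALFDY", "VALYPY", "IALHSY", "VALYDF"]
pssm = build_pssm(rows)
print(round(score(pssm, "VALYDY"), 2), round(score(pssm, "GGGGGG"), 2))
```

Unlike a single substitution matrix, the same residue scores differently in different columns: V gains more in column 1 (where it dominates) than it would in column 6.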
YDFHGVGEDDISIKRG
PSI-BLAST in steps
1. fast hashing
2. extend in between matches by dynamic programming
3. compile statistics
4. collect all pairs and build profile
5. iterate
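Step 5 (iterate) can be illustrated with a toy loop: build a profile from everything included so far, rescan the database, include every sequence whose best window scores above a threshold, and stop when the included set no longer changes. This is an ungapped stand-in written for illustration, not PSI-BLAST's actual procedure (which uses gapped alignments and an E-value inclusion threshold); the sequences, the threshold and all function names are made up.

```python
import math

AMINO = "ACDEFGHIKLMNPQRSTVWY"

def build_pssm(rows, pseudo=1.0):
    """Log-odds profile in bits, uniform background, fixed pseudocount (assumption)."""
    bg = 1.0 / len(AMINO)
    return [{aa: math.log2((col.count(aa) + pseudo * bg) / (len(col) + pseudo) / bg)
             for aa in AMINO} for col in zip(*rows)]

def best_window(pssm, seq):
    """(score, window) of the best ungapped window; seq must be at least profile length."""
    w = len(pssm)
    return max((sum(c[aa] for c, aa in zip(pssm, seq[i:i + w])), seq[i:i + w])
               for i in range(len(seq) - w + 1))

def iterate_search(query, database, threshold=5.0, max_rounds=5):
    """Search, include hits above threshold, rebuild profile; repeat until nothing changes."""
    included = {query: query}                    # sequence -> aligned window
    for round_no in range(1, max_rounds + 1):
        pssm = build_pssm(list(included.values()))
        new = {}
        for seq in database:
            s, window = best_window(pssm, seq)
            if s >= threshold:
                new[seq] = window
        new[query] = query
        if new.keys() == included.keys():
            return included, round_no            # converged
        included = new
    return included, max_rounds

db = ["TLFVALYDYEAR", "KVIALHSYEPS", "KIGLHSFE", "GGGGGGGG"]
hits, rounds = iterate_search("VALYDY", db)
print(rounds, sorted(hits))  # converges in round 3
```

KIGLHSFE is not found with the query alone; it is only included once IALHSY has entered the profile. The same mechanism that extends the reach of the search can also pull in false positives that then shift the profile away from the original family.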
Sequence-profile comparison
(multiple alignment as on the “Profile-based comparison” slide)
PSI-BLAST SF Altschul 1997 Nucl Acids Res 25 3389-3402
PSI = position-specific iteration
Expanding in sequence space: dynamics of PSI-BLAST
Profile-based database search
[Figure: sequence space around family U, shown in successive builds. Labels: “safe zone” (safe for pairwise comparison); “zone reached through position-specific family profile”; “safe zones of close homologues”; “lost after iteration”.]
B Rost 2001 J Struct Biol 134, 204-21
Sequence-profile comparison
(multiple alignment as on the “Profile-based comparison” slide)
YDFHGVGEDDISIKRG
PSI-BLAST SF Altschul 1997 Nucl Acids Res 25 3389-3402
LWYGQQAR KSQDKAKHAF AQHKRLQSTT
LWYGQQAR KSQDKAKHAF AQHKRLQSTT
LWYGQQAR KSQDKAKHAF AQHKRLQSTT
LWYGQQAR KSQDKAKHAF AQHKRLQSTT
LWYGQQAR KSQDKAKHAF AQHKRLQSTT
LWYGQQAR KSQDKAKHAF AQHKRLQSTT
LWYGQQAR KSQDKAKHAF AQHKRLQSTT
LWYGQQAR KSQDKAKHAF AQHKRLQSTT
LWYGQQAR KSQDKAKHAF AQHKRLQSTT
LWYGQQAR KSQDKAKHAF AQHKRLQSTT
Sequence-profile methods
• PSI-BLAST: fast, partial dynamic programming (SF Altschul (1997) NAR 25:3389-3402)
• ClustalW/ClustalX: slow, dynamic programming, for experts (JD Thompson, DG Higgins, TJ Gibson (1994) NAR 22:4673-80)
Clustal (ClustalW, ClustalX)
• all against all (pairs) by dynamic programming (varying substitution matrices)
• build phylogenetic tree
     A    B    C    D
A    -   90   80   70
B         -   90   80
C              -   90
D                   -
[Figure: tree with leaves A B C D]
JD Thompson, DG Higgins, TJ Gibson (1994) NAR 22:4673-80
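The first Clustal step, aligning all pairs by dynamic programming, can be sketched with a global (Needleman-Wunsch) alignment score for every pair. A minimal sketch, assuming a toy scoring scheme (match +2, mismatch -1, linear gap -2) instead of Clustal's substitution matrices and affine gap penalties, and reporting raw scores rather than the similarities Clustal converts into distances. The four segments are the first ten alignment columns of fyn_human, src_chick, blk_mouse and abl_mouse with the gap dots removed; `nw_score` is a hypothetical name.

```python
from itertools import combinations

def nw_score(a, b, match=2, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score, linear gap cost, two-row DP."""
    prev = [j * gap for j in range(len(b) + 1)]      # row 0: b aligned against gaps
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(prev[j - 1] + s, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev[-1]

seqs = {
    "fyn_human": "VTLFVALYDY",
    "src_chick": "VTTFVALYDY",
    "blk_mouse": "FVVALFDY",
    "abl_mouse": "LFVALYDF",
}

# All against all: one global alignment per unordered pair.
pair_scores = {(x, y): nw_score(seqs[x], seqs[y]) for x, y in combinations(seqs, 2)}
for (x, y), s in pair_scores.items():
    print(f"{x:10s} {y:10s} {s:4d}")
```

For N sequences this is N(N-1)/2 alignments, which is why the pairwise stage dominates Clustal's run time for large sets.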
INSERT: reproduce phylogeny?
Reproduce phylogeny
Clustal (ClustalW, ClustalX)
• all pairs (dynamic programming with varying substitution matrices)
• create phylogenetic tree
• cluster and dynamic programming
     A    B    C    D
A    -   90   80   70
B         -   90   80
C              -   90
D                   -
[Figure: tree with leaves A B C D]
JD Thompson, DG Higgins, TJ Gibson (1994) NAR 22:4673-80
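The “cluster” step turns the pairwise scores into a guide tree that fixes the order of the progressive alignment. A minimal sketch using average-linkage (UPGMA-style) clustering directly on the similarity matrix from the slide; this is a simplification, since ClustalW builds its guide tree with neighbour-joining on distances. The function name `guide_tree` is hypothetical.

```python
def guide_tree(sim):
    """Average-linkage clustering on pairwise similarities.
    sim maps (x, y) label pairs to a similarity; returns a nested-tuple tree."""
    sim = dict(sim)
    sizes = {}
    for x, y in sim:
        sizes[x] = sizes[y] = 1                 # every leaf starts as its own cluster
    while len(sizes) > 1:
        (x, y), _ = max(sim.items(), key=lambda kv: kv[1])  # most similar pair, first on ties
        merged = (x, y)
        nx, ny = sizes.pop(x), sizes.pop(y)
        for z in sizes:                         # size-weighted average similarity to the new cluster
            sxz = sim[(x, z)] if (x, z) in sim else sim[(z, x)]
            syz = sim[(y, z)] if (y, z) in sim else sim[(z, y)]
            sim[(merged, z)] = (nx * sxz + ny * syz) / (nx + ny)
        sim = {k: v for k, v in sim.items() if x not in k and y not in k}
        sizes[merged] = nx + ny
    return next(iter(sizes))

# Similarities from the slide (A-B 90, A-C 80, A-D 70, B-C 90, B-D 80, C-D 90).
slide = {("A", "B"): 90, ("A", "C"): 80, ("A", "D"): 70,
         ("B", "C"): 90, ("B", "D"): 80, ("C", "D"): 90}
tree = guide_tree(slide)
print(tree)  # (('C', 'D'), ('A', 'B'))
```

A and B merge first (the first of three tied pairs at 90), then C and D, then the two pairs. A progressive aligner would follow the same order: align A with B, align C with D, and finally align the two resulting profiles.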
Clustal (ClustalW, ClustalX)
Over 30,000 citations in Google Scholar
Citations in Google Scholar, May 2010:
• Clustal: Desmond G Higgins & Paul M Sharp (1988) Gene; 2402
• ClustalV: Desmond G Higgins, AJ Bleasby & Reiner Fuchs (1992) Bioinformatics 8:189-91; 2197
• ClustalW: Julie D Thompson, Desmond G Higgins, Toby J Gibson (1994) NAR 22:4673-80; 31056
• ClustalX: F Jeanmougin, Julie D Thompson, M Gouy, Des G Higgins & Toby J Gibson (1998) TIBS 23:403-5; 1559
Shapers and Shakers: Des Higgins, Toby Gibson, Julie Dawn Thompson
Sequence-profile methods
• PSI-BLAST: fast, partial dynamic programming (SF Altschul (1997) NAR 25:3389-3402)
• ClustalW/ClustalX: slow, dynamic programming, for experts (JD Thompson, DG Higgins, TJ Gibson (1994) NAR 22:4673-80)
• MaxHom: relatively slow, dynamic programming, good first guess (C Sander & R Schneider (1991) Proteins 9:56-69)
Maxhom/HSSP
Homology-derived protein structures and the structural meaning of sequence alignment
Chris Sander & Reinhard Schneider (1991) Proteins 9:56-69
C Sander & R Schneider (1993) NAR 21:3105-9
Reinhard Schneider (1994) Sequenz- und Struktur Vergleiche und deren Anwendung für die Struktur- und Funktionsvorhersage von Proteinen [Sequence and structure comparisons and their application to the prediction of protein structure and function] (PhD Heidelberg University)
Shapers and Shakers: Reinhard Schneider, Chris Sander
Maxhom/HSSP
     A    B    C    D
A    -   90   80   70
B         -   90   80
C              -   90
D                   -
Sweep 1: A vs B, C vs A, D vs A -> Profile (P0), conservation weight (cw0)
Sweep 2: P0, cw0 vs B; P1, cw0 vs C; P1, cw0 vs D
Sequence-profile methods
• PSI-BLAST: fast, partial dynamic programming (SF Altschul (1997) NAR 25:3389-3402)
• ClustalW/ClustalX: slow, dynamic programming, for experts (JD Thompson, DG Higgins, TJ Gibson (1994) NAR 22:4673-80)
• MaxHom: relatively slow, dynamic programming, good first guess (C Sander & R Schneider (1991) Proteins 9:56-69)
• SAM/HMMer: slow, needs preprocessing, HMM (statistics), very accurate (R Hughey & A Krogh (1996) CABIOS 12:95-107; S Eddy (1998) Bioinformatics 14:755-63)
HMM & biology: SAM & HMMer
John A Hertz, Richard G Palmer, Anders Krogh: Introduction to the Theory of Neural Computation, Westview Press
A Krogh, IS Mian, David Haussler (1994) NAR 22:4768-78
R Durbin, S Eddy, A Krogh & G Mitchison: Biological sequence analysis: probabilistic models of proteins and nucleic acids, Cambridge University Press
Shapers and Shakers: Anders Krogh, David Haussler, Sean Eddy, Kevin Karplus
Hidden Markov Models (HMM) - SAM
• A Krogh, M Brown, IS Mian, K Sjölander and D Haussler (1994) J Mol Biol 235 1501-31
• K Karplus, C Barrett and R Hughey (1998) Bioinformatics 14 846-56
• SR Eddy (1998) Bioinformatics 14 755-63
• K Karplus, R Karchin, J Draper, J Casper, Y Mandel-Gutfreund, M Diekhans and R Hughey (2003) Proteins: Structure, Function, and Genetics 53 491-6
SAM-T02 web site, UCSC, Kevin Karplus
Thanks for slides
The following slides are taken from an ISMB tutorial given by Kevin Karplus in 1999 in Heidelberg
© Kevin Karplus UCSC
(photos: http://www.sccrtc.org/photos/awards/karplus00-2001.jpg, http://users.soe.ucsc.edu/~karplus/bike/karplus_recumbent.gif)
SAM-T98: Build alignment
SAM-T98 Alignment Building
Start: a single sequence
Iterations 1 - 3:
1. Build a model from the sequence or alignment
2. Use the model to search for additional homologs
3. Re-estimate the alignment with the new homologs
End: a SAM-T98 alignment (Iteration 4)
© Kevin Karplus UCSC
Background: entropy
Entropy: log (# microstates in a system)
$H(X) = -\sum_i P(x_i) \log P(x_i)$
minimal for a PEAKED distribution
maximal for a uniform distribution
[Figure panels: example distributions P(x) over x]
© Kevin Karplus UCSC
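The entropy formula can be checked numerically for the two extremes named on the slide. A minimal sketch using base-2 logarithms (bits), an assumption chosen to match the “bits” on the following slides; the distributions are illustrative.

```python
import math

def entropy(probs, base=2):
    """Shannon entropy H = -sum p log p; terms with p = 0 contribute nothing."""
    return sum(-p * math.log(p, base) for p in probs if p > 0)

uniform = [1 / 20] * 20          # all 20 amino acids equally likely
peak = [1.0] + [0.0] * 19        # only one amino acid ever observed
print(round(entropy(uniform), 3), entropy(peak))  # 4.322 0.0
```

The uniform case gives log2(20), about 4.32 bits, the largest value possible for 20 outcomes; the single peak gives 0.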
Entropy in alignment
• Consider column c at one residue position. BEFORE any amino acid is aligned, we expect each amino acid according to some prior (background) probability P0, with entropy H0.
• Now consider the same column AFTER alignment: the posterior probability Pc (the observed amino acids combined with the priors) gives the entropy Hc.
• If c is conserved: Hc -> 0; if c is totally varied: Hc -> H0.
• H0 - Hc reflects the “bits saved” by the alignment.
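The “bits saved” for a single column can be computed directly from these definitions. A minimal sketch, assuming a uniform prior P0 over the 20 amino acids and a posterior Pc formed by adding a fixed pseudocount weight spread according to P0 (a simplification; `bits_saved` and `prior_weight` are hypothetical names).

```python
import math

AMINO = "ACDEFGHIKLMNPQRSTVWY"

def entropy(probs):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

def bits_saved(column, prior_weight=1.0):
    """H0 - Hc for one alignment column (a string of amino acids, no gaps)."""
    p0 = 1.0 / len(AMINO)                      # assumed uniform prior P0
    h0 = entropy([p0] * len(AMINO))            # H0 = log2(20)
    n = len(column)
    # Posterior Pc: observed counts plus prior_weight pseudocounts spread as P0.
    pc = [(column.count(aa) + prior_weight * p0) / (n + prior_weight) for aa in AMINO]
    return h0 - entropy(pc)

print(round(bits_saved("YYYYYYYYYY"), 2))   # conserved column
print(round(bits_saved("ACDEFGHIKL"), 2))   # highly varied column
```

The conserved column saves several bits, the varied one well under one bit; a column containing each amino acid once saves nothing, since its posterior equals the prior.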
[Figure: sequence logo, “bits saved” (0 to 2) for alignment positions 1 to 37]
© Kevin Karplus UCSC
Alignment entropy for small families
[Figure: sequence logo, “bits saved” (0 to 2) for alignment positions 1 to 37]
• In alignments with few family members or little divergence, the entropy signal is dominated by the priors, i.e. the background signal dominates.
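The effect of family size can be made concrete: keep the prior weight fixed and vary the number of sequences in a perfectly conserved column. A minimal sketch, assuming a uniform prior over 20 amino acids with a pseudocount weight of 5 (both hypothetical choices, much simpler than the priors used by real HMM packages); `bits_saved_conserved` is a hypothetical name.

```python
import math

def bits_saved_conserved(n_seqs, prior_weight=5.0, alphabet=20):
    """H0 - Hc for a column in which all n_seqs sequences show the same amino acid.
    Posterior = counts plus prior_weight pseudocounts spread uniformly (assumed prior)."""
    p0 = 1.0 / alphabet
    h0 = math.log2(alphabet)
    total = n_seqs + prior_weight
    p_top = (n_seqs + prior_weight * p0) / total     # the observed amino acid
    p_rest = prior_weight * p0 / total               # each of the other 19
    hc = -p_top * math.log2(p_top) - (alphabet - 1) * p_rest * math.log2(p_rest)
    return h0 - hc

for n in (1, 2, 5, 20, 100):
    print(n, round(bits_saved_conserved(n), 2))
```

With one or two sequences the posterior stays close to the prior and little is saved even though the column is perfectly conserved; only as the family grows does the observed conservation outweigh the background.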
[Figure: two further sequence logos, bits saved per alignment position]
© Kevin Karplus UCSC
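To make the prior-versus-data trade-off concrete, the sketch below smooths one alignment column with pseudocounts. It assumes the simplest possible prior: a single component in which a total pseudocount weight A (here 5, an arbitrary choice) is spread over background frequencies q_a (here uniform). SAM itself uses Dirichlet mixture priors, so this is only an illustration of why a column seen in very few sequences stays close to the background.

```python
# Minimal sketch: pseudocount-smoothed estimate of one alignment column.
# Assumption: single-component prior (total weight A spread over background
# frequencies q_a). SAM uses Dirichlet mixtures, which this does not model.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
UNIFORM_BACKGROUND = {a: 1.0 / len(AMINO_ACIDS) for a in AMINO_ACIDS}


def column_estimate(column, background=UNIFORM_BACKGROUND, pseudocount_weight=5.0):
    """Return p(a) = (n_a + A * q_a) / (N + A) for every amino acid a."""
    counts = {a: 0 for a in AMINO_ACIDS}
    for residue in column.upper():
        if residue in counts:  # gaps and non-standard letters are ignored
            counts[residue] += 1
    n_total = sum(counts.values())
    denominator = n_total + pseudocount_weight
    return {
        a: (counts[a] + pseudocount_weight * background[a]) / denominator
        for a in AMINO_ACIDS
    }


few = column_estimate("WW")        # two sequences: the prior still matters a lot
many = column_estimate("W" * 200)  # 200 sequences: the observed counts dominate
print(f"p(W) from 2 sequences:   {few['W']:.3f}")
print(f"p(W) from 200 sequences: {many['W']:.3f}")
```

With two sequences showing W, p(W) stays near 0.32 because the prior still carries much of the weight; with 200 sequences it rises to about 0.98.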
Alignment entropy for large families
[Figure: sequence logo, bits saved per alignment position]
• In alignments with many family members and/or high divergence, the entropy signal is dominated by the observed profile -> the profile dominates
• Problem: possible over-training
[Figure: two further sequence logos, bits saved per alignment position]
© Kevin Karplus UCSC
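A minimal sketch of how the per-column information grows with family size, assuming that the "bits saved" in the logos can be read as the relative entropy of the smoothed column distribution against the background. The column below is hypothetical (every sequence shows W), the background is uniform, and the pseudocount weight of 5 is an arbitrary choice.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
BACKGROUND = {a: 1.0 / len(AMINO_ACIDS) for a in AMINO_ACIDS}  # assumption: uniform


def relative_entropy_bits(p, q):
    """D(p || q) = sum_a p(a) * log2(p(a) / q(a)); zero-probability terms add 0."""
    return sum(pa * math.log2(pa / q[a]) for a, pa in p.items() if pa > 0)


def conserved_column(n_sequences, residue="W", pseudocount_weight=5.0):
    """Hypothetical column in which all n sequences show `residue`, smoothed
    with pseudocount_weight pseudocounts spread over the uniform background."""
    denominator = n_sequences + pseudocount_weight
    return {
        a: ((n_sequences if a == residue else 0) + pseudocount_weight * BACKGROUND[a])
        / denominator
        for a in AMINO_ACIDS
    }


for n in (1, 2, 10, 50, 200, 1000):
    bits = relative_entropy_bits(conserved_column(n), BACKGROUND)
    print(f"{n:5d} sequences: {bits:.2f} bits")
print(f"upper bound, log2(20): {math.log2(20):.2f} bits")
```

Because the counts enter unweighted, 200 near-identical sequences push the column towards the maximum just as strongly as 200 diverse ones would; that is one way a profile can over-fit its training family.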
HMM: some issues
Sequence weighting
Column regularizers
Transition regularizers
2crd
Example 1
XFTNVSCTTSKECWSVCQRLHNTSRG.KCMNKKCRCYS
------CTTSKECWSVCQRLHNTSKG.WCDHRGCICES
XFTNVSCTTSKECWSVCQRLHNTSRG.KCMNKKCRCYS
XFTNVSCTTSKEXWSVCQRLHNTSRG.KCMNKKXRCYS
XFTQESCTASNQCWSICKRLHNTNRG.KCMNKKCRCYS
XFTNVSCSASSQCWPVCKKLFGTYRG.KCMNSKCRCYS
XFTDVKCTGSKQCWPVCKQMFGKPNG.KCMNGKCRCYS
----VSCTGSKDCYAPCRKQTGCPNA.KCINKSCKCYG
TIINVKCTSPKQCSKPCKELYGSSAGaKCMNGKCKCYN
VGINVKCKHSGQCLKPCKDA-GMRFG.KCINGKCDCTP
2crd 1sxm 1cmr 1bah 1txm 2bmt 1bkt 1lir 1big
Example 2
© Kevin Karplus UCSC
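Sequence weighting targets the redundancy visible in Example 1, where several near-identical copies of one sequence would otherwise dominate the column counts. The sketch below implements position-based weights (after Henikoff & Henikoff, 1994), one widely used scheme; the slide does not say which scheme SAM applies, and treating gap characters as an ordinary symbol is a simplifying assumption.

```python
from collections import Counter


def henikoff_weights(alignment):
    """Position-based sequence weights (after Henikoff & Henikoff, 1994).

    In a column with k distinct symbols, a sequence whose symbol occurs m times
    receives 1 / (k * m). Contributions are summed over all columns and the
    weights are normalised to sum to 1. Gap characters are treated as an
    ordinary symbol, which is a simplifying assumption.
    """
    if not alignment or not alignment[0] or len({len(s) for s in alignment}) != 1:
        raise ValueError("need non-empty sequences, all of equal length")
    weights = [0.0] * len(alignment)
    for column in zip(*alignment):
        counts = Counter(column)
        k = len(counts)
        for i, symbol in enumerate(column):
            weights[i] += 1.0 / (k * counts[symbol])
    total = sum(weights)
    return [w / total for w in weights]


# Two identical copies and one divergent sequence taken from the example alignment
example = [
    "XFTNVSCTTSKECWSVCQRLHNTSRG.KCMNKKCRCYS",
    "XFTNVSCTTSKECWSVCQRLHNTSRG.KCMNKKCRCYS",
    "TIINVKCTSPKQCSKPCKELYGSSAGaKCMNGKCKCYN",
]
for seq, w in zip(example, henikoff_weights(example)):
    print(f"{w:.3f}  {seq}")
```

The two identical copies share their weight, while the divergent sequence receives more; weighted counts then replace raw counts when the column distributions are estimated.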
Sequence-profile methods
PSI-BLAST: fast, partial dynamic programming. SF Altschul (1997) NAR 25:3389-3402
ClustalW/ClustalX: slow, dynamic programming, for experts. JD Thompson, DG Higgins, TJ Gibson (1994) NAR 22:4673-80
MaxHom: relatively slow, dynamic programming, good first guess. C Sander & R Schneider (1991) Proteins 9:56-69
SAM/HMMer: slow, needs preprocessing, HMM (statistics), very accurate. R Hughey & A Krogh (1996) CABIOS 12:95-107 / S Eddy (1998) Bioinformatics 14:755-63
T-Coffee: much slower, requires preprocessing, consistency-based progressive alignment. Cedric Notredame, DG Higgins, Jaap Heringa (2000) JMB 302:205-17
T-Coffee
Des Higgins
Cedric Notredame
Jaap Heringa
Shapers and Shakers
T-Coffee: Mix local and global alignment
[Figure: local alignment and global alignment -> extension -> multiple sequence alignment]
© Cedric Notredame, CRG Barcelona
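The "extension" step can be sketched as follows, assuming the primary library (built from the local and global pairwise alignments) is a set of weighted residue pairs. Following the triplet idea described by Notredame et al. (2000), the weight of a pair (x, y) is increased by min(W(x, z), W(z, y)) for every residue z of a third sequence that is aligned to both. The residue identifiers and weights below are hypothetical, and this is an illustration rather than T-Coffee's own code.

```python
from collections import defaultdict


def pair_key(x, y):
    """Order a residue pair by sequence name so that each pair has one key."""
    return (x, y) if x[0] < y[0] else (y, x)


def extend_library(primary):
    """Consistency-based library extension (sketch).

    primary maps residue pairs ((seq, pos), (seq, pos)) from two different
    sequences to a weight; each pair should appear once. The extended weight
    of (x, y) is its primary weight (0 if absent) plus, for every residue z of
    a third sequence aligned to both x and y, min(W(x, z), W(z, y)).
    """
    neighbours = defaultdict(dict)
    extended = defaultdict(float)
    for (x, y), w in primary.items():
        neighbours[x][y] = w
        neighbours[y][x] = w
        extended[pair_key(x, y)] += w
    for z, links in neighbours.items():
        for x, w_xz in links.items():
            for y, w_zy in links.items():
                # x and y from two different sequences, both distinct from z's
                if x[0] < y[0] and z[0] not in (x[0], y[0]):
                    extended[(x, y)] += min(w_xz, w_zy)
    return dict(extended)


primary = {
    (("A", 0), ("B", 0)): 50,
    (("A", 0), ("C", 0)): 40,
    (("B", 0), ("C", 0)): 60,
    (("A", 1), ("C", 1)): 30,
    (("B", 1), ("C", 1)): 20,
}
for pair, weight in sorted(extend_library(primary).items()):
    print(pair, weight)
```

Pairs supported by third sequences gain weight, and the pair (A1, B1), absent from the primary library, is created purely through its consistency with C1.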
T-Coffee: Use more information
[Figure: local alignment, global alignment, multiple alignment, structural and specialist methods combined into one multiple sequence alignment]
© Cedric Notredame, CRG Barcelona
Sequence-profile comparison
           1                                                   50
fyn_human  VTLFVALYDY EARTEDDLSF HKGEKFQILN SSEGDWWEAR SLTTGETGYI
yrk_chick  VTLFIALYDY EARTEDDLSF QKGEKFHIIN NTEGDWWEAR SLSSGATGYI
fgr_human  VTLFIALYDY EARTEDDLTF TKGEKFHILN NTEGDWWEAR SLSSGKTGCI
yes_chick  VTVFVALYDY EARTTDDLSF KKGERFQIIN NTEGDWWEAR SIATGKTGYI
src_avis2  VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYI
src_aviss  VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYI
src_avisr  VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYI
src_chick  VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYI
stk_hydat  VTIFVALYDY EARISEDLSF KKGERLQIIN TADGDWWYAR SLITNSEGYI
src_rsvpa  .......... ESRIETDLSF KKRERLQIVN NTEGTWWLAH SLTTGQTGYI
hck_human  ..IVVALYDY EAIHHEDLSF QKGDQMVVLE ES.GEWWKAR SLATRKEGYI
blk_mouse  ..FVVALFDY AAVNDRDLQV LKGEKLQVLR .STGDWWLAR SLVTGREGYV
hck_mouse  .TIVVALYDY EAIHREDLSF QKGDQMVVLE .EAGEWWKAR SLATKKEGYI
lyn_human  ..IVVALYPY DGIHPDDLSF KKGEKMKVLE .EHGEWWKAK SLLTKKEGFI
lck_human  ..LVIALHSY EPSHDGDLGF EKGEQLRILE QS.GEWWKAQ SLTTGQEGFI
ss81_yeast .....ALYPY DADDDdeISF EQNEILQVSD .IEGRWWKAR R.ANGETGII
abl_mouse  ..LFVALYDF VASGDNTLSI TKGEKLRVLG YnnGEWCEAQ ..TKNGQGWV
abl1_human ..LFVALYDF VASGDNTLSI TKGEKLRVLG YnnGEWCEAQ ..TKNGQGWV
src1_drome ..VVVSLYDY KSRDESDLSF MKGDRMEVID DTESDWWRVV NLTTRQEGLI
mysd_dicdi .....ALYDF DAESSMELSF KEGDILTVLD QSSGDWWDAE L..KGRRGKV
yfj4_yeast ....VALYSF AGEESGDLPF RKGDVITILK ksQNDWWTGR V..NGREGIF
abl2_human ..LFVALYDF VASGDNTLSI TKGEKLRVLG YNQNGEWSEV RSKNG.QGWV
tec_human  .EIVVAMYDF QAAEGHDLRL ERGQEYLILE KNDVHWWRAR D.KYGNEGYI
abl1_caeel ..LFVALYDF HGVGEEQLSL RKGDQVRILG YNKNNEWCEA RlrLGEIGWV
txk_human  .....ALYDF LPREPCNLAL RRAEEYLILE KYNPHWWKAR D.RLGNEGLI
yha2_yeast VRRVRALYDL TTNEPDELSF RKGDVITVLE QVYRDWWKGA L..RGNMGIF
abp1_sacex .....AEYDY EAGEDNELTF AENDKIINIE FVDDDWWLGE LETTGQKGLF
YDFHGVGEDDISIKRG
PSI-BLAST SF Altschul 1997 Nucl Acids Res 25 3389-3402
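A minimal sketch of the sequence-profile idea: turn the alignment into a position-specific log-odds table and slide a query along it. It assumes a uniform background, a simple pseudocount, and an ungapped search; PSI-BLAST itself uses gapped alignment, sequence weights, substitution-matrix-based pseudocounts and iteration, none of which is reproduced here. The three-row profile and the query "ALYD" are taken from the first ten columns of the alignment above for illustration only.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
BACKGROUND = {a: 1.0 / len(AMINO_ACIDS) for a in AMINO_ACIDS}  # assumption: uniform


def build_pssm(alignment, pseudocount_weight=1.0):
    """Log-odds scores log2(p(a | column) / q(a)) for every column and residue.

    Lower-case letters are upper-cased; gaps ('.', '-') are skipped.
    """
    pssm = []
    for column in zip(*alignment):
        residues = [r.upper() for r in column if r.upper() in BACKGROUND]
        n = len(residues)
        scores = {}
        for a in AMINO_ACIDS:
            p = (residues.count(a) + pseudocount_weight * BACKGROUND[a]) / (
                n + pseudocount_weight
            )
            scores[a] = math.log2(p / BACKGROUND[a])
        pssm.append(scores)
    return pssm


def best_ungapped_hit(pssm, query):
    """Slide the query along the profile without gaps; return (best score, offset)."""
    best_score, best_offset = float("-inf"), None
    for offset in range(len(pssm) - len(query) + 1):
        score = sum(pssm[offset + i].get(r.upper(), 0.0) for i, r in enumerate(query))
        if score > best_score:
            best_score, best_offset = score, offset
    return best_score, best_offset


profile = build_pssm(["VTLFVALYDY", "VTLFIALYDY", "VTLFIALYDY"])
print(best_ungapped_hit(profile, "ALYD"))
```

Conserved columns reward the matching residue strongly and penalise everything else, which is what lets a profile detect relatives that a single sequence would miss.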
Anything more fancy?
Profile-profile comparison
[Figure: the fyn_human family alignment shown twice, once for each profile]
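One way to compare two profiles column by column is to ask how much more likely it is that both columns emit the same residue than expected from the background: S = log2 sum_a p1(a) * p2(a) / q(a). This score is similar in form to the one used by HHsearch, but here it is only an illustrative choice; the slides do not say which column score is meant, and the uniform background and the "peaked" example columns are assumptions.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
BACKGROUND = {a: 1.0 / len(AMINO_ACIDS) for a in AMINO_ACIDS}  # assumption: uniform


def column_pair_score(p1, p2, background=BACKGROUND):
    """log2(sum_a p1(a) * p2(a) / q(a)): positive if both columns favour the
    same residues, near 0 for background-like columns, negative otherwise."""
    return math.log2(sum(p1[a] * p2[a] / background[a] for a in AMINO_ACIDS))


def peaked_column(residue, peak=0.9):
    """Hypothetical profile column: `peak` on one residue, the rest spread evenly."""
    rest = (1.0 - peak) / (len(AMINO_ACIDS) - 1)
    return {a: (peak if a == residue else rest) for a in AMINO_ACIDS}


w_col, a_col = peaked_column("W"), peaked_column("A")
print(f"W vs W:             {column_pair_score(w_col, w_col):+.2f} bits")
print(f"W vs A:             {column_pair_score(w_col, a_col):+.2f} bits")
print(f"uniform vs uniform: {column_pair_score(BACKGROUND, BACKGROUND):+.2f} bits")
```

Plugging such a column score into ordinary dynamic programming, in place of a substitution matrix, gives a profile-profile alignment.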
Sequence comparisons: homology
Why bother to align sequences?
Sequence alignments
Why do we need to align sequences?
Homology!
Zones
Relations in protein space
Sequence -> Structure
[Figure: sequence space and structure space]
Similar (Sequence, Structure, Function)
Similar Sequence
Similar Structure
Similar Function
From 3D twilight to 3D midnight zone
Zones
PDB all-against-all
Databases biased: MUST remove bias!
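One common way to remove such bias is redundancy reduction: keep only a subset of sequences whose pairwise identity stays below a threshold. The greedy filter below is a minimal sketch under stated assumptions: sequences are pre-aligned to equal length, identity is counted over positions where neither sequence has a gap, and the 90% threshold in the usage example is arbitrary. The result depends on input order, a known property of greedy selection.

```python
def identity(seq_a, seq_b, gaps=".-"):
    """Percent identity over positions where neither pre-aligned sequence has a gap."""
    pairs = [(x, y) for x, y in zip(seq_a, seq_b) if x not in gaps and y not in gaps]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def greedy_nonredundant(named_sequences, max_identity):
    """Keep a sequence only if it is at most max_identity % identical to every
    sequence kept so far."""
    kept = []
    for name, seq in named_sequences:
        if all(identity(seq, other) <= max_identity for _, other in kept):
            kept.append((name, seq))
    return kept


# First ten alignment columns of four rows from the fyn_human family alignment
rows = [
    ("src_avis2", "VTTFVALYDY"),
    ("src_aviss", "VTTFVALYDY"),
    ("src_avisr", "VTTFVALYDY"),
    ("hck_human", "..IVVALYDY"),
]
print([name for name, _ in greedy_nonredundant(rows, max_identity=90.0)])
```

The three identical src entries collapse to one representative, while hck_human (75% identical over the compared positions) is kept.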
Hypothetical distribution of similar structures
[Figure: x-axis: percentage of identical residues (0-100); y-axis: 0-60]
FAKE DATA
Midnight zone: real - random
[Figure: number of pairs vs. percentage identical residues (0-25%)]
B Rost 1997 Folding & Design 2, S19-S24
AS Yang and B Honig 2000 J Mol Biol 301, 679-689
Evolution into the Midnight zone
[Figure: number of structure pairs (0-1600) vs. percentage pairwise sequence identity (axes 0-25% and 0-100%)]
B Rost 1997 Folding & Design 2, S19-S24
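The distributions above count pairs per sequence-identity interval. A minimal binning sketch, assuming the pairwise identities (in percent) are already computed; the 5% bin width mirrors the axis ticks and is an assumption, as is the function name.

```python
from collections import Counter
from typing import Iterable


def identity_histogram(identities: Iterable[float], bin_width: float = 5.0) -> dict[float, int]:
    """Number of pairs per identity bin [k*w, (k+1)*w), keyed by bin start, ascending."""
    counts = Counter(int(v // bin_width) for v in identities)
    return {k * bin_width: counts[k] for k in sorted(counts)}
```

For example, identities of 3.0, 4.9, 7.5, 12.0 and 12.5% give two pairs in the 0-5% bin, one in 5-10% and two in 10-15%.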
Protein structures evolved at random - almost
• average pairwise sequence identity < 10%
  -> most pairs have 'random' identity levels
• 3 - 4% anchor residues
• 4 billion years of evolution reached equilibrium
  • rate of creating new structures slower than drift towards mean
• averages for convergent and divergent evolution similar
  • convergent evolution may have been a major event
Secondary structure
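Two paths to fold recognition
[Figure: (1) known 3D structures from the PDB (Str 1 ... Str n, e.g. 1aap, 1tcp, 1btr) are projected onto 1D strings of secondary structure (sec) and accessibility (acc); (2) 1D structure (PHD 1 ... PHD n) is predicted by PHD from the query sequence (Seq U) and its profile]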
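The first path turns each known 3D structure into 1D strings. A sketch of that projection, assuming per-residue DSSP codes and precomputed relative accessibility values (percent of a per-residue-type maximum) as input. The 8-to-3 state mapping (H, G, I -> H; E, B -> E; rest -> loop) is one common convention, and the 16% buried/exposed cut-off is an assumption; the symbols '-', 'o' and '•' follow the TOPITS slide.

```python
# One common reduction of the eight DSSP states to three (an assumption; other
# mappings, e.g. counting only H and E, are also in use). Everything not listed,
# such as T, S or blank, becomes loop '-'.
DSSP_TO_3 = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def project_secondary(dssp: str) -> str:
    """Project a per-residue DSSP string onto the 3-state alphabet H, E, '-'."""
    return "".join(DSSP_TO_3.get(state, "-") for state in dssp)


def project_accessibility(rel_acc: list[float], threshold: float = 16.0) -> str:
    """Two-state accessibility: 'o' (exposed) if relative accessibility in percent
    exceeds threshold, else '•' (buried). The 16% cut-off is an assumption."""
    return "".join("o" if a > threshold else "•" for a in rel_acc)
```

For example, the DSSP string "  HHHGGEEBTS" projects to "--HHHHHEEE--".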
TOPITS
input: sequence
LWQRPLVTIKIGGQLKEALLDTGAD
-> generate sequence alignment
LWQRPLVTIKIGGQLKEALLDTGAD
LWRRPVVTAHIEGQLVEVLLDTGAD
DRPLVRVILTNTGstALLDSGAD
LEKRPTTIVLINDTPLNVLLDTGAD
:
-> predict 1D structure from sequence
-----EEEEE----EEEEEE-----
oooo•o•o•o•ooooo•ooooo•oo
-> align predicted and known structure(s), with the known 3D structure projected onto 1D:
-----EEEEE-----EEHHHH----
o•oo•••••o•ooo•oo•••oo••o
-> good match to one of the known structures? =>
• predict fold of matching structure
• model 3D coordinates by homology
note: exposed = o, buried = •
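TOPITS aligns the predicted 1D strings of the query against 1D strings projected from known structures and asks whether one of them matches well. A minimal sketch of that idea, using plain global dynamic programming (Needleman-Wunsch) with a hypothetical scoring scheme: +1/-1 for matching/mismatching secondary structure, +1/-1 for accessibility, and a gap penalty of -2. These scores and the choice of global alignment are assumptions for illustration; TOPITS' own scoring is not given in these slides.

```python
def state_score(sec_a: str, acc_a: str, sec_b: str, acc_b: str) -> int:
    """Hypothetical column score: +1/-1 for secondary structure, +1/-1 for accessibility."""
    return (1 if sec_a == sec_b else -1) + (1 if acc_a == acc_b else -1)


def align_1d(query: tuple[str, str], target: tuple[str, str], gap: int = -2) -> int:
    """Global (Needleman-Wunsch) alignment score of two 1D structure profiles.

    query and target are (secondary structure, accessibility) pairs of strings;
    within each pair both strings have the same length.
    """
    q_sec, q_acc = query
    t_sec, t_acc = target
    n, m = len(q_sec), len(t_sec)
    prev = [j * gap for j in range(m + 1)]  # row 0: target residues against gaps
    for i in range(1, n + 1):
        cur = [i * gap] + [0] * m
        for j in range(1, m + 1):
            match = prev[j - 1] + state_score(q_sec[i - 1], q_acc[i - 1], t_sec[j - 1], t_acc[j - 1])
            cur[j] = max(match, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev[m]


def best_match(query: tuple[str, str], library: dict[str, tuple[str, str]]) -> tuple[str, int]:
    """(name, score) of the known structure whose 1D strings align best to the prediction."""
    return max(((name, align_1d(query, known)) for name, known in library.items()),
               key=lambda item: item[1])
```

Without gaps, the predicted and known strings on the slide agree in 20 of 25 secondary structure states and 16 of 25 accessibility states, giving a diagonal score of (20 - 5) + (16 - 9) = 22; the dynamic-programming score can only be equal or higher.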
[Figure: per-residue identity of 1D structure for pairs of proteins vs. percentage of pairwise sequence identity (0-30%): three-state identity Q3 (55-85%), two-state identity Q2 (55-85%), and number of pairs (0-200)]
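Q3 is the percentage of residues whose three-state secondary structure (H, E, loop) is identical in the two strings being compared; the same count over two-state strings gives Q2. A sketch for two already-aligned, equal-length strings. As an illustration, and as an assumption about which two states are meant, it is applied to the buried/exposed strings from the TOPITS slide; the function name is hypothetical.

```python
def per_residue_identity(s1: str, s2: str) -> float:
    """Percentage of positions at which two equal-length state strings agree.

    With three states (H, E, '-') this is Q3; with two states it is Q2.
    """
    if len(s1) != len(s2) or not s1:
        raise ValueError("need two non-empty strings of equal length")
    return 100.0 * sum(a == b for a, b in zip(s1, s2)) / len(s1)
```

For the predicted and known strings on the TOPITS slide this gives Q3 = 80% (20 of 25 states) and, for the accessibility strings, Q2 = 64% (16 of 25).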
Fold recognition without folds: AGAPE
1D prediction errors correlate!
Fold recognition without folds better than with folds
D Przybylski & B Rost 2004 J Mol Biol 341, 255-269
AGAPE
Aligning GenerAlized ProfilEs
D Przybylski & B Rost 2004 J Mol Biol 341, 255-269
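The slides name AGAPE without spelling out its scoring. Assuming, in line with the preceding slides on 1D prediction, that a generalized profile attaches predicted 1D structure to each column of a sequence profile, a position-to-position score could combine both parts. The sketch below uses plain dot products and made-up weights (w_seq, w_ss); it is not AGAPE's published scoring function, and all names are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def profile_column(residues: str) -> list[float]:
    """Amino-acid frequencies of one alignment column (gaps and unknown symbols ignored)."""
    counts = {aa: 0 for aa in AMINO_ACIDS}
    for r in residues.upper():
        if r in counts:
            counts[r] += 1
    total = sum(counts.values())
    return [counts[aa] / total if total else 0.0 for aa in AMINO_ACIDS]


def generalized_column_score(
    freq_a: list[float], ss_a: list[float],
    freq_b: list[float], ss_b: list[float],
    w_seq: float = 1.0, w_ss: float = 0.5,
) -> float:
    """Hypothetical score: weighted dot products of amino-acid frequencies and of
    predicted secondary-structure probabilities (H, E, loop)."""
    seq_term = sum(x * y for x, y in zip(freq_a, freq_b))
    ss_term = sum(x * y for x, y in zip(ss_a, ss_b))
    return w_seq * seq_term + w_ss * ss_term
```

For a column "VVVI." (75% V, 25% I) predicted as pure helix, scoring it against itself gives 0.625 from the sequence term plus 0.5 * 1.0 from the structure term, i.e. 1.125.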