Building the Right Multiple Sequence Alignment

Copyright Cédric Notredame (2000-2003) All rights reserved

Recognizing The Right Sequences When you Meet Them…

Gathering Sequences: BLAST

Common Mistake: Sequences Too Closely Related

PRVA_MACFU SMTDLLNAEDIKKAVGAFSAIDSFDHKKFFQMVGLKKKSADDVKKVFHILDKDKSGFIEE
PRVA_HUMAN SMTDLLNAEDIKKAVGAFSATDSFDHKKFFQMVGLKKKSADDVKKVFHMLDKDKSGFIEE
PRVA_GERSP SMTDLLSAEDIKKAIGAFAAADSFDHKKFFQMVGLKKKTPDDVKKVFHILDKDKSGFIEE
PRVA_MOUSE SMTDVLSAEDIKKAIGAFAAADSFDHKKFFQMVGLKKKNPDEVKKVFHILDKDKSGFIEE
PRVA_RAT   SMTDLLSAEDIKKAIGAFTAADSFDHKKFFQMVGLKKKSADDVKKVFHILDKDKSGFIEE
PRVA_RABIT AMTELLNAEDIKKAIGAFAAAESFDHKKFFQMVGLKKKSTEDVKKVFHILDKDKSGFIEE
           :**::*.*******:***:* :****************..::******:***********

PRVA_MACFU DELGFILKGFSPDARDLSAKETKTLMAAGDKDGDGKIGVDEFSTLVAES
PRVA_HUMAN DELGFILKGFSPDARDLSAKETKMLMAAGDKDGDGKIGVDEFSTLVAES
PRVA_GERSP DELGFILKGFSSDARDLSAKETKTLLAAGDKDGDGKIGVEEFSTLVSES
PRVA_MOUSE DELGSILKGFSSDARDLSAKETKTLLAAGDKDGDGKIGVEEFSTLVAES
PRVA_RAT   DELGSILKGFSSDARDLSAKETKTLMAAGDKDGDGKIGVEEFSTLVAES
PRVA_RABIT EELGFILKGFSPDARDLSVKETKTLMAAGDKDGDGKIGADEFSTLVSES
           :*** ******.******.**** *:************.:******:**

-IDENTICAL SEQUENCES BRING NO INFORMATION FOR THE MULTIPLE SEQUENCE ALIGNMENT

-MULTIPLE SEQUENCE ALIGNMENTS THRIVE ON DIVERSITY…
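One way to spot sequences that are too closely related is to measure pairwise percent identity over the aligned rows. The following is a minimal Python sketch, assuming the rows are already aligned and that columns containing a gap are ignored; the function names are illustrative and do not come from any alignment package.

```python
from itertools import combinations


def percent_identity(a, b):
    """Percent identity over columns where neither row has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned rows must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


def pairwise_identities(alignment):
    """Map every pair of sequence names to its percent identity."""
    return {
        (n1, n2): percent_identity(alignment[n1], alignment[n2])
        for n1, n2 in combinations(alignment, 2)
    }


# First 20 columns of three parvalbumin rows from the slide.
rows = {
    "PRVA_MACFU": "SMTDLLNAEDIKKAVGAFSA",
    "PRVA_HUMAN": "SMTDLLNAEDIKKAVGAFSA",
    "PRVA_RABIT": "AMTELLNAEDIKKAIGAFAA",
}

for pair, pid in pairwise_identities(rows).items():
    print(pair, f"{pid:.0f}%")
```

Pairs that score close to 100% add almost nothing to the alignment, which is the point the slide makes.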

Sequence Weighting Within ClustalW

Selecting Diverse Sequences (Opus II)

Respect Information!

This alignment is not informative about the relation between TPCC_MOUSE and the rest of the sequences.

PRVA_MACFU ------------------------------------------SMTDLLN----AEDIKKA
PRVA_HUMAN ------------------------------------------SMTDLLN----AEDIKKA
PRVA_GERSP ------------------------------------------SMTDLLS----AEDIKKA
PRVA_MOUSE ------------------------------------------SMTDVLS----AEDIKKA
PRVA_RAT   ------------------------------------------SMTDLLS----AEDIKKA
PRVA_RABIT ------------------------------------------AMTELLN----AEDIKKA
TPCC_MOUSE MDDIYKAAVEQLTEEQKNEFKAAFDIFVLGAEDGCISTKELGKVMRMLGQNPTPEELQEM
: :*. .*::::

PRVA_MACFU VGAFSAIDS--FDHKKFFQMVG------LKKKSADDVKKVFHILDKDKSGFIEEDELGFI
PRVA_HUMAN VGAFSATDS--FDHKKFFQMVG------LKKKSADDVKKVFHMLDKDKSGFIEEDELGFI
PRVA_GERSP IGAFAAADS--FDHKKFFQMVG------LKKKTPDDVKKVFHILDKDKSGFIEEDELGFI
PRVA_MOUSE IGAFAAADS--FDHKKFFQMVG------LKKKNPDEVKKVFHILDKDKSGFIEEDELGSI
PRVA_RAT   IGAFTAADS--FDHKKFFQMVG------LKKKSADDVKKVFHILDKDKSGFIEEDELGSI
PRVA_RABIT IGAFAAAES--FDHKKFFQMVG------LKKKSTEDVKKVFHILDKDKSGFIEEEELGFI
TPCC_MOUSE IDEVDEDGSGTVDFDEFLVMMVRCMKDDSKGKSEEELSDLFRMFDKNADGYIDLDELKMM

-A better spread of the sequences is needed
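A first step towards a better spread is to thin out the near-identical rows. The sketch below is one hypothetical way to do it in Python: a greedy filter that keeps a row only if it is not too similar to any row already kept. The 0.9 identity cut-off and the function names are assumptions made for illustration, not values taken from the slides.

```python
def identity(a, b):
    """Fraction of identical residues over columns where both rows are ungapped."""
    cols = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x == y for x, y in cols) / len(cols) if cols else 0.0


def pick_diverse(alignment, max_identity=0.9):
    """Greedy filter: keep a row only if its identity to every row
    already kept is at most max_identity. Input order decides which
    member of a redundant group survives."""
    kept = {}
    for name, row in alignment.items():
        if all(identity(row, other) <= max_identity for other in kept.values()):
            kept[name] = row
    return kept


# The first 22 columns of the second block shown above.
rows = {
    "PRVA_MACFU": "VGAFSAIDS--FDHKKFFQMVG",
    "PRVA_HUMAN": "VGAFSATDS--FDHKKFFQMVG",
    "PRVA_GERSP": "IGAFAAADS--FDHKKFFQMVG",
    "TPCC_MOUSE": "IDEVDEDGSGTVDFDEFLVMMV",
}

print(list(pick_diverse(rows, max_identity=0.9)))
```

Removing redundancy does not by itself bridge the distance to TPCC_MOUSE; intermediate sequences still have to be added to the set.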

Selecting Diverse Sequences (Opus II)

Selecting Diverse Sequences (Opus II)

PRVB_CYPCA -AFAGVLNDADIAAALEACKAADSFNHKAFFAKVGLTSKSADDVKKAFAIIDQDKSGFIE
PRVB_BOACO -AFAGILSDADIAAGLQSCQAADSFSCKTFFAKSGLHSKSKDQLTKVFGVIDRDKSGYIE
PRV1_SALSA MACAHLCKEADIKTALEACKAADTFSFKTFFHTIGFASKSADDVKKAFKVIDQDASGFIE
PRVB_LATCH -AVAKLLAAADVTAALEGCKADDSFNHKVFFQKTGLAKKSNEELEAIFKILDQDKSGFIE
PRVB_RANES -SITDIVSEKDIDAALESVKAAGSFNYKIFFQKVGLAGKSAADAKKVFEILDRDKSGFIE
PRVA_MACFU -SMTDLLNAEDIKKAVGAFSAIDSFDHKKFFQMVGLKKKSADDVKKVFHILDKDKSGFIE
PRVA_ESOLU --AKDLLKADDIKKALDAVKAEGSFNHKKFFALVGLKAMSANDVKKVFKAIDADASGFIE
: *: .: . .* .:*. * ** *: * : * :* * **:**

PRVB_CYPCA EDELKLFLQNFKADARALTDGETKTFLKAGDSDGDGKIGVDEFTALVKA-
PRVB_BOACO EDELKKFLQNFDGKARDLTDKETAEFLKEGDTDGDGKIGVEEFVVLVTKG
PRV1_SALSA VEELKLFLQNFCPKARELTDAETKAFLKAGDADGDGMIGIDEFAVLVKQ-
PRVB_LATCH DEELELFLQNFSAGARTLTKTETETFLKAGDSDGDGKIGVDEFQKLVKA-
PRVB_RANES QDELGLFLQNFRASARVLSDAETSAFLKAGDSDGDGKIGVEEFQALVKA-
PRVA_MACFU EDELGFILKGFSPDARDLSAKETKTLMAAGDKDGDGKIGVDEFSTLVAES
PRVA_ESOLU EEELKFVLKSFAADGRDLTDAETKAFLKAADKDGDGKIGIDEFETLVHEA
:** .*:.* .* *: ** :: .* **** **::** **

-A REASONABLE Model Now Exists.

-Going Further: Remote Homologues.

Aligning Remote Homologues

PRVA_MACFU ------------------------------------------SMTDLLNA----EDIKKA
PRVA_ESOLU -------------------------------------------AKDLLKA----DDIKKA
PRVB_CYPCA ------------------------------------------AFAGVLND----ADIAAA
PRVB_BOACO ------------------------------------------AFAGILSD----ADIAAG
PRV1_SALSA -----------------------------------------MACAHLCKE----ADIKTA
PRVB_LATCH ------------------------------------------AVAKLLAA----ADVTAA
PRVB_RANES ------------------------------------------SITDIVSE----KDIDAA
TPCS_RABIT -TDQQAEARSYLSEEMIAEFKAAFDMFDADGG-GDISVKELGTVMRMLGQTPTKEELDAI
TPCS_PIG   -TDQQAEARSYLSEEMIAEFKAAFDMFDADGG-GDISVKELGTVMRMLGQTPTKEELDAI
TPCC_MOUSE MDDIYKAAVEQLTEEQKNEFKAAFDIFVLGAEDGCISTKELGKVMRMLGQNPTPEELQEM
: ::

PRVA_MACFU VGAFSAIDS--FDHKKFFQMVG------LKKKSADDVKKVFHILDKDKSGFIEEDELGFI
PRVA_ESOLU LDAVKAEGS--FNHKKFFALVG------LKAMSANDVKKVFKAIDADASGFIEEEELKFV
PRVB_CYPCA LEACKAADS--FNHKAFFAKVG------LTSKSADDVKKAFAIIDQDKSGFIEEDELKLF
PRVB_BOACO LQSCQAADS--FSCKTFFAKSG------LHSKSKDQLTKVFGVIDRDKSGYIEEDELKKF
PRV1_SALSA LEACKAADT--FSFKTFFHTIG------FASKSADDVKKAFKVIDQDASGFIEVEELKLF
PRVB_LATCH LEGCKADDS--FNHKVFFQKTG------LAKKSNEELEAIFKILDQDKSGFIEDEELELF
PRVB_RANES LESVKAAGS--FNYKIFFQKVG------LAGKSAADAKKVFEILDRDKSGFIEQDELGLF
TPCS_RABIT IEEVDEDGSGTIDFEEFLVMMVRQMKEDAKGKSEEELAECFRIFDRNADGYIDAEELAEI
TPCS_PIG   IEEVDEDGSGTIDFEEFLVMMVRQMKEDAKGKSEEELAECFRIFDRNMDGYIDAEELAEI
TPCC_MOUSE IDEVDEDGSGTVDFDEFLVMMVRCMKDDSKGKSEEELSDLFRMFDKNADGYIDLDELKMM
: . .: .. . *: * : * :* : .*:*: :** .

PRVA_MACFU LKGFSPDARDLSAKETKTLMAAGDKDGDGKIGVDEFSTLVAES-
PRVA_ESOLU LKSFAADGRDLTDAETKAFLKAADKDGDGKIGIDEFETLVHEA-
PRVB_CYPCA LQNFKADARALTDGETKTFLKAGDSDGDGKIGVDEFTALVKA--
PRVB_BOACO LQNFDGKARDLTDKETAEFLKEGDTDGDGKIGVEEFVVLVTKG-
PRV1_SALSA LQNFCPKARELTDAETKAFLKAGDADGDGMIGIDEFAVLVKQ--
PRVB_LATCH LQNFSAGARTLTKTETETFLKAGDSDGDGKIGVDEFQKLVKA--
PRVB_RANES LQNFRASARVLSDAETSAFLKAGDSDGDGKIGVEEFQALVKA--
TPCS_RABIT FR---ASGEHVTDEEIESLMKDGDKNNDGRIDFDEFLKMMEGVQ
TPCS_PIG   FR---ASGEHVTDEEIESIMKDGDKNNDGRIDFDEFLKMMEGVQ
TPCC_MOUSE LQ---ATGETITEDDIEELMKDGDKNNDGRIDYDEFLEFMKGVE
:: .. :: : :: .* :.** *. :** ::

Some Guidelines…

Do Not Use Too Many Sequences…

Reading Your Alignment

Going Further…

PRVA_MACFU VGAFSAIDS--FDHKKFFQMVG------LKKKSADDVKKVFHILDKDKSGFIEEDELGFI
PRVB_BOACO LQSCQAADS--FSCKTFFAKSG------LHSKSKDQLTKVFGVIDRDKSGYIEEDELKKF
PRV1_SALSA LEACKAADT--FSFKTFFHTIG------FASKSADDVKKAFKVIDQDASGFIEVEELKLF
TPCS_RABIT IEEVDEDGSGTIDFEEFLVMMVRQMKEDAKGKSEEELAECFRIFDRNADGYIDAEELAEI
TPCS_PIG   IEEVDEDGSGTIDFEEFLVMMVRQMKEDAKGKSEEELAECFRIFDRNMDGYIDAEELAEI
TPCC_MOUSE IDEVDEDGSGTVDFDEFLVMMVRCMKDDSKGKSEEELSDLFRMFDKNADGYIDLDELKMM
TPC_PATYE  SDEMDEEATGRLNCDAWIQLFER---KLKEDLDERELKEAFRVLDKEKKGVIKVDVLRWI
. : .. . :: . : * :* : .* *. : * .

PRVA_MACFU LKGFSPDARDLSAKETKTLMAAGDKDGDGKIGVDEFSTLVAES--
PRVB_BOACO LQNFDGKARDLTDKETAEFLKEGDTDGDGKIGVEEFVVLVTKG--
PRV1_SALSA LQNFCPKARELTDAETKAFLKAGDADGDGMIGIDEFAVLVKQ---
TPCS_RABIT FR---ASGEHVTDEEIESLMKDGDKNNDGRIDFDEFLKMMEGVQ-
TPCS_PIG   FR---ASGEHVTDEEIESIMKDGDKNNDGRIDFDEFLKMMEGVQ-
TPCC_MOUSE LQ---ATGETITEDDIEELMKDGDKNNDGRIDYDEFLEFMKGVE-
TPC_PATYE  LS---SLGDELTEEEIENMIAETDTDGSGTVDYEEFKCLMMSSDA
: . :: : :: * :..* :. :** ::

WHAT MAKES A GOOD ALIGNMENT…

-THE MORE DIVERGENT THE SEQUENCES, THE BETTER

-THE FEWER INDELS, THE BETTER

-NICE UNGAPPED BLOCKS SEPARATED BY INDELS

-DIFFERENT CLASSES OF RESIDUES WITHIN A BLOCK:

•Completely Conserved
•Conserved For Size and Hydropathy
•Conserved For Size or Hydropathy

-THE ULTIMATE EVALUATION IS A MATTER OF PERSONAL JUDGEMENT AND KNOWLEDGE.
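Two of the criteria above, ungapped blocks and completely conserved columns, are easy to check mechanically. The Python sketch below does only that, under the assumption that the alignment is a list of equal-length rows with "-" as the gap character; the function names and the minimum block length of 5 are illustrative choices. It does not attempt the size or hydropathy classes, and it is no substitute for the personal judgement the slide calls for.

```python
def ungapped_blocks(rows, min_len=5):
    """Return (start, end) column ranges, end exclusive, in which
    no row contains a gap and the range is at least min_len wide."""
    width = len(rows[0])
    blocks, start = [], None
    for col in range(width + 1):
        gap_free = col < width and all(r[col] != "-" for r in rows)
        if gap_free and start is None:
            start = col
        elif not gap_free and start is not None:
            if col - start >= min_len:
                blocks.append((start, col))
            start = None
    return blocks


def fully_conserved(rows):
    """Columns where every row carries the same residue (the '*' columns)."""
    first = rows[0]
    return [
        c for c in range(len(first))
        if first[c] != "-" and all(r[c] == first[c] for r in rows)
    ]


# Last block of three rows from the remote-homologue alignment.
rows = [
    "LKGFSPDARDLSAKETKTLMAAGDKDGDGKIGVDEFSTLVAES-",  # PRVA_MACFU
    "LQNFKADARALTDGETKTFLKAGDSDGDGKIGVDEFTALVKA--",  # PRVB_CYPCA
    "FR---ASGEHVTDEEIESLMKDGDKNNDGRIDFDEFLKMMEGVQ",  # TPCS_RABIT
]

print("ungapped blocks:", ungapped_blocks(rows))
print("conserved columns:", fully_conserved(rows))
```

Column indices are zero-based, so they need a +1 shift before being compared with positions read off a printed alignment.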

Potential Difficulties

DO NOT OVERTUNE!!!

chite ---ADKPKRPLSAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD
wheat --DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE
trybr KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP
mouse -----KPKRPRSAYNIYVSESFQ----EAKDDS-AQGKLKLVNEAWKNLSP
***. ::: .: .. . : . . * . *: *

chite AATAKQNYIRALQEYERNGG-
wheat ANKLKGEYNKAIAAYNKGESA
trybr AEKDKERYKREM---------
mouse AKDDRIRYDNEMKSWEEQMAE
* : .* . :

DO NOT PLAY WITH PARAMETERS IF YOU KNOW THE ALIGNMENT YOU WANT: MAKE IT YOURSELF!

chite ---ADKPKRPL-SAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD
wheat --DPNKPKRAP-SAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE
trybr KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP
mouse -----KPKRPR-SAYNIYVSESFQ----EAKDDS-AQGKLKLVNEAWKNLSP
***. :*: .: .. . : . . * . *: *

chite AATAKQNYIRALQEYERNGG-
wheat ANKLKGEYNKAIAAYNKGESA
trybr AEKDKERYKREM---------
mouse AKDDRIRYDNEMKSWEEQMAE
* : .* . :

TUNING or NOT TUNING!!!

-MOST METHODS ARE TUNED TO WORK WELL ON AVERAGE

-PARAMETER BEHAVIOUR DOES NOT NECESSARILY FOLLOW THE THEORY (e.g. SUBSTITUTION MATRICES).

-A GOOD ALIGNMENT IS USUALLY ROBUST (i.e. IT CHANGES LITTLE WHEN THE PARAMETERS CHANGE).

-TUNE IF YOU WANT TO CONVINCE YOURSELF.

-PARAMETERS TO TUNE USUALLY INCLUDE:

•GOP/GEP (GAP OPENING PENALTY / GAP EXTENSION PENALTY)
•MATRIX
•SENSITIVITY vs SPEED

Substitution Matrices (Etzold et al. 1993)

Gonnet    61.7 %
Blosum50  59.7 %
Pam250    59.2 %
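To see why GOP and GEP pull an alignment in different directions, it helps to write the affine gap cost down. The Python sketch below assumes the convention cost = GOP + GEP × gap length and uses made-up penalty values (10 and 0.5); real programs differ in both the convention and the defaults, and many treat terminal gaps specially, which this sketch does not.

```python
def affine_gap_cost(length, gop=10.0, gep=0.5):
    """Cost of one gap spanning `length` columns: a single opening
    charge plus an extension charge for every gapped column."""
    if length <= 0:
        return 0.0
    return gop + gep * length


def total_gap_cost(row, gop=10.0, gep=0.5):
    """Sum the affine cost over every run of '-' in an aligned row."""
    cost, run = 0.0, 0
    for ch in row + "X":  # the sentinel closes a trailing gap
        if ch == "-":
            run += 1
        else:
            cost += affine_gap_cost(run, gop, gep)
            run = 0
    return cost


# The trybr row from the two alignments shown on the neighbouring slides.
compact = "KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP"
scattered = "-K--KDSNAPKR-AMT-MFFSSDFR-S-KH-S-DLS-IV-EMSKAAGAAWKELG"

print(total_gap_cost(compact))
print(total_gap_cost(scattered))
```

With a high GOP, many short gaps cost far more than a few long ones, so lowering GOP relative to GEP tends to produce the kind of scattered gaps seen in the second row.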

KEEP A BIOLOGICAL PERSPECTIVE

chite ---ADKPKRPLSAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD
wheat --DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE
trybr KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP
mouse -----KPKRPRSAYNIYVSESFQ----EAKDDS-AQGKLKLVNEAWKNLSP
***. ::: .: .. . : . . * . *: *

chite AD--K----PKR-PLYMLWLNS-ARESIKRENPDFK-VT-EVAKKGGELWRGL-
wheat -DPNK----PKRAP-FFVFMGE-FREEFKQKNPKNKSVA-AVGKAAGERWKSLS
trybr -K--KDSNAPKR-AMT-MFFSSDFR-S-KH-S-DLS-IV-EMSKAAGAAWKELG
mouse ----K----PKR-PRYNIYVSESFQEA-K--D-D-S-AQGKL-KLVNEAWKNLS
* *** .:: ::... : * . . . : * . *: *

DIFFERENT PARAMETERS

WRONG ALIGNMENT !!!

REPEATS

THERE IS A PROBLEM WHEN TWO SEQUENCES DO NOT CONTAIN THE SAME NUMBER OF REPEATS

IT IS THEN BETTER TO MANUALLY EXTRACT THE REPEATS AND TO ALIGN THEM. INDIVIDUAL REPEATS CAN BE RECOGNIZED USING DOTTER
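The idea behind a dot plot can be illustrated in a few lines of Python. This is only a toy sketch, not Dotter's algorithm: it marks exact word matches instead of scoring windows with a substitution matrix, and the test sequence is invented for the example (an EF-hand-like motif pasted twice). The function names and the word length of 3 are assumptions.

```python
def dot_plot(seq_a, seq_b, window=3):
    """Set of (i, j) such that the window-long words starting at
    seq_a[i] and seq_b[j] are identical."""
    words = {}
    for j in range(len(seq_b) - window + 1):
        words.setdefault(seq_b[j:j + window], []).append(j)
    return {
        (i, j)
        for i in range(len(seq_a) - window + 1)
        for j in words.get(seq_a[i:i + window], [])
    }


def repeat_offsets(seq, window=3):
    """Self-comparison: count hits per offset j - i > 0.
    A heavily populated offset suggests a repeat with that spacing."""
    counts = {}
    for i, j in dot_plot(seq, seq, window):
        if j > i:
            counts[j - i] = counts.get(j - i, 0) + 1
    return counts


# Invented sequence: the same 11-residue motif occurs twice, 15 positions apart.
motif = "DGDGKIGVDEF"
seq = "AAQ" + motif + "LLPS" + motif + "WW"

print(repeat_offsets(seq))
```

Once the repeat units are located this way, each unit can be cut out and aligned separately, as the slide recommends.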

Naming Your Sequences The Right Way

Choosing the right method

Situation Solution

Priority Solution

Method

Priority

Trees  Profile  2D-Pred  3D-Pred  Func-Pred

Accuracy

Speed

Purpose Solution

Conclusion

-The BEST Alignment Method: Your Brain + The Right Data

-Beware of repeated elements

Multiple Alignment

-The Best Evaluation Procedure: Experimental Data (SwissProt)

-Choosing The Sequences Well is Important

Know Your Problem: What do you want to do with your MSA?

Multiple Alignment

Addresses

MAFFT   Progressive/Iterative     www.biophys.kyoto-u.jp/katoh
POA     Progressive/Simultaneous  www.bioinformatics.ucla.edu/poa
MUSCLE  Progressive/Iterative     www.drive5.com/muscle