Workshop: Tips to align DNA sequences and build phylogenetic trees

Upload: qcbsannie

Post on 03-Apr-2018


  • 7/28/2019 Workshop Tips to align DNA sequences and build phylogenetic trees

    1/29

    Tips to align DNA sequences and build phylogenetic trees

    A QCBS workshop - Annie Archambault, July 2013


    2/29

    Intro

    Level and content: practical

    How to get sequences

    How to align them

    How to build a tree

    Your expectations

    Tomorrow: Your turn

    Participants

    Jorge Ramirez

    Pedram Samani

    Genevieve Guay

    Annie Archambault

    Noémie Blanchet

    Ariane Pelletier


    3/29

    Get sequences

    In GenBank, the database:

    Use the Nucleotide, the Gene, or the Population Sets (PopSet) database

    https://www.ncbi.nlm.nih.gov/nuccore/advanced

    https://www.ncbi.nlm.nih.gov/popset/
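The web search pages above can also be scripted. The sketch below assumes NCBI's E-utilities `efetch` service (not mentioned on the slides) with its `db`, `id`, `rettype` and `retmode` parameters; the accession numbers in the usage line are hypothetical placeholders. Only URL building is exercised offline; `download_fasta` needs network access.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities efetch endpoint (an assumption: not named on the slides)
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(accessions, db="nuccore"):
    """Build an efetch URL returning the given records as FASTA text."""
    params = {
        "db": db,                    # nuccore = the Nucleotide database
        "id": ",".join(accessions),  # comma-separated accession numbers
        "rettype": "fasta",          # FASTA records
        "retmode": "text",           # plain text rather than XML
    }
    return EFETCH + "?" + urlencode(params)


def download_fasta(accessions, path):
    """Fetch the records over the network and save them in one FASTA file."""
    with urlopen(efetch_fasta_url(accessions)) as response:
        text = response.read().decode("utf-8")
    with open(path, "w") as handle:
        handle.write(text)
```

Usage, with real accession numbers in place of the hypothetical `ACC1`/`ACC2`: `download_fasta(["ACC1", "ACC2"], "my_sequences.fasta")`. Check NCBI's usage guidelines before sending many requests.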

    4/29

    Get the sequences

    Use the BLAST similarity search: http://blast.ncbi.nlm.nih.gov/Blast.cgi

    Automate if you run many similar searches

    e.g. http://qcbs.ca/wiki/commandline_remote_blast
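One way to automate many similar searches is to generate BLAST+ command lines, one per query file. This is a minimal sketch assuming the BLAST+ `blastn` program is installed and using its `-remote` option to search NCBI's servers; it does not reproduce the content of the QCBS wiki page, and the file names are hypothetical.

```python
import shlex
from pathlib import Path


def remote_blast_commands(query_files, db="nt", outfmt=6, max_target_seqs=10):
    """Build one BLAST+ `blastn -remote` command per query FASTA file."""
    commands = []
    for query in query_files:
        query = Path(query)
        out = query.with_name(query.stem + "_blast.tsv")  # gene1.fasta -> gene1_blast.tsv
        commands.append([
            "blastn",
            "-remote",                             # search runs on NCBI's servers
            "-db", db,                             # remote database, e.g. nt
            "-query", str(query),
            "-outfmt", str(outfmt),                # 6 = tab-separated hit table
            "-max_target_seqs", str(max_target_seqs),
            "-out", str(out),
        ])
    return commands


if __name__ == "__main__":
    # Print a shell script; or pass each list to subprocess.run(cmd, check=True)
    for cmd in remote_blast_commands(["gene1.fasta", "gene2.fasta"]):
        print(shlex.join(cmd))
```

Keeping the commands as lists avoids shell-quoting problems when they are run through `subprocess`.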

    5/29

    Get the sequences

    Alternative: reference databases dedicated to particular genes or organisms (GenBank is not error-free)

    SILVA http://www.arb-silva.de/: aligned small (16S/18S, SSU) and large (23S/28S, LSU) subunit ribosomal RNA gene sequences from Bacteria, Archaea and Eukaryota

    RDP http://rdp.cme.msu.edu/: the Ribosomal Database Project, bacterial and archaeal 16S rRNA sequences

    Greengenes http://greengenes.lbl.gov/: 16S rRNA gene sequence alignment

    UNITE http://unite.ut.ee/: reference records of ITS sequences for ectomycorrhizal (ECM) fungi

    Barcode of Life Data System (BOLD) http://www.boldsystems.org/: animal mitochondrial cytochrome c oxidase I (COI)

    Protist Ribosomal Reference Database http://ssu-rrna.org/: unicellular eukaryote small subunit rRNA (18S)

    6/29

    Practical steps for alignment and tree building

    Align sequences with automated programs

    Test a few algorithms

    Adjust the alignment by eye (controversial)

    Trim (or exclude) the ends, and regions whose alignment you don't trust

    Identify the best-fit model for your data

    Build trees (typically ML or Bayesian)

    Share data, code and models

    Account for uncertainty

    Branch support (Bootstrap or Bayesian trees)

    Inferring evolutionary forces (e.g. positive selection)
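The trimming step can be partly automated. A simple, commonly used rule is to drop alignment columns where too many sequences have a gap, which removes ragged ends and gappy internal regions alike. This sketch assumes a 50% gap threshold and a dict of equal-length aligned sequences; dedicated trimming tools use more refined criteria.

```python
def trim_gappy_columns(alignment, max_gap_fraction=0.5):
    """Drop alignment columns whose fraction of gaps exceeds max_gap_fraction.

    `alignment` maps sequence names to aligned sequences of equal length.
    """
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("aligned sequences must all have the same length")
    # Indices of the columns to keep
    keep = [
        i for i in range(length)
        if sum(s[i] == "-" for s in seqs) / len(seqs) <= max_gap_fraction
    ]
    return {name: "".join(s[i] for i in keep) for name, s in alignment.items()}
```

For example, with `{"a": "--ACGT-", "b": "-AACGTT", "c": "--ACG-T"}` the first two columns (all or mostly gaps) are removed, while the last two, gapped in only one sequence, are kept.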


    7/29

    >Protea_witzenbergiana Some description here

    CGCGAGAAGTCCACTGAACCTTATCATTTAGAGGAA

    TTCCGTAGGTGAACCTGCGGAAGGATCATTGTCGAT

    CCCGCGAACACGTCGAACGGTGACC

    >Protea_wentzeliana Different description here

    CGCGAGAAGTCCACTGAACCTTATCATTTAGAGGAA

    TTCCGTAGGTGAACCTGCGGAAGGATCATTGTCGAT

    CCCGCGAACACGTCGAACGGTGACC

    >Protea_vogtsiae Other here

    CGCGAGAAGTCCACTGAACCTTATCATTTAGAGGAA

    TTCCGTAGGTGAACCTGCGGAAGGATCATTGTCGAT

    CCCGCGAACACGTCGAACGGTGACCGGGGGGCGA

    >Protea_witzenbergiana Some description here

    ----C-------GCGA--------

    GAAGTCCACTGAACCTTATCATTTAGAGGAAGGAGA

    TAGGTGAACCTGCGGAAGGATCATTGTCGATGCCTG

    GAACACGTC-G-AACGGT-GACC-

    >Protea_wentzeliana Different description here

    ----C-------GCGA--------

    GAAGTCCACTGAACCTTATCATTTAGAGGAAGGAGA

    TAGGTGAACCTGCGGAAGGATCATTGTCGATGCCTG

    GAACACGTC-G-AACGGT-GACC-

    >Protea_vogtsiae Other here

    ----C-------GCGA--------

    GAAGTCCACTGAACCTTATCATTTAGAGGAAGGAGA

    TAGGTGAACCTGCGGAAGGATCATTGTCGATGCCTG

    GAACACGTC-G-AACGGT-GACC-GGGG-G-G-CGA-G-TG----------

    FASTA format

    All your sequences in one plain-text file, in the FASTA format:

    Greater-than sign (>)

    Sequence name + description

    Return

    The sequence (with or without "-" for gaps)

    Return
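The format rules above are simple enough to read and write with a few lines of code. This is a minimal sketch (not a specific library's API) that treats the first word after ">" as the sequence name and the rest of the line as the description, and joins sequence lines that span several lines.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (name, description, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((*header, "".join(chunks)))
            name, _, description = line[1:].partition(" ")
            header, chunks = (name, description), []
        elif header is not None:
            chunks.append(line)  # a sequence may span several lines, with or without "-"
        else:
            raise ValueError("sequence data before the first '>' header")
    if header is not None:
        records.append((*header, "".join(chunks)))
    return records


def write_fasta(records, width=60):
    """Format (name, description, sequence) tuples as FASTA text, wrapping long lines."""
    lines = []
    for name, description, seq in records:
        lines.append(f">{name} {description}".rstrip())
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"
```

Reading and then writing a file gives a normalized copy, which is handy before feeding sequences to aligners that are picky about line lengths.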


    8/29

    Let's align!

    A plethora of alignment algorithms

    Very different calculation methods

    We will try only five (listed on the wiki):

    Clustal

    Muscle

    PRANK

    SATe

    FAST

    Jalview, SuiteMSA, BioEdit (viewers)

    On 3 datasets

    PR10_fabaceae_11seq

    ITS_oxytropis_84seq

    Fungal_refseq_ITS_301seq
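The programs above differ a lot, but many of them build on dynamic-programming alignment of pairs of sequences or profiles. As a hedged illustration of that core idea only (not of what Clustal, Muscle, PRANK or SATe do internally), here is the textbook Needleman-Wunsch global alignment with arbitrary, assumed scores: +1 match, -1 mismatch, -2 per gap.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of two sequences with a linear gap penalty."""
    def sub(x, y):
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # (mis)match
                score[i - 1][j] + gap,                          # gap in b
                score[i][j - 1] + gap,                          # gap in a
            )
    # Trace back from the bottom-right cell to recover one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]
```

Changing the gap penalty changes where gaps land, which is one reason different programs give different alignments of the same data.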


    9/29

    Why care about alignment?

    To have a reliable tree!

    Because your matrix will be public

    http://treebase.org

    http://datadryad.org/ ($80)

    10/29

    Let's align!

    Questions to ponder

    Would you trust automated alignments?

    Does one algorithm outperform all the others in all circumstances?

    Which algorithm are you likely to use for your projects?


    11/29

    Let's align!

    Advantages of each program:

    Clustal: easy, used everywhere

    Muscle: fast distance estimation, progressive alignment, refinement by restricted partitioning

    PRANK: corrects for insertions and deletions; good for codons

    SATe: co-estimates alignments and trees; runs relatively fast; divide-and-conquer realignment
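"Fast distance estimation" means getting rough pairwise distances without aligning, so that a guide tree can order the progressive alignment. This sketch uses a simplified k-mer distance in that spirit: one minus the fraction of shared k-mers. The exact formula is an assumption for illustration, not Muscle's own.

```python
from collections import Counter
from itertools import combinations


def kmer_counts(seq, k):
    """Count every overlapping k-mer in an ungapped sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_distance(a, b, k=3):
    """1 - (shared k-mers / k-mers in the shorter sequence). No alignment needed."""
    possible = min(len(a), len(b)) - k + 1
    if possible < 1:
        raise ValueError("sequences must be at least k bases long")
    # Counter & Counter keeps the minimum count of each k-mer present in both
    shared = sum((kmer_counts(a, k) & kmer_counts(b, k)).values())
    return 1.0 - shared / possible


def kmer_distance_matrix(seqs, k=3):
    """Distances for every pair of named sequences, e.g. to build a guide tree."""
    return {
        (x, y): kmer_distance(seqs[x], seqs[y], k)
        for x, y in combinations(sorted(seqs), 2)
    }
```

Counting k-mers is linear in sequence length, whereas a full pairwise alignment is quadratic, which is why this kind of estimate scales to hundreds of sequences.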


    12/29

    Let's align! Results by program (Clustal, Muscle, PRANK, SATe)

    PR10_fabaceae_11seq (long gaps + CDS)

    Clustal: 8240 bp. Does not locate the CDS regions.

    Muscle: 8560 bp. Found a similar CDS region of a divergent gene.

    PRANK: 9470 bp. Low confidence in the CDS region of a divergent gene.

    SATe: 8370 bp. Found the similar CDS region of the divergent gene.

    ITS_oxytropis_84seq (highly conserved)

    Clustal, Muscle, PRANK, SATe: all good

    Fungal_refseq_ITS_301seq (highly divergent)

    Clustal: 1070 bp. Is not careful in aligning low-similarity areas.

    Muscle: 1850 bp. Aligns together stretches with low similarity. Finds conserved regions in the middle.

    PRANK: 21630 bp. Finds a conserved region in the middle.

    SATe: 1420 bp. Aligns together stretches with low similarity. Finds conserved regions in the middle.
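The comparison above relies on alignment length plus visual inspection. A few summary numbers help compare outputs of several programs on the same data. This sketch assumes each output has been read into a dict of equal-length aligned sequences; the statistics chosen here (length, gap fraction, fully conserved columns) are illustrative, not a standard benchmark.

```python
def alignment_summary(alignment):
    """Length, gap fraction and number of fully conserved columns of an alignment."""
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("aligned sequences must all have the same length")
    columns = list(zip(*seqs))
    gaps = sum(col.count("-") for col in columns)
    # A column is conserved if it has no gap and a single character state
    conserved = sum(1 for col in columns if "-" not in col and len(set(col)) == 1)
    return {
        "length_bp": length,
        "gap_fraction": gaps / (length * len(seqs)),
        "conserved_columns": conserved,
    }
```

A much longer alignment with a high gap fraction, as for PRANK on the fungal dataset, usually means the program preferred opening gaps over forcing low-similarity stretches together.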


    13/29

    Let's align!


    14/29

    Phylogenetic trees - Basics: terminology

    Branch = edge: a lineage through time

    Node: the branching of a lineage into two, by speciation or gene duplication

    Internal node

    Leaf (terminal node) = tip (OTU, operational taxonomic unit)

    Branch length: typically the number of substitutions per site; the rate is often not constant across the tree

    Outgroup: used to find the root

    Topology: the branching pattern of the tree

    A tree is a drawing: it can be re-rooted, and branches can be swapped
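Swapping the two branches below a node changes the drawing but not the tree. A compact way to see this is to describe a rooted tree by its clades (the set of leaves below each internal node). This sketch assumes trees written as nested tuples of leaf names; re-rooting is not covered, since it does change the clades of a rooted tree.

```python
def clades(tree):
    """Return the set of clades (leaf sets below each internal node) of a rooted tree.

    Trees are nested tuples: a leaf is a string, an internal node a tuple of children.
    """
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(leaves(child) for child in node))
        found.add(below)
        return below

    leaves(tree)
    return found


def swap_children(tree):
    """Rotate every internal node (reverse the order of its children)."""
    if isinstance(tree, str):
        return tree
    return tuple(swap_children(child) for child in reversed(tree))
```

For `((("A", "B"), "C"), ("D", "E"))`, every node is rotated by `swap_children`, yet both drawings have the same four clades: {A, B}, {A, B, C}, {D, E} and all five leaves.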


    15/29

    Phylogenetic trees - Basics

    Useful for studying:

    Gene duplication events

    Recombination or horizontal gene transfer

    Variation of selective pressures and adaptive evolution

    Divergence times between species

    Origin of epidemics

    Host-parasite cospeciation events

    Genealogies of somatic cells in cancer


    16/29

    Phylogenetic trees: references

    Yang, Z., and B. Rannala. 2012. Molecular phylogenetics: principles and practice. Nature Reviews Genetics 13: 303-314. http://www.nature.com/nrg/journal/v13/n5/full/nrg3186.html

    Hall, B. G. Phylogenetic Trees Made Easy: A How-To Manual, Third Edition.

    Aris-Brosou, S., and X. Xia. 2008. Phylogenetic analyses: a toolbox expanding towards Bayesian methods. International Journal of Plant Genomics. http://www.hindawi.com/journals/ijpg/2008/683509/

    Roquet, C., W. Thuiller, and S. Lavergne. 2013. Building megaphylogenies for macroecology: taking up the challenge. Ecography 36: 13-26. http://onlinelibrary.wiley.com/doi/10.1111/j.1600-0587.2012.07773.x/abstract

    http://treethinkers.org/: workshops in applied phylogenetics

    http://www.molecularevolution.org/: software descriptions and glossary

    17/29

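    Building the tree: Distance, Parsimony, Max. likelihood, Bayesian

    Distance. Based on: a distance matrix. Approach: clustering.

    Parsimony. Based on: informative characters. Approach: which tree explains the data with the fewest evolutionary changes? Score for choosing trees: steps (minimum number of changes).

    Max. likelihood. Based on: all characters. Approach: which tree and parameter values give the highest likelihood to this alignment? Score for choosing trees: log likelihood, a relative number that cannot be compared across alignments.

    Bayesian. Based on: all characters. Approach: what is the probability distribution of trees, given the data? Score for choosing trees: posterior probability that the tree is correct, under the assumed model.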

    Yang, Z., and B. Rannala. 2012. Molecular phylogenetics: principles and practice. Nature Reviews Genetics 13: 303-314.

    Hall, B. G. Phylogenetic Trees Made Easy: A How-To Manual, Third Edition.

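Why a log likelihood is "a relative number" can be seen in the simplest possible case: two aligned sequences and one branch length `d` under the Jukes-Cantor (JC69) model. The model choice and the site counts below are assumptions for illustration. The maximum-likelihood `d` has a closed form, and the log-likelihood value itself scales with alignment length, so it only makes sense for comparing trees or parameters on the same alignment.

```python
import math


def jc69_log_likelihood(n_sites, n_diff, d):
    """Log-likelihood of distance d (substitutions/site) for two aligned sequences under JC69."""
    e = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * e   # P(site unchanged after distance d)
    p_diff = 0.25 - 0.25 * e   # P(site changed to one particular other base)
    # Each site: 1/4 (frequency of the starting base) times the transition probability
    log_l = (n_sites - n_diff) * math.log(0.25 * p_same)
    if n_diff:
        log_l += n_diff * math.log(0.25 * p_diff)
    return log_l


def jc69_ml_distance(n_sites, n_diff):
    """Closed-form maximum-likelihood estimate of d under JC69."""
    p = n_diff / n_sites
    if p >= 0.75:
        raise ValueError("too divergent for the JC69 correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

With 20 differences in 100 sites the estimate is about 0.233 substitutions per site, larger than the raw 0.20 because some sites changed more than once. Doubling the alignment (200 sites, 40 differences) gives the same estimate but twice the log likelihood.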

    18/29

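    Building the tree: Distance, Parsimony, Max. likelihood, Bayesian

    Distance. When not to use: with numerous long gaps. Strength: quick. Weaknesses: sensitive to gaps and to divergent sequences.

    Parsimony. When not to use: with highly divergent sequences. Strengths: efficient, and generally reliable. Weaknesses: the calculation cannot be improved, because there is no model of sequence evolution.

    Max. likelihood. When not to use: with a model that does not fit your data. Strengths: consistent, efficient; can be used for tests of evolution. Weaknesses: computationally demanding; depends on the model of evolution.

    Bayesian. When not to use: when your priors are not appropriate for your data. Strengths: consistent; can be used for tests of evolution. Weaknesses: computationally demanding; posterior probabilities can be overly high.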


    19/29

    Building the tree: Distance, Parsimony, Max. likelihood, Bayesian

    Programs, commercial: PAUP*

    Programs, free:

    Distance: MEGA

    Parsimony: MEGA5, TNT

    Max. likelihood: MEGA5, RAxML, GARLI, PAML, HyPhy

    Bayesian: BEAST, Phycas, BayesP..., BUCKy, Phy...

    MEGA, Kumar 1994 (cited 2459 times); 2001 (cited 6481 times); 2004 (cited 11433 times); 2007 (cited 19499 times); 2011 (cited 6226 times)

    RAxML, Stamatakis 2006: cited 3536 times

    GARLI, Zwickl 2006: cited 1574 times

    BEAST, 2007: cited ...

    MrBayes, 2003: cited ... times; cited 2...


    20/29

    Distance

                  HugeFl   BigFl   MediumFl
    HugeFl
    BigFl         2/12
    MediumFl      3/12     3/12
    SmallFl       4/12     4/12    3/12
    TinyFl        5/12     5/12    4/12
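A distance matrix like the one above counts differing sites over the number of sites compared (e.g. 2/12), and a distance method then clusters the closest sequences first. This sketch assumes p-distances with gapped positions skipped pair by pair, and uses UPGMA as the clustering method (one of several; MEGA also offers neighbor joining). The sequences in the example are hypothetical, not the flower data.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites, skipping positions with a gap in either sequence."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x != y for x, y in pairs) / len(pairs)


def upgma(seqs):
    """Cluster aligned sequences with UPGMA on p-distances; return a Newick string."""
    # cluster label (a Newick string) -> (height of its node, number of leaves)
    clusters = {name: (0.0, 1) for name in seqs}
    dist = {
        frozenset((x, y)): p_distance(seqs[x], seqs[y])
        for x, y in combinations(seqs, 2)
    }
    while len(clusters) > 1:
        closest = min(dist, key=dist.get)      # pair of clusters at the smallest distance
        a, b = sorted(closest)
        height = dist[closest] / 2             # ultrametric: the node sits halfway
        (h_a, n_a), (h_b, n_b) = clusters.pop(a), clusters.pop(b)
        merged = f"({a}:{height - h_a:g},{b}:{height - h_b:g})"
        # Size-weighted average distance from the merged cluster to the others
        for c in clusters:
            dist[frozenset((merged, c))] = (
                n_a * dist[frozenset((a, c))] + n_b * dist[frozenset((b, c))]
            ) / (n_a + n_b)
        dist = {pair: d for pair, d in dist.items() if a not in pair and b not in pair}
        clusters[merged] = (height, n_a + n_b)
    return next(iter(clusters)) + ";"
```

With `{"A": "AAAAAAAA", "B": "AAAAAAAC", "C": "AAAAACCC"}` (distances 1/8, 3/8 and 2/8), A and B are joined first and the result is `((A:0.0625,B:0.0625):0.09375,C:0.15625);`. UPGMA assumes a constant rate, which the terminology slide notes is often not the case.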


    21/29

    Parsimony

    Only informative characters are used
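A site is parsimony-informative when at least two character states each occur in at least two taxa; other sites cost the same number of changes on every tree, so they cannot favour one tree over another. This sketch counts the minimum number of changes with the Fitch algorithm, assuming an ungapped alignment and rooted binary trees written as nested tuples; the four-taxon data are hypothetical.

```python
from collections import Counter


def is_parsimony_informative(column):
    """True if at least two character states each occur in at least two taxa."""
    counts = Counter(column)
    return sum(1 for n in counts.values() if n >= 2) >= 2


def fitch_changes(tree, states):
    """Minimum number of changes for one site on a rooted binary tree (Fitch algorithm).

    `tree` is a nested tuple of leaf names; `states` maps each leaf to its character.
    """
    def visit(node):
        if isinstance(node, str):
            return {states[node]}, 0
        (left, c_left), (right, c_right) = visit(node[0]), visit(node[1])
        common = left & right
        if common:
            return common, c_left + c_right
        return left | right, c_left + c_right + 1   # one extra change needed here

    return visit(tree)[1]


def parsimony_score(tree, alignment):
    """Sum Fitch changes over the parsimony-informative sites of an ungapped alignment."""
    names = list(alignment)
    length = len(alignment[names[0]])
    score = 0
    for i in range(length):
        column = {name: alignment[name][i] for name in names}
        if is_parsimony_informative(column.values()):
            score += fitch_changes(tree, column)
    return score
```

With `{"A": "ACGT", "B": "ACGA", "C": "TCAA", "D": "TCAT"}`, the tree `(("A", "B"), ("C", "D"))` needs 4 steps on the three informative sites and `(("A", "D"), ("B", "C"))` needs 5, so parsimony prefers the first.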


    22/29

    Model of sequence evolution (max. likelihood, Bayesian, distance)

    Includes:

    Rates of substitution between nucleotides; more complex models use 6 rates

    Rate variation among sites, e.g. by codon position, or gamma-distributed

    Proportion of invariant sites

    Which model fits your data?

    jModelTest

    MEGA5
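The "6 rates" of the most general common model (GTR) can be written as an instantaneous rate matrix Q built from six exchangeabilities and four base frequencies. This is a sketch assuming the usual convention q_ij = r_ij * pi_j, scaled so that the mean substitution rate is 1 (so branch lengths are in substitutions per site); gamma rate variation (+G) and invariant sites (+I) are layered on top of Q and are not shown.

```python
BASES = "ACGT"


def gtr_rate_matrix(rates, freqs):
    """Build a normalized GTR instantaneous rate matrix Q.

    `rates` holds the 6 exchangeabilities for AC, AG, AT, CG, CT, GT;
    `freqs` holds the equilibrium frequencies of A, C, G, T.
    """
    pairs = ["AC", "AG", "AT", "CG", "CT", "GT"]
    exch = {}
    for pair, r in zip(pairs, rates):
        exch[pair] = exch[pair[::-1]] = r      # exchangeabilities are symmetric
    q = [[0.0] * 4 for _ in range(4)]
    for i, x in enumerate(BASES):
        for j, y in enumerate(BASES):
            if i != j:
                q[i][j] = exch[x + y] * freqs[j]
        q[i][i] = -sum(q[i])                   # each row sums to zero
    # Scale so the mean substitution rate is 1
    mean_rate = -sum(freqs[i] * q[i][i] for i in range(4))
    return [[v / mean_rate for v in row] for row in q]
```

Setting all six rates equal and all frequencies to 0.25 gives the Jukes-Cantor model (off-diagonal entries 1/3); two rate classes (transitions A-G and C-T versus transversions) give K80/HKY-type models. Programs such as jModelTest compare these nested models on your data.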


    23/29

    Newick format

    tree ML_tree = [&U](SmallFlower:0.0,TinyFlower:0.0937:0.086766,(HugeFlower:0.08978,Big):0.118638):0.228517);

    tree ML_tree = [&U](SmallFlower:0.0,TinyFlower:0.0937:0.086766,(HugeFlower:0.08978,Big)[&"bootstrapproportion"=78.0"]:0.118638)[&"bootproportion"=87.0"]:0.228517);

    The tree file: Newick format

    Branch lengths

    Branch support (bootstrap proportions)
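Newick nests sister groups in parentheses, with `:length` after each node and, optionally, a support value after an internal node's closing parenthesis. This sketch writes that plain convention (as used, for example, by RAxML output); the `[&...]` annotations in the tree file above are a different, BEAST/FigTree-style way of storing support. The branch lengths and the support value in the example are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str = ""
    length: Optional[float] = None       # length of the branch leading to this node
    support: Optional[float] = None      # e.g. bootstrap proportion of an internal node
    children: List["Node"] = field(default_factory=list)


def to_newick(node):
    """Serialize a tree: (child,child)support:length for internal nodes, name:length for leaves."""
    def fmt(n):
        if n.children:
            text = "(" + ",".join(fmt(c) for c in n.children) + ")"
            if n.support is not None:
                text += f"{n.support:g}"
        else:
            text = n.name
        if n.length is not None:
            text += f":{n.length:g}"
        return text

    return fmt(node) + ";"


tree = Node(children=[
    Node(children=[Node("SmallFlower", 0.1), Node("TinyFlower", 0.2)], length=0.05, support=78.0),
    Node("HugeFlower", 0.3),
])
print(to_newick(tree))
```

This prints `((SmallFlower:0.1,TinyFlower:0.2)78:0.05,HugeFlower:0.3);`, which tree viewers such as FigTree can open.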


    24/29

    Build trees, general steps

    Test which model of sequence evolution fits your data

    Choose a method:

    Distance: MEGA5 - quick, but not for publications

    Parsimony: MEGA5 - not very popular

    Max. likelihood: GARLI - efficient

    Max. likelihood: RAxML - very large datasets, e.g. thousands of taxa + hundreds of genes. Use CAT.

    Bayesian: BEAST 2.0 - to answer questions with a range of probable trees

    At your computer

    Download 2 alignments: http://qcbs.ca/wp-content/uploads/2013/06/ProteaFaurea_ITS.txt ; http://qcbs.ca/wp-content/uploads/2013/06/ProteaFaurea_trnL.txt

    Analyze with RAxML on your computer

    Use BEAST
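For the RAxML exercise, a typical run combines rapid bootstrapping with a search for the best maximum-likelihood tree. This sketch only builds the command line; it assumes a RAxML version 8 style executable named `raxmlHPC` and its `-f a`, `-m`, `-s`, `-n`, `-p`, `-x` and `-#` options, so check the manual of the version you install. Whether the downloaded alignment files are in a format RAxML accepts is not stated on the slide.

```python
import shlex


def raxml_command(alignment, run_name, model="GTRCAT", bootstraps=100, seed=12345,
                  executable="raxmlHPC"):
    """Build a RAxML (v8-style) rapid-bootstrap + best-ML-tree command line."""
    return [
        executable,
        "-f", "a",               # rapid bootstrap analysis and best ML tree search in one run
        "-m", model,             # GTRCAT: CAT approximation of rate variation, for large datasets
        "-s", alignment,         # input alignment file
        "-n", run_name,          # suffix added to the output file names
        "-p", str(seed),         # random seed for the parsimony starting trees
        "-x", str(seed),         # random seed for rapid bootstrapping
        "-#", str(bootstraps),   # number of bootstrap replicates
    ]


if __name__ == "__main__":
    # Print the command; or run it with subprocess.run(cmd, check=True)
    print(shlex.join(raxml_command("ProteaFaurea_ITS.txt", "ITS")))
```

Fixing the seeds makes the run reproducible, which helps when sharing data, code and models as recommended earlier.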

    25/29

    BEAST 2.0

    Start from one of the tutorials

    Know the model of substitution (e.g. from jModelTest)

    Read the BEAST FAQ on the wiki carefully: http://www.beast2.org/wiki/index.php/FAQ

    For any problem, search and browse through the BEAST forum first: https://groups.google.com/forum/

    26/29

    Branch support

    From a large set of trees (many hundreds): bootstrap replicates or a Bayesian analysis

    Compute a consensus tree

    Strict = clades found in 100% of the trees

    Majority rule = clades found in more than a % set by the user

    Report support on your best tree (e.g. DendroPy SumTrees: http://pythonhosted.org/DendroPy/scripts/sumtrees.html)
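Both consensus rules come down to counting how often each clade appears in the set of trees. This sketch assumes rooted trees written as nested tuples (for example bootstrap trees all rooted on the same outgroup); it is not DendroPy's API, which handles unrooted trees, file formats and much more. The three trees in the example are hypothetical.

```python
from collections import Counter


def clades(tree):
    """Leaf sets below every internal node of a rooted tree given as nested tuples."""
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(leaves(child) for child in node))
        found.add(below)
        return below

    leaves(tree)
    return found


def clade_frequencies(trees):
    """Fraction of trees in which each clade occurs."""
    counts = Counter()
    for tree in trees:
        counts.update(clades(tree))
    return {clade: n / len(trees) for clade, n in counts.items()}


def consensus_clades(trees, threshold=0.5):
    """Clades kept in a consensus: threshold=1.0 gives strict, 0.5 majority rule."""
    freqs = clade_frequencies(trees)
    if threshold >= 1.0:
        return {c for c, f in freqs.items() if f == 1.0}
    return {c for c, f in freqs.items() if f > threshold}


def support_on(best_tree, trees):
    """Support of each clade of the best tree: its frequency among the trees."""
    freqs = clade_frequencies(trees)
    return {c: freqs.get(c, 0.0) for c in clades(best_tree)}
```

For the trees `((("A", "B"), "C"), "D")`, `((("A", "B"), "D"), "C")` and `((("A", "C"), "B"), "D")`, the strict consensus keeps only the root, while majority rule also keeps {A, B} and {A, B, C}, each with 2/3 support.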

    27/29

    Why so many programs?

    Umbrella programs (commercial)

    Geneious ($400)


    28/29

    General recommendations

    Anisimova, M., et al. 2013. State-of the art methodologies dictate new standards for phylogenetic analysis. BMC Evolutionary Biology 13: 161. http://www.biomedcentral.com/1471-2148/13/161/abstract

    Write a question that the analysis will answer

    Justify the choice of methods, test alternatives

    Account for uncertainty (branch support, confidence intervals)

    Share the data

    29/29

    End!