Phylogenetics
WHO-TDR Bioinformatics Workshop
Jessica Kissinger
New Delhi, India
October, 2005
Why do Phylogenetics?
• We make evolutionary assumptions in our everyday research life. For example, we need a drug that will kill the parasite and not us. Thus, we need a target that is present in the parasite and not in us.
• We need a good model system. Which parasite (or host) is most closely related to P. falciparum or humans?
Why Phylogenetics?
• This strain is resistant to a drug and this one is sensitive. What has changed?
• Where did this parasite come from? Has it “co-evolved” with humans? Did it enter the human lineage from another source?
• Which other mosquitoes are likely to serve as a host for my parasite in nature?
Phylogenetics
• What is Phylogenetics?
– Molecular Systematics
  • The use of molecular data to infer the relationships of the host species, e.g. using rRNA to build trees to look at the relationship of the bacteria to the eukaryotes
– Molecular Evolution
  • Use trees to infer how a molecule, protein, or gene has evolved (insertions, deletions, substitutions).
Gene Trees vs Species Trees
You Can Make Phylogenies of Many Things:
• Amino acid sequences
• Nucleotide sequences
• RFLP data
• Morphological data
• “Paper fastening devices”
[Figure: 21 paper fastening devices, numbered 1 to 21]
Issues you had to deal with
1) Conflict - size, color, material, shape
2) Direction of change, e.g. red to green?
3) Homology - these items have a similar function but do they have a similar origin?
4) Mixed materials - plastic coated metal
5) How do you assign weight, are some traits more important?
6) Lots of possibilities
>8,200,794,532,637,891,559,375 rooted trees!
Goals for this lecture
• Become familiar with concepts
• Become familiar with vocabulary
• Become familiar with the data analysis flow
• Reach the point where you can read the available literature on how to use these methods in greater detail
Assumptions made by Phylogenetic algorithms
• The sequences are correct
• The sequences are homologous
• Each position is homologous
• The sampling of taxa or genes is sufficient to resolve the problem of interest
• Sequence variation is representative of the broader group of interest
• Sequence variation contains sufficient phylogenetic signal (as opposed to noise) to resolve the problem of interest
• Each position in the sequence evolved independently
Availability of Sequenced Genomes
[Figure: universal tree of life marking lineages with sequenced genomes. Labels: Aquifex, Thermodesulfobacterium, Thermotoga, Flavobacteria, Cyanobacteria, Proteobacteria, Green nonsulfur bacteria, Gram+ bacteria, Spirochetes, Euryarchaeota, Crenarchaeota, Animals, Fungi, Plants, Slime molds, Flagellates, Microsporidia, Giardia]
Bacteria 74, Archaea 16, Eucarya 14
Courtesy of Igor Zhulin
[Figure: phylogeny of the EUKARYOTES with EUBACTERIA and ARCHAEBACTERIA. Labeled groups: ANIMALS, FUNGI, GREEN PLANTS, PROTISTS, ALVEOLATES (Ciliates, Dinoflagellates, Apicomplexans), STRAMENOPILES (Brown algae, Chrysophytes, Diatoms, Oomycetes, Labyrinthulids), Red Algae, Cnidaria. Other labeled lineages: Giardia lamblia, Varimorpha necatrix, Trichomonas vaginalis, Trichomonas foetus, Physarum polycephalum, Euglenoids, Kinetoplastids, Bodonids, Amoebamastigote, Dictyostelium discoideum, Entamoeba histolytica, Entamoeba invadens, Naegleria gruberi]
Adapted from Sogin et al. (1991)
Sandra Baldauf, Science June 2003
Circumsporozoite Phylogeny (molecular systematics, host relationships)
How to do an analysis
• Define a question
• Select sequences appropriate to answer your question (not all sequences are equally good!)
• Make a multiple sequence alignment
• Edit your alignment to make it better
• Perform lots and lots of analyses
• Perform Bootstrap analyses to test confidence
Multiple Sequence Alignment
Multiple Sequence Alignment
Study your Alignments!
A Word About Methods
• There are two overall categories of methods
– Transformed distance methods (data are transformed into a distance matrix). The matrix is used to build a single tree. UPGMA and Neighbor-Joining are examples of this method. They are computationally simple and very fast.
– Optimality methods (tree generation is separate from tree evaluation). Parsimony and Maximum-likelihood methods divorce the issue of tree generation from evaluating how good a tree is. For parsimony, there may be more than one “most parsimonious” or “shortest” tree found.
Distance methods
• UPGMA
– Assumes all lineages evolve at the same rate
– Produces a root
– Produces only one tree
– Computationally very fast
– Trees are additive
• Neighbor-joining
– Permits variation in rates of evolution
– Does not produce a root
– Produces only one tree
– Computationally very fast
– Trees are additive
Create a distance matrix. Scoring schemes can be used to transform data into distances (e.g. do transitions occur more often than transversions?).

1 ATTGCTCAGA
2 AATGCTCTGA
3 ATAGGACTGA

1 vs 2 = 80% similar = 0.2 distance
1 vs 3 = 60% similar = 0.4 distance
2 vs 3 = 60% similar = 0.4 distance

    1     2     3
1   -     0.2   0.4
2         -     0.4
3               -

The UPGMA algorithm is then applied to produce the tree below. A new matrix is calculated at each iteration.

[Figure: UPGMA tree of taxa 1, 2 and 3. Taxa 1 and 2 join first, each on a branch of length 0.1; an internal branch of length 0.1 leads from that pair to the root; taxon 3 joins on a branch of length 0.2.]
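The iteration described above can be sketched in a few lines of Python. This is a minimal illustration rather than the code behind the slide: the function names `p_distance` and `upgma` are made up here, distances are plain proportions of differing sites, and the tree is written as a Newick string.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of aligned sites at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def upgma(seqs):
    """Return (newick, merges) for a dict of name -> aligned sequence."""
    # For each current cluster: label -> (number of tips, height of its node)
    info = {name: (1, 0.0) for name in seqs}
    dist = {frozenset(pair): p_distance(seqs[pair[0]], seqs[pair[1]])
            for pair in combinations(seqs, 2)}
    merges = []
    while len(info) > 1:
        pair = min(dist, key=dist.get)      # closest pair of clusters
        a, b = sorted(pair)
        height = dist[pair] / 2             # equal rates: split the distance evenly
        (na, ha), (nb, hb) = info[a], info[b]
        new = f"({a}:{height - ha:.2f},{b}:{height - hb:.2f})"
        # New matrix: distance to the merged cluster is the size-weighted mean
        for c in info:
            if c not in pair:
                dist[frozenset((new, c))] = (
                    na * dist[frozenset((a, c))] + nb * dist[frozenset((b, c))]
                ) / (na + nb)
        # Drop every entry that still refers to the two merged clusters
        dist = {k: v for k, v in dist.items() if a not in k and b not in k}
        del info[a], info[b]
        info[new] = (na + nb, height)
        merges.append((a, b, round(height, 3)))
    (tree,) = info
    return tree + ";", merges


tree, merges = upgma({"1": "ATTGCTCAGA", "2": "AATGCTCTGA", "3": "ATAGGACTGA"})
print(tree)
```

For the three sequences on the slide, taxa 1 and 2 are merged first and taxon 3 joins at the root, which matches the branch lengths of 0.1, 0.1, 0.1 and 0.2 in the figure.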
An unrooted Neighbor-joining tree of the same dataset
Factors that Affect Phylogenetic Inference
1. Relative base frequencies (A, G, T, C)
2. Transition/transversion ratio
3. Number of substitutions per site
4. Number of nucleotides (or amino acids) in sequence
5. Different rates in different parts of the molecule
6. Synonymous/non-synonymous substitution ratio
7. Substitutions that are uninformative or obfuscatory
   1. Parallel substitutions
   2. Convergent substitutions
   3. Back substitutions
   4. Coincidental substitutions
In general, the more factors that are accounted for by the model (i.e., the more parameters), the larger the error of estimation. It is often best to use fewer parameters by choosing the simpler model.
Models of evolution: choosing parameters
Some distance models: p-distance
• p = nd/n, where n is the number of sites (nucleotides or amino acids) and nd is the number of differences between the two sequences examined.
• Very robust when divergence times are recent and the effect of complicating phenomena is minor
Some distance models: Jukes-Cantor
• Used to estimate the number of substitutions per site
• The expected number of substitutions per site is:
• d = 3αt = -(3/4) ln[1 - (4/3)p], where p is the proportion of differences between 2 sequences
• Variance can be calculated
• Does not account for unequal nucleotide frequencies or differential substitution rates (every substitution is given the same rate α)
    A   T   C   G
A   -   α   α   α
T   α   -   α   α
C   α   α   -   α
G   α   α   α   -
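As a small worked example of the two formulas above, the sketch below computes the p-distance and applies the Jukes-Cantor correction d = -(3/4) ln[1 - (4/3)p]. The function names are illustrative only, and the input sequences are assumed to be aligned, gap-free and of equal length.

```python
import math


def p_distance(a, b):
    """p = nd / n: the proportion of sites that differ between two sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def jukes_cantor(p):
    """Jukes-Cantor estimate of substitutions per site from a p-distance."""
    if p >= 0.75:
        raise ValueError("the correction is undefined for p >= 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)


p = p_distance("ATTGCTCAGA", "AATGCTCTGA")   # 2 differences in 10 sites
print(p, round(jukes_cantor(p), 4))
```

The corrected distance is always at least as large as p, because it adds back an estimate of the substitutions hidden by multiple changes at the same site. The gap between the two grows as sequences become more divergent.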
Some distance models: Kimura two-parameter
• Used to estimate the number of substitutions per site
• d = 2rt, where r is the substitution rate (per site, per year) and t is the time since the two sequences diverged; r = α + 2β, so:
• d = 2αt + 4βt
• Accounts for different transition and transversion rates
• Does not account for unequal nucleotide frequencies; variance is greater than Jukes-Cantor
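[Figure: the four bases drawn as two pairs, the purines (A, G) and the pyrimidines (C, T). Changes within a pair are labeled α; changes between the pairs are labeled β.]
α = transition rate
β = transversion rate
These are treated the same for long divergence times.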
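The Kimura two-parameter distance is usually estimated from the observed proportions of transitions (P) and transversions (Q) with d = -(1/2) ln[(1 - 2P - Q) √(1 - 2Q)]. The sketch below is a hedged illustration of that estimator; the function name is made up, and it assumes aligned sequences containing only A, C, G and T.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(a, b):
    """Kimura two-parameter distance between two aligned DNA sequences."""
    n = len(a)
    transitions = transversions = 0
    for x, y in zip(a, b):
        if x == y:
            continue
        # A<->G and C<->T are transitions; every other change is a transversion
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    P, Q = transitions / n, transversions / n
    return -0.5 * math.log((1 - 2 * P - Q) * math.sqrt(1 - 2 * Q))


print(round(k2p_distance("ATTGCTCAGA", "AATGCTCTGA"), 4))
```

Sequences 1 and 2 from the earlier example differ by two transversions and no transitions, so the estimate depends only on Q here.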
Other models
• Hasegawa, Kishino, Yano (HKY): takes unequal nucleotide frequencies and transition/transversion bias into account
• Unrestricted model: allows different rates between all pairs of nucleotides
• General Time Reversible model: allows different rates between all pairs of nucleotides and corrects for unequal nucleotide frequencies
• Many other models have been invented to correct for specific problems
• The more parameters are introduced, the larger the variance becomes
Optimality Methods
• All possible trees (or a heuristic sampling of trees) are generated and evaluated according to Parsimony or Maximum likelihood.
• Note: Tree generation is divorced from tree evaluation. More than one tree topology may be optimal according to your criteria
General differences between optimality criteria
Maximum Likelihood
– Model based
– Can account for many types of sequence substitutions
– Computationally slow
– Well understood statistical properties (easy to test)
– Can estimate branch lengths with some degree of accuracy
– Works well with strong or weak sequence similarity
Maximum Parsimony
– “Model free”
– Assumes that all substitutions are equal
– Computationally fast
– Poorly understood statistical properties (hard to test)
– Cannot estimate branch lengths accurately
– Works only when sequence similarity is high
Minimum evolution
– Model based
– Can account for many types of sequence substitutions
– Computationally fast
– Well understood statistical properties (easy to test)
– Can accurately estimate branch lengths (important for molecular clocks)
– Works well with strong or weak sequence similarity
[Figure: a rooted tree and an unrooted tree side by side. The rooted tree has a definite beginning and polarity, a root. Labeled parts: root, nodes, internal branches, terminal branches.]
[Figure: the three rooted topologies for taxa 1, 2 and 3, and the single unrooted topology]
In the world of trees, there are more rooted topologies for a given number of taxa than unrooted.
Possible trees as function of number of Taxa

Taxa   Rooted Trees   Unrooted Trees
3      3              1
4      15             3
5      105            15
10     34,459,425     2,027,025
100    -              2 x 10^182

More trees than the number of atoms in the universe!
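The counts in this table follow the standard formulas for strictly bifurcating trees: (2n - 5)!! unrooted and (2n - 3)!! rooted topologies for n taxa, where k!! is the product of the odd numbers up to k. A short sketch, with function names chosen here for illustration:

```python
def odd_double_factorial(k):
    """Product k * (k - 2) * (k - 4) * ... * 1 for odd k (1 when k <= 1)."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def unrooted_trees(n_taxa):
    """Number of unrooted bifurcating topologies for n_taxa >= 3."""
    return odd_double_factorial(2 * n_taxa - 5)


def rooted_trees(n_taxa):
    """Number of rooted bifurcating topologies for n_taxa >= 2."""
    return odd_double_factorial(2 * n_taxa - 3)


for n in (3, 4, 5, 10, 20):
    print(n, f"{rooted_trees(n):,}", f"{unrooted_trees(n):,}")
```

Each additional taxon multiplies the count by the next odd number, which is why exhaustive searches stop being practical so quickly.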
Tree search considerations
• Exhaustive searches are searches of all possible trees for the number of taxa in your data set (practical for 15 taxa or fewer)
• If you have more than 15 taxa, then heuristic methods must be employed, in which you search a sample of all possible trees. There are many algorithms for the generation of different populations of trees.
Tree search considerations
Strategy                      Type
• Stepwise addition           Algorithmic
• Star decomposition          Algorithmic
• Exhaustive                  Exact
• Branch & bound              Exact
• Branch swapping             Heuristic
• Genetic algorithm           Heuristic
• Markov Chain Monte Carlo    Heuristic
Parsimony basics & scores
• Based on shared derived characters (synapomorphies)
• Identical characters which evolve more than once are “homoplasies”
• Unique characters are “autapomorphies”
• The score of the tree is the total of all the changes needed to map the data. The scale bar is the number of changes.
• Smaller, i.e. more parsimonious, scores are better
• More than one tree topology may have the same score

An informative position is one that can favor one tree over another when some type of criterion is applied.
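  1 2 3 4 5 6 7 8 9
1 A A G A G T G C A
2 A G C C G T G C G
3 A G A T A T C C A
4 A G A G A T C C G
          *   *   *

[Figure: one position (taxon 1 = A, 2 = G, 3 = A, 4 = G) mapped onto the three possible unrooted trees]
Tree grouping 1 with 2 and 3 with 4: 2 changes
Tree grouping 1 with 3 and 2 with 4: 1 change
Tree grouping 1 with 4 and 2 with 3: 2 changes

Position #9 is informative: it permits us to choose a shorter tree from among the options. It prefers the tree of length 1 over those of length 2.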
Not all alignment positions can help pick a better tree

  1 2 3 4 5 6 7 8 9
1 A A G A G T G C A
2 A G C C G T G C G
3 A G A T A T C C A
4 A G A G A T C C G
          *   *   *

[Figure: positions 1, 2 and 3 each mapped onto the three possible unrooted topologies]
Pos 1: 0, 0, 0 changes
Pos 2: 1, 1, 1 changes
Pos 3: 2, 2, 2 changes

None of these characters can distinguish between the three possible unrooted topologies. They are uninformative.
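The per-position counts can be reproduced with a small Fitch-style parsimony count over the three unrooted four-taxon topologies. This is a sketch under stated assumptions: the function names and the way a topology is encoded (as two pairs of taxa) are choices made here for illustration, not anything taken from a phylogenetics package.

```python
ALIGNMENT = {
    "1": "AAGAGTGCA",
    "2": "AGCCGTGCG",
    "3": "AGATATCCA",
    "4": "AGAGATCCG",
}

# The three unrooted topologies for four taxa, each written as two sister pairs
TOPOLOGIES = [
    (("1", "2"), ("3", "4")),
    (("1", "3"), ("2", "4")),
    (("1", "4"), ("2", "3")),
]


def fitch_quartet(states, topology):
    """Minimum number of changes for one character on a four-taxon tree."""
    changes = 0
    pair_sets = []
    for x, y in topology:
        if states[x] == states[y]:
            pair_sets.append({states[x]})
        else:
            pair_sets.append({states[x], states[y]})
            changes += 1                    # one change inside this pair
    if not (pair_sets[0] & pair_sets[1]):
        changes += 1                        # one change on the internal branch
    return changes


def column_scores(alignment, topologies):
    """For each column, the score on every topology (tuple per column)."""
    length = len(next(iter(alignment.values())))
    table = []
    for i in range(length):
        states = {taxon: seq[i] for taxon, seq in alignment.items()}
        table.append(tuple(fitch_quartet(states, t) for t in topologies))
    return table


scores = column_scores(ALIGNMENT, TOPOLOGIES)
informative = [i + 1 for i, s in enumerate(scores) if len(set(s)) > 1]
tree_lengths = [sum(col[k] for col in scores) for k in range(len(TOPOLOGIES))]
print(informative, tree_lengths)
```

A column is informative exactly when its scores differ between topologies. Here that holds for columns 5, 7 and 9, and the A/G/A/G pattern of column 9 gives scores of 2, 1 and 2 on the three trees. Summing every column gives the total length of each tree.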
Maximum Likelihood
• Is an optimality method: it is an algorithm which evaluates trees according to some criterion
• The algorithm searches for trees which maximize the probability of observing the data
• Trees are scored with log likelihoods
• This is the most computationally intensive method available
• More tractable versions include quartet puzzling (PUZZLE)
• Alternate approaches include Bayesian inference (MrBayes)
Not all methods can be used with all types of data
• Parsimony can be used with all types of data: nucleotide, protein, binary, morphological, mixed data sets. States can be ordered.
• Distance can be used with nucleotide and protein data, but you need a model to generate distances
• Maximum likelihood: normally only nucleotide data, but PAML can do protein maximum likelihood (still a tricky and debatable approach).
• Bayesian - All types of sequence data
There are Many Types of Trees
• Cladogram vs. Phylogram
– Cladograms have uniform branch lengths and only represent relationships
– Phylograms have lengths proportional to change or distance
• Rooted vs. Unrooted
– A defined origin as opposed to a network of relationships (most trees are unrooted because they are easier to calculate)
• Artistic license (slanted, rectangular, circle, “network”)
A Word about trees
[Figure: the same seven taxa, A to G, drawn as several differently arranged trees]
A word about trees (there are many types)
[Figure: one vertebrate phylogeny (rayfinned fish, lungfish, frogs, salamanders, turtles, crocodiles, birds, lizards, snakes, mammals) drawn three ways: Slanted Cladogram, Rectangular Phylogram and Unrooted Phylogram. The phylograms carry a scale bar of 1 change.]
The Bootstrap
• The bootstrap is a method for assigning a measure of confidence to a particular node in a tree.
• It is NOT a measure of the overall “goodness” of the tree.
• Rules of thumb: 70-100% = good, 0-30% = bad, 30-70% = “gray zone”, difficult to interpret.
The bootstrap process

Original data (each column represented once):
  1 2 3 4 5 6 7 8 9
1 A A G A G T G C A
2 A G C C G T G C G
3 A G A T A T C C A
4 A G A G A T C C G

1st sample (columns drawn at random, with replacement):
  8 8 3 2 1 4 6 5 9
1 C C G A A A T G A
2 C C C G A C T G G
3 C C A G A T T A A
4 C C A G A G T A G

2nd sample: 2 9 6 2 1 3 4 8 7, etc.
3rd sample: 6 3 3 1 6 5 7 4 9, etc.
Repeat for 100 or 1,000 samples.

Then build a consensus of all trees produced by the sample datasets. This provides support for nodes.
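The resampling step can be sketched as follows. The helper names `take_columns` and `bootstrap_sample` are illustrative, and the tree-building step that would be run on each pseudo-replicate is deliberately left out.

```python
import random

ALIGNMENT = {
    "1": "AAGAGTGCA",
    "2": "AGCCGTGCG",
    "3": "AGATATCCA",
    "4": "AGAGATCCG",
}


def take_columns(alignment, columns):
    """Build a new alignment from the given 1-based column numbers."""
    return {taxon: "".join(seq[c - 1] for c in columns)
            for taxon, seq in alignment.items()}


def bootstrap_sample(alignment, rng):
    """Draw as many columns as the alignment has, with replacement."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randint(1, length) for _ in range(length)]
    return columns, take_columns(alignment, columns)


# The first sample shown on the slide
first = take_columns(ALIGNMENT, [8, 8, 3, 2, 1, 4, 6, 5, 9])
print(first["1"])

# 100 pseudo-replicates; a tree would be built from each one
rng = random.Random(2005)
replicates = [bootstrap_sample(ALIGNMENT, rng)[1] for _ in range(100)]
```

Whole columns are resampled, so every taxon keeps the same set of sites within a replicate. Some columns appear more than once and others not at all.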
A caution about alignments: characters in columns are homologous
[Figure: 15 trees of the taxa stickman, Daffy, Donald, RoadRunner, Tweety, Bugs, Goofy, Mickey, Wile E and Pluto, each with a scale bar of 1 change]
15 equally parsimonious trees of Disney characters. All trees have the same, smallest score.
[Figure: strict and majority-rule consensus trees of the same ten taxa (stickman, Daffy, RoadRunner, Tweety, Donald, Bugs, Goofy, Mickey, Wile E, Pluto). The majority-rule tree carries node values of 100, 100, 60, 100 and 100. Panel labels: Comparison of real trees; Assessment of support.]

Bootstrap Example
[Figure: three small trees of Donald Duck, Daffy Duck and Tweety bird. One node is labeled 71; the alternative groupings are labeled “?”.]
If this relationship holds 71% of the time, then 29% of the time it is something else.
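Node support is simply the percentage of bootstrap trees that contain a given grouping. The sketch below uses made-up replicate counts, chosen only to reproduce the 71 shown on the example node. Each tree is represented, as an assumption of this example, by the set of clades it contains.

```python
from collections import Counter


def clade_support(replicate_trees):
    """Percentage of replicate trees containing each clade."""
    counts = Counter(clade for tree in replicate_trees for clade in tree)
    return {clade: 100 * n / len(replicate_trees) for clade, n in counts.items()}


def majority_rule(support, threshold=50):
    """Keep only the clades found in more than `threshold` percent of trees."""
    return {clade: pct for clade, pct in support.items() if pct > threshold}


ducks = frozenset({"Donald Duck", "Daffy Duck"})
other = frozenset({"Daffy Duck", "Tweety bird"})

# Hypothetical outcome of 100 bootstrap replicates
replicate_trees = [{ducks}] * 71 + [{other}] * 29

support = clade_support(replicate_trees)
print(support[ducks], support[other])
print(majority_rule(support))
```

Only groupings above the threshold survive into a majority-rule consensus. The remaining 29% is spread over whatever alternative groupings the replicates produced.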
Some points to consider for the paper fasteners:
We decided, in our evolutionary model, that material was so important that we needed to give it extra weight, so we did (weight = 2).

Based on external information, such as the archeological record, we have learned that metal predates plastic, so we ordered our characters: metal must precede plastic.

We decided to use as an “outgroup” an unbent piece of metal (taxon 21) to polarize the direction of evolution within our tree, i.e. we have evolved from a straight piece of metal into a “paper fastening device”. We will not allow reversion to this “unbent” state.

We will enforce the assumptions/decisions made above by using a constraint tree. By using this constraint tree, we reduce the number of possible rooted trees from 2.216431 x 10^20 to 273,922,023,375 and we reduce the number of unrooted trees from 6.332660 x 10^18 to 54,784,404,674 - a considerable savings!

We removed taxa 4 and 11 from the data set because they are non-homologous, i.e. they have a similar function but they do not share a common evolutionary descent or path. What we have here is a case of convergent evolution, i.e. independent origins of a paper fastening solution!
Neighbor-joining analysis and bootstrap of clip dataset
[Figure: many small trees of the 21 clip taxa, each with a scale bar of 1 change]
Some of the >37,500 trees generated by a parsimony analysis of the clip dataset
Consensus of 5,000 parsimony trees
Bootstrap of clips
Software and Books
• “How to make a phylogenetic Tree” by Barry Hall, comes with PAUP* CD, ~$30, Sinauer Press
• Phylip - Joe Felsenstein, free via internet
• PAML - free via internet
• MrBayes - free via internet
• ClustalW or ClustalX - free via internet
• Fundamentals of Molecular Evolution, Second edition, Wen-Hsiung Li, Sinauer Press

* Best on a Mac, but also command line
Giving Credit
• Several slides in this presentation were provided by Mike Thomas, via a presentation he posted on the internet in 2002.