BCB 444/544 F07 ISU Terribilini #29 - Phylogenetics 10/31/07
BCB 444/544
Lecture 29
Phylogenetics
#29_Oct31
Required Reading (before lecture)
Mon Oct 29 - Lecture 28
Promoter & Regulatory Element Prediction
• Chp 9 - pp 113 - 126
Wed Oct 31 - Lecture 29
Phylogenetics Basics
• Chp 10 - pp 127 - 141
Thurs Nov 1 - Lab 9
Gene & Regulatory Element Prediction
Fri Nov 2 - Lecture 30
Phylogenetic Tree Construction Methods & Programs
• Chp 11 - pp 142 - 169
Assignments & Announcements
Mon Oct 29 - HW#5
HW#5 = Hands-on exercises with phylogenetics and tree-building software
Due: Mon Nov 5 (not Fri Nov 1 as previously posted)
BCB 544 "Team" Projects
Last week of classes will be devoted to Projects
• Written reports due:
  • Mon Dec 3 (no class that day)
• Oral presentations (20-30 min) will be:
  • Wed-Fri Dec 5, 6, 7
• 1 or 2 teams will present during each class period
See Guidelines for Projects posted online
BCB 544 Only: New Homework Assignment
544 Extra#2
Due: √PART 1 - ASAP
PART 2 - meeting prior to 5 PM Fri Nov 2
Part 1 - Brief outline of Project, email to Drena & Michael
after response/approval, then:
Part 2 - More detailed outline of project
Read a few papers and summarize status of problem
Schedule meeting with Drena & Michael to discuss ideas
Seminars this Week
BCB List of URLs for Seminars related to Bioinformatics: http://www.bcb.iastate.edu/seminars/index.html
• Nov 1 Thurs - BBMB Seminar 4:10 in 1414 MBB
  • Todd Yeates, UCLA
  • TBA - something cool about structure and evolution?
• Nov 2 Fri - BCB Faculty Seminar 2:10 in 102 ScI
  • Bob Jernigan, BBMB, ISU
  • Control of Protein Motions by Structure
Chp 10 - Phylogenetics
SECTION IV MOLECULAR PHYLOGENETICS
Xiong: Chp 10 Phylogenetics Basics
• Evolution and Phylogenetics
• Terminology
• Gene Phylogeny vs. Species Phylogeny
• Forms of Tree Representation
• Why Finding a True Tree is Difficult
• Procedure of Building a Phylogenetic Tree
Evolution and Phylogenetics
• Evolution - the development of biological form from other preexisting forms
• Evolution proceeds by natural selection
Natural Selection
• Species can produce more offspring than the environment can support. This leads to competition for resources. Genetic variations exist in a population that give some individuals an advantage, others a disadvantage, leading to differential reproductive success.
Phylogenetics
• Phylogenetics is the study of the evolutionary history of living organisms
  • Uses tree-like diagrams to represent the pedigrees of the organisms
• Similarities and differences seen in a multiple sequence alignment are easier to make sense of in a phylogenetic tree
Data Used in Phylogenetics
• Fossil records - morphology and timeline of divergence
  • Limitations - not available for all species in all areas, morphology determined by multiple genetic factors, fossils for microorganisms are especially rare
• Molecular data - DNA and protein sequences - molecular fossils
  • Advantages - lots of data, easy to obtain
  • Limitations - can be difficult to get sequences from extinct species
• Physical, behavioral, and developmental characteristics can also be used in phylogenetics
Molecular Phylogenetics
• Molecular phylogenetics is the study of evolutionary relationships of genes and other biological macromolecules by analyzing their sequences
• Sequence similarity can be used to infer evolutionary relationships
Assumptions in Molecular Phylogenetics
• Sequences used are homologous, i.e., share a common ancestor
• Phylogenetic divergence is bifurcating, i.e., a parent branch splits into two daughter branches
• Each position in a sequence evolved independently
• Molecular clock - sequences evolve at constant rates (only used in some methods)
Terminology
[Figure: tree with taxa A-H, labeling the root, a branch, an internal node, and the taxa (terminal nodes)]
Terminology
• Clade = group of taxa made up of a common ancestor and all of its descendants
• Lineage = branch path depicting an ancestor-descendant relationship
• Paraphyletic group = group of taxa that includes a common ancestor but only some of its descendants, so it is not a clade
[Figure: example tree with taxa A-H]
Tree Topology
• Tree topology is the branching pattern in a tree
• Dichotomy (bifurcation) - a node splits into exactly two branches
• Polytomy (multifurcation) - a node splits into more than two branches
Rooted vs. Unrooted Trees
[Figure: the same four taxa A, B, C, D drawn as a rooted tree and as an unrooted tree]
Rooted vs. Unrooted Trees
• Unrooted trees have no root node - they do not assume knowledge of a common ancestor, just relationships
• Can convert between unrooted and rooted, but first need to determine where the root is
• Two ways to define the root:
  • Use an outgroup
  • Midpoint rooting - the midpoint of the two most divergent groups is assigned to be the root
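Midpoint rooting can be sketched in a few lines. This is only an illustration under assumptions not on the slide: the path lengths between taxa are already known and stored in a dictionary, and `midpoint_pair` is a hypothetical helper name. It locates the root position but does not re-root a tree.

```python
def midpoint_pair(distances):
    """Find the two most divergent taxa and the distance from each to the midpoint.

    distances: dict mapping (taxon1, taxon2) -> path length between them
    (hypothetical input format chosen for this sketch).
    """
    (taxon_a, taxon_b), longest = max(distances.items(), key=lambda item: item[1])
    # The root is placed halfway along the path joining the two taxa.
    return taxon_a, taxon_b, longest / 2


# Made-up path lengths for four taxa.
distances = {
    ("A", "B"): 0.3, ("A", "C"): 0.7, ("A", "D"): 0.9,
    ("B", "C"): 0.6, ("B", "D"): 0.8, ("C", "D"): 0.4,
}
taxon_a, taxon_b, half = midpoint_pair(distances)
print(taxon_a, taxon_b, half)
```

With these numbers the most divergent pair is A and D, so the root would sit on the path between them, equally far from both.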
Outgroups
• An outgroup is a sequence that is homologous to the sequences being studied (the ingroup), but more distantly related to them than they are to each other
• Must be distinct from the ingroup, but not too distant
• If the outgroup is too distantly related, it can lead to errors in tree construction
• The trick is to find the most closely related sequence that is still outside the ingroup
Gene Phylogeny vs. Species Phylogeny
• When using molecular data, we are technically building a phylogeny for just that sequence, not for the species from which the sequences came
• Species evolution is the result of mutations in the entire genome
• Your gene may have evolved differently than other genes in the genome
• To obtain a species phylogeny, we need to use a variety of gene families to construct the tree
Forms of Tree Representation
• Phylogram - branch lengths represent the amount of evolutionary divergence
• Cladogram - branch lengths are meaningless, only topology matters
Forms of Tree Representation
• Newick format - text format for use by computer programs
  • Example: (((B,C),A),(D,E))
  • Can also have branch lengths
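To see how a program might read the Newick example above, here is a minimal sketch that turns it into nested Python tuples. It assumes plain taxon labels only (no branch lengths, quotes, or comments), and `parse_newick` and `leaves` are hypothetical helper names, not functions from any phylogenetics library.

```python
def parse_newick(text):
    """Parse a label-only Newick string into nested tuples of taxon names."""
    text = text.strip().rstrip(";")
    pos = 0

    def peek():
        # Current character, or "" once the end of the string is reached.
        return text[pos] if pos < len(text) else ""

    def parse_node():
        nonlocal pos
        if peek() == "(":
            pos += 1  # skip "("
            children = [parse_node()]
            while peek() == ",":
                pos += 1  # skip ","
                children.append(parse_node())
            if peek() != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1  # skip ")"
            return tuple(children)
        # Leaf: read characters up to the next delimiter.
        start = pos
        while peek() and peek() not in "(),":
            pos += 1
        return text[start:pos].strip()

    tree = parse_node()
    if pos != len(text):
        raise ValueError(f"unexpected text at position {pos}")
    return tree


def leaves(tree):
    """Return the taxon names under a node, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [name for child in tree for name in leaves(child)]


tree = parse_newick("(((B,C),A),(D,E))")
print(tree)          # ((('B', 'C'), 'A'), ('D', 'E'))
print(leaves(tree))  # ['B', 'C', 'A', 'D', 'E']
```

Each pair of parentheses becomes one internal node, so the nesting of the tuples mirrors the topology of the tree.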
Consensus Trees
When multiple trees are equally optimal, build a consensus tree by collapsing the branching patterns on which they disagree into a single node
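One way to picture the collapsing step is to list the groups of taxa that every tree agrees on. The sketch below assumes trees are written as nested tuples of taxon names and that only groupings found in all trees are kept (often called a strict consensus); `clades` and `strict_consensus_clades` are hypothetical helper names, and the two example trees are made up.

```python
def clades(tree):
    """Return the set of clades (frozensets of taxa) in a nested-tuple tree."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        # Taxa under an internal node are the union of the taxa under its children.
        taxa = frozenset().union(*(walk(child) for child in node))
        found.add(taxa)
        return taxa

    walk(tree)
    return found


def strict_consensus_clades(trees):
    """Keep only the clades that appear in every tree."""
    shared = clades(trees[0])
    for tree in trees[1:]:
        shared &= clades(tree)
    return shared


# Two hypothetical, equally optimal trees that disagree only about A, B, C.
tree1 = ((("A", "B"), "C"), ("D", "E"))
tree2 = ((("A", "C"), "B"), ("D", "E"))
shared = strict_consensus_clades([tree1, tree2])
print(sorted("".join(sorted(clade)) for clade in shared))  # ['ABC', 'ABCDE', 'DE']
```

The groupings {A,B} and {A,C} conflict and drop out, so A, B, and C hang from one multifurcating node in the consensus, while the (D,E) pairing survives.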
Why Finding a True Tree is Difficult
• The number of possible trees grows faster than exponentially with the number of species (or sequences), n
• Rooted trees: $N_R = \frac{(2n-3)!}{2^{n-2}\,(n-2)!}$
• Unrooted trees: $N_U = \frac{(2n-5)!}{2^{n-3}\,(n-3)!}$
• To find the best tree, you must explore all possibilities (or must you?)
[Figure: number of rooted trees]
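A quick calculation shows how fast these counts grow. The sketch below reads the two formulas above as (2n-3)!/(2^(n-2) (n-2)!) for rooted trees and (2n-5)!/(2^(n-3) (n-3)!) for unrooted trees, counting bifurcating trees only; the function names are hypothetical.

```python
from math import factorial


def num_rooted_trees(n):
    """Number of rooted bifurcating trees for n taxa (n >= 2)."""
    if n < 2:
        raise ValueError("need at least 2 taxa")
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))


def num_unrooted_trees(n):
    """Number of unrooted bifurcating trees for n taxa (n >= 3)."""
    if n < 3:
        raise ValueError("need at least 3 taxa")
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))


for n in (3, 4, 5, 10):
    print(n, num_rooted_trees(n), num_unrooted_trees(n))
# 3 3 1
# 4 15 3
# 5 105 15
# 10 34459425 2027025
```

Ten taxa already give more than 34 million rooted trees, which is why exhaustive search is only practical for very small data sets.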
Tree Building Procedure
• Choose molecular markers
• Perform MSA
• Choose a model of evolution
• Determine tree building method
• Assess tree reliability
Choice of Molecular Markers
• Very closely related organisms - use nucleic acid sequences, which will show more differences than protein sequences
• Individuals within a species - use noncoding regions of mtDNA, which have a faster mutation rate
• More distantly related species - use slowly evolving nucleic acid sequences like ribosomal RNA, or protein sequences
• Very distantly related species - use highly conserved protein sequences
Advantages of Protein Sequences
• More highly conserved - mutations in DNA may not change the amino acid sequence
• In DNA, the third position in a codon especially can vary - this violates our assumption of independent evolution of all positions in a sequence
• DNA sequences can be biased by codon usage differences between species - this causes variations in sequence that are not attributable to evolution
• In alignments, unrelated DNA sequences can show a lot of similarity because the alphabet has only 4 letters; proteins do not have this problem (at least not as much)
• Introducing gaps in alignments of coding DNA sequences can cause frameshift errors, making the alignment biologically meaningless
Advantages of DNA Sequences
• Better for closely related species
• Show synonymous and nonsynonymous mutations, which allows analysis of positive and negative selection events
  • Lots of nonsynonymous mutations may mean positive selection for new functions of a protein with a different amino acid sequence
  • Lots of synonymous mutations may mean negative selection - a changed amino acid sequence is detrimental
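The distinction between synonymous and nonsynonymous changes can be made concrete by comparing two coding sequences codon by codon. This is a rough sketch that assumes the standard genetic code, two gap-free sequences of equal length in the same reading frame, and simple per-codon counting; it is not a normalized dN/dS estimate, and `count_codon_changes` is a hypothetical helper.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def count_codon_changes(seq1, seq2):
    """Count codons that differ synonymously and nonsynonymously."""
    if len(seq1) != len(seq2) or len(seq1) % 3 != 0:
        raise ValueError("sequences must be in-frame and of equal length")
    synonymous = nonsynonymous = 0
    for i in range(0, len(seq1), 3):
        codon1, codon2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        if codon1 == codon2:
            continue
        if CODON_TABLE[codon1] == CODON_TABLE[codon2]:
            synonymous += 1      # DNA differs, amino acid is the same
        else:
            nonsynonymous += 1   # DNA differs and the amino acid changes
    return synonymous, nonsynonymous


# Made-up example: Met-Leu-Gly vs. Met-Leu-Glu.
print(count_codon_changes("ATGCTTGGA", "ATGCTCGAA"))  # (1, 1)
```

Here CTT to CTC leaves leucine unchanged (synonymous), while GGA to GAA swaps glycine for glutamate (nonsynonymous). A protein alignment would show only the second change.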
Multiple Sequence Alignment
• Most critical step in tree building - cannot build correct tree without correct alignment
• Should build alignments with multiple programs, then inspect and compare to identify the most reasonable one
• Most alignments need manual editing
  • Make sure important functional residues align
  • Align secondary structure elements
  • Use full alignment or just parts
Automatic Editing of Alignments
• Rascal and NorMD - correct alignment errors, remove potentially unrelated or highly divergent sequences
• Gblocks - detect and eliminate poorly aligned positions and divergent regions
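As a rough illustration of what eliminating poorly aligned positions means, the sketch below drops alignment columns that are mostly gaps. It is not the algorithm used by Gblocks, Rascal, or NorMD; the gap-fraction rule, the 0.5 threshold, and the name `drop_gappy_columns` are assumptions made for this example.

```python
def drop_gappy_columns(alignment, max_gap_fraction=0.5):
    """Remove columns whose fraction of '-' characters exceeds the threshold.

    alignment: dict mapping sequence name -> aligned sequence (equal lengths).
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    rows = list(alignment.values())
    keep = [
        col for col in range(lengths.pop())
        if sum(seq[col] == "-" for seq in rows) / len(rows) <= max_gap_fraction
    ]
    # Rebuild each sequence from the retained columns only.
    return {
        name: "".join(seq[col] for col in keep)
        for name, seq in alignment.items()
    }


# Made-up alignment: the third column is a gap in two of three sequences.
alignment = {"s1": "MK-LV", "s2": "MR-LV", "s3": "MKALI"}
trimmed = drop_gappy_columns(alignment)
print(trimmed)  # {'s1': 'MKLV', 's2': 'MRLV', 's3': 'MKLI'}
```

Real trimming tools also weigh conservation and the context of neighboring columns, so this shows only the general idea of filtering columns before tree building.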