Page 1: BCB 444/544

1BCB 444/544 F07 ISU Terribilini #29- Phylogenetics 10/31/07

BCB 444/544

Lecture 29

Phylogenetics

#29_Oct31

Page 2: BCB 444/544

Required Reading (before lecture)

Mon Oct 29 - Lecture 28

Promoter & Regulatory Element Prediction

• Chp 9 - pp 113 - 126

Wed Oct 31 - Lecture 29

Phylogenetics Basics

• Chp 10 - pp 127 - 141

Thurs Nov 1 - Lab 9

Gene & Regulatory Element Prediction

Fri Nov 2 - Lecture 30

Phylogenetic Tree Construction Methods & Programs

• Chp 11 - pp 142 - 169

Page 3: BCB 444/544

Assignments & Announcements

Mon Oct 29 - HW#5

HW#5 = Hands-on exercises with phylogenetics and tree-building software

Due: Mon Nov 5 (not Fri Nov 1 as previously posted)

Page 4: BCB 444/544

BCB 544 "Team" Projects

Last week of classes will be devoted to Projects

• Written reports due:
  • Mon Dec 3 (no class that day)

• Oral presentations (20-30') will be:
  • Wed-Fri Dec 5,6,7
  • 1 or 2 teams will present during each class period

See Guidelines for Projects posted online

Page 5: BCB 444/544

BCB 544 Only: New Homework Assignment

544 Extra#2

Due: √PART 1 - ASAP

PART 2 - meeting prior to 5 PM Fri Nov 2

Part 1 - Brief outline of Project, email to Drena & Michael

after response/approval, then:

Part 2 - More detailed outline of project

Read a few papers and summarize status of problem

Schedule meeting with Drena & Michael to discuss ideas

Page 6: BCB 444/544

Seminars this Week

BCB List of URLs for Seminars related to Bioinformatics: http://www.bcb.iastate.edu/seminars/index.html

• Nov 1 Thurs - BBMB Seminar 4:10 in 1414 MBB
  • Todd Yeates, UCLA
  • TBA - something cool about structure and evolution?

• Nov 2 Fri - BCB Faculty Seminar 2:10 in 102 ScI
  • Bob Jernigan, BBMB, ISU
  • Control of Protein Motions by Structure

Page 7: BCB 444/544

Chp 10 - Phylogenetics

SECTION IV MOLECULAR PHYLOGENETICS

Xiong: Chp 10 Phylogenetics Basics

• Evolution and Phylogenetics

• Terminology

• Gene Phylogeny vs. Species Phylogeny

• Forms of Tree Representation

• Why Finding a True Tree is Difficult

• Procedure of Building a Phylogenetic Tree

Page 8: BCB 444/544

Evolution and Phylogenetics

• Evolution – the development of biological form from other preexisting forms

• Evolution proceeds by natural selection

Page 9: BCB 444/544

Natural Selection

• Species can produce more offspring than the environment can support. This leads to competition for resources. Genetic variations exist in a population that give some individuals an advantage, others a disadvantage, leading to differential reproductive success.

Page 10: BCB 444/544

Phylogenetics

• Phylogenetics is the study of the evolutionary history of living organisms

• Uses tree-like diagrams to represent the pedigrees of the organisms

• Similarities and differences seen in a multiple sequence alignment are easier to make sense of in a phylogenetic tree

Page 11: BCB 444/544

Data Used in Phylogenetics

• Fossil records - morphology and timeline of divergence
  • Limitations - not available for all species in all areas, morphology determined by multiple genetic factors, fossils for microorganisms are especially rare

• Molecular data - DNA and protein sequences - molecular fossils
  • Advantages - lots of data, easy to obtain
  • Limitations - can be difficult to get sequences from extinct species

• Physical, behavioral, and developmental characteristics can also be used in phylogenetics

Page 12: BCB 444/544

Molecular Phylogenetics

• Molecular phylogenetics is the study of evolutionary relationships of genes and other biological macromolecules by analyzing their sequences

• Sequence similarity can be used to infer evolutionary relationships
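One simple way to turn sequence similarity into a number is the p-distance, the fraction of aligned sites at which two sequences differ. The sketch below assumes the sequences are already aligned to equal length and skips gapped sites; the function name and the toy sequences are illustrative only and do not come from the lecture.

```python
def p_distance(seq_a, seq_b):
    """Fraction of aligned, ungapped sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Keep only columns where neither sequence has a gap.
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped sites to compare")
    return sum(a != b for a, b in pairs) / len(pairs)


# Toy aligned sequences (made up for illustration).
taxon_1 = "ATGCTAGCTA"
taxon_2 = "ATGCTAGCTT"
taxon_3 = "ATGATCGCAA"

print(p_distance(taxon_1, taxon_2))  # 1 difference in 10 sites
print(p_distance(taxon_1, taxon_3))  # 3 differences in 10 sites
```

A smaller p-distance suggests a closer relationship, although the raw value underestimates the true amount of change once sites have mutated more than once.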

Page 13: BCB 444/544

Assumptions in Molecular Phylogenetics

• Sequences used are homologous, i.e. share a common ancestor

• Phylogenetic divergence is bifurcating, i.e. parent branch splits into two daughter branches

• Each position in a sequence evolved independently

• Molecular Clock – sequences evolve at constant rates (only used in some methods)

Page 14: BCB 444/544

Terminology

[Figure: rooted tree with taxa A B C D E F G H; labeled parts: Root, Branch, Internal node, Taxa (terminal nodes)]

Page 15: BCB 444/544

Terminology

• Clade = group of taxa descended from a common ancestor

• Lineage = branch path depicting ancestor-descendant relationship

• Paraphyletic group = group of taxa that share more than one closest common ancestor

[Figure: example tree with taxa A B C D E F G H]

Page 16: BCB 444/544

Tree Topology

• Tree topology is the branching pattern in a tree

[Figure: two branching patterns, labeled Dichotomy (Bifurcation) and Polytomy (Multifurcation)]

Page 17: BCB 444/544

Rooted vs. Unrooted Trees

[Figure: Rooted Tree and Unrooted Tree, each drawn for taxa A, B, C, D]

Page 18: BCB 444/544

Rooted vs. Unrooted Trees

• Unrooted trees have no root node – do not assume knowledge of a common ancestor, just relationships

• Can convert between unrooted and rooted, but first need to determine where the root is

• Two ways to define the root:
  • Use an outgroup
  • Midpoint rooting – midpoint of the two most divergent groups is assigned to be the root
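As a rough sketch of midpoint rooting, the code below takes tip-to-tip path lengths measured along an unrooted tree, finds the most divergent pair, and reports how far from each of those tips the root would sit. The function name, the input layout, and the toy distances are assumptions made for illustration; a full implementation would also walk the tree to insert the root on the right branch.

```python
def midpoint_root_location(tip_distances):
    """Locate the midpoint root from tip-to-tip path lengths.

    tip_distances maps frozenset({tip_a, tip_b}) to the path length
    between the two tips along the tree.
    Returns (tip_a, tip_b, distance_from_either_tip_to_root).
    """
    pair, longest = max(tip_distances.items(), key=lambda item: item[1])
    tip_a, tip_b = sorted(pair)
    return tip_a, tip_b, longest / 2


# Toy path lengths (hypothetical) for a four-taxon unrooted tree.
toy_distances = {
    frozenset("AB"): 4.0,
    frozenset("AC"): 4.0,
    frozenset("AD"): 6.0,
    frozenset("BC"): 6.0,
    frozenset("BD"): 8.0,
    frozenset("CD"): 6.0,
}

print(midpoint_root_location(toy_distances))  # ('B', 'D', 4.0)
```

Here B and D are the most divergent pair, so the root is placed 4.0 units along the path from either of them.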

Page 19: BCB 444/544

Outgroups

• An outgroup is a sequence that is homologous to the sequences being studied (the ingroup), but more distantly related to them than they are to each other

• Must be distinct from the ingroup, but not too distant

• If the outgroup is too distantly related, it can lead to errors in tree construction

• Trick is to find the most closely related sequence that still falls outside the ingroup

Page 20: BCB 444/544

Gene Phylogeny vs. Species Phylogeny

• When using molecular data, we are technically building a phylogeny for just that sequence, not for the species from which the sequences came

• Species evolution is the result of mutations in the entire genome

• Your gene may have evolved differently than other genes in the genome

• To obtain a species phylogeny, we need to use a variety of gene families to construct the tree

Page 21: BCB 444/544

Forms of Tree Representation

Phylogram: Branch lengths represent amount of evolutionary divergence

Cladogram: Branch lengths are meaningless, only topology matters

Page 22: BCB 444/544

Forms of Tree Representation

• Newick format – text format for use by computer programs

• Example: (((B,C),A),(D,E))

• Can also have branch lengths
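To show how the nested parentheses encode a tree, here is a minimal sketch of a Newick reader. It assumes the simplest case only (taxon names and parentheses, no branch lengths or internal labels) and returns nested tuples; the function names are illustrative and not taken from any particular library.

```python
def parse_newick(text):
    """Parse a Newick string without branch lengths into nested tuples."""
    text = text.strip().rstrip(";")
    pos = 0

    def parse_node():
        nonlocal pos
        if pos < len(text) and text[pos] == "(":
            pos += 1  # skip "("
            children = [parse_node()]
            while pos < len(text) and text[pos] == ",":
                pos += 1  # skip ","
                children.append(parse_node())
            if pos >= len(text) or text[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1  # skip ")"
            return tuple(children)
        # Otherwise read a taxon name up to the next delimiter.
        start = pos
        while pos < len(text) and text[pos] not in "(),":
            pos += 1
        return text[start:pos]

    tree = parse_node()
    if pos != len(text):
        raise ValueError("unexpected trailing text")
    return tree


def leaves(tree):
    """List the taxon names of a nested-tuple tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [name for child in tree for name in leaves(child)]


tree = parse_newick("(((B,C),A),(D,E))")
print(tree)          # ((('B', 'C'), 'A'), ('D', 'E'))
print(leaves(tree))  # ['B', 'C', 'A', 'D', 'E']
```

Each pair of parentheses becomes one internal node, so B and C are sisters, A joins them next, and (D,E) forms the other side of the root.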

Page 23: BCB 444/544

Consensus Trees

When multiple trees are equally optimal, build a consensus tree by collapsing the branching patterns on which they disagree into a single multifurcating node
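One common way to do this is a strict consensus, which keeps only the clades found in every input tree. The sketch below assumes rooted trees written as nested tuples of taxon names and returns the shared clades rather than redrawing the tree; the function names and the two toy trees are illustrative.

```python
def clades(tree):
    """Return (leaf set, set of clades) for a rooted nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree]), set()
    members = frozenset()
    found = set()
    for child in tree:
        child_members, child_clades = clades(child)
        members |= child_members
        found |= child_clades
    found.add(members)  # the clade defined by this internal node
    return members, found


def strict_consensus_clades(trees):
    """Clades that appear in every tree (the strict consensus)."""
    return set.intersection(*(clades(tree)[1] for tree in trees))


# Two toy trees that disagree only on how A, B and C are related.
tree_1 = ((("A", "B"), "C"), ("D", "E"))
tree_2 = ((("A", "C"), "B"), ("D", "E"))

for clade in sorted(strict_consensus_clades([tree_1, tree_2]), key=len):
    print(sorted(clade))
```

The clades (A,B) and (A,C) are dropped because the trees disagree on them, so the consensus shows A, B and C joined at one multifurcating node next to the (D,E) clade.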

Page 24: BCB 444/544

Why Finding a True Tree is Difficult

• The number of possible trees grows exponentially with the number of species (or sequences), n

• Number of rooted trees: $N_r = \dfrac{(2n-3)!}{2^{n-2}\,(n-2)!}$

• Number of unrooted trees: $N_u = \dfrac{(2n-5)!}{2^{n-3}\,(n-3)!}$

• To find the best tree, you must explore all possibilities (or must you?)
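To get a feel for how quickly these counts grow, the two formulas can be evaluated directly. This is a small sketch using exact integer arithmetic, reading the denominators as 2 raised to the power n-2 (rooted) or n-3 (unrooted); the function names are illustrative.

```python
from math import factorial


def rooted_tree_count(n):
    """Number of rooted bifurcating trees for n taxa (n >= 2)."""
    if n < 2:
        raise ValueError("need at least 2 taxa")
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))


def unrooted_tree_count(n):
    """Number of unrooted bifurcating trees for n taxa (n >= 3)."""
    if n < 3:
        raise ValueError("need at least 3 taxa")
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))


for n in (3, 4, 5, 10):
    print(n, rooted_tree_count(n), unrooted_tree_count(n))
```

For 4 taxa there are 15 rooted and 3 unrooted trees; for 10 taxa there are already 34,459,425 rooted and 2,027,025 unrooted trees, which is why exhaustive search becomes impractical so fast.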

Page 25: BCB 444/544

Tree Building Procedure

• Choose molecular markers

• Perform MSA

• Choose a model of evolution

• Determine tree building method

• Assess tree reliability

Page 26: BCB 444/544

Choice of Molecular Markers

• Very closely related organisms - use nucleic acid sequences, which will show more differences than protein sequences

• For individuals within a species - use noncoding regions of mtDNA, which have a faster mutation rate

• More distantly related species - use slowly evolving nucleic acid sequences like ribosomal RNA, or protein sequences

• Very distantly related species - use highly conserved protein sequences

Page 27: BCB 444/544

Advantages of Protein Sequences

• More highly conserved - mutations in DNA may not change amino acid sequence

• Third position in a codon especially can vary - violates our assumption of independent evolution of all positions in a sequence

• DNA sequences can be biased by codon usage differences between species - causes variations in sequence that are not attributable to evolution

• In alignments, DNA sequences that are not related can show a lot of similarity due to only 4 letters in alphabet, proteins do not have this problem (at least not as much)

• Introducing gaps in alignments of DNA sequences can cause frameshift errors, making alignment biologically meaningless

Page 28: BCB 444/544

Advantages of DNA Sequences

• Better for closely related species

• Show synonymous and non-synonymous mutations, which allows analysis of positive and negative selection events
  • Lots of nonsynonymous mutations may mean positive selection for new functions of protein with different amino acid sequence
  • Lots of synonymous mutations may mean negative selection - changed amino acid sequence is detrimental
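The distinction between synonymous and nonsynonymous changes can be made concrete with a codon-by-codon comparison. The sketch below assumes two in-frame, ungapped coding sequences of equal length and the standard genetic code; it is only a crude codon-level tally, not a proper dN/dS estimate, and the function name and toy sequences are illustrative.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def count_codon_changes(seq_a, seq_b):
    """Count differing codons as (synonymous, nonsynonymous)."""
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be in frame and of equal length")
    synonymous = nonsynonymous = 0
    for i in range(0, len(seq_a), 3):
        codon_a, codon_b = seq_a[i:i + 3], seq_b[i:i + 3]
        if codon_a == codon_b:
            continue
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            synonymous += 1   # same amino acid, different codon
        else:
            nonsynonymous += 1  # amino acid changed
    return synonymous, nonsynonymous


# Toy sequences: GCT->GCC keeps Ala, AAA->AGA changes Lys to Arg.
print(count_codon_changes("ATGGCTAAA", "ATGGCCAGA"))  # (1, 1)
```

This kind of comparison is only possible at the DNA level, since a protein alignment hides every synonymous change.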

Page 29: BCB 444/544

Multiple Sequence Alignment

• Most critical step in tree building - cannot build correct tree without correct alignment

• Should build alignments with multiple programs, then inspect and compare to identify the most reasonable one

• Most alignments need manual editing
  • Make sure important functional residues align
  • Align secondary structure elements
  • Use full alignment or just parts

Page 30: BCB 444/544

Automatic Editing of Alignments

• Rascal and NorMD – correct alignment errors, remove potentially unrelated or highly divergent sequences

• Gblocks – detect and eliminate poorly aligned positions and divergent regions
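The general idea of trimming poorly aligned positions can be illustrated with a much simpler filter than the ones these programs use. The sketch below drops every alignment column whose gap fraction exceeds a threshold; it is not the Gblocks algorithm, and the function name, the threshold parameter, and the toy alignment are assumptions for illustration.

```python
def filter_gappy_columns(alignment, max_gap_fraction=0.5):
    """Remove columns in which more than max_gap_fraction of rows are gaps."""
    if not alignment or len({len(row) for row in alignment}) != 1:
        raise ValueError("alignment rows must all have the same length")
    n_rows = len(alignment)
    keep = [
        col
        for col in range(len(alignment[0]))
        if sum(row[col] == "-" for row in alignment) / n_rows <= max_gap_fraction
    ]
    return ["".join(row[col] for col in keep) for row in alignment]


# Toy alignment: the fourth column is a gap in two of three rows.
toy_alignment = ["ACG-T", "A-G-T", "ACGAT"]
print(filter_gappy_columns(toy_alignment))  # ['ACGT', 'A-GT', 'ACGT']
```

The second column survives because only one of three rows has a gap there, while the fourth column is removed.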