introduction to bioinformatics molecular phylogeny lesson 5

Post on 21-Dec-2015


Introduction to Bioinformatics

Molecular Phylogeny

Lesson 5

2

Theory of Evolution: Life is monophyletic

• All organisms on Earth had a common ancestor.

• Any two organisms share a common ancestor in their past.

Ancestor

Descendant 1 Descendant 2

3

Theory of Evolution:

• Speciation events lead to the creation of different species (one species splits into two species).

• Speciation caused by physical separation into groups where different genetic variants become dominant.

Ancestor

Descendant 1 Descendant 2

4

Ancestor

5

Ancestor

6

Ancestor

7

extinct

extant 1 extant 2

The genetic distance between any two extant

organisms is computable.

8

The differences between 1 and 2 are the result of changes on the lineage leading to descendant 1 + those on the lineage leading to descendant 2.

descendant 1 descendant 2

ancestor

9

Thus, the species in any set are related: the relation is called a phylogeny.

The relationships can be represented by a phylogenetic tree (or dendrogram).

10

5 MYA

120 MYA

1,500 MYA

MYA = Million Years Ago

11

Phylogenetic Tree Terminology

• Graph composed of nodes & branches

• Each branch connects two adjacent nodes

A B C D

E

F

R

12

Phylogenetic Tree Terminology

• Nodes represent the taxonomic units

• Taxonomic units = species/genes/individuals

• Branch = relations among the taxonomic units (descent & ancestry)

• Branching pattern = Topology

• Branch lengths correspond to number of substitutions. Longer branch means more substitutions.

13

Phylogenetic Tree Terminology

A B C D E

internal node - hypothetical most recent common ancestors

leaf (terminal node) - current day species or gene “taxa”

Branches

Root

14

OTUs & HTUs

• OTUs = Operational Taxonomic Units
– leaves of the tree

• HTUs = Hypothetical Taxonomic Units
– internal nodes of the tree

15

Trees

[Figure: several drawings of the same tree of Human, Chimp, and Gorilla, joined by "=" signs. Swapping the branches around an internal node does not change the tree.]

16

Same thing

[Figure: two drawings of one tree with leaves s1, s2, s3, s4, s5, joined by "=".]

17

Newick format

A

B

C

D

E

((A,B),(C,(D,E)));
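The Newick string above can be read back into a tree with a few lines of code. The following is a minimal sketch, assuming a topology-only string (no branch lengths, no internal labels, no spaces inside names); the function names parse_newick and leaves are illustrative, not part of any library.

```python
def parse_newick(text):
    """Parse a topology-only Newick string into nested tuples of leaf names."""
    text = text.strip()
    if not text.endswith(";"):
        raise ValueError("a Newick string must end with ';'")
    pos = 0

    def parse_node():
        nonlocal pos
        if text[pos] == "(":
            pos += 1  # skip '('
            children = [parse_node()]
            while text[pos] == ",":
                pos += 1  # skip ','
                children.append(parse_node())
            if text[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1  # skip ')'
            return tuple(children)
        # Otherwise this is a leaf: read its name up to the next delimiter.
        start = pos
        while text[pos] not in ",();":
            pos += 1
        return text[start:pos]

    tree = parse_node()
    if text[pos] != ";":
        raise ValueError("unexpected characters before ';'")
    return tree


def leaves(tree):
    """Return the leaf names of a nested-tuple tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [name for child in tree for name in leaves(child)]


tree = parse_newick("((A,B),(C,(D,E)));")
print(tree)          # (('A', 'B'), ('C', ('D', 'E')))
print(leaves(tree))  # ['A', 'B', 'C', 'D', 'E']
```

Each pair of parentheses becomes one internal node (an HTU) and each name becomes one leaf (an OTU), so the nesting of the tuples mirrors the branching pattern of the tree.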

18

Rooted vs. unrooted trees

1

2

3

3 1

2

19

Gorilla gorilla (Gorilla)

Homo sapiens (human)

Pan troglodytes (Chimpanzee)

Gallus gallus (chicken)

20

3 possible UNROOTED trees:

Human

Chimp

Chicken

Gorilla

Human

Gorilla

Chimp

Chicken

Human

Chicken

Chimp

Gorilla

the best tree

21

Rooting based on a priori knowledge:

Human

Chimp

Chicken

Gorilla

Human Chimp Chicken Gorilla

22

Ingroup / Outgroup:

Human Chimp Chicken Gorilla

INGROUP (Human, Chimp, Gorilla) / OUTGROUP (Chicken)

23

Monophyletic groups (clades):

A group is monophyletic (a clade) if it has a common ancestor and all the descendants of this ancestor are in the group.
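On a rooted tree this definition can be tested mechanically: a group is a clade exactly when it equals the full set of leaves below some node. The sketch below assumes the tree is written as nested tuples; the particular rooted tree used (Chicken as outgroup, Human and Chimp as closest relatives) is an assumption made for illustration, and the function names are hypothetical.

```python
def leaf_set(tree):
    """Return the set of leaf names below a node of a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def is_monophyletic(tree, group):
    """True if `group` is exactly the set of leaves below some node of `tree`."""
    group = frozenset(group)
    if leaf_set(tree) == group:
        return True
    if isinstance(tree, str):
        return False
    return any(is_monophyletic(child, group) for child in tree)


# Assumed rooted tree: the root separates Chicken from the three primates.
rooted = ("Chicken", ("Gorilla", ("Human", "Chimp")))

print(is_monophyletic(rooted, {"Gorilla", "Human", "Chimp"}))  # True
print(is_monophyletic(rooted, {"Human", "Gorilla"}))           # False
```

Human + Gorilla fails because the smallest node containing both also contains Chimp, so not all descendants of their common ancestor are in the group.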

24

Monophyletic groups

Human Chimp Chicken Gorilla

Gorilla + Human + Chimp are monophyletic.

25

Non-monophyletic groups

Whale Chimp Drosophila Zebra-fish

The Zebra-fish+Whale are not monophyletic:

Adaptation to water occurred more than once during evolution, independently… (or was lost in the lineage leading to chimp).

26

Monophyletic groups:

Human

Chimp

Chicken

Gorilla

When an unrooted tree is given, you cannot know which groups are monophyletic. You can only say which are not.

For example, Chicken + Rat might be monophyletic if the root were between Chicken + Rat and the rest. In fact, the real root of the tree is between Chicken and the rest, hence Chicken and Rat are not monophyletic. But Human and Gorilla are not monophyletic no matter where the root is.

Rat

27

What data can be used?

(1) Molecular data (DNA, RNA, proteins)

(2) Morphological data (living or fossilized organisms)

28

Advantages of molecular data:

• Heritable entities

• Characters' description is unambiguous

• Molecular data are amenable to quantitative treatment

• Can assess evolutionary relationships among distantly related organisms (ribosomal RNA)

• More abundant data (bacteria, algae)

29

What can we learn from a phylogenetic tree?

Determining the closest relatives of the organism that you are interested in.

30

Example 1: Which species are closest to Human?

Human

Chimpanzee

Gorilla

Orangutan

Gorilla

Chimpanzee

Orangutan

Human

Molecular analysis: The chimpanzee is more closely related to the human than the gorilla is.

Pre-molecular analysis: The great apes (chimpanzee, gorilla & orangutan) are separate from the human.

31

Example 2: Guilty Sequence - scientists map a murder weapon

“In 1998, a Louisiana doctor was convicted of attempting to murder his ex-girlfriend, a nurse. The murder weapon was a syringe of HIV-infected blood drawn from a patient under the doctor's care.”

32

History of the virus:

©2002 National Academy of Sciences, U.S.A.

Metzker, Michael L. et al. (2002) Proc. Natl. Acad. Sci. USA 99, 14292-14297

Phylogenetic analysis of the RT region. The smaller set of boxed sequences represents the sequences from the victim, and the larger set of boxed sequences represents the patient plus victim sequences. LA denote viral sequences from control HIV-1 infected individuals.

33

Species trees and Gene trees

• Species trees - representing the evolutionary relationships among species (the speciation process).

• Gene trees – Different genes may have different evolutionary history.

34

Before Darwin, homology was defined morphologically.

Similarity between properties in various species.

Example:

• Bats and butterflies fly, but the structures are different.

• Bats fly and whales swim, yet the bones in a bat's wing and a whale's flipper are strikingly alike.

Conclusions:

1. Bat wings and butterfly wings are not homologous.

2. Bat wings and whale flippers are homologous.

What is Homology?

35

• Darwin (1859): Homology is a result of descent with modifications from a common ancestor.

• Modern genetics: Homology is determined by genes.

• Two sequences are homologous if they are similar and share a common ancestor (similarity by itself is not enough).

• Large enough similarities typically imply homology.

Homology Interpretation: from Darwin to 21st Century

36

Homolog

• A gene related to a second gene by descent from a common ancestral DNA sequence.

37

Orthologs

Homologous sequences are orthologous if they were separated by a speciation event:

If a gene exists in a species, and that species diverges into two species, then the copies of this gene in the resulting species are orthologous.

38

Orthologs

• Orthologs typically retain the same or a similar function in the course of evolution.

• Identification of orthologs is critical for reliable prediction of gene function in newly sequenced genomes.

39

Orthologs

speciation

ancestor

descendant 2

40

Paralogs

Homologous sequences are paralogous if they were separated by a gene duplication event:

If a gene in an organism is duplicated, then the two copies are paralogous.

41

Paralogs

• Orthologs will typically have the same or similar function.

• This is not always true for paralogs: because the original selective pressure no longer acts on one copy of the duplicated gene, that copy is free to mutate and acquire new functions.

42

Paralogs

Duplication

43

Orthologs & Paralogs

Duplication

Speciation

Species a Species b

Paralogs

Orthologs

Orthologs
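The two definitions can be combined into one rule on a gene tree: look at the most recent common ancestor of two genes and ask which event it represents. The sketch below assumes a gene tree in which every internal node is already labelled "speciation" or "duplication"; the gene names a1, a2, b1, b2 (two copies in species a, two in species b) and the function names are hypothetical.

```python
def genes_under(node):
    """Return the set of gene names below a node."""
    if isinstance(node, str):
        return {node}
    _event, left, right = node
    return genes_under(left) | genes_under(right)


def relation(node, gene_x, gene_y):
    """Classify two genes by the event at their most recent common ancestor."""
    if isinstance(node, str) or gene_x == gene_y:
        raise ValueError("expected two distinct genes in the tree")
    event, left, right = node
    for child in (left, right):
        if {gene_x, gene_y} <= genes_under(child):
            return relation(child, gene_x, gene_y)  # ancestor lies deeper
    if not {gene_x, gene_y} <= genes_under(node):
        raise ValueError("gene not found in the tree")
    return "orthologs" if event == "speciation" else "paralogs"


# Assumed history: one duplication, then a speciation into species a and b.
gene_tree = ("duplication",
             ("speciation", "a1", "b1"),
             ("speciation", "a2", "b2"))

print(relation(gene_tree, "a1", "b1"))  # orthologs
print(relation(gene_tree, "a1", "a2"))  # paralogs
print(relation(gene_tree, "a1", "b2"))  # paralogs
```

Note that a1 and b2 come from different species yet are paralogs, because the event that separated them was the duplication, not the speciation.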

44

How many rooted trees?

TR = "TREE ROOTED"

N=2, TR(2) = 1

N=3, TR(3) = 3

N=4, TR(4) = 15

[Figure: every possible rooted tree drawn for taxa a, b (N=2), a, b, c (N=3), and a, b, c, d (N=4).]

45

Number of possible trees:

Number of taxa   Number of rooted trees   Number of unrooted trees
2                1                        1
3                3                        1
4                15                       3
5                105                      15
6                945                      105
7                10,395                   945
8                135,135                  10,395
9                2,027,025                135,135
10               34,459,425               2,027,025
11               654,729,075              34,459,425
12               13,749,310,575           654,729,075

46

Number of possible trees

$N_{\text{rooted}} = \dfrac{(2n-3)!}{2^{n-2}\,(n-2)!}$

$N_{\text{unrooted}} = \dfrac{(2n-5)!}{2^{n-3}\,(n-3)!}$
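The two formulas are easy to evaluate directly. The following is a small sketch using exact integer arithmetic; the function names are illustrative. The unrooted formula needs n >= 3, so n = 2 (a single branch between two taxa) is handled as a special case.

```python
from math import factorial


def rooted_trees(n):
    """Number of rooted bifurcating trees for n >= 2 taxa."""
    if n < 2:
        raise ValueError("need at least 2 taxa")
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))


def unrooted_trees(n):
    """Number of unrooted bifurcating trees for n >= 2 taxa."""
    if n < 2:
        raise ValueError("need at least 2 taxa")
    if n == 2:
        return 1  # special case: the formula is undefined for n = 2
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))


for n in range(2, 13):
    print(f"{n:>2}  {rooted_trees(n):>17,}  {unrooted_trees(n):>13,}")

print(f"{rooted_trees(20):,}")  # 8,200,794,532,637,891,559,375
```

The number of unrooted trees for n taxa equals the number of rooted trees for n - 1 taxa, because a root can be placed on any branch of an unrooted tree.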

47

Evolution is an historical process.

Only one historical narrative is true.

From 8,200,794,532,637,891,559,375 possibilities for 20 taxa, 1 possibility is true and 8,200,794,532,637,891,559,374 are false.

Truth is one, falsehoods are many.

48

How do we know which of the

8,200,794,532,637,891,559,375 trees is true?

We don’t, we infer by using decision criteria.

49

Methods

50

Approach 1 - Distance methods

• Two steps:

– Compute the distance between every pair of sequences in the MSA.

– Find the tree that agrees most with the distance table.

Approach 2 - Character state methods

• Input: multiple sequence alignment

• Algorithms:

– Maximum parsimony (MP)

– Maximum likelihood (ML)

51

Step 1: Distance estimation

There are different methods to compute the distance between any two sequences. For example, one can assign different probabilities to transitions and transversions…

OTU   A    B    C    D
A
B     8
C     7    9
D     12   14   11
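Two common ways to turn a pair of aligned DNA sequences into a distance are sketched below: the p-distance (the fraction of differing sites) and the Kimura two-parameter (K2P) distance, which treats transitions and transversions separately. The slides do not say which model they use, so K2P is chosen here only as an example; the function names and the two short sequences are made up for illustration.

```python
from math import log, sqrt

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_differences(seq_x, seq_y):
    """Count transitions, transversions, and compared sites in two aligned sequences."""
    if len(seq_x) != len(seq_y):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = sites = 0
    for x, y in zip(seq_x.upper(), seq_y.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue  # skip gaps and ambiguous characters
        sites += 1
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1    # purine <-> purine or pyrimidine <-> pyrimidine
        else:
            transversions += 1  # purine <-> pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites")
    return transitions, transversions, sites


def p_distance(seq_x, seq_y):
    """Fraction of compared sites that differ."""
    ts, tv, sites = count_differences(seq_x, seq_y)
    return (ts + tv) / sites


def k2p_distance(seq_x, seq_y):
    """Kimura two-parameter distance: -1/2 * ln((1 - 2P - Q) * sqrt(1 - 2Q))."""
    ts, tv, sites = count_differences(seq_x, seq_y)
    p, q = ts / sites, tv / sites
    inner = (1 - 2 * p - q) * sqrt(max(1 - 2 * q, 0.0))
    if inner <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * log(inner)


seq_1 = "ACGTACGTAC"
seq_2 = "ACGTACGTAT"  # one C -> T transition in 10 sites
print(p_distance(seq_1, seq_2))              # 0.1
print(round(k2p_distance(seq_1, seq_2), 4))  # 0.1116
```

The corrected distance is slightly larger than the observed fraction of differences because the model allows for sites that changed more than once.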

52

Step 2: From a distance table to a tree

• Algorithms:

– UPGMA

– Neighbor Joining (NJ)

53

Neighbor Joining (NJ)

• Reconstructs an unrooted tree

• Calculates branch lengths

• Based on star decomposition

• In each stage, the two nearest nodes of the tree are chosen and defined as neighbors in our tree. This is done recursively until all of the nodes are paired together.

54

What are neighbours?

Neighbours are defined as a pair of OTUs that have one internal node connecting them.

Neighbors, we are …

[Figure: unrooted tree with leaves A, B, C, D.]

A and B are neighbours, C and D are neighbours, but A and C are not neighbours.

55

Which pair is closest?

Neighbors, we are …

$r_i = \dfrac{1}{N-2} \sum_{k} d_{ik}$ (average distance from all nodes)

$M_{ij} = d_{ij} - (r_i + r_j)$ (distance of i, j relative to the rest)
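The two formulas can be applied to a whole distance table to find the pair with the smallest M. The sketch below uses five OTUs A to E; the distance values are an assumption, read from the five-OTU table of this lesson's worked example as best it can be recovered, and the function name nj_criterion is illustrative.

```python
from itertools import combinations

names = ["A", "B", "C", "D", "E"]

# Assumed pairwise distances (symmetric, so each pair is stored once).
pairwise = {
    ("A", "B"): 8, ("A", "C"): 7, ("A", "D"): 12, ("A", "E"): 11,
    ("B", "C"): 9, ("B", "D"): 1, ("B", "E"): 10,
    ("C", "D"): 3, ("C", "E"): 2,
    ("D", "E"): 6,
}
dist = {frozenset(pair): d for pair, d in pairwise.items()}


def nj_criterion(dist, names):
    """Return M[i, j] = d[i, j] - (r[i] + r[j]) for every pair of OTUs."""
    n = len(names)
    if n < 3:
        raise ValueError("need at least 3 OTUs")
    # r[i]: sum of distances from i to all other OTUs, divided by N - 2.
    r = {i: sum(dist[frozenset((i, k))] for k in names if k != i) / (n - 2)
         for i in names}
    return {(i, j): dist[frozenset((i, j))] - (r[i] + r[j])
            for i, j in combinations(names, 2)}


m = nj_criterion(dist, names)
closest = min(m, key=m.get)
print(closest, round(m[closest], 3))  # ('B', 'D') -15.667
```

The pair with the most negative M is joined first. M rewards pairs that are close to each other and, at the same time, far from everything else, which is why it can differ from simply picking the smallest raw distance.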

56

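OTU   A    B    C    D    E
A
B     8
C     7    9
D     12   1    3
E     11   10   2    6

[Figure: star tree with leaves A, B, C, D, E; B and D are joined first, giving a tree with leaves A, C, E and the new node (B,D).]

OTU     A    (B,D)   C    E
A
(B,D)   10
C       7    6
E       11   8       2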
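The step from the five-OTU table to the four-OTU table can be reproduced in code. The values in the reduced table are consistent with replacing B and D by one node whose distance to every other OTU is the mean of the two original distances, so that is what this sketch does; this reading of the tables is an assumption, and the function name merge_by_average is hypothetical. The standard neighbor-joining update is slightly different: it uses (d_ik + d_jk - d_ij) / 2.

```python
from itertools import combinations

names = ["A", "B", "C", "D", "E"]

# Assumed pairwise distances from the five-OTU table.
pairwise = {
    ("A", "B"): 8, ("A", "C"): 7, ("A", "D"): 12, ("A", "E"): 11,
    ("B", "C"): 9, ("B", "D"): 1, ("B", "E"): 10,
    ("C", "D"): 3, ("C", "E"): 2,
    ("D", "E"): 6,
}
dist = {frozenset(pair): d for pair, d in pairwise.items()}


def merge_by_average(dist, names, i, j):
    """Replace OTUs i and j by one node; new distances are simple means."""
    merged = f"({i},{j})"
    rest = [k for k in names if k not in (i, j)]
    # Distances between untouched OTUs are copied unchanged.
    new_dist = {frozenset((a, b)): dist[frozenset((a, b))]
                for a, b in combinations(rest, 2)}
    # Distance from the merged node to k is the mean of d(i, k) and d(j, k).
    for k in rest:
        new_dist[frozenset((merged, k))] = (
            dist[frozenset((i, k))] + dist[frozenset((j, k))]) / 2
    return new_dist, rest + [merged]


new_dist, new_names = merge_by_average(dist, names, "B", "D")
for k in ["A", "C", "E"]:
    print(k, new_dist[frozenset(("(B,D)", k))])
# A 10.0
# C 6.0
# E 8.0
```

Repeating the select-and-merge cycle on the reduced table joins the remaining OTUs until the tree is fully resolved.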

57

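OTU     A    (B,D)   C    E
A
(B,D)   10
C       7    6
E       11   8       2

[Figure: tree with leaves A, C, E and the node (B,D); C and E are joined next, giving a tree that connects (B,D), (C,E), and A. This equals the fully resolved tree with leaves B, D, C, E, A.]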

58

Advantages and disadvantages of NJ

• Advantages

– is fast and thus suited for large datasets and for bootstrap analysis

– permits lineages with largely different branch lengths

– permits correction for multiple substitutions

• Disadvantages

– sequence information is reduced

– gives only one possible tree

– strongly dependent on the model of evolution used
