
Molecular Evolution and Phylogeny (2)

Sebastián E. Ramos-Onsins

Centre for Research in Agricultural Genomics (CRAG)


Module 2: Core Bioinformatics

MSc in Bioinformatics

Course 2014-15


Phylogeny: representation of the genealogical relationships among species, genes, populations or even individuals.

Ziheng Yang (2006)


Phylogeny

A tree is a graphical representation of the relationships between lineages, drawn as a structure of nodes connected by branches.

Rooted vs Unrooted Trees:

[Figure: a rooted tree and an unrooted tree relating taxa 1 to 6]
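Rooting also changes how many trees are possible. As a small sketch (the counts are standard results, not stated on the slides), the number of fully resolved topologies for n labelled tips is (2n-5)!! for unrooted trees and (2n-3)!! for rooted trees; the function names below are illustrative.

```python
def n_unrooted_trees(n):
    """Number of fully resolved unrooted trees with n >= 3 labelled tips: (2n-5)!!."""
    count = 1
    # The k-th tip can be attached to any of the 2k-5 branches of a tree with k-1 tips
    for k in range(4, n + 1):
        count *= 2 * k - 5
    return count


def n_rooted_trees(n):
    """Rooted trees with n >= 2 tips: (2n-3)!!, i.e. unrooted trees with one extra tip (the root)."""
    return n_unrooted_trees(n + 1)


for n in (4, 6, 10):
    print(n, n_unrooted_trees(n), n_rooted_trees(n))
```

For the 6 taxa in the figure this gives 105 unrooted versus 945 rooted trees; the super-exponential growth is one reason exhaustive tree searches quickly become impractical.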


Phylogeny

Cladogram vs Phylogram Trees:

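[Figure: the same tree for taxa 1 to 6 drawn as a cladogram (qualitative: only the branching order is shown) and as a phylogram (branch lengths are represented)]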


Phylogeny

Unresolved vs Resolved Trees:

[Figure: a star tree, a partially resolved tree and a fully resolved tree]
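A minimal sketch of how the three cases differ, assuming rooted trees written as nested Python tuples (a representation chosen only for illustration): a tree is fully resolved when every internal node has exactly two descendants, any node with more is a polytomy, and a star tree is a single polytomy joining all tips.

```python
def polytomies(tree):
    """Count internal nodes with more than two children in a nested-tuple rooted tree.

    Tips are strings; an internal node is a tuple of its children.
    """
    if isinstance(tree, str):
        return 0
    own = 1 if len(tree) > 2 else 0
    return own + sum(polytomies(child) for child in tree)


def is_resolved(tree):
    """True when every internal node is bifurcating."""
    return polytomies(tree) == 0


star = ("1", "2", "3", "4", "5", "6")
partial = (("1", "2"), ("3", "4", "5", "6"))
resolved = (("1", "2"), (("3", "4"), ("5", "6")))
for name, t in [("star", star), ("partial", partial), ("resolved", resolved)]:
    print(name, polytomies(t), is_resolved(t))
```

Note that unrooted trees are often written with a three-way basal node, which this rooted-tree check would count as a polytomy.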


Phylogeny

Species vs Gene Trees:

[Figure: a species tree and a gene tree for taxa 1 to 6]

Species tree: based on multiple sources of information about the species.

Gene tree: based on a single region or a few regions (e.g. of DNA) of the species.


Phylogeny

Ultrametric and Additive Trees (not mutually exclusive: every ultrametric tree is also additive):

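[Figure: an ultrametric tree for taxa 1 to 6]

Ultrametric: for any three tips, the two largest pairwise distances are equal (three-point condition), e.g. d45 <= d43 = d53. Equivalently, all tips are at the same distance from the root.

Additive: the distance between any two tips equals the sum of the lengths of the branches on the path connecting them, e.g. d15 = d1i + dij + djk + dk5, where i, j and k are the internal nodes on the path from tip 1 to tip 5.

[Figure: an additive tree for taxa 1 to 6]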
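Both properties can be checked directly on a distance matrix. This is a sketch using the three-point condition for ultrametricity and the four-point condition for additivity (the four-point condition is not named on the slides and is supplied here); the test matrix is the 4-taxon matrix from the UPGMA example in these slides, plus the tip-to-tip distances of the UPGMA tree built from it.

```python
from itertools import product


def is_ultrametric(d, tol=1e-9):
    """Three-point condition: d[i][j] <= max(d[i][k], d[j][k]) for every triple of tips."""
    n = len(d)
    return all(d[i][j] <= max(d[i][k], d[j][k]) + tol
               for i, j, k in product(range(n), repeat=3))


def is_additive(d, tol=1e-9):
    """Four-point condition: d[i][j] + d[k][l] <= max(d[i][k] + d[j][l], d[i][l] + d[j][k]).

    Quadruples with repeated indices are included, which also enforces the triangle inequality.
    """
    n = len(d)
    return all(d[i][j] + d[k][l] <= max(d[i][k] + d[j][l], d[i][l] + d[j][k]) + tol
               for i, j, k, l in product(range(n), repeat=4))


# Distance matrix of the UPGMA example (not a tree metric: d23 = 4 > d21 + d13 = 3)
observed = [[0, 1, 2, 3],
            [1, 0, 4, 5],
            [2, 4, 0, 6],
            [3, 5, 6, 0]]
# Tip-to-tip distances along the UPGMA tree built from it (node heights 0.5, 1.5 and 14/6)
h = 14 / 3
on_tree = [[0, 1, 3, h],
           [1, 0, 3, h],
           [3, 3, 0, h],
           [h, h, h, 0]]
print(is_ultrametric(observed), is_additive(observed))   # neither condition holds
print(is_ultrametric(on_tree), is_additive(on_tree))     # ultrametric, hence also additive
```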


Phylogeny

Let’s create a tree history using R:


Phylogeny

Tree-reconstruction Methods

- Distance Methods

- Maximum Parsimony

- Maximum Likelihood

- Bayesian Inference


Phylogeny

Tree-reconstruction Methods

- Distance Methods

Two steps:
- Calculate the distance matrix.
- Reconstruct the phylogenetic tree from the matrix.
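For the first step, a minimal sketch that computes p-distances (proportion of differing sites) and Jukes-Cantor (JC69) corrected distances, d = -3/4 ln(1 - 4p/3). The choice of JC69 is an assumption for illustration; the five sequences are those of the bootstrap example in these slides.

```python
import math


def p_distance(s1, s2):
    """Proportion of aligned sites at which two sequences differ."""
    return sum(a != b for a, b in zip(s1, s2)) / len(s1)


def jc69(p):
    """Jukes-Cantor (JC69) correction for multiple substitutions; undefined (inf) for p >= 3/4."""
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def distance_matrix(seqs, correct=True):
    """Full symmetric matrix of pairwise p-distances, optionally JC69-corrected."""
    n = len(seqs)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            p = p_distance(seqs[i], seqs[j])
            matrix[i][j] = matrix[j][i] = jc69(p) if correct else p
    return matrix


alignment = ["ATCTTCT", "GTCTTCT", "ATGATCC", "ATGAACC", "AGGAACC"]
p_dist = distance_matrix(alignment, correct=False)
jc_dist = distance_matrix(alignment)
for row in jc_dist:
    print(["inf" if math.isinf(x) else round(x, 3) for x in row])
```

Sequences 2 and 5 differ at 6 of 7 sites (p > 3/4), so their JC69 distance is infinite: with real data, such saturated pairs indicate that the sequences are too divergent for this correction.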


Phylogeny

Tree-reconstruction Methods

- Distance Methods

UPGMA (Unweighted Pair Group Method with Arithmetic Mean)

     1    2    3    4
1    0
2    1    0
3    2    4    0
4    3    5    6    0


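Step 1: the closest pair is 1 and 2 (d12 = 1); join them into node 5. The distance from each remaining taxon to node 5 is the average of its distances to 1 and 2: d35 = (2 + 4)/2 = 3 and d45 = (3 + 5)/2 = 4.

     3    4    5
3    0
4    6    0
5    3    4    0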


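Step 2: the closest pair is now 3 and 5 (d35 = 3); join them into node 6, which contains taxa 1, 2 and 3. Each original taxon is weighted equally, so d46 = (d41 + d42 + d43)/3 = (3 + 5 + 6)/3 = 4.67.

     4     6
4    0
6    4.67  0

Step 3: join 4 and 6 (d46 = 4.67) into the root, node 7.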


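Resulting tree. Each internal node is placed at half the distance at which its two clusters were joined (node 5 at 0.5, node 6 at 1.5, node 7 at 2.33); Div is the length of the branch from a node to the node it goes to (its parent).

Node  Joins   Goes to node  Div
1     -       5             0.5
2     -       5             0.5
3     -       6             1.5
4     -       7             2.33
5     2, 1    6             1.0
6     5, 3    7             0.83
7     6, 4    -             -

[Figure: the UPGMA tree, with tips 1 and 2 joined at node 5 (branches of 0.5), tip 3 joined at node 6 (1.5) and tip 4 joined at the root, node 7 (2.33)]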
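The worked example can be reproduced with a short UPGMA sketch. Assumptions not taken from the slides: tips are integers, distances are stored in a dict keyed by frozenset pairs, and new nodes are numbered after the largest tip label, which matches the node numbers 5, 6 and 7 of the example. Distances to a merged cluster are averaged over its original taxa (each cluster weighted by its size), which is what yields 4.67 in the example.

```python
from itertools import combinations


def upgma(labels, dist):
    """UPGMA clustering.

    labels: integer tip labels; new internal nodes are numbered after the largest label.
    dist:   dict mapping frozenset({a, b}) -> distance between tips a and b.
    Returns (joins, branch): joins lists (new_node, child_a, child_b, height);
    branch maps each node to the length of the branch leading to its parent.
    """
    d = dict(dist)
    size = {x: 1 for x in labels}        # number of tips in each cluster
    height = {x: 0.0 for x in labels}    # tips sit at height 0
    active = list(labels)
    next_node = max(labels) + 1
    joins, branch = [], {}
    while len(active) > 1:
        # Join the closest pair of active clusters
        a, b = min(combinations(active, 2), key=lambda p: d[frozenset(p)])
        new, next_node = next_node, next_node + 1
        h = d[frozenset((a, b))] / 2     # the new node sits at half the joining distance
        height[new], size[new] = h, size[a] + size[b]
        branch[a], branch[b] = h - height[a], h - height[b]
        joins.append((new, a, b, h))
        active = [x for x in active if x not in (a, b)]
        for x in active:
            # Average over all original tips, i.e. weight each cluster by its size
            d[frozenset((x, new))] = (size[a] * d[frozenset((x, a))]
                                      + size[b] * d[frozenset((x, b))]) / size[new]
        active.append(new)
    return joins, branch


# Distance matrix of the worked example
D = {frozenset(pair): v for pair, v in {
    (1, 2): 1, (1, 3): 2, (2, 3): 4, (1, 4): 3, (2, 4): 5, (3, 4): 6}.items()}
joins, branch = upgma([1, 2, 3, 4], D)
for node, a, b, h in joins:
    print(f"node {node} = ({a}, {b}) at height {h:.2f}")
print({k: round(v, 2) for k, v in branch.items()})
```

The printed branch lengths correspond to the Div column of the table.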


Phylogeny

Tree-reconstruction Methods

- Distance Methods

NJ (Neighbour-Joining): based on the minimum-evolution criterion, i.e. it seeks the tree with the smallest sum of branch lengths.

Starting from a star tree, join the pair of nodes that gives the smallest total tree length, and repeat the process until the tree is fully resolved.

From Yang 2006

The distances are assumed to be additive (unlike UPGMA, NJ does not require them to be ultrametric).
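A minimal sketch of the neighbour-joining loop, using the commonly used Q-matrix formulation Q(i, j) = (n - 2) d(i, j) - r(i) - r(j), where r(i) is the sum of the distances from i (the slides give no formulas, so these are supplied here). The 5-taxon matrix is a hypothetical, perfectly additive example; for additive distances NJ recovers the generating tree, here with total length 17.

```python
def neighbor_joining(labels, dist):
    """Neighbour-joining on a dict-of-dicts distance matrix.

    Returns a list of edges (parent, child, branch_length); internal nodes are named u0, u1, ...
    """
    d = {i: dict(dist[i]) for i in labels}   # work on a copy
    nodes = list(labels)
    edges, counter = [], 0
    while len(nodes) > 3:
        n = len(nodes)
        r = {i: sum(d[i][k] for k in nodes if k != i) for i in nodes}
        # Join the pair minimising Q(i, j) = (n - 2) d(i, j) - r(i) - r(j)
        i, j = min(((a, b) for x, a in enumerate(nodes) for b in nodes[x + 1:]),
                   key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        u = f"u{counter}"
        counter += 1
        li = d[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))   # branch from u to i
        edges += [(u, i, li), (u, j, d[i][j] - li)]
        d[u] = {}
        for k in nodes:
            if k not in (i, j):
                d[u][k] = d[k][u] = (d[i][k] + d[j][k] - d[i][j]) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    # Three nodes left: connect them to one last internal node
    a, b, c = nodes
    u = f"u{counter}"
    edges += [(u, a, (d[a][b] + d[a][c] - d[b][c]) / 2),
              (u, b, (d[a][b] + d[b][c] - d[a][c]) / 2),
              (u, c, (d[a][c] + d[b][c] - d[a][b]) / 2)]
    return edges


names = ["a", "b", "c", "d", "e"]
M = [[0, 5, 9, 9, 8],
     [5, 0, 10, 10, 9],
     [9, 10, 0, 8, 7],
     [9, 10, 8, 0, 3],
     [8, 9, 7, 3, 0]]
dist = {x: {y: M[i][j] for j, y in enumerate(names)} for i, x in enumerate(names)}
edges = neighbor_joining(names, dist)
for edge in edges:
    print(edge)
print("total length:", sum(length for _, _, length in edges))
```

A fully resolved unrooted tree with n tips has 2n - 3 branches, so the result has 7 edges here.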


Phylogeny

Tree-reconstruction Methods

- Maximum Parsimony:
  - Criterion based on minimum evolution: the best tree is the one that requires the minimum number of changes.
  - For each possible tree, assign states to the internal nodes so as to minimise the number of changes, and score the tree by that number.
  - Heuristic search methods are necessary for large samples, because the number of possible trees grows extremely fast with the number of taxa.
  - Long-branch attraction (LBA) is especially problematic for MP: long branches tend to be wrongly grouped together, so MP can support an incorrect reconstruction.

[Figure: long-branch attraction example with four taxa (a, b, c, d) and nucleotide states A/G at the tips and internal nodes]
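Counting the minimum number of changes on a given tree can be sketched with Fitch's algorithm (not named on the slides; swapped in here as the standard way to score one site on a rooted binary tree), summed over all sites. Trees are nested tuples, an assumed representation, and the alignment is the five-sequence example used in the bootstrap slides.

```python
def fitch(tree, states):
    """Return (possible states at this node, minimum number of changes below it).

    tree: a tip name (str) or a pair (left, right) of subtrees.
    states: dict mapping each tip name to its nucleotide at one site.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    shared = left_set & right_set
    if shared:                      # children can agree: no extra change needed
        return shared, left_cost + right_cost
    # No common state: one change on one of the two branches
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Total number of changes over all columns of an alignment (dict tip -> sequence)."""
    length = len(next(iter(alignment.values())))
    return sum(fitch(tree, {tip: seq[i] for tip, seq in alignment.items()})[1]
               for i in range(length))


alignment = {"1": "ATCTTCT", "2": "GTCTTCT", "3": "ATGATCC", "4": "ATGAACC", "5": "AGGAACC"}
tree_a = (("1", "2"), (("3", "4"), "5"))
tree_b = (("1", "3"), (("2", "4"), "5"))
print(parsimony_score(tree_a, alignment), parsimony_score(tree_b, alignment))
```

tree_a needs 7 changes and tree_b needs 10, so MP prefers tree_a. A full MP search would repeat this scoring over many candidate trees.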


Phylogeny

Tree-reconstruction Methods

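- Maximum Likelihood:
  - Criterion: the tree with the maximum likelihood, i.e. the tree (with its branch lengths) that maximises the probability of the observed data.
  - The probability of the data is calculated for each tree under a given evolutionary model.
  - Obtaining the ML tree requires computationally expensive calculations.
  - Good statistical properties; a popular method that gives reasonable results.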
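The simplest case of the ML criterion is a two-sequence tree with a single branch. This sketch assumes the JC69 model (not specified on the slides), computes the log-likelihood of the branch length d and maximises it numerically. For JC69 the maximum also has a closed form, d = -3/4 ln(1 - 4p/3), which the numerical result should match. Real ML programs optimise all branch lengths and search over topologies, which is where the computational cost comes from.

```python
import math


def jc69_loglik(d, n_sites, n_diff):
    """Log-likelihood of two aligned sequences separated by a branch of length d under JC69."""
    e = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 * (0.25 + 0.75 * e)   # P(site shows base x in both sequences)
    p_diff = 0.25 * (0.25 - 0.25 * e)   # P(site shows x in one and a specific y != x in the other)
    return (n_sites - n_diff) * math.log(p_same) + n_diff * math.log(p_diff)


def ml_distance(n_sites, n_diff, lo=1e-6, hi=10.0, iters=200):
    """Maximise the likelihood by ternary search.

    Assumes 0 < n_diff / n_sites < 3/4, where the likelihood has a single peak in d.
    """
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if jc69_loglik(m1, n_sites, n_diff) < jc69_loglik(m2, n_sites, n_diff):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2


# Sequences 1 (ATCTTCT) and 3 (ATGATCC) of the bootstrap example differ at 3 of 7 sites
d_ml = ml_distance(7, 3)
d_formula = -0.75 * math.log(1 - 4 / 3 * 3 / 7)   # closed-form JC69 estimate
print(round(d_ml, 4), round(d_formula, 4))
```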


Phylogeny

Tree-reconstruction Methods

- Bayesian Inference

Seeks the distribution of compatible trees with the highest posterior probabilities, given the data, an evolutionary model and a prior distribution for each of the parameters included.

The main criticism concerns the choice of the prior distributions.

It is also a popular method and gives reasonable results.

Based on Bayes' theorem (the inverse probability theorem):

$$P(A \mid B) = \frac{P(A)\,P(B \mid A)}{P(B)} = \frac{P(A)\,P(B \mid A)}{P(A)\,P(B \mid A) + P(\bar{A})\,P(B \mid \bar{A})}$$
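For phylogenies the denominator P(B) is a sum over all trees and parameter values and cannot be computed directly, so Bayesian programs sample the posterior with Markov chain Monte Carlo (MCMC), where that denominator cancels. As a hedged sketch (not the algorithm of any particular program), here is a random-walk Metropolis sampler for a single parameter: the JC69 distance between two sequences, with an exponential prior of rate 1 (an arbitrary assumption) and hypothetical counts of 300 differences in 700 sites. Real programs also propose changes to the topology.

```python
import math
import random


def log_posterior(d, n_sites, n_diff, prior_rate=1.0):
    """log P(d | data) up to a constant: JC69 log-likelihood + log Exponential(prior_rate) prior."""
    if d <= 0:
        return float("-inf")
    e = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 * (0.25 + 0.75 * e)
    p_diff = 0.25 * (0.25 - 0.25 * e)
    if p_diff <= 0.0:               # guard against underflow for extremely small d
        return float("-inf")
    loglik = (n_sites - n_diff) * math.log(p_same) + n_diff * math.log(p_diff)
    return loglik + math.log(prior_rate) - prior_rate * d


def metropolis(n_sites, n_diff, n_iter=20000, step=0.1, seed=1):
    """Random-walk Metropolis sampler for the distance d."""
    rng = random.Random(seed)
    d = 0.1
    lp = log_posterior(d, n_sites, n_diff)
    samples = []
    for _ in range(n_iter):
        proposal = d + rng.gauss(0.0, step)
        lp_new = log_posterior(proposal, n_sites, n_diff)
        # Accept with probability min(1, posterior ratio); P(data) cancels in the ratio
        if rng.random() < math.exp(min(0.0, lp_new - lp)):
            d, lp = proposal, lp_new
        samples.append(d)
    return samples


samples = metropolis(n_sites=700, n_diff=300)[2000:]   # hypothetical counts; drop burn-in
mean = sum(samples) / len(samples)
print(round(mean, 3))
```

With this much (hypothetical) data the posterior mean lies close to the ML estimate; with few sites the prior has a much larger influence, which is the source of the criticism mentioned above.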


Phylogeny

Let’s do a simple tree reconstruction using R:


Phylogeny

Support of the phylogenetic Trees obtained

Different methods to assess the support of phylogenetic trees:
- Measures specific to the reconstruction method (e.g. Bremer support in MP)
- Non-parametric resampling methods (no model is assumed)
- Parametric methods (a model is assumed)


Phylogeny

Support of the phylogenetic Trees obtained

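Non-parametric resampling methods (no model is assumed)

Jackknife
Bootstrap

- Resample the data: the jackknife draws a subset of the alignment columns (without replacement); the bootstrap draws as many columns as the original alignment has, with replacement.
- Infer the tree again from each resampled dataset (pseudoreplicate).
- The support for each cluster (node) of the tree is the proportion of pseudoreplicates in which that same cluster is recovered.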


Phylogeny

Support of the phylogenetic Trees obtained

Non-parametric resampling methods (no model is assumed)

Jackknife
Bootstrap

Assumptions:
- The data set is large, so the resampled data give an accurate estimate of the sampling error.
- Each position (column of the alignment) evolves independently of the others.

Results:
The resulting values are not probabilities that a cluster is correct, but support values that measure the reliability of the inferred tree.


Phylogeny

Support of the phylogenetic Trees obtained

Non-parametric resampling methods (no model is assumed)

Bootstrap

[Figure: tree inferred from the original alignment of sequences 1 to 5]


Original alignment (sequences 1 to 5, columns 1 to 7):

Column  1234567
1       ATCTTCT
2       GTCTTCT
3       ATGATCC
4       ATGAACC
5       AGGAACC


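Resampling: draw 7 columns at random, with replacement, from the original alignment (here columns 1, 1, 3, 7, 7, 2 and 1) to build a pseudoreplicate of the same length:

Column  1137721
1       AACTTTA
2       GGCTTTG
3       AAGCCTA
4       AAGCCTA
5       AAGCCGA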


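Resampling → Do Tree: infer a tree from the pseudoreplicate and compare its clusters with those of the original tree. Each cluster of the original tree scores +1 if it is recovered in the replicate tree and +0 if it is not (here +1, +1 and +0).

… and repeat again n times! The bootstrap support of a cluster is its total count divided by n.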
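The whole bootstrap loop can be sketched end to end. Assumptions not from the slides: trees are built with UPGMA on p-distances (chosen only for brevity; any reconstruction method could be plugged in), clusters are compared as rooted groups, and 100 replicates are drawn with a fixed random seed. On the five-sequence alignment above, this UPGMA tree contains the clusters {1, 2}, {3, 4} and {3, 4, 5}.

```python
import random
from itertools import combinations


def p_distance(s1, s2):
    """Proportion of aligned sites at which two sequences differ."""
    return sum(a != b for a, b in zip(s1, s2)) / len(s1)


def upgma_clades(seqs):
    """Cluster sequences with UPGMA on p-distances; return the set of clusters (frozensets of names)."""
    leaf_d = {(a, b): p_distance(seqs[a], seqs[b]) for a in seqs for b in seqs}
    clusters = [frozenset([name]) for name in seqs]
    clades = set()
    while len(clusters) > 2:   # stop before the root, which contains every sequence
        # Average linkage: mean distance over all pairs of tips between the two clusters
        x, y = min(combinations(clusters, 2),
                   key=lambda p: sum(leaf_d[a, b] for a in p[0] for b in p[1])
                   / (len(p[0]) * len(p[1])))
        clusters = [c for c in clusters if c not in (x, y)] + [x | y]
        clades.add(x | y)
    return clades


def bootstrap_support(seqs, n_rep=100, seed=0):
    """Fraction of pseudoreplicates in which each cluster of the original tree is recovered."""
    rng = random.Random(seed)
    length = len(next(iter(seqs.values())))
    reference = upgma_clades(seqs)
    counts = dict.fromkeys(reference, 0)
    for _ in range(n_rep):
        cols = [rng.randrange(length) for _ in range(length)]   # columns drawn with replacement
        replicate = {name: "".join(s[i] for i in cols) for name, s in seqs.items()}
        found = upgma_clades(replicate)
        for clade in reference:
            counts[clade] += clade in found   # +1 if recovered, +0 otherwise
    return {clade: c / n_rep for clade, c in counts.items()}


alignment = {"1": "ATCTTCT", "2": "GTCTTCT", "3": "ATGATCC", "4": "ATGAACC", "5": "AGGAACC"}
support = bootstrap_support(alignment)
for clade, value in support.items():
    print(sorted(clade), value)
```

With only 7 columns the support values are noisy; real analyses use far longer alignments and typically 100 to 1000 replicates.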


Phylogeny

Let’s do a Bootstrap analysis using R:


Phylogeny

Support of the phylogenetic Trees obtained

Different methods to assess the support of phylogenetic trees:
- Measures specific to the reconstruction method (e.g. Bremer support in MP)
- Non-parametric resampling methods (no model is assumed)
- Parametric methods (a model is assumed)
  - Parametric bootstrapping
  - Bayesian inference

Parametric bootstrapping: replicate datasets are simulated under the fitted evolutionary model (typically on the inferred tree), and the phylogeny is re-inferred from each simulated dataset.

Bayesian inference: the posterior sample itself is a collection of compatible trees that reflects the uncertainty about the tree; the support for a cluster is the proportion of sampled trees that contain it (its posterior probability).


Phylogeny

Phylogenomics: An approach to obtain the Species Tree

When speciation events occur close together in time, a gene tree can show a topology that differs from the species tree:


[Figure: Incomplete Lineage Sorting; Anomalous Region]
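How often a gene tree disagrees with the species tree can be sketched under the multispecies coalescent (a model not named on the slides, assumed here). For species tree ((A,B),C) with one gene copy per species, the gene tree matches the species tree with probability 1 - (2/3)e^(-t), where t is the internal branch length in coalescent units (generations divided by 2N for a diploid population of size N); each of the two discordant topologies has probability (1/3)e^(-t). The sketch compares this formula with a small simulation.

```python
import math
import random


def p_concordant(t):
    """P(gene tree matches species tree ((A,B),C)) for internal branch length t in coalescent units."""
    return 1.0 - (2.0 / 3.0) * math.exp(-t)


def simulate_concordance(t, n_rep=20000, seed=0):
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_rep):
        # Lineages A and B coalesce at rate 1 while they share the internal branch
        if rng.expovariate(1.0) < t:
            hits += 1
        # Otherwise all three lineages reach the ancestral population, where
        # each of the three pairs is equally likely to coalesce first
        elif rng.randrange(3) == 0:
            hits += 1
    return hits / n_rep


for t in (0.1, 0.5, 2.0):
    print(t, round(p_concordant(t), 3), simulate_concordance(t))
```

When t is small (speciation events close in time), the probability of concordance approaches 1/3, i.e. most gene trees can be discordant.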


Phylogeny

Phylogenomics: An approach to obtain the Species Tree

- Using a large number of regions (or information from different sources) can help to resolve the incongruence.
- Heuristic methods based on a supermatrix (all regions concatenated into one) or on a supertree (a single tree built from the individual gene trees) are used.
- Likelihood-based methods are computationally expensive but statistically well founded.
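A minimal sketch of the supermatrix idea: concatenate the regions taxon by taxon, padding taxa that are missing from a region with "?" (a common convention for missing data, assumed here). The region alignments are hypothetical toy data; the resulting matrix would then be analysed as a single alignment.

```python
def supermatrix(alignments, missing="?"):
    """Concatenate several alignments (each a dict taxon -> sequence) into one supermatrix.

    Taxa absent from a region are filled with `missing` characters for that region.
    """
    taxa = sorted({taxon for aln in alignments for taxon in aln})
    rows = {taxon: [] for taxon in taxa}
    for aln in alignments:
        width = len(next(iter(aln.values())))
        for taxon in taxa:
            rows[taxon].append(aln.get(taxon, missing * width))
    return {taxon: "".join(parts) for taxon, parts in rows.items()}


region1 = {"A": "ACGT", "B": "ACGA", "C": "TCGA"}
region2 = {"A": "GGA", "C": "GGT"}          # hypothetical: taxon B not sequenced for this region
matrix = supermatrix([region1, region2])
for taxon, seq in matrix.items():
    print(taxon, seq)
```

A supertree approach would instead infer one tree per region and combine the trees.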


Phylogeny

Let’s try to obtain the species tree using the R package phybase:


Phylogeny

Uses of phylogenies for different objectives:
- Ancestral sequence reconstruction
- Dating ancestral events
- Detection of selection (synonymous vs non-synonymous positions)
- Correlation of the phylogenetic signal with phenotypic traits
