Measuring genetic change
Level 3 Molecular Evolution and Bioinformatics
Jim Provan
Page and Holmes: Section 5.2

Upload: may-jennings

Post on 18-Jan-2018



Page 1: Measuring genetic change

Level 3 Molecular Evolution and Bioinformatics
Jim Provan
Page and Holmes: Section 5.2

Page 2: Types of substitution

Each example starts from an ancestral base A and follows two descendant lineages:

Single: A → C in one lineage. 1 change, 1 difference
Multiple: A → C → T in one lineage. 2 changes, 1 difference
Coincidental: A → C in one lineage, A → G in the other. 2 changes, 1 difference
Parallel: A → C in both lineages. 2 changes, no difference
Convergent: A → C → T in one lineage, A → T in the other. 3 changes, no difference
Back: A → C → A in one lineage. 2 changes, no difference

[Figure: the six cases drawn on a pair of sequences; bases shown: A C A T G C and C C T T A A]
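The gap between actual changes and observed differences can be made concrete with a minimal sketch. It assumes each case is described by the bases a single site passes through in two lineages that share an ancestral A; the lineage assignments and the names `EXAMPLES` and `changes_and_differences` are illustrative, not taken from the slide.

```python
# Each entry lists the bases a site passes through in two lineages,
# starting from the shared ancestral base A (an assumed encoding).
EXAMPLES = {
    "single":       (["A", "C"],      ["A"]),
    "multiple":     (["A", "C", "T"], ["A"]),
    "coincidental": (["A", "C"],      ["A", "G"]),
    "parallel":     (["A", "C"],      ["A", "C"]),
    "convergent":   (["A", "C", "T"], ["A", "T"]),
    "back":         (["A", "C", "A"], ["A"]),
}


def changes_and_differences(lineage_1, lineage_2):
    """Return (actual substitutions, observed differences) for one site."""
    # Every step along a lineage is one substitution.
    changes = (len(lineage_1) - 1) + (len(lineage_2) - 1)
    # Only the final bases are visible when the two sequences are compared.
    differences = int(lineage_1[-1] != lineage_2[-1])
    return changes, differences


for name, (lineage_1, lineage_2) in EXAMPLES.items():
    print(name, changes_and_differences(lineage_1, lineage_2))
```

In every case except the single substitution, the number of observed differences is smaller than the number of changes that actually occurred.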

Page 3: Types of substitution (continued)

Multiple substitutions can greatly obscure actual evolutionary history, particularly in cases where there have been many mutations, i.e. over long evolutionary time scales
Final three examples have serious implications for inference of evolutionary history:
  Similarity inherited from an ancestor is called homology
  Independently acquired similarity is called homoplasy
All tree-building methods rely on sufficient levels of homology

Page 4: Types of substitution (continued)

Substitutions that exchange a purine for another purine or a pyrimidine for another pyrimidine are called transitions (A ↔ G, C ↔ T)
Substitutions that exchange a purine for a pyrimidine or vice versa are called transversions (A ↔ C, A ↔ T, G ↔ C, G ↔ T)

[Figure: the four bases A, G, C and T, with transitions and transversions marked between them]
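The transition/transversion distinction can be sketched as a small classifier. It assumes DNA bases only (A and G as purines, C and T as pyrimidines); the function name `classify_substitution` is illustrative.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(old, new):
    """Label a base change as a 'transition' or a 'transversion'."""
    old, new = old.upper(), new.upper()
    for base in (old, new):
        if base not in PURINES | PYRIMIDINES:
            raise ValueError(f"not a DNA base: {base!r}")
    if old == new:
        raise ValueError("no substitution: both bases are identical")
    # Same chemical class on both sides means a transition.
    same_class = (old in PURINES) == (new in PURINES)
    return "transition" if same_class else "transversion"


print(classify_substitution("A", "G"))  # purine to purine
print(classify_substitution("A", "T"))  # purine to pyrimidine
```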

Page 5: Measuring evolutionary change

Simplest measure is to count number of different sites
Poor measure:
  Some sites may undergo repeated substitutions
  As sequences diverge, measure becomes less accurate
Saturation occurs - most sites changing have changed before

[Figure: base pair differences (0 to 120) against time since divergence (0 to 25 Myr)]
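The simplest measure, counting the sites that differ, can be sketched as below. It assumes two aligned sequences of equal length with no gaps; the function names and the example sequences are hypothetical.

```python
def count_differences(seq_a, seq_b):
    """Number of aligned sites at which the two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(seq_a, seq_b) if x != y)


def p_distance(seq_a, seq_b):
    """Proportion of sites that differ (the uncorrected distance)."""
    if not seq_a:
        raise ValueError("sequences must not be empty")
    return count_differences(seq_a, seq_b) / len(seq_a)


# Hypothetical aligned sequences, for illustration only.
first = "ACGTACGTAC"
second = "ACGTTCGTAA"
print(count_differences(first, second), p_distance(first, second))
```

Because a site that has changed twice still counts at most once, this count underestimates the true number of substitutions as sequences diverge.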

Page 6: Correction of observed sequence differences

[Figure: sequence difference against time, with curves labelled 'Observed difference' and 'Expected difference'; the gap between them is labelled 'Correction']

Page 7: A general framework of sequence evolution models

\[
P_t = \begin{pmatrix}
p_{AA} & p_{AC} & p_{AG} & p_{AT} \\
p_{CA} & p_{CC} & p_{CG} & p_{CT} \\
p_{GA} & p_{GC} & p_{GG} & p_{GT} \\
p_{TA} & p_{TC} & p_{TG} & p_{TT}
\end{pmatrix}
\]

\[
p_{ii} = 1 - \sum_{j \neq i} p_{ij}
\]

f = [f_A f_C f_G f_T]

Page 8: The Jukes-Cantor (JC) model

Assumes that all four bases have equal frequencies and that all substitutions are equally likely (probability α)

\[
P_t = \begin{pmatrix}
- & \alpha & \alpha & \alpha \\
\alpha & - & \alpha & \alpha \\
\alpha & \alpha & - & \alpha \\
\alpha & \alpha & \alpha & -
\end{pmatrix}
\]

f = [¼ ¼ ¼ ¼]
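Under the Jukes-Cantor assumptions, an observed proportion of differing sites p can be corrected to an estimated number of substitutions per site with the standard formula d = -(3/4) ln(1 - 4p/3). The formula is not shown on the slide; the sketch below is one way to apply it, and the function name `jc_distance` is illustrative.

```python
import math


def jc_distance(p):
    """Jukes-Cantor corrected distance for an observed proportion p of differing sites."""
    if not 0.0 <= p < 0.75:
        # At p >= 0.75 the sequences look fully saturated and the
        # correction is undefined.
        raise ValueError("p must be in the range [0, 0.75)")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


for p in (0.05, 0.2, 0.5):
    print(p, round(jc_distance(p), 4))
```

The corrected distance is always at least as large as p, and the gap widens as p approaches the saturation value of 0.75.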

Page 9: Kimura's 2 parameter model (K2P)

Takes into account different frequencies of transitions (α) vs. transversions (β)

\[
P_t = \begin{pmatrix}
- & \beta & \alpha & \beta \\
\beta & - & \beta & \alpha \\
\alpha & \beta & - & \beta \\
\beta & \alpha & \beta & -
\end{pmatrix}
\]

(rows and columns in the order A, C, G, T)

f = [¼ ¼ ¼ ¼]

[Figure: curves for transitions (α) and transversions (β); y-axis 0 to 100, x-axis 0 to 25]
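A distance that treats transitions and transversions separately can be sketched with the standard Kimura two-parameter formula, d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q), where P and Q are the observed proportions of transitional and transversional differences. The formula is not given on the slide, and the function names and example sequences are hypothetical.

```python
import math

PURINES = {"A", "G"}


def count_ts_tv(seq_a, seq_b):
    """Count transitional and transversional differences between aligned DNA sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x == y:
            continue
        if (x in PURINES) == (y in PURINES):
            transitions += 1
        else:
            transversions += 1
    return transitions, transversions


def k2p_distance(p_ts, q_tv):
    """Kimura two-parameter distance from proportions of transitions and transversions."""
    return -0.5 * math.log(1.0 - 2.0 * p_ts - q_tv) - 0.25 * math.log(1.0 - 2.0 * q_tv)


# Hypothetical sequences: 2 transitions and 1 transversion over 20 sites.
first = "A" * 20
second = "GGC" + "A" * 17
ts, tv = count_ts_tv(first, second)
print(ts, tv, round(k2p_distance(ts / 20, tv / 20), 4))
```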

Page 10: Felsenstein (1981) (F81)

Takes into account differences in base composition
Percentage (G + C) can range from 25% - 75%
F81 model allows the frequencies of the four nucleotides to be different
Does not allow for variation between genes/species

f = [π_A π_C π_G π_T]

\[
P_t = \begin{pmatrix}
- & \pi_C \alpha & \pi_G \alpha & \pi_T \alpha \\
\pi_A \alpha & - & \pi_G \alpha & \pi_T \alpha \\
\pi_A \alpha & \pi_C \alpha & - & \pi_T \alpha \\
\pi_A \alpha & \pi_C \alpha & \pi_G \alpha & -
\end{pmatrix}
\]

Page 11: Hasegawa, Kishino and Yano (1985) (HKY85)

Essentially merges the K2P and F81 models to allow transitions (α) and transversions (β) to occur at different rates as well as allowing base frequencies to vary

f = [π_A π_C π_G π_T]

\[
P_t = \begin{pmatrix}
- & \pi_C \beta & \pi_G \alpha & \pi_T \beta \\
\pi_A \beta & - & \pi_G \beta & \pi_T \alpha \\
\pi_A \alpha & \pi_C \beta & - & \pi_T \beta \\
\pi_A \beta & \pi_C \alpha & \pi_G \beta & -
\end{pmatrix}
\]

Page 12: General reversible model (REV)

Most general model - each substitution has its own probability

f = [π_A π_C π_G π_T]

\[
P_t = \begin{pmatrix}
- & \pi_C a & \pi_G b & \pi_T c \\
\pi_A a & - & \pi_G d & \pi_T e \\
\pi_A b & \pi_C d & - & \pi_T f \\
\pi_A c & \pi_C e & \pi_G f & -
\end{pmatrix}
\]

By constraining a-f it is possible to generate all the other models
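The idea that constraining a-f yields the simpler models can be sketched as follows. The sketch assumes each off-diagonal entry is the frequency of the target base times one of six exchange parameters (a for A↔C, b for A↔G, c for A↔T, d for C↔G, e for C↔T, f for G↔T) and that the diagonal is one minus the rest of the row. The function names and the numeric values are illustrative assumptions, not the lecturer's notation.

```python
BASES = "ACGT"
EQUAL = {base: 0.25 for base in BASES}


def rev_matrix(freqs, a, b, c, d, e, f):
    """Build a REV-style matrix as nested dicts: matrix[from_base][to_base]."""
    exchange = {
        ("A", "C"): a, ("A", "G"): b, ("A", "T"): c,
        ("C", "G"): d, ("C", "T"): e, ("G", "T"): f,
    }
    matrix = {}
    for i in BASES:
        row = {}
        for j in BASES:
            if i != j:
                key = (i, j) if (i, j) in exchange else (j, i)
                row[j] = freqs[j] * exchange[key]
        # Diagonal: probability of no change, so each row sums to 1.
        row[i] = 1.0 - sum(row.values())
        matrix[i] = row
    return matrix


def hky85(freqs, alpha, beta):
    # Transitions (A<->G, C<->T) use alpha; transversions use beta.
    return rev_matrix(freqs, beta, alpha, beta, beta, alpha, beta)


def f81(freqs, alpha):
    return hky85(freqs, alpha, alpha)


def k2p(alpha, beta):
    return hky85(EQUAL, alpha, beta)


def jc(alpha):
    # With equal frequencies every off-diagonal entry is alpha / 4.
    return k2p(alpha, alpha)


print(jc(0.04)["A"])
print(k2p(0.08, 0.04)["A"])
```

Each simpler model is a call to the more general one with some parameters tied together, which mirrors the nesting of the models.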

Page 13: Comparing the models

JC: π_A = π_C = π_G = π_T; α = β
K2P: π_A = π_C = π_G = π_T; α ≠ β
F81: π_A ≠ π_C ≠ π_G ≠ π_T; α = β
HKY85: π_A ≠ π_C ≠ π_G ≠ π_T; α ≠ β
REV: π_A ≠ π_C ≠ π_G ≠ π_T; a, b, c, d, e, f

Allow transition/transversion bias: JC → K2P, F81 → HKY85
Allow base frequencies to vary: JC → F81, K2P → HKY85

Page 14: Comparing the models (continued)

[Figure: four panels, each a 4 × 4 grid with rows and columns A, C, G, T, titled Observed, JC, K2P and HKY85]

Page 15: Assumptions: independence

Assumes that change at one site has no effect on other sites
Good example is in RNA stem-loop structures
Substitution may result in mismatched bases and decreased stem stability
Compensatory change may occur to restore Watson-Crick base pairing

[Figure: an RNA stem-loop drawn three times, with the strands as extracted:
  original: A C C C C U U G C A / U G G G G G A A
  after substitution: A C C C C U U G C A / U G G G G C A A
  after compensatory change: A C C C G U U G C A / U G G G G C A A]
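The dependence between paired stem sites can be illustrated with a short sketch that counts non-Watson-Crick pairs in a stem. It assumes both strands are written 5' to 3', so the second strand is reversed before pairing, and it treats G-U wobble pairs as mismatches for simplicity. The stem sequences are short hypothetical examples, not the ones drawn on the slide.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def stem_mismatches(strand_5p, strand_3p):
    """Count non-Watson-Crick pairs between two stem strands written 5' to 3'."""
    if len(strand_5p) != len(strand_3p):
        raise ValueError("stem strands must have the same length")
    # The strands run antiparallel, so pair the first base of one
    # with the last base of the other.
    pairs = zip(strand_5p, reversed(strand_3p))
    return sum(1 for pair in pairs if pair not in WATSON_CRICK)


# Hypothetical stems, for illustration only.
print(stem_mismatches("ACCCC", "GGGGU"))  # fully paired stem
print(stem_mismatches("ACCCC", "CGGGU"))  # one substitution breaks a pair
print(stem_mismatches("ACCCG", "CGGGU"))  # compensatory change restores it
```

A substitution at one stem site changes which base is favoured at its partner site, which is exactly what the independence assumption rules out.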

Page 16: Assumptions: base composition

Assumption that base composition is at equilibrium and that it is similar across all taxa studied
In example opposite, trees inferred using models which do not allow for this will not group Thermus and Deinococcus

Taxon          % G + C
Aquifex        64.0
Thermotoga     63.7
Thermus        63.2
Deinococcus    55.5
Others         53.9
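Base composition of the kind compared here can be computed with a few lines. This is a minimal sketch that assumes a plain nucleotide string and ignores any character other than A, C, G, T or U; the function name `gc_percent` and the example sequences are hypothetical.

```python
def gc_percent(sequence):
    """Percentage of G and C among the recognised bases in a sequence."""
    counted = [base for base in sequence.upper() if base in "ACGTU"]
    if not counted:
        raise ValueError("sequence contains no recognised bases")
    gc = sum(1 for base in counted if base in "GC")
    return 100.0 * gc / len(counted)


# Hypothetical sequences, for illustration only.
print(gc_percent("GGCCAATT"))
print(gc_percent("GGGCGCAT"))
```

Comparing this value across taxa before choosing a model is one simple check on whether the equal-composition assumption is plausible.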

Page 17: Assumptions: variation in substitution rate across sites

Not all sites are equally likely to undergo a substitution
Functional constraints:
  Pseudogenes have lost all function and can evolve freely
  Fourfold degenerate sites do not change amino acid composition of proteins
  Non-degenerate sites are highly constrained

[Figure: bar chart of substitutions / site / 10^9 years (axis 0 to 4) for 5' flanking region, 5' untranslated region, non-degenerate sites, twofold degenerate sites, fourfold degenerate sites, introns, 3' untranslated region, 3' flanking region and pseudogenes]

Page 18: Assumptions: variation in substitution rate across sites (continued)

More rapidly evolving sequence shows most divergence initially but soon saturates
Sequence A actually appears to be more rapidly evolving

[Figure: DNA divergence (0 to 0.7) against divergence time (0 to 250 Myr) for two sequences:
  A: 0.5% / Myr + 20% constraint
  B: 2% / Myr + 50% constraint]
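The crossover between the two sequences can be reproduced qualitatively with a rough sketch. It rests on assumptions that are not stated on the slide: constrained sites never change, unconstrained sites saturate at 3/4 as in the Jukes-Cantor model, and the quoted rate is the expected number of substitutions per unconstrained site per Myr separating the two copies. The function and variable names are hypothetical.

```python
import math


def observed_divergence(time_myr, rate_per_myr, constrained_fraction):
    """Expected proportion of differing sites after time_myr (assumed model)."""
    free_fraction = 1.0 - constrained_fraction
    # Jukes-Cantor style saturation applied to the unconstrained sites only.
    saturation = 0.75 * (1.0 - math.exp(-4.0 * rate_per_myr * time_myr / 3.0))
    return free_fraction * saturation


SEQ_A = {"rate_per_myr": 0.005, "constrained_fraction": 0.20}
SEQ_B = {"rate_per_myr": 0.02, "constrained_fraction": 0.50}

for t in (10, 50, 100, 150, 250):
    a = observed_divergence(t, **SEQ_A)
    b = observed_divergence(t, **SEQ_B)
    print(t, round(a, 3), round(b, 3))
```

On these assumptions B is the more divergent sequence at short times, but it levels off at a lower plateau because half of its sites cannot change, so A overtakes it at longer divergence times.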