Page 1: Genome Evolution. Amos Tanay 2012 Genome evolution Lecture 9: Mutations and variational inference

Genome Evolution. Amos Tanay 2012

Genome evolution

Lecture 9: Mutations and variational inference

Page 2: Genome Evolution. Amos Tanay 2012 Genome evolution Lecture 9: Mutations and variational inference

Sources of mutations

• Mistakes
  – Replication errors (point mutations, tandem duplications/deletions)
  – Recombination errors (mainly indels)

• Endogenous DNA damage
  – Spontaneous base damage: deaminations, depurinations
  – Byproducts of metabolism: oxygen radicals that damage DNA

• Exogenous DNA damage
  – UV
  – Chemicals

All of these mechanisms cross talk with the surrounding sequence

Page 3: Genome Evolution. Amos Tanay 2012 Genome evolution Lecture 9: Mutations and variational inference

DNA polymerases

• replicating DNA

• A good polymerase domain has a misincorporation rate of 10^-5 (1 in 100,000)

• Misincorporated bases are clipped off with 99% efficiency by the “proofreading” activity of the polymerase

• Further mismatch repair, which works in 99.9% of cases, brings the fidelity of the main polymerases to 10^-10

• Some dedicated polymerases are not as accurate!
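
The fidelity figure follows from multiplying the three error-escape probabilities quoted on this slide. A minimal sketch of that arithmetic; the variable names are illustrative only, and the three stages are assumed to act independently:

```python
# Error rates quoted on the slide; the stages are assumed independent.
misincorporation_rate = 1e-5        # wrong base inserted by the polymerase domain
proofreading_escape = 1 - 0.99      # fraction of errors that survive proofreading
mismatch_repair_escape = 1 - 0.999  # fraction that survive mismatch repair

# An error persists only if it escapes both correction steps.
overall_error_rate = (
    misincorporation_rate * proofreading_escape * mismatch_repair_escape
)

print(f"overall error rate per base: {overall_error_rate:.1e}")
```

Under the same assumption, dropping the mismatch repair factor gives the 10^-7 error rate of a proofreading polymerase without downstream repair.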

Page 4: Genome Evolution. Amos Tanay 2012 Genome evolution Lecture 9: Mutations and variational inference

Recombination errors

• A consequence of partial homology between different chromosomal loci

• Can introduce translocations if the matching sequences are on different chromosomes

• Can introduce inversion or deletion if the matching sequences are on the same chromosome

• Can generate duplication or deletions if the matching sequences are in tandem

Replication slippage

• While processing a strand, the polymerase disconnects and reconnects at the wrong place

CACACACACACACACACA CGACAGCGACAGTTACAAA
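
The two sequences above contrast a tandem repeat, where slippage is easy, with a non-repetitive sequence. A toy sketch of the outcome of a slippage event, assuming that slippage only occurs inside a run of at least two tandem copies and changes the copy number by whole units; `count_tandem_units` and `slip` are hypothetical helper names, not part of the lecture:

```python
def count_tandem_units(seq: str, unit: str) -> int:
    """Count consecutive copies of `unit` at the start of `seq`."""
    n = 0
    while seq.startswith(unit, n * len(unit)):
        n += 1
    return n


def slip(seq: str, unit: str, delta: int) -> str:
    """Return `seq` after gaining (delta > 0) or losing (delta < 0) repeat units.

    Assumption: with fewer than two tandem copies there is nothing to
    misalign against, so the sequence is returned unchanged.
    """
    n = count_tandem_units(seq, unit)
    if n < 2:
        return seq
    new_n = max(n + delta, 0)
    return unit * new_n + seq[n * len(unit):]


repeat = "CACACACACACACACACA"
unique = "CGACAGCGACAGTTACAAA"

expanded = slip(repeat, "CA", +1)    # one extra CA unit
contracted = slip(repeat, "CA", -1)  # one CA unit lost
unchanged = slip(unique, "CA", +1)   # no tandem run, so no slippage
```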

Page 5: Genome Evolution. Amos Tanay 2012 Genome evolution Lecture 9: Mutations and variational inference

Endogenous DNA damage: Deamination of Cytosines

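[Figure: chemical structures of cytosine and uracil. Deamination replaces the amino group (NH2) of cytosine with a carbonyl oxygen, giving uracil. *Thymine has CH3 at the marked position.]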

Page 6: Genome Evolution. Amos Tanay 2012 Genome evolution Lecture 9: Mutations and variational inference

Deamination of cytosine creates a G-U mismatch. Easy to tell that U is wrong.

Deamination of methylated cytosine (5-methylcytosine) creates a G-T mismatch. Not easy to tell which base is the mutation.

About 50% of the time the G is “corrected” to A, resulting in a mutation.

Page 7: Genome Evolution. Amos Tanay 2012 Genome evolution Lecture 9: Mutations and variational inference

Exogenous DNA damage

Chemicals
• Food
• Benzopyrene – smoke

UV radiation (sunlight)

Ionizing radiation
• Radon
• Cosmic rays
• X-rays

UV irradiation generates primarily thymine dimers

Page 8: Genome Evolution. Amos Tanay 2012 Genome evolution Lecture 9: Mutations and variational inference

Direct repair

Repairing DNA damage

Page 9: Genome Evolution. Amos Tanay 2012 Genome evolution Lecture 9: Mutations and variational inference

Thymine Dimers can be corrected by a direct repair mechanism

Photon

Page 10: Genome Evolution. Amos Tanay 2012 Genome evolution Lecture 9: Mutations and variational inference

Deaminated bases are repaired by a base excision mechanism.

BER

Page 11: Genome Evolution. Amos Tanay 2012 Genome evolution Lecture 9: Mutations and variational inference

Spontaneously occurring abasic sites are repaired by the same mechanism

BER

Page 12: Genome Evolution. Amos Tanay 2012 Genome evolution Lecture 9: Mutations and variational inference

Dimeric bases and bulky lesions, e.g., large chemical adducts, are repaired by nucleotide excision repair

NER

Page 13: Genome Evolution. Amos Tanay 2012 Genome evolution Lecture 9: Mutations and variational inference

Evolutionary consequences of the rich mutational process

Cannot ignore dependencies among adjacent sites

Mechanisms are evolutionarily variable

Lifestyle -> Environmental exposure

Germline and male/female ratio

Mechanisms are variable on the genomic scale – late vs. early replication

Page 14: Genome Evolution. Amos Tanay 2012 Genome evolution Lecture 9: Mutations and variational inference

Dynamic Bayesian Networks

[Figure: a network over four variables (1–4), unrolled into time slices T=1 … T=5; conditional probabilities link each time slice to the next]

Synchronous discrete time process

Page 15: Genome Evolution. Amos Tanay 2012 Genome evolution Lecture 9: Mutations and variational inference

Context dependent Markov Processes

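[Figure: one site shown in three flanking contexts, each with its own rate matrix: $Q^{AA}$, $Q^{CA}$, $Q^{GA}$]

The context determines a Markov process rate matrix

Any dependency structure makes sense, including loops

[Figure: a site whose flanking context is itself changing; which rate matrix $Q$ applies?]

When the context is changing, computing probabilities is difficult. Think of the hidden variables as the trajectories.

Continuous time Bayesian Networks (Koller-Nodelman 2002)

[Figure: network over variables 1–4; variable $i$ evolves with a rate matrix $Q_i^{\mathrm{pa}(i)}$ selected by the current state of its parents]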
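
The idea that a site's rate matrix is chosen by its neighbors can be made concrete with a toy example. The sketch below is an illustration only: the rate values (`BASE_RATE`, `CPG_BOOST`) and the CpG-style rule are assumptions introduced here, not parameters from the lecture.

```python
BASES = "ACGT"
BASE_RATE = 1.0   # hypothetical background substitution rate
CPG_BOOST = 10.0  # hypothetical multiplier for C->T in a CpG context


def rate_matrix(left: str, right: str) -> dict:
    """Rate matrix Q for a site, selected by its (left, right) neighbors."""
    q = {a: {b: (BASE_RATE if a != b else 0.0) for b in BASES} for a in BASES}
    if right == "G":
        q["C"]["T"] *= CPG_BOOST  # C followed by G: elevated C->T
    if left == "C":
        q["G"]["A"] *= CPG_BOOST  # the same event seen on the opposite strand
    for a in BASES:
        # Diagonal entries make every row sum to zero, as in any rate matrix.
        q[a][a] = -sum(q[a][b] for b in BASES if b != a)
    return q


def exit_rates(seq: str) -> list:
    """Rate of leaving the current base at each interior site, given the
    neighbors as they are right now."""
    return [
        -rate_matrix(seq[i - 1], seq[i + 1])[seq[i]][seq[i]]
        for i in range(1, len(seq) - 1)
    ]


before = exit_rates("ACGT")  # C and G sit in a CpG context
after = exit_rates("ATGT")   # once the C has mutated to T, G's rate changes too
```

Comparing `before` and `after` shows why the sites cannot be treated independently: a substitution at one position changes the process governing its neighbor.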

Page 16: Genome Evolution. Amos Tanay 2012 Genome evolution Lecture 9: Mutations and variational inference

Modeling simple context in the tree: PhyloHMM

Siepel-Haussler 2003

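[Figure: grid of hidden states over tree nodes (node $i$, its parent $pa_i$, and another node $k$) and alignment positions $j-1$, $j$, $j+1$: $h^{pa_i}_{j-1}, h^{pa_i}_{j}, h^{pa_i}_{j+1}$; $h^{i}_{j-1}, h^{i}_{j}, h^{i}_{j+1}$; $h^{k}_{j-1}, h^{k}_{j}, h^{k}_{j+1}$]

Heuristically approximating the Markov process?

Where exactly does it fail?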

Page 17: Genome Evolution. Amos Tanay 2012 Genome evolution Lecture 9: Mutations and variational inference

Log-likelihood to Free Energy

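• We have so far worked on computing the likelihood:

$\log P(s \mid \theta) = \log \sum_h P(h, s \mid \theta)$

• Computing the likelihood is hard. We can reformulate the problem by adding parameters and transforming it into an optimization problem. Given a trial function q, define the free energy of the model as:

$F(q) = \sum_h q(h) \log \frac{q(h)}{\Pr(h, s \mid \theta)}$

• The free energy is exactly the (negative log-) likelihood when q is the posterior:

$q(h) = \Pr(h \mid s, \theta) \;\Rightarrow\; F(q) = \sum_h \Pr(h \mid s, \theta) \log \frac{\Pr(h \mid s, \theta)}{\Pr(h, s \mid \theta)} = -\sum_h \Pr(h \mid s, \theta) \log \Pr(s \mid \theta) = -\log \Pr(s \mid \theta)$

• Better: when q is a distribution, the free energy bounds the likelihood:

$F(q) = \underbrace{\sum_h q(h) \log \frac{q(h)}{\Pr(h \mid s, \theta)}}_{D(q \,\|\, p(h \mid s))} \;-\; \underbrace{\sum_h q(h) \log \Pr(s \mid \theta)}_{\text{log-likelihood}}$

Since $D(q \,\|\, p(h \mid s)) \ge 0$, we get $F(q) \ge -\log \Pr(s \mid \theta)$, with equality only at the posterior.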
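
The decomposition of the free energy into a KL divergence and a log-likelihood term can be checked numerically on a tiny example. The joint table below is a made-up `p(h, s)` for one fixed observation `s` with three hidden states; it is an assumption for illustration and does not come from the lecture.

```python
import math

# Hypothetical joint p(h, s) for the observed s, over three hidden states.
joint = {"a": 0.10, "b": 0.25, "c": 0.05}


def free_energy(q: dict, joint: dict) -> float:
    """F(q) = sum_h q(h) log(q(h) / p(h, s))."""
    return sum(q[h] * math.log(q[h] / joint[h]) for h in q if q[h] > 0)


def kl_divergence(q: dict, p: dict) -> float:
    """D(q || p) = sum_h q(h) log(q(h) / p(h))."""
    return sum(q[h] * math.log(q[h] / p[h]) for h in q if q[h] > 0)


likelihood = sum(joint.values())                         # p(s)
posterior = {h: v / likelihood for h, v in joint.items()}  # p(h | s)
q_trial = {"a": 0.5, "b": 0.3, "c": 0.2}                 # arbitrary trial q

neg_log_lik = -math.log(likelihood)
f_posterior = free_energy(posterior, joint)  # equals -log p(s)
f_trial = free_energy(q_trial, joint)        # larger by exactly D(q || posterior)
gap = kl_divergence(q_trial, posterior)
```

Here `f_trial - neg_log_lik` equals `gap`, so any trial distribution gives a bound on the log-likelihood, and the bound is tight only at the posterior.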

Page 18: Genome Evolution. Amos Tanay 2012 Genome evolution Lecture 9: Mutations and variational inference

Energy?? What energy?

• In statistical mechanics, a system at temperature T with states x and an energy function E(x) is characterized by Boltzmann’s law:

$p(x) = \frac{1}{Z(T)} e^{-E(x)/T}$

• Z is the partition function:

$Z(T) = \int e^{-E(x)/T} \, dx$

• Given a model $p(h, s \mid \theta)$ (a BN), we can define the energy using Boltzmann’s law:

$T = 1, \quad E(h) = -\log p(h, s \mid \theta)$

• If we think of $P(h \mid s, \theta)$:

$E(h) = -\log p(h, s \mid \theta), \quad Z = p(s \mid \theta)$

Page 19: Genome Evolution. Amos Tanay 2012 Genome evolution Lecture 9: Mutations and variational inference

Free Energy and Variational Free Energy

$p(x) = \frac{1}{Z(T)} e^{-E(x)/T}$

• The Helmholtz free energy is defined in physics as:

$F_H = -\log Z$

• This free energy is important in statistical mechanics, but it is difficult to compute, as is our probabilistic Z (= p(s))

• The variational transformation introduces trial functions q(h), and sets the variational free energy (or Gibbs free energy) to:

$F(q) = U(q) - H(q)$

• The average energy is:

$U(q) = \sum_h q(h) E(h)$

• The variational entropy is:

$H(q) = -\sum_h q(h) \log q(h)$

• And as before:

$F(q) = F_H + D(q \,\|\, p)$
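
The physical vocabulary maps directly onto the probabilistic one, which a small numeric check makes visible. In this sketch the joint table is a made-up `p(h, s)` for a fixed `s`, and the function names are illustrative; energies are defined as `E(h) = -log p(h, s)` with `T = 1`, as on the previous slide.

```python
import math

# Hypothetical joint p(h, s) for the observed s.
joint = {"a": 0.10, "b": 0.25, "c": 0.05}
energy = {h: -math.log(p) for h, p in joint.items()}  # E(h) = -log p(h, s)


def boltzmann(energy: dict, temperature: float = 1.0):
    """Return the Boltzmann distribution and partition function Z(T)."""
    weights = {x: math.exp(-e / temperature) for x, e in energy.items()}
    z = sum(weights.values())
    return {x: w / z for x, w in weights.items()}, z


def average_energy(q: dict, energy: dict) -> float:
    """U(q) = sum_h q(h) E(h)."""
    return sum(q[x] * energy[x] for x in q)


def entropy(q: dict) -> float:
    """H(q) = -sum_h q(h) log q(h)."""
    return -sum(v * math.log(v) for v in q.values() if v > 0)


p, z = boltzmann(energy)       # p is the posterior p(h | s); z is p(s)
helmholtz = -math.log(z)       # F_H = -log Z
f_at_p = average_energy(p, energy) - entropy(p)  # F(q) at q = p

uniform = {h: 1.0 / len(joint) for h in joint}
f_uniform = average_energy(uniform, energy) - entropy(uniform)
```

At `q = p` the variational free energy `U - H` coincides with `F_H`; for the uniform trial distribution it is strictly larger, the difference being the KL divergence.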

Page 20: Genome Evolution. Amos Tanay 2012 Genome evolution Lecture 9: Mutations and variational inference

Solving the variational optimization problem

• So instead of computing p(s), we can search for the q that optimizes the free energy:

$F(q) = U(q) - H(q)$

$U(q) = -\sum_h q(h) \log p(h, s)$

$H(q) = -\sum_h q(h) \log q(h)$

• This is still as hard as before, but we can simplify the problem by restricting q (this is where the additional degrees of freedom become important)

Minimizing U (maximizing the expected log-probability)? Focus on max configurations

Maximizing H? Spread out the distribution

Page 21: Genome Evolution. Amos Tanay 2012 Genome evolution Lecture 9: Mutations and variational inference

Simplest variational approximation: Mean Field

• Let’s assume complete independence among the r.v.’s posteriors:

$q(h) = \prod_i q_i(h_i)$

$F(q) = U(q) - H(q), \quad U(q) = -\sum_h q(h) \log p(h, s), \quad H(q) = -\sum_h q(h) \log q(h)$

• Under this assumption we can try optimizing the $q_i$ (looking for minimal energy!):

$F_{MF} = \min_{q_1, \dots, q_n} F\Big(\prod_i q_i\Big) = \min_{q_1, \dots, q_n} \Big[ -\sum_h \prod_i q_i(h_i) \log p(h, s) + \sum_i \sum_{h_i} q_i(h_i) \log q_i(h_i) \Big]$

Page 22: Genome Evolution. Amos Tanay 2012 Genome evolution Lecture 9: Mutations and variational inference

Mean Field Inference

• We optimize iteratively:
  – Select i (sequentially, or using any method)
  – Optimize $q_i$ to minimize $F_{MF}(q_1, \dots, q_i, \dots, q_n)$ while fixing all other q’s
  – Terminate when $F_{MF}$ cannot be improved further

$q(h) = \prod_i q_i(h_i)$

$F_{MF} = \min_{q_1, \dots, q_n} \Big[ -\sum_h \prod_i q_i(h_i) \log p(h, s) + \sum_i \sum_{h_i} q_i(h_i) \log q_i(h_i) \Big]$

• Remember: $F_{MF}$ always bounds the likelihood

• $q_i$ optimization can usually be done efficiently
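
The iterative scheme can be sketched on the smallest non-trivial case: two binary hidden variables. The joint table is a made-up `p(h1, h2, s)` for a fixed `s`. The update used here is the standard closed-form coordinate step `q_i(h_i) ∝ exp(E_{q_j}[log p(h, s)])`; it is the usual mean-field update and is assumed rather than stated in the lecture.

```python
import math

# Hypothetical joint p(h1, h2, s) for the observed s; keys are (h1, h2).
joint = {(0, 0): 0.30, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.20}


def free_energy(q1, q2):
    """F_MF = sum_h q1(h1) q2(h2) [log q1(h1) q2(h2) - log p(h, s)]."""
    total = 0.0
    for (a, b), p in joint.items():
        w = q1[a] * q2[b]
        if w > 0:
            total += w * (math.log(w) - math.log(p))
    return total


def update(q_other, axis):
    """Optimal factor for one variable while the other factor is fixed."""
    logits = []
    for x in (0, 1):
        expected = 0.0
        for y in (0, 1):
            key = (x, y) if axis == 0 else (y, x)
            expected += q_other[y] * math.log(joint[key])
        logits.append(expected)
    top = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(v - top) for v in logits]
    norm = sum(weights)
    return [w / norm for w in weights]


def mean_field(q1, q2, tol=1e-12, max_iter=1000):
    """Alternate the two updates until F_MF stops improving."""
    history = [free_energy(q1, q2)]
    for _ in range(max_iter):
        q1 = update(q2, axis=0)
        q2 = update(q1, axis=1)
        history.append(free_energy(q1, q2))
        if history[-2] - history[-1] < tol:
            break
    return q1, q2, history


q1, q2, history = mean_field([0.6, 0.4], [0.6, 0.4])
neg_log_lik = -math.log(sum(joint.values()))
```

Each step minimizes `F_MF` exactly over one factor, so `history` never increases, and its final value stays above `neg_log_lik` because a factorized q cannot in general match the correlated posterior.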

Page 23: Genome Evolution. Amos Tanay 2012 Genome evolution Lecture 9: Mutations and variational inference

Adaptive mutations: Cairns et al. 88

Experimental system: lacZ frameshift

Luria-Delbrück’s observation

The experiment suggests adaptive mutations

Page 24: Genome Evolution. Amos Tanay 2012 Genome evolution Lecture 9: Mutations and variational inference

The “Mutator” paradigm:

The ability to switch to the mutator phenotype depends on particular DNA repair mechanisms (double-strand break repair in E. coli)

Mutator phenotype is suggested to be important in pathogenesis, antibiotic resistance, and in cancer

Species occasionally change (adaptively or even by drift) their repair policy/efficiency

The resulting substitution landscape must be very complex