
Expectation–maximisation algorithm

Christiana Kartsonaki

[email protected]

CGAT

March 2, 2017


Introduction

Fitting a parametric model

Likelihood: function of the (unknown) parameters and the data

Maximum Likelihood Estimates (MLE) → parameter estimates which make the observed data most likely

General approach, as long as a tractable likelihood function exists

Calculate the derivative of the log likelihood, set it to zero and solve for the parameter; check that it is a maximum
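For example (a standard illustration, not from these slides), for independent counts x1, …, xn from a Poisson distribution with mean λ this recipe gives the sample mean:

ℓ(λ) = Σ xi log λ − nλ + constant

dℓ/dλ = Σ xi / λ − n = 0  ⇒  λ̂ = x̄

d²ℓ/dλ² = −Σ xi / λ² < 0 (when Σ xi > 0), so λ̂ is a maximum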


Likelihood

Data x

If x is known we take ℓ(θ; x) = log f(x; θ) and maximise it w.r.t. θ

In one or two dimensions the likelihood function can be tabulated.

May be very difficult in many dimensions and complex models.

x not fully observed → not possible

Expectation–Maximisation (EM) algorithm → iterative procedure for maximising the likelihood


Expectation–Maximisation (EM) algorithm

EM algorithm → maximise a conditional likelihood for the unobserved data

Generalisation of maximum likelihood estimation to 'incomplete' data

Any problem that is simple to solve for complete data, but the available data are 'incomplete' in some way

The likelihood for the incomplete data may have multiple local maxima and no closed-form solution, even if the likelihood for the complete version has a global maximum and a closed-form solution


Expectation–Maximisation (EM) algorithm

Starting parameter values

Use them to ’estimate’ complete data

Use estimates of complete data to update parameters

Repeat until convergence (successive estimates close to each other).


EM algorithm

Incomplete data. Starting parameter values.

Supplement by synthetic data (E step).

Analyse (M step).

Use estimate to improve the synthetic data. Analyse.

Repeat until stable answer.


Incomplete data

Latent variables ('latent', from latens, present participle of Latin lateo, 'to lie hidden')

variables that are not directly observed but are rather inferred from other variables that are observed


Expectation–Maximisation (EM) algorithm

The fitting of certain models is simplified by treating the observed data as an incomplete version of an ideal dataset whose analysis would have been easy.

Key idea → estimate the log likelihood contribution from the missing data by its conditional value given the observed data


Expectation–Maximisation (EM) algorithm

Many versions of ’the’ EM algorithm.

Rather than picking a single most likely completion of the missing assignments on each iteration, the EM algorithm computes probabilities for each possible completion of the missing data, using the current parameters θ̂(t)

→ create a weighted set of data consisting of all possible completions of the data

A modified version of maximum likelihood estimation that deals with weighted data provides new parameter estimates θ̂(t+1)

'Estimates' completions of missing data given the current model (E step) and then estimates the model parameters using these completions (M step)


Expectation–Maximisation (EM) algorithm

complete data yC = (yO, yU)

parameter θ

Expectation–maximisation algorithm

initialise parameter: θ ← θ(0)

for t = 1 to ...

    E step:  Q(θ | θ(t)) = E{log f(yC | θ) | yO, θ(t)}

    M step:  θ(t+1) = argmaxθ Q(θ | θ(t))

Q(θ | θ(t)): conditional expectation of the complete-data log likelihood given the observed data

f(yC | θ): complete-data likelihood
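To make the two steps concrete, here is a minimal numerical sketch in Python for a classic 'two coins' instance (not from the slides; all numbers are hypothetical): each session of 10 tosses comes from one of two coins with unknown head probabilities, and the coin identity plays the role of yU.

    import numpy as np

    heads = np.array([5, 9, 8, 4, 7])      # head counts per session (hypothetical)
    tosses = 10                            # tosses per session
    theta = np.array([0.6, 0.5])           # theta(0): starting values for coins A, B

    for t in range(200):
        # E step: posterior probability that each session used coin A,
        # under theta(t) and an equal prior on the two coins
        like_A = theta[0]**heads * (1 - theta[0])**(tosses - heads)
        like_B = theta[1]**heads * (1 - theta[1])**(tosses - heads)
        w_A = like_A / (like_A + like_B)
        w_B = 1 - w_A
        # M step: weighted maximum likelihood, i.e. the maximiser of Q(theta | theta(t))
        new = np.array([(w_A * heads).sum() / (w_A * tosses).sum(),
                        (w_B * heads).sum() / (w_B * tosses).sum()])
        if np.abs(new - theta).max() < 1e-10:   # convergence check
            theta = new
            break
        theta = new

    print(theta)                           # estimated head probabilities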


Expectation–Maximisation (EM) algorithm

Expectation step: given the current estimate θ(t) of θ, calculate Q(θ | θ(t)) as a function of θ

Maximisation step: determine a new estimate θ(t+1) as the value of θ which maximises Q(θ | θ(t))


Some considerations

Ascent property: the M step increases Q ⇒ increases L(θ) = f(yO; θ)

The log likelihood ℓ(θ; yO) is never decreased at any iteration of the algorithm.

Convergence to local maxima → choose multiple initial values

Convergence rate → depends on the amount of missing information

E and/or M steps may be too complicated → can be broken into smaller components, sometimes involving Monte Carlo simulation to compute the conditional expectations required for the E step


Precision

How flat or peaked the likelihood is around the maximum determines the precision of the estimate.

The EM algorithm on its own does not give this, and much further development is involved in getting an idea of precision via EM.

Oakes (1999) JRSS B
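Oakes (1999) shows how the observed information can be obtained directly from Q. A sketch of that identity, with Q(θ; θ′) the expected complete-data log likelihood defined later:

∂²ℓ(θ)/∂θ² = { ∂²Q(θ; θ′)/∂θ² + ∂²Q(θ; θ′)/∂θ ∂θ′ } evaluated at θ′ = θ,

so minus the right-hand side at θ = θ̂ gives the observed information from quantities available within the EM iterations.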


Expectation–Maximisation (EM) algorithm

Y : observed data

U: unobserved variables

Our goal is to use the observed value y of Y for inference on a parameter θ, in models where we cannot easily calculate the density

f(y; θ) = ∫ f(y | u; θ) f(u; θ) du

and hence cannot readily compute the likelihood for θ based only on y.


Expectation–Maximisation (EM) algorithm

We write the complete-data log likelihood based on both y and the value u of U as

log f(y, u; θ) = log f(y; θ) + log f(u | y; θ),   (1)

first term on the right → observed-data log likelihood ℓ(θ)

As the value of U is unobserved, the best we can do is to remove it by taking the expectation of (1) with respect to the conditional density f(u | y; θ) of U given that Y = y.


Expectation–Maximisation (EM) algorithm

This yields

E{log f(Y, U; θ) | Y = y; θ′} = ℓ(θ) + E{log f(U | Y; θ) | Y = y; θ′},

which can be expressed as

Q(θ; θ′) = ℓ(θ) + C(θ; θ′).

Since C(θ; θ′) is maximised over θ at θ = θ′ (a consequence of Jensen's inequality), any θ with Q(θ; θ′) ≥ Q(θ′; θ′) must have ℓ(θ) ≥ ℓ(θ′); this is the ascent property noted earlier.


Expectation–Maximisation (EM) algorithm

EM

Starting from an initial value θ′ of θ,

1. Compute Q(θ; θ′) = E{log f(Y, U; θ) | Y = y; θ′}   (E step)

2. With θ′ fixed, maximise Q(θ; θ′) over θ, giving θ∗   (M step)

3. Check if the algorithm has converged, using ℓ(θ∗) − ℓ(θ′) if available, or |θ∗ − θ′|, or both. If not, set θ′ = θ∗ and repeat.


Example: ABO allele frequencies

Blood group: A, AB, B, O

ABO → 3 alleles

A and B codominant, both dominant over O

assume Hardy–Weinberg equilibrium and random sampling

pA = (2NAA + NAa) / (2N) = nAA + ½ nAa

(NAA, NAa genotype counts among N individuals; nAA, nAa the corresponding proportions)

dominance → cannot distinguish the homozygotes from the heterozygotes


Example: ABO allele frequencies
Hardy–Weinberg equilibrium

Genotypic frequencies → always determine the allelic frequencies

The reverse is not necessarily true

Under some assumptions, for an autosomal gene with 2 alleles (A and a) with frequencies p and q respectively, we have 3 genotypes (AA, Aa and aa) whose frequencies are

p², 2pq and q², which sum to (p + q)² = 1

G. H. Hardy, W. Weinberg (1908) (independently)

For more than 2 alleles:

(p1 + p2 + … + pn)² = p1² + 2p1p2 + … + 2p1pn + p2² + … + pn² + … + 2pn−1pn


Example: ABO allele frequencies

Phenotype (blood group)   Genotype   Phenotype counts   Genotype counts   Expected frequency
A                         AA + AO    nA                 nAA + nAO          p² + 2pr
B                         BB + BO    nB                 nBB + nBO          q² + 2qr
AB                        AB         nAB                nAB                2pq
O                         OO         nO                 nOO                r²
Total                                n                  n                  1


Example: ABO allele frequencies
History

Bernstein (1925)

r′ = √(nO / n)

E((nA + nO) / n) = (p + r)² = (1 − q)²  ⇒  q′ = 1 − √((nA + nO) / n)

and similarly

p′ = 1 − √((nB + nO) / n)

These do not in general add up to 1


Example: ABO allele frequencies
History

Bernstein (1930)

d = 1 − (p′ + q′ + r′)

therefore

p′′ = p′ (1 + d/2)

q′′ = q′ (1 + d/2)

r′′ = (r′ + d/2)(1 + d/2)

These still do not add up to 1, but the difference is smaller


Example: ABO allele frequencies
History

Wiener (1929)

r′′′ = √(nO / n)

E((nA + nO) / n) = (p + r)²  ⇒  p′′′ = √((nA + nO) / n) − √(nO / n)

and similarly

q′′′ = √((nB + nO) / n) − √(nO / n)

These do not add up to 1 either


Example: ABO allele frequencies

Phenotype (blood group)   Genotype   Phenotypic counts   Genotypic counts   Expected frequency
A                         AA + AO    nA                  nAA + nAO          p² + 2pr
B                         BB + BO    nB                  nBB + nBO          q² + 2qr
AB                        AB         nAB                 nAB                2pq
O                         OO         nO                  nOO                r²
Total                                n                   n                  1


Example: ABO allele frequencies

Complete data: X = (nAA, nAO, nBB, nBO, nAB, nOO) (genotype counts)

Observed data: Y = (nA, nB, nAB, nO) (phenotype counts)

n (total number of individuals in the sample), in terms of the complete data

nA in terms of the complete data

nB in terms of the complete data

nAB in terms of the complete data

nO in terms of the complete data

complete data likelihood – assume HWE


Example: ABO allele frequencies

If a person has type A (B), the underlying genotype could be either AA (BB) or AO (BO).

nA = nAA + nAO

nB = nBB + nBO

nAB = nAB

nO = nOO

The likelihood for the complete data is simple:

(nAA, nAO, nBB, nBO, nAB, nOO) ∼ Multinomial(n; p², 2pr, q², 2qr, 2pq, r²)
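Taking logs makes the complete-data maximum likelihood estimates explicit; they are obtained by allele counting (this is the M step used later):

ℓC(p, q, r) = (2nAA + nAO + nAB) log p + (2nBB + nBO + nAB) log q + (2nOO + nAO + nBO) log r + constant

and maximising subject to p + q + r = 1 gives

p̂ = (2nAA + nAO + nAB) / (2n),   q̂ = (2nBB + nBO + nAB) / (2n),   r̂ = (2nOO + nAO + nBO) / (2n)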


Example: ABO allele frequencies

Phenotype (blood group)   Genotype   Phenotypic counts   Genotypic counts   Expected frequency
A                         AA + AO    nA = 186            nAA + nAO          p² + 2pr
B                         BB + BO    nB = 38             nBB + nBO          q² + 2qr
AB                        AB         nAB = 13            nAB                2pq
O                         OO         nO = 284            nOO                r²
Total                                n = 521             n                  1

Clarke et al. (1959) BMJ

p =?

q =?

r =?


Example: ABO allele frequencies

Available data are incomplete

nAA, nAO, nBB and nBO are unknown

Observed data: yO = (nA, nB, nAB, nO)

Unobserved data: yU = (nAA, nAO, nBB, nBO)

Complete data: yC = (nAA, nAO, nBB, nBO, nAB, nOO)

nAA + nAO = nA

nBB + nBO = nB

nO = nOO


Example: ABO allele frequencies

EM algorithm

• Start from estimates of the allele frequencies and use them to calculate the expected frequencies of all genotypes (E step), assuming Hardy–Weinberg equilibrium

• Use these 'hypothetical'/'augmented' complete genotypic frequencies to obtain new estimates of the allele frequencies, using maximum likelihood (M step)

Then use these new allele frequency estimates in a new E step, iterating until the values converge.
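A minimal Python sketch of this cycle, using the Clarke et al. (1959) counts from the table above (starting values and tolerance are arbitrary choices, not from the slides):

    # EM ('gene counting') for ABO allele frequencies under HWE
    nA, nB, nAB, nO = 186, 38, 13, 284     # observed phenotype counts
    n = nA + nB + nAB + nO                 # 521 individuals

    p, q, r = 1/3, 1/3, 1/3                # starting allele frequencies (A, B, O)

    for step in range(1000):
        # E step: expected genotype counts given current (p, q, r)
        nAA = nA * p**2 / (p**2 + 2*p*r)   # split blood group A into AA and AO
        nAO = nA - nAA
        nBB = nB * q**2 / (q**2 + 2*q*r)   # split blood group B into BB and BO
        nBO = nB - nBB
        # M step: allele counting on the completed genotype counts
        p_new = (2*nAA + nAO + nAB) / (2*n)
        q_new = (2*nBB + nBO + nAB) / (2*n)
        r_new = (2*nO + nAO + nBO) / (2*n)
        if max(abs(p_new - p), abs(q_new - q), abs(r_new - r)) < 1e-10:
            p, q, r = p_new, q_new, r_new
            break
        p, q, r = p_new, q_new, r_new

    print(p, q, r)                         # converged estimates of (p, q, r)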


History

Ceppellini et al. (1955) – estimation of gene frequencies

Dempster et al. (1977)


Other applications

• Allele frequencies – recessive alleles

• Haplotype frequencies

• Baum–Welch algorithm – Hidden Markov Models

• RNA-Seq

• Mixture densities


Haplotypes

Co-dominant markers:

count number of chromosomes (e.g. 2N)
count number of alleles (e.g. n)
allele frequency → simple proportion (n / 2N)

Unphased genotypes of individuals from some populations, where each unphased genotype consists of unordered pairs of SNPs taken from homologous chromosomes of the individual

Haplotypes: contiguous blocks of SNPs inherited from a single chromosome

Unknown phase

Haplotypes can't always be counted directly – focusing on unambiguous genotypes introduces bias

Excoffier and Slatkin (1995) Mol Biol Evol


Haplotypes

Assuming that each individual's genotype is a combination of two haplotypes (one maternal and one paternal), the goal of haplotype inference is to determine a small set of haplotypes that best explain all of the unphased genotypes observed in the population

Observed data: unphased genotypes

Latent variables: assignments of unphased genotypes to pairs of haplotypes

Parameters: frequencies of each haplotype in the population

E step: using the current haplotype frequencies to estimate probability distributions over phasing assignments for each unphased genotype

M step: using the expected phasing assignments to refine estimates of haplotype frequencies
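A minimal sketch of this cycle for just two biallelic SNPs, where the only phase-ambiguous class is the double heterozygote Aa/Bb (either AB/ab or Ab/aB); all counts are hypothetical, for illustration only:

    # E-M haplotype frequency estimation (Excoffier-Slatkin style sketch)
    base = {'AB': 40, 'Ab': 10, 'aB': 15, 'ab': 55}   # haplotypes countable
                                                      # from unambiguous genotypes
    n_AaBb = 20                                       # double heterozygotes
    n_hap = sum(base.values()) + 2 * n_AaBb           # total haplotypes

    f = {h: 0.25 for h in base}                       # starting frequencies

    for step in range(1000):
        # E step: posterior probability that a double heterozygote is AB/ab
        pr_ABab = 2 * f['AB'] * f['ab']
        pr_AbaB = 2 * f['Ab'] * f['aB']
        w = pr_ABab / (pr_ABab + pr_AbaB)
        # M step: haplotype counting with the expected phase assignments
        cnt = dict(base)
        cnt['AB'] += w * n_AaBb
        cnt['ab'] += w * n_AaBb
        cnt['Ab'] += (1 - w) * n_AaBb
        cnt['aB'] += (1 - w) * n_AaBb
        new = {h: cnt[h] / n_hap for h in cnt}
        if max(abs(new[h] - f[h]) for h in f) < 1e-10:
            f = new
            break
        f = new

    print(f)                                          # estimated haplotype frequencies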


RNA-Seq

Challenges in transcript assembly and abundance estimation, arising from the ambiguous assignment of reads to isoforms


Baum–Welch algorithm

→ estimate parameters of a Hidden Markov Model (HMM)

EM algorithm

e.g. Copy Number Variants (CNVs)

uses the forward-backward algorithm

HMM: joint probability of a collection of 'hidden' (latent) and observed discrete random variables

• ith hidden variable, given the (i − 1)th hidden variable, is independent of previous hidden variables

• current observed variables depend only on the current hidden state
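A compact Python sketch of Baum–Welch for a two-state HMM with binary observations (the observation sequence and starting matrices are hypothetical): the E step is the scaled forward–backward recursion, the M step re-estimates the parameters from expected counts.

    import numpy as np

    obs = np.array([0, 0, 1, 1, 0, 1, 1, 1, 0, 0])   # hypothetical symbol sequence
    S, V, T = 2, 2, len(obs)                          # states, symbols, length

    pi = np.array([0.5, 0.5])                         # initial state distribution
    A = np.array([[0.7, 0.3], [0.4, 0.6]])            # transition matrix (guess)
    B = np.array([[0.6, 0.4], [0.3, 0.7]])            # emission matrix (guess)

    for it in range(50):
        # E step: scaled forward-backward recursions
        alpha = np.zeros((T, S)); beta = np.zeros((T, S)); c = np.zeros(T)
        alpha[0] = pi * B[:, obs[0]]; c[0] = alpha[0].sum(); alpha[0] /= c[0]
        for t in range(1, T):
            alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
            c[t] = alpha[t].sum(); alpha[t] /= c[t]
        beta[T - 1] = 1.0
        for t in range(T - 2, -1, -1):
            beta[t] = (A @ (B[:, obs[t + 1]] * beta[t + 1])) / c[t + 1]
        gamma = alpha * beta                          # P(state at t | obs)
        xi = np.zeros((T - 1, S, S))                  # P(states at t, t+1 | obs)
        for t in range(T - 1):
            xi[t] = alpha[t][:, None] * A * (B[:, obs[t + 1]] * beta[t + 1])[None, :] / c[t + 1]
        # M step: re-estimate pi, A, B from the expected counts
        pi = gamma[0]
        A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
        for v in range(V):
            B[:, v] = gamma[obs == v].sum(axis=0) / gamma.sum(axis=0)

    print(A); print(B)                                # re-estimated parameters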


Clustering

E step of EM algorithm → assigns clusters to each data point based on its relative density under each mixture component

M step → recomputes the component density parameters based on the current clusters

k mixture components, each with a Gaussian density

'soft' version of k-means clustering

probabilistic (rather than deterministic) assignments of points to cluster centers
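A sketch of this 'soft k-means' for a one-dimensional two-component Gaussian mixture (simulated data; starting values are arbitrary):

    import numpy as np

    rng = np.random.default_rng(1)
    x = np.concatenate([rng.normal(0, 1, 100),        # simulated two-cluster data
                        rng.normal(4, 1, 100)])

    w = np.array([0.5, 0.5])                          # mixing proportions
    mu = np.array([-1.0, 1.0])                        # component means (start)
    sd = np.array([1.0, 1.0])                         # component std deviations

    def dens(x, mu, sd):
        """Gaussian density, broadcast over components."""
        return np.exp(-0.5 * ((x - mu) / sd)**2) / (sd * np.sqrt(2 * np.pi))

    for it in range(500):
        # E step: responsibility of each component for each point
        r = w * dens(x[:, None], mu, sd)              # shape (n, 2)
        r /= r.sum(axis=1, keepdims=True)
        # M step: weighted means, variances and mixing proportions
        nk = r.sum(axis=0)
        mu_new = (r * x[:, None]).sum(axis=0) / nk
        sd = np.sqrt((r * (x[:, None] - mu_new)**2).sum(axis=0) / nk)
        w = nk / len(x)
        if np.abs(mu_new - mu).max() < 1e-10:
            mu = mu_new
            break
        mu = mu_new

    print(mu, sd, w)                                  # fitted mixture parameters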


Variational Bayes

Variational Bayes → family of techniques for approximating intractable integrals arising in Bayesian inference and machine learning

Typically used in complex statistical models consisting of observed and latent variables and unknown parameters, as might be described by a graphical model

To provide an approximation to the posterior probability of the unobserved variables

Alternative to Monte Carlo sampling methods (particularly, Markov chain Monte Carlo methods such as Gibbs sampling) for a Bayesian approach for complex distributions that are difficult to directly evaluate or sample from

To derive a lower bound for the marginal likelihood of the observed data

Used for model selection – a higher marginal likelihood for a given model indicates a better fit (thus greater probability that the model in question was the one that generated the data).
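The lower bound referred to here is the standard evidence decomposition: for latent quantities z and any approximating density q,

log f(y) = Eq{log f(y, z) − log q(z)} + KL{q(z) ‖ f(z | y)} ≥ Eq{log f(y, z) − log q(z)},

since the Kullback–Leibler divergence is non-negative; maximising the bound over q is equivalent to minimising the divergence from q to the posterior.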


Variational Bayes

Can be seen as an extension of the EM algorithm from maximum a posteriori (MAP) estimation of the single most probable value of each parameter to fully Bayesian estimation which computes (an approximation to) the entire posterior distribution of the parameters and latent variables.

It finds a set of optimal parameter values, and it has the same alternating structure as EM, based on a set of equations that cannot be solved analytically.


Discussion

• The EM algorithm enables parameter estimation in probabilistic models with incomplete data.

• Can be used in cases where a complex problem can be formulated in terms of a simple problem if the data were complete: if the log likelihood of the data that would have been observed if complete has an appreciably simpler functional form than that of the data actually observed, then EM is useful.

• Many variations.

• Standard errors not produced directly.


References

Ceppellini, R., Siniscalco, M. and Smith, C. A. B. (1955). The estimation of gene frequencies in a random-mating population. Annals of Human Genetics 20, 97–115.

Clarke, C. A., Price Evans, D. A., McConnell, R. B. and Sheppard, P. M. (1959). Secretion of blood group antigens and peptic ulcer. British Medical Journal 1, 603–607.

Cox, D. R. and Oakes, D. (1984). Analysis of Survival Data. Chapman and Hall / CRC.

Davison, A. C. (2003). Statistical Models. Cambridge University Press.

Dempster, A. P., Laird, N. M. and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B 39 (1), 1–38.

Excoffier, L. and Slatkin, M. (1995). Maximum-likelihood estimation of molecular haplotype frequencies in a diploid population. Molecular Biology and Evolution 12 (5), 921–927.

Hastie, T., Tibshirani, R. and Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd edition. New York: Springer.

Oakes, D. (1999). Direct calculation of the information matrix via the EM algorithm. Journal of the Royal Statistical Society, Series B 61 (2), 479–482.
