Expectation-Maximization & Baum-Welch

Slides: Roded Sharan, Jan 15; revised by Ron Shamir, Nov 15
Source: rshamir/algmb/presentations/EM-BW-Ron-16.pdf (41 pages)


Page 1: Expectation-Maximization & Baum-Welch

Slides: Roded Sharan, Jan 15; revised by Ron Shamir, Nov 15

Page 2: The goal

• Input: incomplete data originating from a probability distribution with some unknown parameters

• Want to find the parameter values that maximize the likelihood

• EM: an approach that helps when the maximum-likelihood solution cannot be computed directly

• Seeks a local maximum by iteratively solving two easier subproblems

Page 3: Coin flipping: complete data

• Coins A, B with unknown heads probabilities θ_A, θ_B

• Goal: estimate θ_A, θ_B

• Experiment: repeat 5 times: choose A or B with prob. 1/2, flip it 10 times, record the results.

x = (x_1,…,x_5): number of heads in set 1,…,5

y = (y_1,…,y_5): coin used in set 1,…,5

Do & Batzoglou, NBT 08

Page 4: Coin flipping: complete data

• Natural guess: θ_i = fraction of heads in the flips of coin i

• This is actually the ML solution: it maximizes P(x,y|θ) (exercise)

• What if we do not know which coin was used in each round?

Page 5: Coin flipping: incomplete data

• Now (y_1,…,y_5) are hidden / latent variables.

• Cannot compute the heads probability for each coin.

• If we guessed y correctly – we could.

• Idea: guess initial θ_A^0, θ_B^0
  – Use θ_A^t, θ_B^t to compute the most likely coin for each set, obtaining a completion y
  – Use the resulting y to recompute θ_A, θ_B using ML, obtaining θ_A^{t+1}, θ_B^{t+1}
  – Repeat till convergence

• EM: use probabilities rather than the single most likely completion y
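To make the loop concrete, here is a minimal sketch of the soft (EM) version for the two-coin experiment. The flip counts and the starting values θ_A^0, θ_B^0 are illustrative assumptions, not data from the slides; the binomial coefficient and the uniform coin prior are omitted since both cancel in the posterior ratio.

```python
import numpy as np

n = 10                                  # flips per set
x = np.array([5, 9, 8, 4, 7])           # heads per set (illustrative data)
theta_A, theta_B = 0.6, 0.5             # initial guesses theta_A^0, theta_B^0

for _ in range(50):
    # E-step: w_A[i] = P(y_i = A | x_i, theta^t); C(n, x_i) and the 1/2
    # prior cancel in the ratio, so unnormalized likelihoods suffice
    like_A = theta_A**x * (1 - theta_A)**(n - x)
    like_B = theta_B**x * (1 - theta_B)**(n - x)
    w_A = like_A / (like_A + like_B)
    w_B = 1.0 - w_A
    # M-step: weighted ML estimates of the heads probabilities
    theta_A = np.sum(w_A * x) / (np.sum(w_A) * n)
    theta_B = np.sum(w_B * x) / (np.sum(w_B) * n)

print(theta_A, theta_B)                 # converges to a local maximum
```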

Page 6: Coin flipping: incomplete data (figure)

Page 7: The probabilistic setting

Input: data X coming from a probabilistic model with hidden information y.

Goal: learn the model’s parameters θ so that the likelihood P(X|θ) is maximized.

Page 8: Mixture of two Gaussians

$$P(y=1) = p_1; \qquad P(y=2) = p_2 = 1 - p_1$$

$$P(x \mid y=j) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu_j)^2}{2\sigma^2}\right)$$

Our input generates the black distribution. We want to color each sample red/blue and find the parameters θ = (μ_1, μ_2, p_1) of the two distributions so as to maximize the data probability (assume σ is known).

Kalai et al., Disentangling Gaussians, CACM 2012

Page 9: The likelihood function

$$P(y_i=1) = p_1; \qquad P(y_i=2) = p_2 = 1 - p_1$$

$$P(x_i \mid y_i=j) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x_i-\mu_j)^2}{2\sigma^2}\right)$$

$$L(\theta) = \prod_i P(x_i \mid \theta) = \prod_i \sum_j P(x_i, y_i=j \mid \theta)$$

$$\log L(\theta) = \sum_i \log \sum_j \frac{p_j}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x_i-\mu_j)^2}{2\sigma^2}\right)$$

To be continued…

Page 10: KL divergence

Def: The Kullback-Leibler divergence (aka relative entropy) of discrete probability distributions P and Q:

$$D_{KL}(P \,\|\, Q) = \sum_i P(x_i) \log \frac{P(x_i)}{Q(x_i)}$$

(the sum is over x s.t. P(x)>0; we require Q(x)=0 ⟹ P(x)=0 and use 0·log 0 = 0)

Lemma: KL divergence is nonnegative, with equality iff P ≡ Q.

Proof idea: log(x) ≤ x−1 for all x>0, with equality iff x=1. Hence

$$-D_{KL}(P \,\|\, Q) = \sum_i P(x_i) \log \frac{Q(x_i)}{P(x_i)} \le \sum_i P(x_i)\left(\frac{Q(x_i)}{P(x_i)} - 1\right) = \sum_i Q(x_i) - \sum_i P(x_i) \le 1 - 1 = 0$$
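As a quick numeric check of the lemma, here is a small sketch (the two distributions are chosen arbitrarily for illustration):

```python
import numpy as np

def kl_divergence(p, q):
    """D_KL(P || Q) for discrete distributions; sums only over x with P(x) > 0."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])
print(kl_divergence(p, q))  # positive, since P != Q
print(kl_divergence(p, p))  # 0.0: equality iff P == Q
```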

Page 11: The EM algorithm (i)

Goal: maximize log P(x|θ) = log (Σ_y P(x,y|θ)).

Strategy: guess an initial θ and iteratively adjust it, making sure that the likelihood always improves. Assume we have a model θ^t that we wish to improve to a new value θ.

Bayes rule: P(x|θ) = P(x,y|θ) / P(y|x,θ).

Take logs, multiply both sides by P(y|x,θ^t), and sum over y (the left side is unchanged since Σ_y P(y|x,θ^t) = 1):

$$\log P(x \mid \theta) = \sum_y P(y \mid x, \theta^t) \log P(x, y \mid \theta) - \sum_y P(y \mid x, \theta^t) \log P(y \mid x, \theta)$$

Page 12: The EM algorithm (ii)

The same identity holds at θ and at θ^t:

$$\log P(x \mid \theta) = \sum_y P(y \mid x, \theta^t) \log P(x, y \mid \theta) - \sum_y P(y \mid x, \theta^t) \log P(y \mid x, \theta)$$

$$\log P(x \mid \theta^t) = \sum_y P(y \mid x, \theta^t) \log P(x, y \mid \theta^t) - \sum_y P(y \mid x, \theta^t) \log P(y \mid x, \theta^t)$$

Want: P(x|θ) ≥ P(x|θ^t). Define

$$Q(\theta \mid \theta^t) = \sum_y P(y \mid x, \theta^t) \log P(x, y \mid \theta)$$

Subtracting the two identities:

$$\log P(x \mid \theta) - \log P(x \mid \theta^t) = Q(\theta \mid \theta^t) - Q(\theta^t \mid \theta^t) + \sum_y P(y \mid x, \theta^t) \log \frac{P(y \mid x, \theta^t)}{P(y \mid x, \theta)}$$

The last term is a KL divergence, hence ≥ 0. So choosing θ with Q(θ|θ^t) ≥ Q(θ^t|θ^t) guarantees P(x|θ) ≥ P(x|θ^t).

Page 13: The EM algorithm (iii)

Main component:

$$Q(\theta \mid \theta^t) = \sum_y P(y \mid x, \theta^t) \log P(x, y \mid \theta)$$

log P(x,y|θ) is called the complete log likelihood function; Q is the expectation of the complete log likelihood over the distribution of y given the current parameters θ^t.

The algorithm: repeat

• E-step: calculate the Q function
• M-step: maximize Q(θ|θ^t) with respect to θ
• Stopping criterion: improvement in log likelihood ≤ ε

Note: a local optimum is guaranteed to be reached, not a global one. The starting point matters – try many!
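In code, the loop and its stopping rule look like the following generic skeleton. This is a sketch only: e_step, m_step, and log_likelihood are hypothetical placeholders to be supplied per model, as in the worked examples elsewhere in this deck.

```python
def expectation_maximization(x, theta0, e_step, m_step, log_likelihood, eps=1e-6):
    """Generic EM loop: iterate until the log-likelihood gain is <= eps.

    e_step(x, theta)         -> data for the Q function (e.g., posterior weights)
    m_step(x, q)             -> argmax_theta Q(theta | theta^t)
    log_likelihood(x, theta) -> log P(x | theta)
    """
    theta = theta0
    ll = log_likelihood(x, theta)
    while True:
        q = e_step(x, theta)            # E-step: compute Q via posteriors
        theta = m_step(x, q)            # M-step: maximize Q over theta
        new_ll = log_likelihood(x, theta)
        if new_ll - ll <= eps:          # stopping criterion from the slide
            return theta
        ll = new_ll
```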

Page 14: Back to the Gaussian mixture model

$$Q(\theta \mid \theta^t) = \sum_y P(y \mid x, \theta^t) \log P(x, y \mid \theta)$$

Encode y by indicators: y_ij = 1 if y_i = j, and 0 otherwise. Then

$$P(x, y \mid \theta) = \prod_i \prod_j P(x_i, y_i = j \mid \theta)^{y_{ij}}$$

$$\log P(x, y \mid \theta) = \sum_i \sum_j y_{ij} \log P(x_i, y_i = j \mid \theta)$$

$$Q(\theta \mid \theta^t) = \sum_y P(y \mid x, \theta^t) \sum_i \sum_j y_{ij} \log P(x_i, y_i = j \mid \theta) = \sum_i \sum_j P(y_i = j \mid x, \theta^t) \log P(x_i, y_i = j \mid \theta)$$

(the last step uses Σ_y P(y|x,θ^t) y_ij = P(y_i = j | x, θ^t))

Page 15: Application (cont.)

$$Q(\theta \mid \theta^t) = \sum_i \sum_j P(y_i = j \mid x, \theta^t) \log P(x_i, y_i = j \mid \theta)$$

Define the posterior weights

$$w_{ij}^t := P(y_i = j \mid x, \theta^t) = \frac{P(x_i, y_i = j \mid \theta^t)}{\sum_{j'} P(x_i, y_i = j' \mid \theta^t)}$$

Then

$$Q(\theta \mid \theta^t) = \sum_i \sum_j w_{ij}^t \left[\log p_j - \log\sqrt{2\pi\sigma^2} - \frac{(x_i - \mu_j)^2}{2\sigma^2}\right]$$

Now write the derivatives, equate to zero, and solve to get the optimal parameters θ^{t+1} = (μ_1^{t+1}, μ_2^{t+1}, p_1^{t+1}).
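Setting dQ/dμ_j = 0 gives μ_j = Σ_i w_ij^t x_i / Σ_i w_ij^t, and dQ/dp_1 = 0 (with p_2 = 1−p_1) gives p_j = Σ_i w_ij^t / n. Below is a minimal numpy sketch of the resulting iteration, assuming a known shared σ; the sample data and initial values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
# Illustrative data: two Gaussians with means -2 and 2, shared sigma = 1 (known)
x = np.concatenate([rng.normal(-2, 1, 300), rng.normal(2, 1, 700)])
sigma = 1.0
mu = np.array([-1.0, 1.0])   # initial means
p = np.array([0.5, 0.5])     # initial mixing probabilities

for _ in range(100):
    # E-step: w[i, j] = P(y_i = j | x_i, theta^t); the common
    # 1/sqrt(2*pi*sigma^2) factor cancels in the ratio
    dens = p * np.exp(-(x[:, None] - mu) ** 2 / (2 * sigma**2))
    w = dens / dens.sum(axis=1, keepdims=True)
    # M-step: the zero-derivative conditions give weighted averages
    mu = (w * x[:, None]).sum(axis=0) / w.sum(axis=0)
    p = w.mean(axis=0)

print(mu, p)
```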

Page 16: EM for HMM: The Baum-Welch algorithm

Page 17: Reminder: HMM

Hidden states π_i, observed output symbols x_i, Markovian transition prob. a_kl, emission prob. e_k(b). Model M = (Σ, Q, Θ).

Given a sequence X = (x_1,…,x_L) and a path π = (π_1,…,π_L):

• a_kl = P(π_i = l | π_{i−1} = k)

• e_k(b) = P(x_i = b | π_i = k)

$$P(X, \pi) = a_{0,\pi_1} \prod_{i=1}^{L} e_{\pi_i}(x_i)\, a_{\pi_i, \pi_{i+1}}$$

Goal (decoding): find the path π* maximizing P(X,π).
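As a direct transcription of this joint probability, a small sketch (the two-state toy model and its numbers are illustrative assumptions):

```python
import numpy as np

# Toy 2-state HMM over alphabet {0, 1}; all parameter values are illustrative
a0 = np.array([0.5, 0.5])                # initial distribution a_{0,k}
a = np.array([[0.9, 0.1],                # a[k, l] = P(pi_i = l | pi_{i-1} = k)
              [0.2, 0.8]])
e = np.array([[0.7, 0.3],                # e[k, b] = P(x_i = b | pi_i = k)
              [0.1, 0.9]])

def joint_prob(x, pi):
    """P(X, pi) = a_{0,pi_1} * prod_i e_{pi_i}(x_i) * a_{pi_i, pi_{i+1}}."""
    p = a0[pi[0]]
    for i in range(len(x)):
        p *= e[pi[i], x[i]]
        if i + 1 < len(x):
            p *= a[pi[i], pi[i + 1]]
    return p

print(joint_prob([0, 1, 1], [0, 1, 1]))
```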

Page 18: Max likelihood in HMM

Here y = π and θ = (a_kl, e_k(b)). The log likelihood is

$$\log P(x \mid \theta) = \log \sum_\pi P(x, \pi \mid \theta)$$

and the Q function is

$$Q(\theta \mid \theta^t) = \sum_\pi P(\pi \mid x, \theta^t) \log P(x, \pi \mid \theta)$$

Page 19: Computing Q

$$P(x, \pi \mid \theta) = \prod_{k=1}^{M} \prod_b [e_k(b)]^{E_k(b,\pi)} \prod_{k=1}^{M} \prod_{l=1}^{M} a_{kl}^{A_{kl}(\pi)}$$

where e_k(b) is the emission probability of character b in state k, a_kl is the transition probability from state k to state l, E_k(b,π) is the number of times we saw b emitted from k in path π, and A_kl(π) is the number of transitions from k to l in path π.

Page 20: Computing Q (ii)

$$Q(\theta \mid \theta^t) = \sum_\pi P(\pi \mid x, \theta^t) \left[\sum_{k=1}^{M} \sum_b E_k(b,\pi) \log e_k(b) + \sum_{k=1}^{M} \sum_{l=1}^{M} A_{kl}(\pi) \log a_{kl}\right]$$

$$= \sum_{k=1}^{M} \sum_b \left(\sum_\pi P(\pi \mid x, \theta^t)\, E_k(b,\pi)\right) \log e_k(b) + \sum_{k=1}^{M} \sum_{l=1}^{M} \left(\sum_\pi P(\pi \mid x, \theta^t)\, A_{kl}(\pi)\right) \log a_{kl}$$

Denote the expectations (value × probability):

$$A_{kl} = \sum_\pi P(\pi \mid x, \theta^t)\, A_{kl}(\pi), \qquad E_k(b) = \sum_\pi P(\pi \mid x, \theta^t)\, E_k(b,\pi)$$

Page 21: Computing Q (iii)

So we want to find a set of parameters θ^{t+1} that maximizes

$$\sum_{k=1}^{M} \sum_b E_k(b) \log e_k(b) + \sum_{k=1}^{M} \sum_{l=1}^{M} A_{kl} \log a_{kl}$$

For maximization, select

$$a_{kl} = \frac{A_{kl}}{\sum_{l'} A_{kl'}}, \qquad e_k(b) = \frac{E_k(b)}{\sum_{b'} E_k(b')}$$

E_k(b), A_kl can be computed using forward/backward, where f_k(i) = P(x_1,…,x_i, π_i = k) and b_k(i) = P(x_{i+1},…,x_L | π_i = k):

$$P(\pi_i = k, \pi_{i+1} = l \mid x, \theta^t) = \frac{1}{P(x)}\, f_k(i)\, a_{kl}\, e_l(x_{i+1})\, b_l(i+1)$$

$$A_{kl} = \frac{1}{P(x)} \sum_i f_k(i)\, a_{kl}\, e_l(x_{i+1})\, b_l(i+1); \qquad \text{similarly,} \quad E_k(b) = \frac{1}{P(x)} \sum_{\{i \mid x_i = b\}} f_k(i)\, b_k(i)$$

Page 22: Baum-Welch: EM for HMM

Maximize:

$$\sum_{k=1}^{M} \sum_b E_k(b) \log e_k(b) + \sum_{k=1}^{M} \sum_{l=1}^{M} A_{kl} \log a_{kl}$$

Chosen parameters (denote as a^{chosen}, e^{chosen}):

$$a_{kl}^{chosen} = \frac{A_{kl}}{\sum_{l'} A_{kl'}}, \qquad e_k^{chosen}(b) = \frac{E_k(b)}{\sum_{b'} E_k(b')}$$

Difference between the chosen set and some other set (multiply and divide by the same factor Σ_{l'} A_{kl'}):

$$\sum_{k=1}^{M} \sum_{l=1}^{M} A_{kl} \log a_{kl}^{chosen} - \sum_{k=1}^{M} \sum_{l=1}^{M} A_{kl} \log a_{kl}^{other} = \sum_{k=1}^{M} \left(\sum_{l'} A_{kl'}\right) \sum_{l=1}^{M} a_{kl}^{chosen} \log \frac{a_{kl}^{chosen}}{a_{kl}^{other}} \ge 0$$

since Σ_{l'} A_{kl'} is always positive and each inner sum is a KL divergence, hence nonnegative. (The emission term is handled identically.)

Page 23: Summary: Parameter Estimation in HMM When States are Unknown

Input: X^1,…,X^n independent training sequences.

Baum-Welch alg. (1972):

(1) Expectation:

• compute the expected number of k→l state transitions:

$$P(\pi_i = k, \pi_{i+1} = l \mid X^j, \theta) = \frac{1}{P(X^j)}\, f_k^j(i)\, a_{kl}\, e_l(x^j_{i+1})\, b_l^j(i+1)$$

$$A_{kl} = \sum_j \frac{1}{P(X^j)} \sum_i f_k^j(i)\, a_{kl}\, e_l(x^j_{i+1})\, b_l^j(i+1)$$

• compute the expected number of appearances of symbol b in state k:

$$E_k(b) = \sum_j \frac{1}{P(X^j)} \sum_{\{i \mid x^j_i = b\}} f_k^j(i)\, b_k^j(i) \quad \text{(exercise)}$$

(2) Maximization:

• re-compute the new parameters from A, E using maximum likelihood.

Repeat (1)+(2) while the likelihood improves.
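Putting the summary together, here is a minimal numpy sketch of Baum-Welch for a single training sequence. It is a sketch under simplifying assumptions: no numerical scaling (needed for long sequences), no pseudocounts, and a fixed initial distribution; the toy model and data are illustrative.

```python
import numpy as np

def forward(x, a0, a, e):
    """f[i, k] = P(x_1..x_i, pi_i = k)."""
    L, M = len(x), a.shape[0]
    f = np.zeros((L, M))
    f[0] = a0 * e[:, x[0]]
    for i in range(1, L):
        f[i] = (f[i - 1] @ a) * e[:, x[i]]
    return f

def backward(x, a, e):
    """b[i, k] = P(x_{i+1}..x_L | pi_i = k)."""
    L, M = len(x), a.shape[0]
    b = np.ones((L, M))
    for i in range(L - 2, -1, -1):
        b[i] = a @ (e[:, x[i + 1]] * b[i + 1])
    return b

def baum_welch_step(x, a0, a, e):
    f, b = forward(x, a0, a, e), backward(x, a, e)
    px = f[-1].sum()                      # P(x)
    L, M = len(x), a.shape[0]
    # E-step: expected transition counts A_kl and emission counts E_k(b)
    A = np.zeros((M, M))
    for i in range(L - 1):
        A += np.outer(f[i], e[:, x[i + 1]] * b[i + 1]) * a / px
    E = np.zeros((M, e.shape[1]))
    for i in range(L):
        E[:, x[i]] += f[i] * b[i] / px
    # M-step: normalize the expected counts into new parameters
    return A / A.sum(axis=1, keepdims=True), E / E.sum(axis=1, keepdims=True)

# Toy run: 2 states, alphabet {0, 1}, illustrative parameters and data
a0 = np.array([0.5, 0.5])
a = np.array([[0.9, 0.1], [0.2, 0.8]])
e = np.array([[0.7, 0.3], [0.1, 0.9]])
x = [0, 0, 1, 1, 1, 0, 1, 1]
for _ in range(20):
    a, e = baum_welch_step(x, a0, a, e)
print(a, e, sep="\n")
```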

Page 24: Photos

Lloyd Welch, USC Electrical Engineering; Leonard Baum, many years after the IDA.

Page 25: De novo motif discovery using EM

Slide sources: Chaim Linhart, Danit Wider, Katherina Kechris

Page 26: Transcription Factors

• A transcription factor (TF) is a protein that regulates a gene by binding to a binding site (BS) in its vicinity, specific to the TF.

• Binding sites vary in their sequences. Their sequence pattern is called a motif.

Page 27: Motif profile

Motif finding: Given a set of co-regulated genes, find a recurrent motif in their promoter regions.

• Line up the patterns by their start indexes s = (s1, s2, …, st)
• Construct a profile matrix with the frequencies of each nucleotide per column

Alignment   a G g t a c T t
            C c A t a c g t
            a c g t T A g t
            a c g t C c A t
            C c g t a c g G
            _________________
Profile   A 3 0 1 0 3 1 1 0
          C 2 4 0 0 1 4 0 0
          G 0 1 4 0 0 0 3 1
          T 0 0 0 5 1 0 1 4
            _________________
Consensus   A C G T A C G T
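A small sketch of computing the profile matrix and consensus from the alignment above (numpy used for the count matrix):

```python
import numpy as np

alignment = ["aGgtacTt", "CcAtacgt", "acgtTAgt", "acgtCcAt", "CcgtacgG"]
bases = "ACGT"

# profile[b, i] = number of sequences with base b at column i (case-insensitive)
cols = [[s[i].upper() for s in alignment] for i in range(len(alignment[0]))]
profile = np.array([[col.count(b) for col in cols] for b in bases])
consensus = "".join(bases[profile[:, i].argmax()] for i in range(profile.shape[1]))

print(profile)     # rows A, C, G, T as in the slide
print(consensus)   # ACGTACGT
```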

Page 28: An example: Implanting Motif AAAAAAAAGGGGGGG

atgaccgggatactgatAAAAAAAAGGGGGGGggcgtacacattagataaacgtatgaagtacgttagactcggcgccgccg
acccctattttttgagcagatttagtgacctggaaaaaaaatttgagtacaaaacttttccgaataAAAAAAAAGGGGGGGa
tgagtatccctgggatgacttAAAAAAAAGGGGGGGtgctctcccgatttttgaatatgtaggatcattcgccagggtccga
gctgagaattggatgAAAAAAAAGGGGGGGtccacgcaatcgcgaaccaacgcggacccaaaggcaagaccgataaaggaga
tcccttttgcggtaatgtgccgggaggctggttacgtagggaagccctaacggacttaatAAAAAAAAGGGGGGGcttatag
gtcaatcatgttcttgtgaatggatttAAAAAAAAGGGGGGGgaccgcttggcgcacccaaattcagtgtgggcgagcgcaa
cggttttggcccttgttagaggcccccgtAAAAAAAAGGGGGGGcaattatgagagagctaatctatcgcgtgcgtgttcat
aacttgagttAAAAAAAAGGGGGGGctggggcacatacaagaggagtcttccttatcagttaatgctgtatgacactatgta
ttggcccattggctaaaagcccaacttgacaaatggaagatagaatccttgcatAAAAAAAAGGGGGGGaccgaaagggaag
ctggtgagcaacgacagattcttacgtgcattagctcgcttccggggatctaatagcacgaagcttAAAAAAAAGGGGGGGa

Page 29: Where is the Implanted Motif? (*)

atgaccgggatactgataaaaaaaagggggggggcgtacacattagataaacgtatgaagtacgttagactcggcgccgccg
acccctattttttgagcagatttagtgacctggaaaaaaaatttgagtacaaaacttttccgaataaaaaaaaaggggggga
tgagtatccctgggatgacttaaaaaaaagggggggtgctctcccgatttttgaatatgtaggatcattcgccagggtccga
gctgagaattggatgaaaaaaaagggggggtccacgcaatcgcgaaccaacgcggacccaaaggcaagaccgataaaggaga
tcccttttgcggtaatgtgccgggaggctggttacgtagggaagccctaacggacttaataaaaaaaagggggggcttatag
gtcaatcatgttcttgtgaatggatttaaaaaaaaggggggggaccgcttggcgcacccaaattcagtgtgggcgagcgcaa
cggttttggcccttgttagaggcccccgtaaaaaaaagggggggcaattatgagagagctaatctatcgcgtgcgtgttcat
aacttgagttaaaaaaaagggggggctggggcacatacaagaggagtcttccttatcagttaatgctgtatgacactatgta
ttggcccattggctaaaagcccaacttgacaaatggaagatagaatccttgcataaaaaaaagggggggaccgaaagggaag
ctggtgagcaacgacagattcttacgtgcattagctcgcttccggggatctaatagcacgaagcttaaaaaaaaggggggga

Page 30: Implanting Motif AAAAAAAAGGGGGGG with Four Mutations

atgaccgggatactgatAgAAgAAAGGttGGGggcgtacacattagataaacgtatgaagtacgttagactcggcgccgccg
acccctattttttgagcagatttagtgacctggaaaaaaaatttgagtacaaaacttttccgaatacAAtAAAAcGGcGGGa
tgagtatccctgggatgacttAAAAtAAtGGaGtGGtgctctcccgatttttgaatatgtaggatcattcgccagggtccga
gctgagaattggatgcAAAAAAAGGGattGtccacgcaatcgcgaaccaacgcggacccaaaggcaagaccgataaaggaga
tcccttttgcggtaatgtgccgggaggctggttacgtagggaagccctaacggacttaatAtAAtAAAGGaaGGGcttatag
gtcaatcatgttcttgtgaatggatttAAcAAtAAGGGctGGgaccgcttggcgcacccaaattcagtgtgggcgagcgcaa
cggttttggcccttgttagaggcccccgtAtAAAcAAGGaGGGccaattatgagagagctaatctatcgcgtgcgtgttcat
aacttgagttAAAAAAtAGGGaGccctggggcacatacaagaggagtcttccttatcagttaatgctgtatgacactatgta
ttggcccattggctaaaagcccaacttgacaaatggaagatagaatccttgcatActAAAAAGGaGcGGaccgaaagggaag
ctggtgagcaacgacagattcttacgtgcattagctcgcttccggggatctaatagcacgaagcttActAAAAAGGaGcGGa

Page 31: Where is the Motif???

atgaccgggatactgatagaagaaaggttgggggcgtacacattagataaacgtatgaagtacgttagactcggcgccgccg
acccctattttttgagcagatttagtgacctggaaaaaaaatttgagtacaaaacttttccgaatacaataaaacggcggga
tgagtatccctgggatgacttaaaataatggagtggtgctctcccgatttttgaatatgtaggatcattcgccagggtccga
gctgagaattggatgcaaaaaaagggattgtccacgcaatcgcgaaccaacgcggacccaaaggcaagaccgataaaggaga
tcccttttgcggtaatgtgccgggaggctggttacgtagggaagccctaacggacttaatataataaaggaagggcttatag
gtcaatcatgttcttgtgaatggatttaacaataagggctgggaccgcttggcgcacccaaattcagtgtgggcgagcgcaa
cggttttggcccttgttagaggcccccgtataaacaaggagggccaattatgagagagctaatctatcgcgtgcgtgttcat
aacttgagttaaaaaatagggagccctggggcacatacaagaggagtcttccttatcagttaatgctgtatgacactatgta
ttggcccattggctaaaagcccaacttgacaaatggaagatagaatccttgcatactaaaaaggagcggaccgaaagggaag
ctggtgagcaacgacagattcttacgtgcattagctcgcttccggggatctaatagcacgaagcttactaaaaaggagcgga

Page 32: MEME: Multiple EM for Motif Elicitation [Bailey, Elkan ISMB ’94]

Goal: Given a set of sequences, find a motif (PWM) that maximizes the expected likelihood of the data.

Technique: EM (Expectation Maximization), based on [Lawrence, Reilly ’90].

Page 33: The Mixture Model

Data: X = (X1,…,Xn): all (overlapping) l-mers in the input sequences.

Assume the Xi’s were generated by a two-component mixture model θ = (θ1, θ2):

• Model #1: θ1 = motif model: fi,b = prob. of base b at position i in the motif, 1 ≤ i ≤ l

• Model #2: θ2 = background (BG) model: f0,b = prob. of base b

• Mixing parameter: λ = (λ1, λ2); λj = prob. that model #j is used (λ1 + λ2 = 1)

Assume independence between l-mers.

Page 34: Log Likelihood

Missing data: Z = (Z1,…,Zn): Zi = (Zi1, Zi2); Zij = 1 if Xi is from model #j, 0 otherwise.

Complete likelihood of the model given the data:

L(θ, λ | X, Z) = p(X, Z | θ, λ) = Πi=1…n p(Xi, Zi | θ, λ)

p(Xi, Zi | θ, λ) = p(Xi | Zi, θ, λ) p(Zi) = λ1 p(Xi|θ1) if Zi1 = 1; λ2 p(Xi|θ2) if Zi2 = 1

log L = Σi=1…n Σj=1,2 Zij log (λj p(Xi|θj))

Page 35: MEME: Algorithm

Goal: Maximize E[log L]

Outline of the EM algorithm:

• Choose starting θ, λ

• Repeat until convergence of θ:
  – E-step: re-estimate Z from θ, λ, X
  – M-step: re-estimate θ, λ from X, Z

• Repeat all of the above for various starting θ, λ …

Page 36: E-step

Compute the expectation of log L over Z:

E[log L] = Σi=1…n Σj=1,2 Z’ij log (λj p(Xi|θj))

where:

Z’ij = p(Zij=1 | θ, λ, Xi)
     = p(Zij=1, Xi | θ, λ) / p(Xi | θ, λ)
     = p(Zij=1, Xi | θ, λ) / Σk=1,2 p(Zik=1, Xi | θ, λ)
     = λj p(Xi|θj) / Σk=1,2 λk p(Xi|θk)

Page 37: M-step

Find θ, λ that maximize E[log L] = Q(θ, λ | θt, λt):

E[log L] = Σi=1…n Σj=1,2 Z’ij log (λj p(Xi|θj))

Finding λ: it suffices to maximize L1 = Σi=1…n Σj=1,2 Z’ij log λj

Since λ1+λ2 = 1: L1 = Σi=1…n (Z’i1 log λ1 + Z’i2 log (1−λ1))

dL1/dλ1 = Σi=1…n (Z’i1 / λ1 – Z’i2 / (1−λ1))

Page 38: M-step (cont.)

dL1/dλ1 = Σi=1…n (Z’i1 / λ1 – Z’i2 / (1−λ1)) = 0

λ1 Σi=1…n Z’i2 = (1−λ1) Σi=1…n Z’i1

λ1 ( Σi=1…n (Z’i1+Z’i2) ) = Σi=1…n Z’i1

λ1 = ( Σi=1…n Z’i1 ) / n

λ2 = 1− λ1 = ( Σi=1…n Z’i2 ) / n

Finding θ: …
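A minimal sketch of the whole two-component mixture EM on l-mers. The λ update follows the slides; since the θ derivation is elided above ("Finding θ: …"), the Z′-weighted per-column base frequencies used below are an assumed, natural ML analogue (with the background model kept fixed), and the toy sequences and initialization are illustrative.

```python
import numpy as np

bases = "ACGT"
seqs = ["ACGTACGTAC", "TTACGTACGG", "CCACGTACGA"]   # toy input (illustrative)
l = 4                                                # motif length
X = [s[i:i + l] for s in seqs for i in range(len(s) - l + 1)]  # all l-mers
Xidx = np.array([[bases.index(c) for c in x] for x in X])      # (n, l) indices
n = len(X)

rng = np.random.default_rng(0)
f_motif = rng.dirichlet(np.ones(4), size=l)   # f[i, b]: motif model theta_1
f_bg = np.full(4, 0.25)                       # f[0, b]: background, kept fixed
lam = np.array([0.5, 0.5])                    # mixing parameters

for _ in range(100):
    # p(X_i | theta_j) for each l-mer under the motif and background models
    p1 = np.prod(f_motif[np.arange(l), Xidx], axis=1)
    p2 = np.prod(f_bg[Xidx], axis=1)
    # E-step: Z'_ij = lam_j p(X_i|theta_j) / sum_k lam_k p(X_i|theta_k)
    joint = np.stack([lam[0] * p1, lam[1] * p2], axis=1)
    Z = joint / joint.sum(axis=1, keepdims=True)
    # M-step: lam_j = sum_i Z'_ij / n; motif freqs = Z'-weighted base counts
    lam = Z.sum(axis=0) / n
    for i in range(l):
        counts = np.bincount(Xidx[:, i], weights=Z[:, 0], minlength=4)
        f_motif[i] = counts / counts.sum()

print(lam)
print(np.round(f_motif, 2))   # PWM of the discovered motif
```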


Page 40: Tim Bailey, Charles Elkan

• Tim Bailey: Senior Research Fellow, Institute for Molecular Bioscience, University of Queensland, Brisbane, Australia

• Charles Elkan: Professor, Department of Computer Science and Engineering, University of California, San Diego

Page 41: FIN