
Hidden Markov Models & Their Application in Bioinformatics

Bioinformatics Group, Electrical and Computer Department
University of Tehran, 2008

By: Mahdi Pakdaman (Pakdaman@gmail.com)


Hidden Markov Models: Outline

Markov models
Hidden Markov models
  Definition
  Three basic problems
  Forward/Backward algorithm
  Viterbi algorithm
  Baum-Welch estimation algorithm
Issues
Applications in Bioinformatics


Markov Models


Markov Model Example

Weather model: 3 states {rainy, cloudy, sunny}

Problem: Forecast weather state, based on the current weather state


Observable Markov Models

Observable states: S_1, S_2, ..., S_N. Below we will designate them simply as 1, 2, ..., N.
The actual state at time t is q_t; a state sequence is q_1, q_2, ..., q_t, ..., q_T.

First-order Markov assumption:
P(q_t = j | q_{t-1} = i, q_{t-2} = k, ...) = P(q_t = j | q_{t-1} = i)

Stationarity condition:
P(q_t = j | q_{t-1} = i) = P(q_{t+l} = j | q_{t+l-1} = i)


Markov Models

State transition matrix A:

A = [ a_{11}  a_{12}  ...  a_{1j}  ...  a_{1N} ]
    [ a_{21}  a_{22}  ...  a_{2j}  ...  a_{2N} ]
    [  ...     ...          ...          ...   ]
    [ a_{i1}  a_{i2}  ...  a_{ij}  ...  a_{iN} ]
    [  ...     ...          ...          ...   ]
    [ a_{N1}  a_{N2}  ...  a_{Nj}  ...  a_{NN} ]

where a_{ij} = P(q_t = j | q_{t-1} = i),  1 ≤ i, j ≤ N.

Constraints on a_{ij}:
a_{ij} ≥ 0 for all i, j
Σ_{j=1}^{N} a_{ij} = 1 for all i


Markov Models: Example

States:
1. Rainy (R)
2. Cloudy (C)
3. Sunny (S)

State transition probability matrix:

A = [ 0.4  0.3  0.3 ]
    [ 0.2  0.6  0.2 ]
    [ 0.1  0.1  0.8 ]

Compute the probability of observing SSRRSCS given that today is S.


Markov Models: Example

Basic conditional probability rule:
P(A, B) = P(A | B) P(B)

The Markov chain rule:
P(q_1, q_2, ..., q_T)
  = P(q_T | q_1, q_2, ..., q_{T-1}) P(q_1, q_2, ..., q_{T-1})
  = P(q_T | q_{T-1}) P(q_1, q_2, ..., q_{T-1})
  = P(q_T | q_{T-1}) P(q_{T-1} | q_{T-2}) P(q_1, q_2, ..., q_{T-2})
  = P(q_T | q_{T-1}) P(q_{T-1} | q_{T-2}) ... P(q_2 | q_1) P(q_1)


Markov Models: Example

Observation sequence O:
O = (S, S, S, R, R, S, C, S)

Using the chain rule we get:
P(O | model)
  = P(S, S, S, R, R, S, C, S | model)
  = P(S) P(S|S) P(S|S) P(R|S) P(R|R) P(S|R) P(C|S) P(S|C)
  = π_3 a_{33} a_{33} a_{31} a_{11} a_{13} a_{32} a_{23}
  = (1)(0.8)^2 (0.1)(0.4)(0.3)(0.1)(0.2)
  = 1.536 × 10^{-4}
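As a quick sanity check, the chain-rule product above can be reproduced in a few lines of Python. This is an illustrative sketch only: the transition matrix and state order (R, C, S) come from the slides, while the function name and state dictionary are my own.

```python
import numpy as np

# Transition matrix from the slides, states ordered (R, C, S)
A = np.array([[0.4, 0.3, 0.3],
              [0.2, 0.6, 0.2],
              [0.1, 0.1, 0.8]])
state = {"R": 0, "C": 1, "S": 2}

def sequence_prob(seq, A, initial_prob=1.0):
    """P(o_1, ..., o_T) for a fully observable Markov chain."""
    p = initial_prob                      # pi for the first state (1 here, since today is S)
    for prev, cur in zip(seq, seq[1:]):
        p *= A[state[prev], state[cur]]   # multiply a_{prev,cur} along the chain
    return p

print(sequence_prob("SSSRRSCS", A))       # -> 1.536e-04
```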


Markov Models: Example

What is the probability that the sequence remains in state i for exactly d time units?

p_i(d) = P(q_1 = i, q_2 = i, ..., q_d = i, q_{d+1} ≠ i) = π_i (a_{ii})^{d-1} (1 - a_{ii})

Exponential Markov chain duration density.

What is the expected value of the duration d in state i?

d̄_i = Σ_{d=1}^{∞} d · p_i(d) = Σ_{d=1}^{∞} d (a_{ii})^{d-1} (1 - a_{ii})


Markov Models: Example

d̄_i = (1 - a_{ii}) Σ_{d=1}^{∞} d (a_{ii})^{d-1}
     = (1 - a_{ii}) ∂/∂a_{ii} ( Σ_{d=1}^{∞} (a_{ii})^{d} )
     = (1 - a_{ii}) ∂/∂a_{ii} ( a_{ii} / (1 - a_{ii}) )
     = 1 / (1 - a_{ii})


Markov Models: Example

Avg. number of consecutive sunny days  = 1 / (1 - a_{33}) = 1 / (1 - 0.8) = 5
Avg. number of consecutive cloudy days = 1 / (1 - 0.6) = 2.5
Avg. number of consecutive rainy days  = 1 / (1 - 0.4) ≈ 1.67
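The 1/(1 - a_ii) result is easy to verify numerically by truncating the infinite sum; a minimal sketch (variable names are mine):

```python
import numpy as np

a_ii = 0.8                                             # self-transition probability for "sunny"
d = np.arange(1, 10_000)
expected_d = np.sum(d * a_ii**(d - 1) * (1 - a_ii))    # sum_d d * (a_ii)^(d-1) * (1 - a_ii)
print(expected_d, 1 / (1 - a_ii))                      # both ~5.0
```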


Hidden Markov Models

States are not observable.
Observations are probabilistic functions of the state.
State transitions are still probabilistic.


HMM: An Example

Weather model: 3 "hidden" states {rainy, cloudy, sunny}.
Measure weather-related variables (e.g. temperature, humidity, barometric pressure).
Problem: Forecast the weather state, given the current weather variables.


Urn and Ball Model

N urns containing colored balls
M distinct colors of balls
Each urn has a (possibly) different distribution of colors

Sequence generation algorithm:
1. Pick an initial urn according to some random process.
2. Randomly pick a ball from the urn, note its color, and then replace it.
3. Select another urn according to a random selection process associated with the current urn.
4. Repeat steps 2 and 3.


Elements of Hidden Markov Models

N – the number of hidden states
Q – the set of states, Q = {1, 2, ..., N}
M – the number of symbols
V – the set of symbols, V = {1, 2, ..., M}
A – the state-transition probability matrix:
    a_{ij} = P(q_{t+1} = j | q_t = i),  1 ≤ i, j ≤ N
B – the observation probability distribution:
    B_j(k) = P(o_t = v_k | q_t = j),  1 ≤ j ≤ N, 1 ≤ k ≤ M
π – the initial state distribution:
    π_i = P(q_1 = i),  1 ≤ i ≤ N
λ – the entire model, λ = (A, B, π)
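In code, a discrete HMM λ = (A, B, π) is just three arrays. The sketch below fixes the data layout assumed by the later algorithm snippets; the class and field names are mine, mirroring the slide notation, and the numbers instantiate the coin-tossing model from a later slide.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class HMM:
    A: np.ndarray    # (N, N) state-transition matrix, A[i, j] = P(q_{t+1}=j | q_t=i)
    B: np.ndarray    # (N, M) emission matrix, B[j, k] = P(o_t = v_k | q_t = j)
    pi: np.ndarray   # (N,)   initial state distribution

# Coin-tossing model from the example slide: symbols 0 = H, 1 = T
coin = HMM(A=np.full((3, 3), 1/3),
           B=np.array([[0.5, 0.5], [0.75, 0.25], [0.25, 0.75]]),
           pi=np.full(3, 1/3))
```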


Three Basic Problems

1. EVALUATION – given an observation sequence O = (o_1, o_2, ..., o_T) and a model λ = (A, B, π), efficiently compute P(O | λ).
   The hidden states complicate the evaluation.
   Given two models λ_1 and λ_2, this can be used to choose the better one.

2. DECODING – given an observation sequence O = (o_1, o_2, ..., o_T) and a model λ, find the optimal state sequence q = (q_1, q_2, ..., q_T).
   The optimality criterion has to be decided (e.g. maximum likelihood).

3. LEARNING – given O = (o_1, o_2, ..., o_T), estimate the model parameters λ = (A, B, π) that maximize P(O | λ).


Solution to Problem 1

Problem: Compute P(o_1, o_2, ..., o_T | λ).

Algorithm: Let q = (q_1, q_2, ..., q_T) be a state sequence.

Assume the observations are independent given the states:
P(O | q, λ) = Π_{t=1}^{T} P(o_t | q_t, λ) = b_{q_1}(o_1) b_{q_2}(o_2) ... b_{q_T}(o_T)

The probability of a particular state sequence is:
P(q | λ) = π_{q_1} a_{q_1 q_2} a_{q_2 q_3} ... a_{q_{T-1} q_T}

Also,
P(O, q | λ) = P(O | q, λ) P(q | λ)


Solution to Problem 1

Enumerate all paths and sum the probabilities:
P(O | λ) = Σ_q P(O | q, λ) P(q | λ)

There are N^T state sequences, each requiring O(T) calculations.
Complexity: O(T N^T) calculations.
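For tiny models the sum over all N^T paths can be carried out literally, which makes the O(T N^T) cost concrete. A sketch, assuming the HMM container defined earlier:

```python
import itertools
import numpy as np

def brute_force_likelihood(hmm, obs):
    """P(O | lambda) by enumerating every state sequence (exponential in T!)."""
    N, T = len(hmm.pi), len(obs)
    total = 0.0
    for q in itertools.product(range(N), repeat=T):        # all N^T sequences
        p = hmm.pi[q[0]] * hmm.B[q[0], obs[0]]
        for t in range(1, T):
            p *= hmm.A[q[t-1], q[t]] * hmm.B[q[t], obs[t]]
        total += p
    return total

obs = [0, 0, 1]                    # H, H, T with the coin model
print(brute_force_likelihood(coin, obs))
```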


Forward Procedure: Intuition

[Figure: trellis over states 1, 2, 3, ..., N and time steps t, t+1; every state i at time t feeds state k at time t+1 with transition probability a_{ik} (a_{1k}, a_{3k}, ..., a_{Nk} shown).]


Forward Algorithm

Define the forward variable α_t(i) as:
α_t(i) = P(o_1, o_2, ..., o_t, q_t = i | λ)

α_t(i) is the probability of observing the partial sequence (o_1, o_2, ..., o_t) such that the state at time t is i.

1. Initialization:
   α_1(i) = π_i b_i(o_1)
2. Induction:
   α_{t+1}(j) = [ Σ_{i=1}^{N} α_t(i) a_{ij} ] b_j(o_{t+1})
3. Termination:
   P(O | λ) = Σ_{i=1}^{N} α_T(i)
4. Complexity: O(N^2 T)
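A direct translation of the three steps above into NumPy; a sketch using the HMM container defined earlier (no scaling of α, so it will underflow on long sequences).

```python
import numpy as np

def forward(hmm, obs):
    """alpha[t, i] = P(o_1..o_t, q_t = i | lambda); returns alpha and P(O | lambda)."""
    N, T = len(hmm.pi), len(obs)
    alpha = np.zeros((T, N))
    alpha[0] = hmm.pi * hmm.B[:, obs[0]]                    # initialization
    for t in range(1, T):
        alpha[t] = (alpha[t-1] @ hmm.A) * hmm.B[:, obs[t]]  # induction: O(N^2) per step
    return alpha, alpha[-1].sum()                           # termination

_, likelihood = forward(coin, [0, 0, 1])
print(likelihood)   # should match the brute-force value
```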


Example

Consider the following coin-tossing experiment:
– state-transition probabilities equal to 1/3
– initial state probabilities equal to 1/3

          State 1   State 2   State 3
P(H)       0.5       0.75      0.25
P(T)       0.5       0.25      0.75


Example

1. You observe O = (H, H, H, H, T, H, T, T, T, T). What state sequence q is most likely? What is the joint probability P(O, q | λ) of the observation sequence and that state sequence?

2. What is the probability that the observation sequence came entirely from state 1?

3. Consider the observation sequence O = (H, T, T, H, T, H, H, T, T, H). How would your answers to parts 1 and 2 change?


Example

4. If the state transition probabilities were:

A' = [ 0.9   0.45  0.45 ]
     [ 0.05  0.1   0.45 ]
     [ 0.05  0.45  0.1  ]

how would the new model λ' change your answers to parts 1-3?


Backward Algorithm

[Figure: trellis over states 1, ..., N and times 1, 2, ..., t-1, t, t+1, t+2, ..., T-1, T, with observations o_1, o_2, ..., o_T along the time axis.]


Backward Algorithm

Define the backward variable β_t(i) as:
β_t(i) = P(o_{t+1}, o_{t+2}, ..., o_T | q_t = i, λ)

β_t(i) is the probability of observing the partial sequence (o_{t+1}, o_{t+2}, ..., o_T) given that the state at time t is i.

1. Initialization:
   β_T(i) = 1
2. Induction:
   β_t(i) = Σ_{j=1}^{N} a_{ij} b_j(o_{t+1}) β_{t+1}(j),  1 ≤ i ≤ N,  t = T-1, T-2, ..., 1
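The mirror image of the forward pass; a sketch in the same style as the forward snippet (again unscaled), with names of my own choosing:

```python
import numpy as np

def backward(hmm, obs):
    """beta[t, i] = P(o_{t+1}..o_T | q_t = i, lambda)."""
    N, T = len(hmm.pi), len(obs)
    beta = np.zeros((T, N))
    beta[-1] = 1.0                                          # initialization
    for t in range(T - 2, -1, -1):
        beta[t] = hmm.A @ (hmm.B[:, obs[t+1]] * beta[t+1])  # induction, backwards in time
    return beta

beta = backward(coin, [0, 0, 1])
print(beta[0] @ (coin.pi * coin.B[:, 0]))   # another way to compute P(O | lambda)
```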


Solution to Problem 2 (1)

The difficulty lies in defining the optimality criterion.
One choice: pick the state q_t that is individually most likely at each time, i.e. maximize the expected number of individually correct states.


Solution to Problem 2 (1)

Define γ_t(i), the probability of being in state S_i at time t, given O and λ:

γ_t(i) = P(q_t = i | O, λ)

γ_t(i) = α_t(i) β_t(i) / P(O | λ) = α_t(i) β_t(i) / Σ_{i=1}^{N} α_t(i) β_t(i)


Solution to Problem 2 (1)

q_t* = argmax_{1 ≤ i ≤ N} [ γ_t(i) ],  1 ≤ t ≤ T

Although this maximizes the expected number of correct states, there can be problems with the resulting state sequence: the individually most likely states need not form a valid path (it may chain together transitions of probability zero).
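Putting γ and the argmax rule together; a sketch built on the forward/backward functions above. Note that nothing forces consecutive argmax choices to form a path with nonzero transition probability.

```python
import numpy as np

def posterior_decode(hmm, obs):
    """Pick the individually most likely state at each time: argmax_i gamma_t(i)."""
    alpha, likelihood = forward(hmm, obs)
    beta = backward(hmm, obs)
    gamma = alpha * beta / likelihood        # gamma[t, i] = P(q_t = i | O, lambda)
    return gamma, gamma.argmax(axis=1)

gamma, states = posterior_decode(coin, [0, 0, 1])
print(states)   # one state index per observation
```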


Solution to Problem 2 (2)

Choose the most likely path: find the path (q_1, q_2, ..., q_T) that maximizes the likelihood P(q_1, q_2, ..., q_T | O, λ).
Solution by dynamic programming.

Define:
δ_t(i) = max_{q_1, q_2, ..., q_{t-1}} P(q_1, q_2, ..., q_{t-1}, q_t = i, o_1, o_2, ..., o_t | λ)

δ_t(i) is the best score (highest probability) along a single path that accounts for the first t observations and ends in state S_i.

By induction we have:
δ_{t+1}(j) = [ max_i δ_t(i) a_{ij} ] · b_j(o_{t+1})


Viterbi Algorithm

[Figure: trellis over states 1, ..., N and times 1, 2, ..., T with observations o_1, o_2, ..., o_T along the time axis; transitions a_{1k}, ..., a_{Nk} lead into state k.]


Viterbi Algorithm

Initialization:
δ_1(i) = π_i b_i(o_1),  1 ≤ i ≤ N
ψ_1(i) = 0

Recursion:
δ_t(j) = max_{1 ≤ i ≤ N} [ δ_{t-1}(i) a_{ij} ] b_j(o_t)
ψ_t(j) = argmax_{1 ≤ i ≤ N} [ δ_{t-1}(i) a_{ij} ]
for 2 ≤ t ≤ T, 1 ≤ j ≤ N

Termination:
P* = max_{1 ≤ i ≤ N} [ δ_T(i) ]
q_T* = argmax_{1 ≤ i ≤ N} [ δ_T(i) ]

Path (state sequence) backtracking:
q_t* = ψ_{t+1}(q_{t+1}*),  t = T-1, T-2, ..., 1
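The same recursion in NumPy; a sketch using the HMM container from earlier (in practice one works in log space to avoid underflow on long sequences).

```python
import numpy as np

def viterbi(hmm, obs):
    """Most likely state path and its joint probability P(O, q* | lambda)."""
    N, T = len(hmm.pi), len(obs)
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)
    delta[0] = hmm.pi * hmm.B[:, obs[0]]                      # initialization
    for t in range(1, T):
        scores = delta[t-1][:, None] * hmm.A                  # scores[i, j] = delta_{t-1}(i) * a_ij
        psi[t] = scores.argmax(axis=0)                        # best predecessor for each j
        delta[t] = scores.max(axis=0) * hmm.B[:, obs[t]]      # recursion
    path = [int(delta[-1].argmax())]                          # termination: q_T*
    for t in range(T - 1, 0, -1):                             # backtracking
        path.append(int(psi[t][path[-1]]))
    return path[::-1], delta[-1].max()

print(viterbi(coin, [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]))   # question 1 of the example
```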


Solution to Problem 3

Estimate λ = (A, B, π) to maximize P(O | λ).
There is no analytic solution because of the complexity, so an iterative procedure is used: the Baum-Welch algorithm (an instance of the EM algorithm).

1. Let the initial model be λ_0.
2. Compute a new model λ based on λ_0 and the observation O.
3. If log P(O | λ) - log P(O | λ_0) < DELTA, stop.
4. Else set λ_0 ← λ and go to step 2.


Baum-Welch: Preliminaries

ξ_t(i, j) = P(q_t = i, q_{t+1} = j | O, λ)


Baum-Welch: Preliminaries

Define ξ_t(i, j) as the probability of being in state i at time t and in state j at time t+1:

ξ_t(i, j) = α_t(i) a_{ij} b_j(o_{t+1}) β_{t+1}(j) / P(O | λ)
          = α_t(i) a_{ij} b_j(o_{t+1}) β_{t+1}(j) / Σ_{i=1}^{N} Σ_{j=1}^{N} α_t(i) a_{ij} b_j(o_{t+1}) β_{t+1}(j)

Define γ_t(i) as the probability of being in state i at time t, given the observation sequence:

γ_t(i) = Σ_{j=1}^{N} ξ_t(i, j)


Baum-Welch: Preliminaries

Σ_{t=1}^{T} γ_t(i) is the expected number of times state i is visited.

Σ_{t=1}^{T-1} ξ_t(i, j) is the expected number of transitions from state i to state j.


Baum-Welch: Update Rules

π̄_i = expected frequency in state i at time t = 1:
π̄_i = γ_1(i)

ā_{ij} = (expected number of transitions from state i to state j) / (expected number of transitions from state i):
ā_{ij} = Σ_t ξ_t(i, j) / Σ_t γ_t(i)

b̄_j(k) = (expected number of times in state j observing symbol v_k) / (expected number of times in state j):
b̄_j(k) = Σ_{t : o_t = v_k} γ_t(j) / Σ_{t=1}^{T} γ_t(j)
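One re-estimation step, written with the forward/backward sketches above. Illustrative only: a real implementation would scale α/β to avoid underflow, pool multiple training sequences, and iterate until the log-likelihood improvement drops below DELTA; all names here are mine.

```python
import numpy as np

def baum_welch_step(hmm, obs):
    """One EM re-estimation of (A, B, pi) from a single observation sequence."""
    alpha, likelihood = forward(hmm, obs)
    beta = backward(hmm, obs)
    T, N = alpha.shape
    # xi[t, i, j] = P(q_t = i, q_{t+1} = j | O, lambda), for t = 0..T-2
    xi = (alpha[:-1, :, None] * hmm.A[None, :, :] *
          hmm.B[:, obs[1:]].T[:, None, :] * beta[1:, None, :]) / likelihood
    gamma = alpha * beta / likelihood                   # gamma[t, i] = P(q_t = i | O, lambda)
    new_pi = gamma[0]
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.zeros_like(hmm.B)
    for k in range(hmm.B.shape[1]):                     # expected emission counts per symbol
        new_B[:, k] = gamma[np.array(obs) == k].sum(axis=0) / gamma.sum(axis=0)
    return HMM(new_A, new_B, new_pi)

better = baum_welch_step(coin, [0, 0, 0, 0, 1, 0, 1, 1, 1, 1])
print(forward(better, [0, 0, 0, 0, 1, 0, 1, 1, 1, 1])[1])   # likelihood should not decrease
```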


Some Issues

Limitations imposed by the Markov chain
Scalability
Learning:
  Initialisation
  Model order
  Local maxima
  Weighting of training sequences


HMM Applications

Classification (e.g., profile HMMs): build an HMM for each class (a profile HMM), then classify a sequence using Bayes' rule.

Multiple sequence alignment: build an HMM from a set of sequences, then decode each sequence to find a multiple alignment.

Segmentation (e.g., gene finding): use different states to model different regions, then decode a sequence to reveal the region boundaries.


HMMs for Classification

C ∈ {C_1, ..., C_k}

p(C | X) = p(X | C) p(C) / p(X)
C* = argmax_C p(X | C) p(C)

p(X | C) is modeled by a profile HMM built specifically for class C, assuming example sequences are available for C.
E.g., protein families: assign a family to a query sequence X.


HMMs for Motif Finding

Given a set of sequences S = {X_1, ..., X_k}, design an HMM with two kinds of states:
Background states: model positions outside a motif
Motif states: model the motif itself
Train the HMM, e.g., using Baum-Welch (finding the HMM that maximizes the probability of S).
The "motif part" of the HMM gives a motif model (e.g., a PWM).
The HMM can be used to scan any sequence (including the X_i) to figure out where the motif is. We may also decode each sequence X_i to obtain a set of subsequences matched by the motif (e.g., a multiset of k-mers).


HMMs for Multiple Alignment

Given a set of sequences S = {X_1, ..., X_k}:
Train an HMM, e.g., using Baum-Welch (finding the HMM that maximizes the probability of S).
Decode each sequence X_i.
Assemble the Viterbi paths to form a multiple alignment: symbols belonging to the same state are aligned to each other.


HMM-based Gene Finding

Design two types of states:
"Within gene" states
"Outside gene" states
Use known genes to estimate the HMM.
Decode a new sequence to reveal which parts are genes.

Example software:
GENSCAN (Burge 1997)
FGENESH (Solovyev 1997)
HMMgene (Krogh 1997)
GENIE (Kulp 1996)
GENMARK (Borodovsky & McIninch 1993)
VEIL (Henderson, Salzberg, & Fasman 1997)


VEIL: Viterbi Exon-Intron Locator

[Figure: VEIL architecture. The exon HMM model sits among modules for Upstream, Start Codon, Exon, 3' Splice Site, Intron, 5' Splice Site, Stop Codon, Poly-A Site, and Downstream regions.]

Enter the exon model via a start codon or from an intron (3' splice site).
Exit via a 5' splice site or one of the three stop codons (taa, tag, tga).


Solutions to the Local Maxima Problem

Repeat with different initializations.
Start with the most reasonable initial model.
Use simulated annealing (slow down the convergence).


Local Maxima: Illustration

[Figure: likelihood surface with a global maximum and a local maximum; a good starting point leads to the global maximum, a bad starting point to the local one.]


Optimal Model Construction

p(HMM | X) = p(X | HMM) p(HMM) / p(X)

HMM* = argmax_{HMM} p(HMM | X) = argmax_{HMM} p(X | HMM) p(HMM)

Bayesian model selection:
P(HMM) should prefer simpler models (i.e., more constrained, fewer states, fewer transitions).
P(HMM) could reflect our prior on the parameters.


Sequence Weighting

Avoid over-counting similar sequences from the same organisms.
Typically compute a weight for each sequence based on an evolutionary tree.
There are many ways to incorporate the weights, e.g.:
Unequal likelihood contributions
Unequal weight contributions in parameter estimation


Toolkits for HMM

Hidden Markov Model Toolkit (HTK): http://htk.eng.cam.ac.uk/
Hidden Markov Model (HMM) Toolbox for Matlab: http://www.cs.ubc.ca/~murphyk/Software/HMM/hmm.html
Training HMM for ASR: http://cslu.cse.ogi.edu/tutordemos/nnet_training/tutorial.html#1.1_Setup


Further Reading

L. Rabiner and B. Juang. An introduction to hidden Markov models. IEEE ASSP Magazine, pp. 4-16, Jan. 1986.