
Page 1:

INTRODUCTION TO MACHINE LEARNING

HIDDEN MARKOV MODELS

Mingon Kang, Ph.D.

Department of Computer Science @ UNLV

* The contents are adapted from Dr. Jean Gao at UT Arlington

Page 2:

A Simple HMM

[Figure: a small HMM over the DNA alphabet {A, C, G, T}]

given, say, a T in our input sequence, which state emitted it?

Page 3:

Hidden State

we’ll distinguish between the observed parts of a problem and the hidden parts

in the Markov models we’ve considered previously, it is clear which state accounts for each part of the observed sequence

in the model above, there are multiple states that could account for each part of the observed sequence

this is the hidden part of the problem

Page 4:

Observed: Lab, Coffee, Lab, Coffee, Lab, Lab, Bar, Bar, Coffee, Bar, Coffee

Hidden: H, H, H, H, H, H, D, D, D, D, D

Page 5:

The Parameters of an HMM

as in Markov chain models, we have transition probabilities:

a_kl = Pr(π_i = l | π_{i−1} = k)

the probability of a transition from state k to l

since we’ve decoupled states and characters, we might also have emission probabilities:

e_k(b) = Pr(x_i = b | π_i = k)

the probability of emitting character b in state k

π represents a path (a sequence of states) through the model

Page 6:

A Simple HMM

e_2(A) = 0.4: the probability of emitting character A in state 2

a_13 = 0.8: the probability of a transition from state 1 to state 3

[Figure: a five-state HMM with a begin state (0), four emitting states (1-4), each with its own emission distribution over {A, C, G, T}, and an end state (5); the arcs are labeled with transition probabilities]

Page 7:

Simple HMM for Gene Finding

Figure from A. Krogh, An Introduction to Hidden Markov Models for Biological Sequences

[Figure: Krogh’s example gene-finding HMM over the DNA alphabet, with states emitting A, C, G, or T and example sequences such as CTACG, CTACA, CTACC, and CTACT]

Page 8:

Three Important Questions

How likely is a given sequence?

the Forward algorithm

What is the most probable “path” for generating a given sequence?

the Viterbi algorithm

How can we learn the HMM parameters given a set of sequences?

the Forward-Backward (Baum-Welch) algorithm

Page 9:

How Likely is a Given Sequence?

the probability that a CERTAIN path π is taken and the sequence is generated (assuming begin/end are the only silent states on the path):

Pr(x_1 … x_L, π) = a_{π_0 π_1} · ∏_{i=1}^{L} e_{π_i}(x_i) · a_{π_i π_{i+1}}

where x_1 … x_L is the observed sequence and π_0 … π_{L+1} is the state path, with π_0 = 0 the begin state and π_{L+1} = N the end state

Page 10:

How Likely Is A Given Sequence?

[Figure: the example HMM again, with emission tables for states 1-4 and transition probabilities on the arcs]

Pr(AAC, π) = a_01 · e_1(A) · a_11 · e_1(A) · a_13 · e_3(C) · a_35

= 0.5 · 0.4 · 0.2 · 0.4 · 0.8 · 0.3 · 0.6
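A small Python sketch of this computation follows. The transition/emission values and the state numbering are reconstructed from the figure above (so treat the exact numbers as assumptions), and joint_prob is just an illustrative helper implementing the product from the previous page.

```python
# Example HMM parameters as reconstructed from the figure (assumptions):
# state 0 = begin, states 1-4 emit characters, state 5 = end.
transitions = {0: {1: 0.5, 2: 0.5}, 1: {1: 0.2, 3: 0.8}, 2: {2: 0.8, 4: 0.2},
               3: {3: 0.4, 5: 0.6}, 4: {4: 0.1, 5: 0.9}}
emissions = {1: {"A": 0.4, "C": 0.1, "G": 0.2, "T": 0.3},
             2: {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
             3: {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
             4: {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1}}

def joint_prob(x, path, a=transitions, e=emissions, begin=0, end=5):
    """Pr(x, path) = a_{0,pi_1} * prod_i e_{pi_i}(x_i) * a_{pi_i,pi_{i+1}}."""
    states = [begin] + list(path) + [end]
    p = 1.0
    for k, l in zip(states, states[1:]):
        p *= a[k][l]                  # transition factors
    for ch, k in zip(x, path):
        p *= e[k][ch]                 # emission factors
    return p

print(joint_prob("AAC", [1, 1, 3]))   # 0.5*0.4*0.2*0.4*0.8*0.3*0.6 ≈ 0.0023
```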

Page 11:

How Likely is a Given Sequence?

the probability over all paths is:

Pr(x_1 … x_L) = Σ_π Pr(x_1 … x_L, π)

• but the number of paths can be exponential in the length of the sequence…

• the Forward algorithm enables us to compute this efficiently

Page 12:

How Likely is a Given Sequence: The Forward Algorithm

define f_k(i) to be the probability of being in state k having observed the first i characters of x:

f_k(i) = Pr(x_1, x_2, …, x_i, π_i = k)

• we want to compute f_N(L), the probability of being in the end state having observed all of x:

f_N(L) = Pr(x_1, x_2, …, x_L)

• we can define this recursively

Page 13:

The Forward Algorithm

because of the Markov property, don’t have to explicitly enumerate every path – use dynamic programming instead

• e.g. compute f_4(i) using f_2(i−1) and f_4(i−1)

[Figure: the example HMM again, with its emission tables and transition probabilities]

Page 14:

The Forward Algorithm

initialization:

f_0(0) = 1 (the probability that we’re in the start state and have observed 0 characters from the sequence)

f_k(0) = 0 for states k that are not silent states

Page 15:

The Forward Algorithm

• recursion for emitting states (i = 1…L):

f_l(i) = e_l(x_i) · Σ_k f_k(i−1) · a_kl

• recursion for silent states:

f_l(i) = Σ_k f_k(i) · a_kl

Page 16:

The Forward Algorithm

termination:

Pr(x) = Pr(x_1 … x_L) = f_N(L) = Σ_k f_k(L) · a_kN

(the probability that we’re in the end state and have observed the entire sequence)

Page 17:

Forward Algorithm Example

[Figure: the example HMM with its emission tables and transition probabilities]

given the sequence x = TAGA

Page 18:

Forward Algorithm Example

given the sequence x = TAGA

initialization: f_0(0) = 1, f_1(0) = … = f_5(0) = 0

• computing other values:

f_1(1) = e_1(T) · (f_0(0) · a_01 + f_1(0) · a_11) = 0.3 · (1 · 0.5 + 0 · 0.2) = 0.15

f_2(1) = 0.4 · (1 · 0.5 + 0 · 0.8) = 0.2

f_1(2) = e_1(A) · (f_0(1) · a_01 + f_1(1) · a_11) = 0.4 · (0 · 0.5 + 0.15 · 0.2)

…

Pr(TAGA) = f_5(4) = f_3(4) · a_35 + f_4(4) · a_45
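The same example in code: a minimal sketch of the Forward algorithm, reusing the parameter values reconstructed earlier (the dictionaries and the forward helper are illustrative assumptions, not code from the lecture).

```python
# Reconstructed example parameters (same assumptions as in the earlier sketch).
transitions = {0: {1: 0.5, 2: 0.5}, 1: {1: 0.2, 3: 0.8}, 2: {2: 0.8, 4: 0.2},
               3: {3: 0.4, 5: 0.6}, 4: {4: 0.1, 5: 0.9}}
emissions = {1: {"A": 0.4, "C": 0.1, "G": 0.2, "T": 0.3},
             2: {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
             3: {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
             4: {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1}}

def forward(x, a=transitions, e=emissions, begin=0, end=5):
    """Pr(x) = f_end(L), assuming begin/end are the only silent states."""
    emitting = sorted(e)
    f = {begin: 1.0, **{k: 0.0 for k in emitting}}     # initialization: f_0(0) = 1
    for ch in x:
        # recursion for emitting states: f_l(i) = e_l(x_i) * sum_k f_k(i-1) * a_kl
        f = {l: e[l][ch] * sum(f[k] * a.get(k, {}).get(l, 0.0) for k in f)
             for l in emitting}
    # termination: Pr(x) = sum_k f_k(L) * a_k,end
    return sum(f[k] * a[k].get(end, 0.0) for k in emitting)

print(forward("TAGA"))
```

With these values the intermediate quantities match the hand computation above, e.g. f_1(1) = 0.15 and f_2(1) = 0.2.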

Page 19:

Forward Algorithm Note

in some cases, we can make the algorithm more efficient by taking into account the minimum number of steps that must be taken to reach a state

[Figure: the example HMM topology (begin state 0, emitting states 1-4, end state 5)]

• e.g. for this HMM, we don’t need to initialize or compute the values f_3(0), f_4(0), f_5(0), f_5(1)

Page 20:

Three Important Questions

How likely is a given sequence?

What is the most probable “path” for generating a given sequence?

How can we learn the HMM parameters given a set of sequences?

Page 21:

Finding the Most Probable Path:

The Viterbi Algorithm

define v_k(i) to be the probability of the most probable path accounting for the first i characters of x and ending in state k:

v_k(i) = max_{π_1, …, π_{i−1}} Pr(π_1, …, π_{i−1}, π_i = k, x_1, x_2, …, x_i)

• we want to compute v_N(L), the probability of the most probable path accounting for all of the sequence and ending in the end state

• can define v_k(i) recursively

• can use dynamic programming (DP) to find v_N(L) efficiently

Page 22:

Finding the Most Probable Path:

The Viterbi Algorithm

initialization:

v_0(0) = 1

v_k(0) = 0 for states k that are not silent states

Page 23:

The Viterbi Algorithm

recursion for emitting states (i = 1…L):

v_l(i) = e_l(x_i) · max_k [v_k(i−1) · a_kl]

ptr_l(i) = argmax_k [v_k(i−1) · a_kl]

• recursion for silent states:

v_l(i) = max_k [v_k(i) · a_kl]

ptr_l(i) = argmax_k [v_k(i) · a_kl]

the ptr values keep track of the most probable path

Page 24:

The Viterbi Algorithm

• termination:

Pr(x, π*) = max_k [v_k(L) · a_kN]

π*_L = argmax_k [v_k(L) · a_kN]

• traceback: follow pointers back starting at π*_L
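A matching Viterbi sketch under the same reconstructed parameters (again an illustrative helper, not the lecture’s code; ties between equally probable predecessors are broken arbitrarily):

```python
# Reconstructed example parameters (same assumptions as in the Forward sketch).
transitions = {0: {1: 0.5, 2: 0.5}, 1: {1: 0.2, 3: 0.8}, 2: {2: 0.8, 4: 0.2},
               3: {3: 0.4, 5: 0.6}, 4: {4: 0.1, 5: 0.9}}
emissions = {1: {"A": 0.4, "C": 0.1, "G": 0.2, "T": 0.3},
             2: {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
             3: {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
             4: {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1}}

def viterbi(x, a=transitions, e=emissions, begin=0, end=5):
    """Return (Pr(x, pi*), pi*) for sequence x."""
    emitting = sorted(e)
    v = {begin: 1.0, **{k: 0.0 for k in emitting}}    # initialization: v_0(0) = 1
    ptrs = []
    for ch in x:
        v_new, back = {}, {}
        for l in emitting:
            # v_l(i) = e_l(x_i) * max_k v_k(i-1) * a_kl, remembering the argmax
            k_best = max(v, key=lambda k: v[k] * a.get(k, {}).get(l, 0.0))
            v_new[l] = e[l][ch] * v[k_best] * a.get(k_best, {}).get(l, 0.0)
            back[l] = k_best
        v, ptrs = v_new, ptrs + [back]
    # termination: Pr(x, pi*) = max_k v_k(L) * a_k,end
    last = max(emitting, key=lambda k: v[k] * a[k].get(end, 0.0))
    prob = v[last] * a[last].get(end, 0.0)
    path = [last]
    for back in reversed(ptrs[1:]):        # traceback: follow pointers from pi*_L
        path.append(back[path[-1]])
    return prob, path[::-1]

print(viterbi("TAGA"))
```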

Page 25:

Forward & Viterbi Algorithms

Forward/Viterbi algorithms effectively consider all possible paths for a sequence

Forward to find the probability of a sequence

Viterbi to find the most probable path

consider a sequence of length 4…

[Figure: all possible state paths through the example HMM for a sequence of length 4]
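As a sanity check on this claim, here is a brute-force sketch that enumerates every possible path for a length-4 sequence under the reconstructed example parameters, scores each with the joint probability, and then sums the scores (what Forward computes) or takes their maximum (what Viterbi computes):

```python
# Brute-force enumeration of all 4^4 state paths for a length-4 sequence
# (parameters are the same reconstructed assumptions as in the earlier sketches).
from itertools import product

a = {0: {1: 0.5, 2: 0.5}, 1: {1: 0.2, 3: 0.8}, 2: {2: 0.8, 4: 0.2},
     3: {3: 0.4, 5: 0.6}, 4: {4: 0.1, 5: 0.9}}
e = {1: {"A": 0.4, "C": 0.1, "G": 0.2, "T": 0.3},
     2: {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
     3: {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
     4: {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1}}

def joint(x, path):
    """Pr(x, path); 0 if the path uses a transition that does not exist."""
    states = [0] + list(path) + [5]
    p = 1.0
    for k, l in zip(states, states[1:]):
        p *= a.get(k, {}).get(l, 0.0)
    for ch, k in zip(x, path):
        p *= e[k][ch]
    return p

x = "TAGA"
scores = [joint(x, path) for path in product(sorted(e), repeat=len(x))]
print(sum(scores))   # total probability of x: what the Forward algorithm returns
print(max(scores))   # probability of the single best path: what Viterbi returns
```

This only works because the sequence is short; the number of paths grows exponentially with sequence length, which is exactly why the dynamic-programming recursions above are needed.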

Page 26:

Three Important Questions

How likely is a given sequence?

What is the most probable “path” for generating a given sequence?

How can we learn the HMM parameters given a set of sequences?

Page 27:

The Learning Task

given:

a model

a set of sequences (the training set)

do:

find the most likely parameters to explain the training sequences

the goal is to find a model that generalizes well to sequences we haven’t seen before

Page 28:

Learning Parameters

if we know the state path for each training sequence, learning the model parameters is simple

no hidden state during training

count how often each parameter is used

normalize/smooth to get probabilities

process just like it was for Markov chain models

if we don’t know the path for each training sequence, how can we determine the counts?

key insight: estimate the counts by considering every path weighted by its probability
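To make the known-path case concrete, here is a small sketch that counts how often each transition and emission is used in labeled training pairs and then normalizes the counts into probabilities; the two training pairs are hypothetical and smoothing (e.g. pseudocounts) is omitted.

```python
# Parameter estimation when the state path is known for every training sequence:
# count parameter uses, then normalize (the toy labels below are hypothetical).
from collections import Counter, defaultdict

training = [("ACGT", [1, 1, 3, 3]),   # (sequence, state path)
            ("AAC",  [1, 1, 3])]

trans_counts = defaultdict(Counter)   # trans_counts[k][l]: times transition k -> l used
emit_counts = defaultdict(Counter)    # emit_counts[k][b]:  times state k emitted b
for seq, path in training:
    states = [0] + path + [5]         # add begin (0) and end (5) states
    for k, l in zip(states, states[1:]):
        trans_counts[k][l] += 1
    for b, k in zip(seq, path):
        emit_counts[k][b] += 1

# normalize the counts into probabilities
a = {k: {l: n / sum(c.values()) for l, n in c.items()} for k, c in trans_counts.items()}
e = {k: {b: n / sum(c.values()) for b, n in c.items()} for k, c in emit_counts.items()}
print(a)
print(e)
```

With real data one would typically smooth these counts (e.g. add pseudocounts) before normalizing, as noted above.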

Page 29:

Learning without Hidden State


learning is simple if we know the correct path for each sequence in our training set

• estimate parameters by counting the number of times each parameter is used across the training set

Page 30:

Learning with Hidden State

if we don’t know the correct path for each sequence in our training set, consider all possible paths for the sequence

• estimate parameters through a procedure that counts the expected number of times each parameter is used across the training set

Page 31:

Learning Parameters:

The Baum-Welch Algorithm

a.k.a. the Forward-Backward algorithm

an Expectation Maximization (EM) algorithm

EM is a family of algorithms for learning probabilistic models in problems that involve hidden state

in this context, the hidden state is the path that best explains each training sequence
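The sketch below shows one Baum-Welch (forward-backward) iteration. To keep it compact it uses the standard textbook parameterization (an initial-state distribution plus dense transition and emission matrices) instead of the explicit begin/end silent states in these slides, and all numbers are made-up toy values; the point is only to show how the expected counts from the E-step feed the re-estimation in the M-step.

```python
import numpy as np

def baum_welch_step(obs, pi, A, B):
    """One EM iteration; obs is a list of symbol indices. Returns (pi, A, B)."""
    N, L = A.shape[0], len(obs)
    # E-step: forward (f) and backward (b) probabilities
    f = np.zeros((L, N)); b = np.zeros((L, N))
    f[0] = pi * B[:, obs[0]]
    for i in range(1, L):
        f[i] = B[:, obs[i]] * (f[i - 1] @ A)
    b[-1] = 1.0
    for i in range(L - 2, -1, -1):
        b[i] = A @ (B[:, obs[i + 1]] * b[i + 1])
    px = f[-1].sum()                      # Pr(obs) under the current parameters
    gamma = f * b / px                    # gamma[i, k] = Pr(state k at position i | obs)
    # xi[i, k, l] = Pr(state k at i, state l at i+1 | obs): expected transition counts
    xi = (f[:-1, :, None] * A[None, :, :] *
          (B[:, obs[1:]].T * b[1:])[:, None, :]) / px
    # M-step: turn expected counts into new parameters
    new_pi = gamma[0]
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.zeros_like(B)
    for s in range(B.shape[1]):
        new_B[:, s] = gamma[np.array(obs) == s].sum(axis=0)
    new_B /= gamma.sum(axis=0)[:, None]
    return new_pi, new_A, new_B

# toy 2-state, 2-symbol model (all values hypothetical)
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
obs = [0, 1, 1, 0, 0, 1]
for _ in range(10):                       # repeat until (local) convergence
    pi, A, B = baum_welch_step(obs, pi, A, B)
print(A)
print(B)
```

Each iteration cannot decrease the likelihood of the training data, so repeating the step converges to a local optimum of the parameters.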

Page 32:

HMM for DNA sequence

Eddy, “Hidden Markov models”, 1996

Page 33:

HMM for modeling eukaryotic genes

Yoon, “Hidden Markov Models and their Applications in Biological Sequence Analysis”, Current Genomics, 2009