computational genomics lecture 10 hidden markov models (hmms) © ydo wexler & dan geiger...

31
. Computational Genomics Lecture 10 Hidden Markov Models (HMMs) © Ydo Wexler & Dan Geiger (Technion) and by Nir Friedman (HU) Modified by Benny Chor (TAU)

Upload: mark-morgan

Post on 18-Dec-2015

218 views

Category:

Documents


3 download

TRANSCRIPT

Page 1: Computational Genomics Lecture 10 Hidden Markov Models (HMMs) © Ydo Wexler & Dan Geiger (Technion) and by Nir Friedman (HU) Modified by Benny Chor (TAU)

.

Computational Genomics Lecture 10

Hidden Markov Models (HMMs)

© Ydo Wexler & Dan Geiger (Technion) and by Nir Friedman (HU)

Modified by Benny Chor (TAU)

Page 2: Computational Genomics Lecture 10 Hidden Markov Models (HMMs) © Ydo Wexler & Dan Geiger (Technion) and by Nir Friedman (HU) Modified by Benny Chor (TAU)

2

Outline Finite, or Discrete, Markov Models Hidden Markov Models Three major questions: Q1.: Computing the probability of a given

observation. A1.: Forward – Backward (Baum Welch) dynamic

programming algorithm. Q2.: Computing the most probable sequence, given

an observation.

A2.: Viterbi’s dynamic programming Algorithm Q3.: Learn best model, given an observation,.

A3.: Expectation Maximization (EM): A Heuristic.

Page 3: Computational Genomics Lecture 10 Hidden Markov Models (HMMs) © Ydo Wexler & Dan Geiger (Technion) and by Nir Friedman (HU) Modified by Benny Chor (TAU)

3

Markov Models A discrete (finite) system:

N distinct states. Begins (at time t=1) in some initial state(s). At each time step (t=1,2,…) the system moves

from current to next state (possibly the same as

the current state) according to transition

probabilities associated with current state. This kind of system is called a finite, or discrete

Markov model

After Andrei Andreyevich Markov (1856 -1922)

Page 4: Computational Genomics Lecture 10 Hidden Markov Models (HMMs) © Ydo Wexler & Dan Geiger (Technion) and by Nir Friedman (HU) Modified by Benny Chor (TAU)

4

Outline

Markov Chains (Markov Models) Hidden Markov Chains (HMMs) Algorithmic Questions Biological Relevance

Page 5: Computational Genomics Lecture 10 Hidden Markov Models (HMMs) © Ydo Wexler & Dan Geiger (Technion) and by Nir Friedman (HU) Modified by Benny Chor (TAU)

5

Discrete Markov Model: Example Discrete Markov Model

with 5 states. Each aij represents the

probability of moving from state i to state j

The aij are given in a matrix A = {aij}

The probability to start in a given state i is i ,

The vector repre-sents these startprobabilities.

Page 6: Computational Genomics Lecture 10 Hidden Markov Models (HMMs) © Ydo Wexler & Dan Geiger (Technion) and by Nir Friedman (HU) Modified by Benny Chor (TAU)

6

Markov Property

• Markov Property: The state of the system at time t+1

depends only on the state of the system at time t

Xt=1 Xt=2 Xt=3 Xt=4 Xt=5

] x X | x P[X

] x X , x X , . . . , x X , x X | x P[X

tt11t

00111-t1-ttt11t

t

t

Page 7: Computational Genomics Lecture 10 Hidden Markov Models (HMMs) © Ydo Wexler & Dan Geiger (Technion) and by Nir Friedman (HU) Modified by Benny Chor (TAU)

7

Markov Chains

Stationarity Assumption

Probabilities independent of t when process is “stationary”

So,

This means that if system is in state i, the probability that

the system will next move to state j is pij , no matter what

the value of t is

t 1 j t i ijfor all t, P[X x |X x ] p

Page 8: Computational Genomics Lecture 10 Hidden Markov Models (HMMs) © Ydo Wexler & Dan Geiger (Technion) and by Nir Friedman (HU) Modified by Benny Chor (TAU)

8

• raining today rain tomorrow prr = 0.4

• raining today no rain tomorrow prn = 0.6

• no raining today rain tomorrow pnr = 0.2

• no raining today no rain tomorrow prr = 0.8

Simple Minded Weather Example

Page 9: Computational Genomics Lecture 10 Hidden Markov Models (HMMs) © Ydo Wexler & Dan Geiger (Technion) and by Nir Friedman (HU) Modified by Benny Chor (TAU)

9

Simple Minded Weather Example

Transition matrix for our example

• Note that rows sum to 1

• Such a matrix is called a Stochastic Matrix

• If the rows of a matrix and the columns of a matrix all

sum to 1, we have a Doubly Stochastic Matrix

8.02.0

6.04.0P

Page 10: Computational Genomics Lecture 10 Hidden Markov Models (HMMs) © Ydo Wexler & Dan Geiger (Technion) and by Nir Friedman (HU) Modified by Benny Chor (TAU)

10

Coke vs. Pepsi

Given that a person’s last cola purchase was Coke ™, there is a 90% chance that her next cola purchase will also be Coke ™.

If that person’s last cola purchase was Pepsi™, there is an 80% chance that her next cola purchase will also be Pepsi™.

coke pepsi

0.10.9 0.8

0.2

Page 11: Computational Genomics Lecture 10 Hidden Markov Models (HMMs) © Ydo Wexler & Dan Geiger (Technion) and by Nir Friedman (HU) Modified by Benny Chor (TAU)

11

Coke vs. PepsiGiven that a person is currently a Pepsi purchaser, what is the probability that she will purchase Coke two purchases from now?

66.034.0

17.083.0

8.02.0

1.09.0

8.02.0

1.09.02P

8.02.0

1.09.0P

The transition matrices are: (corresponding to

one purchase ahead)

Page 12: Computational Genomics Lecture 10 Hidden Markov Models (HMMs) © Ydo Wexler & Dan Geiger (Technion) and by Nir Friedman (HU) Modified by Benny Chor (TAU)

12

Coke vs. Pepsi

Given that a person is currently a Coke drinker, what is the probability that she will purchase Pepsi three purchases from now?

562.0438.0

219.0781.0

66.034.0

17.083.0

8.02.0

1.09.03P

Page 13: Computational Genomics Lecture 10 Hidden Markov Models (HMMs) © Ydo Wexler & Dan Geiger (Technion) and by Nir Friedman (HU) Modified by Benny Chor (TAU)

13

Coke vs. Pepsi

Assume each person makes one cola purchase per week. Suppose 60% of all people now drink Coke, and 40% drink Pepsi.

What fraction of people will be drinking Coke three weeks from now?

6438.0438.04.0781.06.0)0( )3(101

)3(000

1

0

)3(03

pQpQpQXPi

ii

Let (Q0,Q1)=(0.6,0.4) be the initial probabilities.

We will regard Coke as 0 and Pepsi as 1

We want to find P(X3=0)

8.02.0

1.09.0P

P00

562.0438.0

219.0781.03P

Page 14: Computational Genomics Lecture 10 Hidden Markov Models (HMMs) © Ydo Wexler & Dan Geiger (Technion) and by Nir Friedman (HU) Modified by Benny Chor (TAU)

14

Equilibrium (Stationary) Distribution

coke pepsi

0.10.9 0.8

0.2

Suppose 60% of all people now drink Coke, and 40% drink Pepsi. What fraction will be drinking Coke 10,100,1000,10000 … weeks from now?

For each week, probability is well defined. But does it converge to some equilibrium distribution [p0,p1]?

If it does, then eqs. : 0.9p0+0.2p1=p0, 0.8p1+0.1p0 =p1

must hold, yielding p0= 2/3, p1=1/3 .

Page 15: Computational Genomics Lecture 10 Hidden Markov Models (HMMs) © Ydo Wexler & Dan Geiger (Technion) and by Nir Friedman (HU) Modified by Benny Chor (TAU)

15

Simulation:

Markov ProcessCoke vs. Pepsi Example (cont)

week - i

Pr[

Xi =

Cok

e]

2/3 3

13

23

13

2

8.02.0

1.09.0

stationary distribution

coke pepsi

0.10.9 0.8

0.2

Page 16: Computational Genomics Lecture 10 Hidden Markov Models (HMMs) © Ydo Wexler & Dan Geiger (Technion) and by Nir Friedman (HU) Modified by Benny Chor (TAU)

16

Equilibrium (Stationary) Distribution

Whether or not there is a stationary distribution, and whether or not it is unique if it does exist, are determined by certain properties of the process. Irreducible means that every state is accessible from every other state. Aperiodic means that there exists at least one state for which the transition from that state to itself is possible. Positive recurrent means that the expected return time is finite for every state.

coke pepsi

0.10.9 0.8

0.2http://en.wikipedia.org/wiki/Markov_chain

Page 17: Computational Genomics Lecture 10 Hidden Markov Models (HMMs) © Ydo Wexler & Dan Geiger (Technion) and by Nir Friedman (HU) Modified by Benny Chor (TAU)

17

Equilibrium (Stationary) Distribution

If the Markov chain is positive recurrent, there exists a stationary distribution. If it is positive recurrent and irreducible, there exists a unique stationary distribution, and furthermore the process constructed by taking the stationary distribution as the initial distribution is ergodic. Then the average of a function f over samples of the Markov chain is equal to the average with respect to the stationary distribution,

http://en.wikipedia.org/wiki/Markov_chain

Page 18: Computational Genomics Lecture 10 Hidden Markov Models (HMMs) © Ydo Wexler & Dan Geiger (Technion) and by Nir Friedman (HU) Modified by Benny Chor (TAU)

18

Equilibrium (Stationary) Distribution

Writing P for the transition matrix, a stationary distribution is a vector π which satisfies the equation

Pπ = π . In this case, the stationary distribution π is an

eigenvector of the transition matrix, associated with the eigenvalue 1.

http://en.wikipedia.org/wiki/Markov_chain

Page 19: Computational Genomics Lecture 10 Hidden Markov Models (HMMs) © Ydo Wexler & Dan Geiger (Technion) and by Nir Friedman (HU) Modified by Benny Chor (TAU)

19

Discrete Markov Model - Example

States – Rainy:1, Cloudy:2, Sunny:3

Matrix A –

Problem – given that the weather on day 1 (t=1) is sunny(3), what is the probability for the observation O:

Page 20: Computational Genomics Lecture 10 Hidden Markov Models (HMMs) © Ydo Wexler & Dan Geiger (Technion) and by Nir Friedman (HU) Modified by Benny Chor (TAU)

20

Discrete Markov Model – Example (cont.)

The answer is -

Page 21: Computational Genomics Lecture 10 Hidden Markov Models (HMMs) © Ydo Wexler & Dan Geiger (Technion) and by Nir Friedman (HU) Modified by Benny Chor (TAU)

21

Types of Models

Ergodic model

Strongly connected - directed

path w/ positive probabilities

from each state i to state j

(but not necessarily complete

directed graph)

Page 22: Computational Genomics Lecture 10 Hidden Markov Models (HMMs) © Ydo Wexler & Dan Geiger (Technion) and by Nir Friedman (HU) Modified by Benny Chor (TAU)

24

Let Us Change Gear

Enough with these simple Markov chains.

Our next destination: Hidden Markov chains.

0.9

Fair loaded

head head

tailtail

0.9

0.1

0.1

1/2 1/4

3/41/2

Start1/2 1/2

Page 23: Computational Genomics Lecture 10 Hidden Markov Models (HMMs) © Ydo Wexler & Dan Geiger (Technion) and by Nir Friedman (HU) Modified by Benny Chor (TAU)

25

Hidden Markov Models (probabilistic finite state automata)

Often we face scenarios where states cannot be

directly observed.

We need an extension: Hidden Markov Models

a11 a22a33 a44

a12 a23a34

b11 b14

b12

b13

12 3

4

Observed

phenomenon

aij are state transition probabilities.

bik are observation (output) probabilities.

b11 + b12 + b13 + b14 = 1,

b21 + b22 + b23 + b24 = 1, etc.

Page 24: Computational Genomics Lecture 10 Hidden Markov Models (HMMs) © Ydo Wexler & Dan Geiger (Technion) and by Nir Friedman (HU) Modified by Benny Chor (TAU)

26

Hidden Markov Models - HMM

H1 H2 HL-1 HL

X1 X2 XL-1 XL

Hi

Xi

Hidden variables

Observed data

Page 25: Computational Genomics Lecture 10 Hidden Markov Models (HMMs) © Ydo Wexler & Dan Geiger (Technion) and by Nir Friedman (HU) Modified by Benny Chor (TAU)

27

Example: Dishonest Casino

Actually, what is hidden in this model?

Page 26: Computational Genomics Lecture 10 Hidden Markov Models (HMMs) © Ydo Wexler & Dan Geiger (Technion) and by Nir Friedman (HU) Modified by Benny Chor (TAU)

28

0.9

fair loaded

H HT T

0.90.1

0.1

1/2 1/43/41/2

Hidden Markov Models - HMMCoin-Tossing Example

Fair/Loaded

Head/Tail

X1 X2 XL-1 XLXi

H1 H2 HL-1 HLHi

transition probabilities

emission probabilities

Q1.: What is the probability of the sequence of observed outcome (e.g. HHHTHTTHHT), given the model?

Page 27: Computational Genomics Lecture 10 Hidden Markov Models (HMMs) © Ydo Wexler & Dan Geiger (Technion) and by Nir Friedman (HU) Modified by Benny Chor (TAU)

29

H1 H2 HL-1 HL

X1 X2 XL-1 XL

Hi

Xi

L tosses

Fair/Loaded

Head/Tail

0.9

Fair loaded

head head

tailtail

0.9

0.1

0.1

1/2 1/4

3/41/2

Start1/2 1/2

Loaded Coin Example (cont.)

Q1.: What is the probability of the sequence of observed outcome (e.g. HHHTHTTHHT), given the model?

Page 28: Computational Genomics Lecture 10 Hidden Markov Models (HMMs) © Ydo Wexler & Dan Geiger (Technion) and by Nir Friedman (HU) Modified by Benny Chor (TAU)

30

HMMs – Question I

Given an observation sequence O = (O1 O2 O3 … OL),

and a model M = {A, B, }how do we efficiently compute P( O | M ), the probability that the given model M produces the observation O in a run of length L ?

This probability can be viewed as a measure of the quality of the model M. Viewed this way, it enables

discrimination/selection among alternative models

M1, M2, M3…

Page 29: Computational Genomics Lecture 10 Hidden Markov Models (HMMs) © Ydo Wexler & Dan Geiger (Technion) and by Nir Friedman (HU) Modified by Benny Chor (TAU)

31

HMM Recognition (question I)

For a given model M = { A, B, p} and a given state sequence Q1 Q2 Q3 … QL ,, the probability of an observation sequence O1 O2 O3 … OL is

P(O|Q,M) = bQ1O1 bQ2O2 bQ3O3 …

bQTOT

For a given hidden Markov model M = { A, B, p}the probability of the state sequence Q1 Q2 Q3 … QL

is (the initial probability of Q1 is taken to be Q1)

P(Q|M) = pQ1 aQ1Q2 aQ2Q3 aQ3Q4 …

aQL-1QL

So, for a given HMM, Mthe probability of an observation sequence O1O2O3 … OT

is obtained by summing over all possible state sequences

Page 30: Computational Genomics Lecture 10 Hidden Markov Models (HMMs) © Ydo Wexler & Dan Geiger (Technion) and by Nir Friedman (HU) Modified by Benny Chor (TAU)

32

HMM – Recognition (cont.)

P(O| M) = QP(O|Q) P(Q|M)

= Q Q1 bQ1O1 aQ1Q2 bQ2O2 aQ2Q3 bQ2O2 …

Requires summing over exponentially many paths Can this be made more efficient?

Page 31: Computational Genomics Lecture 10 Hidden Markov Models (HMMs) © Ydo Wexler & Dan Geiger (Technion) and by Nir Friedman (HU) Modified by Benny Chor (TAU)

33

HMM – Recognition (cont.) Why isn’t it efficient? – O(2LQL)

For a given state sequence of length L we have about 2L calculationsP(Q|M) = Q1 aQ1Q2 aQ2Q3 aQ3Q4

… aQT-1QT

P(O|Q) = bQ1O1 bQ2O2 bQ3O3 …

bQTOT

There are QL possible state sequence So, if Q=5, and L=100, then the algorithm requires

200x5100 computations Instead, we will use the forward-backward (F-B)

algorithm of Baum (68) to do things more efficiently.