
Page 1: Hidden Markov Models (II): Log-Likelihood (Forward Algorithm)

Hidden Markov Models (II): Log-Likelihood (Forward Algorithm)

CMSC 473/673, UMBC

October 11th, 2017

Page 2: Hidden Markov Models (II): Log-Likelihood (Forward Algorithm)

Course Announcement: Assignment 2

Due next Saturday, 10/21 at 11:59 PM (~10 days)

Any questions?

Page 3: Hidden Markov Models (II): Log-Likelihood (Forward Algorithm)

Recap from last time…

Page 4: Hidden Markov Models (II): Log-Likelihood (Forward Algorithm)

Expectation Maximization (EM): a two-step, iterative algorithm

0. Assume some value for your parameters

1. E-step: count under uncertainty, assuming these parameters

2. M-step: maximize log-likelihood, assuming these uncertain (estimated) counts

Page 5: Hidden Markov Models (II): Log-Likelihood (Forward Algorithm)

Counting Requires Marginalizing

Break w into 4 disjoint pieces: z1 & w, z2 & w, z3 & w, z4 & w

p(w) = p(z1, w) + p(z2, w) + p(z3, w) + p(z4, w) = Ξ£_{z=1}^{4} p(z, w)

E-step: count under uncertainty, assuming these parameters

Page 6: Hidden Markov Models (II): Log-Likelihood (Forward Algorithm)

Agenda

HMM Motivation (Part of Speech) and Brief Definition

What is Part of Speech?

HMM Detailed Definition

HMM Tasks

Page 7: Hidden Markov Models (II): Log-Likelihood (Forward Algorithm)

Parts of Speech

Adapted from Luke Zettlemoyer

Open class words

Nouns: milk, bread, cat, cats, UMBC, Baltimore
Verbs (intransitive, transitive, ditransitive): run, speak, give
Adjectives (subsective: red, happy, large, wettest; non-subsective: fake, would-be; Kamp & Partee 1995)
Adverbs: recently, happily, then, there (location)

Closed class words

Modals, auxiliaries: can, do, may
Determiners: a, the, every, what
Prepositions: in, under, to
Particles: (set) up, (call) off, so (far), not
Conjunctions: and, or, if, because
Pronouns: I, you, there
Numbers: one, 1,324

Page 8: Hidden Markov Models (II): Log-Likelihood (Forward Algorithm)

Penn Treebank Part of Speech

SLP3: Chapter 10

Page 9: Hidden Markov Models (II): Log-Likelihood (Forward Algorithm)

http://universaldependencies.org/

Page 10: Hidden Markov Models (II): Log-Likelihood (Forward Algorithm)

Hidden Markov Models: Part of Speech

p(British Left Waffles on Falkland Islands)

(i): Adjective Noun Verb Prep Noun Noun
(ii): Noun Verb Noun Prep Noun Noun

Class-based model: p(w_i | z_i)

Bigram model of the classes: p(z_i | z_{i-1})

Model all class sequences: Ξ£_{z_1,…,z_N} p(z1, w1, z2, w2, …, z_N, w_N)

Page 11: Hidden Markov Models (II): Log-Likelihood (Forward Algorithm)

Hidden Markov Model

Goal: maximize (log-)likelihood

In practice: we don't actually observe these z values; we just see the words w

p(z1, w1, z2, w2, …, z_N, w_N) = p(z1 | z0) p(w1 | z1) ⋯ p(z_N | z_{N-1}) p(w_N | z_N) = Ξ _i p(w_i | z_i) p(z_i | z_{i-1})

If we knew the probability parameters, then we could estimate z and evaluate the likelihood… but we don't! :(

If we did observe z, estimating the probability parameters would be easy… but we don't! :(
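To make the factorization concrete, here is a minimal Python sketch that multiplies out the joint probability of one tagged sequence; the dictionaries trans and emit and the "start" symbol are illustrative placeholders, not the course's data structures.

def joint_probability(tags, words, trans, emit):
    # p(z1, w1, ..., z_N, w_N) = prod_i p(z_i | z_{i-1}) * p(w_i | z_i)
    prob = 1.0
    prev = "start"                              # z0, the BOS state
    for z, w in zip(tags, words):
        prob *= trans[prev][z] * emit[z][w]     # transition, then emission
        prev = z
    return prob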

Page 12: Hidden Markov Models (II): Log-Likelihood (Forward Algorithm)

Hidden Markov Model Terminology

Each zi can take the value of one of K latent states

Transition and emission distributions do not change

Q: How many different probability values are there with K states and V vocab items?

A: VΒ·K emission values and KΒ² transition values

p(z1, w1, z2, w2, …, z_N, w_N) = p(z1 | z0) p(w1 | z1) ⋯ p(z_N | z_{N-1}) p(w_N | z_N) = Ξ _i p(w_i | z_i) p(z_i | z_{i-1})

emission probabilities/parameters: p(w_i | z_i)

transition probabilities/parameters: p(z_i | z_{i-1})

Page 13: Hidden Markov Models (II): Log-Likelihood (Forward Algorithm)

Hidden Markov Model Representation

p(z1, w1, z2, w2, …, z_N, w_N) = p(z1 | z0) p(w1 | z1) ⋯ p(z_N | z_{N-1}) p(w_N | z_N) = Ξ _i p(w_i | z_i) p(z_i | z_{i-1})

emission probabilities/parameters: p(w_i | z_i); transition probabilities/parameters: p(z_i | z_{i-1})

[Graphical model: a latent chain z1 β†’ z2 β†’ z3 β†’ z4 β†’ …, where each z_i emits w_i. Transition edges carry p(z1 | z0), p(z2 | z1), p(z3 | z2), p(z4 | z3); emission edges carry p(w1 | z1), p(w2 | z2), p(w3 | z3), p(w4 | z4). p(z1 | z0) is the initial starting distribution ("BOS").]

Each zi can take the value of one of K latent states

Transition and emission distributions do not change

Graphical Models (see 478/678)

Page 14: Hidden Markov Models (II): Log-Likelihood (Forward Algorithm)

Example: 2-state Hidden Markov Model as a Lattice

[Lattice: two rows of states, z_i = N and z_i = V, one column per word w1 … w4 (and onward). Emission edges: p(w_i | N) and p(w_i | V). Transition edges: p(N | start) and p(V | start) into the first column, then p(N | N), p(V | N), p(N | V), p(V | V) between adjacent columns.]

Page 15: Hidden Markov Models (II): Log-Likelihood (Forward Algorithm)

Comparison of Joint Probabilities

Unigram Language Model:
p(w1, w2, …, w_N) = p(w1) p(w2) ⋯ p(w_N) = Ξ _i p(w_i)

Unigram Class-based Language Model ("K" coins):
p(z1, w1, z2, w2, …, z_N, w_N) = p(z1) p(w1 | z1) ⋯ p(z_N) p(w_N | z_N) = Ξ _i p(w_i | z_i) p(z_i)

Hidden Markov Model:
p(z1, w1, z2, w2, …, z_N, w_N) = p(z1 | z0) p(w1 | z1) ⋯ p(z_N | z_{N-1}) p(w_N | z_N) = Ξ _i p(w_i | z_i) p(z_i | z_{i-1})

Page 16: Hidden Markov Models (II): Log-Likelihood (Forward Algorithm)

Agenda

HMM Motivation (Part of Speech) and Brief Definition

What is Part of Speech?

HMM Detailed Definition

HMM Tasks

Page 17: Hidden Markov Models (II): Log-Likelihood (Forward Algorithm)

Hidden Markov Model Tasks

Calculate the (log) likelihood of an observed sequence w1, …, wN

Calculate the most likely sequence of states (for an observed sequence)

Learn the emission and transition parameters

p(z1, w1, z2, w2, …, z_N, w_N) = p(z1 | z0) p(w1 | z1) ⋯ p(z_N | z_{N-1}) p(w_N | z_N) = Ξ _i p(w_i | z_i) p(z_i | z_{i-1})

emission probabilities/parameters: p(w_i | z_i); transition probabilities/parameters: p(z_i | z_{i-1})

Page 18: Hidden Markov Models (II): Log-Likelihood (Forward Algorithm)

Hidden Markov Model Tasks

Calculate the (log) likelihood of an observed sequence w1, …, wN

Calculate the most likely sequence of states (for an observed sequence)

Learn the emission and transition parameters

p(z1, w1, z2, w2, …, z_N, w_N) = p(z1 | z0) p(w1 | z1) ⋯ p(z_N | z_{N-1}) p(w_N | z_N) = Ξ _i p(w_i | z_i) p(z_i | z_{i-1})

emission probabilities/parameters: p(w_i | z_i); transition probabilities/parameters: p(z_i | z_{i-1})

Page 19: Hidden Markov Models (II): Log-Likelihood (Forward Algorithm)

HMM Likelihood Task

Marginalize over all latent sequence joint likelihoods

p(w1, w2, …, w_N) = Ξ£_{z_1,…,z_N} p(z1, w1, z2, w2, …, z_N, w_N)

Q: In a K-state HMM for a length N observation sequence, how many summands (different latent sequences) are there?

Page 20: Hidden Markov Models (II): Log-Likelihood (Forward Algorithm)

HMM Likelihood Task

Marginalize over all latent sequence joint likelihoods

p(w1, w2, …, w_N) = Ξ£_{z_1,…,z_N} p(z1, w1, z2, w2, …, z_N, w_N)

Q: In a K-state HMM for a length N observation sequence, how many summands (different latent sequences) are there?

A: K^N

Page 21: Hidden Markov Models (II): Log-Likelihood (Forward Algorithm)

HMM Likelihood Task

Marginalize over all latent sequence joint likelihoods

p(w1, w2, …, w_N) = Ξ£_{z_1,…,z_N} p(z1, w1, z2, w2, …, z_N, w_N)

Q: In a K-state HMM for a length N observation sequence, how many summands (different latent sequences) are there?

A: K^N

Goal: Find a way to compute this exponential sum efficiently

(in polynomial time)

Page 22: Hidden Markov Models (II): Log-Likelihood (Forward Algorithm)

HMM Likelihood Task

Marginalize over all latent sequence joint likelihoods

p(w1, w2, …, w_N) = Ξ£_{z_1,…,z_N} p(z1, w1, z2, w2, …, z_N, w_N)

Q: In a K-state HMM for a length N observation sequence, how many summands (different latent sequences) are there?

A: K^N

Goal: Find a way to compute this exponential sum efficiently

(in polynomial time)

Like in language modeling, you need to model when to stop generating.

This ending state is generally not included in "K."
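As a sanity check on the K^N count, here is a small brute-force sketch (my own illustration, with hypothetical trans/emit dictionaries) that enumerates every latent sequence with itertools.product and sums the joint probabilities; it is only feasible for toy N and K, which is exactly why we want a polynomial-time alternative.

from itertools import product

def brute_force_likelihood(words, states, trans, emit):
    # Sum the joint probability over all K^N latent sequences (EOS excluded).
    total = 0.0
    for tags in product(states, repeat=len(words)):   # K^N tuples
        prob, prev = 1.0, "start"
        for z, w in zip(tags, words):
            prob *= trans[prev][z] * emit[z][w]
            prev = z
        total += prob
    return total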

Page 23: Hidden Markov Models (II): Log-Likelihood (Forward Algorithm)

2 (3)-State HMM Likelihood

[The two-state (N/V) lattice from the previous slide.]

Q: What are the latent sequences here (EOS excluded)?

Page 24: Hidden Markov Models (II): Log-Likelihood (Forward Algorithm)

2 (3)-State HMM Likelihood

[The two-state (N/V) lattice again.]

Q: What are the latent sequences here (EOS excluded)?

A:
(N, w1), (N, w2), (N, w3), (N, w4)
(N, w1), (N, w2), (N, w3), (V, w4)
(N, w1), (N, w2), (V, w3), (N, w4)
(N, w1), (N, w2), (V, w3), (V, w4)
(N, w1), (V, w2), (N, w3), (N, w4)
(N, w1), (V, w2), (N, w3), (V, w4)
(N, w1), (V, w2), (V, w3), (N, w4)
(N, w1), (V, w2), (V, w3), (V, w4)
(V, w1), (N, w2), (N, w3), (N, w4)
(V, w1), (N, w2), (N, w3), (V, w4)
… (six more)

Page 25: Hidden Markov Models (II): Log-Likelihood (Forward Algorithm)

2 (3)-State HMM Likelihood

[The two-state (N/V) lattice again.]

Q: What are the latent sequences here (EOS excluded)?

A:
(N, w1), (N, w2), (N, w3), (N, w4)
(N, w1), (N, w2), (N, w3), (V, w4)
(N, w1), (N, w2), (V, w3), (N, w4)
(N, w1), (N, w2), (V, w3), (V, w4)
(N, w1), (V, w2), (N, w3), (N, w4)
(N, w1), (V, w2), (N, w3), (V, w4)
(N, w1), (V, w2), (V, w3), (N, w4)
(N, w1), (V, w2), (V, w3), (V, w4)
(V, w1), (N, w2), (N, w3), (N, w4)
(V, w1), (N, w2), (N, w3), (V, w4)
… (six more)

Page 26: Hidden Markov Models (II): Log-Likelihood (Forward Algorithm)

2 (3)-State HMM Likelihood

[The two-state (N/V) lattice again.]

Transition probabilities p(next state | previous state):
          N      V      end
start    .7     .2     .1
N        .15    .8     .05
V        .6     .35    .05

Emission probabilities p(word | state):
          w1     w2     w3     w4
N        .7     .2     .05    .05
V        .2     .6     .1     .1

Page 27: Hidden Markov Models (II): Log-Likelihood (Forward Algorithm)

2 (3)-State HMM Likelihood

[The two-state (N/V) lattice again.]

Q: What's the probability of (N, w1), (V, w2), (V, w3), (N, w4)?

Transition probabilities p(next state | previous state):
          N      V      end
start    .7     .2     .1
N        .15    .8     .05
V        .6     .35    .05

Emission probabilities p(word | state):
          w1     w2     w3     w4
N        .7     .2     .05    .05
V        .2     .6     .1     .1

Page 28: Hidden Markov Models (II): Log-Likelihood (Forward Algorithm)

2 (3)-State HMM Likelihood

[Lattice with only the path N β†’ V β†’ V β†’ N drawn.]

Q: What's the probability of (N, w1), (V, w2), (V, w3), (N, w4)?

A: (.7*.7) * (.8*.6) * (.35*.1) * (.6*.05) β‰ˆ 0.000247

Transition probabilities p(next state | previous state):
          N      V      end
start    .7     .2     .1
N        .15    .8     .05
V        .6     .35    .05

Emission probabilities p(word | state):
          w1     w2     w3     w4
N        .7     .2     .05    .05
V        .2     .6     .1     .1
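A quick way to check this arithmetic is to type the two tables into Python and multiply along the path; this is just a sanity-check sketch, not course code.

trans = {"start": {"N": .7,  "V": .2,  "end": .1},
         "N":     {"N": .15, "V": .8,  "end": .05},
         "V":     {"N": .6,  "V": .35, "end": .05}}
emit  = {"N": {"w1": .7, "w2": .2, "w3": .05, "w4": .05},
         "V": {"w1": .2, "w2": .6, "w3": .1,  "w4": .1}}

prob, prev = 1.0, "start"
for z, w in zip(["N", "V", "V", "N"], ["w1", "w2", "w3", "w4"]):
    prob *= trans[prev][z] * emit[z][w]   # transition, then emission
    prev = z
print(prob)                        # ~0.000247
print(prob * trans[prev]["end"])   # ~0.0000123 once the ending transition is included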

Page 29: Hidden Markov Models (II): Log-Likelihood (Forward Algorithm)

2 (3)-State HMM Likelihood

[Lattice with only the path N β†’ V β†’ V β†’ N drawn.]

Q: What's the probability of (N, w1), (V, w2), (V, w3), (N, w4) with ending included (unique ending symbol "#")?

A: (.7*.7) * (.8*.6) * (.35*.1) * (.6*.05) * (.05 * 1) = 0.00001235

Transition probabilities p(next state | previous state):
          N      V      end
start    .7     .2     .1
N        .15    .8     .05
V        .6     .35    .05

Emission probabilities p(word | state):
          w1     w2     w3     w4     #
N        .7     .2     .05    .05    0
V        .2     .6     .1     .1     0
end       0      0      0      0     1

Page 30: Hidden Markov Models (II): Log-Likelihood (Forward Algorithm)

2 (3)-State HMM Likelihood

[The two-state (N/V) lattice again.]

Q: What's the probability of (N, w1), (V, w2), (N, w3), (N, w4)?

Transition probabilities p(next state | previous state):
          N      V      end
start    .7     .2     .1
N        .15    .8     .05
V        .6     .35    .05

Emission probabilities p(word | state):
          w1     w2     w3     w4
N        .7     .2     .05    .05
V        .2     .6     .1     .1

Page 31: Hidden Markov Models (II): Log-Likelihood (Forward Algorithm)

2 (3)-State HMM Likelihood

[Lattice with only the path N β†’ V β†’ N β†’ N drawn.]

Q: What's the probability of (N, w1), (V, w2), (N, w3), (N, w4)?

A: (.7*.7) * (.8*.6) * (.6*.05) * (.15*.05) β‰ˆ 0.0000529

Transition probabilities p(next state | previous state):
          N      V      end
start    .7     .2     .1
N        .15    .8     .05
V        .6     .35    .05

Emission probabilities p(word | state):
          w1     w2     w3     w4
N        .7     .2     .05    .05
V        .2     .6     .1     .1

Page 32: Hidden Markov Models (II): Log-Likelihood (Forward Algorithm)

2 (3)-State HMM Likelihood

[Lattice with only the path N β†’ V β†’ N β†’ N drawn.]

Q: What's the probability of (N, w1), (V, w2), (N, w3), (N, w4) with ending (unique symbol "#")?

A: (.7*.7) * (.8*.6) * (.6*.05) * (.15*.05) * (.05 * 1) = 0.000002646

Transition probabilities p(next state | previous state):
          N      V      end
start    .7     .2     .1
N        .15    .8     .05
V        .6     .35    .05

Emission probabilities p(word | state):
          w1     w2     w3     w4     #
N        .7     .2     .05    .05    0
V        .2     .6     .1     .1     0
end       0      0      0      0     1

Page 33: Hidden Markov Models (II): Log-Likelihood (Forward Algorithm)

2 (3)-State HMM Likelihood

[The two lattices from the previous examples side by side: the path N β†’ V β†’ N β†’ N and the path N β†’ V β†’ V β†’ N.]

Page 34: Hidden Markov Models (II): Log-Likelihood (Forward Algorithm)

2 (3)-State HMM Likelihood

[The two path lattices again: N β†’ V β†’ N β†’ N and N β†’ V β†’ V β†’ N.]

Up until here, all the computation was the same

Page 35: Hidden Markov Models (II): Log-Likelihood (Forward Algorithm)

2 (3)-State HMM Likelihood

[The two path lattices again: N β†’ V β†’ N β†’ N and N β†’ V β†’ V β†’ N.]

Up until here, all the computation was the same

Let's reuse what computations we can

Page 36: Hidden Markov Models (II): Log-Likelihood (Forward Algorithm)

Reusing Computation: Attempt #1

[The two path lattices again; both share the prefix N β†’ V over w1, w2.]

(.7*.7)
(.7*.7) * (.8*.6)
(.7*.7) * (.8*.6) * (.35*.1)   (continue the shared prefix with V at step 3)
(.7*.7) * (.8*.6) * (.6*.05)   (continue the shared prefix with N at step 3)

Transition probabilities p(next state | previous state):
          N      V      end
start    .7     .2     .1
N        .15    .8     .05
V        .6     .35    .05

Emission probabilities p(word | state):
          w1     w2     w3     w4
N        .7     .2     .05    .05
V        .2     .6     .1     .1

Page 37: Hidden Markov Models (II): Log-Likelihood (Forward Algorithm)

Reusing Computation: Attempt #1

[The two path lattices again; both share the prefix N β†’ V over w1, w2.]

(.7*.7)
(.7*.7) * (.8*.6)
(.7*.7) * (.8*.6) * (.35*.1)
(.7*.7) * (.8*.6) * (.6*.05)

Remember: these are only two of the 16 paths through the trellis.

Transition probabilities p(next state | previous state):
          N      V      end
start    .7     .2     .1
N        .15    .8     .05
V        .6     .35    .05

Emission probabilities p(word | state):
          w1     w2     w3     w4
N        .7     .2     .05    .05
V        .2     .6     .1     .1

Page 38: Hidden Markov Models (II): Log-Likelihood (Forward Algorithm)

Reusing Computation: Attempt #1

[The two path lattices again; both share the prefix N β†’ V over w1, w2.]

(.7*.7)
a_N[2, V] = (.7*.7) * (.8*.6)
a_N[2, V] * (.35*.1)
a_N[2, V] * (.6*.05)

Remember: these are only two of the 16 paths through the trellis.

Transition probabilities p(next state | previous state):
          N      V      end
start    .7     .2     .1
N        .15    .8     .05
V        .6     .35    .05

Emission probabilities p(word | state):
          w1     w2     w3     w4
N        .7     .2     .05    .05
V        .2     .6     .1     .1

Page 39: Hidden Markov Models (II): Log-Likelihood (Forward Algorithm)

Reusing Computation: Attempt #1

[The two path lattices again; both share the prefix N β†’ V over w1, w2.]

a[1, N] = (.7*.7)
a_N[2, V] = a[1, N] * (.8*.6)
a_N[2, V] * (.35*.1)
a_N[2, V] * (.6*.05)

Remember: these are only two of the 16 paths through the trellis.

These a_P(t, s) values represent the probability of a state s at time t in a particular shared path P.

Transition probabilities p(next state | previous state):
          N      V      end
start    .7     .2     .1
N        .15    .8     .05
V        .6     .35    .05

Emission probabilities p(word | state):
          w1     w2     w3     w4
N        .7     .2     .05    .05
V        .2     .6     .1     .1

Page 40: Hidden Markov Models (II): Log-Likelihood (Forward Algorithm)

Reusing Computation: Attempt #1

[The two path lattices again; both share the prefix N β†’ V over w1, w2.]

a[1, N] = (.7*.7)
a_N[2, V] = a[1, N] * (.8*.6)
a_N[2, V] * (.35*.1)
a_N[2, V] * (.6*.05)

Remember: these are only two of the 16 paths through the trellis.

These a_P(t, s) values represent the probability of a state s at time t in a particular shared path P.

"NV" is the path prefix.

Transition probabilities p(next state | previous state):
          N      V      end
start    .7     .2     .1
N        .15    .8     .05
V        .6     .35    .05

Emission probabilities p(word | state):
          w1     w2     w3     w4
N        .7     .2     .05    .05
V        .2     .6     .1     .1

Page 41: Hidden Markov Models (II): Log-Likelihood (Forward Algorithm)

Reusing Computation: Attempt #1

[The two path lattices again; both share the prefix N β†’ V over w1, w2.]

(.7*.7)
a_N[2, V] = (.7*.7) * (.8*.6)
a_N[2, V] * (.35*.1)
a_N[2, V] * (.6*.05)

Remember: these are only two of the 16 paths through the trellis.

How do we handle the other 14 paths? Marginalize at each time step.

Transition probabilities p(next state | previous state):
          N      V      end
start    .7     .2     .1
N        .15    .8     .05
V        .6     .35    .05

Emission probabilities p(word | state):
          w1     w2     w3     w4
N        .7     .2     .05    .05
V        .2     .6     .1     .1

Page 42: Hidden Markov Models (II): Log-Likelihood (Forward Algorithm)

Reusing Computation: Attempt #1

So far, we've only considered a particular shared path, "(AB) *"

a_AB(i, *) = a_A(i-1, B) * p(* | B) * p(obs at i | *)

[Trellis with three states A, B, C at times i-2, i-1, and i; the drawn path carries a(i-2, A) β†’ a_A(i-1, B) β†’ a_AB(i, *).]

What about any of the other paths (AAB, ACB, etc.)?

Page 43: Hidden Markov Models (II): Log-Likelihood (Forward Algorithm)

Reusing Computation: Attempt #2

Let's first consider "any shared path ending with B (AB, BB, or CB), then B"

[Trellis with three states A, B, C at times i-2, i-1, and i.]

Page 44: Hidden Markov Models (II): Log-Likelihood (Forward Algorithm)

Reusing Computation: Attempt #2

Let's first consider "any shared path ending with B (AB, BB, or CB), then B": marginalize across the previous hidden state values

Ξ±(i, B) = Ξ£_s Ξ±(i βˆ’ 1, s) * p(B | s) * p(obs at i | B)

[Trellis with three states A, B, C at times i-2, i-1, and i.]

Page 45: Hidden Markov Models (II): Log-Likelihood (Forward Algorithm)

Reusing Computation: Attempt #2

Let's first consider "any shared path ending with B (AB, BB, or CB), then B": marginalize across the previous hidden state values

Ξ±(i, B) = Ξ£_s Ξ±(i βˆ’ 1, s) * p(B | s) * p(obs at i | B)

a_AB(i, B) = a_A(i-1, B) * p(B | B) * p(obs at i | B)

[Trellis with three states A, B, C at times i-2, i-1, and i.]

Page 46: Hidden Markov Models (II): Log-Likelihood (Forward Algorithm)

Reusing Computation: Attempt #2

Let's first consider "any shared path ending with B (AB, BB, or CB), then B": marginalize across the previous hidden state values

Ξ±(i, B) = Ξ£_s Ξ±(i βˆ’ 1, s) * p(B | s) * p(obs at i | B)

a_AB(i, B) = a_A(i-1, B) * p(B | B) * p(obs at i | B)

Computing Ξ± at time i-1 will correctly incorporate paths through time i-2: we correctly obey the Markov property.

[Trellis with three states A, B, C at times i-2, i-1, and i.]

Page 47: Hidden Markov Models (II): Log-Likelihood (Forward Algorithm)

Forward Probability

Let's first consider "any shared path ending with B (AB, BB, or CB), then B": marginalize across the previous hidden state values

Ξ±(i, B) = Ξ£_{s'} Ξ±(i βˆ’ 1, s') * p(B | s') * p(obs at i | B)

a_AB(i, B) = a_A(i-1, B) * p(B | B) * p(obs at i | B)

Computing Ξ± at time i-1 will correctly incorporate paths through time i-2: we correctly obey the Markov property.

Ξ±(i, B) is the total probability of all paths to that state B from the beginning.

[Trellis with three states A, B, C at times i-2, i-1, and i.]

Page 48: Hidden Markov Models (II): Log-Likelihood (Forward Algorithm)

Forward Probability

Ξ±(i, s) is the total probability of all paths:
1. that start from the beginning
2. that end (currently) in s at step i
3. that emit the observation obs at i

Ξ±(i, s) = Ξ£_{s'} Ξ±(i βˆ’ 1, s') * p(s | s') * p(obs at i | s)

Page 49: Hidden Markov Models (II): Log-Likelihood (Forward Algorithm)

Forward Probability

Ξ±(i, s) is the total probability of all paths:
1. that start from the beginning
2. that end (currently) in s at step i
3. that emit the observation obs at i

Ξ±(i, s) = Ξ£_{s'} Ξ±(i βˆ’ 1, s') * p(s | s') * p(obs at i | s)

how likely is it to get into state s this way?

what are the immediate ways to get into state s?

what's the total probability up until now?
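Written out with the base case made explicit (a small addition for clarity; the later pseudocode initializes Ξ±[0] at the start state and reads off Ξ±[N+1][end]):

Ξ±(0, start) = 1
Ξ±(i, s) = [ Ξ£_{s'} Ξ±(i βˆ’ 1, s') * p(s | s') ] * p(obs at i | s)
p(w1, …, w_N) = Ξ±(N+1, end)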

Page 50: Hidden Markov Models (II): Log-Likelihood (Forward Algorithm)

2 (3) -State HMM Likelihood with Forward Probabilities

[The two-state (N/V) lattice over w1 … w4.]

Ξ±[1, N] = (.7*.7)

Ξ±[2, V] = Ξ±[1, N] * (.8*.6) + Ξ±[1, V] * (.35*.6)

Ξ±[3, V] = Ξ±[2, V] * (.35*.1) + Ξ±[2, N] * (.8*.1)

Ξ±[3, N] = Ξ±[2, V] * (.6*.05) + Ξ±[2, N] * (.15*.05)

Transition probabilities p(next state | previous state):
          N      V      end
start    .7     .2     .1
N        .15    .8     .05
V        .6     .35    .05

Emission probabilities p(word | state):
          w1     w2     w3     w4
N        .7     .2     .05    .05
V        .2     .6     .1     .1

Page 51: Hidden Markov Models (II): Log-Likelihood (Forward Algorithm)

2 (3) -State HMM Likelihood with Forward Probabilities

[The two-state (N/V) lattice over w1 … w4.]

Ξ±[1, N] = (.7*.7)

Ξ±[1, V] = (.2*.2)

Ξ±[2, V] = Ξ±[1, N] * (.8*.6) + Ξ±[1, V] * (.35*.6)

Ξ±[3, V] = Ξ±[2, V] * (.35*.1) + Ξ±[2, N] * (.8*.1)

Ξ±[3, N] = Ξ±[2, V] * (.6*.05) + Ξ±[2, N] * (.15*.05)

Transition probabilities p(next state | previous state):
          N      V      end
start    .7     .2     .1
N        .15    .8     .05
V        .6     .35    .05

Emission probabilities p(word | state):
          w1     w2     w3     w4
N        .7     .2     .05    .05
V        .2     .6     .1     .1

Page 52: Hidden Markov Models (II): Log-Likelihood (Forward Algorithm)

2 (3) -State HMM Likelihood with Forward Probabilities

[The two-state (N/V) lattice over w1 … w4.]

Ξ±[1, N] = (.7*.7) = .49

Ξ±[1, V] = (.2*.2) = .04

Ξ±[2, V] = Ξ±[1, N] * (.8*.6) + Ξ±[1, V] * (.35*.6) = 0.2436

Ξ±[2, N] = Ξ±[1, N] * (.15*.2) + Ξ±[1, V] * (.6*.2) = .0195

Ξ±[3, V] = Ξ±[2, V] * (.35*.1) + Ξ±[2, N] * (.8*.1)

Ξ±[3, N] = Ξ±[2, V] * (.6*.05) + Ξ±[2, N] * (.15*.05)

Transition probabilities p(next state | previous state):
          N      V      end
start    .7     .2     .1
N        .15    .8     .05
V        .6     .35    .05

Emission probabilities p(word | state):
          w1     w2     w3     w4
N        .7     .2     .05    .05
V        .2     .6     .1     .1

Page 53: Hidden Markov Models (II): Log-Likelihood (Forward Algorithm)

2 (3) -State HMM Likelihood with Forward Probabilities

[The two-state (N/V) lattice over w1 … w4.]

Ξ±[1, N] = (.7*.7) = .49

Ξ±[1, V] = (.2*.2) = .04

Ξ±[2, V] = Ξ±[1, N] * (.8*.6) + Ξ±[1, V] * (.35*.6) = 0.2436

Ξ±[2, N] = Ξ±[1, N] * (.15*.2) + Ξ±[1, V] * (.6*.2) = .0195

Ξ±[3, V] = Ξ±[2, V] * (.35*.1) + Ξ±[2, N] * (.8*.1)

Ξ±[3, N] = Ξ±[2, V] * (.6*.05) + Ξ±[2, N] * (.15*.05)

Transition probabilities p(next state | previous state):
          N      V      end
start    .7     .2     .1
N        .15    .8     .05
V        .6     .35    .05

Emission probabilities p(word | state):
          w1     w2     w3     w4
N        .7     .2     .05    .05
V        .2     .6     .1     .1

Use dynamic programming to build the Ξ± left-to-right

Page 54: Hidden Markov Models (II): Log-Likelihood (Forward Algorithm)

Forward Algorithm

Ξ±: a 2D table of size (N+2) Γ— K*

N+2: the number of observations, plus 2 for the BOS & EOS symbols
K*: the number of states

Use dynamic programming to build the Ξ± left-to-right

Page 55: Hidden Markov Models (II): Log-Likelihood (Forward Algorithm)

Forward Algorithm

Ξ± = double[N+2][K*]
Ξ±[0][*] = 1.0

for(i = 1; i ≀ N+1; ++i) {
}

Page 56: Hidden Markov Models (II): Log-Likelihood (Forward Algorithm)

Forward Algorithm

Ξ± = double[N+2][K*]
Ξ±[0][*] = 1.0

for(i = 1; i ≀ N+1; ++i) {
  for(state = 0; state < K*; ++state) {
  }
}

Page 57: Hidden Markov Models (II): Log-Likelihood (Forward Algorithm)

Forward Algorithm

Ξ± = double[N+2][K*]
Ξ±[0][*] = 1.0

for(i = 1; i ≀ N+1; ++i) {
  for(state = 0; state < K*; ++state) {
    pobs = pemission(obs_i | state)
  }
}

Page 58: Hidden Markov Models (II): Log-Likelihood (Forward Algorithm)

Forward Algorithm

Ξ± = double[N+2][K*]
Ξ±[0][*] = 1.0

for(i = 1; i ≀ N+1; ++i) {
  for(state = 0; state < K*; ++state) {
    pobs = pemission(obs_i | state)
    for(old = 0; old < K*; ++old) {
      pmove = ptransition(state | old)
      Ξ±[i][state] += Ξ±[i-1][old] * pobs * pmove
    }
  }
}

Page 59: Hidden Markov Models (II): Log-Likelihood (Forward Algorithm)

Forward Algorithm

Ξ± = double[N+2][K*]
Ξ±[0][*] = 1.0

for(i = 1; i ≀ N+1; ++i) {
  for(state = 0; state < K*; ++state) {
    pobs = pemission(obs_i | state)
    for(old = 0; old < K*; ++old) {
      pmove = ptransition(state | old)
      Ξ±[i][state] += Ξ±[i-1][old] * pobs * pmove
    }
  }
}

we still need to learn these (EM)

Page 60: Hidden Markov Models (II): Log-Likelihood (Forward Algorithm)

Forward Algorithm

Ξ± = double[N+2][K*]
Ξ±[0][*] = 1.0

for(i = 1; i ≀ N+1; ++i) {
  for(state = 0; state < K*; ++state) {
    pobs = pemission(obs_i | state)
    for(old = 0; old < K*; ++old) {
      pmove = ptransition(state | old)
      Ξ±[i][state] += Ξ±[i-1][old] * pobs * pmove
    }
  }
}

Q: What do we return? (How do we return the likelihood of the sequence?)

Page 61: Hidden Markov Models (II): Log-Likelihood (Forward Algorithm)

Forward Algorithm

Ξ± = double[N+2][K*]
Ξ±[0][*] = 1.0

for(i = 1; i ≀ N+1; ++i) {
  for(state = 0; state < K*; ++state) {
    pobs = pemission(obs_i | state)
    for(old = 0; old < K*; ++old) {
      pmove = ptransition(state | old)
      Ξ±[i][state] += Ξ±[i-1][old] * pobs * pmove
    }
  }
}

Q: What do we return? (How do we return the likelihood of the sequence?)

A: Ξ±[N+1][end]
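Below is a minimal runnable Python version of this pseudocode, using the N/V tables from the earlier slides so the result can be checked by hand; the dictionary layout and names are my own illustration, not the assignment's required interface.

def forward(words, states, trans, emit):
    # alpha[i][s]: total probability of all paths that end in state s
    # after emitting the first i words.
    N = len(words)
    alpha = [dict.fromkeys(states, 0.0) for _ in range(N + 1)]
    for s in states:                              # leave the start (BOS) state
        alpha[1][s] = trans["start"][s] * emit[s][words[0]]
    for i in range(2, N + 1):                     # remaining observations
        for s in states:
            alpha[i][s] = sum(alpha[i - 1][old] * trans[old][s]
                              for old in states) * emit[s][words[i - 1]]
    # transition into the end (EOS) state to finish the sequence
    return sum(alpha[N][old] * trans[old]["end"] for old in states)

states = ["N", "V"]
trans = {"start": {"N": .7,  "V": .2,  "end": .1},
         "N":     {"N": .15, "V": .8,  "end": .05},
         "V":     {"N": .6,  "V": .35, "end": .05}}
emit  = {"N": {"w1": .7, "w2": .2, "w3": .05, "w4": .05},
         "V": {"w1": .2, "w2": .6, "w3": .1,  "w4": .1}}
print(forward(["w1", "w2", "w3", "w4"], states, trans, emit))
# intermediate values match the slides: alpha[1][N]=.49, alpha[1][V]=.04,
# alpha[2][V]=.2436, alpha[2][N]=.0195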

Page 62: Hidden Markov Models (II): Log-Likelihood (Forward Algorithm)

Interactive HMM Example

https://goo.gl/rbHEoc (Jason Eisner, 2002)

Original: http://www.cs.jhu.edu/~jason/465/PowerPoint/lect24-hmm.xls

Page 63: Hidden Markov Models (II): Log-Likelihood (Forward Algorithm)

Forward Algorithm in Log-Space

Ξ± = double[N+2][K*]
Ξ±[0][*] = 0.0

for(i = 1; i ≀ N+1; ++i) {
  for(state = 0; state < K*; ++state) {
    pobs = log pemission(obs_i | state)
    for(old = 0; old < K*; ++old) {
      pmove = log ptransition(state | old)
      Ξ±[i][state] = logadd(Ξ±[i][state], Ξ±[i-1][old] + pobs + pmove)
    }
  }
}

Page 64: Hidden Markov Models (II): Log-Likelihood (Forward Algorithm)

Forward Algorithm in Log-Space

Ξ± = double[N+2][K*]
Ξ±[0][*] = 0.0

for(i = 1; i ≀ N+1; ++i) {
  for(state = 0; state < K*; ++state) {
    pobs = log pemission(obs_i | state)
    for(old = 0; old < K*; ++old) {
      pmove = log ptransition(state | old)
      Ξ±[i][state] = logadd(Ξ±[i][state], Ξ±[i-1][old] + pobs + pmove)
    }
  }
}

logadd(lp, lq) = log(exp(lp) + exp(lq))

Page 65: Hidden Markov Models (II): Log-Likelihood (Forward Algorithm)

Forward Algorithm in Log-Space

Ξ± = double[N+2][K*]
Ξ±[0][*] = 0.0

for(i = 1; i ≀ N+1; ++i) {
  for(state = 0; state < K*; ++state) {
    pobs = log pemission(obs_i | state)
    for(old = 0; old < K*; ++old) {
      pmove = log ptransition(state | old)
      Ξ±[i][state] = logadd(Ξ±[i][state], Ξ±[i-1][old] + pobs + pmove)
    }
  }
}

logadd(lp, lq) = log(exp(lp) + exp(lq))

this can still overflow! (why?)

Page 66: Hidden Markov Models (II): Log-Likelihood (Forward Algorithm)

Forward Algorithm in Log-Space

Ξ± = double[N+2][K*]
Ξ±[0][*] = 0.0

for(i = 1; i ≀ N+1; ++i) {
  for(state = 0; state < K*; ++state) {
    pobs = log pemission(obs_i | state)
    for(old = 0; old < K*; ++old) {
      pmove = log ptransition(state | old)
      Ξ±[i][state] = logadd(Ξ±[i][state], Ξ±[i-1][old] + pobs + pmove)
    }
  }
}

logadd(lp, lq) = lp + log(1 + exp(lq βˆ’ lp)),  if lp β‰₯ lq
logadd(lp, lq) = lq + log(1 + exp(lp βˆ’ lq)),  if lq > lp

Page 67: Hidden Markov Models (II): Log-Likelihood (Forward Algorithm)

Forward Algorithm in Log-Space

Ξ± = double[N+2][K*]
Ξ±[0][*] = 0.0

for(i = 1; i ≀ N+1; ++i) {
  for(state = 0; state < K*; ++state) {
    pobs = log pemission(obs_i | state)
    for(old = 0; old < K*; ++old) {
      pmove = log ptransition(state | old)
      Ξ±[i][state] = logadd(Ξ±[i][state], Ξ±[i-1][old] + pobs + pmove)
    }
  }
}

logadd(lp, lq) = lp + log(1 + exp(lq βˆ’ lp)),  if lp β‰₯ lq
logadd(lp, lq) = lq + log(1 + exp(lp βˆ’ lq)),  if lq > lp

scipy.misc.logsumexp
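For reference, a direct Python translation of the stable logadd above (my own sketch); note that in current SciPy the function lives at scipy.special.logsumexp (scipy.misc.logsumexp is the older location), and that in a log-space table the unfilled Ξ± entries should start at -inf (log 0) rather than 0.

import math

def logadd(lp, lq):
    # stable log(exp(lp) + exp(lq)); float("-inf") stands in for log 0
    if lp == float("-inf"):
        return lq
    if lq == float("-inf"):
        return lp
    if lp >= lq:
        return lp + math.log1p(math.exp(lq - lp))
    return lq + math.log1p(math.exp(lp - lq))

# equivalently, for a whole row of the trellis:
# from scipy.special import logsumexp
# logsumexp([lp, lq])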