Part of Speech Tagging — users.umiacs.umd.edu/~jbg/teaching/cmsc_723/10b_viterbi.pdf
TRANSCRIPT
Part of Speech Tagging
Natural Language Processing: Jordan Boyd-Graber, University of Maryland — VITERBI
Adapted from material by Jimmy Lin and Jason Eisner
Natural Language Processing: Jordan Boyd-Graber | UMD Part of Speech Tagging | 1 / 17
Finding Tag Sequences
Viterbi Algorithm
- Given an observed sequence of length L, {x₁, …, x_L}, we want to find the tag sequence {z₁, …, z_L} with the highest probability.
- It's impossible to enumerate all K^L possibilities.
- So, we use dynamic programming: compute the most likely tags for each token subsequence from 1 to n that ends in state k.
- Memoization: fill a table of solutions to sub-problems
- Solve larger problems by composing sub-solutions
- Base case:

  δ₁(k) = π_k β_{k,x₁}   (1)

- Recursion:

  δ_n(k) = max_j [ δ_{n−1}(j) θ_{j,k} ] β_{k,x_n}   (2)
Natural Language Processing: Jordan Boyd-Graber | UMD Part of Speech Tagging | 2 / 17
Finding Tag Sequences
- The complexity of this is now K²L.
- In class: an example that shows why you need all O(KL) table cells (garden pathing)
- But just computing the max isn't enough. We also have to remember where we came from. (Breadcrumbs from the best previous state.)

  Ψ_n(k) = argmax_j δ_{n−1}(j) θ_{j,k}   (3)

- Let's do that for the sentence "come and get it"
Natural Language Processing: Jordan Boyd-Graber | UMD Part of Speech Tagging | 3 / 17
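The recursion plus the breadcrumbs can be sketched in code. This is a minimal, illustrative NumPy version (names and array layout are mine, not the course's reference implementation):

```python
import numpy as np

def viterbi(pi, theta, beta, obs):
    """Most likely state sequence for an HMM (illustrative sketch).

    pi:    (K,) initial state probabilities
    theta: (K, K) transitions, theta[j, k] = P(z_n = k | z_{n-1} = j)
    beta:  (K, V) emissions, beta[k, x] = P(x | z = k)
    obs:   observation indices x_1 .. x_L
    """
    K, L = len(pi), len(obs)
    delta = np.zeros((L, K))
    psi = np.zeros((L, K), dtype=int)          # backpointers (breadcrumbs)
    delta[0] = pi * beta[:, obs[0]]            # base case, Eq. (1)
    for n in range(1, L):
        scores = delta[n - 1][:, None] * theta # scores[j, k] = delta_{n-1}(j) theta_{j,k}
        psi[n] = scores.argmax(axis=0)         # best previous state for each k, Eq. (3)
        delta[n] = scores.max(axis=0) * beta[:, obs[n]]  # recursion, Eq. (2)
    # follow the breadcrumbs back from the best final state
    path = [int(delta[-1].argmax())]
    for n in range(L - 1, 0, -1):
        path.append(int(psi[n][path[-1]]))
    return path[::-1]
```

The table is filled left to right, the path is read right to left.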
Viterbi Algorithm
| POS  | π_k   | β_{k,x₁} | log δ₁(k) |
|------|-------|----------|-----------|
| MOD  | 0.234 | 0.024    | −5.18 |
| DET  | 0.234 | 0.032    | −4.89 |
| CONJ | 0.234 | 0.024    | −5.18 |
| N    | 0.021 | 0.016    | −7.99 |
| PREP | 0.021 | 0.024    | −7.59 |
| PRO  | 0.021 | 0.016    | −7.99 |
| V    | 0.234 | 0.121    | −3.56 |

Sentence: come and get it
Why logarithms?
1. More interpretable than a float with lots of zeros.
2. Underflow is less of an issue
3. Addition is cheaper than multiplication:

   log(ab) = log(a) + log(b)   (4)
Natural Language Processing: Jordan Boyd-Graber | UMD Part of Speech Tagging | 4 / 17
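A quick illustration of point 2 (underflow), using nothing beyond the standard library:

```python
import math

# Multiplying many small probabilities underflows in linear space,
# but the equivalent sum of logs stays comfortably representable.
probs = [0.1] * 400

linear = 1.0
for p in probs:
    linear *= p                                  # collapses to 0.0

log_total = sum(math.log(p) for p in probs)      # about -921, no underflow
print(linear, log_total)
```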
Viterbi Algorithm

| POS  | log δ₁(j) | log[δ₁(j) θ_{j,CONJ}] | log δ₂(CONJ) |
|------|-----------|-----------------------|--------------|
| MOD  | −5.18 | −8.48 | |
| DET  | −4.89 | −7.72 | |
| CONJ | −5.18 | −8.47 | −6.02 |
| N    | −7.99 | ≤ −7.99 | |
| PREP | −7.59 | ≤ −7.59 | |
| PRO  | −7.99 | ≤ −7.99 | |
| V    | −3.56 | −5.21 | |

Sentence: come and get it

log[δ₁(V) θ_{V,CONJ}] = log δ₁(V) + log θ_{V,CONJ} = −3.56 + −1.65 = −5.21

log δ₂(CONJ) = −5.21 + log β_{CONJ,and} = −5.21 − 0.64
Natural Language Processing: Jordan Boyd-Graber | UMD Part of Speech Tagging | 5 / 17
Viterbi Algorithm

| POS  | δ₁(k) | δ₂(k) | b₂ | δ₃(k) | b₃ | δ₄(k) | b₄ |
|------|-------|-------|----|-------|----|-------|----|
| MOD  | −5.18 | −0.00 | X | −0.00 | X | −0.00 | X |
| DET  | −4.89 | −0.00 | X | −0.00 | X | −0.00 | X |
| CONJ | −5.18 | −6.02 | V | −0.00 | X | −0.00 | X |
| N    | −7.99 | −0.00 | X | −0.00 | X | −0.00 | X |
| PREP | −7.59 | −0.00 | X | −0.00 | X | −0.00 | X |
| PRO  | −7.99 | −0.00 | X | −0.00 | X | −14.6 | V |
| V    | −3.56 | −0.00 | X | −9.03 | CONJ | −0.00 | X |

WORD: come and get it
Natural Language Processing: Jordan Boyd-Graber | UMD Part of Speech Tagging | 6 / 17
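Following the surviving backpointers right to left recovers the tag sequence. A small sketch using just the filled-in cells of the table (treating the X-marked cells as pruned):

```python
# Backpointers b_n[state] = best previous state, copied from the table's
# surviving (non-X) cells.
backpointers = {
    2: {"CONJ": "V"},
    3: {"V": "CONJ"},
    4: {"PRO": "V"},
}

state = "PRO"            # surviving state at n = 4 ("it")
path = [state]
for n in (4, 3, 2):      # walk the breadcrumbs back to n = 1
    state = backpointers[n][state]
    path.append(state)
path.reverse()
print(path)              # ['V', 'CONJ', 'V', 'PRO']
```

That is come/V and/CONJ get/V it/PRO.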
EM Algorithm
What if you don’t have training data?
- You can still learn an HMM
- Using a general technique called expectation maximization
- Take a guess at the parameters
- Figure out the latent variables
- Find the parameters that best explain the latent variables
- Repeat
Natural Language Processing: Jordan Boyd-Graber | UMD Part of Speech Tagging | 7 / 17
EM Algorithm
EM for HMM
Model Parameters
We need to start with model parameters
Natural Language Processing: Jordan Boyd-Graber | UMD Part of Speech Tagging | 8 / 17
π, β, θ
We can initialize these any way we want
We compute the E-step based on our data
Each word in our dataset could take any part of speech
But we don’t know which state was used for each word
Determine the probability of being in each latent state using Forward/Backward
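The Forward/Backward pass can be sketched as follows. This is an illustrative NumPy version (variable names are mine, not the slides'): it returns, for each position, the posterior probability of each latent state.

```python
import numpy as np

def forward_backward(pi, theta, beta, obs):
    """Posterior P(z_n = k | x_1..x_L) for every position n (E-step sketch)."""
    K, L = len(pi), len(obs)
    alpha = np.zeros((L, K))   # forward:  P(x_1..x_n, z_n = k)
    bwd = np.zeros((L, K))     # backward: P(x_{n+1}..x_L | z_n = k)
    alpha[0] = pi * beta[:, obs[0]]
    for n in range(1, L):
        alpha[n] = (alpha[n - 1] @ theta) * beta[:, obs[n]]
    bwd[-1] = 1.0
    for n in range(L - 2, -1, -1):
        bwd[n] = theta @ (beta[:, obs[n + 1]] * bwd[n + 1])
    gamma = alpha * bwd        # unnormalized posteriors
    return gamma / gamma.sum(axis=1, keepdims=True)
```

Unlike Viterbi's max, the forward pass sums over all predecessors, so the posteriors reflect every possible tag sequence.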
M step: calculate new parameters

θ_i = (E_p[n_i] + α_i) / Σ_k (E_p[n_k] + α_k)   (5)
Where the expected counts are from the lattice
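Equation (5) is just smoothed normalization of the expected counts. A toy numeric sketch (the counts and α below are made-up values, not from the slides):

```python
# Expected counts E_p[n_i] from the E-step lattice (hypothetical numbers),
# plus a symmetric Dirichlet pseudocount alpha, normalized over all outcomes.
expected_counts = {"N": 3.2, "V": 1.5, "DET": 0.3}
alpha = 0.1

total = sum(c + alpha for c in expected_counts.values())
theta = {k: (c + alpha) / total for k, c in expected_counts.items()}
print(theta)   # a proper distribution: values sum to 1
```

The pseudocount keeps every outcome's probability strictly positive, even when its expected count is near zero.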
Replace old parameters (and start over)
Hard vs. Full EM
Hard EM: train only on the most likely sequence (Viterbi)

- Faster: the E-step is faster
- Faster: fewer iterations

Full EM: compute the probability of all possible sequences

- More accurate: doesn't get stuck in local optima as easily
Natural Language Processing: Jordan Boyd-Graber | UMD Part of Speech Tagging | 9 / 17
Warning about next homework(s)
- Kaggle competition
- Thus, late days not very useful
- Following homework is not computational
Natural Language Processing: Jordan Boyd-Graber | UMD Part of Speech Tagging | 10 / 17
Garden Pathing
What is the probability of the sequence “a/Det blue/Adj boat/N”?
π_Det β_{Det,a} θ_{Det,Adj} β_{Adj,blue} θ_{Adj,N} β_{N,boat} = 0.3 × 0.6 × 0.4 × 0.3 × 0.5 × 0.1 = 0.00108
Natural Language Processing: Jordan Boyd-Graber | UMD Part of Speech Tagging | 11 / 17
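The product can be checked directly with the slide's numbers:

```python
import math

# P(a/Det blue/Adj boat/N): initial, emission, and transition factors
# multiplied in the order they appear on the slide.
factors = [0.3, 0.6, 0.4, 0.3, 0.5, 0.1]
prob = math.prod(factors)
print(prob)   # 0.00108 (up to floating-point rounding)
```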
Example . . .
Base case
1. δ₁(a) = −4.6
2. δ₁(v) = −5.7
3. δ₁(d) = −1.7
4. δ₁(n) = −4.6
Natural Language Processing: Jordan Boyd-Graber | UMD Part of Speech Tagging | 12 / 17
Second position

1. δ₂(a) = max(−5.8 [a], −7.3 [v], −2.6 [d], −7.6 [n]) + −1.2 = −2.6 + −1.2 = −3.8
2. δ₂(v) = max(−6.9 [a], −7.3 [v], −4.7 [d], −4.8 [n]) + −2.3 = −4.7 + −2.3 = −7.0
3. δ₂(d) = max(−6.9 [a], −6.9 [v], −4.0 [d], −7.6 [n]) + −3.7 = −4.0 + −3.7 = −7.7
4. δ₂(n) = max(−5.3 [a], −6.9 [v], −2.5 [d], −6.9 [n]) + −1.9 = −2.5 + −1.9 = −4.4

(each bracketed letter marks the previous state j the candidate score came from)
Natural Language Processing: Jordan Boyd-Graber | UMD Part of Speech Tagging | 13 / 17
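Each line above is one max-plus step: pick the best predecessor score, then add the log emission. For instance δ₂(a), sketched with the slide's numbers (the candidates are the values log δ₁(j) + log θ_{j,a}):

```python
# One Viterbi step in log space, reproducing delta_2(a) from the slide.
candidates = {"a": -5.8, "v": -7.3, "d": -2.6, "n": -7.6}  # log d1(j) + log theta(j,a)
log_emission = -1.2                                        # log beta(a, x_2)

best_prev = max(candidates, key=candidates.get)            # breadcrumb Psi_2(a)
delta2_a = candidates[best_prev] + log_emission
print(best_prev, delta2_a)   # best predecessor is d; delta is about -3.8
```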
EM Algorithm

Example . . .

Third position

1. δ3(a) = max( −5.0 [a], −8.6 [v], −8.6 [d], −7.4 [n] ) + (−2.3) = −5.0 + (−2.3) = −7.3
2. δ3(v) = max( −6.1 [a], −8.6 [v], −10.7 [d], −4.6 [n] ) + (−0.9) = −4.6 + (−0.9) = −5.5
3. δ3(d) = max( −6.1 [a], −8.2 [v], −10.0 [d], −7.4 [n] ) + (−3.7) = −6.1 + (−3.7) = −9.8
4. δ3(n) = max( −4.5 [a], −8.2 [v], −8.5 [d], −6.7 [n] ) + (−0.9) = −4.5 + (−0.9) = −5.4
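A general aside (not on the slides): all of these scores are log probabilities. Working in log space turns the products in the Viterbi recursion into sums and avoids floating-point underflow, and the max is unchanged because log is monotonic. A tiny demonstration:

```python
import math

# Multiplying 400 probabilities of 0.1 underflows to exactly 0.0 in a double,
# while the equivalent sum of logs stays perfectly well-scaled.
probs = [0.1] * 400

product = 1.0
for p in probs:
    product *= p

log_score = sum(math.log(p) for p in probs)

print(product)    # 0.0 (underflow)
print(log_score)  # about -921.03, i.e. 400 * ln(0.1)
```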
EM Algorithm

Example . . .

Fourth position

1. δ4(a) = max( −8.5 [a], −7.2 [v], −10.7 [d], −8.4 [n] ) + (−3.4) = −7.2 + (−3.4) = −10.6
2. δ4(v) = max( −9.6 [a], −7.2 [v], −12.8 [d], −5.7 [n] ) + (−3.4) = −5.7 + (−3.4) = −9.1
3. δ4(d) = max( −9.6 [a], −6.8 [v], −12.1 [d], −8.4 [n] ) + (−0.5) = −6.8 + (−0.5) = −7.3
4. δ4(n) = max( −8.0 [a], −6.8 [v], −10.6 [d], −7.7 [n] ) + (−3.4) = −6.8 + (−3.4) = −10.2
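Every position repeats the same recursion, δn(k) = maxj [ δn−1(j) + θj,k ] + βk,xn, with back pointers recorded for the final reconstruction. A minimal sketch of the full table-filling loop, using a hypothetical two-tag toy model (not the slides' parameters):

```python
import math

def viterbi(obs, states, log_pi, log_trans, log_emit):
    # delta[i][k] = best log score of any tag sequence for the first i+1 words
    # ending in tag k; back[i][k] = the argmax previous tag (empty at position 0).
    delta = [{k: log_pi[k] + log_emit[k][obs[0]] for k in states}]
    back = [{}]
    for x in obs[1:]:
        prev, row, bp = delta[-1], {}, {}
        for k in states:
            j = max(states, key=lambda s: prev[s] + log_trans[s][k])
            row[k] = prev[j] + log_trans[j][k] + log_emit[k][x]
            bp[k] = j
        delta.append(row)
        back.append(bp)
    return delta, back

# Hypothetical toy parameters: determiners usually start sentences and precede nouns.
lg = math.log
states = ('d', 'n')
log_pi = {'d': lg(0.9), 'n': lg(0.1)}
log_trans = {'d': {'d': lg(0.1), 'n': lg(0.9)},
             'n': {'d': lg(0.5), 'n': lg(0.5)}}
log_emit = {'d': {'the': lg(0.9), 'dog': lg(0.1)},
            'n': {'the': lg(0.1), 'dog': lg(0.9)}}

delta, back = viterbi(['the', 'dog'], states, log_pi, log_trans, log_emit)
print(back[1]['n'])  # 'd': the best path into noun at position 2 comes from determiner
```

The run time is O(L · K²) for L words and K tags, rather than the K^L of brute-force enumeration.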
EM Algorithm

Example . . .

Fifth position (totals reported to two significant figures on the slide)

1. δ5(a) = max( −11.8 [a], −10.7 [v], −8.2 [d], −13.2 [n] ) + (−2.3) = −8.2 + (−2.3) = −11
2. δ5(v) = max( −12.9 [a], −10.7 [v], −10.3 [d], −10.4 [n] ) + (−1.6) = −10.3 + (−1.6) = −12
3. δ5(d) = max( −12.9 [a], −10.3 [v], −9.6 [d], −13.2 [n] ) + (−3.7) = −9.6 + (−3.7) = −13
4. δ5(n) = max( −11.3 [a], −10.3 [v], −8.1 [d], −12.5 [n] ) + (−1.2) = −8.1 + (−1.2) = −9.3
EM Algorithm

Example . . .

Reconstruction

For "the old man", the reconstruction starts with the best part of speech at Position 3, which is noun (−5.4). Its back pointer is adjective, which in turn has a back pointer to determiner. The overall sequence is "The/det old/adj man/n".

For "the old man the boats", the reconstruction starts with the best part of speech at Position 5, which is noun (−9.3), which leads to the sequence "The/det old/n man/v the/det boats/n".
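The back-pointer walk described above can be sketched directly. The back table below is hypothetical, but it is consistent with the slides' narrative for "the old man": the best tag at Position 3 is noun, whose back pointer is adjective, whose back pointer is determiner.

```python
def backtrace(back, last_tag):
    # back[i][tag] = best previous tag leading into `tag` at position i;
    # back[0] is empty because position 1 has no predecessor.
    tags = [last_tag]
    for bp in reversed(back[1:]):
        tags.append(bp[tags[-1]])
    return list(reversed(tags))

# Hypothetical back pointers matching the reconstruction above:
back = [{},
        {'adj': 'det', 'n': 'det', 'v': 'det', 'det': 'det'},  # position 2
        {'n': 'adj', 'adj': 'det', 'v': 'n', 'det': 'v'}]      # position 3

print(backtrace(back, 'n'))  # ['det', 'adj', 'n'] -> "The/det old/adj man/n"
```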