Other Sequence Models · 2019. 11. 18. · Slides courtesy Rebecca Knowles
Latent Models: Sequence Models Beyond HMMs and Machine Translation Alignment
CMSC 473/673
UMBC
Outline
Review: EM for HMMs
Machine Translation Alignment
Limited Sequence Models
• Maximum Entropy Markov Models
• Conditional Random Fields
Recurrent Neural Networks
• Basic Definitions
• Example in PyTorch
Why Do We Need Both the Forward and Backward Algorithms?
Compute posteriors
α(i, s) * p(s’ | s) * p(obs at i+1 | s’) * β(i+1, s’) = total probability of paths through the s→s’ arc (at time i)
α(i, s) * β(i, s) = total probability of paths through state s at step i
$$p(z_i = s \mid w_1, \dots, w_N) = \frac{\alpha(i, s)\,\beta(i, s)}{\alpha(N+1, \mathrm{END})}$$
$$p(z_i = s, z_{i+1} = s' \mid w_1, \dots, w_N) = \frac{\alpha(i, s)\,p(s' \mid s)\,p(\mathrm{obs}_{i+1} \mid s')\,\beta(i+1, s')}{\alpha(N+1, \mathrm{END})}$$
EM for HMMs
0. Assume some value for your parameters
Two-step, iterative algorithm:
1. E-step: count under uncertainty, assuming these parameters
2. M-step: maximize log-likelihood, assuming these uncertain counts
[Figure: the E-step passes estimated counts to the M-step, which returns updated parameters pobs(w | s) and ptrans(s’ | s)]
$$p^*(z_i = s \mid w_1, \dots, w_N) = \frac{\alpha(i, s)\,\beta(i, s)}{\alpha(N+1, \mathrm{END})}$$
$$p^*(z_i = s, z_{i+1} = s' \mid w_1, \dots, w_N) = \frac{\alpha(i, s)\,p(s' \mid s)\,p(\mathrm{obs}_{i+1} \mid s')\,\beta(i+1, s')}{\alpha(N+1, \mathrm{END})}$$
EM For HMMs (Baum-Welch Algorithm)
α = computeForwards()
β = computeBackwards()
L = α[N+1][END]
for(i = N; i ≥ 0; --i) {
  for(next = 0; next < K*; ++next) {
    cobs(obs_{i+1} | next) += α[i+1][next] * β[i+1][next] / L
    for(state = 0; state < K*; ++state) {
      u = pobs(obs_{i+1} | next) * ptrans(next | state)
      ctrans(next | state) += α[i][state] * u * β[i+1][next] / L
    }
  }
}
update pobs, ptrans using cobs, ctrans
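The pseudocode above can be sketched as runnable Python. This is a minimal illustration rather than the course's reference implementation, and it makes a few assumptions that are not on the slide: positions are 0-indexed, there is no explicit END state (so the likelihood L is the sum of the final forward column instead of α[N+1][END]), an initial distribution `init` replaces the START transition, and the toy parameter values are invented.

```python
def forward(obs, init, trans, emit):
    """alpha[i][s] = p(w_0..w_i, z_i = s)."""
    K = len(init)
    alpha = [[init[s] * emit[s][obs[0]] for s in range(K)]]
    for w in obs[1:]:
        prev = alpha[-1]
        alpha.append([emit[s][w] * sum(prev[r] * trans[r][s] for r in range(K))
                      for s in range(K)])
    return alpha

def backward(obs, trans, emit):
    """beta[i][s] = p(w_{i+1}..w_{N-1} | z_i = s)."""
    K = len(trans)
    beta = [[1.0] * K]
    for w in reversed(obs[1:]):
        nxt = beta[0]
        beta.insert(0, [sum(trans[s][r] * emit[r][w] * nxt[r] for r in range(K))
                        for s in range(K)])
    return beta

def expected_counts(obs, init, trans, emit):
    """E-step: fractional emission and transition counts for one sequence."""
    K, V = len(init), len(emit[0])
    alpha, beta = forward(obs, init, trans, emit), backward(obs, trans, emit)
    L = sum(alpha[-1])  # sequence likelihood (assumption: no END state)
    c_obs = [[0.0] * V for _ in range(K)]
    c_trans = [[0.0] * K for _ in range(K)]
    for i, w in enumerate(obs):
        for s in range(K):
            c_obs[s][w] += alpha[i][s] * beta[i][s] / L
    for i in range(len(obs) - 1):
        for s in range(K):
            for nxt in range(K):
                u = trans[s][nxt] * emit[nxt][obs[i + 1]]
                c_trans[s][nxt] += alpha[i][s] * u * beta[i + 1][nxt] / L
    return c_obs, c_trans, L

def normalize(rows):
    """M-step: turn each row of counts into a probability distribution."""
    return [[x / sum(row) for x in row] for row in rows]

# Hypothetical two-state, two-symbol HMM; trans[s][r] = p(r | s), emit[s][w] = p(w | s).
init = [0.5, 0.5]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.9, 0.1], [0.2, 0.8]]
obs = [0, 1, 1, 0]

c_obs, c_trans, L = expected_counts(obs, init, trans, emit)
new_emit, new_trans = normalize(c_obs), normalize(c_trans)
```

Because the state posteriors at each position sum to 1, the emission counts total the sequence length and the transition counts total one less, which is a convenient sanity check.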
Semi-Supervised Learning
labeled data:
• human annotated
• relatively small/few examples
unlabeled data:
• raw; not annotated
• plentiful
[Figure: labeled examples and unlabeled examples (shown as "?") combined via EM]
Warren Weaver’s Note
When I look at an article in Russian, I say “This is really written in English, but it has been coded in some strange symbols. I will now proceed to decode.”
(Warren Weaver, 1947) http://www.mt-archive.info/Weaver-1949.pdf
Slides courtesy Rebecca Knowles
Noisy Channel Model
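[Figure: noisy channel pipeline. Text written in (clean) English is observed as (noisy) Russian text, e.g. "язы́к". Decode: the translation/decode model proposes candidate English words (speak, text, word, language). Rerank: the (clean) English language model reorders the candidates, yielding "language".]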
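The decode-then-rerank idea can be sketched in a few lines of Python. This assumes the usual noisy channel scoring, choosing the English word e that maximizes p(f | e) · p(e); that formula is not written on the slide, and every probability below is a made-up toy value.

```python
import math

# Hypothetical translation/decode model: p(f | e) for the observed f = "язы́к".
translation_model = {"speak": 0.10, "text": 0.10, "word": 0.45, "language": 0.35}

# Hypothetical (clean) English language model: p(e).
language_model = {"speak": 0.2, "text": 0.3, "word": 0.1, "language": 0.4}

def rerank(tm, lm):
    """Score each candidate by log p(f | e) + log p(e) and sort best-first."""
    scored = {e: math.log(tm[e]) + math.log(lm[e]) for e in tm}
    return sorted(scored, key=scored.get, reverse=True)

decoded_best = max(translation_model, key=translation_model.get)
ranking = rerank(translation_model, language_model)
best = ranking[0]
```

With these toy numbers the translation model alone prefers "word", while the combined score prefers "language", which is the effect the rerank step is meant to illustrate.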
Translation
Translate French (observed) into English:
The cat is on the chair.
Le chat est sur la chaise.
Alignment
The cat is on the chair.
Le chat est sur la chaise.
[Figure: the sentence pair shown twice, first with the word alignment unknown (?), then with alignment links drawn between the words]
Parallel Texts
Whereas recognition of the inherent dignity and of the equal and inalienable rights of all members of the human family is the foundation of freedom, justice and peace in the world,
Whereas disregard and contempt for human rights have resulted in barbarous acts which have outraged the conscience of mankind, and the advent of a world in which human beings shall enjoy freedom of speech and belief and freedom from fear and want has been proclaimed as the highest aspiration of the common people,
Whereas it is essential, if man is not to be compelled to have recourse, as a last resort, to rebellion against tyranny and oppression, that human rights should be protected by the rule of law,
Whereas it is essential to promote the development of friendly relations between nations,…
http://www.un.org/en/universal-declaration-human-rights/
Yolki, pampa ni tlatepanitalotl, ni tlasenkauajkayotl iuan ni kuali nemilistli ipan ni tlalpan, yaya ni moneki moixmatis uan monemilis, ijkinoj nochi kuali tiitstosej ika touampoyouaj.
Pampa tlaj amo tikixmatij tlatepanitalistli uan tlen kuali nemilistli ipan ni tlalpan, yeka onkatok kualantli, onkatok tlateuilistli, onkatok majmajtli uan sekinok tlamantli teixpanolistli; yeka moneki ma kuali timouikakaj ika nochi touampoyouaj, ma amo onkaj majmajyotl uan teixpanolistli; moneki ma onkaj yejyektlalistli, ma titlajtlajtokaj uan ma tijneltokakaj tlen tojuantij tijnekij tijneltokasej uan amo tlen ma topanti, kenke, pampa tijnekij ma onkaj tlatepanitalistli.
Pampa ni tlatepanitalotl moneki ma tiyejyekokaj, ma tijchiuakaj uan ma tijmanauikaj; ma nojkia kiixmatikaj tekiuajtinij, uejueyij tekiuajtinij, ijkinoj amo onkas nopeka se akajya touampoj san tlen ueli kinekis techchiuilis, technauatis, kinekis technauatis ma tijchiuakaj se tlamantli tlen amo kuali; yeka ni tlatepanitalotl tlauel moneki ipan tonemilis ni tlalpan.
Pampa nojkia tlauel moneki ma kuali timouikakaj, ma tielikaj keuak tiiknimej, nochi tlen tlakamej uan siuamej tlen tiitstokej ni tlalpan.…
http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=nhn
Preprocessing
• Sentence align
• Clean corpus
• Tokenize
• Handle case
• Word segmentation (morphological, BPE, etc.)
• Language-specific preprocessing (example: pre-reordering)
• ...
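A few of these steps (tokenizing, handling case, and cleaning the corpus) can be sketched as follows. This is only an illustration under simple assumptions: the regular-expression tokenizer, the lowercasing policy, and the length-ratio threshold `max_ratio` are choices made here, not ones prescribed by the slides, and sentence alignment is assumed to be done already.

```python
import re

def tokenize(sentence):
    """Lowercase, then split into word tokens and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence.lower())

def preprocess(pairs, max_ratio=3.0):
    """Tokenize sentence-aligned pairs and drop empty or badly mismatched ones."""
    cleaned = []
    for src, tgt in pairs:
        s, t = tokenize(src), tokenize(tgt)
        if not s or not t:
            continue  # one side is empty
        if max(len(s), len(t)) / min(len(s), len(t)) > max_ratio:
            continue  # lengths differ too much to be a plausible translation
        cleaned.append((s, t))
    return cleaned

pairs = [
    ("The cat is on the chair.", "Le chat est sur la chaise."),
    ("", "Bonjour"),
    ("Yes", "Oui , bien sûr , tout à fait"),
]
cleaned = preprocess(pairs)
```

Real pipelines usually replace each of these steps with a dedicated tool, for example a trained subword segmenter for the word segmentation step.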
Alignments
If we had word-aligned text, we could easily estimate P(f|e).
But we don’t usually have word alignments, and they are expensive to produce by hand…
If we had P(f|e) we could produce alignments automatically.
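Both directions of this chicken-and-egg relationship can be sketched in Python. The tiny word-aligned corpus below is hypothetical, the estimate is a plain relative frequency, and the aligner simply picks the most probable English word for each French word, which is an assumption made for illustration.

```python
from collections import Counter, defaultdict

def estimate_t(aligned_corpus):
    """Relative-frequency estimate of t(f | e) from word-aligned sentence pairs."""
    pair_counts = defaultdict(Counter)
    for f_words, e_words, a in aligned_corpus:
        for j, i in enumerate(a):  # French position j is aligned to English position i
            pair_counts[e_words[i]][f_words[j]] += 1
    return {e: {f: c / sum(fc.values()) for f, c in fc.items()}
            for e, fc in pair_counts.items()}

def align(f_words, e_words, t):
    """For each French word, pick the English position with the highest t(f | e)."""
    return [max(range(len(e_words)),
                key=lambda i: t.get(e_words[i], {}).get(f, 0.0))
            for f in f_words]

# Hypothetical hand-aligned data: (French words, English words, alignment indices).
corpus = [
    (["le", "chat"], ["the", "cat"], [0, 1]),
    (["la", "chaise"], ["the", "chair"], [0, 1]),
]
t = estimate_t(corpus)
predicted = align(["le", "chat"], ["the", "cat"], t)
```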
IBM Model 1 (1993)
• Lexical Translation Model
• Word Alignment Model
• The simplest of the original IBM models
• For all IBM models, see the original paper (Brown et al., 1993): http://www.aclweb.org/anthology/J93-2003
Simplified IBM 1
• We’ll work through an example with a simplified version of IBM Model 1
• Figures and examples are drawn from A Statistical MT Tutorial Workbook, Section 27, (Knight, 1999)
• Simplifying assumption: each source word must translate to exactly one target word and vice versa
IBM Model 1 (1993)
f: vector of French words
e: vector of English words
a: vector of alignment indices
t(f_j | e_i): translation probability of the word f_j given the word e_i
Example (visualization of alignment):
f: Le chat est sur la chaise verte
e: The cat is on the green chair
a: 0 1 2 3 4 6 5
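Reading the alignment vector as "French position j is aligned to English position a[j]", with positions counted from 0 (an interpretation that matches the example but is not spelled out on the slide), the aligned word pairs can be recovered like this:

```python
f = "Le chat est sur la chaise verte".split()
e = "The cat is on the green chair".split()
a = [0, 1, 2, 3, 4, 6, 5]

# Pair each French word with the English word its alignment index points to.
aligned_pairs = [(f[j], e[a[j]]) for j in range(len(f))]
```

The last two indices are swapped because French places the adjective after the noun: "chaise" pairs with "chair" and "verte" with "green".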
Model and Parameters
Want: P(f|e)
But don’t know how to train this directly…
Solution: Use P(a, f|e), where a is an alignment
Remember: P(f|e) is the sum of P(a, f|e) over all alignments a
Model and Parameters: Intuition
Translation prob.: t(f_j | e_i)
Example: [shown as an image on the slide]
Interpretation: How probable is it that we see f_j given e_i
Model and Parameters: Intuition
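Alignment/translation prob.: P(a, f | e)
Example (visual representation of a): P([one alignment of "le chat"] | “the cat”) < P([another alignment of "le chat"] | “the cat”), where each bracket stands for a drawing of "le chat" linked to "the cat" by a particular alignment
Interpretation: How probable are the alignment a and the translation f (given e)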
Model and Parameters: Intuition
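Alignment prob.: P(a | e, f)
Example: P([one alignment] | “le chat”, “the cat”) < P([another alignment] | “le chat”, “the cat”), where each bracket stands for a drawing of a particular alignment
Interpretation: How probable is alignment a (given e and f)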
Model and Parameters
How to compute:
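The slide's own formulas did not survive in this transcript. One standard way to relate the three quantities, assuming the simplified model in which every allowed alignment is equally likely a priori (so constant factors are dropped), is sketched below; the product form is an assumption about the simplified IBM Model 1, while the last two lines are just marginalization and the definition of conditional probability.

$$P(a, f \mid e) \propto \prod_{j} t(f_j \mid e_{a_j})$$
$$P(f \mid e) = \sum_{a} P(a, f \mid e)$$
$$P(a \mid e, f) = \frac{P(a, f \mid e)}{\sum_{a'} P(a', f \mid e)}$$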
Parameters
For IBM Model 1, we can compute all parameters given translation parameters: t(f_j | e_i)
How many of these are there?
|French vocabulary| x |English vocabulary|
Data
Two sentence pairs:
English    French
b c        x y
b          y
All Possible Alignments
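Sentence pair 1 (French: x, y) (English: b, c), two possible alignments:
• b-x, c-y
• b-y, c-x
Sentence pair 2 (French: y) (English: b), one possible alignment:
• b-y
Remember: simplifying assumption that each word must be aligned exactly once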
Expectation Maximization (EM)
0. Assume some value for the translation parameters t(f | e) and compute other parameter values
Two-step, iterative algorithm:
1. E-step: count alignments and translations under uncertainty, assuming these parameters
2. M-step: maximize log-likelihood (update parameters), using uncertain counts
[Figure: the E-step passes estimated counts to the M-step; illustrated with the probabilities of the two candidate alignments of "le chat" given “the cat”]
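The EM loop can be traced on the two-sentence toy corpus (English "b c" / French "x y", and English "b" / French "y"). The sketch below follows the simplified model only: it enumerates one-to-one alignments by brute force, which is feasible just for tiny sentences, and starts from uniform translation probabilities. The function and variable names are invented for this illustration.

```python
from collections import defaultdict
from itertools import permutations

# (English words, French words) pairs from the toy data.
corpus = [(["b", "c"], ["x", "y"]), (["b"], ["y"])]

def train(corpus, iterations=5):
    e_vocab = sorted({e for es, _ in corpus for e in es})
    f_vocab = sorted({f for _, fs in corpus for f in fs})
    # Step 0: assume uniform values for t(f | e).
    t = {e: {f: 1.0 / len(f_vocab) for f in f_vocab} for e in e_vocab}
    for _ in range(iterations):
        counts = {e: defaultdict(float) for e in e_vocab}
        for es, fs in corpus:
            # E-step: score every one-to-one alignment; perm[j] is the
            # English position aligned to French position j.
            scored = []
            for perm in permutations(range(len(es))):
                p = 1.0
                for j, i in enumerate(perm):
                    p *= t[es[i]][fs[j]]
                scored.append((perm, p))
            total = sum(p for _, p in scored)
            for perm, p in scored:
                for j, i in enumerate(perm):
                    counts[es[i]][fs[j]] += p / total  # fractional count
        # M-step: renormalize the fractional counts for each English word.
        t = {e: {f: counts[e][f] / sum(counts[e].values()) for f in f_vocab}
             for e in e_vocab}
    return t

t_first = train(corpus, iterations=1)
t_final = train(corpus, iterations=5)
```

After one iteration t(y | b) is already 0.75, because the second sentence pair forces b to align with y; later iterations push t(y | b) and t(x | c) toward 1.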
Review of IBM Model 1 & EM
Iteratively learned an alignment/translation model from sentence-aligned text (without “gold standard” alignments)
Model can now be used for alignment and/or word-level translation
We explored a simplified version of this; IBM Model 1 allows more types of alignments
Why is Model 1 insufficient?
Why won’t this produce great translations?
• Indifferent to order (language model may help?)
• Translates one word at a time
• Translates each word in isolation
• ...
Uses for Alignments
Component of machine translation systems
Produce a translation lexicon automatically
Cross-lingual projection/extraction of information
Supervision for training other models (for example, neural MT systems)
Evaluating Machine Translation
Human evaluations:
Test set (source, human reference translations, MT output)
Humans judge the quality of MT output (in one of several possible ways)
Koehn (2017), http://mt-class.org/jhu/slides/lecture-evaluation.pdf
Evaluating Machine Translation
Automatic evaluations:
Test set (source, human reference translations, MT output)
Aim to mimic (correlate with) human evaluations
Many metrics:
• TER (Translation Error/Edit Rate)
• HTER (Human-Targeted Translation Edit Rate)
• BLEU (Bilingual Evaluation Understudy)
• METEOR (Metric for Evaluation of Translation with Explicit Ordering)
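To give a feel for how such metrics compare MT output against a reference, here is a sketch of clipped (modified) n-gram precision, which is one ingredient of BLEU. It is not full BLEU: there is no brevity penalty, no combination across n-gram orders, and only a single reference is assumed. The function names are made up for this example.

```python
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def clipped_precision(candidate, reference, n=1):
    """Fraction of candidate n-grams found in the reference, where each
    n-gram is credited at most as many times as the reference contains it."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    return matched / total

reference = "the cat is on the mat".split()
degenerate = "the the the the the the the".split()
score_degenerate = clipped_precision(degenerate, reference, n=1)
score_exact = clipped_precision(reference, reference, n=2)
```

Clipping is what stops the degenerate output from scoring perfectly: "the" occurs only twice in the reference, so only two of its seven occurrences are credited.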
Machine Translation Alignment Now
Explicitly with fancier IBM models
Implicitly/learned jointly with attention in recurrent neural networks (RNNs)
Recall: N-gram to Maxent to Neural Language Models
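predict the next word given some context…
[Figure: context words w_{i-3}, w_{i-2}, w_{i-1} used to predict w_i]
compute beliefs about what is likely…
N-gram model:
$$p(w_i \mid w_{i-3}, w_{i-2}, w_{i-1}) \propto \mathrm{count}(w_{i-3}, w_{i-2}, w_{i-1}, w_i)$$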
Maxent model:
$$p(w_i \mid w_{i-3}, w_{i-2}, w_{i-1}) = \mathrm{softmax}(\theta \cdot f(w_{i-3}, w_{i-2}, w_{i-1}, w_i))$$
Hidden Markov Model Representation
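$$p(z_1, w_1, z_2, w_2, \dots, z_N, w_N) = p(z_1 \mid z_0)\,p(w_1 \mid z_1) \cdots p(z_N \mid z_{N-1})\,p(w_N \mid z_N) = \prod_i p(w_i \mid z_i)\,p(z_i \mid z_{i-1})$$
p(w_i | z_i): emission probabilities/parameters
p(z_i | z_{i-1}): transition probabilities/parameters
[Figure: graph with a chain of states z1 → z2 → z3 → z4 → …, each state z_i pointing to its observation w_i]
represent the probabilities and independence assumptions in a graph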
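The factorization can be evaluated directly for a given tag sequence. The sketch below works in log space and uses a "START" symbol for z_0; the two-tag model and all of its probabilities are hypothetical values chosen for the example.

```python
import math

def hmm_joint_log_prob(states, words, trans, emit, start="START"):
    """log p(z_1, w_1, ..., z_N, w_N) = sum_i log p(z_i | z_{i-1}) + log p(w_i | z_i)."""
    logp, prev = 0.0, start
    for z, w in zip(states, words):
        logp += math.log(trans[prev][z]) + math.log(emit[z][w])
        prev = z
    return logp

# Hypothetical parameters: trans[prev][z] = p(z | prev), emit[z][w] = p(w | z).
trans = {
    "START": {"N": 0.6, "V": 0.4},
    "N": {"N": 0.3, "V": 0.7},
    "V": {"N": 0.8, "V": 0.2},
}
emit = {
    "N": {"dogs": 0.7, "bark": 0.3},
    "V": {"dogs": 0.1, "bark": 0.9},
}

joint = math.exp(hmm_joint_log_prob(["N", "V"], ["dogs", "bark"], trans, emit))
```

Here the joint probability is the product 0.6 · 0.7 · 0.7 · 0.9: one transition and one emission factor per position.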
A Different Model’s Representation
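[Figure: graph with a chain of states z1 → z2 → z3 → z4 → …, each observation w_i pointing to its state z_i]
represent the probabilities and independence assumptions in a graph
Maximum Entropy Markov Model (MEMM)
$$p(z_i \mid z_{i-1}, w_i) \propto \exp(\theta^T f(w_i, z_{i-1}, z_i))$$
$$p(z_1, z_2, \dots, z_N \mid w_1, w_2, \dots, w_N) = p(z_1 \mid z_0, w_1) \cdots p(z_N \mid z_{N-1}, w_N) = \prod_i p(z_i \mid z_{i-1}, w_i)$$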
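A small Python sketch makes the local normalization concrete: each step applies a softmax over the possible next states, and the sequence probability is the product of these per-step distributions. The tag set, feature templates, and weights below are all invented for illustration.

```python
import math
from itertools import product

STATES = ["N", "V"]

def features(w, z_prev, z):
    """Hypothetical indicator features f(w_i, z_{i-1}, z_i)."""
    return {f"word={w},tag={z}": 1.0, f"prev={z_prev},tag={z}": 1.0}

def local_prob(theta, w, z_prev):
    """p(z | z_prev, w): a softmax over the candidate states z."""
    scores = {z: sum(theta.get(k, 0.0) * v for k, v in features(w, z_prev, z).items())
              for z in STATES}
    norm = sum(math.exp(s) for s in scores.values())
    return {z: math.exp(s) / norm for z, s in scores.items()}

def memm_prob(theta, states, words, start="START"):
    """p(z_1..z_N | w_1..w_N) as a product of locally normalized factors."""
    p, prev = 1.0, start
    for z, w in zip(states, words):
        p *= local_prob(theta, w, prev)[z]
        prev = z
    return p

theta = {"word=dogs,tag=N": 2.0, "word=bark,tag=V": 1.5, "prev=N,tag=V": 0.5}
words = ["dogs", "bark"]
total = sum(memm_prob(theta, list(seq), words) for seq in product(STATES, repeat=len(words)))
```

Because every step is normalized on its own, the probabilities of all tag sequences sum to 1 without any global normalizer.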
MEMMs
Discriminative: don’t care about generating observed sequence at all
Maxent: use features
Problem: Label-Bias problem
Label-Bias Problem
[Figure: a single state z_i with its observation w_i, with probability mass flowing in and out of the state]
• incoming mass must sum to 1
• outgoing mass must sum to 1
• observe, but do not generate (explain) the observation
Take-aways:
• the model can learn to ignore observations
• the model can get itself stuck on “bad” paths
(Linear Chain) Conditional Random Fields
Discriminative: don’t care about generating observed sequence at all
Condition on the entire observed word sequence w1…wN
Maxent: use features
Solves the label-bias problem
[Figure: linear-chain CRF over states z1, z2, z3, z4, … and observations w1, w2, w3, w4, …]
![Page 54: Other Sequence Models · 2019. 11. 18. · Slides courtesy Rebecca Knowles. Preprocessing Whereas recognition of the inherent dignity and of the equal and inalienable rights of all](https://reader035.vdocuments.site/reader035/viewer/2022070212/6102da8b0802201bfc0d3cdd/html5/thumbnails/54.jpg)
(Linear Chain) Conditional Random Fields
z1 …
w1 w2 w3 w4 …
z2 z3 z4
𝑝 𝑧1, … , 𝑧𝑁 𝑤1, … , 𝑤𝑁)
∝ෑ
𝑖
exp( 𝜃𝑇𝑓 𝑧𝑖−1, 𝑧𝑖 , 𝑤1, … , 𝑤𝑁 )
![Page 55: Other Sequence Models · 2019. 11. 18. · Slides courtesy Rebecca Knowles. Preprocessing Whereas recognition of the inherent dignity and of the equal and inalienable rights of all](https://reader035.vdocuments.site/reader035/viewer/2022070212/6102da8b0802201bfc0d3cdd/html5/thumbnails/55.jpg)
(Linear Chain) Conditional Random Fields
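$p(z_1, \dots, z_N \mid w_1, \dots, w_N) \propto \prod_i \exp\left(\theta^T f(z_{i-1}, z_i, \mathbf{w}_1, \dots, \mathbf{w}_N)\right)$
Each factor conditions on the entire observed sequence $\mathbf{w}_1, \dots, \mathbf{w}_N$.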
![Page 56: Other Sequence Models · 2019. 11. 18. · Slides courtesy Rebecca Knowles. Preprocessing Whereas recognition of the inherent dignity and of the equal and inalienable rights of all](https://reader035.vdocuments.site/reader035/viewer/2022070212/6102da8b0802201bfc0d3cdd/html5/thumbnails/56.jpg)
CRFs are Very Popular for {POS, NER, other sequence tasks}
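• POS: $f(z_{i-1}, z_i, \mathbf{w}) = \left(z_{i-1} = \text{Noun} \;\&\; z_i = \text{Verb} \;\&\; (w_{i-2} \text{ in list of adjectives or determiners})\right)$
$p(z_1, \dots, z_N \mid w_1, \dots, w_N) \propto \prod_i \exp\left(\theta^T f(z_{i-1}, z_i, \mathbf{w}_1, \dots, \mathbf{w}_N)\right)$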
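The POS feature can be read as an indicator function over two adjacent labels and the whole word sequence. A minimal sketch, assuming 0-based positions and a hypothetical word list `ADJ_OR_DET`:

```python
ADJ_OR_DET = {"the", "a", "an", "big", "red"}  # hypothetical word list


def f_pos(z_prev, z, words, i):
    """1 if z_{i-1} is Noun, z_i is Verb, and w_{i-2} is an adjective or determiner."""
    if i < 2:
        return 0  # w_{i-2} does not exist at the first two positions
    return int(z_prev == "Noun" and z == "Verb" and words[i - 2] in ADJ_OR_DET)


words = ["the", "dog", "barks"]
fires = f_pos("Noun", "Verb", words, 2)       # looks two words back, at "the"
silent = f_pos("Noun", "Noun", words, 2)      # wrong label pair
```

The feature inspects a word other than the one being tagged, which is the kind of access to the full sequence w that the conditional formulation allows.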
![Page 57: Other Sequence Models · 2019. 11. 18. · Slides courtesy Rebecca Knowles. Preprocessing Whereas recognition of the inherent dignity and of the equal and inalienable rights of all](https://reader035.vdocuments.site/reader035/viewer/2022070212/6102da8b0802201bfc0d3cdd/html5/thumbnails/57.jpg)
CRFs are Very Popular for {POS, NER, other sequence tasks}
• NER: $f_{\text{path } p}(z_{i-1}, z_i, \mathbf{w}) = \left(z_{i-1} = \text{Per} \;\&\; z_i = \text{Per} \;\&\; (\text{syntactic path } p \text{ involving } w_i \text{ exists})\right)$
![Page 58: Other Sequence Models · 2019. 11. 18. · Slides courtesy Rebecca Knowles. Preprocessing Whereas recognition of the inherent dignity and of the equal and inalienable rights of all](https://reader035.vdocuments.site/reader035/viewer/2022070212/6102da8b0802201bfc0d3cdd/html5/thumbnails/58.jpg)
CRFs are Very Popular for {POS, NER, other sequence tasks}
Can’t easily do these (the POS and NER features) with an HMM ➔ conditional models can allow richer features
![Page 59: Other Sequence Models · 2019. 11. 18. · Slides courtesy Rebecca Knowles. Preprocessing Whereas recognition of the inherent dignity and of the equal and inalienable rights of all](https://reader035.vdocuments.site/reader035/viewer/2022070212/6102da8b0802201bfc0d3cdd/html5/thumbnails/59.jpg)
CRFs are Very Popular for {POS, NER, other sequence tasks}
We’ll cover syntactic paths next class
![Page 60: Other Sequence Models · 2019. 11. 18. · Slides courtesy Rebecca Knowles. Preprocessing Whereas recognition of the inherent dignity and of the equal and inalienable rights of all](https://reader035.vdocuments.site/reader035/viewer/2022070212/6102da8b0802201bfc0d3cdd/html5/thumbnails/60.jpg)
CRFs are Very Popular for {POS, NER, other sequence tasks}
CRFs can be used in neural networks too:
• https://www.tensorflow.org/versions/r1.15/api_docs/python/tf/contrib/crf/CrfForwardRnnCell
• https://pytorch-crf.readthedocs.io/en/stable/
![Page 61: Other Sequence Models · 2019. 11. 18. · Slides courtesy Rebecca Knowles. Preprocessing Whereas recognition of the inherent dignity and of the equal and inalienable rights of all](https://reader035.vdocuments.site/reader035/viewer/2022070212/6102da8b0802201bfc0d3cdd/html5/thumbnails/61.jpg)
Conditional vs. Sequence
CRF Tutorial, Fig 1.2, Sutton & McCallum (2012)
We’ll cover these in 691: Graphical and Statistical Models of Learning
![Page 62: Other Sequence Models · 2019. 11. 18. · Slides courtesy Rebecca Knowles. Preprocessing Whereas recognition of the inherent dignity and of the equal and inalienable rights of all](https://reader035.vdocuments.site/reader035/viewer/2022070212/6102da8b0802201bfc0d3cdd/html5/thumbnails/62.jpg)
Outline
Review: EM for HMMs
Machine Translation Alignment
Limited Sequence Models
• Maximum Entropy Markov Models
• Conditional Random Fields
Recurrent Neural Networks
• Basic Definitions
• Example in PyTorch
![Page 63: Other Sequence Models · 2019. 11. 18. · Slides courtesy Rebecca Knowles. Preprocessing Whereas recognition of the inherent dignity and of the equal and inalienable rights of all](https://reader035.vdocuments.site/reader035/viewer/2022070212/6102da8b0802201bfc0d3cdd/html5/thumbnails/63.jpg)
Recall: N-gram to Maxent to Neural Language Models
predict the next word $w_i$ given some context $w_{i-3}, w_{i-2}, w_{i-1}$
compute beliefs about what is likely: $p(w_i \mid w_{i-3}, w_{i-2}, w_{i-1}) = \operatorname{softmax}(\theta_{w_i} \cdot f(w_{i-3}, w_{i-2}, w_{i-1}))$
create/use “distributed representations” $e_{i-3}, e_{i-2}, e_{i-1}$
combine these representations: C = f
[Figure labels: matrix-vector product; $e_w$; $\theta_{w_i}$]
![Page 64: Other Sequence Models · 2019. 11. 18. · Slides courtesy Rebecca Knowles. Preprocessing Whereas recognition of the inherent dignity and of the equal and inalienable rights of all](https://reader035.vdocuments.site/reader035/viewer/2022070212/6102da8b0802201bfc0d3cdd/html5/thumbnails/64.jpg)
A More Typical View of Recurrent Neural Language Modeling
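[Figure: unrolled RNN; inputs $w_{i-3}, w_{i-2}, w_{i-1}, w_i$; hidden states $h_{i-3}, h_{i-2}, h_{i-1}, h_i$; outputs $w_{i-2}, w_{i-1}, w_i, w_{i+1}$]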
![Page 65: Other Sequence Models · 2019. 11. 18. · Slides courtesy Rebecca Knowles. Preprocessing Whereas recognition of the inherent dignity and of the equal and inalienable rights of all](https://reader035.vdocuments.site/reader035/viewer/2022070212/6102da8b0802201bfc0d3cdd/html5/thumbnails/65.jpg)
A More Typical View of Recurrent Neural Language Modeling
observe these words one at a time
![Page 66: Other Sequence Models · 2019. 11. 18. · Slides courtesy Rebecca Knowles. Preprocessing Whereas recognition of the inherent dignity and of the equal and inalienable rights of all](https://reader035.vdocuments.site/reader035/viewer/2022070212/6102da8b0802201bfc0d3cdd/html5/thumbnails/66.jpg)
A More Typical View of Recurrent Neural Language Modeling
predict the next word
![Page 67: Other Sequence Models · 2019. 11. 18. · Slides courtesy Rebecca Knowles. Preprocessing Whereas recognition of the inherent dignity and of the equal and inalienable rights of all](https://reader035.vdocuments.site/reader035/viewer/2022070212/6102da8b0802201bfc0d3cdd/html5/thumbnails/67.jpg)
A More Typical View of Recurrent Neural Language Modeling
predict the next word
from these hidden states
![Page 68: Other Sequence Models · 2019. 11. 18. · Slides courtesy Rebecca Knowles. Preprocessing Whereas recognition of the inherent dignity and of the equal and inalienable rights of all](https://reader035.vdocuments.site/reader035/viewer/2022070212/6102da8b0802201bfc0d3cdd/html5/thumbnails/68.jpg)
A More Typical View of Recurrent Neural Language Modeling
“cell” (label on the repeated unit in the figure)
![Page 69: Other Sequence Models · 2019. 11. 18. · Slides courtesy Rebecca Knowles. Preprocessing Whereas recognition of the inherent dignity and of the equal and inalienable rights of all](https://reader035.vdocuments.site/reader035/viewer/2022070212/6102da8b0802201bfc0d3cdd/html5/thumbnails/69.jpg)
A Recurrent Neural Network Cell
[Figure: inputs $w_{i-1}, w_i$; hidden states $h_{i-1}, h_i$; outputs $w_i, w_{i+1}$]
![Page 70: Other Sequence Models · 2019. 11. 18. · Slides courtesy Rebecca Knowles. Preprocessing Whereas recognition of the inherent dignity and of the equal and inalienable rights of all](https://reader035.vdocuments.site/reader035/viewer/2022070212/6102da8b0802201bfc0d3cdd/html5/thumbnails/70.jpg)
A Recurrent Neural Network Cell
W: weights on the recurrent connection $h_{i-1} \to h_i$
![Page 71: Other Sequence Models · 2019. 11. 18. · Slides courtesy Rebecca Knowles. Preprocessing Whereas recognition of the inherent dignity and of the equal and inalienable rights of all](https://reader035.vdocuments.site/reader035/viewer/2022070212/6102da8b0802201bfc0d3cdd/html5/thumbnails/71.jpg)
A Recurrent Neural Network Cell
encoding: U maps the input word $w_i$ into the hidden state $h_i$
![Page 72: Other Sequence Models · 2019. 11. 18. · Slides courtesy Rebecca Knowles. Preprocessing Whereas recognition of the inherent dignity and of the equal and inalienable rights of all](https://reader035.vdocuments.site/reader035/viewer/2022070212/6102da8b0802201bfc0d3cdd/html5/thumbnails/72.jpg)
A Recurrent Neural Network Cell
decoding: S maps the hidden state $h_i$ to a prediction of the next word $w_{i+1}$
![Page 73: Other Sequence Models · 2019. 11. 18. · Slides courtesy Rebecca Knowles. Preprocessing Whereas recognition of the inherent dignity and of the equal and inalienable rights of all](https://reader035.vdocuments.site/reader035/viewer/2022070212/6102da8b0802201bfc0d3cdd/html5/thumbnails/73.jpg)
A Simple Recurrent Neural Network Cell
$h_i = \sigma(W h_{i-1} + U w_i)$
![Page 74: Other Sequence Models · 2019. 11. 18. · Slides courtesy Rebecca Knowles. Preprocessing Whereas recognition of the inherent dignity and of the equal and inalienable rights of all](https://reader035.vdocuments.site/reader035/viewer/2022070212/6102da8b0802201bfc0d3cdd/html5/thumbnails/74.jpg)
A Simple Recurrent Neural Network Cell
$\sigma(x) = \dfrac{1}{1 + \exp(-x)}$
![Page 75: Other Sequence Models · 2019. 11. 18. · Slides courtesy Rebecca Knowles. Preprocessing Whereas recognition of the inherent dignity and of the equal and inalienable rights of all](https://reader035.vdocuments.site/reader035/viewer/2022070212/6102da8b0802201bfc0d3cdd/html5/thumbnails/75.jpg)
![Page 76: Other Sequence Models · 2019. 11. 18. · Slides courtesy Rebecca Knowles. Preprocessing Whereas recognition of the inherent dignity and of the equal and inalienable rights of all](https://reader035.vdocuments.site/reader035/viewer/2022070212/6102da8b0802201bfc0d3cdd/html5/thumbnails/76.jpg)
![Page 77: Other Sequence Models · 2019. 11. 18. · Slides courtesy Rebecca Knowles. Preprocessing Whereas recognition of the inherent dignity and of the equal and inalienable rights of all](https://reader035.vdocuments.site/reader035/viewer/2022070212/6102da8b0802201bfc0d3cdd/html5/thumbnails/77.jpg)
A Simple Recurrent Neural Network Cell
$\hat{w}_{i+1} = \operatorname{softmax}(S h_i)$
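The two cell equations, $h_i = \sigma(W h_{i-1} + U w_i)$ and $\hat{w}_{i+1} = \operatorname{softmax}(S h_i)$, can be sketched in plain Python. The sizes (3-word vocabulary, hidden size 2), the matrix values, and the one-hot encoding of words are assumptions for illustration, not taken from the slides.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vec):
    return [sum(m * x for m, x in zip(row, vec)) for row in matrix]


def softmax(scores):
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rnn_step(W, U, S, h_prev, w_onehot):
    """h_i = sigma(W h_{i-1} + U w_i);  w_hat_{i+1} = softmax(S h_i)."""
    pre = [a + b for a, b in zip(matvec(W, h_prev), matvec(U, w_onehot))]
    h = [sigmoid(p) for p in pre]
    return h, softmax(matvec(S, h))


# Toy parameters (assumed values).
W = [[0.5, -0.3], [0.1, 0.8]]                 # hidden -> hidden
U = [[0.2, -0.4, 0.7], [0.6, 0.1, -0.5]]      # word -> hidden (encoding)
S = [[1.0, -1.0], [0.3, 0.2], [-0.7, 0.9]]    # hidden -> vocabulary (decoding)

h = [0.0, 0.0]
for word_id in [0, 2, 1]:                     # observe the words one at a time
    onehot = [1.0 if k == word_id else 0.0 for k in range(3)]
    h, next_word_probs = rnn_step(W, U, S, h, onehot)
```

The same W, U, and S are reused at every position; only h changes as the words are read.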
![Page 78: Other Sequence Models · 2019. 11. 18. · Slides courtesy Rebecca Knowles. Preprocessing Whereas recognition of the inherent dignity and of the equal and inalienable rights of all](https://reader035.vdocuments.site/reader035/viewer/2022070212/6102da8b0802201bfc0d3cdd/html5/thumbnails/78.jpg)
A Simple Recurrent Neural Network Cell
must learn matrices U, S, W
![Page 79: Other Sequence Models · 2019. 11. 18. · Slides courtesy Rebecca Knowles. Preprocessing Whereas recognition of the inherent dignity and of the equal and inalienable rights of all](https://reader035.vdocuments.site/reader035/viewer/2022070212/6102da8b0802201bfc0d3cdd/html5/thumbnails/79.jpg)
A Simple Recurrent Neural Network Cell
suggested solution: gradient descent on prediction ability
![Page 80: Other Sequence Models · 2019. 11. 18. · Slides courtesy Rebecca Knowles. Preprocessing Whereas recognition of the inherent dignity and of the equal and inalienable rights of all](https://reader035.vdocuments.site/reader035/viewer/2022070212/6102da8b0802201bfc0d3cdd/html5/thumbnails/80.jpg)
A Simple Recurrent Neural Network Cell
problem: U, S, and W are tied across inputs/timesteps
![Page 81: Other Sequence Models · 2019. 11. 18. · Slides courtesy Rebecca Knowles. Preprocessing Whereas recognition of the inherent dignity and of the equal and inalienable rights of all](https://reader035.vdocuments.site/reader035/viewer/2022070212/6102da8b0802201bfc0d3cdd/html5/thumbnails/81.jpg)
A Simple Recurrent Neural Network Cell
good news for you: many toolkits handle this (gradients for tied parameters) automatically
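To see what tying means for the gradient, here is a sketch on a deliberately simplified recurrence (scalar, no nonlinearity; an assumption made to keep the arithmetic readable): $h_t = w\,h_{t-1} + x_t$. Because the same w is used at every timestep, its gradient is a sum of one chain-rule term per use. This summing is usually called backpropagation through time, and it is what toolkits automate.

```python
def forward(w, xs):
    """Scalar linear RNN h_t = w * h_{t-1} + x_t, with h_0 = 0; returns all states."""
    hs = [0.0]
    for x in xs:
        hs.append(w * hs[-1] + x)
    return hs


def grad_wrt_tied_w(w, xs):
    """d h_T / d w: one chain-rule term for each timestep where w was used."""
    hs = forward(w, xs)
    T = len(xs)
    grad = 0.0
    for t in range(1, T + 1):
        # local term h_{t-1}, carried to the end through T - t more uses of w
        grad += (w ** (T - t)) * hs[t - 1]
    return grad


w, xs = 0.9, [1.0, 2.0, -1.0, 0.5]
analytic = grad_wrt_tied_w(w, xs)

# Sanity check against a central finite difference.
eps = 1e-6
numeric = (forward(w + eps, xs)[-1] - forward(w - eps, xs)[-1]) / (2 * eps)
```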
![Page 82: Other Sequence Models · 2019. 11. 18. · Slides courtesy Rebecca Knowles. Preprocessing Whereas recognition of the inherent dignity and of the equal and inalienable rights of all](https://reader035.vdocuments.site/reader035/viewer/2022070212/6102da8b0802201bfc0d3cdd/html5/thumbnails/82.jpg)
Why Is Training RNNs Hard?
Conceptually, it can get strange
But really getting the gradient just requires many applications of the chain rule for derivatives
![Page 83: Other Sequence Models · 2019. 11. 18. · Slides courtesy Rebecca Knowles. Preprocessing Whereas recognition of the inherent dignity and of the equal and inalienable rights of all](https://reader035.vdocuments.site/reader035/viewer/2022070212/6102da8b0802201bfc0d3cdd/html5/thumbnails/83.jpg)
Why Is Training RNNs Hard?
Vanishing (or exploding) gradients
Multiply the same matrices at each timestep ➔ multiply many matrices in the gradients
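A small sketch of why repeated multiplication makes gradients vanish, using a scalar sigmoid RNN $h_t = \sigma(w\,h_{t-1} + u\,x_t)$ (the scalar setting and the parameter values are assumptions). Each step contributes a factor $w\,\sigma'(\cdot)$ to $\partial h_t / \partial h_0$, and $\sigma'$ never exceeds 1/4.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def grad_through_time(w, u, xs, h0=0.5):
    """Return |d h_t / d h_0| for t = 1..T in h_t = sigmoid(w * h_{t-1} + u * x_t)."""
    h, prod, magnitudes = h0, 1.0, []
    for x in xs:
        h = sigmoid(w * h + u * x)
        prod *= w * h * (1.0 - h)   # chain rule: sigma'(a) = sigma(a) * (1 - sigma(a))
        magnitudes.append(abs(prod))
    return magnitudes


mags = grad_through_time(w=1.0, u=1.0, xs=[0.1] * 50)
```

With |w| ≤ 1 every factor is at most 1/4, so the influence of early inputs shrinks geometrically. With a large enough |w| the factors can exceed 1 and the product can grow instead.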
![Page 84: Other Sequence Models · 2019. 11. 18. · Slides courtesy Rebecca Knowles. Preprocessing Whereas recognition of the inherent dignity and of the equal and inalienable rights of all](https://reader035.vdocuments.site/reader035/viewer/2022070212/6102da8b0802201bfc0d3cdd/html5/thumbnails/84.jpg)
Why Is Training RNNs Hard?
One solution (for gradients that explode rather than vanish): clip the gradients to a max value
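Clipping can be sketched in a few lines. This version rescales the whole gradient vector when its L2 norm exceeds a threshold, which is one common variant; the slide does not say whether it means clipping by norm or clipping each value, so treat that choice as an assumption. In PyTorch, `torch.nn.utils.clip_grad_norm_` applies the same idea to a model's parameters.

```python
import math


def clip_by_norm(grads, max_norm):
    """Rescale the gradient vector so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)          # small gradients are left alone
    scale = max_norm / norm
    return [g * scale for g in grads]


clipped = clip_by_norm([30.0, -40.0], max_norm=5.0)    # norm 50, rescaled
unchanged = clip_by_norm([0.3, -0.4], max_norm=5.0)    # norm 0.5, untouched
```

Rescaling keeps the direction of the update and only limits its size.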
![Page 85: Other Sequence Models · 2019. 11. 18. · Slides courtesy Rebecca Knowles. Preprocessing Whereas recognition of the inherent dignity and of the equal and inalienable rights of all](https://reader035.vdocuments.site/reader035/viewer/2022070212/6102da8b0802201bfc0d3cdd/html5/thumbnails/85.jpg)
Outline
Review: EM for HMMs
Machine Translation Alignment
Limited Sequence Models
• Maximum Entropy Markov Models
• Conditional Random Fields
Recurrent Neural Networks
• Basic Definitions
• Example in PyTorch
![Page 86: Other Sequence Models · 2019. 11. 18. · Slides courtesy Rebecca Knowles. Preprocessing Whereas recognition of the inherent dignity and of the equal and inalienable rights of all](https://reader035.vdocuments.site/reader035/viewer/2022070212/6102da8b0802201bfc0d3cdd/html5/thumbnails/86.jpg)
Natural Language Processing
from torch import *
from keras import *
![Page 87: Other Sequence Models · 2019. 11. 18. · Slides courtesy Rebecca Knowles. Preprocessing Whereas recognition of the inherent dignity and of the equal and inalienable rights of all](https://reader035.vdocuments.site/reader035/viewer/2022070212/6102da8b0802201bfc0d3cdd/html5/thumbnails/87.jpg)
Pick Your Toolkit
PyTorch
Deeplearning4j
TensorFlow
DyNet
Caffe
Keras
MXNet
Gluon
CNTK
…
Comparisons:
https://en.wikipedia.org/wiki/Comparison_of_deep_learning_software
https://deeplearning4j.org/compare-dl4j-tensorflow-pytorch
https://github.com/zer0n/deepframeworks (older, 2015)
![Page 88: Other Sequence Models · 2019. 11. 18. · Slides courtesy Rebecca Knowles. Preprocessing Whereas recognition of the inherent dignity and of the equal and inalienable rights of all](https://reader035.vdocuments.site/reader035/viewer/2022070212/6102da8b0802201bfc0d3cdd/html5/thumbnails/88.jpg)
Defining A Simple RNN in Python (Modified Very Slightly)
http://pytorch.org/tutorials/intermediate/char_rnn_classification_tutorial.html
[Figure: unrolled RNN; inputs $w_{i-2}, w_{i-1}, w_i$; hidden states $h_{i-2}, h_{i-1}, h_i$; outputs $w_{i-1}, w_i, w_{i+1}$]
![Page 89: Other Sequence Models · 2019. 11. 18. · Slides courtesy Rebecca Knowles. Preprocessing Whereas recognition of the inherent dignity and of the equal and inalienable rights of all](https://reader035.vdocuments.site/reader035/viewer/2022070212/6102da8b0802201bfc0d3cdd/html5/thumbnails/89.jpg)
![Page 90: Other Sequence Models · 2019. 11. 18. · Slides courtesy Rebecca Knowles. Preprocessing Whereas recognition of the inherent dignity and of the equal and inalienable rights of all](https://reader035.vdocuments.site/reader035/viewer/2022070212/6102da8b0802201bfc0d3cdd/html5/thumbnails/90.jpg)
![Page 91: Other Sequence Models · 2019. 11. 18. · Slides courtesy Rebecca Knowles. Preprocessing Whereas recognition of the inherent dignity and of the equal and inalienable rights of all](https://reader035.vdocuments.site/reader035/viewer/2022070212/6102da8b0802201bfc0d3cdd/html5/thumbnails/91.jpg)
![Page 92: Other Sequence Models · 2019. 11. 18. · Slides courtesy Rebecca Knowles. Preprocessing Whereas recognition of the inherent dignity and of the equal and inalienable rights of all](https://reader035.vdocuments.site/reader035/viewer/2022070212/6102da8b0802201bfc0d3cdd/html5/thumbnails/92.jpg)
Defining A Simple RNN in Python (Modified Very Slightly)
http://pytorch.org/tutorials/intermediate/char_rnn_classification_tutorial.html
encode
![Page 93: Other Sequence Models · 2019. 11. 18. · Slides courtesy Rebecca Knowles. Preprocessing Whereas recognition of the inherent dignity and of the equal and inalienable rights of all](https://reader035.vdocuments.site/reader035/viewer/2022070212/6102da8b0802201bfc0d3cdd/html5/thumbnails/93.jpg)
Defining A Simple RNN in Python (Modified Very Slightly)
http://pytorch.org/tutorials/intermediate/char_rnn_classification_tutorial.html
decode
![Page 94: Other Sequence Models · 2019. 11. 18. · Slides courtesy Rebecca Knowles. Preprocessing Whereas recognition of the inherent dignity and of the equal and inalienable rights of all](https://reader035.vdocuments.site/reader035/viewer/2022070212/6102da8b0802201bfc0d3cdd/html5/thumbnails/94.jpg)
Training A Simple RNN in Python (Modified Very Slightly)
http://pytorch.org/tutorials/intermediate/char_rnn_classification_tutorial.html
![Page 95: Other Sequence Models · 2019. 11. 18. · Slides courtesy Rebecca Knowles. Preprocessing Whereas recognition of the inherent dignity and of the equal and inalienable rights of all](https://reader035.vdocuments.site/reader035/viewer/2022070212/6102da8b0802201bfc0d3cdd/html5/thumbnails/95.jpg)
Training A Simple RNN in Python (Modified Very Slightly)
Negative log-likelihood
![Page 96: Other Sequence Models · 2019. 11. 18. · Slides courtesy Rebecca Knowles. Preprocessing Whereas recognition of the inherent dignity and of the equal and inalienable rights of all](https://reader035.vdocuments.site/reader035/viewer/2022070212/6102da8b0802201bfc0d3cdd/html5/thumbnails/96.jpg)
Training A Simple RNN in Python (Modified Very Slightly)
get predictions
![Page 97: Other Sequence Models · 2019. 11. 18. · Slides courtesy Rebecca Knowles. Preprocessing Whereas recognition of the inherent dignity and of the equal and inalienable rights of all](https://reader035.vdocuments.site/reader035/viewer/2022070212/6102da8b0802201bfc0d3cdd/html5/thumbnails/97.jpg)
Training A Simple RNN in Python (Modified Very Slightly)
eval predictions
![Page 98: Other Sequence Models · 2019. 11. 18. · Slides courtesy Rebecca Knowles. Preprocessing Whereas recognition of the inherent dignity and of the equal and inalienable rights of all](https://reader035.vdocuments.site/reader035/viewer/2022070212/6102da8b0802201bfc0d3cdd/html5/thumbnails/98.jpg)
Training A Simple RNN in Python (Modified Very Slightly)
compute gradient
![Page 99: Other Sequence Models · 2019. 11. 18. · Slides courtesy Rebecca Knowles. Preprocessing Whereas recognition of the inherent dignity and of the equal and inalienable rights of all](https://reader035.vdocuments.site/reader035/viewer/2022070212/6102da8b0802201bfc0d3cdd/html5/thumbnails/99.jpg)
Training A Simple RNN in Python (Modified Very Slightly)
perform SGD
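The tutorial's code appeared as images and is not reproduced in this transcript. As a stand-in, here is a plain-Python sketch of the four annotated steps (get predictions, eval predictions with negative log-likelihood, compute gradient, perform SGD). It is an assumption-laden simplification: it trains only the decoding matrix S on one fixed hidden state, with a hand-derived gradient, whereas the tutorial uses PyTorch to differentiate through the whole RNN.

```python
import math


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def train_step(S, h, target, lr=0.5):
    """One SGD step for w_hat = softmax(S h) under negative log-likelihood."""
    # get predictions
    probs = softmax([sum(s * x for s, x in zip(row, h)) for row in S])
    # eval predictions: negative log-likelihood of the correct next word
    loss = -math.log(probs[target])
    # compute gradient: d loss / d S[k][j] = (probs[k] - [k == target]) * h[j]
    grad = [[(p - (1.0 if k == target else 0.0)) * x for x in h]
            for k, p in enumerate(probs)]
    # perform SGD: step against the gradient
    for k in range(len(S)):
        for j in range(len(h)):
            S[k][j] -= lr * grad[k][j]
    return loss


S = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]   # 3-word vocabulary, hidden size 2 (assumed)
h = [0.6, 0.4]                              # a fixed hidden state (assumed)
losses = [train_step(S, h, target=2) for _ in range(20)]
```

The loss starts at log 3 (a uniform guess over three words) and falls as S learns to prefer the target word.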
![Page 100: Other Sequence Models · 2019. 11. 18. · Slides courtesy Rebecca Knowles. Preprocessing Whereas recognition of the inherent dignity and of the equal and inalienable rights of all](https://reader035.vdocuments.site/reader035/viewer/2022070212/6102da8b0802201bfc0d3cdd/html5/thumbnails/100.jpg)
Another Solution: LSTMs/GRUs
LSTM: Long Short-Term Memory (Hochreiter & Schmidhuber, 1997)
GRU: Gated Recurrent Unit (Cho et al., 2014)
Basic Ideas: learn to forget
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
[Figure labels: forget line; representation line]
![Page 101: Other Sequence Models · 2019. 11. 18. · Slides courtesy Rebecca Knowles. Preprocessing Whereas recognition of the inherent dignity and of the equal and inalienable rights of all](https://reader035.vdocuments.site/reader035/viewer/2022070212/6102da8b0802201bfc0d3cdd/html5/thumbnails/101.jpg)
Outline
Review: EM for HMMs
Machine Translation Alignment
Limited Sequence Models
• Maximum Entropy Markov Models
• Conditional Random Fields
Recurrent Neural Networks
• Basic Definitions
• Example in PyTorch