algorithms for nlp - carnegie mellon universitydemo.clab.cs.cmu.edu/11711fa18/slides/fa18...
TRANSCRIPT
Machine Translation I
Yulia Tsvetkov – CMU
Slides: Chris Dyer – DeepMind; Taylor Berg-Kirkpatrick – CMU/UCSD, Dan Klein – UC Berkeley
Algorithms for NLP
Dependency representation
Dependency vs Constituency trees
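Languages with free word order
I prefer the morning flight through Denver
Я предпочитаю утренний перелет через Денвер (I prefer the morning flight through Denver)
Я предпочитаю через Денвер утренний перелет (I prefer through Denver the morning flight)
Утренний перелет я предпочитаю через Денвер (The morning flight I prefer through Denver)
Перелет утренний я предпочитаю через Денвер (The flight, morning, I prefer through Denver)
Через Денвер я предпочитаю утренний перелет (Through Denver I prefer the morning flight)
Я через Денвер предпочитаю утренний перелет (I through Denver prefer the morning flight)
...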
Dependency Constraints
▪ Syntactic structure is complete (connectedness)
  ▪ connectedness can be enforced by adding a special root node
▪ Syntactic structure is hierarchical (acyclicity)
  ▪ there is a unique path from the root to each vertex
▪ Every word has at most one syntactic head (single-head constraint)
  ▪ except the root, which has no incoming arcs
This makes the dependencies a tree
Projectivity
▪ Projective parse
  ▪ arcs don’t cross each other
  ▪ mostly true for English
▪ Non-projective structures are needed to account for
  ▪ long-distance dependencies
  ▪ flexible word order
Parsing algorithms
▪ Transition based
  ▪ greedy choice of local transitions guided by a good classifier
  ▪ deterministic
  ▪ MaltParser (Nivre et al. 2008)
▪ Graph based
  ▪ Maximum Spanning Tree (the highest-scoring tree) for a sentence
  ▪ McDonald et al.’s (2005) MSTParser
  ▪ Martins et al.’s (2009) Turbo Parser
Configuration for transition-based parsing
Buffer: unprocessed words
Stack: partially processed words
Oracle: a classifier
At each step choose:
▪ Shift
▪ LeftArc or Reduce left
▪ RightArc or Reduce right
Shift-Reduce Parsing
Configuration:
▪ Stack, Buffer, Oracle, Set of dependency relations
Operations by a classifier at each step:
▪ Shift
  ▪ remove w1 from the buffer, add it to the top of the stack as s1
▪ LeftArc or Reduce left
  ▪ assert a head-dependent relation between s1 and s2 (s1 is the head, s2 the dependent)
  ▪ remove s2 from the stack
▪ RightArc or Reduce right
  ▪ assert a head-dependent relation between s2 and s1 (s2 is the head, s1 the dependent)
  ▪ remove s1 from the stack
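The three operations above can be sketched as a small interpreter over a stack and a buffer. This is a minimal illustration, not the lecture's code: the function name `arc_standard`, the action strings, and the convention that words are numbered from 1 with 0 as an artificial ROOT are all assumptions made for the example.

```python
def arc_standard(n_words, actions):
    """Apply arc-standard transitions and return the set of (head, dependent) arcs.

    Assumption of this sketch: words are numbered 1..n_words and 0 is ROOT.
    """
    stack = [0]                           # ROOT starts on the stack
    buffer = list(range(1, n_words + 1))  # unprocessed words
    arcs = set()
    for action in actions:
        if action == "SHIFT":
            # move the first buffer word to the top of the stack
            stack.append(buffer.pop(0))
        elif action == "LEFTARC":
            # s1 (top) is the head of s2 (second); remove s2
            s1 = stack.pop()
            s2 = stack.pop()
            arcs.add((s1, s2))
            stack.append(s1)
        elif action == "RIGHTARC":
            # s2 (second) is the head of s1 (top); remove s1
            s1 = stack.pop()
            arcs.add((stack[-1], s1))
        else:
            raise ValueError(f"unknown action: {action}")
    return arcs


# Illustrative sentence "book the flight": 1=book, 2=the, 3=flight
actions = ["SHIFT", "SHIFT", "SHIFT", "LEFTARC", "RIGHTARC", "RIGHTARC"]
print(sorted(arc_standard(3, actions)))  # [(0, 1), (1, 3), (3, 2)]
```

In a real parser the action at each step would come from the classifier rather than from a fixed list.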
Shift-Reduce Parsing (arc-standard)
Training an Oracle
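▪ How to extract the training set? Replay each gold tree and record, for every configuration, the action it licenses:
  ▪ if LeftArc produces a gold arc → LeftArc
  ▪ if RightArc produces a gold arc
    ▪ and all dependents of s1 have already been processed → RightArc
  ▪ else → Shift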
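The extraction rule above can be sketched as a static oracle that replays a gold tree under arc-standard transitions. The function name `oracle_actions` and the `heads` list encoding (index 0 unused, head 0 meaning ROOT) are assumptions of this sketch; it also assumes the gold tree is projective.

```python
def oracle_actions(heads):
    """Return the arc-standard action sequence that rebuilds a gold tree.

    heads[i] is the gold head of word i (1-based); heads[0] is unused and
    0 denotes ROOT. Assumes a projective tree.
    """
    n = len(heads) - 1
    stack, buffer = [0], list(range(1, n + 1))
    attached = set()  # words that have already received their head
    actions = []

    def has_pending_dependents(word):
        return any(heads[d] == word and d not in attached
                   for d in range(1, n + 1))

    while buffer or len(stack) > 1:
        if len(stack) >= 2:
            s1, s2 = stack[-1], stack[-2]
            if s2 != 0 and heads[s2] == s1:
                # LeftArc is correct: s1 is the gold head of s2
                actions.append("LEFTARC")
                attached.add(s2)
                del stack[-2]
                continue
            if heads[s1] == s2 and not has_pending_dependents(s1):
                # RightArc is correct and s1 has collected all its dependents
                actions.append("RIGHTARC")
                attached.add(s1)
                stack.pop()
                continue
        if not buffer:
            raise ValueError("no valid action: tree is not projective")
        actions.append("SHIFT")
        stack.append(buffer.pop(0))
    return actions


# Illustrative tree for "book the flight": ROOT->book, book->flight, flight->the
print(oracle_actions([None, 0, 3, 1]))
```

Each (configuration, action) pair visited during the replay becomes one training example for the classifier.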
Arc-Eager
▪ LEFTARC: Assert a head-dependent relation between b1 (head) and s1 (dependent); pop the stack.
▪ RIGHTARC: Assert a head-dependent relation between s1 (head) and b1 (dependent); shift b1 to be s1.
▪ SHIFT: Remove b1 and push it to be s1.
▪ REDUCE: Pop the stack.
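A minimal sketch of the four arc-eager operations, assuming the same hypothetical encoding as before (words numbered from 1, 0 as ROOT, actions given as strings). Here s1 is the top of the stack and b1 the front of the buffer.

```python
def arc_eager(n_words, actions):
    """Apply arc-eager transitions and return the set of (head, dependent) arcs."""
    stack = [0]                           # 0 is an artificial ROOT
    buffer = list(range(1, n_words + 1))
    arcs = set()
    for action in actions:
        if action == "LEFTARC":
            # b1 is the head of s1; pop the stack
            arcs.add((buffer[0], stack.pop()))
        elif action == "RIGHTARC":
            # s1 is the head of b1; b1 moves onto the stack
            arcs.add((stack[-1], buffer[0]))
            stack.append(buffer.pop(0))
        elif action == "SHIFT":
            stack.append(buffer.pop(0))
        elif action == "REDUCE":
            stack.pop()
        else:
            raise ValueError(f"unknown action: {action}")
    return arcs


# Illustrative sentence "book the flight": 1=book, 2=the, 3=flight
print(sorted(arc_eager(3, ["RIGHTARC", "SHIFT", "LEFTARC", "RIGHTARC"])))
```

Unlike arc-standard, the right arc from "book" to "flight" is added as soon as both words are visible, before "flight" has been reduced.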
Graph-Based Parsing Algorithms
▪ Start with a fully-connected directed graph
▪ Find a Maximum Spanning Tree (the highest-scoring tree)
▪ Chu and Liu (1965) and Edmonds (1967) algorithm
edge-factored approaches
Chu-Liu Edmonds algorithm
Select best incoming edge for each node
Subtract its score from all incoming edges
Contract nodes if there are cycles
Stopping condition
Recursively compute MST
Expand contracted nodes
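The steps listed above can be sketched recursively. This is an illustrative implementation under stated assumptions, not the lecture's code: scores are edge-factored and higher is better, nodes are integers with 0 as ROOT, every non-root node has at least one incoming edge, and the names `find_cycle` and `chu_liu_edmonds` are made up for the example.

```python
def find_cycle(head):
    """Return a list of nodes forming a cycle in a {node: head} map, or None."""
    for start in head:
        path, seen, node = [], set(), start
        while node in head and node not in seen:
            seen.add(node)
            path.append(node)
            node = head[node]
        if node in seen:
            return path[path.index(node):]
    return None


def chu_liu_edmonds(scores, root=0):
    """scores maps (head, dependent) -> score; returns {dependent: head}."""
    scores = {(h, d): s for (h, d), s in scores.items() if d != root and h != d}
    nodes = {n for edge in scores for n in edge}
    # Select the best incoming edge for each node
    best = {d: max((s, h) for (h, dd), s in scores.items() if dd == d)[1]
            for d in nodes - {root}}
    cycle = find_cycle(best)
    if cycle is None:  # stopping condition: the greedy choice is a tree
        return best
    in_cycle, c = set(cycle), max(nodes) + 1  # c is the contracted node
    new_scores, origin = {}, {}
    for (h, d), s in scores.items():
        if h in in_cycle and d in in_cycle:
            continue
        if d in in_cycle:
            # subtract the score of the cycle edge this edge would replace
            key, s = (h, c), s - scores[best[d], d]
        elif h in in_cycle:
            key = (c, d)
        else:
            key = (h, d)
        if key not in new_scores or s > new_scores[key]:
            new_scores[key], origin[key] = s, (h, d)
    contracted = chu_liu_edmonds(new_scores, root)  # recursively compute MST
    tree = {}
    for d, h in contracted.items():  # expand the contracted node
        orig_h, orig_d = origin[h, d]
        tree[orig_d] = orig_h
    for d in cycle:  # keep cycle edges except the one that was broken
        tree.setdefault(d, best[d])
    return tree


# Illustrative scores for a three-word sentence (0 = ROOT)
scores = {(0, 1): 9, (0, 2): 10, (0, 3): 9, (1, 2): 20, (2, 1): 30,
          (2, 3): 30, (3, 2): 0, (1, 3): 3, (3, 1): 11}
print(chu_liu_edmonds(scores))
```

In this example the greedy choice creates a cycle between words 1 and 2, which is contracted, resolved by attaching word 2 to ROOT, and then expanded.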
Summary
▪ Transition-based
  ▪ + Fast
  ▪ + Rich features of context
  ▪ - Greedy decoding
▪ Graph-based
  ▪ + Exact or close to exact decoding
  ▪ - Weaker features
Well-engineered versions of the approaches achieve comparable accuracy (on English), but make different errors
→ combining the strategies results in a substantial boost in performance
End of Previous Lecture
Machine Translation
Two Views of MT
▪ Direct modeling (aka pattern matching)
  ▪ I have really good learning algorithms and a bunch of example inputs (source language sentences) and outputs (target language translations)
▪ Code breaking (aka the noisy channel, Bayes rule)
  ▪ I know the target language
  ▪ I have example translated texts (example enciphered data)
MT as Direct Modeling
▪ one model does everything
▪ trained to reproduce a corpus of translations
MT as Code Breaking
Noisy Channel Model
Which is better?
▪ Noisy channel -
  ▪ easy to use monolingual target language data
  ▪ search happens under a product of two models (individual models can be simple, product can be powerful)
  ▪ obtaining probabilities requires renormalizing
▪ Direct model -
  ▪ directly model the process you care about
  ▪ model must be very powerful
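For reference, the "product of two models" and the renormalization mentioned above follow from Bayes' rule. The notation here is an assumption (the formulas on the original slides were not recovered): e is a target-language sentence, f a source-language sentence, p(e) the language model and p(f | e) the translation (channel) model.

```latex
\hat{e} \;=\; \arg\max_{e} \, p(e \mid f)
        \;=\; \arg\max_{e} \, \frac{p(f \mid e)\, p(e)}{p(f)}
        \;=\; \arg\max_{e} \, p(f \mid e)\, p(e),
\qquad
p(f) \;=\; \sum_{e'} p(f \mid e')\, p(e').
```

Search only needs the unnormalized product p(f | e) p(e), since p(f) is constant in e. Recovering an actual probability p(e | f) requires dividing by p(f), which sums over all candidate translations. A direct model instead parameterizes p(e | f) itself.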
Where are we in 2018?
▪ Direct modeling is where most of the action is
  ▪ Neural networks are very good at generalizing and conceptually very simple
  ▪ Inference in “product of two models” is hard
▪ Noisy channel ideas are incredibly important and still play a big role in how we think about translation
A common problem
Both models must assign probabilities to how a sentence in one language translates into a sentence in another language.
Levels of Transfer
Levels of Transfer: The Vauquois triangle
Ambiguities
▪ words
▪ morphology
▪ syntax
▪ semantics
▪ pragmatics
Machine Translation: Examples
Word-Level MT: Examples
▪ la politique de la haine . (Foreign Original)
▪ politics of hate . (Reference Translation)
▪ the policy of the hatred . (IBM4+N-grams+Stack)
▪ nous avons signé le protocole . (Foreign Original)
▪ we did sign the memorandum of agreement . (Reference Translation)
▪ we have signed the protocol . (IBM4+N-grams+Stack)
▪ où était le plan solide ? (Foreign Original)
▪ but where was the solid plan ? (Reference Translation)
▪ where was the economic base ? (IBM4+N-grams+Stack)
Phrasal MT: Examples
Learning from Data
http://opus.nlpl.eu
Learning from Data: The Noisy Channel
▪ There is a lot more monolingual data in the world than translated data
▪ Easy to get about 1 trillion words of English by crawling the web
▪ With some work, you can get 1 billion translated words of English-French
  ▪ What about English-German?
  ▪ What about Japanese-Turkish?
Phrase-Based MT
[Diagram: a parallel corpus feeds the Translation Model (source phrase f, target phrase e, translation features); a monolingual corpus feeds the Language Model; a held-out parallel corpus feeds the Reranking Model (feature weights).]
Neural MT: Conditional Language Modeling
Slide credit: Kyunghyun Cho
Research Problems
▪ How can we formalize the process of learning to translate from examples?
▪ How can we formalize the process of finding translations for new inputs?
▪ If our model produces many outputs, how do we find the best one?
▪ If we have a gold standard translation, how can we tell if our output is good or bad?
MT Evaluation is Hard
▪ Language variability: there is no single correct translation
▪ Human evaluation is subjective
▪ How good is good enough? Depends on the application of MT (publication, reading, …)
▪ Is system A better than system B?
▪ MT Evaluation is a research topic on its own.
  ▪ How do we do the evaluation?
  ▪ How do we measure whether an evaluation method is good?
Human Evaluation
▪ Adequacy and Fluency
  ▪ Usually on a Likert scale (1 “not adequate at all” to 5 “completely adequate”)
▪ Ranking of the outputs of different systems at the system level
▪ Post-editing effort: how much effort does it take for a translator (or even a monolingual speaker) to “fix” the MT output so it is “good”?
▪ Task-based evaluation: was the performance of the MT system sufficient to perform a task?
Automatic Evaluation
▪ The BLEU score proposed by IBM (Papineni et al., 2002)
  ▪ Exact matches of n-grams
  ▪ Match against a set of reference translations for greater discrimination between good and bad translations
  ▪ Account for adequacy by looking at word precision
  ▪ Account for fluency by calculating n-gram precisions for n=1,2,3,4
  ▪ No recall (because difficult with multiple references)
  ▪ To compensate for the missing recall: “brevity penalty”. Translations that are too short are penalized
  ▪ Final score is the geometric average of the n-gram precisions, times the brevity penalty
  ▪ Calculate the aggregate score over a large test set
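The recipe above can be sketched in a few lines. This is a simplified illustration rather than a reference implementation: the function names are made up, tokenization is plain whitespace splitting, there is no smoothing (any zero n-gram precision gives a score of 0), and the brevity penalty uses the reference length closest to each hypothesis.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU.

    hypotheses: list of token lists; references: for each hypothesis,
    a list of reference token lists.
    """
    matches, totals = [0] * max_n, [0] * max_n
    hyp_len = ref_len = 0
    for hyp, refs in zip(hypotheses, references):
        hyp_len += len(hyp)
        # reference length closest to the hypothesis length
        ref_len += min((abs(len(r) - len(hyp)), len(r)) for r in refs)[1]
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            max_ref = Counter()
            for r in refs:
                max_ref |= ngrams(r, n)  # element-wise max over references
            # clipped matches: an n-gram counts at most as often as in a reference
            matches[n - 1] += sum((hyp_counts & max_ref).values())
            totals[n - 1] += sum(hyp_counts.values())
    if min(matches) == 0:
        return 0.0
    # geometric average of the n-gram precisions, aggregated over the test set
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity * math.exp(log_precision)


ref = "we have signed the protocol .".split()
print(bleu([ref], [[ref]]))      # identical output scores 1.0
print(bleu([ref[:4]], [[ref]]))  # perfect precision, but penalized for brevity
```

Note that counts are pooled over the whole test set before the precisions are computed, which is why sentence-level BLEU without smoothing is often 0.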
BLEU vs. Human Scores
BLEU Scores
▪ More reference human translations result in better and more accurate scores
▪ General interpretability of scale
  ▪ Scores over 30 (single reference) are generally understandable
  ▪ Scores over 50 (single reference) are generally good and fluent
WMT 2018
http://www.statmt.org/wmt18/
Systems Overview
Corpus-Based MT
Modeling correspondences between languages
Sentence-aligned parallel corpus:
Yo lo haré mañana / I will do it tomorrow
Hasta pronto / See you soon
Hasta pronto / See you around
Machine translation system:
Model of translation
Novel sentence: Yo lo haré pronto
I will do it soon
I will do it around
See you tomorrow
Phrase-Based MT
Phrase-Based System Overview
Sentence-aligned corpus → Word alignments → Phrase table (translation model)
cat ||| chat ||| 0.9
the cat ||| le chat ||| 0.8
dog ||| chien ||| 0.8
house ||| maison ||| 0.6
my house ||| ma maison ||| 0.9
language ||| langue ||| 0.9
…
Many slides and examples from Philipp Koehn or John DeNero
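A phrase table in the `left ||| right ||| score` layout shown above is easy to load into a nested dictionary. This is a minimal sketch: the function name `load_phrase_table` is hypothetical, and a real phrase table usually carries several feature scores per entry rather than one.

```python
from collections import defaultdict

# The entries shown on the slide
PHRASE_TABLE = """\
cat ||| chat ||| 0.9
the cat ||| le chat ||| 0.8
dog ||| chien ||| 0.8
house ||| maison ||| 0.6
my house ||| ma maison ||| 0.9
language ||| langue ||| 0.9
"""


def load_phrase_table(text):
    """Parse 'left ||| right ||| score' lines into {left: {right: score}}."""
    table = defaultdict(dict)
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        left, right, score = (field.strip() for field in line.split("|||"))
        table[left][right] = float(score)
    return table


table = load_phrase_table(PHRASE_TABLE)
print(table["the cat"])  # {'le chat': 0.8}
```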
Word Alignment
Lexical Translation
▪ How do we translate a word? Look it up in the dictionary
Haus : house, home, shell, household
▪ Multiple translations
  ▪ Different word senses, different registers, different inflections (?)
  ▪ house, home are common
  ▪ shell is specialized (the Haus of a snail is a shell)
How common is each?
MLE
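Maximum likelihood estimation here amounts to relative frequencies: count how often Haus is translated by each English word and divide by the total. The counts below are hypothetical, chosen only to illustrate the arithmetic (the numbers on the original slide were not recovered), and `mle` is an illustrative name.

```python
def mle(counts):
    """Relative-frequency (maximum likelihood) estimate from raw counts."""
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


# Hypothetical counts of English translations observed for German "Haus"
haus_counts = {"house": 8, "home": 6, "household": 4, "shell": 2}

for word, prob in mle(haus_counts).items():
    print(f"t({word} | Haus) = {prob:.2f}")
```

The resulting values sum to 1 and form a lexical translation distribution t(e | Haus).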
Lexical Translation
▪ Goal: a model p(e | f)
  ▪ where e and f are complete English and Foreign sentences
The Alignment Function
▪ Alignments can be visualized by drawing links between two sentences, and they are represented as vectors of positions:
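As a small illustration of the vector representation, the sketch below stores, for each target word, the 1-based position of the source word it is linked to. The sentence pair and the function name `alignment_links` are illustrative assumptions, not taken from the slide.

```python
def alignment_links(target, source, a):
    """Turn an alignment vector into (target word, source word) links.

    a[j] is the 1-based source position aligned to the j-th target word.
    """
    return [(t, source[i - 1]) for t, i in zip(target, a)]


source = "das Haus ist klein".split()
target = "the house is small".split()
a = [1, 2, 3, 4]  # monotone alignment: target word j links to source word j

for t, s in alignment_links(target, source, a):
    print(f"{t} <- {s}")
```

A reordered source such as "klein ist das Haus" would keep the same target sentence but use the vector [3, 4, 2, 1].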
Reordering
▪ Words may be reordered during translation.
Word Dropping
▪ A source word may not be translated at all
![Page 74: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711... · Graph-Based Parsing Algorithms Start with a fully-connected directed graph Find](https://reader033.vdocuments.site/reader033/viewer/2022050206/5f58fd4e6d6a426b4e40875b/html5/thumbnails/74.jpg)
Word Insertion
▪ Words may be inserted during translation
▪ English just does not have an equivalent
▪ But it must be explained - we typically assume every source sentence contains a NULL token
![Page 75: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711... · Graph-Based Parsing Algorithms Start with a fully-connected directed graph Find](https://reader033.vdocuments.site/reader033/viewer/2022050206/5f58fd4e6d6a426b4e40875b/html5/thumbnails/75.jpg)
One-to-many Translation
▪ A source word may translate into more than one target word
![Page 76: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711... · Graph-Based Parsing Algorithms Start with a fully-connected directed graph Find](https://reader033.vdocuments.site/reader033/viewer/2022050206/5f58fd4e6d6a426b4e40875b/html5/thumbnails/76.jpg)
Many-to-one Translation
▪ In lexical translation, more than one source word cannot translate as a single unit
![Page 77: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711... · Graph-Based Parsing Algorithms Start with a fully-connected directed graph Find](https://reader033.vdocuments.site/reader033/viewer/2022050206/5f58fd4e6d6a426b4e40875b/html5/thumbnails/77.jpg)
Mary did not slap the green witch
?
Generative Story
![Page 78: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711... · Graph-Based Parsing Algorithms Start with a fully-connected directed graph Find](https://reader033.vdocuments.site/reader033/viewer/2022050206/5f58fd4e6d6a426b4e40875b/html5/thumbnails/78.jpg)
Generative Story
Mary did not slap the green witch
fertility: n(3|slap)
Mary not slap slap slap the green witch
![Page 79: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711... · Graph-Based Parsing Algorithms Start with a fully-connected directed graph Find](https://reader033.vdocuments.site/reader033/viewer/2022050206/5f58fd4e6d6a426b4e40875b/html5/thumbnails/79.jpg)
Generative Story
Mary did not slap the green witch
fertility: n(3|slap)
Mary not slap slap slap the green witch
NULL insertion: P(NULL)
Mary not slap slap slap NULL the green witch
![Page 80: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711... · Graph-Based Parsing Algorithms Start with a fully-connected directed graph Find](https://reader033.vdocuments.site/reader033/viewer/2022050206/5f58fd4e6d6a426b4e40875b/html5/thumbnails/80.jpg)
Generative Story
Mary did not slap the green witch
fertility: n(3|slap)
Mary not slap slap slap the green witch
NULL insertion: P(NULL)
Mary not slap slap slap NULL the green witch
lexical translation: t(la|the)
Mary no daba una bofetada a la verde bruja
![Page 81: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711... · Graph-Based Parsing Algorithms Start with a fully-connected directed graph Find](https://reader033.vdocuments.site/reader033/viewer/2022050206/5f58fd4e6d6a426b4e40875b/html5/thumbnails/81.jpg)
Generative Story
Mary did not slap the green witch
fertility: n(3|slap)
Mary not slap slap slap the green witch
NULL insertion: P(NULL)
Mary not slap slap slap NULL the green witch
lexical translation: t(la|the)
Mary no daba una bofetada a la verde bruja
distortion: d(j|i)
_ _ _ _ _ _ _ _ _
![Page 82: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711... · Graph-Based Parsing Algorithms Start with a fully-connected directed graph Find](https://reader033.vdocuments.site/reader033/viewer/2022050206/5f58fd4e6d6a426b4e40875b/html5/thumbnails/82.jpg)
The IBM Models (Brown et al. 93)
Mary did not slap the green witch
fertility: n(3|slap)
Mary not slap slap slap the green witch
NULL insertion: P(NULL)
Mary not slap slap slap NULL the green witch
lexical translation: t(la|the)
Mary no daba una bofetada a la verde bruja
distortion: d(j|i)
Mary no daba una bofetada a la bruja verde
[from Al-Onaizan and Knight, 1998]
![Page 83: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711... · Graph-Based Parsing Algorithms Start with a fully-connected directed graph Find](https://reader033.vdocuments.site/reader033/viewer/2022050206/5f58fd4e6d6a426b4e40875b/html5/thumbnails/83.jpg)
Alignment Models
▪ IBM Model 1: lexical translation
▪ IBM Model 2: alignment model, global monotonicity
▪ HMM model: local monotonicity
▪ fastalign: efficient reparametrization of Model 2
▪ IBM Model 3: fertility
▪ IBM Model 4: relative alignment model
▪ IBM Model 5: deficiency
▪ ...
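To make the first entry in this list concrete, here is a minimal sketch of EM training for a Model 1 style lexical translation table. The slides do not show a training procedure at this point, so the function name `train_model1`, the three-sentence German-English toy corpus, the uniform initialization, and the iteration count are all assumptions made for illustration.

```python
from collections import defaultdict


def train_model1(corpus, iterations=10):
    """EM for a lexical translation table t[(target_word, source_word)].

    corpus: list of (source_words, target_words) pairs.
    A NULL token is prepended to every source sentence.
    """
    corpus = [(["NULL"] + src, tgt) for src, tgt in corpus]
    target_vocab = {w for _, tgt in corpus for w in tgt}
    # Uniform initialization over the target vocabulary.
    t = defaultdict(lambda: 1.0 / len(target_vocab))

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (target, source) links
        total = defaultdict(float)  # expected count of each source word
        # E-step: distribute each target word over the source words.
        for src, tgt in corpus:
            for w_t in tgt:
                norm = sum(t[(w_t, w_s)] for w_s in src)
                for w_s in src:
                    frac = t[(w_t, w_s)] / norm
                    count[(w_t, w_s)] += frac
                    total[w_s] += frac
        # M-step: renormalize expected counts per source word.
        t = defaultdict(
            float, {pair: c / total[pair[1]] for pair, c in count.items()}
        )
    return t


# Hypothetical toy corpus, not taken from the slides.
corpus = [
    (["das", "Haus"], ["the", "house"]),
    (["das", "Buch"], ["the", "book"]),
    (["ein", "Buch"], ["a", "book"]),
]
t = train_model1(corpus)
for pair in [("the", "das"), ("house", "das"), ("book", "Buch")]:
    print(pair, round(t[pair], 3))
```

Because Model 1 has no distortion or fertility parameters, the E-step needs only a per-target-word normalization rather than a sum over whole alignments.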
![Page 84: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711... · Graph-Based Parsing Algorithms Start with a fully-connected directed graph Find](https://reader033.vdocuments.site/reader033/viewer/2022050206/5f58fd4e6d6a426b4e40875b/html5/thumbnails/84.jpg)
P(e,a|f)
$P(e, \text{alignment} \mid f) = \prod p_f \, \prod p_t \, \prod p_d$
Mary did not slap the green witch
fertility: n(3|slap)
Mary not slap slap slap the green witch
NULL insertion: P(NULL)
Mary not slap slap slap NULL the green witch
lexical translation: t(la|the)
Mary no daba una bofetada a la verde bruja
distortion: d(j|i)
Mary no daba una bofetada a la bruja verde
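A small worked example of this product, reading the three groups of factors as fertility, lexical translation, and distortion probabilities (that reading, the function name, and every number below are assumptions chosen only to make the arithmetic easy to follow):

```python
import math


def derivation_probability(fertility_probs, translation_probs, distortion_probs):
    """Multiply all factors of one derivation, summing logs to avoid underflow."""
    factors = fertility_probs + translation_probs + distortion_probs
    return math.exp(sum(math.log(p) for p in factors))


# Made-up factors for a two-word source that yields three target words:
fertility_probs = [0.5, 0.25]          # one n(.|.) per source word
translation_probs = [0.5, 0.5, 0.5]    # one t(.|.) per target word
distortion_probs = [0.5, 0.5, 0.5]     # one d(.|.) per target word

p = derivation_probability(fertility_probs, translation_probs, distortion_probs)
print(p)
```

Each additional word contributes more factors below 1, which is why long sentences are usually scored in log space.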
![Page 85: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711... · Graph-Based Parsing Algorithms Start with a fully-connected directed graph Find](https://reader033.vdocuments.site/reader033/viewer/2022050206/5f58fd4e6d6a426b4e40875b/html5/thumbnails/85.jpg)
P(e|f)
$P(e \mid f) = \sum_{\text{all possible alignments}} \prod p_f \, \prod p_t \, \prod p_d$
Mary did not slap the green witch
fertility: n(3|slap)
Mary not slap slap slap the green witch
NULL insertion: P(NULL)
Mary not slap slap slap NULL the green witch
lexical translation: t(la|the)
Mary no daba una bofetada a la verde bruja
distortion: d(j|i)
Mary no daba una bofetada a la bruja verde
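The sum over alignments can be sketched by brute force for a simplified model. The code below is not the full fertility and distortion model from the slide: it assumes a Model 1 style setup with only lexical translation factors and a uniform alignment prior, and the translation table `t` holds made-up values. Under those assumptions the explicit sum equals a per-word factored form, which the second function computes.

```python
from itertools import product
import math


def marginal_by_enumeration(src, tgt, t):
    """Sum the probability of every alignment vector explicitly."""
    total = 0.0
    # One source position per target word: len(src) ** len(tgt) alignments.
    for a in product(range(len(src)), repeat=len(tgt)):
        p = 1.0
        for j, i in enumerate(a):
            p *= t.get((tgt[j], src[i]), 0.0) / len(src)  # uniform prior
        total += p
    return total


def marginal_factored(src, tgt, t):
    """Same quantity, using the fact that target words are independent here."""
    p = 1.0
    for w_t in tgt:
        p *= sum(t.get((w_t, w_s), 0.0) for w_s in src) / len(src)
    return p


# Hypothetical translation table t[(target_word, source_word)].
t = {
    ("la", "the"): 0.7, ("la", "NULL"): 0.1, ("la", "witch"): 0.05,
    ("bruja", "witch"): 0.8, ("bruja", "the"): 0.05, ("bruja", "NULL"): 0.02,
}
src = ["NULL", "the", "witch"]
tgt = ["la", "bruja"]

brute = marginal_by_enumeration(src, tgt, t)
fast = marginal_factored(src, tgt, t)
print(brute, fast)
```

The enumeration visits 3^2 = 9 alignments here, and the count grows exponentially with target length; the factored shortcut is available only because this simplified model has no fertility or distortion terms.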
![Page 86: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711... · Graph-Based Parsing Algorithms Start with a fully-connected directed graph Find](https://reader033.vdocuments.site/reader033/viewer/2022050206/5f58fd4e6d6a426b4e40875b/html5/thumbnails/86.jpg)
Evaluating Alignment Models
▪ How do we measure quality of a word-to-word model?
▪ Method 1: use in an end-to-end translation system
  ▪ Hard to measure translation quality
  ▪ Option: human judges
  ▪ Option: reference translations (NIST, BLEU)
  ▪ Option: combinations (HTER)
  ▪ Actually, no one uses word-to-word models alone as TMs
▪ Method 2: measure quality of the alignments produced
  ▪ Easy to measure
  ▪ Hard to know what the gold alignments should be
  ▪ Often does not correlate well with translation quality (like perplexity in LMs)
![Page 87: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711... · Graph-Based Parsing Algorithms Start with a fully-connected directed graph Find](https://reader033.vdocuments.site/reader033/viewer/2022050206/5f58fd4e6d6a426b4e40875b/html5/thumbnails/87.jpg)
Alignment Error Rate
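The slide content for this metric did not survive extraction, so the sketch below assumes the commonly used definition with sure links S and possible links P (S a subset of P): AER = 1 - (|A ∩ S| + |A ∩ P|) / (|A| + |S|), where A is the predicted link set. The function name and the toy link sets are illustrative.

```python
def alignment_error_rate(predicted, sure, possible):
    """AER for sets of (source_index, target_index) links.

    Sure links are treated as possible links as well.
    """
    possible = possible | sure
    matched_sure = len(predicted & sure)
    matched_possible = len(predicted & possible)
    return 1.0 - (matched_sure + matched_possible) / (len(predicted) + len(sure))


# Hypothetical gold annotation and system output.
sure = {(0, 0), (1, 1)}
possible = {(2, 2)}
predicted = {(0, 0), (1, 1), (2, 3)}

aer = alignment_error_rate(predicted, sure, possible)
print(round(aer, 3))
```

Lower is better: a prediction that recovers exactly the sure links scores 0, and links outside the possible set are penalized.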
![Page 88: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711... · Graph-Based Parsing Algorithms Start with a fully-connected directed graph Find](https://reader033.vdocuments.site/reader033/viewer/2022050206/5f58fd4e6d6a426b4e40875b/html5/thumbnails/88.jpg)
![Page 89: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711... · Graph-Based Parsing Algorithms Start with a fully-connected directed graph Find](https://reader033.vdocuments.site/reader033/viewer/2022050206/5f58fd4e6d6a426b4e40875b/html5/thumbnails/89.jpg)
![Page 90: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711... · Graph-Based Parsing Algorithms Start with a fully-connected directed graph Find](https://reader033.vdocuments.site/reader033/viewer/2022050206/5f58fd4e6d6a426b4e40875b/html5/thumbnails/90.jpg)
![Page 91: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711... · Graph-Based Parsing Algorithms Start with a fully-connected directed graph Find](https://reader033.vdocuments.site/reader033/viewer/2022050206/5f58fd4e6d6a426b4e40875b/html5/thumbnails/91.jpg)
![Page 92: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711... · Graph-Based Parsing Algorithms Start with a fully-connected directed graph Find](https://reader033.vdocuments.site/reader033/viewer/2022050206/5f58fd4e6d6a426b4e40875b/html5/thumbnails/92.jpg)
Problems with Lexical Translation
▪ Complexity -- exponential in sentence length
▪ Weak reordering -- the output is not fluent
▪ Many local decisions -- error propagation
![Page 93: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711... · Graph-Based Parsing Algorithms Start with a fully-connected directed graph Find](https://reader033.vdocuments.site/reader033/viewer/2022050206/5f58fd4e6d6a426b4e40875b/html5/thumbnails/93.jpg)
Phrase-Based Translation
$P(e, \text{alignment} \mid f) = p_{\text{segmentation}} \cdot p_{\text{translation}} \cdot p_{\text{reorderings}}$
![Page 94: Algorithms for NLP - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711... · Graph-Based Parsing Algorithms Start with a fully-connected directed graph Find](https://reader033.vdocuments.site/reader033/viewer/2022050206/5f58fd4e6d6a426b4e40875b/html5/thumbnails/94.jpg)
Phrase-Based MT
[Diagram: phrase-based MT pipeline, input f, output e]
▪ Parallel corpus → Translation Model: source phrase f, target phrase e, translation features
▪ Monolingual corpus → Language Model
▪ Held-out parallel corpus → Reranking Model: feature weights
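One way the pieces in this diagram could fit together is a weighted sum of model scores used to rank candidate translations. The sketch below assumes a linear combination of log-probability features; the feature names `tm` and `lm`, the candidate strings, the scores, and the weights are all made up for illustration, and in a real system the weights would be tuned on the held-out parallel corpus.

```python
def combined_score(features, weights):
    """Weighted sum of feature values (log-probabilities in this sketch)."""
    return sum(weights[name] * value for name, value in features.items())


# Hypothetical candidates for one input sentence, with made-up log-scores
# from a translation model ("tm") and a language model ("lm").
candidates = {
    "Mary did not slap the green witch": {"tm": -4.0, "lm": -6.0},
    "Mary not slap the witch green": {"tm": -3.5, "lm": -9.0},
}
weights = {"tm": 1.0, "lm": 1.0}

best = max(candidates, key=lambda c: combined_score(candidates[c], weights))
print(best)
```

Here the language model score outweighs the slightly better translation model score of the disfluent candidate, so the fluent sentence ranks first.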