Machine Translation, WiSe 2016/2017 (Hasso Plattner Institute)
Example-Based Machine Translation
Dr. Mariana Neves, January 23rd, 2017
Overview
● Example-Based Machine Translation (EBMT)
● EBMT‘s Workflow
● EBMT‘s Working
● EBMT vs. Case-Based Reasoning (CBR)
● Text Similarity
● Recombination
● Conclusions
Fundamental ideas about MT (Nagao 1984)
„Man does not translate a simple sentence by doing deep linguistic analysis.“
(http://nlp.stanford.edu:8080/parser/)
„Man does the translation …
● first, by properly decomposing an input sentence into certain fragmental phrases [...]
● then by translating these fragmental phrases into other language phrases,
● and finally by properly composing these fragmental translations into one long sentence.“
My dog
likes eating
also
sausage
Mein Hund
isst gern
auch
Wurst
Mein Hund isst auch gern Wurst
Example-Based MT
● Translation of fragmental phrases by analogy
● It is similar to SMT‘s decoding process
Machine Translation
● Knowledge-based MT: Rule-based MT
● Data-driven MT: Example-based MT, Statistical MT, Neural MT
Example-Based MT
(http://nlp.ist.i.kyoto-u.ac.jp/EN/index.php?plugin=attach&refer=KUROHASHI-KAWAHARA-LAB&openfile=EBMT.png)
Example-Based MT (EBMT)
● Analogy (text similarity) is the key in EBMT
● Requirements for text similarity:
● (1) measure of similarity: similar documents should be measured as similar, and vice-versa;
● (2) large lexical knowledge networks to support similarity, e.g., WordNet, Wikipedia, etc.
Processing of source sentences
● Rule-Based MT (meaning graphs)
● Statistical MT (phrases with probabilities): my dog ↔ mein Hund (p=0.75), also ↔ auch (p=0.66), ...
● Example-Based MT (matched templates): my dog ↔ mein Hund, also ↔ auch, sausage ↔ Wurst
(http://demo.ark.cs.cmu.edu/parse)
Entry into the target language
● Rule-Based MT (interlingual): Late (top of the triangle)
● Statistical MT: immediate
● Example-Based MT (similar to RBMT transfer): intermediate level of NLP processing (middle of the triangle)
Workflows
● Rule-Based MT
● Lexical access, morphology generation, syntax planning
● Statistical MT
● Mapping source sentence fragments, concatenation, scoring
● Example-Based MT
● Template matching, recombination
Vauquois triangle for EBMT
source text → target text
● EXACT MATCH (direct translation)
● ALIGNMENT (transfer)
● MATCHING (analysis)
● RECOMBINATION (generation)
Data-driven MT: EBMT vs. SMT
● Statistical MT
● Probabilities to assess the merit of candidates
● Probabilities to rank candidates (decoding)
● Stitching together translated fragments
● Example-based MT
● Similarity score between input fragments and fragments in the database (most critical step!)
● Syntactic and/or semantic similarity to rank candidates
● NLP layers until (possibly) deep semantic analysis (closer to RBMT)
● Stitching together translated fragments (closer to SMT)
Essential steps in EBMT
● Phrase fragment matching
● Translation of segments
● Recombination
Base of examples
● He buys mangoes.
● Er kauft Mangos.
● This is a matter of international politics.
● Das ist ein Thema der internationalen Politik.
● They read a book.
● Sie lesen ein Buch.
New sentence:
„He buys a book on international politics.“
He buys ↔ Er kauft (Mangos)
a book ↔ (Sie lesen) ein Buch
international politics ↔ (Das ist ein Thema der) internationalen Politik
“Er kauft ein Buch (on) internationalen Politik”
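The fragment lookup above can be sketched in a few lines of Python. This is only a toy illustration, not the lecture's method: the `FRAGMENTS` table is a hypothetical, hand-aligned extract of the example base, and the greedy longest-match strategy is an assumption. Unmatched words are passed through in parentheses, as on the slide.

```python
# Hypothetical hand-aligned fragments taken from the three example pairs.
FRAGMENTS = {
    ("he", "buys"): "Er kauft",
    ("a", "book"): "ein Buch",
    ("international", "politics"): "internationalen Politik",
}

def translate_by_fragments(sentence, fragments=FRAGMENTS):
    """Greedy longest-match lookup of source fragments (assumed strategy)."""
    words = sentence.rstrip(".").lower().split()
    output, i = [], 0
    while i < len(words):
        # Try the longest fragment starting at position i first.
        for length in range(len(words) - i, 0, -1):
            key = tuple(words[i:i + length])
            if key in fragments:
                output.append(fragments[key])
                i += length
                break
        else:
            # No example covers this word: keep it, marked as untranslated.
            output.append("(" + words[i] + ")")
            i += 1
    return " ".join(output)

print(translate_by_fragments("He buys a book on international politics."))
```

The untranslated "(on)" shows why function words and recombination need separate treatment: no example in the base covers the preposition.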
Essential questions
● Which sentences from the example base are useful?
● Which parts of the input sentence to match? (words, phrases, etc)
● How is the matching done efficiently?
● Should the matching be on instances or classes? (dog,cat vs. animal)
● How are function words (articles, prepositions) inserted and where?
● Which function words are picked? (functional or non-functional)
Translation of a sentence
● (1) Parts of the new sentence match parts of existing sentences (similar to SMT)
He buys ↔ Er kauft (Mangos)
a book ↔ (Sie lesen) ein Buch
international politics ↔ (Das ist ein Thema der) internationalen Politik
● (2) Properties of the new sentence match properties of existing sentences (linguistically-enriched SMT)
● Morphosyntactic structures (POS tags, chunks)
● Parse trees (constituent and dependency)
● Semantic graphs (disambiguated words, semantic roles, speech acts)
Parse tree similarity
New sentence:
(ROOT (S (NP (PRP He)) (VP (VBZ buys) (NP (NP (DT a) (NN book)) (PP (IN on) (NP (JJ international) (NNS politics))))) (. .)))
Example base:
(ROOT (S (NP (PRP He)) (VP (VBZ buys) (NP (NNS mangoes))) (. .)))
(ROOT (S (NP (DT This)) (VP (VBZ is) (NP (NP (DT a) (NN matter)) (PP (IN of) (NP (JJ international) (NNS politics))))) (. .)))
(ROOT (S (NP (PRP They)) (VP (VBP read) (NP (DT a) (NN book))) (. .)))
Word matching
● Exploitation of semantic similarity
● Use of knowledge networks, such as WordNet
● hypernyms: „mango“ and „fruit“
● sisters: „water“ and „tea“
● Measures for semantic similarity: Resnik; Lin; Jiang and Conrath
Ambiguity (semantic role labeling)
● „know“ (English) to „kennen“ or „wissen“ (German)
I know the aircraft and flight number.
I know my husband since school.
(http://demo.ark.cs.cmu.edu/parse)
How does one know which words are important?
● Semantic role labeling to get the roles of the words
EBMT is good for sublanguage phenomena
● e.g., learning various templates for possessives
子供の犬 ↔ Children's dog (template: N1 の N2 ↔ N1's N2)
木の根 ↔ Root of a tree (template: N1 の N2 ↔ N2 of N1)
本のページ ↔ Pages in book (template: N1 の N2 ↔ N2 in N1)
Case-based reasoning (CBR)
● CBR: learning by analogy
● EBMT: translation by analogy
CBR
(https://www.uni-trier.de/index.php?id=44502&L=2)
CBR and EBMT
(new sentence to be translated)
Case base: parallel corpora (example base)
(matched sentences found by similarity)
(matching rules exploiting features)
(rules for recombination, e.g., extraction of sentence parts and templates)
(intermediate output)
(recombination rules for resolving boundary friction)
(translation)
(new sentence-translation pair)
Text similarity
● Computing similarity for a pair of sentences, each given as a sequence of words:
$S_1: u_1, u_2, u_3, \ldots, u_n$
$S_2: v_1, v_2, v_3, \ldots, v_n$
● Word-based similarity
● Tree and graph-based similarity
● CBR‘s similarity adapted to EBMT
Word-based similarity
● Edit distance: the minimum number of insertions, deletions and substitutions needed to transform one sentence into another
● Levenshtein distance
(https://en.wikipedia.org/wiki/Levenshtein_distance)
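As a minimal sketch, the Levenshtein distance can be computed with the usual dynamic-programming table. The version below works on any two sequences, so passing word lists gives a word-level distance between sentences. The example sentences are taken from the example base, and whitespace tokenization is an assumption.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]

s1 = "he buys a book".split()
s2 = "he buys mangoes".split()
print(levenshtein(s1, s2))  # one substitution plus one deletion
```

Only two rows of the table are kept at a time, so memory stays proportional to the length of the second sequence.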
● Bag of words: unordered set of words of each sentence
$B(S_1) = \{u_1, u_2, u_3, \ldots, u_n\}$
$B(S_2) = \{v_1, v_2, v_3, \ldots, v_n\}$
$$\mathrm{Dice}(B(S_1), B(S_2)) = \frac{|B(S_1) \cap B(S_2)|}{|B(S_1)| + |B(S_2)|}$$
$$\mathrm{Jaccard}(B(S_1), B(S_2)) = \frac{|B(S_1) \cap B(S_2)|}{|B(S_1) \cup B(S_2)|}$$
(The standard Dice coefficient carries an additional factor 2 in the numerator, so that identical bags score 1.)
Example:
S1 = “Peter hired a car for the trip.”
S2 = “For the trip, a car was hired by Peter.”
$$\mathrm{Dice}(B(S_1), B(S_2)) = \frac{7}{7 + 9}$$
$$\mathrm{Jaccard}(B(S_1), B(S_2)) = \frac{7}{9}$$
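The two values can be checked with a short script. This is a sketch under two assumptions: words are lowercased and punctuation is dropped when building the bags, and `dice_slide` follows the slide's formula, which omits the factor 2 of the standard Dice coefficient.

```python
import re

def bag(sentence):
    """Unordered set of lowercased word tokens (assumed tokenization)."""
    return set(re.findall(r"\w+", sentence.lower()))

def dice_slide(a, b):
    """Dice as written on the slide: |A ∩ B| / (|A| + |B|), no factor 2."""
    return len(a & b) / (len(a) + len(b))

def jaccard(a, b):
    """Jaccard coefficient: |A ∩ B| / |A ∪ B|."""
    return len(a & b) / len(a | b)

b1 = bag("Peter hired a car for the trip.")
b2 = bag("For the trip, a car was hired by Peter.")
print(len(b1), len(b2), len(b1 & b2))  # 7 9 7
print(dice_slide(b1, b2))              # 7 / 16
print(jaccard(b1, b2))                 # 7 / 9
```

Both measures ignore word order, which is why the active and passive sentences score as highly similar.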
● Vector-based similarity: used for information retrieval and based on the vocabulary of the corpus
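$V(S_1) = (0, 0, 1, 0, \ldots, 1)$
$V(S_2) = (1, 0, 0, 0, \ldots, 1)$
$$\mathrm{cosine}(V(S_1), V(S_2)) = \frac{V(S_1) \cdot V(S_2)}{|V(S_1)| \cdot |V(S_2)|}$$
The numerator is the dot product of the two vectors; the denominator is the scalar product of their lengths.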
● Vectors with term frequencies: vectors of integers instead of binaries
$V(S_1) = (0, 0, 2, 0, \ldots, 1)$
$V(S_2) = (4, 0, 0, 0, \ldots, 2)$
$$\mathrm{cosine}(V(S_1), V(S_2)) = \frac{V(S_1) \cdot V(S_2)}{|V(S_1)| \cdot |V(S_2)|}$$
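A minimal sketch of the cosine measure over term-frequency vectors follows. The two sentences come from the example base; the vocabulary is built from just these two sentences and tokenization is plain whitespace splitting, both of which are simplifying assumptions.

```python
import math
from collections import Counter

def tf_vector(sentence, vocabulary):
    """Term-frequency vector of a sentence over a fixed vocabulary order."""
    counts = Counter(sentence.lower().split())
    return [counts[w] for w in vocabulary]

def cosine(u, v):
    """Dot product divided by the product of the vector lengths."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

s1 = "they read a book"
s2 = "he buys a book"
vocab = sorted(set(s1.split()) | set(s2.split()))
v1, v2 = tf_vector(s1, vocab), tf_vector(s2, vocab)
print(vocab)           # ['a', 'book', 'buys', 'he', 'read', 'they']
print(v1, v2)          # [1, 1, 0, 0, 1, 1] [1, 1, 1, 1, 0, 0]
print(cosine(v1, v2))  # 2 shared words, both vectors have length 2
```

With binary vectors the same function applies unchanged; only the vector construction differs.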
● Vectors with TF and IDF: term frequencies (TF) and inverse document (sentence) frequencies (IDF)
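$$\mathrm{idf}(w) = \log\left(\frac{N}{|\{S : w \in S\}|}\right)$$
where $N$ is the total number of sentences in the corpus.
Each component of $V(S_1)$ and $V(S_2)$ is then given by:
$$\mathrm{tf}(w_p \in S_1) \cdot \mathrm{idf}(w_p)$$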
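The weighting can be sketched as follows, treating each sentence of the example base as one "document". The natural logarithm is an assumption (the slide does not fix the base), and the three-sentence corpus is only illustrative.

```python
import math
from collections import Counter

def idf(word, sentences):
    """log(N / number of sentences containing the word); natural log assumed."""
    n_containing = sum(1 for s in sentences if word in s)
    return math.log(len(sentences) / n_containing)

def tfidf_vector(sentence, vocabulary, sentences):
    """One tf * idf component per vocabulary word."""
    tf = Counter(sentence)
    return [tf[w] * idf(w, sentences) for w in vocabulary]

corpus = [s.split() for s in ["he buys mangoes", "they read a book", "he buys a book"]]
vocab = sorted({w for s in corpus for w in s})
vec = tfidf_vector(corpus[0], vocab, corpus)
for word, weight in zip(vocab, vec):
    print(f"{word:8s} {weight:.3f}")
```

Words that occur in few sentences ("mangoes") receive a higher weight than words shared by several sentences ("he", "buys"), and words absent from the sentence contribute zero.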
Tree and graph-based similarity
● Based on constituency and dependency parse trees of S1 and S2 .
(ROOT (S (NP (PRP$ My) (NN dog)) (ADVP (RB also)) (VP (VBZ likes) (S (VP (VBG eating) (NP (NN sausage))))) (. .)))
nmod:poss(dog-2, My-1)
nsubj(likes-4, dog-2)
advmod(likes-4, also-3)
root(ROOT-0, likes-4)
xcomp(likes-4, eating-5)
dobj(eating-5, sausage-6)
(http://nlp.stanford.edu:8080/parser/)
● For constituency matching, the non-terminals and terminals of the two trees should match when the trees are traversed in identical order.
(ROOT (S (NP (PRP He)) (VP (VBZ buys) (NP (NP (DT a) (NN book)) (PP (IN on) (NP (JJ international) (NNS politics))))) (. .)))
(ROOT (S (NP (PRP He)) (VP (VBZ buys) (NP (NNS mangoes))) (. .)))
● Constituency tree similarity:
$$S(S_1, S_2) = \frac{M}{\max(N_1, N_2)}$$
● $N_1$ is the number of nodes in the constituency tree of $S_1$;
● $N_2$ is the number of nodes in the constituency tree of $S_2$;
● $M$ is the number of nodes matched in a particular traversal order.
If $S(S_1, S_2)$ is above a certain threshold, the sentences are considered similar.
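One possible reading of this measure is sketched below. The slides do not specify how $M$ is counted; the assumption here is a simultaneous top-down traversal in which a node counts as matched when its label agrees with its counterpart and all ancestors matched too. The small bracket parser and all function names are illustrative only.

```python
import re

def parse(bracketed):
    """Parse a bracketed tree into (label, children); words become leaf nodes."""
    tokens = re.findall(r"\(|\)|[^\s()]+", bracketed)
    pos = 0
    def read():
        nonlocal pos
        tok = tokens[pos]; pos += 1
        if tok != "(":
            return (tok, [])          # terminal (word)
        label = tokens[pos]; pos += 1
        children = []
        while tokens[pos] != ")":
            children.append(read())
        pos += 1                      # consume ")"
        return (label, children)
    return read()

def count(tree):
    """Number of nodes, terminals included."""
    return 1 + sum(count(c) for c in tree[1])

def matched(t1, t2):
    """Nodes matched when both trees are walked together, child by child."""
    if t1[0] != t2[0]:
        return 0
    return 1 + sum(matched(c1, c2) for c1, c2 in zip(t1[1], t2[1]))

def tree_similarity(t1, t2):
    return matched(t1, t2) / max(count(t1), count(t2))

t1 = parse("(ROOT (S (NP (PRP He)) (VP (VBZ buys) (NP (NP (DT a) (NN book)) (PP (IN on) (NP (JJ international) (NNS politics))))) (. .)))")
t2 = parse("(ROOT (S (NP (PRP He)) (VP (VBZ buys) (NP (NNS mangoes))) (. .)))")
print(count(t1), count(t2), matched(t1, t2))  # 24 13 11
print(tree_similarity(t1, t2))                # 11 / 24
```

Under this reading the two trees agree down to the object noun phrase and diverge below it, which is where the new sentence needs material from other examples.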
● Time complexity is linear in the length of the longer sentence.
● For performance reasons, suffix trees are usually employed:
(https://www.cs.jhu.edu/~ccb/publications/compact-data-structure-for-searchable-translation-memories.pdf)
● For dependency tree matching, we compare the relations and their arguments
He buys a book on international politics:
nsubj(buys-2, he-1)
root(ROOT-0, buys-2)
det(book-4, a-3)
dobj(buys-2, book-4)
case(politics-7, on-5)
amod(politics-7, international-6)
nmod(book-4, politics-7)
He buys mangoes:
nsubj(buys-2, he-1)
root(ROOT-0, buys-2)
dobj(buys-2, mangoes-3)
● Dependency tree matching is given by a weighted matching
● Highest weight for predicate/relation
● Less weight for the two arguments
$$w_r + w_{arg1} + w_{arg2} = 1$$
$$S(S_1, S_2) = \frac{\sum_{i=1}^{|D_1|} \sum_{j=1}^{|D_2|} \left[ w_r\,\delta(R_i^1, R_j^2) + w_{arg1}\,\delta(A_1^i, B_1^j) + w_{arg2}\,\delta(A_2^i, B_2^j) \right]}{\max(|D_1|, |D_2|)}$$
● $|D_1|$: number of relations in $S_1$
● $|D_2|$: number of relations in $S_2$
● $R_i^1$: $i$th relation in $S_1$
● $R_j^2$: $j$th relation in $S_2$
● $A_1^i$: first argument of the $i$th relation in $S_1$
● $A_2^i$: second argument of the $i$th relation in $S_1$
● $B_1^j$: first argument of the $j$th relation in $S_2$
● $B_2^j$: second argument of the $j$th relation in $S_2$
● $\delta(x, y) = 1$ if $x = y$, and 0 otherwise
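The weighted matching can be sketched as below. The sketch assumes that first arguments are compared with first arguments and second with second across the two sentences (the natural reading of the weights for argument 1 and argument 2), that word indices are stripped from the arguments, and that the weights 0.5 / 0.25 / 0.25 are purely illustrative values satisfying the sum-to-one constraint.

```python
def delta(x, y):
    """Kronecker-style indicator: 1 if equal, 0 otherwise."""
    return 1 if x == y else 0

def dependency_similarity(d1, d2, w_r=0.5, w_arg1=0.25, w_arg2=0.25):
    """d1, d2: lists of (relation, first_argument, second_argument) triples."""
    total = 0.0
    for rel1, a1, a2 in d1:
        for rel2, b1, b2 in d2:
            total += (w_r * delta(rel1, rel2)
                      + w_arg1 * delta(a1, b1)
                      + w_arg2 * delta(a2, b2))
    return total / max(len(d1), len(d2))

# Relations of "He buys a book on international politics." (indices dropped)
d1 = [("nsubj", "buys", "he"), ("root", "ROOT", "buys"), ("det", "book", "a"),
      ("dobj", "buys", "book"), ("case", "politics", "on"),
      ("amod", "politics", "international"), ("nmod", "book", "politics")]
# Relations of "He buys mangoes."
d2 = [("nsubj", "buys", "he"), ("root", "ROOT", "buys"), ("dobj", "buys", "mangoes")]

print(dependency_similarity(d1, d2))  # 3.25 / 7
```

Note that the double sum compares every relation of one sentence with every relation of the other, so a frequently repeated head word contributes several partial matches.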
● Deep semantic graph-based similarity: deep semantics includes sense disambiguation, relations, speech acts, co-references, etc.
● An advanced similarity can be calculated based on such graphs
(Figure from Bhattacharyya 2015)
CBR‘s similarity adapted to EBMT
● The similarity below is a common measure in CBR:
$$S(I, R) = \frac{\sum_{i=1}^{n} w_i \times s(f_i^I, f_i^R)}{\sum_{i=1}^{n} w_i}$$
● $I$: input case (sentence)
● $R$: retrieved case (sentence)
● $f_i$ are the features on which $s$ operates as similarity measure, with respective weights $w_i$
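A minimal sketch of this weighted average is given below. The three features (sentence length, bag of words, main verb), their local similarity functions and their weights are all hypothetical choices made for illustration; the slides do not prescribe them.

```python
def cbr_similarity(input_case, retrieved_case, features):
    """features: list of (name, weight, local_similarity_function) triples."""
    numerator = sum(w * s(input_case[name], retrieved_case[name])
                    for name, w, s in features)
    denominator = sum(w for _, w, _ in features)
    return numerator / denominator

# Hypothetical local similarity measures, each returning a value in [0, 1].
def length_sim(a, b):
    return 1 - abs(a - b) / max(a, b)

def jaccard(a, b):
    return len(a & b) / len(a | b)

def exact(a, b):
    return 1.0 if a == b else 0.0

FEATURES = [("length", 1, length_sim), ("words", 2, jaccard), ("main_verb", 3, exact)]

I = {"length": 7, "main_verb": "buy",
     "words": {"he", "buys", "a", "book", "on", "international", "politics"}}
R = {"length": 3, "main_verb": "buy", "words": {"he", "buys", "mangoes"}}

print(cbr_similarity(I, R, FEATURES))
```

Dividing by the sum of the weights keeps the result in [0, 1] as long as every local similarity is in [0, 1], so the weights do not need to be normalized beforehand.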
Sentence features and similarities
(Table from Bhattacharyya 2015)
(http://demo.patrickpantel.com/demos/verbocean/)
(http://verbs.colorado.edu/propbank/framesets-english-aliases/abandon.html)
(https://framenet.icsi.berkeley.edu/fndrupal/about)
Recombination
● Once examples are retrieved from the database, their translations need to be adapted to produce the target output
● Based on sentence parts
● Based on properties of sentence parts
● Based on parts of semantic graphs
Recombination based on Sentence Parts
● Null adaptation: exact match of sentences.
● Reinstantiation: Input and example are structurally similar but differ in values of elements.
Input: “Tomorrow, today will be yesterday.”
Example: “Yesterday, today was tomorrow.”
(Tomorrow, today and yesterday are hyponyms of day.)
(will be and was both derived from the verb to be)
● Adjustments (boundary friction) in predicates and matching arguments are necessary
Example translation: “Gestern, heute war morgen.”
Output translation: “Morgen, heute ist gestern.”
Recombination based on Properties of Sentence Pairs
● Abstraction and respecialization:
● When input and example differ in small parts
● Look for abstraction of small pieces
● Try specialization of the abstraction
● Need a hierarchical organization of concepts
● Abstraction and respecialization (example):
Input: “Wir malen die Wand.”
Example: “Wir malen die Mauer.”
Example translation: “We paint the wall.”
Output: “We paint the wall.”
● Case-based substitution: there are matches in the attributes of the words of the input sentence and the example sentence.
– Properties can be features such as word, lemma, gender, number (singular/plural), person (3rd, 2nd), tense (past, future), voice (passive, active), POS tag, etc.
Recombination based on Properties of Sentence Pairs
● Case-based substitution (example):
Input: “The new museum was inaugurated.”
Example 1: “The new stadium was inaugurated.”
Example 2: “The stadium is new.”
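A rough sketch of how attribute matching could choose between the two examples is given below. The feature annotations are written by hand for illustration, and the scoring rule (count the attributes that agree, taking for each input word its best-matching example word) is an assumption of this sketch, not a rule stated on the slides.

```python
def overlap(a, b):
    """Number of attributes whose values agree between two words."""
    return sum(1 for key, value in a.items() if b.get(key) == value)


def sentence_score(input_words, example_words):
    """Sum, over the input words, of the best attribute overlap in the example."""
    return sum(
        max((overlap(w, e) for e in example_words), default=0)
        for w in input_words
    )


# Hand-annotated, hypothetical feature sets for the slide's sentences.
THE = {"lemma": "the", "pos": "DET"}
NEW = {"lemma": "new", "pos": "ADJ"}
MUSEUM = {"lemma": "museum", "pos": "NOUN", "number": "sg"}
STADIUM = {"lemma": "stadium", "pos": "NOUN", "number": "sg"}
WAS = {"lemma": "be", "pos": "AUX", "tense": "past"}
IS = {"lemma": "be", "pos": "AUX", "tense": "present"}
INAUGURATED = {"lemma": "inaugurate", "pos": "VERB", "voice": "passive"}

input_sentence = [THE, NEW, MUSEUM, WAS, INAUGURATED]
example_1 = [THE, NEW, STADIUM, WAS, INAUGURATED]
example_2 = [THE, STADIUM, IS, NEW]

scores = {
    "example_1": sentence_score(input_sentence, example_1),
    "example_2": sentence_score(input_sentence, example_2),
}
best = max(scores, key=scores.get)
print(scores, best)
```

Under these annotations "museum" and "stadium" still agree in POS tag and number, which is what makes "stadium" a plausible slot for substitution, while Example 2 loses points on tense and has nothing matching the passive verb.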
Recombination based on Parts of Semantic Graph
● Correspondences that do not appear in linear sequences show up clearly in syntax trees and semantic graphs.
Example: “In Kolkata, Sachin donated a bat to the cricket museum on Sunday.”
Input: “Sachin hit a century in Kolkata against Bangladesh.”
Recombination based on Parts of Semantic Graph
(Figures: example sentence; input sentence. From Bhattacharyya 2015)
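To make the point about non-linear correspondences concrete, the two sentences can be written as small role-to-filler structures. The role labels below are assigned by hand for illustration and are not taken from the figures in Bhattacharyya 2015; `shared_parts` is a hypothetical helper. "Kolkata" opens the example sentence but comes near the end of the input, yet both occurrences fill the same role.

```python
# Hand-written, hypothetical role assignments for the two sentences.
example_graph = {
    "predicate": "donate",
    "agent": "Sachin",
    "object": "bat",
    "recipient": "cricket museum",
    "location": "Kolkata",
    "time": "Sunday",
}
input_graph = {
    "predicate": "hit",
    "agent": "Sachin",
    "object": "century",
    "location": "Kolkata",
    "opponent": "Bangladesh",
}


def shared_parts(input_graph, example_graph):
    """Return the role/filler pairs of the input that also occur in the example."""
    return {
        role: filler
        for role, filler in input_graph.items()
        if example_graph.get(role) == filler
    }


shared = shared_parts(input_graph, example_graph)
# Roles the example cannot supply; these must come from other examples.
remaining = sorted(set(input_graph) - set(shared))
print(shared, remaining)
```

The shared parts are candidates for reuse from this example; the remaining roles have to be covered elsewhere before recombination.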
EBMT vs. Translation Memory (TM)
● TM includes input from users (human translators)
(http://www.languagestudio.com/LanguageStudioDesktop.aspx)
Hybrid systems: EBMT + SMT
● Kyoto-EBMT and CMU-EBMT:
– Statistical alignment during the analysis phase
– Alignment is also applied to parse trees and semantic graphs
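As a very rough idea of what statistical word alignment means, the sketch below links each source word to the target word with the highest Dice coefficient over sentence-level co-occurrence counts. This is a generic textbook-style heuristic chosen for brevity; it is not the alignment method used by Kyoto-EBMT or CMU-EBMT, and the three-sentence corpus is invented.

```python
from collections import Counter
from itertools import product


def dice_alignment(sentence_pairs):
    """Map each source word to the target word with the highest Dice score.

    Dice(s, t) = 2 * cooccurrences(s, t) / (count(s) + count(t)),
    with counts taken per sentence pair.
    """
    src_count, tgt_count, joint = Counter(), Counter(), Counter()
    for src, tgt in sentence_pairs:
        src_words, tgt_words = set(src.split()), set(tgt.split())
        src_count.update(src_words)
        tgt_count.update(tgt_words)
        joint.update(product(src_words, tgt_words))

    def dice(s, t):
        return 2 * joint[(s, t)] / (src_count[s] + tgt_count[t])

    return {s: max(tgt_count, key=lambda t: dice(s, t)) for s in src_count}


# Invented toy parallel corpus (German -> English).
corpus = [
    ("wir malen", "we paint"),
    ("wir lesen", "we read"),
    ("sie malen", "they paint"),
]
alignment = dice_alignment(corpus)
print(alignment)
```

With only three sentence pairs the counts already separate the word pairs cleanly; on real corpora such a heuristic is noisy, which is one reason dedicated alignment models are used instead.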
Summary
● Data-driven paradigm of MT
● Translation by analogy
● Uses rules to find matches, recombine the aligned parts and build the translation
● Similar to CBR: makes use of text similarity, language resources and abundant parallel corpora
● Similarity based on words and/or structure
● Recombination: adapt matched translation parts to build a new translation (boundary friction)
Suggested reading
● Machine Translation, Pushpak Bhattacharyya, 2015 (Chapter 6)
● Kyoto-EBMT (http://nlp.ist.i.kyoto-u.ac.jp/EN/index.php?KyotoEBMT)
● CMU-EBMT (http://www.cs.cmu.edu/~ralf/ebmt/ebmt.html)