![Page 1: 1 Gholamreza Haffari Simon Fraser University MT Summit, August 2009 Machine Learning approaches for dealing with Limited Bilingual Data in SMT](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf9a1a28abf838c9256c/html5/thumbnails/1.jpg)
1
Gholamreza Haffari
Simon Fraser University
MT Summit, August 2009
Machine Learning approaches for dealing with Limited Bilingual Data in SMT
Acknowledgments
Special thanks to: Anoop Sarkar
Some slides are adapted or used from Chris Callison-Burch, Trevor Cohn, and Dragos Stefan Munteanu
Statistical Machine Translation
Translate from a source language to a target language by computer using a statistical model
M_FE is a standard log-linear model
[Diagram: Source Lang. F → M_FE → Target Lang. E]
Log-Linear Models
At test time, the best output t* for a given s is chosen by
$t^* = \arg\max_t \sum_i w_i \, f_i(t, s)$
where the $f_i$ are the feature functions and the $w_i$ are their weights
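The decision rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the candidate list is enumerated explicitly (a real decoder searches a much larger space), and the two feature functions, the toy lexicon, and the weights are hypothetical and not taken from the talk.

```python
from typing import Callable, Sequence

# A feature function maps a (target, source) pair to a real-valued score.
Feature = Callable[[str, str], float]

# Hypothetical toy lexicon, used only to make the example runnable.
TOY_LEXICON = {"casa": "house", "verde": "green"}


def lexicon_hits(t: str, s: str) -> float:
    """Count source words whose toy-lexicon translation appears in t."""
    target_words = set(t.split())
    return float(sum(1 for w in s.split() if TOY_LEXICON.get(w) in target_words))


def length_penalty(t: str, s: str) -> float:
    """Penalize candidates whose length differs from the source length."""
    return -float(abs(len(t.split()) - len(s.split())))


def best_output(s: str, candidates: Sequence[str],
                features: Sequence[Feature], weights: Sequence[float]) -> str:
    """Return t* = argmax_t sum_i w_i * f_i(t, s) over an explicit candidate list."""
    def score(t: str) -> float:
        return sum(w * f(t, s) for w, f in zip(weights, features))
    return max(candidates, key=score)


candidates = ["green house", "green", "the big house is green"]
best = best_output("casa verde", candidates, [lexicon_hits, length_penalty], [1.0, 0.5])
print(best)
```

With these toy weights the first candidate wins because it covers both source words without a length penalty.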
Phrase-based SMT
M_FE is composed of two main components:
The language model f_lm: takes care of the fluency of the generated translation
The phrase table f_pt: takes care of the content of the source sentence in the generated translation
A huge bitext is needed to learn a high-quality phrase dictionary
Bilingual Parallel Data
Source Text Target Text
This Talk
What if we don’t have a large bilingual text to learn a good phrase table?
Motivations
Low-density language pairs: the population speaking the language is small / limited online resources
Adapting to a new style/domain/topic
Overcome training and testing mismatch
Available Resources
Small bilingual parallel corpora
Large amounts of monolingual data
Comparable corpora
Small translation dictionary
Multilingual parallel corpora which include multiple source languages but not the target language
The Map
[Diagram of resources and the techniques that exploit them]
small source-target bitext
MT system
large comparable source-target bitext
parallel sentence extraction
bilingual dictionary induction
large source monotext
semi-supervised/active learning
source-another language bitext
paraphrasing
source-another, another-target, source-target bitexts
triangulation/co-training
Learning Problems (I)
Supervised learning: given a sample of object-label pairs $(x_i, y_i)$, find the predictive relationship between objects and labels
Unsupervised learning: given a sample consisting of only objects, look for interesting structures in the data, and group similar objects
Learning Problems (II)
Now consider training data consisting of:
Labeled data: object-label pairs $(x_i, y_i)$
Unlabeled data: objects $x_j$
This leads to the following learning scenarios:
Semi-Supervised Learning: find the best mapping from objects to labels, benefiting from the unlabeled data
Transductive Learning: find the labels of the unlabeled data
Active Learning: find the mapping while actively querying the oracle for the labels of unlabeled data
The Big Picture
[Diagram]
Unlabeled data: $\{x_j\}$ (monotext)
Labeled data: $\{(x_i, y_i)\}$ (bitext)
Train M, Select
Self-Training
Mining More Bilingual Parallel Data
Comparable corpora are texts which are not parallel in the strict sense but convey overlapping information
Wikipedia pages
News agencies: BBC, CNN
From comparable corpora, we can extract sentence pairs which are (approximately) translations of each other
Extracting Parallel Sentences
(Munteanu & Marcu, 2005)
Un-matched Documents
Parallel sentences
Article Selection
(Munteanu & Marcu, 2005)
Select the n most relevant target-language documents for each source-language document using an information retrieval (IR) system:
Translate each source-language article into a target-language query using the bilingual dictionary
Un-matched Documents
Candidate Sentence Pair Selection
(Munteanu & Marcu, 2005)
Consider all of the sentence pairs from the source-language article and the relevant target-language articles. Filter out a sentence pair if:
The ratio of the sentence lengths is more than 2
At least half of the words in each sentence do not have a translation in the other sentence
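The two filters above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the lexicon format (word to set of translations), the function names, and the decision to drop a pair as soon as either side fails the word-overlap test are assumptions made for this example.

```python
from typing import Dict, List, Set

# Assumed lexicon format: each word maps to a set of possible translations.
Lexicon = Dict[str, Set[str]]


def untranslated_fraction(words: List[str], other: List[str], lexicon: Lexicon) -> float:
    """Fraction of `words` with no lexicon translation among `other`."""
    other_set = set(other)
    missing = sum(1 for w in words if not (lexicon.get(w, set()) & other_set))
    return missing / len(words)


def keep_pair(src: str, tgt: str, src2tgt: Lexicon, tgt2src: Lexicon,
              max_ratio: float = 2.0) -> bool:
    """Apply the length-ratio and word-overlap filters to one candidate pair."""
    s, t = src.split(), tgt.split()
    if not s or not t:
        return False
    # Filter 1: the longer sentence may be at most max_ratio times the shorter.
    if max(len(s), len(t)) / min(len(s), len(t)) > max_ratio:
        return False
    # Filter 2 (assumed reading): drop the pair if, on either side,
    # at least half of the words have no translation on the other side.
    if untranslated_fraction(s, t, src2tgt) >= 0.5:
        return False
    if untranslated_fraction(t, s, tgt2src) >= 0.5:
        return False
    return True


src2tgt = {"la": {"the"}, "casa": {"house"}, "verde": {"green"}}
tgt2src = {"the": {"la"}, "house": {"casa"}, "green": {"verde"}}
print(keep_pair("la casa verde", "the green house", src2tgt, tgt2src))
print(keep_pair("la casa verde", "a red car", src2tgt, tgt2src))
```

Pairs that survive these cheap filters are the candidates passed on to the classifier.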
Parallel Sentence Selection
(Munteanu & Marcu, 2005)
Each candidate sentence pair (s,t) is classified into c0=‘parallel’ or c1=‘not parallel’ according to the following log-linear model:
The weights are learned during the training phase using training data
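The formula on this slide did not survive extraction. A standard maximum-entropy (log-linear) classifier consistent with the description would take the form below; the symbols $\lambda_j$ (weights) and $f_j$ (feature functions) are assumed notation, not recovered from the slide.

```latex
P(c_i \mid s, t) =
  \frac{\exp\left(\sum_j \lambda_j \, f_j(c_i, s, t)\right)}
       {\sum_{c \in \{c_0, c_1\}} \exp\left(\sum_j \lambda_j \, f_j(c, s, t)\right)}
```

Under this form, a pair $(s, t)$ is labeled parallel when $P(c_0 \mid s, t)$ exceeds $P(c_1 \mid s, t)$ or some chosen threshold.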
Model Features & Training Data
(Munteanu & Marcu, 2005)
The features of the log-linear classifier include:
Lengths of the sentences, as well as their ratio
Percentage of words on one side that do not have a translation on the other side / are not connected by alignment links
Training data can be prepared from a parallel corpus containing K sentence pairs
This gives K positive and $K^2 - K$ negative examples (which can be filtered further using the previous heuristics)
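The counting argument can be checked with a short sketch: pairing every source sentence with every target sentence gives $K^2$ pairs, of which the K aligned ones are positive. The function name and the 1/0 label encoding are assumptions for this example.

```python
from typing import List, Tuple

Example = Tuple[str, str, int]  # (source, target, label), label 1 = parallel


def make_examples(parallel: List[Tuple[str, str]]) -> Tuple[List[Example], List[Example]]:
    """Build K positive and K^2 - K negative examples from K aligned pairs."""
    positives = [(s, t, 1) for s, t in parallel]
    negatives = [
        (s, t, 0)
        for i, (s, _) in enumerate(parallel)
        for j, (_, t) in enumerate(parallel)
        if i != j  # every mismatched source/target combination
    ]
    return positives, negatives


corpus = [("la casa", "the house"), ("el gato", "the cat"), ("el perro", "the dog")]
pos, neg = make_examples(corpus)
print(len(pos), len(neg))  # K = 3 gives 3 positives and 3*3 - 3 = 6 negatives
```

In practice the negatives would then be passed through the length-ratio and word-overlap heuristics so that only hard negatives remain.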
Improvement in SMT (Arabic to English)
(Munteanu & Marcu, 2005)
Initial out-of-domain parallel corpus
Initial + extracted corpus
Initial + human translated data
Outline
Introduction
Semi-supervised Learning for SMT
  Background (EM, Self-training, Co-Training)
  SSL for Alignments / Phrases / Sentences
Active Learning for SMT
  Single language pair
  Multiple language pairs
Inductive vs. Transductive
Transductive: produce labels only for the available unlabeled data. The output of the method is not a classifier. It’s like writing answers for the take-home exam!
Inductive: not only produce labels for the unlabeled data, but also produce a classifier. It’s like preparing to write answers for the in-class exam!
Self-Training
Iteration 0: a model is trained by supervised learning (SL) on the labeled instances (+ / -)
Choose instances labeled with high confidence
Add them to the pool of current labeled training data
Iteration 1, Iteration 2, …: repeat with the enlarged pool
(Yarowsky 1995)
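The loop on this slide can be sketched on a toy problem. Everything specific here is an assumption for illustration: the base learner is a 1-D nearest-centroid classifier, confidence is the distance margin between the two closest centroids, and the threshold is arbitrary. At least two labels must be present in the seed data.

```python
from typing import Dict, List, Tuple

Labeled = List[Tuple[float, str]]


def train(labeled: Labeled) -> Dict[str, float]:
    """Supervised step: one centroid (mean) per label."""
    groups: Dict[str, List[float]] = {}
    for x, y in labeled:
        groups.setdefault(y, []).append(x)
    return {y: sum(xs) / len(xs) for y, xs in groups.items()}


def predict(model: Dict[str, float], x: float) -> Tuple[str, float]:
    """Return (label, confidence); confidence is the distance margin
    between the closest and the second-closest centroid."""
    ranked = sorted(model, key=lambda y: abs(x - model[y]))
    margin = abs(x - model[ranked[1]]) - abs(x - model[ranked[0]])
    return ranked[0], margin


def self_train(labeled: Labeled, unlabeled: List[float],
               threshold: float = 1.0, max_iters: int = 10):
    """Repeatedly train, label the pool, and absorb confident predictions."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(max_iters):
        model = train(labeled)
        confident, rest = [], []
        for x in pool:
            y, conf = predict(model, x)
            (confident if conf >= threshold else rest).append((x, y))
        if not confident:
            break  # nothing new to add, so the loop has converged
        labeled.extend(confident)
        pool = [x for x, _ in rest]
    return train(labeled), labeled


model, grown = self_train([(0.0, "-"), (10.0, "+")], [1.0, 2.0, 8.0, 9.0, 5.2])
print(len(grown), predict(model, 1.5)[0])
```

The point near the decision boundary (5.2) never reaches the confidence threshold, so it stays unlabeled; that is the intended behavior of the selection step.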
EM
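Use EM to maximize the joint log-likelihood of labeled and unlabeled data:
$\mathcal{L}(\theta) = \mathcal{L}_l(\theta) + \mathcal{L}_u(\theta)$
$\mathcal{L}_l(\theta)$: log-likelihood of labeled data
$\mathcal{L}_u(\theta)$: log-likelihood of unlabeled data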
(Dempster et al 1977)
EM
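Iteration 0: a model is trained by supervised learning (SL) on the labeled instances (+ / -)
Clone new weighted labeled instances from the unlabeled instances using the (probabilistic) model; each unlabeled instance i receives weights w+_i and w-_i
Iteration 1, Iteration 2, …: repeat
(Yarowsky 1995)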
Co-Training
Instances contain two sufficient sets of features, i.e. an instance is $x = (x_1, x_2)$
Each set of features is called a view
Two views are independent given the label
Two views are consistent
[Diagram: x split into the views x_1 and x_2]
(Blum & Mitchell 1998)
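The formulas for the two assumptions did not survive extraction. In the usual statement of the co-training setting they can be written as below; the classifier symbols $c_1$ and $c_2$ and the label symbol $y$ are assumed notation, not recovered from the slide.

```latex
% Views are conditionally independent given the label y
P(x_1, x_2 \mid y) = P(x_1 \mid y) \, P(x_2 \mid y)

% Views are consistent: some pair of per-view classifiers agrees with the label
\exists \, c_1, c_2 : \quad c_1(x_1) = c_2(x_2) = y \quad \text{for every instance } x = (x_1, x_2)
```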
Co-Training
C1: a classifier trained on view 1
C2: a classifier trained on view 2
Iteration t: allow C1 to label some instances; allow C2 to label some instances
Add the self-labeled instances to the pool of training data
Iteration t+1: retrain C1 and C2 on the enlarged pool, and so on
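A minimal sketch of this loop, under toy assumptions: each view is a single number, each classifier is a nearest-centroid model on its own view, and in every round each classifier labels its single most confident pool instance. The function names, the confidence measure, and the data are hypothetical.

```python
from typing import Dict, List, Tuple

Instance = Tuple[float, float]  # x = (x1, x2): one number per view


def fit(points: List[Tuple[float, str]]) -> Dict[str, float]:
    """Nearest-centroid model on one view: label -> mean value."""
    groups: Dict[str, List[float]] = {}
    for v, y in points:
        groups.setdefault(y, []).append(v)
    return {y: sum(vs) / len(vs) for y, vs in groups.items()}


def label_with_margin(model: Dict[str, float], v: float) -> Tuple[str, float]:
    """Predicted label and a confidence margin (second-best minus best distance)."""
    ranked = sorted(model, key=lambda y: abs(v - model[y]))
    return ranked[0], abs(v - model[ranked[1]]) - abs(v - model[ranked[0]])


def co_train(labeled: List[Tuple[Instance, str]], unlabeled: List[Instance],
             per_round: int = 1, rounds: int = 5) -> List[Tuple[Instance, str]]:
    """Alternate between the two views; each labels its most confident instances."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        for view in (0, 1):
            if not pool:
                return labeled
            model = fit([(x[view], y) for x, y in labeled])
            ranked = sorted(pool, key=lambda x: label_with_margin(model, x[view])[1],
                            reverse=True)
            for x in ranked[:per_round]:
                labeled.append((x, label_with_margin(model, x[view])[0]))
                pool.remove(x)
    return labeled


seed = [((0.0, 0.0), "-"), ((10.0, 10.0), "+")]
pool = [(1.0, 5.0), (5.0, 9.0), (9.0, 8.0), (4.0, 1.0)]
result = dict(co_train(seed, pool))
print(result[(5.0, 9.0)])
```

The instance (5.0, 9.0) is ambiguous in view 1 but clear in view 2, so it ends up labeled by the second classifier; this is the benefit the two-view setup is meant to provide.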
Word Alignment & Translation Quality
(Fraser & Marcu 2006a) presented an SSL method for learning a better word alignment
It uses a small set of sentence pairs annotated with word alignments (~100) and a big set of unannotated sentence pairs (~2-3 million)
They showed that improvement in the word alignment caused improvement in BLEU
The same conclusion was reached later in (Ganchev et al 2008) for other translation tasks
Word Alignment Model
Consider the following log-linear model for word alignment:
The feature functions are sub-models used in IBM Model 4, such as
Translation probability t(f|e)
Fertility probabilities n(φ|e): number of words generated by e
…
SS-Word Alignment
(Fraser & Marcu 2006a) tuned the word alignment model parameters on the small labeled data in a discriminative fashion:
With the current weights, generate the n-best list
Manipulate the weights so that the best alignment stands out, i.e. the one which maximizes a modified F-measure (MERT-style algorithm)
Use the tuned weights to find the word alignments of the big unlabeled data
Estimate the feature functions’ parameters based on these best (Viterbi) alignments: 1 iteration of the EM algorithm
Repeat the above two steps
Paraphrasing
If a word is unseen, then SMT will not be able to translate it
Keep/omit/transliterate the source word, or use a regular expression to translate it (dates, …)
If a phrase is unseen but its individual words are seen, then SMT will be less likely to produce a correct translation
The idea: use paraphrases in the source language to replace unknown words/phrases
Paraphrases are alternative ways of conveying the same information
(Callison Burch, 2007)
Coverage Problem in SMT
Percentage of Test Item Types vs Corpus Size
(Callison Burch, 2007)
Behavior on Unseen Data
A system trained on 10,000 sentences (~200,000 words) may translate:
Es positivo llegar a un acuerdo sobre los procedimientos, pero debemos encargarnos de que este sistema no sea susceptible de ser usado como arma política.
as
It is good reach an agreement on procedures, but we must encargarnos that this system is not susceptible to be usado as political weapon.
Since the translations of encargarnos and usado were not learned, they are either reproduced in the translation or omitted entirely
(Callison Burch, 2007)
Substituting Paraphrases then Translating
It is good reach an agreement on procedures, but we must encargarnos that this system is not susceptible to be usado as political weapon.
encargarnos ?
usado ?
(Callison Burch, 2007)
Substituting Paraphrases then Translating
It is good reach an agreement on procedures, but we must guarantee that this system is not susceptible to be used as political weapon.
encargarnos ? (paraphrases and their translations)
garantizar: guarantee, ensure, guaranteed, assure, provided
velar: ensure, ensuring, safeguard, making sure
procurar: ensure that, try to, ensure, endeavour to
Asegurarnos: ensure, secure, make certain
usado ? (paraphrases and their translations)
utilizado: used, use, spent, utilized
empleado: used, spent, employee
uso: use, used, usage
utiliza: used, uses, used, being used
(Callison Burch, 2007)
Learning paraphrases (I)
From monolingual parallel corpora
Multiple source sentences which convey the same information
Extract paraphrases seen in the same context in the aligned source sentences
burst into tears = cried, comfort = console
Emma burst into tears and he tried to comfort her, saying things to make her smile.
Emma cried, and he tried to console her, adorning his words with puns.
(Callison Burch, 2007)
Problems with this approach:
Monolingual parallel corpora are relatively uncommon
This limits what paraphrases we can generate, e.g. a limited number of paraphrases
(Callison Burch, 2007)
Learning paraphrases (I)
From monolingual source corpora
For each unknown phrase x, build a distributional profile DP_x which records the co-occurrences of the surrounding words with x
Select the top-k phrases whose distributional profiles are most similar to DP_x
Is position important when building the profile?
Should we simply count words, or use TF/IDF, or …?
Which vector similarity measure should be used?
Needs smart tricks to make it scalable
(Marton et al 2009)
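One simple set of answers to the design questions above can be sketched as follows. The choices here are assumptions made for illustration only: raw co-occurrence counts in a symmetric window, position ignored, cosine similarity, and single-token "phrases". No scalability tricks are attempted.

```python
from collections import Counter
from math import sqrt
from typing import List


def profile(word: str, sentences: List[str], window: int = 2) -> Counter:
    """Count the words that occur within `window` tokens of `word`."""
    counts: Counter = Counter()
    for sent in sentences:
        toks = sent.split()
        for i, tok in enumerate(toks):
            if tok == word:
                counts.update(toks[max(0, i - window):i] + toks[i + 1:i + 1 + window])
    return counts


def cosine(p: Counter, q: Counter) -> float:
    """Cosine similarity between two count vectors."""
    dot = sum(c * q[w] for w, c in p.items())
    norm = sqrt(sum(c * c for c in p.values())) * sqrt(sum(c * c for c in q.values()))
    return dot / norm if norm else 0.0


def top_k_similar(unknown: str, candidates: List[str],
                  sentences: List[str], k: int = 1) -> List[str]:
    """Rank candidate paraphrases by profile similarity to the unknown word."""
    dp = profile(unknown, sentences)
    ranked = sorted(candidates, key=lambda c: cosine(dp, profile(c, sentences)),
                    reverse=True)
    return ranked[:k]


corpus = [
    "we must ensure that it works",
    "we must guarantee that it works",
    "the cat sat on the mat",
]
print(top_k_similar("ensure", ["guarantee", "cat"], corpus))
```

Swapping in TF/IDF weighting or a position-sensitive profile would only change `profile`; the ranking step stays the same.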
Learning paraphrases (II)
From bilingual parallel corpora
However, we no longer have access to identical contexts
Adopt techniques from phrase-based SMT
Use aligned foreign-language phrases as pivots
(Callison Burch, 2007)
Paraphrase Probability
Generate multiple paraphrases for a given phrase
We give them probabilities so they can be ranked
Define translation model probability:
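The formula on this slide did not survive extraction. A common pivot-based definition sums over the aligned foreign phrases f: p(s2 | s1) = sum over f of p(f | s1) * p(s2 | f). The sketch below assumes that definition; the nested-dictionary table format and all probabilities are invented toy values.

```python
from collections import defaultdict
from typing import Dict

# Assumed table format: table[x][y] = p(y | x)
Table = Dict[str, Dict[str, float]]


def paraphrase_probs(s1: str, p_pivot_given_src: Table,
                     p_src_given_pivot: Table) -> Dict[str, float]:
    """p(s2 | s1) = sum_f p(f | s1) * p(s2 | f), for every s2 other than s1."""
    scores: Dict[str, float] = defaultdict(float)
    for f, p_f in p_pivot_given_src.get(s1, {}).items():
        for s2, p_s2 in p_src_given_pivot.get(f, {}).items():
            if s2 != s1:  # a phrase is not counted as its own paraphrase
                scores[s2] += p_f * p_s2
    return dict(scores)


# Toy probabilities, invented for illustration only.
p_pivot_given_src = {"usado": {"used": 1.0}}
p_src_given_pivot = {"used": {"usado": 0.25, "utilizado": 0.5, "empleado": 0.25}}

probs = paraphrase_probs("usado", p_pivot_given_src, p_src_given_pivot)
for s2, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
    print(s2, p)
```

Sorting by the resulting probability gives the ranking of paraphrases that the slide asks for.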
Refined Paraphrase Probability
Using multiple bilingual corpora, e.g. English-Spanish, English-German, …
C is the set of bilingual corpora and $\lambda_c$ is the weight of corpus c, e.g. we may put more weight on larger corpora
Taking word sense into account: in a paraphrase, replace each word with its word_sense item
Plugging Paraphrases into SMT Model
For each paraphrase s2 of s1 having a translation t, we expand the phrase table by adding new entries (t, s1)
[Diagram: s1 → s2 → t]
Add a new feature function to the SMT log-linear model to take the paraphrase probabilities into account:
$f(t, s_1) = \begin{cases} p(s_2 \mid s_1) & \text{if phrase table entry } (t, s_1) \text{ is generated from } (t, s_2) \\ 1 & \text{otherwise} \end{cases}$
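The expansion step and the new feature can be sketched together. The data layout is an assumption: the phrase table maps each source phrase to a set of target phrases, and `paraphrases[s1][s2]` holds p(s2 | s1). When several paraphrases yield the same new entry, this sketch keeps the largest probability, which is a choice made here and not something stated on the slide.

```python
from typing import Dict, Set, Tuple

PhraseTable = Dict[str, Set[str]]         # source phrase -> target phrases
Paraphrases = Dict[str, Dict[str, float]]  # s1 -> {s2: p(s2 | s1)}


def expand_phrase_table(table: PhraseTable, paraphrases: Paraphrases
                        ) -> Tuple[PhraseTable, Dict[Tuple[str, str], float]]:
    """Add entries (t, s1) for each paraphrase s2 of s1 that translates to t,
    and return the paraphrase feature value for every (t, source) entry."""
    expanded = {s: set(ts) for s, ts in table.items()}
    # Original entries get feature value 1.
    feature = {(t, s): 1.0 for s, ts in table.items() for t in ts}
    for s1, alternatives in paraphrases.items():
        for s2, p in alternatives.items():
            for t in table.get(s2, ()):
                if t in table.get(s1, ()):
                    continue  # (t, s1) already existed; keep its value of 1
                expanded.setdefault(s1, set()).add(t)
                key = (t, s1)
                feature[key] = max(feature.get(key, 0.0), p)
    return expanded, feature


table = {"utilizado": {"used"}, "empleado": {"used", "employee"}}
paraphrases = {"usado": {"utilizado": 0.5, "empleado": 0.25}}
expanded, feature = expand_phrase_table(table, paraphrases)
print(sorted(expanded["usado"]), feature[("used", "usado")])
```

The previously untranslatable source phrase now has entries in the table, and the feature lets the tuned log-linear weights decide how much to trust them.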
Results of Paraphrasing
(Callison Burch, 2007)
Improvement in Coverage
(Callison Burch, 2007)
Triangulation
We can find additional data by focusing on: Multi-parallel corpora Collection of bitexts with some common language(s)
(Cohn & Lapata, 2007)
Phrase-Level Triangulation
Triangulation (Kay, 1997)
Translate the source phrase into an intermediate-language phrase
Translate this intermediate phrase into the target phrase
Example: Translating a hot potato into French
(Cohn & Lapata, 2007)
A Generative Model for Triangulation
Marginalize out the intermediate phrases:
The generative model for p(s|t) :
(Cohn & Lapata, 2007)
A Generative Model for Triangulation
Marginalize out the intermediate phrases:
Conditional independence assumption: i fully represents the information in t needed to translate s
Extends trivially to many intermediate languages
p(s|i) and p(i|t) are estimated using phrase frequencies
(Cohn & Lapata, 2007)
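Under the conditional independence assumption, marginalizing over the intermediate phrases gives p(s | t) ≈ sum over i of p(s | i) * p(i | t). The sketch below computes that sum from two toy tables and also shows a linear interpolation with a directly estimated probability. The nested-dictionary format, the placeholder phrase names, and the interpolation weight are assumptions for illustration.

```python
from typing import Dict

# Assumed table format: table[x][y] = p(y | x)
Table = Dict[str, Dict[str, float]]


def triangulate(p_s_given_i: Table, p_i_given_t: Table) -> Table:
    """Build p(s | t) = sum_i p(s | i) * p(i | t) from two phrase tables."""
    result: Table = {}
    for t, intermediates in p_i_given_t.items():
        for i, p_it in intermediates.items():
            for s, p_si in p_s_given_i.get(i, {}).items():
                row = result.setdefault(t, {})
                row[s] = row.get(s, 0.0) + p_si * p_it
    return result


def interpolate(p_standard: float, p_triang: float, lam: float) -> float:
    """Mixture of a directly estimated and a triangulated probability."""
    return lam * p_standard + (1.0 - lam) * p_triang


# Toy tables with placeholder phrase names, invented for illustration.
p_i_given_t = {"t1": {"i1": 0.5, "i2": 0.5}}
p_s_given_i = {"i1": {"s1": 1.0}, "i2": {"s1": 0.5, "s2": 0.5}}

p_s_given_t = triangulate(p_s_given_i, p_i_given_t)
print(p_s_given_t["t1"])
print(interpolate(0.5, p_s_given_t["t1"]["s1"], lam=0.5))
```

Adding another intermediate language only means adding more entries to the two tables, which is why the model extends to many intermediate languages without modification.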
A Generative Model for Triangulation
Marginalize out the intermediate phrases:
Conditional independence may be violated
Translation model is estimated from noisy alignments
Missing contexts, i, in p(s|i)
Fewer large or rare phrases can be translated
(Cohn & Lapata, 2007)
Plugging Triangulated Phrases into Model
A mixture model of phrase pair probabilities from the training set (standard) and the phrase pairs newly learned by triangulation:
$p(s \mid t) = \lambda \, p_{\text{standard}}(s \mid t) + (1 - \lambda) \, p_{\text{triang}}(s \mid t)$
As a new feature in the log-linear model
Coverage Benefit
For any Language Pair?
10k bilingual sentences, interpolated with 3 intermediate langs: /
(Cohn & Lapata, 2007)
Larger Corpora
For French to English with Spanish as the intermediate language using different sizes for bitext(s)
triang: only triangulated phrases
interp: mixture model of the two phrase tables
(Cohn & Lapata, 2007)
What Languages are best for triangulation?
10K bilingual sentences, translating from French to English
(Cohn & Lapata, 2007)
How many languages are required?
10K bilingual sentences, translating from French to English, ordered by language family
(Cohn & Lapata, 2007)
Paraphrasing vs Triangulation
Paraphrasing
Uses bilingual projection to translate to and from a source phrase
It is employed to improve the source-side coverage
Triangulation
Generalizes the paraphrasing method to any translation pathway linking the source and target
Improves both source and target coverage
(Cohn & Lapata, 2007)
Bilingual Lexicon Induction
The goal is to induce a larger bilingual dictionary. It can be used, for example, to augment the phrase table or the parallel text
Suppose we have access to a small bilingual dictionary plus large monolingual text
• Build a distributional profile for each source word using the monolingual source text
• Map the profile, using the seed rules (the initial bilingual dictionary), to the target language vocabulary space
• Select the top-k target language words with the most similar distributional profiles
(Rapp, 1999)
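The three steps above can be sketched as follows. This is a minimal illustration, not the method of (Rapp, 1999) itself: the fixed window, raw co-occurrence counts, cosine similarity, the toy corpora, and all function names are assumptions made for the example. Excluding target words already covered by the seed dictionary from the candidates is also an assumption.

```python
import math
from collections import Counter


def context_profile(tokens, word, window=2):
    """Count the words that co-occur with `word` inside a fixed-size window."""
    profile = Counter()
    for i, tok in enumerate(tokens):
        if tok != word:
            continue
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                profile[tokens[j]] += 1
    return profile


def map_profile(profile, seed_dict):
    """Project a source-side profile into the target vocabulary via the seed dictionary."""
    mapped = Counter()
    for src_word, count in profile.items():
        if src_word in seed_dict:
            mapped[seed_dict[src_word]] += count
    return mapped


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def induce(word, src_tokens, tgt_tokens, seed_dict, k=1, window=2):
    """Return the top-k target words whose profiles are most similar to the mapped profile."""
    mapped = map_profile(context_profile(src_tokens, word, window), seed_dict)
    candidates = set(tgt_tokens) - set(seed_dict.values())
    scored = [(cosine(mapped, context_profile(tgt_tokens, c, window)), c) for c in candidates]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [c for _, c in scored[:k]]


# Hypothetical toy data: two unrelated monolingual texts and a small seed dictionary.
src_tokens = "el perro come carne el gato come pescado".split()
tgt_tokens = "the dog eats meat the cat eats fish".split()
seed = {"el": "the", "come": "eats", "carne": "meat", "pescado": "fish"}
print(induce("perro", src_tokens, tgt_tokens, seed))
```

In practice the profiles would be built from large corpora and usually re-weighted (for example by an association measure) before comparison.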
![Page 64: 1 Gholamreza Haffari Simon Fraser University MT Summit, August 2009 Machine Learning approaches for dealing with Limited Bilingual Data in SMT](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf9a1a28abf838c9256c/html5/thumbnails/64.jpg)
64
Context-based Rapp Model
(Garera et al 2009)
![Page 65: 1 Gholamreza Haffari Simon Fraser University MT Summit, August 2009 Machine Learning approaches for dealing with Limited Bilingual Data in SMT](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf9a1a28abf838c9256c/html5/thumbnails/65.jpg)
65
Dependency Context
Usually words in a fixed-size window are used to represent the context
(Garera et al 2009) use the latent structure in the dependency parse tree to represent the context
(Garera et al 2009)
![Page 66: 1 Gholamreza Haffari Simon Fraser University MT Summit, August 2009 Machine Learning approaches for dealing with Limited Bilingual Data in SMT](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf9a1a28abf838c9256c/html5/thumbnails/66.jpg)
66
Dependency Context
Usually words in a fixed-size window are used to represent the context
(Garera et al 2009) use the latent structure in the dependency parse tree to represent the context
• Dynamic context size
• Accounts for reordering
(Garera et al 2009)
![Page 67: 1 Gholamreza Haffari Simon Fraser University MT Summit, August 2009 Machine Learning approaches for dealing with Limited Bilingual Data in SMT](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf9a1a28abf838c9256c/html5/thumbnails/67.jpg)
67
Bilingual Lexicon Induction (more references)
(Koehn & Knight 2002) take into account orthographic features in addition to the context
(Haghighi et al 2008) devise a generative model which generates the (feature vectors of) related words in the source and target languages
• Each word is represented by a feature vector containing both contextual and orthographic features
(Mann & Yarowsky 2001) and (Schafer & Yarowsky 2002) use a bridge language to induce a bilingual lexicon
![Page 68: 1 Gholamreza Haffari Simon Fraser University MT Summit, August 2009 Machine Learning approaches for dealing with Limited Bilingual Data in SMT](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf9a1a28abf838c9256c/html5/thumbnails/68.jpg)
68
Bilingual Phrase Induction (non-comparable corpora)
Non-comparable corpora contain “... disparate, very nonparallel bilingual documents that could either be on the same topic (on-topic) or not” (Fung & Cheung 2004)
The goal is to extract parallel sub-sentential fragments, as opposed to extracting parallel sentences
Assume we have a lexical dictionary P(t | s): the probability that the source word s translates into the target word t
Using some heuristics, specify the candidate sentence pairs
(Munteanu & Marcu 2006)
![Page 69: 1 Gholamreza Haffari Simon Fraser University MT Summit, August 2009 Machine Learning approaches for dealing with Limited Bilingual Data in SMT](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf9a1a28abf838c9256c/html5/thumbnails/69.jpg)
69
The Signal Processing Approach
target
source
![Page 70: 1 Gholamreza Haffari Simon Fraser University MT Summit, August 2009 Machine Learning approaches for dealing with Limited Bilingual Data in SMT](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf9a1a28abf838c9256c/html5/thumbnails/70.jpg)
70
The Signal Processing Approach
target
source
![Page 71: 1 Gholamreza Haffari Simon Fraser University MT Summit, August 2009 Machine Learning approaches for dealing with Limited Bilingual Data in SMT](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf9a1a28abf838c9256c/html5/thumbnails/71.jpg)
71
The Signal Processing Approach
target
source
![Page 72: 1 Gholamreza Haffari Simon Fraser University MT Summit, August 2009 Machine Learning approaches for dealing with Limited Bilingual Data in SMT](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf9a1a28abf838c9256c/html5/thumbnails/72.jpg)
72
The Signal Processing Approach
P(t|s)
target
source
![Page 73: 1 Gholamreza Haffari Simon Fraser University MT Summit, August 2009 Machine Learning approaches for dealing with Limited Bilingual Data in SMT](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf9a1a28abf838c9256c/html5/thumbnails/73.jpg)
73
The Signal Processing Approach
target
source
![Page 74: 1 Gholamreza Haffari Simon Fraser University MT Summit, August 2009 Machine Learning approaches for dealing with Limited Bilingual Data in SMT](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf9a1a28abf838c9256c/html5/thumbnails/74.jpg)
74
The Signal Processing Approach
target
source
![Page 75: 1 Gholamreza Haffari Simon Fraser University MT Summit, August 2009 Machine Learning approaches for dealing with Limited Bilingual Data in SMT](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf9a1a28abf838c9256c/html5/thumbnails/75.jpg)
75
The Signal Processing Approach
target
source
Average of “signals” from neighbors
![Page 76: 1 Gholamreza Haffari Simon Fraser University MT Summit, August 2009 Machine Learning approaches for dealing with Limited Bilingual Data in SMT](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf9a1a28abf838c9256c/html5/thumbnails/76.jpg)
76
The Signal Processing Approach
target
source
Average of “signals” from neighbors
![Page 77: 1 Gholamreza Haffari Simon Fraser University MT Summit, August 2009 Machine Learning approaches for dealing with Limited Bilingual Data in SMT](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf9a1a28abf838c9256c/html5/thumbnails/77.jpg)
77
Bilingual Phrase Induction (non-comparable corpora)
Retain “positive fragments”, i.e. those fragments for which the corresponding filtered signal values are positive
Repeat the procedure in the other direction (target to source) to obtain the fragments for the source, and consider the resulting two text chunks as parallel
The signal filtering function is simple; more advanced filters might work better
(Munteanu & Marcu 2006)
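A rough sketch of the signal-processing idea for one direction follows. It is an illustration under stated assumptions, not the exact procedure of (Munteanu & Marcu 2006): the initial signal (best P(t|s) if it reaches `min_prob`, otherwise -1), the averaging window, the minimum fragment length, and the toy lexicon are all hypothetical choices.

```python
def word_signal(target_words, source_words, lex_prob, min_prob=0.1):
    """Initial signal per target word: the best P(t|s) over the source words if it
    reaches min_prob, otherwise -1.0 (the word looks untranslated)."""
    signal = []
    for t in target_words:
        best = max((lex_prob.get((t, s), 0.0) for s in source_words), default=0.0)
        signal.append(best if best >= min_prob else -1.0)
    return signal


def smooth(signal, window=3):
    """Averaging filter: replace each value by the mean of its centred window."""
    half = window // 2
    out = []
    for i in range(len(signal)):
        chunk = signal[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def positive_fragments(words, filtered, min_len=2):
    """Keep maximal runs of words whose filtered signal is positive."""
    fragments, current = [], []
    for word, value in zip(words, filtered):
        if value > 0:
            current.append(word)
        else:
            if len(current) >= min_len:
                fragments.append(current)
            current = []
    if len(current) >= min_len:
        fragments.append(current)
    return fragments


# Hypothetical sentence pair and lexicon, keyed as (target word, source word).
target = "yesterday the president visited paris amid heavy rain".split()
source = "le président a visité paris".split()
lexicon = {("the", "le"): 0.6, ("president", "président"): 0.9,
           ("visited", "visité"): 0.8, ("paris", "paris"): 0.95}

filtered = smooth(word_signal(target, source, lexicon))
print(positive_fragments(target, filtered))
```

Running the same functions with the roles of source and target swapped gives the source-side fragment, as described above.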
![Page 78: 1 Gholamreza Haffari Simon Fraser University MT Summit, August 2009 Machine Learning approaches for dealing with Limited Bilingual Data in SMT](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf9a1a28abf838c9256c/html5/thumbnails/78.jpg)
78
The Effect of Parallel Fragments for SMT
(Munteanu & Marcu 2006)
Explained in the beginning of the talk
The method just explained
![Page 79: 1 Gholamreza Haffari Simon Fraser University MT Summit, August 2009 Machine Learning approaches for dealing with Limited Bilingual Data in SMT](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf9a1a28abf838c9256c/html5/thumbnails/79.jpg)
79
Outline
Introduction
Semi-supervised Learning for SMT
• Background (EM, Self-training, Co-Training)
• SSL for Alignments / Phrases / Sentences
Active Learning for SMT
• Single Language Pair
• Multiple Language Pairs
![Page 80: 1 Gholamreza Haffari Simon Fraser University MT Summit, August 2009 Machine Learning approaches for dealing with Limited Bilingual Data in SMT](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf9a1a28abf838c9256c/html5/thumbnails/80.jpg)
80
Self-Training for SMT
[Diagram: train the model MFE on the bilingual text (F, E); decode the monolingual text (F) to obtain translated text (F, E); select high-quality sentence pairs; re-train the SMT model]
![Page 81: 1 Gholamreza Haffari Simon Fraser University MT Summit, August 2009 Machine Learning approaches for dealing with Limited Bilingual Data in SMT](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf9a1a28abf838c9256c/html5/thumbnails/81.jpg)
81
Self-Training for SMT
[Diagram: train the model MFE on the bilingual text (F, E); decode the monolingual text (F) to obtain translated text (F, E); select high-quality sentence pairs; re-train the SMT model]
(Ueffing et al 2007a)
![Page 82: 1 Gholamreza Haffari Simon Fraser University MT Summit, August 2009 Machine Learning approaches for dealing with Limited Bilingual Data in SMT](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf9a1a28abf838c9256c/html5/thumbnails/82.jpg)
82
Scoring & Selecting Sentence Pairs
Scoring:
• Use the normalized decoder’s score
• Confidence estimation method (Ueffing & Ney 2007)
Selecting:
• Importance sampling
• Threshold: those whose score is above a threshold
• Keep all sentence pairs
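The scoring and selection options can be sketched as below. This is a hedged illustration: the function names are hypothetical, the normalized score is taken to be the per-word geometric mean of the decoder probability, and importance sampling is assumed to draw with replacement in proportion to the score.

```python
import math
import random


def normalized_score(log_prob, num_words):
    """Length-normalized decoder score: per-word geometric mean probability in (0, 1]."""
    return math.exp(log_prob / max(num_words, 1))


def select_threshold(scored_pairs, threshold):
    """Keep the sentence pairs whose score is above the threshold."""
    return [pair for pair, score in scored_pairs if score > threshold]


def select_importance_sampling(scored_pairs, n, rng):
    """Draw n pairs with replacement, with probability proportional to their score."""
    pairs = [pair for pair, _ in scored_pairs]
    weights = [score for _, score in scored_pairs]
    return rng.choices(pairs, weights=weights, k=n)


def select_keep_all(scored_pairs):
    """Keep every machine-translated sentence pair."""
    return [pair for pair, _ in scored_pairs]


# Hypothetical (source, translation) pairs with decoder log-probabilities.
decoded = [(("f1", "e1"), -2.0, 4), (("f2", "e2"), -12.0, 4)]
scored = [(pair, normalized_score(lp, n)) for pair, lp, n in decoded]
print(select_threshold(scored, 0.5))
print(select_importance_sampling(scored, 3, random.Random(0)))
```

A confidence estimate in [0, 1] can be used in place of `normalized_score` without changing the selection functions.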
![Page 83: 1 Gholamreza Haffari Simon Fraser University MT Summit, August 2009 Machine Learning approaches for dealing with Limited Bilingual Data in SMT](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf9a1a28abf838c9256c/html5/thumbnails/83.jpg)
83
Confidence Estimation
A log-linear combination of
• Word posterior probabilities: the chance of seeing a word in a particular position in translations
• Phrase posterior probabilities
• Language model score
The weights are tuned to minimize the classification error rate
• Translations having a WER above a threshold are considered incorrect
![Page 84: 1 Gholamreza Haffari Simon Fraser University MT Summit, August 2009 Machine Learning approaches for dealing with Limited Bilingual Data in SMT](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf9a1a28abf838c9256c/html5/thumbnails/84.jpg)
84
Self-Training for SMT
[Diagram: train the model MFE on the bilingual text (F, E); decode the monolingual text (F) to obtain translated text (F, E); select high-quality sentence pairs; re-train the SMT model]
(Ueffing et al 2007a)
![Page 85: 1 Gholamreza Haffari Simon Fraser University MT Summit, August 2009 Machine Learning approaches for dealing with Limited Bilingual Data in SMT](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf9a1a28abf838c9256c/html5/thumbnails/85.jpg)
85
Re-Training the SMT Model (I)
Simply add the newly selected sentence pairs to the initial bitext, and fully re-train the phrase table
A mixture model of phrase pair probabilities from the training set combined with phrase pairs from the newly selected sentence pairs:
$\lambda \cdot$ (Initial Phrase Table) $+\ (1-\lambda) \cdot$ (New Phrase Table)
(Ueffing et al 2007a)
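A small sketch of the mixture option, assuming each phrase table is a dictionary from (source phrase, target phrase) to a probability and that a single interpolation weight `lam` is applied to the initial table. The function name and the toy tables are hypothetical; the numbers mirror the "A B" example used later in the talk.

```python
def mix_phrase_tables(initial, new, lam=0.5):
    """Linear interpolation: lam * p_initial + (1 - lam) * p_new for every phrase pair.
    A pair missing from one table contributes probability 0 from that table."""
    mixed = {}
    for key in set(initial) | set(new):
        mixed[key] = lam * initial.get(key, 0.0) + (1.0 - lam) * new.get(key, 0.0)
    return mixed


# Hypothetical tables: the initial table is undecided between two translations of "A B";
# the table built from the selected machine translations only contains "c d".
initial_table = {("A B", "a b e"): 0.5, ("A B", "c d"): 0.5}
new_table = {("A B", "c d"): 1.0}

for pair, prob in sorted(mix_phrase_tables(initial_table, new_table, lam=0.4).items()):
    print(pair, round(prob, 2))
```

With `lam=0.4` the mixed distribution for "A B" moves from an even split towards "c d", which is the sharpening effect the talk attributes to self-training.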
![Page 86: 1 Gholamreza Haffari Simon Fraser University MT Summit, August 2009 Machine Learning approaches for dealing with Limited Bilingual Data in SMT](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf9a1a28abf838c9256c/html5/thumbnails/86.jpg)
86
Re-training the SMT Model (II)
Use new sentence pairs to train an additional phrase table and use it as a new feature function in the SMT log-linear model
• One phrase table trained on sentences for which we have the true translations
• One phrase table trained on sentences with their generated translations
[Diagram: Phrase Table 1, Phrase Table 2]
![Page 87: 1 Gholamreza Haffari Simon Fraser University MT Summit, August 2009 Machine Learning approaches for dealing with Limited Bilingual Data in SMT](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf9a1a28abf838c9256c/html5/thumbnails/87.jpg)
87
Results (Chinese to English, Transductive)

| Selection | Scoring | BLEU% | WER% | PER% |
|---|---|---|---|---|
| Baseline | | 27.9 ± 0.7 | 67.2 ± 0.6 | 44.0 ± 0.5 |
| Keep all | | 28.1 | 66.5 | 44.2 |
| Importance sampling | Norm. score | 28.7 | 66.1 | 43.6 |
| Importance sampling | Confidence | 28.4 | 65.8 | 43.2 |
| Threshold | Norm. score | 28.3 | 66.1 | 43.5 |
| Threshold | Confidence | 29.3 | 65.6 | 43.2 |

• WER: Lower is better (Word error rate)
• PER: Lower is better (Position-independent WER)
• BLEU: Higher is better
Bold: best result, italic: significantly better
Using additional phrase table
![Page 88: 1 Gholamreza Haffari Simon Fraser University MT Summit, August 2009 Machine Learning approaches for dealing with Limited Bilingual Data in SMT](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf9a1a28abf838c9256c/html5/thumbnails/88.jpg)
88
Results (Chinese to English, Inductive)
Eval-04 (4 refs.)

| System | | BLEU% | WER% | PER% |
|---|---|---|---|---|
| Baseline | | 31.8 ± 0.7 | 66.8 ± 0.7 | 41.5 ± 0.5 |
| Add Chinese data | Iter 1 | 32.8 | 65.7 | 40.9 |
| | Iter 4 | 32.6 | 65.8 | 40.9 |
| | Iter 10 | 32.5 | 66.1 | 41.2 |

• WER: Lower is better (Word error rate)
• PER: Lower is better (Position-independent WER)
• BLEU: Higher is better
Bold: best result, italic: significantly better
Using importance sampling and additional phrase table
![Page 89: 1 Gholamreza Haffari Simon Fraser University MT Summit, August 2009 Machine Learning approaches for dealing with Limited Bilingual Data in SMT](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf9a1a28abf838c9256c/html5/thumbnails/89.jpg)
89
Why does it work (I)
Reinforces parts of the phrase translation model which are relevant for test corpus, hence obtain more focused probability distribution
Phrase table before decoding:

| source | target | prob |
|---|---|---|
| A B | a b e | .5 |
| A B | c d | .5 |
| … | … | … |

Decode monotext: ---- A B ----- is translated as ---- c d -----
“c d” is chosen since LM picks it according to signals from context

Phrase table after retraining:

| source | target | prob |
|---|---|---|
| A B | a b e | .2 |
| A B | c d | .8 |
| … | … | … |

Use this to resolve ambiguity of translating “A B” in other parts of the text
(Ueffing et al 2008)
![Page 90: 1 Gholamreza Haffari Simon Fraser University MT Summit, August 2009 Machine Learning approaches for dealing with Limited Bilingual Data in SMT](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf9a1a28abf838c9256c/html5/thumbnails/90.jpg)
90
Why does it work (II)
Composes new phrases, for example:

| Original parallel corpus | Additional source data | Possible new phrases |
|---|---|---|
| ‘A B’, ‘C D E’ | ‘A B C D E’ | ‘A B C’, ‘B C D E’, … |

Source: ----- A B C D E -----
Translation: ----- a b c d e -----
(Ueffing et al 2008)
![Page 91: 1 Gholamreza Haffari Simon Fraser University MT Summit, August 2009 Machine Learning approaches for dealing with Limited Bilingual Data in SMT](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf9a1a28abf838c9256c/html5/thumbnails/91.jpg)
91
Analysis
New phrases are used rarely, hence most of the benefit comes from focused probability distributions
![Page 92: 1 Gholamreza Haffari Simon Fraser University MT Summit, August 2009 Machine Learning approaches for dealing with Limited Bilingual Data in SMT](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf9a1a28abf838c9256c/html5/thumbnails/92.jpg)
92
Co-training for SMT
Source sentence is a view onto the translation
Existing translations of a source sentence can be used as additional views on the translation
(Callison-Burch, 2003)
![Page 93: 1 Gholamreza Haffari Simon Fraser University MT Summit, August 2009 Machine Learning approaches for dealing with Limited Bilingual Data in SMT](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf9a1a28abf838c9256c/html5/thumbnails/93.jpg)
93
Co-Training for SMT
(Callison-Burch, 2003)
![Page 94: 1 Gholamreza Haffari Simon Fraser University MT Summit, August 2009 Machine Learning approaches for dealing with Limited Bilingual Data in SMT](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf9a1a28abf838c9256c/html5/thumbnails/94.jpg)
94
Co-Training for SMT
(Callison-Burch, 2003)
Having initial bitexts, train SMT models from source languages to the target language
![Page 95: 1 Gholamreza Haffari Simon Fraser University MT Summit, August 2009 Machine Learning approaches for dealing with Limited Bilingual Data in SMT](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf9a1a28abf838c9256c/html5/thumbnails/95.jpg)
95
Co-Training for SMT
(Callison-Burch, 2003)
Translate a multilingual parallel sentence in the source languages using the trained SMT models
![Page 96: 1 Gholamreza Haffari Simon Fraser University MT Summit, August 2009 Machine Learning approaches for dealing with Limited Bilingual Data in SMT](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf9a1a28abf838c9256c/html5/thumbnails/96.jpg)
96
Co-Training for SMT
(Callison-Burch, 2003)
Choose the best generated translation
![Page 97: 1 Gholamreza Haffari Simon Fraser University MT Summit, August 2009 Machine Learning approaches for dealing with Limited Bilingual Data in SMT](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf9a1a28abf838c9256c/html5/thumbnails/97.jpg)
97
Co-Training for SMT
(Callison-Burch, 2003)
Add the new sentence pairs to the bitexts and re-train the SMT models
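The four co-training steps (train, translate, choose the best translation, add and re-train) can be sketched as a loop. This is only a toy illustration: the word-lookup "model" stands in for a real SMT system, its confidence (fraction of known words) is a hypothetical selection criterion, and all names are assumptions.

```python
def train_word_lookup(pairs):
    """Toy stand-in for SMT training: memorise word-for-word translations
    from sentence pairs of equal length."""
    table = {}
    for src, tgt in pairs:
        s, t = src.split(), tgt.split()
        if len(s) == len(t):
            table.update(zip(s, t))

    def translate(src):
        words = src.split()
        hyp = " ".join(table.get(w, w) for w in words)
        confidence = sum(w in table for w in words) / len(words)
        return hyp, confidence

    return translate


def co_train(bitexts, pool, train, rounds=1, per_round=1):
    """bitexts: {language: [(source sentence, target sentence), ...]}
    pool: list of {language: source sentence} items, each one sentence in several languages."""
    bitexts = {lang: list(pairs) for lang, pairs in bitexts.items()}
    pool = list(pool)
    for _ in range(rounds):
        if not pool:
            break
        # Step 1: train one model per source language.
        models = {lang: train(pairs) for lang, pairs in bitexts.items()}
        candidates = []
        for item in pool:
            # Step 2: translate every language version of the sentence.
            outputs = [models[lang](src) for lang, src in item.items()]
            # Step 3: choose the best generated translation.
            best_hyp, best_conf = max(outputs, key=lambda out: out[1])
            candidates.append((best_conf, best_hyp, item))
        candidates.sort(key=lambda cand: cand[0], reverse=True)
        # Step 4: add the new sentence pairs to every bitext; re-training happens next round.
        for _, hyp, item in candidates[:per_round]:
            for lang, src in item.items():
                bitexts[lang].append((src, hyp))
            pool.remove(item)
    return bitexts


# Hypothetical data: French and German bitexts plus one French/German parallel sentence.
start = {"fr": [("le chat", "the cat")], "de": [("der hund", "the dog")]}
unlabeled = [{"fr": "le chien", "de": "der hund"}]
print(co_train(start, unlabeled, train_word_lookup)["fr"])
```

Here the German model translates its version with full confidence, so the French bitext gains a pair the French model could not have produced on its own.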
![Page 98: 1 Gholamreza Haffari Simon Fraser University MT Summit, August 2009 Machine Learning approaches for dealing with Limited Bilingual Data in SMT](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf9a1a28abf838c9256c/html5/thumbnails/98.jpg)
98
Results of Co-Training
20k initial labeled sentences, 60k unlabeled parallel sentences in 5 languages, select 10k pseudo-labeled sentences in each iteration
(Callison-Burch, 2003)
![Page 99: 1 Gholamreza Haffari Simon Fraser University MT Summit, August 2009 Machine Learning approaches for dealing with Limited Bilingual Data in SMT](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf9a1a28abf838c9256c/html5/thumbnails/99.jpg)
99
Coaching
Suppose we have no German-English bitext
• There is a French-English bitext
• There is a French-German bitext
Train a French to English translation model
Translate the French side of the French-German bitext into English and align the generated translations with the German side
![Page 100: 1 Gholamreza Haffari Simon Fraser University MT Summit, August 2009 Machine Learning approaches for dealing with Limited Bilingual Data in SMT](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf9a1a28abf838c9256c/html5/thumbnails/100.jpg)
100
Results of Coaching
Coaching of German to English by a French to English translation model
(Callison-Burch, 2003)
![Page 101: 1 Gholamreza Haffari Simon Fraser University MT Summit, August 2009 Machine Learning approaches for dealing with Limited Bilingual Data in SMT](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf9a1a28abf838c9256c/html5/thumbnails/101.jpg)
101
Results of Coaching
Coaching of German to English by multiple translation models
(Callison-Burch, 2003)
![Page 102: 1 Gholamreza Haffari Simon Fraser University MT Summit, August 2009 Machine Learning approaches for dealing with Limited Bilingual Data in SMT](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf9a1a28abf838c9256c/html5/thumbnails/102.jpg)
102
Outline
Introduction
Semi-supervised Learning for SMT
• Background (EM, Self-training, Co-Training)
• SSL for Alignments / Phrases / Sentences
Active Learning for SMT
• Single Language Pair
• Multiple Language Pairs
![Page 103: 1 Gholamreza Haffari Simon Fraser University MT Summit, August 2009 Machine Learning approaches for dealing with Limited Bilingual Data in SMT](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf9a1a28abf838c9256c/html5/thumbnails/103.jpg)
103
Shortage of Bilingual Data: A Solution
Suppose we are given a large monolingual text in the source language F
Pay a human expert and ask him/her to translate these sentences into the target language E
• This way, we will have a bigger bilingual text
But our budget is limited!
• We cannot afford to translate all monolingual sentences
![Page 104: 1 Gholamreza Haffari Simon Fraser University MT Summit, August 2009 Machine Learning approaches for dealing with Limited Bilingual Data in SMT](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf9a1a28abf838c9256c/html5/thumbnails/104.jpg)
104
A Better Solution
Choose a subset of monolingual sentences for which:
if we had the translation,
the SMT performance would increase the most
Only ask the human expert for the translation of these highly informative sentences
This is the goal of Active Learning
![Page 105: 1 Gholamreza Haffari Simon Fraser University MT Summit, August 2009 Machine Learning approaches for dealing with Limited Bilingual Data in SMT](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf9a1a28abf838c9256c/html5/thumbnails/105.jpg)
105
Active Learning for SMT
[Diagram: train the model MFE on the bilingual text (F, E); decode the monolingual text (F) to obtain translated text (F, E); select informative sentences; have them translated by a human; add the new pairs (F, E) to the bilingual text and re-train the SMT models]
(Haffari et al 2009)
![Page 106: 1 Gholamreza Haffari Simon Fraser University MT Summit, August 2009 Machine Learning approaches for dealing with Limited Bilingual Data in SMT](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf9a1a28abf838c9256c/html5/thumbnails/106.jpg)
106
Active Learning for SMT
[Diagram: train the model MFE on the bilingual text (F, E); decode the monolingual text (F) to obtain translated text (F, E); select informative sentences; have them translated by a human; add the new pairs (F, E) to the bilingual text and re-train the SMT models]
![Page 107: 1 Gholamreza Haffari Simon Fraser University MT Summit, August 2009 Machine Learning approaches for dealing with Limited Bilingual Data in SMT](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf9a1a28abf838c9256c/html5/thumbnails/107.jpg)
107
Sentence Selection Strategies
Baselines:
• Randomly choose sentences from the pool of monolingual sentences
• Choose longer sentences from the monolingual corpus
Other methods:
• Decoder’s confidence for the translations (Kato & Barnard, 2007)
• Reverse model
• Utility of the translation units
(Haffari et al 2009)
![Page 108: 1 Gholamreza Haffari Simon Fraser University MT Summit, August 2009 Machine Learning approaches for dealing with Limited Bilingual Data in SMT](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf9a1a28abf838c9256c/html5/thumbnails/108.jpg)
108
Decoder’s Confidence
Sentences for which the model is not confident about their translations are selected first
Hopefully, high-confidence translations are good ones
Normalize the confidence score by the sentence length
(Haffari et al 2009)
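A minimal sketch of confidence-based selection, assuming the decoder returns a log-score per sentence and that normalization means dividing by the number of source words. The function name and the toy scores are hypothetical.

```python
def select_least_confident(candidates, k):
    """candidates: list of (sentence, decoder log-score).
    Returns the k sentences with the lowest length-normalized score."""
    def normalized(item):
        sentence, log_score = item
        return log_score / max(len(sentence.split()), 1)

    return [sentence for sentence, _ in sorted(candidates, key=normalized)[:k]]


# Hypothetical pool: normalized scores are -2.0, -3.0 and -0.5 respectively.
pool = [("a b c d", -8.0), ("a b", -6.0), ("a", -0.5)]
print(select_least_confident(pool, 2))
```

Without the normalization, the four-word sentence would be ranked as least confident simply because it is longer.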
![Page 109: 1 Gholamreza Haffari Simon Fraser University MT Summit, August 2009 Machine Learning approaches for dealing with Limited Bilingual Data in SMT](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf9a1a28abf838c9256c/html5/thumbnails/109.jpg)
109
Reverse Model
Comparing the original sentence and the final sentence tells us something about the value of the sentence
Original: I will let you know about the issue later
After MEF: Je vais vous faire plus tard sur la question
After Rev: MFE: I will later on the question
(Haffari et al 2009)
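One way to turn the round-trip comparison into a number is a word-overlap score between the original and the final sentence. The Jaccard overlap used here is an assumption chosen for brevity, not necessarily the measure used in (Haffari et al 2009), and the function name is hypothetical.

```python
def round_trip_similarity(original, round_trip):
    """Jaccard overlap between the word sets of the original and the round-trip sentence."""
    a = set(original.lower().split())
    b = set(round_trip.lower().split())
    union = a | b
    return len(a & b) / len(union) if union else 1.0


original = "I will let you know about the issue later"
final = "I will later on the question"
print(round_trip_similarity(original, final))
```

For the example above, 4 of the 11 distinct words are shared, so much of the sentence is lost in the round trip.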
![Page 110: 1 Gholamreza Haffari Simon Fraser University MT Summit, August 2009 Machine Learning approaches for dealing with Limited Bilingual Data in SMT](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf9a1a28abf838c9256c/html5/thumbnails/110.jpg)
110
Sentence Selection Strategies
Baselines:
• Randomly choose sentences from the pool of monolingual sentences
• Choose longer sentences from the monolingual corpus
Other methods:
• Decoder’s confidence for the translations (Kato & Barnard, 2007)
• Reverse model
• Utility of the translation units
(Haffari et al 2009)
![Page 111: 1 Gholamreza Haffari Simon Fraser University MT Summit, August 2009 Machine Learning approaches for dealing with Limited Bilingual Data in SMT](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf9a1a28abf838c9256c/html5/thumbnails/111.jpg)
111
Utility of the Translation Units
Phrases are the basic units of translations in phrase-based SMT
I will let you know about the issue later
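[Figure: counts of the phrases of this sentence in the monolingual text (6, 6, 18, 3) and in the bilingual text (5, 6, 12, 3, 7); the two texts are labelled m and b]
The more frequent a phrase is in the monolingual text, the more important it is
The more frequent a phrase is in the bilingual text, the less important it is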
![Page 112: 1 Gholamreza Haffari Simon Fraser University MT Summit, August 2009 Machine Learning approaches for dealing with Limited Bilingual Data in SMT](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf9a1a28abf838c9256c/html5/thumbnails/112.jpg)
112
Generative Models for Phrases
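[Figure: phrase count and probability tables for two generative models, m for the monolingual text (counts 6, 6, 18, 3; probabilities .25, .25, .05, .33, .12) and b for the bilingual text (counts 5, 6, 12, 3, 7; probabilities .21, .22, .05, .09, .14, .29)]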
![Page 113: 1 Gholamreza Haffari Simon Fraser University MT Summit, August 2009 Machine Learning approaches for dealing with Limited Bilingual Data in SMT](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf9a1a28abf838c9256c/html5/thumbnails/113.jpg)
113
Sentence Selection: Probability Ratio Score
For a monolingual sentence S
Consider the bag of its phrases: $X_S = \{x_1, x_2, x_3\}$
Score of S depends on its probability ratio: $\frac{m(x_1)}{b(x_1)},\ \frac{m(x_2)}{b(x_2)},\ \frac{m(x_3)}{b(x_3)}$
(Haffari et al 2009)
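A sketch of a probability-ratio score, under several assumptions: phrase probabilities come from add-alpha smoothed relative frequencies, the sentence score is the average log-ratio over its bag of phrases, and the counts are invented for the example. None of these choices is claimed to match (Haffari et al 2009) exactly.

```python
import math


def phrase_model(counts, vocabulary, alpha=1.0):
    """Add-alpha smoothed relative-frequency model over a fixed phrase vocabulary."""
    total = sum(counts.get(x, 0) for x in vocabulary) + alpha * len(vocabulary)
    return {x: (counts.get(x, 0) + alpha) / total for x in vocabulary}


def ratio_score(phrases, mono_model, bi_model):
    """Average of log(m(x) / b(x)) over the bag of phrases of a sentence."""
    logs = [math.log(mono_model[x] / bi_model[x]) for x in phrases]
    return sum(logs) / len(logs) if logs else 0.0


# Hypothetical counts: "a b" is frequent in the monolingual text but rare in the bitext.
vocab = {"a b", "c d"}
m = phrase_model({"a b": 18, "c d": 3}, vocab)
b = phrase_model({"a b": 3, "c d": 18}, vocab)

print(ratio_score(["a b"], m, b) > 0)   # under-covered phrase: worth asking a human
print(ratio_score(["c d"], m, b) < 0)   # well-covered phrase: little to gain
```

A positive score means the phrases of the sentence are more typical of the monolingual text than of the current bitext, which matches the two intuitions about phrase utility stated earlier.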
![Page 114: 1 Gholamreza Haffari Simon Fraser University MT Summit, August 2009 Machine Learning approaches for dealing with Limited Bilingual Data in SMT](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf9a1a28abf838c9256c/html5/thumbnails/114.jpg)
114
Sentence Selection: Probability Ratio Score
For a monolingual sentence S
Consider the bag of its phrases: $X_S = \{x_1, x_2, x_3\}$
Score of S depends on its probability ratio (the phrase probability ratio)
Phrase probability ratio captures our intuition about the utility of the translation units
![Page 115: 1 Gholamreza Haffari Simon Fraser University MT Summit, August 2009 Machine Learning approaches for dealing with Limited Bilingual Data in SMT](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf9a1a28abf838c9256c/html5/thumbnails/115.jpg)
115
Extensions of the Score
Instead of using phrases, we may use n-grams
We may alternatively use the following score
(Haffari et al 2009)
![Page 116: 1 Gholamreza Haffari Simon Fraser University MT Summit, August 2009 Machine Learning approaches for dealing with Limited Bilingual Data in SMT](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf9a1a28abf838c9256c/html5/thumbnails/116.jpg)
116
Sentence Segmentation
How to prepare the bag of phrases for a sentence S?
For the bilingual text, we have the segmentation from the training phase of the SMT model
For the monolingual text, we run the SMT model to produce the top-n translations and segmentations
What about OOV fragments in the sentences of the monolingual text?
(Haffari & Sarkar 2009)
![Page 117: 1 Gholamreza Haffari Simon Fraser University MT Summit, August 2009 Machine Learning approaches for dealing with Limited Bilingual Data in SMT](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf9a1a28abf838c9256c/html5/thumbnails/117.jpg)
117
OOV Fragments: An Example
Sentence: i will go to school on friday
OOV fragment: go to school on friday
[Figure: several segmentations of the OOV fragment into OOV phrases]
OOV phrases can be long
(Haffari & Sarkar 2009b)
![Page 118: 1 Gholamreza Haffari Simon Fraser University MT Summit, August 2009 Machine Learning approaches for dealing with Limited Bilingual Data in SMT](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf9a1a28abf838c9256c/html5/thumbnails/118.jpg)
118
Counting OOV Phrases
Fix an OOV fragment x
Put a uniform distribution over all possible segmentations of x
Use the expected count of OOV Phrases under this uniform distribution
See (Haffari & Sarkar 2009b) for how to compute these expectations efficiently
x:
…
(Haffari & Sarkar 2009)
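Under a uniform distribution over segmentations, the expected count of each candidate phrase has a simple closed form, sketched below. This is a direct derivation from the uniform assumption and is not claimed to be the algorithm of the cited paper; the function name is hypothetical. A fragment of n words has n - 1 possible boundary positions, each independently present with probability 1/2, and a span is a phrase exactly when its edge boundaries are present and its inner boundaries are absent.

```python
from collections import defaultdict


def expected_phrase_counts(fragment):
    """Expected count of every contiguous span of `fragment` as a phrase,
    under a uniform distribution over all segmentations."""
    words = fragment.split()
    n = len(words)
    expected = defaultdict(float)
    for i in range(n):
        for j in range(i + 1, n + 1):
            # Boundaries whose state is forced: the inner ones (absent) and
            # the left and right edges when they are not the fragment edges (present).
            forced = (j - i - 1) + (1 if i > 0 else 0) + (1 if j < n else 0)
            expected[" ".join(words[i:j])] += 0.5 ** forced
    return dict(expected)


counts = expected_phrase_counts("go to school on friday")
print(counts["go"], counts["to"], counts["go to school on friday"])
```

The double loop visits every span once, so the cost is quadratic in the fragment length rather than exponential in the number of segmentations.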
![Page 119: 1 Gholamreza Haffari Simon Fraser University MT Summit, August 2009 Machine Learning approaches for dealing with Limited Bilingual Data in SMT](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf9a1a28abf838c9256c/html5/thumbnails/119.jpg)
119
Active Learning for SMT
[Diagram: train the model MFE on the bilingual text (F, E); decode the monolingual text (F) to obtain translated text (F, E); select informative sentences; have them translated by a human; add the new pairs (F, E) to the bilingual text and re-train the SMT models]
![Page 120: 1 Gholamreza Haffari Simon Fraser University MT Summit, August 2009 Machine Learning approaches for dealing with Limited Bilingual Data in SMT](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf9a1a28abf838c9256c/html5/thumbnails/120.jpg)
120
Re-training the SMT Models
We use two phrase tables in each SMT model MFiE
• One trained on sentences for which we have the true translations
• One trained on sentences with their generated translations (self-training)
[Diagram: Fi, Ei; Phrase Table 1, Phrase Table 2]
![Page 121: 1 Gholamreza Haffari Simon Fraser University MT Summit, August 2009 Machine Learning approaches for dealing with Limited Bilingual Data in SMT](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf9a1a28abf838c9256c/html5/thumbnails/121.jpg)
121
Experimental Setup
Dataset size:

| | Bitext | Monotext | Test |
|---|---|---|---|
| French-English | 5K | 20K | 2K |

We select 200 sentences from the monolingual sentence set for 25 iterations
We use Portage from NRC as the underlying SMT system (Ueffing et al, 2007)
![Page 122: 1 Gholamreza Haffari Simon Fraser University MT Summit, August 2009 Machine Learning approaches for dealing with Limited Bilingual Data in SMT](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf9a1a28abf838c9256c/html5/thumbnails/122.jpg)
122
The Simulated AL Setting
[Figure: comparison of three selection strategies: Utility of phrases, Random, and Decoder’s Confidence; an arrow marks the “Better” direction]
![Page 123: 1 Gholamreza Haffari Simon Fraser University MT Summit, August 2009 Machine Learning approaches for dealing with Limited Bilingual Data in SMT](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf9a1a28abf838c9256c/html5/thumbnails/123.jpg)
123
The Simulated AL Setting
[Figure: an arrow marks the “Better” direction]
![Page 124: 1 Gholamreza Haffari Simon Fraser University MT Summit, August 2009 Machine Learning approaches for dealing with Limited Bilingual Data in SMT](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf9a1a28abf838c9256c/html5/thumbnails/124.jpg)
124
Outline
Introduction
Semi-supervised Learning for SMT
• Background (EM, Self-training, Co-Training)
• SSL for Alignments / Phrases / Sentences
Active Learning for SMT
• Single Language Pair
• Multiple Language Pairs
![Page 125: 1 Gholamreza Haffari Simon Fraser University MT Summit, August 2009 Machine Learning approaches for dealing with Limited Bilingual Data in SMT](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf9a1a28abf838c9256c/html5/thumbnails/125.jpg)
125
Multiple Language-Pair AL-SMT
Add a new language to a multilingual parallel corpus
• To build high-quality SMT systems from the existing languages to the new language
[Diagram: existing languages F1 (German), F2 (French), F3 (Spanish), …; new language E (English); labels: AL, Translation Quality]
![Page 126: 1 Gholamreza Haffari Simon Fraser University MT Summit, August 2009 Machine Learning approaches for dealing with Limited Bilingual Data in SMT](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf9a1a28abf838c9256c/html5/thumbnails/126.jpg)
126
AL-SMT: Multilingual Setting
[Diagram: train the models MFE on the multilingual text (F1, F2, …, E); decode the monolingual text (F1, F2, …) to obtain translations E1, E2, …; select informative sentences; have them translated by a human; add the new (F1, F2, …, E) data and re-train the SMT models]
![Page 127: 1 Gholamreza Haffari Simon Fraser University MT Summit, August 2009 Machine Learning approaches for dealing with Limited Bilingual Data in SMT](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf9a1a28abf838c9256c/html5/thumbnails/127.jpg)
127
Selecting Multilingual Sents. (I)
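• Alternate Method: choose informative sentences based on a specific Fi in each AL iteration

Rank of each multilingual sentence (one row per sentence):

| F1 | F2 | F3 |
|---|---|---|
| 2 | 3 | 2 |
| 35 | 19 | 17 |
| 1 | 2 | 3 |
| … | … | … |
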
(Reichart et al, 2008)
![Page 128: 1 Gholamreza Haffari Simon Fraser University MT Summit, August 2009 Machine Learning approaches for dealing with Limited Bilingual Data in SMT](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf9a1a28abf838c9256c/html5/thumbnails/128.jpg)
128
Selecting Multilingual Sents. (II)
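• Combined Method: sort sentences based on their ranks in all lists

| Sentence | Rank in F1 list | Rank in F2 list | Rank in F3 list | Combined Rank |
|---|---|---|---|---|
| … | 2 | 3 | 2 | 7 = 2 + 3 + 2 |
| … | 35 | 19 | 17 | 71 = 35 + 19 + 17 |
| … | 1 | 2 | 3 | 6 = 1 + 2 + 3 |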
(Reichart et al, 2008)
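The combined rank in the example above is simply the sum of a sentence's ranks over all language lists. A minimal sketch follows, assuming that a smaller combined rank means a more informative sentence; the function names `combined_ranks` and `combined_select` are hypothetical, and the numbers are those from the slide.

```python
from typing import Dict, List


def combined_ranks(ranks: Dict[str, List[int]]) -> List[int]:
    """Sum each sentence's rank over all source-language lists."""
    n = len(next(iter(ranks.values())))
    return [sum(per_lang[i] for per_lang in ranks.values()) for i in range(n)]


def combined_select(ranks: Dict[str, List[int]], k: int) -> List[int]:
    """Return the indices of the k sentences with the smallest combined rank
    (assumption: smaller rank means more informative)."""
    totals = combined_ranks(ranks)
    return sorted(range(len(totals)), key=lambda i: totals[i])[:k]


ranks = {"F1": [2, 35, 1], "F2": [3, 19, 2], "F3": [2, 17, 3]}
print(combined_ranks(ranks))        # [7, 71, 6]
print(combined_select(ranks, k=2))  # [2, 0]
```

Unlike the Alternate method, every language contributes to every selection, so a sentence ranked poorly in one list can still be chosen if it ranks well in the others.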
![Page 129: 1 Gholamreza Haffari Simon Fraser University MT Summit, August 2009 Machine Learning approaches for dealing with Limited Bilingual Data in SMT](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf9a1a28abf838c9256c/html5/thumbnails/129.jpg)
129
Selecting Multilingual Sents. (III)
• Disagreement Method
  – Pairwise BLEU score of the generated translations
  – Sum of BLEU scores from a consensus translation
[Figure: the sentences in F1, F2, F3 are translated into E1, E2, E3, which are compared with a Consensus Translation]
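The pairwise variant of the Disagreement method can be sketched as follows. This is an illustration, not the implementation used in the experiments: `sentence_bleu` is a simplified, add-one-smoothed sentence-level BLEU written here for self-containment, `pairwise_agreement` is a hypothetical helper, and it is an assumption that a low average pairwise BLEU (high disagreement among E1, E2, E3) marks a sentence as informative. The consensus variant is not shown because the slide does not say how the consensus translation is built.

```python
import math
from collections import Counter
from itertools import permutations
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hyp: str, ref: str, max_n: int = 4) -> float:
    """Simplified sentence-level BLEU with add-one smoothing (a stand-in
    for a full BLEU implementation)."""
    h, r = hyp.split(), ref.split()
    if not h:
        return 0.0
    log_prec = 0.0
    for n in range(1, max_n + 1):
        h_counts, r_counts = ngrams(h, n), ngrams(r, n)
        # Clipped n-gram matches between hypothesis and reference.
        match = sum(min(c, r_counts[g]) for g, c in h_counts.items())
        total = sum(h_counts.values())
        log_prec += math.log((match + 1) / (total + 1))
    # Brevity penalty for hypotheses shorter than the reference.
    bp = 1.0 if len(h) > len(r) else math.exp(1 - len(r) / len(h))
    return bp * math.exp(log_prec / max_n)


def pairwise_agreement(translations: List[str]) -> float:
    """Average BLEU over all ordered pairs of translations; lower values
    mean the translations disagree more."""
    pairs = list(permutations(translations, 2))
    if not pairs:
        raise ValueError("need at least two translations")
    return sum(sentence_bleu(a, b) for a, b in pairs) / len(pairs)


agree = ["the house is small"] * 3
disagree = ["the house is small", "this home is tiny", "a little building"]
print(pairwise_agreement(agree), pairwise_agreement(disagree))
```

Under the stated assumption, the second group of translations would be preferred for human translation, since its pairwise agreement is lower.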
![Page 130: 1 Gholamreza Haffari Simon Fraser University MT Summit, August 2009 Machine Learning approaches for dealing with Limited Bilingual Data in SMT](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf9a1a28abf838c9256c/html5/thumbnails/130.jpg)
130
AL-SMT: Multilingual Setting
[Diagram: the AL loop in the multilingual setting. Train MFE on the bitext (F1, F2, …; E); decode the monolingual text F1, F2, … into E1, E2, …; select informative sentences; have them translated by a human; add the new (F1, F2, …; E) pairs to the bitext; re-train the SMT models (log-linear model).]
![Page 131: 1 Gholamreza Haffari Simon Fraser University MT Summit, August 2009 Machine Learning approaches for dealing with Limited Bilingual Data in SMT](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf9a1a28abf838c9256c/html5/thumbnails/131.jpg)
131
Re-training the SMT Models (I)
We use two phrase tables in each SMT model M_{FiE}:
• Phrase Table 1: trained on the sentences for which we have the true translations
• Phrase Table 2: trained on the sentences Fi paired with their generated translations Ei (Self-training)
![Page 132: 1 Gholamreza Haffari Simon Fraser University MT Summit, August 2009 Machine Learning approaches for dealing with Limited Bilingual Data in SMT](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf9a1a28abf838c9256c/html5/thumbnails/132.jpg)
132
Re-training the SMT Models (II)
Phrase Table 2: we can instead train it on the consensus translations (Co-Training)
[Figure: Phrase Table 1 is trained as before; the translations E1, E2, E3 yield Econsensus, which is paired with Fi to train Phrase Table 2]
![Page 133: 1 Gholamreza Haffari Simon Fraser University MT Summit, August 2009 Machine Learning approaches for dealing with Limited Bilingual Data in SMT](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf9a1a28abf838c9256c/html5/thumbnails/133.jpg)
133
Experimental Setup
We want to add English to a multilingual parallel corpus containing Germanic languages in EuroParl:
• Germanic languages: German, Dutch, Danish, Swedish
Sizes of the dataset and selected sentences:
• Initially there are 5k multilingual sentences parallel to English sentences
• 20k parallel sentences in the multilingual corpora
• 10 AL iterations, selecting 500 sentences in each iteration
We use Portage from NRC as the underlying SMT system (Ueffing et al, 2007b)
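The data sizes above imply simple bookkeeping over the AL run, sketched below. The sketch assumes that the 20k multilingual sentences form the pool from which sentences are selected (the slide does not say so explicitly), and it uses random sampling purely as a placeholder for an informativeness-based selector; `run_al` and its parameter names are hypothetical.

```python
import random
from typing import List, Tuple


def run_al(initial_size: int = 5_000, pool_size: int = 20_000,
           iterations: int = 10, batch_size: int = 500,
           seed: int = 0) -> Tuple[List[int], int]:
    """Track the bitext size after each AL iteration and the pool left over."""
    rng = random.Random(seed)
    pool = list(range(pool_size))   # ids of sentences without English translations
    labeled = initial_size          # sentences that already have English translations
    sizes = [labeled]
    for _ in range(iterations):
        # Placeholder selector: a real system would rank the pool by
        # informativeness instead of sampling at random.
        chosen = set(rng.sample(pool, batch_size))
        pool = [s for s in pool if s not in chosen]
        labeled += len(chosen)      # the chosen sentences are translated by a human
        sizes.append(labeled)
    return sizes, len(pool)


sizes, remaining = run_al()
print(sizes[0], sizes[-1], remaining)
```

Under these assumptions the bitext parallel to English grows from 5k to 10k sentences over the 10 iterations, and 15k pool sentences remain untranslated.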
![Page 134: 1 Gholamreza Haffari Simon Fraser University MT Summit, August 2009 Machine Learning approaches for dealing with Limited Bilingual Data in SMT](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf9a1a28abf838c9256c/html5/thumbnails/134.jpg)
134
Self-training vs Co-training
Co-Training mode outperforms Self-Training mode
[Plot: Germanic languages to English; values marked on the plot: 19.75 and 20.20]
![Page 135: 1 Gholamreza Haffari Simon Fraser University MT Summit, August 2009 Machine Learning approaches for dealing with Limited Bilingual Data in SMT](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf9a1a28abf838c9256c/html5/thumbnails/135.jpg)
135
Germanic Languages to English

| Method | Self-Training (WER / PER / BLEU) | Co-Training (WER / PER / BLEU) |
|---|---|---|
| Combined Rank | 41.0 / 30.2 / 19.9 | 40.1 / 30.1 / 20.2 |
| Alternate | 40.2 / 30.0 / 20.0 | 40.0 / 29.6 / 20.3 |
| Random | 41.6 / 31.0 / 19.4 | 40.5 / 30.7 / 20.2 |

• WER (word error rate): lower is better
• PER (position-independent WER): lower is better
• BLEU: higher is better
Bold: best result, italic: significantly better
![Page 136: 1 Gholamreza Haffari Simon Fraser University MT Summit, August 2009 Machine Learning approaches for dealing with Limited Bilingual Data in SMT](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf9a1a28abf838c9256c/html5/thumbnails/136.jpg)
136
Conclusion
[Diagram: a small source-target bitext feeds the MT system; additional resources and the techniques that exploit them:]
• Large comparable source-target bitext: parallel sentence extraction, bilingual dictionary induction
• Large source monotext: semi-supervised/active learning
• Source-another language bitext: paraphrasing
• Source-another, another-target and source-target bitexts: triangulation/co-training
![Page 137: 1 Gholamreza Haffari Simon Fraser University MT Summit, August 2009 Machine Learning approaches for dealing with Limited Bilingual Data in SMT](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf9a1a28abf838c9256c/html5/thumbnails/137.jpg)
137
Finish
![Page 138: 1 Gholamreza Haffari Simon Fraser University MT Summit, August 2009 Machine Learning approaches for dealing with Limited Bilingual Data in SMT](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf9a1a28abf838c9256c/html5/thumbnails/138.jpg)
138
References
(Blum & Mitchell 1998) A. Blum and T. Mitchell, “Combining Labeled and Unlabeled Data with Co-Training”, COLT.
(Callison Burch 2007) C. Callison-Burch, “Paraphrasing and Translation”, PhD thesis, University of Edinburgh.
(Callison Burch 2003) C. Callison-Burch, “Co-Training for Statistical Machine Translation”, Master’s thesis, University of Edinburgh.
(Cohn & Lapata 2007) T. Cohn and M. Lapata, “Machine Translation by Triangulation: Making Effective Use of Multi-Parallel Corpora”, ACL.
(Dempster et al 1977) A. P. Dempster, N. M. Laird and D. B. Rubin, “Maximum Likelihood from Incomplete Data via the EM Algorithm”, Journal of the Royal Statistical Society, Series B.
(Fraser & Marcu 2006a) A. Fraser and D. Marcu, “Semi-Supervised Training for Statistical Word Alignment”, ACL.
![Page 139: 1 Gholamreza Haffari Simon Fraser University MT Summit, August 2009 Machine Learning approaches for dealing with Limited Bilingual Data in SMT](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf9a1a28abf838c9256c/html5/thumbnails/139.jpg)
139
References
(Fraser & Marcu 2006b) A. Fraser and D. Marcu, “Measuring Word Alignment Quality for Statistical Machine Translation”, Technical Report ISI-TR-616, ISI/University of Southern California.
(Fung & Cheung 2004) P. Fung and P. Cheung, “Mining very non-parallel corpora: Parallel sentence and lexicon extraction via bootstrapping and EM”, EMNLP.
(Garera et al 2009) N. Garera, C. Callison-Burch and D. Yarowsky, “Improving Translation Lexicon Induction from Monolingual Corpora via Dependency Contexts and Part-of-Speech Equivalences”, CoNLL.
(Haffari et al 2009) G. Haffari, M. Roy and A. Sarkar, “Active Learning for Statistical Phrase-based Machine Translation”, NAACL.
(Haffari & Sarkar 2009) G. Haffari and A. Sarkar, “Active Learning for Multilingual Statistical Machine Translation”, ACL-IJCNLP.
(Haghighi et al 2008) A. Haghighi, P. Liang, T. Berg-Kirkpatrick and D. Klein, “Learning bilingual lexicons from monolingual corpora”, ACL.
![Page 140: 1 Gholamreza Haffari Simon Fraser University MT Summit, August 2009 Machine Learning approaches for dealing with Limited Bilingual Data in SMT](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf9a1a28abf838c9256c/html5/thumbnails/140.jpg)
140
References
(Kuzman et al 2008) K. Ganchev, J. Graca and B. Taskar, “Better Alignments = Better Translations?”, ACL.
(Koehn & Knight 2002) P. Koehn and K. Knight, “Learning a translation lexicon from monolingual corpora”, ACL Workshop on Unsupervised Lexical Acquisition.
(Mann & Yarowsky 2001) G. Mann and D. Yarowsky, “Multi-path translation lexicon induction via bridge languages”, NAACL.
(Munteanu & Marcu 2006) D. Munteanu and D. Marcu, “Extracting Parallel Sub-Sentential Fragments from Non-Parallel Corpora”, COLING-ACL.
(Marton et al 2009) Y. Marton, C. Callison-Burch and P. Resnik, “Improved Statistical Machine Translation Using Monolingually-Derived Paraphrases”, EMNLP.
(Munteanu & Marcu 2005) D. Munteanu and D. Marcu, “Improving Machine Translation Performance by Exploiting Non-parallel Corpora”, Computational Linguistics, 31(4).
![Page 141: 1 Gholamreza Haffari Simon Fraser University MT Summit, August 2009 Machine Learning approaches for dealing with Limited Bilingual Data in SMT](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf9a1a28abf838c9256c/html5/thumbnails/141.jpg)
141
References
(Rapp 1999) R. Rapp, “Automatic identification of word translations from unrelated English and German corpora”, ACL.
(Reichart et al 2008) R. Reichart, K. Tomanek, U. Hahn and A. Rappoport, “Multi-Task Active Learning for Linguistic Annotations”, ACL.
(Schafer & Yarowsky 2001) C. Schafer and D. Yarowsky, “Inducing translation lexicons via diverse similarity measures and bridge languages”, COLING.
(Ueffing & Ney 2007) N. Ueffing and H. Ney, “Word-Level Confidence Estimation for Machine Translation”, Computational Linguistics.
(Ueffing et al 2007a) N. Ueffing, G.R. Haffari and A. Sarkar, “Transductive Learning for Statistical Machine Translation”, ACL.
(Ueffing et al 2007b) N. Ueffing, M. Simard, S. Larkin and J. H. Johnson, “NRC’s Portage system for WMT 2007”, ACL Workshop on SMT.
![Page 142: 1 Gholamreza Haffari Simon Fraser University MT Summit, August 2009 Machine Learning approaches for dealing with Limited Bilingual Data in SMT](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf9a1a28abf838c9256c/html5/thumbnails/142.jpg)
142
References
(Ueffing et al 2008) N. Ueffing, G.R. Haffari and A. Sarkar, “Semi-supervised model adaptation for statistical machine translation”, Machine Translation Journal.
(Yarowsky 1995) D. Yarowsky, “Unsupervised Word Sense Disambiguation Rivaling Supervised Methods”, ACL.