AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources
Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking
Students and Staff: Erik Peterson, Christian Monson, Ariadna Font Llitjós, Alison Alvarez, Roberto Aranovich, Rodolfo Vega
Mar 1, 2006 AVENUE/LETRAS 2
Outline
• Scientific Objectives
• Framework Overview
• Learning Morphology
• Elicitation
• Learning Transfer Rules
• Automatic Rule Refinement
• Language Prototypes
• New Directions
Why Machine Translation for Languages with Limited Resources?
• We are in the age of information explosion
– With the internet, the web, and Google, anyone can get the information they want anytime…
• But what about the text in all those other languages?
– How do they read all this English stuff?
– How do we read all the stuff that they put online?
• MT for these languages would enable:
– Better government access to native indigenous and minority communities
– Better minority and native community participation in information-rich activities (health care, education, government) without giving up their languages
– Civilian and military applications (disaster relief)
– Language preservation
The Roadmap to Learning-based MT
• Automatic acquisition of necessary language resources and knowledge using machine learning methodologies
• A framework for integrating the acquired MT resources into effective MT prototype systems
• Effective integration of acquired knowledge with statistical/distributional information
CMU’s AVENUE Approach
• Elicitation: use bilingual native informants to produce a small, high-quality, word-aligned bilingual corpus of translated phrases and sentences
• Transfer-rule Learning: apply ML-based methods to automatically acquire syntactic transfer rules for translation between the two languages
– Learn from major language to minor language
– Translate from minor language to major language
• XFER + Decoder:
– XFER engine produces a lattice of possible transferred structures at all levels
– Decoder searches and selects the best-scoring combination
• Rule Refinement: automatically refine and correct the acquired transfer rules through interaction with bilingual informants, who help the system identify translation errors
• Morphology Learning: unsupervised learning of the morpheme structure of words, based on their organization into paradigms and on distributional information
AVENUE MT Approach
[Figure: MT pyramid. Source (e.g. Quechua) → Syntactic Parsing → Semantic Analysis → Interlingua → Sentence Planning → Text Generation → Target (e.g. English). Shortcuts across the pyramid: Transfer Rules; Direct: SMT, EBMT.]
AVENUE: Automate Rule Learning
Avenue Architecture
[Figure: system architecture diagram. Labeled stages and components:]
• Elicitation: Elicitation Tool, Elicitation Corpus, Word-Aligned Parallel Corpus
• Morphology: Learning Module, Morphology Analyzer
• Rule Learning: Learning Module, Learned Transfer Rules, Handcrafted rules
• Run-Time System: INPUT TEXT, Lexical Resources, Run Time Transfer System, Decoder, OUTPUT TEXT
• Rule Refinement: Translation Correction Tool, Rule Refinement Module
Transfer Rule Formalism
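• Type information: NP::NP
• Part-of-speech/constituent information: [DET ADJ N] -> [DET N DET ADJ]
• Alignments: (X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2)
• x-side constraints: ((X1 AGR) = *3-SING) ((X1 DEF) = *DEF) ((X3 AGR) = *3-SING) ((X3 COUNT) = +)
• y-side constraints: ((Y1 DEF) = *DEF) ((Y3 DEF) = *DEF) ((Y2 AGR) = *3-SING) ((Y2 GENDER) = (Y4 GENDER))
• xy-constraints, e.g. ((Y1 AGR) = (X1 AGR))

;SL: the old man, TL: ha-ish ha-zaqen
NP::NP [DET ADJ N] -> [DET N DET ADJ]
((X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2)
 ((X1 AGR) = *3-SING) ((X1 DEF) = *DEF) ((X3 AGR) = *3-SING) ((X3 COUNT) = +)
 ((Y1 DEF) = *DEF) ((Y3 DEF) = *DEF) ((Y2 AGR) = *3-SING) ((Y2 GENDER) = (Y4 GENDER)))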
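As a rough illustration of how the alignments in this rule drive reordering, the sketch below applies the NP rule to "the old man". The `RULE` dictionary layout, the `apply_rule` function, and the three-entry lexicon are assumptions made for this example; they are not the XFER engine's actual data structures, and constraint checking is left out.

```python
# Hypothetical encoding of the NP::NP rule from the slide (not the XFER engine's own format).
RULE = {
    "source": ["DET", "ADJ", "N"],
    "target": ["DET", "N", "DET", "ADJ"],
    # (source index, target index) pairs, 1-based as in (X1::Y1), (X1::Y3), ...
    "alignments": [(1, 1), (1, 3), (2, 4), (3, 2)],
}

# Toy transfer lexicon, assumed for illustration only.
LEXICON = {"the": "ha", "old": "zaqen", "man": "ish"}


def apply_rule(rule, tagged_words, lexicon):
    """Reorder and translate a source phrase according to the rule's alignments."""
    tags = [tag for _, tag in tagged_words]
    if tags != rule["source"]:
        raise ValueError("phrase does not match the rule's source side")
    target = [None] * len(rule["target"])
    for x, y in rule["alignments"]:
        word, _ = tagged_words[x - 1]
        target[y - 1] = lexicon[word]
    return target


phrase = [("the", "DET"), ("old", "ADJ"), ("man", "N")]
print(apply_rule(RULE, phrase, LEXICON))  # ['ha', 'ish', 'ha', 'zaqen']
```

Because X1 is aligned to both Y1 and Y3, the single English determiner is copied into both Hebrew determiner slots, which matches the TL string "ha-ish ha-zaqen".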
Transfer Rule Formalism (II)
• Value constraints, e.g. ((X1 AGR) = *3-SING)
• Agreement constraints, e.g. ((Y2 GENDER) = (Y4 GENDER))

;SL: the old man, TL: ha-ish ha-zaqen
NP::NP [DET ADJ N] -> [DET N DET ADJ]
((X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2)
 ((X1 AGR) = *3-SING) ((X1 DEF) = *DEF) ((X3 AGR) = *3-SING) ((X3 COUNT) = +)
 ((Y1 DEF) = *DEF) ((Y3 DEF) = *DEF) ((Y2 AGR) = *3-SING) ((Y2 GENDER) = (Y4 GENDER)))
Transfer and Decoding
The Transfer Engine
Analysis: Source text is parsed into its grammatical structure. Determines transfer application ordering.
Example: ראיתי את האיש הזקן
(I) saw *acc the man the old
Source tree: [S [VP V P [NP D N D Adj]]]

Transfer: A target language tree is created by reordering, insertion, and deletion. Source words translated with transfer lexicon.
Target tree: [S [NP N] [VP V [NP DET Adj N]]] (I saw the old man)

Generation: Target language constraints are checked, target morphology applied, and final translation produced. E.g. “saw” in past tense selected.
Final translation: “I saw the old man”
Symbolic Decoder
• System rarely finds a full parse/transfer for the complete input sentence
• XFER engine produces a comprehensive lattice of segment translations
• Decoder selects the best combination of translation segments
• Search for the optimal-scoring path of partial translations, based on multiple features:
– Target Language Model scores
– XFER Rule Scores
– Path Fragmentation
– Other features…
• Symbolic decoding is essential for scenarios where there is insufficient data for training a large target LM
– Effective Rule Scoring is crucial
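The lattice search described above can be sketched as a small dynamic program. This is a minimal illustration, not the AVENUE decoder: the arc format `(start, end, translation, score)`, the per-segment `fragment_penalty`, and all scores are made-up assumptions standing in for the language model, rule, and fragmentation features named on the slide.

```python
def decode(n_words, arcs, fragment_penalty=1.0):
    """Return (score, translations) for the best path covering source words 0..n_words.

    Each arc is (start, end, translation, score) and covers source positions
    start..end. Every arc used costs fragment_penalty, so paths made of fewer,
    longer segments are preferred. Returns None if no path covers the input.
    """
    best = [None] * (n_words + 1)  # best[i]: best partial path covering the first i words
    best[0] = (0.0, [])
    for end in range(1, n_words + 1):
        for start, stop, text, score in arcs:
            if stop != end or best[start] is None:
                continue
            prev_score, prev_texts = best[start]
            candidate = (prev_score + score - fragment_penalty, prev_texts + [text])
            if best[end] is None or candidate[0] > best[end][0]:
                best[end] = candidate
    return best[n_words]


# Hypothetical lattice for a four-word source sentence; scores are invented.
ARCS = [
    (0, 1, "I saw", -1.0),
    (1, 2, "to", -1.2),
    (2, 3, "the man", -0.8),
    (3, 4, "the old", -0.9),
    (1, 4, "the old man", -1.5),
]

score, segments = decode(4, ARCS)
print(segments)  # ['I saw', 'the old man']
```

With these numbers the two-segment path scores -4.5 and the four-segment path scores -7.9, so the less fragmented translation wins.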
Morphology Learning
The Challenge of Morphology
Mapudungun (Indigenous Language of Chile and Argentina, ~1 Million Speakers)
Allkütulekefun
The Challenge of Morphology
Mapudungun
Allkütu -le -ke -fu -n
The Challenge of Morphology
Mapudungun
Allkütu -le -ke -fu -n
• Allkütu: listen
• -le: prog.
• -ke: habitual
• -fu: past
• -n: indic.1sg
I used to listen
Tasks for Morphology
• Segment Words
• Map Morphemes onto Features
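The two tasks can be illustrated on the example word with a toy segmenter. This is only a sketch under strong assumptions: the suffix table is written by hand from the glosses on the slide, whereas the project's goal is to learn such information without supervision, and the greedy right-to-left stripping in `segment` is a hypothetical simplification that would over-segment real text.

```python
# Hand-written suffix glosses taken from the slide; the project aims to learn these.
SUFFIX_FEATURES = {"le": "prog.", "ke": "habitual", "fu": "past", "n": "indic.1sg"}


def segment(word, suffixes):
    """Greedily strip known suffixes from the right end of the word."""
    found = []
    stripped = True
    while stripped:
        stripped = False
        # Try longer suffixes first so a short suffix does not shadow a long one.
        for suffix in sorted(suffixes, key=len, reverse=True):
            if word.endswith(suffix) and len(word) > len(suffix):
                found.insert(0, suffix)
                word = word[: -len(suffix)]
                stripped = True
                break
    return word, found


stem, suffixes = segment("Allkütulekefun", SUFFIX_FEATURES)
features = [(s, SUFFIX_FEATURES[s]) for s in suffixes]
print(stem, suffixes)  # Allkütu ['le', 'ke', 'fu', 'n']
print(features)
```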
The Challenge of Morphology
Tasks for Morphology
• Segment Words
• Map Morphemes onto Features
• Learn these tasks
– unsupervised
– from data
– for any language
Our Approach
Leverage the Natural Structure of Morphology
• Paradigm
– Set of affixes that interchangeably attach to a set of stems
Our Approach
Leverage the Natural Structure of Morphology
• Paradigm
– Set of affixes that interchangeably attach to a set of stems
Example Vocabulary: blame, blamed, blames, roamed, roaming, roams, solve, solves, solving
Suffixes Ø.s: stems blame, solve
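[Figure: network of suffix sets and their stems for the example vocabulary]
e.es: blam, solv
e.ed: blam
es: blam, solv
Ø.s.d: blame
Ø.s: blame, solve
Ø: blame, blames, blamed, roams, roamed, roaming, solve, solves, solving
e.es.ed: blam
ed: blam, roam
d: blame, roame
Ø.d: blame
s.d: blame
s: blame, roam, solve
es.ed: blam
e: blam, solv
me.mes: bla
me.med: bla
mes: bla
me.mes.med: bla
med: bla, roa
mes.med: bla
me: bla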
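The pairing of suffix sets with stems in this network can be reproduced from the example vocabulary with a few lines of code. The sketch below is an assumption about how one might compute it, not the project's implementation: `suffixes_by_stem` tries every split of every word, and `stems_for` returns the stems that occur with all suffixes in a given set (the empty string stands for Ø).

```python
from collections import defaultdict

VOCAB = "blame blamed blames roamed roaming roams solve solves solving".split()


def suffixes_by_stem(vocab):
    """Map every candidate stem to the set of suffixes it occurs with ("" stands for Ø)."""
    table = defaultdict(set)
    for word in vocab:
        for cut in range(1, len(word) + 1):
            table[word[:cut]].add(word[cut:])
    return table


def stems_for(suffix_set, table):
    """Return the stems that occur with every suffix in suffix_set."""
    wanted = set(suffix_set)
    return sorted(stem for stem, seen in table.items() if wanted <= seen)


TABLE = suffixes_by_stem(VOCAB)
print(stems_for({"", "s"}, TABLE))    # ['blame', 'solve']
print(stems_for({"e", "es"}, TABLE))  # ['blam', 'solv']
print(stems_for({"s"}, TABLE))        # ['blame', 'roam', 'solve']
```

Note that the competing analyses Ø.s (stems blame, solve) and e.es (stems blam, solv) both fit the data equally well; choosing between such candidates is what the search over the network has to do.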
Spanish Newswire Corpus: 40,011 Tokens, 6,975 Types
[Figure: network of suffix sets; each node lists its suffixes, its stem count, and sample stems]
a.as.o.os.tro (1 stem): cas
a.as.o.os (43 stems): african, cas, jurídic, l, ...
a.as.os (50 stems): afectad, cas, jurídic, l, ...
a.as.o (59 stems): cas, citad, jurídic, l, ...
a.o.os (105 stems): impuest, indonesi, italian, jurídic, ...
as.o.os (54 stems): cas, implicad, jurídic, l, ...
a.as (199 stems): huelg, incluid, industri, inundad, ...
a.os (134 stems): impedid, impuest, indonesi, inundad, ...
as.os (68 stems): cas, implicad, inundad, jurídic, ...
a.o (214 stems): id, indi, indonesi, inmediat, ...
as.o (85 stems): intern, jurídic, just, l, ...
o.os (268 stems): human, implicad, indici, indocumentad, ...
a.tro (2 stems): cas, cen
a (1237 stems): huelg, ib, id, iglesi, ...
as (404 stems): huelg, huelguist, incluid, industri, ...
os (534 stems): humorístic, human, hígad, impedid, ...
o (1139 stems): hub, hug, human, huyend, ...
tro (16 stems): catas, ce, cen, cua, ...
[Figure: the same Spanish suffix-set network, annotated]
• Each node shows its Suffixes, its Stem Type Count, and sample Stems
• Level 5 = 5 suffixes
[Figure: the same Spanish suffix-set network, with two groups of nodes highlighted]
• Adjective Paradigm: a.as.o.os (43 stems: african, cas, jurídic, l, ...)
• From the spurious suffix “tro”: a.as.o.os.tro (1 stem: cas), a.tro (2 stems: cas, cen), tro (16 stems: catas, ce, cen, cua, ...)
Basic Search Procedure
[Figure: the same Spanish suffix-set network, with axes labeled "Decreasing Stem Count" and "Increasing Suffix Count"]
Examples and Evaluation of Automatically Selected Suffix Sets
Ø.ba.n.ndo ada.adas.ado.ados.aron.ó
a.aba.ado.ados.ar.ará.arán ada.ado.ados.ar.o
a.aciones.ación.adas.ado.ar ado.adores.o
a.ada.adas.ado.ar.ará ado.ados.arse.e
a.adas.ado.an.ar ado.ar.aron.arse.ará
a.ado.ados.ar.ó do.dos.ndo.r.ron
a.ado.an.arse.ó e.ida.ido
a.ado.aron.arse.ó emos.ido.ía.ían
aba.ada.ado.ar.o.os ida.ido.idos.ir.ió
aciones.ación.ado.ados ido.iendo.ir
aciones.ado.ados.ará ido.ir.ro
ación.ado.an.e
Global Suffix Evaluation
Precision: 0.506
Recall: 0.517
F1: 0.511
Key: Correct, Wrong
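The three reported numbers are consistent with the standard definition of F1 as the harmonic mean of precision and recall. The check below assumes that standard definition; the slide does not say how its F1 was computed.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (standard F1 definition)."""
    return 2 * precision * recall / (precision + recall)


print(round(f1_score(0.506, 0.517), 3))  # 0.511
```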
Next Steps for Morphology Induction
• Improve the Quality of Induced Paradigms
– Current Work
• Convert Paradigms into a Segmenter
– Soon
• Learn Mappings from Morphemes to Features
– Future Goal
Elicitation
Purpose of Elicitation
• Provide a small but highly targeted corpus of hand-aligned data
– To support machine learning from a small data set
– To cover all basic morpho-syntactic phenomena

newpair
srcsent: Tú caíste
tgtsent: eymi ütrünagimi
aligned: ((1,1),(2,2))
context: tú = Juan [masculino, 2a persona del singular]
comment: You (John) fell

newpair
srcsent: Tú estás cayendo
tgtsent: eymi petu ütünagimi
aligned: ((1,1),(2 3,2 3))
context: tú = Juan [masculino, 2a persona del singular]
comment: You (John) are falling

newpair
srcsent: Tú caíste
tgtsent: eymi ütrunagimi
aligned: ((1,1),(2,2))
context: tú = María [femenino, 2a persona del singular]
comment: You (Mary) fell
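To show how such a record yields word-aligned training data, the sketch below parses one of the records above and pairs up the aligned words. It assumes one `key: value` field per line and 1-based word indices in the `aligned` field; the function names are hypothetical and this is not the Elicitation Tool's own reader.

```python
import re

RECORD = """newpair
srcsent: Tú estás cayendo
tgtsent: eymi petu ütünagimi
aligned: ((1,1),(2 3,2 3))
context: tú = Juan [masculino, 2a persona del singular]
comment: You (John) are falling
"""


def parse_record(text):
    """Collect the 'key: value' lines of one record into a dictionary."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def parse_alignment(spec):
    """Turn '((1,1),(2 3,2 3))' into [([1], [1]), ([2, 3], [2, 3])]."""
    pairs = []
    for src, tgt in re.findall(r"\(([\d ]+),([\d ]+)\)", spec):
        pairs.append(([int(i) for i in src.split()], [int(j) for j in tgt.split()]))
    return pairs


def aligned_phrases(fields):
    """Pair source and target word groups using the 1-based alignment indices."""
    src = fields["srcsent"].split()
    tgt = fields["tgtsent"].split()
    return [
        (" ".join(src[i - 1] for i in src_idx), " ".join(tgt[j - 1] for j in tgt_idx))
        for src_idx, tgt_idx in parse_alignment(fields["aligned"])
    ]


print(aligned_phrases(parse_record(RECORD)))
# [('Tú', 'eymi'), ('estás cayendo', 'petu ütünagimi')]
```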
Purpose of Elicitation
• To get data from someone who is
– Bilingual
– Literate
– Not experienced with linguistics
English-Hindi Example
English-Chinese Example
English-Arabic Example
The Elicitation Tool has been used with these languages
• Mapudungun
• Hindi
• Hebrew
• Quechua
• Aymara
• Thai
• Japanese
• Chinese
• Dutch
• Arabic
Elicitation Corpus: List of Minimal Pairs of Sentences in a Major Language
Eliciting from Spanish
Canto
Canté
Estoy cantando
Cantaste
Eliciting from English
I sing
I sang
I am singing
You sang
AVENUE Elicitation Corpora
• The Functional-Typological Corpus
– Designed to elicit elements of meaning that may have morpho-syntactic realization
• The Structural Elicitation Corpus
– Based on sentence structures from the Penn TreeBank
The Process
[Figure: corpus construction pipeline. Labeled stages:]
• Feature Specification: list of semantic features and values
• Feature Maps: which combinations of features and values are of interest (Clause-Level, Noun-Phrase, Tense & Aspect, Modality, …)
• Feature Structure Sets
• Reverse Annotated Feature Structure Sets: add English sentences
• The Corpus
• Sampling: Smaller Corpus
Feature Structures
srcsent: Mary was not a leader.
context: Translate this as though it were spoken to a peer co-worker;

((actor ((np-function fn-actor) (np-animacy anim-human) (np-biological-gender bio-gender-female) (np-general-type proper-noun-type) (np-identifiability identifiable) (np-specificity specific) …))
 (pred ((np-function fn-predicate-nominal) (np-animacy anim-human) (np-biological-gender bio-gender-female) (np-general-type common-noun-type) (np-specificity specificity-neutral) …))
 (c-v-lexical-aspect state)
 (c-copula-type copula-role)
 (c-secondary-type secondary-copula)
 (c-solidarity solidarity-neutral)
 (c-v-grammatical-aspect gram-aspect-neutral)
 (c-v-absolute-tense past)
 (c-v-phase-aspect phase-aspect-neutral)
 (c-general-type declarative-clause)
 (c-polarity polarity-negative)
 (c-my-causer-intentionality intentionality-n/a)
 (c-comparison-type comparison-n/a)
 (c-relative-tense relative-n/a)
 (c-our-boundary boundary-n/a)
 …)
Feature Specification
• Defines Features and their values
• Sets default values for features
• Specifies feature requirements and restrictions
• Written in XML
Feature Specification
Feature: c-copula-type (a copula is a verb like “be”; some languages do not have copulas)
Values:
• copula-n/a
– Restrictions: 1. ~(c-secondary-type secondary-copula)
– Notes:
• copula-role
– Restrictions: 1. (c-secondary-type secondary-copula)
– Notes: 1. A role is something like a job or a function. "He is a teacher" "This is a vegetable peeler"
• copula-identity
– Restrictions: 1. (c-secondary-type secondary-copula)
– Notes: 1. "Clark Kent is Superman" "Sam is the teacher"
• copula-location
– Restrictions: 1. (c-secondary-type secondary-copula)
– Notes: 1. "The book is on the table" There is a long list of locative relations later in the feature specification.
• copula-description
– Restrictions: 1. (c-secondary-type secondary-copula)
– Notes: 1. A description is an attribute. "The children are happy." "The books are long."
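One way to read these restrictions is as consistency checks on a feature structure: every copula value except copula-n/a requires `(c-secondary-type secondary-copula)`, and copula-n/a forbids it. The sketch below encodes that reading. The dictionary layout and the function name are assumptions for illustration; the actual specification is written in XML and its checking code is not shown in the slides.

```python
# Restrictions on c-copula-type values, transcribed from the slide.
# value -> (feature, value it is compared with, whether the match must hold)
RESTRICTIONS = {
    "copula-n/a": ("c-secondary-type", "secondary-copula", False),
    "copula-role": ("c-secondary-type", "secondary-copula", True),
    "copula-identity": ("c-secondary-type", "secondary-copula", True),
    "copula-location": ("c-secondary-type", "secondary-copula", True),
    "copula-description": ("c-secondary-type", "secondary-copula", True),
}


def satisfies_restriction(features):
    """Check the c-copula-type restriction for a flat feature dictionary."""
    feature, required, must_hold = RESTRICTIONS[features["c-copula-type"]]
    holds = features.get(feature) == required
    return holds == must_hold


# Clause-level features of "Mary was not a leader." from the Feature Structures slide.
clause = {"c-copula-type": "copula-role", "c-secondary-type": "secondary-copula"}
print(satisfies_restriction(clause))  # True
```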
Feature Maps
• Some features interact in the grammar
– English –s reflects person and number of the subject and tense of the verb.
– In expressing the English present progressive tense, the auxiliary verb is in a different place in a question and a statement:
  • He is running.
  • Is he running?
• We need to check many, but not all, combinations of features and values.
• Using unlimited feature combinations leads to an unmanageable number of sentences
![Page 51: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students](https://reader035.vdocuments.site/reader035/viewer/2022070401/56649f165503460f94c2bd6b/html5/thumbnails/51.jpg)
![Page 52: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students](https://reader035.vdocuments.site/reader035/viewer/2022070401/56649f165503460f94c2bd6b/html5/thumbnails/52.jpg)
Evidentiality Map
[Figure: tree of feature combinations. Dimensions and values as labelled in the figure:]
• Lexical Aspect: activity-accomplishment
• Assertiveness: Assertiveness-asserted, Assertiveness-neutral
• Polarity: Polarity-positive, Polarity-negative
• Source: Hearsay, quotative, inferred, assumption; Visual, Auditory, non-visual-or-auditory
• Tense: Past, Present, Future; Past, Present
• Gram. Aspect: Perfective, progressive, habitual, neutral; habitual, neutral, progressive (both value sets appear twice in the figure)
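To make the size problem concrete, here is a minimal Python sketch that counts the combinations of the feature values named on the Evidentiality Map slide. The `allowed` restriction (no future tense with directly perceived sources) is a hypothetical stand-in for a feature-map restriction; the slides do not state which combinations the real map excludes.

```python
from itertools import product

# Feature values as listed on the Evidentiality Map slide.
FEATURES = {
    "lexical-aspect": ["activity-accomplishment"],
    "assertiveness": ["asserted", "neutral"],
    "polarity": ["positive", "negative"],
    "source": ["hearsay", "quotative", "inferred", "assumption",
               "visual", "auditory", "non-visual-or-auditory"],
    "tense": ["past", "present", "future"],
    "gram-aspect": ["perfective", "progressive", "habitual", "neutral"],
}

DIRECT_SOURCES = {"visual", "auditory", "non-visual-or-auditory"}


def all_combinations(features):
    """Yield every combination of one value per feature as a dict."""
    names = list(features)
    for values in product(*(features[name] for name in names)):
        yield dict(zip(names, values))


def allowed(combo):
    """Hypothetical restriction: skip future tense with direct sources."""
    return not (combo["tense"] == "future" and combo["source"] in DIRECT_SOURCES)


full_count = sum(1 for _ in all_combinations(FEATURES))
restricted_count = sum(1 for c in all_combinations(FEATURES) if allowed(c))
print(full_count, restricted_count)
```

Even this single restriction removes a noticeable share of the sentences to elicit, and each additional feature multiplies the unrestricted total.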
![Page 53: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students](https://reader035.vdocuments.site/reader035/viewer/2022070401/56649f165503460f94c2bd6b/html5/thumbnails/53.jpg)
Current Work
• Navigation
  – Start: large search space of all possible feature combinations
  – Finish: each feature has been eliminated as irrelevant or has been explored
  – Goal: dynamically find the most efficient path through the search space for each language.
![Page 54: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students](https://reader035.vdocuments.site/reader035/viewer/2022070401/56649f165503460f94c2bd6b/html5/thumbnails/54.jpg)
Current Work
• Feature Detection
  – Which features have an effect on morphosyntax?
  – What is the effect?
  – Drives the Navigation process
![Page 55: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students](https://reader035.vdocuments.site/reader035/viewer/2022070401/56649f165503460f94c2bd6b/html5/thumbnails/55.jpg)
Feature Detection: Spanish
The girl saw a red book.
((1,1)(2,2)(3,3)(4,4)(5,6)(6,5))
La niña vió un libro rojo
A girl saw a red book
((1,1)(2,2)(3,3)(4,4)(5,6)(6,5))
Una niña vió un libro rojo
I saw the red book
((1,1)(2,2)(3,3)(4,5)(5,4))
Yo vi el libro rojo
I saw a red book.
((1,1)(2,2)(3,3)(4,5)(5,4))
Yo vi un libro rojo
Feature: definiteness
Values: definite, indefinite
Function-of-*: subj, obj
Marked-on-head-of-*: no
Marked-on-dependent: yes
Marked-on-governor: no
Marked-on-other: no
Add/delete-word: no
Change-in-alignment: no
![Page 56: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students](https://reader035.vdocuments.site/reader035/viewer/2022070401/56649f165503460f94c2bd6b/html5/thumbnails/56.jpg)
Feature Detection: Chinese
A girl saw a red book.
((1,2)(2,2)(3,3)(3,4)(4,5)(5,6)(5,7)(6,8))
有 一个 女人 看见 了 一本 红色 的 书 。
The girl saw a red book.
((1,1)(2,1)(3,3)(3,4)(4,5)(5,6)(6,7))
女人 看见 了 一本 红色的 书
Feature: definiteness
Values: definite, indefinite
Function-of-*: subject
Marked-on-head-of-*: no
Marked-on-dependent: no
Marked-on-governor: no
Add/delete-word: yes
Change-in-alignment: no
![Page 57: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students](https://reader035.vdocuments.site/reader035/viewer/2022070401/56649f165503460f94c2bd6b/html5/thumbnails/57.jpg)
Feature Detection: Chinese
I saw the red book
((1, 3)(2, 4)(2, 5)(4, 1)(5, 2))
红色的 书, 我 看见 了
I saw a red book.
((1,1)(2,2)(2,3)(2, 4)(4,5)(5,6))
我 看见 了 一本 红色的 书 。
Feature: definiteness
Values: definite, indefinite
Function-of-*: object
Marked-on-head-of-*: no
Marked-on-dependent: no
Marked-on-governor: no
Add/delete-word: yes
Change-in-alignment: yes
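The minimal-pair comparisons on the Spanish and Chinese slides can be sketched in a few lines of Python. This is an illustrative simplification, not the project's feature detector: the function name `compare_minimal_pair` and the dictionary layout are assumptions, punctuation tokens are dropped, and only three of the slide's properties (Add/delete-word, Change-in-alignment, and which target words differ) are computed.

```python
def compare_minimal_pair(first, second):
    """Compare the target sides of two elicited sentences that differ
    in a single feature value (here: definite vs. indefinite object)."""
    same_length = len(first["tgt"]) == len(second["tgt"])
    changed = []
    if same_length:
        # Position-by-position comparison only makes sense for equal lengths.
        changed = [(a, b) for a, b in zip(first["tgt"], second["tgt"]) if a != b]
    return {
        "add_delete_word": not same_length,
        "change_in_alignment": first["align"] != second["align"],
        "changed_words": changed,
    }


# "I saw the red book" / "I saw a red book", alignments copied from the slides.
spanish_definite = {
    "tgt": "Yo vi el libro rojo".split(),
    "align": {(1, 1), (2, 2), (3, 3), (4, 5), (5, 4)},
}
spanish_indefinite = {
    "tgt": "Yo vi un libro rojo".split(),
    "align": {(1, 1), (2, 2), (3, 3), (4, 5), (5, 4)},
}
chinese_definite = {
    "tgt": ["红色的", "书", "我", "看见", "了"],
    "align": {(1, 3), (2, 4), (2, 5), (4, 1), (5, 2)},
}
chinese_indefinite = {
    "tgt": ["我", "看见", "了", "一本", "红色的", "书"],
    "align": {(1, 1), (2, 2), (2, 3), (2, 4), (4, 5), (5, 6)},
}

spanish_result = compare_minimal_pair(spanish_definite, spanish_indefinite)
chinese_result = compare_minimal_pair(chinese_definite, chinese_indefinite)
print(spanish_result)
print(chinese_result)
```

For Spanish the only difference is the determiner (`el` vs. `un`), which matches Marked-on-dependent: yes with no added word and no alignment change. For the Chinese object pair both the word count and the alignment differ, matching Add/delete-word: yes and Change-in-alignment: yes.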
![Page 58: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students](https://reader035.vdocuments.site/reader035/viewer/2022070401/56649f165503460f94c2bd6b/html5/thumbnails/58.jpg)
Feature Detection: Hebrew
A girl saw a red book.
((2,1) (3,2)(5,4)(6,3))
ילדה ראתה ספר אדום
The girl saw a red book
((1,1)(2,1)(3,2)(5,4)(6,3))
הילדה ראתה ספר אדום
I saw a red book.
((2,1)(4,3)(5,2))
ראיתי ספר אדום
I saw the red book.
((2,1)(3,3)(3,4)(4,4)(5,3))
ראיתי את הספר האדום
Feature: definiteness
Values: definite, indefinite
Function-of-*: subj, obj
Marked-on-head-of-*: yes
Marked-on-dependent: yes
Marked-on-governor: no
Add-word: no
Change-in-alignment: no
![Page 59: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students](https://reader035.vdocuments.site/reader035/viewer/2022070401/56649f165503460f94c2bd6b/html5/thumbnails/59.jpg)
Feature Detection Feeds into…
• Corpus Navigation: which minimal pairs to pursue next.
  – Don’t pursue gender in Mapudungun
  – Do pursue definiteness in Hebrew
• Morphology Learning:
  – Morphological learner identifies the forms of the morphemes
  – Feature detection identifies the functions
• Rule learning:
  – Rule learner will have to learn a constraint for each morpho-syntactic marker that is discovered
    • E.g., Adjectives and nouns agree in gender, number, and definiteness in Hebrew.
![Page 60: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students](https://reader035.vdocuments.site/reader035/viewer/2022070401/56649f165503460f94c2bd6b/html5/thumbnails/60.jpg)
Rule Learning
[Figure: system architecture diagram.
Stage labels: Elicitation, Morphology, Rule Learning, Run-Time System, Rule Refinement.
Components: Elicitation Tool, Elicitation Corpus, Word-Aligned Parallel Corpus, Learning Module (appears twice), Morphology Analyzer, Learned Transfer Rules, Handcrafted rules, Lexical Resources, Run Time Transfer System, Decoder, Translation Correction Tool, Rule Refinement Module.
Endpoints: INPUT TEXT, OUTPUT TEXT.]
![Page 61: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students](https://reader035.vdocuments.site/reader035/viewer/2022070401/56649f165503460f94c2bd6b/html5/thumbnails/61.jpg)
Rule Learning - Overview
• Goal: Acquire Syntactic Transfer Rules
• Use available knowledge from the major-language side (grammatical structure)
• Three steps:
  1. Flat Seed Generation: first guesses at transfer rules; flat syntactic structure
  2. Compositionality Learning: use previously learned rules to learn hierarchical structure
  3. Constraint Learning: refine rules by learning appropriate feature constraints
![Page 62: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students](https://reader035.vdocuments.site/reader035/viewer/2022070401/56649f165503460f94c2bd6b/html5/thumbnails/62.jpg)
Flat Seed Rule Generation
Learning Example: NP
Eng: the big apple
Heb: ha-tapuax ha-gadol
Generated Seed Rule:
NP::NP [ART ADJ N] [ART N ART ADJ]
((X1::Y1)
(X1::Y3)
(X2::Y4)
(X3::Y2))
![Page 63: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students](https://reader035.vdocuments.site/reader035/viewer/2022070401/56649f165503460f94c2bd6b/html5/thumbnails/63.jpg)
Flat Seed Rule Generation
• Create a “flat” transfer rule specific to the sentence pair, partially abstracted to POS
  – Words that are aligned word-to-word and have the same POS in both languages are generalized to their POS
  – Words that have complex alignments (or not the same POS) remain lexicalized
• One seed rule for each translation example
• No feature constraints associated with seed rules (but mark the example(s) from which it was learned)
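The seed-generation step can be sketched as follows. This is a minimal illustration, not the project's learner: the function name `flat_seed_rule` and the token representation are assumptions. One deliberate simplification is labelled in the code: a word is generalized to its POS whenever every word it is aligned to has the same POS, which is looser than the strict word-to-word criterion above but reproduces the NP seed rule shown for "the big apple", where the article aligns to two target articles.

```python
def flat_seed_rule(category, src, tgt, align):
    """Build a flat seed rule string.

    src, tgt: lists of (word, pos) pairs.
    align: list of 1-based (source_index, target_index) pairs.
    """
    def abstract(tokens, partner_pos):
        symbols = []
        for index, (word, pos) in enumerate(tokens, start=1):
            partners = partner_pos(index)
            # Assumption: generalize if all aligned partners share the POS;
            # unaligned or POS-mismatched words stay lexicalized.
            if partners and all(p == pos for p in partners):
                symbols.append(pos)
            else:
                symbols.append('"' + word + '"')
        return symbols

    src_seq = abstract(src, lambda i: [tgt[j - 1][1] for (a, j) in align if a == i])
    tgt_seq = abstract(tgt, lambda j: [src[i - 1][1] for (i, b) in align if b == j])
    links = " ".join("(X{}::Y{})".format(i, j) for i, j in sorted(align))
    return "{0}::{0} [{1}] [{2}] ({3})".format(
        category, " ".join(src_seq), " ".join(tgt_seq), links)


# Eng: the big apple / Heb: ha-tapuax ha-gadol (articles split off as tokens).
english = [("the", "ART"), ("big", "ADJ"), ("apple", "N")]
hebrew = [("ha", "ART"), ("tapuax", "N"), ("ha", "ART"), ("gadol", "ADJ")]
alignment = [(1, 1), (1, 3), (2, 4), (3, 2)]

seed_rule = flat_seed_rule("NP", english, hebrew, alignment)
print(seed_rule)
```

Splitting `ha-tapuax` into `ha` + `tapuax` follows the target sequence `[ART N ART ADJ]` in the generated seed rule on the earlier slide.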
![Page 64: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students](https://reader035.vdocuments.site/reader035/viewer/2022070401/56649f165503460f94c2bd6b/html5/thumbnails/64.jpg)
Compositionality Learning
Initial Flat Rules: S::S [ART ADJ N V ART N] [ART N ART ADJ V P ART N]
((X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2) (X4::Y5) (X5::Y7) (X6::Y8))
NP::NP [ART ADJ N] [ART N ART ADJ]
((X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2))
NP::NP [ART N] [ART N]
((X1::Y1) (X2::Y2))
Generated Compositional Rule:
S::S [NP V NP] [NP V P NP]
((X1::Y1) (X2::Y2) (X3::Y4))
![Page 65: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students](https://reader035.vdocuments.site/reader035/viewer/2022070401/56649f165503460f94c2bd6b/html5/thumbnails/65.jpg)
Compositionality Learning
• Detection: traverse the c-structure of the English sentence, add compositional structure for translatable chunks
• Generalization: adjust constituent sequences and alignments
• Two implemented variants:
  – Safe Compositionality: there exists a transfer rule that correctly translates the sub-constituent
  – Maximal Compositionality: Generalize the rule if supported by the alignments, even in the absence of an existing transfer rule for the sub-constituent
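A rough Python sketch of the generalization step, in the spirit of the Maximal Compositionality variant: a source chunk is collapsed into a constituent label whenever the alignments support it, without checking for an existing sub-rule. The function name `compose` and the chunk format are assumptions, and the chunk boundaries are supplied by hand here rather than read from an English parse.

```python
def compose(src, tgt, align, chunks):
    """Collapse source chunks and their aligned target spans into labels.

    src, tgt: lists of POS symbols.
    align: set of 1-based (source_index, target_index) pairs.
    chunks: list of (label, start, end) over source positions, inclusive.
    """
    src_group, tgt_group = {}, {}
    for k, (label, start, end) in enumerate(chunks):
        targets = sorted(j for i, j in align if start <= i <= end)
        if not targets:
            continue
        lo, hi = targets[0], targets[-1]
        # Supported only if no word outside the chunk aligns into the span.
        if any(lo <= j <= hi and not (start <= i <= end) for i, j in align):
            continue
        for i in range(start, end + 1):
            src_group[i] = k
        for j in range(lo, hi + 1):
            tgt_group[j] = k

    def collapse(seq, group):
        out, new_index, first_seen = [], {}, {}
        for pos, symbol in enumerate(seq, start=1):
            k = group.get(pos)
            if k is None:
                out.append(symbol)
                new_index[pos] = len(out)
            elif k in first_seen:
                new_index[pos] = first_seen[k]
            else:
                out.append(chunks[k][0])
                first_seen[k] = len(out)
                new_index[pos] = len(out)
        return out, new_index

    new_src, src_map = collapse(src, src_group)
    new_tgt, tgt_map = collapse(tgt, tgt_group)
    new_align = sorted({(src_map[i], tgt_map[j]) for i, j in align})
    return new_src, new_tgt, new_align


# The flat S rule from the Compositionality Learning slide.
flat_src = ["ART", "ADJ", "N", "V", "ART", "N"]
flat_tgt = ["ART", "N", "ART", "ADJ", "V", "P", "ART", "N"]
flat_align = {(1, 1), (1, 3), (2, 4), (3, 2), (4, 5), (5, 7), (6, 8)}
np_chunks = [("NP", 1, 3), ("NP", 5, 6)]

composed = compose(flat_src, flat_tgt, flat_align, np_chunks)
print(composed)
```

On the slide's flat S rule this yields `[NP V NP]`, `[NP V P NP]` and the alignments (X1::Y1) (X2::Y2) (X3::Y4), the same compositional rule shown on the earlier slide.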
![Page 66: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students](https://reader035.vdocuments.site/reader035/viewer/2022070401/56649f165503460f94c2bd6b/html5/thumbnails/66.jpg)
Constraint Learning
Input: Rules and their Example Sets
S::S [NP V NP] [NP V P NP] {ex1,ex12,ex17,ex26}
((X1::Y1) (X2::Y2) (X3::Y4))
NP::NP [ART ADJ N] [ART N ART ADJ] {ex2,ex3,ex13}
((X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2))
NP::NP [ART N] [ART N] {ex4,ex5,ex6,ex8,ex10,ex11}
((X1::Y1) (X2::Y2))
Output: Rules with Feature Constraints:
S::S [NP V NP] [NP V P NP]
((X1::Y1) (X2::Y2) (X3::Y4)
(X1 NUM = X2 NUM)
(Y1 NUM = Y2 NUM)
(X1 NUM = Y1 NUM))
![Page 67: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students](https://reader035.vdocuments.site/reader035/viewer/2022070401/56649f165503460f94c2bd6b/html5/thumbnails/67.jpg)
Constraint Learning
• Goal: add appropriate feature constraints to the acquired rules
• Methodology:
  – Preserve general structural transfer
  – Learn specific feature constraints from example set
• Seed rules are grouped into clusters of similar transfer structure (type, constituent sequences, alignments)
• Each cluster forms a version space: a partially ordered hypothesis space with a specific and a general boundary
• The seed rules in a group form the specific boundary of a version space
• The general boundary is the (implicit) transfer rule with the same type, constituent sequences, and alignments, but no feature constraints
![Page 68: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students](https://reader035.vdocuments.site/reader035/viewer/2022070401/56649f165503460f94c2bd6b/html5/thumbnails/68.jpg)
Rule Refinement
[Figure: system architecture diagram.
Stage labels: Elicitation, Morphology, Rule Learning, Run-Time System, Rule Refinement.
Components: Elicitation Tool, Elicitation Corpus, Word-Aligned Parallel Corpus, Learning Module (appears twice), Morphology Analyzer, Learned Transfer Rules, Handcrafted rules, Lexical Resources, Run Time Transfer System, Decoder, Translation Correction Tool, Rule Refinement Module.
Endpoints: INPUT TEXT, OUTPUT TEXT.]
![Page 69: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students](https://reader035.vdocuments.site/reader035/viewer/2022070401/56649f165503460f94c2bd6b/html5/thumbnails/69.jpg)
Interactive and Automatic Refinement of Translation Rules
• Problem: Improve Machine Translation quality.
• Proposed Solution: Put bilingual speakers back into the loop; use their corrections to detect the source of the error and automatically improve the lexicon and the grammar.
• Approach: Automate post-editing efforts by feeding them back into the MT system. Automatic refinement of translation rules that caused an error beyond post-editing.
• Goal: Improve MT coverage and overall quality.
![Page 70: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students](https://reader035.vdocuments.site/reader035/viewer/2022070401/56649f165503460f94c2bd6b/html5/thumbnails/70.jpg)
Technical Challenges
• Elicit minimal MT information from non-expert users
• Automatically refine and expand translation rules minimally (manually written or automatically learned)
• Automatic evaluation of refinement process
![Page 71: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students](https://reader035.vdocuments.site/reader035/viewer/2022070401/56649f165503460f94c2bd6b/html5/thumbnails/71.jpg)
Error Typology for Automatic Rule Refinement (simplified)
• Missing word
• Extra word
• Wrong word order
  – Local vs Long distance
  – Word vs. phrase
  – + Word change
• Incorrect word
  – Sense
  – Form
  – Selectional restrictions
  – Idiom
• Wrong agreement
  – Missing constraint
  – Extra constraint
![Page 72: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students](https://reader035.vdocuments.site/reader035/viewer/2022070401/56649f165503460f94c2bd6b/html5/thumbnails/72.jpg)
TCTool (Demo)
Interactive elicitation of error information
Actions:
• Add a word
• Delete a word
• Modify a word
• Change word order

                       precision   recall
error detection        90%         89%
error classification   72%         71%
![Page 73: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students](https://reader035.vdocuments.site/reader035/viewer/2022070401/56649f165503460f94c2bd6b/html5/thumbnails/73.jpg)
Types of Refinement Operations
Automatic Rule Adaptation
1. Refine a translation rule: R0 -> R1 (change R0 to make it more specific or more general)
R0: NP [DET ADJ N] -> NP [DET N ADJ]
    a nice house -> una casa bonito
R1: NP [DET ADJ N] -> NP [DET N ADJ], with N gender = ADJ gender
    a nice house -> una casa bonita
![Page 74: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students](https://reader035.vdocuments.site/reader035/viewer/2022070401/56649f165503460f94c2bd6b/html5/thumbnails/74.jpg)
Types of Refinement Operations
Automatic Rule Adaptation
2. Bifurcate a translation rule: R0 -> R0 (same, general rule) + R1 (add a new more specific rule)
R0: NP [DET ADJ N] -> NP [DET N ADJ]
    a nice house -> una casa bonita
R1: NP [DET ADJ N] -> NP [DET ADJ N], with ADJ type: pre-nominal
    a great artist -> un gran artista
![Page 75: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students](https://reader035.vdocuments.site/reader035/viewer/2022070401/56649f165503460f94c2bd6b/html5/thumbnails/75.jpg)
Error Information Elicitation
Refinement Operation Typology
Automatic Rule Adaptation
A concrete example
Change word order
SL: Gaudí was a great artist
MT system output:
TL: Gaudí era un artista grande
User correction: *Gaudí era un artista grande -> Gaudí era un gran artista
[Figure callouts: clue word, error, correction]
![Page 76: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students](https://reader035.vdocuments.site/reader035/viewer/2022070401/56649f165503460f94c2bd6b/html5/thumbnails/76.jpg)
Finding Triggering Feature(s): (error word, corrected word) =
need to postulate a new binary feature: feat1
Blame assignment (from MT system output)
tree: <((S,1 (NP,2 (N,5:1 "GAUDI") )
(VP,3 (VB,2 (AUX,17:2 "ERA") )
(NP,8 (DET,0:3 "UN")
(N,4:5 "ARTISTA")
(ADJ,5:4 "GRANDE") ) ) ) )>
Automatic Rule Adaptation
Grammar: S,1 … NP,1 … NP,8 …
ADJ::ADJ |: [great] -> [grande]
((X1::Y1)
((x0 form) = great)
((y0 agr num) = sg)
((y0 agr gen) = masc))
ADJ::ADJ |: [great] -> [gran]
((X1::Y1)
((x0 form) = great)
((y0 agr num) = sg)
((y0 agr gen) = masc))
![Page 77: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students](https://reader035.vdocuments.site/reader035/viewer/2022070401/56649f165503460f94c2bd6b/html5/thumbnails/77.jpg)
Refining Rules
• Bifurcate NP,8 -> NP,8 (R0) + NP,8’ (R1) (flip order of ADJ-N)
{NP,8’}
NP::NP : [DET ADJ N] -> [DET ADJ N]
(
 (X1::Y1) (X2::Y2) (X3::Y3)
 ((x0 def) = (x1 def))
 (x0 = x3)
 ((y1 agr) = (y3 agr)) ; det-noun agreement
 ((y2 agr) = (y3 agr)) ; adj-noun agreement
 (y2 = x3)
 ((y2 feat1) =c + ))
Automatic Rule Adaptation
![Page 78: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students](https://reader035.vdocuments.site/reader035/viewer/2022070401/56649f165503460f94c2bd6b/html5/thumbnails/78.jpg)
Refining Lexical Entries
ADJ::ADJ |: [great] -> [grande]
((X1::Y1)
((x0 form) = great)
((y0 agr num) = sg)
((y0 agr gen) = masc)
((y0 feat1) = -))
ADJ::ADJ |: [great] -> [gran]
((X1::Y1)
((x0 form) = great)
((y0 agr num) = sg)
((y0 agr gen) = masc)
((y0 feat1) = +))
Automatic Rule Adaptation
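The interaction between the bifurcated rule and the refined lexical entries can be illustrated with a small Python sketch. It is a toy model under stated assumptions, not the Xfer engine: the data layout, the function `translate_np`, and the treatment of `=c` as "the adjective entry must carry feat1 = +" are all simplifications, and agreement constraints are ignored.

```python
# Lexical entries after refinement: feat1 distinguishes the two forms.
LEXICON = [
    {"src": "great", "tgt": "grande", "feat1": "-"},
    {"src": "great", "tgt": "gran", "feat1": "+"},
]

# R0 keeps the original target order; R1 is the new, more specific rule.
RULES = {
    "NP,8": {"order": ("DET", "N", "ADJ"), "adj_feat1": None},
    "NP,8'": {"order": ("DET", "ADJ", "N"), "adj_feat1": "+"},
}


def translate_np(det, noun, adj_src):
    """Return the target strings each rule licenses for one noun phrase."""
    outputs = {}
    for name, rule in RULES.items():
        for entry in LEXICON:
            if entry["src"] != adj_src:
                continue
            required = rule["adj_feat1"]
            if required is not None and entry["feat1"] != required:
                continue  # constraint on feat1 blocks this lexical entry
            words = {"DET": det, "N": noun, "ADJ": entry["tgt"]}
            phrase = " ".join(words[slot] for slot in rule["order"])
            outputs.setdefault(name, []).append(phrase)
    return outputs


candidates = translate_np("un", "artista", "great")
print(candidates)
```

In this toy model the new rule produces only "un gran artista", so "un grande artista" is no longer generated. The unchanged R0 still produces the post-nominal candidates; how the real system ranks or prunes those is not modelled here.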
![Page 79: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students](https://reader035.vdocuments.site/reader035/viewer/2022070401/56649f165503460f94c2bd6b/html5/thumbnails/79.jpg)
Evaluating Improvement
Automatic Rule Adaptation
- Given the initial and final Translation Lattices, the Rule Refinement module needs to take into account whether the following are present:
  - Corrected Translation Sentence
  - Original Translation Sentence (labelled as incorrect by the user)
un artista gran
un gran artista
un grande artista
*un artista grande
![Page 80: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students](https://reader035.vdocuments.site/reader035/viewer/2022070401/56649f165503460f94c2bd6b/html5/thumbnails/80.jpg)
Evaluating Improvement
Automatic Rule Adaptation
- Given the initial and final Translation Lattices, the Rule Refinement module needs to take into account whether the following are present:
  - Corrected Translation Sentence
  - Original Translation Sentence (labelled as incorrect by the user)
*un artista gran
un gran artista
*un grande artista
*un artista grande
![Page 81: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students](https://reader035.vdocuments.site/reader035/viewer/2022070401/56649f165503460f94c2bd6b/html5/thumbnails/81.jpg)
Challenges and future work
• Credit and Blame assignment from TCTool Log Files and Xfer engine’s trace
• Order of corrections matters ~ explore rule interactions
• Explore the space between batch mode and fully interactive system
• Online TCTool always running to collect corrections from bilingual speakers; make it into a game with rewards for the best users
![Page 82: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students](https://reader035.vdocuments.site/reader035/viewer/2022070401/56649f165503460f94c2bd6b/html5/thumbnails/82.jpg)
AVENUE Prototypes
• General XFER framework under development for past three years
• Prototype systems so far:
  – German-to-English, Dutch-to-English
  – Chinese-to-English
  – Hindi-to-English
  – Hebrew-to-English
• In progress or planned:
  – Mapudungun-to-Spanish
  – Quechua-to-Spanish
  – Native Alaskan languages (Inupiaq) to English
  – Native-Bolivian languages (Aymara) to Spanish
  – Native-Brazilian languages to Brazilian Portuguese
![Page 83: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students](https://reader035.vdocuments.site/reader035/viewer/2022070401/56649f165503460f94c2bd6b/html5/thumbnails/83.jpg)
Mapudungun
• Indigenous Language of Chile and Argentina
• ~ 1 Million Mapuche Speakers
![Page 84: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students](https://reader035.vdocuments.site/reader035/viewer/2022070401/56649f165503460f94c2bd6b/html5/thumbnails/84.jpg)
Collaboration
• Mapuche Language Experts
  – Universidad de la Frontera (UFRO)
    • Instituto de Estudios Indígenas (IEI)
      – Institute for Indigenous Studies
• Chilean Funding
  – Chilean Ministry of Education (Mineduc)
    • Bilingual and Multicultural Education Program
Eliseo Cañulef
Rosendo Huisca
Hugo Carrasco
Hector Painequeo
Flor Caniupil
Luis Caniupil Huaiquiñir
Marcela Collio Calfunao
Cristian Carrillan Anton
Salvador Cañulef
Carolina Huenchullan Arrúe
Claudio Millacura Salas
![Page 85: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students](https://reader035.vdocuments.site/reader035/viewer/2022070401/56649f165503460f94c2bd6b/html5/thumbnails/85.jpg)
Accomplishments
• Corpora Collection
  – Spoken Corpus
    • Collected: Luis Caniupil Huaiquiñir
    • Medical Domain
    • 3 of 4 Mapudungun Dialects
      – 120 hours of Nguluche
      – 30 hours of Lafkenche
      – 20 hours of Pwenche
    • Transcribed in Mapudungun
    • Translated into Spanish
  – Written Corpus
    • ~ 200,000 words
    • Bilingual Mapudungun – Spanish
    • Historical and newspaper text
nmlch-nmjm1_x_0405_nmjm_00:
M: <SPA>no pütokovilu kay ko
C: no, si me lo tomaba con agua
M: chumgechi pütokoki femuechi pütokon pu <Noise>
C: como se debe tomar, me lo tomé pués
nmlch-nmjm1_x_0406_nmlch_00:
M: Chengewerkelafuymiürke
C: Ya no estabas como gente entonces!
![Page 86: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students](https://reader035.vdocuments.site/reader035/viewer/2022070401/56649f165503460f94c2bd6b/html5/thumbnails/86.jpg)
Accomplishments
• Developed at UFRO
  – Bilingual Dictionary with Examples
    • 1,926 entries
  – Spelling Corrected Mapudungun Word List
    • 117,003 fully-inflected word forms
  – Segmented Word List
    • 15,120 forms
    • Stems translated into Spanish
![Page 87: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students](https://reader035.vdocuments.site/reader035/viewer/2022070401/56649f165503460f94c2bd6b/html5/thumbnails/87.jpg)
Accomplishments
• Developed at LTI using Mapudungun language resources from UFRO
  – Spelling Checker
    • Integrated into OpenOffice
  – Hand-built Morphological Analyzer
  – Prototype Machine Translation Systems
    • Rule-Based
    • Example-Based
  – Website: LenguasAmerindias.org
![Page 88: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students](https://reader035.vdocuments.site/reader035/viewer/2022070401/56649f165503460f94c2bd6b/html5/thumbnails/88.jpg)
Quechua-to-Spanish MT
• V-Unit: funded Summer project in Cusco (Peru) June-August 2005 [preparations and data collection started earlier]
• Intensive Quechua course in Centro Bartolome de las Casas (CBC)
• Worked together with two native and one non-native Quechua speakers on developing infrastructure (correcting elicited translations, segmenting and translating a list of the most frequent words)
![Page 89: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students](https://reader035.vdocuments.site/reader035/viewer/2022070401/56649f165503460f94c2bd6b/html5/thumbnails/89.jpg)
Quechua-to-Spanish Prototype MT System
• Stem Lexicon (semi-automatically generated): 753 lexical entries
• Suffix lexicon: 21 suffixes
  – (150 Cusihuaman)
• Quechua morphology analyzer
• 25 translation rules
• Spanish morphology generation module
• User-Studies: 10 sentences, 3 users (2 native, 1 non-native)
![Page 90: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students](https://reader035.vdocuments.site/reader035/viewer/2022070401/56649f165503460f94c2bd6b/html5/thumbnails/90.jpg)
Challenges for Hebrew MT
• Paucity of existing language resources for Hebrew
  – No publicly available broad coverage morphological analyzer
  – No publicly available bilingual lexicons or dictionaries
  – No POS-tagged corpus or parse tree-bank corpus for Hebrew
  – No large Hebrew/English parallel corpus
• Scenario well suited for CMU transfer-based MT framework for languages with limited resources
![Page 91: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students](https://reader035.vdocuments.site/reader035/viewer/2022070401/56649f165503460f94c2bd6b/html5/thumbnails/91.jpg)
Hebrew Morphology Example
• Input word: B$WRH
0 1 2 3 4
|--------B$WRH--------|
|-----B-----|$WR|--H--|
|--B--|-H--|--$WRH---|
![Page 92: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students](https://reader035.vdocuments.site/reader035/viewer/2022070401/56649f165503460f94c2bd6b/html5/thumbnails/92.jpg)
Hebrew Morphology Example
Y0: ((SPANSTART 0)
     (SPANEND 4)
     (LEX B$WRH)
     (POS N)
     (GEN F)
     (NUM S)
     (STATUS ABSOLUTE))
Y1: ((SPANSTART 0)
     (SPANEND 2)
     (LEX B)
     (POS PREP))
Y2: ((SPANSTART 1)
     (SPANEND 3)
     (LEX $WR)
     (POS N)
     (GEN M)
     (NUM S)
     (STATUS ABSOLUTE))
Y3: ((SPANSTART 3)
     (SPANEND 4)
     (LEX $LH)
     (POS POSS))
Y4: ((SPANSTART 0)
     (SPANEND 1)
     (LEX B)
     (POS PREP))
Y5: ((SPANSTART 1)
     (SPANEND 2)
     (LEX H)
     (POS DET))
Y6: ((SPANSTART 2)
     (SPANEND 4)
     (LEX $WRH)
     (POS N)
     (GEN F)
     (NUM S)
     (STATUS ABSOLUTE))
Y7: ((SPANSTART 0)
     (SPANEND 4)
     (LEX B$WRH)
     (POS LEX))
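The analyses Y0 to Y7 form a lattice over positions 0 to 4 of the input word. As a minimal sketch of how such a lattice can be read, the Python below enumerates every sequence of analyses that covers the whole word, using the SPANSTART and SPANEND values exactly as listed above. The tuple layout and the function name `paths` are assumptions for illustration, not the analyzer's actual interface.

```python
# (name, span start, span end, lexeme, POS), copied from the slide.
ARCS = [
    ("Y0", 0, 4, "B$WRH", "N"),
    ("Y1", 0, 2, "B", "PREP"),
    ("Y2", 1, 3, "$WR", "N"),
    ("Y3", 3, 4, "$LH", "POSS"),
    ("Y4", 0, 1, "B", "PREP"),
    ("Y5", 1, 2, "H", "DET"),
    ("Y6", 2, 4, "$WRH", "N"),
    ("Y7", 0, 4, "B$WRH", "LEX"),
]


def paths(arcs, start, end):
    """Return all arc-name sequences that lead from start to end."""
    if start == end:
        return [[]]
    found = []
    for name, arc_start, arc_end, _lexeme, _pos in arcs:
        if arc_start == start:
            for rest in paths(arcs, arc_end, end):
                found.append([name] + rest)
    return found


segmentations = paths(ARCS, 0, 4)
for segmentation in segmentations:
    print(" + ".join(segmentation))
```

Each printed line is one candidate segmentation of B$WRH that the downstream transfer engine could choose between: the unsegmented word (Y0 or Y7), preposition plus noun (Y1 + Y6), preposition plus determiner plus noun (Y4 + Y5 + Y6), and preposition plus noun plus possessive (Y4 + Y2 + Y3).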
![Page 93: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students](https://reader035.vdocuments.site/reader035/viewer/2022070401/56649f165503460f94c2bd6b/html5/thumbnails/93.jpg)
Sample Output (dev-data)
maxwell anurpung comes from ghana for israel four years ago and since worked in cleaning in hotels in eilat
a few weeks ago announced if management club hotel that for him to leave israel according to the government instructions and immigration police
in a letter in broken english which spread among the foreign workers thanks to them hotel for their hard work and announced that will purchase for hm flight tickets for their countries from their money
![Page 94: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students](https://reader035.vdocuments.site/reader035/viewer/2022070401/56649f165503460f94c2bd6b/html5/thumbnails/94.jpg)
Future Research Directions
• Automatic Transfer Rule Learning:
  – In the “large-data” scenario: from large volumes of uncontrolled parallel text automatically word-aligned
  – In the absence of morphology or POS annotated lexica
  – Learning mappings for non-compositional structures
  – Effective models for rule scoring for
    • Decoding: using scores at runtime
    • Pruning the large collections of learned rules
  – Learning Unification Constraints
• Integrated Xfer Engine and Decoder
  – Improved models for scoring tree-to-tree mappings, integration with LM and other knowledge sources in the course of the search
![Page 95: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students](https://reader035.vdocuments.site/reader035/viewer/2022070401/56649f165503460f94c2bd6b/html5/thumbnails/95.jpg)
Future Research Directions
• Automatic Rule Refinement
• Morphology Learning
• Feature Detection and Corpus Navigation
• Prototypes for New Languages
![Page 96: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students](https://reader035.vdocuments.site/reader035/viewer/2022070401/56649f165503460f94c2bd6b/html5/thumbnails/96.jpg)
Publications
• 2005, Carbonell, J. G., A. Lavie, L. Levin and A. Black, "Language Technologies for Humanitarian Aid". In Technology for Humanitarian Action, K. M. Cahill (ed.), pp. 111-138, Fordham University Press, ISBN 0-8232-2393-0, 2005.
• 2005. Font Llitjós, A., R. Aranovich and L. Levin. "Building Machine translation systems for indigenous languages". Second Conference on the Indigenous Languages of Latin America (CILLA II), 27-29 October 2005, Texas, USA.
• 2005, Font-Llitjos, A., J.G. Carbonell and A. Lavie. "A Framework for Interactive and Automatic Refinement of Transfer-based Machine Translation" . In Proceedings of the 10th Annual Conference of the European Association for Machine Translation (EAMT-2005), Budapest, Hungary, May 2005.
• 2004, Lavie, A., S. Wintner, Y. Eytani, E. Peterson and K. Probst. "Rapid Prototyping of a Transfer-based Hebrew-to-English Machine Translation System". In Proceedings of the 10th International Conference on Theoretical and Methodological Issues in Machine Translation (TMI-2004), Baltimore, MD, October 2004. Pages 1-10.
• 2004, Probst, K. and A. Lavie. "A Structurally Diverse Minimal Corpus for Eliciting Structural Mappings between Languages". In Proceedings of the 6th Conference of the Association for Machine Translation in the Americas (AMTA-2004), Washington, DC, September 2004.
![Page 97: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students](https://reader035.vdocuments.site/reader035/viewer/2022070401/56649f165503460f94c2bd6b/html5/thumbnails/97.jpg)
Publications
• 2004, Font Llitjós, A., K. Probst and J.G. Carbonell. "Error Analysis of Two Types of Grammar for the Purpose of Automatic Rule Refinement". In Proceedings of the 6th Conference of the Association for Machine Translation in the Americas (AMTA-2004), Washington, DC, September 2004.
• 2004, Monson, C., A. Lavie, J. Carbonell and L. Levin. "Unsupervised Induction of Natural Language Morphology Inflection Classes". In Proceedings of the Workshop on Current Themes in Computational Phonology and Morphology at the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-2004), Barcelona, Spain, July 2004.
• 2004, Monson, C., L. Levin, R. Vega, R. Brown, A. Font Llitjós, A. Lavie, J. Carbonell, E. Cañulef, R. Huisca. "Data Collection and Analysis of Mapudungun Morphology for Spelling Correction". In Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC-2004), Lisbon, Portugal, May 2004.
• 2004, Font Llitjós, A. and J.G. Carbonell. "The Translation Correction Tool: English-Spanish user studies". In Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC-2004), Lisbon, Portugal, May 2004.
• 2004, Lavie, A., K. Probst, E. Peterson, S. Vogel, L. Levin, A. Font Llitjós and J. Carbonell. "A Trainable Transfer-based Machine Translation Approach for Languages with Limited Resources". In Proceedings of Workshop of the European Association for Machine Translation (EAMT-2004), Valletta, Malta, April 2004.
![Page 98: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students](https://reader035.vdocuments.site/reader035/viewer/2022070401/56649f165503460f94c2bd6b/html5/thumbnails/98.jpg)
Publications
• 2003, Lavie, A., S. Vogel, L. Levin, E. Peterson, K. Probst, A. Font Llitjós, R. Reynolds, J. Carbonell, and R. Cohen. "Experiments with a Hindi-to-English Transfer-based MT System under a Miserly Data Scenario". ACM Transactions on Asian Language Information Processing (TALIP), 2(2). June 2003. Pages 143-163.
• 2002, Probst, K., L. Levin, E. Peterson, A. Lavie, and J. Carbonell. "MT for Minority Languages Using Elicitation-Based Learning of Syntactic Transfer Rules". Machine Translation, 17(4). Pages 245-270.
• 2002, Carbonell, J., K. Probst, E. Peterson, C. Monson, A. Lavie, R. Brown and L. Levin. "Automatic Rule Learning for Resource Limited MT". In Proceedings of 5th Conference of the Association for Machine Translation in the Americas (AMTA-2002), Tiburon, CA, October 2002.
• 2002, Levin, L., R. Vega, J. Carbonell, R. Brown, A. Lavie, E. Cañulef and C. Huenchullan. "Data Collection and Language Technologies for Mapudungun". In Proceedings of International Workshop on Resources and Tools in Field Linguistics at the Third International Conference on Language Resources and Evaluation (LREC-2002), Las Palmas, Canary Islands, Spain, June 2002.
• 2001, Probst, K., R. Brown, J. Carbonell, A. Lavie, L. Levin, and E. Peterson. "Design and Implementation of Controlled Elicitation for Machine Translation of Low-density Languages". In Proceedings of the MT-2010 Workshop at MT-Summit VIII, Santiago de Compostela, Spain, September 2001.
![Page 99: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students](https://reader035.vdocuments.site/reader035/viewer/2022070401/56649f165503460f94c2bd6b/html5/thumbnails/99.jpg)
Mapudungun-to-Spanish Example
Mapudungun
pelafiñ Maria
Spanish
No vi a María
English
I didn’t see Maria
![Page 100: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students](https://reader035.vdocuments.site/reader035/viewer/2022070401/56649f165503460f94c2bd6b/html5/thumbnails/100.jpg)
Mapudungun-to-Spanish Example
Mapudungun
pelafiñ Maria
pe -la -fi -ñ Maria
see -neg -3.obj -1.subj.indicative Maria
Spanish
No vi a María
neg see.1.subj.past.indicative acc Maria
English
I didn’t see Maria
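The segmentation in this example (pelafiñ into pe -la -fi -ñ) can be sketched in a few lines of Python. This is only an illustration, not the AVENUE morphological analyzer: the `STEMS` and `SUFFIX_FEATURES` tables and the `segment` function are hypothetical, and the feature names are borrowed from the glosses on these slides.

```python
# Hypothetical toy lexicon; entries and feature names follow the slide glosses.
STEMS = {"pe": "see"}
SUFFIX_FEATURES = {
    "la": {"negation": "+"},
    "fi": {"object_person": 3},
    "ñ": {"person": 1, "number": "sg", "mood": "ind"},
}


def segment(word):
    """Split a verb form into (stem, [suffixes]) by left-to-right matching.

    Returns None when the word cannot be fully covered by the toy lexicon.
    """
    for stem in STEMS:
        if not word.startswith(stem):
            continue
        rest = word[len(stem):]
        suffixes = []
        while rest:
            for suffix in SUFFIX_FEATURES:
                if rest.startswith(suffix):
                    suffixes.append(suffix)
                    rest = rest[len(suffix):]
                    break
            else:
                break  # no known suffix matches the remainder
        if not rest:
            return stem, suffixes
    return None


if __name__ == "__main__":
    print(segment("pelafiñ"))
```

A real analyzer would also need to handle ambiguous segmentations and suffix ordering constraints, which this sketch ignores.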
![Page 101: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students](https://reader035.vdocuments.site/reader035/viewer/2022070401/56649f165503460f94c2bd6b/html5/thumbnails/101.jpg)
[Tree diagram for pe-la-fi-ñ Maria. Nodes: V (pe)]
![Page 102: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students](https://reader035.vdocuments.site/reader035/viewer/2022070401/56649f165503460f94c2bd6b/html5/thumbnails/102.jpg)
[Tree diagram for pe-la-fi-ñ Maria. Nodes: V (pe), VSuff (la)]
Negation = +
![Page 103: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students](https://reader035.vdocuments.site/reader035/viewer/2022070401/56649f165503460f94c2bd6b/html5/thumbnails/103.jpg)
[Tree diagram for pe-la-fi-ñ Maria. Nodes: V (pe), VSuff (la), VSuffG]
Pass all features up
![Page 104: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students](https://reader035.vdocuments.site/reader035/viewer/2022070401/56649f165503460f94c2bd6b/html5/thumbnails/104.jpg)
[Tree diagram for pe-la-fi-ñ Maria. Nodes: V (pe), VSuff (la), VSuffG, VSuff (fi)]
object person = 3
![Page 105: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students](https://reader035.vdocuments.site/reader035/viewer/2022070401/56649f165503460f94c2bd6b/html5/thumbnails/105.jpg)
[Tree diagram for pe-la-fi-ñ Maria. Nodes: V (pe), VSuff (la), VSuffG, VSuff (fi), VSuffG]
Pass all features up from both children
![Page 106: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students](https://reader035.vdocuments.site/reader035/viewer/2022070401/56649f165503460f94c2bd6b/html5/thumbnails/106.jpg)
[Tree diagram for pe-la-fi-ñ Maria. Nodes: V (pe), VSuff (la), VSuffG, VSuff (fi), VSuffG, VSuff (ñ)]
person = 1, number = sg, mood = ind
![Page 107: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students](https://reader035.vdocuments.site/reader035/viewer/2022070401/56649f165503460f94c2bd6b/html5/thumbnails/107.jpg)
[Tree diagram for pe-la-fi-ñ Maria. Nodes: V (pe), VSuff (la), VSuffG, VSuff (fi), VSuffG, VSuff (ñ), VSuffG]
Pass all features up from both children
![Page 108: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students](https://reader035.vdocuments.site/reader035/viewer/2022070401/56649f165503460f94c2bd6b/html5/thumbnails/108.jpg)
[Tree diagram for pe-la-fi-ñ Maria. Nodes: V, V (pe), VSuff (la), VSuffG, VSuff (fi), VSuffG, VSuff (ñ), VSuffG]
Pass all features up from both children
Check that:
1) negation = +
2) tense is undefined
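The bottom-up steps on the preceding slides (each VSuffG passes its children's features up, then the V node checks negation and tense) can be sketched as follows. This is an illustration under stated assumptions, not the AVENUE transfer engine: `merge_features` and `passes_verb_check` are hypothetical names, and features are modeled as plain dictionaries.

```python
def merge_features(*feature_sets):
    """Union the children's features; return None if two children conflict."""
    merged = {}
    for features in feature_sets:
        for name, value in features.items():
            if name in merged and merged[name] != value:
                return None  # conflicting values: unification fails
            merged[name] = value
    return merged


# Features contributed by each suffix, as listed on the slides.
la = {"negation": "+"}
fi = {"object_person": 3}
n_tilde = {"person": 1, "number": "sg", "mood": "ind"}

# Build the suffix group step by step, passing all features up each time.
group = merge_features(la)
group = merge_features(group, fi)
group = merge_features(group, n_tilde)


def passes_verb_check(features):
    """The check on the V node: negation = + and tense is undefined."""
    return features.get("negation") == "+" and "tense" not in features


if __name__ == "__main__":
    print(group, passes_verb_check(group))
```

Treating a value clash as failure is one simple reading of "pass all features up from both children"; the actual formalism may resolve or report conflicts differently.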
![Page 109: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students](https://reader035.vdocuments.site/reader035/viewer/2022070401/56649f165503460f94c2bd6b/html5/thumbnails/109.jpg)
[Tree diagram for pe-la-fi-ñ Maria. Nodes: V (pe), VSuff (la), VSuffG, VSuff (fi), VSuffG, VSuff (ñ), VSuffG, V, NP, N (Maria), N]
person = 3, number = sg, human = +
![Page 110: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students](https://reader035.vdocuments.site/reader035/viewer/2022070401/56649f165503460f94c2bd6b/html5/thumbnails/110.jpg)
[Tree diagram for pe-la-fi-ñ Maria. Nodes: V (pe), VSuff (la), VSuffG, VSuff (fi), VSuffG, VSuff (ñ), VSuffG, NP, N (Maria), N, S, V, VP]
Pass features up from V
Check that NP is human = +
![Page 111: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students](https://reader035.vdocuments.site/reader035/viewer/2022070401/56649f165503460f94c2bd6b/html5/thumbnails/111.jpg)
Transfer to Spanish: Top-Down
[Tree diagram. Mapudungun side: V (pe), VSuff (la), VSuffG, VSuff (fi), VSuffG, VSuff (ñ), VSuffG, NP, N (Maria), N, S, V, VP. Spanish side: S, VP]
![Page 112: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students](https://reader035.vdocuments.site/reader035/viewer/2022070401/56649f165503460f94c2bd6b/html5/thumbnails/112.jpg)
Transfer to Spanish: Top-Down
[Tree diagram. Mapudungun side: V (pe), VSuff (la), VSuffG, VSuff (fi), VSuffG, VSuff (ñ), VSuffG, NP, N (Maria), N, S, V, VP. Spanish side: S, VP, V, "a", NP]
Pass all features to Spanish side
![Page 113: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students](https://reader035.vdocuments.site/reader035/viewer/2022070401/56649f165503460f94c2bd6b/html5/thumbnails/113.jpg)
Transfer to Spanish: Top-Down
[Tree diagram. Mapudungun side: V (pe), VSuff (la), VSuffG, VSuff (fi), VSuffG, VSuff (ñ), VSuffG, NP, N (Maria), N, S, V, VP. Spanish side: S, VP, V, "a", NP]
Pass all features down
![Page 114: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students](https://reader035.vdocuments.site/reader035/viewer/2022070401/56649f165503460f94c2bd6b/html5/thumbnails/114.jpg)
Transfer to Spanish: Top-Down
[Tree diagram. Mapudungun side: V (pe), VSuff (la), VSuffG, VSuff (fi), VSuffG, VSuff (ñ), VSuffG, NP, N (Maria), N, S, V, VP. Spanish side: S, VP, V, "a", NP]
Pass object features down
![Page 115: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students](https://reader035.vdocuments.site/reader035/viewer/2022070401/56649f165503460f94c2bd6b/html5/thumbnails/115.jpg)
Transfer to Spanish: Top-Down
[Tree diagram. Mapudungun side: V (pe), VSuff (la), VSuffG, VSuff (fi), VSuffG, VSuff (ñ), VSuffG, NP, N (Maria), N, S, V, VP. Spanish side: S, VP, V, "a", NP]
Accusative marker on objects is introduced because human = +
![Page 116: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students](https://reader035.vdocuments.site/reader035/viewer/2022070401/56649f165503460f94c2bd6b/html5/thumbnails/116.jpg)
Transfer to Spanish: Top-Down
[Tree diagram. Mapudungun side: V (pe), VSuff (la), VSuffG, VSuff (fi), VSuffG, VSuff (ñ), VSuffG, NP, N (Maria), N, S, V, VP. Spanish side: S, VP, V, "a", NP]
VP::VP [VBar NP] -> [VBar "a" NP]
(
 (X1::Y1)
 (X2::Y3)
 ((X2 type) = (*NOT* personal))
 ((X2 human) =c +)
 (X0 = X1)
 ((X0 object) = X2)
 (Y0 = X0)
 ((Y0 object) = (X0 object))
 (Y1 = Y0)
 (Y3 = (Y0 object))
 ((Y1 objmarker person) = (Y3 person))
 ((Y1 objmarker number) = (Y3 number))
 ((Y1 objmarker gender) = (Y3 gender))
)
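To make the effect of this rule concrete, here is a rough Python sketch of what it does to a [VBar NP] pair. It is not the AVENUE rule interpreter: `transfer_vp` is a hypothetical function, constituents are modeled as feature dictionaries, and `=c` is read here as "the feature must already be present with this value".

```python
def transfer_vp(vbar, np):
    """Apply a simplified VP::VP [VBar NP] -> [VBar "a" NP] transfer.

    Returns the Spanish-side constituent list, or None if the rule's
    source-side constraints are not met.
    """
    # ((X2 type) = (*NOT* personal))
    if np.get("type") == "personal":
        return None
    # ((X2 human) =c +): the object must be explicitly marked human
    if np.get("human") != "+":
        return None
    # Copy agreement features of the object onto the verb's object marker.
    vbar_out = dict(vbar)
    vbar_out["objmarker"] = {
        name: np[name] for name in ("person", "number", "gender") if name in np
    }
    # The accusative marker "a" is inserted between VBar and NP.
    return [vbar_out, "a", dict(np)]


if __name__ == "__main__":
    maria = {"lemma": "Maria", "person": 3, "number": "sg", "human": "+"}
    print(transfer_vp({"lemma": "pe"}, maria))
```

With a non-human object the function returns None, which models the rule simply not applying, so no "a" would be introduced.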
![Page 117: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students](https://reader035.vdocuments.site/reader035/viewer/2022070401/56649f165503460f94c2bd6b/html5/thumbnails/117.jpg)
Transfer to Spanish: Top-Down
[Tree diagram. Mapudungun side: V (pe), VSuff (la), VSuffG, VSuff (fi), VSuffG, VSuff (ñ), VSuffG, NP, N (Maria), N, S, V, VP. Spanish side: S, VP, V, "no", V, "a", NP]
Pass person, number, and mood features to Spanish Verb
Assign tense = past
![Page 118: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students](https://reader035.vdocuments.site/reader035/viewer/2022070401/56649f165503460f94c2bd6b/html5/thumbnails/118.jpg)
Transfer to Spanish: Top-Down
[Tree diagram. Mapudungun side: V (pe), VSuff (la), VSuffG, VSuff (fi), VSuffG, VSuff (ñ), VSuffG, NP, N (Maria), N, S, V, VP. Spanish side: S, VP, V, "no", V, "a", NP]
"no": introduced because negation = +
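The two verb-level transfer steps (copy person, number, and mood to the Spanish verb and assign tense = past; introduce "no" when negation = +) might be sketched like this. The function name `spanish_verb_group` and the dictionary representation are assumptions made for illustration only.

```python
def spanish_verb_group(source_features, spanish_lemma):
    """Build the Spanish verb group from the Mapudungun verb's features."""
    # Pass person, number, and mood features to the Spanish verb.
    verb_features = {
        name: source_features[name]
        for name in ("person", "number", "mood")
        if name in source_features
    }
    # Tense was undefined on the source side; the slide assigns past.
    verb_features["tense"] = "past"

    group = []
    if source_features.get("negation") == "+":
        group.append("no")  # negation particle introduced by the rule
    group.append((spanish_lemma, verb_features))
    return group


if __name__ == "__main__":
    pelafin = {"negation": "+", "object_person": 3,
               "person": 1, "number": "sg", "mood": "ind"}
    print(spanish_verb_group(pelafin, "ver"))
```

Note that the object-person feature is deliberately not copied here, since the slides route object features through the VP rule instead.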
![Page 119: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students](https://reader035.vdocuments.site/reader035/viewer/2022070401/56649f165503460f94c2bd6b/html5/thumbnails/119.jpg)
Transfer to Spanish: Top-Down
[Tree diagram. Mapudungun side: V (pe), VSuff (la), VSuffG, VSuff (fi), VSuffG, VSuff (ñ), VSuffG, NP, N (Maria), N, S, V, VP. Spanish side: S, VP, V, "no", V (ver), "a", NP]
![Page 120: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students](https://reader035.vdocuments.site/reader035/viewer/2022070401/56649f165503460f94c2bd6b/html5/thumbnails/120.jpg)
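Transfer to Spanish: Top-Down
[Tree diagram. Mapudungun side: V (pe), VSuff (la), VSuffG, VSuff (fi), VSuffG, VSuff (ñ), VSuffG, NP, N (Maria), N, S, V, VP. Spanish side: S, VP, V, "no", V (ver, realized as vi), "a", NP]
person = 1, number = sg, mood = indicative, tense = past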
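The final generation step, turning the lemma ver plus its features into the surface form vi, amounts to a lookup in an inflection table. A minimal sketch, assuming a hand-written table and assuming that tense = past maps to the Spanish preterite; `SPANISH_PRETERITE` and `generate` are hypothetical names, not part of the AVENUE system.

```python
# Tiny hand-written table: (lemma, person, number) -> preterite indicative form.
SPANISH_PRETERITE = {
    ("ver", 1, "sg"): "vi",
    ("ver", 2, "sg"): "viste",
    ("ver", 3, "sg"): "vio",
}


def generate(lemma, features):
    """Return the inflected form, or None if the table does not cover it."""
    # This toy table only covers past indicative forms.
    if features.get("mood") != "indicative" or features.get("tense") != "past":
        return None
    key = (lemma, features.get("person"), features.get("number"))
    return SPANISH_PRETERITE.get(key)


if __name__ == "__main__":
    features = {"person": 1, "number": "sg",
                "mood": "indicative", "tense": "past"}
    print(generate("ver", features))
```

A full system would use a morphological generator rather than a table, but the interface (lemma plus features in, surface form out) is the same idea.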
![Page 121: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students](https://reader035.vdocuments.site/reader035/viewer/2022070401/56649f165503460f94c2bd6b/html5/thumbnails/121.jpg)
Transfer to Spanish: Top-Down
[Tree diagram. Mapudungun side: V (pe), VSuff (la), VSuffG, VSuff (fi), VSuffG, VSuff (ñ), VSuffG, NP, N (Maria), N, S, V, VP. Spanish side: S, VP, V, "no", V (vi), "a", NP, N (María), N]
Pass features over to Spanish side
![Page 122: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students](https://reader035.vdocuments.site/reader035/viewer/2022070401/56649f165503460f94c2bd6b/html5/thumbnails/122.jpg)
I didn’t see Maria
[Tree diagram. Mapudungun side: V (pe), VSuff (la), VSuffG, VSuff (fi), VSuffG, VSuff (ñ), VSuffG, NP, N (Maria), N, S, V, VP. Spanish side: S, VP, V, "no", V (vi), "a", NP, N (María), N]
![Page 123: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students](https://reader035.vdocuments.site/reader035/viewer/2022070401/56649f165503460f94c2bd6b/html5/thumbnails/123.jpg)