![Page 1: Omnivorous MT](https://reader036.vdocuments.site/reader036/viewer/2022081505/5681586a550346895dc5c8f9/html5/thumbnails/1.jpg)
Building NLP Systems for Two Resource Scarce Indigenous Languages: Mapudungun and Quechua, and some other languages
Christian Monson, Ariadna Font Llitjós, Roberto Aranovich, Lori Levin, Ralf Brown, Erik Peterson, Jaime Carbonell, and Alon Lavie
![Page 2: Omnivorous MT](https://reader036.vdocuments.site/reader036/viewer/2022081505/5681586a550346895dc5c8f9/html5/thumbnails/2.jpg)
Omnivorous MT
• Eat whatever resources are available
• Eat large or small amounts of data
Mapusaurus Roseae
Mapu = land
Mapuche = land people
Mapudungun = land speech
![Page 3: Omnivorous MT](https://reader036.vdocuments.site/reader036/viewer/2022081505/5681586a550346895dc5c8f9/html5/thumbnails/3.jpg)
AVENUE’s Inventory
• Resources
– Parallel corpus
– Monolingual corpus
– Lexicon
– Morphological Analyzer (lemmatizer)
– Human Linguist
– Human non-linguist
• Techniques
– Rule based transfer system
– Example Based MT
– Morphology Learning
– Rule Learning
– Interactive Rule Refinement
– Multi-Engine MT
This research was funded in part by NSF grant number IIS-0121-631.
![Page 4: Omnivorous MT](https://reader036.vdocuments.site/reader036/viewer/2022081505/5681586a550346895dc5c8f9/html5/thumbnails/4.jpg)
Startup without corpus or linguist
Requires someone who is bilingual and literate
![Page 5: Omnivorous MT](https://reader036.vdocuments.site/reader036/viewer/2022081505/5681586a550346895dc5c8f9/html5/thumbnails/5.jpg)
The Elicitation Tool has been used with these languages
• Mapudungun
• Hindi
• Hebrew
• Quechua
• Aymara
• Thai
• Japanese
• Chinese
• Dutch
• Arabic
![Page 6: Omnivorous MT](https://reader036.vdocuments.site/reader036/viewer/2022081505/5681586a550346895dc5c8f9/html5/thumbnails/6.jpg)
Purpose of Elicitation
• Provide a small but highly targeted corpus of hand aligned data
– To support machine learning from a small data set
– To discover basic word order
– To discover how syntactic dependencies are expressed
– To discover which grammatical meanings are reflected in the morphology or syntax of the language
srcsent: Tú caíste
tgtsent: eymi ütrünagimi
aligned: ((1,1),(2,2))
context: tú = Juan [masculino, 2a persona del singular]
comment: You (John) fell

srcsent: Tú estás cayendo
tgtsent: eymi petu ütünagimi
aligned: ((1,1),(2 3,2 3))
context: tú = Juan [masculino, 2a persona del singular]
comment: You (John) are falling

srcsent: Tú caíste
tgtsent: eymi ütrunagimi
aligned: ((1,1),(2,2))
context: tú = María [femenino, 2a persona del singular]
comment: You (Mary) fell
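Elicitation records of this shape can be read programmatically. The following is a minimal Python sketch, assuming each field sits on its own line as `name: value` and that the `aligned` field lists (source positions, target positions) pairs with space-separated 1-based indices. The function names `parse_alignment` and `parse_record` are illustrative and are not part of the AVENUE tools.

```python
import re


def parse_alignment(text):
    """Turn '((1,1),(2 3,2 3))' into [([1], [1]), ([2, 3], [2, 3])]."""
    pairs = []
    # Each inner group looks like "(src positions,tgt positions)".
    for src, tgt in re.findall(r"\(([\d ]+),([\d ]+)\)", text):
        pairs.append(([int(n) for n in src.split()],
                      [int(n) for n in tgt.split()]))
    return pairs


def parse_record(block):
    """Read 'name: value' lines into a dict and decode the alignment."""
    record = {}
    for line in block.strip().splitlines():
        key, _, value = line.partition(":")
        record[key.strip()] = value.strip()
    record["aligned"] = parse_alignment(record.get("aligned", ""))
    return record


record = parse_record("""
srcsent: Tú estás cayendo
tgtsent: eymi petu ütünagimi
aligned: ((1,1),(2 3,2 3))
comment: You (John) are falling
""")
print(record["aligned"])
```

In this record the second pair links source words 2 and 3 ("estás cayendo") jointly to target words 2 and 3, which is how a many-to-many alignment appears in the data.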
![Page 7: Omnivorous MT](https://reader036.vdocuments.site/reader036/viewer/2022081505/5681586a550346895dc5c8f9/html5/thumbnails/7.jpg)
Feature Structures
srcsent: Mary was not a leader.
context: Translate this as though it were spoken to a peer co-worker;
((actor ((np-function fn-actor)(np-animacy anim-human)(np-biological-gender bio-gender-female)(np-general-type proper-noun-type)(np-identifiability identifiable)(np-specificity specific)…))
(pred ((np-function fn-predicate-nominal)(np-animacy anim-human)(np-biological-gender bio-gender-female)(np-general-type common-noun-type)(np-specificity specificity-neutral)…))
(c-v-lexical-aspect state)(c-copula-type copula-role)(c-secondary-type secondary-copula)(c-solidarity solidarity-neutral)(c-v-grammatical-aspect gram-aspect-neutral)(c-v-absolute-tense past)(c-v-phase-aspect phase-aspect-neutral)(c-general-type declarative-clause)(c-polarity polarity-negative)(c-my-causer-intentionality intentionality-n/a)(c-comparison-type comparison-n/a)(c-relative-tense relative-n/a)(c-our-boundary boundary-n/a)…)
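A feature structure in this parenthesized notation can be loaded into nested dictionaries. The Python sketch below is only an illustration: it uses a shortened structure with a few of the feature names from the slide (the elided "…" parts are left out, not filled in), and `parse_sexpr` and `to_dict` are hypothetical helper names rather than AVENUE code.

```python
def parse_sexpr(text):
    """Parse parenthesized text into nested Python lists of tokens."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()

    def read(pos):
        items = []
        while pos < len(tokens):
            tok = tokens[pos]
            if tok == "(":
                sub, pos = read(pos + 1)
                items.append(sub)
            elif tok == ")":
                return items, pos + 1
            else:
                items.append(tok)
                pos += 1
        return items, pos

    parsed, _ = read(0)
    return parsed


def to_dict(node):
    """Convert a list of [name, value] pairs into a dict, recursing into nested values."""
    result = {}
    for name, value in node:
        result[name] = to_dict(value) if isinstance(value, list) else value
    return result


text = """((actor ((np-function fn-actor) (np-animacy anim-human)))
           (c-polarity polarity-negative)
           (c-v-absolute-tense past))"""
features = to_dict(parse_sexpr(text)[0])
print(features["actor"]["np-animacy"], features["c-polarity"])
```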
![Page 8: Omnivorous MT](https://reader036.vdocuments.site/reader036/viewer/2022081505/5681586a550346895dc5c8f9/html5/thumbnails/8.jpg)
Current Work
• Search space:
– Elements of meanings that might be expressed by syntax or morphology: tense, aspect, person, number, gender, causation, evidentiality, etc.
– Syntactic dependencies: subject, object
– Interactions of features:
  • Tense and person
  • Tense and interrogative mood
  • Etc.
![Page 9: Omnivorous MT](https://reader036.vdocuments.site/reader036/viewer/2022081505/5681586a550346895dc5c8f9/html5/thumbnails/9.jpg)
Current Work
• For a new language
– For each item of the search space
  • Eliminate it as irrelevant or
  • Explore it
– Using as few sentences as possible
![Page 10: Omnivorous MT](https://reader036.vdocuments.site/reader036/viewer/2022081505/5681586a550346895dc5c8f9/html5/thumbnails/10.jpg)
Mar 1, 2006
Tools for Creating Elicitation Corpora
[Diagram labels]
• List of semantic features and values
• Feature Specification
• Feature Maps: which combinations of features and values are of interest (Clause-Level, Noun-Phrase, Tense & Aspect, Modality, …)
• Feature Structure Sets
• Reverse Annotated Feature Structure Sets: add English sentences
• The Corpus
• Sampling
• Smaller Corpus
• XML Schema
• XSLT Script
![Page 11: Omnivorous MT](https://reader036.vdocuments.site/reader036/viewer/2022081505/5681586a550346895dc5c8f9/html5/thumbnails/11.jpg)
Tools for Creating Elicitation Corpora
[Diagram repeated; additional label on this slide: Combination Formalism]
![Page 12: Omnivorous MT](https://reader036.vdocuments.site/reader036/viewer/2022081505/5681586a550346895dc5c8f9/html5/thumbnails/12.jpg)
Tools for Creating Elicitation Corpora
[Diagram repeated; additional label on this slide: Feature Structure Viewer]
![Page 13: Omnivorous MT](https://reader036.vdocuments.site/reader036/viewer/2022081505/5681586a550346895dc5c8f9/html5/thumbnails/13.jpg)
![Page 14: Omnivorous MT](https://reader036.vdocuments.site/reader036/viewer/2022081505/5681586a550346895dc5c8f9/html5/thumbnails/14.jpg)
Outline
• Two ideas
– Omnivorous MT
– Startup for low resource situation
• Four Languages
– Mapudungun
– Quechua
– Hindi
– Hebrew
![Page 15: Omnivorous MT](https://reader036.vdocuments.site/reader036/viewer/2022081505/5681586a550346895dc5c8f9/html5/thumbnails/15.jpg)
The Avenue Low Resource Scenario
[Architecture diagram]
Sections: Elicitation, Morphology, Rule Learning, Run-Time System, Rule Refinement
Components: Elicitation Tool, Elicitation Corpus, Word-Aligned Parallel Corpus, Morphology Analyzer, Learning Module (appears twice), Handcrafted rules, Learned Transfer Rules, Lexical Resources, Run Time Transfer System, Decoder, Translation Correction Tool, Rule Refinement Module
Input and output: INPUT TEXT, OUTPUT TEXT
![Page 16: Omnivorous MT](https://reader036.vdocuments.site/reader036/viewer/2022081505/5681586a550346895dc5c8f9/html5/thumbnails/16.jpg)
![Page 17: Omnivorous MT](https://reader036.vdocuments.site/reader036/viewer/2022081505/5681586a550346895dc5c8f9/html5/thumbnails/17.jpg)
![Page 18: Omnivorous MT](https://reader036.vdocuments.site/reader036/viewer/2022081505/5681586a550346895dc5c8f9/html5/thumbnails/18.jpg)
![Page 19: Omnivorous MT](https://reader036.vdocuments.site/reader036/viewer/2022081505/5681586a550346895dc5c8f9/html5/thumbnails/19.jpg)
Mapudungun Language
• 900,000 Mapuche people
• At least 300,000 speakers of Mapudungun
• Polysynthetic
sl: pe-rke-fi-ñ Maria
    ver-REPORT-3pO-1pSgS/IND
tl: DICEN QUE LA VI A MARÍA
    (They say that) I saw Maria.
![Page 20: Omnivorous MT](https://reader036.vdocuments.site/reader036/viewer/2022081505/5681586a550346895dc5c8f9/html5/thumbnails/20.jpg)
AVENUE Mapudungun
• Joint project between Carnegie Mellon University, the Chilean Ministry of Education, and Universidad de la Frontera.
![Page 21: Omnivorous MT](https://reader036.vdocuments.site/reader036/viewer/2022081505/5681586a550346895dc5c8f9/html5/thumbnails/21.jpg)
Mapudungun to Spanish Resources
• Initially:
– Large team of native speakers at Universidad de la Frontera, Temuco, Chile
  • Some knowledge of linguistics
  • No knowledge of computational linguistics
– No corpus
– A few short word lists
– No morphological analyzer
• Later: Computational Linguists with non-native knowledge of Mapudungun
• Other considerations:
– Produce something that is useful to the community, especially for bilingual education
– Experimental MT systems are not useful
![Page 22: Omnivorous MT](https://reader036.vdocuments.site/reader036/viewer/2022081505/5681586a550346895dc5c8f9/html5/thumbnails/22.jpg)
Mapudungun
Corpus: 170 hours of spoken Mapudungun
Example Based MT
Spelling checker
Spanish Morphology from UPC, Barcelona
![Page 23: Omnivorous MT](https://reader036.vdocuments.site/reader036/viewer/2022081505/5681586a550346895dc5c8f9/html5/thumbnails/23.jpg)
Mapudungun Products
• http://www.lenguasamerindias.org/
– Click: traductor mapudungún
– Dictionary lookup (Mapudungun to Spanish)
– Morphological analysis
– Example Based MT (Mapudungun to Spanish)
![Page 24: Omnivorous MT](https://reader036.vdocuments.site/reader036/viewer/2022081505/5681586a550346895dc5c8f9/html5/thumbnails/24.jpg)
I Didn’t see Maria
[Parse-tree figure. Mapudungun source (S, VP): V pe, followed by VSuff la, VSuff fi, VSuff ñ (grouped under VSuffG), and NP (N Maria). Spanish target (S, VP): "no", V vi, "a", NP (N María).]
![Page 25: Omnivorous MT](https://reader036.vdocuments.site/reader036/viewer/2022081505/5681586a550346895dc5c8f9/html5/thumbnails/25.jpg)
Transfer to Spanish: Top-Down
[Parse-tree figure. Mapudungun source (S, VP): V pe, followed by VSuff la, VSuff fi, VSuff ñ (grouped under VSuffG), and NP (N Maria). Spanish target built so far (S, VP): V, "a", NP.]
VP::VP [VBar NP] -> [VBar "a" NP]
( (X1::Y1)
  (X2::Y3)
  ((X2 type) = (*NOT* personal))
  ((X2 human) =c +)
  (X0 = X1)
  ((X0 object) = X2)
  (Y0 = X0)
  ((Y0 object) = (X0 object))
  (Y1 = Y0)
  (Y3 = (Y0 object))
  ((Y1 objmarker person) = (Y3 person))
  ((Y1 objmarker number) = (Y3 number))
  ((Y1 objmarker gender) = (Y3 gender)))
![Page 26: Omnivorous MT](https://reader036.vdocuments.site/reader036/viewer/2022081505/5681586a550346895dc5c8f9/html5/thumbnails/26.jpg)
AVENUE Hebrew
• Joint project of Carnegie Mellon University and University of Haifa
![Page 27: Omnivorous MT](https://reader036.vdocuments.site/reader036/viewer/2022081505/5681586a550346895dc5c8f9/html5/thumbnails/27.jpg)
Hebrew Language
• Native language of about 3-4 Million in Israel
• Semitic language, closely related to Arabic and with similar linguistic properties
– Root+Pattern word formation system
– Rich verb and noun morphology
– Particles attach as prefixes to the following word: definite article (H), prepositions (B,K,L,M), coordinating conjunction (W), relativizers ($,K$)…
• Unique alphabet and Writing System
– 22 letters represent (mostly) consonants
– Vowels represented (mostly) by diacritics
– Modern texts omit the diacritic vowels, thus an additional level of ambiguity: “bare” word → word
– Example: MHGR → mehager, m+hagar, m+h+ger
![Page 28: Omnivorous MT](https://reader036.vdocuments.site/reader036/viewer/2022081505/5681586a550346895dc5c8f9/html5/thumbnails/28.jpg)
Hebrew Resources
• Morphological analyzer developed at Technion
• Constructed our own Hebrew-to-English lexicon, based primarily on existing “Dahan” H-to-E and E-to-H dictionary
• Human Computational Linguists
• Native Speakers
![Page 29: Omnivorous MT](https://reader036.vdocuments.site/reader036/viewer/2022081505/5681586a550346895dc5c8f9/html5/thumbnails/29.jpg)
Hebrew
![Page 30: Omnivorous MT](https://reader036.vdocuments.site/reader036/viewer/2022081505/5681586a550346895dc5c8f9/html5/thumbnails/30.jpg)
Flat Seed Rule Generation
Learning Example: NP
Eng: the big apple
Heb: ha-tapuax ha-gadol
Generated Seed Rule:
NP::NP [ART ADJ N] -> [ART N ART ADJ]
((X1::Y1)
(X1::Y3)
(X2::Y4)
(X3::Y2))
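The seed rule above can be derived mechanically from the tagged English phrase and the word alignment. The Python sketch below is one possible reading, not the AVENUE implementation: it assumes that each Hebrew word receives the part-of-speech tag of the English word aligned to it (the slide does not say how target tags are obtained), and the function name `flat_seed_rule` is hypothetical.

```python
def flat_seed_rule(category, src_tags, tgt_len, alignment):
    """Build a flat transfer rule from source tags and 1-based (source, target) alignments."""
    tgt_tags = [None] * tgt_len
    for i, j in alignment:
        # Assumption: project the source tag onto the aligned target word.
        tgt_tags[j - 1] = src_tags[i - 1]
    if None in tgt_tags:
        raise ValueError("this sketch needs every target word to be aligned")
    header = f"{category}::{category} [{' '.join(src_tags)}] -> [{' '.join(tgt_tags)}]"
    body = " ".join(f"(X{i}::Y{j})" for i, j in alignment)
    return f"{header}\n({body})"


# "the big apple" / "ha-tapuax ha-gadol": the=1, big=2, apple=3;
# ha=1, tapuax=2, ha=3, gadol=4.
rule = flat_seed_rule("NP", ["ART", "ADJ", "N"], 4,
                      [(1, 1), (1, 3), (2, 4), (3, 2)])
print(rule)
```

Because the English article aligns to both Hebrew articles, X1 appears in two alignment pairs, as in the generated rule on the slide.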
![Page 31: Omnivorous MT](https://reader036.vdocuments.site/reader036/viewer/2022081505/5681586a550346895dc5c8f9/html5/thumbnails/31.jpg)
Compositionality Learning
Initial Flat Rules:
S::S [ART ADJ N V ART N] -> [ART N ART ADJ V P ART N]
((X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2) (X4::Y5) (X5::Y7) (X6::Y8))
NP::NP [ART ADJ N] -> [ART N ART ADJ]
((X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2))
NP::NP [ART N] -> [ART N]
((X1::Y1) (X2::Y2))
Generated Compositional Rule:
S::S [NP V NP] -> [NP V P NP]
((X1::Y1) (X2::Y2) (X3::Y4))
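One way to picture the step from the flat S rule to the compositional one is a greedy scan that replaces any source span covered by a lower-level rule with its constituent label, provided the aligned target span matches that rule's target side. The Python sketch below follows that idea under stated assumptions: the greedy longest-match strategy and the helper names `collapse` and `compose` are illustrative choices, not a description of the actual AVENUE learner.

```python
def collapse(seq, spans, label):
    """Replace each half-open span (start, end) of seq with label.

    Also returns a map from old 1-based positions to new 1-based positions.
    """
    starts = dict(spans)
    out, index_map, pos = [], {}, 0
    while pos < len(seq):
        end = starts.get(pos, pos + 1)
        out.append(label if pos in starts else seq[pos])
        for k in range(pos, end):
            index_map[k + 1] = len(out)
        pos = end
    return out, index_map


def compose(src, tgt, align, sub_rules, label="NP"):
    """Generalize a flat rule by substituting spans covered by sub_rules."""
    chunks, i = [], 0
    while i < len(src):
        # Try longer sub-rules first (greedy choice, an assumption).
        for s_pat, t_pat in sorted(sub_rules, key=lambda r: -len(r[0])):
            end = i + len(s_pat)
            if src[i:end] != s_pat:
                continue
            inside = {j for a, j in align if i < a <= end}
            if not inside:
                continue
            lo, hi = min(inside), max(inside)
            contiguous = inside == set(range(lo, hi + 1))
            # No target word in the span may align outside the source span.
            exclusive = all(i < a <= end for a, j in align if lo <= j <= hi)
            if contiguous and exclusive and tgt[lo - 1:hi] == t_pat:
                chunks.append((i, end, lo - 1, hi))
                i = end - 1
                break
        i += 1
    new_src, smap = collapse(src, [(c[0], c[1]) for c in chunks], label)
    new_tgt, tmap = collapse(tgt, [(c[2], c[3]) for c in chunks], label)
    new_align = sorted({(smap[a], tmap[b]) for a, b in align})
    return new_src, new_tgt, new_align


np_rules = [(["ART", "ADJ", "N"], ["ART", "N", "ART", "ADJ"]),
            (["ART", "N"], ["ART", "N"])]
flat_align = [(1, 1), (1, 3), (2, 4), (3, 2), (4, 5), (5, 7), (6, 8)]
result = compose("ART ADJ N V ART N".split(),
                 "ART N ART ADJ V P ART N".split(),
                 flat_align, np_rules)
print(result)
```

On the slide's example this yields the source side [NP V NP], the target side [NP V P NP], and the alignments (1,1), (2,2), (3,4); the unaligned target P is kept as a literal slot.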
![Page 32: Omnivorous MT](https://reader036.vdocuments.site/reader036/viewer/2022081505/5681586a550346895dc5c8f9/html5/thumbnails/32.jpg)
Constraint Learning
Input: Rules and their Example Sets
S::S [NP V NP] -> [NP V P NP] {ex1,ex12,ex17,ex26}
((X1::Y1) (X2::Y2) (X3::Y4))
NP::NP [ART ADJ N] -> [ART N ART ADJ] {ex2,ex3,ex13}
((X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2))
NP::NP [ART N] -> [ART N] {ex4,ex5,ex6,ex8,ex10,ex11}
((X1::Y1) (X2::Y2))
Output: Rules with Feature Constraints:
S::S [NP V NP] -> [NP V P NP]
((X1::Y1) (X2::Y2) (X3::Y4)
(X1 NUM = X2 NUM)
(Y1 NUM = Y2 NUM)
(X1 NUM = Y1 NUM))
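A simple way to think about this step: a candidate agreement constraint survives only if it holds in every example that supports the rule. The Python sketch below illustrates that idea with made-up feature values; the two examples, the slot names, and the function `learn_agreement` are hypothetical, and the real learner may select constraints differently.

```python
from itertools import combinations


def learn_agreement(examples, feature="NUM"):
    """Keep equality constraints on one feature that hold in all examples."""
    slots = sorted(examples[0])
    constraints = []
    for a, b in combinations(slots, 2):
        if all(ex[a][feature] == ex[b][feature] for ex in examples):
            constraints.append(f"({a} {feature} = {b} {feature})")
    return constraints


# Hypothetical feature values for two examples of S::S [NP V NP] -> [NP V P NP].
examples = [
    {"X1": {"NUM": "sg"}, "X2": {"NUM": "sg"}, "X3": {"NUM": "pl"},
     "Y1": {"NUM": "sg"}, "Y2": {"NUM": "sg"}, "Y4": {"NUM": "pl"}},
    {"X1": {"NUM": "pl"}, "X2": {"NUM": "pl"}, "X3": {"NUM": "sg"},
     "Y1": {"NUM": "pl"}, "Y2": {"NUM": "pl"}, "Y4": {"NUM": "sg"}},
]
constraints = learn_agreement(examples)
print("\n".join(constraints))
```

On this toy data the three constraints from the slide are among the survivors, while subject-object agreement (X1 and X3) is rejected. The sketch also emits constraints that follow from the others, so a real system would presumably prune redundant ones.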
![Page 33: Omnivorous MT](https://reader036.vdocuments.site/reader036/viewer/2022081505/5681586a550346895dc5c8f9/html5/thumbnails/33.jpg)
Quechua facts
• Agglutinative language
• A stem can often have 10 to 12 suffixes, but it can have up to 28 suffixes
• Supposedly clear cut boundaries, but in reality several suffixes change when followed by certain other suffixes
• No irregular verbs, nouns or adjectives
• Does not mark for gender
• No adjective agreement
• No definite or indefinite articles (‘topic’ and ‘focus’ markers perform a task similar to that of articles and intonation in English or Spanish)
![Page 34: Omnivorous MT](https://reader036.vdocuments.site/reader036/viewer/2022081505/5681586a550346895dc5c8f9/html5/thumbnails/34.jpg)
Quechua examples
– taki+ni (also written takiniy)
  sing 1sg (I sing) canto
– taki+sha+ni (takishaniy)
  sing progr 1sg (I am singing) estoy cantando
– taki+pa+ku+q+chu?
  taki: sing
  -pa+ku: to join a group to do something
  -q: agentive
  -chu: interrogative
  (para) cantar con la gente (del pueblo)? (to sing with the people (of the village)?)
![Page 35: Omnivorous MT](https://reader036.vdocuments.site/reader036/viewer/2022081505/5681586a550346895dc5c8f9/html5/thumbnails/35.jpg)
Quechua Resources
• A few native speakers, not linguists
• A computational linguist learning Quechua
• Two fluent, but non-native linguists
![Page 36: Omnivorous MT](https://reader036.vdocuments.site/reader036/viewer/2022081505/5681586a550346895dc5c8f9/html5/thumbnails/36.jpg)
Quechua
Parallel Corpus: OCR with correction
![Page 37: Omnivorous MT](https://reader036.vdocuments.site/reader036/viewer/2022081505/5681586a550346895dc5c8f9/html5/thumbnails/37.jpg)
Grammar rules
;taki+sha+ni -> estoy cantando (I am singing)
{VBar,3}
VBar::VBar : [V VSuff VSuff] -> [V V]
( (X1::Y2)
  ((x0 person) = (x3 person))
  ((x0 number) = (x3 number))
  ((x2 mood) =c ger)
  ((y2 mood) = (x2 mood))
  ((y1 form) =c estar)
  ((y1 person) = (x3 person))
  ((y1 number) = (x3 number))
  ((y1 tense) = (x3 tense))
  ((x0 tense) = (x3 tense))
  ((y1 mood) = (x3 mood))
  ((x3 inflected) =c +)
  ((x0 inflected) = +))
Spanish Morphology Generation:
lex = estar, person = 1, number = sg, tense = pres, mood = ind → estoy
lex = cantar, mood = ger → cantando
![Page 38: Omnivorous MT](https://reader036.vdocuments.site/reader036/viewer/2022081505/5681586a550346895dc5c8f9/html5/thumbnails/38.jpg)
Hindi Resources
• Large statistical lexicon from the Linguistic Data Consortium (LDC)
• Parallel Corpus from LDC
• Morphological Analyzer-Generator from LDC
• Lots of native speakers
• Computational linguists with little or no knowledge of Hindi
• Experimented with the size of the parallel corpus
– Miserly and large scenarios
![Page 39: Omnivorous MT](https://reader036.vdocuments.site/reader036/viewer/2022081505/5681586a550346895dc5c8f9/html5/thumbnails/39.jpg)
Hindi
15,000 Noun Phrases from Penn TreeBank
Parallel Corpus
EBMT
SMT
Supported by DARPA TIDES
![Page 40: Omnivorous MT](https://reader036.vdocuments.site/reader036/viewer/2022081505/5681586a550346895dc5c8f9/html5/thumbnails/40.jpg)
Manual Transfer Rules: Example
; NP1 ke NP2 -> NP2 of NP1
; Ex: jIvana ke eka aXyAya
;     life of (one) chapter
;     ==> a chapter of life
;
{NP,12}
NP::NP : [PP NP1] -> [NP1 PP]
( (X1::Y2) (X2::Y1)
; ((x2 lexwx) = 'kA')
)
{NP,13}
NP::NP : [NP1] -> [NP1]
( (X1::Y1) )
{PP,12}
PP::PP : [NP Postp] -> [Prep NP]
( (X1::Y2) (X2::Y1) )
[Tree figure. Hindi: NP → PP NP1; PP → NP P, with NP → N1 → N jIvana and P ke; NP1 → Adj N (eka aXyAya). English: NP → NP1 PP; NP1 → Adj N (one chapter); PP → P NP, with P of and NP → N1 → N life.]
![Page 41: Omnivorous MT](https://reader036.vdocuments.site/reader036/viewer/2022081505/5681586a550346895dc5c8f9/html5/thumbnails/41.jpg)
Hindi-English

| System | BLEU | M-BLEU | NIST |
|---|---|---|---|
| EBMT | 0.058 | 0.165 | 4.22 |
| SMT | 0.093 | 0.191 | 4.64 |
| XFER (naïve) manual grammar | 0.055 | 0.177 | 4.46 |
| XFER (strong) no grammar | 0.109 | 0.224 | 5.29 |
| XFER (strong) learned grammar | 0.116 | 0.231 | 5.37 |
| XFER (strong) manual grammar | 0.135 | 0.243 | 5.59 |
| XFER+SMT | 0.136 | 0.243 | 5.65 |

• Very miserly training data
• Seven combinations of components
• Strong decoder allows re-ordering
• Three automatic scoring metrics
![Page 42: Omnivorous MT](https://reader036.vdocuments.site/reader036/viewer/2022081505/5681586a550346895dc5c8f9/html5/thumbnails/42.jpg)
Extra Slides
![Page 43: Omnivorous MT](https://reader036.vdocuments.site/reader036/viewer/2022081505/5681586a550346895dc5c8f9/html5/thumbnails/43.jpg)
The Avenue Low Resource Scenario
![Page 44: Omnivorous MT](https://reader036.vdocuments.site/reader036/viewer/2022081505/5681586a550346895dc5c8f9/html5/thumbnails/44.jpg)
Feature Specification
• Defines Features and their values
• Sets default values for features
• Specifies feature requirements and restrictions
• Written in XML
![Page 45: Omnivorous MT](https://reader036.vdocuments.site/reader036/viewer/2022081505/5681586a550346895dc5c8f9/html5/thumbnails/45.jpg)
Feature Specification
Feature: c-copula-type
(a copula is a verb like “be”; some languages do not have copulas)
Values:
copula-n/a
  Restrictions: 1. ~(c-secondary-type secondary-copula)
  Notes:
copula-role
  Restrictions: 1. (c-secondary-type secondary-copula)
  Notes: 1. A role is something like a job or a function. "He is a teacher" "This is a vegetable peeler"
copula-identity
  Restrictions: 1. (c-secondary-type secondary-copula)
  Notes: 1. "Clark Kent is Superman" "Sam is the teacher"
copula-location
  Restrictions: 1. (c-secondary-type secondary-copula)
  Notes: 1. "The book is on the table" There is a long list of locative relations later in the feature specification.
copula-description
  Restrictions: 1. (c-secondary-type secondary-copula)
  Notes: 1. A description is an attribute. "The children are happy." "The books are long."
![Page 46: Omnivorous MT](https://reader036.vdocuments.site/reader036/viewer/2022081505/5681586a550346895dc5c8f9/html5/thumbnails/46.jpg)
Feature Maps
• Some features interact in the grammar
– English –s reflects person and number of the subject and tense of the verb.
– In expressing the English present progressive tense, the auxiliary verb is in a different place in a question and a statement:
  • He is running.
  • Is he running?
• We need to check many, but not all combinations of features and values.
• Using unlimited feature combinations leads to an unmanageable number of sentences
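The growth in the number of combinations is easy to see by counting. The Python sketch below enumerates every combination of five features and then applies one restriction of the kind a feature map might impose. The value lists are loosely based on feature names that appear elsewhere in these slides, while the restriction itself (future tense only with neutral aspect) is invented purely for illustration.

```python
from itertools import product

# Illustrative value lists; not the project's full feature specification.
feature_values = {
    "tense": ["past", "present", "future"],
    "gram-aspect": ["perfective", "progressive", "habitual", "neutral"],
    "polarity": ["positive", "negative"],
    "source": ["hearsay", "quotative", "inferred", "assumption",
               "visual", "auditory", "non-visual-or-auditory"],
    "assertiveness": ["asserted", "neutral"],
}

names = list(feature_values)
all_combos = [dict(zip(names, values))
              for values in product(*feature_values.values())]


def in_feature_map(combo):
    # Hypothetical restriction: only explore future tense with neutral aspect.
    return not (combo["tense"] == "future" and combo["gram-aspect"] != "neutral")


kept = [combo for combo in all_combos if in_feature_map(combo)]
print(len(all_combos), len(kept))  # 336 252
```

Five small features already give 336 sentences to elicit, and each added feature multiplies that number, which is why only selected combinations are checked.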
![Page 47: Omnivorous MT](https://reader036.vdocuments.site/reader036/viewer/2022081505/5681586a550346895dc5c8f9/html5/thumbnails/47.jpg)
![Page 48: Omnivorous MT](https://reader036.vdocuments.site/reader036/viewer/2022081505/5681586a550346895dc5c8f9/html5/thumbnails/48.jpg)
Evidentiality Map
[Feature map figure. Features and the values shown:]
• Lexical Aspect: activity-accomplishment
• Assertiveness: Assertiveness-asserted, Assertiveness-neutral
• Polarity: Polarity-positive, Polarity-negative
• Source: Hearsay, quotative, inferred, assumption; Visual, Auditory, non-visual-or-auditory
• Tense: Past, Present, Future; Past, Present
• Gram. Aspect: Perfective, progressive, habitual, neutral; habitual, neutral, progressive (each value set appears twice in the figure)
![Page 49: Omnivorous MT](https://reader036.vdocuments.site/reader036/viewer/2022081505/5681586a550346895dc5c8f9/html5/thumbnails/49.jpg)
Current Work
• Navigation
– Start: large search space of all possible feature combinations
– Finish: each feature has been eliminated as irrelevant or has been explored
– Goal: dynamically find the most efficient path through the search space for each language.
![Page 50: Omnivorous MT](https://reader036.vdocuments.site/reader036/viewer/2022081505/5681586a550346895dc5c8f9/html5/thumbnails/50.jpg)
Current Work
• Feature Detection
– Which features have an effect on morphosyntax?
– What is the effect?
– Drives the Navigation process
![Page 51: Omnivorous MT](https://reader036.vdocuments.site/reader036/viewer/2022081505/5681586a550346895dc5c8f9/html5/thumbnails/51.jpg)
Feature Detection: Spanish
The girl saw a red book.
((1,1)(2,2)(3,3)(4,4)(5,6)(6,5))
La niña vió un libro rojo

A girl saw a red book
((1,1)(2,2)(3,3)(4,4)(5,6)(6,5))
Una niña vió un libro rojo

I saw the red book
((1,1)(2,2)(3,3)(4,5)(5,4))
Yo vi el libro rojo

I saw a red book.
((1,1)(2,2)(3,3)(4,5)(5,4))
Yo vi un libro rojo

Feature: definiteness
Values: definite, indefinite
Function-of-*: subj, obj
Marked-on-head-of-*: no
Marked-on-dependent: yes
Marked-on-governor: no
Marked-on-other: no
Add/delete-word: no
Change-in-alignment: no
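The comparison behind such a report can be sketched as a diff over a minimal pair of translations. The Python function below is an illustrative simplification: it only checks whether words were added or deleted and which positions changed. The name `compare_minimal_pair` is hypothetical, and deciding whether a changed word is a head, dependent, or governor would need syntactic information that this sketch does not model.

```python
def compare_minimal_pair(target_a, target_b):
    """Compare two translations that differ in one feature value."""
    a, b = target_a.split(), target_b.split()
    if len(a) != len(b):
        # A word was added or deleted; positions are not compared here.
        return {"add_delete_word": True, "changed_positions": []}
    changed = [i + 1 for i, (x, y) in enumerate(zip(a, b))
               if x.lower() != y.lower()]
    return {"add_delete_word": False, "changed_positions": changed}


subject_pair = compare_minimal_pair("La niña vió un libro rojo",
                                    "Una niña vió un libro rojo")
object_pair = compare_minimal_pair("Yo vi el libro rojo",
                                   "Yo vi un libro rojo")
print(subject_pair, object_pair)
```

For the Spanish pairs only the article changes (position 1 for the subject, position 3 for the object) and no word is added or deleted, which is consistent with "Marked-on-dependent: yes" and "Add/delete-word: no" in the report.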
![Page 52: Omnivorous MT](https://reader036.vdocuments.site/reader036/viewer/2022081505/5681586a550346895dc5c8f9/html5/thumbnails/52.jpg)
Feature Detection: Chinese
A girl saw a red book.
((1,2)(2,2)(3,3)(3,4)(4,5)(5,6)(5,7)(6,8))
有 一个 女人 看见 了 一本 红色 的 书 。
The girl saw a red book.
((1,1)(2,1)(3,3)(3,4)(4,5)(5,6)(6,7))
女人 看见 了 一本 红色的 书
Feature: definiteness
Values: definite, indefinite
Function-of-*: subject
Marked-on-head-of-*: no
Marked-on-dependent: no
Marked-on-governor: no
Add/delete-word: yes
Change-in-alignment: no
![Page 53: Omnivorous MT](https://reader036.vdocuments.site/reader036/viewer/2022081505/5681586a550346895dc5c8f9/html5/thumbnails/53.jpg)
Feature Detection: Chinese
I saw the red book
((1, 3)(2, 4)(2, 5)(4, 1)(5, 2))
红色的 书, 我 看见 了

I saw a red book.
((1,1)(2,2)(2,3)(2, 4)(4,5)(5,6))
我 看见 了 一本 红色的 书 。

Feature: definiteness
Values: definite, indefinite
Function-of-*: object
Marked-on-head-of-*: no
Marked-on-dependent: no
Marked-on-governor: no
Add/delete-word: yes
Change-in-alignment: yes
![Page 54: Omnivorous MT](https://reader036.vdocuments.site/reader036/viewer/2022081505/5681586a550346895dc5c8f9/html5/thumbnails/54.jpg)
Feature Detection: Hebrew
A girl saw a red book.
((2,1) (3,2)(5,4)(6,3))
ילדה ראתה ספר אדום

The girl saw a red book
((1,1)(2,1)(3,2)(5,4)(6,3))
הילדה ראתה ספר אדום

I saw a red book.
((2,1)(4,3)(5,2))
ראיתי ספר אדום

I saw the red book.
((2,1)(3,3)(3,4)(4,4)(5,3))
ראיתי את הספר האדום

Feature: definiteness
Values: definite, indefinite
Function-of-*: subj, obj
Marked-on-head-of-*: yes
Marked-on-dependent: yes
Marked-on-governor: no
Add-word: no
Change-in-alignment: no
![Page 55: Omnivorous MT](https://reader036.vdocuments.site/reader036/viewer/2022081505/5681586a550346895dc5c8f9/html5/thumbnails/55.jpg)
Feature Detection Feeds into…
• Corpus Navigation: which minimal pairs to pursue next.
– Don’t pursue gender in Mapudungun
– Do pursue definiteness in Hebrew
• Morphology Learning:
– Morphological learner identifies the forms of the morphemes
– Feature detection identifies the functions
• Rule learning:
– Rule learner will have to learn a constraint for each morpho-syntactic marker that is discovered
  • E.g., Adjectives and nouns agree in gender, number, and definiteness in Hebrew.