challenges of machine translation csc 5930 machine translation fall 2012 dr. tom way


Challenges of Machine Translation

CSC 5930 Machine Translation

Fall 2012 Dr. Tom Way


Translation is hard

• Novels

• Word play, jokes, puns, hidden messages

• Concept gaps: go Greek, bei fen

• Other constraints: lyrics, dubbing, poems, …


Major challenges

• Getting the right words:
– Choosing the correct root form
– Getting the correct inflected form
– Inserting “spontaneous” words

• Putting the words in the correct order:
– Word order: SVO vs. SOV, …
– Unique constructions
– Divergence


Lexical choice

• Homonymy/Polysemy: bank, run

• Concept gap: no corresponding concept in the other language: go Greek, go Dutch, feng shui, lame duck, …

• Coding (concept-to-lexeme mapping) differences:
– More distinctions in one language: e.g., kinship vocabulary
– Different division of conceptual space
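Homonymy matters because a system has to commit to one sense before it can pick a target word. The sketch below illustrates this for "bank" with a simplified Lesk-style overlap count, a swapped-in technique that these slides do not prescribe. The sense labels and cue-word sets are invented for illustration.

```python
# Hypothetical sense inventory for the ambiguous English word "bank".
# Sense labels and cue words are made up for illustration only.
SENSES = {
    "bank/finance": {"money", "loan", "deposit", "account", "cash"},
    "bank/river": {"river", "water", "shore", "fishing", "mud"},
}


def choose_sense(context, senses=SENSES):
    """Pick the sense whose cue words share the most tokens with the context.

    Simplified Lesk-style overlap count; on a tie, max() keeps the first
    sense in dictionary order.
    """
    words = {w.lower() for w in context}
    return max(senses, key=lambda sense: len(words & senses[sense]))


print(choose_sense("she sat on the bank of the river".split()))  # bank/river
print(choose_sense("he opened an account at the bank".split()))  # bank/finance
```

Real systems draw on much richer context, and even a correct sense label still has to be mapped to a target-language word, which is where concept gaps and coding differences come back in.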


Choosing the appropriate inflection

• Inflection: gender, number, case, tense, …

• Examples:
– Number: Ch-Eng: all the concrete nouns: ch_book → book, books
– Gender: Eng-Fr: all the adjectives
– Case: Eng-Korean: all the arguments
– Tense: Ch-Eng: all the verbs: ch_buy → buy, bought, will buy
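The Ch-Eng tense case (ch_buy → buy, bought, will buy) means the system must decide a feature that the source verb does not mark. The sketch below assumes a tiny hand-written form table and a crude cue-word rule. Both are hypothetical stand-ins, since real tense inference needs far wider context.

```python
# Hypothetical English forms for two verbs, keyed by tense.
ENGLISH_FORMS = {
    "buy": {"present": "buy", "past": "bought", "future": "will buy"},
    "finish": {"present": "finish", "past": "finished", "future": "will finish"},
}

# Made-up cue tokens in the slides' ch_ notation. This is a crude
# stand-in for real tense inference, which needs much wider context.
TENSE_CUES = {"ch_yesterday": "past", "ch_tomorrow": "future"}


def guess_tense(source_tokens):
    """Return the tense signalled by the first cue token, else 'present'."""
    for token in source_tokens:
        if token in TENSE_CUES:
            return TENSE_CUES[token]
    return "present"


def inflect(lemma, source_tokens):
    """Choose the English form of an uninflected source verb."""
    return ENGLISH_FORMS[lemma][guess_tense(source_tokens)]


print(inflect("buy", ["ch_I", "ch_yesterday", "ch_buy", "ch_book"]))  # bought
print(inflect("buy", ["ch_I", "ch_tomorrow", "ch_buy", "ch_book"]))   # will buy
print(inflect("buy", ["ch_I", "ch_buy", "ch_book"]))                  # buy
```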


Inserting spontaneous words

• Function words:

– Determiners: Ch-Eng: ch_book → a book, the book, the books, books

– Prepositions: Ch-Eng: … ch_November … → in November

– Relative pronouns: Ch-Eng: … ch_buy ch_book de ch_person → the person who bought /book/

– Possessive pronouns: Ch-Eng: ch_he ch_raise ch_hand → He raised his hand(s)

– Conjunctions: Eng-Ch: Although S1, S2 → ch_although S1, ch_but S2

– …
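The determiner case (ch_book → a book, the book, the books, books) comes down to two decisions that the bare Chinese noun leaves open: definiteness and number. The sketch below assumes both features have already been decided somewhere else and only shows how they select an English noun phrase. The pluralization and a/an rules are deliberately naive.

```python
def english_np(noun, definite, plural):
    """Build an English noun phrase from a bare (article-less) source noun.

    definite and plural are features the system must decide itself, since
    the Chinese noun carries neither.
    """
    # Naive pluralization: only regular nouns that take "-s".
    form = noun + "s" if plural else noun
    if definite:
        return "the " + form
    if plural:
        return form  # indefinite plural takes no article
    # Naive a/an rule based on the first letter ("hour", "user" break it).
    article = "an" if noun[0] in "aeiou" else "a"
    return article + " " + form


for definite, plural in [(False, False), (True, False), (True, True), (False, True)]:
    print(english_np("book", definite, plural))
# a book
# the book
# the books
# books
```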


Inserting spontaneous words (cont)

• Content words:

– Dropped argument: Ch-Eng: ch_buy le ma → Has Subj bought Obj?

– Chinese first name: Eng-Ch: Jiang … → ch_Jiang ch_Zemin …

– Abbreviations, acronyms: Ch-Eng: ch_12 ch_big → the 12th National Congress of the CPC (Communist Party of China)

– …


Major challenges

• Getting the right words:
– Choosing the correct root form
– Getting the correct inflected form
– Inserting “spontaneous” words

• Putting the words in the correct order:
– Word order: SVO vs. SOV, …
– Unique constructions
– Structural divergence


Word order

• SVO, SOV, VSO, …
• VP + PP → PP + VP
• VP + AdvP → AdvP + VP
• Adj + N → N + Adj
• NP + PP → PP + NP
• NP + S → S + NP
• P + NP → NP + P
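The reordering patterns above can be sketched as two small operations: linearizing a clause's S, V and O in a chosen basic order, and swapping adjacent constituents whose order the target language reverses. The constituent labels, example words and the flat chunk representation are all assumptions made for illustration. Real systems reorder nested parse trees, and a single flat left-to-right pass cannot handle interacting rules.

```python
def linearize_clause(roles, order):
    """Emit S, V and O constituents in a basic word order such as 'SOV'."""
    return " ".join(roles[role] for role in order)


def swap_pairs(chunks, rules):
    """One left-to-right pass that swaps adjacent (label, text) chunks.

    rules holds (left_label, right_label) pairs whose order the target
    language reverses, e.g. ("VP", "PP") for VP + PP -> PP + VP.
    """
    out = list(chunks)
    i = 0
    while i < len(out) - 1:
        if (out[i][0], out[i + 1][0]) in rules:
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2  # step past the swapped pair so it is not swapped again
        else:
            i += 1
    return " ".join(text for _, text in out)


clause = {"S": "John", "V": "ate", "O": "an apple"}
print(linearize_clause(clause, "SVO"))  # John ate an apple
print(linearize_clause(clause, "SOV"))  # John an apple ate
print(linearize_clause(clause, "VSO"))  # ate John an apple

chunks = [("VP", "met him"), ("PP", "in Paris")]
print(swap_pairs(chunks, {("VP", "PP")}))  # in Paris met him
```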


“Unique” Constructions

• Overt wh-movement: Eng-Ch:
– Eng: Why do you think that he came yesterday?
– Ch: you why think he yesterday come ASP?
– Ch: you think he yesterday why come?

• Ba-construction: Ch-Eng
– She ba homework finish ASP → She finished her homework.
– He ba wall dig ASP CL hole → He dug a hole in the wall.
– She ba orange peel ASP skin → She peeled the orange’s skin.
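In the simplest ba-construction the object sits before the verb, so moving it back after the verb gives English order. The sketch below handles only the pattern of the first example ("She ba homework finish ASP") with one-token constituents, and that pattern restriction is an assumption. The retained-object examples are deliberately rejected. The output still needs the inflection and spontaneous-word steps from the earlier slides ("finished", "her").

```python
def undo_ba(gloss):
    """Rewrite the simple pattern  S ba O V ASP  into English order  S V O.

    Handles only one-token subject, object and verb; retained-object cases
    such as "He ba wall dig ASP CL hole" are out of scope.
    """
    if len(gloss) != 5 or gloss[1] != "ba" or gloss[4] != "ASP":
        raise ValueError("not a simple ba-construction: " + " ".join(gloss))
    subject, _ba, obj, verb, _asp = gloss
    return [subject, verb, obj]


print(" ".join(undo_ba("She ba homework finish ASP".split())))
# She finish homework
```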


Translation divergences

• Source and target parse trees (dependency trees) are not identical.

• Example: I like Mary → Spanish: María me gusta a mí (‘Mary pleases me’)
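The "I like Mary" example is a divergence because the Spanish verb swaps the argument roles, as the ‘Mary pleases me’ gloss shows. The sketch below assumes a flat dictionary standing in for a dependency tree and a hand-written transfer lexicon. Both are hypothetical, and the argument words themselves are left untranslated.

```python
# Hypothetical transfer lexicon: English predicate -> (Spanish predicate,
# whether the two arguments swap roles). Entries are illustrative only.
TRANSFER = {
    "like": ("gustar", True),  # 'X likes Y' ~ 'Y pleases X'
    "see": ("ver", False),
}


def transfer(tree):
    """Map a flat English dependency structure to a Spanish one.

    Argument words are not translated here; only the predicate and the
    grammatical roles change.
    """
    pred, swap = TRANSFER[tree["pred"]]
    if swap:
        # The English object becomes the Spanish subject, and the English
        # subject (the experiencer) becomes the indirect object.
        return {"pred": pred, "subj": tree["obj"], "iobj": tree["subj"]}
    return {"pred": pred, "subj": tree["subj"], "obj": tree["obj"]}


print(transfer({"pred": "like", "subj": "I", "obj": "Mary"}))
# {'pred': 'gustar', 'subj': 'Mary', 'iobj': 'I'}
```

Storing the swap as a lexical property of the predicate keeps the divergence local to "like"/"gustar". Verbs that map one-to-one, such as "see", pass through unchanged.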