
Monotrans: Human-Computer Collaborative Translation

Chang Hu, Ben Bederson, Philip Resnik

Human-Computer Interaction Lab
Computational Linguistics and Information Processing Lab

University of Maryland

Crowdsourcing Translation with People Who Speak Only One Language

Outline

• Why translation by monolingual people?
• How Monotrans works
• Research prototype
• Preliminary evaluation

Languages on Internet by Population (2000)

English 52%
Chinese 5%
Spanish 5%
Japanese 9%
The rest 29%

Source: Global Reach, Internet World Stats

Languages on Internet by Population (2005)

English 32%
Chinese 21%
Spanish 8%
Japanese 8%
The rest 31%

Source: Global Reach, Internet World Stats

Languages on Internet by Population (2009)

English 28%
Chinese 23%
Spanish 8%
Japanese 5%
The rest 37%

Source: Global Reach, Internet World Stats


A real-world problem: International Children’s Digital Library

www.childrenslibrary.org

Machine Translation (MT)

Large volume, cheap, fast

Unreliable quality

(餐厅 = restaurant, dining hall)

Professional Translators

High quality, but slow and expensive (even for common language pairs)

Translation with the Crowd

Bottleneck: bilingual people

Translation with the Crowd

Wikipedia: 800 translators vs. 75,000 contributors

Translation with the Monolingual Crowd

[Figure: Quality vs. Affordability chart positioning Machine Translation, Professional Bilingual Human Participation, Amateur Bilingual Human Participation, and Monolingual Human Participation]

Outline

• Why translation by monolingual people?
• How Monotrans works
• Research prototype
• Preliminary evaluation

Basic Idea

Source language speaker: Original source sentence
→ MT → Inaccurate translation
Target language speaker: Fluent translation
→ MT → Inaccurate back translation
Source language speaker: Fluent, accurate source sentence
→ MT → Et cetera…
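The Basic Idea loop can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Monotrans implementation: the `mt` function, the two editor callbacks, the language codes, and the fixed number of rounds are all hypothetical stand-ins for a real MT service and real monolingual participants.

```python
from typing import Callable, List, Tuple

# Hypothetical types: an MT function maps (text, from_lang, to_lang) to text,
# and an editor is a monolingual person rewriting text in their own language.
MT = Callable[[str, str, str], str]
Editor = Callable[[str], str]


def round_trip(source: str, mt: MT, target_editor: Editor, source_editor: Editor,
               rounds: int = 2, src: str = "fr", tgt: str = "en") -> List[Tuple[str, str]]:
    """Alternate MT and monolingual editing; return the (role, text) history."""
    history = [("source speaker", source)]
    current = source
    for _ in range(rounds):
        # Source -> target: the target language speaker edits the MT output.
        fluent_target = target_editor(mt(current, src, tgt))
        history.append(("target speaker", fluent_target))
        # Target -> source: the source language speaker edits the back translation.
        current = source_editor(mt(fluent_target, tgt, src))
        history.append(("source speaker", current))
    return history


# Toy stand-ins so the sketch runs without any MT service.
def toy_mt(text: str, src: str, tgt: str) -> str:
    return f"[{src}->{tgt}] {text}"


def toy_editor(text: str) -> str:
    # Strips the toy MT tag, standing in for a human edit.
    return text.split("] ", 1)[-1]


history = round_trip("Bonjour", toy_mt, toy_editor, toy_editor, rounds=1)
```

Keeping the full history, rather than only the latest sentence, makes it possible to inspect how the text changed on each side from round to round.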

A (Richer) Example

Enrichment phrase pairs in this example:
In general = En général
Get along = Nous entendons


Pierre says: En général, on s'entend bien, tous les deux. (lit. In general, we get along together, the two of us.)
→ MT →
Mary sees: In general, it means well, both.
Mary edits into: In general, it is about both of us.
→ MT →
Pierre sees: En général, Il est à la fois de nous. (*)
Pierre edits into: En général, nous nous entendons bien. (lit. In general, we get along well.)
→ MT, with enrichment →
Mary sees: In general, we get along fine.
Mary edits into: In general, we get along well.
→ MT →
Pierre sees: En général, nous nous entendons bien. (lit. In general, we get along well.)
Pierre proposes to stop with current translation
Mary agrees to stop with current translation
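In this example the exchange ends when one participant proposes to stop and the other agrees. A minimal sketch of that handshake over a simple turn log follows. The `Turn` record, the action names, and the rule that a new edit cancels a pending proposal are assumptions made for illustration, not details of the Monotrans protocol.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Turn:
    speaker: str                 # e.g. "Pierre" (source side) or "Mary" (target side)
    action: str                  # "edit", "propose_stop", or "agree_stop"
    text: Optional[str] = None   # only used for "edit" turns


def final_translation(turns: List[Turn], target_speaker: str) -> Optional[str]:
    """Return the target speaker's last edit once a stop proposal has been agreed to."""
    proposed_by = None
    last_target_edit = None
    for turn in turns:
        if turn.action == "edit":
            if turn.speaker == target_speaker:
                last_target_edit = turn.text
            proposed_by = None  # assumption: a new edit cancels a pending proposal
        elif turn.action == "propose_stop":
            proposed_by = turn.speaker
        elif turn.action == "agree_stop" and proposed_by not in (None, turn.speaker):
            return last_target_edit
    return None  # no agreement yet, so the session is still open


session = [
    Turn("Mary", "edit", "In general, it is about both of us."),
    Turn("Pierre", "edit", "En général, nous nous entendons bien."),
    Turn("Mary", "edit", "In general, we get along well."),
    Turn("Pierre", "propose_stop"),
    Turn("Mary", "agree_stop"),
]
result = final_translation(session, target_speaker="Mary")
```

Requiring agreement from the other side means neither participant can close a sentence alone, which matches the propose and agree steps shown in the example.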


Monotrans Protocol

Outline

• Why translation by monolingual people?
• How Monotrans works
• Research prototype
• Preliminary evaluation

[Screenshot: research prototype UI, with callouts: Web link, Image, Mark OK, Mark unclear]

Outline

• Why translation by monolingual people?
• How Monotrans works
• Research prototype
• Preliminary evaluation

Preliminary Evaluation

• Older version of the UI (same protocol)
• Children’s book, Russian to Chinese
• 2 Russian speakers and 4 Chinese speakers formed 4 pairs*
• 1 hour per pair

Results

• 44 sentences (6 pages) worked on
• 28 sentences finished (≈ 4 pages)
• Overall translation speed: 50 words per hour
  (professional translator speed: 250 words per hour)
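As a quick check on these figures, the completion rate and the speed relative to a professional translator follow directly from the numbers on the slide. The variable names below are illustrative only.

```python
# Figures taken from the Results slide.
sentences_worked_on = 44
sentences_finished = 28
monotrans_words_per_hour = 50
professional_words_per_hour = 250

# Share of the attempted sentences that the pairs finished.
completion_rate = sentences_finished / sentences_worked_on

# Monotrans speed as a fraction of the professional speed.
relative_speed = monotrans_words_per_hour / professional_words_per_hour

print(f"Finished {completion_rate:.0%} of the sentences worked on")
print(f"Speed is {relative_speed:.0%} of the professional rate")
```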

Evaluation

Original meaning translated (# of sentences)

                   none  little  much  most  all
Google Translate      6       6     9     4    3
Monotrans             0       4     5    12    7
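To compare the two distributions numerically, the sketch below computes the share of sentences rated "most" or "all" for each system. It assumes the bar labels on the chart read 6, 6, 9, 4, 3 for Google Translate and 0, 4, 5, 12, 7 for Monotrans, a reading under which each series sums to the 28 finished sentences. The function and variable names are illustrative.

```python
categories = ["none", "little", "much", "most", "all"]

# Counts as read from the chart's bar labels (an assumption, see the text above).
counts = {
    "Google Translate": [6, 6, 9, 4, 3],
    "Monotrans": [0, 4, 5, 12, 7],
}


def share_at_least(series, threshold="most"):
    """Fraction of sentences rated at or above the given category."""
    start = categories.index(threshold)
    return sum(series[start:]) / sum(series)


for system, series in counts.items():
    print(f"{system}: {share_at_least(series):.0%} rated 'most' or 'all'")
```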



Where to from here?

• Larger and more formal validation of the protocol
• Richer annotations
  ✓ Images
  ✓ Web links
  ✓ Marking correct spans
  ✓ Marking incorrect spans
  Paraphrase
  Word clouds
  …??
• Large-scale crowd support (CrowdFlow talk @1:20PM)

Take-Away Message

• Monolingual translation can help large-scale translation
• Translation with monolingual people is actually feasible


Sponsors


Thank You

Q&A


Backup slides


Projected annotation

• Project information from one language to another using word alignments as a bridge

• Illustration of how this has been done for natural language annotation

[Kolak 2005]
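A minimal sketch of projecting a span annotation across a word alignment is given below. The hand-written alignment pairs, the marked span, and the `project_span` helper are illustrative assumptions, not taken from [Kolak 2005] or from the Monotrans code.

```python
from typing import List, Set, Tuple


def project_span(span: Set[int], alignment: List[Tuple[int, int]]) -> Set[int]:
    """Map annotated token indices in one sentence to the aligned indices in the other.

    `alignment` holds (index_in_annotated_sentence, index_in_other_sentence) pairs.
    """
    return {other for annotated, other in alignment if annotated in span}


english = "Everybody has heard the business by Cinderella".split()
french = "Tout le monde doit entendre l'histoire de Cendrillon".split()

# Hand-written, hypothetical word alignment as (English index, French index) pairs.
alignment = [(0, 0), (0, 1), (0, 2), (1, 3), (2, 4), (3, 5), (4, 5), (5, 6), (6, 7)]

# A reader marks "business by" (English tokens 4 and 5) as unclear.
unclear_english = {4, 5}
unclear_french = project_span(unclear_english, alignment)
print([french[i] for i in sorted(unclear_french)])
```

Working with token index sets keeps the projection independent of the annotation type, so the same helper could carry a "correct" mark as easily as an "unclear" one.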

Projected annotation

Tout le monde doit entendre l'histoire de Cendrillon
MT: Everybody has heard the business by Cinderella
After editing: Everybody has heard the story about Cinderella

=> Pilot experiment results: Projected annotations helped improve translation


One of my examples involves rmvng ll th vwls frm th wrds nd shwng tht th rdr cn stll ndrstnd th sntnc.

Three Types of Errors

I. Detectable and Correctable Error

Tout le monde doit entendre l'histoire de Cendrillon.
MT: Everybody has hear story about Cinderella
After post-editing: Everybody has heard the story about Cinderella

Pilot experiment results: Post-editing machine translation output by monolingual people improves translation quality

Three Types of Errors

II. Detectable but not Correctable Error

Everybody has heard the story about Cinderella
Tout le monde doit entendre l'histoire de Cendrillon.
MT: Everybody has hear story about Cinderella
MT: Everybody has heard the business by Cinderella

Communication needed

Three Types of Errors

II. Detectable but not Correctable Error

Pilot experiment results: Communication through the enrichment channel can improve translation

Three Types of Errors

III. Undetectable Error

Everybody has heard the story about Cinderella
Tout le monde doit entendre l'histoire de Cendrillon.
MT: Everybody has hear story about Cinderella
MT: Everybody has heard the business by Cinderella
MT: Everybody loves the story about Cinderella

Need more redundancy

Add more redundancy, reduce it to type I or type II
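The three error types can be summarized as a small decision rule. This is only an illustrative sketch. The enum and function names are hypothetical, and the comments restate the remedies named on the slides.

```python
from enum import Enum


class ErrorType(Enum):
    DETECTABLE_CORRECTABLE = "I"
    DETECTABLE_NOT_CORRECTABLE = "II"
    UNDETECTABLE = "III"


def classify(detectable: bool, correctable: bool) -> ErrorType:
    """Classify an MT error from the monolingual reader's point of view."""
    if not detectable:
        # Type III: add more redundancy to reduce it to type I or type II.
        return ErrorType.UNDETECTABLE
    if correctable:
        # Type I: monolingual post-editing can fix it.
        return ErrorType.DETECTABLE_CORRECTABLE
    # Type II: communication between the two sides is needed.
    return ErrorType.DETECTABLE_NOT_CORRECTABLE
```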


Prototype Evaluation

System seems promising

(1=unintelligible, 4=very intelligible) (1=not translated, 5=full meaning)