Posted on 28-Mar-2015

Monotrans: Human-Computer Collaborative Translation

Chang Hu, Ben Bederson, Philip Resnik

Human-Computer Interaction Lab

Computational Linguistics and Information Processing Lab

University of Maryland

Crowdsourcing Translation with People Who Speak Only One Language

Outline

• Why translation by monolingual people?
• How Monotrans works
• Research prototype
• Preliminary evaluation

Languages on Internet by Population

Source: Global Reach, Internet World Stats

Language    2000   2005   2009
English     52%    32%    28%
Chinese     5%     21%    23%
Spanish     5%     8%     8%
Japanese    9%     8%     5%
The rest    29%    31%    37%

A real-world problem: International Children’s Digital Library

www.childrenslibrary.org

Machine Translation (MT)

Large volume, cheap, fast

Unreliable quality

(餐厅 = restaurant, dining hall)

Professional Translators

High quality, but slow and expensive (even for common language pairs)

Translation with the Crowd

Bottleneck: bilingual people

Wikipedia: 800 translators vs. 75,000 contributors

Translation with the Monolingual Crowd

[Figure: approaches compared on two axes, Quality and Affordability: Machine Translation, Professional Bilingual Human Participation, Amateur Bilingual Human Participation, Monolingual Human Participation]

Basic Idea

• Source language speaker: original source sentence
• MT: inaccurate translation
• Target language speaker: fluent translation
• MT: inaccurate back translation
• Source language speaker: fluent, accurate source sentence
• MT: et cetera…
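The alternation above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Monotrans implementation: `round_trip`, `toy_mt`, and the editor callbacks are hypothetical names, and `toy_mt` is a stand-in that only tags the text with the language direction instead of translating it.

```python
from typing import Callable, List, Tuple

# Hypothetical signatures: a translator takes (text, from_lang, to_lang),
# an editor takes a draft and returns the edited sentence.
Translator = Callable[[str, str, str], str]
Editor = Callable[[str], str]


def round_trip(sentence: str, mt: Translator,
               source_editor: Editor, target_editor: Editor,
               src: str, tgt: str, rounds: int = 2) -> List[Tuple[str, str]]:
    """Alternate MT and monolingual edits, recording each human-visible sentence."""
    history = [("source", sentence)]
    current = sentence
    for _ in range(rounds):
        draft = mt(current, src, tgt)      # inaccurate translation
        fluent = target_editor(draft)      # target speaker makes it fluent
        history.append(("target", fluent))
        back = mt(fluent, tgt, src)        # inaccurate back translation
        current = source_editor(back)      # source speaker restores the meaning
        history.append(("source", current))
    return history


def toy_mt(text: str, src: str, tgt: str) -> str:
    """Stand-in for a real MT system: it only marks the direction."""
    return f"<{src}->{tgt}> {text}"


if __name__ == "__main__":
    for side, text in round_trip("bonjour", toy_mt, lambda s: s, lambda s: s,
                                 "fr", "en", rounds=1):
        print(side, text)
```

A fixed number of rounds is used here only to keep the sketch short; a real session would stop when both sides are satisfied.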

A (Richer) Example

Pierre says: En général, on s'entend bien, tous les deux. (lit. In general, we get along together, the two of us.)

MT

Mary sees: In general, it means well, both.

Mary edits into: In general, it is about both of us.

MT

Pierre sees: En général, Il est à la fois de nous.(*)

Pierre edits into: En général, nous nous entendons bien. (lit. In general, we get along well.)

MT, with enrichment (In general = En général; Get along = Nous entendons)

Mary sees: In general, we get along fine.

Mary edits into: In general, we get along well.

MT

Pierre sees: En général, nous nous entendons bien. (lit. In general, we get along well.)

Pierre proposes to stop with current translation.

Mary agrees to stop with current translation.

Monotrans Protocol
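The way a sentence is closed out in the example (one side proposes to stop, the other agrees) can be modeled as a small state record. The sketch below is an assumption-laden illustration: the `Session` class, its method names, and the rule that a new edit withdraws a pending proposal are all hypothetical and are not taken from the Monotrans system.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Session:
    """Hypothetical record of one sentence being worked on by two sides."""
    edits: List[Tuple[str, str]] = field(default_factory=list)  # (side, text)
    stop_proposed_by: Optional[str] = None
    finished: bool = False

    def edit(self, side: str, text: str) -> None:
        if self.finished:
            raise RuntimeError("sentence already finished")
        self.edits.append((side, text))
        # Assumption: a new edit withdraws any pending stop proposal.
        self.stop_proposed_by = None

    def propose_stop(self, side: str) -> None:
        if self.finished:
            raise RuntimeError("sentence already finished")
        self.stop_proposed_by = side

    def agree_stop(self, side: str) -> bool:
        # Only the other side can accept a pending proposal.
        if self.stop_proposed_by is not None and side != self.stop_proposed_by:
            self.finished = True
        return self.finished

    def final_translation(self, target_side: str = "target") -> Optional[str]:
        """Return the last target-side edit once both sides agreed to stop."""
        if not self.finished:
            return None
        for side, text in reversed(self.edits):
            if side == target_side:
                return text
        return None
```

Replaying the Pierre and Mary exchange, Pierre would call `propose_stop("source")` and Mary `agree_stop("target")`, after which `final_translation()` returns Mary's last edit.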

[Screenshot of the research prototype; labeled controls: Web link, Image, Mark OK, Mark unclear]

Preliminary Evaluation

• Older version of the UI (same protocol)
• Children’s book, Russian to Chinese
• 2 Russian speakers and 4 Chinese speakers formed 4 pairs*
• 1 hour per pair

Results

• 44 sentences (6 pages) worked on
• 28 sentences finished (≈ 4 pages)
• Overall translation speed: 50 words per hour (professional translator speed: 250 words per hour)
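As a quick check of what these figures imply, the short calculation below derives the completion rate and the speed gap. Only the four input numbers come from the slide; the variable names are illustrative.

```python
# Figures reported on the Results slide.
sentences_worked_on = 44
sentences_finished = 28
monotrans_words_per_hour = 50
professional_words_per_hour = 250

# Share of attempted sentences that both sides agreed were finished.
completion_rate = sentences_finished / sentences_worked_on

# How many times faster the quoted professional speed is.
speed_ratio = professional_words_per_hour / monotrans_words_per_hour

print(f"completion rate: {completion_rate:.0%}")
print(f"professional speed is {speed_ratio:.0f}x the Monotrans speed")
```

So roughly two thirds of the attempted sentences were finished, at about one fifth of the quoted professional speed.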

Evaluation

Original meaning translated (# of sentences)

                    None   Little   Much   Most   All
Google Translate    6      6        9      4      3
Monotrans           0      4        5      12     7
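The bar labels of the evaluation chart can be read as per-category sentence counts: 6, 6, 9, 4, 3 for Google Translate and 0, 4, 5, 12, 7 for Monotrans, over the categories none, little, much, most, all. That reading is an assumption based on the flattened chart text, although each series sums to the 28 finished sentences. Under that assumption, a short sketch summarizes the comparison:

```python
# Counts per rating category, as read from the chart labels (an assumption).
CATEGORIES = ["none", "little", "much", "most", "all"]
google_translate = dict(zip(CATEGORIES, [6, 6, 9, 4, 3]))
monotrans = dict(zip(CATEGORIES, [0, 4, 5, 12, 7]))


def share_most_or_all(counts: dict) -> float:
    """Fraction of sentences rated as conveying most or all of the meaning."""
    return (counts["most"] + counts["all"]) / sum(counts.values())


print(f"Google Translate: {share_most_or_all(google_translate):.0%}")
print(f"Monotrans:        {share_most_or_all(monotrans):.0%}")
```

With these counts, 7 of 28 Google Translate sentences and 19 of 28 Monotrans sentences fall in the top two categories.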

Where to from here?

• Larger and more formal validation of the protocol
• Richer annotations
  ✓ Images
  ✓ Web links
  ✓ Marking correct spans
  ✓ Marking incorrect spans
  Paraphrase
  Word clouds
  …??
• Large-scale crowd support (CrowdFlow talk @1:20PM)

Take-Away Message

• Monolingual translation can help large-scale translation
• Translation with monolingual people is actually feasible

Sponsors

Thank You

Q&A

Backup slides

Projected annotation

• Project information from one language to another using word alignments as a bridge

• Illustration of how this has been done for natural language annotation

[Kolak 2005]
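Projection through word alignments can be illustrated with a small function that maps annotated spans on one side to the tokens they are aligned with on the other. This is a sketch under stated assumptions: `project_spans` is a hypothetical helper, and the alignment below is hand-written for illustration, not the output of a real aligner or of the method in [Kolak 2005].

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, label) over token indices


def project_spans(spans: List[Span],
                  alignment: List[Tuple[int, int]]) -> List[Span]:
    """Project spans over target tokens onto source tokens.

    alignment holds (source_index, target_index) pairs. Each projected span
    covers everything from the first to the last aligned source token, so it
    may include unaligned tokens in between.
    """
    tgt_to_src: Dict[int, List[int]] = {}
    for s, t in alignment:
        tgt_to_src.setdefault(t, []).append(s)

    projected = []
    for start, end, label in spans:
        src_idxs = sorted({s for t in range(start, end)
                           for s in tgt_to_src.get(t, [])})
        if src_idxs:  # spans with no aligned tokens are dropped
            projected.append((src_idxs[0], src_idxs[-1] + 1, label))
    return projected


source = ["Tout", "le", "monde", "doit", "entendre", "l'histoire", "de", "Cendrillon"]
target = ["Everybody", "has", "heard", "the", "business", "by", "Cinderella"]
# Hand-written (source_index, target_index) pairs, for illustration only.
alignment = [(0, 0), (1, 0), (2, 0), (3, 1), (4, 2), (5, 3), (5, 4), (6, 5), (7, 6)]

# An English speaker marks "the business" as unclear.
marked = project_spans([(3, 5, "unclear")], alignment)
```

Here the mark on "the business" lands on the single source token "l'histoire", which is the information a source-language speaker would need in order to respond.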

Projected annotation

Tout le monde doit entendre l'histoire de Cendrillon

MT: Everybody has heard the business by Cinderella

Everybody has heard the story about Cinderella

=> Pilot experiment results: Projected annotations helped improve translation

One of my examples involves rmvng ll th vwls frm th wrds nd shwng tht th rdr cn stll ndrstnd th sntnc.

Tout le monde doit entendre l'histoire de Cendrillon.

MT: Everybody has hear story about Cinderella

Everybody has heard the story about Cinderella

Pilot experiment results: Post-editing machine translation output by monolingual people improves translation quality

Three Types of Errors

Tout le monde doit entendre l'histoire de Cendrillon.

Everybody has heard the story about Cinderella

I. Detectable and Correctable Error

MT: Everybody has hear story about Cinderella

II. Detectable but not Correctable Error

MT: Everybody has heard the business by Cinderella

Communication needed

Pilot experiment results: Communication through enrichment channel can improve translation

III. Undetectable Error

MT: Everybody loves the story about Cinderella

Need more redundancy

Add more redundancy, reduce it to type I or type II
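The three error types reduce to two questions a monolingual reader can be asked: can the error be noticed, and can it be fixed without outside help? The sketch below encodes that decision and the remedy the slides attach to each type. The names `ErrorType`, `classify`, and `NEXT_STEP` are hypothetical and only illustrate the taxonomy.

```python
from enum import Enum


class ErrorType(Enum):
    DETECTABLE_CORRECTABLE = "I"
    DETECTABLE_NOT_CORRECTABLE = "II"
    UNDETECTABLE = "III"


def classify(detectable: bool, correctable: bool) -> ErrorType:
    """Map a monolingual reader's view of an MT error to one of the three types."""
    if not detectable:
        return ErrorType.UNDETECTABLE
    if correctable:
        return ErrorType.DETECTABLE_CORRECTABLE
    return ErrorType.DETECTABLE_NOT_CORRECTABLE


# Remedy suggested for each type, paraphrasing the slides.
NEXT_STEP = {
    ErrorType.DETECTABLE_CORRECTABLE: "post-edit the MT output",
    ErrorType.DETECTABLE_NOT_CORRECTABLE: "communicate through the enrichment channel",
    ErrorType.UNDETECTABLE: "add redundancy until the error becomes type I or II",
}
```

For example, "Everybody has hear story about Cinderella" would be `classify(True, True)`, while the fluent but wrong "Everybody loves the story about Cinderella" would be `classify(False, False)`.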

Prototype Evaluation

System seems promising

(1=unintelligible, 4=very intelligible) (1=not translated, 5=full meaning)
