Speed or Accuracy? A Study in Evaluation of Simultaneous Speech Translation

Takashi Mieno, Graham Neubig, Sakriani Sakti, Tomoki Toda, Satoshi Nakamura
Nara Institute of Science and Technology (NAIST)



Speech Translation

Source: Microsoft Research, http://research.microsoft.com/en-us/news/features/translator-052714.aspx

Source: NICT, http://www.nict.go.jp/press/2010/06/29-1.html


Problems w/ Traditional Systems

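Input (Japanese): 本日は私の身近にあるとある難題について話しますが皆さんにも関連のある難題で数年前イギリスに渡った時に…
(English: "Today I will talk about a difficult problem that is close to me, a problem that also concerns all of you; a few years ago, when I went to the UK…")

System

Output: ...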


Simultaneous Speech Translation

Input (Japanese, "//" marks a segment boundary): 本日は私の身近にあるとある難題について話しますが // 皆さんにも関連のある難題で // 数年前イギリスに渡った時に…

System

Output, segment 1: I want to talk today about a difficult topic that is close to me

Output, segment 2: and closer than you think to you


Problems with Evaluation

● Given two systems of different speed and accuracy, which is better?

[Figure: accuracy (high/low) plotted against delay (long/short)]

Input: もっと 手頃な ホテルは ありませんか
Word-by-word gloss: more / cheap / hotel / is there

Don't split the sentence: do you have a more reasonable hotel ? /

Split the sentence: more / reasonable / is there a hotel ? /


Goal of Evaluation

[Figure: three plots of accuracy (high/low) against delay]

● An evaluation measure considering delay and accuracy for simultaneous speech translation.


Proposed Evaluation Method


How to Create an Evaluation Function? (Based on Data)

[Diagram: movie data (movies with various delays and accuracies) → evaluated data → training data with accuracy and delay features → machine learning → evaluation function]


Evaluation Sheet Example

● (Separate window)


Data Format

• Rank-based evaluation

– Perform comparative evaluation of which output is “better”

– Allows for consideration of both speed and accuracy

Input video → five system outputs → ranking by evaluators

System     Output     ☆Rank
System A   Output A   4
System B   Output B   1
System C   Output C   3
System D   Output D   2
System E   Output E   5
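A ranking like the one in this example can be turned into pairwise preferences, which is the form that pairwise learning-to-rank methods consume. The following is a minimal sketch only: the helper name `ranking_to_pairs`, the dictionary layout, and the reading of rank 1 as "best" are assumptions made for illustration and do not come from the slides.

```python
from itertools import combinations

def ranking_to_pairs(ranks):
    """Turn {output_name: rank} into a list of (better, worse) pairs.

    Hypothetical helper; assumes a lower rank number means a better output.
    Tied outputs produce no pair.
    """
    pairs = []
    for a, b in combinations(sorted(ranks), 2):
        if ranks[a] < ranks[b]:
            pairs.append((a, b))
        elif ranks[b] < ranks[a]:
            pairs.append((b, a))
    return pairs

# Ranks from the example (assumed: 1 = best, 5 = worst).
ranks = {"Output A": 4, "Output B": 1, "Output C": 3, "Output D": 2, "Output E": 5}
pairs = ranking_to_pairs(ranks)

# Five distinct ranks give 5 * 4 / 2 = 10 ordered pairs.
for better, worse in pairs:
    print(f"{better} preferred over {worse}")
```

Working with pairs rather than absolute ranks means evaluators only need to agree on relative order, which fits the "which output is better" framing of the data format.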


Learning an Evaluation Function

Define a linear function that takes a video as input and returns a score.

[Equation annotations: weight vector; features useful in evaluation (i.e., delay and accuracy); displayed video]

This function can be learned from ranked data using "learning to rank".
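A linear scoring function of this kind can be written as follows. The symbols ($s$, $\mathbf{w}$, $\boldsymbol{\phi}$, $v$) are assumptions chosen for illustration, not notation taken from the slides.

```latex
% v: displayed video, \phi(v): feature vector (accuracy and delay), w: weight vector
s(v) = \mathbf{w}^{\top} \boldsymbol{\phi}(v)
     = w_{\mathrm{acc}} \cdot \mathrm{accuracy}(v) + w_{\mathrm{delay}} \cdot \mathrm{delay}(v)

% Learning to rank looks for w such that, for videos shown for the same input,
% v_i \text{ ranked better than } v_j \;\Longrightarrow\; s(v_i) > s(v_j)
```

Under this form one would expect $w_{\mathrm{delay}}$ to come out negative, since longer delay should lower the score, but that is a property of the learned weights rather than of the definition.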


Learning to Rank

Training data

Movie    Accuracy   Delay   Rank
Mov. 1   0.05       0       3
Mov. 1   0.65       10      2
Mov. 1   0.30       3       1
Mov. 2   0.70       3       1
Mov. 2   0.50       10      3
Mov. 2   0.35       5       2
Mov. 3   0.65       2       1
Mov. 3   0.45       7       2
Mov. 3   0.05       3       3

For all pairs of ratings for each movie, learn the order with an SVM (each pair is labeled +1 or -1).
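The pairwise idea on this slide can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the authors' implementation: the trainer below is a hand-rolled subgradient descent on the hinge loss standing in for a real linear SVM package, the function names and hyperparameters are hypothetical, features are left unscaled, and rank 1 is taken to mean "best". The rows are the toy values from the learning-to-rank table.

```python
from itertools import combinations

# Toy rows from the slide: movie -> [(accuracy, delay_seconds, rank)].
DATA = {
    "Mov. 1": [(0.05, 0, 3), (0.65, 10, 2), (0.30, 3, 1)],
    "Mov. 2": [(0.70, 3, 1), (0.50, 10, 3), (0.35, 5, 2)],
    "Mov. 3": [(0.65, 2, 1), (0.45, 7, 2), (0.05, 3, 3)],
}

def pairwise_examples(data):
    """Build (feature difference, label) examples within each movie.

    The label is +1 if the first item is ranked better (lower rank number,
    an assumption) and -1 otherwise. Items are never compared across movies.
    """
    examples = []
    for rows in data.values():
        for (a1, d1, r1), (a2, d2, r2) in combinations(rows, 2):
            if r1 == r2:
                continue  # a tie carries no ordering information
            examples.append(((a1 - a2, d1 - d2), 1 if r1 < r2 else -1))
    return examples

def train_linear_svm(examples, epochs=200, lr=0.01, reg=0.001):
    """Subgradient descent on the L2-regularized hinge loss (no bias term).

    Hypothetical stand-in for an SVM solver; hyperparameters are arbitrary.
    """
    w = [0.0, 0.0]
    for _ in range(epochs):
        for x, y in examples:
            margin = y * (w[0] * x[0] + w[1] * x[1])
            for k in range(2):
                grad = reg * w[k] - (y * x[k] if margin < 1 else 0.0)
                w[k] -= lr * grad
    return w

def pairwise_accuracy(examples, w):
    """Fraction of pairs whose order the linear score w . x gets right."""
    correct = sum(1 for x, y in examples if y * (w[0] * x[0] + w[1] * x[1]) > 0)
    return correct / len(examples)

examples = pairwise_examples(DATA)
w = train_linear_svm(examples)
print("pairs:", len(examples))
print("weights (accuracy, delay):", w)
print("pairwise accuracy on the toy data:", pairwise_accuracy(examples, w))
```

Training on difference vectors without a bias term is what makes the learned weights usable directly as a scoring function: scoring two items and comparing them is the same as scoring their difference.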


Experiments


Experimental Setup

• Target video: TED Talks
  ① Realtime trans. is important
  ② Often used in MT evaluation

• Gathered data

  Video        20 types, 20-30 seconds
  Delay        7 types: 0, 1, 2, 3, 5, 7, 10 seconds
  Subjects     15 native speakers
  Method       Ranking 1-3
  Modalities   Speech + Subtitles

• Translation data (5 varieties), English → Japanese
  – Translator
  – Interpreter 1 (S Rank)
  – Interpreter 2 (A Rank)
  – Syntax-based MT
  – Phrase-based MT


Training/Evaluation

Training
– Data format: training data, ranking data for 19 movies
– Features: Acc. Eval (BLEU+1 (Auto), RIBES (Auto), Adequacy (Man.)) and Delay (7 varieties)
– Model: Linear SVM

Eval
– Data format: test data, ranking data of a held-out movie
– Eval. accuracy: correct ranking percentage (Chance Rate = 0.5)
– (2-fold cross-validation)
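The held-out protocol on this slide can be sketched as a small harness. Everything named here is an assumption for illustration: `fit_placeholder` returns fixed, hand-picked weights in place of real SVM training, the three-movie data set reuses the toy rows from the learning-to-rank slide rather than the actual experimental data, and rank 1 is taken to mean "best".

```python
from itertools import combinations

# Hypothetical toy data: movie -> [(accuracy, delay_seconds, rank)].
DATA = {
    "Mov. 1": [(0.05, 0, 3), (0.65, 10, 2), (0.30, 3, 1)],
    "Mov. 2": [(0.70, 3, 1), (0.50, 10, 3), (0.35, 5, 2)],
    "Mov. 3": [(0.65, 2, 1), (0.45, 7, 2), (0.05, 3, 3)],
}

def correct_ranking_percentage(rows, w):
    """Share of pairs within one movie that the linear score orders correctly."""
    def score(row):
        return w[0] * row[0] + w[1] * row[1]
    pairs = [(a, b) for a, b in combinations(rows, 2) if a[2] != b[2]]
    hits = sum(1 for a, b in pairs if (score(a) > score(b)) == (a[2] < b[2]))
    return hits / len(pairs)

def held_out_evaluation(data, fit):
    """Train on all movies but one, test on the held-out movie, and repeat."""
    results = {}
    for held_out in data:
        train = {m: rows for m, rows in data.items() if m != held_out}
        w = fit(train)
        results[held_out] = correct_ranking_percentage(data[held_out], w)
    return results

def fit_placeholder(train):
    """Stand-in for real training (e.g. a linear SVM); ignores its input."""
    return (1.0, -0.055)  # hypothetical (accuracy, delay) weights

results = held_out_evaluation(DATA, fit_placeholder)
for movie, pct in results.items():
    print(movie, pct)
print("mean:", sum(results.values()) / len(results))
```

Because every pair has only two possible orders, a scorer that guesses at random is right half the time, which is where the chance rate of 0.5 comes from.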


Evaluation of Evaluation

[Figure: ranking accuracy (0 to 1) using accuracy features only (Acc.) vs. delay and accuracy features (Delay+Acc.). Panels: Text Subtitles, Speech. Legend: None, BLEU+1, RIBES, Adeq.]


Q1: Is Delay Important in S2S Translation?


A: Yes! In all cases, the scoring function considering delay did as well as or better than the one considering accuracy alone.


Q2: Does Importance Depend on Modality of Presentation?


A: Yes! Considering delay was more useful when presenting results through subtitles (avg. +7% for text subtitles vs. avg. +3% for speech).

Why?: Probably because when watching subtitles, it is possible to hear the original speech.


Q3: Does this Solve Evaluation?


A: No! We still have a large gap between fully automatic eval and human annotation, and ranking accuracy is still not high.


Learned Evaluation Functions (for Adequacy)

[Figure: two panels, Speech Output and Subtitle Output. x-axis: Delay (s), 0 to 10. y-axis: 5 Level Acceptability, 1 to 5.]

Learned weights:

                  Accuracy   Delay
Subtitle Output   1.40       -0.059
Speech Output     1.99       -0.018

1 point of adequacy =

8.0 sec. of delay

28.5 sec. of delay


Future Challenges


● Current conception of delay is artificial

● Can we generalize in some way?

● Non-linear evaluation functions


Thank You!