(Presentation) Voice Typing: A New Speech Interaction Model for Dictation on Touchscreen Devices, CHI 2012


TRANSCRIPT

Page 1: (Presentation) Voice Typing: A New Speech Interaction Model for Dictation on Touchscreen Devices, CHI 2012, Anuj Kumar et al.

Presenter: 이동진 (2012 summer)

Page 2:

Voice Typing: A New Speech Interaction Model for Dictation on Touchscreen Devices

Session: Check This Out: Recommender System. CHI 2012, May 5-10, 2012, Austin, Texas, USA

Anuj Kumar (Human-Computer Interaction Institute, Carnegie Mellon University), Tim Paek, Bongshin Lee (Microsoft Research, One Microsoft Way)

Page 3:
Page 4:
Page 5:
Page 6:
Page 7:

Author Keywords: speech recognition; dictation; multimodal; error correction; speech user interfaces

Page 8:

INTRODUCTION

Page 9:
Page 10:

The speech interaction model follows a voice recorder metaphor where users must first formulate what they want to say in utterances, and then produce them, as they would with a voice recorder. These utterances do not get transcribed until users have finished speaking (as indicated by a pause or via push-to-talk), at which point the entire output appears at once after a few seconds of delay.


Page 11:

Problem: <<Error Correction>> Users typically spend only 25-30% of their time actually dictating. The rest of the time is spent identifying and editing transcription errors.

Page 12:

So, this paper introduces a new speech interaction model called Voice Typing.

Page 13:

In this paper, we introduce Voice Typing, a new speech interaction model where users’ utterances are transcribed as they produce them to enable real-time error identification.

Page 14:

The metaphor for Voice Typing is that of a secretary typing for you as you monitor and quickly edit the text using the touchscreen.

Page 15:
Page 16:

VOICE TYPING - Motivation

Page 17:

Voice Typing is motivated by both cognitive and technical considerations.

Page 18:

From a cognitive standpoint,

HCI researchers have long known that providing real-time feedback for user actions not only facilitates learning of the user interface but also leads to greater satisfaction.

Page 19:

From a technical standpoint,

Voice Typing is motivated by the observation that dictation errors frequently stem from incorrect segmentations.

Page 20:

Consider the classic example of speech recognition failure:

Page 21:

“It’s hard to wreck a nice beach” vs. “It’s hard to recognize speech.”

Page 22:

VOICE TYPING - Prototype

Page 23:

First, instead of displaying the recognition result all at once, we decided to display each word one by one, left to right, as if a secretary had just typed the text. Second, knowing the speed at which the recognizer could return results and keep up with user utterances, we trained users to speak in chunks of 2-4 words.
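As a rough sketch of this incremental display (not the authors' implementation), the following Python snippet reveals each recognized chunk word by word, left to right; the append_word callback and the per-chunk recognizer output are hypothetical stand-ins.

```python
import time

def append_word(word):
    """Hypothetical UI callback: append one word at the caret, left to right,
    as if a secretary had just typed it."""
    print(word, end=" ", flush=True)

def voice_typing_loop(recognized_chunks, typing_delay=0.05):
    """Reveal each chunk's words one by one so the user can monitor the
    transcription and catch errors in real time."""
    for words in recognized_chunks:      # one recognizer result per 2-4 word utterance
        for word in words:
            append_word(word)            # incremental, word-by-word display
            time.sleep(typing_delay)     # pace the "secretary typing" effect

# Simulated per-chunk recognizer output; a real prototype would stream results
# from a speech engine as the user speaks in short chunks.
voice_typing_loop([["It's", "hard", "to"], ["recognize", "speech"]])
```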

Page 24:

The second challenge is correcting errors in a fast and efficient manner on touchscreen devices. To achieve this goal, we leveraged a marking menu that provides multiple ways of editing text.

Page 25:

Error correction interface: (a) the Voice Typing marking menu, (b) list of alternate candidates for a selected word, including the word with a capitalized first letter, (c) re-speak mode with volume indicator, and (d) list of punctuation choices.
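To make the marking-menu interaction concrete, here is a minimal Python sketch of dispatching a stroke over a selected word to the actions shown above. The slides do not say which stroke direction maps to which action, so the directions and handler names below are placeholders.

```python
from typing import Callable, Dict

# Hypothetical edit-action handlers (stubs for illustration only).
def show_alternates(word: str) -> None:  print(f"show n-best alternates for '{word}'")
def start_respeak(word: str) -> None:    print(f"re-speak mode for '{word}'")
def show_punctuation(word: str) -> None: print(f"punctuation choices after '{word}'")
def delete_word(word: str) -> None:      print(f"delete '{word}'")

# Placeholder stroke-direction mapping; the actual assignment of directions in
# the Voice Typing marking menu is not given in the slides.
MARKING_MENU: Dict[str, Callable[[str], None]] = {
    "up": show_alternates,       # (b) list of alternate candidates
    "left": start_respeak,       # (c) re-speak with volume indicator
    "right": show_punctuation,   # (d) punctuation choices
    "down": delete_word,         # assumed deletion action
}

def on_marking_gesture(word: str, direction: str) -> None:
    """Dispatch a marking-menu stroke made over a selected word."""
    handler = MARKING_MENU.get(direction)
    if handler:
        handler(word)

on_marking_gesture("beach", "up")  # e.g., open the alternates list for a word
```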

Page 26:

USER STUDY

Page 27:

In order to assess the correction efficacy and usability of Voice Typing in comparison to traditional dictation, we conducted a controlled experiment in which participants engaged in an email composition task. For the email content, participants were provided with a structure they could fill out themselves. For example, “Write an email to your friend Michelle recommending a restaurant you like. Suggest a plate she should order and why she will like it.” Because dictation entails spontaneous language generation, we chose this task to reflect how end users might actually use Voice Typing.

Page 28:

Experimental Design

Page 29:

We conducted a 2x2 within-subjects factorial design experiment with two independent variables: Speech Interaction Model (Dictation vs. Voice Typing) and Error Correction Method (Marking Menu vs. Regular).
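For concreteness, this design yields four within-subjects conditions per participant; the short sketch below simply enumerates them, and the rotation-based counterbalancing is only an assumption (the slides do not describe the actual ordering scheme).

```python
from itertools import product

models = ["Dictation", "Voice Typing"]
corrections = ["Marking Menu", "Regular"]

# The four within-subjects conditions of the 2x2 factorial design.
conditions = list(product(models, corrections))
print(conditions)

# Assumed counterbalancing: rotate the condition order across participants.
for participant in range(4):
    order = conditions[participant:] + conditions[:participant]
    print(participant, order)
```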

Page 30:

In Regular Error Correction, all of the Marking Menu options were made available to participants as follows: If users tapped a word, the interface would display an n-best list of word alternates. If they performed press-and-hold on the word, that invoked the re-speak or spelling option. For deleting words, we provided “Backspace” and “Delete” buttons at the bottom of the text area. Placing the cursor between words, users could delete the word to the left using “Backspace” and the word to the right using “Delete.” Users could also insert text anywhere the cursor was located by performing press-and-hold on an empty area.
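A minimal Python sketch of this Regular correction scheme follows; the class and method names are hypothetical, but the tap, press-and-hold, Backspace, and Delete behaviors mirror the description above.

```python
# Hypothetical text-editing model for the Regular error correction condition.
class RegularCorrectionUI:
    def __init__(self, words):
        self.words = list(words)    # transcribed words
        self.cursor = len(words)    # cursor index between words

    def on_tap_word(self, index):
        """Tapping a word displays an n-best list of alternates (stub)."""
        print(f"show alternates for '{self.words[index]}'")

    def on_press_and_hold_word(self, index):
        """Press-and-hold on a word invokes the re-speak or spelling option (stub)."""
        print(f"re-speak / spell '{self.words[index]}'")

    def on_backspace(self):
        """'Backspace' button: delete the word to the left of the cursor."""
        if self.cursor > 0:
            self.cursor -= 1
            del self.words[self.cursor]

    def on_delete(self):
        """'Delete' button: delete the word to the right of the cursor."""
        if self.cursor < len(self.words):
            del self.words[self.cursor]

    def on_press_and_hold_empty(self, new_words):
        """Press-and-hold on an empty area inserts text at the cursor."""
        self.words[self.cursor:self.cursor] = new_words
        self.cursor += len(new_words)

ui = RegularCorrectionUI(["It's", "hard", "to", "wreck", "a", "nice", "beach"])
ui.on_tap_word(3)      # show alternates for "wreck"
ui.on_backspace()      # delete the word to the left of the cursor ("beach")
print(ui.words)
```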

Page 31:

We collected both quantitative and qualitative measures. With respect to quantitative measures, we measured rate of correction and the types of corrections made. With respect to the qualitative measures, we utilized the NASA task load index (NASA-TLX).
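As an illustration of how correction counts of this kind can be computed, the sketch below aligns the initially transcribed text against the user's final text and tallies substitutions, insertions, and deletions via a word-level edit distance. This is an assumed formulation for illustration only, not necessarily the exact metric definition used in the paper.

```python
def count_corrections(transcribed, corrected):
    """Word-level edit-distance alignment between the initially transcribed text
    and the user's final text; returns (substitutions, insertions, deletions).
    Assumed formulation for illustration only."""
    t, c = transcribed.split(), corrected.split()
    n, m = len(t), len(c)
    # dp[i][j] = minimum number of edits to turn t[:i] into c[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if t[i - 1] == c[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # delete a transcribed word
                           dp[i][j - 1] + 1,         # insert a new word
                           dp[i - 1][j - 1] + cost)  # match or substitute
    # Backtrace to classify each edit.
    subs = ins = dels = 0
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + (t[i - 1] != c[j - 1]):
            subs += t[i - 1] != c[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            dels += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return subs, ins, dels

s, i, d = count_corrections("it's hard to wreck a nice beach",
                            "it's hard to recognize speech")
print(s, i, d)  # 2 substitutions, 0 insertions, 2 deletions for this toy pair
```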

Page 32:

Participants

We recruited 24 participants (12 males and 12 females), all of whom were native English speakers. Participants came from a wide variety of occupational backgrounds (e.g., finance, car mechanics, student, housewife). The ages of the participants ranged from 20 to 50 years old (M = 35.13), with roughly equal numbers of participants in each decade.

Page 33:

Quantitative

User correction error rate (UCER) for all four conditions (blue data points: traditional Dictation; red data points: Voice Typing). A repeated-measures ANOVA on UCER yielded a significant main effect of Speech Interaction Model (F(1,46) = 4.15, p < 0.05): Voice Typing (M = 0.10, SD = 0.01) was significantly lower than Dictation (M = 0.14, SD = 0.01).

Page 34:

Average number of substitutions, insertions, and deletions made by the user in order to correct an email in each of the four experimental conditions, and the number of words left uncorrected. For substitutions, we obtained a significant main effect of Error Correction Method (F(1,46) = 5.9, p < 0.05): Marking Menu had significantly more substitutions (M = 7.24, SD = 0.5) than Regular (M = 5.50, SD = 0.5). For insertions, deletions, and identified-but-uncorrected errors, we did not find any significant effects.
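For readers who want to run this kind of analysis themselves, a minimal sketch of a 2x2 repeated-measures ANOVA in Python follows, using statsmodels' AnovaRM on synthetic placeholder data (the per-participant measurements are not available from the slides, so the numbers below are only loosely anchored to the reported means and do not reproduce the paper's results).

```python
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Synthetic placeholder data: one UCER value per participant per condition.
rng = np.random.default_rng(0)
rows = []
for pid in range(24):                                  # 24 participants, as in the study
    for model in ["Dictation", "Voice Typing"]:
        for method in ["Marking Menu", "Regular"]:
            base = 0.14 if model == "Dictation" else 0.10   # loosely anchored means
            rows.append({"participant": pid,
                         "model": model,
                         "method": method,
                         "ucer": base + rng.normal(0, 0.02)})
df = pd.DataFrame(rows)

# 2x2 within-subjects (repeated-measures) ANOVA on UCER.
result = AnovaRM(df, depvar="ucer", subject="participant",
                 within=["model", "method"]).fit()
print(result)
```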

Page 35:

Frequency distribution of system response times across emails in the Voice Typing condition. For most emails the delays were within one standard deviation (0.3 seconds) of the average (1.27 seconds), and all emails were within two standard deviations of the average.

Page 36:

Qualitative

Voice Typing vs. Dictation

A repeated-measures ANOVA on the NASA-TLX data for mental demand yielded a significant main effect of Speech Interaction Model (F(1,46) = 4.47, p = 0.03): Voice Typing (M = 30.90, SD = 19.16) had lower mental demand than Dictation (M = 39.48, SD = 20.85). Furthermore, we found significant main effects of Speech Interaction Model on effort and frustration (F(1,46) = 4.03, p = 0.04 and F(1,46) = 4.02, p = .05, respectively). In both cases, Voice Typing showed significantly lower effort and frustration than Dictation.

Page 37:

Marking Menu vs. Regular

On the NASA-TLX data, we found a significant main effect of Error Correction Method on physical demand (F(1,46) = 4.43, p = 0.03): Marking Menu (M = 19.50, SD = 20.56) had significantly lower physical demand than Regular (M = 28.13, SD = 19.44).

Page 38: