SpeeG - A Multimodal Speech- and Gesture-based Text Input Solution
DESCRIPTION
Presentation given at AVI 2012, International Working Conference on Advanced Visual Interfaces, Capri Island, Italy, May 2012
ABSTRACT: We present SpeeG, a multimodal speech- and body gesture-based text input system targeting media centres, set-top boxes and game consoles. Our controller-free zoomable user interface combines speech input with a gesture-based real-time correction of the recognised voice input. While the open source CMU Sphinx voice recogniser transforms speech input into written text, Microsoft’s Kinect sensor is used for the hand gesture tracking. A modified version of the zoomable Dasher interface combines the input from Sphinx and the Kinect sensor. In contrast to existing speech error correction solutions with a clear distinction between a detection and correction phase, our innovative SpeeG text input system enables continuous real-time error correction. An evaluation of the SpeeG prototype has revealed that low error rates for a text input speed of about six words per minute can be achieved after a minimal learning phase. Moreover, in a user study SpeeG has been perceived as the fastest of all evaluated user interfaces and therefore represents a promising candidate for future controller-free text input.
Paper: http://vub.academia.edu/BeatSigner/Papers/1484787/SpeeG_A_Multimodal_Speech-_and_Gesture-based_Text_Input_Solution
TRANSCRIPT
![Page 1: SpeeG - A Multimodal Speech- and Gesture-based Text Input Solution](https://reader033.vdocuments.site/reader033/viewer/2022061113/545c524fb0af9f9a2c8b47cb/html5/thumbnails/1.jpg)
SpeeG - A Multimodal Speech- and Gesture-based Text Input Solution
Lode Hoste, Bruno Dumas and Beat Signer
![Page 2: SpeeG - A Multimodal Speech- and Gesture-based Text Input Solution](https://reader033.vdocuments.site/reader033/viewer/2022061113/545c524fb0af9f9a2c8b47cb/html5/thumbnails/2.jpg)
SpeeG - Lode Hoste, Vrije Universiteit Brussel
Text-input for set-top boxes
![Page 3: SpeeG - A Multimodal Speech- and Gesture-based Text Input Solution](https://reader033.vdocuments.site/reader033/viewer/2022061113/545c524fb0af9f9a2c8b47cb/html5/thumbnails/3.jpg)
![Page 4: SpeeG - A Multimodal Speech- and Gesture-based Text Input Solution](https://reader033.vdocuments.site/reader033/viewer/2022061113/545c524fb0af9f9a2c8b47cb/html5/thumbnails/4.jpg)
![Page 5: SpeeG - A Multimodal Speech- and Gesture-based Text Input Solution](https://reader033.vdocuments.site/reader033/viewer/2022061113/545c524fb0af9f9a2c8b47cb/html5/thumbnails/5.jpg)
Text-input for set-top boxes
![Page 6: SpeeG - A Multimodal Speech- and Gesture-based Text Input Solution](https://reader033.vdocuments.site/reader033/viewer/2022061113/545c524fb0af9f9a2c8b47cb/html5/thumbnails/6.jpg)
Dasher
8Pen
SwiftKey
Speech Dasher
SpeeG
EdgeWriter
1D Keyboard for Kinect
Virtual Keyboard for Xbox
Chatpad Controller
![Page 7: SpeeG - A Multimodal Speech- and Gesture-based Text Input Solution](https://reader033.vdocuments.site/reader033/viewer/2022061113/545c524fb0af9f9a2c8b47cb/html5/thumbnails/7.jpg)
Virtual keyboard
![Page 8: SpeeG - A Multimodal Speech- and Gesture-based Text Input Solution](https://reader033.vdocuments.site/reader033/viewer/2022061113/545c524fb0af9f9a2c8b47cb/html5/thumbnails/8.jpg)
Kinect 1D keyboard
![Page 9: SpeeG - A Multimodal Speech- and Gesture-based Text Input Solution](https://reader033.vdocuments.site/reader033/viewer/2022061113/545c524fb0af9f9a2c8b47cb/html5/thumbnails/9.jpg)
Kinect 1D keyboard
![Page 10: SpeeG - A Multimodal Speech- and Gesture-based Text Input Solution](https://reader033.vdocuments.site/reader033/viewer/2022061113/545c524fb0af9f9a2c8b47cb/html5/thumbnails/10.jpg)
![Page 11: SpeeG - A Multimodal Speech- and Gesture-based Text Input Solution](https://reader033.vdocuments.site/reader033/viewer/2022061113/545c524fb0af9f9a2c8b47cb/html5/thumbnails/11.jpg)
![Page 12: SpeeG - A Multimodal Speech- and Gesture-based Text Input Solution](https://reader033.vdocuments.site/reader033/viewer/2022061113/545c524fb0af9f9a2c8b47cb/html5/thumbnails/12.jpg)
Dasher
- Continuous input
- Joystick / Gaze / ...
- Open vocabulary
- Allows imprecise navigation
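Dasher arranges the candidate next characters in alphabetic order and gives each one a share of the display proportional to its probability, so likely continuations become large targets that imprecise navigation can still hit. A minimal sketch of that allocation follows. The helper names `layout_boxes` and `pick` and the probability values are hypothetical, chosen for illustration, and are not taken from Dasher or SpeeG.

```python
def layout_boxes(probs, top=0.0, bottom=1.0):
    """Split the vertical span [top, bottom) among candidate characters.

    Characters are placed in alphabetic order and each box height is
    proportional to that character's probability.
    """
    total = sum(probs.values())
    if total <= 0:
        raise ValueError("probabilities must sum to a positive value")
    boxes = {}
    cursor = top
    for ch in sorted(probs):
        height = (bottom - top) * probs[ch] / total
        boxes[ch] = (cursor, cursor + height)
        cursor += height
    return boxes


def pick(boxes, y):
    """Return the character whose box contains the pointer position y."""
    for ch, (lo, hi) in boxes.items():
        if lo <= y < hi:
            return ch
    return None


# Made-up next-character probabilities, for illustration only.
boxes = layout_boxes({"a": 0.2, "e": 0.6, "i": 0.1, "o": 0.1})
print(pick(boxes, 0.5))  # the pointer at mid-height falls inside the box for "e"
```

Zooming can be modelled by calling `layout_boxes` again with the selected box as the new `top` and `bottom`, which nests the next set of characters inside the chosen one.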
![Page 13: SpeeG - A Multimodal Speech- and Gesture-based Text Input Solution](https://reader033.vdocuments.site/reader033/viewer/2022061113/545c524fb0af9f9a2c8b47cb/html5/thumbnails/13.jpg)
Dasher
![Page 14: SpeeG - A Multimodal Speech- and Gesture-based Text Input Solution](https://reader033.vdocuments.site/reader033/viewer/2022061113/545c524fb0af9f9a2c8b47cb/html5/thumbnails/14.jpg)
Goals:
- Controller-free
- Text input
- Without training
Used technologies:
- Kinect
- CMU Sphinx
- Dasher
![Page 15: SpeeG - A Multimodal Speech- and Gesture-based Text Input Solution](https://reader033.vdocuments.site/reader033/viewer/2022061113/545c524fb0af9f9a2c8b47cb/html5/thumbnails/15.jpg)
SpeeG
![Page 16: SpeeG - A Multimodal Speech- and Gesture-based Text Input Solution](https://reader033.vdocuments.site/reader033/viewer/2022061113/545c524fb0af9f9a2c8b47cb/html5/thumbnails/16.jpg)
![Page 17: SpeeG - A Multimodal Speech- and Gesture-based Text Input Solution](https://reader033.vdocuments.site/reader033/viewer/2022061113/545c524fb0af9f9a2c8b47cb/html5/thumbnails/17.jpg)
SpeeG Architecture
[Diagram: User, Speech Recogniser (CMU Sphinx 4), Hand Tracking (Microsoft Kinect and NITE) and GUI (JDasher), with interaction steps numbered 1 to 5]
![Page 18: SpeeG - A Multimodal Speech- and Gesture-based Text Input Solution](https://reader033.vdocuments.site/reader033/viewer/2022061113/545c524fb0af9f9a2c8b47cb/html5/thumbnails/18.jpg)
Evaluation
- SpeeG
- Speech-only
- Virtual Keyboard
- Kinect Keyboard
![Page 19: SpeeG - A Multimodal Speech- and Gesture-based Text Input Solution](https://reader033.vdocuments.site/reader033/viewer/2022061113/545c524fb0af9f9a2c8b47cb/html5/thumbnails/19.jpg)
Evaluation
- 7 (male) users, aged 23-31
- Performed a quantitative (words per minute and number of errors) and qualitative (feedback and preference) evaluation
- Sentences 1-3, from DARPA’s TIMIT: “this was easy for us”, “he will allow a rare lie”, “did you eat yet”
- Sentences 4-6, from MacKenzie and Soukoreff: “my watch fell in the water”, “the world is a stage”, “peek out the window”
show 2 about ‘expertise of users’
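The two quantitative measures named on this slide can be sketched as follows. The slides do not state how they were computed, so both definitions here are assumptions: words per minute uses the common convention that one word is five characters, and errors are counted as the word-level edit distance between the target sentence and the entered text. The function names and the 40-second timing are hypothetical.

```python
def words_per_minute(entered_text, seconds, chars_per_word=5):
    """Text entry speed, assuming one 'word' is chars_per_word characters."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return (len(entered_text) / chars_per_word) * (60.0 / seconds)


def word_errors(target, entered):
    """Word-level edit distance (insertions, deletions, substitutions)."""
    ref, hyp = target.split(), entered.split()
    prev = list(range(len(hyp) + 1))  # distances from an empty target prefix
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # word missing from entry
                            curr[j - 1] + 1,      # extra word in entry
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


# Sentence 5 of the study, with a made-up entry time of 40 seconds.
print(words_per_minute("the world is a stage", 40))                 # 6.0
print(word_errors("the world is a stage", "the word is a stage"))   # 1
```

A character-level distance would be an equally plausible error measure. The word-level variant is used here because speech recognisers typically get whole words wrong.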
![Page 20: SpeeG - A Multimodal Speech- and Gesture-based Text Input Solution](https://reader033.vdocuments.site/reader033/viewer/2022061113/545c524fb0af9f9a2c8b47cb/html5/thumbnails/20.jpg)
Virtual keyboard
[Chart: WPM per sentence (S1 to S6) for Users 1 to 7; annotated value: 6.3 WPM]
![Page 21: SpeeG - A Multimodal Speech- and Gesture-based Text Input Solution](https://reader033.vdocuments.site/reader033/viewer/2022061113/545c524fb0af9f9a2c8b47cb/html5/thumbnails/21.jpg)
Kinect Keyboard
[Chart: WPM per sentence (S1 to S6) for Users 1 to 7; annotated value: 1.83 WPM]
![Page 22: SpeeG - A Multimodal Speech- and Gesture-based Text Input Solution](https://reader033.vdocuments.site/reader033/viewer/2022061113/545c524fb0af9f9a2c8b47cb/html5/thumbnails/22.jpg)
Speech-only
[Chart: WPM per sentence (S1 to S6) for Users 1 to 7; annotated value: 11 WPM]
![Page 23: SpeeG - A Multimodal Speech- and Gesture-based Text Input Solution](https://reader033.vdocuments.site/reader033/viewer/2022061113/545c524fb0af9f9a2c8b47cb/html5/thumbnails/23.jpg)
SpeeG
[Chart: WPM per sentence (S1 to S6) for Users 1 to 7; annotated value: 5.8 WPM]
![Page 24: SpeeG - A Multimodal Speech- and Gesture-based Text Input Solution](https://reader033.vdocuments.site/reader033/viewer/2022061113/545c524fb0af9f9a2c8b47cb/html5/thumbnails/24.jpg)
SpeeG
[Chart: WPM per sentence (S1 to S6) for Users 1 to 7; annotated values: 2.6 and 7.8 WPM]
![Page 25: SpeeG - A Multimodal Speech- and Gesture-based Text Input Solution](https://reader033.vdocuments.site/reader033/viewer/2022061113/545c524fb0af9f9a2c8b47cb/html5/thumbnails/25.jpg)
Mean WPM per sentence and input device
[Chart: mean WPM per sentence (S1 to S6) for Controller, Speech only, Kinect only and SpeeG]
![Page 26: SpeeG - A Multimodal Speech- and Gesture-based Text Input Solution](https://reader033.vdocuments.site/reader033/viewer/2022061113/545c524fb0af9f9a2c8b47cb/html5/thumbnails/26.jpg)
Errors per sentence and input device
[Chart: mean number of errors per sentence (S1 to S6) for Controller, Speech only, Kinect only and SpeeG]
![Page 27: SpeeG - A Multimodal Speech- and Gesture-based Text Input Solution](https://reader033.vdocuments.site/reader033/viewer/2022061113/545c524fb0af9f9a2c8b47cb/html5/thumbnails/27.jpg)
![Page 28: SpeeG - A Multimodal Speech- and Gesture-based Text Input Solution](https://reader033.vdocuments.site/reader033/viewer/2022061113/545c524fb0af9f9a2c8b47cb/html5/thumbnails/28.jpg)
Future work
- Other visualisations
- Smaller gestures
- Dedicated commands (gesture / voice)
![Page 29: SpeeG - A Multimodal Speech- and Gesture-based Text Input Solution](https://reader033.vdocuments.site/reader033/viewer/2022061113/545c524fb0af9f9a2c8b47cb/html5/thumbnails/29.jpg)
![Page 30: SpeeG - A Multimodal Speech- and Gesture-based Text Input Solution](https://reader033.vdocuments.site/reader033/viewer/2022061113/545c524fb0af9f9a2c8b47cb/html5/thumbnails/30.jpg)
SpeeG - A Multimodal Speech- and Gesture-based Text Input Solution
Lode Hoste, Bruno Dumas, Beat Signer
Kinect
- Controller-free text input
- Real-time correction
- Dasher, zoomable interface
  - probabilities
  - alphabetic order
  - character-level
Speech
- Non-native speakers
- Untrained voice recogniser
- 6-12 WPM
- Perceived fastest
- Game-like character
- Novice and experts
Special thanks to Jorn De Baerdenmaeker and Keith Vertanen