
A Voice User Interface for an Activity-Aware Wearable Computing Platform

Udayasimha Reddy.Theepi Reddy

May 2, 2007
Master's Thesis in Computing Science, 20 credits

Supervisor at CS-UmU: Dipak Surie
Examiner: Per Lindström

Umeå University
Department of Computing Science
SE-901 87 UMEÅ
SWEDEN


Abstract

Activity-aware computing is becoming an important research focus within the wearable computing community. Building such systems involves an explicit training phase. This research work investigates the possibility of developing a voice user interface for such an activity-aware wearable computing system. In particular, this thesis work focuses on the training phase of such a system.

A survey of existing user interfaces for wearable computers is presented, together with an in-depth study of voice user interfaces. The study is complemented by a prototypical system capable of storing and retrieving activity-related information using voice as the primary information channel. Even though this research has focused only on designing a voice user interface for the training phase, our approach has shown promise for extending the voice user interface to the other phases of the activity-aware wearable computing platform.


Contents

1 Introduction
   1.1 Outline

2 Problem Description
   2.1 Background
   2.2 Ageing
   2.3 Goal

3 Designing User Interfaces for Wearable Computers
   3.1 What are Wearable Computers?
   3.2 Operational Modes of Wearable Computing
   3.3 Issues Involved in Designing User Interfaces for Wearable Computers

4 Voice User Interface (VUI)
   4.1 Existing User Interfaces for Wearable Computers
   4.2 Voice User Interface
   4.3 Speech Recognition and Speech Synthesis
      4.3.1 Parameters of Speech Recognition
      4.3.2 Terms and Concepts
      4.3.3 Basic Structure
      4.3.4 Applications and Advantages of Speech Recognition
   4.4 Speech Synthesis
      4.4.1 Fundamentals of Speech Synthesis
      4.4.2 Speech Synthesizer Parameters
      4.4.3 Speech Synthesizer Technologies
   4.5 Related Systems

5 Implementation and Evaluation of a Prototypical Voice User Interface
   5.1 Background
   5.2 System Components
      5.2.1 Wearable Computer
      5.2.2 Dragon NaturallySpeaking (Speech Recognition Software)
      5.2.3 Bluetooth Headset with Microphone
   5.3 System Description
   5.4 Two Examples of the Voice User Interface
      5.4.1 Case 1: An Activity/Action is not in the Database
      5.4.2 Case 2: An Activity/Action is in the Database
   5.5 Training Phase
   5.6 Flow Chart Description
   5.7 Evaluation of the Two Examples
      5.7.1 Experimental Setup
      5.7.2 Quantitative Evaluation of Speech-Based Data Retrieval
      5.7.3 Optimal Parameters
      5.7.4 Precision
      5.7.5 Usability Evaluation of the VUI

6 Limitations

7 Conclusions

8 Future Work

9 Acknowledgements

References

A Dragon NaturallySpeaking User Guide
   A.1 Installation Guide for Users
      A.1.1 System Requirements
      A.1.2 Installing Dragon NaturallySpeaking
      A.1.3 Starting to Dictate
      A.1.4 Turning on the Microphone
      A.1.5 Playing Back Dictation in a Document


List of Figures

1.1 The Three Eras of Computing [8]

3.1 The Signal Flow between Human and Computer in the Constancy Mode [22]
3.2 The Signal Flow between Human and Computer in the Augmentation Mode [22]
3.3 The Signal Flow between Human and Computer in the Mediation Mode [22]
3.4 Mixture of Augmentation and Mediation Modes [22]
3.5 The Signal Flow Paths of the Various Attributes of Wearable Computing [22]

4.1 Wearable Twiddler [20]
4.2 Wearable Display [13]
4.3 A Wrist-Worn Linux-Based Wearable Computer [4]
4.4 An Architecture for a Voice User Interface [10]
4.5 Basic Structure of Speech Recognition [12]
4.6 Basic Structure of a Speech Synthesizer [6]
4.7 Architecture of the ISAAC System [28]
4.8 Speech Translator System Architecture [28]
4.9 Speech Translator System Structure [28]
4.10 TIA-P Wearable Computer [28]
4.11 Adtranz System Architecture [28]
4.12 MoCCA System Architecture [28]

5.1 Basic Architecture of the Existing Activity-Aware Computing Platform [30]
5.2 Wearable Computer [5]
5.3 Dragon Speech Recognition Toolbar [1]
5.4 Bluetooth Headset with Microphone
5.5 Voice User Interface Architecture
5.6 An Activity/Action is not in the Database
5.7 An Activity/Action is in the Database
5.8 Flowchart Description of the Voice User Interface


List of Tables

4.1 Parameters of Speech Recognition [25]
4.2 Speech Recognition Techniques [21]
4.3 Some of the Available Speech Recognition (SR) Software and their Vendors [16]
4.4 Some of the Available Speech Recognition (SR) Hardware Modules and their Manufacturers [16]

5.1 Specifications of the Wearable Computer [5]
5.2 The List of Activities and Actions used during the Simple Sentence Evaluation [2]
5.3 Precision of the System
5.4 User Comments about Controlling the System
5.5 User Comments about the Response of the System
5.6 User Comments about the Flexibility of the System
5.7 User Comments about the Usability of the System

A.1 Speech Recognition Software Programs and Languages (SRSP) [16]


Chapter 1

Introduction

According to Mark Weiser, a new era has started within the computing world where people can potentially interact with hundreds of computers embedded in the environment and in the clothes that a user wears. This new era was termed by Mark Weiser the era of Ubiquitous Computing, also known in short as Ubicomp [8].

Mark Weiser divides the computing world into three eras, as shown in fig 1.1. The first era is known as the Mainframe era, where many people worked on a single computer to accomplish the tasks assigned to that computer; users interacted with a single shared computer. The second era is known as the Desktop era or the PC era, where a single person works on a single computer to complete the tasks assigned to that computer. This era maintains a one-to-one interactivity relation between users and computers. Finally, the third era is known as the Ubicomp era, where many computers are used by a single user to accomplish his/her tasks [8]. This era maintains a one-to-many interactivity relation between users and computers.

Figure 1.1: The Three Eras of Computing. [8]

Ubicomp is a new era that will enhance computer usage by embedding computers into the objects available in the physical environment and by making the use of these computers invisible. In this era, computers are embedded into several devices such as the television, the fridge, the wrist watch, and the clothes that we wear.

The ultimate goal of Ubicomp is to make interactions with computing devices easier and to enhance the living standards of people using computing technologies [15]. The vision is to make interactions with computers so easy that the user does not have to pay much attention to the computers [27].

Another main goal of this era is to provide better services to users, which demands the presence of computers at any place, at any time, and in any context, so that there is continuous interaction between users and computers. This need introduces two new research areas, wearable computing and mobile computing, which investigate placing computers on the user's body and providing freedom of mobility, respectively. According to Thad Starner, wearable computing pursues an interface ideal of a continuously worn, intelligent assistant that augments memory, intellect, creativity, communication, and physical senses and abilities.

This thesis work aims to develop a voice user interface for an activity-aware wearable computing platform. In recent years, wearable computing has become more popular because of advancements in sensing technology, processing power, storage capacity, and battery power. This thesis focuses on user interaction with wearable computers. In particular, this research work focuses on storing user-defined activity and action names in a database, retrieving them using voice as input if they are already present in the database, and storing them as a new activity or action if they are not. This work is part of the easyADL project, which intends to design an activity-aware wearable computer, complemented by simple state-change sensors and RFID tags attached to everyday objects, capable of supporting everyday activities performed by patients suffering from mild dementia in a home environment. Examples of activities include preparing the table for lunch, preparing rice, doing the dishes, etc. This thesis is presented as follows:

1.1 Outline

– Chapter 2 discusses the background and the goals of this thesis.

– Chapter 3 discusses designing user interfaces for wearable computers.

– Chapter 4 provides an in-depth study of voice user interfaces and speech recognition techniques.

– Chapter 5 discusses the implementation and evaluation of a prototypical voice user interface for an activity-aware wearable computing platform.

– Chapter 6 presents the limitations.

– Chapter 7 presents the conclusions.

– Chapter 8 presents future work.


Chapter 2

Problem Description

2.1 Background

With the continuous increase in the number of older citizens, the demand for age-related health care is becoming a significant problem from both a human and an economic point of view. The easyADL project investigates the possibility of using wearable and ubiquitous computing technologies to support mild dementia patients in completing their activities of daily living (ADL) [2]. This Master's thesis is a part of the easyADL project and focuses on developing a voice user interface for an activity-aware wearable computing system that provides assistance to patients suffering from mild dementia. The general description of dementia according to the World Health Organization is: "Dementia is a syndrome due to disease of the brain, usually of a chronic or progressive nature, in which there is disturbance of multiple higher cortical functions, including memory, thinking, orientation, comprehension, calculation, learning capacity, language, and judgement. Consciousness is not clouded. Impairments of cognitive function are commonly accompanied, and occasionally preceded, by deterioration in emotional control, social behaviour, or motivation. This syndrome occurs in Alzheimer's disease, in cerebrovascular disease, and in other conditions primarily or secondarily affecting the brain" [3].

Persons suffering from dementia may face problems with the following abilities: the ability to learn new things, the conception of time, and retrieving information from short-term memory, which causes an increased need to use reminders [3]; finding words and understanding speech; and interpreting visual input, which leads to reduced visuo-spatial function and an increased tendency to mix up objects. The patients may also be impatient and less motivated to ask for help [24].

The user interface is a key component of any system that requires humans to communicate with it. Considering the issues discussed above, it is quite necessary to develop an interface that helps individuals suffering from dementia to interact with the activity-aware wearable computer with less effort. Other members of the easyADL project have interviewed six occupational therapists working with dementia patients. According to the analysis of those interviews, a voice user interface seems to be a good modality for the wearable computer to interact with the patients, as long as they do not have a disability that prevents them from speaking or hearing.


2.2 Ageing

It is stated that the risk of a person being affected by dementia increases with age. According to [9], the most common impairments suffered by aged persons are visual, motor, and hearing impairments. So if dementia patients have hearing impairments, we have to consider other interfaces, such as a gesture interface, to compensate for the impairment and enable the patient to still interact with the wearable computer. It is quite important to include these impairments in the requirements specification. The main goal of this research work is to develop a voice user interface for a wearable activity-aware computing platform.

2.3 Goal

The goal of this thesis project is to design a voice user interface for an activity-aware wearable computer to be used by dementia patients. The following are the sub-goals of this thesis:

1. To survey current user interface solutions for wearable computers. This includes the voice user interface, the graphical user interface, the touch interface, the tactile interface, etc. Usability is a main characteristic of a user interface. Usability measures the degree to which the design of a particular user interface takes human psychology and physiology into account, in a way that improves the effectiveness, efficiency, and satisfaction of users in using a particular system. In brief, usability describes how well a particular system can be used by its users with efficiency, effectiveness, and satisfaction. Functionality and features are not always part of the user interface but are treated as key elements in the usability of a product. Here we discuss existing user interfaces for wearable computers.

2. To do an in-depth study of existing voice user interfaces for wearable computers. In particular, we discuss voice user interfaces, types of voice user interfaces, advantages and disadvantages of the voice user interface, etc.

3. To identify and discuss the components useful in designing voice user interfaces.

(a) Speech Recognition Component: Recognition depends on various parameters that help to categorize speech recognition systems. This section provides the details of the various parameters on which speech recognition depends. We also discuss speech recognition techniques and speech recognition software/hardware components.

(b) Speech Synthesizer Component: In this section we discuss the speech synthesizer, types of speech synthesizers, and their problems and limitations.

4. Implementation of a voice user interface that allows a user to store and retrieve activity/action-related data that are used by the activity-aware wearable computer to train the system on user activities/actions (a minimal sketch of this store-or-retrieve behaviour follows this list). In this section we discuss the implementation of a prototypical voice user interface, the system components, the system description, the flowchart description, and the evaluation of the system.
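To make the store-or-retrieve behaviour of sub-goal 4 concrete, the following Python sketch illustrates the two cases handled by the prototype (an activity/action already in the database versus a new one). This is an illustration only, not the thesis's actual implementation; the class and method names are hypothetical, and the speech recognizer is assumed to have already converted the utterance to text.

    # Hedged sketch: store-or-retrieve logic for user-defined activity and
    # action names, keyed by recognized text.
    class ActivityStore:
        def __init__(self):
            self.activities = {}  # activity name -> list of action names

        def handle(self, activity, action=None):
            """Return spoken feedback; unknown names become new entries."""
            if activity not in self.activities:
                self.activities[activity] = []          # Case 1: new activity
                feedback = "New activity '%s' stored." % activity
            else:
                feedback = "Activity '%s' retrieved." % activity  # Case 2
            if action is not None and action not in self.activities[activity]:
                self.activities[activity].append(action)
                feedback += " New action '%s' stored." % action
            return feedback

    store = ActivityStore()
    print(store.handle("preparing rice", "boil water"))  # both are new
    print(store.handle("preparing rice"))                # already known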


Chapter 3

Designing User Interfaces for Wearable Computers

3.1 What are Wearable Computers?

Wearable computers are light-weight computers worn by users on their body. These computers provide a new form of human interaction with computers. They are supposed to be accessible and operable anywhere and at any given instant in time. According to [10], [22], there are three reasons for shifting from the desktop computing paradigm to the wearable computing paradigm for personal computing: 1) reduction in the size of computers, 2) increased mobility of computing usage, and 3) the requirement of additional personalization of computing devices. Wearable computers are designed to be always present with the user and to provide the flexibility to perform activities in the physical world while interacting with a computing device.

One important feature of the wearable computer is its constancy. This creates a continuous interaction between the computer and the user without turning the wearable computer ON/OFF. Moreover, the wearable computer is designed with the ability to perform multiple tasks, i.e., a user can use this computer while performing other activities in the physical world [22].

3.2 Operational Modes of Wearable Computing

According to [22], there exist three operational modes:

1. Constancy: The system provides continuous interaction with the user once the system is switched ON, i.e., the signal flow between the human and the wearable computer is continuous, without any breaks, as shown in fig 3.1 [22]. The wearable computer may have a power-saving sleep mode so that it can work for a longer period of time. More importantly, it is also interactionally constant, i.e., the device's inputs and outputs are always potentially active and operationally constant. Interactionally constant implies operationally constant, but operationally constant does not necessarily imply interactionally constant. For example, a mobile phone in a user's shirt pocket is operationally constant but is not interactionally constant unless attached to a hands-free headset. A wrist

watch is an example that is both operationally constant and interactionally constant: it works continuously, without any breakpoints, and is worn on the body all the time [29].

Figure 3.1: The Signal Flow between Human and Computer in the Constancy Mode [22].

2. Augmentation: The notion of wearable computing allows its users to engage in multiple tasks beyond computing alone, unlike desktop computers. Thus, wearable computers should serve to augment the intellect and the senses of the user. Fig 3.2 shows how the signal flows in the augmentation mode [22]. The wearable computer should augment the intellect or the senses without disturbing the user's primary task. This implies that background knowledge should be obtained from sensors like a wearable camera and microphone, wearable RFID readers, etc. [29]. Fig 3.4 shows the signal flow for a mixture of the augmentation and mediation modes.

3. Mediation: Unlike handheld devices, laptop computers, and PDAs, a wearable computer may encapsulate the user's interaction with the environment (including computers in the environment) [29]. There exist two aspects of this encapsulation [22]:

(a) Solitude: The wearable computer can function as an information filter, allowing the user to block out material which he/she might not wish to experience. It may block out offensive advertising, or simply serve a desire to replace existing media with different media [22].

(b) Privacy: Mediation allows us to block or modify information leaving the user's encapsulated space. In the same way that ordinary clothing prevents others from seeing our naked bodies, the wearable computer may, for instance, serve as an intermediary for interacting with untrusted systems such as third-party digital anonymous cash cyberwallets [22].


Figure 3.2: The Signal Flow between Human and Computer in the Augmentation Mode [22].

Figure 3.3: The Signal Flow between Human and Computer in the Mediation mode [22].

Fig 3.3 shows the enhanced signal flow more explicitly. It depicts the computer and the human as two separate entities, with an optional protective shell that enables both augmentation and mediation according to the user's preferences [29].

3.3 Issues Involved in Designing User Interfaces for Wearable Computers

The following issues are important to consider when designing user interfaces for wearable computers [10]:

1. Mobility: The usage of wearable computers mainly takes place in mobile environments, resulting in a need to provide anytime, anywhere personal computing support. These computers should be operable while the user performs other activities like walking or driving a car.


Figure 3.4: Mixture of Augmentation and Mediation Modes [22].

2. Assistance: Wearable computers are also mainly used to assist users in performing real-world tasks, rather than supporting only dedicated tasks in the virtual world.

3. Unobtrusiveness: Wearable computers should be designed in such a way that the user's interaction with them does not affect the user's real-world interaction. These computers should allow the user to devote most of his or her attention to the real-world task the user is involved in, while still allowing access to information through the wearable computer.

Six attributes of Wearable Computing: The following are the six attributes that belong to the newly emerging interaction between the user and his/her wearable computer [22]. Fig 3.5 shows the signal flow paths of the various attributes of wearable computing.

1. User attention: As discussed above, wearable computers should not restrict the user to performing only a computing task while interacting with them. Instead, they should permit the user to perform multiple tasks in the real world apart from computing. This raises the issue of how to get the user's attention while the user focuses on performing tasks in the real world. The user interface should be designed in such a way that the system is able to capture the user's attention when required.

2. Controllable: The user should be able to take control of the system at any time.

3. Context-aware: The wearable computer should be aware of the environment in which the system is used. This could include the location of the user, the resources available in the near vicinity, the user's current activity, and other information that is useful in providing personal computing assistance to the user.

4. Multimodal interaction: The user might prefer to interact with the wearable computer using several modalities, including the senses of vision, audition, and tactility, depending on the user's context and his/her preference.

5. Multimodal sensing: Wearable computers in general include wearable sensors to capture various kinds of information without explicit input from the user. The user interface should consider means to reduce the need for explicit input from the user and utilize the information sensed by the wearable sensors.

6. Communication: Wearable computers should allow for communication with other devices and with other people.

Figure 3.5: The Signal Flow Paths of the Various Attributes of Wearable Computing [22].


Chapter 4

Voice User Interface (VUI)

Human-Computer Interaction (popularly known as HCI) emerged as a new field of study in the mid-1980s. This field concerns all aspects necessary to achieve better interaction between humans and computers [31]. Other names for HCI are Man-Machine Interaction (MMI) and Computer-Human Interaction (CHI). The basic definition of HCI is: "Human-Computer Interaction is a discipline concerned with the design, evaluation and implementation of interactive computing systems for human use and with the study of major phenomena surrounding them" [31].

In brief, HCI is a socio-technological research area whose goal is to provide effective communication between users and computers. This goal can be achieved by maintaining easy access to computers, so that users feel comfortable using computers successfully in their activities. HCI draws on computer science, computer and communications engineering, graphic design, management, psychology, and sociology in its endeavour to make computers more usable in performing various tasks for their users [17]. According to [17], design in HCI is more complex than in any other field of engineering, because it is influenced by diverse areas such as computer graphics, software engineering, human factors, and psychology. Making a complex system simple by providing effective interaction to its users is in itself a complex task. The main concerns of HCI are as follows:

1. Methodologies and processes for designing interfaces.

2. Methods for implementing interfaces (e.g. software toolkits and libraries; efficientalgorithms).

3. Techniques for evaluating and comparing interfaces.

4. Developing new interfaces and interaction techniques.

5. Developing descriptive and predictive models and theories of interaction.

The long-term goal of HCI is to minimize the barrier between humans and computers, such that the computer can understand and perform the tasks of its users more easily and efficiently.


4.1 Existing User Interfaces for Wearable Computers

The user interface deals with the methods or ways that help users effectively interact with various systems like machines, computers, or other complex tools. The user interface also helps users control the system and assess its state. The following are the common types of user interfaces that exist in the field of wearable computing:

1. Graphical User Interface: A system gets input from devices like a keyboard or mouse and produces graphical output on a monitor. In wearable computing, these input and output devices are replaced by specially designed devices. Fig 4.1 shows a device designed to replace a conventional keyboard, called the

Figure 4.1: Wearable Twiddler [20]

wearable Twiddler. It is used as an input device in various wearable computing applications, including applications developed for disabled persons. The Twiddler is designed to be operated using a single hand. Fig 4.2 shows a display designed to replace monitors in presenting graphical output to the user from the wearable computer.

2. Touch Interface: This is a type of interface where the user provides input to the system using a touch screen. It is used in various industrial processes, self-service machines, etc., as discussed in [14]. For example, a touch interface could be used in displays worn like a wristwatch, as shown in Fig 4.3.

3. Voice User Interface: The voice user interface is designed to enable the computer to interact with the user in the same way that a person naturally communicates with other people, using voice. Issues such as how to change topics while interacting with the wearable computer, speaking in phrases, and feedback are handled in a natural and intuitive manner using a conversational VUI [7]. There are many advantages


Figure 4.2: Wearable Display [13]

Figure 4.3: A Wrist-Worn Linux based Wearable Computer[4]

of using a voice user interface, such as hands-free control, alternate control, extension capabilities, task adaptability, consistency of interface, and commonality of usage [18]. The main disadvantage of the voice user interface is background noise [21].

4. Tactile Interface: This is an interface developed to communicate information using the sense of touch, which is emerging as a new area of research. The main aim of this type of interface is to provide the desired output to users through the sense of touch, which we use effectively in everyday life. This interface is particularly used in computerized simulators. The vibrotactile interface, developed based on a multi-layer approach, is a good example of a tactile interface [14].

5. Gesture Interface: Human gestures may play a very prominent role in interacting with computers. Research in ubiquitous computing needs to involve computers in the everyday activities of humans, where gesture interfaces could play an interesting role. The reason for using human gestures in ubiquitous computing is to achieve better communication with those computers. For example,


if we install small devices that enable gesture interaction, then the user can interact using basic gestures, like raising a hand down/up to close/open the door or to increase/decrease the volume of the stereo. Input is accepted in the form of hand gestures, head gestures, etc. [26].

4.2 Voice User Interface

Figure 4.4: An architecture for Voice User Interface [10]

Figure 4.4 describes an architecture for the voice user interface, which takes voice as input and produces voice as output. The first step in a VUI is to feed the voice input to a speech recognition (SR) system, which converts it to text output. This output text is given as input to the voice synthesizer, which converts the given input text into voice. The output obtained from the voice synthesizer can be used to present voice-based feedback to the user. The voice user interface (VUI) is a newly emerging user interface that has the potential to play a key role in future software applications developed for Ubicomp. Voice user interfaces are classified into two types based on their mode of operation:

1. Menu-Based VUI: In this type of VUI, the user must utter certain keywords in order to accomplish certain basic functions. This VUI follows a step-by-step procedure in processing commands or actions. This VUI can be treated as an alternative to the well-known touch-tone design, being more systematic and methodical [7].

Apart from the many advantages provided by VUIs, users may face some problems in using them [23]:

(a) Users may find the VUI difficult to use if the design consists of excessive steps in processing commands.

(b) Users may lose track while a command is in process.


(c) Users may find it difficult to speak in short spurts, where speaking is unnatural.

(d) Users may find it difficult to memorize the keywords which are used to process commands.

Menu-based VUIs are suitable in highly controlled environments where the system consists of a limited number of commands.

2. Conversational VUI: This VUI aims to support a larger vocabulary and the spontaneous spoken language that is exchanged as part of a fluent dialogue between a user and the computer. In this VUI mode, the user does not require any experience with the application and can simply interact with the computer as if speaking to a real person. Users might feel more comfortable and more satisfied using these VUIs compared with menu-based VUIs [23].

The following features of the conversational VUI are beneficial to consider when designing a VUI for wearable computers [23]:

(a) Easy to use: The user can communicate with these systems conveniently without following any hierarchy.

(b) Natural: The communication with these systems is natural, as users use everyday language in their communication with them.

(c) Efficient: The convenience for users in using these systems increases the efficiency of these systems.

(d) User-Controlled: In these systems, users have the opportunity to control the system at any point in time using their voice. Thus, the system eliminates the chance of the user losing track.

(e) Context-Sensitive: The system keeps track of the user's contextual information and provides invisible help that enables efficient usage of the system by its users.

The ease of using these types of VUIs helps in building complex applications, such as voice portals. These VUIs have resulted in rapid development in text-to-speech research. We can consider the new conversational VUIs as social interfaces, where the user's response to the computer is converted to a language understandable by the other social partner and vice versa.
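To illustrate the contrast between the two VUI types, the following Python sketch shows the core of a menu-based VUI: a fixed keyword vocabulary mapped to commands, with a rejection branch when no keyword matches. This is a hypothetical illustration, not code from the thesis; the command names are invented.

    # Hedged sketch: keyword dispatch for a menu-based VUI.
    def answer_call():
        return "Answering the call."

    def read_messages():
        return "You have no new messages."

    COMMANDS = {
        "answer": answer_call,     # keyword -> action
        "messages": read_messages,
    }

    def dispatch(recognized_text):
        """Route recognized text to a command, or reject the utterance."""
        for keyword, action in COMMANDS.items():
            if keyword in recognized_text.lower():
                return action()                    # acceptance
        return "Sorry, I did not understand."      # rejection

    print(dispatch("Please answer the phone"))
    print(dispatch("What is the weather like"))

A conversational VUI would replace the fixed keyword table with a large-vocabulary recognizer and a dialogue manager that tracks context across turns.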

4.3 Speech Recognition and Speech Synthesis

Speech recognition will change our way of interacting with machines, where every command to a machine is given by voice in a hands-free manner. Speech recognition allows a user to give input to a system using his/her voice, and the system makes use of a microphone to read the provided input. The basic definition of speech recognition is: "Speech recognition is the process of converting a speech signal to a sequence of words, by means of an algorithm implemented as a computer program" [25]. So we can also refer to speech recognition as speech-to-text. To implement this process, the system makes use of a software component known as a speech recognition engine. The primary goal of a speech


recognition engine is to recognize the speech, process the speech, and translate the speech into text which an application understands [19]. The translated text that an application gets from recognized speech is handled in two ways [19]:

1. The application can interpret the result of the recognition as a command. In this research work, we use speech recognition for command/control applications.

2. The application just returns the translated text of the recognized speech. In this case, the application is termed a dictation application.

4.3.1 Parameters of Speech Recognition

Speech recognition depends on various parameters that help to categorize speech recognition systems. The following table provides details of the parameters that are important for speech recognition [25].

Table 4.1: Parameters of Speech Recognition [25]

    Parameter        Range
    Speaking Mode    Isolated words to continuous speech
    Speaking Style   Read speech to spontaneous speech
    Enrollment       Speaker-dependent to speaker-independent
    Language Model   Finite state to context-sensitive
    Transducer       Voice-cancelling microphone to telephone

The first parameter depends on the speaker's mode of speaking. This mode ranges from isolated words to continuous speech. The second parameter deals with different styles of speech. There exist various styles of speech, like spontaneous speech, speech read from a script, etc. It is stated [25] that spontaneous speech is more difficult to recognize than speech read from a script, as it contains disfluencies. Depending on speaker enrollment, there exist two types of speech recognition systems: speaker-dependent and speaker-independent. In a speaker-dependent system, the speaker initially trains the system before using it. On the other hand, a speaker-independent system can identify any speaker's voice. The vocabulary of speech also plays an important role in speech recognition. Recognizing speech is more difficult when vocabularies are large or contain many similar-sounding words. Language models help in organizing speech. Perplexity is a measure that helps to quantify the difficulty of a particular task by combining the vocabulary size with a language model; it is defined as the geometric mean of the number of words that can follow a word after the language model has been applied. There also exist some external parameters that affect the performance of a speech recognition system, including the characteristics of the environmental noise and the placement of the microphone [25].
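As a concrete reading of this verbal definition (the standard language-modelling form, which [25] does not spell out, so this formula is an assumption), the perplexity of a word sequence W = w_1 w_2 ... w_N under a language model P can be written as

    PP(W) = P(w_1 w_2 \ldots w_N)^{-1/N}
          = \left( \prod_{i=1}^{N} \frac{1}{P(w_i \mid w_1 \ldots w_{i-1})} \right)^{1/N}

i.e., the geometric mean of the inverse conditional word probabilities, which behaves like the average number of words the model considers possible at each position. For instance, a uniform model over a 1,000-word vocabulary has a perplexity of 1,000.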

4.3.2 Terms and Concepts

When developing a software application that includes speech recognition, it is quite important to know the terms and concepts that are fundamental to speech recognition [19].

1. Utterances: An utterance is any stream of speech between two periods of silence. In brief, when the user says something, that utterance is sent to the speech engine for processing (a simple energy-based sketch of utterance endpointing follows this list).


2. Pronunciation: The speech engine uses pronunciations to process speech. A pronunciation deals with the way a word should sound to make it recognizable by the speech engine for further processing.

3. Grammar: It is important to specify the words and phrases that the user can use while interacting with an application through the speech recognition component. Hence the grammar becomes important in defining those words and phrases.

4. Accuracy: Accuracy is an important concept that helps to measure the quality of an application developed using speech recognition. It evaluates how well the application recognizes the user's speech in terms of utterances.
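The following Python sketch makes the utterance definition above concrete: speech is segmented wherever the short-time energy stays below a threshold for long enough. This is a hedged illustration under simplified assumptions (a plain energy threshold, a numpy sample array, and hand-picked constants); real engines use far more robust endpointing.

    import numpy as np

    # Hedged sketch: find utterances as speech between two periods of silence.
    def find_utterances(samples, rate, frame_ms=20,
                        energy_thresh=1e-3, min_silence_frames=15):
        frame_len = int(rate * frame_ms / 1000)
        n_frames = len(samples) // frame_len
        # Mark each frame as voiced when its mean energy exceeds the threshold.
        voiced = [np.mean(samples[i*frame_len:(i+1)*frame_len].astype(float)**2)
                  > energy_thresh for i in range(n_frames)]
        utterances, start, silence = [], None, 0
        for i, v in enumerate(voiced):
            if v:
                if start is None:
                    start = i
                silence = 0
            elif start is not None:
                silence += 1
                if silence >= min_silence_frames:    # long silence ends it
                    utterances.append((start * frame_len,
                                       (i - silence + 1) * frame_len))
                    start, silence = None, 0
        if start is not None:                        # speech ran to the end
            utterances.append((start * frame_len, n_frames * frame_len))
        return utterances  # list of (start_sample, end_sample) pairs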

4.3.3 Basic Structure

In this section, the basic structure followed by a speech recognition system is discussed, as shown in fig 4.5. As already stated, the speech recognition system handles the complex task of translating raw speech input into recognized text. To achieve this task, it uses the following components in its structure [19]:

1. Audio input

2. Grammar(s)

3. Acoustic Model

4. Recognized text

Figure 4.5: Basic Structure of Speech Recognition [12].

The first step in speech recognition is to capture the audio input. The audio input contains both speech data and background noise, so the SR system should adjust to the environment to successfully differentiate the background noise from the speech.

The most important step of a speech recognition system is to process the speech and convert it to text. To accomplish this task, the system thoroughly analyzes the audio input and converts it into the format best suited for further analysis. It makes use of grammars that help in defining the words and phrases a user can employ in interacting with the system, and of an acoustic model that helps in better understanding the environment and in adjusting to it. The final step is to return the recognized text


that best matches what the user said. There exist two types of states while processing an utterance: if the system recognizes the text and performs the action, that state is termed acceptance; if the system fails to recognize the text, it is termed rejection.
Available Speech Recognition Techniques: The most popular techniques used in speech recognition systems are based on Hidden Markov Models. Other techniques available for use in speech recognition systems are (refer to table 4.2 for more information; a small sketch of dynamic time warping, the template-matching technique in table 4.2, follows this list):

1. Artificial Neural Networks (ANN): ANNs can be used to classify speech units such as words. This is the way ANNs were initially used, on simple speech recognition problems.

2. Back Propagation Algorithm (BPA): Backpropagation is a popular learning technique used for training artificial neural networks.

3. Fast Fourier Transform (FFT): An efficient algorithm to compute the discrete Fourier transform (DFT) and its inverse. FFTs are of great importance to a wide variety of applications, from digital signal processing to solving partial differential equations to algorithms for quickly multiplying large integers.
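The sketch below shows dynamic time warping (DTW), the template-matching step listed in table 4.2: an input feature sequence is aligned against stored templates, and the closest template wins. This is a hedged toy example; real systems compare sequences of LPC or similar feature vectors, not the scalar placeholders used here.

    import numpy as np

    # Hedged sketch: DTW alignment cost between two feature sequences.
    def dtw_distance(seq_a, seq_b):
        n, m = len(seq_a), len(seq_b)
        cost = np.full((n + 1, m + 1), np.inf)
        cost[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                d = np.linalg.norm(np.asarray(seq_a[i-1]) - np.asarray(seq_b[j-1]))
                # extend the cheapest of match, insertion, or deletion
                cost[i, j] = d + min(cost[i-1, j-1], cost[i-1, j], cost[i, j-1])
        return cost[n, m]

    # Toy usage: pick the stored template closest to the input utterance.
    templates = {"yes": [[1.0], [2.0], [1.0]], "no": [[0.0], [0.5], [0.0]]}
    utterance = [[1.1], [1.9], [1.2], [1.0]]
    print(min(templates, key=lambda w: dtw_distance(utterance, templates[w])))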


Table 4.2: Speech Recognition Techniques [21]
(DTW = Dynamic Time Warping, HMM = Hidden Markov Models, ANN = Artificial Neural Networks)

    Technique             Sub-Technique  Relevant Variable(s)/      Input                   Output
                                         Data Structure
    Sound Sampling        All            Analog sound signal        Analog sound signal     Digital sound signal
    Feature Extraction    DTW            Statistical features       Digital sound samples   Acoustic sequence template
                                         (e.g. LPC coefficients)
                          HMM            Subword features           Digital sound samples   Subword features
                                         (e.g. phonemes)                                    (e.g. phonemes)
                          ANN            Statistical features       Digital sound samples   Statistical features
                                         (e.g. LPC coefficients)                            (e.g. LPC coefficients)
    Training and Testing  DTW            Reference model database   Acoustic sequence       Comparison score
                                                                    templates
                          HMM            Markov chain               Subword features        Comparison score
                                                                    (e.g. phonemes)
                          ANN            Neural network with        Statistical features    Positive/negative output
                                         weights                    (e.g. LPC coefficients)


Table 4.3 shows some of the available speech recognition (SR) software packages and their vendors. Table 4.4 shows some of the available speech recognition hardware modules.

Table 4.3: Some of the Available Speech Recognition (SR) Software and their Vendors [16]

    Software                                         Vendor
    IBM ViaVoice                                     IBM, http://www306.ibm.com/software/voice/viavoice/
    Dragon NaturallySpeaking 9 SDK                   Nuance, http://www.nuance.com/naturallyspeaking/sdk/
    Voxit                                            http://www.voxit.se/ (Swedish)
    VOICEBOX: Speech Processing Toolbox for MATLAB   http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html
    Java Speech API                                  Sun Microsystems, Inc., http://java.sun.com/products/javamedia/speech/index.jsp
    The CMU Sphinx Group Open Source Speech          http://cmusphinx.sourceforge.net/html/cmusphinx.php
    Recognition Engines
    SpeechStudio Suite                               SpeechStudio Inc., http://www.speechstudio.com/

Table 4.4: Some of the Available Speech Recognition (SR) Hardware Modules and their Manufacturers [16]

    SR Module                                   Manufacturer
    Voice Extreme™ Module                       Sensory, Inc., http://www.sensoryinc.com/
    VR Stamp™ Module                            Sensory, Inc., http://www.sensoryinc.com/
    HM2007 Speech Recognition Chip              HUALON Microelectronic Corp., USA
    OKI VRP6679 Voice Recognition Processor     OKI Semiconductor and OKI Distributors, Corporate Headquarters,
                                                785 North Mary Avenue, Sunnyvale, CA 94086-2909
    Speech Commander                            Verbex Voice Systems, 1090 King Georges Post Rd., Bldg 107,
                                                Edison, NJ 08837, USA
    Voice Control Systems                       Voice Control Systems, Inc., 14140 Midway Rd., Dallas, TX 75244,
                                                USA, http://www.voicecontrol.com/
    VCS 2060 Voice Dialer                       Voice Control Systems, 14140 Midway Rd., Dallas, TX 75225, USA,
                                                http://www.voicecontrol.com/


4.3.4 Applications and Advantages of Speech Recognition

The applications in which speech recognition is mostly used are [12]:

1. Dictation: This is the most common application in which speech recognition (SR) plays a prominent role. This application includes medical transcription, legal and business dictation, etc. Special vocabularies are used to increase the accuracy of the system.

2. Command and Control: Using speech recognition (SR), we can also send commands and controls to applications. Such SR systems are termed command-and-control systems. For example, an SR system may receive the command "open Microsoft Word" and the application will automatically open Microsoft Word.

3. Telephony: Some voice mail systems allow users to utter commands instead of pressing buttons to send specific tones.

4. Wearables: A suitable way to interact with wearable computers is by using a voice user interface.

5. Medical/Disabilities: Speech recognition can be one of the most suitable methods for effective computer use by many people with disabilities.

6. Embedded Applications: A new feature added to some mobile phones allows the user to call a person directly by just uttering his/her name, thereby freeing the user from typing the number. For example, if a person wants to call home, he just says "call home" and the phone automatically places a call to the home number.

Advantages of Speech Recognition:

Speech recognition will play a prominent role in many areas because of the following features it adds to an application [18]:

1. Hands-free control

2. Alternate control

3. Extension capabilities

4. Task adaptability

5. Consistency of interface

6. Commonality of usage

4.4 Speech Synthesis

Speech synthesis is the process of producing sound/speech output through a machine. The machine that is used to produce speech is termed a speech synthesizer.


4.4.1 Fundamentals of Speech Synthesis

A block diagram of the steps involved in speech synthesis is shown in Fig 4.6. The speech synthesizer converts the text into a sequence of speech units via lexical access routines. Using large speech units, such as phrases and sentences, can give high-quality output speech but requires more memory. The stored speech units are retrieved and concatenated to output the synthesized speech [6].

Figure 4.6: Basic Structure of a Speech Synthesizer [6].

4.4.2 Speech Synthesizer Parameters

1. Naturalness: The naturalness of a speech synthesizer is defined as how much the output sounds like the speech of an actual person.

2. Intelligibility: The intelligibility of a speech synthesizer deals with how easily the output can be understood.

4.4.3 Speech Synthesizer Technologies

The speech synthesizer technologies used for generating synthetic speech waveforms are concatenative synthesis, formant synthesis, etc.

1. Concatenative synthesis: Concatenative synthesis is based on the concatenation of segments of recorded speech. This technique produces the most natural-sounding speech from a synthesizer. There are three main subtypes of concatenative synthesis (a minimal sketch of the concatenation idea is given after item 2 below):

(a) Unit selection synthesis

(b) Diphone synthesis

(c) Domain-specific synthesis


2. Formant-based synthesis: Formant-based synthesis does not use any human speech samples at runtime. Instead, the output synthesized speech is created using an acoustic model. This is also known as rule-based synthesis.
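The following Python sketch illustrates the core concatenation idea behind concatenative synthesis: recorded waveform units are looked up for each speech unit in the input and joined, with a short crossfade to reduce clicks at the joins. It is an illustration only; the unit inventory is a hypothetical dictionary of diphone-like units mapped to (here, silent placeholder) sample arrays.

    import numpy as np

    # Hypothetical unit inventory: diphone name -> recorded waveform samples.
    UNITS = {
        "h-e": np.zeros(800),   # placeholders for real recorded units
        "e-l": np.zeros(800),
        "l-o": np.zeros(800),
    }

    def synthesize(unit_names, crossfade=80):
        """Concatenate stored units, crossfading at each join."""
        out = UNITS[unit_names[0]].copy()
        ramp = np.linspace(0.0, 1.0, crossfade)
        for name in unit_names[1:]:
            unit = UNITS[name]
            # blend the tail of the output with the head of the next unit
            out[-crossfade:] = out[-crossfade:] * (1 - ramp) + unit[:crossfade] * ramp
            out = np.concatenate([out, unit[crossfade:]])
        return out

    waveform = synthesize(["h-e", "e-l", "l-o"])
    print(len(waveform))  # 800 + 2 * (800 - 80) = 2240 samples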

4.5 Related Systems

ISAAC: Integrated Speech Activated Application Control:
ISAAC is a voice-activated/speech-response system aiming at hands-free control that users can employ to control software applications on a base computer (a workstation) from anywhere in an office. The user uses a wearable unit (a wireless microphone) to interact with the base computer via a wireless audio tether.

Figure 4.7: Architecture of the ISAAC System [28].

Fig 4.7 shows the architecture of this system. As shown in the figure, users can interact with various applications like email, multimedia presentation control, web browsing, etc. As already stated, every user makes use of a wearable unit to interact with the system, which transmits analog speech to an SR system on the base computer. The base computer is responsible for handling all the dialogs it gets from users, processing the dialogs, and performing the appropriate user requests [28].

The system makes use of an infrared transmitter to increase the effectiveness of the signals it receives from the wireless wearables. Moreover, the system makes use of infrared extenders that help in increasing the signal reception range up to 25 meters. Speech synthesis in this system depends on Centigram's TruVoice software, which produces the desired output speech from a string of characters [28].

Smart module for speech translation:
Smart modules are wearable computers designed for applications that run with the help


of speech. The SR system of these modules uses CMU's Sphinx 2 continuous, speaker-independent system. Fig 4.8 and Fig 4.9 describe the structure of this module [28]. Fig 4.8 shows the language translation (LT) module and the speech recognizer (SR) module,

Figure 4.8: Speech Translator System Architecture [28].

which are combined to form a complete stand-alone, audio-based, interactive dialogue system for speech translation. The system is also augmented with speech synthesis. Fig 4.9 shows the speech translator structure, which translates speech from

Figure 4.9: Speech Translator System Structure [28]

English (L1) to some foreign language (L2) and vice versa. The initial step in this module is to get input in the form of voice, which is done using a microphone. Various filters are used in this process to recover the voice actually uttered by the user, eliminating the background noise. The next step in this process is to convert the accepted input speech to text using speaker models, dictionaries, and syntactic and phonemic knowledge. After obtaining the desired text from the speech, the translation module performs text-to-text translation.


The translated text is then displayed on-screen in an editing program, where wrongly recognized words can be corrected[28]. Finally, the speech synthesizer produces the desired voice output by converting the translated text to speech[28].

TIA-P and TIA-0: TIA-P is a pen-based system developed by Carnegie Mellon University supporting speech translation applications. The specifications of this system include a 100 MHz

Figure 4.10: TIA-P Wearable Computer[28]

486 processor, 32 MB DRAM, a 2 GB IDE disk, a full-duplex sound chip, and a spread-spectrum radio. TIA-P supports the Multilingual Interview System/Language Translation, which has been jointly developed by Dragon Systems and the Naval Aerospace and Operational Medical Institute (NAOMI). TIA-P is shown in fig 4.10. The Dragon Multilingual Interview System (MIS) is a keyword-triggered multilingual playback system: it takes speech as input and plays back the recognized speech. The system also synthesizes the recognized speech into a foreign language (Croatian), and the other, local person can respond with Yes, No, or pointing gestures. Dragon MIS is stated to be capable of recognizing 45,000 phrases covering domains like medical examination, mine fields, road checkpoints, and interrogation. Moreover, there are two output channels: one plays speech in English and the other plays speech in Croatian[28]. TIA-P has been deployed with the Dragon speech translation system in several other countries, and has supported speech translation applications such as human intelligence data collection and experimentation with the use of electronic maintenance manuals for F-16 maintenance[28].

Adtranz: Adtranz is a mobile pen-based computer developed with certain advanced


features. These features include voice transmission, the capability of collaborating with remote

Figure 4.11: Adtranz System Architecture[28]

experts, a spread-spectrum radio, image capture, and support for a VGA head-mounted display. The specifications of this system include a 50 MHz 486DX2 processor, 12 MB of RAM, a 170 MB hard disk, two PCMCIA Type II slots, one serial port, one parallel port, one infrared port, and a grayscale 640x480 display. The architecture of this system is shown in fig 4.11[28]. The system makes use of a real-time audio program for voice communication; the voice is digitized using a sound card, compressed, and sent as files through a WaveLAN network using the TCP/IP protocol[28].

The Mobile Communication and Computing Architecture (MoCCA): MoCCA is a wearable computer designed to support a group of geographically distributed mobile field service engineers (FSEs). The main aim of this design is to provide a system that helps FSEs access information and receive advice from other FSEs while they stay on customer sites and while they travel between sites. The designed system should be of light weight and have access to the several legacy databases that exist on corporate computing systems[28]. Fig 4.12 shows the system architecture of MoCCA, which consists of the following components:

– A base unit, about the size of a small laptop computer, which is connected to a remote server (located at the home office) wirelessly through a CDPD connection.


Figure 4.12: MoCCA System Architecture[28]


– A cellular phone associated with the base unit and tethered to it through a PCMCIA port. The cellular phone communicates wirelessly with the local cellular provider and thus has access to the telephone network.

– The FSE carries a smaller satellite unit which is connected to the base unit. The satellite unit shows the contents of the base unit screen, and its keyboard input links directly to the base unit keyboard.

– The FSE wears a microphone and headset that are wirelessly linked to the cellular phone.

The MoCCA system makes use of a voice bulletin board to build an effective voice communication system that helps FSEs interact with their fellow workers. The voice bulletin board holds a collection of voice clips received from FSEs who encounter problems during their job, and each problem in this list has a set of voice responses received from other FSEs as solutions to that problem. Fig 4.12 shows the control structure of the voice bulletin board: it is a menu-based dialog providing a set of solutions to a user logged on to MoCCA, and each branch is a decision point for the user. An FSE listens to the list of voice responses offered as solutions to his problem and selects an appropriate solution by pressing the desired digit on the phone[28].


Chapter 5

Implementation and Evaluation of a Prototypical Voice User Interface

5.1 Background :

One goal of this research work is to develop a voice user interface for a wearable activity-aware computing platform. The user interface is a key component of any system that requires humans to communicate with it. Normally, users employ Windows, Icons, Menus and Pointing (WIMP) devices to interact with desktop computers, and user experiences seem to be well supported by well-designed WIMP applications. WIMP offers a number of advantages compared to other interaction paradigms like command languages; its main advantages are ease of learning and support for most types of users. WIMP has some drawbacks as well: for example, a user must interact with the computer without the possibility of performing other activities in the physical world. A voice user interface enables users to interact with wearable computers while simultaneously doing other activities like jogging, walking, etc.[30].

WIMP rests on three assumptions:

1. The human actor can dedicate all attention to the interaction with the virtual environment provided by the computer.

2. The real-world environment in which the interaction takes place is always the same.

3. The input and output devices are few and similar.

Fig 5.1 shows the architecture of a wearable activity-aware operating system, which includes an activity recognition component[11] and an egocentric interaction manager[30]. The system makes use of an egocentric interaction sensor pool to collect human activity-related sensor data, which is used by the activity recognition component and the activity-centered support applications. A Bluetooth headset was used, whose microphone belongs to the sensor pool; the Bluetooth headphones are part of the actuator pool, providing voice feedback to the user.


Figure 5.1: Basic Architecture of the Existing Activity-Aware Computing Platform[30]

5.2 System Components :

The four important system components are the speech recognition software, the wearable computer, the Bluetooth headset, and the speech synthesis software.

5.2.1 Wearable Computer :

Figure 5.2 shows the wearable computer that we intend to use in the future. At present, we have built the system on a laptop that can be worn as a backpack and has specifications similar to the Sony Type U VGN-UX71. A wearable computer could exploit the commonality in components to reduce cost, weight and redundancy, and to improve connectivity and services. This is one reason for taking a wearable computing approach within the easyADL project, which provides activity-centered support. An explicit training phase is required for activity-centered support, and we would like to integrate the VUI during this training phase.

The specifications of the wearable computer are shown in table 5.1.

Table 5.1: Specifications of the Wearable Computer[5]

Model             Sony Type U VGN-UX71
CPU               Core Solo U1400 (1.20 GHz)
RAM               1 GB
Hard Disk Drive   30 GB Ultra ATA/100
Operating System  Windows Vista Home Premium
Security          Fingerprint Sensor


Figure 5.2: Wearable Computer[5]

5.2.2 Dragon NaturallySpeaking (Speech Recognition Software) :

Dragon NaturallySpeaking is a popular voice recognition software that allows users to send email messages, dictate documents, open websites and close Windows applications through the user's voice[1]. Figure 5.3 shows the Dragon bar. This system uses

Figure 5.3: Dragon Speech Recognition Tool Bar[1]

a microphone as an input device for receiving commands from the user. The software also includes a speech synthesizer, which converts the text on screen to voice and plays it through the speakers. Generally, the Dragon speech recognition system works with good accuracy, but it might face difficulties in recognizing some words or phrases. In such cases the user may have to train the system to improve recognition accuracy (99 percent accuracy is possible after training)[1]. See the Appendix for further information.

5.2.3 Bluetooth Headset with Microphone :

Bluetooth is a computing and telecommunications industry specification that describes how mobile phones, computers and PDAs can be wirelessly connected in an ad-hoc manner. Ad-hoc networking motivated us to use Bluetooth technology, so that several input and output devices can easily be integrated with the activity-aware wearable computer.


Figure 5.4: Bluetooth Headset with Microphone

5.3 System Description :

During the era of desktop computing, the Graphical User Interface (GUI), keyboard, keypad, joystick, etc. were used as the tools for communicating with applications. To increase the effectiveness of interaction with these applications, various new technologies have been introduced into the field of Human-Computer Interaction. One such technology is the voice user interface, which uses speech for effective interaction. A voice user interface makes interaction with applications considerably easier, since a voice recognition component allows the user to interact with his/her wearable computer using voice. An architecture for the Voice User Interface (VUI) is shown in Fig 5.5.

Figure 5.5: Voice User Interface Architecture.


The users use a Bluetooth headset with a microphone to communicate with the wearable system. Speech recognition software is used both to recognize voice commands and to recognize the voice-based information to be stored in the database; utterances that fall outside the list of possible commands are treated as noise. The Bluetooth headset is also meant to play error messages and confirmation messages to the user; this has, however, not been included as of yet, and at present feedback is presented to the user in the form of text output. As this project is in its initial stage, a simple VUI with simple voice commands like start, stop, yes, no, etc. is used. Voice-based information like "washing dishes" or "cook rice" needs to be converted to text format for further processing; in the future, the capability of this system will be improved to handle natural language processing, allowing for complex sentences. A database was developed to insert and retrieve activity-aware information using voice.
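A minimal sketch of this command filtering in MATLAB follows; the command list and variable names are illustrative assumptions, not the actual thesis code:

    % Hypothetical command filter: utterances outside the known
    % command list are treated as noise and ignored.
    commands = {'start', 'stop', 'yes', 'no', 'wake up', 'go to sleep'};
    utterance = lower(strtrim('Start'));   % recognized text from the SR software
    if any(strcmp(utterance, commands))
        fprintf('Command accepted: %s\n', utterance);
    else
        fprintf('Ignored as noise: %s\n', utterance);
    end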

An ActiveX server was used to interface the Dragon speech recognition software with the other components of the voice user interface system. actxserver is the MATLAB function used to create COM objects and exchange data with them; the syntax h = actxserver('progid') is used, where h represents the server's default interface and progid is the Programmatic Identifier of the component to be instantiated in the server. The system uses the Dragon speech recognition software to convert the voice signal to text: the recognized utterance is printed into a Word document and read by the MATLAB program, which recognizes commands and stores information in the database. The user can use the Dragon speech recognition software to control his dictation; it is necessary to stop interacting with the system for 10 seconds before dictating the next command, allowing the system to process the given command. The user can use commands like "Go to sleep" or "Stop listening" to interrupt the interaction with the system, and commands like "Listen to me" or "Wake up" to resume it. The system accepts the various activity-related actions uttered by the user as input and compares them with the existing actions (activity-related information) in the database. If the system finds the given activity in the database, it produces the output "Action already in the database as <name of the action>" through voice and text. If it does not find a match with the activities in the database, it produces the output "New action identified" and displays this text message to the user.
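As an illustration of this COM bridge, the sketch below uses actxserver to open the Word document into which Dragon prints the recognized text and reads its contents. 'Word.Application' is the standard ProgID for Microsoft Word; the file path and variable names are our own assumptions, not the thesis code:

    % Minimal sketch: read dictated text out of a Word document via COM.
    % Assumes Microsoft Word is installed; the document path is hypothetical.
    word = actxserver('Word.Application');            % start (or attach to) Word
    doc  = word.Documents.Open('C:\easyadl\dictation.doc');
    dictated = doc.Content.Text;                      % full text of the document
    invoke(doc, 'Close', 0);                          % close without saving
    word.Quit;                                        % shut down the Word server
    delete(word);                                     % release the COM object
    fprintf('Recognized utterance: %s\n', strtrim(dictated));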

5.4 Two Examples of the Voice User Interface :

This section describes two example cases of the system, conducted at the Department of Computing Science at Umeå University. In these cases, the system uses the activities/actions listed in table 5.2.

5.4.1 Case 1: When an Activity/Action is not in the Database :

The system accepts the actions uttered by the user as input and compares them with the existing activities/actions in the database. If the system finds the given activity in the database, it produces the output "Action already in the database as <name of the action>" through voice or text. If it does not find a match with the activities/actions in the database, it produces the output "New action identified", displaying this text message to the user and also playing the same output as a voice message via the speech synthesizer. For example, suppose the user gives the action "clean the floor" as input to the system. The system compares it with the existing activities/actions in


Figure 5.6: An Activity/Action is not in the Database.

the database as shown in fig 5.6, like "clean the stove", "clean the rack", etc. If the system does not find a match with the given activity/action in the database, the result will be as shown in fig 5.6: "New action identified". After that, the system updates the database with this activity/action.

5.4.2 Case 2: When an Activity/Action is in the Database :

For example, suppose the user gives the action "clean the floor" as input to the system. The system compares it with the existing activities/actions in the database, as shown in fig 5.7, like "clean the stove", "clean the rack", etc. If the system finds a match with the given activity/action in the database, the result will be as shown in fig 5.7: "Action already in the database as clean the floor".

5.5 Training Phase :

As the system is built on a speaker-dependent feature, it is important that a user trains the system before he/she uses it. In this phase, users store their speech patterns using the speaker-dependent feature. We first train the Dragon recognition software and then start using the voice user interface.


Figure 5.7: An Activity/Action is in the Database.

5.6 Flow Chart Description :

The flowchart shown in fig 5.8 briefly describes the system developed in this project and the user's interaction with it. The user starts the interaction with the Wakeup command, i.e., by saying "Wakeup". In the next step, the system asks the user for a decision on inserting a new activity, where the user should say YES or NO. If the user says NO, the system automatically stops; if the user says YES, the system proceeds further with processing activities. The activities received by MATLAB from the Word document are compared with the sentences in the database using an EditDistance function that computes the distance between strings (a minimal sketch of such a function is given below); using EditDistance, the system generates a matching sequence. In the next step, the user inserts an activity that he wishes to perform, using his voice, like preparing rice, preparing cake, washing dishes, etc. The system records the new activity and asks the user for the various actions within it, which the user then inserts. For example, the actions to be stored in a buffer for the activity preparing rice are: get the rice bag, pour water into the cooker, pour rice into the cooker, add salt, put the rice bag back, etc. The system informs the user about recording these actions and asks for permission; the user allows the system to record the new actions by saying "start recording". The system starts recording once it receives this command.
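The thesis does not list the EditDistance implementation; the following is a minimal Levenshtein-distance sketch in MATLAB, written under the assumption that plain character arrays are compared (the function and variable names are our own):

    function d = edit_distance(s1, s2)
    % EDIT_DISTANCE  Levenshtein distance between two character arrays:
    % the minimum number of insertions, deletions and substitutions
    % needed to turn s1 into s2.
    m = length(s1);
    n = length(s2);
    D = zeros(m + 1, n + 1);
    D(:, 1) = (0:m)';                 % cost of deleting all of s1
    D(1, :) = 0:n;                    % cost of inserting all of s2
    for i = 2:m + 1
        for j = 2:n + 1
            cost = double(s1(i - 1) ~= s2(j - 1));    % 0 if characters match
            D(i, j) = min([D(i - 1, j) + 1, ...       % deletion
                           D(i, j - 1) + 1, ...       % insertion
                           D(i - 1, j - 1) + cost]);  % substitution
        end
    end
    d = D(m + 1, n + 1);
    end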


Figure 5.8: Flowchart Description of the Voice User Interface.

Once the recording is complete, the system informs the user about stopping it; the user then responds by saying "stop recording", which makes the system stop the recording process. Next, the system asks the user about saving the recorded action, to which the user should respond with YES or NO: YES saves the recorded action into the database, while NO makes the system proceed further and ask the user to record a new action. Here also, the user should respond with YES or NO: YES starts the recording of the new action, and NO makes the system proceed further, asking the user about recording a new activity, to which the user responds accordingly.

Initially this project was implemented using a GUI, which will be replaced by the VUI in future versions. With a VUI, a person can perform his activities and even record them without seeking any support, whereas in the GUI-based system the user needs support from another person to record his activities. Interaction with the system using the VUI gives the user a feeling of interacting with another person. The system accepts the actions uttered by the user as input and compares them with the existing activities in the database. If the system finds the given activity in the database, it produces the output "Action already in the database as <name of the action>" through voice and text. If it does not find a match with the activities in the database,


it produces the output "New action identified", displaying this text message to the user and also playing the same output as a voice message via the speech synthesizer.

5.7 Evaluation of the Two Examples :

This section evaluates the implementation of the prototypical voice user interface discussed in the previous sections. The evaluation is divided into two stages:

– Evaluation of simple sentences

– Evaluation of complex sentences

5.7.1 Experimental Setup :

The experimental setup is an important part of evaluating the effectiveness of our system. We followed a two-stage approach to evaluating the system. In the first stage, we use simple words to test the system: the user speaks simple utterances like Yes, No, Start, Stop, etc. In the second stage, we used complex easyADL activities: here the user can speak almost any activity, like "preparing rice for having lunch" or "doing the dishes after having a fika", etc.

5.7.2 Quantitative Evaluation of Speech-Based Data Retrieval :

The experiments were performed by four subjects from the Department of Computing Science of Umeå University, using this voice user interface for a usability evaluation of the VUI for the activity-aware wearable computer. An example comprises a few related activities performed in some sequence; for instance, we used Preparing coffee, Cleaning the kitchen, Having breakfast, etc. Some activities were common to several scenarios, like doing the dishes, which is common to both the lunch example and the breakfast example. The subjects were allowed to perform the activities in their own way. Here we show only a preliminary quantitative evaluation; further evaluation will be done in the future.

Table 5.2: The List of Activities and Actions used during the Simple Sentence Evaluation[2]

List of 10 activities        Actions within individual activities
Preparing rice               Get the rice bag, Pour rice into the cooker, Pour water into the cooker, Add salt, Put back the rice bag
Preparing fried vegetables   Get some vegetables, Cut those vegetables, Fry those vegetables, Add spices, Place the chopper in the sink
Preparing cake               Get the baking plate, Add some eggs, Add some milk, Add some sugar, Add some cake powder, Place the baking plate in the oven
Preparing coffee             Take some coffee powder, Pour water into the coffee machine, Get some cups, Pour some coffee into the cups
Preparing breakfast          Toast some bread slices, Boil some eggs, Prepare some juice, Prepare the cereals
Doing the dishes             Clean the dishes, Dry the dishes on the rack, Wash the hands
Having lunch                 Have the main meal, Have the dessert, Drink coffee, Place the used dishes in the sink
Having breakfast             Have the main meal, Place the used dishes in the sink
Preparing the table (lunch)  Place the table mats, Get some cutlery, Get some plates, Get some glasses, Get the food, Place some napkins
Cleaning the kitchen         Clean the table, Clean the stove, Clean the rack, Clean the floor

When a subject performs his/her activity, there are two cases: the spoken activity/action is already in the database, or it is not, i.e., the subject either says an activity/action exactly as it appears in the database or says one that is not in the database.

5.7.3 Optimal Parameters :

The "Matching Distance" is determined for individual pieces of information based on the recognition accuracy. We trained and tested the system on recorded data, using various combinations of the parameter. Precision is equal to the number of true positives divided by the sum of false positives and true positives.
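In standard notation, with TP the number of true positives and FP the number of false positives:

    \mathrm{precision} = \frac{TP}{TP + FP}

For the activity "Doing the dishes" in table 5.3, for instance, this gives 6 / (6 + 4) = 60%.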

We used Matching Distances of 2, 3 and 5 to determine the recognition accuracy of the system, and finally chose Matching Distance 3 as the best parameter for this system.
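Under the assumption that the "Matching Distance" is a threshold on the edit distance between the utterance and a stored action, the matching step could look as follows (a sketch reusing the edit_distance function from section 5.6; the action list is taken from table 5.2, everything else is illustrative):

    % Hypothetical threshold-based matching with Matching Distance 3.
    known = {'clean the table', 'clean the stove', 'clean the rack', 'clean the floor'};
    utterance = 'clean the stowe';     % example misrecognized input
    threshold = 3;                     % the chosen optimal Matching Distance
    best = Inf; bestIdx = 0;
    for k = 1:numel(known)
        d = edit_distance(lower(utterance), lower(known{k}));
        if d < best
            best = d;
            bestIdx = k;
        end
    end
    if best <= threshold
        fprintf('Action already in the database as %s\n', known{bestIdx});
    else
        disp('New action identified');
    end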

5.7.4 Precision :

Precision here describes how often the system's matches were correct, following the definition above. Table 5.3 shows the precision of the system.

Table 5.3: Precision of the system

Activity name                True positive   False positive   Precision %
Preparing fried vegetables   7               3                70%
Preparing rice               8               2                80%
Preparing cake               9               1                90%
Preparing coffee             9               1                90%
Preparing breakfast          8               2                80%
Doing the dishes             6               4                60%
Having breakfast             9               1                90%
Having lunch                 8               2                80%
Preparing the table (lunch)  5               5                50%
Cleaning the kitchen         9               1                90%

5.7.5 Usability Evaluation of VUI :

To evaluate the learnability factor, we examined the user comments about the system. From this examination we found that sixty percent of the users failed to manage


the system without training, but the rest of the users succeeded. Table 5.4 gives an overview of the user comments on the ease of use of the system. We show only a preliminary evaluation here; further evaluations will be conducted in the future.

Table 5.4: User comments about controlling the system

Very easy        12 %
Very very easy   40 %
Hard             40 %
Very hard         8 %

The system efficiency evaluation is an important factor in the usability test: it gives us information about the problems and limitations of the system, and efficiency is inferred by measuring the time required to complete a task. To examine efficiency, we asked the users whether the system responded to their voice and whether they noticed the delay time. Table 5.5 gives a brief idea about this evaluation.

Table 5.5: User comments about the response of the system

Did nothing               50 %
Something else            10 %
Exactly the right thing   40 %

Another important usability factor is the flexibility of the system from the user's point of view. We evaluated the flexibility of our system by considering the user comments; the flexibility of the system depends on the answer to the question "Are the commands flexible enough to operate the system?". Our main focus was to find out whether the activities and actions are flexible enough to help the user: are the commands sufficient, or do we need to add more activities and actions? Table 5.6 presents the user comments about the system and shows that 58 percent of the users believe that the activities/actions are sufficient to control the system. 20 percent of the users believe that the existing activities/actions are not sufficient and that more need to be added, like "help the system", and 10 percent of the users say that they need training to control the system.

Table 5.6: User comments about the flexibility of the system

Sufficient activities to control   58 %
Don't know                         12 %
Training needed                    10 %
Not sufficient                     20 %

The most important usability factor, which is also difficult to derive from user answers, is user satisfaction. To investigate this factor we considered these questions: 1) How does it feel to interact with the system through voice? 2) Would you prefer to control the system with speech instead of a joystick or keyboard? Table 5.7 presents the details about user satisfaction. We found that 38 percent find it great fun to talk to the system, 22 percent feel it is uncommon, 17 percent found it funny, 9 percent say it is "OK", and the remaining users comment that sometimes the


system does not recognize the activities and actions, that they need training to control the system, and that it is hard to know what to say. 75 percent of the users like to use speech to control the system, 15 percent prefer a joystick/keyboard, 6 percent say it depends on the situation, and 4 percent say they don't know. After evaluating all the comments from the users, we found that the majority of the users gave positive answers about the system, so we can argue that our system satisfies its users.

Table 5.7: User comments about the usability of the system

Great              38 %
OK                  9 %
Uncommon           17 %
Doesn't recognize  20 %
Need training      16 %


Chapter 6

Limitations

Eliminating background noise and recognizing the exact words of the user is the main problem affecting the effectiveness of a speech recognition system. The speech recognition component used in our system will only recognize the ADL activities mentioned in table 5.2, and speech is used as an input only for some specialized and limited tasks. Moreover, humans show small differences in the pronunciation of similarly sounding words or phrases, so the chances of the speech recognition component making mistakes are considerably high. In a social context, you might disturb other people while communicating with the system. Finally, when using the voice user interface, the user needs to remember the commands and activities/actions.


Chapter 7

Conclusions

In this thesis we have presented a voice user interface for an activity-aware wearable computing platform. The survey and analysis of related research led us through several areas.

First of all, the survey of the different user interfaces available for wearable computers showed that a voice user interface is the desired interface for this project, due to its hands-free control, alternate control, extension capabilities, consistency and commonality.

Secondly, we made an in-depth study of voice user interfaces and the systems available for wearable computers. Voice user interfaces are classified into two types based on their modes of operation: conversational voice user interfaces and menu-based voice user interfaces. The conversational voice user interface provides a realistic approach towards our goal, as it achieves transparency for the wearable computer.

Finally, we identified and discussed the components used in a voice user interface: speech recognition and speech synthesis. We found the Dragon NaturallySpeaking software to be a good basis for a voice user interface for wearable computers that meets our desired goal. Another important target was to give the user a suitable device for interacting with the system; the best option we found was a Bluetooth headset. There are some limitations in the speech synthesizer that lead us towards further investigation.


Chapter 8

Future Work

Our future work will focus on introducing more complex activities and actions to the easyADL system, and on complementing the voice interface with visual displays, such as displays in eyeglasses and wrist watches. Natural language processing is also an important part of our future work in the wearable computer domain.


Chapter 9

Acknowledgements

I would like to thank my supervisor Dipak Surie and Dr. Thomas Pederson for their help, support, and valuable ideas and suggestions, which helped me complete this thesis; without them, this thesis would never have existed. I would also like to thank my student coordinator Dr. Per Lindström for giving me admission and moral support, and special thanks to my parents for their financial and moral support. My special thanks to Jean Paul Kouma and Fabien Lagriffoul for their suggestions regarding the programming.


References

[1] Dragon NaturallySpeaking 9 User's Guide. Nuance Communications International (visited 29-02-2007).

[2] easyADL: Independent life despite dementia. www.cs.umu.se/research/easyadl (visited 2006-08-12).

[3] The ICD-10 Classification of Mental and Behavioural Disorders. World Health Organization. http://www.who.int/classifications/icd/en/bluebook.pdf (visited 12-02-2007).

[4] Seven-ounce wrist PC runs Linux. http://linuxdevices.com/news/NS5812502455.html (visited 14-08-2006).

[5] Specifications of the wearable computer. www.geekstuff4u.com (visited 04-02-2007).

[6] Speech synthesis. http://www.telecom.tuc.gr/ntsourak/tutorial_synthesis.html (visited March 2007).

[7] The Benefits of a Conversational Voice User Interface in Voice Portals. The International Engineering Consortium (2007). http://www.iec.org/online/tutorials/cvui/ (visited 2006-12-15).

[8] A. Schmidt. Interaction with the ubiquitous computer. Keynote at MobileHCI 2003 (2003).

[9] D. Hawthorn. Possible implications of aging for interface designers. Interacting with Computers, Elsevier Science B.V., 12:507–528 (2000). DOI 10.1016/S0953-5438(99)00021-1.

[10] M. Drugge. Interaction aspects of wearable computing for human communication. ISSN 1402-1544 / ISRN LTU-DT–06/60–SE (2006).

[11] D. Surie, T. Pederson, F. Lagriffoul, L. E. Janlert, and S. Daniel. Activity recognition using an egocentric perspective of everyday objects. Technical report UMINF.01, Umeå University, Department of Computing Science, Sweden.

[12] D. A. L. Kie Fa. Topics in speech recognition. Technical report, Delft University of Technology, Mekelweg 4, 2628 CD Delft, Netherlands (2006) (visited 03-04-2007).

[13] G. McAtamney and C. Parker. An examination of the effects of a wearable display on informal face-to-face communication. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 45–54, ISBN 1-59593-372-7 (2006).

[14] M. Hafez. Tactile interfaces: technologies, applications and challenges. The Visual Computer: International Journal of Computer Graphics, volume 23, issue 4, pages 267–272, ISSN 0178-2789, Springer-Verlag New York, Inc., Fontenay aux Roses, France (2007).

[15] I. Burbey. Ubiquitous internet computing. In World Wide Web: Beyond the Basics, pages 463–475, ISBN 0-13-954785-1, Prentice-Hall, Inc. (1998).

[16] I. E. Zohar. General survey of speech recognition programs. http://www.tau.ac.il/itamarez/sr/survey.htm (2004) (visited 31-12-2006).

[17] J. D. Foley. Fundamentals of Interactive Computer Graphics. Addison-Wesley, ISBN-10 0201144689 (1996).

[18] J. Payette. Advanced human-computer interface and voice processing applications in space. Human Language Technology Conference, Proceedings of the Workshop on Human Language Technology, pages 416–420, ISBN 1-55860-357-3. Canadian Space Agency, St-Hubert, Quebec J3Y 8Y9 (1994).

[19] K. A. Kemble. An introduction to speech recognition. IEEE Industry Standards and Technology Organization (IEEE-ISTO) (2001).

[20] K. Lyons and T. Starner. Augmenting cognition with wearable computers. Technical report, College of Computing and GVU Center, Georgia Institute of Technology, Atlanta, USA. HCI International 2005 (visited 20-01-2007).

[21] K. Shafkat. Speech recognition for robotic control. Technical report UMNAD-606, Umeå University, Department of Computing Science, Sweden (1997).

[22] S. Mann. Definition of wearable computer. The First International Conference on Wearable Computing, ICWC-98, Fairfax, VA (1998).

[23] M. F. McTear. Spoken dialogue technology: enabling the conversational user interface. ACM Computing Surveys (CSUR), volume 34, pages 90–169 (2002).

[24] R. Astrand. Den lilla boken om demens (The Little Book about Dementia). Erik Sparre Medical (2001).

[25] R. Cole, J. Mariani, H. Uszkoreit, A. Zaenen, and V. Zue. Survey of the State of the Art in Human Language Technology. Cambridge University Press, ISBN-13 9780521592772 (1998).

[26] R. Eriksson and F. Sjogren. Enhancing the user experience with new interaction techniques for interactive television. Technical report UMNAD-672, Umeå University, Department of Computing Science, SE-901 87, Sweden (2007).

[27] R. Gold. Survey on ubiquitous computing and augmented reality. Proceedings of the 20th Annual International Conference on Computer Graphics and Interactive Techniques, pages 393–394, ISBN 0-89791-601-8 (1993).

[28] A. Smailagic. An evaluation of audio-centric CMU wearable computers. Mobile Networks and Applications, Kluwer Academic Publishers, volume 4:59–68, ISSN 1383-469X (1999).

[29] S. Mann. Wearable computing: toward humanistic intelligence. Vol. 16, pages 10–15, ISSN 1541-1672 (2001).

[30] T. Pederson, D. Surie, F. Lagriffoul, L. E. Janlert, and S. Daniel. Towards an activity-aware wearable computing platform based on an egocentric interaction model. International Conference INTERACT 2007, Brazil (submitted).

[31] Y. Rogers, H. Sharp, and J. Preece. Human Computer Interaction. John Wiley and Sons, 772 pages, ISBN 0201627698 (2002).


Appendix A

Dragon NaturallySpeaking User Guide

A.1 Installation Guide for Users

A.1.1 System Requirements

This section describes the system requirements for Dragon NaturallySpeaking; your system must meet the following requirements[1]:

– Intel Pentium processor, 1 GHz or greater.

– 1 GB RAM (512 MB free minimum).

– A minimum of 650 MB of free hard disk space for a custom installation, where you install only the program files and one set of speech files.

– Windows 2000 or Windows 2000 Advanced Server.

A.1.2 Install Dragon NaturallySpeaking :

– Insert the first Dragon NaturallySpeaking CD into your CD-ROM drive[1].

– Provide your customer information, including the serial number supplied with your Dragon NaturallySpeaking installation.

– Choose your installation directory. If there are no previous versions of Dragon NaturallySpeaking on your system, the default directory is: C:\Program Files\Nuance\NaturallySpeaking

– You can enable the Dragon NaturallySpeaking QuickStart option. With QuickStart enabled, Dragon NaturallySpeaking launches at system startup and adds the Dragon NaturallySpeaking QuickStart icon to the Windows taskbar.

– If you are upgrading from Version 7 or 8, you can select to upgrade your users as part of the Version 9 installation by checking "Upgrade existing speech files to work with this installation".


– Continue following the on-screen instructions. The setup program will install the files for Dragon NaturallySpeaking on your computer.

– When prompted, make sure to register your copy of Dragon NaturallySpeaking. Once registered, Nuance can notify you of product updates and other offers.

A.1.3 Starting to Dictate :

If Dragon NaturallySpeaking is not already running, you can start it in any of the following ways[1]:

– Double-clicking the Dragon NaturallySpeaking icon.

– Selecting Programs > Dragon NaturallySpeaking from the Start menu.

– Right-clicking the QuickStart taskbar tray icon and selecting Start Dragon NaturallySpeaking, if the QuickStart option is enabled (see [1] for more information on using the QuickStart option).

A.1.4 Turning on the microphone :

Before you can dictate, you need to turn on the microphone[1]. You can turn on the microphone by:

– Clicking the microphone icon on the DragonBar. You can click this icon again to turn it off.

– Pressing the plus key on the numeric keypad to turn the microphone on, and pressing it again to turn it off.

– Clicking the microphone icon in the Windows taskbar.

A.1.5 Playing back dictation in a document :

To play back dictation, do any of the following[1]:

– Select the text you want to play back and say "Play That Back".

– Click the Start Playback button on the Playback toolbar, or move the insertion point to the text you want to play back and say any of the playback commands.


The table shows the speech recognition software programs available in various languages.

Table A.1: Speech Recognition Software Programs and Languages (SRSP)[16]
(columns: DNS preferred versions 7, 8 and 9 / Microsoft Speech Recognition / ViaVoice / Other applications)

Arabic:  NO / NO / Last version was Millennium 7, but it has disappeared / –
Catalan: NO / NO / NO / Was available from Philips FreeSpeech 2000 (Windows only, up to 98), but discontinued
Chinese: NO / YES / NO / –
Dutch:   YES / NO / No longer mentioned on ScanSoft website / –
English: YES (US, UK, Australian, SE Asian, all in one package) / US / US, UK (used to be sold separately) / –
French:  YES / NO / No longer mentioned on ScanSoft website / –
German:  YES / NO / YES / –
Italian: YES / NO / YES / –
Spanish: YES / NO / No longer mentioned on ScanSoft website / –
Swedish: NO / NO / NO / Available from Voxit (Ver: 5.2)