VOICE USER INTERFACE BY R.SELVI K.PRIYA I-MCA

Upload: rosamund-brown

Post on 28-Dec-2015

INTRODUCTION

A voice portal can be defined as “speech-enabled access to Web-based information”. In other words, a voice portal provides telephone users with a natural language interface to access and retrieve Web content.

Voice-based Internet access uses rapidly advancing speech recognition technology to give users anytime, anywhere communication and access, using the human voice, over an office, wireless, or home phone.

Why Voice?

Natural speech has an advantage as an input modality: it allows hands-free and eyes-free use. With proper design, voice commands can be created that are easy for a user to remember. Speech can be used to create an interface that is easy to use and requires a minimum of user attention.

VUI (Voice User Interface)

Automated speech recognition (ASR) is the engine that drives today's voice user interface (VUI) systems. These systems let customers break the 'menu barrier' and perform more complex transactions over the phone. The best VUI systems are "speaker independent": they understand naturally spoken dialog regardless of the speaker.

ASR systems can be enabled with voice authentication, eliminating the need for PINs and passwords. The 'voice Web' is evolving, where browsers or Wireless Application Protocol (WAP)-enabled devices display information based on what the user vocally asks for.

Good VUI systems have multiple fallback strategies for speech recognition failure, such as asking callers to spell names or breaking a question into parts.
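The fallback idea can be sketched in Python as follows. This is only an illustration under stated assumptions: the `recognize` callback, its `(text, confidence)` return value, the prompts, and the 0.6 confidence threshold are all hypothetical and do not come from any particular ASR product.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical ASR hook: given a prompt, return (recognized_text, confidence in [0, 1]).
Recognizer = Callable[[str], Tuple[str, float]]


def ask_name(recognize: Recognizer, min_confidence: float = 0.6) -> Optional[str]:
    """Try progressively more constrained prompts until one is recognized confidently."""
    # Strategy 1: ask the open question.
    text, conf = recognize("Please say the full name.")
    if conf >= min_confidence:
        return text

    # Strategy 2: ask the caller to spell the name letter by letter.
    text, conf = recognize("Sorry, please spell the name.")
    if conf >= min_confidence:
        # Join the spoken letters ("j o h n") into one word ("John").
        return "".join(text.split()).capitalize()

    # Strategy 3: break the question into smaller parts.
    parts: List[str] = []
    for prompt in ("Please say the first name.", "Please say the last name."):
        text, conf = recognize(prompt)
        if conf < min_confidence:
            return None  # every fallback failed; the caller could be handed to an agent
        parts.append(text)
    return " ".join(parts)
```

Ordering the strategies from least to most constrained keeps the dialog natural for callers who are recognized on the first attempt, and only slows it down when recognition actually fails.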

VoiceXML is a markup language specification that defines the key elements of voice-enabled transactions, which also allows repurposing of this data across any number of platforms and systems.

Next Generation Network Services

The next-generation network (NGN) is the next step in world communications, which have traditionally been enabled by three separate networks: the public switched telephone network (PSTN) voice network, the wireless network, and the data network (the Internet). NGNs converge all three of these networks (voice, wireless, and Internet) into a common packet infrastructure.

Three types of services drive NGNs: real-time and non-real-time communication services, content services, and transaction services.

The services-driven NGN gives service providers greater control, security, and reliability while reducing their operating costs. Service providers can quickly and cost-effectively build new revenue.

Next-generation Network Benefits

• More choices: Open systems promote multiple-vendor participation in the marketplace.
• Lower cost: Solutions built on open standards cut development time for solution providers.
• Innovative services: Working to standards frees developers to concentrate on adding value.
• Lower risk: Compatibility with products and technologies from many suppliers decreases the risk of ownership.

Deploying NGN Services

The NGN begins with media servers, which provide advanced media-processing capabilities. The value of a media server is its flexibility in supporting advanced media-processing services such as basic voice announcements, interactive voice response (IVR), conferencing, messaging, text-to-speech (TTS), and speech recognition.

Voice Recognition and Authorization

Speaker Verification

The speaker-specific characteristics of speech are due to differences in physiological and behavioral aspects of the speech production system in humans. The main physiological aspect of the human speech production system is the vocal tract shape.

The vocal tract modifies the spectral content of an acoustic wave as it passes through it, thereby producing speech. Hence, it is common in speaker verification systems to make use of features derived only from the vocal tract.

It is possible to represent the vocal tract in a parametric form as the transfer function H(z). In order to estimate the parameters of H(z) from the observed speech waveform, it is necessary to assume some form for H(z). Figure 3 illustrates the differences in the models for two speakers saying the same vowel.

[Figure 3: vocal-tract models for two speakers saying the same vowel]
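The slides do not say which form is assumed for H(z). One widely used choice, given here only as an illustrative assumption, is the all-pole model from linear predictive analysis, where G is a gain term, p is the model order, and the a_k are the predictor coefficients estimated from the waveform:

```latex
H(z) = \frac{G}{1 - \sum_{k=1}^{p} a_k z^{-k}}
\qquad\Longleftrightarrow\qquad
s[n] = \sum_{k=1}^{p} a_k \, s[n-k] + G \, u[n]
```

Here s[n] is the speech sample at time n and u[n] is the excitation signal. Under this assumption, the p coefficients a_k are the parameters that differ from speaker to speaker.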

Speaker Modeling

The purpose of voice modeling is to build a model that captures these variations in the extracted set of features. There are two types of models that have been used extensively in speaker verification and speech recognition systems: stochastic models and template models.

The stochastic model treats the speech production process as a parametric random process and assumes that the parameters of the underlying stochastic process can be estimated in a precise, well-defined manner. The template model attempts to model the speech production process in a non-parametric manner by retaining a number of sequences of feature vectors derived from multiple utterances of the same word by the same person.

A very popular stochastic model for modeling the speech production process is the Hidden Markov Model (HMM). HMMs are extensions of conventional Markov models in which the observations are a probabilistic function of the state; that is, the model is a doubly embedded stochastic process where the underlying stochastic process is not directly observable (it is hidden).

For speech signals, another type of HMM, called a left-right model or a Bakis model, is found to be more useful. A left-right model has the property that as time increases, the state index increases (or stays the same); that is, the system states proceed from left to right.
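The left-right property can be made concrete with a small Python sketch. It assumes a discrete-observation HMM that always starts in state 0 and may only stay in a state or advance by one; the function names, the `stay` probability, and the emission table `B` are illustrative choices, not part of the original slides.

```python
from typing import List, Sequence


def left_right_transitions(n_states: int, stay: float = 0.6) -> List[List[float]]:
    """Build a Bakis transition matrix: state i may only stay at i or move to i + 1."""
    A = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states - 1):
        A[i][i] = stay
        A[i][i + 1] = 1.0 - stay
    A[-1][-1] = 1.0  # the final state can only loop on itself
    return A


def forward_likelihood(
    A: Sequence[Sequence[float]],
    B: Sequence[Sequence[float]],
    obs: Sequence[int],
) -> float:
    """Return P(obs | model) using the forward algorithm.

    A[i][j] is the probability of moving from state i to state j.
    B[j][o] is the probability that state j emits symbol o.
    The model is assumed to start in state 0.
    """
    n = len(A)
    # Initialisation: all probability mass starts in state 0.
    alpha = [B[i][obs[0]] if i == 0 else 0.0 for i in range(n)]
    # Induction: propagate through the transitions, then weight by the emission.
    for o in obs[1:]:
        alpha = [
            sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
            for j in range(n)
        ]
    # Termination: sum over all states the model could end in.
    return sum(alpha)
```

Every entry below the diagonal of the matrix is zero, which is what prevents the state index from ever decreasing. The hidden state sequence is never observed directly; only the likelihood of the observations is computed.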

Pattern Matching

The pattern matching process involves the comparison of a given set of input feature vectors against the speaker model for the claimed identity and computing a matching score.
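For a template model, one possible way to compute such a matching score is dynamic time warping (DTW), sketched below in Python. The slides do not name a scoring method, so the use of DTW, the Euclidean frame distance, the function names, and the acceptance threshold are all assumptions made for illustration.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def dtw_distance(a: Sequence[Vector], b: Sequence[Vector]) -> float:
    """Dynamic time warping distance between two sequences of feature vectors."""
    n, m = len(a), len(b)
    # cost[i][j] is the best alignment cost of a[:i] against b[:j].
    cost: List[List[float]] = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            frame_dist = math.dist(a[i - 1], b[j - 1])  # Euclidean distance
            cost[i][j] = frame_dist + min(
                cost[i - 1][j],      # a advances, b repeats a frame
                cost[i][j - 1],      # b advances, a repeats a frame
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def matching_score(features: Sequence[Vector], templates: Sequence[Sequence[Vector]]) -> float:
    """Score an utterance against the claimed speaker's templates (lower is better)."""
    return min(dtw_distance(features, t) for t in templates)


def verify(features: Sequence[Vector], templates: Sequence[Sequence[Vector]], threshold: float) -> bool:
    """Accept the claimed identity if the best template match is close enough."""
    return matching_score(features, templates) <= threshold
```

DTW tolerates differences in speaking rate because a frame in one sequence may be aligned with several frames in the other. In practice the threshold would have to be tuned on enrollment data.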

Speech Recognition Challenges

The basic challenges faced by voice application developers included the following:

1. Variability of speech patterns: in many different ways.
2. Processing power:
3. Extracting meaning:

Conclusion

• Speech-enabled Internet portals, or voice portals, are quickly becoming the hottest trend in e-commerce, broadening the access to Internet content to everyone with the most universal communications device of all, a telephone.
• The potential for voice portals is as wide as the reach of telephones, which today number 1.3 billion around the world.
• A voice portal provides telephone users with a natural language interface to access and retrieve Web content.
• A voice portal can also provide users access to virtual personal assistance and Web-based unified messaging applications.
• Voice portals can also cut operating expenses by freeing up agent time and replacing human operators with an easy-to-use automated solution.
• The voice portal reference system is a packaged, integrated hardware and software reference system for building hardened e-business and speech-enabled voice portal solutions.

QUERIES?

THANK YOU