7/29/2019 Speech Recognitions
Speech Recognition
Definition
Speech recognition is the process of converting an acoustic signal, captured by a microphone or a telephone, into a set of words.
The recognised words can be an end in themselves, as in applications such as command and control, data entry, and document preparation.
They can also serve as the input to further linguistic processing in order to achieve speech understanding.
Speech Processing
Signal processing: Convert the audio wave into a sequence of feature vectors
Speech recognition: Decode the sequence of feature vectors into a sequence of words
Semantic interpretation: Determine the meaning of the recognized words
Dialog management: Correct errors and help get the task done
Response generation: What words to use to maximize user understanding
Speech synthesis (Text to Speech): Generate synthetic speech from a marked-up word string
Dialog Management
Goal: determine what to accomplish in response to user utterances, e.g.:
Answer user question
Solicit further information
Confirm/clarify user utterance
Notify invalid query
Notify invalid query and suggest alternative
Interface between user/language processing components and system knowledge base
What you can do with Speech Recognition
Transcription
dictation, information retrieval
Command and control
data entry, device control, navigation, call routing
Information access
airline schedules, stock quotes, directory assistance
Problem solving
travel planning, logistics
Transcription and Dictation
Transcription is transforming a stream of human speech into computer-readable form
Medical reports, court proceedings, notes
Indexing (e.g., broadcasts)
Dictation is the interactive composition of text
Reports, correspondence, etc.
Speech recognition and understanding
Sphinx system
speaker-independent
continuous speech
large vocabulary
ATIS system
air travel information retrieval
context management
Speech Recognition and Call Centres
Automate services, lower payroll
Shorten time on hold
Shorten agent and client call time
Reduce fraud
Improve customer service
Applications related to Speech Recognition
Speech Recognition
Figure out what a person is saying.
Speaker Verification
Authenticate that a person is who he or she claims to be.
Limited speech patterns
Speaker Identification
Assigns an identity to the voice of an unknown person.
Arbitrary speech patterns
Many kinds of Speech Recognition Systems
Speech recognition systems can be characterised by many parameters.
An isolated-word (discrete) speech recognition system requires that the speaker pause briefly between words, whereas a continuous speech recognition system does not.
Spontaneous vs. Scripted
Spontaneous speech contains disfluencies and periods of pause and restart, and is much more difficult to recognise than speech read from a script.
Enrolment
Some systems require speaker enrolment: a user must provide samples of his or her speech before using them. Other systems are said to be speaker-independent, in that no enrolment is necessary.
Large vs. small vocabularies
Some of the other parameters depend on the specific task. Recognition is generally more difficult when vocabularies are large, with many similar-sounding words.
When speech is produced in a sequence of words, language models or artificial grammars are used to restrict the combination of words.
The simplest language model can be specified as a finite-state network, where the permissible words following each word are given explicitly.
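Such a finite-state network can be sketched as a table of permissible successors. The vocabulary and transitions below are hypothetical, chosen only to illustrate the idea:

```python
# A minimal finite-state language model: for each word, the set of
# words permitted to follow it. <s> and </s> mark utterance start/end.
GRAMMAR = {
    "<s>": {"show", "list"},
    "show": {"flights", "fares"},
    "list": {"flights"},
    "flights": {"to", "</s>"},
    "fares": {"</s>"},
    "to": {"boston", "denver"},
    "boston": {"</s>"},
    "denver": {"</s>"},
}

def is_permissible(words):
    """Check that each word is allowed to follow its predecessor."""
    sequence = ["<s>"] + words + ["</s>"]
    return all(nxt in GRAMMAR.get(cur, set())
               for cur, nxt in zip(sequence, sequence[1:]))

print(is_permissible(["show", "flights", "to", "boston"]))  # True
print(is_permissible(["show", "boston"]))                   # False
```

A recogniser constrained by such a network only needs to score word sequences the grammar permits, which greatly reduces the search space.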
Perplexity
One popular measure of the difficulty of the task, combining the vocabulary size and the language model, is perplexity.
It is loosely defined as the geometric mean of the number of words that can follow a word after the language model has been applied (Zue, Cole, and Ward, 1995).
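As a sketch, perplexity can be computed as the geometric mean of the inverse probabilities a language model assigns to the words of a test sequence; the probabilities below are invented purely for illustration:

```python
import math

def perplexity(word_probs):
    """Perplexity of a model over a sequence, given the probability
    the model assigned to each word: 2 ** (average negative log2 prob),
    i.e. the geometric mean of the inverse probabilities."""
    log_sum = sum(math.log2(p) for p in word_probs)
    return 2 ** (-log_sum / len(word_probs))

# If the model always allows 10 equally likely next words, the
# perplexity equals that branching factor:
print(round(perplexity([0.1] * 5), 6))  # 10.0
```

This is why perplexity is often read as an effective branching factor: a task with perplexity 10 is roughly as hard as choosing among 10 equally likely words at every step.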
Finally, some external parameters can affect speech recognition system performance. These include the characteristics of the environmental noise and the type and placement of the microphone.
Properties of Recognizers: Summary
Speaker Independent vs. Speaker Dependent
Large Vocabulary (2K-200K words) vs. Limited Vocabulary (2-200)
Continuous vs. Discrete
Speech Recognition vs. Speech Verification
Real Time vs. multiples of real time
Continued
Spontaneous Speech vs. Read Speech
Noisy Environment vs. Quiet Environment
High Resolution Microphone vs. Telephone vs. Cellphone
Push-and-hold vs. push-to-talk vs. always-listening
Adapt to speaker vs. non-adaptive
Low vs. High Latency
With online incremental results vs. final results
Dialog Management
Features That Distinguish Products & Applications
Words, phrases, and grammar
Models of the speakers
Speech flow
Vocabulary: How many words
How you add new words
Grammars
Branching Factor (Perplexity)
Available languages
Systems are also defined by Users
Different Kinds of Users
One-time vs. frequent users
Homogeneity
Technically sophisticated
Different users have different speaker models
Speaker Models
Speaker Dependent
Speaker Independent
Speaker Adaptive
A TIMELINE OF SPEECH RECOGNITION
1870s: Alexander Graham Bell invents the telephone while trying to develop a speech recognition system for deaf people.
1936: AT&T's Bell Labs produced the first electronic speech synthesizer, called the Voder (Dudley, Riesz and Watkins).
This machine was demonstrated at the 1939 World's Fair by experts who used a keyboard and foot pedals to play the machine and emit speech.
1969: John Pierce of Bell Labs said automatic speech recognition would not be a reality for several decades because it requires artificial intelligence.
Early 70s
Early 1970s: The Hidden Markov Modeling (HMM) approach to speech recognition was invented by Lenny Baum of Princeton University and shared with several ARPA (Advanced Research Projects Agency) contractors, including IBM.
HMM is a complex mathematical pattern-matching strategy that was eventually adopted by all the leading speech recognition companies, including Dragon Systems, IBM, Philips, AT&T and others.
70+
1971: DARPA (Defense Advanced Research Projects Agency) established the Speech Understanding Research (SUR) program to develop a computer system that could understand continuous speech.
Lawrence Roberts, who initiated the program, spent $3 million per year of government funds for 5 years. Major SUR project groups were established at CMU, SRI, MIT's Lincoln Laboratory, Systems Development Corporation (SDC), and Bolt, Beranek, and Newman (BBN). It was the largest speech recognition project ever.
1978: The popular toy "Speak and Spell" by Texas Instruments was introduced. Speak and Spell used a speech chip, which led to huge strides in the development of more human-like digital synthesis sound.
80+
1982: Covox founded. The company brought digital sound (via The Voice Master, Sound Master and The Speech Thing) to the Commodore 64, Atari 400/800, and finally to the IBM PC in the mid 80s.
1982: Dragon Systems was founded by speech industry pioneers Drs. Jim and Janet Baker. Dragon Systems is well known for its long history of speech and language technology innovations and its large patent portfolio.
1984: SpeechWorks, the leading provider of over-the-telephone automated speech recognition (ASR) solutions, was founded.
90s
1993: Covox sells its products to Creative Labs, Inc.
1995: Dragon released discrete-word, dictation-level speech recognition software. It was the first time dictation speech recognition technology was available to consumers. IBM and Kurzweil followed a few months later.
1996: Charles Schwab is the first company to devote resources towards developing a speech recognition IVR system with Nuance. The program, Voice Broker, allows up to 360 simultaneous customers to call in and get quotes on stocks and options; it handles up to 50,000 requests each day. The system was found to be 95% accurate and set the stage for other companies such as Sears, Roebuck and Co., United Parcel Service of America Inc., and E*Trade Securities to follow in their footsteps.
1996: BellSouth launches the world's first voice portal, called Val and later Info By Voice.
95+
1997: Dragon introduced "Naturally Speaking", the first "continuous speech" dictation software available (meaning you no longer needed to pause between words for the computer to understand what you were saying).
1998: Lernout & Hauspie bought Kurzweil. Microsoft invested $45 million in Lernout & Hauspie to form a partnership that would eventually allow Microsoft to use their speech recognition technology in its systems.
1999: Microsoft acquired Entropic, giving Microsoft access to what was known as the "most accurate speech recognition system".
2000
2000: Lernout & Hauspie acquired Dragon Systems for approximately $460 million.
2000: TellMe introduces the first world-wide voice portal.
2000: NetBytel launched the world's first voice enabler, which includes an on-line ordering application with real-time Internet integration for Office Depot.
2000s
2001: ScanSoft closes acquisition of Lernout & Hauspie speech and language assets.
2003: ScanSoft ships Dragon NaturallySpeaking 7 Medical, lowering healthcare costs through highly accurate speech recognition.
2003: ScanSoft closes deal to distribute and support IBM ViaVoice desktop products.
Signal Variability
Speech recognition is a difficult problem, largely because of the many sources of variability associated with the signal.
The acoustic realisations of phonemes, the recognition system's smallest sound units of which words are composed, are highly dependent on the context in which they appear.
These phonetic variabilities are exemplified by the acoustic differences of the phoneme /t/ in two, true, and butter in English.
At word boundaries, contextual variations can be quite dramatic: devo andare sounds like devandare in Italian.
More
Acoustic variability can result from changes in the environment as well as in the position and characteristics of the transducer.
Within-speaker variability can result from changes in the speaker's physical and emotional state, speaking rate, or voice quality.
Differences in socio-linguistic background, dialect, and vocal tract size and shape can contribute to across-speaker variability.
What is a speech recognition system?
Speech recognition is generally used as a human-computer interface for other software. When it functions in this role, three primary tasks need to be performed:
Pre-processing, the conversion of spoken input into a form the recogniser can process.
Recognition, the identification of what has been said.
Communication, to send the recognised input to the application that requested it.
How is pre-processing performed?
To understand how the first of these functions is performed, we must examine:
Articulation, the production of the sound.
Acoustics, the stream of the speech itself.
Auditory perception, what characterises the ability to understand spoken input.
Articulation
The science of articulation is concerned with how phonemes are produced. The focus of articulation is on the vocal apparatus of the throat, mouth and nose, where the sounds are produced.
The phonemes themselves need to be classified; the system most often used in speech recognition is the ARPABET (Rabiner and Juang, 1993). The ARPABET was created in the 1970s by and for contractors working on speech processing for the Advanced Research Projects Agency of the U.S. Department of Defense.
ARPABET
Like most phoneme classifications, the ARPABET separates consonants from vowels.
Consonants are characterised by a total or partial blockage of the vocal tract.
Vowels are characterised by strong harmonic patterns and relatively free passage of air through the vocal tract.
Semi-vowels, such as the y in you, fall between consonants and vowels.
Consonant Classification
Consonant classification uses the:
Point of articulation.
Manner of articulation.
Presence or absence of voicing.
Acoustics
Articulation provides valuable information about how speech sounds are produced, but a speech recognition system cannot analyse movements of the mouth.
Instead, the data source for speech recognition is the stream of speech itself.
This is an analogue signal: a sound stream, a continuous flow of sound waves and silence.
Important Features (Acoustics)
Four important features of the acoustic analysis of speech are (Carter, 1984):
Frequency, the number of vibrations per second a sound produces.
Amplitude, the loudness of the sound.
Harmonic structure: added to the fundamental frequency of a sound are other frequencies that contribute to its quality or timbre.
Resonance.
Auditory perception: hearing speech
"Phonemes tend to be abstractions that are implicitly defined by the pronunciation of the words in the language. In particular, the acoustic realisation of a phoneme may heavily depend on the acoustic context in which it occurs. This effect is usually called co-articulation", (Ney, 1994).
The way a phoneme is pronounced can be affected by its position in a word, neighbouring phonemes and even the word's position in a sentence. This effect is called the co-articulation effect.
The variability in the speech signal caused by co-articulation and other sources makes speech analysis very difficult.
Human Hearing
The human ear can detect frequencies from 20 Hz to 20,000 Hz, but it is most sensitive in the critical frequency range, 1000 Hz to 6000 Hz (Ghitza, 1994).
Recent research has uncovered the fact that humans do not process individual frequencies.
Instead, we hear groups of frequencies, such as formant patterns, as cohesive units, and we are capable of distinguishing them from surrounding sound patterns (Carrell and Opie, 1992).
This capability, called auditory object formation, or auditory image formation, helps explain how humans can discern the speech of individual people at cocktail parties and separate a voice from noise over a poor telephone channel (Markowitz, 1995).
Pre-processing Speech
Like all sounds, speech is an analogue waveform. In order for a recognition system to perform actions on speech, it must be represented in a digital manner.
All noise patterns, silences and co-articulation effects must be captured.
This is accomplished by digital signal processing. The way the analogue speech is processed is one of the most complex elements of a speech recognition system.
Recognition Accuracy
To achieve high recognition accuracy, the speech representation process should (Markowitz, 1995):
Include all critical data.
Remove redundancies.
Remove noise and distortion.
Avoid introducing new distortions.
Signal Representation
In statistically based automatic speech recognition, the speech waveform is sampled at a rate between 6.6 kHz and 20 kHz and processed to produce a new representation as a sequence of vectors containing values of what are generally called parameters.
The vectors typically comprise between 10 and 20 parameters, and are usually computed every 10 or 20 milliseconds.
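The framing step behind this can be sketched as follows. The figures (16 kHz sampling, a vector every 10 ms over a 25 ms window) are typical values within the ranges quoted above, and the "signal" is a synthetic sine wave standing in for real audio:

```python
import math

SAMPLE_RATE = 16000                     # Hz, within the 6.6-20 kHz range
FRAME_STEP = int(0.010 * SAMPLE_RATE)   # new parameter vector every 10 ms
FRAME_LEN = int(0.025 * SAMPLE_RATE)    # 25 ms analysis window

# One second of a 440 Hz tone as a stand-in for digitised speech.
signal = [math.sin(2 * math.pi * 440 * n / SAMPLE_RATE)
          for n in range(SAMPLE_RATE)]

# Slice the signal into overlapping analysis frames; each frame would
# then be reduced to a vector of 10-20 parameters (e.g. by a spectral
# analysis, not shown here).
frames = [signal[start:start + FRAME_LEN]
          for start in range(0, len(signal) - FRAME_LEN + 1, FRAME_STEP)]
print(len(frames))   # 98 frames for one second of audio
```

Each of these frames is what the later probability-estimation stages score against phone-sized or whole-word reference units.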
Parameter Values
These parameter values are then used in succeeding stages in the estimation of the probability that the portion of waveform just analysed corresponds to a particular phonetic event in the phone-sized or whole-word reference unit being hypothesised.
In practice, the representation and the probability estimation interact strongly: what one person sees as part of the representation another may see as part of the probability estimation process.
Emotional State
Representations aim to preserve the information needed to determine the phonetic identity of a portion of speech while being as impervious as possible to factors such as speaker differences, effects introduced by communications channels, and paralinguistic factors such as the emotional state of the speaker.
They also aim to be as compact as possible.
Representations used in current speech recognisers concentrate primarily on properties of the speech signal attributable to the shape of the vocal tract rather than to the excitation, whether generated by a vocal-tract constriction or by the larynx.
Representations are sensitive to whether the vocal folds are vibrating or not (the voiced/unvoiced distinction), but try to ignore effects due to variations in their frequency of vibration.
Future Improvements in Speech Representation
The vast majority of major commercial and experimental systems use representations akin to those described here.
However, in striving to develop better representations, wavelet transforms (Daubechies, 1990) are being explored, and neural network methods are being used to provide non-linear operations on log spectral representations.
Work continues on representations more closely reflecting auditory properties (Greenberg, 1988) and on representations reconstructing articulatory gestures from the speech signal (Schroeter & Sondhi, 1994).
The latter approach is attractive because it holds out the promise of a small set of smoothly varying parameters that could deal in a simple and principled way with the interactions that occur between neighbouring phonemes and with the effects of differences in speaking rate and carefulness of enunciation.
The ultimate challenge is to match the superior performance of human listeners over automatic recognisers.
This superiority is especially marked when there is little material to allow adaptation to the voice of the current speaker, and when the acoustic conditions are difficult.
The fact that it persists even when nonsense words are used shows that it exists at least partly at the acoustic/phonetic level and cannot be explained purely by superior language modelling in the brain.
It confirms that there is still much to be done in developing better representations of the speech signal (Rabiner and Schafer, 1978; Hunt, 1993).
Signal Recognition Technologies
Signal recognition methodologies fall into four categories; most systems will apply one or more in the conversion process.
Template Matching
Template matching is the oldest and least effective method. It is a form of pattern recognition.
It was the dominant technology in the 1950s and 1960s.
Each word or phrase in an application is stored as a template.
The user input is also arranged into templates at the word level, and the best match with a system template is found.
Although template matching is currently in decline as the basic approach to recognition, it has been adapted for use in word-spotting applications. It also remains the primary technology applied to speaker verification (Moore, 1982).
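Template matching is commonly implemented with dynamic time warping (DTW), which aligns an input sequence against each stored template and picks the lowest-cost match. The sketch below uses toy one-dimensional "features" standing in for real acoustic vectors; the words and values are hypothetical:

```python
def dtw_cost(a, b):
    """Minimal cumulative alignment cost between sequences a and b,
    allowing either sequence to be locally stretched in time."""
    INF = float("inf")
    cost = [[INF] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch b
                                 cost[i][j - 1],      # stretch a
                                 cost[i - 1][j - 1])  # step both
    return cost[len(a)][len(b)]

# One stored template per vocabulary word (toy 1-D features).
templates = {"yes": [1, 3, 5, 3, 1], "no": [5, 5, 1, 1, 1]}
spoken = [1, 3, 3, 5, 3, 1]   # a time-warped "yes"
best = min(templates, key=lambda w: dtw_cost(spoken, templates[w]))
print(best)  # yes
```

The warping is what lets a slowly spoken word still match a template recorded at normal speed, which is exactly the property rigid frame-by-frame comparison lacks.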
Acoustic-Phonetic Recognition
Acoustic-phonetic recognition functions at the phoneme level. It is an attractive approach to speech as it limits the number of representations that must be stored. In English there are about forty discernible phonemes, no matter how large the vocabulary (Markowitz, 1995).
Acoustic-phonetic recognition involves three steps:
Feature extraction.
Segmentation and labelling.
Word-level recognition.
Acoustic-phonetic recognition supplanted template matching in the early 1970s.
The successful ARPA SUR systems highlighted the potential benefits of this approach. Unfortunately, acoustic phonetics was at the time a poorly researched area, and many of the expected advances failed to materialise.
The high degree of acoustic similarity among phonemes, combined with phoneme variability resulting from the co-articulation effect and other sources, creates uncertainty with regard to potential phoneme labels (Cole, 1986).
If these problems can be overcome, there is certainly an opportunity for this technology to play a part in future speech recognition systems.
Stochastic Processing
The term stochastic refers to the process of making a sequence of non-deterministic selections from among a set of alternatives.
They are non-deterministic because the choices during the recognition process are governed by the characteristics of the input and are not specified in advance (Markowitz, 1995).
Like template matching, stochastic processing requires the creation and storage of models of each of the items that will be recognised.
It is based on a series of complex statistical or probabilistic analyses. These statistics are stored in a network-like structure called a Hidden Markov Model (HMM) (Paul, 1990).
HMM
A Hidden Markov Model is made up of states and transitions, which are shown in the diagram. Each state of a HMM holds statistics for a segment of a word, which describe the values and variations found in the model of that word segment. The transitions allow for speech variations such as:
The prolonging of a word segment; this would cause several recursive transitions in the recogniser.
The omission of a word segment; this would cause a transition that skips a state.
Stochastic processing using Hidden Markov Models is accurate, flexible, and capable of being fully automated (Rabiner and Juang, 1986).
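The decoding step can be sketched with a toy left-to-right HMM and the Viterbi algorithm: self-loops model a prolonged segment, forward transitions advance to the next segment. All probabilities below are invented purely for illustration:

```python
import math

# Three states, one per word segment, in a left-to-right topology.
trans = {0: {0: 0.5, 1: 0.5},    # self-loop (prolong) or advance
         1: {1: 0.5, 2: 0.5},
         2: {2: 1.0}}
emit = [{"a": 0.8, "b": 0.2},    # per-state observation probabilities
        {"a": 0.2, "b": 0.8},
        {"a": 0.8, "b": 0.2}]

def viterbi(obs):
    """Most likely state sequence for a list of observation symbols."""
    v = [{0: (math.log(emit[0][obs[0]]), [0])}]   # must start in state 0
    for o in obs[1:]:
        layer = {}
        for prev, (lp, path) in v[-1].items():
            for nxt, tp in trans[prev].items():
                score = lp + math.log(tp) + math.log(emit[nxt][o])
                if nxt not in layer or score > layer[nxt][0]:
                    layer[nxt] = (score, path + [nxt])
        v.append(layer)
    return max(v[-1].values())[1]

print(viterbi(["a", "a", "b", "b", "a"]))  # [0, 0, 1, 1, 2]
```

The repeated state 0 in the output is exactly the "recursive transition" for a prolonged first segment described above; a real recogniser does this over acoustic feature vectors rather than symbols.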
Neural networks
"If speech recognition systems could learn speech knowledge automatically and represent this knowledge in a parallel distributed fashion for rapid evaluation, such a system would mimic the function of the human brain, which consists of several billion simple, inaccurate and slow processors that perform reliable speech processing", (Waibel and Hampshire, 1989).
An artificial neural network is a computer program which attempts to emulate the biological functions of the human brain. Neural networks are excellent classification systems, and have been effective with noisy, patterned, variable data streams containing multiple, overlapping, interacting and incomplete cues (Markowitz, 1995).
Neural networks do not require the complete specification of a problem, learning instead through exposure to large amounts of example data. Neural networks comprise an input layer, one or more hidden layers, and one output layer. The way in which the nodes and layers of a network are organised is called the network's architecture.
The allure of neural networks for speech recognition lies in their superior classification abilities.
Considerable effort has been directed towards the development of networks to do word, syllable and phoneme classification.
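The input/hidden/output layering can be sketched as a forward pass through a tiny feed-forward network. The weights below are fixed, made-up values; a real recogniser would learn them from example data:

```python
import math

def sigmoid(x):
    """Squash a weighted sum into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def forward(inputs, hidden_w, output_w):
    """One forward pass: input layer -> hidden layer -> output layer."""
    hidden = [sigmoid(sum(w * x for w, x in zip(row, inputs)))
              for row in hidden_w]
    return [sigmoid(sum(w * h for w, h in zip(row, hidden)))
            for row in output_w]

# Architecture: two inputs -> three hidden units -> one output score.
hidden_w = [[0.5, -0.2], [0.1, 0.8], [-0.3, 0.4]]
output_w = [[0.7, -0.5, 0.9]]
out = forward([1.0, 0.0], hidden_w, output_w)
print(0.0 < out[0] < 1.0)   # True: the output is a score in (0, 1)
```

In a phoneme classifier, the inputs would be acoustic parameters and there would be one output unit per phoneme class; training adjusts the weights from labelled examples.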
Auditory Models
The aim of auditory models is to allow a speech recognition system to screen all noise from the signal and concentrate on the central speech pattern, in a similar way to the human brain.
Auditory modelling offers the promise of being able to develop robust speech recognition systems that are capable of working in difficult environments.
Currently, it is purely an experimental technology.
Performance of Speech Recognition systems
Performance of speech recognition systems is typically described in terms of word error rate, defined as WER = (S + D + I) / N, where S, D and I are the numbers of substitutions, deletions and insertions, and N is the number of words in the reference. The three error types are:
Deletion: the loss of a word within the original speech. The system outputs "A E I U" while the input was "A E I O U".
Substitution: the replacement of an element of the input, such as a word, with another. The system outputs "song" while the input was "long".
Insertion: the system adds an element to the input, such as a word, when no word was input. The system outputs "A E I O U" while the input was "A E I U".
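In practice the error counts come from an edit-distance alignment between reference and hypothesis. A minimal sketch, reusing the vowel examples above:

```python
def wer(reference, hypothesis):
    """Word error rate: minimum substitutions + deletions + insertions
    needed to turn the reference into the hypothesis, divided by the
    number of reference words."""
    r, h = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between r[:i] and h[:j]
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i                        # i deletions
    for j in range(len(h) + 1):
        d[0][j] = j                        # j insertions
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j - 1] + sub,   # substitution/match
                          d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1)         # insertion
    return d[len(r)][len(h)] / len(r)

print(wer("A E I O U", "A E I U"))  # 0.2  (one deletion, N = 5)
print(wer("long", "song"))          # 1.0  (one substitution, N = 1)
```

Note that WER can exceed 1.0 when the system inserts many spurious words, which is why it is an error rate rather than an accuracy.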
Speech Recognition as Assistive Technology
Main use is as an alternative, hands-free data entry mechanism
Very effective
Much faster than switch access
Mainstream technology
Used in many applications where hands are needed for other things, e.g. mobile phone while driving, in surgical theatres
Dictation is a big part of office administration, and commercial speech recognition systems are targeted at this market.
Some interesting facts
Switch access users who were at around 5 words per minute achieved 80 words per minute with SR
This allowed them to do state exams
SR can be used for environmental control systems around the home, e.g.
Open Curtains
People with speech impairment (dysarthric speech) have shown improved articulation after using SR systems, especially discrete systems
Reasons why SR may fail some people
Crowded room - cannot have everyone talking at once
Too many errors because all noises, coughs, throat clearances etc. are picked up
Speech not good enough to use it
Not enough training
Cognitive overhead too much for some people
Too demanding physically - hard work to talk for a long time
Cannot be bothered with initial enrolment
Drinking - adversely affects vocal cords
Smoking, shouting, dry mouth and illness all affect the vocal tract
Need to drink water
Room must not be too stuffy
Some links
The following are links to major speech recognition sites
Carnegie Mellon Speech
Carnegie Mellon Speech Demos
CMU Communicator: http://www.speech.cs.cmu.edu/Communicator/
Call: 1-877-CMU-PLAN (268-7526), also 268-5144, or x8-1084
The information is accurate; you can use it for your own travel planning
CMU Universal Speech Interface (USI): http://www.speech.cs.cmu.edu/usi/
CMU Movie Line: http://www.speech.cs.cmu.edu/Movieline/
Seems to be about apartments now
Call: (412) 268-1185
Telephone Demos
Nuance: http://www.nuance.com
Banking: 1-650-847-7438
Travel Planning: 1-650-847-7427
Stock Quotes: 1-650-847-7423
SpeechWorks: http://www.speechworks.com/demos/demos.htm
Banking: 1-888-729-3366
Stock Trading: 1-800-786-2571
MIT Spoken Language Systems Laboratory: http://www.sls.lcs.mit.edu/sls/whatwedo/applications.html
Travel Plans (Pegasus): 1-877-648-8255
Weather (Jupiter): 1-888-573-8255
IBM: http://www-3.ibm.com/software/speech/
Mutual Funds, Name Dialing: 1-877-VIA-VOICE