

    Speech Recognition


    Definition

    Speech recognition is the process of converting an acoustic signal, captured by a microphone or a telephone, to a set of words.

    The recognised words can be an end in themselves, as for applications such as command and control, data entry, and document preparation.

    They can also serve as the input to further linguistic processing in order to achieve speech understanding.


    Speech Processing

    Signal processing: Convert the audio wave into a sequence of feature vectors

    Speech recognition: Decode the sequence of feature vectors into a sequence of words

    Semantic interpretation: Determine the meaning of the recognized words

    Dialog management: Correct errors and help get the task done

    Response generation: What words to use to maximize user understanding

    Speech synthesis (Text to Speech): Generate synthetic speech from a marked-up word string
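
    Taken together, these stages form a processing chain. Below is a minimal, hypothetical sketch of that chain in Python; the function names and stubbed return values are illustrative assumptions, not part of the original slides.

    import numpy as np

    def signal_processing(audio):
        # Convert the audio wave into a sequence of feature vectors
        # (stub: one 13-dimensional vector per 10 ms frame at 16 kHz).
        return np.zeros((len(audio) // 160, 13))

    def speech_recognition(features):
        # Decode the feature vectors into a sequence of words (stub).
        return ["open", "curtains"]

    def semantic_interpretation(words):
        # Determine the meaning of the recognised words (stub).
        return {"action": "open", "object": "curtains"}

    def dialog_management(meaning):
        # Correct errors and help get the task done (stub: pass through).
        return meaning

    def response_generation(meaning):
        # Choose words that maximise user understanding.
        return "Opening the curtains."

    audio = np.zeros(16000)   # one second of silent stand-in audio
    meaning = dialog_management(
        semantic_interpretation(speech_recognition(signal_processing(audio))))
    print(response_generation(meaning))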


    Dialog Management

    Goal: determine what to accomplish in response to user utterances, e.g.:

    Answer user question

    Solicit further information

    Confirm/clarify user utterance

    Notify invalid query

    Notify invalid query and suggest alternative

    Interface between user/language processing components and the system knowledge base


    What you can do with Speech Recognition

    Transcription

    dictation, information retrieval

    Command and control: data entry, device control, navigation, call routing

    Information access

    airline schedules, stock quotes, directory assistance

    Problem solving

    travel planning, logistics


    Transcription and Dictation

    Transcription is transforming a stream of human speech into computer-readable form

    Medical reports, court proceedings, notes

    Indexing (e.g., broadcasts)

    Dictation is the interactive composition of text

    Reports, correspondence, etc.


    Speech recognition and understanding

    Sphinx system

    speaker-independent

    continuous speech

    large vocabulary

    ATIS system

    air travel information retrieval

    context management


    Speech Recognition and Call Centres

    Automate services, lower payroll

    Shorten time on hold

    Shorten agent and client call time

    Reduce fraud

    Improve customer service


    Applications related to Speech Recognition

    Speech Recognition

    Figure out what a person is saying.

    Speaker Verification

    Authenticate that a person is who she/he claims to be.

    Limited speech patterns

    Speaker Identification

    Assigns an identity to the voice of an unknown person.

    Arbitrary speech patterns


    Many Kinds of Speech Recognition Systems

    Speech recognition systems can be characterised by many parameters.

    An isolated-word (discrete) speech recognition system requires that the speaker pause briefly between words, whereas a continuous speech recognition system does not.


    Spontaneous vs. Scripted

    Spontaneous speech contains disfluencies, periods of pause and restart, and is much more difficult to recognise than speech read from a script.


    Enrolment

    Some systems require speaker enrolment: a user must provide samples of his or her speech before using the system. Other systems are said to be speaker-independent, in that no enrolment is necessary.


    Large vs. Small Vocabularies

    Some of the other parameters depend on the specific task. Recognition is generally more difficult when vocabularies are large, with many similar-sounding words.

    When speech is produced in a sequence of words, language models or artificial grammars are used to restrict the combination of words.

    The simplest language model can be specified as a finite-state network, where the permissible words following each word are given explicitly.
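
    To make the finite-state idea concrete, here is a minimal sketch with a toy vocabulary (the words and network are my own illustrative assumptions): each entry lists the words permitted to follow, and a sentence is accepted only if every transition exists in the network.

    # Toy word network: each word lists the words permitted to follow it.
    network = {
        "<s>": ["show", "list"],
        "show": ["flights", "fares"],
        "list": ["flights"],
        "flights": ["to", "</s>"],
        "fares": ["</s>"],
        "to": ["boston", "denver"],
        "boston": ["</s>"],
        "denver": ["</s>"],
    }

    def accepts(words):
        # A sentence is permissible only if every word-to-word transition
        # exists in the network, from start (<s>) to end (</s>).
        state = "<s>"
        for w in words:
            if w not in network.get(state, []):
                return False
            state = w
        return "</s>" in network.get(state, [])

    print(accepts("show flights to boston".split()))   # True
    print(accepts("show to flights".split()))          # False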


    Perplexity

    One popular measure of the difficulty of the task, combining the vocabulary size and the language model, is perplexity.

    Perplexity is loosely defined as the geometric mean of the number of words that can follow a word after the language model has been applied (Zue, Cole, and Ward, 1995).
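
    In the standard formulation (a common definition consistent with this description, though the formula itself is not on the slide), the perplexity of a language model P over a test sequence of N words is the inverse probability normalised by length:

    \mathrm{PP}(w_1 \ldots w_N) = P(w_1 w_2 \cdots w_N)^{-1/N}

    Intuitively, a perplexity of, say, 60 means the recogniser faces on average the same uncertainty as choosing among 60 equally likely words at each step; lower perplexity generally indicates an easier recognition task.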


    Finally, some external parameters can affect speech recognition system performance. These include the characteristics of the environmental noise and the type and placement of the microphone.


    Properties of Recognizers: Summary

    Speaker independent vs. speaker dependent

    Large vocabulary (2K-200K words) vs. limited vocabulary (2-200 words)

    Continuous vs. discrete

    Speech recognition vs. speech verification

    Real time vs. multiples of real time


    Continued

    Spontaneous speech vs. read speech

    Noisy environment vs. quiet environment

    High-resolution microphone vs. telephone vs. cellphone

    Push-and-hold vs. push-to-talk vs. always-listening

    Adapt to speaker vs. non-adaptive

    Low vs. high latency

    Online incremental results vs. final results

    Dialog management


    Features That Distinguish Products & Applications

    Words, phrases, and grammar

    Models of the speakers

    Speech flow

    Vocabulary: How many words

    How you add new words

    Grammars

    Branching factor (perplexity)

    Available languages


    Systems are also defined by Users

    Different Kinds of Users

    One time vs. Frequent users

    Homogeneity

    Technically sophisticated

    Different users need different speaker models


    Speaker Models

    Speaker Dependent

    Speaker Independent

    Speaker Adaptive


    Sample Market: Call Centers

    Automate services, lower payroll

    Shorten time on hold

    Shorten agent and client call time

    Reduce fraud

    Improve customer service


    A TIMELINE OF SPEECH RECOGNITION

    1870s: Alexander Graham Bell invented the telephone while trying to develop a speech recognition system for deaf people.

    1936: AT&T's Bell Labs produced the first electronic speech synthesizer, called the Voder (Dudley, Riesz and Watkins). The machine was demonstrated at the 1939 World's Fair by experts who used a keyboard and foot pedals to play it and emit speech.

    1969: John Pierce of Bell Labs said automatic speech recognition would not be a reality for several decades because it requires artificial intelligence.


    Early 70s

    Early 1970s: The Hidden Markov Modeling (HMM) approach to speech recognition was invented by Lenny Baum of Princeton University and shared with several ARPA (Advanced Research Projects Agency) contractors, including IBM.

    HMM is a complex mathematical pattern-matching strategy that was eventually adopted by all the leading speech recognition companies, including Dragon Systems, IBM, Philips, AT&T and others.


    70+

    1971: DARPA (Defense Advanced Research Projects Agency) established the Speech Understanding Research (SUR) program to develop a computer system that could understand continuous speech.

    Lawrence Roberts, who initiated the program, spent $3 million per year of government funds for 5 years. Major SUR project groups were established at CMU, SRI, MIT's Lincoln Laboratory, Systems Development Corporation (SDC), and Bolt, Beranek, and Newman (BBN). It was the largest speech recognition project ever.

    1978: The popular toy "Speak and Spell" by Texas Instruments was introduced. Speak and Spell used a speech chip, which led to huge strides in the development of more human-like digital synthesis sound.


    80+

    1982: Covox founded. The company brought digital sound (via the Voice Master, Sound Master and The Speech Thing) to the Commodore 64, the Atari 400/800, and finally the IBM PC in the mid-80s.

    1982: Dragon Systems was founded by speech industry pioneers Drs. Jim and Janet Baker. Dragon Systems is well known for its long history of speech and language technology innovations and its large patent portfolio.

    1984: SpeechWorks, the leading provider of over-the-telephone automated speech recognition (ASR) solutions, was founded.


    90s

    1993: Covox sold its products to Creative Labs, Inc.

    1995: Dragon released discrete-word, dictation-level speech recognition software. It was the first time dictation speech recognition technology was available to consumers. IBM and Kurzweil followed a few months later.

    1996: Charles Schwab became the first company to devote resources to developing a speech recognition IVR system, with Nuance. The program, Voice Broker, allows up to 360 simultaneous customers to call in and get quotes on stocks and options; it handles up to 50,000 requests each day. The system was found to be 95% accurate and set the stage for other companies, such as Sears, Roebuck and Co., United Parcel Service of America Inc., and E*Trade Securities, to follow in their footsteps.

    1996: BellSouth launched the world's first voice portal, called Val and later Info By Voice.


    95+

    1997: Dragon introduced "Naturally Speaking", the first "continuous speech" dictation software available (meaning you no longer needed to pause between words for the computer to understand what you were saying).

    1998: Lernout & Hauspie bought Kurzweil. Microsoft invested $45 million in Lernout & Hauspie to form a partnership that would eventually allow Microsoft to use their speech recognition technology in its systems.

    1999: Microsoft acquired Entropic, giving Microsoft access to what was known as the "most accurate speech recognition system".


    2000

    2000: Lernout & Hauspie acquired Dragon Systems for approximately $460 million.

    2000: TellMe introduced the first worldwide voice portal.

    2000: NetBytel launched the world's first voice enabler, which included an online ordering application with real-time Internet integration for Office Depot.


    2000s

    2001: ScanSoft closed its acquisition of the Lernout & Hauspie speech and language assets.

    2003: ScanSoft shipped Dragon NaturallySpeaking 7 Medical, lowering healthcare costs through highly accurate speech recognition.

    2003: ScanSoft closed a deal to distribute and support IBM ViaVoice desktop products.


    Signal Variability

    Speech recognition is a difficult problem, largely because of the many sources of variability associated with the signal.

    The acoustic realisations of phonemes, the smallest sound units of which words are composed, are highly dependent on the context in which they appear.

    These phonetic variations are exemplified by the acoustic differences of the phoneme /t/ in two, true, and butter in English.

    At word boundaries, contextual variations can be quite dramatic: devo andare sounds like devandare in Italian.


    More

    Acoustic variability can result from changes in the environment as well as in the position and characteristics of the transducer.

    Within-speaker variability can result from changes in the speaker's physical and emotional state, speaking rate, or voice quality.

    Differences in socio-linguistic background, dialect, and vocal tract size and shape can contribute to across-speaker variability.


    What is a speech recognition system?

    Speech recognition is generally used as a human-computer interface for other software. When it functions in this role, three primary tasks need to be performed:

    Pre-processing, the conversion of spoken input into a form the recogniser can process.

    Recognition, the identification of what has been said.

    Communication, sending the recognised input to the application that requested it.


    How is pre-processing performed?

    To understand how the first of these functions is performed, we must examine:

    Articulation, the production of the sound.

    Acoustics, the stream of speech itself.

    Auditory perception, what characterises the ability to understand spoken input.


    Articulation

    The science of articulation is concerned with how phonemes are produced. The focus of articulation is on the vocal apparatus of the throat, mouth and nose, where the sounds are produced.

    The phonemes themselves need to be classified; the system most often used in speech recognition is the ARPABET (Rabiner and Juang, 1993). The ARPABET was created in the 1970s by and for contractors working on speech processing for the Advanced Research Projects Agency of the U.S. Department of Defense.


    ARPABET

    Like most phoneme classifications, the ARPABET separates consonants from vowels.

    Consonants are characterised by a total or partial blockage of the vocal tract.

    Vowels are characterised by strong harmonic patterns and relatively free passage of air through the vocal tract.

    Semi-vowels, such as the y in you, fall between consonants and vowels.


    Consonant Classification

    Consonant classification uses the:

    Point of articulation.

    Manner of articulation.

    Presence or absence of voicing.


    Acoustics

    Articulation provides valuable information about how speech sounds are produced, but a speech recognition system cannot analyse movements of the mouth.

    Instead, the data source for speech recognition is the stream of speech itself.

    This is an analogue signal: a sound stream, a continuous flow of sound waves and silence.


    Important Features (Acoustics)

    Four important features of the acoustic analysis of speech are (Carter, 1984):

    Frequency, the number of vibrations per second a sound produces.

    Amplitude, the loudness of the sound.

    Harmonic structure: added to the fundamental frequency of a sound are other frequencies that contribute to its quality or timbre.

    Resonance.
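
    As a small illustration of measuring the first two of these features (the synthetic signal and its values below are assumptions, not from the slides), numpy's FFT can pick out the dominant frequency and peak amplitude of a tone:

    import numpy as np

    sample_rate = 16000                        # samples per second
    t = np.arange(0, 0.1, 1.0 / sample_rate)   # 100 ms of signal

    # A 120 Hz fundamental plus a weaker harmonic at 240 Hz.
    signal = 1.0 * np.sin(2 * np.pi * 120 * t) + 0.4 * np.sin(2 * np.pi * 240 * t)

    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), 1.0 / sample_rate)

    print("dominant frequency: %.0f Hz" % freqs[np.argmax(spectrum)])   # ~120 Hz
    print("peak amplitude: %.2f" % np.max(np.abs(signal)))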


    Auditory perception: hearing speech

    "Phonemes tend to be abstractions that are implicitly defined by the pronunciation of the words in the language. In particular, the acoustic realisation of a phoneme may heavily depend on the acoustic context in which it occurs. This effect is usually called co-articulation" (Ney, 1994).

    The way a phoneme is pronounced can be affected by its position in a word, neighbouring phonemes, and even the word's position in a sentence. This effect is called the co-articulation effect.

    The variability in the speech signal caused by co-articulation and other sources makes speech analysis very difficult.


    Human Hearing

    The human ear can detect frequencies from 20 Hz to 20,000 Hz, but it is most sensitive in the critical frequency range, 1000 Hz to 6000 Hz (Ghitza, 1994).

    Recent research has uncovered the fact that humans do not process individual frequencies.

    Instead, we hear groups of frequencies, such as formant patterns, as cohesive units, and we are capable of distinguishing them from surrounding sound patterns (Carrell and Opie, 1992).

    This capability, called auditory object formation, or auditory image formation, helps explain how humans can discern the speech of individual people at cocktail parties and separate a voice from noise over a poor telephone channel (Markowitz, 1995).


    Pre-processing Speech

    Like all sounds, speech is an analogue waveform. In order for a recognition system to act on speech, it must be represented in a digital manner.

    All noise patterns, silences and co-articulation effects must be captured.

    This is accomplished by digital signal processing. The way the analogue speech is processed is one of the most complex elements of a speech recognition system.
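
    A minimal sketch of the digitisation step (the tone, rate and bit depth are illustrative assumptions): an "analogue" waveform is sampled at a fixed rate and each sample quantised to a 16-bit integer, as a sound card's analogue-to-digital converter would do.

    import numpy as np

    sample_rate = 8000                          # telephone-quality sampling rate
    t = np.arange(0, 0.02, 1.0 / sample_rate)   # 20 ms of "analogue" time axis
    analogue = np.sin(2 * np.pi * 440 * t)      # a 440 Hz tone stands in for speech

    # Quantise each sample to a 16-bit signed integer, the digital form.
    digital = np.round(analogue * 32767).astype(np.int16)

    print(digital[:8])   # the first few digitised samples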


    Recognition Accuracy

    To achieve high recognition accuracy, the speech representation process should (Markowitz, 1995):

    Include all critical data.

    Remove Redundancies.

    Remove Noise and Distortion.

    Avoid introducing new distortions.


    Signal Representation

    In statistically based automatic speech recognition, the speech waveform is sampled at a rate between 6.6 kHz and 20 kHz and processed to produce a new representation as a sequence of vectors containing values of what are generally called parameters.

    The vectors typically comprise between 10 and 20 parameters, and are usually computed every 10 or 20 milliseconds.
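
    The framing arithmetic implied by these figures can be sketched as follows (the 16 kHz rate, 25 ms window and 13 log-energy "parameters" are illustrative assumptions; a real front end would compute, e.g., cepstral parameters):

    import numpy as np

    sample_rate = 16000                        # Hz, within the 6.6-20 kHz range
    frame_step = int(0.010 * sample_rate)      # a new vector every 10 ms
    frame_len = int(0.025 * sample_rate)       # 25 ms analysis window (assumed)
    n_params = 13                              # within the 10-20 parameter range

    waveform = np.random.randn(sample_rate)    # 1 second of stand-in audio

    vectors = []
    for start in range(0, len(waveform) - frame_len + 1, frame_step):
        frame = waveform[start:start + frame_len]
        # Placeholder "parameters": log energies of 13 equal spectrum bands.
        spectrum = np.abs(np.fft.rfft(frame))
        bands = np.array_split(spectrum, n_params)
        vectors.append(np.log([b.sum() + 1e-10 for b in bands]))

    print(len(vectors), "vectors of", n_params, "parameters each")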


    Parameter Values

    These parameter values are then used in succeeding stages in the estimation of the probability that the portion of waveform just analysed corresponds to a particular phonetic event in the phone-sized or whole-word reference unit being hypothesised.

    In practice, the representation and the probability estimation interact strongly: what one person sees as part of the representation, another may see as part of the probability estimation process.


    Emotional State

    Representations aim to preserve the information needed to determine the phonetic identity of a portion of speech while being as impervious as possible to factors such as speaker differences, effects introduced by communications channels, and paralinguistic factors such as the emotional state of the speaker.

    They also aim to be as compact as possible.


    Representations used in current speech recognisers concentrate primarily on properties of the speech signal attributable to the shape of the vocal tract, rather than to the excitation, whether generated by a vocal-tract constriction or by the larynx.

    Representations are sensitive to whether the vocal folds are vibrating or not (the voiced/unvoiced distinction), but try to ignore effects due to variations in their frequency of vibration.



    Future Improvements in Speech Representation

    The vast majority of major commercial and experimental systems use representations akin to those described here.

    However, in striving to develop better representations, wavelet transforms (Daubechies, 1990) are being explored, and neural network methods are being used to provide non-linear operations on log spectral representations.


    Work continues on representations more closely reflecting auditory properties (Greenberg, 1988) and on representations reconstructing articulatory gestures from the speech signal (Schroeter & Sondhi, 1994).

    The articulatory approach is attractive because it holds out the promise of a small set of smoothly varying parameters that could deal in a simple and principled way with the interactions that occur between neighbouring phonemes, and with the effects of differences in speaking rate and carefulness of enunciation.


    The ultimate challenge is to match the superior performance of human listeners over automatic recognisers.

    This superiority is especially marked when there is little material to allow adaptation to the voice of the current speaker, and when the acoustic conditions are difficult.

    The fact that it persists even when nonsense words are used shows that it exists at least partly at the acoustic/phonetic level and cannot be explained purely by superior language modelling in the brain.

    It confirms that there is still much to be done in developing better representations of the speech signal (Rabiner and Schafer, 1978; Hunt, 1993).


    Signal Recognition Technologies

    Signal recognition methodologies fall into four categories; most systems will apply one or more in the conversion process.


    Template Matching

    Template matching is the oldest and least effective method. It is a form of pattern recognition.

    It was the dominant technology in the 1950s and 1960s.

    Each word or phrase in an application is stored as a template.

    The user input is also arranged into templates at the word level, and the best match with a system template is found.

    Although template matching is currently in decline as the basic approach to recognition, it has been adapted for use in word-spotting applications. It also remains the primary technology applied to speaker verification (Moore, 1982).
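
    A minimal sketch of the template idea follows (toy data throughout; dynamic time warping is the classic way to align a spoken input with stored templates despite speaking-rate differences, though the slides do not name a specific alignment method):

    import numpy as np

    def dtw_distance(a, b):
        # Dynamic time warping: align two feature-vector sequences that may
        # differ in speaking rate, returning the total alignment cost.
        n, m = len(a), len(b)
        cost = np.full((n + 1, m + 1), np.inf)
        cost[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                d = np.linalg.norm(a[i - 1] - b[j - 1])   # local frame distance
                cost[i, j] = d + min(cost[i - 1, j],       # stretch input
                                     cost[i, j - 1],       # stretch template
                                     cost[i - 1, j - 1])   # step together
        return cost[n, m]

    def recognise(utterance, templates):
        # Pick the stored word template with the smallest warped distance.
        return min(templates, key=lambda w: dtw_distance(utterance, templates[w]))

    # Toy templates: 1-D "feature" sequences standing in for real vectors.
    templates = {"yes": np.array([[1.0], [2.0], [3.0]]),
                 "no":  np.array([[3.0], [1.0], [1.0]])}
    print(recognise(np.array([[1.1], [2.2], [2.9], [3.0]]), templates))   # yes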


    Acoustic-Phonetic Recognition

    Acoustic-phonetic recognition functions at the phoneme level. It is an attractive approach to speech recognition, as it limits the number of representations that must be stored: in English there are about forty discernible phonemes, no matter how large the vocabulary (Markowitz, 1995).

    Acoustic-phonetic recognition involves three steps:

    Feature extraction.

    Segmentation and labelling.

    Word-level recognition.


    Acoustic-phonetic recognition supplanted template matching in the early 1970s.

    The successful ARPA SUR systems highlighted the potential benefits of this approach. Unfortunately, acoustic phonetics was at the time a poorly researched area, and many of the expected advances failed to materialise.


    The high degree of acoustic similarity among phonemes, combined with phoneme variability resulting from the co-articulation effect and other sources, creates uncertainty with regard to potential phoneme labels (Cole, 1986).

    If these problems can be overcome, there is certainly an opportunity for this technology to play a part in future speech recognition systems.


    Stochastic Processing

    The term stochastic refers to the process of making a sequence of non-deterministic selections from among a set of alternatives.

    The selections are non-deterministic because the choices during the recognition process are governed by the characteristics of the input and not specified in advance (Markowitz, 1995).

    Like template matching, stochastic processing requires the creation and storage of models of each of the items that will be recognised.

    It is based on a series of complex statistical or probabilistic analyses. These statistics are stored in a network-like structure called a Hidden Markov Model (HMM) (Paul, 1990).


    HMM

    A Hidden Markov Model is made up of states and transitions. Each state of an HMM holds statistics for a segment of a word, describing the values and variations found in the model of that word segment. The transitions allow for speech variations such as:

    The prolonging of a word segment, which causes several recursive transitions in the recogniser.

    The omission of a word segment, which causes a transition that skips a state.

    Stochastic processing using Hidden Markov Models is accurate, flexible, and capable of being fully automated (Rabiner and Juang, 1986).
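
    Since the slide's diagram is not reproduced here, the following minimal sketch (all probabilities invented for illustration) shows a three-state, left-to-right word HMM: self-loop transitions model a prolonged segment, the state-skipping transition models an omitted one, and Viterbi decoding finds the best state path for an observation sequence.

    import numpy as np

    # Three states, one per word segment. trans[i, j] = P(next = j | now = i):
    # self-loops prolong a segment, the 0 -> 2 entry skips (omits) a segment.
    trans = np.array([[0.5, 0.4, 0.1],
                      [0.0, 0.5, 0.5],
                      [0.0, 0.0, 1.0]])
    # emit[i, o] = P(observing symbol o | state i), over a toy 2-symbol alphabet.
    emit = np.array([[0.9, 0.1],
                     [0.2, 0.8],
                     [0.7, 0.3]])

    def viterbi(obs):
        # Log-domain Viterbi decoding: returns the most likely state path.
        lt, le = np.log(trans + 1e-12), np.log(emit + 1e-12)
        score = le[:, obs[0]] + np.log([1.0, 1e-12, 1e-12])   # start in state 0
        back = []
        for o in obs[1:]:
            cand = score[:, None] + lt          # score of every prev -> next move
            back.append(cand.argmax(axis=0))    # best predecessor per state
            score = cand.max(axis=0) + le[:, o]
        path = [int(score.argmax())]
        for b in reversed(back):
            path.append(int(b[path[-1]]))
        return path[::-1]

    print(viterbi([0, 0, 1, 1, 0]))   # -> [0, 0, 1, 1, 2]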


    Neural networks

    "if speech recognition systems could learn speechknowledge automatically and represent this knowledgein a parallel distributed fashion for rapid evaluation such a system would mimic the function of the humanbrain, which consists of several billion simple, inaccurate

    and slow processors that perform reliable speechprocessing", (Waibel and Hampshire, 1989).

    An artificial neural network is a computer program, whichattempt to emulate the biological functions of the Human

    brain. They are an excellent classification systems, andhave been effective with noisy, patterned, variable datastreams containing multiple, overlapping, interacting andincomplete cues, (Markowitz, 1995).


    Neural networks do not require the complete specification of a problem, learning instead through exposure to large amounts of example data. Neural networks comprise an input layer, one or more hidden layers, and one output layer. The way in which the nodes and layers of a network are organised is called the network's architecture.

    The allure of neural networks for speech recognition lies in their superior classification abilities.

    Considerable effort has been directed towards the development of networks to do word, syllable and phoneme classification.
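
    A minimal sketch of the architecture just described (the layer sizes, random weights and 13-feature input are illustrative assumptions; no training loop is shown): an input layer feeding one hidden layer and an output layer that scores a frame's feature vector against a set of phoneme classes.

    import numpy as np

    rng = np.random.default_rng(0)
    n_in, n_hidden, n_classes = 13, 32, 40   # e.g. 13 features, ~40 phonemes

    # Random, untrained weights for the connections between the layers.
    W1 = rng.normal(0.0, 0.1, (n_in, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, n_classes))
    b2 = np.zeros(n_classes)

    def classify(feature_vector):
        hidden = np.tanh(feature_vector @ W1 + b1)   # hidden-layer activations
        scores = hidden @ W2 + b2                    # one score per phoneme class
        return int(np.argmax(scores))                # index of best-scoring class

    print(classify(rng.normal(size=n_in)))           # predicted class index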


    Auditory Models

    The aim of auditory models is to allow a speech recognition system to screen all noise from the signal and concentrate on the central speech pattern, in a similar way to the human brain.

    Auditory modelling offers the promise of being able to develop robust speech recognition systems that are capable of working in difficult environments.

    Currently, it is purely an experimental technology.



    Performance of Speech Recognition Systems

    Performance of speech recognition systems is typically described in terms of word error rate: the number of substitution, deletion, and insertion errors divided by the number of words in the reference input. Three kinds of error contribute:

    Deletion: the loss of a word within the original speech. The system outputs "A E I U" while the input was "A E I O U".

    Substitution: the replacement of an element of the input, such as a word, with another. The system outputs "song" while the input was "long".

    Insertion: the system adds an element, such as a word, when no word was input. The system outputs "A E I O U" while the input was "A E I U".
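
    The three error counts are usually obtained from a word-level edit-distance alignment; here is a minimal sketch (a standard formulation, assumed rather than given on the slide):

    def wer(reference, hypothesis):
        # Word error rate via edit distance over words: the minimum number
        # of substitutions, deletions and insertions, divided by the number
        # of words actually spoken.
        ref, hyp = reference.split(), hypothesis.split()
        n, m = len(ref), len(hyp)
        d = [[0] * (m + 1) for _ in range(n + 1)]
        for i in range(n + 1):
            d[i][0] = i                            # i deletions
        for j in range(m + 1):
            d[0][j] = j                            # j insertions
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
                d[i][j] = min(sub,                 # substitution (or match)
                              d[i - 1][j] + 1,     # deletion
                              d[i][j - 1] + 1)     # insertion
        return d[n][m] / n

    print(wer("A E I O U", "A E I U"))   # one deletion in five words = 0.2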



    Speech Recognition as Assistive Technology

    Main use is as an alternative, hands-free data entry mechanism

    Very effective

    Much faster than switch access

    Mainstream technology

    Used in many applications where hands are needed for other things, e.g. using a mobile phone while driving, or in surgical theatres


    Dictation is a big part of office administration, and commercial speech recognition systems are targeted at this market.


    Some interesting facts

    Switch-access users who were at around 5 words per minute achieved 80 words per minute with SR

    This allowed them to do state exams

    SR can be used for environmental control systems around the home, e.g.

    Open Curtains


    People with speech impairment (dysarthric speech) have shown improved articulation after using SR systems, especially discrete systems



    Reasons why SR may fail some people

    Crowded room - cannot have everyone talking at once

    Too many errors because all noises, coughs, throat clearances etc. are picked up

    Speech not good enough to use it

    Not enough training

    Cognitive overhead too much for some people


    Too demanding physically - hard work to talk for a long time

    Cannot be bothered with initial enrolment

    Drinking - adversely affects the vocal cords

    Smoking, shouting, dry mouth and illness all affect the vocal tract

    Need to drink water

    Room must not be too stuffy


    Some links

    The following are links to major speech recognition sites



    Carnegie Mellon Speech Demos

    CMU Communicator

    Call: 1-877-CMU-PLAN (268-7526), also 268-5144, or x8-1084

    the information is accurate; you can use it for your own travel planning

    CMU Universal Speech Interface (USI)

    CMU Movie Line

    Seems to be about apartments now

    Call: (412) 268-1185

    http://www.speech.cs.cmu.edu/Communicator/
    http://www.speech.cs.cmu.edu/usi/
    http://www.speech.cs.cmu.edu/Movieline/

    Telephone Demos

    Nuance: http://www.nuance.com

    Banking: 1-650-847-7438

    Travel Planning: 1-650-847-7427

    Stock Quotes: 1-650-847-7423

    SpeechWorks: http://www.speechworks.com/demos/demos.htm

    Banking: 1-888-729-3366

    Stock Trading: 1-800-786-2571


    MIT Spoken Language Systems Laboratory: http://www.sls.lcs.mit.edu/sls/whatwedo/applications.html

    Travel Plans (Pegasus): 1-877-648-8255

    Weather (Jupiter): 1-888-573-8255

    IBM: http://www-3.ibm.com/software/speech/

    Mutual Funds, Name Dialing: 1-877-VIA-VOICE
