Open Problems in Speech Recognition Nelson Morgan, EECS and ICSI

Post on 20-Dec-2015


Page 1: Open Problems in Speech Recognition Nelson Morgan, EECS and ICSI

Open Problems in Speech Recognition

Nelson Morgan, EECS and ICSI

Page 2

ICSI and EECS

• International Computer Science Institute

• Nonprofit, closely affiliated with UCB-EECS:

- faculty (e.g., Morgan, Feldman)
- Board (Berlekamp, Karp, Malik)
- students (PhD, MS)

• Focus areas in speech, language, theory, internet research; CITRIS involvement

Page 3

A working speech recognizer (circa 1920)

Page 4

A working speech recognizer (circa 2002)

Page 5

Current Applications

• Toys

• Telephone queries (operator/touch tone replacement)

• Voice dialing (for cell phones)

• Dictation (esp. for specific domains)

Page 6

Major Reasons for Success

• Late 60’s statistical methodology (HMMs, developed for cryptography) applied to speech in 70’s and 80’s

• Moore’s Law + engineering refinements to HMM training/recognition (1986-now)

• Normalization approaches (mean norms, RASTA filtering, vocal tract length approx)
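The simplest of these normalizations, mean normalization, can be sketched as follows. This is a minimal illustration, assuming the features are log-spectral or cepstral vectors with one vector per frame; the name `mean_normalize` and the toy numbers are hypothetical. A fixed channel multiplies the spectrum, which becomes an additive constant in the log domain, so subtracting the per-utterance mean removes it.

```python
def mean_normalize(frames):
    """Subtract the per-utterance mean from every feature dimension."""
    n_frames = len(frames)
    n_dims = len(frames[0])
    means = [sum(f[d] for f in frames) / n_frames for d in range(n_dims)]
    return [[f[d] - means[d] for d in range(n_dims)] for f in frames]


# Toy utterance: three frames, two feature dimensions (made-up values).
clean = [[1.0, 2.0], [3.0, 0.0], [2.0, 4.0]]

# The same utterance through a different channel: a constant log-domain offset.
channel_offset = [5.0, -1.5]
shifted = [[v + o for v, o in zip(f, channel_offset)] for f in clean]

normalized_clean = mean_normalize(clean)
normalized_shifted = mean_normalize(shifted)
```

After normalization the two versions of the utterance carry the same features, which is the property that makes the recognizer less sensitive to the channel.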

Page 7

Two examples of things that helped

• RASTA: digit error rose from 2% to 60% on a different phone system; down to 3% using RASTA; now used for voice dialing in millions of cell phones

• Vocal tract length normalization: 1 parameter for each speaker, significant effect on errors; now used in all large research systems
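The idea behind RASTA can be sketched as band-pass filtering the time trajectory of each log-spectral band, so that slowly varying components such as a fixed channel are suppressed. The sketch below assumes the commonly cited filter form, a numerator of 0.1·(2, 1, 0, -1, -2) with a single pole; the pole value 0.94 is an assumption (0.98 is also cited), the filter is written in causal form, and `rasta_filter` is a hypothetical name.

```python
def rasta_filter(trajectory, pole=0.94):
    """Band-pass filter one log-spectral band over time (one value per frame)."""
    # Numerator taps sum to zero, so any constant input is blocked.
    numer = [0.2, 0.1, 0.0, -0.1, -0.2]
    out = []
    prev = 0.0
    for n in range(len(trajectory)):
        fir = sum(c * trajectory[n - k] for k, c in enumerate(numer) if n - k >= 0)
        prev = fir + pole * prev  # single-pole recursive part
        out.append(prev)
    return out


# A constant log-domain offset, e.g. the effect of a fixed telephone channel.
constant_channel = [3.0] * 500
filtered = rasta_filter(constant_channel)
```

After the initial transient, the filtered output of a constant input decays toward zero, which is why a change of phone system has much less effect on the features.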

Page 8

Major Technical Challenges

• Speaker variability for fluent/conversational speech (pronunciation, rate, overlaps): 25-40% error on conversations

• Acoustic variability for general environments (noise, reverb, talker movement): 3-10% error on read digits (vs <1% in clean conditions)
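Error figures like these are usually reported as word error rate. The slide does not name the metric, so the sketch below is an assumption: it computes the word-level edit distance between a reference and a recognizer hypothesis, divided by the reference length. The name `word_error_rate` and the example strings are hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by reference length.

    Assumes a non-empty reference transcript.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] holds the distance between the processed part of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            ))
        prev = cur
    return prev[-1] / len(ref)


# One substitution ("two" -> "too") and one deletion ("four") in four words.
wer = word_error_rate("one two three four", "one too three")
```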

Page 9

Modern ASR Systems

• From 50,000 ft, all ASR systems the same:

- compute local spectral envelope
- determine likelihoods of speech sounds
- search for most likely HMMs

• Spectral envelope distorted by many things

- Alternatives often are bad fits to the statistical models
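The third step, the search for the most likely HMM state sequence, is commonly done with the Viterbi algorithm. The following is a toy sketch assuming that algorithm and a made-up two-state model; all names and probabilities are illustrative and not taken from the talk.

```python
import math


def viterbi(log_init, log_trans, log_obs):
    """Return the most likely state path through an HMM.

    log_init[s]     : log probability of starting in state s
    log_trans[p][s] : log probability of moving from state p to state s
    log_obs[t][s]   : log likelihood of frame t under state s
    """
    n_states = len(log_init)
    score = [log_init[s] + log_obs[0][s] for s in range(n_states)]
    backptrs = []
    for t in range(1, len(log_obs)):
        new_score, ptr = [], []
        for s in range(n_states):
            best = max(range(n_states), key=lambda p: score[p] + log_trans[p][s])
            ptr.append(best)
            new_score.append(score[best] + log_trans[best][s] + log_obs[t][s])
        score = new_score
        backptrs.append(ptr)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    return path[::-1]


log = math.log
init = [log(0.5), log(0.5)]
trans = [[log(0.9), log(0.1)], [log(0.1), log(0.9)]]
# Per-frame likelihoods of two speech sounds over four frames.
obs = [
    [log(0.8), log(0.2)],
    [log(0.6), log(0.4)],
    [log(0.1), log(0.9)],
    [log(0.2), log(0.8)],
]
best_path = viterbi(init, trans, obs)
```

In this toy case the first two frames locally favor state 0, but the best overall path stays in state 1 throughout, which illustrates how the search lets context override weak local evidence.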

Page 10

ASR in Brief

[Block diagram: Speech → Signal Processing → Phonetic Probability Estimator → Decoder (word search) → Words; the Pronunciation Lexicon and Grammar feed the Decoder.]

Page 11

ASR is half-deaf

• Phonetic classification very poor

• Success due to constraints (domain, speaker, noise-canceling mic, etc)

• These constraints can mask the underlying weakness of the technology

Page 12

Rethinking Acoustic Processing for ASR

• Escape dependence on spectral envelope

• Use multiple front ends across time/freq

• Modify statistical models to accommodate new front ends

• Design optimal combination schemes for multiple models

Page 13

The DARPA (IAO) “EARS” Program

• New 5 year program to radically reduce errors in conversational speech-to-text

• Two components:

- Rich Transcription (large reductions in error rate, improvements in readability and portability to new languages)
- Novel Approaches (radical changes)

Page 14

EARS: Effective Affordable Reusable Speech-to-text

• Rich Transcription: 4 teams

- SRI/ICSI/UW
- BBN/U.Pitt/UW/LIMSI
- Cambridge U.
- IBM

• Novel Approaches: 2 teams

- ICSI/SRI/UW/OGI/Columbia/IDIAP
- Microsoft

Page 15

Novel Approach 1: Pushing the Envelope (aside)

• Problem: Spectral envelope is a fragile information carrier

• Solution: Probabilities from multiple time-frequency patches

[Figure, time axis. OLD: 10 ms frame → estimate of sound identity. PROPOSED: patches of up to 1 s → i-th, k-th, n-th estimates → information fusion → estimate of sound identity.]
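One simple way to fuse probability estimates from several time-frequency patches is a weighted log-linear (product) combination of the per-patch posteriors, followed by renormalization. This is a sketch of one common option, not necessarily the scheme used in the project; `combine_posteriors` and the example numbers are hypothetical.

```python
import math


def combine_posteriors(streams, weights=None):
    """Fuse per-stream class posteriors for a single frame.

    streams : list of posterior vectors, one per time-frequency patch
    weights : optional per-stream exponents (default: equal weights)
    """
    if weights is None:
        weights = [1.0 / len(streams)] * len(streams)
    n_classes = len(streams[0])
    floor = 1e-12  # guard against log(0)
    log_scores = [
        sum(w * math.log(max(p[c], floor)) for w, p in zip(weights, streams))
        for c in range(n_classes)
    ]
    # Subtract the peak before exponentiating for numerical stability.
    peak = max(log_scores)
    unnorm = [math.exp(s - peak) for s in log_scores]
    total = sum(unnorm)
    return [u / total for u in unnorm]


# Three patch-level estimates over three candidate sounds (made-up values).
patch_estimates = [
    [0.6, 0.3, 0.1],
    [0.5, 0.4, 0.1],
    [0.2, 0.7, 0.1],
]
fused = combine_posteriors(patch_estimates)
```

The per-stream weights are the obvious place to encode how much each patch should be trusted; with equal weights this reduces to a normalized geometric mean.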

Page 16

Novel Approach 2: Beyond Frames…

• Problem: Features & models interact, new features may require different models

• Solution: Advanced features require advanced models, not limited by fixed-frame-rate paradigm

[Figure. OLD: short-term features → conventional HMM. PROPOSED: advanced features → multi-rate / dynamic scale classifier.]

Page 17

Other speech-to-text projects

• Dialog systems: DARPA Communicator/Symphony, German SmartKom

• Noise/reverberation for cell phone, military environments: DARPA SPINE program, various European projects (EU, ETSI)

• Recognition/retrieval/summarization for multiparty meetings: Swiss IM2, EU m4, ICSI/UW/SRI/Columbia NSF-ITR

Page 18

Resource generation from Berkeley researchers

• gmtk - a new graphical model toolkit specialized for speech (extension of 2 PhD theses, Bilmes [UW] and Zweig [IBM]) -

• Publicly available speech/neural network software (RASTA, speech neural network training system)

• Soon: a “meeting data” corpus

Page 19

Campus interaction

• Within EECS (CIS):

- Feldman (also ICSI), NLU
- Jordan and Russell, machine learning

• Linguists:

- Ohala, phonology
- Fillmore (ICSI), semantic lexicography

Page 20

Natural Speech + Language Projects at ICSI/EECS

• Berkeley Restaurant Project (BeRP) - online stochastic context free grammar probabilities with natural mixed initiative

• SmartKom - tourist information query system w/American pronunciations of German place names

Page 21

Summary

• Progress in speech recognition research led to working systems in particular domains

• Performance still severely limited for conversational speech, noisy/reverberant conditions

• We and others are working to transcend these limitations with novel approaches