A few thoughts about ASAT


Upload: malcolm-roberts

Post on 03-Jan-2016


TRANSCRIPT

Page 1: A few thoughts about ASAT Some slides from NSF workshop presentation on knowledge integration Thoughts about “islands of certainty” Neural networks: the

A few thoughts about ASAT

• Some slides from NSF workshop presentation on knowledge integration

• Thoughts about “islands of certainty”

• Neural networks: the good, the bad, and the ugly

• Short intro to the OSU team du jour

Page 2

Outline (or, rather, my list of questions)

• What is Knowledge Integration (KI)?

• How has KI influenced ASR to date?

• Where should KI be headed?
  – What types of cues should we be looking for?
  – How should cues be combined?

Page 3

What is Knowledge Integration?

• It means different things to different people
  – Combining multiple hypotheses
  – Bringing linguistic information to bear in ASR

• Working definition:
  – Combining multiple sources of evidence to produce a final (or intermediate) hypothesis
  – Traditional ASR process uses KI
    • Combines acoustic, lexical, and syntactic information
    • But this is only the tip of the iceberg

Page 4

KI examples in ASR

[Diagram: ASR pipeline with blocks Feature Calculation, Acoustic Modeling (output “k @”), Pronunciation Modeling, Language Modeling, and SEARCH; output “The cat chased the dog”. Model scores shown: P(X|Q), P(Q|W), P(W)]

Pronunciation Modeling:
  cat: k@t
  dog: dog
  mail: mAl
  the: D&, DE
  …

Language Modeling:
  cat dog: 0.00002
  cat the: 0.0000005
  the cat: 0.029
  the dog: 0.031
  the mail: 0.054
  …

• Acoustic model gives state hypotheses from features

• Search integrates knowledge from acoustic, pronunciation, and language models

• Statistical models have “simple” dependencies

Page 5

KI: Statistical Dependencies

• “Side information” from the speech waveform
  • Speaking rate
  • Prosodic information
  • Syllable boundaries

[Diagram: ASR pipeline (Feature Calculation, Acoustic Modeling, Pronunciation Modeling, Language Modeling, SEARCH) with the same pronunciation and language-model tables; output “The cat chased the dog”]

Page 6

KI: Statistical Dependencies

• Information from sources outside “traditional” system
  • Class n-grams, CFG/Collins-style parsers
  • Sentence-level stress
  • Vocal-tract length normalization

[Diagram: ASR pipeline (Feature Calculation, Acoustic Modeling, Pronunciation Modeling, Language Modeling, SEARCH) with the same pronunciation and language-model tables; output “The cat chased the dog”]

Page 7

KI: Statistical Dependencies

• Information from “internal” knowledge sources
  • Pronunciations w/ multi-words, LM probabilities
  • State-level pronunciation modeling
  • Buried Markov Models

[Diagram: ASR pipeline (Feature Calculation, Acoustic Modeling, Pronunciation Modeling, Language Modeling, SEARCH) with the same pronunciation and language-model tables; output “The cat chased the dog”]

Page 8

KI: Statistical Dependencies

• Information from errors made by system
  • Discriminative acoustic, pronunciation, and language modeling

[Diagram: ASR pipeline (Feature Calculation, Acoustic Modeling, Pronunciation Modeling, Language Modeling, SEARCH) with the same pronunciation and language-model tables; output “The cat chased the dog”]

Page 9

KI: Model Combination

• Integrate multiple “final” hypotheses
  • ROVER
  • Word sausages (Mangu et al.)

[Diagram: two complete ASR pipelines (each with Feature Calculation, Acoustic Modeling, Pronunciation Modeling, Language Modeling) and a combination node marked X; output “The cat chased the dog”]

Page 10

KI: Model Combination

• Combine multiple “non-final” hypotheses
  • Multi-stream modeling
  • Synchronous phonological feature modeling
  • Boosting
  • Interpolated language models

[Diagram: two Feature Calculation / Acoustic Modeling streams, Pronunciation Modeling, Language Modeling, and a combination node marked X; output “The cat chased the dog”]

Page 11

Summary: Current uses of KI

• Probability conditioning: P(A|B) -> P(A|B,X,Y,Z)
  – More refined (accurate?) models
  – Can complicate overall equation

• Model merging: P(A|B) -> f(P1(A|B),w1) + f(P2(A|B),w2)
  – Different views of information are (usually) good
  – But sometimes combination methods are not as principled as one would like
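The model-merging form can be made concrete with a small sketch. It assumes f(P, w) = w·P, i.e. linear interpolation with weights summing to 1, which is one common reading of the slide's formula. The log-linear variant is an alternative that the slide does not name, and all word posteriors below are hypothetical.

```python
import math

def linear_merge(p1, p2, w1=0.5, w2=0.5):
    """Linear interpolation: w1*P1(A|B) + w2*P2(A|B), with w1 + w2 == 1."""
    if abs(w1 + w2 - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return {a: w1 * p1[a] + w2 * p2[a] for a in p1}

def log_linear_merge(p1, p2, w1=0.5, w2=0.5):
    """Weighted geometric mean, renormalized so the result sums to 1.

    Assumes every probability is strictly positive (log(0) is undefined).
    """
    scores = {a: math.exp(w1 * math.log(p1[a]) + w2 * math.log(p2[a]))
              for a in p1}
    z = sum(scores.values())
    return {a: s / z for a, s in scores.items()}

# Hypothetical posteriors over two candidate words from two models.
model1 = {"cat": 0.7, "cap": 0.3}
model2 = {"cat": 0.4, "cap": 0.6}

merged_linear = linear_merge(model1, model2)
merged_loglinear = log_linear_merge(model1, model2)
```

Both functions return a proper distribution. They differ in how strongly a single low-probability vote from one model can pull down the merged score, which is one reason the choice of combination method matters.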

Page 12

Where should we go from here?

• As a field, we have investigated many sources of knowledge
  – We learn more about language this way
    • Cf. “More data is better data” school

• To make an impact we need
  – A common framework
  – Easy ways to combine knowledge
  – “Interesting” sources of knowledge

Page 13

KI in Event-Driven ASR

• Phonological features as events (from Chin’s proposal)

[Diagram: phonological-feature event streams for the word “can’t”: back, alveolar; consonant, consonant, vowel; nasal; closure, burst; mid-low; closure, burst]

Page 14

KI in Event-Driven ASR

• Integrating multiple detectors
  – Easy if detectors are of the same type
  – Use both conditioning and model combination

[Diagram: phonological-feature event streams for “can’t” (back, alveolar; consonant, consonant, vowel; nasal; closure, burst; mid-low; closure, burst), with two detectors for “back”: P(back|detector1), P(back|detector2)]

Page 15

KI in Event-Driven ASR

• Integrating multiple cross-type detectors
  – Simplest to use Naïve Bayes assumption

P(X|e1,e2,e3) = (P(e1|X) P(e2|X) P(e3|X) P(X)) / Z

[Diagram: phonological-feature event streams for “can’t” (back, alveolar; consonant, consonant, vowel; nasal; closure, burst; mid-low; closure, burst), with P(k|features) computed from the features]
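The Naïve Bayes combination above can be sketched in a few lines. This is a minimal illustration, assuming each detector reports a likelihood P(e|X) for every candidate X. The phone set, detector names, and all numbers are hypothetical.

```python
def naive_bayes_posterior(prior, likelihoods, events):
    """Return P(X | e1..en) under the Naive Bayes assumption.

    prior:       {X: P(X)}
    likelihoods: {X: {event: P(event | X)}}
    events:      iterable of observed event names
    """
    unnorm = {}
    for x, p_x in prior.items():
        score = p_x
        for e in events:
            score *= likelihoods[x][e]  # events treated as independent given X
        unnorm[x] = score
    z = sum(unnorm.values())  # the normalizer Z in the formula
    return {x: s / z for x, s in unnorm.items()}

# Hypothetical numbers: deciding between the phones /k/ and /t/.
prior = {"k": 0.5, "t": 0.5}
likelihoods = {
    "k": {"back": 0.8, "closure": 0.9, "burst": 0.9},
    "t": {"back": 0.1, "closure": 0.9, "burst": 0.9},
}
posterior = naive_bayes_posterior(prior, likelihoods,
                                  ["back", "closure", "burst"])
```

Here the closure and burst detectors do not discriminate between the two phones, so the posterior is driven entirely by the “back” detector.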

Page 16

KI in Event-Driven ASR

• Breakdown in Naïve Bayes
  – Detectors aren’t always independent

[Diagram: event streams for “can’t” with phone k (back, alveolar; consonant, consonant, vowel; nasal; closure, burst; high; closure, burst). Annotations: “Feature spreading correlated with vowel raising”; “New non-independent detector”]
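A toy calculation shows why non-independent detectors are a problem. It takes the extreme case of a second detector that always agrees with the first, which is an assumption made for clarity. The slide's example (feature spreading correlated with vowel raising) would be a partial correlation. All numbers are hypothetical.

```python
def nb_posterior(prior, per_detector_likelihoods):
    """Naive Bayes posterior from a list of {X: P(e_i | X)} tables."""
    unnorm = dict(prior)
    for table in per_detector_likelihoods:
        for x in unnorm:
            unnorm[x] *= table[x]
    z = sum(unnorm.values())
    return {x: s / z for x, s in unnorm.items()}

prior = {"k": 0.5, "t": 0.5}
back_detector = {"k": 0.8, "t": 0.2}  # P(back fired | phone), hypothetical

one_detector = nb_posterior(prior, [back_detector])

# A second detector that always agrees with the first carries no new
# information, but Naive Bayes multiplies its likelihood in anyway.
two_detectors = nb_posterior(prior, [back_detector, back_detector])
```

With one detector P(k) is 0.8. With the redundant copy it rises to about 0.94, even though no new evidence arrived. The same evidence has been counted twice.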

Page 17

KI in Event-Driven ASR

• Wanted: Gestalt detector
  – View overall shape of detector streams

[Diagram: event streams for “can’t” with phone k (back, alveolar; consonant, consonant, vowel; nasal; closure, burst; high; closure, burst), labeled P(can’t| )]

Page 18

The Challenge of Plug-n-Play

• Shouldn’t have to re-learn entire system every time a new detector is added
  – Can’t have one global P(can’t|all variables)
  – Changes should be localized

• Implies need for hierarchical structure

• Composition structure should enable combination of radically different forms of information
  – E.g., audio-visual speech recognition

Page 19

The Challenge of Plug-n-Play

• Perhaps need three types of structures
  – Event integrators
    • Is this a CVC syllable?
    • Problems like feature spreading become local
  – Hypothesis generators
    • I think the word “can’t” is here.
    • Combines evidence from top-level integrators
  – Hypothesis validators
    • Is this hypothesis consistent?
    • Language model, word boundary detection, …

• Still probably have Naïve Bayes problems
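One way to picture the three structure types is as separate, swappable functions. This is only a sketch of the idea, not a proposed design. Every name, the toy lexicon, the toy language model, and the product scoring rule are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Events = Dict[str, float]  # detector name -> confidence in [0, 1]

@dataclass
class Hypothesis:
    word: str
    score: float

def cvc_integrator(events: Events) -> float:
    """Event integrator: 'Is this a CVC syllable?' (toy product rule)."""
    return events["onset_c"] * events["vowel"] * events["coda_c"]

def word_generator(integrated: Dict[str, float]) -> List[Hypothesis]:
    """Hypothesis generator: propose words from integrator outputs."""
    toy_lexicon = {"cvc": ["can", "cat"]}  # hypothetical
    return [Hypothesis(w, integrated[shape])
            for shape, words in toy_lexicon.items()
            for w in words]

def lm_validator(hyp: Hypothesis) -> float:
    """Hypothesis validator: rescale by a toy unigram language model."""
    toy_lm = {"can": 0.6, "cat": 0.4}  # hypothetical
    return hyp.score * toy_lm.get(hyp.word, 0.0)

def run(events: Events,
        integrators: Dict[str, Callable[[Events], float]],
        generator: Callable[[Dict[str, float]], List[Hypothesis]],
        validators: List[Callable[[Hypothesis], float]]) -> List[Hypothesis]:
    """Integrate events, generate hypotheses, validate, and rank."""
    integrated = {name: fn(events) for name, fn in integrators.items()}
    hyps = generator(integrated)
    for validate in validators:
        hyps = [Hypothesis(h.word, validate(h)) for h in hyps]
    return sorted(hyps, key=lambda h: h.score, reverse=True)

events = {"onset_c": 0.9, "vowel": 0.8, "coda_c": 0.5}
ranked = run(events, {"cvc": cvc_integrator}, word_generator, [lm_validator])
```

In this arrangement a new detector only changes the integrator that reads it. The generator and validators are untouched, which is the kind of localized change the plug-n-play goal asks for.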

Page 20

What type of detectors should we be thinking about?

• Phonological features

• Phones

• Syllables? Words? Function Words?

• Syllable/word boundaries

• Prosodic stress

• … and a whole bunch of other things
  – We’ve already looked at a number of them
  – And Jim’s already made some of these points

Page 21

Putting it all together

• Huge multi-dimensional graph search

• Should not be strictly “left-to-right”
  – “Islands of certainty”
  – People tend to emphasize the important words
    • …and we can usually detect them better
  – Work backwards to firm up uncertain segments

Page 22

Summary

• As a field, we have looked at many influences on our probabilistic models

• Have gained expertise in
  – Probability conditioning
  – Model combination

• Event-driven ASR may provide a challenging but interesting framework for incorporating different ideas

Page 23

Thoughts about “islands of certainty”

Page 24

We can’t parse everything

• At least not on the first pass

• Need to find ways to cleverly reduce computation: center around things that we’re sure about
  – Can we use confidence values from “light” detectors and refine? (likely)
  – Can we use external sources of knowledge to help guide search? (likely)

Page 25

Word/syllable onset detection

• Several findings point to the existence of cues that can help with word segmentation
  – Psychology experiments have suggested that phonotactics plays a big role (e.g., Saffran et al.)
  – Shire (at ICSI) was able to train a pretty reliable syllable boundary detector from acoustics
  – Syllable onsets are pronounced more canonically than nuclei or codas: 84% vs. 65% on Switchboard, 90% vs. 62%/80% on TIMIT (Fosler-Lussier et al. 99)

• Can we build “island of certainty” models by looking at a combination of acoustic/phonetic factors?

Page 26

Pronunciation numbers


Page 27

Integrating multiple units

• Naïve method: just try to combine everything in sight

• Refined method: process left to right, but process a buffer (e.g., 0.5-2 sec)
  – Look for islands
  – Back-fit other material in a way that makes sense given the islands
  – Can use external measures like speaking rate to validate likelihood of inferred structure
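The first two steps of the refined method can be sketched as follows: find the islands in one buffer, then list the uncertain spans between them that would need back-fitting. This assumes each buffer segment already carries a single confidence value. The 0.9 threshold and the confidences are hypothetical, and the back-fitting itself is left out.

```python
def find_islands(confidences, threshold=0.9):
    """Indices of buffer segments whose confidence meets the threshold."""
    return [i for i, c in enumerate(confidences) if c >= threshold]

def uncertain_spans(islands, n):
    """Maximal runs of non-island indices, as (start, end) half-open pairs."""
    island_set = set(islands)
    spans, start = [], None
    for i in range(n):
        if i in island_set:
            if start is not None:
                spans.append((start, i))  # close the run that just ended
                start = None
        elif start is None:
            start = i  # open a new run of uncertain segments
    if start is not None:
        spans.append((start, n))  # run extends to the end of the buffer
    return spans

# Hypothetical per-segment confidences for one buffer.
buffer_conf = [0.95, 0.40, 0.55, 0.97, 0.30, 0.92]
islands = find_islands(buffer_conf)
spans = uncertain_spans(islands, len(buffer_conf))
```

Each span is bounded by islands on one or both sides, so a back-fitting step could use those neighbors as fixed context when re-scoring the uncertain material.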

Page 28

Neural nets

• ANNs are good as non-linear discriminators

• But they have a problem: when they’re wrong, they are often REALLY wrong
  – Ex: training on TI digits (30 phones, easy)
  – CV frame-level margin: P(correct) - P(next competitor)
    • 9%: margin < -0.4
    • 8%: margin from -0.4 to 0
    • 8%: margin from 0 to 0.4
    • 75%: margin > 0.4

• Could chalk this up to “pronunciation variation”

• Current thinking: if training were more responsive to margin, it might move some of that 9% upward.
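The frame-level margin can be computed directly from a network's output posteriors. The sketch below uses the slide's four bins. How values exactly on a bin edge are assigned is an assumption, and the example frames are hypothetical.

```python
def frame_margin(posteriors, correct):
    """P(correct) minus the highest posterior among the other classes."""
    p_correct = posteriors[correct]
    p_rival = max(p for label, p in posteriors.items() if label != correct)
    return p_correct - p_rival

def margin_histogram(margins):
    """Fraction of frames in each of the four margin bins."""
    bins = {"< -0.4": 0, "-0.4 to 0": 0, "0 to 0.4": 0, "> 0.4": 0}
    for m in margins:
        if m < -0.4:
            bins["< -0.4"] += 1
        elif m < 0:
            bins["-0.4 to 0"] += 1
        elif m <= 0.4:
            bins["0 to 0.4"] += 1
        else:
            bins["> 0.4"] += 1
    n = len(margins)
    return {k: v / n for k, v in bins.items()}

# Hypothetical frames: (network posteriors, correct label).
frames = [
    ({"k": 0.90, "t": 0.05, "p": 0.05}, "k"),  # confidently right
    ({"k": 0.10, "t": 0.85, "p": 0.05}, "k"),  # confidently wrong
    ({"k": 0.50, "t": 0.40, "p": 0.10}, "k"),  # narrowly right
    ({"k": 0.45, "t": 0.50, "p": 0.05}, "k"),  # narrowly wrong
]
margins = [frame_margin(p, c) for p, c in frames]
hist = margin_histogram(margins)
```

A negative margin means the frame was misclassified. The lowest bin holds the “REALLY wrong” frames, where a competitor beats the correct class by more than 0.4.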

Page 29

Current personnel

• Me

• Keith as consultant

• Anton Rytting (Linguistics): part-time senior grad student, works on word segmentation in Greek; currently twisting his arm

• Linguistics student TBA 1/05.

• Incoming students (we’ll see who works)
  – 1 ECE student (signal processing)
  – 2 CSE students (MS in reinforcement learning, BA in genetic algorithms)