A few thoughts about ASAT
• Some slides from NSF workshop presentation on knowledge integration
• Thoughts about “islands of certainty”
• Neural networks: the good, the bad, and the ugly
• Short intro to the OSU team du jour
Outline (or, rather, my list of questions)
• What is Knowledge Integration (KI)?
• How has KI influenced ASR to date?
• Where should KI be headed?
  – What types of cues should we be looking for?
  – How should cues be combined?
What is Knowledge Integration?
• It means different things to different people
  – Combining multiple hypotheses
  – Bringing linguistic information to bear in ASR
• Working definition:
  – Combining multiple sources of evidence to produce a final (or intermediate) hypothesis
  – Traditional ASR process uses KI
    • Combines acoustic, lexical, and syntactic information
    • But this is only the tip of the iceberg
KI examples in ASR
[Diagram: Feature Calculation feeds Acoustic Modeling; Acoustic, Pronunciation, and Language Modeling feed SEARCH, which outputs “The cat chased the dog”]
Acoustic Modeling: k @
Pronunciation Modeling: cat: k@t; dog: dog; mail: mAl; the: D&, DE; …
Language Modeling: cat dog: 0.00002; cat the: 0.0000005; the cat: 0.029; the dog: 0.031; the mail: 0.054; …
SEARCH: P(X|Q) P(Q|W) P(W)
• Acoustic model gives state hypotheses from features
• Search integrates knowledge from acoustic, pronunciation, and language models
• Statistical models have “simple” dependencies
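The three factors attached to the search box can be read as the usual decomposition of the recognition objective. The slide does not define its symbols, so this sketch assumes the conventional reading: X is the acoustic feature sequence, Q a state (phone) sequence, and W a word sequence, with the acoustics depending on W only through Q.

```latex
\hat{W} = \arg\max_{W} P(W \mid X)
        = \arg\max_{W} \sum_{Q} P(X \mid Q)\, P(Q \mid W)\, P(W)
```

In practice the sum over Q is often approximated by its largest term (a Viterbi-style search).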
KI: Statistical Dependencies
[Diagram: same ASR pipeline (acoustic, pronunciation, and language models feeding SEARCH)]
• “Side information” from the speech waveform
  – Speaking rate
  – Prosodic information
  – Syllable boundaries
KI: Statistical Dependencies
[Diagram: same ASR pipeline (acoustic, pronunciation, and language models feeding SEARCH)]
• Information from sources outside “traditional” system
  – Class n-grams, CFG/Collins-style parsers
  – Sentence-level stress
  – Vocal-tract length normalization
KI: Statistical Dependencies
[Diagram: same ASR pipeline (acoustic, pronunciation, and language models feeding SEARCH)]
• Information from “internal” knowledge sources
  – Pronunciations w/ multi-words, LM probabilities
  – State-level pronunciation modeling
  – Buried Markov Models
KI: Statistical Dependencies
[Diagram: same ASR pipeline (acoustic, pronunciation, and language models feeding SEARCH)]
• Information from errors made by system
  – Discriminative acoustic, pronunciation, and language modeling
KI: Model Combination
[Diagram: two complete pipelines (feature calculation, acoustic, pronunciation, and language modeling) whose outputs are combined (X) into “The cat chased the dog”]
• Integrate multiple “final” hypotheses
  – ROVER
  – Word sausages (Mangu et al.)
KI: Model Combination
[Diagram: an extra feature calculation and acoustic modeling stream combined (X) with a full pipeline (feature calculation, acoustic, pronunciation, and language modeling) to produce “The cat chased the dog”]
• Combine multiple “non-final” hypotheses
  – Multi-stream modeling
  – Synchronous phonological feature modeling
  – Boosting
  – Interpolated language models
Summary: Current uses of KI
• Probability conditioning: P(A|B) -> P(A|B,X,Y,Z)
  – More refined (accurate?) models
  – Can complicate overall equation
• Model merging: P(A|B) -> f(P1(A|B),w1) + f(P2(A|B),w2)
  – Different views of information are (usually) good
  – But sometimes combination methods are not as principled as one would like
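To make the model-merging form concrete, here is a minimal sketch of two common choices of f: a weighted sum (linear interpolation) and a weighted sum of log probabilities (log-linear combination, renormalized). The phone labels, the probabilities, and the function names are made up for illustration; the slide does not say which f is intended.

```python
import math

def linear_merge(p1, p2, w1=0.5):
    """f(p, w) = w * p: weighted sum of two distributions over the same labels."""
    w2 = 1.0 - w1
    return {a: w1 * p1[a] + w2 * p2[a] for a in p1}

def log_linear_merge(p1, p2, w1=0.5):
    """f(p, w) = w * log p: weighted sum in the log domain, then renormalized.

    Assumes every probability is strictly positive.
    """
    w2 = 1.0 - w1
    unnorm = {a: math.exp(w1 * math.log(p1[a]) + w2 * math.log(p2[a])) for a in p1}
    z = sum(unnorm.values())
    return {a: v / z for a, v in unnorm.items()}

# Hypothetical posteriors over three phones from two different models.
p1 = {"k": 0.7, "g": 0.2, "t": 0.1}
p2 = {"k": 0.5, "g": 0.1, "t": 0.4}

linear = linear_merge(p1, p2)
log_linear = log_linear_merge(p1, p2)
print(linear)
print(log_linear)
```

The linear form behaves like a vote, while the log-linear form lets either model veto a label by assigning it a very small probability.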
Where should we go from here?
• As a field, we have investigated many sources of knowledge
  – We learn more about language this way
    • Cf. “More data is better data” school
• To make an impact we need
  – A common framework
  – Easy ways to combine knowledge
  – “Interesting” sources of knowledge
KI in Event-Driven ASR
• Phonological features as events (from Chin’s proposal)
[Figure: feature events aligned over the word “can’t”: back, alveolar; consonant, vowel, consonant; nasal; mid-low; closure, burst (two pairs)]
KI in Event-Driven ASR
• Integrating multiple detectors
  – Easy if detectors are of the same type
  – Use both conditioning and model combination
[Figure: feature events for “can’t”, with two detectors for the same feature: P(back|detector1), P(back|detector2)]
KI in Event-Driven ASR
• Integrating multiple cross-type detectors
  – Simplest to use Naïve Bayes assumption
P(X|e1,e2,e3) = P(e1|X) P(e2|X) P(e3|X) P(X) / Z
[Figure: feature events for “can’t”, combined into P(k|features)]
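The Naïve Bayes combination can be sketched in a few lines. The phone set, event names, and probabilities below are invented for illustration, and Z is computed by normalizing over the candidate phones.

```python
def naive_bayes_posterior(prior, likelihoods, events):
    """Return P(X | events) assuming events are independent given X.

    prior:       {X: P(X)}
    likelihoods: {X: {event: P(event | X)}}
    events:      names of the observed events
    """
    unnorm = {}
    for x, p_x in prior.items():
        score = p_x
        for e in events:
            score *= likelihoods[x][e]
        unnorm[x] = score
    z = sum(unnorm.values())  # the normalizer Z
    return {x: s / z for x, s in unnorm.items()}

# Hypothetical numbers: "back" separates k from t; closure and burst do not.
prior = {"k": 0.5, "t": 0.5}
likelihoods = {
    "k": {"back": 0.9, "closure": 0.8, "burst": 0.8},
    "t": {"back": 0.1, "closure": 0.8, "burst": 0.8},
}
post = naive_bayes_posterior(prior, likelihoods, ["back", "closure", "burst"])
print(post)
```

With these numbers only the "back" detector changes the answer, since closure and burst are equally likely under both phones.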
KI in Event-Driven ASR
• Breakdown in Naïve Bayes
  – Detectors aren’t always independent
[Figure: feature events for “can’t” with the vowel now marked “high” and a new, non-independent detector added for k; feature spreading is correlated with vowel raising]
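One way to see the breakdown: if a second detector responds to exactly the same evidence as the first, Naïve Bayes still multiplies its likelihood in and becomes overconfident. This is a toy sketch with made-up numbers, using the extreme case of a fully redundant detector.

```python
def posterior(prior, per_detector_likelihoods):
    """Naive Bayes: multiply the prior by every detector's likelihood, then normalize."""
    unnorm = dict(prior)
    for lik in per_detector_likelihoods:
        for x in unnorm:
            unnorm[x] *= lik[x]
    z = sum(unnorm.values())
    return {x: v / z for x, v in unnorm.items()}

prior = {"k": 0.5, "t": 0.5}
back = {"k": 0.7, "t": 0.3}  # hypothetical P(back event | phone)

one = posterior(prior, [back])
# A fully redundant second detector adds no information,
# yet its likelihood is counted again.
two = posterior(prior, [back, back])
print(one["k"], two["k"])  # about 0.7 versus about 0.845
```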
KI in Event-Driven ASR
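• Wanted: Gestalt detector
  – View overall shape of detector streams
[Figure: feature events for “can’t” (vowel marked “high”, plus the k detector) scored as a whole: P(can’t| ), where the conditioning argument on the slide was a picture of the detector streams]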
The Challenge of Plug-n-Play
• Shouldn’t have to re-learn entire system every time a new detector is added
  – Can’t have one global P(can’t|all variables)
  – Changes should be localized
• Implies need for hierarchical structure
• Composition structure should enable combination of radically different forms of information
  – E.g., audio-visual speech recognition
The Challenge of Plug-n-Play
• Perhaps need three types of structures
  – Event integrators
    • Is this a CVC syllable?
    • Problems like feature spreading become local
  – Hypothesis generators
    • I think the word “can’t” is here.
    • Combines evidence from top-level integrators
  – Hypothesis validators
    • Is this hypothesis consistent?
    • Language model, word boundary detection, …
• Still probably have Naïve Bayes problems
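As a rough illustration of how the three roles might fit together, here is a toy sketch. Every name, score, and combination rule in it is hypothetical; it only shows that adding a detector can be confined to one new integrator rather than retraining everything.

```python
from typing import Callable, Dict

Events = Dict[str, float]  # hypothetical: detector name -> confidence in [0, 1]

def cvc_integrator(events: Events) -> float:
    """Event integrator: "is this a CVC syllable?" (weakest of the three cues)."""
    return min(events.get(k, 0.0) for k in ("onset_consonant", "vowel", "coda_consonant"))

def word_generator(integrators: Dict[str, Callable[[Events], float]],
                   events: Events) -> Dict[str, float]:
    """Hypothesis generator: collects evidence from the top-level integrators."""
    return {name: fn(events) for name, fn in integrators.items()}

def validator(evidence: Dict[str, float], threshold: float = 0.5) -> bool:
    """Hypothesis validator: a stand-in consistency check on the collected evidence."""
    return all(score >= threshold for score in evidence.values())

events = {"onset_consonant": 0.9, "vowel": 0.8, "coda_consonant": 0.7}
integrators = {"cvc": cvc_integrator}
evidence = word_generator(integrators, events)
print(evidence, validator(evidence))

# Plugging in a new detector adds one integrator; nothing else changes.
events["nasal"] = 0.6
integrators["nasality"] = lambda ev: ev.get("nasal", 0.0)
evidence = word_generator(integrators, events)
print(evidence, validator(evidence))
```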
What type of detectors should we be thinking about?
• Phonological features
• Phones
• Syllables? Words? Function Words?
• Syllable/word boundaries
• Prosodic stress
• … and a whole bunch of other things
  – We’ve already looked at a number of them
  – And Jim’s already made some of these points
Putting it all together
• Huge multi-dimensional graph search
• Should not be strictly “left-to-right”
  – “Islands of certainty”
  – People tend to emphasize the important words
    • …and we can usually detect them better
  – Work backwards to firm up uncertain segments
Summary
• As a field, we have looked at many influences on our probabilistic models
• Have gained expertise in
  – Probability conditioning
  – Model combination
• Event-driven ASR may provide a challenging but interesting framework for incorporating different ideas
Thoughts about “islands of certainty”
We can’t parse everything
• At least not on the first pass
• Need to find ways to cleverly reduce computation: center around things that we’re sure about
  – Can we use confidence values from “light” detectors and refine? (likely)
  – Can we use external sources of knowledge to help guide search? (likely)
Word/syllable onset detection
• Several findings point to the existence of cues that can help with word segmentation
  – Psychology experiments have suggested that phonotactics plays a big role (e.g., Saffran et al.)
  – Shire (at ICSI) was able to train a pretty reliable syllable boundary detector from acoustics
  – Syllable onsets are pronounced more canonically than nuclei or codas: 84% vs 65% on Switchboard, 90% vs 62%/80% on TIMIT (Fosler-Lussier et al. 99)
• Can we build “island of certainty” models by looking at a combination of acoustic/phonetic factors?
Pronunciation numbers
Integrating multiple units
• Naïve method: just try to combine everything in sight
• Refined method: process left to right, but process a buffer (e.g., 0.5-2 sec)
  – Look for islands
  – Back-fit other material in a way that makes sense given the islands
  – Can use external measures like speaking rate to validate likelihood of inferred structure
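The buffered method could start from something as simple as thresholding per-frame detector confidences within the buffer. In this minimal sketch the threshold, the confidence values, and the function names are all assumptions; the back-fitting step itself is left out, and only the spans it would have to fill are computed.

```python
def find_islands(confidences, threshold=0.8):
    """Return (start, end) index pairs (end exclusive) of runs at or above threshold."""
    islands, start = [], None
    for i, c in enumerate(confidences):
        if c >= threshold and start is None:
            start = i
        elif c < threshold and start is not None:
            islands.append((start, i))
            start = None
    if start is not None:
        islands.append((start, len(confidences)))
    return islands

def gaps_between(islands, length):
    """Uncertain stretches left to back-fit once the islands are fixed."""
    gaps, prev_end = [], 0
    for start, end in islands:
        if start > prev_end:
            gaps.append((prev_end, start))
        prev_end = end
    if prev_end < length:
        gaps.append((prev_end, length))
    return gaps

# Hypothetical per-frame confidences for one buffer.
frame_conf = [0.3, 0.9, 0.95, 0.4, 0.2, 0.85, 0.9, 0.5]
islands = find_islands(frame_conf)
gaps = gaps_between(islands, len(frame_conf))
print(islands)  # [(1, 3), (5, 7)]
print(gaps)     # [(0, 1), (3, 5), (7, 8)]
```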
Neural nets
• ANNs are good as non-linear discriminators
• But they have a problem: when they’re wrong, they are often REALLY wrong
  – Ex: training on TI digits (30 phones, easy)
  – CV frame-level margin: P(correct)-P(next competitor)
    • 9% margin < -0.4, 8% margin -0.4 to 0
    • 8% margin 0 to 0.4, 75% margin > 0.4
• Could chalk this up to “pronunciation variation”
• Current thinking: if training more responsive to margin, might move some of that 9% upward.
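The frame-level margin and its four buckets can be computed as follows. The posteriors here are invented for illustration and are not the TI digits results; placing a margin of exactly 0 or 0.4 in the third bucket is an assumption, since the slide does not say how boundaries were handled.

```python
def frame_margin(posteriors, correct):
    """P(correct) minus the highest posterior among the other classes."""
    best_other = max(p for label, p in posteriors.items() if label != correct)
    return posteriors[correct] - best_other

def margin_histogram(margins):
    """Fraction of frames falling in each of the four margin buckets."""
    buckets = {"< -0.4": 0, "-0.4 to 0": 0, "0 to 0.4": 0, "> 0.4": 0}
    for m in margins:
        if m < -0.4:
            buckets["< -0.4"] += 1
        elif m < 0:
            buckets["-0.4 to 0"] += 1
        elif m <= 0.4:
            buckets["0 to 0.4"] += 1
        else:
            buckets["> 0.4"] += 1
    n = len(margins)
    return {name: count / n for name, count in buckets.items()}

# Hypothetical (network posteriors, correct label) pairs for four frames.
frames = [
    ({"k": 0.9, "t": 0.1}, "k"),    # confidently right
    ({"k": 0.2, "t": 0.8}, "k"),    # confidently wrong
    ({"k": 0.55, "t": 0.45}, "k"),  # barely right
    ({"k": 0.4, "t": 0.6}, "k"),    # barely wrong
]
margins = [frame_margin(post, label) for post, label in frames]
hist = margin_histogram(margins)
print(hist)
```

A negative margin means the frame is misclassified, so the "< -0.4" bucket is where the network is confidently wrong.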
Current personnel
• Me
• Keith as consultant
• Anton Rytting (Linguistics): part time senior grad student, works on word segmentation in Greek; currently twisting his arm
• Linguistics student TBA 1/05.
• Incoming students (we’ll see who works)
  – 1 ECE student (signal processing)
  – 2 CSE students (MS in reinforcement learning, BA in genetic algorithms)