acoustic / lexical model
![Page 1: Acoustic / Lexical Model](https://reader034.vdocuments.site/reader034/viewer/2022051417/56814f19550346895dbcaa44/html5/thumbnails/1.jpg)
Acoustic / Lexical Model
Derk Geene
![Page 2: Acoustic / Lexical Model](https://reader034.vdocuments.site/reader034/viewer/2022051417/56814f19550346895dbcaa44/html5/thumbnails/2.jpg)
Speech recognition
P(words|signal) = P(signal|words) P(words) / P(signal)
P(signal|words): acoustic model
P(words): language model
Idea: maximize P(signal|words) P(words)
Today: acoustic model
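A minimal sketch of the maximization idea, assuming a small list of candidate word sequences with already-computed scores. P(signal) is the same for every candidate, so it can be dropped from the comparison. The function name `decode`, the candidate sentences, and all probabilities are made up for illustration; log probabilities are used only to avoid multiplying very small numbers.

```python
import math

def decode(candidates, acoustic_logprob, language_logprob):
    """Return the candidate that maximizes log P(signal|words) + log P(words)."""
    return max(candidates, key=lambda w: acoustic_logprob[w] + language_logprob[w])

# Hypothetical scores for one utterance (illustrative numbers only).
acoustic = {
    "recognize speech": math.log(0.20),    # P(signal|words)
    "wreck a nice beach": math.log(0.25),
}
language = {
    "recognize speech": math.log(0.010),   # P(words)
    "wreck a nice beach": math.log(0.001),
}

best = decode(list(acoustic), acoustic, language)
print(best)
```

In this toy case the second candidate fits the signal slightly better, but the language model makes the first one the overall winner.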
![Page 3: Acoustic / Lexical Model](https://reader034.vdocuments.site/reader034/viewer/2022051417/56814f19550346895dbcaa44/html5/thumbnails/3.jpg)
Variability
Variation: speaker, pronunciation, environmental, context
A static acoustic model will not work in real applications.
Dynamically adapt P(signal|words) while using the system.
![Page 4: Acoustic / Lexical Model](https://reader034.vdocuments.site/reader034/viewer/2022051417/56814f19550346895dbcaa44/html5/thumbnails/4.jpg)
Measuring errors (1)
500 sentences of 6 to 10 words each, from 5 to 10 different speakers.
10% relative error reduction
Training set / Development set
First decide optimal parameter settings.
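The 10% figure can be made concrete with a small calculation, assuming the usual reading of relative error reduction as (baseline error - new error) / baseline error. The function name and the error rates below are illustrative assumptions, not values from the slides.

```python
def relative_error_reduction(baseline_error, new_error):
    """Fraction of the baseline error that the new system removes."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - new_error) / baseline_error

# Hypothetical example: the error rate drops from 20% to 18%.
reduction = relative_error_reduction(20.0, 18.0)
print(f"{reduction:.0%}")
```

Here an absolute improvement of 2 percentage points corresponds to a 10% relative reduction.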
![Page 5: Acoustic / Lexical Model](https://reader034.vdocuments.site/reader034/viewer/2022051417/56814f19550346895dbcaa44/html5/thumbnails/5.jpg)
Measuring errors (2)
Word recognition errors: substitution, deletion, insertion
Correct: Did mob mission area of the Copeland ever go to m4 in nineteen eighty one?
Recognized: Did mob mission area ** the copy land ever go to m4 in nineteen east one?
![Page 6: Acoustic / Lexical Model](https://reader034.vdocuments.site/reader034/viewer/2022051417/56814f19550346895dbcaa44/html5/thumbnails/6.jpg)
Measuring errors (3)
Correct: The effect is clear
Recognised: Effect is not clear
Error rate when comparing the words one by one: 75%
Word error rate = 100% x (Subs + Dels + Ins) / (# words in correct sentence)
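The two measures can be compared on the example sentence pair with a short sketch. It assumes a standard minimum-edit-distance alignment over words, with substitutions, deletions and insertions each costing 1, and case-insensitive comparison; the function names are illustrative.

```python
def edit_counts(ref, hyp):
    """Return (subs, dels, ins) for a minimum-edit alignment of two word lists."""
    # cost[i][j] = (total, subs, dels, ins) for aligning ref[:i] with hyp[:j]
    cost = [[None] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    cost[0][0] = (0, 0, 0, 0)
    for i in range(1, len(ref) + 1):
        cost[i][0] = (i, 0, i, 0)          # only deletions
    for j in range(1, len(hyp) + 1):
        cost[0][j] = (j, 0, 0, j)          # only insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            t, s, d, n = cost[i - 1][j - 1]
            match = ref[i - 1] == hyp[j - 1]
            diag = (t, s, d, n) if match else (t + 1, s + 1, d, n)
            t, s, d, n = cost[i - 1][j]
            dele = (t + 1, s, d + 1, n)
            t, s, d, n = cost[i][j - 1]
            inse = (t + 1, s, d, n + 1)
            cost[i][j] = min(diag, dele, inse)  # smallest total cost wins
    return cost[-1][-1][1:]

def word_error_rate(correct, recognised):
    """100 * (subs + dels + ins) / number of words in the correct sentence."""
    ref, hyp = correct.lower().split(), recognised.lower().split()
    return 100.0 * sum(edit_counts(ref, hyp)) / len(ref)

def one_by_one_error_rate(correct, recognised):
    """Percentage of positions whose words differ, without any alignment."""
    ref, hyp = correct.lower().split(), recognised.lower().split()
    wrong = sum(r != h for r, h in zip(ref, hyp)) + abs(len(ref) - len(hyp))
    return 100.0 * wrong / len(ref)

correct = "The effect is clear"
recognised = "Effect is not clear"
print(one_by_one_error_rate(correct, recognised))  # position-by-position comparison
print(word_error_rate(correct, recognised))        # aligned comparison
```

With alignment, the pair differs by one deletion ("The") and one insertion ("not"), so the word error rate is 100% x 2 / 4 = 50%, compared with 75% for the one-by-one comparison.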
![Page 7: Acoustic / Lexical Model](https://reader034.vdocuments.site/reader034/viewer/2022051417/56814f19550346895dbcaa44/html5/thumbnails/7.jpg)
Units of speech (1)
Modeling is language dependent.
A modeling unit should be: accurate, trainable, generalizable
![Page 8: Acoustic / Lexical Model](https://reader034.vdocuments.site/reader034/viewer/2022051417/56814f19550346895dbcaa44/html5/thumbnails/8.jpg)
Units of speech (2)
Whole-word models: only suitable for small vocabulary recognition
Phone models: suitable for large vocabulary recognition
Problem: phone models over-generalize, so they are less accurate
Syllable models
![Page 9: Acoustic / Lexical Model](https://reader034.vdocuments.site/reader034/viewer/2022051417/56814f19550346895dbcaa44/html5/thumbnails/9.jpg)
Context dependency (1)
Recognition accuracy can be improved by using context-dependent parameters.
Important in fast / spontaneous speech.
Example: the phoneme /ee/
![Page 10: Acoustic / Lexical Model](https://reader034.vdocuments.site/reader034/viewer/2022051417/56814f19550346895dbcaa44/html5/thumbnails/10.jpg)
[Figure: the phoneme /ee/ in the words "peat" and "wheel"]
![Page 11: Acoustic / Lexical Model](https://reader034.vdocuments.site/reader034/viewer/2022051417/56814f19550346895dbcaa44/html5/thumbnails/11.jpg)
Context dependency (2)
Triphone model: a phonetic model that takes into consideration both the left and the right neighbouring phones.
If two phones have the same identity but different left or right contexts, they are considered different triphones.
Interword context-dependent phones.
Place in the word: beginning, middle, end
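A small sketch of how a phone sequence might be expanded into triphones. The `left-centre+right` label format, the `sil` boundary symbol, and the phone symbols (`iy` standing in for /ee/) are assumptions chosen for illustration; the slides do not prescribe a notation.

```python
def to_triphones(phones, boundary="sil"):
    """Label each phone with its left and right neighbour as 'left-centre+right'."""
    padded = [boundary] + list(phones) + [boundary]
    return [
        f"{padded[i - 1]}-{padded[i]}+{padded[i + 1]}"
        for i in range(1, len(padded) - 1)
    ]

peat = to_triphones(["p", "iy", "t"])
wheel = to_triphones(["w", "iy", "l"])
print(peat)
print(wheel)
```

The same phone `iy` becomes `p-iy+t` in "peat" and `w-iy+l` in "wheel", so the two occurrences are modeled as different triphones.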
![Page 12: Acoustic / Lexical Model](https://reader034.vdocuments.site/reader034/viewer/2022051417/56814f19550346895dbcaa44/html5/thumbnails/12.jpg)
Context dependency (3)
Stress: longer duration, higher pitch, more intensity
Word-level stress: IMport – imPORT, ITaly – iTALian
Sentence-level stress: I did have dinner. I did have dinner.
![Page 13: Acoustic / Lexical Model](https://reader034.vdocuments.site/reader034/viewer/2022051417/56814f19550346895dbcaa44/html5/thumbnails/13.jpg)
[Figure: two examples of the word "Radio"]
![Page 14: Acoustic / Lexical Model](https://reader034.vdocuments.site/reader034/viewer/2022051417/56814f19550346895dbcaa44/html5/thumbnails/14.jpg)
Context dependency (4)
Very many triphones: 50^3 = 125,000
Many phonemes have the same effects:
/b/ & /p/: labial (pronounced using the lips)
/r/ & /w/: liquids
Clustered acoustic-phonetic units:
Is the left-context phone a fricative?
Is the right-context phone a front vowel?
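The triphone count and the clustering idea can be sketched as follows. This is a simplification under stated assumptions: contexts are merged with a fixed, hand-written class table (`PHONE_CLASS`, containing only the two groupings named on the slide), whereas real systems typically choose such questions from data. The phone symbols and function names are illustrative.

```python
from collections import defaultdict

# With 50 phones, every (left, centre, right) combination is a distinct triphone.
NUM_PHONES = 50
print(NUM_PHONES ** 3)

# Hand-written context classes (assumption: only the groupings from the slide).
PHONE_CLASS = {"b": "labial", "p": "labial", "r": "liquid", "w": "liquid"}

def cluster_triphones(triphones):
    """Group triphones whose left and right contexts fall in the same class."""
    clusters = defaultdict(list)
    for left, centre, right in triphones:
        key = (PHONE_CLASS.get(left, left), centre, PHONE_CLASS.get(right, right))
        clusters[key].append((left, centre, right))
    return dict(clusters)

triphones = [("b", "iy", "t"), ("p", "iy", "t"), ("r", "iy", "t"), ("w", "iy", "t")]
clusters = cluster_triphones(triphones)
for key, members in clusters.items():
    print(key, members)
```

The four triphones collapse into two clustered units, one for labial left contexts and one for liquid left contexts, which means fewer models to train.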
![Page 15: Acoustic / Lexical Model](https://reader034.vdocuments.site/reader034/viewer/2022051417/56814f19550346895dbcaa44/html5/thumbnails/15.jpg)
Acoustic model
After feature extraction, we have a sequence of feature vectors, such as MFCC vectors, as input data.
Feature stream -> phonemes / units: segmentation and labeling
Phonemes / units -> words: lexical access problem
![Page 16: Acoustic / Lexical Model](https://reader034.vdocuments.site/reader034/viewer/2022051417/56814f19550346895dbcaa44/html5/thumbnails/16.jpg)
Acoustic model: signal -> phonemes
Problem: phonemes can be pronounced differently.
Causes: speaker differences, speaking rate, microphone
![Page 17: Acoustic / Lexical Model](https://reader034.vdocuments.site/reader034/viewer/2022051417/56814f19550346895dbcaa44/html5/thumbnails/17.jpg)
Acoustic model: phonemes -> words
The three major ways to do this: vector quantization, hidden Markov models, neural networks
![Page 18: Acoustic / Lexical Model](https://reader034.vdocuments.site/reader034/viewer/2022051417/56814f19550346895dbcaa44/html5/thumbnails/18.jpg)
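Acoustic model
Problem: multiple pronunciations
[Figure: two pronunciation networks for one word, built from the phones t, ow, ax, m, aa, ey, t, ow.
Dialect variation: a choice between aa and ey, each with probability 0.5.
Coarticulation: a choice between ow and ax with probabilities 0.2 and 0.8, followed by the choice between aa and ey with probability 0.5 each.]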
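A pronunciation network of this kind can be sketched as a sequence of stages, each offering alternative phones with probabilities. The layout below is an assumption reconstructed from the phones and numbers on the slide; in particular, assigning 0.8 to `ax` and 0.2 to `ow` is a guess, since the slide text does not say which vowel gets which value. The probability of a full pronunciation is the product of the choices along its path.

```python
from itertools import product

# One list per stage; each entry is (phone, probability of choosing it).
# Assumption: 0.8 belongs to the reduced vowel "ax" and 0.2 to "ow".
NETWORK = [
    [("t", 1.0)],
    [("ow", 0.2), ("ax", 0.8)],   # coarticulation
    [("m", 1.0)],
    [("aa", 0.5), ("ey", 0.5)],   # dialect variation
    [("t", 1.0)],
    [("ow", 1.0)],
]

def pronunciations(network):
    """Map each phone sequence through the network to its path probability."""
    result = {}
    for path in product(*network):
        phones = " ".join(phone for phone, _ in path)
        prob = 1.0
        for _, p in path:
            prob *= p
        result[phones] = prob
    return result

probs = pronunciations(NETWORK)
for phones, prob in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"{phones}: {prob:.2f}")
```

The four resulting pronunciations have probabilities that sum to 1, so the network defines a proper distribution over ways of saying the word.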
![Page 19: Acoustic / Lexical Model](https://reader034.vdocuments.site/reader034/viewer/2022051417/56814f19550346895dbcaa44/html5/thumbnails/19.jpg)
![Page 20: Acoustic / Lexical Model](https://reader034.vdocuments.site/reader034/viewer/2022051417/56814f19550346895dbcaa44/html5/thumbnails/20.jpg)
The End