Speech Recognition Principles
Speech Recognition Concepts
[Figure: NLP and speech processing — speech synthesis maps text (a phone sequence) to speech; speech understanding/recognition maps speech back to text.]
Speech recognition is the inverse of speech synthesis.
Speech Recognition Approaches
Bottom-Up Approach
Top-Down Approach
Blackboard Approach
Bottom-Up Approach
[Figure: bottom-up pipeline — Signal Processing → Feature Extraction → Segmentation → Sound Classification Rules (voiced/unvoiced/silence) → Phonotactic Rules → Lexical Access → Language Model → Recognized Utterance, with knowledge sources feeding each stage.]
Top-Down Approach
[Figure: top-down architecture — Feature Analysis feeds a Unit Matching System driven by an inventory of speech recognition units; lexical, syntactic, and semantic hypotheses (drawing on a word dictionary, grammar, and task model) are checked by an utterance verifier/matcher to produce the recognized utterance.]
Blackboard Approach
[Figure: blackboard architecture — environmental, acoustic, lexical, syntactic, and semantic processes all read from and write to a shared blackboard.]
[Figure: an overall view of a speech recognition system, combining bottom-up and top-down processing. From Ladefoged 2001.]
Recognition Theories
Articulatory-Based Recognition
◦ Uses the articulatory system for recognition
◦ This theory has been the most successful so far
Auditory-Based Recognition
◦ Uses the auditory system for recognition
Hybrid-Based Recognition
◦ A hybrid of the two theories above
Motor Theory
◦ Models the intended gestures of the speaker
Recognition Problem
We have a sequence of acoustic symbols and want to find the words uttered by the speaker.
Solution: find the most probable word sequence given the acoustic symbols.
Recognition Problem (Cont'd)
A: acoustic symbols
W: word sequence
We should find $\hat{w}$ such that
$$P(\hat{w} \mid A) = \max_{w} P(w \mid A)$$
Bayes Rule
$$P(x \mid y)\,P(y) = P(x, y)$$
$$P(x \mid y) = \frac{P(y \mid x)\,P(x)}{P(y)}$$
$$P(w \mid A) = \frac{P(A \mid w)\,P(w)}{P(A)}$$
Bayes Rule (Cont'd)
$$P(\hat{w} \mid A) = \max_{w} P(w \mid A) = \max_{w} \frac{P(A \mid w)\,P(w)}{P(A)}$$
Since $P(A)$ does not depend on $w$, it can be dropped from the maximization:
$$\hat{w} = \arg\max_{w} P(w \mid A) = \arg\max_{w} P(A \mid w)\,P(w)$$
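As a concrete illustration of this decision rule, the sketch below scores two candidate word sequences by $P(A \mid w)\,P(w)$ and picks the argmax. All probabilities are invented toy numbers, not values from the source.

```python
# Toy Bayes decision rule: w_hat = argmax_w P(A|w) * P(w).
# All probabilities below are made-up numbers for illustration only.
candidates = {
    # word sequence: (acoustic likelihood P(A|w), language-model prior P(w))
    "recognize speech": (0.0020, 0.00010),
    "wreck a nice beach": (0.0025, 0.00001),
}

def score(w):
    p_acoustic, p_prior = candidates[w]
    return p_acoustic * p_prior  # P(A) is constant in w, so it is dropped

w_hat = max(candidates, key=score)
print(w_hat)  # "recognize speech": the stronger prior outweighs the acoustics
```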
Simple Language Model
$$w = w_1 w_2 w_3 \cdots w_n$$
$$P(w) = \prod_{i=1}^{n} P(w_i \mid w_1 w_2 \cdots w_{i-1})$$
Computing this probability directly is very difficult and requires a very large database, so trigram and bigram models are used instead.
Simple Language Model (Cont'd)
Trigram:
$$P(w) = \prod_{i=1}^{n} P(w_i \mid w_{i-1} w_{i-2})$$
Bigram:
$$P(w) = \prod_{i=1}^{n} P(w_i \mid w_{i-1})$$
Monogram (unigram):
$$P(w) = \prod_{i=1}^{n} P(w_i)$$
Simple Language Model (Cont'd)
Computing method (relative frequency):
$$P(w_3 \mid w_1 w_2) = \frac{\text{number of occurrences of } w_1 w_2 w_3}{\text{total number of occurrences of } w_1 w_2}$$
Ad hoc (interpolated) method, with weights $\lambda_i$ summing to 1:
$$P(w_3 \mid w_1 w_2) = \lambda_1 f(w_3 \mid w_1 w_2) + \lambda_2 f(w_3 \mid w_2) + \lambda_3 f(w_3)$$
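A minimal sketch of how the relative-frequency counts and the interpolated estimate could be computed. The toy corpus, the $\lambda$ weights, and all function names are illustrative assumptions, not from the source.

```python
from collections import Counter

def ngram_counts(tokens, n):
    """Count all n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

tokens = "the cat sat on the mat the cat ran".split()  # toy corpus (assumption)
uni, bi, tri = (ngram_counts(tokens, n) for n in (1, 2, 3))
total = len(tokens)

def f_tri(w1, w2, w3):
    # relative frequency: occurrences of w1 w2 w3 / occurrences of w1 w2
    return tri[(w1, w2, w3)] / bi[(w1, w2)] if bi[(w1, w2)] else 0.0

def f_bi(w2, w3):
    return bi[(w2, w3)] / uni[(w2,)] if uni[(w2,)] else 0.0

def f_uni(w3):
    return uni[(w3,)] / total

def p_interp(w1, w2, w3, lambdas=(0.6, 0.3, 0.1)):
    """Interpolated trigram estimate; the lambda weights sum to 1."""
    l1, l2, l3 = lambdas
    return l1 * f_tri(w1, w2, w3) + l2 * f_bi(w2, w3) + l3 * f_uni(w3)

print(p_interp("the", "cat", "sat"))
```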
[Figure: From Ladefoged 2001.]
P(A|W) Computing Approaches
Dynamic Time Warping (DTW)
Hidden Markov Model (HMM)
Artificial Neural Network (ANN)
Hybrid Systems
Dynamic Time Warping (DTW)
To obtain a global distance between two speech patterns, a time alignment must be performed.
Example: a time alignment path between a template pattern "SPEECH" and a noisy input "SsPEEhH".
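A minimal DTW sketch under common assumptions (symmetric step pattern; unit cost for mismatched symbols). Real systems align frames of acoustic feature vectors rather than characters; the character-level example just mirrors the "SPEECH" / "SsPEEhH" illustration above.

```python
def dtw_distance(template, query, cost=lambda a, b: 0.0 if a == b else 1.0):
    """Global DTW distance between two sequences via dynamic programming."""
    n, m = len(template), len(query)
    INF = float("inf")
    # D[i][j] = best accumulated cost aligning template[:i] with query[:j]
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = cost(template[i - 1], query[j - 1])
            # allowed moves: diagonal (match), vertical and horizontal (warp)
            D[i][j] = local + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    return D[n][m]

print(dtw_distance("SPEECH", "SsPEEhH"))  # small distance despite the noise
```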
Recognition Tasks
Isolated Word Recognition (IWR) and Continuous Speech Recognition (CSR)
Speaker-Dependent and Speaker-Independent
Vocabulary Size
◦ Small: <20 words
◦ Medium: 100–1,000 words
◦ Large: 1,000–10,000 words
◦ Very Large: >10,000 words
Error-Producing Factors
Prosody (recognition should be prosody-independent)
Noise (noise should be suppressed)
Spontaneous speech
Artificial Neural Network
[Figure: a simple computational element of a neural network — inputs $x_0, \dots, x_{N-1}$ weighted by $w_0, \dots, w_{N-1}$ produce output $y$.]
$$y = \varphi\!\left(\sum_{i=0}^{N-1} w_i x_i\right)$$
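A minimal sketch of this computational element, assuming a logistic sigmoid for the nonlinearity $\varphi$ (the slide does not specify which activation is used):

```python
import math

def neuron(x, w):
    """Single computational element: y = phi(sum_i w_i * x_i).
    The sigmoid activation is an assumption; the slide leaves phi unspecified."""
    s = sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-s))  # phi: logistic sigmoid

# toy inputs x_0..x_{N-1} and weights w_0..w_{N-1}
print(neuron([1.0, 0.5, -0.2], [0.4, -0.1, 0.9]))
```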
Artificial Neural Network (Cont’d)
Neural Network Types
◦ Perceptron
◦ Time-Delay Neural Network (TDNN)
◦ TDNN computational element
Artificial Neural Network (Cont'd)
[Figure: single-layer perceptron — inputs $x_0, \dots, x_{N-1}$, outputs $y_0, \dots, y_{M-1}$.]
Artificial Neural Network (Cont'd)
[Figure: three-layer perceptron.]
Hybrid Methods
Hybrid neural network and matched filter for recognition
[Figure: pattern classifier — speech → acoustic features → delays → output units.]
Neural Network Properties
The system is simple but highly iterative
It does not impose a specific structure
Despite its simplicity, the results are good
The training set is large, so training should be done offline
Accuracy is relatively good
Hidden Markov Model
Observations: $O = O_1, O_2, \dots, O_t$
States in time: $q = q_1, q_2, \dots, q_t$
All states: $s_1, s_2, \dots$
$a_{ij}$: probability of the transition from state $S_i$ to state $S_j$
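To make the notation concrete, here is a minimal forward-algorithm sketch for evaluating $P(O \mid \text{model})$ in a discrete HMM with transitions $a_{ij}$; the two-state model and all of its numbers are invented for illustration.

```python
def forward(A, B, pi, obs):
    """Forward algorithm for a discrete HMM.
    A[i][j] = a_ij (transition s_i -> s_j), B[i][k] = P(symbol k | state s_i),
    pi[i] = initial probability of state s_i. Returns P(obs | model)."""
    n = len(A)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]       # alpha_1(i)
    for o in obs[1:]:                                      # induction step
        alpha = [B[j][o] * sum(alpha[i] * A[i][j] for i in range(n))
                 for j in range(n)]
    return sum(alpha)                                      # termination

# Toy two-state model; every number here is an assumption for illustration.
A = [[0.7, 0.3],
     [0.4, 0.6]]
B = [[0.9, 0.1],   # state s_1 mostly emits symbol 0
     [0.2, 0.8]]   # state s_2 mostly emits symbol 1
pi = [0.5, 0.5]

print(forward(A, B, pi, obs=[0, 1, 1]))  # P(O_1 O_2 O_3 | model)
```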