TRANSCRIPT
1
2.5.4.1 Basics of Neural Networks
[Figure: a single computational element with inputs X_0, X_1, X_2, ..., X_{N-1} and output Y]
$y = f\left(\sum_{i=0}^{N-1} W_i x_i\right)$
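A minimal sketch of the single computational element, assuming the output is a nonlinearity f applied to the weighted sum of the N inputs. The function names, the example inputs and weights, and the choice of a hard limiter for f are illustrative assumptions, not taken from the slides.

```python
import math

def neuron_output(inputs, weights, activation=math.tanh):
    """Return y = f(sum_i W_i * x_i) for one computational element."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return activation(weighted_sum)

def hard_limiter(v):
    """Step nonlinearity: +1 for non-negative input, -1 otherwise."""
    return 1.0 if v >= 0 else -1.0

x = [1.0, 0.5, -0.25]   # hypothetical inputs
w = [0.2, -0.4, 0.8]    # hypothetical weights
y_step = neuron_output(x, w, hard_limiter)   # weighted sum is -0.2, so the step gives -1.0
y_tanh = neuron_output(x, w)                 # smooth alternative for f
```

Swapping the activation argument changes only f; the weighted sum stays the same.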
2
2.5.4.2 Neural Network Topologies
3
2.5.4.2 Neural Network Topologies
4
2.5.4.2 Neural Network Topologies
5
TDNN
6
2.5.4.6 Neural Network Structures for Speech Recognition
7
2.5.4.6 Neural Network Structures for Speech Recognition
8
3.1.1 Spectral Analysis Models
9
3.1.1 Spectral Analysis Models
10
3.2 THE BANK-OF-FILTERS FRONT-END PROCESSOR
11
3.2 THE BANK-OF-FILTERS FRONT-END PROCESSOR
12
3.2 THE BANK-OF-FILTERS FRONT-END PROCESSOR
13
3.2 THE BANK-OF-FILTERS FRONT-END PROCESSOR
14
3.2 THE BANK-OF-FILTERS FRONT-END PROCESSOR
15
3.2.1 Types of Filter Bank Used for Speech Recognition
Uniform filter bank (center frequencies f_i, bandwidths b_i, sampling rate F_s, Q filters):
$f_i = \frac{F_s}{N}\, i, \quad 1 \le i \le Q$
$Q \le N/2$
$b_i \ge \frac{F_s}{N}$
16
Nonuniform Filter Banks
$b_1 = c$
$b_i = \alpha\, b_{i-1}, \quad 2 \le i \le Q$
$f_i = f_1 + \sum_{j=1}^{i-1} b_j + \frac{b_i - b_1}{2}$
17
Nonuniform Filter Banks
Filter 1: f_1 = 300 Hz, b_1 = 200 Hz
Filter 2: f_2 = 600 Hz, b_2 = 400 Hz
Filter 3: f_3 = 1200 Hz, b_3 = 800 Hz
Filter 4: f_4 = 2400 Hz, b_4 = 1600 Hz
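The four-filter example can be reproduced with a short sketch, assuming the nonuniform recursion b_1 = c, b_i = alpha * b_{i-1}, and f_i = f_1 + sum_{j=1}^{i-1} b_j + (b_i - b_1)/2. The function name is hypothetical, and alpha = 2 (octave spacing) is inferred from the doubling of the listed bandwidths.

```python
def nonuniform_filter_bank(f1, c, alpha, q):
    """Return [(f_i, b_i)] for i = 1..q using
    b_1 = c, b_i = alpha * b_{i-1},
    f_i = f_1 + sum_{j=1}^{i-1} b_j + (b_i - b_1) / 2."""
    bandwidths = [c * alpha ** k for k in range(q)]
    bank = []
    for i in range(q):
        center = f1 + sum(bandwidths[:i]) + (bandwidths[i] - bandwidths[0]) / 2
        bank.append((center, bandwidths[i]))
    return bank

# Assumed parameters: f_1 = 300 Hz, c = 200 Hz, alpha = 2, Q = 4.
octave_bank = nonuniform_filter_bank(f1=300.0, c=200.0, alpha=2, q=4)
for index, (center, bandwidth) in enumerate(octave_bank, start=1):
    print(f"Filter {index}: f = {center:.0f} Hz, b = {bandwidth:.0f} Hz")
```

With these parameters the sketch yields center frequencies of 300, 600, 1200 and 2400 Hz with bandwidths of 200, 400, 800 and 1600 Hz.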
18
3.2.1 Types of Filter Bank Used for Speech Recognition
19
3.2.1 Types of Filter Bank Used for Speech Recognition
20
3.2.2 Implementations of Filter Banks
Instead of direct convolution, which is computationally expensive, we assume each bandpass filter impulse response to be represented by:
$h_i(n) = w(n)\, e^{j\omega_i n}$
where w(n) is a fixed lowpass filter.
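A small sketch of the modulated-lowpass idea, assuming h_i(n) = w(n) e^{j omega_i n} with w(n) a fixed lowpass prototype. The Hamming window as the prototype, the filter length, and the value of omega_i are illustrative assumptions.

```python
import cmath
import math

def hamming(length):
    """Hamming window, used here as the fixed lowpass prototype w(n)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
            for n in range(length)]

def bandpass_impulse_response(window, omega_i):
    """h_i(n) = w(n) * exp(j * omega_i * n)."""
    return [w * cmath.exp(1j * omega_i * n) for n, w in enumerate(window)]

def magnitude_response(h, omega):
    """|H(e^{j omega})| evaluated directly from the impulse response."""
    return abs(sum(c * cmath.exp(-1j * omega * n) for n, c in enumerate(h)))

w = hamming(64)
omega_i = 2 * math.pi * 5 / 32          # hypothetical center frequency (rad/sample)
h_i = bandpass_impulse_response(w, omega_i)

at_center = magnitude_response(h_i, omega_i)        # equals the DC gain of w(n)
far_away = magnitude_response(h_i, omega_i + 1.0)   # well outside the passband
```

Modulation shifts the lowpass response of w(n) to omega_i, so every channel shares one prototype and differs only in its center frequency.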
21
3.2.2 Implementations of Filter Banks
22
3.2.2.1 Frequency Domain Interpretation of the Short-Time Fourier Transform
23
3.2.2.1 Frequency Domain Interpretation of the Short-Time Fourier Transform
24
3.2.2.1 Frequency Domain Interpretation of the Short-Time Fourier Transform
25
3.2.2.1 Frequency Domain Interpretation of the Short-Time Fourier Transform
26
Linear Filter Interpretation of the STFT
[Figure: block diagram in which s(n) is multiplied by $e^{-j\omega_i n}$ to give $\tilde{s}(n)$, which is then filtered by w(n) to produce $S_n(e^{j\omega_i})$]
27
3.2.2.4 FFT Implementation of a Uniform Filter Bank
28
Direct implementation of an arbitrary filter bank
[Figure: s(n) feeds Q parallel filters h_1(n), h_2(n), ..., h_Q(n), producing the outputs X_1(n), X_2(n), ..., X_Q(n)]
29
3.2.2.5 Nonuniform FIR Filter Bank Implementations
30
3.2.2.7 Tree Structure Realizations of Nonuniform Filter Banks
31
3.2.4 Practical Examples of Speech-Recognition Filter Banks
32
3.2.4 Practical Examples of Speech-Recognition Filter Banks
33
3.2.4 Practical Examples of Speech-Recognition Filter Banks
34
3.2.4 Practical Examples of Speech-Recognition Filter Banks
35
3.2.5 Generalizations of Filter-Bank Analyzer
36
3.2.5 Generalizations of Filter-Bank Analyzer
37
3.2.5 Generalizations of Filter-Bank Analyzer
38
3.2.5 Generalizations of Filter-Bank Analyzer
39
40
41
42
43
44
45
46
Mel-cepstrum method
[Figure: processing chain: time signal -> framing -> |FFT|^2 -> Mel-scaling -> Logarithm -> IDCT -> Cepstra (low-order coefficients) -> Differentiator -> Delta & Delta-Delta Cepstra]
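The Mel-scaling stage of the mel-cepstrum chain can be sketched as follows. The slides do not give a formula, so the widely used approximation mel = 2595 log10(1 + f/700) is an assumption here, as are the function names, the number of filters, and the 0 to 4000 Hz range.

```python
import math

def hz_to_mel(f_hz):
    """Common mel approximation (assumed; not stated in the slides)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_center_frequencies(num_filters, f_low, f_high):
    """Center frequencies in Hz, equally spaced on the mel axis."""
    lo, hi = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (hi - lo) / (num_filters + 1)
    return [mel_to_hz(lo + step * (k + 1)) for k in range(num_filters)]

# Hypothetical configuration: 10 filters between 0 Hz and 4000 Hz.
centers = mel_center_frequencies(num_filters=10, f_low=0.0, f_high=4000.0)
spacings = [b - a for a, b in zip(centers, centers[1:])]
```

Equal steps on the mel axis give center frequencies that are close together at low frequencies and progressively farther apart at high frequencies.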
47
Time-Frequency analysis
Short-term Fourier Transform
Standard way of frequency analysis: decompose the incoming signal into its constituent frequency components.
W(n): windowing function
N: frame length
p: step size
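A minimal sketch of the short-term Fourier transform with a windowing function W(n), frame length N, and step size p. The Hamming window, the direct (non-FFT) DFT, and the test tone are illustrative assumptions chosen to keep the example self-contained.

```python
import cmath
import math

def stft(signal, frame_length, step, window=None):
    """Return one DFT spectrum per frame; frames start every `step` samples."""
    if window is None:
        # Assumed default W(n): a Hamming window.
        window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_length - 1))
                  for n in range(frame_length)]
    spectra = []
    for start in range(0, len(signal) - frame_length + 1, step):
        frame = [signal[start + n] * window[n] for n in range(frame_length)]
        spectrum = [sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_length)
                        for n in range(frame_length))
                    for k in range(frame_length)]
        spectra.append(spectrum)
    return spectra

# Hypothetical test tone with 8 cycles per 64-sample frame.
N, p = 64, 32
tone = [math.sin(2 * math.pi * 8 * n / N) for n in range(256)]
frames = stft(tone, frame_length=N, step=p)
peak_bin = max(range(N // 2), key=lambda k: abs(frames[0][k]))
```

A practical implementation would replace the direct DFT with an FFT; the framing and windowing logic would stay the same.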
48
Critical band integration
Related to the masking phenomenon: the threshold of a sinusoid is elevated when its frequency is close to the center frequency of a narrow-band noise.
Frequency components within a critical band are not resolved. The auditory system interprets the signals within a critical band as a whole.
49
Bark scale
50
Feature orthogonalization
Spectral values in adjacent frequency channels are highly correlated.
The correlation results in a Gaussian model with many parameters: all the elements of the covariance matrix have to be estimated.
Decorrelation is useful to improve the parameter estimation.
51
Cepstrum
Computed as the inverse Fourier transform of the log magnitude of the Fourier transform of the signal.
The log magnitude is real and symmetric -> the transform is equivalent to the Discrete Cosine Transform.
Approximately decorrelated.
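The cepstrum definition can be checked numerically with a short sketch: because the log magnitude spectrum of a real signal is real and symmetric, its inverse DFT has a negligible imaginary part and coincides with a cosine-only sum. The function names and the synthetic frame are assumptions for illustration.

```python
import cmath
import math

def log_magnitude_spectrum(x, floor=1e-12):
    """log|X(k)| from a direct DFT; `floor` avoids log(0)."""
    size = len(x)
    spectrum = [sum(x[n] * cmath.exp(-2j * math.pi * k * n / size)
                    for n in range(size)) for k in range(size)]
    return [math.log(max(abs(value), floor)) for value in spectrum]

def cepstrum_via_inverse_dft(log_mag):
    """Inverse DFT of the log magnitude (complex values, imaginary part near 0)."""
    size = len(log_mag)
    return [sum(log_mag[k] * cmath.exp(2j * math.pi * k * n / size)
                for k in range(size)) / size for n in range(size)]

def cepstrum_via_cosines(log_mag):
    """The same coefficients written as a cosine-only sum."""
    size = len(log_mag)
    return [sum(log_mag[k] * math.cos(2 * math.pi * k * n / size)
                for k in range(size)) / size for n in range(size)]

# Hypothetical frame: a decaying cosine.
frame = [math.exp(-0.1 * n) * math.cos(0.6 * n) for n in range(32)]
log_mag = log_magnitude_spectrum(frame)
c_idft = cepstrum_via_inverse_dft(log_mag)
c_cos = cepstrum_via_cosines(log_mag)
largest_imaginary = max(abs(c.imag) for c in c_idft)
largest_difference = max(abs(a.real - b) for a, b in zip(c_idft, c_cos))
```

Both quantities come out at rounding-error level, which is why the inverse transform can be replaced by a cosine transform in practice.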
52
Principal Component Analysis
Find an orthogonal basis such that the reconstruction error over the training set is minimized.
This turns out to be equivalent to diagonalizing the sample autocovariance matrix.
Complete decorrelation.
Computes the principal dimensions of variability, but does not necessarily provide the optimal discrimination among classes.
53
Principal Component Analysis (PCA)
Mathematical procedure that transforms a number of (possibly) correlated variables into a (smaller) number of uncorrelated variables called principal components (PC).
Find an orthogonal basis such that the reconstruction error over the training set is minimized.
This turns out to be equivalent to diagonalizing the sample autocovariance matrix.
Complete decorrelation.
Computes the principal dimensions of variability, but does not necessarily provide the optimal discrimination among classes.
54
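PCA (Cont.)
Algorithm
Input: x (N-dim vectors), an N x M matrix
Covariance matrix: $\mathrm{Cov} = \frac{1}{M} \sum_{i=1}^{M} (x_i - \bar{x})(x_i - \bar{x})^T$
Eigenvalues and eigenvectors: $\mathrm{EigVal}_i$, $\mathrm{EigVec}_i$, $i = 1 \ldots N$, ordered so that $\mathrm{EigVal}_1 \ge \mathrm{EigVal}_2 \ge \ldots$
Transform matrix: $F = \begin{bmatrix} \mathrm{EigVec}_1 \\ \mathrm{EigVec}_2 \\ \vdots \\ \mathrm{EigVec}_N \end{bmatrix}$
Apply transform: $y = F x$
Output: y (R-dim vectors), an R x M matrix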
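The PCA steps (covariance matrix, eigenvectors ordered by eigenvalue, y = F x) can be sketched in plain Python. The eigen-decomposition here uses power iteration with deflation, which is a stand-in chosen to avoid external libraries and is not the method named in the slides; the function names and data are hypothetical.

```python
import math

def covariance_matrix(vectors):
    """Cov = (1/M) * sum_i (x_i - mean)(x_i - mean)^T for M vectors of dimension N."""
    count, dim = len(vectors), len(vectors[0])
    mean = [sum(v[d] for v in vectors) / count for d in range(dim)]
    return [[sum((v[a] - mean[a]) * (v[b] - mean[b]) for v in vectors) / count
             for b in range(dim)] for a in range(dim)]

def mat_vec(matrix, vec):
    return [sum(m * c for m, c in zip(row, vec)) for row in matrix]

def leading_eigenpairs(matrix, count, iterations=500):
    """Largest `count` eigenpairs of a symmetric matrix (power iteration + deflation)."""
    dim = len(matrix)
    work = [row[:] for row in matrix]
    pairs = []
    for index in range(count):
        vec = [1.0 + 0.1 * (d + index) for d in range(dim)]  # arbitrary start vector
        for _ in range(iterations):
            product = mat_vec(work, vec)
            norm = math.sqrt(sum(c * c for c in product))
            if norm < 1e-15:
                break
            vec = [c / norm for c in product]
        length = math.sqrt(sum(c * c for c in vec))
        vec = [c / length for c in vec]
        value = sum(a * b for a, b in zip(vec, mat_vec(work, vec)))
        pairs.append((value, vec))
        # Deflation removes the component that was just found.
        work = [[work[a][b] - value * vec[a] * vec[b] for b in range(dim)]
                for a in range(dim)]
    return pairs

def pca_transform(vectors, r):
    """Return (y vectors, eigenvalues) with y = F x; rows of F are the top-r eigenvectors."""
    pairs = leading_eigenpairs(covariance_matrix(vectors), r)
    f_rows = [vec for _, vec in pairs]
    projected = [mat_vec(f_rows, v) for v in vectors]
    return projected, [value for value, _ in pairs]

# Hypothetical, strongly correlated 2-D data (M = 6, N = 2).
data = [[2.0, 1.9], [0.0, 0.1], [-1.0, -1.2], [1.0, 0.8], [-2.0, -1.6], [3.0, 3.2]]
y, eigenvalues = pca_transform(data, r=2)
cov_y = covariance_matrix(y)   # should be diagonal: the outputs are decorrelated
```

Choosing r smaller than N keeps only the leading eigenvectors, which is how an R-dimensional output is obtained from N-dimensional input.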
55
PCA (Cont.)
PCA in speech recognition systems
56
Linear Discriminant Analysis
Find an orthogonal basis such that the ratio of the between-class variance to the within-class variance is maximized.
This also turns out to be a generalized eigenvalue-eigenvector problem.
Complete decorrelation.
Provides the optimal linear separability under quite restrictive assumptions.
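For two classes the discriminant criterion has a closed-form solution, w proportional to S_w^{-1}(m_b - m_a), where S_w is the within-class scatter. The sketch below illustrates that special case in two dimensions; the function names and the data are hypothetical, and the general multi-class case needs a full generalized eigen-solver that is not shown.

```python
def mean_vector(points):
    count = len(points)
    return [sum(p[0] for p in points) / count, sum(p[1] for p in points) / count]

def scatter(points):
    """2x2 scatter matrix sum_i (x_i - m)(x_i - m)^T of one class."""
    m = mean_vector(points)
    s = [[0.0, 0.0], [0.0, 0.0]]
    for p in points:
        d = [p[0] - m[0], p[1] - m[1]]
        for a in range(2):
            for b in range(2):
                s[a][b] += d[a] * d[b]
    return s

def fisher_direction(class_a, class_b):
    """Return (w, S_w, m_b - m_a) with w = S_w^{-1} (m_b - m_a)."""
    sa, sb = scatter(class_a), scatter(class_b)
    sw = [[sa[a][b] + sb[a][b] for b in range(2)] for a in range(2)]
    ma, mb = mean_vector(class_a), mean_vector(class_b)
    diff = [mb[0] - ma[0], mb[1] - ma[1]]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    w = [(sw[1][1] * diff[0] - sw[0][1] * diff[1]) / det,
         (-sw[1][0] * diff[0] + sw[0][0] * diff[1]) / det]
    return w, sw, diff

def fisher_ratio(w, sw, diff):
    """Between-class over within-class variance along direction w."""
    between = (w[0] * diff[0] + w[1] * diff[1]) ** 2
    within = sum(w[a] * sw[a][b] * w[b] for a in range(2) for b in range(2))
    return between / within

# Hypothetical classes: wide spread along x, narrow spread along y.
class_a = [(0.0, 0.0), (1.0, 0.2), (2.0, -0.1), (3.0, 0.1), (4.0, 0.0)]
class_b = [(1.0, 1.0), (2.0, 1.1), (3.0, 0.9), (4.0, 1.2), (5.0, 1.0)]
w, sw, diff = fisher_direction(class_a, class_b)
best = fisher_ratio(w, sw, diff)
```

In this data the direction of largest variance (the x axis) separates the classes poorly, which illustrates why PCA does not necessarily give the best discrimination.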
57
PCA vs. LDA
58
Spectral smoothing
Formant information is crucial for recognition.
Enhance and preserve the formant information:
  Truncating the number of cepstral coefficients
  Linear prediction: peak-hugging property
59
Temporal processing
To capture the temporal features of the spectral envelope and to provide robustness:
  Delta features: first- and second-order differences; regression
  Cepstral Mean Subtraction: for normalizing for channel effects and adjusting for spectral slope
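Both temporal-processing steps can be sketched briefly. The delta computation below uses the common regression formula sum_k k (c_{t+k} - c_{t-k}) / (2 sum_k k^2) over a window of plus or minus 2 frames with edge frames repeated; the window width, edge handling, and function names are assumptions not given in the slides.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-coefficient mean taken over the whole utterance."""
    count, dim = len(frames), len(frames[0])
    mean = [sum(f[d] for f in frames) / count for d in range(dim)]
    return [[f[d] - mean[d] for d in range(dim)] for f in frames]

def delta(frames, half_width=2):
    """Regression slope over +/- half_width frames; edges repeat the end frames."""
    count, dim = len(frames), len(frames[0])
    denom = 2 * sum(k * k for k in range(1, half_width + 1))

    def at(t):
        return frames[min(max(t, 0), count - 1)]

    return [[sum(k * (at(t + k)[d] - at(t - k)[d])
                 for k in range(1, half_width + 1)) / denom
             for d in range(dim)] for t in range(count)]

# Hypothetical cepstra: coefficient 0 rises by 3 per frame, coefficient 1 is constant.
cepstra = [[3.0 * t + 5.0, 1.0] for t in range(10)]
normalized = cepstral_mean_subtraction(cepstra)
deltas = delta(normalized)          # first-order (delta) features
delta_deltas = delta(deltas)        # second-order (delta-delta) features
```

A constant offset in a coefficient, such as a fixed channel effect in the cepstral domain, is removed by the mean subtraction and has no effect on the deltas.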
60
RASTA (RelAtive SpecTral Analysis)
Filtering of the temporal trajectories of some function of each of the spectral values, to provide more reliable spectral features.
This is usually a bandpass filter, maintaining the linguistically important spectral envelope modulation (1-16 Hz).
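A sketch of band-pass filtering one temporal trajectory. The slides do not give filter coefficients; the values below are the commonly cited RASTA filter, H(z) = 0.1 (2 + z^-1 - z^-3 - 2 z^-4) / (1 - 0.98 z^-1), written in causal form, and should be treated as an assumption, as should the 100 frames-per-second rate.

```python
import math

def rasta_filter(trajectory, pole=0.98):
    """Band-pass filter one temporal trajectory (one value per frame)."""
    numerator = [0.2, 0.1, 0.0, -0.1, -0.2]   # assumed RASTA numerator taps
    output, previous = [], 0.0
    for t in range(len(trajectory)):
        fir = sum(b * trajectory[t - k]
                  for k, b in enumerate(numerator) if t - k >= 0)
        previous = fir + pole * previous       # single-pole recursive part
        output.append(previous)
    return output

frame_rate = 100.0  # frames per second, assumed (10 ms step)
constant = rasta_filter([1.0] * 1500)
modulated = rasta_filter([math.sin(2 * math.pi * 4.0 * t / frame_rate)
                          for t in range(1500)])
```

A constant trajectory decays toward zero, while a 4 Hz modulation, which lies inside the 1-16 Hz range, passes with little attenuation.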
61
62
RASTA-PLP
63
64