Speech Lab, ECE, State University of New York at Binghamton
Classification accuracies of neural network (left) and MXL (right) classifiers with various percentages of training data using NLPCA2 (10 features) and original features (10 and 39 features)
Classification accuracies of original features and NLPCA2 reduced features with 2% (left) and 50% (right) of the training data
Simulation of NLPCA1
Plot of input and output for semi-random 2-D data. The output is the data reconstructed by an NLPCA1-trained neural network with 1 hidden node
An example with 3-D data: input and output plots of 3-D Gaussian data before and after using a neural network with 2 hidden nodes
Dimensionality Reduction of Speech Features Using Nonlinear Principal Components Analysis
Stephen A. Zahorian, Tara Singh*, Hongbing Hu
Department of Electrical and Computer Engineering, Binghamton University, Binghamton, NY, USA
* Department of Electrical and Computer Engineering, Old Dominion University, Norfolk, VA, USA
Introduction
Difficulties in automatic speech recognition
– Large dimensionality of acoustic feature spaces
– Significant load in feature training ("curse of dimensionality")
Linear dimensionality reduction methods
– Principal Components Analysis (PCA)
– Linear Discriminant Analysis (LDA)
Drawback of linear methods: they can result in poor data representations. The straight-line fit obtained by linear PCA does not accurately represent a data distribution that lies along a curve.
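The drawback can be illustrated with a small sketch (hypothetical parabola-shaped data; linear PCA computed directly from the covariance eigendecomposition):

```python
import numpy as np

# Sketch: 1-D linear PCA fits a straight line, so for data lying on a
# curve the reconstruction loses the curvature entirely.
rng = np.random.default_rng(0)
t = rng.uniform(-1, 1, size=500)
X = np.column_stack([t, t ** 2])            # points on a parabola
Xc = X - X.mean(0)

cov = Xc.T @ Xc / len(Xc)                   # sample covariance
vals, vecs = np.linalg.eigh(cov)            # eigenvalues in ascending order
pc = vecs[:, -1]                            # top principal direction
X_rec = np.outer(Xc @ pc, pc) + X.mean(0)   # straight-line reconstruction

mse = np.mean((X_rec - X) ** 2)             # nonzero: curvature is lost
```

Here the top principal component aligns with the horizontal axis, so the quadratic component of the data is discarded and the reconstruction error stays well above zero.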
NLPCA Approaches
Nonlinear Principal Components Analysis (NLPCA): a nonlinear transformation is applied to obtain a transformed version of the data for PCA
Nonlinear transformation: x̃ = Φ(x), where
– Φ(x): the transformed feature of the data point x
– R^M: the M-dimensional feature space
– Φ(·): R^M → R^D: a neural network mapping that yields data more suitable for linear transformations
Two approaches (NLPCA1 and NLPCA2) were used for training the neural network
NLPCA1: the neural network is trained as an identity map
– Minimize mean square error using targets that are the same as the inputs
– Training with regularization is often needed to "guide" the network to a better minimum in error
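A minimal NLPCA1 sketch, under stated assumptions (a 2→1→2 tanh bottleneck network on synthetic parabola data, trained by plain gradient descent; the poster does not give its architecture or training details):

```python
import numpy as np

# NLPCA1 sketch: a bottleneck network trained as an identity map,
# i.e. the targets are the inputs themselves.
rng = np.random.default_rng(0)

# Semi-random 2-D data lying near a parabola (hypothetical example).
t = rng.uniform(-1, 1, size=(300, 1))
X = np.hstack([t, t ** 2]) + 0.05 * rng.normal(size=(300, 2))

W1 = 0.5 * rng.normal(size=(2, 1)); b1 = np.zeros(1)  # encoder to 1-D bottleneck
W2 = 0.5 * rng.normal(size=(1, 2)); b2 = np.zeros(2)  # decoder back to 2-D

def recon(X):
    return np.tanh(X @ W1 + b1) @ W2 + b2

mse0 = np.mean((recon(X) - X) ** 2)      # reconstruction error before training
lr = 0.05
for _ in range(3000):
    H = np.tanh(X @ W1 + b1)             # 1-D nonlinear principal component
    E = (H @ W2 + b2) - X                # error against identity targets
    # Backpropagate the mean-square reconstruction error
    dH = E @ W2.T * (1 - H ** 2)
    W2 -= lr * (H.T @ E) / len(X); b2 -= lr * E.mean(0)
    W1 -= lr * (X.T @ dH) / len(X); b1 -= lr * dH.mean(0)

mse = np.mean((recon(X) - X) ** 2)       # drops below the initial error
```

The bottleneck activation H plays the role of the reduced feature; regularization (omitted here for brevity) would be added to the loss as the poster notes.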
NLPCA2: the neural network is trained as a classifier
– The network is trained to maximize discrimination
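A minimal NLPCA2 sketch, under stated assumptions (synthetic 2-class 2-D data and a 2→1→2 tanh bottleneck softmax network; the poster uses speech features and does not specify the architecture). The network is trained as a classifier, and the bottleneck activations become the reduced, discriminative features:

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical 2-class data (the poster's data are vowel features).
X = np.vstack([rng.normal([-1, 0], 0.5, size=(100, 2)),
               rng.normal([+1, 0], 0.5, size=(100, 2))])
y = np.repeat([0, 1], 100)
T = np.eye(2)[y]                        # one-hot class targets

W1 = 0.5 * rng.normal(size=(2, 1)); b1 = np.zeros(1)
W2 = 0.5 * rng.normal(size=(1, 2)); b2 = np.zeros(2)
lr = 1.0
for _ in range(2000):
    H = np.tanh(X @ W1 + b1)            # 1-D discriminative bottleneck feature
    Z = H @ W2 + b2
    P = np.exp(Z - Z.max(1, keepdims=True))
    P /= P.sum(1, keepdims=True)        # softmax class probabilities
    G = (P - T) / len(X)                # cross-entropy gradient at the logits
    dH = G @ W2.T * (1 - H ** 2)
    W2 -= lr * (H.T @ G); b2 -= lr * G.sum(0)
    W1 -= lr * (X.T @ dH); b1 -= lr * dH.sum(0)

features = np.tanh(X @ W1 + b1)         # dimensionality-reduced features
acc = np.mean(P.argmax(1) == y)         # training accuracy
```

Unlike NLPCA1, the loss here is classification error rather than reconstruction error, so the bottleneck keeps only class-discriminative structure.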
(Diagram: Input Data → Bottleneck neural network → Dimensionality Reduced Data)
Experimental Evaluation
Database: NTIMIT
Transformation methods compared: original features, LDA, PCA, NLPCA1, and NLPCA2
Classifiers: neural network and MXL (a maximum likelihood classifier based on Mahalanobis distance under a Gaussian assumption)
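A sketch of an MXL-style classifier under the stated assumptions: each class is Gaussian with its own mean but a common (pooled) covariance, and a point is assigned to the class with the smallest Mahalanobis distance, which is the maximum-likelihood decision under those assumptions. The function names and demo data are illustrative, not the poster's implementation:

```python
import numpy as np

def fit_mxl(X, y):
    """Estimate class means and a pooled within-class covariance."""
    classes = np.unique(y)
    means = np.array([X[y == c].mean(0) for c in classes])
    resid = np.vstack([X[y == c] - means[i] for i, c in enumerate(classes)])
    cov = resid.T @ resid / len(X)      # common covariance matrix
    return classes, means, np.linalg.inv(cov)

def predict_mxl(model, X):
    classes, means, icov = model
    # Squared Mahalanobis distance from each point to each class mean
    d = [np.einsum('ij,jk,ik->i', X - m, icov, X - m) for m in means]
    return classes[np.argmin(d, axis=0)]

# Hypothetical 2-class demo
rng = np.random.default_rng(2)
X = np.vstack([rng.normal([0, 0], 1.0, size=(200, 2)),
               rng.normal([3, 3], 1.0, size=(200, 2))])
y = np.repeat([0, 1], 200)
acc = np.mean(predict_mxl(fit_mxl(X, y), X) == y)
```

With a shared covariance the decision boundaries are linear, which matches the poster's observation that the reduced features are well modeled as Gaussians with a common covariance matrix.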
Experiment 1
– The same training data were used to train the transformations and the classifiers
– The number of features varied from 1 to 39
– Variable percentages of training data (1%, 2%, 5%, 10%, 25%, 50%, and 100%) were used
Experiment 1 Results
Classification accuracies of neural network (left) and MXL (right) classifiers with various types of features, using all available training data
NTIMIT database
– Targets (vowels): /ah/, /ee/, /ue/, /ae/, /ur/, /ih/, /eh/, /aw/, /uh/, /oo/
– Training data: 31,300 tokens
– Testing data: 11,625 tokens
– Features: 39 DCTC-DCS
Conclusions
– The nonlinear technique minimizing mean square reconstruction error (NLPCA1) can be very effective for representing data that lies in curved subspaces, but it does not appear to offer any advantage over linear dimensionality reduction methods for a speech classification task
– The nonlinear technique based on minimizing classification error (NLPCA2) is quite effective for accurate classification in low-dimensionality spaces
– The reduced features appear to be well modeled as Gaussian features with a common covariance matrix
– Nonlinear PCA (NLPCA2) is much more effective than ordinary PCA for reducing dimensionality; however, with a "good" classification method, neither dimensionality reduction method improves classification accuracy
Acknowledgement
This work was partially supported by JWFC 900
(Figure: classification accuracy [%] vs. number of features (1–37); left panel: neural network classifier, right panel: MXL classifier; curves: Org, LDA, PCA, NLPCA1, NLPCA2)
For both cases, the highest accuracy was obtained with NLPCA2, especially with a small number of features.
NLPCA2 outperforms the 10-D original features when 10% or more of the training data is used, and performs similarly to the 39-D original features
(Figure: classification accuracy [%] vs. percentage of training data (1%–100%); left panel: neural network classifier, right panel: MXL classifier; curves: Org (10-D), Org (39-D), NLPCA2)
(Figure: classification accuracy [%] vs. number of features; left panel: 2% of training data, right panel: 50%; curves: Org (Neu), Org (MXL), NLPCA2 (Neu), NLPCA2 (MXL))
Using 50% of the training data, NLPCA2 performs substantially better than the original features, at least when 12 or fewer features are used
Experiment 2
– 50% of the training data was used for training the transformations
– A variable percentage, ranging from 1% to 100% of the other half of the training data, was used for training the classifiers
Experiment 2 Results
Classification accuracies of neural network (left) and MXL (right) classifiers, using 10% of the classifier training data
Classification accuracies of neural network (left) and MXL (right) classifiers with various percentages of classifier training data using 4 features
(Figure: classification accuracy [%] vs. number of features (1–32); left panel: neural network classifier, right panel: MXL classifier; curves: Org, LDA, PCA, NLPCA1, NLPCA2)
For both the neural network and MXL classifiers, NLPCA2 clearly performs much better than the other transformations or the original features.
(Figure: classification accuracy [%] vs. percentage of classifier training data (1%–100%); left panel: neural network classifier, right panel: MXL classifier; curves: Org, LDA, PCA, NLPCA1, NLPCA2)
NLPCA2 yields the best performance, with about 68% accuracy for both cases. Similar trends were also observed for 1, 2, 8, 16, and 32 features.