This is my report on speech recognition. I hope it is of help to you.
Acoustic Modeling using Deep Belief Networks
Yueshen Xu ([email protected])
CCNT, Zhejiang University
Abstract
Problem: achieving better phone recognition.
Method: deep neural networks that contain many layers of features and a very large number of parameters, rather than Gaussian mixture models.
Step 1: pre-train the network, one layer at a time, as a multi-layer generative model of spectral feature vectors, without making use of any discriminative information.
Step 2: use backpropagation to make those features better at predicting a probability distribution over HMM states.
Introduction
Typical automatic speech recognition system:
Model the sequential structure of speech signals: hidden Markov models (HMMs).
Spectral representation of the sound wave: HMM states + mixtures of Gaussians + Mel-frequency cepstral coefficients (MFCCs).
New research direction:
Deeper acoustic models containing many layers of features: feedforward neural networks.
Advantages:
Estimating the posterior probabilities of HMM states does not require detailed assumptions about the data distribution.
Suitable for both discrete and continuous features.
Introduction
Comparison of MFCCs and GMMs:
MFCCs: partially overcome the very strong conditional independence assumption of HMMs.
GMMs: easy to fit to data using the EM algorithm, but inefficient at modeling high-dimensional data.
Previous work on neural networks:
Using backpropagation algorithms to train neural networks discriminatively.
Generative modeling vs. discriminative training: generative modeling can efficiently exploit unlabeled speech.
Introduction
Main novelty of this paper:
Achieve consistently better phone recognition performance by pre-training a multi-layer neural network, one layer at a time, as a generative model.
General description:
The generative pre-training creates many layers of feature detectors; the backpropagation algorithm then adjusts the features in every layer to make them more useful for discrimination.
Learning a multilayer generative model
Two vital assumptions of this paper:
Discrimination is more directly related to the underlying causes of the data than to the individual elements of the data itself.
A good feature-vector representation of the underlying causes can be recovered from the input data by modeling its higher-order statistical structure.
Directed view: fit a multilayer generative model with infinitely many layers of latent variables.
Undirected view: fit a relatively simple type of learning module that has only one layer of latent variables.
Learning a multilayer generative model
Undirected view: Restricted Boltzmann Machine (RBM)
A bipartite graph in which visible units are connected to hidden units; there are no visible-visible or hidden-hidden connections.
Visible units vs. hidden units:
Visible units represent observations; hidden units represent features, linked by undirected weighted connections.
RBMs in this paper:
Binary RBM: both hidden and visible units are binary and stochastic.
Gaussian-Bernoulli RBM: hidden units are binary, but visible units are linear with Gaussian noise.
Learning a multilayer generative model
Binary RBM:
The weights on the connections and the biases of the individual units define a probability distribution over the joint states of the visible and hidden units via an energy function. For a binary RBM with parameters $\theta = (\mathbf{W}, \mathbf{a}, \mathbf{b})$ this is the standard form
$E(\mathbf{v}, \mathbf{h} \mid \theta) = -\sum_i a_i v_i - \sum_j b_j h_j - \sum_{i,j} v_i h_j w_{ij}$
The conditional distribution $p(h_j = 1 \mid \mathbf{v}, \theta) = \sigma\big(b_j + \sum_i v_i w_{ij}\big)$
The conditional distribution $p(v_i = 1 \mid \mathbf{h}, \theta) = \sigma\big(a_i + \sum_j h_j w_{ij}\big)$
where $\sigma(x) = 1/(1 + e^{-x})$ is the logistic sigmoid.
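A minimal sketch of these two conditional distributions in NumPy; the function names, the shape convention (W as an n_visible x n_hidden matrix), and the fixed random seed are illustrative assumptions, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample_hidden(v, W, b):
    """p(h_j = 1 | v, theta) = sigmoid(b_j + sum_i v_i w_ij)."""
    p = sigmoid(v @ W + b)                          # shape: (n_hidden,)
    return p, (rng.random(p.shape) < p).astype(float)

def sample_visible(h, W, a):
    """p(v_i = 1 | h, theta) = sigmoid(a_i + sum_j h_j w_ij).
    In a Gaussian-Bernoulli RBM the visible units would instead be drawn
    from a Gaussian whose mean is a_i + sum_j h_j w_ij."""
    p = sigmoid(h @ W.T + a)                        # shape: (n_visible,)
    return p, (rng.random(p.shape) < p).astype(float)
```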
8
CCNT, ZJU
Learning a multilayer generative model
Learning a DBN:
Each RBM is trained by updating each weight $w_{ij}$ using the difference between two measured pairwise correlations (the contrastive divergence rule):
$\Delta w_{ij} \propto \langle v_i h_j \rangle_{\text{data}} - \langle v_i h_j \rangle_{\text{reconstruction}}$
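A self-contained sketch of one such update (CD-1); the learning rate and the use of hidden probabilities rather than binary samples in the negative phase are my assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, a, b, lr=0.01):
    """One contrastive-divergence (CD-1) step for a binary RBM.
    W: (n_visible, n_hidden), a: visible biases, b: hidden biases."""
    # Positive phase: correlations measured on the data.
    ph0 = sigmoid(v0 @ W + b)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # Negative phase: one step of reconstruction.
    pv1 = sigmoid(h0 @ W.T + a)          # reconstructed visible probabilities
    ph1 = sigmoid(pv1 @ W + b)           # hidden probabilities given reconstruction
    # <v_i h_j>_data - <v_i h_j>_reconstruction
    W += lr * (np.outer(v0, ph0) - np.outer(pv1, ph1))
    a += lr * (v0 - pv1)
    b += lr * (ph0 - ph1)
    return W, a, b
```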
Directed view:
A sigmoid belief net consisting of multiple layers of binary stochastic units.
Hidden layers: binary features.
Visible layer: binary data vectors.
Learning a multilayer generative model
Generating data from the model:
Binary states are chosen for the top layer of hidden units; learning adjusts the weights on the top-down connections by performing gradient ascent in the expected log probability of the data.
Challenge:
Getting unbiased samples from the exponentially large posterior is intractable, because the hidden units lack conditional independence given the data.
Learning with tied weights (1/2):
Learning context: a sigmoid belief net with an infinite number of layers and tied symmetric weights between the layers.
The posterior can then be computed by simply multiplying the visible vector by the transposed weight matrix.
Learning a multilayer generative model
Learning with tied weights (2/2):
Learning is a little more difficult, because every copy of the tied weight matrix gets different derivatives.
• An infinite sigmoid belief net with tied weights.
• Inference is easy: once posteriors have been sampled for the first hidden layer, the same process can be used for the next hidden layer.
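A sketch of this bottom-up inference under a tied weight matrix; since the layer sizes alternate in the infinite net, the sketch alternates W and its transpose on the way up. The layer count, the use of mean-field probabilities instead of samples, and the omission of biases are my simplifications:

```python
import numpy as np

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def infer_up(v, W, n_layers=4):
    """Bottom-up inference in a tied-weight sigmoid belief net.
    W: (n_visible, n_hidden); layer sizes alternate, so we alternate W and W.T.
    Biases are omitted for brevity."""
    layers, x = [], v
    for k in range(n_layers):
        # h(1) = sigma(v W), h(2) = sigma(h(1) W^T), h(3) = sigma(h(2) W), ...
        x = sigmoid(x @ (W if k % 2 == 0 else W.T))
        layers.append(x)
    return layers  # [h(1), h(2), ..., h(n_layers)]
```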
Learning a multilayer generative model
Unbiased estimate of the sum of derivatives:
h(2) can be viewed as a noisy but unbiased estimate of the probabilities for the visible units predicted by h(1).
h(3) can be viewed as a noisy but unbiased estimate of the probabilities for the visible units predicted by h(2).
Learning a multilayer generative model
Learning different weights in each layer:
Make the generative model more powerful by allowing different weights in different layers.
Step 1: learn with all of the weight matrices tied together.
Step 2: untie the bottom weight matrix from the other matrices.
Step 3: freeze the bottom matrix, obtaining W(1).
Step 4: keep all remaining matrices tied together and continue to learn the higher matrices.
This involves first inferring h(1) from v using W(1), and then inferring h(2), h(3), and h(4) in a similar bottom-up manner using W or W^T. (A code sketch of the resulting greedy procedure follows.)
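Operationally, this untie-and-freeze procedure is the familiar greedy layer-wise stacking of RBMs. A compact sketch, where the epoch count, learning rate, and layer sizes are assumptions and the inner loop repeats the CD-1 update from the earlier sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, epochs=10, lr=0.01):
    """Train one binary RBM with CD-1 on the rows of `data`; returns (W, a, b)."""
    n_visible = data.shape[1]
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    a, b = np.zeros(n_visible), np.zeros(n_hidden)
    for _ in range(epochs):
        for v0 in data:
            ph0 = sigmoid(v0 @ W + b)
            h0 = (rng.random(n_hidden) < ph0).astype(float)
            pv1 = sigmoid(h0 @ W.T + a)
            ph1 = sigmoid(pv1 @ W + b)
            W += lr * (np.outer(v0, ph0) - np.outer(pv1, ph1))
            a += lr * (v0 - pv1)
            b += lr * (ph0 - ph1)
    return W, a, b

def pretrain_dbn(data, layer_sizes):
    """Greedy layer-wise pre-training: train an RBM, freeze its weights,
    push the data through it, and repeat on the hidden activations."""
    weights, x = [], data
    for n_hidden in layer_sizes:
        W, a, b = train_rbm(x, n_hidden)
        weights.append((W, b))            # frozen W(k) plus hidden biases
        x = sigmoid(x @ W + b)            # infer h(k) to train the next layer
    return weights
```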
Learning a multilayer generative model
Deep belief net (DBN):
Having learned K layers of features, we obtain a directed generative model called a 'deep belief net'.
The DBN has K different weight matrices between its lower layers and an infinite number of higher layers.
This paper models the whole system as a feedforward, deterministic neural network.
This network is then discriminatively fine-tuned by using backpropagation to maximize the log probability of the correct HMM states.
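A sketch of that fine-tuning objective: treat the pre-trained stack as a deterministic feedforward net, add a softmax output over HMM states, and take a gradient-ascent step on the log probability of the correct state. The single softmax output head and plain SGD are my assumptions:

```python
import numpy as np

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def finetune_step(v, label, weights, W_out, lr=0.01):
    """One backpropagation step that increases log p(correct HMM state | v).
    weights: pre-trained (W, b) pairs; W_out: top features -> HMM-state logits."""
    # Deterministic forward pass through the pre-trained stack.
    acts, x = [v], v
    for W, b in weights:
        x = sigmoid(x @ W + b)
        acts.append(x)
    p = softmax(x @ W_out)                 # distribution over HMM states
    # Gradient of log p(label) w.r.t. the logits is one_hot(label) - p.
    delta = -p
    delta[label] += 1.0
    grad_out = np.outer(x, delta)
    delta = (W_out @ delta) * x * (1.0 - x)
    # Backpropagate through (and update) every pre-trained layer.
    for k in range(len(weights) - 1, -1, -1):
        W, b = weights[k]
        gW, gb = np.outer(acts[k], delta), delta
        if k > 0:
            delta = (W @ delta) * acts[k] * (1.0 - acts[k])
        W += lr * gW                       # gradient ascent on log probability
        b += lr * gb
    W_out += lr * grad_out
    return np.log(p[label])
```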
Using Deep Belief Nets for Phone Recognition
Visible units:
A context window of n successive frames of speech coefficients.
Generating phone sequences:
The resulting feedforward neural network is discriminatively trained to output a probability distribution over all possible labels of the central frame.
The pdfs over all possible labels for each frame are then fed into a standard Viterbi decoder.
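A sketch of the two decoding ingredients: building the n-frame context windows that feed the network, and a standard log-domain Viterbi pass over the per-frame label distributions. The window size, the edge padding, and the transition/prior inputs are placeholders, not values from the paper:

```python
import numpy as np

def context_windows(frames, n=11):
    """Stack n successive frames (edge frames repeated) around each central frame.
    frames: (T, d) array of speech coefficients -> (T, n*d) network inputs."""
    half = n // 2
    padded = np.pad(frames, ((half, half), (0, 0)), mode="edge")
    return np.stack([padded[t:t + n].ravel() for t in range(len(frames))])

def viterbi(log_post, log_trans, log_prior):
    """Standard Viterbi decode over per-frame label distributions.
    log_post: (T, S) per-frame log pdfs from the network,
    log_trans: (S, S) log transition probabilities, log_prior: (S,)."""
    T, S = log_post.shape
    score = log_prior + log_post[0]
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + log_trans      # (from_state, to_state)
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + log_post[t]
    # Trace back the best label sequence.
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]
```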
Conclusions
Initiative:
This is the first application to acoustic modeling of neural networks in which multiple layers of features are generatively pre-trained.
This approach can be extended to explicitly model the covariance structure of the input features.
It can be used to jointly train acoustic and language models.
It can be applied to large-vocabulary tasks in place of GMMs.
Thank you