Boosting Training Scheme for Acoustic Modeling
Rong Zhang and Alexander I. Rudnicky
Language Technologies Institute,
School of Computer Science
Carnegie Mellon University
Reference Papers
• [ICASSP 03] Improving The Performance of An LVCSR System Through Ensembles of Acoustic Models
• [EUROSPEECH 03] Comparative Study of Boosting and Non-Boosting Training for Constructing Ensembles of Acoustic Models
• [ICSLP 04] A Frame Level Boosting Training Scheme for Acoustic Modeling
• [ICSLP 04] Optimizing Boosting with Discriminative Criteria
• [ICSLP 04] Apply N-Best List Re-Ranking to Acoustic Model Combinations of Boosting Training
• [EUROSPEECH 05] Investigations on Ensemble Based Semi-Supervised Acoustic Model Training
• [ICSLP 06] Investigations of Issues for Using Multiple Acoustic Models to Improve Continuous Speech Recognition
Improving The Performance of An LVCSR System Through Ensembles of Acoustic Models
ICASSP 2003
Rong Zhang and Alexander I. Rudnicky
Language Technologies Institute,
School of Computer Science
Carnegie Mellon University
Introduction
• An ensemble of classifiers is a collection of single classifiers; the ensemble selects a hypothesis by taking a vote among its components
• Bagging and Boosting are the two most successful algorithms for constructing ensembles
• This paper describes the work on applying ensembles of acoustic models to the problem of LVCSR
– Common voting schemes: plurality voting, majority voting, weighted voting
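The three voting schemes just mentioned can be sketched as follows. This is an illustrative sketch, not code from the papers; the function names and the toy string labels are made up for the example:

```python
from collections import Counter

def plurality_vote(hypotheses):
    """Pick the hypothesis proposed most often, even without >50% support."""
    return Counter(hypotheses).most_common(1)[0][0]

def majority_vote(hypotheses):
    """Pick a hypothesis only if more than half of the classifiers agree."""
    winner, count = Counter(hypotheses).most_common(1)[0]
    return winner if count > len(hypotheses) / 2 else None

def weighted_vote(hypotheses, weights):
    """Sum each classifier's weight behind its hypothesis; pick the heaviest."""
    scores = Counter()
    for h, w in zip(hypotheses, weights):
        scores[h] += w
    return scores.most_common(1)[0][0]

votes = ["cat", "cat", "dog"]
print(plurality_vote(votes))                  # cat
print(majority_vote(votes))                   # cat (2 of 3 is a majority)
print(weighted_vote(votes, [0.2, 0.2, 0.9]))  # dog (0.9 outweighs 0.2 + 0.2)
```

Note how weighted voting can overturn the plurality winner when one component is much more trusted, which is the behavior Boosting exploits.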
Bagging vs. Boosting
• Bagging
– In each round, bagging randomly selects a number of examples from the original training set and trains a new single classifier on the selected subset
– The final classifier chooses the hypothesis that the single classifiers agree on most
• Boosting
– In boosting, the single classifiers are trained iteratively, in such a fashion that hard-to-classify examples receive increasing emphasis
– A parameter measuring each classifier's importance is determined according to its classification accuracy
– The final hypothesis is the weighted majority vote of the single classifiers
Algorithms
• The first algorithm is based on the intuition that an incorrectly recognized utterance should receive more attention in training
• If the weight of an utterance is 2.6, we first add two copies of the utterance to the new training set, and then add its third copy with probability 0.6
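The stochastic duplication described above (a weight of 2.6 means two guaranteed copies plus a third with probability 0.6) can be sketched as follows; the function name `resample` and the data representation are illustrative:

```python
import random

def resample(training_set, weights, rng=None):
    """Replicate each utterance according to its real-valued boosting weight.

    The integer part of the weight gives guaranteed copies; the fractional
    part is the probability of one additional copy (e.g. 2.6 -> two copies,
    plus a third with probability 0.6).
    """
    if rng is None:
        rng = random.Random()
    new_set = []
    for utterance, w in zip(training_set, weights):
        copies = int(w)                 # integer part: guaranteed copies
        if rng.random() < w - copies:   # fractional part: probabilistic extra copy
            copies += 1
        new_set.extend([utterance] * copies)
    return new_set
```

For example, `resample(["u1"], [2.6])` yields either two or three copies of `u1`, while an integer weight such as 3.0 always yields exactly three copies.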
Algorithms
• The exponential growth of the training set is a severe problem for Algorithm 1
• Algorithm 2 is proposed to address this problem
Algorithms
• Algorithms 1 and 2 make no attempt to measure how important a model is relative to the others
– A good model should play a more important role than a bad one
• The objective function therefore weights each model by its importance:

  L = Σ_{i=1}^{N} exp( Σ_{t=1}^{T} c_t e_t(x_i) )

  where e_t(x_i) = 1 if utterance x_i is misrecognized by model t and 0 otherwise, and c_t measures the importance of model t

• Separating out the newest model T:

  L = Σ_{i=1}^{N} exp( Σ_{t=1}^{T-1} c_t e_t(x_i) ) · exp( c_T e_T(x_i) ) = Σ_{i=1}^{N} w_i exp( c_T e_T(x_i) )

  where w_i = exp( Σ_{t=1}^{T-1} c_t e_t(x_i) ) is the accumulated weight of utterance x_i
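A sketch of the accumulated utterance weight w_i = exp(Σ_t c_t e_t(x_i)) that emphasizes hard utterances, plus an AdaBoost.M1-style choice of model importance c_t from the weighted error rate. The closed form for c_t is an assumption for illustration, not taken from the slides:

```python
import math

def utterance_weights(errors, importances):
    """w_i = exp(sum over earlier rounds of c_t * e_t(x_i)).

    errors[t][i] is e_t(x_i): 1 if model t misrecognized utterance x_i, else 0.
    importances[t] is c_t, the importance of model t.
    """
    n = len(errors[0])
    return [math.exp(sum(c * e[i] for e, c in zip(errors, importances)))
            for i in range(n)]

def model_importance(weights, new_errors):
    """AdaBoost.M1-style importance for the newest model (assumed form):
    c_T = log((1 - eps) / eps), where eps is its weighted error rate."""
    eps = sum(w for w, e in zip(weights, new_errors) if e) / sum(weights)
    return math.log((1 - eps) / eps)
```

Utterances misrecognized by several earlier models accumulate exponentially larger weights, so the next model concentrates on them; an accurate model (small eps) receives a large c_T and thus a louder voice in the final weighted vote.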
Experiments
• Corpus : CMU Communicator system
• Experimental results :
A Frame Level Boosting Training Scheme for Acoustic Modeling
ICSLP 2004
Rong Zhang and Alexander I. Rudnicky
Language Technologies Institute,
School of Computer Science
Carnegie Mellon University
Introduction
• In the current Boosting algorithm, the utterance is the basic unit for acoustic model training
• Our analysis shows two notable weaknesses in this setting:
– First, the objective function of the current Boosting algorithm is designed to minimize utterance error instead of word error
– Second, the current algorithm treats an utterance as a single unit for resampling
• This paper proposes a frame level Boosting training scheme for acoustic modeling to address these two problems
Frame Level Boosting Training Scheme
• The metric used in Boosting training is the frame level conditional probability of the (word level) hypothesis, estimated from the N-best list:

  P(a_t | x) = Σ_{h ∈ NBest, h(t) = a_t} P(h | x) / Σ_{h ∈ NBest} P(h | x)

  where h(t) is the word that hypothesis h assigns to frame t

• Objective function:

  L = Σ_{i=1}^{N} Σ_{t=1}^{T_i} exp( P(a_{i,t} | x_i) − P(a_{i,t} = label_{i,t} | x_i) )

  where exp( P(a_{i,t} | x_i) − P(a_{i,t} = label_{i,t} | x_i) ) is the pseudo-loss for frame t, which describes the degree of confusion of this frame for recognition
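The frame level conditional probability from the N-best list can be sketched as follows: sum the posterior mass of the hypotheses that assign the given word to frame t, normalized by the total mass of the list. The function name and the (frame_words, score) representation of an N-best entry are assumptions for illustration:

```python
def frame_posterior(nbest, t, word):
    """Estimate P(a_t = word | x) from an N-best list.

    nbest is a list of (frame_words, posterior) pairs, where frame_words[t]
    is the word that the hypothesis assigns to frame t and posterior is
    P(h | x), possibly unnormalized.
    """
    total = sum(p for _, p in nbest)
    mass = sum(p for words, p in nbest if words[t] == word)
    return mass / total
```

For example, with `nbest = [(["hi", "hi"], 0.6), (["hi", "there"], 0.3), (["bye", "there"], 0.1)]`, the posterior of "hi" at frame 0 is 0.9, reflecting low confusion at that frame.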
Frame Level Boosting Training Scheme
• Training scheme:
– How to resample the frame level training data?
– Duplicate each frame x_{i,t} according to its weight w_{i,t} and concatenate the copies into a new utterance for acoustic model training
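The frame level resampling step can be sketched as follows; rounding the real-valued weight to a duplication count (with a minimum of one so no frame disappears) is an assumption for illustration:

```python
def resample_frames(frames, frame_weights):
    """Build a new training utterance by duplicating each frame x_{i,t}
    round(w_{i,t}) times, keeping at least one copy of every frame."""
    new_utterance = []
    for frame, w in zip(frames, frame_weights):
        new_utterance.extend([frame] * max(1, round(w)))
    return new_utterance
```

Frames with high weight (i.e. high recognition confusion) are repeated, so the next model sees them more often, while easy frames contribute a single copy each.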
Experiments
• Corpus : CMU Communicator system
• Experimental results :
Discussions
• Some speculations on this outcome:
– 1. The mismatch between training criterion and target still exists in principle in the new scheme
– 2. The frame based resampling method depends exclusively on the weights calculated during training
– 3. Forced alignment is used to determine the correct word for each frame
– 4. A new training scheme that considers both utterance and frame level recognition errors may be more suitable for accurate modeling