Speech Communication 9 (1990) 83-92
North-Holland

ALPHA-NETS: A RECURRENT 'NEURAL' NETWORK ARCHITECTURE WITH A HIDDEN MARKOV MODEL INTERPRETATION

John S. BRIDLE
Speech Research Unit, Royal Signals and Radar Establishment, St. Andrews Road, Gt. Malvern WR14 3PS, UK

Received 30 October 1989
British Crown Copyright 1990

Abstract. A hidden Markov model isolated word recogniser using full likelihood scoring for each word model can be treated as a recurrent 'neural' network. The units in the recurrent loop are linear, but the observations enter the loop via a multiplication. Training can use back-propagation of partial derivatives to hill-climb on a measure of discriminability between words. The back-propagation has exactly the same form as the backward pass of the Baum-Welch (EM) algorithm for maximum-likelihood HMM training. The use of a particular error criterion based on relative entropy (equivalent to the so-called Mutual Information criterion which has been used for discriminative training of HMMs) can have derivatives which are interestingly related to the Baum-Welch re-estimates and to Corrective Training.

1. Introduction

Although there has been much interest in so-called neural networks (NNs) for automatic speech recognition (ASR), it has always been clear that we lack an adequate and tractable method for dealing with sequential structure. For instance, see Lippmann (1989).

Speech patterns are, to a good approximation, one thing after another (whether the things are acoustic-phonetic segments, syllables, or words). However, speech patterns can be sufficiently ambiguous locally that it is not adequate to make hard decisions locally and then process sequences of symbols.

The Time-Delay Neural Network (Lang and Hinton, 1988; Waibel et al., 1988b) appears to have a method for handling sequences, and it has


been applied successfully to ASR at the segmental level (Waibel et al., 1988b) and to whole isolated words (Bottou et al., 1989). However, although the TDNN has powerful methods for dealing with local dynamic properties, it cannot deal with sequences explicitly.

The most successful approach to ASR at present, which is based on stochastic models, does have powerful, principled and tractable methods of dealing with sequential structure. In speech recognition using stochastic models (e.g. hidden Markov models, HMMs) the speech pattern is treated as if it were the output of a stochastic system with an internal state whose evolution is governed by probabilistic laws (the state transition matrix). The observed pattern is supposed to depend on the current internal state through another probabilistic relationship (the output distribution). This model of the generator is used to design a recognition algorithm which processes theories about sequences of internal states. For an introduction to the application of HMMs to speech recognition see chapter 8 of Holmes (1988). There are two alternative recognition algorithms, which we shall call DP and Alpha. The DP (dynamic programming or Viterbi) algorithm computes the best (most likely) sequence of states, and the Alpha (or forward pass) algorithm computes sums over alternative state sequences. For whole-word recognition, the Alpha method is theoretically the more appropriate: each word checker computes the likelihood of the data given its own word model (summed over all state sequences), and we choose the model with the highest likelihood.

If stochastic model approaches have so much to offer, why is there so much interest in (artificial) neural networks? I hope we can dismiss the fatuous argument "the human brain is the best speech recognition system we know, the human brain is a neural network, therefore we should use a system called a neural network". The main attractions of NN approaches to ASR seem to be the inherently discriminative nature of the training methods, the possibility of more general non-linear structures than tractable stochastic model based methods offer, and a more intuitively accessible formalism.

One important avenue for exploration of the relative merits of networks and HMMs is experimental: if the proponents of networks can demonstrate better performance than current HMMs on even a specialised speech recognition task then they must be taken seriously (Moore and Peeling, 1989; Waibel et al., 1988a; Bedworth et al., 1989). There are many differences between, say, a standard HMM method and a simple application of feedforward networks to isolated word discrimination. For instance, both TDNNs and whole-word feedforward networks can be sensitive to properties defined on long intervals (compared with 10 ms), whereas standard HMM methods make an assumption of independence of the data in successive time slices (e.g. 10 ms), except for the dependence via the sequence of hidden states. Again, HMMs are normally trained using a within-class method (modelling separately the distribution of acoustic patterns corresponding to each word) whereas neural networks are trained to produce outputs which discriminate between the classes. It is true that HMM techniques can be modified so that they have some of the desirable properties of NNs in both these respects (e.g. Bahl (1987)), but there remains a large gap between the two approaches.

We also need theoretical analysis: if we can show that one class of method includes the other, for instance, or that both occupy limited regions in an otherwise unexplored continuum, the experimental issue becomes the effect on performance of particular properties of the methods used. For instance, Bourlard and Wellekens (1989a, b) have shown that a network with feedback of the appropriate form can be regarded as computing discriminant probabilities related to an HMM.

The main purpose of this paper is to treat the Alpha computation as a network, and see where that leads. We briefly review the variety of recurrent neural networks (RNNs) that have been proposed for ASR in Section 2, establish our notation, which is based on HMM methods, in Section 3, and compare the main properties of HMM calculations and RNNs in Section 4. In the central section (Section 5) we see that the Alpha computation for HMM word discrimination can be thought of as performed by a particular form of recurrent network (which we call an alpha-net).



The parameters of this network are parameters of the HMMs (state transition probabilities etc.). When we back-propagate partial derivatives through this network (and therefore backwards in time) we find the same computations as we would use for the backwards recurrence (Beta computation) which is part of the Baum-Welch training method for HMMs. The initialisation of the Beta pass depends on the discrimination task and the scoring method used. We then look at the form of the gradient of a relative entropy score with respect to the parameters of a set of HMMs of simple type, and find interesting relationships to the Baum-Welch re-estimates of the HMM parameters. Finally, the above relationships suggest methods of constructing and training networks which combine properties of HMMs and established types of networks, leading to extensions which offer more realistic or more powerful methods of processing patterns such as speech.

2. Recurrent neural networks

Perhaps the ideal form of neural network for speech recognition would accept input vectors as they become available, and have some form of internal state which contains all the information about past inputs which is needed to deal with current and future inputs. The internal state would be a function of the current input and the previous internal state (Prager et al., 1986). Various recurrent networks have been proposed and tried. Usually they have been derived by adding feedback connections to a feedforward network, often a network with one layer of hidden units. Among arrangements that have been tried are local connections around each input unit or the hidden units or the outputs, full recurrent connections around a single layer, and feedback from one layer to a previous layer. In most cases it is not clear what these systems can compute.

One of the simplest types of recurrent network is the so-called time-delay neural network (Waibel et al., 1988a). There is one local recurrent loop around a linear unit for each class, to integrate the evidence for and against the class, which comes from quite elaborate feed-forward structures incorporating shift-registers. At the other extreme, Watrous (1988, 1989) constructs recurrent networks with quite complicated structure, based on insight into the kinds of feature detectors which may be appropriate for speech recognition.

Training of recurrent networks can be done using the Backpropagation method for partial derivatives, but the propagation is backwards through time. This needs the storage of intermediate results on the forward pass, is inconvenient in computer programs for long inputs, and is rather contrary to the spirit of neural networks. Kuhn (1987; 1990, this issue) has shown that it is possible to avoid backpropagation through time, at the expense of a large extra set of partial derivatives propagated forwards.

In this paper we concentrate on the backpropagation-through-time method, and explore its relationship with the Backward Pass used in HMM calculations.

3. A summary of HMM notation and algebra

We restrict our attention in this paper to isolated word recognition: we assume that the beginning and end of the word have been identified, and that we have an HMM for each word. In practice we would use some form of connected word algorithm, if only to avoid the otherwise unsolved problem of word endpoint location. Although the word-models are distinct, and for some results we rely on this, there are good reasons for using a notation which treats the complete set of models together.

Random variables are in upper case, vectors are in boldface. $W$ is the word-class of the acoustic data $(y_1, \ldots, y_t, \ldots, y_T)$ (the sequence of observations). $Y_t$ is the output of the model (the observation) at time $t$. $X_t$ is the state of the generating process at time $t$. The states of the (model) generating process form a first-order Markov chain controlled by a matrix of state transition probabilities,

$$a_{ij} \triangleq P(X_t = j \mid X_{t-1} = i).$$

Each observation is directly dependent on only the state at the same time, via observation likelihood functions, one for each state, which may have the form



$$b_j(y_t, \theta_j, \phi) \triangleq P(Y_t = y_t \mid X_t = j),$$

where $\theta_j$ is a vector of parameters which depend on the state $j$ and $\phi$ is a vector of parameters which are independent of the state. (In Section 9 we treat the case where the state-dependent parameters $\theta_j$ are the means of simple Gaussian densities.) We shall use a shorthand which is valid when we are given the acoustic data for each time $t$:

$$b_{jt} \triangleq b_j(y_t, \theta_j, \phi).$$

We shall use $y_{t_1}^{t_2}$ to mean $(y_{t_1}, y_{t_1+1}, \ldots, y_{t_2})$.

We use a notation for states which emphasises the unity of the set of HMM word model networks: state indices such as $i$ and $j$ apply to the complete set of states in all the models. (The symbol $\mathcal{S}$ will be used to stand for the set of all these states.) The main property that we need in an alpha-net is a distinguished end state for each word (we use $F_w$ as the index of the final state for word $w$), but for consistency with separate models we also introduce one initial state $I_w$ for each word $w$. The symbol $\mathcal{I}$ stands for the set of all start states, $\mathcal{F}$ for the set of all final states. For some results (relating backpropagated derivatives to Baum-Welch re-estimates) we insist that all the transitions are within word models, so the state transition matrix is block diagonal (non-zero entries are confined to square blocks on the diagonal, between rows and columns belonging to the same word model). Each state then belongs to a word model, and we shall use $w_j$ to mean the index of the word that state $j$ is in, and $\mathcal{S}_w$ for the set of states belonging to word-model $w$. To deal with the initialisation and finishing off of the calculations, we introduce fictitious times $0$ and $T+1$. State transition probabilities such as $a_{I_w,j}$ specify the initial distribution of states (at $t = 1$). To collect scores for all the possible states of a word model at $t = T$, we define $b_{j,T+1} \triangleq 1\ \forall j$, and $a_{i,F_w} = P(\text{frame } t \text{ is the end of the pattern} \mid X_t = i)$.

We compute the likelihood of the complete acoustic pattern for each word-hypothesis, $P(Y = y_1^T \mid W = w)$, via the joint probabilities for partial sequences

$$\alpha_{jt} \triangleq P(Y_1^t = y_1^t \text{ and } X_t = j \mid W = w_j).$$

This may be efficiently computed using the forward recurrence:

$$\alpha_{jt} = b_{jt} \sum_i \alpha_{i,t-1}\, a_{ij} \qquad (1)$$

for $t = 1, 2, \ldots, T+1$, where $\alpha_{j0} = 1$ for $j \in \mathcal{I}$, and $0$ otherwise. Equation (1) is central to the present paper.

The output of each word-checker is the likelihood of the complete observation sequence as the complete output of the model, which is the likelihood of generating all the data and being in the final state at time $T+1$. This is the $\alpha$ for the final state of the relevant word model,

$$L_w \triangleq P(Y_1^T = y_1^T \mid W = w) = \alpha_{F_w, T+1}.$$

In a conventional isolated-word discriminator, we would decide on the class $w$ for which $L_w$ is largest (if we assume equal priors and costs).
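To make eq. (1) concrete, here is a minimal numpy sketch of the forward (alpha) pass for a single word model. The names (trans for the within-word $a_{ij}$, end_prob for the $a_{i,F_w}$, init_prob for the $a_{I_w,j}$, obs_lik for the $b_{jt}$) are illustrative, not from the paper, and the fictitious times $0$ and $T+1$ are folded into an initial distribution and a final weighted sum.

```python
import numpy as np

def forward_pass(trans, end_prob, init_prob, obs_lik):
    """Alpha (forward) pass for one word model, eq. (1).

    trans:     (N, N) within-word transition probabilities a_ij
    end_prob:  (N,)   a_{i,F_w}: probability that the pattern ends after state i
    init_prob: (N,)   a_{I_w,j}: initial state distribution (the fictitious t = 0 step)
    obs_lik:   (T, N) observation likelihoods b_jt = b_j(y_t)
    Returns alpha with shape (T, N) and the word likelihood L_w = alpha_{F_w, T+1}.
    """
    T, N = obs_lik.shape
    alpha = np.zeros((T, N))
    alpha[0] = obs_lik[0] * init_prob                    # t = 1: enter from the initial state
    for t in range(1, T):
        alpha[t] = obs_lik[t] * (alpha[t - 1] @ trans)   # eq. (1)
    L_w = alpha[-1] @ end_prob                           # alpha at the final state, time T+1 (b = 1 there)
    return alpha, L_w

# Toy 3-state left-to-right word model, 4 observation frames (arbitrary numbers).
trans = np.array([[0.6, 0.4, 0.0],
                  [0.0, 0.7, 0.3],
                  [0.0, 0.0, 0.9]])
end_prob = np.array([0.0, 0.0, 0.1])      # each row of (trans, end_prob) together sums to 1
init_prob = np.array([1.0, 0.0, 0.0])
obs_lik = np.random.RandomState(0).uniform(0.1, 1.0, size=(4, 3))
alpha, L_w = forward_pass(trans, end_prob, init_prob, obs_lik)
```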

We shall also need the likelihood of the rest of the sequence starting from each state at each time:

$$\beta_{jt} \triangleq P(Y_{t+1}^T = y_{t+1}^T \mid X_t = j \text{ and } W = w_j),$$

and a similar (but reverse-direction) recurrence computes it:

$$\beta_{i,t-1} = \sum_j a_{ij}\, b_{jt}\, \beta_{jt}, \qquad (2)$$

with $\beta_{j,T+1} = 1$ for $j \in \mathcal{F}$, and $0$ otherwise.

Note that for any $t$ we can compute the likelihood of the data given a particular word model by a summation across all states of that model:

$$L_w = \sum_{j \in \mathcal{S}_w} P(Y_1^T = y_1^T \text{ and } X_t = j \mid W = w)$$
$$= \sum_{j \in \mathcal{S}_w} P(Y_1^t = y_1^t \text{ and } X_t = j \mid W = w)\, P(Y_{t+1}^T = y_{t+1}^T \mid X_t = j \text{ and } W = w)$$
$$= \sum_{j \in \mathcal{S}_w} \alpha_{jt}\, \beta_{jt}. \qquad (3)$$

As a shorthand I shall use $\gamma$ with various subscripts and overbars to denote posterior probabilities of state occupancies etc. A dropped subscript implies a sum, and an overbar implies a normalisation (within the word model), for example

$$\gamma_{it} = \alpha_{it}\,\beta_{it}, \qquad \gamma_i = \sum_t \gamma_{it}.$$
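Continuing the sketch above, a matching beta pass and the gamma products make eqs. (2) and (3) concrete; the assertion checks that $\sum_j \alpha_{jt}\beta_{jt}$ reproduces $L_w$ at every frame (same illustrative names and toy model as before).

```python
def backward_pass(trans, end_prob, obs_lik):
    """Beta pass for one word model, eq. (2); the fictitious beta_{j,T+1} enters via end_prob."""
    T, N = obs_lik.shape
    beta = np.zeros((T, N))
    beta[-1] = end_prob                                  # beta at t = T
    for t in range(T - 1, 0, -1):
        beta[t - 1] = trans @ (obs_lik[t] * beta[t])     # eq. (2)
    return beta

beta = backward_pass(trans, end_prob, obs_lik)
gamma = alpha * beta                          # gamma_jt = alpha_jt * beta_jt (un-normalised occupancy)
assert np.allclose(gamma.sum(axis=1), L_w)    # eq. (3): the sum over states gives L_w at every t
```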



Table 1

Characteristic | HMM recogniser | Recurrent NN
Form of recurrent calculation | $b_{jt} = b_j(y_t)$; $\alpha_{jt} = b_{jt}\sum_i a_{ij}\alpha_{i,t-1}$ | $O_{kt} = y_{kt}$ for $k = 1, \ldots, \mathrm{len}(y)$; $O_{it} = F(\sum_j a_{ij} O_{j,t-d})$
Constraints on parameters | To make the $\alpha$'s likelihoods of the data given some stochastic model, for example $a_{ij} \geqslant 0$, $\sum_j a_{ij} = 1$ | None
Form of classification calculations | Separate network for each word | Single network with separate outputs
Output numbers | (Very) small positive numbers (likelihoods of the data given the models) | Indicators between 0 and 1, approximating the probability of word given data
Criterion optimised in training | Likelihood of data given correct model | Error ($E$) between outputs and true-class-indicator target vector
Form of backward pass | $\beta_{i,t-1} = \sum_j a_{ij}\,\beta_{jt}\,b_{jt}$ | $\partial E/\partial O_{it} = \sum_j (\partial E/\partial O_{j,t+d})(\partial O_{j,t+d}/\partial O_{it})$
Optimisation method | EM re-estimate | Gradient method

Also

$$\gamma_{ijt} = \alpha_{i,t-1}\, a_{ij}\, b_{jt}\, \beta_{jt}, \qquad \bar\gamma_{it} = \gamma_{it} \Big/ \sum_{j \in \mathcal{S}_{w_i}} \gamma_{jt} = \gamma_{it}/L_{w_i}.$$

In this notation the Baum-Welch re-estimate for the $a$'s is

$$\hat a_{ij} = \bar\gamma_{ij}/\bar\gamma_i.$$

The Baum-Welch re-estimates are guaranteed to increase the likelihood of the observations.
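Continuing the same toy sketch, the Baum-Welch transition re-estimates can be accumulated directly from the quantities above; here xi plays the role of $\sum_t \gamma_{ijt}$, and the normalisation by $L_w$ is omitted because it cancels in the ratio (illustrative code, not the paper's).

```python
# xi[i, j] = sum_t gamma_ijt = sum_t alpha_{i,t-1} a_ij b_jt beta_jt  (array index t >= 1)
xi = np.einsum('ti,ij,tj,tj->ij', alpha[:-1], trans, obs_lik[1:], beta[1:])
gamma_i = gamma.sum(axis=0)                 # gamma_i = sum_t gamma_it
a_hat = xi / gamma_i[:, None]               # re-estimate of the within-word a_ij
end_hat = gamma[-1] / gamma_i               # the remaining mass re-estimates a_{i,F_w}
assert np.allclose(a_hat.sum(axis=1) + end_hat, 1.0)
```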

4. Comparison of HMMs and recurrent networks

Table 1 summarises important differences between HMM calculations and the types of recurrent neural network that have been used for ASR.

5. The Alpha calculation as a recurrent network

We would like to be able to select a form of recurrent network which could perform at least as well as an HMM system. We have only to think of the Alpha calculation (eq. (1)) as a recurrent network. Figure 1 shows a fragment of an alpha-net performing the recurrent (alpha) computations within a word-model. The HMM is one of the simplest which is suitable for ASR: the states are ordered, and each state can be followed by the same state or the next one. The network has a recurrent loop. The weights are the state transition probabilities $\{a_{ij}\}$. The hidden units in the loop are linear, and so act as unit delays. Information about the observations (the $b$'s) enters the loop via multiplications. (The part of the network which computes the $b$'s can easily implement any of the standard HMM continuous-density observation likelihood functions, using weighted sums, squared Euclidean distances and exponentials.) The recurrent part of the network is in separate pieces, one for each word, and the outputs are likelihoods of the data given each model, which are all very small numbers.
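As a small illustration of this topology, the following sketch builds the transition quantities for such a left-to-right word model in the format used by the forward-pass sketch earlier; the 0.5 self-loop and exit values are arbitrary example numbers, not values from the paper.

```python
import numpy as np

def left_to_right_model(n_states, stay=0.5, exit_prob=0.5):
    """Simple word model: each state is followed by itself or the next one;
    the last state may also end the pattern (its exit mass goes to end_prob)."""
    trans = np.zeros((n_states, n_states))
    for i in range(n_states - 1):
        trans[i, i] = stay
        trans[i, i + 1] = 1.0 - stay
    trans[-1, -1] = 1.0 - exit_prob
    end_prob = np.zeros(n_states)
    end_prob[-1] = exit_prob
    init_prob = np.zeros(n_states)
    init_prob[0] = 1.0                      # enter in the first state
    return trans, end_prob, init_prob
```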

To convert the final-state likelihoods to a form such as a classifier network should provide, we normalise them by dividing by the sum across all words:

$$P_w \triangleq L_w \Big/ \sum_v L_v = L_w/L. \qquad (4)$$

Since the $L_w$'s are positive, the $P_w$'s are all between 0 and 1, and can be used as indicators of word class. In fact, if the $L$'s are likelihoods of the data given the word hypotheses, then $\{P_w\}$ is actually the posterior distribution of word labels (assuming equal priors $P(W = w) = 1/n$).



[Fig. 1. A fragment of an alpha-net: linear units in the recurrent loop compute $\alpha_{jt} = b_{jt}\sum_i a_{ij}\,\alpha_{i,t-1}$, with the observation likelihoods $b_{j-1,t}$, $b_{jt}$, $b_{j+1,t}$ entering the loop via multiplications.]

We simply apply Bayes' rule and substitute $P(W = w) = 1/n$ and $L_w = P(Y = y \mid W = w)$:

$$P(W = w \mid Y = y) = \frac{P(Y = y \mid W = w)\,P(W = w)}{P(Y = y)},$$

where

$$P(Y = y) = \sum_v P(Y = y \mid W = v)\,P(W = v) = \frac{1}{n}\sum_v P(Y = y \mid W = v).$$

The $P$'s are the outputs of the Alpha-net. We can use any error criterion, and the backpropagation of partial derivatives is quite straightforward (apart from the final normalisation stage). We could use any optimisation method based on these derivatives. We can even remove the separation between the recurrent parts for each word. The performance of the Alpha-net in speech recognition will depend on these choices, of course.
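Since the $L_w$ are very small, a practical implementation of the normalisation of eq. (4) would usually work from log-likelihoods, which makes it exactly a softmax over the words; a minimal sketch (equal word priors assumed, illustrative names):

```python
import numpy as np

def word_posteriors(log_L):
    """P_w = L_w / sum_v L_v, eq. (4), computed stably from log L_w."""
    shifted = log_L - np.max(log_L)      # subtract the max; the ratios are unchanged
    e = np.exp(shifted)
    return e / e.sum()
```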

Generally, for any parameter $\phi$ of the network, and any error measure $E(\{P_w\}, \text{true class})$,

$$\frac{\partial E}{\partial \phi} = \sum_w \frac{\partial E}{\partial L_w}\,\frac{\partial L_w}{\partial \phi}. \qquad (5)$$

If $\phi$ applies at several times or to several states then $\partial L_w/\partial\phi$ will consist of sums over times and states. For parameters that apply to only one word-model, $\partial L_w/\partial\phi_s$ is zero except for $\partial L_{w_s}/\partial\phi_s$, so

$$\frac{\partial E}{\partial \phi_s} = \frac{\partial E}{\partial L_{w_s}}\,\frac{\partial L_{w_s}}{\partial \phi_s}.$$

The following sections concentrate on the case of separate networks and the use of a particular training criterion (relative entropy) which is appropriate for classifier networks, and has interesting relationships with criteria used in the Boltzmann machine and for discriminative training of HMMs.

6. Scoring and initial backpropagation

For the rest of this paper we consider an alternative to the usual squared error criterion for training networks. The following log probability score is based on the relative entropy of the outputs $\{P_w\}$, taken as a probability distribution, and the distribution of the true class labels, both conditioned on the training patterns:

$$J \triangleq -\log P_c, \qquad (6)$$

where the correct word class is $c$.



One way to appreciate this score is as (minus the log of) the probability of choosing the word label correctly if we were to guess it by picking from the distribution specified by the $P$'s. The sum of the $J$'s over the training set is then minus the log of the probability of guessing all of the training-set labels correctly. $J$ is closely related to several criteria which have been used in pattern recognition, neural networks and speech recognition (Bridle, 1989a). In particular, minimising $J$ is equivalent (Bridle, 1989b) to maximising the so-called Mutual Information, which has been used successfully to improve the ability of HMMs to discriminate words (Bahl et al., 1986).

We wish to compute $\partial J/\partial a_{ij}$ and $\partial J/\partial b_{jt}$ so we can optimise $J$ using some gradient-based method.

$$\frac{\partial J}{\partial a_{ij}} = \frac{\partial J}{\partial L_{w_j}}\,\frac{\partial L_{w_j}}{\partial a_{ij}} = \frac{\partial J}{\partial L_{w_j}} \sum_t \frac{\partial L_{w_j}}{\partial \alpha_{jt}}\,\frac{\partial \alpha_{jt}}{\partial a_{ij}}, \qquad (7)$$

because each $a_{ij}$ has an effect on its $L_{w_j}$ via each of the $\alpha_{jt}$'s.

$$\frac{\partial J}{\partial b_{jt}} = \frac{\partial J}{\partial L_{w_j}}\,\frac{\partial L_{w_j}}{\partial \alpha_{jt}}\,\frac{\partial \alpha_{jt}}{\partial b_{jt}}, \qquad (8)$$

so we need $\partial J/\partial L_{w_j}$, $\partial L_{w_j}/\partial \alpha_{jt}$, $\partial \alpha_{jt}/\partial a_{ij}$, and $\partial \alpha_{jt}/\partial b_{jt}$. We proceed in that order.

$$\frac{\partial J}{\partial L_w} = \frac{\partial J}{\partial P_c}\,\frac{\partial P_c}{\partial L_w}, \qquad \text{where } \frac{\partial J}{\partial P_c} = -1/P_c,$$

and from eq. (4),

$$\frac{\partial P_c}{\partial L_w} = \frac{\delta_{cw} - P_c}{L}, \qquad \delta_{cw} = \begin{cases} 1 & \text{if } w = c, \\ 0 & \text{otherwise} \end{cases}$$

(note that $P_c$ is affected by all the $L_w$'s), so

$$\frac{\partial J}{\partial L_w} = -\frac{1}{P_c}\,\frac{\delta_{cw} - P_c}{L} = \frac{P_w - \delta_{cw}}{L_w}. \qquad (9)$$
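Equation (9) is easy to sanity-check numerically; the sketch below compares it with a central finite difference of $J$ with respect to one $L_w$, using arbitrary toy likelihoods (none of these numbers come from the paper).

```python
import numpy as np

L = np.array([2.0e-5, 5.0e-6, 1.2e-5])    # toy word likelihoods L_w
c = 0                                      # index of the correct word

def J_of(L):
    P = L / L.sum()                        # eq. (4)
    return -np.log(P[c])                   # eq. (6)

w, eps = 1, 1.0e-9
numeric = (J_of(L + eps * np.eye(3)[w]) - J_of(L - eps * np.eye(3)[w])) / (2 * eps)
P = L / L.sum()
analytic = (P[w] - (1.0 if w == c else 0.0)) / L[w]   # eq. (9)
assert np.isclose(numeric, analytic, rtol=1e-4)
```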

The technique described below makes heavy use of the identity of the Baum-Welch backward pass (Beta calculation, eq. (2)) and the MLP backpropagation of partial derivatives. We can see that, in the case of separate networks, if the $\alpha$ pass (eq. (1)) is the forward pass of a recurrent network, then the $\beta$'s (eq. (2)) are the backpropagated partial derivatives of the final likelihood. From eq. (3),

$$\frac{\partial L_{w_j}}{\partial \alpha_{jt}} = \beta_{jt}. \qquad (10)$$

7. State transition probabilities

$$\alpha_{jt} = \sum_i a_{ij}\,\alpha_{i,t-1}\, b_{jt},$$

so

$$\frac{\partial \alpha_{jt}}{\partial a_{ij}} = \alpha_{i,t-1}\, b_{jt},$$

and

$$\frac{\partial L_{w_j}}{\partial a_{ij}} = \sum_t \frac{\partial L_{w_j}}{\partial \alpha_{jt}}\,\frac{\partial \alpha_{jt}}{\partial a_{ij}} \qquad (11)$$
$$= \sum_t \beta_{jt}\,\alpha_{i,t-1}\, b_{jt} = \sum_t \frac{\gamma_{ijt}}{a_{ij}} = \frac{\gamma_{ij}}{a_{ij}}.$$

This derivative is always positive, so we can

always increase $L_{w_j}$ by increasing the $a_{ij}$'s. However, if we wish to preserve the HMM basis of the recogniser, we must ensure that the $a$'s and $b$'s have the appropriate forms. Each $b_{jt}$ value must be positive, but any scale factor is removed by the normalisation which takes place later. The state transition probabilities, $a$, are more difficult, because we must have

$$a_{ij} \geqslant 0 \ \forall i,j \quad \text{and} \quad \sum_j a_{ij} = 1 \ \forall i.$$

Our proposed solution is to adapt not the $a$'s but a set of unconstrained variables denoted $\{A_{ij}\}$, which determine the values of the $a$'s while keeping the stochastic constraint satisfied. The following transformation has the required properties:

$$a_{ij} = e^{A_{ij}} \Big/ \sum_l e^{A_{il}}. \qquad (12)$$

Any set of values $\{A_{ij}\}$ produces a valid set of $\{a_{ij}\}$'s, and the relative values of the $A$'s are preserved in the $a$'s. We use an exponential to ensure positivity, and normalise across all destination states to ensure $\sum_j a_{ij} = 1$. This transformation has other uses in networks, and it can have simple implementations in electronic circuits (Bridle, 1989a).

Differentiating eq. (12),

$$\frac{\partial a_{il}}{\partial A_{ij}} = a_{ij}\,(\delta_{jl} - a_{il}) = a_{il}\,(\delta_{jl} - a_{ij}). \qquad (13)$$
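Equation (12) is what is now commonly called a softmax over the unconstrained scores; a small sketch of the transformation and of using eq. (13) to map a gradient in the $a$'s back onto the $A$'s (illustrative names):

```python
import numpy as np

def transitions_from_scores(A):
    """a_ij = exp(A_ij) / sum_l exp(A_il), eq. (12): each row of A becomes a distribution."""
    e = np.exp(A - A.max(axis=1, keepdims=True))    # shift per row for numerical safety
    return e / e.sum(axis=1, keepdims=True)

def grad_wrt_scores(a, grad_a):
    """Chain rule through eq. (13): dL/dA_ij = a_ij * (dL/da_ij - sum_l dL/da_il * a_il)."""
    inner = (grad_a * a).sum(axis=1, keepdims=True)
    return a * (grad_a - inner)
```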



Using eqs. (11) and (13) we can find

$$\frac{\partial L_{w_j}}{\partial A_{ij}} = \sum_l \frac{\partial L_{w_j}}{\partial a_{il}}\,\frac{\partial a_{il}}{\partial A_{ij}}
= \sum_l \sum_t \beta_{lt}\,\alpha_{i,t-1}\, b_{lt}\, a_{il}\,(\delta_{jl} - a_{ij})$$
$$= \sum_t \alpha_{i,t-1}\, a_{ij}\, b_{jt}\, \beta_{jt} \;-\; a_{ij} \sum_t \alpha_{i,t-1} \sum_l a_{il}\, b_{lt}\, \beta_{lt}$$
$$= \sum_t \alpha_{i,t-1}\, a_{ij}\, b_{jt}\, \beta_{jt} \;-\; a_{ij} \sum_t \alpha_{i,t-1}\, \beta_{i,t-1}$$
$$= \gamma_{ij} - a_{ij}\,\gamma_i = \gamma_i\!\left(\frac{\gamma_{ij}}{\gamma_i} - a_{ij}\right). \qquad (14)$$

This is the derivative for $A_{ij}$ which we would use for gradient ascent on $L_w$, the likelihood of the data for a single word-model (if the word models are separate). Note that we still have to convert from $A$'s to $a$'s. The first term in the bracket is the Baum-Welch re-estimate for the $a$'s, so we can write

$$\frac{\partial L_{w_j}}{\partial A_{ij}} = \gamma_i\,(\hat a_{ij} - a_{ij}). \qquad (15)$$

For the discriminator set of models, using eqs. (15) and (9),

$$\frac{\partial J}{\partial A_{ij}} = \frac{\partial J}{\partial L_{w_j}}\,\frac{\partial L_{w_j}}{\partial A_{ij}}
= \frac{P_{w_j} - \delta_{c w_j}}{L_{w_j}}\,\gamma_i\,(\hat a_{ij} - a_{ij})
= (P_{w_j} - \delta_{c w_j})\,\bar\gamma_i\,(\hat a_{ij} - a_{ij}). \qquad (16)$$
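In terms of the quantities computed in the earlier sketches, eq. (16) could be assembled roughly as follows; gamma_i, L_w, a_hat and trans are the per-word arrays from the forward-backward sketches, while P_w and is_correct come from eq. (4) and the training label (all names illustrative).

```python
def transition_score_grad(P_w, is_correct, gamma_i, L_w, a_hat, trans):
    """dJ/dA_ij = (P_w - delta_cw) * gamma_bar_i * (a_hat_ij - a_ij), eq. (16)."""
    gamma_bar_i = gamma_i / L_w                  # the overbar: normalise within the word model
    return (P_w - float(is_correct)) * gamma_bar_i[:, None] * (a_hat - trans)
```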

8. The observation likelihoods

The observations at each time enter the loop via the quantities $\{b_{jt}\}$, which are values of functions, one function per state, of the input $y_t$ at each time. (The $y_t$'s could be more than single-frame acoustic vectors.) If the functions are all of the same form we can write

$$b_{jt} = b(y_t, \theta_j, \phi),$$

where $\theta_j$ is a vector of parameters which depend on the state $j$, and $\phi$ is a vector of parameters which are independent of the state. To optimise $\{\theta_j\}$ and $\phi$ we require

$$\frac{\partial J}{\partial \theta_j} \quad \text{and} \quad \frac{\partial J}{\partial \phi}.$$

We proceed via derivatives with respect to $b_{jt}$:

$$\alpha_{jt} = \sum_i \alpha_{i,t-1}\, a_{ij}\, b_{jt},$$

so

$$\frac{\partial \alpha_{jt}}{\partial b_{jt}} = \sum_i \alpha_{i,t-1}\, a_{ij} = \alpha_{jt}/b_{jt},$$

$$\frac{\partial L_{w_j}}{\partial b_{jt}} = \frac{\partial L_{w_j}}{\partial \alpha_{jt}}\,\frac{\partial \alpha_{jt}}{\partial b_{jt}} = \frac{\beta_{jt}\,\alpha_{jt}}{b_{jt}} = \frac{\gamma_{jt}}{b_{jt}}. \qquad (17)$$
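With the arrays of the earlier sketches, eq. (17) is a single elementwise division (again illustrative, not the paper's code):

```python
dL_db = gamma / obs_lik    # dL_w/db_jt = gamma_jt / b_jt over the whole (T, N) array, eq. (17)
```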

These partial derivatives, and $\partial J/\partial b_{jt}$, can be propagated down into whatever network produced the $b$'s. For instance,

$$\frac{\partial L_{w_j}}{\partial \theta_j} = \sum_t \frac{\partial L_{w_j}}{\partial b_{jt}}\,\frac{\partial b_{jt}}{\partial \theta_j} = \sum_t \frac{\gamma_{jt}}{b_{jt}}\,\frac{\partial b_{jt}}{\partial \theta_j}.$$

We shall consider a specific case which is popular in HMMs.

9. Simple Gaussian output distributions

Consider the simplest continuous output distribution model: unit-variance single-mode Gaussian distributions parameterised by their multivariate means $\{m_j\}$. We shall see that the derivative with respect to the mean of the distribution is a simple function of the means as they would be re-estimated by the Baum-Welch method of within-word-model likelihood hill climbing.

The likelihood is an exponential function of the squared Euclidean distance between the mean and the input vector $y$:

$$b_{jt} = \frac{1}{z}\,e^{-\|y_t - m_j\|^2},$$

where $z$ is a constant, so

$$\frac{\partial b_{jt}}{\partial m_j} = 2\,b_{jt}\,(y_t - m_j),$$

$$\frac{\partial L_{w_j}}{\partial m_j} = \sum_t \frac{\partial L_{w_j}}{\partial b_{jt}}\,\frac{\partial b_{jt}}{\partial m_j} = \sum_t \frac{\gamma_{jt}}{b_{jt}}\,2\,b_{jt}\,(y_t - m_j) = 2\sum_t \gamma_{jt}\,(y_t - m_j). \qquad (18)$$



This is the gradient for the quantity which is optimised by the Baum-Welch iteration. We can express the result in terms of the Baum-Welch re-estimates for the $m$'s,

$$\hat m_j = \sum_t \gamma_{jt}\, y_t \Big/ \sum_t \gamma_{jt},$$

giving

$$\frac{\partial L_{w_j}}{\partial m_j} = 2\,\gamma_j\,(\hat m_j - m_j). \qquad (19)$$

However, for discriminative training we are interested in

$$\frac{\partial J}{\partial m_j} = \frac{2\,(P_{w_j} - \delta_{c w_j})}{L_{w_j}}\,\gamma_j\,(\hat m_j - m_j) = 2\,(P_{w_j} - \delta_{c w_j})\,\bar\gamma_j\,(\hat m_j - m_j). \qquad (20)$$
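A hedged sketch of eqs. (18)-(20) for this unit-variance Gaussian case, reusing the per-word forward-backward quantities; frames stands for the (T, D) sequence of observation vectors $y_t$ and the remaining names follow the earlier sketches.

```python
def mean_gradients(P_w, is_correct, gamma, frames, means, L_w):
    """dJ/dm_j = 2 (P_w - delta_cw) * gamma_bar_j * (m_hat_j - m_j), eq. (20).

    gamma:  (T, N) occupancies gamma_jt for this word model
    frames: (T, D) observation vectors y_t
    means:  (N, D) current Gaussian means m_j
    """
    gamma_j = gamma.sum(axis=0)                       # (N,)
    m_hat = (gamma.T @ frames) / gamma_j[:, None]     # Baum-Welch re-estimates of the means
    dL_dm = 2.0 * gamma_j[:, None] * (m_hat - means)  # eq. (19): gradient of L_w
    return (P_w - float(is_correct)) / L_w * dL_dm    # eq. (20)
```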

The parameters of the true model always move towards the Baum-Welch re-estimates, but whereas in the Baum-Welch iteration the parameters of the other models are untouched, in the present method the parameters of all the wrong models move away from their Baum-Welch re-estimates for that word (Bahl et al., 1987). We can see that this is of the same form as the Corrective Training method (Bahl et al., 1988).

10. Conclusions

The alpha-nets described here have an exact interpretation in terms of a set of hidden Markov word models. I suggest this is a very useful reference point for attempts to use recurrent networks for speech recognition (or for interpreting other ambiguous patterns with sequential structure). Note that when the parameters of the network have been trained using a discriminative criterion such as $J$, these parameters will not necessarily be good estimates of the parameters of the generators of the data in the individual word classes.

There are several straightforward generalisations, some of which retain the HMM interpretability.
- Tying some of the parameters causes no problems (e.g. each word model made up of a concatenation of phonetically-motivated subword models). One type of tying is sharing an input transformation. Hunt (1989) has reported good results using a linear discriminative transformation of the raw spectrum vectors, based on time-alignment of same and different words. When we include a transformation (linear or non-linear) in the $b$ network we can derive the appropriate derivatives of $J$.
- It should be possible to train a word discriminator which starts with a set of states uncommitted to particular words (except for the final states).
- It is also possible to share output distribution modes, in the style $b_{jt} = \sum_k c_{kj}\, N(y_t; m_k)$, where $c_{kj}$ is the mixture factor from mode $k$ to state $j$, and $N$ is a Gaussian density.
- The most primitive alpha-net would have only one state per word, so each of these states could only accumulate evidence for and against the word. Such a system would need to use multi-modal output distributions over several frames at a time, and would then be very similar to a TDNN.

Methods based on derivatives are unlikely to be as efficient as the Baum-Welch method, and the technique described here is most attractive for final tuning-up after the structure of the network has been established by other methods. Gopalakrishnan et al. (1989) have recently shown that a re-estimate-type method is available for Maximum Mutual Information Estimation of HMM parameters. It is not clear how general this method is.

Having eliminated the question "Which is better: HMMs or Backprop nets?", we can now concentrate on finding out which particular properties of a speech recognition system work best in practice.

References

L.R. Bahl, P.F. Brown, P.V. de Souza and R.L. Mercer (1986), "Maximum mutual information estimation of hidden Markov model parameters", Proc. IEEE ICASSP 86, pp. 49-52.
L.R. Bahl, P.F. Brown, P.V. de Souza and R.L. Mercer (1987), "Speech recognition with continuous-parameter hidden Markov models", Computer Speech and Language, Vol. 2 (3/4), pp. 219-234.
L.R. Bahl, P.F. Brown, P.V. de Souza and R.L. Mercer (1988), "A new algorithm for the estimation of HMM parameters", Proc. IEEE ICASSP 88, pp. 493-496.
M.D. Bedworth, L. Bottou, J.S. Bridle et al. (1989), "Comparison of neural and conventional classifiers on a speech recognition problem", Proc. IEE First Int. Conf. on Artificial Neural Networks, pp. 86-89.
L. Bottou, F. Fogelman-Soulié, P. Blanchet and J.S. Liénard (1989), "Experiments with time delay networks and dynamic time warping for speaker independent isolated digits recognition", Proc. Eurospeech 89.
H. Bourlard and C.J. Wellekens (1989a), "Speech pattern discrimination and multilayer perceptrons", Computer Speech and Language.
H. Bourlard and C.J. Wellekens (1989b), "Links between Markov models and multilayer perceptrons", in Advances in Neural Information Processing Systems I, ed. by D.S. Touretzky (Morgan Kaufmann, Los Altos, CA), pp. 502-510.
J.S. Bridle (1989a), "Probabilistic interpretation of feedforward classification network outputs, with relationships to statistical pattern recognition", in Neuro-computing: Algorithms, architectures and applications, ed. by F. Fogelman-Soulié and J. Hérault (Springer, New York).
J.S. Bridle (1989b), "Training stochastic model recognition algorithms as networks can lead to maximum mutual information estimation of parameters", Proc. IEEE Conf. on Neural Information Processing Systems, Natural and Synthetic (NIPS 89).
P.S. Gopalakrishnan, D. Kanevsky, A. Nadas and D. Nahamoo (1989), "A generalisation of the Baum algorithm to rational objective functions", Proc. ICASSP 89, pp. 631-634.
J.N. Holmes (1988), Speech Synthesis and Recognition (Van Nostrand Reinhold, New York).
M.J. Hunt and C. Lefèbvre (1989), "A comparison of several acoustic representations for speech recognition with degraded and undegraded speech", Proc. IEEE Int. Conf. Acoust., Speech, Signal Proc., pp. 262-265.
G.M. Kuhn (1987), "A first look at phonetic discrimination using a connectionist network with recurrent links", SCIMP Working Paper 4/87, Institute for Defense Analysis, Communications Research Division.
G.M. Kuhn, R.L. Watrous and B. Ladendorf (1990), "Connected recognition with a recurrent network", Speech Communication, Vol. 9(1), pp. 41-48 (this issue).
K. Lang and G.E. Hinton (1988), "The development of TDNN architecture for speech recognition", Technical Report CMU-CS-88-152, Carnegie-Mellon University.
R.P. Lippmann (1989), "Review of neural networks for speech recognition", Neural Computation, Vol. 1(1).
R.K. Moore and S.M. Peeling (1989), "Minimally distinct word-pair discrimination using a back-propagation network", Computer Speech and Language, Vol. 3, pp. 119-131.
R.G. Prager et al. (1986), "Boltzmann machines for speech recognition", Computer Speech and Language, Vol. 1(1).
A. Waibel, T. Hanazawa, G. Hinton, K. Shikano and K. Lang (1988a), "Phoneme recognition: Neural networks vs. hidden Markov models", Proc. IEEE ICASSP 88, pp. 107-110.
A. Waibel, T. Hanazawa, G. Hinton, K. Shikano and K. Lang (1988b), "Phoneme recognition using time-delay neural networks", IEEE Trans. Acoust., Speech, Signal Proc., March.
R.L. Watrous (1988), "Connectionist speech recognition using the temporal flow model", Proc. IEEE Workshop on Speech Recognition.
R.L. Watrous, B. Ladendorf and G. Kuhn (1989), "Complete gradient optimisation of a recurrent network applied to /b/, /d/, /g/ discrimination", J. Acoust. Soc. Am., to appear.