
Speech Recognition using Hidden Markov Model

An implementation of the theory on a

DSK ADSP-BF533 EZ-KIT LITE REV 1.5

Nick Bardici

Björn Skarin

Degree of Master of Science in Electrical Engineering

MEE-03-19

Supervisor: Mikael Nilsson

School of Engineering

Department of Telecommunications and Signal Processing

Blekinge Institute of Technology

March, 2006



Abstract

This master degree project describes how to implement a speech recognition system on a DSK ADSP-BF533 EZ-KIT LITE REV 1.5, based on the theory of the Hidden Markov Model (HMM). The implementation is based on the theory in the master degree project Speech Recognition using Hidden Markov Model by Mikael Nilsson and Marcus Ejnarsson, MEE-01-27. The work accomplished in the project is, by reference to the theory, implementing a MFCC, Mel Fre[...]

[...] is about implementing a speech recognition system on a DSK ADSP-BF533 EZ-KIT LITE REV 1.5, based on the theory of HMM, Hidden Markov Model. The implementation is based on the theory in the master's thesis Speech Recognition using Hidden Markov Model by Mikael Nilsson and Marcus Ejnarsson, MEE-01-27. The work carried out in the thesis was to implement, based on the theory, an MFCC, Mel Fre[...] the DSP Texas Instruments TMDS320x6711. The real-time application was then evaluated.


Contents

1. Abstract

2. Contents

3. Introduction

4. Speech signal to Feature Vectors, Mel Fre[...]



6.3.1 B, output distribution matrix
6.3.2 α, The Forward variable
6.3.3 β, Backward Algorithm
6.3.4 c, scaled, the scaling factor, α scaled, β scaled
6.3.5 Log(P(O|λ)), LogLikelihood

6.4 Reestimation
6.4.1 A_reest, reestimated state transition probability matrix
6.4.2 μ_reest, reestimated mean
6.4.3 Σ_reest, variance matrix
6.4.4 Check threshold value

6.5 The result - the model
6.5.1 The Model

7. HMM - The testing of a word against a model - The determination problem

7.1 SPEECH SIGNAL
7.1.1 Speech signal

7.2 PREPROCESSING
7.2.1 MFCC

7.3 INITIALIZATION
7.3.1 Log(A), state transition probability matrix of the model
7.3.2 μ, mean matrix from model
7.3.3 Σ, variance matrix from model
7.3.4 Log(π), initial state probability vector

7.4 PROBABILITY EVALUATION
7.4.1 Log(B)
7.4.2 δ, delta
7.4.3 ψ, psi
7.4.4 Log(P*)
7.4.5 [...]



8.6 FEATURE VECTORS - Mel Frequency Cepstrum Coefficients

8.7 TESTING

8.8 INITIALIZATION OF THE MODEL TO BE USED
8.8.1 Log(A), state transition probability matrix of the model
8.8.2 μ, mean matrix from model
8.8.3 Σ, variance matrix from model
8.8.4 Log(π), initial state probability vector

8.9 PROBABILITY EVALUATION
8.9.1 Log(B)
8.9.2 δ, delta
8.9.3 ψ, psi
8.9.4 Log(P*)
8.9.5 [...]



3. Introduction

In our view, the aim of interaction between a machine and a human is to use the most natural way of expressing ourselves: speech. In this project a speech recognizer was implemented on a machine as an isolated word recognizer. The project also included an implementation on a DSK board, chosen for the portability of this device.

First, the feature extraction from the speech signal is done by a parameterization of the waveform into relevant feature vectors. This parametric form is then used by the recognition system both in training the models and in testing them.

The techni[...]



4. Speech signal to Feature Vectors, Mel Fre[...]


Figure 4.1

4.1 SPEECH SIGNAL

4.1.1 Speech signal

The original analogue signal to be used by the system in both training and testing is converted from analogue to discrete, x(n), both by using the program CoolEdit, http://www.cooledit.com, and by using the DSK ADSP-BF533 EZ-KIT LITE REV 1.5, http://www.blackfin.org. The sample rate Fs used was 16 kHz. An example of a sampled signal in waveform is given in Figure 4.2. The signals used in the following chapters are denoted with an x and an extension, e.g. x_fft(n) if an FFT is applied to it. The original utterance signal is denoted x_utt(n), shown below.

Figure 4.2 - Sampled signal, utterance of 'fram' in waveform


4.2 PREPROCESSING

4.2.1 Preemphasis

There is a need to spectrally flatten the signal. The preemphasizer, often represented by a first order high pass FIR filter, is used to emphasize the higher fre[...]
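A first order high pass FIR preemphasizer of the kind described above can be sketched in a few lines. This is an illustrative Python sketch, not the MATLAB or C code of the project; the function name, the zero initial condition and the coefficient value a = 0.95 are assumptions made for the example.

```python
def preemphasize(x, a=0.95):
    """First-order high-pass FIR: y(n) = x(n) - a * x(n-1), with x(-1) taken as 0.

    The coefficient a = 0.95 is an assumed, typical value.
    """
    y = []
    prev = 0.0
    for sample in x:
        y.append(sample - a * prev)
        prev = sample
    return y


# A constant (low-frequency) input is attenuated after the first sample,
# while an alternating (high-frequency) input is amplified.
dc = preemphasize([1.0, 1.0, 1.0, 1.0])
alternating = preemphasize([1.0, -1.0, 1.0, -1.0])
```

The two toy inputs show the intended effect: low frequencies are suppressed and high frequencies are emphasized.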


Figure 4.3b - Original signal (y(n)) and preemphasized (x(n))

Figure 4.3b shows how the lower fre[...]



4.2.2 VAD, Voice Activation Detection

When a sampled discrete signal is available, it is important to reduce the data so that it contains only the samples that carry signal values, not noise. A good Voice Activation Detection function is therefore needed. There are many ways of doing this. The function used is described in Eq. 4.2.

When beginning the calculation and estimation of the signal it is useful to make some assumptions. First, the signal is divided into blocks. The length of each block needs to be 20 ms according to the stationary properties of the signal [MM]. With Fs at 16 kHz, this gives a block length of 320 samples. The first 10 blocks are considered to be background noise; their mean and variance are calculated and used as a reference for the rest of the blocks, to detect where a threshold is reached.

\[ t_w = \mu_w + \alpha\,\delta_w, \quad \mu_w = \mathrm{mean}, \quad \delta_w = \mathrm{var}, \quad \alpha = 0.2\,\delta_w^{-0.8} \qquad \text{(Eq. 4.2)} \]

The threshold in our case was tested and tuned to 1.2 * t_w. The result of the preemphasized signal cut down by the VAD is presented in Figure 4.4a and Figure 4.4b.
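The thresholding idea can be illustrated with a small sketch. It is only an approximation of the procedure described above: the per-block measure w used here (mean power of the block) is a hypothetical stand-in, since the measure actually used is not given in the surviving text, and μ_w and δ_w are taken to be the mean and the variance of w over the first 10 blocks. All function names are invented for the example.

```python
def block_measure(block):
    """Hypothetical per-block activity measure: the mean power of the block."""
    return sum(s * s for s in block) / len(block)


def vad_threshold(w, noise_blocks=10, tune=1.2):
    """Threshold t_w = mu_w + alpha * delta_w with alpha = 0.2 * delta_w ** -0.8,
    estimated from the first `noise_blocks` values of w and scaled by `tune`."""
    noise = w[:noise_blocks]
    mu = sum(noise) / len(noise)                            # mean of the noise blocks
    delta = sum((v - mu) ** 2 for v in noise) / len(noise)  # variance of the noise blocks
    alpha = 0.2 * delta ** -0.8                             # undefined if delta == 0
    return tune * (mu + alpha * delta)


def active_blocks(signal, block_len=320, noise_blocks=10):
    """Indices of the non-overlapping blocks whose measure exceeds the threshold."""
    blocks = [signal[i:i + block_len]
              for i in range(0, len(signal) - block_len + 1, block_len)]
    w = [block_measure(b) for b in blocks]
    t = vad_threshold(w, noise_blocks)
    return [i for i, v in enumerate(w) if v > t]
```

Note that the sketch fails if the noise blocks have exactly zero variance, because the exponent in alpha is negative; real recordings always contain some noise, so this is left unguarded here.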

Figure 4.4a

Figure 4.4b

4.3 FRAMEBLOCKING & WINDOWING

4.3.1 Frameblocking

The objective of frameblocking is to divide the signal into a matrix form with an appropriate time length for each frame. Under the assumption that a signal within a frame of 20 ms is stationary, a sampling rate of 16000 Hz gives a frame of 320 samples.

In the frameblocking, an overlap of 62.5% gives a separation of 120 samples between consecutive frames.

Figure 4.5

4.3.2 Windowing using Hamming window

After the frameblocking is done a Hamming window is applied to each frame. This window reduces the signal discontinuity at the ends of each block.

The e[...]
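Frameblocking and windowing with the numbers given above (320-sample frames, 120 samples between frame starts) can be sketched as follows. The Hamming window expression w(k) = 0.54 - 0.46 cos(2πk/(K-1)) is the standard one; the function names and the list-of-lists representation of the frame matrix are assumptions of this sketch.

```python
import math


def frame_blocks(x, frame_len=320, step=120):
    """Split x into overlapping frames; consecutive frames start `step` samples apart."""
    return [x[start:start + frame_len]
            for start in range(0, len(x) - frame_len + 1, step)]


def hamming(K):
    """Hamming window w(k) = 0.54 - 0.46 * cos(2*pi*k / (K - 1)), k = 0..K-1."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (K - 1)) for k in range(K)]


def window_frames(frames):
    """Multiply every frame, sample by sample, with the Hamming window."""
    if not frames:
        return []
    w = hamming(len(frames[0]))
    return [[s * wk for s, wk in zip(frame, w)] for frame in frames]
```

With a step of 120 samples and a frame of 320 samples, 200 samples are shared between neighbouring frames, which is the 62.5% overlap stated in the text.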


Figure 4.5

Figure 4.6 shows the result of the frameblocking for block number 20.

Figure 4.6

Figure 4.7 shows the same block after the Hamming window has been applied.

Figure 4.7

The result is a reduction of the discontinuity at the ends of the block.


4.4 FEATURE EXTRACTION

The method used to extract relevant information from each frameblock is the mel-cepstrum method. The mel-cepstrum consists of two methods: mel-scaling and cepstrum calculation.

4.4.1 FFT on each block

A 512 point FFT is used on each windowed frame in the matrix. To adjust the length of the 20 ms frame (320 samples) to 512 points, zero padding is used. The result for block number 20 is given in Figure 4.8.

Figure 4.8
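The zero padding and the 512 point FFT can be illustrated with a small, self-contained sketch. The recursive radix-2 FFT below is a textbook formulation chosen for brevity, not the routine used in the project, and the function names are assumptions.

```python
import cmath


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def magnitude_spectrum(frame, nfft=512):
    """Zero-pad a frame to nfft samples and return |FFT| for all nfft bins."""
    padded = list(frame) + [0.0] * (nfft - len(frame))
    return [abs(v) for v in fft(padded)]
```

For a 320-sample frame, 192 zeros are appended before the transform; the padding only interpolates the spectrum and adds no new information.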


4.4.2 Mel spectrum coefficients with filterbank

The fact that the human perception of the fre[...]



The element x_mel(1,1) is obtained by summing the contribution from the first filtertap, denoted 1 (MatLab notation: melbank(1:256,:)); element x_mel(2,1) is then obtained by summing the contribution from the second filtertap in melbank, and so on.
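The summation over filter taps described above amounts to a matrix-vector product between the filterbank and the magnitude spectrum of one frame. The sketch below assumes, for illustration only, that melbank is stored with one row per filter and one column per FFT bin; the toy filterbank has two triangular filters over eight bins instead of the real dimensions.

```python
def apply_filterbank(melbank, spectrum):
    """Return one mel coefficient per filter: the weighted sum of the spectrum bins."""
    return [sum(w * s for w, s in zip(filt, spectrum)) for filt in melbank]


# Toy example: two triangular filters over an 8-bin spectrum.
melbank = [
    [0.0, 0.5, 1.0, 0.5, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.5, 1.0, 0.5, 0.0],
]
spectrum = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x_mel = apply_filterbank(melbank, spectrum)
```

Each entry of x_mel corresponds to one filter, so repeating the call for every frame fills the x_mel matrix column by column.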


4.4.3 Mel-Cepstrum coefficients, DCT - Discrete Cosine Transform

To derive the mel cepstrum of the warped mel fre[...]



4.4.4 Liftering, the cepstral domain e[...]



4.4.5 Energy Measure

To add an extra coefficient containing information about the signal, the log of the signal energy is added to each feature vector. This is the coefficient that was exchanged, as mentioned in the previous section. The log of signal energy is defined by Eq. 4.9.

\[ E_m = \log \sum_{k=0}^{K-1} x\_windowed^{2}(k; m) \qquad \text{(Eq. 4.9)} \]
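As a small illustration of the energy measure, the log of the summed squared samples of one windowed frame can be computed as below. The natural logarithm is an assumption of this sketch, as is the function name.

```python
import math


def log_energy(frame):
    """E_m = log(sum_k x(k; m)^2) for one windowed frame (natural logarithm assumed)."""
    return math.log(sum(s * s for s in frame))
```

A frame consisting only of zeros would raise an error, because the logarithm of zero is undefined; after voice activity detection such frames should not normally occur.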


4.5 DELTA & ACCELERATION COEFFICIENTS

The delta and acceleration coefficients are calculated to add information about how the features change over time, which is relevant to human perception. The delta coefficients describe the time difference and the acceleration coefficients the second time derivative.

4.5.1 Delta coefficients

The delta coefficients are calculated according to Eq. 4.10.

\[ \delta^{[1]} = \frac{\sum_{p=-P}^{P} p\,\bigl(c_h(n;m+p) - c_h(n;m)\bigr)}{\sum_{p=-P}^{P} p^2} \qquad \text{(Eq. 4.10)} \]


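4.5.2 Acceleration coefficients

The acceleration coefficients are calculated according to Eq. 4.11.

\[ \delta^{[2]} = \frac{2\left[\sum_{p=-P}^{P} p^2 \sum_{p=-P}^{P} c_h(n;m+p) - (2P+1)\sum_{p=-P}^{P} p^2\,c_h(n;m+p)\right]}{\left(\sum_{p=-P}^{P} p^2\right)^2 - (2P+1)\sum_{p=-P}^{P} p^4} \qquad \text{(Eq. 4.11)} \]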


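4.5.3 2-nd order polynomial approximation

Using δ[0] (Eq. 4.12), δ[1] and δ[2], the mel-cepstrum trajectories can be approximated according to Eq. 4.13. Figure 4.10 is the result of using the fitting width P = 3.

\[ \delta^{[0]} = \frac{1}{2P+1}\left[\sum_{p=-P}^{P} c_h(n;m+p) - \frac{1}{2}\,\delta^{[2]}\sum_{p=-P}^{P} p^2\right] \qquad \text{(Eq. 4.12)} \]

\[ \hat{c}_h(n;m+p) = \delta^{[0]} + \delta^{[1]}\,p + \frac{1}{2}\,\delta^{[2]}\,p^2 \qquad \text{(Eq. 4.13)} \]

Figure 4.10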
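The three coefficients of the second order fit can be computed directly from a cepstral trajectory. The sketch below follows the usual least-squares expressions for a quadratic fitted over the offsets p = -P..P, with δ[1] as the slope, δ[2] as the second derivative and δ[0] as the constant term. The function names and the list representation of one trajectory (one value per frame) are assumptions of the example.

```python
def poly_coefficients(c, m, P=3):
    """Return (d0, d1, d2) for frame m of one cepstral trajectory c (one value per frame).

    Requires P <= m < len(c) - P so that the whole fitting window exists.
    """
    ps = range(-P, P + 1)
    window = [c[m + p] for p in ps]
    s2 = sum(p * p for p in ps)
    s4 = sum(p ** 4 for p in ps)
    sum_c = sum(window)
    sum_p2c = sum(p * p * v for p, v in zip(ps, window))
    d1 = sum(p * (v - c[m]) for p, v in zip(ps, window)) / s2
    d2 = 2 * (s2 * sum_c - (2 * P + 1) * sum_p2c) / (s2 * s2 - (2 * P + 1) * s4)
    d0 = (sum_c - 0.5 * d2 * s2) / (2 * P + 1)
    return d0, d1, d2


def fitted_value(d0, d1, d2, p):
    """Second-order approximation of the trajectory at offset p from frame m."""
    return d0 + d1 * p + 0.5 * d2 * p * p
```

For a trajectory that is exactly quadratic around frame m, the fit reproduces it without error, which makes a convenient sanity check of an implementation.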


4.6 POSTPROCESSING

To achieve some enhancement in robustness, the coefficients need to be postprocessed.

4.6.1 Normalization

The enhancement done is a normalization, meaning that the feature vectors are normalized over time to get zero mean and unit variance. Normalization forces the feature vectors into the same numerical range [MM]. The mean vector, called f_μ(n), can be calculated according to E[...]
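Normalization to zero mean and unit variance over time can be sketched as follows. The representation (a list of feature vectors, one per frame) and the function name are assumptions; the variance is the population variance, i.e. divided by the number of frames.

```python
import math


def normalize_features(features):
    """Normalize each coefficient over time to zero mean and unit variance.

    `features` is a list of feature vectors, one per frame.
    """
    n = len(features)
    dims = len(features[0])
    means = [sum(f[d] for f in features) / n for d in range(dims)]
    stds = [math.sqrt(sum((f[d] - means[d]) ** 2 for f in features) / n)
            for d in range(dims)]
    return [[(f[d] - means[d]) / stds[d] for d in range(dims)] for f in features]
```

A coefficient that is constant over all frames has zero variance and would cause a division by zero here; a practical implementation would need to guard against that case.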



4.7 RESULT

4.7.1 Feature vectors - Mel Fre[...]



6. HMM - The training of a model of a word - The re-estimation problem

Given a number N of observation se[...]



6.1 MEAN AND VARIANCE

6.1.1 Signal - The utterance

The signals used for training purposes are ordinary utterances of the specific word, the word to be recognized.

6.1.2 MFCC - Mel Fre[...]



6.1.3 μ, mean

When the MFCC is obtained, all the given training utterances need to be normalized. The matrix is divided into a number of coefficients times number of states. These parts are then used for calculating the mean and variance of all the matrices; see section 6.1.4 for the variance calculation. The mean is calculated using Eq. 6.1.

\[ x_{\mu,c} = \frac{1}{N}\sum_{n=0}^{N-1} x_c(n), \quad c = \text{column} \qquad \text{(Eq. 6.1)} \]

Note that if multiple utterances are used for training, the mean of x_μ(m,n) needs to be calculated over that number of utterances.

6.1.4 Σ, variance

The variance is calculated using Eq. 6.2 and Eq. 6.3.

\[ \overline{x^2}_c = \frac{1}{N}\sum_{n=0}^{N-1} x_c^2(n), \quad c = \text{column} \qquad \text{(Eq. 6.2)} \]

\[ x_{\Sigma,c} = \overline{x^2}_c - x_{\mu,c}^2, \quad c = \text{column} \qquad \text{(Eq. 6.3)} \]
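The mean and variance calculation per state can be sketched as below. The uniform split of the frames into one contiguous segment per state is a hypothetical choice made for this example, since the surviving text only says that the matrix is divided by the number of states; the function names are likewise assumptions.

```python
def split_into_states(features, n_states):
    """Hypothetical uniform split of the frames into n_states contiguous segments."""
    n = len(features)
    bounds = [round(i * n / n_states) for i in range(n_states + 1)]
    return [features[bounds[i]:bounds[i + 1]] for i in range(n_states)]


def column_mean_and_variance(segment):
    """Per-coefficient mean and variance over the frames of one segment.

    mean_c = (1/N) * sum_n x_c(n)
    var_c  = (1/N) * sum_n x_c(n)^2 - mean_c^2
    """
    n = len(segment)
    dims = len(segment[0])
    mean = [sum(f[c] for f in segment) / n for c in range(dims)]
    mean_sq = [sum(f[c] ** 2 for f in segment) / n for c in range(dims)]
    var = [mean_sq[c] - mean[c] ** 2 for c in range(dims)]
    return mean, var
```

Calling column_mean_and_variance on each segment returned by split_into_states gives one mean vector and one variance vector per state, which corresponds to the columns of x_μ and x_Σ.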

A more explicit example of calculating a certain index, e.g. x_Σ(1,1), is done according to the following e[...]



6.2 INITIALIZATION

6.2.1 A, the state transition probability matrix, using the left-to-right model

The state transition probability matrix A is initialized with the e[...]



use. This is less complicated in calculations, but it uses a vector [...]



One x_mfcc feature vector is evaluated in the estimation against each μ and Σ vector, i.e. each feature vector is calculated for all x_μ and x_Σ columns, one by one.


The result is the state-dependent observation symbol probabilities matrix. The columns give the observation probabilities for each state.
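How such a matrix can be filled is sketched below for the case of one Gaussian density per state with a diagonal covariance, which is an assumption of this example (it matches the per-coefficient mean and variance vectors, but the exact density used in the project is not visible in the surviving text). The function names are invented for the sketch.

```python
import math


def gaussian_density(o, mu, var):
    """Diagonal-covariance Gaussian density of the observation vector o."""
    log_p = 0.0
    for x, m, v in zip(o, mu, var):
        log_p += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return math.exp(log_p)


def output_matrix(observations, means, variances):
    """B[t][j] = density of observation t under state j (one column per state)."""
    return [[gaussian_density(o, means[j], variances[j])
             for j in range(len(means))]
            for o in observations]
```

Accumulating the exponent in the log domain and exponentiating once keeps the product over many coefficients numerically better behaved than multiplying the individual densities.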

6.3.2 α, The Forward Algorithm

When finding the probability of an observation se[...]



The definition of α_t(i) is that α_t(i) is the probability of being in state i at time t, given the model, having generated the partial observation se[...]



Return to step 2 if t ≤ T;

Otherwise, terminate the algorithm (goto step 4).

4. Termination

\[ P(O|\lambda) = \sum_{i=1}^{N} \alpha_T(i) \]



1. Initialization

Set t = T - 1;

\[ \beta_T(i) = 1, \quad 1 \le i \le N \]

2. Induction

\[ \beta_t(i) = \sum_{j=1}^{N} a_{ij}\,b_j(o_{t+1})\,\beta_{t+1}(j), \quad 1 \le i \le N \]

3. Update time

Set t = t - 1;

Return to step 2 if t > 0;

Otherwise, terminate the algorithm.

6.3.4 c, the scaling factor, α scaled, β scaled

The limited precision range available when multiplying many probabilities makes a scaling of both α and β necessary. The problem is that the probabilities head exponentially towards zero as t grows large. The scaling factor used for both the forward and the backward variable depends only on the time t and is independent of the state i. The factor is denoted c_t and is applied for every t and every state i, 1 ≤ i ≤ N. Using the same scale factor proves useful when solving the parameter estimation problem (problem 3 [Rab89]), where the scaling coefficients for α and β cancel each other out exactly.

The following procedure shows the calculation of the scale factor which, as mentioned, is also used to scale β. In the procedure, α_t(i) denotes the unscaled forward variable, \hat{α}_t(i) the scaled forward variable and \hat{\hat{α}}_t(i) the temporary forward variable before scaling.

1. Initialization

Set t = 2;

\[ \alpha_1(i) = \pi_i\,b_i(o_1), \quad 1 \le i \le N \]

\[ \hat{\hat{\alpha}}_1(i) = \alpha_1(i), \quad 1 \le i \le N \]

\[ c_1 = \frac{1}{\sum_{i=1}^{N} \hat{\hat{\alpha}}_1(i)} \]

\[ \hat{\alpha}_1(i) = c_1\,\hat{\hat{\alpha}}_1(i) \]

2. Induction


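\[ \hat{\hat{\alpha}}_t(i) = b_i(o_t)\sum_{j=1}^{N} \hat{\alpha}_{t-1}(j)\,a_{ji}, \quad 1 \le i \le N \]

\[ c_t = \frac{1}{\sum_{i=1}^{N} \hat{\hat{\alpha}}_t(i)} \]

\[ \hat{\alpha}_t(i) = c_t\,\hat{\hat{\alpha}}_t(i), \quad 1 \le i \le N \]

3. Update time

Set t = t + 1;

Return to step 2 if t ≤ T;

Otherwise, terminate the algorithm (goto step 4).

4. Termination

\[ \log P(O|\lambda) = -\sum_{t=1}^{T} \log c_t \]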
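The scaled forward procedure can be summarized in a short sketch. It follows the standard formulation of the scaled forward algorithm; the argument layout (B stored with one row per time step and one column per state, states numbered from 0) and the function name are assumptions made for the example.

```python
import math


def scaled_forward(pi, A, B):
    """Scaled forward pass.

    pi[i]   initial state probabilities
    A[i][j] transition probability from state i to state j
    B[t][j] observation density of frame t in state j
    Returns (alpha_hat, c, log_likelihood).
    """
    N, T = len(pi), len(B)
    alpha_hat, c = [], []
    for t in range(T):
        if t == 0:
            tmp = [pi[i] * B[0][i] for i in range(N)]
        else:
            prev = alpha_hat[-1]
            tmp = [B[t][i] * sum(prev[j] * A[j][i] for j in range(N))
                   for i in range(N)]
        ct = 1.0 / sum(tmp)                      # scale factor c_t
        c.append(ct)
        alpha_hat.append([ct * v for v in tmp])  # each row now sums to 1
    log_likelihood = -sum(math.log(ct) for ct in c)
    return alpha_hat, c, log_likelihood
```

Because every scaled row sums to one, the values stay in a safe numerical range for any utterance length, and the log-likelihood is recovered from the scale factors alone.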


Figure 6.3

The resulting beta_scaled:

Figure 6.4

6.3.5 Log(P(O|λ)), save the probability of the observation se[...]



6.4 RE-ESTIMATION OF THE PARAMETERS FOR THE MODEL, λ

The recommended algorithm for this purpose is the iterative Baum-Welch algorithm, which maximizes the likelihood function of a given model λ [MM][Rab][David]. For every iteration the algorithm reestimates the HMM parameters to a value closer to the "global" maximum (many local maxima exist). It is important that the first local maximum found is the global one; otherwise an erroneous maximum is found.

The Baum-Welch algorithm is based on a combination of the forward algorithm and the backward algorithm.

The [...]



6.4.1 A_reest, reestimate the state transition probability matrix

When solving problem three, Optimize model parameters [Rab89], the parameters of the model are adjusted. The Baum-Welch algorithm is used, as mentioned in the previous section of this chapter. The adjustment of the model parameters should be done in a way that maximizes the probability of the model having generated the observation se[...]



The reestimation of the A matrix is [...]



6.5 THE RESULT - THE HIDDEN MARKOV MODEL

6.5.1 Save the Hidden Markov Model for that specific utterance

After the reestimation is done, the model is saved to represent that specific observation se[...]



7. HMM - THE TESTING OF AN OBSERVATION - The decoding problem

When comparing an observation se[...]



7.1 SPEECH SIGNAL

7.1.1 Speech signal

The signals used for testing purposes are ordinary utterances of the specific word, the word to be recognized.

7.2 PREPROCESSING

7.2.1 MFCC - Mel Fre[...]



7.4 PROBABILITY EVALUATION - Loglikelihood, using The Alternative Viterbi Algorithm

7.4.1 Log(B)

The continuous observation probability density function matrix is calculated as in the previous chapter, The training of a model. The difference is that the logarithm of the matrix is used, due to the constraints of The Alternative Viterbi Algorithm.

7.4.2 δ, delta

To be able to search for the maximization of a single state path the need for the following [...]



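7.4.7 Alternative Viterbi Algorithm

The following steps are included in the Alternative Viterbi Algorithm [1] [2] [Rab89].

1. Preprocessing

\[ \tilde{\pi}_i = \log(\pi_i), \quad 1 \le i \le N \qquad \text{(4.74 [MM])} \]

\[ \tilde{a}_{ij} = \log(a_{ij}), \quad 1 \le i, j \le N \qquad \text{(4.75 [MM])} \]

2. Initialization

Set t = 2;

\[ \tilde{b}_i(o_1) = \log(b_i(o_1)), \quad 1 \le i \le N \qquad \text{(4.76 [MM])} \]

\[ \tilde{\delta}_1(i) = \tilde{\pi}_i + \tilde{b}_i(o_1), \quad 1 \le i \le N \qquad \text{(4.77 [MM])} \]

\[ \psi_1(i) = 0, \quad 1 \le i \le N \qquad \text{(4.78 [MM])} \]

3. Induction

\[ \tilde{b}_j(o_t) = \log(b_j(o_t)), \quad 1 \le j \le N \qquad \text{(4.79 [MM])} \]

\[ \tilde{\delta}_t(j) = \tilde{b}_j(o_t) + \max_{1 \le i \le N}\bigl[\tilde{\delta}_{t-1}(i) + \tilde{a}_{ij}\bigr], \quad 1 \le j \le N \qquad \text{(4.80 [MM])} \]

\[ \psi_t(j) = \arg\max_{1 \le i \le N}\bigl[\tilde{\delta}_{t-1}(i) + \tilde{a}_{ij}\bigr], \quad 1 \le j \le N \qquad \text{(4.81 [MM])} \]

4. Update time

Set t = t + 1;

Return to step 3 if t ≤ T;

Otherwise, terminate the algorithm (goto step 5).

5. Termination

\[ \tilde{P}^{*} = \max_{1 \le i \le N}\bigl[\tilde{\delta}_T(i)\bigr] \qquad \text{(4.82 [MM])} \]

\[ q_T^{*} = \arg\max_{1 \le i \le N}\bigl[\tilde{\delta}_T(i)\bigr] \qquad \text{(4.83 [MM])} \]

6. Path (state se[...]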



Consider a model with N = 3 states and an observation of length T = 8. In the initialization (t = 1), δ_1(1), δ_1(2) and δ_1(3) are found. Let us assume that δ_1(2) is the maximum. At the next time (t = 2) three variables are used, namely δ_2(1), δ_2(2) and δ_2(3). Let us assume that δ_2(1) is now the maximum. In the same manner the variables δ_3(3), δ_4(2), δ_5(2), δ_6(1), δ_7(3) and δ_8(3) are the maximum at their respective times, see Fig. 7.1.

Fig. 7.1 shows that the Viterbi Algorithm works with the lattice structure.

Figure 7.1 Example of Viterbi search

To find the state path, backtracking is used. It begins in the most likely end state and moves towards the start of the observations by selecting the state in ψ_t(i) that at time t-1 refers to the current state.
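The log-domain search and the backtracking can be sketched together. This is a compact illustration in the spirit of the Alternative Viterbi Algorithm, not the project's implementation: states are numbered from 0, B is assumed to hold one row per time step and one column per state, and log(0) is represented as minus infinity. All names are invented for the example.

```python
import math


def log_or_neg_inf(p):
    """log(p), with log(0) represented as minus infinity."""
    return math.log(p) if p > 0 else float("-inf")


def alternative_viterbi(pi, A, B):
    """Log-domain Viterbi; returns (log P*, best state path), states 0-based."""
    N, T = len(pi), len(B)
    log_pi = [log_or_neg_inf(p) for p in pi]
    log_A = [[log_or_neg_inf(a) for a in row] for row in A]
    delta = [log_pi[i] + log_or_neg_inf(B[0][i]) for i in range(N)]
    psi = [[0] * N]
    for t in range(1, T):
        new_delta, back = [], []
        for j in range(N):
            best_i = max(range(N), key=lambda i: delta[i] + log_A[i][j])
            back.append(best_i)
            new_delta.append(log_or_neg_inf(B[t][j]) + delta[best_i] + log_A[best_i][j])
        delta = new_delta
        psi.append(back)
    last = max(range(N), key=lambda i: delta[i])
    path = [last]
    for t in range(T - 1, 0, -1):   # backtracking through psi
        path.append(psi[t][path[-1]])
    path.reverse()
    return delta[last], path
```

Working with sums of logarithms instead of products of probabilities avoids the underflow problem without any scaling factors, which is the main attraction of this variant.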

7.8 RESULT

7.8.1 Score

The score according to the Viterbi algorithm is the same as the calculated value Log(P*): the maximized probability of a single state path, saved as a result for each comparison. The highest score naturally corresponds to the highest probability that the compared model has produced the given test utterance.


8 The BF533 DSP

8.1 The BF533 EZ-KIT Lite

The DSP used in this project is a BF533 EZ-KIT Lite, which is a fixed-point digital signal processor developed by Analog Devices. This DSP offers good performance at very low power consumption. It has audio in/out ports as well as video in/out.

Main features

Clock speed of 750 MHz

Audio ports, three in and two out

Four 8-bit Video ALUs

40-bit shifter

Dual 16-bit multiplication accumulation (MAC)

Friendly and easy to use compiler support.

The software used is VisualDSP++, also developed by Analog Devices. All the programming on the BF-533 is done in the programming language C.

Figure 8.1


8.2 SPEECH SIGNAL

The natural human speech is fed to the DSP through a microphone connected to one of the audio connectors on the BF-533.

8.2.1 The talkthrough modification

To get started more easily with the programming of the DSP, the test program Talkthrough, provided together with VisualDSP++, is used. After implementing it and verifying that the audio in/out of the DSP was working, we started the speech recognition programming, based on the Talkthrough example.

8.2.2 Interrupts

The next issue to solve was to generate an interrupt by pushing a button on the BF-533. For this, the example "Blink" provided with VisualDSP++ is modified.

Interrupts are programmed so that every time a certain button is pushed, the Direct Memory Access Serial Port, DMA SPORT, is enabled and the DSP listens for an incoming speech signal from the microphone. The next time the button is pushed, the DMA SPORT is disabled and the speech recognition program begins.

8.2.3 DMA, Direct Memory Access

The DMA is a device for transferring data to or from other memory locations or peripherals (in our case the microphone) without the attention of the CPU. DMA1 and DMA2 are mapped to SPORT_RX and SPORT_TX respectively, and are configured to use 16-bit transfers.

8.2.4 Filtering

The sampling fre[...]



8.3 PREPROCESSING

8.3.1 Preemphasis

The filtering is done in the same way as in MATLAB. The same filter is used:

    const float h[2] = {1, -0.95};

See Figure 8.2 for a plot of the input signal before preemphasizing and Figure 8.3 after.

Figure 8.2 The signal x_utt(n) before preemphasis

Figure 8.3 The blue signal x_utt(n), and the preemphasized red signal, x_preemp(n)


8.3.2 Voice Activation Detection

When a sampled discrete signal is available, it is important to reduce the data so that it contains only the samples that carry signal values, not noise. A good Voice Activation Detection function is therefore needed. There are many ways of doing this. Voice activation detection is calculated in the same way as in MATLAB (chapter 4). The variable alpha, however, differs from the MATLAB version. This variable is tuned because the signal-to-noise ratio and the processing values differ from those in the MATLAB environment.

See E[...]



8.4 FRAMEBLOCKING & WINDOWING

8.4.1 Frameblocking

The frameblocking is done in the same way as in MATLAB. The frame length is 320 and an overlap of 60.9% is used. Figure 8.5 illustrates frame 20 after frameblocking.

Figure 8.5


8.4.2 Windowing using Hamming window

After the frameblocking is done a Hamming window is applied to each frame. This window reduces the signal discontinuity at the ends of each block. The formula used to apply the windowing is shown in Eq. 3.3.

\[ w(k) = 0.54 - 0.46\cos\left(\frac{2\pi k}{K-1}\right) \qquad \text{(Eq. 3.3)} \]

Figure 8.6 illustrates frame 20 after windowing. Note that the result is a reduction of the discontinuity at the ends of the block.

Figure 8.6


8.5 FEATURE EXTRACTION

A 512 point FFT is used on each windowed frame. To adjust the length, zero padding is used. See Figure 8.7.

Figure 8.7

The function fft256.c [book] is modified in order to calculate the FFT of the signal after windowing. Exactly the same filterbank as in MATLAB is used; see section 4.4.2, "Mel spectrum coefficients with filterbank". Mel-Cepstrum coefficients are calculated using Eq. 3.6. A low-time lifter is then used in order to remove the first two coefficients, using the liftering formula of section 4.4.4. An extra coefficient containing the log of signal energy is added for each frame, as shown in Eq. 3.9.

8.6 FEATURE VECTORS - Mel Fre[...]



8.7 TESTING

The models created as described in chapter 6 are now stored in the DSP memory. After the input speech from the microphone has been processed to the Mel Fre[...]


Figure 8.8

To maximize P([...]

This step calculates the state that gave the largest Log(P*) at time T. It is used in the backtracking later on.

8.9.6 Path - State se[...]



Otherwise, terminate the algorithm.

8.10 DELTA & ACCELERATION COEFFICIENTS

The delta and acceleration coefficients are calculated to add information about how the features change over time, which is relevant to human perception. The delta coefficients describe the time difference and the acceleration coefficients the second time derivative.

Delta and acceleration coefficients are, however, not used in the speech recognition implementation on the DSK, because of their small effect on the final result.

Another reason for not using the delta & acceleration coefficients on the DSK is to save processor power and to shorten the calculation time.

8.11 THE RESULT

8.11.1 The Score

The score from each stored model tested against the input word is stored in a result vector. The highest score naturally corresponds to the highest probability that the compared model has produced the given test utterance.

Depending on which score was the highest, different LEDs are turned on on the BF533.

word       LEDs on DSP
"one"      000001
"two"      000010
"three"    000011
"four"     000100
"five"     000101
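The decision step and the LED coding in the table above can be sketched as follows. The patterns in the table are the binary encodings of the word numbers 1 to 5, which is what the sketch reproduces; the function names and the example scores are illustrative only, and the actual LED output on the board is done in C in the project.

```python
WORDS = ["one", "two", "three", "four", "five"]


def recognize(scores):
    """Return the word whose model gave the highest score (log-likelihood)."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return WORDS[best]


def led_pattern(word):
    """6-bit LED pattern: the binary encoding of the word's number (1 to 5)."""
    return format(WORDS.index(word) + 1, "06b")
```

For example, a result vector in which the first model has the least negative log-likelihood selects "one" and the pattern 000001.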


9. Evaluation

9.1 MATLAB

9.1.1 MatLab Result

The MatLab result is given in Table 1, Table 2, Table 3, Table 4 and Table 5. The different tables show the score for each of the models being the one that generated the utterances tested. The number of utterances tested is 50 times 5 (5 different words). The recognition rate is 100%.


'he follo7ing columns are the result score from testing utterances of Mne against the

different models+

1.0e+003 *

 λ_one     λ_two     λ_three   λ_four    λ_five
-0.9770   -2.9766   -3.2583   -2.2377   -2.5070
-1.0086   -3.1571   -3.2334   -2.5384   -2.6038
-0.8504   -2.9012   -2.9939   -2.1566   -2.0795
-0.9443   -2.6649   -2.9332   -2.1653   -2.2754
-0.9467   -3.0241   -3.1812   -2.5634   -2.4507
-0.8940   -2.9874   -3.1127   -2.3956   -2.4391
-0.9386   -2.5942   -2.9423   -2.3368   -2.5228
-0.8825   -2.6087   -2.8463   -2.4483   -2.3978
-0.8586   -2.7028   -2.9389   -2.2648   -2.3426
-0.7950   -2.5843   -2.6597   -2.2132   -1.9123
-0.8523   -2.6194   -2.7973   -2.4924   -2.0237
-0.9398   -2.7593   -2.7188   -2.3597   -2.2943
-0.9251   -2.7752   -3.1590   -2.7397   -2.5636
-0.9718   -2.9325   -3.2498   -2.6431   -2.6311
-0.8737   -2.7661   -3.1350   -2.5092   -2.4376
-0.9355   -2.9730   -3.1797   -2.8348   -2.3790
-0.9305   -2.7916   -2.8847   -2.8549   -2.2437
-0.9332   -2.7710   -3.0431   -2.7362   -2.4466
-0.9368   -3.0491   -3.3389   -2.6097   -2.5303
-0.8596   -2.8274   -2.8630   -2.1620   -2.3963
-0.9110   -2.9492   -2.9199   -2.1652   -1.9718
-0.8051   -2.8448   -2.9742   -2.0679   -2.2819
-0.8597   -2.8761   -2.9300   -2.0903   -2.3264
-0.8809   -3.1850   -3.1918   -2.3860   -2.3666
-0.7885   -2.4771   -2.4799   -2.0925   -1.9500
-0.8613   -2.6787   -2.7539   -2.3984   -2.1103
-0.8644   -2.7467   -2.9409   -2.2356   -2.5205
-0.8962   -2.6071   -2.8487   -2.5541   -2.0202
-0.8010   -2.5107   -2.6974   -2.4216   -2.0117
-0.7426   -2.2532   -2.2583   -1.8565   -1.5876
-1.0010   -2.8094   -3.2577   -2.3076   -2.8508
-0.9827   -2.9553   -3.1397   -2.1257   -2.4699
-0.9799   -3.0520   -3.0455   -2.4637   -2.4989
-0.9237   -3.0867   -3.1147   -2.1364   -2.0204
-0.9549   -2.8912   -2.9531   -2.0862   -2.4049
-1.0763   -3.4107   -3.4638   -2.2895   -2.4884
-1.0339   -3.4492   -3.4761   -2.2683   -2.6357
-0.9345   -3.0110   -2.8431   -2.0937   -2.0289
-1.0805   -3.6430   -3.5430   -2.9408   -2.6812
-1.2520   -4.0098   -3.9728   -3.0751   -2.8562
-1.1151   -3.2686   -3.2906   -2.7100   -2.5772
-0.9424   -3.2784   -3.1182   -2.3509   -2.2908
-1.0198   -3.3099   -3.1879   -2.2265   -2.4441
-0.9168   -2.9677   -2.9726   -2.2570   -2.5099
-1.0924   -3.5614   -3.4578   -2.8927   -2.6202
-1.0045   -3.2830   -3.0953   -2.5430   -2.4347
-1.0560   -3.2444   -3.2417   -2.6254   -2.7030
-1.0802   -3.3851   -3.4595   -2.4772   -2.3394
-0.9557   -3.2024   -3.2659   -2.3698   -2.3896
-1.1137   -3.4744   -3.5221   -2.5851   -2.7202

Table 9.1

The first column always shows the highest score. This means that the recognition rate for the utterance "one" is 100% of the 50 utterances tested.


The following columns show the resulting scores from testing utterances of "two" against the different models.

1.0e+003 *

 λ_one     λ_two     λ_three   λ_four    λ_five
-1.8271   -0.6796   -1.3686   -2.1710   -1.4022
-1.5456   -0.6997   -1.4461   -2.2211   -1.2491
-1.8576   -0.6777   -1.2135   -2.1087   -1.3004
-1.5572   -0.6432   -1.1511   -2.1250   -1.1618
-1.5681   -0.6293   -1.2130   -1.9162   -1.2072
-1.7089   -0.6621   -1.2468   -2.2219   -1.2808
-1.6364   -0.6876   -1.5123   -2.5024   -1.4315
-1.6478   -0.6041   -1.3093   -2.2707   -1.4233
-1.9740   -0.6766   -1.2982   -2.3318   -1.5201
-1.8500   -0.6755   -1.2313   -2.1558   -1.3031
-1.8211   -0.7112   -1.4090   -2.2684   -1.4574
-1.6911   -0.6700   -1.3886   -2.4680   -1.4400
-1.7892   -0.6959   -1.4408   -2.6182   -1.4791
-1.7646   -0.7113   -1.3424   -2.2388   -1.4257
-1.2233   -0.5567   -1.1985   -1.8783   -1.0970
-1.3697   -0.6726   -1.4301   -2.1417   -1.2601
-1.4729   -0.6596   -1.3037   -2.1249   -1.2695
-1.7165   -0.6965   -1.5028   -2.2498   -1.4016
-1.5379   -0.7125   -1.5122   -2.3982   -1.3826
-1.4405   -0.7260   -1.3757   -2.3133   -1.3032
-1.4423   -0.6377   -1.3120   -2.0337   -1.2413
-1.5858   -0.5933   -1.2084   -2.1681   -1.2777
-1.6119   -0.7008   -1.3634   -2.3734   -1.2953
-1.5128   -0.7162   -1.4583   -2.5157   -1.2719
-1.3705   -0.6440   -1.3106   -2.0868   -1.1870
-1.5257   -0.6493   -1.3938   -2.2695   -1.2400
-1.6278   -0.6615   -1.4295   -2.3829   -1.3833
-1.8694   -0.6270   -1.2640   -2.3162   -1.3246
-1.7272   -0.6679   -1.2359   -2.1135   -1.2319
-1.3631   -0.5805   -1.1283   -1.9322   -1.0457
-1.4262   -0.6276   -1.3211   -2.0577   -1.0913
-1.5983   -0.6013   -1.1532   -2.1034   -1.3433
-1.6410   -0.5810   -1.1478   -1.8979   -1.1988
-1.6082   -0.5787   -1.1230   -1.9315   -1.2243
-1.7776   -0.6809   -1.3323   -2.2312   -1.2988
-1.7040   -0.7330   -1.3433   -2.0454   -1.4311
-2.2071   -0.7663   -1.2819   -2.4617   -1.5456
-2.0144   -0.6845   -1.3437   -2.3749   -1.4631
-2.1379   -0.7047   -1.3746   -2.6290   -1.6026
-1.8879   -0.6748   -1.2290   -2.2582   -1.4050
-1.8802   -0.6806   -1.2039   -1.9917   -1.4384
-2.0292   -0.7238   -1.4331   -2.2058   -1.6078
-1.6255   -0.6148   -1.2261   -1.9218   -1.2323
-1.9322   -0.6486   -1.1870   -2.1152   -1.2659
-2.1167   -0.6611   -1.2098   -2.0719   -1.3487
-1.7757   -0.6584   -1.3346   -1.9878   -1.2651
-1.8599   -0.6019   -1.1106   -2.0953   -1.1806
-1.8270   -0.6436   -1.1599   -1.9857   -1.3664
-1.9272   -0.6510   -1.2044   -2.2459   -1.4605
-1.9440   -0.6776   -1.2265   -2.2386   -1.4228

Table 9.2

The second column always shows the highest score. This means that the recognition rate for the utterance "two" is 100% of the 50 utterances tested.


The following columns show the resulting scores from testing utterances of "three" against the different models.

1.0e+003 *

 λ_one     λ_two     λ_three   λ_four    λ_five
-2.8178   -1.5243   -0.7647   -2.5438   -1.5706
-2.5471   -1.3969   -0.7250   -2.3959   -1.4380
-2.5431   -1.6701   -0.8059   -2.7828   -1.7424
-1.7496   -1.0409   -0.5471   -2.0329   -1.1785
-2.5429   -1.4456   -0.6547   -2.5089   -1.5081
-2.1500   -1.2552   -0.6398   -2.4129   -1.3838
-2.3398   -1.3406   -0.6324   -2.3714   -1.4428
-2.8475   -1.5890   -0.6764   -2.4922   -1.5226
-2.7630   -1.4673   -0.7139   -2.5405   -1.5308
-3.2249   -1.6659   -0.9033   -2.6988   -1.7644
-2.6718   -1.4984   -0.7217   -2.5533   -1.5691
-2.5804   -1.6122   -0.6568   -2.5701   -1.6272
-2.5651   -1.4354   -0.6163   -2.0643   -1.3524
-2.8130   -1.5426   -0.7333   -2.4413   -1.5001
-2.8571   -1.6705   -0.7764   -2.7875   -1.6261
-2.3100   -1.3185   -0.6434   -2.3143   -1.3187
-3.3875   -1.7667   -0.7808   -2.7141   -1.7132
-3.0448   -1.5897   -0.7071   -2.4737   -1.5387
-2.5144   -1.3510   -0.6505   -2.1212   -1.3792
-2.1561   -1.2215   -0.5817   -2.1467   -1.3409
-2.8858   -1.6538   -0.7806   -2.2758   -1.6248
-3.0073   -1.5497   -0.6639   -2.4306   -1.4322
-3.0726   -1.5419   -0.7516   -2.5357   -1.5470
-2.8669   -1.6614   -0.7701   -2.8823   -1.7392
-3.4247   -1.7979   -0.7150   -2.3990   -1.5776
-2.3734   -1.3703   -0.6395   -2.5316   -1.4631
-2.4231   -1.2908   -0.5989   -2.4402   -1.4127
-2.8196   -1.4800   -0.7748   -2.8288   -1.6756
-2.8650   -1.6949   -0.7208   -2.7558   -1.6875
-2.8087   -1.7231   -0.7591   -2.4927   -1.6629
-2.6726   -1.5529   -0.6873   -2.6632   -1.5836
-2.4262   -1.4763   -0.6082   -2.5749   -1.5175
-3.3488   -1.8869   -0.7511   -2.9427   -1.8408
-2.4204   -1.5335   -0.6743   -2.7674   -1.6578
-2.7438   -1.5754   -0.7364   -2.9417   -1.8357
-2.5678   -1.4683   -0.7097   -2.8104   -1.6848
-2.3208   -1.4071   -0.6852   -2.8056   -1.6661
-2.3763   -1.3318   -0.7111   -2.8282   -1.6328
-2.1578   -1.3594   -0.6290   -2.5144   -1.5743
-2.4479   -1.4778   -0.6918   -2.4328   -1.5393
-2.6115   -1.7402   -0.7247   -2.4765   -1.7171
-2.2605   -1.4494   -0.6641   -2.6239   -1.6191
-2.3225   -1.4010   -0.6427   -2.6237   -1.5836
-2.7742   -1.5776   -0.6569   -2.6290   -1.6636
-2.5696   -1.5441   -0.6448   -2.5704   -1.5582
-2.7002   -1.6280   -0.9598   -2.7101   -1.7464
-2.1392   -1.4196   -0.6399   -2.5257   -1.4521
-2.3673   -1.4991   -0.6276   -2.5039   -1.4864
-2.1857   -1.3629   -0.6522   -2.3978   -1.3939
-2.7003   -1.5578   -0.7474   -2.3574   -1.4567

Table 9.3

The third column always shows the highest score. This means that the recognition rate for the utterance "three" is 100% of the 50 utterances tested.


The following columns show the resulting scores from testing utterances of "four" against the different models.

1.0e+003 *

 λ_one     λ_two     λ_three   λ_four    λ_five
-2.3833   -3.1448   -2.4785   -0.8907   -1.8601
-2.8456   -3.3649   -2.7943   -1.0241   -2.0355
-2.2093   -2.7955   -2.3422   -0.8776   -1.9379
-2.4640   -3.3559   -2.7926   -0.9668   -2.0588
-0.6870   -2.4023   -2.3105   -0.6032   -1.6915
-2.2778   -3.0211   -2.5014   -0.9087   -1.9156
-1.2250   -2.4471   -2.5793   -0.6720   -1.8323
-1.0560   -2.7803   -2.6692   -0.6927   -2.0349
-1.0901   -2.3251   -2.1582   -0.6135   -1.6183
-0.8463   -2.0638   -1.8828   -0.5932   -1.4845
-1.0311   -2.4185   -2.1460   -0.6321   -1.5852
-2.9136   -3.1761   -2.7042   -1.0567   -2.0249
-1.8514   -2.6102   -2.3643   -0.8358   -1.7996
-1.4371   -2.5867   -2.3855   -0.6807   -1.8087
-1.1227   -2.5266   -2.6165   -0.6856   -1.8658
-2.3652   -3.1053   -2.8580   -0.9724   -2.1510
-1.2250   -2.4471   -2.5793   -0.6720   -1.8323
-1.0036   -2.7017   -2.7877   -0.7119   -2.0104
-2.0143   -2.9307   -2.9079   -0.8515   -2.0443
-0.9330   -1.7984   -1.8810   -0.6613   -1.2906
-1.7455   -2.4404   -2.2213   -0.7844   -1.7103
-1.1755   -2.7269   -2.7397   -0.6841   -1.9165
-0.9592   -2.5391   -2.6232   -0.6493   -1.8009
-1.8258   -2.9313   -2.8709   -0.8860   -2.0857
-1.4276   -2.6464   -2.4496   -0.7397   -1.8149
-1.9535   -2.8304   -2.4990   -0.8554   -1.9978
-0.9355   -2.5215   -2.5047   -0.6679   -1.8110
-2.4351   -2.2894   -1.5949   -0.8731   -1.4633
-0.9969   -2.3889   -2.2915   -0.6723   -1.7521
-1.1031   -2.2958   -2.2646   -0.6982   -1.6737
-0.8995   -2.3289   -2.1960   -0.5928   -1.5710
-1.4758   -2.6066   -2.4613   -0.7479   -1.8060
-1.2333   -2.8119   -2.7721   -0.7892   -1.6960
-0.9084   -2.3313   -2.3373   -0.6111   -1.6497
-0.9730   -2.2159   -2.3079   -0.6117   -1.5132
-1.0382   -2.6066   -2.8560   -0.6829   -1.9087
-1.1706   -2.5037   -2.6458   -0.7302   -1.7326
-1.1197   -2.5512   -2.6011   -0.7034   -1.7229
-1.1685   -2.8590   -2.6970   -0.7471   -1.9670
-1.8115   -2.7222   -2.6761   -0.8459   -1.9692
-2.2670   -2.5476   -2.2720   -0.7672   -1.5356
-1.0610   -2.6325   -2.7014   -0.6881   -1.8688
-1.4570   -2.7595   -2.9342   -0.8165   -2.0168
-2.6749   -3.2145   -2.8560   -0.9854   -2.0778
-1.6566   -2.7105   -2.7858   -0.7710   -1.9260
-1.2923   -2.7367   -2.7840   -0.7340   -1.8695
-1.2053   -2.6242   -2.5971   -0.6752   -1.7708
-1.4385   -2.7700   -3.0330   -0.7923   -1.9664
-1.9711   -2.7082   -2.5714   -0.8514   -1.7514
-1.1727   -2.5524   -2.3131   -0.7083   -1.7348

Table 9.4

The fourth column always shows the highest score. This means that the recognition rate for the utterance "four" is 100% of the 50 utterances tested.


The following columns show the resulting scores from testing utterances of "five" against the different models.

1.0e+003 *

 λ_one     λ_two     λ_three   λ_four    λ_five
-1.5563   -2.0059   -1.6262   -1.8464   -0.7753
-2.0805   -1.9892   -1.3444   -1.4268   -0.7958
-1.7437   -2.0034   -1.7512   -2.3998   -0.8256
-1.2058   -1.4765   -1.1393   -1.0757   -0.5095
-1.4376   -1.4008   -1.1246   -0.9805   -0.5918
-1.0861   -1.3844   -1.2364   -1.1543   -0.5042
-1.3876   -1.5534   -1.2302   -1.2162   -0.5798
-1.5948   -1.7899   -1.5806   -1.5824   -0.6561
-1.9995   -1.7620   -1.2767   -1.1075   -0.6633
-1.8340   -1.9702   -1.6964   -1.8565   -0.7096
-2.6893   -2.6080   -2.3324   -2.8853   -1.0019
-1.2084   -1.4398   -1.2270   -1.1109   -0.5146
-1.3446   -1.6276   -1.4194   -1.8678   -0.6691
-1.7338   -1.8699   -1.6948   -1.9441   -0.7503
-1.0994   -1.4984   -1.3133   -1.1701   -0.5342
-1.3459   -1.6568   -1.6136   -1.9756   -0.6516
-1.0217   -1.4037   -1.1295   -1.2458   -0.5437
-2.0400   -2.3034   -2.1199   -2.7729   -0.9026
-1.2829   -1.6139   -1.4681   -1.9779   -0.7499
-2.2419   -2.1812   -1.6616   -1.6570   -0.8441
-1.3098   -1.6599   -1.6159   -1.7473   -0.6977
-1.8816   -1.8707   -1.3966   -1.4504   -0.7578
-1.7350   -1.7978   -1.4068   -1.3013   -0.6726
-1.8003   -1.8771   -1.6293   -1.6280   -0.6808
-1.4723   -1.7950   -1.5579   -1.6655   -0.6564
-1.7009   -1.5512   -1.2150   -1.4213   -0.5793
-0.7887   -1.1770   -0.9996   -1.1845   -0.5128
-1.1475   -1.4069   -1.2375   -1.8318   -0.5994
-2.0065   -1.8951   -1.3509   -1.3558   -0.7012
-0.8198   -0.9915   -0.8927   -1.0833   -0.3891
-1.6080   -1.8499   -1.3440   -1.4305   -0.6697
-2.0723   -2.6427   -2.3033   -2.4872   -1.0146
-1.4868   -1.5922   -1.3459   -1.3374   -0.5524
-2.1839   -2.1564   -1.9578   -1.9735   -0.8173
-1.4973   -1.7554   -1.5247   -1.3538   -0.6021
-2.0809   -1.8971   -1.6022   -1.6558   -0.6933
-1.4372   -1.9097   -1.8243   -2.0420   -0.7148
-2.1604   -2.0246   -1.6348   -1.9968   -0.7729
-1.8682   -1.8443   -1.4761   -1.3605   -0.6329
-2.1915   -1.9430   -1.4834   -1.2405   -0.6901
-1.9269   -1.8265   -1.3902   -1.4090   -0.6869
-2.6578   -2.1068   -1.6410   -1.8482   -0.8889
-2.4615   -2.2042   -1.9177   -1.8961   -0.8472
-1.9733   -2.1408   -1.6218   -2.1532   -0.8370
-1.6375   -1.8290   -1.7819   -1.9874   -0.7329
-1.1847   -1.2266   -1.2036   -1.4726   -0.5382
-1.2727   -1.4859   -1.2762   -1.1472   -0.5297
-1.6999   -1.6786   -1.6165   -1.6931   -0.6882
-1.1676   -1.5251   -1.3739   -1.1834   -0.5046
-1.2914   -1.4813   -1.2219   -1.1583   -0.5702

Table 9.5

The fifth column always shows the highest score. This means that the recognition rate for the utterance "five" is 100% of the 50 utterances tested.


9.2 DSK

9.2.1 DSK Result

The tables below show the results of testing the words "one" to "five". The result is very satisfying, because the models were not trained by the same person who spoke the test words.

The following columns show the resulting scores from testing utterances of "one" against the different models.


 λ_one         λ_two         λ_three       λ_four        λ_five
-1682.51913   -2189.40750   -2324.28875   -2358.42725   -1944.56163
-1812.01063   -2619.38725   -2623.00975   -2693.63425   -1922.15588
-1748.95738   -2385.43675   -2525.00275   -2575.49250   -1904.26675
-1972.13550   -2887.20225   -2956.27800   -3048.40150   -2206.70375
-1625.50413   -2331.82750   -2501.66575   -2539.40025   -1813.15388
-1781.13675   -2372.39575   -2539.89725   -2621.62450   -1894.43450
-1804.10088   -2597.78375   -2679.64075   -2788.75100   -1888.42675
-1821.69963   -2790.88950   -2789.03325   -2897.80300   -2031.16238
-1751.12125   -2609.43275   -2593.09125   -2612.35900   -2038.48150
-1883.45825   -2842.27600   -2861.89450   -2925.70225   -2188.73400
-1647.89275   -2517.24975   -2528.80650   -2680.37550   -1962.74850

The following columns show the resulting scores from testing utterances of "two" against the different models.


 λ_one         λ_two         λ_three       λ_four        λ_five
-1710.97800   -1094.00163   -2001.74988   -2005.95738   -1624.31000
-1719.24438   -1112.56925   -1971.25463   -2237.32800   -1646.18875
-1971.42913   -1117.39975   -1992.27538   -2027.57875   -1792.23950
-2084.26463   -1192.21700   -1868.79013   -1934.59688   -1804.06238
-2424.58075   -1307.89725   -2411.89600   -2579.36100   -2249.44400
-1785.58425   -1017.61263   -1687.73350   -1656.89400   -1523.75850
-1967.57988   -1058.80688   -1764.73363   -1791.98488   -1730.27675
-1704.06475    -999.98710   -1533.01563   -1747.59125   -1605.39413
-2075.70463   -1025.25756   -1719.98963   -1869.75063   -1788.87275
-1870.52938    -940.08030   -1417.71263   -1648.19325   -1512.58613
-1977.01400   -1187.51538   -1870.71263   -1945.09788   -1806.62888


The following columns show the resulting scores from testing utterances of "three" against the different models.


 λ_one         λ_two         λ_three       λ_four        λ_five
-1449.48575   -1321.33325   -1071.28975   -1645.76725   -1408.51950
-2099.80650   -1684.01563   -1252.45050   -2084.43875   -2019.32744
-2196.57975   -1786.72950   -1406.38150   -2125.44725   -2016.45975
-1569.61425   -1350.44475   -1201.65838   -1734.14263   -1536.32313
-1891.70463   -1424.52800   -1191.70050   -1998.60038   -1609.11200
-2123.41100   -1888.63538   -1223.26363   -2290.05700   -1979.40200
-2135.95225   -1670.81388   -1250.54488   -2100.24200   -1771.67013
-1667.06775   -1451.07600   -1052.65713   -1797.15500   -1566.32988
-2041.96513   -1577.63563   -1223.24025   -2037.96450   -1743.27050
-1958.91963   -1652.43200   -1510.96913   -2100.45800   -1760.69200
-1638.28700   -1333.17550   -1067.29263   -1756.54563   -1512.37325
-1514.47488   -1574.19275   -1181.50563   -2039.28350   -1556.01238
-1809.88288   -1711.61113   -1273.33288   -2224.31050   -1665.92038
-1837.64013   -1801.84850   -1408.70488   -2286.20875   -1764.26075

The following columns show the resulting scores from testing utterances of "four" against the different models.


 λ_one         λ_two         λ_three       λ_four        λ_five
-2118.96775   -2658.74975   -3030.72475   -2073.56450   -2692.17900
-2546.96900   -3339.56925   -3474.96900   -2331.69650   -2997.21900
-1395.81775   -2064.34175   -2146.98425   -1316.51638   -1683.00163
-2785.16075   -3410.13025   -3199.96975   -2571.93600   -2931.24525
-1965.27063   -2279.42350   -2502.64800   -1703.39500   -2154.37075
-1900.42413   -2187.35125   -2215.54950   -1617.78413   -2471.59500
-3459.08000   -3506.67850   -3547.90350   -2721.51725   -3718.17750
-2738.31175   -2793.55775   -2919.17400   -2235.31625   -2970.17725
-2075.79275   -2829.32050   -2849.34925   -1879.26088   -2507.47350
-1825.84088   -2302.00225   -2577.72725   -1582.26338   -2119.54075


The following columns show the resulting scores from testing utterances of "five" against the different models.


 λ_one         λ_two         λ_three       λ_four        λ_five
-2659.70075   -3081.33450   -3111.92100   -3395.45875   -2180.68225
-2480.54275   -2890.21150   -2856.10575   -3045.33250   -2198.42100
-2607.16875   -3120.43700   -2837.06300   -3308.74925   -1814.27063
-2355.31775   -2550.17275   -2557.36125   -2910.97150   -1953.57725
-2390.51175   -2692.37450   -2981.60125   -3241.53500   -1824.25875
-2221.41425   -2602.52500   -2617.43175   -2723.70975   -1644.27863
-2474.69275   -2691.99275   -2734.74250   -3024.38325   -1867.36050
-2187.36200   -2582.43900   -2413.66575   -2864.21450   -1671.57725
-2020.32463   -2502.37875   -2269.92200   -2762.00050   -1573.19763
-2421.27550   -2607.48625   -2788.79675   -2955.82850   -2075.71750
-1984.90225   -2371.49275   -2413.89675   -2603.99275   -1599.28275


10. Conclusion

10.1 CONCLUSION

The conclusion of this master degree project is that the theory for creating Mel Fre



