
Performance Analysis of Bangla Speech Recognizer Model Using Hidden Markov Model (HMM)

Submitted by: Md. Abdullah-al-MAMUN


OUTLINE

- What is speech recognition?
- The structure of ASR
- Speech database
- Feature extraction
- Hidden Markov Model
  - Forward algorithm
  - Backward algorithm
  - Viterbi algorithm
- Training & recognition
- Result
- Conclusions
- References

What is Speech Recognition?

In computer science, speech recognition is the translation of spoken words into text: the process of converting an acoustic signal captured by a microphone into a set of words. Speech recognition is also known as Automatic Speech Recognition (ASR) or Speech to Text (STT).

Model of Bangla Speech Recognition

Fig. 1: Simple model of Bangla Speech Recognition (block diagram: signal interface and database, feature extraction, HMM training, recognition)

The Structure of an ASR System

Figure 1: Functional scheme of an ASR system (diagram: speech samples S pass through feature extraction to give X and Y; training produces the HMMs, and recognition outputs W*).

Speech Database
- A speech database is a collection of recorded speech accessible on a computer and supported with the necessary transcriptions.
- The databases collect the observations required for parameter estimation.
- In this ASR system, about 1200 keywords were used.


Classification of Keywords

(Tree diagram: Bengali word → independent / dependent, vowel, consonant, modifier character, compound character.)

Database Creation Process

(Diagram of the database creation process.)

Speech Signal Analysis

Feature Extraction for ASR:
- The aim is to extract the voice features that distinguish the different phonemes of a language.

(Diagram: raw speech samples → feature extraction → feature vectors.)

MFCC Extraction

Pipeline: speech signal x(n) → pre-emphasis → x'(n) → windowing → xt(n) → DFT → Xt(k) → Mel filter banks → Yt(m) → log(|·|²) → IDFT → MFCC yt(k)

MFCC stands for Mel-frequency cepstral coefficients, a representation of the short-term power spectrum of a sound used in audio processing.

The MFCCs are the amplitudes of the resulting spectrum.
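As a concrete illustration of the pipeline above, here is a minimal sketch of MFCC extraction in Python using the librosa library. The file name, sampling rate and frame settings are assumptions for illustration only, not the configuration used in this work.

```python
import librosa

# Load a recording (assumed 16 kHz mono; the actual corpus settings may differ).
signal, sr = librosa.load("bangla_word.wav", sr=16000)

# Pre-emphasis boosts high frequencies before framing and windowing.
emphasized = librosa.effects.preemphasis(signal, coef=0.97)

# 13 MFCCs per frame: windowing, DFT, Mel filter banks, log and the final
# inverse transform are all handled inside librosa.feature.mfcc.
mfccs = librosa.feature.mfcc(y=emphasized, sr=sr, n_mfcc=13,
                             n_fft=400, hop_length=160)  # 25 ms frames, 10 ms shift

print(mfccs.shape)  # (13, number_of_frames)
```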

Explanatory Example

(Figures: the speech waveform of the phoneme /ae/, the waveform after pre-emphasis and Hamming windowing, its power spectrum, and the resulting MFCCs.)


Feature Vector to P(O|M) via HMM

(Diagram: feature vector O → HMM → P(O|M).)

For each input word O, the HMM M assigns a corresponding probability P(O|M), which is computed with the model.

HMM Model

An HMM is specified by a five-tuple λ = (S, O, A, B, π).

Elements of an HMM

1) Set of hidden states S = {1, 2, ..., N}
2) Set of observation symbols O = {o1, o2, ..., oM}
   M: the number of observation symbols
3) The initial state distribution π = {πi}, πi = P(s0 = i), 1 ≤ i ≤ N
4) The state transition probability distribution A = {aij}, aij = P(st = j | st−1 = i), 1 ≤ i, j ≤ N
5) The observation symbol probability distribution in state j, B = {bj(k)}, bj(k) = P(Xt = ok | st = j), 1 ≤ j ≤ N, 1 ≤ k ≤ M
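A minimal sketch of how such a five-tuple can be written down in Python with NumPy; the two-state values are purely illustrative (they match the small worked trellis example used later in the slides), not the trained Bangla models.

```python
import numpy as np

# Hidden states S = {1, 2} and observation symbols O = {o1, o2, o3} (illustrative).
pi = np.array([1.0, 0.0])          # initial state distribution, pi_i = P(s0 = i)

A = np.array([[0.7, 0.3],          # a_ij = P(s_t = j | s_{t-1} = i)
              [0.5, 0.5]])

B = np.array([[0.6, 0.1, 0.3],     # b_j(k) = P(X_t = o_k | s_t = j)
              [0.1, 0.1, 0.2]])

# Rows of A must sum to 1; rows of B sum to 1 only when the symbol set is complete.
assert np.allclose(A.sum(axis=1), 1.0)
```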


Three Basic Problems in HMM

1. The Evaluation Problem: given a model λ = (A, B, π) and a sequence of observations O = (o1, o2, o3, ..., oT), what is the probability P(O|λ), i.e. the probability that the model generates the observations?

2. The Decoding Problem: given a model λ = (A, B, π) and a sequence of observations O = (o1, o2, o3, ..., oT), what is the most likely state sequence in the model that produces the observations?

3. The Learning Problem: given a model λ = (A, B, π) and a set of observations O = (o1, o2, o3, ..., oT), how can we adjust the model parameters λ to maximize the joint probability P(O|λ)?

How to evaluate an HMM?

Forward Algorithm

How to Decode an HMM?

Viterbi Algorithm

How to Train an HMM?

Baum-Welch Algorithm


Calculate Probability P(O|M)

(Trellis: with initial state probabilities 0.5, 0.3, 0.2 and emission probabilities for up / down / no-change of 0.3 0.3 0.4, 0.7 0.1 0.2 and 0.1 0.6 0.3, the forward probabilities for the first observation are 0.35, 0.02 and 0.09. Each later value sums the products of the previous values with the transition and emission probabilities, e.g. 0.35·0.6·0.7 + 0.02·0.5·0.7 + 0.09·0.4·0.7 ≈ 0.179, giving 0.179, 0.036 and 0.008 at the next step; the column sums 0.46 and 0.223 accumulate P(O|M). Add probabilities!)

Forward Calculations – Overview

(Trellis over times 1–4 with two states S1 and S2, start state S0, transition probabilities a11 = 0.7, a12 = 0.3, a21 = 0.5, a22 = 0.5, and emission probabilities b1 = (0.6, 0.1, 0.3), b2 = (0.1, 0.1, 0.2).)

Forward Calculations (t=2)

α1(1) = 1, α2(1) = 0
α1(2) = α1(1)·b13·a11 + α2(1)·b23·a21 = 0.21
α2(2) = α1(1)·b13·a12 + α2(1)·b23·a22 = 0.09

NOTE: α1(2) + α2(2) is the likelihood of the observation.

Forward Calculations (t=3)

(Same trellis, extended to time 3; α1(3) and α2(3) are computed from α1(2) and α2(2) in the same way.)

Forward Calculations (t=4)

(Same trellis, extended to time 4.)

Forward Calculation of Likelihood Function

t:      1              2      3        4
α1(t):  1.0 (π1 = 1)   0.21   0.0462   0.021294
α2(t):  0.0 (π2 = 0)   0.09   0.0378   0.010206
L(t) = p(K1 ... Kt) = α1(t) + α2(t):
        1.0            0.3    0.084    0.0315

Each αi(t) is obtained from the previous column, e.g. α1(2) = α1(1)·a11·b13 + α2(1)·a21·b23 = 0.21.
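A table like the one above can be reproduced with a short forward recursion. The sketch below uses the standard state-emission formulation of the forward algorithm (the slides index the emissions slightly differently), with the illustrative pi, A, B from earlier and an assumed observation index sequence, so the exact numbers are illustrative.

```python
import numpy as np

def forward_likelihood(obs, pi, A, B):
    """P(O | lambda) via the forward algorithm.

    obs : sequence of observation-symbol indices
    pi  : (N,) initial state distribution
    A   : (N, N) transition matrix, A[i, j] = P(s_t = j | s_{t-1} = i)
    B   : (N, M) emission matrix,   B[j, k] = P(o_k | s_t = j)
    """
    alpha = pi * B[:, obs[0]]            # alpha_j(1) = pi_j * b_j(o_1)
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]    # alpha_j(t) = sum_i alpha_i(t-1) * a_ij * b_j(o_t)
    return alpha.sum()                   # L = sum_j alpha_j(T)

# Example with the illustrative parameters and an assumed observation sequence:
pi = np.array([1.0, 0.0])
A = np.array([[0.7, 0.3], [0.5, 0.5]])
B = np.array([[0.6, 0.1, 0.3], [0.1, 0.1, 0.2]])
print(forward_likelihood([2, 0, 2], pi, A, B))
```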

Backward Calculations – Overview

(Same two-state trellis as for the forward calculations, now swept from time 4 back to time 1.)

Backward Calculations (t=3)

(β1(3) and β2(3) are computed from the values at time 4.)

Backward Calculations (t=2)

β1(4) = 1, β2(4) = 1
β1(3) = 0.6, β2(3) = 0.1
β1(2) = 0.045, β2(2) = 0.245

Each βi(t) is the sum, over the successor states j, of (transition probability)·(emission probability)·βj(t+1).

NOTE: β1(2) + β2(2) is the likelihood of the remaining observation/word sequence.

Backward Calculations (t=1)

(Same trellis; β1(1) and β2(1) are computed from β1(2) and β2(2).)

Backward Calculation of Likelihood Function

t:      1        2       3     4
β1(t):  0.0315   0.045   0.6   1
β2(t):  0.029    0.245   0.1   1
L(t) = p(Kt ... KT):
        0.0315   0.290   0.7   1

(L(1) = π1·β1(1) + π2·β2(1); for t > 1 the table sums β1(t) + β2(t).)
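For completeness, a matching sketch of the backward recursion under the same assumptions as the forward sketch; combined with the initial distribution and the first emission it yields the same likelihood as the forward pass.

```python
import numpy as np

def backward_likelihood(obs, pi, A, B):
    """P(O | lambda) via the backward algorithm (same conventions as the forward sketch)."""
    beta = np.ones(len(pi))                   # beta_i(T) = 1
    for o in reversed(obs[1:]):
        beta = A @ (B[:, o] * beta)           # beta_i(t) = sum_j a_ij * b_j(o_{t+1}) * beta_j(t+1)
    return np.sum(pi * B[:, obs[0]] * beta)   # L = sum_i pi_i * b_i(o_1) * beta_i(1)

pi = np.array([1.0, 0.0])
A = np.array([[0.7, 0.3], [0.5, 0.5]])
B = np.array([[0.6, 0.1, 0.3], [0.1, 0.1, 0.2]])
print(backward_likelihood([2, 0, 2], pi, A, B))  # equals the forward result
```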


Calculate maxS Probability of State Sequence S

(Trellis: the same example as before but with maximisation instead of summation. With first-step values 0.35, 0.09, 0.02 and the same transition and emission probabilities, the next column is max{0.35·0.6·0.7, 0.02·0.5·0.7, 0.09·0.4·0.7} = 0.147, then 0.021 and 0.007. Select the highest probability and follow the best path!)

Viterbi Algorithm – Overview

(Same two-state trellis over times 1–4, now with maximisation and backpointers.)

Viterbi Algorithm (Forward Calculations t=2)

δ1(1) = 1, δ2(1) = 0
δ1(2) = max{ δ1(1)·b13·a11, δ2(1)·b23·a21 } = 0.21
δ2(2) = max{ δ1(1)·b13·a12, δ2(1)·b23·a22 } = 0.09
ψ1(2) = 1, ψ2(2) = 1 (backpointers to the best predecessor state)

Viterbi Algorithm (Backtracking t=2)

(Same trellis; the backpointers ψ1(2) = ψ2(2) = 1 record that both states at time 2 are best reached from state 1.)

Viterbi Algorithm (Forward Calculations)

(The recursion is continued through times 3 and 4.)

Viterbi Algorithm (Backtracking)

(Backpointers are followed from the best final state back toward time 1.)

Viterbi Algorithm (Forward Calculations t=4)

(Trellis at the final time step.)

Viterbi Algorithm (Backtracking to Obtain Labeling)

(The state labels of the most likely path are read off by following the backpointers from time 4 back to time 1.)
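A minimal Viterbi decoder in the same style, covering the max-and-backpointer steps walked through above; the parameters and observation indices are the same illustrative values, not the trained models.

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Most likely state sequence (decoding problem), standard state-emission form."""
    N, T = len(pi), len(obs)
    delta = pi * B[:, obs[0]]                      # delta_j(1)
    psi = np.zeros((T, N), dtype=int)              # backpointers
    for t in range(1, T):
        scores = delta[:, None] * A                # scores[i, j] = delta_i(t-1) * a_ij
        psi[t] = scores.argmax(axis=0)             # best predecessor of each state j
        delta = scores.max(axis=0) * B[:, obs[t]]  # delta_j(t)
    # Backtracking: start from the best final state and follow the pointers.
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    return list(reversed(path))

pi = np.array([1.0, 0.0])
A = np.array([[0.7, 0.3], [0.5, 0.5]])
B = np.array([[0.6, 0.1, 0.3], [0.1, 0.1, 0.2]])
print(viterbi([2, 0, 2], pi, A, B))  # -> [0, 0, 0] for this toy example
```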

Implementing HMM for Speech Modeling (Training and Recognition)

- Building the HMM speech models from the correspondence between the observation sequences Y and the state sequences S (TRAINING); a sketch of this step is given below.
- Recognizing speech with the stored HMM models and the actual observation Y (RECOGNITION).

(Diagram: speech samples S → feature extraction → Y → training HMM and recognition → W*.)
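A hedged sketch of the training step: one Gaussian HMM per keyword is fitted to MFCC feature sequences with the Baum-Welch (EM) procedure as implemented in the hmmlearn package. The state count, iteration count and the word_features mapping are assumptions for illustration; the original system's exact toolchain is not specified here.

```python
import numpy as np
from hmmlearn import hmm

def train_word_model(feature_sequences, n_states=5):
    """Fit one HMM to all training utterances of a single keyword.

    feature_sequences : list of (n_frames, n_mfcc) arrays, one per utterance
    """
    X = np.vstack(feature_sequences)                   # stacked frames
    lengths = [len(seq) for seq in feature_sequences]  # utterance boundaries
    model = hmm.GaussianHMM(n_components=n_states,
                            covariance_type="diag",
                            n_iter=20)
    model.fit(X, lengths)                              # Baum-Welch / EM training
    return model

# word_features is assumed to map each keyword to its list of MFCC sequences:
# models = {word: train_word_model(seqs) for word, seqs in word_features.items()}
```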


RECOGNITION Process

Given an input speech signal, let S = (s1, s2, ..., sT) be the state sequence to be recognized and xt the feature vector computed at time t, where the feature sequence from time 1 to t is X = (x1, x2, ..., xt). The recognized state sequence S* is obtained by:

S* = ArgMaxS P(S, X | λ)

(Diagram: a search algorithm combines a dynamic structure, with inputs xt and P(xt, {st} | {st−1}, λ), and a static structure over {St−1} to produce S*.)
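For isolated-word recognition this decision rule reduces to scoring the observation with every stored word model and picking the best one; a minimal sketch under the same hmmlearn assumptions (the models dict and the mfccs matrix come from the earlier sketches) is shown below.

```python
def recognize(features, models):
    """Return the keyword whose HMM gives the highest log-likelihood.

    features : (n_frames, n_mfcc) MFCC matrix of the utterance
    models   : dict mapping each keyword to a trained hmmlearn model
    """
    scores = {word: model.score(features) for word, model in models.items()}
    return max(scores, key=scores.get)

# Example (assuming the mfccs and models from the earlier sketches):
# best_word = recognize(mfccs.T, models)
```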


Result (Speaker Recognition)

Table 1: Speaker recognition result

Result (Isolated SR)

Table 2: Isolated speech recognition result.

Result (Continuous SR)

Table 3: Continuous speech recognition result.

Conclusions

- No speech recognizer to date achieves 100% accuracy.
- Avoid poor-quality microphones; consider using a better microphone.
- One important point is that training the system provides an even better experience.

Thank You
