A Simple Example of Generating a Recognition System Using HTK
For Word-Level Recognition
Yoon-Joong Kim, Hanbat National University
Setting an environment variable (path)
• Download HTKD32.zip and install HTK 3.2 under D:\HTK32
– location: D:\HTK32\binwin32 //HTK library
– location: D:\HTK32\Data //speech data
• Set the environment variable (path)
– Control Panel\System and Security\System\Advanced system settings
– System Properties
• Advanced
– Environment Variables
» User variables for Administrator (user account): Variable: path, Value: D:\HTK32\binwin32
» OK
A folder structure for an example
• D:/HTK32
  – binwin32
  – Data
    • Talker1
      – PBW1: Pbw1001.wav, Pbw1001.mfc, Pbw1002.wav, Pbw1002.mfc, …
      – PBW2: Pbw1001.wav, Pbw1001.mfc, Pbw1002.wav, Pbw1002.mfc, …
    • Talker2
      – PBW1: Pbw1001.wav, Pbw1001.mfc, Pbw1002.wav, Pbw1002.mfc, …
      – PBW2
    • …
  – YJK
    • configs: Hcopy.config, config
    • scripts: Hcopy.scp, train.scp, test.scp
    • wordhmms
      – m0: proto, vFloors, hmmdefs
      – m1: hmmdefs
      – m2 …
    • mlfs: words.mlf, recOutWordm5.mlf
    • modelList: wordList
    • dic: pbwGram, pbwNet, dict
Summary
• Data Preparation
– Prepare speech data (wave files) for training and testing.
• Feature Vector Generation (HCopy.exe)
– Wave file => MFCC file
• Generation of an initial hmmdefs (prototype, master macro file)
– Define a general HMM model (prototype) for one unit
– Compute the global variance over all the speech data (HCompV.exe)
– Generate hmmdefs (a set of initial HMMs, Master Macro File) for all units (words or phonemes) from the general HMM model.
• Re-estimate hmmdefs (HERest.exe)
• Recognition Test (HVite.exe)
• Analyze Result (HResults.exe)
Data Preparation
• Speech data for training and testing
– Data (wave files): 452 words × 7 sets
– Training data: 452 words × 7 sets
– Test data
• Open test (speaker-independent recognition): 0 sets
• Closed test (speaker-dependent recognition): 7 sets
– location: HTK32/Data/[speaker name]/PBW1/[speech file]
Data Preparation
• Features of the speech data
– NIST format
• NIST-format files with the extension ".wav"
• 16 kHz, 16-bit, linear PCM
• Phonetically Balanced Words (PBW)
pbw1001.wav ⇒ "청와대"
pbw1002.wav ⇒ "컴퓨터"
pbw1003.wav ⇒ "그에게"
pbw1004.wav ⇒ "위대한"
pbw1005.wav ⇒ "당뇨병"
pbw1006.wav ⇒ "그야말로"
pbw1007.wav ⇒ "예컨대"
pbw1008.wav ⇒ "분야에서"
Feature Vector Generation
• Compute feature vectors with HCopy.exe
• HCopy is a copy program that extracts the features from each wave file and saves them in the same folder.
• Features: LPC, MFCC, PLP
• Wave file formats: HTK, Esignal, TIMIT, NIST Sphere, SCRIBE, SDES1, AIFF, SUNAU8, OGI, WAV, NOHEAD
– -C configs/Hcopy.config
• Configuration file for feature extraction
– -S scripts/Hcopy.scp
• Script file listing pairs of wave file and feature file
• Before:
D:/HTK32/binwin32
/Data/speakerName/PBW1/ Pbw1001.wav Pbw1002.wav …
/configs Hcopy.config
/scripts Hcopy.scp
• After:
D:/HTK32/binwin32
/Data/speakerName/PBW1/ Pbw1001.wav Pbw1001.mfc Pbw1002.wav Pbw1002.mfc …
HCopy -C configs/Hcopy.config -S scripts/Hcopy.scp
Feature Vector Generation
• Configuration file configs/Hcopy.config
#Coding parameters
SOURCEKIND = WAVEFORM
SOURCEFORMAT = NIST //NIST format
SOURCERATE = 625 //sample period in units of 100 ns: fs = 16 kHz, 1/16000 s = 62.5 µs
TARGETKIND = MFCC_0 //MFCC + C0 (0th cepstral coefficient, used as the energy term)
TARGETRATE = 100000.0 //frame shift: 10 ms, in units of 100 ns
SAVECOMPRESSED = T
SAVEWITHCRC = T
WINDOWSIZE = 250000.0 //window size: 25 ms
USEHAMMING = T //Hamming window
PREEMCOEF = 0.97 //pre-emphasis coefficient
NUMCHANS = 26 //number of filterbank channels
CEPLIFTER = 22 //cepstral liftering coefficient
NUMCEPS = 12 //number of cepstral coefficients
ENORMALISE = F //energy normalisation
HCopy -C configs/Hcopy.config -S scripts/Hcopy.scp
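The timing values in this configuration are all expressed in units of 100 ns, which is easy to misread. The following Python sketch converts them into more familiar quantities. The function names are illustrative only, and the frame count is a simple estimate that assumes whole windows only; HCopy's own count for a given file may differ slightly.

```python
HTK_UNITS_PER_SECOND = 10_000_000  # one HTK time unit is 100 ns


def sample_rate_hz(source_rate):
    """Sampling frequency implied by SOURCERATE (sample period in 100 ns units)."""
    return HTK_UNITS_PER_SECOND / source_rate


def samples(duration_units, source_rate):
    """Number of samples covered by a duration given in 100 ns units."""
    return duration_units / source_rate


def frame_count(duration_seconds, window_size=250000, target_rate=100000):
    """Estimate how many whole analysis windows fit into the signal."""
    total = int(round(duration_seconds * HTK_UNITS_PER_SECOND))
    if total < window_size:
        return 0
    return (total - window_size) // target_rate + 1


if __name__ == "__main__":
    print("sampling rate (Hz):", sample_rate_hz(625))
    print("samples per 25 ms window:", samples(250000, 625))
    print("samples per 10 ms shift:", samples(100000, 625))
    print("frames in 1 s of speech (estimate):", frame_count(1.0))
```

With SOURCERATE = 625 this gives a 16 kHz sampling rate, 400 samples per window and a shift of 160 samples.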
Feature Vector Generation
• Script file scripts/Hcopy.scp
../Data/MBTG0/pbw1/pbw1001.wav ../Data/MBTG0/pbw1/pbw1001.mfc
../Data/MBTG0/pbw1/pbw1002.wav ../Data/MBTG0/pbw1/pbw1002.mfc
………….
HCopy -C configs/Hcopy.config -S scripts/Hcopy.scp
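Writing the script file by hand for 452 words × 7 sets is tedious, so it may be easier to generate it. The sketch below builds the "wave file, feature file" pairs for Hcopy.scp and the feature-only list that a file such as train.scp needs. The helper names are hypothetical, and it assumes each .mfc file sits next to its .wav file, as in the example above.

```python
from pathlib import PurePosixPath


def hcopy_script_lines(wav_paths):
    """One 'source target' pair per line, as HCopy expects in its script file."""
    lines = []
    for wav in wav_paths:
        wav = PurePosixPath(wav)
        lines.append(f"{wav} {wav.with_suffix('.mfc')}")
    return lines


def feature_list_lines(wav_paths):
    """Only the .mfc paths, e.g. for a training or test script file."""
    return [str(PurePosixPath(w).with_suffix(".mfc")) for w in wav_paths]


if __name__ == "__main__":
    wavs = [
        "../Data/MBTG0/pbw1/pbw1001.wav",
        "../Data/MBTG0/pbw1/pbw1002.wav",
    ]
    print("\n".join(hcopy_script_lines(wavs)))
    print("\n".join(feature_list_lines(wavs)))
```

In practice the list of wave paths could come from a directory scan; the two hard-coded paths here simply mirror the slide.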
Data Preparation
• Master Label File – word-level transcription: mlfs/words.mlf
– pbw1001.wav : "*/pbw1001.lab" sil 청와대 sil .
– pbw2001.wav : "*/pbw2001.lab" sil 청와대 sil .
– Both entries can be merged with a wild card: "*/pbw*001.lab" sil 청와대 sil .
• Annotated example
#!MLF!#
"*/pbw1001.lab" //label index: [wave file name].lab
sil //silence
청와대 //utterance
sil
. //end of label
"*/pbw2001.lab" //wild cards are allowed
sil
청와대
sil
.
• One entry per file
#!MLF!#
"*/pbw1001.lab"
sil
청와대
sil
.
"*/pbw2001.lab"
sil
청와대
sil
.
"*/pbw1002.lab"
sil
컴퓨터
sil
.
"*/pbw2002.lab"
sil
컴퓨터
sil
.
...
• The same labels with wild cards
#!MLF!#
"*/pbw*001.lab"
sil
청와대
sil
.
"*/pbw*002.lab"
sil
컴퓨터
sil
.
..
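The wild-card form of the master label file follows a fixed pattern, so it can be generated from the ordered word list. This is a minimal sketch under two assumptions that are not stated in the slides: word number k (1-based) is stored in files named pbw?00k style with a three-digit index, and the file is written in whatever text encoding the HTK tools on the machine expect. The function name `words_mlf` is illustrative.

```python
def words_mlf(words, first_index=1):
    """Build MLF text with one wild-card entry per word: sil WORD sil."""
    lines = ["#!MLF!#"]
    for index, word in enumerate(words, start=first_index):
        lines.append(f'"*/pbw*{index:03d}.lab"')  # matches pbw1001, pbw2001, ...
        lines += ["sil", word, "sil", "."]        # "." closes each label entry
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(words_mlf(["청와대", "컴퓨터"]), end="")
```

Each generated entry has the same five-line shape as the handwritten example: pattern, sil, word, sil, terminating period.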
Data Preparation
• Model List
– modelList/wordList
– List of HMM model names (one model per word, plus sil)
sil 청와대 컴퓨터 그에게 위대한 당뇨병 그야말로 예컨대 분야에서 어두운 소프트웨어 됐습니다 아니냐는 야당의
Generation of an initial hmmdefs (master macro file)
• Scan the data files listed in train.scp, compute the global mean and variance, and write a new HMM prototype (m0/proto) that combines the handwritten prototype (proto) with those values.
• HCompV.exe
• input:
-C configs/config //parameters for computing features
-f 0.01 //write a variance floor macro (vFloors) equal to 0.01 times the global variance
-m //compute the means as well as the variances
-S scripts/train.scp //list of mfc feature files used for training
wordHmms/proto //handwritten HMM prototype
• output:
-M wordHmms/m0 //directory for the results
//vFloors: variance floor macro
//proto: HMM prototype with the global mean and variance filled in
//hmmdefs: written manually from proto
HCompV -C configs/config -f 0.01 -m -S scripts/train.scp -M wordHmms/m0/ wordHmms/proto
Generation of an initial hmmdefs (master macro file)
• Write a general HMM model (prototype) for one unit: wordHmms/proto
– 3-state left-to-right model
– wordHmms/proto
• ~o Feature vector definition: size 39, MFCC_0_D_A
• ~h HMM model name: proto
• Number of states: 5 (3 emitting states plus the non-emitting entry and exit states)
• Gaussian mixture model: one Gaussian per state, with mean and variance
• Transition probability matrix
HCompV -C configs/config -f 0.01 -m -S scripts/train.scp -M wordHmms/m0/ wordHmms/proto
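The handwritten prototype only has to fix the topology, because HCompV overwrites the means and variances with global values. The sketch below writes such a prototype in HTK's HMM definition syntax. The function name is hypothetical, and the 0.6 / 0.4 transition values are placeholder assumptions, not values taken from the slides.

```python
def make_proto(vec_size=39, num_states=5, kind="MFCC_0_D_A", name="proto"):
    """Return the text of a single-Gaussian, left-to-right HMM prototype."""
    zeros = " ".join(["0.0"] * vec_size)  # placeholder means
    ones = " ".join(["1.0"] * vec_size)   # placeholder variances
    lines = [
        f"~o <VecSize> {vec_size} <{kind}>",
        f'~h "{name}"',
        "<BeginHMM>",
        f"<NumStates> {num_states}",
    ]
    # States 1 and num_states are non-emitting, so only the inner ones get a pdf.
    for state in range(2, num_states):
        lines += [f"<State> {state}", f"<Mean> {vec_size}", zeros,
                  f"<Variance> {vec_size}", ones]
    lines.append(f"<TransP> {num_states}")
    for i in range(num_states):
        row = [0.0] * num_states
        if i == 0:
            row[1] = 1.0                      # entry state always moves on
        elif i < num_states - 1:
            row[i], row[i + 1] = 0.6, 0.4     # assumed self-loop / advance split
        lines.append(" ".join(f"{p:.1f}" for p in row))
    lines.append("<EndHMM>")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(make_proto(), end="")
```

The last row of the transition matrix stays all zeros because the exit state has no outgoing transitions.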
Generation of an initial hmmdefs (master macro file)
• configs/config
– Copy configs/Hcopy.config to configs/config and change
• TARGETKIND : MFCC_0_D_A
# Coding parameters
NONUMESCAPES = T //for Korean handling
TARGETKIND = MFCC_0_D_A //12 MFCC + 1 C0 + 13 delta + 13 acceleration = 39
TARGETRATE = 100000.0
SAVECOMPRESSED = T
SAVEWITHCRC = T
WINDOWSIZE = 250000.0
USEHAMMING = T
PREEMCOEF = 0.97
NUMCHANS = 26
CEPLIFTER = 22
NUMCEPS = 12
ENORMALISE = F
HCompV -C configs/config -f 0.01 -m -S scripts/train.scp -M wordHmms/m0/ wordHmms/proto
Generation of an initial hmmdefs (master macro file)
• scripts/train.scp – list of feature files for training
HCompV -C configs/config -f 0.01 -m -S scripts/train.scp -M wordHmms/m0/ wordHmms/proto
Generation of an initial hmmdefs (master macro file)
• wordHmms/m0/vFloors
– Global variance floor values (0.01 times the global variance)
~v varFloor1 <Variance> 39 7.217242e-001 3.275488e-001 … ….
HCompV -C configs/config -f 0.01 -m -S scripts/train.scp -M wordHmms/m0/ wordHmms/proto
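$$b_j(\mathbf{o}_t) = \mathcal{N}(\mathbf{o}_t;\, \boldsymbol{\mu}_{jk}, \boldsymbol{\Sigma}_{jk}) = \frac{1}{\sqrt{(2\pi)^n \,|\boldsymbol{\Sigma}_{jk}|}} \exp\!\left(-\tfrac{1}{2}\,(\mathbf{o}_t - \boldsymbol{\mu}_{jk})^{\top} \boldsymbol{\Sigma}_{jk}^{-1} (\mathbf{o}_t - \boldsymbol{\mu}_{jk})\right)$$

$$\mathrm{GCONST} = \log_e\!\left((2\pi)^n \,|\boldsymbol{\Sigma}_{jk}|\right)$$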
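The <GCONST> value stored with each state is the precomputed constant ln((2π)^n |Σ|) of the Gaussian output density. For a diagonal covariance (the <DIAGC> case used in this example) the determinant is the product of the variances, so the constant can be checked with a few lines of Python. The function names are illustrative, and the code is a sketch of the formula, not of HTK's internal implementation.

```python
import math


def gconst(variances):
    """ln((2*pi)^n * |Sigma|) for a diagonal covariance given as a list of variances."""
    n = len(variances)
    return n * math.log(2.0 * math.pi) + sum(math.log(v) for v in variances)


def log_output_prob(observation, mean, variances):
    """ln b_j(o_t) for a single diagonal-covariance Gaussian."""
    quad = sum((o - m) ** 2 / v for o, m, v in zip(observation, mean, variances))
    return -0.5 * (gconst(variances) + quad)


if __name__ == "__main__":
    unit = [1.0] * 39
    print("GCONST for 39 unit variances:", gconst(unit))
    print("log density at the mean:", log_output_prob([0.0] * 39, [0.0] * 39, unit))
```

Feeding in the full 39-element variance vector of a state from wordHmms/m0/proto should reproduce that state's <GCONST> up to rounding.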
Generation of an initial hmmdefs (master macro file)
• wordHmms/m0/proto
• wordHmms/proto + global means and variances => wordHmms/m0/proto
HCompV -C configs/config -f 0.01 -m -S scripts/train.scp -M wordHmms/m0/ wordHmms/proto
Generation of an initial hmmdefs (master macro file)
• Write wordHmms/m0/hmmdefs
– Master Macro File (MMF) hmmdefs
~o <STREAMINFO> 1 39 <VECSIZE> 39<MFCC_D_A_0><DIAGC>
~h "proto" <BEGINHMM> .. <ENDHMM>
~v varFloor1 <Variance> 39 7.2 … …..
~h "청와대" <BEGINHMM> … <ENDHMM>
~h "그에게" <BEGINHMM> … <ENDHMM>
…
HCompV -C configs/config -f 0.01 -m -S scripts/train.scp -M wordHmms/m0/ wordHmms/proto
wordHmms/m0/proto
~o <STREAMINFO> 1 39 <VECSIZE> 39<NULLD><MFCC_D_A_0><DIAGC>
~h "proto"
<BEGINHMM>
<NUMSTATES> 5
<STATE> 2
<MEAN> 39
-9.954108e+000 4.561644e-001 1.407761e+000 -4.952329e+000 -4.900678e+000 …
<VARIANCE> 39
7.217242e+001 3.275488e+001 6.895670e+001 6.279921e+001 7.020441e+001 …
<GCONST> 1.280699e+002
<STATE> 3
<MEAN> 39
-9.954108e+000 4.561644e-001 1.407761e+000 -4.952329e+000 -4.900678e+000 …
<VARIANCE> 39
7.217242e+001 3.275488e+001 6.895670e+001 6.279921e+001 7.020441e+001 …
<GCONST> 1.280699e+002
<STATE> 4
<MEAN> 39
-9.954108e+000 4.561644e-001 1.407761e+000 -4.952329e+000 -4.900678e+000 …
<VARIANCE> 39
7.217242e+001 3.275488e+001 6.895670e+001 6.279921e+001 7.020441e+001 …
<GCONST> 1.280699e+002
<TRANSP> 5
0.000000e+000 1.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000
….
0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000
<ENDHMM>
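The manual step of building wordHmms/m0/hmmdefs amounts to copying the body of m0/proto once per entry in the word list and renaming the ~h macro each time. The sketch below does this on text strings. The function name is hypothetical, and it assumes the prototype contains exactly one `~h "proto"` line; unlike the handwritten file, it does not keep a copy of the model named "proto".

```python
def build_hmmdefs(proto_text, vfloors_text, model_names):
    """Clone the prototype HMM once per model name and prepend ~o and ~v macros."""
    header, marker, body = proto_text.partition('~h "proto"')
    if not marker:
        raise ValueError('prototype has no ~h "proto" line')
    parts = [header.rstrip("\n"), vfloors_text.rstrip("\n")]
    for name in model_names:
        # body starts with the newline that followed ~h "proto"
        parts.append(f'~h "{name}"' + body.rstrip("\n"))
    return "\n".join(parts) + "\n"


if __name__ == "__main__":
    proto = '~o <VECSIZE> 39<MFCC_0_D_A>\n~h "proto"\n<BEGINHMM>\n<ENDHMM>\n'
    vfloors = "~v varFloor1\n<Variance> 39\n"
    print(build_hmmdefs(proto, vfloors, ["sil", "청와대", "컴퓨터"]), end="")
```

Reading the real m0/proto, m0/vFloors and modelList/wordList from disk and writing the result to m0/hmmdefs is left out to keep the example short.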
Training
• Embedded Training
HERest -C configs/config -I mlfs/words.mlf -S scripts/train.scp -H wordHmms/m0/hmmdefs -M wordHmms/m1 modelList/wordList
Training
• Embedded Training
• HERest
• input:
-C configs/config //parameters for features
-I mlfs/words.mlf //master label file: the word spoken in each speech file
modelList/wordList //word name list (HMM list)
-S scripts/train.scp //list of mfc files for training
-H wordHmms/m0/hmmdefs //hmmdefs (a set of HMM prototypes) for all words
• output:
-M wordHmms/m1 //directory for the re-estimated hmmdefs
HERest -C configs/config -I mlfs/words.mlf -S scripts/train.scp -H wordHmms/m0/hmmdefs -M wordHmms/m1 modelList/wordList
Training
• configs/config
# Coding parameters
NONUMESCAPES = T
TARGETKIND = MFCC_0_D_A
TARGETRATE = 100000.0
SAVECOMPRESSED = T
SAVEWITHCRC = T
WINDOWSIZE = 250000.0
USEHAMMING = T
PREEMCOEF = 0.97
NUMCHANS = 26
CEPLIFTER = 22
NUMCEPS = 12
ENORMALISE = F
HERest -C configs/config -I mlfs/words.mlf -S scripts/train.scp -H wordHmms/m0/hmmdefs -M wordHmms/m1 modelList/wordList
Training
• Input files: mlfs/words.mlf, scripts/train.scp, modelList/wordList
HERest -C configs/config -I mlfs/words.mlf -S scripts/train.scp -H wordHmms/m0/hmmdefs -M wordHmms/m1 modelList/wordList
Training
• wordHmms/m0/hmmdefs
– Master Macro File hmmdefs
– Contains a set of prototype HMMs for all words
HERest -C configs/config -I mlfs/words.mlf -S scripts/train.scp -H wordHmms/m0/hmmdefs -M wordHmms/m1 modelList/wordList
• hmmdefs
~o <STREAMINFO> 1 39 <VECSIZE> 39<MFCC_D_A_0><DIAGC>
~h "proto" <BEGINHMM> .. <ENDHMM>
~v varFloor1 <Variance> 39 7.2 … …..
~h "청와대" <BEGINHMM> … <ENDHMM>
~h "그에게" <BEGINHMM> … <ENDHMM>
…
Training
• [output] wordHmms/m1/hmmdefs
HERest -C configs/config -I mlfs/words.mlf -S scripts/train.scp -H wordHmms/m0/hmmdefs -M wordHmms/m1 modelList/wordList
• hmmdefs
~o <STREAMINFO> 1 39 <VECSIZE> 39<MFCC_D_A_0><DIAGC>
~v varFloor1 <Variance> 39 7.2 … …..
~h "청와대" <BEGINHMM> … <ENDHMM>
~h "그에게" <BEGINHMM> … <ENDHMM>
…
Training
• Re-estimate hmmdefs four more times (m1 → m2 → m3 → m4 → m5)
• Resulting folder
– D:/htk32/dev/wordhmms/
HERest -C configs/config -I mlfs/words.mlf -S scripts/train.scp -H wordHmms/m1/hmmdefs -M wordHmms/m2 modelList/wordList
HERest -C configs/config -I mlfs/words.mlf -S scripts/train.scp -H wordHmms/m2/hmmdefs -M wordHmms/m3 modelList/wordList
HERest -C configs/config -I mlfs/words.mlf -S scripts/train.scp -H wordHmms/m3/hmmdefs -M wordHmms/m4 modelList/wordList
HERest -C configs/config -I mlfs/words.mlf -S scripts/train.scp -H wordHmms/m4/hmmdefs -M wordHmms/m5 modelList/wordList
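Because every re-estimation pass differs only in the input and output model folders, the passes can be driven by a loop. The sketch below builds the HERest command lines for m0 through m5 and, optionally, runs them. The helper names are hypothetical; running the commands assumes HERest is on the path and that the working directory is the example folder.

```python
import os
import subprocess


def herest_commands(rounds=5, model_root="wordHmms"):
    """Return one HERest argument list per pass: m0 -> m1, ..., m(rounds-1) -> m(rounds)."""
    commands = []
    for i in range(rounds):
        commands.append([
            "HERest",
            "-C", "configs/config",
            "-I", "mlfs/words.mlf",
            "-S", "scripts/train.scp",
            "-H", f"{model_root}/m{i}/hmmdefs",
            "-M", f"{model_root}/m{i + 1}",
            "modelList/wordList",
        ])
    return commands


def run_all(commands):
    """Run each pass in order, stopping at the first failure."""
    for cmd in commands:
        out_dir = cmd[cmd.index("-M") + 1]
        os.makedirs(out_dir, exist_ok=True)  # make sure the output folder exists
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    for cmd in herest_commands():
        print(" ".join(cmd))
```

Calling `run_all(herest_commands())` would perform all five passes; the main block only prints the commands so that they can be inspected first.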
Recognition Test
HVite -C configs/config -S scripts/test.scp -H wordHmms/m5/hmmdefs -w dic/pbwNet -i mlfs/recOutWordm5.mlf dic/dict modelList/wordList
Recognition Test
– HVite
• input:
-C configs/config //parameters for mfc features
modelList/wordList //HMM name list
-S scripts/test.scp //list of mfc files for testing
-w dic/pbwNet //word network for recognition
dic/dict //pronouncing dictionary: word [outsym] models
-H wordHmms/m5/hmmdefs //the set of HMMs
• output:
-i mlfs/recOutWordm5.mlf //recognition result
HVite -C configs/config -S scripts/test.scp -H wordHmms/m5/hmmdefs -w dic/pbwNet -i mlfs/recOutWordm5.mlf dic/dict modelList/wordList
Recognition Test
• dic/dict – writing a pronouncing dictionary
• Format: word [outsym] models
– word : word to be recognized
– [outsym] : string to output when the word is recognized
– models : list of HMM models that make up the word
• Example
• Dictionary for word-level recognition
– 청와대 [청와대] 청와대
or 청와대 청와대
• Dictionary for phoneme-level recognition
– 청와대 [청와대] ㅊㅓㅇㅘ ㄷ ㅐ
or 청와대 ㅊㅓㅇㅘ ㄷ ㅐ
HVite -C configs/config -S scripts/test.scp -H wordHmms/m5/hmmdefs -w dic/pbwNet -i mlfs/recOutWordm5.mlf dic/dict modelList/wordList
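For word-level recognition each dictionary line repeats the same name three times (word, output symbol, model), so the dictionary can be derived directly from the model list. This is a minimal sketch with a hypothetical function name; it assumes every name in the list, including sil, should get its own entry, and it leaves any ordering or encoding requirements of the local HTK build to the reader.

```python
def word_level_dict(model_names):
    """One 'word [outsym] model' line per name, all three fields identical."""
    return "".join(f"{name} [{name}] {name}\n" for name in model_names)


if __name__ == "__main__":
    print(word_level_dict(["sil", "청와대", "컴퓨터", "그에게"]), end="")
```

A phoneme-level dictionary would instead list the phoneme models in the third field, as in the second example on the slide.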
Recognition Test
Example of a grammar and network
• Grammar
$digit = 영 | 일 | 이 | 삼 | 사 | 오 | 육 | 칠 | 팔 | 구 ;
$name = [박] 산다라 | [태] 현 ;
{ sil ( <$digit> 번에 | $name 에게 ) 전화 ( 해 | 걸어 ) sil }
• Network
• Grammar rules
$ : variable
{} : zero or more repetitions
<> : one or more repetitions
[] : optional
Recognition Test
• Generation of dic/pbwNet
• HParse
– input : dic/pbwGram, optionally -C configs/config
– output : dic/pbwNet
HParse dic/pbwGram dic/pbwNet
or
HParse -C configs/config dic/pbwGram dic/pbwNet
HVite -C configs/config -S scripts/test.scp -H wordHmms/m5/hmmdefs -w dic/pbwNet -i mlfs/recOutWordm5.mlf dic/dict modelList/wordList
Recognition Test
Writing a grammar file (dic/pbwGram)
HParse -C configs/config dic/pbwGram dic/pbwNet
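The contents of dic/pbwGram are not reproduced in the transcript. As an illustration only, an isolated-word task of this kind could use a grammar in which one variable lists every word as an alternative and the top-level expression wraps it in silence. The sketch below generates such a file in HParse notation; the variable name `$word`, the function name and the overall grammar shape are assumptions, not the author's actual grammar.

```python
def isolated_word_grammar(words, silence="sil"):
    """Build an HParse-style grammar: silence, exactly one word, silence."""
    vocabulary = [w for w in words if w != silence]  # silence is not a choice
    alternatives = " | ".join(vocabulary)
    return (
        f"$word = {alternatives} ;\n"
        f"( {silence} $word {silence} )\n"
    )


if __name__ == "__main__":
    print(isolated_word_grammar(["sil", "청와대", "컴퓨터", "그에게"]), end="")
```

The generated text would then be saved as the grammar file and passed to HParse as shown above to obtain the word network.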
Recognition Test
• dic/pbwNet, configs/config
HParse -C configs/config dic/pbwGram dic/pbwNet
HVite -C configs/config -S scripts/test.scp -H wordHmms/m5/hmmdefs -w dic/pbwNet -i mlfs/recOutWordm5.mlf dic/dict modelList/wordList
Recognition Test
• configs/config
• scripts/test.scp, modelList/wordList
HVite -C configs/config -S scripts/test.scp -H wordHmms/m5/hmmdefs -w dic/pbwNet -i mlfs/recOutWordm5.mlf dic/dict modelList/wordList
Recognition Test
• mlfs/recOutWordm5.mlf (master label file format)
HVite -C configs/config -S scripts/test.scp -H wordHmms/m5/hmmdefs -w dic/pbwNet -i mlfs/recOutWordm5.mlf dic/dict modelList/wordList
Recognition Test
• Analyze the recognition result
• Sentence level (H: hits, D: deletions, S: substitutions, I: insertions, N: total)
– 452 words × 7 sets = 3164 utterances
– Correct rate : 2998/3164 ≈ 0.948
• Word level (each utterance counts as three labels: sil, word, sil)
– 452 words × 3 × 7 sets = 9492 labels
– Correct rate : 9326/9492 ≈ 0.983
HResults -I mlfs/words.mlf modelList/wordList mlfs/recOutWordm5.mlf
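The two correct rates follow directly from the counts on the slide, and recomputing them makes the relationship between the sentence-level and word-level figures visible. The short Python check below uses only the numbers quoted above; the function name is illustrative.

```python
def correct_rate(hits, total):
    """Fraction of correctly recognized items."""
    return hits / total


if __name__ == "__main__":
    utterances = 452 * 7           # one word per utterance
    labels = utterances * 3        # sil + word + sil per utterance
    print("utterances:", utterances, "labels:", labels)
    print("sentence-level rate:", round(correct_rate(2998, utterances), 3))
    print("word-level rate:", round(correct_rate(9326, labels), 3))
    print("sentence errors:", utterances - 2998, "label errors:", labels - 9326)
```

Both levels show the same number of errors (166), which is what one would expect if each misrecognized utterance contributes exactly one wrong label while its two sil labels are still counted as correct.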
Summary
• Feature vector generation
– input: Data/pbw****.wav, configs/Hcopy.config, scripts/Hcopy.scp
– tool: HCopy.exe
– output: Data/pbw****.mfc
• Generation of an initial hmmdefs (master macro file)
– input: wordHmms/proto, scripts/train.scp, configs/config
– tool: HCompV.exe
– output: wordHmms/m0/proto, wordHmms/m0/vFloors
– => added manually: wordHmms/m0/hmmdefs, wordHmms/m0/macros
• Training
– input: wordHmms/m0/hmmdefs, scripts/train.scp, modelList/wordList, mlfs/words.mlf, configs/config
– tool: HERest.exe
– output: wordHmms/m1/hmmdefs
• Repeat training 4 more times to generate wordHmms/m5/hmmdefs
Summary (cont.)
• Generation of the word network
– input: dic/pbwGram, configs/config
– tool: HParse.exe
– output: dic/pbwNet
• Recognition test
– input: scripts/test.scp, wordHmms/m5/hmmdefs, dic/dict, modelList/wordList, configs/config
– tool: HVite.exe
– output: mlfs/recOutWordm5.mlf
• Analyze the recognition result
– input: mlfs/words.mlf, modelList/wordList, mlfs/recOutWordm5.mlf
– tool: HResults.exe
[Figure: folder tree for this example]
binwin32: HCopy.exe, HCompV.exe, …
Data: pbw1001.wav, pbw1001.mfc, pbw1002.wav, …
configs: config, Hcopy.config
dic: dict, pbwGram, pbwNet
mlfs: words.mlf, recOutWordm5.mlf
modelList: wordList
scripts: Hcopy.scp, train.scp, test.scp
wordHmms: proto
m0: hmmdefs, proto, vFloors
m1 to m5: hmmdefs
Folder and Files for this example
• configs
– Hcopy.config //handwritten
– config //handwritten; copy of Hcopy.config with TARGETKIND changed to MFCC_0_D_A
• scripts
– Hcopy.scp //handwritten: pbw1001.wav pbw1001.mfc
– train.scp //handwritten: ..mfc
• mlfs
– words.mlf //handwritten: "*/pbw1001.lab" sil 청와대 sil .
– recOutWordm5.mlf //recognition output
• modelList
– wordList //handwritten: sil 청와대 컴퓨터 그에게 …
• wordHmms
– proto //handwritten, HMM topology
– m0: vFloors, proto, hmmdefs //hmmdefs is handwritten
– m1: hmmdefs
– …
– m5: hmmdefs
• dic
– dict //word [outsym] models
– pbwGram
– pbwNet