DSP HW2-1: HMM Training and Testing
Professor: 李琳山 (Lin-shan Lee)
TA: 王君璇
Outline
1. Introduction
2. Hidden Markov Model Toolkit (HTK)
3. Homework Problems
4. Submission Requirements
Introduction
● Construct a digit recognizer
  - monophone
  ling | yi | er | san | si | wu | liu | qi | ba | jiu
● Free HMM tools: Hidden Markov Model Toolkit (HTK), http://htk.eng.cam.ac.uk/
● Training data, testing data, scripts, and other resources are all available at http://speech.ee.ntu.edu.tw/DSP2019Spring/
Flowchart
Hidden Markov Model Toolkit (HTK)
Feature Extraction
Feature Extraction - HCopy
Convert wave files to 39-dimensional MFCC features.
-C lib/hcopy.cfg
● input and output format
● parameters of feature extraction
● Chapter 7 - Speech Signals and Front-end Processing
-S scripts/training_hcopy.scp
● a mapping from input file name to output file name
  speechdata/training/N110022.wav
  MFCC/training/N110022.mfc
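As a rough illustration of the mapping file, the sketch below builds one "source target" line per waveform, which is the layout an HCopy script file uses. The helper names `hcopy_scp_lines` and `feature_dimension` are hypothetical. The 39-dimension breakdown assumes 12 cepstral coefficients plus energy, each with delta and acceleration terms; check lib/hcopy.cfg for the actual settings.

```python
from pathlib import PurePosixPath


def hcopy_scp_lines(wav_paths, out_dir="MFCC/training"):
    """Return one 'source target' line per waveform (hypothetical helper)."""
    lines = []
    for wav in wav_paths:
        wav = PurePosixPath(wav)
        # Same base name, new directory, .mfc extension.
        target = PurePosixPath(out_dir) / (wav.stem + ".mfc")
        lines.append(f"{wav} {target}")
    return lines


def feature_dimension(num_cepstra=12):
    """Static features (cepstra + energy), plus deltas and accelerations.

    num_cepstra=12 is an assumption, not read from lib/hcopy.cfg.
    """
    static = num_cepstra + 1
    return static * 3


if __name__ == "__main__":
    for line in hcopy_scp_lines(["speechdata/training/N110022.wav"]):
        print(line)
    print(feature_dimension())
```

The homework package already ships scripts/training_hcopy.scp, so this is only meant to show what each line of that file encodes.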
Training Flowchart
Initialize model - HCompV
Compute global mean and variance of features
-C lib/config.cfg
● set format of input feature (MFCC_Z_E_D_A)
-o hmmdef -M hmm
● set output name: hmm/hmmdef
-S scripts/training.scp
● a list of training data
lib/proto
● a description of an HMM model, in HTK MMF format
⇨ you can modify the model format here (# states)!
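The idea behind this initialization (often called a flat start) can be sketched in a few lines of Python: pool every frame of the training data, compute one mean and one variance per feature dimension, and copy them into every emitting state. The function names and the dictionary layout are hypothetical, and the use of the population variance (dividing by N) is an assumption rather than a statement about HCompV's internals.

```python
def global_mean_variance(frames):
    """Per-dimension mean and population variance over all pooled frames."""
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    var = [sum((f[d] - mean[d]) ** 2 for f in frames) / n for d in range(dim)]
    return mean, var


def flat_start(frames, num_emitting_states):
    """Give every emitting state the same global mean and variance."""
    mean, var = global_mean_variance(frames)
    return [{"mean": list(mean), "var": list(var)}
            for _ in range(num_emitting_states)]


if __name__ == "__main__":
    toy_frames = [[1.0, 2.0], [3.0, 6.0]]  # two 2-dimensional frames
    for state in flat_start(toy_frames, 3):
        print(state)
```

Because all states start identical, the later re-estimation passes are what make the states differ from one another.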
Initial MMF Prototype
MMF: HTK Book chapter 7
Initial HMM
● bin/macro
  Produce an MMF that contains vFloor
● bin/models_1mixsil
  add silence HMM
hmm/hmmdef
hmm/models
Training Flowchart
Adjust HMMs - HERest
Basic Problem 3 for HMM
● Given O and an initial model λ = (A, B, π), adjust λ to maximize P(O|λ)
Adjust HMMs - HERest
Adjust parameters λ to maximize P(O|λ)
● one iteration of the EM algorithm
● run this command three times => three iterations
-I labels/Clean08TR.mlf
● set label file to "labels/Clean08TR.mlf"
-o lib/models.lst
● a list of word models (liN (零, zero), #i (一, one), #er (二, two), … jiou (九, nine), sil)
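To make "one iteration of the EM algorithm" concrete, here is a minimal Baum-Welch sketch in Python. It is a simplification under stated assumptions: a single observation sequence, discrete observation probabilities in B instead of the Gaussian mixtures HTK uses, and no scaling or log arithmetic, so it only suits short toy sequences. All function names are hypothetical.

```python
def forward(A, B, pi, obs):
    """alpha[t][i] = P(o_1..o_t, q_t = i | lambda)."""
    n = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(n)) * B[j][o]
                      for j in range(n)])
    return alpha


def backward(A, B, obs):
    """beta[t][i] = P(o_{t+1}..o_T | q_t = i, lambda)."""
    n = len(A)
    beta = [[1.0] * n]
    for o in reversed(obs[1:]):
        nxt = beta[0]
        beta.insert(0, [sum(A[i][j] * B[j][o] * nxt[j] for j in range(n))
                        for i in range(n)])
    return beta


def likelihood(A, B, pi, obs):
    """P(O | lambda), summed over the final forward probabilities."""
    return sum(forward(A, B, pi, obs)[-1])


def baum_welch_step(A, B, pi, obs):
    """One EM iteration: returns re-estimated (A, B, pi)."""
    n, m, T = len(pi), len(B[0]), len(obs)
    alpha, beta = forward(A, B, pi, obs), backward(A, B, obs)
    p_obs = sum(alpha[-1])
    # gamma[t][i]: probability of being in state i at time t.
    gamma = [[alpha[t][i] * beta[t][i] / p_obs for i in range(n)]
             for t in range(T)]
    # xi[t][i][j]: probability of the transition i -> j at time t.
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / p_obs
            for j in range(n)] for i in range(n)] for t in range(T - 1)]
    new_pi = gamma[0][:]
    new_A = [[sum(xi[t][i][j] for t in range(T - 1)) /
              sum(gamma[t][i] for t in range(T - 1))
              for j in range(n)] for i in range(n)]
    new_B = [[sum(gamma[t][i] for t in range(T) if obs[t] == k) /
              sum(gamma[t][i] for t in range(T))
              for k in range(m)] for i in range(n)]
    return new_A, new_B, new_pi


if __name__ == "__main__":
    A = [[0.7, 0.3], [0.4, 0.6]]
    B = [[0.5, 0.5], [0.1, 0.9]]
    pi = [0.6, 0.4]
    obs = [0, 1, 1, 0, 1]
    for iteration in range(3):  # three passes, as on the slide
        print(iteration, likelihood(A, B, pi, obs))
        A, B, pi = baum_welch_step(A, B, pi, obs)
    print("final", likelihood(A, B, pi, obs))
```

Each pass should leave P(O|λ) no lower than before, which is the property that makes repeating the command worthwhile.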
Add SP Model
Add an "sp" (short pause) HMM definition to the MMF file "hmm/hmmdef"
Modify HMMs - HHEd
lib/sil1.hed
● a list of commands to modify HMM definitions
lib/models_sp.lst
● a new list of models (liN (零, zero), #i (一, one), #er (二, two), … jiou (九, nine), sil, sp)
Training Flowchart
Adjust HMMs Again - HERest
Increase Number of Mixtures - HHEd
Modification of Models
You can modify the number of Gaussian mixtures here.
This value tells HTK to change the number of mixtures for states 2 through 4. If you want to change the number of states, check lib/proto.
You can increase the number of Gaussian mixtures here.
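What "increasing the number of mixtures" does to a state can be sketched as follows. This follows the description of mixture splitting in the HTK Book as I understand it: copy a component, halve the weight of both copies, and shift the two means by plus and minus 0.2 standard deviations. Choosing the heaviest component at each step is a simplification, and the function names and dictionary layout are hypothetical.

```python
import math


def split_heaviest(mixture):
    """Split the component with the largest weight into two.

    mixture: list of dicts with 'weight', 'mean', 'var' (diagonal covariance).
    """
    idx = max(range(len(mixture)), key=lambda k: mixture[k]["weight"])
    comp = mixture[idx]
    half = comp["weight"] / 2.0
    # Perturb each mean dimension by 0.2 standard deviations.
    offset = [0.2 * math.sqrt(v) for v in comp["var"]]
    up = {"weight": half,
          "mean": [m + o for m, o in zip(comp["mean"], offset)],
          "var": list(comp["var"])}
    down = {"weight": half,
            "mean": [m - o for m, o in zip(comp["mean"], offset)],
            "var": list(comp["var"])}
    return mixture[:idx] + [up, down] + mixture[idx + 1:]


def increase_mixtures(mixture, target):
    """Keep splitting until the state has `target` components."""
    while len(mixture) < target:
        mixture = split_heaviest(mixture)
    return mixture


if __name__ == "__main__":
    state = [{"weight": 1.0, "mean": [0.0], "var": [4.0]}]
    for comp in increase_mixtures(state, 3):
        print(comp)
```

Freshly split components are near-duplicates of each other, so they need re-estimation passes to move apart before any further splitting is useful.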
Adjust HMMs Again - HERest
Training Flowchart
Hint: Increase the number of mixtures little by little!
Testing Flowchart
Construct Word Net - HParse
lib/grammar_sp
● regular expression
● easy for the user to construct
lib/wdnet_sp
● output word net
● the format that HTK understands
Viterbi Search - HVite
-w lib/wdnet_sp
● input word net
-i result/result.mlf
● output MLF file
lib/dict
● dictionary: a mapping from words to phone sequences
  ling -> liN, er -> #er, … 一 -> sic_ii, 七 -> chi_ii
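For readers who want to see the search itself, the following is a minimal Viterbi sketch in Python for a small discrete HMM. It is not how HVite is implemented: HVite searches over the word network with log probabilities, whereas this toy version works on a single model with plain probabilities. The function name and the example parameters are hypothetical.

```python
def viterbi(A, B, pi, obs):
    """Return (best state path, probability of that path)."""
    n = len(pi)
    # delta[i]: probability of the best path ending in state i so far.
    delta = [pi[i] * B[i][obs[0]] for i in range(n)]
    back = []  # back[t][j]: best predecessor of state j at step t + 1
    for o in obs[1:]:
        step, new_delta = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: delta[i] * A[i][j])
            step.append(best_i)
            new_delta.append(delta[best_i] * A[best_i][j] * B[j][o])
        back.append(step)
        delta = new_delta
    # Trace back from the most probable final state.
    state = max(range(n), key=lambda i: delta[i])
    best_prob = delta[state]
    path = [state]
    for step in reversed(back):
        state = step[state]
        path.append(state)
    path.reverse()
    return path, best_prob


if __name__ == "__main__":
    A = [[0.7, 0.3], [0.4, 0.6]]
    B = [[0.9, 0.1], [0.2, 0.8]]
    pi = [0.5, 0.5]
    path, prob = viterbi(A, B, pi, [0, 0, 1, 1])
    print(path, prob)
```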
Compared With Answer - HResults
Longest Common Subsequence (LCS)
Ref: See HTK Book 3.2.2 (p. 33)
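The scoring step can be sketched as a dynamic-programming alignment between the reference and recognized label sequences, followed by the two figures HResults reports: percent correct, H/N, and accuracy, (H - I)/N, where N is the number of reference labels, H the hits, and I the insertions. The penalties used here (substitution 10, deletion 7, insertion 7) are my recollection of the HTK Book and should be treated as an assumption; the function names are hypothetical.

```python
def align_counts(ref, hyp, sub_cost=10, del_cost=7, ins_cost=7):
    """Return (hits, deletions, substitutions, insertions)."""
    rows, cols = len(ref) + 1, len(hyp) + 1
    # table[i][j] = (penalty, hits, dels, subs, ins) for ref[:i] vs hyp[:j]
    table = [[None] * cols for _ in range(rows)]
    table[0][0] = (0, 0, 0, 0, 0)
    for i in range(rows):
        for j in range(cols):
            if i == 0 and j == 0:
                continue
            options = []
            if i > 0:  # reference label with no counterpart: deletion
                c, h, d, s, n = table[i - 1][j]
                options.append((c + del_cost, h, d + 1, s, n))
            if j > 0:  # extra recognized label: insertion
                c, h, d, s, n = table[i][j - 1]
                options.append((c + ins_cost, h, d, s, n + 1))
            if i > 0 and j > 0:  # paired labels: hit or substitution
                c, h, d, s, n = table[i - 1][j - 1]
                if ref[i - 1] == hyp[j - 1]:
                    options.append((c, h + 1, d, s, n))
                else:
                    options.append((c + sub_cost, h, d, s + 1, n))
            table[i][j] = min(options, key=lambda opt: opt[0])
    _, hits, dels, subs, ins = table[-1][-1]
    return hits, dels, subs, ins


def percent_correct(ref, hyp):
    hits, _, _, _ = align_counts(ref, hyp)
    return 100.0 * hits / len(ref)


def accuracy(ref, hyp):
    hits, _, _, ins = align_counts(ref, hyp)
    return 100.0 * (hits - ins) / len(ref)


if __name__ == "__main__":
    ref = ["yi", "er", "san", "si"]
    hyp = ["yi", "san", "si", "wu"]
    print(align_counts(ref, hyp), percent_correct(ref, hyp), accuracy(ref, hyp))
```

Because insertions are subtracted, accuracy can be lower than percent correct, and the homework thresholds refer to accuracy.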
Report - Part 1 (40%) - Run Baseline
1. Download the HTK tools (recommended: compiled binary) and the homework package
2. Set PATH for the HTK tools: set_htk_path.sh
3. Execute (bash shell scripts)
   01_run_HCopy.sh
   02_run_HCompV.sh
   03_training.sh
   04_testing.sh
Report - Part 1 (40%) - Run Baseline (cont.)
4. You can find the accuracy in "result/accuracy"; the baseline accuracy is 74.34%
5. Put a screenshot of your result in the report.
Useful tips
1. To unzip files
   unzip XXXX.zip
   tar -zxvf XXXX.tar.gz
2. To set the path in "set_htk_path.sh"
   PATH=$PATH:"$HOME/XXXX/XXXX"
3. In case a shell script is not permitted to run…
   chmod 744 XXXX.sh
Useful tips (cont.)
4. If you encounter "No such file or directory" on the compiled binary files, it is because you are trying to run a 32-bit binary on a 64-bit system that doesn't have 32-bit support installed. You may need to install library packages such as libc6:i386, libncurses5:i386, and libstdc++6:i386.
Report - Part 2 (40%) - Improve Accuracy
● Acc > 95% for full credit; 90~95% for partial credit,
  and put a screenshot of your result in the report.
proto, 03_training.sh, mix2_10.hed...
Part 2 - Attention 1
● Executing 03_training.sh twice is different from doubling the number of training iterations. To increase the number of training iterations, please modify the script rather than running it many times.
Part 2 - Attention 2
● Every time you modify any parameter or file, you should run 00_clean_all.sh to remove all the files that were produced before, and restart all the procedures. If not, the new settings will be applied to the previous files, and hence you will not be able to analyze the new results. (Of course, you should record your current results before starting the next experiment.)
Report - Part 3 (30%)
● Write a report describing your training process and accuracy.
  Number of states, Gaussian mixtures, iterations, …
  How some changes affect the performance
  Other interesting discoveries
● A well-written report may get a +10% bonus.
Submission Requirements
1. 4 shell scripts
   your modified 01~04_XXXX.sh
2. 1 accuracy file with only your best accuracy (The baseline result is not needed.)
3. proto, mix2_10.hed
   your modified HMM prototype and the file which specifies the number of GMMs of each state
4. hw2-1_bXXXXXXXX.pdf
   screenshots of the baseline and the best result, or other interesting findings.
Submission Requirements (cont.)
5. Put those 8 files in a folder, compress the folder into 1 zip file, and upload it to CEIBA.
● Folder name should be bXXXXXXXX (e.g. b04901000 or r07922000)
● .zip only
● 20% of the final score will be taken off for wrong format
6. Deadline: 2019/5/3 23:59:59
● Late penalty: 10% off every 24 hours after the deadline (less than 24 hours will be viewed as 24 hours).
● Submissions after 3 days will get zero points.
If you have any problem…
● Check for hints on Linux and shell scripts, e.g. 鳥哥 (VBird).
● Check the HTK Book.
● Ask friends who are familiar with Linux commands or Cygwin. (link: how to HTK on Cygwin)
Contact TA
● email: [email protected]
  title: [HW2-1] Problem Description
● Office Hour: Monday 14:30-15:30, EE Building II (電二), Room 531, 王君璇
  (Please send an email before coming!)