text prompted remote speaker authentication : joint speech and speaker recognition/verification...
DESCRIPTION
Joint speech and speaker recognition using a Hidden Markov Model with Vector Quantization (HMM/VQ) for speaker-independent speech recognition and a Gaussian Mixture Model (GMM) for text-independent speaker recognition. MFCC (Mel-Frequency Cepstral Coefficients) with delta, delta-delta and energy features (39 coefficients) were used for feature extraction. Developed in Java with a client/server architecture; the web interface was developed in Adobe Flex. This project was done at TU, IOE - Pulchowk Campus, Nepal. For more details visit http://ganeshtiwaridotcomdotnp.blogspot.com

ABSTRACT OF PROJECT

A biometric is a physical characteristic unique to each individual. It has a very useful application in authentication and access control.

The designed system is a text-prompted version of a voice biometric which incorporates a text-independent speaker verification system and a speaker-independent speech verification system, implemented independently. The foundation for this joint system is that the speech signal conveys both the speech content and the speaker identity. Such systems are more secure against playback attacks, since the word to speak during authentication is not set in advance.

During the course of the project, various digital signal processing and pattern classification algorithms were studied. Short-time spectral analysis was performed to obtain MFCC, energy and their deltas as features. The feature extraction module is the same for both systems. Speaker modeling was done with GMM, and a left-to-right discrete HMM with VQ was used for isolated word modeling. The results of both systems were combined to authenticate the user.

The speech model for each word was pre-trained using utterances of 45 English words. The speaker models were trained on about 2 minutes of speech from each of 15 speakers. For individual word utterances, the recognition rate of the speech recognition system is 92% and that of the speaker recognition system is 66%. For longer utterances (> 5 s), the recognition rate of the speaker recognition system improves to 78%.

TRANSCRIPT
![Page 1: Text Prompted Remote Speaker Authentication : Joint Speech and Speaker Recognition/Verification System Final Presentation Slide](https://reader034.vdocuments.site/reader034/viewer/2022051323/54825bc8b07959290c8b47a6/html5/thumbnails/1.jpg)
MAJOR PROJECT FINAL PRESENTATION :
TEXT PROMPTED REMOTE
SPEAKER AUTHENTICATION
Project Members:
Ganesh Tiwari (75010)
Madhav Pandey(75014)
Manoj Shrestha(75018)
Project Supervisor :
Dr. Subarna Shakya
Associate Professor
Internal Examiner:
Er. Manoj Ghimire
External Examiner
Er. Bimal Acharya
Tribhuvan University
Institute of Engineering
Pulchowk Campus
Department of Electronics and Computer Engineering
![Page 2: Text Prompted Remote Speaker Authentication : Joint Speech and Speaker Recognition/Verification System Final Presentation Slide](https://reader034.vdocuments.site/reader034/viewer/2022051323/54825bc8b07959290c8b47a6/html5/thumbnails/2.jpg)
INTRODUCTION
Voice biometric system
User login
Text-Prompted system
Claimant is asked to speak a prompted (random) text
Speech and Speaker Recognition
Why Text prompted ?
Playback attack
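The text-prompted login idea above can be sketched in a few lines of Java. This is only an illustration: the class and method names are hypothetical, and the rule that both checks must pass (a logical AND) is an assumption, since the slides say only that the results of both systems were combined.

```java
import java.util.List;
import java.util.Random;

// Hypothetical sketch of a text-prompted login decision (names are illustrative).
final class TextPromptedLogin {

    /** Picks a random word from the vocabulary to prompt the claimant with. */
    static String choosePrompt(List<String> vocabulary, Random rng) {
        return vocabulary.get(rng.nextInt(vocabulary.size()));
    }

    /**
     * Assumed combination rule: accept only if the recognized word matches the
     * prompt AND the speaker verifier accepted the claimed identity.
     */
    static boolean authenticate(String promptedWord, String recognizedWord,
                                boolean speakerAccepted) {
        return speakerAccepted && promptedWord.equalsIgnoreCase(recognizedWord);
    }
}
```

Because the prompt changes on every attempt, a recording of an earlier session is unlikely to contain the requested word, which is the playback-attack argument made on this slide. For real use, passing a `java.security.SecureRandom` (a subclass of `Random`) would make the prompt harder to predict.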
![Page 3: Text Prompted Remote Speaker Authentication : Joint Speech and Speaker Recognition/Verification System Final Presentation Slide](https://reader034.vdocuments.site/reader034/viewer/2022051323/54825bc8b07959290c8b47a6/html5/thumbnails/3.jpg)
OUR SYSTEM
Feature : MFCC
Modeling and Classifications : both statistical
GMM - Speaker Modeling :
HMM/VQ - Speech Modeling :
![Page 4: Text Prompted Remote Speaker Authentication : Joint Speech and Speaker Recognition/Verification System Final Presentation Slide](https://reader034.vdocuments.site/reader034/viewer/2022051323/54825bc8b07959290c8b47a6/html5/thumbnails/4.jpg)
PROPERTIES OF SPEECH SIGNAL
Carries both Speech Content and Speaker identity
What makes Speech Signal Unique ?
Each phoneme resonates at its own fundamental frequency
and harmonics of it
Studied over short period : short time spectral analysis
What is Speaker Dependent information ?
Fundamental frequency, primarily
function of the dimensions and tension of the vocal cords
size and shape of the mouth, throat, nose, and teeth
Studied over long period : all the variations from that speaker
![Page 5: Text Prompted Remote Speaker Authentication : Joint Speech and Speaker Recognition/Verification System Final Presentation Slide](https://reader034.vdocuments.site/reader034/viewer/2022051323/54825bc8b07959290c8b47a6/html5/thumbnails/5.jpg)
UNIQUENESS IN PHONEME
[Figure: waveforms of Phoneme /ah/ and Phoneme /i:/; x-axis: Samples (0 to 2500), y-axis: Amplitude]
![Page 6: Text Prompted Remote Speaker Authentication : Joint Speech and Speaker Recognition/Verification System Final Presentation Slide](https://reader034.vdocuments.site/reader034/viewer/2022051323/54825bc8b07959290c8b47a6/html5/thumbnails/6.jpg)
Pre-Processing and Feature Extraction
![Page 7: Text Prompted Remote Speaker Authentication : Joint Speech and Speaker Recognition/Verification System Final Presentation Slide](https://reader034.vdocuments.site/reader034/viewer/2022051323/54825bc8b07959290c8b47a6/html5/thumbnails/7.jpg)
PREPROCESSING : STEPS
1) Silence Removal
[Figure: waveform panels "Silence Signal" (about 9 x 10^4 samples) and "Silence Removed" (about 4 x 10^4 samples); amplitude from -1 to 1]
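The slides do not say how silence was detected. A minimal sketch, assuming a simple short-time energy threshold over non-overlapping frames (the class name, frame length and threshold are all hypothetical), could look like this:

```java
import java.util.Arrays;

// Hypothetical energy-based silence removal; the method used in the project is not stated.
final class SilenceRemover {

    /**
     * Keeps only the frames whose mean squared amplitude reaches the threshold.
     * A trailing partial frame (shorter than frameLength) is dropped.
     */
    static double[] removeSilence(double[] signal, int frameLength, double energyThreshold) {
        double[] kept = new double[signal.length];
        int keptCount = 0;
        for (int start = 0; start + frameLength <= signal.length; start += frameLength) {
            double energy = 0.0;
            for (int i = start; i < start + frameLength; i++) {
                energy += signal[i] * signal[i];
            }
            energy /= frameLength;
            if (energy >= energyThreshold) {
                System.arraycopy(signal, start, kept, keptCount, frameLength);
                keptCount += frameLength;
            }
        }
        return Arrays.copyOf(kept, keptCount);
    }
}
```

In practice the threshold is usually tuned to the recording conditions, for example from the energy of a segment known to contain only background noise.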
![Page 8: Text Prompted Remote Speaker Authentication : Joint Speech and Speaker Recognition/Verification System Final Presentation Slide](https://reader034.vdocuments.site/reader034/viewer/2022051323/54825bc8b07959290c8b47a6/html5/thumbnails/8.jpg)
PREPROCESSING : STEPS (CONTD..)
1) Silence Removal 2) Pre-Emphasis
[Figure: magnitude spectra |Y(f)| versus Frequency (Hz), 0 to 12000 Hz; panels labelled "Boosted high Frequencies" and "Suppressed high Frequencies"]
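Pre-emphasis is commonly implemented as the first-order filter y[n] = x[n] - a * x[n-1], which boosts high frequencies relative to low ones. The sketch below assumes that form; the coefficient is a parameter because the slides do not give the value used (values near 0.95 to 0.97 are typical).

```java
// Sketch of a first-order pre-emphasis filter; the coefficient used in the project is not stated.
final class PreEmphasis {

    /** Returns y where y[0] = x[0] and y[n] = x[n] - alpha * x[n-1] for n >= 1. */
    static double[] apply(double[] x, double alpha) {
        double[] y = new double[x.length];
        if (x.length == 0) {
            return y;
        }
        y[0] = x[0];
        for (int n = 1; n < x.length; n++) {
            y[n] = x[n] - alpha * x[n - 1];
        }
        return y;
    }
}
```

A constant (zero-frequency) input is almost cancelled by this filter, while rapidly alternating samples are amplified, which matches the "boosted high frequencies" label in the figure.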
![Page 9: Text Prompted Remote Speaker Authentication : Joint Speech and Speaker Recognition/Verification System Final Presentation Slide](https://reader034.vdocuments.site/reader034/viewer/2022051323/54825bc8b07959290c8b47a6/html5/thumbnails/9.jpg)
PREPROCESSING : STEPS (CONTD..)
1) Silence Removal 2) Pre-Emphasis 3) Framing
50% overlapped, 23 ms
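Framing with 50% overlap means each new frame starts half a frame after the previous one. A minimal sketch follows; the class name is hypothetical and the sampling rate is left as a parameter because the slides do not state it.

```java
// Sketch of splitting a signal into 50%-overlapped frames.
final class Framer {

    /** Converts a frame duration in milliseconds to a length in samples. */
    static int frameLengthFor(double sampleRateHz, double frameMs) {
        return (int) Math.round(sampleRateHz * frameMs / 1000.0);
    }

    /** Cuts the signal into frames of frameLength samples with a hop of frameLength / 2. */
    static double[][] frame(double[] signal, int frameLength) {
        int hop = frameLength / 2; // 50% overlap
        if (frameLength < 2 || signal.length < frameLength) {
            return new double[0][];
        }
        int count = 1 + (signal.length - frameLength) / hop;
        double[][] frames = new double[count][frameLength];
        for (int f = 0; f < count; f++) {
            System.arraycopy(signal, f * hop, frames[f], 0, frameLength);
        }
        return frames;
    }
}
```

As an example under an assumed 16 kHz sampling rate, a 23 ms frame is 368 samples long and consecutive frames start 184 samples apart. Samples after the last complete frame are ignored in this sketch.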
![Page 10: Text Prompted Remote Speaker Authentication : Joint Speech and Speaker Recognition/Verification System Final Presentation Slide](https://reader034.vdocuments.site/reader034/viewer/2022051323/54825bc8b07959290c8b47a6/html5/thumbnails/10.jpg)
PREPROCESSING : STEPS (CONTD..)
1) Silence Removal 2) Pre-Emphasis 3) Framing 4) Windowing
[Figure: panels "Hamming Window" and "Windowed Signal"]
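The Hamming window is defined as w[n] = 0.54 - 0.46 cos(2πn / (N - 1)) for n = 0, ..., N - 1. The sketch below (hypothetical class name) computes the coefficients and multiplies a frame by them, tapering the frame edges to reduce spectral leakage in the subsequent FFT.

```java
// Sketch of Hamming windowing applied to one frame.
final class HammingWindow {

    /** Returns the N window coefficients w[n] = 0.54 - 0.46 * cos(2*pi*n / (N - 1)). */
    static double[] coefficients(int length) {
        double[] w = new double[length];
        if (length == 1) {
            w[0] = 1.0; // degenerate case, avoids division by zero
            return w;
        }
        for (int n = 0; n < length; n++) {
            w[n] = 0.54 - 0.46 * Math.cos(2.0 * Math.PI * n / (length - 1));
        }
        return w;
    }

    /** Multiplies the frame sample by sample with the window of matching length. */
    static double[] apply(double[] frame) {
        double[] w = coefficients(frame.length);
        double[] out = new double[frame.length];
        for (int n = 0; n < frame.length; n++) {
            out[n] = frame[n] * w[n];
        }
        return out;
    }
}
```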
![Page 11: Text Prompted Remote Speaker Authentication : Joint Speech and Speaker Recognition/Verification System Final Presentation Slide](https://reader034.vdocuments.site/reader034/viewer/2022051323/54825bc8b07959290c8b47a6/html5/thumbnails/11.jpg)
FEATURE EXTRACTION
MFCC : Mel-Frequency Cepstral Coefficients
Perceptual approach
Human Ear processes audio signal in Mel scale
Mel scale : approximately linear up to 1 kHz and logarithmic above 1 kHz
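One widely used formula for the Mel scale is mel = 2595 log10(1 + f / 700). The slides do not say which variant the project used, so the sketch below is an assumption that simply illustrates the near-linear behaviour below 1 kHz and the logarithmic compression above it.

```java
// Sketch of a common Hz <-> Mel conversion (assumed formula, not confirmed by the slides).
final class MelScale {

    static double hzToMel(double hz) {
        return 2595.0 * Math.log10(1.0 + hz / 700.0);
    }

    static double melToHz(double mel) {
        return 700.0 * (Math.pow(10.0, mel / 2595.0) - 1.0);
    }
}
```

With this formula 1000 Hz maps to roughly 1000 mel, and equal steps in mel correspond to progressively wider steps in Hz at higher frequencies. This is why the triangular filters of a Mel filter bank become wider as frequency increases.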
![Page 12: Text Prompted Remote Speaker Authentication : Joint Speech and Speaker Recognition/Verification System Final Presentation Slide](https://reader034.vdocuments.site/reader034/viewer/2022051323/54825bc8b07959290c8b47a6/html5/thumbnails/12.jpg)
MFCC EXTRACTION: (CONTD..)
Steps :
FFT → Mel Filter → Log → DCT → CMS
Mel Filter : 12 Filtering of absolute fft coefficients using triangular filter bank in
Mel scale
MFCC gives the distribution of energy according to the filters in the Mel frequency band
Mel Filter Bank
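The last three steps of the chain (Log, DCT, CMS) can be sketched as below, starting from Mel filter bank energies that are assumed to be already computed. CMS is taken here to mean cepstral mean subtraction, the DCT is the unnormalized DCT-II, and the class name and the small energy floor are illustrative assumptions.

```java
// Sketch of Log -> DCT -> CMS on precomputed Mel filter bank energies.
final class CepstrumSteps {

    /** c[k] = sum_j log(E[j]) * cos(pi * k * (j + 0.5) / M), for k = 0 .. numCoeffs - 1. */
    static double[] logDct(double[] filterEnergies, int numCoeffs) {
        int m = filterEnergies.length;
        double[] c = new double[numCoeffs];
        for (int k = 0; k < numCoeffs; k++) {
            double sum = 0.0;
            for (int j = 0; j < m; j++) {
                double logEnergy = Math.log(Math.max(filterEnergies[j], 1e-12)); // floor avoids log(0)
                sum += logEnergy * Math.cos(Math.PI * k * (j + 0.5) / m);
            }
            c[k] = sum;
        }
        return c;
    }

    /** Subtracts, for every coefficient, its mean over all frames of the utterance. */
    static double[][] cepstralMeanSubtraction(double[][] cepstra) {
        int numFrames = cepstra.length;
        if (numFrames == 0) {
            return new double[0][];
        }
        int dim = cepstra[0].length;
        double[] mean = new double[dim];
        for (double[] frame : cepstra) {
            for (int d = 0; d < dim; d++) {
                mean[d] += frame[d] / numFrames;
            }
        }
        double[][] out = new double[numFrames][dim];
        for (int t = 0; t < numFrames; t++) {
            for (int d = 0; d < dim; d++) {
                out[t][d] = cepstra[t][d] - mean[d];
            }
        }
        return out;
    }
}
```

Implementations differ on whether the zeroth coefficient (k = 0, proportional to overall log energy) is kept or replaced by a separate energy feature. Subtracting the per-utterance mean removes constant channel effects such as a fixed microphone response.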
![Page 13: Text Prompted Remote Speaker Authentication : Joint Speech and Speaker Recognition/Verification System Final Presentation Slide](https://reader034.vdocuments.site/reader034/viewer/2022051323/54825bc8b07959290c8b47a6/html5/thumbnails/13.jpg)
EXTRA FEATURES :ENERGY AND DELTAS
For achieving high recognition rate
Energy Feature
Delta and Delta-Delta
delta → velocity feature
double delta → acceleration feature
Co-articulation
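Delta features approximate the time derivative of each coefficient across neighbouring frames. The sketch below uses the simplest central difference, (c[t+1] - c[t-1]) / 2, with the edge frames repeated. The slides do not state which delta formula or window width the project used, so this is an assumption.

```java
// Sketch of delta (velocity) features using a simple central difference.
final class DeltaFeatures {

    /** Returns one delta vector per frame; first and last frames reuse their own values as neighbours. */
    static double[][] delta(double[][] features) {
        int numFrames = features.length;
        double[][] out = new double[numFrames][];
        for (int t = 0; t < numFrames; t++) {
            double[] prev = features[Math.max(t - 1, 0)];
            double[] next = features[Math.min(t + 1, numFrames - 1)];
            out[t] = new double[features[t].length];
            for (int d = 0; d < out[t].length; d++) {
                out[t][d] = (next[d] - prev[d]) / 2.0;
            }
        }
        return out;
    }
}
```

Delta-delta (acceleration) features follow by applying the same function to the delta output, for example `DeltaFeatures.delta(DeltaFeatures.delta(mfcc))`.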
![Page 14: Text Prompted Remote Speaker Authentication : Joint Speech and Speaker Recognition/Verification System Final Presentation Slide](https://reader034.vdocuments.site/reader034/viewer/2022051323/54825bc8b07959290c8b47a6/html5/thumbnails/14.jpg)
COMPOSITION OF FEATURE VECTOR
12 MFCC Features
12 Δ MFCC
12 Δ Δ MFCC
1 Energy Feature
1 Δ Energy
1 Δ Δ Energy
39 Features from each frame
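The count works out as 3 x 12 + 3 x 1 = 39. A small sketch of assembling one frame's vector in the order listed on this slide (the class name and the ordering inside the array are illustrative choices):

```java
// Sketch: building the 39-dimensional feature vector for one frame.
final class FeatureVector {

    static final int NUM_MFCC = 12;

    /** Layout: 12 MFCC, 12 delta, 12 delta-delta, then energy, delta energy, delta-delta energy. */
    static double[] assemble(double[] mfcc, double[] deltaMfcc, double[] deltaDeltaMfcc,
                             double energy, double deltaEnergy, double deltaDeltaEnergy) {
        if (mfcc.length != NUM_MFCC || deltaMfcc.length != NUM_MFCC
                || deltaDeltaMfcc.length != NUM_MFCC) {
            throw new IllegalArgumentException("each MFCC block must have 12 coefficients");
        }
        double[] v = new double[3 * NUM_MFCC + 3];
        System.arraycopy(mfcc, 0, v, 0, NUM_MFCC);
        System.arraycopy(deltaMfcc, 0, v, NUM_MFCC, NUM_MFCC);
        System.arraycopy(deltaDeltaMfcc, 0, v, 2 * NUM_MFCC, NUM_MFCC);
        v[36] = energy;
        v[37] = deltaEnergy;
        v[38] = deltaDeltaEnergy;
        return v;
    }
}
```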
![Page 15: Text Prompted Remote Speaker Authentication : Joint Speech and Speaker Recognition/Verification System Final Presentation Slide](https://reader034.vdocuments.site/reader034/viewer/2022051323/54825bc8b07959290c8b47a6/html5/thumbnails/15.jpg)
Speech Recognition/Verification by
HMM/VQ
![Page 16: Text Prompted Remote Speaker Authentication : Joint Speech and Speaker Recognition/Verification System Final Presentation Slide](https://reader034.vdocuments.site/reader034/viewer/2022051323/54825bc8b07959290c8b47a6/html5/thumbnails/16.jpg)
HIDDEN MARKOV MODEL (HMM)
HMM is an extension of the Markov process
A Markov process consists of observable states
An HMM has hidden states and observable symbols per state
HMM is a stochastic model
![Page 17: Text Prompted Remote Speaker Authentication : Joint Speech and Speaker Recognition/Verification System Final Presentation Slide](https://reader034.vdocuments.site/reader034/viewer/2022051323/54825bc8b07959290c8b47a6/html5/thumbnails/17.jpg)
HMM (CONTD…)
Parameters
1) The initial state distribution (π)
2) State transition probability distribution (A)
3) Observation symbol probability distribution (B)
The HMM Model
λ = (A, B, π)
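Given the three parameter sets π, A and B, the probability of an observation sequence under a discrete HMM can be computed with the forward algorithm. The sketch below is a plain, unscaled version for short sequences; the class name is hypothetical, and a practical recognizer would use scaling or log probabilities to avoid underflow on long utterances.

```java
// Sketch of the forward algorithm for a discrete HMM with parameters (A, B, pi).
final class DiscreteHmmForward {

    /**
     * @param pi initial state distribution, pi[i]
     * @param a  state transition probabilities, a[i][j] = P(state j at t+1 | state i at t)
     * @param b  observation symbol probabilities, b[j][k] = P(symbol k | state j)
     * @param observations sequence of symbol indices
     * @return P(observations | model)
     */
    static double likelihood(double[] pi, double[][] a, double[][] b, int[] observations) {
        int numStates = pi.length;
        if (observations.length == 0) {
            return 1.0;
        }
        double[] alpha = new double[numStates];
        for (int i = 0; i < numStates; i++) {
            alpha[i] = pi[i] * b[i][observations[0]];
        }
        for (int t = 1; t < observations.length; t++) {
            double[] next = new double[numStates];
            for (int j = 0; j < numStates; j++) {
                double sum = 0.0;
                for (int i = 0; i < numStates; i++) {
                    sum += alpha[i] * a[i][j];
                }
                next[j] = sum * b[j][observations[t]];
            }
            alpha = next;
        }
        double total = 0.0;
        for (double value : alpha) {
            total += value;
        }
        return total;
    }
}
```

For isolated word recognition, one such model is typically trained per word, and the recognized word is the one whose model gives the highest likelihood for the observed symbol sequence.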
![Page 18: Text Prompted Remote Speaker Authentication : Joint Speech and Speaker Recognition/Verification System Final Presentation Slide](https://reader034.vdocuments.site/reader034/viewer/2022051323/54825bc8b07959290c8b47a6/html5/thumbnails/18.jpg)
EXAMPLE:
PRONUNCIATION MODEL OF WORD TOMATO
λ = (A, B, π)
![Page 19: Text Prompted Remote Speaker Authentication : Joint Speech and Speaker Recognition/Verification System Final Presentation Slide](https://reader034.vdocuments.site/reader034/viewer/2022051323/54825bc8b07959290c8b47a6/html5/thumbnails/19.jpg)
HMM IMPLEMENTATION
Feature Vector → observation symbols, 256
Phonemes → hidden states, 6
Left to right HMM
Discrete Hidden Markov Model (DHMM) with
Vector Quantization (VQ) technique
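Vector quantization is what turns each continuous feature vector into one of the 256 discrete observation symbols the DHMM expects: the symbol is the index of the nearest codebook entry. The sketch below assumes squared Euclidean distance and a codebook that has already been trained (for example by k-means or LBG, which the slides do not specify); the class name is hypothetical.

```java
// Sketch of mapping feature vectors to codebook indices (observation symbols).
final class VectorQuantizer {

    /** Returns the index of the codeword closest to the vector in squared Euclidean distance. */
    static int nearestCodeword(double[] vector, double[][] codebook) {
        int best = -1;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int k = 0; k < codebook.length; k++) {
            double distance = 0.0;
            for (int d = 0; d < vector.length; d++) {
                double diff = vector[d] - codebook[k][d];
                distance += diff * diff;
            }
            if (distance < bestDistance) {
                bestDistance = distance;
                best = k;
            }
        }
        return best;
    }

    /** Quantizes every frame of an utterance into a symbol sequence. */
    static int[] quantize(double[][] frames, double[][] codebook) {
        int[] symbols = new int[frames.length];
        for (int t = 0; t < frames.length; t++) {
            symbols[t] = nearestCodeword(frames[t], codebook);
        }
        return symbols;
    }
}
```

With a 256-entry codebook of 39-dimensional codewords, each frame becomes an integer from 0 to 255, and the resulting sequence is what the HMM scores.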
![Page 20: Text Prompted Remote Speaker Authentication : Joint Speech and Speaker Recognition/Verification System Final Presentation Slide](https://reader034.vdocuments.site/reader034/viewer/2022051323/54825bc8b07959290c8b47a6/html5/thumbnails/20.jpg)
SPEECH RECOGNITION SYSTEM
![Page 21: Text Prompted Remote Speaker Authentication : Joint Speech and Speaker Recognition/Verification System Final Presentation Slide](https://reader034.vdocuments.site/reader034/viewer/2022051323/54825bc8b07959290c8b47a6/html5/thumbnails/21.jpg)
VECTOR QUANTIZATION
![Page 22: Text Prompted Remote Speaker Authentication : Joint Speech and Speaker Recognition/Verification System Final Presentation Slide](https://reader034.vdocuments.site/reader034/viewer/2022051323/54825bc8b07959290c8b47a6/html5/thumbnails/22.jpg)
Speaker Recognition/Verification by
GMM
![Page 23: Text Prompted Remote Speaker Authentication : Joint Speech and Speaker Recognition/Verification System Final Presentation Slide](https://reader034.vdocuments.site/reader034/viewer/2022051323/54825bc8b07959290c8b47a6/html5/thumbnails/23.jpg)
SPEAKER VERIFICATION SYSTEM
![Page 24: Text Prompted Remote Speaker Authentication : Joint Speech and Speaker Recognition/Verification System Final Presentation Slide](https://reader034.vdocuments.site/reader034/viewer/2022051323/54825bc8b07959290c8b47a6/html5/thumbnails/24.jpg)
SPEAKER MODELING (GMM)
Gaussian Mixture Model
Parametric probability density function
Based on soft clustering technique
Mixture of Gaussian components
$\lambda = (w_m, \mu_m, C_m)$
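A GMM scores a feature vector as a weighted sum of Gaussian densities, one per mixture component. The sketch below evaluates the log density under the simplifying assumption of diagonal covariance matrices (the slides do not say whether full or diagonal covariances were used); the class name is hypothetical.

```java
// Sketch of evaluating a GMM with diagonal covariances (an assumption) in the log domain.
final class DiagonalGmm {
    private final double[] weights;     // mixture weights, expected to sum to 1
    private final double[][] means;     // component mean vectors
    private final double[][] variances; // diagonal entries of each covariance matrix

    DiagonalGmm(double[] weights, double[][] means, double[][] variances) {
        this.weights = weights;
        this.means = means;
        this.variances = variances;
    }

    /** Returns log p(x | model), using log-sum-exp for numerical stability. */
    double logDensity(double[] x) {
        double[] logTerms = new double[weights.length];
        double max = Double.NEGATIVE_INFINITY;
        for (int m = 0; m < weights.length; m++) {
            double logTerm = Math.log(weights[m]);
            for (int d = 0; d < x.length; d++) {
                double diff = x[d] - means[m][d];
                double variance = variances[m][d];
                logTerm -= 0.5 * (Math.log(2.0 * Math.PI * variance) + diff * diff / variance);
            }
            logTerms[m] = logTerm;
            max = Math.max(max, logTerm);
        }
        double sum = 0.0;
        for (double logTerm : logTerms) {
            sum += Math.exp(logTerm - max);
        }
        return max + Math.log(sum);
    }

    /** Average per-frame log-likelihood of an utterance. */
    double averageLogLikelihood(double[][] frames) {
        double total = 0.0;
        for (double[] frame : frames) {
            total += logDensity(frame);
        }
        return total / frames.length;
    }
}
```

Averaging over frames keeps scores comparable across utterances of different lengths.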
![Page 25: Text Prompted Remote Speaker Authentication : Joint Speech and Speaker Recognition/Verification System Final Presentation Slide](https://reader034.vdocuments.site/reader034/viewer/2022051323/54825bc8b07959290c8b47a6/html5/thumbnails/25.jpg)
SPEAKER MODEL TRAINING
Estimate the model parameters
Expectation Maximization algorithm
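To make the Expectation Maximization idea concrete, here is a sketch of one EM iteration for a one-dimensional GMM. The real speaker models work on 39-dimensional feature vectors, so this is a deliberately reduced illustration; the class name and the variance floor are assumptions.

```java
// Sketch of one EM iteration for a 1-D Gaussian mixture (reduced illustration).
final class GmmEm1D {

    static double gaussian(double x, double mean, double variance) {
        double diff = x - mean;
        return Math.exp(-0.5 * diff * diff / variance) / Math.sqrt(2.0 * Math.PI * variance);
    }

    /** Updates weights, means and variances in place from the data. */
    static void emStep(double[] data, double[] weights, double[] means, double[] variances) {
        int n = data.length;
        int k = weights.length;
        double[][] resp = new double[n][k];

        // E-step: responsibility of each component for each data point.
        for (int i = 0; i < n; i++) {
            double total = 0.0;
            for (int m = 0; m < k; m++) {
                resp[i][m] = weights[m] * gaussian(data[i], means[m], variances[m]);
                total += resp[i][m];
            }
            for (int m = 0; m < k; m++) {
                resp[i][m] /= total;
            }
        }

        // M-step: re-estimate parameters from the soft assignments.
        for (int m = 0; m < k; m++) {
            double count = 0.0;
            double weightedSum = 0.0;
            for (int i = 0; i < n; i++) {
                count += resp[i][m];
                weightedSum += resp[i][m] * data[i];
            }
            double mean = weightedSum / count;
            double weightedSq = 0.0;
            for (int i = 0; i < n; i++) {
                double diff = data[i] - mean;
                weightedSq += resp[i][m] * diff * diff;
            }
            weights[m] = count / n;
            means[m] = mean;
            variances[m] = Math.max(weightedSq / count, 1e-6); // floor avoids collapse
        }
    }
}
```

Calling `emStep` repeatedly until the parameters stop changing gives a local maximum of the likelihood. The soft assignments in the E-step are what the "soft clustering" description of GMMs refers to.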
![Page 26: Text Prompted Remote Speaker Authentication : Joint Speech and Speaker Recognition/Verification System Final Presentation Slide](https://reader034.vdocuments.site/reader034/viewer/2022051323/54825bc8b07959290c8b47a6/html5/thumbnails/26.jpg)
SPEAKER VERIFICATION
Based on likelihood ratio
$\text{likelihood ratio} = \dfrac{\text{likelihood that } S \text{ comes from the speaker's model}}{\text{likelihood that } S \text{ comes from the impostor's model}}$
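In the log domain the ratio becomes a difference of log-likelihoods, which is then compared with a threshold. A minimal sketch follows; the class name is hypothetical, and the threshold value and the way the impostor model is built are not given in the slides.

```java
// Sketch of a likelihood-ratio decision in the log domain.
final class LikelihoodRatioTest {

    /** log(likelihood under claimed speaker) minus log(likelihood under impostor model). */
    static double logLikelihoodRatio(double claimedLogLikelihood, double impostorLogLikelihood) {
        return claimedLogLikelihood - impostorLogLikelihood;
    }

    /** Accepts the identity claim when the log-likelihood ratio reaches the threshold. */
    static boolean accept(double claimedLogLikelihood, double impostorLogLikelihood,
                          double threshold) {
        return logLikelihoodRatio(claimedLogLikelihood, impostorLogLikelihood) >= threshold;
    }
}
```

Raising the threshold reduces false acceptances at the cost of more false rejections, so it is normally tuned on held-out data.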
![Page 27: Text Prompted Remote Speaker Authentication : Joint Speech and Speaker Recognition/Verification System Final Presentation Slide](https://reader034.vdocuments.site/reader034/viewer/2022051323/54825bc8b07959290c8b47a6/html5/thumbnails/27.jpg)
TOOLS USED
Languages: Adobe Flex, Java
RPC: BlazeDS
Servers: Apache Tomcat, MySQL
Versioning: TortoiseSVN
![Page 28: Text Prompted Remote Speaker Authentication : Joint Speech and Speaker Recognition/Verification System Final Presentation Slide](https://reader034.vdocuments.site/reader034/viewer/2022051323/54825bc8b07959290c8b47a6/html5/thumbnails/28.jpg)
OUTPUT : SNAPSHOT (GUI)
![Page 29: Text Prompted Remote Speaker Authentication : Joint Speech and Speaker Recognition/Verification System Final Presentation Slide](https://reader034.vdocuments.site/reader034/viewer/2022051323/54825bc8b07959290c8b47a6/html5/thumbnails/29.jpg)
APPLICATION AREAS
Telephone transaction
Telephone credit card purchase
Telephone stock trading
Access control
Physical facilities
Computer networks
Information retrieval
Customer information
Forensics
Voice sample matching
![Page 30: Text Prompted Remote Speaker Authentication : Joint Speech and Speaker Recognition/Verification System Final Presentation Slide](https://reader034.vdocuments.site/reader034/viewer/2022051323/54825bc8b07959290c8b47a6/html5/thumbnails/30.jpg)
LIMITATION AND FUTURE ENHANCEMENT
Noise reduction
Training on more data
Combine with
other features
other classification methods
![Page 31: Text Prompted Remote Speaker Authentication : Joint Speech and Speaker Recognition/Verification System Final Presentation Slide](https://reader034.vdocuments.site/reader034/viewer/2022051323/54825bc8b07959290c8b47a6/html5/thumbnails/31.jpg)
Thanks
Any queries ?