Speaker Recognition System by Abhishek Mahajan
SHREEJEE INSTITUTE OF TECHNOLOGY AND MANAGEMENT
Speaker Recognition
• Guided by: Mr. Prakash Singh Panwar
• By: Rajpal Singh Chouhan
• EC Branch, 1st Year
What is Speaker Recognition?
Speaker Recognition is the process of automatically recognizing who is speaking on the basis of individual
information included in speech signals.
Speaker Recognition = Speaker Identification + Speaker Verification
Speaker Identification
• Determine the speaker identity.
• Selection between a set of known voices.
• The user does not claim an identity.
Whose voice is this?
Speaker Verification
• Synonyms: authentication, detection.
• User claims an identity.
• System task: accept or reject the identity claim.
Is this Ahmad’s voice?
Model of Speaker Recognizer
Fig. 1: Simple model of a speaker recognizer.
[Figure text: "Hello, Mr. John"; "You are permitted to access"]
The Structure of Speaker Recognizer
Figure 2: Functional scheme of an ASR system.
[Figure blocks: Feature Extraction, Feature Vector, Training Mode (Speaker Modeling), Recognition (Classification, Decision Logic), output Speaker #ID, e.g. Speaker_1]
Speech Signal Analysis: Feature Extraction
• The aim is to extract voice features that distinguish the different phonemes of a language.
[Figure: example feature-vector values]
MFCC extraction
Speech signal x(n) → Pre-emphasis → x'(n) → Window → x_t(n) → DFT → X_t(k) → Mel filter banks → Y_t(m) → Log(| |^2) → IDFT → MFCC y_t^(m)(k)
MFCC stands for Mel-frequency cepstral coefficients, a representation of the short-term power spectrum of a sound used in audio processing.
The MFCCs are the amplitudes of the spectrum that results from the final transform of the log mel-filter-bank outputs.
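The extraction chain on this slide can be sketched for a single frame in plain Python. This is only an illustration under assumptions that the slides do not state: the pre-emphasis factor 0.97, 20 mel filters, 12 coefficients, a slow direct DFT, and a DCT-II standing in for the IDFT step. All function names are hypothetical.

```python
import cmath
import math

def pre_emphasis(x, alpha=0.97):
    """x'(n) = x(n) - alpha * x(n-1); alpha = 0.97 is an assumed value."""
    return [x[0]] + [x[n] - alpha * x[n - 1] for n in range(1, len(x))]

def hamming(frame):
    """Apply a Hamming window to one frame, giving x_t(n)."""
    N = len(frame)
    return [s * (0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)))
            for n, s in enumerate(frame)]

def power_spectrum(frame):
    """|X_t(k)|^2 for k = 0..N/2, using a direct (slow) DFT."""
    N = len(frame)
    out = []
    for k in range(N // 2 + 1):
        X = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
        out.append(abs(X) ** 2)
    return out

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale."""
    n_bins = n_fft // 2 + 1
    top = hz_to_mel(sample_rate / 2)
    # n_filters + 2 edge frequencies, expressed as fractional DFT bin positions
    edges = [mel_to_hz(top * i / (n_filters + 1)) * n_fft / sample_rate
             for i in range(n_filters + 2)]
    bank = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for k in range(n_bins):
            if lo < k <= mid:
                row.append((k - lo) / (mid - lo))   # rising slope
            elif mid < k < hi:
                row.append((hi - k) / (hi - mid))   # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def mfcc_frame(frame, sample_rate, n_filters=20, n_ceps=12):
    """MFCCs of one (already pre-emphasised) frame."""
    spec = power_spectrum(hamming(frame))
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    # Log mel energies; the small constant avoids log(0)
    log_e = [math.log(sum(w * p for w, p in zip(row, spec)) + 1e-10) for row in bank]
    # DCT-II used in place of the IDFT; coefficient 0 is skipped
    return [sum(log_e[m] * math.cos(math.pi * c * (m + 0.5) / n_filters)
                for m in range(n_filters))
            for c in range(1, n_ceps + 1)]

# Usage: one 32 ms frame of a synthetic 440 Hz tone sampled at 8 kHz
sr = 8000
signal = [math.sin(2 * math.pi * 440 * n / sr) for n in range(256)]
coeffs = mfcc_frame(pre_emphasis(signal), sr)
print(len(coeffs), "coefficients for this frame")
```

A real system would slide the window over the whole recording and use an FFT; the direct DFT here only keeps the sketch dependency-free.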
[Figure panels: speech waveform of the phoneme /æ/; after pre-emphasis and Hamming windowing; power spectrum; MFCC]
Speech Signal to Feature Vector
[Figure: speech signal converted to example feature-vector values]
Vector Quantization (VQ)
Aim of VQ:
• Represent large amounts of data by a few prototype vectors.
• Example: identifying similar data and grouping it into clusters.
• Assign each feature vector to the closest prototype w, using a similarity or distance measure (e.g. Euclidean distance).
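As a rough illustration of these points, the sketch below assigns vectors to their nearest prototype by Euclidean distance and builds a small codebook. The slides do not say how the prototypes are trained, so plain k-means with a naive initialisation is assumed here, and the function names and data are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(vector, codebook):
    """Index of the prototype w closest to the feature vector."""
    return min(range(len(codebook)), key=lambda i: euclidean(vector, codebook[i]))

def train_codebook(vectors, k, iterations=10):
    """Plain k-means (an assumed training rule, not taken from the slides)."""
    codebook = [list(v) for v in vectors[:k]]  # naive start: the first k vectors
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            clusters[nearest(v, codebook)].append(v)
        for i, members in enumerate(clusters):
            if members:  # keep the old prototype if its cluster is empty
                codebook[i] = [sum(col) / len(members) for col in zip(*members)]
    return codebook

# Usage with made-up 2-D "feature vectors" forming two obvious groups
data = [(0.0, 0.0), (5.0, 5.0), (0.0, 1.0), (5.0, 6.0), (1.0, 0.0), (6.0, 5.0)]
codebook = train_codebook(data, k=2)
print(codebook)
print(nearest((0.2, 0.1), codebook))
```

Six vectors are summarised by two prototypes, which is the data reduction VQ aims for.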
Database Creation Process
[Figure: Speaker #1, Speaker #2 and Speaker #3 are enrolled in the database; the system responds "Hello, Speaker #1" and "Hello, Speaker #2"]
Speaker Identification
[Figure: an unknown speaker (#?) is compared with database entries #1, #2 and #3; score shown: Speaker 1, 5.94; result: Speaker #1]
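One way to read this slide is that the test utterance is scored against every enrolled speaker and the best score wins. The sketch below assumes the score is the average distance from each feature vector to its nearest prototype in a speaker's VQ codebook, with lower being better; the slides do not state this rule, and the codebooks and names are hypothetical.

```python
import math

def avg_distortion(vectors, codebook):
    """Mean distance from each feature vector to its nearest prototype."""
    return sum(min(math.dist(v, w) for w in codebook) for v in vectors) / len(vectors)

def identify(vectors, codebooks):
    """Return the enrolled speaker with the lowest distortion, plus all scores."""
    scores = {name: avg_distortion(vectors, cb) for name, cb in codebooks.items()}
    return min(scores, key=scores.get), scores

# Hypothetical database of three enrolled speakers (2-D prototypes for brevity)
codebooks = {
    "Speaker_1": [(0.0, 0.0), (1.0, 1.0)],
    "Speaker_2": [(5.0, 5.0), (6.0, 6.0)],
    "Speaker_3": [(10.0, 0.0), (11.0, 1.0)],
}
utterance = [(0.1, 0.0), (0.9, 1.2)]  # feature vectors of the unknown voice

best, scores = identify(utterance, codebooks)
print(best)
```

No identity is claimed by the user; the system simply picks from the set of known voices.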
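Speaker Verification
[Figure: a speaker claiming to be #1 is compared with database entry #1 (database: #1, #2, #3); score shown: Speaker 1, 5.94; a further value, 14, appears in the figure; decision: Accept]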
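Verification differs from identification in that only the claimed speaker's model is consulted. A minimal sketch, assuming the same average-distortion score and a simple threshold rule; the threshold value, codebook and function names are hypothetical and not taken from the slides.

```python
import math

def avg_distortion(vectors, codebook):
    """Mean distance from each feature vector to its nearest prototype."""
    return sum(min(math.dist(v, w) for w in codebook) for v in vectors) / len(vectors)

def verify(vectors, claimed_codebook, threshold):
    """Accept the identity claim if the distortion does not exceed the threshold."""
    score = avg_distortion(vectors, claimed_codebook)
    return score <= threshold, score

claimed = [(0.0, 0.0), (1.0, 1.0)]      # codebook of the claimed speaker
genuine = [(0.0, 0.0), (1.0, 1.0)]      # utterance close to the claimed model
impostor = [(5.0, 5.0), (6.0, 6.0)]     # utterance far from the claimed model

accepted, score = verify(genuine, claimed, threshold=1.0)
rejected_ok, impostor_score = verify(impostor, claimed, threshold=1.0)
print("Accept" if accepted else "Reject")
print("Accept" if rejected_ok else "Reject")
```

The threshold trades false acceptances against false rejections, so in practice it would be tuned on held-out recordings.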
Database Creation Condition
Table 1: Database description.
Parameter             Characteristics
Language              Bangla
No. of speakers       5
Speech type           Sentence reading
Recording condition   A normal room condition
Audio length          60-90 seconds
Audio type            Stereo
Sample format         16-bit PCM
Sampling frequency    8 kHz
Bit rate              1411 kbps
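For uncompressed PCM audio, the bit rate is sampling frequency × bits per sample × number of channels. The short check below (function name hypothetical) applies that formula to the table's settings: 8 kHz, 16-bit stereo works out to 256 kbps, while 1411 kbps matches 44.1 kHz, 16-bit stereo. The slides do not say whether the recordings were resampled, so this is offered only as a consistency check.

```python
def pcm_bit_rate_kbps(sample_rate_hz, bits_per_sample, channels):
    """Uncompressed PCM bit rate in kilobits per second."""
    return sample_rate_hz * bits_per_sample * channels / 1000

rate_8k = pcm_bit_rate_kbps(8000, 16, 2)     # settings listed in the table
rate_44k = pcm_bit_rate_kbps(44100, 16, 2)   # common CD-quality settings
print(rate_8k, rate_44k)
```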
Speaker Recognition Result
Table 3: Test result for speaker recognition system.
Speaker     No. of inputs   Correct   Incorrect   Accuracy
Speaker_1   5               5         0           100%
Speaker_2   9               8         1           88.88%
Speaker_3   6               6         0           100%
Speaker_3   12              11        1           91.67%
Speaker_4   8               8         0           100%
Speaker_5   10              10        0           100%
Total       50              48        2           96%
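The accuracy column is correct / inputs × 100. The snippet below recomputes it from the rows as they appear in the table, including the repeated Speaker_3 label; the variable and function names are illustrative only. Note that 8/9 rounds to 88.89%, whereas the table lists the truncated 88.88%.

```python
# (label, number of inputs, correct) exactly as listed in the table
rows = [
    ("Speaker_1", 5, 5),
    ("Speaker_2", 9, 8),
    ("Speaker_3", 6, 6),
    ("Speaker_3", 12, 11),
    ("Speaker_4", 8, 8),
    ("Speaker_5", 10, 10),
]

def accuracy(correct, total):
    """Accuracy in percent."""
    return 100.0 * correct / total

for label, n_inputs, n_correct in rows:
    print(label, n_inputs, n_correct, n_inputs - n_correct,
          f"{accuracy(n_correct, n_inputs):.2f}%")

total_inputs = sum(n for _, n, _ in rows)
total_correct = sum(c for _, _, c in rows)
overall = accuracy(total_correct, total_inputs)
print("Total", total_inputs, total_correct, total_inputs - total_correct, f"{overall:.2f}%")
```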
Applications
• Transaction authentication
– Toll fraud prevention
– Telephone credit card purchases
– Telephone brokerage (e.g., stock trading)
• Access control
– Physical facilities
– Computers and data networks
• Information retrieval
– Customer information for call centers
– Audio indexing (speech skimming device)
• Forensics
– Voice sample matching