Speaker Recognition System by Abhishek Mahajan

SHREEJEE INSTITUTE OF TECHNOLOGY AND MANAGEMENT

Speaker Recognition

• Guided by: Mr. Prakash Singh Panwar

• By: Rajpal Singh Chouhan

• EC Branch, 1st Year

What is Speaker Recognition?

Speaker recognition is the process of automatically recognizing who is speaking on the basis of individual information included in speech signals.

Speaker Recognition = Speaker Identification + Speaker Verification

Speaker Identification

• Determine the speaker identity.

• Selection between a set of known voices.

• The user does not claim an identity.

Whose voice is this?


Speaker Verification

• Synonyms: authentication, detection.

• The user claims an identity.

• System task: accept or reject the identity claim.

Is this Ahmad’s voice?

Model of Speaker Recognizer

Fig. 1: Simple model of a speaker recognizer.

[Figure labels: "Hello, Mr. John"; "You are permitted to access"]

The Structure of Speaker Recognizer

Figure 2: Functional scheme of an ASR system.

[Diagram blocks: Feature Extraction; Feature Vector; Training Mode; Speaker Modeling; Recognition; Classification; Decision Logic; Speaker #ID (for example, Speaker_1)]

Speech Signal Analysis: Feature Extraction

• The aim is to extract voice features that distinguish the different phonemes of a language.


MFCC extraction

Speech signal x(n) → Pre-emphasis → x’(n) → Window → xt(n) → DFT → Xt(k) → Mel filter banks → Yt(m) → Log(| |^2) → IDFT → MFCC yt(m)(k)

MFCC stands for Mel-frequency cepstral coefficients, a representation of the short-term power spectrum of a sound that is widely used in audio processing.

The MFCCs are the amplitudes of the resulting spectrum.
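The chain of stages above can be sketched in NumPy roughly as follows. This is an illustrative sketch, not the system described in these slides: the frame length (256), hop (128), number of filters (20), number of coefficients (12) and pre-emphasis factor (0.97) are all assumed values, the input is a synthetic tone, and the final stage uses a DCT-II, a common stand-in for the IDFT named on the slide.

```python
import numpy as np

def hz_to_mel(f):
    # Standard mel-scale mapping
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):        # rising edge
            fbank[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):       # falling edge
            fbank[m - 1, k] = (right - k) / (right - center)
    return fbank

def mfcc(signal, sample_rate=8000, frame_len=256, hop=128,
         n_filters=20, n_ceps=12, alpha=0.97):
    # Pre-emphasis: x'(n) = x(n) - alpha * x(n-1)
    emphasized = np.append(signal[0], signal[1:] - alpha * signal[:-1])
    # Split into overlapping frames and apply a Hamming window
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(frame_len)
    # DFT and power spectrum of each frame
    power = np.abs(np.fft.rfft(frames, n=frame_len, axis=1)) ** 2
    # Mel filter bank energies, then logarithm (small offset avoids log(0))
    energies = power @ mel_filterbank(n_filters, frame_len, sample_rate).T
    log_energies = np.log(energies + 1e-10)
    # DCT-II of the log energies (assumed substitute for the IDFT stage)
    basis = np.cos(np.pi * np.outer(np.arange(n_ceps), np.arange(n_filters) + 0.5) / n_filters)
    return log_energies @ basis.T

# Synthetic one-second 440 Hz tone standing in for a speech recording
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
features = mfcc(np.sin(2 * np.pi * 440.0 * t), sample_rate)
print(features.shape)  # (61, 12): one 12-dimensional feature vector per frame
```

Each row of `features` is one feature vector of the kind passed on to speaker modeling.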

[Figure panels: speech waveform of the phoneme “æ”; after pre-emphasis and Hamming windowing; power spectrum; MFCC]

Speech Signal to Feature Vector

[Figure: a speech signal converted into feature vectors, illustrated as rows of numeric values]

Vector Quantization (VQ)

Aim of VQ:

• Representation of large amounts of data by a few prototype vectors.

• Example: identification and grouping of similar data in clusters.

• Each feature vector is assigned to the closest prototype w, using a similarity or distance measure (e.g. Euclidean distance).
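The assignment step can be sketched as below. The function name `quantize`, the two prototypes and the four sample vectors are made up for illustration; only the use of Euclidean distance comes from the slide.

```python
import numpy as np

def quantize(vectors, codebook):
    """Assign each feature vector to its closest prototype (Euclidean distance)."""
    # distances[i, j] = ||vectors[i] - codebook[j]||
    diffs = vectors[:, None, :] - codebook[None, :, :]
    distances = np.linalg.norm(diffs, axis=2)
    nearest = distances.argmin(axis=1)
    return nearest, distances[np.arange(len(vectors)), nearest]

# Two hypothetical prototype vectors and four hypothetical feature vectors
codebook = np.array([[0.0, 0.0], [10.0, 10.0]])
vectors = np.array([[1.0, 0.0], [9.0, 10.0], [0.0, 2.0], [10.0, 7.0]])

labels, dists = quantize(vectors, codebook)
print(labels)        # [0 1 0 1]
print(dists)         # [1. 1. 2. 3.]
print(dists.mean())  # 1.75, the average quantization distortion
```

The average distortion is one plausible way to measure how well a set of feature vectors fits a given set of prototypes.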

Database Creation Process

[Figure: training utterances such as "Hello, Speaker #1" and "Hello, Speaker #2" are stored in a database with entries for Speaker #1, Speaker #2 and Speaker #3]
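One way the database could be built is to train a small VQ codebook per speaker and store it under the speaker's label. The sketch below assumes a plain k-means procedure, four prototypes per speaker, and random synthetic features in place of real MFCC vectors; none of these details are given in the slides, and `train_codebook` and `enroll` are hypothetical helper names.

```python
import numpy as np

def train_codebook(features, n_prototypes, n_iter=20, seed=0):
    """Plain k-means: returns n_prototypes centroid vectors for one speaker."""
    rng = np.random.default_rng(seed)
    start = rng.choice(len(features), n_prototypes, replace=False)
    codebook = features[start].copy()
    for _ in range(n_iter):
        # Assign every feature vector to its nearest prototype
        d = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=2)
        nearest = d.argmin(axis=1)
        # Move each prototype to the mean of the vectors assigned to it
        for j in range(n_prototypes):
            members = features[nearest == j]
            if len(members):
                codebook[j] = members.mean(axis=0)
    return codebook

def enroll(database, speaker_id, features, n_prototypes=4):
    """Store one trained codebook per speaker label."""
    database[speaker_id] = train_codebook(features, n_prototypes)

# Synthetic 12-dimensional features standing in for each speaker's recordings
rng = np.random.default_rng(1)
database = {}
for i, name in enumerate(["Speaker #1", "Speaker #2", "Speaker #3"]):
    fake_features = rng.normal(loc=5.0 * i, scale=1.0, size=(200, 12))
    enroll(database, name, fake_features)

print({name: cb.shape for name, cb in database.items()})
```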

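Speaker Identification

[Figure: an unknown speaker (#?) is compared against database entries #1, #2 and #3; the result shown is Speaker 1 with a value of 5.94, so the output is Speaker #1]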
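A minimal sketch of the identification decision, assuming each database entry is a VQ codebook and that the speaker with the lowest average distortion wins. The slides do not state how the displayed value (5.94) is computed, so the scoring rule, the tiny codebooks and the test vectors here are all illustrative assumptions.

```python
import numpy as np

def avg_distortion(features, codebook):
    """Mean distance from each feature vector to its nearest prototype."""
    d = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=2)
    return d.min(axis=1).mean()

def identify(features, database):
    """Return the enrolled speaker whose codebook fits the features best."""
    scores = {spk: avg_distortion(features, cb) for spk, cb in database.items()}
    return min(scores, key=scores.get), scores

# Hypothetical two-prototype codebooks for three enrolled speakers
database = {
    "Speaker #1": np.array([[0.0, 0.0], [1.0, 1.0]]),
    "Speaker #2": np.array([[5.0, 5.0], [6.0, 6.0]]),
    "Speaker #3": np.array([[10.0, 0.0], [11.0, 1.0]]),
}
unknown = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 2.0]])

best, scores = identify(unknown, database)
print(best)  # Speaker #1
```

No identity is claimed here: every enrolled speaker is scored and the closest one is returned.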

Speaker Verification

[Figure: a user claiming to be Speaker #1 is compared against the database entry for Speaker 1; the value shown is 5.94 and the decision is Accept]
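Verification differs from identification in that only the claimed speaker's model is consulted and the outcome is accept or reject. The sketch below assumes a VQ distortion score compared against a fixed threshold; the threshold value (2.0), the codebooks and the test vectors are hypothetical, and the slides do not say which decision rule the system uses.

```python
import numpy as np

def avg_distortion(features, codebook):
    """Mean distance from each feature vector to its nearest prototype."""
    d = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=2)
    return d.min(axis=1).mean()

def verify(features, claimed_id, database, threshold):
    """Accept the claim if the distortion against the claimed model is small enough."""
    score = avg_distortion(features, database[claimed_id])
    return bool(score <= threshold), score

# Hypothetical codebooks and an assumed threshold
database = {
    "Speaker #1": np.array([[0.0, 0.0], [1.0, 1.0]]),
    "Speaker #2": np.array([[5.0, 5.0], [6.0, 6.0]]),
}
utterance = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 2.0]])
THRESHOLD = 2.0

ok_1, score_1 = verify(utterance, "Speaker #1", database, THRESHOLD)
ok_2, score_2 = verify(utterance, "Speaker #2", database, THRESHOLD)
print("Accept" if ok_1 else "Reject")  # Accept
print("Reject" if not ok_2 else "Accept")  # Reject
```

In practice the threshold trades false acceptances against false rejections and would have to be tuned on held-out recordings.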

Database Creation Condition

Table 1: Database description.

Parameter             Characteristics
Language              Bangla
No. of speakers       5
Speech type           Sentence reading
Recording condition   A normal room condition
Audio length          60-90 seconds
Audio type            Stereo
Sample format         16-bit PCM
Sampling frequency    8 kHz
Bit rate              1411 kbps
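For uncompressed PCM audio, the bit rate is the product of sampling frequency, bits per sample and number of channels. The small sketch below applies that relation to the table's values; the function name is illustrative. It suggests that 1411 kbps corresponds to 16-bit stereo at 44.1 kHz, whereas 16-bit stereo at 8 kHz would give 256 kbps, so the two figures may refer to different stages (for example recording versus downsampled audio), which the slides do not clarify.

```python
def pcm_bit_rate_kbps(sample_rate_hz, bits_per_sample, channels):
    """Bit rate of uncompressed PCM audio in kilobits per second."""
    return sample_rate_hz * bits_per_sample * channels / 1000

print(pcm_bit_rate_kbps(8000, 16, 2))   # 256.0
print(pcm_bit_rate_kbps(44100, 16, 2))  # 1411.2
```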

Speaker Recognition Result

Table 3: Test result for speaker recognition system.

Speaker     No. of inputs   Correct   Incorrect   Accuracy
Speaker_1   5               5         0           100%
Speaker_2   9               8         1           88.88%
Speaker_3   6               6         0           100%
Speaker_3   12              11        1           91.67%
Speaker_4   8               8         0           100%
Speaker_5   10              10        0           100%
Total       50              48        2           96%
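The accuracy column is the number of correct decisions divided by the number of inputs. The following sketch recomputes it from the rows of the table (labels copied as listed, including the repeated Speaker_3); rounding to two decimals gives 88.89% for 8 of 9, where the table shows the truncated 88.88%.

```python
# (label, number of inputs, number correct), copied from the results table
rows = [
    ("Speaker_1", 5, 5),
    ("Speaker_2", 9, 8),
    ("Speaker_3", 6, 6),
    ("Speaker_3", 12, 11),
    ("Speaker_4", 8, 8),
    ("Speaker_5", 10, 10),
]

def accuracy(correct, total):
    """Percentage of inputs that were recognized correctly."""
    return 100.0 * correct / total

for name, total, correct in rows:
    print(f"{name}: {accuracy(correct, total):.2f}%")

total_inputs = sum(total for _, total, _ in rows)
total_correct = sum(correct for _, _, correct in rows)
overall = accuracy(total_correct, total_inputs)
print(f"Total: {overall:.2f}%")  # Total: 96.00%
```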

Applications

• Transaction authentication
– Toll fraud prevention
– Telephone credit card purchases
– Telephone brokerage (e.g., stock trading)

• Access control
– Physical facilities
– Computers and data networks

• Information retrieval
– Customer information for call centers
– Audio indexing (speech skimming device)

• Forensics
– Voice sample matching
