
Pitch Prediction for Glottal Spectrum Estimation with Applications in Speaker Recognition

Nengheng Zheng

Supervised under Professor P.C. Ching

Nov. 26, 2004

Outline

• Speech production and glottal pulse excitation in detail

• Linear prediction: short-term and long-term

• Glottal spectrum estimated with long-term prediction and acoustic features

• Implementation for speaker recognition

Speech Production

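[Block diagram: an impulse train generator drives the glottal pulse model G(z), producing glottal pulses; a random noise generator is the second source; the excitation passes through (×), the vocal tract model V(z) and the radiation model R(z) to give the speech signal]

Discrete-time model for speech production

A combined transfer function:

$$H(z) = G(z)\,V(z)\,R(z)$$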

Acoustic Features of Glottal Pulse

• Time domain

– pitch period

– pitch period perturbation (jitter)

– pulse amplitude perturbation (shimmer)

– glottal pulse width

– abruptness of closure of the glottal flow

– aspiration noise

• Frequency domain

– fundamental frequency (F0)

– spectral tilt (slope)

– harmonic richness
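
As a rough illustration of the time-domain measures listed above, the sketch below computes jitter and shimmer as the mean absolute cycle-to-cycle difference divided by the mean value. This is one common definition and is assumed here, not taken from the slides; the helper name `relative_perturbation` and the sample values are hypothetical.

```python
def relative_perturbation(values):
    """Mean absolute difference between consecutive cycles, divided by the mean.

    Assumed definition: applied to pitch periods it gives a jitter figure,
    applied to pulse peak amplitudes it gives a shimmer figure.
    """
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return mean_diff / mean_value


# Hypothetical per-cycle measurements for one voiced segment.
periods_ms = [8.0, 8.2, 7.9, 8.1, 8.0]
amplitudes = [1.00, 0.95, 1.02, 0.98, 1.01]

jitter = relative_perturbation(periods_ms)
shimmer = relative_perturbation(amplitudes)
# F0 is the reciprocal of the mean pitch period (periods are in milliseconds).
f0_hz = 1000.0 / (sum(periods_ms) / len(periods_ms))
print(f"jitter={jitter:.3f} shimmer={shimmer:.3f} F0={f0_hz:.1f} Hz")
```

Both measures are dimensionless ratios, so the same helper serves for periods and amplitudes.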

Glottal Pulse and Voice Quality

• Glottal pulse shape plays an important role in the quality of natural or synthesized vowels [Rosenberg 1971]

– The shape and periodicity of vocal cord excitation are subject to large variation

– Such variations are significant for preserving speech naturalness

– A typical glottal pulse: asymmetric, with a shorter falling phase; spectrum with -12 dB/octave decay

• More variation among different speakers than among different utterances of the same speaker [Mathews 1963]

• Such variations have little significance for speech intelligibility but affect the perceived vocal quality [Childers 1991]
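
To make the -12 dB/octave figure concrete, the sketch below evaluates a simple two-pole low-pass filter, G(z) = 1 / (1 - a z^-1)^2, which is a textbook stand-in for the glottal pulse spectrum. The model, the pole value a = 0.99 and the 8 kHz sampling rate are assumptions chosen for illustration; the slides do not specify them.

```python
import math


def two_pole_gain_db(freq_hz, fs_hz=8000.0, a=0.99):
    """Gain in dB of the assumed model G(z) = 1 / (1 - a z^-1)^2 at freq_hz."""
    w = 2.0 * math.pi * freq_hz / fs_hz
    # |1 - a e^{-jw}|^2 for a single real pole at z = a
    one_pole_sq = 1.0 + a * a - 2.0 * a * math.cos(w)
    # Two identical poles: 20*log10(1 / one_pole_sq)
    return -20.0 * math.log10(one_pole_sq)


# Change in gain over one octave (500 Hz to 1000 Hz).
drop_db = two_pole_gain_db(1000.0) - two_pole_gain_db(500.0)
print(f"gain change over one octave: {drop_db:.1f} dB")
```

With these values the gain falls by roughly 12 dB from 500 Hz to 1000 Hz. Each pole contributes about 6 dB/octave well above its corner frequency.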

Various Glottal Pulses

• Some other vocal types: breathy, falsetto, vocal fry

• Temporal and spectral characteristics

Some Comments

• In general, studying the glottal pulse characteristics requires reconstructing the glottal pulse waveform by inverse filtering

• Automatically and exactly reconstructing the glottal waveform from real speech is almost impossible, especially during transient phases of articulation or for high-pitched speakers

• Fortunately, the glottal spectrum can be estimated from the residual signal with pitch prediction

Linear Prediction

• Speech waveform: the current sample is correlated with past samples and is therefore predictable

• Short-term correlation:

– Occurs within one pitch period

– Formant modulation

– Classical linear prediction analysis (short-term prediction)

$$s(n) \approx \sum_{k} a_k\, s(n-k)$$

• Long-term correlation:

– Occurs across consecutive pitch periods

– Vocal cord vibration

– Long-term/pitch prediction

$$u(n) \approx b\, u(n-p)$$

Linear Prediction

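• Short-term predictor (classical linear prediction)

– Removes the short-term correlation, leaving a glottal excitation signal

$$A(z) = 1 - \sum_{k} a_k z^{-k}, \qquad u(n) = s(n) - \sum_{k} a_k\, s(n-k)$$

• Long-term predictor (pitch prediction)

– Removes the correlation across consecutive pitch periods

$$P(z) = 1 - \sum_{k=-1}^{1} b_k z^{-(p+k)}, \qquad v(n) = u(n) - \sum_{k=-1}^{1} b_k\, u(n-p-k)$$

[Block diagram: s(n) → short-term predictor → u(n) → long-term predictor → v(n); each predictor output is subtracted from its input]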
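
The two-stage analysis can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the author's implementation. The short-term coefficients come from the autocorrelation method with a Levinson-Durbin recursion. The long-term stage uses a single tap, u(n) ≈ b u(n-p), where the slides use three taps. The prediction gain is taken as 10 log10 of the energy ratio of u(n) to v(n), a conventional definition assumed here. All function names and the toy signal are hypothetical.

```python
import math


def autocorrelation(x, max_lag):
    """r[k] = sum_n x[n] * x[n-k] for k = 0..max_lag."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(max_lag + 1)]


def lpc_coefficients(s, order):
    """Levinson-Durbin recursion; returns a_1..a_order with
    s(n) ~ sum_k a_k * s(n-k)."""
    r = autocorrelation(s, order)
    a = [0.0] * (order + 1)          # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        refl = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        a[i] = refl
        for j in range(1, i):
            a[j] = prev[j] - refl * prev[i - j]
        err *= 1.0 - refl * refl
    return a[1:]


def short_term_residual(s, a):
    """u(n) = s(n) - sum_k a_k s(n-k), taking s(n) = 0 for n < 0."""
    return [s[n] - sum(a[k - 1] * s[n - k]
                       for k in range(1, len(a) + 1) if n >= k)
            for n in range(len(s))]


def one_tap_pitch_predictor(u, min_lag, max_lag):
    """Choose lag p and gain b minimising sum_n (u(n) - b u(n-p))^2."""
    best_p, best_b, best_score = None, 0.0, 0.0
    for p in range(min_lag, max_lag + 1):
        num = sum(u[n] * u[n - p] for n in range(p, len(u)))
        den = sum(u[n - p] ** 2 for n in range(p, len(u)))
        if den > 0.0 and num > 0.0 and num * num / den > best_score:
            best_p, best_b, best_score = p, num / den, num * num / den
    return best_p, best_b


def long_term_residual(u, p, b):
    """v(n) = u(n) - b u(n-p), taking u(n) = 0 for n < 0."""
    return [u[n] - (b * u[n - p] if n >= p else 0.0) for n in range(len(u))]


# Toy "voiced" signal: impulse train (period 50 samples) through a one-pole filter.
period, n_samples = 50, 400
s, state = [], 0.0
for n in range(n_samples):
    state = 0.9 * state + (1.0 if n % period == 0 else 0.0)
    s.append(state)

a = lpc_coefficients(s, order=1)
u = short_term_residual(s, a)
p, b = one_tap_pitch_predictor(u, min_lag=20, max_lag=120)
v = long_term_residual(u, p, b)
gain_db = 10.0 * math.log10(sum(x * x for x in u) / sum(x * x for x in v))
print(f"a1={a[0]:.3f} p={p} b={b:.3f} gain={gain_db:.1f} dB")
```

On this toy signal the short-term stage recovers a coefficient close to the filter pole (0.9), and the long-term stage finds the 50-sample period. Real speech would need framing, windowing and a higher predictor order.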

Linear Prediction: An example

[Figure: time-domain waveform panels (sample axes 0 to 800 and 0 to 500) and magnitude-spectrum panels (Frequency in Hz, 0 to 4000, and normalized Frequency, 0 to 1)]

Examples of pitch prediction estimated glottal spectrum

[Figure: three panels, horizontal axis 0 to 500]

Harmonic Structure of Glottal Spectrum

• Two parameters describing the harmonic structure

– Harmonic richness factor and noise-to-harmonic ratio

• Harmonic richness factor (HRF)

• Noise-to-harmonic ratio (NHR)

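[Equations: HRF and NHR are each defined through a log10 expression; the HRF expression runs over the harmonics H_i in a band B, indexed by n]

[Figure: horizontal axis 0 to 2000]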

Feature Generation

• Feature generation process

[Block diagram: s(n) → S-T prediction → u(n) → L-T prediction on every pitch period → G(z) → G(f) → Mel-scale band-pass filtering → HRFn, NHRn, n=1,2,…; the L-T prediction stage also outputs p, g, bi; the chain includes a 10 log(·) stage]

• Acoustic features including the following:

– Fundamental frequency F0

– Pitch prediction gain g

– Pitch prediction coefficients b-1, b0, b1

– HRFn and NHRn (n = 1, ..., 10)

• Mel-scale filter bank with 10 frequency bands
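
The mel-scale band layout can be sketched as follows. The slides do not give the mel formula, the frequency range or the filter shape. The sketch assumes the widely used mapping mel = 2595 log10(1 + f/700), a 0 to 4000 Hz range (telephone bandwidth), and ten contiguous non-overlapping bands. All names are hypothetical.

```python
import math


def hz_to_mel(f_hz):
    """Assumed mel mapping: 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(n_bands=10, f_low=0.0, f_high=4000.0):
    """Edges (Hz) of n_bands bands that are equally wide on the mel scale."""
    lo, hi = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (hi - lo) / n_bands
    return [mel_to_hz(lo + i * step) for i in range(n_bands + 1)]


def band_index(f_hz, edges):
    """Index of the band containing f_hz, or None if it lies outside all bands."""
    for i in range(len(edges) - 1):
        if edges[i] <= f_hz < edges[i + 1]:
            return i
    return None


edges = mel_band_edges()
f0 = 120.0  # hypothetical fundamental frequency in Hz
# Band membership of the first ten harmonics k * F0.
harmonic_bands = [band_index(k * f0, edges) for k in range(1, 11)]
print([round(e) for e in edges])
print(harmonic_bands)
```

Grouping harmonics by band in this way is what would let per-band quantities such as HRFn and NHRn be accumulated. The low bands are narrow, so each holds only a few harmonics.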

Experimental Conditions

• Speech quality: telephone speech

• Subject: 49 male speakers

• Training condition:

– 3 training sessions, about 90 s of speech in total, collected over 3~6 weeks

– 128-mixture GMM

• Testing condition:

– 12 testing sessions over 4~6 months
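
For readers unfamiliar with GMM-based speaker identification, the sketch below scores a set of feature frames against per-speaker diagonal-covariance GMMs and picks the speaker with the highest average log-likelihood. This is the usual decision rule and is assumed here; the slides do not describe the scoring or training procedure. The tiny two-speaker models and frames are made up for illustration, and training (for example by EM) is not shown.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at x."""
    return -0.5 * sum(math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def gmm_frame_loglik(x, gmm):
    """Log-likelihood of one frame under a GMM given as (weight, mean, var) tuples."""
    terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    top = max(terms)
    # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms))


def identify(frames, models):
    """Return the best-scoring speaker and the per-speaker average log-likelihoods."""
    scores = {spk: sum(gmm_frame_loglik(x, gmm) for x in frames) / len(frames)
              for spk, gmm in models.items()}
    return max(scores, key=scores.get), scores


# Hypothetical 2-D models; a real system here would use 128 mixtures per speaker.
models = {
    "A": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])],
    "B": [(1.0, [5.0, 5.0], [1.0, 1.0])],
}
test_frames = [[4.8, 5.1], [5.2, 4.9], [5.0, 5.3]]
best, scores = identify(test_frames, models)
print(best)
```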

Speaker recognition experiments

Feature                F0     g      [b-1 b0 b1]    HRF    NHR
Identification rate    18%    11%    14%            32%    17%

• Identification results with long-term prediction related features

Features                          Identification error rate (%)
Fgs: F0_g_HRF_NHR (25-dim)        52
LPCC_D_A (36-dim)                 2.84
LPCC_D_A+Fgs                      2.26
MFCC_D_A                          2.1
MFCC_D_A+Fgs                      1.9

• Comparison of glottal source feature with classical features
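
The effect of adding the glottal source features (Fgs) is easier to read as a relative reduction in identification error. The short calculation below uses only the error rates reported above; the helper name `relative_reduction` is hypothetical.

```python
def relative_reduction(baseline, combined):
    """Fraction by which the error rate drops relative to the baseline."""
    return (baseline - combined) / baseline


# (baseline error %, error % with Fgs appended), taken from the reported results.
error_rates = {"LPCC_D_A": (2.84, 2.26), "MFCC_D_A": (2.1, 1.9)}
for name, (base, with_fgs) in error_rates.items():
    print(f"{name}: {100 * relative_reduction(base, with_fgs):.1f}% relative reduction")
```

This gives about a 20.4% relative reduction for LPCC_D_A and about 9.5% for MFCC_D_A.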

Summary

• Glottal source excitation is important for the perceived naturalness and quality of a voice and helps distinguish one speaker from another.

• Linear prediction is a powerful tool for speech analysis. The spectral properties of the supraglottal vocal tract system can be estimated by short-term prediction, while long-term prediction estimates the spectrum of the glottal excitation system.

• Recognition results show that the glottal source related acoustic features (F0, prediction gain, HRF, NHR, etc.) provide a certain degree of speaker discriminative power.

Other Applications

• Speech coding

• Speech recognition?

• Speech emotion recognition!

Thank You!
