
Upload: kathlyn-theresa-norris

Post on 17-Dec-2015


Complete Discrete Time Model

The complete model covers periodic, noise and impulsive inputs.

For periodic input

$$X(z) = A_v\,G(z)\,H(z) = A_v\,G(z)\,V(z)\,R(z)$$

1) R(z): Radiation impedance.

It has been shown that R(z) can be approximated as $R(z) \approx 1 - z^{-1}$ or $R(z) \approx 1 - \alpha z^{-1}$ with $\alpha < 1$, i.e. a differentiator.

Therefore in continuous time it can be written that

$$x(t) = A\,\frac{d}{dt}\left[u_g(t) * v(t)\right] = A\left[\frac{d}{dt}u_g(t)\right] * v(t)$$

so R(z) can be moved to the glottis in the previous figure.

2) G(z): z-transform of the glottal flow input g[n] over one cycle.

It can be approximated by

$$g[n] = \left(\beta^{-n}u[-n]\right) * \left(\beta^{-n}u[-n]\right)$$

$$G(z) = \frac{1}{(1 - \beta z)^2}$$

If $\beta < 1$: two identical poles outside the unit circle, two zeros at infinity (maximum phase).

3) V(z): all-pole vocal-tract function.

$$V(z) = \frac{1}{\prod_{k=1}^{C_i}\left(1 - c_k z^{-1}\right)\left(1 - c_k^{*} z^{-1}\right)}$$
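As a rough illustration of the periodic-input model X(z) = A_v G(z) V(z) R(z), the Python sketch below cascades an impulse train, a truncated version of the glottal pulse g[n], an all-pole vocal tract and a first-order radiation term. The sampling rate, pitch, formant frequencies and bandwidths, the values of α and β, and all function names are assumptions made for this example, not values from the slides. The anticausal glottal pulse is simply delayed so that it can be applied causally.

```python
import numpy as np

def glottal_pulse(beta=0.9, length=40):
    """Truncated g[n] = (beta^-n u[-n]) * (beta^-n u[-n]).

    The returned array covers n = -(length-1) ... 0, so the last sample is n = 0.
    """
    m = np.arange(length)
    one_sided = beta ** m                       # beta^-n u[-n], time-reversed
    causal = np.convolve(one_sided, one_sided)[:length]
    return causal[::-1]                         # undo the time reversal

def all_pole_denominator(formants_hz, bandwidths_hz, fs):
    """Coefficients of prod (1 - c_k z^-1)(1 - c_k* z^-1) in powers of z^-1."""
    a = np.array([1.0])
    for f, b in zip(formants_hz, bandwidths_hz):
        r = np.exp(-np.pi * b / fs)             # pole radius from bandwidth
        theta = 2.0 * np.pi * f / fs            # pole angle from formant frequency
        a = np.convolve(a, [1.0, -2.0 * r * np.cos(theta), r * r])
    return a

def all_pole_filter(x, a):
    """Direct-form recursion y[n] = x[n] - sum_k a[k] y[n-k], with a[0] = 1."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        acc = x[n]
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * y[n - k]
        y[n] = acc
    return y

def synthesize_voiced(fs=8000, f0=100, n_periods=5, alpha=0.98, beta=0.9, gain=1.0):
    period = int(fs / f0)
    excitation = np.zeros(period * n_periods)
    excitation[::period] = 1.0                  # periodic impulse train
    source = np.convolve(excitation, glottal_pulse(beta))[:len(excitation)]
    a = all_pole_denominator([500, 1500, 2500], [60, 90, 120], fs)
    tract_out = all_pole_filter(source, a)      # V(z)
    delayed = np.concatenate(([0.0], tract_out[:-1]))
    x = gain * (tract_out - alpha * delayed)    # R(z) = 1 - alpha z^-1
    return x, source
```

Calling `x, source = synthesize_voiced()` returns five pitch periods of synthetic voiced output together with the glottal source that drives the vocal tract filter.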

Therefore, with $R(z) = 1 - \alpha z^{-1}$,

$$X(z) = \frac{A_v\left(1 - \alpha z^{-1}\right)}{(1 - \beta z)^2 \prod_{k=1}^{C_i}\left(1 - c_k z^{-1}\right)\left(1 - c_k^{*} z^{-1}\right)}$$

• V(z) and R(z) are minimum phase.
• G(z) is maximum phase.

Some related work

Zeros of Z-Transform (ZZT) Decomposition of Speech For Source-Tract Separation

Baris Bozkurt, Boris Doval, Christophe D’Alessandro, Thierry Dutoit

This study proposes a new spectral decomposition method for source-tract separation. It is based on a new spectral representation called the Zeros of Z-Transform (ZZT), which is an all-zero representation of the z-transform of the signal. We show that separate patterns exist in ZZT representations of speech signals for the glottal flow and the vocal tract contributions. The ZZT-decomposition is simply composed of grouping the zeros into two sets, according to their location in the z-plane. This type of decomposition leads to separating the glottal flow contribution (without a return phase) from the vocal tract contributions in the z domain.
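The zero-grouping step of the ZZT decomposition can be sketched in a few lines of Python. This is a minimal sketch, not the authors' implementation: it assumes the two sets are split at the unit circle (the abstract only says the zeros are grouped by their location in the z-plane), and the function names are hypothetical.

```python
import numpy as np

def zzt_decompose(frame):
    """Split the zeros of X(z) = sum_n x[n] z^-n by their position
    relative to the unit circle (assumed split rule)."""
    frame = np.asarray(frame, dtype=float)
    # X(z) = z^-(N-1) * (x[0] z^(N-1) + ... + x[N-1]), so the zeros of X(z)
    # are the roots of the polynomial whose coefficients are the samples.
    zeros = np.roots(frame)
    inside = zeros[np.abs(zeros) <= 1.0]
    outside = zeros[np.abs(zeros) > 1.0]
    return inside, outside

def spectrum_from_zeros(zeros, n_fft=512):
    """Magnitude spectrum of the all-zero polynomial built from one zero set."""
    coeffs = np.atleast_1d(np.poly(zeros))
    return np.abs(np.fft.fft(coeffs, n_fft))
```

For example, the frame `[1.0, -2.5, 1.0]` has zeros at 0.5 and 2, so `zzt_decompose` returns one zero in each set. Each set can then be passed to `spectrum_from_zeros` to obtain the spectrum of the corresponding part.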

A Method For Glottal Formant Frequency Estimation

Baris Bozkurt, Boris Doval, Christophe D’Alessandro, Thierry Dutoit

This study presents a method for estimation of the glottal formant frequency (Fg) from speech signals. Our method is based on zeros of z-transform decomposition of speech spectra into two spectra: a glottal flow dominated spectrum and a vocal tract dominated spectrum. Peak picking is performed on the amplitude spectrum of the glottal flow dominated part. The algorithm is tested on synthetic speech. It is shown to be effective especially when the glottal formant and the first formant of the vocal tract are not too close. In addition, tests on a real speech example are also presented, where open quotient estimates from EGG signals are used as reference and correlated with the glottal formant frequency estimates.

Improved Differential Phase Spectrum Processing For Formant Tracking

Baris Bozkurt, Boris Doval, Christophe D’Alessandro, Thierry Dutoit

This study presents an improved version of our previously introduced formant tracking algorithm. The algorithm is based on processing the negative derivative of the argument of the chirp-z transform (termed the differential phase spectrum) of a given speech signal. The procedure involves no modeling, only peak picking on the differential phase spectrum. We discuss the effects of the roots of the z-transform on the differential phase spectrum and the need to ensure that all zeros are at some distance from the circle on which the chirp-z transform is computed. For that, we include an additional zero-decomposition step in our previously presented algorithm to improve its robustness. The final version of the algorithm is tested for analysis of synthetic speech and real speech signals and compared to two other formant tracking systems.
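The differential phase spectrum described in this abstract can be illustrated with a short Python sketch. It is not the authors' algorithm: instead of a general chirp-z routine it evaluates the z-transform on a circle of radius `radius` by weighting the signal with radius^-n and taking an FFT, and it uses the standard identity that the negative phase derivative equals Re{ F(n x[n]) / F(x[n]) }. The function name and parameters are assumptions.

```python
import numpy as np

def differential_phase_spectrum(x, radius=1.0, n_fft=1024):
    """Negative derivative of arg X(radius * e^{jw}) at n_fft uniform angles."""
    x = np.asarray(x, dtype=float)
    n = np.arange(len(x), dtype=float)
    xr = x * radius ** (-n)          # X(r e^{jw}) = sum_n x[n] r^-n e^{-jwn}
    X = np.fft.fft(xr, n_fft)
    Xn = np.fft.fft(n * xr, n_fft)   # transform of n * x[n] r^-n
    # -d(arg X)/dw = Re(Xn / X); the floor avoids division by zero
    return np.real(Xn * np.conj(X)) / np.maximum(np.abs(X) ** 2, 1e-12)

def resonator_impulse_response(pole_radius, pole_angle, length):
    """Impulse response of 1 / ((1 - p z^-1)(1 - p* z^-1)), used as a test signal."""
    n = np.arange(length)
    return pole_radius ** n * np.sin((n + 1) * pole_angle) / np.sin(pole_angle)
```

For a single resonance, the largest value of the returned spectrum lies close to the pole angle, which is the property that peak picking relies on.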

If the differentiation at the output (radiation impedance) is applied to the glottal flow, the source becomes the glottal flow derivative.

The derivative of the glottal flow is more pulse-like!

[Figure: glottal flow and glottal flow derivative waveforms]

NOISE INPUT

$$X(z) = A_n\,U(z)\,H(z) = A_n\,U(z)\,V(z)\,R(z)$$

IMPULSE INPUT

$$X(z) = A_i\,H(z) = A_i\,V(z)\,R(z)$$

The combination of the three inputs may be linear or nonlinear!

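OTHER ZEROS OF THE VOCAL-TRACT

In the noise and impulse source states, oral tract constrictions may give zeros as well as poles (absorption of energy by cavity anti-resonances).

V(z) may have zeros

The vocal tract function is generally mixed phase.

$$X(z) = A\,\frac{\left(1 - \alpha z^{-1}\right)\prod_{k=1}^{M_i}\left(1 - a_k z^{-1}\right)\prod_{k=1}^{M_o}\left(1 - b_k z\right)}{(1 - \beta z)^2 \prod_{k=1}^{C_i}\left(1 - c_k z^{-1}\right)\left(1 - c_k^{*} z^{-1}\right)}$$

where, for $|a_k| < 1$ and $|b_k| < 1$, the $M_i$ zeros $a_k$ lie inside the unit circle and the $M_o$ zeros $1/b_k$ lie outside it.

Maximum phase elements of the vocal tract can also contribute to a more gradual attack of the speech waveform.

The modeling described here is called the SOURCE-FILTER MODEL of speech production.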
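To make the minimum, maximum and mixed phase distinction concrete, the sketch below collects the z-plane zeros of the numerator factors (1 - αz^-1), (1 - a_k z^-1) and (1 - b_k z), and labels a set of roots by its position relative to the unit circle. The function names and numerical values are hypothetical and chosen only for illustration.

```python
import numpy as np

def zero_locations(a_inside, b_outside, alpha=0.98):
    """Zeros of (1 - alpha z^-1) * prod(1 - a_k z^-1) * prod(1 - b_k z).

    A factor (1 - a_k z^-1) has its zero at z = a_k;
    a factor (1 - b_k z) has its zero at z = 1 / b_k.
    """
    zeros = [alpha] + list(a_inside) + [1.0 / b for b in b_outside]
    return np.array(zeros, dtype=complex)

def glottal_poles(beta=0.9):
    """Double pole of G(z) = 1 / (1 - beta z)^2, located at z = 1 / beta."""
    return np.array([1.0 / beta, 1.0 / beta], dtype=complex)

def phase_type(roots):
    """Label a set of roots by where they sit relative to the unit circle."""
    mags = np.abs(np.asarray(roots))
    if np.all(mags < 1.0):
        return "minimum phase"
    if np.all(mags > 1.0):
        return "maximum phase"
    return "mixed phase"
```

With only inside zeros the numerator is labelled minimum phase. Adding one factor (1 - b_k z) with |b_k| < 1 makes it mixed phase, and the double pole of G(z) is labelled maximum phase.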

Vocal Fold and Vocal Tract Interaction

In the source-filter model, it is assumed that the glottal impedance is infinite and that the glottal airflow is not influenced by the vocal tract.

However, the pressure in the vocal tract cavity above the glottis backs up (resists) against the glottal flow.

Electrical analog is shown below

[Figure: electrical analog circuit]

Psg: subglottal (lung) pressure

p(t): sound pressure corresponding to a single first formant in front of the glottis (because it has been found that other formants have a negligible effect on glottal flow).

Zg(t): time-varying impedance of the glottis.

R, L, C: these parameters model the first formant with

formant frequency (center frequency) $\Omega_0 = \dfrac{1}{\sqrt{LC}}$

bandwidth (3 dB) $B_0 = \dfrac{1}{RC}$

Zg(t) accounts for the interaction between the glottal flow and the vocal tract.

If Zg(t) is comparable to the impedance of the first formant, then there will be considerable interaction, and Ω0 and B0 will be affected.

Also, Zg(t) = p_tg(t)/u_g(t) has been found to be nonlinear:

$$p_{tg}(t) = \frac{k\rho\,u_g^2(t)}{2A^2(t)}$$

k = 1.1

ρ: density of air

A(t): smallest time-varying area of the glottal slit.

The equations are nonlinear and time-varying:

$$C\,\frac{dp(t)}{dt} + \frac{p(t)}{R} + \frac{1}{L}\int_0^t p(\tau)\,d\tau = A(t)\left[\frac{2\,p_{tg}(t)}{k\rho}\right]^{1/2}$$

where

$$p_{tg}(t) + p(t) = P_{sg}$$

Numerical solution of the above equations reveals that the skewness of the glottal flow is due in part to A(t) and in part to the loading effect of the first formant.

The numerical solution also yields a ripple component.

[Figure: glottal flow derivative]
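A numerical solution of this kind can be sketched as follows. The Python code integrates C dp/dt + p/R + (1/L)∫p dτ = A(t)·sqrt(2 p_tg/(kρ)), with p_tg = P_sg - p, using a fixed-step Runge-Kutta scheme. All numerical values (CGS units, subglottal pressure, formant frequency and bandwidth, impedance level, area pulse shape) and all names are assumptions chosen for illustration; they are not taken from the slides.

```python
import numpy as np

K = 1.1          # constant k from the text
RHO = 1.14e-3    # air density in g/cm^3 (assumed)
P_SG = 7840.0    # subglottal pressure in dyn/cm^2, about 8 cm H2O (assumed)

def rlc_from_formant(f1_hz=500.0, bw_hz=50.0, z0=5.0):
    """Choose R, L, C so that 1/sqrt(LC) = Omega0 and 1/(RC) = B0.
    z0 = sqrt(L/C) sets the impedance level and is an arbitrary assumption."""
    omega0 = 2.0 * np.pi * f1_hz
    b0 = 2.0 * np.pi * bw_hz
    L = z0 / omega0
    C = 1.0 / (z0 * omega0)
    R = 1.0 / (b0 * C)
    return R, L, C

def glottal_area(t, f0=100.0, open_quotient=0.6, a_max=0.2):
    """Raised-cosine area pulse A(t) in cm^2 (assumed shape)."""
    period = 1.0 / f0
    phase = (t % period) / (open_quotient * period)
    return np.where(phase < 1.0, 0.5 * a_max * (1.0 - np.cos(2.0 * np.pi * phase)), 0.0)

def glottal_flow(p, area):
    """u_g = A(t) sqrt(2 p_tg / (k rho)), with p_tg = P_sg - p clipped at zero."""
    p_tg = np.maximum(P_SG - p, 0.0)
    return area * np.sqrt(2.0 * p_tg / (K * RHO))

def simulate(duration=0.02, dt=1e-5):
    R, L, C = rlc_from_formant()

    def deriv(t, state):
        p, q = state                           # q is the running integral of p
        u = glottal_flow(p, glottal_area(t))
        return np.array([(u - p / R - q / L) / C, p])

    n = int(round(duration / dt))
    t = np.arange(n) * dt
    p = np.zeros(n)
    u = np.zeros(n)
    state = np.zeros(2)
    for i in range(n):                         # classical fourth-order Runge-Kutta
        p[i] = state[0]
        u[i] = glottal_flow(state[0], glottal_area(t[i]))
        k1 = deriv(t[i], state)
        k2 = deriv(t[i] + dt / 2, state + dt / 2 * k1)
        k3 = deriv(t[i] + dt / 2, state + dt / 2 * k2)
        k4 = deriv(t[i] + dt, state + dt * k3)
        state = state + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return t, glottal_area(t), u, p
```

After `t, area, u, p = simulate()`, comparing the shape of `u` with that of `area` within one cycle gives a rough picture of how the first-formant load reshapes the flow pulse, and `np.gradient(u, t)` gives the corresponding flow derivative.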

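The problem can approximately be analyzed by linearizing the differential equation, using the Taylor series

$$\sqrt{1 - x} \approx 1 - \frac{x}{2} \quad \text{if } x \ll 1.$$

$$C\,\frac{dp(t)}{dt} + \frac{p(t)}{R} + \frac{1}{L}\int_0^t p(\tau)\,d\tau = A(t)\left[\frac{2\,p_{tg}(t)}{k\rho}\right]^{1/2} = A(t)\left[\frac{2\left(P_{sg} - p(t)\right)}{k\rho}\right]^{1/2} = A(t)\left[\frac{2P_{sg}}{k\rho}\right]^{1/2}\left[1 - \frac{p(t)}{P_{sg}}\right]^{1/2}$$

$$C\,\frac{dp(t)}{dt} + \frac{p(t)}{R} + \frac{1}{L}\int_0^t p(\tau)\,d\tau \approx A(t)\left[\frac{2P_{sg}}{k\rho}\right]^{1/2}\left[1 - \frac{p(t)}{2P_{sg}}\right]$$

so that

$$C\,\frac{dp(t)}{dt} + \frac{p(t)}{R} + \frac{1}{L}\int_0^t p(\tau)\,d\tau + \frac{1}{2}\,g_0(t)\,p(t) = u_{sc}(t)$$

where

$$u_{sc}(t) = A(t)\left[\frac{2P_{sg}}{k\rho}\right]^{1/2}, \qquad g_0(t) = \frac{u_{sc}(t)}{P_{sg}} = A(t)\left[\frac{2}{k\rho P_{sg}}\right]^{1/2}$$

By differentiation

$$C\,\frac{d^2p(t)}{dt^2} + \left[\frac{1}{R} + \frac{1}{2}\,g_0(t)\right]\frac{dp(t)}{dt} + \left[\frac{1}{L} + \frac{1}{2}\,\dot{g}_0(t)\right]p(t) = \frac{du_{sc}(t)}{dt}$$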

The corresponding Norton equivalent circuit is

[Figure: Norton equivalent circuit]

where

$$R_g(t) = \frac{2}{g_0(t)}, \qquad L_g(t) = \frac{2}{\dot{g}_0(t)}$$

u_sc(t) is now a time-varying source.

Formant Frequency and Bandwidth Modulation

Because we have a linear but time-varying equation, formant frequency and bandwidth are time-varying, i.e. they are modulated, so the Laplace transform does not strictly apply.

But the equation can be solved for each time instant as a constant-coefficient equation:

$$H(s,t) = \frac{P(s,t)}{U_{sc}(s)} = \frac{s/C}{s^2 + B_1(t)\,s + \Omega_1^2(t)}$$

$$B_1(t) = B_0\left[1 + \frac{1}{2}\,R\,g_0(t)\right] \quad \text{(bandwidth)}$$

$$\Omega_1^2(t) = \Omega_0^2\left[1 + \frac{1}{2}\,L\,\dot{g}_0(t)\right] \quad \text{(formant)}$$

g0(t) is proportional to A(t) (glottal area)

the bandwidth increase is proportional to the glottal area (B1(t) ≥ B0 since A(t) ≥ 0)

the formant shift is proportional to the derivative of the glottal area (Ω1(t) may be above or below Ω0)

[Figure: glottal area, bandwidth and formant factor over a glottal cycle]

• In the minimum bandwidth modulation cases (/i/, /u/), B1(t) increases by a factor of 3 to 4.

• The multiplier of Ω1(t) ranges from 0.8 to 1.2.
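The modulation of bandwidth and formant frequency can be evaluated directly from a sampled glottal area, using B1(t) = B0[1 + R g0(t)/2] and Ω1²(t) = Ω0²[1 + L ġ0(t)/2] with g0(t) = A(t)·sqrt(2/(kρP_sg)). The Python sketch below does this. The CGS constants, the area pulse and the R, L, C values used in the usage note are assumptions for illustration only, and the function name is hypothetical.

```python
import numpy as np

K = 1.1          # constant k from the text
RHO = 1.14e-3    # air density in g/cm^3 (assumed)
P_SG = 7840.0    # subglottal pressure in dyn/cm^2 (assumed)

def formant_modulation(area, dt, R, L, C):
    """Instantaneous bandwidth B1(t) and formant frequency Omega1(t), in rad/s,
    for a sampled glottal area A(t) with sample spacing dt."""
    g0 = area * np.sqrt(2.0 / (K * RHO * P_SG))   # g0(t) is proportional to A(t)
    g0_dot = np.gradient(g0, dt)                  # numerical time derivative of g0
    b0 = 1.0 / (R * C)
    omega0 = 1.0 / np.sqrt(L * C)
    b1 = b0 * (1.0 + 0.5 * R * g0)
    omega1_sq = omega0 ** 2 * (1.0 + 0.5 * L * g0_dot)
    omega1 = np.sqrt(np.maximum(omega1_sq, 0.0))  # guard against negative values
    return b1, omega1, b0, omega0
```

With a raised-cosine area pulse and R, L, C chosen for a 500 Hz formant, the ratios `b1 / b0` and `omega1 / omega0` give the bandwidth and formant multipliers over the glottal cycle. The bandwidth multiplier never falls below 1, while the formant multiplier moves above 1 as the area grows and below 1 as it shrinks.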

Conclusions

• The increase of B1(t) within a glottal cycle yields the truncation of glottal flow (sharp closing of folds).

• It is due to a decrease in the impedance at the glottis as the glottis opens. The reduced glottal impedance Zg(t) yields a pressure drop across the glottis.

[Figure: truncation effect (using the Klatt synthesiser)]