Presentation 5: Sound Propagation in the Human Vocal Tract

Kocaeli University

Speech Processing

Sound propagation

The vocal tract is modeled as a tube of non-uniform, time-varying cross-sectional area

Speech corresponds to variations of air pressure and air flow in such a system

Needs a complete specification of A(x,t)

Uniform tube model

First study the uniform lossless tube model, in which A(x,t) is constant in both t and x

Then add

simple models of losses due to soft walls, friction, and thermal conduction

a model for radiation at the lips

a source model at the glottis

a model of the nasal tract, which forms a second branch as shown in the figure
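For the uniform lossless tube, a standard textbook result (not stated on the slide) is that a tube closed at the glottis and open at the lips resonates at odd multiples of c/(4L). The sketch below evaluates that formula. The tube length of 17.5 cm and the sound speed of 350 m/s are illustrative assumptions, and the function name is hypothetical.

```python
def uniform_tube_resonances(length_m, speed_of_sound=350.0, count=3):
    """Resonances F_n = (2n - 1) * c / (4 * L) of a uniform lossless tube
    that is closed at one end (glottis) and open at the other (lips)."""
    return [(2 * n - 1) * speed_of_sound / (4.0 * length_m)
            for n in range(1, count + 1)]


# Assumed values: 17.5 cm tube, c = 350 m/s.
formants = uniform_tube_resonances(0.175)
print(formants)  # roughly 500, 1500 and 2500 Hz
```

With these assumed values the resonances are evenly spaced, which is what a neutral vowel looks like in this idealized model.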

Lossless tube model

A widely used model, based on the assumption that the vocal tract can be represented as a concatenation of variable-length, constant cross-sectional area lossless acoustic tubes

The real vocal tract deviates from this model because of losses:

Friction at the walls

Heat conduction through the walls

Vibration at the walls of the tube

Each loss can be studied separately, giving a more detailed but more complicated model

Lossless tube model

A large number of short tubes can reasonably approximate the vocal tract
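At each junction between two tube sections, part of the travelling wave is reflected. A common textbook form of the reflection coefficient, assumed here because the slide does not give it, is r_k = (A_{k+1} - A_k) / (A_{k+1} + A_k), with section k on the glottis side. The area values below are hypothetical and only illustrate the calculation.

```python
def reflection_coefficients(areas):
    """Reflection coefficient at each junction of a concatenated-tube model.

    areas: cross-sectional areas ordered from glottis to lips.
    Returns r_k = (A_{k+1} - A_k) / (A_{k+1} + A_k) for each adjacent pair.
    """
    return [(a_next - a) / (a_next + a)
            for a, a_next in zip(areas, areas[1:])]


# Hypothetical area function in cm^2 (not taken from the slides).
areas_cm2 = [2.6, 8.0, 10.5, 10.5, 8.0, 4.8, 1.0]
r = reflection_coefficients(areas_cm2)
print(r)
```

Equal neighbouring areas give a coefficient of zero (no reflection), and for positive areas every coefficient stays strictly between -1 and 1.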

Digital models for speech production

Lossless tube model

Discrete time source/system model


Vocal Tract Model, 𝑉(𝑧)

It can be approximated with an all-pole model for the majority of sounds

Nasals and fricatives require both poles and zeros

We may include zeros in the transfer function, or

We may introduce more poles, since the effect of a zero can be approximated by additional poles

Roots of V(z);

For stability, all poles are inside the unit circle

$$V(z) = \frac{G}{1 - \sum_{k=1}^{N} \alpha_k z^{-k}}$$


Direct form representation

Cascade of second order systems
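The two representations can be related with a short sketch: each second-order section places one complex-conjugate pole pair (one resonance), and multiplying the section denominators gives the direct-form coefficients. The mapping from a formant frequency F and bandwidth B to a pole radius exp(-pi*B/fs) and angle 2*pi*F/fs is a common approximation assumed here, and the formant values, sampling rate, and function names are hypothetical.

```python
import cmath
import math


def resonator_denominator(freq_hz, bandwidth_hz, fs_hz):
    """Denominator 1 + a1 z^-1 + a2 z^-2 of one second-order section."""
    radius = math.exp(-math.pi * bandwidth_hz / fs_hz)
    theta = 2.0 * math.pi * freq_hz / fs_hz
    return [1.0, -2.0 * radius * math.cos(theta), radius * radius]


def polymul(p, q):
    """Multiply two polynomials in z^-1 given as coefficient lists."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pc in enumerate(p):
        for j, qc in enumerate(q):
            out[i + j] += pc * qc
    return out


def section_poles(section):
    """Roots of z^2 + a1 z + a2, i.e. the poles of one section."""
    _, a1, a2 = section
    disc = cmath.sqrt(a1 * a1 - 4.0 * a2)
    return [(-a1 + disc) / 2.0, (-a1 - disc) / 2.0]


fs = 8000.0
# Hypothetical (formant frequency, bandwidth) pairs in Hz.
formants = [(500.0, 60.0), (1500.0, 90.0), (2500.0, 120.0)]
sections = [resonator_denominator(f, b, fs) for f, b in formants]

# Cascade -> direct form: multiply the section denominators together.
denominator = [1.0]
for section in sections:
    denominator = polymul(denominator, section)

# V(z) = G / (1 - sum_k alpha_k z^-k), so alpha_k is minus the k-th coefficient.
alpha = [-c for c in denominator[1:]]

poles = [p for section in sections for p in section_poles(section)]
stable = all(abs(p) < 1.0 for p in poles)
print(alpha, stable)
```

Checking stability section by section is one reason the cascade form is convenient: each pole pair can be inspected on its own without factoring a high-order polynomial.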


Radiation Model, 𝑅(𝑧)

Can be approximated with a zero slightly inside the unit circle, 𝛼 < 1

$$R(z) = 1 - \alpha z^{-1}$$
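As a rough illustration, the radiation model is a first-order difference y[n] = x[n] - alpha * x[n-1], which attenuates low frequencies and boosts high ones. The value alpha = 0.98 and the function names are assumptions chosen for this sketch.

```python
import cmath
import math


def radiation_filter(x, alpha=0.98):
    """Apply R(z) = 1 - alpha * z^-1 to a sequence of samples."""
    y = []
    previous = 0.0
    for sample in x:
        y.append(sample - alpha * previous)
        previous = sample
    return y


def radiation_gain(omega, alpha=0.98):
    """Magnitude of R(e^{j omega}) at normalized frequency omega (radians)."""
    return abs(1.0 - alpha * cmath.exp(-1j * omega))


low = radiation_gain(0.0)        # gain at DC, equal to 1 - alpha
high = radiation_gain(math.pi)   # gain at half the sampling rate, 1 + alpha
print(low, high)
```

A constant input is almost cancelled after the first sample, which is the high-pass behaviour expected from a zero close to z = 1.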


Excitation Model;

For unvoiced sounds, the excitation can be modeled as white noise with a gain parameter to control the intensity

For voiced sounds, recall the glottal airflow from the glottis

Glottal Pulse Model, 𝐺(𝑧)


Rosenberg’s glottal pulse approximation
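One commonly quoted form of the Rosenberg pulse, assumed here because the slide shows it only as a figure, rises as half a raised cosine over N1 samples and falls as a quarter cosine over N2 samples. The values N1 = 25 and N2 = 10 are illustrative.

```python
import math


def rosenberg_pulse(n_open, n_close):
    """Rosenberg-style glottal pulse.

    g[n] = 0.5 * (1 - cos(pi * n / N1))       for 0 <= n <= N1
    g[n] = cos(pi * (n - N1) / (2 * N2))      for N1 < n <= N1 + N2
    """
    pulse = []
    for n in range(n_open + n_close + 1):
        if n <= n_open:
            pulse.append(0.5 * (1.0 - math.cos(math.pi * n / n_open)))
        else:
            pulse.append(math.cos(math.pi * (n - n_open) / (2.0 * n_close)))
    return pulse


# Illustrative opening and closing durations in samples.
pulse = rosenberg_pulse(25, 10)
print(len(pulse), max(pulse))
```

The slow rise and faster fall mimic the gradual opening and quicker closing of the glottis.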


The complete model;
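A minimal sketch of how the pieces could be chained, assuming the usual source/system ordering: excitation, then G(z) for voiced sounds, then V(z), then R(z). All parameter values (pitch period, single 500 Hz resonance, alpha = 0.98, pulse durations) and function names are hypothetical choices for illustration.

```python
import math
import random


def rosenberg_pulse(n_open=25, n_close=10):
    """Glottal pulse: raised-cosine rise followed by a quarter-cosine fall."""
    rise = [0.5 * (1.0 - math.cos(math.pi * n / n_open))
            for n in range(n_open + 1)]
    fall = [math.cos(math.pi * n / (2.0 * n_close))
            for n in range(1, n_close + 1)]
    return rise + fall


def fir_filter(b, x):
    """y[n] = sum_k b[k] * x[n - k], with zero initial conditions."""
    return [sum(b[k] * x[n - k] for k in range(min(len(b), n + 1)))
            for n in range(len(x))]


def all_pole_filter(alpha, gain, x):
    """y[n] = gain * x[n] + sum_k alpha_k * y[n - k], i.e. V(z)."""
    y = []
    for n, sample in enumerate(x):
        acc = gain * sample
        for k, a_k in enumerate(alpha, start=1):
            if n >= k:
                acc += a_k * y[n - k]
        y.append(acc)
    return y


def synthesize(voiced, num_samples, alpha, gain=1.0, period=80,
               radiation_alpha=0.98, seed=0):
    """Excitation -> (G(z) if voiced) -> V(z) -> R(z)."""
    if voiced:
        train = [1.0 if n % period == 0 else 0.0 for n in range(num_samples)]
        excitation = fir_filter(rosenberg_pulse(), train)
    else:
        rng = random.Random(seed)
        excitation = [rng.gauss(0.0, 1.0) for _ in range(num_samples)]
    tract_output = all_pole_filter(alpha, gain, excitation)
    return fir_filter([1.0, -radiation_alpha], tract_output)


# Hypothetical vocal tract: one resonance at 500 Hz, 60 Hz bandwidth, fs = 8 kHz.
fs = 8000.0
radius = math.exp(-math.pi * 60.0 / fs)
theta = 2.0 * math.pi * 500.0 / fs
alpha = [2.0 * radius * math.cos(theta), -radius * radius]

vowel_like = synthesize(True, 400, alpha)
noise_like = synthesize(False, 400, alpha, gain=0.1)
print(len(vowel_like), len(noise_like))
```

Switching `voiced` selects between the periodic glottal source and the noise source, while the vocal tract and radiation stages stay the same.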


Frequency response curves for the various components of the speech model, and the resulting waveform


Another model for speech production, adapted from ‘Discrete-Time Speech Signal Processing: Principles and Practice’ by Thomas F. Quatieri

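[Block diagram: three sources (an impulse train shaped by G(z), random noise, and an impulsive input) are scaled by the gains Av, An, and Ai, merged in a linear/non-linear combiner, and passed through V(z) and R(z) to produce speech]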


Questions?

Thank you!
