Presentation 5 : Sound Propagation in the Human Vocal Tract Kocaeli University Speech Processing

Upload: doankhanh

Post on 14-Apr-2018

Sound propagation

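The vocal tract is modeled as a tube with a non-uniform, time-varying cross-sectional area

Speech corresponds to variations of air pressure and flow in such a system

A full description needs a complete specification of the area function A(x,t), where x is the position along the tube and t is time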

Uniform tube model

First, study the uniform lossless tube model, in which A(x,t) is constant in both x and t

Then add

simple models of losses due to soft walls, friction, and thermal conduction

a model for radiation at the lips

a source model at the glottis

a model for the nasal tract, so that there are two branches, as shown in the figure
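
As a worked illustration of the uniform lossless tube, the sketch below computes its resonance frequencies under the usual idealization of a tube closed at the glottis and open at the lips, where the resonances fall at odd multiples of the quarter-wavelength frequency. The tube length (17.5 cm), the speed of sound (350 m/s), and the function name are assumptions chosen for illustration; they do not come from the slides.

```python
# Resonances of a uniform lossless tube, closed at one end (glottis)
# and open at the other (lips).
# Assumed values, not from the slides: length 17.5 cm, sound speed 350 m/s.

def uniform_tube_resonances(length_m, sound_speed=350.0, count=3):
    """Return the first `count` resonances F_n = (2n - 1) * c / (4 * L) in Hz."""
    return [(2 * n - 1) * sound_speed / (4.0 * length_m)
            for n in range(1, count + 1)]

formants = uniform_tube_resonances(0.175)
print(formants)  # approximately 500, 1500, and 2500 Hz
```

With these assumed values the resonances are evenly spaced, which is roughly the formant pattern of a neutral vowel.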

Lossless tube model

A widely used model, based on the assumption that the vocal tract can be represented as a concatenation of lossless acoustic tubes of variable length, each with a constant cross-sectional area

The real vocal tract deviates from this model because of losses

Friction at the walls

Heat conduction through the walls

Vibration of the tube walls

Each loss can be included for a more detailed, but more complicated, model

A large number of short tubes can reasonably approximate the vocal tract
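
In a concatenated-tube approximation, each junction between two sections partially reflects the travelling wave, and the amount of reflection depends only on the two adjacent areas. The sketch below computes these junction reflection coefficients using one common sign convention, r_k = (A_{k+1} - A_k) / (A_{k+1} + A_k). The area values and the function name are hypothetical and serve only as an illustration.

```python
import numpy as np

def reflection_coefficients(areas):
    """Reflection coefficient at each junction between adjacent tube sections.

    Uses the convention r_k = (A_{k+1} - A_k) / (A_{k+1} + A_k),
    with sections ordered from glottis to lips.
    """
    areas = np.asarray(areas, dtype=float)
    return (areas[1:] - areas[:-1]) / (areas[1:] + areas[:-1])

# Hypothetical cross-sectional areas in cm^2 (illustrative values only).
areas = [2.0, 2.0, 4.0, 1.0, 3.0]
r = reflection_coefficients(areas)
print(r)
```

Two equal neighbouring areas give r = 0 (no reflection), and for positive areas every coefficient lies strictly between -1 and 1.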

Digital models for speech production

Lossless tube model

Discrete-time source/system model

Vocal Tract Model, V(z)

It can be approximated with an all-pole model for the majority of sounds

Nasals and fricatives require both poles and zeros

We may include zeros in the transfer function, or

we may introduce more poles, since the effect of a zero can be approximated by including more poles

Roots of V(z):

For stability, all poles must be inside the unit circle

$$V(z) = \frac{G}{1 - \sum_{k=1}^{N} \alpha_k z^{-k}}$$
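
To make the all-pole form concrete, the sketch below builds the denominator of V(z) from a set of formant frequencies and bandwidths, checks that every pole is inside the unit circle, and evaluates the frequency response. The sampling rate, the formant values, and the mapping from a bandwidth B to a pole radius exp(-pi*B/fs) are assumptions for illustration, not values taken from the slides.

```python
import numpy as np

fs = 8000.0  # sampling rate in Hz (assumed)
# Hypothetical formant frequencies and bandwidths in Hz.
formants = [500.0, 1500.0, 2500.0]
bandwidths = [60.0, 90.0, 120.0]

# Each formant contributes a complex-conjugate pole pair with
# radius exp(-pi * B / fs) and angle 2 * pi * F / fs.
poles = []
for f, b in zip(formants, bandwidths):
    p = np.exp(-np.pi * b / fs) * np.exp(2j * np.pi * f / fs)
    poles.extend([p, np.conj(p)])

# Denominator in powers of z^-1: 1 - sum_k alpha_k z^-k,
# so a[0] == 1 and a[k] == -alpha_k.
a = np.real(np.poly(poles))
alpha = -a[1:]
G = 1.0  # gain (assumed)

# Stability check: all poles must lie inside the unit circle.
stable = bool(np.all(np.abs(np.roots(a)) < 1.0))

# Frequency response V(e^{jw}) = G / (sum_k a[k] e^{-jwk}) from 0 to fs/2.
freqs = np.linspace(0.0, fs / 2, 512)
z_inv = np.exp(-2j * np.pi * freqs / fs)
V = G / np.polyval(a[::-1], z_inv)

print("alpha:", alpha)
print("stable:", stable)
```

Plotting 20*log10(|V|) against freqs should show one spectral peak near each assumed formant frequency.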

V(z) can be implemented in either of two structures

Direct form representation

Cascade of second-order systems
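
The two structures realize the same transfer function: multiplying the second-order denominators together gives the direct-form denominator. The sketch below checks this numerically by filtering a unit impulse both ways. The helper `all_pole_filter`, the sampling rate, and the formant and bandwidth values are hypothetical and chosen only for illustration.

```python
import numpy as np

def all_pole_filter(x, a):
    """Direct form recursion y[n] = x[n] - sum_{k>=1} a[k] * y[n-k], with a[0] == 1."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        acc = x[n]
        for k in range(1, min(n, len(a) - 1) + 1):
            acc -= a[k] * y[n - k]
        y[n] = acc
    return y

fs = 8000.0  # sampling rate in Hz (assumed)
pole_specs = [(500.0, 60.0), (1500.0, 90.0), (2500.0, 120.0)]  # hypothetical (F, B) in Hz

# One second-order section per conjugate pole pair:
# 1 - 2 r cos(theta) z^-1 + r^2 z^-2
sections = []
for f, b in pole_specs:
    r = np.exp(-np.pi * b / fs)
    theta = 2.0 * np.pi * f / fs
    sections.append(np.array([1.0, -2.0 * r * np.cos(theta), r * r]))

# The direct-form denominator is the product of the section polynomials.
a_direct = np.array([1.0])
for sec in sections:
    a_direct = np.convolve(a_direct, sec)

x = np.zeros(200)
x[0] = 1.0  # unit impulse

y_direct = all_pole_filter(x, a_direct)

y_cascade = x.copy()
for sec in sections:
    y_cascade = all_pole_filter(y_cascade, sec)

max_diff = float(np.max(np.abs(y_direct - y_cascade)))
print("largest difference between the two structures:", max_diff)
```

The outputs agree up to rounding error. In practice the cascade form is often preferred because each section maps to a single formant and is less sensitive to coefficient quantization.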

Radiation Model, R(z)

Can be approximated with a single zero slightly inside the unit circle, at z = α with α slightly less than 1

$$R(z) = 1 - \alpha z^{-1}$$
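
The radiation model is a first difference with a leak, so it behaves as a high-pass filter: its gain is 1 - α at zero frequency and 1 + α at half the sampling rate. The sketch below applies it to a signal and evaluates those two gains. The value α = 0.98 and the function names are assumptions for illustration.

```python
import numpy as np

alpha = 0.98  # zero location (assumed; the slides only require alpha < 1)

def radiate(x, alpha):
    """Apply R(z) = 1 - alpha * z^-1, i.e. y[n] = x[n] - alpha * x[n-1]."""
    x = np.asarray(x, dtype=float)
    y = x.copy()
    y[1:] -= alpha * x[:-1]
    return y

def radiation_gain(omega, alpha):
    """Magnitude of R(e^{j omega}) at normalized frequency omega in rad/sample."""
    return np.abs(1.0 - alpha * np.exp(-1j * omega))

gain_dc = radiation_gain(0.0, alpha)        # equals 1 - alpha
gain_nyquist = radiation_gain(np.pi, alpha) # equals 1 + alpha

# A constant input is almost completely removed after the first sample.
y = radiate(np.ones(5), alpha)
print(gain_dc, gain_nyquist, y)
```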

Excitation Model

For unvoiced sounds, the excitation can be modeled as white noise with a gain parameter that controls the intensity

For voiced sounds, recall the airflow through the glottis

Glottal Pulse Model, G(z)

Rosenberg’s glottal pulse approximation
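
The formula on this slide did not survive extraction. The sketch below uses the form of Rosenberg's pulse commonly given in speech processing textbooks: a raised-cosine opening phase of N1 samples followed by a quarter-cosine closing phase of N2 samples. The function name and the values of N1 and N2 are assumptions for illustration, so this may differ in detail from what the slide showed.

```python
import numpy as np

def rosenberg_pulse(n_open, n_close):
    """Rosenberg-style glottal pulse.

    g[n] = 0.5 * (1 - cos(pi * n / N1))        for 0 <= n <= N1
    g[n] = cos(pi * (n - N1) / (2 * N2))       for N1 < n <= N1 + N2
    g[n] = 0                                   otherwise
    """
    n_rise = np.arange(0, n_open + 1)
    n_fall = np.arange(n_open + 1, n_open + n_close + 1)
    rising = 0.5 * (1.0 - np.cos(np.pi * n_rise / n_open))
    falling = np.cos(np.pi * (n_fall - n_open) / (2.0 * n_close))
    return np.concatenate([rising, falling])

# Hypothetical opening and closing durations in samples.
g = rosenberg_pulse(n_open=25, n_close=10)
print(len(g), g.max())
```

The pulse rises smoothly from zero, peaks at n = N1, and closes faster than it opens, which mimics the asymmetric shape of glottal airflow.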

The complete model
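
The diagram for this slide is not recoverable, so the following is only a sketch of one common arrangement for voiced speech: an impulse train at the pitch period, shaped by a glottal pulse G(z), filtered by an all-pole vocal tract V(z), and passed through the radiation model R(z). All numeric values (sampling rate, pitch, formants, bandwidths, pulse durations, α, gain) are assumptions for illustration.

```python
import numpy as np

fs = 8000         # sampling rate in Hz (assumed)
pitch_hz = 100    # pitch of the voiced excitation (assumed)
n_samples = 1600  # 0.2 s of signal
gain = 1.0        # overall gain (assumed)

# Excitation: impulse train with one impulse per pitch period.
period = fs // pitch_hz
excitation = np.zeros(n_samples)
excitation[::period] = 1.0

# G(z): Rosenberg-style glottal pulse used as an FIR filter.
n_open, n_close = 25, 10
n_rise = np.arange(n_open + 1)
n_fall = np.arange(n_open + 1, n_open + n_close + 1)
g = np.concatenate([
    0.5 * (1.0 - np.cos(np.pi * n_rise / n_open)),
    np.cos(np.pi * (n_fall - n_open) / (2.0 * n_close)),
])
glottal_flow = np.convolve(excitation, g)[:n_samples]

# V(z): all-pole filter built from hypothetical (formant, bandwidth) pairs in Hz.
a = np.array([1.0])
for f, b in [(500.0, 60.0), (1500.0, 90.0), (2500.0, 120.0)]:
    r = np.exp(-np.pi * b / fs)
    a = np.convolve(a, [1.0, -2.0 * r * np.cos(2.0 * np.pi * f / fs), r * r])

tract_out = np.zeros(n_samples)
for n in range(n_samples):
    acc = glottal_flow[n]
    for k in range(1, min(n, len(a) - 1) + 1):
        acc -= a[k] * tract_out[n - k]
    tract_out[n] = acc

# R(z) = 1 - alpha * z^-1: radiation at the lips.
alpha = 0.98
speech = tract_out.copy()
speech[1:] -= alpha * tract_out[:-1]
speech *= gain

print(speech[:10])
```

For an unvoiced sound, the same chain could be driven by scaled white noise in place of the impulse train and glottal pulse.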

Frequency response curves for the various components of the speech model, and the resulting waveform

Another model for speech production, adapted from ‘Discrete-Time Speech Signal Processing: Principles and Practice’ by Thomas F. Quatieri

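[Figure: block diagram of the speech production model. Labeled blocks: impulse train, random noise, and impulsive input sources; glottal pulse model G(z); gains Av, An, and Ai applied through multipliers (X); linear/non-linear combiner; vocal tract model V(z); radiation model R(z); output: speech]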

Questions?

Thank you!