hcs 7367 speech perceptionassmann/hcs6367/lec1.pdf3 human vocal tract acoustics of speech...

19
1 HCS 7367 Speech Perception Dr. Peter Assmann Fall 2019 Course web page http://www.utdallas.edu/~assmann/hcs6367 Course information Lecture notes Speech demos Assigned readings Additional resources in speech & hearing Course materials No required text; all readings online Suggested background reading: W.F. Katz, P.F. Assmann (2019). The Routledge Handbook of Phonetics. 1st edition. Stevens, K.N. (1999). Acoustic Phonetics (Current Studies in Linguistics). M.I.T. Press. http://www.speechandhearing.net/library/speech_science.php Course requirements Class presentations (15%) Written reports on class presentations (15%) Midterm take-home exam (20%) Term paper (50%) Class presentations and reports Choose a topic from the field of speech perception and find a suitable (peer-reviewed) paper from the readings web page or from available journals. Your job is to present a brief (15 minute) summary of the paper to the class and initiate/lead discussion of the paper, then prepare a written report. Suggested Topics Speech acoustics Vowel production and perception Consonant production and perception Suprasegmentals and prosody Speech perception in noise Auditory grouping and segregation Speech perception and hearing loss Cochlear implants and speech coding Development of speech perception Second language acquisition Audiovisual speech perception Neural coding of speech Models of speech perception

Upload: others

Post on 26-Sep-2020

11 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: HCS 7367 Speech Perceptionassmann/hcs6367/lec1.pdf3 Human vocal tract Acoustics of speech Articulation Phonation Organs of speech • Lungs: apply pressure to generate air stream (power

1

HCS 7367Speech Perception

Dr. Peter Assmann

Fall 2019

Course web page

• http://www.utdallas.edu/~assmann/hcs6367

– Course information

– Lecture notes

– Speech demos

– Assigned readings

– Additional resources in speech & hearing

Course materials

• No required text; all readings online

• Suggested background reading:

W.F. Katz, P.F. Assmann (2019). The Routledge Handbook of

Phonetics. 1st edition.

Stevens, K.N. (1999). Acoustic Phonetics (Current Studies in

Linguistics). M.I.T. Press.

http://www.speechandhearing.net/library/speech_science.php

Course requirements

• Class presentations (15%)

• Written reports on class presentations (15%)

• Midterm take-home exam (20%)

• Term paper (50%)

Class presentations and reports

• Choose a topic from the field of speech perception and

find a suitable (peer-reviewed) paper from the readings

web page or from available journals. Your job is to

present a brief (15 minute) summary of the paper to the

class and initiate/lead discussion of the paper, then

prepare a written report.

Suggested Topics

• Speech acoustics

• Vowel production and perception

• Consonant production and perception

• Suprasegmentals and prosody

• Speech perception in noise

• Auditory grouping and segregation

• Speech perception and hearing loss

• Cochlear implants and speech coding

• Development of speech perception

• Second language acquisition

• Audiovisual speech perception

• Neural coding of speech

• Models of speech perception

Page 2: HCS 7367 Speech Perceptionassmann/hcs6367/lec1.pdf3 Human vocal tract Acoustics of speech Articulation Phonation Organs of speech • Lungs: apply pressure to generate air stream (power

2

PubMed search engine:

http://www.ncbi.nlm.nih.gov/pubmed

Finding papers

Journal of the Acoustical Society of America:

http://scitation.aip.org/JASA

Finding papers

UTD online journals

http://www.utdallas.edu/library/resources/journals.html

• Click on link A – Z List of eJournals or search

directly by journal name

Journal of the Acoustical Society of America

Primate vocal tractThe evolution of speech:

a comparative review

W. Tecumseh Fitch

Trends in Cognitive Sciences 4(7) July 2000

larynx

orangutan chimpanzee humantongue body

larynx

air sac

Primate vocal tractThe evolution of speech:

a comparative review

W. Tecumseh Fitch

Trends in Cognitive Sciences 4(7) July 2000 Source-filter theory of speech production

OutputSound

Vocal Tract

VibratingVocal folds

Lungs

Power supply Oscillator Resonator

Page 3: HCS 7367 Speech Perceptionassmann/hcs6367/lec1.pdf3 Human vocal tract Acoustics of speech Articulation Phonation Organs of speech • Lungs: apply pressure to generate air stream (power

3

Human vocal tract

Acoustics

of speech

● Articulation

● Phonation

Organs of speech

• Lungs: apply pressure to generate

air stream (power supply)

• Larynx: air forced through the

glottis, a small opening between the

vocal folds (sound source)

• Vocal tract: pharynx, oral and

nasal cavities serve as complex

resonators (filter)

Source-Filter Theory

From Fitch, W.T. (2000). Trends in Cognitive Sciences

Audio demo: the source signal

• Source signal for an adult male voice

• Source signal for an adult female voice

• Source signal for a 10-year child

Vocal fold oscillation

• One-mass model

– Air flow through the

glottis during the closing

phase travels at the

same speed because of

inertia, producing

lowered air pressure

above the glottis.

Source: http://www.ncvs.org/ncvs/tutorials/voiceprod/tutorial/model.html

Page 4: HCS 7367 Speech Perceptionassmann/hcs6367/lec1.pdf3 Human vocal tract Acoustics of speech Articulation Phonation Organs of speech • Lungs: apply pressure to generate air stream (power

4

Vocal fold oscillation

• Three-Mass Model

– One large mass

(representing the

thyroarytenoid muscle)

and two smaller masses,

M1 and M2 (representing

the vocal fold surface) .

All three masses are

connected by springs and

damping constants.

Source: http://www.ncvs.org/ncvs/tutorials/voiceprod/tutorial/model.html

Source-Filter Theory: Vowels

G. Fant (1960). Acoustic Theory of Speech Production

Linear systems theory

Assumptions: (1) linearity (2) time-invariance

Vowels can be decomposed into two primary components: a source (input signal) and a filter(modulates the input).

Time domain version:

U( t ) T( t ) R( t ) = P( t )

Frequency domain version:

U( f ) · T( f ) · R( f ) = P( f )

Source-Filter Theory: Vowels

Frequency

Am

pli

tud

e

Source properties

In voiced sounds the glottal source spectrum contains

a series of lines called harmonics.

The lowest one is called the fundamental frequency

(F0).

0 200 400 600 800 1000-50

-40

-30

-20

-10

0

Frequency (Hz)

Re

lative

Am

plit

ud

e (

dB

)

Frequency (Hz)

F0

Amplitude

Spectrum

Demo: harmonic synthesis

Additive harmonic synthesis: vowel /i/

Cumulative sum of harmonics: vowel /i/

Additive synthesis: “wheel”

Cumulative sum of partials:

Filter properties

The vocal tract resonances (called formants) produce

peaks in the spectrum envelope.

Formants are labelled F1, F2, F3, ... in order of

increasing frequency.

0 1 2 3 4-50

-40

-30

-20

-10

0

Frequency (kHz)

Am

plit

ud

e in

dB

F1 F2

F3

F4AmplitudeSpectrum

(with superimposedLPC spectral envelope)

Page 5: HCS 7367 Speech Perceptionassmann/hcs6367/lec1.pdf3 Human vocal tract Acoustics of speech Articulation Phonation Organs of speech • Lungs: apply pressure to generate air stream (power

5

source filter radiation=output sound

Frequency

Am

pli

tud

e

/ i /

/ ɑ //ɑ/

Source-filter theory

Source: J. Hillenbrand

Robert Mannell, Macquarie Universityhttp://clas.mq.edu.au/phonetics/phonetics/vowelartic/vowel_artic.gif

Source-filter theorySource: J. Hillenbrand

Source-filter theorySource: J. Hillenbrand

Source-filter theorySource: J. Hillenbrand

Source-filter theorySource: J. Hillenbrand

/s/ and /S/ SF Models Overlaid

Page 6: HCS 7367 Speech Perceptionassmann/hcs6367/lec1.pdf3 Human vocal tract Acoustics of speech Articulation Phonation Organs of speech • Lungs: apply pressure to generate air stream (power

6

Speech terminology…

Fundamental frequency (F0): lowest frequency component in voiced speech sounds, linked to vocal fold vibration.

Formants: resonances of the vocal tract.

F0

Formant

Frequency

Amplitude

Source properties: Pitch

Fundamental frequency (F0) is determined by the

rate of vocal fold vibration, and is responsible for the

perceived voice pitch.

Source properties: Pitch

F0 can be removed by filtering (as in telephone circuits)

and the pitch remains the same.

This is the problem of the missing fundamental, one

of the oldest problems in hearing science.

Pitch is determined by the frequency pattern of the

harmonics (or their equivalent in the time domain, the

periodicities in the waveform).

Harmonicity and Periodicity

Period: regularly repeating pattern in the

waveform

Period duration, T0 = 6 ms

waveform

Harmonicity and Periodicity

Harmonic: regularly repeating peak in the

amplitude spectrum

0 0.5 1 1.5 2 2.5

-40

-20

0

20

Frequency (kHz)

Am

plit

ud

e (

dB

)

F0 = 1000 / 6 = 166 Hz

F0 = 1 / T0

amplitude

spectrum

Harmonics are

integer multiples

of F0 and are

evenly spaced in

frequency

Harmonicity and Periodicity

• Period: regularly repeating pattern in

the waveformPeriod duration T0 = 6 ms

0 0.5 1 1.5 2 2.5

-40

-20

0

20

Frequency (kHz)

Am

plit

ud

e (

dB

)

F0 = 1000 / 6 = 166 Hz

F0 = 1 / T0

Waveform

Amplitude Spectrum

Frequency (kHz)

Page 7: HCS 7367 Speech Perceptionassmann/hcs6367/lec1.pdf3 Human vocal tract Acoustics of speech Articulation Phonation Organs of speech • Lungs: apply pressure to generate air stream (power

7

Harmonic singing

Harmonic singing (also called overtone

singing) involves changing the shape of the

vocal tract to align the resonance frequencies

(formants) with harmonics of the fundamental.

A low, sustained fundamental is produced,

similar to the drone of a bagpipe, along with

flute-like harmonics that drift in and out.

Harmonic singing

Harmonic singing (also called overtone

singing) involves changing the shape of the

vocal tract to align the resonance frequencies

(formants) with harmonics of the fundamental.

A low, sustained fundamental is produced,

similar to the drone of a bagpipe, along with

flute-like harmonics that drift in and out.

Harmonic singing

Tuvan throat singing

http://www.youtube.com/watch?v=DY1pcEtHI_w&feature=youtu.be

Amazing Grace

http://www.youtube.com/watch?v=mO4Uh-Mini4&feature=youtu.be

Effects of F0 changes

Source-filter

independence

Effects of formant frequency changes

Source-filter

independence

Voicing irregularities

Shimmer: variation in amplitude from one

cycle to the next.

Jitter: variation in frequency (period duration)

from one cycle to the next.

Page 8: HCS 7367 Speech Perceptionassmann/hcs6367/lec1.pdf3 Human vocal tract Acoustics of speech Articulation Phonation Organs of speech • Lungs: apply pressure to generate air stream (power

8

Voicing irregularities

Breathy voice is associated with a glottal

waveform with a steeper roll-off than modal

voice. As a result there is less energy in the

higher harmonics (steeper slope in the

spectrum).

Vocal tract properties

Resonating tube model– approximation for neutral vowel (schwa), [ə]

– closed at one end (glottis); open at the other (lips)

– uniform cross-sectional area

– curvature is relatively unimportant

Glottis Lips

length, L

/ ə /

Uniform tube model (schwa)Vocal tract model

Quarter-wave resonator:

Fn = ( 2n – 1 ) c / 4 L

– Fn is the frequency of formant n in Hz

– c is the velocity of sound (about 35000 cm/sec)

– L is the length of the vocal tract (17.5 for adult male)

Vocal tract model

Quarter-wave resonator:

Fn = ( 2n – 1 ) c / 4 L

– F1 = (2(1) –1)*35000/(4*17.5) = 500 Hz

– F2 = (2(2) –1)*35000/(4*17.5) = 1500 Hz

– F3 = (2(3) –1)*35000/(4*17.5) = 2500 Hz

Acoustic vowel space

i “heed”

ɑ “hod”

u “who’d”

First formant, F1 frequency (Hz)

Se

con

d f

orm

ant,

F2

fre

qu

en

cy (

Hz)

10008006004002000

0

1000

2000

3000

Ə

Page 9: HCS 7367 Speech Perceptionassmann/hcs6367/lec1.pdf3 Human vocal tract Acoustics of speech Articulation Phonation Organs of speech • Lungs: apply pressure to generate air stream (power

9

Vocal tract model

Quarter-wave resonator:

Fn = ( 2n – 1 ) c / 4 L

– Fn is the frequency of formant n in Hz

– c is the velocity of sound in air (about 35000 cm/sec)

– L is the length of the vocal tract (17.5 for adult male)

L

Vocal tract model

Quarter-wave resonator:

Fn = ( 2n – 1 ) c / 4 L

– F1 = (2(1) –1)*35000/(4*17.5) = 500 Hz

– F2 = (2(2) –1)*35000/(4*17.5) = 1500 Hz

– F3 = (2(3) –1)*35000/(4*17.5) = 2500 Hz

L

Helium speech

The speed of sound in a helium/oxygen mixture

at 20°C is about 93000 cm/s, compared to

35000 cm/s in air. This increases the resonance

frequencies but has relatively little effect on F0.

In helium speech, the formants are shifted up

but the pitch stays the same.

Helium speech

Using Matlab as a calculator, find the

frequencies of F1, F2 and F3 for a 17.5 cm

vocal tract producing the vowel /ә/ in a

helium/air mixture (velocity c ≈ 93000 cm/s)

Fn = ( 2n – 1 ) c / 4 L

F1 = (2(1) –1)*93000/(4*17.5) =

F2 = (2(2) –1)*93000 /(4*17.5) =

F3 = (2(3) –1)*93000 /(4*17.5) =

Helium speech

Audio demos

– Speech in air

– Speech in helium

– Pitch in air

– Pitch in helium

http://phys.unsw.edu.au/phys_about/PHYSICS!/SPEECH_HELIUM/speech.htmlTime (ms)

Fre

quency (

kH

z)

0 100 200 300 400 500 600 700 800 9000

1

2

3

4

Speech in airSpeech in air

http://phys.unsw.edu.au/phys_about/PHYSICS!/SPEECH_HELIUM/speech.html

Page 10: HCS 7367 Speech Perceptionassmann/hcs6367/lec1.pdf3 Human vocal tract Acoustics of speech Articulation Phonation Organs of speech • Lungs: apply pressure to generate air stream (power

10

Time (ms)

Fre

quency (

kH

z)

0 100 200 300 400 500 600 700 8000

1

2

3

4

Speech in heliumSpeech in helium

http://phys.unsw.edu.au/phys_about/PHYSICS!/SPEECH_HELIUM/speech.html

Sulfur Hexaflouride

Helium

– density of 0.1786 g/L at sea level

Air

– density of 1.225 g/L at sea level

Sulfur Hexaflouride (SF6)

– density of 6.12 g/L at sea level

Speech production with vocal tract filled with SF6

http://www.youtube.com/watch?v=d-XbjFn3aqE

Perturbation Theory

The first formant (F1) frequency is lowered by

a constriction in the front half of the vocal tract

(/u/ and /i/), and raised when the constriction is

in the back of the vocal tract, as in /ɑ/.

delta

F1

glottis lips

Perturbation Theory

The second formant (F2) is lowered by a

constriction near the lips or just above the

pharynx; in /u/ both of these regions are

constricted. F2 is raised when the constriction is

behind the lips and teeth, as in the vowel /i/.

delta

F2

glottis lips

Perturbation Theory

The third formant (F3) is lowered by a

constriction at the lips or at the back of the

mouth or in the upper pharynx. This occurs in

/r/ and /r/-colored vowels like American English / ɚ /.

delta

F3

glottis lips

Perturbation Theory

F3 is raised when the constriction is behind

the lips and teeth or near the upper pharynx.

delta

F3

glottis lips

Page 11: HCS 7367 Speech Perceptionassmann/hcs6367/lec1.pdf3 Human vocal tract Acoustics of speech Articulation Phonation Organs of speech • Lungs: apply pressure to generate air stream (power

11

Perturbation Theory

All formants tend to drop in frequency when

the vocal tract length is increased or when a

constriction is formed at the lips.

glottis lips

Perturbation Theory

F1 frequency is correlated with jaw

opening (and inversely related to tongue

height ).

0 1 2 3 4-50

-40

-30

-20

-10

0

Frequency (kHz)

Am

plit

ud

e in

dB

amplitude

spectrum

Perturbation Theory

F2 frequency is correlated with tongue

advancement (front-back dimension)

0 1 2 3 4-50

-40

-30

-20

-10

0

Frequency (kHz)

Am

plit

ud

e in

dB

amplitude

spectrum

Wavesurfer

Download Wavesurfer:www.speech.kth.se/wavesurfer

Wavesurfer User Manualwww.speech.kth.se/wavesurfer/man.html

Digital representations of signals

time

am

plit

ude

sampling quantization

Page 12: HCS 7367 Speech Perceptionassmann/hcs6367/lec1.pdf3 Human vocal tract Acoustics of speech Articulation Phonation Organs of speech • Lungs: apply pressure to generate air stream (power

12

Spectral analysis

Amplitude spectrum: sound pressure levels

associated with different frequency

components of a signal Power or intensity

Amplitude or magnitude

Log units and decibels (dB)

Phase spectrum: relative phases associated

with different frequency components Degrees or radians

Spectral analysis of speech

Why perform a frequency analyses of speech?

– Ear+brain carry out a form of frequency analysis

– Relevant features of speech are more readily visible

in the amplitude spectrum than in the raw waveform

Spectral analysis of speech

But: the ear is not a spectrum analyzer.

– Auditory frequency selectivity is best at low

frequencies and gets progressively worse at higher

frequencies.

Measuring formants

0 500 1000 1500 2000 2500 3000 3500 4000 0

20

40

60

Click here to play vowel /i/ Relative amplitude (d

B)

0 500 1000 1500 2000 2500 3000 3500 4000 0

20

40

60

Click here to play vowel /a/ Relative amplitude (d

B)

0 500 1000 1500 2000 2500 3000 3500 4000 0

20

40

60

Click here to play vowel /u/

Frequency (Hz)

Relative amplitude (d

B)

Click on lines to hear individual harmonics

Formant frequency peak

estimation requires an

interpolation process.

amplitude

spectrum

0 2 4 6-20

0

20

40

60

Frequency (kHz)

Am

plit

ude (

dB

)

Formant Estimation

F1 F2 F3Vowel spectra have peaks

corresponding to the center frequencies of formants

Formants

Spectrum of vowel in “head”

0 0.5 1 1.5 220

30

40

50

60

Frequency (kHz)

Am

plit

ude (

dB

)

Formant Estimation

F1But: harmonics also

generate spectral peaks;formant frequencies do not necessarily coincide with

harmonic frequencies

harmonics

Page 13: HCS 7367 Speech Perceptionassmann/hcs6367/lec1.pdf3 Human vocal tract Acoustics of speech Articulation Phonation Organs of speech • Lungs: apply pressure to generate air stream (power

13

Children’s speech

Children’s voices have high F0s.

When F0 is 400 Hz (not unusual for 3-year

olds), only 4 harmonics appear in the

frequency range between 0-1600 Hz.

Frequency (kHz)

Am

pli

tud

e

0 1.5

Sparce sampling problem

Vowel identity is dependent on the

frequencies of formant peaks.

Formants are difficult to estimate when

fundamental frequency is high.

Am

pli

tud

e

0 1.5

0 0.5 1 1.5 2 2.5 3 3.50

10

20

30

40

50

60

70

Frequency (kHz)

Am

plit

ude

(dB

)

LPC spectrum

FFT spectrum

LPC spectrumFormants sometimes appear to merge

LPC

spectrum

Short-term amplitude spectrum

0 1 2 3 4-10

0

10

20

30

40

50

60

Frequency (kHz)

Am

plit

ude (

dB

)

F3 = 2755 Hz

F1 = 281 Hz

F2 = 2196 Hz

Page 14: HCS 7367 Speech Perceptionassmann/hcs6367/lec1.pdf3 Human vocal tract Acoustics of speech Articulation Phonation Organs of speech • Lungs: apply pressure to generate air stream (power

14

Speech spectrograms

What is a speech spectrogram?

– Display of amplitude spectrum at successive

instants in time ("running spectra")

– How can 3 dimensions be represented on a two-

dimensional display? Gray-scale spectrogram

Waterfall plots

Animation

Speech spectrogram

running amplitude spectra (codes amplitude

changes in different frequency bands over time).

Speech spectrograms

What is a speech spectrogram?

– Display of amplitude spectrum at successive

instants in time ("running spectra")

– How can 3 dimensions be represented on a two-

dimensional display? Gray-scale spectrogram

Waterfall plots

Animation

Speech spectrograms

Why are speech spectrograms useful?

– Shows dynamic properties of speech

– Incorporates frequency analysis

– Related to speech production

– Helps to visually identify speech cues

“The watchdog”

waveform

spectrogram

F3

F2

F10

5

Fre

qu

en

cy (

kH

z)

Page 15: HCS 7367 Speech Perceptionassmann/hcs6367/lec1.pdf3 Human vocal tract Acoustics of speech Articulation Phonation Organs of speech • Lungs: apply pressure to generate air stream (power

15

F3

F2

F1

0

4

Fre

qu

en

cy (

kH

z)

0

TrackDraw: a graphical speech synthesizer

TrackDraw: a graphical speech synthesizer

i “heed”

ɩ “hid”

e “hayed”

ɛ “head”

æ “had”ʌ “hut”

ɑ “hod”

ɔ “hawed”

o “hoed”

ʊ “hood”

u “who’d”

American English vowel space

Advancement

Height

F1

← F2

front center back

high

mid

low

Ə “schwa”

Peterson and Barney (1952)

Acoustic measurements (made from spectrograms) of formant frequencies (F1, F2, F3) in vowels spoken by 76 men, women and children.

vowel space: projection of a given talker’s vowels in a F1 x F2 plane

Simple target model: vowels are differentiated (perceptually) by F1 and F2 frequencies measured in the middle of the vowel (vowel target).

200 300 400 600 800 1000 200

1000

1500

2000

2500

3000

3500

i

i

L

a

cu

i

i

ie

æ

L

a

cu

i

i

ie

æ

L

a

c

u

i

F1 frequency (Hz)

Fre

qu

en

cy o

f F

2 (

Hz)

Peterson and Barney (1952)

Men

Women

Children

Peterson and Barney (1952)

Page 16: HCS 7367 Speech Perceptionassmann/hcs6367/lec1.pdf3 Human vocal tract Acoustics of speech Articulation Phonation Organs of speech • Lungs: apply pressure to generate air stream (power

16

i “heed”

ɩ “hid”

e “hayed”

ɛ “head”

æ “had”

ʌ “hut”

ɑ “hod”

ɔ “hawed”

o “hoed”

ʊ “hood”

u “who’d”

American English vowel space

Advancement

Height

F1

← F2

front center back

high

mid

low

Ə “schwa”

0.2 0.4 0.6 0.8 1.0 0.5

1.0

1.5

2.0

2.5

3.0

3.5

4.0

Fre

quency o

f F

2 (

Hz)

Peterson and Barney (1952)

i

i

i

i

i

ii

i

ii

ii

ii

ii

iii

iii

i i

iiii

ii

ii

ii

i i

ii

i i

iii

i

ii

i i

ii

ii

ii

ii

iiii

ii

i

i i i

ii

ii

ii i

ii

i

ii ii

ii

ii

iii i

ii i i

i

ii

i

i i

ii

ii

ii

ii

i i

i

i

i

i

ii

i i

i

iii

ii

i i

ii ii

ii

i

i

e

eee

ee e

e e

eeeee

e

eee

e

e ee

eee

e

e eee

e

e

ee

e e

ee ee

ee ee

ee

ee

e

ee

eee ee

ee ee e

e e

eee

@

@

@@@

@

@@

@@@ @

@@

@

@@@

@@ @

@@

@@@

@@@

@

@

@@

@@@@

@

@

@

@ @

@@

@

@

@@

@@@

@

@@@@@@

@@

@@

@@

@@

LL

LL

LL

LL

L

L L

LLLLL

LL

LL

LL

LL

LL

LL

LL

L

L

LL

LL

LL LL

L

L

LL

L

L

L

L

L

L

LL

LL

L

L

L LLL

LL

LL

LL

a aa

aa a a a

aa

a

aaa

aa

aaa

aaa

a

a

aaaa

aa

aa

a

aaa

a aaa

aaa

a

aa

a

aaa

a

aa

a

a

a

aa

a

a

a

a

a

a aa

c c

cc

ccc

c

c

c

c

c

c

c

cc

cc

cc

c

c cc

cc

cc

c

c

cc

c

c

cc

ccc

c

c

cc

c

c

c

c

c

c

c

c

c

cc

cc

cc

c c

c

c

c

c

cc

vv

v

v

v

v

vvvv

v

vv vv

v

vvv

v

vv

vv

v

vv

v

vv

vv

v

v

vv

v

v

vv

vv

v

vv

v

v

v

v vvv

vv

v

v

vv

vvv

vvv vv

u

u

uu

u

u

u

u

u

uu

u

uu

u

u

u

u

uu

u

u

u

u

uu

uu

uu

uu

uu

u

u

uu

u

u

u u

u

u

u

u

uu

u

u

u

u

u

u

u

uu

u u

u

uuu

uuu

3

3

3

3

33

33

33

33 3

333

3

3

33

3

333

3333

3

3

33

3 33

333

3

33 3

33

33

3

3

33

33

3

3

3 3

333

3 33

33

33

iii

iii

ii

ii

ii

ii

iiii

ii

iii i

ii

i i

i i

ii

ii

ii

i iii

ii ii

i i

ii

i

iii

ii

ii

ii

iiii

ii

i

ii

iii

ii

ii

iii

ii

iiiii

ii

i

i

ii

iii

i

iiii

iiii

ii

i

i

iii

i

i iee

ee

e

e

ee

ee

e

eee

e e

ee

ee

ee

e

e eeee

ee

e

ee

e

ee

eee e

ee

e

eee

e

e

e ee e

ee

e

e

@@

@@

@@

@@@@

@

@

@@@@ @

@

@@

@

@@@

@

@@ @ @

@

@@

@@

@

@

@@ @

@@

@

@@

@@

@@

@@

@

@

@@

@@

L L

L

LLL

LL

L

L

LLLL

L

L

L

L

LLL

LL

L

LL

LL

LL

LLL

LL

L

L

L

L

L

LL

LL

L

LLL

LL

L

L

LL

L

L

aa

a

aaa

aa

aa aa

a aaaa

aa

aa

aa

a

a

a

aa

aaa

a

a

a aaa

a

a

a

aa

aa

aa

aa

aa

aa

a

a

a

a

c

c

cc

c

c

cc

c

c

cc

c cc

c

c

c c

c

c

c

c c

cc c

c

cc

cc

cc

cc

c

c

c

cc c

c

c

cccc

cc

c

c

cc

cc

vv

vv

vvv

vv v

vvv

v

vv

v

v

vvv

vv

v

vv

v v

v

v

v

vv

v

v

v

vv

vv

vv

v

v

v

v

vv v

v

vvvv

v v

u

u

u

uu

u

uu

uu

u

u

u u

u

u

u

u

u

u

u

u

u

u

u

u

u u

u

u

u

u

uu

u

u

uu

u

u

uu

u

u

uu

uu

u

uuu

u

u

u

u

3

3

33

3

3

33

3

3

3

33

3

3 3

33 3

3

333

3

33

3

33 3

33

33

33

33

3 3

3

3

33

3

3

3

3

33

33 3

3

33

ii

ii

i

ii

i

ii

i

i

ii

i

i

i

ii

i

i

i

iiii

ii

ii

ii

i

i

ii

ii

i i

i

i

i

i iiii

ii

i i

i i i

ii

i

ii

e

e

e

e

ee

ee

e

e

ee

eee

eee

e e

ee

eeee

ee

ee

@

@@

@

@ @

@@

@@

@

@

@@

@

@

@@

@@

@

@@

@ @

@

@@

@@

LLL

L

L

L

LL L

L

LL

LL

L

L

LL

LL

L L

L

L LLL

L L

L

a

a

a

a

a

a

a

aa

aa

a

aa

a

a

aa

a

a

a

a

a

a

a

a

a a

aa

c cc

c

c

c

cc

c

c

c

c

c

c

cc

c

c

c

cc

cc

c

c

c

c

c

cc

vvvv

v

vv

v

v

v v

v

v

v

v

v

vv

v

v v

vv

v

v vv

v

v

v

u

u

u

u

u u

u

u

u

u u

u

u

u

u

u

u

u

u

u

u

u

u

u

u

u

uu

u

u

3

3

33

33

3

3

33

33

3

3

3

333

33

33

3

3

3 3

3333

Peterson and Barney (1952)

Frequency of F1 (kHz)

Fre

qu

ency

of

F2

(k

Hz)

0.2 0.4 0.6 0.8 1.00.5

1.0

1.5

2.0

2.5

3.0

3.5

4.0

Fre

quency o

f F

2 (

Hz)

Peterson and Barney (1952)

ii

i

i

ii

i

i

ii

ii

ii

ii

iii

iii

i i

iiii

ii

ii

ii

i i

ii

i i

iii

i

ii

i i

ii

ii

ii

ii

iiii

ii

ii i i

ii

ii

ii i

ii

i

ii ii

ii

ii

iii i

ii i i

i

ii

i

i i

ii

ii

ii

ii

i i

ii

i

i

ii

i i

i

iii

ii

i iii i

i

ii

ii

e

eee

ee e

e e

eeeee

e

eee

ee e

eeee

e

e eee

ee

ee

e e

ee ee

ee ee

ee

ee

ee

e

eee ee

ee ee e

e e

eee

@

@@@@

@

@@

@@@ @

@@

@

@@@

@@ @

@@

@@@

@@@@

@

@@

@@@@

@

@

@

@ @@

@

@@

@@

@@@

@

@@@@@@

@@

@@

@@

@@

LL

LL

LL

LL

L

L L

LLLLL

LL

LL

LL

LL

LL

LL

LL

LL

LL

LL

LL LL

L

L

LL

L

L

LL

L

L

LL

LL

L

LLL

LL

LL

LL

LL

a aa

aa a aa

aa

a

aaa

aa

aaaaa

a

a

a

aaaa

aa

aa

aaaa

a aaa

aaa

a

aa

a

aaa

a

aa

a

a

a

aa

a

a

a

aa

a aa

c c

cc

ccc

c

c

c

c

c

c

c

cc

cc

cc

c

c cc

cc

cc

c

c

cc

c

c

cc

ccc

c

c

cc

c

c

c

c

c

c

c

c

c

cc

cc

cc

c c

c

c

c

c

cc

vv

v

v

v

v

vvvv

vvv vv

v

vvv

v

vv

vv

v

vv

v

vv

vv

v

v

vv

v

v

vv

vv

v

vv

v

v

v

v vvv

vv

v

v

vv

vvv

vvv vv

u

u

uu

u

u

u

u

u

uu

u

uu

u

u

u

u

uu

u

u

u

u

uu

uu

uu

uu

uu

u

u

uu

u

u

u u

uu

u

u

uu

u

u

u

u

u

u

u

uu

u u

u

uuu

uuu

3

3

3

3

33

333

3

33 33

33

3

3

33

3

333

3333

33

33

3 33

333

3

33 3

33

333

3

33

33

3

3

3 3

333

3 33

33

33

iii

iii

ii

ii

ii

ii

iiii

ii

iii ii

ii ii i

ii

ii

iii ii

ii

i iii i

ii

ii

iii

ii

i

ii

iiii

ii

ii

ii

ii

ii

ii

iii

ii

iiiii

ii

i

i

ii

iii

iii

iiii

ii

ii

ii

iii

i

i iee

ee

e

e

ee

ee

e

eee

e e

ee

ee

ee

e

e eeee

ee

e

ee

e

ee

eee e

ee

e

eee

ee

e eee

ee

e

e

@@

@@

@@

@@@@

@

@

@@@@ @@

@@@

@@@

@

@@ @ @

@

@@

@@

@

@

@@ @

@@

@

@@

@@

@@

@@

@

@

@@

@@

L L

L

LLL

LL

L

L

LLLL

LL

L

L

LLL

LL

L

LL

LL

LL

LLL

LL

L

L

L

L

L

LL

LL

L

LLL

LL

L

L

LL

L

L

aa

a

aaa

aa

aa aa

a aaaa

aa

aa

aa

a

a

a

aa

aaa

a

a

a aaa

a

aa

aa

aa

aa

aa

aa

aa

a

a

a

a

c

c

cc c

c

cc

c

c

cc

c cc

c

c

c c

c

c

c

c c

cc c

c

cc

cc

cc

cc

c

c

c

cc c

c

c

cccc

cc

c

c

cc

cc

vv

vv

vvv

vv v

vvv

v

vv

v

v

vvv

vv

v

vv

v v

v

v

v

vv

v

v

v

vv

vv

vv

v

v

v

v

vv v

v

vvvv

v v

u

u

u

uu

u

uu

uu

u

u

u u

uu

u

u

u

u

u

u

u

u

u

u

u u

u

u

u

u

uu

u

u

uu

u

u

uu

u

u

uu

uu

u

uuu

u

u

u

u

33

33

3

3

33

3

3

3

33

3

3 3

33 3

3

333

3

33

3

33 3

33

33

33

33

3 3

3

3

33

3

3

3

333

33 3

33

3

ii

ii

i

ii

i

ii

i

i

ii

i

i

i

ii

i

i

i

iiii

ii

ii

ii

i

i

ii

ii

i i

i

i

ii ii

ii

ii

i i

i i i

ii

i

ii

e

e

e

e

ee

ee

ee

ee

eee

eee

e e

ee

eeee

ee

ee

@

@@

@

@ @

@@

@@

@

@

@@

@

@

@@

@@

@

@@

@ @@

@@

@@

LLL

L

L

L

LL L

L

LL

LL

L

L

LL

LL

L L

L

L LLL

L L

L

a

a

a

a

a

a

a

aa

aa

a

aa

a

a

aa

a

a

a

a

a

a

aa

a a

aa

c ccc

c

c

cc

c

c

c

c

c

c

cc

c

c

c

cc

cc

c

c

c

c

c

cc

vvvv

v

vv

v

v

v v

v

v

v

v

v

vv

v

v v

vv

v

v vv

v

v

v

u

u

u

u

u u

u

u

u

u u

u

u

u

u

u

u

u

u

u

u

u

u

u

u

u

uu

u

u

3

3

33

33

3

3

33

33

3

3

3

333

33

33

3

3

33

3333

Invariance problem

Frequency of F1 (kHz)

Fre

qu

ency

of

F2

(k

Hz)

Invariance problem

Dynamic cues in vowel perception

Talker normalization theories

– Potter and Steinberg (1950): invariant pattern

of stimulation shifted up or down along the

basilar membrane

– Miller (1989): Formant ratio theory

– Joos (1948): Frame of reference theory

– Nearey (1989): Extrinsic and intrinsic factors

Formant Dynamics

Formant frequency changes over time:

0 1 2 3 4 5 0

20

40

60

Frequency (kHz)

Am

plit

ude (

dB

)

F1

F2

F3

F4 F5

Vowel-inherent spectral change

Page 17: HCS 7367 Speech Perceptionassmann/hcs6367/lec1.pdf3 Human vocal tract Acoustics of speech Articulation Phonation Organs of speech • Lungs: apply pressure to generate air stream (power

17

Dual-target model

F1

F2

F1

F2

Males

Females

““ formant tracks

5 6 7 8 9 10 11 12 13 14 15 16 17 18

1200

1400

1600

1800

Age (years)

Geo. M

ean form

ant fr

equency (

Hz)

Boys

Girls

FFs as a function of age and sex Vowel formant space: F1 x F2

Current study: Adults

300 400 500 600 800 1000

1000

1500

2000

2500

3000

3500

U o ɑ

æI

i

Uo

ɑ

ʌ

æɛ

I

i

Males

Females

F1 Frequency (Hz)

F2

Fre

que

ncy (

Hz)

ɛ

ʌ

Vowel formant space: F1 x F2Current study: Children (all age groups)

300 400 500 600 800 1000

1000

1500

2000

2500

3000

3500

Ã

I

ʌBoys

Girls

F1 Frequency (Hz)

F2

Fre

qu

en

cy (

Hz)

ii

II ɛ

ɛ ææU

U oo ɑ

ɑ

Page 18: HCS 7367 Speech Perceptionassmann/hcs6367/lec1.pdf3 Human vocal tract Acoustics of speech Articulation Phonation Organs of speech • Lungs: apply pressure to generate air stream (power

18

Graphical interpretation of CLIH

(sliding template) model

Movement along

diagonal for

different

speakers

Fixed pattern

of ‘holes’ in the

template

correspond to

stored vowel

reference

pattern

Nearey & Assmann,

2006

5 6 7 8 9 10 11 12 13 14 15 16 17 18

100

150

200

300

Age (years)

Fundam

enta

l fr

equency (

Hz)

Boys

Girls

F0 as a function of age and sex F0 distribution – child talkers

50 100 150 200 250 300 350 4000

200

400

600

800

1000

Fundamental Frequency (Hz)

Num

ber

of

tokens

F0 distribution – males (blue)

50 100 150 200 250 300 350 4000

200

400

600

800

1000

Fundamental Frequency (Hz)

Num

ber

of

tokens

F0 distribution – females (red)

50 100 150 200 250 300 350 4000

200

400

600

800

1000

Fundamental Frequency (Hz)

Num

ber

of

tokens

Page 19: HCS 7367 Speech Perceptionassmann/hcs6367/lec1.pdf3 Human vocal tract Acoustics of speech Articulation Phonation Organs of speech • Lungs: apply pressure to generate air stream (power

19