ch11 lecture-f revised.ppt - the university of texas at...

4/23/2008

11 Music and Speech Perception

Properties of sound

• Sound has three basic dimensions:

– Frequency (pitch)

– Intensity (loudness)

– Time (length)

Properties of sound

• The frequency of a sound wave, measured in cycles per second or hertz (abbreviated Hz), is the number of cycles the wave completes in one second. The more cycles per second, the higher the pitch we hear.

[Figure: waveforms of a high-pitched tone and a low-pitched tone; x-axis: Time (sec)]

Properties of sound

• The intensity of a sound wave is measured in decibels (abbreviated dB). The higher the intensity of a sound, the louder it sounds.

[Figure: waveforms of a low-intensity sound and a high-intensity sound; x-axis: Time (sec)]

Decibel scale

Rustling leaves 10 dB

Purring cat 30 dB

Bird singing nearby 50 dB

Conversational speech 60 dB

Barking dog nearby 70 dB

Roaring lion 90 dB

Thunder 110 dB

Jet taking off nearby 120 dB
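The decibel scale above is logarithmic, so equal dB steps correspond to equal ratios of sound pressure. The sketch below assumes the usual sound pressure level definition, 20·log10(p / p_ref) with a reference pressure of 20 micropascals; the slides do not state the formula, and the function names are illustrative only.

```python
import math

# Assumed reference pressure for sound in air: 20 micropascals.
P_REF = 20e-6  # pascals


def spl_db(pressure_pa: float) -> float:
    """Sound pressure level in dB for an RMS pressure given in pascals."""
    return 20.0 * math.log10(pressure_pa / P_REF)


def pressure_ratio(db_difference: float) -> float:
    """Factor by which sound pressure grows for a given dB difference."""
    return 10.0 ** (db_difference / 20.0)


if __name__ == "__main__":
    # A pressure 1000 times the reference.
    print(spl_db(0.02))
    # Conversational speech (60 dB) versus rustling leaves (10 dB).
    print(pressure_ratio(60 - 10))
```

On this definition, every 20 dB step is a tenfold increase in sound pressure, which is why a 50 dB gap between rustling leaves and conversation is a pressure ratio of a few hundred rather than five.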

11 Music and Speech Perception

• Music

• Speech

11 Music

• Music as a way to express thoughts and emotions

– Pythagoras: Numbers and musical intervals

– Some clinical psychologists practice music therapy

11 Music (cont’d)

• Musical notes

– Sounds of music extend across frequency range: 25–4200 Hz

11 Frequency Range of Music

11 Music (cont’d)

• Octave: The interval between two sound frequencies having ratio of 2:1

– Example: Middle C (C4) has fundamental frequency of 261.6 Hz; notes that are one octave from middle C are 130.8 (C3) and 523.2 (C5)

– There is more to musical pitch than just frequency!
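The 2:1 octave relationship can be checked with a few lines of arithmetic. This is a minimal sketch using the middle C value from the slide; the `equal_tempered` helper assumes 12-tone equal temperament (each semitone a factor of 2^(1/12)), which the slide does not spell out.

```python
C4_HZ = 261.6  # fundamental frequency of middle C, as given on the slide


def shift_octaves(freq_hz: float, octaves: int) -> float:
    """Move a frequency up (positive) or down (negative) by whole octaves."""
    return freq_hz * 2.0 ** octaves


def equal_tempered(freq_hz: float, semitones: int) -> float:
    """Assumes 12-tone equal temperament: 12 equal semitone steps per octave."""
    return freq_hz * 2.0 ** (semitones / 12.0)


if __name__ == "__main__":
    print(shift_octaves(C4_HZ, -1))   # C3, one octave below middle C
    print(shift_octaves(C4_HZ, +1))   # C5, one octave above middle C
    print(equal_tempered(C4_HZ, 9))   # nine semitones above middle C (the A above it)
```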

11 Music (cont’d)

• Tone height:

– A sound quality whereby a sound is heard to be of higher or lower pitch; monotonically related to frequency

• Tone chroma:

– A sound quality shared by tones that have the same octave interval

• Musical helix:

– A way to visualize musical pitch

11 Tone Height and Chroma Helix
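One way to make the helix concrete is to split the logarithm of frequency into a whole-octave part (height) and a within-octave part (chroma). The sketch below is an illustration under that assumption, not the construction used for the figure; the reference frequency and function names are hypothetical choices.

```python
import math

C4_HZ = 261.6  # reference frequency (middle C from the slides)


def tone_height(freq_hz: float) -> float:
    """Octaves above or below the reference; rises monotonically with frequency."""
    return math.log2(freq_hz / C4_HZ)


def tone_chroma(freq_hz: float) -> float:
    """Position within the octave, in [0, 1); equal for tones an octave apart."""
    return tone_height(freq_hz) % 1.0


def helix_point(freq_hz: float, radius: float = 1.0) -> tuple:
    """Map a frequency to (x, y, z): chroma sets the angle, height sets z."""
    angle = 2.0 * math.pi * tone_chroma(freq_hz)
    return (radius * math.cos(angle), radius * math.sin(angle), tone_height(freq_hz))


if __name__ == "__main__":
    for f in (130.8, 261.6, 523.2):  # C3, C4, C5
        print(f, helix_point(f))
```

Tones an octave apart land at the same angle around the helix but at different heights, which is the relationship the helix is meant to show.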

11 Music (cont’d)

• Musical instruments: Produce notes below 4 kHz

– Listeners:

• Great difficulty perceiving octave relationships between tones when one or both tones are greater than 5 kHz

11 Music (cont’d)

• Chords: Created when three or more notes are played simultaneously

– Consonant or dissonant

– Consonant: Have simple ratios of note frequencies

– Dissonant: Less elegant ratios of note frequencies
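To see what "simple" versus "less elegant" ratios look like, the sketch below lists some interval ratios from just intonation (an assumed tuning; the slides do not give specific ratios). The `ratio_complexity` score is a hypothetical yardstick for illustration, not a measure taken from the lecture.

```python
from fractions import Fraction

# Frequency ratios (upper note : lower note), assuming just intonation.
INTERVALS = {
    "octave": Fraction(2, 1),
    "perfect fifth": Fraction(3, 2),
    "perfect fourth": Fraction(4, 3),
    "major third": Fraction(5, 4),
    "minor second": Fraction(16, 15),
    "tritone": Fraction(45, 32),
}


def ratio_complexity(ratio: Fraction) -> int:
    """Hypothetical simplicity score: smaller means a simpler ratio."""
    return ratio.numerator + ratio.denominator


def upper_note_hz(lower_hz: float, ratio: Fraction) -> float:
    """Frequency of the upper note of an interval built on lower_hz."""
    return lower_hz * float(ratio)


if __name__ == "__main__":
    for name in sorted(INTERVALS, key=lambda n: ratio_complexity(INTERVALS[n])):
        ratio = INTERVALS[name]
        print(name, ratio, ratio_complexity(ratio), upper_note_hz(261.6, ratio))
```

Sorting by this score puts the octave and fifth first and the minor second and tritone last, which matches the usual consonant-to-dissonant ordering for these intervals.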

11 Music (cont’d)

• Cultural differences

– Research on music perception:

• Western vs. Javanese

– Javanese culture:

• Fewer notes within an octave; greater variation in a note’s acceptable frequencies

– Even young infants can learn to distinguish sounds in their native scale

11 Music (cont’d)

• Melody: An arrangement of notes or chords in succession

– Examples: “Twinkle, Twinkle Little Star,” “Baa Baa Black Sheep”

– Not a sequence of specific sounds: defined by the relationships between successive notes, so the melody survives a shift of every note (e.g., a change in octave)

– Notes and chords vary in duration; tempo is the perceived speed of the music (fast or slow)

11 Music (cont’d)

• Rhythm: Not just in music!

– Lots of activities have rhythm: Walking, waving, finger tapping, etc.

– Bolton (1894): Experiments with sequences of identical sounds, perfectly spaced in time, with no physical rhythm; listeners still grouped the sounds and reported hearing the first sound of each group as “accented,” while the rest remained unaccented

– More examples: Car, train rides

– “Syncopated auditory polyrhythms”: When different rhythms are overlapped

11 Dominant Rhythm

11 Music (cont’d)

• Melody development

– 8-month-olds: Able to learn new melodies

– 7-month-olds: Can associate particular movements with particular melodies

11 Speech

• The Vocal Tract:

– The airway above the larynx used for production of speech. Includes the oral tract and nasal tract

• Human capability for speech sounds

– 5000 languages spoken today, utilizing over 850 different speech sounds

– flexibility of vocal tract:

• important in speech production

Primate vocal tract

[Figure: vocal tracts of orangutan, chimpanzee, and human, with the larynx, tongue body, and air sac labeled. From W. Tecumseh Fitch, “The evolution of speech: a comparative review,” Trends in Cognitive Sciences 4(7), July 2000]

Specialized vocal resonators

[Figure: Howler Monkey (Alouatta) and Gibbon (Hylobates)]

Human vocal tract

Acoustics of speech

● Phonation

● Articulation

Organs of speech

• Lungs: apply pressure to generate air stream (power supply)

• Larynx: air forced through the glottis, a small opening between the vocal folds (sound source)

• Vocal tract: pharynx, oral and nasal cavities serve as complex resonators (filter)

Source-filter theory of speech production

[Diagram: Lungs (power supply) → Vibrating vocal folds (oscillator) → Vocal tract (resonator) → Output sound]

Vocal fold oscillation

• One-mass model

– Air flow through the glottis during the closing phase travels at the same speed because of inertia, producing lowered air pressure above the glottis.

Source: http://www.ncvs.org/ncvs/tutorials/voiceprod/tutorial/model.

Source-Filter Theory

From Fitch, W.T. (2000). Trends in Cognitive Sciences

Audio demo: the source signal

Source signal for an adult male voice

Source signal for an adult female voice

Source signal for a 10-year-old child

Source properties

In voiced sounds the glottal source spectrum contains a series of lines called harmonics. The lowest one is called the fundamental frequency (F0).

[Figure: amplitude spectrum of the glottal source with F0 marked; x-axis: Frequency (Hz), 0–1000; y-axis: Relative Amplitude (dB), −50 to 0]

F0 range in speech

• 80-200 Hz for adult males

• 180-400 Hz for adult females

• 200-600 Hz for young children

• Even-tempered Scale for the Octave Above Middle C

261.63 Hz 523.26 Hz

F0 measurement

• F0 estimates

• 48 sentences

• 1 adult male (blue)

• 1 adult female (red)

Intonation patterns

• Declination: pitch tends to fall over the course of a sentence or utterance; “declination reset”

Fundamental frequency variation

Lee, Potamianos & Narayanan JASA 1999

[Figure: fundamental frequency (Hz) plotted against age (years) for females and males; annotation: 70–90% shift]

Demo: harmonic synthesis

Additive harmonic synthesis: vowel /i/

Cumulative sum of harmonics: vowel /i/

Additive synthesis: “wheel”

Cumulative sum of partials:
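The idea behind the demo can be sketched in code: a periodic sound is built by adding sine waves at integer multiples of F0, one harmonic at a time. The amplitudes below are arbitrary placeholders, not measurements of the vowel /i/, and the function names are illustrative.

```python
import math


def additive_synthesis(f0_hz, amplitudes, duration_s=0.02, sample_rate=8000):
    """Sum harmonics 1*F0, 2*F0, ... weighted by the given amplitudes."""
    n_samples = int(round(duration_s * sample_rate))
    samples = []
    for n in range(n_samples):
        t = n / sample_rate
        value = sum(
            amp * math.sin(2.0 * math.pi * (k + 1) * f0_hz * t)
            for k, amp in enumerate(amplitudes)
        )
        samples.append(value)
    return samples


def cumulative_sums(f0_hz, amplitudes, **kwargs):
    """Waveforms obtained by adding one more harmonic at each step."""
    return [
        additive_synthesis(f0_hz, amplitudes[:count], **kwargs)
        for count in range(1, len(amplitudes) + 1)
    ]


if __name__ == "__main__":
    # Placeholder amplitudes for the first three harmonics of a 100 Hz voice.
    steps = cumulative_sums(100.0, [1.0, 0.5, 0.25])
    for count, wave in enumerate(steps, start=1):
        print(count, "harmonic(s); peak amplitude:", max(abs(v) for v in wave))
```

However many harmonics are added, the waveform still repeats every 1/F0 seconds, since each harmonic completes a whole number of cycles in that time.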

Filter properties

The vocal tract resonances (called formants) produce peaks in the spectrum envelope. Formants are labelled F1, F2, F3, ... in order of increasing frequency.

[Figure: amplitude spectrum with superimposed LPC spectral envelope, formants F1–F4 marked; x-axis: Frequency (kHz), 0–4; y-axis: Amplitude in dB, −50 to 0]

source ⊗ filter ⊗ radiation = output sound

[Figure: spectra of the source, filter, radiation, and output sound for the vowels /i/ and /A/; axes: Frequency, Amplitude]
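When spectra are expressed in dB, combining source, filter, and radiation amounts to adding the three curves at each harmonic. The sketch below works under textbook simplifications that are assumptions here, not values from the slides: a source falling about 12 dB per octave, lip radiation rising about 6 dB per octave, and a single second-order resonance standing in for a formant.

```python
import math


def source_db(freq_hz, f0_hz):
    """Assumed glottal source: falls about 12 dB per octave above F0."""
    return -12.0 * math.log2(freq_hz / f0_hz)


def radiation_db(freq_hz, f0_hz):
    """Assumed lip radiation: rises about 6 dB per octave."""
    return 6.0 * math.log2(freq_hz / f0_hz)


def resonance_db(freq_hz, center_hz, bandwidth_hz):
    """Gain of one simple second-order resonance (a stand-in for a formant)."""
    numerator = center_hz ** 2
    denominator = math.sqrt(
        (center_hz ** 2 - freq_hz ** 2) ** 2 + (bandwidth_hz * freq_hz) ** 2
    )
    return 20.0 * math.log10(numerator / denominator)


def output_spectrum_db(f0_hz, n_harmonics, formants):
    """Level of each harmonic: source + filter + radiation, all in dB."""
    spectrum = {}
    for k in range(1, n_harmonics + 1):
        freq = k * f0_hz
        filter_db = sum(resonance_db(freq, fc, bw) for fc, bw in formants)
        spectrum[freq] = source_db(freq, f0_hz) + filter_db + radiation_db(freq, f0_hz)
    return spectrum


if __name__ == "__main__":
    # Hypothetical single formant at 300 Hz (60 Hz bandwidth), F0 = 100 Hz.
    for freq, level in output_spectrum_db(100.0, 10, [(300.0, 60.0)]).items():
        print(freq, round(level, 1))
```

The harmonic nearest the resonance stands out in the output even though the source alone falls steadily with frequency.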

11 The Basic Components of Speech Production (Part 1)

11 The Basic Components of Speech Production (Part 2)

11 Speech (cont’d)

• Speech Production

– respiration (lungs)

– phonation (vocal cords)

– articulation (vocal tract)

11 Speech (cont’d)

• Respiration and phonation

– Initiating speech:

• diaphragm pushes air out of lungs, through trachea, up to larynx

– At larynx:

• Air must pass through two vocal folds

– Children:

• Small vocal cords, high-pitched voices

– Adult men:

• Larger mass of vocal cords, low-pitched voices


• Articulation

– Area above larynx: Vocal tract

– Humans have the ability to change the shape of the vocal tract by manipulating the jaw, lips, tongue body, tongue tip, and velum

– Manipulations: Articulation

– Resonance characteristics

11 Sound from Vocal Folds

11 Speech (cont’d)

• Peaks in speech spectrum: Formants

– Labeled by number, from lowest to highest (F1, F2, F3); these concentrations of energy occur at different frequencies, depending on the length of the vocal tract

– For shorter vocal tracts (children, short adults): Formants are at higher frequencies than for longer vocal tracts

– Spectrogram
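The link between vocal tract length and formant frequency can be illustrated with a common idealization: treat the vocal tract as a uniform tube closed at the glottis and open at the lips, whose resonances are F_n = (2n − 1)·c / (4·L). This model, the speed of sound value, and the example lengths are assumptions for the sketch, not figures from the slides.

```python
# Assumed speed of sound in warm, moist air, in cm/s.
SPEED_OF_SOUND_CM_S = 35000.0


def tube_formants(length_cm, n_formants=3):
    """Resonances of a uniform tube closed at one end: (2n - 1) * c / (4 * L)."""
    return [
        (2 * n - 1) * SPEED_OF_SOUND_CM_S / (4.0 * length_cm)
        for n in range(1, n_formants + 1)
    ]


if __name__ == "__main__":
    # Hypothetical tract lengths: a longer (adult) and a shorter (child) tube.
    for label, length_cm in (("longer tract", 17.5), ("shorter tract", 12.0)):
        print(label, [round(f) for f in tube_formants(length_cm)])
```

Shortening the tube raises every resonance by the same factor, consistent with the point that shorter vocal tracts have formants at higher frequencies.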

Spectral analysis of speech

Why perform a frequency analysis of speech?

– Ear + brain carry out a form of frequency analysis

– Relevant features of speech are more readily visible in the amplitude spectrum than in the raw waveform

Spectral analysis of speech

But: the ear is not a spectrum analyzer.

– Auditory frequency selectivity is best at low frequencies and gets progressively worse at higher frequencies.
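One widely used way to put numbers on this is the equivalent rectangular bandwidth (ERB) of the auditory filter. The sketch assumes the Glasberg and Moore (1990) approximation, ERB = 24.7·(4.37·f/1000 + 1) Hz; the lecture does not name a formula, so treat this as one possible quantification.

```python
def erb_hz(center_hz):
    """Approximate auditory filter bandwidth (Glasberg & Moore, 1990), in Hz."""
    return 24.7 * (4.37 * center_hz / 1000.0 + 1.0)


if __name__ == "__main__":
    # Wider filters at higher center frequencies mean coarser frequency resolution.
    for center in (250.0, 500.0, 1000.0, 2000.0, 4000.0):
        print(center, round(erb_hz(center), 1))
```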

Short-term amplitude spectrum

[Figure: short-term amplitude spectrum; x-axis: Frequency (kHz), 0–4; y-axis: Amplitude (dB), −10 to 60; marked peaks: F1 = 281 Hz, F2 = 2196 Hz, F3 = 2755 Hz]

Speech spectrogram

Running amplitude spectra (codes amplitude changes in different frequency bands over time).

11 Sound Spectrogram

Speech terminology…

Fundamental frequency (F0): lowest frequency component in voiced speech sounds, linked to vocal fold vibration.

Formants: resonances of the vocal tract.

[Figure: spectrum with F0 and a formant marked; axes: Frequency, Amplitude]

Source properties: Pitch

Fundamental frequency (F0) is determined by the rate of vocal fold vibration, and is responsible for the perceived voice pitch.

Harmonicity and Periodicity

• Period: regularly repeating pattern in the waveform

[Figure: waveform with period duration T0 = 6 ms]

Harmonics are integer multiples of F0 and are evenly spaced in frequency

[Figure: amplitude spectrum; x-axis: Frequency (kHz), 0–2.5; y-axis: Amplitude (dB)]

F0 = 1 / T0 = 1000 / 6 ≈ 167 Hz

Source properties: Pitch

F0 can be removed by filtering (as in telephone circuits) and the pitch remains the same. This is the problem of the missing fundamental, one of the oldest problems in hearing science.

Pitch is determined by the frequency pattern of the harmonics (or their equivalent in the time domain, the periodicities in the waveform).
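Both views of the missing fundamental can be tried on a toy signal. The sketch below builds a tone from harmonics at 400, 600, and 800 Hz only (no 200 Hz component) and recovers 200 Hz two ways: from the spacing of the harmonics (their greatest common divisor) and from the waveform's periodicity (the autocorrelation peak). These are simple illustrative estimators under assumed parameters, not models of how the ear does it.

```python
import math
from functools import reduce


def pitch_from_harmonics(harmonics_hz):
    """Frequency-domain view: greatest common divisor of integer harmonics (Hz)."""
    return reduce(math.gcd, harmonics_hz)


def harmonic_complex(harmonics_hz, sample_rate, n_samples):
    """Sum of equal-amplitude cosines at the given frequencies."""
    return [
        sum(math.cos(2.0 * math.pi * f * n / sample_rate) for f in harmonics_hz)
        for n in range(n_samples)
    ]


def pitch_from_waveform(samples, sample_rate, min_lag, max_lag, window):
    """Time-domain view: the lag with the largest autocorrelation is the period."""
    def autocorrelation(lag):
        return sum(samples[n] * samples[n + lag] for n in range(window))

    best_lag = max(range(min_lag, max_lag + 1), key=autocorrelation)
    return sample_rate / best_lag


if __name__ == "__main__":
    harmonics = [400, 600, 800]  # the 200 Hz fundamental is absent
    wave = harmonic_complex(harmonics, sample_rate=8000, n_samples=860)
    print(pitch_from_harmonics(harmonics))
    print(pitch_from_waveform(wave, 8000, min_lag=8, max_lag=60, window=800))
```

The lag range (8 to 60 samples at 8000 Hz) is a hypothetical search window chosen so that only one full period of the waveform falls inside it.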

Formants correspond to peaks in the spectrum envelope.

[Figure: amplitude spectrum with formants F1–F4 marked; x-axis: Frequency (kHz), 0–4; y-axis: Amplitude in dB]

Vowel formant space: F1 x F2

Assmann & Katz JASA 2000

[Figure: vowels plotted by F1 frequency (Hz) against F2 frequency (kHz), with separate panels for females and males]

Peterson and Barney (1952)

[Figure: vowels plotted by F1 frequency (Hz) against frequency of F2 (Hz) for men, women, and children]

11 Vowel Sounds of English

11 Speech (cont’d)

• Classifying speech sounds

– described in terms of articulation

• Place of articulation:

– (e.g., at lips, at alveolar ridge, etc.)

• Voicing:

– Whether the vocal cords are vibrating or not vibrating

– English uses only a small sample of the sounds used by languages around the world; many more sounds are used elsewhere!


• Speech perception

– Speech production: Very fast

– Experienced talkers: Coarticulation; attributes of successive speech units overlap in articulatory or acoustic patterns

– Example: Say the word “moody” a few times, observe what happens to tongue


• Categorical perception

– Research on acoustic cues used to distinguish different speech sounds

– “Categorical perception”: Sharp labeling (identification), discontinuous discrimination, predictability of discrimination


• How special is speech?

– “Motor theory” of speech perception: Special mechanisms just for perceiving speech

• Problems for motor theory:

– Speech production is just as complex, so speech perception complexity must be a result of this complexity

– Nonhuman animals can learn to respond to speech signals in similar way to human listeners

– Categorical perception: Not limited to speech sounds; also found for musical intervals, faces, and facial expressions

11 Categorical Perception


• Coarticulation and spectral contrast

– Research: How speech perception can be explained by the general ways that hearing and perception work

– Example:

• Perception of coarticulated speech; explained by some fundamental properties of the auditory system

– Contrast effects:

• Melodies are defined by changes between adjacent notes; spectral contrast helps listeners perceive speech


• Using multiple acoustic cues

– Perception depends on experience

– Comparison with face recognition


• Learning to listen

– Babies learn to listen even before they are born!

– Prenatal experience: Newborns prefer hearing their mother’s voice over other women’s voices

– Research on babies in France


• Becoming a native listener

– Sound distinctions specific to various languages

– Example: “r” and “l” are not distinguished in Japanese

– Infants begin filtering out irrelevant acoustics long before they start to say speech sounds


• Learning words

– How do we know where one word ends and another begins?

– Research (Saffran et al.): Infants exposed to a novel language can learn to distinguish words from nonwords after two minutes

– Statistical learning
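One statistic a learner could track is the transitional probability between adjacent syllables: how often syllable B follows syllable A. In a continuous stream, transitions inside a word are highly predictable, while transitions across word boundaries are not. The sketch uses made-up three-syllable "words" and a hand-picked word order; it illustrates the statistic only and does not reproduce the experimental stimuli.

```python
from collections import Counter

# Made-up "words" of a toy language (hypothetical, for illustration only).
WORDS = [("ba", "di", "ku"), ("po", "ta", "mi"), ("gu", "le", "so")]

# Hand-picked word order for a continuous stream with no pauses between words.
WORD_ORDER = [0, 1, 2, 0, 2, 1, 0, 1, 2, 1, 0, 2]


def build_stream(words, order):
    """Concatenate the words into one unbroken sequence of syllables."""
    return [syllable for index in order for syllable in words[index]]


def transitional_probabilities(syllables):
    """P(next syllable | current syllable) for every adjacent pair in the stream."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])
    return {
        pair: count / first_counts[pair[0]]
        for pair, count in pair_counts.items()
    }


if __name__ == "__main__":
    stream = build_stream(WORDS, WORD_ORDER)
    probabilities = transitional_probabilities(stream)
    print("within a word  (ba -> di):", probabilities[("ba", "di")])
    print("across a boundary (ku -> po):", probabilities[("ku", "po")])
```

Dips in transitional probability mark likely word boundaries, which is one way a listener could find where one word ends and the next begins without any pauses in the stream.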


• Speech in the Brain

– Brain damage follows patterns of blood vessels, not brain function, so it is difficult to study

– PET and fMRI studies: Help to learn about speech processing in brain

– Listening to speech: Left and right superior temporal lobes are activated more strongly in response to speech than to nonspeech sounds


– Some challenges in creating good controls for experiments

– Categorical perception tasks

– How do processes of hearing and speaking interact?

Syrinx

http://www.indiana.edu/~songbird/research/Cardinal/Movie/7.mov
