ch11 lecture-f revised.ppt - the university of texas at...
4/23/2008
11 Music and Speech Perception
Properties of sound
• Sound has three basic dimensions:
– Frequency (pitch)
– Intensity (loudness)
– Time (length)
Properties of sound
• The frequency of a sound wave, measured in cycles per second or Hertz (abbreviated Hz) indicates the number of cycles each wave makes in one second. The more cycles per second, the higher the pitch we hear.
[Figure: waveforms of a high-pitched tone and a low-pitched tone; horizontal axis: Time (sec)]
Properties of sound
• The intensity of a sound wave is measured in decibels (abbreviated dB). The higher the intensity of a sound, the louder it sounds.
[Figure: waveforms of a low-intensity sound and a high-intensity sound; horizontal axis: Time (sec)]
Decibel scale
• Rustling leaves: 10 dB
• Purring cat: 30 dB
• Bird singing nearby: 50 dB
• Conversational speech: 60 dB
• Barking dog nearby: 70 dB
• Roaring lion: 90 dB
• Thunder: 110 dB
• Jet taking off nearby: 120 dB
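To make the decibel scale concrete, here is a minimal sketch that converts a difference in level into a ratio of sound intensities. It assumes the standard definition of the decibel as 10 times the base-10 logarithm of an intensity ratio, which the slide does not state; the function names are illustrative.

```python
import math

def intensity_ratio(level_a_db: float, level_b_db: float) -> float:
    """How many times more intense sound A is than sound B (levels in dB)."""
    return 10.0 ** ((level_a_db - level_b_db) / 10.0)

def level_difference_db(ratio: float) -> float:
    """Inverse operation: dB difference that corresponds to an intensity ratio."""
    return 10.0 * math.log10(ratio)

# Levels taken from the decibel scale above.
print(intensity_ratio(60, 50))    # conversational speech vs. bird singing nearby
print(intensity_ratio(120, 10))   # jet taking off nearby vs. rustling leaves
print(level_difference_db(2.0))   # level change when intensity doubles
```

Under this assumption every 10 dB step is a tenfold change in intensity, so the range on the slide spans many orders of magnitude.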
11 Music and Speech Perception
• Music
• Speech
11 Music
• Music as a way to express thoughts and emotions
– Pythagoras: Numbers and musical intervals
– Some clinical psychologists practice music therapy
11 Music (cont’d)
• Musical notes
– Sounds of music extend across frequency range: 25–4200 Hz
11 Frequency Range of Music
11 Music (cont’d)
• Octave: The interval between two sound frequencies having ratio of 2:1
– Example: Middle C (C4) has fundamental frequency of 261.6 Hz; notes that are one octave from middle C are 130.8 (C3) and 523.2 (C5)
– There is more to musical pitch than just frequency!
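The 2:1 octave ratio can be checked against the middle C example with a few lines of arithmetic. This is a minimal sketch; the function name is illustrative.

```python
def octave_shift(freq_hz: float, octaves: int) -> float:
    """Frequency that lies `octaves` octaves above (negative: below) freq_hz."""
    return freq_hz * 2.0 ** octaves

MIDDLE_C_HZ = 261.6  # C4, as given on the slide

c3 = octave_shift(MIDDLE_C_HZ, -1)  # one octave below middle C
c5 = octave_shift(MIDDLE_C_HZ, +1)  # one octave above middle C
print(round(c3, 1), round(c5, 1))
```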
11 Music (cont’d)
• Tone height:
– A sound quality whereby a sound is heard to be of higher or lower pitch; monotonically related to frequency
• Tone chroma:
– A sound quality shared by tones that have the same octave interval
• Musical helix:
– visualize musical pitch
11 Tone Height and Chroma Helix
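One way to picture the helix is to map each frequency to a point whose height grows with log frequency (tone height) and whose angle repeats every octave (tone chroma). The sketch below is one possible parameterisation, not necessarily the one used in the figure; the reference frequency, radius, and function name are assumptions.

```python
import math

def helix_point(freq_hz: float, ref_hz: float = 261.6, radius: float = 1.0):
    """Return (x, y, z) for a tone on a pitch helix.

    z is tone height in octaves relative to ref_hz; the angle around the
    axis is tone chroma, which repeats once per octave.
    """
    octaves = math.log2(freq_hz / ref_hz)   # tone height
    chroma = octaves % 1.0                  # position within the octave, 0 to 1
    angle = 2.0 * math.pi * chroma
    return (radius * math.cos(angle), radius * math.sin(angle), octaves)

c4 = helix_point(261.6)
c5 = helix_point(523.2)
print(c4, c5)  # same x and y (same chroma), z differs by one octave
```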
11 Music (cont’d)
• Musical instruments: Produce notes below 4 kHz
– Listeners:
• Great difficulty perceiving octave relationships between tones when one or both tones are greater than 5 kHz
11 Music (cont’d)
• Chords: Created when three or more notes are played simultaneously
– Consonant or dissonant
– Consonant: Have simple ratios of note frequencies
– Dissonant: Less elegant ratios of note frequencies
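As a rough illustration of "simple" versus "less elegant" ratios, the sketch below lists a few intervals with their conventional just-intonation ratios and ranks them by a crude complexity score. The ratios are standard music-theory values rather than figures from the slide, and the score (numerator plus denominator) is a hypothetical measure chosen only for illustration.

```python
from fractions import Fraction

# Conventional just-intonation ratios (assumed; not given on the slide).
INTERVALS = {
    "octave": Fraction(2, 1),
    "perfect fifth": Fraction(3, 2),
    "major third": Fraction(5, 4),
    "minor second": Fraction(16, 15),
}

def ratio_complexity(ratio: Fraction) -> int:
    """Crude, illustrative score: smaller means a simpler frequency ratio."""
    return ratio.numerator + ratio.denominator

def major_triad(root_hz: float):
    """Three-note chord with frequency ratios 4:5:6 relative to the root."""
    return [root_hz, root_hz * 5 / 4, root_hz * 3 / 2]

for name, ratio in sorted(INTERVALS.items(), key=lambda kv: ratio_complexity(kv[1])):
    print(f"{name}: {ratio.numerator}:{ratio.denominator}")
print(major_triad(261.6))
```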
11 Music (cont’d)
• Cultural differences
– Research on music perception:
• Western vs. Javanese
– Javanese culture:
• Fewer notes within an octave; greater variation in note’s acceptable frequencies
– Even young infants can learn to distinguish sounds in their native scale
11 Music (cont’d)
• Melody: An arrangement of notes or chords in succession
– Examples: “Twinkle, Twinkle Little Star,” “Baa Baa Black Sheep”
– Not a sequence of specific sounds: Sensitive to change, (i.e., change in octave)
– Notes and chords vary in duration: Tempo; fast or slow
11 Music (cont’d)
• Rhythm: Not just in music!
– Lots of activities have rhythm: Walking, waving, finger tapping, etc.
– Bolton (1894): Experiments with sequence of identical sounds, perfectly spaced in time, but no rhythm; listeners reported hearing first sound of group as “accented,” while the rest remained unaccented
– More examples: Car, train rides
– “Syncopated auditory polyrhythms”: When different rhythms are overlapped
11 Dominant Rhythm
11 Music (cont’d)
• Melody development
– 8-month olds: Able to learn new melodies
– 7-month olds: Can associate particular movements with particular melodies
11 Speech
• The Vocal Tract:
– The airway above the larynx used for production of speech. Includes the oral tract and nasal tract
• Human capability for speech sounds
– 5000 languages spoken today, utilizing over 850 different speech sounds
– flexibility of vocal tract:
• important in speech production
Primate vocal tract
The evolution of speech:
a comparative review
W. Tecumseh Fitch
Trends in Cognitive Sciences 4(7) July 2000
[Figure: vocal tracts of orangutan, chimpanzee, and human; labels: tongue body, larynx, air sac]
Specialized vocal resonators
Howler Monkey (Alouatta)
Gibbon (Hylobates)
Human vocal tract
Acoustics of speech
● Phonation
● Articulation
Organs of speech
• Lungs: apply pressure to generate air stream (power supply)
• Larynx: air forced through the glottis, a small opening between the vocal folds (sound source)
• Vocal tract: pharynx, oral and nasal cavities serve as complex resonators (filter)
Source-filter theory of speech production
[Diagram: lungs (power supply) → vibrating vocal folds (oscillator) → vocal tract (resonator) → output sound]
Vocal fold oscillation
• One-mass model
– Air flow through the glottis during the closing phase travels at the same speed because of inertia, producing lowered air pressure above the glottis.
Source: http://www.ncvs.org/ncvs/tutorials/voiceprod/tutorial/model.
Source-Filter Theory
From Fitch, W.T. (2000). Trends in Cognitive Sciences
Audio demo: the source signal
• Source signal for an adult male voice
• Source signal for an adult female voice
• Source signal for a 10-year-old child
Source properties
In voiced sounds the glottal source spectrum contains a series of lines called harmonics. The lowest one is called the fundamental frequency (F0).
[Figure: amplitude spectrum of the glottal source; relative amplitude (dB, -50 to 0) versus frequency (0 to 1000 Hz), with F0 marked]
F0 range in speech
• 80-200 Hz for adult males
• 180-400 Hz for adult females
• 200-600 Hz for young children
• Even-tempered Scale for the Octave Above Middle C
261.63 Hz 523.26 Hz
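The two endpoints above can be reproduced by dividing the octave into 12 equal steps on a log-frequency axis. The sketch assumes the usual 12-tone equal temperament, in which each semitone multiplies frequency by 2^(1/12); the note names and function name are added for illustration.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B", "C"]

def equal_tempered_scale(base_hz: float = 261.63):
    """Frequencies of the 13 notes from base_hz up to one octave above it."""
    return [(name, base_hz * 2.0 ** (step / 12.0)) for step, name in enumerate(NOTE_NAMES)]

scale = equal_tempered_scale()
for name, freq in scale:
    print(f"{name:2s} {freq:7.2f} Hz")
```

Note that the upper part of this octave overlaps the F0 ranges listed for adult females and young children.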
F0 measurement
• F0 estimates
• 48 sentences
• 1 adult male (blue)
• 1 adult female (red)
Intonation patterns
• Declination: pitch tends to fall over the course of a sentence or utterance; “declination reset”
[Figure: fundamental frequency (Hz) versus age (years) for females and males, with a 70–90% shift marked; fundamental frequency variation, Lee, Potamianos & Narayanan, JASA 1999]
Demo: harmonic synthesis
• Additive harmonic synthesis: vowel /i/
• Cumulative sum of harmonics: vowel /i/
• Additive synthesis: “wheel”
• Cumulative sum of partials:
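The idea behind the demo can be sketched as summing sine waves at integer multiples of F0, adding one harmonic at a time. This is a minimal illustration, not the code behind the audio demo; the F0, sample rate, and 1/k amplitude pattern are placeholder assumptions and do not model the vowel /i/.

```python
import math

def additive_synthesis(f0_hz, amplitudes, duration_s=0.05, sample_rate=8000):
    """Sum of harmonics k*f0 (k = 1, 2, ...) with the given amplitudes."""
    n_samples = round(duration_s * sample_rate)
    return [
        sum(a * math.sin(2.0 * math.pi * (k + 1) * f0_hz * n / sample_rate)
            for k, a in enumerate(amplitudes))
        for n in range(n_samples)
    ]

def cumulative_sums(f0_hz, amplitudes, **kwargs):
    """Waveforms built from the first 1, 2, ..., N harmonics."""
    return [additive_synthesis(f0_hz, amplitudes[:k], **kwargs)
            for k in range(1, len(amplitudes) + 1)]

amps = [1.0, 1.0 / 2, 1.0 / 3, 1.0 / 4]   # assumed 1/k roll-off
partials = cumulative_sums(100.0, amps)
wave = partials[-1]
print(len(partials), len(wave))
```

However many harmonics are added, the waveform keeps the period of the fundamental (here 80 samples at 8000 samples per second).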
Filter properties
The vocal tract resonances (called formants) produce peaks in the spectrum envelope. Formants are labelled F1, F2, F3, ... in order of increasing frequency.
[Figure: amplitude spectrum with superimposed LPC spectral envelope; amplitude in dB (-50 to 0) versus frequency (0 to 4 kHz), with peaks labelled F1 to F4]
source ⊗ filter ⊗ radiation = output sound
[Figure: amplitude versus frequency spectra for the vowels /i/ and /A/]
11 The Basic Components of Speech Production (Part 1)
11 The Basic Components of Speech Production (Part 2)
11 Speech (cont’d)
• Speech Production
– respiration (lungs)
– phonation (vocal cords)
– articulation (vocal tract)
11 Speech (cont’d)
• Respiration and phonation
– Initiating speech:
• diaphragm pushes air out of lungs, through trachea, up to larynx
– At larynx:
• Air must pass through two vocal folds
– Children:
• Small vocal cords, high-pitched voices
– Adult men:
• Larger mass of vocal cords, low-pitched voices
11 Speech (cont’d)
• Articulation
– Area above larynx: Vocal tract
– Humans have ability to change shape of vocal tract by manipulating jaw, lips, tongue body, tongue tip, velum
– Manipulations: Articulation
– Resonance characteristics
11 Sound from Vocal Folds
11 Speech (cont’d)
• Peaks in speech spectrum: Formants
– Labeled by number, from lowest to highest (F1, F2, F3)—concentrations in energy occur at different frequencies, depending on length of vocal tract
– For shorter vocal tracts (children, short adults): Formants are at higher frequencies than for longer vocal tracts
– Spectrogram
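The link between vocal tract length and formant frequency can be illustrated with a common textbook idealisation that the slides do not state: a uniform tube closed at the glottis and open at the lips, whose resonances are odd multiples of c / (4L). The speed of sound and the tract lengths below are assumed round values.

```python
SPEED_OF_SOUND_CM_PER_S = 35000.0  # assumed approximate value for warm, moist air

def tube_formants(length_cm: float, n_formants: int = 3):
    """Resonances of a uniform tube closed at one end: (2k - 1) * c / (4 * L)."""
    return [(2 * k - 1) * SPEED_OF_SOUND_CM_PER_S / (4.0 * length_cm)
            for k in range(1, n_formants + 1)]

adult = tube_formants(17.5)   # assumed length for a longer vocal tract
child = tube_formants(12.0)   # assumed length for a shorter vocal tract
print(adult)
print(child)
```

In this idealisation every formant of the shorter tube lies above the corresponding formant of the longer tube, in line with the statement about shorter vocal tracts. Real vocal tracts are not uniform tubes, so actual vowel formants deviate from these values.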
Spectral analysis of speech
Why perform a frequency analysis of speech?
– Ear + brain carry out a form of frequency analysis
– Relevant features of speech are more readily visible in the amplitude spectrum than in the raw waveform
Spectral analysis of speech
But: the ear is not a spectrum analyzer.
– Auditory frequency selectivity is best at low frequencies and gets progressively worse at higher frequencies.
Short-term amplitude spectrum
[Figure: short-term amplitude spectrum; amplitude (dB) versus frequency (0 to 4 kHz), with F1 = 281 Hz, F2 = 2196 Hz, F3 = 2755 Hz]
Speech spectrogram
Running amplitude spectra (codes amplitude changes in different frequency bands over time).
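The notion of "running amplitude spectra" can be sketched by cutting a signal into short overlapping frames and computing the amplitude spectrum of each. This is a minimal pure-Python illustration with a slow direct DFT; the frame length, hop size, Hann window, and test signal are assumptions, and real tools use an FFT.

```python
import cmath
import math

SAMPLE_RATE = 8000  # assumed sampling rate in Hz

def amplitude_spectrum(frame):
    """Magnitude of the DFT of one frame for bins 0..N/2 (direct O(N^2) DFT)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]

def spectrogram(signal, frame_len=64, hop=32):
    """List of amplitude spectra, one per Hann-windowed frame."""
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (frame_len - 1))
              for i in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + i] * window[i] for i in range(frame_len)]
        frames.append(amplitude_spectrum(frame))
    return frames

def peak_hz(spectrum, frame_len=64):
    """Frequency of the strongest bin in one spectrum."""
    k = max(range(len(spectrum)), key=spectrum.__getitem__)
    return k * SAMPLE_RATE / frame_len

def tone(freq_hz, n_samples):
    return [math.sin(2.0 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n_samples)]

# Test signal: a 500 Hz tone followed by a 1500 Hz tone.
spec = spectrogram(tone(500, 256) + tone(1500, 256))
print(peak_hz(spec[0]), peak_hz(spec[-1]))
```

Each row of `spec` is one time slice; the frequency of the peak moves from the first tone to the second as the frames advance, which is the change over time that a spectrogram displays.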
11 Sound Spectrogram
Speech terminology…
Fundamental frequency (F0): lowest frequency component in voiced speech sounds, linked to vocal fold vibration.
Formants: resonances of the vocal tract.
[Figure: amplitude versus frequency, with F0 and a formant marked]
Source properties: Pitch
Fundamental frequency (F0) is determined by the rate of vocal fold vibration, and is responsible for the perceived voice pitch.
Harmonicity and Periodicity
• Period: regularly repeating pattern in the waveform
• Harmonics are integer multiples of F0 and are evenly spaced in frequency
• F0 = 1 / T0; for a period duration T0 = 6 ms, F0 = 1000 / 6 ≈ 167 Hz
[Figure: waveform with period duration T0 = 6 ms, and its amplitude spectrum; amplitude (dB) versus frequency (0 to 2.5 kHz)]
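The relation F0 = 1 / T0 and the even spacing of harmonics can be worked through for the 6 ms period on the slide. A minimal sketch; the function names and the 900 Hz cutoff are illustrative.

```python
def f0_from_period(period_ms: float) -> float:
    """Fundamental frequency in Hz from the period duration in milliseconds."""
    return 1000.0 / period_ms

def harmonics_below(f0_hz: float, max_hz: float):
    """Integer multiples of F0 that lie below max_hz."""
    result = []
    k = 1
    while k * f0_hz < max_hz:
        result.append(k * f0_hz)
        k += 1
    return result

f0 = f0_from_period(6.0)
harmonic_list = harmonics_below(f0, 900.0)
print(round(f0, 1))
print([round(h, 1) for h in harmonic_list])
```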
Source properties: Pitch
F0 can be removed by filtering (as in telephone circuits) and the pitch remains the same. This is the problem of the missing fundamental, one of the oldest problems in hearing science.
Pitch is determined by the frequency pattern of the harmonics (or their equivalent in the time domain, the periodicities in the waveform).
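The time-domain view can be illustrated with a simple autocorrelation estimate: a tone built only from harmonics 2 to 5 of 200 Hz still repeats every 1/200 s, so the strongest periodicity corresponds to 200 Hz even though no energy is present at that frequency. Autocorrelation is used here as a convenient stand-in for "periodicities in the waveform"; it is not claimed to be how the ear computes pitch, and all parameter values are assumptions.

```python
import math

def harmonic_complex(f0_hz, harmonic_numbers, sample_rate=8000, duration_s=0.05):
    """Sum of equal-amplitude sine waves at the listed multiples of f0_hz."""
    n_samples = round(duration_s * sample_rate)
    return [
        sum(math.sin(2.0 * math.pi * h * f0_hz * i / sample_rate) for h in harmonic_numbers)
        for i in range(n_samples)
    ]

def autocorrelation_pitch(signal, sample_rate=8000, min_hz=50, max_hz=500):
    """Pitch estimate: the lag (within the search range) with the largest autocorrelation."""
    min_lag = int(sample_rate / max_hz)
    max_lag = int(sample_rate / min_hz)

    def autocorr(lag):
        return sum(signal[i] * signal[i + lag] for i in range(len(signal) - lag))

    best_lag = max(range(min_lag, max_lag + 1), key=autocorr)
    return sample_rate / best_lag

with_f0 = harmonic_complex(200.0, [1, 2, 3, 4, 5])
without_f0 = harmonic_complex(200.0, [2, 3, 4, 5])   # fundamental removed
print(autocorrelation_pitch(with_f0), autocorrelation_pitch(without_f0))
```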
Formants correspond to peaks in the spectrum envelope.
[Figure: amplitude spectrum; amplitude in dB versus frequency (0 to 4 kHz), with peaks labelled F1 to F4]
Vowel formant space: F1 x F2
[Figure: F2 frequency (kHz) versus F1 frequency (Hz) for vowels, shown separately for females and males; Assmann & Katz, JASA 2000]
[Figure: frequency of F2 (Hz) versus F1 frequency (Hz) for vowels spoken by men, women, and children; Peterson and Barney (1952)]
11 Vowel Sounds of English
11 Speech (cont’d)
• Classifying speech sounds
– described in terms of articulation
• Place of articulation:
– (e.g., at lips, at alveolar ridge, etc.)
• Voicing:
– Whether cords are vibrating, not vibrating
– English: Only small sample of sounds used by languages around the world; a lot more sounds are used!
11 Speech (cont’d)
• Speech perception
– Speech production: Very fast
– Experienced talkers: Coarticulation; attributes of successive speech units overlap in articulatory or acoustic patterns
– Example: Say the word “moody” a few times, observe what happens to tongue
11 Speech (cont’d)
• Categorical perception
– Research on acoustic cues used to distinguish different speech sounds
– “Categorical perception”: Sharp labeling (identification), discontinuous discrimination, predictability of discrimination
11 Speech (cont’d)
• How special is speech?
– “Motor theory” of speech perception: Special mechanisms just for perceiving speech
• Problems for motor theory:
– Speech production is just as complex, so speech perception complexity must be result of this complexity
– Nonhuman animals can learn to respond to speech signals in similar way to human listeners
– Categorical speech perception: Not limited to speech sounds; also includes musical intervals; other categorical perceptions: faces, facial expressions
11 Categorical Perception
11 Speech (cont’d)
• Coarticulation and spectral contrast
– Research: How speech perception is explained by general ways that hearing, and perception works
– Example:
• Perception of coarticulated speech; explained by some fundamental ways of auditory system
– Contrast effects:
• Melodies are defined by changes between adjacent notes; spectral contrast helps listeners perceive speech
11 Speech (cont’d)
• Using multiple acoustic cues
– Perception depends on experience
– Comparison with face recognition
11 Speech (cont’d)
• Learning to listen
– Babies learn to listen even before they are born!
– Prenatal experience: Newborns prefer hearing their mother’s voice over other women’s voices
– Research of babies in France
11 Speech (cont’d)
• Becoming a native listener
– Sound distinctions specific to various languages
– Example: “r” and “l” are not distinguished in Japanese
– Infants begin filtering out irrelevant acoustics long before they start to say speech sounds
11 Speech (cont’d)
• Learning words
– How do we know where one word ends and another begins?
– Research (Saffran et al.): Novel language with infants; can learn to distinguish words from nonwords after two minutes
– Statistical learning
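One common way to formalise statistical learning is through transitional probabilities between syllables: within a word the next syllable is highly predictable, across a word boundary it is not. The sketch below is a toy illustration of that idea, not a reproduction of the Saffran et al. procedure; the three "words", the stream length, and the threshold are made up for the example.

```python
import random
from collections import Counter

# Hypothetical three-syllable "words" of a made-up language.
WORDS = [("bi", "da", "ku"), ("pa", "do", "ti"), ("go", "la", "bu")]

def make_stream(n_words=300, seed=0):
    """Continuous syllable stream with no pauses between words."""
    rng = random.Random(seed)
    stream = []
    for _ in range(n_words):
        stream.extend(rng.choice(WORDS))
    return stream

def transitional_probabilities(syllables):
    """P(next syllable | current syllable) for every adjacent pair in the stream."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])
    return {pair: count / first_counts[pair[0]] for pair, count in pair_counts.items()}

def segment(syllables, tp, threshold=0.75):
    """Place a word boundary wherever the transitional probability is low."""
    words, current = [], [syllables[0]]
    for prev, nxt in zip(syllables, syllables[1:]):
        if tp[(prev, nxt)] < threshold:
            words.append("".join(current))
            current = []
        current.append(nxt)
    words.append("".join(current))
    return words

stream = make_stream()
tp = transitional_probabilities(stream)
found_words = set(segment(stream, tp))
print(sorted(found_words))
```

In this toy stream the within-word transitions are fully predictable while the transitions across word boundaries are not, so thresholding the transitional probability recovers the word list without any pauses or other boundary cues.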
11 Speech (cont’d)
• Speech in the Brain
– Brain damage follows patterns of blood vessels, not brain function, so difficult to study
– PET and fMRI studies: Help to learn about speech processing in brain
– Listening to speech: Left and right superior temporal lobes are activated more strongly in response to speech than to nonspeech sounds
11 Speech (cont’d)
– Some challenges in creating good controls for experiments
– Categorical perception tasks
– How do processes of hearing and speaking interact?
Syrinx
http://www.indiana.edu/~songbird/research/Cardinal/Movie/7.mov