
Auditory Perception

April 9, 2009

Auditory vs. Acoustic

• So far, we’ve seen two different auditory measures:

1. Mels (unit of perceived pitch)

• Auditory correlate of Hertz (frequency)

2. Sones (unit of perceived loudness)

• Auditory correlate of decibels (intensity)

• Both were derived from pitch and loudness estimation experiments…

Masking

• Another scale for measuring auditory frequency emerged in the 1960s.

• This scale was inspired by the phenomenon of auditory masking.

• One sound can “mask”, or obscure, the perception of another.

• Unmasked: (audio demo)

• Masked: (audio demo)

• Q: How narrow can we make the bandwidth of the masking noise before the sinewave becomes perceptible?

• A: Masking bandwidth is narrower at lower frequencies.

Critical Bands

• Using this methodology, researchers eventually determined that there are 24 critical bands of hearing.

• The auditory system integrates all acoustic energy within each band.

• Two tones within the same critical band sound like one tone.

• Ex: critical band #9 ranges from 920-1080 Hz.

• F1 and F2 of a vowel might merge together if both fall within this band.

• Each critical band spans about 0.9 mm on the basilar membrane.

• The auditory system thus acts like a series of 24 band-pass filters.

• Each filter corresponds to one unit on the Bark scale.

Bark Scale of Frequency

• The Bark scale converts acoustic frequency (in Hz) into a critical-band number (in Bark).

Bark Table

Band   Center (Hz)   Bandwidth (Hz)      Band   Center (Hz)   Bandwidth (Hz)
 1         50           20-100           13       1850         1720-2000
 2        150          100-200           14       2150         2000-2320
 3        250          200-300           15       2500         2320-2700
 4        350          300-400           16       2900         2700-3150
 5        450          400-510           17       3400         3150-3700
 6        570          510-630           18       4000         3700-4400
 7        700          630-770           19       4800         4400-5300
 8        840          770-920           20       5800         5300-6400
 9       1000          920-1080          21       7000         6400-7700
10       1170         1080-1270          22       8500         7700-9500
11       1370         1270-1480          23      10500         9500-12000
12       1600         1480-1720          24      13500        12000-15500
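The slides define the Bark scale only through this table, but a standard analytic approximation exists. The sketch below uses Traunmüller’s (1990) formula (my addition, not from the lecture) and spot-checks it against the table’s band centers:

```python
def hz_to_bark(f):
    """Convert frequency (Hz) to critical-band rate (Bark) using
    Traunmueller's (1990) analytic approximation.  Band n of the
    table spans roughly Bark n-1 to n, so each band's center
    frequency should map to about n - 0.5 Bark."""
    z = 26.81 * f / (1960.0 + f) - 0.53
    if z < 2.0:                    # low-frequency correction
        z += 0.15 * (2.0 - z)
    elif z > 20.1:                 # high-frequency correction
        z += 0.22 * (z - 20.1)
    return z

# Spot-check against the table's band centers:
for band, center_hz in [(1, 50), (9, 1000), (18, 4000), (24, 13500)]:
    print(band, center_hz, round(hz_to_bark(center_hz), 2))
# -> roughly 0.42, 8.53, 17.46, 23.49
```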

Your Grandma’s Spectrograph

• Originally, spectrographic analyzing filters were constructed to have either wide or narrow bandwidths.
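The wide/narrow distinction survives in digital spectrograms as the choice of analysis window length. A minimal illustration of the trade-off (my own sketch using scipy, with a synthetic test signal):

```python
import numpy as np
from scipy.signal import spectrogram

fs = 10_000                          # sample rate (Hz)
t = np.arange(0, 0.5, 1 / fs)
# Steady "voiced" signal: a 100 Hz fundamental plus harmonics
x = sum(np.sin(2 * np.pi * 100 * k * t) for k in range(1, 20))

# Narrowband analysis: long window = narrow analyzing filter
# -> resolves individual harmonics, blurs rapid events
f_nb, t_nb, S_nb = spectrogram(x, fs, nperseg=512)

# Wideband analysis: short window = wide analyzing filter
# -> resolves rapid events (e.g., glottal pulses), blurs harmonics
f_wb, t_wb, S_wb = spectrogram(x, fs, nperseg=64)

print(f"narrowband bin width: {f_nb[1]:.1f} Hz; wideband: {f_wb[1]:.1f} Hz")
```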

Spectral Differences

• Acoustic vs. auditory spectra of F1 and F2

Cochleagrams

• Cochleagrams are spectrogram-like representations which incorporate auditory transformations for both pitch and loudness perception.

• Acoustic spectrogram vs. auditory cochleagram representation of a Cantonese word.

• Check out Peter’s vowels in Praat.

Cochlear Implants

• Cochlear implants transmit sound directly to the cochlea through a series of band-pass filters…

• like the critical bands in our native auditory system.

• These devices can benefit profoundly deaf listeners with nerve deafness, i.e., the loss of working hair cells in the inner ear.

• Contrast with: a hearing aid, which is simply an amplifier.

• Old style: amplifies all frequencies

• New style: amplifies specific frequencies, based on a listener’s particular hearing capabilities.

Cochlear Implants

• A cochlear implant artificially stimulates the nerves which are connected to the cochlea.

Nuts and Bolts

• The cochlear implant chain of events:

1. Microphone

2. Speech processor

3. Electrical stimulation

• What the CI user hears is entirely determined by the code in the speech processor

• The number of electrodes stimulating the cochlea ranges from 8 to 22.

• Result: poor frequency resolution.

• Also: cochlear implants cannot stimulate the low-frequency regions of the auditory nerve.

Noise Vocoding

• The speech processor operates like a series of critical bands.

• It divides up the frequency scale into 8 (or 22) bands and stimulates each electrode according to the average intensity in each band.

This results in what sounds (to us) like a highly degraded version of natural speech.
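A minimal sketch of the noise-vocoding idea just described, assuming scipy is available. The band count, band edges, and envelope method are illustrative choices; real CI processors smooth the envelope with a low-pass filter and differ in many other details:

```python
import numpy as np
from scipy.signal import butter, sosfilt, hilbert

def noise_vocode(x, fs, n_bands=8, f_lo=100.0, f_hi=8000.0):
    """Crude noise vocoder: split the signal into log-spaced bands,
    extract each band's amplitude envelope, and use the envelope to
    modulate band-limited noise.  Requires f_hi < fs / 2."""
    edges = np.geomspace(f_lo, f_hi, n_bands + 1)
    noise = np.random.randn(len(x))
    y = np.zeros(len(x))
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfilt(sos, x)
        env = np.abs(hilbert(band))        # amplitude envelope of the band
        y += env * sosfilt(sos, noise)     # envelope-modulated noise band
    return y / (np.max(np.abs(y)) + 1e-9)  # normalize to avoid clipping

# Example use (assuming a mono signal `speech` sampled at 16 kHz):
#   degraded = noise_vocode(speech, fs=16000, n_bands=8)
```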

What CIs Sound Like

• Check out some nursery rhymes which have been processed through a CI simulator:

CI Perception

• One thing that is missing from vocoded speech is F0.

• …It only encodes spectral change.

• Last year, Aaron Byrnes put together an experiment testing intonation perception in CI-simulated speech for his honors thesis.

• Tested: discrimination of questions vs. statements

• And identification of most prominent word in a sentence.

• 8 channels: (audio demo)

• 22 channels: (audio demo)

The Findings

• CI User:

• Excellent identification of the most prominent word.

• At chance (50%) when distinguishing between statements and questions.

• Normal-hearing listeners (hearing simulated speech):

• Good (90-95%) identification of the prominent word.

• Not too shabby (75%) at distinguishing statements and questions.

• Conclusion 1: F0 information doesn’t get through the CI.

• Conclusion 2: Noise-vocoded speech might not be a completely accurate CI simulation.

Mitigating Factors

• Success with cochlear implants is highly variable.

• Works best for those who had hearing before they became deaf.

• The earlier a person receives an implant, the better they can function with it later in life.

• Works best for (in order):

• Environmental Sounds

• Speech

• Speaking on the telephone (bad)

• Music (really bad)

Practical Considerations

• It is largely unknown how well anyone will perform with a cochlear implant before they receive it.

• Possible predictors:

• lipreading ability

• (rapid acoustic cues for place of articulation are largely obscured by the noise vocoding process, so lipreading may supply that information)

• fMRI scans of brain activity during presentation of auditory stimuli.

Infrared Implants?

• Some very recent research has shown that cells in the inner ear can be activated through stimulation by infrared light.

• This may enable the eventual development of cochlear implants with very precise frequency and intensity tuning.

• Another research strategy is that of trying to regrow hair cells in the inner ear.

One Last Auditory Thought

• Frequency coding of sound is found all the way up in the auditory cortex.

• Also: some neurons only fire when sounds change.

A Philosophical Interlude

• Q: What’s a category?

• A classical answer:

• A category is defined by properties.

• All members of the category exhibit the same properties.

• No non-members of the category exhibit all of those properties.

The properties of any member of the category may be split into:

• Definitive properties

• Incidental properties

Classical Example

• A rectangle (in Euclidean geometry) may be defined as having the following properties:

1. Four-sided, two-dimensional figure (quadrilateral)

2. Four right angles

This is a rectangle.

Classical Example

• Adding a third property gives the figure a different category classification:

1. Four-sided, two-dimensional figure (quadrilateral)

2. Four right angles

3. Four equally long sides

This is a square.

Classical Example

• Altering other properties does not change the category classification:

1. Four-sided, two-dimensional figure (quadrilateral)

2. Four right angles

3. Four equally long sides

A. Is red.

• Properties 1-3 are definitive properties; property A is an incidental property.

This is still a square.
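The classical picture translates directly into a membership test over definitive properties. Here is an illustrative sketch; the shape representation and field names are hypothetical:

```python
def is_square(shape):
    """Classical categorization: membership is decided by the
    definitive properties alone; incidental properties (like color)
    are ignored."""
    return (shape["sides"] == 4             # 1. quadrilateral
            and shape["right_angles"] == 4  # 2. four right angles
            and shape["equal_sides"])       # 3. four equally long sides

red_square = {"sides": 4, "right_angles": 4, "equal_sides": True,
              "color": "red"}               # A. is red (incidental)
print(is_square(red_square))                # True: color doesn't matter
```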

Classical Linguistic Categories

• Formal phonology traditionally defined all possible speech sounds in terms of a limited number of properties, known as “distinctive features”. (Chomsky & Halle, 1968)

[d] = [CORONAL, +voice, -continuant, -nasal, etc.]

[n] = [CORONAL, +voice, -continuant, +nasal, etc.]

• Similar approaches have been applied in syntactic analysis. (Chomsky, 1974)

Adjectives = [+N, +V]

Prepositions = [-N, -V]
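Feature bundles like these are essentially attribute-value sets; a toy rendering (my illustration, using only the subset of features shown above):

```python
# Hypothetical (partial) SPE-style feature bundles for [d] and [n]
d = {"place": "CORONAL", "voice": "+", "continuant": "-", "nasal": "-"}
n = {"place": "CORONAL", "voice": "+", "continuant": "-", "nasal": "+"}

# The two segments differ in exactly one feature:
print({f for f in d if d[f] != n[f]})   # {'nasal'}
```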

Prototypes

• The psychological reality of classical categories was called into question by a series of studies conducted by Eleanor Rosch in the 1970s.

• Rosch claimed that categories were organized around privileged category members, known as prototypes.

• (instead of being defined by properties)

• Evidence for this theory initially came from linguistic tasks:

1. Semantic verification (Rosch, 1975)

• Is a robin a bird?

• Is a penguin a bird?

2. Category member naming.

Prototype Category Example: “Bird”

Exemplar Categories

• Cognitive psychologists in the late ‘70s (e.g., Medin & Schaffer, 1978) questioned the need for prototypes.

• Phenomena explained by prototype theory could be explained without recourse to a category prototype.

• The basic idea:

• Categories are defined by extension.

• Neither prototypes nor properties are necessary.

• Categorization works by comparing new tokens to all exemplars in memory.

• Generalization happens on the fly.

A Category, Exemplar-style

(figure: many stored exemplars, all linked to the label “square”)

Back to Perception

• When people used to talk about categorical perception, they meant perception of classical categories.

• A stop is either a [b] or a [g]

• (no in between)

• Remember: in classical categories, there are:

• definitive properties

• incidental properties

• Q: What are the properties that define a stop category?

• The definitive properties must be invariant.

• (shared by all category members)

• So…what are the invariant properties of stop categories?

The Acoustic Hypothesis

• People have looked long and hard for invariant acoustic properties of stops, with little success.

• (and some people are still looking)

• Frequency values of compact (synthetic) bursts cueing different places of articulation, in various vowel contexts.

(Liberman et al., 1952)

Theoretical Revision

• Since invariant acoustic properties could not be found (especially for velars)…

• It was assumed that listeners perceived (articulatory) gestures, not (acoustic) sounds.

• Q: What invariant articulatory properties define stop categories?

• A: If they exist, they’re hard to find.

• Motor Theory Revision #2: Listeners perceive “intended” gestures.

• Note: “intentions” are kind of impossible to observe.

• But they must be invariant…right?

Another Brick in the Wall

• Another problem for motor theory:

• Perception of speech sounds isn’t always categorical.

• In particular: vowels are perceived in a more gradient fashion than stops.

• However, vowel perception becomes more categorical when the vowels are extremely short.

• It’s also hard to identify any invariant acoustic properties for vowels.

• Variation is rampant across:

• tokens

• speakers

• genders

• dialects

• age groups, etc.

• Variability = a huge problem for speech perception.

More Problems

• Infants exhibit categorical perception, too…

• Even though they don’t know category labels.

• Chinchillas can do it, too!

An Alternative

• It has been proposed that phoneme categories are defined by prototypes…

• which we use to identify vowels in speech.

• One relevant finding: the perceptual magnet effect.

• Part 1: play listeners a continuum of synthetic vowels in the neighborhood of [i].

• Task: judge how much each one sounds like [i].

• Some sound better = prototypes

• Others sound worse = non-prototypes

Perceptual Magnets

• Part 2: define either a prototype or a non-prototype as a category center.

• Task: determine whether other vowels on the continuum belong to that category (“same” or “different”).

• Result: more “same” responses when the category center is a prototype.

• Prototype = a “perceptual magnet”

Prototypes, continued

• The perceptual magnet prototypes are usually located at a listener’s average F1 and F2 values for [i].

• 4-month-olds exhibit the perceptual magnet effect…

• but monkeys do not.

• Note: the prototype is the only thing that has to be “invariant” about the category.

• particular properties aren’t important.

• Testing a prototype model on the Peterson & Barney data yielded 51% correct classification.

• (Human listeners got 94% correct)

• Variability is still hard to deal with.
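A nearest-prototype classifier of the kind tested here can be sketched in a few lines. This is my illustration, not the actual model from the study; the prototype values are rough Peterson & Barney-style male formant means, for demonstration only:

```python
import numpy as np

def prototype_classify(probe, prototypes):
    """Nearest-prototype classification: each vowel category is a
    single point (mean F1, F2); a probe is assigned to the closest
    prototype.  All within-category variability is discarded."""
    return min(prototypes, key=lambda v: np.linalg.norm(probe - prototypes[v]))

# Toy prototypes: (F1, F2) in Hz (approximate, for illustration)
prototypes = {"i": np.array([270.0, 2290.0]),
              "a": np.array([730.0, 1090.0]),
              "u": np.array([300.0,  870.0])}

print(prototype_classify(np.array([320.0, 2200.0]), prototypes))  # 'i'
```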

Flipping the Script

• Another approach to speech perception is to preserve all the variability that we hear…

• Rather than boiling it down to properties or prototypes.

• In this model, speech categories are defined by extension.

• = consist of exemplars

• So, your mental representation of /b/ consists of every token of /b/ you’ve ever heard in your life.

• …rather than any particular acoustic or articulatory properties.

• Analogy: phonetics field project notes

• (your mind is a pack rat)

Exemplar Categorization

1. Stored memories of speech experiences are known as traces.

• Each trace is linked to a category label.

2. Incoming speech tokens are known as probes.

3. A probe activates the traces it is similar to.

• Note: amount of activation is proportional to similarity between trace and probe.

• Traces that closely match a probe are activated a lot;

• Traces that have no similarity to a probe are not activated much at all.

• A (pretend) example: traces = vowels from the Peterson & Barney data set.

• Activation of each trace falls off with its distance (in vowel space) from the probe: traces near the probe are highly activated; distant traces receive low activation.

Echoes from the Past

• The activation-weighted contributions of all exemplars in memory are summed to create an echo in the perceptual system.

• This echo has more general features than either the traces or the probe.

• Inspiration: Francis Galton’s composite photographs.

Exemplar Categorization II

• For each category label…

• The activations of the traces linked to it are summed up.

• The category with the most total activation wins.

• Note: we use all exemplars in memory to help us categorize new tokens.

• Also: any single trace can be linked to different kinds of category labels.

• Test: Peterson & Barney vowel data

• Exemplar model classified 81% of vowels correctly.
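A minimal sketch of this exemplar scheme. The similarity rule (an exponentially decaying function of distance, as in Nosofsky’s Generalized Context Model), the decay rate, and the toy data are my choices, not details given in the slides:

```python
import numpy as np

def exemplar_classify(probe, traces, labels, c=0.01):
    """Exemplar categorization: every stored trace is activated in
    proportion to its similarity to the probe, activations are summed
    per category label, and the label with the most total activation
    wins.  `c` controls how fast similarity decays with distance."""
    traces = np.asarray(traces, dtype=float)
    dists = np.linalg.norm(traces - probe, axis=1)
    activations = np.exp(-c * dists)        # similarity = exp(-c * distance)
    totals = {}
    for lab, a in zip(labels, activations):
        totals[lab] = totals.get(lab, 0.0) + a
    return max(totals, key=totals.get)

# Toy traces: (F1, F2) exemplars of two vowel categories
traces = [(270, 2290), (300, 2250), (310, 2100),   # 'i' exemplars
          (730, 1090), (700, 1200), (760, 1000)]   # 'a' exemplars
labels = ["i", "i", "i", "a", "a", "a"]

print(exemplar_classify(np.array([320, 2150]), traces, labels))  # 'i'
```

Note that, unlike the prototype sketch earlier, every stored token contributes to the decision, which is exactly the point of the exemplar approach.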

Exemplar Predictions

• Point: all properties of all exemplars play a role in categorization…

• Not just the “definitive” ones.

• Prediction: non-invariant properties of speech categories should have an effect on speech perception.

• E.g., the voice in which a [b] is spoken.

• Or even the room in which a [b] is spoken.

• Is this true?

• Let’s find out…

Another Experiment!

• Circle whether each word is a new or old word in the list.

(answer blanks for items 1-24)

Another Experiment!

• Circle whether each word is a new or old word in the list.

(answer blanks for items 25-40)

Continuous Word Recognition

• In a “continuous word recognition” task, listeners hear a long sequence of words…

• some of which are new words in the list, and some of which are repeats.

• Task: decide whether each word is new or a repeat.

• Twist: some repeats are presented in a new voice;

• others are presented in the old (same) voice.

• Finding: repetitions are identified more quickly and more accurately when they’re presented in the old voice. (Palmeri et al., 1993)

• Implication: we store voice + word info together in memory.