PSY 369: Psycholinguistics Language Comprehension Speech recognition

Upload: daisy-turner

Post on 04-Jan-2016


Page 1

PSY 369: Psycholinguistics

Language Comprehension: Speech recognition

Page 2

Announcements: Homeworks

• Extended the due date for Homework 5 to Thursday
• Posted Homework 6 (due Apr 3); we’ll briefly go over it at the start of class
• Cut the number of homeworks from 11 down to 8 (will still drop the lowest grade, so the top 7 count)
• So the class “total” will be out of 925 points instead of 1000
• Hope to have Homeworks 7 & 8 posted soon (one speech error collection, one journal summary); they’ll be for after Exam 3

Page 3

Different features than visual word recognition

Visual word recognition    Speech perception
Some parallel input        Serial input
Orthography                Phonetics/Phonology
Letters                    Acoustic features
Clear delineation          Usually no delineation
Difficult to learn         “Easy” to learn

Where are you going

Page 4

Acoustic features

Spectrogram
• Time on the x-axis
• Frequency (how many times per second the air pressure oscillates, in hertz) on the y-axis
• Amplitude is represented by the darkness of the lines
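A spectrogram is built by slicing the waveform into short, overlapping frames and measuring how much energy each frame carries at each frequency. The Python sketch below illustrates that idea under simple assumptions: it is dependency-free (a naive Fourier transform with a Hann window), and the function names, frame length, and hop size are illustrative choices, not settings from any particular speech-analysis program.

```python
import cmath
import math


def hann(n):
    """Hann window: tapers frame edges to reduce spectral leakage."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def spectrogram(signal, sample_rate, frame_len=256, hop=128):
    """Return (times_s, freqs_hz, mags).

    mags[t][k] is the amplitude in frame t at frequency freqs_hz[k];
    in a plotted spectrogram this is the darkness of each point.
    """
    window = hann(frame_len)
    n_bins = frame_len // 2 + 1  # keep non-negative frequencies only
    freqs = [k * sample_rate / frame_len for k in range(n_bins)]  # y-axis (Hz)
    times, mags = [], []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + i] * window[i] for i in range(frame_len)]
        row = []
        for k in range(n_bins):
            # naive discrete Fourier transform for one frequency bin
            coeff = sum(frame[i] * cmath.exp(-2j * math.pi * k * i / frame_len)
                        for i in range(frame_len))
            row.append(abs(coeff))
        mags.append(row)
        times.append(start / sample_rate)  # x-axis (seconds)
    return times, freqs, mags


# Usage: a pure 500 Hz tone sampled at 8 kHz for 0.1 s
sr = 8000
tone = [math.sin(2 * math.pi * 500 * n / sr) for n in range(800)]
times, freqs, mags = spectrogram(tone, sr)
peak_bin = max(range(len(freqs)), key=lambda k: mags[0][k])
print(freqs[peak_bin])  # 500.0
```

A pure tone shows up as one dark horizontal band; speech instead shows several bands (formants) that move over time. Real analysis software uses a fast Fourier transform, but the naive loop is easier to read.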

Page 5

Acoustic features

[Figure: spectrogram with formant transitions, F1, F2, F3, and a burst labeled]

Formants - bands of resonant frequencies
Formant transitions - up or down movement of formants
Steady states - flat formant patterns
Bursts - sudden release of air

Page 6

Acoustic features

[Figure: spectrograms of “bit” and “pit”; voice onset time of about 5 ms for “bit” vs. about 40 ms for “pit”]

Formants - bands of resonant frequencies
Formant transitions - up or down movement of formants
Steady states - flat formant patterns
Bursts - sudden release of air
Voice onset time (VOT) - when voicing begins relative to the release of the consonant (the burst)

Page 7

The confusion of palatalized labials > dentals & alveolars


What looks similar to the eye will probably seem similar to the ear!

Page 8

Hard problems: Ambiguity in speech signal

Chest Jew Wade Aim In It → Just you wait a minute

Delights Haven Dime → Daylight Savings Time

Canoes He Wad Ice He → Can You See What I See?

Free Quaintly As Quest Shuns → Frequently Asked Questions

http://www.playmadgabonline.com/

TV ad

Page 9

Hard Problems in Speech Perception

• Segmentation problem
• Lack of invariance
• Linearity (parallel transmission)
• Co-articulation
• Trading relations

Page 10

Hard Problems in Speech Perception

Segmentation problem: Unlike visual input, the acoustic input is not physically segmented

Illusion of silence: there are often no silent gaps between words in the waveform, even though we may “hear” some.

Page 11

Hard Problems in Speech Perception

Segmentation problem: Unlike visual input, the acoustic input is not physically segmented

Here the silence that we see in the acoustics isn’t perceived as a gap in the word
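Because the signal has no reliable gaps, a listener has to choose among several ways of carving the same sound stream into words, as in the Mad Gab examples. The Python sketch below enumerates every parse of an unbroken string against a tiny lexicon. The letter strings are rough, made-up stand-ins for phoneme sequences, and the “I scream” / “ice cream” lexicon is a hypothetical illustration, not material from the lecture.

```python
# Hypothetical mini-lexicon: rough sound spellings -> words (not real IPA)
LEXICON = {"ai": "I", "ais": "ice", "skrim": "scream", "krim": "cream"}


def segmentations(stream, lexicon=LEXICON):
    """Return every way to split an unbroken sound stream into known words."""
    if not stream:
        return [[]]  # exactly one way to segment nothing: no words
    parses = []
    for end in range(1, len(stream) + 1):
        chunk = stream[:end]
        if chunk in lexicon:
            # combine this word with every parse of what remains
            for rest in segmentations(stream[end:], lexicon):
                parses.append([lexicon[chunk]] + rest)
    return parses


print(segmentations("aiskrim"))  # [['I', 'scream'], ['ice', 'cream']]
```

Both parses are acoustically the same stream; nothing in the signal itself says where the word boundary falls.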

Page 12

Hard Problems in Speech Perception

Lack of Invariance: ideally, one phoneme should have one waveform

This is not the case. The /i/ (‘ee’) in ‘money’ and ‘me’ are different

Show me the money

Page 13

Hard Problems in Speech Perception

Lack of Invariance: ideally, one phoneme should have one waveform

Another example: Here is the phoneme /d/ followed by different vowels

Page 14

Hard Problems in Speech Perception

Lack of Invariance: ideally, one phoneme should have one waveform

And another example: the phrase has five /t/ phonemes, but there are not five identical sweeps in the spectrogram

There aren’t invariant cues for phonetic segments, although the search continues

Peter buttered the burnt toast

Page 15

Hard Problems in Speech Perception

Linearity (parallel transmission): Acoustic features often spread themselves out over other sounds

Where does “show” start and “money” end?

[Figure: waveform of “Show me the money”]

Page 16

Hard Problems in Speech Perception

Co-articulation: the influence of the articulation (pronunciation) of one phoneme on that of another phoneme
• Essentially, producing more than one speech sound at once
• Each sound is partially shaped by the sounds before and after it
• keel vs. kill vs. cool: /kil/ vs. /kɪl/ vs. /kul/ (IPA characters)
• The place of articulation and the lip rounding on the /k/ differ a lot, giving different versions of “the same sound” in different contexts and from different speakers
• This is what allows us to talk so fast
• May be helpful because it allows some parallel transmission of information (possibly helping predict what’s coming next)

Page 17

Hard Problems in Speech Perception

Trading relations: most phonetic distinctions have more than one acoustic cue, as a result of the particular articulatory gesture that gives the distinction
• Voice onset time (VOT)
• Energy in the burst
• Onset frequency of the first formant
• Placement in the syllable

e.g., slit vs. split: hearing the /p/ relies on a silent gap and a rising formant transition, and different mixtures of these cues can result in the same perception

Perception must establish some “trade-off” between the different cues.
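One simple way to picture a trading relation is to let two cues add up as evidence for “split,” so that more of one cue can make up for less of the other. The Python sketch below does this with a logistic function. The weights, bias, and cue scales are invented for illustration and are not estimates from any slit/split experiment.

```python
import math

# Hypothetical cue weights, chosen only to make the trade-off easy to see
W_SILENCE = 0.1     # log-odds per ms of silence after the /s/
W_TRANSITION = 2.0  # log-odds per unit of formant-transition cue strength
BIAS = 8.0


def p_split(silence_ms, transition):
    """Probability of hearing 'split' (vs. 'slit') from two combined cues."""
    log_odds = W_SILENCE * silence_ms + W_TRANSITION * transition - BIAS
    return 1 / (1 + math.exp(-log_odds))


def silence_needed(transition, target=0.5):
    """Silence (ms) that gives the target probability for a given transition cue."""
    log_odds = math.log(target / (1 - target))
    return (log_odds + BIAS - W_TRANSITION * transition) / W_SILENCE


# A weaker transition needs a longer silence to yield the same percept
print(round(p_split(60, 1.0), 2), round(p_split(40, 2.0), 2))  # 0.5 0.5
print(round(silence_needed(2.0), 1))  # 40.0
```

The additive rule is only one possible cue-combination scheme; it is used here because it makes the “trade” explicit: each extra unit of transition cue is worth 20 ms of silence.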

Page 18

Hard Problems in Speech Perception

Many factors may be important:
• Acoustic information
• Visual information
• Prosodic information
• Semantic context
• Syntactic structure

[Diagram: bottom-up and top-down information both feed into UNDERSTANDING]

Page 19

Using visual information

McGurk effect

The McGurk effect: McGurk and MacDonald (1976)
• Showed people a video where the audio and the video don’t match (think “dubbed movie”)
• With visual /ga/ and auditory /ba/, people often hear /da/

Implications
• Phoneme perception is an active process
• It is influenced by both auditory and visual information
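The McGurk result can be framed as two sources of evidence being combined. The Python sketch below assumes Massaro’s fuzzy logical model of perception (FLMP) as the combination rule; this is a swapped-in formalization, since the slide does not commit to a model. Each source gives every candidate syllable a degree of support between 0 and 1, the supports are multiplied, and the products are normalized. All support values are made up for illustration.

```python
# Made-up support values (0 to 1), for illustration only
AUDITORY = {"ba": 0.8, "da": 0.5, "ga": 0.1}  # the soundtrack is /ba/
VISUAL = {"ba": 0.05, "da": 0.8, "ga": 0.9}   # the face shows /ga/: no lip closure


def flmp(auditory, visual):
    """Multiply the support from each source, then normalize to probabilities
    (the FLMP 'relative goodness' decision rule)."""
    support = {s: auditory[s] * visual[s] for s in auditory}
    total = sum(support.values())
    return {s: v / total for s, v in support.items()}


probs = flmp(AUDITORY, VISUAL)
print(max(probs, key=probs.get))  # da
```

/da/ wins even though it is neither source’s favorite: it is the only option that neither the ear nor the eye strongly rules out.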

Page 20

Beyond the segment: prosodic factors (suprasegmentals)

English:
• Speech is divided into phrases, and every phrase has a focus
• Word stress is meaningful in English
• Stressed syllables are aligned in a fairly regular rhythm, while unstressed syllables take very little time
• An extended flat or low-rising intonation at the end of a phrase can indicate that a speaker intends to continue to speak; a falling intonation sounds more final

Page 21

Beyond the segment: prosodic factors (suprasegmentals)

Stress: emphasis on syllables in sentences
• Effects on meaning: “black bird” versus “blackbird”
• Top-down effects on perception: better anticipation of upcoming segments when a syllable is stressed

Rate: speed of articulation
• Faster talking: shorter vowels, shorter VOT
• Normalization: taking the speaker’s rate into account

Intonation: use of pitch to signify different meanings across sentences
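Rate normalization means judging a cue such as VOT relative to how fast the talker is speaking. The Python sketch below assumes, purely for illustration, that the /b/-/p/ VOT boundary scales in proportion to syllable duration; the 25 ms boundary and the 250 ms reference syllable are hypothetical numbers, not measured values.

```python
# Hypothetical values for illustration, not measured boundaries
BASE_BOUNDARY_MS = 25.0        # /b/-/p/ VOT boundary at a "normal" rate
REFERENCE_SYLLABLE_MS = 250.0  # syllable duration at that normal rate


def classify_stop(vot_ms, syllable_ms):
    """Label a labial stop as /b/ or /p/, scaling the VOT boundary with speaking rate.

    Shorter syllables (faster talking) move the boundary toward shorter VOTs.
    """
    boundary = BASE_BOUNDARY_MS * syllable_ms / REFERENCE_SYLLABLE_MS
    return "/p/" if vot_ms > boundary else "/b/"


print(classify_stop(20, 250))  # /b/ at a normal rate
print(classify_stop(20, 150))  # /p/ when the talker is fast
```

The same 20 ms VOT gets a different label depending on the surrounding rate, which is the point of normalization.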

Page 22

Top-down effects on Speech Perception

• Sentence context effects (excised speech; semantic influences)
• Phoneme restoration effect

[Diagram: bottom-up and top-down information both feed into UNDERSTANDING]

Page 23

Excised Speech

Syntactic and semantic cues can help

Pollack & Pickett (1964)

Task: Recorded conversations and excised individual words, then presented the words to listeners for identification, either within context or out of context

Results: Words out of context were recognized only 47% of the time; identification was greatly improved with context
• Suggests that the clarity we perceive in speech reflects processing (top-down as well as bottom-up), not just the signal

Page 24

Semantic Influences

Garnes & Bond (1976):

16 tokens spanning the bait-date-gate (/b/-/d/-/g/) continuum, so some were clear (unambiguous) examples and others fell in between (ambiguous)

3 carrier sentences (context):
• Here’s the fishing gear and the ______.
• Check the time and the _______.
• Paint the fence and the _______.

Results
• If the token was unambiguous, listeners reported it even when that made the sentence semantically implausible (Paint the fence and the bait.)
• If the token was ambiguous (near a phoneme boundary), there were semantic context effects: listeners interpreted the word as the contextually appropriate one
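The Garnes & Bond pattern (context matters only near the boundary) follows naturally if context adds a modest bias to the acoustic evidence. The Python sketch below covers only a hypothetical date-to-gate stretch of the continuum, and its slope, boundary, and bias values are invented for illustration.

```python
import math

FENCE = "Paint the fence and the ___"
TIME = "Check the time and the ___"

# Hypothetical parameters for a 0-10 step date-to-gate continuum
BOUNDARY = 5.0  # step where the acoustics alone are 50/50
SLOPE = 1.5     # how sharply the acoustic evidence changes per step
CONTEXT_BIAS = {FENCE: 1.0, TIME: -1.0}  # log-odds pushed toward "gate" by context


def p_gate(step, context):
    """Probability of reporting 'gate' rather than 'date'."""
    log_odds = SLOPE * (step - BOUNDARY) + CONTEXT_BIAS[context]
    return 1 / (1 + math.exp(-log_odds))


def report(step, context):
    return "gate" if p_gate(step, context) > 0.5 else "date"


for step in (0, 5, 10):
    print(step, report(step, FENCE), report(step, TIME))
# 0 date date
# 5 gate date
# 10 gate gate
```

At the clear ends of the continuum the acoustic term swamps the bias, so an implausible word is still reported; only at the ambiguous middle step does context decide.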

Page 25

Phoneme restoration effect

Task: Listen to a sentence which contained a word from which a phoneme was deleted and replaced with another noise (e.g., a cough)

The state governors met with their respective legi*latures convening in the capital city.

* /s/ deleted and replaced with a cough

Warren (1970)

Results:

• Participants heard the word normally, despite the missing phoneme

• Usually failed to identify which phoneme was missing

Interpretation:

We can use top-down knowledge to “fill in” the missing information

Page 26

Phoneme restoration effect

Warren and Warren (1970)

What if the missing phoneme was ambiguous?

The *eel was on the axle.
The *eel was on the shoe.
The *eel was on the orange.
The *eel was on the table.

Results:

Participants heard the contextually appropriate word (wheel, heel, peel, and meal, respectively) normally, despite the missing phoneme, even though the disambiguating word came only at the end of the sentence
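In these sentences the sound in the gap is compatible with several words, and the word chosen depends on context that arrives later. The Python sketch below mimics that two-step logic with a tiny hypothetical lexicon in rough phonemic spellings (“wil” for wheel, and so on) and a made-up association table. It illustrates top-down filling-in in general; it is not a model from the study.

```python
import re

# Hypothetical lexicon: rough phonemic spelling -> ordinary spelling
LEXICON = {"wil": "wheel", "hil": "heel", "pil": "peel", "mil": "meal", "kil": "keel"}
# Hypothetical associations between a later context word and a lexical item
ASSOCIATIONS = {"axle": "wheel", "shoe": "heel", "orange": "peel", "table": "meal"}


def restore(masked, context_word):
    """Fill the masked phoneme ('*') using a context word heard later.

    Returns the restored word, or None if the context does not pick one out.
    """
    pattern = re.compile(masked.replace("*", "."))  # '*' = any one phoneme
    # bottom-up step: every word consistent with the sounds that were heard
    candidates = [word for sounds, word in LEXICON.items() if pattern.fullmatch(sounds)]
    # top-down step: keep the candidate the context points to
    preferred = ASSOCIATIONS.get(context_word)
    return preferred if preferred in candidates else None


for context in ("axle", "shoe", "orange", "table"):
    print(context, restore("*il", context))
```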

Page 27

Phoneme restoration effect

Possible loci of phoneme restoration effects
• Perceptual locus: lexical or sentential context influences the way in which the word is initially perceived
• Post-perceptual locus: lexical or sentential context influences decisions about the nature of the missing phoneme information

Samuel (2001) attempts to distinguish between these

Page 28

Cross-modal priming

Shillcock (1990)

Task: hear a sentence, then make a lexical decision to a word that pops up on a computer screen (cross-modal priming)

Hear: The scientist made a new discovery last year.

Target on screen: NUDIST

Page 29

Cross-modal priming

Shillcock (1990)

Task: hear a sentence, then make a lexical decision to a word that pops up on a computer screen (cross-modal priming)

Hear: The scientist made a novel discovery last year.

Target on screen: NUDIST

Page 30

Cross-modal priming

Shillcock (1990)

Task: hear a sentence, then make a lexical decision to a word that pops up on a computer screen (cross-modal priming)

Hear: The scientist made a novel discovery last year.
Hear: The scientist made a new discovery last year. (faster)

Target on screen: NUDIST

Page 31

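Cross-modal priming

Shillcock (1990)

Task: hear a sentence, then make a lexical decision to a word that pops up on a computer screen (cross-modal priming)

Hear: The scientist made a novel discovery last year.
Hear: The scientist made a new discovery last year. (faster)

Target on screen: NUDIST
• NUDIST gets primed by a segmentation error (“new dis-” overlaps with the start of “nudist”), so lexical decisions are faster after “new discovery”
• This happens although listeners gave no conscious report of hearing “nudist”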

Page 32

Theories of speech perception
• Motor Theory
• Direct Realist Theory
• General Auditory Approach
• Cohort
• TRACE Model

Page 33

Motor theory of speech perception

A. Liberman (initially proposed in the late 1950s; more recently Liberman & Mattingly, 1985)

• Direct translation of acoustic speech into articulatory categories
• Holds that speech perception and motor control involve linked (or the same) neural processes
• Held that categorical perception was a direct reflection of articulatory organization
  • Categories with discrete gestures (e.g., consonants) will be perceived categorically
  • Categories with continuous gestures (e.g., vowels) will be perceived continuously
• There is a speech perception module that operates independently of general auditory perception
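Categorical perception is usually diagnosed by checking whether discrimination can be predicted from labeling: if listeners can only compare category labels, two tokens are easy to tell apart only when they straddle the boundary. The Python sketch below uses a made-up logistic identification curve for a /d/-/t/ VOT continuum together with the classic labeling-only prediction for ABX discrimination, P(correct) = 0.5 × [1 + (pA - pB)²]. The boundary and slope values are hypothetical.

```python
import math

# Hypothetical identification curve for a /d/-/t/ VOT continuum
BOUNDARY_MS = 25.0  # VOT that is labeled /t/ half the time
SLOPE_MS = 4.0      # smaller values = steeper, more categorical labeling


def p_t(vot_ms):
    """Probability of labeling a token /t/ rather than /d/."""
    return 1 / (1 + math.exp(-(vot_ms - BOUNDARY_MS) / SLOPE_MS))


def predicted_abx(vot_a, vot_b):
    """ABX accuracy predicted if listeners can use only their category labels."""
    return 0.5 * (1 + (p_t(vot_a) - p_t(vot_b)) ** 2)


# Discriminate pairs 10 ms apart along a 0-60 ms continuum
pairs = [(vot, vot + 10) for vot in range(0, 60, 10)]
scores = {pair: predicted_abx(*pair) for pair in pairs}
for pair, score in scores.items():
    print(pair, round(score, 2))
# the highest score is for (20, 30), the pair that straddles the boundary
```

Within-category pairs sit near chance (0.5) even though they are physically just as far apart, which is the signature Motor Theory tried to explain in articulatory terms.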

Page 34

Speech Perception & the brain

Figure: Frontal slices showing differential activation elicited during lip and tongue movements (Left), syllable articulation including [p] and [t] (Center), and listening to syllables including [p] and [t] (Right). Pulvermüller F et al. PNAS 2006;103:7865-7870. ©2006 by National Academy of Sciences

Page 35

Motor theory of speech perception: some problems for MT

• Categorical perception is found in non-speech sounds (e.g., music)
• Categorical perception of speech sounds is found in non-humans: chinchillas can be trained to show categorical perception of /t/ and /d/ consonant-vowel syllables (Kuhl & Miller, 1975)

Page 36

Other theories of speech perception

Direct Realist Theory (C. Fowler and others)
• As in Motor Theory, articulatory representations are key, but here they are directly perceived (related to Gibson’s perceptual theory)
• Perceiving speech is part of a more general perception of gestures that involves the motor system

General Auditory Approach (e.g., Diehl, Massaro)
• Does not invoke special mechanisms for speech perception; instead relies on more general mechanisms of audition and perception

For nice reviews see: Diehl, Lotto, & Holt (2003); Galantucci, Fowler, & Turvey (2006)

Page 37

Other theories of spoken word recognition

Cohort Model (Marslen-Wilson & Welsh, 1978; discussed last time)

1) The acoustic information at the beginning of a word activates a “cohort” of possible words

2) Syntax and semantics influence the selection of the target word from the cohort
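The two cohort steps can be sketched in a few lines of Python. This is a toy illustration: it assumes letters stand in for phonemes and uses a hypothetical lexicon, whereas the model itself operates on the unfolding acoustic-phonetic input.

```python
# Hypothetical lexicon; letters stand in for phonemes
LEXICON = ["captain", "captive", "capital", "cap", "cat", "candle", "dog"]


def cohort(heard_so_far, lexicon=LEXICON):
    """Step 1: words whose beginning matches the input heard so far."""
    return [word for word in lexicon if word.startswith(heard_so_far)]


def select(heard_so_far, fits_context):
    """Step 2: keep only cohort members that syntax and semantics allow."""
    return [word for word in cohort(heard_so_far) if word in fits_context]


for prefix in ("c", "cap", "capt"):
    print(prefix, cohort(prefix))
print(select("cap", {"captain", "ship", "sail"}))  # ['captain']
```

The cohort shrinks as more of the word arrives (six candidates after “c”, two after “capt”), and context can settle the choice before the word is complete.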

TRACE Model (Elman & McClelland, 1984, 1986)
• Connectionist, parallel distributed model
• Processing occurs through excitatory and inhibitory connections between processing units called nodes
• 3 levels of nodes (features, phonemes, and words), all highly interconnected
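The interactive-activation idea behind TRACE can be illustrated with a deliberately tiny Python sketch: word nodes receive excitatory input from matching phonemes and inhibit one another. This is not TRACE itself (there is no feature level, no time-slice copies, and no top-down feedback here), and the lexicon, rates, and number of cycles are hypothetical values chosen for illustration.

```python
# Hypothetical word nodes and their phoneme sequences
WORDS = {"cat": ["k", "a", "t"], "cap": ["k", "a", "p"], "tap": ["t", "a", "p"]}


def interactive_activation(heard, steps=20, excite=0.1, inhibit=0.2, decay=0.1):
    """Return word-node activations after repeated update cycles."""
    act = {word: 0.0 for word in WORDS}
    for _ in range(steps):
        total = sum(act.values())
        new_act = {}
        for word, phonemes in WORDS.items():
            # excitatory input: phonemes that match the input in the same position
            support = sum(1 for i, p in enumerate(phonemes)
                          if i < len(heard) and heard[i] == p)
            # inhibitory input: the summed activation of competing words
            competition = total - act[word]
            delta = excite * support - inhibit * competition - decay * act[word]
            new_act[word] = min(1.0, max(0.0, act[word] + delta))  # clip to [0, 1]
        act = new_act
    return act


print(interactive_activation(["k", "a", "t"]))
```

With the full input /k a t/, “cat” gets the most bottom-up support, and lateral inhibition lets it suppress its partial matches; after only /k a/, “cat” and “cap” stay tied because nothing yet distinguishes them.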