Speakers exhibit different assimilation in function and content words: e.g. /m/ assimilates to the place of the next consonant in ‘I’m’ but not in ‘lime’ or ‘crime’:11
I’m blowing / I’m going / I’m watching; lime bark / lime goes / crime wave
In principle, the listener can use this acoustic pattern to infer the grammatical class of the speech segment being perceived. In its place in an utterance, ‘I’m’ has few or no acoustic competitors.
[Spectrograms: productive morpheme (‘mistimes’) vs unproductive morpheme (‘mistakes’)]
Sarah Hawkins (Phonetics Laboratory, Dept. of Linguistics, University of Cambridge, UK) & Ingrid Johnsrude (Dept. of Psychology, Queen’s University, Canada)
Models of speech perception often emphasize phonetic or phonological categories (features, phonemes, gestures) that:
• are stable, abstract entities;
• result from stripping (irrelevant) variation from the speech stream;
• are prerequisite to the processing of other aspects of speech (grammar and meaning).
What are “phonetic categories”?
Funded in part by the Leverhulme Trust
The problem
The acoustic realization of a phoneme is systematically influenced by10,11:
1) allophonic variation:
• position in the syllable (e.g. ‘tip’ vs ‘pit’)
• boundaries between words (e.g. ‘grey train’ vs ‘great rain’)
• grammatical status (e.g. the productivity of a morpheme; content vs function words)
2) speaker intent & register (discourse function, casualness, rate)
3) talker identity
Experiments show that listeners use much of this systematic variability.12-17
Range & frequency effects: category boundaries tend towards the middle of the stimulus series. When stimuli are removed from one end of a continuum, the boundary shifts towards the other end. Stimulus frequency (& previous stimulus) affect current decision.1-4
Meaning: phonemic boundaries favour the phoneme related to the word in word-nonword continua;5 they favour sensible meanings in word-word continua in sentences.6
Perceptual learning: Rather little exposure to a novel pronunciation is required for a phonemic category boundary to shift.3,7-9
But:
1) phonemic category boundaries shift with phonetic context, meaning, and the function of the utterance;
2) much variability in speech sounds is systematic and potentially informative about features of speech other than phonemic categories.
1) Phonemic category boundaries are context-dependent; thus not stable
2) Acoustic variability is systematic and potentially informative
Examples: Grammatical information conveyed by systematic acoustic-phonetic variation
Anatomical considerations
Syllable-internal spectro-temporal relationships indicate morphemic productivity. Spectrograms of ‘mistimes’ & ‘mistakes’ from ‘I’d be surprised if Tess___ it.’ The first four phonemes (/mist/) are the same. Their acoustic differences produce a different rhythm that may signal that ‘mis’ in ‘mistimes’ is a productive morpheme, whereas ‘mis’ in ‘mistakes’ is not.
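The rhythm cue can be caricatured computationally. The sketch below is illustrative only, not the authors’ model: the duration values and the two sonorant:sibilant templates (roughly 1:1 vs 1:2, echoing the relative-duration contrast elsewhere on this poster) are hypothetical, and no claim is made here about which rhythm belongs to which word type.

```python
def classify_rhythm(sonorant_ms, sibilant_ms, templates):
    """Assign the observed sonorant:sibilant duration ratio to the nearest
    rhythm template (toy nearest-neighbour match, not the authors' model)."""
    ratio = sonorant_ms / sibilant_ms
    return min(templates, key=lambda name: abs(templates[name] - ratio))

# Hypothetical templates: two contrasting sonorant:sibilant rhythms
# (roughly 1:1 vs 1:2); the labels are placeholders.
templates = {"rhythm A (1:1)": 1.0, "rhythm B (1:2)": 0.5}

print(classify_rhythm(60, 65, templates))   # ratio ~0.92, nearest the 1:1 template
print(classify_rhythm(45, 95, templates))   # ratio ~0.47, nearest the 1:2 template
```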
This systematic acoustic variation has implications for models of word recognition incorporating lexical competition.15,18-20
ABOVE: Anatomical organization of the macaque cortex suggests four or five discrete, hierarchically organized stages of auditory processing between primary core and frontal cortex.21
The anatomical organization of the brain is not consistent with serial, feedforward models of speech perception.19
The multiple cognitive processes required for speech comprehension probably rely on multiple cortical networks that operate in parallel. This functional organization may, in humans, map onto anatomically segregated, hierarchically organized processing streams similar to those identified in macaque monkeys.21,22
Different pathways may be differentially specialized to serve different processes or operate on complementary representations of speech (e.g. articulatory; phonological; crossmodal).23
RIGHT: Temporofrontal connections are parallel among multiple levels of auditory cortex (belt to superior temporal sulcus), segregated, bidirectional, and follow a strict anterior-posterior topographic organization.24,25
Conclusions
• fine phonetic detail informs about perceptual units at multiple linguistic ‘levels’ (phonetics/phonology/grammar/meaning) simultaneously
• and thus over different time domains (variable grain sizes)12,13,35
Hence a phonetic category:
• is relational & plastic: each element is bound with other elements (larger, smaller) and no element can be described independently of its prosodic, grammatical, & functional context
• entails cognitively and neuropsychologically distributed processes which operate on different types of information13,36,37
Some implications for models of speech perception:
• speech perception, like visual object perception,38,39 may conform to Bayesian models: e.g. hypotheses about speech segmental identity (at multiple scales of temporal integration) may be generated by ‘higher-order’ regions and tested in ‘lower-order’ regions.
• major challenges for the next generation of models include:
  • use of acoustic-phonetic information at all linguistic levels
  • long-range phonetic dependencies, at all linguistic levels
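The Bayesian point can be made concrete with a toy calculation. The hypothesis set and all probabilities below are hypothetical; this sketches Bayesian combination of a top-down prior with bottom-up evidence in general, not any specific model cited here.

```python
def posterior(prior, likelihood):
    """Combine a prior over segment hypotheses (e.g. generated by
    'higher-order' regions) with the likelihood of the acoustics
    under each hypothesis, then renormalize."""
    unnorm = {h: prior[h] * likelihood[h] for h in prior}
    z = sum(unnorm.values())
    return {h: p / z for h, p in unnorm.items()}

# Hypothetical numbers: context mildly favours /t/ (e.g. a lexical frame),
# while the acoustics are ambiguous, leaning slightly towards /d/.
prior = {"/d/": 0.3, "/t/": 0.7}
likelihood = {"/d/": 0.55, "/t/": 0.45}
post = posterior(prior, likelihood)
print(max(post, key=post.get))  # the contextually favoured hypothesis wins
```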
[Figure: A phonemic category boundary shift due to the Ganong effect. Identification curves plot % /d/ responses (0-100) against VOT, from short VOT (/d/) to long VOT (/t/), for the dask-task and dash-tash continua; the 50% boundary shifts so that more of each continuum is heard as the phoneme that makes a real word.]
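One way to picture the boundary shift is as a logistic identification function whose midpoint is biased by lexical status. Every number below (neutral boundary, shift size, slope) is a hypothetical illustration, not data from the cited studies.

```python
import math

def p_t(vot_ms, boundary_ms, slope=0.4):
    """Logistic identification function: probability of reporting /t/
    for a token with the given VOT, relative to a category boundary."""
    return 1.0 / (1.0 + math.exp(-slope * (vot_ms - boundary_ms)))

# Hypothetical values for a VOT continuum:
NEUTRAL = 35.0   # ms, assumed midpoint of the series
SHIFT = 5.0      # ms, assumed size of the lexical bias

# Lexical knowledge pulls the boundary towards the real word (ref. 5):
# in dash-tash, 'dash' is the word, so more tokens are heard as /d/
# (boundary at longer VOTs); in dask-task, 'task' wins (shorter VOTs).
b_dash_tash = NEUTRAL + SHIFT
b_dask_task = NEUTRAL - SHIFT

ambiguous = 35.0  # the same token presented on both continua
print(p_t(ambiguous, b_dash_tash) < 0.5)  # heard mostly as /d/ -> 'dash'
print(p_t(ambiguous, b_dask_task) > 0.5)  # heard mostly as /t/ -> 'task'
```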
Anatomical considerations (cont)
Results of functional neuroimaging studies of speech perception are consistent with multiple, parallel, cascaded auditory streams of processing.22,23,26-28
Information flow in the auditory system is not unidirectional. Cortical feedforward connections each have their feedback complement.29-31
Anatomy suggests converging influences from multiple higher stages of perception, removed from the stage in question by zero, one, or more intervening stages.21
Neurophysiological studies suggest that information in even core auditory cortex regions is integrated over multiple time domains.32,33
References
1. Parducci, A. (1974) Psychophysical Judgment & Measurement, ed. Carterette/Friedman, 127-141.
2. Rosen, S. (1979) J. Phonetics 7, 393-402.
3. Pastore, R. (1987) Categorical Perception, ed. Harnad, Cambridge, 29-52.
4. Hawkins/Stevens (1985) J. Acoust. Soc. Am. 77, 1560-75.
5. Ganong, W.F. (1980) J. Exp. Psych.: HPP 6, 110-125.
6. Borsky, S., et al. (2000) J. Psycholing. Res. 29, 155-168.
7. Ladefoged/Broadbent (1957) J. Acoust. Soc. Am. 29, 98-104.
8. Norris, D., et al. (2003) Cognit. Psychol. 47, 204-38.
9. Eisner/McQueen (2006) J. Acoust. Soc. Am. 119, 1950-3.
10. Abercrombie, D. (1967) Elements of General Phonetics.
11. Local, J.K. (2003) J. Phonetics 31, 321-339.
12. Hawkins/Smith (2001) Italian J. Linguistics 13, 99-188.
13. Hawkins, S. (2003) J. Phonetics 31, 373-405.
14. Pisoni, D.B. (1997) Talker Variability in Speech Processing, ed. Johnson/Mullennix, Academic, 9-32.
15. Davis, M., et al. (2002) J. Exp. Psych.: HPP 28, 218-244.
16. Kemps, R., et al. (2005) Mem. Cognit. 33, 430-46.
17. Salverda, A., et al. (2003) Cognition 90, 51-89.
18. Marslen-Wilson, W. (1990) Cognitive Models of Speech Processing, ed. Altmann, Cambridge, 148-172.
19. Norris, D. (1994) Cognition 52, 189-234.
20. McClelland/Elman (1986) Cognit. Psychol. 18, 1-86.
21. Kaas, J., et al. (1999) Curr. Opin. Neurobiol. 9, 164-170.
22. Davis/Johnsrude (2003) J. Neurosci. 23, 3423-31.
23. Scott/Johnsrude (2003) Trends Neurosci. 26, 100-7.
24. Petrides/Pandya (1988) J. Comp. Neurol. 273, 52-66.
25. Seltzer/Pandya (1989) J. Comp. Neurol. 281, 97-113.
26. Davis/Johnsrude/Horwitz (2004) Soc. Neurosci. Ann. Mtg.
27. Rodd, J., et al. (2005) Cereb. Cortex 15, 1261-9.
28. Buchsbaum, B.R., et al. (2005) Neuron 48, 687-97.
29. Pickles, J. (1988) An Introduction to the Physiology of Hearing, London: Academic Press.
30. Pandya, D.N. (1995) Rev. Neurol. 151, 486-494.
31. de la Mothe, L., et al. (2006) J. Comp. Neurol. 496, 27-71.
32. Nelken, I., et al. (2003) Biol. Cybern. 89, 397-406.
33. Ulanovsky, N., et al. (2004) J. Neurosci. 24, 10440-53.
34. Ogden, R., et al. (2000) Comput. Sp. & Lang. 14, 177-210.
35. Boemio, A., et al. (2005) Nat. Neurosci. 8, 389-95.
36. Andruski, J., et al. (1994) Cognition 52, 163-187.
37. Blumstein, S., et al. (2005) J. Cog. Neurosci. 17, 1353-66.
38. Murray, S., et al. (2002) Proc. Natl. Acad. Sci. 99, 5164-9.
39. Kersten, D., et al. (2004) Ann. Rev. Psychol. 55, 271-304.
[Schematic: the “bottom-up” sequence of processing levels typically assumed: abstract initial categories (e.g. phonological features) → abstract phonemes → abstract word representations → grammar, meaning. “Top-down” influences (marked ‘??’) are poorly understood but are typically assumed to be separable from bottom-up processes. The levels up to word representations are the standard domain of models of speech perception; grammar and meaning are usually neglected by such models.]
[Waveform: ‘mist’ excerpt (0-0.90 s)]
Phonetic and anatomical data are consistent with the hypothesis that speech acoustics inform about multiple linguistic levels simultaneously.12,13,34
Perceptual information available in the short sections of sound, ‘mist,’ taken from ‘Tess mistimes it’ and ‘Tess mistakes it’. Information about featural, phonemic and lexical identity, and syllabic, morphemic and grammatical structure is conveyed simultaneously in the fine acoustic-phonetic detail, comprising both ‘events’ at segment boundaries and longer-term relationships. Prior knowledge is required for linguistic information—at all levels—to be extracted from sensory input. No unit is identifiable independent of context, and no unit/level is primary. Information is mapped onto prosodic structures linked to grammatical structures12,13,34 (example at http://kiri.ling.cam.ac.uk/sarah/docs/CNS06trees.pdf).
[Waveforms of the two full utterances (0-2.25 s and 0-2.11 s)]
[Annotated waveforms: events 1-4 marked on ‘˅ tɛs mɪstaɪmz ɪt’ (‘Tess mistimes it’) and ‘˅ tɛs mɪsteɪks ɪt’ (‘Tess mistakes it’); event 2 is uncertain (‘2?’) in one of the two. Bold font = nodes in linguistic structure = potential perceptual units.]

Events (see waveforms), their acoustic cues, and the perceptual correlates in each word:

1. Periodic, nasal
• ‘mistimes’: new syllable (simple onset); morpheme; word; poor segment identity
• ‘mistakes’: same as ‘mistimes’

2. Nasal-oral boundary + formant definition (nature: abrupt and clear in one word, unclear in the other)
• clearer token: features for [m]; phoneme /m/?; high front vowel?
• unclearer token: features for nasal? labial?? high vowel? front vowel??

3. Frication start
• ‘mistimes’: syllable coda starts; rhyme has voiceless coda??; features for [s]; phoneme /s/?; syllable is unstressed (weak, light)??
• ‘mistakes’: same as ‘mistimes’ except: syllable is unstressed (weak)?

4. Fricative-silence boundary (relatively early in one word, relatively late in the other)
• ‘mistimes’: phoneme /s/; voiceless coda; coda ends?; new syllable?; features for [t]?; morpheme ends?? productive morpheme/same word??
• ‘mistakes’: phoneme /s/; features for mis, maybe dis; features for [t]?; syllable coda continues?? morpheme continues (is nonproductive)??

After the silence (stop transient + aspiration):
• ‘mistimes’: confirms: productive morpheme mis (dis??); new strong syllable onset [th], new foot, new morpheme, same polymorphemic word; features for [th]; phoneme /t/
• ‘mistakes’: new strong syllable onset [st]; new foot. Confirms monomorphemic word beginning mis(t), vis, bis (dis?); features for [t]; phoneme /t/

Relationships (relative durations within /mɪs/): in one word the sonorant portion is relatively long, with sonorant:sibilant about 1:1 and sibilant:silence about 2:1; in the other it is relatively short, with sonorant:sibilant about 1:2 and sibilant:silence about 3:1.

Syllable-level correlates:
• ‘mistakes’: weak heavy syllable; strong (stressed) syllable onset of same word (monomorphemic polysyllable)?; defocussed verb missed??
• ‘mistimes’: weak, light syllable; productive morpheme mis? (dis??); silence + intonation heralds new syllable onset, new foot?