Speakers exhibit different assimilation in function and content words: e.g. /m/ assimilates to the place of the next consonant in ‘I’m’ but not in ‘lime’ or ‘crime’:11
I’m blowing / I’m going / I’m watching; lime bark / lime goes / crime wave
In principle, the listener can use this acoustic pattern to infer the grammatical class of the speech segment being perceived. In its place in an utterance, ‘I’m’ has few or no acoustic competitors.
[Spectrograms: productive morpheme (‘mistimes’) vs unproductive morpheme (‘mistakes’)]
Sarah Hawkins (Phonetics Laboratory, Dept. of Linguistics, University of Cambridge, UK) & Ingrid Johnsrude (Dept. of Psychology, Queen’s University, Canada)
Models of speech perception often emphasize phonetic or phonological categories (features, phonemes, gestures) that:
• are stable, abstract entities;
• result from stripping (irrelevant) variation from the speech stream;
• are prerequisite to the processing of other aspects of speech (grammar and meaning).
What are “phonetic categories”?
Funded in part by the Leverhulme Trust
The problem
The acoustic realization of a phoneme is systematically influenced by10,11:
1) allophonic variation:
• position in the syllable (e.g. ‘tip’ vs ‘pit’)
• boundaries between words (e.g. ‘grey train’ vs ‘great rain’)
• grammatical status (e.g. the productivity of a morpheme; content vs function words)
2) speaker intent & register (discourse function, casualness, rate)
3) talker identity
Experiments show that listeners use much of this systematic variability.12-17
Range & frequency effects: category boundaries tend towards the middle of the stimulus series. When stimuli are removed from one end of a continuum, the boundary shifts towards the other end. Stimulus frequency (& previous stimulus) affect current decision.1-4
Meaning: phonemic boundaries favour the phoneme related to the word in word-nonword continua;5 they favour sensible meanings in word-word continua in sentences.6
Perceptual learning: Rather little exposure to a novel pronunciation is required for a phonemic category boundary to shift.3,7-9
But:
1) phonemic category boundaries shift with phonetic context, meaning, and the function of the utterance;
2) much variability in speech sounds is systematic and potentially informative about features of speech other than phonemic categories.
1) Phonemic category boundaries are context-dependent; thus not stable
2) Acoustic variability is systematic and potentially informative
Examples: Grammatical information conveyed by systematic acoustic-phonetic variation
Anatomical considerations
Syllable-internal spectro-temporal relationships indicate morphemic productivity. Spectrograms of ‘mistimes’ & ‘mistakes’ from ‘I’d be surprised if Tess___ it.’ The first four phonemes (/mist/) are the same. Their acoustic differences produce a different rhythm that may signal that ‘mis’ in ‘mistimes’ is a productive morpheme, whereas ‘mis’ in ‘mistakes’ is not.
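The rhythm cue can be caricatured computationally. The sketch below is illustrative only, not the authors’ model: the duration values and the two sonorant:sibilant templates (roughly 1:1 vs 1:2, echoing the relative-duration contrast elsewhere on this poster) are hypothetical, and no claim is made here about which rhythm belongs to which word type.

```python
def classify_rhythm(sonorant_ms, sibilant_ms, templates):
    """Assign the observed sonorant:sibilant duration ratio to the nearest
    rhythm template (toy nearest-neighbour match, not the authors' model)."""
    ratio = sonorant_ms / sibilant_ms
    return min(templates, key=lambda name: abs(templates[name] - ratio))

# Hypothetical templates: two contrasting sonorant:sibilant rhythms
# (roughly 1:1 vs 1:2); the labels are placeholders.
templates = {"rhythm A (1:1)": 1.0, "rhythm B (1:2)": 0.5}

print(classify_rhythm(60, 65, templates))   # ratio ~0.92, nearest the 1:1 template
print(classify_rhythm(45, 95, templates))   # ratio ~0.47, nearest the 1:2 template
```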
This systematic acoustic variation has implications for models of word recognition incorporating lexical competition.15,18-20
ABOVE: Anatomical organization of the macaque cortex suggests four or five discrete, hierarchically organized stages of auditory processing between primary core and frontal cortex.21
The anatomical organization of the brain is not consistent with serial, feedforward models of speech perception.19
The multiple cognitive processes required for speech comprehension probably rely on multiple cortical networks that operate in parallel. This functional organization may, in humans, map onto anatomically segregated, hierarchically organized processing streams similar to those identified in macaque monkeys.21,22
Different pathways may be differentially specialized to serve different processes or operate on complementary representations of speech (e.g. articulatory; phonological; crossmodal).23
RIGHT: Temporofrontal connections are parallel among multiple levels of auditory cortex (belt to superior temporal sulcus), segregated, bidirectional, and follow a strict anterior-posterior topographic organization.24,25
Conclusions
• fine phonetic detail informs about perceptual units at multiple linguistic ‘levels’ (phonetics/phonology/grammar/meaning) simultaneously
• and thus over different time domains (variable grain sizes)12,13,35
Hence a phonetic category:
• is relational & plastic: each element is bound with other elements (larger, smaller) and no element can be described independently of its prosodic, grammatical, & functional context
• entails cognitively and neuropsychologically distributed processes which operate on different types of information13,36,37
Some implications for models of speech perception:
• speech perception, like visual object perception,38,39 may conform to Bayesian models: e.g. hypotheses about speech segmental identity (at multiple scales of temporal integration) may be generated by ‘higher-order’ regions and tested in ‘lower-order’ regions.
• major challenges for the next generation of models include:
  • use of acoustic-phonetic information at all linguistic levels
  • long-range phonetic dependencies, at all linguistic levels
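The Bayesian point can be made concrete with a toy calculation. The hypothesis set and all probabilities below are hypothetical; this sketches Bayesian combination of a top-down prior with bottom-up evidence in general, not any specific model cited here.

```python
def posterior(prior, likelihood):
    """Combine a prior over segment hypotheses (e.g. generated by
    'higher-order' regions) with the likelihood of the acoustics
    under each hypothesis, then renormalize."""
    unnorm = {h: prior[h] * likelihood[h] for h in prior}
    z = sum(unnorm.values())
    return {h: p / z for h, p in unnorm.items()}

# Hypothetical numbers: context mildly favours /t/ (e.g. a lexical frame),
# while the acoustics are ambiguous, leaning slightly towards /d/.
prior = {"/d/": 0.3, "/t/": 0.7}
likelihood = {"/d/": 0.55, "/t/": 0.45}
post = posterior(prior, likelihood)
print(max(post, key=post.get))  # the contextually favoured hypothesis wins
```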
[Figure: A phonemic category boundary shift due to the Ganong effect. Identification curves plot % /d/ responses (0-100) against VOT, from short VOT (/d/) to long VOT (/t/), for the dask-task and dash-tash continua; the 50% boundary shifts so that more of each continuum is heard as the phoneme that makes a real word.]
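One way to picture the boundary shift is as a logistic identification function whose midpoint is biased by lexical status. Every number below (neutral boundary, shift size, slope) is a hypothetical illustration, not data from the cited studies.

```python
import math

def p_t(vot_ms, boundary_ms, slope=0.4):
    """Logistic identification function: probability of reporting /t/
    for a token with the given VOT, relative to a category boundary."""
    return 1.0 / (1.0 + math.exp(-slope * (vot_ms - boundary_ms)))

# Hypothetical values for a VOT continuum:
NEUTRAL = 35.0   # ms, assumed midpoint of the series
SHIFT = 5.0      # ms, assumed size of the lexical bias

# Lexical knowledge pulls the boundary towards the real word (ref. 5):
# in dash-tash, 'dash' is the word, so more tokens are heard as /d/
# (boundary at longer VOTs); in dask-task, 'task' wins (shorter VOTs).
b_dash_tash = NEUTRAL + SHIFT
b_dask_task = NEUTRAL - SHIFT

ambiguous = 35.0  # the same token presented on both continua
print(p_t(ambiguous, b_dash_tash) < 0.5)  # heard mostly as /d/ -> 'dash'
print(p_t(ambiguous, b_dask_task) > 0.5)  # heard mostly as /t/ -> 'task'
```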
Anatomical considerations (cont)
Results of functional neuroimaging studies of speech perception are consistent with multiple, parallel, cascaded auditory streams of processing.22,23,26-28
Information flow in the auditory system is not unidirectional. Cortical feedforward connections each have their feedback complement.29-31
Anatomy suggests converging influences from multiple higher stages of perception, removed from the stage in question by zero, one, or more intervening stages.21
Neurophysiological studies suggest that information in even core auditory cortex regions is integrated over multiple time domains.32,33
References
1. Parducci, A. (1974) Psychophysical Judgment & Measurement, ed. Carterette/Friedman, 127-141.
2. Rosen, S. (1979) J. Phonetics 7, 393-402.
3. Pastore, R. (1987) Categorical Perception, ed. Harnad, Cambridge, 29-52.
4. Hawkins/Stevens (1985) J. Acoust. Soc. Am. 77, 1560-75.
5. Ganong, W.F. (1980) J. Exp. Psych.: HPP 6, 110-125.
6. Borsky, S., et al. (2000) J. Psycholing. Res. 29, 155-168.
7. Ladefoged/Broadbent (1957) J. Acoust. Soc. Am. 29, 98-104.
8. Norris, D., et al. (2003) Cognit. Psychol. 47, 204-38.
9. Eisner/McQueen (2006) J. Acoust. Soc. Am. 119, 1950-3.
10. Abercrombie, D. (1967) Elements of General Phonetics.
11. Local, J.K. (2003) J. Phonetics 31, 321-339.
12. Hawkins/Smith (2001) Italian J. Linguistics 13, 99-188.
13. Hawkins, S. (2003) J. Phonetics 31, 373-405.
14. Pisoni, D.B. (1997) Talker Variability in Speech Processing, ed. Johnson/Mullennix, Academic, 9-32.
15. Davis, M., et al. (2002) J. Exp. Psych.: HPP 28, 218-244.
16. Kemps, R., et al. (2005) Mem. Cognit. 33, 430-46.
17. Salverda, A., et al. (2003) Cognition 90, 51-89.
18. Marslen-Wilson, W. (1990) Cognitive Models of Speech Processing, ed. Altmann, Cambridge, 148-172.
19. Norris, D. (1994) Cognition 52, 189-234.
20. McClelland/Elman (1986) Cognit. Psychol. 18, 1-86.
21. Kaas, J., et al. (1999) Curr. Opin. Neurobiol. 9, 164-170.
22. Davis/Johnsrude (2003) J. Neurosci. 23, 3423-31.
23. Scott/Johnsrude (2003) Trends Neurosci. 26, 100-7.
24. Petrides/Pandya (1988) J. Comp. Neurol. 273, 52-66.
25. Seltzer/Pandya (1989) J. Comp. Neurol. 281, 97-113.
26. Davis/Johnsrude/Horwitz (2004) Soc. Neurosci. Ann. Mtg.
27. Rodd, J., et al. (2005) Cereb. Cortex 15, 1261-9.
28. Buchsbaum, B.R., et al. (2005) Neuron 48, 687-97.
29. Pickles, J. (1988) An Introduction to the Physiology of Hearing, London: Academic Press.
30. Pandya, D.N. (1995) Rev. Neurol. 151, 486-494.
31. de la Mothe, L., et al. (2006) J. Comp. Neurol. 496, 27-71.
32. Nelken, I., et al. (2003) Biol. Cybern. 89, 397-406.
33. Ulanovsky, N., et al. (2004) J. Neurosci. 24, 10440-53.
34. Ogden, R., et al. (2000) Comput. Sp. & Lang. 14, 177-210.
35. Boemio, A., et al. (2005) Nat. Neurosci. 8, 389-95.
36. Andruski, J., et al. (1994) Cognition 52, 163-187.
37. Blumstein, S., et al. (2005) J. Cog. Neurosci. 17, 1353-66.
38. Murray, S., et al. (2002) Proc. Natl. Acad. Sci. 99, 5164-9.
39. Kersten, D., et al. (2004) Ann. Rev. Psychol. 55, 271-304.
[Schematic: the “bottom-up” sequence of processing levels typically assumed: abstract initial categories (e.g. phonological features) → abstract phonemes → abstract word representations → grammar, meaning. “Top-down” influences (marked ‘??’) are poorly understood but are typically assumed to be separable from bottom-up processes. The levels up to word representations are the standard domain of models of speech perception; grammar and meaning are usually neglected by such models.]
[Waveform: ‘mist’ excerpt (0-0.90 s)]
Phonetic and anatomical data are consistent with the hypothesis that speech acoustics inform about multiple linguistic levels simultaneously.12,13,34
Perceptual information available in the short sections of sound, ‘mist,’ taken from ‘Tess mistimes it’ and ‘Tess mistakes it’. Information about featural, phonemic and lexical identity, and syllabic, morphemic and grammatical structure is conveyed simultaneously in the fine acoustic-phonetic detail, comprising both ‘events’ at segment boundaries and longer-term relationships. Prior knowledge is required for linguistic information—at all levels—to be extracted from sensory input. No unit is identifiable independent of context, and no unit/level is primary. Information is mapped onto prosodic structures linked to grammatical structures12,13,34 (example at http://kiri.ling.cam.ac.uk/sarah/docs/CNS06trees.pdf).
[Waveforms of the two full utterances (0-2.25 s and 0-2.11 s)]
[Annotated waveforms: events 1-4 marked on ‘˅ tɛs mɪstaɪmz ɪt’ (‘Tess mistimes it’) and ‘˅ tɛs mɪsteɪks ɪt’ (‘Tess mistakes it’); event 2 is uncertain (‘2?’) in one of the two. Bold font = nodes in linguistic structure = potential perceptual units.]

Events (see waveforms), their acoustic cues, and the perceptual correlates in each word:

1. Periodic, nasal
• ‘mistimes’: new syllable (simple onset); morpheme; word; poor segment identity
• ‘mistakes’: same as ‘mistimes’

2. Nasal-oral boundary + formant definition (nature: abrupt and clear in one word, unclear in the other)
• clearer token: features for [m]; phoneme /m/?; high front vowel?
• unclearer token: features for nasal? labial?? high vowel? front vowel??

3. Frication start
• ‘mistimes’: syllable coda starts; rhyme has voiceless coda??; features for [s]; phoneme /s/?; syllable is unstressed (weak, light)??
• ‘mistakes’: same as ‘mistimes’ except: syllable is unstressed (weak)?

4. Fricative-silence boundary (relatively early in one word, relatively late in the other)
• ‘mistimes’: phoneme /s/; voiceless coda; coda ends?; new syllable?; features for [t]?; morpheme ends?? productive morpheme/same word??
• ‘mistakes’: phoneme /s/; features for mis, maybe dis; features for [t]?; syllable coda continues?? morpheme continues (is nonproductive)??

After the silence (stop transient + aspiration):
• ‘mistimes’: confirms: productive morpheme mis (dis??); new strong syllable onset [th], new foot, new morpheme, same polymorphemic word; features for [th]; phoneme /t/
• ‘mistakes’: new strong syllable onset [st]; new foot. Confirms monomorphemic word beginning mis(t), vis, bis (dis?); features for [t]; phoneme /t/

Relationships (relative durations within /mɪs/): in one word the sonorant portion is relatively long, with sonorant:sibilant about 1:1 and sibilant:silence about 2:1; in the other it is relatively short, with sonorant:sibilant about 1:2 and sibilant:silence about 3:1.

Syllable-level correlates:
• ‘mistakes’: weak heavy syllable; strong (stressed) syllable onset of same word (monomorphemic polysyllable)?; defocussed verb missed??
• ‘mistimes’: weak, light syllable; productive morpheme mis? (dis??); silence + intonation heralds new syllable onset, new foot?