


Psychological Review, 1976, Vol. 83, No. 2, 114-140

Auditory and Linguistic Processes in Speech Perception: Inferences from Six Fusions in Dichotic Listening

James E. Cutting

Wesleyan University and Haskins Laboratories, New Haven, Connecticut

A number of phenomena in speech perception have been called fusion, but little effort has been made to compare these phenomena in a systematic fashion. The present paper examines six of them. All can be exemplified using the syllable /da/ as in dot, and all occur during dichotic listening. In each type of fusion, the robustness of the fused percept is observed against variation in three parameters: the relative onset time of the two opposite-ear stimuli, their relative intensity, and their relative fundamental frequency. Patterns of results are used to confirm the arrangement of the six fusions in a hierarchy, and supporting data are summoned in an analysis of the mechanisms that underlie each with reference to speech. The six fusions are sound localization, psychoacoustic fusion, spectral fusion, spectral/temporal fusion, phonetic feature fusion, and phonological fusion. They occur at three, perhaps four, different levels of perceptual analysis. The first two levels are characterized by perceptual integration, the other(s) by perceptual disruption and recombination.

Many accounts of speech and language emphasize a hierarchical structure (see, e.g., Fry, 1956; Pisoni, in press; Studdert-Kennedy, 1974, in press). Recently, the interface between two particular levels in this hierarchy has aroused much attention: the general level logically just prior to linguistic analysis, typically called the auditory level, and the first tier of the language hierarchy logically just subsequent to that auditory analysis, the phonetic level (Cutting, 1974; Pisoni, 1973; Studdert-Kennedy, Shankweiler, & Pisoni, 1972; Wood, 1975; Wood, Goff, & Day, 1971). These and other levels of processing appear to operate in parallel, but the outcome at one level appears to be contingent on the output at a prior level (Marslen-Wilson, 1975; Wood, 1974, 1975).¹ The present paper looks at these and other levels from a new vantage point.

This research was supported in part by Grant HD-01994 from the National Institute of Child Health and Human Development to the Haskins Laboratories, and by a seed grant from Wesleyan University to the author. Initial portions of this research were reported in Cutting (Note 1) and in Cutting (1973), a doctoral dissertation submitted to Yale University. I thank Ruth S. Day for her helpfulness in all stages of this enterprise, and Michael Studdert-Kennedy, Michael Turvey, Terry Halwes, Bruno Repp, and the reviewers for many suggestions to improve the paper.

Requests for reprints should be sent to J. Cutting, Haskins Laboratories, 270 Crown Street, New Haven, Connecticut 06510.

Information-processing analyses assume that perception takes time and that systematic disruption or complication of the process can reveal its underlying properties. This epistemological position often leads the researcher to paradigms of masking in both visual (Turvey, 1973) and auditory (Darwin, 1971; Massaro, 1972, 1974) modalities. Masking occurs through the rivalry of two stimuli competing for the limited processing capacities of a single processor: Information is lost at a bottleneck in the perceptual system. The reciprocal process to rivalry, one equally suited to information-processing analysis, is fusion. In this process, information from the two stimuli is not strictly lost, but rather

¹ More recently, however, many of the phenomena thought to characterize phonetic perception have been found to occur for music and musiclike sounds (Bever & Chiarello, 1974; Blechner, Day, & Cutting, in press; Cutting, in press; Cutting & Rosner, 1974; Cutting, Rosner, & Foard, in press; Locke & Kellar, 1973). Among other implications, these results suggest that "phoneticlike" perception is characteristic of general higher level processing in the auditory system encompassing both speech and music.



SIX FUSIONS IN DICHOTIC LISTENING 115

transformed into something new. With the outstanding exception of Julesz (1971), fusion has received little systematic attention in vision, and the phenomenon has received essentially no systematic attention in audition. The present investigation takes a small step in this direction.

One reason that little attention has been paid to auditory fusions may be that, as Julesz (1971, p. 52) suggests, the "auditory system is basically an archaic, diffuse structure that is hard to probe." A second reason may be that seemingly too large a number of auditory phenomena have been called fusion. When one reads a given paper in this field (e.g., Broadbent & Ladefoged, 1957; Day, 1968; Halwes, 1969; Perrott & Barry, 1969; Sayers & Cherry, 1957), it is clear with what phenomenon its authors are concerned; moreover, one feels confident that each author or group of authors has properly labeled each phenomenon as a fusion. However, when one inspects the papers as a group, it is not clear that they share any common ground except for two superficial facts: They all use the word fusion in their titles and they all present their stimuli dichotically—that is, one stimulus to one ear and one to the other. Fusion is clearly not one phenomenon but many phenomena, yet how are they related? At best, these findings appear to be just "curious binaural phenomena" (Tobias, 1972); at worst, they may lead the careful reader to confusion. The purpose of this paper, then, is (a) to enumerate the different kinds of auditory fusion, (b) to arrange six of the dichotic phenomena relevant to speech processing in a hierarchy according to the processing characteristics implied by each, and (c) to confirm that arrangement by subjecting each fusion to a common set of presentational and stimulus variables that have proved important to auditory processing in general.

The list of fusions considered here is not intended to be exhaustive, but is merely organized with regard to three themes. First, all fusions discussed here are dichotic fusions, combinations of stimuli presented to opposite ears. This stipulation eliminates temporal fusions of repeating noise patterns (Guttman & Julesz, 1963), tone patterns (van Noorden, 1975), and briefly interrupted segments of speech (Huggins, 1964, 1975). Second, all fusions in the present study reflect processes underlying the perception of speech. This constraint eliminates consideration of centrally generated perceptions of simple pitches (Bilsen & Goldstein, 1974; Cramer & Huggins, 1958; Fourcin, 1962), patterns of pitches (Kubovy, Cutting, & McGuire, 1974), musical intervals and chords (Houtsma & Goldstein, 1972), musical illusions (Deutsch, 1975a, 1975b; Deutsch & Roll, 1976), or integrated pulse trains (Huggins, 1974). Third, all of the selected fusions are exemplified by a single rule: The fused percept is different from the two dichotic inputs. This rule eliminates the dichotic switching-time experiments of Cherry and Taylor (1954) and Sayers and Cherry (1957) using speech stimuli, and the phenomenon of masking-level difference (see, e.g., Jeffress, 1972). Masking-level difference is typically eliminated by the second stipulation—most often, tones are imbedded in noise, and alteration of interstimulus phase relations yields percepts of either tone-plus-noise or noise alone. However, this is not necessarily always the case. Speech stimuli can easily be imbedded in noise and their intelligibility increased through the manipulation of phase relations.

It may seem that these constraints eliminate all the possible fusions that might occur in audition. However, there are at least five others that are relevant to speech, and they will be discussed with regard to a sixth and most basic type of fusion, sound localization. Since previous authors have simply called their phenomena fusion, I have taken the liberty in most cases of adding a descriptive adjective. The fusions considered in this paper are (a) sound localization, (b) psychoacoustic fusion, (c) spectral fusion, (d) spectral/temporal fusion, (e) phonetic feature fusion, and (f) phonological fusion. For purposes of comparability, each is discussed primarily with regard to the syllable /da/, as in dot, as shown in Figure 1. Schematic spectrograms of the dichotic stimuli are presented to suggest the manner in which items can be perceptually combined. The present paper


116 JAMES E. CUTTING

TABLE 1

Previous Upper Limits of Interstimulus Discrepancies Permitting the Fusion of Sounds Presented to Opposite Ears for Six Types of Fusion

Fusion type                    Onset time (msec)        Intensity (db.)    Frequency (Hz)
Sound localization             2.5 (a), < 10 (b)        60 (c), 65 (d)     25 (e), 80 (f)
Psychoacoustic fusion (h)      ...                      ...                ...
Spectral fusion (k)            < 250 (i), < 5 (l)       40 (m)             < 25 (j), < 2 (g)
Spectral/temporal fusion       ...                      40 (n)             ...
Phonetic feature fusion (o)    60 (p), < 120 (q)        20 (r)             > 14 (g)
Phonological fusion            150 (s, t), < 200 (u)    > 15 (u)           > 20 (u)

a. Woodworth (1938, p. 528) for clicks.
b. Cherry & Taylor (1954) for natural speech stimuli. Also see Tobias (1972) and Babkoff (1975).
c. Groen (1964) for binaural beats of sine waves.
d. Application of "cyclopean" stimuli of Kubovy, Cutting, & McGuire (1974).
e. Licklider, Webster, & Hedlun (1950) and Perrott & Nelson (1969) for binaural beats.
f. Perrott & Barry (1969) for sine waves near 2,000 Hz; considerably greater differences possible for higher frequencies. Also see Thurlow & Elfner (1959) and Tobias (1972).
g. Halwes (1969) for synthetic speech stimuli.
h. Existence of this fusion gleaned from Halwes (1969). Also see Repp (1975d).
i. Broadbent (1955) for natural speech patterns.
j. Broadbent & Ladefoged (1957) for synthetic speechlike patterns.
k. Leakey, Sayers, & Cherry (1958) for nonspeech; Matzker (1959), Linden (1964), and Smith & Resnick (1972) for natural speech; Halwes (1969), Ades (1974), and Haggard (Note 2) for synthetic speech. Several examples cited by Tobias (1972) may also fit into this category.
l. Pilot research by author using metronome-like ticks.
m. Rand (1974) for synthetic speech.
n. Nye, Nearey, & Rand (Note 3) and Nearey & Levitt (Note 4) for synthetic speech.
o. Shankweiler & Studdert-Kennedy (1967, subsequent analysis) for synthetic speech stimuli; Studdert-Kennedy & Shankweiler (1970) for natural speech.
p. Estimated from Studdert-Kennedy, Shankweiler, & Schulman (1970) for synthetic speech.
q. Repp (1975a, 1975b, 1975c) for synthetic speech.
r. Estimated from Cullen, Thompson, Hughes, Berlin, & Samson (1974) and Speaks & Bissonette (1975) for natural speech stimuli.
s. Day (1968) for natural speech stimuli.
t. Day (1970) for synthetic speech; Day & Cutting (Note 5) for natural speech.
u. Cutting (1975) for synthetic speech.

is concerned with the pressures that can be placed on the perceptual system to inhibit fusion. All fusions are more or less subject to these pressures.

In general, three variables have proven informative in previous investigations of these fusions: relative onset time of the two stimuli, their relative intensity, and their relative fundamental frequency. Table 1 summarizes the results of this previous research, which used many different kinds of stimuli. Only in rare cases were the same stimuli employed to investigate more than one type of fusion or even to investigate more than one parameter within a given fusion type. The following overview of the six types of fusion incorporates the material from both Figure 1 and Table 1.

Sound Localization: Fusion of Two "Identical" Events

Sound localization is included here as a reference point to be used when considering the other forms of fusion. All audible sounds, simple or complex, can be localized—and usually are. Sound localization is the most basic form of fusion and occurs for both speech sounds and nonspeech sounds alike. Provided that microsecond accuracy is not crucial, a convenient way to study sound localization in the laboratory is to use the same apparatus needed for studying other types of fusion: a good set of earphones, a dual-track tape recorder, and a two-channel tape with appropriate stimuli recorded on it. Although approximate sound localization can be obtained using just one ear (Angell & Fite, 1901; Mills, 1972; Perrott & Elfner, 1968), discussion in the present paper is limited to the phenomenon obtained with both ears.

The three primary parameters that affect sound localization were mentioned previously: the relative timing of the events at each ear, the relative intensity of those events, and also their relative frequency. First, consider relative timing. If one presents a brief click simultaneously to each ear, the listener reports hearing one click localized at his or her midline. Delaying one click by as little as 0.1 msec causes the apparent source of the percept to move away from the midline toward the ear receiving the leading stimulus. Delaying that click by 1 msec causes the apparent source to move to the extreme side of the auditory field away from the delayed click. With delays (onset time differences) of 2.5 msec, the fused percept disintegrates and two clicks can be heard (Woodworth, 1938) shooting across the auditory field, one after the other. Apparently, the effect of disintegration is postponed for longer and more complex stimuli such as speech syllables until relative phase differences (or onset time differences) are as great as 10 msec (Cherry & Taylor, 1954) or more (Tobias, 1972). Thus, when two /da/s are presented to op-


FIGURE 1. Schematic spectrograms of stimuli used in the six fusions. (F1 = Formant 1; F2 = Formant 2.)

1. Sound localization: /da/ + /da/ = /da/
2. Psychoacoustic fusion: /ba/ + /ga/ = /da/
3. Spectral fusion: F1 of /da/ + F2 of /da/ = /da/
4. Spectral/temporal fusion: /da/ without F2 transition + F2 transition of /da/ = /da/ + chirp
5. Phonetic feature fusion: /ba/ + /ta/ = /da/ (or /pa/)
6. Phonological fusion: /da/ + /ra/ = /dra/

posite ears with as much as 10 msec separating their onsets, a single /da/ may still be heard. Experiment 2 is designed, in part, to confirm this finding.
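The click-delay figures above can be put in rough quantitative context with the classic spherical-head approximation commonly attributed to Woodworth (1938). The head radius and speed of sound below are illustrative assumptions, not values from the paper:

```python
import math

def woodworth_itd(azimuth_deg, head_radius_m=0.0875, speed_of_sound=343.0):
    """Approximate interaural time difference (seconds) for a distant
    source, via the spherical-head formula ITD = (r/c) * (theta + sin theta)."""
    theta = math.radians(azimuth_deg)
    return (head_radius_m / speed_of_sound) * (theta + math.sin(theta))

# A source at 90 degrees azimuth yields an ITD of roughly 0.66 msec.
print(round(woodworth_itd(90) * 1000, 2))
```

On this approximation no natural source produces an interaural delay much beyond about 0.7 msec, so the 2.5-msec breakup point for clicks lies well outside the range that ordinary localization ever has to handle.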

Intensity is a second parameter that affects sound localization. Interaural differences as small as a few decibels or less easily affect the perceived locus of a sound. The problem here, however, is that unlike the potential fused percept in other fusions, the fused percept in sound localization does not "disintegrate." By making one stimulus less and less intense compared to a stimulus of constant intensity in the other ear, the locus of the percept migrates from the midline toward the ear receiving the more intense stimulus. Most importantly, the difference between percepts in some binaural and monaural conditions is negligible, if detectable at all. I have found that when Stimulus A is presented to one ear at a comfortable listening level and to the other ear at 20 db. lower, and also when Stimulus A is presented to one ear at a comfortable listening level and no stimulus is presented to the other ear, one cannot consistently hear the difference between the two trials.
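To make the decibel figures used throughout this paper concrete, intensity differences convert to amplitude ratios as 10^(db/20) and to power ratios as 10^(db/10). This is standard acoustics, not something specific to the paper:

```python
def db_to_amplitude_ratio(db):
    """Convert an intensity difference in db. to an amplitude ratio."""
    return 10 ** (db / 20)

def db_to_power_ratio(db):
    """Convert an intensity difference in db. to a power ratio."""
    return 10 ** (db / 10)

# The 20-db. attenuation described above is a tenfold amplitude
# (hundredfold power) reduction; Groen's 60-db. difference corresponds
# to a thousandfold amplitude difference.
print(db_to_amplitude_ratio(20))   # -> 10.0
print(db_to_amplitude_ratio(60))   # -> 1000.0
```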

To get around this problem, at least with regard to sine waves, one can look to the phenomenon of binaural beats: a "whooshing" or "roughness" percept, depending on the frequency difference between sine waves (to be discussed below). This phenomenon is perceived through the cross-correlation of opposite-ear signals. Groen (1964) presented data suggesting that with the intensity of the


signals differing by as much as 60 db., a "pulling" is still heard. Although Groen's procedure is not clear, I have replicated that result using the stimuli of Kubovy, Cutting, and McGuire (1974). In the case of their stimuli, a melody is heard through the cross-correlation and subsequent localization of the melodic elements as spatially distinct from a background of complex noise. No melody can be heard in a single stimulus because physically it is not present (which distinguishes the phenomenon from masking-level difference). The percept is robust enough so that if one stimulus is presented to one ear at 100 db. (SPL) and the other stimulus to the other ear at 35 db., a faint melody can still be heard and identified. The fact that it can be heard suggests that the sound localization process is still functional at interaural intensity differences as great as 65 db. Nevertheless, intensity is not considered relevant to the fused percept when /da/ is presented to both ears, because the percept never ceases to be /da/.

The third parameter is frequency. Sine waves may differ in frequency by as much as 25 Hz in certain frequency ranges, and a fused percept, the "roughness" of binaural beats, can be maintained. The principle here is that stimuli differing slightly in frequency can be thought of as stimuli with identical frequencies but with constantly changing phase relations. Differences of 3 Hz or less can be heard as an oscillating stimulus whirling around the head in an elliptical path (Oster, 1973). Greater frequency differences are heard as roughness until the two tones break apart. Outside the realm of binaural beats, Perrott and Barry (1969) found that dichotic tones of considerably greater frequency differences could be heard as a single stimulus, especially above about 2000 Hz. However, when the signals are complex and periodic, the pitch is typically much lower—for speech sounds, in particular, a fundamental frequency of 100 Hz is not uncommon in male speakers. It can be easily demonstrated that a /da/ at 100 Hz presented to one ear and a /da/ of 102 Hz presented to the other ear are heard as two events. Experiment 4 is designed, in part, to confirm this finding.
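The equivalence between a small frequency mismatch and a continuously drifting interaural phase relation can be sketched with two dichotic sine tones. The sample rate is an arbitrary assumption, and the tones stand in only schematically for the 100-Hz and 102-Hz fundamentals mentioned above:

```python
import numpy as np

sr = 8000                                # sample rate in Hz (illustrative)
t = np.arange(sr) / sr                   # one second of time points
f_left, f_right = 100.0, 102.0           # opposite-ear fundamentals

left = np.sin(2 * np.pi * f_left * t)    # signal routed to one ear
right = np.sin(2 * np.pi * f_right * t)  # signal routed to the other ear

# A 2-Hz mismatch is equivalent to an interaural phase difference that
# sweeps through a full cycle twice per second -- the binaural beat rate.
beat_rate = abs(f_right - f_left)
phase_drift = 2 * np.pi * beat_rate * t  # radians, grows linearly with time
print(beat_rate)                          # -> 2.0
```

Note that, unlike ordinary acoustic beats, nothing is physically mixed here; the two channels interact only within the listener's binaural system.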

Psychoacoustic Fusion: Fusion of Proximal Acoustic Features by Perceptual Averaging

Unlike sound localization, about which many volumes have been written, little has been written about psychoacoustic fusion. In this type of fusion, acoustic features from opposite-ear stimuli appear to be averaged to yield an intermediate result. The phenomenon could logically occur for many different kinds of sounds, both speech and nonspeech. Its existence is gleaned from my own experimentation with dichotic synthetic speech stimuli and from a large table of data presented by Halwes (1969, p. 61). Perhaps the best way to demonstrate the phenomenon is to consider the stimuli in Figure 1. If this particular /ba/ is presented to one ear and this particular /ga/ to the other, the listener often reports hearing a single item, /da/. Notice that these stimuli differ only in the direction and extent of the transition in the second formant, the resonance of highest frequency for these items (Delattre, Liberman, & Cooper, 1955). Note further that the second-formant transitions are arrayed such that /da/ lies between /ba/ and /ga/, and that an "average" of the two extreme items would fall very close to /da/.

To date, little is known about this type of fusion, in part because it is less likely to occur with the more complex and more widely differing natural speech syllables. Therefore, Experiment 1 is designed to demonstrate the psychoacoustic nature of the phenomenon. Also in Experiment 1, the relation between this fusion and a similar phenomenon known as the "feature sharing effect" is considered (see Pisoni & McNabb, 1974; Repp, 1975a; Studdert-Kennedy & Shankweiler, 1970; Studdert-Kennedy et al., 1972). Experiments 2, 3, and 4 are designed so that the effect of the three crucial variables on psychoacoustic fusion can be observed.
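The averaging account can be sketched numerically. The formant-track values below are rough illustrations of the /ba/-/da/-/ga/ F2 pattern, not the actual synthesis parameters used in the experiments:

```python
import numpy as np

# Illustrative second-formant (F2) tracks over a 50-msec transition into
# a steady vowel formant near 1200 Hz (values are schematic assumptions).
n_points = 6
f2_ba = np.linspace(900, 1200, n_points)    # /ba/: F2 rises into the vowel
f2_ga = np.linspace(1900, 1200, n_points)   # /ga/: F2 falls into the vowel

# Psychoacoustic fusion as perceptual averaging: the blended track starts
# midway between the /ba/ and /ga/ onsets, roughly where an intermediate
# /da/-like transition would begin.
f2_fused = (f2_ba + f2_ga) / 2
print(f2_fused[0], f2_fused[-1])
```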

Spectral Fusion: Fusion of Different Spectral Parts of the Same Signal

Broadbent (1955) and Broadbent and Ladefoged (1957) reported this third phenomenon, which I call spectral fusion. It occurs when different spectral ranges of the same signal are presented to opposite ears. A given stimulus, for example, is filtered into


two parts: one containing only the low frequencies and the other containing only the high frequencies. Each is presented separately but simultaneously to opposite ears. The listener invariably reports hearing the original stimulus, as if it has undergone no special treatment. In his initial study, Broadbent found that this fusion readily occurred for complex stimuli of many types—for nonspeech sounds such as metronome ticks as well as for speech sounds. Moreover, when listeners were informed about the nature of the stimuli and asked to report which ear received the low frequency sounds and which ear the high frequency sounds, they performed at chance level.
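The low/high split can be sketched with an FFT-based filter. The cutoff and the noise "signal" below are arbitrary stand-ins, not Broadbent's stimuli; the point of the sketch is that the two bands are spectrally complementary, so they sum back to the original signal, which is what the fused percept suggests the listener recovers:

```python
import numpy as np

rng = np.random.default_rng(0)
signal = rng.standard_normal(1024)       # stand-in for a speech waveform
sr = 8000                                 # sample rate (Hz), illustrative
cutoff_hz = 1000                          # illustrative split point

# Split the spectrum into complementary low- and high-frequency parts.
spectrum = np.fft.rfft(signal)
freqs = np.fft.rfftfreq(len(signal), 1 / sr)
low = np.fft.irfft(np.where(freqs <= cutoff_hz, spectrum, 0), len(signal))
high = np.fft.irfft(np.where(freqs > cutoff_hz, spectrum, 0), len(signal))

# One band goes to each ear; because the masks are complementary,
# summing the two channels reconstructs the original signal exactly
# (to floating-point precision).
assert np.allclose(low + high, signal)
```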

Relative timing is an important parameter in spectral fusion. Broadbent (1955) found that onset time differences of 250 msec were sufficient to disrupt the fused percept. My own pilot research suggests that this interval may be at least an order of magnitude too large. For example, when the different spectral portions of metronome-like ticks are offset by as little as 5 msec, the listener hears two sets of ticks, not one. As in sound localization, the temporal differences tolerable in spectral fusion may be greater for more complex sounds such as speech items. Therefore, Experiment 2 is directed at finding the relative onset times limiting fusion when the speech syllable /da/ is spectrally split and its first formant presented to one ear and its second formant to the other. For generality, the syllables /ba/ and /ga/ are also used.

The effect of relative intensity on spectral fusion has been explored systematically by Rand (1974; see also Nye, Nearey, & Rand, Note 3; Nearey & Levitt, Note 4). The most interesting case occurs when the second and higher formants, presented to one ear, are decreased in amplitude with respect to the first formant, presented to the other ear. Rand found that decreases of as much as 40 db. have little effect on identification of /ba, da, ga/. This large effect is particularly surprising since attenuations in the upper formants of only 30 db. are sufficient to render the syllables unrecognizable when all formants are presented to both ears. Rand termed the phenomenon "dichotic release from masking," and the release is clearly substantial. However, the emphasis in the present paper is not on masking but on fusion, and since Rand (1974) used stimuli very similar to those used in the present study and Nye, Nearey, and Rand (Note 3) have already replicated those results, intensity effects on spectral fusion are not explored further.

Fundamental frequency is also an important parameter in spectral fusion. Broadbent and Ladefoged (1957) found that the fundamental frequencies of the to-be-fused stimuli must be identical for fusion to occur (that is, for one item to be heard). Differences of 25 Hz inhibited fusion, and Halwes (1969) suggested that differences of 2 Hz may inhibit fusion as well, but these investigators appear to have been dealing with two types of fusion: One concerns the number of items heard (one or two), and the other concerns the identity of the stimulus. While the effect of differences in pitch between the two component stimuli on the identity of the fused percept is considered in Experiment 4, the effect of fundamental frequency on the number of items heard is considered in Experiment 5.

Spectral/Temporal Fusion: Perceptual Construction of Phonemes from Speech and Nonspeech Stimuli

Rand (1974) discovered a fourth type of fusion. In addition to dividing the stimulus spectrally and presenting those portions to either ear, as noted previously, he divided the speech syllable both spectrally and temporally. Two-formant renditions of his stimuli are shown schematically in Figure 1. One stimulus is simply the second-formant transition excised from the syllable and presented in isolation. Mattingly, Liberman, Syrdal, and Halwes (1971) noted that these brief glissandi sounded like the discrete elements of birdsong, and they dubbed them "chirps." The other stimulus is the remainder of the syllable without the second-formant transition. It should be noted that the transitionless /da/—that is, the speech sound without a second-formant transition—is not identifiable as /da/ when presented in isolation; instead, it is almost 85% identifiable as /ba/. This appears to result from the upward spread of harmonics of the first-formant transition, which seems to mimic the now-absent second-formant transition in such a manner as to cue /b/.
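The spectral-plus-temporal division can be sketched with a toy two-band signal. The frequencies, the 300-msec duration, and the 50-msec cut point are illustrative assumptions, not Rand's actual parameter values, and the steady sinusoids below only stand in schematically for formants:

```python
import numpy as np

sr = 10000                                # sample rate (Hz), illustrative
n = int(sr * 0.30)                        # a 300-msec "syllable"
t = np.arange(n) / sr

# Toy two-formant stand-in for /da/: F1 near 700 Hz, F2 near 1200 Hz.
f1_band = np.sin(2 * np.pi * 700 * t)
f2_band = np.sin(2 * np.pi * 1200 * t)

cut = int(sr * 0.05)                      # 50-msec transition region
chirp_ear = np.zeros(n)
chirp_ear[:cut] = f2_band[:cut]           # isolated F2 "transition" (chirp)
base_ear = f1_band.copy()
base_ear[cut:] += f2_band[cut:]           # syllable minus that F2 segment

# The chirp ear is silent after the transition region; the base ear
# carries everything else. Together the two channels account for the
# whole signal, yet the listener hears /da/ plus a nonspeech chirp.
assert np.all(chirp_ear[cut:] == 0)
```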

Perhaps the most interesting aspect of spectral/temporal fusion, and the aspect that distinguishes it from the logically similar spectral fusion, is that the listener hears more than one auditory event. He does not hear two speech sounds. Instead, he hears one speech sound, /da/, and one nonspeech sound, a chirp. Note that the perceptual whole is greater than the sum of the parts: The listener "hears" the second-formant transition in two different forms at the same time. One form is in the complete syllable /da/, which would sound more like /ba/ without it. The second form is similar to the transition heard in isolation—a nonspeech chirp. Thus, spectral/temporal fusion is more complex phenomenologically than the three fusions previously considered. It may be possible for spectral/temporal fusion to occur for nonspeech sounds (perhaps a complex nonspeech sound could be segmented spectrally and temporally in the same manner, and analogous percepts obtained). Nevertheless, it is discussed in the present paper exclusively with respect to speech.

Of the three relevant parameters—onset time, intensity, and frequency—only intensity has been explored thus far for spectral/temporal fusion. As in spectral fusion, Rand (1974) attenuated the isolated second-formant transitions of /ba, da, ga/ by as much as 40 db., and identification was largely unimpaired. The result was in marked contrast to that produced by the condition in which the syllable remained as an integral whole but with the second-formant transition attenuated as before: 30 db. was then sufficient to impair identification. As in spectral fusion, the intensity data are not replicated here, but the effects of differences in relative onset time and frequency are explored in Experiments 2 and 4, respectively. Again, /da/ is used as a reference syllable, but /ba/ and /ga/ are also used for the sake of generality.

Phonetic Feature Fusion: Recombination of Phonetic Feature Values by Perceptual Misassignment

With this fifth type of fusion we move to a domain that belongs exclusively to speech.

Studdert-Kennedy and Shankweiler (1970), Halwes (1969), and Repp (1975a) have reported that misassignment of phonetic feature values often occurs in the dichotic competition of certain stop-vowel syllables. This "blending" can be thought of as phonetic feature fusion. Consulting Figure 1, one finds that when /ba/ is presented to one ear and /ta/ to the other, the listener often reports hearing a syllable not presented. The most frequent errors are the blends /da/ and /pa/, in which the listener combines the voicing feature value of one stimulus with the place feature value of the other. For example, the voicing value of /b/ is combined with the place value of /t/ and the result is the fusion /d/.

Consider a stimulus repertory of six items: /ba, da, ga, pa, ta, ka/. On a particular trial when /ba/ and /ta/ are presented to opposite ears and the subject is asked to report what he or she hears, three types of responses can occur: correct responses /ba/ or /ta/, blend responses /da/ or /pa/, and anomalous responses /ga/ or /ka/. The last two items are anomalous because, although they share the voicing value with one item in the stimulus pair, neither shares place value. Using natural speech stimuli, Studdert-Kennedy and Shankweiler (1970) found that the ratio of blends (phonetic feature fusions) to anomalous responses was about 2:1, a rate significantly greater than chance. For synthetic speech items, Halwes (1969, pp. 64-65) found them to occur at a rate of 10:1 or better. Studdert-Kennedy et al. (1972) found these fusions to occur even when the vowels of the two stimuli differed markedly.
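The chance baseline here can be made explicit: with the correct pair excluded, two of the four possible error responses are blends and two are anomalous, so chance predicts a 1:1 split. The binomial test below is an illustrative analysis of that baseline, not a procedure taken from the paper, and the error counts are hypothetical:

```python
from math import comb

def binom_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): the one-tailed probability of
    observing k or more 'blend' errors out of n errors under chance."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Hypothetical counts at the observed 2:1 blend-to-anomalous ratio.
blends, anomalous = 100, 50
p_value = binom_tail(blends, blends + anomalous)
print(p_value < 0.001)                     # -> True
```

A 2:1 split over that many errors is thus wildly improbable under the 1:1 chance baseline, which is the sense in which the observed ratios exceed chance.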

Evidence for the effect of relative onset time on phonetic feature fusion is only indirect. Studdert-Kennedy, Shankweiler, and Schulman (1970) found that errors in identification occur more often when the relative onset times of competing pairs of stimuli are slightly staggered than when they are simultaneous. The effect decreases substantially for relative onset times of greater than 70 msec or so. The maximum error rate occurs for asynchronies of about 40 msec. If we assume that the ratio of blend responses to anomalous responses is constant for different lead times, maximum phonetic feature fusions should occur at about 40 msec lead time but should fall off rather rapidly thereafter. Experiment 2 is designed in part to confirm these predictions.

Evidence for the effect of relative intensity on phonetic feature fusion is equally indirect. The data of Cullen, Thompson, Hughes, Berlin, and Samson (1974) demonstrate that dichotic items can compete with one another, that is, yield substantial error rates, when the two items differ by as much as 20 db. If we assume that the ratio of blend responses to anomalous responses is constant for different intensities, phonetic feature fusions should continue to occur rather readily until intensity differences between the two stimuli are greater than 20 db., at which point errors largely cease. Errors (and fusions) should be greatest when the two stimuli have the same intensity. Experiment 3 is designed in part to confirm these predictions.

The effect of fundamental frequency differences between the stimuli is better known for phonetic feature fusion. Halwes (1969) found these fusions to occur almost as frequently when the two competing stimuli had different fundamental frequencies as when they had the same fundamental frequency. Experiment 4 extends the frequency differences well beyond the 14 Hz of Halwes to observe the effect on reported fusions.

Phonological Fusion: Perceptual Construction of Phoneme Clusters

Phonological fusion occurs when two inputs, each of n phonemes, yield a response of n + 1 phonemes. Day (1968) found that compatible phoneme strings, one beginning with a stop consonant and the other with a liquid, could be fused into one unit: Given PAHDUCT and RAHDUCT presented to opposite ears, the subject often hears PRODUCT. One of the unique aspects of phonological fusion not found in either psychoacoustic or phonetic feature fusion is that two stimuli, which are presented at the same time and contain different phonetic segments, fuse to form a new percept that is longer and linguistically more complex than either of them. Another unique aspect of this fusion is that the order in which the phonemes fuse is phonologically ruled: BANKET/LANKET yields BLANKET, not LBANKET. Note that in English, initial stop + liquid clusters occur frequently, but initial liquid + stop clusters never occur: /b, d, g, p, t, k/ can typically precede /l, r/, but the reverse is never true at the beginning of a syllable. When these phonological constraints are lifted, fusion can occur in both directions: Thus, TASS/TACK can yield both TACKS and TASK responses (Day, 1970). Other linguistic influences on phonological fusion are discussed by Cutting and Day (1975) and Cutting (1975).
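The ordering constraint just described can be sketched as a tiny rule system. The names are mine, and the cluster test is the simplified "stop before liquid" rule of the text; the text's "typically" covers exceptions (e.g., /tl-/ does not actually begin English syllables) that this sketch ignores:

```python
# Phonologically ruled fusion order: stop + liquid is a legal English onset,
# liquid + stop is not, so only one fused order can emerge.

STOPS = set("bdgptk")
LIQUIDS = set("lr")

def legal_initial_cluster(c1, c2):
    """True if c1 + c2 is a permissible syllable-initial cluster (simplified)."""
    return c1 in STOPS and c2 in LIQUIDS  # never liquid + stop

def fuse(stop_word, liquid_word):
    """Fuse e.g. BANKET + LANKET by inserting the liquid after the stop."""
    stop, liquid = stop_word[0], liquid_word[0]
    if legal_initial_cluster(stop, liquid):
        return stop + liquid + stop_word[1:]
    return None  # the reverse order (LBANKET) is phonologically ruled out

print(fuse("banket", "lanket"))  # blanket
```

The point of the sketch is that the fused output is constrained by phonotactics, not by the temporal order of the inputs.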

The effects of relative onset time have been explored by Day (1970), Day and Cutting (Note 5), and Cutting (1975). Their results show that phonological fusion is remarkably tolerant of differences in onset time. When no lead times are greater than 150 msec, fusion occurs readily at all leads. Even when much longer lead times are used, fusion remains frequent at all relative onsets of 150 msec and less. Factors such as whether the to-be-fused stimuli are natural or synthetic speech, whether the inputs are words or nonwords, and whether the stimuli are monosyllabic or disyllabic appear to play a role. Experiment 2 explores the fusion of the synthetic speech items /da/-/ra/ and /ba/-/la/ when relative onset times are varied. Cutting (1975) found that phonological fusion did not decrease with intensity and frequency differences between fusible stimuli of as much as 15 db. and 20 Hz, respectively. Experiments 3 and 4 extend the ranges of these differences to explore possible effects on the fused percept.

Purpose of the Present Experiments

The purpose of the experiments that follow is fivefold. First, Experiment 1 is designed to demonstrate that psychoacoustic fusion is a separate phenomenon resulting from the perceptual averaging of acoustic features. Second, Experiments 2 through 4 are designed to replicate the results found by previous studies (Table 1), using, as much as possible, the same stimulus or percept for each (/da/) and the same group of listeners. Third, those same experiments are designed to fill in the data gaps for psychoacoustic and spectral/temporal fusion and to confirm the estimated data for phonetic feature fusion. Fourth, Experiment 5 and the discussion that follows it are directed at the interactions of different fusions. And fifth, from the results of all the studies, the different fusions are considered with respect to the types of mechanisms that must exist at different processing levels and to their relevance in speech perception.

EXPERIMENT 1: DEMONSTRATION OF PSYCHOACOUSTIC FUSION

Of the six fusions, least is known about psychoacoustic fusion. Its existence is gleaned from a single table presented by Halwes (1969), and he does not discuss this particular phenomenon. Although several types of different synthetic syllable pairs can yield a single percept ambiguous between the two dichotic items (/ba/-/ma/, /ra/-/la/, /pa/-/ba/, to name a few), it may be only the pair /ba/-/ga/ that will frequently yield a percept different from either of the two inputs, /da/. What causes such fusion responses? Two hypotheses appear tenable. First, and supporting the notion that this fusion is psychoacoustic, the listener may hear /da/ simply because the acoustic average of the second-formant transitions for /ba/ and /ga/ happens to fall in the middle of the /da/ range. A second, alternative view is that the perceptual averaging may be more abstract. Perhaps linguistic information is extracted from the dichotic syllables with respect to place of articulation (see Liberman, Cooper, Shankweiler, & Studdert-Kennedy, 1967): /b/ is labial, /g/ is velar, and the articulatory mean between the two gestures is close to the alveolar /d/. These two hypotheses have different predictions about what happens to the fused percept when acoustic variation takes place within the /ba/ and /ga/ inputs. The first hypothesis predicts that the percentage of /da/ responses will vary according to the acoustic structure of the inputs; the second hypothesis, on the other hand, predicts no change in the number of /da/ responses, since all inputs are good exemplars of /ba/ and /ga/, and /da/ is always an articulatory mean between the two.

Method

Four stimuli were generated on the Haskins Laboratories parallel-resonance synthesizer. Two were /ba/ and two were /ga/. All four were two-formant, 300-msec stimuli with a constant fundamental frequency of 100 Hz. First formants were centered at 740 Hz, and second formants at 1,620 Hz. First-formant transitions were 50 msec in duration, rising in frequency, and identical for all four items. Second-formant transitions were 70 msec in duration and varied in slope and direction as shown in Figure 2.²

The two stimuli nearest the /da/ boundaries are called b1 and g1 (for /ba/ and /ga/, respectively), whereas the two stimuli farthest from the boundaries are called b2 and g2. Start frequencies for the second-formant transitions were 1,232 and 1,386 Hz for the two /ba/ stimuli, and 1,996 and 2,156 Hz for the two /ga/ stimuli. Boundaries and start frequencies are based on the findings of Mattingly et al. (1971), who used very similar stimuli. Pretesting determined that all items were at least 90% identifiable as the appropriate /b/ or /g/. Items were digitized and stored on disk file for the preparation of dichotic tapes (Cooper & Mattingly, 1969).

Four dichotic pairs were assembled: b1-g1, b2-g2, b1-g2, and b2-g1. Each of these pairs was repeated 10 times in a random sequence, with 3 sec between pairs and with channel assignments properly counterbalanced. Stimuli were reconverted into analog form at the time of recording the test tape. Ten Wesleyan University undergraduates participated in the task as part of a course project; four others had been eliminated because they did not identify the stimuli as desired. Each was a native American English speaker with no history of hearing difficulty, no experience at dichotic listening, and limited experience with synthetic speech. The tape was played on a Crown CX-822 tape recorder, and signals were sent through a set of attenuators to matched Telephonics headphones (Model TDH39). Stimuli were presented at approximately 80 db. (SPL). Earphone-to-ear assignments were counterbalanced across listeners. Listeners wrote down B, D, or G to identify the item that they heard, and they remained uninformed that only two items were actually presented (note that no /da/ items were presented). Except when noted otherwise, the procedure and apparatus of Experiment 1 were used for all the other experiments.

Results and Discussion

The percentages of /b/, /d/, and /g/ responses for the four types of dichotic pairs are shown in Table 2. The largest number of /da/ fusions occurred for the b1-g1 pair, and the fewest for the b2-g2 pair; intermediate fusion scores occurred for the other two pairs. Of paramount interest is the fact that not only did /da/ fusions decrease for the dichotic pair whose second-formant transitions were farthest from the /d/ boundaries, but both /ba/ and /ga/ responses increased as well. The difference between the number of /da/ responses for b1-g1 and b2-g2 pairs proved significant by a Wilcoxon matched-pair signed-ranks test, T(8) = 1.5, p < .02 (two-tailed): This demonstrates that the proximity of the second-formant transitions in the to-be-fused pair to the /da/ boundaries is important to the phenomenon. Thus, simple averaging of formant transitions is insufficient to explain consistent psychoacoustic fusions; it is the averaging of optimally similar transitions that is crucial. This result suggests a close relationship between rivalry and fusion, as noted earlier.³

FIGURE 2. Schematic spectrographic representations of four stimuli, two /ba/s and two /ga/s, used in Experiment 1 for psychoacoustic fusion.

2 These transitions are longer than those typically found in synthetic syllables, but preliminary testing suggested that longer transitions facilitate psychoacoustic fusion. For other effects with longer transitions see Tallal and Piercy (1974, 1975), whose data support the notion that transition duration has auditory consequences independent of phonemic consequences.

The responses for the other two dichotic pairs were predictable from the first results. The b1-g2 pair had an intermediate number of /da/ fusions and a smaller number of /ba/ responses. The b2-g1 pair also had an intermediate number of /da/ fusions but had an increase in the /ba/ responses, largely at the cost of /ga/. Subtracting the number of /ga/ responses from the number of /ba/ responses for each subject, this shift pattern is statistically robust, T(7) = 0, p < .02. It appears that this fusion is psychoacoustic rather than psycholinguistic, since it is quite sensitive to phonemically irrelevant acoustic variation in the stimuli.⁴
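A quick arithmetic check of the first (acoustic-averaging) hypothesis against the Method section's start frequencies is instructive. The pairing of b1/g1 as the boundary-adjacent items, and the comparison with the /da/ second-formant onset of 1,695 Hz given later for the Experiment 2-5 stimuli, are my reading of the text:

```python
# Mean second-formant transition onsets for the two critical dichotic pairs.
DA_START = 1695  # Hz, /da/ second-formant transition onset (Method, Exps. 2-5)

pairs = {
    "b1-g1": (1386, 1996),  # transitions nearest the /da/ boundaries
    "b2-g2": (1232, 2156),  # transitions farthest from the boundaries
}

for name, (f_ba, f_ga) in pairs.items():
    mean = (f_ba + f_ga) / 2
    print(f"{name}: mean onset = {mean:.0f} Hz (vs. /da/ at {DA_START} Hz)")
# b1-g1: mean onset = 1691 Hz; b2-g2: mean onset = 1694 Hz
```

Both pairs average to essentially the /da/ value, yet b1-g1 yielded far more /da/ fusions than b2-g2 (Table 2). This makes the conclusion concrete: averaging of onset frequencies alone cannot explain the difference; the similarity (proximity) of the two transitions matters as well.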

TABLE 2

PERCENTAGES OF /b/, /d/, AND /g/ RESPONSES FOR FOUR PAIRINGS OF DICHOTIC STIMULI IN EXPERIMENT 1 (PSYCHOACOUSTIC FUSION)

                Response
Pair       /b/     /d/     /g/
b1-g1        9      56      35
b2-g2       15      24      61
b1-g2        5      32      63
b2-g1       21      47      32

3 Fusion and rivalry are clearly not exclusive alternatives but interact in a probabilistic fashion. In Experiment 1, as in all experiments presented here, I have tried to maximize the probability of fusion in most conditions. The terms fusion and rivalry as used here are intended to parallel their use by Julesz (1971, p. 23). Whether there is a suppression of rivalry by fusion mechanisms in audition like that proposed in vision (Julesz, 1971, pp. 218-220; Kaufman, 1963, 1974) is not known, and is beyond the scope of the present investigation.

A phenomenon similar to psychoacoustic fusion in certain respects is the "feature sharing effect" as formulated by Studdert-Kennedy et al. (1972) and explored parametrically by Pisoni and McNabb (1974). One emphasis of Pisoni and McNabb, like that of Rand (1974) for another type of fusion, is on masking; here, of course, the emphasis is on fusion. Pisoni and McNabb found that, given a target syllable such as /ba/ presented to one ear and a mask such as /ga/ presented to the other ear, listeners made few errors identifying the target regardless of the interval between the onsets of the target and mask. They found that for /b/-/g/ pairs it made no difference whether the target and mask shared the same vowel or had different vowels, but they did note that "the more similar the vowels of the two syllables, the more likely they are to 'fuse' or integrate into one perceptual unit so that the listener had difficulty assigning the correct auditory features to the appropriate stimulus" (Pisoni & McNabb, 1974, p. 357). The fusion they allude to is most likely the psychoacoustic fusion reported here. Given a /ba/-/ga/ target-mask pair, however, they found few /da/ responses to occur even when the items had simultaneous onsets. The reason for this result most likely stems from subject expectations and the difference between the two procedures. For example, their listeners knew that targets would be presented to one ear and masks to the other, that two items would be presented on every trial, and that they were to report the identity of the first item; here, on the other hand, there were no targets or masks, listeners did not know that two items were presented on each trial, and they were simply to report what they heard. In Experiment 1, relative onset time was not varied as in the Pisoni and McNabb (1974) study; however, Experiment 2 varies onset time in a similar fashion. Just as I have concluded that psychoacoustic fusion occurs prior to phonetic processing, Pisoni and McNabb (1974) concluded that the feature-sharing effect is also prephonetic.

4 In a study made independently at the time the present investigation was being prepared for publication, Repp (1975d) found essentially the same results as those reported here, using /bæ/-/gæ/ dichotic pairs. From the results of several experiments on psychoacoustic fusion, he reaches many of the same (but some different) conclusions that I reach.

The present experiment confirms the existence of psychoacoustic fusion and also strongly suggests that its nature is not linguistic but rather a perceptual integration of acoustic features. The major thrust of this paper, however, stems from those experiments that follow: replications and explorations of the effects of varying relative onset time, relative intensity, and relative frequency on the six fusions outlined previously.

Experiments 2-5: General Methodology

Overview

Sixteen different brief experiments were conducted to study the six fusions. All dealt with the syllable /da/, either as a stimulus or as a potential percept, as shown schematically in Figure 1. In general, three experiments were directed at each type of fusion: In one, the relative onset time of the dichotic stimuli was varied; in a second, the relative intensity was varied; and in a third, the relative fundamental frequency was varied. For simplicity's sake, rather than numbering each separate demonstration as an experiment, all those dealing with relative onset time are considered as part of Experiment 2, those dealing with relative intensity as Experiment 3, and those dealing with relative frequency as Experiment 4. Experiment 5 deals with the interaction of different fusions.

The same 10 listeners that participated in Experiment 1 participated in these experiments. Because of the great number of separate studies, counterbalancing of test order was not attempted; instead, all subjects listened first to the three tests pertaining to phonological fusion, and then to those pertaining to sound localization, psychoacoustic fusion, spectral fusion, spectral/temporal fusion, and phonetic feature fusion, respectively.

Stimuli

Six speech syllables were generated in several renditions using the Haskins Laboratories' parallel-resonance synthesizer: /ba, da, ga, ta, la, ra/. The /ba/ and /ga/ stimuli were the b1 and g1 items used in Experiment 1. All stimuli were 300 msec in duration and shared the same /a/ vowel used previously. The stop-consonant stimuli (those beginning with /b/, /d/, /g/, and /t/) consisted of two formants, and the liquid stimuli (those beginning with /l/ and /r/) consisted of three formants. For the stop stimuli, first- and second-formant transitions were 50 and 70 msec in duration, respectively. Start frequency of the second-formant transitions for /da/ and /ta/ was 1,695 Hz. All voiced stops (/b/, /d/, and /g/) had a voice-onset time value of 0 msec, while the voiceless stop (/t/) was aspirated with a voice-onset time of +70 msec (see Lisker & Abramson, 1964). Liquid items began with 50 msec of steady-state resonance in all formants, followed by 100 msec of transitions in the second and third formants (only 20 msec in the first formant), followed by the vowel resonances. An open-response pretest showed that each item was identified correctly on at least 86% of the trials.

The standard forms of all these items had a pitch of 100 Hz (like that of an adult male) and an intensity of 80 db. (SPL). Nonstandard forms were generated with frequencies of 102, 120, and 180 Hz and with intensities ranging downward to 40 db. in 5-db. steps. For spectral fusion and for spectral/temporal fusion, /ba/, /da/, and /ga/ were also parsed into separate parts as shown in Figure 1: For spectral fusion, items were generated as separate formants, and for spectral/temporal fusion, the second-formant transition was isolated from the remainder of the syllable. All stimuli were digitized and stored on computer disk file for the preparation of dichotic tapes. Except as noted below, procedures were identical to those of Experiment 1.

EXPERIMENT 2: RELATIVE ONSET TIME IN SIX FUSIONS

Method

Six different randomly ordered sequences of dichotic pairs were recorded, one for each type of dichotic fusion. Relative onset times were chosen with regard to the temporal range for which pretesting had determined each fusion most sensitive.

Sound localization. Tokens of the standard form of the stimulus /da/ (100 Hz and 80 db.) were recorded on both channels of audio tape. The items could have synchronous onsets (0 msec lead time) or asynchronous onsets. Ten asynchronous onsets were selected: 1, 2, 3, 4, 5, 10, 20, 40, 80, and 160 msec lead times. A sequence of 24 items was recorded: (10 asynchronous leads) × (2 lead-time configurations, Channel A leading Channel B and vice versa) + (4 simultaneous-onset pairs). Listeners were told to write down the number of items they heard, one or two, paying no special regard to the identity of the stimuli.

Psychoacoustic fusion. Tokens of the standard /ba/ were recorded on one channel and tokens of the standard /ga/ on the other. Nine lead times were selected: 0, 1, 2, 5, 10, 20, 40, 80, and 160 msec. A sequence of 36 dichotic pairs was recorded: (9 leads) × (2 lead-time configurations) × (2 channel assignments, /ba/ to Channel A and /ga/ to Channel B and vice versa). Listeners were instructed to write down the initial consonant of the syllable that they heard most clearly, choosing from among the voiced stops B, D, or G. Note that no /da/ stimuli were actually presented.
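Factorial designs like this one are easy to mis-count when flattened into prose. As a sketch (the tuple layout and field names are mine), the psychoacoustic sequence can be assembled directly from the design stated above, 9 leads × 2 lead-time configurations × 2 channel assignments:

```python
# Build a randomly ordered dichotic test sequence from a factorial design.
import itertools
import random

LEADS_MS = [0, 1, 2, 5, 10, 20, 40, 80, 160]

trials = [
    {"lead_ms": lead, "leading_channel": chan_lead, "ba_channel": ba_chan}
    for lead, chan_lead, ba_chan in itertools.product(LEADS_MS, "AB", "AB")
]
random.shuffle(trials)  # randomly ordered sequence, as in the experiment
print(len(trials))  # 36
```

The same pattern, with the factors stated in each paragraph, yields the 72-, 84-, and 40-item sequences of the other tests.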

Spectral fusion. The first formants of the standard items /ba/, /da/, and /ga/ were recorded on one channel and the second formants on the other. Six lead times were selected: 0, 10, 20, 40, 80, and 160 msec. A sequence of 72 items was recorded: (3 stimulus pairs, first and second formants for /ba/, /da/, and /ga/) × (6 lead times) × (2 lead-time configurations) × (2 channel assignments). Listeners wrote down the initial consonant that they heard most clearly: B, D, or G.

Spectral/temporal fusion. Each of the 70-msec second-formant transitions was excised from the three standard syllables /ba/, /da/, and /ga/ and recorded on one channel, and the remainder of the syllables was recorded on the other. A sequence of 72 items was recorded following the same format as the spectral fusion sequence. Again, listeners wrote down B, D, or G.

Phonetic feature fusion. The standard form of /ba/ was recorded on one channel, and the standard form of /ta/ on the other. Seven lead times were selected: 0, 5, 10, 20, 40, 80, and 160 msec. A sequence of 84 items was recorded: (7 leads) × (2 lead-time configurations) × (2 channel assignments) × (3 observations per pair). Listeners wrote down the initial consonant of the syllable that they heard most clearly, choosing from among B, D, P, and T. Note that no /da/ or /pa/ stimuli were actually presented.

Phonological fusion. Two types of dichotic pairs were recorded on opposite channels: /ba/ and /la/, and /da/ and /ra/, all of standard form. Five leads were selected: 0, 20, 40, 80, and 160 msec. A sequence of 40 items was recorded: (2 fusible dichotic pairs) × (5 leads) × (2 lead-time configurations) × (2 channel assignments). Listeners were instructed to write down whatever they heard, following Day (1968).

Results and Discussion

Relative onset time is a crucial variable for all six fusions, as shown in Figure 3. In general, the greater the interval between onsets of members of the dichotic pair, the less frequently fusion occurs; or, inversely, the greater the probability of the perceptual disintegration of the fused percept. All results are discussed with regard to that relative onset time at which fusions first occur significantly less frequently than for simultaneous onset pairs, as measured by a Wilcoxon matched-pair signed-ranks test (p < .05).

FIGURE 3. The effect of relative onset time in six fusions (data from Experiment 2). Three panels plot fused responses (percentage) against relative onset time difference (0-160 msec): left, (1) sound localization and (2) psychoacoustic fusion; center, (3) spectral fusion and (4) spectral/temporal fusion; right, (5) phonetic feature fusion and (6) phonological fusion.

For sound localization, the fusions, or one-item responses, decreased precipitously with relative onset times as small as 4 and 5 msec. (In the left-hand panel of Figure 3, onset differences of 2 and 3 msec are combined, as are those of 4 and 5 msec, because there was little difference between them.) This estimate is slightly smaller than that given by Cherry and Taylor (1954), but it should be noted that some subjects continued to report hearing only one item for onset asynchronies as great as 20 msec. No one-item responses were given for relative onset times of 40 msec or greater. In the same panel, the data for psychoacoustic fusion present a different pattern. Fusions (that is, D responses, given a /ba/-/ga/ dichotic pair) occurred readily for all trials with relative onset times of up to 20 msec but dropped considerably on trials with greater asynchronies. There was no significant effect of the order in which the two stimuli arrived: /ba/-leading-/ga/ and /ga/-leading-/ba/ trials yielded equally frequent /da/ responses.

The patterns of fused responses for spectral fusion and spectral/temporal fusion (B, D, or G for the components of /ba, da, ga/, respectively) were markedly parallel. They are shown in the central panel of Figure 3. Significant decreases in the probabilities of fusion responses occurred by 40 msec for spectral fusion and by 20 msec for spectral/temporal fusion. In both, the results presented are those summed over /ba/, /da/, and /ga/ items. Combined, these functions asymptote at slightly greater than 33% performance. Listeners chose from among three responses, one of which had to be correct (unlike psychoacoustic fusion, where there were no /da/ items presented). Moreover, the first formant alone sounded somewhat like /ba/ itself: Fully 85% of all responses for both fusions at 160-msec asynchronies were /ba/. This fact contributes to the relatively high asymptote of the combined functions. There was no significant effect of the order of arrival of the dichotic items for either fusion, but B and D fusions were more frequent than G fusions. This result is considered in the general discussion.

Phonetic feature fusion and phonological fusion presented patterns of fused responses not seen for the other types of fusion. In both, fusions were slightly more frequent at brief relative onset times than at simultaneous onsets or at longer relative onsets. For phonetic feature fusion this effect was significant: The 20-msec asynchrony yielded more fused responses than asynchronies of 0, 80, and 160 msec. For phonological fusion, however, the increase was not significant. In phonetic feature fusion, fused responses were equally frequent regardless of whether /ba/ began before /ta/, or vice versa. For all asynchronies, /pa/ fusions were more frequent than /da/ fusions: The overall ratio was 4:1. In phonological fusion, the nonlinear pattern of fusion responses is slightly complicated by the fact that when, for example, /da/ begins before /ra/, fusions are more frequent than when /ra/ begins before /da/. Whereas the effect was not significant here, the trend is similar to that found by Cutting (1975, Experiment 3; but see Day, 1970; Day & Cutting, Note 5). Fused responses were more frequent for /ba/-/la/ pairs than for /da/-/ra/ pairs, a finding which replicates the stop + /l/ and stop + /r/ findings of Day (1968) and Cutting (1975). In addition, /da/-/ra/ pairs yielded many anomalous fusions; that is, in addition to the cases where listeners wrote down DRA, they also wrote occasional DLA, BLA, and GLA responses. See Day (1968), Cutting (1975), and Cutting and Day (1975) for more specific accounts of this effect.

In general, the results for the six types of fusion fall into three groups when relative onset time is varied. Group 1 consists of sound localization alone, in which the fused percept disintegrates after asynchronies of only 5 msec. Group 2 consists of psychoacoustic fusion, spectral fusion, and spectral/temporal fusion. When corrections are made for differences in lower asymptote, these three functions are much the same: After asynchronies of 20 and 40 msec, the frequency of fusions tapers off rapidly. Phonetic feature fusion and phonological fusion form Group 3, in which fusion increases with lead times of 20 to 40 msec before decreasing with longer lead times.

Consider now the stimulus "units" that appear to be fused in each of these three groups. Sound localization, the only member of Group 1, occurs through cross-correlation of opposite-ear waveforms. In the present study the two signals were speech sounds carried on a glottal source of 100 Hz. Each glottal pulse was thus 10 msec in duration and served as a convenient acoustic "unit" to anchor the cross-correlation process. Microstructure differences between contiguous glottal pulses were slight but may have served as an aid in the localization process (Leakey, Sayers, & Cherry, 1958). Onset asynchronies less than 5 msec are apparently surmountable in sound localization, whereas larger differences generally are not. It may be that the binaural hearing mechanisms can integrate glottal pulses conveying identical information if they arrive within 5 msec of one another, in part, because each pulse overlaps by at least half a glottal wavelength. With onset times greater than 5 msec, the opposite-ear pulses that arrive most nearly at the same time are not identical, and microstructure differences may impede localization. From this account one would predict that a long, continuous steady-state vowel /a/ with identical microstructures within each glottal pulse would always be localizable as a single item regardless of timing differences. Indeed, Broadbent (1955) and Sayers and Cherry (1957) performed demonstrations similar to this.
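The cross-correlation account above can be illustrated with a toy pulse train; the parameters and method here are mine, not the paper's, but a 100-Hz source gives the 10-msec pulse period mentioned in the text:

```python
# Estimate an interaural onset asynchrony by cross-correlating two pulsed
# waveforms, the way glottal pulses could anchor binaural cross-correlation.
import numpy as np

RATE = 20000                      # samples/sec (assumed)
t = np.arange(0, 0.1, 1 / RATE)   # 100 msec of signal
f0 = 100                          # Hz glottal source -> 10-msec pulse period
signal = (np.sin(2 * np.pi * f0 * t) > 0.95).astype(float)  # pulse train

lag_ms = 3                        # true interaural asynchrony, within 5 msec
shift = int(RATE * lag_ms / 1000)
left, right = signal, np.roll(signal, shift)

xcorr = np.correlate(right, left, mode="full")
best = np.argmax(xcorr) - (len(signal) - 1)  # lag of the correlation peak
print(best / RATE * 1000)         # recovered lag: 3.0 ms
```

Because the train repeats every 10 msec, correlation peaks recur at period multiples; only for asynchronies small relative to the pulse period does the true-lag peak dominate, which is one way to picture why localization tolerates asynchronies of a few milliseconds but not more.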

Group 2 has three members: psychoacoustic fusion, spectral fusion, and spectral/temporal fusion. In psychoacoustic fusion, the second-formant transitions of the opposite-ear stimuli appear to be the "unit" of fusion. Each is 70 msec in duration, and onset time differences of 20 to 40 msec are needed before the fused percept disintegrates. Again, the tolerable onset time asynchrony appears to be about "half the fused unit," although for this second-level type of fusion that unit is several times larger. In psychoacoustic fusion there is competition of opposite-ear information. If this information is not meshed temporally in the right fashion, rivalry will occur and backward masking is the typical result (see Massaro, 1972, 1974). In spectral fusion and in spectral/temporal fusion, on the other hand, there is no competing information. In both, the "units" are the second-formant transition properly aligned with the first-formant transition to yield the percepts /b/, /d/, or /g/. Again, transition durations are the same and, allowing for differences in asymptote, disintegration of the percept appears to occur at about the same point. Here, information about place of articulation from one ear (either as part of the second-formant resonance or as an isolated chirp) is combined with the information about manner of production in the first-formant transition. In all three fusions, however, the actual fusion appears to be only incidentally linguistic. Transitions of formants are compared and merged, and subsequent analysis of the fused information reveals it to be linguistically labelable as /b/, /d/, or /g/.

Group 3 consists of phonetic feature fusion and phonological fusion. In both, the "units" are linguistic, but in the first, the units are phonetic features and in the second, they are the phonemes themselves. The increase in fusion at short lead times followed by a decrease at longer lead times separates these fusions from the others. Phonemes and their composite phonetic features can have acoustic manifestations of 20 to 150 msec, depending on the phoneme and on the feature. Thus, it seems to make less sense to talk in terms of "half a unit's duration" as the threshold for the disintegration of the fused percept. Instead, it makes more sense to speak in terms of the time course of processing those features. In an inverted form, the functions in the right-hand panel of Figure 3 look like J-shaped backward masking functions found in vision (Turvey, Note 6). As noted previously, disruption by masking can be thought of as a reciprocal process to fusion. Processing of linguistic features appears to be disrupted most readily after initial processing of the first-arriving item has taken place; earlier or later arrival of the second item decreases the chance of such interference. The disruption process allows for the possible misassignment of the feature values in the case of phonetic feature fusion; in phonological fusion, the disruption allows for the possible combination of the phonemes themselves and, in the case of /ra/-leading-/da/ items, the misassignment of their temporal order.

These three patterns of results, one for each group, suggest at least three types of analysis relevant to speech perception. Each has its own temporal limit within which fusion occurs, and this limit can be thought of as analogous to different types of perceptual "moments" (see Allport, 1968; Efron, 1970). The smallest type of moment lasts up to about 2 to 5 msec, within which time the waveforms of stimuli presented to opposite ears can be meshed (localized). To exceed this limit is to exceed the resolving capacity of the mechanisms involved. An intermediate-sized moment lasts up to perhaps 40 msec and allows the acoustic features of opposing stimuli to merge. Again, to go beyond this range is, generally, to go beyond the system's ability to process (fuse) the discrepancy in stimulus information. A third moment lasts 20 to 80 msec, a time limit which provides maximal opportunity for misassigning certain linguistic feature values of competing inputs.
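The three temporal windows above can be summarized in a short sketch. The numeric boundaries (5, 40, and 80 msec) come from the text, but the function name and the discrete classification are illustrative only, since the text stresses that the underlying processes operate concurrently rather than as a strict decision tree.

```python
# Illustrative only: map an interaural onset asynchrony (in msec) to the
# perceptual "moments" whose temporal windows it still falls within.
# Boundaries are the approximate limits discussed in the text.

def candidate_fusion_levels(asynchrony_ms):
    """Return the fusion levels whose temporal window spans the asynchrony."""
    levels = []
    if asynchrony_ms <= 5:      # waveform integration (sound localization)
        levels.append("Level 1: waveform integration")
    if asynchrony_ms <= 40:     # acoustic-feature integration
        levels.append("Level 2: acoustic-feature integration")
    if asynchrony_ms <= 80:     # linguistic disruption and recombination
        levels.append("Level 3: linguistic feature recombination")
    return levels

print(candidate_fusion_levels(3))   # all three windows still open
print(candidate_fusion_levels(60))  # only the linguistic window remains
```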

It seems likely that these three types of moments reflect processes that occur concurrently and become relevant to the percept according to variations in the stimuli and in the demands placed on the listener in a particular task. It also seems likely that the different sizes of the moments reflect the level at which fusion occurs: The smaller the interval, the lower in the system the fusion occurs and, conversely, the larger the interval, the higher in the system the fusion occurs. This general scheme is an extension of that suggested by Turvey (1973), who found that peripheral processes in visual masking occurred over smaller time domains than central processes. Translating the terms peripheral and central to auditory studies of fusion cannot be straightforward, since there is considerably more pathway interaction between the two ears than between the two eyes before events reach a cortical level. Nevertheless, after substituting the more conservative terms lower level and higher level for peripheral and central, the extended analogy is worth pursuing. For example, Turvey (1973) found that in visual masking, peripheral processing was characterized by stimulus integration, and central processing, by stimulus disruption. Auditory fusions of Group 1 (sound localization) and Group 2 (psychoacoustic, spectral, and spectral/temporal fusion) are integrations of opposite-ear stimuli, whereas those of Group 3 (phonetic feature fusion and phonological fusion) appear to occur because of an interruption of speech perception in midprocess and a subsequent misassignment of features.6

6 Turvey (1973) noted that, in vision, there appear to be two types of integrative processes, one integrating energies and the other integrating features. Level 1 auditory fusions would appear to be of the first variety, and Level 2 fusions, of the second.

Page 16: Auditory and Linguistic Processes in Speech Perception

SIX FUSIONS IN DICHOTIC LISTENING 129

TABLE 3
PERCENTAGES OF "FUSION" RESPONSES AS A FUNCTION OF PRESENTATION MODE WHEN ITEMS HAVE SIMULTANEOUS ONSETS AND SHARE THE SAME FREQUENCY AND INTENSITY

Fusion type and dichotic pair                        Dichotic^a    Binaural^b

Sound localization
  /da/ + /da/                                        100 /da/^c    100 /da/^c
Psychoacoustic fusion
  /ba/ + /ga/                                         68 /da/       81 /da/^d
Spectral fusion
  /da/ Formant 1 + /da/ Formant 2                     85 /da/      100 /da/^d
Spectral/temporal fusion
  /da/ without second-formant transition
  + second-formant transition of /da/                 81 /da/      100 /da/^d
Phonetic feature fusion
  /ba/ + /ta/                                         43 /da/       20 /da/^e
Phonological fusion
  /da/ + /ra/                                         73 /dra/^f    15 /dra/^f,g

^a Dichotic presentation occurs when Stimulus A is presented to one ear and Stimulus B to the other (regardless of whether or not the two items are identical). Data in this column are means found in Experiments 2 through 4.
^b Binaural presentation occurs, for the purposes of the present paper, when stimuli are mixed and both are presented to both ears.
^c These items are physically identical.
^d Determined by pretesting with listeners not participating in the present study.
^e Computed from Halwes (1969, p. 75), who used synthetic speech stimuli very similar to those used in the present studies.
^f The response /dra/ here represents all fusion responses for the stimulus pair /da/ + /ra/, including those in which a different stop consonant or a different liquid was reported. See Day (1968), Cutting & Day (1975), and Cutting (1975) for further details.
^g Cutting (1975, Experiment 2), using synthetic speech stimuli similar to those used here.

An Additional Consideration: Presentation Mode

If lower level fusions are those characterized by perceptual integration, and higher level fusions, by perceptual disruption, presentation mode ought to have a crucial effect on the probability of fused responses and should provide a test of the distinction. Two modes of presentation are of interest: dichotic presentation, the mode used in all experiments in this paper, and a type of binaural presentation. I use the terms dichotic and binaural slightly differently than do Woodworth (1938, p. 526) and Licklider (1951, p. 1026). They view dichotic listening as a special form of binaural listening. Dichotic presentation, as the term is used here, occurs when Stimulus A is presented to one ear and Stimulus B to the other ear, regardless of whether or not the two items are identical. Binaural presentation, on the other hand, is defined here as occurring when Stimuli A and B are combined and both items are presented to both ears. Table 3 summarizes the percentages of fusion responses for the six fusions as a function of presentation mode. The dichotic scores are means from the present set of experiments, whereas the binaural scores stem from logical considerations and from data collected elsewhere.

Fusions of Group 1 (sound localization) and Group 2 (psychoacoustic, spectral, and spectral/temporal fusion) reveal a pattern distinctively different from those of Group 3 (phonetic feature fusion and phonological fusion). For the first four fusions, the number of /da/ responses under binaural conditions is slightly greater than or equal to the number of dichotic fusions. For the other two, however, binaural fusions are considerably less frequent than dichotic fusions. The first four could logically occur at a neural level prior to the cortex, or at least prior to linguistic analysis within the cortex. To a great degree, presentation mode is irrelevant here, and acoustic combination may be similar to neural combination (integration)—a straightforward, primarily additive process. The other two fusions, on the other hand, must occur subsequent to linguistic (and cortical) analysis, since mere mixing of the signals inhibits rather than aids in obtaining the desired percept. This occurs, presumably, because mixing the signals degrades their separate integrities, and the stimuli mask each other effectively before they ever arrive at some central locus for linguistic analysis. Disruption, then, never has a chance to occur because integration has already occurred. In summary, the first four fusions appear to be only incidentally linguistic, whereas the last two are necessarily linguistic.

In addition to providing a framework for the data discussed above, the upper level/lower level scheme, as adapted from Turvey (1973), would predict effects of stimulus energy on fusions at these different levels. Turvey found that stimulus energy affected visual masking at a peripheral level but that it had essentially no effect centrally. Analogously, in auditory fusion, one would predict that stimulus energy (intensity) would have a relatively large effect on lower level fusions and a smaller effect on higher level fusions. Experiment 3 was designed to test these predictions.

EXPERIMENT 3: RELATIVE INTENSITY IN SIX FUSIONS

Method

Three different randomly ordered sequences of dichotic pairs were recorded in a fashion similar to that of Experiment 2, one each for psychoacoustic fusion, phonetic feature fusion, and phonological fusion. No sequences were prepared for the others, since data on these fusions are either irrelevant (sound localization) or readily obtainable elsewhere (spectral fusion and spectral/temporal fusion). Nine relative intensities were used in all three sequences: One stimulus was always the standard 80-db. (SPL) item, and the other decreased in intensity by 0, 5, 10, 15, 20, 25, 30, 35, or 40 db. Sequences for psychoacoustic and phonetic feature fusions consisted of 36 dichotic pairs: (9 relative intensities) X (2 intensity configurations, an 80-db. stimulus on Channel A or on Channel B) X (2 channel assignments). Again, /ba/ and /ga/ were used for psychoacoustic fusion and /ba/ and /ta/ for phonetic feature fusion. The sequence for phonological fusion was exactly twice as long, allowing for the two fusible pairs /ba/-/la/ and /da/-/ra/. Listeners followed the same instructions for each fusion as in Experiment 2.
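The factorial design above can be checked in a few lines. The factor counts come straight from the Method; the variable names and labels are mine, for illustration only.

```python
# Sketch of the Experiment 3 design: enumerate the dichotic pairs for the
# psychoacoustic-fusion sequence and confirm the stated count of 36 trials.
from itertools import product

attenuations_db = [0, 5, 10, 15, 20, 25, 30, 35, 40]       # 9 relative intensities
configurations = ["80-db item on A", "80-db item on B"]     # 2 intensity configurations
channels = ["/ba/ on A, /ga/ on B", "/ga/ on A, /ba/ on B"]  # 2 channel assignments

pairs = list(product(attenuations_db, configurations, channels))
print(len(pairs))      # 36

# The phonological-fusion sequence is exactly twice as long, since it uses
# two fusible pairs (/ba/-/la/ and /da/-/ra/).
print(2 * len(pairs))  # 72
```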

Results and Discussion

Patterns of results for the three types of fusion in question are shown in Figure 4, along with results for the other fusions adapted from other sources. Again, results are discussed in terms of the relative intensity levels at which significant decreases in fusion first occur, using the same criterion as in Experiment 2.

For psychoacoustic fusion, a significant decrease occurred with a drop of only 10 db. in either stimulus, /ba/ or /ga/ (but see Repp, 1975d). The number of /da/ fusions tapered off at a decreased rate thereafter. For both phonetic feature fusion and phonological fusion, on the other hand, significant decreases first occurred at 30 db. In all three types of fusion, there was no significant effect of which stimulus in the fusible pair was the more intense. The data plotted for sound localization are hypothetical, but since the fused percept never disintegrates and since binaural interactions can occur over intensity differences greater than 40 db., a straight line at 100% is drawn. The data for spectral fusion and for spectral/temporal fusion are

FIGURE 4. The effect of relative intensity in six fusions (data from Experiment 3 for Fusions 2, 5, and 6; from Rand (1974) for Fusions 3 and 4; and from logical considerations for Fusion 1).


adapted from Rand (1974), who used stimuli virtually identical to those in the present studies (except his consisted of three formants rather than two).

For a minor replication of the results of Experiment 2, compare certain aspects of Figures 3 and 4. Notice that the first data point of each function in Figure 3 (0-msec lead) represents exactly the same pairs as those in Figure 4 (0 db.). Note further that the probability of fused responses for each fusion at these points is comparable in each study.

Before pursuing the scheme of higher and lower level fusions, it is necessary first to reconsider the distinction between those fusions whose stimuli "compete" with one another for the same processor and those whose stimuli do not. Competition is here defined as a situation conducive to substantive information loss or alteration. Consider the fusions without competition first: sound localization (Group 1), spectral fusion, and spectral/temporal fusion (both from Group 2). There is no information loss or alteration in sound localization because the two inputs are identical, and the percept can change only in its perceived locus, not in its identity. There is no information loss in either spectral fusion or spectral/temporal fusion because the acoustic information of opposite-ear stimuli is simply restructured back into the original form from component parts in a straightforward manner.

In the other three fusions, however, there is competition. In psychoacoustic fusion (the only other member of Group 2), the second-formant transitions are in the same general space-time region, and as demonstrated in Experiment 1, they appear to contribute best to the fused percept when they are closest together, which apparently enables them to be perceived as a single-formant transition. Information is lost, in that /ba/ and /ga/ are no longer heard, and information is altered, because both items contribute to the percept /da/. In phonetic feature fusion (Group 3), the stimuli compete for the same limited-channel-capacity linguistic processor in the left hemisphere (see Studdert-Kennedy et al., 1970, 1972). Of two values of the voicing

feature and two values of a place-of-articulation feature, only one value of each can typically be processed, while the other is often lost. If they did not originally belong to the same stimulus, a fusion has occurred. Finally, in phonological fusion (Group 3), the stimuli may be processed in opposite hemispheres (Cutting, 1973) with no information loss, but there is considerable information alteration, since the two inputs are combined in the most phonologically reasonable fashion to form a single phoneme string. A more detailed comparison of mechanisms thought to underlie the six fusions is given in the concluding discussion.

Setting aside those fusions in which there is no competition, and hence, essentially no effect of intensity with attenuations of as much as 40 db., one finds that the results of the other three fusions support the higher level/lower level distinction discussed earlier. Psychoacoustic fusion, a lower level process characterized by perceptual integration, was quite sensitive to relatively smaller attenuations of intensity. On the other hand, phonetic feature fusion and phonological fusion, higher level processes characterized by perceptual disruption, were relatively insensitive to intensity variation.

The results of Experiments 2 and 3 and the additional consideration of presentation mode provide clear evidence for the distinction between auditory processes of Group 2 (psychoacoustic, spectral, and spectral/temporal fusion) and linguistic processes of Group 3 (phonetic feature fusion and phonological fusion). This distinction is similar to that made by Studdert-Kennedy et al. (1971), Pisoni (1973), and Wood (1975), among many others, using very different paradigms. However, evidence thus far for the distinction between Group 1 (sound localization) and Group 2 is less impressive (seen only in the left and center panels of Figure 3). Experiments 4 and 5 are directed at supporting this distinction.

EXPERIMENT 4: RELATIVE FUNDAMENTAL FREQUENCY IN SIX FUSIONS

Method

Six different randomly ordered sequences of dichotic pairs were recorded, one for each type of fusion. Four fundamentals were selected: the standard frequency of 100 Hz, and three others—102, 120, and 180 Hz. Pairs always consisted of one stimulus at 100 Hz and another stimulus at any of the four possible fundamental frequencies, yielding relative frequency differences of 0, 2, 20, and 80 Hz. For the sound localization sequence, /da/ was recorded on both channels in a 16-pair sequence: (4 relative fundamentals) X (2 frequency configurations, the 100-Hz item on Channel A or on Channel B) X (2 observations per pair). For the psychoacoustic fusion sequence, /ba/-/ga/ pairs were recorded in a 32-pair sequence: (4 relative frequencies) X (2 frequency configurations) X (2 channel assignments) X (2 observations per pair).

For both spectral fusion and spectral/temporal fusion, 24-item sequences were recorded. In spectral fusion, the first formant was always held constant at 100 Hz, and in spectral/temporal fusion, the first formant and the steady-state segment of the second formant were also always at 100 Hz: Frequency variation always took place in the second formant or second-formant transition. The dichotic pairs could yield /ba/, /da/, or /ga/ responses. The 24-item sequence consisted of (3 stimulus pairs, those for /ba/, /da/, and /ga/) X (4 relative frequencies) X (2 frequency configurations). Channel assignments were randomized across pairs.

For phonetic feature fusion, /ba/-/ta/ pairs were recorded in a 32-pair sequence following the same format as the psychoacoustic fusion sequence; for phonological fusion, a similar 32-pair sequence was recorded for /ba/-/la/ and /da/-/ra/ pairs. Again, listeners followed the same instructions for each fusion as in Experiment 2.

Results and Preliminary Discussion

As shown in Figure 5, frequency differences affected only one type of fusion: sound localization. Fusions, the number of one-item responses, plummeted from nearly 100% for identically pitched pairs to nearly 0% for pairs with only 2-Hz differences between members. Frequency differences had no significant effect on any of the other five types of fusion. Note again that the fusion probabilities for pairs with 0-Hz differences are similar to the standard pairs in Experiments 2 and 3.

The results suggest a clear distinction between the fusions of Groups 2 and 3 and the sound localizations of Group 1. A problem arises, however, when one considers that these groups correlate perfectly with the type of response required of the listener. In sound localization, the listener reports whether he or she heard one item or two; in all other fusions the listener identifies the item heard. It may be that when frequency varies in these other fusions, the listener can easily report whether one or two items were actually presented but, since a linguistic response is required, may ignore the cues of numerosity. Experiment 5 investigates this possibility.

EXPERIMENT 5: HOW MANY ITEMS ARE HEARD?

It is clear that sound localization is a very different kind of fusion than the other five, both phenomenologically and in terms of the results of Experiments 2 and 4. In sound localization, the items presented to opposite ears are either integrated into a single percept or they are not, and the identity of the inputs

FIGURE 5. The effect of relative fundamental frequency in six fusions (data from Experiment 4).


matters not at all. In all the other fusions, by contrast, it may be possible for the listener to be aware that more than one item is presented on a given trial but find that the fusion response is the best label for what he hears. Differences of only 2 Hz convince the listener of the presence of two different items in sound localization. Is this true of other fusions as well? In other words, is it necessary that the items fuse into a single acoustic percept for them to be labeled and judged as a single linguistic percept?

Method

One pair of stimuli was chosen to represent each of the six fusions. These pairs are shown schematically in Figure 1. Pairs were either of the standard form (both items at 100 Hz), or they differed by 2 Hz (one item at 100 Hz and the other at 102 Hz). The standard (80 db.) intensities were used. A sequence of 48 simultaneous-onset pairs was recorded: (6 pairs, one for each fusion) X (2 relative frequencies, 0- or 2-Hz difference) X (2 channel arrangements) X (2 observations per pair). Twenty listeners, 10 from the previous experiments and 10 others selected according to the same criteria, wrote down 1 or 2, indicating the number of items that they heard on each trial. No practice was given.

Results and Discussion

The results for all six types of fusion are shown in Table 4. For four types of fusions the number of one-item responses dropped significantly when fundamental frequency varied: sound localization (90% decrease), psychoacoustic fusion (68%), spectral fusion (58%), and phonological fusion (41%). In the other two types of fusion the decreases were considerably smaller: only 3% for spectral/temporal fusion and 14% for phonetic feature fusion.

TABLE 4
PERCENTAGES OF ONE-ITEM RESPONSES GIVEN TO DICHOTIC PAIRS IN EXPERIMENT 5

                                                 Stimulus condition
                                          Both items    One item at 100 Hz,
Fusion type and dichotic pair             at 100 Hz     other at 102 Hz

Sound localization
  /da/ + /da/                                 99               9^a
Psychoacoustic fusion
  /ba/ + /ga/                                 78              10^a
Spectral fusion
  /da/ Formant 1 + /da/ Formant 2             60               2^a
Spectral/temporal fusion
  /da/ without second-formant transition
  + second-formant transition of /da/         15              12
Phonetic feature fusion
  /ba/ + /ta/                                 22^b             8
Phonological fusion
  /da/ + /ra/                                 46               5^a

^a Across the two stimulus conditions, differences are significant, p < .01, by Wilcoxon matched-pairs signed-ranks test.
^b Halwes (1969), using similar stimuli, found that listeners reported hearing only one sound when both items shared the same pitch. The difference between his results and those of the present study may be attributable to a contrast effect here induced by mixed presentation of fusible pairs. Halwes blocked pairs of same and different pitches.

Figure 5 and Table 4 demonstrate conclusively that sound localization is different from the other fusions. In the other five fusions the number of items perceived plays no role in the linguistic identity of the fused percepts: In the present experiment, standard pairs may be perceived (a) as a single item most of the time (as in psychoacoustic fusion); (b) as a single item about half of the time (as in spectral fusion and phonological fusion); or (c) nearly always as two items (as in spectral/temporal fusion or phonetic feature fusion). The 2-Hz difference (nonstandard) pairs fuse as readily as the standard pairs, yet all nonstandard pairs are perceived as two-item presentations.

In view of the initial formulation of six different fusions in dichotic listening, the most important result here is that spectral fusion and spectral/temporal fusion differ significantly in the number of items perceived when the to-be-fused stimuli have the same pitch, T(15) = 0, p < .01. Respective one-item response frequencies were 60% versus 15%. A review of Figures 3, 4, and 5 shows no impressive difference between the two fusions: Both are moderately insensitive to relative onset time differences and very insensitive to relative intensity and relative frequency differences. In Table 4, however, one finds support for their separation in the evidence that the fusions are phenomenologically different for the listener.

GENERAL DISCUSSION: SIX FUSIONS IN SPEECH PERCEPTION

Overview

There are four primary results to be emphasized from the five experiments presented here. First, Experiment 1 was successful in demonstrating that psychoacoustic fusion is the result of perceptual averaging of optimally


TABLE 5
UPPER LIMITS OF INTERSTIMULUS DISCREPANCIES PERMITTING CONSISTENT AND FREQUENT FUSIONS OF /da/ FOR SIX TYPES OF FUSION (UPDATING OF TABLE 1)

                              Onset time    Intensity    Frequency
Fusion type                     (msec)        (db.)         (Hz)

Sound localization               < 5            ^a           < 2
Psychoacoustic fusion           < 40          < 10           > 80
Spectral fusion                 < 40          > 40^b         > 80
Spectral/temporal fusion        < 40          > 40^b         > 80
Phonetic feature fusion         < 80            25           > 80
Phonological fusion             > 80            25           > 80

^a Intensity differences are not relevant to sound localization as discussed here, since the fused percept never disintegrates with such variation.
^b Rand (1974), using stimuli very similar to those used in the present study.

similar acoustic features of opposite-ear stimuli. Second, Experiments 2-4 replicated the general findings and estimates reported in Table 1, and filled in the empty cells for psychoacoustic fusion and spectral/temporal fusion. These data, shown in revised form in Table 5, are all based on the syllable /da/. Third, the results of Experiments 2-5 provided patterns of results that were used to differentiate the six fusions and to arrange them into three groups according to a levels-of-processing analysis. Those levels are discussed below. Fourth, the results of Experiments 4 and 5 suggested that the perception of a single event is not necessary for the assignment of a single linguistic response in five of the fusions (excluding sound localization).

The results of these experiments and the supporting evidence cited throughout the paper preclude the possibility that the six fusions can be accounted for in terms of a single general mechanism: The large variation in sensitivities to relative onset time differences and to intensity differences, and the effects of pitch on number versus identity of the fused percepts prevent any suggestions of a simple fusion system. Instead, at least three, perhaps even four, perceptual levels are needed, one for each different kind of perceptual combination.

Level 1: Fusion by Integration of Waveforms

Sound localization is the only fusion to occur at this first and lowest level, and the mechanism involved is one that cross-correlates the waveforms of opposite-ear stimuli.

Three kinds of evidence separate this fusion from the other five. First, as shown in Experiment 2, no other fusion is as sensitive to differences in relative onset time. Onset differences of 4 and 5 msec are sufficient to inhibit fusion, nearly an order of magnitude less than those intervals necessary for other fusions to disintegrate. Second, extreme sensitivity to relative frequency differences in two speech sounds is very clear from Experiment 4, whereas frequency differences do not affect other fusions. Third, on logical grounds alone, sound localization is unique because it is the only fusion based on the number of percepts rather than on their identity. See Deatherage (1966) for an account of the physiology of the binaural system and Sayers and Cherry (1957) for an indication of interactions involved in sound localization.
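The cross-correlation mechanism can be illustrated with a toy computation. The signal, sample rate, and numpy-based estimator below are mine, not the paper's model; they simply show how a small interaural delay falls out of the peak of the cross-correlation function.

```python
# Toy sketch of interaural cross-correlation: estimate the delay between
# "left-ear" and "right-ear" copies of the same waveform, as a localization
# mechanism of this kind would.  All parameter values are illustrative.
import numpy as np

fs = 20000                       # samples per second
t = np.arange(0, 0.05, 1 / fs)   # 50 msec of signal
left = np.sin(2 * np.pi * 100 * t) * np.exp(-40 * t)  # decaying 100-Hz tone

delay_samples = 40               # a 2-msec interaural delay
right = np.concatenate([np.zeros(delay_samples), left[:-delay_samples]])

# The lag at which the cross-correlation peaks recovers the delay.
corr = np.correlate(right, left, mode="full")
lag = np.argmax(corr) - (len(left) - 1)   # peak lag, in samples
print(lag / fs * 1000)           # estimated delay in msec
```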

Level 2: Fusion by Integration of Acoustic Features

Three fusions appear to occur at this second level: psychoacoustic fusion, spectral fusion, and spectral/temporal fusion. Evidence for their common allocation comes from five sources. First, each is moderately sensitive to differences in relative onset time, yielding markedly similar patterns, especially when corrections are made for differential floor effects. In Experiment 2, each of these fusions withstood temporal differences of between 20 and 40 msec without marked disintegration of the fused percept. Second, and also stemming from Experiment 2, the shape of the functions of these three fusions as onset interval increases is quite different from that of the functions of phonetic feature fusion and phonological fusion, a fact that suggests entirely different processes are occurring. Third, none of these three fusions is dependent on prior perception of the dichotic inputs as a single event. Experiments 4 and 5, taken together, demonstrate that frequency differences have no effect on the probability of fusion responses, but that for psychoacoustic fusion and spectral fusion, frequency differences do significantly affect the numbers of items perceived to occur on any given trial. Fourth, although this third stipulation is also true for phonetic feature fusion and phonological fusion, these three fusions differ from the two remaining fusions with respect to the importance of presentation mode. As shown in Table 3, there are as many or more "fusions" for these three phenomena when the items are presented binaurally than when they are presented dichotically. (Phonetic feature fusion and phonological fusion, on the other hand, show the reverse trend.) These data and those for similar phenomena reported elsewhere (Pisoni & McNabb, 1974; Repp, 1975a) suggest that this second level must be prephonetic. Fifth, relative stimulus intensity plays an important role in the only one of these fusions that occurs through dichotic competition—psychoacoustic fusion. Sensitivity to relative energy levels is indicative of lower level processing in audition.

Although they occur at the same perceptual level, these three fusions are separate phenomena. Psychoacoustic fusion, a general phenomenon that is most dramatically represented by the fusion of /ba/ and /ga/ into /da/, occurs through the perceptual averaging of similar but slightly discrepant information presented to opposite ears. The same averaging could be accomplished using synthetic steady-state vowels with slightly different vowel color: A vowel of midcolor between the two inputs is easily perceived. No such averaging occurs in spectral or spectral/temporal fusion, since there is no discrepant information competing between the two ears.6
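This averaging can be put in rough numeric terms. The second-formant onset frequencies below are schematic round numbers I supply for illustration, not the synthesis parameters of the actual stimuli; they show only how averaging a rising /ba/-like transition with a falling /ga/-like transition yields an intermediate, /da/-like onset.

```python
# Schematic sketch of psychoacoustic fusion as averaging of second-formant
# (F2) transitions.  All frequency values are illustrative, not measured.

F2_STEADY = 1200          # Hz, schematic steady-state F2 of the vowel /a/
f2_onset = {"ba": 1000,   # rising transition: onset below the steady state
            "ga": 1500}   # falling transition: onset above the steady state

# Averaging the two opposite-ear onsets yields an intermediate onset,
# close to the nearly level transition appropriate for /da/.
fused_onset = (f2_onset["ba"] + f2_onset["ga"]) / 2
print(fused_onset)        # intermediate, /da/-like onset
```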

Spectral fusion and spectral/temporal fusion differ strikingly in perceptual "appearance" to the listener. When fundamental frequencies of the to-be-fused stimuli are the same, listeners generally report hearing only one item in spectral fusion but two items in spectral/temporal fusion. This remarkable fact is sufficient evidence to consider them separate phenomena and necessitates further consideration of spectral/temporal fusion.

In spectral/temporal fusion, the second-formant transition is "heard" in two forms: as part of a speech syllable giving it its identity as /ba/, /da/, or /ga/, and as a brief glissando, or chirp. How does this dual perception come about? In part, one must appeal to dual perceptual systems of speech and nonspeech (Day & Bartlett, 1972; Day, Bartlett, & Cutting, 1973; Mattingly et al., 1971; Day & Cutting, Note 7). The brief chirp appears to be fused (integrated) with the transitionless stimuli at Level 2 and then identified by a "speech processor." The information in this chirp also appears to remain in a relatively raw acoustic form—a brief acoustic blip against a background of a continuous periodic speech sound. In spectral fusion, by contrast, there is no brief signal to establish this figure-ground relationship. Logically, however, there are three possible percepts in the spectral/temporal fusion of /da/, as shown in Figure 1: the speech sound /da/ and the chirp, the two sounds that are actually heard, and the transitionless /da/, which is not heard. Why not? The following account is somewhat complex, but appears to explain the phenomenon and an additional anomaly.

The transitionless-/da/ stimulus is about 85% identifiable as /ba/. It is identified as /ba/, presumably, as noted earlier, because the harmonics of the first-formant transition mimic the absent second-formant transition and mimic it in such a fashion as to be appropriate for /b/. Thus at some level, this /ba/ competes with a redintegrated /da/. The redintegrated /da/, like the original stimulus, is a highly identifiable, hypernormal item (Mattingly, 1972). It competes against the considerably weaker /ba/, readily identifiable but without a prominent transitional cue typical of synthetic speech syllables. The /da/ easily "wins" in this mismatch and is perceived instead of /ba/. An anomaly that supports this account stems from the fact that fewer /ga/ spectral/temporal fusions occur than /da/ fusions, given the appropriate stimuli (Nye, Nearey, & Rand, Note 3; see also Experiment 2, present paper). Here the transitionless /ga/ (again perceived as /ba/) competes with the redintegrated /ga/. The result is that fewer /ga/ percepts arise and a number of /da/ responses are reported instead. This /da/-for-/ga/ substitution may arise from the psychoacoustic fusion of the transitionless /ga/ (perceived as /ba/) and the redintegrated /ga/.

6 The results of Perrott and Barry (1969), which demonstrate fusion of sine waves whose frequencies differ beyond the range of binaural beats, might be thought of as a form of psychoacoustic fusion, but since their task required detection of one versus two signals, and not the "identity" of the signals, their phenomenon appears to be one more closely related to sound localization.

If psychoacoustic, spectral, and spectral/temporal fusion occur at the same level of processing in the auditory system, it should be possible, as suggested above, for them to interact directly to create composite fusions. Indeed, this appears to be possible. Consider two composite fusions, the first a combination of psychoacoustic fusion and spectral fusion and the second a combination of psychoacoustic fusion and spectral/temporal fusion. In the first case, if the syllable /ba/ is presented to one ear and the second formant of /ga/ to the other, the listener can easily hear /da/, probably with approximately the same probability as he or she heard it when the stimuli were /ba/ and /ga/. Berlin et al. (1973) and Porter (1975) have performed experiments similar to this, and from their results the fusion in this situation seems likely. In the second case, the syllable /ba/ might be presented to one ear and the /ga/ second-formant transition to the other. In this situation the listener may report hearing /da/ plus chirp. Pilot research supports the likelihood of these two composite fusions.

Level 3: Fusion by Disruption and Recombination of Linguistic Features

Phonetic feature fusion and phonological fusion can be separated from the other fusions by three findings. First, the functions revealed when relative onset time is varied are unique: For both phenomena, fusions increase slightly with small onset-time asynchronies, only to decrease after onset asynchronies of greater than about 40 to 80 msec. This nonlinearity suggests a multistage process similar, perhaps, to that proposed by Turvey (1973). Fusion occurs best, it would seem, only after the first-arriving stimulus has been partially processed and features extracted from it, but it is then immediately disrupted by the arrival of the second item. Thus, these fusions are perceptual confusions resulting from the misassignment of linguistic features.

Second, these fusions are only moderately sensitive to intensity differences between the two stimuli and provide sharp contrast to psychoacoustic fusion, the only other of these six phenomena that occurs through perceptual competition of similar information presented to both ears. Insensitivity to relative energies is characteristic of higher level processes (see Turvey, 1973, for the visual parallel).

Third, as cited previously in this discussion, phonetic feature fusion and phonological fusion are the only fusions to suffer when presentation mode is changed from dichotic to binaural. This occurs, presumably, because the mixing of the signals degrades their intelligibility, and the stimuli mask one another through integration before disruption can ever take place.

These are the two fusions in which the actual combinatory process is necessarily linguistic. With the possible exception of spectral/temporal fusion, the responses of the listeners for other fusion phenomena are only incidentally linguistic. Although the results presented here do not distinguish phonetic feature fusion from phonological fusion, several other results and logical considerations may warrant separate linguistic levels for the two.

Levels 3 and 4? Empirical findings and several logical considerations distinguish the two language-based fusions. In Figure 3, it appears that phonological fusion is only slightly more tolerant of lead-time differences than is phonetic feature fusion. This apparent similarity may be misleading. Day (1970), Day and Cutting (Note 5), and Cutting (1975) have shown that when longer and more complex speech stimuli are used, such as one- and two-syllable words, tolerance to onset-time asynchronies can increase from about 80 msec to at least 150 msec, considerably beyond any consistent effects of phonetic feature fusion. Since higher level fusions typically allow for greater tolerance to relative onset differences, phonetic feature fusion might be thought to occur at Level 3 and phonological fusion at a new level, Level 4.

A second finding that might support the separation of the two linguistic fusions is that several previous studies of phonological fusion (Cutting, 1975; Cutting & Day, 1975) have found that pairs analogous to /da/-leading-/ra/ fuse more readily than pairs such as /ra/-leading-/da/. Such asymmetry does not occur for phonetic feature fusion.

In addition, logical considerations support the separation of the two phenomena. Phonological constraints, those which dictate the logic of contiguity for phonemes in a given language, are higher level language constraints than are phonetic feature analyses (see, among others, Studdert-Kennedy, 1974, in press; Kirstein, Note 8). For example, whereas phonetic features are largely universal across all languages, phonologies are language specific. In English, liquids (/l/ and /r/) cannot precede stop consonants (/b/, /g/, /p/, and /k/, for example) in initial position, but stops readily come before liquids. This constraint appears to account for the fact that given a dichotic pair such as BANKET/LANKET the listener rarely reports hearing LBANKET. Instead, he or she often hears BLANKET.
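The onset constraint just described can be sketched as a toy rule. The cluster sets and the function below are illustrative assumptions for exposition, not part of the original study, and they deliberately ignore exceptions in real English phonology:

```python
# Toy sketch (not from the paper): a simplified version of the English
# phonotactic rule discussed above -- stop + liquid clusters are legal
# word-initially, liquid + stop clusters are not. Real English phonology
# has further restrictions (e.g., /dl/ and /tl/ are also disallowed).
STOPS = set("bgpkdt")
LIQUIDS = set("lr")

def permissible_onset(cluster: str) -> bool:
    """Return True if a two-consonant cluster is a legal initial
    cluster under the simplified stop-before-liquid rule."""
    if len(cluster) != 2:
        return False
    first, second = cluster
    return first in STOPS and second in LIQUIDS

# The dichotic pair BANKET/LANKET fuses as BLANKET, never LBANKET:
assert permissible_onset("bl")       # /bl/ as in BLANKET
assert not permissible_onset("lb")   # */lb/ -- never reported
```

Under such a rule, the fused percept is constrained to the one ordering of the two dichotic consonants that yields a permissible cluster.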

Another consideration is also important and may separate phonological fusion from all other fusion phenomena. Day (Note 9, Note 10), Cutting and Day (1975), and to a lesser extent Cutting (1975) found that there are marked individual differences in the frequency of phonological fusions across different subjects. Some individuals fuse very frequently, others fuse relatively less often, and few individuals fuse at rates in between these two modes. Moreover, these differences correlate with those found on other tasks with the same stimuli and on tasks involving very different stimuli (Day, 1974; Day, Note 9). Preliminary results suggest that such radical and systematic differences may not occur elsewhere in the six fusions.

At least one additional consideration, however, supports the notion that the two linguistic fusions do indeed occur at the same level. Hofmann (Note 11) and Menyuk (1972) have suggested that clusters of phonemes, including stop-liquid clusters, may be parsimoniously described as single, underlying phonemes with their own unique phonetic features (see also Devine, 1971). The results

of Cutting and Day (1975, Experiment 4) appear to support this conclusion, in that certain aspects of the dichotic presentation may mimic certain phonetic feature values and contribute substantially to the perception of a stop-/l/ cluster. Thus, "blending" of phonetic features might account for both phonetic feature fusion and phonological fusion. In summary, then, data and logical considerations may suggest a separation of the two fusions, but the separation cannot yet be affirmed.

SUMMARY AND CONCLUSION

Fusion is not a single phenomenon in speech perception, but many phenomena. Six dichotic fusions were considered, and five of them are distinctive in that the fused percept differs from either of the two inputs. The robustness of these phenomena was measured against variation in three parameters: relative onset time of the two stimuli, relative intensity of the stimuli, and their relative fundamental frequency. Results, gathered here using the same subjects and essentially the same stimulus repertory for each fusion, agree with previously published accounts or, where there are no prior data, fit nicely into the scheme of upper and lower level fusions that is developed in this paper. The various fusions cannot occur at a single perceptual level: At least three, perhaps four, levels are needed.

Fusion on the one hand and rivalry and masking on the other allow reciprocal glances at the same phenomena. The levels of perceptual processing developed in the present study for audition are quite similar to those developed elsewhere in audition (Studdert-Kennedy, 1974; Wood, 1974, 1975) and also to those developed in vision (Turvey, 1973). With the exception of Julesz (1971), in both modalities, most research concerned with stages of processing has used rivalry-masking paradigms. The findings in the present paper suggest that fusion paradigms can also be used to probe the speech-processing system.

REFERENCE NOTES

1. Cutting, J. E. A preliminary report on six fusions in auditory research (Status Report on Speech Research, SR-31/32). New Haven, Conn.: Haskins Laboratories, 1972.


2. Haggard, M. Asymmetrical analysis of stimuli with dichotically split formant information (Speech Perception, 2). Belfast, Northern Ireland: Queen's University, Department of Psychology, 1975.

3. Nye, P. W., Nearey, T. M., & Rand, T. C. Dichotic release from masking: Further results from studies with synthetic speech stimuli (Status Report on Speech Research, SR-37/38). New Haven, Conn.: Haskins Laboratories, 1974.

4. Nearey, T. M., & Levitt, A. C. Evidence for spectral fusion in dichotic release from upward spread of masking (Status Report on Speech Research, SR-39/40). New Haven, Conn.: Haskins Laboratories, 1974.

5. Day, R. S., & Cutting, J. E. Levels of processing in speech perception. Paper presented at the 10th meeting of the Psychonomic Society, San Antonio, Texas, November 1970.

6. Turvey, M. T. Personal communication, January 1974.

7. Day, R. S., & Cutting, J. E. What constitutes perceptual competition in dichotic listening? Paper presented at the meeting of the Eastern Psychological Association, New York, April 1971.

8. Kirstein, E. F. The lag effect in dichotic speech perception (Status Report on Speech Research, SR-35/36). New Haven, Conn.: Haskins Laboratories, 1973.

9. Day, R. S. Individual differences in cognition. Paper presented at the 13th meeting of the Psychonomic Society, St. Louis, November 1973.

10. Day, R. S. Temporal-order judgments in speech: Are individuals language-bound or stimulus-bound? (Status Report on Speech Research, SR-21/22). New Haven, Conn.: Haskins Laboratories, 1970.

11. Hofmann, T. R. Initial clusters in English (Quarterly Progress Report No. 84). Cambridge: Massachusetts Institute of Technology, Research Laboratory of Electronics, 1967.

REFERENCES

Ades, A. E. Bilateral component in speech perception? Journal of the Acoustical Society of America, 1974, 56, 610-616.

Allport, D. A. Phenomenal simultaneity and the perceptual moment hypothesis. British Journal of Psychology, 1968, 59, 395-406.

Angell, J. R., & Fite, W. The monaural localization of sound. Psychological Review, 1901, 8, 225-246.

Babkoff, H. Dichotic temporal interaction: Fusion and temporal order. Perception & Psychophysics, 1975, 18, 267-272.

Berlin, C. I., et al. Dichotic signs of the recognition of speech elements in normals, temporal lobectomees, and hemispherectomees. IEEE Transactions on Audio and Electroacoustics, 1973, AU-21, 189-195.

Bever, T. G., & Chiarello, R. J. Cerebral dominance in musicians and nonmusicians. Science, 1974, 185, 537-539.

Bilsen, F. A., & Goldstein, J. L. Pitch of dichotically delayed noise and its possible spectral basis. Journal of the Acoustical Society of America, 1974, 55, 292-296.

Blechner, M. J., Day, R. S., & Cutting, J. E. Processing two dimensions of nonspeech stimuli: The auditory-phonetic distinction reconsidered. Journal of Experimental Psychology: Human Perception and Performance, in press.

Broadbent, D. E. A note on binaural fusion. Quarterly Journal of Experimental Psychology, 1955, 7, 46-47.

Broadbent, D. E., & Ladefoged, P. On the fusion of sounds reaching different sense organs. Journal of the Acoustical Society of America, 1957, 29, 708-710.

Cherry, E. C., & Taylor, W. K. Some further experiments upon the recognition of speech with one ear and with two ears. Journal of the Acoustical Society of America, 1954, 26, 554-559.

Cooper, F. S., & Mattingly, I. G. Computer-controlled PCM system for investigation of dichotic speech perception. Journal of the Acoustical Society of America, 1969, 46, p. 115. (Abstract)

Cramer, E. M., & Huggins, W. H. Creation of pitch through binaural interaction. Journal of the Acoustical Society of America, 1958, 30, 413-417.

Cullen, J. K., Thompson, C. L., Hughes, L. F., Berlin, C. I., & Samson, D. S. The effects of varied acoustic parameters on performance in dichotic speech perception tasks. Brain and Language, 1974, 1, 307-322.

Cutting, J. E. Levels of processing in phonological fusion (Doctoral dissertation, Yale University, 1973). Dissertation Abstracts International, 1973, 34, p. 2332B. (University Microfilms No. 73-25, 191)

Cutting, J. E. Two left-hemisphere mechanisms in speech perception. Perception & Psychophysics, 1974, 16, 601-612.

Cutting, J. E. Aspects of phonological fusion. Journal of Experimental Psychology: Human Perception and Performance, 1975, 1, 105-120.

Cutting, J. E. The magical number two and the natural categories of speech and music. In N. S. Sutherland (Ed.), Tutorial essays in psychology. Hillsdale, N.J.: Erlbaum, in press.

Cutting, J. E., & Day, R. S. The perception of stop-liquid clusters in phonological fusion. Journal of Phonetics, 1975, 3, 99-113.

Cutting, J. E., & Rosner, B. S. Categories and boundaries in speech and music. Perception & Psychophysics, 1974, 16, 564-570.

Cutting, J. E., Rosner, B. S., & Foard, C. F. Perceptual categories for musiclike sounds: Implications for theories of speech perception. Quarterly Journal of Experimental Psychology, in press.

Darwin, C. J. Dichotic backward masking of complex sounds. Quarterly Journal of Experimental Psychology, 1971, 23, 386-392.

Day, R. S. Fusion in dichotic listening (Doctoral dissertation, Stanford University, 1968). Dissertation Abstracts International, 1969, 29, p. 2649B. (University Microfilms No. 69-211)

Day, R. S. Temporal-order perception of a reversible phoneme cluster. Journal of the Acoustical Society of America, 1970, 48, p. 95. (Abstract)

Day, R. S. Differences in language-bound and stimulus-bound subjects in solving word-search puzzles. Journal of the Acoustical Society of America, 1974, 55, p. 412. (Abstract)

Day, R. S., & Bartlett, J. C. Separate speech and nonspeech processing in dichotic listening? Journal of the Acoustical Society of America, 1972, 51, p. 79. (Abstract)

Day, R. S., Bartlett, J. C., & Cutting, J. E. Memory for dichotic pairs: Disruption of ear performance by the speech/nonspeech distinction. Journal of the Acoustical Society of America, 1973, 53, p. 358. (Abstract)

Deatherage, B. An examination of binaural interaction. Journal of the Acoustical Society of America, 1966, 39, 232-249.

Delattre, P. C., Liberman, A. M., & Cooper, F. S. Acoustic loci and transitional cues for consonants. Journal of the Acoustical Society of America, 1955, 27, 769-773.

Deutsch, D. Two-channel listening to musical scales. Journal of the Acoustical Society of America, 1975, 57, 1156-1160. (a)

Deutsch, D. Musical illusions. Scientific American, 1975, 233(4), 92-105. (b)

Deutsch, D., & Roll, P. L. Separate "what" and "where" decision mechanisms in processing a dichotic tonal sequence. Journal of Experimental Psychology: Human Perception and Performance, 1976, 2, 23-29.

Devine, A. M. Phoneme or cluster: A critical review. Phonetica, 1971, 24, 65-85.

Efron, R. The relationship between the duration of a stimulus and the duration of a perception. Neuropsychologia, 1970, 8, 37-55.

Fourcin, A. J. An aspect of the perception of pitch. In A. Sovijarvi & P. Aalto (Eds.), Proceedings of the Fourth International Congress of Phonetic Sciences. The Hague, Netherlands: Mouton, 1962.

Fry, D. B. Perception and recognition in speech. In M. Halle, H. G. Lunt, & C. H. Schoonveld (Eds.), For Roman Jakobson. The Hague, Netherlands: Mouton, 1956.

Groen, J. J. Super- and subliminal binaural beats. Acta Oto-laryngologica, 1964, 57, 224-230.

Guttman, N., & Julesz, B. Lower limits of auditory periodicity analysis. Journal of the Acoustical Society of America, 1963, 35, p. 610.

Halwes, T. G. Effects of dichotic fusion on the perception of speech (Doctoral dissertation, University of Minnesota, 1969). Dissertation Abstracts International, 1970, 31, p. 1565B. (University Microfilms No. 70-15, 736)

Houtsma, A. J. M., & Goldstein, J. L. The central origin of the pitch of complex tones: Evidence from musical interval recognition. Journal of the Acoustical Society of America, 1972, 51, 520-529.

Huggins, A. W. F. Distortion of the temporal pattern of speech: Interruption and alteration. Journal of the Acoustical Society of America, 1964, 36, 1055-1064.

Huggins, A. W. F. On the perceptual integration of dichotically alternating pulse trains. Journal of the Acoustical Society of America, 1974, 56, 939-943.

Huggins, A. W. F. Temporally segmented speech. Perception & Psychophysics, 1975, 18, 149-157.

Jeffress, L. A. Binaural signal detection: Vector theory. In J. V. Tobias (Ed.), Foundations of modern auditory theory (Vol. 2). New York: Academic Press, 1972.

Julesz, B. Foundations of cyclopean perception. Chicago: University of Chicago Press, 1971.

Kaufman, L. On the spread of suppression and binocular rivalry. Vision Research, 1963, 3, 401-415.

Kaufman, L. Sight and mind. New York: Oxford University Press, 1974.

Kubovy, M., Cutting, J. E., & McGuire, R. M. Hearing with the third ear: Dichotic perception of a melody without monaural familiarity cues. Science, 1974, 186, 272-274.

Leakey, D. M., Sayers, B. M., & Cherry, E. C. Binaural fusion of low- and high-frequency sounds. Journal of the Acoustical Society of America, 1958, 30, 22.

Liberman, A. M., Cooper, F. S., Shankweiler, D. P., & Studdert-Kennedy, M. Perception of the speech code. Psychological Review, 1967, 74, 431-461.

Licklider, J. C. R. Basic correlates of the auditory stimulus. In S. S. Stevens (Ed.), Handbook of experimental psychology. New York: Wiley, 1951.

Licklider, J. C. R., Webster, J. C., & Hedlun, J. M. On the frequency limits of binaural beats. Journal of the Acoustical Society of America, 1950, 22, 468-473.

Linden, A. Distorted speech and binaural speech resynthesis tests. Acta Oto-laryngologica, 1964, 58, 32-48.

Lisker, L., & Abramson, A. S. A cross-language study of voicing in initial stops: Acoustical measurements. Word, 1964, 20, 384-422.

Locke, S., & Kellar, L. Categorical perception in a nonlinguistic mode. Cortex, 1973, 9, 355-369.

Marslen-Wilson, W. D. Sentence perception as an interactive parallel process. Science, 1975, 189, 226-228.

Massaro, D. W. Preperceptual images, processing time, and perceptual units in auditory perception. Psychological Review, 1972, 79, 124-145.

Massaro, D. W. Perceptual units in speech recognition. Journal of Experimental Psychology, 1974, 102, 199-208.

Mattingly, I. G. Speech cues and sign stimuli. American Scientist, 1972, 60, 327-337.

Mattingly, I. G., Liberman, A. M., Syrdal, A., & Halwes, T. Discrimination in speech and nonspeech modes. Cognitive Psychology, 1971, 2, 131-157.

Matzker, J. Two new methods for the assessment of central auditory function in cases of brain disease. Annals of Otology, Rhinology, and Laryngology, 1959, 68, 1185-1197.

Menyuk, P. Clusters as single underlying consonants: Evidence from children's productions. In A. Rigault & R. Charbonneau (Eds.), Proceedings of the Seventh International Congress of Phonetic Sciences. The Hague, Netherlands: Mouton, 1972.

Mills, A. W. Auditory localization. In J. V. Tobias (Ed.), Foundations of modern auditory theory (Vol. 2). New York: Academic Press, 1972.

Noorden, L. P. A. S. van. Temporal coherence in the perception of tone sequences. Eindhoven, Netherlands: Instituut voor Perceptie Onderzoek, 1975.

Oster, G. Auditory beats and the brain. Scientific American, 1973, 229(4), 94-103.

Perrott, D. R., & Barry, S. H. Binaural fusion. Journal of Auditory Research, 1969, 3, 263-269.

Perrott, D. R., & Elfner, L. F. Monaural localization. Journal of Auditory Research, 1968, 8, 185-193.

Perrott, D. R., & Nelson, M. A. Limits for the detection of binaural beats. Journal of the Acoustical Society of America, 1969, 46, 1477-1481.

Pisoni, D. B. Auditory and phonetic memory codes in the discrimination of consonants and vowels. Perception & Psychophysics, 1973, 13, 253-260.

Pisoni, D. B. Speech perception. In W. K. Estes (Ed.), Handbook of learning and cognitive processes (Vol. 5). Hillsdale, N.J.: Erlbaum, in press.

Pisoni, D. B., & McNabb, S. D. Dichotic interaction of speech sounds and phonetic feature processing. Brain and Language, 1974, 1, 351-362.

Porter, R. J. Effect of delayed channel on the perception of dichotically presented speech and nonspeech sound. Journal of the Acoustical Society of America, 1975, 58, 884-892.

Rand, T. C. Dichotic release from masking for speech. Journal of the Acoustical Society of America, 1974, 55, 678-680.

Repp, B. H. Dichotic forward and backward "masking" between CV syllables. Journal of the Acoustical Society of America, 1975, 57, 483-4C6. (a)

Repp, B. H. Dichotic masking of consonants by vowels. Journal of the Acoustical Society of America, 1975, 57, 724-735. (b)

Repp, B. H. Distinctive features, dichotic competition, and the encoding of stop consonants. Perception & Psychophysics, 1975, 17, 231-242. (c)

Repp, B. H. Perception of dichotic place contrasts. Journal of the Acoustical Society of America, 1975, 58 (Supplement), p. 76. (Abstract) (d)

Sayers, B. M., & Cherry, E. C. Mechanism of binaural fusion in the hearing of speech. Journal of the Acoustical Society of America, 1957, 29, 973-987.

Shankweiler, D., & Studdert-Kennedy, M. Identification of consonants and vowels presented to left and right ears. Quarterly Journal of Experimental Psychology, 1967, 19, 59-63.

Smith, B. A., & Resnick, D. M. An auditory test for assessing brain stem integrity: Preliminary report. Laryngoscope, 1972, 82, 414-424.

Speaks, C., & Bissonette, L. J. Interaural-intensive differences and dichotic listening. Journal of the Acoustical Society of America, 1975, 58, 893-898.

Studdert-Kennedy, M. The perception of speech. In T. A. Sebeok (Ed.), Current trends in linguistics (Vol. 12). The Hague, Netherlands: Mouton, 1974.

Studdert-Kennedy, M. Speech perception. In N. J. Lass (Ed.), Contemporary issues in experimental phonetics. New York: Academic Press, in press.

Studdert-Kennedy, M., & Shankweiler, D. P. Hemispheric specialization for speech perception. Journal of the Acoustical Society of America, 1970, 48, 579-594.

Studdert-Kennedy, M., Shankweiler, D. P., & Pisoni, D. B. Auditory and phonetic processes in speech perception: Evidence from a dichotic study. Cognitive Psychology, 1972, 3, 455-466.

Studdert-Kennedy, M., Shankweiler, D. P., & Schulman, S. Opposed effects of a delayed channel on perception of dichotically and monotically presented CV syllables. Journal of the Acoustical Society of America, 1970, 48, 599-602.

Tallal, P., & Piercy, M. Developmental aphasia: Rate of auditory processing and selective impairment of consonant perception. Neuropsychologia, 1974, 12, 83-93.

Tallal, P., & Piercy, M. Developmental aphasia: The perception of brief vowels and extended stop consonants. Neuropsychologia, 1975, 13, 69-73.

Thurlow, W. R., & Elfner, L. F. Pure-tone cross-ear localization effects. Journal of the Acoustical Society of America, 1959, 31, 1606-1608.

Tobias, J. V. Curious binaural phenomena. In J. V. Tobias (Ed.), Foundations of modern auditory theory (Vol. 2). New York: Academic Press, 1972.

Turvey, M. T. On peripheral and central processes in vision: Inferences from an information-processing analysis of masking with patterned stimuli. Psychological Review, 1973, 80, 1-52.

Wood, C. C. Parallel processing of auditory and phonetic information in speech discrimination. Perception & Psychophysics, 1974, 15, 501-508.

Wood, C. C. Auditory and phonetic levels of processing in speech perception: Neurophysiological and information-processing analyses. Journal of Experimental Psychology: Human Perception and Performance, 1975, 1, 3-20.

Wood, C. C., Goff, W. R., & Day, R. S. Auditory evoked potentials during speech perception. Science, 1971, 173, 1248-1251.

Woodworth, R. S. Experimental psychology. New York: Holt, 1938.

(Received May 16, 1975)