


Review

Between sound and perception: reviewing the search for a neural code

Jos J. Eggermont *

Neuroscience Research Group, Departments of Physiology and Biophysics, and Psychology, University of Calgary, 2500 University Drive N.W., Calgary, AB, Canada T2N 1N4

Received 6 December 2000; accepted 29 January 2001

Abstract

This review investigates the roles of representation, transformation and coding as part of a hierarchical process between sound and perception. This is followed by a survey of how speech sounds and elements thereof are represented in the activity patterns along the auditory pathway. Then the evidence for a place representation of texture features of sound, comprising frequency, periodicity pitch, harmonicity in vowels, and direction and speed of frequency modulation, and for a temporal and synchrony representation of sound contours, comprising onsets, offsets, voice onset time, and low rate amplitude modulation, in auditory cortex is reviewed. Contours mark changes and transitions in sound, and auditory cortex appears particularly sensitive to these dynamic aspects of sound. Texture determines which neurons, both cortical and subcortical, are activated by the sound, whereas the contours modulate the activity of those neurons. Because contours are temporally represented in the majority of neurons activated by the texture aspects of sound, each of these neurons is part of an ensemble formed by the combination of contour and texture sensitivity. A multiplexed coding of complex sound is proposed whereby the contours set up widespread synchrony across those neurons in all auditory cortical areas that are activated by the texture of sound. © 2001 Elsevier Science B.V. All rights reserved.

Key words: Neural representation; Neural transformation; Neural coding; Auditory system; Neural synchrony; Amplitude and frequency modulation; Voice onset time; Speech; Vocalization

Hearing Research 157 (2001) 1-42
www.elsevier.com/locate/heares

* Tel.: +1 (403) 220-5214 (office), +1 (403) 220-7747 (lab); Fax: +1 (403) 282-8249; E-mail: [email protected]

Abbreviations: AAF, anterior auditory field; AI, primary auditory cortex; AII, secondary auditory cortex; ALSR, average localized synchronized rate; AM, amplitude modulation; AN, auditory nerve; AVCN, anterior part of the VCN; BMF, best modulating frequency; CCI, consonant closure interval; CF, characteristic frequency; CN, cochlear nucleus; CNS, central nervous system; CV, consonant vowel; DCN, dorsal cochlear nucleus; DS, directional sensitivity; EP, evoked potential; ERP, event-related potential; FM, frequency modulation; FRA, frequency response area; FTC, frequency tuning curve; GBC, globular bushy cell; HVc, hyperstriatum ventrale pars caudale; IBE, information-bearing element; IBP, information-bearing parameter; IC, inferior colliculus; ICC, central nucleus of the inferior colliculus; ICX, external nucleus of the inferior colliculus; IHC, inner hair cell; ILD, interaural level difference; ITD, interaural time difference; LFP, local field potential; LIN, lateral inhibitory network; LL, lateral lemniscus; LSO, lateral superior olive; MGB, medial geniculate body; MGBv, ventral part of the MGB; MNTB, medial nucleus of the trapezoid body; MSO, medial superior olive; MTF, modulation transfer function; MU, multi-unit; MUA, multi-unit activity; P, posterior auditory field; PSTH, post-stimulus time histogram; PVCN, posterior part of the VCN; RCF, rates of change of frequency (velocity); RF, response field; rMTF, rate modulation transfer function; SBC, spherical bushy cell; SC, superior colliculus; SOC, superior olivary complex; SPL, sound pressure level; SR, spontaneous firing rate; STRF, spectro-temporal receptive field; tMTF, temporal modulation transfer function; VCN, ventral cochlear nucleus; VNLL, ventral nucleus of the lateral lemniscus; VOT, voice onset time; VP, ventral posterior auditory field; VS, vector strength

0378-5955/01/$ - see front matter © 2001 Elsevier Science B.V. All rights reserved. PII: S0378-5955(01)00259-3

1. Introduction

Approximately 35 years after the publication of Kiang's monograph on the discharge patterns of auditory nerve (AN) fibers in the cat (Kiang et al., 1965) the physiology of the AN is well known. An important aspect thereof is the identification of the targets of the myelinated type I and unmyelinated type II fibers in the three subdivisions of the cochlear nucleus (CN) (Ryugo, 1992; Liberman, 1991, 1993). For the type I fibers, the frequency tuning curves (FTCs), period histograms and post-stimulus time histograms (PSTHs) for simple stimuli, e.g. clicks and tone bursts, are well documented.


The responses to more complex stimuli, such as elements of speech, can generally be predicted from those to more simple ones such as tones, two-tone combinations and clicks (Sachs, 1984). Within 10 years after Young's (1998) detailed review on the CN, we perhaps will be at the same level of understanding about the CN, but the multiplicity of cell types and circuitry (Rhode, 1991; Oertel, 1999) causes this to be a more difficult endeavor.

The ventral CN (VCN) extracts and enhances the frequency and timing information that is multiplexed in the firing patterns of the AN fibers, and distributes the results via two main pathways: the sound localization path and the sound identification path. The anterior part of the VCN (AVCN) mainly serves the sound localization aspects, and its two types of bushy cells provide input to the superior olivary complex (SOC), where interaural time differences (ITDs) and level differences (ILDs) are mapped for each frequency separately (Carr, 1993). The posterior part of the VCN (PVCN) extracts across-frequency timing aspects through its broadly tuned octopus cells, whereas its stellate cells, as well as those from the AVCN, compute estimates of the spectral representation of sound. This temporal and spectral information is carried, via the monaural nuclei of the lateral lemniscus (LL), to the central nucleus of the inferior colliculus (ICC). This sound identification path carries a level-tolerant representation of complex spectra (e.g. of vowels) created by the chopper (stellate) neurons in the VCN (May et al., 1998). The temporal and the spectral aspects of sound are both mapped topographically, but mutually orthogonally, in the ICC (Langner, 1992).

The output from the SOC also arrives at the ICC, following some additional elaboration by the neurons in the dorsal nucleus of the LL (Wu and Kelly, 1995). In the ICC, ITDs and ILDs are combined into frequency-specific maps of interaural differences (Yin and Chan, 1988). Combining the frequency-specific ITD and ILD maps from the ICC results in a map of sound location in the external nucleus of the inferior colliculus (ICX). This auditory space map is subsequently represented in the deep layers of the superior colliculus (SC) (Middlebrooks, 1988) and aligned with the retinotopic map of visual space and the motor map of gaze (Hyde and Knudsen, 2000; Knudsen et al., 1987).

The inferior colliculi (ICs) and SCs form an important endpoint of the time-specialized part of the auditory nervous system (Trussell, 1997). In the IC, topographic maps are found for frequency, periodicity, and location of a sound. This spatial map is sufficient and necessary for adequate orientation to a sound source (Cohen and Knudsen, 1999). The ICC is the first level where physiological correlates of critical bandwidth properties, such as its level independence, are present (Ehret and Merzenich, 1988a,b; Schreiner and Langner, 1997).

In the last two decades one has slowly started to accept the notion that the auditory system evolved to allow the perception of sounds that are of survival value, and that the auditory system therefore has to be studied using such sounds. This neuroethological emphasis (Ohlemiller et al., 1994) has brought us a major understanding of the brains of auditory specialists such as echo locating bats (Suga, 1988, 1996) and the barn owl (Konishi et al., 1988). Speech is not fundamentally different, in the acoustic sense, from animal vocalizations, albeit that it is not as stereotyped (Suga, 1992). Human speech cannot have carried any weight in the evolutionary pressure that led to vertebrate hearing, and thus one cannot expect any particular selectivity and sensitivity to speech in vertebrate auditory systems, including that of humans. However, one may assume that human speech developed according to the constraints posed by the auditory and vocalization systems. This is reflected in the fact that human speech and animal sounds, not only those of other primates, share the same three basic elements: steady-state harmonically related frequencies, frequency modulations (FMs) and noise bursts (Mullennix and Pisoni, 1989; Suga, 1988, 1992; Fitch et al., 1997).

Most of our knowledge about the auditory thalamus and auditory cortex is obtained by stimulating with those sound elements that speech and animal vocalizations have in common: noise bursts, FM, and harmonic complexes interspersed with silent gaps. Frequency, noise burst bandwidth and preference for FM appear to be topographically mapped in cortical areas of auditory generalists (Schreiner, 1995) and specialists (Suga, 1994) alike. In contrast, sensitivity to low-frequency amplitude modulation (AM), as well as to gaps and voice onset times (VOTs), appears to be distributed across most neurons in at least three cortical areas and reflected as modulations in firing rate that are synchronous across areas (Eggermont, 1994a, 1998c, 1999, 2000a). Many auditory cortical areas are tonotopically organized (Phillips, 1995; Schreiner, 1995) and they are presumably specialized to represent a limited, and likely different, set of particularly important sound features (Ehret, 1997), albeit that none of these specializations, except in the mustache bat's cortex, has definitively been identified. One would expect that separate auditory cortical areas need to be able to integrate biologically important sound features with other perceptual and cognitive tasks. It is therefore likely that individual cortical areas fulfill a role similar to that of the various cell types and subdivisions in the CN and brainstem. It seems likely that no more than a few independent channels or types of processing can coexist within an area (Kaas, 1987).


The information extracted by each cortical area could be used to create clustered representations of sound location (e.g. in the frontal eye fields, Cohen and Knudsen, 1999) and sound meaning (e.g. in the mammalian equivalents of premotor areas such as the hyperstriatum ventrale pars caudale (HVc) in birds and Broca's area in humans, Doupe and Kuhl, 1999).

The neural code employed by a sensory system is likely determined by the innate structure of the system, i.e. it is the connections of the pathways and the properties of their neurons that produce the coded representation (Ehret, 1997). These anatomical connections and their neuronal specialization determine what kind of neurophysiological representation of the stimulus will occur. At higher levels of the central nervous system (CNS) these representations will be modulated by neural activity reflecting internal states such as drowsiness or arousal, and also by the recent perceptual history of the animal (Merzenich and deCharms, 1996).

Perkel and Bullock (1969) asked more than three decades ago: 'Is the code of the brain about to be broken?' They subsequently listed a large number of potential neural codes that 'made sense' to the neuroscientists of that time. As we will see, 'making sense' or 'having meaning' is crucial to the notion of a code; it indicates that coding occurs in context. The endless list of potential codes in that review also suggested that the concept of 'code' was very broadly defined. The specific mention of the 'code of the brain' suggests a belief that there is only one neural code for all perceptual phenomena.

Maybe we know more about the neurophysiological substrates of the vowel /ε/ than of any other speech sound. We know nearly everything about the neural activity evoked by the vowel /ε/ in AN fibers (Delgutte, 1984; Sachs, 1984) and in various cell types of the CN (May et al., 1998). On the basis of those neural responses the investigator is able to identify that vowel from a selection of other vowels with near certainty. The auditory CNS can do so too, but lacks the a priori knowledge of the experimenter about which vowels are presented. How does the CNS do this identification? The neural responses to /ε/ will likely change dramatically between CN and auditory cortex. Is there, in the end, a unique code for /ε/? If so, how is that code formed out of the neural activities evoked by the sound? The vowel /ε/ can be characterized by a unique spectrum that appears to be represented in a population of T-multipolar cells (choppers) in the VCN in a level tolerant way (Young, 1998). It most likely can also be uniquely represented in the population autocorrelogram of the phase-locked firings in a population of bushy cells or octopus cells in the VCN (Oertel, 1999). However, we do not have any account of neural activity caused by /ε/ in, for instance, the IC or the auditory cortex.

We do know the multi-unit activity (MUA) that is produced in some parts of auditory cortex by other phonemes such as /da/, /ta/, /ba/, and /pa/ (Steinschneider et al., 1994; Eggermont, 1995a,b). We even know which areas in the human brain are metabolically activated differentially by presentation of voices and other sounds, through visualization by positron emission tomography scans or functional magnetic resonance imaging (Belin et al., 2000). However, vowel representation will strongly depend on context, i.e. on what consonants precede or follow it. It is therefore not clear if a representation of a word can be generated from the representation of its phonemes in isolation.

There is evidence that different acoustic representations exist for identical phonemes. Identical cues for particular phonemes can also give rise to different percepts as a function of context. As a result of these contextual effects, it has been difficult to isolate acoustic attributes or features that satisfy the perceptual invariance that is observed in practice. Thus, there is no simple one-to-one relationship between the acoustic segments of the speech waveform and the way they are perceived (Mullennix and Pisoni, 1989). The implications thereof for the existence of a neural code have not yet been explored.

The relative importance of the spectral and temporal information present in the sounds used to convey speech understanding has been elucidated by research in cochlear implant users, where limited place-specific information is delivered to the remaining AN fibers. What minimal information has to be presented to the AN fibers in case of complete sensory hearing loss so that the receiver of a cochlear implant can fully understand speech? It seems that, at least under optimal listening conditions, spectral information is far less important than temporal information for recognition of phonemes and words in simple sentences (Shannon et al., 1995). Event-related potential (ERP) research in normal hearing and implanted human subjects has elucidated a potential role for non-primary auditory pathways in signaling temporal differences in sounds that can or cannot be sensed by the auditory cortex (Ponton et al., 2000).
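The manipulation behind the Shannon et al. (1995) result can be sketched in a few lines of code: the speech spectrum is reduced to a handful of broad bands, and only each band's temporal envelope is kept, imposed on band-limited noise. The band edges, filter orders and envelope cut-off below are illustrative assumptions, not the values used in that study.

```python
# Minimal noise-band vocoder sketch in the spirit of Shannon et al.
# (1995): discard spectral fine structure, keep only the temporal
# envelope of a few broad bands. All parameter values are assumptions.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def vocode(speech, fs, band_edges=(100, 800, 1500, 4000), env_cutoff=50.0):
    """Return speech reduced to band-wise envelopes modulating noise."""
    rng = np.random.default_rng(0)
    out = np.zeros_like(speech)
    env_sos = butter(4, env_cutoff, 'lowpass', fs=fs, output='sos')
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        band_sos = butter(4, (lo, hi), 'bandpass', fs=fs, output='sos')
        band = sosfiltfilt(band_sos, speech)
        env = sosfiltfilt(env_sos, np.abs(hilbert(band)))   # temporal envelope
        noise = sosfiltfilt(band_sos, rng.standard_normal(speech.size))
        out += np.clip(env, 0.0, None) * noise              # envelope onto noise
    return out
```

That sentences processed this way remain largely intelligible with only three or four bands is the sense in which temporal information outweighs spectral detail in this task.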

This review will present an extensive, albeit not exhaustive, selection of what is known about neural responses in the auditory system of laboratory animals, such as monkeys, cats and bats, related to the identification of complex sound. I will discuss the transformations in the neural activity that take place along the way from AN to cortex. The review will also speculate further, extending a previous review (Eggermont, 1998b), on that elusive interface between stimulus and perception: the neural code.


2. Biologically important features of sound have shaped the auditory nervous system

Worden and Galambos (1972) aptly noted that 'the full capacity of the sensory processor may not be revealed except through study of its response to stimuli that pose analytical tasks of the kind that shaped its evolutionary development'. Thus, regularities in the acoustic biotope, consisting of individual vocalizations and background sounds that are part of the natural habitat of the animal (Aertsen et al., 1979; Nelken et al., 1999; Smolders et al., 1979), are likely manifested in the response properties of auditory neurons. It has been suggested that the statistical structure of natural signals is important for creating efficient representations of the sensory world in the CNS (Barlow, 1961; Ruderman and Bialek, 1994; Rieke et al., 1995). One can also say that sensory systems have evolved to efficiently process natural stimuli. In this respect, the peripheral auditory system may have evolved not only for the reception of biologically important sounds, but also for the frequency-time analysis of those sounds that are especially important to a species. In social animals, the processing of communication calls is likely one of the major functions of the central auditory system. Spectrograms of speech and animal vocalizations (Fig. 1) all exhibit three basic components (information-bearing elements, IBEs; Suga, 1989): constant frequency parts, noise bursts, and FM components. In human speech, several harmonically related constant frequencies (formants) constitute a vowel; the transition from a noise burst into the vowel is typically in the form of a formant glide, an FM part (Mullennix and Pisoni, 1989; Fitch et al., 1997). Combinations of the IBEs are important for affecting the behavior of both animals and humans. The combinations of IBEs are frequently stereotyped in animal vocalizations but vary enormously in human speech. In fact, one of the most fundamental characteristics of speech is this inherent physical variability combined with perceptual constancy.

Auditory information is carried not only by the acoustic parameters characterizing each of the above three types of IBEs, but also by information-bearing parameters (IBPs) representing relationships among these three IBEs in the frequency, amplitude and time domains (Suga, 1992, 1996). An example of an IBP is the characteristic delay between the noise burst and the FM component, the VOT, in some phonemes. Another example is the rate of the simultaneous AM of the constant frequency components in a vowel that, depending on modulation frequency, represents glottal pulse rate and prosody.

One could tentatively equate the IBEs with the texture of sound, and the IBPs with the contours of sound. Texture is largely spectral, whereas contours are largely if not exclusively temporal in nature.

Speech and animal vocalizations contain frequency-modulated components in which frequency and time are not separable, that is, the product of the spectrum and the temporal envelope is not equal to the spectrogram of the original vocalization (Eggermont et al., 1981). In contrast, mixtures of vocalizations tend to be separable into a stationary Gaussian process (the carrier) and a temporal envelope. Thus all frequency components in the carrier are co-modulated by the envelope. This low-frequency co-modulation of background sound may be acquired during atmospheric propagation of sound as a result of micro-turbulence (Richards and Wiley, 1980). Co-modulation of the background makes it less effective in masking individual, foreground, vocalizations, which are not subject to this co-modulation (Nelken et al., 1999).
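The separability criterion invoked here is easy to make operational: a time-frequency distribution S(f, t) factors into a spectrum multiplied by a temporal envelope exactly when the matrix is rank one, so the fraction of energy captured by the first singular value can serve as a separability index. The sketch below uses two synthetic toy spectrograms; this index is one common choice, not the method of the cited studies.

```python
# Separability check via SVD: a spectrogram equals spectrum x envelope
# exactly when it is rank one. Toy spectrograms, not data from the papers.
import numpy as np

def separability_index(S):
    sv = np.linalg.svd(S, compute_uv=False)
    return sv[0] ** 2 / np.sum(sv ** 2)          # 1.0 means fully separable

f = np.linspace(0.1, 4.0, 64)[:, None]           # frequency axis (kHz)
t = np.linspace(0.0, 1.0, 200)[None, :]          # time axis (s)

comodulated = np.exp(-f) * (1.0 + 0.8 * np.sin(2 * np.pi * 3 * t))
fm_sweep = np.exp(-(((f - (0.5 + 3.0 * t)) / 0.2) ** 2))

print(separability_index(comodulated))           # 1.0: background-like texture
print(separability_index(fm_sweep))              # << 1: FM is non-separable
```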

Small groups of neurons may have evolved to process sounds that are essential for species survival and may bind, i.e. simultaneously represent, many IBPs of these sounds. On the other hand, neurons or neural populations that show little binding may process sounds that are not particularly important to the species. In other words, the processing of these general sounds may be based solely upon their spatio-temporal patterns (Suga, 1994). This could imply that a neural code exists only for those sounds that are essential to a species but not for sound in general. However, what is essential for survival may change during the life span of an animal and could, for instance, move sounds from the initially general category into the presently essential category. So context dependence or learning with respect to the information carried by biological sounds cannot be neglected, especially when taking brain plasticity into account (Merzenich and deCharms, 1996). The complexity and variability of vocal signals and the processes devoted to perceiving the signal constitute an important difference between humans and animals in perceiving complex sounds. However, the auditory aspects of speech processing may depend on mechanisms specialized for the representation of temporally complex acoustic signals in humans and other mammals alike, regardless of their communicative or linguistic relevance (Fitch et al., 1997).

3. Representations and codes: defining the neurophysiological emphasis

I propose that in the field of neurophysiology the concept 'neural code' be reserved for a unique specification, a set of rules, that relates behavior to neural activity. The neural code is then to be equated with the result of the last non-identity transformation of neural activity between sensory receptor and cortex.


In this sense it may be analogous to the genetic code present in DNA (words of three nucleic acid bases, out of four available, code for specific amino acids) that is translated by RNA (through the ribosomes) into proteins (combinations of 20 amino acids), which are the phenotype (Frank-Kamenetskii, 1997). From a cryptographic standpoint, however, the genetic code is not a code. The string 'genetic code' is merely a metaphor for the correlation between nucleic and amino acids. Similarly, the 'neural code' could be defined as no more than the neural basis of a correlation between stimulus and behavior.

If a neural code as defined above exists, how does it materialize into its phenotype: behavior? In this I include perception among behavior. One could wonder whether there is also a cognitive tagging process that helps recognize parts of the code in different contexts.

Just as the genetic environment is important in determining which genes (combinations of nucleic acid 'words') are expressed and at what time, there is a context dependence that determines if and when the neurally coded stimulus elicits a behavior. Could one also say that the same behavior (percept) might be elicited by different neural codes, just as different DNA 'words' may code for the same amino acid?

The labels 'neural code', coding, decoding and encoding have all been used very widely and most of the time synonymously with neural representation. Some object entirely to the use of these labels on the basis that 'decoding of sensory input, followed by representation and reconstruction of the world within the brain, logically requires that these rebuilt images, assembled from the previously encoded sensory input, must now be viewed or processed by neural structures that do not require encoded input' (Halpern, 2000).

Fig. 1. Two vocalization sounds that illustrate similarities and differences in IBEs and IBPs. In the left-hand column, the waveform and spectrogram of a kitten meow are presented. The duration of this meow is 0.87 s, the average fundamental frequency (F0) is 550 Hz, the lowest frequency component (F1) is about 0.5 kHz and the highest frequency component (not shown) is 5.2 kHz. The second (F2) and third (F3) harmonics, between 1.5 and 2.5 kHz, have the highest intensity. Distinct downward and upward FMs occur simultaneously in all formants between 100 and 200 ms after onset. The meow has a slow AM. In the right-hand column, the waveform of a /pa/ syllable with a 30 ms VOT and its spectrogram are shown. The periodicity of the vowel and the VOT are evident from the waveform. Low level aspiration noise was present in the period before the onset of voicing. The dominant frequency ranges are F0 = 125 Hz, F1 = 700 Hz, F2 = 1200 Hz. Because the dynamic range of the representation is only 30 dB, the third formant at 2600 Hz is only weakly visible. The fundamental frequency started at 125 Hz, remained at that value for 100 ms and dropped from there to 100 Hz at the end of the vowel. The first formant started at 512 Hz and increased in 25 ms to 700 Hz, the second formant started at 1019 Hz and increased in 25 ms to 1200 Hz, and the third formant changed in the same time span from 2153 Hz to 2600 Hz.
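The caption's parameter values are specific enough to resynthesize an approximation of the /pa/ stimulus. The sketch below drives damped resonances along the stated formant trajectories with a glottal pulse train at the stated F0; the formant bandwidths, the omission of the burst and aspiration noise, and the synthesis scheme itself are simplifying assumptions, so only F0 and the formant values come from the caption.

```python
# Sketch of a /pa/-like syllable from the caption's values: a 30 ms
# silent VOT gap, then a 200 ms vowel whose formants glide over the
# first 25 ms. Bandwidths and the excitation scheme are assumed.
import numpy as np

fs = 16000
t = np.arange(int(0.200 * fs)) / fs                       # vowel time axis

def glide(f_start, f_end, t_glide=0.025):
    return np.where(t < t_glide, f_start + (f_end - f_start) * t / t_glide, f_end)

f0 = np.where(t < 0.100, 125.0, 125.0 - 25.0 * (t - 0.100) / 0.100)
formants = (glide(512, 700), glide(1019, 1200), glide(2153, 2600))
bandwidths = (80.0, 120.0, 160.0)                         # assumed values (Hz)

vowel = np.zeros_like(t)
i = 0
while i < t.size:                                         # one pulse per glottal cycle
    n = np.arange(t.size - i)
    for F, bw in zip(formants, bandwidths):               # damped formant resonances
        vowel[i:] += np.exp(-np.pi * bw * n / fs) * np.sin(2 * np.pi * F[i] * n / fs)
    i += int(fs / f0[i])                                  # step one glottal period

pa = np.concatenate([np.zeros(int(0.030 * fs)), vowel])   # 30 ms VOT, burst omitted
```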


However, this may be too strict an interpretation of these coding-related labels. The idea behind encoder and decoder operations in communication systems is that the signal is transmitted as a sequence of symbols. The relationship between the signal and the symbols, generally not a one-to-one relationship, is usually called the code. Part of the problem in applying this to perception is that for the nervous system the set of symbols itself is unknown. The symbols could be spike times, spike time intervals, numbers of spikes, particular spike sequences, spike time coincidences, etc.

Johnson (1980) and Cariani (1995) have extensively addressed the necessary distinction between representation and code in a general context, and it is useful to review and potentially extend their classification. It is also useful to state up front that what the auditory system does, up to at least the level of the primary auditory cortex (AI), is process information about sounds. Information that is not present at the input of a specific nucleus cannot be present at its output, and neural coding may be a unique way to reflect the essential, abstracted, information leading to behavior. This view bypasses cognitive substitutions, i.e. making assumptions about, and identifications of, a stimulus that are not warranted on the basis of its physical attributes.

A discussion of information processing requires that distinctions be made between its three interdependent but separate aspects: neural representation, neural transformation, and neural coding.

The neural representation of a sound can be defined as the complete spatio-temporal array of neural activity in all of the neurons passing through a transverse plane across the auditory system (Johnson, 1980). This plane can be at the level of the AN, the brainstem, the midbrain, etc. Cariani (1995) calls the neural correlate of stimulus quality a 'sign'. A sign is a characteristic pattern of neural activity that has all the information needed to affect a particular sensory discrimination or behavior. When the sign is present, the particular sensory discrimination or behavior may occur, but it does not occur when the sign is not present. This is in many ways equivalent to the neural representation of a behaviorally meaningful sound. Thus, a sign is more restricted than a neural representation because, for instance, the neural representation of the vowel /ε/ does not have much meaning for the experimental animal in which the recordings are done and thus is generally not a sign. However, after some behavioral training such a neural representation may become a sign.

The first 'neural' representation resides in the cochlear hair cells. Subsequently, this representation is conveyed by parallel pathways, embodied in the collaterals of the AN, to the three divisions of the CN, diverging further as outputs from distinct cell types and continuing toward the IC. This creates a set of parallel neural pathways.

A particular pathway may contain a detailed representation of the stimulus, but not contribute to recognition of the stimulus. For instance, the localization pathway originating in the AVCN may not be used in stimulus recognition at all (but see Loeb et al., 1983; Koch and Grothe, 2000; Grothe, 2000). Alternatively, a particular pathway may carry only a partial representation of the sound, but may provide all of the information about particular important sound dimensions (IBEs and IBPs) that need to be discriminated. All of this has to be taken into account when evaluating neural coding.

A neural transformation is the computational process by which a neural representation at one location in a sensory pathway is derived from a neural representation at a lower level. From a systems analysis point of view, the transformation between two neural representations can be expressed as a transfer function or as the action of a filter. For a linear transformation with independent inputs, the transformation could be described completely by a multi-dimensional impulse response. Transformations occur at synaptic junctions, and they are in general non-linear. Unless synapse specialization occurs (as it does in the AN-spherical bushy cell (SBC) or in the globular bushy cell (GBC)-medial nucleus of the trapezoid body (MNTB) principal cell synapses), those transformations could destroy synchrony to the stimulus fine structure and may, for example, set the limits for the discrimination of rapidly changing stimuli. One generally loses information in neural transformations because most transformations create noise. The only way to gain is when the conditional probabilities for the response given the sounds change, or when the noise is reduced, e.g. as the result of converging neural activity such as occurs in the octopus or chopper cells of the CN. Note that the gain in these cases is in an individual output neuron with respect to the information carried by the activity of individual neurons in the input representation. Nothing can be gained in the representation as a whole.

Neural code has been difficult to define. Some proposals: '...transformed and transmitted signals do not constitute a bona fide neural code unless that information is appropriately acted upon - interpreted - by appropriate parts of the organism' (Perkel and Bullock, 1969). '...we shall operationally define a true code as a parameter of the signal which actually carries behaviorally usable information' (Uttal, 1969). '...a neural code at a particular location within the sensory pathway as being the parameter of the afferent discharge in the population that is actually used by the organism in particular sensory behavior' (Mountcastle, 1975). 'Codes are the functional organizations that actually utilize a particular set of signs to effect a perceptual discrimination' (Cariani, 1995).


All cited sources agree that a code relates neural activity to behavior. Brugge (1992) states that 'in order to qualify as a neural code for acoustic information, it must be shown first that the (neural) pattern in question occurs in the auditory system under natural conditions or is evoked by natural stimuli, and second that there exists a sensitive receiver; that is, a set of neurons whose activity changes in response to the candidate code it receives'. This definition is close to that of a 'sign', a specific form of a neural representation.

Again adopting a system-theoretic point of view, the neural code could be defined as the transfer function between a (multi-dimensional) stimulus and behavior (perception). The maximum information that can be transmitted by the neural code from this point of view is related to the signal-to-noise ratio, i.e. the ratio of stimulus-induced spikes and spontaneous activity. The latter should ideally be independent of and additive to the stimulus-induced activity; however, this does not appear to be the case (Cecchi et al., 2000). A problem is that the stimulus needs to be defined as all relevant sensory information that results in the behavior or percept. A further complication for a systems approach is that the nervous system is not time-invariant (it is changed by experience), is not deterministic (there is no unique relationship between stimulus and percept), and certainly is not linear.

Representations in cortex may be relational, i.e. relations between elements and ensembles could be used to establish reliable and flexible representations. This is a way in which the representational relations among a group of neuronal elements, modified by learning and plasticity, can remain invariant (Merzenich and deCharms, 1996). Relational representations could be at the basis of the maintenance of perceptual constancy in a changing cortex. Thus, neural representations are not static: one may learn to attend to some components of the neural representation and ignore others (the 'selective listening' hypothesis). This could be represented by an adjustment of the neural code, the rule connecting stimulus and response. Alternatively, one could assume that the neural code is unique and that the read-out of the neural code changes depending on the context. I prefer this latter approach because it keeps the analogy with the unique genetic code, where the read-out of the gene (expression) depends on its environment. On the down side, a neural code defined this way could be intractable by neurophysiological techniques. The neural code provides a link between a neural representation, based on neurophysiological studies, and sensory discrimination performance (a behavior) that depends on the information contained in that neural representation (deCharms and Zador, 2000). Thus I intend to isolate the content of the neural code from its ultimate function as a releaser of behavior, thereby presuming that it can release a specific behavior or cause a specific percept.

Whether the code actually elicits the behavior will not be considered here. This restriction, in principle, allows a limited search for the neural code in anaesthetized or otherwise non-behaving animals. Later on I will come back to the system-theoretic approach and interpret the neural code in terms of the mutual information (Borst and Theunissen, 1999) present in a cortical neural representation.

There is a hierarchical order in this classification of neural activity: a neural representation forms the basis, a sign is a neural representation of behaviorally meaningful sounds, and a code is a sign that on its own is capable of evoking a behavior. This definition limits the use of neural code to the final interface between sensory neural activity and behavior, an interface that likely resides in cortex. Thus, it is probably meaningless to speak of complex sound coding in the discharges of the AN fibers. On the other hand, at the auditory cortex level neural representations and neural codes may both exist. It is not a priori clear at what level in the auditory nervous system one can justifiably talk about neural coding, but it will be argued that it cannot be at a level below that of the IC.

How does the neural code relate to the perceptual distinctions that can be made? Distinctions in perception are based upon distinct activity patterns of the nervous system, either specifically sensory or reflecting the animal's internal state. Thus each dimension of perceptual quality should be related to a dimension of neural activity. It should thus be possible to correlate spaces of perceptual distinctions with spaces of neurally encoded distinctions (Cariani, 1995). The correlation matrix that contains this information reflects, or is identical with, the neural code.
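Cariani's proposal can be read operationally, as in the sketch below: build one dissimilarity matrix over a stimulus set from perceptual judgments and another from neural responses, then correlate their entries. The random arrays are placeholders standing in for both kinds of measurement.

```python
# Sketch of correlating perceptual space with neural space: compare the
# two pairwise-distance structures over the same stimulus set.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(1)
perceptual = rng.standard_normal((9, 4))     # 9 stimuli in a perceptual space
neural = rng.standard_normal((9, 50))        # population response vectors

rho, p = spearmanr(pdist(perceptual), pdist(neural))
print(f"space-to-space correlation: rho = {rho:.2f} (p = {p:.2f})")
```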

4. Stimulus representation in terms of firing rate or firing times

Usually one reads about rate coding, or about temporal or synchrony coding, in the activity patterns of the AN. In our restricted nomenclature this translates into the representation of a specific sound in the overall or synchronized firing rates of AN fibers, or in the interspike intervals of AN fiber activity. In the following I will transcribe part of Cariani's (1995) classification for neural code types into one for neural representations.

Neural representations reflecting the connectivity pattern of neurons and nuclei take the form of place, labeled line, or spatial pattern representations (Brugge, 1992). In the auditory system this usually takes the form of firing rate vs. characteristic frequency (CF) or phase-locked (synchronized) rate vs. CF (Young and Sachs, 1979).


A problem with a rate representation is that it rapidly deteriorates in the presence of background noise (Miller and Sachs, 1984). Neural representations may also reflect the temporal patterning of firing activity without the need for any specific reference to CF. Such a representation can be characterized by the (all-order) interspike interval representation of stimulus periodicity, as demonstrated in AN fibers (Cariani and Delgutte, 1996), in the IC (Delgutte et al., 1998), and in auditory cortex (Eggermont, 1998c). In this form one can speak of a temporal representation of sound (see Theunissen and Miller (1995) for a more differentiated point of view). Neural representations can also reflect the relative time of arrival, as quantified by latency differences as a function of a particular stimulus dimension (e.g. sound source azimuth, Eggermont, 1998a), or the degree of interneuronal synchrony (deCharms and Merzenich, 1996; Eggermont, 1994b, 1997a, 2000a).

A main consideration here is to acknowledge that the stimulus and its time of presentation are unknown to the animal. Thus one of the problems in identifying a useful code or representation for the CNS is that such a code or representation cannot in any way incorporate properties of the external stimulus such as time of presentation, frequency content or position in space. For instance, one cannot use period histograms, which require knowledge of the stimulus periodicity, to infer the timing representation of a periodic sound, but one can use the neuron's own interspike interval distribution, which presents the same information and needs no reset of an internal 'clock' (Horst et al., 1986; Javel et al., 1988). One cannot use latency as part of the stimulus-response features used in a 'panoramic code' (Middlebrooks et al., 1994), but one may use spike latencies relative to an internally available global time marker such as a local field potential (LFP). In practice, however, this does not seem to make much difference with respect to the representation of sound source location (Furukawa et al., 2000). However, internal representations or codes, in principle, need to be free from external 'anchors'.
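The difference between the two readouts is easy to see in code. In the sketch below, with a simulated phase-locked spike train, the period histogram needs the experimenter's stimulus clock, while the all-order interval histogram is computed from the spike train alone and still peaks at the stimulus period and its multiples. All simulation parameters are arbitrary.

```python
# Period histogram (needs the stimulus period) vs. all-order interspike
# interval histogram (needs only the spike train). Simulated firings.
import numpy as np

rng = np.random.default_rng(0)
period = 0.005                                   # 200 Hz tone, 5 ms period
fired = rng.random(4000) < 0.4                   # which cycles evoke a spike
spikes = np.flatnonzero(fired) * period + rng.normal(5e-4, 3e-4, fired.sum())

# Period histogram: folds spike times on the externally known period.
phase_hist, _ = np.histogram(spikes % period, bins=20, range=(0.0, period))

# All-order interval histogram: every pairwise interval, no external clock.
d = np.abs(spikes[:, None] - spikes[None, :])
isi_hist, edges = np.histogram(d[d > 0], bins=200, range=(0.0, 0.02))
# isi_hist peaks near 5, 10 and 15 ms: the periodicity is recovered
# without any reset of an internal clock.
```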

These types of neural representation are by no means all inclusive. For instance, in the AN a firing rate representation of speech exists alongside, and is multiplexed with, a synchronized firing rate representation (Sachs, 1984), and also with an interspike interval representation (Cariani and Delgutte, 1996; Cariani, 1999a). The cells in the CN that receive information from AN fibers may extract either the firing rate information (e.g. the stellate cells) or the firing synchrony information (e.g. the bushy cells and octopus cells). This form of multiplexing information in neural representations may be a consequence of a specialization for timing that exists in the subcortical auditory nervous system.

5. Neural representations as maps

A topographic map is defined as an ensemble of neurons combining two or more neural representations with parameters systematically ordered along a spatial dimension in a given nucleus or area. Usually this takes the form of a spatially coded parameter (e.g. CF) and some other parameter (e.g. average firing rate, first-spike latency) (Schreiner, 1995; Ehret, 1997). As an example, the rate-place scheme's central representation is formed by differences in spatially organized firing rates, i.e. firing rates differing as a function of CF. At behaviorally relevant intensities, the rate tuning curves are usually broad, so that a relative firing rate representation (Erickson, 1974) has to be extracted from the topographic maps to match the often very fine perceptual capacity. An example is the map of sound source azimuth in the SC. Latency-place representations use the relative time of arrival at different spatial locations in order to encode intensity and other qualities. Latency-place mechanisms appear to be involved in a wide variety of sensory processes, e.g. stimulus localization and echo delay maps (Cariani, 1995, 1997, 1999b). In the AI of the bat, neurons are tuned for amplitude as well as for frequency of sound. The best amplitude map varies along a dimension perpendicular to the tonotopic axis. A higher order area (FM-FM area) in the bat's secondary auditory cortex (AII) maps the time interval between acoustic events, such as emitted sound and received echo, that signals prey distance. The best time interval between two FM sounds varies systematically across the cortical surface. Tonotopy is absent in the FM-FM area and neurons are grouped in clusters according to the ranges of frequencies contained in the FM components (Suga, 1994).

Besides topographic maps, where the similarity of the representational properties of two neurons is reflected in their physical distance, there are also functional maps. Such maps have an ordering through their correlation structure only (Koenderink, 1984). Neural units with strong correlation of their firings can be considered close in the neural organization; units with weak correlation have a larger functional distance. In visual cortex, distant neurons that have similar orientation preference tend to show significant correlation in their firing times (Ts'o et al., 1986), suggesting that these like-minded neurons form a correlation map of orientation preference that is superimposed in a fractionated way upon the retinotopic map. Maps of this kind have yet to be identified in the auditory cortex, where correlation strength appears relatively isotropically distributed (Brosch and Schreiner, 1999; Eggermont, 1992a, 1994b, 2000a).
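A functional map in this sense can be constructed directly from recordings, as sketched below: pairwise correlations of firing define closeness, independent of anatomical position. The spike-count matrix is a random placeholder.

```python
# Functional distance from correlation structure alone: strongly
# correlated units count as close, whatever their cortical positions.
import numpy as np

rng = np.random.default_rng(2)
counts = rng.poisson(5.0, size=(20, 500))   # 20 units x 500 stimulus repeats
r = np.corrcoef(counts)                     # pairwise firing correlations
functional_distance = 1.0 - r
np.fill_diagonal(functional_distance, 0.0)
# Embedding this matrix (e.g. by multidimensional scaling) would lay the
# units out by functional, not anatomical, proximity.
```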

The apparent importance of self-organizing processes in development, based on correlated neural activity patterns, suggests that some specific features of cortical organization, such as tonotopic representations, and types of modular grouping of neurons within those representations, could be byproducts of developmental sequences in the building of brains. Thus, specific features of cortical organization may be the outcome of the building process rather than a design for optimizing function (Kaas, 1987).

Functional maps may differ for different stimulus conditions or different contexts, whereas topographic maps are generally situation-independent. However, one has to keep in mind that what is mapped topographically is often an experimenter-based abstraction, e.g. CF for the tonotopic map. Because CFs for different neurons are found at different threshold intensity levels, such a map may not have functional importance besides reflecting the developmental/experiential process that led to it.

A prime example of a functional map is that of auditory space in the ICX. Such maps are computed in hierarchical fashion from frequency-specific topographic maps of interaural time and intensity differences in the SOC. Inputs to the neurons that make up such a map are almost instantaneously transformed into a place-coded distribution of the mapped parameter. Sorted in this manner, the derived information can be accessed readily by higher order processors (mapped or not) using relatively simple schemes of connectivity (Knudsen et al., 1987).

In most computational maps, and also in some topographic maps, regardless of the parameter being analyzed, neurons are typically broadly tuned (Seung and Sompolinsky, 1993) for the mapped parameter. Computational maps are pre-wired and fast, but still modifiable. They are likely only created for parameter values that are biologically relevant and require speedy action. Despite the fact that neurons organized in computational maps are broadly tuned for the mapped parameter, precise information about the values of parameters is contained in the output of these maps. The neurons' tuning curves are peaked and shift systematically across the map. This gives rise to systematic differences in firing rates across the map for any given stimulus. Thus, high-resolution information is contained in the relative responses of neurons across the map. This constitutes a ratio map (Erickson, 1974) that is level tolerant if the rate-intensity functions of the neurons are approximately linear. The subsequent processor has to be sensitive to the relative levels of activity within a large population of neurons, and able to detect locations of peak activity within the map. Computational maps, with their parallel array of preset processors, are ideally suited to rapidly sort and process components of complex stimuli and represent the results in a simple systematic form.

Regardless of the stimulus parameter processed, the answer is always represented as the location of a peak of activity within a population of neurons. When a parameter is represented in topographic form, a variety of neuronal mechanisms can operate to further sharpen tuning in ways not possible if the information is in a non-topographic code. One class of mechanisms is regional interactions such as local facilitation and lateral inhibition, which can only work on mapped information (Knudsen et al., 1987).
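The claim that broad tuning is compatible with fine resolution is easy to illustrate numerically. In the sketch below, Gaussian tuning curves 30 degrees wide, Poisson firing, and a centre-of-mass readout of the population peak recover the stimulus to within a few degrees; all numbers are illustrative assumptions, not values from the cited work.

```python
# Broadly tuned map, finely resolved readout: decode the location of the
# population activity peak by centre of mass. Illustrative values only.
import numpy as np

rng = np.random.default_rng(3)
preferred = np.linspace(-90.0, 90.0, 37)     # mapped preferred azimuths (deg)
width = 30.0                                 # tuning much coarser than readout

def population_response(azimuth):
    rates = 50.0 * np.exp(-0.5 * ((azimuth - preferred) / width) ** 2)
    return rng.poisson(rates)                # noisy spike counts across the map

def read_out(spike_counts):
    return np.sum(preferred * spike_counts) / np.sum(spike_counts)

estimates = [read_out(population_response(12.0)) for _ in range(200)]
print(f"true 12.0 deg, decoded {np.mean(estimates):.1f} +/- {np.std(estimates):.1f}")
```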

Maps provide for binding of IBEs and IBPs. This could be done if IBEs are topographically mapped and IBPs provide a temporal tagging of the IBEs across maps. A sound ultimately forms a correlation map that can be considered as a neural assembly. A series of sounds may result in a series of concatenated maps. This is an ever-changing assembly that forms a path through the N-dimensional internal representation of the acoustic biotope.

There are multiple, often nearly identical, repetitions of the tonotopic map at both subcortical and cortical levels. Each separate frequency representation is a unit in a system of serial/parallel channels making up the central auditory system. In theory, the number of parameters that can be mapped independently and continuously in one area is limited to the number of dimensions in the neural structure (Kaas, 1987; Schreiner, 1995). The nervous system potentially overcomes this limitation by organizing fine-grained maps within coarse-grained maps, such as the visual orientation map within the retinotopic map. However, map-dependent neural interactions operate optimally only for the parameter mapped in fine grain. Perhaps this is one reason that different parameters are mapped in separate, functionally specialized areas of the brain, most of which also contain a coarse tonotopic map. Each map is likely performing a different type of analysis on the sensory information from the receptors; perception ultimately involves integration of the information from these separate representations (Young, 1997). This could be done on the basis of the underlying tonotopicity, but also on the basis of a synchronized representation of contours. Understanding the functional roles of the separate maps depends on anatomical evidence, physiological properties, and behavioral evidence after activating or deactivating the putative map.

Do topographic maps participate directly in neural coding? Specifically, do topographic maps convey useful information? It is clear that tonotopic maps are empirical constructs defined by the investigator in terms of an arbitrary level of neuronal firing in response to stimulation of specific cochlear locations. In fact, the topography of cortical maps can change considerably with stimulus level (Phillips et al., 1994), localized sensory deprivation (Harrison et al., 1991; Rajan et al., 1993; Eggermont and Komiya, 2000) and experience (Dinse et al., 1993).


The basic organization of a sensory map is established through developmental processes that allow easy lateral interactions between neurons. Short-term changes in the map reflect modifications in the balance of excitatory and inhibitory inputs reaching certain neurons. However, maps may not provide direct information about profiles or levels of neuronal activity produced by different stimuli. Maps can be interpreted or 'read out' only by an external observer; they cannot be utilized internally by the brain, which 'knows' only neuronal activity or the lack thereof. This leads to the conclusion that information about different stimuli and their functional significance is not conveyed directly by the map structure but only indirectly by patterns of activity resulting therefrom.

6. Information processing: probing the efficiency of neural representations

In general, calculating whether the information needed to represent a particular stimulus property is present in the firings of a given neural population is much easier than determining whether the CNS actually utilizes all or part of this information to modify its behavior (Johnson, 2000). A clear example is found in the relationship between AN activity and the threshold of hearing. In barn owls, the threshold for phase locking, i.e. being able to tell the frequency of the sound from the representation in the interspike intervals, is between 10 and 20 dB lower than the threshold at which the firing rate begins to rise. The threshold of firing rate increase, in fact, corresponds to the threshold of hearing at that frequency (Köppl, 1997). In somatosensory cortex, thresholds based on firing rate were also similar to the animal's psychophysical threshold, whereas neural thresholds based on periodicity were far lower than those behavioral thresholds (Hernandez et al., 2000). Earlier studies in cat and chinchilla (Javel et al., 1988) also found a lower threshold for changes in phase locking compared to firing rate. However, they suggested that there was a better correspondence between behavioral hearing thresholds and those based on the emergence of significant phase locking. So there is clearly information about the sound in the temporal patterning of neural activity, sometimes at levels well below the threshold of hearing. Either that information is lost along the auditory pathway, or the decision about hearing or not hearing is based only on a noticeable increase in firing rate. Neurons in auditory cortex, however, can signal the presence or absence of a sound better on the basis of firings synchronized across neurons in different cortical areas than on increases in firing rate (Eggermont, 2000a).
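The comparison behind these threshold studies rests on the vector strength measure of phase locking (Goldberg and Brown's metric, the VS of the abbreviations list). The sketch below simulates two spike trains with the same mean rate; in one the spike times are weakly pulled toward a stimulus phase, and the Rayleigh statistic 2nVS² flags the phase locking even though the rate gives no hint of a stimulus. The simulation parameters are arbitrary.

```python
# Vector strength can reveal a stimulus while the firing rate cannot:
# both trains below have identical rates, one is weakly phase-locked.
import numpy as np

def vector_strength(spike_times, freq):
    return np.abs(np.mean(np.exp(2j * np.pi * freq * spike_times)))

rng = np.random.default_rng(4)
freq, dur, rate = 400.0, 10.0, 60.0                # Hz, s, spikes/s
spont = np.sort(rng.uniform(0.0, dur, rng.poisson(rate * dur)))
# Nudge each spike slightly toward a preferred stimulus phase.
driven = spont - 0.3 * np.sin(2 * np.pi * freq * spont) / (2 * np.pi * freq)

for name, s in (("spontaneous", spont), ("driven", driven)):
    vs = vector_strength(s, freq)
    rayleigh = 2 * s.size * vs ** 2                # > 13.8 means p < 0.001
    print(f"{name}: rate {s.size / dur:.0f}/s, VS {vs:.2f}, 2nVS^2 {rayleigh:.0f}")
```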

Information theory, the most rigorous way to quantify the content of neural representations, is an aspect of probability theory that was developed in the late 1940s by Shannon (1948) as a mathematical framework for quantifying information transmission in communication systems. Shannon's information theory can be considered a generalized measure of stimulus discriminability. Given an a priori known set of stimuli, the amount by which the uncertainty about the stimulus, H(S), is reduced after observing a response, H(S|R), is called the mutual information:

I(R;S) = H(S) - H(S|R)    (1)

In general, the mutual information can be written as a function of both the conditional and marginal stimulus and response probabilities:

I(R;S) = -\sum_i p(S_i) \log p(S_i) + \sum_j p(R_j) \sum_i p(S_i|R_j) \log p(S_i|R_j)    (2)

and can be expressed in bits/spike. Here p(Si) is the a priori distribution of stimulus parameter Si (the one to be discriminated), and p(Si|Rj) is the a posteriori conditional distribution of the stimulus parameter Si for a given response Rj. This equation implies that in order to evaluate the conditional entropy H(S|R), one first has to convert back from the neuronal response distribution to a stimulus probability distribution by means of Bayes' rule:

p(S|R) p(R) = p(R|S) p(S)    (3)
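To make Eqs. 1-3 concrete, the following minimal Python sketch (with hypothetical stimulus and response probabilities, not data from any study cited here) computes the mutual information for a small discrete stimulus set, using Bayes' rule to obtain p(Si|Rj):

    import numpy as np

    # Hypothetical example: 3 stimuli, 4 discrete response categories.
    # p_S[i] = a priori probability p(Si); p_R_given_S[i, j] = p(Rj|Si).
    p_S = np.array([0.5, 0.3, 0.2])
    p_R_given_S = np.array([[0.7, 0.2, 0.1, 0.0],
                            [0.1, 0.6, 0.2, 0.1],
                            [0.0, 0.1, 0.3, 0.6]])

    # Marginal response distribution: p(Rj) = sum_i p(Si) p(Rj|Si)
    p_R = p_S @ p_R_given_S

    # Bayes' rule (Eq. 3): p(Si|Rj) = p(Rj|Si) p(Si) / p(Rj)
    p_S_given_R = p_R_given_S * p_S[:, None] / p_R[None, :]

    def entropy(p):
        # Shannon entropy in bits, ignoring zero-probability entries
        p = p[p > 0]
        return -np.sum(p * np.log2(p))

    # H(S) and the conditional entropy H(S|R) = sum_j p(Rj) H(S|Rj)
    H_S = entropy(p_S)
    H_S_given_R = sum(p_R[j] * entropy(p_S_given_R[:, j])
                      for j in range(len(p_R)))

    # Mutual information (Eqs. 1 and 2)
    I_RS = H_S - H_S_given_R
    print(f"H(S) = {H_S:.3f} bits, H(S|R) = {H_S_given_R:.3f} bits, "
          f"I(R,S) = {I_RS:.3f} bits")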

This is easy for constant stimulus sets (e.g. containing nine morphed cat vocalizations, Gehr et al., 2000) but not for natural stimulus ensembles. The stochastic mapping of the sensory environment S onto the set of neural activity patterns R, p(R|S), forms the neural representation of the stimuli that make up the sensory environment. This is what is usually studied experimentally. The inverse mapping p(S|R) gives the plausibility of a sensory stimulus S given the recorded neural activity R, and can be considered as the sensory interpretation of neural activity (Johannesma, 1981). This is what the animal uses to navigate in perceptual space. We have previously called these two approaches the experimenter-centered and subject-centered approaches (Eggermont et al., 1983a,b). The analysis of neurophysiological data from the point of view of the subject is generally done via the reverse correlation method (de Boer, 1967, 1969; de Boer and de Jongh, 1978). In a specific implementation this may result in the spectro-temporal receptive field (STRF), approximately equal


to the average spectrogram of sounds preceding a spike (Eggermont et al., 1983a,b; Kim and Young, 1994; deCharms et al., 1998; Theunissen and Doupe, 1998). This reverse correlation approach glorifies the information residing in the `single spike', rather than in the population activity (see also Rieke et al., 1997). The result is either the average preferred waveform, the reverse correlation function (in case there is phase locking of discharges to the sound), or the average preferred frequency-time distribution of sounds preceding the spikes. Such preferred sounds can be considered optimal in terms of being matched to the neuron's spatio-temporal response properties (deCharms et al., 1998), but they do not necessarily evoke the highest firing rates.
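The reverse correlation estimate of the STRF can be sketched as a spike-triggered average of the stimulus spectrogram. The fragment below uses synthetic noise and random spike times purely to show the computation; the window length and bin size are arbitrary illustrative choices. With unstructured noise and spikes that do not depend on the stimulus the average is flat; structure emerges only when firing actually depends on the preceding sound.

    import numpy as np

    rng = np.random.default_rng(0)

    # Synthetic 'spectrogram': n_freq channels x n_time 1-ms bins of noise
    n_freq, n_time = 32, 10_000
    spectrogram = rng.normal(size=(n_freq, n_time))

    # Synthetic spike train and a 50-ms pre-spike analysis window
    spike_bins = np.flatnonzero(rng.random(n_time) < 0.02)
    n_lags = 50

    # Spike-triggered average: mean spectrogram segment preceding each spike
    segments = [spectrogram[:, t - n_lags:t] for t in spike_bins if t >= n_lags]
    strf = np.mean(segments, axis=0)

    print(strf.shape)   # (32, 50): frequency x time-before-spike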

A universal finding in information calculations carried out for peripheral neurons is a relatively high importance of single spikes (Rieke et al., 1997), in the sense that the mutual information per spike is high. Information measures in bits per spike do not translate directly into spike timing precision in ms, but in the linear case the high-frequency cut-off of stimulus encoding corresponds to the limiting accuracy of spike timing. In general, the auditory periphery and brainstem are highly specialized to process timing information.

Several pertinent questions have to be answered. Can p(S) be defined in general? Does this definition require inclusion of the entire acoustic biotope with all frequencies of occurrence for all the individual sounds or sound combinations? Does this have to be weighted by the meaning of the sound, e.g. its meaning for reproduction or survival? Can mutual information change by learning? Is this reflected in a change in H(R|S)? Could one redefine the probability of occurrence of a stimulus p(S), and thus H(S), by its probability of impact but leave H(R|S) untouched? This is not practical in most experiments.

Similarly, H(R), the uncertainty about (or entropy of) the neural response, corresponds to the number of bits required to specify all possible responses under all possible stimulus conditions. H(S|R) is the entropy remaining in the stimulus once the neural responses are known. Adding the uncertainty remaining in the neural response when the stimulus is known, the neuronal noise H(R|S), to I(R,S) gives the total neural entropy, H(R). Therefore an alternative expression for the mutual information is:

I(R,S) = H(R) − H(R|S).    (4)

Because H(R) represents the maximal information that could be carried by the neuron being studied, comparing H(R|S) to H(R) also gives an estimate of the neural code's efficiency.
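Expressed in code, this efficiency comparison is immediate once the entropies of Eq. 4 are in hand; the response distributions below are again hypothetical (a sketch, not measured data):

    import numpy as np

    def entropy(p):
        p = np.asarray(p, dtype=float)
        p = p[p > 0]
        return -np.sum(p * np.log2(p))

    # Hypothetical two-stimulus experiment: rows give p(R|S) per stimulus
    p_S = np.array([0.5, 0.5])
    p_R_given_S = np.array([[0.7, 0.2, 0.1, 0.0],
                            [0.1, 0.4, 0.3, 0.2]])
    p_R = p_S @ p_R_given_S                      # marginal response distribution

    H_R = entropy(p_R)                           # maximal information capacity
    H_R_given_S = sum(p_S[i] * entropy(p_R_given_S[i]) for i in range(len(p_S)))

    I_RS = H_R - H_R_given_S                     # Eq. 4
    print(f"I = {I_RS:.3f} bits, H(R) = {H_R:.3f} bits, "
          f"efficiency = {I_RS / H_R:.2f}")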

The amount of information present in the spike train can also be estimated by a comparison of the reconstructed stimulus and the original stimulus. Stimulus reconstruction can be done by, e.g., replacing each spike in a spike train by its reverse correlation function (Johannesma, 1981; Gielen et al., 1988) or by substituting the STRF (Hesselmans and Johannesma, 1989; Theunissen et al., 2000). The lower bound of the information present in the spike train can then be obtained by integrating −log2(1 − γ²) across frequency, where γ² is the squared coherence between the reconstructed signal and the original (Borst and Theunissen, 1999).
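A minimal sketch of this coherence-based lower bound, with the `reconstruction' simulated as an attenuated copy of a synthetic stimulus plus independent noise (the estimator settings are illustrative choices, not values from Borst and Theunissen, 1999):

    import numpy as np
    from scipy.signal import coherence

    rng = np.random.default_rng(1)
    fs = 1000.0                                  # sampling rate, Hz
    stimulus = rng.normal(size=100_000)

    # Simulated reconstruction: scaled stimulus plus independent noise
    reconstruction = 0.8 * stimulus + 0.6 * rng.normal(size=stimulus.size)

    # Squared coherence gamma^2(f) between original and reconstruction
    f, gamma2 = coherence(stimulus, reconstruction, fs=fs, nperseg=1024)

    # Lower bound on the information rate: integral of -log2(1 - gamma^2) df
    df = f[1] - f[0]
    info_rate = np.sum(-np.log2(1.0 - gamma2)) * df
    print(f"lower-bound information rate ~ {info_rate:.0f} bits/s")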

The `direct method' estimates information transfer without making assumptions about how the stimulus is encoded: it calculates information directly from the neural response by estimating H(R) and H(R|S). It estimates exactly the average information transmitted, but does not reveal what aspects of the stimulus are being encoded. One can evaluate these information rates by using two types of stimuli. (1) For the evaluation of the first term, H(R), the total spike train uncertainty, one needs a large range of stimuli drawn randomly from a given stimulus ensemble, because these stimuli have to test the limits of response variability. (2) The second term, H(R|S), reflects the variability of the response when the stimulus is fixed, and can be evaluated from responses to a typical stimulus instance repeated many times. These newly developed information-theoretic methods (Buracas and Albright, 1999) allow one to quantify the degree to which neuronal representations become more abstract, by selective loss of irrelevant information, as one proceeds from the periphery to the CNS. It appears that most information carried by spikes is about the timing of abrupt variations in the stimulus, i.e. about the stimulus contours. It may thus well be that the neural code mostly reflects a sequence of changes in neural activity.
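The two-stimulus-type recipe can be sketched as a naive plug-in estimate on binary spike `words'; the word length and the substantial sampling bias of such estimates are glossed over here, and the spike trains are synthetic:

    import numpy as np
    from collections import Counter

    rng = np.random.default_rng(2)

    def plugin_entropy(counts):
        # Plug-in entropy (bits/word) from a Counter of word occurrences
        p = np.array(list(counts.values()), dtype=float)
        p /= p.sum()
        return -np.sum(p * np.log2(p))

    def total_entropy(trains, word_len=4):
        # H(R): entropy of words pooled over time and trials
        words = Counter()
        for train in trains:
            for i in range(0, len(train) - word_len + 1, word_len):
                words[tuple(train[i:i + word_len])] += 1
        return plugin_entropy(words)

    def noise_entropy(repeats, word_len=4):
        # H(R|S): entropy of words across repeats at each time, then averaged
        trials = np.asarray(repeats)
        h = [plugin_entropy(Counter(tuple(t) for t in trials[:, i:i + word_len]))
             for i in range(0, trials.shape[1] - word_len + 1, word_len)]
        return float(np.mean(h))

    n_bins, n_trials = 1000, 50

    # (1) Many different stimuli (different random rate profiles) -> H(R)
    unique = [(rng.random(n_bins) < rng.uniform(0.02, 0.2, n_bins)).astype(int)
              for _ in range(n_trials)]

    # (2) One 'frozen' stimulus (fixed rate profile) repeated -> H(R|S)
    profile = rng.uniform(0.02, 0.2, n_bins)
    repeats = [(rng.random(n_bins) < profile).astype(int)
               for _ in range(n_trials)]

    I = total_entropy(unique) - noise_entropy(repeats)
    print(f"I ~ {I:.2f} bits/word")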

7. Parallel distributed processing and a specialization for representing time characterize the auditory system

Sound is special among sensory stimuli. In contrast to the visual and somatosensory systems, where stimulus location is directly mapped onto the receptor surface, sound source localization has to be computed from interaural spectral and timing differences. The position of a sound source produces only minute time of arrival differences at the two ears: in the human, at most 800 μs for a sound located along the axis through the ears. On that basis, we are able to distinguish differences in location that produce about 10 μs interaural time difference (corresponding to a path length difference of approximately 3.5 mm).
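These numbers follow directly from the speed of sound; a worked check, assuming c ≈ 343 m/s in air:

    # Path-length difference corresponding to an interaural time difference
    c = 343.0                   # speed of sound in air, m/s

    itd_max = 800e-6            # ~maximal human ITD, s (source on the interaural axis)
    itd_jnd = 10e-6             # ~just-noticeable ITD, s

    print(f"max path difference: {c * itd_max * 100:.0f} cm")    # ~27 cm
    print(f"JND path difference: {c * itd_jnd * 1000:.1f} mm")   # ~3.4 mm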


The specialization of the auditory system for the accurate processing of time starts in the hair cells and finds its basis in the purely mechanically operated transduction channels (Hudspeth, 1989). The openings and closings of these channels can follow the fastest of sound frequencies, so that the hair cell is depolarized and hyperpolarized with complete fidelity and in synchrony with the sound frequency. The synapse between the inner hair cells (IHCs) and the AN fibers is also specialized to reproduce as fine a time resolution as possible (Trussell, 1997; Hudspeth, 1999). This exquisite temporal sensitivity is also the basis for a multiplexed representation of sound: a spectral one and a temporal one. Parallel processing allows the initial segregation of localization and identification pathways.

    7.1. Parallel processing between cochlea and IC

The ear performs at least three operations on a complex sound besides localizing it. The first is separating the individual frequency components from several that are simultaneously present (e.g. in vowel formants). This spectral complexity of the stimulus is the main determinant of its perceived timbre. It is also important, together with temporal cues, for determining the pitch of a complex sound, and this information is essential for the auditory system to differentiate between two speakers. The second operation for the ear is to enhance the spectral and temporal contrasts of the resolved frequency components in order to compensate for the poor signal to noise ratios in naturally occurring sounds. The third operation is to extract and abstract the behaviorally meaningful parameters from the results of the peripheral spectral analysis (Plomp, 1976; Evans, 1992). The first two of these tasks are performed in the CN.

The output of the VCN largely follows the anatomical anterior and posterior subdivisions: AVCN spherical cell and GBC output is involved in the localization of sound and projects to the SOC. In contrast, the PVCN is only involved in the identification of sound and its output bypasses the SOC to project to the monaural nuclei of the LL. The sound localization pathways are indicated in red in Fig. 2. The SBCs in the AVCN preserve and convey the timing information of AN fibers bilaterally to the medial superior olive (MSO). The responses of SBCs to tones are sharply tuned and phase-locked for frequencies below 3-5 kHz. The projection patterns of the large SBCs produce delay lines in the MSO.

Fig. 2. Simplified scheme of the afferent pathways in the auditory system up to the IC in the cat. Various levels of parallel processing and convergence are noted. First of all, low-SR fibers have partially segregated projections from medium- and high-SR fibers (here indicated by `high SR'). After the separate processing in the major cell groups of the VCN and DCN, the pathways follow a distinct course for the sound localization processes (red) and the sound identification processes (black), or belong to both at some point on their course (gray). All pathways converge in the IC; however, only the ICC is shown. Not all known pathways are included, and the output from the DCN is not dealt with at all.


Most axons terminate in an isofrequency band in the contralateral MSO, with axon collaterals that vary systematically in length, being shortest medially and longest laterally. As a consequence of this projection pattern, neurons in the MSO are activated as a systematic function of the location of sound in the horizontal plane (Smith et al., 1993; Oertel, 1999).
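This delay-line arrangement is the classic Jeffress-type scheme. A minimal sketch (idealized signals and delays, not a model of MSO biophysics) shows how a systematic gradient of internal delays converts an ITD into a place of maximal coincidence:

    import numpy as np

    fs = 100_000.0                               # sampling rate, Hz
    t = np.arange(0, 0.05, 1 / fs)
    freq = 500.0                                 # tone within the phase-locking range
    itd = 200e-6                                 # external interaural time difference, s

    left = np.sin(2 * np.pi * freq * t)
    right = np.sin(2 * np.pi * freq * (t - itd))

    def delay(x, d):
        # Delay signal x by d seconds (zero-padded at the front)
        n = int(round(d * fs))
        return np.r_[np.zeros(n), x[:x.size - n]]

    # Detector with internal delay d gets the left input delayed by d and the
    # right input delayed by (d_max - d); output ~ summed product of the
    # half-wave rectified inputs (a crude coincidence detector)
    d_max = 400e-6
    internal = np.linspace(0.0, d_max, 41)
    resp = [np.sum(np.clip(delay(left, d), 0, None) *
                   np.clip(delay(right, d_max - d), 0, None))
            for d in internal]

    best = internal[int(np.argmax(resp))]
    print(f"best differential delay: {(2 * best - d_max) * 1e6:.0f} us "
          f"(applied ITD: {itd * 1e6:.0f} us)")

The detector whose internal delay difference compensates the external ITD receives coincident inputs and responds maximally, so stimulus ITD maps onto position in the array.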

The sound identification pathways in Fig. 2 are indicated by black (completely separate from the localization path) and gray (splitting off from the localization path). Stellate cell output forms a major, direct pathway from the CNs to the contralateral midbrain, but also to the ipsilateral peri-olivary nuclei and to the contralateral ventral nucleus of the trapezoid body and the ventral nucleus of the LL (VNLL). The sharply tuned, tonic responses of stellate cells to tones carry essential acoustic information: each cell encodes the presence of energy in a narrow acoustic band with tonic firing. The firing of the population of stellate cells can thus provide a rate representation of the spectral content of sounds.

Almost all of the monaural and binaural pathways from the lower auditory system project to the ICC. Neurons in the ICC project to the auditory forebrain. Thus, the cellular anatomy of the IC may provide the substrate for integration of the ascending pathways on single projection neurons (Oliver and Huerta, 1992). The operation of the IC consists largely of integrating inhibitory and excitatory inputs, which themselves have different temporal properties and so interact to produce filters for temporal features of sound (Keller and Takahashi, 2000). The filters appear tailored for the analysis of the sound envelope, such as estimation of the duration, the envelope modulation rate, or the rate of FM (Covey and Casseday, 1999).

Spatial information represented in IC cells is conveyed both to AI and to the SC in the midbrain of mammals, or to their homologues: field L and the optic tectum in birds. In the midbrain pathway, the location of an auditory stimulus comes to be represented in a map of space. The first step towards creating a space map takes place in non-tonotopic subdivisions of the IC. These subdivisions are sites where information about spatial cues is combined across frequency channels, yielding neurons that are broadly tuned for frequency and tuned for sound source location. Unlike neurons in the tonotopic pathway, neurons in these areas are far more responsive to complex sounds than they are to tonal stimuli. By integrating information across frequency channels in a non-linear fashion, they eliminate spatial ambiguities that are inherent to frequency-specific cues and become tuned for a single sound source location. This transformation, of a frequency-specific code for spatial cues into a topographic code for space in the ICX, has been described in several species. The output of the space processing regions of the IC is conveyed to the SC where, in all species studied so far, an auditory map of space exists.

At this stage it is appropriate to ask what has happened between the AN, the origin of the distributed processing, and the ICC, where all the tonotopic pathways converge. In the AN, STRFs describe the preferred sound for individual fibers succinctly and can be used to predict the response to any sound (Kim and Young, 1994). STRFs in the midbrain of frogs (Aertsen and Johannesma, 1980, 1981; Aertsen et al., 1980, 1981; Hermes et al., 1981; Epping and Eggermont, 1985) are more complex than those for the AN. These STRFs only qualitatively predicted the responses of individual cells to complex sounds that were sufficiently different from the Gaussian noise with which the STRF was determined (Eggermont et al., 1983a). In fact, the STRFs in frog midbrain are very similar to those obtained in AI of the cat (deCharms et al., 1998) and field L of birds (Theunissen and Doupe, 1998). Thus one wonders how well responses to sounds that differ from those used to estimate the STRF can be predicted in these areas.

Most of the response types described for the CN are found in the IC. Non-monotonic rate-intensity functions appear in the ICC, potentially reflecting similar non-monotonic input from the dorsal CN (DCN) (Aitkin, 1986). The AN and ICC are both tonotopically organized; however, whereas the AN has a smooth representation of CF, the ICC shows a step-wise progression of CF along its main topographic axis, thought to reflect a framework for the representation of psychophysical critical bands. Combined herewith, a smooth frequency gradient exists orthogonal to the main frequency axis (Schreiner and Langner, 1997). Thus the ICC comprises both analytic and integrative properties in the frequency domain. In addition, periodicity information in the cat appears also to be topographically organized orthogonal to the fine structure of the frequency representation (Langner and Schreiner, 1996). The co-localization, albeit along orthogonal axes, of frequency pitch and periodicity pitch in the ICC may well provide for an organization similar to that proposed for auditory cortex (Schulze and Langner, 1997).

In order to explain a potential topographic mapping of periodicity in the ICC (a periodotopic map), a coincidence detection mechanism between onset activity and chopper activity was proposed (Langner, 1992; Langner and Schreiner, 1996). Combining this with the correlation between best modulation frequency (BMF) and CF, the mapping would follow automatically. However, as Krishna and Semple (2000) recently pointed out, because of the dependence of the rate modulation transfer function (rMTF) on sound pressure level (SPL), such a map is likely not level tolerant. In addition,


they considered it more likely that the band-pass rMTFs seen in IC neurons, interpreted as an indication of rate coding of periodicity, result from coincidence detection of synchronized excitatory inputs. This converts the peak of a temporal modulation transfer function (tMTF) in the CN to an rMTF peak in the IC. Coincidence detection mechanisms usually destroy temporal information present in phase-locked firings, but if preferred intervals exist they may survive this process (Cariani, 1999b). This loss of temporal information can be seen in the low-pass or band-pass shape of the synchronized firings as a function of modulation frequency in the VCN (Møller, 1973). Thus, it is no surprise that the ICC is the first structure where the firing rate is tuned to AM frequency. Consequently, the IC has been proposed as the place where certain aspects of temporal information present in the CNs are transformed into a rate-place code (Langner, 1981). However, because of the periodic nature of the phase-locked firings, the output of the coincidence detector will still be periodic, and a representation of synchronized firing still exists in the ICC, albeit one that is largely identical to that for firing rate (Epping and Eggermont, 1986). Krishna and Semple (2000) also demonstrated that most IC neurons responded with significant synchrony to modulation frequencies up to 300 Hz. Thus considerable temporal information remains in the response in the range of modulation frequencies where most BMFs for the rMTFs lie. The emergence of rate tuning in the IC does not necessarily preclude the possibility that information about modulation frequency is also present in a temporal code.
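Synchrony to a modulation frequency, as quantified in these studies, is conventionally measured by the vector strength (VS): each spike time is mapped to a phase within the modulation cycle and the resulting unit vectors are averaged. A minimal sketch with synthetic spike times:

    import numpy as np

    def vector_strength(spike_times, mod_freq):
        # 1 = perfect phase locking to mod_freq (Hz); 0 = uniform phases
        phases = 2 * np.pi * mod_freq * np.asarray(spike_times)
        return np.hypot(np.mean(np.cos(phases)), np.mean(np.sin(phases)))

    # Spikes jittered around one phase of a 100 Hz modulation
    rng = np.random.default_rng(3)
    spikes = np.arange(0.0, 1.0, 0.01) + rng.normal(0.0, 0.5e-3, size=100)

    print(f"VS at 100 Hz: {vector_strength(spikes, 100.0):.2f}")  # near 1
    print(f"VS at  37 Hz: {vector_strength(spikes, 37.0):.2f}")   # near 0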

It is tempting to consider the specialization of the auditory system for timing as restricted to the levels below the IC. It is at these levels that accurate temporal representations for space and pitch are found, and it is likely that most of these are converted into rate-place representations in the IC. As a consequence, processing of auditory information above the IC should potentially be compared to processing in other sensory modalities. This would allow answering the question of whether a specialization for processing related to the nature of sound remains. If this is not the case, then neural coding strategies found in the somatosensory and visual modalities may also apply to the auditory system.

    7.2. Parallel processing between IC and auditory cortex

The pathways of the IC out of the midbrain (Fig. 3) comprise: (1) a lemniscal pathway (black), which originates in the ICC, terminates in the ventral division of the medial geniculate body (MGBv), and continues to AI; (2) a lemniscal adjunct pathway (fat gray lines), which originates in the cortical (ICX) and paracentral subdivisions of the IC and targets the deep dorsal, dorsal and ventro-lateral nuclei of the MGB, continuing to non-primary cortical areas; (3) a diffuse pathway (thin gray lines), which originates from neurons throughout the IC and lateral tegmentum and sends axons to the medial division of the MGB (Aitkin, 1986; Graybiel, 1973; Oliver and Huerta, 1992).

Cortex is a relatively new invention: it is found only in mammals, albeit that a homologue is found in archosaurs (birds and crocodiles). In most other vertebrates the pinnacle of sound processing is a homologue of the IC, such as the torus semicircularis in fish, frogs and toads, and lizards. Obviously, this provides sufficient information processing capacity for sensory events to prevent extinction of these animals.

Fig. 3. Simplified scheme of the afferent and efferent connections between the IC and auditory cortex of the cat. Five cortical areas are included and they are portrayed as hierarchical, based on Wallace et al. (1991). Most of the connections and their strengths are based on Winer (1990). Exquisite detail about thalamo-cortical connections is provided in Huang and Winer (2000). Distinctions are made between the lemniscal pathway (black), which originates in the ICC, and the lemniscal adjunct pathways (fat gray), which originate in the ICX. In addition there is a diffuse modulating pathway (thin gray lines) that projects to the superficial layer of all known cortical areas. The lemniscal pathway projects to the MGBv and the posterior group of nuclei in the thalamus (PO), and from there to AI, AAF and parts of field P. The lemniscal adjunct pathway projects to AII, P and VP. Strong reciprocal (efferent) connections exist between all cortical areas (fat red lines) and between cortical areas and subdivisions of the thalamus, as well as with the IC but excluding the ICC. Weaker efferent connections exist between AI and ICC.


What does the `new' supra-collicular processing network add to the representation of sound? How different is the sensory processing in this supra-collicular circuit from that in the visual and somatosensory modalities? Integration of sensory information with cognitive aspects is a likely computational outcome and imposes demands that are different from those of the topographic computational maps found in brainstem and midbrain that provide for fast processing. For instance, auditory space is represented in a clustered format (assemblies) in the forebrain (Cohen and Knudsen, 1999). The forebrain is likely to be essential for the selection of one auditory stimulus out of several possible stimuli, for the identification of sounds, and for remembering the location of stimuli that were heard in the recent past.

Clustering of neurons with similar response properties is a ubiquitous feature of functional organization throughout the CNS. One possibility is that clustered organizations result from competitive interactions among multiple, independent parameters, all competing for representational space in the same area, causing the representation of any single parameter to become severely disrupted. High-order maps (maps of higher order stimulus properties), e.g. those proposed for vowels (Ohl and Scheich, 1997) or phonemes, might be expected in areas of the brain that process aspects of perception or behavior for which the values of a parameter are the essence of the analysis. Examples are rise time or direction and rate of FM in animal vocalizations (Mendelson et al., 1993).

The most salient properties of cortical neurons are: (1) adjacent neurons generally fire independently under spontaneous conditions (Eggermont, 1992a,b,c) and slightly more synchronized under stimulation (Eggermont, 1994a,b); (2) cortico-cortical synapses are generally very weak, and the same neuron may be involved in several different processes (Abeles, 1982); (3) cortical neurons can, under the appropriate conditions, engage in higher processes for periods of over 1 s, and patterns of activity across several units can maintain strict timing even after delays of hundreds of ms. These properties support the view that the computing element in the cortex is a neuron population in which the activity of any given cell has very little impact (Abeles, 1988). An exception to this view has to be made for burst firing; a burst of only two or three spikes in an individual neuron could be sufficient to fire the receiving neuron in some brain areas (Lisman, 1997).

At least seven cortical areas in the cat have a representation of the cochlea, and there are potentially seven additional auditory areas (Winer, 1990; Huang and Winer, 2000). For six auditory cortical areas in cat, AI, the anterior auditory field (AAF), AII, V, the ventral posterior auditory field (VP), and P, information is available about the mapping of CF onto the spatial extent of the area (Reale and Imig, 1980). Only five of those areas are indicated in Fig. 3. For AI, a gradual shift from low to high CF is found from the caudal to the rostral end (Merzenich et al., 1975), which for CFs of about 35 kHz reverses direction, indicating that AAF is reached. At the low-frequency boundary of AI the frequency gradient also reverses, at the entrance into field P.

AI in cat is locally very homogeneous: at a given recording site, the CF, threshold at CF, FTC bandwidth and minimum latency of the separated single units were very similar (Eggermont, 1996). AI in cat is, however, not spatially homogeneous in the dorso-ventral direction, i.e. along the isofrequency sheets: the sharpness of tuning is greatest in the medial part and decreases towards both boundaries (Schreiner and Mendelson, 1990; Schreiner and Sutter, 1992; Schreiner et al., 2000). This was also found using ripple spectra, broad band signals sinusoidally modulated on a logarithmic frequency scale (Schreiner and Calhoun, 1994; Shamma et al., 1995), which are considered to be the building blocks of all complex sounds (Shamma, 1996; Wang and Shamma, 1995). The maps of characteristic ripple frequency Ω0 (in cycles/octave) in ferret AI exhibited two trends (Versnel et al., 1995). First, along the isofrequency planes, the largest values were grouped in one or two clusters near the middle of AI, with smaller values found towards the edges. Second, along the tonotopic axis, the maximum Ω0 in an isofrequency range increased with increasing CF. FTC bandwidth, which was inversely correlated with Ω0, exhibited similar distributions. The maps of the characteristic phase Φ0 (measured in radians relative to the phase of a sinewave starting at the low-frequency edge of the complex) also showed clustering along the isofrequency axis. At the center of AI, symmetrical responses (Φ0 ≈ 0) predominated. Toward the edges, the response fields (RFs) became more asymmetrical, with Φ0 < 0 caudally and Φ0 > 0 rostrally. The asymmetrical RFs tended to cluster along repeated bands that paralleled the tonotopic axis. The FM directional sensitivity (DS) tended to show trends along the isofrequency axis similar to those of Φ0. These findings suggest that AI cells can, in principle, function as ripple band-pass filters, analyzing an input spectral profile into separate channels tuned around different characteristic ripple frequencies. Equivalently, from the perspective of their response area bandwidths, they can be said to have a range of bandwidths so as to analyze the input spectral profile into different scales (Shamma, 1996).
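A ripple stimulus has a spectral envelope that is sinusoidal on a logarithmic frequency axis. The sketch below builds such an envelope; the symbols Ω0 (ripple density in cycles/octave) and Φ0 (ripple phase relative to the low-frequency edge) follow the notation above, while the frequency range, component count and modulation depth are arbitrary illustrative values:

    import numpy as np

    # Tone-complex frequencies, logarithmically spaced
    f_lo, f_hi, n_comp = 500.0, 8000.0, 200
    freqs = np.geomspace(f_lo, f_hi, n_comp)
    octaves = np.log2(freqs / f_lo)          # octave distance from the low edge

    omega0 = 1.0                             # ripple density, cycles/octave
    phi0 = 0.0                               # ripple phase, radians
    depth = 0.9                              # modulation depth (0..1)

    # Sinusoidal spectral envelope on the log-frequency axis
    envelope = 1.0 + depth * np.sin(2 * np.pi * omega0 * octaves + phi0)

    # envelope[k] scales the k-th tone; summing the tones with random phases
    # would yield the broad band ripple stimulus itself
    print(f"first ripple peak near {freqs[np.argmax(envelope)]:.0f} Hz")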

He et al. (1997) have suggested that the extreme dorsal part of AI forms a separate cortical area, specialized by its capacity for temporal integration and tuning to stimulus duration. This area also features long latencies


and broad multi-peaked FTCs (He and Hashikawa, 1998; Sutter and Schreiner, 1991). The FTCs in the dorsal part had a lateral inhibitory structure distinct from that in ventral AI: specifically, lateral suppression areas flanking the excitatory tuning curve on both sides were much less common (Sutter et al., 1999; Versnel et al., 1995). On this basis, Sutter et al. (1999) suggested that the dorsal part of AI is involved in analyzing complex spectra, whereas the ventral part would be poorly responsive to such broad band stimuli. In AAF, the high frequencies are located dorso-caudally and lower frequencies are found in the rostro-ventral direction (Knight, 1977; Phillips and Irvine, 1982).

AII in cat appeared not as well organized tonotopically as AI, and the units showed broader frequency tuning and higher thresholds to tone burst stimulation than in AI (Schreiner and Cynader, 1984). Recently, a double representation of the cochlea within the classical boundaries of AII in cat was demonstrated (Volkov and Galazyuk, 1991). One tonotopic representation was found in the dorso-caudal region (2.6-3.2 mm long) with a spatial orientation similar to that described in AI (low frequencies caudal, high frequencies rostral); the second, ventro-rostral region was smaller (1.4-2.5 mm) and had the opposite tonotopic orientation (i.e. as in AAF). The dorso-caudal region may be close to the transition zone between AI and AII as defined by Schreiner and Cynader (1984). Alternatively, it could be the ventral auditory cortex (Ve) as defined by Huang and Winer (2000). Thus AII, not unlike AI, may consist of several specialized subregions.

The posterior field of cat auditory cortex (P) is also characterized by a tonotopic organization, narrow V-shaped tuning curves and a dominance of non-monotonic rate-intensity functions for tone pips with short rise times. The latencies are generally longer than in AI, and the accuracy of first-spike latencies is likewise poorer (Heil and Irvine, 1998b). Thresholds are invariably higher than in AI (Reale and Imig, 1980; Phillips and Orman, 1984; Phillips et al., 1995). Field VP, caudal from field P, also has a tonotopic organization, as has field V, found caudally from AII and bordering the posterior ectosylvian sulcus (Reale and Imig, 1980). Non-tonotopically organized areas are field DP, dorsal from AI and bordering the anterior ectosylvian gyrus, field T, rostro-medial from AII (He and Hashikawa, 1998; Reale and Imig, 1980), and field AIII, medial from AAF (Winer, 1990).

At supra-threshold levels, the well organized topographic map in AI broadens in the direction of higher CFs and partially breaks down along the isofrequency contour to acquire a patchy appearance (Phillips et al., 1994). This has been attributed to the presence of alternating aggregations of monotonic and non-monotonic

units along the isofrequency contours. Heil and Irvine (1998a) showed that in cat AI there are orderly topographic organizations, along the isofrequency axis, of several neuronal properties related to the coding of the intensity of tones. These are minimum threshold, dynamic range, best SPL, and the non-monotonicity of spike count-intensity functions for tones at CF. Minimum threshold, dynamic range and best SPL are correlated and vary periodically along isofrequency strips (see also Schreiner et al., 2000). The steepness of the high-intensity descending slope of spike count-intensity functions also varies systematically, with the steepest slopes occurring in the regions along the isofrequency strip where low thresholds, narrow dynamic ranges and low best SPLs are found. As a consequence, CF tones of various intensities are represented by orderly and, for most intensities, periodic spatial patterns of distributed neuronal activity along an isofrequency strip. For low to moderate intensities, the mean relative activity along an entire isofrequency strip increases rapidly with intensity, with the spatial pattern of activity remaining quite constant along the strip. At higher intensities, however, the mean relative activity along the strip remains fairly constant with changes in intensity, but the spatial patterns change markedly. As a consequence of these effects, low and high intensity tones are represented by complementary distributions of activity alternating along an isofrequency strip. It was concluded that in AI tone intensity is represented by two complementary modes, viz. discharge rate and place. Besides that, sharpness of tuning, response strength (a combination of threshold and dynamic range) and temporal changes of stimulus spectrum (FM rate and DS) show independent topographical organizations within the isofrequency contours, and this suggests parallel and independent processing of these acoustic aspects in AI (Heil et al., 1992b).
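The monotonic/non-monotonic distinction can be made concrete with a toy rate-level parameterization (a sketch only, not a fit to the data of Heil and Irvine, 1998a): a sigmoid for a monotonic unit, multiplied by a high-level suppression term to produce a non-monotonic unit with a best SPL:

    import numpy as np

    def rate_level(spl, threshold=20.0, slope=0.3, best_spl=None, width=15.0):
        # Toy normalized spike count vs. level (dB SPL): a sigmoid, optionally
        # suppressed above best_spl to yield a non-monotonic function
        rate = 1.0 / (1.0 + np.exp(-slope * (spl - threshold - 10.0)))
        if best_spl is not None:
            rate *= np.exp(-np.clip(spl - best_spl, 0, None) ** 2
                           / (2 * width ** 2))
        return rate

    spls = np.arange(0.0, 90.0, 5.0)
    monotonic = rate_level(spls)
    non_monotonic = rate_level(spls, best_spl=40.0)
    print(f"non-monotonic peak near {spls[np.argmax(non_monotonic)]:.0f} dB SPL")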

Thalamo-cortical and intrinsic cortical connection patterns of AI in cat indicate a modular organization (Wallace et al., 1991; Clarke et al., 1993; Schreiner et al., 2000; Huang and Winer, 2000). Single thalamo-cortical fibers branch and terminate in patches in layers II-VI. Individual neurons in the MGBv can be double-labeled when two tracers are injected at different dorso-ventral positions along the isofrequency axis. Thus, spatially segregated subsets of neurons along the isofrequency axis of AI share a common input from the thalamus. It is possible that subregions of AI that receive common thalamic input also mutually innervate each other via horizontal connections in layer III. Thalamo-cortical patches in rabbits and cats are segregated by approximately 1.5 mm. This agrees with the finding that neuron pairs 0.2-1.0 mm apart tend to have fewer interactions than pairs 1.0-2.0 mm apart (Clarke et al., 1993; Eggermont, 1993b). Adjacent intrinsic clusters in the


narrowly tuned region of AI in cats are segregated at spatial intervals of the same range. Long-range intrinsic cortical connections in AI occur between clusters of neurons with similar CFs in an elongated patchy pattern that follows, and is confined to, the dorso-ventral isofrequency axis. A fundamentally different and more complex pattern, including a large degree of frequency convergence from patches outside the isofrequency axis, was observed with injections into the broad band subregions of dorsal and ventral AI. The data provide anatomical evidence for at least two spatially segregated systems of spectral integration in AI that may be related to the distinction between critical band (CB) and non-CB integration behavior.

In behaving macaque monkeys, a tonotopic organization exists for AI but not for the surrounding (belt and parabelt) areas. Topographic organization of other response parameters, previously demonstrated in the anesthetized cat, was not apparent in the behaving monkey (Recanzone et al., 2000a). Some evidence for hierarchical processing of neural activity related to sound localization, proposed by Rauschecker (1998) as part of a `where' and `what' pathway segregation (Rauschecker and Tian, 2000), was found between AI and area CM in behaving macaques (Recanzone et al., 2000b; Romanski et al., 1999). In general, whereas in cat most cortical fields are directly innervated by divisions of the MGB (Huang and Winer, 2000), the functional connection pattern in primates appears to be more hierarchical. The input from the ventral division of the MGB activates the three core areas of auditory cortex, these activate the numerous belt areas, which in turn activate the parabelt areas (Kaas and Hackett, 2000). Although there are direct projections from the dorsal and medial MGB to the belt areas in primates, these appear to modulate rather than activate. The almost complete interconnectivity pattern of auditory cortical areas in the cat is not present in primates; there are no direct connections from the core to the parabelt areas (Kaas and Hackett, 2000).

7.3. Efferent connections introduce non-linear dynamics into the auditory system

The auditory system is not just an afferent projection system but has a myriad of efferent connections that make it a reentrant system characterized by multiple, loosely interconnected, regional feedback loops (Spangler and Warr, 1991). At the lowest level there is a loop between the cochlea and the SOC, comprising the olivocochlear bundle. A second loop is found between the lower brainstem nuclei and the IC. A third loop is formed between the IC and the thalamo-cortical system which, in itself, consists of a feedback loop between thalamus and cortex. In Fig. 3, efferent connections

are shown in red. More specifically, the auditory cortex projects back to the MGB with 10 times as many fibers as the number of afferents from the MGB to auditory cortex. The auditory cortex also connects with the IC, but with exclusion of the central nucleus (Winer, 1990). The central and external IC subnuclei both project back to the DCN. The DCN in turn feeds back to the VCNs. It seems that the strongest contiguous projections from cortex to the periphery involve the nuclei of the extra-lemniscal pathway, including the DCN.

The olivocochlear bundle projects via its medial branch to the outer hair cells, thus regulating the slow motility of the outer hair cells and thereby the stiffness of the basilar membrane. Via its lateral branch it especially affects the AN fibers with low spontaneous rates synapsing with the IHCs (Spangler and Warr, 1991; Guinan, 1996). Activation of the olivocochlear bundle appears to improve the discriminability of signals, such as speech, in the presence of broad band noise by lowering the noise-induced background firing rate and increasing the dynamic range (Winslow and Sachs, 1987; Liberman, 1988). Chronic cochlear de-efferentation in adult chinchillas resulted in reduced spontaneous AN fiber firing rates, increased driven discharge rates, decreased dynamic range, an increased ratio of onset to steady-state discharge rate, and hypersensitive tails of the frequency tuning curves (Zheng et al., 1999). Clearly the efferent system plays a large role in maintaining the normal operating mode of the cochlea.

The role of this nested set of reentrant systems can only be speculated upon. One possible role is an involvement in expectancy. A subject has some expectancy about the probability of various environmental occurrences stored in its internal representation and in