
ARTICLE Communicated by Pamela Reinagel

Analyzing Neural Responses to Natural Signals: Maximally Informative Dimensions

Tatyana Sharpee
sharpee@phy.ucsf.edu
Sloan–Swartz Center for Theoretical Neurobiology and Department of Physiology, University of California at San Francisco, San Francisco, CA 94143, U.S.A.

Nicole C. Rust
rust@cns.nyu.edu
Center for Neural Science, New York University, New York, NY 10003, U.S.A.

William Bialek
wbialek@princeton.edu
Department of Physics, Princeton University, Princeton, NJ 08544, U.S.A., and Sloan–Swartz Center for Theoretical Neurobiology and Department of Physiology, University of California at San Francisco, San Francisco, CA 94143, U.S.A.

We propose a method that allows for a rigorous statistical analysis of neural responses to natural stimuli that are nongaussian and exhibit strong correlations. We have in mind a model in which neurons are selective for a small number of stimulus dimensions out of a high-dimensional stimulus space, but within this subspace the responses can be arbitrarily nonlinear. Existing analysis methods are based on correlation functions between stimuli and responses, but these methods are guaranteed to work only in the case of gaussian stimulus ensembles. As an alternative to correlation functions, we maximize the mutual information between the neural responses and projections of the stimulus onto low-dimensional subspaces. The procedure can be done iteratively by increasing the dimensionality of this subspace. Those dimensions that allow the recovery of all of the information between spikes and the full unprojected stimuli describe the relevant subspace. If the dimensionality of the relevant subspace indeed is small, it becomes feasible to map the neuron's input-output function even under fully natural stimulus conditions. These ideas are illustrated in simulations on model visual and auditory neurons responding to natural scenes and sounds, respectively.

Neural Computation 16, 223–250 (2004) © 2003 Massachusetts Institute of Technology


1 Introduction

From olfaction to vision and audition, a growing number of experiments are examining the responses of sensory neurons to natural stimuli (Creutzfeldt & Northdurft, 1978; Rieke, Bodnar, & Bialek, 1995; Baddeley et al., 1997; Stanley, Li, & Dan, 1999; Theunissen, Sen, & Doupe, 2000; Vinje & Gallant, 2000, 2002; Lewen, Bialek, & de Ruyter van Steveninck, 2001; Sen, Theunissen, & Doupe, 2001; Vickers, Christensen, Baker, & Hildebrand, 2001; Ringach, Hawken, & Shapley, 2002; Weliky, Fiser, Hunt, & Wagner, 2003; Rolls, Aggelopoulos, & Zheng, 2003; Smyth, Willmore, Baker, Thompson, & Tolhurst, 2003). Observing the full dynamic range of neural responses may require using stimulus ensembles that approximate those occurring in nature (Rieke, Warland, de Ruyter van Steveninck, & Bialek, 1997; Simoncelli & Olshausen, 2001), and it is an attractive hypothesis that the neural representation of these natural signals may be optimized in some way (Barlow, 1961, 2001; von der Twer & Macleod, 2001; Bialek, 2002). Many neurons exhibit strongly nonlinear and adaptive responses that are unlikely to be predicted from a combination of responses to simple stimuli; for example, neurons have been shown to adapt to the distribution of sensory inputs, so that any characterization of these responses will depend on context (Smirnakis, Berry, Warland, Bialek, & Meister, 1996; Brenner, Bialek, & de Ruyter van Steveninck, 2000; Fairhall, Lewen, Bialek, & de Ruyter van Steveninck, 2001). Finally, the variability of a neuron's responses decreases substantially when complex dynamical, rather than static, stimuli are used (Mainen & Sejnowski, 1995; de Ruyter van Steveninck, Lewen, Strong, Koberle, & Bialek, 1997; Kara, Reinagel, & Reid, 2000; de Ruyter van Steveninck, Borst, & Bialek, 2000). All of these arguments point to the need for general tools to analyze the neural responses to complex naturalistic inputs.

The stimuli analyzed by sensory neurons are intrinsically high-dimensional, with dimensions D ~ 10^2–10^3. For example, in the case of visual neurons, the input is commonly specified as light intensity on a grid of at least 10 × 10 pixels. Each of the presented stimuli can be described as a vector s in this high-dimensional stimulus space (see Figure 1). The dimensionality becomes even larger if stimulus history has to be considered as well. For example, if we are interested in how the past N frames of the movie affect the probability of a spike, then the stimulus s, being a concatenation of the past N samples, will have dimensionality N times that of a single frame. We also assume that the probability distribution P(s) is sampled during an experiment ergodically, so that we can exchange averages over time with averages over the true distribution as needed.

Although direct exploration of a D ~ 10^2–10^3-dimensional stimulus space is beyond the constraints of experimental data collection, progress can be made provided we make certain assumptions about how the response has been generated. In the simplest model, the probability of response can be described by one receptive field (RF) or linear filter (Rieke et al., 1997).


Figure 1: Schematic illustration of a model with a one-dimensional relevant subspace. ê1 is the relevant dimension, and ê2 and ê3 are irrelevant ones. Shown are three example stimuli s, s', and s'', the receptive field of a model neuron (the relevant dimension ê1), and our guess v for the relevant dimension. Probabilities of a spike P(spike|s·ê1) and P(spike|s·v) are calculated by first projecting all of the stimuli s onto each of the two vectors, ê1 and v, respectively, and then applying equations (2.3), (1.2), and (1.1) sequentially. Our guess v for the relevant dimension is adjusted during the progress of the algorithm in such a way as to maximize I(v) of equation (2.5), which makes vector v approach the true relevant dimension ê1.

The RF can be thought of as a template or special direction ê1 in the stimulus space¹ such that the neuron's response depends on only a projection of a given stimulus s onto ê1, although the dependence of the response on this projection can be strongly nonlinear (cf. Figure 1).

¹ The notation ê denotes a unit vector, since we are interested only in the direction the vector specifies and not in its length.


In this simple model, the reverse correlation method (de Boer & Kuyper, 1968; Rieke et al., 1997; Chichilnisky, 2001) can be used to recover the vector ê1 by analyzing the neuron's responses to gaussian white noise. In a more general case, the probability of the response depends on projections s_i = ê_i · s of the stimulus s on a set of K vectors {ê1, ê2, ..., êK}:

P(spike|s) = P(spike)\, f(s_1, s_2, \ldots, s_K), \qquad (1.1)

where P(spike|s) is the probability of a spike given a stimulus s and P(spike) is the average firing rate. In what follows, we will call the subspace spanned by the set of vectors {ê1, ê2, ..., êK} the relevant subspace (RS).² We reiterate that vectors {êi}, 1 ≤ i ≤ K, may also describe how the time dependence of the stimulus s affects the probability of a spike. An example of such a relevant dimension would be a spatiotemporal RF of a visual neuron. Although the ideas developed below can be used to analyze input-output functions f with respect to different neural responses, such as patterns of spikes in time (de Ruyter van Steveninck & Bialek, 1988; Brenner, Strong, Koberle, Bialek, & de Ruyter van Steveninck, 2000; Reinagel & Reid, 2000), for illustration purposes we choose a single spike as the response of interest.³

Equation 1.1 in itself is not yet a simplification if the dimensionality K of the RS is equal to the dimensionality D of the stimulus space. In this article, we will assume that the neuron's firing is sensitive only to a small number of stimulus features (K ≪ D). While the general idea of searching for low-dimensional structure in high-dimensional data is very old, our motivation here comes from work on the fly visual system, where it was shown explicitly that patterns of action potentials in identified motion-sensitive neurons are correlated with low-dimensional projections of the high-dimensional visual input (de Ruyter van Steveninck & Bialek, 1988; Brenner, Bialek, et al., 2000; Bialek & de Ruyter van Steveninck, 2003).

² Since the analysis does not depend on a particular choice of a basis within the full D-dimensional stimulus space, for clarity we choose the basis in which the first K basis vectors span the relevant subspace and the remaining D − K vectors span the irrelevant subspace.

³ We emphasize that our focus here on single spikes is not equivalent to assuming that the spike train is a Poisson process modulated by the stimulus. No matter what the statistical structure of the spike train is, we always can ask what features of the stimulus are relevant for setting the probability of generating a single spike at one moment in time. From an information-theoretic point of view, asking for stimulus features that capture the mutual information between the stimulus and the arrival times of single spikes is a well-posed question, even if successive spikes do not carry independent information. Note also that spikes' carrying independent information is not the same as spikes' being generated as a Poisson process. On the other hand, if (for example) different temporal patterns of spikes carry information about different stimulus features, then analysis of single spikes will result in a relevant subspace of artifactually high dimensionality. Thus, it is important that the approach discussed here carries over without modification to the analysis of relevant dimensions for the generation of any discrete event, such as a pattern of spikes across time in one cell or synchronous spikes across a population of cells. For a related discussion of relevant dimensions and spike patterns using covariance matrix methods, see de Ruyter van Steveninck and Bialek (1988) and Agüera y Arcas, Fairhall, and Bialek (2003).


The input-output function f in equation 1.1 can be strongly nonlinear, but it is presumed to depend on only a small number of projections. This assumption appears to be less stringent than that of approximate linearity, which one makes when characterizing a neuron's response in terms of Wiener kernels (see, e.g., the discussion in section 2.1.3 of Rieke et al., 1997). The most difficult part in reconstructing the input-output function is to find the RS. Note that for K > 1, a description in terms of any linear combination of vectors {ê1, ê2, ..., êK} is just as valid, since we did not make any assumptions as to a particular form of the nonlinear function f.

Once the relevant subspace is known, the probability P(spike|s) becomes a function of only a few parameters, and it becomes feasible to map this function experimentally, inverting the probability distributions according to Bayes' rule:

f(s_1, s_2, \ldots, s_K) = \frac{P(s_1, s_2, \ldots, s_K|spike)}{P(s_1, s_2, \ldots, s_K)}. \qquad (1.2)
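As a concrete illustration of equation 1.2, the sketch below estimates the input-output function as a ratio of binned histograms once a set of candidate relevant dimensions is in hand. This is a minimal numpy sketch, not the authors' code; the array names (stimuli, spikes, e_vectors) and the binning choices are placeholder assumptions.

```python
import numpy as np

def estimate_nonlinearity(stimuli, spikes, e_vectors, n_bins=15):
    """Estimate f(s1,...,sK) ~ P(s1,...,sK|spike) / P(s1,...,sK) by histogramming.

    stimuli   : (N, D) array, one stimulus vector per row
    spikes    : (N,)  array, spike count evoked by each stimulus
    e_vectors : (K, D) array whose rows span the (assumed known) relevant subspace
    """
    proj = stimuli @ e_vectors.T                        # projections s_i = e_i . s
    edges = [np.linspace(p.min(), p.max(), n_bins + 1) for p in proj.T]
    p_all, _ = np.histogramdd(proj, bins=edges)                     # ~ P(s1,...,sK)
    p_spk, _ = np.histogramdd(proj, bins=edges, weights=spikes)     # ~ P(s1,...,sK|spike)
    p_all /= p_all.sum()
    p_spk /= p_spk.sum()
    # ratio of the two histograms; values are in units of the average firing rate
    f = np.divide(p_spk, p_all, out=np.zeros_like(p_spk), where=p_all > 0)
    return f, edges
```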

If stimuli are chosen from a correlated gaussian noise ensemble, then the neural response can be characterized by the spike-triggered covariance method (de Ruyter van Steveninck & Bialek, 1988; Brenner, Bialek, et al., 2000; Schwartz, Chichilnisky, & Simoncelli, 2002; Touryan, Lau, & Dan, 2002; Bialek & de Ruyter van Steveninck, 2003). It can be shown that the dimensionality of the RS is equal to the number of nonzero eigenvalues of a matrix given by a difference between covariance matrices of all presented stimuli and stimuli conditional on a spike. Moreover, the RS is spanned by the eigenvectors associated with the nonzero eigenvalues multiplied by the inverse of the a priori covariance matrix. Compared to the reverse correlation method, we are no longer limited to finding only one of the relevant dimensions {êi}, 1 ≤ i ≤ K. Both the reverse correlation and the spike-triggered covariance method, however, give rigorously interpretable results only for gaussian distributions of inputs.

In this article, we investigate whether it is possible to lift the requirement for stimuli to be gaussian. When using natural stimuli, which certainly are nongaussian, the RS cannot be found by the spike-triggered covariance method. Similarly, the reverse correlation method does not give the correct RF, even in the simplest case where the input-output function in equation 1.1 depends on only one projection (see appendix A for a discussion of this point). However, vectors that span the RS are clearly special directions in the stimulus space, independent of assumptions about P(s). This notion can be quantified by the Shannon information. We note that the stimuli s do not have to lie on a low-dimensional manifold within the overall D-dimensional space.⁴ However, since we assume that the neuron's input-output function depends on a small number of relevant dimensions, the ensemble of stimuli conditional on a spike may exhibit clear clustering. This makes the proposed method of looking for the RS complementary to the clustering of stimuli conditional on a spike done in the information bottleneck method (Tishby, Pereira, & Bialek, 1999; see also Dimitrov & Miller, 2001). Noninformation-based measures of similarity between probability distributions P(s) and P(s|spike) have also been proposed to find the RS (Paninski, 2003a).

To summarize, our assumptions are:

• The sampling of the probability distribution of stimuli P(s) is ergodic and stationary across repetitions. The probability distribution is not assumed to be gaussian. The ensemble of stimuli described by P(s) does not have to lie on a low-dimensional manifold embedded in the overall D-dimensional space.

• We choose a single spike as the response of interest (for illustration purposes only). An identical scheme can be applied, for example, to particular interspike intervals or to synchronous spikes from a pair of neurons.

• The subspace relevant for generating a spike is low dimensional and Euclidean (cf. equation 1.1).

• The input-output function, which is defined within the low-dimensional RS, can be arbitrarily nonlinear. It is obtained experimentally by sampling the probability distributions P(s) and P(s|spike) within the RS.

The article is organized as follows. In section 2, we discuss how an optimization problem can be formulated to find the RS. A particular algorithm used to implement the optimization scheme is described in section 3. In section 4, we illustrate how the optimization scheme works with natural stimuli for model orientation-sensitive cells with one and two relevant dimensions, much like simple and complex cells found in primary visual cortex, as well as for a model auditory neuron responding to natural sounds. We also discuss the convergence of our estimates of the RS as a function of data set size. We emphasize that our optimization scheme does not rely on any specific statistical properties of the stimulus ensemble and thus can be used with natural stimuli.

⁴ If one suspects that neurons are sensitive to low-dimensional features of their input, one might be tempted to analyze neural responses to stimuli that explore only the (putative) relevant subspace, perhaps along the lines of the subspace reverse correlation method (Ringach et al., 1997). Our approach (like the spike-triggered covariance approach) is different because it allows the analysis of responses to stimuli that live in the full space, and instead we let the neuron "tell us" which low-dimensional subspace is relevant.


2 Information as an Objective Function

When analyzing neural responses, we compare the a priori probability distribution of all presented stimuli with the probability distribution of stimuli that lead to a spike (de Ruyter van Steveninck & Bialek, 1988). For gaussian signals, the probability distribution can be characterized by its second moment, the covariance matrix. However, an ensemble of natural stimuli is not gaussian, so that in a general case neither second nor any finite number of moments is sufficient to describe the probability distribution. In this situation, Shannon information provides a rigorous way of comparing two probability distributions. The average information carried by the arrival time of one spike is given by (Brenner, Strong, et al., 2000)

I_{spike} = \int ds\, P(s|spike) \log_2 \left[ \frac{P(s|spike)}{P(s)} \right], \qquad (2.1)

where ds denotes integration over the full D-dimensional stimulus space. The information per spike as written in equation 2.1 is difficult to estimate experimentally, since it requires either sampling of the high-dimensional probability distribution P(s|spike) or a model of how spikes were generated, that is, the knowledge of the low-dimensional RS. However, it is possible to calculate I_spike in a model-independent way if stimuli are presented multiple times to estimate the probability distribution P(spike|s). Then

I_{spike} = \left\langle \frac{P(spike|s)}{P(spike)} \log_2 \left[ \frac{P(spike|s)}{P(spike)} \right] \right\rangle_{s}, \qquad (2.2)

where the average is taken over all presented stimuli. This can be useful in practice (Brenner, Strong, et al., 2000), because we can replace the ensemble average ⟨···⟩_s with a time average and P(spike|s) with the time-dependent spike rate r(t). Note that for a finite data set of N repetitions, the obtained value I_spike(N) will be on average larger than I_spike(∞). The true value I_spike can be found by either subtracting an expected bias value, which is of the order of ~ 1/[P(spike) N 2 ln 2] (Treves & Panzeri, 1995; Panzeri & Treves, 1996; Pola, Schultz, Petersen, & Panzeri, 2002; Paninski, 2003b), or extrapolating to N → ∞ (Brenner, Strong, et al., 2000; Strong, Koberle, de Ruyter van Steveninck, & Bialek, 1998). Measurement of I_spike in this way provides a model-independent benchmark against which we can compare any description of the neuron's input-output relation.
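A minimal sketch of this direct estimate, assuming the time-dependent rate r(t) has already been obtained from repeated presentations; the function and variable names are placeholders, and the bias correction or extrapolation described above is left to the caller.

```python
import numpy as np

def info_per_spike(rate):
    """Model-independent estimate of I_spike (equation 2.2).

    rate : (T,) array, time-dependent spike rate r(t) estimated from repeated
           presentations of the same stimulus sequence (equal time bins assumed).
    Returns bits per spike.  A finite number of repetitions biases this estimate
    upward, so in practice one subtracts the expected bias or extrapolates to
    infinite data size, as described in the text.
    """
    ratio = rate / rate.mean()              # plays the role of P(spike|s)/P(spike)
    term = np.zeros_like(ratio, dtype=float)
    nz = ratio > 0                          # 0 * log 0 -> 0
    term[nz] = ratio[nz] * np.log2(ratio[nz])
    return term.mean()                      # average over all presented stimuli
```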

Our assumption is that spikes are generated according to a projection onto a low-dimensional subspace. Therefore, to characterize the relevance of a particular direction v in the stimulus space, we project all of the presented stimuli onto v and form probability distributions P_v(x) and P_v(x|spike) of projection values x for the a priori stimulus ensemble and that conditional on a spike, respectively:

P_v(x) = \langle \delta(x - s \cdot v) \rangle_{s}, \qquad (2.3)

P_v(x|spike) = \langle \delta(x - s \cdot v) | spike \rangle_{s}, \qquad (2.4)

where δ(x) is a delta function. In practice, both the average ⟨···⟩_s ≡ ∫ ds ··· P(s) over the a priori stimulus ensemble and the average ⟨··· |spike⟩_s ≡ ∫ ds ··· P(s|spike) over the ensemble conditional on a spike are calculated by binning the range of projection values x. The probability distributions are then obtained as histograms, normalized in such a way that the sum over all bins gives 1. The mutual information between spike arrival times and the projection x, by analogy with equation 2.1, is

I(v) = \int dx\, P_v(x|spike) \log_2 \left[ \frac{P_v(x|spike)}{P_v(x)} \right], \qquad (2.5)

which is also the Kullback-Leibler divergence D[P_v(x|spike) || P_v(x)]. Notice that this information is a function of the direction v. The information I(v) provides an invariant measure of how much the occurrence of a spike is determined by projection on the direction v. It is a function only of direction in the stimulus space and does not change when vector v is multiplied by a constant. This can be seen by noting that for any probability distribution and any constant c, P_{cv}(x) = c^{-1} P_v(x/c) (see also theorem 9.6.4 of Cover & Thomas, 1991). When evaluated along any vector v, I(v) ≤ I_spike. The total information I_spike can be recovered along one particular direction only if v = ê1, and only if the RS is one-dimensional.
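The following sketch puts equations 2.3 through 2.5 together for a single trial direction; as in the earlier snippet, the array names and binning parameters are assumptions of the sketch rather than the authors' implementation.

```python
import numpy as np

def info_along_direction(stimuli, spikes, v, n_bins=25):
    """I(v) of equation 2.5: the Kullback-Leibler divergence between the
    spike-conditional and prior distributions of the projection x = s . v,
    both estimated as normalized histograms (equations 2.3 and 2.4).
    """
    x = stimuli @ (v / np.linalg.norm(v))          # only the direction of v matters
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    p_x, _  = np.histogram(x, bins=edges)                   # P_v(x)
    p_xs, _ = np.histogram(x, bins=edges, weights=spikes)   # P_v(x|spike)
    p_x  = p_x / p_x.sum()
    p_xs = p_xs / p_xs.sum()
    nz = (p_xs > 0) & (p_x > 0)
    return np.sum(p_xs[nz] * np.log2(p_xs[nz] / p_x[nz]))   # bits per spike
```

With such a routine, I(v) can be evaluated along any candidate direction, for example a randomly chosen presented frame, and compared with the model-independent I_spike.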

By analogy with equation 2.5, one could also calculate information I(v1, ..., vn) along a set of several directions {v1, ..., vn}, based on the multipoint probability distributions of projection values x1, x2, ..., xn along vectors v1, v2, ..., vn of interest:

P_{v_1, \ldots, v_n}(\{x_i\}|spike) = \left\langle \prod_{i=1}^{n} \delta(x_i - s \cdot v_i) \,\Big|\, spike \right\rangle_{s}, \qquad (2.6)

P_{v_1, \ldots, v_n}(\{x_i\}) = \left\langle \prod_{i=1}^{n} \delta(x_i - s \cdot v_i) \right\rangle_{s}. \qquad (2.7)

If we are successful in finding all of the directions {êi}, 1 ≤ i ≤ K, contributing to the input-output relation, equation 1.1, then the information evaluated in this subspace will be equal to the total information I_spike. When we calculate information along a set of K vectors that are slightly off from the RS, the answer, of course, is smaller than I_spike and is initially quadratic in small deviations δv_i. One can therefore hope to find the RS by maximizing information with respect to K vectors simultaneously. The information does not increase if more vectors outside the RS are included. For uncorrelated stimuli, any vector or a set of vectors that maximizes I(v) belongs to the RS. On the other hand, as discussed in appendix B, the result of optimization with respect to a number of vectors k < K may deviate from the RS if stimuli are correlated. To find the RS, we first maximize I(v) and compare this maximum with I_spike, which is estimated according to equation 2.2. If the difference exceeds that expected from finite sampling corrections, we increment the number of directions with respect to which information is simultaneously maximized.

3 Optimization Algorithm

In this section, we describe a particular algorithm we used to look for the most informative dimensions in order to find the relevant subspace. We make no claim that our choice of the algorithm is most efficient. However, it does give reproducible results for different starting points and spike trains with differences taken to simulate neural noise. Overall, choices for an algorithm are broad because the information I(v) as defined by equation 2.5 is a continuous function whose gradient can be computed. We find (see appendix C for a derivation)

\nabla_v I = \int dx\, P_v(x) \left[ \langle s|x, spike \rangle - \langle s|x \rangle \right] \cdot \frac{d}{dx} \left[ \frac{P_v(x|spike)}{P_v(x)} \right], \qquad (3.1)

where

\langle s|x, spike \rangle = \frac{1}{P(x|spike)} \int ds\, s\, \delta(x - s \cdot v)\, P(s|spike), \qquad (3.2)

and similarly for ⟨s|x⟩. Since information does not change with the length of the vector, we have v · ∇_v I = 0, as also can be seen directly from equation 3.1.
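A binned, finite-data version of equations 3.1 and 3.2 might look as follows. This is a sketch under the same placeholder data layout as the earlier snippets; the finite-difference derivative and the handling of empty bins are implementation choices, not part of the derivation.

```python
import numpy as np

def info_gradient(stimuli, spikes, v, n_bins=25):
    """Finite-data estimate of the gradient of I(v), equations 3.1 and 3.2.

    Returns a (D,) vector.  Its component along v is ~0 (up to binning noise),
    since I(v) depends only on the direction of v.
    """
    v_hat = v / np.linalg.norm(v)
    x = stimuli @ v_hat
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    centers = 0.5 * (edges[:-1] + edges[1:])
    idx = np.clip(np.digitize(x, edges) - 1, 0, n_bins - 1)

    n_all = np.bincount(idx, minlength=n_bins).astype(float)
    n_spk = np.bincount(idx, weights=spikes, minlength=n_bins)
    p_x, p_xs = n_all / n_all.sum(), n_spk / n_spk.sum()

    # conditional averages <s|x> and <s|x, spike> in each bin (equation 3.2)
    D = stimuli.shape[1]
    s_all = np.zeros((n_bins, D))
    s_spk = np.zeros((n_bins, D))
    for b in range(n_bins):
        in_b = idx == b
        if n_all[b] > 0:
            s_all[b] = stimuli[in_b].mean(axis=0)
        if n_spk[b] > 0:
            s_spk[b] = np.average(stimuli[in_b], axis=0, weights=spikes[in_b])
        else:
            s_spk[b] = s_all[b]              # empty spike bin: zero contribution

    ratio = np.divide(p_xs, p_x, out=np.zeros_like(p_xs), where=p_x > 0)
    d_ratio = np.gradient(ratio, centers)    # d/dx of P_v(x|spike)/P_v(x)
    return (p_x[:, None] * (s_spk - s_all) * d_ratio[:, None]).sum(axis=0)
```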

As an optimization algorithm, we have used a combination of gradient ascent and simulated annealing algorithms: successive line maximizations were done along the direction of the gradient (Press, Teukolsky, Vetterling, & Flannery, 1992). During line maximizations, a point with a smaller value of information was accepted according to Boltzmann statistics, with probability exp[(I(v_{i+1}) − I(v_i))/T]. The effective temperature T is reduced by a factor of 1 − ε_T upon completion of each line maximization. Parameters of the simulated annealing algorithm to be adjusted are the starting temperature T_0 and the cooling rate ε_T, ΔT = −ε_T T. When maximizing with respect to one vector, we used values T_0 = 1 and ε_T = 0.05. When maximizing with respect to two vectors, we either used the cooling schedule with ε_T = 0.005 and repeated it several times (four times in our case) or allowed the effective temperature T to increase by a factor of 10 upon convergence to a local maximum (keeping T ≤ T_0 always) while limiting the total number of line maximizations.
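A compact sketch of such a search, relying on the info_along_direction() and info_gradient() helpers defined in the sketches above. The few fixed trial step sizes stand in for a proper line maximization, and the parameter values are illustrative rather than the authors' settings.

```python
import numpy as np

def maximize_info(stimuli, spikes, T0=1.0, eps_T=0.05, n_line_max=200, rng=None):
    """Gradient moves with simulated-annealing acceptance (section 3 sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    # start from one of the presented frames, as suggested in the text
    v = stimuli[rng.integers(len(stimuli))].astype(float)
    v /= np.linalg.norm(v)
    I_cur = info_along_direction(stimuli, spikes, v)
    T = T0
    for _ in range(n_line_max):
        g = info_gradient(stimuli, spikes, v)
        g /= np.linalg.norm(g) + 1e-12
        trials = []
        for step in (0.01, 0.03, 0.1, 0.3, 1.0):     # crude stand-in for line search
            cand = v + step * g
            cand /= np.linalg.norm(cand)
            trials.append((info_along_direction(stimuli, spikes, cand), cand))
        I_new, v_new = max(trials, key=lambda t: t[0])
        # accept a worse point with Boltzmann probability exp[(I_new - I_cur)/T]
        if I_new >= I_cur or rng.random() < np.exp((I_new - I_cur) / T):
            v, I_cur = v_new, I_new
        T *= 1.0 - eps_T                              # cooling schedule, dT = -eps_T * T
    return v, I_cur
```

A typical call would be v_max, I_max = maximize_info(stimuli, spikes), after which I_max can be compared with the model-independent I_spike estimate.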



Figure 2: The probability distribution of information values in units of the total information per spike in the case of (a) uncorrelated binary noise stimuli, (b) correlated gaussian noise with power spectrum of natural scenes, and (c) stimuli derived from natural scenes (patches of photos). The distribution was obtained by calculating information along 10^5 random vectors for a model cell with one relevant dimension. Note the different scales in the two panels.

The problem of maximizing a function often is related to the problem of making a good initial guess. It turns out, however, that the choice of a starting point is much less crucial in cases where the stimuli are correlated. To illustrate this point, we plot in Figure 2 the probability distribution of information along random directions v for both white noise and naturalistic stimuli in a model with one relevant dimension. For uncorrelated stimuli, not only is information equal to zero for a vector that is perpendicular to the relevant subspace, but in addition the derivative is equal to zero. Since a randomly chosen vector has on average a small projection on the relevant subspace (compared to its length), v_r/|v| ~ √(n/d), the corresponding information can be found by expanding in v_r/|v|:

I \approx \frac{v_r^2}{2|v|^2} \int dx\, P_{\hat{e}_{ir}}(x) \left( \frac{P'_{\hat{e}_{ir}}(x)}{P_{\hat{e}_{ir}}(x)} \right)^2 \left[ \langle s_{\hat{e}_r}|spike \rangle - \langle s_{\hat{e}_r} \rangle \right]^2, \qquad (3.3)

where vector v = v_r ê_r + v_ir ê_ir is decomposed in its components inside and outside the RS, respectively. The average information for a random vector is therefore ~ ⟨v_r²⟩/|v|² = K/D.

In cases where stimuli are drawn from a gaussian ensemble with correlations, an expression for the information values has a similar structure to equation 3.3. To see this, we transform to Fourier space and normalize each Fourier component by the square root of the power spectrum S(k).


In this new basis, both the vectors {ê_i}, 1 ≤ i ≤ K, forming the RS and the randomly chosen vector v along which information is being evaluated are to be multiplied by √S(k). Thus, if we now substitute for the dot product v_r² the convolution weighted by the power spectrum, Σ_i^K (v ∗ ê_i)², where

v * \hat{e}_i = \frac{\sum_k v(k)\, \hat{e}_i(k)\, S(k)}{\sqrt{\sum_k v^2(k) S(k)}\; \sqrt{\sum_k \hat{e}_i^2(k) S(k)}}, \qquad (3.4)

then equation 3.3 will describe information values along randomly chosen vectors v for correlated gaussian stimuli with the power spectrum S(k). Although both v_r and v(k) are gaussian variables with variance ~ 1/D, the weighted convolution has not only a much larger variance but is also strongly nongaussian (the nongaussian character is due to the normalizing factor Σ_k v²(k)S(k) in the denominator of equation 3.4). As for the variance, it can be estimated as ⟨(v ∗ ê_1)²⟩ = 4π/ln² D in cases where stimuli are taken as patches of correlated gaussian noise with the two-dimensional power spectrum S(k) = A/k². The large values of the weighted dot product (v ∗ ê_i), 1 ≤ i ≤ K, result not only in significant information values along a randomly chosen vector, but also in large magnitudes of the derivative ∇I, which is no longer dominated by noise, contrary to the case of uncorrelated stimuli. In the end, we find that randomly choosing one of the presented frames as a starting guess is sufficient.
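For image patches, the weighted overlap of equation 3.4 can be evaluated in Fourier space along the following lines. This is a sketch only: the FFT conventions and the use of the real part for real-valued patches are assumptions, and the argument names are placeholders.

```python
import numpy as np

def weighted_overlap(v_patch, e_patch, S):
    """Power-spectrum-weighted overlap v * e of equation 3.4.

    v_patch, e_patch : (n, n) arrays (a trial vector and a relevant dimension,
                       both reshaped as image patches)
    S                : (n, n) array, power spectrum S(k) on the same Fourier grid
    """
    vk, ek = np.fft.fft2(v_patch), np.fft.fft2(e_patch)
    num = np.sum(np.real(vk * np.conj(ek)) * S)
    den = np.sqrt(np.sum(np.abs(vk) ** 2 * S) * np.sum(np.abs(ek) ** 2 * S))
    return num / den
```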

4 Results

We tested the scheme of looking for the most informative dimensions on model neurons that respond to stimuli derived from natural scenes and sounds. As visual stimuli, we used scans across natural scenes, which were taken as black and white photos digitized to 8 bits, with no corrections made for the camera's light-intensity transformation function. Some statistical properties of the stimulus set are shown in Figure 3. Qualitatively, they reproduce the known results on the statistics of natural scenes (Ruderman & Bialek, 1994; Ruderman, 1994; Dong & Atick, 1995; Simoncelli & Olshausen, 2001). Most important properties for this study are strong spatial correlations, as evident from the power spectrum S(k) plotted in Figure 3b, and deviations of the probability distribution from a gaussian one. The nongaussian character can be seen in Figure 3c, where the probability distribution of intensities is shown, and in Figure 3d, which shows the distribution of projections on a Gabor filter (in what follows, the units of projections, such as s1, will be given in units of the corresponding standard deviations). Our goal is to demonstrate that although the correlations present in the ensemble are nongaussian, they can be removed successfully from the estimate of vectors defining the RS.


Figure 3: Statistical properties of the visual stimulus ensemble. (a) One of the photos. Stimuli would be 30 × 30 patches taken from the overall photograph. (b) The power spectrum, in units of light intensity variance σ_I², averaged over orientation as a function of dimensionless wave vector ka, where a is the pixel size. (c) The probability distribution of light intensity in units of σ_I. (d) The probability distribution of projections between stimuli and a Gabor filter, also in units of the corresponding standard deviation σ_s1.

4.1 A Model Simple Cell. Our first example is based on the properties of simple cells found in the primary visual cortex. A model phase- and orientation-sensitive cell has a single relevant dimension ê1, shown in Figure 4a. A given stimulus s leads to a spike if the projection s1 = s · ê1 reaches a threshold value θ in the presence of noise:

\frac{P(spike|s)}{P(spike)} \equiv f(s_1) = \langle H(s_1 - \theta + \xi) \rangle, \qquad (4.1)

where a gaussian random variable ξ of variance σ² models additive noise, and the function H(x) = 1 for x > 0, and zero otherwise. Together with the RF ê1, the parameters θ for threshold and the noise variance σ² determine the input-output function.
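A minimal simulation of this model neuron, under the interpretation that each frame yields at most one spike when the noisy projection crosses threshold. The default parameter values mirror the Figure 4 settings; the array names are placeholders.

```python
import numpy as np

def simple_cell_spikes(stimuli, e1, theta=1.84, sigma=0.31, rng=None):
    """Generate spikes from the model simple cell of equation 4.1: a frame
    yields a spike when s1 = s.e1 exceeds the threshold theta in the presence
    of additive gaussian noise xi with standard deviation sigma.

    theta and sigma are in units of the standard deviation of s1, as in the text.
    """
    rng = np.random.default_rng() if rng is None else rng
    s1 = stimuli @ e1
    s1 = s1 / s1.std()                          # express s1 in units of its std
    xi = rng.normal(scale=sigma, size=s1.shape)
    # one Bernoulli draw per frame; averaging over the noise reproduces f(s1)
    return (s1 - theta + xi > 0).astype(int)
```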


Figure 4: Analysis of a model simple cell with RF shown in (a). (b) The "exact" spike-triggered average v_sta. (c) An attempt to remove correlations according to the reverse correlation method, C_a priori^{-1} v_sta. (d) The normalized vector v̂_max found by maximizing information. (e) Convergence of the algorithm according to information I(v) and projection v̂ · ê1 between normalized vectors as a function of the inverse effective temperature T^{-1}. (f) The probability of a spike P(spike|s · v̂_max) (crosses) is compared to P(spike|s1) used in generating spikes (solid line). Parameters of the model are σ = 0.31 and θ = 1.84, both given in units of standard deviation of s1, which is also the units for the x-axis in (f).

When the spike-triggered average (STA), or reverse correlation function, is computed from the responses to correlated stimuli, the resulting vector will be broadened due to spatial correlations present in the stimuli (see Figure 4b). For stimuli that are drawn from a gaussian probability distribution, the effects of correlations could be removed by multiplying v_sta by the inverse of the a priori covariance matrix, according to the reverse correlation method, v̂_Gaussian est ∝ C_a priori^{-1} v_sta, equation A.2. However, this procedure tends to amplify noise. To separate errors due to neural noise from those due to the nongaussian character of correlations, note that in a model, the effect of neural noise on our estimate of the STA can be eliminated by averaging the presented stimuli weighted with the exact firing rate, as opposed to using a histogram of responses to estimate P(spike|s) from a finite set of trials. We have used this "exact" STA,

v_{sta} = \int ds\, s\, P(s|spike) = \frac{1}{P(spike)} \int ds\, P(s)\, s\, P(spike|s), \qquad (4.2)

in calculations presented in Figures 4b and 4c. Even with this noiseless STA (the equivalent of collecting an infinite data set), the standard decorrelation procedure is not valid for nongaussian stimuli and nonlinear input-output functions, as discussed in detail in appendix A. The result of such a decorrelation in our example is shown in Figure 4c. It clearly is missing some of the structure in the model filter, with projection ê1 · v̂_Gaussian est ≈ 0.14. The discrepancy is not due to neural noise or finite sampling, since the "exact" STA was decorrelated; the absence of noise in the exact STA also means that there would be no justification for smoothing the results of the decorrelation. The discrepancy between the true RF and the decorrelated STA increases with the strength of nonlinearity in the input-output function.

In contrast, it is possible to obtain a good estimate of the relevant dimension ê1 by maximizing information directly (see Figure 4d). A typical progress of the simulated annealing algorithm with decreasing temperature T is shown in Figure 4e. There we plot both the information along the vector and its projection on ê1. We note that while information I remains almost constant, the value of projection continues to improve. Qualitatively, this is because the probability distributions depend exponentially on information. The final value of projection depends on the size of the data set, as discussed below. In the example shown in Figure 4, there were ≈ 50,000 spikes with average probability of spike ≈ 0.05 per frame, and the reconstructed vector has a projection v̂_max · ê1 = 0.920 ± 0.006. Having estimated the RF, one can proceed to sample the nonlinear input-output function. This is done by constructing histograms for P(s · v̂_max) and P(s · v̂_max|spike) of projections onto vector v̂_max found by maximizing information, and taking their ratio, as in equation 1.2. In Figure 4f we compare P(spike|s · v̂_max) (crosses) with the probability P(spike|s1) used in the model (solid line).

4.2 Estimated Deviation from the Optimal Dimension. When information is calculated from a finite data set, the (normalized) vector v̂ which maximizes I will deviate from the true RF ê1. The deviation δv = v̂ − ê1 arises because the probability distributions are estimated from experimental histograms and differ from the distributions found in the limit of infinite data size. For a simple cell, the quality of reconstruction can be characterized by the projection v̂ · ê1 = 1 − (1/2)δv², where both v̂ and ê1 are normalized, and δv is by definition orthogonal to ê1. The deviation δv ~ A^{-1}∇I, where A is the Hessian of information. Its structure is similar to that of a covariance matrix:

A_{ij} = \frac{1}{\ln 2} \int dx\, P(x|spike) \left( \frac{d}{dx} \ln \frac{P(x|spike)}{P(x)} \right)^2 \left[ \langle s_i s_j|x \rangle - \langle s_i|x \rangle \langle s_j|x \rangle \right]. \qquad (4.3)

When averaged over possible outcomes of N trials, the gradient of information is zero for the optimal direction. Here, in order to evaluate ⟨δv²⟩ = Tr[A^{-1}⟨∇I ∇I^T⟩A^{-1}], we need to know the variance of the gradient of I. Assuming that the probability of generating a spike is independent for different bins, we can estimate ⟨∇I_i ∇I_j⟩ ~ A_ij/(N_spike ln 2). Therefore, an expected error in the reconstruction of the optimal filter is inversely proportional to the number of spikes. The corresponding expected value of the projection between the reconstructed vector and the relevant direction ê1 is given by

\hat{v} \cdot \hat{e}_1 \approx 1 - \frac{1}{2} \langle \delta v^2 \rangle = 1 - \frac{\mathrm{Tr}'[A^{-1}]}{2 N_{spike} \ln 2}, \qquad (4.4)

where Tr' means that the trace is taken in the subspace orthogonal to the model filter.⁵ The estimate, equation 4.4, can be calculated without knowledge of the underlying model; it is ~ D/(2N_spike). This behavior should also hold in cases where the stimulus dimensions are expanded to include time. The errors are expected to increase in proportion to the increased dimensionality. In the case of a complex cell with two relevant dimensions, the error is expected to be twice that for a cell with a single relevant dimension, as also discussed in section 4.3.
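A rough finite-data sketch of equations 4.3 and 4.4, evaluated at a reconstructed direction. The binning, the per-bin covariance estimates, and the pseudo-inverse used to drop the zero mode along the filter are implementation choices of the sketch, not part of the paper's derivation.

```python
import numpy as np

def expected_projection(stimuli, spikes, v_opt, n_bins=20):
    """Estimate the expected projection v.e1 via the Hessian A of equation 4.3
    and the error formula of equation 4.4, evaluated at v_opt."""
    v_hat = v_opt / np.linalg.norm(v_opt)
    x = stimuli @ v_hat
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    centers = 0.5 * (edges[:-1] + edges[1:])
    idx = np.clip(np.digitize(x, edges) - 1, 0, n_bins - 1)
    n_all = np.bincount(idx, minlength=n_bins).astype(float)
    n_spk = np.bincount(idx, weights=spikes, minlength=n_bins)
    p_x, p_xs = n_all / n_all.sum(), n_spk / n_spk.sum()
    ratio = np.divide(p_xs, p_x, out=np.ones_like(p_xs), where=p_x > 0)
    d_log = np.gradient(np.log(np.where(ratio > 0, ratio, 1.0)), centers)

    D = stimuli.shape[1]
    A = np.zeros((D, D))
    for b in range(n_bins):                          # equation 4.3, bin by bin
        sb = stimuli[idx == b]
        if len(sb) > 1 and p_xs[b] > 0:
            A += p_xs[b] * d_log[b] ** 2 * np.cov(sb.T)
    A /= np.log(2)

    # equation 4.4 with the trace restricted to the subspace orthogonal to v_opt
    P = np.eye(D) - np.outer(v_hat, v_hat)
    tr_prime = np.trace(np.linalg.pinv(P @ A @ P))   # pseudo-inverse drops the zero mode
    return 1.0 - tr_prime / (2.0 * spikes.sum() * np.log(2))
```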

We emphasize that the error estimate according to equation 4.4 is of the same order as errors of the reverse correlation method when it is applied for gaussian ensembles. The latter are given by (Tr[C^{-1}] − C^{-1}_{11}) / (2N_spike ⟨f'²(s1)⟩). Of course, if the reverse correlation method were to be applied to the nongaussian ensemble, the errors would be larger. In Figure 5 we show the result of simulations for various numbers of trials, and therefore N_spike. The average projection of the normalized reconstructed vector v̂ on the RF ê1 behaves initially as 1/N_spike (dashed line). For smaller data sets, in this case N_spike ≲ 30,000, corrections ~ N_spike^{-2} become important for estimating the expected errors of the algorithm. Happily, these corrections have a sign such that smaller data sets are more effective than one might have expected from the asymptotic calculation. This can be verified from the expansion v̂ · ê1 = [1 − δv²]^{-1/2} ≈ 1 − (1/2)⟨δv²⟩ + (3/8)⟨δv⁴⟩, where only the first two terms were taken into account in equation 4.4.

⁵ By definition, δv1 = δv · ê1 = 0, and therefore ⟨δv1²⟩ ∝ A^{-1}_{11} is to be subtracted from ⟨δv²⟩ ∝ Tr[A^{-1}]. Because ê1 is an eigenvector of A with zero eigenvalue, A^{-1}_{11} is infinite. Therefore, the proper treatment is to take the trace in the subspace orthogonal to ê1.


Figure 5: Projection of vector v̂_max that maximizes information on RF ê1 is plotted as a function of the number of spikes. The solid line is a quadratic fit in 1/N_spike, and the dashed line is the leading linear term in 1/N_spike. This set of simulations was carried out for a model visual neuron with one relevant dimension from Figure 4a and the input-output function (see equation 4.1) with parameter values σ ≈ 0.61σ_s1, θ ≈ 0.61σ_s1. For this model neuron, the linear approximation for the expected error is applicable for N_spike ≳ 30,000.

4.3 A Model Complex Cell. A sequence of spikes from a model cell with two relevant dimensions was simulated by projecting each of the stimuli on vectors that differ by π/2 in their spatial phase, taken to mimic properties of complex cells, as in Figure 6. A particular frame leads to a spike according to a logical OR, that is, if either |s1| or |s2| exceeds a threshold value θ in the presence of noise, where s1 = s · ê1, s2 = s · ê2. Similarly to equation 4.1,

\frac{P(spike|s)}{P(spike)} = f(s_1, s_2) = \langle H(|s_1| - \theta - \xi_1) \vee H(|s_2| - \theta - \xi_2) \rangle, \qquad (4.5)

where ξ1 and ξ2 are independent gaussian variables. The sampling of this input-output function by our particular set of natural stimuli is shown in Figure 6c. As is well known, reverse correlation fails in this case because the spike-triggered average stimulus is zero, although with gaussian stimuli the spike-triggered covariance method would recover the relevant dimensions (Touryan et al., 2002). Here we show that searching for maximally informative dimensions allows us to recover the relevant subspace even under more natural stimulus conditions.
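A simulation of the complex-cell model in the same spirit as the simple-cell sketch above; the default parameters mirror the Figure 6 caption, and the array names are placeholders.

```python
import numpy as np

def complex_cell_spikes(stimuli, e1, e2, theta=0.61, sigma=0.31, rng=None):
    """Generate spikes from the model complex cell of equation 4.5: a frame
    yields a spike if |s1| or |s2| crosses the threshold theta in the presence
    of independent gaussian noise (logical OR), with s1 = s.e1 and s2 = s.e2.
    """
    rng = np.random.default_rng() if rng is None else rng
    s1 = stimuli @ e1
    s2 = stimuli @ e2
    s1, s2 = s1 / s1.std(), s2 / s2.std()       # units of the projection std
    xi1 = rng.normal(scale=sigma, size=s1.shape)
    xi2 = rng.normal(scale=sigma, size=s2.shape)
    fire = (np.abs(s1) - theta - xi1 > 0) | (np.abs(s2) - theta - xi2 > 0)
    return fire.astype(int)
```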


Figure 6: Analysis of a model complex cell with relevant dimensions ê1 and ê2 shown in (a) and (b), respectively. Spikes are generated according to an "OR" input-output function f(s1, s2) with the threshold θ ≈ 0.61σ_s1 and noise standard deviation σ = 0.31σ_s1. (c, d) Vectors v1 and v2 found by maximizing information I(v1, v2).

We start by maximizing information with respect to one vector. Contrary to the result in Figure 4e for a simple cell, one optimal dimension recovers only about 60% of the total information per spike (see equation 2.2). Perhaps surprisingly, because of the strong correlations in natural scenes, even a projection onto a random vector in the D ~ 10^3-dimensional stimulus space has a high probability of explaining 60% of total information per spike, as can be seen in Figure 2. We therefore go on to maximize information with respect to two vectors. As a result of maximization, we obtain two vectors v1 and v2, shown in Figure 6. The information along them is I(v1, v2) ≈ 0.90, which is within the range of information values obtained along different linear combinations of the two model vectors, I(ê1, ê2)/I_spike = 0.89 ± 0.11. Therefore, the description of the neuron's firing in terms of vectors v1 and v2 is complete up to the noise level, and we do not have to look for extra relevant dimensions. Practically, the number of relevant dimensions can be determined by comparing I(v1, v2) to either I_spike or I(v1, v2, v3), the latter being the result of maximization with respect to three vectors simultaneously. As mentioned in section 1, information along a set of vectors does not increase when extra dimensions are added to the relevant subspace. Therefore, if I(v1, v2) = I(v1, v2, v3) (again, up to the noise level), this means that there are only two relevant dimensions. Using I_spike for comparison with I(v1, v2) has the advantage of not having to look for an extra dimension, which can be computationally intensive. However, I_spike might be subject to larger systematic bias errors than I(v1, v2, v3).

Vectors v1 and v2 obtained by maximizing I(v1, v2) are not exactly orthogonal and are also rotated with respect to ê1 and ê2. However, the quality of reconstruction, as well as the value of information I(v1, v2), is independent of a particular choice of basis within the RS. The appropriate measure of similarity between the two planes is the dot product of their normals. In the example of Figure 6, n̂(ê1, ê2) · n̂(v1, v2) = 0.82 ± 0.07, where n̂(ê1, ê2) is a normal to the plane passing through vectors ê1 and ê2. Maximizing information with respect to two dimensions requires a significantly slower cooling rate and, consequently, longer computational times. However, the expected error in the reconstruction, 1 − n̂(ê1, ê2) · n̂(v1, v2), scales as 1/N_spike, behavior similar to equation 4.4, and is roughly twice that for a simple cell, given the same number of spikes. We make vectors v1 and v2 orthogonal to each other upon completion of the algorithm.

4.4 A Model Auditory Neuron with One Relevant Dimension. Because stimuli s are treated as vectors in an abstract space, the method of looking for the most informative dimensions can be applied equally well to auditory as well as to visual neurons. Here we illustrate the method by considering a model auditory neuron with one relevant dimension, which is shown in Figure 7c and is taken to mimic the properties of cochlear neurons. The model neuron is probed by two ensembles of naturalistic stimuli: one is a recording of a native Russian speaker reading a piece of Russian prose, and the other is a recording of a piece of English prose read by a native English speaker. Both ensembles are nongaussian and exhibit amplitude distributions with long, nearly exponential tails (see Figure 7a), which are qualitatively similar to those of light intensities in natural scenes (Voss & Clarke, 1975; Ruderman, 1994). However, the power spectrum is different in the two cases, as can be seen in Figure 7b. The differences in the correlation structure, in particular, lead to different STAs across the two ensembles (cf. Figure 7d). Both of the STAs also deviate from the model filter shown in Figure 7c.

Despite differences in the probability distributions P(s), it is possible to recover the relevant dimension of the model neuron by maximizing information. In Figure 7e we show the two most informative vectors found by running the algorithm for the two ensembles and replot the model filter from Figure 7c to show that the three vectors overlap almost perfectly. Thus, different nongaussian correlations can be successfully removed to obtain an estimate of the relevant dimension. If the most informative vector changes with the stimulus ensemble, this can be interpreted as caused by adaptation to the probability distribution.


Figure 7: A model auditory neuron is probed by two natural ensembles of stimuli: a piece of English prose (1) and a piece of Russian prose (2). The size of the stimulus ensemble was the same in both cases, and the sampling rate was 44.1 kHz. (a) The probability distribution of the sound pressure amplitude in units of standard deviation for both ensembles is strongly nongaussian. (b) The power spectra for the two ensembles. (c) The relevant vector of the model neuron, of dimensionality D = 500. (d) The STA is broadened in both cases but differs between the two cases due to differences in the power spectra of the two ensembles. (e) Vectors that maximize information for either of the ensembles overlap almost perfectly with each other and with the model filter, which is also replotted here from (c). (f) The probability of a spike P(spike|s · v̂_max) (crosses) is compared to P(spike|s1) used in generating spikes (solid line). The input-output function had parameter values σ ≈ 0.9σ_s1 and θ ≈ 1.8σ_s1.

5 Summary

Features of the stimulus that are most relevant for generating the response of a neuron can be found by maximizing information between the sequence of responses and the projection of stimuli on trial vectors within the stimulus space. Calculated in this manner, information becomes a function of direction in stimulus space. Those vectors that maximize the information and account for the total information per response of interest span the relevant subspace. The method allows multiple dimensions to be found. The reconstruction of the relevant subspace is done without assuming a particular form of the input-output function. It can be strongly nonlinear within the relevant subspace and is estimated from experimental histograms for each trial direction independently. Most important, this method can be used with any stimulus ensemble, even those that are strongly nongaussian, as in the case of natural signals. We have illustrated the method on model neurons responding to natural scenes and sounds. We expect the current implementation of the method to be most useful when several most informative vectors (≤ 10, depending on their dimensionality) are to be analyzed for neurons probed by natural scenes. This technique could be particularly useful in describing sensory processing in poorly understood regions of higher-level sensory cortex (such as visual areas V2, V4, and IT and auditory cortex beyond A1), where white noise stimulation is known to be less effective than naturalistic stimuli.

Appendix A: Limitations of the Reverse Correlation Method

Here we examine what sort of deviations one can expect when applying the reverse correlation method to natural stimuli, even in the model with just one relevant dimension. There are two factors that, when combined, invalidate the reverse correlation method: the nongaussian character of correlations and the nonlinearity of the input-output function (Ringach, Sapiro, & Shapley, 1997). In its original formulation (de Boer & Kuyper, 1968), the neuron is probed by white noise, and the relevant dimension ê1 is given by the STA, ê1 ∝ ⟨s r(s)⟩. If the signals are not white, that is, the covariance matrix C_ij = ⟨s_i s_j⟩ is not a unit matrix, then the STA is a broadened version of the original filter ê1. This can be seen by noting that for any function F(s) of gaussian variables {s_i}, the identity holds:

\langle s_i F(s) \rangle = \langle s_i s_j \rangle \langle \partial_j F(s) \rangle, \quad \partial_j \equiv \partial / \partial s_j. \qquad (A.1)

When property A.1 is applied to the vector components of the STA, ⟨s_i r(s)⟩ = C_ij⟨∂_j r(s)⟩. Since we work within the assumption that the firing rate is a (nonlinear) function of projection onto one filter ê1, r(s) = r(s1), the latter average is proportional to the model filter itself, ⟨∂_j r⟩ = ê_1j⟨r'(s1)⟩. Therefore, we arrive at the prescription of the reverse correlation method:

\hat{e}_{1i} \propto [C^{-1}]_{ij} \langle s_j r(s) \rangle. \qquad (A.2)
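For reference, a sketch of this prescription applied to finite data. The small ridge term added before inversion is a practical regularization choice of the sketch (it tames the noise amplification discussed in the main text), not part of equation A.2; all names are placeholders.

```python
import numpy as np

def decorrelated_sta(stimuli, spikes, ridge=1e-6):
    """Reverse-correlation estimate of equation A.2: the spike-triggered
    average multiplied by the inverse a priori covariance matrix."""
    s = stimuli - stimuli.mean(axis=0)
    sta = (spikes[:, None] * s).sum(axis=0) / spikes.sum()   # ~ <s r(s)> up to a constant
    C = np.cov(s.T)                                          # a priori covariance C_ij
    D = C.shape[0]
    reg = ridge * np.trace(C) / D * np.eye(D)                # regularized inversion
    v = np.linalg.solve(C + reg, sta)
    return v / np.linalg.norm(v)
```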

The gaussian property is necessary in order to represent the STA as a convolution of the covariance matrix C_ij of the stimulus ensemble and the model filter. To understand how the reconstructed vector obtained according to equation A.2 deviates from the relevant one, we consider weakly nongaussian stimuli with the probability distribution

P_{nG}(s) = \frac{1}{Z} P_0(s)\, e^{\epsilon H_1(s)}, \qquad (A.3)

where P_0(s) is the gaussian probability distribution with covariance matrix C, and the normalization factor Z = ⟨e^{εH_1(s)}⟩. The function H_1 describes deviations of the probability distribution from gaussian, and therefore we will set ⟨s_i H_1⟩ = 0 and ⟨s_i s_j H_1⟩ = 0, since these averages can be accounted for in the gaussian ensemble. In what follows, we will keep only the first-order terms in the perturbation parameter ε. Using property A.1, we find the STA to be given by

\langle s_i r \rangle_{nG} = \langle s_i s_j \rangle \left[ \langle \partial_j r \rangle + \epsilon \langle r\, \partial_j H_1 \rangle \right], \qquad (A.4)

where averages are taken with respect to the gaussian distribution. Similarly, the covariance matrix C_ij evaluated with respect to the nongaussian ensemble is given by

C_{ij} = \frac{1}{Z} \langle s_i s_j\, e^{\epsilon H_1} \rangle = \langle s_i s_j \rangle + \epsilon \langle s_i s_k \rangle \langle s_j \partial_k H_1 \rangle, \qquad (A.5)

so that to the first order in ε, ⟨s_i s_j⟩ = C_ij − εC_ik⟨s_j ∂_k H_1⟩. Combining this with equation A.4, we get

\langle s_i r \rangle_{nG} = \mathrm{const} \times C_{ij} \hat{e}_{1j} + \epsilon\, C_{ij} \langle (r - s_1 \langle r' \rangle)\, \partial_j H_1 \rangle. \qquad (A.6)

The second term in equation A.6 prevents the application of the reverse correlation method for nongaussian signals. Indeed, if we multiply the STA, equation A.6, with the inverse of the a priori covariance matrix C_ij, according to the reverse correlation method, equation A.2, we no longer obtain the RF ê1. The deviation of the obtained answer from the true RF increases with ε, which measures the deviation of the probability distribution from gaussian. Since natural stimuli are known to be strongly nongaussian, this makes the use of the reverse correlation problematic when analyzing neural responses to natural stimuli.

The difference in applying the reverse correlation to stimuli drawn from a correlated gaussian ensemble versus a nongaussian one is illustrated in Figures 8b and 8c. In the first case, shown in Figure 8b, stimuli are drawn from a correlated gaussian ensemble with the covariance matrix equal to that of natural images. In the second case, shown in Figure 8c, the patches of photos are taken as stimuli. The STA is broadened in both cases. Although the two-point correlations are just as strong in the case of gaussian stimuli


Figure 8: The nongaussian character of correlations present in natural scenes invalidates the reverse correlation method for neurons with a nonlinear input-output function. (a) A model visual neuron has one relevant dimension ê1 and the nonlinear input-output function. The "exact" STA is used (see equation 4.2) to separate effects of neural noise from alterations introduced by the method. The decorrelated "exact" STA is obtained by multiplying the "exact" STA by the inverse of the covariance matrix, according to equation A.2. (b) Stimuli are taken from a correlated gaussian noise ensemble. The effect of correlations in the STA can be removed according to equation A.2. (c) When patches of photos are taken as stimuli for the same model neuron as in (b), the decorrelation procedure gives an altered version of the model filter. The two stimulus ensembles have the same covariance matrix.

as they are in the natural stimuli ensemble, gaussian correlations can be successfully removed from the STA according to equation A.2 to obtain the model filter. On the contrary, an attempt to use reverse correlation with natural stimuli results in an altered version of the model filter. We reiterate that for this example, the apparent noise in the decorrelated vector is not due to neural noise or finite data sets, since the "exact" STA has been used (see equation 4.2) in all calculations presented in Figures 8 and 9.

The reverse correlation method gives the correct answer for any distribution of signals if the probability of generating a spike is a linear function of s_i, since then the second term in equation A.6 is zero. In particular, a linear input-output relation could arise due to a neural noise whose variance is much larger than the variance of the signal itself. This point is illustrated in Figures 9a, 9b, and 9c, where the reverse correlation method is applied to a threshold input-output function at low, moderate, and high signal-to-noise ratios. For small signal-to-noise ratios, where the noise standard deviation is similar to that of projections s1, the threshold nonlinearity in the input-output function is masked by noise and is effectively linear. In this limit, the reverse correlation can be applied with the exact STA. However, for the experimentally calculated STA at low signal-to-noise ratios, the decorrelation procedure results in strong noise amplification. At higher signal-to-noise ratios, decorrelation fails due to the nonlinearity of the input-output function, in accordance with equation A.6.

Appendix B: Maxima of I(v): What Do They Mean?

The relevant subspace of dimensionality K can be found by maximizing information simultaneously with respect to K vectors. The result of maximization with respect to a number of vectors that is less than the true dimensionality of the relevant subspace may produce vectors that have components in the irrelevant subspace. This happens only in the presence of correlations in stimuli. As an illustration, we consider the situation where the dimensionality of the relevant subspace K = 2, and vector ê1 describes the most informative direction within the relevant subspace. We show here that although the gradient of information is perpendicular to both ê1 and ê2, it may have components outside the relevant subspace. Therefore, the vector v_max that corresponds to the maximum of I(v) will lie outside the relevant subspace. We recall from equation 3.1 that

\nabla I(\hat{e}_1) = \int ds_1\, P(s_1)\, \frac{d}{ds_1} \left[ \frac{P(s_1|spike)}{P(s_1)} \right] \left( \langle s|s_1, spike \rangle - \langle s|s_1 \rangle \right). \qquad (B.1)

We can rewrite the conditional averages ⟨s|s1⟩ = ∫ ds2 P(s1, s2)⟨s|s1, s2⟩/P(s1) and ⟨s|s1, spike⟩ = ∫ ds2 f(s1, s2) P(s1, s2)⟨s|s1, s2⟩/P(s1|spike), so that

\nabla I(\hat{e}_1) = \int ds_1 ds_2\, P(s_1, s_2)\, \langle s|s_1, s_2 \rangle\, \frac{P(spike|s_1, s_2) - P(spike|s_1)}{P(spike)} \times \frac{d}{ds_1} \ln \left[ \frac{P(s_1|spike)}{P(s_1)} \right]. \qquad (B.2)

Because we assume that the vector ê1 is the most informative within the relevant subspace, ê1 · ∇I = ê2 · ∇I = 0, so that the integral in equation B.2


Figure 9: Application of the reverse correlation method to a model visual neuron with one relevant dimension ê1 and a threshold input-output function, for decreasing values of noise variance: σ/σ_s1 ≈ 6.1, 0.61, 0.06 in (a), (b), and (c), respectively. The model P(spike|s1) becomes effectively linear when the signal-to-noise ratio is small. The reverse correlation can be used together with natural stimuli if the input-output function is linear. Otherwise, the deviations between the decorrelated STA and the model filter increase with the nonlinearity of P(spike|s1).

is zero for those directions in which the component of the vector hsjs1 s2ichanges linearly with s1 and s2 For uncorrelated stimuli this is true forall directions so that the most informative vector within the relevant sub-space is also the most informative in the overall stimulus space In thepresence of correlations the gradient may have nonzero components alongsome irrelevant directions if projection of the vector hsjs1 s2i on those di-rections is not a linear function of s1 and s2 By looking for a maximumof information we will therefore be driven outside the relevant subspaceThe deviation of vmax from the relevant subspace is also proportional to the

Analyzing Neural Responses to Natural Signals 247

strength of the dependence on the second parameter s2 because of the factor[Ps1 s2jspike=Ps1 s2 iexcl Ps1jspike=Ps1] in the integrand
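The arguments of this appendix rest on being able to evaluate the information $I(\mathbf{v})$ along trial directions. As a concrete reference, here is a minimal sketch of that estimator for a single direction, following equation 2.5 with simple uniform binning and no finite-sampling corrections; the function and variable names are ours, not the authors'.

```python
import numpy as np

def info_along_direction(stimuli, spikes, v, n_bins=25):
    """Estimate I(v) (equation 2.5): the information between spiking and the
    projection x = s . v, from histogram estimates of P_v(x) and P_v(x|spike)."""
    x = stimuli @ (v / np.linalg.norm(v))
    edges = np.linspace(x.min(), x.max(), n_bins + 1)

    p_x, _ = np.histogram(x, bins=edges)               # a priori distribution of projections
    p_x_spike, _ = np.histogram(x[spikes], bins=edges)  # distribution conditional on a spike
    p_x = p_x / p_x.sum()
    p_x_spike = p_x_spike / p_x_spike.sum()

    mask = (p_x > 0) & (p_x_spike > 0)
    # Kullback-Leibler divergence D[P_v(x|spike) || P_v(x)], in bits.
    return np.sum(p_x_spike[mask] * np.log2(p_x_spike[mask] / p_x[mask]))
```

Comparing such an estimate with the model-independent $I_{\mathrm{spike}}$ of equation 2.2, or with a joint estimate over two directions (which replaces the one-dimensional histograms with two-dimensional ones), is how the dimensionality of the relevant subspace is decided in the main text.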

Appendix C: The Gradient of Information

According to expression 2.5, the information $I(\mathbf{v})$ depends on the vector $\mathbf{v}$ only through the probability distributions $P_{\mathbf{v}}(x)$ and $P_{\mathbf{v}}(x|\mathrm{spike})$. Therefore, we can express the gradient of information in terms of gradients of those probability distributions:

$$
\nabla_{\mathbf{v}} I = \frac{1}{\ln 2}\int dx \left[\ln\!\left(\frac{P_{\mathbf{v}}(x|\mathrm{spike})}{P_{\mathbf{v}}(x)}\right)\nabla_{\mathbf{v}} P_{\mathbf{v}}(x|\mathrm{spike}) - \frac{P_{\mathbf{v}}(x|\mathrm{spike})}{P_{\mathbf{v}}(x)}\,\nabla_{\mathbf{v}} P_{\mathbf{v}}(x)\right], \qquad (\mathrm{C.1})
$$

where we took into account that $\int dx\, P_{\mathbf{v}}(x|\mathrm{spike}) = 1$ and does not change with $\mathbf{v}$. To find the gradients of the probability distributions, we note that

$$
\nabla_{\mathbf{v}} P_{\mathbf{v}}(x) = \nabla_{\mathbf{v}}\left[\int d\mathbf{s}\, P(\mathbf{s})\,\delta(x - \mathbf{s}\cdot\mathbf{v})\right] = -\int d\mathbf{s}\, P(\mathbf{s})\,\mathbf{s}\,\delta'(x - \mathbf{s}\cdot\mathbf{v}) = -\frac{d}{dx}\bigl[P_{\mathbf{v}}(x)\,\langle \mathbf{s}|x\rangle\bigr], \qquad (\mathrm{C.2})
$$

and analogously for $P_{\mathbf{v}}(x|\mathrm{spike})$:

$$
\nabla_{\mathbf{v}} P_{\mathbf{v}}(x|\mathrm{spike}) = -\frac{d}{dx}\bigl[P_{\mathbf{v}}(x|\mathrm{spike})\,\langle \mathbf{s}|x, \mathrm{spike}\rangle\bigr]. \qquad (\mathrm{C.3})
$$

Substituting expressions C.2 and C.3 into equation C.1 and integrating once by parts, we obtain

$$
\nabla_{\mathbf{v}} I = \int dx\, P_{\mathbf{v}}(x)\bigl[\langle \mathbf{s}|x, \mathrm{spike}\rangle - \langle \mathbf{s}|x\rangle\bigr]\cdot\frac{d}{dx}\!\left[\frac{P_{\mathbf{v}}(x|\mathrm{spike})}{P_{\mathbf{v}}(x)}\right],
$$

which is expression 3.1 of the main text.
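In a discretized implementation, expression 3.1 can be evaluated directly from histogram estimates: bin the projections $x$, form $P_{\mathbf{v}}(x)$, $P_{\mathbf{v}}(x|\mathrm{spike})$, and the conditional averages $\langle\mathbf{s}|x\rangle$ and $\langle\mathbf{s}|x,\mathrm{spike}\rangle$, and take a finite-difference derivative of their ratio. The sketch below is one possible implementation under those assumptions (uniform bins, simple finite differences); it is illustrative rather than the authors' code.

```python
import numpy as np

def info_gradient(stimuli, spikes, v, n_bins=25):
    """Approximate the gradient of I(v) (expression 3.1 / appendix C):
    grad I ~ sum_x P_v(x) [<s|x,spike> - <s|x>] d/dx [P_v(x|spike)/P_v(x)]."""
    x = stimuli @ v
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    centers = 0.5 * (edges[:-1] + edges[1:])
    idx = np.clip(np.digitize(x, edges) - 1, 0, n_bins - 1)

    counts = np.bincount(idx, minlength=n_bins).astype(float)
    counts_spike = np.bincount(idx[spikes], minlength=n_bins).astype(float)
    P_x = counts / counts.sum()
    P_x_spike = counts_spike / max(counts_spike.sum(), 1.0)

    # Conditional averages <s|x> and <s|x, spike> in each bin.
    D = stimuli.shape[1]
    s_given_x = np.zeros((n_bins, D))
    s_given_x_spike = np.zeros((n_bins, D))
    for b in range(n_bins):
        in_bin = idx == b
        if in_bin.any():
            s_given_x[b] = stimuli[in_bin].mean(axis=0)
        if (in_bin & spikes).any():
            s_given_x_spike[b] = stimuli[in_bin & spikes].mean(axis=0)

    ratio = np.where(P_x > 0, P_x_spike / np.maximum(P_x, 1e-12), 0.0)
    d_ratio = np.gradient(ratio, centers)            # finite-difference d/dx of the ratio

    grad = (P_x * d_ratio) @ (s_given_x_spike - s_given_x)
    # I(v) does not depend on the length of v, so the component along v is removed.
    return grad - (grad @ v) * v / np.dot(v, v)
```

In the main text this gradient drives line maximizations inside a simulated annealing schedule; because $I(\mathbf{v})$ is invariant under rescaling of $\mathbf{v}$, the component of the gradient along $\mathbf{v}$ carries no information and is projected out.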

Acknowledgments

We thank K. D. Miller for many helpful discussions. Work at UCSF was supported in part by the Sloan and Swartz foundations and by a training grant from the NIH. Our collaboration began at the Marine Biological Laboratory in a course supported by grants from NIMH and the Howard Hughes Medical Institute.


References

Aguera y Arcas, B., Fairhall, A. L., & Bialek, W. (2003). Computation in a single neuron: Hodgkin and Huxley revisited. Neural Comp., 15, 1715–1749.

Baddeley, R., Abbott, L. F., Booth, M. C. A., Sengpiel, F., Freeman, T., Wakeman, E. A., & Rolls, E. T. (1997). Responses of neurons in primary and inferior temporal visual cortices to natural scenes. Proc. R. Soc. Lond. B, 264, 1775–1783.

Barlow, H. (1961). Possible principles underlying the transformations of sensory images. In W. Rosenblith (Ed.), Sensory communication (pp. 217–234). Cambridge, MA: MIT Press.

Barlow, H. (2001). Redundancy reduction revisited. Network: Comput. Neural Syst., 12, 241–253.

Bialek, W. (2002). Thinking about the brain. In H. Flyvbjerg, F. Julicher, P. Ormos, & F. David (Eds.), Physics of Biomolecules and Cells (pp. 485–577). Berlin: Springer-Verlag. See also physics/0205030.

Bialek, W., & de Ruyter van Steveninck, R. R. (2003). Features and dimensions: Motion estimation in fly vision. Unpublished manuscript.

Brenner, N., Bialek, W., & de Ruyter van Steveninck, R. R. (2000). Adaptive rescaling maximizes information transmission. Neuron, 26, 695–702.

Brenner, N., Strong, S. P., Koberle, R., Bialek, W., & de Ruyter van Steveninck, R. R. (2000). Synergy in a neural code. Neural Computation, 12, 1531–1552. See also physics/9902067.

Chichilnisky, E. J. (2001). A simple white noise analysis of neuronal light responses. Network: Comput. Neural Syst., 12, 199–213.

Creutzfeldt, O. D., & Northdurft, H. C. (1978). Representation of complex visual stimuli in the brain. Naturwissenschaften, 65, 307–318.

Cover, T. M., & Thomas, J. A. (1991). Information theory. New York: Wiley.

de Boer, E., & Kuyper, P. (1968). Triggered correlation. IEEE Trans. Biomed. Eng., 15, 169–179.

de Ruyter van Steveninck, R. R., & Bialek, W. (1988). Real-time performance of a movement-sensitive neuron in the blowfly visual system: Coding and information transfer in short spike sequences. Proc. R. Soc. Lond. B, 265, 259–265.

de Ruyter van Steveninck, R. R., Borst, A., & Bialek, W. (2000). Real time encoding of motion: Answerable questions and questionable answers from the fly's visual system. In J. M. Zanker & J. Zeil (Eds.), Motion vision: Computational, neural and ecological constraints (pp. 279–306). New York: Springer-Verlag. See also physics/0004060.

de Ruyter van Steveninck, R. R., Lewen, G. D., Strong, S. P., Koberle, R., & Bialek, W. (1997). Reproducibility and variability in neural spike trains. Science, 275, 1805–1808. See also cond-mat/9603127.

Note: Where available, we give references to the physics e-print archive, which may be found online at http://arxiv.org/abs/; thus, Bialek (2002) is available online at http://arxiv.org/abs/physics/0205030. Published papers may differ from the versions posted to the archive.


Dimitrov, A. G., & Miller, J. P. (2001). Neural coding and decoding: Communication channels and quantization. Network: Comput. Neural Syst., 12, 441–472.

Dong, D. W., & Atick, J. J. (1995). Statistics of natural time-varying images. Network: Comput. Neural Syst., 6, 345–358.

Fairhall, A. L., Lewen, G. D., Bialek, W., & de Ruyter van Steveninck, R. R. (2001). Efficiency and ambiguity in an adaptive neural code. Nature, 412, 787–792.

Kara, P., Reinagel, P., & Reid, R. C. (2000). Low response variability in simultaneously recorded retinal, thalamic, and cortical neurons. Neuron, 27, 635–646.

Lewen, G. D., Bialek, W., & de Ruyter van Steveninck, R. R. (2001). Neural coding of naturalistic motion stimuli. Network: Comput. Neural Syst., 12, 317–329. See also physics/0103088.

Mainen, Z. F., & Sejnowski, T. J. (1995). Reliability of spike timing in neocortical neurons. Science, 268, 1503–1506.

Paninski, L. (2003a). Convergence properties of three spike-triggered analysis techniques. Network: Comput. Neural Syst., 14, 437–464.

Paninski, L. (2003b). Estimation of entropy and mutual information. Neural Computation, 15, 1191–1253.

Panzeri, S., & Treves, A. (1996). Analytical estimates of limited sampling biases in different information measures. Network: Comput. Neural Syst., 7, 87–107.

Pola, G., Schultz, S. R., Petersen, R., & Panzeri, S. (2002). A practical guide to information analysis of spike trains. In R. Kotter (Ed.), Neuroscience databases: A practical guide (pp. 137–152). Norwell, MA: Kluwer.

Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P. (1992). Numerical recipes in C: The art of scientific computing. Cambridge: Cambridge University Press.

Reinagel, P., & Reid, R. C. (2000). Temporal coding of visual information in the thalamus. J. Neurosci., 20, 5392–5400.

Rieke, F., Bodnar, D. A., & Bialek, W. (1995). Naturalistic stimuli increase the rate and efficiency of information transmission by primary auditory afferents. Proc. R. Soc. Lond. B, 262, 259–265.

Rieke, F., Warland, D., de Ruyter van Steveninck, R. R., & Bialek, W. (1997). Spikes: Exploring the neural code. Cambridge, MA: MIT Press.

Ringach, D. L., Hawken, M. J., & Shapley, R. (2002). Receptive field structure of neurons in monkey visual cortex revealed by stimulation with natural image sequences. Journal of Vision, 2, 12–24.

Ringach, D. L., Sapiro, G., & Shapley, R. (1997). A subspace reverse-correlation technique for the study of visual neurons. Vision Res., 37, 2455–2464.

Rolls, E. T., Aggelopoulos, N. C., & Zheng, F. (2003). The receptive fields of inferior temporal cortex neurons in natural scenes. J. Neurosci., 23, 339–348.

Ruderman, D. L. (1994). The statistics of natural images. Network: Comput. Neural Syst., 5, 517–548.

Ruderman, D. L., & Bialek, W. (1994). Statistics of natural images: Scaling in the woods. Phys. Rev. Lett., 73, 814–817.

Schwartz, O., Chichilnisky, E. J., & Simoncelli, E. (2002). Characterizing neural gain control using spike-triggered covariance. In T. G. Dietterich, S. Becker, & Z. Ghahramani (Eds.), Advances in Neural Information Processing (pp. 269–276). Cambridge, MA: MIT Press.


Sen, K., Theunissen, F. E., & Doupe, A. J. (2001). Feature analysis of natural sounds in the songbird auditory forebrain. J. Neurophysiol., 86, 1445–1458.

Simoncelli, E., & Olshausen, B. A. (2001). Natural image statistics and neural representation. Annu. Rev. Neurosci., 24, 1193–1216.

Smirnakis, S. M., Berry, M. J., Warland, D. K., Bialek, W., & Meister, M. (1996). Adaptation of retinal processing to image contrast and spatial scale. Nature, 386, 69–73.

Smyth, D., Willmore, B., Baker, G. E., Thompson, I. D., & Tolhurst, D. J. (2003). The receptive-field organization of simple cells in primary visual cortex of ferrets under natural scene stimulation. J. Neurosci., 23, 4746–4759.

Stanley, G. B., Li, F. F., & Dan, Y. (1999). Reconstruction of natural scenes from ensemble responses in the lateral geniculate nucleus. J. Neurosci., 19, 8036–8042.

Strong, S. P., Koberle, R., de Ruyter van Steveninck, R. R., & Bialek, W. (1998). Entropy and information in neural spike trains. Phys. Rev. Lett., 80, 197–200.

Theunissen, F. E., Sen, K., & Doupe, A. J. (2000). Spectral-temporal receptive fields of nonlinear auditory neurons obtained using natural sounds. J. Neurosci., 20, 2315–2331.

Tishby, N., Pereira, F. C., & Bialek, W. (1999). The information bottleneck method. In B. Hajek & R. S. Sreenivas (Eds.), Proceedings of the 37th Allerton Conference on Communication, Control and Computing (pp. 368–377). Urbana: University of Illinois. See also physics/0004057.

Touryan, J., Lau, B., & Dan, Y. (2002). Isolation of relevant visual features from random stimuli for cortical complex cells. J. Neurosci., 22, 10811–10818.

Treves, A., & Panzeri, S. (1995). The upward bias in measures of information derived from limited data samples. Neural Comp., 7, 399–407.

von der Twer, T., & Macleod, D. I. A. (2001). Optimal nonlinear codes for the perception of natural colours. Network: Comput. Neural Syst., 12, 395–407.

Vickers, N. J., Christensen, T. A., Baker, T., & Hildebrand, J. G. (2001). Odour-plume dynamics influence the brain's olfactory code. Nature, 410, 466–470.

Vinje, W. E., & Gallant, J. L. (2000). Sparse coding and decorrelation in primary visual cortex during natural vision. Science, 287, 1273–1276.

Vinje, W. E., & Gallant, J. L. (2002). Natural stimulation of the nonclassical receptive field increases information transmission efficiency in V1. J. Neurosci., 22, 2904–2915.

Voss, R. F., & Clarke, J. (1975). "1/f noise" in music and speech. Nature, 317–318.

Weliky, M., Fiser, J., Hunt, R., & Wagner, D. N. (2003). Coding of natural scenes in primary visual cortex. Neuron, 37, 703–718.

Received January 21, 2003; accepted August 6, 2003.

Page 2: Analyzing Neural Responses to Natural Signals: Maximally ...wbialek/rome/refs/sharpee+al_04.pdfAnalyzing Neural Responses to Natural Signals 225 Figure 1: Schematic illustration of

224 T Sharpee N Rust and W Bialek

1 Introduction

From olfaction to vision and audition a growing number of experiments areexamining the responses of sensory neurons to natural stimuli (Creutzfeldtamp Northdurft 1978 Rieke Bodnar amp Bialek 1995 Baddeley et al 1997Stanley Li amp Dan 1999 Theunissen Sen amp Doupe 2000 Vinje amp Gallant2000 2002 Lewen Bialek amp de Ruyter van Steveninck 2001 Sen The-unissen amp Doupe 2001 Vickers Christensen Baker amp Hildebrand 2001Ringach Hawken amp Shapley 2002 Weliky Fiser Hunt amp Wagner 2003Rolls Aggelopoulos amp Zheng 2003 Smyth WillmoreBaker Thompson ampTolhurst 2003) Observing the full dynamic range of neural responses mayrequire using stimulus ensembles that approximate those occurring in na-ture (Rieke Warland de Ruyter van Steveninck amp Bialek 1997 Simoncelliamp Olshausen 2001) and it is an attractive hypothesis that the neural repre-sentation of these natural signals may be optimized in some way (Barlow1961 2001 von der Twer amp Macleod 2001 Bialek 2002) Many neuronsexhibit strongly nonlinear and adaptive responses that are unlikely to bepredicted from a combination of responses to simple stimuli for exampleneurons have been shown to adapt to the distribution of sensory inputs sothat any characterization of these responses will depend on context (Smir-nakis Berry Warland Bialek amp Meister 1996 Brenner Bialek amp de Ruytervan Steveninck 2000 Fairhall Lewen Bialek amp de Ruyter van Steveninck2001) Finally the variability of a neuronrsquos responses decreases substan-tially when complex dynamical rather than static stimuli are used (Mainenamp Sejnowski 1995 de Ruyter van Steveninck Lewen Strong Koberle ampBialek 1997 Kara Reinagel amp Reid 2000 de Ruyter van Steveninck Borstamp Bialek 2000) All of these arguments point to the need for general toolsto analyze the neural responses to complex naturalistic inputs

The stimuli analyzed by sensory neurons are intrinsically high-dimen-sional with dimensions D raquo 102 iexcl 103 For example in the case of visualneurons the input is commonly specied as light intensity on a grid of atleast 10pound10 pixels Each of the presented stimuli can be described as a vectors in this high-dimensional stimulus space (see Figure 1) The dimensionalitybecomes even larger if stimulus history has to be considered as well Forexample if we are interested in how the past N frames of the movie affectthe probability of a spike then the stimulus s being a concatenation of thepast N samples will have dimensionality N times that of a single frameWe also assume that the probability distribution Ps is sampled during anexperiment ergodically so that we can exchange averages over time withaverages over the true distribution as needed

Although direct exploration of a D raquo 102 iexcl 103-dimensional stimulusspace is beyond the constraints of experimental data collection progress canbe made provided we make certain assumptions about how the responsehas been generated In the simplest model the probability of response canbe described by one receptive eld (RF) or linear lter (Rieke et al 1997)

Analyzing Neural Responses to Natural Signals 225

Figure 1 Schematic illustration of a model with a onendashdimensional relevantsubspace Oe1 is the relevant dimension and Oe2 and Oe3 are irrelevant ones Shownare three example stimuli s s0 and s00 the receptive eld of a model neuronmdashtherelevant dimension Oe1 and our guess v for the relevant dimension Probabilitiesof a spike PspikejscentOe1 and Pspikejscentv are calculatedbyrst projecting all of thestimuli s onto each of the two vectors Oe1 and v respectively and then applyingequations (23 12 11) sequentially Our guess v for the relevant dimension isadjusted during the progress of the algorithm in such a way as to maximize Iv

of equation (25) which makes vector v approach the true relevant dimensionOe1

The RF can be thought of as a template or special direction Oe1 in the stimulusspace1 such that the neuronrsquos response depends on only a projection of agiven stimulus s onto Oe1 although the dependence of the response on this

1 The notation Oe denotes a unit vector since we are interested only in the direction thevector species and not in its length

226 T Sharpee N Rust and W Bialek

projection can be strongly nonlinear (cf Figure 1) In this simple modelthe reverse correlation method (de Boer amp Kuyper 1968 Rieke et al 1997Chichilnisky 2001) can be used to recover the vector Oe1 by analyzing theneuronrsquos responses to gaussian white noise In a more general case theprobability of the response depends on projections si D Oei cent s of the stimuluss on a set of K vectors fOe1 Oe2 OeKg

Pspikejs D Pspike f s1 s2 sK (11)

where Pspikejs is the probability of a spike given a stimulus s and Pspike

is the average ring rate In what follows we will call the subspace spannedby the set of vectors fOe1 Oe2 OeKg the relevant subspace (RS)2 We reiteratethat vectors fOeig 1 middot i middot K may also describe how the time dependence of thestimulus s affects the probability of a spike An example of such a relevantdimension would be a spatiotemporal RF of a visual neuron Although theideas developed below can be used to analyze input-output functions fwith respect to different neural responses such as patterns of spikes in time(de Ruyter van Steveninck amp Bialek 1988 Brenner Strong Koberle Bialekamp de Ruyter van Steveninck 2000 Reinagel amp Reid 2000) for illustrationpurposes we choose a single spike as the response of interest3

Equation 11 in itself is not yet a simplication if the dimensionality K ofthe RS is equal to the dimensionality D of the stimulus space In this articlewe will assume that the neuronrsquos ring is sensitive only to a small numberof stimulus features (K iquest D) While the general idea of searching for low-dimensional structure in high-dimensional data is very old our motivationhere comes from work on the y visual system where it was shown explic-

2 Since the analysis does not depend on a particular choice of a basis within the fullD-dimensional stimulus space for clarity we choose the basis in which the rst K basisvectors span the relevant subspace and the remaining D iexcl K vectors span the irrelevantsubspace

3 We emphasize that our focus here on single spikes is not equivalent to assumingthat the spike train is a Poisson process modulated by the stimulus No matter what thestatistical structure of the spike train is we always can ask what features of the stimulusare relevant for setting the probability of generating a single spike at one moment in timeFrom an information-theoretic point of view asking for stimulus features that capturethe mutual information between the stimulus and the arrival times of single spikes isa well-posed question even if successive spikes do not carry independent informationNote also that spikesrsquo carrying independent information is not the same as spikesrsquo beinggenerated as a Poisson process On the other hand if (for example) different temporalpatterns of spikes carry information about different stimulus features then analysis ofsingle spikes will result in a relevant subspace of artifactually high dimensionality Thusit is important that the approach discussed here carries over without modication to theanalysis of relevant dimensions for the generation of any discrete event such as a patternof spikes across time in one cell or synchronous spikes across a population of cells Fora related discussion of relevant dimensions and spike patterns using covariance matrixmethods see de Ruyter van Steveninck and Bialek (1988) and Aguera y Arcas Fairhalland Bialek (2003)

Analyzing Neural Responses to Natural Signals 227

itly that patterns of action potentials in identied motion-sensitive neuronsare correlated with low-dimensional projections of the high-dimensional vi-sual input (de Ruyter van Steveninck amp Bialek 1988 Brenner Bialek et al2000 Bialek amp de Ruyter van Steveninck 2003) The input-output functionf in equation 11 can be strongly nonlinear but it is presumed to dependon only a small number of projections This assumption appears to be lessstringent than that of approximate linearity which one makes when char-acterizing a neuronrsquos response in terms of Wiener kernels (see eg thediscussion in section 213 of Rieke et al 1997) The most difcult part in re-constructing the input-output function is to nd the RS Note that for K gt 1a description in terms of any linear combination of vectors fOe1 Oe2 OeKgis just as valid since we did not make any assumptions as to a particularform of the nonlinear function f

Once the relevant subspace is known the probability Pspikejs becomesa function of only a few parameters and it becomes feasible to map thisfunction experimentally inverting the probability distributions accordingto Bayesrsquo rule

f s1 s2 sK D Ps1 s2 sKjspike

Ps1 s2 sK (12)

If stimuli are chosen from a correlated gaussian noise ensemble then theneural response can be characterized by the spike-triggered covariancemethod (de Ruyter van Steveninck amp Bialek 1988 Brenner Bialek et al2000 Schwartz Chichilnisky amp Simoncelli2002 Touryan Lau amp Dan 2002Bialek amp de Ruyter van Steveninck 2003) It can be shown that the dimen-sionality of the RS is equal to the number of nonzero eigenvalues of a matrixgiven by a difference between covariance matrices of all presented stimuliand stimuli conditional on a spike Moreover the RS is spanned by theeigenvectors associated with the nonzero eigenvalues multiplied by the in-verse of the a priori covariance matrix Compared to the reverse correlationmethod we are no longer limited to nding only one of the relevant dimen-sions fOeig 1 middot i middot K Both the reverse correlation and the spikendashtriggeredcovariance method however give rigorously interpretable results only forgaussian distributions of inputs

In this article we investigate whether it is possible to lift the requirementfor stimuli to be gaussian When using natural stimuli which certainly arenongaussian the RS cannot be found by the spike-triggered covariancemethod Similarly the reverse correlation method does not give the correctRF even in the simplest case where the input-output function in equation 11depends on only one projection (see appendix A for a discussion of thispoint) However vectors that span the RS are clearly special directions inthe stimulus space independent of assumptions about Ps This notion canbe quantied by the Shannon information We note that the stimuli s do nothave to lie on a low-dimensional manifold within the overall D-dimensional

228 T Sharpee N Rust and W Bialek

space4 However since we assume that the neuronrsquos input-output functiondepends on a small number of relevant dimensions the ensemble of stimuliconditional on a spike may exhibit clear clustering This makes the proposedmethod of looking for the RS complementary to the clustering of stimuliconditional on a spike done in the information bottleneck method (TishbyPereira amp Bialek 1999 see also Dimitrov amp Miller 2001) Noninformation-based measures of similarity between probability distributions Ps andPsjspike have also been proposed to nd the RS (Paninski 2003a)

To summarize our assumptions

sup2 The sampling of the probability distribution of stimuli Ps is ergodicand stationary across repetitions The probability distribution is notassumed to be gaussian The ensemble of stimuli described by Ps

does not have to lie on a low-dimensional manifold embedded in theoverall D-dimensional space

sup2 We choose a single spike as the response of interest (for illustrationpurposes only) An identical scheme can be applied for example toparticular interspike intervals or to synchronous spikes from a pair ofneurons

sup2 The subspace relevant for generating a spike is low dimensional andEuclidean (cf equation 11)

sup2 The input-output function which is dened within the low-dimen-sional RS can be arbitrarily nonlinear It is obtained experimentally bysampling the probability distributions Ps and Psjspike within theRS

The article is organized as follows In section 2 we discuss how an opti-mization problem can be formulated to nd the RS A particular algorithmused to implement the optimization scheme is described in section 3 In sec-tion 4 we illustrate how the optimization scheme works with natural stimulifor model orientation-sensitive cells with one and two relevant dimensionsmuch like simple and complex cells found in primary visual cortex as wellas for a model auditory neuron responding to natural sounds We also dis-cuss the convergence of our estimates of the RS as a function of data set sizeWe emphasize that our optimization scheme does not rely on any specicstatistical properties of the stimulus ensemble and thus can be used withnatural stimuli

4 If one suspects that neurons are sensitive to low-dimensional features of their inputone might be tempted to analyze neural responses to stimuli that explore only the (puta-tive) relevant subspace perhaps along the line of the subspace reverse correlation method(Ringach et al 1997) Our approach (like the spike-triggered covariance approach) is dif-ferent because it allows the analysis of responses to stimuli that live in the full space andinstead we let the neuron ldquotell usrdquo which low-dimensional subspace is relevant

Analyzing Neural Responses to Natural Signals 229

2 Information as an Objective Function

When analyzing neural responses we compare the a priori probability dis-tribution of all presented stimuli with the probability distribution of stimulithat lead to a spike (de Ruyter van Steveninck amp Bialek 1988) For gaus-sian signals the probability distribution can be characterized by its secondmoment the covariance matrix However an ensemble of natural stimuli isnot gaussian so that in a general case neither second nor any nite num-ber of moments is sufcient to describe the probability distribution In thissituation Shannon information provides a rigorous way of comparing twoprobability distributions The average information carried by the arrivaltime of one spike is given by Brenner Strong et al (2000)

Ispike DZ

dsPsjspike log2

microPsjspike

Ps

para (21)

where ds denotes integration over full Dndashdimensional stimulus space Theinformation per spike as written in equation 21 is difcult to estimate exper-imentally since it requires either sampling of the high-dimensional proba-bility distribution Psjspike or a model of how spikes were generated thatis the knowledge of low-dimensional RS However it is possible to calculateIspike in a model-independent way if stimuli are presented multiple timesto estimate the probability distribution Pspikejs Then

Ispike Dfrac12

Pspikejs

Pspikelog2

microPspikejs

Pspike

parafrac34

s (22)

where the average is taken over all presented stimuli This can be useful inpractice (Brenner Strong et al 2000) because we can replace the ensembleaverage his with a time average and Pspikejs with the time-dependentspike rate rt Note that for a nite data set of N repetitions the obtainedvalue IspikeN will be on average larger than Ispike1 The true value Ispikecan be found by either subtracting an expected bias value which is of theorder of raquo 1=PspikeN 2 ln 2 (Treves amp Panzeri 1995 Panzeri amp Treves1996 Pola Schultz Petersen amp Panzeri 2002 Paninski 2003b) or extrapo-lating to N 1 (Brenner Strong et al 2000 Strong Koberle de Ruytervan Steveninck amp Bialek 1998) Measurement of Ispike in this way providesa model independent benchmark against which we can compare any de-scription of the neuronrsquos input-output relation

Our assumption is that spikes are generated according to a projectiononto a low-dimensional subspace Therefore to characterize relevance of aparticular direction v in the stimulus space we project all of the presentedstimuli onto v and form probability distributions Pvx and Pvxjspike ofprojection values x for the a priori stimulus ensemble and that conditional

230 T Sharpee N Rust and W Bialek

on a spike respectively

Pvx D hplusmnx iexcl s cent vis (23)

Pvxjspike D hplusmnx iexcl s cent vjspikeis (24)

where plusmnx is a delta function In practice both the average hcent cent centis acuteR

ds cent cent centPs over the a priori stimulus ensemble and the average hcent cent cent jspikeisacute

Rds cent cent cent Psjspike over the ensemble conditional on a spike are calculated

by binning the range of projection values x The probability distributionsare then obtained as histograms normalized in a such a way that the sumover all bins gives 1 The mutual information between spike arrival timesand the projection x by analogy with equation 21 is

Iv DZ

dxPvxjspike log2

microPvxjspike

Pvx

para (25)

which is also the Kullback-Leibler divergence D[PvxjspikejjPvx] Noticethat this information is a function of the direction v The information Iv

provides an invariant measure of how much the occurrence of a spike isdetermined by projection on the direction v It is a function only of directionin the stimulus space and does not change when vector v is multiplied bya constant This can be seen by noting that for any probability distributionand any constant c Pcvx D ciexcl1Pvx=c (see also theorem 964 of Cover ampThomas 1991) When evaluated along any vector v Iv middot Ispike The totalinformation Ispike can be recovered along one particular direction only ifv D Oe1 and only if the RS is one-dimensional

By analogy with equation 25 one could also calculate information Iv1

vn along a set of several directions fv1 vng based on the multipointprobability distributions of projection values x1 x2 xn along vectors v1v2 vn of interest

Pv1vn fxigjspike D

nY

iD1

plusmnxi iexcl s cent vijspike

+

s

(26)

Pv1vn fxig D

nY

iD1

plusmnxi iexcl s cent vi

+

s

(27)

If we are successful in nding all of the directions fOeig 1 middot i middot K con-tributing to the input-output relation equation 11 then the informationevaluated in this subspace will be equal to the total information Ispike Whenwe calculate information along a set of K vectors that are slightly off fromthe RS the answer of course is smaller than Ispike and is initially quadraticin small deviations plusmnvi One can therefore hope to nd the RS by maximizinginformation with respect to K vectors simultaneously The information does

Analyzing Neural Responses to Natural Signals 231

not increase if more vectors outside the RS are included For uncorrelatedstimuli any vector or a set of vectors that maximizes Iv belongs to theRS On the other hand as discussed in appendix B the result of optimiza-tion with respect to a number of vectors k lt K may deviate from the RS ifstimuli are correlated To nd the RS we rst maximize Iv and comparethis maximum with Ispike which is estimated according to equation 22 Ifthe difference exceeds that expected from nite sampling corrections weincrement the number of directions with respect to which information issimultaneously maximized

3 Optimization Algorithm

In this section we describe a particular algorithm we used to look for themost informative dimensions in order to nd the relevant subspace Wemake no claim that our choice of the algorithm is most efcient However itdoes give reproducible results for different starting points and spike trainswith differences taken to simulate neural noise Overall choices for an algo-rithm are broader because the information Iv as dened by equation 25is a continuous function whose gradient can be computed We nd (seeappendix C for a derivation)

rvI DZ

dxPvxpoundhsjx spikei iexcl hsjxi

currencentmicro

ddx

Pvxjspike

Pvx

para (31)

where

hsjx spikei D1

Pxjspike

Zds splusmnx iexcl s cent vPsjspike (32)

and similarly for hsjxi Since information does not change with the length ofthe vector we have v centrvI D 0 as also can be seen directly from equation 31

As an optimization algorithm we have used a combination of gradientascent and simulated annealing algorithms successive line maximizationswere done along the direction of the gradient (Press Teukolsky Vetterlingamp Flannery 1992) During line maximizations a point with a smaller valueof information was accepted according to Boltzmann statistics with proba-bility exp[IviC1 iexcl Ivi=T] The effective temperature T is reduced byfactor of 1 iexcl sup2T upon completion of each line maximization Parameters ofthe simulated annealing algorithm to be adjusted are the starting tempera-ture T0 and the cooling rate sup2T 1T D iexclsup2TT When maximizing with respectto one vector we used values T0 D 1 and sup2T D 005 When maximizing withrespect to two vectors we either used the cooling schedule with sup2T D 0005and repeated it several times (four times in our case) or allowed the effec-tive temperature T to increase by a factor of 10 upon convergence to a localmaximum (keeping T middot T0 always) while limiting the total number of linemaximizations

232 T Sharpee N Rust and W Bialek

10shy 4

10shy 2

1000

200

400

600

0 05 10

1

2

3w h ite no ise stim u li

IIsp ike

IIspike

P(II sp

ike )

natu ralis tic s tim u li

(a )

(b )

(c)

Figure 2 The probability distribution of information values in units of the totalinformation per spike in the case of (a) uncorrelated binary noise stimuli (b)correlated gaussian noise with power spectrum of natural scenes and (c) stimuliderived from natural scenes (patches of photos) The distribution was obtainedby calculating information along 105 random vectors for a model cell with onerelevant dimension Note the different scales in the two panels

The problem of maximizing a function often is related to the problemof making a good initial guess It turns out however that the choice of astarting point is much less crucial in cases where the stimuli are correlatedTo illustrate this point we plot in Figure 2 the probability distribution ofinformation along random directions v for both white noise and naturalisticstimuli in a model with one relevant dimension For uncorrelated stimulinot only is information equal to zero for a vector that is perpendicularto the relevant subspace but in addition the derivative is equal to zeroSince a randomly chosen vector has on average a small projection on therelevant subspace (compared to its length) vr=jvj raquo

pn=d the corresponding

information can be found by expanding in vr=jvj

I frac14 v2r

2jvj2

ZdxP Oeir

x

AacuteP0

Oeirx

POeirx

2

[hsOerjspikei iexcl hsOeri]2 (33)

where vector v D vr Oer C vir Oeir is decomposed in its components inside andoutside the RS respectively The average information for a random vectoris therefore raquo hv2

r i=jvj2 D K=DIn cases where stimuli are drawn from a gaussian ensemble with cor-

relations an expression for the information values has a similar structureto equation 33 To see this we transform to Fourier space and normalizeeach Fourier component by the square root of the power spectrum Sk

Analyzing Neural Responses to Natural Signals 233

In this new basis both the vectors feig 1 middot i middot K forming the RS and therandomly chosen vector v along which information is being evaluated areto be multiplied by

pSk Thus if we now substitute for the dot product

v2r the convolution weighted by the power spectrum

PKi v curren Oei

2 where

v curren Oei DP

k vkOeikSkpPk v2kSk

pPk Oe2

i kSk (34)

then equation 33 will describe information values along randomly chosenvectors v for correlated gaussian stimuli with the power spectrum SkAlthough both vr and vk are gaussian variables with variance raquo 1=Dthe weighted convolution has not only a much larger variance but is alsostrongly nongaussian (the nongaussian character is due to the normalizingfactor

Pk v2kSk in the denominator of equation 34) As for the variance

it can be estimated as lt v curren Oe12 gtD 4frac14= ln2 D in cases where stimuli aretaken as patches of correlated gaussian noise with the two-dimensionalpower spectrum Sk D A=k2 The large values of the weighted dot productv curren Oei 1 middot i middot K result not only in signicant information values along arandomly chosen vector but also in large magnitudes of the derivative rIwhich is no longer dominated by noise contrary to the case of uncorrelatedstimuli In the end we nd that randomly choosing one of the presentedframes as a starting guess is sufcient

4 Results

We tested the scheme of looking for the most informative dimensions onmodel neurons that respond to stimuli derived from natural scenes andsounds As visual stimuli we used scans across natural scenes which weretaken as black and white photos digitized to 8 bits with no correctionsmade for the camerarsquos light-intensity transformation function Some statis-tical properties of the stimulus set are shown in Figure 3 Qualitatively theyreproduce the known results on the statistics of natural scenes (Ruderman ampBialek 1994 Ruderman 1994 Dong amp Atick 1995 Simoncelli amp Olshausen2001) Most important properties for this study are strong spatial correla-tions as evident from the power spectrum S(k) plotted in Figure 3b anddeviations of the probability distribution from a gaussian one The nongaus-sian character can be seen in Figure 3c where the probability distributionof intensities is shown and in Figure 3d which shows the distribution ofprojections on a Gabor lter (in what follows the units of projections suchas s1 will be given in units of the corresponding standard deviations) Ourgoal is to demonstrate that although the correlations present in the ensem-ble are nongaussian they can be removed successfully from the estimate ofvectors dening the RS

234 T Sharpee N Rust and W Bialek

shy 2 0 210

shy 2

100

200 400 600

100

200

300

400

shy 4 shy 2 0 2 4 10

shy 5

100

105

10shy 1

100

10shy 5

100

105

P(s1 ) P(I)

S(k) s2(I)

(a) (b)

(c) (d)

(Ishy ltIgt ) s(I) (s1shy lts

1gt ) s(s

1 )

ka p

Figure 3 Statistical properties of the visual stimulus ensemble (a) One of thephotos Stimuli would be 30 pound 30 patches taken from the overall photograph(b) The power spectrum in units of light intensity variance frac34 2I averaged overorientation as a function of dimensionless wave vector ka where a is the pixelsize (c) The probability distribution of light intensity in units of frac34 I (d) Theprobability distribution of projections between stimuli and a Gabor lter alsoin units of the corresponding standard deviation frac34 s1

41 A Model Simple Cell Our rst example is based on the propertiesof simple cells found in the primary visual cortex A model phase- andorientation-sensitive cell has a single relevant dimension Oe1 shown in Fig-ure 4a A given stimulus s leads to a spike if the projection s1 D s cent Oe1 reachesa threshold value micro in the presence of noise

Pspikejs

Pspikeacute f s1 D hHs1 iexcl micro C raquo i (41)

where a gaussian random variable raquo of variance frac34 2 models additive noiseand the function Hx D 1 for x gt 0 and zero otherwise Together with theRF Oe1 the parameters micro for threshold and the noise variance frac34 2 determinethe input-output function

Analyzing Neural Responses to Natural Signals 235

Figure 4 Analysis of a model simple cell with RF shown in (a) (b) The ldquoexactrdquospike-triggered average vsta (c) An attempt to remove correlations accordingto the reverse correlation method Ciexcl1

a priorivsta (d) The normalized vector Ovmax

found by maximizing information (e) Convergence of the algorithm accordingto information Iv and projection Ovcent Oe1 between normalized vectors as a functionof the inverse effective temperature Tiexcl1 (f) The probability of a spike Pspikejs centOvmax (crosses) is compared to Pspikejs1 used in generating spikes (solid line)Parameters of the model are frac34 D 031 and micro D 184 both given in units ofstandard deviation of s1 which is also the units for the x-axis in f

When the spike-triggered average (STA) or reverse correlation functionis computed from the responses to correlated stimuli the resulting vectorwill be broadened due to spatial correlations present in the stimuli (see Fig-ure 4b) For stimuli that are drawn from a gaussian probability distributionthe effects of correlations could be removed by multiplying vsta by the in-verse of the a priori covariance matrix according to the reverse correlationmethod OvGaussian est Ciexcl1

a priorivsta equation A2 However this proceduretends to amplify noise To separate errors due to neural noise from those

236 T Sharpee N Rust and W Bialek

due to the nongaussian character of correlations note that in a model theeffect of neural noise on our estimate of the STA can be eliminated by aver-aging the presented stimuli weighted with the exact ring rate as opposedto using a histogram of responses to estimate Pspikejs from a nite set oftrials We have used this ldquoexactrdquo STA

vsta DZ

ds sPsjspike D 1Pspike

ZdsPs sPspikejs (42)

in calculations presented in Figures 4b and 4c Even with this noiseless STA(the equivalent of collecting an innite data set) the standard decorrelationprocedure is not valid for nongaussian stimuli and nonlinear input-outputfunctions as discussed in detail in appendix A The result of such a decorre-lation in our example is shown in Figure 4c It clearly is missing some of thestructure in the model lter with projection Oe1 cent OvGaussian est frac14 014 The dis-crepancy is not due to neural noise or nite sampling since the ldquoexactrdquo STAwas decorrelated the absence of noise in the exact STA also means that therewould be no justication for smoothing the results of the decorrelation Thediscrepancy between the true RF and the decorrelated STA increases withthe strength of nonlinearity in the input-output function

In contrast it is possible to obtain a good estimate of the relevant di-mension Oe1 by maximizing information directly (see Figure 4d) A typicalprogress of the simulated annealing algorithm with decreasing temperatureT is shown in Figure 4e There we plot both the information along the vectorand its projection on Oe1 We note that while information I remains almostconstant the value of projection continues to improve Qualitatively this isbecause the probability distributions depend exponentially on informationThe nal value of projection depends on the size of the data set as discussedbelow In the example shown in Figure 4 there were frac14 50 000 spikes withaverage probability of spike frac14 005 per frame and the reconstructed vectorhas a projection Ovmax cent Oe1 D 0920 sect 0006 Having estimated the RF onecan proceed to sample the nonlinear input-output function This is done byconstructing histograms for Ps cent Ovmax and Ps cent Ovmaxjspike of projectionsonto vector Ovmax found by maximizing information and taking their ratioas in equation 12 In Figure 4f we compare Pspikejs cent Ovmax (crosses) withthe probability Pspikejs1 used in the model (solid line)

42 Estimated Deviation from the Optimal Dimension When infor-mation is calculated from a nite data set the (normalized) vector Ov whichmaximizes I will deviate from the true RF Oe1 The deviation plusmnv D OviexclOe1 arisesbecause the probability distributions are estimated from experimental his-tograms and differ from the distributions found in the limit of innite datasize For a simple cell the quality of reconstruction can be characterized bythe projection Ov cent Oe1 D 1 iexcl 1

2plusmnv2 where both Ov and Oe1 are normalized andplusmnv is by denition orthogonal to Oe1 The deviation plusmnv raquo Aiexcl1rI where A is

Analyzing Neural Responses to Natural Signals 237

the Hessian of information Its structure is similar to that of a covariancematrix

Aij D1

ln 2

ZdxPxjspike

sup3ddx

lnPxjspike

Px

acute2

pound hsisjjxi iexcl hsijxihsjjxi (43)

When averaged over possible outcomes of N trials the gradient of infor-mation is zero for the optimal direction Here in order to evaluate hplusmnv2i DTr[Aiexcl1hrIrITiAiexcl1] we need to know the variance of the gradient of I As-suming that the probability of generating a spike is independent for differentbins we can estimate hrIirIji raquo Aij=Nspike ln 2 Therefore an expected er-ror in the reconstruction of the optimal lter is inversely proportional tothe number of spikes The corresponding expected value of the projectionbetween the reconstructed vector and the relevant direction Oe1 is given by

Ov cent Oe1 frac14 1 iexcl 12

hplusmnv2i D 1 iexcl Tr0[Aiexcl1]2Nspike ln 2

(44)

where Tr0 means that the trace is taken in the subspace orthogonal to themodel lter5 The estimate equation 44 can be calculated without knowl-edge of the underlying model it is raquo D=2Nspike This behavior should alsohold in cases where the stimulus dimensions are expanded to include timeThe errors are expected to increase in proportion to the increased dimen-sionality In the case of a complex cell with two relevant dimensions theerror is expected to be twice that for a cell with single relevant dimensionalso discussed in section 43

We emphasize that the error estimate according to equation 44 is of thesame order as errors of the reverse correlation method when it is appliedfor gaussian ensembles The latter are given by Tr[Ciexcl1] iexcl Ciexcl1

11 =[2Nspike

h f02s1i] Of course if the reverse correlation method were to be applied

to the nongaussian ensemble the errors would be larger In Figure 5 weshow the result of simulations for various numbers of trials and thereforeNspike The average projection of the normalized reconstructed vector Ov onthe RF Oe1 behaves initially as 1=Nspike (dashed line) For smaller data setsin this case Nspikes raquolt 30 000 corrections raquo Niexcl2

spikes become important forestimating the expected errors of the algorithm Happily these correctionshave a sign such that smaller data sets are more effective than one mighthave expected from the asymptotic calculation This can be veried fromthe expansion Ov cent Oe1 D [1 iexcl plusmnv2]iexcl1=2 frac14 1 iexcl 1

2 hplusmnv2i C 38 hplusmnv4i were only the rst

two terms were taken into account in equation 44

5 By denition plusmnv1 D plusmnv cent Oe1 D 0 and therefore hplusmnv21i Aiexcl1

11 is to be subtracted fromhplusmnv2i Tr[Aiexcl1] Because Oe1 is an eigenvector of A with zero eigenvalue Aiexcl1

11 is inniteTherefore the proper treatment is to take the trace in the subspace orthogonal to Oe1

238 T Sharpee N Rust and W Bialek

0 02 04 06 08x 10

shy 4

075

08

085

09

095

1

Nshy 1spike

e 1 times v m

ax

Figure 5 Projection of vector Ovmax that maximizes information on RF Oe1 is plot-ted as a function of the number of spikes The solid line is a quadratic t in1=Nspike and the dashed line is the leading linear term in 1=Nspike This set ofsimulations was carried out for a model visual neuron with one relevant di-mension from Figure 4a and the input-output function (see equation 41) withparameter values frac34 frac14 061frac34 s1 micro frac14 061frac34 s1 For this model neuron the linearapproximation for the expected error is applicable for Nspike raquogt 30 000

43 A Model Complex Cell A sequence of spikes from a model cell withtwo relevant dimensions was simulated by projecting each of the stimuli onvectors that differ by frac14=2 in their spatial phase taken to mimic properties ofcomplex cells as in Figure 6 A particular frame leads to a spike accordingto a logical OR that is if either js1j or js2j exceeds a threshold value micro in thepresence of noise where s1 D s cent Oe1 s2 D s cent Oe2 Similarly to equation 41

Pspikejs

PspikeD f s1 s2 D hHjs1j iexcl micro iexcl raquo1 _ Hjs2j iexcl micro iexcl raquo2i (45)

where raquo1 and raquo2 are independent gaussian variables The sampling of thisinput-output function by our particular set of natural stimuli is shown inFigure 6c As is well known reverse correlation fails in this case because thespikendashtriggered average stimulus is zero although with gaussian stimulithe spike-triggered covariance method would recover the relevant dimen-sions (Touryan et al 2002) Here we show that searching for maximallyinformative dimensions allows us to recover the relevant subspace evenunder more natural stimulus conditions

Analyzing Neural Responses to Natural Signals 239

Figure 6 Analysis of a model complex cell with relevant dimensions Oe1 and Oe2

shown in (a) and (b) respectively Spikes are generated according to an ldquoORrdquoinput-output function f s1 s2 with the threshold micro frac14 061frac34 s1 and noise stan-dard deviation frac34 D 031frac34 s1 (cd) Vectors v1 and v2 found by maximizinginformation Iv1 v2

We start by maximizing information with respect to one vector Contraryto the result in Figure 4e for a simple cell one optimal dimension recoversonly about 60 of the total information per spike (see equation 22) Perhapssurprisingly because of the strong correlations in natural scenes even a pro-jection onto a random vector in the D raquo 103-dimensional stimulus spacehas a high probability of explaining 60 of total information per spike ascan be seen in Figure 2 We therefore go on to maximize information withrespect to two vectors As a result of maximization we obtain two vectors v1and v2 shown in Figure 6 The information along them is Iv1 v2 frac14 090which is within the range of information values obtained along differentlinear combinations of the two model vectors IOe1 Oe2=Ispike D 089 sect 011Therefore the description of neuronrsquos ring in terms of vectors v1 and v2is complete up to the noise level and we do not have to look for extra rel-evant dimensions Practically the number of relevant dimensions can bedetermined by comparing Iv1 v2 to either Ispike or Iv1 v2 v3 the latterbeing the result of maximization with respect to three vectors simultane-ously As mentioned in section 1 information along set a of vectors does notincrease when extra dimensions are added to the relevant subspace There-fore if Iv1 v2 D Iv1 v2 v3 (again up to the noise level) this means that

240 T Sharpee N Rust and W Bialek

there are only two relevant dimensions Using Ispike for comparison withIv1 v2 has the advantage of not having to look for an extra dimensionwhich can be computationally intensive However Ispike might be subjectto larger systematic bias errors than Iv1 v2 v3

Vectors v1 and v2 obtained by maximizing Iv1 v2 are not exactly or-thogonal and are also rotated with respect to Oe1 and Oe2 However the qualityof reconstruction as well as the value of information Iv1 v2 is indepen-dent of a particular choice of basis with the RS The appropriate measure ofsimilarity between the two planes is the dot product of their normals In theexample of Figure 6 OnOe1Oe2 cent Onv1 v2 D 082sect007 where OnOe1Oe2 is a normal tothe plane passing through vectors Oe1 and Oe2 Maximizing information withrespect to two dimensions requires a signicantly slower cooling rate andconsequently longer computational times However the expected error inthe reconstruction 1 iexcl OnOe1 Oe2 cent Onv1 v2 scales as 1=Nspike behavior similarto equation 44 and is roughly twice that for a simple cell given the samenumber of spikes We make vectors v1 and v2 orthogonal to each othersupon completion of the algorithm

44 A Model Auditory Neuron with One Relevant Dimension Be-cause stimuli s are treated as vectors in an abstract space the method oflooking for the most informative dimensions can be applied equally wellto auditory as well as to visual neurons Here we illustrate the method byconsidering a model auditory neuron with one relevant dimension which isshown in Figure 7c and is taken to mimic the properties of cochlear neuronsThe model neuron is probed by two ensembles of naturalistic stimuli oneis a recording of a native Russian speaker reading a piece of Russian proseand the other is a recording of a piece of English prose read by a nativeEnglish speaker Both ensembles are nongaussian and exhibit amplitudedistributions with long nearly exponential tails (see Figure 7a) which arequalitatively similar to those of light intensities in natural scenes (Voss ampClarke 1975 Ruderman 1994) However the power spectrum is different inthe two cases as can be seen in Figure 7b The differences in the correlationstructure in particular lead to different STAs across the two ensembles (cfFigure 7d) Both of the STAs also deviate from the model lter shown inFigure 7c

Despite differences in the probability distributions Ps it is possibleto recover the relevant dimension of the model neuron by maximizing in-formation In Figure 7c we show the two most informative vectors foundby running the algorithm for the two ensembles and replot the model lterfrom Figure 7c to show that the three vectors overlap almost perfectly Thusdifferent nongaussian correlations can be successfully removed to obtain anestimate of the relevant dimension If the most informative vector changeswith the stimulus ensemble this can be interpreted as caused by adaptationto the probability distribution

Analyzing Neural Responses to Natural Signals 241

shy 10 0 1010

shy 6

10shy 4

10shy 2

100

5 10 15 20

0001

001

01

0 5 10

shy 01

0

01

0 5 10

shy 01

0

01

0 5 10

shy 01

0

01

shy 5 0 5 0

05

1

11

2

(a) (b)

(c) (d)

(e)

2

1

t(ms)

t(ms)

t(ms)

P(A

mpl

itude

)

Amplitude (units of RMS) frequency (kHz)

S(w)

2

(f)

P(s

pike

| s times

vm

ax )

s times vmax

Figure 7 A model auditory neuron is probed by two natural ensembles of stim-uli a piece of English prose (1) and a piece of of Russian prose (2) The sizeof the stimulus ensemble was the same in both cases and the sampling ratewas 441 kHz (a) The probability distribution of the sound pressure amplitudein units of standard deviation for both ensembles is strongly nongaussian (b)The power spectra for the two ensembles (c) The relevant vector of the modelneuron of dimensionality D D 500 (d) The STA is broadened in both cases butdiffers between the two cases due to differences in the power spectra of the twoensembles (e) Vectors that maximize information for either of the ensemblesoverlap almost perfectly with each other and with the model lter which is alsoreplotted here from c (f) The probability of a spike Pspikejs cent Ovmax (crosses) iscompared to Pspikejs1 used in generating spikes (solid line) The input-outputfunction had parameter values frac34 frac14 09frac34 s1 and micro frac14 18frac34 s1

5 Summary

Features of the stimulus that are most relevant for generating the responseof a neuron can be found by maximizing information between the sequenceof responses and the projection of stimuli on trial vectors within the stim-ulus space Calculated in this manner information becomes a function of

242 T Sharpee N Rust and W Bialek

direction in stimulus space Those vectors that maximize the informationand account for the total information per response of interest span the rel-evant subspace The method allows multiple dimensions to be found Thereconstruction of the relevant subspace is done without assuming a partic-ular form of the input-output function It can be strongly nonlinear withinthe relevant subspace and is estimated from experimental histograms foreach trial direction independently Most important this method can be usedwith any stimulus ensemble even those that are strongly nongaussian asin the case of natural signals We have illustrated the method on modelneurons responding to natural scenes and sounds We expect the currentimplementation of the method to be most useful when several most infor-mative vectors (middot 10 depending on their dimensionality) are to be analyzedfor neurons probed by natural scenes This technique could be particularlyuseful in describing sensory processing in poorly understood regions ofhigher-level sensory cortex (such as visual areas V2 V4 and IT and audi-tory cortex beyond A1) where white noise stimulation is known to be lesseffective than naturalistic stimuli

Appendix A Limitations of the Reverse Correlation Method

Here we examine what sort of deviations one can expect when applying thereverse correlation method to natural stimuli even in the model with just onerelevant dimension There are two factors that when combined invalidatethe reverse correlation method the nongaussian character of correlationsand the nonlinearity of the input-output function (Ringach Sapiro amp Shap-ley 1997) In its original formulation (de Boer amp Kuyper 1968) the neuronis probed by white noise and the relevant dimension Oe1 is given by theSTA Oe1 hsrsi If the signals are not white that is the covariance matrixCij D hsisji is not a unit matrix then the STA is a broadened version of theoriginal lter Oe1 This can be seen by noting that for any function Fs ofgaussian variables fsig the identity holds

hsiFsi D hsisjihsj Fsi j acute sj (A1)

When property A1 isapplied to the vector components of the STA hsirsi DCijhjrsi Since we work within the assumption that the ring rate is a(nonlinear) function of projection onto one lter Oe1 rs D rs1 the latteraverage is proportional to the model lter itself hjri D Oe1jhr0s1i Thereforewe arrive at the prescription of the reverse correlation method

Oe1i [Ciexcl1]ijhsjrsi (A2)

The gaussian property is necessary in order to represent the STA as a convo-lution of the covariance matrix Cij of the stimulus ensemble and the model

Analyzing Neural Responses to Natural Signals 243

lter To understand how the reconstructed vector obtained according toequation A2 deviates from the relevant one we consider weakly nongaus-sian stimuli with the probability distribution

PnGs D 1Z

P0sesup2H1s (A3)

where P0s is the gaussian probability distribution with covariance matrixC and the normalization factor Z D hesup2H1si The function H1 describesdeviations of the probability distribution from gaussian and therefore wewill set hsiH1i D 0 and hsisjH1i D 0 since these averages can be accountedfor in the gaussian ensemble In what follows we will keep only the rst-order terms in perturbation parameter sup2 Using property A1 we nd theSTA to be given by

hsirinG D hsisjipoundhjri C sup2hrjH1i

curren (A4)

where averages are taken with respect to the gaussian distribution Simi-larly the covariance matrix Cij evaluated with respect to the nongaussianensemble is given by

Cij D1Z

hsisjesup2H1i D hsisji C sup2hsiskihsjkH1i (A5)

so that to the rst order in sup2 hsisji D Cij iexcl sup2CikhsjkH1i Combining thiswith equation A4 we get

hsirinG D const pound Cij Oe1 j C sup2Cijhiexclr iexcl s1hr0i

centjH1i (A6)

The second term in equation A6 prevents the application of the reversecorrelation method for nongaussian signals Indeed if we multiply the STAequation A6 with the inverse of the a priori covariance matrix Cij accordingto the reverse correlation method equation A2 we no longer obtain the RFOe1 The deviation of the obtained answer from the true RF increases with sup2which measures the deviation of the probability distribution from gaussianSince natural stimuli are known to be strongly nongaussian this makes theuse of the reverse correlation problematic when analyzing neural responsesto natural stimuli

The difference in applying the reverse correlation method to stimuli drawn from a correlated gaussian ensemble versus a nongaussian one is illustrated in Figures 8b and 8c. In the first case, shown in Figure 8b, stimuli are drawn from a correlated gaussian ensemble with the covariance matrix equal to that of natural images. In the second case, shown in Figure 8c, the patches of photos are taken as stimuli. The STA is broadened in both cases. Although the two-point correlations are just as strong in the case of gaussian stimuli as they are in the natural stimuli ensemble, gaussian correlations can be successfully removed from the STA according to equation A.2 to obtain the model filter. On the contrary, an attempt to use reverse correlation with natural stimuli results in an altered version of the model filter. We reiterate that for this example, the apparent noise in the decorrelated vector is not due to neural noise or finite data sets, since the "exact" STA has been used (see equation 4.2) in all calculations presented in Figures 8 and 9.

Figure 8: The nongaussian character of correlations present in natural scenes invalidates the reverse correlation method for neurons with a nonlinear input-output function. (a) A model visual neuron has one relevant dimension $\hat{e}_1$ and the nonlinear input-output function. The "exact" STA is used (see equation 4.2) to separate effects of neural noise from alterations introduced by the method. The decorrelated "exact" STA is obtained by multiplying the "exact" STA by the inverse of the covariance matrix, according to equation A.2. (b) Stimuli are taken from a correlated gaussian noise ensemble. The effect of correlations in the STA can be removed according to equation A.2. (c) When patches of photos are taken as stimuli for the same model neuron as in (b), the decorrelation procedure gives an altered version of the model filter. The two stimulus ensembles have the same covariance matrix.
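The contrast between Figures 8b and 8c can also be explored in a small self-contained simulation. Everything below is an illustrative choice of ours rather than the article's setup: a smooth one-dimensional "filter," a 1/(1 + |i − j|) mixing kernel to impose correlations, a squared-gaussian ensemble as a crude stand-in for nongaussian natural signals, and a noiseless threshold nonlinearity. The printed alignment values let one compare how well the decorrelated STA recovers the model filter for the two ensembles, which share (up to an overall scale) the same second-order correlations.

import numpy as np

rng = np.random.default_rng(1)
D, N = 64, 100_000

# A smooth, oscillating model filter (a rough stand-in for the RF of Figure 8a)
t = np.arange(D)
e1 = np.exp(-0.5 * ((t - D / 2) / 4.0) ** 2) * np.cos(2 * np.pi * t / 16)
e1 /= np.linalg.norm(e1)

# Mixing matrix that imposes long-range correlations (the same for both ensembles)
A = 1.0 / (1.0 + np.abs(t[:, None] - t[None, :]))

def decorrelated_sta(stimuli, rate, ridge=1e-6):
    """Equation A.2: multiply the rate-weighted average stimulus by C^{-1}."""
    s = stimuli - stimuli.mean(axis=0)
    sta = s.T @ rate / rate.sum()
    C = np.cov(s, rowvar=False) + ridge * np.eye(s.shape[1])
    return np.linalg.solve(C, sta)

def alignment(v, w):
    """Cosine of the angle between two vectors, used as a recovery measure."""
    return abs(v @ w) / (np.linalg.norm(v) * np.linalg.norm(w))

z_gauss = rng.normal(size=(N, D))
z_heavy = np.sign(z_gauss) * z_gauss ** 2   # heavy-tailed; same correlation structure after mixing
for label, z in [("gaussian", z_gauss), ("nongaussian", z_heavy)]:
    stimuli = z @ A.T
    s1 = stimuli @ e1
    rate = (s1 > 1.5 * s1.std()).astype(float)   # noiseless threshold, in the spirit of the "exact" STA
    v = decorrelated_sta(stimuli, rate)
    print(label, "alignment with e1:", round(alignment(v, e1), 3))

Any extra degradation of the alignment in the nongaussian case relative to the gaussian control reflects the higher-order correlations, as in equation A.6; how large it is depends on the ensemble and the nonlinearity.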

The reverse correlation method gives the correct answer for any distribution of signals if the probability of generating a spike is a linear function of $s_i$, since then the second term in equation A.6 is zero. In particular, a linear input-output relation could arise due to a neural noise whose variance is much larger than the variance of the signal itself. This point is illustrated in Figures 9a, 9b, and 9c, where the reverse correlation method is applied to a threshold input-output function at low, moderate, and high signal-to-noise ratios. For small signal-to-noise ratios, where the noise standard deviation is similar to that of the projections $s_1$, the threshold nonlinearity in the input-output function is masked by noise and is effectively linear. In this limit, the reverse correlation can be applied with the exact STA. However, for the experimentally calculated STA at low signal-to-noise ratios, the decorrelation procedure results in strong noise amplification. At higher signal-to-noise ratios, decorrelation fails due to the nonlinearity of the input-output function, in accordance with equation A.6.

Appendix B: Maxima of I(v): What Do They Mean?

The relevant subspace of dimensionality K can be found by maximizing information simultaneously with respect to K vectors. The result of maximization with respect to a number of vectors that is less than the true dimensionality of the relevant subspace may produce vectors that have components in the irrelevant subspace. This happens only in the presence of correlations in stimuli. As an illustration, we consider the situation where the dimensionality of the relevant subspace is K = 2, and vector $\hat{e}_1$ describes the most informative direction within the relevant subspace. We show here that although the gradient of information is perpendicular to both $\hat{e}_1$ and $\hat{e}_2$, it may have components outside the relevant subspace. Therefore, the vector $v_{\max}$ that corresponds to the maximum of $I(v)$ will lie outside the relevant subspace. We recall from equation 3.1 that

\[
\nabla I(\hat{e}_1) = \int ds_1\, P(s_1)\,\frac{d}{ds_1}\!\left[\frac{P(s_1|\mathrm{spike})}{P(s_1)}\right]
\bigl(\langle \mathbf{s}|s_1,\mathrm{spike}\rangle - \langle \mathbf{s}|s_1\rangle\bigr).
\tag{B.1}
\]

We can rewrite the conditional averages $\langle \mathbf{s}|s_1\rangle = \int ds_2\, P(s_1,s_2)\,\langle \mathbf{s}|s_1,s_2\rangle / P(s_1)$ and $\langle \mathbf{s}|s_1,\mathrm{spike}\rangle = \int ds_2\, f(s_1,s_2)\,P(s_1,s_2)\,\langle \mathbf{s}|s_1,s_2\rangle / P(s_1|\mathrm{spike})$, so that

\[
\nabla I(\hat{e}_1) = \int ds_1\, ds_2\, P(s_1,s_2)\,\langle \mathbf{s}|s_1,s_2\rangle\,
\frac{P(\mathrm{spike}|s_1,s_2) - P(\mathrm{spike}|s_1)}{P(\mathrm{spike})}
\times \frac{d}{ds_1}\ln\!\left[\frac{P(s_1|\mathrm{spike})}{P(s_1)}\right].
\tag{B.2}
\]

Because we assume that the vector $\hat{e}_1$ is the most informative within the relevant subspace, $\hat{e}_1\cdot\nabla I = \hat{e}_2\cdot\nabla I = 0$, so that the integral in equation B.2 is zero for those directions in which the component of the vector $\langle \mathbf{s}|s_1,s_2\rangle$ changes linearly with $s_1$ and $s_2$. For uncorrelated stimuli, this is true for all directions, so that the most informative vector within the relevant subspace is also the most informative in the overall stimulus space. In the presence of correlations, the gradient may have nonzero components along some irrelevant directions if the projection of the vector $\langle \mathbf{s}|s_1,s_2\rangle$ on those directions is not a linear function of $s_1$ and $s_2$. By looking for a maximum of information, we will therefore be driven outside the relevant subspace. The deviation of $v_{\max}$ from the relevant subspace is also proportional to the strength of the dependence on the second parameter $s_2$, because of the factor $[P(s_1,s_2|\mathrm{spike})/P(s_1,s_2) - P(s_1|\mathrm{spike})/P(s_1)]$ in the integrand.

Figure 9: Application of the reverse correlation method to a model visual neuron with one relevant dimension $\hat{e}_1$ and a threshold input-output function, for decreasing values of the noise variance, $\sigma/\sigma_{s_1} \approx 6.1,\ 0.61,\ 0.06$ in (a), (b), and (c), respectively. The model $P(\mathrm{spike}|s_1)$ becomes effectively linear when the signal-to-noise ratio is small. The reverse correlation can be used together with natural stimuli if the input-output function is linear. Otherwise, the deviations between the decorrelated STA and the model filter increase with the nonlinearity of $P(\mathrm{spike}|s_1)$.

Appendix C: The Gradient of Information

According to expression 2.5, the information $I(v)$ depends on the vector $v$ only through the probability distributions $P_v(x)$ and $P_v(x|\mathrm{spike})$. Therefore, we can express the gradient of information in terms of gradients of those probability distributions:

\[
\nabla_v I = \frac{1}{\ln 2}\int dx\,\left[\ln\!\left(\frac{P_v(x|\mathrm{spike})}{P_v(x)}\right)\nabla_v P_v(x|\mathrm{spike})
- \frac{P_v(x|\mathrm{spike})}{P_v(x)}\,\nabla_v P_v(x)\right],
\tag{C.1}
\]

where we took into account that $\int dx\, P_v(x|\mathrm{spike}) = 1$ and does not change with $v$. To find gradients of the probability distributions, we note that

\[
\nabla_v P_v(x) = \nabla_v\!\left[\int d\mathbf{s}\,P(\mathbf{s})\,\delta(x - \mathbf{s}\cdot v)\right]
= -\int d\mathbf{s}\,P(\mathbf{s})\,\mathbf{s}\,\delta'(x - \mathbf{s}\cdot v)
= -\frac{d}{dx}\bigl[P_v(x)\,\langle \mathbf{s}|x\rangle\bigr],
\tag{C.2}
\]

and analogously for $P_v(x|\mathrm{spike})$,

\[
\nabla_v P_v(x|\mathrm{spike}) = -\frac{d}{dx}\bigl[P_v(x|\mathrm{spike})\,\langle \mathbf{s}|x,\mathrm{spike}\rangle\bigr].
\tag{C.3}
\]

Substituting expressions C.2 and C.3 into equation C.1 and integrating once by parts, we obtain

\[
\nabla_v I = \int dx\, P_v(x)\,\bigl[\langle \mathbf{s}|x,\mathrm{spike}\rangle - \langle \mathbf{s}|x\rangle\bigr]
\cdot \frac{d}{dx}\!\left[\frac{P_v(x|\mathrm{spike})}{P_v(x)}\right],
\]

which is expression 3.1 of the main text.
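In practice, this expression can be estimated directly from data by binning the projections. The sketch below uses our own discretization choices (25 bins, finite differences on the bin grid, NumPy) and is meant only to make the structure of the formula concrete, not to reproduce the authors' implementation.

import numpy as np

def information_gradient(v, stimuli, spikes, n_bins=25, eps=1e-12):
    """Estimate grad_v I from data, following the expression above (equation 3.1).

    Returns the gradient up to an overall factor (1/ln 2 if I is measured in bits).
    """
    v = v / np.linalg.norm(v)
    x = stimuli @ v
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    idx = np.clip(np.digitize(x, edges) - 1, 0, n_bins - 1)   # bin index of each stimulus

    w = spikes.astype(float)
    p_x = np.bincount(idx, minlength=n_bins).astype(float)
    p_spike = np.bincount(idx, weights=w, minlength=n_bins)

    # Conditional averages <s|x> and <s|x, spike>, one D-dimensional vector per bin
    D = stimuli.shape[1]
    s_x = np.zeros((n_bins, D))
    s_x_spike = np.zeros((n_bins, D))
    for b in range(n_bins):
        in_bin = idx == b
        if p_x[b] > 0:
            s_x[b] = stimuli[in_bin].mean(axis=0)
        if p_spike[b] > 0:
            s_x_spike[b] = np.average(stimuli[in_bin], axis=0, weights=w[in_bin])

    # d/dx of the ratio P_v(x|spike) / P_v(x), by finite differences on the bin grid
    ratio = (p_spike / p_spike.sum()) / (p_x / p_x.sum() + eps)
    d_ratio = np.gradient(ratio, edges[1] - edges[0])

    p_x_norm = p_x / p_x.sum()
    grad = (p_x_norm[:, None] * (s_x_spike - s_x) * d_ratio[:, None]).sum(axis=0)
    return grad - (grad @ v) * v   # I(v) does not depend on the length of v

One step of the optimization of section 3 would then be a line maximization along this direction, with the annealing schedule described there.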

Acknowledgments

We thank K. D. Miller for many helpful discussions. Work at UCSF was supported in part by the Sloan and Swartz foundations and by a training grant from the NIH. Our collaboration began at the Marine Biological Laboratory in a course supported by grants from NIMH and the Howard Hughes Medical Institute.


References

Aguera y Arcas, B., Fairhall, A. L., & Bialek, W. (2003). Computation in a single neuron: Hodgkin and Huxley revisited. Neural Comp., 15, 1715–1749.

Baddeley, R., Abbott, L. F., Booth, M. C. A., Sengpiel, F., Freeman, T., Wakeman, E. A., & Rolls, E. T. (1997). Responses of neurons in primary and inferior temporal visual cortices to natural scenes. Proc. R. Soc. Lond. B, 264, 1775–1783.

Barlow, H. (1961). Possible principles underlying the transformations of sensory images. In W. Rosenblith (Ed.), Sensory communication (pp. 217–234). Cambridge, MA: MIT Press.

Barlow, H. (2001). Redundancy reduction revisited. Network: Comput. Neural Syst., 12, 241–253.

Bialek, W. (2002). Thinking about the brain. In H. Flyvbjerg, F. Julicher, P. Ormos, & F. David (Eds.), Physics of biomolecules and cells (pp. 485–577). Berlin: Springer-Verlag. See also physics/0205030.

Bialek, W., & de Ruyter van Steveninck, R. R. (2003). Features and dimensions: Motion estimation in fly vision. Unpublished manuscript.

Brenner, N., Bialek, W., & de Ruyter van Steveninck, R. R. (2000). Adaptive rescaling maximizes information transmission. Neuron, 26, 695–702.

Brenner, N., Strong, S. P., Koberle, R., Bialek, W., & de Ruyter van Steveninck, R. R. (2000). Synergy in a neural code. Neural Computation, 12, 1531–1552. See also physics/9902067.

Chichilnisky, E. J. (2001). A simple white noise analysis of neuronal light responses. Network: Comput. Neural Syst., 12, 199–213.

Creutzfeldt, O. D., & Northdurft, H. C. (1978). Representation of complex visual stimuli in the brain. Naturwissenschaften, 65, 307–318.

Cover, T. M., & Thomas, J. A. (1991). Information theory. New York: Wiley.

de Boer, E., & Kuyper, P. (1968). Triggered correlation. IEEE Trans. Biomed. Eng., 15, 169–179.

de Ruyter van Steveninck, R. R., & Bialek, W. (1988). Real-time performance of a movement-sensitive neuron in the blowfly visual system: Coding and information transfer in short spike sequences. Proc. R. Soc. Lond. B, 265, 259–265.

de Ruyter van Steveninck, R. R., Borst, A., & Bialek, W. (2000). Real time encoding of motion: Answerable questions and questionable answers from the fly's visual system. In J. M. Zanker & J. Zeil (Eds.), Motion vision: Computational, neural and ecological constraints (pp. 279–306). New York: Springer-Verlag. See also physics/0004060.

de Ruyter van Steveninck, R. R., Lewen, G. D., Strong, S. P., Koberle, R., & Bialek, W. (1997). Reproducibility and variability in neural spike trains. Science, 275, 1805–1808. See also cond-mat/9603127.

Where available, we give references to the physics e-print archive, which may be found online at http://arxiv.org/abs/; thus, Bialek (2002) is available online at http://arxiv.org/abs/physics/0205030. Published papers may differ from the versions posted to the archive.


Dimitrov, A. G., & Miller, J. P. (2001). Neural coding and decoding: Communication channels and quantization. Network: Comput. Neural Syst., 12, 441–472.

Dong, D. W., & Atick, J. J. (1995). Statistics of natural time-varying images. Network: Comput. Neural Syst., 6, 345–358.

Fairhall, A. L., Lewen, G. D., Bialek, W., & de Ruyter van Steveninck, R. R. (2001). Efficiency and ambiguity in an adaptive neural code. Nature, 412, 787–792.

Kara, P., Reinagel, P., & Reid, R. C. (2000). Low response variability in simultaneously recorded retinal, thalamic, and cortical neurons. Neuron, 27, 635–646.

Lewen, G. D., Bialek, W., & de Ruyter van Steveninck, R. R. (2001). Neural coding of naturalistic motion stimuli. Network: Comput. Neural Syst., 12, 317–329. See also physics/0103088.

Mainen, Z. F., & Sejnowski, T. J. (1995). Reliability of spike timing in neocortical neurons. Science, 268, 1503–1506.

Paninski, L. (2003a). Convergence properties of three spike-triggered analysis techniques. Network: Comput. Neural Syst., 14, 437–464.

Paninski, L. (2003b). Estimation of entropy and mutual information. Neural Computation, 15, 1191–1253.

Panzeri, S., & Treves, A. (1996). Analytical estimates of limited sampling biases in different information measures. Network: Comput. Neural Syst., 7, 87–107.

Pola, G., Schultz, S. R., Petersen, R., & Panzeri, S. (2002). A practical guide to information analysis of spike trains. In R. Kotter (Ed.), Neuroscience databases: A practical guide (pp. 137–152). Norwell, MA: Kluwer.

Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P. (1992). Numerical recipes in C: The art of scientific computing. Cambridge: Cambridge University Press.

Reinagel, P., & Reid, R. C. (2000). Temporal coding of visual information in the thalamus. J. Neurosci., 20, 5392–5400.

Rieke, F., Bodnar, D. A., & Bialek, W. (1995). Naturalistic stimuli increase the rate and efficiency of information transmission by primary auditory afferents. Proc. R. Soc. Lond. B, 262, 259–265.

Rieke, F., Warland, D., de Ruyter van Steveninck, R. R., & Bialek, W. (1997). Spikes: Exploring the neural code. Cambridge, MA: MIT Press.

Ringach, D. L., Hawken, M. J., & Shapley, R. (2002). Receptive field structure of neurons in monkey visual cortex revealed by stimulation with natural image sequences. Journal of Vision, 2, 12–24.

Ringach, D. L., Sapiro, G., & Shapley, R. (1997). A subspace reverse-correlation technique for the study of visual neurons. Vision Res., 37, 2455–2464.

Rolls, E. T., Aggelopoulos, N. C., & Zheng, F. (2003). The receptive fields of inferior temporal cortex neurons in natural scenes. J. Neurosci., 23, 339–348.

Ruderman, D. L. (1994). The statistics of natural images. Network: Comput. Neural Syst., 5, 517–548.

Ruderman, D. L., & Bialek, W. (1994). Statistics of natural images: Scaling in the woods. Phys. Rev. Lett., 73, 814–817.

Schwartz, O., Chichilnisky, E. J., & Simoncelli, E. (2002). Characterizing neural gain control using spike-triggered covariance. In T. G. Dietterich, S. Becker, & Z. Ghahramani (Eds.), Advances in neural information processing (pp. 269–276). Cambridge, MA: MIT Press.


Sen, K., Theunissen, F. E., & Doupe, A. J. (2001). Feature analysis of natural sounds in the songbird auditory forebrain. J. Neurophysiol., 86, 1445–1458.

Simoncelli, E., & Olshausen, B. A. (2001). Natural image statistics and neural representation. Annu. Rev. Neurosci., 24, 1193–1216.

Smirnakis, S. M., Berry, M. J., Warland, D. K., Bialek, W., & Meister, M. (1996). Adaptation of retinal processing to image contrast and spatial scale. Nature, 386, 69–73.

Smyth, D., Willmore, B., Baker, G. E., Thompson, I. D., & Tolhurst, D. J. (2003). The receptive-field organization of simple cells in primary visual cortex of ferrets under natural scene stimulation. J. Neurosci., 23, 4746–4759.

Stanley, G. B., Li, F. F., & Dan, Y. (1999). Reconstruction of natural scenes from ensemble responses in the lateral geniculate nucleus. J. Neurosci., 19, 8036–8042.

Strong, S. P., Koberle, R., de Ruyter van Steveninck, R. R., & Bialek, W. (1998). Entropy and information in neural spike trains. Phys. Rev. Lett., 80, 197–200.

Theunissen, F. E., Sen, K., & Doupe, A. J. (2000). Spectral-temporal receptive fields of nonlinear auditory neurons obtained using natural sounds. J. Neurosci., 20, 2315–2331.

Tishby, N., Pereira, F. C., & Bialek, W. (1999). The information bottleneck method. In B. Hajek & R. S. Sreenivas (Eds.), Proceedings of the 37th Allerton Conference on Communication, Control and Computing (pp. 368–377). Urbana: University of Illinois. See also physics/0004057.

Touryan, J., Lau, B., & Dan, Y. (2002). Isolation of relevant visual features from random stimuli for cortical complex cells. J. Neurosci., 22, 10811–10818.

Treves, A., & Panzeri, S. (1995). The upward bias in measures of information derived from limited data samples. Neural Comp., 7, 399–407.

von der Twer, T., & Macleod, D. I. A. (2001). Optimal nonlinear codes for the perception of natural colours. Network: Comput. Neural Syst., 12, 395–407.

Vickers, N. J., Christensen, T. A., Baker, T., & Hildebrand, J. G. (2001). Odour-plume dynamics influence the brain's olfactory code. Nature, 410, 466–470.

Vinje, W. E., & Gallant, J. L. (2000). Sparse coding and decorrelation in primary visual cortex during natural vision. Science, 287, 1273–1276.

Vinje, W. E., & Gallant, J. L. (2002). Natural stimulation of the nonclassical receptive field increases information transmission efficiency in V1. J. Neurosci., 22, 2904–2915.

Voss, R. F., & Clarke, J. (1975). "1/f noise" in music and speech. Nature, 317–318.

Weliky, M., Fiser, J., Hunt, R., & Wagner, D. N. (2003). Coding of natural scenes in primary visual cortex. Neuron, 37, 703–718.

Received January 21, 2003; accepted August 6, 2003.


5 Summary

Features of the stimulus that are most relevant for generating the responseof a neuron can be found by maximizing information between the sequenceof responses and the projection of stimuli on trial vectors within the stim-ulus space Calculated in this manner information becomes a function of

242 T Sharpee N Rust and W Bialek

direction in stimulus space Those vectors that maximize the informationand account for the total information per response of interest span the rel-evant subspace The method allows multiple dimensions to be found Thereconstruction of the relevant subspace is done without assuming a partic-ular form of the input-output function It can be strongly nonlinear withinthe relevant subspace and is estimated from experimental histograms foreach trial direction independently Most important this method can be usedwith any stimulus ensemble even those that are strongly nongaussian asin the case of natural signals We have illustrated the method on modelneurons responding to natural scenes and sounds We expect the currentimplementation of the method to be most useful when several most infor-mative vectors (middot 10 depending on their dimensionality) are to be analyzedfor neurons probed by natural scenes This technique could be particularlyuseful in describing sensory processing in poorly understood regions ofhigher-level sensory cortex (such as visual areas V2 V4 and IT and audi-tory cortex beyond A1) where white noise stimulation is known to be lesseffective than naturalistic stimuli

Appendix A Limitations of the Reverse Correlation Method

Here we examine what sort of deviations one can expect when applying thereverse correlation method to natural stimuli even in the model with just onerelevant dimension There are two factors that when combined invalidatethe reverse correlation method the nongaussian character of correlationsand the nonlinearity of the input-output function (Ringach Sapiro amp Shap-ley 1997) In its original formulation (de Boer amp Kuyper 1968) the neuronis probed by white noise and the relevant dimension Oe1 is given by theSTA Oe1 hsrsi If the signals are not white that is the covariance matrixCij D hsisji is not a unit matrix then the STA is a broadened version of theoriginal lter Oe1 This can be seen by noting that for any function Fs ofgaussian variables fsig the identity holds

$$\langle s_i F(\mathbf{s})\rangle = \langle s_i s_j\rangle\,\langle \partial_j F(\mathbf{s})\rangle, \qquad \partial_j \equiv \frac{\partial}{\partial s_j}. \tag{A.1}$$

When property A.1 is applied to the vector components of the STA, ⟨s_i r(s)⟩ = C_ij ⟨∂_j r(s)⟩. Since we work within the assumption that the firing rate is a (nonlinear) function of the projection onto one filter ê1, r(s) = r(s1), the latter average is proportional to the model filter itself, ⟨∂_j r⟩ = ê_{1j} ⟨r′(s1)⟩. Therefore we arrive at the prescription of the reverse correlation method:

$$\hat e_{1i} \propto [C^{-1}]_{ij}\,\langle s_j\, r(\mathbf{s})\rangle. \tag{A.2}$$

The gaussian property is necessary in order to represent the STA as a convolution of the covariance matrix C_ij of the stimulus ensemble and the model filter. To understand how the reconstructed vector obtained according to equation A.2 deviates from the relevant one, we consider weakly nongaussian stimuli, with the probability distribution

$$P_{nG}(\mathbf{s}) = \frac{1}{Z}\,P_0(\mathbf{s})\,e^{\varepsilon H_1(\mathbf{s})}, \tag{A.3}$$

where P_0(s) is the gaussian probability distribution with covariance matrix C, and the normalization factor is Z = ⟨e^{εH_1(s)}⟩. The function H_1 describes deviations of the probability distribution from gaussian, and therefore we will set ⟨s_i H_1⟩ = 0 and ⟨s_i s_j H_1⟩ = 0, since these averages can be accounted for in the gaussian ensemble. In what follows, we will keep only the first-order terms in the perturbation parameter ε. Using property A.1, we find the STA to be given by

$$\langle s_i\, r\rangle_{nG} = \langle s_i s_j\rangle\,\big[\langle \partial_j r\rangle + \varepsilon\,\langle r\,\partial_j H_1\rangle\big], \tag{A.4}$$

where averages are taken with respect to the gaussian distribution. Similarly, the covariance matrix C_ij evaluated with respect to the nongaussian ensemble is given by

$$C_{ij} = \frac{1}{Z}\,\langle s_i s_j\, e^{\varepsilon H_1}\rangle = \langle s_i s_j\rangle + \varepsilon\,\langle s_i s_k\rangle\,\langle s_j\,\partial_k H_1\rangle, \tag{A.5}$$

so that, to first order in ε, ⟨s_i s_j⟩ = C_ij − ε C_ik ⟨s_j ∂_k H_1⟩. Combining this with equation A.4, we get

$$\langle s_i\, r\rangle_{nG} = \mathrm{const}\times\Big[C_{ij}\,\hat e_{1j} + \varepsilon\, C_{ij}\,\big\langle\big(r - s_1\langle r'\rangle\big)\,\partial_j H_1\big\rangle\Big]. \tag{A.6}$$

The second term in equation A.6 prevents the application of the reverse correlation method for nongaussian signals. Indeed, if we multiply the STA, equation A.6, with the inverse of the a priori covariance matrix C_ij according to the reverse correlation method, equation A.2, we no longer obtain the RF ê1. The deviation of the obtained answer from the true RF increases with ε, which measures the deviation of the probability distribution from gaussian. Since natural stimuli are known to be strongly nongaussian, this makes the use of the reverse correlation method problematic when analyzing neural responses to natural stimuli.
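To make the failure mode described above easy to probe numerically, here is a minimal sketch of the reverse-correlation prescription of equation A.2. It is our illustration, not code from the original study; the names `stimuli` and `rates` (the firing rate or spike count associated with each frame) are assumptions, and the small ridge term is an extra regularization choice, not part of the text, included only to keep the matrix inversion stable.

```python
import numpy as np

def decorrelated_sta(stimuli, rates, ridge=1e-6):
    """Reverse-correlation estimate of equation A.2: e1 ~ C^{-1} <s r(s)>."""
    s = stimuli - stimuli.mean(axis=0)              # work with zero-mean stimuli
    sta = (rates[:, None] * s).mean(axis=0)         # rate-weighted STA, <s r(s)>
    cov = np.cov(s, rowvar=False)                   # a priori covariance matrix C_ij
    # ridge term (an assumption of this sketch) guards against noise amplification
    cov = cov + ridge * np.trace(cov) / cov.shape[0] * np.eye(cov.shape[0])
    e1 = np.linalg.solve(cov, sta)                  # apply C^{-1}
    return e1 / np.linalg.norm(e1)
```

Applied to correlated gaussian stimuli, this recovers the model filter up to sampling noise; applied to natural stimuli and a nonlinear input-output function, it returns the biased estimate described by equation A.6.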

The difference in applying the reverse correlation to stimuli drawn from a correlated gaussian ensemble versus a nongaussian one is illustrated in Figures 8b and 8c. In the first case, shown in Figure 8b, stimuli are drawn from a correlated gaussian ensemble with the covariance matrix equal to that of natural images. In the second case, shown in Figure 8c, the patches of photos are taken as stimuli. The STA is broadened in both cases. Although the two-point correlations are just as strong in the case of gaussian stimuli as they are in the natural stimuli ensemble, the gaussian correlations can be successfully removed from the STA according to equation A.2 to obtain the model filter. On the contrary, an attempt to use reverse correlation with natural stimuli results in an altered version of the model filter. We reiterate that for this example, the apparent noise in the decorrelated vector is not due to neural noise or finite data sets, since the "exact" STA has been used (see equation 4.2) in all calculations presented in Figures 8 and 9.

Figure 8: The nongaussian character of correlations present in natural scenes invalidates the reverse correlation method for neurons with a nonlinear input-output function. (a) A model visual neuron has one relevant dimension ê1 and the nonlinear input-output function. The "exact" STA is used (see equation 4.2) to separate effects of neural noise from alterations introduced by the method. The decorrelated "exact" STA is obtained by multiplying the "exact" STA by the inverse of the covariance matrix, according to equation A.2. (b) Stimuli are taken from a correlated gaussian noise ensemble. The effect of correlations in the STA can be removed according to equation A.2. (c) When patches of photos are taken as stimuli for the same model neuron as in (b), the decorrelation procedure gives an altered version of the model filter. The two stimulus ensembles have the same covariance matrix.

The reverse correlation method gives the correct answer for any distribution of signals if the probability of generating a spike is a linear function of s_i, since then the second term in equation A.6 is zero. In particular, a linear input-output relation could arise due to a neural noise whose variance is much larger than the variance of the signal itself. This point is illustrated in Figures 9a, 9b, and 9c, where the reverse correlation method is applied to a threshold input-output function at low, moderate, and high signal-to-noise ratios. For small signal-to-noise ratios, where the noise standard deviation is similar to that of the projections s1, the threshold nonlinearity in the input-output function is masked by noise and is effectively linear. In this limit, the reverse correlation can be applied with the exact STA. However, for an experimentally calculated STA at low signal-to-noise ratios, the decorrelation procedure results in strong noise amplification. At higher signal-to-noise ratios, decorrelation fails due to the nonlinearity of the input-output function, in accordance with equation A.6.
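Because the argument turns on how noise linearizes the threshold nonlinearity, a small worked example may help. Averaging the step function over gaussian noise of standard deviation σ gives a closed form for the effective input-output function, ⟨H(s1 − θ + ξ)⟩ = Φ((s1 − θ)/σ), where Φ is the gaussian cumulative distribution function. The sketch below evaluates it; the threshold value is chosen here only for illustration, and the two noise levels echo the extreme ratios quoted in the Figure 9 caption.

```python
import numpy as np
from math import erf

def effective_io(s1, theta, sigma):
    """<H(s1 - theta + xi)> over gaussian noise xi of std sigma = Phi((s1 - theta)/sigma)."""
    z = (np.atleast_1d(np.asarray(s1, dtype=float)) - theta) / (sigma * np.sqrt(2.0))
    return 0.5 * (1.0 + np.array([erf(val) for val in z]))

s1 = np.linspace(-3.0, 3.0, 7)                     # projections in units of their std
print(effective_io(s1, theta=0.6, sigma=6.1))      # large noise: nearly linear in s1
print(effective_io(s1, theta=0.6, sigma=0.06))     # small noise: nearly a hard threshold
```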

Appendix B Maxima of I(v): What Do They Mean?

The relevant subspace of dimensionality K can be found by maximizing information simultaneously with respect to K vectors. The result of maximization with respect to a number of vectors that is less than the true dimensionality of the relevant subspace may produce vectors that have components in the irrelevant subspace. This happens only in the presence of correlations in stimuli. As an illustration, we consider the situation where the dimensionality of the relevant subspace is K = 2, and vector ê1 describes the most informative direction within the relevant subspace. We show here that although the gradient of information is perpendicular to both ê1 and ê2, it may have components outside the relevant subspace. Therefore, the vector v_max that corresponds to the maximum of I(v) will lie outside the relevant subspace. We recall from equation 3.1 that

$$\nabla I(\hat e_1) = \int ds_1\, P(s_1)\,\frac{d}{ds_1}\!\left[\frac{P(s_1|\mathrm{spike})}{P(s_1)}\right]\big(\langle\mathbf{s}|s_1,\mathrm{spike}\rangle - \langle\mathbf{s}|s_1\rangle\big). \tag{B.1}$$

We can rewrite the conditional averages ⟨s|s1⟩ = ∫ ds2 P(s1, s2) ⟨s|s1, s2⟩ / P(s1) and ⟨s|s1, spike⟩ = ∫ ds2 f(s1, s2) P(s1, s2) ⟨s|s1, s2⟩ / P(s1|spike), so that

$$\nabla I(\hat e_1) = \int ds_1\, ds_2\, P(s_1,s_2)\,\langle\mathbf{s}|s_1,s_2\rangle\;\frac{P(\mathrm{spike}|s_1,s_2) - P(\mathrm{spike}|s_1)}{P(\mathrm{spike})}\;\frac{d}{ds_1}\ln\!\left[\frac{P(s_1|\mathrm{spike})}{P(s_1)}\right]. \tag{B.2}$$

Because we assume that the vector ê1 is the most informative within the relevant subspace, ê1 · ∇I = ê2 · ∇I = 0, so that the integral in equation B.2 is zero for those directions in which the component of the vector ⟨s|s1, s2⟩ changes linearly with s1 and s2. For uncorrelated stimuli, this is true for all directions, so that the most informative vector within the relevant subspace is also the most informative in the overall stimulus space. In the presence of correlations, the gradient may have nonzero components along some irrelevant directions if the projection of the vector ⟨s|s1, s2⟩ on those directions is not a linear function of s1 and s2. By looking for a maximum of information, we will therefore be driven outside the relevant subspace. The deviation of v_max from the relevant subspace is also proportional to the strength of the dependence on the second parameter s2, because of the factor [P(s1, s2|spike)/P(s1, s2) − P(s1|spike)/P(s1)] in the integrand (by Bayes' rule, this is the same factor that appears in equation B.2 as [P(spike|s1, s2) − P(spike|s1)]/P(spike)).

Figure 9: Application of the reverse correlation method to a model visual neuron with one relevant dimension ê1 and a threshold input-output function, for decreasing values of noise variance, σ/σ_{s1} ≈ 6.1, 0.61, 0.06 in (a), (b), and (c), respectively. The model P(spike|s1) becomes effectively linear when the signal-to-noise ratio is small. The reverse correlation can be used together with natural stimuli if the input-output function is linear. Otherwise, the deviations between the decorrelated STA and the model filter increase with the nonlinearity of P(spike|s1).

Appendix C The Gradient of Information

According to expression 2.5, the information I(v) depends on the vector v only through the probability distributions P_v(x) and P_v(x|spike). Therefore, we can express the gradient of information in terms of gradients of those probability distributions:

$$\nabla_{\mathbf{v}} I = \frac{1}{\ln 2}\int dx\left[\ln\!\left(\frac{P_{\mathbf{v}}(x|\mathrm{spike})}{P_{\mathbf{v}}(x)}\right)\nabla_{\mathbf{v}} P_{\mathbf{v}}(x|\mathrm{spike}) - \frac{P_{\mathbf{v}}(x|\mathrm{spike})}{P_{\mathbf{v}}(x)}\,\nabla_{\mathbf{v}} P_{\mathbf{v}}(x)\right], \tag{C.1}$$

where we took into account that ∫ dx P_v(x|spike) = 1 and does not change with v. To find the gradients of the probability distributions, we note that

$$\nabla_{\mathbf{v}} P_{\mathbf{v}}(x) = \nabla_{\mathbf{v}}\left[\int d\mathbf{s}\,P(\mathbf{s})\,\delta(x - \mathbf{s}\cdot\mathbf{v})\right] = -\int d\mathbf{s}\,P(\mathbf{s})\,\mathbf{s}\,\delta'(x - \mathbf{s}\cdot\mathbf{v}) = -\frac{d}{dx}\big[P_{\mathbf{v}}(x)\,\langle\mathbf{s}|x\rangle\big], \tag{C.2}$$

and analogously for P_v(x|spike):

$$\nabla_{\mathbf{v}} P_{\mathbf{v}}(x|\mathrm{spike}) = -\frac{d}{dx}\big[P_{\mathbf{v}}(x|\mathrm{spike})\,\langle\mathbf{s}|x,\mathrm{spike}\rangle\big]. \tag{C.3}$$

Substituting expressions C.2 and C.3 into equation C.1 and integrating once by parts, we obtain

$$\nabla_{\mathbf{v}} I = \int dx\, P_{\mathbf{v}}(x)\,\big[\langle\mathbf{s}|x,\mathrm{spike}\rangle - \langle\mathbf{s}|x\rangle\big]\cdot\frac{d}{dx}\!\left[\frac{P_{\mathbf{v}}(x|\mathrm{spike})}{P_{\mathbf{v}}(x)}\right],$$

which is expression 3.1 of the main text.
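To make the result of this appendix concrete, the sketch below evaluates expression 3.1 from binned data, replacing the integral over x by a sum over histogram bins. It is a rough illustration under the same naming assumptions as before (`stimuli` as a frames-by-dimensions matrix, `spike_counts` per frame), not the optimization code used in the article; the final line simply removes the component of the estimate along v, since the information does not depend on the length of v.

```python
import numpy as np

def information_gradient(v, stimuli, spike_counts, n_bins=20):
    """Histogram estimate of expression 3.1:
    grad_v I = sum_x P_v(x) [<s|x,spike> - <s|x>] d/dx [P_v(x|spike)/P_v(x)]."""
    x = stimuli @ v
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    dx = edges[1] - edges[0]
    idx = np.clip(np.digitize(x, edges) - 1, 0, n_bins - 1)   # bin index per frame

    p_x = np.bincount(idx, minlength=n_bins).astype(float)
    p_spike = np.bincount(idx, weights=spike_counts, minlength=n_bins)
    p_x /= p_x.sum()
    p_spike /= p_spike.sum()

    # conditional averages <s|x> and <s|x,spike> within each bin
    n_dim = stimuli.shape[1]
    s_x = np.zeros((n_bins, n_dim))
    s_x_spike = np.zeros((n_bins, n_dim))
    for b in range(n_bins):
        in_bin = idx == b
        if in_bin.any():
            s_x[b] = stimuli[in_bin].mean(axis=0)
            w = spike_counts[in_bin]
            if w.sum() > 0:
                s_x_spike[b] = (w[:, None] * stimuli[in_bin]).sum(axis=0) / w.sum()

    ratio = np.divide(p_spike, p_x, out=np.zeros(n_bins), where=p_x > 0)
    d_ratio = np.gradient(ratio, dx)               # d/dx [P_v(x|spike) / P_v(x)]
    grad = (p_x[:, None] * (s_x_spike - s_x) * d_ratio[:, None]).sum(axis=0)
    return grad - v * (grad @ v) / (v @ v)         # I(v) is invariant to |v|
```

Such a gradient estimate can drive a line search or simulated-annealing step of the kind described in section 3.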

Acknowledgments

We thank K. D. Miller for many helpful discussions. Work at UCSF was supported in part by the Sloan and Swartz foundations and by a training grant from the NIH. Our collaboration began at the Marine Biological Laboratory, in a course supported by grants from NIMH and the Howard Hughes Medical Institute.


References

Aguera y Arcas, B., Fairhall, A. L., & Bialek, W. (2003). Computation in a single neuron: Hodgkin and Huxley revisited. Neural Comp., 15, 1715–1749.

Baddeley, R., Abbott, L. F., Booth, M. C. A., Sengpiel, F., Freeman, T., Wakeman, E. A., & Rolls, E. T. (1997). Responses of neurons in primary and inferior temporal visual cortices to natural scenes. Proc. R. Soc. Lond. B, 264, 1775–1783.

Barlow, H. (1961). Possible principles underlying the transformations of sensory images. In W. Rosenblith (Ed.), Sensory communication (pp. 217–234). Cambridge, MA: MIT Press.

Barlow, H. (2001). Redundancy reduction revisited. Network: Comput. Neural Syst., 12, 241–253.

Bialek, W. (2002). Thinking about the brain. In H. Flyvbjerg, F. Julicher, P. Ormos, & F. David (Eds.), Physics of Biomolecules and Cells (pp. 485–577). Berlin: Springer-Verlag. See also physics/0205030.

Bialek, W., & de Ruyter van Steveninck, R. R. (2003). Features and dimensions: Motion estimation in fly vision. Unpublished manuscript.

Brenner, N., Bialek, W., & de Ruyter van Steveninck, R. R. (2000). Adaptive rescaling maximizes information transmission. Neuron, 26, 695–702.

Brenner, N., Strong, S. P., Koberle, R., Bialek, W., & de Ruyter van Steveninck, R. R. (2000). Synergy in a neural code. Neural Computation, 12, 1531–1552. See also physics/9902067.

Chichilnisky, E. J. (2001). A simple white noise analysis of neuronal light responses. Network: Comput. Neural Syst., 12, 199–213.

Creutzfeldt, O. D., & Northdurft, H. C. (1978). Representation of complex visual stimuli in the brain. Naturwissenschaften, 65, 307–318.

Cover, T. M., & Thomas, J. A. (1991). Information theory. New York: Wiley.

de Boer, E., & Kuyper, P. (1968). Triggered correlation. IEEE Trans. Biomed. Eng., 15, 169–179.

de Ruyter van Steveninck, R. R., & Bialek, W. (1988). Real-time performance of a movement-sensitive neuron in the blowfly visual system: Coding and information transfer in short spike sequences. Proc. R. Soc. Lond. B, 265, 259–265.

de Ruyter van Steveninck, R. R., Borst, A., & Bialek, W. (2000). Real time encoding of motion: Answerable questions and questionable answers from the fly's visual system. In J. M. Zanker & J. Zeil (Eds.), Motion vision: Computational, neural and ecological constraints (pp. 279–306). New York: Springer-Verlag. See also physics/0004060.

de Ruyter van Steveninck, R. R., Lewen, G. D., Strong, S. P., Koberle, R., & Bialek, W. (1997). Reproducibility and variability in neural spike trains. Science, 275, 1805–1808. See also cond-mat/9603127.

6. Where available, we give references to the physics e-print archive, which may be found on-line at http://arxiv.org/abs/; thus, Bialek (2002) is available on-line at http://arxiv.org/abs/physics/0205030. Published papers may differ from the versions posted to the archive.

Dimitrov, A. G., & Miller, J. P. (2001). Neural coding and decoding: Communication channels and quantization. Network: Comput. Neural Syst., 12, 441–472.

Dong, D. W., & Atick, J. J. (1995). Statistics of natural time-varying images. Network: Comput. Neural Syst., 6, 345–358.

Fairhall, A. L., Lewen, G. D., Bialek, W., & de Ruyter van Steveninck, R. R. (2001). Efficiency and ambiguity in an adaptive neural code. Nature, 412, 787–792.

Kara, P., Reinagel, P., & Reid, R. C. (2000). Low response variability in simultaneously recorded retinal, thalamic, and cortical neurons. Neuron, 27, 635–646.

Lewen, G. D., Bialek, W., & de Ruyter van Steveninck, R. R. (2001). Neural coding of naturalistic motion stimuli. Network: Comput. Neural Syst., 12, 317–329. See also physics/0103088.

Mainen, Z. F., & Sejnowski, T. J. (1995). Reliability of spike timing in neocortical neurons. Science, 268, 1503–1506.

Paninski, L. (2003a). Convergence properties of three spike-triggered analysis techniques. Network: Comput. Neural Syst., 14, 437–464.

Paninski, L. (2003b). Estimation of entropy and mutual information. Neural Computation, 15, 1191–1253.

Panzeri, S., & Treves, A. (1996). Analytical estimates of limited sampling biases in different information measures. Network: Comput. Neural Syst., 7, 87–107.

Pola, G., Schultz, S. R., Petersen, R., & Panzeri, S. (2002). A practical guide to information analysis of spike trains. In R. Kotter (Ed.), Neuroscience databases: A practical guide (pp. 137–152). Norwell, MA: Kluwer.

Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P. (1992). Numerical recipes in C: The art of scientific computing. Cambridge: Cambridge University Press.

Reinagel, P., & Reid, R. C. (2000). Temporal coding of visual information in the thalamus. J. Neurosci., 20, 5392–5400.

Rieke, F., Bodnar, D. A., & Bialek, W. (1995). Naturalistic stimuli increase the rate and efficiency of information transmission by primary auditory afferents. Proc. R. Soc. Lond. B, 262, 259–265.

Rieke, F., Warland, D., de Ruyter van Steveninck, R. R., & Bialek, W. (1997). Spikes: Exploring the neural code. Cambridge, MA: MIT Press.

Ringach, D. L., Hawken, M. J., & Shapley, R. (2002). Receptive field structure of neurons in monkey visual cortex revealed by stimulation with natural image sequences. Journal of Vision, 2, 12–24.

Ringach, D. L., Sapiro, G., & Shapley, R. (1997). A subspace reverse-correlation technique for the study of visual neurons. Vision Res., 37, 2455–2464.

Rolls, E. T., Aggelopoulos, N. C., & Zheng, F. (2003). The receptive fields of inferior temporal cortex neurons in natural scenes. J. Neurosci., 23, 339–348.

Ruderman, D. L. (1994). The statistics of natural images. Network: Comput. Neural Syst., 5, 517–548.

Ruderman, D. L., & Bialek, W. (1994). Statistics of natural images: Scaling in the woods. Phys. Rev. Lett., 73, 814–817.

Schwartz, O., Chichilnisky, E. J., & Simoncelli, E. (2002). Characterizing neural gain control using spike-triggered covariance. In T. G. Dietterich, S. Becker, & Z. Ghahramani (Eds.), Advances in Neural Information Processing (pp. 269–276). Cambridge, MA: MIT Press.

Sen, K., Theunissen, F. E., & Doupe, A. J. (2001). Feature analysis of natural sounds in the songbird auditory forebrain. J. Neurophysiol., 86, 1445–1458.

Simoncelli, E., & Olshausen, B. A. (2001). Natural image statistics and neural representation. Annu. Rev. Neurosci., 24, 1193–1216.

Smirnakis, S. M., Berry, M. J., Warland, D. K., Bialek, W., & Meister, M. (1996). Adaptation of retinal processing to image contrast and spatial scale. Nature, 386, 69–73.

Smyth, D., Willmore, B., Baker, G. E., Thompson, I. D., & Tolhurst, D. J. (2003). The receptive-field organization of simple cells in primary visual cortex of ferrets under natural scene stimulation. J. Neurosci., 23, 4746–4759.

Stanley, G. B., Li, F. F., & Dan, Y. (1999). Reconstruction of natural scenes from ensemble responses in the lateral geniculate nucleus. J. Neurosci., 19, 8036–8042.

Strong, S. P., Koberle, R., de Ruyter van Steveninck, R. R., & Bialek, W. (1998). Entropy and information in neural spike trains. Phys. Rev. Lett., 80, 197–200.

Theunissen, F. E., Sen, K., & Doupe, A. J. (2000). Spectral-temporal receptive fields of nonlinear auditory neurons obtained using natural sounds. J. Neurosci., 20, 2315–2331.

Tishby, N., Pereira, F. C., & Bialek, W. (1999). The information bottleneck method. In B. Hajek & R. S. Sreenivas (Eds.), Proceedings of the 37th Allerton Conference on Communication, Control and Computing (pp. 368–377). Urbana: University of Illinois. See also physics/0004057.

Touryan, J., Lau, B., & Dan, Y. (2002). Isolation of relevant visual features from random stimuli for cortical complex cells. J. Neurosci., 22, 10811–10818.

Treves, A., & Panzeri, S. (1995). The upward bias in measures of information derived from limited data samples. Neural Comp., 7, 399–407.

von der Twer, T., & Macleod, D. I. A. (2001). Optimal nonlinear codes for the perception of natural colours. Network: Comput. Neural Syst., 12, 395–407.

Vickers, N. J., Christensen, T. A., Baker, T., & Hildebrand, J. G. (2001). Odour-plume dynamics influence the brain's olfactory code. Nature, 410, 466–470.

Vinje, W. E., & Gallant, J. L. (2000). Sparse coding and decorrelation in primary visual cortex during natural vision. Science, 287, 1273–1276.

Vinje, W. E., & Gallant, J. L. (2002). Natural stimulation of the nonclassical receptive field increases information transmission efficiency in V1. J. Neurosci., 22, 2904–2915.

Voss, R. F., & Clarke, J. (1975). "1/f noise" in music and speech. Nature, 317–318.

Weliky, M., Fiser, J., Hunt, R., & Wagner, D. N. (2003). Coding of natural scenes in primary visual cortex. Neuron, 37, 703–718.

Received January 21, 2003; accepted August 6, 2003.

Page 4: Analyzing Neural Responses to Natural Signals: Maximally ...wbialek/rome/refs/sharpee+al_04.pdfAnalyzing Neural Responses to Natural Signals 225 Figure 1: Schematic illustration of

226 T Sharpee N Rust and W Bialek

projection can be strongly nonlinear (cf Figure 1) In this simple modelthe reverse correlation method (de Boer amp Kuyper 1968 Rieke et al 1997Chichilnisky 2001) can be used to recover the vector Oe1 by analyzing theneuronrsquos responses to gaussian white noise In a more general case theprobability of the response depends on projections si D Oei cent s of the stimuluss on a set of K vectors fOe1 Oe2 OeKg

Pspikejs D Pspike f s1 s2 sK (11)

where Pspikejs is the probability of a spike given a stimulus s and Pspike

is the average ring rate In what follows we will call the subspace spannedby the set of vectors fOe1 Oe2 OeKg the relevant subspace (RS)2 We reiteratethat vectors fOeig 1 middot i middot K may also describe how the time dependence of thestimulus s affects the probability of a spike An example of such a relevantdimension would be a spatiotemporal RF of a visual neuron Although theideas developed below can be used to analyze input-output functions fwith respect to different neural responses such as patterns of spikes in time(de Ruyter van Steveninck amp Bialek 1988 Brenner Strong Koberle Bialekamp de Ruyter van Steveninck 2000 Reinagel amp Reid 2000) for illustrationpurposes we choose a single spike as the response of interest3

Equation 11 in itself is not yet a simplication if the dimensionality K ofthe RS is equal to the dimensionality D of the stimulus space In this articlewe will assume that the neuronrsquos ring is sensitive only to a small numberof stimulus features (K iquest D) While the general idea of searching for low-dimensional structure in high-dimensional data is very old our motivationhere comes from work on the y visual system where it was shown explic-

2 Since the analysis does not depend on a particular choice of a basis within the fullD-dimensional stimulus space for clarity we choose the basis in which the rst K basisvectors span the relevant subspace and the remaining D iexcl K vectors span the irrelevantsubspace

3 We emphasize that our focus here on single spikes is not equivalent to assumingthat the spike train is a Poisson process modulated by the stimulus No matter what thestatistical structure of the spike train is we always can ask what features of the stimulusare relevant for setting the probability of generating a single spike at one moment in timeFrom an information-theoretic point of view asking for stimulus features that capturethe mutual information between the stimulus and the arrival times of single spikes isa well-posed question even if successive spikes do not carry independent informationNote also that spikesrsquo carrying independent information is not the same as spikesrsquo beinggenerated as a Poisson process On the other hand if (for example) different temporalpatterns of spikes carry information about different stimulus features then analysis ofsingle spikes will result in a relevant subspace of artifactually high dimensionality Thusit is important that the approach discussed here carries over without modication to theanalysis of relevant dimensions for the generation of any discrete event such as a patternof spikes across time in one cell or synchronous spikes across a population of cells Fora related discussion of relevant dimensions and spike patterns using covariance matrixmethods see de Ruyter van Steveninck and Bialek (1988) and Aguera y Arcas Fairhalland Bialek (2003)

Analyzing Neural Responses to Natural Signals 227

itly that patterns of action potentials in identied motion-sensitive neuronsare correlated with low-dimensional projections of the high-dimensional vi-sual input (de Ruyter van Steveninck amp Bialek 1988 Brenner Bialek et al2000 Bialek amp de Ruyter van Steveninck 2003) The input-output functionf in equation 11 can be strongly nonlinear but it is presumed to dependon only a small number of projections This assumption appears to be lessstringent than that of approximate linearity which one makes when char-acterizing a neuronrsquos response in terms of Wiener kernels (see eg thediscussion in section 213 of Rieke et al 1997) The most difcult part in re-constructing the input-output function is to nd the RS Note that for K gt 1a description in terms of any linear combination of vectors fOe1 Oe2 OeKgis just as valid since we did not make any assumptions as to a particularform of the nonlinear function f

Once the relevant subspace is known the probability Pspikejs becomesa function of only a few parameters and it becomes feasible to map thisfunction experimentally inverting the probability distributions accordingto Bayesrsquo rule

f s1 s2 sK D Ps1 s2 sKjspike

Ps1 s2 sK (12)

If stimuli are chosen from a correlated gaussian noise ensemble then theneural response can be characterized by the spike-triggered covariancemethod (de Ruyter van Steveninck amp Bialek 1988 Brenner Bialek et al2000 Schwartz Chichilnisky amp Simoncelli2002 Touryan Lau amp Dan 2002Bialek amp de Ruyter van Steveninck 2003) It can be shown that the dimen-sionality of the RS is equal to the number of nonzero eigenvalues of a matrixgiven by a difference between covariance matrices of all presented stimuliand stimuli conditional on a spike Moreover the RS is spanned by theeigenvectors associated with the nonzero eigenvalues multiplied by the in-verse of the a priori covariance matrix Compared to the reverse correlationmethod we are no longer limited to nding only one of the relevant dimen-sions fOeig 1 middot i middot K Both the reverse correlation and the spikendashtriggeredcovariance method however give rigorously interpretable results only forgaussian distributions of inputs

In this article we investigate whether it is possible to lift the requirementfor stimuli to be gaussian When using natural stimuli which certainly arenongaussian the RS cannot be found by the spike-triggered covariancemethod Similarly the reverse correlation method does not give the correctRF even in the simplest case where the input-output function in equation 11depends on only one projection (see appendix A for a discussion of thispoint) However vectors that span the RS are clearly special directions inthe stimulus space independent of assumptions about Ps This notion canbe quantied by the Shannon information We note that the stimuli s do nothave to lie on a low-dimensional manifold within the overall D-dimensional

228 T Sharpee N Rust and W Bialek

space4 However since we assume that the neuronrsquos input-output functiondepends on a small number of relevant dimensions the ensemble of stimuliconditional on a spike may exhibit clear clustering This makes the proposedmethod of looking for the RS complementary to the clustering of stimuliconditional on a spike done in the information bottleneck method (TishbyPereira amp Bialek 1999 see also Dimitrov amp Miller 2001) Noninformation-based measures of similarity between probability distributions Ps andPsjspike have also been proposed to nd the RS (Paninski 2003a)

To summarize our assumptions

sup2 The sampling of the probability distribution of stimuli Ps is ergodicand stationary across repetitions The probability distribution is notassumed to be gaussian The ensemble of stimuli described by Ps

does not have to lie on a low-dimensional manifold embedded in theoverall D-dimensional space

sup2 We choose a single spike as the response of interest (for illustrationpurposes only) An identical scheme can be applied for example toparticular interspike intervals or to synchronous spikes from a pair ofneurons

sup2 The subspace relevant for generating a spike is low dimensional andEuclidean (cf equation 11)

sup2 The input-output function which is dened within the low-dimen-sional RS can be arbitrarily nonlinear It is obtained experimentally bysampling the probability distributions Ps and Psjspike within theRS

The article is organized as follows In section 2 we discuss how an opti-mization problem can be formulated to nd the RS A particular algorithmused to implement the optimization scheme is described in section 3 In sec-tion 4 we illustrate how the optimization scheme works with natural stimulifor model orientation-sensitive cells with one and two relevant dimensionsmuch like simple and complex cells found in primary visual cortex as wellas for a model auditory neuron responding to natural sounds We also dis-cuss the convergence of our estimates of the RS as a function of data set sizeWe emphasize that our optimization scheme does not rely on any specicstatistical properties of the stimulus ensemble and thus can be used withnatural stimuli

4 If one suspects that neurons are sensitive to low-dimensional features of their inputone might be tempted to analyze neural responses to stimuli that explore only the (puta-tive) relevant subspace perhaps along the line of the subspace reverse correlation method(Ringach et al 1997) Our approach (like the spike-triggered covariance approach) is dif-ferent because it allows the analysis of responses to stimuli that live in the full space andinstead we let the neuron ldquotell usrdquo which low-dimensional subspace is relevant

Analyzing Neural Responses to Natural Signals 229

2 Information as an Objective Function

When analyzing neural responses we compare the a priori probability dis-tribution of all presented stimuli with the probability distribution of stimulithat lead to a spike (de Ruyter van Steveninck amp Bialek 1988) For gaus-sian signals the probability distribution can be characterized by its secondmoment the covariance matrix However an ensemble of natural stimuli isnot gaussian so that in a general case neither second nor any nite num-ber of moments is sufcient to describe the probability distribution In thissituation Shannon information provides a rigorous way of comparing twoprobability distributions The average information carried by the arrivaltime of one spike is given by Brenner Strong et al (2000)

Ispike DZ

dsPsjspike log2

microPsjspike

Ps

para (21)

where ds denotes integration over full Dndashdimensional stimulus space Theinformation per spike as written in equation 21 is difcult to estimate exper-imentally since it requires either sampling of the high-dimensional proba-bility distribution Psjspike or a model of how spikes were generated thatis the knowledge of low-dimensional RS However it is possible to calculateIspike in a model-independent way if stimuli are presented multiple timesto estimate the probability distribution Pspikejs Then

Ispike Dfrac12

Pspikejs

Pspikelog2

microPspikejs

Pspike

parafrac34

s (22)

where the average is taken over all presented stimuli This can be useful inpractice (Brenner Strong et al 2000) because we can replace the ensembleaverage his with a time average and Pspikejs with the time-dependentspike rate rt Note that for a nite data set of N repetitions the obtainedvalue IspikeN will be on average larger than Ispike1 The true value Ispikecan be found by either subtracting an expected bias value which is of theorder of raquo 1=PspikeN 2 ln 2 (Treves amp Panzeri 1995 Panzeri amp Treves1996 Pola Schultz Petersen amp Panzeri 2002 Paninski 2003b) or extrapo-lating to N 1 (Brenner Strong et al 2000 Strong Koberle de Ruytervan Steveninck amp Bialek 1998) Measurement of Ispike in this way providesa model independent benchmark against which we can compare any de-scription of the neuronrsquos input-output relation

Our assumption is that spikes are generated according to a projectiononto a low-dimensional subspace Therefore to characterize relevance of aparticular direction v in the stimulus space we project all of the presentedstimuli onto v and form probability distributions Pvx and Pvxjspike ofprojection values x for the a priori stimulus ensemble and that conditional

230 T Sharpee N Rust and W Bialek

on a spike respectively

Pvx D hplusmnx iexcl s cent vis (23)

Pvxjspike D hplusmnx iexcl s cent vjspikeis (24)

where plusmnx is a delta function In practice both the average hcent cent centis acuteR

ds cent cent centPs over the a priori stimulus ensemble and the average hcent cent cent jspikeisacute

Rds cent cent cent Psjspike over the ensemble conditional on a spike are calculated

by binning the range of projection values x The probability distributionsare then obtained as histograms normalized in a such a way that the sumover all bins gives 1 The mutual information between spike arrival timesand the projection x by analogy with equation 21 is

Iv DZ

dxPvxjspike log2

microPvxjspike

Pvx

para (25)

which is also the Kullback-Leibler divergence D[PvxjspikejjPvx] Noticethat this information is a function of the direction v The information Iv

provides an invariant measure of how much the occurrence of a spike isdetermined by projection on the direction v It is a function only of directionin the stimulus space and does not change when vector v is multiplied bya constant This can be seen by noting that for any probability distributionand any constant c Pcvx D ciexcl1Pvx=c (see also theorem 964 of Cover ampThomas 1991) When evaluated along any vector v Iv middot Ispike The totalinformation Ispike can be recovered along one particular direction only ifv D Oe1 and only if the RS is one-dimensional

By analogy with equation 25 one could also calculate information Iv1

vn along a set of several directions fv1 vng based on the multipointprobability distributions of projection values x1 x2 xn along vectors v1v2 vn of interest

Pv1vn fxigjspike D

nY

iD1

plusmnxi iexcl s cent vijspike

+

s

(26)

Pv1vn fxig D

nY

iD1

plusmnxi iexcl s cent vi

+

s

(27)

If we are successful in nding all of the directions fOeig 1 middot i middot K con-tributing to the input-output relation equation 11 then the informationevaluated in this subspace will be equal to the total information Ispike Whenwe calculate information along a set of K vectors that are slightly off fromthe RS the answer of course is smaller than Ispike and is initially quadraticin small deviations plusmnvi One can therefore hope to nd the RS by maximizinginformation with respect to K vectors simultaneously The information does

Analyzing Neural Responses to Natural Signals 231

not increase if more vectors outside the RS are included For uncorrelatedstimuli any vector or a set of vectors that maximizes Iv belongs to theRS On the other hand as discussed in appendix B the result of optimiza-tion with respect to a number of vectors k lt K may deviate from the RS ifstimuli are correlated To nd the RS we rst maximize Iv and comparethis maximum with Ispike which is estimated according to equation 22 Ifthe difference exceeds that expected from nite sampling corrections weincrement the number of directions with respect to which information issimultaneously maximized

3 Optimization Algorithm

In this section we describe a particular algorithm we used to look for themost informative dimensions in order to nd the relevant subspace Wemake no claim that our choice of the algorithm is most efcient However itdoes give reproducible results for different starting points and spike trainswith differences taken to simulate neural noise Overall choices for an algo-rithm are broader because the information Iv as dened by equation 25is a continuous function whose gradient can be computed We nd (seeappendix C for a derivation)

rvI DZ

dxPvxpoundhsjx spikei iexcl hsjxi

currencentmicro

ddx

Pvxjspike

Pvx

para (31)

where

hsjx spikei D1

Pxjspike

Zds splusmnx iexcl s cent vPsjspike (32)

and similarly for hsjxi Since information does not change with the length ofthe vector we have v centrvI D 0 as also can be seen directly from equation 31

As an optimization algorithm we have used a combination of gradientascent and simulated annealing algorithms successive line maximizationswere done along the direction of the gradient (Press Teukolsky Vetterlingamp Flannery 1992) During line maximizations a point with a smaller valueof information was accepted according to Boltzmann statistics with proba-bility exp[IviC1 iexcl Ivi=T] The effective temperature T is reduced byfactor of 1 iexcl sup2T upon completion of each line maximization Parameters ofthe simulated annealing algorithm to be adjusted are the starting tempera-ture T0 and the cooling rate sup2T 1T D iexclsup2TT When maximizing with respectto one vector we used values T0 D 1 and sup2T D 005 When maximizing withrespect to two vectors we either used the cooling schedule with sup2T D 0005and repeated it several times (four times in our case) or allowed the effec-tive temperature T to increase by a factor of 10 upon convergence to a localmaximum (keeping T middot T0 always) while limiting the total number of linemaximizations

232 T Sharpee N Rust and W Bialek

10shy 4

10shy 2

1000

200

400

600

0 05 10

1

2

3w h ite no ise stim u li

IIsp ike

IIspike

P(II sp

ike )

natu ralis tic s tim u li

(a )

(b )

(c)

Figure 2 The probability distribution of information values in units of the totalinformation per spike in the case of (a) uncorrelated binary noise stimuli (b)correlated gaussian noise with power spectrum of natural scenes and (c) stimuliderived from natural scenes (patches of photos) The distribution was obtainedby calculating information along 105 random vectors for a model cell with onerelevant dimension Note the different scales in the two panels

The problem of maximizing a function often is related to the problemof making a good initial guess It turns out however that the choice of astarting point is much less crucial in cases where the stimuli are correlatedTo illustrate this point we plot in Figure 2 the probability distribution ofinformation along random directions v for both white noise and naturalisticstimuli in a model with one relevant dimension For uncorrelated stimulinot only is information equal to zero for a vector that is perpendicularto the relevant subspace but in addition the derivative is equal to zeroSince a randomly chosen vector has on average a small projection on therelevant subspace (compared to its length) vr=jvj raquo

pn=d the corresponding

information can be found by expanding in vr=jvj

I frac14 v2r

2jvj2

ZdxP Oeir

x

AacuteP0

Oeirx

POeirx

2

[hsOerjspikei iexcl hsOeri]2 (33)

where vector v D vr Oer C vir Oeir is decomposed in its components inside andoutside the RS respectively The average information for a random vectoris therefore raquo hv2

r i=jvj2 D K=DIn cases where stimuli are drawn from a gaussian ensemble with cor-

relations an expression for the information values has a similar structureto equation 33 To see this we transform to Fourier space and normalizeeach Fourier component by the square root of the power spectrum Sk

Analyzing Neural Responses to Natural Signals 233

In this new basis both the vectors feig 1 middot i middot K forming the RS and therandomly chosen vector v along which information is being evaluated areto be multiplied by

pSk Thus if we now substitute for the dot product

v2r the convolution weighted by the power spectrum

PKi v curren Oei

2 where

v curren Oei DP

k vkOeikSkpPk v2kSk

pPk Oe2

i kSk (34)

then equation 33 will describe information values along randomly chosenvectors v for correlated gaussian stimuli with the power spectrum SkAlthough both vr and vk are gaussian variables with variance raquo 1=Dthe weighted convolution has not only a much larger variance but is alsostrongly nongaussian (the nongaussian character is due to the normalizingfactor

Pk v2kSk in the denominator of equation 34) As for the variance

it can be estimated as lt v curren Oe12 gtD 4frac14= ln2 D in cases where stimuli aretaken as patches of correlated gaussian noise with the two-dimensionalpower spectrum Sk D A=k2 The large values of the weighted dot productv curren Oei 1 middot i middot K result not only in signicant information values along arandomly chosen vector but also in large magnitudes of the derivative rIwhich is no longer dominated by noise contrary to the case of uncorrelatedstimuli In the end we nd that randomly choosing one of the presentedframes as a starting guess is sufcient

4 Results

We tested the scheme of looking for the most informative dimensions onmodel neurons that respond to stimuli derived from natural scenes andsounds As visual stimuli we used scans across natural scenes which weretaken as black and white photos digitized to 8 bits with no correctionsmade for the camerarsquos light-intensity transformation function Some statis-tical properties of the stimulus set are shown in Figure 3 Qualitatively theyreproduce the known results on the statistics of natural scenes (Ruderman ampBialek 1994 Ruderman 1994 Dong amp Atick 1995 Simoncelli amp Olshausen2001) Most important properties for this study are strong spatial correla-tions as evident from the power spectrum S(k) plotted in Figure 3b anddeviations of the probability distribution from a gaussian one The nongaus-sian character can be seen in Figure 3c where the probability distributionof intensities is shown and in Figure 3d which shows the distribution ofprojections on a Gabor lter (in what follows the units of projections suchas s1 will be given in units of the corresponding standard deviations) Ourgoal is to demonstrate that although the correlations present in the ensem-ble are nongaussian they can be removed successfully from the estimate ofvectors dening the RS

234 T Sharpee N Rust and W Bialek

shy 2 0 210

shy 2

100

200 400 600

100

200

300

400

shy 4 shy 2 0 2 4 10

shy 5

100

105

10shy 1

100

10shy 5

100

105

P(s1 ) P(I)

S(k) s2(I)

(a) (b)

(c) (d)

(Ishy ltIgt ) s(I) (s1shy lts

1gt ) s(s

1 )

ka p

Figure 3 Statistical properties of the visual stimulus ensemble (a) One of thephotos Stimuli would be 30 pound 30 patches taken from the overall photograph(b) The power spectrum in units of light intensity variance frac34 2I averaged overorientation as a function of dimensionless wave vector ka where a is the pixelsize (c) The probability distribution of light intensity in units of frac34 I (d) Theprobability distribution of projections between stimuli and a Gabor lter alsoin units of the corresponding standard deviation frac34 s1

41 A Model Simple Cell Our rst example is based on the propertiesof simple cells found in the primary visual cortex A model phase- andorientation-sensitive cell has a single relevant dimension Oe1 shown in Fig-ure 4a A given stimulus s leads to a spike if the projection s1 D s cent Oe1 reachesa threshold value micro in the presence of noise

Pspikejs

Pspikeacute f s1 D hHs1 iexcl micro C raquo i (41)

where a gaussian random variable raquo of variance frac34 2 models additive noiseand the function Hx D 1 for x gt 0 and zero otherwise Together with theRF Oe1 the parameters micro for threshold and the noise variance frac34 2 determinethe input-output function

Analyzing Neural Responses to Natural Signals 235

Figure 4 Analysis of a model simple cell with RF shown in (a) (b) The ldquoexactrdquospike-triggered average vsta (c) An attempt to remove correlations accordingto the reverse correlation method Ciexcl1

a priorivsta (d) The normalized vector Ovmax

found by maximizing information (e) Convergence of the algorithm accordingto information Iv and projection Ovcent Oe1 between normalized vectors as a functionof the inverse effective temperature Tiexcl1 (f) The probability of a spike Pspikejs centOvmax (crosses) is compared to Pspikejs1 used in generating spikes (solid line)Parameters of the model are frac34 D 031 and micro D 184 both given in units ofstandard deviation of s1 which is also the units for the x-axis in f

When the spike-triggered average (STA) or reverse correlation functionis computed from the responses to correlated stimuli the resulting vectorwill be broadened due to spatial correlations present in the stimuli (see Fig-ure 4b) For stimuli that are drawn from a gaussian probability distributionthe effects of correlations could be removed by multiplying vsta by the in-verse of the a priori covariance matrix according to the reverse correlationmethod OvGaussian est Ciexcl1

a priorivsta equation A2 However this proceduretends to amplify noise To separate errors due to neural noise from those

236 T Sharpee N Rust and W Bialek

due to the nongaussian character of correlations note that in a model theeffect of neural noise on our estimate of the STA can be eliminated by aver-aging the presented stimuli weighted with the exact ring rate as opposedto using a histogram of responses to estimate Pspikejs from a nite set oftrials We have used this ldquoexactrdquo STA

vsta DZ

ds sPsjspike D 1Pspike

ZdsPs sPspikejs (42)

in calculations presented in Figures 4b and 4c Even with this noiseless STA(the equivalent of collecting an innite data set) the standard decorrelationprocedure is not valid for nongaussian stimuli and nonlinear input-outputfunctions as discussed in detail in appendix A The result of such a decorre-lation in our example is shown in Figure 4c It clearly is missing some of thestructure in the model lter with projection Oe1 cent OvGaussian est frac14 014 The dis-crepancy is not due to neural noise or nite sampling since the ldquoexactrdquo STAwas decorrelated the absence of noise in the exact STA also means that therewould be no justication for smoothing the results of the decorrelation Thediscrepancy between the true RF and the decorrelated STA increases withthe strength of nonlinearity in the input-output function

In contrast it is possible to obtain a good estimate of the relevant di-mension Oe1 by maximizing information directly (see Figure 4d) A typicalprogress of the simulated annealing algorithm with decreasing temperatureT is shown in Figure 4e There we plot both the information along the vectorand its projection on Oe1 We note that while information I remains almostconstant the value of projection continues to improve Qualitatively this isbecause the probability distributions depend exponentially on informationThe nal value of projection depends on the size of the data set as discussedbelow In the example shown in Figure 4 there were frac14 50 000 spikes withaverage probability of spike frac14 005 per frame and the reconstructed vectorhas a projection Ovmax cent Oe1 D 0920 sect 0006 Having estimated the RF onecan proceed to sample the nonlinear input-output function This is done byconstructing histograms for Ps cent Ovmax and Ps cent Ovmaxjspike of projectionsonto vector Ovmax found by maximizing information and taking their ratioas in equation 12 In Figure 4f we compare Pspikejs cent Ovmax (crosses) withthe probability Pspikejs1 used in the model (solid line)

42 Estimated Deviation from the Optimal Dimension When infor-mation is calculated from a nite data set the (normalized) vector Ov whichmaximizes I will deviate from the true RF Oe1 The deviation plusmnv D OviexclOe1 arisesbecause the probability distributions are estimated from experimental his-tograms and differ from the distributions found in the limit of innite datasize For a simple cell the quality of reconstruction can be characterized bythe projection Ov cent Oe1 D 1 iexcl 1

2plusmnv2 where both Ov and Oe1 are normalized andplusmnv is by denition orthogonal to Oe1 The deviation plusmnv raquo Aiexcl1rI where A is

Analyzing Neural Responses to Natural Signals 237

the Hessian of information Its structure is similar to that of a covariancematrix

Aij D1

ln 2

ZdxPxjspike

sup3ddx

lnPxjspike

Px

acute2

pound hsisjjxi iexcl hsijxihsjjxi (43)

When averaged over possible outcomes of N trials the gradient of infor-mation is zero for the optimal direction Here in order to evaluate hplusmnv2i DTr[Aiexcl1hrIrITiAiexcl1] we need to know the variance of the gradient of I As-suming that the probability of generating a spike is independent for differentbins we can estimate hrIirIji raquo Aij=Nspike ln 2 Therefore an expected er-ror in the reconstruction of the optimal lter is inversely proportional tothe number of spikes The corresponding expected value of the projectionbetween the reconstructed vector and the relevant direction Oe1 is given by

Ov cent Oe1 frac14 1 iexcl 12

hplusmnv2i D 1 iexcl Tr0[Aiexcl1]2Nspike ln 2

(44)

where Tr0 means that the trace is taken in the subspace orthogonal to themodel lter5 The estimate equation 44 can be calculated without knowl-edge of the underlying model it is raquo D=2Nspike This behavior should alsohold in cases where the stimulus dimensions are expanded to include timeThe errors are expected to increase in proportion to the increased dimen-sionality In the case of a complex cell with two relevant dimensions theerror is expected to be twice that for a cell with single relevant dimensionalso discussed in section 43

We emphasize that the error estimate according to equation 44 is of thesame order as errors of the reverse correlation method when it is appliedfor gaussian ensembles The latter are given by Tr[Ciexcl1] iexcl Ciexcl1

11 =[2Nspike

h f02s1i] Of course if the reverse correlation method were to be applied

to the nongaussian ensemble the errors would be larger In Figure 5 weshow the result of simulations for various numbers of trials and thereforeNspike The average projection of the normalized reconstructed vector Ov onthe RF Oe1 behaves initially as 1=Nspike (dashed line) For smaller data setsin this case Nspikes raquolt 30 000 corrections raquo Niexcl2

spikes become important forestimating the expected errors of the algorithm Happily these correctionshave a sign such that smaller data sets are more effective than one mighthave expected from the asymptotic calculation This can be veried fromthe expansion Ov cent Oe1 D [1 iexcl plusmnv2]iexcl1=2 frac14 1 iexcl 1

2 hplusmnv2i C 38 hplusmnv4i were only the rst

two terms were taken into account in equation 44

5 By denition plusmnv1 D plusmnv cent Oe1 D 0 and therefore hplusmnv21i Aiexcl1

11 is to be subtracted fromhplusmnv2i Tr[Aiexcl1] Because Oe1 is an eigenvector of A with zero eigenvalue Aiexcl1

11 is inniteTherefore the proper treatment is to take the trace in the subspace orthogonal to Oe1

238 T Sharpee N Rust and W Bialek

0 02 04 06 08x 10

shy 4

075

08

085

09

095

1

Nshy 1spike

e 1 times v m

ax

Figure 5 Projection of vector Ovmax that maximizes information on RF Oe1 is plot-ted as a function of the number of spikes The solid line is a quadratic t in1=Nspike and the dashed line is the leading linear term in 1=Nspike This set ofsimulations was carried out for a model visual neuron with one relevant di-mension from Figure 4a and the input-output function (see equation 41) withparameter values frac34 frac14 061frac34 s1 micro frac14 061frac34 s1 For this model neuron the linearapproximation for the expected error is applicable for Nspike raquogt 30 000

43 A Model Complex Cell A sequence of spikes from a model cell withtwo relevant dimensions was simulated by projecting each of the stimuli onvectors that differ by frac14=2 in their spatial phase taken to mimic properties ofcomplex cells as in Figure 6 A particular frame leads to a spike accordingto a logical OR that is if either js1j or js2j exceeds a threshold value micro in thepresence of noise where s1 D s cent Oe1 s2 D s cent Oe2 Similarly to equation 41

Pspikejs

PspikeD f s1 s2 D hHjs1j iexcl micro iexcl raquo1 _ Hjs2j iexcl micro iexcl raquo2i (45)

where raquo1 and raquo2 are independent gaussian variables The sampling of thisinput-output function by our particular set of natural stimuli is shown inFigure 6c As is well known reverse correlation fails in this case because thespikendashtriggered average stimulus is zero although with gaussian stimulithe spike-triggered covariance method would recover the relevant dimen-sions (Touryan et al 2002) Here we show that searching for maximallyinformative dimensions allows us to recover the relevant subspace evenunder more natural stimulus conditions

Analyzing Neural Responses to Natural Signals 239

Figure 6 Analysis of a model complex cell with relevant dimensions Oe1 and Oe2

shown in (a) and (b) respectively Spikes are generated according to an ldquoORrdquoinput-output function f s1 s2 with the threshold micro frac14 061frac34 s1 and noise stan-dard deviation frac34 D 031frac34 s1 (cd) Vectors v1 and v2 found by maximizinginformation Iv1 v2

We start by maximizing information with respect to one vector Contraryto the result in Figure 4e for a simple cell one optimal dimension recoversonly about 60 of the total information per spike (see equation 22) Perhapssurprisingly because of the strong correlations in natural scenes even a pro-jection onto a random vector in the D raquo 103-dimensional stimulus spacehas a high probability of explaining 60 of total information per spike ascan be seen in Figure 2 We therefore go on to maximize information withrespect to two vectors As a result of maximization we obtain two vectors v1and v2 shown in Figure 6 The information along them is Iv1 v2 frac14 090which is within the range of information values obtained along differentlinear combinations of the two model vectors IOe1 Oe2=Ispike D 089 sect 011Therefore the description of neuronrsquos ring in terms of vectors v1 and v2is complete up to the noise level and we do not have to look for extra rel-evant dimensions Practically the number of relevant dimensions can bedetermined by comparing Iv1 v2 to either Ispike or Iv1 v2 v3 the latterbeing the result of maximization with respect to three vectors simultane-ously As mentioned in section 1 information along set a of vectors does notincrease when extra dimensions are added to the relevant subspace There-fore if Iv1 v2 D Iv1 v2 v3 (again up to the noise level) this means that

240 T Sharpee N Rust and W Bialek

there are only two relevant dimensions Using Ispike for comparison withIv1 v2 has the advantage of not having to look for an extra dimensionwhich can be computationally intensive However Ispike might be subjectto larger systematic bias errors than Iv1 v2 v3

Vectors v1 and v2 obtained by maximizing Iv1 v2 are not exactly or-thogonal and are also rotated with respect to Oe1 and Oe2 However the qualityof reconstruction as well as the value of information Iv1 v2 is indepen-dent of a particular choice of basis with the RS The appropriate measure ofsimilarity between the two planes is the dot product of their normals In theexample of Figure 6 OnOe1Oe2 cent Onv1 v2 D 082sect007 where OnOe1Oe2 is a normal tothe plane passing through vectors Oe1 and Oe2 Maximizing information withrespect to two dimensions requires a signicantly slower cooling rate andconsequently longer computational times However the expected error inthe reconstruction 1 iexcl OnOe1 Oe2 cent Onv1 v2 scales as 1=Nspike behavior similarto equation 44 and is roughly twice that for a simple cell given the samenumber of spikes We make vectors v1 and v2 orthogonal to each othersupon completion of the algorithm

44 A Model Auditory Neuron with One Relevant Dimension Be-cause stimuli s are treated as vectors in an abstract space the method oflooking for the most informative dimensions can be applied equally wellto auditory as well as to visual neurons Here we illustrate the method byconsidering a model auditory neuron with one relevant dimension which isshown in Figure 7c and is taken to mimic the properties of cochlear neuronsThe model neuron is probed by two ensembles of naturalistic stimuli oneis a recording of a native Russian speaker reading a piece of Russian proseand the other is a recording of a piece of English prose read by a nativeEnglish speaker Both ensembles are nongaussian and exhibit amplitudedistributions with long nearly exponential tails (see Figure 7a) which arequalitatively similar to those of light intensities in natural scenes (Voss ampClarke 1975 Ruderman 1994) However the power spectrum is different inthe two cases as can be seen in Figure 7b The differences in the correlationstructure in particular lead to different STAs across the two ensembles (cfFigure 7d) Both of the STAs also deviate from the model lter shown inFigure 7c

Despite differences in the probability distributions Ps it is possibleto recover the relevant dimension of the model neuron by maximizing in-formation In Figure 7c we show the two most informative vectors foundby running the algorithm for the two ensembles and replot the model lterfrom Figure 7c to show that the three vectors overlap almost perfectly Thusdifferent nongaussian correlations can be successfully removed to obtain anestimate of the relevant dimension If the most informative vector changeswith the stimulus ensemble this can be interpreted as caused by adaptationto the probability distribution

Analyzing Neural Responses to Natural Signals 241

shy 10 0 1010

shy 6

10shy 4

10shy 2

100

5 10 15 20

0001

001

01

0 5 10

shy 01

0

01

0 5 10

shy 01

0

01

0 5 10

shy 01

0

01

shy 5 0 5 0

05

1

11

2

(a) (b)

(c) (d)

(e)

2

1

t(ms)

t(ms)

t(ms)

P(A

mpl

itude

)

Amplitude (units of RMS) frequency (kHz)

S(w)

2

(f)

P(s

pike

| s times

vm

ax )

s times vmax

Figure 7 A model auditory neuron is probed by two natural ensembles of stim-uli a piece of English prose (1) and a piece of of Russian prose (2) The sizeof the stimulus ensemble was the same in both cases and the sampling ratewas 441 kHz (a) The probability distribution of the sound pressure amplitudein units of standard deviation for both ensembles is strongly nongaussian (b)The power spectra for the two ensembles (c) The relevant vector of the modelneuron of dimensionality D D 500 (d) The STA is broadened in both cases butdiffers between the two cases due to differences in the power spectra of the twoensembles (e) Vectors that maximize information for either of the ensemblesoverlap almost perfectly with each other and with the model lter which is alsoreplotted here from c (f) The probability of a spike Pspikejs cent Ovmax (crosses) iscompared to Pspikejs1 used in generating spikes (solid line) The input-outputfunction had parameter values frac34 frac14 09frac34 s1 and micro frac14 18frac34 s1

5 Summary

Features of the stimulus that are most relevant for generating the responseof a neuron can be found by maximizing information between the sequenceof responses and the projection of stimuli on trial vectors within the stim-ulus space Calculated in this manner information becomes a function of

242 T Sharpee N Rust and W Bialek

direction in stimulus space Those vectors that maximize the informationand account for the total information per response of interest span the rel-evant subspace The method allows multiple dimensions to be found Thereconstruction of the relevant subspace is done without assuming a partic-ular form of the input-output function It can be strongly nonlinear withinthe relevant subspace and is estimated from experimental histograms foreach trial direction independently Most important this method can be usedwith any stimulus ensemble even those that are strongly nongaussian asin the case of natural signals We have illustrated the method on modelneurons responding to natural scenes and sounds We expect the currentimplementation of the method to be most useful when several most infor-mative vectors (middot 10 depending on their dimensionality) are to be analyzedfor neurons probed by natural scenes This technique could be particularlyuseful in describing sensory processing in poorly understood regions ofhigher-level sensory cortex (such as visual areas V2 V4 and IT and audi-tory cortex beyond A1) where white noise stimulation is known to be lesseffective than naturalistic stimuli

Appendix A: Limitations of the Reverse Correlation Method

Here we examine what sort of deviations one can expect when applying the reverse correlation method to natural stimuli, even in the model with just one relevant dimension. There are two factors that, when combined, invalidate the reverse correlation method: the nongaussian character of correlations and the nonlinearity of the input-output function (Ringach, Sapiro, & Shapley, 1997). In its original formulation (de Boer & Kuyper, 1968), the neuron is probed by white noise, and the relevant dimension $\hat{e}_1$ is given by the STA, $\hat{e}_1 \propto \langle \mathbf{s}\, r(\mathbf{s})\rangle$. If the signals are not white, that is, the covariance matrix $C_{ij} = \langle s_i s_j\rangle$ is not a unit matrix, then the STA is a broadened version of the original filter $\hat{e}_1$. This can be seen by noting that for any function $F(\mathbf{s})$ of gaussian variables $\{s_i\}$, the following identity holds:

\[
\langle s_i F(\mathbf{s})\rangle = \langle s_i s_j\rangle\, \langle \partial_j F(\mathbf{s})\rangle , \qquad \partial_j \equiv \frac{\partial}{\partial s_j}. \tag{A.1}
\]

When property A.1 is applied to the vector components of the STA, $\langle s_i r(\mathbf{s})\rangle = C_{ij}\langle \partial_j r(\mathbf{s})\rangle$. Since we work within the assumption that the firing rate is a (nonlinear) function of the projection onto one filter $\hat{e}_1$, $r(\mathbf{s}) = r(s_1)$, the latter average is proportional to the model filter itself, $\langle \partial_j r\rangle = \hat{e}_{1j}\langle r'(s_1)\rangle$. Therefore we arrive at the prescription of the reverse correlation method:

\[
\hat{e}_{1i} \propto [C^{-1}]_{ij}\langle s_j r(\mathbf{s})\rangle. \tag{A.2}
\]

The gaussian property is necessary in order to represent the STA as a convolution of the covariance matrix $C_{ij}$ of the stimulus ensemble and the model


filter. To understand how the reconstructed vector obtained according to equation A.2 deviates from the relevant one, we consider weakly nongaussian stimuli with the probability distribution

\[
P_{nG}(\mathbf{s}) = \frac{1}{Z}\, P_0(\mathbf{s})\, e^{\epsilon H_1(\mathbf{s})}, \tag{A.3}
\]

where $P_0(\mathbf{s})$ is the gaussian probability distribution with covariance matrix $C$, and the normalization factor is $Z = \langle e^{\epsilon H_1(\mathbf{s})}\rangle$. The function $H_1$ describes deviations of the probability distribution from gaussian, and therefore we will set $\langle s_i H_1\rangle = 0$ and $\langle s_i s_j H_1\rangle = 0$, since these averages can be accounted for in the gaussian ensemble. In what follows, we will keep only the first-order terms in the perturbation parameter $\epsilon$. Using property A.1, we find the STA to be given by

\[
\langle s_i r\rangle_{nG} = \langle s_i s_j\rangle \left[ \langle \partial_j r\rangle + \epsilon \langle r\, \partial_j H_1\rangle \right], \tag{A.4}
\]

where averages are taken with respect to the gaussian distribution. Similarly, the covariance matrix $C_{ij}$ evaluated with respect to the nongaussian ensemble is given by

\[
C_{ij} = \frac{1}{Z}\langle s_i s_j e^{\epsilon H_1}\rangle = \langle s_i s_j\rangle + \epsilon \langle s_i s_k\rangle \langle s_j \partial_k H_1\rangle, \tag{A.5}
\]

so that, to first order in $\epsilon$, $\langle s_i s_j\rangle = C_{ij} - \epsilon C_{ik}\langle s_j \partial_k H_1\rangle$. Combining this with equation A.4, we get

\[
\langle s_i r\rangle_{nG} = \mathrm{const} \times \left[ C_{ij}\hat{e}_{1j} + \epsilon\, C_{ij}\langle (r - s_1\langle r'\rangle)\, \partial_j H_1\rangle \right]. \tag{A.6}
\]

The second term in equation A.6 prevents the application of the reverse correlation method to nongaussian signals. Indeed, if we multiply the STA, equation A.6, by the inverse of the a priori covariance matrix $C_{ij}$ according to the reverse correlation method, equation A.2, we no longer obtain the RF $\hat{e}_1$. The deviation of the obtained answer from the true RF increases with $\epsilon$, which measures the deviation of the probability distribution from gaussian. Since natural stimuli are known to be strongly nongaussian, this makes the use of reverse correlation problematic when analyzing neural responses to natural stimuli.

The difference in applying the reverse correlation to stimuli drawn from a correlated gaussian ensemble versus a nongaussian one is illustrated in Figures 8b and 8c. In the first case, shown in Figure 8b, stimuli are drawn from a correlated gaussian ensemble with the covariance matrix equal to that of natural images. In the second case, shown in Figure 8c, the patches of photos are taken as stimuli. The STA is broadened in both cases. Although the two-point correlations are just as strong in the case of gaussian stimuli


Figure 8: The nongaussian character of correlations present in natural scenes invalidates the reverse correlation method for neurons with a nonlinear input-output function. (a) A model visual neuron has one relevant dimension $\hat{e}_1$ and a nonlinear input-output function. The "exact" STA is used (see equation 4.2) to separate effects of neural noise from alterations introduced by the method. The decorrelated "exact" STA is obtained by multiplying the "exact" STA by the inverse of the covariance matrix, according to equation A.2. (b) Stimuli are taken from a correlated gaussian noise ensemble. The effect of correlations in the STA can be removed according to equation A.2. (c) When patches of photos are taken as stimuli for the same model neuron as in b, the decorrelation procedure gives an altered version of the model filter. The two stimulus ensembles have the same covariance matrix.

as they are in the natural stimuli ensemble, gaussian correlations can be successfully removed from the STA according to equation A.2 to obtain the model filter. On the contrary, an attempt to use reverse correlation with natural stimuli results in an altered version of the model filter. We reiterate that for this example, the apparent noise in the decorrelated vector is not


due to neural noise or finite data sets, since the "exact" STA has been used (see equation 4.2) in all calculations presented in Figures 8 and 9.

The reverse correlation method gives the correct answer for any distribution of signals if the probability of generating a spike is a linear function of $s_i$, since then the second term in equation A.6 is zero. In particular, a linear input-output relation could arise due to neural noise whose variance is much larger than the variance of the signal itself. This point is illustrated in Figures 9a, 9b, and 9c, where the reverse correlation method is applied to a threshold input-output function at low, moderate, and high signal-to-noise ratios. For small signal-to-noise ratios, where the noise standard deviation is similar to that of the projections $s_1$, the threshold nonlinearity in the input-output function is masked by noise and is effectively linear. In this limit, the reverse correlation can be applied with the exact STA. However, for an experimentally calculated STA at low signal-to-noise ratios, the decorrelation procedure results in strong noise amplification. At higher signal-to-noise ratios, decorrelation fails due to the nonlinearity of the input-output function, in accordance with equation A.6.
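The effect described in this appendix is easy to reproduce numerically. Below is a toy sketch under stated assumptions (it is not the simulation used for Figures 8 and 9): a noiseless threshold neuron with a known filter is probed either by correlated gaussian stimuli or by a nongaussian ensemble with exactly the same covariance, built here from sparse sources passed through a random mixing matrix; the mixing matrix, sparseness level, and threshold are arbitrary choices. Decorrelating the STA with the inverse covariance matrix, equation A.2, recovers the filter in the gaussian case but not in the nongaussian one.

    import numpy as np

    rng = np.random.default_rng(1)
    D, N = 40, 100_000
    A = rng.standard_normal((D, D)) / np.sqrt(D)         # mixing matrix; stimulus covariance is A A^T
    e1 = np.zeros(D)
    e1[D // 2 - 2: D // 2 + 2] = [0.5, 1.0, -1.0, -0.5]  # model filter e_1
    e1 /= np.linalg.norm(e1)

    def decorrelated_sta(s, e1, theta=1.5):
        """STA of a noiseless threshold neuron, decorrelated per equation A.2."""
        proj = s @ e1
        r = (proj > theta * proj.std()).astype(float)    # spike whenever s . e_1 crosses threshold
        sta = s.T @ r / r.sum()
        C = np.cov(s, rowvar=False)
        v = np.linalg.solve(C, sta)                      # C^{-1} <s r(s)>
        return v / np.linalg.norm(v)

    s_gauss = rng.standard_normal((N, D)) @ A.T          # correlated gaussian ensemble
    p = 0.05                                             # sparse, strongly nongaussian sources
    z = rng.standard_normal((N, D)) * (rng.random((N, D)) < p) / np.sqrt(p)
    s_nongauss = z @ A.T                                 # same covariance A A^T, nongaussian statistics

    print("gaussian ensemble:    e1 . v =", round(float(e1 @ decorrelated_sta(s_gauss, e1)), 3))
    print("nongaussian ensemble: e1 . v =", round(float(e1 @ decorrelated_sta(s_nongauss, e1)), 3))

With enough samples, the projection for the gaussian ensemble approaches 1 (up to sampling noise), while the sparse ensemble typically gives a visibly smaller overlap, mirroring the altered filter of Figure 8c; the deviation shrinks as the input-output relation becomes effectively linear, as in Figure 9a.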

Appendix B: Maxima of I(v): What Do They Mean?

The relevant subspace of dimensionality $K$ can be found by maximizing information simultaneously with respect to $K$ vectors. The result of maximization with respect to a number of vectors that is less than the true dimensionality of the relevant subspace may produce vectors that have components in the irrelevant subspace. This happens only in the presence of correlations in stimuli. As an illustration, we consider the situation where the dimensionality of the relevant subspace is $K = 2$ and vector $\hat{e}_1$ describes the most informative direction within the relevant subspace. We show here that although the gradient of information is perpendicular to both $\hat{e}_1$ and $\hat{e}_2$, it may have components outside the relevant subspace. Therefore, the vector $v_{\max}$ that corresponds to the maximum of $I(v)$ will lie outside the relevant subspace. We recall from equation 3.1 that

\[
\nabla I(\hat{e}_1) = \int ds_1\, P(s_1)\, \frac{d}{ds_1}\!\left[\frac{P(s_1|\mathrm{spike})}{P(s_1)}\right] \bigl( \langle \mathbf{s}|s_1, \mathrm{spike}\rangle - \langle \mathbf{s}|s_1\rangle \bigr). \tag{B.1}
\]

We can rewrite the conditional averages $\langle \mathbf{s}|s_1\rangle = \int ds_2\, P(s_1, s_2)\langle \mathbf{s}|s_1, s_2\rangle / P(s_1)$ and $\langle \mathbf{s}|s_1, \mathrm{spike}\rangle = \int ds_2\, f(s_1, s_2) P(s_1, s_2)\langle \mathbf{s}|s_1, s_2\rangle / P(s_1|\mathrm{spike})$, so that

\[
\nabla I(\hat{e}_1) = \int ds_1\, ds_2\, P(s_1, s_2)\, \langle \mathbf{s}|s_1, s_2\rangle\, \frac{P(\mathrm{spike}|s_1, s_2) - P(\mathrm{spike}|s_1)}{P(\mathrm{spike})} \times \frac{d}{ds_1} \ln\!\left[\frac{P(s_1|\mathrm{spike})}{P(s_1)}\right]. \tag{B.2}
\]

Because we assume that the vector $\hat{e}_1$ is the most informative within the relevant subspace, $\hat{e}_1 \cdot \nabla I = \hat{e}_2 \cdot \nabla I = 0$, so that the integral in equation B.2


Figure 9: Application of the reverse correlation method to a model visual neuron with one relevant dimension $\hat{e}_1$ and a threshold input-output function, for decreasing values of the noise variance, $\sigma/\sigma(s_1) \approx 6.1,\ 0.61,\ 0.06$ in a, b, and c, respectively. The model $P(\mathrm{spike}|s_1)$ becomes effectively linear when the signal-to-noise ratio is small. The reverse correlation can be used together with natural stimuli if the input-output function is linear. Otherwise, the deviations between the decorrelated STA and the model filter increase with the nonlinearity of $P(\mathrm{spike}|s_1)$.

is zero for those directions in which the component of the vector $\langle \mathbf{s}|s_1, s_2\rangle$ changes linearly with $s_1$ and $s_2$. For uncorrelated stimuli, this is true for all directions, so that the most informative vector within the relevant subspace is also the most informative in the overall stimulus space. In the presence of correlations, the gradient may have nonzero components along some irrelevant directions if the projection of the vector $\langle \mathbf{s}|s_1, s_2\rangle$ on those directions is not a linear function of $s_1$ and $s_2$. By looking for a maximum of information, we will therefore be driven outside the relevant subspace. The deviation of $v_{\max}$ from the relevant subspace is also proportional to the


strength of the dependence on the second parameter $s_2$, because of the factor $[P(s_1, s_2|\mathrm{spike})/P(s_1, s_2) - P(s_1|\mathrm{spike})/P(s_1)]$ in the integrand.

Appendix C: The Gradient of Information

According to expression 2.5, the information $I(v)$ depends on the vector $v$ only through the probability distributions $P_v(x)$ and $P_v(x|\mathrm{spike})$. Therefore, we can express the gradient of information in terms of gradients of those probability distributions:

\[
\nabla_v I = \frac{1}{\ln 2} \int dx \left[ \ln\!\left(\frac{P_v(x|\mathrm{spike})}{P_v(x)}\right) \nabla_v P_v(x|\mathrm{spike}) - \frac{P_v(x|\mathrm{spike})}{P_v(x)}\, \nabla_v P_v(x) \right], \tag{C.1}
\]

where we took into account that $\int dx\, P_v(x|\mathrm{spike}) = 1$ and does not change with $v$. To find the gradients of the probability distributions, we note that

\[
\nabla_v P_v(x) = \nabla_v \left[ \int d\mathbf{s}\, P(\mathbf{s})\, \delta(x - \mathbf{s}\cdot v) \right] = -\int d\mathbf{s}\, P(\mathbf{s})\, \mathbf{s}\, \delta'(x - \mathbf{s}\cdot v) = -\frac{d}{dx}\left[ P_v(x)\, \langle \mathbf{s}|x\rangle \right], \tag{C.2}
\]

and analogously for $P_v(x|\mathrm{spike})$:

\[
\nabla_v P_v(x|\mathrm{spike}) = -\frac{d}{dx}\left[ P_v(x|\mathrm{spike})\, \langle \mathbf{s}|x, \mathrm{spike}\rangle \right]. \tag{C.3}
\]

Substituting expressions C.2 and C.3 into equation C.1 and integrating once by parts, we obtain

\[
\nabla_v I = \int dx\, P_v(x) \left[ \langle \mathbf{s}|x, \mathrm{spike}\rangle - \langle \mathbf{s}|x\rangle \right] \cdot \frac{d}{dx}\!\left[ \frac{P_v(x|\mathrm{spike})}{P_v(x)} \right],
\]

which is expression 3.1 of the main text.
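For completeness, here is a sketch (with hypothetical argument names, not the authors' code) of how this gradient can be evaluated from data: the projections are binned, the bin-wise conditional means $\langle \mathbf{s}|x\rangle$ and $\langle \mathbf{s}|x,\mathrm{spike}\rangle$ are accumulated, and the derivative of the ratio $P_v(x|\mathrm{spike})/P_v(x)$ is taken by finite differences.

    import numpy as np

    def information_gradient(v, stimuli, spike_counts, n_bins=25):
        """Estimate the gradient of I(v) from binned data (equations C.1 and 3.1)."""
        x = stimuli @ v / np.linalg.norm(v)
        edges = np.linspace(x.min(), x.max(), n_bins + 1)
        centers = 0.5 * (edges[:-1] + edges[1:])
        idx = np.clip(np.digitize(x, edges) - 1, 0, n_bins - 1)

        n_all = np.bincount(idx, minlength=n_bins).astype(float)
        n_spk = np.bincount(idx, weights=spike_counts, minlength=n_bins)
        P_x = n_all / n_all.sum()                        # ~ P_v(x)
        P_x_spk = n_spk / n_spk.sum()                    # ~ P_v(x|spike)
        ratio = np.divide(P_x_spk, P_x, out=np.zeros(n_bins), where=P_x > 0)

        D = stimuli.shape[1]
        mean_all = np.zeros((n_bins, D))                 # <s|x>
        mean_spk = np.zeros((n_bins, D))                 # <s|x, spike>
        for b in range(n_bins):
            in_bin = idx == b
            if n_all[b] > 0:
                mean_all[b] = stimuli[in_bin].mean(axis=0)
            if n_spk[b] > 0:
                w = spike_counts[in_bin]
                mean_spk[b] = (w[:, None] * stimuli[in_bin]).sum(axis=0) / w.sum()

        d_ratio = np.gradient(ratio, centers)            # d/dx [P_v(x|spike)/P_v(x)]
        grad = (P_x[:, None] * (mean_spk - mean_all) * d_ratio[:, None]).sum(axis=0)
        return grad - (grad @ v) * v / (v @ v)           # enforce v . grad = 0 (|v| carries no information)

Such an estimate can drive the line maximizations of the annealing scheme described in section 3; as noted there, information does not change with the length of $v$, which is why the component of the numerical gradient along $v$ is projected out in the last line.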

Acknowledgments

We thank K. D. Miller for many helpful discussions. Work at UCSF was supported in part by the Sloan and Swartz foundations and by a training grant from the NIH. Our collaboration began at the Marine Biological Laboratory in a course supported by grants from NIMH and the Howard Hughes Medical Institute.


References

Aguera y Arcas, B., Fairhall, A. L., & Bialek, W. (2003). Computation in a single neuron: Hodgkin and Huxley revisited. Neural Comp., 15, 1715–1749.

Baddeley, R., Abbott, L. F., Booth, M. C. A., Sengpiel, F., Freeman, T., Wakeman, E. A., & Rolls, E. T. (1997). Responses of neurons in primary and inferior temporal visual cortices to natural scenes. Proc. R. Soc. Lond. B, 264, 1775–1783.

Barlow, H. (1961). Possible principles underlying the transformations of sensory images. In W. Rosenblith (Ed.), Sensory communication (pp. 217–234). Cambridge, MA: MIT Press.

Barlow, H. (2001). Redundancy reduction revisited. Network: Comput. Neural Syst., 12, 241–253.

Bialek, W. (2002). Thinking about the brain. In H. Flyvbjerg, F. Julicher, P. Ormos, & F. David (Eds.), Physics of biomolecules and cells (pp. 485–577). Berlin: Springer-Verlag. See also physics/0205030.

Bialek, W., & de Ruyter van Steveninck, R. R. (2003). Features and dimensions: Motion estimation in fly vision. Unpublished manuscript.

Brenner, N., Bialek, W., & de Ruyter van Steveninck, R. R. (2000). Adaptive rescaling maximizes information transmission. Neuron, 26, 695–702.

Brenner, N., Strong, S. P., Koberle, R., Bialek, W., & de Ruyter van Steveninck, R. R. (2000). Synergy in a neural code. Neural Computation, 12, 1531–1552. See also physics/9902067.

Chichilnisky, E. J. (2001). A simple white noise analysis of neuronal light responses. Network: Comput. Neural Syst., 12, 199–213.

Creutzfeldt, O. D., & Northdurft, H. C. (1978). Representation of complex visual stimuli in the brain. Naturwissenschaften, 65, 307–318.

Cover, T. M., & Thomas, J. A. (1991). Information theory. New York: Wiley.

de Boer, E., & Kuyper, P. (1968). Triggered correlation. IEEE Trans. Biomed. Eng., 15, 169–179.

de Ruyter van Steveninck, R. R., & Bialek, W. (1988). Real-time performance of a movement-sensitive neuron in the blowfly visual system: Coding and information transfer in short spike sequences. Proc. R. Soc. Lond. B, 265, 259–265.

de Ruyter van Steveninck, R. R., Borst, A., & Bialek, W. (2000). Real time encoding of motion: Answerable questions and questionable answers from the fly's visual system. In J. M. Zanker & J. Zeil (Eds.), Motion vision: Computational, neural and ecological constraints (pp. 279–306). New York: Springer-Verlag. See also physics/0004060.

de Ruyter van Steveninck, R. R., Lewen, G. D., Strong, S. P., Koberle, R., & Bialek, W. (1997). Reproducibility and variability in neural spike trains. Science, 275, 1805–1808. See also cond-mat/9603127.

6. Where available, we give references to the physics e-print archive, which may be found on-line at http://arxiv.org/abs/; thus, Bialek (2002) is available on-line at http://arxiv.org/abs/physics/0205030. Published papers may differ from the versions posted to the archive.


Dimitrov, A. G., & Miller, J. P. (2001). Neural coding and decoding: Communication channels and quantization. Network: Comput. Neural Syst., 12, 441–472.

Dong, D. W., & Atick, J. J. (1995). Statistics of natural time-varying images. Network: Comput. Neural Syst., 6, 345–358.

Fairhall, A. L., Lewen, G. D., Bialek, W., & de Ruyter van Steveninck, R. R. (2001). Efficiency and ambiguity in an adaptive neural code. Nature, 412, 787–792.

Kara, P., Reinagel, P., & Reid, R. C. (2000). Low response variability in simultaneously recorded retinal, thalamic, and cortical neurons. Neuron, 27, 635–646.

Lewen, G. D., Bialek, W., & de Ruyter van Steveninck, R. R. (2001). Neural coding of naturalistic motion stimuli. Network: Comput. Neural Syst., 12, 317–329. See also physics/0103088.

Mainen, Z. F., & Sejnowski, T. J. (1995). Reliability of spike timing in neocortical neurons. Science, 268, 1503–1506.

Paninski, L. (2003a). Convergence properties of three spike-triggered analysis techniques. Network: Comput. Neural Syst., 14, 437–464.

Paninski, L. (2003b). Estimation of entropy and mutual information. Neural Computation, 15, 1191–1253.

Panzeri, S., & Treves, A. (1996). Analytical estimates of limited sampling biases in different information measures. Network: Comput. Neural Syst., 7, 87–107.

Pola, G., Schultz, S. R., Petersen, R., & Panzeri, S. (2002). A practical guide to information analysis of spike trains. In R. Kotter (Ed.), Neuroscience databases: A practical guide (pp. 137–152). Norwell, MA: Kluwer.

Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P. (1992). Numerical recipes in C: The art of scientific computing. Cambridge: Cambridge University Press.

Reinagel, P., & Reid, R. C. (2000). Temporal coding of visual information in the thalamus. J. Neurosci., 20, 5392–5400.

Rieke, F., Bodnar, D. A., & Bialek, W. (1995). Naturalistic stimuli increase the rate and efficiency of information transmission by primary auditory afferents. Proc. R. Soc. Lond. B, 262, 259–265.

Rieke, F., Warland, D., de Ruyter van Steveninck, R. R., & Bialek, W. (1997). Spikes: Exploring the neural code. Cambridge, MA: MIT Press.

Ringach, D. L., Hawken, M. J., & Shapley, R. (2002). Receptive field structure of neurons in monkey visual cortex revealed by stimulation with natural image sequences. Journal of Vision, 2, 12–24.

Ringach, D. L., Sapiro, G., & Shapley, R. (1997). A subspace reverse-correlation technique for the study of visual neurons. Vision Res., 37, 2455–2464.

Rolls, E. T., Aggelopoulos, N. C., & Zheng, F. (2003). The receptive fields of inferior temporal cortex neurons in natural scenes. J. Neurosci., 23, 339–348.

Ruderman, D. L. (1994). The statistics of natural images. Network: Comput. Neural Syst., 5, 517–548.

Ruderman, D. L., & Bialek, W. (1994). Statistics of natural images: Scaling in the woods. Phys. Rev. Lett., 73, 814–817.

Schwartz, O., Chichilnisky, E. J., & Simoncelli, E. (2002). Characterizing neural gain control using spike-triggered covariance. In T. G. Dietterich, S. Becker, & Z. Ghahramani (Eds.), Advances in neural information processing (pp. 269–276). Cambridge, MA: MIT Press.


Sen, K., Theunissen, F. E., & Doupe, A. J. (2001). Feature analysis of natural sounds in the songbird auditory forebrain. J. Neurophysiol., 86, 1445–1458.

Simoncelli, E., & Olshausen, B. A. (2001). Natural image statistics and neural representation. Annu. Rev. Neurosci., 24, 1193–1216.

Smirnakis, S. M., Berry, M. J., Warland, D. K., Bialek, W., & Meister, M. (1996). Adaptation of retinal processing to image contrast and spatial scale. Nature, 386, 69–73.

Smyth, D., Willmore, B., Baker, G. E., Thompson, I. D., & Tolhurst, D. J. (2003). The receptive-field organization of simple cells in primary visual cortex of ferrets under natural scene stimulation. J. Neurosci., 23, 4746–4759.

Stanley, G. B., Li, F. F., & Dan, Y. (1999). Reconstruction of natural scenes from ensemble responses in the lateral geniculate nucleus. J. Neurosci., 19, 8036–8042.

Strong, S. P., Koberle, R., de Ruyter van Steveninck, R. R., & Bialek, W. (1998). Entropy and information in neural spike trains. Phys. Rev. Lett., 80, 197–200.

Theunissen, F. E., Sen, K., & Doupe, A. J. (2000). Spectral-temporal receptive fields of nonlinear auditory neurons obtained using natural sounds. J. Neurosci., 20, 2315–2331.

Tishby, N., Pereira, F. C., & Bialek, W. (1999). The information bottleneck method. In B. Hajek & R. S. Sreenivas (Eds.), Proceedings of the 37th Allerton Conference on Communication, Control and Computing (pp. 368–377). Urbana: University of Illinois. See also physics/0004057.

Touryan, J., Lau, B., & Dan, Y. (2002). Isolation of relevant visual features from random stimuli for cortical complex cells. J. Neurosci., 22, 10811–10818.

Treves, A., & Panzeri, S. (1995). The upward bias in measures of information derived from limited data samples. Neural Comp., 7, 399–407.

von der Twer, T., & Macleod, D. I. A. (2001). Optimal nonlinear codes for the perception of natural colours. Network: Comput. Neural Syst., 12, 395–407.

Vickers, N. J., Christensen, T. A., Baker, T., & Hildebrand, J. G. (2001). Odour-plume dynamics influence the brain's olfactory code. Nature, 410, 466–470.

Vinje, W. E., & Gallant, J. L. (2000). Sparse coding and decorrelation in primary visual cortex during natural vision. Science, 287, 1273–1276.

Vinje, W. E., & Gallant, J. L. (2002). Natural stimulation of the nonclassical receptive field increases information transmission efficiency in V1. J. Neurosci., 22, 2904–2915.

Voss, R. F., & Clarke, J. (1975). "1/f noise" in music and speech. Nature, 317–318.

Weliky, M., Fiser, J., Hunt, R., & Wagner, D. N. (2003). Coding of natural scenes in primary visual cortex. Neuron, 37, 703–718.

Received January 21, 2003; accepted August 6, 2003.

Page 5: Analyzing Neural Responses to Natural Signals: Maximally ...wbialek/rome/refs/sharpee+al_04.pdfAnalyzing Neural Responses to Natural Signals 225 Figure 1: Schematic illustration of

Analyzing Neural Responses to Natural Signals 227

itly that patterns of action potentials in identied motion-sensitive neuronsare correlated with low-dimensional projections of the high-dimensional vi-sual input (de Ruyter van Steveninck amp Bialek 1988 Brenner Bialek et al2000 Bialek amp de Ruyter van Steveninck 2003) The input-output functionf in equation 11 can be strongly nonlinear but it is presumed to dependon only a small number of projections This assumption appears to be lessstringent than that of approximate linearity which one makes when char-acterizing a neuronrsquos response in terms of Wiener kernels (see eg thediscussion in section 213 of Rieke et al 1997) The most difcult part in re-constructing the input-output function is to nd the RS Note that for K gt 1a description in terms of any linear combination of vectors fOe1 Oe2 OeKgis just as valid since we did not make any assumptions as to a particularform of the nonlinear function f

Once the relevant subspace is known the probability Pspikejs becomesa function of only a few parameters and it becomes feasible to map thisfunction experimentally inverting the probability distributions accordingto Bayesrsquo rule

f s1 s2 sK D Ps1 s2 sKjspike

Ps1 s2 sK (12)

If stimuli are chosen from a correlated gaussian noise ensemble then theneural response can be characterized by the spike-triggered covariancemethod (de Ruyter van Steveninck amp Bialek 1988 Brenner Bialek et al2000 Schwartz Chichilnisky amp Simoncelli2002 Touryan Lau amp Dan 2002Bialek amp de Ruyter van Steveninck 2003) It can be shown that the dimen-sionality of the RS is equal to the number of nonzero eigenvalues of a matrixgiven by a difference between covariance matrices of all presented stimuliand stimuli conditional on a spike Moreover the RS is spanned by theeigenvectors associated with the nonzero eigenvalues multiplied by the in-verse of the a priori covariance matrix Compared to the reverse correlationmethod we are no longer limited to nding only one of the relevant dimen-sions fOeig 1 middot i middot K Both the reverse correlation and the spikendashtriggeredcovariance method however give rigorously interpretable results only forgaussian distributions of inputs

In this article we investigate whether it is possible to lift the requirementfor stimuli to be gaussian When using natural stimuli which certainly arenongaussian the RS cannot be found by the spike-triggered covariancemethod Similarly the reverse correlation method does not give the correctRF even in the simplest case where the input-output function in equation 11depends on only one projection (see appendix A for a discussion of thispoint) However vectors that span the RS are clearly special directions inthe stimulus space independent of assumptions about Ps This notion canbe quantied by the Shannon information We note that the stimuli s do nothave to lie on a low-dimensional manifold within the overall D-dimensional

228 T Sharpee N Rust and W Bialek

space4 However since we assume that the neuronrsquos input-output functiondepends on a small number of relevant dimensions the ensemble of stimuliconditional on a spike may exhibit clear clustering This makes the proposedmethod of looking for the RS complementary to the clustering of stimuliconditional on a spike done in the information bottleneck method (TishbyPereira amp Bialek 1999 see also Dimitrov amp Miller 2001) Noninformation-based measures of similarity between probability distributions Ps andPsjspike have also been proposed to nd the RS (Paninski 2003a)

To summarize our assumptions

sup2 The sampling of the probability distribution of stimuli Ps is ergodicand stationary across repetitions The probability distribution is notassumed to be gaussian The ensemble of stimuli described by Ps

does not have to lie on a low-dimensional manifold embedded in theoverall D-dimensional space

sup2 We choose a single spike as the response of interest (for illustrationpurposes only) An identical scheme can be applied for example toparticular interspike intervals or to synchronous spikes from a pair ofneurons

sup2 The subspace relevant for generating a spike is low dimensional andEuclidean (cf equation 11)

sup2 The input-output function which is dened within the low-dimen-sional RS can be arbitrarily nonlinear It is obtained experimentally bysampling the probability distributions Ps and Psjspike within theRS

The article is organized as follows In section 2 we discuss how an opti-mization problem can be formulated to nd the RS A particular algorithmused to implement the optimization scheme is described in section 3 In sec-tion 4 we illustrate how the optimization scheme works with natural stimulifor model orientation-sensitive cells with one and two relevant dimensionsmuch like simple and complex cells found in primary visual cortex as wellas for a model auditory neuron responding to natural sounds We also dis-cuss the convergence of our estimates of the RS as a function of data set sizeWe emphasize that our optimization scheme does not rely on any specicstatistical properties of the stimulus ensemble and thus can be used withnatural stimuli

4 If one suspects that neurons are sensitive to low-dimensional features of their inputone might be tempted to analyze neural responses to stimuli that explore only the (puta-tive) relevant subspace perhaps along the line of the subspace reverse correlation method(Ringach et al 1997) Our approach (like the spike-triggered covariance approach) is dif-ferent because it allows the analysis of responses to stimuli that live in the full space andinstead we let the neuron ldquotell usrdquo which low-dimensional subspace is relevant

Analyzing Neural Responses to Natural Signals 229

2 Information as an Objective Function

When analyzing neural responses we compare the a priori probability dis-tribution of all presented stimuli with the probability distribution of stimulithat lead to a spike (de Ruyter van Steveninck amp Bialek 1988) For gaus-sian signals the probability distribution can be characterized by its secondmoment the covariance matrix However an ensemble of natural stimuli isnot gaussian so that in a general case neither second nor any nite num-ber of moments is sufcient to describe the probability distribution In thissituation Shannon information provides a rigorous way of comparing twoprobability distributions The average information carried by the arrivaltime of one spike is given by Brenner Strong et al (2000)

Ispike DZ

dsPsjspike log2

microPsjspike

Ps

para (21)

where ds denotes integration over full Dndashdimensional stimulus space Theinformation per spike as written in equation 21 is difcult to estimate exper-imentally since it requires either sampling of the high-dimensional proba-bility distribution Psjspike or a model of how spikes were generated thatis the knowledge of low-dimensional RS However it is possible to calculateIspike in a model-independent way if stimuli are presented multiple timesto estimate the probability distribution Pspikejs Then

Ispike Dfrac12

Pspikejs

Pspikelog2

microPspikejs

Pspike

parafrac34

s (22)

where the average is taken over all presented stimuli This can be useful inpractice (Brenner Strong et al 2000) because we can replace the ensembleaverage his with a time average and Pspikejs with the time-dependentspike rate rt Note that for a nite data set of N repetitions the obtainedvalue IspikeN will be on average larger than Ispike1 The true value Ispikecan be found by either subtracting an expected bias value which is of theorder of raquo 1=PspikeN 2 ln 2 (Treves amp Panzeri 1995 Panzeri amp Treves1996 Pola Schultz Petersen amp Panzeri 2002 Paninski 2003b) or extrapo-lating to N 1 (Brenner Strong et al 2000 Strong Koberle de Ruytervan Steveninck amp Bialek 1998) Measurement of Ispike in this way providesa model independent benchmark against which we can compare any de-scription of the neuronrsquos input-output relation

Our assumption is that spikes are generated according to a projectiononto a low-dimensional subspace Therefore to characterize relevance of aparticular direction v in the stimulus space we project all of the presentedstimuli onto v and form probability distributions Pvx and Pvxjspike ofprojection values x for the a priori stimulus ensemble and that conditional

230 T Sharpee N Rust and W Bialek

on a spike respectively

Pvx D hplusmnx iexcl s cent vis (23)

Pvxjspike D hplusmnx iexcl s cent vjspikeis (24)

where plusmnx is a delta function In practice both the average hcent cent centis acuteR

ds cent cent centPs over the a priori stimulus ensemble and the average hcent cent cent jspikeisacute

Rds cent cent cent Psjspike over the ensemble conditional on a spike are calculated

by binning the range of projection values x The probability distributionsare then obtained as histograms normalized in a such a way that the sumover all bins gives 1 The mutual information between spike arrival timesand the projection x by analogy with equation 21 is

Iv DZ

dxPvxjspike log2

microPvxjspike

Pvx

para (25)

which is also the Kullback-Leibler divergence D[PvxjspikejjPvx] Noticethat this information is a function of the direction v The information Iv

provides an invariant measure of how much the occurrence of a spike isdetermined by projection on the direction v It is a function only of directionin the stimulus space and does not change when vector v is multiplied bya constant This can be seen by noting that for any probability distributionand any constant c Pcvx D ciexcl1Pvx=c (see also theorem 964 of Cover ampThomas 1991) When evaluated along any vector v Iv middot Ispike The totalinformation Ispike can be recovered along one particular direction only ifv D Oe1 and only if the RS is one-dimensional

By analogy with equation 25 one could also calculate information Iv1

vn along a set of several directions fv1 vng based on the multipointprobability distributions of projection values x1 x2 xn along vectors v1v2 vn of interest

Pv1vn fxigjspike D

nY

iD1

plusmnxi iexcl s cent vijspike

+

s

(26)

Pv1vn fxig D

nY

iD1

plusmnxi iexcl s cent vi

+

s

(27)

If we are successful in nding all of the directions fOeig 1 middot i middot K con-tributing to the input-output relation equation 11 then the informationevaluated in this subspace will be equal to the total information Ispike Whenwe calculate information along a set of K vectors that are slightly off fromthe RS the answer of course is smaller than Ispike and is initially quadraticin small deviations plusmnvi One can therefore hope to nd the RS by maximizinginformation with respect to K vectors simultaneously The information does

Analyzing Neural Responses to Natural Signals 231

not increase if more vectors outside the RS are included For uncorrelatedstimuli any vector or a set of vectors that maximizes Iv belongs to theRS On the other hand as discussed in appendix B the result of optimiza-tion with respect to a number of vectors k lt K may deviate from the RS ifstimuli are correlated To nd the RS we rst maximize Iv and comparethis maximum with Ispike which is estimated according to equation 22 Ifthe difference exceeds that expected from nite sampling corrections weincrement the number of directions with respect to which information issimultaneously maximized

3 Optimization Algorithm

In this section we describe a particular algorithm we used to look for themost informative dimensions in order to nd the relevant subspace Wemake no claim that our choice of the algorithm is most efcient However itdoes give reproducible results for different starting points and spike trainswith differences taken to simulate neural noise Overall choices for an algo-rithm are broader because the information Iv as dened by equation 25is a continuous function whose gradient can be computed We nd (seeappendix C for a derivation)

rvI DZ

dxPvxpoundhsjx spikei iexcl hsjxi

currencentmicro

ddx

Pvxjspike

Pvx

para (31)

where

hsjx spikei D1

Pxjspike

Zds splusmnx iexcl s cent vPsjspike (32)

and similarly for hsjxi Since information does not change with the length ofthe vector we have v centrvI D 0 as also can be seen directly from equation 31

As an optimization algorithm we have used a combination of gradientascent and simulated annealing algorithms successive line maximizationswere done along the direction of the gradient (Press Teukolsky Vetterlingamp Flannery 1992) During line maximizations a point with a smaller valueof information was accepted according to Boltzmann statistics with proba-bility exp[IviC1 iexcl Ivi=T] The effective temperature T is reduced byfactor of 1 iexcl sup2T upon completion of each line maximization Parameters ofthe simulated annealing algorithm to be adjusted are the starting tempera-ture T0 and the cooling rate sup2T 1T D iexclsup2TT When maximizing with respectto one vector we used values T0 D 1 and sup2T D 005 When maximizing withrespect to two vectors we either used the cooling schedule with sup2T D 0005and repeated it several times (four times in our case) or allowed the effec-tive temperature T to increase by a factor of 10 upon convergence to a localmaximum (keeping T middot T0 always) while limiting the total number of linemaximizations

232 T Sharpee N Rust and W Bialek

10shy 4

10shy 2

1000

200

400

600

0 05 10

1

2

3w h ite no ise stim u li

IIsp ike

IIspike

P(II sp

ike )

natu ralis tic s tim u li

(a )

(b )

(c)

Figure 2 The probability distribution of information values in units of the totalinformation per spike in the case of (a) uncorrelated binary noise stimuli (b)correlated gaussian noise with power spectrum of natural scenes and (c) stimuliderived from natural scenes (patches of photos) The distribution was obtainedby calculating information along 105 random vectors for a model cell with onerelevant dimension Note the different scales in the two panels

The problem of maximizing a function often is related to the problemof making a good initial guess It turns out however that the choice of astarting point is much less crucial in cases where the stimuli are correlatedTo illustrate this point we plot in Figure 2 the probability distribution ofinformation along random directions v for both white noise and naturalisticstimuli in a model with one relevant dimension For uncorrelated stimulinot only is information equal to zero for a vector that is perpendicularto the relevant subspace but in addition the derivative is equal to zeroSince a randomly chosen vector has on average a small projection on therelevant subspace (compared to its length) vr=jvj raquo

pn=d the corresponding

information can be found by expanding in vr=jvj

I frac14 v2r

2jvj2

ZdxP Oeir

x

AacuteP0

Oeirx

POeirx

2

[hsOerjspikei iexcl hsOeri]2 (33)

where vector v D vr Oer C vir Oeir is decomposed in its components inside andoutside the RS respectively The average information for a random vectoris therefore raquo hv2

r i=jvj2 D K=DIn cases where stimuli are drawn from a gaussian ensemble with cor-

relations an expression for the information values has a similar structureto equation 33 To see this we transform to Fourier space and normalizeeach Fourier component by the square root of the power spectrum Sk

Analyzing Neural Responses to Natural Signals 233

In this new basis both the vectors feig 1 middot i middot K forming the RS and therandomly chosen vector v along which information is being evaluated areto be multiplied by

pSk Thus if we now substitute for the dot product

v2r the convolution weighted by the power spectrum

PKi v curren Oei

2 where

v curren Oei DP

k vkOeikSkpPk v2kSk

pPk Oe2

i kSk (34)

then equation 33 will describe information values along randomly chosenvectors v for correlated gaussian stimuli with the power spectrum SkAlthough both vr and vk are gaussian variables with variance raquo 1=Dthe weighted convolution has not only a much larger variance but is alsostrongly nongaussian (the nongaussian character is due to the normalizingfactor

Pk v2kSk in the denominator of equation 34) As for the variance

it can be estimated as lt v curren Oe12 gtD 4frac14= ln2 D in cases where stimuli aretaken as patches of correlated gaussian noise with the two-dimensionalpower spectrum Sk D A=k2 The large values of the weighted dot productv curren Oei 1 middot i middot K result not only in signicant information values along arandomly chosen vector but also in large magnitudes of the derivative rIwhich is no longer dominated by noise contrary to the case of uncorrelatedstimuli In the end we nd that randomly choosing one of the presentedframes as a starting guess is sufcient

4 Results

We tested the scheme of looking for the most informative dimensions onmodel neurons that respond to stimuli derived from natural scenes andsounds As visual stimuli we used scans across natural scenes which weretaken as black and white photos digitized to 8 bits with no correctionsmade for the camerarsquos light-intensity transformation function Some statis-tical properties of the stimulus set are shown in Figure 3 Qualitatively theyreproduce the known results on the statistics of natural scenes (Ruderman ampBialek 1994 Ruderman 1994 Dong amp Atick 1995 Simoncelli amp Olshausen2001) Most important properties for this study are strong spatial correla-tions as evident from the power spectrum S(k) plotted in Figure 3b anddeviations of the probability distribution from a gaussian one The nongaus-sian character can be seen in Figure 3c where the probability distributionof intensities is shown and in Figure 3d which shows the distribution ofprojections on a Gabor lter (in what follows the units of projections suchas s1 will be given in units of the corresponding standard deviations) Ourgoal is to demonstrate that although the correlations present in the ensem-ble are nongaussian they can be removed successfully from the estimate ofvectors dening the RS

234 T Sharpee N Rust and W Bialek

shy 2 0 210

shy 2

100

200 400 600

100

200

300

400

shy 4 shy 2 0 2 4 10

shy 5

100

105

10shy 1

100

10shy 5

100

105

P(s1 ) P(I)

S(k) s2(I)

(a) (b)

(c) (d)

(Ishy ltIgt ) s(I) (s1shy lts

1gt ) s(s

1 )

ka p

Figure 3 Statistical properties of the visual stimulus ensemble (a) One of thephotos Stimuli would be 30 pound 30 patches taken from the overall photograph(b) The power spectrum in units of light intensity variance frac34 2I averaged overorientation as a function of dimensionless wave vector ka where a is the pixelsize (c) The probability distribution of light intensity in units of frac34 I (d) Theprobability distribution of projections between stimuli and a Gabor lter alsoin units of the corresponding standard deviation frac34 s1

41 A Model Simple Cell Our rst example is based on the propertiesof simple cells found in the primary visual cortex A model phase- andorientation-sensitive cell has a single relevant dimension Oe1 shown in Fig-ure 4a A given stimulus s leads to a spike if the projection s1 D s cent Oe1 reachesa threshold value micro in the presence of noise

Pspikejs

Pspikeacute f s1 D hHs1 iexcl micro C raquo i (41)

where a gaussian random variable raquo of variance frac34 2 models additive noiseand the function Hx D 1 for x gt 0 and zero otherwise Together with theRF Oe1 the parameters micro for threshold and the noise variance frac34 2 determinethe input-output function

Analyzing Neural Responses to Natural Signals 235

Figure 4 Analysis of a model simple cell with RF shown in (a) (b) The ldquoexactrdquospike-triggered average vsta (c) An attempt to remove correlations accordingto the reverse correlation method Ciexcl1

a priorivsta (d) The normalized vector Ovmax

found by maximizing information (e) Convergence of the algorithm accordingto information Iv and projection Ovcent Oe1 between normalized vectors as a functionof the inverse effective temperature Tiexcl1 (f) The probability of a spike Pspikejs centOvmax (crosses) is compared to Pspikejs1 used in generating spikes (solid line)Parameters of the model are frac34 D 031 and micro D 184 both given in units ofstandard deviation of s1 which is also the units for the x-axis in f

When the spike-triggered average (STA) or reverse correlation functionis computed from the responses to correlated stimuli the resulting vectorwill be broadened due to spatial correlations present in the stimuli (see Fig-ure 4b) For stimuli that are drawn from a gaussian probability distributionthe effects of correlations could be removed by multiplying vsta by the in-verse of the a priori covariance matrix according to the reverse correlationmethod OvGaussian est Ciexcl1

a priorivsta equation A2 However this proceduretends to amplify noise To separate errors due to neural noise from those

236 T Sharpee N Rust and W Bialek

due to the nongaussian character of correlations note that in a model theeffect of neural noise on our estimate of the STA can be eliminated by aver-aging the presented stimuli weighted with the exact ring rate as opposedto using a histogram of responses to estimate Pspikejs from a nite set oftrials We have used this ldquoexactrdquo STA

vsta DZ

ds sPsjspike D 1Pspike

ZdsPs sPspikejs (42)

in calculations presented in Figures 4b and 4c Even with this noiseless STA(the equivalent of collecting an innite data set) the standard decorrelationprocedure is not valid for nongaussian stimuli and nonlinear input-outputfunctions as discussed in detail in appendix A The result of such a decorre-lation in our example is shown in Figure 4c It clearly is missing some of thestructure in the model lter with projection Oe1 cent OvGaussian est frac14 014 The dis-crepancy is not due to neural noise or nite sampling since the ldquoexactrdquo STAwas decorrelated the absence of noise in the exact STA also means that therewould be no justication for smoothing the results of the decorrelation Thediscrepancy between the true RF and the decorrelated STA increases withthe strength of nonlinearity in the input-output function

In contrast it is possible to obtain a good estimate of the relevant di-mension Oe1 by maximizing information directly (see Figure 4d) A typicalprogress of the simulated annealing algorithm with decreasing temperatureT is shown in Figure 4e There we plot both the information along the vectorand its projection on Oe1 We note that while information I remains almostconstant the value of projection continues to improve Qualitatively this isbecause the probability distributions depend exponentially on informationThe nal value of projection depends on the size of the data set as discussedbelow In the example shown in Figure 4 there were frac14 50 000 spikes withaverage probability of spike frac14 005 per frame and the reconstructed vectorhas a projection Ovmax cent Oe1 D 0920 sect 0006 Having estimated the RF onecan proceed to sample the nonlinear input-output function This is done byconstructing histograms for Ps cent Ovmax and Ps cent Ovmaxjspike of projectionsonto vector Ovmax found by maximizing information and taking their ratioas in equation 12 In Figure 4f we compare Pspikejs cent Ovmax (crosses) withthe probability Pspikejs1 used in the model (solid line)

42 Estimated Deviation from the Optimal Dimension When infor-mation is calculated from a nite data set the (normalized) vector Ov whichmaximizes I will deviate from the true RF Oe1 The deviation plusmnv D OviexclOe1 arisesbecause the probability distributions are estimated from experimental his-tograms and differ from the distributions found in the limit of innite datasize For a simple cell the quality of reconstruction can be characterized bythe projection Ov cent Oe1 D 1 iexcl 1

2plusmnv2 where both Ov and Oe1 are normalized andplusmnv is by denition orthogonal to Oe1 The deviation plusmnv raquo Aiexcl1rI where A is

Analyzing Neural Responses to Natural Signals 237

the Hessian of information Its structure is similar to that of a covariancematrix

Aij D1

ln 2

ZdxPxjspike

sup3ddx

lnPxjspike

Px

acute2

pound hsisjjxi iexcl hsijxihsjjxi (43)

When averaged over possible outcomes of N trials the gradient of infor-mation is zero for the optimal direction Here in order to evaluate hplusmnv2i DTr[Aiexcl1hrIrITiAiexcl1] we need to know the variance of the gradient of I As-suming that the probability of generating a spike is independent for differentbins we can estimate hrIirIji raquo Aij=Nspike ln 2 Therefore an expected er-ror in the reconstruction of the optimal lter is inversely proportional tothe number of spikes The corresponding expected value of the projectionbetween the reconstructed vector and the relevant direction Oe1 is given by

Ov cent Oe1 frac14 1 iexcl 12

hplusmnv2i D 1 iexcl Tr0[Aiexcl1]2Nspike ln 2

(44)

where Tr0 means that the trace is taken in the subspace orthogonal to themodel lter5 The estimate equation 44 can be calculated without knowl-edge of the underlying model it is raquo D=2Nspike This behavior should alsohold in cases where the stimulus dimensions are expanded to include timeThe errors are expected to increase in proportion to the increased dimen-sionality In the case of a complex cell with two relevant dimensions theerror is expected to be twice that for a cell with single relevant dimensionalso discussed in section 43

We emphasize that the error estimate according to equation 44 is of thesame order as errors of the reverse correlation method when it is appliedfor gaussian ensembles The latter are given by Tr[Ciexcl1] iexcl Ciexcl1

11 =[2Nspike

h f02s1i] Of course if the reverse correlation method were to be applied

to the nongaussian ensemble the errors would be larger In Figure 5 weshow the result of simulations for various numbers of trials and thereforeNspike The average projection of the normalized reconstructed vector Ov onthe RF Oe1 behaves initially as 1=Nspike (dashed line) For smaller data setsin this case Nspikes raquolt 30 000 corrections raquo Niexcl2

spikes become important forestimating the expected errors of the algorithm Happily these correctionshave a sign such that smaller data sets are more effective than one mighthave expected from the asymptotic calculation This can be veried fromthe expansion Ov cent Oe1 D [1 iexcl plusmnv2]iexcl1=2 frac14 1 iexcl 1

2 hplusmnv2i C 38 hplusmnv4i were only the rst

two terms were taken into account in equation 44

5 By denition plusmnv1 D plusmnv cent Oe1 D 0 and therefore hplusmnv21i Aiexcl1

11 is to be subtracted fromhplusmnv2i Tr[Aiexcl1] Because Oe1 is an eigenvector of A with zero eigenvalue Aiexcl1

11 is inniteTherefore the proper treatment is to take the trace in the subspace orthogonal to Oe1

238 T Sharpee N Rust and W Bialek

0 02 04 06 08x 10

shy 4

075

08

085

09

095

1

Nshy 1spike

e 1 times v m

ax

Figure 5 Projection of vector Ovmax that maximizes information on RF Oe1 is plot-ted as a function of the number of spikes The solid line is a quadratic t in1=Nspike and the dashed line is the leading linear term in 1=Nspike This set ofsimulations was carried out for a model visual neuron with one relevant di-mension from Figure 4a and the input-output function (see equation 41) withparameter values frac34 frac14 061frac34 s1 micro frac14 061frac34 s1 For this model neuron the linearapproximation for the expected error is applicable for Nspike raquogt 30 000

43 A Model Complex Cell A sequence of spikes from a model cell withtwo relevant dimensions was simulated by projecting each of the stimuli onvectors that differ by frac14=2 in their spatial phase taken to mimic properties ofcomplex cells as in Figure 6 A particular frame leads to a spike accordingto a logical OR that is if either js1j or js2j exceeds a threshold value micro in thepresence of noise where s1 D s cent Oe1 s2 D s cent Oe2 Similarly to equation 41

Pspikejs

PspikeD f s1 s2 D hHjs1j iexcl micro iexcl raquo1 _ Hjs2j iexcl micro iexcl raquo2i (45)

where raquo1 and raquo2 are independent gaussian variables The sampling of thisinput-output function by our particular set of natural stimuli is shown inFigure 6c As is well known reverse correlation fails in this case because thespikendashtriggered average stimulus is zero although with gaussian stimulithe spike-triggered covariance method would recover the relevant dimen-sions (Touryan et al 2002) Here we show that searching for maximallyinformative dimensions allows us to recover the relevant subspace evenunder more natural stimulus conditions

Analyzing Neural Responses to Natural Signals 239

Figure 6 Analysis of a model complex cell with relevant dimensions Oe1 and Oe2

shown in (a) and (b) respectively Spikes are generated according to an ldquoORrdquoinput-output function f s1 s2 with the threshold micro frac14 061frac34 s1 and noise stan-dard deviation frac34 D 031frac34 s1 (cd) Vectors v1 and v2 found by maximizinginformation Iv1 v2

We start by maximizing information with respect to one vector Contraryto the result in Figure 4e for a simple cell one optimal dimension recoversonly about 60 of the total information per spike (see equation 22) Perhapssurprisingly because of the strong correlations in natural scenes even a pro-jection onto a random vector in the D raquo 103-dimensional stimulus spacehas a high probability of explaining 60 of total information per spike ascan be seen in Figure 2 We therefore go on to maximize information withrespect to two vectors As a result of maximization we obtain two vectors v1and v2 shown in Figure 6 The information along them is Iv1 v2 frac14 090which is within the range of information values obtained along differentlinear combinations of the two model vectors IOe1 Oe2=Ispike D 089 sect 011Therefore the description of neuronrsquos ring in terms of vectors v1 and v2is complete up to the noise level and we do not have to look for extra rel-evant dimensions Practically the number of relevant dimensions can bedetermined by comparing Iv1 v2 to either Ispike or Iv1 v2 v3 the latterbeing the result of maximization with respect to three vectors simultane-ously As mentioned in section 1 information along set a of vectors does notincrease when extra dimensions are added to the relevant subspace There-fore if Iv1 v2 D Iv1 v2 v3 (again up to the noise level) this means that

240 T Sharpee N Rust and W Bialek

there are only two relevant dimensions Using Ispike for comparison withIv1 v2 has the advantage of not having to look for an extra dimensionwhich can be computationally intensive However Ispike might be subjectto larger systematic bias errors than Iv1 v2 v3

Vectors v1 and v2 obtained by maximizing Iv1 v2 are not exactly or-thogonal and are also rotated with respect to Oe1 and Oe2 However the qualityof reconstruction as well as the value of information Iv1 v2 is indepen-dent of a particular choice of basis with the RS The appropriate measure ofsimilarity between the two planes is the dot product of their normals In theexample of Figure 6 OnOe1Oe2 cent Onv1 v2 D 082sect007 where OnOe1Oe2 is a normal tothe plane passing through vectors Oe1 and Oe2 Maximizing information withrespect to two dimensions requires a signicantly slower cooling rate andconsequently longer computational times However the expected error inthe reconstruction 1 iexcl OnOe1 Oe2 cent Onv1 v2 scales as 1=Nspike behavior similarto equation 44 and is roughly twice that for a simple cell given the samenumber of spikes We make vectors v1 and v2 orthogonal to each othersupon completion of the algorithm

44 A Model Auditory Neuron with One Relevant Dimension Be-cause stimuli s are treated as vectors in an abstract space the method oflooking for the most informative dimensions can be applied equally wellto auditory as well as to visual neurons Here we illustrate the method byconsidering a model auditory neuron with one relevant dimension which isshown in Figure 7c and is taken to mimic the properties of cochlear neuronsThe model neuron is probed by two ensembles of naturalistic stimuli oneis a recording of a native Russian speaker reading a piece of Russian proseand the other is a recording of a piece of English prose read by a nativeEnglish speaker Both ensembles are nongaussian and exhibit amplitudedistributions with long nearly exponential tails (see Figure 7a) which arequalitatively similar to those of light intensities in natural scenes (Voss ampClarke 1975 Ruderman 1994) However the power spectrum is different inthe two cases as can be seen in Figure 7b The differences in the correlationstructure in particular lead to different STAs across the two ensembles (cfFigure 7d) Both of the STAs also deviate from the model lter shown inFigure 7c

Despite differences in the probability distributions Ps it is possibleto recover the relevant dimension of the model neuron by maximizing in-formation In Figure 7c we show the two most informative vectors foundby running the algorithm for the two ensembles and replot the model lterfrom Figure 7c to show that the three vectors overlap almost perfectly Thusdifferent nongaussian correlations can be successfully removed to obtain anestimate of the relevant dimension If the most informative vector changeswith the stimulus ensemble this can be interpreted as caused by adaptationto the probability distribution

Analyzing Neural Responses to Natural Signals 241

shy 10 0 1010

shy 6

10shy 4

10shy 2

100

5 10 15 20

0001

001

01

0 5 10

shy 01

0

01

0 5 10

shy 01

0

01

0 5 10

shy 01

0

01

shy 5 0 5 0

05

1

11

2

(a) (b)

(c) (d)

(e)

2

1

t(ms)

t(ms)

t(ms)

P(A

mpl

itude

)

Amplitude (units of RMS) frequency (kHz)

S(w)

2

(f)

P(s

pike

| s times

vm

ax )

s times vmax

Figure 7 A model auditory neuron is probed by two natural ensembles of stim-uli a piece of English prose (1) and a piece of of Russian prose (2) The sizeof the stimulus ensemble was the same in both cases and the sampling ratewas 441 kHz (a) The probability distribution of the sound pressure amplitudein units of standard deviation for both ensembles is strongly nongaussian (b)The power spectra for the two ensembles (c) The relevant vector of the modelneuron of dimensionality D D 500 (d) The STA is broadened in both cases butdiffers between the two cases due to differences in the power spectra of the twoensembles (e) Vectors that maximize information for either of the ensemblesoverlap almost perfectly with each other and with the model lter which is alsoreplotted here from c (f) The probability of a spike Pspikejs cent Ovmax (crosses) iscompared to Pspikejs1 used in generating spikes (solid line) The input-outputfunction had parameter values frac34 frac14 09frac34 s1 and micro frac14 18frac34 s1

5 Summary

Features of the stimulus that are most relevant for generating the responseof a neuron can be found by maximizing information between the sequenceof responses and the projection of stimuli on trial vectors within the stim-ulus space Calculated in this manner information becomes a function of

242 T Sharpee N Rust and W Bialek

direction in stimulus space Those vectors that maximize the informationand account for the total information per response of interest span the rel-evant subspace The method allows multiple dimensions to be found Thereconstruction of the relevant subspace is done without assuming a partic-ular form of the input-output function It can be strongly nonlinear withinthe relevant subspace and is estimated from experimental histograms foreach trial direction independently Most important this method can be usedwith any stimulus ensemble even those that are strongly nongaussian asin the case of natural signals We have illustrated the method on modelneurons responding to natural scenes and sounds We expect the currentimplementation of the method to be most useful when several most infor-mative vectors (middot 10 depending on their dimensionality) are to be analyzedfor neurons probed by natural scenes This technique could be particularlyuseful in describing sensory processing in poorly understood regions ofhigher-level sensory cortex (such as visual areas V2 V4 and IT and audi-tory cortex beyond A1) where white noise stimulation is known to be lesseffective than naturalistic stimuli

Appendix A Limitations of the Reverse Correlation Method

Here we examine what sort of deviations one can expect when applying thereverse correlation method to natural stimuli even in the model with just onerelevant dimension There are two factors that when combined invalidatethe reverse correlation method the nongaussian character of correlationsand the nonlinearity of the input-output function (Ringach Sapiro amp Shap-ley 1997) In its original formulation (de Boer amp Kuyper 1968) the neuronis probed by white noise and the relevant dimension Oe1 is given by theSTA Oe1 hsrsi If the signals are not white that is the covariance matrixCij D hsisji is not a unit matrix then the STA is a broadened version of theoriginal lter Oe1 This can be seen by noting that for any function Fs ofgaussian variables fsig the identity holds

hsiFsi D hsisjihsj Fsi j acute sj (A1)

When property A1 isapplied to the vector components of the STA hsirsi DCijhjrsi Since we work within the assumption that the ring rate is a(nonlinear) function of projection onto one lter Oe1 rs D rs1 the latteraverage is proportional to the model lter itself hjri D Oe1jhr0s1i Thereforewe arrive at the prescription of the reverse correlation method

Oe1i [Ciexcl1]ijhsjrsi (A2)

The gaussian property is necessary in order to represent the STA as a convo-lution of the covariance matrix Cij of the stimulus ensemble and the model

Analyzing Neural Responses to Natural Signals 243

lter To understand how the reconstructed vector obtained according toequation A2 deviates from the relevant one we consider weakly nongaus-sian stimuli with the probability distribution

PnGs D 1Z

P0sesup2H1s (A3)

where P0s is the gaussian probability distribution with covariance matrixC and the normalization factor Z D hesup2H1si The function H1 describesdeviations of the probability distribution from gaussian and therefore wewill set hsiH1i D 0 and hsisjH1i D 0 since these averages can be accountedfor in the gaussian ensemble In what follows we will keep only the rst-order terms in perturbation parameter sup2 Using property A1 we nd theSTA to be given by

\[
\langle s_i\, r\rangle_{nG} = \langle s_i s_j\rangle \bigl[\langle \partial_j r\rangle + \epsilon\,\langle r\, \partial_j H_1\rangle\bigr], \tag{A.4}
\]

where averages are taken with respect to the gaussian distribution. Similarly, the covariance matrix $C_{ij}$ evaluated with respect to the nongaussian ensemble is given by

\[
C_{ij} = \frac{1}{Z}\,\langle s_i s_j\, e^{\epsilon H_1}\rangle = \langle s_i s_j\rangle + \epsilon\,\langle s_i s_k\rangle\langle s_j\, \partial_k H_1\rangle, \tag{A.5}
\]

so that to the first order in $\epsilon$, $\langle s_i s_j\rangle = C_{ij} - \epsilon\, C_{ik}\langle s_j\, \partial_k H_1\rangle$. Combining this with equation A.4, and using $\langle \partial_j r\rangle = \hat{e}_{1j}\langle r'\rangle$ together with $\hat{e}_{1j} s_j = s_1$, we get

\[
\langle s_i\, r\rangle_{nG} = \mathrm{const}\times C_{ij}\,\hat{e}_{1j} + \epsilon\, C_{ij}\,\bigl\langle \bigl(r - s_1\langle r'\rangle\bigr)\, \partial_j H_1\bigr\rangle. \tag{A.6}
\]

The second term in equation A.6 prevents the application of the reverse correlation method for nongaussian signals. Indeed, if we multiply the STA, equation A.6, by the inverse of the a priori covariance matrix $C_{ij}$, according to the reverse correlation method, equation A.2, we no longer obtain the RF $\hat{e}_1$. The deviation of the obtained answer from the true RF increases with $\epsilon$, which measures the deviation of the probability distribution from gaussian. Since natural stimuli are known to be strongly nongaussian, this makes the use of reverse correlation problematic when analyzing neural responses to natural stimuli.
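This failure is straightforward to reproduce numerically. The sketch below (a toy construction of ours, not the ensemble or parameters used in the paper's figures) builds two correlated stimulus ensembles with identical covariance, one gaussian and one from sparse, heavy-tailed sources, drives a threshold neuron with a known filter, and compares the decorrelated STA of equation A.2 to the true filter. By the argument above, only the gaussian ensemble is guaranteed to recover the filter; the size of the deviation in the nongaussian case depends on the ensemble and the nonlinearity.

```python
import numpy as np

rng = np.random.default_rng(0)
D, N = 20, 200_000

# Mixing matrix A imposes spatial correlations; both ensembles share covariance A A^T.
A = np.exp(-np.abs(np.subtract.outer(np.arange(D), np.arange(D))) / 2.0)

# True filter: a localized bump, normalized to unit length.
e1 = np.exp(-0.5 * ((np.arange(D) - D / 2) / 2.0) ** 2)
e1 /= np.linalg.norm(e1)

def decorrelated_sta(s, spikes):
    """Reverse-correlation estimate C^{-1} <s>_spike (equation A.2), as a unit vector."""
    sta = s[spikes].mean(axis=0) - s.mean(axis=0)
    C = np.cov(s, rowvar=False)
    v = np.linalg.solve(C, sta)
    return v / np.linalg.norm(v)

sources = {
    "gaussian": rng.standard_normal((N, D)),
    "sparse nongaussian": rng.standard_normal((N, D)) ** 3 / np.sqrt(15.0),  # unit variance, heavy tails
}
for name, u in sources.items():
    s = u @ A.T                           # correlated stimuli
    proj = s @ e1
    spikes = proj > 1.5 * proj.std()      # deterministic threshold nonlinearity
    v = decorrelated_sta(s, spikes)
    print(f"{name:20s} |cos(decorrelated STA, true filter)| = {abs(v @ e1):.3f}")
```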

The difference in applying the reverse correlation to stimuli drawn from a correlated gaussian ensemble versus a nongaussian one is illustrated in Figures 8b and 8c. In the first case, shown in Figure 8b, stimuli are drawn from a correlated gaussian ensemble with the covariance matrix equal to that of natural images. In the second case, shown in Figure 8c, the patches of photos are taken as stimuli. The STA is broadened in both cases. Although the two-point correlations are just as strong in the case of gaussian stimuli as they are in the natural stimuli ensemble, gaussian correlations can be successfully removed from the STA according to equation A.2 to obtain the model filter. On the contrary, an attempt to use reverse correlation with natural stimuli results in an altered version of the model filter. We reiterate that for this example the apparent noise in the decorrelated vector is not due to neural noise or finite data sets, since the "exact" STA has been used (see equation 4.2) in all calculations presented in Figures 8 and 9.

Figure 8: The nongaussian character of correlations present in natural scenes invalidates the reverse correlation method for neurons with a nonlinear input-output function. (a) A model visual neuron has one relevant dimension $\hat{e}_1$ and the nonlinear input-output function. The "exact" STA is used (see equation 4.2) to separate effects of neural noise from alterations introduced by the method. The decorrelated "exact" STA is obtained by multiplying the "exact" STA by the inverse of the covariance matrix, according to equation A.2. (b) Stimuli are taken from a correlated gaussian noise ensemble. The effect of correlations in the STA can be removed according to equation A.2. (c) When patches of photos are taken as stimuli for the same model neuron as in b, the decorrelation procedure gives an altered version of the model filter. The two stimulus ensembles have the same covariance matrix.

The reverse correlation method gives the correct answer for any distribution of signals if the probability of generating a spike is a linear function of $s_i$, since then the second term in equation A.6 is zero. In particular, a linear input-output relation could arise due to a neural noise whose variance is much larger than the variance of the signal itself. This point is illustrated in Figures 9a, 9b, and 9c, where the reverse correlation method is applied to a threshold input-output function at low, moderate, and high signal-to-noise ratios. For small signal-to-noise ratios, where the noise standard deviation is similar to that of projections $s_1$, the threshold nonlinearity in the input-output function is masked by noise and is effectively linear. In this limit, the reverse correlation can be applied with the exact STA. However, for an experimentally calculated STA at low signal-to-noise ratios, the decorrelation procedure results in strong noise amplification. At higher signal-to-noise ratios, decorrelation fails due to the nonlinearity of the input-output function, in accordance with equation A.6.
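The effective linearization at low signal-to-noise ratio can be checked directly. For the model used in these figures (a threshold crossed in the presence of additive gaussian noise, equation 4.1 of the main text), the normalized input-output function is $P(\mathrm{spike}|s_1)/P(\mathrm{spike}) = \Phi\bigl((s_1-\theta)/\sigma\bigr)$, where $\Phi$ is the cumulative normal. The short sketch below (our own parameter choices, roughly matching the noise values quoted for Figure 9) measures how far this curve is from a straight line over the typical range of projections.

```python
import numpy as np
from scipy.special import erf

def spike_probability(s1, theta, sigma):
    # P(spike|s1)/P(spike) = < H(s1 - theta + xi) >_xi = Phi((s1 - theta)/sigma)
    return 0.5 * (1.0 + erf((s1 - theta) / (np.sqrt(2.0) * sigma)))

s1 = np.linspace(-3.0, 3.0, 201)     # projections, in units of their standard deviation
theta = 1.8                          # illustrative threshold
for sigma in (6.1, 0.61, 0.06):      # low, moderate, high signal-to-noise ratio
    p = spike_probability(s1, theta, sigma)
    slope, intercept = np.polyfit(s1, p, 1)
    rms = (p - (slope * s1 + intercept)).std()
    print(f"sigma = {sigma:5.2f}:  rms deviation from the best straight line = {rms:.3f}")
```

At $\sigma \approx 6.1$ the curve is nearly linear, which is why decorrelation still works there apart from noise amplification, while at $\sigma \approx 0.06$ it is essentially a step and equation A.6 predicts the failure of decorrelation.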

Appendix B: Maxima of I(v): What Do They Mean?

The relevant subspace of dimensionality $K$ can be found by maximizing information simultaneously with respect to $K$ vectors. The result of maximization with respect to a number of vectors that is less than the true dimensionality of the relevant subspace may produce vectors that have components in the irrelevant subspace. This happens only in the presence of correlations in stimuli. As an illustration, we consider the situation where the dimensionality of the relevant subspace is $K = 2$ and vector $\hat{e}_1$ describes the most informative direction within the relevant subspace. We show here that although the gradient of information is perpendicular to both $\hat{e}_1$ and $\hat{e}_2$, it may have components outside the relevant subspace. Therefore, the vector $\mathbf{v}_{\max}$ that corresponds to the maximum of $I(\mathbf{v})$ will lie outside the relevant subspace. We recall from equation 3.1 that

\[
\nabla I(\hat{e}_1) = \int ds_1\, P(s_1)\, \frac{d}{ds_1}\!\left[\frac{P(s_1|\mathrm{spike})}{P(s_1)}\right]
\bigl(\langle \mathbf{s}|s_1, \mathrm{spike}\rangle - \langle \mathbf{s}|s_1\rangle\bigr). \tag{B.1}
\]

We can rewrite the conditional averages $\langle \mathbf{s}|s_1\rangle = \int ds_2\, P(s_1, s_2)\,\langle \mathbf{s}|s_1, s_2\rangle / P(s_1)$ and $\langle \mathbf{s}|s_1, \mathrm{spike}\rangle = \int ds_2\, f(s_1, s_2)\, P(s_1, s_2)\,\langle \mathbf{s}|s_1, s_2\rangle / P(s_1|\mathrm{spike})$, so that

\[
\nabla I(\hat{e}_1) = \int ds_1\, ds_2\, P(s_1, s_2)\,\langle \mathbf{s}|s_1, s_2\rangle\,
\frac{P(\mathrm{spike}|s_1, s_2) - P(\mathrm{spike}|s_1)}{P(\mathrm{spike})}\,
\frac{d}{ds_1}\ln\!\left[\frac{P(s_1|\mathrm{spike})}{P(s_1)}\right]. \tag{B.2}
\]

Because we assume that the vector $\hat{e}_1$ is the most informative within the relevant subspace, $\hat{e}_1\cdot\nabla I = \hat{e}_2\cdot\nabla I = 0$, so that the integral in equation B.2 is zero for those directions in which the component of the vector $\langle \mathbf{s}|s_1, s_2\rangle$ changes linearly with $s_1$ and $s_2$. For uncorrelated stimuli, this is true for all directions, so that the most informative vector within the relevant subspace is also the most informative in the overall stimulus space. In the presence of correlations, the gradient may have nonzero components along some irrelevant directions if the projection of the vector $\langle \mathbf{s}|s_1, s_2\rangle$ on those directions is not a linear function of $s_1$ and $s_2$. By looking for a maximum of information, we will therefore be driven outside the relevant subspace. The deviation of $\mathbf{v}_{\max}$ from the relevant subspace is also proportional to the strength of the dependence on the second parameter $s_2$, because of the factor $[P(s_1, s_2|\mathrm{spike})/P(s_1, s_2) - P(s_1|\mathrm{spike})/P(s_1)]$ in the integrand.

Figure 9: Application of the reverse correlation method to a model visual neuron with one relevant dimension $\hat{e}_1$ and a threshold input-output function, for decreasing values of noise variance $\sigma/\sigma(s_1) \approx 6.1,\ 0.61,\ 0.06$ in a, b, and c, respectively. The model $P(\mathrm{spike}|s_1)$ becomes effectively linear when the signal-to-noise ratio is small. The reverse correlation can be used together with natural stimuli if the input-output function is linear. Otherwise, the deviations between the decorrelated STA and the model filter increase with the nonlinearity of $P(\mathrm{spike}|s_1)$.

Appendix C: The Gradient of Information

According to expression 2.5, the information $I(\mathbf{v})$ depends on the vector $\mathbf{v}$ only through the probability distributions $P_{\mathbf v}(x)$ and $P_{\mathbf v}(x|\mathrm{spike})$. Therefore we can express the gradient of information in terms of gradients of those probability distributions:

\[
\nabla_{\mathbf v} I = \frac{1}{\ln 2}\int dx
\left[\, \ln\!\frac{P_{\mathbf v}(x|\mathrm{spike})}{P_{\mathbf v}(x)}\; \nabla_{\mathbf v} P_{\mathbf v}(x|\mathrm{spike})
\;-\; \frac{P_{\mathbf v}(x|\mathrm{spike})}{P_{\mathbf v}(x)}\; \nabla_{\mathbf v} P_{\mathbf v}(x) \right], \tag{C.1}
\]

where we took into account that $\int dx\, P_{\mathbf v}(x|\mathrm{spike}) = 1$ and does not change with $\mathbf{v}$. To find the gradients of the probability distributions, we note that

\[
\nabla_{\mathbf v} P_{\mathbf v}(x) = \nabla_{\mathbf v}\left[\int d\mathbf{s}\, P(\mathbf{s})\,\delta(x - \mathbf{s}\cdot\mathbf{v})\right]
= -\int d\mathbf{s}\, P(\mathbf{s})\,\mathbf{s}\,\delta'(x - \mathbf{s}\cdot\mathbf{v})
= -\frac{d}{dx}\bigl[P_{\mathbf v}(x)\,\langle \mathbf{s}|x\rangle\bigr], \tag{C.2}
\]

and analogously for $P_{\mathbf v}(x|\mathrm{spike})$:

\[
\nabla_{\mathbf v} P_{\mathbf v}(x|\mathrm{spike}) = -\frac{d}{dx}\bigl[P_{\mathbf v}(x|\mathrm{spike})\,\langle \mathbf{s}|x, \mathrm{spike}\rangle\bigr]. \tag{C.3}
\]

Substituting expressions C.2 and C.3 into equation C.1 and integrating once by parts, we obtain

\[
\nabla_{\mathbf v} I = \int dx\, P_{\mathbf v}(x)\,\bigl[\langle \mathbf{s}|x, \mathrm{spike}\rangle - \langle \mathbf{s}|x\rangle\bigr]
\cdot \frac{d}{dx}\!\left[\frac{P_{\mathbf v}(x|\mathrm{spike})}{P_{\mathbf v}(x)}\right],
\]

which is expression 3.1 of the main text.
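This closed form is what makes gradient-based optimization of $I(\mathbf{v})$ practical. Below is a minimal numerical transcription (our own binning scheme and variable names, not the authors' code) of expression 3.1: estimate $P_{\mathbf v}(x)$, $P_{\mathbf v}(x|\mathrm{spike})$, and the conditional means $\langle\mathbf{s}|x\rangle$ and $\langle\mathbf{s}|x,\mathrm{spike}\rangle$ in each bin of projection values, then assemble the gradient.

```python
import numpy as np

def information_gradient(stimuli, spike_counts, v, n_bins=15):
    """Estimate grad_v I via expression 3.1; overall constants (e.g., 1/ln 2 for bits)
    are omitted, since only the direction matters for gradient ascent."""
    v = v / np.linalg.norm(v)
    x = stimuli @ v
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    idx = np.clip(np.digitize(x, edges) - 1, 0, n_bins - 1)

    N, D = stimuli.shape
    p_x = np.bincount(idx, minlength=n_bins) / N
    p_spk = np.bincount(idx, weights=spike_counts, minlength=n_bins) / spike_counts.sum()
    ratio = np.where(p_x > 0, p_spk / np.maximum(p_x, 1e-12), 0.0)   # P_v(x|spike)/P_v(x)

    # conditional means <s|x> and <s|x, spike> in each bin
    mean_s = np.zeros((n_bins, D))
    mean_s_spk = np.zeros((n_bins, D))
    for b in range(n_bins):
        in_bin = idx == b
        if in_bin.any():
            mean_s[b] = stimuli[in_bin].mean(axis=0)
            w = spike_counts[in_bin]
            if w.sum() > 0:
                mean_s_spk[b] = (stimuli[in_bin] * w[:, None]).sum(axis=0) / w.sum()

    d_ratio = np.gradient(ratio, edges[1] - edges[0])   # d/dx [P_v(x|spike)/P_v(x)]
    grad = (p_x[:, None] * (mean_s_spk - mean_s) * d_ratio[:, None]).sum(axis=0)
    return grad
```

Note that the component of this estimate along $\mathbf{v}$ itself is close to zero, reflecting the property $\mathbf{v}\cdot\nabla_{\mathbf v} I = 0$ noted in the main text; in the line maximizations of the annealing algorithm, only the perpendicular component drives the search.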

Acknowledgments

We thank K. D. Miller for many helpful discussions. Work at UCSF was supported in part by the Sloan and Swartz foundations and by a training grant from the NIH. Our collaboration began at the Marine Biological Laboratory in a course supported by grants from NIMH and the Howard Hughes Medical Institute.


References

Agüera y Arcas, B., Fairhall, A. L., & Bialek, W. (2003). Computation in a single neuron: Hodgkin and Huxley revisited. Neural Comp., 15, 1715–1749.
Baddeley, R., Abbott, L. F., Booth, M. C. A., Sengpiel, F., Freeman, T., Wakeman, E. A., & Rolls, E. T. (1997). Responses of neurons in primary and inferior temporal visual cortices to natural scenes. Proc. R. Soc. Lond. B, 264, 1775–1783.
Barlow, H. (1961). Possible principles underlying the transformations of sensory images. In W. Rosenblith (Ed.), Sensory communication (pp. 217–234). Cambridge, MA: MIT Press.
Barlow, H. (2001). Redundancy reduction revisited. Network: Comput. Neural Syst., 12, 241–253.
Bialek, W. (2002). Thinking about the brain. In H. Flyvbjerg, F. Jülicher, P. Ormos, & F. David (Eds.), Physics of biomolecules and cells (pp. 485–577). Berlin: Springer-Verlag. See also physics/0205030. (Where available, we give references to the physics e-print archive, which may be found on-line at http://arxiv.org/abs/; thus, Bialek (2002) is available on-line at http://arxiv.org/abs/physics/0205030. Published papers may differ from the versions posted to the archive.)
Bialek, W., & de Ruyter van Steveninck, R. R. (2003). Features and dimensions: Motion estimation in fly vision. Unpublished manuscript.
Brenner, N., Bialek, W., & de Ruyter van Steveninck, R. R. (2000). Adaptive rescaling maximizes information transmission. Neuron, 26, 695–702.
Brenner, N., Strong, S. P., Koberle, R., Bialek, W., & de Ruyter van Steveninck, R. R. (2000). Synergy in a neural code. Neural Computation, 12, 1531–1552. See also physics/9902067.
Chichilnisky, E. J. (2001). A simple white noise analysis of neuronal light responses. Network: Comput. Neural Syst., 12, 199–213.
Creutzfeldt, O. D., & Northdurft, H. C. (1978). Representation of complex visual stimuli in the brain. Naturwissenschaften, 65, 307–318.
Cover, T. M., & Thomas, J. A. (1991). Information theory. New York: Wiley.
de Boer, E., & Kuyper, P. (1968). Triggered correlation. IEEE Trans. Biomed. Eng., 15, 169–179.
de Ruyter van Steveninck, R. R., & Bialek, W. (1988). Real-time performance of a movement-sensitive neuron in the blowfly visual system: Coding and information transfer in short spike sequences. Proc. R. Soc. Lond. B, 265, 259–265.
de Ruyter van Steveninck, R. R., Borst, A., & Bialek, W. (2000). Real time encoding of motion: Answerable questions and questionable answers from the fly's visual system. In J. M. Zanker & J. Zeil (Eds.), Motion vision: Computational, neural and ecological constraints (pp. 279–306). New York: Springer-Verlag. See also physics/0004060.
de Ruyter van Steveninck, R. R., Lewen, G. D., Strong, S. P., Koberle, R., & Bialek, W. (1997). Reproducibility and variability in neural spike trains. Science, 275, 1805–1808. See also cond-mat/9603127.
Dimitrov, A. G., & Miller, J. P. (2001). Neural coding and decoding: Communication channels and quantization. Network: Comput. Neural Syst., 12, 441–472.
Dong, D. W., & Atick, J. J. (1995). Statistics of natural time-varying images. Network: Comput. Neural Syst., 6, 345–358.
Fairhall, A. L., Lewen, G. D., Bialek, W., & de Ruyter van Steveninck, R. R. (2001). Efficiency and ambiguity in an adaptive neural code. Nature, 412, 787–792.
Kara, P., Reinagel, P., & Reid, R. C. (2000). Low response variability in simultaneously recorded retinal, thalamic, and cortical neurons. Neuron, 27, 635–646.
Lewen, G. D., Bialek, W., & de Ruyter van Steveninck, R. R. (2001). Neural coding of naturalistic motion stimuli. Network: Comput. Neural Syst., 12, 317–329. See also physics/0103088.
Mainen, Z. F., & Sejnowski, T. J. (1995). Reliability of spike timing in neocortical neurons. Science, 268, 1503–1506.
Paninski, L. (2003a). Convergence properties of three spike-triggered analysis techniques. Network: Comput. Neural Syst., 14, 437–464.
Paninski, L. (2003b). Estimation of entropy and mutual information. Neural Computation, 15, 1191–1253.
Panzeri, S., & Treves, A. (1996). Analytical estimates of limited sampling biases in different information measures. Network: Comput. Neural Syst., 7, 87–107.
Pola, G., Schultz, S. R., Petersen, R., & Panzeri, S. (2002). A practical guide to information analysis of spike trains. In R. Kotter (Ed.), Neuroscience databases: A practical guide (pp. 137–152). Norwell, MA: Kluwer.
Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P. (1992). Numerical recipes in C: The art of scientific computing. Cambridge: Cambridge University Press.
Reinagel, P., & Reid, R. C. (2000). Temporal coding of visual information in the thalamus. J. Neurosci., 20, 5392–5400.
Rieke, F., Bodnar, D. A., & Bialek, W. (1995). Naturalistic stimuli increase the rate and efficiency of information transmission by primary auditory afferents. Proc. R. Soc. Lond. B, 262, 259–265.
Rieke, F., Warland, D., de Ruyter van Steveninck, R. R., & Bialek, W. (1997). Spikes: Exploring the neural code. Cambridge, MA: MIT Press.
Ringach, D. L., Hawken, M. J., & Shapley, R. (2002). Receptive field structure of neurons in monkey visual cortex revealed by stimulation with natural image sequences. Journal of Vision, 2, 12–24.
Ringach, D. L., Sapiro, G., & Shapley, R. (1997). A subspace reverse-correlation technique for the study of visual neurons. Vision Res., 37, 2455–2464.
Rolls, E. T., Aggelopoulos, N. C., & Zheng, F. (2003). The receptive fields of inferior temporal cortex neurons in natural scenes. J. Neurosci., 23, 339–348.
Ruderman, D. L. (1994). The statistics of natural images. Network: Comput. Neural Syst., 5, 517–548.
Ruderman, D. L., & Bialek, W. (1994). Statistics of natural images: Scaling in the woods. Phys. Rev. Lett., 73, 814–817.
Schwartz, O., Chichilnisky, E. J., & Simoncelli, E. (2002). Characterizing neural gain control using spike-triggered covariance. In T. G. Dietterich, S. Becker, & Z. Ghahramani (Eds.), Advances in Neural Information Processing (pp. 269–276). Cambridge, MA: MIT Press.
Sen, K., Theunissen, F. E., & Doupe, A. J. (2001). Feature analysis of natural sounds in the songbird auditory forebrain. J. Neurophysiol., 86, 1445–1458.
Simoncelli, E., & Olshausen, B. A. (2001). Natural image statistics and neural representation. Annu. Rev. Neurosci., 24, 1193–1216.
Smirnakis, S. M., Berry, M. J., Warland, D. K., Bialek, W., & Meister, M. (1996). Adaptation of retinal processing to image contrast and spatial scale. Nature, 386, 69–73.
Smyth, D., Willmore, B., Baker, G. E., Thompson, I. D., & Tolhurst, D. J. (2003). The receptive-field organization of simple cells in primary visual cortex of ferrets under natural scene stimulation. J. Neurosci., 23, 4746–4759.
Stanley, G. B., Li, F. F., & Dan, Y. (1999). Reconstruction of natural scenes from ensemble responses in the lateral geniculate nucleus. J. Neurosci., 19, 8036–8042.
Strong, S. P., Koberle, R., de Ruyter van Steveninck, R. R., & Bialek, W. (1998). Entropy and information in neural spike trains. Phys. Rev. Lett., 80, 197–200.
Theunissen, F. E., Sen, K., & Doupe, A. J. (2000). Spectral-temporal receptive fields of nonlinear auditory neurons obtained using natural sounds. J. Neurosci., 20, 2315–2331.
Tishby, N., Pereira, F. C., & Bialek, W. (1999). The information bottleneck method. In B. Hajek & R. S. Sreenivas (Eds.), Proceedings of the 37th Allerton Conference on Communication, Control and Computing (pp. 368–377). Urbana: University of Illinois. See also physics/0004057.
Touryan, J., Lau, B., & Dan, Y. (2002). Isolation of relevant visual features from random stimuli for cortical complex cells. J. Neurosci., 22, 10811–10818.
Treves, A., & Panzeri, S. (1995). The upward bias in measures of information derived from limited data samples. Neural Comp., 7, 399–407.
von der Twer, T., & Macleod, D. I. A. (2001). Optimal nonlinear codes for the perception of natural colours. Network: Comput. Neural Syst., 12, 395–407.
Vickers, N. J., Christensen, T. A., Baker, T., & Hildebrand, J. G. (2001). Odour-plume dynamics influence the brain's olfactory code. Nature, 410, 466–470.
Vinje, W. E., & Gallant, J. L. (2000). Sparse coding and decorrelation in primary visual cortex during natural vision. Science, 287, 1273–1276.
Vinje, W. E., & Gallant, J. L. (2002). Natural stimulation of the nonclassical receptive field increases information transmission efficiency in V1. J. Neurosci., 22, 2904–2915.
Voss, R. F., & Clarke, J. (1975). "1/f noise" in music and speech. Nature, 258, 317–318.
Weliky, M., Fiser, J., Hunt, R., & Wagner, D. N. (2003). Coding of natural scenes in primary visual cortex. Neuron, 37, 703–718.

Received January 21, 2003; accepted August 6, 2003.



Barlow H (1961) Possible principles underlying the transformations of sen-sory images In W Rosenblith (Ed) Sensory communication (pp 217ndash234)Cambridge MA MIT Press

Barlow H (2001) Redundancy reduction revisited Network Comput NeuralSyst 12 241ndash253

Bialek W (2002) Thinking about the brain In H Flyvbjerg F Julicher P Or-mos amp F David (Eds) Physics of Biomolecules and Cells (pp 485ndash577) BerlinSpringer-Verlag See also physics02050306

Bialek W amp de Ruyter van Steveninck R R (2003) Features and dimensionsMotion estimation in y vision Unpublished manuscript

Brenner N Bialek W amp de Ruyter van Steveninck R R (2000) Adaptiverescaling maximizes information transmission Neuron 26 695ndash702

Brenner N Strong S P Koberle R Bialek W amp de Ruyter van SteveninckR R (2000) Synergy in a neural code Neural Computation 12 1531ndash1552See also physics9902067

Chichilnisky E J (2001) A simple white noise analysis of neuronal light re-sponses Network Comput Neural Syst 12 199ndash213

Creutzfeldt O D amp Northdurft H C (1978) Representation of complex visualstimuli in the brain Naturwissenschaften 65 307ndash318

Cover T M amp Thomas J A (1991) Information theory New York Wileyde Boer E amp Kuyper P (1968) Triggered correlation IEEE Trans Biomed Eng

15 169ndash179de Ruyter van Steveninck R R amp Bialek W (1988) Real-time performance

of a movement-sensitive neuron in the blowy visual system Coding andinformation transfer in short spike sequences Proc R Soc Lond B 265259ndash265

de Ruyter vanSteveninck R RBorst A amp Bialek W (2000)Real time encodingof motion Answerable questions and questionable answers from the yrsquosvisual system In J M Zanker amp J Zeil (Eds) Motion vision Computationalneural and ecologicalconstraints (pp 279ndash306) New York Springer-Verlag Seealso physics0004060

de Ruyter van Steveninck R R Lewen G D Strong S P Koberle R amp BialekW (1997) Reproducibility and variability in neural spike trains Science 2751805ndash1808 See also cond-mat9603127

6 Where available we give references to the physics endashprint archive which may befound on-line at httparxivorgabs thus Bialek (2002) is available on-line athttparxivorgabsphysics0205030 Published papers may differ from the versionsposted to the archive

Analyzing Neural Responses to Natural Signals 249

Dimitrov A G amp Miller J P (2001) Neural coding and decoding Communi-cation channels and quantization Network Comput Neural Syst 12 441ndash472

Dong D W amp Atick J J (1995) Statistics of natural time-varying imagesNetwork Comput Neural Syst 6 345ndash358

Fairhall A L Lewen G D Bialek W amp de Ruyter van Steveninck R R (2001)Efciency and ambiguity in an adaptive neural code Nature 412 787ndash792

Kara P Reinagel P amp Reid R C (2000) Low response variability in simulta-neously recorded retinal thalamic and cortical neurons Neuron 27 635ndash646

Lewen G D Bialek W amp de Ruyter van Steveninck R R (2001)Neural codingof naturalistic motion stimuli Network Comput Neural Syst 12 317ndash329 Seealso physics0103088

Mainen Z F amp Sejnowski T J (1995) Reliability of spike timing in neocorticalneurons Science 268 1503ndash1506

Paninski L (2003a) Convergence properties of three spikendashtriggered analysistechniques Network Compt in Neural Systems 14 437ndash464

Paninski L (2003b) Estimation of entropy and mutual information NeuralComputation 15 1191ndash1253

Panzeri S amp Treves A (1996) Analytical estimates of limited sampling biasesin different information measures Network Comput Neural Syst 7 87ndash107

Pola G Schultz S R Petersen R amp Panzeri S (2002) A practical guide toinformation analysis of spike trains In R Kotter (Ed) Neuroscience databasesA practical guide (pp 137ndash152) Norwell MA Kluwer

Press W H Teukolsky S A Vetterling W T amp Flannery B P (1992) Numericalrecipes in C The art of scientic computing Cambridge Cambridge UniversityPress

Reinagel P amp Reid R C (2000) Temporal coding of visual information in thethalamus J Neurosci 20 5392ndash5400

Rieke F Bodnar D A amp Bialek W (1995) Naturalistic stimuli increase therate and efciency of information transmission byprimary auditory afferentsProc R Soc Lond B 262 259ndash265

Rieke F Warland D de Ruyter van Steveninck R R amp Bialek W (1997)Spikes Exploring the neural code Cambridge MA MIT Press

Ringach D L Hawken M J amp Shapley R (2002) Receptive eld structure ofneurons in monkey visual cortex revealed by stimulation with natural imagesequences Journal of Vision 2 12ndash24

Ringach D L Sapiro G amp Shapley R (1997) A subspace reverse-correlationtechnique for the study of visual neurons Vision Res 37 2455ndash2464

Rolls E T Aggelopoulos N C amp Zheng F (2003) The receptive elds ofinferior temporal cortex neurons in natural scenes J Neurosci 23 339ndash348

Ruderman D L (1994) The statistics of natural images Network Compt NeuralSyst 5 517ndash548

Ruderman D L amp Bialek W (1994) Statistics of natural images Scaling in thewoods Phys Rev Lett 73 814ndash817

Schwartz O Chichilnisky E J amp Simoncelli E (2002) Characterizing neuralgain control using spike-triggered covariance In T G Dietterich S Becker ampZ Ghahramani (Eds) Advances in Neural InformationProcessing (pp 269ndash276)Cambridge MA MIT Press

250 T Sharpee N Rust and W Bialek

Sen K Theunissen F E amp Doupe A J (2001) Feature analysis of naturalsounds in the songbird auditory forebrain J Neurophysiol 86 1445ndash1458

Simoncelli E amp Olshausen B A (2001) Natural image statistics and neuralrepresentation Annu Rev Neurosci 24 1193ndash1216

Smirnakis S M Berry M J Warland D K Bialek W amp Meister M (1996)Adaptation of retinal processing to image contrast and spatial scale Nature386 69ndash73

Smyth D Willmore B Baker G E Thompson I D amp Tolhurst D J (2003)The receptive-eld organization of simple cells in primary visual cortex offerrets under natural scene stimulation J Neurosci 23 4746ndash4759

Stanley G B Li F F amp Dan Y (1999) Reconstruction of natural scenes fromensemble responses in the lateral geniculate nucleus J Neurosci 19 8036ndash8042

Strong S P Koberle R de Ruyter van Steveninck R R amp Bialek W (1998)Entropy and information in neural spike trains Phys Rev Lett 80 197ndash200

Theunissen F E Sen K amp Doupe A J (2000) Spectral-temporal receptiveelds of nonlinear auditory neurons obtained using natural sounds J Neu-rosci 20 2315ndash2331

Tishby N Pereira F C amp Bialek W (1999)The information bottleneck methodIn B Hajek amp R S Sreenivas (Eds) Proceedings of the 37th Allerton Conferenceon Communication Control and Computing (pp 368ndash377) Urbana Universityof Illinois See also physics0004057

Touryan J Lau B amp Dan Y (2002) Isolation of relevant visual features fromrandom stimuli for cortical complex cells J Neurosci 22 10811ndash10818

Treves A amp Panzeri S (1995) The upward bias in measures of informationderived from limited data samples Neural Comp 7 399ndash407

von der Twer T amp Macleod D I A (2001) Optimal nonlinear codes for theperception of natural colours Network Comput Neural Syst 12 395ndash407

Vickers N J Christensen T A Baker T amp Hildebrand J G (2001) Odour-plume dynamics inuence the brainrsquos olfactory code Nature 410 466ndash470

Vinje W E amp Gallant J L (2000) Sparse coding and decorrelation in primaryvisual cortex during natural vision Science 287 1273ndash1276

Vinje W E amp Gallant J L (2002) Natural stimulation of the nonclassical re-ceptive eld increases information transmission efciency in V1 J Neurosci22 2904ndash2915

Voss R F amp Clarke J (1975) ldquo1f noiserdquo in music and speech Nature 317ndash318Weliky M Fiser J Hunt R amp Wagner D N (2003) Coding of natural scenes

in primary visual cortex Neuron 37 703ndash718

Received January 21 2003 accepted August 6 2003

Page 7: Analyzing Neural Responses to Natural Signals: Maximally ...wbialek/rome/refs/sharpee+al_04.pdfAnalyzing Neural Responses to Natural Signals 225 Figure 1: Schematic illustration of

Analyzing Neural Responses to Natural Signals 229

2 Information as an Objective Function

When analyzing neural responses, we compare the a priori probability distribution of all presented stimuli with the probability distribution of stimuli that lead to a spike (de Ruyter van Steveninck & Bialek, 1988). For gaussian signals, the probability distribution can be characterized by its second moment, the covariance matrix. However, an ensemble of natural stimuli is not gaussian, so that in a general case neither the second nor any finite number of moments is sufficient to describe the probability distribution. In this situation, Shannon information provides a rigorous way of comparing two probability distributions. The average information carried by the arrival time of one spike is given by (Brenner, Strong, et al., 2000)

$$ I_{\rm spike} = \int d\mathbf{s}\, P(\mathbf{s}|{\rm spike}) \log_2\!\left[\frac{P(\mathbf{s}|{\rm spike})}{P(\mathbf{s})}\right], \qquad (2.1) $$

where ds denotes integration over the full D-dimensional stimulus space. The information per spike, as written in equation 2.1, is difficult to estimate experimentally, since it requires either sampling of the high-dimensional probability distribution P(s|spike) or a model of how spikes were generated, that is, the knowledge of the low-dimensional RS. However, it is possible to calculate I_spike in a model-independent way if stimuli are presented multiple times to estimate the probability distribution P(spike|s). Then

$$ I_{\rm spike} = \left\langle \frac{P({\rm spike}|\mathbf{s})}{P({\rm spike})} \log_2\!\left[\frac{P({\rm spike}|\mathbf{s})}{P({\rm spike})}\right] \right\rangle_{\mathbf{s}}, \qquad (2.2) $$

where the average is taken over all presented stimuli. This can be useful in practice (Brenner, Strong, et al., 2000), because we can replace the ensemble average ⟨···⟩_s with a time average and P(spike|s) with the time-dependent spike rate r(t). Note that for a finite data set of N repetitions, the obtained value I_spike(N) will on average be larger than I_spike(∞). The true value I_spike can be found by either subtracting an expected bias value, which is of the order of ∼ 1/[2N P(spike) ln 2] (Treves & Panzeri, 1995; Panzeri & Treves, 1996; Pola, Schultz, Petersen, & Panzeri, 2002; Paninski, 2003b), or extrapolating to N → ∞ (Brenner, Strong, et al., 2000; Strong, Koberle, de Ruyter van Steveninck, & Bialek, 1998). Measurement of I_spike in this way provides a model-independent benchmark against which we can compare any description of the neuron's input-output relation.
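As a practical illustration of equation 2.2, the following sketch (ours, not part of the original analysis) computes the time-average estimate from a peristimulus time histogram; the array `rate`, holding the spike rate in each time bin, is a hypothetical input:

```python
import numpy as np

def info_per_spike(rate):
    """Estimate I_spike in bits from a time-varying rate r(t) (eq. 2.2):
    I_spike = < (r/rbar) log2(r/rbar) >_t, averaged over time bins."""
    rate = np.asarray(rate, dtype=float)
    ratio = rate / rate.mean()
    # x log2(x) -> 0 as x -> 0, so empty bins contribute nothing
    terms = np.where(ratio > 0, ratio * np.log2(np.where(ratio > 0, ratio, 1.0)), 0.0)
    return terms.mean()

# A rate concentrated in one of eight bins carries log2(8) = 3 bits per spike;
# a constant rate carries none.
print(info_per_spike([0, 0, 8, 0, 0, 0, 0, 0]))   # 3.0
print(info_per_spike([1, 1, 1, 1, 1, 1, 1, 1]))   # 0.0
```

The finite-sampling bias discussed above applies to this estimator as well, so in practice the subtraction or extrapolation step is still needed.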

Our assumption is that spikes are generated according to a projection onto a low-dimensional subspace. Therefore, to characterize the relevance of a particular direction v in the stimulus space, we project all of the presented stimuli onto v and form probability distributions P_v(x) and P_v(x|spike) of projection values x for the a priori stimulus ensemble and that conditional on a spike, respectively:

$$ P_v(x) = \langle \delta(x - \mathbf{s}\cdot\mathbf{v}) \rangle_{\mathbf{s}}, \qquad (2.3) $$
$$ P_v(x|{\rm spike}) = \langle \delta(x - \mathbf{s}\cdot\mathbf{v}) \,|\, {\rm spike} \rangle_{\mathbf{s}}, \qquad (2.4) $$

where δ(x) is a delta function. In practice, both the average ⟨···⟩_s ≡ ∫ds ··· P(s) over the a priori stimulus ensemble and the average ⟨···|spike⟩_s ≡ ∫ds ··· P(s|spike) over the ensemble conditional on a spike are calculated by binning the range of projection values x. The probability distributions are then obtained as histograms, normalized in such a way that the sum over all bins gives 1. The mutual information between spike arrival times and the projection x, by analogy with equation 2.1, is

$$ I(\mathbf{v}) = \int dx\, P_v(x|{\rm spike}) \log_2\!\left[\frac{P_v(x|{\rm spike})}{P_v(x)}\right], \qquad (2.5) $$

which is also the Kullback-Leibler divergence D[P_v(x|spike)||P_v(x)]. Notice that this information is a function of the direction v. The information I(v) provides an invariant measure of how much the occurrence of a spike is determined by projection on the direction v. It is a function only of direction in the stimulus space and does not change when the vector v is multiplied by a constant. This can be seen by noting that for any probability distribution and any constant c, P_{cv}(x) = c^{-1} P_v(x/c) (see also theorem 9.6.4 of Cover & Thomas, 1991). When evaluated along any vector v, I(v) ≤ I_spike. The total information I_spike can be recovered along one particular direction only if v = ê1, and only if the RS is one-dimensional.
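To make the binning procedure concrete, here is a minimal numpy sketch (ours; the choice of 25 bins and the use of per-frame spike counts as histogram weights are assumptions, not prescriptions from the text) of the estimator defined by equations 2.3 to 2.5:

```python
import numpy as np

def info_along_direction(stimuli, spike_counts, v, n_bins=25):
    """I(v) in bits (eq. 2.5): the KL divergence between the histogram of
    projections x = s.v conditional on a spike and the prior histogram.
    stimuli: (N, D) array of frames; spike_counts: (N,) spikes per frame."""
    x = stimuli @ v
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    p_prior, _ = np.histogram(x, bins=edges)
    p_spike, _ = np.histogram(x, bins=edges, weights=spike_counts)
    p_prior = p_prior / p_prior.sum()
    p_spike = p_spike / p_spike.sum()
    ok = (p_spike > 0) & (p_prior > 0)
    return np.sum(p_spike[ok] * np.log2(p_spike[ok] / p_prior[ok]))
```

Because I(v) is invariant under rescaling of v, the trial vector can be normalized to unit length purely for numerical convenience.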

By analogy with equation 2.5, one could also calculate the information I(v1, ..., vn) along a set of several directions {v1, ..., vn}, based on the multipoint probability distributions of projection values x1, x2, ..., xn along the vectors v1, v2, ..., vn of interest:

$$ P_{v_1,\dots,v_n}(\{x_i\}|{\rm spike}) = \left\langle \prod_{i=1}^{n} \delta(x_i - \mathbf{s}\cdot\mathbf{v}_i) \,\Big|\, {\rm spike} \right\rangle_{\mathbf{s}}, \qquad (2.6) $$
$$ P_{v_1,\dots,v_n}(\{x_i\}) = \left\langle \prod_{i=1}^{n} \delta(x_i - \mathbf{s}\cdot\mathbf{v}_i) \right\rangle_{\mathbf{s}}. \qquad (2.7) $$

If we are successful in finding all of the directions {ê_i}, 1 ≤ i ≤ K, contributing to the input-output relation, equation 1.1, then the information evaluated in this subspace will be equal to the total information I_spike. When we calculate information along a set of K vectors that are slightly off from the RS, the answer is of course smaller than I_spike and is initially quadratic in small deviations δv_i. One can therefore hope to find the RS by maximizing information with respect to K vectors simultaneously. The information does not increase if more vectors outside the RS are included. For uncorrelated stimuli, any vector or set of vectors that maximizes I(v) belongs to the RS. On the other hand, as discussed in appendix B, the result of optimization with respect to a number of vectors k < K may deviate from the RS if stimuli are correlated. To find the RS, we first maximize I(v) and compare this maximum with I_spike, which is estimated according to equation 2.2. If the difference exceeds that expected from finite sampling corrections, we increment the number of directions with respect to which information is simultaneously maximized.

3 Optimization Algorithm

In this section, we describe a particular algorithm we used to look for the most informative dimensions in order to find the relevant subspace. We make no claim that our choice of algorithm is the most efficient. However, it does give reproducible results for different starting points and for spike trains with differences taken to simulate neural noise. Overall, the choices for an algorithm are broad because the information I(v) as defined by equation 2.5 is a continuous function whose gradient can be computed. We find (see appendix C for a derivation)

$$ \nabla_{\mathbf{v}} I = \int dx\, P_v(x) \left[ \langle \mathbf{s}|x,{\rm spike}\rangle - \langle \mathbf{s}|x\rangle \right] \cdot \frac{d}{dx}\!\left[\frac{P_v(x|{\rm spike})}{P_v(x)}\right], \qquad (3.1) $$

where
$$ \langle \mathbf{s}|x,{\rm spike}\rangle = \frac{1}{P(x|{\rm spike})} \int d\mathbf{s}\, \mathbf{s}\, \delta(x - \mathbf{s}\cdot\mathbf{v})\, P(\mathbf{s}|{\rm spike}), \qquad (3.2) $$

and similarly for ⟨s|x⟩. Since information does not change with the length of the vector, we have v·∇_v I = 0, as can also be seen directly from equation 3.1.
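A discretized version of equations 3.1 and 3.2 can be sketched as follows (ours; the binning and the finite-difference derivative over occupied bins are implementation choices the text leaves open):

```python
import numpy as np

def info_gradient(stimuli, spike_counts, v, n_bins=25):
    """Finite-data estimate of grad_v I (eq. 3.1), up to the overall constant
    that converts to bits. Returns a D-dimensional vector."""
    stimuli = np.asarray(stimuli, float)
    spike_counts = np.asarray(spike_counts, float)
    x = stimuli @ v
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    dx = edges[1] - edges[0]
    idx = np.clip(np.digitize(x, edges) - 1, 0, n_bins - 1)

    D = stimuli.shape[1]
    p_prior = np.zeros(n_bins)
    p_spike = np.zeros(n_bins)
    s_prior = np.zeros((n_bins, D))      # accumulates sums for <s|x>
    s_spike = np.zeros((n_bins, D))      # accumulates sums for <s|x, spike>
    np.add.at(p_prior, idx, 1.0)
    np.add.at(p_spike, idx, spike_counts)
    np.add.at(s_prior, idx, stimuli)
    np.add.at(s_spike, idx, spike_counts[:, None] * stimuli)

    ok = (p_prior > 0) & (p_spike > 0)
    mean_prior = s_prior[ok] / p_prior[ok, None]
    mean_spike = s_spike[ok] / p_spike[ok, None]
    ratio = (p_spike[ok] / p_spike.sum()) / (p_prior[ok] / p_prior.sum())
    weight = p_prior[ok] / p_prior.sum()          # P_v(x) per occupied bin

    dratio = np.gradient(ratio, dx)               # d/dx [P_v(x|spike)/P_v(x)]
    return ((mean_spike - mean_prior) * (weight * dratio)[:, None]).sum(axis=0)
```

Projecting the returned gradient onto the subspace orthogonal to v (which the analytic result v·∇_v I = 0 licenses) keeps the subsequent line maximizations from simply rescaling the vector.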

As an optimization algorithm, we have used a combination of gradient ascent and simulated annealing: successive line maximizations were done along the direction of the gradient (Press, Teukolsky, Vetterling, & Flannery, 1992). During line maximizations, a point with a smaller value of information was accepted according to Boltzmann statistics, with probability exp[(I(v_{i+1}) − I(v_i))/T]. The effective temperature T is reduced by a factor of (1 − ε_T) upon completion of each line maximization. The parameters of the simulated annealing algorithm to be adjusted are the starting temperature T_0 and the cooling rate ε_T, ΔT = −ε_T T. When maximizing with respect to one vector, we used values T_0 = 1 and ε_T = 0.05. When maximizing with respect to two vectors, we either used the cooling schedule with ε_T = 0.005 and repeated it several times (four times in our case) or allowed the effective temperature T to increase by a factor of 10 upon convergence to a local maximum (keeping T ≤ T_0 always) while limiting the total number of line maximizations.
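In outline, the acceptance rule and cooling schedule just described look like this (our sketch; `line_maximize` is a crude stand-in for the line search of Press et al., 1992, and `info` and `gradient` would be the estimators of equations 2.5 and 3.1):

```python
import numpy as np

def line_maximize(v, direction, info, n_steps=11):
    """Stand-in line search: evaluate I along v + a*direction and keep the best."""
    candidates = [v + a * direction for a in np.linspace(-1.0, 1.0, n_steps)]
    return max(candidates, key=info)

def maximize_information(v0, info, gradient, T0=1.0, eps_T=0.05,
                         n_iter=200, rng=None):
    """Gradient ascent on I(v) with simulated-annealing acceptance:
    a downhill move is accepted with probability exp(dI / T)."""
    rng = np.random.default_rng() if rng is None else rng
    v = v0 / np.linalg.norm(v0)
    T, I_cur = T0, info(v)
    for _ in range(n_iter):
        v_new = line_maximize(v, gradient(v), info)
        v_new = v_new / np.linalg.norm(v_new)      # I(v) ignores the length of v
        I_new = info(v_new)
        dI = I_new - I_cur
        if dI > 0 or rng.random() < np.exp(dI / T):
            v, I_cur = v_new, I_new
        T *= 1.0 - eps_T                            # cooling schedule
    return v
```

For two vectors, the slower schedule (ε_T = 0.005) with restarts or reheating mentioned above would replace the single cooling loop.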


Figure 2: The probability distribution P(I/I_spike) of information values, in units of the total information per spike, in the case of (a) uncorrelated binary noise stimuli, (b) correlated gaussian noise with the power spectrum of natural scenes, and (c) stimuli derived from natural scenes (patches of photos). The distribution was obtained by calculating information along 10^5 random vectors for a model cell with one relevant dimension. Note the different scales in the two panels.

The problem of maximizing a function often is related to the problem of making a good initial guess. It turns out, however, that the choice of a starting point is much less crucial in cases where the stimuli are correlated. To illustrate this point, we plot in Figure 2 the probability distribution of information along random directions v, for both white noise and naturalistic stimuli, in a model with one relevant dimension. For uncorrelated stimuli, not only is information equal to zero for a vector that is perpendicular to the relevant subspace, but in addition the derivative is equal to zero. Since a randomly chosen vector has on average a small projection on the relevant subspace (compared to its length), v_r/|v| ∼ √(n/d), the corresponding information can be found by expanding in v_r/|v|:

$$ I \approx \frac{v_r^2}{2|\mathbf{v}|^2} \int dx\, P_{\hat e_{ir}}(x) \left(\frac{P'_{\hat e_{ir}}(x)}{P_{\hat e_{ir}}(x)}\right)^{2} \left[\langle s_{\hat e_r}|{\rm spike}\rangle - \langle s_{\hat e_r}\rangle\right]^2, \qquad (3.3) $$

where the vector v = v_r ê_r + v_{ir} ê_{ir} is decomposed into its components inside and outside the RS, respectively. The average information for a random vector is therefore ∼ ⟨v_r²⟩/|v|² = K/D.

In cases where stimuli are drawn from a gaussian ensemble with correlations, the expression for the information values has a similar structure to equation 3.3. To see this, we transform to Fourier space and normalize each Fourier component by the square root of the power spectrum S(k).

In this new basis, both the vectors {ê_i}, 1 ≤ i ≤ K, forming the RS and the randomly chosen vector v along which information is being evaluated are to be multiplied by √S(k). Thus, if we now substitute for the dot product v_r² the convolution weighted by the power spectrum, Σ_i^K (v ∗ ê_i)², where

$$ v * \hat e_i = \frac{\sum_k v(k)\,\hat e_i(k)\, S(k)}{\sqrt{\sum_k v^2(k) S(k)}\ \sqrt{\sum_k \hat e_i^2(k) S(k)}}, \qquad (3.4) $$

then equation 3.3 will describe information values along randomly chosen vectors v for correlated gaussian stimuli with the power spectrum S(k). Although both v_r and v(k) are gaussian variables with variance ∼ 1/D, the weighted convolution has not only a much larger variance but is also strongly nongaussian (the nongaussian character is due to the normalizing factor Σ_k v²(k)S(k) in the denominator of equation 3.4). As for the variance, it can be estimated as ⟨(v ∗ ê_1)²⟩ = 4π/ln²(D) in cases where stimuli are taken as patches of correlated gaussian noise with the two-dimensional power spectrum S(k) = A/k². The large values of the weighted dot product v ∗ ê_i, 1 ≤ i ≤ K, result not only in significant information values along a randomly chosen vector, but also in large magnitudes of the derivative ∇I, which is no longer dominated by noise, contrary to the case of uncorrelated stimuli. In the end, we find that randomly choosing one of the presented frames as a starting guess is sufficient.

4 Results

We tested the scheme of looking for the most informative dimensions on model neurons that respond to stimuli derived from natural scenes and sounds. As visual stimuli, we used scans across natural scenes, which were taken as black and white photos digitized to 8 bits, with no corrections made for the camera's light-intensity transformation function. Some statistical properties of the stimulus set are shown in Figure 3. Qualitatively, they reproduce the known results on the statistics of natural scenes (Ruderman & Bialek, 1994; Ruderman, 1994; Dong & Atick, 1995; Simoncelli & Olshausen, 2001). The most important properties for this study are the strong spatial correlations, evident from the power spectrum S(k) plotted in Figure 3b, and the deviations of the probability distribution from a gaussian one. The nongaussian character can be seen in Figure 3c, where the probability distribution of intensities is shown, and in Figure 3d, which shows the distribution of projections on a Gabor filter (in what follows, projections such as s1 will be given in units of the corresponding standard deviations). Our goal is to demonstrate that although the correlations present in the ensemble are nongaussian, they can be removed successfully from the estimate of the vectors defining the RS.

Figure 3: Statistical properties of the visual stimulus ensemble. (a) One of the photos. Stimuli would be 30 × 30 patches taken from the overall photograph. (b) The power spectrum, in units of the light intensity variance σ_I², averaged over orientation, as a function of the dimensionless wave vector ka, where a is the pixel size. (c) The probability distribution of light intensity, in units of σ_I. (d) The probability distribution of projections between stimuli and a Gabor filter, also in units of the corresponding standard deviation σ_{s1}.

4.1 A Model Simple Cell. Our first example is based on the properties of simple cells found in the primary visual cortex. A model phase- and orientation-sensitive cell has a single relevant dimension ê1, shown in Figure 4a. A given stimulus s leads to a spike if the projection s1 = s · ê1 reaches a threshold value θ in the presence of noise:

$$ \frac{P({\rm spike}|\mathbf{s})}{P({\rm spike})} \equiv f(s_1) = \langle H(s_1 - \theta + \xi) \rangle, \qquad (4.1) $$

where a gaussian random variable ξ of variance σ² models additive noise, and the function H(x) = 1 for x > 0 and zero otherwise. Together with the RF ê1, the parameters θ for the threshold and the noise variance σ² determine the input-output function.
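A model neuron of this kind can be simulated in a few lines (our sketch: the filter, the white-noise stimuli, and the random seed are placeholders; the paper itself uses the Gabor-like RF of Figure 4a and natural image patches):

```python
import numpy as np

rng = np.random.default_rng(0)

D = 900                                      # 30 x 30 patches, flattened
e1 = rng.standard_normal(D)                  # placeholder for the RF of Figure 4a
e1 /= np.linalg.norm(e1)
stimuli = rng.standard_normal((50_000, D))   # placeholder stimulus ensemble

s1 = stimuli @ e1
sigma_s1 = s1.std()
theta, sigma = 1.84 * sigma_s1, 0.31 * sigma_s1   # parameters quoted in Figure 4

# Eq. 4.1: a spike is fired when s1 - theta + noise crosses zero
xi = sigma * rng.standard_normal(s1.shape)
spike_counts = (s1 - theta + xi > 0).astype(float)
print("spike probability per frame:", spike_counts.mean())
```

Feeding `stimuli` and `spike_counts` to the estimators sketched in sections 2 and 3 then reproduces the kind of analysis shown in Figure 4.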

Figure 4: Analysis of a model simple cell with the RF shown in (a). (b) The "exact" spike-triggered average v_sta. (c) An attempt to remove correlations according to the reverse correlation method, C_{a priori}^{-1} v_sta. (d) The normalized vector v̂_max found by maximizing information. (e) Convergence of the algorithm according to the information I(v) and the projection v̂ · ê1 between normalized vectors, as a function of the inverse effective temperature T^{-1}. (f) The probability of a spike, P(spike|s · v̂_max) (crosses), is compared to P(spike|s1) used in generating spikes (solid line). Parameters of the model are σ = 0.31 and θ = 1.84, both given in units of the standard deviation of s1, which is also the unit for the x-axis in (f).

When the spike-triggered average (STA), or reverse correlation function, is computed from the responses to correlated stimuli, the resulting vector will be broadened due to spatial correlations present in the stimuli (see Figure 4b). For stimuli that are drawn from a gaussian probability distribution, the effects of correlations could be removed by multiplying v_sta by the inverse of the a priori covariance matrix, according to the reverse correlation method, v̂_{Gaussian est} ∝ C_{a priori}^{-1} v_sta, equation A.2. However, this procedure tends to amplify noise. To separate errors due to neural noise from those due to the nongaussian character of correlations, note that in a model the effect of neural noise on our estimate of the STA can be eliminated by averaging the presented stimuli weighted with the exact firing rate, as opposed to using a histogram of responses to estimate P(spike|s) from a finite set of trials. We have used this "exact" STA,

$$ \mathbf{v}_{\rm sta} = \int d\mathbf{s}\, \mathbf{s}\, P(\mathbf{s}|{\rm spike}) = \frac{1}{P({\rm spike})} \int d\mathbf{s}\, P(\mathbf{s})\, \mathbf{s}\, P({\rm spike}|\mathbf{s}), \qquad (4.2) $$

in the calculations presented in Figures 4b and 4c. Even with this noiseless STA (the equivalent of collecting an infinite data set), the standard decorrelation procedure is not valid for nongaussian stimuli and nonlinear input-output functions, as discussed in detail in appendix A. The result of such a decorrelation in our example is shown in Figure 4c. It is clearly missing some of the structure in the model filter, with projection ê1 · v̂_{Gaussian est} ≈ 0.14. The discrepancy is not due to neural noise or finite sampling, since the "exact" STA was decorrelated; the absence of noise in the exact STA also means that there would be no justification for smoothing the results of the decorrelation. The discrepancy between the true RF and the decorrelated STA increases with the strength of the nonlinearity in the input-output function.

In contrast, it is possible to obtain a good estimate of the relevant dimension ê1 by maximizing information directly (see Figure 4d). A typical progress of the simulated annealing algorithm with decreasing temperature T is shown in Figure 4e. There we plot both the information along the vector and its projection on ê1. We note that while the information I remains almost constant, the value of the projection continues to improve. Qualitatively, this is because the probability distributions depend exponentially on information. The final value of the projection depends on the size of the data set, as discussed below. In the example shown in Figure 4, there were ≈ 50,000 spikes with an average probability of spike ≈ 0.05 per frame, and the reconstructed vector has a projection v̂_max · ê1 = 0.920 ± 0.006. Having estimated the RF, one can proceed to sample the nonlinear input-output function. This is done by constructing histograms for P(s · v̂_max) and P(s · v̂_max|spike) of projections onto the vector v̂_max found by maximizing information, and taking their ratio, as in equation 1.2. In Figure 4f, we compare P(spike|s · v̂_max) (crosses) with the probability P(spike|s1) used in the model (solid line).

4.2 Estimated Deviation from the Optimal Dimension. When information is calculated from a finite data set, the (normalized) vector v̂ which maximizes I will deviate from the true RF ê1. The deviation δv = v̂ − ê1 arises because the probability distributions are estimated from experimental histograms and differ from the distributions found in the limit of infinite data size. For a simple cell, the quality of reconstruction can be characterized by the projection v̂ · ê1 = 1 − ½δv², where both v̂ and ê1 are normalized, and δv is by definition orthogonal to ê1. The deviation δv ∼ A^{-1}∇I, where A is the Hessian of information. Its structure is similar to that of a covariance matrix:

$$ A_{ij} = \frac{1}{\ln 2} \int dx\, P(x|{\rm spike}) \left(\frac{d}{dx}\ln\frac{P(x|{\rm spike})}{P(x)}\right)^{2} \left[\langle s_i s_j|x\rangle - \langle s_i|x\rangle\langle s_j|x\rangle\right], \qquad (4.3) $$

When averaged over possible outcomes of N trials, the gradient of information is zero for the optimal direction. Here, in order to evaluate ⟨δv²⟩ = Tr[A^{-1}⟨∇I ∇I^T⟩A^{-1}], we need to know the variance of the gradient of I. Assuming that the probability of generating a spike is independent for different bins, we can estimate ⟨∇I_i ∇I_j⟩ ∼ A_ij/(N_spike ln 2). Therefore, the expected error in the reconstruction of the optimal filter is inversely proportional to the number of spikes. The corresponding expected value of the projection between the reconstructed vector and the relevant direction ê1 is given by

$$ \hat{\mathbf{v}}\cdot\hat e_1 \approx 1 - \tfrac{1}{2}\langle \delta v^2\rangle = 1 - \frac{{\rm Tr}'[A^{-1}]}{2 N_{\rm spike} \ln 2}, \qquad (4.4) $$

where Tr' means that the trace is taken in the subspace orthogonal to the model filter.⁵ The estimate in equation 4.4 can be calculated without knowledge of the underlying model; it is ∼ D/(2N_spike). This behavior should also hold in cases where the stimulus dimensions are expanded to include time. The errors are expected to increase in proportion to the increased dimensionality. In the case of a complex cell with two relevant dimensions, the error is expected to be twice that for a cell with a single relevant dimension, as also discussed in section 4.3.
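As a rough order-of-magnitude check (our arithmetic, using the D/(2N_spike) scaling rather than the actual trace of A^{-1}), the 30 × 30 patches and the ≈ 50,000 spikes of the simple-cell example give

$$ \hat{\mathbf{v}}\cdot\hat e_1 \sim 1 - \frac{D}{2N_{\rm spike}} = 1 - \frac{900}{2\times 50{,}000} \approx 0.99, $$

so the measured value of 0.920 ± 0.006 reflects the detailed structure of the Hessian A rather than the raw dimensionality alone.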

We emphasize that the error estimate according to equation 4.4 is of the same order as the errors of the reverse correlation method when it is applied to gaussian ensembles. The latter are given by (Tr[C^{-1}] − C^{-1}_{11})/[2N_spike ⟨f'²(s1)⟩]. Of course, if the reverse correlation method were to be applied to the nongaussian ensemble, the errors would be larger. In Figure 5, we show the result of simulations for various numbers of trials, and therefore N_spike. The average projection of the normalized reconstructed vector v̂ on the RF ê1 behaves initially as 1/N_spike (dashed line). For smaller data sets, in this case N_spike ≲ 30,000, corrections ∼ N_spike^{-2} become important for estimating the expected errors of the algorithm. Happily, these corrections have a sign such that smaller data sets are more effective than one might have expected from the asymptotic calculation. This can be verified from the expansion v̂ · ê1 = [1 − δv²]^{-1/2} ≈ 1 − ½⟨δv²⟩ + ⅜⟨δv⁴⟩, where only the first two terms were taken into account in equation 4.4.

⁵ By definition, δv₁ = δv · ê₁ = 0, and therefore ⟨δv₁²⟩ ∝ A^{-1}_{11} is to be subtracted from ⟨δv²⟩ ∝ Tr[A^{-1}]. Because ê₁ is an eigenvector of A with zero eigenvalue, A^{-1}_{11} is infinite. Therefore, the proper treatment is to take the trace in the subspace orthogonal to ê₁.

Figure 5: Projection of the vector v̂_max that maximizes information on the RF ê1, plotted as a function of the number of spikes (the ordinate is ê1 · v̂_max; the abscissa is 1/N_spike). The solid line is a quadratic fit in 1/N_spike, and the dashed line is the leading linear term in 1/N_spike. This set of simulations was carried out for the model visual neuron with one relevant dimension from Figure 4a and the input-output function (see equation 4.1) with parameter values σ ≈ 0.61σ_{s1}, θ ≈ 0.61σ_{s1}. For this model neuron, the linear approximation for the expected error is applicable for N_spike ≳ 30,000.

4.3 A Model Complex Cell. A sequence of spikes from a model cell with two relevant dimensions was simulated by projecting each of the stimuli on vectors that differ by π/2 in their spatial phase, taken to mimic properties of complex cells, as in Figure 6. A particular frame leads to a spike according to a logical OR, that is, if either |s1| or |s2| exceeds a threshold value θ in the presence of noise, where s1 = s · ê1 and s2 = s · ê2. Similarly to equation 4.1,

$$ \frac{P({\rm spike}|\mathbf{s})}{P({\rm spike})} = f(s_1, s_2) = \left\langle H(|s_1| - \theta - \xi_1) \vee H(|s_2| - \theta - \xi_2) \right\rangle, \qquad (4.5) $$

where ξ1 and ξ2 are independent gaussian variables. The sampling of this input-output function by our particular set of natural stimuli is shown in Figure 6c. As is well known, reverse correlation fails in this case because the spike-triggered average stimulus is zero, although with gaussian stimuli the spike-triggered covariance method would recover the relevant dimensions (Touryan et al., 2002). Here we show that searching for maximally informative dimensions allows us to recover the relevant subspace even under more natural stimulus conditions.

Figure 6: Analysis of a model complex cell with relevant dimensions ê1 and ê2, shown in (a) and (b), respectively. Spikes are generated according to an "OR" input-output function f(s1, s2) with the threshold θ ≈ 0.61σ_{s1} and noise standard deviation σ = 0.31σ_{s1}. (c, d) Vectors v1 and v2 found by maximizing information I(v1, v2).

We start by maximizing information with respect to one vector. Contrary to the result in Figure 4e for a simple cell, one optimal dimension recovers only about 60% of the total information per spike (see equation 2.2). Perhaps surprisingly, because of the strong correlations in natural scenes, even a projection onto a random vector in the D ∼ 10³-dimensional stimulus space has a high probability of explaining 60% of the total information per spike, as can be seen in Figure 2. We therefore go on to maximize information with respect to two vectors. As a result of maximization, we obtain two vectors v1 and v2, shown in Figure 6. The information along them is I(v1, v2) ≈ 0.90, which is within the range of information values obtained along different linear combinations of the two model vectors, I(ê1, ê2)/I_spike = 0.89 ± 0.11. Therefore, the description of the neuron's firing in terms of vectors v1 and v2 is complete up to the noise level, and we do not have to look for extra relevant dimensions. Practically, the number of relevant dimensions can be determined by comparing I(v1, v2) to either I_spike or I(v1, v2, v3), the latter being the result of maximization with respect to three vectors simultaneously. As mentioned in section 1, information along a set of vectors does not increase when extra dimensions are added to the relevant subspace. Therefore, if I(v1, v2) = I(v1, v2, v3) (again, up to the noise level), this means that there are only two relevant dimensions. Using I_spike for comparison with I(v1, v2) has the advantage of not having to look for an extra dimension, which can be computationally intensive. However, I_spike might be subject to larger systematic bias errors than I(v1, v2, v3).
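The two-dimensional analog of the estimator sketched after equation 2.5 changes only the histogram step (again our sketch, with the bin count as an assumption):

```python
import numpy as np

def info_along_plane(stimuli, spike_counts, v1, v2, n_bins=15):
    """I(v1, v2) in bits from the joint distributions of the two projections
    (eqs. 2.6 and 2.7), estimated with two-dimensional histograms."""
    x1, x2 = stimuli @ v1, stimuli @ v2
    p_prior, xedges, yedges = np.histogram2d(x1, x2, bins=n_bins)
    p_spike, _, _ = np.histogram2d(x1, x2, bins=[xedges, yedges],
                                   weights=spike_counts)
    p_prior = p_prior / p_prior.sum()
    p_spike = p_spike / p_spike.sum()
    ok = (p_spike > 0) & (p_prior > 0)
    return np.sum(p_spike[ok] * np.log2(p_spike[ok] / p_prior[ok]))
```

Comparing this value with I_spike, or with a three-vector estimate, implements the stopping criterion described above; the number of bins per dimension has to shrink as dimensions are added so that the joint histograms stay adequately sampled.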

Vectors v1 and v2 obtained by maximizing I(v1, v2) are not exactly orthogonal and are also rotated with respect to ê1 and ê2. However, the quality of the reconstruction, as well as the value of the information I(v1, v2), is independent of the particular choice of basis within the RS. The appropriate measure of similarity between the two planes is the dot product of their normals. In the example of Figure 6, n̂(ê1, ê2) · n̂(v1, v2) = 0.82 ± 0.07, where n̂(ê1, ê2) is the normal to the plane passing through the vectors ê1 and ê2. Maximizing information with respect to two dimensions requires a significantly slower cooling rate and consequently longer computational times. However, the expected error in the reconstruction, 1 − n̂(ê1, ê2) · n̂(v1, v2), scales as 1/N_spike, a behavior similar to equation 4.4, and is roughly twice that for a simple cell given the same number of spikes. We make vectors v1 and v2 orthogonal to each other upon completion of the algorithm.

4.4 A Model Auditory Neuron with One Relevant Dimension. Because stimuli s are treated as vectors in an abstract space, the method of looking for the most informative dimensions can be applied equally well to auditory as well as to visual neurons. Here we illustrate the method by considering a model auditory neuron with one relevant dimension, which is shown in Figure 7c and is taken to mimic the properties of cochlear neurons. The model neuron is probed by two ensembles of naturalistic stimuli: one is a recording of a native Russian speaker reading a piece of Russian prose, and the other is a recording of a piece of English prose read by a native English speaker. Both ensembles are nongaussian and exhibit amplitude distributions with long, nearly exponential tails (see Figure 7a), which are qualitatively similar to those of light intensities in natural scenes (Voss & Clarke, 1975; Ruderman, 1994). However, the power spectrum is different in the two cases, as can be seen in Figure 7b. The differences in the correlation structure, in particular, lead to different STAs across the two ensembles (cf. Figure 7d). Both of the STAs also deviate from the model filter, shown in Figure 7c.

Despite differences in the probability distributions P(s), it is possible to recover the relevant dimension of the model neuron by maximizing information. In Figure 7e, we show the two most informative vectors found by running the algorithm for the two ensembles and replot the model filter from Figure 7c to show that the three vectors overlap almost perfectly. Thus, different nongaussian correlations can be successfully removed to obtain an estimate of the relevant dimension. If the most informative vector changes with the stimulus ensemble, this can be interpreted as caused by adaptation to the probability distribution.

Figure 7: A model auditory neuron is probed by two natural ensembles of stimuli: a piece of English prose (1) and a piece of Russian prose (2). The size of the stimulus ensemble was the same in both cases, and the sampling rate was 44.1 kHz. (a) The probability distribution of the sound pressure amplitude, in units of standard deviation, for both ensembles is strongly nongaussian. (b) The power spectra for the two ensembles. (c) The relevant vector of the model neuron, of dimensionality D = 500. (d) The STA is broadened in both cases but differs between the two cases due to differences in the power spectra of the two ensembles. (e) Vectors that maximize information for either of the ensembles overlap almost perfectly with each other and with the model filter, which is also replotted here from (c). (f) The probability of a spike, P(spike|s · v̂_max) (crosses), is compared to P(spike|s1) used in generating spikes (solid line). The input-output function had parameter values σ ≈ 0.9σ_{s1} and θ ≈ 1.8σ_{s1}.

5 Summary

Features of the stimulus that are most relevant for generating the response of a neuron can be found by maximizing the information between the sequence of responses and the projection of stimuli on trial vectors within the stimulus space. Calculated in this manner, information becomes a function of direction in stimulus space. Those vectors that maximize the information and account for the total information per response of interest span the relevant subspace. The method allows multiple dimensions to be found. The reconstruction of the relevant subspace is done without assuming a particular form of the input-output function. It can be strongly nonlinear within the relevant subspace and is estimated from experimental histograms for each trial direction independently. Most important, this method can be used with any stimulus ensemble, even those that are strongly nongaussian, as in the case of natural signals. We have illustrated the method on model neurons responding to natural scenes and sounds. We expect the current implementation of the method to be most useful when several most informative vectors (≤ 10, depending on their dimensionality) are to be analyzed for neurons probed by natural scenes. This technique could be particularly useful in describing sensory processing in poorly understood regions of higher-level sensory cortex (such as visual areas V2, V4, and IT and auditory cortex beyond A1), where white noise stimulation is known to be less effective than naturalistic stimuli.

Appendix A: Limitations of the Reverse Correlation Method

Here we examine what sort of deviations one can expect when applying the reverse correlation method to natural stimuli, even in the model with just one relevant dimension. There are two factors that, when combined, invalidate the reverse correlation method: the nongaussian character of correlations and the nonlinearity of the input-output function (Ringach, Sapiro, & Shapley, 1997). In its original formulation (de Boer & Kuyper, 1968), the neuron is probed by white noise, and the relevant dimension ê1 is given by the STA, ê1 ∝ ⟨s r(s)⟩. If the signals are not white, that is, the covariance matrix C_ij = ⟨s_i s_j⟩ is not a unit matrix, then the STA is a broadened version of the original filter ê1. This can be seen by noting that for any function F(s) of gaussian variables {s_i}, the identity holds:

$$ \langle s_i F(\mathbf{s})\rangle = \langle s_i s_j\rangle \langle \partial_j F(\mathbf{s})\rangle, \qquad \partial_j \equiv \frac{\partial}{\partial s_j}. \qquad ({\rm A.1}) $$

When property A.1 is applied to the vector components of the STA, ⟨s_i r(s)⟩ = C_ij⟨∂_j r(s)⟩. Since we work within the assumption that the firing rate is a (nonlinear) function of the projection onto one filter ê1, r(s) = r(s1), the latter average is proportional to the model filter itself, ⟨∂_j r⟩ = ê_{1j}⟨r'(s1)⟩. Therefore, we arrive at the prescription of the reverse correlation method:

$$ \hat e_{1i} \propto [C^{-1}]_{ij}\, \langle s_j\, r(\mathbf{s})\rangle. \qquad ({\rm A.2}) $$

The gaussian property is necessary in order to represent the STA as a convolution of the covariance matrix C_ij of the stimulus ensemble and the model filter. To understand how the reconstructed vector obtained according to equation A.2 deviates from the relevant one, we consider weakly nongaussian stimuli, with the probability distribution

$$ P_{nG}(\mathbf{s}) = \frac{1}{Z}\, P_0(\mathbf{s})\, e^{\epsilon H_1(\mathbf{s})}, \qquad ({\rm A.3}) $$

where P0(s) is the gaussian probability distribution with covariance matrix C, and the normalization factor is Z = ⟨e^{εH1(s)}⟩. The function H1 describes deviations of the probability distribution from gaussian, and therefore we will set ⟨s_i H1⟩ = 0 and ⟨s_i s_j H1⟩ = 0, since these averages can be accounted for in the gaussian ensemble. In what follows, we will keep only the first-order terms in the perturbation parameter ε. Using property A.1, we find the STA to be given by

$$ \langle s_i\, r\rangle_{nG} = \langle s_i s_j\rangle \left[\langle \partial_j r\rangle + \epsilon\, \langle r\, \partial_j H_1\rangle\right], \qquad ({\rm A.4}) $$

where the averages are taken with respect to the gaussian distribution. Similarly, the covariance matrix C_ij evaluated with respect to the nongaussian ensemble is given by

$$ C_{ij} = \frac{1}{Z}\,\langle s_i s_j\, e^{\epsilon H_1}\rangle = \langle s_i s_j\rangle + \epsilon\, \langle s_i s_k\rangle\langle s_j\, \partial_k H_1\rangle, \qquad ({\rm A.5}) $$

so that to first order in ε, ⟨s_i s_j⟩ = C_ij − ε C_ik⟨s_j ∂_k H1⟩. Combining this with equation A.4, we get

$$ \langle s_i\, r\rangle_{nG} = {\rm const} \times C_{ij}\,\hat e_{1j} + \epsilon\, C_{ij}\,\big\langle \big(r - s_1\langle r'\rangle\big)\, \partial_j H_1 \big\rangle. \qquad ({\rm A.6}) $$

The second term in equation A.6 prevents the application of the reverse correlation method to nongaussian signals. Indeed, if we multiply the STA, equation A.6, by the inverse of the a priori covariance matrix C_ij according to the reverse correlation method, equation A.2, we no longer obtain the RF ê1. The deviation of the obtained answer from the true RF increases with ε, which measures the deviation of the probability distribution from gaussian. Since natural stimuli are known to be strongly nongaussian, this makes the use of reverse correlation problematic when analyzing neural responses to natural stimuli.
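The comparison that follows (Figure 8) can be reproduced in outline with a short sketch (ours; `stimuli` would be either correlated gaussian samples or natural image patches with the same covariance, and the ridge regularization of the inverse is our own addition for numerical stability, not part of the method):

```python
import numpy as np

def decorrelated_sta(stimuli, spike_probs, ridge=1e-6):
    """'Exact' STA (eq. 4.2) followed by the decorrelation of eq. A.2.
    spike_probs holds P(spike|s) for each frame, so neural noise plays no role."""
    S = np.asarray(stimuli, float)
    w = np.asarray(spike_probs, float)
    sta = (w[:, None] * S).sum(axis=0) / w.sum()        # exact STA
    C = np.cov(S, rowvar=False)                         # a priori covariance
    C_reg = C + ridge * np.trace(C) / len(C) * np.eye(len(C))
    v = np.linalg.solve(C_reg, sta)                     # C^{-1} x STA
    return v / np.linalg.norm(v)
```

For the correlated gaussian ensemble this recovers the model filter up to normalization, while for natural patches it yields the distorted vector of Figure 8c; no regularization cures that, because the discrepancy comes from the second term in equation A.6 rather than from noise.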

The difference in applying the reverse correlation to stimuli drawn from a correlated gaussian ensemble versus a nongaussian one is illustrated in Figures 8b and 8c. In the first case, shown in Figure 8b, stimuli are drawn from a correlated gaussian ensemble with the covariance matrix equal to that of natural images. In the second case, shown in Figure 8c, the patches of photos are taken as stimuli. The STA is broadened in both cases. Although the two-point correlations are just as strong in the case of gaussian stimuli as they are in the natural stimuli ensemble, gaussian correlations can be successfully removed from the STA according to equation A.2 to obtain the model filter. On the contrary, an attempt to use reverse correlation with natural stimuli results in an altered version of the model filter. We reiterate that for this example, the apparent noise in the decorrelated vector is not due to neural noise or finite data sets, since the "exact" STA has been used (see equation 4.2) in all calculations presented in Figures 8 and 9.

Figure 8: The nongaussian character of the correlations present in natural scenes invalidates the reverse correlation method for neurons with a nonlinear input-output function. (a) A model visual neuron has one relevant dimension ê1 and a nonlinear input-output function. The "exact" STA is used (see equation 4.2) to separate effects of neural noise from alterations introduced by the method. The decorrelated "exact" STA is obtained by multiplying the "exact" STA by the inverse of the covariance matrix, according to equation A.2. (b) Stimuli are taken from a correlated gaussian noise ensemble. The effect of correlations in the STA can be removed according to equation A.2. (c) When patches of photos are taken as stimuli for the same model neuron as in (b), the decorrelation procedure gives an altered version of the model filter. The two stimulus ensembles have the same covariance matrix.

The reverse correlation method gives the correct answer for any distribution of signals if the probability of generating a spike is a linear function of s1, since then the second term in equation A.6 is zero. In particular, a linear input-output relation could arise due to neural noise whose variance is much larger than the variance of the signal itself. This point is illustrated in Figures 9a, 9b, and 9c, where the reverse correlation method is applied to a threshold input-output function at low, moderate, and high signal-to-noise ratios. For small signal-to-noise ratios, where the noise standard deviation is similar to that of the projections s1, the threshold nonlinearity in the input-output function is masked by noise and is effectively linear. In this limit, the reverse correlation can be applied with the exact STA. However, for an experimentally calculated STA at low signal-to-noise ratios, the decorrelation procedure results in strong noise amplification. At higher signal-to-noise ratios, decorrelation fails due to the nonlinearity of the input-output function, in accordance with equation A.6.

Appendix B: Maxima of I(v): What Do They Mean?

The relevant subspace of dimensionality K can be found by maximizing information simultaneously with respect to K vectors. The result of maximization with respect to a number of vectors that is less than the true dimensionality of the relevant subspace may produce vectors that have components in the irrelevant subspace. This happens only in the presence of correlations in stimuli. As an illustration, we consider the situation where the dimensionality of the relevant subspace is K = 2 and the vector ê1 describes the most informative direction within the relevant subspace. We show here that although the gradient of information is perpendicular to both ê1 and ê2, it may have components outside the relevant subspace. Therefore, the vector v_max that corresponds to the maximum of I(v) will lie outside the relevant subspace. We recall from equation 3.1 that

$$ \nabla I(\hat e_1) = \int ds_1\, P(s_1)\, \frac{d}{ds_1}\!\left[\frac{P(s_1|{\rm spike})}{P(s_1)}\right] \big(\langle \mathbf{s}|s_1,{\rm spike}\rangle - \langle \mathbf{s}|s_1\rangle\big). \qquad ({\rm B.1}) $$

We can rewrite the conditional averages ⟨s|s1⟩ = ∫ds2 P(s1, s2)⟨s|s1, s2⟩/P(s1) and ⟨s|s1, spike⟩ = ∫ds2 f(s1, s2) P(s1, s2)⟨s|s1, s2⟩/P(s1|spike), so that

$$ \nabla I(\hat e_1) = \int ds_1 ds_2\, P(s_1, s_2)\, \langle \mathbf{s}|s_1, s_2\rangle\, \frac{P({\rm spike}|s_1, s_2) - P({\rm spike}|s_1)}{P({\rm spike})} \times \frac{d}{ds_1}\ln\!\left[\frac{P(s_1|{\rm spike})}{P(s_1)}\right]. \qquad ({\rm B.2}) $$

Because we assume that the vector ê1 is the most informative within the relevant subspace, ê1·∇I = ê2·∇I = 0, so that the integral in equation B.2 is zero for those directions in which the component of the vector ⟨s|s1, s2⟩ changes linearly with s1 and s2. For uncorrelated stimuli, this is true for all directions, so that the most informative vector within the relevant subspace is also the most informative in the overall stimulus space. In the presence of correlations, the gradient may have nonzero components along some irrelevant directions if the projection of the vector ⟨s|s1, s2⟩ on those directions is not a linear function of s1 and s2. By looking for a maximum of information, we will therefore be driven outside the relevant subspace. The deviation of v_max from the relevant subspace is also proportional to the strength of the dependence on the second parameter s2, because of the factor [P(s1, s2|spike)/P(s1, s2) − P(s1|spike)/P(s1)] in the integrand.

Figure 9: Application of the reverse correlation method to a model visual neuron with one relevant dimension ê1 and a threshold input-output function, for decreasing values of noise variance, σ/σ_{s1} ≈ 6.1, 0.61, 0.06 in (a), (b), and (c), respectively. The model P(spike|s1) becomes effectively linear when the signal-to-noise ratio is small. The reverse correlation can be used together with natural stimuli if the input-output function is linear. Otherwise, the deviations between the decorrelated STA and the model filter increase with the nonlinearity of P(spike|s1).

Appendix C: The Gradient of Information

According to expression 2.5, the information I(v) depends on the vector v only through the probability distributions P_v(x) and P_v(x|spike). Therefore, we can express the gradient of information in terms of the gradients of those probability distributions:

$$ \nabla_{\mathbf{v}} I = \frac{1}{\ln 2} \int dx \left[ \ln\!\left(\frac{P_v(x|{\rm spike})}{P_v(x)}\right) \nabla_{\mathbf{v}} P_v(x|{\rm spike}) - \frac{P_v(x|{\rm spike})}{P_v(x)}\, \nabla_{\mathbf{v}} P_v(x) \right], \qquad ({\rm C.1}) $$

where we took into account that ∫dx P_v(x|spike) = 1 and does not change with v. To find the gradients of the probability distributions, we note that

$$ \nabla_{\mathbf{v}} P_v(x) = \nabla_{\mathbf{v}}\!\left[\int d\mathbf{s}\, P(\mathbf{s})\, \delta(x - \mathbf{s}\cdot\mathbf{v})\right] = -\int d\mathbf{s}\, P(\mathbf{s})\, \mathbf{s}\, \delta'(x - \mathbf{s}\cdot\mathbf{v}) = -\frac{d}{dx}\big[P_v(x)\,\langle \mathbf{s}|x\rangle\big], \qquad ({\rm C.2}) $$

and analogously for P_v(x|spike),

$$ \nabla_{\mathbf{v}} P_v(x|{\rm spike}) = -\frac{d}{dx}\big[P_v(x|{\rm spike})\,\langle \mathbf{s}|x, {\rm spike}\rangle\big]. \qquad ({\rm C.3}) $$

Substituting expressions C.2 and C.3 into equation C.1 and integrating once by parts, we obtain

$$ \nabla_{\mathbf{v}} I = \int dx\, P_v(x)\, \big[\langle \mathbf{s}|x, {\rm spike}\rangle - \langle \mathbf{s}|x\rangle\big] \cdot \frac{d}{dx}\!\left[\frac{P_v(x|{\rm spike})}{P_v(x)}\right], $$

which is expression 3.1 of the main text.

Acknowledgments

We thank K. D. Miller for many helpful discussions. Work at UCSF was supported in part by the Sloan and Swartz foundations and by a training grant from the NIH. Our collaboration began at the Marine Biological Laboratory, in a course supported by grants from NIMH and the Howard Hughes Medical Institute.


References

Aguera y Arcas, B., Fairhall, A. L., & Bialek, W. (2003). Computation in a single neuron: Hodgkin and Huxley revisited. Neural Comp., 15, 1715–1749.

Baddeley, R., Abbott, L. F., Booth, M. C. A., Sengpiel, F., Freeman, T., Wakeman, E. A., & Rolls, E. T. (1997). Responses of neurons in primary and inferior temporal visual cortices to natural scenes. Proc. R. Soc. Lond. B, 264, 1775–1783.

Barlow, H. (1961). Possible principles underlying the transformations of sensory images. In W. Rosenblith (Ed.), Sensory communication (pp. 217–234). Cambridge, MA: MIT Press.

Barlow, H. (2001). Redundancy reduction revisited. Network: Comput. Neural Syst., 12, 241–253.

Bialek, W. (2002). Thinking about the brain. In H. Flyvbjerg, F. Julicher, P. Ormos, & F. David (Eds.), Physics of Biomolecules and Cells (pp. 485–577). Berlin: Springer-Verlag. See also physics/0205030.⁶

Bialek, W., & de Ruyter van Steveninck, R. R. (2003). Features and dimensions: Motion estimation in fly vision. Unpublished manuscript.

Brenner, N., Bialek, W., & de Ruyter van Steveninck, R. R. (2000). Adaptive rescaling maximizes information transmission. Neuron, 26, 695–702.

Brenner, N., Strong, S. P., Koberle, R., Bialek, W., & de Ruyter van Steveninck, R. R. (2000). Synergy in a neural code. Neural Computation, 12, 1531–1552. See also physics/9902067.

Chichilnisky, E. J. (2001). A simple white noise analysis of neuronal light responses. Network: Comput. Neural Syst., 12, 199–213.

Creutzfeldt, O. D., & Northdurft, H. C. (1978). Representation of complex visual stimuli in the brain. Naturwissenschaften, 65, 307–318.

Cover, T. M., & Thomas, J. A. (1991). Information theory. New York: Wiley.

de Boer, E., & Kuyper, P. (1968). Triggered correlation. IEEE Trans. Biomed. Eng., 15, 169–179.

de Ruyter van Steveninck, R. R., & Bialek, W. (1988). Real-time performance of a movement-sensitive neuron in the blowfly visual system: Coding and information transfer in short spike sequences. Proc. R. Soc. Lond. B, 265, 259–265.

de Ruyter van Steveninck, R. R., Borst, A., & Bialek, W. (2000). Real time encoding of motion: Answerable questions and questionable answers from the fly's visual system. In J. M. Zanker & J. Zeil (Eds.), Motion vision: Computational, neural and ecological constraints (pp. 279–306). New York: Springer-Verlag. See also physics/0004060.

de Ruyter van Steveninck, R. R., Lewen, G. D., Strong, S. P., Koberle, R., & Bialek, W. (1997). Reproducibility and variability in neural spike trains. Science, 275, 1805–1808. See also cond-mat/9603127.

⁶ Where available, we give references to the physics e-print archive, which may be found on-line at http://arxiv.org/abs/; thus, Bialek (2002) is available on-line at http://arxiv.org/abs/physics/0205030. Published papers may differ from the versions posted to the archive.

Dimitrov, A. G., & Miller, J. P. (2001). Neural coding and decoding: Communication channels and quantization. Network: Comput. Neural Syst., 12, 441–472.

Dong, D. W., & Atick, J. J. (1995). Statistics of natural time-varying images. Network: Comput. Neural Syst., 6, 345–358.

Fairhall, A. L., Lewen, G. D., Bialek, W., & de Ruyter van Steveninck, R. R. (2001). Efficiency and ambiguity in an adaptive neural code. Nature, 412, 787–792.

Kara, P., Reinagel, P., & Reid, R. C. (2000). Low response variability in simultaneously recorded retinal, thalamic, and cortical neurons. Neuron, 27, 635–646.

Lewen, G. D., Bialek, W., & de Ruyter van Steveninck, R. R. (2001). Neural coding of naturalistic motion stimuli. Network: Comput. Neural Syst., 12, 317–329. See also physics/0103088.

Mainen, Z. F., & Sejnowski, T. J. (1995). Reliability of spike timing in neocortical neurons. Science, 268, 1503–1506.

Paninski, L. (2003a). Convergence properties of three spike-triggered analysis techniques. Network: Comput. Neural Syst., 14, 437–464.

Paninski, L. (2003b). Estimation of entropy and mutual information. Neural Computation, 15, 1191–1253.

Panzeri, S., & Treves, A. (1996). Analytical estimates of limited sampling biases in different information measures. Network: Comput. Neural Syst., 7, 87–107.

Pola, G., Schultz, S. R., Petersen, R., & Panzeri, S. (2002). A practical guide to information analysis of spike trains. In R. Kotter (Ed.), Neuroscience databases: A practical guide (pp. 137–152). Norwell, MA: Kluwer.

Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P. (1992). Numerical recipes in C: The art of scientific computing. Cambridge: Cambridge University Press.

Reinagel, P., & Reid, R. C. (2000). Temporal coding of visual information in the thalamus. J. Neurosci., 20, 5392–5400.

Rieke, F., Bodnar, D. A., & Bialek, W. (1995). Naturalistic stimuli increase the rate and efficiency of information transmission by primary auditory afferents. Proc. R. Soc. Lond. B, 262, 259–265.

Rieke, F., Warland, D., de Ruyter van Steveninck, R. R., & Bialek, W. (1997). Spikes: Exploring the neural code. Cambridge, MA: MIT Press.

Ringach, D. L., Hawken, M. J., & Shapley, R. (2002). Receptive field structure of neurons in monkey visual cortex revealed by stimulation with natural image sequences. Journal of Vision, 2, 12–24.

Ringach, D. L., Sapiro, G., & Shapley, R. (1997). A subspace reverse-correlation technique for the study of visual neurons. Vision Res., 37, 2455–2464.

Rolls, E. T., Aggelopoulos, N. C., & Zheng, F. (2003). The receptive fields of inferior temporal cortex neurons in natural scenes. J. Neurosci., 23, 339–348.

Ruderman, D. L. (1994). The statistics of natural images. Network: Comput. Neural Syst., 5, 517–548.

Ruderman, D. L., & Bialek, W. (1994). Statistics of natural images: Scaling in the woods. Phys. Rev. Lett., 73, 814–817.

Schwartz, O., Chichilnisky, E. J., & Simoncelli, E. (2002). Characterizing neural gain control using spike-triggered covariance. In T. G. Dietterich, S. Becker, & Z. Ghahramani (Eds.), Advances in Neural Information Processing (pp. 269–276). Cambridge, MA: MIT Press.

Sen, K., Theunissen, F. E., & Doupe, A. J. (2001). Feature analysis of natural sounds in the songbird auditory forebrain. J. Neurophysiol., 86, 1445–1458.

Simoncelli, E., & Olshausen, B. A. (2001). Natural image statistics and neural representation. Annu. Rev. Neurosci., 24, 1193–1216.

Smirnakis, S. M., Berry, M. J., Warland, D. K., Bialek, W., & Meister, M. (1996). Adaptation of retinal processing to image contrast and spatial scale. Nature, 386, 69–73.

Smyth, D., Willmore, B., Baker, G. E., Thompson, I. D., & Tolhurst, D. J. (2003). The receptive-field organization of simple cells in primary visual cortex of ferrets under natural scene stimulation. J. Neurosci., 23, 4746–4759.

Stanley, G. B., Li, F. F., & Dan, Y. (1999). Reconstruction of natural scenes from ensemble responses in the lateral geniculate nucleus. J. Neurosci., 19, 8036–8042.

Strong, S. P., Koberle, R., de Ruyter van Steveninck, R. R., & Bialek, W. (1998). Entropy and information in neural spike trains. Phys. Rev. Lett., 80, 197–200.

Theunissen, F. E., Sen, K., & Doupe, A. J. (2000). Spectral-temporal receptive fields of nonlinear auditory neurons obtained using natural sounds. J. Neurosci., 20, 2315–2331.

Tishby, N., Pereira, F. C., & Bialek, W. (1999). The information bottleneck method. In B. Hajek & R. S. Sreenivas (Eds.), Proceedings of the 37th Allerton Conference on Communication, Control and Computing (pp. 368–377). Urbana: University of Illinois. See also physics/0004057.

Touryan, J., Lau, B., & Dan, Y. (2002). Isolation of relevant visual features from random stimuli for cortical complex cells. J. Neurosci., 22, 10811–10818.

Treves, A., & Panzeri, S. (1995). The upward bias in measures of information derived from limited data samples. Neural Comp., 7, 399–407.

von der Twer, T., & Macleod, D. I. A. (2001). Optimal nonlinear codes for the perception of natural colours. Network: Comput. Neural Syst., 12, 395–407.

Vickers, N. J., Christensen, T. A., Baker, T., & Hildebrand, J. G. (2001). Odour-plume dynamics influence the brain's olfactory code. Nature, 410, 466–470.

Vinje, W. E., & Gallant, J. L. (2000). Sparse coding and decorrelation in primary visual cortex during natural vision. Science, 287, 1273–1276.

Vinje, W. E., & Gallant, J. L. (2002). Natural stimulation of the nonclassical receptive field increases information transmission efficiency in V1. J. Neurosci., 22, 2904–2915.

Voss, R. F., & Clarke, J. (1975). "1/f noise" in music and speech. Nature, 317–318.

Weliky, M., Fiser, J., Hunt, R., & Wagner, D. N. (2003). Coding of natural scenes in primary visual cortex. Neuron, 37, 703–718.

Received January 21 2003 accepted August 6 2003

Page 8: Analyzing Neural Responses to Natural Signals: Maximally ...wbialek/rome/refs/sharpee+al_04.pdfAnalyzing Neural Responses to Natural Signals 225 Figure 1: Schematic illustration of

230 T Sharpee N Rust and W Bialek

on a spike, respectively:

\[
P_{\mathbf{v}}(x) = \langle \delta(x - \mathbf{s}\cdot\mathbf{v}) \rangle_{\mathbf{s}}, \qquad (2.3)
\]

\[
P_{\mathbf{v}}(x|\mathrm{spike}) = \langle \delta(x - \mathbf{s}\cdot\mathbf{v}) \,|\, \mathrm{spike} \rangle_{\mathbf{s}}, \qquad (2.4)
\]

where δ(x) is a delta function. In practice, both the average ⟨···⟩_s ≡ ∫ ds ··· P(s) over the a priori stimulus ensemble and the average ⟨···|spike⟩_s ≡ ∫ ds ··· P(s|spike) over the ensemble conditional on a spike are calculated by binning the range of projection values x. The probability distributions are then obtained as histograms, normalized in such a way that the sum over all bins gives 1. The mutual information between spike arrival times and the projection x, by analogy with equation 2.1, is

\[
I(\mathbf{v}) = \int dx\, P_{\mathbf{v}}(x|\mathrm{spike}) \log_2 \left[ \frac{P_{\mathbf{v}}(x|\mathrm{spike})}{P_{\mathbf{v}}(x)} \right], \qquad (2.5)
\]

which is also the Kullback-Leibler divergence D[P_v(x|spike) || P_v(x)]. Notice that this information is a function of the direction v. The information I(v) provides an invariant measure of how much the occurrence of a spike is determined by projection on the direction v. It is a function only of direction in the stimulus space and does not change when the vector v is multiplied by a constant. This can be seen by noting that for any probability distribution and any constant c, P_{cv}(x) = c^{-1} P_v(x/c) (see also theorem 9.6.4 of Cover & Thomas, 1991). When evaluated along any vector v, I(v) ≤ I_spike. The total information I_spike can be recovered along one particular direction only if v = ê_1, and only if the RS is one-dimensional.

By analogy with equation 2.5, one could also calculate the information I(v_1, ..., v_n) along a set of several directions {v_1, ..., v_n}, based on the multipoint probability distributions of projection values x_1, x_2, ..., x_n along vectors v_1, v_2, ..., v_n of interest:

\[
P_{\mathbf{v}_1,\dots,\mathbf{v}_n}(\{x_i\}|\mathrm{spike}) = \left\langle \prod_{i=1}^{n} \delta(x_i - \mathbf{s}\cdot\mathbf{v}_i) \,\Big|\, \mathrm{spike} \right\rangle_{\mathbf{s}}, \qquad (2.6)
\]

\[
P_{\mathbf{v}_1,\dots,\mathbf{v}_n}(\{x_i\}) = \left\langle \prod_{i=1}^{n} \delta(x_i - \mathbf{s}\cdot\mathbf{v}_i) \right\rangle_{\mathbf{s}}. \qquad (2.7)
\]

If we are successful in finding all of the directions {ê_i}, 1 ≤ i ≤ K, contributing to the input-output relation, equation 1.1, then the information evaluated in this subspace will be equal to the total information I_spike. When we calculate information along a set of K vectors that are slightly off from the RS, the answer is of course smaller than I_spike and is initially quadratic in small deviations δv_i. One can therefore hope to find the RS by maximizing information with respect to K vectors simultaneously. The information does


not increase if more vectors outside the RS are included. For uncorrelated stimuli, any vector or set of vectors that maximizes I(v) belongs to the RS. On the other hand, as discussed in appendix B, the result of optimization with respect to a number of vectors k < K may deviate from the RS if stimuli are correlated. To find the RS, we first maximize I(v) and compare this maximum with I_spike, which is estimated according to equation 2.2. If the difference exceeds that expected from finite sampling corrections, we increment the number of directions with respect to which information is simultaneously maximized.
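A minimal sketch of this outer loop, assuming a routine `maximize_info` such as the annealing procedure described in section 3; the function and argument names are placeholders, not the authors' code.

```python
def find_relevant_subspace(stimuli, spike_counts, i_spike, noise_tol, maximize_info):
    """Increase the number of trial dimensions until the maximized information
    matches I_spike within the finite-sampling tolerance. `maximize_info` is
    assumed to return (vectors, info) for a requested number of dimensions."""
    n_dims = 0
    while True:
        n_dims += 1
        vectors, info = maximize_info(stimuli, spike_counts, n_dims)
        if i_spike - info <= noise_tol:   # all information per spike recovered
            return vectors
```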

3 Optimization Algorithm

In this section, we describe a particular algorithm we used to look for the most informative dimensions in order to find the relevant subspace. We make no claim that our choice of algorithm is the most efficient. However, it does give reproducible results for different starting points and spike trains, with differences taken to simulate neural noise. Overall, choices for an algorithm are broad because the information I(v) as defined by equation 2.5 is a continuous function whose gradient can be computed. We find (see appendix C for a derivation)

\[
\nabla_{\mathbf{v}} I = \int dx\, P_{\mathbf{v}}(x) \left[ \langle \mathbf{s}|x, \mathrm{spike}\rangle - \langle \mathbf{s}|x\rangle \right] \cdot \frac{d}{dx}\left[ \frac{P_{\mathbf{v}}(x|\mathrm{spike})}{P_{\mathbf{v}}(x)} \right], \qquad (3.1)
\]

where

\[
\langle \mathbf{s}|x, \mathrm{spike}\rangle = \frac{1}{P(x|\mathrm{spike})} \int d\mathbf{s}\, \mathbf{s}\, \delta(x - \mathbf{s}\cdot\mathbf{v})\, P(\mathbf{s}|\mathrm{spike}), \qquad (3.2)
\]

and similarly for ⟨s|x⟩. Since information does not change with the length of the vector, we have v·∇_v I = 0, as can also be seen directly from equation 3.1.
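A finite-bin estimate of this gradient can be assembled from the same histograms used for I(v): bin-wise conditional averages ⟨s|x⟩ and ⟨s|x, spike⟩ replace the integrals. The central-difference derivative and the variable names below are our own illustrative choices, not the authors' implementation.

```python
import numpy as np

def info_gradient(stimuli, spike_counts, v, n_bins=32):
    """Binned estimate of the gradient in equation 3.1."""
    x = stimuli @ v
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    idx = np.clip(np.digitize(x, edges) - 1, 0, n_bins - 1)
    dx = edges[1] - edges[0]

    p_x = np.zeros(n_bins)
    p_xs = np.zeros(n_bins)
    s_x = np.zeros((n_bins, stimuli.shape[1]))
    s_xs = np.zeros_like(s_x)
    np.add.at(p_x, idx, 1.0)
    np.add.at(p_xs, idx, spike_counts)
    np.add.at(s_x, idx, stimuli)
    np.add.at(s_xs, idx, stimuli * spike_counts[:, None])
    s_x /= np.maximum(p_x, 1.0)[:, None]        # <s|x>
    s_xs /= np.maximum(p_xs, 1e-12)[:, None]    # <s|x, spike>
    p_x /= p_x.sum()
    p_xs /= p_xs.sum()

    ratio = np.where(p_x > 0, p_xs / np.maximum(p_x, 1e-12), 0.0)
    dratio = np.gradient(ratio, dx)             # d/dx [P_v(x|spike)/P_v(x)]
    grad = (p_x[:, None] * (s_xs - s_x) * dratio[:, None]).sum(axis=0)
    # enforce v . grad I = 0 numerically by projecting out the component along v
    return grad - v * (grad @ v) / (v @ v)
```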

As an optimization algorithm, we have used a combination of gradient ascent and simulated annealing: successive line maximizations were done along the direction of the gradient (Press, Teukolsky, Vetterling, & Flannery, 1992). During line maximizations, a point with a smaller value of information was accepted according to Boltzmann statistics, with probability exp[(I(v_{i+1}) − I(v_i))/T]. The effective temperature T is reduced by a factor (1 − η_T) upon completion of each line maximization. Parameters of the simulated annealing algorithm to be adjusted are the starting temperature T_0 and the cooling rate η_T, ΔT = −η_T T. When maximizing with respect to one vector, we used values T_0 = 1 and η_T = 0.05. When maximizing with respect to two vectors, we either used the cooling schedule with η_T = 0.005 and repeated it several times (four times in our case), or allowed the effective temperature T to increase by a factor of 10 upon convergence to a local maximum (keeping T ≤ T_0 always) while limiting the total number of line maximizations.
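The sketch below strings these pieces together in the spirit of this description: line maximizations along the gradient, Boltzmann acceptance of downhill moves, and a geometric cooling schedule. It reuses `info_along_direction` and `info_gradient` from the earlier sketches; the coarse grid line search stands in for the Press et al. (1992) routine, and all numerical settings are illustrative.

```python
import numpy as np

def anneal_direction(stimuli, spike_counts, v0, t0=1.0, eta_t=0.05, n_steps=200):
    """Gradient ascent with simulated-annealing acceptance, as sketched above."""
    rng = np.random.default_rng(0)
    v = np.asarray(v0, dtype=float)
    v /= np.linalg.norm(v)
    t = t0
    i_cur = info_along_direction(stimuli, spike_counts, v)
    for _ in range(n_steps):
        g = info_gradient(stimuli, spike_counts, v)
        # coarse line maximization along the gradient direction
        candidates = []
        for step in np.logspace(-3, 0, 10):
            cand = v + step * g
            cand /= np.linalg.norm(cand)
            candidates.append((info_along_direction(stimuli, spike_counts, cand), cand))
        i_new, v_new = max(candidates, key=lambda c: c[0])
        # Boltzmann rule: accept a worse point with probability exp[(I_new - I_cur)/T]
        if i_new >= i_cur or rng.random() < np.exp((i_new - i_cur) / t):
            v, i_cur = v_new, i_new
        t *= 1.0 - eta_t   # cooling schedule: T -> (1 - eta_T) T
    return v, i_cur
```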


Figure 2: The probability distribution of information values, in units of the total information per spike, in the case of (a) uncorrelated binary noise stimuli, (b) correlated gaussian noise with the power spectrum of natural scenes, and (c) stimuli derived from natural scenes (patches of photos). The distribution was obtained by calculating information along 10^5 random vectors for a model cell with one relevant dimension. Note the different scales in the two panels.

The problem of maximizing a function often is related to the problem of making a good initial guess. It turns out, however, that the choice of a starting point is much less crucial in cases where the stimuli are correlated. To illustrate this point, we plot in Figure 2 the probability distribution of information along random directions v for both white noise and naturalistic stimuli in a model with one relevant dimension. For uncorrelated stimuli, not only is information equal to zero for a vector that is perpendicular to the relevant subspace, but in addition the derivative is equal to zero. Since a randomly chosen vector has on average a small projection on the relevant subspace (compared to its length), v_r/|v| ∼ √(K/D), the corresponding information can be found by expanding in v_r/|v|:

\[
I \approx \frac{v_r^2}{2|\mathbf{v}|^2} \int dx\, P_{\hat{e}_{ir}}(x) \left( \frac{P'_{\hat{e}_{ir}}(x)}{P_{\hat{e}_{ir}}(x)} \right)^2 \left[ \langle s_{\hat{e}_r}|\mathrm{spike}\rangle - \langle s_{\hat{e}_r}\rangle \right]^2, \qquad (3.3)
\]

where the vector v = v_r ê_r + v_ir ê_ir is decomposed into its components inside and outside the RS, respectively. The average information for a random vector is therefore ∼ ⟨v_r²⟩/|v|² = K/D.

In cases where stimuli are drawn from a gaussian ensemble with correlations, an expression for the information values has a similar structure to equation 3.3. To see this, we transform to Fourier space and normalize each Fourier component by the square root of the power spectrum, S(k).


In this new basis, both the vectors {ê_i}, 1 ≤ i ≤ K, forming the RS and the randomly chosen vector v along which information is being evaluated are to be multiplied by √S(k). Thus, if we now substitute for the dot product v_r² the convolution weighted by the power spectrum, Σ_{i}^{K} (v ∗ ê_i)², where

\[
\mathbf{v} * \hat{e}_i = \frac{\sum_k v(k)\, \hat{e}_i(k)\, S(k)}{\sqrt{\sum_k v^2(k)\, S(k)}\ \sqrt{\sum_k \hat{e}_i^2(k)\, S(k)}}, \qquad (3.4)
\]

then equation 3.3 will describe information values along randomly chosen vectors v for correlated gaussian stimuli with the power spectrum S(k). Although both v_r and v(k) are gaussian variables with variance ∼ 1/D, the weighted convolution has not only a much larger variance but is also strongly nongaussian (the nongaussian character is due to the normalizing factor Σ_k v²(k)S(k) in the denominator of equation 3.4). As for the variance, it can be estimated as ⟨(v ∗ ê_1)²⟩ = 4π/ln² D in cases where stimuli are taken as patches of correlated gaussian noise with the two-dimensional power spectrum S(k) = A/k². The large values of the weighted dot product (v ∗ ê_i), 1 ≤ i ≤ K, result not only in significant information values along a randomly chosen vector, but also in large magnitudes of the derivative ∇I, which is no longer dominated by noise, contrary to the case of uncorrelated stimuli. In the end, we find that randomly choosing one of the presented frames as a starting guess is sufficient.

4 Results

We tested the scheme of looking for the most informative dimensions on model neurons that respond to stimuli derived from natural scenes and sounds. As visual stimuli, we used scans across natural scenes, which were taken as black and white photos digitized to 8 bits, with no corrections made for the camera's light-intensity transformation function. Some statistical properties of the stimulus set are shown in Figure 3. Qualitatively, they reproduce the known results on the statistics of natural scenes (Ruderman & Bialek, 1994; Ruderman, 1994; Dong & Atick, 1995; Simoncelli & Olshausen, 2001). The most important properties for this study are strong spatial correlations, as evident from the power spectrum S(k) plotted in Figure 3b, and deviations of the probability distribution from a gaussian one. The nongaussian character can be seen in Figure 3c, where the probability distribution of intensities is shown, and in Figure 3d, which shows the distribution of projections on a Gabor filter (in what follows, projections such as s_1 will be given in units of the corresponding standard deviations). Our goal is to demonstrate that although the correlations present in the ensemble are nongaussian, they can be removed successfully from the estimate of vectors defining the RS.


Figure 3: Statistical properties of the visual stimulus ensemble. (a) One of the photos. Stimuli would be 30 × 30 patches taken from the overall photograph. (b) The power spectrum, in units of light intensity variance σ_I², averaged over orientation, as a function of the dimensionless wave vector ka, where a is the pixel size. (c) The probability distribution of light intensity, in units of σ_I. (d) The probability distribution of projections between stimuli and a Gabor filter, also in units of the corresponding standard deviation σ_{s_1}.

4.1 A Model Simple Cell. Our first example is based on the properties of simple cells found in the primary visual cortex. A model phase- and orientation-sensitive cell has a single relevant dimension ê_1, shown in Figure 4a. A given stimulus s leads to a spike if the projection s_1 = s·ê_1 reaches a threshold value θ in the presence of noise:

\[
\frac{P(\mathrm{spike}|\mathbf{s})}{P(\mathrm{spike})} \equiv f(s_1) = \langle H(s_1 - \theta + \xi) \rangle, \qquad (4.1)
\]

where a gaussian random variable ξ of variance σ² models additive noise, and the function H(x) = 1 for x > 0 and zero otherwise. Together with the RF ê_1, the threshold θ and the noise variance σ² determine the input-output function.


Figure 4: Analysis of a model simple cell with the RF shown in (a). (b) The "exact" spike-triggered average v_sta. (c) An attempt to remove correlations according to the reverse correlation method, C⁻¹_{a priori} v_sta. (d) The normalized vector v̂_max found by maximizing information. (e) Convergence of the algorithm according to information I(v) and projection v̂·ê_1 between normalized vectors, as a function of the inverse effective temperature T⁻¹. (f) The probability of a spike, P(spike|s·v̂_max) (crosses), is compared to P(spike|s_1) used in generating spikes (solid line). Parameters of the model are σ = 0.31 and θ = 1.84, both given in units of the standard deviation of s_1, which is also the unit for the x-axis in (f).

When the spike-triggered average (STA), or reverse correlation function, is computed from the responses to correlated stimuli, the resulting vector will be broadened due to spatial correlations present in the stimuli (see Figure 4b). For stimuli that are drawn from a gaussian probability distribution, the effects of correlations could be removed by multiplying v_sta by the inverse of the a priori covariance matrix, according to the reverse correlation method, v̂_Gaussian est ∝ C⁻¹_{a priori} v_sta, equation A.2. However, this procedure tends to amplify noise. To separate errors due to neural noise from those


due to the nongaussian character of correlations, note that in a model, the effect of neural noise on our estimate of the STA can be eliminated by averaging the presented stimuli weighted with the exact firing rate, as opposed to using a histogram of responses to estimate P(spike|s) from a finite set of trials. We have used this "exact" STA,

\[
\mathbf{v}_{\mathrm{sta}} = \int d\mathbf{s}\, \mathbf{s}\, P(\mathbf{s}|\mathrm{spike}) = \frac{1}{P(\mathrm{spike})} \int d\mathbf{s}\, P(\mathbf{s})\, \mathbf{s}\, P(\mathrm{spike}|\mathbf{s}), \qquad (4.2)
\]

in the calculations presented in Figures 4b and 4c. Even with this noiseless STA (the equivalent of collecting an infinite data set), the standard decorrelation procedure is not valid for nongaussian stimuli and nonlinear input-output functions, as discussed in detail in appendix A. The result of such a decorrelation in our example is shown in Figure 4c. It clearly is missing some of the structure in the model filter, with projection ê_1 · v̂_Gaussian est ≈ 0.14. The discrepancy is not due to neural noise or finite sampling, since the "exact" STA was decorrelated; the absence of noise in the exact STA also means that there would be no justification for smoothing the results of the decorrelation. The discrepancy between the true RF and the decorrelated STA increases with the strength of nonlinearity in the input-output function.
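For comparison with Figures 4b and 4c, the sketch below computes an STA from spike-weighted stimuli and its decorrelated version C⁻¹v_sta. Using observed spike counts as weights stands in for the noiseless firing rate of equation 4.2, and the pseudo-inverse is an illustrative regularization choice rather than a detail from the paper.

```python
import numpy as np

def sta_and_decorrelated_sta(stimuli, spike_counts):
    """Spike-triggered average (equation 4.2 with finite data) and the
    reverse-correlation estimate C^{-1} v_sta of equation A.2."""
    s = stimuli - stimuli.mean(axis=0)
    v_sta = (spike_counts @ s) / spike_counts.sum()   # spike-weighted average stimulus
    c = np.cov(s, rowvar=False)                       # a priori covariance matrix C
    v_dec = np.linalg.pinv(c) @ v_sta                 # decorrelated STA
    return v_sta, v_dec
```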

In contrast, it is possible to obtain a good estimate of the relevant dimension ê_1 by maximizing information directly (see Figure 4d). Typical progress of the simulated annealing algorithm with decreasing temperature T is shown in Figure 4e. There we plot both the information along the vector and its projection on ê_1. We note that while the information I remains almost constant, the value of the projection continues to improve. Qualitatively, this is because the probability distributions depend exponentially on information. The final value of the projection depends on the size of the data set, as discussed below. In the example shown in Figure 4, there were ≈ 50,000 spikes with an average probability of spike ≈ 0.05 per frame, and the reconstructed vector has a projection v̂_max · ê_1 = 0.920 ± 0.006. Having estimated the RF, one can proceed to sample the nonlinear input-output function. This is done by constructing histograms for P(s·v̂_max) and P(s·v̂_max|spike) of projections onto the vector v̂_max found by maximizing information, and taking their ratio, as in equation 1.2. In Figure 4f, we compare P(spike|s·v̂_max) (crosses) with the probability P(spike|s_1) used in the model (solid line).
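The histogram-ratio construction used for Figure 4f can be sketched as follows; by Bayes' rule, the ratio of the normalized histograms equals P(spike|x)/P(spike). The bin count and variable names are illustrative choices.

```python
import numpy as np

def sample_nonlinearity(stimuli, spike_counts, v_max, n_bins=32):
    """Return bin centers and P(x|spike)/P(x) = P(spike|x)/P(spike) along v_max."""
    x = stimuli @ (v_max / np.linalg.norm(v_max))
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    p_x, _ = np.histogram(x, bins=edges)                              # P(x)
    p_x_spike, _ = np.histogram(x, bins=edges, weights=spike_counts)  # P(x|spike)
    p_x = p_x / p_x.sum()
    p_x_spike = p_x_spike / p_x_spike.sum()
    centers = 0.5 * (edges[1:] + edges[:-1])
    ratio = np.where(p_x > 0, p_x_spike / np.maximum(p_x, 1e-12), np.nan)
    return centers, ratio
```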

4.2 Estimated Deviation from the Optimal Dimension. When information is calculated from a finite data set, the (normalized) vector v̂ which maximizes I will deviate from the true RF ê_1. The deviation δv = v̂ − ê_1 arises because the probability distributions are estimated from experimental histograms and differ from the distributions found in the limit of infinite data size. For a simple cell, the quality of reconstruction can be characterized by the projection v̂·ê_1 = 1 − ½δv², where both v̂ and ê_1 are normalized, and δv is by definition orthogonal to ê_1. The deviation δv ∼ A⁻¹∇I, where A is


the Hessian of information. Its structure is similar to that of a covariance matrix:

\[
A_{ij} = \frac{1}{\ln 2} \int dx\, P(x|\mathrm{spike}) \left( \frac{d}{dx} \ln \frac{P(x|\mathrm{spike})}{P(x)} \right)^2 \left( \langle s_i s_j | x \rangle - \langle s_i | x \rangle \langle s_j | x \rangle \right). \qquad (4.3)
\]

When averaged over possible outcomes of N trials, the gradient of information is zero for the optimal direction. Here, in order to evaluate ⟨δv²⟩ = Tr[A⁻¹⟨∇I ∇Iᵀ⟩A⁻¹], we need to know the variance of the gradient of I. Assuming that the probability of generating a spike is independent for different bins, we can estimate ⟨∇I_i ∇I_j⟩ ∼ A_ij/(N_spike ln 2). Therefore, the expected error in the reconstruction of the optimal filter is inversely proportional to the number of spikes. The corresponding expected value of the projection between the reconstructed vector and the relevant direction ê_1 is given by

\[
\hat{\mathbf{v}} \cdot \hat{e}_1 \approx 1 - \frac{1}{2}\langle \delta \mathbf{v}^2 \rangle = 1 - \frac{\mathrm{Tr}'[A^{-1}]}{2 N_{\mathrm{spike}} \ln 2}, \qquad (4.4)
\]

where Tr′ means that the trace is taken in the subspace orthogonal to the model filter.⁵ The estimate, equation 4.4, can be calculated without knowledge of the underlying model; it is ∼ D/(2N_spike). This behavior should also hold in cases where the stimulus dimensions are expanded to include time. The errors are expected to increase in proportion to the increased dimensionality. In the case of a complex cell with two relevant dimensions, the error is expected to be twice that for a cell with a single relevant dimension, as also discussed in section 4.3.

We emphasize that the error estimate according to equation 4.4 is of the same order as the errors of the reverse correlation method when it is applied to gaussian ensembles. The latter are given by (Tr[C⁻¹] − C⁻¹₁₁)/(2N_spike⟨f′²(s_1)⟩). Of course, if the reverse correlation method were to be applied to the nongaussian ensemble, the errors would be larger. In Figure 5, we show the result of simulations for various numbers of trials, and therefore N_spike. The average projection of the normalized reconstructed vector v̂ on the RF ê_1 behaves initially as 1/N_spike (dashed line). For smaller data sets, in this case N_spike ≲ 30,000, corrections ∼ N_spike⁻² become important for estimating the expected errors of the algorithm. Happily, these corrections have a sign such that smaller data sets are more effective than one might have expected from the asymptotic calculation. This can be verified from the expansion v̂·ê_1 = [1 + δv²]^{−1/2} ≈ 1 − ½⟨δv²⟩ + ⅜⟨δv⁴⟩, where only the first two terms were taken into account in equation 4.4.

⁵ By definition, δv_1 = δv·ê_1 = 0, and therefore ⟨δv_1²⟩ ∝ A⁻¹₁₁ is to be subtracted from ⟨δv²⟩ ∝ Tr[A⁻¹]. Because ê_1 is an eigenvector of A with zero eigenvalue, A⁻¹₁₁ is infinite. Therefore, the proper treatment is to take the trace in the subspace orthogonal to ê_1.


Figure 5: Projection of the vector v̂_max that maximizes information on the RF ê_1, plotted as a function of the number of spikes. The solid line is a quadratic fit in 1/N_spike, and the dashed line is the leading linear term in 1/N_spike. This set of simulations was carried out for a model visual neuron with one relevant dimension from Figure 4a and the input-output function (see equation 4.1) with parameter values σ ≈ 0.61σ_{s_1}, θ ≈ 0.61σ_{s_1}. For this model neuron, the linear approximation for the expected error is applicable for N_spike ≳ 30,000.

4.3 A Model Complex Cell. A sequence of spikes from a model cell with two relevant dimensions was simulated by projecting each of the stimuli on vectors that differ by π/2 in their spatial phase, taken to mimic properties of complex cells, as in Figure 6. A particular frame leads to a spike according to a logical OR, that is, if either |s_1| or |s_2| exceeds a threshold value θ in the presence of noise, where s_1 = s·ê_1 and s_2 = s·ê_2. Similarly to equation 4.1,

\[
\frac{P(\mathrm{spike}|\mathbf{s})}{P(\mathrm{spike})} = f(s_1, s_2) = \langle H(|s_1| - \theta - \xi_1) \vee H(|s_2| - \theta - \xi_2) \rangle, \qquad (4.5)
\]

where ξ_1 and ξ_2 are independent gaussian variables. The sampling of this input-output function by our particular set of natural stimuli is shown in Figure 6c. As is well known, reverse correlation fails in this case because the spike-triggered average stimulus is zero, although with gaussian stimuli the spike-triggered covariance method would recover the relevant dimensions (Touryan et al., 2002). Here we show that searching for maximally informative dimensions allows us to recover the relevant subspace even under more natural stimulus conditions.
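A sketch of spike generation for this two-dimensional model, analogous to the simple-cell sketch above; the per-frame Bernoulli draw and the unit conventions are again illustrative assumptions.

```python
import numpy as np

def simulate_complex_cell(stimuli, e1, e2, theta=0.61, sigma=0.31, rng=None):
    """Logical-OR threshold model of equation 4.5: spike if |s1| or |s2|
    crosses the threshold in the presence of independent gaussian noise."""
    rng = rng or np.random.default_rng(0)
    s1 = stimuli @ e1
    s1 /= s1.std()
    s2 = stimuli @ e2
    s2 /= s2.std()
    xi1 = rng.normal(0.0, sigma, size=s1.shape)
    xi2 = rng.normal(0.0, sigma, size=s2.shape)
    fired = (np.abs(s1) - theta - xi1 > 0) | (np.abs(s2) - theta - xi2 > 0)
    return fired.astype(int)
```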


Figure 6: Analysis of a model complex cell with relevant dimensions ê_1 and ê_2, shown in (a) and (b), respectively. Spikes are generated according to an "OR" input-output function f(s_1, s_2) with the threshold θ ≈ 0.61σ_{s_1} and noise standard deviation σ = 0.31σ_{s_1}. (c, d) Vectors v_1 and v_2 found by maximizing information I(v_1, v_2).

We start by maximizing information with respect to one vector. Contrary to the result in Figure 4e for a simple cell, one optimal dimension recovers only about 60% of the total information per spike (see equation 2.2). Perhaps surprisingly, because of the strong correlations in natural scenes, even a projection onto a random vector in the D ∼ 10³-dimensional stimulus space has a high probability of explaining 60% of the total information per spike, as can be seen in Figure 2. We therefore go on to maximize information with respect to two vectors. As a result of maximization, we obtain two vectors, v_1 and v_2, shown in Figure 6. The information along them is I(v_1, v_2) ≈ 0.90, which is within the range of information values obtained along different linear combinations of the two model vectors, I(ê_1, ê_2)/I_spike = 0.89 ± 0.11. Therefore, the description of the neuron's firing in terms of vectors v_1 and v_2 is complete up to the noise level, and we do not have to look for extra relevant dimensions. Practically, the number of relevant dimensions can be determined by comparing I(v_1, v_2) to either I_spike or I(v_1, v_2, v_3), the latter being the result of maximization with respect to three vectors simultaneously. As mentioned in section 1, information along a set of vectors does not increase when extra dimensions are added to the relevant subspace. Therefore, if I(v_1, v_2) = I(v_1, v_2, v_3) (again, up to the noise level), this means that


there are only two relevant dimensions. Using I_spike for comparison with I(v_1, v_2) has the advantage of not having to look for an extra dimension, which can be computationally intensive. However, I_spike might be subject to larger systematic bias errors than I(v_1, v_2, v_3).

Vectors v_1 and v_2 obtained by maximizing I(v_1, v_2) are not exactly orthogonal and are also rotated with respect to ê_1 and ê_2. However, the quality of reconstruction, as well as the value of information I(v_1, v_2), is independent of the particular choice of basis within the RS. The appropriate measure of similarity between the two planes is the dot product of their normals. In the example of Figure 6, n̂(ê_1, ê_2) · n̂(v_1, v_2) = 0.82 ± 0.07, where n̂(ê_1, ê_2) is the normal to the plane passing through vectors ê_1 and ê_2. Maximizing information with respect to two dimensions requires a significantly slower cooling rate and consequently longer computational times. However, the expected error in the reconstruction, 1 − n̂(ê_1, ê_2) · n̂(v_1, v_2), scales as 1/N_spike, behavior similar to equation 4.4, and is roughly twice that for a simple cell given the same number of spikes. We make vectors v_1 and v_2 orthogonal to each other upon completion of the algorithm.

4.4 A Model Auditory Neuron with One Relevant Dimension. Because stimuli s are treated as vectors in an abstract space, the method of looking for the most informative dimensions can be applied equally well to auditory as well as to visual neurons. Here we illustrate the method by considering a model auditory neuron with one relevant dimension, which is shown in Figure 7c and is taken to mimic the properties of cochlear neurons. The model neuron is probed by two ensembles of naturalistic stimuli: one is a recording of a native Russian speaker reading a piece of Russian prose, and the other is a recording of a piece of English prose read by a native English speaker. Both ensembles are nongaussian and exhibit amplitude distributions with long, nearly exponential tails (see Figure 7a), which are qualitatively similar to those of light intensities in natural scenes (Voss & Clarke, 1975; Ruderman, 1994). However, the power spectrum is different in the two cases, as can be seen in Figure 7b. The differences in the correlation structure, in particular, lead to different STAs across the two ensembles (cf. Figure 7d). Both of the STAs also deviate from the model filter shown in Figure 7c.

Despite the differences in the probability distributions P(s), it is possible to recover the relevant dimension of the model neuron by maximizing information. In Figure 7e, we show the two most informative vectors found by running the algorithm for the two ensembles and replot the model filter from Figure 7c to show that the three vectors overlap almost perfectly. Thus, different nongaussian correlations can be successfully removed to obtain an estimate of the relevant dimension. If the most informative vector changes with the stimulus ensemble, this can be interpreted as caused by adaptation to the probability distribution.


Figure 7: A model auditory neuron is probed by two natural ensembles of stimuli: a piece of English prose (1) and a piece of Russian prose (2). The size of the stimulus ensemble was the same in both cases, and the sampling rate was 44.1 kHz. (a) The probability distribution of the sound pressure amplitude, in units of the standard deviation, for both ensembles is strongly nongaussian. (b) The power spectra for the two ensembles. (c) The relevant vector of the model neuron, of dimensionality D = 500. (d) The STA is broadened in both cases but differs between the two cases due to differences in the power spectra of the two ensembles. (e) Vectors that maximize information for either of the ensembles overlap almost perfectly with each other and with the model filter, which is also replotted here from (c). (f) The probability of a spike, P(spike|s·v̂_max) (crosses), is compared to P(spike|s_1) used in generating spikes (solid line). The input-output function had parameter values σ ≈ 0.9σ_{s_1} and θ ≈ 1.8σ_{s_1}.

5 Summary

Features of the stimulus that are most relevant for generating the response of a neuron can be found by maximizing information between the sequence of responses and the projection of stimuli on trial vectors within the stimulus space. Calculated in this manner, information becomes a function of


direction in stimulus space. Those vectors that maximize the information and account for the total information per response of interest span the relevant subspace. The method allows multiple dimensions to be found. The reconstruction of the relevant subspace is done without assuming a particular form of the input-output function, which can be strongly nonlinear within the relevant subspace and is estimated from experimental histograms for each trial direction independently. Most important, this method can be used with any stimulus ensemble, even those that are strongly nongaussian, as in the case of natural signals. We have illustrated the method on model neurons responding to natural scenes and sounds. We expect the current implementation of the method to be most useful when several most informative vectors (≤ 10, depending on their dimensionality) are to be analyzed for neurons probed by natural scenes. This technique could be particularly useful in describing sensory processing in poorly understood regions of higher-level sensory cortex (such as visual areas V2, V4, and IT and auditory cortex beyond A1), where white noise stimulation is known to be less effective than naturalistic stimuli.

Appendix A: Limitations of the Reverse Correlation Method

Here we examine what sort of deviations one can expect when applying the reverse correlation method to natural stimuli, even in the model with just one relevant dimension. There are two factors that, when combined, invalidate the reverse correlation method: the nongaussian character of correlations and the nonlinearity of the input-output function (Ringach, Sapiro, & Shapley, 1997). In its original formulation (de Boer & Kuyper, 1968), the neuron is probed by white noise, and the relevant dimension ê_1 is given by the STA, ê_1 ∝ ⟨s r(s)⟩. If the signals are not white, that is, the covariance matrix C_ij = ⟨s_i s_j⟩ is not a unit matrix, then the STA is a broadened version of the original filter ê_1. This can be seen by noting that for any function F(s) of gaussian variables {s_i}, the identity holds:

\[
\langle s_i F(\mathbf{s}) \rangle = \langle s_i s_j \rangle \langle \partial_j F(\mathbf{s}) \rangle, \qquad \partial_j \equiv \partial/\partial s_j. \qquad (\mathrm{A.1})
\]

When property A.1 is applied to the vector components of the STA, ⟨s_i r(s)⟩ = C_ij⟨∂_j r(s)⟩. Since we work within the assumption that the firing rate is a (nonlinear) function of the projection onto one filter ê_1, r(s) = r(s_1), the latter average is proportional to the model filter itself, ⟨∂_j r⟩ = ê_{1j}⟨r′(s_1)⟩. Therefore, we arrive at the prescription of the reverse correlation method:

\[
\hat{e}_{1i} \propto [C^{-1}]_{ij} \langle s_j r(\mathbf{s}) \rangle. \qquad (\mathrm{A.2})
\]

The gaussian property is necessary in order to represent the STA as a convolution of the covariance matrix C_ij of the stimulus ensemble and the model


filter. To understand how the reconstructed vector obtained according to equation A.2 deviates from the relevant one, we consider weakly nongaussian stimuli, with the probability distribution

\[
P_{nG}(\mathbf{s}) = \frac{1}{Z} P_0(\mathbf{s})\, e^{\epsilon H_1(\mathbf{s})}, \qquad (\mathrm{A.3})
\]

where P_0(s) is the gaussian probability distribution with covariance matrix C, and the normalization factor Z = ⟨e^{εH_1(s)}⟩. The function H_1 describes deviations of the probability distribution from gaussian, and therefore we will set ⟨s_i H_1⟩ = 0 and ⟨s_i s_j H_1⟩ = 0, since these averages can be accounted for in the gaussian ensemble. In what follows, we will keep only the first-order terms in the perturbation parameter ε. Using property A.1, we find the STA to be given by

\[
\langle s_i r \rangle_{nG} = \langle s_i s_j \rangle \left[ \langle \partial_j r \rangle + \epsilon \langle r\, \partial_j H_1 \rangle \right], \qquad (\mathrm{A.4})
\]

where averages are taken with respect to the gaussian distribution. Similarly, the covariance matrix C_ij evaluated with respect to the nongaussian ensemble is given by

\[
C_{ij} = \frac{1}{Z} \langle s_i s_j e^{\epsilon H_1} \rangle = \langle s_i s_j \rangle + \epsilon \langle s_i s_k \rangle \langle s_j \partial_k H_1 \rangle, \qquad (\mathrm{A.5})
\]

so that to first order in ε, ⟨s_i s_j⟩ = C_ij − εC_ik⟨s_j ∂_k H_1⟩. Combining this with equation A.4, we get

\[
\langle s_i r \rangle_{nG} = \mathrm{const} \times C_{ij} \hat{e}_{1j} + \epsilon\, C_{ij} \left\langle \left( r - s_1 \langle r' \rangle \right) \partial_j H_1 \right\rangle. \qquad (\mathrm{A.6})
\]

The second term in equation A.6 prevents the application of the reverse correlation method for nongaussian signals. Indeed, if we multiply the STA, equation A.6, by the inverse of the a priori covariance matrix C_ij, according to the reverse correlation method, equation A.2, we no longer obtain the RF ê_1. The deviation of the obtained answer from the true RF increases with ε, which measures the deviation of the probability distribution from gaussian. Since natural stimuli are known to be strongly nongaussian, this makes the use of reverse correlation problematic when analyzing neural responses to natural stimuli.

Figure 8: The nongaussian character of correlations present in natural scenes invalidates the reverse correlation method for neurons with a nonlinear input-output function. (a) A model visual neuron has one relevant dimension ê_1 and the nonlinear input-output function. The "exact" STA is used (see equation 4.2) to separate effects of neural noise from alterations introduced by the method. The decorrelated "exact" STA is obtained by multiplying the "exact" STA by the inverse of the covariance matrix, according to equation A.2. (b) Stimuli are taken from a correlated gaussian noise ensemble. The effect of correlations in the STA can be removed according to equation A.2. (c) When patches of photos are taken as stimuli for the same model neuron as in (b), the decorrelation procedure gives an altered version of the model filter. The two stimulus ensembles have the same covariance matrix.

The difference in applying the reverse correlation to stimuli drawn from a correlated gaussian ensemble versus a nongaussian one is illustrated in Figures 8b and 8c. In the first case, shown in Figure 8b, stimuli are drawn from a correlated gaussian ensemble with the covariance matrix equal to that of natural images. In the second case, shown in Figure 8c, the patches of photos are taken as stimuli. The STA is broadened in both cases. Although the two-point correlations are just as strong in the case of gaussian stimuli as they are in the natural stimulus ensemble, gaussian correlations can be successfully removed from the STA according to equation A.2 to obtain the model filter. On the contrary, an attempt to use reverse correlation with natural stimuli results in an altered version of the model filter. We reiterate that for this example, the apparent noise in the decorrelated vector is not


due to neural noise or finite data sets, since the "exact" STA has been used (see equation 4.2) in all calculations presented in Figures 8 and 9.

The reverse correlation method gives the correct answer for any distribution of signals if the probability of generating a spike is a linear function of s_i, since then the second term in equation A.6 is zero. In particular, a linear input-output relation could arise due to neural noise whose variance is much larger than the variance of the signal itself. This point is illustrated in Figures 9a, 9b, and 9c, where the reverse correlation method is applied to a threshold input-output function at low, moderate, and high signal-to-noise ratios. For small signal-to-noise ratios, where the noise standard deviation is similar to that of projections s_1, the threshold nonlinearity in the input-output function is masked by noise and is effectively linear. In this limit, reverse correlation can be applied with the exact STA. However, for the experimentally calculated STA at low signal-to-noise ratios, the decorrelation procedure results in strong noise amplification. At higher signal-to-noise ratios, decorrelation fails due to the nonlinearity of the input-output function, in accordance with equation A.6.

Appendix B: Maxima of I(v): What Do They Mean?

The relevant subspace of dimensionality K can be found by maximizing information simultaneously with respect to K vectors. The result of maximization with respect to a number of vectors that is less than the true dimensionality of the relevant subspace may produce vectors that have components in the irrelevant subspace. This happens only in the presence of correlations in stimuli. As an illustration, we consider the situation where the dimensionality of the relevant subspace is K = 2, and vector ê_1 describes the most informative direction within the relevant subspace. We show here that although the gradient of information is perpendicular to both ê_1 and ê_2, it may have components outside the relevant subspace. Therefore, the vector v_max that corresponds to the maximum of I(v) will lie outside the relevant subspace. We recall from equation 3.1 that

\[
\nabla I(\hat{e}_1) = \int ds_1\, P(s_1)\, \frac{d}{ds_1}\left[ \frac{P(s_1|\mathrm{spike})}{P(s_1)} \right] \left( \langle \mathbf{s}|s_1, \mathrm{spike}\rangle - \langle \mathbf{s}|s_1\rangle \right). \qquad (\mathrm{B.1})
\]

We can rewrite the conditional averages ⟨s|s_1⟩ = ∫ ds_2 P(s_1, s_2)⟨s|s_1, s_2⟩/P(s_1) and ⟨s|s_1, spike⟩ = ∫ ds_2 f(s_1, s_2) P(s_1, s_2)⟨s|s_1, s_2⟩/P(s_1|spike), so that

\[
\nabla I(\hat{e}_1) = \int ds_1 ds_2\, P(s_1, s_2)\, \langle \mathbf{s}|s_1, s_2 \rangle\, \frac{P(\mathrm{spike}|s_1, s_2) - P(\mathrm{spike}|s_1)}{P(\mathrm{spike})}\, \frac{d}{ds_1} \ln \left[ \frac{P(s_1|\mathrm{spike})}{P(s_1)} \right]. \qquad (\mathrm{B.2})
\]

Because we assume that the vector ê_1 is the most informative within the relevant subspace, ê_1·∇I = ê_2·∇I = 0, so that the integral in equation B.2


Figure 9: Application of the reverse correlation method to a model visual neuron with one relevant dimension ê_1 and a threshold input-output function, for decreasing values of noise variance: σ/σ_{s_1} ≈ 6.1, 0.61, 0.06 in (a), (b), and (c), respectively. The model P(spike|s_1) becomes effectively linear when the signal-to-noise ratio is small. Reverse correlation can be used together with natural stimuli if the input-output function is linear. Otherwise, the deviations between the decorrelated STA and the model filter increase with the nonlinearity of P(spike|s_1).

is zero for those directions in which the component of the vector ⟨s|s_1, s_2⟩ changes linearly with s_1 and s_2. For uncorrelated stimuli, this is true for all directions, so that the most informative vector within the relevant subspace is also the most informative in the overall stimulus space. In the presence of correlations, the gradient may have nonzero components along some irrelevant directions if the projection of the vector ⟨s|s_1, s_2⟩ on those directions is not a linear function of s_1 and s_2. By looking for a maximum of information, we will therefore be driven outside the relevant subspace. The deviation of v_max from the relevant subspace is also proportional to the


strength of the dependence on the second parameter s_2, because of the factor [P(s_1, s_2|spike)/P(s_1, s_2) − P(s_1|spike)/P(s_1)] in the integrand.

Appendix C: The Gradient of Information

According to expression 2.5, the information I(v) depends on the vector v only through the probability distributions P_v(x) and P_v(x|spike). Therefore, we can express the gradient of information in terms of gradients of those probability distributions:

\[
\nabla_{\mathbf{v}} I = \frac{1}{\ln 2} \int dx \left[ \ln \frac{P_{\mathbf{v}}(x|\mathrm{spike})}{P_{\mathbf{v}}(x)}\, \nabla_{\mathbf{v}} P_{\mathbf{v}}(x|\mathrm{spike}) - \frac{P_{\mathbf{v}}(x|\mathrm{spike})}{P_{\mathbf{v}}(x)}\, \nabla_{\mathbf{v}} P_{\mathbf{v}}(x) \right], \qquad (\mathrm{C.1})
\]

where we took into account that ∫ dx P_v(x|spike) = 1 and does not change with v. To find the gradients of the probability distributions, we note that

\[
\nabla_{\mathbf{v}} P_{\mathbf{v}}(x) = \nabla_{\mathbf{v}} \left[ \int d\mathbf{s}\, P(\mathbf{s})\, \delta(x - \mathbf{s}\cdot\mathbf{v}) \right] = -\int d\mathbf{s}\, P(\mathbf{s})\, \mathbf{s}\, \delta'(x - \mathbf{s}\cdot\mathbf{v}) = -\frac{d}{dx}\left[ P_{\mathbf{v}}(x) \langle \mathbf{s}|x \rangle \right], \qquad (\mathrm{C.2})
\]

and analogously for P_v(x|spike),

\[
\nabla_{\mathbf{v}} P_{\mathbf{v}}(x|\mathrm{spike}) = -\frac{d}{dx}\left[ P_{\mathbf{v}}(x|\mathrm{spike}) \langle \mathbf{s}|x, \mathrm{spike} \rangle \right]. \qquad (\mathrm{C.3})
\]

Substituting expressions C.2 and C.3 into equation C.1 and integrating once by parts, we obtain

\[
\nabla_{\mathbf{v}} I = \int dx\, P_{\mathbf{v}}(x) \left[ \langle \mathbf{s}|x, \mathrm{spike} \rangle - \langle \mathbf{s}|x \rangle \right] \cdot \frac{d}{dx}\left[ \frac{P_{\mathbf{v}}(x|\mathrm{spike})}{P_{\mathbf{v}}(x)} \right],
\]

which is expression 3.1 of the main text.

Acknowledgments

We thank K. D. Miller for many helpful discussions. Work at UCSF was supported in part by the Sloan and Swartz foundations and by a training grant from the NIH. Our collaboration began at the Marine Biological Laboratory, in a course supported by grants from NIMH and the Howard Hughes Medical Institute.


References

Aguera y Arcas, B., Fairhall, A. L., & Bialek, W. (2003). Computation in a single neuron: Hodgkin and Huxley revisited. Neural Comp., 15, 1715–1749.

Baddeley, R., Abbott, L. F., Booth, M. C. A., Sengpiel, F., Freeman, T., Wakeman, E. A., & Rolls, E. T. (1997). Responses of neurons in primary and inferior temporal visual cortices to natural scenes. Proc. R. Soc. Lond. B, 264, 1775–1783.

Barlow, H. (1961). Possible principles underlying the transformations of sensory images. In W. Rosenblith (Ed.), Sensory communication (pp. 217–234). Cambridge, MA: MIT Press.

Barlow, H. (2001). Redundancy reduction revisited. Network: Comput. Neural Syst., 12, 241–253.

Bialek, W. (2002). Thinking about the brain. In H. Flyvbjerg, F. Jülicher, P. Ormos, & F. David (Eds.), Physics of biomolecules and cells (pp. 485–577). Berlin: Springer-Verlag. See also physics/0205030.⁶

Bialek, W., & de Ruyter van Steveninck, R. R. (2003). Features and dimensions: Motion estimation in fly vision. Unpublished manuscript.

Brenner, N., Bialek, W., & de Ruyter van Steveninck, R. R. (2000). Adaptive rescaling maximizes information transmission. Neuron, 26, 695–702.

Brenner, N., Strong, S. P., Koberle, R., Bialek, W., & de Ruyter van Steveninck, R. R. (2000). Synergy in a neural code. Neural Computation, 12, 1531–1552. See also physics/9902067.

Chichilnisky, E. J. (2001). A simple white noise analysis of neuronal light responses. Network: Comput. Neural Syst., 12, 199–213.

Creutzfeldt, O. D., & Northdurft, H. C. (1978). Representation of complex visual stimuli in the brain. Naturwissenschaften, 65, 307–318.

Cover, T. M., & Thomas, J. A. (1991). Information theory. New York: Wiley.

de Boer, E., & Kuyper, P. (1968). Triggered correlation. IEEE Trans. Biomed. Eng., 15, 169–179.

de Ruyter van Steveninck, R. R., & Bialek, W. (1988). Real-time performance of a movement-sensitive neuron in the blowfly visual system: Coding and information transfer in short spike sequences. Proc. R. Soc. Lond. B, 234, 379–414.

de Ruyter van Steveninck, R. R., Borst, A., & Bialek, W. (2000). Real time encoding of motion: Answerable questions and questionable answers from the fly's visual system. In J. M. Zanker & J. Zeil (Eds.), Motion vision: Computational, neural and ecological constraints (pp. 279–306). New York: Springer-Verlag. See also physics/0004060.

de Ruyter van Steveninck, R. R., Lewen, G. D., Strong, S. P., Koberle, R., & Bialek, W. (1997). Reproducibility and variability in neural spike trains. Science, 275, 1805–1808. See also cond-mat/9603127.

Dimitrov, A. G., & Miller, J. P. (2001). Neural coding and decoding: Communication channels and quantization. Network: Comput. Neural Syst., 12, 441–472.

Dong, D. W., & Atick, J. J. (1995). Statistics of natural time-varying images. Network: Comput. Neural Syst., 6, 345–358.

Fairhall, A. L., Lewen, G. D., Bialek, W., & de Ruyter van Steveninck, R. R. (2001). Efficiency and ambiguity in an adaptive neural code. Nature, 412, 787–792.

Kara, P., Reinagel, P., & Reid, R. C. (2000). Low response variability in simultaneously recorded retinal, thalamic, and cortical neurons. Neuron, 27, 635–646.

Lewen, G. D., Bialek, W., & de Ruyter van Steveninck, R. R. (2001). Neural coding of naturalistic motion stimuli. Network: Comput. Neural Syst., 12, 317–329. See also physics/0103088.

Mainen, Z. F., & Sejnowski, T. J. (1995). Reliability of spike timing in neocortical neurons. Science, 268, 1503–1506.

Paninski, L. (2003a). Convergence properties of three spike-triggered analysis techniques. Network: Comput. Neural Syst., 14, 437–464.

Paninski, L. (2003b). Estimation of entropy and mutual information. Neural Computation, 15, 1191–1253.

Panzeri, S., & Treves, A. (1996). Analytical estimates of limited sampling biases in different information measures. Network: Comput. Neural Syst., 7, 87–107.

Pola, G., Schultz, S. R., Petersen, R., & Panzeri, S. (2002). A practical guide to information analysis of spike trains. In R. Kötter (Ed.), Neuroscience databases: A practical guide (pp. 137–152). Norwell, MA: Kluwer.

Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P. (1992). Numerical recipes in C: The art of scientific computing. Cambridge: Cambridge University Press.

Reinagel, P., & Reid, R. C. (2000). Temporal coding of visual information in the thalamus. J. Neurosci., 20, 5392–5400.

Rieke, F., Bodnar, D. A., & Bialek, W. (1995). Naturalistic stimuli increase the rate and efficiency of information transmission by primary auditory afferents. Proc. R. Soc. Lond. B, 262, 259–265.

Rieke, F., Warland, D., de Ruyter van Steveninck, R. R., & Bialek, W. (1997). Spikes: Exploring the neural code. Cambridge, MA: MIT Press.

Ringach, D. L., Hawken, M. J., & Shapley, R. (2002). Receptive field structure of neurons in monkey visual cortex revealed by stimulation with natural image sequences. Journal of Vision, 2, 12–24.

Ringach, D. L., Sapiro, G., & Shapley, R. (1997). A subspace reverse-correlation technique for the study of visual neurons. Vision Res., 37, 2455–2464.

Rolls, E. T., Aggelopoulos, N. C., & Zheng, F. (2003). The receptive fields of inferior temporal cortex neurons in natural scenes. J. Neurosci., 23, 339–348.

Ruderman, D. L. (1994). The statistics of natural images. Network: Comput. Neural Syst., 5, 517–548.

Ruderman, D. L., & Bialek, W. (1994). Statistics of natural images: Scaling in the woods. Phys. Rev. Lett., 73, 814–817.

Schwartz, O., Chichilnisky, E. J., & Simoncelli, E. (2002). Characterizing neural gain control using spike-triggered covariance. In T. G. Dietterich, S. Becker, & Z. Ghahramani (Eds.), Advances in Neural Information Processing (pp. 269–276). Cambridge, MA: MIT Press.

Sen, K., Theunissen, F. E., & Doupe, A. J. (2001). Feature analysis of natural sounds in the songbird auditory forebrain. J. Neurophysiol., 86, 1445–1458.

Simoncelli, E., & Olshausen, B. A. (2001). Natural image statistics and neural representation. Annu. Rev. Neurosci., 24, 1193–1216.

Smirnakis, S. M., Berry, M. J., Warland, D. K., Bialek, W., & Meister, M. (1996). Adaptation of retinal processing to image contrast and spatial scale. Nature, 386, 69–73.

Smyth, D., Willmore, B., Baker, G. E., Thompson, I. D., & Tolhurst, D. J. (2003). The receptive-field organization of simple cells in primary visual cortex of ferrets under natural scene stimulation. J. Neurosci., 23, 4746–4759.

Stanley, G. B., Li, F. F., & Dan, Y. (1999). Reconstruction of natural scenes from ensemble responses in the lateral geniculate nucleus. J. Neurosci., 19, 8036–8042.

Strong, S. P., Koberle, R., de Ruyter van Steveninck, R. R., & Bialek, W. (1998). Entropy and information in neural spike trains. Phys. Rev. Lett., 80, 197–200.

Theunissen, F. E., Sen, K., & Doupe, A. J. (2000). Spectral-temporal receptive fields of nonlinear auditory neurons obtained using natural sounds. J. Neurosci., 20, 2315–2331.

Tishby, N., Pereira, F. C., & Bialek, W. (1999). The information bottleneck method. In B. Hajek & R. S. Sreenivas (Eds.), Proceedings of the 37th Allerton Conference on Communication, Control and Computing (pp. 368–377). Urbana: University of Illinois. See also physics/0004057.

Touryan, J., Lau, B., & Dan, Y. (2002). Isolation of relevant visual features from random stimuli for cortical complex cells. J. Neurosci., 22, 10811–10818.

Treves, A., & Panzeri, S. (1995). The upward bias in measures of information derived from limited data samples. Neural Comp., 7, 399–407.

von der Twer, T., & Macleod, D. I. A. (2001). Optimal nonlinear codes for the perception of natural colours. Network: Comput. Neural Syst., 12, 395–407.

Vickers, N. J., Christensen, T. A., Baker, T., & Hildebrand, J. G. (2001). Odour-plume dynamics influence the brain's olfactory code. Nature, 410, 466–470.

Vinje, W. E., & Gallant, J. L. (2000). Sparse coding and decorrelation in primary visual cortex during natural vision. Science, 287, 1273–1276.

Vinje, W. E., & Gallant, J. L. (2002). Natural stimulation of the nonclassical receptive field increases information transmission efficiency in V1. J. Neurosci., 22, 2904–2915.

Voss, R. F., & Clarke, J. (1975). "1/f noise" in music and speech. Nature, 258, 317–318.

Weliky, M., Fiser, J., Hunt, R., & Wagner, D. N. (2003). Coding of natural scenes in primary visual cortex. Neuron, 37, 703–718.

⁶ Where available, we give references to the physics e-print archive, which may be found online at http://arxiv.org/abs/; thus, Bialek (2002) is available online at http://arxiv.org/abs/physics/0205030. Published papers may differ from the versions posted to the archive.

Received January 21, 2003; accepted August 6, 2003.

Page 9: Analyzing Neural Responses to Natural Signals: Maximally ...wbialek/rome/refs/sharpee+al_04.pdfAnalyzing Neural Responses to Natural Signals 225 Figure 1: Schematic illustration of

Analyzing Neural Responses to Natural Signals 231

not increase if more vectors outside the RS are included For uncorrelatedstimuli any vector or a set of vectors that maximizes Iv belongs to theRS On the other hand as discussed in appendix B the result of optimiza-tion with respect to a number of vectors k lt K may deviate from the RS ifstimuli are correlated To nd the RS we rst maximize Iv and comparethis maximum with Ispike which is estimated according to equation 22 Ifthe difference exceeds that expected from nite sampling corrections weincrement the number of directions with respect to which information issimultaneously maximized

3 Optimization Algorithm

In this section we describe a particular algorithm we used to look for themost informative dimensions in order to nd the relevant subspace Wemake no claim that our choice of the algorithm is most efcient However itdoes give reproducible results for different starting points and spike trainswith differences taken to simulate neural noise Overall choices for an algo-rithm are broader because the information Iv as dened by equation 25is a continuous function whose gradient can be computed We nd (seeappendix C for a derivation)

rvI DZ

dxPvxpoundhsjx spikei iexcl hsjxi

currencentmicro

ddx

Pvxjspike

Pvx

para (31)

where

hsjx spikei D1

Pxjspike

Zds splusmnx iexcl s cent vPsjspike (32)

and similarly for hsjxi Since information does not change with the length ofthe vector we have v centrvI D 0 as also can be seen directly from equation 31

As an optimization algorithm we have used a combination of gradientascent and simulated annealing algorithms successive line maximizationswere done along the direction of the gradient (Press Teukolsky Vetterlingamp Flannery 1992) During line maximizations a point with a smaller valueof information was accepted according to Boltzmann statistics with proba-bility exp[IviC1 iexcl Ivi=T] The effective temperature T is reduced byfactor of 1 iexcl sup2T upon completion of each line maximization Parameters ofthe simulated annealing algorithm to be adjusted are the starting tempera-ture T0 and the cooling rate sup2T 1T D iexclsup2TT When maximizing with respectto one vector we used values T0 D 1 and sup2T D 005 When maximizing withrespect to two vectors we either used the cooling schedule with sup2T D 0005and repeated it several times (four times in our case) or allowed the effec-tive temperature T to increase by a factor of 10 upon convergence to a localmaximum (keeping T middot T0 always) while limiting the total number of linemaximizations

232 T Sharpee N Rust and W Bialek

10shy 4

10shy 2

1000

200

400

600

0 05 10

1

2

3w h ite no ise stim u li

IIsp ike

IIspike

P(II sp

ike )

natu ralis tic s tim u li

(a )

(b )

(c)

Figure 2 The probability distribution of information values in units of the totalinformation per spike in the case of (a) uncorrelated binary noise stimuli (b)correlated gaussian noise with power spectrum of natural scenes and (c) stimuliderived from natural scenes (patches of photos) The distribution was obtainedby calculating information along 105 random vectors for a model cell with onerelevant dimension Note the different scales in the two panels

The problem of maximizing a function often is related to the problemof making a good initial guess It turns out however that the choice of astarting point is much less crucial in cases where the stimuli are correlatedTo illustrate this point we plot in Figure 2 the probability distribution ofinformation along random directions v for both white noise and naturalisticstimuli in a model with one relevant dimension For uncorrelated stimulinot only is information equal to zero for a vector that is perpendicularto the relevant subspace but in addition the derivative is equal to zeroSince a randomly chosen vector has on average a small projection on therelevant subspace (compared to its length) vr=jvj raquo

pn=d the corresponding

information can be found by expanding in vr=jvj

I frac14 v2r

2jvj2

ZdxP Oeir

x

AacuteP0

Oeirx

POeirx

2

[hsOerjspikei iexcl hsOeri]2 (33)

where vector v D vr Oer C vir Oeir is decomposed in its components inside andoutside the RS respectively The average information for a random vectoris therefore raquo hv2

r i=jvj2 D K=DIn cases where stimuli are drawn from a gaussian ensemble with cor-

relations an expression for the information values has a similar structureto equation 33 To see this we transform to Fourier space and normalizeeach Fourier component by the square root of the power spectrum Sk

Analyzing Neural Responses to Natural Signals 233

In this new basis both the vectors feig 1 middot i middot K forming the RS and therandomly chosen vector v along which information is being evaluated areto be multiplied by

pSk Thus if we now substitute for the dot product

v2r the convolution weighted by the power spectrum

PKi v curren Oei

2 where

v curren Oei DP

k vkOeikSkpPk v2kSk

pPk Oe2

i kSk (34)

then equation 33 will describe information values along randomly chosenvectors v for correlated gaussian stimuli with the power spectrum SkAlthough both vr and vk are gaussian variables with variance raquo 1=Dthe weighted convolution has not only a much larger variance but is alsostrongly nongaussian (the nongaussian character is due to the normalizingfactor

Pk v2kSk in the denominator of equation 34) As for the variance

it can be estimated as lt v curren Oe12 gtD 4frac14= ln2 D in cases where stimuli aretaken as patches of correlated gaussian noise with the two-dimensionalpower spectrum Sk D A=k2 The large values of the weighted dot productv curren Oei 1 middot i middot K result not only in signicant information values along arandomly chosen vector but also in large magnitudes of the derivative rIwhich is no longer dominated by noise contrary to the case of uncorrelatedstimuli In the end we nd that randomly choosing one of the presentedframes as a starting guess is sufcient

4 Results

We tested the scheme of looking for the most informative dimensions onmodel neurons that respond to stimuli derived from natural scenes andsounds As visual stimuli we used scans across natural scenes which weretaken as black and white photos digitized to 8 bits with no correctionsmade for the camerarsquos light-intensity transformation function Some statis-tical properties of the stimulus set are shown in Figure 3 Qualitatively theyreproduce the known results on the statistics of natural scenes (Ruderman ampBialek 1994 Ruderman 1994 Dong amp Atick 1995 Simoncelli amp Olshausen2001) Most important properties for this study are strong spatial correla-tions as evident from the power spectrum S(k) plotted in Figure 3b anddeviations of the probability distribution from a gaussian one The nongaus-sian character can be seen in Figure 3c where the probability distributionof intensities is shown and in Figure 3d which shows the distribution ofprojections on a Gabor lter (in what follows the units of projections suchas s1 will be given in units of the corresponding standard deviations) Ourgoal is to demonstrate that although the correlations present in the ensem-ble are nongaussian they can be removed successfully from the estimate ofvectors dening the RS


Figure 3: Statistical properties of the visual stimulus ensemble. (a) One of the photos. Stimuli would be 30 × 30 patches taken from the overall photograph. (b) The power spectrum, in units of light intensity variance $\sigma_I^2$, averaged over orientation, as a function of dimensionless wave vector $ka$, where $a$ is the pixel size. (c) The probability distribution of light intensity in units of $\sigma_I$. (d) The probability distribution of projections between stimuli and a Gabor filter, also in units of the corresponding standard deviation $\sigma_{s_1}$.

4.1 A Model Simple Cell. Our first example is based on the properties of simple cells found in the primary visual cortex. A model phase- and orientation-sensitive cell has a single relevant dimension $\hat{e}_1$, shown in Figure 4a. A given stimulus s leads to a spike if the projection $s_1 = s \cdot \hat{e}_1$ reaches a threshold value $\theta$ in the presence of noise:

$$\frac{P(\mathrm{spike}|s)}{P(\mathrm{spike})} \equiv f(s_1) = \langle H(s_1 - \theta + \xi) \rangle, \qquad (4.1)$$

where a gaussian random variable $\xi$ of variance $\sigma^2$ models additive noise, and the function $H(x) = 1$ for $x > 0$ and zero otherwise. Together with the RF $\hat{e}_1$, the parameters $\theta$ for threshold and the noise variance $\sigma^2$ determine the input-output function.
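For readers who want to reproduce this kind of simulation, a minimal sketch of the threshold model of equation 4.1 is given below; the normalization of $s_1$ to unit variance and the random-number handling are assumptions of the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_simple_cell(stims, e1, theta, sigma):
    """Generate spikes from the threshold model of equation 4.1.

    stims : (N, D) stimulus frames; e1 : (D,) relevant dimension (RF).
    theta and sigma are given in units of the standard deviation of
    s1 = s . e1.  Illustrative sketch, not the authors' code.
    """
    s1 = stims @ e1
    s1 = s1 / s1.std()                       # work in units of sigma_{s1}
    noise = rng.normal(0.0, sigma, size=s1.shape)
    return (s1 - theta + noise > 0).astype(int)   # H(s1 - theta + xi)

# e.g., the Figure 4 parameter values: theta = 1.84, sigma = 0.31
# spikes = simulate_simple_cell(stims, e1, theta=1.84, sigma=0.31)
```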


Figure 4: Analysis of a model simple cell with RF shown in (a). (b) The "exact" spike-triggered average $v_{sta}$. (c) An attempt to remove correlations according to the reverse correlation method, $C^{-1}_{a\,priori} v_{sta}$. (d) The normalized vector $\hat{v}_{max}$ found by maximizing information. (e) Convergence of the algorithm according to information $I(v)$ and projection $\hat{v} \cdot \hat{e}_1$ between normalized vectors as a function of the inverse effective temperature $T^{-1}$. (f) The probability of a spike $P(\mathrm{spike}|s \cdot \hat{v}_{max})$ (crosses) is compared to $P(\mathrm{spike}|s_1)$ used in generating spikes (solid line). Parameters of the model are $\sigma = 0.31$ and $\theta = 1.84$, both given in units of the standard deviation of $s_1$, which is also the units for the x-axis in (f).

When the spike-triggered average (STA), or reverse correlation function, is computed from the responses to correlated stimuli, the resulting vector will be broadened due to spatial correlations present in the stimuli (see Figure 4b). For stimuli that are drawn from a gaussian probability distribution, the effects of correlations could be removed by multiplying $v_{sta}$ by the inverse of the a priori covariance matrix, according to the reverse correlation method, $\hat{v}_{\mathrm{Gaussian\ est}} \propto C^{-1}_{a\,priori} v_{sta}$, equation A.2. However, this procedure tends to amplify noise. To separate errors due to neural noise from those due to the nongaussian character of correlations, note that in a model, the effect of neural noise on our estimate of the STA can be eliminated by averaging the presented stimuli weighted with the exact firing rate, as opposed to using a histogram of responses to estimate $P(\mathrm{spike}|s)$ from a finite set of trials. We have used this "exact" STA,

$$v_{sta} = \int ds\, s\, P(s|\mathrm{spike}) = \frac{1}{P(\mathrm{spike})} \int ds\, P(s)\, s\, P(\mathrm{spike}|s), \qquad (4.2)$$

in the calculations presented in Figures 4b and 4c. Even with this noiseless STA (the equivalent of collecting an infinite data set), the standard decorrelation procedure is not valid for nongaussian stimuli and nonlinear input-output functions, as discussed in detail in appendix A. The result of such a decorrelation in our example is shown in Figure 4c. It clearly is missing some of the structure in the model filter, with projection $\hat{e}_1 \cdot \hat{v}_{\mathrm{Gaussian\ est}} \approx 0.14$. The discrepancy is not due to neural noise or finite sampling, since the "exact" STA was decorrelated; the absence of noise in the exact STA also means that there would be no justification for smoothing the results of the decorrelation. The discrepancy between the true RF and the decorrelated STA increases with the strength of nonlinearity in the input-output function.
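A sketch of the two estimators compared in Figures 4b and 4c is given below: the rate-weighted ("exact") STA of equation 4.2 and its decorrelated version, equation A.2. The small ridge term and the zero-mean assumption on the stimuli are additions of this sketch, not part of the method as stated.

```python
import numpy as np

def exact_sta(stims, rate):
    """'Exact' STA of equation 4.2: stimuli weighted by the exact firing
    rate P(spike|s) rather than by observed spike counts.
    Assumes zero-mean stimuli; stims is (N, D), rate is (N,)."""
    w = rate / rate.sum()
    return stims.T @ w

def decorrelated_sta(stims, rate, ridge=1e-6):
    """Reverse-correlation estimate, equation A.2: C^{-1} v_sta.
    The ridge term is an assumption added here for numerical stability."""
    v_sta = exact_sta(stims, rate)
    C = np.cov(stims, rowvar=False)
    return np.linalg.solve(C + ridge * np.eye(C.shape[0]), v_sta)
```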

In contrast, it is possible to obtain a good estimate of the relevant dimension $\hat{e}_1$ by maximizing information directly (see Figure 4d). A typical progress of the simulated annealing algorithm with decreasing temperature T is shown in Figure 4e. There we plot both the information along the vector and its projection on $\hat{e}_1$. We note that while the information I remains almost constant, the value of the projection continues to improve. Qualitatively, this is because the probability distributions depend exponentially on information. The final value of the projection depends on the size of the data set, as discussed below. In the example shown in Figure 4, there were $\approx 50{,}000$ spikes with average probability of spike $\approx 0.05$ per frame, and the reconstructed vector has a projection $\hat{v}_{max} \cdot \hat{e}_1 = 0.920 \pm 0.006$. Having estimated the RF, one can proceed to sample the nonlinear input-output function. This is done by constructing histograms for $P(s \cdot \hat{v}_{max})$ and $P(s \cdot \hat{v}_{max}|\mathrm{spike})$ of projections onto the vector $\hat{v}_{max}$ found by maximizing information, and taking their ratio, as in equation 1.2. In Figure 4f we compare $P(\mathrm{spike}|s \cdot \hat{v}_{max})$ (crosses) with the probability $P(\mathrm{spike}|s_1)$ used in the model (solid line).
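The histogram-ratio construction just described can be written compactly; the sketch below assumes a fixed number of bins and returns the ratio up to the overall factor $P(\mathrm{spike})$.

```python
import numpy as np

def sample_io_function(stims, spikes, v_max, n_bins=20):
    """Estimate the input-output function as the ratio of histograms,
    P(x|spike)/P(x) = P(spike|x)/P(spike)  (cf. equation 1.2),
    where x = s . v_max.  Bin count and names are illustrative."""
    x = stims @ (v_max / np.linalg.norm(v_max))
    edges = np.histogram_bin_edges(x, bins=n_bins)
    p_x, _ = np.histogram(x, bins=edges)
    p_x_spike, _ = np.histogram(x, bins=edges, weights=spikes)
    p_x = p_x / p_x.sum()
    p_x_spike = p_x_spike / p_x_spike.sum()
    centers = 0.5 * (edges[1:] + edges[:-1])
    ratio = np.divide(p_x_spike, p_x,
                      out=np.zeros_like(p_x_spike), where=p_x > 0)
    return centers, ratio
```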

4.2 Estimated Deviation from the Optimal Dimension. When information is calculated from a finite data set, the (normalized) vector $\hat{v}$ which maximizes I will deviate from the true RF $\hat{e}_1$. The deviation $\delta v = \hat{v} - \hat{e}_1$ arises because the probability distributions are estimated from experimental histograms and differ from the distributions found in the limit of infinite data size. For a simple cell, the quality of reconstruction can be characterized by the projection $\hat{v} \cdot \hat{e}_1 = 1 - \frac{1}{2}\delta v^2$, where both $\hat{v}$ and $\hat{e}_1$ are normalized, and $\delta v$ is by definition orthogonal to $\hat{e}_1$. The deviation $\delta v \sim A^{-1} \nabla I$, where A is the Hessian of information. Its structure is similar to that of a covariance matrix:

$$A_{ij} = \frac{1}{\ln 2} \int dx\, P(x|\mathrm{spike}) \left( \frac{d}{dx} \ln \frac{P(x|\mathrm{spike})}{P(x)} \right)^2 \left[ \langle s_i s_j | x \rangle - \langle s_i | x \rangle \langle s_j | x \rangle \right]. \qquad (4.3)$$

When averaged over possible outcomes of N trials, the gradient of information is zero for the optimal direction. Here, in order to evaluate $\langle \delta v^2 \rangle = \mathrm{Tr}\!\left[ A^{-1} \langle \nabla I\, \nabla I^T \rangle A^{-1} \right]$, we need to know the variance of the gradient of I. Assuming that the probability of generating a spike is independent for different bins, we can estimate $\langle \nabla I_i\, \nabla I_j \rangle \sim A_{ij} / (N_{\mathrm{spike}} \ln 2)$. Therefore, the expected error in the reconstruction of the optimal filter is inversely proportional to the number of spikes. The corresponding expected value of the projection between the reconstructed vector and the relevant direction $\hat{e}_1$ is given by

$$\hat{v} \cdot \hat{e}_1 \approx 1 - \frac{1}{2} \langle \delta v^2 \rangle = 1 - \frac{\mathrm{Tr}'[A^{-1}]}{2 N_{\mathrm{spike}} \ln 2}, \qquad (4.4)$$

where $\mathrm{Tr}'$ means that the trace is taken in the subspace orthogonal to the model filter.⁵ The estimate, equation 4.4, can be calculated without knowledge of the underlying model; it is $\sim D/(2 N_{\mathrm{spike}})$. This behavior should also hold in cases where the stimulus dimensions are expanded to include time. The errors are expected to increase in proportion to the increased dimensionality. In the case of a complex cell with two relevant dimensions, the error is expected to be twice that for a cell with a single relevant dimension, as also discussed in section 4.3.
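For orientation, the model-independent scaling quoted above can be evaluated directly; note that this is only the $D/(2 N_{\mathrm{spike}})$ scale, not the full Hessian-based estimate of equation 4.4, which can differ for a particular model.

```python
def deviation_scale(D, n_spike):
    """Rough, model-independent scale of the expected deviation from
    equation 4.4: <delta v^2>/2 ~ D / (2 N_spike).  The full estimate
    uses Tr'[A^-1] of the information Hessian and can differ."""
    return D / (2.0 * n_spike)

# e.g., a 30 x 30 pixel RF (D = 900) and 50,000 spikes give a scale of
# 900 / 100,000 = 0.009; halving the number of spikes doubles it.
```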

We emphasize that the error estimate according to equation 4.4 is of the same order as the errors of the reverse correlation method when it is applied to gaussian ensembles. The latter are given by $\left( \mathrm{Tr}[C^{-1}] - C^{-1}_{11} \right) / \left( 2 N_{\mathrm{spike}} \langle f'^2(s_1) \rangle \right)$. Of course, if the reverse correlation method were to be applied to the nongaussian ensemble, the errors would be larger. In Figure 5 we show the result of simulations for various numbers of trials, and therefore $N_{\mathrm{spike}}$. The average projection of the normalized reconstructed vector $\hat{v}$ on the RF $\hat{e}_1$ behaves initially as $1/N_{\mathrm{spike}}$ (dashed line). For smaller data sets, in this case $N_{\mathrm{spike}} \lesssim 30{,}000$, corrections $\sim N_{\mathrm{spike}}^{-2}$ become important for estimating the expected errors of the algorithm. Happily, these corrections have a sign such that smaller data sets are more effective than one might have expected from the asymptotic calculation. This can be verified from the expansion $\hat{v} \cdot \hat{e}_1 = [1 + \delta v^2]^{-1/2} \approx 1 - \frac{1}{2}\langle \delta v^2 \rangle + \frac{3}{8}\langle \delta v^4 \rangle$, where only the first two terms were taken into account in equation 4.4.

⁵ By definition, $\delta v_1 = \delta v \cdot \hat{e}_1 = 0$, and therefore $\langle \delta v_1^2 \rangle \propto A^{-1}_{11}$ is to be subtracted from $\langle \delta v^2 \rangle \propto \mathrm{Tr}[A^{-1}]$. Because $\hat{e}_1$ is an eigenvector of A with zero eigenvalue, $A^{-1}_{11}$ is infinite. Therefore, the proper treatment is to take the trace in the subspace orthogonal to $\hat{e}_1$.


Figure 5: Projection of the vector $\hat{v}_{max}$ that maximizes information on the RF $\hat{e}_1$, plotted as a function of the number of spikes. The solid line is a quadratic fit in $1/N_{\mathrm{spike}}$, and the dashed line is the leading linear term in $1/N_{\mathrm{spike}}$. This set of simulations was carried out for a model visual neuron with one relevant dimension from Figure 4a and the input-output function (see equation 4.1) with parameter values $\sigma \approx 0.61\sigma_{s_1}$, $\theta \approx 0.61\sigma_{s_1}$. For this model neuron, the linear approximation for the expected error is applicable for $N_{\mathrm{spike}} \gtrsim 30{,}000$.

4.3 A Model Complex Cell. A sequence of spikes from a model cell with two relevant dimensions was simulated by projecting each of the stimuli on vectors that differ by $\pi/2$ in their spatial phase, taken to mimic properties of complex cells, as in Figure 6. A particular frame leads to a spike according to a logical OR, that is, if either $|s_1|$ or $|s_2|$ exceeds a threshold value $\theta$ in the presence of noise, where $s_1 = s \cdot \hat{e}_1$ and $s_2 = s \cdot \hat{e}_2$. Similarly to equation 4.1,

$$\frac{P(\mathrm{spike}|s)}{P(\mathrm{spike})} = f(s_1, s_2) = \langle H(|s_1| - \theta - \xi_1) \vee H(|s_2| - \theta - \xi_2) \rangle, \qquad (4.5)$$

where $\xi_1$ and $\xi_2$ are independent gaussian variables. The sampling of this input-output function by our particular set of natural stimuli is shown in Figure 6c. As is well known, reverse correlation fails in this case because the spike-triggered average stimulus is zero, although with gaussian stimuli the spike-triggered covariance method would recover the relevant dimensions (Touryan et al., 2002). Here we show that searching for maximally informative dimensions allows us to recover the relevant subspace even under more natural stimulus conditions.
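A minimal sketch of the "OR" model of equation 4.5, analogous to the simple-cell sketch above, is given below; as before, the unit-variance normalization of $s_1$ and $s_2$ is an assumption of the sketch.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_complex_cell(stims, e1, e2, theta, sigma):
    """Spikes from the 'OR' model of equation 4.5: a spike occurs when
    |s1| or |s2| crosses the threshold theta in the presence of
    independent gaussian noise.  Units are std(s1), std(s2)."""
    s1 = stims @ e1; s1 = s1 / s1.std()
    s2 = stims @ e2; s2 = s2 / s2.std()
    xi1 = rng.normal(0.0, sigma, size=s1.shape)
    xi2 = rng.normal(0.0, sigma, size=s2.shape)
    fire = (np.abs(s1) - theta - xi1 > 0) | (np.abs(s2) - theta - xi2 > 0)
    return fire.astype(int)

# e.g., the Figure 6 parameter values: theta = 0.61, sigma = 0.31
```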


Figure 6: Analysis of a model complex cell with relevant dimensions $\hat{e}_1$ and $\hat{e}_2$ shown in (a) and (b), respectively. Spikes are generated according to an "OR" input-output function $f(s_1, s_2)$ with the threshold $\theta \approx 0.61\sigma_{s_1}$ and noise standard deviation $\sigma = 0.31\sigma_{s_1}$. (c, d) Vectors $v_1$ and $v_2$ found by maximizing information $I(v_1, v_2)$.

We start by maximizing information with respect to one vector. Contrary to the result in Figure 4e for a simple cell, one optimal dimension recovers only about 60% of the total information per spike (see equation 2.2). Perhaps surprisingly, because of the strong correlations in natural scenes, even a projection onto a random vector in the $D \sim 10^3$-dimensional stimulus space has a high probability of explaining 60% of the total information per spike, as can be seen in Figure 2. We therefore go on to maximize information with respect to two vectors. As a result of maximization, we obtain two vectors, $v_1$ and $v_2$, shown in Figure 6. The information along them is $I(v_1, v_2) \approx 0.90$, which is within the range of information values obtained along different linear combinations of the two model vectors, $I(\hat{e}_1, \hat{e}_2)/I_{\mathrm{spike}} = 0.89 \pm 0.11$. Therefore, the description of the neuron's firing in terms of vectors $v_1$ and $v_2$ is complete up to the noise level, and we do not have to look for extra relevant dimensions. Practically, the number of relevant dimensions can be determined by comparing $I(v_1, v_2)$ to either $I_{\mathrm{spike}}$ or $I(v_1, v_2, v_3)$, the latter being the result of maximization with respect to three vectors simultaneously. As mentioned in section 1, information along a set of vectors does not increase when extra dimensions are added to the relevant subspace. Therefore, if $I(v_1, v_2) = I(v_1, v_2, v_3)$ (again, up to the noise level), this means that there are only two relevant dimensions. Using $I_{\mathrm{spike}}$ for comparison with $I(v_1, v_2)$ has the advantage of not having to look for an extra dimension, which can be computationally intensive. However, $I_{\mathrm{spike}}$ might be subject to larger systematic bias errors than $I(v_1, v_2, v_3)$.

Vectors $v_1$ and $v_2$ obtained by maximizing $I(v_1, v_2)$ are not exactly orthogonal and are also rotated with respect to $\hat{e}_1$ and $\hat{e}_2$. However, the quality of reconstruction, as well as the value of information $I(v_1, v_2)$, is independent of a particular choice of basis within the RS. The appropriate measure of similarity between the two planes is the dot product of their normals. In the example of Figure 6, $\hat{n}(\hat{e}_1, \hat{e}_2) \cdot \hat{n}(v_1, v_2) = 0.82 \pm 0.07$, where $\hat{n}(\hat{e}_1, \hat{e}_2)$ is a normal to the plane passing through vectors $\hat{e}_1$ and $\hat{e}_2$. Maximizing information with respect to two dimensions requires a significantly slower cooling rate and, consequently, longer computational times. However, the expected error in the reconstruction, $1 - \hat{n}(\hat{e}_1, \hat{e}_2) \cdot \hat{n}(v_1, v_2)$, scales as $1/N_{\mathrm{spike}}$, behavior similar to equation 4.4, and is roughly twice that for a simple cell, given the same number of spikes. We make vectors $v_1$ and $v_2$ orthogonal to each other upon completion of the algorithm.

4.4 A Model Auditory Neuron with One Relevant Dimension. Because stimuli s are treated as vectors in an abstract space, the method of looking for the most informative dimensions can be applied equally well to auditory as well as to visual neurons. Here we illustrate the method by considering a model auditory neuron with one relevant dimension, which is shown in Figure 7c and is taken to mimic the properties of cochlear neurons. The model neuron is probed by two ensembles of naturalistic stimuli: one is a recording of a native Russian speaker reading a piece of Russian prose, and the other is a recording of a piece of English prose read by a native English speaker. Both ensembles are nongaussian and exhibit amplitude distributions with long, nearly exponential tails (see Figure 7a), which are qualitatively similar to those of light intensities in natural scenes (Voss & Clarke, 1975; Ruderman, 1994). However, the power spectrum is different in the two cases, as can be seen in Figure 7b. The differences in the correlation structure, in particular, lead to different STAs across the two ensembles (cf. Figure 7d). Both of the STAs also deviate from the model filter, shown in Figure 7c.

Despite differences in the probability distributions P(s), it is possible to recover the relevant dimension of the model neuron by maximizing information. In Figure 7e we show the two most informative vectors found by running the algorithm for the two ensembles and replot the model filter from Figure 7c to show that the three vectors overlap almost perfectly. Thus, different nongaussian correlations can be successfully removed to obtain an estimate of the relevant dimension. If the most informative vector changes with the stimulus ensemble, this can be interpreted as caused by adaptation to the probability distribution.
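To make the abstract-vector view concrete for the auditory case, the sketch below turns a sound recording into D = 500-dimensional stimulus vectors, to which the same projection-based machinery used for image patches applies; the random window placement and the normalization are assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(2)

def sound_to_stimuli(waveform, D=500, n_frames=50_000):
    """Represent an audio recording as stimulus vectors: each 'frame'
    is a window of D consecutive samples (D = 500 in the text), so the
    projection-based analysis applies unchanged.  Random window offsets
    and unit-variance normalization are assumptions of this sketch."""
    waveform = np.asarray(waveform, dtype=float)
    waveform = (waveform - waveform.mean()) / waveform.std()
    starts = rng.integers(0, len(waveform) - D, size=n_frames)
    return np.stack([waveform[s:s + D] for s in starts])
```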


Figure 7: A model auditory neuron is probed by two natural ensembles of stimuli: a piece of English prose (1) and a piece of Russian prose (2). The size of the stimulus ensemble was the same in both cases, and the sampling rate was 44.1 kHz. (a) The probability distribution of the sound pressure amplitude, in units of standard deviation, for both ensembles is strongly nongaussian. (b) The power spectra for the two ensembles. (c) The relevant vector of the model neuron, of dimensionality D = 500. (d) The STA is broadened in both cases, but differs between the two cases due to differences in the power spectra of the two ensembles. (e) Vectors that maximize information for either of the ensembles overlap almost perfectly with each other and with the model filter, which is also replotted here from (c). (f) The probability of a spike $P(\mathrm{spike}|s \cdot \hat{v}_{max})$ (crosses) is compared to $P(\mathrm{spike}|s_1)$ used in generating spikes (solid line). The input-output function had parameter values $\sigma \approx 0.9\sigma_{s_1}$ and $\theta \approx 1.8\sigma_{s_1}$.

5 Summary

Features of the stimulus that are most relevant for generating the response of a neuron can be found by maximizing information between the sequence of responses and the projection of stimuli on trial vectors within the stimulus space. Calculated in this manner, information becomes a function of direction in stimulus space. Those vectors that maximize the information and account for the total information per response of interest span the relevant subspace. The method allows multiple dimensions to be found. The reconstruction of the relevant subspace is done without assuming a particular form of the input-output function. It can be strongly nonlinear within the relevant subspace and is estimated from experimental histograms for each trial direction independently. Most important, this method can be used with any stimulus ensemble, even those that are strongly nongaussian, as in the case of natural signals. We have illustrated the method on model neurons responding to natural scenes and sounds. We expect the current implementation of the method to be most useful when several most informative vectors ($\le 10$, depending on their dimensionality) are to be analyzed for neurons probed by natural scenes. This technique could be particularly useful in describing sensory processing in poorly understood regions of higher-level sensory cortex (such as visual areas V2, V4, and IT, and auditory cortex beyond A1), where white noise stimulation is known to be less effective than naturalistic stimuli.

Appendix A: Limitations of the Reverse Correlation Method

Here we examine what sort of deviations one can expect when applying the reverse correlation method to natural stimuli, even in the model with just one relevant dimension. There are two factors that, when combined, invalidate the reverse correlation method: the nongaussian character of correlations and the nonlinearity of the input-output function (Ringach, Sapiro, & Shapley, 1997). In its original formulation (de Boer & Kuyper, 1968), the neuron is probed by white noise, and the relevant dimension $\hat{e}_1$ is given by the STA, $\hat{e}_1 \propto \langle s\, r(s) \rangle$. If the signals are not white, that is, the covariance matrix $C_{ij} = \langle s_i s_j \rangle$ is not a unit matrix, then the STA is a broadened version of the original filter $\hat{e}_1$. This can be seen by noting that for any function $F(s)$ of gaussian variables $\{s_i\}$, the identity holds:

$$\langle s_i F(s) \rangle = \langle s_i s_j \rangle \langle \partial_j F(s) \rangle, \qquad \partial_j \equiv \frac{\partial}{\partial s_j}. \qquad (A.1)$$

When property A.1 is applied to the vector components of the STA, $\langle s_i r(s) \rangle = C_{ij} \langle \partial_j r(s) \rangle$. Since we work within the assumption that the firing rate is a (nonlinear) function of the projection onto one filter $\hat{e}_1$, $r(s) = r(s_1)$, the latter average is proportional to the model filter itself, $\langle \partial_j r \rangle = \hat{e}_{1j} \langle r'(s_1) \rangle$. Therefore, we arrive at the prescription of the reverse correlation method:

$$\hat{e}_{1i} \propto [C^{-1}]_{ij} \langle s_j r(s) \rangle. \qquad (A.2)$$

The gaussian property is necessary in order to represent the STA as a convolution of the covariance matrix $C_{ij}$ of the stimulus ensemble and the model filter. To understand how the reconstructed vector obtained according to equation A.2 deviates from the relevant one, we consider weakly nongaussian stimuli with the probability distribution

$$P_{nG}(s) = \frac{1}{Z} P_0(s)\, e^{\epsilon H_1(s)}, \qquad (A.3)$$

where $P_0(s)$ is the gaussian probability distribution with covariance matrix C, and the normalization factor is $Z = \langle e^{\epsilon H_1(s)} \rangle$. The function $H_1$ describes deviations of the probability distribution from gaussian, and therefore we will set $\langle s_i H_1 \rangle = 0$ and $\langle s_i s_j H_1 \rangle = 0$, since these averages can be accounted for in the gaussian ensemble. In what follows, we will keep only the first-order terms in the perturbation parameter $\epsilon$. Using property A.1, we find the STA to be given by

$$\langle s_i r \rangle_{nG} = \langle s_i s_j \rangle \left[ \langle \partial_j r \rangle + \epsilon \langle r\, \partial_j H_1 \rangle \right], \qquad (A.4)$$

where averages are taken with respect to the gaussian distribution. Similarly, the covariance matrix $C_{ij}$ evaluated with respect to the nongaussian ensemble is given by

$$C_{ij} = \frac{1}{Z} \langle s_i s_j e^{\epsilon H_1} \rangle = \langle s_i s_j \rangle + \epsilon \langle s_i s_k \rangle \langle s_j \partial_k H_1 \rangle, \qquad (A.5)$$

so that, to the first order in $\epsilon$, $\langle s_i s_j \rangle = C_{ij} - \epsilon\, C_{ik} \langle s_j \partial_k H_1 \rangle$. Combining this with equation A.4, we get

$$\langle s_i r \rangle_{nG} = \mathrm{const} \times C_{ij} \hat{e}_{1j} + \epsilon\, C_{ij} \langle ( r - s_1 \langle r' \rangle )\, \partial_j H_1 \rangle. \qquad (A.6)$$

The second term in equation A.6 prevents the application of the reverse correlation method for nongaussian signals. Indeed, if we multiply the STA, equation A.6, with the inverse of the a priori covariance matrix $C_{ij}$, according to the reverse correlation method, equation A.2, we no longer obtain the RF $\hat{e}_1$. The deviation of the obtained answer from the true RF increases with $\epsilon$, which measures the deviation of the probability distribution from gaussian. Since natural stimuli are known to be strongly nongaussian, this makes the use of reverse correlation problematic when analyzing neural responses to natural stimuli.

The difference in applying the reverse correlation to stimuli drawn from a correlated gaussian ensemble versus a nongaussian one is illustrated in Figures 8b and 8c. In the first case, shown in Figure 8b, stimuli are drawn from a correlated gaussian ensemble with the covariance matrix equal to that of natural images. In the second case, shown in Figure 8c, the patches of photos are taken as stimuli. The STA is broadened in both cases. Although the two-point correlations are just as strong in the case of gaussian stimuli as they are in the natural stimuli ensemble, gaussian correlations can be successfully removed from the STA according to equation A.2 to obtain the model filter. On the contrary, an attempt to use reverse correlation with natural stimuli results in an altered version of the model filter. We reiterate that for this example, the apparent noise in the decorrelated vector is not due to neural noise or finite data sets, since the "exact" STA has been used (see equation 4.2) in all calculations presented in Figures 8 and 9.

Figure 8: The nongaussian character of correlations present in natural scenes invalidates the reverse correlation method for neurons with a nonlinear input-output function. (a) A model visual neuron has one relevant dimension $\hat{e}_1$ and the nonlinear input-output function. The "exact" STA is used (see equation 4.2) to separate effects of neural noise from alterations introduced by the method. The decorrelated "exact" STA is obtained by multiplying the "exact" STA by the inverse of the covariance matrix, according to equation A.2. (b) Stimuli are taken from a correlated gaussian noise ensemble. The effect of correlations in the STA can be removed according to equation A.2. (c) When patches of photos are taken as stimuli for the same model neuron as in (b), the decorrelation procedure gives an altered version of the model filter. The two stimulus ensembles have the same covariance matrix.

The reverse correlation method gives the correct answer for any distribution of signals if the probability of generating a spike is a linear function of $s_i$, since then the second term in equation A.6 is zero. In particular, a linear input-output relation could arise due to a neural noise whose variance is much larger than the variance of the signal itself. This point is illustrated in Figures 9a, 9b, and 9c, where the reverse correlation method is applied to a threshold input-output function at low, moderate, and high signal-to-noise ratios. For small signal-to-noise ratios, where the noise standard deviation is similar to that of the projections $s_1$, the threshold nonlinearity in the input-output function is masked by noise and is effectively linear. In this limit, the reverse correlation can be applied with the exact STA. However, for experimentally calculated STAs at low signal-to-noise ratios, the decorrelation procedure results in strong noise amplification. At higher signal-to-noise ratios, decorrelation fails due to the nonlinearity of the input-output function, in accordance with equation A.6.

Appendix B: Maxima of I(v): What Do They Mean?

The relevant subspace of dimensionality K can be found by maximizing information simultaneously with respect to K vectors. The result of maximization with respect to a number of vectors that is less than the true dimensionality of the relevant subspace may produce vectors that have components in the irrelevant subspace. This happens only in the presence of correlations in stimuli. As an illustration, we consider the situation where the dimensionality of the relevant subspace is K = 2, and vector $\hat{e}_1$ describes the most informative direction within the relevant subspace. We show here that although the gradient of information is perpendicular to both $\hat{e}_1$ and $\hat{e}_2$, it may have components outside the relevant subspace. Therefore, the vector $v_{max}$ that corresponds to the maximum of I(v) will lie outside the relevant subspace. We recall from equation 3.1 that

$$\nabla I(\hat{e}_1) = \int ds_1\, P(s_1)\, \frac{d}{ds_1}\!\left[ \frac{P(s_1|\mathrm{spike})}{P(s_1)} \right] \left( \langle s | s_1, \mathrm{spike} \rangle - \langle s | s_1 \rangle \right). \qquad (B.1)$$

We can rewrite the conditional averages $\langle s | s_1 \rangle = \int ds_2\, P(s_1, s_2) \langle s | s_1, s_2 \rangle / P(s_1)$ and $\langle s | s_1, \mathrm{spike} \rangle = \int ds_2\, f(s_1, s_2)\, P(s_1, s_2) \langle s | s_1, s_2 \rangle / P(s_1|\mathrm{spike})$, so that

$$\nabla I(\hat{e}_1) = \int ds_1 ds_2\, P(s_1, s_2) \langle s | s_1, s_2 \rangle\, \frac{P(\mathrm{spike}|s_1, s_2) - P(\mathrm{spike}|s_1)}{P(\mathrm{spike})}\, \frac{d}{ds_1} \ln \frac{P(s_1|\mathrm{spike})}{P(s_1)}. \qquad (B.2)$$

Because we assume that the vector $\hat{e}_1$ is the most informative within the relevant subspace, $\hat{e}_1 \cdot \nabla I = \hat{e}_2 \cdot \nabla I = 0$, so that the integral in equation B.2 is zero for those directions in which the component of the vector $\langle s | s_1, s_2 \rangle$ changes linearly with $s_1$ and $s_2$. For uncorrelated stimuli, this is true for all directions, so that the most informative vector within the relevant subspace is also the most informative in the overall stimulus space. In the presence of correlations, the gradient may have nonzero components along some irrelevant directions if the projection of the vector $\langle s | s_1, s_2 \rangle$ on those directions is not a linear function of $s_1$ and $s_2$. By looking for a maximum of information, we will therefore be driven outside the relevant subspace. The deviation of $v_{max}$ from the relevant subspace is also proportional to the strength of the dependence on the second parameter $s_2$, because of the factor $\left[ P(s_1, s_2|\mathrm{spike})/P(s_1, s_2) - P(s_1|\mathrm{spike})/P(s_1) \right]$ in the integrand.

Figure 9: Application of the reverse correlation method to a model visual neuron with one relevant dimension $\hat{e}_1$ and a threshold input-output function, for decreasing values of the noise variance, $\sigma/\sigma_{s_1} \approx 6.1,\ 0.61,\ 0.06$ in (a), (b), and (c), respectively. The model $P(\mathrm{spike}|s_1)$ becomes effectively linear when the signal-to-noise ratio is small. The reverse correlation can be used together with natural stimuli if the input-output function is linear. Otherwise, the deviations between the decorrelated STA and the model filter increase with the nonlinearity of $P(\mathrm{spike}|s_1)$.

Appendix C: The Gradient of Information

According to expression 2.5, the information I(v) depends on the vector v only through the probability distributions $P_v(x)$ and $P_v(x|\mathrm{spike})$. Therefore, we can express the gradient of information in terms of gradients of those probability distributions:

$$\nabla_v I = \frac{1}{\ln 2} \int dx \left[ \ln \frac{P_v(x|\mathrm{spike})}{P_v(x)}\, \nabla_v P_v(x|\mathrm{spike}) - \frac{P_v(x|\mathrm{spike})}{P_v(x)}\, \nabla_v P_v(x) \right], \qquad (C.1)$$

where we took into account that $\int dx\, P_v(x|\mathrm{spike}) = 1$ and does not change with v. To find the gradients of the probability distributions, we note that

$$\nabla_v P_v(x) = \nabla_v \left[ \int ds\, P(s)\, \delta(x - s \cdot v) \right] = -\int ds\, P(s)\, s\, \delta'(x - s \cdot v) = -\frac{d}{dx}\left[ P_v(x) \langle s | x \rangle \right], \qquad (C.2)$$

and analogously for $P_v(x|\mathrm{spike})$:

$$\nabla_v P_v(x|\mathrm{spike}) = -\frac{d}{dx}\left[ P_v(x|\mathrm{spike}) \langle s | x, \mathrm{spike} \rangle \right]. \qquad (C.3)$$

Substituting expressions C.2 and C.3 into equation C.1 and integrating once by parts, we obtain

$$\nabla_v I = \int dx\, P_v(x) \left[ \langle s | x, \mathrm{spike} \rangle - \langle s | x \rangle \right] \cdot \left[ \frac{d}{dx} \frac{P_v(x|\mathrm{spike})}{P_v(x)} \right],$$

which is expression 3.1 of the main text.
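A finite-sample version of this gradient, with the conditional averages estimated bin by bin, might look as follows; the binning, the treatment of empty bins, and the finite-difference derivative are assumptions of this sketch rather than the authors' implementation.

```python
import numpy as np

def info_gradient(stims, spikes, v, n_bins=25):
    """Finite-sample estimate of the gradient of I(v) (expression 3.1 /
    appendix C): conditional means of s in each bin of x = s . v_hat,
    weighted by d/dx [P_v(x|spike)/P_v(x)], up to the overall 1/ln 2
    factor of equation C.1.  Illustrative sketch."""
    v = v / np.linalg.norm(v)
    x = stims @ v
    edges = np.histogram_bin_edges(x, bins=n_bins)
    which = np.clip(np.digitize(x, edges) - 1, 0, n_bins - 1)
    dx = edges[1] - edges[0]

    p_x = np.bincount(which, minlength=n_bins).astype(float)
    p_x /= p_x.sum()
    p_xs = np.bincount(which, weights=spikes, minlength=n_bins).astype(float)
    p_xs /= p_xs.sum()

    # <s|x> and <s|x, spike> estimated per bin
    D = stims.shape[1]
    s_mean = np.zeros((n_bins, D))
    s_mean_spike = np.zeros((n_bins, D))
    for b in range(n_bins):
        in_bin = which == b
        if in_bin.any():
            s_mean[b] = stims[in_bin].mean(axis=0)
            w = spikes[in_bin]
            if w.sum() > 0:
                s_mean_spike[b] = (stims[in_bin] * w[:, None]).sum(axis=0) / w.sum()

    ratio = np.divide(p_xs, p_x, out=np.zeros(n_bins), where=p_x > 0)
    d_ratio = np.gradient(ratio, dx)          # d/dx of P(x|spike)/P(x)
    grad = (p_x[:, None] * (s_mean_spike - s_mean) * d_ratio[:, None]).sum(axis=0)
    return grad
```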

Acknowledgments

We thank K. D. Miller for many helpful discussions. Work at UCSF was supported in part by the Sloan and Swartz foundations and by a training grant from the NIH. Our collaboration began at the Marine Biological Laboratory in a course supported by grants from NIMH and the Howard Hughes Medical Institute.


References

Aguera y Arcas, B., Fairhall, A. L., & Bialek, W. (2003). Computation in a single neuron: Hodgkin and Huxley revisited. Neural Comp., 15, 1715–1749.

Baddeley, R., Abbott, L. F., Booth, M. C. A., Sengpiel, F., Freeman, T., Wakeman, E. A., & Rolls, E. T. (1997). Responses of neurons in primary and inferior temporal visual cortices to natural scenes. Proc. R. Soc. Lond. B, 264, 1775–1783.

Barlow, H. (1961). Possible principles underlying the transformations of sensory images. In W. Rosenblith (Ed.), Sensory communication (pp. 217–234). Cambridge, MA: MIT Press.

Barlow, H. (2001). Redundancy reduction revisited. Network: Comput. Neural Syst., 12, 241–253.

Bialek, W. (2002). Thinking about the brain. In H. Flyvbjerg, F. Julicher, P. Ormos, & F. David (Eds.), Physics of biomolecules and cells (pp. 485–577). Berlin: Springer-Verlag. See also physics/0205030.⁶

Bialek, W., & de Ruyter van Steveninck, R. R. (2003). Features and dimensions: Motion estimation in fly vision. Unpublished manuscript.

Brenner, N., Bialek, W., & de Ruyter van Steveninck, R. R. (2000). Adaptive rescaling maximizes information transmission. Neuron, 26, 695–702.

Brenner, N., Strong, S. P., Koberle, R., Bialek, W., & de Ruyter van Steveninck, R. R. (2000). Synergy in a neural code. Neural Computation, 12, 1531–1552. See also physics/9902067.

Chichilnisky, E. J. (2001). A simple white noise analysis of neuronal light responses. Network: Comput. Neural Syst., 12, 199–213.

Creutzfeldt, O. D., & Northdurft, H. C. (1978). Representation of complex visual stimuli in the brain. Naturwissenschaften, 65, 307–318.

Cover, T. M., & Thomas, J. A. (1991). Information theory. New York: Wiley.

de Boer, E., & Kuyper, P. (1968). Triggered correlation. IEEE Trans. Biomed. Eng., 15, 169–179.

de Ruyter van Steveninck, R. R., & Bialek, W. (1988). Real-time performance of a movement-sensitive neuron in the blowfly visual system: Coding and information transfer in short spike sequences. Proc. R. Soc. Lond. B, 265, 259–265.

de Ruyter van Steveninck, R. R., Borst, A., & Bialek, W. (2000). Real time encoding of motion: Answerable questions and questionable answers from the fly's visual system. In J. M. Zanker & J. Zeil (Eds.), Motion vision: Computational, neural and ecological constraints (pp. 279–306). New York: Springer-Verlag. See also physics/0004060.

de Ruyter van Steveninck, R. R., Lewen, G. D., Strong, S. P., Koberle, R., & Bialek, W. (1997). Reproducibility and variability in neural spike trains. Science, 275, 1805–1808. See also cond-mat/9603127.

⁶ Where available, we give references to the physics e-print archive, which may be found online at http://arxiv.org/abs/; thus, Bialek (2002) is available online at http://arxiv.org/abs/physics/0205030. Published papers may differ from the versions posted to the archive.

Dimitrov, A. G., & Miller, J. P. (2001). Neural coding and decoding: Communication channels and quantization. Network: Comput. Neural Syst., 12, 441–472.

Dong, D. W., & Atick, J. J. (1995). Statistics of natural time-varying images. Network: Comput. Neural Syst., 6, 345–358.

Fairhall, A. L., Lewen, G. D., Bialek, W., & de Ruyter van Steveninck, R. R. (2001). Efficiency and ambiguity in an adaptive neural code. Nature, 412, 787–792.

Kara, P., Reinagel, P., & Reid, R. C. (2000). Low response variability in simultaneously recorded retinal, thalamic, and cortical neurons. Neuron, 27, 635–646.

Lewen, G. D., Bialek, W., & de Ruyter van Steveninck, R. R. (2001). Neural coding of naturalistic motion stimuli. Network: Comput. Neural Syst., 12, 317–329. See also physics/0103088.

Mainen, Z. F., & Sejnowski, T. J. (1995). Reliability of spike timing in neocortical neurons. Science, 268, 1503–1506.

Paninski, L. (2003a). Convergence properties of three spike-triggered analysis techniques. Network: Comput. in Neural Systems, 14, 437–464.

Paninski, L. (2003b). Estimation of entropy and mutual information. Neural Computation, 15, 1191–1253.

Panzeri, S., & Treves, A. (1996). Analytical estimates of limited sampling biases in different information measures. Network: Comput. Neural Syst., 7, 87–107.

Pola, G., Schultz, S. R., Petersen, R., & Panzeri, S. (2002). A practical guide to information analysis of spike trains. In R. Kotter (Ed.), Neuroscience databases: A practical guide (pp. 137–152). Norwell, MA: Kluwer.

Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P. (1992). Numerical recipes in C: The art of scientific computing. Cambridge: Cambridge University Press.

Reinagel, P., & Reid, R. C. (2000). Temporal coding of visual information in the thalamus. J. Neurosci., 20, 5392–5400.

Rieke, F., Bodnar, D. A., & Bialek, W. (1995). Naturalistic stimuli increase the rate and efficiency of information transmission by primary auditory afferents. Proc. R. Soc. Lond. B, 262, 259–265.

Rieke, F., Warland, D., de Ruyter van Steveninck, R. R., & Bialek, W. (1997). Spikes: Exploring the neural code. Cambridge, MA: MIT Press.

Ringach, D. L., Hawken, M. J., & Shapley, R. (2002). Receptive field structure of neurons in monkey visual cortex revealed by stimulation with natural image sequences. Journal of Vision, 2, 12–24.

Ringach, D. L., Sapiro, G., & Shapley, R. (1997). A subspace reverse-correlation technique for the study of visual neurons. Vision Res., 37, 2455–2464.

Rolls, E. T., Aggelopoulos, N. C., & Zheng, F. (2003). The receptive fields of inferior temporal cortex neurons in natural scenes. J. Neurosci., 23, 339–348.

Ruderman, D. L. (1994). The statistics of natural images. Network: Comput. Neural Syst., 5, 517–548.

Ruderman, D. L., & Bialek, W. (1994). Statistics of natural images: Scaling in the woods. Phys. Rev. Lett., 73, 814–817.

Schwartz, O., Chichilnisky, E. J., & Simoncelli, E. (2002). Characterizing neural gain control using spike-triggered covariance. In T. G. Dietterich, S. Becker, & Z. Ghahramani (Eds.), Advances in Neural Information Processing (pp. 269–276). Cambridge, MA: MIT Press.

Sen, K., Theunissen, F. E., & Doupe, A. J. (2001). Feature analysis of natural sounds in the songbird auditory forebrain. J. Neurophysiol., 86, 1445–1458.

Simoncelli, E., & Olshausen, B. A. (2001). Natural image statistics and neural representation. Annu. Rev. Neurosci., 24, 1193–1216.

Smirnakis, S. M., Berry, M. J., Warland, D. K., Bialek, W., & Meister, M. (1996). Adaptation of retinal processing to image contrast and spatial scale. Nature, 386, 69–73.

Smyth, D., Willmore, B., Baker, G. E., Thompson, I. D., & Tolhurst, D. J. (2003). The receptive-field organization of simple cells in primary visual cortex of ferrets under natural scene stimulation. J. Neurosci., 23, 4746–4759.

Stanley, G. B., Li, F. F., & Dan, Y. (1999). Reconstruction of natural scenes from ensemble responses in the lateral geniculate nucleus. J. Neurosci., 19, 8036–8042.

Strong, S. P., Koberle, R., de Ruyter van Steveninck, R. R., & Bialek, W. (1998). Entropy and information in neural spike trains. Phys. Rev. Lett., 80, 197–200.

Theunissen, F. E., Sen, K., & Doupe, A. J. (2000). Spectral-temporal receptive fields of nonlinear auditory neurons obtained using natural sounds. J. Neurosci., 20, 2315–2331.

Tishby, N., Pereira, F. C., & Bialek, W. (1999). The information bottleneck method. In B. Hajek & R. S. Sreenivas (Eds.), Proceedings of the 37th Allerton Conference on Communication, Control and Computing (pp. 368–377). Urbana: University of Illinois. See also physics/0004057.

Touryan, J., Lau, B., & Dan, Y. (2002). Isolation of relevant visual features from random stimuli for cortical complex cells. J. Neurosci., 22, 10811–10818.

Treves, A., & Panzeri, S. (1995). The upward bias in measures of information derived from limited data samples. Neural Comp., 7, 399–407.

von der Twer, T., & Macleod, D. I. A. (2001). Optimal nonlinear codes for the perception of natural colours. Network: Comput. Neural Syst., 12, 395–407.

Vickers, N. J., Christensen, T. A., Baker, T., & Hildebrand, J. G. (2001). Odour-plume dynamics influence the brain's olfactory code. Nature, 410, 466–470.

Vinje, W. E., & Gallant, J. L. (2000). Sparse coding and decorrelation in primary visual cortex during natural vision. Science, 287, 1273–1276.

Vinje, W. E., & Gallant, J. L. (2002). Natural stimulation of the nonclassical receptive field increases information transmission efficiency in V1. J. Neurosci., 22, 2904–2915.

Voss, R. F., & Clarke, J. (1975). "1/f noise" in music and speech. Nature, 317–318.

Weliky, M., Fiser, J., Hunt, R., & Wagner, D. N. (2003). Coding of natural scenes in primary visual cortex. Neuron, 37, 703–718.

Received January 21, 2003; accepted August 6, 2003.

Page 10: Analyzing Neural Responses to Natural Signals: Maximally ...wbialek/rome/refs/sharpee+al_04.pdfAnalyzing Neural Responses to Natural Signals 225 Figure 1: Schematic illustration of

232 T Sharpee N Rust and W Bialek

10shy 4

10shy 2

1000

200

400

600

0 05 10

1

2

3w h ite no ise stim u li

IIsp ike

IIspike

P(II sp

ike )

natu ralis tic s tim u li

(a )

(b )

(c)

Figure 2 The probability distribution of information values in units of the totalinformation per spike in the case of (a) uncorrelated binary noise stimuli (b)correlated gaussian noise with power spectrum of natural scenes and (c) stimuliderived from natural scenes (patches of photos) The distribution was obtainedby calculating information along 105 random vectors for a model cell with onerelevant dimension Note the different scales in the two panels

The problem of maximizing a function often is related to the problemof making a good initial guess It turns out however that the choice of astarting point is much less crucial in cases where the stimuli are correlatedTo illustrate this point we plot in Figure 2 the probability distribution ofinformation along random directions v for both white noise and naturalisticstimuli in a model with one relevant dimension For uncorrelated stimulinot only is information equal to zero for a vector that is perpendicularto the relevant subspace but in addition the derivative is equal to zeroSince a randomly chosen vector has on average a small projection on therelevant subspace (compared to its length) vr=jvj raquo

pn=d the corresponding

information can be found by expanding in vr=jvj

I frac14 v2r

2jvj2

ZdxP Oeir

x

AacuteP0

Oeirx

POeirx

2

[hsOerjspikei iexcl hsOeri]2 (33)

where vector v D vr Oer C vir Oeir is decomposed in its components inside andoutside the RS respectively The average information for a random vectoris therefore raquo hv2

r i=jvj2 D K=DIn cases where stimuli are drawn from a gaussian ensemble with cor-

relations an expression for the information values has a similar structureto equation 33 To see this we transform to Fourier space and normalizeeach Fourier component by the square root of the power spectrum Sk

Analyzing Neural Responses to Natural Signals 233

In this new basis both the vectors feig 1 middot i middot K forming the RS and therandomly chosen vector v along which information is being evaluated areto be multiplied by

pSk Thus if we now substitute for the dot product

v2r the convolution weighted by the power spectrum

PKi v curren Oei

2 where

v curren Oei DP

k vkOeikSkpPk v2kSk

pPk Oe2

i kSk (34)

then equation 33 will describe information values along randomly chosenvectors v for correlated gaussian stimuli with the power spectrum SkAlthough both vr and vk are gaussian variables with variance raquo 1=Dthe weighted convolution has not only a much larger variance but is alsostrongly nongaussian (the nongaussian character is due to the normalizingfactor

Pk v2kSk in the denominator of equation 34) As for the variance

it can be estimated as lt v curren Oe12 gtD 4frac14= ln2 D in cases where stimuli aretaken as patches of correlated gaussian noise with the two-dimensionalpower spectrum Sk D A=k2 The large values of the weighted dot productv curren Oei 1 middot i middot K result not only in signicant information values along arandomly chosen vector but also in large magnitudes of the derivative rIwhich is no longer dominated by noise contrary to the case of uncorrelatedstimuli In the end we nd that randomly choosing one of the presentedframes as a starting guess is sufcient

4 Results

We tested the scheme of looking for the most informative dimensions onmodel neurons that respond to stimuli derived from natural scenes andsounds As visual stimuli we used scans across natural scenes which weretaken as black and white photos digitized to 8 bits with no correctionsmade for the camerarsquos light-intensity transformation function Some statis-tical properties of the stimulus set are shown in Figure 3 Qualitatively theyreproduce the known results on the statistics of natural scenes (Ruderman ampBialek 1994 Ruderman 1994 Dong amp Atick 1995 Simoncelli amp Olshausen2001) Most important properties for this study are strong spatial correla-tions as evident from the power spectrum S(k) plotted in Figure 3b anddeviations of the probability distribution from a gaussian one The nongaus-sian character can be seen in Figure 3c where the probability distributionof intensities is shown and in Figure 3d which shows the distribution ofprojections on a Gabor lter (in what follows the units of projections suchas s1 will be given in units of the corresponding standard deviations) Ourgoal is to demonstrate that although the correlations present in the ensem-ble are nongaussian they can be removed successfully from the estimate ofvectors dening the RS

234 T Sharpee N Rust and W Bialek

shy 2 0 210

shy 2

100

200 400 600

100

200

300

400

shy 4 shy 2 0 2 4 10

shy 5

100

105

10shy 1

100

10shy 5

100

105

P(s1 ) P(I)

S(k) s2(I)

(a) (b)

(c) (d)

(Ishy ltIgt ) s(I) (s1shy lts

1gt ) s(s

1 )

ka p

Figure 3 Statistical properties of the visual stimulus ensemble (a) One of thephotos Stimuli would be 30 pound 30 patches taken from the overall photograph(b) The power spectrum in units of light intensity variance frac34 2I averaged overorientation as a function of dimensionless wave vector ka where a is the pixelsize (c) The probability distribution of light intensity in units of frac34 I (d) Theprobability distribution of projections between stimuli and a Gabor lter alsoin units of the corresponding standard deviation frac34 s1

41 A Model Simple Cell Our rst example is based on the propertiesof simple cells found in the primary visual cortex A model phase- andorientation-sensitive cell has a single relevant dimension Oe1 shown in Fig-ure 4a A given stimulus s leads to a spike if the projection s1 D s cent Oe1 reachesa threshold value micro in the presence of noise

Pspikejs

Pspikeacute f s1 D hHs1 iexcl micro C raquo i (41)

where a gaussian random variable raquo of variance frac34 2 models additive noiseand the function Hx D 1 for x gt 0 and zero otherwise Together with theRF Oe1 the parameters micro for threshold and the noise variance frac34 2 determinethe input-output function

Analyzing Neural Responses to Natural Signals 235

Figure 4 Analysis of a model simple cell with RF shown in (a) (b) The ldquoexactrdquospike-triggered average vsta (c) An attempt to remove correlations accordingto the reverse correlation method Ciexcl1

a priorivsta (d) The normalized vector Ovmax

found by maximizing information (e) Convergence of the algorithm accordingto information Iv and projection Ovcent Oe1 between normalized vectors as a functionof the inverse effective temperature Tiexcl1 (f) The probability of a spike Pspikejs centOvmax (crosses) is compared to Pspikejs1 used in generating spikes (solid line)Parameters of the model are frac34 D 031 and micro D 184 both given in units ofstandard deviation of s1 which is also the units for the x-axis in f

When the spike-triggered average (STA) or reverse correlation functionis computed from the responses to correlated stimuli the resulting vectorwill be broadened due to spatial correlations present in the stimuli (see Fig-ure 4b) For stimuli that are drawn from a gaussian probability distributionthe effects of correlations could be removed by multiplying vsta by the in-verse of the a priori covariance matrix according to the reverse correlationmethod OvGaussian est Ciexcl1

a priorivsta equation A2 However this proceduretends to amplify noise To separate errors due to neural noise from those

236 T Sharpee N Rust and W Bialek

due to the nongaussian character of correlations note that in a model theeffect of neural noise on our estimate of the STA can be eliminated by aver-aging the presented stimuli weighted with the exact ring rate as opposedto using a histogram of responses to estimate Pspikejs from a nite set oftrials We have used this ldquoexactrdquo STA

vsta DZ

ds sPsjspike D 1Pspike

ZdsPs sPspikejs (42)

in calculations presented in Figures 4b and 4c Even with this noiseless STA(the equivalent of collecting an innite data set) the standard decorrelationprocedure is not valid for nongaussian stimuli and nonlinear input-outputfunctions as discussed in detail in appendix A The result of such a decorre-lation in our example is shown in Figure 4c It clearly is missing some of thestructure in the model lter with projection Oe1 cent OvGaussian est frac14 014 The dis-crepancy is not due to neural noise or nite sampling since the ldquoexactrdquo STAwas decorrelated the absence of noise in the exact STA also means that therewould be no justication for smoothing the results of the decorrelation Thediscrepancy between the true RF and the decorrelated STA increases withthe strength of nonlinearity in the input-output function

In contrast it is possible to obtain a good estimate of the relevant di-mension Oe1 by maximizing information directly (see Figure 4d) A typicalprogress of the simulated annealing algorithm with decreasing temperatureT is shown in Figure 4e There we plot both the information along the vectorand its projection on Oe1 We note that while information I remains almostconstant the value of projection continues to improve Qualitatively this isbecause the probability distributions depend exponentially on informationThe nal value of projection depends on the size of the data set as discussedbelow In the example shown in Figure 4 there were frac14 50 000 spikes withaverage probability of spike frac14 005 per frame and the reconstructed vectorhas a projection Ovmax cent Oe1 D 0920 sect 0006 Having estimated the RF onecan proceed to sample the nonlinear input-output function This is done byconstructing histograms for Ps cent Ovmax and Ps cent Ovmaxjspike of projectionsonto vector Ovmax found by maximizing information and taking their ratioas in equation 12 In Figure 4f we compare Pspikejs cent Ovmax (crosses) withthe probability Pspikejs1 used in the model (solid line)

42 Estimated Deviation from the Optimal Dimension When infor-mation is calculated from a nite data set the (normalized) vector Ov whichmaximizes I will deviate from the true RF Oe1 The deviation plusmnv D OviexclOe1 arisesbecause the probability distributions are estimated from experimental his-tograms and differ from the distributions found in the limit of innite datasize For a simple cell the quality of reconstruction can be characterized bythe projection Ov cent Oe1 D 1 iexcl 1

2plusmnv2 where both Ov and Oe1 are normalized andplusmnv is by denition orthogonal to Oe1 The deviation plusmnv raquo Aiexcl1rI where A is

Analyzing Neural Responses to Natural Signals 237

the Hessian of information Its structure is similar to that of a covariancematrix

Aij D1

ln 2

ZdxPxjspike

sup3ddx

lnPxjspike

Px

acute2

pound hsisjjxi iexcl hsijxihsjjxi (43)

When averaged over possible outcomes of N trials the gradient of infor-mation is zero for the optimal direction Here in order to evaluate hplusmnv2i DTr[Aiexcl1hrIrITiAiexcl1] we need to know the variance of the gradient of I As-suming that the probability of generating a spike is independent for differentbins we can estimate hrIirIji raquo Aij=Nspike ln 2 Therefore an expected er-ror in the reconstruction of the optimal lter is inversely proportional tothe number of spikes The corresponding expected value of the projectionbetween the reconstructed vector and the relevant direction Oe1 is given by

Ov cent Oe1 frac14 1 iexcl 12

hplusmnv2i D 1 iexcl Tr0[Aiexcl1]2Nspike ln 2

(44)

where Tr0 means that the trace is taken in the subspace orthogonal to themodel lter5 The estimate equation 44 can be calculated without knowl-edge of the underlying model it is raquo D=2Nspike This behavior should alsohold in cases where the stimulus dimensions are expanded to include timeThe errors are expected to increase in proportion to the increased dimen-sionality In the case of a complex cell with two relevant dimensions theerror is expected to be twice that for a cell with single relevant dimensionalso discussed in section 43

We emphasize that the error estimate according to equation 44 is of thesame order as errors of the reverse correlation method when it is appliedfor gaussian ensembles The latter are given by Tr[Ciexcl1] iexcl Ciexcl1

11 =[2Nspike

h f02s1i] Of course if the reverse correlation method were to be applied

to the nongaussian ensemble the errors would be larger In Figure 5 weshow the result of simulations for various numbers of trials and thereforeNspike The average projection of the normalized reconstructed vector Ov onthe RF Oe1 behaves initially as 1=Nspike (dashed line) For smaller data setsin this case Nspikes raquolt 30 000 corrections raquo Niexcl2

spikes become important forestimating the expected errors of the algorithm Happily these correctionshave a sign such that smaller data sets are more effective than one mighthave expected from the asymptotic calculation This can be veried fromthe expansion Ov cent Oe1 D [1 iexcl plusmnv2]iexcl1=2 frac14 1 iexcl 1

2 hplusmnv2i C 38 hplusmnv4i were only the rst

two terms were taken into account in equation 44

5 By denition plusmnv1 D plusmnv cent Oe1 D 0 and therefore hplusmnv21i Aiexcl1

11 is to be subtracted fromhplusmnv2i Tr[Aiexcl1] Because Oe1 is an eigenvector of A with zero eigenvalue Aiexcl1

11 is inniteTherefore the proper treatment is to take the trace in the subspace orthogonal to Oe1

238 T Sharpee N Rust and W Bialek

0 02 04 06 08x 10

shy 4

075

08

085

09

095

1

Nshy 1spike

e 1 times v m

ax

Figure 5 Projection of vector Ovmax that maximizes information on RF Oe1 is plot-ted as a function of the number of spikes The solid line is a quadratic t in1=Nspike and the dashed line is the leading linear term in 1=Nspike This set ofsimulations was carried out for a model visual neuron with one relevant di-mension from Figure 4a and the input-output function (see equation 41) withparameter values frac34 frac14 061frac34 s1 micro frac14 061frac34 s1 For this model neuron the linearapproximation for the expected error is applicable for Nspike raquogt 30 000

43 A Model Complex Cell A sequence of spikes from a model cell withtwo relevant dimensions was simulated by projecting each of the stimuli onvectors that differ by frac14=2 in their spatial phase taken to mimic properties ofcomplex cells as in Figure 6 A particular frame leads to a spike accordingto a logical OR that is if either js1j or js2j exceeds a threshold value micro in thepresence of noise where s1 D s cent Oe1 s2 D s cent Oe2 Similarly to equation 41

Pspikejs

PspikeD f s1 s2 D hHjs1j iexcl micro iexcl raquo1 _ Hjs2j iexcl micro iexcl raquo2i (45)

where raquo1 and raquo2 are independent gaussian variables The sampling of thisinput-output function by our particular set of natural stimuli is shown inFigure 6c As is well known reverse correlation fails in this case because thespikendashtriggered average stimulus is zero although with gaussian stimulithe spike-triggered covariance method would recover the relevant dimen-sions (Touryan et al 2002) Here we show that searching for maximallyinformative dimensions allows us to recover the relevant subspace evenunder more natural stimulus conditions

Analyzing Neural Responses to Natural Signals 239

Figure 6 Analysis of a model complex cell with relevant dimensions Oe1 and Oe2

shown in (a) and (b) respectively Spikes are generated according to an ldquoORrdquoinput-output function f s1 s2 with the threshold micro frac14 061frac34 s1 and noise stan-dard deviation frac34 D 031frac34 s1 (cd) Vectors v1 and v2 found by maximizinginformation Iv1 v2

We start by maximizing information with respect to one vector Contraryto the result in Figure 4e for a simple cell one optimal dimension recoversonly about 60 of the total information per spike (see equation 22) Perhapssurprisingly because of the strong correlations in natural scenes even a pro-jection onto a random vector in the D raquo 103-dimensional stimulus spacehas a high probability of explaining 60 of total information per spike ascan be seen in Figure 2 We therefore go on to maximize information withrespect to two vectors As a result of maximization we obtain two vectors v1and v2 shown in Figure 6 The information along them is Iv1 v2 frac14 090which is within the range of information values obtained along differentlinear combinations of the two model vectors IOe1 Oe2=Ispike D 089 sect 011Therefore the description of neuronrsquos ring in terms of vectors v1 and v2is complete up to the noise level and we do not have to look for extra rel-evant dimensions Practically the number of relevant dimensions can bedetermined by comparing Iv1 v2 to either Ispike or Iv1 v2 v3 the latterbeing the result of maximization with respect to three vectors simultane-ously As mentioned in section 1 information along set a of vectors does notincrease when extra dimensions are added to the relevant subspace There-fore if Iv1 v2 D Iv1 v2 v3 (again up to the noise level) this means that

240 T Sharpee N Rust and W Bialek

there are only two relevant dimensions Using Ispike for comparison withIv1 v2 has the advantage of not having to look for an extra dimensionwhich can be computationally intensive However Ispike might be subjectto larger systematic bias errors than Iv1 v2 v3

Vectors v1 and v2 obtained by maximizing Iv1 v2 are not exactly or-thogonal and are also rotated with respect to Oe1 and Oe2 However the qualityof reconstruction as well as the value of information Iv1 v2 is indepen-dent of a particular choice of basis with the RS The appropriate measure ofsimilarity between the two planes is the dot product of their normals In theexample of Figure 6 OnOe1Oe2 cent Onv1 v2 D 082sect007 where OnOe1Oe2 is a normal tothe plane passing through vectors Oe1 and Oe2 Maximizing information withrespect to two dimensions requires a signicantly slower cooling rate andconsequently longer computational times However the expected error inthe reconstruction 1 iexcl OnOe1 Oe2 cent Onv1 v2 scales as 1=Nspike behavior similarto equation 44 and is roughly twice that for a simple cell given the samenumber of spikes We make vectors v1 and v2 orthogonal to each othersupon completion of the algorithm

44 A Model Auditory Neuron with One Relevant Dimension Be-cause stimuli s are treated as vectors in an abstract space the method oflooking for the most informative dimensions can be applied equally wellto auditory as well as to visual neurons Here we illustrate the method byconsidering a model auditory neuron with one relevant dimension which isshown in Figure 7c and is taken to mimic the properties of cochlear neuronsThe model neuron is probed by two ensembles of naturalistic stimuli oneis a recording of a native Russian speaker reading a piece of Russian proseand the other is a recording of a piece of English prose read by a nativeEnglish speaker Both ensembles are nongaussian and exhibit amplitudedistributions with long nearly exponential tails (see Figure 7a) which arequalitatively similar to those of light intensities in natural scenes (Voss ampClarke 1975 Ruderman 1994) However the power spectrum is different inthe two cases as can be seen in Figure 7b The differences in the correlationstructure in particular lead to different STAs across the two ensembles (cfFigure 7d) Both of the STAs also deviate from the model lter shown inFigure 7c

Despite differences in the probability distributions Ps it is possibleto recover the relevant dimension of the model neuron by maximizing in-formation In Figure 7c we show the two most informative vectors foundby running the algorithm for the two ensembles and replot the model lterfrom Figure 7c to show that the three vectors overlap almost perfectly Thusdifferent nongaussian correlations can be successfully removed to obtain anestimate of the relevant dimension If the most informative vector changeswith the stimulus ensemble this can be interpreted as caused by adaptationto the probability distribution


[Figure 7 appears here; see the caption below. Recoverable axis labels: P(Amplitude) vs. amplitude (units of RMS); |S(ω)|² vs. frequency (kHz); filter amplitude vs. t (ms); P(spike|s · v̂max) vs. s · v̂max.]

Figure 7: A model auditory neuron is probed by two natural ensembles of stimuli: a piece of English prose (1) and a piece of Russian prose (2). The size of the stimulus ensemble was the same in both cases, and the sampling rate was 44.1 kHz. (a) The probability distribution of the sound pressure amplitude in units of standard deviation for both ensembles is strongly nongaussian. (b) The power spectra for the two ensembles. (c) The relevant vector of the model neuron, of dimensionality D = 500. (d) The STA is broadened in both cases, but differs between the two cases due to differences in the power spectra of the two ensembles. (e) Vectors that maximize information for either of the ensembles overlap almost perfectly with each other and with the model filter, which is also replotted here from c. (f) The probability of a spike, P(spike|s · v̂max) (crosses), is compared to P(spike|s1) used in generating spikes (solid line). The input-output function had parameter values σ ≈ 0.9σ(s1) and θ ≈ 1.8σ(s1).

5 Summary

Features of the stimulus that are most relevant for generating the response of a neuron can be found by maximizing information between the sequence of responses and the projection of stimuli on trial vectors within the stimulus space. Calculated in this manner, information becomes a function of direction in stimulus space. Those vectors that maximize the information and account for the total information per response of interest span the relevant subspace. The method allows multiple dimensions to be found. The reconstruction of the relevant subspace is done without assuming a particular form of the input-output function. It can be strongly nonlinear within the relevant subspace and is estimated from experimental histograms for each trial direction independently. Most important, this method can be used with any stimulus ensemble, even those that are strongly nongaussian, as in the case of natural signals. We have illustrated the method on model neurons responding to natural scenes and sounds. We expect the current implementation of the method to be most useful when several most informative vectors (≤ 10, depending on their dimensionality) are to be analyzed for neurons probed by natural scenes. This technique could be particularly useful in describing sensory processing in poorly understood regions of higher-level sensory cortex (such as visual areas V2, V4, and IT and auditory cortex beyond A1) where white noise stimulation is known to be less effective than naturalistic stimuli.

Appendix A: Limitations of the Reverse Correlation Method

Here we examine what sort of deviations one can expect when applying the reverse correlation method to natural stimuli, even in the model with just one relevant dimension. There are two factors that, when combined, invalidate the reverse correlation method: the nongaussian character of correlations and the nonlinearity of the input-output function (Ringach, Sapiro, & Shapley, 1997). In its original formulation (de Boer & Kuyper, 1968), the neuron is probed by white noise, and the relevant dimension ê1 is given by the STA, ê1 ∝ ⟨s r(s)⟩. If the signals are not white, that is, the covariance matrix C_ij = ⟨s_i s_j⟩ is not a unit matrix, then the STA is a broadened version of the original filter ê1. This can be seen by noting that for any function F(s) of gaussian variables {s_i}, the identity holds:

\[
\langle s_i F(\mathbf{s}) \rangle = \langle s_i s_j \rangle \, \langle \partial_j F(\mathbf{s}) \rangle ,
\qquad \partial_j \equiv \frac{\partial}{\partial s_j} . \tag{A.1}
\]

When property A.1 is applied to the vector components of the STA, ⟨s_i r(s)⟩ = C_ij ⟨∂_j r(s)⟩. Since we work within the assumption that the firing rate is a (nonlinear) function of the projection onto one filter ê1, r(s) = r(s1), the latter average is proportional to the model filter itself, ⟨∂_j r⟩ = ê_{1j} ⟨r′(s1)⟩. Therefore we arrive at the prescription of the reverse correlation method:

\[
\hat{e}_{1i} \propto [C^{-1}]_{ij} \, \langle s_j \, r(\mathbf{s}) \rangle . \tag{A.2}
\]
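For reference, a minimal numpy sketch of this prescription is given below. The ridge term and the final normalization are safeguards we add for illustration (inverting C amplifies noise, as discussed later in this appendix); they are not part of equation A.2 itself.

```python
import numpy as np

def decorrelated_sta(stimuli, spike_counts, ridge=0.0):
    """Reverse-correlation estimate of the relevant filter, following equation A.2.

    stimuli      : array of shape (N, D), one stimulus vector s per row
    spike_counts : array of shape (N,), spikes evoked by each stimulus
    ridge        : optional regularizer on the covariance diagonal (our addition)
    """
    X = stimuli - stimuli.mean(axis=0)                        # zero-mean stimuli
    sta = X.T @ spike_counts / spike_counts.sum()             # spike-triggered average <s r(s)>
    C = np.cov(X, rowvar=False)                               # a priori covariance matrix C_ij
    v = np.linalg.solve(C + ridge * np.eye(C.shape[0]), sta)  # [C^-1]_ij <s_j r(s)>
    return v / np.linalg.norm(v)
```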

The gaussian property is necessary in order to represent the STA as a convolution of the covariance matrix C_ij of the stimulus ensemble and the model filter.

To understand how the reconstructed vector obtained according to equation A.2 deviates from the relevant one, we consider weakly nongaussian stimuli with the probability distribution

\[
P_{nG}(\mathbf{s}) = \frac{1}{Z} \, P_0(\mathbf{s}) \, e^{\eta H_1(\mathbf{s})} , \tag{A.3}
\]

where P_0(s) is the gaussian probability distribution with covariance matrix C, and the normalization factor is Z = ⟨e^{ηH_1(s)}⟩. The function H_1 describes deviations of the probability distribution from gaussian, and therefore we will set ⟨s_i H_1⟩ = 0 and ⟨s_i s_j H_1⟩ = 0, since these averages can be accounted for in the gaussian ensemble. In what follows, we will keep only the first-order terms in the perturbation parameter η. Using property A.1, we find the STA to be given by

\[
\langle s_i r \rangle_{nG} = \langle s_i s_j \rangle \left[ \langle \partial_j r \rangle + \eta \, \langle r \, \partial_j H_1 \rangle \right] , \tag{A.4}
\]

where averages are taken with respect to the gaussian distribution. Similarly, the covariance matrix C_ij evaluated with respect to the nongaussian ensemble is given by

\[
C_{ij} = \frac{1}{Z} \, \langle s_i s_j \, e^{\eta H_1} \rangle
       = \langle s_i s_j \rangle + \eta \, \langle s_i s_k \rangle \langle s_j \, \partial_k H_1 \rangle , \tag{A.5}
\]

so that, to first order in η, ⟨s_i s_j⟩ = C_ij − η C_ik ⟨s_j ∂_k H_1⟩. Combining this with equation A.4, we get

\[
\langle s_i r \rangle_{nG} = \mathrm{const} \times \Big[ C_{ij} \, \hat{e}_{1j}
  \; + \; \eta \, C_{ij} \, \big\langle \big( r - s_1 \langle r' \rangle \big) \, \partial_j H_1 \big\rangle \Big] . \tag{A.6}
\]

The second term in equation A.6 prevents the application of the reverse correlation method for nongaussian signals. Indeed, if we multiply the STA, equation A.6, by the inverse of the a priori covariance matrix C_ij according to the reverse correlation method, equation A.2, we no longer obtain the RF ê1. The deviation of the obtained answer from the true RF increases with η, which measures the deviation of the probability distribution from gaussian. Since natural stimuli are known to be strongly nongaussian, this makes the use of reverse correlation problematic when analyzing neural responses to natural stimuli.

The difference in applying reverse correlation to stimuli drawn from a correlated gaussian ensemble versus a nongaussian one is illustrated in Figures 8b and 8c. In the first case, shown in Figure 8b, stimuli are drawn from a correlated gaussian ensemble with the covariance matrix equal to that of natural images. In the second case, shown in Figure 8c, the patches of photos are taken as stimuli. The STA is broadened in both cases.


Figure 8: The nongaussian character of correlations present in natural scenes invalidates the reverse correlation method for neurons with a nonlinear input-output function. (a) A model visual neuron has one relevant dimension ê1 and the nonlinear input-output function. The "exact" STA is used (see equation 4.2) to separate effects of neural noise from alterations introduced by the method. The decorrelated "exact" STA is obtained by multiplying the "exact" STA by the inverse of the covariance matrix, according to equation A.2. (b) Stimuli are taken from a correlated gaussian noise ensemble. The effect of correlations in the STA can be removed according to equation A.2. (c) When patches of photos are taken as stimuli for the same model neuron as in b, the decorrelation procedure gives an altered version of the model filter. The two stimulus ensembles have the same covariance matrix.

Although the two-point correlations are just as strong in the case of gaussian stimuli as they are in the natural stimuli ensemble, gaussian correlations can be successfully removed from the STA according to equation A.2 to obtain the model filter. On the contrary, an attempt to use reverse correlation with natural stimuli results in an altered version of the model filter. We reiterate that, for this example, the apparent noise in the decorrelated vector is not due to neural noise or finite data sets, since the "exact" STA has been used (see equation 4.2) in all calculations presented in Figures 8 and 9.
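A comparison of this kind requires a gaussian ensemble whose covariance matches that of the natural patches. One simple way to construct such an ensemble is sketched below, under our own choice of routine (Cholesky sampling with a small stabilizing jitter); this is not necessarily how the figure was generated.

```python
import numpy as np

def gaussian_ensemble_matched(patches, n_samples, rng=None):
    """Draw gaussian stimuli whose covariance equals that of the natural patches.

    patches   : array of shape (N, D), natural image patches, one per row
    n_samples : number of gaussian stimuli to generate
    """
    rng = np.random.default_rng() if rng is None else rng
    mu = patches.mean(axis=0)
    C = np.cov(patches, rowvar=False)                       # covariance of the natural ensemble
    L = np.linalg.cholesky(C + 1e-9 * np.eye(C.shape[0]))   # jitter keeps the factorization stable
    z = rng.standard_normal((n_samples, C.shape[0]))
    return mu + z @ L.T                                     # samples with covariance C
```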

The reverse correlation method gives the correct answer for any distribution of signals if the probability of generating a spike is a linear function of s_i, since then the second term in equation A.6 is zero. In particular, a linear input-output relation could arise due to a neural noise whose variance is much larger than the variance of the signal itself. This point is illustrated in Figures 9a, 9b, and 9c, where the reverse correlation method is applied to a threshold input-output function at low, moderate, and high signal-to-noise ratios. For small signal-to-noise ratios, where the noise standard deviation is similar to that of projections s1, the threshold nonlinearity in the input-output function is masked by noise and is effectively linear. In this limit, the reverse correlation can be applied with the exact STA. However, for an experimentally calculated STA at low signal-to-noise ratios, the decorrelation procedure results in strong noise amplification. At higher signal-to-noise ratios, decorrelation fails due to the nonlinearity of the input-output function, in accordance with equation A.6.

Appendix B: Maxima of I(v): What Do They Mean?

The relevant subspace of dimensionality K can be found by maximizing information simultaneously with respect to K vectors. The result of maximization with respect to a number of vectors that is less than the true dimensionality of the relevant subspace may produce vectors that have components in the irrelevant subspace. This happens only in the presence of correlations in stimuli. As an illustration, we consider the situation where the dimensionality of the relevant subspace is K = 2, and vector ê1 describes the most informative direction within the relevant subspace. We show here that although the gradient of information is perpendicular to both ê1 and ê2, it may have components outside the relevant subspace. Therefore, the vector vmax that corresponds to the maximum of I(v) will lie outside the relevant subspace. We recall from equation 3.1 that

\[
\nabla I(\hat{e}_1) = \int ds_1 \, P(s_1) \,
  \frac{d}{ds_1}\!\left[ \frac{P(s_1|\mathrm{spike})}{P(s_1)} \right]
  \big( \langle \mathbf{s}|s_1, \mathrm{spike} \rangle - \langle \mathbf{s}|s_1 \rangle \big) . \tag{B.1}
\]

We can rewrite the conditional averages ⟨s|s1⟩ = ∫ ds2 P(s1, s2) ⟨s|s1, s2⟩ / P(s1) and ⟨s|s1, spike⟩ = ∫ ds2 f(s1, s2) P(s1, s2) ⟨s|s1, s2⟩ / P(s1|spike), so that

\[
\nabla I(\hat{e}_1) = \int ds_1 \, ds_2 \, P(s_1, s_2) \, \langle \mathbf{s}|s_1, s_2 \rangle \,
  \frac{P(\mathrm{spike}|s_1, s_2) - P(\mathrm{spike}|s_1)}{P(\mathrm{spike})}
  \times \frac{d}{ds_1} \ln\!\left[ \frac{P(s_1|\mathrm{spike})}{P(s_1)} \right] . \tag{B.2}
\]

Because we assume that the vector ê1 is the most informative within the relevant subspace, ê1 · ∇I = ê2 · ∇I = 0, so that the integral in equation B.2 is zero for those directions in which the component of the vector ⟨s|s1, s2⟩ changes linearly with s1 and s2. For uncorrelated stimuli, this is true for all directions, so that the most informative vector within the relevant subspace is also the most informative in the overall stimulus space. In the presence of correlations, the gradient may have nonzero components along some irrelevant directions if the projection of the vector ⟨s|s1, s2⟩ on those directions is not a linear function of s1 and s2. By looking for a maximum of information, we will therefore be driven outside the relevant subspace. The deviation of vmax from the relevant subspace is also proportional to the strength of the dependence on the second parameter s2, because of the factor [P(s1, s2|spike)/P(s1, s2) − P(s1|spike)/P(s1)] in the integrand.

Figure 9: Application of the reverse correlation method to a model visual neuron with one relevant dimension ê1 and a threshold input-output function, for decreasing values of the noise variance: σ/σ(s1) ≈ 6.1, 0.61, 0.06 in a, b, and c, respectively. The model P(spike|s1) becomes effectively linear when the signal-to-noise ratio is small. The reverse correlation can be used together with natural stimuli if the input-output function is linear. Otherwise, the deviations between the decorrelated STA and the model filter increase with the nonlinearity of P(spike|s1).

Appendix C: The Gradient of Information

According to expression 2.5, the information I(v) depends on the vector v only through the probability distributions P_v(x) and P_v(x|spike). Therefore, we can express the gradient of information in terms of gradients of those probability distributions:

\[
\nabla_{\mathbf{v}} I = \frac{1}{\ln 2} \int dx
  \left[ \ln\!\frac{P_{\mathbf{v}}(x|\mathrm{spike})}{P_{\mathbf{v}}(x)} \,
         \nabla_{\mathbf{v}} P_{\mathbf{v}}(x|\mathrm{spike})
  \; - \; \frac{P_{\mathbf{v}}(x|\mathrm{spike})}{P_{\mathbf{v}}(x)} \,
         \nabla_{\mathbf{v}} P_{\mathbf{v}}(x) \right] , \tag{C.1}
\]

where we took into account that ∫ dx P_v(x|spike) = 1 and does not change with v. To find the gradients of the probability distributions, we note that

\[
\nabla_{\mathbf{v}} P_{\mathbf{v}}(x)
  = \nabla_{\mathbf{v}} \left[ \int d\mathbf{s} \, P(\mathbf{s}) \, \delta(x - \mathbf{s}\cdot\mathbf{v}) \right]
  = - \int d\mathbf{s} \, P(\mathbf{s}) \, \mathbf{s} \, \delta'(x - \mathbf{s}\cdot\mathbf{v})
  = - \frac{d}{dx} \big[ P_{\mathbf{v}}(x) \, \langle \mathbf{s}|x \rangle \big] , \tag{C.2}
\]

and analogously for P_v(x|spike):

\[
\nabla_{\mathbf{v}} P_{\mathbf{v}}(x|\mathrm{spike})
  = - \frac{d}{dx} \big[ P_{\mathbf{v}}(x|\mathrm{spike}) \, \langle \mathbf{s}|x, \mathrm{spike} \rangle \big] . \tag{C.3}
\]

Substituting expressions C.2 and C.3 into equation C.1 and integrating once by parts, we obtain

\[
\nabla_{\mathbf{v}} I = \int dx \, P_{\mathbf{v}}(x) \,
  \big[ \langle \mathbf{s}|x, \mathrm{spike} \rangle - \langle \mathbf{s}|x \rangle \big]
  \cdot \frac{d}{dx} \left[ \frac{P_{\mathbf{v}}(x|\mathrm{spike})}{P_{\mathbf{v}}(x)} \right] ,
\]

which is expression 3.1 of the main text.
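To make the binned evaluation of this expression concrete, here is a minimal numpy sketch for a single trial direction v. The number of bins, the use of np.gradient for the d/dx term, and all names are our own choices rather than the authors' implementation.

```python
import numpy as np

def info_gradient(v, stimuli, spikes, n_bins=15):
    """Histogram-based estimate of expression 3.1:
    grad_I = sum_x P_v(x) [<s|x,spike> - <s|x>] d/dx [P_v(x|spike)/P_v(x)]."""
    x = stimuli @ v
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    dx = edges[1] - edges[0]
    idx = np.clip(np.digitize(x, edges) - 1, 0, n_bins - 1)     # bin index for each stimulus

    p_x = np.bincount(idx, minlength=n_bins).astype(float)
    p_x_spike = np.bincount(idx, weights=spikes, minlength=n_bins)
    p_x /= p_x.sum()
    p_x_spike /= p_x_spike.sum()

    # conditional stimulus averages <s|x> and <s|x, spike> in each bin
    D = stimuli.shape[1]
    s_given_x = np.zeros((n_bins, D))
    s_given_x_spike = np.zeros((n_bins, D))
    for b in range(n_bins):
        in_bin = idx == b
        if in_bin.any():
            s_given_x[b] = stimuli[in_bin].mean(axis=0)
            w = spikes[in_bin]
            if w.sum() > 0:
                s_given_x_spike[b] = (w[:, None] * stimuli[in_bin]).sum(axis=0) / w.sum()

    ratio = np.divide(p_x_spike, p_x, out=np.zeros(n_bins), where=p_x > 0)
    d_ratio = np.gradient(ratio, dx)                             # d/dx [P_v(x|spike)/P_v(x)]
    return (p_x[:, None] * (s_given_x_spike - s_given_x) * d_ratio[:, None]).sum(axis=0)
```

Such an estimate could drive a gradient-based or annealed search over trial directions, with the caveat that the binning noise grows as the number of dimensions analyzed simultaneously increases.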

Acknowledgments

We thank K. D. Miller for many helpful discussions. Work at UCSF was supported in part by the Sloan and Swartz foundations and by a training grant from the NIH. Our collaboration began at the Marine Biological Laboratory in a course supported by grants from NIMH and the Howard Hughes Medical Institute.


References

Aguera y Arcas, B., Fairhall, A. L., & Bialek, W. (2003). Computation in a single neuron: Hodgkin and Huxley revisited. Neural Comp., 15, 1715–1749.

Baddeley, R., Abbott, L. F., Booth, M. C. A., Sengpiel, F., Freeman, T., Wakeman, E. A., & Rolls, E. T. (1997). Responses of neurons in primary and inferior temporal visual cortices to natural scenes. Proc. R. Soc. Lond. B, 264, 1775–1783.

Barlow, H. (1961). Possible principles underlying the transformations of sensory images. In W. Rosenblith (Ed.), Sensory communication (pp. 217–234). Cambridge, MA: MIT Press.

Barlow, H. (2001). Redundancy reduction revisited. Network: Comput. Neural Syst., 12, 241–253.

Bialek, W. (2002). Thinking about the brain. In H. Flyvbjerg, F. Julicher, P. Ormos, & F. David (Eds.), Physics of biomolecules and cells (pp. 485–577). Berlin: Springer-Verlag. See also physics/0205030.

Bialek, W., & de Ruyter van Steveninck, R. R. (2003). Features and dimensions: Motion estimation in fly vision. Unpublished manuscript.

Brenner, N., Bialek, W., & de Ruyter van Steveninck, R. R. (2000). Adaptive rescaling maximizes information transmission. Neuron, 26, 695–702.

Brenner, N., Strong, S. P., Koberle, R., Bialek, W., & de Ruyter van Steveninck, R. R. (2000). Synergy in a neural code. Neural Computation, 12, 1531–1552. See also physics/9902067.

Chichilnisky, E. J. (2001). A simple white noise analysis of neuronal light responses. Network: Comput. Neural Syst., 12, 199–213.

Creutzfeldt, O. D., & Northdurft, H. C. (1978). Representation of complex visual stimuli in the brain. Naturwissenschaften, 65, 307–318.

Cover, T. M., & Thomas, J. A. (1991). Information theory. New York: Wiley.

de Boer, E., & Kuyper, P. (1968). Triggered correlation. IEEE Trans. Biomed. Eng., 15, 169–179.

de Ruyter van Steveninck, R. R., & Bialek, W. (1988). Real-time performance of a movement-sensitive neuron in the blowfly visual system: Coding and information transfer in short spike sequences. Proc. R. Soc. Lond. B, 265, 259–265.

de Ruyter van Steveninck, R. R., Borst, A., & Bialek, W. (2000). Real time encoding of motion: Answerable questions and questionable answers from the fly's visual system. In J. M. Zanker & J. Zeil (Eds.), Motion vision: Computational, neural and ecological constraints (pp. 279–306). New York: Springer-Verlag. See also physics/0004060.

de Ruyter van Steveninck, R. R., Lewen, G. D., Strong, S. P., Koberle, R., & Bialek, W. (1997). Reproducibility and variability in neural spike trains. Science, 275, 1805–1808. See also cond-mat/9603127.

Dimitrov, A. G., & Miller, J. P. (2001). Neural coding and decoding: Communication channels and quantization. Network: Comput. Neural Syst., 12, 441–472.

Dong, D. W., & Atick, J. J. (1995). Statistics of natural time-varying images. Network: Comput. Neural Syst., 6, 345–358.

Fairhall, A. L., Lewen, G. D., Bialek, W., & de Ruyter van Steveninck, R. R. (2001). Efficiency and ambiguity in an adaptive neural code. Nature, 412, 787–792.

Kara, P., Reinagel, P., & Reid, R. C. (2000). Low response variability in simultaneously recorded retinal, thalamic, and cortical neurons. Neuron, 27, 635–646.

Lewen, G. D., Bialek, W., & de Ruyter van Steveninck, R. R. (2001). Neural coding of naturalistic motion stimuli. Network: Comput. Neural Syst., 12, 317–329. See also physics/0103088.

Mainen, Z. F., & Sejnowski, T. J. (1995). Reliability of spike timing in neocortical neurons. Science, 268, 1503–1506.

Paninski, L. (2003a). Convergence properties of three spike-triggered analysis techniques. Network: Comput. Neural Syst., 14, 437–464.

Paninski, L. (2003b). Estimation of entropy and mutual information. Neural Computation, 15, 1191–1253.

Panzeri, S., & Treves, A. (1996). Analytical estimates of limited sampling biases in different information measures. Network: Comput. Neural Syst., 7, 87–107.

Pola, G., Schultz, S. R., Petersen, R., & Panzeri, S. (2002). A practical guide to information analysis of spike trains. In R. Kotter (Ed.), Neuroscience databases: A practical guide (pp. 137–152). Norwell, MA: Kluwer.

Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P. (1992). Numerical recipes in C: The art of scientific computing. Cambridge: Cambridge University Press.

Reinagel, P., & Reid, R. C. (2000). Temporal coding of visual information in the thalamus. J. Neurosci., 20, 5392–5400.

Rieke, F., Bodnar, D. A., & Bialek, W. (1995). Naturalistic stimuli increase the rate and efficiency of information transmission by primary auditory afferents. Proc. R. Soc. Lond. B, 262, 259–265.

Rieke, F., Warland, D., de Ruyter van Steveninck, R. R., & Bialek, W. (1997). Spikes: Exploring the neural code. Cambridge, MA: MIT Press.

Ringach, D. L., Hawken, M. J., & Shapley, R. (2002). Receptive field structure of neurons in monkey visual cortex revealed by stimulation with natural image sequences. Journal of Vision, 2, 12–24.

Ringach, D. L., Sapiro, G., & Shapley, R. (1997). A subspace reverse-correlation technique for the study of visual neurons. Vision Res., 37, 2455–2464.

Rolls, E. T., Aggelopoulos, N. C., & Zheng, F. (2003). The receptive fields of inferior temporal cortex neurons in natural scenes. J. Neurosci., 23, 339–348.

Ruderman, D. L. (1994). The statistics of natural images. Network: Comput. Neural Syst., 5, 517–548.

Ruderman, D. L., & Bialek, W. (1994). Statistics of natural images: Scaling in the woods. Phys. Rev. Lett., 73, 814–817.

Schwartz, O., Chichilnisky, E. J., & Simoncelli, E. (2002). Characterizing neural gain control using spike-triggered covariance. In T. G. Dietterich, S. Becker, & Z. Ghahramani (Eds.), Advances in neural information processing (pp. 269–276). Cambridge, MA: MIT Press.

Sen, K., Theunissen, F. E., & Doupe, A. J. (2001). Feature analysis of natural sounds in the songbird auditory forebrain. J. Neurophysiol., 86, 1445–1458.

Simoncelli, E., & Olshausen, B. A. (2001). Natural image statistics and neural representation. Annu. Rev. Neurosci., 24, 1193–1216.

Smirnakis, S. M., Berry, M. J., Warland, D. K., Bialek, W., & Meister, M. (1996). Adaptation of retinal processing to image contrast and spatial scale. Nature, 386, 69–73.

Smyth, D., Willmore, B., Baker, G. E., Thompson, I. D., & Tolhurst, D. J. (2003). The receptive-field organization of simple cells in primary visual cortex of ferrets under natural scene stimulation. J. Neurosci., 23, 4746–4759.

Stanley, G. B., Li, F. F., & Dan, Y. (1999). Reconstruction of natural scenes from ensemble responses in the lateral geniculate nucleus. J. Neurosci., 19, 8036–8042.

Strong, S. P., Koberle, R., de Ruyter van Steveninck, R. R., & Bialek, W. (1998). Entropy and information in neural spike trains. Phys. Rev. Lett., 80, 197–200.

Theunissen, F. E., Sen, K., & Doupe, A. J. (2000). Spectral-temporal receptive fields of nonlinear auditory neurons obtained using natural sounds. J. Neurosci., 20, 2315–2331.

Tishby, N., Pereira, F. C., & Bialek, W. (1999). The information bottleneck method. In B. Hajek & R. S. Sreenivas (Eds.), Proceedings of the 37th Allerton Conference on Communication, Control and Computing (pp. 368–377). Urbana: University of Illinois. See also physics/0004057.

Touryan, J., Lau, B., & Dan, Y. (2002). Isolation of relevant visual features from random stimuli for cortical complex cells. J. Neurosci., 22, 10811–10818.

Treves, A., & Panzeri, S. (1995). The upward bias in measures of information derived from limited data samples. Neural Comp., 7, 399–407.

von der Twer, T., & Macleod, D. I. A. (2001). Optimal nonlinear codes for the perception of natural colours. Network: Comput. Neural Syst., 12, 395–407.

Vickers, N. J., Christensen, T. A., Baker, T., & Hildebrand, J. G. (2001). Odour-plume dynamics influence the brain's olfactory code. Nature, 410, 466–470.

Vinje, W. E., & Gallant, J. L. (2000). Sparse coding and decorrelation in primary visual cortex during natural vision. Science, 287, 1273–1276.

Vinje, W. E., & Gallant, J. L. (2002). Natural stimulation of the nonclassical receptive field increases information transmission efficiency in V1. J. Neurosci., 22, 2904–2915.

Voss, R. F., & Clarke, J. (1975). "1/f noise" in music and speech. Nature, 317–318.

Weliky, M., Fiser, J., Hunt, R., & Wagner, D. N. (2003). Coding of natural scenes in primary visual cortex. Neuron, 37, 703–718.

Note: Where available, we give references to the physics e-print archive, which may be found on-line at http://arxiv.org/abs/; thus, Bialek (2002) is available on-line at http://arxiv.org/abs/physics/0205030. Published papers may differ from the versions posted to the archive.

Received January 21, 2003; accepted August 6, 2003.


Vinje W E amp Gallant J L (2002) Natural stimulation of the nonclassical re-ceptive eld increases information transmission efciency in V1 J Neurosci22 2904ndash2915

Voss R F amp Clarke J (1975) ldquo1f noiserdquo in music and speech Nature 317ndash318Weliky M Fiser J Hunt R amp Wagner D N (2003) Coding of natural scenes

in primary visual cortex Neuron 37 703ndash718

Received January 21 2003 accepted August 6 2003

Page 12: Analyzing Neural Responses to Natural Signals: Maximally ...wbialek/rome/refs/sharpee+al_04.pdfAnalyzing Neural Responses to Natural Signals 225 Figure 1: Schematic illustration of

234 T Sharpee N Rust and W Bialek

Figure 3: Statistical properties of the visual stimulus ensemble. (a) One of the photos. Stimuli were 30 × 30 patches taken from the overall photograph. (b) The power spectrum, in units of the light intensity variance σ_I², averaged over orientation, as a function of the dimensionless wave vector ka, where a is the pixel size. (c) The probability distribution of light intensity, in units of σ_I. (d) The probability distribution of projections between stimuli and a Gabor filter, also in units of the corresponding standard deviation σ_s1.

4.1 A Model Simple Cell. Our first example is based on the properties of simple cells found in the primary visual cortex. A model phase- and orientation-sensitive cell has a single relevant dimension ê1, shown in Figure 4a. A given stimulus s leads to a spike if the projection s1 = s · ê1 reaches a threshold value θ in the presence of noise:

\[
\frac{P(\mathrm{spike}\mid\mathbf{s})}{P(\mathrm{spike})} \equiv f(s_1) = \big\langle H(s_1 - \theta + \xi) \big\rangle ,
\tag{4.1}
\]

where a gaussian random variable ξ of variance σ² models additive noise, and the function H(x) = 1 for x > 0 and zero otherwise. Together with the RF ê1, the parameters θ for the threshold and the noise variance σ² determine the input-output function.
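For concreteness, the following minimal sketch (not the authors' code; the random filter, stimulus count, threshold, and noise level are illustrative assumptions) shows how a spike train can be generated from equation 4.1 for an arbitrary set of stimulus vectors.

# Sketch of the model simple cell in equation 4.1: a spike is generated when
# the projection s1 = s . e1 exceeds the threshold theta in the presence of
# additive gaussian noise of standard deviation sigma. All parameters below
# (D, theta, sigma, the random filter) are illustrative placeholders.
import numpy as np

rng = np.random.default_rng(0)

D = 900                                   # e.g., 30 x 30 pixel patches
e1_hat = rng.standard_normal(D)
e1_hat /= np.linalg.norm(e1_hat)          # relevant dimension, unit norm

def spike_prob(stimuli, e1, theta=1.84, sigma=0.31, n_noise=200):
    """Monte Carlo estimate of f(s1) = <H(s1 - theta + xi)> for each stimulus."""
    s1 = stimuli @ e1                                  # projections s1 = s . e1
    xi = sigma * rng.standard_normal((n_noise, 1))     # noise samples
    return np.heaviside(s1[None, :] - theta + xi, 0.0).mean(axis=0)

# toy gaussian stimuli stand in for natural image patches here
stimuli = rng.standard_normal((20_000, D))
p = spike_prob(stimuli, e1_hat)           # probability of a spike per frame
spikes = rng.random(len(p)) < p           # boolean spike train, one bin per frame

With θ and σ given in units of the standard deviation of s1, as in Figure 4, the mean spike probability per frame in this toy setup comes out at roughly a few percent.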


Figure 4: Analysis of a model simple cell with the RF shown in (a). (b) The "exact" spike-triggered average v_sta. (c) An attempt to remove correlations according to the reverse correlation method, C⁻¹_a priori v_sta. (d) The normalized vector v̂max found by maximizing information. (e) Convergence of the algorithm according to information I(v) and projection v̂ · ê1 between normalized vectors, as a function of the inverse effective temperature T⁻¹. (f) The probability of a spike, P(spike | s · v̂max) (crosses), is compared to P(spike|s1) used in generating spikes (solid line). Parameters of the model are σ = 0.31 and θ = 1.84, both given in units of the standard deviation of s1, which is also the unit for the x-axis in (f).

When the spike-triggered average (STA), or reverse correlation function, is computed from the responses to correlated stimuli, the resulting vector will be broadened due to spatial correlations present in the stimuli (see Figure 4b). For stimuli that are drawn from a gaussian probability distribution, the effects of correlations could be removed by multiplying v_sta by the inverse of the a priori covariance matrix, according to the reverse correlation method, v̂_Gaussian est ∝ C⁻¹_a priori v_sta (equation A.2). However, this procedure tends to amplify noise. To separate errors due to neural noise from those due to the nongaussian character of correlations, note that in a model the effect of neural noise on our estimate of the STA can be eliminated by averaging the presented stimuli weighted with the exact firing rate, as opposed to using a histogram of responses to estimate P(spike|s) from a finite set of trials. We have used this "exact" STA,

\[
\mathbf{v}_{\mathrm{sta}} = \int d\mathbf{s}\, \mathbf{s}\, P(\mathbf{s}\mid\mathrm{spike}) = \frac{1}{P(\mathrm{spike})} \int d\mathbf{s}\, P(\mathbf{s})\, \mathbf{s}\, P(\mathrm{spike}\mid\mathbf{s}),
\tag{4.2}
\]

in calculations presented in Figures 4b and 4c. Even with this noiseless STA (the equivalent of collecting an infinite data set), the standard decorrelation procedure is not valid for nongaussian stimuli and nonlinear input-output functions, as discussed in detail in appendix A. The result of such a decorrelation in our example is shown in Figure 4c. It clearly is missing some of the structure in the model filter, with projection ê1 · v̂_Gaussian est ≈ 0.14. The discrepancy is not due to neural noise or finite sampling, since the "exact" STA was decorrelated; the absence of noise in the exact STA also means that there would be no justification for smoothing the results of the decorrelation. The discrepancy between the true RF and the decorrelated STA increases with the strength of the nonlinearity in the input-output function.

In contrast, it is possible to obtain a good estimate of the relevant dimension ê1 by maximizing information directly (see Figure 4d). The typical progress of the simulated annealing algorithm with decreasing temperature T is shown in Figure 4e. There we plot both the information along the vector and its projection on ê1. We note that while the information I remains almost constant, the value of the projection continues to improve. Qualitatively, this is because the probability distributions depend exponentially on information. The final value of the projection depends on the size of the data set, as discussed below. In the example shown in Figure 4, there were ≈ 50,000 spikes with an average probability of spike ≈ 0.05 per frame, and the reconstructed vector has a projection v̂max · ê1 = 0.920 ± 0.006. Having estimated the RF, one can proceed to sample the nonlinear input-output function. This is done by constructing histograms for P(s · v̂max) and P(s · v̂max | spike) of projections onto the vector v̂max found by maximizing information, and taking their ratio, as in equation 1.2. In Figure 4f we compare P(spike | s · v̂max) (crosses) with the probability P(spike|s1) used in the model (solid line).
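As a sketch of this last step, the histogram ratio can be computed along any trial direction; the function below assumes the arrays `stimuli` and `spikes` and a unit vector (for example, the placeholder filter from the earlier toy snippet), with the bin count an arbitrary numerical choice.

import numpy as np

def sampled_nonlinearity(stimuli, spikes, v, n_bins=20):
    """Estimate P(spike | s.v) from the histogram ratio P(s.v|spike)/P(s.v)."""
    x = stimuli @ v
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    p_all, _ = np.histogram(x, bins=edges, density=True)          # P(s.v)
    p_spk, _ = np.histogram(x[spikes], bins=edges, density=True)  # P(s.v | spike)
    centers = 0.5 * (edges[:-1] + edges[1:])
    ratio = np.divide(p_spk, p_all, out=np.zeros_like(p_spk), where=p_all > 0)
    return centers, ratio * spikes.mean()      # multiply by P(spike) per frame

# e.g., centers, p_spike_given_x = sampled_nonlinearity(stimuli, spikes, e1_hat)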

4.2 Estimated Deviation from the Optimal Dimension. When information is calculated from a finite data set, the (normalized) vector v̂ which maximizes I will deviate from the true RF ê1. The deviation δv = v̂ − ê1 arises because the probability distributions are estimated from experimental histograms and differ from the distributions found in the limit of infinite data size. For a simple cell, the quality of reconstruction can be characterized by the projection v̂ · ê1 = 1 − δv²/2, where both v̂ and ê1 are normalized, and δv is by definition orthogonal to ê1. The deviation δv ∼ A⁻¹∇I, where A is the Hessian of information. Its structure is similar to that of a covariance matrix:

\[
A_{ij} = \frac{1}{\ln 2} \int dx\, P(x\mid\mathrm{spike}) \left( \frac{d}{dx} \ln \frac{P(x\mid\mathrm{spike})}{P(x)} \right)^{2} \Big[ \langle s_i s_j \mid x\rangle - \langle s_i \mid x\rangle \langle s_j \mid x\rangle \Big].
\tag{4.3}
\]

When averaged over possible outcomes of N trials, the gradient of information is zero for the optimal direction. Here, in order to evaluate ⟨δv²⟩ = Tr[A⁻¹⟨∇I ∇Iᵀ⟩A⁻¹], we need to know the variance of the gradient of I. Assuming that the probability of generating a spike is independent for different bins, we can estimate ⟨∇I_i ∇I_j⟩ ∼ A_ij / (N_spike ln 2). Therefore, the expected error in the reconstruction of the optimal filter is inversely proportional to the number of spikes. The corresponding expected value of the projection between the reconstructed vector and the relevant direction ê1 is given by

\[
\hat{\mathbf{v}} \cdot \hat{\mathbf{e}}_1 \approx 1 - \tfrac{1}{2}\langle \delta v^2 \rangle = 1 - \frac{\mathrm{Tr}'[A^{-1}]}{2 N_{\mathrm{spike}} \ln 2},
\tag{4.4}
\]

where Tr' means that the trace is taken in the subspace orthogonal to the model filter (see footnote 5). The estimate, equation 4.4, can be calculated without knowledge of the underlying model; it is ∼ D/(2N_spike). This behavior should also hold in cases where the stimulus dimensions are expanded to include time. The errors are expected to increase in proportion to the increased dimensionality. In the case of a complex cell with two relevant dimensions, the error is expected to be twice that for a cell with a single relevant dimension, as also discussed in section 4.3.
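A possible numerical sketch of this error estimate is given below: it bins the projections x = s · v, approximates the Hessian of equation 4.3 from the binned conditional covariances, and evaluates equation 4.4 with a pseudoinverse standing in for the trace in the subspace orthogonal to the filter. The arrays `stimuli` and `spikes` and the unit vector `v` are assumptions carried over from the earlier sketches; the bin count and the pseudoinverse cutoff are arbitrary choices.

import numpy as np

def expected_projection(stimuli, spikes, v, n_bins=20):
    """Rough estimate of <v_hat . e1> = 1 - Tr'[A^-1]/(2 N_spike ln 2), eq. 4.4."""
    x = stimuli @ v
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    which = np.clip(np.digitize(x, edges) - 1, 0, n_bins - 1)
    dx = edges[1] - edges[0]

    p_x = np.array([(which == b).mean() for b in range(n_bins)])
    p_x_spk = np.array([(which[spikes] == b).mean() for b in range(n_bins)])
    # d/dx ln[P(x|spike)/P(x)], finite-differenced on the binned estimates
    log_ratio = np.log(np.maximum(p_x_spk, 1e-12)) - np.log(np.maximum(p_x, 1e-12))
    dlog = np.gradient(log_ratio, dx)

    D = stimuli.shape[1]
    A = np.zeros((D, D))
    for b in range(n_bins):
        in_bin = which == b
        if in_bin.sum() < 2:
            continue
        cov_b = np.cov(stimuli[in_bin], rowvar=False)  # <s_i s_j|x> - <s_i|x><s_j|x>
        A += p_x_spk[b] * dlog[b] ** 2 * cov_b         # integrand of eq. 4.3
    A /= np.log(2)

    # the cutoff is chosen ad hoc so the pseudoinverse discards the near-zero
    # mode along v, approximating the trace in the orthogonal subspace
    tr_perp = np.trace(np.linalg.pinv(A, rcond=1e-6))
    return 1.0 - tr_perp / (2.0 * spikes.sum() * np.log(2))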

We emphasize that the error estimate according to equation 4.4 is of the same order as the errors of the reverse correlation method when it is applied to gaussian ensembles. The latter are given by (Tr[C⁻¹] − C⁻¹₁₁) / (2 N_spike ⟨f′²(s1)⟩). Of course, if the reverse correlation method were to be applied to the nongaussian ensemble, the errors would be larger. In Figure 5 we show the result of simulations for various numbers of trials, and therefore N_spike. The average projection of the normalized reconstructed vector v̂ on the RF ê1 behaves initially as 1/N_spike (dashed line). For smaller data sets, in this case N_spike ≲ 30,000, corrections ∼ N_spike⁻² become important for estimating the expected errors of the algorithm. Happily, these corrections have a sign such that smaller data sets are more effective than one might have expected from the asymptotic calculation. This can be verified from the expansion v̂ · ê1 = [1 + δv²]⁻¹ᐟ² ≈ 1 − ½⟨δv²⟩ + ⅜⟨δv⁴⟩, where only the first two terms were taken into account in equation 4.4.

Footnote 5: By definition, δv₁ = δv · ê1 = 0, and therefore ⟨δv₁²⟩ ∝ A⁻¹₁₁ is to be subtracted from ⟨δv²⟩ ∝ Tr[A⁻¹]. Because ê1 is an eigenvector of A with zero eigenvalue, A⁻¹₁₁ is infinite. Therefore, the proper treatment is to take the trace in the subspace orthogonal to ê1.


Figure 5: Projection of the vector v̂max that maximizes information on the RF ê1, plotted as a function of the number of spikes. The solid line is a quadratic fit in 1/N_spike, and the dashed line is the leading linear term in 1/N_spike. This set of simulations was carried out for a model visual neuron with the one relevant dimension from Figure 4a and the input-output function (see equation 4.1) with parameter values σ ≈ 0.61σ_s1, θ ≈ 0.61σ_s1. For this model neuron, the linear approximation for the expected error is applicable for N_spike ≳ 30,000.

4.3 A Model Complex Cell. A sequence of spikes from a model cell with two relevant dimensions was simulated by projecting each of the stimuli on vectors that differ by π/2 in their spatial phase, taken to mimic properties of complex cells, as in Figure 6. A particular frame leads to a spike according to a logical OR, that is, if either |s1| or |s2| exceeds a threshold value θ in the presence of noise, where s1 = s · ê1 and s2 = s · ê2. Similarly to equation 4.1,

\[
\frac{P(\mathrm{spike}\mid\mathbf{s})}{P(\mathrm{spike})} = f(s_1, s_2) = \big\langle H(|s_1| - \theta - \xi_1) \vee H(|s_2| - \theta - \xi_2) \big\rangle ,
\tag{4.5}
\]

where ξ1 and ξ2 are independent gaussian variables. The sampling of this input-output function by our particular set of natural stimuli is shown in Figure 6c. As is well known, reverse correlation fails in this case because the spike-triggered average stimulus is zero, although with gaussian stimuli the spike-triggered covariance method would recover the relevant dimensions (Touryan et al., 2002). Here we show that searching for maximally informative dimensions allows us to recover the relevant subspace even under more natural stimulus conditions.
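A minimal sketch of this OR-type model (again not the authors' code; the filters ê1, ê2 and the values of θ and σ are placeholders) generates a spike train from equation 4.5 as follows.

import numpy as np

rng = np.random.default_rng(1)

def complex_cell_spikes(stimuli, e1, e2, theta, sigma):
    """Spikes from eq. 4.5: fire if |s1| or |s2| crosses threshold under noise."""
    s1 = stimuli @ e1
    s2 = stimuli @ e2
    xi1 = sigma * rng.standard_normal(s1.shape)
    xi2 = sigma * rng.standard_normal(s2.shape)
    return (np.abs(s1) - theta - xi1 > 0) | (np.abs(s2) - theta - xi2 > 0)

# e.g., with two orthogonal placeholder filters differing in spatial phase:
# spikes = complex_cell_spikes(stimuli, e1_hat, e2_hat, theta=0.61, sigma=0.31)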


Figure 6: Analysis of a model complex cell with relevant dimensions ê1 and ê2 shown in (a) and (b), respectively. Spikes are generated according to an "OR" input-output function f(s1, s2) with threshold θ ≈ 0.61σ_s1 and noise standard deviation σ = 0.31σ_s1. (c, d) Vectors v1 and v2 found by maximizing information I(v1, v2).

We start by maximizing information with respect to one vector. Contrary to the result in Figure 4e for a simple cell, one optimal dimension recovers only about 60% of the total information per spike (see equation 2.2). Perhaps surprisingly, because of the strong correlations in natural scenes, even a projection onto a random vector in the D ∼ 10³-dimensional stimulus space has a high probability of explaining 60% of the total information per spike, as can be seen in Figure 2. We therefore go on to maximize information with respect to two vectors. As a result of maximization, we obtain two vectors v1 and v2, shown in Figure 6. The information along them is I(v1, v2) ≈ 0.90, which is within the range of information values obtained along different linear combinations of the two model vectors, I(ê1, ê2)/I_spike = 0.89 ± 0.11. Therefore, the description of the neuron's firing in terms of vectors v1 and v2 is complete up to the noise level, and we do not have to look for extra relevant dimensions. Practically, the number of relevant dimensions can be determined by comparing I(v1, v2) to either I_spike or I(v1, v2, v3), the latter being the result of maximization with respect to three vectors simultaneously. As mentioned in section 1, information along a set of vectors does not increase when extra dimensions are added to the relevant subspace. Therefore, if I(v1, v2) = I(v1, v2, v3) (again, up to the noise level), this means that there are only two relevant dimensions. Using I_spike for comparison with I(v1, v2) has the advantage of not having to look for an extra dimension, which can be computationally intensive. However, I_spike might be subject to larger systematic bias errors than I(v1, v2, v3).

Vectors v1 and v2 obtained by maximizing I(v1, v2) are not exactly orthogonal and are also rotated with respect to ê1 and ê2. However, the quality of reconstruction, as well as the value of information I(v1, v2), is independent of the particular choice of basis within the RS. The appropriate measure of similarity between the two planes is the dot product of their normals. In the example of Figure 6, n̂(ê1, ê2) · n̂(v1, v2) = 0.82 ± 0.07, where n̂(ê1, ê2) is the normal to the plane passing through the vectors ê1 and ê2. Maximizing information with respect to two dimensions requires a significantly slower cooling rate and, consequently, longer computational times. However, the expected error in the reconstruction, 1 − n̂(ê1, ê2) · n̂(v1, v2), scales as 1/N_spike, behavior similar to equation 4.4, and is roughly twice that for a simple cell given the same number of spikes. We make vectors v1 and v2 orthogonal to each other upon completion of the algorithm.

4.4 A Model Auditory Neuron with One Relevant Dimension. Because stimuli s are treated as vectors in an abstract space, the method of looking for the most informative dimensions can be applied equally well to auditory and to visual neurons. Here we illustrate the method by considering a model auditory neuron with one relevant dimension, which is shown in Figure 7c and is taken to mimic the properties of cochlear neurons. The model neuron is probed by two ensembles of naturalistic stimuli: one is a recording of a native Russian speaker reading a piece of Russian prose, and the other is a recording of a piece of English prose read by a native English speaker. Both ensembles are nongaussian and exhibit amplitude distributions with long, nearly exponential tails (see Figure 7a), which are qualitatively similar to those of light intensities in natural scenes (Voss & Clarke, 1975; Ruderman, 1994). However, the power spectrum is different in the two cases, as can be seen in Figure 7b. The differences in the correlation structure, in particular, lead to different STAs across the two ensembles (cf. Figure 7d). Both of the STAs also deviate from the model filter shown in Figure 7c.

Despite differences in the probability distributions P(s), it is possible to recover the relevant dimension of the model neuron by maximizing information. In Figure 7e we show the two most informative vectors found by running the algorithm for the two ensembles and replot the model filter from Figure 7c to show that the three vectors overlap almost perfectly. Thus, different nongaussian correlations can be successfully removed to obtain an estimate of the relevant dimension. If the most informative vector changes with the stimulus ensemble, this can be interpreted as caused by adaptation to the probability distribution.
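One simple way to cast a sound-pressure waveform into this vector framework (a choice of ours for illustration, not a prescription from the text) is to take each stimulus to be the window of D consecutive samples preceding a given time bin:

import numpy as np

def waveform_to_stimuli(waveform, D=500):
    """Rows are sliding windows of D samples, in units of the RMS amplitude."""
    w = np.asarray(waveform, dtype=float)
    w = (w - w.mean()) / w.std()
    idx = np.arange(D)[None, :] + np.arange(len(w) - D + 1)[:, None]
    return w[idx]                      # shape (len(w) - D + 1, D)

The same information-maximization machinery used for image patches can then be applied unchanged to the resulting stimulus matrix for either speech ensemble.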


Figure 7: A model auditory neuron is probed by two natural ensembles of stimuli: a piece of English prose (1) and a piece of Russian prose (2). The size of the stimulus ensemble was the same in both cases, and the sampling rate was 44.1 kHz. (a) The probability distribution of the sound pressure amplitude, in units of the standard deviation, for both ensembles is strongly nongaussian. (b) The power spectra for the two ensembles. (c) The relevant vector of the model neuron, of dimensionality D = 500. (d) The STA is broadened in both cases, but differs between the two cases due to differences in the power spectra of the two ensembles. (e) Vectors that maximize information for either of the ensembles overlap almost perfectly with each other and with the model filter, which is also replotted here from (c). (f) The probability of a spike, P(spike | s · v̂max) (crosses), is compared to P(spike|s1) used in generating spikes (solid line). The input-output function had parameter values σ ≈ 0.9σ_s1 and θ ≈ 1.8σ_s1.

5 Summary

Features of the stimulus that are most relevant for generating the response of a neuron can be found by maximizing information between the sequence of responses and the projection of stimuli on trial vectors within the stimulus space. Calculated in this manner, information becomes a function of direction in stimulus space. Those vectors that maximize the information and account for the total information per response of interest span the relevant subspace. The method allows multiple dimensions to be found. The reconstruction of the relevant subspace is done without assuming a particular form of the input-output function. It can be strongly nonlinear within the relevant subspace and is estimated from experimental histograms for each trial direction independently. Most important, this method can be used with any stimulus ensemble, even those that are strongly nongaussian, as in the case of natural signals. We have illustrated the method on model neurons responding to natural scenes and sounds. We expect the current implementation of the method to be most useful when several most informative vectors (≲ 10, depending on their dimensionality) are to be analyzed for neurons probed by natural scenes. This technique could be particularly useful in describing sensory processing in poorly understood regions of higher-level sensory cortex (such as visual areas V2, V4, and IT, and auditory cortex beyond A1), where white noise stimulation is known to be less effective than naturalistic stimuli.

Appendix A: Limitations of the Reverse Correlation Method

Here we examine what sort of deviations one can expect when applying the reverse correlation method to natural stimuli, even in the model with just one relevant dimension. There are two factors that, when combined, invalidate the reverse correlation method: the nongaussian character of correlations and the nonlinearity of the input-output function (Ringach, Sapiro, & Shapley, 1997). In its original formulation (de Boer & Kuyper, 1968), the neuron is probed by white noise, and the relevant dimension ê1 is given by the STA, ê1 ∝ ⟨s r(s)⟩. If the signals are not white, that is, the covariance matrix C_ij = ⟨s_i s_j⟩ is not a unit matrix, then the STA is a broadened version of the original filter ê1. This can be seen by noting that for any function F(s) of gaussian variables {s_i}, the identity holds:

\[
\langle s_i F(\mathbf{s}) \rangle = \langle s_i s_j \rangle \langle \partial_j F(\mathbf{s}) \rangle , \qquad \partial_j \equiv \frac{\partial}{\partial s_j} .
\tag{A.1}
\]

When property A.1 is applied to the vector components of the STA, ⟨s_i r(s)⟩ = C_ij ⟨∂_j r(s)⟩. Since we work within the assumption that the firing rate is a (nonlinear) function of the projection onto one filter ê1, r(s) = r(s1), the latter average is proportional to the model filter itself, ⟨∂_j r⟩ = ê1_j ⟨r′(s1)⟩. Therefore we arrive at the prescription of the reverse correlation method:

\[
\hat{e}_{1i} \propto [C^{-1}]_{ij} \langle s_j r(\mathbf{s}) \rangle .
\tag{A.2}
\]

The gaussian property is necessary in order to represent the STA as a convolution of the covariance matrix C_ij of the stimulus ensemble and the model filter. To understand how the reconstructed vector obtained according to equation A.2 deviates from the relevant one, we consider weakly nongaussian stimuli with the probability distribution

\[
P_{nG}(\mathbf{s}) = \frac{1}{Z}\, P_0(\mathbf{s})\, e^{\epsilon H_1(\mathbf{s})} ,
\tag{A.3}
\]

where P_0(s) is the gaussian probability distribution with covariance matrix C, and the normalization factor is Z = ⟨e^{εH_1(s)}⟩. The function H_1 describes deviations of the probability distribution from gaussian, and therefore we will set ⟨s_i H_1⟩ = 0 and ⟨s_i s_j H_1⟩ = 0, since these averages can be accounted for in the gaussian ensemble. In what follows, we will keep only the first-order terms in the perturbation parameter ε. Using property A.1, we find the STA to be given by

\[
\langle s_i r \rangle_{nG} = \langle s_i s_j \rangle \Big[ \langle \partial_j r \rangle + \epsilon \langle r\, \partial_j H_1 \rangle \Big] ,
\tag{A.4}
\]

where averages are taken with respect to the gaussian distribution. Similarly, the covariance matrix C_ij evaluated with respect to the nongaussian ensemble is given by

\[
C_{ij} = \frac{1}{Z} \langle s_i s_j\, e^{\epsilon H_1} \rangle = \langle s_i s_j \rangle + \epsilon \langle s_i s_k \rangle \langle s_j \partial_k H_1 \rangle ,
\tag{A.5}
\]

so that, to first order in ε, ⟨s_i s_j⟩ = C_ij − ε C_ik ⟨s_j ∂_k H_1⟩. Combining this with equation A.4, we get

\[
\langle s_i r \rangle_{nG} = \mathrm{const} \times C_{ij} \hat{e}_{1j} + \epsilon\, C_{ij} \Big\langle \big( r - s_1 \langle r' \rangle \big)\, \partial_j H_1 \Big\rangle .
\tag{A.6}
\]

The second term in equation A.6 prevents the application of the reverse correlation method for nongaussian signals. Indeed, if we multiply the STA, equation A.6, by the inverse of the a priori covariance matrix C_ij, according to the reverse correlation method, equation A.2, we no longer obtain the RF ê1. The deviation of the obtained answer from the true RF increases with ε, which measures the deviation of the probability distribution from gaussian. Since natural stimuli are known to be strongly nongaussian, this makes the use of reverse correlation problematic when analyzing neural responses to natural stimuli.
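For reference, the decorrelation prescription of equation A.2 takes only a few lines; the sketch below (our illustration, with the arrays `stimuli` and a per-frame rate or spike weight assumed) makes explicit that only the a priori covariance and the response-weighted average of the stimuli enter, so any bias for nongaussian ensembles comes from the ensemble itself rather than from the implementation.

import numpy as np

def decorrelated_sta(stimuli, weights):
    """Reverse-correlation estimate C^{-1} <s r(s)>, equation A.2.

    `weights` can be a spike count per frame or the exact firing rate used in
    the "exact" STA of equation 4.2. For gaussian stimuli the result is
    proportional to the true filter, while for nongaussian natural stimuli it
    is generally biased, as expressed by the second term in equation A.6.
    """
    sta = stimuli.T @ weights / weights.sum()      # <s r(s)>, up to a constant
    C = np.cov(stimuli, rowvar=False)              # a priori covariance matrix
    v = np.linalg.solve(C, sta)                    # apply C^{-1}
    return v / np.linalg.norm(v)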

The difference in applying the reverse correlation to stimuli drawn from a correlated gaussian ensemble versus a nongaussian one is illustrated in Figures 8b and 8c. In the first case, shown in Figure 8b, stimuli are drawn from a correlated gaussian ensemble with the covariance matrix equal to that of natural images. In the second case, shown in Figure 8c, the patches of photos are taken as stimuli. The STA is broadened in both cases. Although the two-point correlations are just as strong in the case of gaussian stimuli as they are in the natural stimuli ensemble, gaussian correlations can be successfully removed from the STA according to equation A.2 to obtain the model filter. On the contrary, an attempt to use reverse correlation with natural stimuli results in an altered version of the model filter. We reiterate that for this example, the apparent noise in the decorrelated vector is not due to neural noise or finite data sets, since the "exact" STA has been used (see equation 4.2) in all calculations presented in Figures 8 and 9.

Figure 8: The nongaussian character of correlations present in natural scenes invalidates the reverse correlation method for neurons with a nonlinear input-output function. (a) A model visual neuron has one relevant dimension ê1 and the nonlinear input-output function. The "exact" STA is used (see equation 4.2) to separate effects of neural noise from alterations introduced by the method. The decorrelated "exact" STA is obtained by multiplying the "exact" STA by the inverse of the covariance matrix, according to equation A.2. (b) Stimuli are taken from a correlated gaussian noise ensemble. The effect of correlations in the STA can be removed according to equation A.2. (c) When patches of photos are taken as stimuli for the same model neuron as in (b), the decorrelation procedure gives an altered version of the model filter. The two stimulus ensembles have the same covariance matrix.

The reverse correlation method gives the correct answer for any distribution of signals if the probability of generating a spike is a linear function of s_i, since then the second term in equation A.6 is zero. In particular, a linear input-output relation could arise due to neural noise whose variance is much larger than the variance of the signal itself. This point is illustrated in Figures 9a, 9b, and 9c, where the reverse correlation method is applied to a threshold input-output function at low, moderate, and high signal-to-noise ratios. For small signal-to-noise ratios, where the noise standard deviation is similar to that of the projections s1, the threshold nonlinearity in the input-output function is masked by noise and is effectively linear. In this limit, the reverse correlation can be applied with the exact STA. However, for the experimentally calculated STA at low signal-to-noise ratios, the decorrelation procedure results in strong noise amplification. At higher signal-to-noise ratios, decorrelation fails due to the nonlinearity of the input-output function, in accordance with equation A.6.

Appendix B: Maxima of I(v): What Do They Mean?

The relevant subspace of dimensionality K can be found by maximizing information simultaneously with respect to K vectors. The result of maximization with respect to a number of vectors that is less than the true dimensionality of the relevant subspace may produce vectors that have components in the irrelevant subspace. This happens only in the presence of correlations in stimuli. As an illustration, we consider the situation where the dimensionality of the relevant subspace is K = 2 and the vector ê1 describes the most informative direction within the relevant subspace. We show here that although the gradient of information is perpendicular to both ê1 and ê2, it may have components outside the relevant subspace. Therefore, the vector v_max that corresponds to the maximum of I(v) will lie outside the relevant subspace. We recall from equation 3.1 that

\[
\nabla I(\hat{e}_1) = \int ds_1\, P(s_1)\, \frac{d}{ds_1}\!\left[ \frac{P(s_1\mid\mathrm{spike})}{P(s_1)} \right] \big( \langle \mathbf{s} \mid s_1, \mathrm{spike} \rangle - \langle \mathbf{s} \mid s_1 \rangle \big) .
\tag{B.1}
\]

We can rewrite the conditional averages as ⟨s|s1⟩ = ∫ ds2 P(s1, s2) ⟨s|s1, s2⟩ / P(s1) and ⟨s|s1, spike⟩ = ∫ ds2 f(s1, s2) P(s1, s2) ⟨s|s1, s2⟩ / P(s1|spike), so that

\[
\nabla I(\hat{e}_1) = \int ds_1\, ds_2\, P(s_1, s_2)\, \langle \mathbf{s} \mid s_1, s_2 \rangle\, \frac{P(\mathrm{spike}\mid s_1, s_2) - P(\mathrm{spike}\mid s_1)}{P(\mathrm{spike})} \times \frac{d}{ds_1} \ln\!\left[ \frac{P(s_1\mid\mathrm{spike})}{P(s_1)} \right] .
\tag{B.2}
\]

Because we assume that the vector ê1 is the most informative within the relevant subspace, ê1 · ∇I = ê2 · ∇I = 0, so that the integral in equation B.2 is zero for those directions in which the component of the vector ⟨s|s1, s2⟩ changes linearly with s1 and s2. For uncorrelated stimuli, this is true for all directions, so that the most informative vector within the relevant subspace is also the most informative in the overall stimulus space. In the presence of correlations, the gradient may have nonzero components along some irrelevant directions if the projection of the vector ⟨s|s1, s2⟩ on those directions is not a linear function of s1 and s2. By looking for a maximum of information, we will therefore be driven outside the relevant subspace. The deviation of v_max from the relevant subspace is also proportional to the strength of the dependence on the second parameter s2, because of the factor [P(s1, s2|spike)/P(s1, s2) − P(s1|spike)/P(s1)] in the integrand.

Figure 9: Application of the reverse correlation method to a model visual neuron with one relevant dimension ê1 and a threshold input-output function, for decreasing values of the noise variance, σ/σ_s1 ≈ 6.1, 0.61, 0.06 in (a), (b), and (c), respectively. The model P(spike|s1) becomes effectively linear when the signal-to-noise ratio is small. The reverse correlation can be used together with natural stimuli if the input-output function is linear. Otherwise, the deviations between the decorrelated STA and the model filter increase with the nonlinearity of P(spike|s1).

Appendix C: The Gradient of Information

According to expression 2.5, the information I(v) depends on the vector v only through the probability distributions P_v(x) and P_v(x|spike). Therefore, we can express the gradient of information in terms of gradients of those probability distributions:

\[
\nabla_{\mathbf{v}} I = \frac{1}{\ln 2} \int dx \left[ \ln\!\left( \frac{P_{\mathbf{v}}(x\mid\mathrm{spike})}{P_{\mathbf{v}}(x)} \right) \nabla_{\mathbf{v}} P_{\mathbf{v}}(x\mid\mathrm{spike}) - \frac{P_{\mathbf{v}}(x\mid\mathrm{spike})}{P_{\mathbf{v}}(x)}\, \nabla_{\mathbf{v}} P_{\mathbf{v}}(x) \right] ,
\tag{C.1}
\]

where we took into account that ∫ dx P_v(x|spike) = 1 and does not change with v. To find the gradients of the probability distributions, we note that

\[
\nabla_{\mathbf{v}} P_{\mathbf{v}}(x) = \nabla_{\mathbf{v}} \left[ \int d\mathbf{s}\, P(\mathbf{s})\, \delta(x - \mathbf{s}\cdot\mathbf{v}) \right] = -\int d\mathbf{s}\, P(\mathbf{s})\, \mathbf{s}\, \delta'(x - \mathbf{s}\cdot\mathbf{v}) = -\frac{d}{dx} \Big[ P_{\mathbf{v}}(x)\, \langle \mathbf{s} \mid x \rangle \Big] ,
\tag{C.2}
\]

and analogously for P_v(x|spike):

\[
\nabla_{\mathbf{v}} P_{\mathbf{v}}(x\mid\mathrm{spike}) = -\frac{d}{dx} \Big[ P_{\mathbf{v}}(x\mid\mathrm{spike})\, \langle \mathbf{s} \mid x, \mathrm{spike} \rangle \Big] .
\tag{C.3}
\]

Substituting expressions C.2 and C.3 into equation C.1 and integrating once by parts, we obtain

\[
\nabla_{\mathbf{v}} I = \int dx\, P_{\mathbf{v}}(x) \Big[ \langle \mathbf{s} \mid x, \mathrm{spike} \rangle - \langle \mathbf{s} \mid x \rangle \Big] \cdot \left[ \frac{d}{dx} \frac{P_{\mathbf{v}}(x\mid\mathrm{spike})}{P_{\mathbf{v}}(x)} \right] ,
\]

which is expression 3.1 of the main text.
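As an illustration of how this expression can be evaluated from data (our sketch, with binning and finite differences as arbitrary numerical choices; the arrays `stimuli` and `spikes` and a unit vector `v` are assumed as in the earlier snippets):

import numpy as np

def information_gradient(stimuli, spikes, v, n_bins=20):
    """Estimate the gradient of I(v) from binned projections x = s . v."""
    x = stimuli @ v
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    which = np.clip(np.digitize(x, edges) - 1, 0, n_bins - 1)
    dx = edges[1] - edges[0]

    p_x = np.array([(which == b).mean() for b in range(n_bins)])
    p_x_spk = np.array([(which[spikes] == b).mean() for b in range(n_bins)])
    ratio = np.divide(p_x_spk, p_x, out=np.zeros_like(p_x), where=p_x > 0)
    dratio = np.gradient(ratio, dx)        # d/dx [P_v(x|spike) / P_v(x)]

    grad = np.zeros(stimuli.shape[1])
    for b in range(n_bins):
        in_bin = which == b
        if not in_bin.any() or not (in_bin & spikes).any():
            continue
        mean_all = stimuli[in_bin].mean(axis=0)            # <s | x>
        mean_spk = stimuli[in_bin & spikes].mean(axis=0)   # <s | x, spike>
        grad += p_x[b] * (mean_spk - mean_all) * dratio[b]
    return grad        # up to the overall factor set by the choice of log base

Since I(v) depends only on the direction of v, only the component of this gradient orthogonal to v matters for an optimization such as the annealed search described in section 4.1.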

Acknowledgments

We thank K. D. Miller for many helpful discussions. Work at UCSF was supported in part by the Sloan and Swartz foundations and by a training grant from the NIH. Our collaboration began at the Marine Biological Laboratory, in a course supported by grants from NIMH and the Howard Hughes Medical Institute.


References

Aguera y Arcas, B., Fairhall, A. L., & Bialek, W. (2003). Computation in a single neuron: Hodgkin and Huxley revisited. Neural Comp., 15, 1715–1749.

Baddeley, R., Abbott, L. F., Booth, M. C. A., Sengpiel, F., Freeman, T., Wakeman, E. A., & Rolls, E. T. (1997). Responses of neurons in primary and inferior temporal visual cortices to natural scenes. Proc. R. Soc. Lond. B, 264, 1775–1783.

Barlow, H. (1961). Possible principles underlying the transformations of sensory images. In W. Rosenblith (Ed.), Sensory communication (pp. 217–234). Cambridge, MA: MIT Press.

Barlow, H. (2001). Redundancy reduction revisited. Network: Comput. Neural Syst., 12, 241–253.

Bialek, W. (2002). Thinking about the brain. In H. Flyvbjerg, F. Julicher, P. Ormos, & F. David (Eds.), Physics of biomolecules and cells (pp. 485–577). Berlin: Springer-Verlag. See also physics/0205030.

Bialek, W., & de Ruyter van Steveninck, R. R. (2003). Features and dimensions: Motion estimation in fly vision. Unpublished manuscript.

Brenner, N., Bialek, W., & de Ruyter van Steveninck, R. R. (2000). Adaptive rescaling maximizes information transmission. Neuron, 26, 695–702.

Brenner, N., Strong, S. P., Koberle, R., Bialek, W., & de Ruyter van Steveninck, R. R. (2000). Synergy in a neural code. Neural Computation, 12, 1531–1552. See also physics/9902067.

Chichilnisky, E. J. (2001). A simple white noise analysis of neuronal light responses. Network: Comput. Neural Syst., 12, 199–213.

Creutzfeldt, O. D., & Northdurft, H. C. (1978). Representation of complex visual stimuli in the brain. Naturwissenschaften, 65, 307–318.

Cover, T. M., & Thomas, J. A. (1991). Information theory. New York: Wiley.

de Boer, E., & Kuyper, P. (1968). Triggered correlation. IEEE Trans. Biomed. Eng., 15, 169–179.

de Ruyter van Steveninck, R. R., & Bialek, W. (1988). Real-time performance of a movement-sensitive neuron in the blowfly visual system: Coding and information transfer in short spike sequences. Proc. R. Soc. Lond. B, 265, 259–265.

de Ruyter van Steveninck, R. R., Borst, A., & Bialek, W. (2000). Real time encoding of motion: Answerable questions and questionable answers from the fly's visual system. In J. M. Zanker & J. Zeil (Eds.), Motion vision: Computational, neural and ecological constraints (pp. 279–306). New York: Springer-Verlag. See also physics/0004060.

de Ruyter van Steveninck, R. R., Lewen, G. D., Strong, S. P., Koberle, R., & Bialek, W. (1997). Reproducibility and variability in neural spike trains. Science, 275, 1805–1808. See also cond-mat/9603127.

6. Where available, we give references to the physics e-print archive, which may be found on-line at http://arxiv.org/abs; thus, Bialek (2002) is available on-line at http://arxiv.org/abs/physics/0205030. Published papers may differ from the versions posted to the archive.

Dimitrov, A. G., & Miller, J. P. (2001). Neural coding and decoding: Communication channels and quantization. Network: Comput. Neural Syst., 12, 441–472.

Dong, D. W., & Atick, J. J. (1995). Statistics of natural time-varying images. Network: Comput. Neural Syst., 6, 345–358.

Fairhall, A. L., Lewen, G. D., Bialek, W., & de Ruyter van Steveninck, R. R. (2001). Efficiency and ambiguity in an adaptive neural code. Nature, 412, 787–792.

Kara, P., Reinagel, P., & Reid, R. C. (2000). Low response variability in simultaneously recorded retinal, thalamic, and cortical neurons. Neuron, 27, 635–646.

Lewen, G. D., Bialek, W., & de Ruyter van Steveninck, R. R. (2001). Neural coding of naturalistic motion stimuli. Network: Comput. Neural Syst., 12, 317–329. See also physics/0103088.

Mainen, Z. F., & Sejnowski, T. J. (1995). Reliability of spike timing in neocortical neurons. Science, 268, 1503–1506.

Paninski, L. (2003a). Convergence properties of three spike-triggered analysis techniques. Network: Compt. in Neural Systems, 14, 437–464.

Paninski, L. (2003b). Estimation of entropy and mutual information. Neural Computation, 15, 1191–1253.

Panzeri, S., & Treves, A. (1996). Analytical estimates of limited sampling biases in different information measures. Network: Comput. Neural Syst., 7, 87–107.

Pola, G., Schultz, S. R., Petersen, R., & Panzeri, S. (2002). A practical guide to information analysis of spike trains. In R. Kotter (Ed.), Neuroscience databases: A practical guide (pp. 137–152). Norwell, MA: Kluwer.

Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P. (1992). Numerical recipes in C: The art of scientific computing. Cambridge: Cambridge University Press.

Reinagel, P., & Reid, R. C. (2000). Temporal coding of visual information in the thalamus. J. Neurosci., 20, 5392–5400.

Rieke, F., Bodnar, D. A., & Bialek, W. (1995). Naturalistic stimuli increase the rate and efficiency of information transmission by primary auditory afferents. Proc. R. Soc. Lond. B, 262, 259–265.

Rieke, F., Warland, D., de Ruyter van Steveninck, R. R., & Bialek, W. (1997). Spikes: Exploring the neural code. Cambridge, MA: MIT Press.

Ringach, D. L., Hawken, M. J., & Shapley, R. (2002). Receptive field structure of neurons in monkey visual cortex revealed by stimulation with natural image sequences. Journal of Vision, 2, 12–24.

Ringach, D. L., Sapiro, G., & Shapley, R. (1997). A subspace reverse-correlation technique for the study of visual neurons. Vision Res., 37, 2455–2464.

Rolls, E. T., Aggelopoulos, N. C., & Zheng, F. (2003). The receptive fields of inferior temporal cortex neurons in natural scenes. J. Neurosci., 23, 339–348.

Ruderman, D. L. (1994). The statistics of natural images. Network: Compt. Neural Syst., 5, 517–548.

Ruderman, D. L., & Bialek, W. (1994). Statistics of natural images: Scaling in the woods. Phys. Rev. Lett., 73, 814–817.

Schwartz, O., Chichilnisky, E. J., & Simoncelli, E. (2002). Characterizing neural gain control using spike-triggered covariance. In T. G. Dietterich, S. Becker, & Z. Ghahramani (Eds.), Advances in Neural Information Processing (pp. 269–276). Cambridge, MA: MIT Press.

Sen, K., Theunissen, F. E., & Doupe, A. J. (2001). Feature analysis of natural sounds in the songbird auditory forebrain. J. Neurophysiol., 86, 1445–1458.

Simoncelli, E., & Olshausen, B. A. (2001). Natural image statistics and neural representation. Annu. Rev. Neurosci., 24, 1193–1216.

Smirnakis, S. M., Berry, M. J., Warland, D. K., Bialek, W., & Meister, M. (1996). Adaptation of retinal processing to image contrast and spatial scale. Nature, 386, 69–73.

Smyth, D., Willmore, B., Baker, G. E., Thompson, I. D., & Tolhurst, D. J. (2003). The receptive-field organization of simple cells in primary visual cortex of ferrets under natural scene stimulation. J. Neurosci., 23, 4746–4759.

Stanley, G. B., Li, F. F., & Dan, Y. (1999). Reconstruction of natural scenes from ensemble responses in the lateral geniculate nucleus. J. Neurosci., 19, 8036–8042.

Strong, S. P., Koberle, R., de Ruyter van Steveninck, R. R., & Bialek, W. (1998). Entropy and information in neural spike trains. Phys. Rev. Lett., 80, 197–200.

Theunissen, F. E., Sen, K., & Doupe, A. J. (2000). Spectral-temporal receptive fields of nonlinear auditory neurons obtained using natural sounds. J. Neurosci., 20, 2315–2331.

Tishby, N., Pereira, F. C., & Bialek, W. (1999). The information bottleneck method. In B. Hajek & R. S. Sreenivas (Eds.), Proceedings of the 37th Allerton Conference on Communication, Control and Computing (pp. 368–377). Urbana: University of Illinois. See also physics/0004057.

Touryan, J., Lau, B., & Dan, Y. (2002). Isolation of relevant visual features from random stimuli for cortical complex cells. J. Neurosci., 22, 10811–10818.

Treves, A., & Panzeri, S. (1995). The upward bias in measures of information derived from limited data samples. Neural Comp., 7, 399–407.

von der Twer, T., & Macleod, D. I. A. (2001). Optimal nonlinear codes for the perception of natural colours. Network: Comput. Neural Syst., 12, 395–407.

Vickers, N. J., Christensen, T. A., Baker, T., & Hildebrand, J. G. (2001). Odour-plume dynamics influence the brain's olfactory code. Nature, 410, 466–470.

Vinje, W. E., & Gallant, J. L. (2000). Sparse coding and decorrelation in primary visual cortex during natural vision. Science, 287, 1273–1276.

Vinje, W. E., & Gallant, J. L. (2002). Natural stimulation of the nonclassical receptive field increases information transmission efficiency in V1. J. Neurosci., 22, 2904–2915.

Voss, R. F., & Clarke, J. (1975). "1/f noise" in music and speech. Nature, 317–318.

Weliky, M., Fiser, J., Hunt, R., & Wagner, D. N. (2003). Coding of natural scenes in primary visual cortex. Neuron, 37, 703–718.

Received January 21, 2003; accepted August 6, 2003.

Page 13: Analyzing Neural Responses to Natural Signals: Maximally ...wbialek/rome/refs/sharpee+al_04.pdfAnalyzing Neural Responses to Natural Signals 225 Figure 1: Schematic illustration of

Analyzing Neural Responses to Natural Signals 235

Figure 4 Analysis of a model simple cell with RF shown in (a) (b) The ldquoexactrdquospike-triggered average vsta (c) An attempt to remove correlations accordingto the reverse correlation method Ciexcl1

a priorivsta (d) The normalized vector Ovmax

found by maximizing information (e) Convergence of the algorithm accordingto information Iv and projection Ovcent Oe1 between normalized vectors as a functionof the inverse effective temperature Tiexcl1 (f) The probability of a spike Pspikejs centOvmax (crosses) is compared to Pspikejs1 used in generating spikes (solid line)Parameters of the model are frac34 D 031 and micro D 184 both given in units ofstandard deviation of s1 which is also the units for the x-axis in f

When the spike-triggered average (STA) or reverse correlation functionis computed from the responses to correlated stimuli the resulting vectorwill be broadened due to spatial correlations present in the stimuli (see Fig-ure 4b) For stimuli that are drawn from a gaussian probability distributionthe effects of correlations could be removed by multiplying vsta by the in-verse of the a priori covariance matrix according to the reverse correlationmethod OvGaussian est Ciexcl1

a priorivsta equation A2 However this proceduretends to amplify noise To separate errors due to neural noise from those

236 T Sharpee N Rust and W Bialek

due to the nongaussian character of correlations note that in a model theeffect of neural noise on our estimate of the STA can be eliminated by aver-aging the presented stimuli weighted with the exact ring rate as opposedto using a histogram of responses to estimate Pspikejs from a nite set oftrials We have used this ldquoexactrdquo STA

vsta DZ

ds sPsjspike D 1Pspike

ZdsPs sPspikejs (42)

in calculations presented in Figures 4b and 4c Even with this noiseless STA(the equivalent of collecting an innite data set) the standard decorrelationprocedure is not valid for nongaussian stimuli and nonlinear input-outputfunctions as discussed in detail in appendix A The result of such a decorre-lation in our example is shown in Figure 4c It clearly is missing some of thestructure in the model lter with projection Oe1 cent OvGaussian est frac14 014 The dis-crepancy is not due to neural noise or nite sampling since the ldquoexactrdquo STAwas decorrelated the absence of noise in the exact STA also means that therewould be no justication for smoothing the results of the decorrelation Thediscrepancy between the true RF and the decorrelated STA increases withthe strength of nonlinearity in the input-output function

In contrast it is possible to obtain a good estimate of the relevant di-mension Oe1 by maximizing information directly (see Figure 4d) A typicalprogress of the simulated annealing algorithm with decreasing temperatureT is shown in Figure 4e There we plot both the information along the vectorand its projection on Oe1 We note that while information I remains almostconstant the value of projection continues to improve Qualitatively this isbecause the probability distributions depend exponentially on informationThe nal value of projection depends on the size of the data set as discussedbelow In the example shown in Figure 4 there were frac14 50 000 spikes withaverage probability of spike frac14 005 per frame and the reconstructed vectorhas a projection Ovmax cent Oe1 D 0920 sect 0006 Having estimated the RF onecan proceed to sample the nonlinear input-output function This is done byconstructing histograms for Ps cent Ovmax and Ps cent Ovmaxjspike of projectionsonto vector Ovmax found by maximizing information and taking their ratioas in equation 12 In Figure 4f we compare Pspikejs cent Ovmax (crosses) withthe probability Pspikejs1 used in the model (solid line)

42 Estimated Deviation from the Optimal Dimension When infor-mation is calculated from a nite data set the (normalized) vector Ov whichmaximizes I will deviate from the true RF Oe1 The deviation plusmnv D OviexclOe1 arisesbecause the probability distributions are estimated from experimental his-tograms and differ from the distributions found in the limit of innite datasize For a simple cell the quality of reconstruction can be characterized bythe projection Ov cent Oe1 D 1 iexcl 1

2plusmnv2 where both Ov and Oe1 are normalized andplusmnv is by denition orthogonal to Oe1 The deviation plusmnv raquo Aiexcl1rI where A is

Analyzing Neural Responses to Natural Signals 237

the Hessian of information Its structure is similar to that of a covariancematrix

Aij D1

ln 2

ZdxPxjspike

sup3ddx

lnPxjspike

Px

acute2

pound hsisjjxi iexcl hsijxihsjjxi (43)

When averaged over possible outcomes of N trials the gradient of infor-mation is zero for the optimal direction Here in order to evaluate hplusmnv2i DTr[Aiexcl1hrIrITiAiexcl1] we need to know the variance of the gradient of I As-suming that the probability of generating a spike is independent for differentbins we can estimate hrIirIji raquo Aij=Nspike ln 2 Therefore an expected er-ror in the reconstruction of the optimal lter is inversely proportional tothe number of spikes The corresponding expected value of the projectionbetween the reconstructed vector and the relevant direction Oe1 is given by

Ov cent Oe1 frac14 1 iexcl 12

hplusmnv2i D 1 iexcl Tr0[Aiexcl1]2Nspike ln 2

(44)

where Tr0 means that the trace is taken in the subspace orthogonal to themodel lter5 The estimate equation 44 can be calculated without knowl-edge of the underlying model it is raquo D=2Nspike This behavior should alsohold in cases where the stimulus dimensions are expanded to include timeThe errors are expected to increase in proportion to the increased dimen-sionality In the case of a complex cell with two relevant dimensions theerror is expected to be twice that for a cell with single relevant dimensionalso discussed in section 43

We emphasize that the error estimate according to equation 44 is of thesame order as errors of the reverse correlation method when it is appliedfor gaussian ensembles The latter are given by Tr[Ciexcl1] iexcl Ciexcl1

11 =[2Nspike

h f02s1i] Of course if the reverse correlation method were to be applied

to the nongaussian ensemble the errors would be larger In Figure 5 weshow the result of simulations for various numbers of trials and thereforeNspike The average projection of the normalized reconstructed vector Ov onthe RF Oe1 behaves initially as 1=Nspike (dashed line) For smaller data setsin this case Nspikes raquolt 30 000 corrections raquo Niexcl2

spikes become important forestimating the expected errors of the algorithm Happily these correctionshave a sign such that smaller data sets are more effective than one mighthave expected from the asymptotic calculation This can be veried fromthe expansion Ov cent Oe1 D [1 iexcl plusmnv2]iexcl1=2 frac14 1 iexcl 1

2 hplusmnv2i C 38 hplusmnv4i were only the rst

two terms were taken into account in equation 44

5 By denition plusmnv1 D plusmnv cent Oe1 D 0 and therefore hplusmnv21i Aiexcl1

11 is to be subtracted fromhplusmnv2i Tr[Aiexcl1] Because Oe1 is an eigenvector of A with zero eigenvalue Aiexcl1

11 is inniteTherefore the proper treatment is to take the trace in the subspace orthogonal to Oe1

238 T Sharpee N Rust and W Bialek

0 02 04 06 08x 10

shy 4

075

08

085

09

095

1

Nshy 1spike

e 1 times v m

ax

Figure 5 Projection of vector Ovmax that maximizes information on RF Oe1 is plot-ted as a function of the number of spikes The solid line is a quadratic t in1=Nspike and the dashed line is the leading linear term in 1=Nspike This set ofsimulations was carried out for a model visual neuron with one relevant di-mension from Figure 4a and the input-output function (see equation 41) withparameter values frac34 frac14 061frac34 s1 micro frac14 061frac34 s1 For this model neuron the linearapproximation for the expected error is applicable for Nspike raquogt 30 000

43 A Model Complex Cell A sequence of spikes from a model cell withtwo relevant dimensions was simulated by projecting each of the stimuli onvectors that differ by frac14=2 in their spatial phase taken to mimic properties ofcomplex cells as in Figure 6 A particular frame leads to a spike accordingto a logical OR that is if either js1j or js2j exceeds a threshold value micro in thepresence of noise where s1 D s cent Oe1 s2 D s cent Oe2 Similarly to equation 41

Pspikejs

PspikeD f s1 s2 D hHjs1j iexcl micro iexcl raquo1 _ Hjs2j iexcl micro iexcl raquo2i (45)

where raquo1 and raquo2 are independent gaussian variables The sampling of thisinput-output function by our particular set of natural stimuli is shown inFigure 6c As is well known reverse correlation fails in this case because thespikendashtriggered average stimulus is zero although with gaussian stimulithe spike-triggered covariance method would recover the relevant dimen-sions (Touryan et al 2002) Here we show that searching for maximallyinformative dimensions allows us to recover the relevant subspace evenunder more natural stimulus conditions

Analyzing Neural Responses to Natural Signals 239

Figure 6 Analysis of a model complex cell with relevant dimensions Oe1 and Oe2

shown in (a) and (b) respectively Spikes are generated according to an ldquoORrdquoinput-output function f s1 s2 with the threshold micro frac14 061frac34 s1 and noise stan-dard deviation frac34 D 031frac34 s1 (cd) Vectors v1 and v2 found by maximizinginformation Iv1 v2

We start by maximizing information with respect to one vector Contraryto the result in Figure 4e for a simple cell one optimal dimension recoversonly about 60 of the total information per spike (see equation 22) Perhapssurprisingly because of the strong correlations in natural scenes even a pro-jection onto a random vector in the D raquo 103-dimensional stimulus spacehas a high probability of explaining 60 of total information per spike ascan be seen in Figure 2 We therefore go on to maximize information withrespect to two vectors As a result of maximization we obtain two vectors v1and v2 shown in Figure 6 The information along them is Iv1 v2 frac14 090which is within the range of information values obtained along differentlinear combinations of the two model vectors IOe1 Oe2=Ispike D 089 sect 011Therefore the description of neuronrsquos ring in terms of vectors v1 and v2is complete up to the noise level and we do not have to look for extra rel-evant dimensions Practically the number of relevant dimensions can bedetermined by comparing Iv1 v2 to either Ispike or Iv1 v2 v3 the latterbeing the result of maximization with respect to three vectors simultane-ously As mentioned in section 1 information along set a of vectors does notincrease when extra dimensions are added to the relevant subspace There-fore if Iv1 v2 D Iv1 v2 v3 (again up to the noise level) this means that

240 T Sharpee N Rust and W Bialek

there are only two relevant dimensions Using Ispike for comparison withIv1 v2 has the advantage of not having to look for an extra dimensionwhich can be computationally intensive However Ispike might be subjectto larger systematic bias errors than Iv1 v2 v3

Vectors v1 and v2 obtained by maximizing Iv1 v2 are not exactly or-thogonal and are also rotated with respect to Oe1 and Oe2 However the qualityof reconstruction as well as the value of information Iv1 v2 is indepen-dent of a particular choice of basis with the RS The appropriate measure ofsimilarity between the two planes is the dot product of their normals In theexample of Figure 6 OnOe1Oe2 cent Onv1 v2 D 082sect007 where OnOe1Oe2 is a normal tothe plane passing through vectors Oe1 and Oe2 Maximizing information withrespect to two dimensions requires a signicantly slower cooling rate andconsequently longer computational times However the expected error inthe reconstruction 1 iexcl OnOe1 Oe2 cent Onv1 v2 scales as 1=Nspike behavior similarto equation 44 and is roughly twice that for a simple cell given the samenumber of spikes We make vectors v1 and v2 orthogonal to each othersupon completion of the algorithm

44 A Model Auditory Neuron with One Relevant Dimension Be-cause stimuli s are treated as vectors in an abstract space the method oflooking for the most informative dimensions can be applied equally wellto auditory as well as to visual neurons Here we illustrate the method byconsidering a model auditory neuron with one relevant dimension which isshown in Figure 7c and is taken to mimic the properties of cochlear neuronsThe model neuron is probed by two ensembles of naturalistic stimuli oneis a recording of a native Russian speaker reading a piece of Russian proseand the other is a recording of a piece of English prose read by a nativeEnglish speaker Both ensembles are nongaussian and exhibit amplitudedistributions with long nearly exponential tails (see Figure 7a) which arequalitatively similar to those of light intensities in natural scenes (Voss ampClarke 1975 Ruderman 1994) However the power spectrum is different inthe two cases as can be seen in Figure 7b The differences in the correlationstructure in particular lead to different STAs across the two ensembles (cfFigure 7d) Both of the STAs also deviate from the model lter shown inFigure 7c

Despite differences in the probability distributions Ps it is possibleto recover the relevant dimension of the model neuron by maximizing in-formation In Figure 7c we show the two most informative vectors foundby running the algorithm for the two ensembles and replot the model lterfrom Figure 7c to show that the three vectors overlap almost perfectly Thusdifferent nongaussian correlations can be successfully removed to obtain anestimate of the relevant dimension If the most informative vector changeswith the stimulus ensemble this can be interpreted as caused by adaptationto the probability distribution

Analyzing Neural Responses to Natural Signals 241

shy 10 0 1010

shy 6

10shy 4

10shy 2

100

5 10 15 20

0001

001

01

0 5 10

shy 01

0

01

0 5 10

shy 01

0

01

0 5 10

shy 01

0

01

shy 5 0 5 0

05

1

11

2

(a) (b)

(c) (d)

(e)

2

1

t(ms)

t(ms)

t(ms)

P(A

mpl

itude

)

Amplitude (units of RMS) frequency (kHz)

S(w)

2

(f)

P(s

pike

| s times

vm

ax )

s times vmax

Figure 7 A model auditory neuron is probed by two natural ensembles of stim-uli a piece of English prose (1) and a piece of of Russian prose (2) The sizeof the stimulus ensemble was the same in both cases and the sampling ratewas 441 kHz (a) The probability distribution of the sound pressure amplitudein units of standard deviation for both ensembles is strongly nongaussian (b)The power spectra for the two ensembles (c) The relevant vector of the modelneuron of dimensionality D D 500 (d) The STA is broadened in both cases butdiffers between the two cases due to differences in the power spectra of the twoensembles (e) Vectors that maximize information for either of the ensemblesoverlap almost perfectly with each other and with the model lter which is alsoreplotted here from c (f) The probability of a spike Pspikejs cent Ovmax (crosses) iscompared to Pspikejs1 used in generating spikes (solid line) The input-outputfunction had parameter values frac34 frac14 09frac34 s1 and micro frac14 18frac34 s1

5 Summary

Features of the stimulus that are most relevant for generating the responseof a neuron can be found by maximizing information between the sequenceof responses and the projection of stimuli on trial vectors within the stim-ulus space Calculated in this manner information becomes a function of

242 T Sharpee N Rust and W Bialek

direction in stimulus space Those vectors that maximize the informationand account for the total information per response of interest span the rel-evant subspace The method allows multiple dimensions to be found Thereconstruction of the relevant subspace is done without assuming a partic-ular form of the input-output function It can be strongly nonlinear withinthe relevant subspace and is estimated from experimental histograms foreach trial direction independently Most important this method can be usedwith any stimulus ensemble even those that are strongly nongaussian asin the case of natural signals We have illustrated the method on modelneurons responding to natural scenes and sounds We expect the currentimplementation of the method to be most useful when several most infor-mative vectors (middot 10 depending on their dimensionality) are to be analyzedfor neurons probed by natural scenes This technique could be particularlyuseful in describing sensory processing in poorly understood regions ofhigher-level sensory cortex (such as visual areas V2 V4 and IT and audi-tory cortex beyond A1) where white noise stimulation is known to be lesseffective than naturalistic stimuli

Appendix A: Limitations of the Reverse Correlation Method

Here we examine what sort of deviations one can expect when applying the reverse correlation method to natural stimuli, even in the model with just one relevant dimension. There are two factors that, when combined, invalidate the reverse correlation method: the nongaussian character of correlations and the nonlinearity of the input-output function (Ringach, Sapiro, & Shapley, 1997). In its original formulation (de Boer & Kuyper, 1968), the neuron is probed by white noise, and the relevant dimension ê_1 is given by the STA, ê_1 ∝ ⟨s r(s)⟩. If the signals are not white, that is, the covariance matrix C_ij = ⟨s_i s_j⟩ is not a unit matrix, then the STA is a broadened version of the original filter ê_1. This can be seen by noting that for any function F(s) of gaussian variables {s_i}, the following identity holds:

\[
\langle s_i F(\mathbf{s})\rangle = \langle s_i s_j\rangle\,\langle \partial_j F(\mathbf{s})\rangle, \qquad \partial_j \equiv \frac{\partial}{\partial s_j}. \tag{A.1}
\]

When property A.1 is applied to the vector components of the STA, ⟨s_i r(s)⟩ = C_ij⟨∂_j r(s)⟩. Since we work within the assumption that the firing rate is a (nonlinear) function of the projection onto one filter ê_1, r(s) = r(s_1), the latter average is proportional to the model filter itself, ⟨∂_j r⟩ = ê_1j⟨r′(s_1)⟩. Therefore we arrive at the prescription of the reverse correlation method:

\[
\hat e_{1i} \propto [C^{-1}]_{ij}\,\langle s_j r(\mathbf{s})\rangle. \tag{A.2}
\]
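
In matrix form, the prescription of equation A.2 is just a whitening of the spike-triggered average by the stimulus covariance. A minimal sketch follows (our own helper, not from the paper; it assumes NumPy, and a pseudoinverse stands in for C^{-1} because the covariance of natural stimuli is typically ill conditioned).

```python
import numpy as np

def decorrelated_sta(stimuli, spikes, rcond=1e-3):
    """Reverse-correlation estimate of the filter, as in equation A.2:
    v ~ C^{-1} <s r(s)>, computed with a regularized pseudoinverse."""
    s = stimuli - stimuli.mean(axis=0)                       # zero-mean stimuli
    sta = (spikes[:, None] * s).sum(axis=0) / spikes.sum()   # spike-triggered average
    c = np.cov(s, rowvar=False)                              # covariance matrix C_ij
    v = np.linalg.pinv(c, rcond=rcond) @ sta                 # decorrelated STA
    return v / np.linalg.norm(v)
```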

The gaussian property is necessary in order to represent the STA as a convolution of the covariance matrix C_ij of the stimulus ensemble and the model filter. To understand how the reconstructed vector obtained according to equation A.2 deviates from the relevant one, we consider weakly nongaussian stimuli, with the probability distribution

\[
P_{nG}(\mathbf{s}) = \frac{1}{Z}\,P_0(\mathbf{s})\,e^{\epsilon H_1(\mathbf{s})}, \tag{A.3}
\]

where P_0(s) is the gaussian probability distribution with covariance matrix C, and the normalization factor is Z = ⟨e^{εH_1(s)}⟩. The function H_1 describes deviations of the probability distribution from gaussian, and therefore we will set ⟨s_i H_1⟩ = 0 and ⟨s_i s_j H_1⟩ = 0, since these averages can be accounted for in the gaussian ensemble. In what follows, we will keep only the first-order terms in the perturbation parameter ε. Using property A.1, we find the STA to be given by

\[
\langle s_i r\rangle_{nG} = \langle s_i s_j\rangle\bigl[\langle \partial_j r\rangle + \epsilon\,\langle r\,\partial_j H_1\rangle\bigr], \tag{A.4}
\]

where averages are taken with respect to the gaussian distribution. Similarly, the covariance matrix C_ij evaluated with respect to the nongaussian ensemble is given by

\[
C_{ij} = \frac{1}{Z}\,\langle s_i s_j e^{\epsilon H_1}\rangle = \langle s_i s_j\rangle + \epsilon\,\langle s_i s_k\rangle\langle s_j \partial_k H_1\rangle, \tag{A.5}
\]

so that, to first order in ε, ⟨s_i s_j⟩ = C_ij − εC_ik⟨s_j ∂_k H_1⟩. Combining this with equation A.4, we get

\[
\langle s_i r\rangle_{nG} = \mathrm{const}\times C_{ij}\hat e_{1j} + \epsilon\,C_{ij}\bigl\langle\bigl(r - s_1\langle r'\rangle\bigr)\,\partial_j H_1\bigr\rangle. \tag{A.6}
\]

The second term in equation A.6 prevents the application of the reverse correlation method for nongaussian signals. Indeed, if we multiply the STA, equation A.6, by the inverse of the a priori covariance matrix C_ij according to the reverse correlation method, equation A.2, we no longer obtain the RF ê_1. The deviation of the obtained answer from the true RF increases with ε, which measures the deviation of the probability distribution from gaussian. Since natural stimuli are known to be strongly nongaussian, this makes the use of reverse correlation problematic when analyzing neural responses to natural stimuli.

The difference in applying reverse correlation to stimuli drawn from a correlated gaussian ensemble versus a nongaussian one is illustrated in Figures 8b and 8c. In the first case, shown in Figure 8b, stimuli are drawn from a correlated gaussian ensemble with a covariance matrix equal to that of natural images. In the second case, shown in Figure 8c, patches of photos are taken as stimuli. The STA is broadened in both cases. Although the two-point correlations are just as strong in the case of gaussian stimuli as they are in the natural stimuli ensemble, gaussian correlations can be successfully removed from the STA according to equation A.2 to obtain the model filter. On the contrary, an attempt to use reverse correlation with natural stimuli results in an altered version of the model filter. We reiterate that for this example the apparent noise in the decorrelated vector is not due to neural noise or finite data sets, since the "exact" STA has been used (see equation 4.2) in all calculations presented in Figures 8 and 9.

Figure 8: The nongaussian character of correlations present in natural scenes invalidates the reverse correlation method for neurons with a nonlinear input-output function. (a) A model visual neuron has one relevant dimension ê_1 and a nonlinear input-output function. The "exact" STA is used (see equation 4.2) to separate the effects of neural noise from alterations introduced by the method. The decorrelated "exact" STA is obtained by multiplying the "exact" STA by the inverse of the covariance matrix, according to equation A.2. (b) Stimuli are taken from a correlated gaussian noise ensemble. The effect of correlations on the STA can be removed according to equation A.2. (c) When patches of photos are taken as stimuli for the same model neuron as in (b), the decorrelation procedure gives an altered version of the model filter. The two stimulus ensembles have the same covariance matrix.
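
A toy numerical version of this comparison, in the spirit of Figure 8, is sketched below (our own construction, not the paper's simulation; it assumes NumPy and reuses the decorrelated_sta helper sketched after equation A.2). A thresholded model neuron is probed with a correlated gaussian ensemble and with a correlated, strongly nongaussian ensemble that has the same covariance matrix; the alignment of the decorrelated STA with the true filter is typically close to 1 for the former and noticeably lower for the latter, although the exact numbers depend on the arbitrary choices made here.

```python
import numpy as np

def figure8_style_demo(d=64, n=100_000, seed=0):
    rng = np.random.default_rng(seed)
    t = np.arange(d)
    # a smooth oscillating "receptive field"
    e1 = np.exp(-0.5 * ((t - d / 2) / 3.0) ** 2) * np.cos(2 * np.pi * t / 8)
    e1 /= np.linalg.norm(e1)
    # correlated gaussian source with a smooth covariance
    kernel = np.exp(-0.5 * (np.subtract.outer(t, t) / 2.0) ** 2)
    x = rng.standard_normal((n, d)) @ np.linalg.cholesky(kernel + 1e-6 * np.eye(d)).T
    s_ng = x ** 3                      # correlated and strongly nongaussian
    # gaussian ensemble with the *same* covariance as the nongaussian one
    c_ng = np.cov(s_ng, rowvar=False)
    s_g = rng.standard_normal((n, d)) @ np.linalg.cholesky(c_ng + 1e-6 * np.eye(d)).T

    def spikes_for(s):                 # threshold neuron on the projection s . e1
        p = s @ e1
        return (p > 2.0 * p.std()).astype(float)

    for name, s in [("gaussian", s_g), ("nongaussian", s_ng)]:
        v = decorrelated_sta(s, spikes_for(s))
        print(f"{name:12s} |v . e1| = {abs(v @ e1):.3f}")

figure8_style_demo()
```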

The reverse correlation method gives the correct answer for any distribution of signals if the probability of generating a spike is a linear function of s_i, since then the second term in equation A.6 is zero. In particular, a linear input-output relation could arise due to neural noise whose variance is much larger than the variance of the signal itself. This point is illustrated in Figures 9a, 9b, and 9c, where the reverse correlation method is applied to a threshold input-output function at low, moderate, and high signal-to-noise ratios. For small signal-to-noise ratios, where the noise standard deviation is similar to that of the projections s_1, the threshold nonlinearity in the input-output function is masked by noise and is effectively linear. In this limit, reverse correlation can be applied with the exact STA. However, for the experimentally calculated STA at low signal-to-noise ratios, the decorrelation procedure results in strong noise amplification. At higher signal-to-noise ratios, decorrelation fails due to the nonlinearity of the input-output function, in accordance with equation A.6.

Appendix B: Maxima of I(v): What Do They Mean?

The relevant subspace of dimensionality K can be found by maximizing information simultaneously with respect to K vectors. The result of maximization with respect to a number of vectors that is less than the true dimensionality of the relevant subspace may produce vectors that have components in the irrelevant subspace. This happens only in the presence of correlations in stimuli. As an illustration, we consider the situation where the dimensionality of the relevant subspace is K = 2, and vector ê_1 describes the most informative direction within the relevant subspace. We show here that although the gradient of information is perpendicular to both ê_1 and ê_2, it may have components outside the relevant subspace. Therefore, the vector v_max that corresponds to the maximum of I(v) will lie outside the relevant subspace. We recall from equation 3.1 that

\[
\nabla I(\hat e_1) = \int ds_1\, P(s_1)\,\frac{d}{ds_1}\!\left[\frac{P(s_1|\mathrm{spike})}{P(s_1)}\right]\bigl(\langle\mathbf{s}|s_1,\mathrm{spike}\rangle - \langle\mathbf{s}|s_1\rangle\bigr). \tag{B.1}
\]

We can rewrite the conditional averages ⟨s|s_1⟩ = ∫ ds_2 P(s_1, s_2)⟨s|s_1, s_2⟩/P(s_1) and ⟨s|s_1, spike⟩ = ∫ ds_2 f(s_1, s_2)P(s_1, s_2)⟨s|s_1, s_2⟩/P(s_1|spike), so that

\[
\nabla I(\hat e_1) = \int ds_1\,ds_2\, P(s_1,s_2)\,\langle\mathbf{s}|s_1,s_2\rangle\,\frac{P(\mathrm{spike}|s_1,s_2) - P(\mathrm{spike}|s_1)}{P(\mathrm{spike})}\,\frac{d}{ds_1}\ln\!\left[\frac{P(s_1|\mathrm{spike})}{P(s_1)}\right]. \tag{B.2}
\]

Because we assume that the vector ê_1 is the most informative within the relevant subspace, ê_1·∇I = ê_2·∇I = 0, so that the integral in equation B.2 is zero for those directions in which the component of the vector ⟨s|s_1, s_2⟩ changes linearly with s_1 and s_2. For uncorrelated stimuli, this is true for all directions, so that the most informative vector within the relevant subspace is also the most informative in the overall stimulus space. In the presence of correlations, the gradient may have nonzero components along some irrelevant directions if the projection of the vector ⟨s|s_1, s_2⟩ on those directions is not a linear function of s_1 and s_2. By looking for a maximum of information, we will therefore be driven outside the relevant subspace. The deviation of v_max from the relevant subspace is also proportional to the strength of the dependence on the second parameter s_2, because of the factor [P(s_1, s_2|spike)/P(s_1, s_2) − P(s_1|spike)/P(s_1)] in the integrand.

Figure 9: Application of the reverse correlation method to a model visual neuron with one relevant dimension ê_1 and a threshold input-output function, for decreasing values of the noise level, σ/σ(s_1) ≈ 6.1, 0.61, 0.06 in (a), (b), and (c), respectively. The model P(spike|s_1) becomes effectively linear when the signal-to-noise ratio is small. Reverse correlation can be used together with natural stimuli if the input-output function is linear. Otherwise, the deviations between the decorrelated STA and the model filter increase with the nonlinearity of P(spike|s_1).
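
This argument can be checked numerically. The sketch below (our own illustration, not from the paper; it assumes NumPy, and the binning and choice of test direction are arbitrary) bins the projections (s_1, s_2) onto ê_1 and ê_2, estimates the conditional mean of the projection onto a candidate irrelevant direction w in each bin, and measures how much of that conditional mean is not captured by a linear function of (s_1, s_2). For white stimuli the residual is close to zero, so the gradient has no component along w; for correlated nongaussian stimuli it need not be.

```python
import numpy as np

def nonlinear_residual(stimuli, e1, e2, w, n_bins=12):
    """Weighted RMS of the part of <s.w | s1, s2> that no linear function of
    (s1, s2) can capture (cf. the argument following equation B.2)."""
    s1, s2, sw = stimuli @ e1, stimuli @ e2, stimuli @ w

    def bin_index(x):
        edges = np.linspace(x.min(), x.max(), n_bins + 1)
        return np.clip(np.digitize(x, edges) - 1, 0, n_bins - 1)

    flat = bin_index(s1) * n_bins + bin_index(s2)
    counts = np.bincount(flat, minlength=n_bins * n_bins)
    binsum = lambda z: np.bincount(flat, weights=z, minlength=n_bins * n_bins)
    ok = counts > 0
    m = binsum(sw)[ok] / counts[ok]          # binned conditional mean of s . w
    x1 = binsum(s1)[ok] / counts[ok]
    x2 = binsum(s2)[ok] / counts[ok]
    design = np.column_stack([x1, x2, np.ones_like(x1)])
    coef, *_ = np.linalg.lstsq(design, m, rcond=None)   # best linear fit in (s1, s2)
    resid = m - design @ coef
    return np.sqrt(np.average(resid ** 2, weights=counts[ok]))
```

Comparing the returned residual for white gaussian stimuli with that for a correlated nongaussian ensemble (with the same ê_1, ê_2 and some w orthogonal to both) illustrates why the maximum of I(v) can be pulled out of the relevant subspace only in the latter case.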

Appendix C: The Gradient of Information

According to expression 2.5, the information I(v) depends on the vector v only through the probability distributions P_v(x) and P_v(x|spike). Therefore, we can express the gradient of information in terms of gradients of those probability distributions:

\[
\nabla_{\mathbf v} I = \frac{1}{\ln 2}\int dx\left[\ln\!\left(\frac{P_{\mathbf v}(x|\mathrm{spike})}{P_{\mathbf v}(x)}\right)\nabla_{\mathbf v} P_{\mathbf v}(x|\mathrm{spike}) - \frac{P_{\mathbf v}(x|\mathrm{spike})}{P_{\mathbf v}(x)}\,\nabla_{\mathbf v} P_{\mathbf v}(x)\right], \tag{C.1}
\]

where we took into account that ∫ dx P_v(x|spike) = 1 and does not change with v. To find the gradients of the probability distributions, we note that

\[
\nabla_{\mathbf v} P_{\mathbf v}(x) = \nabla_{\mathbf v}\left[\int d\mathbf{s}\,P(\mathbf{s})\,\delta(x - \mathbf{s}\cdot\mathbf{v})\right] = -\int d\mathbf{s}\,P(\mathbf{s})\,\mathbf{s}\,\delta'(x - \mathbf{s}\cdot\mathbf{v}) = -\frac{d}{dx}\bigl[P_{\mathbf v}(x)\,\langle\mathbf{s}|x\rangle\bigr], \tag{C.2}
\]

and analogously for P_v(x|spike),

\[
\nabla_{\mathbf v} P_{\mathbf v}(x|\mathrm{spike}) = -\frac{d}{dx}\bigl[P_{\mathbf v}(x|\mathrm{spike})\,\langle\mathbf{s}|x,\mathrm{spike}\rangle\bigr]. \tag{C.3}
\]

Substituting expressions C.2 and C.3 into equation C.1 and integrating once by parts, we obtain

\[
\nabla_{\mathbf v} I = \int dx\,P_{\mathbf v}(x)\,\bigl[\langle\mathbf{s}|x,\mathrm{spike}\rangle - \langle\mathbf{s}|x\rangle\bigr]\cdot\frac{d}{dx}\!\left[\frac{P_{\mathbf v}(x|\mathrm{spike})}{P_{\mathbf v}(x)}\right],
\]

which is expression 3.1 of the main text.
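
A histogram-based estimate of this gradient can be written directly from expression 3.1, as in the sketch below (our own code, not the authors'; it assumes NumPy, and the finite differences and empty-bin handling are crude).

```python
import numpy as np

def information_gradient(v, stimuli, spikes, n_bins=32):
    """Estimate the gradient of I(v) following expression 3.1: a sum over bins
    of x = s . v of P_v(x) [<s|x,spike> - <s|x>] d/dx [P_v(x|spike)/P_v(x)]."""
    v = v / np.linalg.norm(v)
    x = stimuli @ v
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    dx = edges[1] - edges[0]
    idx = np.clip(np.digitize(x, edges) - 1, 0, n_bins - 1)

    n_all = np.bincount(idx, minlength=n_bins).astype(float)
    n_spk = np.bincount(idx, weights=spikes, minlength=n_bins)
    d = stimuli.shape[1]
    sum_s = np.zeros((n_bins, d))
    sum_s_spk = np.zeros((n_bins, d))
    np.add.at(sum_s, idx, stimuli)                    # per-bin sums of s
    np.add.at(sum_s_spk, idx, spikes[:, None] * stimuli)

    p_x = n_all / n_all.sum()
    p_spk = n_spk / n_spk.sum()
    ratio = np.divide(p_spk, p_x, out=np.zeros(n_bins), where=p_x > 0)
    d_ratio = np.gradient(ratio, dx)                  # d/dx [P_v(x|spike)/P_v(x)]

    mean_s = sum_s / np.maximum(n_all, 1.0)[:, None]            # <s|x>
    mean_s_spk = sum_s_spk / np.maximum(n_spk, 1e-12)[:, None]  # <s|x,spike>

    grad = (p_x[:, None] * (mean_s_spk - mean_s) * d_ratio[:, None]).sum(axis=0)
    return grad - (grad @ v) * v   # only the component orthogonal to v matters
```

Such a gradient estimate can be combined with the annealing-type search sketched after the summary, or used for plain gradient ascent with a line search.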

Acknowledgments

We thank K. D. Miller for many helpful discussions. Work at UCSF was supported in part by the Sloan and Swartz foundations and by a training grant from the NIH. Our collaboration began at the Marine Biological Laboratory in a course supported by grants from NIMH and the Howard Hughes Medical Institute.


References

Aguera y Arcas, B., Fairhall, A. L., & Bialek, W. (2003). Computation in a single neuron: Hodgkin and Huxley revisited. Neural Comp., 15, 1715–1749.

Baddeley, R., Abbott, L. F., Booth, M. C. A., Sengpiel, F., Freeman, T., Wakeman, E. A., & Rolls, E. T. (1997). Responses of neurons in primary and inferior temporal visual cortices to natural scenes. Proc. R. Soc. Lond. B, 264, 1775–1783.

Barlow, H. (1961). Possible principles underlying the transformations of sensory images. In W. Rosenblith (Ed.), Sensory communication (pp. 217–234). Cambridge, MA: MIT Press.

Barlow, H. (2001). Redundancy reduction revisited. Network: Comput. Neural Syst., 12, 241–253.

Bialek, W. (2002). Thinking about the brain. In H. Flyvbjerg, F. Julicher, P. Ormos, & F. David (Eds.), Physics of biomolecules and cells (pp. 485–577). Berlin: Springer-Verlag. See also physics/0205030.

Bialek, W., & de Ruyter van Steveninck, R. R. (2003). Features and dimensions: Motion estimation in fly vision. Unpublished manuscript.

Brenner, N., Bialek, W., & de Ruyter van Steveninck, R. R. (2000). Adaptive rescaling maximizes information transmission. Neuron, 26, 695–702.

Brenner, N., Strong, S. P., Koberle, R., Bialek, W., & de Ruyter van Steveninck, R. R. (2000). Synergy in a neural code. Neural Computation, 12, 1531–1552. See also physics/9902067.

Chichilnisky, E. J. (2001). A simple white noise analysis of neuronal light responses. Network: Comput. Neural Syst., 12, 199–213.

Creutzfeldt, O. D., & Northdurft, H. C. (1978). Representation of complex visual stimuli in the brain. Naturwissenschaften, 65, 307–318.

Cover, T. M., & Thomas, J. A. (1991). Information theory. New York: Wiley.

de Boer, E., & Kuyper, P. (1968). Triggered correlation. IEEE Trans. Biomed. Eng., 15, 169–179.

de Ruyter van Steveninck, R. R., & Bialek, W. (1988). Real-time performance of a movement-sensitive neuron in the blowfly visual system: Coding and information transfer in short spike sequences. Proc. R. Soc. Lond. B, 265, 259–265.

de Ruyter van Steveninck, R. R., Borst, A., & Bialek, W. (2000). Real time encoding of motion: Answerable questions and questionable answers from the fly's visual system. In J. M. Zanker & J. Zeil (Eds.), Motion vision: Computational, neural and ecological constraints (pp. 279–306). New York: Springer-Verlag. See also physics/0004060.

de Ruyter van Steveninck, R. R., Lewen, G. D., Strong, S. P., Koberle, R., & Bialek, W. (1997). Reproducibility and variability in neural spike trains. Science, 275, 1805–1808. See also cond-mat/9603127.

6. Where available, we give references to the physics e-print archive, which may be found on-line at http://arxiv.org/abs/; thus, Bialek (2002) is available on-line at http://arxiv.org/abs/physics/0205030. Published papers may differ from the versions posted to the archive.


Dimitrov, A. G., & Miller, J. P. (2001). Neural coding and decoding: Communication channels and quantization. Network: Comput. Neural Syst., 12, 441–472.

Dong, D. W., & Atick, J. J. (1995). Statistics of natural time-varying images. Network: Comput. Neural Syst., 6, 345–358.

Fairhall, A. L., Lewen, G. D., Bialek, W., & de Ruyter van Steveninck, R. R. (2001). Efficiency and ambiguity in an adaptive neural code. Nature, 412, 787–792.

Kara, P., Reinagel, P., & Reid, R. C. (2000). Low response variability in simultaneously recorded retinal, thalamic, and cortical neurons. Neuron, 27, 635–646.

Lewen, G. D., Bialek, W., & de Ruyter van Steveninck, R. R. (2001). Neural coding of naturalistic motion stimuli. Network: Comput. Neural Syst., 12, 317–329. See also physics/0103088.

Mainen, Z. F., & Sejnowski, T. J. (1995). Reliability of spike timing in neocortical neurons. Science, 268, 1503–1506.

Paninski, L. (2003a). Convergence properties of three spike-triggered analysis techniques. Network: Comput. Neural Syst., 14, 437–464.

Paninski, L. (2003b). Estimation of entropy and mutual information. Neural Computation, 15, 1191–1253.

Panzeri, S., & Treves, A. (1996). Analytical estimates of limited sampling biases in different information measures. Network: Comput. Neural Syst., 7, 87–107.

Pola, G., Schultz, S. R., Petersen, R., & Panzeri, S. (2002). A practical guide to information analysis of spike trains. In R. Kotter (Ed.), Neuroscience databases: A practical guide (pp. 137–152). Norwell, MA: Kluwer.

Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P. (1992). Numerical recipes in C: The art of scientific computing. Cambridge: Cambridge University Press.

Reinagel, P., & Reid, R. C. (2000). Temporal coding of visual information in the thalamus. J. Neurosci., 20, 5392–5400.

Rieke, F., Bodnar, D. A., & Bialek, W. (1995). Naturalistic stimuli increase the rate and efficiency of information transmission by primary auditory afferents. Proc. R. Soc. Lond. B, 262, 259–265.

Rieke, F., Warland, D., de Ruyter van Steveninck, R. R., & Bialek, W. (1997). Spikes: Exploring the neural code. Cambridge, MA: MIT Press.

Ringach, D. L., Hawken, M. J., & Shapley, R. (2002). Receptive field structure of neurons in monkey visual cortex revealed by stimulation with natural image sequences. Journal of Vision, 2, 12–24.

Ringach, D. L., Sapiro, G., & Shapley, R. (1997). A subspace reverse-correlation technique for the study of visual neurons. Vision Res., 37, 2455–2464.

Rolls, E. T., Aggelopoulos, N. C., & Zheng, F. (2003). The receptive fields of inferior temporal cortex neurons in natural scenes. J. Neurosci., 23, 339–348.

Ruderman, D. L. (1994). The statistics of natural images. Network: Comput. Neural Syst., 5, 517–548.

Ruderman, D. L., & Bialek, W. (1994). Statistics of natural images: Scaling in the woods. Phys. Rev. Lett., 73, 814–817.

Schwartz, O., Chichilnisky, E. J., & Simoncelli, E. (2002). Characterizing neural gain control using spike-triggered covariance. In T. G. Dietterich, S. Becker, & Z. Ghahramani (Eds.), Advances in neural information processing (pp. 269–276). Cambridge, MA: MIT Press.


Sen, K., Theunissen, F. E., & Doupe, A. J. (2001). Feature analysis of natural sounds in the songbird auditory forebrain. J. Neurophysiol., 86, 1445–1458.

Simoncelli, E., & Olshausen, B. A. (2001). Natural image statistics and neural representation. Annu. Rev. Neurosci., 24, 1193–1216.

Smirnakis, S. M., Berry, M. J., Warland, D. K., Bialek, W., & Meister, M. (1996). Adaptation of retinal processing to image contrast and spatial scale. Nature, 386, 69–73.

Smyth, D., Willmore, B., Baker, G. E., Thompson, I. D., & Tolhurst, D. J. (2003). The receptive-field organization of simple cells in primary visual cortex of ferrets under natural scene stimulation. J. Neurosci., 23, 4746–4759.

Stanley, G. B., Li, F. F., & Dan, Y. (1999). Reconstruction of natural scenes from ensemble responses in the lateral geniculate nucleus. J. Neurosci., 19, 8036–8042.

Strong, S. P., Koberle, R., de Ruyter van Steveninck, R. R., & Bialek, W. (1998). Entropy and information in neural spike trains. Phys. Rev. Lett., 80, 197–200.

Theunissen, F. E., Sen, K., & Doupe, A. J. (2000). Spectral-temporal receptive fields of nonlinear auditory neurons obtained using natural sounds. J. Neurosci., 20, 2315–2331.

Tishby, N., Pereira, F. C., & Bialek, W. (1999). The information bottleneck method. In B. Hajek & R. S. Sreenivas (Eds.), Proceedings of the 37th Allerton Conference on Communication, Control and Computing (pp. 368–377). Urbana: University of Illinois. See also physics/0004057.

Touryan, J., Lau, B., & Dan, Y. (2002). Isolation of relevant visual features from random stimuli for cortical complex cells. J. Neurosci., 22, 10811–10818.

Treves, A., & Panzeri, S. (1995). The upward bias in measures of information derived from limited data samples. Neural Comp., 7, 399–407.

von der Twer, T., & Macleod, D. I. A. (2001). Optimal nonlinear codes for the perception of natural colours. Network: Comput. Neural Syst., 12, 395–407.

Vickers, N. J., Christensen, T. A., Baker, T., & Hildebrand, J. G. (2001). Odour-plume dynamics influence the brain's olfactory code. Nature, 410, 466–470.

Vinje, W. E., & Gallant, J. L. (2000). Sparse coding and decorrelation in primary visual cortex during natural vision. Science, 287, 1273–1276.

Vinje, W. E., & Gallant, J. L. (2002). Natural stimulation of the nonclassical receptive field increases information transmission efficiency in V1. J. Neurosci., 22, 2904–2915.

Voss, R. F., & Clarke, J. (1975). "1/f noise" in music and speech. Nature, 317–318.

Weliky, M., Fiser, J., Hunt, R., & Wagner, D. N. (2003). Coding of natural scenes in primary visual cortex. Neuron, 37, 703–718.

Received January 21, 2003; accepted August 6, 2003.



Page 15: Analyzing Neural Responses to Natural Signals: Maximally ...wbialek/rome/refs/sharpee+al_04.pdfAnalyzing Neural Responses to Natural Signals 225 Figure 1: Schematic illustration of

Analyzing Neural Responses to Natural Signals 237

the Hessian of information Its structure is similar to that of a covariancematrix

Aij D1

ln 2

ZdxPxjspike

sup3ddx

lnPxjspike

Px

acute2

pound hsisjjxi iexcl hsijxihsjjxi (43)

When averaged over possible outcomes of N trials the gradient of infor-mation is zero for the optimal direction Here in order to evaluate hplusmnv2i DTr[Aiexcl1hrIrITiAiexcl1] we need to know the variance of the gradient of I As-suming that the probability of generating a spike is independent for differentbins we can estimate hrIirIji raquo Aij=Nspike ln 2 Therefore an expected er-ror in the reconstruction of the optimal lter is inversely proportional tothe number of spikes The corresponding expected value of the projectionbetween the reconstructed vector and the relevant direction Oe1 is given by

Ov cent Oe1 frac14 1 iexcl 12

hplusmnv2i D 1 iexcl Tr0[Aiexcl1]2Nspike ln 2

(44)

where Tr0 means that the trace is taken in the subspace orthogonal to themodel lter5 The estimate equation 44 can be calculated without knowl-edge of the underlying model it is raquo D=2Nspike This behavior should alsohold in cases where the stimulus dimensions are expanded to include timeThe errors are expected to increase in proportion to the increased dimen-sionality In the case of a complex cell with two relevant dimensions theerror is expected to be twice that for a cell with single relevant dimensionalso discussed in section 43

We emphasize that the error estimate according to equation 44 is of thesame order as errors of the reverse correlation method when it is appliedfor gaussian ensembles The latter are given by Tr[Ciexcl1] iexcl Ciexcl1

11 =[2Nspike

h f02s1i] Of course if the reverse correlation method were to be applied

to the nongaussian ensemble the errors would be larger In Figure 5 weshow the result of simulations for various numbers of trials and thereforeNspike The average projection of the normalized reconstructed vector Ov onthe RF Oe1 behaves initially as 1=Nspike (dashed line) For smaller data setsin this case Nspikes raquolt 30 000 corrections raquo Niexcl2

spikes become important forestimating the expected errors of the algorithm Happily these correctionshave a sign such that smaller data sets are more effective than one mighthave expected from the asymptotic calculation This can be veried fromthe expansion Ov cent Oe1 D [1 iexcl plusmnv2]iexcl1=2 frac14 1 iexcl 1

2 hplusmnv2i C 38 hplusmnv4i were only the rst

two terms were taken into account in equation 44

5 By denition plusmnv1 D plusmnv cent Oe1 D 0 and therefore hplusmnv21i Aiexcl1

11 is to be subtracted fromhplusmnv2i Tr[Aiexcl1] Because Oe1 is an eigenvector of A with zero eigenvalue Aiexcl1

11 is inniteTherefore the proper treatment is to take the trace in the subspace orthogonal to Oe1

238 T Sharpee N Rust and W Bialek

0 02 04 06 08x 10

shy 4

075

08

085

09

095

1

Nshy 1spike

e 1 times v m

ax

Figure 5 Projection of vector Ovmax that maximizes information on RF Oe1 is plot-ted as a function of the number of spikes The solid line is a quadratic t in1=Nspike and the dashed line is the leading linear term in 1=Nspike This set ofsimulations was carried out for a model visual neuron with one relevant di-mension from Figure 4a and the input-output function (see equation 41) withparameter values frac34 frac14 061frac34 s1 micro frac14 061frac34 s1 For this model neuron the linearapproximation for the expected error is applicable for Nspike raquogt 30 000

43 A Model Complex Cell A sequence of spikes from a model cell withtwo relevant dimensions was simulated by projecting each of the stimuli onvectors that differ by frac14=2 in their spatial phase taken to mimic properties ofcomplex cells as in Figure 6 A particular frame leads to a spike accordingto a logical OR that is if either js1j or js2j exceeds a threshold value micro in thepresence of noise where s1 D s cent Oe1 s2 D s cent Oe2 Similarly to equation 41

Pspikejs

PspikeD f s1 s2 D hHjs1j iexcl micro iexcl raquo1 _ Hjs2j iexcl micro iexcl raquo2i (45)

where raquo1 and raquo2 are independent gaussian variables The sampling of thisinput-output function by our particular set of natural stimuli is shown inFigure 6c As is well known reverse correlation fails in this case because thespikendashtriggered average stimulus is zero although with gaussian stimulithe spike-triggered covariance method would recover the relevant dimen-sions (Touryan et al 2002) Here we show that searching for maximallyinformative dimensions allows us to recover the relevant subspace evenunder more natural stimulus conditions


Figure 6: Analysis of a model complex cell with relevant dimensions ê₁ and ê₂, shown in (a) and (b), respectively. Spikes are generated according to an "OR" input-output function f(s₁, s₂) with the threshold θ ≈ 0.61σ_{s₁} and noise standard deviation σ = 0.31σ_{s₁}. (c, d) Vectors v₁ and v₂ found by maximizing information I(v₁, v₂).

We start by maximizing information with respect to one vector. Contrary to the result in Figure 4e for a simple cell, one optimal dimension recovers only about 60% of the total information per spike (see equation 2.2). Perhaps surprisingly, because of the strong correlations in natural scenes, even a projection onto a random vector in the D ∼ 10³-dimensional stimulus space has a high probability of explaining 60% of the total information per spike, as can be seen in Figure 2. We therefore go on to maximize information with respect to two vectors. As a result of maximization, we obtain two vectors, v₁ and v₂, shown in Figure 6. The information along them is I(v₁, v₂) ≈ 0.90, which is within the range of information values obtained along different linear combinations of the two model vectors, I(ê₁, ê₂)/I_spike = 0.89 ± 0.11. Therefore, the description of the neuron's firing in terms of vectors v₁ and v₂ is complete up to the noise level, and we do not have to look for extra relevant dimensions. Practically, the number of relevant dimensions can be determined by comparing I(v₁, v₂) to either I_spike or I(v₁, v₂, v₃), the latter being the result of maximization with respect to three vectors simultaneously. As mentioned in section 1, information along a set of vectors does not increase when extra dimensions are added to the relevant subspace. Therefore, if I(v₁, v₂) = I(v₁, v₂, v₃) (again, up to the noise level), this means that


there are only two relevant dimensions. Using I_spike for comparison with I(v₁, v₂) has the advantage of not having to look for an extra dimension, which can be computationally intensive. However, I_spike might be subject to larger systematic bias errors than I(v₁, v₂, v₃).
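The comparisons above rely on estimating the information carried by candidate dimensions directly from histograms of the projections. The sketch below is our own minimal illustration of such an estimator (the bin count and binning choices are arbitrary, and no bias correction is applied); it can be evaluated for one, two, or three trial vectors and compared as described.

import numpy as np

def information_along(vectors, stimuli, spikes, bins=15):
    """Plug-in estimate, in bits, of the information carried by the projections x = s . v.

    vectors: (K, D) candidate dimensions; stimuli: (N, D); spikes: boolean array (N,).
    Compares the spike-conditional histogram P(x|spike) with the prior histogram P(x).
    """
    x = stimuli @ np.asarray(vectors).T                       # (N, K) projections
    edges = [np.histogram_bin_edges(x[:, k], bins) for k in range(x.shape[1])]
    p_all, _ = np.histogramdd(x, bins=edges)
    p_spk, _ = np.histogramdd(x[spikes], bins=edges)
    p_all /= p_all.sum()
    p_spk /= p_spk.sum()
    ok = (p_spk > 0) & (p_all > 0)
    return float(np.sum(p_spk[ok] * np.log2(p_spk[ok] / p_all[ok])))

Comparing, for example, information_along([v1, v2], ...) with an estimate of I_spike, or with information_along([v1, v2, v3], ...), then indicates whether a third dimension is warranted.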

Vectors v₁ and v₂ obtained by maximizing I(v₁, v₂) are not exactly orthogonal and are also rotated with respect to ê₁ and ê₂. However, the quality of reconstruction, as well as the value of information I(v₁, v₂), is independent of a particular choice of basis within the RS. The appropriate measure of similarity between the two planes is the dot product of their normals. In the example of Figure 6, n̂(ê₁, ê₂)·n̂(v₁, v₂) = 0.82 ± 0.07, where n̂(ê₁, ê₂) is a normal to the plane passing through vectors ê₁ and ê₂. Maximizing information with respect to two dimensions requires a significantly slower cooling rate and consequently longer computational times. However, the expected error in the reconstruction, 1 − n̂(ê₁, ê₂)·n̂(v₁, v₂), scales as 1/N_spike, a behavior similar to equation 4.4, and is roughly twice that for a simple cell given the same number of spikes. We make vectors v₁ and v₂ orthogonal to each other upon completion of the algorithm.
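The final orthogonalization step is not spelled out in the text; one simple possibility, shown here purely as a sketch, is a Gram-Schmidt step that keeps v₁ and removes its component from v₂.

import numpy as np

def orthogonalize_pair(v1, v2):
    # Gram-Schmidt sketch: keep v1, strip its component from v2, and normalize both.
    v1 = v1 / np.linalg.norm(v1)
    v2 = v2 - (v2 @ v1) * v1
    return v1, v2 / np.linalg.norm(v2)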

4.4 A Model Auditory Neuron with One Relevant Dimension. Because stimuli s are treated as vectors in an abstract space, the method of looking for the most informative dimensions can be applied equally well to auditory as well as to visual neurons. Here we illustrate the method by considering a model auditory neuron with one relevant dimension, which is shown in Figure 7c and is taken to mimic the properties of cochlear neurons. The model neuron is probed by two ensembles of naturalistic stimuli: one is a recording of a native Russian speaker reading a piece of Russian prose, and the other is a recording of a piece of English prose read by a native English speaker. Both ensembles are nongaussian and exhibit amplitude distributions with long, nearly exponential tails (see Figure 7a), which are qualitatively similar to those of light intensities in natural scenes (Voss & Clarke, 1975; Ruderman, 1994). However, the power spectrum is different in the two cases, as can be seen in Figure 7b. The differences in the correlation structure, in particular, lead to different STAs across the two ensembles (cf. Figure 7d). Both of the STAs also deviate from the model filter shown in Figure 7c.

Despite differences in the probability distributions P(s), it is possible to recover the relevant dimension of the model neuron by maximizing information. In Figure 7e, we show the two most informative vectors found by running the algorithm for the two ensembles and replot the model filter from Figure 7c to show that the three vectors overlap almost perfectly. Thus, different nongaussian correlations can be successfully removed to obtain an estimate of the relevant dimension. If the most informative vector changes with the stimulus ensemble, this can be interpreted as caused by adaptation to the probability distribution.
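In the auditory setting, the stimulus vectors are just short windows of the sound waveform. The sketch below is our own illustration of how such windows might be formed and the STA computed for each ensemble; the variable names for the two recordings are hypothetical, and D = 500 samples at 44.1 kHz follows the figure caption.

import numpy as np

def sta_from_waveform(waveform, spikes, D=500):
    """Sketch: slice the waveform (sampled at 44.1 kHz) into D-sample windows and
    average the windows on which a spike occurred (the spike-triggered average)."""
    waveform = (waveform - waveform.mean()) / waveform.std()          # amplitude in units of RMS
    windows = np.lib.stride_tricks.sliding_window_view(waveform, D)   # shape (T - D + 1, D)
    spikes = np.asarray(spikes[: len(windows)], dtype=bool)
    return windows[spikes].mean(axis=0)

# Hypothetical usage with the two ensembles described in the text:
# sta_russian = sta_from_waveform(russian_prose, spikes_russian)
# sta_english = sta_from_waveform(english_prose, spikes_english)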


[Figure 7 appears here; panels (a)–(f) show P(Amplitude) versus amplitude (units of RMS), the power spectra |S(ω)|² versus frequency (kHz), filters and STAs as functions of t (ms), and P(spike | s·v̂_max) versus s·v̂_max, with curves labeled 1 and 2 for the two ensembles.]

Figure 7: A model auditory neuron is probed by two natural ensembles of stimuli: a piece of English prose (1) and a piece of Russian prose (2). The size of the stimulus ensemble was the same in both cases, and the sampling rate was 44.1 kHz. (a) The probability distribution of the sound pressure amplitude in units of standard deviation for both ensembles is strongly nongaussian. (b) The power spectra for the two ensembles. (c) The relevant vector of the model neuron, of dimensionality D = 500. (d) The STA is broadened in both cases but differs between the two cases due to differences in the power spectra of the two ensembles. (e) Vectors that maximize information for either of the ensembles overlap almost perfectly with each other and with the model filter, which is also replotted here from c. (f) The probability of a spike, P(spike | s·v̂_max) (crosses), is compared to P(spike | s₁) used in generating spikes (solid line). The input-output function had parameter values σ ≈ 0.9σ_{s₁} and θ ≈ 1.8σ_{s₁}.

5 Summary

Features of the stimulus that are most relevant for generating the response of a neuron can be found by maximizing information between the sequence of responses and the projection of stimuli on trial vectors within the stimulus space. Calculated in this manner, information becomes a function of


direction in stimulus space. Those vectors that maximize the information and account for the total information per response of interest span the relevant subspace. The method allows multiple dimensions to be found. The reconstruction of the relevant subspace is done without assuming a particular form of the input-output function. It can be strongly nonlinear within the relevant subspace and is estimated from experimental histograms for each trial direction independently. Most important, this method can be used with any stimulus ensemble, even those that are strongly nongaussian, as in the case of natural signals. We have illustrated the method on model neurons responding to natural scenes and sounds. We expect the current implementation of the method to be most useful when several most informative vectors (≤ 10, depending on their dimensionality) are to be analyzed for neurons probed by natural scenes. This technique could be particularly useful in describing sensory processing in poorly understood regions of higher-level sensory cortex (such as visual areas V2, V4, and IT and auditory cortex beyond A1), where white noise stimulation is known to be less effective than naturalistic stimuli.

Appendix A: Limitations of the Reverse Correlation Method

Here we examine what sort of deviations one can expect when applying the reverse correlation method to natural stimuli, even in the model with just one relevant dimension. There are two factors that, when combined, invalidate the reverse correlation method: the nongaussian character of correlations and the nonlinearity of the input-output function (Ringach, Sapiro, & Shapley, 1997). In its original formulation (de Boer & Kuyper, 1968), the neuron is probed by white noise, and the relevant dimension ê₁ is given by the STA, ê₁ ∝ ⟨s r(s)⟩. If the signals are not white, that is, the covariance matrix C_ij = ⟨s_i s_j⟩ is not a unit matrix, then the STA is a broadened version of the original filter ê₁. This can be seen by noting that for any function F(s) of gaussian variables {s_i}, the following identity holds:

\[ \langle s_i F(\mathbf{s}) \rangle = \langle s_i s_j \rangle \langle \partial_j F(\mathbf{s}) \rangle, \qquad \partial_j \equiv \partial/\partial s_j. \quad (A.1) \]

When property A.1 is applied to the vector components of the STA, ⟨s_i r(s)⟩ = C_ij⟨∂_j r(s)⟩. Since we work within the assumption that the firing rate is a (nonlinear) function of the projection onto one filter ê₁, r(s) = r(s₁), the latter average is proportional to the model filter itself, ⟨∂_j r⟩ = ê₁ⱼ⟨r′(s₁)⟩. Therefore, we arrive at the prescription of the reverse correlation method:

\[ \hat{e}_{1i} \propto [C^{-1}]_{ij} \langle s_j r(\mathbf{s}) \rangle. \quad (A.2) \]
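As a concrete illustration of this prescription, the sketch below computes the STA and multiplies it by the inverse stimulus covariance. It is our own code, not the authors'; in particular, the ridge term is an added regularization for numerical stability, since C is typically ill conditioned for natural stimuli.

import numpy as np

def decorrelated_sta(stimuli, spikes, ridge=1e-3):
    """Reverse-correlation estimate in the spirit of equation A.2: C^{-1} <s r(s)>.

    stimuli: (N, D) array; spikes: (N,) spike counts or booleans.
    The ridge regularization is our addition, not part of the prescription itself."""
    stimuli = stimuli - stimuli.mean(axis=0)
    sta = stimuli.T @ spikes / np.sum(spikes)          # spike-triggered average <s r(s)>
    C = np.cov(stimuli, rowvar=False)                  # stimulus covariance C_ij
    C_reg = C + ridge * (np.trace(C) / C.shape[0]) * np.eye(C.shape[0])
    return np.linalg.solve(C_reg, sta)                 # recovers e1 (up to scale) for gaussian stimuli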

The gaussian property is necessary in order to represent the STA as a convolution of the covariance matrix C_ij of the stimulus ensemble and the model filter. To understand how the reconstructed vector obtained according to equation A.2 deviates from the relevant one, we consider weakly nongaussian stimuli with the probability distribution

\[ P_{nG}(\mathbf{s}) = \frac{1}{Z}\, P_0(\mathbf{s})\, e^{\epsilon H_1(\mathbf{s})}, \quad (A.3) \]

where P₀(s) is the gaussian probability distribution with covariance matrix C, and the normalization factor is Z = ⟨e^{εH₁(s)}⟩. The function H₁ describes deviations of the probability distribution from gaussian, and therefore we will set ⟨s_i H₁⟩ = 0 and ⟨s_i s_j H₁⟩ = 0, since these averages can be accounted for in the gaussian ensemble. In what follows, we will keep only the first-order terms in the perturbation parameter ε. Using property A.1, we find the STA to be given by

\[ \langle s_i r \rangle_{nG} = \langle s_i s_j \rangle \bigl[ \langle \partial_j r \rangle + \epsilon \langle r\, \partial_j H_1 \rangle \bigr], \quad (A.4) \]

where averages are taken with respect to the gaussian distribution. Similarly, the covariance matrix C_ij evaluated with respect to the nongaussian ensemble is given by

\[ C_{ij} = \frac{1}{Z} \langle s_i s_j e^{\epsilon H_1} \rangle = \langle s_i s_j \rangle + \epsilon \langle s_i s_k \rangle \langle s_j \partial_k H_1 \rangle, \quad (A.5) \]

so that, to first order in ε, ⟨s_i s_j⟩ = C_ij − εC_ik⟨s_j ∂_k H₁⟩. Combining this with equation A.4, we get

\[ \langle s_i r \rangle_{nG} = \mathrm{const} \times C_{ij} \hat{e}_{1j} + \epsilon\, C_{ij} \bigl\langle \bigl( r - s_1 \langle r' \rangle \bigr) \partial_j H_1 \bigr\rangle. \quad (A.6) \]

The second term in equation A.6 prevents the application of the reverse correlation method for nongaussian signals. Indeed, if we multiply the STA, equation A.6, with the inverse of the a priori covariance matrix C_ij, according to the reverse correlation method, equation A.2, we no longer obtain the RF ê₁. The deviation of the obtained answer from the true RF increases with ε, which measures the deviation of the probability distribution from gaussian. Since natural stimuli are known to be strongly nongaussian, this makes the use of reverse correlation problematic when analyzing neural responses to natural stimuli.

The difference in applying reverse correlation to stimuli drawn from a correlated gaussian ensemble versus a nongaussian one is illustrated in Figures 8b and 8c. In the first case, shown in Figure 8b, stimuli are drawn from a correlated gaussian ensemble with the covariance matrix equal to that of natural images. In the second case, shown in Figure 8c, patches of photos are taken as stimuli. The STA is broadened in both cases.


Figure 8: The nongaussian character of correlations present in natural scenes invalidates the reverse correlation method for neurons with a nonlinear input-output function. (a) A model visual neuron has one relevant dimension ê₁ and the nonlinear input-output function. The "exact" STA is used (see equation 4.2) to separate effects of neural noise from alterations introduced by the method. The decorrelated "exact" STA is obtained by multiplying the "exact" STA by the inverse of the covariance matrix, according to equation A.2. (b) Stimuli are taken from a correlated gaussian noise ensemble. The effect of correlations in the STA can be removed according to equation A.2. (c) When patches of photos are taken as stimuli for the same model neuron as in b, the decorrelation procedure gives an altered version of the model filter. The two stimulus ensembles have the same covariance matrix.

Although the two-point correlations are just as strong in the case of gaussian stimuli as they are in the natural stimulus ensemble, gaussian correlations can be successfully removed from the STA according to equation A.2 to obtain the model filter. On the contrary, an attempt to use reverse correlation with natural stimuli results in an altered version of the model filter. We reiterate that for this example, the apparent noise in the decorrelated vector is not due to neural noise or finite data sets, since the "exact" STA has been used (see equation 4.2) in all calculations presented in Figures 8 and 9.

The reverse correlation method gives the correct answer for any distribution of signals if the probability of generating a spike is a linear function of s_i, since then the second term in equation A.6 is zero. In particular, a linear input-output relation could arise due to a neural noise whose variance is much larger than the variance of the signal itself. This point is illustrated in Figures 9a, 9b, and 9c, where the reverse correlation method is applied to a threshold input-output function at low, moderate, and high signal-to-noise ratios. For small signal-to-noise ratios, where the noise standard deviation is similar to that of the projections s₁, the threshold nonlinearity in the input-output function is masked by noise and is effectively linear. In this limit, reverse correlation can be applied with the exact STA. However, for an experimentally calculated STA at low signal-to-noise ratios, the decorrelation procedure results in strong noise amplification. At higher signal-to-noise ratios, decorrelation fails due to the nonlinearity of the input-output function, in accordance with equation A.6.

Appendix B: Maxima of I(v): What Do They Mean?

The relevant subspace of dimensionality K can be found by maximizing information simultaneously with respect to K vectors. The result of maximization with respect to a number of vectors that is less than the true dimensionality of the relevant subspace may produce vectors that have components in the irrelevant subspace. This happens only in the presence of correlations in stimuli. As an illustration, we consider the situation where the dimensionality of the relevant subspace is K = 2, and vector ê₁ describes the most informative direction within the relevant subspace. We show here that although the gradient of information is perpendicular to both ê₁ and ê₂, it may have components outside the relevant subspace. Therefore, the vector v_max that corresponds to the maximum of I(v) will lie outside the relevant subspace. We recall from equation 3.1 that

\[ \nabla I(\hat{e}_1) = \int ds_1\, P(s_1)\, \frac{d}{ds_1}\!\left[ \frac{P(s_1 \mid \mathrm{spike})}{P(s_1)} \right] \bigl( \langle \mathbf{s} \mid s_1, \mathrm{spike} \rangle - \langle \mathbf{s} \mid s_1 \rangle \bigr). \quad (B.1) \]

We can rewrite the conditional averages ⟨s|s₁⟩ = ∫ds₂ P(s₁, s₂)⟨s|s₁, s₂⟩/P(s₁) and ⟨s|s₁, spike⟩ = ∫ds₂ f(s₁, s₂)P(s₁, s₂)⟨s|s₁, s₂⟩/P(s₁|spike), so that

\[ \nabla I(\hat{e}_1) = \int ds_1\, ds_2\, P(s_1, s_2)\, \langle \mathbf{s} \mid s_1, s_2 \rangle\, \frac{P(\mathrm{spike} \mid s_1, s_2) - P(\mathrm{spike} \mid s_1)}{P(\mathrm{spike})} \times \frac{d}{ds_1} \ln\!\left[ \frac{P(s_1 \mid \mathrm{spike})}{P(s_1)} \right]. \quad (B.2) \]

Because we assume that the vector ê₁ is the most informative within the relevant subspace, ê₁·∇I = ê₂·∇I = 0, so that the integral in equation B.2


Figure 9: Application of the reverse correlation method to a model visual neuron with one relevant dimension ê₁ and a threshold input-output function, for decreasing values of the noise variance: σ/σ_{s₁} ≈ 6.1, 0.61, and 0.06 in a, b, and c, respectively. The model P(spike|s₁) becomes effectively linear when the signal-to-noise ratio is small. The reverse correlation can be used together with natural stimuli if the input-output function is linear. Otherwise, the deviations between the decorrelated STA and the model filter increase with the nonlinearity of P(spike|s₁).

is zero for those directions in which the component of the vector ⟨s|s₁, s₂⟩ changes linearly with s₁ and s₂. For uncorrelated stimuli, this is true for all directions, so that the most informative vector within the relevant subspace is also the most informative in the overall stimulus space. In the presence of correlations, the gradient may have nonzero components along some irrelevant directions if the projection of the vector ⟨s|s₁, s₂⟩ on those directions is not a linear function of s₁ and s₂. By looking for a maximum of information, we will therefore be driven outside the relevant subspace. The deviation of v_max from the relevant subspace is also proportional to the strength of the dependence on the second parameter s₂, because of the factor [P(s₁, s₂|spike)/P(s₁, s₂) − P(s₁|spike)/P(s₁)] in the integrand.

Appendix C: The Gradient of Information

According to expression 2.5, the information I(v) depends on the vector v only through the probability distributions P_v(x) and P_v(x|spike). Therefore, we can express the gradient of information in terms of gradients of those probability distributions:

\[ \nabla_{\mathbf{v}} I = \frac{1}{\ln 2} \int dx \left[ \ln\!\left( \frac{P_{\mathbf{v}}(x \mid \mathrm{spike})}{P_{\mathbf{v}}(x)} \right) \nabla_{\mathbf{v}} P_{\mathbf{v}}(x \mid \mathrm{spike}) - \frac{P_{\mathbf{v}}(x \mid \mathrm{spike})}{P_{\mathbf{v}}(x)}\, \nabla_{\mathbf{v}} P_{\mathbf{v}}(x) \right], \quad (C.1) \]

where we took into account that ∫dx P_v(x|spike) = 1 and does not change with v. To find the gradients of the probability distributions, we note that

\[ \nabla_{\mathbf{v}} P_{\mathbf{v}}(x) = \nabla_{\mathbf{v}} \left[ \int d\mathbf{s}\, P(\mathbf{s})\, \delta(x - \mathbf{s} \cdot \mathbf{v}) \right] = - \int d\mathbf{s}\, P(\mathbf{s})\, \mathbf{s}\, \delta'(x - \mathbf{s} \cdot \mathbf{v}) = - \frac{d}{dx} \bigl[ P_{\mathbf{v}}(x) \langle \mathbf{s} \mid x \rangle \bigr], \quad (C.2) \]

and analogously for P_v(x|spike):

\[ \nabla_{\mathbf{v}} P_{\mathbf{v}}(x \mid \mathrm{spike}) = - \frac{d}{dx} \bigl[ P_{\mathbf{v}}(x \mid \mathrm{spike}) \langle \mathbf{s} \mid x, \mathrm{spike} \rangle \bigr]. \quad (C.3) \]

Substituting expressions C.2 and C.3 into equation C.1 and integrating once by parts, we obtain

\[ \nabla_{\mathbf{v}} I = \int dx\, P_{\mathbf{v}}(x) \bigl[ \langle \mathbf{s} \mid x, \mathrm{spike} \rangle - \langle \mathbf{s} \mid x \rangle \bigr] \cdot \frac{d}{dx} \left[ \frac{P_{\mathbf{v}}(x \mid \mathrm{spike})}{P_{\mathbf{v}}(x)} \right], \]

which is expression 3.1 of the main text.
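For completeness, expression 3.1 can be turned into a plug-in estimator that works directly from data: bin the projections x = s·v, estimate P_v(x) and P_v(x|spike), the bin-wise conditional averages ⟨s|x⟩ and ⟨s|x, spike⟩, and combine them as above. The sketch below is our own discretization; the bin count and the finite-difference derivative are arbitrary choices, and spikes is assumed to be a boolean array.

import numpy as np

def information_gradient(v, stimuli, spikes, bins=20):
    """Plug-in estimate of grad_v I from expression 3.1.

    Bins x = s . v, forms P_v(x) and P_v(x|spike), the conditional averages <s|x> and
    <s|x, spike>, and sums P_v(x) [<s|x,spike> - <s|x>] d/dx [P_v(x|spike) / P_v(x)] over bins.
    """
    x = stimuli @ v
    edges = np.histogram_bin_edges(x, bins)
    idx = np.clip(np.digitize(x, edges) - 1, 0, bins - 1)
    dx = edges[1] - edges[0]

    p_x = np.bincount(idx, minlength=bins) / len(x)                   # P_v(x) * dx
    p_x_spike = np.bincount(idx[spikes], minlength=bins) / max(spikes.sum(), 1)

    D = stimuli.shape[1]
    s_given_x = np.zeros((bins, D))                                   # <s|x> per bin
    s_given_x_spike = np.zeros((bins, D))                             # <s|x, spike> per bin
    for b in range(bins):
        in_bin = idx == b
        if in_bin.any():
            s_given_x[b] = stimuli[in_bin].mean(axis=0)
        in_bin_spike = in_bin & spikes
        if in_bin_spike.any():
            s_given_x_spike[b] = stimuli[in_bin_spike].mean(axis=0)

    ratio = np.divide(p_x_spike, p_x, out=np.zeros(bins), where=p_x > 0)
    d_ratio = np.gradient(ratio, dx)                                  # d/dx of the ratio
    return (p_x * d_ratio) @ (s_given_x_spike - s_given_x)            # integral over x as a bin sum

A gradient estimated in this way can then be used to drive the maximization of I(v) over trial directions.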

Acknowledgments

We thank K. D. Miller for many helpful discussions. Work at UCSF was supported in part by the Sloan and Swartz foundations and by a training grant from the NIH. Our collaboration began at the Marine Biological Laboratory in a course supported by grants from NIMH and the Howard Hughes Medical Institute.


References

Aguera y Arcas, B., Fairhall, A. L., & Bialek, W. (2003). Computation in a single neuron: Hodgkin and Huxley revisited. Neural Comp., 15, 1715–1749.

Baddeley, R., Abbott, L. F., Booth, M. C. A., Sengpiel, F., Freeman, T., Wakeman, E. A., & Rolls, E. T. (1997). Responses of neurons in primary and inferior temporal visual cortices to natural scenes. Proc. R. Soc. Lond. B, 264, 1775–1783.

Barlow, H. (1961). Possible principles underlying the transformations of sensory images. In W. Rosenblith (Ed.), Sensory communication (pp. 217–234). Cambridge, MA: MIT Press.

Barlow, H. (2001). Redundancy reduction revisited. Network: Comput. Neural Syst., 12, 241–253.

Bialek, W. (2002). Thinking about the brain. In H. Flyvbjerg, F. Julicher, P. Ormos, & F. David (Eds.), Physics of biomolecules and cells (pp. 485–577). Berlin: Springer-Verlag. See also physics/0205030.⁶

Bialek, W., & de Ruyter van Steveninck, R. R. (2003). Features and dimensions: Motion estimation in fly vision. Unpublished manuscript.

Brenner, N., Bialek, W., & de Ruyter van Steveninck, R. R. (2000). Adaptive rescaling maximizes information transmission. Neuron, 26, 695–702.

Brenner, N., Strong, S. P., Koberle, R., Bialek, W., & de Ruyter van Steveninck, R. R. (2000). Synergy in a neural code. Neural Computation, 12, 1531–1552. See also physics/9902067.

Chichilnisky, E. J. (2001). A simple white noise analysis of neuronal light responses. Network: Comput. Neural Syst., 12, 199–213.

Creutzfeldt, O. D., & Northdurft, H. C. (1978). Representation of complex visual stimuli in the brain. Naturwissenschaften, 65, 307–318.

Cover, T. M., & Thomas, J. A. (1991). Information theory. New York: Wiley.

de Boer, E., & Kuyper, P. (1968). Triggered correlation. IEEE Trans. Biomed. Eng., 15, 169–179.

de Ruyter van Steveninck, R. R., & Bialek, W. (1988). Real-time performance of a movement-sensitive neuron in the blowfly visual system: Coding and information transfer in short spike sequences. Proc. R. Soc. Lond. B, 265, 259–265.

de Ruyter van Steveninck, R. R., Borst, A., & Bialek, W. (2000). Real time encoding of motion: Answerable questions and questionable answers from the fly's visual system. In J. M. Zanker & J. Zeil (Eds.), Motion vision: Computational, neural and ecological constraints (pp. 279–306). New York: Springer-Verlag. See also physics/0004060.

de Ruyter van Steveninck, R. R., Lewen, G. D., Strong, S. P., Koberle, R., & Bialek, W. (1997). Reproducibility and variability in neural spike trains. Science, 275, 1805–1808. See also cond-mat/9603127.

⁶ Where available, we give references to the physics e-print archive, which may be found online at http://arxiv.org/abs/*; thus, Bialek (2002) is available online at http://arxiv.org/abs/physics/0205030. Published papers may differ from the versions posted to the archive.


Dimitrov, A. G., & Miller, J. P. (2001). Neural coding and decoding: Communication channels and quantization. Network: Comput. Neural Syst., 12, 441–472.

Dong, D. W., & Atick, J. J. (1995). Statistics of natural time-varying images. Network: Comput. Neural Syst., 6, 345–358.

Fairhall, A. L., Lewen, G. D., Bialek, W., & de Ruyter van Steveninck, R. R. (2001). Efficiency and ambiguity in an adaptive neural code. Nature, 412, 787–792.

Kara, P., Reinagel, P., & Reid, R. C. (2000). Low response variability in simultaneously recorded retinal, thalamic, and cortical neurons. Neuron, 27, 635–646.

Lewen, G. D., Bialek, W., & de Ruyter van Steveninck, R. R. (2001). Neural coding of naturalistic motion stimuli. Network: Comput. Neural Syst., 12, 317–329. See also physics/0103088.

Mainen, Z. F., & Sejnowski, T. J. (1995). Reliability of spike timing in neocortical neurons. Science, 268, 1503–1506.

Paninski, L. (2003a). Convergence properties of three spike-triggered analysis techniques. Network: Compt. in Neural Systems, 14, 437–464.

Paninski, L. (2003b). Estimation of entropy and mutual information. Neural Computation, 15, 1191–1253.

Panzeri, S., & Treves, A. (1996). Analytical estimates of limited sampling biases in different information measures. Network: Comput. Neural Syst., 7, 87–107.

Pola, G., Schultz, S. R., Petersen, R., & Panzeri, S. (2002). A practical guide to information analysis of spike trains. In R. Kotter (Ed.), Neuroscience databases: A practical guide (pp. 137–152). Norwell, MA: Kluwer.

Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P. (1992). Numerical recipes in C: The art of scientific computing. Cambridge: Cambridge University Press.

Reinagel, P., & Reid, R. C. (2000). Temporal coding of visual information in the thalamus. J. Neurosci., 20, 5392–5400.

Rieke, F., Bodnar, D. A., & Bialek, W. (1995). Naturalistic stimuli increase the rate and efficiency of information transmission by primary auditory afferents. Proc. R. Soc. Lond. B, 262, 259–265.

Rieke, F., Warland, D., de Ruyter van Steveninck, R. R., & Bialek, W. (1997). Spikes: Exploring the neural code. Cambridge, MA: MIT Press.

Ringach, D. L., Hawken, M. J., & Shapley, R. (2002). Receptive field structure of neurons in monkey visual cortex revealed by stimulation with natural image sequences. Journal of Vision, 2, 12–24.

Ringach, D. L., Sapiro, G., & Shapley, R. (1997). A subspace reverse-correlation technique for the study of visual neurons. Vision Res., 37, 2455–2464.

Rolls, E. T., Aggelopoulos, N. C., & Zheng, F. (2003). The receptive fields of inferior temporal cortex neurons in natural scenes. J. Neurosci., 23, 339–348.

Ruderman, D. L. (1994). The statistics of natural images. Network: Compt. Neural Syst., 5, 517–548.

Ruderman, D. L., & Bialek, W. (1994). Statistics of natural images: Scaling in the woods. Phys. Rev. Lett., 73, 814–817.

Schwartz, O., Chichilnisky, E. J., & Simoncelli, E. (2002). Characterizing neural gain control using spike-triggered covariance. In T. G. Dietterich, S. Becker, & Z. Ghahramani (Eds.), Advances in Neural Information Processing (pp. 269–276). Cambridge, MA: MIT Press.


Sen, K., Theunissen, F. E., & Doupe, A. J. (2001). Feature analysis of natural sounds in the songbird auditory forebrain. J. Neurophysiol., 86, 1445–1458.

Simoncelli, E., & Olshausen, B. A. (2001). Natural image statistics and neural representation. Annu. Rev. Neurosci., 24, 1193–1216.

Smirnakis, S. M., Berry, M. J., Warland, D. K., Bialek, W., & Meister, M. (1996). Adaptation of retinal processing to image contrast and spatial scale. Nature, 386, 69–73.

Smyth, D., Willmore, B., Baker, G. E., Thompson, I. D., & Tolhurst, D. J. (2003). The receptive-field organization of simple cells in primary visual cortex of ferrets under natural scene stimulation. J. Neurosci., 23, 4746–4759.

Stanley, G. B., Li, F. F., & Dan, Y. (1999). Reconstruction of natural scenes from ensemble responses in the lateral geniculate nucleus. J. Neurosci., 19, 8036–8042.

Strong, S. P., Koberle, R., de Ruyter van Steveninck, R. R., & Bialek, W. (1998). Entropy and information in neural spike trains. Phys. Rev. Lett., 80, 197–200.

Theunissen, F. E., Sen, K., & Doupe, A. J. (2000). Spectral-temporal receptive fields of nonlinear auditory neurons obtained using natural sounds. J. Neurosci., 20, 2315–2331.

Tishby, N., Pereira, F. C., & Bialek, W. (1999). The information bottleneck method. In B. Hajek & R. S. Sreenivas (Eds.), Proceedings of the 37th Allerton Conference on Communication, Control and Computing (pp. 368–377). Urbana: University of Illinois. See also physics/0004057.

Touryan, J., Lau, B., & Dan, Y. (2002). Isolation of relevant visual features from random stimuli for cortical complex cells. J. Neurosci., 22, 10811–10818.

Treves, A., & Panzeri, S. (1995). The upward bias in measures of information derived from limited data samples. Neural Comp., 7, 399–407.

von der Twer, T., & Macleod, D. I. A. (2001). Optimal nonlinear codes for the perception of natural colours. Network: Comput. Neural Syst., 12, 395–407.

Vickers, N. J., Christensen, T. A., Baker, T., & Hildebrand, J. G. (2001). Odour-plume dynamics influence the brain's olfactory code. Nature, 410, 466–470.

Vinje, W. E., & Gallant, J. L. (2000). Sparse coding and decorrelation in primary visual cortex during natural vision. Science, 287, 1273–1276.

Vinje, W. E., & Gallant, J. L. (2002). Natural stimulation of the nonclassical receptive field increases information transmission efficiency in V1. J. Neurosci., 22, 2904–2915.

Voss, R. F., & Clarke, J. (1975). "1/f noise" in music and speech. Nature, 317–318.

Weliky, M., Fiser, J., Hunt, R., & Wagner, D. N. (2003). Coding of natural scenes in primary visual cortex. Neuron, 37, 703–718.

Received January 21, 2003; accepted August 6, 2003.

Page 16: Analyzing Neural Responses to Natural Signals: Maximally ...wbialek/rome/refs/sharpee+al_04.pdfAnalyzing Neural Responses to Natural Signals 225 Figure 1: Schematic illustration of

238 T Sharpee N Rust and W Bialek

0 02 04 06 08x 10

shy 4

075

08

085

09

095

1

Nshy 1spike

e 1 times v m

ax

Figure 5 Projection of vector Ovmax that maximizes information on RF Oe1 is plot-ted as a function of the number of spikes The solid line is a quadratic t in1=Nspike and the dashed line is the leading linear term in 1=Nspike This set ofsimulations was carried out for a model visual neuron with one relevant di-mension from Figure 4a and the input-output function (see equation 41) withparameter values frac34 frac14 061frac34 s1 micro frac14 061frac34 s1 For this model neuron the linearapproximation for the expected error is applicable for Nspike raquogt 30 000

43 A Model Complex Cell A sequence of spikes from a model cell withtwo relevant dimensions was simulated by projecting each of the stimuli onvectors that differ by frac14=2 in their spatial phase taken to mimic properties ofcomplex cells as in Figure 6 A particular frame leads to a spike accordingto a logical OR that is if either js1j or js2j exceeds a threshold value micro in thepresence of noise where s1 D s cent Oe1 s2 D s cent Oe2 Similarly to equation 41

Pspikejs

PspikeD f s1 s2 D hHjs1j iexcl micro iexcl raquo1 _ Hjs2j iexcl micro iexcl raquo2i (45)

where raquo1 and raquo2 are independent gaussian variables The sampling of thisinput-output function by our particular set of natural stimuli is shown inFigure 6c As is well known reverse correlation fails in this case because thespikendashtriggered average stimulus is zero although with gaussian stimulithe spike-triggered covariance method would recover the relevant dimen-sions (Touryan et al 2002) Here we show that searching for maximallyinformative dimensions allows us to recover the relevant subspace evenunder more natural stimulus conditions

Analyzing Neural Responses to Natural Signals 239

Figure 6 Analysis of a model complex cell with relevant dimensions Oe1 and Oe2

shown in (a) and (b) respectively Spikes are generated according to an ldquoORrdquoinput-output function f s1 s2 with the threshold micro frac14 061frac34 s1 and noise stan-dard deviation frac34 D 031frac34 s1 (cd) Vectors v1 and v2 found by maximizinginformation Iv1 v2

We start by maximizing information with respect to one vector Contraryto the result in Figure 4e for a simple cell one optimal dimension recoversonly about 60 of the total information per spike (see equation 22) Perhapssurprisingly because of the strong correlations in natural scenes even a pro-jection onto a random vector in the D raquo 103-dimensional stimulus spacehas a high probability of explaining 60 of total information per spike ascan be seen in Figure 2 We therefore go on to maximize information withrespect to two vectors As a result of maximization we obtain two vectors v1and v2 shown in Figure 6 The information along them is Iv1 v2 frac14 090which is within the range of information values obtained along differentlinear combinations of the two model vectors IOe1 Oe2=Ispike D 089 sect 011Therefore the description of neuronrsquos ring in terms of vectors v1 and v2is complete up to the noise level and we do not have to look for extra rel-evant dimensions Practically the number of relevant dimensions can bedetermined by comparing Iv1 v2 to either Ispike or Iv1 v2 v3 the latterbeing the result of maximization with respect to three vectors simultane-ously As mentioned in section 1 information along set a of vectors does notincrease when extra dimensions are added to the relevant subspace There-fore if Iv1 v2 D Iv1 v2 v3 (again up to the noise level) this means that

240 T Sharpee N Rust and W Bialek

there are only two relevant dimensions Using Ispike for comparison withIv1 v2 has the advantage of not having to look for an extra dimensionwhich can be computationally intensive However Ispike might be subjectto larger systematic bias errors than Iv1 v2 v3

Vectors v1 and v2 obtained by maximizing Iv1 v2 are not exactly or-thogonal and are also rotated with respect to Oe1 and Oe2 However the qualityof reconstruction as well as the value of information Iv1 v2 is indepen-dent of a particular choice of basis with the RS The appropriate measure ofsimilarity between the two planes is the dot product of their normals In theexample of Figure 6 OnOe1Oe2 cent Onv1 v2 D 082sect007 where OnOe1Oe2 is a normal tothe plane passing through vectors Oe1 and Oe2 Maximizing information withrespect to two dimensions requires a signicantly slower cooling rate andconsequently longer computational times However the expected error inthe reconstruction 1 iexcl OnOe1 Oe2 cent Onv1 v2 scales as 1=Nspike behavior similarto equation 44 and is roughly twice that for a simple cell given the samenumber of spikes We make vectors v1 and v2 orthogonal to each othersupon completion of the algorithm

44 A Model Auditory Neuron with One Relevant Dimension Be-cause stimuli s are treated as vectors in an abstract space the method oflooking for the most informative dimensions can be applied equally wellto auditory as well as to visual neurons Here we illustrate the method byconsidering a model auditory neuron with one relevant dimension which isshown in Figure 7c and is taken to mimic the properties of cochlear neuronsThe model neuron is probed by two ensembles of naturalistic stimuli oneis a recording of a native Russian speaker reading a piece of Russian proseand the other is a recording of a piece of English prose read by a nativeEnglish speaker Both ensembles are nongaussian and exhibit amplitudedistributions with long nearly exponential tails (see Figure 7a) which arequalitatively similar to those of light intensities in natural scenes (Voss ampClarke 1975 Ruderman 1994) However the power spectrum is different inthe two cases as can be seen in Figure 7b The differences in the correlationstructure in particular lead to different STAs across the two ensembles (cfFigure 7d) Both of the STAs also deviate from the model lter shown inFigure 7c

Despite differences in the probability distributions Ps it is possibleto recover the relevant dimension of the model neuron by maximizing in-formation In Figure 7c we show the two most informative vectors foundby running the algorithm for the two ensembles and replot the model lterfrom Figure 7c to show that the three vectors overlap almost perfectly Thusdifferent nongaussian correlations can be successfully removed to obtain anestimate of the relevant dimension If the most informative vector changeswith the stimulus ensemble this can be interpreted as caused by adaptationto the probability distribution

Analyzing Neural Responses to Natural Signals 241

shy 10 0 1010

shy 6

10shy 4

10shy 2

100

5 10 15 20

0001

001

01

0 5 10

shy 01

0

01

0 5 10

shy 01

0

01

0 5 10

shy 01

0

01

shy 5 0 5 0

05

1

11

2

(a) (b)

(c) (d)

(e)

2

1

t(ms)

t(ms)

t(ms)

P(A

mpl

itude

)

Amplitude (units of RMS) frequency (kHz)

S(w)

2

(f)

P(s

pike

| s times

vm

ax )

s times vmax

Figure 7 A model auditory neuron is probed by two natural ensembles of stim-uli a piece of English prose (1) and a piece of of Russian prose (2) The sizeof the stimulus ensemble was the same in both cases and the sampling ratewas 441 kHz (a) The probability distribution of the sound pressure amplitudein units of standard deviation for both ensembles is strongly nongaussian (b)The power spectra for the two ensembles (c) The relevant vector of the modelneuron of dimensionality D D 500 (d) The STA is broadened in both cases butdiffers between the two cases due to differences in the power spectra of the twoensembles (e) Vectors that maximize information for either of the ensemblesoverlap almost perfectly with each other and with the model lter which is alsoreplotted here from c (f) The probability of a spike Pspikejs cent Ovmax (crosses) iscompared to Pspikejs1 used in generating spikes (solid line) The input-outputfunction had parameter values frac34 frac14 09frac34 s1 and micro frac14 18frac34 s1

5 Summary

Features of the stimulus that are most relevant for generating the responseof a neuron can be found by maximizing information between the sequenceof responses and the projection of stimuli on trial vectors within the stim-ulus space Calculated in this manner information becomes a function of

242 T Sharpee N Rust and W Bialek

direction in stimulus space Those vectors that maximize the informationand account for the total information per response of interest span the rel-evant subspace The method allows multiple dimensions to be found Thereconstruction of the relevant subspace is done without assuming a partic-ular form of the input-output function It can be strongly nonlinear withinthe relevant subspace and is estimated from experimental histograms foreach trial direction independently Most important this method can be usedwith any stimulus ensemble even those that are strongly nongaussian asin the case of natural signals We have illustrated the method on modelneurons responding to natural scenes and sounds We expect the currentimplementation of the method to be most useful when several most infor-mative vectors (middot 10 depending on their dimensionality) are to be analyzedfor neurons probed by natural scenes This technique could be particularlyuseful in describing sensory processing in poorly understood regions ofhigher-level sensory cortex (such as visual areas V2 V4 and IT and audi-tory cortex beyond A1) where white noise stimulation is known to be lesseffective than naturalistic stimuli

Appendix A Limitations of the Reverse Correlation Method

Here we examine what sort of deviations one can expect when applying thereverse correlation method to natural stimuli even in the model with just onerelevant dimension There are two factors that when combined invalidatethe reverse correlation method the nongaussian character of correlationsand the nonlinearity of the input-output function (Ringach Sapiro amp Shap-ley 1997) In its original formulation (de Boer amp Kuyper 1968) the neuronis probed by white noise and the relevant dimension Oe1 is given by theSTA Oe1 hsrsi If the signals are not white that is the covariance matrixCij D hsisji is not a unit matrix then the STA is a broadened version of theoriginal lter Oe1 This can be seen by noting that for any function Fs ofgaussian variables fsig the identity holds

hsiFsi D hsisjihsj Fsi j acute sj (A1)

When property A1 isapplied to the vector components of the STA hsirsi DCijhjrsi Since we work within the assumption that the ring rate is a(nonlinear) function of projection onto one lter Oe1 rs D rs1 the latteraverage is proportional to the model lter itself hjri D Oe1jhr0s1i Thereforewe arrive at the prescription of the reverse correlation method

Oe1i [Ciexcl1]ijhsjrsi (A2)

The gaussian property is necessary in order to represent the STA as a convo-lution of the covariance matrix Cij of the stimulus ensemble and the model

Analyzing Neural Responses to Natural Signals 243

lter To understand how the reconstructed vector obtained according toequation A2 deviates from the relevant one we consider weakly nongaus-sian stimuli with the probability distribution

PnGs D 1Z

P0sesup2H1s (A3)

where P0s is the gaussian probability distribution with covariance matrixC and the normalization factor Z D hesup2H1si The function H1 describesdeviations of the probability distribution from gaussian and therefore wewill set hsiH1i D 0 and hsisjH1i D 0 since these averages can be accountedfor in the gaussian ensemble In what follows we will keep only the rst-order terms in perturbation parameter sup2 Using property A1 we nd theSTA to be given by

hsirinG D hsisjipoundhjri C sup2hrjH1i

curren (A4)

where averages are taken with respect to the gaussian distribution Simi-larly the covariance matrix Cij evaluated with respect to the nongaussianensemble is given by

Cij D1Z

hsisjesup2H1i D hsisji C sup2hsiskihsjkH1i (A5)

so that to the rst order in sup2 hsisji D Cij iexcl sup2CikhsjkH1i Combining thiswith equation A4 we get

hsirinG D const pound Cij Oe1 j C sup2Cijhiexclr iexcl s1hr0i

centjH1i (A6)

The second term in equation A6 prevents the application of the reversecorrelation method for nongaussian signals Indeed if we multiply the STAequation A6 with the inverse of the a priori covariance matrix Cij accordingto the reverse correlation method equation A2 we no longer obtain the RFOe1 The deviation of the obtained answer from the true RF increases with sup2which measures the deviation of the probability distribution from gaussianSince natural stimuli are known to be strongly nongaussian this makes theuse of the reverse correlation problematic when analyzing neural responsesto natural stimuli

The difference in applying the reverse correlation to stimuli drawn froma correlated gaussian ensemble versus a nongaussian one is illustrated inFigures 8b and 8c In the rst case shown in Figure 8b stimuli are drawnfrom a correlated gaussian ensemble with the covariance matrix equal tothat of natural images In the second case shown in Figure 8c the patches ofphotos are taken as stimuli The STA is broadened in both cases Althoughthe two-point correlations are just as strong in the case of gaussian stimuli

244 T Sharpee N Rust and W Bialek

Figure 8 The nongaussian character of correlations present in natural scenesinvalidates the reverse correlation method for neurons with a nonlinear input-output function (a) A model visual neuron has one relevant dimension Oe1 andthe nonlinear input-output function The ldquoexactrdquo STA is used (see equation 42)to separate effects of neural noise from alterations introduced by the methodThe decorrelated ldquoexactrdquo STA is obtained by multiplying the ldquoexactrdquo STA bythe inverse of the covariance matrix according to equation A2 (b) Stimuli aretaken from a correlated gaussian noise ensemble The effect of correlations inSTA can be removed according to equation A2 (c) When patches of photos aretaken as stimuli for the same model neuron as in b the decorrelation proceduregives an altered version of the model lter The two stimulus ensembles havethe same covariance matrix

as they are in the natural stimuli ensemble gaussian correlations can besuccessfully removed from the STA according to equation A2 to obtain themodel lter On the contrary an attempt to use reverse correlation withnatural stimuli results in an altered version of the model lter We reiteratethat for this example the apparent noise in the decorrelated vector is not

Analyzing Neural Responses to Natural Signals 245

due to neural noise or nite data sets since the ldquoexactrdquo STA has been used(see equation 42) in all calculations presented in Figures 8 and 9

The reverse correlation method gives the correct answer for any distribu-tion of signals if the probability of generating a spike is a linear function ofsi since then the second term in equation A6 is zero In particular a linearinput-output relation could arise due to a neural noise whose variance ismuch larger than the variance of the signal itself This point is illustrated inFigures 9a 9b and 9c where the reverse correlation method is applied to athreshold input-output function at low moderate and high signal-to-noiseratios For small signal-to-noise ratios where the noise standard deviationis similar to that of projections s1 the threshold nonlinearity in the input-output function is masked by noise and is effectively linear In this limitthe reverse correlation can be applied with the exact STA However for ex-perimentally calculated STA at low signal-to-noise ratios the decorrelationprocedure results in strong noise amplication At higher signal-to-noise ra-tios decorrelation fails due to the nonlinearity of the input-output functionin accordance with equation A6

Appendix B Maxima of Iv What Do They Mean

The relevant subspace of dimensionality K can be found by maximizinginformation simultaneously with respect to K vectors The result of max-imization with respect to a number of vectors that is less than the truedimensionality of the relevant subspace may produce vectors that havecomponents in the irrelevant subspace This happens only in the presenceof correlations in stimuli As an illustration we consider the situation wherethe dimensionality of the relevant subspace K D 2 and vector Oe1 describesthe most informative direction within the relative subspace We show herethat although the gradient of information is perpendicular to both Oe1 andOe2 it may have components outside the relevant subspace Therefore thevector vmax that corresponds to the maximum of Iv will lie outside therelevant subspace We recall from equation 31 that

rIOe1 DZ

ds1Ps1d

ds1

Ps1jspike

Ps1hsjs1 spikei iexcl hsjs1i (B1)

We can rewrite the conditional averages hsjs1i DR

ds2Ps1 s2hsjs1 s2i=Ps1

and hsjs1 spikei DR

ds2 f s1 s2Ps1 s2hsjs1 s2i=Ps1jspike so that

rIOe1 DZ

ds1ds2Ps1 s2hsjs1 s2iPspikejs1 s2 iexcl Pspikejs1

Pspike

pound dds1

lnPs1jspike

Ps1 (B2)

Because we assume that the vector Oe1 is the most informative within therelevant subspace Oe1rI D Oe2rI D 0 so that the integral in equation B2

246 T Sharpee N Rust and W Bialek

Figure 9 Application of the reverse correlation method to a model visual neuronwith one relevant dimension Oe1 and a threshold input-output functionof decreas-ing values of noise variance frac34=frac34s1s frac14 61 061 006 in a b and c respectivelyThe model Pspikejs1 becomes effectively linear when the signal-to-noise ratiois small The reverse correlation can be used together with natural stimuli if theinput-output function is linear Otherwise the deviations between the decorre-lated STA and the model lter increase with the nonlinearity of Pspikejs1

is zero for those directions in which the component of the vector hsjs1 s2ichanges linearly with s1 and s2 For uncorrelated stimuli this is true forall directions so that the most informative vector within the relevant sub-space is also the most informative in the overall stimulus space In thepresence of correlations the gradient may have nonzero components alongsome irrelevant directions if projection of the vector hsjs1 s2i on those di-rections is not a linear function of s1 and s2 By looking for a maximumof information we will therefore be driven outside the relevant subspaceThe deviation of vmax from the relevant subspace is also proportional to the

Analyzing Neural Responses to Natural Signals 247

strength of the dependence on the second parameter s2 because of the factor[Ps1 s2jspike=Ps1 s2 iexcl Ps1jspike=Ps1] in the integrand

Appendix C The Gradient of Information

According to expression 25 the information Iv depends on the vector vonly through the probability distributions Pvx and Pvxjspike Thereforewe can express the gradient of information in terms of gradients of thoseprobability distributions

rvI D1

ln 2

Zdx

microln

Pvxjspike

PvxrvPvxjspike

iexcl Pvxjspike

PvxrvPvx

para (C1)

where we took into account thatR

dxPvxjspike D 1 and does not changewith v To nd gradients of the probability distributions we note that

rvPvx D rv

microZdsPsplusmnx iexcl s cent v

paraD iexcl

ZdsPssplusmn0x iexcl s cent v

D iexcl ddx

poundpxhsjxi

curren (C2)

and analogously for Pvxjspike

rvPvxjspike D iexcl ddx

poundpxjspikehsjx spikei

curren (C3)

Substituting expressions C2 and C3 into equation C1 and integrating onceby parts we obtain

rvI DZ

dxPvxpoundhsjx spikei iexcl hsjxi

currencentmicro

ddx

Pvxjspike

Pvx

para

which is expression 31 of the main text

Acknowledgments

We thank KD Miller for many helpful discussions Work at UCSF was sup-ported in part by the Sloan and Swartz foundations and by a training grantfrom the NIH Our collaboration began at the Marine Biological Labora-tory in a course supported by grants from NIMH and the Howard HughesMedical Institute

248 T Sharpee N Rust and W Bialek

References

Aguera y Arcas B Fairhall A L amp Bialek W (2003) Computation in a singleneuron Hodgkin and Huxley revisited Neural Comp 15 1715ndash1749

Baddeley R Abbott L F Booth MCA Sengpiel F Freeman T WakemanE A amp Rolls E T (1997) Responses of neurons in primary and inferiortemporal visual cortices to natural scenes Proc R Soc Lond B 264 1775ndash1783

Barlow H (1961) Possible principles underlying the transformations of sen-sory images In W Rosenblith (Ed) Sensory communication (pp 217ndash234)Cambridge MA MIT Press

Barlow H (2001) Redundancy reduction revisited Network Comput NeuralSyst 12 241ndash253

Bialek W (2002) Thinking about the brain In H Flyvbjerg F Julicher P Or-mos amp F David (Eds) Physics of Biomolecules and Cells (pp 485ndash577) BerlinSpringer-Verlag See also physics02050306

Bialek W amp de Ruyter van Steveninck R R (2003) Features and dimensionsMotion estimation in y vision Unpublished manuscript

Brenner N Bialek W amp de Ruyter van Steveninck R R (2000) Adaptiverescaling maximizes information transmission Neuron 26 695ndash702

Brenner N Strong S P Koberle R Bialek W amp de Ruyter van SteveninckR R (2000) Synergy in a neural code Neural Computation 12 1531ndash1552See also physics9902067

Chichilnisky E J (2001) A simple white noise analysis of neuronal light re-sponses Network Comput Neural Syst 12 199ndash213

Creutzfeldt O D amp Northdurft H C (1978) Representation of complex visualstimuli in the brain Naturwissenschaften 65 307ndash318

Cover T M amp Thomas J A (1991) Information theory New York Wileyde Boer E amp Kuyper P (1968) Triggered correlation IEEE Trans Biomed Eng

15 169ndash179de Ruyter van Steveninck R R amp Bialek W (1988) Real-time performance

of a movement-sensitive neuron in the blowy visual system Coding andinformation transfer in short spike sequences Proc R Soc Lond B 265259ndash265

de Ruyter vanSteveninck R RBorst A amp Bialek W (2000)Real time encodingof motion Answerable questions and questionable answers from the yrsquosvisual system In J M Zanker amp J Zeil (Eds) Motion vision Computationalneural and ecologicalconstraints (pp 279ndash306) New York Springer-Verlag Seealso physics0004060

de Ruyter van Steveninck R R Lewen G D Strong S P Koberle R amp BialekW (1997) Reproducibility and variability in neural spike trains Science 2751805ndash1808 See also cond-mat9603127

6 Where available we give references to the physics endashprint archive which may befound on-line at httparxivorgabs thus Bialek (2002) is available on-line athttparxivorgabsphysics0205030 Published papers may differ from the versionsposted to the archive

Analyzing Neural Responses to Natural Signals 249

Dimitrov A G amp Miller J P (2001) Neural coding and decoding Communi-cation channels and quantization Network Comput Neural Syst 12 441ndash472

Dong D W amp Atick J J (1995) Statistics of natural time-varying imagesNetwork Comput Neural Syst 6 345ndash358

Fairhall A L Lewen G D Bialek W amp de Ruyter van Steveninck R R (2001)Efciency and ambiguity in an adaptive neural code Nature 412 787ndash792

Kara P Reinagel P amp Reid R C (2000) Low response variability in simulta-neously recorded retinal thalamic and cortical neurons Neuron 27 635ndash646

Lewen G D Bialek W amp de Ruyter van Steveninck R R (2001)Neural codingof naturalistic motion stimuli Network Comput Neural Syst 12 317ndash329 Seealso physics0103088

Mainen Z F amp Sejnowski T J (1995) Reliability of spike timing in neocorticalneurons Science 268 1503ndash1506

Paninski L (2003a) Convergence properties of three spikendashtriggered analysistechniques Network Compt in Neural Systems 14 437ndash464

Paninski L (2003b) Estimation of entropy and mutual information NeuralComputation 15 1191ndash1253

Panzeri S amp Treves A (1996) Analytical estimates of limited sampling biasesin different information measures Network Comput Neural Syst 7 87ndash107

Pola G Schultz S R Petersen R amp Panzeri S (2002) A practical guide toinformation analysis of spike trains In R Kotter (Ed) Neuroscience databasesA practical guide (pp 137ndash152) Norwell MA Kluwer

Press W H Teukolsky S A Vetterling W T amp Flannery B P (1992) Numericalrecipes in C The art of scientic computing Cambridge Cambridge UniversityPress

Reinagel P amp Reid R C (2000) Temporal coding of visual information in thethalamus J Neurosci 20 5392ndash5400

Rieke F Bodnar D A amp Bialek W (1995) Naturalistic stimuli increase therate and efciency of information transmission byprimary auditory afferentsProc R Soc Lond B 262 259ndash265

Rieke F Warland D de Ruyter van Steveninck R R amp Bialek W (1997)Spikes Exploring the neural code Cambridge MA MIT Press

Ringach D L Hawken M J amp Shapley R (2002) Receptive eld structure ofneurons in monkey visual cortex revealed by stimulation with natural imagesequences Journal of Vision 2 12ndash24

Ringach D L Sapiro G amp Shapley R (1997) A subspace reverse-correlationtechnique for the study of visual neurons Vision Res 37 2455ndash2464

Rolls E T Aggelopoulos N C amp Zheng F (2003) The receptive elds ofinferior temporal cortex neurons in natural scenes J Neurosci 23 339ndash348

Ruderman D L (1994) The statistics of natural images Network Compt NeuralSyst 5 517ndash548

Ruderman D L amp Bialek W (1994) Statistics of natural images Scaling in thewoods Phys Rev Lett 73 814ndash817

Schwartz O Chichilnisky E J amp Simoncelli E (2002) Characterizing neuralgain control using spike-triggered covariance In T G Dietterich S Becker ampZ Ghahramani (Eds) Advances in Neural InformationProcessing (pp 269ndash276)Cambridge MA MIT Press

250 T Sharpee N Rust and W Bialek

Sen K Theunissen F E amp Doupe A J (2001) Feature analysis of naturalsounds in the songbird auditory forebrain J Neurophysiol 86 1445ndash1458

Simoncelli E amp Olshausen B A (2001) Natural image statistics and neuralrepresentation Annu Rev Neurosci 24 1193ndash1216

Smirnakis S M Berry M J Warland D K Bialek W amp Meister M (1996)Adaptation of retinal processing to image contrast and spatial scale Nature386 69ndash73

Smyth D Willmore B Baker G E Thompson I D amp Tolhurst D J (2003)The receptive-eld organization of simple cells in primary visual cortex offerrets under natural scene stimulation J Neurosci 23 4746ndash4759

Stanley G B Li F F amp Dan Y (1999) Reconstruction of natural scenes fromensemble responses in the lateral geniculate nucleus J Neurosci 19 8036ndash8042

Strong S P Koberle R de Ruyter van Steveninck R R amp Bialek W (1998)Entropy and information in neural spike trains Phys Rev Lett 80 197ndash200

Theunissen F E Sen K amp Doupe A J (2000) Spectral-temporal receptiveelds of nonlinear auditory neurons obtained using natural sounds J Neu-rosci 20 2315ndash2331

Tishby N Pereira F C amp Bialek W (1999)The information bottleneck methodIn B Hajek amp R S Sreenivas (Eds) Proceedings of the 37th Allerton Conferenceon Communication Control and Computing (pp 368ndash377) Urbana Universityof Illinois See also physics0004057

Touryan J Lau B amp Dan Y (2002) Isolation of relevant visual features fromrandom stimuli for cortical complex cells J Neurosci 22 10811ndash10818

Treves A amp Panzeri S (1995) The upward bias in measures of informationderived from limited data samples Neural Comp 7 399ndash407

von der Twer T amp Macleod D I A (2001) Optimal nonlinear codes for theperception of natural colours Network Comput Neural Syst 12 395ndash407

Vickers N J Christensen T A Baker T amp Hildebrand J G (2001) Odour-plume dynamics inuence the brainrsquos olfactory code Nature 410 466ndash470

Vinje W E amp Gallant J L (2000) Sparse coding and decorrelation in primaryvisual cortex during natural vision Science 287 1273ndash1276

Vinje W E amp Gallant J L (2002) Natural stimulation of the nonclassical re-ceptive eld increases information transmission efciency in V1 J Neurosci22 2904ndash2915

Voss R F amp Clarke J (1975) ldquo1f noiserdquo in music and speech Nature 317ndash318Weliky M Fiser J Hunt R amp Wagner D N (2003) Coding of natural scenes

in primary visual cortex Neuron 37 703ndash718

Received January 21 2003 accepted August 6 2003

Page 17: Analyzing Neural Responses to Natural Signals: Maximally ...wbialek/rome/refs/sharpee+al_04.pdfAnalyzing Neural Responses to Natural Signals 225 Figure 1: Schematic illustration of

Analyzing Neural Responses to Natural Signals 239

Figure 6 Analysis of a model complex cell with relevant dimensions Oe1 and Oe2

shown in (a) and (b) respectively Spikes are generated according to an ldquoORrdquoinput-output function f s1 s2 with the threshold micro frac14 061frac34 s1 and noise stan-dard deviation frac34 D 031frac34 s1 (cd) Vectors v1 and v2 found by maximizinginformation Iv1 v2

We start by maximizing information with respect to one vector Contraryto the result in Figure 4e for a simple cell one optimal dimension recoversonly about 60 of the total information per spike (see equation 22) Perhapssurprisingly because of the strong correlations in natural scenes even a pro-jection onto a random vector in the D raquo 103-dimensional stimulus spacehas a high probability of explaining 60 of total information per spike ascan be seen in Figure 2 We therefore go on to maximize information withrespect to two vectors As a result of maximization we obtain two vectors v1and v2 shown in Figure 6 The information along them is Iv1 v2 frac14 090which is within the range of information values obtained along differentlinear combinations of the two model vectors IOe1 Oe2=Ispike D 089 sect 011Therefore the description of neuronrsquos ring in terms of vectors v1 and v2is complete up to the noise level and we do not have to look for extra rel-evant dimensions Practically the number of relevant dimensions can bedetermined by comparing Iv1 v2 to either Ispike or Iv1 v2 v3 the latterbeing the result of maximization with respect to three vectors simultane-ously As mentioned in section 1 information along set a of vectors does notincrease when extra dimensions are added to the relevant subspace There-fore if Iv1 v2 D Iv1 v2 v3 (again up to the noise level) this means that

240 T Sharpee N Rust and W Bialek

there are only two relevant dimensions Using Ispike for comparison withIv1 v2 has the advantage of not having to look for an extra dimensionwhich can be computationally intensive However Ispike might be subjectto larger systematic bias errors than Iv1 v2 v3

Vectors v1 and v2 obtained by maximizing Iv1 v2 are not exactly or-thogonal and are also rotated with respect to Oe1 and Oe2 However the qualityof reconstruction as well as the value of information Iv1 v2 is indepen-dent of a particular choice of basis with the RS The appropriate measure ofsimilarity between the two planes is the dot product of their normals In theexample of Figure 6 OnOe1Oe2 cent Onv1 v2 D 082sect007 where OnOe1Oe2 is a normal tothe plane passing through vectors Oe1 and Oe2 Maximizing information withrespect to two dimensions requires a signicantly slower cooling rate andconsequently longer computational times However the expected error inthe reconstruction 1 iexcl OnOe1 Oe2 cent Onv1 v2 scales as 1=Nspike behavior similarto equation 44 and is roughly twice that for a simple cell given the samenumber of spikes We make vectors v1 and v2 orthogonal to each othersupon completion of the algorithm

4.4 A Model Auditory Neuron with One Relevant Dimension. Because stimuli $\mathbf{s}$ are treated as vectors in an abstract space, the method of looking for the most informative dimensions can be applied equally well to auditory as well as to visual neurons. Here we illustrate the method by considering a model auditory neuron with one relevant dimension, which is shown in Figure 7c and is taken to mimic the properties of cochlear neurons. The model neuron is probed by two ensembles of naturalistic stimuli: one is a recording of a native Russian speaker reading a piece of Russian prose, and the other is a recording of a piece of English prose read by a native English speaker. Both ensembles are nongaussian and exhibit amplitude distributions with long, nearly exponential tails (see Figure 7a), which are qualitatively similar to those of light intensities in natural scenes (Voss & Clarke, 1975; Ruderman, 1994). However, the power spectrum is different in the two cases, as can be seen in Figure 7b. The differences in the correlation structure, in particular, lead to different STAs across the two ensembles (cf. Figure 7d). Both of the STAs also deviate from the model filter shown in Figure 7c.

Despite differences in the probability distributions $P(\mathbf{s})$, it is possible to recover the relevant dimension of the model neuron by maximizing information. In Figure 7e we show the two most informative vectors found by running the algorithm for the two ensembles and replot the model filter from Figure 7c to show that the three vectors overlap almost perfectly. Thus, different nongaussian correlations can be successfully removed to obtain an estimate of the relevant dimension. If the most informative vector changes with the stimulus ensemble, this can be interpreted as being caused by adaptation to the probability distribution.
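Because the stimuli here are simply vectors of consecutive sound-pressure samples, assembling the stimulus ensemble from a recording is straightforward. The helper below is an illustrative sketch (the window hop, the RMS normalization, and the synthetic waveform are our assumptions), using $D = 500$ samples per stimulus at a 44.1 kHz sampling rate as in Figure 7.

    import numpy as np

    def stimulus_vectors(waveform, dim=500, step=1):
        """Slice a 1D sound-pressure waveform into overlapping D-dimensional stimuli.

        waveform: 1D array of sound pressure samples (e.g., sampled at 44.1 kHz).
        dim:      stimulus dimensionality (D = 500 in the model auditory example).
        step:     hop between successive windows, in samples.
        """
        waveform = np.asarray(waveform, dtype=float)
        waveform = waveform / waveform.std()          # express amplitude in units of RMS
        n = (len(waveform) - dim) // step + 1
        idx = np.arange(dim) + step * np.arange(n)[:, None]
        return waveform[idx]                          # shape (n_stimuli, dim)

    # Example: a synthetic 1 s "recording" stands in for the speech ensembles.
    rng = np.random.default_rng(1)
    fake_speech = rng.laplace(size=44100)             # heavy-tailed surrogate waveform
    S = stimulus_vectors(fake_speech, dim=500, step=100)
    print(S.shape)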

[Figure 7 appears here: panels (a)-(f). Recoverable axis labels: P(Amplitude) versus Amplitude (units of RMS); $|S(\omega)|^2$ versus frequency (kHz); filters versus t (ms); and P(spike | s $\cdot$ v̂max) versus s $\cdot$ v̂max.]

Figure 7: A model auditory neuron is probed by two natural ensembles of stimuli: a piece of English prose (1) and a piece of Russian prose (2). The size of the stimulus ensemble was the same in both cases, and the sampling rate was 44.1 kHz. (a) The probability distribution of the sound pressure amplitude in units of standard deviation for both ensembles is strongly nongaussian. (b) The power spectra for the two ensembles. (c) The relevant vector of the model neuron, of dimensionality $D = 500$. (d) The STA is broadened in both cases, but differs between the two cases due to differences in the power spectra of the two ensembles. (e) Vectors that maximize information for either of the ensembles overlap almost perfectly with each other and with the model filter, which is also replotted here from (c). (f) The probability of a spike, $P(\mathrm{spike}\,|\,\mathbf{s}\cdot\hat{v}_{\max})$ (crosses), is compared to $P(\mathrm{spike}\,|\,s_1)$ used in generating spikes (solid line). The input-output function had parameter values $\sigma \approx 0.9\,\sigma(s_1)$ and $\theta \approx 1.8\,\sigma(s_1)$.

5 Summary

Features of the stimulus that are most relevant for generating the response of a neuron can be found by maximizing information between the sequence of responses and the projection of stimuli on trial vectors within the stimulus space. Calculated in this manner, information becomes a function of direction in stimulus space. Those vectors that maximize the information and account for the total information per response of interest span the relevant subspace. The method allows multiple dimensions to be found. The reconstruction of the relevant subspace is done without assuming a particular form of the input-output function. It can be strongly nonlinear within the relevant subspace and is estimated from experimental histograms for each trial direction independently. Most important, this method can be used with any stimulus ensemble, even those that are strongly nongaussian, as in the case of natural signals. We have illustrated the method on model neurons responding to natural scenes and sounds. We expect the current implementation of the method to be most useful when several most informative vectors ($\leq 10$, depending on their dimensionality) are to be analyzed for neurons probed by natural scenes. This technique could be particularly useful in describing sensory processing in poorly understood regions of higher-level sensory cortex (such as visual areas V2, V4, and IT, and auditory cortex beyond A1), where white noise stimulation is known to be less effective than naturalistic stimuli.
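As a minimal sketch of the quantity being maximized (the binning scheme and all names are our choices, and no finite-sampling bias correction is applied), the information carried by projections onto a single trial direction can be estimated from experimental histograms as follows:

    import numpy as np

    def information_along_direction(stimuli, spike_counts, v, n_bins=25):
        """Estimate I(v), in bits per spike, from histograms of the projection x = s . v.

        stimuli:      array of shape (n_samples, D)
        spike_counts: array of shape (n_samples,), spikes observed for each stimulus
        v:            trial direction, shape (D,)
        Histogram estimates of P_v(x) and P_v(x|spike) follow the spirit of
        equation 2.5; the binning is an implementation choice.
        """
        spike_counts = np.asarray(spike_counts, dtype=float)
        x = stimuli @ (v / np.linalg.norm(v))
        edges = np.linspace(x.min(), x.max(), n_bins + 1)
        p_x, _ = np.histogram(x, bins=edges)
        p_x = p_x / p_x.sum()                                   # P_v(x)
        p_x_spike, _ = np.histogram(x, bins=edges, weights=spike_counts)
        p_x_spike = p_x_spike / p_x_spike.sum()                 # P_v(x | spike)
        mask = (p_x_spike > 0) & (p_x > 0)
        return float(np.sum(p_x_spike[mask] * np.log2(p_x_spike[mask] / p_x[mask])))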

Appendix A: Limitations of the Reverse Correlation Method

Here we examine what sort of deviations one can expect when applying the reverse correlation method to natural stimuli, even in the model with just one relevant dimension. There are two factors that, when combined, invalidate the reverse correlation method: the nongaussian character of correlations and the nonlinearity of the input-output function (Ringach, Sapiro, & Shapley, 1997). In its original formulation (de Boer & Kuyper, 1968), the neuron is probed by white noise, and the relevant dimension $\hat{e}_1$ is given by the STA, $\hat{e}_1 \propto \langle \mathbf{s}\, r(\mathbf{s})\rangle$. If the signals are not white, that is, the covariance matrix $C_{ij} = \langle s_i s_j\rangle$ is not a unit matrix, then the STA is a broadened version of the original filter $\hat{e}_1$. This can be seen by noting that for any function $F(\mathbf{s})$ of gaussian variables $\{s_i\}$, the identity holds:

$\langle s_i F(\mathbf{s})\rangle = \langle s_i s_j\rangle\,\langle \partial_j F(\mathbf{s})\rangle, \qquad \partial_j \equiv \partial/\partial s_j$.  (A.1)

When property A.1 is applied to the vector components of the STA, $\langle s_i r(\mathbf{s})\rangle = C_{ij}\langle \partial_j r(\mathbf{s})\rangle$. Since we work within the assumption that the firing rate is a (nonlinear) function of the projection onto one filter $\hat{e}_1$, $r(\mathbf{s}) = r(s_1)$, the latter average is proportional to the model filter itself, $\langle \partial_j r\rangle = \hat{e}_{1j}\langle r'(s_1)\rangle$. Therefore, we arrive at the prescription of the reverse correlation method:

$\hat{e}_{1i} \propto [C^{-1}]_{ij}\,\langle s_j\, r(\mathbf{s})\rangle$.  (A.2)

The gaussian property is necessary in order to represent the STA as a convolution of the covariance matrix $C_{ij}$ of the stimulus ensemble and the model filter. To understand how the reconstructed vector obtained according to equation A.2 deviates from the relevant one, we consider weakly nongaussian stimuli with the probability distribution

$P_{nG}(\mathbf{s}) = \frac{1}{Z}\, P_0(\mathbf{s})\, e^{\eta H_1(\mathbf{s})}$,  (A.3)

where $P_0(\mathbf{s})$ is the gaussian probability distribution with covariance matrix $C$, and the normalization factor $Z = \langle e^{\eta H_1(\mathbf{s})}\rangle$. The function $H_1$ describes deviations of the probability distribution from gaussian, and therefore we will set $\langle s_i H_1\rangle = 0$ and $\langle s_i s_j H_1\rangle = 0$, since these averages can be accounted for in the gaussian ensemble. In what follows, we will keep only the first-order terms in the perturbation parameter $\eta$. Using property A.1, we find the STA to be given by

$\langle s_i r\rangle_{nG} = \langle s_i s_j\rangle \left[\langle \partial_j r\rangle + \eta\,\langle r\, \partial_j H_1\rangle\right]$,  (A.4)

where averages are taken with respect to the gaussian distribution. Similarly, the covariance matrix $C_{ij}$ evaluated with respect to the nongaussian ensemble is given by

$C_{ij} = \frac{1}{Z}\,\langle s_i s_j\, e^{\eta H_1}\rangle = \langle s_i s_j\rangle + \eta\,\langle s_i s_k\rangle\,\langle s_j\, \partial_k H_1\rangle$,  (A.5)

so that, to first order in $\eta$, $\langle s_i s_j\rangle = C_{ij} - \eta\, C_{ik}\langle s_j\, \partial_k H_1\rangle$. Combining this with equation A.4, we get

$\langle s_i r\rangle_{nG} = \mathrm{const}\times\left[ C_{ij}\,\hat{e}_{1j} + \eta\, C_{ij}\,\big\langle\big(r - s_1\langle r'\rangle\big)\,\partial_j H_1\big\rangle\right]$.  (A.6)

The second term in equation A.6 prevents the application of the reverse correlation method for nongaussian signals. Indeed, if we multiply the STA, equation A.6, by the inverse of the a priori covariance matrix $C_{ij}$, according to the reverse correlation method, equation A.2, we no longer obtain the RF $\hat{e}_1$. The deviation of the obtained answer from the true RF increases with $\eta$, which measures the deviation of the probability distribution from gaussian. Since natural stimuli are known to be strongly nongaussian, this makes the use of reverse correlation problematic when analyzing neural responses to natural stimuli.

The difference in applying reverse correlation to stimuli drawn from a correlated gaussian ensemble versus a nongaussian one is illustrated in Figures 8b and 8c. In the first case, shown in Figure 8b, stimuli are drawn from a correlated gaussian ensemble with the covariance matrix equal to that of natural images. In the second case, shown in Figure 8c, patches of photos are taken as stimuli. The STA is broadened in both cases. Although the two-point correlations are just as strong in the case of gaussian stimuli as they are in the natural stimulus ensemble, gaussian correlations can be successfully removed from the STA according to equation A.2 to obtain the model filter. On the contrary, an attempt to use reverse correlation with natural stimuli results in an altered version of the model filter. We reiterate that for this example, the apparent noise in the decorrelated vector is not due to neural noise or finite data sets, since the "exact" STA has been used (see equation 4.2) in all calculations presented in Figures 8 and 9.

Figure 8: The nongaussian character of correlations present in natural scenes invalidates the reverse correlation method for neurons with a nonlinear input-output function. (a) A model visual neuron has one relevant dimension $\hat{e}_1$ and the nonlinear input-output function. The "exact" STA is used (see equation 4.2) to separate effects of neural noise from alterations introduced by the method. The decorrelated "exact" STA is obtained by multiplying the "exact" STA by the inverse of the covariance matrix, according to equation A.2. (b) Stimuli are taken from a correlated gaussian noise ensemble. The effect of correlations in the STA can be removed according to equation A.2. (c) When patches of photos are taken as stimuli for the same model neuron as in (b), the decorrelation procedure gives an altered version of the model filter. The two stimulus ensembles have the same covariance matrix.

The reverse correlation method gives the correct answer for any distribution of signals if the probability of generating a spike is a linear function of $s_1$, since then the second term in equation A.6 is zero. In particular, a linear input-output relation could arise due to neural noise whose variance is much larger than the variance of the signal itself. This point is illustrated in Figures 9a, 9b, and 9c, where the reverse correlation method is applied to a threshold input-output function at low, moderate, and high signal-to-noise ratios. For small signal-to-noise ratios, where the noise standard deviation is similar to that of the projections $s_1$, the threshold nonlinearity in the input-output function is masked by noise and is effectively linear. In this limit, reverse correlation can be applied with the exact STA. However, for an experimentally calculated STA at low signal-to-noise ratios, the decorrelation procedure results in strong noise amplification. At higher signal-to-noise ratios, decorrelation fails due to the nonlinearity of the input-output function, in accordance with equation A.6.
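The breakdown described above is easy to reproduce numerically. The following sketch (all parameter values, the heavy-tailed surrogate ensemble, and the helper names are our illustrative choices, not the natural-image stimuli of Figure 8) probes the same threshold model neuron with a correlated gaussian ensemble and with a heavy-tailed ensemble of identical covariance, and then decorrelates the measured STA according to equation A.2.

    import numpy as np

    rng = np.random.default_rng(2)
    D, N = 40, 200_000

    # A smooth model filter e1 and a correlated covariance C for the stimuli.
    t = np.arange(D)
    e1 = np.exp(-0.5 * ((t - 15.0) / 3.0) ** 2) * np.sin(t / 2.0)
    e1 /= np.linalg.norm(e1)
    C = 0.9 ** np.abs(np.subtract.outer(t, t))       # strong two-point correlations
    A = np.linalg.cholesky(C)                        # s = A z has covariance C

    def spikes_from(s, theta=1.0, sigma=0.5):
        """Threshold input-output function acting on the projection s1 = s . e1."""
        s1 = s @ e1
        noise = sigma * s1.std() * rng.normal(size=len(s1))
        return (s1 + noise > theta * s1.std()).astype(float)

    def decorrelated_sta(s, r):
        sta = (r[:, None] * s).sum(axis=0) / r.sum()     # spike-triggered average
        v = np.linalg.solve(np.cov(s.T), sta)            # equation A.2
        return v / np.linalg.norm(v)

    # Gaussian and heavy-tailed ensembles built to share the same covariance matrix.
    z_gauss = rng.normal(size=(N, D))
    z_heavy = rng.laplace(scale=1 / np.sqrt(2), size=(N, D))   # unit variance, long tails
    for name, z in [("gaussian", z_gauss), ("nongaussian", z_heavy)]:
        s = z @ A.T
        v = decorrelated_sta(s, spikes_from(s))
        print(name, "overlap of decorrelated STA with model filter:", round(abs(v @ e1), 3))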

Appendix B: Maxima of I(v): What Do They Mean?

The relevant subspace of dimensionality $K$ can be found by maximizing information simultaneously with respect to $K$ vectors. The result of maximization with respect to a number of vectors that is less than the true dimensionality of the relevant subspace may produce vectors that have components in the irrelevant subspace. This happens only in the presence of correlations in stimuli. As an illustration, we consider the situation where the dimensionality of the relevant subspace is $K = 2$, and vector $\hat{e}_1$ describes the most informative direction within the relevant subspace. We show here that although the gradient of information is perpendicular to both $\hat{e}_1$ and $\hat{e}_2$, it may have components outside the relevant subspace. Therefore, the vector $v_{\max}$ that corresponds to the maximum of $I(v)$ will lie outside the relevant subspace. We recall from equation 3.1 that

$\nabla I(\hat{e}_1) = \int ds_1\, P(s_1)\,\frac{d}{ds_1}\!\left[\frac{P(s_1|\mathrm{spike})}{P(s_1)}\right]\big(\langle\mathbf{s}|s_1,\mathrm{spike}\rangle - \langle\mathbf{s}|s_1\rangle\big)$.  (B.1)

We can rewrite the conditional averages $\langle\mathbf{s}|s_1\rangle = \int ds_2\, P(s_1,s_2)\,\langle\mathbf{s}|s_1,s_2\rangle / P(s_1)$ and $\langle\mathbf{s}|s_1,\mathrm{spike}\rangle = \int ds_2\, f(s_1,s_2)\, P(s_1,s_2)\,\langle\mathbf{s}|s_1,s_2\rangle / P(s_1|\mathrm{spike})$, so that

$\nabla I(\hat{e}_1) = \int ds_1\, ds_2\, P(s_1,s_2)\,\langle\mathbf{s}|s_1,s_2\rangle\,\frac{P(\mathrm{spike}|s_1,s_2) - P(\mathrm{spike}|s_1)}{P(\mathrm{spike})}\times\frac{d}{ds_1}\ln\!\left[\frac{P(s_1|\mathrm{spike})}{P(s_1)}\right]$.  (B.2)

Because we assume that the vector $\hat{e}_1$ is the most informative within the relevant subspace, $\hat{e}_1 \cdot \nabla I = \hat{e}_2 \cdot \nabla I = 0$, so that the integral in equation B.2 is zero for those directions in which the component of the vector $\langle\mathbf{s}|s_1,s_2\rangle$ changes linearly with $s_1$ and $s_2$. For uncorrelated stimuli, this is true for all directions, so that the most informative vector within the relevant subspace is also the most informative in the overall stimulus space. In the presence of correlations, the gradient may have nonzero components along some irrelevant directions if the projection of the vector $\langle\mathbf{s}|s_1,s_2\rangle$ on those directions is not a linear function of $s_1$ and $s_2$. By looking for a maximum of information, we will therefore be driven outside the relevant subspace. The deviation of $v_{\max}$ from the relevant subspace is also proportional to the strength of the dependence on the second parameter $s_2$, because of the factor $[P(s_1,s_2|\mathrm{spike})/P(s_1,s_2) - P(s_1|\mathrm{spike})/P(s_1)]$ in the integrand.

Figure 9: Application of the reverse correlation method to a model visual neuron with one relevant dimension $\hat{e}_1$ and a threshold input-output function, for decreasing values of noise variance, $\sigma/\sigma(s_1) \approx 6.1$, $0.61$, $0.06$ in (a), (b), and (c), respectively. The model $P(\mathrm{spike}|s_1)$ becomes effectively linear when the signal-to-noise ratio is small. Reverse correlation can be used together with natural stimuli if the input-output function is linear. Otherwise, the deviations between the decorrelated STA and the model filter increase with the nonlinearity of $P(\mathrm{spike}|s_1)$.
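A hedged numerical illustration of the linearity condition discussed above (entirely our construction; the binning, the linear fit, and the helper name are not from the text) is to bin stimuli by their two relevant projections, average the full stimulus within each bin, and measure how far the projection of that conditional average onto an irrelevant direction departs from a linear function of $(s_1, s_2)$.

    import numpy as np

    def conditional_average_nonlinearity(s, e1, e2, w, n_bins=8):
        """RMS residual of a linear fit to the projection of <s | s1, s2> onto w.

        s is an (N, D) array of stimuli; e1 and e2 span the relevant subspace, and
        w is an irrelevant direction. A nonzero residual signals the ingredient that
        can pull the maximum of I(v) outside the relevant subspace for correlated stimuli.
        """
        s1, s2, sw = s @ e1, s @ e2, s @ w
        edges1 = np.quantile(s1, np.linspace(0, 1, n_bins + 1))[1:-1]
        edges2 = np.quantile(s2, np.linspace(0, 1, n_bins + 1))[1:-1]
        i1, i2 = np.digitize(s1, edges1), np.digitize(s2, edges2)
        sums = {k: np.zeros((n_bins, n_bins)) for k in ("s1", "s2", "sw", "n")}
        np.add.at(sums["s1"], (i1, i2), s1)
        np.add.at(sums["s2"], (i1, i2), s2)
        np.add.at(sums["sw"], (i1, i2), sw)
        np.add.at(sums["n"], (i1, i2), 1.0)
        ok = sums["n"] > 0
        n = sums["n"][ok]
        X = np.column_stack([np.ones(n.size), sums["s1"][ok] / n, sums["s2"][ok] / n])
        y = sums["sw"][ok] / n
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        return float(np.sqrt(np.mean((y - X @ coef) ** 2)))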

Appendix C: The Gradient of Information

According to expression 2.5, the information $I(v)$ depends on the vector $\mathbf{v}$ only through the probability distributions $P_{\mathbf{v}}(x)$ and $P_{\mathbf{v}}(x|\mathrm{spike})$. Therefore, we can express the gradient of information in terms of gradients of those probability distributions:

$\nabla_{\mathbf{v}} I = \frac{1}{\ln 2}\int dx \left[\ln\!\left(\frac{P_{\mathbf{v}}(x|\mathrm{spike})}{P_{\mathbf{v}}(x)}\right)\nabla_{\mathbf{v}} P_{\mathbf{v}}(x|\mathrm{spike}) - \frac{P_{\mathbf{v}}(x|\mathrm{spike})}{P_{\mathbf{v}}(x)}\,\nabla_{\mathbf{v}} P_{\mathbf{v}}(x)\right]$,  (C.1)

where we took into account that $\int dx\, P_{\mathbf{v}}(x|\mathrm{spike}) = 1$ and does not change with $\mathbf{v}$. To find gradients of the probability distributions, we note that

$\nabla_{\mathbf{v}} P_{\mathbf{v}}(x) = \nabla_{\mathbf{v}}\left[\int d\mathbf{s}\, P(\mathbf{s})\,\delta(x - \mathbf{s}\cdot\mathbf{v})\right] = -\int d\mathbf{s}\, P(\mathbf{s})\,\mathbf{s}\,\delta'(x - \mathbf{s}\cdot\mathbf{v}) = -\frac{d}{dx}\big[p(x)\,\langle\mathbf{s}|x\rangle\big]$,  (C.2)

and analogously for $P_{\mathbf{v}}(x|\mathrm{spike})$:

$\nabla_{\mathbf{v}} P_{\mathbf{v}}(x|\mathrm{spike}) = -\frac{d}{dx}\big[p(x|\mathrm{spike})\,\langle\mathbf{s}|x,\mathrm{spike}\rangle\big]$.  (C.3)

Substituting expressions C.2 and C.3 into equation C.1 and integrating once by parts, we obtain

$\nabla_{\mathbf{v}} I = \int dx\, P_{\mathbf{v}}(x)\,\big[\langle\mathbf{s}|x,\mathrm{spike}\rangle - \langle\mathbf{s}|x\rangle\big]\cdot\frac{d}{dx}\!\left[\frac{P_{\mathbf{v}}(x|\mathrm{spike})}{P_{\mathbf{v}}(x)}\right]$,

which is expression 3.1 of the main text.
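As a finite-data sketch of this expression (the histogram discretization, the centered-difference derivative, and all names below are our own choices, not prescribed by the text), the gradient can be estimated from sampled stimuli and spike counts as follows:

    import numpy as np

    def information_gradient(stimuli, spike_counts, v, n_bins=25):
        """Finite-data estimate of the gradient of I(v) (expression 3.1 / appendix C).

        Uses histogram estimates of P_v(x) and P_v(x|spike), bin-wise conditional
        averages <s|x> and <s|x,spike>, and a centered finite difference for d/dx.
        """
        spike_counts = np.asarray(spike_counts, dtype=float)
        v = v / np.linalg.norm(v)
        x = stimuli @ v
        edges = np.linspace(x.min(), x.max(), n_bins + 1)
        dx = edges[1] - edges[0]
        idx = np.clip(np.digitize(x, edges[1:-1]), 0, n_bins - 1)

        counts = np.bincount(idx, minlength=n_bins).astype(float)
        spikes = np.bincount(idx, weights=spike_counts, minlength=n_bins)
        p_x = counts / counts.sum()
        p_x_spike = spikes / spikes.sum()

        # Conditional averages of the full stimulus in each bin.
        s_given_x = np.zeros((n_bins, stimuli.shape[1]))
        s_given_x_spike = np.zeros_like(s_given_x)
        np.add.at(s_given_x, idx, stimuli)
        np.add.at(s_given_x_spike, idx, spike_counts[:, None] * stimuli)
        s_given_x /= np.maximum(counts, 1)[:, None]
        s_given_x_spike /= np.maximum(spikes, 1e-12)[:, None]

        ratio = np.divide(p_x_spike, p_x, out=np.zeros(n_bins), where=p_x > 0)
        d_ratio = np.gradient(ratio, dx)               # d/dx [P_v(x|spike)/P_v(x)]
        return (p_x[:, None] * (s_given_x_spike - s_given_x) * d_ratio[:, None]).sum(axis=0)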

Acknowledgments

We thank K. D. Miller for many helpful discussions. Work at UCSF was supported in part by the Sloan and Swartz foundations and by a training grant from the NIH. Our collaboration began at the Marine Biological Laboratory in a course supported by grants from NIMH and the Howard Hughes Medical Institute.


References

Aguera y Arcas, B., Fairhall, A. L., & Bialek, W. (2003). Computation in a single neuron: Hodgkin and Huxley revisited. Neural Comp., 15, 1715-1749.
Baddeley, R., Abbott, L. F., Booth, M. C. A., Sengpiel, F., Freeman, T., Wakeman, E. A., & Rolls, E. T. (1997). Responses of neurons in primary and inferior temporal visual cortices to natural scenes. Proc. R. Soc. Lond. B, 264, 1775-1783.
Barlow, H. (1961). Possible principles underlying the transformations of sensory images. In W. Rosenblith (Ed.), Sensory communication (pp. 217-234). Cambridge, MA: MIT Press.
Barlow, H. (2001). Redundancy reduction revisited. Network: Comput. Neural Syst., 12, 241-253.
Bialek, W. (2002). Thinking about the brain. In H. Flyvbjerg, F. Julicher, P. Ormos, & F. David (Eds.), Physics of biomolecules and cells (pp. 485-577). Berlin: Springer-Verlag. See also physics/0205030.
Bialek, W., & de Ruyter van Steveninck, R. R. (2003). Features and dimensions: Motion estimation in fly vision. Unpublished manuscript.
Brenner, N., Bialek, W., & de Ruyter van Steveninck, R. R. (2000). Adaptive rescaling maximizes information transmission. Neuron, 26, 695-702.
Brenner, N., Strong, S. P., Koberle, R., Bialek, W., & de Ruyter van Steveninck, R. R. (2000). Synergy in a neural code. Neural Computation, 12, 1531-1552. See also physics/9902067.
Chichilnisky, E. J. (2001). A simple white noise analysis of neuronal light responses. Network: Comput. Neural Syst., 12, 199-213.
Creutzfeldt, O. D., & Northdurft, H. C. (1978). Representation of complex visual stimuli in the brain. Naturwissenschaften, 65, 307-318.
Cover, T. M., & Thomas, J. A. (1991). Information theory. New York: Wiley.
de Boer, E., & Kuyper, P. (1968). Triggered correlation. IEEE Trans. Biomed. Eng., 15, 169-179.
de Ruyter van Steveninck, R. R., & Bialek, W. (1988). Real-time performance of a movement-sensitive neuron in the blowfly visual system: Coding and information transfer in short spike sequences. Proc. R. Soc. Lond. B, 265, 259-265.
de Ruyter van Steveninck, R. R., Borst, A., & Bialek, W. (2000). Real time encoding of motion: Answerable questions and questionable answers from the fly's visual system. In J. M. Zanker & J. Zeil (Eds.), Motion vision: Computational, neural and ecological constraints (pp. 279-306). New York: Springer-Verlag. See also physics/0004060.
de Ruyter van Steveninck, R. R., Lewen, G. D., Strong, S. P., Koberle, R., & Bialek, W. (1997). Reproducibility and variability in neural spike trains. Science, 275, 1805-1808. See also cond-mat/9603127.
[Footnote 6: Where available, we give references to the physics e-print archive, which may be found online at http://arxiv.org/abs/; thus, Bialek (2002) is available online at http://arxiv.org/abs/physics/0205030. Published papers may differ from the versions posted to the archive.]
Dimitrov, A. G., & Miller, J. P. (2001). Neural coding and decoding: Communication channels and quantization. Network: Comput. Neural Syst., 12, 441-472.
Dong, D. W., & Atick, J. J. (1995). Statistics of natural time-varying images. Network: Comput. Neural Syst., 6, 345-358.
Fairhall, A. L., Lewen, G. D., Bialek, W., & de Ruyter van Steveninck, R. R. (2001). Efficiency and ambiguity in an adaptive neural code. Nature, 412, 787-792.
Kara, P., Reinagel, P., & Reid, R. C. (2000). Low response variability in simultaneously recorded retinal, thalamic, and cortical neurons. Neuron, 27, 635-646.
Lewen, G. D., Bialek, W., & de Ruyter van Steveninck, R. R. (2001). Neural coding of naturalistic motion stimuli. Network: Comput. Neural Syst., 12, 317-329. See also physics/0103088.
Mainen, Z. F., & Sejnowski, T. J. (1995). Reliability of spike timing in neocortical neurons. Science, 268, 1503-1506.
Paninski, L. (2003a). Convergence properties of three spike-triggered analysis techniques. Network: Comput. Neural Syst., 14, 437-464.
Paninski, L. (2003b). Estimation of entropy and mutual information. Neural Computation, 15, 1191-1253.
Panzeri, S., & Treves, A. (1996). Analytical estimates of limited sampling biases in different information measures. Network: Comput. Neural Syst., 7, 87-107.
Pola, G., Schultz, S. R., Petersen, R., & Panzeri, S. (2002). A practical guide to information analysis of spike trains. In R. Kotter (Ed.), Neuroscience databases: A practical guide (pp. 137-152). Norwell, MA: Kluwer.
Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P. (1992). Numerical recipes in C: The art of scientific computing. Cambridge: Cambridge University Press.
Reinagel, P., & Reid, R. C. (2000). Temporal coding of visual information in the thalamus. J. Neurosci., 20, 5392-5400.
Rieke, F., Bodnar, D. A., & Bialek, W. (1995). Naturalistic stimuli increase the rate and efficiency of information transmission by primary auditory afferents. Proc. R. Soc. Lond. B, 262, 259-265.
Rieke, F., Warland, D., de Ruyter van Steveninck, R. R., & Bialek, W. (1997). Spikes: Exploring the neural code. Cambridge, MA: MIT Press.
Ringach, D. L., Hawken, M. J., & Shapley, R. (2002). Receptive field structure of neurons in monkey visual cortex revealed by stimulation with natural image sequences. Journal of Vision, 2, 12-24.
Ringach, D. L., Sapiro, G., & Shapley, R. (1997). A subspace reverse-correlation technique for the study of visual neurons. Vision Res., 37, 2455-2464.
Rolls, E. T., Aggelopoulos, N. C., & Zheng, F. (2003). The receptive fields of inferior temporal cortex neurons in natural scenes. J. Neurosci., 23, 339-348.
Ruderman, D. L. (1994). The statistics of natural images. Network: Comput. Neural Syst., 5, 517-548.
Ruderman, D. L., & Bialek, W. (1994). Statistics of natural images: Scaling in the woods. Phys. Rev. Lett., 73, 814-817.
Schwartz, O., Chichilnisky, E. J., & Simoncelli, E. (2002). Characterizing neural gain control using spike-triggered covariance. In T. G. Dietterich, S. Becker, & Z. Ghahramani (Eds.), Advances in Neural Information Processing (pp. 269-276). Cambridge, MA: MIT Press.
Sen, K., Theunissen, F. E., & Doupe, A. J. (2001). Feature analysis of natural sounds in the songbird auditory forebrain. J. Neurophysiol., 86, 1445-1458.
Simoncelli, E., & Olshausen, B. A. (2001). Natural image statistics and neural representation. Annu. Rev. Neurosci., 24, 1193-1216.
Smirnakis, S. M., Berry, M. J., Warland, D. K., Bialek, W., & Meister, M. (1996). Adaptation of retinal processing to image contrast and spatial scale. Nature, 386, 69-73.
Smyth, D., Willmore, B., Baker, G. E., Thompson, I. D., & Tolhurst, D. J. (2003). The receptive-field organization of simple cells in primary visual cortex of ferrets under natural scene stimulation. J. Neurosci., 23, 4746-4759.
Stanley, G. B., Li, F. F., & Dan, Y. (1999). Reconstruction of natural scenes from ensemble responses in the lateral geniculate nucleus. J. Neurosci., 19, 8036-8042.
Strong, S. P., Koberle, R., de Ruyter van Steveninck, R. R., & Bialek, W. (1998). Entropy and information in neural spike trains. Phys. Rev. Lett., 80, 197-200.
Theunissen, F. E., Sen, K., & Doupe, A. J. (2000). Spectral-temporal receptive fields of nonlinear auditory neurons obtained using natural sounds. J. Neurosci., 20, 2315-2331.
Tishby, N., Pereira, F. C., & Bialek, W. (1999). The information bottleneck method. In B. Hajek & R. S. Sreenivas (Eds.), Proceedings of the 37th Allerton Conference on Communication, Control and Computing (pp. 368-377). Urbana: University of Illinois. See also physics/0004057.
Touryan, J., Lau, B., & Dan, Y. (2002). Isolation of relevant visual features from random stimuli for cortical complex cells. J. Neurosci., 22, 10811-10818.
Treves, A., & Panzeri, S. (1995). The upward bias in measures of information derived from limited data samples. Neural Comp., 7, 399-407.
von der Twer, T., & Macleod, D. I. A. (2001). Optimal nonlinear codes for the perception of natural colours. Network: Comput. Neural Syst., 12, 395-407.
Vickers, N. J., Christensen, T. A., Baker, T., & Hildebrand, J. G. (2001). Odour-plume dynamics influence the brain's olfactory code. Nature, 410, 466-470.
Vinje, W. E., & Gallant, J. L. (2000). Sparse coding and decorrelation in primary visual cortex during natural vision. Science, 287, 1273-1276.
Vinje, W. E., & Gallant, J. L. (2002). Natural stimulation of the nonclassical receptive field increases information transmission efficiency in V1. J. Neurosci., 22, 2904-2915.
Voss, R. F., & Clarke, J. (1975). "1/f noise" in music and speech. Nature, 317-318.
Weliky, M., Fiser, J., Hunt, R., & Wagner, D. N. (2003). Coding of natural scenes in primary visual cortex. Neuron, 37, 703-718.

Received January 21, 2003; accepted August 6, 2003.

Page 18: Analyzing Neural Responses to Natural Signals: Maximally ...wbialek/rome/refs/sharpee+al_04.pdfAnalyzing Neural Responses to Natural Signals 225 Figure 1: Schematic illustration of

240 T Sharpee N Rust and W Bialek

there are only two relevant dimensions Using Ispike for comparison withIv1 v2 has the advantage of not having to look for an extra dimensionwhich can be computationally intensive However Ispike might be subjectto larger systematic bias errors than Iv1 v2 v3

Vectors v1 and v2 obtained by maximizing Iv1 v2 are not exactly or-thogonal and are also rotated with respect to Oe1 and Oe2 However the qualityof reconstruction as well as the value of information Iv1 v2 is indepen-dent of a particular choice of basis with the RS The appropriate measure ofsimilarity between the two planes is the dot product of their normals In theexample of Figure 6 OnOe1Oe2 cent Onv1 v2 D 082sect007 where OnOe1Oe2 is a normal tothe plane passing through vectors Oe1 and Oe2 Maximizing information withrespect to two dimensions requires a signicantly slower cooling rate andconsequently longer computational times However the expected error inthe reconstruction 1 iexcl OnOe1 Oe2 cent Onv1 v2 scales as 1=Nspike behavior similarto equation 44 and is roughly twice that for a simple cell given the samenumber of spikes We make vectors v1 and v2 orthogonal to each othersupon completion of the algorithm

44 A Model Auditory Neuron with One Relevant Dimension Be-cause stimuli s are treated as vectors in an abstract space the method oflooking for the most informative dimensions can be applied equally wellto auditory as well as to visual neurons Here we illustrate the method byconsidering a model auditory neuron with one relevant dimension which isshown in Figure 7c and is taken to mimic the properties of cochlear neuronsThe model neuron is probed by two ensembles of naturalistic stimuli oneis a recording of a native Russian speaker reading a piece of Russian proseand the other is a recording of a piece of English prose read by a nativeEnglish speaker Both ensembles are nongaussian and exhibit amplitudedistributions with long nearly exponential tails (see Figure 7a) which arequalitatively similar to those of light intensities in natural scenes (Voss ampClarke 1975 Ruderman 1994) However the power spectrum is different inthe two cases as can be seen in Figure 7b The differences in the correlationstructure in particular lead to different STAs across the two ensembles (cfFigure 7d) Both of the STAs also deviate from the model lter shown inFigure 7c

Despite differences in the probability distributions Ps it is possibleto recover the relevant dimension of the model neuron by maximizing in-formation In Figure 7c we show the two most informative vectors foundby running the algorithm for the two ensembles and replot the model lterfrom Figure 7c to show that the three vectors overlap almost perfectly Thusdifferent nongaussian correlations can be successfully removed to obtain anestimate of the relevant dimension If the most informative vector changeswith the stimulus ensemble this can be interpreted as caused by adaptationto the probability distribution

Analyzing Neural Responses to Natural Signals 241

shy 10 0 1010

shy 6

10shy 4

10shy 2

100

5 10 15 20

0001

001

01

0 5 10

shy 01

0

01

0 5 10

shy 01

0

01

0 5 10

shy 01

0

01

shy 5 0 5 0

05

1

11

2

(a) (b)

(c) (d)

(e)

2

1

t(ms)

t(ms)

t(ms)

P(A

mpl

itude

)

Amplitude (units of RMS) frequency (kHz)

S(w)

2

(f)

P(s

pike

| s times

vm

ax )

s times vmax

Figure 7 A model auditory neuron is probed by two natural ensembles of stim-uli a piece of English prose (1) and a piece of of Russian prose (2) The sizeof the stimulus ensemble was the same in both cases and the sampling ratewas 441 kHz (a) The probability distribution of the sound pressure amplitudein units of standard deviation for both ensembles is strongly nongaussian (b)The power spectra for the two ensembles (c) The relevant vector of the modelneuron of dimensionality D D 500 (d) The STA is broadened in both cases butdiffers between the two cases due to differences in the power spectra of the twoensembles (e) Vectors that maximize information for either of the ensemblesoverlap almost perfectly with each other and with the model lter which is alsoreplotted here from c (f) The probability of a spike Pspikejs cent Ovmax (crosses) iscompared to Pspikejs1 used in generating spikes (solid line) The input-outputfunction had parameter values frac34 frac14 09frac34 s1 and micro frac14 18frac34 s1

5 Summary

Features of the stimulus that are most relevant for generating the responseof a neuron can be found by maximizing information between the sequenceof responses and the projection of stimuli on trial vectors within the stim-ulus space Calculated in this manner information becomes a function of

242 T Sharpee N Rust and W Bialek

direction in stimulus space Those vectors that maximize the informationand account for the total information per response of interest span the rel-evant subspace The method allows multiple dimensions to be found Thereconstruction of the relevant subspace is done without assuming a partic-ular form of the input-output function It can be strongly nonlinear withinthe relevant subspace and is estimated from experimental histograms foreach trial direction independently Most important this method can be usedwith any stimulus ensemble even those that are strongly nongaussian asin the case of natural signals We have illustrated the method on modelneurons responding to natural scenes and sounds We expect the currentimplementation of the method to be most useful when several most infor-mative vectors (middot 10 depending on their dimensionality) are to be analyzedfor neurons probed by natural scenes This technique could be particularlyuseful in describing sensory processing in poorly understood regions ofhigher-level sensory cortex (such as visual areas V2 V4 and IT and audi-tory cortex beyond A1) where white noise stimulation is known to be lesseffective than naturalistic stimuli

Appendix A Limitations of the Reverse Correlation Method

Here we examine what sort of deviations one can expect when applying thereverse correlation method to natural stimuli even in the model with just onerelevant dimension There are two factors that when combined invalidatethe reverse correlation method the nongaussian character of correlationsand the nonlinearity of the input-output function (Ringach Sapiro amp Shap-ley 1997) In its original formulation (de Boer amp Kuyper 1968) the neuronis probed by white noise and the relevant dimension Oe1 is given by theSTA Oe1 hsrsi If the signals are not white that is the covariance matrixCij D hsisji is not a unit matrix then the STA is a broadened version of theoriginal lter Oe1 This can be seen by noting that for any function Fs ofgaussian variables fsig the identity holds

hsiFsi D hsisjihsj Fsi j acute sj (A1)

When property A1 isapplied to the vector components of the STA hsirsi DCijhjrsi Since we work within the assumption that the ring rate is a(nonlinear) function of projection onto one lter Oe1 rs D rs1 the latteraverage is proportional to the model lter itself hjri D Oe1jhr0s1i Thereforewe arrive at the prescription of the reverse correlation method

Oe1i [Ciexcl1]ijhsjrsi (A2)

The gaussian property is necessary in order to represent the STA as a convo-lution of the covariance matrix Cij of the stimulus ensemble and the model

Analyzing Neural Responses to Natural Signals 243

lter To understand how the reconstructed vector obtained according toequation A2 deviates from the relevant one we consider weakly nongaus-sian stimuli with the probability distribution

PnGs D 1Z

P0sesup2H1s (A3)

where P0s is the gaussian probability distribution with covariance matrixC and the normalization factor Z D hesup2H1si The function H1 describesdeviations of the probability distribution from gaussian and therefore wewill set hsiH1i D 0 and hsisjH1i D 0 since these averages can be accountedfor in the gaussian ensemble In what follows we will keep only the rst-order terms in perturbation parameter sup2 Using property A1 we nd theSTA to be given by

hsirinG D hsisjipoundhjri C sup2hrjH1i

curren (A4)

where averages are taken with respect to the gaussian distribution Simi-larly the covariance matrix Cij evaluated with respect to the nongaussianensemble is given by

Cij D1Z

hsisjesup2H1i D hsisji C sup2hsiskihsjkH1i (A5)

so that to the rst order in sup2 hsisji D Cij iexcl sup2CikhsjkH1i Combining thiswith equation A4 we get

hsirinG D const pound Cij Oe1 j C sup2Cijhiexclr iexcl s1hr0i

centjH1i (A6)

The second term in equation A6 prevents the application of the reversecorrelation method for nongaussian signals Indeed if we multiply the STAequation A6 with the inverse of the a priori covariance matrix Cij accordingto the reverse correlation method equation A2 we no longer obtain the RFOe1 The deviation of the obtained answer from the true RF increases with sup2which measures the deviation of the probability distribution from gaussianSince natural stimuli are known to be strongly nongaussian this makes theuse of the reverse correlation problematic when analyzing neural responsesto natural stimuli

The difference in applying the reverse correlation to stimuli drawn froma correlated gaussian ensemble versus a nongaussian one is illustrated inFigures 8b and 8c In the rst case shown in Figure 8b stimuli are drawnfrom a correlated gaussian ensemble with the covariance matrix equal tothat of natural images In the second case shown in Figure 8c the patches ofphotos are taken as stimuli The STA is broadened in both cases Althoughthe two-point correlations are just as strong in the case of gaussian stimuli

244 T Sharpee N Rust and W Bialek

Figure 8 The nongaussian character of correlations present in natural scenesinvalidates the reverse correlation method for neurons with a nonlinear input-output function (a) A model visual neuron has one relevant dimension Oe1 andthe nonlinear input-output function The ldquoexactrdquo STA is used (see equation 42)to separate effects of neural noise from alterations introduced by the methodThe decorrelated ldquoexactrdquo STA is obtained by multiplying the ldquoexactrdquo STA bythe inverse of the covariance matrix according to equation A2 (b) Stimuli aretaken from a correlated gaussian noise ensemble The effect of correlations inSTA can be removed according to equation A2 (c) When patches of photos aretaken as stimuli for the same model neuron as in b the decorrelation proceduregives an altered version of the model lter The two stimulus ensembles havethe same covariance matrix

as they are in the natural stimuli ensemble gaussian correlations can besuccessfully removed from the STA according to equation A2 to obtain themodel lter On the contrary an attempt to use reverse correlation withnatural stimuli results in an altered version of the model lter We reiteratethat for this example the apparent noise in the decorrelated vector is not

Analyzing Neural Responses to Natural Signals 245

due to neural noise or nite data sets since the ldquoexactrdquo STA has been used(see equation 42) in all calculations presented in Figures 8 and 9

The reverse correlation method gives the correct answer for any distribu-tion of signals if the probability of generating a spike is a linear function ofsi since then the second term in equation A6 is zero In particular a linearinput-output relation could arise due to a neural noise whose variance ismuch larger than the variance of the signal itself This point is illustrated inFigures 9a 9b and 9c where the reverse correlation method is applied to athreshold input-output function at low moderate and high signal-to-noiseratios For small signal-to-noise ratios where the noise standard deviationis similar to that of projections s1 the threshold nonlinearity in the input-output function is masked by noise and is effectively linear In this limitthe reverse correlation can be applied with the exact STA However for ex-perimentally calculated STA at low signal-to-noise ratios the decorrelationprocedure results in strong noise amplication At higher signal-to-noise ra-tios decorrelation fails due to the nonlinearity of the input-output functionin accordance with equation A6

Appendix B Maxima of Iv What Do They Mean

The relevant subspace of dimensionality K can be found by maximizinginformation simultaneously with respect to K vectors The result of max-imization with respect to a number of vectors that is less than the truedimensionality of the relevant subspace may produce vectors that havecomponents in the irrelevant subspace This happens only in the presenceof correlations in stimuli As an illustration we consider the situation wherethe dimensionality of the relevant subspace K D 2 and vector Oe1 describesthe most informative direction within the relative subspace We show herethat although the gradient of information is perpendicular to both Oe1 andOe2 it may have components outside the relevant subspace Therefore thevector vmax that corresponds to the maximum of Iv will lie outside therelevant subspace We recall from equation 31 that

rIOe1 DZ

ds1Ps1d

ds1

Ps1jspike

Ps1hsjs1 spikei iexcl hsjs1i (B1)

We can rewrite the conditional averages hsjs1i DR

ds2Ps1 s2hsjs1 s2i=Ps1

and hsjs1 spikei DR

ds2 f s1 s2Ps1 s2hsjs1 s2i=Ps1jspike so that

rIOe1 DZ

ds1ds2Ps1 s2hsjs1 s2iPspikejs1 s2 iexcl Pspikejs1

Pspike

pound dds1

lnPs1jspike

Ps1 (B2)

Because we assume that the vector Oe1 is the most informative within therelevant subspace Oe1rI D Oe2rI D 0 so that the integral in equation B2

246 T Sharpee N Rust and W Bialek

Figure 9 Application of the reverse correlation method to a model visual neuronwith one relevant dimension Oe1 and a threshold input-output functionof decreas-ing values of noise variance frac34=frac34s1s frac14 61 061 006 in a b and c respectivelyThe model Pspikejs1 becomes effectively linear when the signal-to-noise ratiois small The reverse correlation can be used together with natural stimuli if theinput-output function is linear Otherwise the deviations between the decorre-lated STA and the model lter increase with the nonlinearity of Pspikejs1

is zero for those directions in which the component of the vector hsjs1 s2ichanges linearly with s1 and s2 For uncorrelated stimuli this is true forall directions so that the most informative vector within the relevant sub-space is also the most informative in the overall stimulus space In thepresence of correlations the gradient may have nonzero components alongsome irrelevant directions if projection of the vector hsjs1 s2i on those di-rections is not a linear function of s1 and s2 By looking for a maximumof information we will therefore be driven outside the relevant subspaceThe deviation of vmax from the relevant subspace is also proportional to the

Analyzing Neural Responses to Natural Signals 247

strength of the dependence on the second parameter s2 because of the factor[Ps1 s2jspike=Ps1 s2 iexcl Ps1jspike=Ps1] in the integrand

Appendix C The Gradient of Information

According to expression 25 the information Iv depends on the vector vonly through the probability distributions Pvx and Pvxjspike Thereforewe can express the gradient of information in terms of gradients of thoseprobability distributions

rvI D1

ln 2

Zdx

microln

Pvxjspike

PvxrvPvxjspike

iexcl Pvxjspike

PvxrvPvx

para (C1)

where we took into account thatR

dxPvxjspike D 1 and does not changewith v To nd gradients of the probability distributions we note that

rvPvx D rv

microZdsPsplusmnx iexcl s cent v

paraD iexcl

ZdsPssplusmn0x iexcl s cent v

D iexcl ddx

poundpxhsjxi

curren (C2)

and analogously for Pvxjspike

rvPvxjspike D iexcl ddx

poundpxjspikehsjx spikei

curren (C3)

Substituting expressions C2 and C3 into equation C1 and integrating onceby parts we obtain

rvI DZ

dxPvxpoundhsjx spikei iexcl hsjxi

currencentmicro

ddx

Pvxjspike

Pvx

para

which is expression 31 of the main text

Acknowledgments

We thank KD Miller for many helpful discussions Work at UCSF was sup-ported in part by the Sloan and Swartz foundations and by a training grantfrom the NIH Our collaboration began at the Marine Biological Labora-tory in a course supported by grants from NIMH and the Howard HughesMedical Institute

248 T Sharpee N Rust and W Bialek

References

Aguera y Arcas B Fairhall A L amp Bialek W (2003) Computation in a singleneuron Hodgkin and Huxley revisited Neural Comp 15 1715ndash1749

Baddeley R Abbott L F Booth MCA Sengpiel F Freeman T WakemanE A amp Rolls E T (1997) Responses of neurons in primary and inferiortemporal visual cortices to natural scenes Proc R Soc Lond B 264 1775ndash1783

Barlow H (1961) Possible principles underlying the transformations of sen-sory images In W Rosenblith (Ed) Sensory communication (pp 217ndash234)Cambridge MA MIT Press

Barlow H (2001) Redundancy reduction revisited Network Comput NeuralSyst 12 241ndash253

Bialek W (2002) Thinking about the brain In H Flyvbjerg F Julicher P Or-mos amp F David (Eds) Physics of Biomolecules and Cells (pp 485ndash577) BerlinSpringer-Verlag See also physics02050306

Bialek W amp de Ruyter van Steveninck R R (2003) Features and dimensionsMotion estimation in y vision Unpublished manuscript

Brenner N Bialek W amp de Ruyter van Steveninck R R (2000) Adaptiverescaling maximizes information transmission Neuron 26 695ndash702

Brenner N Strong S P Koberle R Bialek W amp de Ruyter van SteveninckR R (2000) Synergy in a neural code Neural Computation 12 1531ndash1552See also physics9902067

Chichilnisky E J (2001) A simple white noise analysis of neuronal light re-sponses Network Comput Neural Syst 12 199ndash213

Creutzfeldt O D amp Northdurft H C (1978) Representation of complex visualstimuli in the brain Naturwissenschaften 65 307ndash318

Cover T M amp Thomas J A (1991) Information theory New York Wileyde Boer E amp Kuyper P (1968) Triggered correlation IEEE Trans Biomed Eng

15 169ndash179de Ruyter van Steveninck R R amp Bialek W (1988) Real-time performance

of a movement-sensitive neuron in the blowy visual system Coding andinformation transfer in short spike sequences Proc R Soc Lond B 265259ndash265

de Ruyter vanSteveninck R RBorst A amp Bialek W (2000)Real time encodingof motion Answerable questions and questionable answers from the yrsquosvisual system In J M Zanker amp J Zeil (Eds) Motion vision Computationalneural and ecologicalconstraints (pp 279ndash306) New York Springer-Verlag Seealso physics0004060

de Ruyter van Steveninck R R Lewen G D Strong S P Koberle R amp BialekW (1997) Reproducibility and variability in neural spike trains Science 2751805ndash1808 See also cond-mat9603127

6 Where available we give references to the physics endashprint archive which may befound on-line at httparxivorgabs thus Bialek (2002) is available on-line athttparxivorgabsphysics0205030 Published papers may differ from the versionsposted to the archive

Analyzing Neural Responses to Natural Signals 249

Dimitrov A G amp Miller J P (2001) Neural coding and decoding Communi-cation channels and quantization Network Comput Neural Syst 12 441ndash472

Dong D W amp Atick J J (1995) Statistics of natural time-varying imagesNetwork Comput Neural Syst 6 345ndash358

Fairhall A L Lewen G D Bialek W amp de Ruyter van Steveninck R R (2001)Efciency and ambiguity in an adaptive neural code Nature 412 787ndash792

Kara P Reinagel P amp Reid R C (2000) Low response variability in simulta-neously recorded retinal thalamic and cortical neurons Neuron 27 635ndash646

Lewen G D Bialek W amp de Ruyter van Steveninck R R (2001)Neural codingof naturalistic motion stimuli Network Comput Neural Syst 12 317ndash329 Seealso physics0103088

Mainen Z F amp Sejnowski T J (1995) Reliability of spike timing in neocorticalneurons Science 268 1503ndash1506

Paninski L (2003a) Convergence properties of three spikendashtriggered analysistechniques Network Compt in Neural Systems 14 437ndash464

Paninski L (2003b) Estimation of entropy and mutual information NeuralComputation 15 1191ndash1253

Panzeri S amp Treves A (1996) Analytical estimates of limited sampling biasesin different information measures Network Comput Neural Syst 7 87ndash107

Pola G Schultz S R Petersen R amp Panzeri S (2002) A practical guide toinformation analysis of spike trains In R Kotter (Ed) Neuroscience databasesA practical guide (pp 137ndash152) Norwell MA Kluwer

Press W H Teukolsky S A Vetterling W T amp Flannery B P (1992) Numericalrecipes in C The art of scientic computing Cambridge Cambridge UniversityPress

Reinagel P amp Reid R C (2000) Temporal coding of visual information in thethalamus J Neurosci 20 5392ndash5400

Rieke F Bodnar D A amp Bialek W (1995) Naturalistic stimuli increase therate and efciency of information transmission byprimary auditory afferentsProc R Soc Lond B 262 259ndash265

Rieke F Warland D de Ruyter van Steveninck R R amp Bialek W (1997)Spikes Exploring the neural code Cambridge MA MIT Press

Ringach D L Hawken M J amp Shapley R (2002) Receptive eld structure ofneurons in monkey visual cortex revealed by stimulation with natural imagesequences Journal of Vision 2 12ndash24

Ringach D L Sapiro G amp Shapley R (1997) A subspace reverse-correlationtechnique for the study of visual neurons Vision Res 37 2455ndash2464

Rolls E T Aggelopoulos N C amp Zheng F (2003) The receptive elds ofinferior temporal cortex neurons in natural scenes J Neurosci 23 339ndash348

Ruderman D L (1994) The statistics of natural images Network Compt NeuralSyst 5 517ndash548

Ruderman D L amp Bialek W (1994) Statistics of natural images Scaling in thewoods Phys Rev Lett 73 814ndash817

Schwartz O Chichilnisky E J amp Simoncelli E (2002) Characterizing neuralgain control using spike-triggered covariance In T G Dietterich S Becker ampZ Ghahramani (Eds) Advances in Neural InformationProcessing (pp 269ndash276)Cambridge MA MIT Press

250 T Sharpee N Rust and W Bialek

Sen K Theunissen F E amp Doupe A J (2001) Feature analysis of naturalsounds in the songbird auditory forebrain J Neurophysiol 86 1445ndash1458

Simoncelli E amp Olshausen B A (2001) Natural image statistics and neuralrepresentation Annu Rev Neurosci 24 1193ndash1216

Smirnakis S M Berry M J Warland D K Bialek W amp Meister M (1996)Adaptation of retinal processing to image contrast and spatial scale Nature386 69ndash73

Smyth D Willmore B Baker G E Thompson I D amp Tolhurst D J (2003)The receptive-eld organization of simple cells in primary visual cortex offerrets under natural scene stimulation J Neurosci 23 4746ndash4759

Stanley G B Li F F amp Dan Y (1999) Reconstruction of natural scenes fromensemble responses in the lateral geniculate nucleus J Neurosci 19 8036ndash8042

Strong S P Koberle R de Ruyter van Steveninck R R amp Bialek W (1998)Entropy and information in neural spike trains Phys Rev Lett 80 197ndash200

Theunissen F E Sen K amp Doupe A J (2000) Spectral-temporal receptiveelds of nonlinear auditory neurons obtained using natural sounds J Neu-rosci 20 2315ndash2331

Tishby N Pereira F C amp Bialek W (1999)The information bottleneck methodIn B Hajek amp R S Sreenivas (Eds) Proceedings of the 37th Allerton Conferenceon Communication Control and Computing (pp 368ndash377) Urbana Universityof Illinois See also physics0004057

Touryan J Lau B amp Dan Y (2002) Isolation of relevant visual features fromrandom stimuli for cortical complex cells J Neurosci 22 10811ndash10818

Treves A amp Panzeri S (1995) The upward bias in measures of informationderived from limited data samples Neural Comp 7 399ndash407

von der Twer T amp Macleod D I A (2001) Optimal nonlinear codes for theperception of natural colours Network Comput Neural Syst 12 395ndash407

Vickers N J Christensen T A Baker T amp Hildebrand J G (2001) Odour-plume dynamics inuence the brainrsquos olfactory code Nature 410 466ndash470

Vinje W E amp Gallant J L (2000) Sparse coding and decorrelation in primaryvisual cortex during natural vision Science 287 1273ndash1276

Vinje W E amp Gallant J L (2002) Natural stimulation of the nonclassical re-ceptive eld increases information transmission efciency in V1 J Neurosci22 2904ndash2915

Voss R F amp Clarke J (1975) ldquo1f noiserdquo in music and speech Nature 317ndash318Weliky M Fiser J Hunt R amp Wagner D N (2003) Coding of natural scenes

in primary visual cortex Neuron 37 703ndash718

Received January 21 2003 accepted August 6 2003

Page 19: Analyzing Neural Responses to Natural Signals: Maximally ...wbialek/rome/refs/sharpee+al_04.pdfAnalyzing Neural Responses to Natural Signals 225 Figure 1: Schematic illustration of

Analyzing Neural Responses to Natural Signals 241

shy 10 0 1010

shy 6

10shy 4

10shy 2

100

5 10 15 20

0001

001

01

0 5 10

shy 01

0

01

0 5 10

shy 01

0

01

0 5 10

shy 01

0

01

shy 5 0 5 0

05

1

11

2

(a) (b)

(c) (d)

(e)

2

1

t(ms)

t(ms)

t(ms)

P(A

mpl

itude

)

Amplitude (units of RMS) frequency (kHz)

S(w)

2

(f)

P(s

pike

| s times

vm

ax )

s times vmax

Figure 7 A model auditory neuron is probed by two natural ensembles of stim-uli a piece of English prose (1) and a piece of of Russian prose (2) The sizeof the stimulus ensemble was the same in both cases and the sampling ratewas 441 kHz (a) The probability distribution of the sound pressure amplitudein units of standard deviation for both ensembles is strongly nongaussian (b)The power spectra for the two ensembles (c) The relevant vector of the modelneuron of dimensionality D D 500 (d) The STA is broadened in both cases butdiffers between the two cases due to differences in the power spectra of the twoensembles (e) Vectors that maximize information for either of the ensemblesoverlap almost perfectly with each other and with the model lter which is alsoreplotted here from c (f) The probability of a spike Pspikejs cent Ovmax (crosses) iscompared to Pspikejs1 used in generating spikes (solid line) The input-outputfunction had parameter values frac34 frac14 09frac34 s1 and micro frac14 18frac34 s1

5 Summary

Features of the stimulus that are most relevant for generating the responseof a neuron can be found by maximizing information between the sequenceof responses and the projection of stimuli on trial vectors within the stim-ulus space Calculated in this manner information becomes a function of

242 T Sharpee N Rust and W Bialek

direction in stimulus space Those vectors that maximize the informationand account for the total information per response of interest span the rel-evant subspace The method allows multiple dimensions to be found Thereconstruction of the relevant subspace is done without assuming a partic-ular form of the input-output function It can be strongly nonlinear withinthe relevant subspace and is estimated from experimental histograms foreach trial direction independently Most important this method can be usedwith any stimulus ensemble even those that are strongly nongaussian asin the case of natural signals We have illustrated the method on modelneurons responding to natural scenes and sounds We expect the currentimplementation of the method to be most useful when several most infor-mative vectors (middot 10 depending on their dimensionality) are to be analyzedfor neurons probed by natural scenes This technique could be particularlyuseful in describing sensory processing in poorly understood regions ofhigher-level sensory cortex (such as visual areas V2 V4 and IT and audi-tory cortex beyond A1) where white noise stimulation is known to be lesseffective than naturalistic stimuli

Appendix A Limitations of the Reverse Correlation Method

Here we examine what sort of deviations one can expect when applying thereverse correlation method to natural stimuli even in the model with just onerelevant dimension There are two factors that when combined invalidatethe reverse correlation method the nongaussian character of correlationsand the nonlinearity of the input-output function (Ringach Sapiro amp Shap-ley 1997) In its original formulation (de Boer amp Kuyper 1968) the neuronis probed by white noise and the relevant dimension Oe1 is given by theSTA Oe1 hsrsi If the signals are not white that is the covariance matrixCij D hsisji is not a unit matrix then the STA is a broadened version of theoriginal lter Oe1 This can be seen by noting that for any function Fs ofgaussian variables fsig the identity holds

hsiFsi D hsisjihsj Fsi j acute sj (A1)

When property A1 isapplied to the vector components of the STA hsirsi DCijhjrsi Since we work within the assumption that the ring rate is a(nonlinear) function of projection onto one lter Oe1 rs D rs1 the latteraverage is proportional to the model lter itself hjri D Oe1jhr0s1i Thereforewe arrive at the prescription of the reverse correlation method

Oe1i [Ciexcl1]ijhsjrsi (A2)

The gaussian property is necessary in order to represent the STA as a convo-lution of the covariance matrix Cij of the stimulus ensemble and the model

Analyzing Neural Responses to Natural Signals 243

lter To understand how the reconstructed vector obtained according toequation A2 deviates from the relevant one we consider weakly nongaus-sian stimuli with the probability distribution

PnGs D 1Z

P0sesup2H1s (A3)

where P0s is the gaussian probability distribution with covariance matrixC and the normalization factor Z D hesup2H1si The function H1 describesdeviations of the probability distribution from gaussian and therefore wewill set hsiH1i D 0 and hsisjH1i D 0 since these averages can be accountedfor in the gaussian ensemble In what follows we will keep only the rst-order terms in perturbation parameter sup2 Using property A1 we nd theSTA to be given by

hsirinG D hsisjipoundhjri C sup2hrjH1i

curren (A4)

where averages are taken with respect to the gaussian distribution Simi-larly the covariance matrix Cij evaluated with respect to the nongaussianensemble is given by

Cij D1Z

hsisjesup2H1i D hsisji C sup2hsiskihsjkH1i (A5)

so that to the rst order in sup2 hsisji D Cij iexcl sup2CikhsjkH1i Combining thiswith equation A4 we get

hsirinG D const pound Cij Oe1 j C sup2Cijhiexclr iexcl s1hr0i

centjH1i (A6)

The second term in equation A6 prevents the application of the reversecorrelation method for nongaussian signals Indeed if we multiply the STAequation A6 with the inverse of the a priori covariance matrix Cij accordingto the reverse correlation method equation A2 we no longer obtain the RFOe1 The deviation of the obtained answer from the true RF increases with sup2which measures the deviation of the probability distribution from gaussianSince natural stimuli are known to be strongly nongaussian this makes theuse of the reverse correlation problematic when analyzing neural responsesto natural stimuli

The difference in applying the reverse correlation to stimuli drawn froma correlated gaussian ensemble versus a nongaussian one is illustrated inFigures 8b and 8c In the rst case shown in Figure 8b stimuli are drawnfrom a correlated gaussian ensemble with the covariance matrix equal tothat of natural images In the second case shown in Figure 8c the patches ofphotos are taken as stimuli The STA is broadened in both cases Althoughthe two-point correlations are just as strong in the case of gaussian stimuli

244 T Sharpee N Rust and W Bialek

Figure 8 The nongaussian character of correlations present in natural scenesinvalidates the reverse correlation method for neurons with a nonlinear input-output function (a) A model visual neuron has one relevant dimension Oe1 andthe nonlinear input-output function The ldquoexactrdquo STA is used (see equation 42)to separate effects of neural noise from alterations introduced by the methodThe decorrelated ldquoexactrdquo STA is obtained by multiplying the ldquoexactrdquo STA bythe inverse of the covariance matrix according to equation A2 (b) Stimuli aretaken from a correlated gaussian noise ensemble The effect of correlations inSTA can be removed according to equation A2 (c) When patches of photos aretaken as stimuli for the same model neuron as in b the decorrelation proceduregives an altered version of the model lter The two stimulus ensembles havethe same covariance matrix

as they are in the natural stimuli ensemble gaussian correlations can besuccessfully removed from the STA according to equation A2 to obtain themodel lter On the contrary an attempt to use reverse correlation withnatural stimuli results in an altered version of the model lter We reiteratethat for this example the apparent noise in the decorrelated vector is not

Analyzing Neural Responses to Natural Signals 245

due to neural noise or nite data sets since the ldquoexactrdquo STA has been used(see equation 42) in all calculations presented in Figures 8 and 9

The reverse correlation method gives the correct answer for any distribu-tion of signals if the probability of generating a spike is a linear function ofsi since then the second term in equation A6 is zero In particular a linearinput-output relation could arise due to a neural noise whose variance ismuch larger than the variance of the signal itself This point is illustrated inFigures 9a 9b and 9c where the reverse correlation method is applied to athreshold input-output function at low moderate and high signal-to-noiseratios For small signal-to-noise ratios where the noise standard deviationis similar to that of projections s1 the threshold nonlinearity in the input-output function is masked by noise and is effectively linear In this limitthe reverse correlation can be applied with the exact STA However for ex-perimentally calculated STA at low signal-to-noise ratios the decorrelationprocedure results in strong noise amplication At higher signal-to-noise ra-tios decorrelation fails due to the nonlinearity of the input-output functionin accordance with equation A6

Appendix B: Maxima of I(v): What Do They Mean?

The relevant subspace of dimensionality K can be found by maximizing information simultaneously with respect to K vectors. The result of maximization with respect to a number of vectors that is less than the true dimensionality of the relevant subspace may produce vectors that have components in the irrelevant subspace. This happens only in the presence of correlations in stimuli. As an illustration, we consider the situation where the dimensionality of the relevant subspace K = 2 and vector ê_1 describes the most informative direction within the relevant subspace. We show here that although the gradient of information is perpendicular to both ê_1 and ê_2, it may have components outside the relevant subspace. Therefore, the vector v_max that corresponds to the maximum of I(v) will lie outside the relevant subspace. We recall from equation 3.1 that

\[
\nabla I_{\hat e_1} = \int ds_1 \, P(s_1) \, \frac{d}{ds_1}\left[\frac{P(s_1|\mathrm{spike})}{P(s_1)}\right] \Bigl(\langle \mathbf{s} \mid s_1, \mathrm{spike}\rangle - \langle \mathbf{s} \mid s_1\rangle\Bigr) \qquad (\mathrm{B.1})
\]

We can rewrite the conditional averages as ⟨s|s_1⟩ = ∫ ds_2 P(s_1, s_2) ⟨s|s_1, s_2⟩ / P(s_1) and ⟨s|s_1, spike⟩ = ∫ ds_2 f(s_1, s_2) P(s_1, s_2) ⟨s|s_1, s_2⟩ / P(s_1|spike), so that

\[
\nabla I_{\hat e_1} = \int ds_1 \, ds_2 \, P(s_1, s_2) \, \langle \mathbf{s} \mid s_1, s_2\rangle \, \frac{P(\mathrm{spike}|s_1, s_2) - P(\mathrm{spike}|s_1)}{P(\mathrm{spike})} \times \frac{d}{ds_1} \ln\left[\frac{P(s_1|\mathrm{spike})}{P(s_1)}\right] \qquad (\mathrm{B.2})
\]

Because we assume that the vector ê_1 is the most informative within the relevant subspace, ê_1·∇I = ê_2·∇I = 0, so that the integral in equation B.2 is zero for those directions in which the component of the vector ⟨s|s_1, s_2⟩ changes linearly with s_1 and s_2. For uncorrelated stimuli, this is true for all directions, so that the most informative vector within the relevant subspace is also the most informative in the overall stimulus space. In the presence of correlations, the gradient may have nonzero components along some irrelevant directions if the projection of the vector ⟨s|s_1, s_2⟩ on those directions is not a linear function of s_1 and s_2. By looking for a maximum of information, we will therefore be driven outside the relevant subspace. The deviation of v_max from the relevant subspace is also proportional to the strength of the dependence on the second parameter s_2, because of the factor [P(s_1, s_2|spike)/P(s_1, s_2) − P(s_1|spike)/P(s_1)] in the integrand.

Figure 9: Application of the reverse correlation method to a model visual neuron with one relevant dimension ê_1 and a threshold input-output function, for decreasing values of noise variance σ/σ_{s_1(s)} ≈ 6.1, 0.61, and 0.06 in (a), (b), and (c), respectively. The model P(spike|s_1) becomes effectively linear when the signal-to-noise ratio is small. The reverse correlation can be used together with natural stimuli if the input-output function is linear. Otherwise, the deviations between the decorrelated STA and the model filter increase with the nonlinearity of P(spike|s_1).
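The Appendix B argument above can also be checked numerically under stated assumptions: estimate the conditional averages in equation B.1 by binning the projection s_1, assemble the gradient, and compare how much of it lies outside a known two-dimensional relevant subspace for white versus correlated stimuli. In the sketch below, the AND-like model cell, the AR(1) correlation structure, the probe direction, and the bin count are illustrative choices, not quantities taken from the text.

```python
import numpy as np

rng = np.random.default_rng(1)
D, N, NBINS = 20, 400_000, 25
e1, e2 = np.eye(D)[0], np.eye(D)[1]          # relevant subspace = first two axes

def info_gradient(v, s, spikes, nbins=NBINS):
    """Estimate the gradient of information at v, following equation B.1, by binning x = s . v."""
    x = s @ v
    edges = np.quantile(x, np.linspace(0.0, 1.0, nbins + 1))
    idx = np.clip(np.digitize(x, edges[1:-1]), 0, nbins - 1)
    Px = np.bincount(idx, minlength=nbins) / len(x)
    Px_spike = np.bincount(idx, weights=spikes, minlength=nbins) / spikes.sum()
    centers = 0.5 * (edges[:-1] + edges[1:])
    dratio = np.gradient(Px_spike / np.maximum(Px, 1e-12), centers)
    grad = np.zeros(D)
    for b in range(nbins):
        inbin = idx == b
        if not inbin.any() or spikes[inbin].sum() == 0:
            continue
        s_mean = s[inbin].mean(axis=0)                                      # <s | s1 in bin>
        s_mean_spike = np.average(s[inbin], axis=0, weights=spikes[inbin])  # <s | s1 in bin, spike>
        grad += Px[b] * dratio[b] * (s_mean_spike - s_mean)
    return grad

def out_of_subspace_fraction(corr):
    C = np.array([[corr ** abs(i - j) for j in range(D)] for i in range(D)])
    s = rng.standard_normal((N, D)) @ np.linalg.cholesky(C).T
    spikes = ((s @ e1 > 0.5) & (s @ e2 > 0.5)).astype(float)   # nonlinear AND-like cell
    g = info_gradient(e1, s, spikes)
    inside = (g @ e1) ** 2 + (g @ e2) ** 2
    return 1.0 - inside / (g @ g)

print("white stimuli     :", out_of_subspace_fraction(0.0))   # close to zero
print("correlated stimuli:", out_of_subspace_fraction(0.6))   # substantially larger
```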

Appendix C: The Gradient of Information

According to expression 2.5, the information I(v) depends on the vector v only through the probability distributions P_v(x) and P_v(x|spike). Therefore, we can express the gradient of information in terms of gradients of those probability distributions:

\[
\nabla_{\mathbf{v}} I = \frac{1}{\ln 2} \int dx \left[ \ln\!\left(\frac{P_{\mathbf{v}}(x|\mathrm{spike})}{P_{\mathbf{v}}(x)}\right) \nabla_{\mathbf{v}} P_{\mathbf{v}}(x|\mathrm{spike}) - \frac{P_{\mathbf{v}}(x|\mathrm{spike})}{P_{\mathbf{v}}(x)} \, \nabla_{\mathbf{v}} P_{\mathbf{v}}(x) \right] \qquad (\mathrm{C.1})
\]

where we took into account that ∫ dx P_v(x|spike) = 1 and does not change with v. To find gradients of the probability distributions, we note that

\[
\nabla_{\mathbf{v}} P_{\mathbf{v}}(x) = \nabla_{\mathbf{v}} \left[ \int d\mathbf{s} \, P(\mathbf{s}) \, \delta(x - \mathbf{s}\cdot\mathbf{v}) \right] = - \int d\mathbf{s} \, P(\mathbf{s}) \, \mathbf{s} \, \delta'(x - \mathbf{s}\cdot\mathbf{v}) = - \frac{d}{dx}\Bigl[ P_{\mathbf{v}}(x) \, \langle \mathbf{s} \mid x \rangle \Bigr] \qquad (\mathrm{C.2})
\]

and analogously for P_v(x|spike):

\[
\nabla_{\mathbf{v}} P_{\mathbf{v}}(x|\mathrm{spike}) = - \frac{d}{dx}\Bigl[ P_{\mathbf{v}}(x|\mathrm{spike}) \, \langle \mathbf{s} \mid x, \mathrm{spike} \rangle \Bigr] \qquad (\mathrm{C.3})
\]

Substituting expressions C.2 and C.3 into equation C.1 and integrating once by parts, we obtain

\[
\nabla_{\mathbf{v}} I = \int dx \, P_{\mathbf{v}}(x) \, \Bigl[ \langle \mathbf{s} \mid x, \mathrm{spike} \rangle - \langle \mathbf{s} \mid x \rangle \Bigr] \cdot \frac{d}{dx}\left[\frac{P_{\mathbf{v}}(x|\mathrm{spike})}{P_{\mathbf{v}}(x)}\right],
\]

which is expression 3.1 of the main text.
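The same binned distributions that enter this gradient also give the objective being maximized: the information carried by the projection x = s · v is the Kullback-Leibler divergence between P_v(x|spike) and P_v(x) (cf. expression 2.5). A minimal histogram estimator is sketched below; the toy model neuron, the quantile binning, and the bin count are assumptions made so the example is self-contained.

```python
import numpy as np

def projection_information(v, s, spikes, nbins=30):
    """Estimate I(v) in bits from the binned distributions P_v(x) and P_v(x|spike)."""
    x = s @ (v / np.linalg.norm(v))
    edges = np.quantile(x, np.linspace(0.0, 1.0, nbins + 1))
    idx = np.clip(np.digitize(x, edges[1:-1]), 0, nbins - 1)
    Px = np.bincount(idx, minlength=nbins) / len(x)
    Px_spike = np.bincount(idx, weights=spikes, minlength=nbins) / spikes.sum()
    good = Px_spike > 0
    return float(np.sum(Px_spike[good] * np.log2(Px_spike[good] / Px[good])))

# Toy model neuron with one relevant dimension and a hard threshold (assumptions).
rng = np.random.default_rng(2)
D = 30
s = rng.standard_normal((200_000, D))
e1 = np.eye(D)[0]
spikes = (s @ e1 > 1.0).astype(float)

print("I(e1):      ", projection_information(e1, s, spikes))   # captures nearly all spike information
print("I(random v):", projection_information(rng.standard_normal(D), s, spikes))  # little information
```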

Acknowledgments

We thank K. D. Miller for many helpful discussions. Work at UCSF was supported in part by the Sloan and Swartz foundations and by a training grant from the NIH. Our collaboration began at the Marine Biological Laboratory in a course supported by grants from NIMH and the Howard Hughes Medical Institute.

References

Aguera y Arcas, B., Fairhall, A. L., & Bialek, W. (2003). Computation in a single neuron: Hodgkin and Huxley revisited. Neural Comp., 15, 1715–1749.
Baddeley, R., Abbott, L. F., Booth, M. C. A., Sengpiel, F., Freeman, T., Wakeman, E. A., & Rolls, E. T. (1997). Responses of neurons in primary and inferior temporal visual cortices to natural scenes. Proc. R. Soc. Lond. B, 264, 1775–1783.
Barlow, H. (1961). Possible principles underlying the transformations of sensory images. In W. Rosenblith (Ed.), Sensory communication (pp. 217–234). Cambridge, MA: MIT Press.
Barlow, H. (2001). Redundancy reduction revisited. Network: Comput. Neural Syst., 12, 241–253.
Bialek, W. (2002). Thinking about the brain. In H. Flyvbjerg, F. Julicher, P. Ormos, & F. David (Eds.), Physics of biomolecules and cells (pp. 485–577). Berlin: Springer-Verlag. See also physics/0205030.
Footnote 6: Where available, we give references to the physics e-print archive, which may be found online at http://arxiv.org/abs; thus, Bialek (2002) is available online at http://arxiv.org/abs/physics/0205030. Published papers may differ from the versions posted to the archive.
Bialek, W., & de Ruyter van Steveninck, R. R. (2003). Features and dimensions: Motion estimation in fly vision. Unpublished manuscript.
Brenner, N., Bialek, W., & de Ruyter van Steveninck, R. R. (2000). Adaptive rescaling maximizes information transmission. Neuron, 26, 695–702.
Brenner, N., Strong, S. P., Koberle, R., Bialek, W., & de Ruyter van Steveninck, R. R. (2000). Synergy in a neural code. Neural Computation, 12, 1531–1552. See also physics/9902067.
Chichilnisky, E. J. (2001). A simple white noise analysis of neuronal light responses. Network: Comput. Neural Syst., 12, 199–213.
Creutzfeldt, O. D., & Northdurft, H. C. (1978). Representation of complex visual stimuli in the brain. Naturwissenschaften, 65, 307–318.
Cover, T. M., & Thomas, J. A. (1991). Information theory. New York: Wiley.
de Boer, E., & Kuyper, P. (1968). Triggered correlation. IEEE Trans. Biomed. Eng., 15, 169–179.
de Ruyter van Steveninck, R. R., & Bialek, W. (1988). Real-time performance of a movement-sensitive neuron in the blowfly visual system: Coding and information transfer in short spike sequences. Proc. R. Soc. Lond. B, 265, 259–265.
de Ruyter van Steveninck, R. R., Borst, A., & Bialek, W. (2000). Real time encoding of motion: Answerable questions and questionable answers from the fly's visual system. In J. M. Zanker & J. Zeil (Eds.), Motion vision: Computational, neural and ecological constraints (pp. 279–306). New York: Springer-Verlag. See also physics/0004060.
de Ruyter van Steveninck, R. R., Lewen, G. D., Strong, S. P., Koberle, R., & Bialek, W. (1997). Reproducibility and variability in neural spike trains. Science, 275, 1805–1808. See also cond-mat/9603127.
Dimitrov, A. G., & Miller, J. P. (2001). Neural coding and decoding: Communication channels and quantization. Network: Comput. Neural Syst., 12, 441–472.
Dong, D. W., & Atick, J. J. (1995). Statistics of natural time-varying images. Network: Comput. Neural Syst., 6, 345–358.
Fairhall, A. L., Lewen, G. D., Bialek, W., & de Ruyter van Steveninck, R. R. (2001). Efficiency and ambiguity in an adaptive neural code. Nature, 412, 787–792.
Kara, P., Reinagel, P., & Reid, R. C. (2000). Low response variability in simultaneously recorded retinal, thalamic, and cortical neurons. Neuron, 27, 635–646.
Lewen, G. D., Bialek, W., & de Ruyter van Steveninck, R. R. (2001). Neural coding of naturalistic motion stimuli. Network: Comput. Neural Syst., 12, 317–329. See also physics/0103088.
Mainen, Z. F., & Sejnowski, T. J. (1995). Reliability of spike timing in neocortical neurons. Science, 268, 1503–1506.
Paninski, L. (2003a). Convergence properties of three spike-triggered analysis techniques. Network: Comput. Neural Syst., 14, 437–464.
Paninski, L. (2003b). Estimation of entropy and mutual information. Neural Computation, 15, 1191–1253.
Panzeri, S., & Treves, A. (1996). Analytical estimates of limited sampling biases in different information measures. Network: Comput. Neural Syst., 7, 87–107.
Pola, G., Schultz, S. R., Petersen, R., & Panzeri, S. (2002). A practical guide to information analysis of spike trains. In R. Kotter (Ed.), Neuroscience databases: A practical guide (pp. 137–152). Norwell, MA: Kluwer.
Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P. (1992). Numerical recipes in C: The art of scientific computing. Cambridge: Cambridge University Press.
Reinagel, P., & Reid, R. C. (2000). Temporal coding of visual information in the thalamus. J. Neurosci., 20, 5392–5400.
Rieke, F., Bodnar, D. A., & Bialek, W. (1995). Naturalistic stimuli increase the rate and efficiency of information transmission by primary auditory afferents. Proc. R. Soc. Lond. B, 262, 259–265.
Rieke, F., Warland, D., de Ruyter van Steveninck, R. R., & Bialek, W. (1997). Spikes: Exploring the neural code. Cambridge, MA: MIT Press.
Ringach, D. L., Hawken, M. J., & Shapley, R. (2002). Receptive field structure of neurons in monkey visual cortex revealed by stimulation with natural image sequences. Journal of Vision, 2, 12–24.
Ringach, D. L., Sapiro, G., & Shapley, R. (1997). A subspace reverse-correlation technique for the study of visual neurons. Vision Res., 37, 2455–2464.
Rolls, E. T., Aggelopoulos, N. C., & Zheng, F. (2003). The receptive fields of inferior temporal cortex neurons in natural scenes. J. Neurosci., 23, 339–348.
Ruderman, D. L. (1994). The statistics of natural images. Network: Comput. Neural Syst., 5, 517–548.
Ruderman, D. L., & Bialek, W. (1994). Statistics of natural images: Scaling in the woods. Phys. Rev. Lett., 73, 814–817.
Schwartz, O., Chichilnisky, E. J., & Simoncelli, E. (2002). Characterizing neural gain control using spike-triggered covariance. In T. G. Dietterich, S. Becker, & Z. Ghahramani (Eds.), Advances in neural information processing (pp. 269–276). Cambridge, MA: MIT Press.
Sen, K., Theunissen, F. E., & Doupe, A. J. (2001). Feature analysis of natural sounds in the songbird auditory forebrain. J. Neurophysiol., 86, 1445–1458.
Simoncelli, E., & Olshausen, B. A. (2001). Natural image statistics and neural representation. Annu. Rev. Neurosci., 24, 1193–1216.
Smirnakis, S. M., Berry, M. J., Warland, D. K., Bialek, W., & Meister, M. (1996). Adaptation of retinal processing to image contrast and spatial scale. Nature, 386, 69–73.
Smyth, D., Willmore, B., Baker, G. E., Thompson, I. D., & Tolhurst, D. J. (2003). The receptive-field organization of simple cells in primary visual cortex of ferrets under natural scene stimulation. J. Neurosci., 23, 4746–4759.
Stanley, G. B., Li, F. F., & Dan, Y. (1999). Reconstruction of natural scenes from ensemble responses in the lateral geniculate nucleus. J. Neurosci., 19, 8036–8042.
Strong, S. P., Koberle, R., de Ruyter van Steveninck, R. R., & Bialek, W. (1998). Entropy and information in neural spike trains. Phys. Rev. Lett., 80, 197–200.
Theunissen, F. E., Sen, K., & Doupe, A. J. (2000). Spectral-temporal receptive fields of nonlinear auditory neurons obtained using natural sounds. J. Neurosci., 20, 2315–2331.
Tishby, N., Pereira, F. C., & Bialek, W. (1999). The information bottleneck method. In B. Hajek & R. S. Sreenivas (Eds.), Proceedings of the 37th Allerton Conference on Communication, Control and Computing (pp. 368–377). Urbana: University of Illinois. See also physics/0004057.
Touryan, J., Lau, B., & Dan, Y. (2002). Isolation of relevant visual features from random stimuli for cortical complex cells. J. Neurosci., 22, 10811–10818.
Treves, A., & Panzeri, S. (1995). The upward bias in measures of information derived from limited data samples. Neural Comp., 7, 399–407.
von der Twer, T., & Macleod, D. I. A. (2001). Optimal nonlinear codes for the perception of natural colours. Network: Comput. Neural Syst., 12, 395–407.
Vickers, N. J., Christensen, T. A., Baker, T., & Hildebrand, J. G. (2001). Odour-plume dynamics influence the brain's olfactory code. Nature, 410, 466–470.
Vinje, W. E., & Gallant, J. L. (2000). Sparse coding and decorrelation in primary visual cortex during natural vision. Science, 287, 1273–1276.
Vinje, W. E., & Gallant, J. L. (2002). Natural stimulation of the nonclassical receptive field increases information transmission efficiency in V1. J. Neurosci., 22, 2904–2915.
Voss, R. F., & Clarke, J. (1975). "1/f noise" in music and speech. Nature, 258, 317–318.
Weliky, M., Fiser, J., Hunt, R., & Wagner, D. N. (2003). Coding of natural scenes in primary visual cortex. Neuron, 37, 703–718.

Received January 21, 2003; accepted August 6, 2003.

Page 20: Analyzing Neural Responses to Natural Signals: Maximally ...wbialek/rome/refs/sharpee+al_04.pdfAnalyzing Neural Responses to Natural Signals 225 Figure 1: Schematic illustration of

242 T Sharpee N Rust and W Bialek

direction in stimulus space Those vectors that maximize the informationand account for the total information per response of interest span the rel-evant subspace The method allows multiple dimensions to be found Thereconstruction of the relevant subspace is done without assuming a partic-ular form of the input-output function It can be strongly nonlinear withinthe relevant subspace and is estimated from experimental histograms foreach trial direction independently Most important this method can be usedwith any stimulus ensemble even those that are strongly nongaussian asin the case of natural signals We have illustrated the method on modelneurons responding to natural scenes and sounds We expect the currentimplementation of the method to be most useful when several most infor-mative vectors (middot 10 depending on their dimensionality) are to be analyzedfor neurons probed by natural scenes This technique could be particularlyuseful in describing sensory processing in poorly understood regions ofhigher-level sensory cortex (such as visual areas V2 V4 and IT and audi-tory cortex beyond A1) where white noise stimulation is known to be lesseffective than naturalistic stimuli

Appendix A Limitations of the Reverse Correlation Method

Here we examine what sort of deviations one can expect when applying thereverse correlation method to natural stimuli even in the model with just onerelevant dimension There are two factors that when combined invalidatethe reverse correlation method the nongaussian character of correlationsand the nonlinearity of the input-output function (Ringach Sapiro amp Shap-ley 1997) In its original formulation (de Boer amp Kuyper 1968) the neuronis probed by white noise and the relevant dimension Oe1 is given by theSTA Oe1 hsrsi If the signals are not white that is the covariance matrixCij D hsisji is not a unit matrix then the STA is a broadened version of theoriginal lter Oe1 This can be seen by noting that for any function Fs ofgaussian variables fsig the identity holds

hsiFsi D hsisjihsj Fsi j acute sj (A1)

When property A1 isapplied to the vector components of the STA hsirsi DCijhjrsi Since we work within the assumption that the ring rate is a(nonlinear) function of projection onto one lter Oe1 rs D rs1 the latteraverage is proportional to the model lter itself hjri D Oe1jhr0s1i Thereforewe arrive at the prescription of the reverse correlation method

Oe1i [Ciexcl1]ijhsjrsi (A2)

The gaussian property is necessary in order to represent the STA as a convo-lution of the covariance matrix Cij of the stimulus ensemble and the model

Analyzing Neural Responses to Natural Signals 243

lter To understand how the reconstructed vector obtained according toequation A2 deviates from the relevant one we consider weakly nongaus-sian stimuli with the probability distribution

PnGs D 1Z

P0sesup2H1s (A3)

where P0s is the gaussian probability distribution with covariance matrixC and the normalization factor Z D hesup2H1si The function H1 describesdeviations of the probability distribution from gaussian and therefore wewill set hsiH1i D 0 and hsisjH1i D 0 since these averages can be accountedfor in the gaussian ensemble In what follows we will keep only the rst-order terms in perturbation parameter sup2 Using property A1 we nd theSTA to be given by

hsirinG D hsisjipoundhjri C sup2hrjH1i

curren (A4)

where averages are taken with respect to the gaussian distribution Simi-larly the covariance matrix Cij evaluated with respect to the nongaussianensemble is given by

Cij D1Z

hsisjesup2H1i D hsisji C sup2hsiskihsjkH1i (A5)

so that to the rst order in sup2 hsisji D Cij iexcl sup2CikhsjkH1i Combining thiswith equation A4 we get

hsirinG D const pound Cij Oe1 j C sup2Cijhiexclr iexcl s1hr0i

centjH1i (A6)

The second term in equation A6 prevents the application of the reversecorrelation method for nongaussian signals Indeed if we multiply the STAequation A6 with the inverse of the a priori covariance matrix Cij accordingto the reverse correlation method equation A2 we no longer obtain the RFOe1 The deviation of the obtained answer from the true RF increases with sup2which measures the deviation of the probability distribution from gaussianSince natural stimuli are known to be strongly nongaussian this makes theuse of the reverse correlation problematic when analyzing neural responsesto natural stimuli

The difference in applying the reverse correlation to stimuli drawn froma correlated gaussian ensemble versus a nongaussian one is illustrated inFigures 8b and 8c In the rst case shown in Figure 8b stimuli are drawnfrom a correlated gaussian ensemble with the covariance matrix equal tothat of natural images In the second case shown in Figure 8c the patches ofphotos are taken as stimuli The STA is broadened in both cases Althoughthe two-point correlations are just as strong in the case of gaussian stimuli

244 T Sharpee N Rust and W Bialek

Figure 8 The nongaussian character of correlations present in natural scenesinvalidates the reverse correlation method for neurons with a nonlinear input-output function (a) A model visual neuron has one relevant dimension Oe1 andthe nonlinear input-output function The ldquoexactrdquo STA is used (see equation 42)to separate effects of neural noise from alterations introduced by the methodThe decorrelated ldquoexactrdquo STA is obtained by multiplying the ldquoexactrdquo STA bythe inverse of the covariance matrix according to equation A2 (b) Stimuli aretaken from a correlated gaussian noise ensemble The effect of correlations inSTA can be removed according to equation A2 (c) When patches of photos aretaken as stimuli for the same model neuron as in b the decorrelation proceduregives an altered version of the model lter The two stimulus ensembles havethe same covariance matrix

as they are in the natural stimuli ensemble gaussian correlations can besuccessfully removed from the STA according to equation A2 to obtain themodel lter On the contrary an attempt to use reverse correlation withnatural stimuli results in an altered version of the model lter We reiteratethat for this example the apparent noise in the decorrelated vector is not

Analyzing Neural Responses to Natural Signals 245

due to neural noise or nite data sets since the ldquoexactrdquo STA has been used(see equation 42) in all calculations presented in Figures 8 and 9

The reverse correlation method gives the correct answer for any distribu-tion of signals if the probability of generating a spike is a linear function ofsi since then the second term in equation A6 is zero In particular a linearinput-output relation could arise due to a neural noise whose variance ismuch larger than the variance of the signal itself This point is illustrated inFigures 9a 9b and 9c where the reverse correlation method is applied to athreshold input-output function at low moderate and high signal-to-noiseratios For small signal-to-noise ratios where the noise standard deviationis similar to that of projections s1 the threshold nonlinearity in the input-output function is masked by noise and is effectively linear In this limitthe reverse correlation can be applied with the exact STA However for ex-perimentally calculated STA at low signal-to-noise ratios the decorrelationprocedure results in strong noise amplication At higher signal-to-noise ra-tios decorrelation fails due to the nonlinearity of the input-output functionin accordance with equation A6

Appendix B Maxima of Iv What Do They Mean

The relevant subspace of dimensionality K can be found by maximizinginformation simultaneously with respect to K vectors The result of max-imization with respect to a number of vectors that is less than the truedimensionality of the relevant subspace may produce vectors that havecomponents in the irrelevant subspace This happens only in the presenceof correlations in stimuli As an illustration we consider the situation wherethe dimensionality of the relevant subspace K D 2 and vector Oe1 describesthe most informative direction within the relative subspace We show herethat although the gradient of information is perpendicular to both Oe1 andOe2 it may have components outside the relevant subspace Therefore thevector vmax that corresponds to the maximum of Iv will lie outside therelevant subspace We recall from equation 31 that

rIOe1 DZ

ds1Ps1d

ds1

Ps1jspike

Ps1hsjs1 spikei iexcl hsjs1i (B1)

We can rewrite the conditional averages hsjs1i DR

ds2Ps1 s2hsjs1 s2i=Ps1

and hsjs1 spikei DR

ds2 f s1 s2Ps1 s2hsjs1 s2i=Ps1jspike so that

rIOe1 DZ

ds1ds2Ps1 s2hsjs1 s2iPspikejs1 s2 iexcl Pspikejs1

Pspike

pound dds1

lnPs1jspike

Ps1 (B2)

Because we assume that the vector Oe1 is the most informative within therelevant subspace Oe1rI D Oe2rI D 0 so that the integral in equation B2

246 T Sharpee N Rust and W Bialek

Figure 9 Application of the reverse correlation method to a model visual neuronwith one relevant dimension Oe1 and a threshold input-output functionof decreas-ing values of noise variance frac34=frac34s1s frac14 61 061 006 in a b and c respectivelyThe model Pspikejs1 becomes effectively linear when the signal-to-noise ratiois small The reverse correlation can be used together with natural stimuli if theinput-output function is linear Otherwise the deviations between the decorre-lated STA and the model lter increase with the nonlinearity of Pspikejs1

is zero for those directions in which the component of the vector hsjs1 s2ichanges linearly with s1 and s2 For uncorrelated stimuli this is true forall directions so that the most informative vector within the relevant sub-space is also the most informative in the overall stimulus space In thepresence of correlations the gradient may have nonzero components alongsome irrelevant directions if projection of the vector hsjs1 s2i on those di-rections is not a linear function of s1 and s2 By looking for a maximumof information we will therefore be driven outside the relevant subspaceThe deviation of vmax from the relevant subspace is also proportional to the

Analyzing Neural Responses to Natural Signals 247

strength of the dependence on the second parameter s2 because of the factor[Ps1 s2jspike=Ps1 s2 iexcl Ps1jspike=Ps1] in the integrand

Appendix C The Gradient of Information

According to expression 25 the information Iv depends on the vector vonly through the probability distributions Pvx and Pvxjspike Thereforewe can express the gradient of information in terms of gradients of thoseprobability distributions

rvI D1

ln 2

Zdx

microln

Pvxjspike

PvxrvPvxjspike

iexcl Pvxjspike

PvxrvPvx

para (C1)

where we took into account thatR

dxPvxjspike D 1 and does not changewith v To nd gradients of the probability distributions we note that

rvPvx D rv

microZdsPsplusmnx iexcl s cent v

paraD iexcl

ZdsPssplusmn0x iexcl s cent v

D iexcl ddx

poundpxhsjxi

curren (C2)

and analogously for Pvxjspike

rvPvxjspike D iexcl ddx

poundpxjspikehsjx spikei

curren (C3)

Substituting expressions C2 and C3 into equation C1 and integrating onceby parts we obtain

rvI DZ

dxPvxpoundhsjx spikei iexcl hsjxi

currencentmicro

ddx

Pvxjspike

Pvx

para

which is expression 31 of the main text

Acknowledgments

We thank KD Miller for many helpful discussions Work at UCSF was sup-ported in part by the Sloan and Swartz foundations and by a training grantfrom the NIH Our collaboration began at the Marine Biological Labora-tory in a course supported by grants from NIMH and the Howard HughesMedical Institute

248 T Sharpee N Rust and W Bialek

References

Aguera y Arcas B Fairhall A L amp Bialek W (2003) Computation in a singleneuron Hodgkin and Huxley revisited Neural Comp 15 1715ndash1749

Baddeley R Abbott L F Booth MCA Sengpiel F Freeman T WakemanE A amp Rolls E T (1997) Responses of neurons in primary and inferiortemporal visual cortices to natural scenes Proc R Soc Lond B 264 1775ndash1783

Barlow H (1961) Possible principles underlying the transformations of sen-sory images In W Rosenblith (Ed) Sensory communication (pp 217ndash234)Cambridge MA MIT Press

Barlow H (2001) Redundancy reduction revisited Network Comput NeuralSyst 12 241ndash253

Bialek W (2002) Thinking about the brain In H Flyvbjerg F Julicher P Or-mos amp F David (Eds) Physics of Biomolecules and Cells (pp 485ndash577) BerlinSpringer-Verlag See also physics02050306

Bialek W amp de Ruyter van Steveninck R R (2003) Features and dimensionsMotion estimation in y vision Unpublished manuscript

Brenner N Bialek W amp de Ruyter van Steveninck R R (2000) Adaptiverescaling maximizes information transmission Neuron 26 695ndash702

Brenner N Strong S P Koberle R Bialek W amp de Ruyter van SteveninckR R (2000) Synergy in a neural code Neural Computation 12 1531ndash1552See also physics9902067

Chichilnisky E J (2001) A simple white noise analysis of neuronal light re-sponses Network Comput Neural Syst 12 199ndash213

Creutzfeldt O D amp Northdurft H C (1978) Representation of complex visualstimuli in the brain Naturwissenschaften 65 307ndash318

Cover T M amp Thomas J A (1991) Information theory New York Wileyde Boer E amp Kuyper P (1968) Triggered correlation IEEE Trans Biomed Eng

15 169ndash179de Ruyter van Steveninck R R amp Bialek W (1988) Real-time performance

of a movement-sensitive neuron in the blowy visual system Coding andinformation transfer in short spike sequences Proc R Soc Lond B 265259ndash265

de Ruyter vanSteveninck R RBorst A amp Bialek W (2000)Real time encodingof motion Answerable questions and questionable answers from the yrsquosvisual system In J M Zanker amp J Zeil (Eds) Motion vision Computationalneural and ecologicalconstraints (pp 279ndash306) New York Springer-Verlag Seealso physics0004060

de Ruyter van Steveninck R R Lewen G D Strong S P Koberle R amp BialekW (1997) Reproducibility and variability in neural spike trains Science 2751805ndash1808 See also cond-mat9603127

6 Where available we give references to the physics endashprint archive which may befound on-line at httparxivorgabs thus Bialek (2002) is available on-line athttparxivorgabsphysics0205030 Published papers may differ from the versionsposted to the archive

Analyzing Neural Responses to Natural Signals 249

Dimitrov A G amp Miller J P (2001) Neural coding and decoding Communi-cation channels and quantization Network Comput Neural Syst 12 441ndash472

Dong D W amp Atick J J (1995) Statistics of natural time-varying imagesNetwork Comput Neural Syst 6 345ndash358

Fairhall A L Lewen G D Bialek W amp de Ruyter van Steveninck R R (2001)Efciency and ambiguity in an adaptive neural code Nature 412 787ndash792

Kara P Reinagel P amp Reid R C (2000) Low response variability in simulta-neously recorded retinal thalamic and cortical neurons Neuron 27 635ndash646

Lewen G D Bialek W amp de Ruyter van Steveninck R R (2001)Neural codingof naturalistic motion stimuli Network Comput Neural Syst 12 317ndash329 Seealso physics0103088

Mainen Z F amp Sejnowski T J (1995) Reliability of spike timing in neocorticalneurons Science 268 1503ndash1506

Paninski L (2003a) Convergence properties of three spikendashtriggered analysistechniques Network Compt in Neural Systems 14 437ndash464

Paninski L (2003b) Estimation of entropy and mutual information NeuralComputation 15 1191ndash1253

Panzeri S amp Treves A (1996) Analytical estimates of limited sampling biasesin different information measures Network Comput Neural Syst 7 87ndash107

Pola G Schultz S R Petersen R amp Panzeri S (2002) A practical guide toinformation analysis of spike trains In R Kotter (Ed) Neuroscience databasesA practical guide (pp 137ndash152) Norwell MA Kluwer

Press W H Teukolsky S A Vetterling W T amp Flannery B P (1992) Numericalrecipes in C The art of scientic computing Cambridge Cambridge UniversityPress

Reinagel P amp Reid R C (2000) Temporal coding of visual information in thethalamus J Neurosci 20 5392ndash5400

Rieke F Bodnar D A amp Bialek W (1995) Naturalistic stimuli increase therate and efciency of information transmission byprimary auditory afferentsProc R Soc Lond B 262 259ndash265

Rieke F Warland D de Ruyter van Steveninck R R amp Bialek W (1997)Spikes Exploring the neural code Cambridge MA MIT Press

Ringach D L Hawken M J amp Shapley R (2002) Receptive eld structure ofneurons in monkey visual cortex revealed by stimulation with natural imagesequences Journal of Vision 2 12ndash24

Ringach D L Sapiro G amp Shapley R (1997) A subspace reverse-correlationtechnique for the study of visual neurons Vision Res 37 2455ndash2464

Rolls E T Aggelopoulos N C amp Zheng F (2003) The receptive elds ofinferior temporal cortex neurons in natural scenes J Neurosci 23 339ndash348

Ruderman D L (1994) The statistics of natural images Network Compt NeuralSyst 5 517ndash548

Ruderman D L amp Bialek W (1994) Statistics of natural images Scaling in thewoods Phys Rev Lett 73 814ndash817

Schwartz O Chichilnisky E J amp Simoncelli E (2002) Characterizing neuralgain control using spike-triggered covariance In T G Dietterich S Becker ampZ Ghahramani (Eds) Advances in Neural InformationProcessing (pp 269ndash276)Cambridge MA MIT Press

250 T Sharpee N Rust and W Bialek

Sen K Theunissen F E amp Doupe A J (2001) Feature analysis of naturalsounds in the songbird auditory forebrain J Neurophysiol 86 1445ndash1458

Simoncelli E amp Olshausen B A (2001) Natural image statistics and neuralrepresentation Annu Rev Neurosci 24 1193ndash1216

Smirnakis S M Berry M J Warland D K Bialek W amp Meister M (1996)Adaptation of retinal processing to image contrast and spatial scale Nature386 69ndash73

Smyth D Willmore B Baker G E Thompson I D amp Tolhurst D J (2003)The receptive-eld organization of simple cells in primary visual cortex offerrets under natural scene stimulation J Neurosci 23 4746ndash4759

Stanley G B Li F F amp Dan Y (1999) Reconstruction of natural scenes fromensemble responses in the lateral geniculate nucleus J Neurosci 19 8036ndash8042

Strong S P Koberle R de Ruyter van Steveninck R R amp Bialek W (1998)Entropy and information in neural spike trains Phys Rev Lett 80 197ndash200

Theunissen F E Sen K amp Doupe A J (2000) Spectral-temporal receptiveelds of nonlinear auditory neurons obtained using natural sounds J Neu-rosci 20 2315ndash2331

Tishby N Pereira F C amp Bialek W (1999)The information bottleneck methodIn B Hajek amp R S Sreenivas (Eds) Proceedings of the 37th Allerton Conferenceon Communication Control and Computing (pp 368ndash377) Urbana Universityof Illinois See also physics0004057

Touryan J Lau B amp Dan Y (2002) Isolation of relevant visual features fromrandom stimuli for cortical complex cells J Neurosci 22 10811ndash10818

Treves A amp Panzeri S (1995) The upward bias in measures of informationderived from limited data samples Neural Comp 7 399ndash407

von der Twer T amp Macleod D I A (2001) Optimal nonlinear codes for theperception of natural colours Network Comput Neural Syst 12 395ndash407

Vickers N J Christensen T A Baker T amp Hildebrand J G (2001) Odour-plume dynamics inuence the brainrsquos olfactory code Nature 410 466ndash470

Vinje W E amp Gallant J L (2000) Sparse coding and decorrelation in primaryvisual cortex during natural vision Science 287 1273ndash1276

Vinje W E amp Gallant J L (2002) Natural stimulation of the nonclassical re-ceptive eld increases information transmission efciency in V1 J Neurosci22 2904ndash2915

Voss R F amp Clarke J (1975) ldquo1f noiserdquo in music and speech Nature 317ndash318Weliky M Fiser J Hunt R amp Wagner D N (2003) Coding of natural scenes

in primary visual cortex Neuron 37 703ndash718

Received January 21 2003 accepted August 6 2003

Page 21: Analyzing Neural Responses to Natural Signals: Maximally ...wbialek/rome/refs/sharpee+al_04.pdfAnalyzing Neural Responses to Natural Signals 225 Figure 1: Schematic illustration of

Analyzing Neural Responses to Natural Signals 243

lter To understand how the reconstructed vector obtained according toequation A2 deviates from the relevant one we consider weakly nongaus-sian stimuli with the probability distribution

PnGs D 1Z

P0sesup2H1s (A3)

where P0s is the gaussian probability distribution with covariance matrixC and the normalization factor Z D hesup2H1si The function H1 describesdeviations of the probability distribution from gaussian and therefore wewill set hsiH1i D 0 and hsisjH1i D 0 since these averages can be accountedfor in the gaussian ensemble In what follows we will keep only the rst-order terms in perturbation parameter sup2 Using property A1 we nd theSTA to be given by

hsirinG D hsisjipoundhjri C sup2hrjH1i

curren (A4)

where averages are taken with respect to the gaussian distribution Simi-larly the covariance matrix Cij evaluated with respect to the nongaussianensemble is given by

Cij D1Z

hsisjesup2H1i D hsisji C sup2hsiskihsjkH1i (A5)

so that to the rst order in sup2 hsisji D Cij iexcl sup2CikhsjkH1i Combining thiswith equation A4 we get

hsirinG D const pound Cij Oe1 j C sup2Cijhiexclr iexcl s1hr0i

centjH1i (A6)

The second term in equation A6 prevents the application of the reversecorrelation method for nongaussian signals Indeed if we multiply the STAequation A6 with the inverse of the a priori covariance matrix Cij accordingto the reverse correlation method equation A2 we no longer obtain the RFOe1 The deviation of the obtained answer from the true RF increases with sup2which measures the deviation of the probability distribution from gaussianSince natural stimuli are known to be strongly nongaussian this makes theuse of the reverse correlation problematic when analyzing neural responsesto natural stimuli

The difference in applying the reverse correlation to stimuli drawn froma correlated gaussian ensemble versus a nongaussian one is illustrated inFigures 8b and 8c In the rst case shown in Figure 8b stimuli are drawnfrom a correlated gaussian ensemble with the covariance matrix equal tothat of natural images In the second case shown in Figure 8c the patches ofphotos are taken as stimuli The STA is broadened in both cases Althoughthe two-point correlations are just as strong in the case of gaussian stimuli

244 T Sharpee N Rust and W Bialek

Figure 8 The nongaussian character of correlations present in natural scenesinvalidates the reverse correlation method for neurons with a nonlinear input-output function (a) A model visual neuron has one relevant dimension Oe1 andthe nonlinear input-output function The ldquoexactrdquo STA is used (see equation 42)to separate effects of neural noise from alterations introduced by the methodThe decorrelated ldquoexactrdquo STA is obtained by multiplying the ldquoexactrdquo STA bythe inverse of the covariance matrix according to equation A2 (b) Stimuli aretaken from a correlated gaussian noise ensemble The effect of correlations inSTA can be removed according to equation A2 (c) When patches of photos aretaken as stimuli for the same model neuron as in b the decorrelation proceduregives an altered version of the model lter The two stimulus ensembles havethe same covariance matrix

as they are in the natural stimuli ensemble gaussian correlations can besuccessfully removed from the STA according to equation A2 to obtain themodel lter On the contrary an attempt to use reverse correlation withnatural stimuli results in an altered version of the model lter We reiteratethat for this example the apparent noise in the decorrelated vector is not

Analyzing Neural Responses to Natural Signals 245

due to neural noise or nite data sets since the ldquoexactrdquo STA has been used(see equation 42) in all calculations presented in Figures 8 and 9

The reverse correlation method gives the correct answer for any distribu-tion of signals if the probability of generating a spike is a linear function ofsi since then the second term in equation A6 is zero In particular a linearinput-output relation could arise due to a neural noise whose variance ismuch larger than the variance of the signal itself This point is illustrated inFigures 9a 9b and 9c where the reverse correlation method is applied to athreshold input-output function at low moderate and high signal-to-noiseratios For small signal-to-noise ratios where the noise standard deviationis similar to that of projections s1 the threshold nonlinearity in the input-output function is masked by noise and is effectively linear In this limitthe reverse correlation can be applied with the exact STA However for ex-perimentally calculated STA at low signal-to-noise ratios the decorrelationprocedure results in strong noise amplication At higher signal-to-noise ra-tios decorrelation fails due to the nonlinearity of the input-output functionin accordance with equation A6

Appendix B Maxima of Iv What Do They Mean

The relevant subspace of dimensionality K can be found by maximizinginformation simultaneously with respect to K vectors The result of max-imization with respect to a number of vectors that is less than the truedimensionality of the relevant subspace may produce vectors that havecomponents in the irrelevant subspace This happens only in the presenceof correlations in stimuli As an illustration we consider the situation wherethe dimensionality of the relevant subspace K D 2 and vector Oe1 describesthe most informative direction within the relative subspace We show herethat although the gradient of information is perpendicular to both Oe1 andOe2 it may have components outside the relevant subspace Therefore thevector vmax that corresponds to the maximum of Iv will lie outside therelevant subspace We recall from equation 31 that

rIOe1 DZ

ds1Ps1d

ds1

Ps1jspike

Ps1hsjs1 spikei iexcl hsjs1i (B1)

We can rewrite the conditional averages hsjs1i DR

ds2Ps1 s2hsjs1 s2i=Ps1

and hsjs1 spikei DR

ds2 f s1 s2Ps1 s2hsjs1 s2i=Ps1jspike so that

rIOe1 DZ

ds1ds2Ps1 s2hsjs1 s2iPspikejs1 s2 iexcl Pspikejs1

Pspike

pound dds1

lnPs1jspike

Ps1 (B2)

Because we assume that the vector Oe1 is the most informative within therelevant subspace Oe1rI D Oe2rI D 0 so that the integral in equation B2

246 T Sharpee N Rust and W Bialek

Figure 9 Application of the reverse correlation method to a model visual neuronwith one relevant dimension Oe1 and a threshold input-output functionof decreas-ing values of noise variance frac34=frac34s1s frac14 61 061 006 in a b and c respectivelyThe model Pspikejs1 becomes effectively linear when the signal-to-noise ratiois small The reverse correlation can be used together with natural stimuli if theinput-output function is linear Otherwise the deviations between the decorre-lated STA and the model lter increase with the nonlinearity of Pspikejs1

is zero for those directions in which the component of the vector hsjs1 s2ichanges linearly with s1 and s2 For uncorrelated stimuli this is true forall directions so that the most informative vector within the relevant sub-space is also the most informative in the overall stimulus space In thepresence of correlations the gradient may have nonzero components alongsome irrelevant directions if projection of the vector hsjs1 s2i on those di-rections is not a linear function of s1 and s2 By looking for a maximumof information we will therefore be driven outside the relevant subspaceThe deviation of vmax from the relevant subspace is also proportional to the

Analyzing Neural Responses to Natural Signals 247

strength of the dependence on the second parameter s2 because of the factor[Ps1 s2jspike=Ps1 s2 iexcl Ps1jspike=Ps1] in the integrand

Appendix C The Gradient of Information

According to expression 25 the information Iv depends on the vector vonly through the probability distributions Pvx and Pvxjspike Thereforewe can express the gradient of information in terms of gradients of thoseprobability distributions

rvI D1

ln 2

Zdx

microln

Pvxjspike

PvxrvPvxjspike

iexcl Pvxjspike

PvxrvPvx

para (C1)

where we took into account thatR

dxPvxjspike D 1 and does not changewith v To nd gradients of the probability distributions we note that

rvPvx D rv

microZdsPsplusmnx iexcl s cent v

paraD iexcl

ZdsPssplusmn0x iexcl s cent v

D iexcl ddx

poundpxhsjxi

curren (C2)

and analogously for Pvxjspike

rvPvxjspike D iexcl ddx

poundpxjspikehsjx spikei

curren (C3)

Substituting expressions C2 and C3 into equation C1 and integrating onceby parts we obtain

rvI DZ

dxPvxpoundhsjx spikei iexcl hsjxi

currencentmicro

ddx

Pvxjspike

Pvx

para

which is expression 31 of the main text

Acknowledgments

We thank KD Miller for many helpful discussions Work at UCSF was sup-ported in part by the Sloan and Swartz foundations and by a training grantfrom the NIH Our collaboration began at the Marine Biological Labora-tory in a course supported by grants from NIMH and the Howard HughesMedical Institute

248 T Sharpee N Rust and W Bialek

References

Aguera y Arcas B Fairhall A L amp Bialek W (2003) Computation in a singleneuron Hodgkin and Huxley revisited Neural Comp 15 1715ndash1749

Baddeley R Abbott L F Booth MCA Sengpiel F Freeman T WakemanE A amp Rolls E T (1997) Responses of neurons in primary and inferiortemporal visual cortices to natural scenes Proc R Soc Lond B 264 1775ndash1783

Barlow H (1961) Possible principles underlying the transformations of sen-sory images In W Rosenblith (Ed) Sensory communication (pp 217ndash234)Cambridge MA MIT Press

Barlow H (2001) Redundancy reduction revisited Network Comput NeuralSyst 12 241ndash253

Bialek W (2002) Thinking about the brain In H Flyvbjerg F Julicher P Or-mos amp F David (Eds) Physics of Biomolecules and Cells (pp 485ndash577) BerlinSpringer-Verlag See also physics02050306

Bialek W amp de Ruyter van Steveninck R R (2003) Features and dimensionsMotion estimation in y vision Unpublished manuscript

Brenner N Bialek W amp de Ruyter van Steveninck R R (2000) Adaptiverescaling maximizes information transmission Neuron 26 695ndash702

Brenner N Strong S P Koberle R Bialek W amp de Ruyter van SteveninckR R (2000) Synergy in a neural code Neural Computation 12 1531ndash1552See also physics9902067

Chichilnisky E J (2001) A simple white noise analysis of neuronal light re-sponses Network Comput Neural Syst 12 199ndash213

Creutzfeldt O D amp Northdurft H C (1978) Representation of complex visualstimuli in the brain Naturwissenschaften 65 307ndash318

Cover T M amp Thomas J A (1991) Information theory New York Wileyde Boer E amp Kuyper P (1968) Triggered correlation IEEE Trans Biomed Eng

15 169ndash179de Ruyter van Steveninck R R amp Bialek W (1988) Real-time performance

of a movement-sensitive neuron in the blowy visual system Coding andinformation transfer in short spike sequences Proc R Soc Lond B 265259ndash265

de Ruyter vanSteveninck R RBorst A amp Bialek W (2000)Real time encodingof motion Answerable questions and questionable answers from the yrsquosvisual system In J M Zanker amp J Zeil (Eds) Motion vision Computationalneural and ecologicalconstraints (pp 279ndash306) New York Springer-Verlag Seealso physics0004060

de Ruyter van Steveninck R R Lewen G D Strong S P Koberle R amp BialekW (1997) Reproducibility and variability in neural spike trains Science 2751805ndash1808 See also cond-mat9603127

6 Where available we give references to the physics endashprint archive which may befound on-line at httparxivorgabs thus Bialek (2002) is available on-line athttparxivorgabsphysics0205030 Published papers may differ from the versionsposted to the archive

Analyzing Neural Responses to Natural Signals 249

Dimitrov A G amp Miller J P (2001) Neural coding and decoding Communi-cation channels and quantization Network Comput Neural Syst 12 441ndash472

Dong D W amp Atick J J (1995) Statistics of natural time-varying imagesNetwork Comput Neural Syst 6 345ndash358

Fairhall A L Lewen G D Bialek W amp de Ruyter van Steveninck R R (2001)Efciency and ambiguity in an adaptive neural code Nature 412 787ndash792

Kara P Reinagel P amp Reid R C (2000) Low response variability in simulta-neously recorded retinal thalamic and cortical neurons Neuron 27 635ndash646

Lewen G D Bialek W amp de Ruyter van Steveninck R R (2001)Neural codingof naturalistic motion stimuli Network Comput Neural Syst 12 317ndash329 Seealso physics0103088

Mainen Z F amp Sejnowski T J (1995) Reliability of spike timing in neocorticalneurons Science 268 1503ndash1506

Paninski L (2003a) Convergence properties of three spikendashtriggered analysistechniques Network Compt in Neural Systems 14 437ndash464

Paninski L (2003b) Estimation of entropy and mutual information NeuralComputation 15 1191ndash1253

Panzeri S amp Treves A (1996) Analytical estimates of limited sampling biasesin different information measures Network Comput Neural Syst 7 87ndash107

Pola G Schultz S R Petersen R amp Panzeri S (2002) A practical guide toinformation analysis of spike trains In R Kotter (Ed) Neuroscience databasesA practical guide (pp 137ndash152) Norwell MA Kluwer

Press W H Teukolsky S A Vetterling W T amp Flannery B P (1992) Numericalrecipes in C The art of scientic computing Cambridge Cambridge UniversityPress

Reinagel P amp Reid R C (2000) Temporal coding of visual information in thethalamus J Neurosci 20 5392ndash5400

Rieke F Bodnar D A amp Bialek W (1995) Naturalistic stimuli increase therate and efciency of information transmission byprimary auditory afferentsProc R Soc Lond B 262 259ndash265

Rieke F Warland D de Ruyter van Steveninck R R amp Bialek W (1997)Spikes Exploring the neural code Cambridge MA MIT Press

Ringach D L Hawken M J amp Shapley R (2002) Receptive eld structure ofneurons in monkey visual cortex revealed by stimulation with natural imagesequences Journal of Vision 2 12ndash24

Ringach D L Sapiro G amp Shapley R (1997) A subspace reverse-correlationtechnique for the study of visual neurons Vision Res 37 2455ndash2464

Rolls E T Aggelopoulos N C amp Zheng F (2003) The receptive elds ofinferior temporal cortex neurons in natural scenes J Neurosci 23 339ndash348

Ruderman D L (1994) The statistics of natural images Network Compt NeuralSyst 5 517ndash548

Ruderman D L amp Bialek W (1994) Statistics of natural images Scaling in thewoods Phys Rev Lett 73 814ndash817

Schwartz O Chichilnisky E J amp Simoncelli E (2002) Characterizing neuralgain control using spike-triggered covariance In T G Dietterich S Becker ampZ Ghahramani (Eds) Advances in Neural InformationProcessing (pp 269ndash276)Cambridge MA MIT Press

250 T Sharpee N Rust and W Bialek

Sen K Theunissen F E amp Doupe A J (2001) Feature analysis of naturalsounds in the songbird auditory forebrain J Neurophysiol 86 1445ndash1458

Simoncelli E amp Olshausen B A (2001) Natural image statistics and neuralrepresentation Annu Rev Neurosci 24 1193ndash1216

Smirnakis S M Berry M J Warland D K Bialek W amp Meister M (1996)Adaptation of retinal processing to image contrast and spatial scale Nature386 69ndash73

Smyth D Willmore B Baker G E Thompson I D amp Tolhurst D J (2003)The receptive-eld organization of simple cells in primary visual cortex offerrets under natural scene stimulation J Neurosci 23 4746ndash4759

Stanley G B Li F F amp Dan Y (1999) Reconstruction of natural scenes fromensemble responses in the lateral geniculate nucleus J Neurosci 19 8036ndash8042

Strong S P Koberle R de Ruyter van Steveninck R R amp Bialek W (1998)Entropy and information in neural spike trains Phys Rev Lett 80 197ndash200

Theunissen F E Sen K amp Doupe A J (2000) Spectral-temporal receptiveelds of nonlinear auditory neurons obtained using natural sounds J Neu-rosci 20 2315ndash2331

Tishby N Pereira F C amp Bialek W (1999)The information bottleneck methodIn B Hajek amp R S Sreenivas (Eds) Proceedings of the 37th Allerton Conferenceon Communication Control and Computing (pp 368ndash377) Urbana Universityof Illinois See also physics0004057

Touryan J Lau B amp Dan Y (2002) Isolation of relevant visual features fromrandom stimuli for cortical complex cells J Neurosci 22 10811ndash10818

Treves A amp Panzeri S (1995) The upward bias in measures of informationderived from limited data samples Neural Comp 7 399ndash407

von der Twer T amp Macleod D I A (2001) Optimal nonlinear codes for theperception of natural colours Network Comput Neural Syst 12 395ndash407

Vickers N J Christensen T A Baker T amp Hildebrand J G (2001) Odour-plume dynamics inuence the brainrsquos olfactory code Nature 410 466ndash470

Vinje W E amp Gallant J L (2000) Sparse coding and decorrelation in primaryvisual cortex during natural vision Science 287 1273ndash1276

Vinje W E amp Gallant J L (2002) Natural stimulation of the nonclassical re-ceptive eld increases information transmission efciency in V1 J Neurosci22 2904ndash2915

Voss R F amp Clarke J (1975) ldquo1f noiserdquo in music and speech Nature 317ndash318Weliky M Fiser J Hunt R amp Wagner D N (2003) Coding of natural scenes

in primary visual cortex Neuron 37 703ndash718

Received January 21 2003 accepted August 6 2003

Page 22: Analyzing Neural Responses to Natural Signals: Maximally ...wbialek/rome/refs/sharpee+al_04.pdfAnalyzing Neural Responses to Natural Signals 225 Figure 1: Schematic illustration of

244 T Sharpee N Rust and W Bialek

Figure 8 The nongaussian character of correlations present in natural scenesinvalidates the reverse correlation method for neurons with a nonlinear input-output function (a) A model visual neuron has one relevant dimension Oe1 andthe nonlinear input-output function The ldquoexactrdquo STA is used (see equation 42)to separate effects of neural noise from alterations introduced by the methodThe decorrelated ldquoexactrdquo STA is obtained by multiplying the ldquoexactrdquo STA bythe inverse of the covariance matrix according to equation A2 (b) Stimuli aretaken from a correlated gaussian noise ensemble The effect of correlations inSTA can be removed according to equation A2 (c) When patches of photos aretaken as stimuli for the same model neuron as in b the decorrelation proceduregives an altered version of the model lter The two stimulus ensembles havethe same covariance matrix

as they are in the natural stimuli ensemble gaussian correlations can besuccessfully removed from the STA according to equation A2 to obtain themodel lter On the contrary an attempt to use reverse correlation withnatural stimuli results in an altered version of the model lter We reiteratethat for this example the apparent noise in the decorrelated vector is not

Analyzing Neural Responses to Natural Signals 245

due to neural noise or nite data sets since the ldquoexactrdquo STA has been used(see equation 42) in all calculations presented in Figures 8 and 9

The reverse correlation method gives the correct answer for any distribu-tion of signals if the probability of generating a spike is a linear function ofsi since then the second term in equation A6 is zero In particular a linearinput-output relation could arise due to a neural noise whose variance ismuch larger than the variance of the signal itself This point is illustrated inFigures 9a 9b and 9c where the reverse correlation method is applied to athreshold input-output function at low moderate and high signal-to-noiseratios For small signal-to-noise ratios where the noise standard deviationis similar to that of projections s1 the threshold nonlinearity in the input-output function is masked by noise and is effectively linear In this limitthe reverse correlation can be applied with the exact STA However for ex-perimentally calculated STA at low signal-to-noise ratios the decorrelationprocedure results in strong noise amplication At higher signal-to-noise ra-tios decorrelation fails due to the nonlinearity of the input-output functionin accordance with equation A6

Appendix B Maxima of Iv What Do They Mean

The relevant subspace of dimensionality K can be found by maximizinginformation simultaneously with respect to K vectors The result of max-imization with respect to a number of vectors that is less than the truedimensionality of the relevant subspace may produce vectors that havecomponents in the irrelevant subspace This happens only in the presenceof correlations in stimuli As an illustration we consider the situation wherethe dimensionality of the relevant subspace K D 2 and vector Oe1 describesthe most informative direction within the relative subspace We show herethat although the gradient of information is perpendicular to both Oe1 andOe2 it may have components outside the relevant subspace Therefore thevector vmax that corresponds to the maximum of Iv will lie outside therelevant subspace We recall from equation 31 that

rIOe1 DZ

ds1Ps1d

ds1

Ps1jspike

Ps1hsjs1 spikei iexcl hsjs1i (B1)

We can rewrite the conditional averages hsjs1i DR

ds2Ps1 s2hsjs1 s2i=Ps1

and hsjs1 spikei DR

ds2 f s1 s2Ps1 s2hsjs1 s2i=Ps1jspike so that

rIOe1 DZ

ds1ds2Ps1 s2hsjs1 s2iPspikejs1 s2 iexcl Pspikejs1

Pspike

pound dds1

lnPs1jspike

Ps1 (B2)

Because we assume that the vector Oe1 is the most informative within therelevant subspace Oe1rI D Oe2rI D 0 so that the integral in equation B2

246 T Sharpee N Rust and W Bialek

Figure 9 Application of the reverse correlation method to a model visual neuronwith one relevant dimension Oe1 and a threshold input-output functionof decreas-ing values of noise variance frac34=frac34s1s frac14 61 061 006 in a b and c respectivelyThe model Pspikejs1 becomes effectively linear when the signal-to-noise ratiois small The reverse correlation can be used together with natural stimuli if theinput-output function is linear Otherwise the deviations between the decorre-lated STA and the model lter increase with the nonlinearity of Pspikejs1

is zero for those directions in which the component of the vector hsjs1 s2ichanges linearly with s1 and s2 For uncorrelated stimuli this is true forall directions so that the most informative vector within the relevant sub-space is also the most informative in the overall stimulus space In thepresence of correlations the gradient may have nonzero components alongsome irrelevant directions if projection of the vector hsjs1 s2i on those di-rections is not a linear function of s1 and s2 By looking for a maximumof information we will therefore be driven outside the relevant subspaceThe deviation of vmax from the relevant subspace is also proportional to the

Analyzing Neural Responses to Natural Signals 247

strength of the dependence on the second parameter s2 because of the factor[Ps1 s2jspike=Ps1 s2 iexcl Ps1jspike=Ps1] in the integrand

Appendix C The Gradient of Information

According to expression 25 the information Iv depends on the vector vonly through the probability distributions Pvx and Pvxjspike Thereforewe can express the gradient of information in terms of gradients of thoseprobability distributions

rvI D1

ln 2

Zdx

microln

Pvxjspike

PvxrvPvxjspike

iexcl Pvxjspike

PvxrvPvx

para (C1)

where we took into account thatR

dxPvxjspike D 1 and does not changewith v To nd gradients of the probability distributions we note that

rvPvx D rv

microZdsPsplusmnx iexcl s cent v

paraD iexcl

ZdsPssplusmn0x iexcl s cent v

D iexcl ddx

poundpxhsjxi

curren (C2)

and analogously for Pvxjspike

rvPvxjspike D iexcl ddx

poundpxjspikehsjx spikei

curren (C3)

Substituting expressions C2 and C3 into equation C1 and integrating onceby parts we obtain

rvI DZ

dxPvxpoundhsjx spikei iexcl hsjxi

currencentmicro

ddx

Pvxjspike

Pvx

para

which is expression 31 of the main text

Acknowledgments

We thank KD Miller for many helpful discussions Work at UCSF was sup-ported in part by the Sloan and Swartz foundations and by a training grantfrom the NIH Our collaboration began at the Marine Biological Labora-tory in a course supported by grants from NIMH and the Howard HughesMedical Institute

248 T Sharpee N Rust and W Bialek

References

Aguera y Arcas, B., Fairhall, A. L., & Bialek, W. (2003). Computation in a single neuron: Hodgkin and Huxley revisited. Neural Comp., 15, 1715–1749.

Baddeley, R., Abbott, L. F., Booth, M. C. A., Sengpiel, F., Freeman, T., Wakeman, E. A., & Rolls, E. T. (1997). Responses of neurons in primary and inferior temporal visual cortices to natural scenes. Proc. R. Soc. Lond. B, 264, 1775–1783.

Barlow, H. (1961). Possible principles underlying the transformations of sensory images. In W. Rosenblith (Ed.), Sensory communication (pp. 217–234). Cambridge, MA: MIT Press.

Barlow, H. (2001). Redundancy reduction revisited. Network: Comput. Neural Syst., 12, 241–253.

Bialek, W. (2002). Thinking about the brain. In H. Flyvbjerg, F. Julicher, P. Ormos, & F. David (Eds.), Physics of Biomolecules and Cells (pp. 485–577). Berlin: Springer-Verlag. See also physics/0205030.

Bialek, W., & de Ruyter van Steveninck, R. R. (2003). Features and dimensions: Motion estimation in fly vision. Unpublished manuscript.

Brenner, N., Bialek, W., & de Ruyter van Steveninck, R. R. (2000). Adaptive rescaling maximizes information transmission. Neuron, 26, 695–702.

Brenner, N., Strong, S. P., Koberle, R., Bialek, W., & de Ruyter van Steveninck, R. R. (2000). Synergy in a neural code. Neural Computation, 12, 1531–1552. See also physics/9902067.

Chichilnisky, E. J. (2001). A simple white noise analysis of neuronal light responses. Network: Comput. Neural Syst., 12, 199–213.

Creutzfeldt, O. D., & Northdurft, H. C. (1978). Representation of complex visual stimuli in the brain. Naturwissenschaften, 65, 307–318.

Cover, T. M., & Thomas, J. A. (1991). Information theory. New York: Wiley.

de Boer, E., & Kuyper, P. (1968). Triggered correlation. IEEE Trans. Biomed. Eng., 15, 169–179.

de Ruyter van Steveninck, R. R., & Bialek, W. (1988). Real-time performance of a movement-sensitive neuron in the blowfly visual system: Coding and information transfer in short spike sequences. Proc. R. Soc. Lond. B, 265, 259–265.

de Ruyter van Steveninck, R. R., Borst, A., & Bialek, W. (2000). Real time encoding of motion: Answerable questions and questionable answers from the fly's visual system. In J. M. Zanker & J. Zeil (Eds.), Motion vision: Computational, neural and ecological constraints (pp. 279–306). New York: Springer-Verlag. See also physics/0004060.

de Ruyter van Steveninck, R. R., Lewen, G. D., Strong, S. P., Koberle, R., & Bialek, W. (1997). Reproducibility and variability in neural spike trains. Science, 275, 1805–1808. See also cond-mat/9603127.

6. Where available, we give references to the physics e-print archive, which may be found on-line at http://arxiv.org/abs/; thus, Bialek (2002) is available on-line at http://arxiv.org/abs/physics/0205030. Published papers may differ from the versions posted to the archive.


Dimitrov, A. G., & Miller, J. P. (2001). Neural coding and decoding: Communication channels and quantization. Network: Comput. Neural Syst., 12, 441–472.

Dong, D. W., & Atick, J. J. (1995). Statistics of natural time-varying images. Network: Comput. Neural Syst., 6, 345–358.

Fairhall, A. L., Lewen, G. D., Bialek, W., & de Ruyter van Steveninck, R. R. (2001). Efficiency and ambiguity in an adaptive neural code. Nature, 412, 787–792.

Kara, P., Reinagel, P., & Reid, R. C. (2000). Low response variability in simultaneously recorded retinal, thalamic, and cortical neurons. Neuron, 27, 635–646.

Lewen, G. D., Bialek, W., & de Ruyter van Steveninck, R. R. (2001). Neural coding of naturalistic motion stimuli. Network: Comput. Neural Syst., 12, 317–329. See also physics/0103088.

Mainen, Z. F., & Sejnowski, T. J. (1995). Reliability of spike timing in neocortical neurons. Science, 268, 1503–1506.

Paninski, L. (2003a). Convergence properties of three spike-triggered analysis techniques. Network: Comput. Neural Syst., 14, 437–464.

Paninski, L. (2003b). Estimation of entropy and mutual information. Neural Computation, 15, 1191–1253.

Panzeri, S., & Treves, A. (1996). Analytical estimates of limited sampling biases in different information measures. Network: Comput. Neural Syst., 7, 87–107.

Pola, G., Schultz, S. R., Petersen, R., & Panzeri, S. (2002). A practical guide to information analysis of spike trains. In R. Kotter (Ed.), Neuroscience databases: A practical guide (pp. 137–152). Norwell, MA: Kluwer.

Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P. (1992). Numerical recipes in C: The art of scientific computing. Cambridge: Cambridge University Press.

Reinagel, P., & Reid, R. C. (2000). Temporal coding of visual information in the thalamus. J. Neurosci., 20, 5392–5400.

Rieke, F., Bodnar, D. A., & Bialek, W. (1995). Naturalistic stimuli increase the rate and efficiency of information transmission by primary auditory afferents. Proc. R. Soc. Lond. B, 262, 259–265.

Rieke, F., Warland, D., de Ruyter van Steveninck, R. R., & Bialek, W. (1997). Spikes: Exploring the neural code. Cambridge, MA: MIT Press.

Ringach, D. L., Hawken, M. J., & Shapley, R. (2002). Receptive field structure of neurons in monkey visual cortex revealed by stimulation with natural image sequences. Journal of Vision, 2, 12–24.

Ringach, D. L., Sapiro, G., & Shapley, R. (1997). A subspace reverse-correlation technique for the study of visual neurons. Vision Res., 37, 2455–2464.

Rolls, E. T., Aggelopoulos, N. C., & Zheng, F. (2003). The receptive fields of inferior temporal cortex neurons in natural scenes. J. Neurosci., 23, 339–348.

Ruderman, D. L. (1994). The statistics of natural images. Network: Comput. Neural Syst., 5, 517–548.

Ruderman, D. L., & Bialek, W. (1994). Statistics of natural images: Scaling in the woods. Phys. Rev. Lett., 73, 814–817.

Schwartz, O., Chichilnisky, E. J., & Simoncelli, E. (2002). Characterizing neural gain control using spike-triggered covariance. In T. G. Dietterich, S. Becker, & Z. Ghahramani (Eds.), Advances in Neural Information Processing (pp. 269–276). Cambridge, MA: MIT Press.


Sen, K., Theunissen, F. E., & Doupe, A. J. (2001). Feature analysis of natural sounds in the songbird auditory forebrain. J. Neurophysiol., 86, 1445–1458.

Simoncelli, E., & Olshausen, B. A. (2001). Natural image statistics and neural representation. Annu. Rev. Neurosci., 24, 1193–1216.

Smirnakis, S. M., Berry, M. J., Warland, D. K., Bialek, W., & Meister, M. (1996). Adaptation of retinal processing to image contrast and spatial scale. Nature, 386, 69–73.

Smyth, D., Willmore, B., Baker, G. E., Thompson, I. D., & Tolhurst, D. J. (2003). The receptive-field organization of simple cells in primary visual cortex of ferrets under natural scene stimulation. J. Neurosci., 23, 4746–4759.

Stanley, G. B., Li, F. F., & Dan, Y. (1999). Reconstruction of natural scenes from ensemble responses in the lateral geniculate nucleus. J. Neurosci., 19, 8036–8042.

Strong, S. P., Koberle, R., de Ruyter van Steveninck, R. R., & Bialek, W. (1998). Entropy and information in neural spike trains. Phys. Rev. Lett., 80, 197–200.

Theunissen, F. E., Sen, K., & Doupe, A. J. (2000). Spectral-temporal receptive fields of nonlinear auditory neurons obtained using natural sounds. J. Neurosci., 20, 2315–2331.

Tishby, N., Pereira, F. C., & Bialek, W. (1999). The information bottleneck method. In B. Hajek & R. S. Sreenivas (Eds.), Proceedings of the 37th Allerton Conference on Communication, Control and Computing (pp. 368–377). Urbana: University of Illinois. See also physics/0004057.

Touryan, J., Lau, B., & Dan, Y. (2002). Isolation of relevant visual features from random stimuli for cortical complex cells. J. Neurosci., 22, 10811–10818.

Treves, A., & Panzeri, S. (1995). The upward bias in measures of information derived from limited data samples. Neural Comp., 7, 399–407.

von der Twer, T., & Macleod, D. I. A. (2001). Optimal nonlinear codes for the perception of natural colours. Network: Comput. Neural Syst., 12, 395–407.

Vickers, N. J., Christensen, T. A., Baker, T., & Hildebrand, J. G. (2001). Odour-plume dynamics influence the brain's olfactory code. Nature, 410, 466–470.

Vinje, W. E., & Gallant, J. L. (2000). Sparse coding and decorrelation in primary visual cortex during natural vision. Science, 287, 1273–1276.

Vinje, W. E., & Gallant, J. L. (2002). Natural stimulation of the nonclassical receptive field increases information transmission efficiency in V1. J. Neurosci., 22, 2904–2915.

Voss, R. F., & Clarke, J. (1975). "1/f noise" in music and speech. Nature, 258, 317–318.

Weliky, M., Fiser, J., Hunt, R., & Wagner, D. N. (2003). Coding of natural scenes in primary visual cortex. Neuron, 37, 703–718.

Received January 21, 2003; accepted August 6, 2003.
