
Retinal metric: a stimulus distance measure derived from population neural responses

Gašper Tkačik∗a, Einat Granot-Atedgi b, Ronen Segev c and Elad Schneidman†b

a Institute of Science and Technology Austria, Am Campus 1, A-3400 Klosterneuburg, Austria
b Department of Neurobiology, Weizmann Institute of Science, 76100 Rehovot, Israel
c Faculty of Natural Sciences, Department of Life Sciences and Zlotowski Center for Neuroscience, Ben Gurion University of the Negev, 84105 Be'er Sheva, Israel

(Dated: June 5, 2018)

The ability of the organism to distinguish between various stimuli is limited by the structure and noise in the population code of its sensory neurons. Here we infer a distance measure on the stimulus space directly from the recorded activity of 100 neurons in the salamander retina. In contrast to previously used measures of stimulus similarity, this "neural metric" tells us how distinguishable a pair of stimulus clips is to the retina, given the noise in the neural population response. We show that the retinal distance strongly deviates from Euclidean, or any static metric, yet has a simple structure: we identify the stimulus features that the neural population is jointly sensitive to, and show the SVM-like kernel function relating the stimulus and neural response spaces. We show that the non-Euclidean nature of the retinal distance has important consequences for neural decoding.

Neural populations convey information about their stimuli by their joint spiking patterns [1]. At the level of single cells, the mapping from stimuli to spikes has often been captured by linear-nonlinear (LN) models [2, 3]. Geometrically, a single LN neuron can be viewed as a perceptron [4], partitioning the stimulus space into two domains—one of stimuli that evoke spikes and one of stimuli that do not—by a decision boundary, or a hyperplane, determined by the linear feature of the LN model [5–7]. The brain, listening for spikes coming from such a neuron, will thus interpret stimuli as similar insofar as they evoke similar spiking patterns. But how does an interacting population, as a whole, partition the stimulus space? Conversely, which stimuli are interpreted as different, or similar, by an interacting population?

Answering these questions is fundamental to our understanding of the neural code and depends critically on finding the correct "metric" for sensory stimuli in terms of the information that neural populations carry. Since neurons are noisy, repeated presentations of the same stimulus can result in different neural responses, so the stimulus/response mapping of the population needs to be described by the probability distribution P(σ|s) [8]. Two stimuli s1 and s2 may be far apart as measured by a chosen distance function, e.g. the Euclidean norm D2(s1, s2) = ||s1 − s2||, yet they could evoke responses drawn from almost overlapping distributions P(σ|s1) and P(σ|s2), making it nearly impossible for the brain, listening to the spikes σ arriving from the sensory system, to tell those stimuli apart. Conversely, the sensory circuit could be sensitive to specific stimulus changes that have a small Euclidean norm, emphasizing those particular differences as an important feature and encoding it in the neural response. We therefore suggest that the biologically relevant distance between pairs of stimuli should be derived from the distance between the response distributions evoked by these stimuli [9, 10].

∗ [email protected]
† [email protected]

To characterize the structure of neural distance in a large population, we recorded extracellularly the activity of N = 100 retinal ganglion cells (RGCs) in the tiger salamander using a multi-electrode array [11, 12]. The retina patch was presented 626 times with a 10 s long segment of spatially uniform flicker with Gaussian distributed luminance drawn independently at 30 Hz (Fig. 1a,b). Time was discretized into T = 961 bins of 10 ms and the joint response of the neurons σ = {σa} was represented in each bin by an N-bit codeword, where σa = 1 (0) denoted that neuron a = 1, ..., N spiked (did not spike) in that bin. Since direct sampling of the conditional distribution P(σ|s) is impractical for such a large population (even with hundreds of repeats, Fig. 1c), we inferred a stimulus-dependent maximum entropy (SDME) model for this data that predicts P(σ|s) for each time bin, as we report in detail in Ref. [13].
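As an illustration of this binary representation, here is a minimal sketch in Python, assuming spike times per neuron are available as arrays for a single repeat; the function name and arguments are ours, not the paper's:

```python
import numpy as np

def binarize_responses(spike_times, n_neurons, t_start, t_stop, bin_ms=10.0):
    """Convert spike times into binary N-bit codewords per time bin.

    spike_times : list of length n_neurons; spike_times[a] is an array of
                  spike times (in seconds) for neuron a, for one repeat.
    Returns sigma of shape (T, n_neurons), with sigma[t, a] = 1 if neuron a
    fired at least one spike in bin t, else 0.
    """
    edges = np.arange(t_start, t_stop + 1e-9, bin_ms / 1000.0)
    T = len(edges) - 1
    sigma = np.zeros((T, n_neurons), dtype=np.uint8)
    for a in range(n_neurons):
        counts, _ = np.histogram(spike_times[a], bins=edges)
        sigma[:, a] = (counts > 0).astype(np.uint8)  # binary: spiked or not
    return sigma
```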

Since only differences in retinal responses can guide the organism's behavior, the biologically relevant distance between stimuli s1 and s2 must be a measure of similarity between their corresponding response distributions. We define the retinal distance between the stimuli as the symmetrized Kullback-Leibler distance between the distributions of responses they elicit,

\[ D_{\rm ret}(s_1, s_2) = D^{\rm sym}_{\rm KL}\left( P(\sigma|s_1),\, P(\sigma|s_2) \right) \tag{1} \]

where the symmetrized KL divergence is
\[ D^{\rm sym}_{\rm KL}(p, q) = \frac{1}{2} \sum_x \left( p(x) \log_2 \frac{p(x)}{q(x)} + q(x) \log_2 \frac{q(x)}{p(x)} \right) \]
[14].

We choose this principled information-theoretic measure because it quantifies the difference between stimuli precisely to the extent that their response distributions are distinguishable [14, 15]. Once constructed, the analysis of Dret should help us uncover the fundamental aspects of the stimulus space, in particular, whether the distance between pairs of stimuli is determined by a small number of features or stimulus projections.
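For concreteness, a minimal sketch of the symmetrized KL divergence in bits, assuming the two response distributions are given as probability vectors over a common set of codewords (the small eps regularization is our practical choice, not part of the paper's definition):

```python
import numpy as np

def d_kl_sym(p, q, eps=1e-12):
    """Symmetrized Kullback-Leibler divergence in bits.

    p, q : 1-D arrays of probabilities over the same set of response
           codewords; both are renormalized after adding eps to avoid
           division by zero (a practical choice, not from the paper).
    """
    p = np.asarray(p, float) + eps
    q = np.asarray(q, float) + eps
    p /= p.sum()
    q /= q.sum()
    kl_pq = np.sum(p * np.log2(p / q))
    kl_qp = np.sum(q * np.log2(q / p))
    return 0.5 * (kl_pq + kl_qp)
```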


FIG. 1: a) Stimulus segment with two 400 ms stimulus clips s1 (red), s2 (blue). b) Rasters for 100 RGCs shown for 3 example repeats; response vectors to s1, s2 shown between dashed lines. c) Measured response rasters to the two highlighted stimulus clips (s1, left; s2, right; spikes = white dots; colored bars = neuron firing rates). d) For every pair of time bins in the experiment, the Euclidean distance D2 between the corresponding stimulus clips is shown in the upper diagonal part of the matrix, and the retinal distance Dret in the lower part. e) Four pairs of stimulus clips (green and violet) are selected to demonstrate that D2 (increasing top to bottom) and Dret (increasing left to right) are not monotonically related (annotated values: D2 ≈ 7, Dret ≈ 38; D2 ≈ 7, Dret < 0.1; D2 ≈ 14, Dret ≈ 56; D2 ≈ 15, Dret < 0.1).

Using the SDME model for P(σ|s) [30], we computed the retinal distance, Dret, between every pair of stimulus clips in the experiment. Figure 1d shows the profound difference between the Euclidean and retinal distance for all the pairs in a two-second interval of the stimulus. In particular, the same value of Euclidean distance can be obtained from very similar neural responses, or very different ones, as shown in Fig. 1e for selected pairs of stimulus clips.
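The full distance matrix could then be assembled along the following lines, assuming a hypothetical helper sdme_response_distribution(t) that returns the SDME-predicted codeword distribution P(σ|s_t) for time bin t, and reusing d_kl_sym from the sketch above:

```python
import numpy as np

def retinal_distance_matrix(T, sdme_response_distribution):
    """Symmetric T x T matrix of Dret between all pairs of stimulus clips.

    Loops over all ~T^2/2 pairs; fine as an illustration, though slow for
    large T. d_kl_sym is the function defined in the earlier sketch.
    """
    dists = [sdme_response_distribution(t) for t in range(T)]  # P(sigma|s_t)
    D_ret = np.zeros((T, T))
    for i in range(T):
        for j in range(i + 1, T):
            D_ret[i, j] = D_ret[j, i] = d_kl_sym(dists[i], dists[j])
    return D_ret
```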

To further explore the structure of retinal distance, we used multidimensional scaling (MDS) to project the distance matrix Dret(si, sj), between all pairs (i, j = 1, ..., T) of presented stimulus clips, into a low-dimensional space [16]. This embedding technique does not directly reveal the structure of the stimulus space, but can approximate its effective dimensionality. Every stimulus clip si is assigned a K-dimensional point vi in Euclidean space, such that

\[ D_{\rm ret}(s_i, s_j) \approx f\left( \|\vec{v}_i - \vec{v}_j\| \right), \tag{2} \]

with f(·) being a monotonic function. It is possible to find such accurate mappings for small K: Fig. 2a shows the strong correspondence between low-dimensional MDS projections and the retinal distance, for different K values. Fig. 2b summarizes the MDS performance at low orders in terms of the mutual information it captures about Dret; higher information values correspond to a tighter relationship with less scatter. Figure 2c shows the structure of stimulus space using MDS with K = 2, which already captures most of the structure of the stimulus space. The first coordinate of the MDS projection, v(1), is strongly correlated with the average firing rate in the population: high values correspond to "off-like" stimuli, and small values correspond to flat or "on-like" stimuli that do not drive the neural population well. The increased sensitivity to "off-like" features is consistent with the known prevalence of OFF-type cells in the salamander retina and in our dataset. Although "on-like" stimuli differ substantially in their shapes and thus in their Euclidean distances D2, they are largely indistinguishable for the retina. In contrast, groups of stimuli sharing the same v(1) coordinate (yellow and green) have a similar shape and evoke a similar population firing rate, yet are distinguishable because they differ along the second coordinate v(2). To the retina, the yellow and green groups of stimuli are as distinct from each other as the blue and magenta groups at constant v(2) but different v(1).
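A sketch of this embedding step, assuming the D_ret matrix from above: it uses scikit-learn's (metric) MDS with a precomputed dissimilarity matrix, whereas a non-metric variant would match the monotonic relation f more closely, together with a crude binned estimate of the mutual information I(Dret; d); the binning and random seed are our choices:

```python
import numpy as np
from sklearn.manifold import MDS
from scipy.spatial.distance import pdist

def embed_and_score(D_ret, K=2, n_bins=30):
    """Embed the Dret matrix into K dimensions and estimate I(Dret; d) in bits."""
    mds = MDS(n_components=K, dissimilarity="precomputed", random_state=0)
    V = mds.fit_transform(D_ret)                      # T x K coordinates v_i
    d = pdist(V)                                      # Euclidean distances in MDS space
    dret = D_ret[np.triu_indices_from(D_ret, k=1)]    # matching upper-triangle Dret

    # crude plug-in mutual information estimate from a 2-D histogram
    joint, _, _ = np.histogram2d(dret, d, bins=n_bins)
    p_xy = joint / joint.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)
    p_y = p_xy.sum(axis=0, keepdims=True)
    nz = p_xy > 0
    mi_bits = np.sum(p_xy[nz] * np.log2(p_xy[nz] / (p_x @ p_y)[nz]))
    return V, mi_bits
```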

FIG. 2: MDS assigns to each stimulus clip si a K-dimensional vector vi = (v(1), . . . , v(K)). a) The relationship between Dret (one dot = one pair of stimuli) and the Euclidean norm in MDS space, d(i, j) = ||vi − vj||, gets tighter with K (shown for K = 1, 3, 5). b) Information I(Dret; d), in bits, between Dret and the MDS distance d as a function of the embedding dimension K. c) For K = 2, all T = 961 stimulus clips are shown as points with MDS coordinates (v(1), v(2)); the shade of red corresponds to the mean population firing rate (scale 0–16 Hz). Five groups of stimuli are denoted by circles (gray lines = individual stimulus clips, color line = average). Equal distances in the plane correspond to equal Dret.


FIG. 3: a) Sample clips from the stimulus space (5 highlighted in color). b) A model for Dret maps each stimulus clip s to a point (v(1), v(2)) in the 2d MDS space. Predictions for v(1,2) are obtained by filtering the stimulus clip s with a linear filter k1,2 and passing the result through a pointwise nonlinearity (for v(2), the transformation is linear with a slope that increases with k1 · s, as shown). c) All T = 961 stimulus clips represented as points in a plane with coordinates predicted by the model in b (colored squares = highlighted stimuli). d) Given predictions for the coordinates vi = (v(1), v(2)), the Dret for all (∼ 4.6 · 10^5) pairs of clips (si, sj) can be predicted using a simple fitted relation shown in the inset (black line = mean binned model predictions vs true Dret, shaded area = std of binned predictions). The red line shows Dret computed from the true 2d MDS coordinates of Fig. 2c. Two highlighted distances from c are denoted by arrows. e) The fraction of information about the true Dret captured by the Dret model in b, by the best-fit global metric gµν, and by the Euclidean distance, normalized by the success of the K = 2 MDS.

Figure 3 presents a model that predicts the mapping of an arbitrary stimulus s into v(1), v(2) and thus allows us to compute Dret. This model, obtained by using simple reverse correlation analysis [3], relies on two coupled linear-nonlinear transformations (Fig. 3a-d), and identifies two dominant population-level stimulus features k1, k2: stimuli are distinguishable only insofar as their projections onto k1, k2 differ. The model accurately predicts Dret (Fig. 3e), establishing the relation of Eq. (2) to be d(i, j) = ||vi − vj|| ≈ β^{-1} exp{α^{-1} Dret(si, sj)} (Fig. 3d). Interestingly, this relation, which we did not assume a priori, is exactly the kernel function used in several very successful applications of support vector machine (SVM) classification in machine learning, where one needs to distinguish between "classes" (here, stimulus clips) based on the distributions over "features" (here, neural responses) that they induce [17]. Our findings indicate that the neural population, much like single neurons, performs low-order dimensionality reduction on the incoming stimuli; however, unlike single neurons that can signal only a binary decision in every time bin, the population has access to ∼ 2^N states, which can encode the variation along the relevant directions with greater precision. This view is consistent with the reported highly redundant code in the retina [18].
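A minimal sketch of a two-feature model of this kind, under our own simplifying assumptions: the filters k1, k2 are taken as given, the saturating nonlinearity for v(1) and the k1-dependent gain for v(2) are placeholder forms, and alpha, beta stand in for the fitted constants of the exponential kernel relation; none of these functional forms or values are the paper's fitted ones:

```python
import numpy as np

def predict_v(s, k1, k2, gain=1.0, slope=0.5):
    """Map a stimulus clip s (length-40 vector) to predicted MDS coordinates.

    v1 passes the k1 projection through a saturating nonlinearity; v2 is a
    linear function of the k2 projection whose slope grows with k1 . s,
    mimicking the coupled LN transformations of Fig. 3b (placeholder forms).
    """
    p1, p2 = np.dot(k1, s), np.dot(k2, s)
    v1 = gain * np.tanh(p1)                        # placeholder saturating nonlinearity
    v2 = (1.0 + slope * np.maximum(p1, 0.0)) * p2  # slope increases with k1 . s
    return np.array([v1, v2])

def predict_dret(s_i, s_j, k1, k2, alpha=10.0, beta=1.0):
    """Predicted retinal distance via the SVM-like kernel relation
    d = ||v_i - v_j|| ~ (1/beta) exp(Dret/alpha), inverted as Dret = alpha*ln(beta*d)."""
    d = np.linalg.norm(predict_v(s_i, k1, k2) - predict_v(s_j, k1, k2))
    return alpha * np.log(np.maximum(beta * d, 1e-12))
```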

Given the failure of the Euclidean metric to predict stimulus similarity, and the success of the low-dimensional model, we asked whether a general quadratic form could explain the retinal distance. We thus looked for an optimal matrix gµν, such that
\[ D_{\rm ret}(s_i, s_j) \approx \sum_{\mu\nu} \left( s_i^{(\mu)} - s_j^{(\mu)} \right) g_{\mu\nu} \left( s_i^{(\nu)} - s_j^{(\nu)} \right), \]
where µ, ν range over all 40 components of the stimulus clip vectors s. Using cross-validated least-squares fitting, we found that the optimal gµν substantially outperformed the Euclidean metric, yet still captured only ∼ 20% of the structure in Dret (Fig. 3e). The best-fitting matrix gµν has a simple structure that is captured by two eigenvectors, matching the pair of population-level stimulus features, k1, k2, independently inferred in Fig. 3b. Despite this, the best-fitting static gµν performs poorly: the eigenvalues corresponding to k1, k2 would have to depend on the stimulus in order to approximate our Dret model well.
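One way such a fit could be set up: since the quadratic form is linear in the entries of gµν, each stimulus pair contributes a row of outer-product features and the metric can be obtained by least squares with a held-out test set. The pair subsampling, split fraction and variable names below are illustrative choices of ours:

```python
import numpy as np

def fit_global_metric(S, D_ret, n_pairs=20000, train_frac=0.8, seed=0):
    """Least-squares fit of a symmetric matrix g such that
    Dret(si, sj) ~ (s_i - s_j)^T g (s_i - s_j), with a held-out test set.

    S      : (T, 40) array of stimulus clip vectors.
    D_ret  : (T, T) matrix of retinal distances.
    n_pairs: random subsample of pairs, kept small for tractability (our choice).
    """
    T, n = S.shape
    all_pairs = np.stack(np.triu_indices(T, k=1), axis=1)
    rng = np.random.default_rng(seed)
    pick = rng.choice(len(all_pairs), size=min(n_pairs, len(all_pairs)), replace=False)
    pairs = all_pairs[pick]
    n_train = int(train_frac * len(pairs))

    def design(p):
        diffs = S[p[:, 0]] - S[p[:, 1]]                    # stimulus differences
        return np.einsum("pa,pb->pab", diffs, diffs).reshape(len(p), -1)

    X_tr, y_tr = design(pairs[:n_train]), D_ret[pairs[:n_train, 0], pairs[:n_train, 1]]
    X_te, y_te = design(pairs[n_train:]), D_ret[pairs[n_train:, 0], pairs[n_train:, 1]]

    g_flat, *_ = np.linalg.lstsq(X_tr, y_tr, rcond=None)   # linear in entries of g
    g = 0.5 * (g_flat.reshape(n, n) + g_flat.reshape(n, n).T)  # symmetrize
    test_mse = np.mean((X_te @ g_flat - y_te) ** 2)        # cross-validated error
    return g, test_mse
```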

Our results carry important implications for stimulus decoding. The accuracy of our model for stimulus similarity enables us to create new stimuli that are similar to each other up to any desired distance. We used Monte Carlo simulation to generate ensembles of full-length stimuli, such that each clip from the generated stimulus is less than Θ distant (as measured by Dret) from the corresponding clip in the original stimulus displayed in the experiment (Fig. 4a). Our analysis predicts that for small enough Θ, all stimuli from such an ensemble are essentially indistinguishable to the retina. In contrast to Euclidean distance, which would allow the generated stimuli to fluctuate around the original waveform equally at every point in time for a given threshold Θ, the retinal distance constrains the possible set of stimuli much more at certain times than at others, reflecting a preference of the retina for encoding specific features in the stimulus. This is in part due to compressive nonlinearities, which squeeze large segments of stimulus space with small feature overlap into a small volume as measured by retinal distance (Fig. 2c), while emphasizing stimulus differences with high feature overlap.
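A sketch of the ensemble-generation idea, under our assumptions: a simple accept/reject scheme perturbs one clip at a time and keeps the perturbation only if the model-predicted retinal distance to the original clip stays below Θ; clips are treated as independent for simplicity, ignoring their temporal overlap, and predict_dret refers to the model sketch above:

```python
import numpy as np

def generate_ensemble(S_orig, k1, k2, theta, n_samples=100, n_steps=500,
                      step_std=0.2, seed=0):
    """Generate stimuli whose clips stay within retinal distance theta of the
    corresponding original clips.

    S_orig : (T, 40) array of original stimulus clip vectors.
    Returns an array of shape (n_samples, T, 40).
    Uses predict_dret(s_i, s_j, k1, k2) from the model sketch above.
    """
    rng = np.random.default_rng(seed)
    ensemble = []
    for _ in range(n_samples):
        S = S_orig.copy()
        for _ in range(n_steps):
            t = rng.integers(len(S))                        # pick a random clip
            proposal = S[t] + step_std * rng.standard_normal(S.shape[1])
            if predict_dret(proposal, S_orig[t], k1, k2) < theta:
                S[t] = proposal                             # accept: still within theta
        ensemble.append(S)
    return np.array(ensemble)
```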


FIG. 4: a) Two Monte-Carlo generated ensembles of long stimuli (ensemble mean = red; std = shade), where each clip of the generated stimulus is less than Θ = 1 or Θ = 10 distant from the corresponding clip in the original stimulus (blue line; scale bar 200 ms). Below: std over the ensembles (Θ = 1, 10), showing how the ensembles get more tightly constrained at particular instants corresponding to high neural activity (raster, bottom) and steep stimulus changes. b) The rise in coherence between the original stimulus and the simulated ensembles with decreasing Θ (curves for Θ = 1, 2, 5, 10; frequency axis ∼1–10 Hz).
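The coherence curves of Fig. 4b could be computed along these lines, assuming the original luminance trace and the generated traces are available as time series at the 30 Hz refresh rate; we use scipy.signal.coherence, and averaging over ensemble members is our choice of summary:

```python
import numpy as np
from scipy.signal import coherence

def ensemble_coherence(stim_orig, stim_ensemble, fs=30.0, nperseg=128):
    """Mean magnitude-squared coherence between the original stimulus trace
    and each generated trace in the ensemble (shape: (n_samples, n_time))."""
    curves = []
    for trace in stim_ensemble:
        f, Cxy = coherence(stim_orig, trace, fs=fs, nperseg=nperseg)
        curves.append(Cxy)
    return f, np.mean(curves, axis=0)
```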

Consequently, distance models (such as Euclidean distance) with constant metrics are incapable of capturing the characteristics of the retinal distance. We conjecture that decoding approaches based on minimizing standard distance measures (e.g. [19]) may overly penalize deviations from aspects of the stimulus that are simply not represented in the neural responses, while not emphasizing strongly enough the deviations from the population-level stimulus features identified here.

Understanding the neural code crucially depends on our ability to go beyond single-neuron spatio-temporal receptive fields, and to identify how many, and which, population-level features exist in an interacting population. We introduced a novel, biologically relevant distance measure on the space of stimuli based on the activity of large populations of neurons. This approach extended the single-neuron notions of stimulus feature extraction to the neural population, determining how a high-dimensional input space is partitioned and encoded by population responses. Our work thus suggests a principled alternative to arbitrary norms (like the Euclidean norm) for stimulus similarity and decoding, generalizes previous attempts to construct metrics for particular input spaces from neural responses (e.g. [21, 22]), and complements existing work on the dual problem of constructing relevant spike-train distance measures [23, 24].

The approach we presented here will be instrumental in the analysis of upcoming experiments, which allow the recording of large parts of sensory neural circuits, or even of all the cells encoding some parts of the sensory scene [25]. This approach can be immediately applied to other sensory modalities, where it could signal—much as we have found here—that the "neural metric" deviates considerably from our intuitive notions of similarity.

Moreover, it can be extended to sensory domains where we lack any obvious notion of similarity, e.g. olfaction, for which there exists no natural distance between chemical stimuli [20]. More broadly, as the neural metric is based on the spiking activity itself, this framework can be taken beyond sensory modalities, to study perceptual metrics as well (e.g. [26]), or used to define neural-based distances for motor behavior that would be critical for neural prosthesis applications [27–29].

[1] Rieke F, Warland D, de Ruyter van Steveninck RR & Bialek W (1997) Spikes: Exploring the Neural Code. MIT Press, Cambridge, MA.
[2] de Boer R & Kuyper P (1968) IEEE Trans Biomed Eng 15: 169–179.
[3] Schwartz O, Pillow JW, Rust NC & Simoncelli EP (2006) J Vis 6: 484–507.
[4] Rosenblatt F (1958) Psych Rev 65: 386–408.
[5] de Ruyter van Steveninck RR & Bialek W (1988) Proc Royal Soc B 234: 379–414.
[6] Agüera y Arcas B & Fairhall AL (2003) Neural Comput 15: 1789–1807.
[7] Bialek W & de Ruyter van Steveninck RR (2005) arXiv:q-bio/0505003.
[8] de Ruyter van Steveninck RR, Lewen GD, Strong SP, Koberle R & Bialek W (1997) Science 275: 1805–1808.
[9] Shepard RN (1957) Psychometrika 22: 325–345.
[10] Young MP & Yamane S (1992) Science 256: 1327–1331.
[11] Meister M, Pine J & Baylor DA (1994) J Neurosci Methods 51: 95–106.
[12] Segev R, Goodhouse J, Puchalla J & Berry MJ 2nd (2004) Nat Neurosci 7: 1154–1161.
[13] Granot-Atedgi E, Tkačik G, Segev R & Schneidman E (2012) arXiv:1205.6438.
[14] Cover TM & Thomas JA (1991) Elements of Information Theory. Wiley, New York.
[15] Butts DA & Goldman MS (2006) PLoS Biol 4: e92.
[16] Borg I & Groenen PJF (2005) Modern Multidimensional Scaling. Springer, New York.
[17] Moreno PJ, Ho PP & Vasconcelos N (2003) A Kullback-Leibler divergence based kernel for SVM classification in multimedia applications. Adv Neural Info Proc Syst 16.
[18] Puchalla JL, Schneidman E, Harris RA & Berry MJ 2nd (2005) Neuron 46: 493–504.
[19] Warland DK, Reinagel P & Meister M (1997) J Neurophysiol 78: 2336–2350.
[20] Haddad R, Khan R, Takahashi YK, Mori K, Harel D & Sobel N (2008) Nature Methods 5: 425–429.
[21] Curto C & Itskov V (2008) PLoS Comput Biol 4: e1000205.
[22] Macke JH, Zeck G & Bethge M (2007) Adv Neural Info Proc Syst 20: 969–976.
[23] Victor JD & Purpura KP (1997) Network 8: 127–164.
[24] Ahmadian Y et al. (2008) Analyzing the primate retinal code with efficient model-based decoding methods. SfN 2008 poster.
[25] Marre O et al. (2012) J Neurosci 32: 14859–14873.
[26] Freeman J & Simoncelli EP (2011) Nat Neurosci 14: 1195–1201.
[27] Velliste M et al. (2008) Nature 453: 1098–1101.
[28] Vargas-Irwin CE et al. (2010) J Neurosci 30: 9659–9669.
[29] Hatsopoulos NG & Donoghue JP (2009) Annu Rev Neurosci 32: 249–266.
[30] SDME outperforms conditionally-independent models for P(σ|s), but the low-dimensional qualitative structure of the stimulus space identified in this paper can be robustly reproduced with non-SDME models; see Supplementary Information.