Neural Decoding
Matthias Hennig
School of Informatics, University of Edinburgh
January 2019
Acknowledgements: Mark van Rossum, Chris Williams, and slides from Liam Paninski (Gatsby).
1 / 54
Decoding brain activity
Classification: Which one?
Reconstruction: Homunculus
2 / 54
The Homunculus
Flynculus [Rieke et al., 1996]
3 / 54
Overview
1. Stimulus discrimination, signal detection theory
2. Maximum likelihood and MAP decoding
3. Bounds and Fisher information
4. Spike train decoding and GLMs
4 / 54
Why decoding?
Understanding the neural code:
Given spikes, what was the stimulus?
What aspects of the stimulus does the system encode? (Capacity is limited.)
What information can be extracted from spike trains:
By “downstream” areas? Homunculus.
By the experimenter? Ideal observer analysis.
What is the coding quality?
Design of neural prosthetic devices.
Related to encoding, but encoding does not answer the above questions explicitly.
5 / 54
Decoding examples
Hippocampal place cells: how is location encoded?
Retinal ganglion cells: what information is sent to the brain? What is discarded?
Motor cortex: how can we extract as much information as possible from a collection of M1 cells?
6 / 54
Decoding theory
Probability of the stimulus, the prior: P(s)
Probability of a measured neural response: P(r)
Joint probability of stimulus and response: P(r, s)
Conditional probabilities: P(r|s), P(s|r)
Marginal: P(r) = ∑s P(r|s) P(s)

Note: P(r, s) = P(r|s) P(s) = P(s|r) P(r)

Bayes' theorem:

P(s|r) = P(r|s) P(s) / P(r)
Note that we need to know the stimulus statistics.
7 / 54
Example: Discrimination between two stimuli
Test subjects report left or right motion (two-alternative forced choice, 2AFC). See [Dayan and Abbott, 2002], chapter 3.2.
8 / 54
MT neurons in this task
[Britten et al., 1992]
Some single neurons do as well as the animal!
The scope for averaging across neurons might be limited due to correlations?
The population might still be better/faster? [Cohen and Newsome, 2009]
9 / 54
[Britten et al., 1992] Assuming the rate histograms are Gaussian with equal variance σ², the discriminability is

d′ = (⟨r⟩₊ − ⟨r⟩₋) / σ
10 / 54
Discriminate between the response distributions P(r|−) and P(r|+) (directions − and +), with a discrimination threshold z on the firing rate:
Hit rate: β(z) = P(r ≥ z|+)
False alarm rate: α(z) = P(r ≥ z|−)
stimulus | correct | false
+        | β       | 1 − β
−        | 1 − α   | α

Probability of a correct answer: (β(z) + 1 − α(z))/2.
This can be used to find the optimal z.
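As a sketch of this threshold rule (with made-up means and variance, not the MT data), one can scan z and confirm that for equal-variance Gaussians the optimal threshold lies midway between the means, where performance depends only on d′:

```python
# Sketch of threshold-based discrimination between two Gaussian rate
# distributions; mu_plus, mu_minus and sigma are illustrative values.
import math

def gauss_tail(z, mu, sigma):
    """P(r >= z) for a Gaussian rate distribution N(mu, sigma^2)."""
    return 0.5 * math.erfc((z - mu) / (sigma * math.sqrt(2.0)))

mu_plus, mu_minus, sigma = 20.0, 12.0, 4.0   # assumed firing rates (Hz)

def p_correct(z):
    beta = gauss_tail(z, mu_plus, sigma)     # hit rate beta(z)
    alpha = gauss_tail(z, mu_minus, sigma)   # false-alarm rate alpha(z)
    return 0.5 * (beta + 1.0 - alpha)

# scan candidate thresholds between the two means
zs = [mu_minus + 0.1 * k for k in range(int((mu_plus - mu_minus) / 0.1) + 1)]
z_best = max(zs, key=p_correct)
d_prime = (mu_plus - mu_minus) / sigma
print(z_best, p_correct(z_best), d_prime)
```

With these numbers the best threshold sits at the midpoint between the means, as expected for equal variances.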
11 / 54
ROC curves
Discriminate between the response distributions P(r|−) and P(r|+).
The Receiver Operating Characteristic (ROC) gives graphical intuition:
Vary the decision threshold and measure the error rates.
A larger area under the curve means better discriminability.
The shape relates to the underlying distributions.
12 / 54
[Britten et al., 1992]
P(correct) = ∫₀¹ β dα

When the responses are Gaussian:

P(correct) = (1/2) erfc( (⟨r⟩₋ − ⟨r⟩₊) / (2σ) )
13 / 54
Population decoding
[Dayan and Abbott (2001) after Theunissen and Miller (1991)]
Cricket cercal system: information about wind direction is encoded by four types of neurons:

f(s)/rmax = [cos(s − sa)]₊
14 / 54
Let ca denote a unit vector in the direction of sa, and v a unit vector parallel to the wind velocity:

f(s)/rmax = [v · ca]₊

Crickets are Cartesian: the four preferred directions are 45°, 135°, −135°, −45°.
The population vector is defined as

vpop = ∑a=1..4 (r/rmax)a ca
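The population-vector rule above can be sketched in a few lines; the preferred directions are the four cercal directions from the slide, and the tuning curves are the half-wave-rectified cosines:

```python
# Minimal sketch of population-vector decoding for the four cercal
# interneurons; preferred directions follow the slide (±45°, ±135°).
import math

preferred_deg = [45.0, 135.0, -135.0, -45.0]
pref = [math.radians(a) for a in preferred_deg]

def rates(s):
    """Normalised tuning curves r_a / r_max = [cos(s - s_a)]_+ ."""
    return [max(math.cos(s - sa), 0.0) for sa in pref]

def decode(r):
    """Population vector: sum of preferred-direction unit vectors
    weighted by the normalised rates; read out its angle."""
    x = sum(ra * math.cos(sa) for ra, sa in zip(r, pref))
    y = sum(ra * math.sin(sa) for ra, sa in zip(r, pref))
    return math.atan2(y, x)

s_true = math.radians(30.0)
s_hat = decode(rates(s_true))
print(math.degrees(s_hat))  # recovers 30° in the noise-free case
```

Because the four directions form two orthogonal pairs, the noise-free decode is exact; noise in the rates is what the ML and Bayesian methods below handle better.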
15 / 54
Vector method of decoding
[Dayan and Abbott (2001) after Salinas and Abbott (1994)]
16 / 54
Primary Motor Cortex (M1)
Certain neurons in M1 of the monkey can be described by cosine functions of arm movement direction (Georgopoulos et al., 1982).
Similar to the cricket cercal system, but note:
Non-zero offset rates r0: (f(s) − r0)/rmax = v · ca
Non-orthogonal: there are many thousands of M1 neurons that have arm-movement-related tuning curves.
17 / 54
[Dayan and Abbott (2001) after Kandel et al (1991)]
18 / 54
Optimal Decoding
p(s|r) = p(r|s) p(s) / p(r)

Maximum likelihood decoding (ML): ŝ = argmaxs p(r|s)
Maximum a posteriori (MAP): ŝ = argmaxs p(s) p(r|s)
Note these two are equivalent if p(s) is flat.
Bayes: minimize the loss

sB = argmin s* ∫ L(s, s*) p(s|r) ds

For the squared loss L(s, s*) = (s − s*)², the optimal s* is the posterior mean, sB = ∫ p(s|r) s ds.
19 / 54
Optimal Decoding for the cricket
For the cercal system, assuming independent noise,

p(r|s) = ∏a p(ra|s)

where each p(ra|s) is modelled as a Gaussian with (stimulus-dependent) mean and variance.
p(s) is uniform (hence MAP = ML).
ML decoding finds a peak of the likelihood; the Bayesian method finds the posterior mean.
These methods improve performance over the vector method (but not by much, due to orthogonality).
20 / 54
Cricket Cercal System
[Dayan and Abbott (2001) after Salinas and Abbott (1994)]
21 / 54
General Consideration of Population Decoding
[Dayan and Abbott (2001)]
Gaussian tuning curves.
22 / 54
Poisson firing model over time T, with spike counts na = ra T:

p(r|s) = ∏a=1..N (fa(s)T)^na / na! · exp(−fa(s)T)

log p(r|s) = ∑a=1..N na log fa(s) + …

The terms in … are independent of s, and we assume ∑a fa(s) is independent of s (all neurons sum to the same average firing rate).
23 / 54
ML decoding
sML is the stimulus that maximizes log p(r|s), determined by

∑a=1..N ra fa′(sML)/fa(sML) = 0

If all tuning curves are Gaussian, fa(s) = A exp[−(s − sa)²/2σw²], then

sML = ∑a ra sa / ∑a ra

which is simple and intuitive, known as the centre of mass (cf. population vector)
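A small simulation of the centre-of-mass estimate (all parameters here are illustrative, made-up values):

```python
# Sketch: centre-of-mass ML estimate for Gaussian tuning curves
# with Poisson spike counts; parameters are illustrative.
import math, random

random.seed(0)
centres = [float(s) for s in range(-10, 11)]   # preferred stimuli s_a
A, sigma_w, T = 50.0, 2.0, 0.5                 # peak rate (Hz), width, window (s)

def f(s, sa):
    """Gaussian tuning curve f_a(s)."""
    return A * math.exp(-(s - sa) ** 2 / (2.0 * sigma_w ** 2))

def poisson(lam):
    """Knuth's Poisson sampler, fine for the small means used here."""
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

s_true = 1.3
counts = [poisson(f(s_true, sa) * T) for sa in centres]
rates = [n / T for n in counts]

# s_ML = sum_a r_a s_a / sum_a r_a  (centre of mass)
s_ml = sum(r * sa for r, sa in zip(rates, centres)) / sum(rates)
print(s_ml)
```

On a typical draw the estimate lands within a fraction of the tuning width of the true stimulus; averaging over trials recovers the Cramér–Rao scaling discussed below.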
24 / 54
Accuracy of the estimator
Bias and variance of an estimator sest:

best(s) = ⟨sest⟩ − s
σest²(s) = ⟨(sest − ⟨sest⟩)²⟩
⟨(s − sest)²⟩ = best²(s) + σest²

Thus for an unbiased estimator, the MSE ⟨(s − sest)²⟩ is given by σest², the variance of the estimator.
25 / 54
Fisher information
Fisher information is a measure of the curvature of the log likelihood near its peak:

IF(s) = ⟨−∂² log p(r|s)/∂s²⟩s = −∫ dr p(r|s) ∂² log p(r|s)/∂s²

(the average is over trials measuring r while s is fixed).
The Cramér–Rao bound says that for any estimator [Cover and Thomas, 1991]

σest² ≥ (1 + best′(s))² / IF(s)

An estimator is efficient if σest² = (1 + best′(s))²/IF(s).
In the bias-free case an efficient estimator has σest² = 1/IF(s).
The ML decoder is typically efficient as N → ∞.
26 / 54
Fisher information
In homogeneous systems IF is independent of s.
More generally, the Fisher matrix is (IF)ij(s) = ⟨−∂² log p(r|s)/∂si∂sj⟩s.
Taylor expansion of the Kullback–Leibler divergence: DKL(P(s), P(s + δs)) ≈ (1/2) ∑ij δsi δsj (IF)ij
Not a Shannon information measure (not in bits), but related to Shannon information in special cases, e.g. [Brunel and Nadal, 1998, Yarrow et al., 2012].
27 / 54
Fisher information for a population
For independent Poisson spikers,

IF(s) = ⟨−∂² log p(r|s)/∂s²⟩ = T ∑a ⟨ra⟩ [ (fa′(s)/fa(s))² − fa″(s)/fa(s) ]

For dense, symmetric tuning curves the second term sums to zero. Using fa(s) = ⟨ra⟩ we obtain

IF(s) = T ∑a (fa′(s))² / fa(s)

For dense fa(s) = A exp[−(s − s0 + a·ds)²/2σw²] with density ρ = 1/ds, the sum becomes an integral:

IF = √(2π) T A ρ / σw
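A quick numerical sanity check (with illustrative parameters) that the sum T ∑a (fa′)²/fa over a dense array of Gaussian tuning curves matches the closed form √(2π) T A ρ / σw:

```python
# Sketch: compare the Fisher-information sum over dense Gaussian tuning
# curves with the closed-form sqrt(2*pi)*T*A*rho/sigma_w. Parameters
# (A, sigma_w, T, ds) are illustrative.
import math

A, sigma_w, T, ds = 10.0, 1.0, 1.0, 0.1
rho = 1.0 / ds
centres = [k * ds for k in range(-200, 201)]   # dense array of s_a
s = 0.3                                        # evaluate I_F at this stimulus

def f(sa):
    return A * math.exp(-(s - sa) ** 2 / (2 * sigma_w ** 2))

def fprime(sa):
    # derivative of f_a with respect to s
    return f(sa) * (sa - s) / sigma_w ** 2

I_sum = T * sum(fprime(sa) ** 2 / f(sa) for sa in centres)
I_closed = math.sqrt(2 * math.pi) * T * A * rho / sigma_w
print(I_sum, I_closed)
```

With ds much smaller than σw the discrete sum and the integral agree to many digits, as the slide's continuum limit assumes.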
28 / 54
For Gaussian tuning curves
[Dayan and Abbott (2001)]
Note that the Fisher information vanishes at the peak, since fa′(s) = 0 there.
Can be used to derive optimal tuning curves, [Dayan and Abbott, 2002] chapter 3.3.
Discriminability d′ = Δs √IF(s) for a small Δs.
29 / 54
FI predicts human performance
[Dayan and Abbott (2001)]
Orientation discrimination for stimuli of different size (different N).
Solid line: estimated minimum standard deviation at the Cramér–Rao bound.
Triangles: human standard deviation as a function of stimulus size (expressed in N).
30 / 54
Slope as strategy
From a paper on bat echolocation [Yovel et al., 2010].
31 / 54
Spike train decoding
Dayan and Abbott §3.4
Estimate the stimulus from spike times ti to minimize e.g. ⟨(s(t) − sest(t))²⟩
First-order reconstruction:

sest(t − τ0) = ∑ti K(t − ti) − ⟨r⟩ ∫ dτ K(τ)

The second term ensures that ⟨sest(t)⟩ = 0.
The delay τ0 can be included to make decoding easier: predict the stimulus at time t − τ0 based on spikes up to time t.
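A minimal sketch of this first-order reconstruction; the alpha-function kernel and the spike times below are toy assumptions, not a fitted STA:

```python
# Sketch of first-order (linear kernel) stimulus reconstruction from
# spike times; kernel shape and spike train are invented for illustration.
import math

dt, T = 0.001, 2.0
times = [k * dt for k in range(int(T / dt))]
spikes = [0.20, 0.45, 0.50, 0.90, 1.30, 1.35, 1.80]  # spike times t_i (s)
mean_rate = len(spikes) / T                           # <r>

def K(tau):
    """Toy alpha-function kernel, zero for tau <= 0."""
    return (tau / 0.02) * math.exp(1 - tau / 0.02) if tau > 0 else 0.0

# integral of K (the kernel has decayed to nothing well before 0.5 s)
K_int = sum(K(k * dt) * dt for k in range(int(0.5 / dt)))

def s_est(t):
    # sum of kernels centred on spikes, minus the mean-rate correction,
    # so that the time average of s_est is approximately zero
    return sum(K(t - ti) for ti in spikes) - mean_rate * K_int

trace = [s_est(t) for t in times]
print(sum(trace) / len(trace))   # near zero by construction
```

Replacing K by a kernel estimated from the spike-triggered average, and shifting it by τ0, gives the causal decoding scheme of the next slides.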
32 / 54
Causal decoding
The organism faces a causal (on-line) decoding problem.
Prediction of the current/future stimulus requires temporal correlation of the stimulus. Example: in the head-direction system the neural code correlates best with the future direction.
Requires K(t − ti) = 0 for t ≤ ti.

sest(t − τ0) = ∑ti K(t − ti) − ⟨r⟩ ∫ dτ K(τ)

The delay τ0 buys extra time.
33 / 54
Causal decoding
Delay τ0 = 160 ms. (B: shifted/causal kernel, C: non-causal kernel.)
At time t, estimate s(t − τ0):
Spikes 1–4 contribute because the stimulus is correlated (right tail of K).
Spikes 5–7 contribute because of τ0.
Spikes 8, 9, … have not occurred yet; the stimulus is uncorrelated with them.
Kernel from STA. [Dayan and Abbott (2001)]
34 / 54
Decoding from a GLM
[Figure: simulated stimulus over 200 time bins, with STA-based and GLM-based reconstructions overlaid; axes: time bin, stimulus value s]
35 / 54
H1 neuron of the fly.
Solid line is the reconstruction using an acausal filter.
Note: reconstruction quality will depend on the stimulus.
[Dayan and Abbott (2001) after Rieke et al (1997)]
36 / 54
MAP estimation
According to Bayes' theorem,

p(s|r) = p(r|s) p(s) / p(r)

log p(s|r) = log p(r|s) + log p(s) + c

This is tractable as long as the right-hand side is concave, which excludes heavy-tailed p(s). It requires computing

ŝ = argmaxs log p(s|r)

This is numerically hard; see [Pillow et al., 2011] for methods.
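For a toy one-dimensional case the optimization is easy; the sketch below (an assumed exponential tuning curve and Gaussian prior, not the models of [Pillow et al., 2011]) finds the MAP estimate by Newton's method on the concave log posterior:

```python
# Minimal sketch of MAP decoding with a concave log posterior:
# one Poisson neuron with exponential tuning exp(a*s + b) and a
# standard-Gaussian prior; all numbers are illustrative.
import math

a, b, T = 1.0, 0.5, 1.0        # assumed GLM parameters
n = 4                          # observed spike count

def log_post(s):
    lam = math.exp(a * s + b) * T
    return n * math.log(lam) - lam - 0.5 * s ** 2   # log p(r|s) + log p(s)

# Newton's method converges quickly because the log posterior is concave
s_map = 0.0
for _ in range(50):
    lam = math.exp(a * s_map + b) * T
    grad = n * a - a * lam - s_map
    hess = -a * a * lam - 1.0
    s_map -= grad / hess
print(s_map)
```

In higher dimensions the same idea applies, but each Newton step involves a large (if sparse) Hessian, which is where the specialised methods come in.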
37 / 54
MAP decoding simulation
[Pillow et al., 2011]
38 / 54
Conclusion stimulus reconstruction
Stimulus reconstruction is similar to the encoding problem. But:
The response is given; it cannot be chosen to be white.
Imposing causality adds realism but reduces quality.
The reconstruction problem can be ill-posed: it is not always possible to reconstruct the stimulus (cf. dictionary). For instance: complex cells.
Still, the cell provides information about the stimulus. One could try to read the code rather than reconstruct the stimulus (e.g. ideal observer).
39 / 54
Readout of Object Identity from Macaque IT Cortex
[Hung et al., 2005]
Recording from ∼300 sites in inferior temporal (IT) cortex.
Present images of 77 stimuli (of different objects) at various locations and scales in the visual field.
Task is to categorize objects into 8 classes, or to identify all 77 objects.
Predictions based on one-vs-rest linear SVM classifiers, using data in 50 ms bins from 100 ms to 300 ms after stimulus onset.
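The readout scheme can be sketched as follows; since the paper's SVMs are not reproduced here, this stand-in uses one-vs-rest least-squares linear readouts on synthetic population responses (random class means, invented noise level):

```python
# Sketch of one-vs-rest linear readout of object category from
# population activity; data are synthetic, not the IT recordings.
import numpy as np

rng = np.random.default_rng(0)
n_classes, n_units, n_trials = 8, 50, 40
means = rng.normal(0, 1, (n_classes, n_units))   # class-specific firing patterns

X = np.vstack([m + 0.5 * rng.normal(0, 1, (n_trials, n_units)) for m in means])
y = np.repeat(np.arange(n_classes), n_trials)

# one-vs-rest: one linear readout per class, targets +1 / -1
Y = np.where(y[:, None] == np.arange(n_classes)[None, :], 1.0, -1.0)
Xb = np.hstack([X, np.ones((X.shape[0], 1))])    # append a bias column
W, *_ = np.linalg.lstsq(Xb, Y, rcond=None)

pred = np.argmax(Xb @ W, axis=1)                 # most confident readout wins
print((pred == y).mean())                        # training accuracy
```

A max-margin SVM would differ in how the weights are chosen, but the decoding architecture (one linear unit per class, winner take all) is the same, and is plausibly implementable in neural hardware.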
40 / 54
[Hung et al., 2005]
41 / 54
What does this tell us?
Performance of classifiers can provide a lower bound on the information available in the population activity.
Assuming independence, correlations are ignored; correlations could limit or enhance the information.
Distributed representation.
A linear classifier can plausibly be implemented in neural hardware.
42 / 54
4. Population Encoding
Dayan and Abbott §3.3
Population encoding uses a large number of neurons to represent information.
Advantage 1: reduction of uncertainty due to neuronal variability (improves reaction time).
Advantage 2: the ability to represent a number of different stimulus attributes simultaneously (e.g. in V1, location and orientation).
43 / 54
Population codes and correlations: Retina
Fit a coupled GLM (see encoding) to retinal data.
[Pillow et al., 2008]
44 / 54
Population codes and correlations: Retina
[Pillow et al., 2008]
45 / 54
Hippocampal Place Cell Decoding
[Brown et al., 1998]
Encoding in place cells: modelled as inhomogeneous Poisson processes with Gaussian receptive fields (incl. theta oscillations), assuming independence.
Encoding of the path: a random walk.
Bayesian filter decoding: compute the posterior, then update the estimate in the next time step using information from the spikes (combine the prior and new information).
Non-Bayesian decoding: compute the estimate in time steps given the spikes in a time window (only current information).
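A toy version of the Bayesian filter on a 1-D track (Gaussian place fields, Poisson spiking, random-walk prior; all parameters invented for illustration, not those of the paper):

```python
# Sketch of recursive Bayesian decoding of position from place cells:
# predict with a random-walk kernel, then multiply by the Poisson
# likelihood of the observed spike counts. Toy parameters throughout.
import numpy as np

rng = np.random.default_rng(1)
xs = np.linspace(0.0, 1.0, 101)          # position grid (m)
centres = np.linspace(0.0, 1.0, 15)      # place-field centres
peak, width, dt = 25.0, 0.08, 0.1        # Hz, m, s

def rates(x):
    return peak * np.exp(-(x - centres) ** 2 / (2 * width ** 2))

# simulate a random-walk path and the resulting spike counts
path, x = [], 0.5
for _ in range(100):
    x = min(max(x + rng.normal(0, 0.02), 0.0), 1.0)
    path.append(x)
counts = [rng.poisson(rates(x) * dt) for x in path]

# precompute the transition kernel and per-position expected counts
spread = np.exp(-(xs[:, None] - xs[None, :]) ** 2 / (2 * 0.03 ** 2))
lam = np.array([rates(xg) for xg in xs]) * dt        # (grid, cells)

post = np.ones_like(xs) / len(xs)                    # uniform initial posterior
errs = []
for x_true, n in zip(path, counts):
    prior = spread @ post                            # predict step
    loglik = (n * np.log(lam + 1e-12) - lam).sum(axis=1)
    post = prior * np.exp(loglik - loglik.max())     # update step
    post /= post.sum()
    errs.append(abs(xs[np.argmax(post)] - x_true))
print(np.mean(errs))
```

The non-Bayesian alternative on the slide corresponds to dropping the predict step and decoding each time bin from its spikes alone.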
46 / 54
[Brown et al., 1998]
Bayes, maximum likelihood, linear kernel, and correlation-based decoders.
47 / 54
Example: Motor decoding
[Shpigelman et al., 2005]
Rhesus monkey, 43 electrodes in M1.
Monkey controls cursors on a screen using two manipulanda to perform a centre-out reaching task.
Predict hand velocity based on 10 time bins, each of length 100 ms, across all 43 neurons.
Can use linear regression, polynomial regression, a Gaussian kernel (support vector regression), or the spikernel (allows time warping).
More sophisticated methods outperform linear regression, but linear is already decent.
State-of-the-art w. Kalman filters [Gilja et al., 2012]
48 / 54
[Shpigelman et al., 2005]
49 / 54
Decoding from your brain
[Miyawaki et al., 2008]
fMRI, linear decoder from voxel activations.
50 / 54
Summary
Discrimination between stimuli (or actions) using just recorded spikes is possible and tractable.
Full reconstruction of a stimulus is hard, especially when the stimulus dimensionality is high. It is also unclear to what extent this is even possible.
Decoding can tell us how much information neural activity carries about the outside world; decoder performance provides a lower bound on it.
BMI applications are on the horizon.
51 / 54
References I
Britten, K. H., Shadlen, M. N., Newsome, W. T., and Movshon, J. A. (1992).The analysis of visual motion: a comparison of neuronal and psychophysical performance.J Neurosci, 12:4745–4765.
Brown, E. N., Frank, L. M., Tang, D., Quirk, M. C., and Wilson, M. A. (1998).A statistical paradigm for neural spike train decoding applied to position prediction fromensemble firing patterns of rat hippocampal place cells.Journal of Neuroscience, 18(18):7411–7425.
Brunel, N. and Nadal, J.-P. (1998).Mutual information, Fisher information, and population coding.Neural Comp., 10:1731–1757.
Cohen, M. R. and Newsome, W. T. (2009).Estimates of the contribution of single neurons to perception depend on timescale andnoise correlation.J Neurosci, 29(20):6635–6648.
Cover, T. M. and Thomas, J. A. (1991).Elements of information theory.Wiley, New York.
Dayan, P. and Abbott, L. F. (2002).Theoretical Neuroscience.MIT press, Cambridge, MA.
52 / 54
References II
Gilja, V., Nuyujukian, P., Chestek, C. A., Cunningham, J. P., Yu, B. M., Fan, J. M.,Churchland, M. M., Kaufman, M. T., Kao, J. C., Ryu, S. I., and Shenoy, K. V. (2012).A high-performance neural prosthesis enabled by control algorithm design.Nat Neurosci, 15(12):1752–1757.
Hung, C. P., Kreiman, G., Poggio, T., and DiCarlo, J. J. (2005).Fast Readout of Object Identity from Macaque Inferior Temporal Cortex.Science, 310:863–866.
Miyawaki, Y., Uchida, H., Yamashita, O., Sato, M.-a., Morito, Y., Tanabe, H. C., Sadato, N.,and Kamitani, Y. (2008).Visual image reconstruction from human brain activity using a combination of multiscalelocal image decoders.Neuron, 60(5):915–929.
Pillow, J. W., Ahmadian, Y., and Paninski, L. (2011).Model-based decoding, information estimation, and change-point detection techniques formultineuron spike trains.Neural computation, 23(1):1–45.
Pillow, J. W., Shlens, J., Paninski, L., Sher, A., Litke, A. M., Chichilnisky, E. J., andSimoncelli, E. P. (2008).Spatio-temporal correlations and visual signalling in a complete neuronal population.Nature, 454(7207):995–999.
53 / 54
References III
Rieke, F., Warland, D., de Ruyter van Steveninck, R., and Bialek, W. (1996).Spikes: Exploring the neural code.MIT Press, Cambridge.
Shpigelman, L., Singer, Y., Paz, R., and Vaadia, E. (2005).Spikernels: Predicting Arm Movements by Embedding Population Spike Rate Patterns inInner-Product Spaces.Neural Comput, 17:671–690.
Yarrow, S., Challis, E., and Seriès, P. (2012).Fisher and Shannon information in finite neural populations.Neural Comput, 24(7):1740–1780.
Yovel, Y., Falk, B., Moss, C. F., and Ulanovsky, N. (2010).Optimal localization by pointing off axis.Science, 327(5966):701–704.
54 / 54