Neural Decoding
Matthias Hennig
School of Informatics, University of Edinburgh
January 2019
Acknowledgements: Mark van Rossum, Chris Williams, and slides from Liam Paninski (Gatsby).
1 / 54
Decoding brain activity
Classification: Which one?
Reconstruction: Homunculus
2 / 54
The Homunculus
Flynculus [Rieke et al., 1996]
3 / 54
Overview
1. Stimulus discrimination, signal detection theory
2. Maximum likelihood and MAP decoding
3. Bounds and Fisher information
4. Spike train decoding and GLMs
4 / 54
Why decoding?
Understanding the neural code:
Given spikes, what was the stimulus?
What aspects of the stimulus does the system encode? (Capacity is limited.)
What information can be extracted from spike trains:
By “downstream” areas? Homunculus.
By the experimenter? Ideal observer analysis.
What is the coding quality?
Design of neural prosthetic devices.
Related to encoding, but encoding does not answer the above questions explicitly.
5 / 54
Decoding examples
Hippocampal place cells: how is location encoded?
Retinal ganglion cells: what information is sent to the brain? What is discarded?
Motor cortex: how can we extract as much information as possible from a collection of M1 cells?
6 / 54
Decoding theory
Probability of the stimulus, the prior: P(s)
Probability of a measured neural response: P(r)
Joint probability of stimulus and response: P(r, s)
Conditional probabilities: P(r|s), P(s|r)
Marginal: P(r) = ∑s P(r|s) P(s)

Note: P(r, s) = P(r|s) P(s) = P(s|r) P(r)

Bayes' theorem:

P(s|r) = P(r|s) P(s) / P(r)
Note that we need to know the stimulus statistics.
7 / 54
Example: Discrimination between two stimuli
Test subjects report left or right motion (two-alternative forced choice, 2AFC). See [Dayan and Abbott, 2002], chapter 3.2.
8 / 54
MT neurons in this task
[Britten et al., 1992]
Some single neurons do as well as the animal!
The scope for averaging across neurons might be limited due to correlations?
The population might still be better/faster? [Cohen and Newsome, 2009]
9 / 54
[Britten et al., 1992] Assuming the rate histograms are Gaussian with equal variance σ², the discriminability is

d′ = (⟨r⟩₊ − ⟨r⟩₋) / σ
10 / 54
Discriminate between the response distributions P(r|−) and P(r|+) (directions − and +), with a discrimination threshold z on the firing rate:
Hit rate: β(z) = P(r ≥ z|+)
False alarm rate: α(z) = P(r ≥ z|−)
stimulus | correct | false
+        | β       | 1 − β
−        | 1 − α   | α

Probability of a correct answer: (β(z) + 1 − α(z))/2.
This can be used to find the optimal z.
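As a sketch of this threshold rule (with made-up means and variance, not the MT data), one can scan z and confirm that for equal-variance Gaussians the optimal threshold lies midway between the means, where performance depends only on d′:

```python
# Sketch of threshold-based discrimination between two Gaussian rate
# distributions; mu_plus, mu_minus and sigma are illustrative values.
import math

def gauss_tail(z, mu, sigma):
    """P(r >= z) for a Gaussian rate distribution N(mu, sigma^2)."""
    return 0.5 * math.erfc((z - mu) / (sigma * math.sqrt(2.0)))

mu_plus, mu_minus, sigma = 20.0, 12.0, 4.0   # assumed firing rates (Hz)

def p_correct(z):
    beta = gauss_tail(z, mu_plus, sigma)     # hit rate beta(z)
    alpha = gauss_tail(z, mu_minus, sigma)   # false-alarm rate alpha(z)
    return 0.5 * (beta + 1.0 - alpha)

# scan candidate thresholds between the two means
zs = [mu_minus + 0.1 * k for k in range(int((mu_plus - mu_minus) / 0.1) + 1)]
z_best = max(zs, key=p_correct)
d_prime = (mu_plus - mu_minus) / sigma
print(z_best, p_correct(z_best), d_prime)
```

With these numbers the best threshold sits at the midpoint between the means, as expected for equal variances.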
11 / 54
ROC curves
Discriminate between the response distributions P(r|−) and P(r|+).
The Receiver Operating Characteristic (ROC) gives graphical intuition:
Vary the decision threshold and measure the error rates.
A larger area under the curve means better discriminability.
The shape relates to the underlying distributions.
12 / 54
[Britten et al., 1992]
P(correct) = ∫₀¹ β dα

When the responses are Gaussian:

P(correct) = (1/2) erfc( (⟨r⟩₋ − ⟨r⟩₊) / (2σ) )
13 / 54
Population decoding
[Dayan and Abbott (2001) after Theunissen and Miller (1991)]
Cricket cercal system: information about wind direction is encoded by four types of neurons:

f(s)/rmax = [cos(s − sa)]₊
14 / 54
Let ca denote a unit vector in the direction of sa, and v a unit vector parallel to the wind velocity:

f(s)/rmax = [v · ca]₊

Crickets are Cartesian: the four preferred directions are 45°, 135°, −135°, −45°.
The population vector is defined as

vpop = ∑a=1..4 (r/rmax)a ca
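The population-vector rule above can be sketched in a few lines; the preferred directions are the four cercal directions from the slide, and the tuning curves are the half-wave-rectified cosines:

```python
# Minimal sketch of population-vector decoding for the four cercal
# interneurons; preferred directions follow the slide (±45°, ±135°).
import math

preferred_deg = [45.0, 135.0, -135.0, -45.0]
pref = [math.radians(a) for a in preferred_deg]

def rates(s):
    """Normalised tuning curves r_a / r_max = [cos(s - s_a)]_+ ."""
    return [max(math.cos(s - sa), 0.0) for sa in pref]

def decode(r):
    """Population vector: sum of preferred-direction unit vectors
    weighted by the normalised rates; read out its angle."""
    x = sum(ra * math.cos(sa) for ra, sa in zip(r, pref))
    y = sum(ra * math.sin(sa) for ra, sa in zip(r, pref))
    return math.atan2(y, x)

s_true = math.radians(30.0)
s_hat = decode(rates(s_true))
print(math.degrees(s_hat))  # recovers 30° in the noise-free case
```

Because the four directions form two orthogonal pairs, the noise-free decode is exact; noise in the rates is what the ML and Bayesian methods below handle better.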
15 / 54
Vector method of decoding
[Dayan and Abbott (2001) after Salinas and Abbott (1994)]
16 / 54
Primary Motor Cortex (M1)
Certain neurons in M1 of the monkey can be described by cosine functions of arm movement direction (Georgopoulos et al., 1982).
Similar to the cricket cercal system, but note:
Non-zero offset rates r0: (f(s) − r0)/rmax = v · ca
Non-orthogonal: there are many thousands of M1 neurons that have arm-movement-related tuning curves.
17 / 54
[Dayan and Abbott (2001) after Kandel et al (1991)]
18 / 54
Optimal Decoding
p(s|r) = p(r|s) p(s) / p(r)

Maximum likelihood decoding (ML): ŝ = argmaxs p(r|s)
Maximum a posteriori (MAP): ŝ = argmaxs p(s) p(r|s)
Note these two are equivalent if p(s) is flat.
Bayes: minimize the loss

sB = argmin s* ∫ L(s, s*) p(s|r) ds

For the squared loss L(s, s*) = (s − s*)², the optimal s* is the posterior mean, sB = ∫ p(s|r) s ds.
19 / 54
Optimal Decoding for the cricket
For the cercal system, assuming independent noise,

p(r|s) = ∏a p(ra|s)

where each p(ra|s) is modelled as a Gaussian with (stimulus-dependent) mean and variance.
p(s) is uniform (hence MAP = ML).
ML decoding finds a peak of the likelihood; the Bayesian method finds the posterior mean.
These methods improve performance over the vector method (but not by much, due to orthogonality).
20 / 54
Cricket Cercal System
[Dayan and Abbott (2001) after Salinas and Abbott (1994)]
21 / 54
General Consideration of Population Decoding
[Dayan and Abbott (2001)]
Gaussian tuning curves.
22 / 54
Poisson firing model over time T, with spike counts na = ra T:

p(r|s) = ∏a=1..N (fa(s)T)^na / na! · exp(−fa(s)T)

log p(r|s) = ∑a=1..N na log fa(s) + …

The terms in … are independent of s, and we assume ∑a fa(s) is independent of s (all neurons sum to the same average firing rate).
23 / 54
ML decoding
sML is the stimulus that maximizes log p(r|s), determined by

∑a=1..N ra fa′(sML)/fa(sML) = 0

If all tuning curves are Gaussian, fa(s) = A exp[−(s − sa)²/2σw²], then

sML = ∑a ra sa / ∑a ra

which is simple and intuitive, known as the centre of mass (cf. population vector)
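A small simulation of the centre-of-mass estimate (all parameters here are illustrative, made-up values):

```python
# Sketch: centre-of-mass ML estimate for Gaussian tuning curves
# with Poisson spike counts; parameters are illustrative.
import math, random

random.seed(0)
centres = [float(s) for s in range(-10, 11)]   # preferred stimuli s_a
A, sigma_w, T = 50.0, 2.0, 0.5                 # peak rate (Hz), width, window (s)

def f(s, sa):
    """Gaussian tuning curve f_a(s)."""
    return A * math.exp(-(s - sa) ** 2 / (2.0 * sigma_w ** 2))

def poisson(lam):
    """Knuth's Poisson sampler, fine for the small means used here."""
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

s_true = 1.3
counts = [poisson(f(s_true, sa) * T) for sa in centres]
rates = [n / T for n in counts]

# s_ML = sum_a r_a s_a / sum_a r_a  (centre of mass)
s_ml = sum(r * sa for r, sa in zip(rates, centres)) / sum(rates)
print(s_ml)
```

On a typical draw the estimate lands within a fraction of the tuning width of the true stimulus; averaging over trials recovers the Cramér–Rao scaling discussed below.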
24 / 54
Accuracy of the estimator
Bias and variance of an estimator sest:

best(s) = ⟨sest⟩ − s
σest²(s) = ⟨(sest − ⟨sest⟩)²⟩
⟨(s − sest)²⟩ = best²(s) + σest²

Thus for an unbiased estimator, the MSE ⟨(s − sest)²⟩ is given by σest², the variance of the estimator.
25 / 54
Fisher information
Fisher information is a measure of the curvature of the log likelihood near its peak:

IF(s) = ⟨−∂² log p(r|s)/∂s²⟩s = −∫ dr p(r|s) ∂² log p(r|s)/∂s²

(the average is over trials measuring r while s is fixed).
The Cramér–Rao bound says that for any estimator [Cover and Thomas, 1991]

σest² ≥ (1 + best′(s))² / IF(s)

An estimator is efficient if σest² = (1 + best′(s))²/IF(s).
In the bias-free case an efficient estimator has σest² = 1/IF(s).
The ML decoder is typically efficient as N → ∞.
26 / 54
Fisher information
In homogeneous systems IF is independent of s.
More generally, the Fisher matrix is (IF)ij(s) = ⟨−∂² log p(r|s)/∂si∂sj⟩s.
Taylor expansion of the Kullback–Leibler divergence: DKL(P(s), P(s + δs)) ≈ (1/2) ∑ij δsi δsj (IF)ij
Not a Shannon information measure (not in bits), but related to Shannon information in special cases, e.g. [Brunel and Nadal, 1998, Yarrow et al., 2012].
27 / 54
Fisher information for a population
For independent Poisson spikers,

IF(s) = ⟨−∂² log p(r|s)/∂s²⟩ = T ∑a ⟨ra⟩ [ (fa′(s)/fa(s))² − fa″(s)/fa(s) ]

For dense, symmetric tuning curves the second term sums to zero. Using fa(s) = ⟨ra⟩ we obtain

IF(s) = T ∑a (fa′(s))² / fa(s)

For dense fa(s) = A exp[−(s − s0 + a·ds)²/2σw²] with density ρ = 1/ds, the sum becomes an integral:

IF = √(2π) T A ρ / σw
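A quick numerical sanity check (with illustrative parameters) that the sum T ∑a (fa′)²/fa over a dense array of Gaussian tuning curves matches the closed form √(2π) T A ρ / σw:

```python
# Sketch: compare the Fisher-information sum over dense Gaussian tuning
# curves with the closed-form sqrt(2*pi)*T*A*rho/sigma_w. Parameters
# (A, sigma_w, T, ds) are illustrative.
import math

A, sigma_w, T, ds = 10.0, 1.0, 1.0, 0.1
rho = 1.0 / ds
centres = [k * ds for k in range(-200, 201)]   # dense array of s_a
s = 0.3                                        # evaluate I_F at this stimulus

def f(sa):
    return A * math.exp(-(s - sa) ** 2 / (2 * sigma_w ** 2))

def fprime(sa):
    # derivative of f_a with respect to s
    return f(sa) * (sa - s) / sigma_w ** 2

I_sum = T * sum(fprime(sa) ** 2 / f(sa) for sa in centres)
I_closed = math.sqrt(2 * math.pi) * T * A * rho / sigma_w
print(I_sum, I_closed)
```

With ds much smaller than σw the discrete sum and the integral agree to many digits, as the slide's continuum limit assumes.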
28 / 54
For Gaussian tuning curves
[Dayan and Abbott (2001)]
Note that the Fisher information vanishes at the peak, since fa′(s) = 0 there.
Can be used to derive optimal tuning curves, [Dayan and Abbott, 2002] chapter 3.3.
Discriminability d′ = Δs √IF(s) for a small Δs.
29 / 54
FI predicts human performance
[Dayan and Abbott (2001)]
Orientation discrimination for stimuli of different size (different N).
Solid line: estimated minimum standard deviation at the Cramér–Rao bound.
Triangles: human standard deviation as a function of stimulus size (expressed in N).
30 / 54
Slope as strategy
From a paper on bat echolocation [Yovel et al., 2010].
31 / 54
Spike train decoding
Dayan and Abbott §3.4
Estimate the stimulus from spike times ti to minimize e.g. ⟨(s(t) − sest(t))²⟩
First-order reconstruction:

sest(t − τ0) = ∑ti K(t − ti) − ⟨r⟩ ∫ dτ K(τ)

The second term ensures that ⟨sest(t)⟩ = 0.
The delay τ0 can be included to make decoding easier: predict the stimulus at time t − τ0 based on spikes up to time t.
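A minimal sketch of this first-order reconstruction; the alpha-function kernel and the spike times below are toy assumptions, not a fitted STA:

```python
# Sketch of first-order (linear kernel) stimulus reconstruction from
# spike times; kernel shape and spike train are invented for illustration.
import math

dt, T = 0.001, 2.0
times = [k * dt for k in range(int(T / dt))]
spikes = [0.20, 0.45, 0.50, 0.90, 1.30, 1.35, 1.80]  # spike times t_i (s)
mean_rate = len(spikes) / T                           # <r>

def K(tau):
    """Toy alpha-function kernel, zero for tau <= 0."""
    return (tau / 0.02) * math.exp(1 - tau / 0.02) if tau > 0 else 0.0

# integral of K (the kernel has decayed to nothing well before 0.5 s)
K_int = sum(K(k * dt) * dt for k in range(int(0.5 / dt)))

def s_est(t):
    # sum of kernels centred on spikes, minus the mean-rate correction,
    # so that the time average of s_est is approximately zero
    return sum(K(t - ti) for ti in spikes) - mean_rate * K_int

trace = [s_est(t) for t in times]
print(sum(trace) / len(trace))   # near zero by construction
```

Replacing K by a kernel estimated from the spike-triggered average, and shifting it by τ0, gives the causal decoding scheme of the next slides.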
32 / 54
Causal decoding
The organism faces a causal (on-line) decoding problem.
Prediction of the current/future stimulus requires temporal correlation of the stimulus. Example: in the head-direction system the neural code correlates best with the future direction.
Requires K(t − ti) = 0 for t ≤ ti.

sest(t − τ0) = ∑ti K(t − ti) − ⟨r⟩ ∫ dτ K(τ)

The delay τ0 buys extra time.
33 / 54
Causal decoding
Delay τ0 = 160 ms. (B: shifted/causal kernel, C: non-causal kernel.)
At time t, estimate s(t − τ0):
Spikes 1–4 contribute because the stimulus is correlated (right tail of K).
Spikes 5–7 contribute because of τ0.
Spikes 8, 9, … have not occurred yet; the stimulus is uncorrelated with them.
Kernel from STA. [Dayan and Abbott (2001)]
34 / 54
Decoding from a GLM
[Figure: simulated stimulus over 200 time bins, with STA-based and GLM-based reconstructions overlaid; axes: time bin, stimulus value s]
35 / 54
H1 neuron of the fly.
Solid line is the reconstruction using an acausal filter.
Note: reconstruction quality will depend on the stimulus.
[Dayan and Abbott (2001) after Rieke et al (1997)]
36 / 54
MAP estimation
According to Bayes' theorem,

p(s|r) = p(r|s) p(s) / p(r)

log p(s|r) = log p(r|s) + log p(s) + c

This is tractable as long as the right-hand side is concave, which excludes heavy-tailed p(s). It requires computing

ŝ = argmaxs log p(s|r)

This is numerically hard; see [Pillow et al., 2011] for methods.
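For a toy one-dimensional case the optimization is easy; the sketch below (an assumed exponential tuning curve and Gaussian prior, not the models of [Pillow et al., 2011]) finds the MAP estimate by Newton's method on the concave log posterior:

```python
# Minimal sketch of MAP decoding with a concave log posterior:
# one Poisson neuron with exponential tuning exp(a*s + b) and a
# standard-Gaussian prior; all numbers are illustrative.
import math

a, b, T = 1.0, 0.5, 1.0        # assumed GLM parameters
n = 4                          # observed spike count

def log_post(s):
    lam = math.exp(a * s + b) * T
    return n * math.log(lam) - lam - 0.5 * s ** 2   # log p(r|s) + log p(s)

# Newton's method converges quickly because the log posterior is concave
s_map = 0.0
for _ in range(50):
    lam = math.exp(a * s_map + b) * T
    grad = n * a - a * lam - s_map
    hess = -a * a * lam - 1.0
    s_map -= grad / hess
print(s_map)
```

In higher dimensions the same idea applies, but each Newton step involves a large (if sparse) Hessian, which is where the specialised methods come in.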
37 / 54
MAP decoding simulation
[Pillow et al., 2011]
38 / 54
Conclusion stimulus reconstruction
Stimulus reconstruction is similar to the encoding problem. But:
The response is given; it cannot be chosen to be white.
Imposing causality adds realism but reduces quality.
The reconstruction problem can be ill-posed: it is not always possible to reconstruct the stimulus (cf. dictionary). For instance: complex cells.
Still, the cell provides information about the stimulus. One could try to read the code rather than reconstruct the stimulus (e.g. ideal observer).
39 / 54
Readout of Object Identity from Macaque IT Cortex
[Hung et al., 2005]
Recording from ∼300 sites in inferior temporal (IT) cortex.
Present images of 77 stimuli (of different objects) at various locations and scales in the visual field.
Task is to categorize objects into 8 classes, or to identify all 77 objects.
Predictions based on one-vs-rest linear SVM classifiers, using data in 50 ms bins from 100 ms to 300 ms after stimulus onset.
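The readout scheme can be sketched as follows; since the paper's SVMs are not reproduced here, this stand-in uses one-vs-rest least-squares linear readouts on synthetic population responses (random class means, invented noise level):

```python
# Sketch of one-vs-rest linear readout of object category from
# population activity; data are synthetic, not the IT recordings.
import numpy as np

rng = np.random.default_rng(0)
n_classes, n_units, n_trials = 8, 50, 40
means = rng.normal(0, 1, (n_classes, n_units))   # class-specific firing patterns

X = np.vstack([m + 0.5 * rng.normal(0, 1, (n_trials, n_units)) for m in means])
y = np.repeat(np.arange(n_classes), n_trials)

# one-vs-rest: one linear readout per class, targets +1 / -1
Y = np.where(y[:, None] == np.arange(n_classes)[None, :], 1.0, -1.0)
Xb = np.hstack([X, np.ones((X.shape[0], 1))])    # append a bias column
W, *_ = np.linalg.lstsq(Xb, Y, rcond=None)

pred = np.argmax(Xb @ W, axis=1)                 # most confident readout wins
print((pred == y).mean())                        # training accuracy
```

A max-margin SVM would differ in how the weights are chosen, but the decoding architecture (one linear unit per class, winner take all) is the same, and is plausibly implementable in neural hardware.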
40 / 54
[Hung et al., 2005]
41 / 54
What does this tell us?
Performance of classifiers can provide a lower bound on the information available in the population activity.
Assuming independence, correlations are ignored; correlations could limit or enhance the information.
Distributed representation.
A linear classifier can plausibly be implemented in neural hardware.
42 / 54
4. Population Encoding
Dayan and Abbott §3.3
Population encoding uses a large number of neurons to represent information.
Advantage 1: reduction of uncertainty due to neuronal variability (improves reaction time).
Advantage 2: the ability to represent a number of different stimulus attributes simultaneously (e.g. in V1, location and orientation).
43 / 54
Population codes and correlations: Retina
Fit a coupled GLM (see encoding) to retinal data.
[Pillow et al., 2008]
44 / 54
Population codes and correlations: Retina
[Pillow et al., 2008]
45 / 54
Hippocampal Place Cell Decoding
[Brown et al., 1998]
Encoding in place cells: modelled as inhomogeneous Poisson processes with Gaussian receptive fields (incl. theta oscillations), assuming independence.
Encoding of the path: a random walk.
Bayesian filter decoding: compute the posterior, then update the estimate in the next time step using information from the spikes (combine the prior and new information).
Non-Bayesian decoding: compute the estimate in time steps given the spikes in a time window (only current information).
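A toy version of the Bayesian filter on a 1-D track (Gaussian place fields, Poisson spiking, random-walk prior; all parameters invented for illustration, not those of the paper):

```python
# Sketch of recursive Bayesian decoding of position from place cells:
# predict with a random-walk kernel, then multiply by the Poisson
# likelihood of the observed spike counts. Toy parameters throughout.
import numpy as np

rng = np.random.default_rng(1)
xs = np.linspace(0.0, 1.0, 101)          # position grid (m)
centres = np.linspace(0.0, 1.0, 15)      # place-field centres
peak, width, dt = 25.0, 0.08, 0.1        # Hz, m, s

def rates(x):
    return peak * np.exp(-(x - centres) ** 2 / (2 * width ** 2))

# simulate a random-walk path and the resulting spike counts
path, x = [], 0.5
for _ in range(100):
    x = min(max(x + rng.normal(0, 0.02), 0.0), 1.0)
    path.append(x)
counts = [rng.poisson(rates(x) * dt) for x in path]

# precompute the transition kernel and per-position expected counts
spread = np.exp(-(xs[:, None] - xs[None, :]) ** 2 / (2 * 0.03 ** 2))
lam = np.array([rates(xg) for xg in xs]) * dt        # (grid, cells)

post = np.ones_like(xs) / len(xs)                    # uniform initial posterior
errs = []
for x_true, n in zip(path, counts):
    prior = spread @ post                            # predict step
    loglik = (n * np.log(lam + 1e-12) - lam).sum(axis=1)
    post = prior * np.exp(loglik - loglik.max())     # update step
    post /= post.sum()
    errs.append(abs(xs[np.argmax(post)] - x_true))
print(np.mean(errs))
```

The non-Bayesian alternative on the slide corresponds to dropping the predict step and decoding each time bin from its spikes alone.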
46 / 54
[Brown et al., 1998]
Bayes, maximum likelihood, linear kernel, and correlation-based decoders.
47 / 54
Example: Motor decoding
[Shpigelman et al., 2005]
Rhesus monkey, 43 electrodes in M1.
Monkey controls cursors on a screen using two manipulanda to perform a centre-out reaching task.
Predict hand velocity based on 10 time bins, each of length 100 ms, across all 43 neurons.
Can use linear regression, polynomial regression, a Gaussian kernel (support vector regression), or the spikernel (allows time warping).
More sophisticated methods outperform linear regression, but linear is already decent.
State-of-the-art w. Kalman filters [Gilja et al., 2012]
48 / 54
[Shpigelman et al., 2005]
49 / 54
Decoding from your brain
[Miyawaki et al., 2008]
fMRI, linear decoder from voxel activations.
50 / 54
Summary
Discrimination between stimuli (or actions) using just recorded spikes is possible and tractable.
Full reconstruction of a stimulus is hard, especially when the stimulus dimensionality is high. It is also unclear to what extent this is even possible.
Decoding can tell us how much information neural activity carries about the outside world; decoder performance provides a lower bound on it.
BMI applications are on the horizon.
51 / 54
References I
Britten, K. H., Shadlen, M. N., Newsome, W. T., and Movshon, J. A. (1992).The analysis of visual motion: a comparison of neuronal and psychophysical performance.J Neurosci, 12:4745–4765.
Brown, E. N., Frank, L. M., Tang, D., Quirk, M. C., and Wilson, M. A. (1998).A statistical paradigm for neural spike train decoding applied to position prediction fromensemble firing patterns of rat hippocampal place cells.Journal of Neuroscience, 18(18):7411–7425.
Brunel, N. and Nadal, J.-P. (1998).Mutual information, Fisher information, and population coding.Neural Comp., 10:1731–1757.
Cohen, M. R. and Newsome, W. T. (2009).Estimates of the contribution of single neurons to perception depend on timescale andnoise correlation.J Neurosci, 29(20):6635–6648.
Cover, T. M. and Thomas, J. A. (1991).Elements of information theory.Wiley, New York.
Dayan, P. and Abbott, L. F. (2002).Theoretical Neuroscience.MIT press, Cambridge, MA.
52 / 54
References II
Gilja, V., Nuyujukian, P., Chestek, C. A., Cunningham, J. P., Yu, B. M., Fan, J. M.,Churchland, M. M., Kaufman, M. T., Kao, J. C., Ryu, S. I., and Shenoy, K. V. (2012).A high-performance neural prosthesis enabled by control algorithm design.Nat Neurosci, 15(12):1752–1757.
Hung, C. P., Kreiman, G., Poggio, T., and DiCarlo, J. J. (2005).Fast Readout of Object Identity from Macaque Inferior Temporal Cortex.Science, 310:863–866.
Miyawaki, Y., Uchida, H., Yamashita, O., Sato, M.-a., Morito, Y., Tanabe, H. C., Sadato, N.,and Kamitani, Y. (2008).Visual image reconstruction from human brain activity using a combination of multiscalelocal image decoders.Neuron, 60(5):915–929.
Pillow, J. W., Ahmadian, Y., and Paninski, L. (2011).Model-based decoding, information estimation, and change-point detection techniques formultineuron spike trains.Neural computation, 23(1):1–45.
Pillow, J. W., Shlens, J., Paninski, L., Sher, A., Litke, A. M., Chichilnisky, E. J., andSimoncelli, E. P. (2008).Spatio-temporal correlations and visual signalling in a complete neuronal population.Nature, 454(7207):995–999.
53 / 54
References III
Rieke, F., Warland, D., de Ruyter van Steveninck, R., and Bialek, W. (1996).Spikes: Exploring the neural code.MIT Press, Cambridge.
Shpigelman, L., Singer, Y., Paz, R., and Vaadia, E. (2005).Spikernels: Predicting Arm Movements by Embedding Population Spike Rate Patterns inInner-Product Spaces.Neural Comput, 17:671–690.
Yarrow, S., Challis, E., and Seriès, P. (2012).Fisher and Shannon information in finite neural populations.Neural Comput, 24(7):1740–1780.
Yovel, Y., Falk, B., Moss, C. F., and Ulanovsky, N. (2010).Optimal localization by pointing off axis.Science, 327(5966):701–704.
54 / 54