spike sorting i: bijan pesaran new york university

Spike Sorting I:

Bijan Pesaran

New York University

Acknowledgements

• Ken Harris and Samar Mehta at Neuroinformatics course Woods Hole.

Aims

We would like to …

• Monitor the activity of large numbers of neurons simultaneously

• Know which neuron fired when

• Know which neuron is of which type

• Estimate our errors

Primate retinal ganglion cells, courtesy of the lab of Dr. E.J. Chichilnisky

THE PROBLEM: Multiple Neural Signals

-400

-200

0

200

3 msec

4.64 4.66 4.68 4.7 4.72 4.74 4.76

-400

-300

-200

-100

0

100

200

300

Time (sec)

Vol

tage

(A

/D L

evel

s)

0 1 2 3 4 5 6 7 8 9 10

-400

-200

0

200

Time (sec)

Vol

tage

(A

/D L

evel

s)

THE GOAL: Spike Times of Single Neurons

Time (sec)

Spike Detector

4.5 4.55 4.6 4.65 4.7 4.75 4.8 4.85 4.9 4.95 5

Neuron #1 Spikes

Neuron #2 Spikes

-400

-200

0

200

Raw Data

Region from previous slide

THE ‘GRADUATE STUDENT’ ALGORITHM

4.5 4.55 4.6 4.65 4.7 4.75 4.8 4.85 4.9 4.95 5

-400

-300

-200

-100

0

100

200

300

Time (sec)

Vol

tage

(A

/D L

evel

s)

Raw Data

Threshold detector at 32

0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6

-600

-500

-400

-300

-200

-100

0

100

200

300

Time (msec)

Vol

tage

(A

/D L

evel

s)

Candidate Waveforms

Spike Height vs. Width Plot

-700 -600 -500 -400 -300 -200 -1000

0.2

0.4

0.6

Wid

th (

mse

c)

Height (A/D Levels)

0 10 20 30 40 500

50

100

150

200

250

# of

Inte

rval

s

Time (msec)

Interspike Interval Histogram

1 2 3 40

50

100

150

200

A GENERAL FRAMEWORK

Locate Spikes Preprocess Waveforms

Density Estimation

Spike Classification

Quality Measures

Extracellular Recording Hardware

• You can buy two types of hardware, allowing

• Wide-band continuous recordings

• Filtered, spike-triggered recordings

The Tetrode

• Four microwires twisted into a bundle

• Different neurons will have different amplitudes on the four wires

Raw Data

Spikes

High Pass Filtering

• Local field potential is primarily at low frequencies.

• Spikes are at higher frequencies.

• So use a high pass filter. 800hz cutoff is good.

Filtered Data

Cell 1

Cell 2

Spike Detection

• Locate spikes at times of maximum extracellular negativity

• Exact alignment is important: is it on peak of largest channel or summed channels?

Data Reduction

• We now have a waveform for each spike, for each channel.

• Still too much information!

• Before assigning individual spikes to cells, we must reduce further.

Principal Component Analysis

• Create “feature vector” for each spike.

• Typically takes first 3 PCs for each channel.

• Do you use canonical principal components, or new ones for each file?

“Feature Space”

Cluster Cutting

• Which spikes belong to which neuron?

• Assume a single cluster of spikes in feature space corresponds to a single cell

• Automatic or manual clustering?

Cluster Cutting Methods

• Purely manual – time consuming, leads to high error rates.

• Purely automatic – untrustworthy.

• Hybrid – less time consuming, lowest error rates.

Semi-automatic Clustering

How Do You Know It Works?

• We can split waveforms into clusters, but are we sure they correspond to single cells?

• Simultaneous intra- and extra-cellular recordings allow us to estimate errors.

• Quality measures allow us to guess errors even without simultaneous intracellular recording.

Intra-extra Recording• Simultaneous recording with a wire

tetrode and glass micropipette.

Intra-extra Recording

Extracellular waveform is almost minus derivative of intracellular

Bizarre Extracellular Waveshapes

Model Experiment

Two Types of Error

• Type I error (false positive) – Incorrect inclusion of noise, or spikes of other

cells

• Type II error (false negative)– Omission of true spikes from cluster

• Which is worse? Depends on application…

Manual Clustering Contest

Best Ellipsoid Error Rates

Find ellipsoid that minimizes weighted sum of Type I and Type II errors.

Must evaluate using cross-validation!

Humans vs. B.E.E.R.

Waveshape Helps Separation

Why were human errors higher?

• To understand this, try to understand why clusters have the shape they do

• Simplest possibility: spike waveform is constant, cluster spread comes from background noise

• Are clusters multivariate normal?

Problem: Overlapping Spikes

Problem: Cellular Synchrony

Problem: Bursting

Problem: Misalignment

• When you have a spike whose peak occurs at different times on different channels, it can align on either.

• This causes the cluster to be split in two.

Problem: Dimensionality

Manual clustering only uses 2 dimensions at a time

BEER measure can use all of them

“Semi-Automatic” Clustering

•Uses all dimensions at once

•Errors should be lower

•Still requires human input

Semi-automatic Performance

Software: KlustaKwik• Mixture of Gaussians, unconstrained

covariance matrices

• Speed is crucial

• CEM Algorithm – faster than EM

• Most probabilities not calculated

• Local maxima result in over- and under-clustering

• Split and merge features to tunnel out of local maxima

• Still requires supercomputer resources.

klustakwik.sourceforge.net

Software: Klusters

Recluster Feature

Ergonomic Design

Auto/Cross correlograms

Grouping Assistant

Waveforms

Timecourse

klusters.sourceforge.net

Cluster Quality Measures

• Would like to automatically detect which cells are well isolated.

• BEER measure needs intracellular data, which we don’t have in general.

• Will define two measures that only use extracellular data.

Isolation Distance

Size of ellipsoid within which as many spikes belong to our cluster as not

L_ratio

21ratio clusternoise

L cdf N

False Positives and Negatives

Which Measure to Use?

• Isolation distance correlates with false positive error rates– Measures distance to other clusters

• L_ratio correlates with false negative error rates– Measures number of spikes near cluster

boundary

Conclusions

• Automatic clustering will save time and reduce errors.

• Errors can be as low as ~5%.

• Quality measures give you a feeling of how bad your errors are.

Room for Improvement

• Make it faster

• Improved spike detection and alignment

• Quality measures that estimate % error

• Fully automatic sorting

• Resolve overlapping spikes

Easy

Hard

spike sorting i: bijan pesaran new york university

Documents

previous slide slide

feature space slide

spikes neuron

raw data spikes

individual spikes

single cluster of spikes

spike detection

spikes raw data region