Decoding Speech with ECoG – Computational Challenges
Chris Holdgraf
Helen Wills Neuroscience Institute, UC Berkeley


TRANSCRIPT

Decoding Imagined Speech from ECoG

Decoding Speech with ECoG – Computational Challenges
Chris Holdgraf, Helen Wills Neuroscience Institute, UC Berkeley
(Speaker note: mention that I'm a second-year graduate student.)

Challenge in neuroscience
Neuroscience is a very broad field: it covers everything from gene expression, to a single neuron firing, to activity across the whole brain in humans. As such, one must have a wide range of knowledge and a diverse set of techniques, which often makes it hard to have the best domain-specific knowledge.

Mapping the world onto the brain
The trick is to fit some function that links brain activity with the outside world. However, we also want that function to be scientifically meaningful.

(Speaker note: my lab studies audition, but this could be almost any kind of stimulus in principle.)

Neuroscience/Psychology and computation
Historically, the field has focused on tightly-controlled experiments and simple questions. Advances in imaging and electrophysiological methods have increased both the quality and the quantity of data.

Electrocorticography: a blend of temporal and spatial resolution
ECoG involves placing electrodes directly on the surface of the brain. This avoids many of the problems with EEG while retaining the rich temporal precision of the signal.

Complex and noisy data require careful methods
ECoG is only possible in patients with some kind of pathology, and recording time is short. Data-driven methods obey a simple rule: bad data in = bad models out.

Merging ECoG and Computational Methods
It might be possible to leverage the spatial precision of ECoG to decode the nature of this processing.

Challenge 1: GLMs in Neuroscience
Computational Challenge #1: how do we fit a model that is both interpretable and a good fit for each electrode's response?

The parameter space grows more complex as the hypotheses multiply, and this is often paired with a limited dataset, especially in ECoG. Regularization and feature selection therefore become very important.

Want it simple? Use a GLM!
Linear models let us predict some output with a model that is both interpretable and (relatively) easy to fit; a minimal sketch follows below.
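As a concrete illustration (not from the original slides), here is a minimal sketch of a regularized linear model mapping stimulus features to a single electrode's response. The toy data and the use of scikit-learn's Ridge are assumptions for illustration only.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Toy data: n_samples time points, n_features stimulus features
# (e.g., spectrogram bins), and one electrode's response.
rng = np.random.default_rng(0)
n_samples, n_features = 500, 32
stimulus = rng.standard_normal((n_samples, n_features))
true_weights = rng.standard_normal(n_features)
response = stimulus @ true_weights + 0.5 * rng.standard_normal(n_samples)

# Ridge regularization keeps the fit stable when data are limited,
# as is typical for ECoG recordings.
model = Ridge(alpha=1.0)
model.fit(stimulus, response)

# The fitted weights are directly interpretable: each one tells us
# how strongly a stimulus feature drives this electrode.
print(model.coef_[:5])
```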

One problem with this
The brain assuredly does not vary linearly in response to inputs from the outside world.

Basis functions
Instead, we can decompose an input stimulus into a combination of basis functions. This amounts to a nonlinear transformation of the stimulus, so that fitting linear models to brain activity makes more sense; a sketch of this transform follows below.
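A minimal sketch of the basis-function idea, assuming a 1-D stimulus and a bank of Gaussian bumps; the basis choice is illustrative, not the one used in the actual models. The point is that a response that is nonlinear in the raw stimulus becomes linear in the expanded features.

```python
import numpy as np

def gaussian_basis(x, centers, width):
    """Project each stimulus value onto a bank of Gaussian bumps."""
    # Result has shape (n_samples, n_basis): a nonlinear expansion
    # of the 1-D stimulus into n_basis features.
    return np.exp(-((x[:, None] - centers[None, :]) ** 2) / (2 * width ** 2))

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=400)            # raw 1-D stimulus values
centers = np.linspace(0, 10, 12)            # basis function centers
X = gaussian_basis(x, centers, width=1.0)   # nonlinear transform

# A nonlinear response that a plain linear model of x would miss...
y = np.sin(x) + 0.1 * rng.standard_normal(x.size)

# ...but which is linear in the basis-expanded features.
w, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(w, 2))
```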

Exploring the brain through basis functions

(Figure labels: dog, hat, car, man.)

Fitting weights with gradient descent
We can find values for these weights by following the typical least-squares regression approach. Early stopping must be tuned carefully in order to regularize. Variants include full gradient descent, coordinate gradient descent, and threshold gradient descent; a sketch of early stopping follows below.
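A minimal sketch of full gradient descent on the least-squares loss with early stopping as the regularizer, on made-up data; the learning rate and patience values are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 200, 50
X = rng.standard_normal((n, p))
w_true = np.concatenate([rng.standard_normal(5), np.zeros(p - 5)])
y = X @ w_true + rng.standard_normal(n)

# Hold out a validation set to decide when to stop.
X_tr, X_va, y_tr, y_va = X[:150], X[150:], y[:150], y[150:]

w = np.zeros(p)
lr = 1e-3
best_w, best_err, patience = w.copy(), np.inf, 0

for step in range(10_000):
    # Full gradient of the least-squares loss on the training set.
    grad = X_tr.T @ (X_tr @ w - y_tr) / len(y_tr)
    w -= lr * grad

    # Early stopping: keep the weights with the lowest validation
    # error, and stop once it has not improved for a while.
    err = np.mean((X_va @ w - y_va) ** 2)
    if err < best_err:
        best_err, best_w, patience = err, w.copy(), 0
    else:
        patience += 1
        if patience > 50:
            break

print(f"stopped at step {step}, validation MSE {best_err:.3f}")
print(np.round(best_w[:5], 2), "vs. true", np.round(w_true[:5], 2))
```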

An application of the GLM for neural decoding
Neural Decoding: if you can map stimuli onto brain activity, then you can also map brain activity onto stimuli. It is the same approach, but now the inputs are values from the electrodes and the output is sound. This has implications for neural prostheses and brain-computer interfaces.

Speech Decoding
Decoding with a linear model.

(Figure: Reconstructed Spectrogram = Decoding Model × High Gamma Neural Signal, compared against the Original Spectrogram.)

Decoding Listened Speech
High gamma (60-200 Hz) activity is used to reconstruct the speech spectrogram (Pasley et al., PLoS Biology, 2012). A sketch of this decoding step follows below.
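As an illustration of the linear decoding step (not code from the original work), here is a minimal sketch on simulated data: each spectrogram frequency bin is reconstructed as a weighted sum of high-gamma activity across electrodes, fit with ridge regression. All array names and sizes are made up, and the real model also uses time-lagged neural features, which are omitted here.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(3)
n_times, n_electrodes, n_freq_bins = 600, 64, 32

# Simulated high-gamma activity and the spectrogram it encodes.
high_gamma = rng.standard_normal((n_times, n_electrodes))
encoding = rng.standard_normal((n_electrodes, n_freq_bins))
spectrogram = high_gamma @ encoding \
    + 0.5 * rng.standard_normal((n_times, n_freq_bins))

# Decoding model: spectrogram ~ high_gamma @ W, one weight vector
# per frequency bin, all fit jointly with ridge regression.
decoder = Ridge(alpha=10.0)
decoder.fit(high_gamma[:500], spectrogram[:500])

# Evaluate the reconstruction on held-out time points.
reconstructed = decoder.predict(high_gamma[500:])
corr = np.corrcoef(reconstructed.ravel(), spectrogram[500:].ravel())[0, 1]
print(f"held-out reconstruction correlation: {corr:.2f}")
```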

Speech Reconstruction from ECoG

91% accuracy in classifying words based on the reconstruction. (Speaker note: make note of higher-density grids being more useful.)

Challenge 2: From model output to language
Turn a noisy, variable spectrogram reconstruction into linguistic output.

Simpler methods are often not powerful enough to account for these small variations. How do we take advantage of temporal correlations between words and phonemes? How do we accomplish this without a ton of data? How do we classify this output?

(Figure labels: Town, Doubt, Property, Pencil.)

From model output to language
We borrow ideas from the speech recognition literature. We are currently using Dynamic Time Warping to match output spectrograms to words.

Dynamic Time Warping
Compute a dissimilarity matrix between every pair of elements, then find the optimal path that minimizes the overall accumulated distance. This effectively warps and realigns the two signals; a sketch follows below.
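A minimal sketch of dynamic time warping, assuming 1-D signals for readability (real spectrograms are multivariate, so the per-frame distance would compare frequency vectors instead). The word templates and signals here are hypothetical, and the implementation is generic textbook DTW, not necessarily the one used in the project.

```python
import numpy as np

def dtw_distance(a, b):
    """Accumulated cost of the best warping path between signals a and b."""
    n, m = len(a), len(b)
    # Dissimilarity between every pair of elements.
    dist = np.abs(a[:, None] - b[None, :])

    # acc[i, j] = cost of the cheapest path ending at (i, j).
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # A step may come from a match, an insertion, or a deletion.
            acc[i, j] = dist[i - 1, j - 1] + min(
                acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]
            )
    return acc[n, m]

# Classify a reconstruction by its nearest word template under DTW,
# even though the templates have different lengths.
templates = {"town": np.sin(np.linspace(0, 3, 40)),
             "doubt": np.cos(np.linspace(0, 3, 55))}
query = np.sin(np.linspace(0, 3, 50)) + 0.1
print(min(templates, key=lambda w: dtw_distance(query, templates[w])))
```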

Current output workflow

Reconstructed Spectrogram → DTW → word. This approach doesn't assume any linguistic information about the language.


Where to go from here?
Improving the decoder fit: clever methods for dealing with finite and noisy datasets; finding better features (basis functions); modeling interactions between features; fitting more complicated models. Nonlinear models are useful for engineering, but require much more data.

Turning output into reconstructed language
Leverage the spectro-temporal statistics of language. Focus on classification rather than arbitrary decoding. (Figure labels: /ch/, /ks/, /w/, /g/.)

The Big Data Angle
Right now, the field of ECoG is in a bit of a transition period. There is excitement around using computational methods, but many labs (including my own) don't have the infrastructure and culture to tackle big-data problems. That said, we do have the potential to collect increasingly large datasets once we know what to do with them. Streaming and online problems are especially relevant for BCI.

The Long-Term Goal
Create a modeling framework that allows us to use ECoG to decode linguistic information.

This decoder should be flexible and generalizable: it should not be limited to a small number of words. It should be robust, insensitive to small perturbations of each word over multiple repetitions. And it should allow the patient to interact with the outside world in a way that they couldn't before.

Fellow Decoders
Special thanks to Frederic Theunissen and co., Jack Gallant and co., and the STRFLab Team.

Stéphanie, Brian, Gerv, Eddie, Peter

(Speaker notes: make another methods slide that explains the model building in the covert condition; include a big ECoG intro picture.)

Linguistic features for model output
Hidden Markov Models allow us to model spectrogram output as a function of hidden states. They capture the probabilistic nature of the spectrogram output for a given word, as well as the temporal correlations between components of that word. Phonemic, lexical, and potentially semantic information could further improve the fit; a sketch follows after the figure below.

(Figure: hidden states corresponding to phonemes /ch/, /ks/, /w/, /g/.)
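A minimal sketch of the HMM idea, assuming the third-party hmmlearn package and made-up spectrogram frames; the number of states, the data, and the Gaussian emission model are all illustrative choices, not details from the talk.

```python
import numpy as np
from hmmlearn import hmm  # assumed dependency for this sketch

rng = np.random.default_rng(4)

# Pretend spectrogram frames for two noisy repetitions of one word:
# each row is one time frame, each column one frequency bin.
rep1 = np.concatenate([rng.normal(m, 0.3, size=(20, 8)) for m in (0, 1, 2, 1)])
rep2 = np.concatenate([rng.normal(m, 0.3, size=(25, 8)) for m in (0, 1, 2, 1)])

# One HMM per word: hidden states play the role of sub-word units
# (e.g., phoneme-like segments such as /ch/ or /ks/), and the
# transition matrix captures temporal correlations within the word.
model = hmm.GaussianHMM(n_components=4, covariance_type="diag", n_iter=50)
model.fit(np.vstack([rep1, rep2]), lengths=[len(rep1), len(rep2)])

# At test time, score a new reconstruction under each word's model
# and pick the word with the highest log-likelihood.
test = np.concatenate([rng.normal(m, 0.3, size=(22, 8)) for m in (0, 1, 2, 1)])
print(model.score(test))
```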

Designing stimulus sets
Data collection is very rare: we're lucky if we get two subjects per month. We need to be clever about how we design our behavioral tasks. Stimuli must be rich, and ideally usable to answer many different questions. We also need to think about what kind of stimuli we need in order to achieve the goal we prioritize (e.g., classification vs. regression).

The Big Data Angle
We have access to some ECoG recordings of a patient simply sitting in a room with a microphone placed nearby. These are often more than 24 hours long and include a wide range of sounds and speech. Being able to parse this data might allow us to fit increasingly complicated models and vastly improve the speech-recognition approach.