One Size Does Not Fit All: Regressor and Subject Specific Techniques for Predicting Experience in Natural Environments
Denis Chigirev, Chris Moore, Greg Stephens & The Princeton EBC Team


Page 1:

One Size Does Not Fit All: Regressor and Subject Specific Techniques for Predicting Experience in Natural Environments

Denis Chigirev, Chris Moore, Greg Stephens & The Princeton EBC Team

Page 2:

How do we learn in a very high dimensional setting (~35K voxels)?

LINEAR
- Look for linear projection(s): linear regression, ridge regression, linear SVM. The learned object is a set of voxel weights.
- How to control for complexity? Through the loss function (quadratic; linear, hinge) and a prior (regularization).
- Advantage: pools together many weak signals.

NONLINEAR
- Create a "look-up table": nonlinear kernel methods, kernel ridge regression, RKHS, GP, nonlinear SVM. The learned object is built on a similarity measure between brain states.
- Needs a similarity measure between brain states (i.e. a kernel) and regularization.
- Assumes "clustering" of similar states and regressor continuity along paths of data points.

Page 3:

How do we learn in a very high dimensional setting (~35K voxels)?

LOCAL
- Focus on informative areas: choose voxels by correlation thresholding, searchlight.
- Advantage: ignores areas that are mostly noise.
- Assumes that information is localized and that the feature selection method is stable.

GLOBAL
- Look for global modes: whole brain, PCA, Euclidean distance kernel, searchlight kernel without thresholding.
- Advantage: improves stability by pooling over larger areas.
- Disadvantage: correlated noisy areas that carry no information may bias the predictor.

Page 4:

Different methods emphasize different aspects of the learning problem

         Linear                                  Nonlinear
Local    corr. thresholding & ridge;             searchlight RKHS
         searchlight & ridge
Global   PCA & ridge                             Euclidean RKHS

Page 5:

Ridge Regression using ALL voxels

Difference of means (centroids): $w = \langle x\, y \rangle$

Linear regression solution: $w = C^{-1} \langle x\, y \rangle$

Ridge regression solution: $w = (C + \lambda I)^{-1} \langle x\, y \rangle$

• Regularization allows using all ~30K voxels.

• The centroids $\langle x\, y \rangle$ are well estimated (a 1st-order statistic), but the covariance matrix is a 2nd-order statistic and therefore requires regularization.
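As a concrete illustration, here is a minimal NumPy sketch of this whole-brain ridge estimator (shapes and names are illustrative, not the EBC team's actual code):

```python
import numpy as np

def ridge_weights(X, y, lam):
    """Whole-brain ridge: solves (C + lam*I) w = <x y>.

    X : (n_TRs, n_voxels) brain states, mean-centered per voxel.
    y : (n_TRs,) regressor, mean-centered.
    """
    n_TRs, n_vox = X.shape
    C = X.T @ X / n_TRs      # voxel covariance (2nd-order, needs regularization)
    xy = X.T @ y / n_TRs     # voxel-regressor correlation <x y> (1st-order, stable)
    return np.linalg.solve(C + lam * np.eye(n_vox), xy)

# lam -> 0 recovers plain linear regression; for very large lam the
# solution points along <x y>, the difference-of-means direction.
# For ~30K voxels, forming C directly is expensive; see the SVD trick
# sketched on page 11.
```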

Page 6:

Whole Brain Ridge Regression

Keeping only the large eigenvalues of the covariance matrix (i.e. PCA-type complexity control) is MUCH LESS effective than ridge regularization.
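One way to see the difference, sketched below under illustrative assumptions: in the eigenbasis of the covariance matrix, both schemes act as diagonal filters on the eigenvalues, but PCA keeps or kills each component outright while ridge shrinks every component smoothly.

```python
import numpy as np

def filter_factors(mu, lam=None, k=None):
    """Per-eigenvalue filter applied to the regression solution.

    mu : covariance eigenvalues, sorted in decreasing order.
    PCA truncation keeps the top-k components (factor 1) and drops
    the rest (factor 0); ridge damps each component by mu/(mu + lam),
    so medium-eigenvalue directions still contribute partially.
    """
    if k is not None:                          # PCA-type complexity control
        return (np.arange(len(mu)) < k).astype(float)
    return mu / (mu + lam)                     # ridge regularization
```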

Page 7:

Reproducing Kernel Hilbert Space (RKHS), after T. Poggio

Instead of looking for linear projections (ridge regression, SVM with a linear kernel), use a measure of similarity between brain states to project a new brain state onto the existing ones in feature space.

$y(x) = \sum_i c_i K(x_i, x)$, where $i = 1 \ldots N_{TR}$ (number of TRs).

Learn the "support" coefficients $c$ by solving

$(N_{TR}\,\gamma I + K)\, c = y$,

where $\gamma$ represents regularization in feature space. (This is also known as kernel ridge regression; with a Gaussian kernel it recovers the mean GP solution.)

We choose $K(x_i, x_j) = e^{-d_{ij}^2 / 2\sigma^2}$, where $d_{ij}$ is the distance between brain states. We use Euclidean distance and searchlight distance.
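A minimal NumPy sketch of this procedure (the median heuristic for the kernel width σ² is an assumption for illustration, not the choice reported here):

```python
import numpy as np

def kernel_ridge_fit(D2, y, gamma):
    """Solve (N_TR * gamma * I + K) c = y for the support coefficients.

    D2 : (N_TR, N_TR) squared distances d_ij^2 between training brain
         states (Euclidean or searchlight); y : (N_TR,) regressor.
    """
    n_TR = len(y)
    sigma2 = np.median(D2)                  # illustrative width heuristic
    K = np.exp(-D2 / (2 * sigma2))          # K(x_i, x_j) = exp(-d_ij^2 / 2 sigma^2)
    c = np.linalg.solve(n_TR * gamma * np.eye(n_TR) + K, y)
    return c, sigma2

def kernel_ridge_predict(D2_new, c, sigma2):
    # y(x) = sum_i c_i K(x_i, x), one row of D2_new per new brain state.
    return np.exp(-D2_new / (2 * sigma2)) @ c
```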

Page 8:

This framework allows different similarity measures between brain states to be tested for their usefulness in prediction.

[Diagram: data → "How similar are the brain states?" (Euclidean distance, Mahalanobis, searchlight, earth mover's?) → learning algorithm (SVM, RKHS, etc.: choice of regularization and loss) → prediction]

$K(x_i, x_j) = e^{-d_{ij}^2 / 2\sigma^2}$, $y(x) = \sum_i c_i K(x_i, x)$

This makes it possible to independently assess the quality of the brain-state similarity measure and the quality of the learning procedure. In practice, the (default) Euclidean measure performs relatively well.
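In code, this decoupling is just a function argument: the fit/predict routines sketched on the previous page stay fixed, and only the distance computation is swapped out (euclidean_d2 below is an illustrative plug-in, not part of the original pipeline):

```python
from scipy.spatial.distance import cdist

def euclidean_d2(X_a, X_b):
    # Default similarity: squared Euclidean distance between brain states.
    return cdist(X_a, X_b, "sqeuclidean")

# c, sigma2 = kernel_ridge_fit(euclidean_d2(X_train, X_train), y, gamma)
# Replacing euclidean_d2 with a Mahalanobis or searchlight distance
# changes only this one line; the learning procedure is untouched.
```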

Page 9:

Basics of Searchlight

Which pair of brain states is further apart?

Mahalanobis distance: $d_{ij}^2 = (x_i - x_j)^\top C^{-1} (x_i - x_j)$

[Figure: example pairs of brain states, labeled "more different" and "less different".]

Problem: applied to whole-brain states, this amplifies poorly estimated dimensions.

Solution: apply it locally to 3x3x3 "supervoxels" and then sum the individual contributions:

$(d_{ij}^\alpha)^2 = (x_i^\alpha - x_j^\alpha)^\top C_\alpha^{-1} (x_i^\alpha - x_j^\alpha)$,

where $x_i^\alpha$, $\alpha = 1 \ldots N_{vox}$, is a 3x3x3 supervoxel. The distance between brain states is then computed as a weighted average:

$d_{ij}^2 = \sum_{\alpha=1}^{N_{vox}} b_\alpha (d_{ij}^\alpha)^2$

We used $b_\alpha = 1$ and find that this solution is self-regularizing, i.e. one can take the complexity penalty to zero.
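A sketch of the summed local Mahalanobis computation, under stated assumptions: the 3x3x3 neighborhood index lists and the local inverse covariances $C_\alpha^{-1}$ (estimated on training data) are assumed precomputed, and $b_\alpha = 1$ as above.

```python
import numpy as np

def searchlight_d2(x_i, x_j, supervoxels, C_inv):
    """d_ij^2 = sum_alpha (d_ij^alpha)^2 with uniform weights b_alpha = 1.

    x_i, x_j    : (n_voxels,) brain states at two time points.
    supervoxels : list of index arrays, one 3x3x3 neighborhood per voxel.
    C_inv       : list of (27, 27) local inverse covariances C_alpha^{-1}.
    """
    d2 = 0.0
    for idx, Ci in zip(supervoxels, C_inv):
        diff = x_i[idx] - x_j[idx]       # local pattern difference
        d2 += diff @ Ci @ diff           # local Mahalanobis contribution
    return d2
```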

Page 10:

Why might searchlight help? (hint: stability!)

[Figure: scatter plots of per-voxel and per-searchlight correlations with the feature, movie 1 (m1) vs movie 2 (m2), before and after thresholding.]

The projection learned by linear ridge is only as good as the stability of the underlying voxel correlations with the regressor.
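That stability claim can be probed with a simple split-half sketch like the one below (entirely illustrative: X1/y1 and X2/y2 stand in for the two training movies, and pool is any searchlight-averaging function):

```python
import numpy as np

def corr_map(X, y):
    # Correlation of every voxel (column of X) with the regressor y.
    Xc, yc = X - X.mean(0), y - y.mean()
    return (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))

def split_half_stability(X1, y1, X2, y2, pool=None):
    # How well do the correlation maps from the two movies agree?
    if pool is not None:                 # e.g. average within searchlights
        X1, X2 = pool(X1), pool(X2)
    return np.corrcoef(corr_map(X1, y1), corr_map(X2, y2))[0, 1]
```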

[Figure: searchlight distance versus Euclidean distance, tested in RKHS.]

Page 11:

Different methods emphasize different aspects of the learning problem

- Local / Linear: correlation thresholding with ridge complexity control (Chigirev et al., PBAIC 2006; implemented as part of a public MVPA MATLAB toolbox).

- Local / Nonlinear: weighted searchlight RKHS allows zooming in on areas of interest (future work!).

- Global / Linear: an SVD trick makes the 30k x 30k covariance matrix tractable, and ridge regularization outperforms PCA as complexity control (see the sketch after this list).

- Global / Nonlinear: Euclidean RKHS (kernel ridge) may be slightly improved by using a global searchlight kernel as the similarity measure, and has a remarkable self-regularization property.
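The Global / Linear entry presumably exploits the fact that with $N_{TR} \ll N_{vox}$ the data matrix has rank at most $N_{TR}$, so the whole-brain ridge solve never has to form the 30k x 30k covariance explicitly; a sketch under that assumption:

```python
import numpy as np

def ridge_weights_svd(X, y, lam):
    """Whole-brain ridge via the thin SVD X = U S V^T.

    Solves (C + lam*I) w = <x y> with C = X^T X / n in the
    n_TRs-dimensional subspace instead of voxel space.
    """
    n = X.shape[0]
    U, s, Vt = np.linalg.svd(X, full_matrices=False)   # rank <= n_TRs
    coef = (s / (s**2 / n + lam)) * (U.T @ y) / n      # per-component solve
    return Vt.T @ coef
```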

Page 12:

I would like to thank my collaborators Chris Moore*, Greg Stephens, Greg Detre, and Michael Bannert, as well as Ken Norman and Jon Cohen for supporting the Princeton EBC Team.