
Pattern Recognition for NeuroImaging (PR4NI)

Interpreting predictive models in terms of anatomically labelled regions

J. Schrouff

Stanford University, CA, USA


Data → Feature extraction/selection → Model specification → Accuracy and significance → Model interpretation


Where in the brain is the information of interest?

i.e. Which brain areas contribute most to the model’s prediction?


Computing “weight maps”:

- From linear (kernel) methods

- Attributing one weight per feature

- For an SVM model:

$$\mathbf{w} = \sum_{i=1}^{n} y_i \alpha_i^{*} \mathbf{x}_i$$
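As a minimal sketch of the SVM weight formula, the snippet below builds the weight vector from dual coefficients and checks that it gives the same decision values as the linear-kernel form. The data, the `alpha` values and the bias `b` are made up for illustration; in practice they would come from a trained SVM.

```python
import numpy as np

# Hypothetical toy problem: 4 samples, 3 features, labels in {-1, +1}.
X = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 1.0],
              [2.0, 1.0, 0.0],
              [1.0, 2.0, 1.0]])
y = np.array([1.0, -1.0, 1.0, -1.0])

# Dual coefficients alpha_i >= 0 (made-up values, not the output of a solver;
# a zero entry corresponds to a sample that is not a support vector).
alpha = np.array([0.5, 0.3, 0.0, 0.2])
b = 0.1  # made-up bias term

# Weight map: w = sum_i y_i * alpha_i * x_i, i.e. one weight per feature.
w = (y * alpha) @ X

# Decision values computed in the primal (weights) and dual (kernel) forms.
f_primal = X @ w + b
K = X @ X.T                      # linear kernel
f_dual = K @ (y * alpha) + b
```

Both forms give identical decision values, which is why a weight map can be recovered from any linear kernel machine.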



Thresholding weight maps:

Can we choose to look only at the largest contributions?

- Linear combination: $f(\mathbf{x}_i) = \langle \mathbf{w}, \mathbf{x}_i \rangle + b$
- As in an election
- Each vote counts




Interpreting weight maps:

What do the model parameters represent?

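- Voxel 1: $s(n) + d(n)$
- Voxel 2: $d(n)$

Weights: $w(1) = 1$ (Voxel 1), $w(2) = -1$ (Voxel 2)

Voxel 2 carries no signal $s(n)$, yet it receives a non-zero weight because it is needed to cancel the distractor $d(n)$.

Haufe et al., 2014: weight patterns versus weight filters (forward versus backward model)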
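The two-voxel example above can be simulated in a few lines. This is a sketch under stated assumptions: the signal and distractor are independent Gaussian noise, the filter is estimated by least squares rather than by an SVM, and the pattern is obtained with the covariance-based transformation described by Haufe et al. (2014).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
s = rng.standard_normal(n)   # signal of interest s(n)
d = rng.standard_normal(n)   # distractor d(n)

# Two voxels as in the example: voxel 1 = s + d, voxel 2 = d.
X = np.column_stack([s + d, d])
Xc = X - X.mean(axis=0)

# Backward model (filter): least-squares weights that recover s from the voxels.
w, *_ = np.linalg.lstsq(Xc, s - s.mean(), rcond=None)

# Forward model (pattern): a = Cov(X) w / Var(s_hat).
s_hat = Xc @ w
a = np.cov(Xc, rowvar=False) @ w / s_hat.var(ddof=1)
```

The filter `w` comes out as approximately (1, -1), while the pattern `a` is close to (1, 0): only voxel 1 actually contains the signal, even though both voxels receive weights of equal magnitude.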

Different strategies:

1. A priori knowledge: ROI selection

2. Searchlight

3. Feature selection

4. Sparse algorithms

5. Multiple Kernel Learning (MKL)

6. Permutation tests

7. A posteriori knowledge: weight summarization




1. A priori knowledge: ROI selection

e.g. Choosing voxels/features based on the literature or on previous univariate results (from an independent dataset).




Pros (+):
- Good accuracy?
- Anatomically/functionally relevant

Cons (-):
- Need an anatomical hypothesis
- Cannot threshold obtained map
- Arbitrary choice of region number and size
- Multiple models = multiple comparisons



2. Searchlight (Kriegeskorte et al., 2006)


Pros (+):
- Threshold for each neighborhood
- Not a weight filter

Cons (-):
- Not anatomically/functionally relevant (spheres)
- Locally multivariate
- Arbitrary choice of neighborhood size
- Multiple models = multiple comparisons
- Time-consuming



Review: Etzel et al., 2013


3. Feature selection

Rank the features according to a specific criterion and select a subset.

Examples:

- Univariate t-test
- Recursive Feature Elimination (RFE; De Martino et al., 2008) based on SVM weights
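A minimal sketch of the univariate t-test variant follows. The helper names `t_scores` and `select_top_k`, the simulated data and the choice of a Welch t statistic are assumptions made for illustration. The ranking is computed on training samples only, which is why feature selection has to sit inside the cross-validation loop.

```python
import numpy as np

def t_scores(X, y):
    """Absolute two-sample (Welch) t statistic per feature; y holds 0/1 labels."""
    a, b = X[y == 0], X[y == 1]
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return np.abs(a.mean(axis=0) - b.mean(axis=0)) / se

def select_top_k(X_train, y_train, k):
    """Indices of the k highest-ranked features, using training data only."""
    return np.argsort(t_scores(X_train, y_train))[::-1][:k]

# Simulated data: 40 samples, 50 features, only features 0-2 are informative.
rng = np.random.default_rng(0)
X = rng.standard_normal((40, 50))
y = np.repeat([0, 1], 20)
X[y == 1, :3] += 3.0

# Hold out 5 samples per class; they play no part in the ranking.
train = np.r_[0:15, 20:35]
selected = select_top_k(X[train], y[train], k=3)
```

The number of retained features `k` is itself a hyperparameter, which is what makes the nested cross-validation necessary in practice.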




Pros (+):
- Data-driven

Cons (-):
- Not anatomically/functionally relevant (voxel sparse)
- Time-consuming: nested CV
- Depends on selected technique: flaws in feature selection are reflected in the model. User choice must be informed.
- Different strategies = different patterns with same accuracy.




4. Sparse algorithms

Algorithms with regularization constraints such that the weights of some features are exactly zero.

Examples:

- LASSO (Tibshirani, 1996)
- Elastic-net (Zou and Hastie, 2005)
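To illustrate how an L1 penalty drives weights to exactly zero, here is a small LASSO solver based on proximal gradient descent (ISTA). It is a sketch on simulated regression data, not the solver used in any of the cited papers; the function names, the penalty value `lam` and the data are assumptions.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of the L1 norm: shrink towards zero, clip at zero."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=500):
    """Minimise (1/2n) * ||y - Xw||^2 + lam * ||w||_1 by proximal gradient."""
    n, p = X.shape
    step = n / np.linalg.norm(X, 2) ** 2   # 1 / Lipschitz constant of the gradient
    w = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / n
        w = soft_threshold(w - step * grad, step * lam)
    return w

# Simulated data: 20 features, only the first three have a true effect.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))
w_true = np.zeros(20)
w_true[:3] = [2.0, -1.5, 1.0]
y = X @ w_true + 0.1 * rng.standard_normal(100)

w = lasso_ista(X, y, lam=0.2)
n_zero = int(np.sum(w == 0.0))   # number of features removed from the model
```

Changing `lam`, or swapping the L1 penalty for an elastic-net penalty, changes which features survive, which is the stability issue raised for this family of methods.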





Pros (+):
- Embedded in model

Cons (-):
- Not anatomically/functionally relevant (voxel sparse)
- Stability of selected voxels (Baldassarre et al., 2012): different regularization terms = different patterns for same accuracy.




Note: stability selection (Varoquaux et al., 2012; Rondina et al., 2014).





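5. Multiple Kernel Learning (MKL)

Build one kernel per anatomically defined region as input to a (sparse) MKL algorithm (Filippone et al., 2012).

$$K(\mathbf{x}, \mathbf{x}') = \sum_{m=1}^{M} d_m K_m(\mathbf{x}, \mathbf{x}')$$

with $d_m \geq 0$, $\sum_{m=1}^{M} d_m = 1$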
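The kernel combination can be sketched as follows, assuming linear kernels and a hypothetical atlas that assigns each voxel to one of three regions. The kernel weights `d` are fixed by hand here; an MKL algorithm would learn them together with the classifier.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 10))             # 6 samples, 10 voxels

# Hypothetical atlas: one region label per voxel (3 regions).
labels = np.array([0, 0, 0, 1, 1, 1, 1, 2, 2, 2])
regions = np.unique(labels)

# One linear kernel per region, built from that region's voxels only.
kernels = [X[:, labels == r] @ X[:, labels == r].T for r in regions]

# Kernel weights d_m >= 0 summing to 1 (made-up values for illustration).
d = np.array([0.7, 0.3, 0.0])

# Combined kernel K = sum_m d_m K_m; region 2 drops out because d_2 = 0.
K = sum(d_m * K_m for d_m, K_m in zip(d, kernels))
```

With linear kernels, the unweighted sum of the per-region kernels equals the whole-brain kernel, so the weights `d` are the only thing that distinguishes the regions.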




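Decision function of an MKL problem:

$$f(\mathbf{x}_i) = \sum_{m} \langle \mathbf{w}_m, \mathbf{x}_i \rangle + b$$

Considered algorithm (Rakotomamonjy et al., 2008):

$$\min \; \frac{1}{2} \sum_{m} \frac{1}{d_m} \|\mathbf{w}_m\|^2 + C \sum_{i} \xi_i$$

$$\text{s.t.} \quad y_i \Big( \sum_{m} \langle \mathbf{w}_m, \mathbf{x}_i \rangle + b \Big) \geq 1 - \xi_i \quad \forall i$$

$$\xi_i \geq 0 \;\; \forall i, \qquad \sum_{m} d_m = 1, \qquad d_m \geq 0 \;\; \forall m$$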




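Weight map of an MKL model:

$$\mathbf{w}_m = d_m \sum_{i=1}^{n} y_i \alpha_i \mathbf{x}_i$$

But also the region-level kernel weights $d_m$.

Can be considered as a hierarchical approach: contributions per region ($d_m$), then per voxel within each region ($\mathbf{w}_m$).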




Pros (+):
- Anatomically/functionally relevant
- Used the full pattern to build the model
- Thresholded and sorted list of regions according to $d_m$
- Hierarchical view of the model parameters

Cons (-):
- Stability of the selected regions
- Depends on atlas



6. Permutation test

Build weight maps from permutation tests (e.g. Mourão-Miranda et al., 2005) and associate a p-value with each weight.
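The procedure can be sketched as below: refit the model many times with shuffled labels and, for each feature, count how often the permuted weight is at least as extreme as the observed one. To keep the example self-contained, a ridge regression onto ±1 labels stands in for the SVM (an assumption of this sketch), and the helper names and simulated data are hypothetical.

```python
import numpy as np

def fit_weights(X, y, ridge=1.0):
    """Stand-in linear model: ridge regression onto +/-1 labels (not an SVM)."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + ridge * np.eye(p), X.T @ y)

def permutation_p_values(X, y, n_perm=500, seed=0):
    """Two-sided permutation p-value for each weight of the linear model."""
    rng = np.random.default_rng(seed)
    w_obs = fit_weights(X, y)
    exceed = np.zeros_like(w_obs)
    for _ in range(n_perm):
        w_perm = fit_weights(X, rng.permutation(y))   # refit with shuffled labels
        exceed += np.abs(w_perm) >= np.abs(w_obs)
    # The +1 terms count the observed labelling as one of the permutations.
    return w_obs, (exceed + 1) / (n_perm + 1)

# Simulated data: 60 samples, 8 features, only feature 0 separates the classes.
rng = np.random.default_rng(1)
y = np.repeat([-1.0, 1.0], 30)
X = rng.standard_normal((60, 8))
X[:, 0] += 1.5 * y
X -= X.mean(axis=0)

w_obs, p = permutation_p_values(X, y)
```

The p-values returned here are uncorrected; with one test per voxel, a correction for multiple comparisons is still required before thresholding the map.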



[Figure: simulated data. Image from Gaonkar and Davatzikos, 2013.]

Pros (+):
- Map can be thresholded
- Used the full pattern to build the model

Cons (-):
- Computationally expensive
- Need to correct for multiple comparisons (cannot assume that voxels/weights are independent)



Note: analytic approximation of the null distribution for SVM (Gaonkar and Davatzikos, 2013).


7. A posteriori knowledge: weight summarization (Schrouff et al., 2013)

Build weight maps, then summarize the weights according to anatomically defined regions (e.g. based on an atlas).

Mathematically:

$$NW_{ROI} = \frac{\sum_{v \in ROI} |W_v|}{\#\{v \in ROI\}}$$
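A minimal sketch of this summarization, assuming a weight vector and an atlas label per voxel (both made up here). Absolute values are taken so that positive and negative weights within a region do not cancel, which is an assumption of this sketch, and the result is expressed as each region's share of the total.

```python
import numpy as np

def roi_weights(w, labels):
    """Normalized weight per region, returned as a proportion of the total."""
    regions = np.unique(labels)
    nw = np.array([np.abs(w[labels == r]).sum() / np.sum(labels == r)
                   for r in regions])
    return regions, nw / nw.sum()

# Hypothetical weight map over 8 voxels and atlas labels for 3 regions.
w = np.array([0.5, -0.5, 0.1, 0.1, 0.1, 0.1, -2.0, 1.0])
labels = np.array([1, 1, 2, 2, 2, 2, 3, 3])

regions, share = roi_weights(w, labels)
order = regions[np.argsort(share)[::-1]]   # regions sorted by contribution
```

Dividing by the number of voxels keeps large regions from dominating the ranking simply because of their size. Here region 3 ranks first, followed by regions 1 and 2.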




AAL atlas (Tzourio-Mazoyer et al., 2002).




Pros (+):
- Anatomically/functionally relevant
- Used the full pattern to build the model
- Sorted list of regions according to their proportion of $NW_{ROI}$

Cons (-):
- No thresholding of the sorted list of regions
- Depends on atlas



How about electrophysiological recordings?

- All strategies explained previously apply

- For MKL, kernels can be built on each dimension (i.e. channels, time points, frequency bands) or on a combination of dimensions (e.g. channel + frequency band)



How about non-linear models?

- Statistical sensitivity maps (Rasmussen et al., 2011) for SVM, kernel logistic regression and kernel Fisher discriminant.

sensitivity map (linear) = (weight filter)²

- Use of non-linear kernels in neuroimaging? (Kragel et al., 2012)



Conclusions:

- Interpreting machine learning based models can be complex.

- Different strategies exist, but there is always a trade-off between different criteria:

• Thresholded list of voxels/regions contributing

• Anatomical/functional relevance

• Multivariate power

• Stability of the pattern

• Computational time

Your (informed) choice!



Acknowledgments:

- Belgian American Educational Foundation (BAEF)

- Medical Foundation of the Liège Rotary Club

- Stanford University

- University College London

- Wellcome Trust

Figures created using PRoNTo version 2.0 (www.mlnl.cs.ucl.ac.uk/pronto/).



http://www.mlnl.cs.ucl.ac.uk/pronto/

References:

- Baldassarre, L., J. Mourão-Miranda, M. Pontil. «Structured Sparsity Models for Brain Decoding from fMRI data.» Proceedings of the 2nd conference on Pattern Recognition in NeuroImaging. 2012.

- Etzel, J.A., J.M. Zacks, T.S. Braver. «Searchlight analysis: promise, pitfalls, and potential.» NeuroImage 78 (2013): 261-269.

- Filippone, M., A. Marquand, C. Blain, C. Williams, J. Mourão-Miranda, M. Girolami. «Probabilistic prediction of neurological disorders with a statistical assessment of neuroimaging data modalities.» Annals of Applied Statistics 6 (2012): 1883-1905.

- Gaonkar, B., C. Davatzikos. «Analytic estimation of statistical significance maps for support vector machine based multi-variate image analysis and classification.» NeuroImage 78 (2013): 270-283.

- Haufe, S., F. Meinecke, K. Görgen, S. Dähne, J.-D. Haynes, B. Blankertz, F. Biessmann. «On the interpretation of weight vectors of linear models in multivariate neuroimaging.» NeuroImage 87 (2014): 96-110.

- Kragel, P., R. Carter, S. Huettel. «What makes a pattern? Matching decoding methods to data in multivariate pattern analysis.» Front Neurosci. 6 (2012): 162.

- Kriegeskorte, N., R. Goebel, P. Bandettini. «Information-based functional brain mapping.» PNAS 103 (2006): 3863-3868.

- Mourão-Miranda, J., A. Bokde, C. Born, H. Hampel, M. Stetter. «Classifying brain states and determining the discriminating activation patterns: Support Vector Machine on functional MRI data.» NeuroImage 28 (2005): 980-995.

- Rakotomamonjy, A., F. Bach, S. Canu, Y. Grandvalet. «SimpleMKL.» Journal of Machine Learning Research 9 (2008): 2491-2521.

- Rasmussen, P., K. Madsen, T. Lund, L.K. Hansen. «Visualization of nonlinear kernel models in neuroimaging by sensitivity maps.» NeuroImage 55 (2011): 1120-1131.

- Schrouff, J., J. Cremers, G. Garraux, L. Baldassarre, J. Mourão-Miranda, C. Phillips. «Localizing and comparing weight maps generated from linear kernel machine learning models.» Proceedings of the 3rd workshop on Pattern Recognition in NeuroImaging. 2013.

- Tibshirani, R. «Regression shrinkage and selection via the lasso.» Journal of the Royal Statistical Society: Series B 58 (1996): 267-288.

- Tzourio-Mazoyer, N., et al. «Automated Anatomical Labeling of activations in SPM using a Macroscopic Anatomical Parcellation of the MNI MRI single-subject brain.» NeuroImage 15 (2002): 273-289.

- Varoquaux, G., A. Gramfort, B. Thirion. «Small-sample brain mapping: sparse recovery on spatially correlated designs with randomization and clustering.» Proceedings of ICML (2012). icml.cc/discuss/2012/688.html

- Zou, H., T. Hastie. «Regularization and variable selection via the elastic net.» J. R. Statist. Soc. B 67 (2005): 301-320.
