interpreting predictive models in terms of materials/pr4ni_schrouff_jessica.pdf- interpreting...

34
Pattern Recognition for NeuroImaging (PR4NI) Interpreting predictive models in terms of anatomically labelled regions J. Schrouff Stanford University, CA, USA

Upload: others

Post on 08-Jul-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Interpreting predictive models in terms of Materials/PR4NI_Schrouff_Jessica.pdf- Interpreting machine learning based models can be complex. - Different strategies exist, but always

Pattern Recognition for NeuroImaging (PR4NI)

Interpreting predictive

models in terms of

anatomically labelled regions

J. Schrouff

Stanford University, CA, USA

Page 2: Interpreting predictive models in terms of Materials/PR4NI_Schrouff_Jessica.pdf- Interpreting machine learning based models can be complex. - Different strategies exist, but always

J. Schrouff

Stanford University, CA, USA

Data

Feature

extraction/selection

Model

specification

Accuracy and

significance

Model

interpretation

Pattern Recognition for NeuroImaging (PR4NI) 2

Page 3: Interpreting predictive models in terms of Materials/PR4NI_Schrouff_Jessica.pdf- Interpreting machine learning based models can be complex. - Different strategies exist, but always

Where in the brain is the

information of interest?

i.e. Which brain areas

contribute most to the

model’s prediction?

Data

Feature

extraction/selection

Model

specification

Accuracy and

significance

Model

interpretation

J. Schrouff

Stanford University, CA, USAPattern Recognition for NeuroImaging (PR4NI) 3

Page 4: Interpreting predictive models in terms of Materials/PR4NI_Schrouff_Jessica.pdf- Interpreting machine learning based models can be complex. - Different strategies exist, but always

Computing “weight maps”:

- From linear (kernel) methods

- Attributing one weight per feature

- For an SVM model:

𝐰 =

𝑖=1

𝑛

𝑦𝑖𝛼𝑖∗ 𝐱𝐢

J. Schrouff

Stanford University, CA, USAPattern Recognition for NeuroImaging (PR4NI) 4

Page 5: Interpreting predictive models in terms of Materials/PR4NI_Schrouff_Jessica.pdf- Interpreting machine learning based models can be complex. - Different strategies exist, but always

Thresholding weight maps:

Can we choose to look only at the largest contributions?

- Linear combination 𝑓 x𝒊 = w, 𝐱𝐢 + 𝑏- As in an election

- Each vote counts

110 1 1 8 1 1 1 1

J. Schrouff

Stanford University, CA, USAPattern Recognition for NeuroImaging (PR4NI) 5

Page 6: Interpreting predictive models in terms of Materials/PR4NI_Schrouff_Jessica.pdf- Interpreting machine learning based models can be complex. - Different strategies exist, but always

Interpreting weight maps:

What do the model parameters represent?

- Voxel 1: 𝑠 𝑛 + 𝑑(𝑛)- Voxel 2: 𝑑(𝑛)

𝑤(1) 𝑤(2)Weights:

Voxel 1 Voxel 2

Haufe et al., 2014: weight patterns versus weight filters

(forward versus backward model)

J. Schrouff

Stanford University, CA, USAPattern Recognition for NeuroImaging (PR4NI) 6

1 -1

Page 7: Interpreting predictive models in terms of Materials/PR4NI_Schrouff_Jessica.pdf- Interpreting machine learning based models can be complex. - Different strategies exist, but always

Different strategies:

1. A priori knowledge: ROI selection

2. Searchlight

3. Feature selection

4. Sparse algorithms

5. Multiple Kernel Learning (MKL)

6. Permutation tests

7. A posteriori knowledge: weight summarization

J. Schrouff

Stanford University, CA, USAPattern Recognition for NeuroImaging (PR4NI) 7

Page 8: Interpreting predictive models in terms of Materials/PR4NI_Schrouff_Jessica.pdf- Interpreting machine learning based models can be complex. - Different strategies exist, but always

Different strategies:

1. A priori knowledge: ROI

selection

2. Searchlight

3. Feature selection

4. Sparse algorithms

5. Multiple Kernel Learning

(MKL)

6. Permutation tests

7. A posteriori knowledge:

weight summarization

8

Data

Feature

extraction/selection

Model

specification

Accuracy and

significance

Model

interpretation

J. Schrouff

Stanford University, CA, USAPattern Recognition for NeuroImaging (PR4NI) 8

Page 9: Interpreting predictive models in terms of Materials/PR4NI_Schrouff_Jessica.pdf- Interpreting machine learning based models can be complex. - Different strategies exist, but always

1. A priori knowledge: ROI selection

e.g. Choosing voxels/features based on literature or on

previous univariate results (independent dataset).

J. Schrouff

Stanford University, CA, USAPattern Recognition for NeuroImaging (PR4NI) 9

Page 10: Interpreting predictive models in terms of Materials/PR4NI_Schrouff_Jessica.pdf- Interpreting machine learning based models can be complex. - Different strategies exist, but always

1. A priori knowledge: ROI selection

+ -Good accuracy? Need an anatomical

hypothesis

Anatomically/ functionally

relevant

Cannot threshold obtained

map

Arbitrary choice of region

number and size

Multiple models = multiple

comparisons

J. Schrouff

Stanford University, CA, USAPattern Recognition for NeuroImaging (PR4NI) 10

Page 11: Interpreting predictive models in terms of Materials/PR4NI_Schrouff_Jessica.pdf- Interpreting machine learning based models can be complex. - Different strategies exist, but always

2. Searchlight (Kriegeskorte et al., 2006)

J. Schrouff

Stanford University, CA, USAPattern Recognition for NeuroImaging (PR4NI) 11

Page 12: Interpreting predictive models in terms of Materials/PR4NI_Schrouff_Jessica.pdf- Interpreting machine learning based models can be complex. - Different strategies exist, but always

2. Searchlight (Kriegeskorte et al., 2006)

+ -Threshold for each

neighborhood

Not anatomically/functionally

relevant (spheres)

Not a weight filter Locally multivariate

Arbitrary choice of

neighborhood size

Multiple models = multiple

comparisons

Time-consuming

J. Schrouff

Stanford University, CA, USAPattern Recognition for NeuroImaging (PR4NI) 12

Review: Etzel et al., 2013

Page 13: Interpreting predictive models in terms of Materials/PR4NI_Schrouff_Jessica.pdf- Interpreting machine learning based models can be complex. - Different strategies exist, but always

3. Feature selection

Rank the features according to a specific criterion and

select a subset.

Example:

- Univariate t-test

- Recursive Feature Elimination (RFE, De Martino, 2008)

based on SVM weights

J. Schrouff

Stanford University, CA, USAPattern Recognition for NeuroImaging (PR4NI) 13

Page 14: Interpreting predictive models in terms of Materials/PR4NI_Schrouff_Jessica.pdf- Interpreting machine learning based models can be complex. - Different strategies exist, but always

3. Feature selection

+ -Data-driven Not anatomically/functionally

relevant (voxel sparse)

Time-consuming: nested CV

Depends on selected

technique: flaws in feature

selection are reflected in

model. user choice must

be informed.

Different strategies = different

patterns with same accuracy.

J. Schrouff

Stanford University, CA, USAPattern Recognition for NeuroImaging (PR4NI) 14

Page 15: Interpreting predictive models in terms of Materials/PR4NI_Schrouff_Jessica.pdf- Interpreting machine learning based models can be complex. - Different strategies exist, but always

4. Sparse algorithms

Algorithms with regularization constraints such that the

weight of some features is perfectly null.

Examples:

- LASSO (Tibshirani, 1996)

- Elastic-net (Zou and Hastie, 2005)

J. Schrouff

Stanford University, CA, USAPattern Recognition for NeuroImaging (PR4NI) 15

Page 16: Interpreting predictive models in terms of Materials/PR4NI_Schrouff_Jessica.pdf- Interpreting machine learning based models can be complex. - Different strategies exist, but always

4. Sparse algorithms

+ -Data-driven Not anatomically/functionally

relevant (voxel sparse)

Embedded in model Stability of selected voxels

(Baldassarre, et al., 2012):

different regularization terms

= different patterns for same

accuracy.

J. Schrouff

Stanford University, CA, USAPattern Recognition for NeuroImaging (PR4NI) 16

Page 17: Interpreting predictive models in terms of Materials/PR4NI_Schrouff_Jessica.pdf- Interpreting machine learning based models can be complex. - Different strategies exist, but always

4. Sparse algorithms

Note: stability selection (Varoquaux et al., 2012, Rondina

et al., 2014).

+ -Data-driven Not anatomically/functionally

relevant

Embedded in model Stability of selected voxels

(Baldassarre, et al., 2012):

different regularization terms

= different patterns for same

accuracy.

J. Schrouff

Stanford University, CA, USAPattern Recognition for NeuroImaging (PR4NI) 17

Page 18: Interpreting predictive models in terms of Materials/PR4NI_Schrouff_Jessica.pdf- Interpreting machine learning based models can be complex. - Different strategies exist, but always

5. Multiple Kernel Learning (MKL)

Build one kernel per anatomically defined region as input

of a (sparse) MKL algorithm (Filippone, 2012).

𝐾 𝒙, 𝒙′ = 𝑚=1𝑀 𝑑𝑚𝐾𝑚 (𝒙, 𝒙

′)

with 𝑑𝑚 ≥ 0, 𝑚=1𝑀 𝑑𝑚 = 1

J. Schrouff

Stanford University, CA, USAPattern Recognition for NeuroImaging (PR4NI) 18

Page 19: Interpreting predictive models in terms of Materials/PR4NI_Schrouff_Jessica.pdf- Interpreting machine learning based models can be complex. - Different strategies exist, but always

5. Multiple Kernel Learning (MKL)

Decision function of an MKL problem:

𝑓 x𝒊 = 𝑚 w𝒎, 𝐱𝐢 + 𝑏

Considered algorithm (Rakotomamonjy et al., 2008):

min1

2

𝑚

1

𝑑𝑚w𝑚

2 + 𝐶

𝑖

𝜉𝑖

𝑠. 𝑡. 𝑦𝑖

𝑚

𝐰𝒎, 𝐱𝐢 + 𝑏 ≥ 1 − 𝜉𝑖 ∀𝑖

𝜉𝑖 ≥ 0 ∀𝑖,

𝑚

𝑑𝑚 = 1, 𝑑𝑚 ≥ 0 ∀𝑚

19J. Schrouff

Stanford University, CA, USAPattern Recognition for NeuroImaging (PR4NI) 19

Page 20: Interpreting predictive models in terms of Materials/PR4NI_Schrouff_Jessica.pdf- Interpreting machine learning based models can be complex. - Different strategies exist, but always

5. Multiple Kernel Learning (MKL)

Weight map of an MKL model:

𝐰𝒎 = 𝑑𝑚

𝑖=1

𝑛

𝑦𝑖𝛼𝑖 𝐱𝐢

But also 𝑑𝑚

Can be considered as a hierarchical approach

20J. Schrouff

Stanford University, CA, USAPattern Recognition for NeuroImaging (PR4NI) 20

Page 21: Interpreting predictive models in terms of Materials/PR4NI_Schrouff_Jessica.pdf- Interpreting machine learning based models can be complex. - Different strategies exist, but always

5. Multiple Kernel Learning (MKL)

J. Schrouff

Stanford University, CA, USAPattern Recognition for NeuroImaging (PR4NI) 21

Page 22: Interpreting predictive models in terms of Materials/PR4NI_Schrouff_Jessica.pdf- Interpreting machine learning based models can be complex. - Different strategies exist, but always

5. Multiple Kernel Learning (MKL)

J. Schrouff

Stanford University, CA, USAPattern Recognition for NeuroImaging (PR4NI) 22

Page 23: Interpreting predictive models in terms of Materials/PR4NI_Schrouff_Jessica.pdf- Interpreting machine learning based models can be complex. - Different strategies exist, but always

5. Multiple Kernel Learning (MKL)

+ -Anatomically/ functionally

relevant

Stability of the selected

regions

Used the full pattern to

build the model

Depends on atlas

Thresholded and sorted

list of regions according to

dm

Hierarchical view of the

model parameters

J. Schrouff

Stanford University, CA, USAPattern Recognition for NeuroImaging (PR4NI) 23

Page 24: Interpreting predictive models in terms of Materials/PR4NI_Schrouff_Jessica.pdf- Interpreting machine learning based models can be complex. - Different strategies exist, but always

6. Permutation test

Build weight maps from permutation tests (e.g. Mourão-

Miranda et al., 2005) and associate a p-value to each

weight.

J. Schrouff

Stanford University, CA, USAPattern Recognition for NeuroImaging (PR4NI) 24

Simulated data.

Image from Gaonkar

and Davatzikos,

2013.

Page 25: Interpreting predictive models in terms of Materials/PR4NI_Schrouff_Jessica.pdf- Interpreting machine learning based models can be complex. - Different strategies exist, but always

6. Permutation test

+ -Map can be thresholded Computationally expensive

Used the full pattern to

build the model

Need to correct for multiple

comparisons (cannot make

the hypothesis of

independent voxels/weights)

J. Schrouff

Stanford University, CA, USAPattern Recognition for NeuroImaging (PR4NI) 25

Note: analytic approximation of null distribution for SVM

(Gaonkar and Davatzikos, 2013).

Page 26: Interpreting predictive models in terms of Materials/PR4NI_Schrouff_Jessica.pdf- Interpreting machine learning based models can be complex. - Different strategies exist, but always

7. A posteriori knowledge: weight summarization

(Schrouff et al., 2013)

Build weight maps then summarize weights according to

anatomically defined regions (e.g. based on an atlas).

Mathematically:

𝑁𝑊𝑅𝑂𝐼 = 𝑣 ∈𝑅𝑂𝐼 𝑊𝑣#𝑣 ∈ 𝑅𝑂𝐼

J. Schrouff

Stanford University, CA, USAPattern Recognition for NeuroImaging (PR4NI) 26

Page 27: Interpreting predictive models in terms of Materials/PR4NI_Schrouff_Jessica.pdf- Interpreting machine learning based models can be complex. - Different strategies exist, but always

7. A posteriori knowledge: weight summarization

AAL atlas (Tzourio-Mazoyer,

2002).

J. Schrouff

Stanford University, CA, USAPattern Recognition for NeuroImaging (PR4NI) 27

Page 28: Interpreting predictive models in terms of Materials/PR4NI_Schrouff_Jessica.pdf- Interpreting machine learning based models can be complex. - Different strategies exist, but always

7. A posteriori knowledge: weight summarization

+ -Anatomically/ functionally

relevant

No thresholding of the sorted

list of regions

Used the full pattern to

build the model

Depends on atlas

Sorted list of regions

according to their

proportion of NWROI

J. Schrouff

Stanford University, CA, USAPattern Recognition for NeuroImaging (PR4NI) 28

Page 29: Interpreting predictive models in terms of Materials/PR4NI_Schrouff_Jessica.pdf- Interpreting machine learning based models can be complex. - Different strategies exist, but always

How about electrophysiological recordings?

- All strategies explained previously apply

- For MKL, can be applied on each dimension (i.e.

channels, time point, frequency band) or combination

(e.g. channel + frequency band)

J. Schrouff

Stanford University, CA, USAPattern Recognition for NeuroImaging (PR4NI) 29

Page 30: Interpreting predictive models in terms of Materials/PR4NI_Schrouff_Jessica.pdf- Interpreting machine learning based models can be complex. - Different strategies exist, but always

How about non-linear models?

- Statistical sensitivity maps (Rasmussen et al., 2011) for

SVM, kernel logistic regression and kernel Fisher

discriminant.

sensitivity map (linear) = (weight filter)2

- Use of non-linear kernels in neuroimaging? (Kragel et

al., 2012)

J. Schrouff

Stanford University, CA, USAPattern Recognition for NeuroImaging (PR4NI) 30

Page 31: Interpreting predictive models in terms of Materials/PR4NI_Schrouff_Jessica.pdf- Interpreting machine learning based models can be complex. - Different strategies exist, but always

Conclusions:

- Interpreting machine learning based models can be

complex.

- Different strategies exist, but always a trade-off between

different parameters:

• Thresholded list of voxels/regions contributing

• Anatomical/functional relevance

• Multivariate power

• Stability of the pattern

• Computational time

Your (informed) choice!

J. Schrouff

Stanford University, CA, USAPattern Recognition for NeuroImaging (PR4NI) 31

Page 32: Interpreting predictive models in terms of Materials/PR4NI_Schrouff_Jessica.pdf- Interpreting machine learning based models can be complex. - Different strategies exist, but always

Acknowledgments:

- Belgian American Educational Foundation (BAEF)

- Medical Foundation of the Liège Rotary Club

- Stanford University

- University College London

- Wellcome Trust

Figures created using PRoNTo version 2.0

(www.mlnl.cs.ucl.ac.uk/pronto/).

32J. Schrouff

Stanford University, CA, USAPattern Recognition for NeuroImaging (PR4NI) 32

Page 33: Interpreting predictive models in terms of Materials/PR4NI_Schrouff_Jessica.pdf- Interpreting machine learning based models can be complex. - Different strategies exist, but always

References:- Baldassarre, L., J. Mourao-Miranda, M. Pontil. «Structured Sparsity Models for

Brain Decoding from fMRI data.» Proceedings of the 2nd conference on Pattern

Recognition in NeuroImaging. 2012.

- Gaonkar, B., C. Davatzikos. «Analytic estimation of statistical significance maps

for support vector machine based multi-variate image analysis and

classification.» NeuroImage 78 (2013): 270-283.

- Etzel, J., A. Jeffrey, M. Zacks, T.S. Braver. «Searchlight analysis: promise,

pitfalls, and potential.» Neuroimage 78 (2013): 261-269.

- Filippone, M., A. Marquand, C. Blain, C. Williams, J. Mourao-Miranda, M.

Girolami. «Probabilistic prediction of neurological disorders with a statistical

assessement of neuroimaging data modalities.» Annals of Applied Statistics 6

(2012): 1883-1905.

- Haufe, S., F. Meinecke, K. Görgen, S. Dhäne, J.-D. Haynes, B. Blankertz, F.

Biessmann. «On the interpretation of weight vectors of linear models in

multivariate neuroimaging.» NeuroImage 87 (2014): 96-110.

- Kragel, P., R. Carter, S. Huettel. «What makes a pattern? Matching decoding

methods to data in multivariate pattern analysis.» Front Neurosci. 6 (2012): 162.

- Kriegeskorte, N., R. Goebel, P. Bandettini. «Information-based functional brain

mapping.» PNAS 103 (2006): 3863-3868.

- Mourão-Miranda, J., A. Bokde, C. Born, H. Hampel, M. Stetter. «Classifying brain

states and determining the discriminating activation patterns: Support Vector

Machine on functional MRI data.» NeuroImage 28 (2005): 980-995.

Page 34: Interpreting predictive models in terms of Materials/PR4NI_Schrouff_Jessica.pdf- Interpreting machine learning based models can be complex. - Different strategies exist, but always

References:- Rakotomamonjy, A., F. Bach, S. Canu, et Y. Grandvalet. «SimpleMKL.» Journal

of Machine Learning 9 (2008): 2491-2521.

- Rasmussen, P., K. Madsen, T. Lund, L.K. Hansen. «Visualization of nonlinear

kernel models in neuroimaging by sensitivity maps.» NeuroImage 55 (2011):

1120-1131.

- Schrouff, J., J. Cremers, G. Garraux, L. Baldassarre, J. Mourão-Miranda, et C.

Phillips. «Localizing and comparing weight maps generated from linear kernel

machine learning models.» Proceedings of the 3rd workshop on Pattern

Recognition in NeuroImaging. 2013.

- Tibshirani, R. «Regression shrinkage and selection via the lasso.» Journal of the

Royal Statistical Society. 58 (1996): 267-288.

- Tzourio-Mazoyer, N., et al. «Automated Anatomical Labeling of activations in

SPM using a Macroscopic Anatomical Parcellation of the MNI MRI single-subject

brain.» NeuroImage 15 (2002): 273-289.

- Varoquaux, G., A. Gramfort, B. Thirion. «Small-sample brain mapping: sparse

recovery on spatially correlated designs with randomization and clustering.»

Proceedings of ICML (2012). icml.cc/discuss/2012/688.html

- Zou, H., et T. Hastie. «Regularization and variable selection via the elastic net.»

J. R. Statist. Soc. B 67 (2005): 301-320.