interpreting predictive models in terms of materials/pr4ni_schrouff_jessica.pdf- interpreting...
TRANSCRIPT
Pattern Recognition for NeuroImaging (PR4NI)
Interpreting predictive
models in terms of
anatomically labelled regions
J. Schrouff
Stanford University, CA, USA
J. Schrouff
Stanford University, CA, USA
Data
Feature
extraction/selection
Model
specification
Accuracy and
significance
Model
interpretation
Pattern Recognition for NeuroImaging (PR4NI) 2
Where in the brain is the
information of interest?
i.e. Which brain areas
contribute most to the
model’s prediction?
Data
Feature
extraction/selection
Model
specification
Accuracy and
significance
Model
interpretation
J. Schrouff
Stanford University, CA, USAPattern Recognition for NeuroImaging (PR4NI) 3
Computing “weight maps”:
- From linear (kernel) methods
- Attributing one weight per feature
- For an SVM model:
𝐰 =
𝑖=1
𝑛
𝑦𝑖𝛼𝑖∗ 𝐱𝐢
J. Schrouff
Stanford University, CA, USAPattern Recognition for NeuroImaging (PR4NI) 4
Thresholding weight maps:
Can we choose to look only at the largest contributions?
- Linear combination 𝑓 x𝒊 = w, 𝐱𝐢 + 𝑏- As in an election
- Each vote counts
110 1 1 8 1 1 1 1
J. Schrouff
Stanford University, CA, USAPattern Recognition for NeuroImaging (PR4NI) 5
Interpreting weight maps:
What do the model parameters represent?
- Voxel 1: 𝑠 𝑛 + 𝑑(𝑛)- Voxel 2: 𝑑(𝑛)
𝑤(1) 𝑤(2)Weights:
Voxel 1 Voxel 2
Haufe et al., 2014: weight patterns versus weight filters
(forward versus backward model)
J. Schrouff
Stanford University, CA, USAPattern Recognition for NeuroImaging (PR4NI) 6
1 -1
Different strategies:
1. A priori knowledge: ROI selection
2. Searchlight
3. Feature selection
4. Sparse algorithms
5. Multiple Kernel Learning (MKL)
6. Permutation tests
7. A posteriori knowledge: weight summarization
J. Schrouff
Stanford University, CA, USAPattern Recognition for NeuroImaging (PR4NI) 7
Different strategies:
1. A priori knowledge: ROI
selection
2. Searchlight
3. Feature selection
4. Sparse algorithms
5. Multiple Kernel Learning
(MKL)
6. Permutation tests
7. A posteriori knowledge:
weight summarization
8
Data
Feature
extraction/selection
Model
specification
Accuracy and
significance
Model
interpretation
J. Schrouff
Stanford University, CA, USAPattern Recognition for NeuroImaging (PR4NI) 8
1. A priori knowledge: ROI selection
e.g. Choosing voxels/features based on literature or on
previous univariate results (independent dataset).
J. Schrouff
Stanford University, CA, USAPattern Recognition for NeuroImaging (PR4NI) 9
1. A priori knowledge: ROI selection
+ -Good accuracy? Need an anatomical
hypothesis
Anatomically/ functionally
relevant
Cannot threshold obtained
map
Arbitrary choice of region
number and size
Multiple models = multiple
comparisons
J. Schrouff
Stanford University, CA, USAPattern Recognition for NeuroImaging (PR4NI) 10
2. Searchlight (Kriegeskorte et al., 2006)
J. Schrouff
Stanford University, CA, USAPattern Recognition for NeuroImaging (PR4NI) 11
2. Searchlight (Kriegeskorte et al., 2006)
+ -Threshold for each
neighborhood
Not anatomically/functionally
relevant (spheres)
Not a weight filter Locally multivariate
Arbitrary choice of
neighborhood size
Multiple models = multiple
comparisons
Time-consuming
J. Schrouff
Stanford University, CA, USAPattern Recognition for NeuroImaging (PR4NI) 12
Review: Etzel et al., 2013
3. Feature selection
Rank the features according to a specific criterion and
select a subset.
Example:
- Univariate t-test
- Recursive Feature Elimination (RFE, De Martino, 2008)
based on SVM weights
J. Schrouff
Stanford University, CA, USAPattern Recognition for NeuroImaging (PR4NI) 13
3. Feature selection
+ -Data-driven Not anatomically/functionally
relevant (voxel sparse)
Time-consuming: nested CV
Depends on selected
technique: flaws in feature
selection are reflected in
model. user choice must
be informed.
Different strategies = different
patterns with same accuracy.
J. Schrouff
Stanford University, CA, USAPattern Recognition for NeuroImaging (PR4NI) 14
4. Sparse algorithms
Algorithms with regularization constraints such that the
weight of some features is perfectly null.
Examples:
- LASSO (Tibshirani, 1996)
- Elastic-net (Zou and Hastie, 2005)
J. Schrouff
Stanford University, CA, USAPattern Recognition for NeuroImaging (PR4NI) 15
4. Sparse algorithms
+ -Data-driven Not anatomically/functionally
relevant (voxel sparse)
Embedded in model Stability of selected voxels
(Baldassarre, et al., 2012):
different regularization terms
= different patterns for same
accuracy.
J. Schrouff
Stanford University, CA, USAPattern Recognition for NeuroImaging (PR4NI) 16
4. Sparse algorithms
Note: stability selection (Varoquaux et al., 2012, Rondina
et al., 2014).
+ -Data-driven Not anatomically/functionally
relevant
Embedded in model Stability of selected voxels
(Baldassarre, et al., 2012):
different regularization terms
= different patterns for same
accuracy.
J. Schrouff
Stanford University, CA, USAPattern Recognition for NeuroImaging (PR4NI) 17
5. Multiple Kernel Learning (MKL)
Build one kernel per anatomically defined region as input
of a (sparse) MKL algorithm (Filippone, 2012).
𝐾 𝒙, 𝒙′ = 𝑚=1𝑀 𝑑𝑚𝐾𝑚 (𝒙, 𝒙
′)
with 𝑑𝑚 ≥ 0, 𝑚=1𝑀 𝑑𝑚 = 1
J. Schrouff
Stanford University, CA, USAPattern Recognition for NeuroImaging (PR4NI) 18
5. Multiple Kernel Learning (MKL)
Decision function of an MKL problem:
𝑓 x𝒊 = 𝑚 w𝒎, 𝐱𝐢 + 𝑏
Considered algorithm (Rakotomamonjy et al., 2008):
min1
2
𝑚
1
𝑑𝑚w𝑚
2 + 𝐶
𝑖
𝜉𝑖
𝑠. 𝑡. 𝑦𝑖
𝑚
𝐰𝒎, 𝐱𝐢 + 𝑏 ≥ 1 − 𝜉𝑖 ∀𝑖
𝜉𝑖 ≥ 0 ∀𝑖,
𝑚
𝑑𝑚 = 1, 𝑑𝑚 ≥ 0 ∀𝑚
19J. Schrouff
Stanford University, CA, USAPattern Recognition for NeuroImaging (PR4NI) 19
5. Multiple Kernel Learning (MKL)
Weight map of an MKL model:
𝐰𝒎 = 𝑑𝑚
𝑖=1
𝑛
𝑦𝑖𝛼𝑖 𝐱𝐢
But also 𝑑𝑚
Can be considered as a hierarchical approach
20J. Schrouff
Stanford University, CA, USAPattern Recognition for NeuroImaging (PR4NI) 20
5. Multiple Kernel Learning (MKL)
J. Schrouff
Stanford University, CA, USAPattern Recognition for NeuroImaging (PR4NI) 21
5. Multiple Kernel Learning (MKL)
J. Schrouff
Stanford University, CA, USAPattern Recognition for NeuroImaging (PR4NI) 22
5. Multiple Kernel Learning (MKL)
+ -Anatomically/ functionally
relevant
Stability of the selected
regions
Used the full pattern to
build the model
Depends on atlas
Thresholded and sorted
list of regions according to
dm
Hierarchical view of the
model parameters
J. Schrouff
Stanford University, CA, USAPattern Recognition for NeuroImaging (PR4NI) 23
6. Permutation test
Build weight maps from permutation tests (e.g. Mourão-
Miranda et al., 2005) and associate a p-value to each
weight.
J. Schrouff
Stanford University, CA, USAPattern Recognition for NeuroImaging (PR4NI) 24
Simulated data.
Image from Gaonkar
and Davatzikos,
2013.
6. Permutation test
+ -Map can be thresholded Computationally expensive
Used the full pattern to
build the model
Need to correct for multiple
comparisons (cannot make
the hypothesis of
independent voxels/weights)
J. Schrouff
Stanford University, CA, USAPattern Recognition for NeuroImaging (PR4NI) 25
Note: analytic approximation of null distribution for SVM
(Gaonkar and Davatzikos, 2013).
7. A posteriori knowledge: weight summarization
(Schrouff et al., 2013)
Build weight maps then summarize weights according to
anatomically defined regions (e.g. based on an atlas).
Mathematically:
𝑁𝑊𝑅𝑂𝐼 = 𝑣 ∈𝑅𝑂𝐼 𝑊𝑣#𝑣 ∈ 𝑅𝑂𝐼
J. Schrouff
Stanford University, CA, USAPattern Recognition for NeuroImaging (PR4NI) 26
7. A posteriori knowledge: weight summarization
AAL atlas (Tzourio-Mazoyer,
2002).
J. Schrouff
Stanford University, CA, USAPattern Recognition for NeuroImaging (PR4NI) 27
7. A posteriori knowledge: weight summarization
+ -Anatomically/ functionally
relevant
No thresholding of the sorted
list of regions
Used the full pattern to
build the model
Depends on atlas
Sorted list of regions
according to their
proportion of NWROI
J. Schrouff
Stanford University, CA, USAPattern Recognition for NeuroImaging (PR4NI) 28
How about electrophysiological recordings?
- All strategies explained previously apply
- For MKL, can be applied on each dimension (i.e.
channels, time point, frequency band) or combination
(e.g. channel + frequency band)
J. Schrouff
Stanford University, CA, USAPattern Recognition for NeuroImaging (PR4NI) 29
How about non-linear models?
- Statistical sensitivity maps (Rasmussen et al., 2011) for
SVM, kernel logistic regression and kernel Fisher
discriminant.
sensitivity map (linear) = (weight filter)2
- Use of non-linear kernels in neuroimaging? (Kragel et
al., 2012)
J. Schrouff
Stanford University, CA, USAPattern Recognition for NeuroImaging (PR4NI) 30
Conclusions:
- Interpreting machine learning based models can be
complex.
- Different strategies exist, but always a trade-off between
different parameters:
• Thresholded list of voxels/regions contributing
• Anatomical/functional relevance
• Multivariate power
• Stability of the pattern
• Computational time
Your (informed) choice!
J. Schrouff
Stanford University, CA, USAPattern Recognition for NeuroImaging (PR4NI) 31
Acknowledgments:
- Belgian American Educational Foundation (BAEF)
- Medical Foundation of the Liège Rotary Club
- Stanford University
- University College London
- Wellcome Trust
Figures created using PRoNTo version 2.0
(www.mlnl.cs.ucl.ac.uk/pronto/).
32J. Schrouff
Stanford University, CA, USAPattern Recognition for NeuroImaging (PR4NI) 32
References:- Baldassarre, L., J. Mourao-Miranda, M. Pontil. «Structured Sparsity Models for
Brain Decoding from fMRI data.» Proceedings of the 2nd conference on Pattern
Recognition in NeuroImaging. 2012.
- Gaonkar, B., C. Davatzikos. «Analytic estimation of statistical significance maps
for support vector machine based multi-variate image analysis and
classification.» NeuroImage 78 (2013): 270-283.
- Etzel, J., A. Jeffrey, M. Zacks, T.S. Braver. «Searchlight analysis: promise,
pitfalls, and potential.» Neuroimage 78 (2013): 261-269.
- Filippone, M., A. Marquand, C. Blain, C. Williams, J. Mourao-Miranda, M.
Girolami. «Probabilistic prediction of neurological disorders with a statistical
assessement of neuroimaging data modalities.» Annals of Applied Statistics 6
(2012): 1883-1905.
- Haufe, S., F. Meinecke, K. Görgen, S. Dhäne, J.-D. Haynes, B. Blankertz, F.
Biessmann. «On the interpretation of weight vectors of linear models in
multivariate neuroimaging.» NeuroImage 87 (2014): 96-110.
- Kragel, P., R. Carter, S. Huettel. «What makes a pattern? Matching decoding
methods to data in multivariate pattern analysis.» Front Neurosci. 6 (2012): 162.
- Kriegeskorte, N., R. Goebel, P. Bandettini. «Information-based functional brain
mapping.» PNAS 103 (2006): 3863-3868.
- Mourão-Miranda, J., A. Bokde, C. Born, H. Hampel, M. Stetter. «Classifying brain
states and determining the discriminating activation patterns: Support Vector
Machine on functional MRI data.» NeuroImage 28 (2005): 980-995.
References:- Rakotomamonjy, A., F. Bach, S. Canu, et Y. Grandvalet. «SimpleMKL.» Journal
of Machine Learning 9 (2008): 2491-2521.
- Rasmussen, P., K. Madsen, T. Lund, L.K. Hansen. «Visualization of nonlinear
kernel models in neuroimaging by sensitivity maps.» NeuroImage 55 (2011):
1120-1131.
- Schrouff, J., J. Cremers, G. Garraux, L. Baldassarre, J. Mourão-Miranda, et C.
Phillips. «Localizing and comparing weight maps generated from linear kernel
machine learning models.» Proceedings of the 3rd workshop on Pattern
Recognition in NeuroImaging. 2013.
- Tibshirani, R. «Regression shrinkage and selection via the lasso.» Journal of the
Royal Statistical Society. 58 (1996): 267-288.
- Tzourio-Mazoyer, N., et al. «Automated Anatomical Labeling of activations in
SPM using a Macroscopic Anatomical Parcellation of the MNI MRI single-subject
brain.» NeuroImage 15 (2002): 273-289.
- Varoquaux, G., A. Gramfort, B. Thirion. «Small-sample brain mapping: sparse
recovery on spatially correlated designs with randomization and clustering.»
Proceedings of ICML (2012). icml.cc/discuss/2012/688.html
- Zou, H., et T. Hastie. «Regularization and variable selection via the elastic net.»
J. R. Statist. Soc. B 67 (2005): 301-320.