Statistical Inference and Random Field Theory
Will Penny
SPM short course, London, May 2003

TRANSCRIPT
M. Brett et al., Introduction to Random Field Theory. To appear in HBF, 2nd Edition.
[Figure: SPM analysis pipeline — image data → realignment & motion correction → smoothing (kernel) → normalisation (anatomical reference) → General Linear Model (design matrix, model fitting, parameter estimates) → statistic image / Statistical Parametric Map → Random Field Theory → corrected p-values]
Overview

1. Terminology
2. Theory
3. Imaging Data
4. Levels of Inference
5. SPM Results + FDR?
Inference at a single voxel

NULL hypothesis, H: activation is zero

p-value = p(t > u | H)

[Figure: t-distribution with threshold u = 2]

p-value: the probability of getting a value of t at least as extreme as u. If the p-value is small we reject the null hypothesis.
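As a minimal sketch, the single-voxel p-value above can be computed from the survival function of the t-distribution. The degrees of freedom (40) are a hypothetical choice for illustration; they are not given on the slide.

```python
from scipy.stats import t

# p = P(T > u | H): probability of a t-value at least as extreme
# as the observed threshold u, under the null hypothesis H of
# zero activation.
def voxel_p_value(u, dof):
    return t.sf(u, dof)  # survival function, 1 - CDF

# Slide's threshold u = 2; degrees of freedom assumed to be 40.
p = voxel_p_value(2.0, 40)
```

With these numbers the one-sided p-value is roughly 0.026, so at a 0.05 level the voxel null hypothesis would be rejected.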
Sensitivity and Specificity

                     ACTION
TRUTH           Don't Reject   Reject
H True  (o)         TN           FP
H False (x)         FN           TP

[Figure: t-scores from regions that truly do and do not activate (o = null, x = active), ordered along the axis with two candidate thresholds u1 < u2]

At u1: Sens = 10/10 = 100%, Spec = 7/10 = 70%
At u2: Sens = 7/10 = 70%,  Spec = 9/10 = 90%

Sensitivity = TP/(TP+FN) = power
Specificity = TN/(TN+FP) = 1 - α
FP = Type I error or 'α error'; α = p-value / FP rate / error rate / significance level
FN = Type II error
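The definitions above translate directly into code. This sketch plugs in the counts read off the slide's example at threshold u1 (10 truly active voxels, 10 null voxels).

```python
def sensitivity(tp, fn):
    # Sensitivity (power) = TP / (TP + FN)
    return tp / (tp + fn)

def specificity(tn, fp):
    # Specificity = TN / (TN + FP) = 1 - alpha
    return tn / (tn + fp)

# At u1 all 10 truly active voxels cross threshold (TP=10, FN=0)
# and 3 of the 10 null voxels cross it too (TN=7, FP=3):
sens_u1 = sensitivity(10, 0)  # 1.0 -> 100%
spec_u1 = specificity(7, 3)   # 0.7 -> 70%
```

Raising the threshold to u2 trades sensitivity (70%) for specificity (90%), as the slide shows.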
Inference at a single voxel

NULL hypothesis, H: activation is zero

p-value = p(t > u | H)

[Figure: t-distribution with threshold u = 2]

We can choose u to ensure a voxel-wise significance level of α. This is called an 'uncorrected' p-value, for reasons we'll see later. We can then plot a map of above-threshold voxels.
Inference for Images

[Figure: ten simulated images of Signal, Signal+Noise, and Noise, thresholded at an 'uncorrected' p-value of α = 0.1. Percentage of null pixels that are false positives in each noise image: 11.3%, 11.3%, 12.5%, 10.8%, 11.5%, 10.0%, 10.7%, 11.2%, 10.2%, 9.5%]

Using an 'uncorrected' p-value of 0.1 will lead us to conclude, on average, that 10% of voxels are active when they are not. This is clearly undesirable. To correct for this we can define a null hypothesis for images of statistics.
Family-wise Null Hypothesis

• Family of hypotheses H_k, k = 1, …, K
  – H = H_1 ∧ H_2 ∧ … ∧ H_k ∧ … ∧ H_K

FAMILY-WISE NULL HYPOTHESIS: activation is zero everywhere.

If we reject a voxel null hypothesis at any voxel, we reject the family-wise null hypothesis. A false positive anywhere gives a Family-Wise Error (FWE).

Family-wise error rate = 'corrected' p-value
[Figure: thresholded noise images comparing use of an 'uncorrected' p-value of α = 0.1 with a 'corrected' (FWE) p-value of α = 0.1]
The Bonferroni correction

Given a family of N independent voxels and a voxel-wise error rate v, the family-wise error rate (FWE), or 'corrected' error rate, is

    α = 1 - (1 - v)^N
      ~ Nv

Therefore, to ensure a particular FWE we choose

    v = α / N

A Bonferroni correction is appropriate for independent tests: a correction for multiple comparisons.

If v = 0.05 then over 100 voxels we'll get 5 voxel-wise Type I errors on average, but a much higher α. To ensure α = 0.05 we need v = 0.0005!
The Bonferroni correction

[Figure: independent voxels vs spatially correlated voxels]

Bonferroni is too conservative for brain images.
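The arithmetic behind the Bonferroni slide can be sketched in a few lines, reproducing the numbers quoted above.

```python
def fwe_rate(v, n):
    # Exact family-wise error for N independent tests at
    # voxel-wise rate v: alpha = 1 - (1 - v)^N, approx N*v for small v.
    return 1 - (1 - v) ** n

def bonferroni_rate(alpha, n):
    # Voxel-wise rate guaranteeing FWE <= alpha.
    return alpha / n

# v = 0.05 over 100 independent voxels gives a near-certain
# family-wise error; holding FWE at 0.05 needs v = 0.0005.
fwe_at_005 = fwe_rate(0.05, 100)       # ~0.994
v_needed = bonferroni_rate(0.05, 100)  # 0.0005
```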
Overview

1. Terminology
2. Theory
3. Imaging Data
4. Levels of Inference
5. SPM Results
Random Field Theory

• Consider a statistic image as a lattice representation of a continuous random field
• Use results from continuous random field theory

[Figure: lattice representation]
Euler Characteristic (EC)

• Topological measure
  – threshold an image at u
  – excursion set A_u
  – χ(A_u) = # blobs - # holes
  – at high u, χ(A_u) = # blobs

• Reject H_Ω if the Euler characteristic is non-zero:

    α = Pr(χ(A_u) > 0)

• Expected Euler characteristic ≈ p-value (at high u):

    α ≈ E[χ(A_u)]
Example – 2D Gaussian images

    α = R (4 ln 2) (2π)^(-3/2) u exp(-u²/2)

Voxel-wise threshold: u
Number of Resolution Elements (RESELS): R
N = 100x100 voxels, smoothness FWHM = 10, gives R = 10x10 = 100

For R = 100 and α = 0.05, RFT gives u = 3.8. Using R = 100 in a Bonferroni correction gives u = 3.3.

Friston et al. (1991) J. Cer. Bl. Fl. M.
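The two thresholds quoted above can be reproduced numerically: solve the slide's expected-EC formula for u, and compare with a Bonferroni correction over R "tests". This is a sketch of the calculation, not SPM's implementation.

```python
from math import log, pi, exp
from scipy.optimize import brentq
from scipy.stats import norm

# Expected EC for a 2D Gaussian field (the slide's formula):
# alpha = R (4 ln 2) (2*pi)^(-3/2) u exp(-u^2/2)
def expected_ec_2d(u, resels):
    return resels * 4 * log(2) * (2 * pi) ** -1.5 * u * exp(-u ** 2 / 2)

R, alpha = 100, 0.05
# RFT threshold: solve expected EC = alpha for u (root bracketed in [2, 6]).
u_rft = brentq(lambda u: expected_ec_2d(u, R) - alpha, 2.0, 6.0)  # ~3.8
# Bonferroni over R resels, for comparison.
u_bonf = norm.isf(alpha / R)                                      # ~3.3
```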
Developments

Friston et al. (1991) J. Cer. Bl. Fl. M. – 2D Gaussian fields (not the EC method)
Worsley et al. (1992) J. Cer. Bl. Fl. M. – 3D Gaussian fields, 3D t-fields
Worsley et al. (1993) Quant. Brain. Func.
Restricted search regions

[Figure: a box with 16 surface markers and a frame with 32 markers; the box and frame have the same number of voxels]
Unified Theory

• General form for the expected Euler characteristic
• χ², F, and t fields
• restricted search regions

    α = Σ_d R_d(Ω) ρ_d(u)

R_d(Ω): RESEL count; depends on the search region – how big, how smooth, what shape?
ρ_d(u): EC density; depends on the type of field (e.g. Gaussian, t) and the threshold, u.

[Figure: excursion set A_u]

Worsley et al. (1996), HBM
Unified Theory

R_d(Ω): RESEL count
  R_0(Ω) = χ(Ω), the Euler characteristic of Ω
  R_1(Ω) = resel diameter
  R_2(Ω) = resel surface area
  R_3(Ω) = resel volume

ρ_d(u): d-dimensional EC density – e.g. for a Gaussian RF:
  ρ_0(u) = 1 - Φ(u)
  ρ_1(u) = (4 ln 2)^(1/2) exp(-u²/2) / (2π)
  ρ_2(u) = (4 ln 2) u exp(-u²/2) / (2π)^(3/2)
  ρ_3(u) = (4 ln 2)^(3/2) (u² - 1) exp(-u²/2) / (2π)²
  ρ_4(u) = (4 ln 2)² (u³ - 3u) exp(-u²/2) / (2π)^(5/2)

Worsley et al. (1996), HBM
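The tabulated EC densities and the unified formula can be sketched as functions; as a consistency check, the earlier 2D example is the special case with R_2 = 100 and all other resel counts zero.

```python
from math import log, pi, exp
from scipy.stats import norm

# EC densities rho_d(u) for a Gaussian random field, as tabulated above.
def rho(d, u):
    k = 4 * log(2)
    g = exp(-u ** 2 / 2)
    return [
        norm.sf(u),                                        # rho_0 = 1 - Phi(u)
        k ** 0.5 * g / (2 * pi),                           # rho_1
        k * u * g / (2 * pi) ** 1.5,                       # rho_2
        k ** 1.5 * (u ** 2 - 1) * g / (2 * pi) ** 2,       # rho_3
        k ** 2 * (u ** 3 - 3 * u) * g / (2 * pi) ** 2.5,   # rho_4
    ][d]

# Unified formula: alpha = sum_d R_d(Omega) * rho_d(u).
def expected_ec(resel_counts, u):
    return sum(r * rho(d, u) for d, r in enumerate(resel_counts))

# The earlier 2D example: R = [R_0, R_1, R_2] = [0, 0, 100] at u = 3.8.
a = expected_ec([0, 0, 100], 3.8)  # ~0.05
```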
Resel Counts for Brain Structures

[Table: resel counts for various brain structures at FWHM = 20mm]

(1) The threshold depends on the search volume. (2) Surface area makes a large contribution.
Overview

1. Terminology
2. Theory
3. Imaging Data
4. Levels of Inference
5. SPM Results
Functional Imaging Data

• The random fields are the component fields, Y = Xw + E, e = E/σ
• We can only estimate the component fields, using estimates of w and σ
• To apply RFT we need the RESEL count, which requires smoothness estimates
Component fields

[Figure: Y = Xw + E — data matrix (scans × voxels) = design matrix × parameters + errors (with variance σ²)]
Estimated component fields

[Figure: data matrix (scans × voxels) = design matrix × estimated parameters + residuals. The residuals yield parameter estimates, an estimated variance, and the estimated component fields — each row of the residual matrix is an estimated component field]

The roughness of the field is summarised by the variance-covariance matrix of the spatial partial derivatives of the errors, e_x, e_y, e_z:

    Λ = | var(e_x)       cov(e_x, e_y)  cov(e_x, e_z) |
        | cov(e_y, e_x)  var(e_y)       cov(e_y, e_z) |
        | cov(e_z, e_x)  cov(e_z, e_y)  var(e_z)      |
Smoothness Estimation

• Roughness: |Λ|
• Point Response Function: PRF

• Gaussian PRF with FWHM f_x, f_y, f_z:

    |Λ|^(1/2) = (4 ln 2)^(3/2) / (f_x f_y f_z)

• RESEL COUNT:

    R_3(Ω) = λ(Ω) / (f_x f_y f_z)

    α = R_3(Ω) (4 ln 2)^(3/2) (u² - 1) exp(-u²/2) / (2π)²

Approximate the peak of the covariance function with a Gaussian.
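A sketch of the resel-count arithmetic above: volume and FWHMs must share units (voxels here), and the example search volume is a hypothetical choice, not from the slide.

```python
from math import log, pi, exp

def resel_count_3d(volume, fx, fy, fz):
    # R_3(Omega) = lambda(Omega) / (fx * fy * fz)
    return volume / (fx * fy * fz)

def alpha_3d(u, r3):
    # alpha = R_3 (4 ln 2)^(3/2) (u^2 - 1) exp(-u^2/2) / (2 pi)^2
    return (r3 * (4 * log(2)) ** 1.5 * (u ** 2 - 1)
            * exp(-u ** 2 / 2) / (2 * pi) ** 2)

# Hypothetical search volume: 64x64x40 voxels, FWHM of 8 voxels per axis.
r3 = resel_count_3d(64 * 64 * 40, 8.0, 8.0, 8.0)  # 320 resels
```

Raising the threshold u lowers the corrected α, so a target α can be met by root-finding on alpha_3d, exactly as in the 2D example.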
RFT AssumptionsRFT AssumptionsRFT AssumptionsRFT Assumptions
• Model fit & assumptionsModel fit & assumptions– valid distributional resultsvalid distributional results
• Multivariate normalityMultivariate normality– of of componentcomponent images images
• Covariance function of Covariance function of componentcomponent images must be images must be
- Stationary- Stationary (pre SPM99)(pre SPM99)
- - Can be nonstationaryCan be nonstationary
(SPM99 onwards)(SPM99 onwards)
- Twice differentiable- Twice differentiable
• Model fit & assumptionsModel fit & assumptions– valid distributional resultsvalid distributional results
• Multivariate normalityMultivariate normality– of of componentcomponent images images
• Covariance function of Covariance function of componentcomponent images must be images must be
- Stationary- Stationary (pre SPM99)(pre SPM99)
- - Can be nonstationaryCan be nonstationary
(SPM99 onwards)(SPM99 onwards)
- Twice differentiable- Twice differentiable
Smoothness

• smoothness » voxel size
• lattice approximation
• smoothness estimation

Practically: FWHM ≥ 3 × VoxDim; otherwise conservative.

Typical applied smoothing:
  Single subject fMRI: 6mm; PET: 12mm
  Multi subject fMRI: 8-12mm; PET: 16mm
Overview

1. Terminology
2. Theory
3. Imaging Data
4. Levels of Inference
5. SPM Results
Cluster and Set-level Inference

• We can increase sensitivity by trading off anatomical specificity.

CLUSTER-LEVEL INFERENCE
• Given a voxel-level threshold u, we can compute the likelihood (under the null hypothesis) of getting n or more connected components in the excursion set, i.e. a cluster containing at least n voxels.

SET-LEVEL INFERENCE
• Similarly, we can compute the likelihood of getting c clusters each having at least n voxels.

Weak vs strong control over FWE.
Levels of inferenceLevels of inferenceLevels of inferenceLevels of inference
set-levelset-levelP(c P(c 3 | n 3 | n 12, u 12, u 3.09) = 3.09) =
0.0190.019
cluster-levelcluster-levelP(c P(c 1 | n 1 | n 82, t 82, t 3.09) = 0.029 (corrected) 3.09) = 0.029 (corrected)
n=82n=82
n=32n=32
n=1n=122
voxel-levelvoxel-levelP(c P(c 1 | n > 0, t 1 | n > 0, t 4.37) = 0.048 (corrected) 4.37) = 0.048 (corrected)
At least onecluster withunspecifiednumber of voxels abovethreshold
At least one cluster with at least 82 voxels above threshold
At least 3 clusters abovethreshold
Overview

1. Terminology
2. Theory
3. Imaging Data
4. Levels of Inference
5. SPM Results
SPM99 results I

[Figure: activations significant at cluster level but not at voxel level]

SPM99 results II

[Figure: activations significant at voxel and cluster level]

SPM results...
False Discovery Rate

                     ACTION
TRUTH           Don't Reject   Reject
H True  (o)         TN           FP
H False (x)         FN           TP

[Figure: t-scores from regions that truly do and do not activate (o = null, x = active), with two candidate thresholds u1 < u2]

At u1: FDR = 3/13 = 23%, α = 3/10 = 30%
At u2: FDR = 1/8 = 13%,  α = 1/10 = 10%

FDR = FP/(FP+TP)
α   = FP/(FP+TN)
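The distinction between FDR and α comes down to which column of the 2x2 table you condition on. A minimal sketch, using the counts from the slide's example at threshold u1:

```python
def fdr(fp, tp):
    # FDR = FP / (FP + TP): proportion of voxels declared
    # active that are actually null.
    return fp / (fp + tp)

def fp_rate(fp, tn):
    # alpha = FP / (FP + TN): proportion of truly null
    # voxels declared active.
    return fp / (fp + tn)

# At u1: 3 null voxels and 10 active voxels cross threshold,
# 7 null voxels stay below it.
fdr_u1 = fdr(3, 10)       # 3/13 ~ 23%
alpha_u1 = fp_rate(3, 7)  # 3/10 = 30%
```

FDR divides by the number of rejections, α by the number of true nulls; controlling one does not control the other.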
False Discovery Rate – Illustration

[Figure: Signal, Signal+Noise, and Noise images.
Control of familywise error rate at 10% — occurrence of a familywise error.
Control of false discovery rate at 10% — percentage of activated pixels that are false positives: 6.7%, 10.4%, 14.9%, 9.3%, 16.2%, 13.8%, 14.0%, 10.5%, 12.2%, 8.7%]
Summary

• We should correct for multiple comparisons.
• We can use Random Field Theory (RFT).
• RFT requires (i) a good lattice approximation to the underlying multivariate Gaussian fields, and (ii) that these fields are continuous with a twice-differentiable correlation function.
• To a first approximation, RFT is a Bonferroni correction using RESELS.
• We only need to correct for the volume of interest.
• Depending on the nature of the signal, we can trade off anatomical specificity for sensitivity with the use of cluster-level inference.