Statistical Inference and Random Field Theory
Will Penny
SPM short course, London, May 2003

TRANSCRIPT
M. Brett et al., Introduction to Random Field Theory. To appear in HBF, 2nd Edition.
[Figure: SPM analysis pipeline — image data → realignment & motion correction → smoothing (kernel) → normalisation (anatomical reference) → General Linear Model (design matrix, model fitting, parameter estimates) → statistic image / Statistical Parametric Map → Random Field Theory → corrected p-values]
Overview

1. Terminology
2. Theory
3. Imaging Data
4. Levels of Inference
5. SPM Results + FDR?
Inference at a single voxel

NULL hypothesis, H: activation is zero

p-value = p(t > u | H)

[Figure: t-distribution with threshold u = 2]

p-value: the probability of getting a value of t at least as extreme as u. If the p-value is small we reject the null hypothesis.
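As a minimal sketch, the single-voxel p-value above can be computed from the survival function of the t-distribution. The degrees of freedom (40) are a hypothetical choice for illustration; they are not given on the slide.

```python
from scipy.stats import t

# p = P(T > u | H): probability of a t-value at least as extreme
# as the observed threshold u, under the null hypothesis H of
# zero activation.
def voxel_p_value(u, dof):
    return t.sf(u, dof)  # survival function, 1 - CDF

# Slide's threshold u = 2; degrees of freedom assumed to be 40.
p = voxel_p_value(2.0, 40)
```

With these numbers the one-sided p-value is roughly 0.026, so at a 0.05 level the voxel null hypothesis would be rejected.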
Sensitivity and Specificity

                     ACTION
TRUTH           Don't Reject   Reject
H True  (o)         TN           FP
H False (x)         FN           TP

[Figure: t-scores from regions that truly do and do not activate (o = null, x = active), ordered along the axis with two candidate thresholds u1 < u2]

At u1: Sens = 10/10 = 100%, Spec = 7/10 = 70%
At u2: Sens = 7/10 = 70%,  Spec = 9/10 = 90%

Sensitivity = TP/(TP+FN) = power
Specificity = TN/(TN+FP) = 1 - α
FP = Type I error or 'α error'; α = p-value / FP rate / error rate / significance level
FN = Type II error
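The definitions above translate directly into code. This sketch plugs in the counts read off the slide's example at threshold u1 (10 truly active voxels, 10 null voxels).

```python
def sensitivity(tp, fn):
    # Sensitivity (power) = TP / (TP + FN)
    return tp / (tp + fn)

def specificity(tn, fp):
    # Specificity = TN / (TN + FP) = 1 - alpha
    return tn / (tn + fp)

# At u1 all 10 truly active voxels cross threshold (TP=10, FN=0)
# and 3 of the 10 null voxels cross it too (TN=7, FP=3):
sens_u1 = sensitivity(10, 0)  # 1.0 -> 100%
spec_u1 = specificity(7, 3)   # 0.7 -> 70%
```

Raising the threshold to u2 trades sensitivity (70%) for specificity (90%), as the slide shows.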
Inference at a single voxel

NULL hypothesis, H: activation is zero

p-value = p(t > u | H)

[Figure: t-distribution with threshold u = 2]

We can choose u to ensure a voxel-wise significance level of α. This is called an 'uncorrected' p-value, for reasons we'll see later. We can then plot a map of above-threshold voxels.
Inference for Images

[Figure: ten simulated images of Signal, Signal+Noise, and Noise, thresholded at an 'uncorrected' p-value of α = 0.1. Percentage of null pixels that are false positives in each noise image: 11.3%, 11.3%, 12.5%, 10.8%, 11.5%, 10.0%, 10.7%, 11.2%, 10.2%, 9.5%]

Using an 'uncorrected' p-value of 0.1 will lead us to conclude, on average, that 10% of voxels are active when they are not. This is clearly undesirable. To correct for this we can define a null hypothesis for images of statistics.
Family-wise Null Hypothesis

• Family of hypotheses H_k, k = 1, …, K
  – H = H_1 ∧ H_2 ∧ … ∧ H_k ∧ … ∧ H_K

FAMILY-WISE NULL HYPOTHESIS: activation is zero everywhere.

If we reject a voxel null hypothesis at any voxel, we reject the family-wise null hypothesis. A false positive anywhere gives a Family-Wise Error (FWE).

Family-wise error rate = 'corrected' p-value
[Figure: thresholded noise images comparing use of an 'uncorrected' p-value of α = 0.1 with a 'corrected' (FWE) p-value of α = 0.1]
The Bonferroni correction

Given a family of N independent voxels and a voxel-wise error rate v, the family-wise error rate (FWE), or 'corrected' error rate, is

    α = 1 - (1 - v)^N
      ~ Nv

Therefore, to ensure a particular FWE we choose

    v = α / N

A Bonferroni correction is appropriate for independent tests: a correction for multiple comparisons.

If v = 0.05 then over 100 voxels we'll get 5 voxel-wise Type I errors on average, but a much higher α. To ensure α = 0.05 we need v = 0.0005!
The Bonferroni correction

[Figure: independent voxels vs spatially correlated voxels]

Bonferroni is too conservative for brain images.
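The arithmetic behind the Bonferroni slide can be sketched in a few lines, reproducing the numbers quoted above.

```python
def fwe_rate(v, n):
    # Exact family-wise error for N independent tests at
    # voxel-wise rate v: alpha = 1 - (1 - v)^N, approx N*v for small v.
    return 1 - (1 - v) ** n

def bonferroni_rate(alpha, n):
    # Voxel-wise rate guaranteeing FWE <= alpha.
    return alpha / n

# v = 0.05 over 100 independent voxels gives a near-certain
# family-wise error; holding FWE at 0.05 needs v = 0.0005.
fwe_at_005 = fwe_rate(0.05, 100)       # ~0.994
v_needed = bonferroni_rate(0.05, 100)  # 0.0005
```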
Overview

1. Terminology
2. Theory
3. Imaging Data
4. Levels of Inference
5. SPM Results
Random Field Theory

• Consider a statistic image as a lattice representation of a continuous random field
• Use results from continuous random field theory

[Figure: lattice representation]
Euler Characteristic (EC)

• Topological measure
  – threshold an image at u
  – excursion set A_u
  – χ(A_u) = # blobs - # holes
  – at high u, χ(A_u) = # blobs

• Reject H_Ω if the Euler characteristic is non-zero:

    α = Pr(χ(A_u) > 0)

• Expected Euler characteristic ≈ p-value (at high u):

    α ≈ E[χ(A_u)]
Example – 2D Gaussian images

    α = R (4 ln 2) (2π)^(-3/2) u exp(-u²/2)

Voxel-wise threshold: u
Number of Resolution Elements (RESELS): R
N = 100x100 voxels, smoothness FWHM = 10, gives R = 10x10 = 100

For R = 100 and α = 0.05, RFT gives u = 3.8. Using R = 100 in a Bonferroni correction gives u = 3.3.

Friston et al. (1991) J. Cer. Bl. Fl. M.
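The two thresholds quoted above can be reproduced numerically: solve the slide's expected-EC formula for u, and compare with a Bonferroni correction over R "tests". This is a sketch of the calculation, not SPM's implementation.

```python
from math import log, pi, exp
from scipy.optimize import brentq
from scipy.stats import norm

# Expected EC for a 2D Gaussian field (the slide's formula):
# alpha = R (4 ln 2) (2*pi)^(-3/2) u exp(-u^2/2)
def expected_ec_2d(u, resels):
    return resels * 4 * log(2) * (2 * pi) ** -1.5 * u * exp(-u ** 2 / 2)

R, alpha = 100, 0.05
# RFT threshold: solve expected EC = alpha for u (root bracketed in [2, 6]).
u_rft = brentq(lambda u: expected_ec_2d(u, R) - alpha, 2.0, 6.0)  # ~3.8
# Bonferroni over R resels, for comparison.
u_bonf = norm.isf(alpha / R)                                      # ~3.3
```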
Developments

Friston et al. (1991) J. Cer. Bl. Fl. M. – 2D Gaussian fields (not the EC method)
Worsley et al. (1992) J. Cer. Bl. Fl. M. – 3D Gaussian fields, 3D t-fields
Worsley et al. (1993) Quant. Brain. Func.
Restricted search regions

[Figure: a box with 16 surface markers and a frame with 32 markers; the box and frame have the same number of voxels]
Unified Theory

• General form for the expected Euler characteristic
• χ², F, and t fields
• restricted search regions

    α = Σ_d R_d(Ω) ρ_d(u)

R_d(Ω): RESEL count; depends on the search region – how big, how smooth, what shape?
ρ_d(u): EC density; depends on the type of field (e.g. Gaussian, t) and the threshold, u.

[Figure: excursion set A_u]

Worsley et al. (1996), HBM
Unified Theory

R_d(Ω): RESEL count
  R_0(Ω) = χ(Ω), the Euler characteristic of Ω
  R_1(Ω) = resel diameter
  R_2(Ω) = resel surface area
  R_3(Ω) = resel volume

ρ_d(u): d-dimensional EC density – e.g. for a Gaussian RF:
  ρ_0(u) = 1 - Φ(u)
  ρ_1(u) = (4 ln 2)^(1/2) exp(-u²/2) / (2π)
  ρ_2(u) = (4 ln 2) u exp(-u²/2) / (2π)^(3/2)
  ρ_3(u) = (4 ln 2)^(3/2) (u² - 1) exp(-u²/2) / (2π)²
  ρ_4(u) = (4 ln 2)² (u³ - 3u) exp(-u²/2) / (2π)^(5/2)

Worsley et al. (1996), HBM
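The tabulated EC densities and the unified formula can be sketched as functions; as a consistency check, the earlier 2D example is the special case with R_2 = 100 and all other resel counts zero.

```python
from math import log, pi, exp
from scipy.stats import norm

# EC densities rho_d(u) for a Gaussian random field, as tabulated above.
def rho(d, u):
    k = 4 * log(2)
    g = exp(-u ** 2 / 2)
    return [
        norm.sf(u),                                        # rho_0 = 1 - Phi(u)
        k ** 0.5 * g / (2 * pi),                           # rho_1
        k * u * g / (2 * pi) ** 1.5,                       # rho_2
        k ** 1.5 * (u ** 2 - 1) * g / (2 * pi) ** 2,       # rho_3
        k ** 2 * (u ** 3 - 3 * u) * g / (2 * pi) ** 2.5,   # rho_4
    ][d]

# Unified formula: alpha = sum_d R_d(Omega) * rho_d(u).
def expected_ec(resel_counts, u):
    return sum(r * rho(d, u) for d, r in enumerate(resel_counts))

# The earlier 2D example: R = [R_0, R_1, R_2] = [0, 0, 100] at u = 3.8.
a = expected_ec([0, 0, 100], 3.8)  # ~0.05
```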
Resel Counts for Brain Structures

[Table: resel counts for various brain structures at FWHM = 20mm]

(1) The threshold depends on the search volume. (2) Surface area makes a large contribution.
Overview

1. Terminology
2. Theory
3. Imaging Data
4. Levels of Inference
5. SPM Results
Functional Imaging Data

• The random fields are the component fields, Y = Xw + E, e = E/σ
• We can only estimate the component fields, using estimates of w and σ
• To apply RFT we need the RESEL count, which requires smoothness estimates
Component fields

[Figure: Y = Xw + E — data matrix (scans × voxels) = design matrix × parameters + errors (with variance σ²)]
Estimated component fields

[Figure: data matrix (scans × voxels) = design matrix × estimated parameters + residuals. The residuals yield parameter estimates, an estimated variance, and the estimated component fields — each row of the residual matrix is an estimated component field]

The roughness of the field is summarised by the variance-covariance matrix of the spatial partial derivatives of the errors, e_x, e_y, e_z:

    Λ = | var(e_x)       cov(e_x, e_y)  cov(e_x, e_z) |
        | cov(e_y, e_x)  var(e_y)       cov(e_y, e_z) |
        | cov(e_z, e_x)  cov(e_z, e_y)  var(e_z)      |
Smoothness Estimation

• Roughness: |Λ|
• Point Response Function: PRF

• Gaussian PRF with FWHM f_x, f_y, f_z:

    |Λ|^(1/2) = (4 ln 2)^(3/2) / (f_x f_y f_z)

• RESEL COUNT:

    R_3(Ω) = λ(Ω) / (f_x f_y f_z)

    α = R_3(Ω) (4 ln 2)^(3/2) (u² - 1) exp(-u²/2) / (2π)²

Approximate the peak of the covariance function with a Gaussian.
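A sketch of the resel-count arithmetic above: volume and FWHMs must share units (voxels here), and the example search volume is a hypothetical choice, not from the slide.

```python
from math import log, pi, exp

def resel_count_3d(volume, fx, fy, fz):
    # R_3(Omega) = lambda(Omega) / (fx * fy * fz)
    return volume / (fx * fy * fz)

def alpha_3d(u, r3):
    # alpha = R_3 (4 ln 2)^(3/2) (u^2 - 1) exp(-u^2/2) / (2 pi)^2
    return (r3 * (4 * log(2)) ** 1.5 * (u ** 2 - 1)
            * exp(-u ** 2 / 2) / (2 * pi) ** 2)

# Hypothetical search volume: 64x64x40 voxels, FWHM of 8 voxels per axis.
r3 = resel_count_3d(64 * 64 * 40, 8.0, 8.0, 8.0)  # 320 resels
```

Raising the threshold u lowers the corrected α, so a target α can be met by root-finding on alpha_3d, exactly as in the 2D example.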
RFT AssumptionsRFT AssumptionsRFT AssumptionsRFT Assumptions
• Model fit & assumptionsModel fit & assumptions– valid distributional resultsvalid distributional results
• Multivariate normalityMultivariate normality– of of componentcomponent images images
• Covariance function of Covariance function of componentcomponent images must be images must be
- Stationary- Stationary (pre SPM99)(pre SPM99)
- - Can be nonstationaryCan be nonstationary
(SPM99 onwards)(SPM99 onwards)
- Twice differentiable- Twice differentiable
• Model fit & assumptionsModel fit & assumptions– valid distributional resultsvalid distributional results
• Multivariate normalityMultivariate normality– of of componentcomponent images images
• Covariance function of Covariance function of componentcomponent images must be images must be
- Stationary- Stationary (pre SPM99)(pre SPM99)
- - Can be nonstationaryCan be nonstationary
(SPM99 onwards)(SPM99 onwards)
- Twice differentiable- Twice differentiable
Smoothness

• smoothness » voxel size
• lattice approximation
• smoothness estimation

Practically: FWHM ≥ 3 × VoxDim; otherwise conservative.

Typical applied smoothing:
  Single subject fMRI: 6mm; PET: 12mm
  Multi subject fMRI: 8-12mm; PET: 16mm
Overview

1. Terminology
2. Theory
3. Imaging Data
4. Levels of Inference
5. SPM Results
Cluster and Set-level Inference

• We can increase sensitivity by trading off anatomical specificity.

CLUSTER-LEVEL INFERENCE
• Given a voxel-level threshold u, we can compute the likelihood (under the null hypothesis) of getting n or more connected components in the excursion set, i.e. a cluster containing at least n voxels.

SET-LEVEL INFERENCE
• Similarly, we can compute the likelihood of getting c clusters each having at least n voxels.

Weak vs strong control over FWE.
Levels of inferenceLevels of inferenceLevels of inferenceLevels of inference
set-levelset-levelP(c P(c 3 | n 3 | n 12, u 12, u 3.09) = 3.09) =
0.0190.019
cluster-levelcluster-levelP(c P(c 1 | n 1 | n 82, t 82, t 3.09) = 0.029 (corrected) 3.09) = 0.029 (corrected)
n=82n=82
n=32n=32
n=1n=122
voxel-levelvoxel-levelP(c P(c 1 | n > 0, t 1 | n > 0, t 4.37) = 0.048 (corrected) 4.37) = 0.048 (corrected)
At least onecluster withunspecifiednumber of voxels abovethreshold
At least one cluster with at least 82 voxels above threshold
At least 3 clusters abovethreshold
Overview

1. Terminology
2. Theory
3. Imaging Data
4. Levels of Inference
5. SPM Results
SPM99 results I

[Figure: activations significant at cluster level but not at voxel level]

SPM99 results II

[Figure: activations significant at voxel and cluster level]

SPM results...
False Discovery Rate

                     ACTION
TRUTH           Don't Reject   Reject
H True  (o)         TN           FP
H False (x)         FN           TP

[Figure: t-scores from regions that truly do and do not activate (o = null, x = active), with two candidate thresholds u1 < u2]

At u1: FDR = 3/13 = 23%, α = 3/10 = 30%
At u2: FDR = 1/8 = 13%,  α = 1/10 = 10%

FDR = FP/(FP+TP)
α   = FP/(FP+TN)
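The distinction between FDR and α comes down to which column of the 2x2 table you condition on. A minimal sketch, using the counts from the slide's example at threshold u1:

```python
def fdr(fp, tp):
    # FDR = FP / (FP + TP): proportion of voxels declared
    # active that are actually null.
    return fp / (fp + tp)

def fp_rate(fp, tn):
    # alpha = FP / (FP + TN): proportion of truly null
    # voxels declared active.
    return fp / (fp + tn)

# At u1: 3 null voxels and 10 active voxels cross threshold,
# 7 null voxels stay below it.
fdr_u1 = fdr(3, 10)       # 3/13 ~ 23%
alpha_u1 = fp_rate(3, 7)  # 3/10 = 30%
```

FDR divides by the number of rejections, α by the number of true nulls; controlling one does not control the other.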
False Discovery Rate – Illustration

[Figure: Signal, Signal+Noise, and Noise images.
Control of familywise error rate at 10% — occurrence of a familywise error.
Control of false discovery rate at 10% — percentage of activated pixels that are false positives: 6.7%, 10.4%, 14.9%, 9.3%, 16.2%, 13.8%, 14.0%, 10.5%, 12.2%, 8.7%]
Summary

• We should correct for multiple comparisons.
• We can use Random Field Theory (RFT).
• RFT requires (i) a good lattice approximation to the underlying multivariate Gaussian fields, and (ii) that these fields are continuous with a twice-differentiable correlation function.
• To a first approximation, RFT is a Bonferroni correction using RESELS.
• We only need to correct for the volume of interest.
• Depending on the nature of the signal, we can trade off anatomical specificity for sensitivity with the use of cluster-level inference.