2014 eugm - blinded adaptations, permutations tests and t tests

Blinded Adaptations, Permutation Tests & T-Tests

Michael Proschan (NIAID)

Introduction

• Joint work with Ekkehard Glimm and Martin Posch 2014, Stat. in Med. online

• See also Posch & Proschan 2012, Stat. in Med. 31, 4146-4153

Introduction

• Clinical trials are pre-meditated!

• We pre-specify everything

– Superiority/noninferiority

– Population (inclusion/exclusion criteria)

– Primary endpoint

– Secondary endpoints

– Analysis methods

– Sample size/power

Introduction

• Changes made after seeing data are rightly questioned: are investigators trying to get an unfair advantage?

– Changing primary endpoint because another endpoint has a bigger treatment effect

– Increasing sample size because the p-value is close

– Changing primary analysis because “assumptions are violated”

– Changing population because of promising subgroup results

Introduction

• What’s the harm? 0.05 is arbitrary anyway

• Problem: if unlimited freedom to change anything, the real error rate could be huge

• Reminiscent of Bible code controversy

– Clairvoyant messages such as “Bin Laden” and “twin towers” by skipping letters in Old Testament

– Similar messages can be found by skipping letters in any large book (Brendan McKay)

Introduction

• But changes made before unblinding are different

• Under strong null hypothesis that treatment has NO effect, blinded data give no info about treatment effect

– Impossible to cheat even if it seems like cheating

• E.g., even if blinded data show bimodal distribution, it is not caused by treatment if strong null is true

Permutation Tests

• Permutation tests condition on all data other than treatment labels

• Under strong null, (D,Z) are independent, where Z are ±1 treatment indicators & D are data

– Observed data D would have been observed regardless of the treatment given

– It is as if we observed D FIRST, then made the treatment assignments Z

Permutation Tests

• Peeking at data changes nothing because permutation tests already condition on D

• Conditional distribution of test statistic T(Z,Y) given D is that of T(Z,y) where y is fixed

• Distribution of Z depends on randomization method

– Simple

– Permuted block, etc.

Block              1          2          3          4
Assignment      T T C C    C T C T    C C T T    C T T C
Data            4 8 4 0    1 3 0 4    4 0 2 5    0 2 1 0
T-C               4.0        3.0        1.5        1.5

Overall T-C: 2.5

Permutation Tests

Block              1          2          3          4
Assignment      T C C T    C T C T    T T C C    C T C T
Data            4 8 4 0    1 3 0 4    4 0 2 5    0 2 1 0
T-C              -4.0        3.0       -1.5        0.5

Overall T-C: -0.5
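The rerandomization idea can be sketched in Python with the 16 data values and the first treatment assignment from the slides. This is only an illustration: it assumes permuted blocks of size 4 with two T and two C per block (inferred from the per-block T-C values), and the function names are made up for this sketch.

```python
from itertools import combinations, product

# Data and first assignment as shown on the slides.
data = [4, 8, 4, 0, 1, 3, 0, 4, 4, 0, 2, 5, 0, 2, 1, 0]
observed = "TTCC" + "CTCT" + "CCTT" + "CTTC"
BLOCK = 4  # assumed block size


def t_minus_c(labels, y):
    """Difference between the treatment mean and the control mean."""
    t = [v for lab, v in zip(labels, y) if lab == "T"]
    c = [v for lab, v in zip(labels, y) if lab == "C"]
    return sum(t) / len(t) - sum(c) / len(c)


def block_layouts(size=BLOCK):
    """All ways to place size/2 T's and size/2 C's within one block."""
    layouts = []
    for pos in combinations(range(size), size // 2):
        layouts.append("".join("T" if i in pos else "C" for i in range(size)))
    return layouts


def rerandomization_distribution(y, size=BLOCK):
    """T-C for every assignment the permuted-block scheme could produce."""
    n_blocks = len(y) // size
    return [
        t_minus_c("".join(blocks), y)
        for blocks in product(block_layouts(size), repeat=n_blocks)
    ]


obs_stat = t_minus_c(observed, data)
dist = rerandomization_distribution(data)
# One-sided p-value: share of rerandomizations at least as large as observed.
p_one_sided = sum(s >= obs_stat for s in dist) / len(dist)
print(obs_stat, len(dist), p_one_sided)
```

The data stay fixed throughout; only the labels are re-drawn, which is why looking at the blinded data first does not change the reference distribution.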

Permutation Tests

[Figure: Rerandomization Distribution. Frequency (y-axis, 0 to 100) plotted against T-C Mean (x-axis, -3 to 3).]

Permutation Distribution

Blinded 2-Stage Procedures

• Blinded 2-stage adaptive procedures use 1st stage to make design changes

– Sample size (Gould, 1992, Stat. in Med. 11, 55-66; Gould & Shih, 1992 Commun. in Stat. 21, 2833-2853)

– Primary endpoint (e.g., diastolic versus systolic blood pressure)

• Previous argument shows that if adaptation is made before unblinding, a permutation test on 1st stage data is still valid


• Careful! Subtle errors are possible

• E.g., in adaptive regression, which of the following is (are) valid?

1. From ANCOVAs $Y = \beta_0 1 + \beta z + \beta_i x_i$, $i = 1,\ldots,k$, pick $x_i$ that minimizes MSE; do permutation test on winner

2. From ANCOVAs $Y = \beta_0 1 + \beta_i x_i$, $i = 1,\ldots,k$, pick $x_i$ that minimizes MSE; do permutation test on $Y = \beta_0 1 + \beta z + \beta^* x^*$, where $x^*$ is winner


• Unblinding and apparent α-inflation also possible if strong null is false

• E.g., change primary endpoint based on “blinded” data (X,Y1,Y2), where Y1 and Y2 are potential primaries and X=level of study drug in blood

– X completely unblinds

– Can then pick Y1 or Y2 with biggest z-score

– Clearly inflates α

– Problem: strong null requires no effect on ANY variable examined (including X=level of study drug)


• Claim: the following procedure is valid

– After viewing 1st stage data D1, choose test statistic T1(Y1,Z1) and second stage data to collect

– After observing D2, choose T2(Y2,Z2) and method of combining T1 and T2, f(T1,T2)

– Conditional distribution of f(T1,T2) given (D1,D2) is its stratified permutation distribution

– Stratified permutation test controls conditional, & therefore unconditional type I error rate

Focus of Rest of Talk

• Permutation tests are asymptotically equivalent to t-tests

• Suggests that adaptive t-tests might be valid if adaptive permutation tests are

• We consider connections between permutation and t-tests, and validity of adaptive t-tests from adaptive permutation tests

One-Sample Case

• Community randomized trials sometimes pair match & randomize within pairs

• E.g., COMMIT trial used community intervention to help people quit smoking—11 matched pairs

• D=difference in quit rates between treatment (T) & control (C)

            T       C       D=T-C
Pair i     0.30    0.25    +0.05

            C       T       D=T-C
Pair i     0.30    0.25    -0.05

One-Sample Case

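• Permuting labels changes only the sign of D

• Permutation test conditions on $|D_i| = d_i^+$; $-d_i^+$ and $+d_i^+$ are equally likely

• The permutation distribution of $\sum D_i$ is the distribution of

$$\sum_i Z_i d_i^+, \quad \text{where } Z_i = \begin{cases} -1 & \text{w.p. } 1/2 \\ +1 & \text{w.p. } 1/2 \end{cases}$$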

One-Sample Case

• In 1st stage, adapt based on $|D_1|,\ldots,|D_n|$ (blinded)

– E.g., increase stage 2 sample size because the $|D_i|$ are very large

• What is conditional distribution of 1st stage sum $\sum D_i$ given $|D_1| = d_1^+,\ldots,|D_n| = d_n^+$ and the adaptation?

– The adaptation is a function of $|D_1|,\ldots,|D_n|$

– The null distribution of $\sum D_i$ given $|D_1| = d_1^+,\ldots,|D_n| = d_n^+$ IS its permutation distribution

– Conclusion: permutation test on stage 1 data still valid
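As a rough illustration of the one-sample permutation test, the Python sketch below enumerates all sign flips of the absolute differences. The paired differences are hypothetical (they are not COMMIT data), and `sign_flip_pvalue` is a name chosen for this example only.

```python
from itertools import product


def sign_flip_pvalue(d):
    """One-sided exact p-value for sum(d) under the sign-flip distribution.

    Conditions on the absolute values |d_i|; each sign is +/-1 w.p. 1/2.
    """
    abs_d = [abs(x) for x in d]
    observed = sum(d)
    total = 0
    extreme = 0
    for signs in product((-1, 1), repeat=len(d)):
        stat = sum(z * a for z, a in zip(signs, abs_d))
        total += 1
        # Small tolerance so floating-point ties count as "at least as large".
        if stat >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Hypothetical paired differences in quit rates (treatment minus control).
d = [0.05, 0.02, -0.01, 0.04, 0.03, -0.02, 0.06, 0.01]
print(sign_flip_pvalue(d))
```

The reference distribution is built from the absolute values alone, so any adaptation that is a function of the $|D_i|$ leaves it unchanged.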

One-Sample Case

• Mean and variance of permutation distribution are

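$$E\Big(\sum_i Z_i d_i^+\Big) = \sum_i d_i^+ E(Z_i) = 0$$

$$\operatorname{var}\Big(\sum_i Z_i d_i^+\Big) = \sum_i (d_i^+)^2 E(Z_i^2) = \sum_i d_i^2$$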
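The mean of 0 and variance of $\sum d_i^2$ can be checked by brute force for a small example. The sketch below uses exact rational arithmetic; the input values are hypothetical and the function name is illustrative.

```python
from fractions import Fraction
from itertools import product


def sign_flip_moments(abs_d):
    """Exact mean and variance of sum(Z_i * d_i) over all 2^n sign vectors."""
    sums = [
        sum(z * a for z, a in zip(signs, abs_d))
        for signs in product((-1, 1), repeat=len(abs_d))
    ]
    count = len(sums)
    mean = Fraction(sum(sums)) / count
    var = Fraction(sum(s * s for s in sums)) / count - mean ** 2
    return mean, var


# Hypothetical absolute differences, stored as exact fractions.
abs_d = [Fraction(5, 100), Fraction(2, 100), Fraction(1, 100), Fraction(4, 100)]
mean, var = sign_flip_moments(abs_d)
print(mean, var, sum(a * a for a in abs_d))
```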

One-Sample Case

• Asymptotically, permutation distribution is normal with this mean and variance (Lindeberg-Feller CLT)

• I.e., conditional distribution of $\sum D_i$ given $|D_1| = d_1^+,\ldots,|D_n| = d_n^+$ is asymptotically $N(0, \sum d_i^2)$

• Depends on $|D_1| = d_1^+,\ldots,|D_n| = d_n^+$ only through $L^2 = \sum d_i^2$

One-Sample Case

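• Asymptotically, permutation distribution of

$$T' = \frac{\sum D_i}{\sqrt{\sum D_i^2}} \;\text{ is that of }\; \frac{N(0, \sum d_i^2)}{\sqrt{\sum d_i^2}} = N(0,1)$$

• Like t-test with variance estimate $s_0^2$ instead of usual sample variance $s^2$

$$T' = \frac{\sum D_i}{\sqrt{n s_0^2}}; \quad s_0^2 = (1/n)\sum D_i^2 = L^2/n$$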

One-Sample Case

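• Recap: Permutation distribution of T’ is dist of

$$T' = \frac{\sum D_i}{\sqrt{\sum D_i^2}} \text{ given } |D_1|,\ldots,|D_n|$$

$$T' = \frac{\sum D_i}{\sqrt{\sum D_i^2}} \text{ given } L^2 = \sum D_i^2 \text{ is asymptotically } N(0,1), \text{ which doesn't depend on } L^2$$

• Conclusion: T’ is asymptotically indep of $L^2$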

One-Sample Case

• Raises the question: is this true for all sample sizes under normality assumption?

• If $D_i$ are iid $N(0, \sigma^2)$, then can $T' = \sum D_i / \sqrt{\sum D_i^2}$ be independent of $\sum D_i^2$?

• Seems crazy, but it’s true!

One-Sample Case

• One way to see that T’ is independent of $\sum D_i^2$ uses Basu’s theorem:

• Recall S is sufficient for θ if F(y|s) does not depend on θ; it is complete if E{g(S)}=0 for all θ implies g(S)≡0 with probability 1

• A is ancillary if its distribution does not depend on θ

• Basu, 1955, Sankhya 15, 377-380: If S is a complete, sufficient statistic and A is ancillary, then S and A are independent

One-Sample Case

• Consider $D_i$ iid $N(0, \sigma^2)$ with $\sigma^2$ unknown

– $\sum D_i^2$ is complete and sufficient

– $T' = \sum D_i / (\sum D_i^2)^{1/2}$ is ancillary because it is scale-invariant

– By Basu’s theorem, T’ and $\sum D_i^2$ are independent
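A quick simulation can serve as a sanity check on this independence claim. The sketch below is not a proof: zero correlation does not imply independence, and the sample size, σ, seed, and number of replications are arbitrary choices made for illustration.

```python
import math
import random


def simulate(n, sigma, reps, rng):
    """Draw reps samples of D_1..D_n iid N(0, sigma^2); return T' and sum D_i^2."""
    t_primes, sums_sq = [], []
    for _ in range(reps):
        d = [rng.gauss(0.0, sigma) for _ in range(n)]
        ss = sum(x * x for x in d)
        t_primes.append(sum(d) / math.sqrt(ss))
        sums_sq.append(ss)
    return t_primes, sums_sq


def correlation(x, y):
    """Plain Pearson correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(2014)  # fixed seed so the run is reproducible
t_primes, sums_sq = simulate(n=5, sigma=2.0, reps=20000, rng=rng)

# Both correlations should be near zero if T' and sum D_i^2 are independent.
corr_signed = correlation(t_primes, sums_sq)
corr_abs = correlation([abs(t) for t in t_primes], sums_sq)
print(corr_signed, corr_abs)
```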

One-Sample Case

• Same argument shows that the usual t-statistic is independent of $\sum D_i^2$

• Under $D_i$ iid $N(0, \sigma^2)$ with $\sigma^2$ unknown

– $\sum D_i^2$ is complete and sufficient

– Usual t-statistic $T = \sum D_i / (n s^2)^{1/2}$ is ancillary

– By Basu’s theorem, T and $\sum D_i^2$ are independent

(Shao (2003): Mathematical Statistics, Springer)
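A short algebraic check, added here as a supplement, links the two statistics directly. With $\bar D = \sum D_i / n$ and the usual sample variance $s^2$:

$$
s^2 = \frac{\sum D_i^2 - n\bar D^2}{n-1}, \qquad T'^2 = \frac{n^2 \bar D^2}{\sum D_i^2}
\;\Longrightarrow\;
T = \frac{\sum D_i}{\sqrt{n s^2}} = T'\sqrt{\frac{n-1}{n - T'^2}}
$$

So T is a function of T’ alone, which gives a second route to the same conclusion: if T’ is independent of $\sum D_i^2$, then so is T.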

One-Sample Case

• This result is important for adaptive sample size calculations

– Stage 1 with $n_1$ = half of original sample size: change second stage sample size to $n_2 = n_2(\sum D_i^2)$

– Conditioned on $\sum D_i^2$:

• Test statistic $T_1$ has exact t-distribution with $n_1 - 1$ d.f.

• Test statistic $T_2$ has exact t-distribution with $n_2 - 1$ d.f. and is independent of $T_1$

• P-values $P_1$ and $P_2$ are independent U(0,1)

• $Y = \{n_1^{1/2}\Phi^{-1}(P_1) + n_2^{1/2}\Phi^{-1}(P_2)\}/(n_1 + n_2)^{1/2}$ is N(0,1) under $H_0$

One-Sample Case

• Reject if $Y > z_\alpha$

• Conditioned on $\sum D_i^2$, type I error rate is α

• Unconditional type I error rate is α as well

• Most other two-stage procedures are only approximate
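The combination step can be sketched in Python as follows. Two things here are assumptions rather than part of the talk: the sample-size rule `stage2_size` is a hypothetical example of a rule that depends on stage 1 only through $\sum D_i^2$, and the sketch uses $\Phi^{-1}(1 - P)$ so that small p-values give large Y (the slide writes $\Phi^{-1}(P)$; either form is N(0,1) under $H_0$).

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def stage2_size(sum_sq_stage1, n1, delta, alpha=0.025, power=0.9,
                n_min=2, n_max=500):
    """Hypothetical blinded rule: n2 depends on stage 1 only via sum D_i^2."""
    s0_sq = sum_sq_stage1 / n1  # blinded variance estimate s_0^2
    z_sum = STD_NORMAL.inv_cdf(1 - alpha) + STD_NORMAL.inv_cdf(power)
    n_total = math.ceil(z_sum ** 2 * s0_sq / delta ** 2)
    return max(n_min, min(n_max, n_total - n1))


def combined_z(p1, p2, n1, n2):
    """Weighted inverse-normal combination of two one-sided p-values.

    Requires 0 < p < 1; inv_cdf raises an error at exactly 0 or 1.
    """
    z1 = STD_NORMAL.inv_cdf(1 - p1)  # assumed convention: small p -> large z
    z2 = STD_NORMAL.inv_cdf(1 - p2)
    return (math.sqrt(n1) * z1 + math.sqrt(n2) * z2) / math.sqrt(n1 + n2)


def reject(p1, p2, n1, n2, alpha=0.025):
    """Reject H0 when the combined statistic exceeds z_alpha."""
    return combined_z(p1, p2, n1, n2) > STD_NORMAL.inv_cdf(1 - alpha)


# Illustrative numbers only.
n1 = 10
n2 = stage2_size(sum_sq_stage1=40.0, n1=n1, delta=1.0)
print(n2, combined_z(0.04, 0.03, n1, n2), reject(0.04, 0.03, n1, n2))
```

The stage p-values themselves would come from the stage-wise t-tests with $n_1 - 1$ and $n_2 - 1$ d.f.; they are taken as inputs here to keep the sketch free of third-party dependencies.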

One-Sample Case

• Could even make other adaptations like changing primary endpoint

• Look at $\sum D_i^2$ for each endpoint and determine which one is primary

– E.g., pick endpoint with smallest $\sum D_i^2$

• Slight generalization of our result shows that conditional distribution of T given adaptation is still exact t

One-Sample Case

• Shows that conditional type I error rate given adaptation is controlled at level α

• Unconditional type I error rate must also be controlled at level α

• Derivation assumes multivariate normality with variance/covariance not depending on mean

Two-Sample Case

• Can use same reasoning in 2-sample setting

• With equal sample sizes, the numerator is

$$\bar Y_T - \bar Y_C \propto \sum_i Z_i Y_i$$

• Permutation distribution is distribution of

$$\sum_i Z_i y_i, \quad \text{each } Z_i = \pm 1, \; \sum_i Z_i = 0$$

• Let $s_L^2$ be “lumped” variance of all data (treatment and control)

Two-Sample Case

• Mean and variance of permutation distribution are

$$E\Big(\sum_i Z_i y_i\Big) = \sum_i y_i E(Z_i) = 0$$

$$\operatorname{var}\Big(\sum_i Z_i y_i\Big) = \frac{n}{n-1}\sum_i (y_i - \bar y)^2 = n s_L^2$$

• Basu’s theorem shows usual 2-sample T is independent of $s_L^2$ under null hypothesis of common mean

• Conditional distribution of T given $s_L^2$ is still t
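For a small data set, the two-sample permutation moments can be checked by enumerating every balanced assignment. The sketch below assumes complete randomization with equal arms ($Z_i = \pm 1$, $\sum Z_i = 0$) and takes $s_L^2$ to be the lumped sample variance with divisor $n - 1$, in which case the variance of $\sum Z_i y_i$ works out to $n s_L^2$. The 16 data values from the earlier permutation example are reused purely for convenience.

```python
from fractions import Fraction
from itertools import combinations

y = [4, 8, 4, 0, 1, 3, 0, 4, 4, 0, 2, 5, 0, 2, 1, 0]


def balanced_permutation_moments(y):
    """Exact mean and variance of sum(Z_i * y_i) over all balanced assignments."""
    n = len(y)
    total = sum(y)
    sums = []
    for treated in combinations(range(n), n // 2):
        t_sum = sum(y[i] for i in treated)
        # Z_i = +1 for treated, -1 for control, so sum Z_i y_i = 2*t_sum - total.
        sums.append(2 * t_sum - total)
    count = len(sums)
    mean = Fraction(sum(sums), count)
    var = Fraction(sum(s * s for s in sums), count) - mean ** 2
    return mean, var


def lumped_variance(y):
    """Sample variance of all data pooled together, ignoring treatment labels."""
    n = len(y)
    ybar = Fraction(sum(y), n)
    return sum((v - ybar) ** 2 for v in y) / (n - 1)


mean, var = balanced_permutation_moments(y)
print(mean, var, len(y) * lumped_variance(y))
```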

Two-Sample Case

• Two-stage procedure

– Stage 1: look at lumped variance and change stage 2 sample size

– Conditioned on 1st stage lumped variance & $H_0$

• $T_1$ has t-distribution with $n_1 - 2$ d.f.

• $T_2$ has t-distribution with $n_2 - 2$ d.f. & independent of $T_1$

• P-values $P_1$ and $P_2$ are independent uniforms

• $\{n_1^{1/2}\Phi^{-1}(P_1) + n_2^{1/2}\Phi^{-1}(P_2)\}/(n_1 + n_2)^{1/2}$ is N(0,1) under $H_0$

– Controls type I error rate conditionally and unconditionally

Summary

• Permutation tests are often valid even in adaptive settings if blind is maintained

• There is a close connection between permutation tests and t-tests

• Can deduce validity of adaptive t-tests from validity of adaptive permutation tests
