Analysis of Stratified Trials – Challenging the “Standard” Methods
Devan V. Mehrotra
Clinical Biostatistics Department Seminar
Merck Research Laboratories
Jan 10, 2008
2
Outline
• Part I: binary response variable
> Mantel-Haenszel test
> Minimum risk weights
> Simulation results
> Conclusions
• Part II: continuous non-normal response variable
> Motivating example
> Technical details
> Simulation results
> Conclusions
3
Part I: Analysis of Binary Data
4
Stratified Trials with Binary Endpoints
• 2 treatments (A and B), number of strata = s
• Binary response (responder/non-responder)
• $p_{ij}$ = true (population) proportion for stratum i, treatment j
• $\delta_i = p_{iA} - p_{iB}$ = true difference for stratum i
• $f_i$ = true (population) relative frequency for stratum i
• $\delta = \sum_i f_i \delta_i$ = true overall difference
• $\hat p_{ij}$ = observed proportion for stratum i, treatment j; $n_{ij}$ = observed number of subjects in stratum i, treatment j
• $\hat\delta_i = \hat p_{iA} - \hat p_{iB}$; $w_i$ = weight assigned to stratum i
5
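Hypothesis Testing: General Framework
Superiority or Non-Inferiority Trials
$H_0: \delta = 0$ vs. $H_1: \delta > 0$ (superiority trial)
$H_0: \delta = -\delta_0$ vs. $H_1: \delta > -\delta_0$ (non-inferiority trial)
$$Z_w = \frac{\sum_i w_i \hat\delta_i - cc}{\sqrt{\sum_i w_i^2 a_i V(\hat\delta_i)}} \quad \text{(superiority trial)}$$
$$Z_w = \frac{\sum_i w_i (\hat\delta_i + \delta_0) - cc}{\sqrt{\sum_i w_i^2 a_i V(\hat\delta_i)}} \quad \text{(non-inferiority trial)}$$
$a_i$ = finite sample term, cc = continuity correction
IMPORTANT: What to use for $w_i$, $V(\hat\delta_i)$, $a_i$ and cc?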
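The general statistic can be written as one small function. The caller supplies the four ingredients the slide flags: $w_i$, $V(\hat\delta_i)$, $a_i$ and cc. This is a minimal sketch. The function and argument names are hypothetical, and it assumes the weights sum to 1.

```python
import math

def stratified_z(delta_hat, w, v, a, cc, delta0=0.0):
    """Z_w from the general framework.

    delta_hat: per-stratum observed differences p_iA - p_iB
    w: stratum weights (assumed to sum to 1)
    v: per-stratum variance estimates V(delta_hat_i)
    a: per-stratum finite sample terms a_i
    cc: continuity correction
    delta0: non-inferiority margin (0 gives the superiority statistic)
    """
    num = sum(wi * (di + delta0) for wi, di in zip(w, delta_hat)) - cc
    var = sum(wi ** 2 * ai * vi for wi, ai, vi in zip(w, a, v))
    return num / math.sqrt(var)

def upper_tail(z):
    """One-sided standard normal tail probability P(Z > z)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

z_sup = stratified_z([0.10, 0.20], [0.5, 0.5], [0.01, 0.01], [1, 1], cc=0.0)
z_ni = stratified_z([0.0, 0.0], [0.5, 0.5], [0.01, 0.01], [1, 1], cc=0.0, delta0=0.05)
print(round(z_sup, 3), round(upper_tail(z_sup), 4), round(z_ni, 3))
```

Because the design choices live entirely in the arguments, the MH test and the MR-weighted tests on the following slides are just different calls to the same function.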
6
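Mantel-Haenszel Test (1959): Superiority Trials
$$Z_{MH} = \frac{\left|\sum_{i=1}^{s} w_i^{CMH} \hat\delta_i\right| - cc}{\sqrt{\sum_{i=1}^{s} (w_i^{CMH})^2 a_i \tilde V_i}}$$
where
$$w_i^{CMH} = \frac{n_{iA} n_{iB}/(n_{iA} + n_{iB})}{\sum_{i=1}^{s} n_{iA} n_{iB}/(n_{iA} + n_{iB})}, \quad a_i = \frac{n_{iA} + n_{iB}}{n_{iA} + n_{iB} - 1}, \quad cc = \frac{0.5}{\sum_{i=1}^{s} n_{iA} n_{iB}/(n_{iA} + n_{iB})},$$
$$\tilde V_i = \tilde p_i (1 - \tilde p_i)\left(\frac{1}{n_{iA}} + \frac{1}{n_{iB}}\right), \quad \tilde p_i = \frac{n_{iA}\hat p_{iA} + n_{iB}\hat p_{iB}}{n_{iA} + n_{iB}}.$$
Note: MH test is optimal if and only if $\dfrac{p_{iA}/(1 - p_{iA})}{p_{iB}/(1 - p_{iB})}$ is constant across strata.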
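As a sketch (the function name is hypothetical), the MH ingredients above can be computed for the vaccine example on slide 13: stratum 1 is 37/48 vs. 44/68, and stratum 2 is 5/32 vs. 0/12. It reproduces the CMH weights (.763, .237) and the weighted difference (.131) tabulated there. The variance follows the pooled-proportion null variance exactly as written on this slide, so the resulting Z is not guaranteed to match the tabulated MH p-values digit for digit.

```python
import math

def mh_superiority(xA, nA, xB, nB):
    """Mantel-Haenszel test on the risk-difference scale, following the formulas above.

    xA, xB: responders per stratum; nA, nB: subjects per stratum.
    Returns (CMH weights, weighted difference, cc, Z_MH).
    """
    h = [a * b / (a + b) for a, b in zip(nA, nB)]      # n_iA n_iB / (n_iA + n_iB)
    w = [hi / sum(h) for hi in h]                       # CMH weights
    d = [xa / a - xb / b for xa, a, xb, b in zip(xA, nA, xB, nB)]
    est = sum(wi * di for wi, di in zip(w, d))
    cc = 0.5 / sum(h)                                   # original MH continuity correction
    var = 0.0
    for wi, xa, a, xb, b in zip(w, xA, nA, xB, nB):
        p_pool = (xa + xb) / (a + b)                    # pooled (null) proportion
        v_null = p_pool * (1 - p_pool) * (1 / a + 1 / b)
        a_i = (a + b) / (a + b - 1)                     # finite sample term
        var += wi ** 2 * a_i * v_null
    return w, est, cc, (abs(est) - cc) / math.sqrt(var)

w, est, cc, z = mh_superiority([37, 5], [48, 32], [44, 0], [68, 12])
print([round(x, 3) for x in w], round(est, 3), round(cc, 4))
```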
7
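Choice of Variance
• Null variance [Miettinen & Nurminen 1985, Farrington & Manning, 1990]:
$\tilde V_i = \dfrac{\tilde p_{iA}(1 - \tilde p_{iA})}{n_{iA}} + \dfrac{\tilde p_{iB}(1 - \tilde p_{iB})}{n_{iB}}$, where $\tilde p_{ij}$ = m.l.e. of $p_{ij}$ under the restriction $p_{iA} - p_{iB} = -\delta_0$ ($\delta_0 = 0$ for superiority)
Note: MH test uses the null variance.
• Observed (OBS) variance:
$\hat V_i = \dfrac{\hat p_{iA}(1 - \hat p_{iA})}{n_{iA}} + \dfrac{\hat p_{iB}(1 - \hat p_{iB})}{n_{iB}}$
• Note: With 1:1 randomization, $\hat V_i \le \tilde V_i$ always for superiority trials, and usually so (but not always) for non-inferiority trials.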
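Here is a short worked check of the superiority case. Assume 1:1 randomization with $n_{iA} = n_{iB} = m$. The restricted m.l.e. under $p_{iA} = p_{iB}$ is then the pooled proportion $\bar p_i = (\hat p_{iA} + \hat p_{iB})/2$, so $\tilde V_i = 2\bar p_i(1 - \bar p_i)/m$:

$$\tilde V_i - \hat V_i = \frac{2\bar p_i(1 - \bar p_i) - \hat p_{iA}(1 - \hat p_{iA}) - \hat p_{iB}(1 - \hat p_{iB})}{m} = \frac{\hat p_{iA}^2 + \hat p_{iB}^2 - 2\bar p_i^2}{m} = \frac{(\hat p_{iA} - \hat p_{iB})^2}{2m} \ge 0$$

The linear terms cancel because $2\bar p_i = \hat p_{iA} + \hat p_{iB}$. That cancellation is why the inequality holds for every superiority comparison. Under a non-inferiority restriction, $\tilde p_{iA} \ne \tilde p_{iB}$ and no such cancellation is guaranteed.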
8
(p_A, p_B) pairs where Null or Observed Variance is “Better”
Non-Inferiority Margin = 15%
Equal allocation; VR = variance ratio (null:obs): VR < 0.98 (N), 0.98 < VR < 1.02 (=), VR > 1.02 (O)
            P_B (Control)
P_A (Test)  0.50 0.55 0.60 0.65 0.70 0.75 0.80 0.85 0.90 0.95 1.00
1.00          O    O    O    O    O    O    O    O    O    O    O
0.95          O    O    O    O    O    O    O    O    O    O    O
0.90          O    O    O    O    O    O    O    O    O    O    O
0.85          O    O    O    O    O    O    O    O    O    O    =
0.80          O    O    O    O    O    O    O    O    O    =
0.75          O    O    O    =    =    =    =    =    =
0.70          O    =    =    =    =    =    =    =
0.65          =    =    =    =    =    =    =
0.60          =    =    =    =    =    =
0.55          =    N    =    =    =
0.50          N    =    =    =
0.45          =    =    =
0.40          =    =
0.35          =
(Shown: pairs with P_A − P_B ≥ −0.15)
9
(p_A, p_B) pairs where Null or Observed Variance is “Better”
Non-Inferiority Margin = 5%
Equal allocation; VR = variance ratio (null:obs): VR < 0.98 (N), 0.98 < VR < 1.02 (=), VR > 1.02 (O)
            P_B (Control)
P_A (Test)  0.50 0.55 0.60 0.65 0.70 0.75 0.80 0.85 0.90 0.95 1.00
1.00          O    O    O    O    O    O    O    O    O    O    O
0.95          O    O    O    O    O    O    O    O    O    O    =
0.90          O    O    O    O    O    O    O    O    O    =
0.85          O    O    O    O    O    O    O    =    =
0.80          O    O    O    O    O    =    =    =
0.75          O    O    O    =    =    =    =
0.70          O    O    =    =    =    =
0.65          O    =    =    =    =
0.60          =    =    =    =
0.55          =    =    =
0.50          =    =
0.45          =
(Shown: pairs with P_A − P_B ≥ −0.05)
10
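Choice of Weights
• Cochran-Mantel-Haenszel (CMH) weights:
$\hat w_i^{CMH} = \dfrac{n_{iA} n_{iB}/(n_{iA} + n_{iB})}{\sum_i n_{iA} n_{iB}/(n_{iA} + n_{iB})} \approx f_i$ if $n_{iA}/n_{iB}$ is constant
>> Estimator of $\delta$ is approximately unbiased.
• Minimum Risk (MR) weights [Mehrotra & Railkar, 2000]
>> Estimator of $\delta$ has smallest mean squared error.
>> If $\delta_i$ is constant, $\hat w_i^{MR} = \dfrac{\hat V_i^{-1}}{\sum_i \hat V_i^{-1}}$ (optimal weights!)
Formula for two strata trial:
$$\hat w_1^{MR} = \frac{\hat V_2 + f_1(\hat\delta_1 - \hat\delta_2)^2}{\hat V_1 + \hat V_2 + (\hat\delta_1 - \hat\delta_2)^2}, \qquad \hat w_2^{MR} = 1 - \hat w_1^{MR}$$
For general formula (> two strata), see Mehrotra & Railkar (2000).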
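The two-strata formula is easy to evaluate directly. The sketch below applies it to the vaccine data on slide 13 with observed variances. It assumes $f_1$ is estimated by stratum 1's share of all subjects (116 of 160), which is not stated on the slide; with that choice it gives the tabulated MR weights (.400, .600). Function names are hypothetical.

```python
def obs_var(x_a, n_a, x_b, n_b):
    """Observed (OBS) variance of p_hat_A - p_hat_B within one stratum."""
    pa, pb = x_a / n_a, x_b / n_b
    return pa * (1 - pa) / n_a + pb * (1 - pb) / n_b

def mr_weights_two_strata(d1, d2, v1, v2, f1):
    """Two-stratum minimum risk weights (w1, w2) from the formula above."""
    gap2 = (d1 - d2) ** 2
    w1 = (v2 + f1 * gap2) / (v1 + v2 + gap2)
    return w1, 1.0 - w1

# Slide 13 data; f1 = 116/160 is an assumption of this sketch
d1, d2 = 37 / 48 - 44 / 68, 5 / 32 - 0 / 12
v1, v2 = obs_var(37, 48, 44, 68), obs_var(5, 32, 0, 12)
w1, w2 = mr_weights_two_strata(d1, d2, v1, v2, f1=116 / 160)
print(round(w1, 3), round(w2, 3))
```

When the observed differences coincide, the squared gap is zero and the formula collapses to inverse-variance weights, matching the "$\delta_i$ constant" bullet.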
11
Choice of Finite Sample Term
• With CMH weights (i.e., with MH test): $a_i = \dfrac{n_{iA} + n_{iB}}{n_{iA} + n_{iB} - 1}$ is used.
• With MR weights: $a_i = 1$ is recommended.
12
Choice of Continuity Correction
• With CMH weights: $cc = 0.5 \big/ \sum_i \dfrac{n_{iA} n_{iB}}{n_{iA} + n_{iB}}$ is used by original MH test.
However, $cc = 0$ is a less conservative choice.
• With MR weights: $cc = \dfrac{3}{16} \Big/ \sum_i \dfrac{n_{iA} n_{iB}}{n_{iA} + n_{iB}}$ is recommended.
See Mehrotra & Railkar, Stats in Med, 2000
13
Motivating Example Revisited: Test for Superiority
Stratum | Vaccine A | Vaccine B | Diff. | OR | Null $\sqrt{\tilde V_i}$ | Obs $\sqrt{\hat V_i}$
1 | .771 (37/48) | .647 (44/68) | .124 | 1.835 | .086 | .084
2 | .156 (5/32) | .000 (0/12) | .156 | infinity | .091 | .064

Method | $w_1$ | $w_2$ | $\sum_i w_i \hat\delta_i$ | $\sqrt{\hat V(\sum_i w_i \hat\delta_i)}$ | 2-tailed p-value
MH (original cc) | .763 | .237 | .131 | .069 | .0882
MH (cc = 0) | .763 | .237 | .131 | .069 | .0573
MR (null variance) | .400 | .600 | .143 | .064 | .0318*
MR (obs variance) | .400 | .600 | .143 | .051 | .0068*
* establishes superiority at 2-tailed α = .05
MH = Mantel-Haenszel test; MR = test using minimum risk weights
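The MR (obs variance) row can be reproduced end to end with MR weights, observed variances, $a_i = 1$ and $cc = (3/16)/\sum_i n_{iA}n_{iB}/(n_{iA}+n_{iB})$. This is a sketch under one stated assumption: $f_1$ is estimated as stratum 1's share of subjects (116/160). Function names are hypothetical. It recovers the estimate (.143), the standard error (.051) and the 2-tailed p-value (.0068) to the table's precision.

```python
import math

def obs_var(x_a, n_a, x_b, n_b):
    """Observed variance of p_hat_A - p_hat_B within one stratum."""
    pa, pb = x_a / n_a, x_b / n_b
    return pa * (1 - pa) / n_a + pb * (1 - pb) / n_b

def mr_obs_superiority_two_strata(xA, nA, xB, nB):
    """MR_OBS superiority test for two strata; returns (weights, estimate, SE, 2-tailed p)."""
    d = [xa / a - xb / b for xa, a, xb, b in zip(xA, nA, xB, nB)]
    v = [obs_var(xa, a, xb, b) for xa, a, xb, b in zip(xA, nA, xB, nB)]
    n = [a + b for a, b in zip(nA, nB)]
    f1 = n[0] / sum(n)                  # assumption: f_1 = stratum 1's share of subjects
    gap2 = (d[0] - d[1]) ** 2
    w1 = (v[1] + f1 * gap2) / (v[0] + v[1] + gap2)
    w = [w1, 1.0 - w1]
    est = w[0] * d[0] + w[1] * d[1]
    se = math.sqrt(sum(wi ** 2 * vi for wi, vi in zip(w, v)))          # a_i = 1
    cc = (3 / 16) / sum(a * b / (a + b) for a, b in zip(nA, nB))       # MR continuity correction
    z = (est - cc) / se
    return w, est, se, math.erfc(abs(z) / math.sqrt(2.0))              # 2 * P(Z > |z|)

w, est, se, p = mr_obs_superiority_two_strata([37, 5], [48, 32], [44, 0], [68, 12])
print([round(x, 3) for x in w], round(est, 3), round(se, 3), round(p, 4))
```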
14
Simulation Results: Test for Superiority (2 strata)
Type I Error Rate (%) (α = 5%)
Stratum 1: p1A | p1B | Stratum 2: p2A | p2B | δ | N* | MH orig | MH cc=0 | MR null | MR obs
.83 | .83 | .37 | .37 | 0 | 50 | 2.6 | 5.0 | 4.5 | 5.1
.83 | .83 | .37 | .37 | 0 | 150 | 3.6 | 5.0 | 4.7 | 4.9
.83 | .83 | .37 | .37 | 0 | 250 | 3.9 | 5.1 | 4.8 | 4.9
.83 | .83 | .37 | .37 | 0 | 500 | 4.1 | 4.9 | 4.7 | 4.8
Power (%)
.884 | .83 | .470 | .37 | .068 (a) | 500 | 75 | 77 | 77 | 77
.898 | .83 | .438 | .37 | .068 (b) | 500 | 76 | 78 | 81 | 81
.906 | .83 | .419 | .37 | .068 (c) | 500 | 77 | 79 | 83 | 83
.914 | .83 | .401 | .37 | .068 (d) | 500 | 78 | 80 | 84 | 84
* per treatment group
(f1 = .7, f2 = .3); no T x S interaction on the (a) logit, (b) proportion, (c) square root, and (d) log scales; 100,000 simulations.
15
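Illustrative Example # 2: Test for Non-Inferiority
$H_0: \delta = -0.05$ vs. $H_1: \delta > -0.05$ ($\delta_0 = .05$)
Stratum | Test (A) $\hat p_{iA}$ | Control (B) $\hat p_{iB}$ | A – B $\hat\delta_i$ | Null $\sqrt{\tilde V_i}$ | Observed $\sqrt{\hat V_i}$
1 | .891 (98/110) | .891 (98/110) | .000 | .0427 | .0420
2 | .978 (88/90) | .978 (88/90) | .000 | .0283 | .0220

Method (Weights_Variance) | $w_1$ | $w_2$ | $\sum_i w_i \hat\delta_i$ | $\sqrt{\hat V(\sum_i w_i \hat\delta_i)}$ | 1-tailed p-value
CMH_NULL | .55 | .45 | .000 | .0267 | .0307
CMH_OBS | .55 | .45 | .000 | .0251 | .0234*
MR_OBS | .21 | .79 | .000 | .0195 | .0059*
* establishes non-inferiority at 1-tailed α = .025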
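This sketch reproduces the observed-variance weights and standard errors of this example. Variable names are hypothetical. CMH weights give (.55, .45) with SE .0251. Because both stratum differences are .000, the two-strata MR formula reduces to inverse-variance weights, giving (.21, .79) with SE .0195. The p-values also depend on the continuity correction and are not recomputed here.

```python
import math

def obs_var(x_a, n_a, x_b, n_b):
    """Observed variance of p_hat_A - p_hat_B within one stratum."""
    pa, pb = x_a / n_a, x_b / n_b
    return pa * (1 - pa) / n_a + pb * (1 - pb) / n_b

def weighted_se(w, v):
    """sqrt(sum_i w_i^2 V_i), i.e. with a_i = 1."""
    return math.sqrt(sum(wi ** 2 * vi for wi, vi in zip(w, v)))

nA, nB = [110, 90], [110, 90]
v = [obs_var(98, 110, 98, 110), obs_var(88, 90, 88, 90)]

# CMH weights: proportional to n_iA n_iB / (n_iA + n_iB)
h = [a * b / (a + b) for a, b in zip(nA, nB)]
w_cmh = [hi / sum(h) for hi in h]

# MR weights: (d1 - d2)^2 = 0 here, so the formula reduces to inverse-variance weights
inv = [1 / vi for vi in v]
w_mr = [x / sum(inv) for x in inv]

print([round(x, 2) for x in w_cmh], round(weighted_se(w_cmh, v), 4))
print([round(x, 2) for x in w_mr], round(weighted_se(w_mr, v), 4))
```

The smaller SE under MR weights is what drives the lower p-value for MR_OBS: it puts most of the weight on stratum 2, whose observed variance is much smaller.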
16
Simulation Results: Test for Non-inferiority (2 strata)
Type I Error Rate (nominal = .025)
$\delta_0$ | N | A: CMH_NULL | B: CMH_OBS | C: MR_OBS
.20 | 74 | .026 | .026 | .023
.15 | 130 | .025 | .025 | .025
.10 | 285 | .024 | .025 | .024
.05 | 1130 | .025 | .025 | .025
Results based on 100,000 simulations.
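N = sample size per treatment group; $n_{T1} = n_{C1} \sim B(N, f_1 = .50)$, $n_{T2} = n_{C2} = N - n_{T1}$; $p_{C1} = .70$, $p_{C2} = .90$; $p_{1T} - p_{1C} = p_{2T} - p_{2C} = -\delta_0$ ($H_0$ true).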
17
Simulation Results: Power, Test for Non-inferiority (2 strata)
Power
$\delta_0$ | N | A: CMH_NULL | B: CMH_OBS | C: MR_OBS | $ saved, C vs. A*
.20 | 74 | .871 | .886 | .900 | $80K
.15 | 130 | .870 | .881 | .900 | $150K
.10 | 285 | .865 | .871 | .900 | $330K
.05 | 1130 | .863 | .865 | .900 | $1.42M
* Based on the increase in N required to achieve 90% power with popular method A, assuming $5,000 per subject. Results based on 100,000 simulations.
N = sample size per treatment group; $n_{T1} = n_{C1} \sim B(N, f_1 = .50)$, $n_{T2} = n_{C2} = N - n_{T1}$; $p_{C1} = .70$, $p_{C2} = .90$; $p_{1T} = p_{1C}$, $p_{2T} = p_{2C}$ ($H_1$ true).
18
Summary (Part I)
For stratified trials with binary responses:
The popular Mantel-Haenszel test uses sample size (CMH) weights with null variances. It has good power properties if and only if the odds ratio is constant across strata.
Using minimum risk (MR) weights with observed (OBS) variances will usually provide notably more power than CMH weights with null variances, for both superiority and non-inferiority trials.
Recommendation: consider MR_OBS as a default, but use simulations to quantify power differences between methods when planning a new trial.
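The recommendation to quantify power by simulation can be prototyped in a few lines. This is a sketch only, and the function names are hypothetical. It covers MR_OBS alone and treats $f_1$ as known. It also fixes the stratum sizes at round($f_1$N) and N − round($f_1$N) per arm rather than drawing them at random; that choice is an assumption. The example settings come from the superiority simulation slide ($f_1$ = .7).

```python
import math
import random

def mr_obs_z(xA, nA, xB, nB, f1):
    """Two-stratum MR_OBS statistic: MR weights, observed variances,
    a_i = 1 and cc = (3/16) / sum_i n_iA n_iB / (n_iA + n_iB)."""
    pA = [x / n for x, n in zip(xA, nA)]
    pB = [x / n for x, n in zip(xB, nB)]
    d = [a - b for a, b in zip(pA, pB)]
    v = [a * (1 - a) / na + b * (1 - b) / nb for a, b, na, nb in zip(pA, pB, nA, nB)]
    gap2 = (d[0] - d[1]) ** 2
    if v[0] + v[1] + gap2 == 0:
        return 0.0                      # no variability at all: treat as no evidence
    w1 = (v[1] + f1 * gap2) / (v[0] + v[1] + gap2)
    se = math.sqrt(w1 ** 2 * v[0] + (1 - w1) ** 2 * v[1])
    if se == 0:
        return 0.0
    est = w1 * d[0] + (1 - w1) * d[1]
    cc = (3 / 16) / sum(na * nb / (na + nb) for na, nb in zip(nA, nB))
    return (abs(est) - cc) / se         # two-sided version: |estimate| - cc

def rejection_rate(pA, pB, n_per_arm, f1, reps=2000, seed=1):
    """Monte Carlo 2-tailed rejection rate of MR_OBS at alpha = 5%."""
    rng = random.Random(seed)
    n1 = round(f1 * n_per_arm)          # assumption: fixed stratum sizes
    nA = [n1, n_per_arm - n1]
    nB = list(nA)
    hits = 0
    for _ in range(reps):
        xA = [sum(rng.random() < p for _ in range(n)) for p, n in zip(pA, nA)]
        xB = [sum(rng.random() < p for _ in range(n)) for p, n in zip(pB, nB)]
        if mr_obs_z(xA, nA, xB, nB, f1) > 1.959964:
            hits += 1
    return hits / reps

print(rejection_rate([.83, .37], [.83, .37], 150, 0.7))               # type I error
print(rejection_rate([.906, .419], [.83, .37], 500, 0.7, reps=500))   # power
```

Swapping in another statistic (for example, CMH weights with null variances) inside the loop gives the side-by-side comparison the recommendation asks for.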
19
Part II: Analysis of Continuous Data Using Ranks
20
Motivating Example: Hypothetical viral loads of HIV+ subjects (log10 copies/ml)
Stratum | Placebo | Vaccine
Females | 3.90, 3.96 | 1.40, 2.80, 2.90
Males | 3.50, 3.50, 3.56, 3.59, 3.69, 3.85, 4.06, 4.36, 4.36, 4.43, 4.68, 4.69, 4.70, 4.85, 5.06, 5.50 | 1.79, 2.32, 2.54, 3.42, 3.59, 3.89, 4.64, 5.23, 5.32
21
Motivating Example (continued)
• Observed viral load summaries (log10 copies/ml):
• Compared to placebo, the VLs for vaccine appear to be “shifted” to the left (i.e., are numerically smaller). Is the shift statistically significant?
Stratum | Summary | Placebo | Vaccine
Females | Mean | 3.93 | 2.37
 | Median | 3.93 | 2.80
 | SD | 0.04 | 0.84
 | n | 2 | 3
Males | Mean | 4.27 | 3.64
 | Median | 4.36 | 3.59
 | SD | 0.62 | 1.27
 | n | 16 | 9
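The summary table can be recomputed from the raw viral loads with Python's standard `statistics` module. The dictionary layout and function name are illustrative. The SD is the usual sample SD (n − 1 denominator), which is what the Females/Placebo entry of 0.04 implies.

```python
import statistics

vl = {
    ("Females", "Placebo"): [3.90, 3.96],
    ("Females", "Vaccine"): [1.40, 2.80, 2.90],
    ("Males", "Placebo"): [3.50, 3.50, 3.56, 3.59, 3.69, 3.85, 4.06, 4.36,
                           4.36, 4.43, 4.68, 4.69, 4.70, 4.85, 5.06, 5.50],
    ("Males", "Vaccine"): [1.79, 2.32, 2.54, 3.42, 3.59, 3.89, 4.64, 5.23, 5.32],
}

def summarize(values):
    """Mean, median, sample SD (n - 1 denominator) and n."""
    return (statistics.mean(values), statistics.median(values),
            statistics.stdev(values), len(values))

for (stratum, arm), values in vl.items():
    mean, median, sd, n = summarize(values)
    print(f"{stratum:8s} {arm:8s} mean={mean:.2f} median={median:.2f} sd={sd:.2f} n={n}")
```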
22
Motivating Example (continued)
Stratified rank-based analysis: SAS implementation
• PROC FREQ; TABLES gender * trt * vload/CMH SCORES=RANK;
RUN;
• PROC FREQ; TABLES gender * trt * vload/CMH SCORES=MODRIDIT;
RUN;
• PROC TWOSAMPL; [Part of PROC StatXact module]
WI/AS; PO trt; RE vload; ST gender;
RUN;
23
Motivating Example (continued)
• 2-tailed p-values using the three “methods”:
PROC FREQ (RANK): p = .1506 | PROC FREQ (MODRIDIT): p = .0642 | PROC TWOSAMPL: p = .0436*
Different conclusions at α = .05 … why?
• PROC FREQ
> Ranks based on pooled sample within each stratum (“stratum-specific” ranks)
> SCORES = RANK gives equal stratum weights
> SCORES = MODRIDIT gives unequal stratum weights
• PROC TWOSAMPL: Ranks based on overall pooled sample, ignoring strata (“stratum-invariant” ranks), with equal stratum weights.
24
Technical Details
Stratified Rank-Based Tests
$Y_{ijk}$ = response for stratum i, treatment j, subject k ($i = 1, \dots, s$; $j = 1, 2$; $k = 1, \dots, n_{ij}$)
Assumptions: $Y_{i1k} \sim$ i.i.d. $F(y - \gamma_i)$ [placebo]
$Y_{i2k} \sim$ i.i.d. $F(y - \gamma_i + \Delta_i)$ [vaccine]
$\gamma_i$ is the fixed effect of stratum i
$\Delta_i$ is the treatment effect (“shift”) in stratum i
No T x S interaction: $\Delta_i = \Delta$ (constant shift)
$H_0: \Delta_i = 0$ for all i vs. $H_1: \Delta_i \neq 0$ for at least one i
25
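Technical Details (continued)
Let $R_{ijk}$ = rank of $Y_{ijk}$ (stratum-specific OR stratum-invariant)
$$Z = \frac{\sum_i w_i \left[S_i - E(S_i \mid H_0)\right]}{\sqrt{\sum_i w_i^2 \, V(S_i \mid H_0)}}, \qquad \text{p-value} = 2P(Z > |Z_{obs}|)$$
$S_i = \sum_{k=1}^{n_{i1}} R_{i1k}$, $w_i$ = weight for stratum i
$$E(S_i \mid H_0) = \frac{n_{i1}}{n_i} \sum_{j=1}^{2} \sum_{k=1}^{n_{ij}} R_{ijk}$$
$$V(S_i \mid H_0) = \frac{n_{i1} n_{i2}}{n_i (n_i - 1)} \sum_{j=1}^{2} \sum_{k=1}^{n_{ij}} \left[R_{ijk} - \frac{E(S_i \mid H_0)}{n_{i1}}\right]^2$$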
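This pure-Python sketch implements the statistic above (function names hypothetical). Applied to the viral-load data, it recovers the motivating example's p-values to within rounding for three cases: $T_{EQ}$ ($w_i = 1$), $T_{vE}$ ($w_i = 1/(n_i + 1)$), and, with ranks taken over the pooled sample, $T^*_{EQ}$. Tied values get average ranks. The variance formula above then already contains the usual tie adjustment.

```python
import math

def avg_ranks(values):
    """Ranks 1..n, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks

def stratified_rank_test(strata, weight, invariant=False):
    """(Z, 2-tailed p) for sum_i w_i [S_i - E(S_i|H0)] / sqrt(sum_i w_i^2 V(S_i|H0)).

    strata: list of (group1, group2) lists; S_i is the sum of the group-1 ranks.
    weight: function (n_i1, n_i2) -> w_i.
    invariant: rank the pooled sample ignoring strata instead of within strata.
    """
    pooled_ranks = avg_ranks([y for g1, g2 in strata for y in g1 + g2])
    num = var = 0.0
    pos = 0
    for g1, g2 in strata:
        n1, n2 = len(g1), len(g2)
        n = n1 + n2
        r = pooled_ranks[pos:pos + n] if invariant else avg_ranks(g1 + g2)
        pos += n
        e_s = n1 / n * sum(r)                                    # E(S_i | H0)
        v_s = n1 * n2 / (n * (n - 1)) * sum((x - e_s / n1) ** 2 for x in r)
        w = weight(n1, n2)
        num += w * (sum(r[:n1]) - e_s)
        var += w ** 2 * v_s
    z = num / math.sqrt(var)
    return z, math.erfc(abs(z) / math.sqrt(2.0))                 # 2 * P(Z > |z|)

females = ([3.90, 3.96], [1.40, 2.80, 2.90])
males = ([3.50, 3.50, 3.56, 3.59, 3.69, 3.85, 4.06, 4.36, 4.36, 4.43,
          4.68, 4.69, 4.70, 4.85, 5.06, 5.50],
         [1.79, 2.32, 2.54, 3.42, 3.59, 3.89, 4.64, 5.23, 5.32])
data = [females, males]

print(stratified_rank_test(data, lambda n1, n2: 1.0))                   # T_EQ
print(stratified_rank_test(data, lambda n1, n2: 1.0 / (n1 + n2 + 1)))   # T_vE
print(stratified_rank_test(data, lambda n1, n2: 1.0, invariant=True))   # T*_EQ
```

Only the weight function and the ranking scope differ between the three tests. That is why they can disagree about significance on the same data.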
26
Technical Details (continued)
Three Popular Rank-Based Tests
Test | Stratum weights | Comments
$T_{EQ}$ | $w_i = 1$ | PROC FREQ SCORES = RANK; stratum-specific ranks
$T_{vE}$ | $w_i = 1/(n_i + 1)$ | PROC FREQ SCORES = MODRIDIT; van Elteren test (1960); stratum-specific ranks
$T^*_{EQ}$ | $w_i = 1$ | PROC TWOSAMPL; stratum-invariant ranks
Note: $n_i = n_{i1} + n_{i2}$
27
Technical Details (continued)
• If there is no true treatment by stratum interaction ($\Delta_i = \Delta$ for all i), the van Elteren test is optimal among all the stratified tests, i.e., $w_i = 1/(n_i + 1)$ are optimal weights.
• However, if interaction exists, the van Elteren test can suffer from a power loss.
• In general, is there an asymptotically optimal test ($T_{opt}$, with optimal weights) that allows for interaction?
YES … we derived it, based on stratum-specific ranks.
28
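Technical Details (continued)
Weights needed for $T_{opt}$:
$$w_i^{opt} = \frac{\theta_i - 0.5}{n_i + 1}, \quad \text{with } \theta_i = P(Y_{i1k} > Y_{i2k'})$$
Since $\theta_i$ is unknown, we studied a test ($\hat T_{opt}$) based on estimated optimal weights. $\theta_i$ can be estimated as
$$\hat\theta_i = \sum_{k}\sum_{k'} I(Y_{i1k} > Y_{i2k'}) \Big/ (n_{i1} n_{i2})$$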
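$\hat\theta_i$ is a simple pair count. The sketch below follows the indicator as written on the slide, so tied (placebo, vaccine) pairs count as non-wins; a half-credit convention for ties is a common alternative. Function names are hypothetical. For the viral-load data it gives $\hat\theta = 1$ for females (every placebo value exceeds every vaccine value) and 94/144 for males.

```python
def theta_hat(placebo, vaccine):
    """Share of (placebo, vaccine) pairs with the placebo value strictly larger."""
    wins = sum(1 for a in placebo for b in vaccine if a > b)
    return wins / (len(placebo) * len(vaccine))

def estimated_opt_weight(placebo, vaccine):
    """Estimated optimal weight (theta_hat_i - 0.5) / (n_i + 1)."""
    n_i = len(placebo) + len(vaccine)
    return (theta_hat(placebo, vaccine) - 0.5) / (n_i + 1)

fem_p, fem_v = [3.90, 3.96], [1.40, 2.80, 2.90]
male_p = [3.50, 3.50, 3.56, 3.59, 3.69, 3.85, 4.06, 4.36, 4.36, 4.43,
          4.68, 4.69, 4.70, 4.85, 5.06, 5.50]
male_v = [1.79, 2.32, 2.54, 3.42, 3.59, 3.89, 4.64, 5.23, 5.32]

print(theta_hat(fem_p, fem_v), theta_hat(male_p, male_v))
print(estimated_opt_weight(fem_p, fem_v), estimated_opt_weight(male_p, male_v))
```

Because $\hat\theta_i - 0.5$ enters the weight directly, a stratum with little apparent effect gets almost no weight. A stratum whose estimate falls below 0.5 would even get a negative weight.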
29
Technical Details (continued)
We studied two other published tests:
Aligned rank test ($T_{align}$) [Hodges and Lehmann, Annals of Mathematical Statistics, 1962]
Step 1: Calculate $Y_{ijk}^{align} = Y_{ijk} - b_i$, where $b_i$ is the Hodges-Lehmann estimate of the stratum “location” (median of all pairwise means of the observations in stratum i).
Step 2: Perform unstratified Wilcoxon rank sum test using $Y_{ijk}^{align}$.
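The two steps can be sketched as follows; function names are hypothetical. One assumption: "all pairwise means" is taken to be the Walsh averages $(x_a + x_b)/2$ over pairs $a \le b$, self-pairs included, which is the usual Hodges-Lehmann one-sample estimator. Step 2 uses the tie-adjusted normal approximation. The sketch is not claimed to reproduce the slide's p-value for $T_{align}$ exactly.

```python
import math
import statistics

def hodges_lehmann_location(values):
    """Median of the Walsh averages (x_a + x_b) / 2 over all pairs a <= b."""
    walsh = [(values[a] + values[b]) / 2
             for a in range(len(values)) for b in range(a, len(values))]
    return statistics.median(walsh)

def wilcoxon_rank_sum_z(g1, g2):
    """Unstratified Wilcoxon rank sum test, normal approximation with ties; (Z, 2-tailed p)."""
    pooled = g1 + g2
    order = sorted(range(len(pooled)), key=lambda k: pooled[k])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and pooled[order[j + 1]] == pooled[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1          # average rank for ties
        i = j + 1
    n1, n2 = len(g1), len(g2)
    n = n1 + n2
    mean_r = (n + 1) / 2
    var = n1 * n2 / (n * (n - 1)) * sum((r - mean_r) ** 2 for r in ranks)
    z = (sum(ranks[:n1]) - n1 * mean_r) / math.sqrt(var)
    return z, math.erfc(abs(z) / math.sqrt(2.0))

def aligned_rank_test(strata):
    """Step 1: subtract each stratum's HL location; Step 2: unstratified Wilcoxon."""
    g1, g2 = [], []
    for a, b in strata:
        loc = hodges_lehmann_location(a + b)
        g1 += [y - loc for y in a]
        g2 += [y - loc for y in b]
    return wilcoxon_rank_sum_z(g1, g2)

toy = [([5, 6, 7], [1, 2, 3]), ([15, 16], [11, 12, 13])]
print(aligned_rank_test(toy))
```

Alignment removes each stratum's location before the groups are pooled. Adding a constant to every observation in a stratum therefore leaves the test unchanged.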
30
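Technical Details (continued)
Brunner's test ($T_{Brunner}$) [Brunner, Puri and Sun, JASA, 1995]
Define the overall treatment effect $p_s$ by combining across the s strata the stratum-specific relative effects, which are estimated by
$$\hat p_i = \frac{1}{n_{i1}}\left(\bar R_{i2\cdot} - \frac{n_{i2} + 1}{2}\right), \quad \text{where } \bar R_{i2\cdot} = \frac{1}{n_{i2}} \sum_{j=1}^{n_{i2}} R_{i2j}$$
($R_{itj}$ is a stratum-specific rank).
Under $H_0^s$ (equivalent to $H_0: \Delta_i = 0$ for all i), the standardized estimate of $p_s$ is referred to the standard normal distribution. Its variance is built from stratum-level variance estimates $\sigma_i^2$ of $\hat p_i$, which use both the stratum-specific ranks $R_{itj}$ and $R_{itj}^{(t)}$, where $R_{itj}^{(t)}$ is the rank within the stratum by treatment cell.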
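The stratum-level effect $\hat p_i$ has a direct interpretation. It equals the share of (group 1, group 2) pairs in which the group-2 value is larger, with ties counted as one half. The sketch below computes $\hat p_i$ both ways (function names hypothetical). It covers only this stratum-level quantity, not Brunner's full statistic or its variance. For the viral-load data (group 1 = placebo), $\hat p_i$ is 0 for females and 0.34375 for males.

```python
def stratum_relative_effect(group1, group2):
    """p_hat_i = (mean group-2 rank - (n_i2 + 1) / 2) / n_i1, ranks within the stratum."""
    pooled = group1 + group2
    order = sorted(range(len(pooled)), key=lambda k: pooled[k])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and pooled[order[j + 1]] == pooled[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1        # average rank for ties
        i = j + 1
    n1, n2 = len(group1), len(group2)
    mean_rank_2 = sum(ranks[n1:]) / n2
    return (mean_rank_2 - (n2 + 1) / 2) / n1

def pairwise_share(group1, group2):
    """Share of pairs with the group-2 value larger, ties counted as one half."""
    score = sum(1.0 if b > a else 0.5 if b == a else 0.0
                for a in group1 for b in group2)
    return score / (len(group1) * len(group2))

male_p = [3.50, 3.50, 3.56, 3.59, 3.69, 3.85, 4.06, 4.36, 4.36, 4.43,
          4.68, 4.69, 4.70, 4.85, 5.06, 5.50]
male_v = [1.79, 2.32, 2.54, 3.42, 3.59, 3.89, 4.64, 5.23, 5.32]
print(stratum_relative_effect([3.90, 3.96], [1.40, 2.80, 2.90]),
      stratum_relative_effect(male_p, male_v), pairwise_share(male_p, male_v))
```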
31
Technical Details (continued)
We also studied two versions of an adaptive test:
Let $p_{TxS}$ = p-value for treatment by stratum (T x S) interaction, and $\rho(\hat\Delta, n)$ = Spearman’s rank correlation between the (estimated) treatment effect ($\hat\Delta_i$) and stratum size ($n_i$). Our proposed adaptive test:
$$T_{adap} = \begin{cases} T_{EQ} & \text{if } p_{TxS} \le 0.10 \text{ and } \rho(\hat\Delta, n) > 0 \\ T_{align} & \text{otherwise.} \end{cases}$$
How to obtain the T x S interaction p-value ($p_{TxS}$)?
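The adaptive rule itself is only a few lines once $p_{TxS}$ and the stratum-level estimates $\hat\Delta_i$ are available. This sketch assumes the caller supplies both; the slides do not say how $\hat\Delta_i$ is computed. It also follows the threshold (≤ 0.10) and the sign condition (ρ > 0) as reconstructed above. Function names are hypothetical. Spearman's correlation is computed as the Pearson correlation of average ranks.

```python
def rank_with_ties(values):
    """Ranks 1..n with average ranks for ties."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rank correlation (0.0 if either variable is constant)."""
    rx, ry = rank_with_ties(x), rank_with_ties(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5

def choose_adaptive_test(p_txs, delta_hat, n, threshold=0.10):
    """Return the test T_adap defers to: 'T_EQ' or 'T_align'."""
    if p_txs <= threshold and spearman(delta_hat, n) > 0:
        return "T_EQ"
    return "T_align"

print(choose_adaptive_test(0.04, [0.2, 0.5, 0.9], [20, 40, 60]))
```

The design intent is visible in the rule. $T_{EQ}$ gives larger strata more weight than $T_{vE}$, so it is only preferred when there is evidence of interaction and the effect appears larger in larger strata.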
32
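Technical Details (continued)
• Adaptive test $T_{adap1}$ based on T x S test of Öhrvik [1999]:
Let $Z_{ijk}$ = rank of $Y_{ijk}^{align}$ (stratum-invariant rank)
Test statistic:
$$Q_{int} = \frac{12}{N(N+1)} \sum_{i=1}^{s} \sum_{j=1}^{2} n_{ij} \left(\bar Z_{ij\cdot} - \frac{N+1}{2}\right)^2$$
where $N = \sum_{i=1}^{s} \sum_{j=1}^{2} n_{ij}$ and $\bar Z_{ij\cdot} = \frac{1}{n_{ij}} \sum_{k=1}^{n_{ij}} Z_{ijk}$
$Q_{int} \sim \chi^2_{s-1}$ under the hypothesis of no T x S interaction
P-value: $p_{TxS} = P(\chi^2_{s-1} > Q_{int})$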
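$Q_{int}$ and its p-value need only the aligned, stratum-invariant ranks and a chi-square tail function. The sketch below avoids external libraries. It computes the $\chi^2$ upper tail for integer degrees of freedom from the closed-form incomplete-gamma series, and it uses s − 1 degrees of freedom as on the slide. Function names are hypothetical.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(chi^2_df > x) for integer df >= 1."""
    h = x / 2.0
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for m in range(1, df // 2):
            term *= h / m                       # h^m / m!
            total += term
        return math.exp(-h) * total
    total = math.erfc(math.sqrt(h))             # df = 1 term
    for m in range(1, (df - 1) // 2 + 1):
        total += math.exp(-h) * h ** (m - 0.5) / math.gamma(m + 0.5)
    return total

def ohrvik_q_int(aligned_ranks):
    """Q_int from stratum-invariant ranks Z_ijk of the aligned data.

    aligned_ranks: list over strata of (ranks for treatment 1, ranks for treatment 2).
    Returns (Q_int, p_TxS) with s - 1 degrees of freedom.
    """
    cells = [cell for pair in aligned_ranks for cell in pair]
    big_n = sum(len(c) for c in cells)
    centre = (big_n + 1) / 2
    q = 12.0 / (big_n * (big_n + 1)) * sum(
        len(c) * (sum(c) / len(c) - centre) ** 2 for c in cells)
    return q, chi2_sf(q, len(aligned_ranks) - 1)

print(ohrvik_q_int([([1, 8], [2, 7]), ([3, 6], [4, 5])]))
print(ohrvik_q_int([([7, 8], [1, 2]), ([3, 4], [5, 6])]))
```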
33
Technical Details (continued)
• Adaptive test $T_{adap2}$ based on T x S test of Brunner et al. [1995]:
Test statistic:
$$Q_B = \sum_{i=1}^{s} \frac{1}{\sigma_i^2} \left(\hat p_i - \frac{\sum_{j=1}^{s} \hat p_j / \sigma_j^2}{\sum_{j=1}^{s} 1/\sigma_j^2}\right)^2$$
where $\sigma_i^2$ and $\hat p_i$ are as described for Brunner's test
$Q_B \sim \chi^2_{s-1}$ under the hypothesis of no T x S interaction
P-value: $p_{TxS} = P(\chi^2_{s-1} > Q_B)$
Note: The two adaptive tests have similar performance in simulations, so $T_{adap} \equiv T_{adap1}$ from here on.
34
Technical Details (continued)
Estimate and 100(1 − α)% CI for Δ Obtained by Inverting the Given Test
• Let $\tilde Y_{ijk}(c) = Y_{ijk}$ if j = 1, and $\tilde Y_{ijk}(c) = Y_{ijk} + c$ if j = 2.
Let p(c) = 1-tailed p-value for test applied to $\tilde Y_{ijk}(c)$.
• Point estimate ($\hat\Delta$): c for which p(c) = .50
Lower limit ($\Delta_L$): c for which p(c) = α/2
Upper limit ($\Delta_U$): c for which p(c) = 1 − α/2
Obtained via a numerical search.
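The numerical search can be a plain bisection on c. This sketch inverts the van Elteren test: stratum-specific ranks, $w_i = 1/(n_i + 1)$, and a normal approximation. It assumes p(c) rises with c. That holds here because shifting the vaccine values upward can only lower the placebo ranks. Because p(c) is a step function, each solution is the point where p(c) jumps across its target, and the output is not guaranteed to match the slide's reported values exactly. Function names are hypothetical.

```python
import math

def avg_ranks(values):
    """Ranks 1..n with ties given their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks

def one_tailed_p(strata, c):
    """p(c): 1-tailed van Elteren p-value after adding c to every group-2 value."""
    num = var = 0.0
    for g1, g2 in strata:
        r = avg_ranks(g1 + [y + c for y in g2])
        n1, n2 = len(g1), len(g2)
        n = n1 + n2
        mean_r = (n + 1) / 2                         # mean of stratum-specific ranks
        v = n1 * n2 / (n * (n - 1)) * sum((x - mean_r) ** 2 for x in r)
        w = 1.0 / (n + 1)                            # van Elteren weight
        num += w * (sum(r[:n1]) - n1 * mean_r)
        var += w * w * v
    z = num / math.sqrt(var)
    return 0.5 * math.erfc(z / math.sqrt(2.0))      # P(Z > z)

def solve(strata, target, lo, hi, iters=60):
    """Bisection for the c where p(c) crosses target (p assumed to rise with c)."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if one_tailed_p(strata, mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def shift_estimate_and_ci(strata, alpha=0.05):
    ys = [y for g1, g2 in strata for y in g1 + g2]
    span = max(ys) - min(ys) + 1.0                  # bracket wide enough to separate the groups
    return (solve(strata, 0.5, -span, span),
            solve(strata, alpha / 2, -span, span),
            solve(strata, 1 - alpha / 2, -span, span))

females = ([3.90, 3.96], [1.40, 2.80, 2.90])
males = ([3.50, 3.50, 3.56, 3.59, 3.69, 3.85, 4.06, 4.36, 4.36, 4.43,
          4.68, 4.69, 4.70, 4.85, 5.06, 5.50],
         [1.79, 2.32, 2.54, 3.42, 3.59, 3.89, 4.64, 5.23, 5.32])
data = [females, males]
est, lower, upper = shift_estimate_and_ci(data)
print(round(est, 2), (round(lower, 2), round(upper, 2)))
```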
35
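Motivating Example Revisited: 2-tailed p-values
Test | Description | p-value
$T_{EQ}$ | $w_i = 1$ | .1505
$T^*_{EQ}$ | $w_i = 1$ [with stratum-invariant ranks] | .0435*
$T_{vE}$ | van Elteren test, $w_i = 1/(n_i + 1)$ | .0643
$\hat T_{opt}$ | $\hat w_i = (\hat\theta_i - 0.5)/(n_i + 1)$ | .0990
$T_{Brunner}$ | described on slide 30 | .0250*
$T_{align}$ | Aligned rank test | .0654
$T_{adap}$ | $T_{EQ}$ if $p_{TxS} \le 0.10$ and $\rho(\hat\Delta, n) > 0$; $T_{align}$ otherwise | .0654
Note: All methods except $T^*_{EQ}$ use stratum-specific ranks.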
36
Motivating Example Revisited: Estimates and 95% CIs for Δ (selected methods)
Method | $T_{EQ}$ | $T^*_{EQ}$ | $T_{vE}$ | $T_{align}$
p-value | .1506 | .0435 | .0643 | .0654
Estimate | .80 | 0.94 | 1.00 | .84
95% CI | (−0.28, 1.61) | (.01, 1.71) | (−.04, 1.61) | (−.09, 1.53)
Stratum Summary:
Stratum | Placebo median (n) | Vaccine median (n) | P − V
Females | 3.93 (2) | 2.80 (3) | 1.13
Males | 4.36 (16) | 3.59 (9) | 0.77
37
Simulation Study
• 2 treatments, 1:1 randomization per stratum
• Number of strata = 2, 4, 6, 8, 10, and 12
• Stratum size ($n_i$): 10 × i for stratum i
• Different choices of $\Delta_i$:
– constant for each stratum (no T x S interaction)
– positively or negatively associated with stratum size (T x S interaction, with 50% power to detect it)
• Four different distributions for Y:
– Normal
– Lognormal
– Mixture of Normals: 0.9N(m, v) + 0.1N(m*, v*)
– $t_3$
38
Simulation Results: Type I Error Rate (%) (nominal = 5%)
Normal Distribution
Note: 5.00% + 3 std. errors = 5.92% (5000 simulations)
Test \ Number of strata | 2 | 4 | 6 | 8 | 10 | 12
$T_{EQ}$ | 4.6 | 4.7 | 4.8 | 5.0 | 4.7 | 5.3
$T^*_{EQ}$ | 5.0 | 4.8 | 4.7 | 5.0 | 5.0 | 4.9
$T_{vE}$ | 5.0 | 4.8 | 4.7 | 5.3 | 4.6 | 5.3
$\hat T_{opt}$ | 4.7 | 4.1 | 4.8 | 4.4 | 4.5 | 4.7
$T_{align}$ | 5.5 | 5.3 | 5.0 | 5.7 | 5.1 | 5.3
$T_{adap1}$ | 5.7 | 5.3 | 5.1 | 5.7 | 5.1 | 5.5
$T_{Brunner}$ | 11.3 | 10.7 | 12.1 | 11.7 | 11.1 | 10.9
39
Simulation Results: Type I Error Rate (%) (nominal = 5%)
Lognormal Distribution
Note: 5.00% + 3 std. errors = 5.92% (5000 simulations)
Test \ Number of strata | 2 | 4 | 6 | 8 | 10 | 12
$T_{EQ}$ | 4.1 | 4.9 | 5.0 | 5.2 | 5.3 | 5.3
$T^*_{EQ}$ | 4.9 | 4.8 | 5.5 | 5.3 | 5.4 | 4.6
$T_{vE}$ | 4.6 | 4.4 | 5.2 | 5.4 | 4.9 | 5.3
$\hat T_{opt}$ | 4.4 | 3.8 | 5.0 | 4.6 | 5.0 | 5.0
$T_{align}$ | 5.0 | 5.1 | 5.4 | 5.4 | 5.0 | 5.5
$T_{adap1}$ | 5.0 | 5.3 | 5.5 | 5.4 | 5.0 | 5.5
$T_{Brunner}$ | 11.0 | 11.3 | 12.6 | 12.1 | 11.6 | 11.0
40
Simulation Results: Type I Error Rate (%) (nominal = 5%)
Mixture of Normals Distribution
Note: 5.00% + 3 std. errors = 5.92% (5000 simulations)
Test \ Number of strata | 2 | 4 | 6 | 8 | 10 | 12
$T_{EQ}$ | 4.3 | 4.8 | 5.2 | 5.1 | 4.9 | 5.0
$T^*_{EQ}$ | 4.5 | 4.9 | 5.4 | 4.8 | 4.9 | 4.7
$T_{vE}$ | 4.7 | 4.8 | 4.8 | 4.8 | 5.3 | 4.9
$\hat T_{opt}$ | 4.9 | 4.1 | 4.4 | 4.0 | 4.5 | 5.2
$T_{align}$ | 5.3 | 5.2 | 5.5 | 5.2 | 5.1 | 5.0
$T_{adap1}$ | 5.3 | 5.3 | 5.4 | 5.3 | 5.1 | 5.0
$T_{Brunner}$ | 11.1 | 11.2 | 11.4 | 11.1 | 10.7 | 11.4
41
Simulation Results: Type I Error Rate (%) (nominal = 5%)
$t_3$ Distribution
Note: 5.00% + 3 std. errors = 5.92% (5000 simulations)
Test \ Number of strata | 2 | 4 | 6 | 8 | 10 | 12
$T_{EQ}$ | 3.9 | 4.7 | 4.4 | 5.0 | 5.3 | 4.8
$T^*_{EQ}$ | 4.4 | 4.7 | 4.6 | 4.7 | 4.6 | 5.1
$T_{vE}$ | 4.4 | 4.6 | 4.4 | 4.8 | 5.1 | 4.6
$\hat T_{opt}$ | 4.3 | 3.6 | 4.5 | 4.7 | 4.9 | 5.1
$T_{align}$ | 4.9 | 5.0 | 4.8 | 4.8 | 5.3 | 5.0
$T_{adap1}$ | 4.9 | 5.1 | 4.9 | 4.9 | 5.5 | 5.1
$T_{Brunner}$ | 11.4 | 11.2 | 11.9 | 12.0 | 11.6 | 11.7
42
Simulation Results: Power (%), No T x S interaction (constant Δ)
Columns: number of strata within each distribution
Test | Normal: 2 | 4 | 6 | 8 | Lognormal: 2 | 4 | 6 | 8
$T_{EQ}$ | 75.8 | 78.4 | 77.1 | 78.4 | 78.2 | 77.8 | 78.5 | 76.9
$T^*_{EQ}$ | 80.0 | 82.7 | 81.1 | 81.4 | 82.2 | 80.6 | 81.0 | 77.4
$T_{vE}$ | 79.0 | 82.1 | 81.5 | 82.5 | 81.5 | 81.3 | 83.2 | 81.2
$\hat T_{opt}$ | 67.1 | 59.3 | 51.8 | 49.6 | 70.3 | 59.2 | 53.5 | 47.6
$T_{align}$ | 82.1 | 84.0 | 83.2 | 83.9 | 84.3 | 83.2 | 84.6 | 82.5
$T_{adap1}$ | 82.4 | 84.2 | 83.4 | 84.1 | 84.6 | 83.3 | 84.9 | 82.6
Note: if there is no T x S interaction, $T_{vE} \equiv T_{opt}$.
43
Simulation Results: Power (%), No T x S interaction
Columns: number of strata within each distribution
Test | Mixture of Normals: 2 | 4 | 6 | 8 | $t_3$ Distribution: 2 | 4 | 6 | 8
$T_{EQ}$ | 75.6 | 79.3 | 76.6 | 77.5 | 79.4 | 78.0 | 76.6 | 78.0
$T^*_{EQ}$ | 79.7 | 80.3 | 76.2 | 75.6 | 82.7 | 79.0 | 74.1 | 72.8
$T_{vE}$ | 79.1 | 82.5 | 80.4 | 82.4 | 82.5 | 81.9 | 80.9 | 82.8
$\hat T_{opt}$ | 67.7 | 60.6 | 50.7 | 48.8 | 72.5 | 59.2 | 50.7 | 49.2
$T_{align}$ | 81.9 | 84.2 | 82.2 | 83.3 | 83.6 | 83.3 | 81.9 | 83.6
$T_{adap1}$ | 82.7 | 84.4 | 82.3 | 83.4 | 84.1 | 83.5 | 82.0 | 83.6
Note: if there is no T x S interaction, $T_{vE} \equiv T_{opt}$.
44
Simulation Results: Power (%), Normal Distribution
[Figure: power (%) for 2, 4, 6 and 8 strata; 1 = $T_{EQ}$, 2 = $T^*_{EQ}$, 3 = $T_{vE}$, 4 = $\hat T_{opt}$, 5 = $T_{align}$, 6 = $T_{adap}$; panels: positive association, $\rho(\Delta, n) > 0$, and negative association, $\rho(\Delta, n) < 0$; y-axis 50 to 90.]
45
Simulation Results: Power (%), Lognormal Distribution
[Figure: power (%) for 2, 4, 6 and 8 strata; 1 = $T_{EQ}$, 2 = $T^*_{EQ}$, 3 = $T_{vE}$, 4 = $\hat T_{opt}$, 5 = $T_{align}$, 6 = $T_{adap}$; panels: positive association, $\rho(\Delta, n) > 0$, and negative association, $\rho(\Delta, n) < 0$; y-axis 50 to 90.]
46
Simulation Results: Power (%), Mixture of Normals
[Figure: power (%) for 2, 4, 6 and 8 strata; 1 = $T_{EQ}$, 2 = $T^*_{EQ}$, 3 = $T_{vE}$, 4 = $\hat T_{opt}$, 5 = $T_{align}$, 6 = $T_{adap}$; panels: positive association, $\rho(\Delta, n) > 0$, and negative association, $\rho(\Delta, n) < 0$; y-axis 50 to 90.]
47
Simulation Results: Power (%), $t_3$ Distribution
[Figure: power (%) for 2, 4, 6 and 8 strata; 1 = $T_{EQ}$, 2 = $T^*_{EQ}$, 3 = $T_{vE}$, 4 = $\hat T_{opt}$, 5 = $T_{align}$, 6 = $T_{adap}$; panels: positive association, $\rho(\Delta, n) > 0$, and negative association, $\rho(\Delta, n) < 0$; y-axis 50 to 90.]
48
Conclusions (Part II)
For rank-based analyses of stratified trials:
> No single method is uniformly the best
Recommendation: use the aligned rank test ($T_{align}$) or either of the proposed adaptive tests ($T_{adap1}$ or $T_{adap2}$). Both approaches were more powerful than the van Elteren test ($T_{vE}$) in every case studied, notably so when there was a true (but hard to detect) T x S interaction.
> It is time to retire the popular van Elteren test!
49
References
• Brunner, E., Puri, M. L., and Sun, S. (1995). Nonparametric Methods for Stratified Two-Sample Designs with Application to Multiclinic Trials. Journal of the American Statistical Association, 90, 1004-1014.
• Hodges, J. L. and Lehmann, E. L. (1962). Rank Methods for Combination of Independent Experiments in the Analysis of Variance. Annals of Mathematical Statistics, 33, 482-497.
• Mehrotra, D. V. and Railkar, R. (2000). Minimum Risk Weights for Comparing Treatments in Stratified Binomial Trials. Statistics in Medicine, 19, 811-825.
• Öhrvik, J. (1999). Aligned Ranks: A Method of Gaining Efficiency in Rank Tests. http://www.stat.fi/isi99/proceedings/arkisto/varasto/hrvi0423.pdf
• van Elteren, P. H. (1960). On the Combination of Independent Two Sample Tests of Wilcoxon. Bulletin of the International Statistical Institute, 37, 351-361.
• Wang, W., Mehrotra, D. V., Chan, I. S. F. and Heyse, J. F. (2006). Non-Inferiority/Equivalence Trials in Vaccine Development. Journal of Biopharmaceutical Statistics, 16, 429-441.