sct2013 boston,randomizationmetricsposter,d6.2


Upload: dennis-sweitzer

Post on 18-Jun-2015


DESCRIPTION

Poster for Society for Clinical Trials annual meeting in Boston, MA Abstract Randomization methods generally are designed to be both unpredictable and balanced between treatment allocations overall and within strata. However, when planning studies, little consideration is given to measuring these characteristics, nor are they examined jointly, and published comparisons between methods often use incompatible metrics and simulation assumptions. Furthermore, for purposes of real-world planning, such simulations often make unrealistic assumptions (e.g., equal sized strata), and summary statistics give limited information.

TRANSCRIPT


• Quantitative metrics clearly distinguished advantages and disadvantages between methods.

• Results confirmed that there are consistent trade-offs between efficiency and unpredictability over all methods.

• Simulations for studies and their expected populations can:
   • Take some of the guesswork out of selecting a particular randomization method.
   • Allow better risk management for potential biases and impacts on analysis.

• Measures of randomness and predictability were complementary:
   • Potential Selection Bias (Predictability) is customized to what a participating clinician might know.
   • Syntropy (Randomness) provides a universal randomness measure.

Randomization methods generally are designed to be both unpredictable and balanced between treatment allocations overall and within strata. However, when planning studies, little consideration is given to measuring these characteristics, nor are they examined jointly, and published comparisons between methods often use incompatible metrics and simulation assumptions. Furthermore, for purposes of real-world planning, such simulations often make unrealistic assumptions (e.g., equal sized strata), and summary statistics give limited information.

In order to better reflect real-world study performance, we carried out a series of simulations with 2 treatment arms, and stratification factors that are unequally populated (e.g., 1:2, 1:2:3, or a power law distribution).

To measure predictability, we modified the potential selection bias (Efron, Blackwell-Hodges), in which an observer guesses the next treatment to be the one that has previously occurred least often in the stratum containing the subject (i.e., limiting the observer’s knowledge to individual strata, such as site). This reflects a game-theory model of randomization that pits observers against the statistician, and it is easy to calculate and interpret.

To measure imbalances, we calculated efficiency loss using Atkinson’s method because: the main impact of imbalances on the outcomes of a study is a loss of statistical power; even if treatments are balanced overall, imbalances within small strata can have a disproportionate impact on efficiency; and it is easy to interpret as lost sample size.

We applied these methods to evaluate the performance of several popular and novel randomization methods for a variety of parameters, including methods based on permuted blocks, dynamic allocation/minimization, and urn designs. Simulation results were summarized with the median and confidence intervals to estimate best & worst-case scenarios as well as expected performance.

Results showed consistent trade-offs between efficiency and unpredictability across methods and parameters, indicating that no single ‘best’ method optimizes both.

Randomization Metrics: Jointly assessing predictability and efficiency loss in randomization designs

Abstract

Randomization Metrics: Median and Confidence Intervals

Discussion

Atkinson, A.C. (2003). The distribution of loss in two-treatment biased-coin designs. Biostatistics 4(2), 179–193.

Blackwell, D. and Hodges, J. Jr. (1957). Design for the control of selection bias. Ann Math Stat 28, 449-460.

Wikipedia contributors. "Entropy (information theory)." Wikipedia, The Free Encyclopedia, 23 Apr. 2013. Web. 14 May 2013.

Lebowitsch, J., et al. (2012). “Generalized multidimensional dynamic allocation method”. Statistics in Medicine, 2012;

Bibliography

Questions:
• What is the best method for randomizing a clinical trial?
• What is the best method for randomizing this kind of clinical trial?
• What is the best method for randomizing this design of clinical trial in this population of patients?

Objective: Develop a set of quantifiable metrics
• To evaluate performance of covariate-adjusted randomization methods and their parameters
• For specific study designs
• In realistic study populations

Objectives

Dennis E. Sweitzer, Ph.D Principal Biostatistician

Simulation Strategy:
➣ Simulate 200 subjects for each of 500 simulations
➣ Assign each subject values of covariates: {Sex, Age, Site}
➣ Randomize each set of subjects by each randomization method
➣ Evaluate each randomization schedule using proposed metrics for subsets of: 25, 50, 100, 200
➣ Summarize metrics across simulations with medians and 80% Confidence Intervals
➣ Plot metrics

Simulated Study Populations:
➣ Stratify by Age and Sex
➣ Use Age & Sex for covariate-adjusted randomization
➣ Site is an additional covariate included in the analysis model, but not in randomization

NB: Age, Sex, and Site are proxies for real-world covariates, which rarely are distributed evenly.
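The covariate-assignment step can be sketched as follows. This is a minimal illustration, not the poster's actual simulation code: it assumes Sex, Age, and Site are drawn independently with the ratios shown on the poster (Sex 2:1, Age 3:2:1, Site 1 : ½ : ⅓), and the function name `simulate_population` and the level labels are hypothetical.

```python
import random

def simulate_population(n=200, seed=0):
    """Draw n subjects with unequally populated covariates.

    Assumption: the three covariates are sampled independently of each other.
    """
    rng = random.Random(seed)
    sexes = rng.choices(["F", "M"], weights=[2, 1], k=n)                 # Sex 2:1
    ages = rng.choices(["Mid", "Young", "Older"], weights=[3, 2, 1], k=n)  # Age 3:2:1
    sites = rng.choices(["a", "b", "c"], weights=[1, 1 / 2, 1 / 3], k=n)   # Site 1 : 1/2 : 1/3
    return [
        {"sex": sex, "age": age, "site": site}
        for sex, age, site in zip(sexes, ages, sites)
    ]

# One simulated study of 200 subjects; a full run would repeat this 500 times
# with different seeds and randomize each population by every method.
population = simulate_population(n=200, seed=1)
```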

Simulation Methods

Comparing methods and their parameters:
☞ (CR) Complete Randomization, also used as comparator
☞ (PB) Stratified Permuted Block: 1:1, 2:2, 3:3, 4:4, 6:6, 8:8
   Treatments are assigned in blocks; e.g., for 2:2: {TTPP, TPTP, TPPT, …}
☛ Dynamic Allocation / Minimization, weighted to:
   ☞ (DAS) Stratified: Balance on strata
   ☞ (DAM) Marginal: Balance on marginal balances
   ☞ (DAE) Equal: Balance with equal weight on strata, marginal and overall balances.
   Treatments are assigned to best reduce the weighted sum of imbalances within stratification factors; e.g., randomizing a young male to PLA might worsen the imbalance within males, but improve it within the Young. The choice of weights will determine the treatment.
☛ 2nd Best Probabilities of: {0, 0.15, 0.25, 0.35, 0.5}
   This is the probability of assigning a treatment which worsens the imbalance. This parameter adds randomness to the algorithm.
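The two families of methods above can be sketched in a few lines. This is an illustrative simplification under stated assumptions, not the algorithm used for the poster: imbalance is taken as the absolute difference in arm counts, the weight names (`stratum`, `sex`, `age`, `overall`) and all function names are hypothetical, and `phi` stands for the 2nd-best probability.

```python
import random
from collections import defaultdict

ARMS = ("T", "P")

def permuted_block_schedule(n, half_block, rng):
    """Permuted blocks (half_block:half_block) for the subjects of one stratum."""
    schedule = []
    while len(schedule) < n:
        block = ["T"] * half_block + ["P"] * half_block
        rng.shuffle(block)           # random order within each block
        schedule.extend(block)
    return schedule[:n]

def new_counts():
    """Arm counts per balancing cell (stratum, margin, or overall)."""
    return defaultdict(lambda: {"T": 0, "P": 0})

def minimization_assign(subject, counts, weights, phi, rng):
    """Assign one subject by weighted minimization with 2nd-best probability phi."""
    cells = {
        "stratum": ("stratum", subject["sex"], subject["age"]),
        "sex": ("sex", subject["sex"]),
        "age": ("age", subject["age"]),
        "overall": ("overall",),
    }

    def score(arm):
        # Weighted sum of absolute imbalances if this subject received `arm`.
        total = 0.0
        for name, cell in cells.items():
            t = counts[cell]["T"] + (arm == "T")
            p = counts[cell]["P"] + (arm == "P")
            total += weights.get(name, 0.0) * abs(t - p)
        return total

    score_t, score_p = score("T"), score("P")
    if score_t == score_p:
        arm = rng.choice(ARMS)       # tie: pure coin flip
    else:
        best, second = ("T", "P") if score_t < score_p else ("P", "T")
        arm = second if rng.random() < phi else best
    for cell in cells.values():
        counts[cell][arm] += 1
    return arm
```

In this sketch, weights of `{"stratum": 1}` would correspond loosely to DAS, `{"sex": 1, "age": 1}` to DAM, and equal weights on all four cells to DAE; `phi = 0` is fully deterministic apart from ties, while `phi = 0.5` reduces to complete randomization.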

Randomization Methods

Simulated population distribution:

Sex (2:1):           Female 67%    Male 33%

Age Group (3:2:1)    All     Female (♀)    Male (♂)
Mid. Aged            50%     33%           17%
Young                33%     22%           11%
Older                16%     11%           5%

Site (1 : ½ : ⅓)     a       b       c
                     55%     27%     18%

[Figure: Pla vs Test allocation counts illustrating the types of imbalance. Overall imbalance over all subjects; marginal imbalances within sexes (Males, Females), ignoring age groups; marginal imbalances within age groups (18-35 yo, 35-65 yo, >65 yo), ignoring sex; imbalances within strata (all combinations of Age x Sex).]

Potential Selection Bias, limited to the knowledge available to an observer (Blackwell-Hodges, 1957)
Observer always “guesses” to restore balance

Example: For treatment sequence “TPPP”
Initial guess ⟶ Expectation = ½
“T” ⟶ Imbalance = +1 ⟶ Guess “P” ⟶ Correct
“TP” ⟶ Imbalance = 0 ⟶ Guess either ⟶ Expectation = ½
“TPP” ⟶ Imbalance = -1 ⟶ Guess “T” ⟶ Wrong
“TPPP” ⟶ # Correct = ½ + 1 + ½ + 0 = 2
Score = # Correct - # Expected = 2 - 2 = 0 (# Expected = 4 guesses × ½ = 2 under pure chance)

Use: Potential Selection Bias (Strata): Based only on imbalances observed within Strata
Also: Potential Selection Bias (Site): Based only on imbalances observed within sites

• Customize this metric to reflect the information available to study personnel, typically only at their site.

Measuring Predictability

For an analysis model:

$$E(y) = z\alpha + X\beta$$

z ≣ treatment allocation
α ≣ treatment effect
X ≣ covariates / design matrix
β ≣ covariate effects

The variability of the treatment estimate is:

$$\mathrm{Var}(\hat{\alpha}) = \frac{\sigma^{2}}{z^{t}z - z^{t}X(X^{t}X)^{-1}X^{t}z}$$

And

$$\mathrm{LOE} = z^{t}X(X^{t}X)^{-1}X^{t}z$$

is the Loss of Efficiency, or loss of statistical power (Atkinson, 2003).

LOE ≈ # Subjects “Wasted” by sub-optimal treatment assignments

☞ In a Designed Experiment, one can select z and X to minimize LOE
☞ In a Randomized Controlled Trial, one can only assign treatments z, with very little control over X (which is determined by the subjects arriving in the study)

Use: Model is: Treatment + Sex + Age + Sex*Age + Site

Measuring Impact on Analysis
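The loss of efficiency can be computed numerically without forming the matrix inverse. The sketch below is illustrative only: it assumes z is coded +1 / -1 for the two arms and that X contains an intercept column plus the covariate columns, and the function name `loss_of_efficiency` is hypothetical. Under that coding, zᵗz = n, so the variance above is σ² / (n − LOE), which is why LOE reads as a number of lost subjects.

```python
import numpy as np

def loss_of_efficiency(z, X):
    """Return z' X (X'X)^{-1} X' z, the squared length of z projected onto X.

    Uses least squares rather than an explicit inverse, so a rank-deficient
    design matrix (e.g. an empty stratum) does not raise an error.
    """
    z = np.asarray(z, dtype=float)
    X = np.asarray(X, dtype=float)
    coef, *_ = np.linalg.lstsq(X, z, rcond=None)   # regress z on the covariates
    fitted = X @ coef                              # projection of z onto col(X)
    return float(z @ fitted)

# Intercept-only design: 3 subjects on one arm, 1 on the other.
z = [1, 1, 1, -1]
X = np.ones((4, 1))
loe = loss_of_efficiency(z, X)   # (sum of z)^2 / n for an intercept-only model
```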

Simulation results are summarized as medians + 80% confidence intervals.
Meaning: 10% of simulations were beyond the upper limit, and 10% were beyond the lower limit.

Y-axis: Measures of randomness (Syntropy or Potential Selection Bias)

X-axis: Loss of efficiency
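As a sketch of how such a summary could be computed, the median and the 80% interval are simply the 50th, 10th, and 90th percentiles of a metric across simulations. The function name `summarize_metric` is hypothetical, and NumPy's default linear interpolation between order statistics is an assumption rather than something stated on the poster.

```python
import numpy as np

def summarize_metric(values, coverage=0.80):
    """Return (lower, median, upper) so that `coverage` of simulations lie inside.

    With coverage=0.80, 10% of simulated values fall below `lower`
    and 10% fall above `upper`.
    """
    tail = (1.0 - coverage) / 2.0 * 100.0          # percent cut off in each tail
    lower, median, upper = np.percentile(
        np.asarray(values, dtype=float), [tail, 50.0, 100.0 - tail]
    )
    return float(lower), float(median), float(upper)

# Example: summarize one metric (e.g. LOE at N = 200) over 500 simulated trials.
rng = np.random.default_rng(0)
simulated_loe = rng.gamma(shape=2.0, scale=1.0, size=500)   # placeholder values only
low, mid, high = summarize_metric(simulated_loe)
```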

Note on Graphic Results

[Figure: Potential Selection Bias (y-axis, 0% to 25%) vs Loss of Efficiency (x-axis, 0 to 6). Simulation results as 80% confidence intervals. Labeled methods: DA(0), Margin Balance; PB(1:1).]

Potential Selection Bias vs LOE: all methods on one graph
☞ Clear trade-off between LOE and Potential Selection Bias

Results

[Figure: four scatter panels, each with x-axis ticks 1 to 8. Panels "N= 200" and "N= 25" with y-axis ticks 0.00 to 0.25; panels "N= 25, Syntropy" and "N= 200, Syntropy" with y-axis ticks 0.0 to 0.8. Methods plotted in every panel: CR; PB1, PB2, PB3, PB4, PB6, PB8; DAS0, DAS15, DAS25, DAS35, DAS50; DAE0, DAE15, DAE25, DAE35, DAE50; DAM0, DAM15, DAM25, DAM35, DAM50.]

Syntropy vs LOE ☞ Clear trade off between LOE and Potential Selection Bias

Potential Selection Bias vs LOE: as N increases, all methods become more efficient; some become more predictable, while DAM becomes less predictable.

Results, Increasing N

[Figure: four panels of Pot.Sel.Bias.Strata (y-axis, 0.00 to 0.25) vs LOE (x-axis, 1 to 8), with points labeled n= 25, n= 50, n= 100, n= 200 for each parameter setting. Panel titles: "Permuted Block, 1:1, 3:3, 8:8 + CR"; "DA (Strata), phi=0, 0.15, 0.35, 0.5"; "DA (Equal Wts), phi=0, 0.15, 0.35, 0.5"; "DA (Margin), phi=0, 0.35, 0.5".]

(Shannon) Entropy, from information theory:

$$H = -\sum_{i} p_i \log(p_i)$$

Syntropy: Rescale Entropy to [0,1] so that:

$$\mathrm{Syntropy} = 1 - \frac{H}{\max(H)} = 1 - \frac{H}{\log(2)}$$

Where: p_i = Pr{Treatment i}
0 = Max Randomness, 1 = Min Randomness

• Essentially this is the maximum possible average predictability if an observer had complete information about the algorithm and the subjects being randomized.

• H/log(2) is interpretable as the theoretical lower limit of the number of bits required to encode a randomization schedule.

Measuring Randomness
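For a single assignment, the rescaling can be sketched as below. The function name `syntropy` is hypothetical; the sketch takes the allocation probabilities for one subject and normalizes by log(k) for k arms, which equals log(2) in the two-arm case used on the poster. How per-subject values are aggregated over a schedule is not shown here.

```python
import math

def syntropy(probabilities):
    """Return 1 - H / max(H) for one assignment's allocation probabilities.

    0 means maximally random (all arms equally likely);
    1 means fully determined (one arm has probability 1).
    """
    if len(probabilities) < 2:
        raise ValueError("need at least two treatment arms")
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Shannon entropy, using the convention 0 * log(0) = 0.
    entropy = -sum(p * math.log(p) for p in probabilities if p > 0)
    return 1.0 - entropy / math.log(len(probabilities))

coin_flip = syntropy([0.5, 0.5])        # complete randomization
biased = syntropy([0.85, 0.15])         # e.g. a 2nd-best probability of 0.15
deterministic = syntropy([1.0, 0.0])    # e.g. the last subject in a permuted block
```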