WHY BAYES? INNOVATIONS IN CLINICAL TRIAL DESIGN & ANALYSIS

Donald A.

2

Conclusion: These data add to the growing evidence that supports the regular use of aspirin and other NSAIDs … as effective chemopreventive agents for breast cancer.

3

Results: Ever use of aspirin or other NSAIDs … was reported in 301 cases (20.9%) and 345 controls (24.3%) (odds ratio 0.80, 95% CI 0.66-0.97).
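Taken at face value, the counts give a crude odds ratio. A minimal sketch, assuming a simple unadjusted 2×2 analysis with a Woolf (log-scale) 95% interval; the published 0.80 (0.66-0.97) may come from a covariate-adjusted model that the slide does not describe, so this is not the authors' calculation.

```python
import math


def crude_odds_ratio(exposed_cases, total_cases, exposed_controls, total_controls):
    """Crude (unadjusted) odds ratio with a Woolf log-scale 95% CI."""
    a = exposed_cases                      # cases who ever used aspirin/NSAIDs
    b = total_cases - exposed_cases        # cases who never used them
    c = exposed_controls                   # controls who ever used them
    d = total_controls - exposed_controls  # controls who never used them
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lo = math.exp(math.log(or_) - 1.96 * se_log)
    hi = math.exp(math.log(or_) + 1.96 * se_log)
    return or_, lo, hi


or_, lo, hi = crude_odds_ratio(301, 1442, 345, 1420)
print(f"crude OR {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

With these counts the sketch gives a crude OR of about 0.82 (0.69-0.98), close to the published figure but not identical.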

4

Bayesian analysis?

Naïve Bayesian analysis of “Results” is wrong

Gives Bayesians a bad name

Any naïve frequentist analysis is also wrong

5

What is Bayesian analysis?

Bayes' theorem: $\pi'(\theta \mid X) \propto \pi(\theta)\, f(X \mid \theta)$

Assess prior $\pi$ (subjective; include available evidence)

Construct model $f$ for data

6

Implication: The Likelihood Principle

Where X is observed data, the likelihood function $L_X(\theta) = f(X \mid \theta)$ contains all the information in an experiment relevant for inferences about $\theta$
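An illustration of what the principle implies (added here, not from the talk, borrowing the 13 A's and 4 B's used later): whether the experimenter fixed 17 pairs in advance (binomial sampling) or sampled until the 4th B (negative binomial sampling), the two likelihoods differ only by a constant factor, so Bayes' theorem with the same prior returns the same posterior. The grid and the uniform prior are assumptions of the sketch.

```python
from math import comb

# Same data, two hypothetical sampling plans:
#   binomial:          17 pairs fixed in advance, 13 A's observed
#   negative binomial: sample until the 4th B, which arrived at pair 17


def binomial_lik(p):
    return comb(17, 13) * p**13 * (1 - p) ** 4


def neg_binomial_lik(p):
    # the last pair must be a B; the other 3 B's fall among the first 16 pairs
    return comb(16, 3) * p**13 * (1 - p) ** 4


grid = [i / 100 for i in range(1, 100)]
ratios = [binomial_lik(p) / neg_binomial_lik(p) for p in grid]  # constant: 2380/560


def grid_posterior(lik):
    # Uniform prior on the grid, so posterior is proportional to the likelihood
    weights = [lik(p) for p in grid]
    total = sum(weights)
    return [w / total for w in weights]


post_bin = grid_posterior(binomial_lik)
post_nb = grid_posterior(neg_binomial_lik)
max_diff = max(abs(x - y) for x, y in zip(post_bin, post_nb))
```

The constant ratio (4.25) cancels on normalization, so the posteriors agree; frequentist p-values, which depend on the sampling plan, can differ between the two designs.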

7

Short version of LP: Take data at face value

Data: Among cases: 301/1442; among controls: 345/1420

But “Data” is deceptive: these are not the full data

8

The data

Methods: “Population-based case-control study of breast cancer”; “Study design published previously”

Aspirin/NSAIDs? (2.25-hr questionnaire) Includes superficial data: among cases 301/1442; among controls 345/1420

Other studies (& the fact that it was published!!)

9

Silent multiplicities

Are the most difficult problems in statistical inference

Can render what we do irrelevant—and wrong!

10

Which city is furthest north? Portland, OR; Portland, ME; Milan, Italy; Vladivostok, Russia

11

Beating a dead horse . . . Piattelli-Palmarini (Inevitable Illusions) asks: “I have just tossed a coin 7 times.” Which did I get?

1: THHTHTT    2: TTTTTTT

Most people say 1. But “the probabilities are totally even.”

Most people are right; he’s totally wrong! Data: He presented us with 1 & 2!


12

THHTHTT or TTTTTTT? LR = Bayes factor of 1 over 2:

$$\mathrm{LR} = \frac{P(\text{wrote 1\&2} \mid \text{got 1})}{P(\text{wrote 1\&2} \mid \text{got 2})}$$

With even prior odds, LR > 1 means P(Got 1 | Wrote 1&2) > 1/2. E.g.: LR = (1/2)/(1/42) = 21, so P(Got 1 | Wrote 1&2) = 21/22 ≈ 95%. [Probs “totally even” only if a coin was used to generate the alternative sequence]
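The arithmetic above, as a minimal sketch with exact fractions. The conditional probabilities 1/2 and 1/42 are the slide's illustrative values and the even prior odds are an assumption; the second call shows why the probabilities are "totally even" only when a fair coin generated the alternative sequence.

```python
from fractions import Fraction


def posterior_got_1(p_wrote_given_1, p_wrote_given_2, prior_1=Fraction(1, 2)):
    """P(got sequence 1 | wrote down 1 & 2) via Bayes' theorem in odds form."""
    lr = p_wrote_given_1 / p_wrote_given_2     # Bayes factor of 1 over 2
    prior_odds = prior_1 / (1 - prior_1)
    post_odds = lr * prior_odds
    return post_odds / (1 + post_odds)


# Slide's example: LR = (1/2)/(1/42) = 21
p = posterior_got_1(Fraction(1, 2), Fraction(1, 42))
# If a fair coin produced the other sequence, each is written with prob (1/2)**7
even = posterior_got_1(Fraction(1, 2**7), Fraction(1, 2**7))
```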

13

[Figure: Marker/dose interaction. DFS vs. years (0 to 7) for Std, Hi and Low dose. Marker negative: Std (96), Hi (95), Low (93). Marker positive: Std (41), Hi (38), Low (36).]

14

Proportional hazards model

Variable       Comp       RelRisk   P
#PosNodes      10/1       2.7       <0.001
MenoStatus     pre/post   1.5       0.05
TumorSize      T2/T1      2.6       <0.001
Dose           –          –         NS
Marker         50/0       4.0       <0.001
Marker×Dose    –          –         <0.001

This analysis is wrong!

15

Data at face value? How were they identified? Why am I showing you these results? What am I not showing you? What do related studies show?

16

Solutions? Short answer: I don’t know!

A solution: Supervise the experiment yourself; become an expert on the substance

Partial solution: Supervise the supervisors; learn as much substance as you can

Danger: You risk projecting yourself as uniquely scientific

17

A consequence

Statisticians come to believe NOTHING!!

18

OUTLINE

Silent multiplicities

Bayes and predictive probabilities

Bayes as a frequentist tool

Adaptive designs: adaptive randomization; investigating many phase II drugs; seamless Phase II/III trial; adaptive dose-response; extraim analysis

Trial design as decision analysis

19

Bayes in pharma and FDA …

http://www.cfsan.fda.gov/~frf/bayesdl.html

http://www.prous.com/bayesian2004/

23

BAYES AND PREDICTIVE PROBABILITY

Critical component of experimental design

In monitoring trials

24

Example calculation. Data: 13 A's and 4 B's

Likelihood: $p^{13}(1-p)^{4}$

25

Posterior density of p for uniform prior: Beta(14,5)

[Figure: posterior density $\propto p^{13}(1-p)^{4}$ for p from 0 to 1]

26

Laplace’s rule of succession

$$P(\text{A wins next pair} \mid \text{data}) = E\left[P(\text{A wins next pair} \mid \text{data}, p)\right] = E(p \mid \text{data}) = \text{mean of Beta}(14, 5) = 14/19$$
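A quick numerical check of Laplace's rule (a sketch, assuming the uniform prior above): integrating p against the unnormalized posterior p^13 (1 - p)^4 with a midpoint rule reproduces the closed form (13 + 1)/(17 + 2) = 14/19.

```python
# Midpoint-rule integration over p in (0, 1)
m = 200_000
h = 1.0 / m
num = 0.0
den = 0.0
for i in range(m):
    p = (i + 0.5) * h
    w = p**13 * (1 - p) ** 4   # uniform prior times likelihood (unnormalized)
    num += p * w
    den += w
post_mean = num / den          # E(p | data)


def rule_of_succession(successes, trials):
    # Posterior mean of p under a uniform prior
    return (successes + 1) / (trials + 2)
```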

27

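Updating with the next observation

If A wins (probability 14/19), the posterior becomes Beta(15, 5); if B wins (probability 5/19), it becomes Beta(14, 6).

[Figure: the two possible updated posterior densities of p, 0 to 1]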
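The two branches can be checked exactly (a minimal sketch): weighting the two possible updated posterior means by their predictive probabilities returns the current mean 14/19. This is the law of total expectation; posterior means form a martingale.

```python
from fractions import Fraction

a, b = 14, 5                         # current posterior Beta(14, 5)
mean_now = Fraction(a, a + b)        # P(A wins next pair | data) = 14/19

p_a_wins = mean_now                  # 14/19 -> posterior Beta(15, 5)
p_b_wins = 1 - mean_now              # 5/19  -> posterior Beta(14, 6)
mean_if_a = Fraction(a + 1, a + b + 1)
mean_if_b = Fraction(a, a + b + 1)

# Average of tomorrow's posterior mean equals today's posterior mean
expected_next_mean = p_a_wins * mean_if_a + p_b_wins * mean_if_b
```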

28

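Suppose 17 more observations

$$P(\text{A wins } x \text{ of } 17 \mid \text{data}) = E\left[P(\text{A wins } x \mid \text{data}, p)\right] = E\left[\binom{17}{x} p^{x}(1-p)^{17-x} \,\middle|\, \text{data}\right]$$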
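The expectation has a closed form, the beta-binomial distribution: averaging the binomial probability over the Beta(14, 5) posterior gives C(17, x) B(14 + x, 5 + 17 - x) / B(14, 5). A minimal sketch, using log-gamma for the Beta functions:

```python
from math import comb, exp, lgamma


def log_beta(a, b):
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_binomial_pmf(x, n, a, b):
    """P(x successes in n future trials | p ~ Beta(a, b))."""
    return comb(n, x) * exp(log_beta(a + x, b + n - x) - log_beta(a, b))


n_future, a, b = 17, 14, 5       # 17 more pairs; current posterior Beta(14, 5)
predictive = [beta_binomial_pmf(x, n_future, a, b) for x in range(n_future + 1)]
pred_mean = sum(x * q for x, q in enumerate(predictive))
```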

29

Best fitting binomial vs. predictive probabilities

[Figure: probabilities of x = 0 to 17 A wins: binomial with p = 14/19 vs. predictive with p ~ Beta(14, 5)]
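Both distributions in the figure have the same mean, 17 × 14/19, but the predictive one is wider because it carries the remaining uncertainty about p. A minimal sketch with exact fractions, using the standard beta-binomial variance formula:

```python
from fractions import Fraction

n, a, b = 17, 14, 5
p_hat = Fraction(a, a + b)                     # best-fitting p = 14/19

var_binomial = n * p_hat * (1 - p_hat)
# Beta-binomial variance: n a b (a + b + n) / ((a + b)^2 (a + b + 1))
var_predictive = Fraction(n * a * b * (a + b + n), (a + b) ** 2 * (a + b + 1))
inflation = var_predictive / var_binomial      # equals (a + b + n) / (a + b + 1)
```

Here the predictive variance is 9/5 times the binomial variance, i.e. 36/20.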

30

Comparison of predictive with posterior

[Figure: predictive distribution of x (0 to 17) shown alongside the posterior density $\propto p^{13}(1-p)^{4}$ of p (0 to 1)]

31

Example: Baxter’s DCLHb & predictive probabilities

Diaspirin Cross-Linked Hemoglobin: blood substitute for emergency trauma. Randomized controlled trial (1996+)

Treatment: DCLHb; control: saline. N = 850 (= 2 × 425). Endpoint: death

32

Waiver of informed consent. Data Monitoring Committee (DMC). First DMC meeting:

         DCLHb      Saline
Dead     21 (43%)   8 (20%)
Alive    28         33
Total    49         41

P-value? No formal interim analysis

33

Predictive probability of future results (after n = 850)

Probability of significant survival benefit for DCLHb after 850 patients: 0.00045

DMC paused the trial. Covariates? No imbalance. DMC stopped the trial
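A sketch of how such a predictive probability can be computed. The prior and the final test are not given on the slides, so both are assumptions here: independent uniform priors on each arm's mortality, and a pooled two-proportion z-test at two-sided 0.05 on all 425 patients per arm. Under those assumptions the result need not match the 0.00045 reported to the DMC exactly.

```python
from math import comb, exp, lgamma, sqrt


def log_beta(a, b):
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_binomial_pmf(x, n, a, b):
    return comb(n, x) * exp(log_beta(a + x, b + n - x) - log_beta(a, b))


dead_d, n_d = 21, 49        # DCLHb deaths / patients at first DMC meeting
dead_s, n_s = 8, 41         # saline
per_arm = 425               # planned N = 850 = 2 x 425

# Assumed uniform Beta(1, 1) priors -> posteriors for each arm's mortality
a_d, b_d = 1 + dead_d, 1 + n_d - dead_d
a_s, b_s = 1 + dead_s, 1 + n_s - dead_s
m_d, m_s = per_arm - n_d, per_arm - n_s       # patients still to come per arm

pmf_d = [beta_binomial_pmf(x, m_d, a_d, b_d) for x in range(m_d + 1)]
pmf_s = [beta_binomial_pmf(x, m_s, a_s, b_s) for x in range(m_s + 1)]


def significant_benefit(deaths_d, deaths_s, n=per_arm, z_crit=1.96):
    # Assumed final analysis: pooled two-proportion z-test, with "benefit"
    # meaning DCLHb mortality significantly lower than saline
    pooled = (deaths_d + deaths_s) / (2 * n)
    se = sqrt(pooled * (1 - pooled) * 2 / n)
    return (deaths_d / n - deaths_s / n) / se < -z_crit


# Future deaths in the two arms are independent given the current data
pred_prob = sum(
    qd * qs
    for xd, qd in enumerate(pmf_d)
    for xs, qs in enumerate(pmf_s)
    if significant_benefit(dead_d + xd, dead_s + xs)
)
```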

34




35

BAYES AS A FREQUENTIST TOOL

Design a Bayesian trial; check operating characteristics; adjust design to get α = 0.05. The result is a frequentist design. That’s fine! We have 50+ such trials at MDACC
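A minimal sketch of the "design Bayesian, check frequentist" loop. The design is hypothetical (not an MDACC protocol): a single-arm trial with null response rate 0.2, a uniform prior, and looks after 10, 20, 30 and 40 patients, stopping for success whenever P(p > 0.2 | data) exceeds a threshold. The type I error is computed exactly by tracking the distribution of responses among trials still running, and the threshold is raised until α ≤ 0.05.

```python
from math import comb


def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def post_prob_above(x, n, p0):
    # Uniform prior -> Beta(1 + x, 1 + n - x) posterior. For integer parameters,
    # P(p > p0 | data) = P(Binomial(n + 1, p0) <= x).
    return sum(binom_pmf(k, n + 1, p0) for k in range(x + 1))


def type_one_error(threshold, p0=0.2, looks=(10, 20, 30, 40)):
    """Exact P(declare success | true rate = p0) when the trial stops for
    success at any look with P(p > p0 | data) > threshold."""
    running = {0: 1.0}          # responses so far -> prob. trial still running
    n_prev, alpha = 0, 0.0
    for n in looks:
        m = n - n_prev
        nxt = {}
        for x, w in running.items():
            for k in range(m + 1):
                nxt[x + k] = nxt.get(x + k, 0.0) + w * binom_pmf(k, m, p0)
        running = {}
        for x, w in nxt.items():
            if post_prob_above(x, n, p0) > threshold:
                alpha += w      # stop and declare success
            else:
                running[x] = w  # continue to the next look
        n_prev = n
    return alpha


def calibrate(target=0.05):
    # Smallest threshold on a 0.001 grid from 0.900 to 0.999 with alpha <= target
    for t in range(900, 1000):
        if type_one_error(t / 1000) <= target:
            return t / 1000
    return None


threshold = calibrate()
```

The same loop applies to any Bayesian rule: only the stopping criterion inside `type_one_error` changes; more complex designs usually need simulation instead of exact enumeration.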

36




37

ADAPTIVE DESIGN

Look at accumulating data … without blushing

Update probabilities; find predictive probabilities; modify future course of trial

Give details in protocol; simulate to find operating characteristics

38




39

Giles et al., JCO (2003): Troxacitabine (T) in acute myeloid leukemia (AML) when combined with cytarabine (A) or idarubicin (I)

Adaptive randomization to: IA vs TA vs TI

Max n = 75. End point: CR (time to CR < 50 days)

40

Adaptive randomization: Assign 1/3 to IA (standard) throughout (unless only 2 arms); adaptive to TA and TI based on current results

Final results
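A sketch of the allocation rule just described. How the remaining 2/3 was split between TA and TI is not specified on the slide; the rule below (proportional to the posterior probability that each is the better of the two, with uniform priors, estimated by Monte Carlo) is a hypothetical choice, as are the interim counts in the example.

```python
import random


def prob_greater(s1, n1, s2, n2, draws=20_000, seed=1):
    """Monte Carlo estimate of P(rate1 > rate2 | data), independent Beta(1, 1) priors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        r1 = rng.betavariate(1 + s1, 1 + n1 - s1)
        r2 = rng.betavariate(1 + s2, 1 + n2 - s2)
        wins += r1 > r2
    return wins / draws


def randomization_probs(ta, ti):
    """ta, ti: (CRs, patients). IA keeps 1/3; TA and TI share the other 2/3
    in proportion to the posterior probability that each is the better one
    (hypothetical splitting rule)."""
    q = prob_greater(*ta, *ti)
    return {"IA": 1 / 3, "TA": 2 / 3 * q, "TI": 2 / 3 * (1 - q)}


# Hypothetical interim counts
probs = randomization_probs(ta=(1, 4), ti=(0, 4))
```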

41

Patient  Prob IA  Prob TA  Prob TI  Arm  CR<50
1        0.33     0.33     0.33     TI   not
2        0.33     0.34     0.32     IA   CR
3        0.33     0.35     0.32     TI   not
4        0.33     0.37     0.30     IA   not
5        0.33     0.38     0.28     IA   not
6        0.33     0.39     0.28     IA   CR
7        0.33     0.39     0.27     IA   not
8        0.33     0.44     0.23     TI   not
9        0.33     0.47     0.20     TI   not
10       0.33     0.43     0.24     TA   CR
11       0.33     0.50     0.17     TA   not
12       0.33     0.50     0.17     TA   not
13       0.33     0.47     0.20     TA   not
14       0.33     0.57     0.10     TI   not
15       0.33     0.57     0.10     TA   CR
16       0.33     0.56     0.11     IA   not
17       0.33     0.56     0.11     TA   CR

42

Patient  Prob IA  Prob TA  Prob TI  Arm  CR<50
18       0.33     0.55     0.11     TA   not
19       0.33     0.54     0.13     TA   not
20       0.33     0.53     0.14     IA   CR
21       0.33     0.49     0.18     IA   CR
22       0.33     0.46     0.21     IA   CR
23       0.33     0.58     0.09     IA   CR
24       0.33     0.59     0.07     IA   CR
25       0.87     0.13     0        IA   not
26       0.87     0.13     0        TA   not
27       0.96     0.04     0        TA   not
28       0.96     0.04     0        IA   CR
29       0.96     0.04     0        IA   not
30       0.96     0.04     0        IA   CR
31       0.96     0.04     0        IA   not
32       0.96     0.04     0        TA   not
33       0.96     0.04     0        IA   not
34       0.96     0.04     0        IA   CR

Compare with max n = 75 (34 patients treated)

Drop TI

43

Summary of results

CR rates: IA 10/18 = 56%; TA 3/11 = 27%; TI 0/5 = 0%

Criticisms . . .

44




45

Example: Adaptive allocation of therapies

Design for phase II: many drugs. Advanced breast cancer (MDA); endpoint is tumor response

Goals: treat effectively; learn quickly

46

Comparison: Standard designs

One drug (or dose) at a time; no drug/dose comparisons

Typical comparison by null hypothesis: response rate = 20%

Progress is slow!

47

Standard designs

One stage, 14 patients: if 0 responses then stop; if ≥ 1 response then phase III

Two stages, first stage 20 patients: if ≤ 4 or ≥ 9 responses then stop; else second set of 20 patients
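The operating characteristics of these standard designs follow from binomial probabilities. A minimal sketch; the slide does not say whether stopping with ≥ 9 first-stage responses means moving on to phase III, so the function only reports the three first-stage outcomes.

```python
from math import comb


def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def one_stage_continue(p, n=14):
    # Go to phase III if at least 1 response among 14 patients
    return 1 - binom_pmf(0, n, p)


def two_stage_first_stage(p, n1=20, low=4, high=9):
    """Probabilities of the first-stage outcomes of the two-stage design."""
    stop_low = sum(binom_pmf(k, n1, p) for k in range(low + 1))          # <= 4 responses
    stop_high = sum(binom_pmf(k, n1, p) for k in range(high, n1 + 1))    # >= 9 responses
    return {"stop_low": stop_low, "stop_high": stop_high,
            "second_stage": 1 - stop_low - stop_high}


# A drug with a true 20% response rate is dropped by the one-stage design
# with probability 0.8**14, about 0.044
drop_20 = 1 - one_stage_continue(0.20)
oc_20 = two_stage_first_stage(0.20)
oc_40 = two_stage_first_stage(0.40)
```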

48

An adaptive allocation: when assigning the next patient, find r = P(rate ≥ 20% | data) for each drug [or r = P(drug is best | data)]

Assign drugs in proportion to r; add drugs as they become available; drop drugs that have small r; drugs with large r go to phase III
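A minimal sketch of this allocation, assuming uniform priors on each drug's response rate (so r has an exact binomial form) and a hypothetical cutoff of 0.05 for "small r"; the drug names and interim counts are made up for illustration.

```python
from math import comb


def prob_rate_at_least(responses, patients, target=0.20):
    # Uniform prior -> Beta(1 + x, 1 + n - x); for integer parameters,
    # P(rate >= target | data) = P(Binomial(n + 1, target) <= x)
    n = patients + 1
    return sum(comb(n, k) * target**k * (1 - target) ** (n - k)
               for k in range(responses + 1))


def allocation(data, target=0.20, drop_below=0.05):
    """data: {drug: (responses, patients)} -> assignment probabilities."""
    r = {d: prob_rate_at_least(x, n, target) for d, (x, n) in data.items()}
    active = {d: v for d, v in r.items() if v >= drop_below}   # drop small r
    total = sum(active.values())
    return {d: v / total for d, v in active.items()}


# Hypothetical interim data for three drugs
probs = allocation({"drug1": (3, 6), "drug2": (1, 6), "drug3": (0, 14)})
```

In this example drug3 (0 responses in 14) has r = 0.8**15, about 0.035, so it is dropped, and drug1 gets most of the remaining assignments.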

49

Suppose 10 drugs, 200 patients

9 drugs have mix of response rates 20% & 40%; 1 (“nugget”) has 60%

Standard 2-stage design finds nugget with probability < 70% (after 110 patients on average)

Adaptive design finds nugget with probability > 99% (after about 50 patients on average)

Adaptive also better at finding 40%

50

Suppose 100 drugs, 2000 patients

99 drugs have mix of response rates 20% & 40%; 1 (“nugget”) has 60%

Standard 2-stage design finds nugget with probability < 70% (after 1100 patients on average)

Adaptive design finds nugget with probability > 99% (after about 500 patients on average)

Adaptive also better at finding 40%

51

Consequences

Recall goals: (1) treat effectively; (2) learn quickly

Attractive to patients, in and out of the trial

Better drugs identified faster; move through faster

52




53

Example: Seamless phase II/III

Drug vs placebo, randomized. Local control (or biomarker, etc.): early endpoint related to survival? May depend on treatment

*Inoue et al. (2002, Biometrics)

54

[Figure: conventional drug development (Phase II, then Phase III; 6 mos, 9-12 mos, > 2 yrs) vs. seamless phase II/III (< 2 yrs, usually). Phase II outcome: local control / no local control (stop); Phase III outcome: survival advantage / no survival advantage (market / not)]

55

Seamless phases

Phase II: two centers; 10 pts/mo.; drug vs placebo. If predictive probabilities look good, expand to

Phase III: many centers; 40+ pts/mo. (Initial centers accrue during set-up)

Max sample size: 900

[Single trial: survival data from both phases combined in final analysis]

56

Early stopping: use predictive probabilities of statistical significance. Frequent analyses (total of 18) using predictive probabilities:

To switch to Phase III

To stop accrual, for futility or for efficacy

To submit NDA

57

Comparisons

Conventional Phase III designs: Conv4 & Conv18, max N = 900 (same power as adaptive design)

58

Expected N under H0: Bayes 431; Conv4 855; Conv18 884

59

Expected N under H1: Bayes 649; Conv4 887; Conv18 888

60

Benefits: duration of drug development is greatly shortened under adaptive design

Fewer patients in trial

No hiatus for setting up phase III

Use all patients to assess phase III endpoint and relationship between local control and survival

61

Possibility of large N: N seldom near 900; when it is, it’s necessary! This possibility gives the Bayesian design its edge [Other reason for edge is modeling local control/survival]

62




*Berry et al., Case Studies in Bayesian Statistics (2001)

64

Example: Stroke and adaptive dose-response

Adaptive doses in Phase II setting: learn efficiently and rapidly about dose-response relationship

Pfizer trial of a neutrophil inhibitory factor; results recently announced

Endpoint: stroke scale at week 13. Early endpoints: weekly stroke scale

65

Standard parallel group design: equal sample sizes at each of k doses [Figure: response vs. doses]

66

[Figure: true dose-response curve (unknown); response vs. doses]

67

[Figure: observe responses (with error) at chosen doses]

68

[Figure: true ED95, the dose at which 95% of the maximum effect is reached; response vs. doses]
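The slides do not give a parametric form for the dose-response curve. Assuming, for illustration only, a sigmoid Emax (Hill) curve whose maximum effect is its upper asymptote, ED95 has a closed form; the parameter values below are hypothetical.

```python
def emax_response(dose, e0, emax, ed50, hill):
    """Sigmoid Emax (Hill) dose-response curve: an assumed model."""
    return e0 + emax * dose**hill / (ed50**hill + dose**hill)


def ed_fraction(frac, ed50, hill):
    # Solve d^h / (ED50^h + d^h) = frac  =>  d = ED50 * (frac / (1 - frac))**(1 / h)
    return ed50 * (frac / (1 - frac)) ** (1 / hill)


ed50, hill = 0.3, 3.0                      # hypothetical parameter values
ed95 = ed_fraction(0.95, ed50, hill)       # = 0.3 * 19**(1/3), about 0.80
```

Under this model ED95 depends on both ED50 and the steepness h, which is one reason uncertainty about the curve translates into wide uncertainty about ED95.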

69

[Figure: response vs. dose, showing the true ED95 and the uncertainty about ED95 (“??”)]

70

[Figure: response vs. dose, showing the uncertainty about ED95 (“??”)]

71

[Figure: Solution: increase number of doses (ED95 marked); response vs. doses]

72

[Figure: But, enormous sample size, and . . . wasted dose assignments, always! (ED95 marked); response vs. doses]

73

Our adaptive approach

Observe data continuously

Select next dose to maximize information about ED95, given available evidence

Stop dose-ranging trial when ED95 & response at ED95 are known “sufficiently well”

74

Our approach (cont’d): info accrues gradually about each patient; prediction using longitudinal model

[Figure: longitudinal model from the Copenhagen Stroke Database: difference from baseline in SSS at week 3 vs. difference from baseline in SSS at week 12]

75

Our approach (cont’d)

Model dose-response (borrow strength from neighboring doses)

Many doses (logistical issues)
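The slides do not specify the dose-response model. One common way to borrow strength across ordered doses is a normal dynamic linear (random-walk) prior on the dose means; the sketch below assumes that prior with normal errors. The random-walk SD τ, the vague prior on the lowest dose, and the example data are hypothetical; SD = 12 is taken from the simulation settings in this talk.

```python
def smooth_dose_means(ybar, counts, sigma, tau, prior_sd=100.0):
    """Posterior means of dose means theta_1..theta_k under an assumed prior
    theta_{j+1} - theta_j ~ N(0, tau^2), theta_1 ~ N(0, prior_sd^2), with
    ybar_j ~ N(theta_j, sigma^2 / n_j). Doses with n_j = 0 borrow from neighbours."""
    k = len(ybar)
    lam = 1 / tau**2
    # Tridiagonal posterior precision: diagonal entries and off-diagonal (-lam)
    diag = [counts[j] / sigma**2 for j in range(k)]
    off = [-lam] * (k - 1)
    for j in range(k - 1):
        diag[j] += lam
        diag[j + 1] += lam
    diag[0] += 1 / prior_sd**2
    rhs = [counts[j] * ybar[j] / sigma**2 for j in range(k)]
    # Thomas algorithm: forward sweep
    c = [0.0] * k
    d = [0.0] * k
    c[0] = off[0] / diag[0] if k > 1 else 0.0
    d[0] = rhs[0] / diag[0]
    for j in range(1, k):
        denom = diag[j] - off[j - 1] * c[j - 1]
        c[j] = off[j] / denom if j < k - 1 else 0.0
        d[j] = (rhs[j] - off[j - 1] * d[j - 1]) / denom
    # Back substitution
    m = [0.0] * k
    m[-1] = d[-1]
    for j in range(k - 2, -1, -1):
        m[j] = d[j] - c[j] * m[j + 1]
    return m


# Hypothetical data: doses 2 and 4 (0-based 1 and 3) have no patients yet
means = smooth_dose_means(ybar=[2.0, 0.0, 6.0, 0.0, 10.0],
                          counts=[4, 0, 4, 0, 4], sigma=12.0, tau=2.0)
```

Under this prior an untested interior dose is estimated as the average of its neighbours' estimates, so information flows across the dose grid.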

76

Possible decisions each day: stop trial and drug’s development; stop and set up confirmatory trial; continue dose-finding (what dose?)

Size of confirmatory trial based on info from dose-ranging phase

Choices by decision analysis (Human safeguard: DSMB)

77

Dose-response trial

Learn efficiently and rapidly about dose-response; if + go to Phase III

Assign dose to maximize info about dose-response parameters given current info

Use predictive probabilities, based on early endpoints

Doses in continuum, or preset grid

78

Dose-response trial (cont’d)

Learn about SD on-line

Halt dose-ranging when know dose sufficiently well

Seamless switch from dose-ranging to confirmatory trial: 2 trials in 1!

79

Advantages over standard design

Fewer patients (generally); faster & more effective learning

Better at finding ED95

Tends to treat patients in trial more effectively

Drops duds early (actual trial!)

80

Dose-assignment simulation

Assumes particular dose-response curve

Assumes SD = 12

Shows weekly results, several patients at a time (green circles)

81

[Figure: prior for the dose-response curve, E(f) vs. DOSE (0.0 to 1.5)]

82–133

[Animation, one frame per week: simulated data, Y vs. DOSE (0.0 to 1.5); green = observed, blue = imputed, black = true mean. Final frame marks the estimated ED95 and the switch to the confirmatory trial.]

134

[Figure: assigned doses by week (weeks 0 to 30), one simulation]

135–136

[Figures: estimated functions across simulations (black: median; red: upper & lower quartiles; green: nominal); doses assigned across all simulations (proportion vs. assigned dose, 0 to 1.5)]

137–138

[Same two figures for a scenario with no dose effect]

139

Consequences of Using Bayesian Adaptive Approach

Fundamental change in the way we do medical research

More rapid progress

We’ll get the dose right!

Better treatment of patients . . . at less cost

140

Reactions

FDA: Positive. “Makes coming to work worthwhile.” “In five years all trials may be seamless.”

Pfizer management: Enthusiastic

Other companies: Cautious

141




142

Example: Extraim analysis

Endpoint: CR (detect 0.42 vs 0.32); 80% power: N = 800

Two extraim analyses, one at 800, another after up to 300 added pts; maximum n = 1400 (only rarely)

Accrual: 70/month; delay in assessing response

143

After 800 patients, have response info on 450

Find predictive probability of stat. significance when full info on 800, and also when full info on 1400

Continue if . . . Stop if . . . If continue, n via predictive power

Repeat at second extraim analysis

Table 1 (p0 = 0.42):

p1     P(succ)  meanSS  sdSS   P(800)  P(1400)  P(succ1)  P(succ2)
0.37   0.0001   844.6   122.0  0.8707  0.0194   0.0001    0.0001
0.42   0.0243   1011.2  247.6  0.5324  0.2360   0.0084    0.0059
0.47   0.4467   1188.5  254.5  0.2568  0.5484   0.1052    0.0914
0.52   0.9389   1049.9  248.7  0.4435  0.2693   0.4217    0.2590
0.57   0.9989   874.2   149.1  0.7849  0.0268   0.7841    0.1729

vs 0.80

145




146

Decision-analytic approach

For each trial design …

List possible results; calculate their predictive probabilities; evaluate their utilities

Average utilities by probabilities to give utility of trial with that design

Compare utilities of various designs; choose design with high utility

147

Choosing sample size*

Special case of above, with one utility: effective overall treatment of patients, both those in the trial and those after it

Example, dichotomous endpoint: maximize expected number of successes over all patients

*Cheng et al. (2003, Biometrika)
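A toy version of this decision problem (not the model of Cheng et al.): two arms with n patients each, independent uniform priors on the success rates, and after the trial all remaining N - 2n patients receive the arm with more observed successes. Under these assumptions each arm's success count is marginally uniform on 0..n, and the expected success rate of the chosen arm works out to (4m - 1)/(6m) with m = n + 1, so the expected number of successes over the horizon can be maximized over n directly.

```python
def expected_successes(n, horizon):
    """Expected successes over the patient horizon N for the toy design:
    n per arm, uniform priors, winner (more successes) given to the rest."""
    m = n + 1
    in_trial = n                                   # 2n patients, prior mean 1/2 each
    after_trial = (horizon - 2 * n) * (4 * m - 1) / (6 * m)
    return in_trial + after_trial


def best_n(horizon):
    # Brute-force search over the number of patients per arm
    return max(range(horizon // 2 + 1), key=lambda n: expected_successes(n, horizon))


n_opt = best_n(1_000_000)
```

In this toy model the optimum satisfies 2(n + 1)^2 ≈ N + 2, so the number of patients per arm grows like the square root of the horizon (about 706 per arm for N = 1,000,000) rather than in proportion to it.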

148

Compare: Joffe & Weeks, JNCI, Dec 18, 2002

“Many respondents viewed the main societal purpose of clinical trials as benefiting the participants rather than as creating generalizable knowledge to advance future therapy. This view, which was more prevalent among specialists such as pediatric oncologists that enrolled greater proportions of patients in trials, conflicts with established principles of research ethics.”

149

Maximize effective treatment overall

What is “overall”? All patients who will be treated with therapies assessed in trial. Call it N, the “patient horizon”

Enough to know mean of N; enough to know magnitude of N: 100? 1000? 1,000,000?

150

Goal: maximize expected number of successes in N. Either one- or two-armed trial. Suppose n = 1000 is right for N = 1,000,000; then for other N’s use n =

151

Optimal allocations in a two-armed trial

152

Knowledge about success rate r

153



