basics of clinical trial design msc in drug development, clinical pharmacology and translational...

47
MSc in Drug Development, Clinical Pharmacology and Translational Medicine BASICS of CLINICAL TRIAL DESIGN Janet Peacock Professor of Medical Statistics Division of Health and Social Care Research

Upload: miles-jefferson

Post on 16-Dec-2015

225 views

Category:

Documents


2 download

TRANSCRIPT

MSc in Drug Development, Clinical Pharmacology and Translational

Medicine

BASICS of CLINICAL TRIAL DESIGN

Janet Peacock Professor of Medical Statistics

Division of Health and Social Care Research

Content

Study question Comparison groups Randomisation Blinding and placebos Primary and secondary outcomes Analysis populations Choosing outcomes

I gratefully acknowledge use of slides from IRM marked with a **

QuizFind the best answer:

1. Type 1 error is when: b

2. Type 2 error is when: c

3. A non-significant finding in a phase 3 trial means: h

4. If the clinically important difference is increased: g

5. If the outcome is a mean rather than a proportion: g

6. A statistically significant difference in a superiority trial

means: i

7. Power of a study is: k

8. Significance level is: e

9. If an equivalence design is used rather than

superiority: f

10. If groups of individuals are randomised: f

a a significant difference is found in the study sample when there is a real difference in the population

b a significant difference is found in the study sample when there is no difference in the population

c no significant difference is found in the study sample when there is a real difference in the population

d no significant difference is found in the study sample when there is no difference in the population

e Usually set at 5%

f the sample size is increased

g the sample size is reduced

h the study cannot prove efficacy

i the new treatment works

j 1-type 1 error

k 1-type 2 error

Planning a Study

Question 1: What is the Study Question?

Phase I: How is the drug/treatment handled by the human body? Healthy volunteers or unresponsive patients

Phase II: What is the dose response curve? small group of patients

Phase III: Is the treatment better than placebo/std treatment? large patient population

Phase IV: What are the long-term effects, are there any drug interactions? post-marketing, large samples, long follow-up

** I gratefully acknowledge use of this slide from from IRM

Planning a StudyQuestion 2: Study Population? First specify inclusion exclusion criteria: The results can only be

generalized to patients who are similar to study participants

Patient Characteristics

Diagnostic test results (standardized)

Disease Duration

Disease Severity

Consider prevalence and patient numbers to calculate approximate sample size to achieve

Need to consider compliance and attrition

Multi-center collaborations increase the target population but introduce noise

** I gratefully acknowledge use of this slide from from IRM

Comparison groups

To discuss:

Why do we need a comparison group for a trial of a new treatment?

How could we use a historical control group? Any problems?

Comparison groups

Need concurrent comparison group

Avoids changes over time in: Other treatments patients receive as these may

change over time Other services & treatment by clinical staff

where practice changes and staff change Behaviours due to secular/cultural influences

eg media campaigns or media education etc

Allocation to treatment groups

To discuss:

Is it okay to let subjects choose between either of 2 new treatments and then compare the groups?

Could this cause any problems?

Randomisation

Allocation needs to be unbiased

ie not affected by patient characteristics

Use random allocation by computer program to do this

To ensure similar numbers per group use ‘block randomisation’

Not predictable to recruiting clinicians/researcher else may affect their actions

Block randomisation Used to ensure no. subjects in each group is similar Random allocation determined in discrete blocks so that within

each block there are equal numbers in each group Example using blocks of size 4, and 2 treatments A and B

Stratification Used when it is important to have balanced random

allocation in specific sub-groups defined by specific prognostic factors eg age, sex, severity of disease etc

For example in a trial conducted in several centres it may be important to ensure that the numbers on each treatment are similar within each centre

This is achieved by using a separate randomization list within each centre

Minimization

Non-random way of allocating subjects to treatments that maintains balance in several specific prognostic factors

Can be useful in a small trial where random allocation may by chance produce imbalance in key factors (less likely in large trials)

For an example see Altman and Bland BMJ 2005;330:843

Blinding & placebos

Discussion:

Does it matter if patients know what treatment there are receiving?

When might it matter most & least?

Should clinicians/researchers be blind to treatment? Why?

Blinding & placebos

Blinding:

Psychological effects in patients and assessors Randomization makes blinding possible Single, double-blind Placebos

single dummy (identical inert treatment) double dummies (used for blinding when 2

treatments have different modes eg liquid compared to tablet)

Primary, secondary outcomes

PRIMARY OUTCOME

Used to determine whether treatment is effective

Usually only have one

Usually a measure of efficacy

SECONDARY OUTCOMES

Used to look at other effects, positive (efficacy) and negative (safety [adverse events] or side effects)

Usually several

Analysis population in RCTs?

Intention to treat

In original randomised groups

Adherers: ‘per protocol’

Receive full protocol

Fully compliant

Treatment received

Regardless of allocation

Intention to treat (ITT) in RCTs

Analyse according to original randomised groups even if subjects drop-out/refuse/switch treatments

Preserves comparability between groups: unbiased

Difference can be directly attributed to treatment

Tests offer rather than receipt of treatment

Conservative (bias to null) if non-compliance

Usually the main analysis

Per protocol analysis in RCTs

Receive full protocol & fully compliant (omit others)

Reflects what happens in practice

May be biased as groups no longer comparable patients not included likely to be different ie difference not necessarily due to treatment

May be useful as a contributory/explanatory analysis but not usually main analysis

Treatment received in RCTs

Regardless of allocation

Real world

Likely to be biased

May be relevant with adverse events

Choosing outcomes

Suitable

Continuous vs dichotomous, time to event

Sizes of differences thought to be clinically meaningful

Consequences of too small or too big a study

Presenting proportions

Dichotomisation of continuous data?

Statisticians may raise objections when researchers turn continuous data into dichotomous data

…because it throws data away

.. and so loses power, precision, obscures/changes relationships

Subject

no.

Birthweight (g) LBW (BW<2500g;

0=no, 1=yes)

1 1600 1

2 2940 0

3 2920 0

4 4560 0

5 3400 0

6 2800 0

7 4510 0

8 3870 0

9 2810 0

10 3200 0

11 3660 0

12 1860 1

So why do doctors and researchers dichotomise?

Clinical Practice: cut-offs commonly used to define point at which treatment starts eg

anti-hypertensive treatment commonly starts when diastolic blood pressure ≥90mmHg

statins may be given if cholesterol level above say 5.3

Epidemiological/clinical research: cut-offs used to indicate poor outcome eg

low birthweight (<2500g) widely used as indicator of poor outcome of pregnancy at population level

specific cut-off for pain scores used as indication patient has ‘responded’ to pain treatment

Some statistical research: motivation

• RCT in diabetic pregnant women to reduce percentage of babies large-for-gestational age (LGA)

• Outcome: % LGA babies: currently 15%

• Reduction to 12% considered clinically relevant

• Required sample size is 2791 in each group

• Ie a total of 5582, with =5%, 1-=90%

• Not feasible

• But ...large-for-gestational age is based on dichotomising continuous variable, birthweight-for-gestation (z score)

Tail area in Normal distribution

For Normal distribution can calculate % above given cut-off given mean and SD

Use this principle to base study design on a continuous variable BUT also allows calculation of %

What do we mean by shift in means?

• Difference in means is directly related to a given difference in %s below a cut-off

• ‘Distributional method’ allows calculation of both differences in means and differences in %s without loss of precision

Ref: Peacock et al Statistics in Medicine 2012

Is study now feasible?Relative

change in LGA

(%)

% LGA in treated women

(vs 15% in untreated)

Equivalent change in

mean in SDs

Total SS for change in

means

(2xn)

Total SS for change in

proportions

(2xn)

30%

25%

20%

10.50%

11.25%

12.00%

0.217

0.177

0.139

896

1344

2178

2394

3510

5582

The table above shows the required sample size is much less when considering comparison of means rather than comparison of proportions

The study was then feasible

Results….

Can now get difference in means with 95% CI plus difference in proportions with derived 95% CI so meet needs of both stakeholders while maintaining statistical rigour

Estimates for proportions data in 2 groups, p1, p2

Difference in proportions: p1-p2

Ratio of proportions: p1/p2

Ratio of odds: p1/(1-p1) p2/(1-p2)

Number needed to treat:1/(p1-p2)

Which to use?

Choosing the estimate to report for proportions data in 2 groups

Difference in proportions: when absolute differences are of interest

Ratio of proportions: When relative differences matter eg if comparing effects

for lots of factors

Ratio of odds: Case-control study – it’s all you can do (plus logistic

regression)

Number needed to treat:

RCT and interested in how many patients need to treat to get positive outcome in one

Use as subsidiary to p1-p2 or p1/p2

Don't just report NNT

Calculating NNT

Suppose the proportions with successful outcome are: p1=0.8 (treatment group) p2=0.6 (placebo group)

p1 – p2 = 0.2

This is proportion of success over and above placebo

So for every one patient treated, 0.2 will be successful

So, need to treat five to get one success (1/0.2)

Hence NNT=1/(p1-p2) = 5 here

Sample size and power (1)

What happens if a study comparing 2 groups is too small?

Eg.in a small drug trial, difference between new & old drug is not significant – it’s hard to know if:

i) new drug really doesn’t work or ii) trial is too small to show a difference

Sample size and power (2)

Need to know study question & design before doing calculations (+ draft protocol)

Need idea of what size of effects we expect/hope to find

Want good precision for estimates

Want to minimise chance of drawing wrong conclusion, due to:i) poor precisionii) false positive (type 1 error) iii) false negative (type 2 error)

Type 1 error

We conclude there is a difference between the groups (ie get a significant finding) when there is no difference in the underlying population

ie by chance we get an unusual sample

This is defined by the cut-off for significance and is usually set at 0.05 or 5% -- known as the significance level

Note: this means we get a false significant result on average 1/20 times

Avoid by careful analysis:i) choice of significance level

ii) define questions in advance – no fishing!

Type 2 error

We conclude there is no difference when in fact there is a difference in the underlying population

100%-type 2 error is the power of the study and is usually set at ≥80%, preferably ≥90%

Avoid by good design:

i) Choose right outcome

ii) Large enough sample for question

iii) ie high power

Pragmatics

Sometimes sample size is constrained by time, and/or cost, and/or availability of subjects etc

In this case sample size calculations should still be presented to show that aims of study can still be achieved

If aims can’t be achieved then it may not be good to do study unless data can be pooled with other study data (meta-analysis)

What is a clinically important or clinically meaningful difference? Sample size calculations for comparative studies need estimate of

size of difference that would be considered important

ie size of difference that researcher would not want to fail to detect in his/her study

Researcher’s decision not statistician’s one

Can be difficult to decide on:- consult literature other studies colleagues.... if unsure

Trial phases and design

Early phase trials: Looking at tolerance/toxicity and may involve human volunteers or

animals (phase 1) & may be uncontrolled ‘First-in-man’ studies may be uncontrolled or small controlled

studies (phase 2) & test feasibility/dose/side effects/safety Conducted prior to large & conclusive phase 3 trial if drug/treatment

is ‘promising’ in early trials

Phase 3: What we have mostly referred to here ie where treatments are

randomly allocated in a way that mirrors how the treatment will be used

Usually tests efficacy

Phase 4: Post-marketing surveillance - safety

Trial designsSo far considered 2-group situation where looking at superiority. Other situations are:

Cross-over trials Two or more treatments are compared in a random order within

individuals Can only be used for chronic conditions such as pain

Sequential trials: Specifically designed where 2 parallel groups are treated and

studied but the trial stops when either a clear benefit emerges or there is no possibility of a difference

Equivalence trials/non-inferiority: Used when trialing a new drug that is expected to be at least as

good as an existing one but has benefits such as fewer side effects or cheaper

Needs specific design and generally needs larger sample than conventional (superiority) designs

Design: What is the hypothesis?

1. Superiority:

Objective To determine whether there is evidence of statistical difference in the comparison of interest between 2 treatments:

A: treatment of interest B: placebo or active control

Null (H0): The mean response is the same for the 2 treatments

ie A=B

Alternative (H1): The mean response is different for the 2 treatments

ie AB (either A>B or B>A)

** I gratefully acknowledge use of this slide from from IRM

2. Equivalence:

Objective To demonstrate that 2 treatments have no clinically meaningful difference

Null (H0): The 2 treatments have different mean responses such that: ie either (A-B) ≤ -d or (A-B) ≥ +d

implies: A not equivalent to B

Alternative (H1): The 2 treatments means are the same such that:

ie either –d < (A-B) < +dimplies A equivalent to B

Design: What is the hypothesis?

d = largest difference clinically acceptable

** I gratefully acknowledge use of this slide from from IRM

3. Non-Inferiority:

Objective To demonstrate that a given treatment A is not clinically inferior to another, B(maximum allowable clinically meaningful difference=d)

Null hypothesis H0: A given treatment is inferior with respect to the mean response

ie A-B ≤ -d

Alternative hypothesis H1: A given treatment is non-inferior with respect to the mean response

ie A-B > -d

Design: What is your Hypothesis

** I gratefully acknowledge use of this slide from from IRM

Interim Analysis

Interim analysis: analysis conducted before specified full sample size reached

Purpose: may allow trial to stop early if:- * strong evidence for superiority of either treatment* safety concerns* futility

Timing MUST be specified in protocol

Need to allow for additional testing of treatment difference in sample size calculations (‘spending p’)

** I gratefully acknowledge use of this slide adapted from IRM

Predictive Biomarker Validation*

*S.J. Mandrekar & D.J. Sargent. Clinical Oncology, 2009

A validated predictive marker can prospectively identify individuals who are likely to have a given clinical outcome.

Retrospective Validation

** I gratefully acknowledge use of this slide from from IRM

Predictive Biomarker Validation

Prospective Validation

① Targeted or Enrichment Design② Unselected or all-comers Design

a) Marker Based Designb) Sequential Testing Strategy Designc) Hybrid Design

** I gratefully acknowledge use of this slide from from IRM

Predictive Biomarker Validation

① Targeted or Enrichment Design

There is preliminary evidence that only patients who express a marker will benefit from the study treatment. All patients are screened, only biom patients recruited.✚

Appropriate when therapy has modest benefit on biom− patients, but cause sig. toxicity. Also if treating biom− is ethically impossible

** I gratefully acknowledge use of this slide from from IRM

② Unselected or All-Comers Designs

a) Marker-based designs

Marker status used as stratification factor, and randomize patients within each marker group.

- Only patients with valid marker result randomized- Sample size prospectively specified per marker group

Predictive Biomarker Validation

** I gratefully acknowledge use of this slide from from IRM

References & further reading

Janet Peacock & Philip Peacock

THE OXFORD HANDBOOK OF MEDICAL STATISTICS

Oxford University Press 2010(chapter 1 , p6-23 [clinical trials], 62-70 [sample size])

Janet Peacock & Sally Kerry

PRESENTING MEDICAL STATISTICS FROM PROPOSAL TO PUBLICATION

Oxford University Press 2006(chapter 3, p19-24 for how to do sample size calculations in Stata)