RDP Statistical Methods in Scientific Research - Lecture 3 1
Lecture 3
The design of scientific investigations
3.1 Considerations and terminology
3.2 Agricultural field trials
3.3 Clinical trials
3.4 General design considerations
3.1 Considerations and terminology
Units of observation
These are the items from which responses are recorded
They may be people, human families, pots of tomatoes, agricultural field plots, samples of river water, washing machines, etc.
One response (of each type) is taken from each unit
Factors
These might be items under the control of the investigator:
• drugs administered to rats
• fertilizers administered to crops
• additives put into petrol
or out of the investigator's control, but to be explored or adjusted for:
• ages of volunteers
• season of the year
• temperature during the experiment
They might be quantitative (dose of drug, amount of additive) or qualitative (active or control, male or female)
Responses
These are the measurements of interest:
• did the rat develop cancer (yes or no)?
• what was the crop yield?
• how many miles per litre were achieved?
There may be several different responses collected:
• number of tomatoes
• total weight of tomatoes
• quality of tomatoes
There will be only one response of each type per unit of observation
Analysis
An exploration of the way in which the distribution of responses changes according to the values of the factors
Exploratory: find out what factors have an influence on the response
Hypothesis testing: find out whether one factor really does have an effect on response
Estimation: determine the magnitude of the effect of a factor or factors on response
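3.2 Agricultural field trials
Complete randomised block experiment
[Figure: field layout; direction of slope runs across the three blocks. Each column below is one block of 8 plots.]

Block 1   Block 2   Block 3
n p K     n p k     n P K
n p k     N P K     N P k
N p k     n p K     N p K
n P K     n P k     N P K
N P K     N p k     n P k
n P k     n P K     N p k
N P k     N p K     n p k
N p K     N P k     n p K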
The factors are:
• extra nitrogen (N) or not (n)
• extra phosphorus (P) or not (p)
• extra potassium (K) or not (k)
This gives 2^3 = 8 treatments
The field is sloping from right to left, otherwise homogeneous: it is split into 3 blocks, internally homogeneous but different from one another
Each block is divided into 8 plots, which are the units of observation
A different treatment is applied to each plot; the arrangement within blocks is at random
Response will be yield of grain
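The layout described above can be sketched in a few lines of Python. This is only an illustration: the function name `randomised_block_layout` and the seed are assumptions, not part of the lecture, and the arrangement it prints will not match the one on the slide.

```python
import itertools
import random


def randomised_block_layout(n_blocks=3, seed=None):
    """Return one randomly ordered list of the 8 NPK treatments per block."""
    rng = random.Random(seed)
    # Upper case = extra nutrient applied, lower case = not applied.
    treatments = ["".join(combo) for combo in itertools.product("nN", "pP", "kK")]
    layout = []
    for _ in range(n_blocks):
        plots = treatments[:]   # every block is complete: all 8 treatments
        rng.shuffle(plots)      # arrangement within the block is at random
        layout.append(plots)
    return layout


layout = randomised_block_layout(seed=1)
for number, plots in enumerate(layout, start=1):
    print(f"Block {number}:", " ".join(plots))
```

Shuffling separately within each block is what distinguishes this from shuffling all 24 plots at once: the slope can then only affect comparisons between blocks, not between treatments.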
Each block is a replicate of the others: the more replicates, the greater the precision of the experiment
Each block is complete: every treatment is included
There are 24 plots, 1 overall effect, 7 treatment effects and 2 block effects, leaving 24 - 1 - 7 - 2 = 14 degrees of freedom for the estimation of variability
$y = \mu + \alpha_i + \beta_j + \varepsilon$, where $\alpha_1 = 0$ and $\beta_1 = 0$
(here $\mu$ is the overall effect, $\alpha_i$ the effect of treatment $i$, $\beta_j$ the effect of block $j$ and $\varepsilon$ the random variation)
The separate and combined effects of N, P and K can be explored; usually interactions with blocks are not fitted
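The degrees-of-freedom count above can be checked with a short calculation. The sketch below is illustrative only; the function name `randomised_block_df` is an assumption. It also splits the 7 treatment effects by order, using the standard result that a 2^3 factorial has 3 main effects, 3 two-factor interactions and 1 three-factor interaction.

```python
from math import comb


def randomised_block_df(n_factors=3, n_blocks=3):
    """Degrees of freedom for a complete block design with two-level factors."""
    n_treatments = 2 ** n_factors
    n_plots = n_treatments * n_blocks
    treatment_df = n_treatments - 1
    block_df = n_blocks - 1
    # One degree of freedom is used by the overall effect.
    residual_df = n_plots - 1 - treatment_df - block_df
    # Treatment effects split by order: main effects, two-factor interactions, ...
    by_order = {order: comb(n_factors, order) for order in range(1, n_factors + 1)}
    return {
        "plots": n_plots,
        "treatment": treatment_df,
        "block": block_df,
        "residual": residual_df,
        "treatment_by_order": by_order,
    }


df = randomised_block_df()
print(df)
```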
An analogous situation

Car 1     Car 2     Car 3
a b C     a b c     a B C
a b c     A B C     A B c
A b c     a b C     A b C
a B C     a B c     A B C
A B C     A b c     a B c
a B c     a B C     A b c
A B c     A b C     a b c
A b C     A B c     a b C
The factors are presence or absence of three petrol additives (A, B and C)
The response is the emission of polluting chemicals over one hour of running
Each of the eight treatments (formed by combining the additives) is tried in each of three cars (which take the place of the blocks)
The experimental structure is the same as the field experiment
Here, car × treatment interactions may be of interest
Variations and compromises
The complete randomised block experiment is an ideal
Often there are compromises
• covariates at each plot to account for (e.g. moisture content)
• number of plots per block differs from number of treatments (e.g. tomato)
• number of plots varies from block to block
Design the experiment to come close to the ideal: the analysis will allow for the real situation
Split plots
The spraying machine covers 3 plots
Three varieties are to be compared (for pest infestation)

Spray 1 Var A     Spray 2 Var A     Spray 3 Var C
Spray 1 Var B     Spray 2 Var C     Spray 3 Var A
Spray 1 Var C     Spray 2 Var B     Spray 3 Var B

Spray 3 Var C     Spray 1 Var A     Spray 2 Var B
Spray 3 Var A     Spray 1 Var B     Spray 2 Var A
Spray 3 Var B     Spray 1 Var C     Spray 2 Var C

Spray 2 Var B     Spray 3 Var C     Spray 1 Var A
Spray 2 Var A     Spray 3 Var B     Spray 1 Var B
Spray 2 Var C     Spray 3 Var A     Spray 1 Var C
Such a structure is common:
• children within classes
• fruits within trees
• repeated episodes within individuals (cross-over study)
The analysis of such experiments is routine, but the nature of the experimental structure must be taken into account
If each split-plot is of size 2, then a paired t-test might be used
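For the size-2 case, the paired t-test works on the within-pair differences. The sketch below uses only the Python standard library; the function name `paired_t_statistic` and the five pairs of measurements are made up for illustration and do not come from the lecture.

```python
import math
import statistics


def paired_t_statistic(first, second):
    """Return the paired t statistic and its degrees of freedom."""
    differences = [a - b for a, b in zip(first, second)]
    n = len(differences)
    mean_diff = statistics.mean(differences)
    sd_diff = statistics.stdev(differences)   # sample SD, divisor n - 1
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Hypothetical responses from five pairs (e.g. two sub-plots within each plot).
treated = [12.1, 9.8, 11.4, 10.6, 13.0]
control = [10.9, 9.1, 10.2, 10.8, 11.7]

t, df = paired_t_statistic(treated, control)
print(f"t = {t:.2f} on {df} degrees of freedom")
```

Taking differences within each pair removes the variation between pairs, which is how the test respects the experimental structure.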
3.3 Clinical trials
Randomised intervention studies
An experimental drug (E) is to be compared with a control treatment (C) in a population of patients diagnosed with the condition in question
Units of observation: individual patients
Factors:
• treatment: experimental or control
• baseline prognostic factors such as age, severity of condition
Response: a measure of efficacy
• reduction in blood pressure after one month
• measure of functionality 90 days after a stroke
• time from entry to the trial until death
Analysis
Straightforward:
Compare the patients receiving E with those receiving C in terms of the efficacy response, while adjusting for baseline prognostic factors
Typical strategy:
• Fit a linear model (for normally distributed, binary, ordinal or survival data)
• Include prognostic factors first, then add treatment: is there a significant effect?
• Check whether prognostic factor × treatment interactions are important
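For a normally distributed response, the strategy above might be written as a sequence of nested models. This is a sketch under assumptions: a single baseline prognostic factor $x$ (for example age) and a treatment indicator $z$ ($z = 1$ for E, $z = 0$ for C). The symbols $\gamma$, $\delta$ and $\theta$ are chosen here for illustration and are not taken from the lecture.

```latex
\begin{align*}
\text{Prognostic factor only:} \quad & y = \mu + \gamma x + \varepsilon \\
\text{Add treatment:} \quad & y = \mu + \gamma x + \delta z + \varepsilon \\
\text{Add interaction:} \quad & y = \mu + \gamma x + \delta z + \theta x z + \varepsilon
\end{align*}
```

Comparing the first two models asks whether $\delta$, the treatment effect adjusted for $x$, differs from zero; comparing the last two asks whether the treatment effect varies with $x$.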
Hard part
Ensuring that any differences found between E and C really are due to treatment
Strategies:
• randomisation
• blindness
Note:
In agricultural field trials, all plots are sown and then harvested at the same time
In clinical trials, patients enter one by one over a period of time, as they are diagnosed, and they are treated immediately
Randomisation
When each patient is diagnosed, first assess eligibility and obtain consent, then allocate to treatment:
• toss a coin (heads E, tails C): completely random allocation
• throw a die to allocate the next four patients (1 EECC, 2 ECEC, 3 ECCE, 4 CEEC, 5 CECE, 6 CCEE): random permuted blocks
• phone an Interactive Voice Response System (IVRS), where random allocation is made so as to favour comparability of the two groups in terms of prognostic factors: minimisation
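The first two allocation methods can be sketched as follows. The die-to-block mapping is the one given above; the function names, the seed and the choice of 12 patients are assumptions made for illustration.

```python
import random

# Die score -> order of treatments for the next four patients.
BLOCKS_OF_FOUR = {
    1: "EECC", 2: "ECEC", 3: "ECCE",
    4: "CEEC", 5: "CECE", 6: "CCEE",
}


def coin_toss_allocation(n_patients, rng):
    """Completely random allocation: each patient gets E or C with equal chance."""
    return [rng.choice("EC") for _ in range(n_patients)]


def permuted_block_allocation(n_patients, rng):
    """Random permuted blocks: one die throw fixes the next four allocations."""
    allocations = []
    while len(allocations) < n_patients:
        die = rng.randint(1, 6)
        allocations.extend(BLOCKS_OF_FOUR[die])
    return allocations[:n_patients]


rng = random.Random(2024)
simple = coin_toss_allocation(12, rng)
blocked = permuted_block_allocation(12, rng)
print("coin toss:      ", "".join(simple))
print("permuted blocks:", "".join(blocked))
```

With permuted blocks the two groups are equal in size after every fourth patient, whereas coin tossing can leave the groups unequal by chance.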
Randomisation
Each method would be implemented by computer, either in advance (giving allocations in sealed envelopes) or on-line
The random element ensures that the two treatment groups are as comparable as possible
• no choosing the treatment having met the patient
• no predicting the next allocation when assessing eligibility
By chance, some imbalance between treatment groups may remain, so there is still a need to adjust for prognostic factors
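Minimisation, listed among the allocation methods, aims to limit such imbalance during allocation itself. The sketch below is a deliberately simplified version and not necessarily what any particular IVRS does: each arm is scored by how many prognostic factor levels the new patient shares with patients already on that arm, and the lower-scoring arm is chosen with a probability `p_favoured` (0.8 here, an arbitrary assumption). All names and the example patients are hypothetical.

```python
import random

ARMS = ("E", "C")


def imbalance_score(new_patient, allocated, arm):
    """Count factor levels the new patient shares with patients already on `arm`."""
    return sum(
        1
        for patient, patient_arm in allocated
        if patient_arm == arm
        for factor, level in new_patient.items()
        if patient.get(factor) == level
    )


def minimisation_allocate(new_patient, allocated, rng, p_favoured=0.8):
    """Allocate to the arm with fewer similar patients, keeping a random element."""
    scores = {arm: imbalance_score(new_patient, allocated, arm) for arm in ARMS}
    if scores["E"] == scores["C"]:
        return rng.choice(ARMS)            # tie: plain coin toss
    favoured = min(ARMS, key=scores.get)   # arm with fewer similar patients
    other = "C" if favoured == "E" else "E"
    return favoured if rng.random() < p_favoured else other


rng = random.Random(7)
patients = [
    {"age": "under 60", "severity": "mild"},
    {"age": "under 60", "severity": "severe"},
    {"age": "60 or over", "severity": "mild"},
    {"age": "under 60", "severity": "mild"},
    {"age": "60 or over", "severity": "severe"},
    {"age": "60 or over", "severity": "mild"},
]
allocated = []
for patient in patients:
    arm = minimisation_allocate(patient, allocated, rng)
    allocated.append((patient, arm))
    print(patient, "->", arm)
```

Keeping `p_favoured` below 1 preserves the random element, so the next allocation cannot be predicted with certainty.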
Blindness
The patient should not know whether they are on E or C
• the control group receives a placebo identical to E
• this avoids bias in subjective assessments and decisions such as withdrawal from the study
The treating clinician should not know patients’ treatments
• this also avoids bias in subjective assessments and decisions such as withdrawal from the study
Not always possible: for example, surgery versus drug treatment
• could use a blind assessor
Analogous situations
• Psychological interventions in human subjects
• Educational interventions in children
• Animal experiments
Cluster randomised trials
Clusters of units of observation are randomised to treatment
• Classes of children are taught in different ways
• Groups of prisoners are supervised in different ways
The analysis follows the split-plot pattern
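In a cluster randomised trial the random allocation is applied to whole clusters, and every unit inherits the treatment of its cluster. A minimal sketch, assuming four hypothetical classes and two hypothetical teaching methods (all names below are illustrative):

```python
import random


def cluster_randomise(cluster_names, treatments, rng):
    """Randomly assign whole clusters to treatments in near-equal numbers."""
    order = list(cluster_names)
    rng.shuffle(order)
    # Deal the treatments out in turn over the shuffled clusters.
    return {name: treatments[i % len(treatments)] for i, name in enumerate(order)}


classes = {
    "Class 1": ["child 1", "child 2", "child 3"],
    "Class 2": ["child 4", "child 5"],
    "Class 3": ["child 6", "child 7", "child 8"],
    "Class 4": ["child 9", "child 10"],
}

rng = random.Random(11)
class_method = cluster_randomise(classes, ["method 1", "method 2"], rng)

# Each child receives the method allocated to his or her class.
child_method = {
    child: class_method[name]
    for name, children in classes.items()
    for child in children
}
print(class_method)
```

There are only four independent allocations here, not ten, which is why the analysis has to treat the class rather than the child as the randomised unit.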
The randomised clinical trial as a gold standard
John Snow’s cholera study of 1854: Snow (1855), see also MacMahon and Pugh (1970)

Water supply of individual houses   Population 1851   Deaths from cholera   Cholera death rate per 1000 population
Southwark & Vauxhall Company        98,862            419                   4.2
Lambeth Company                     154,615           80                    0.5
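The death rates in the table follow directly from the counts, as the short calculation below shows. The variable and function names are illustrative; the populations and deaths are those in the table.

```python
water_supply = {
    "Southwark & Vauxhall Company": {"population": 98_862, "deaths": 419},
    "Lambeth Company": {"population": 154_615, "deaths": 80},
}


def rate_per_1000(deaths, population):
    """Deaths per 1000 population."""
    return 1000 * deaths / population


rates = {
    company: rate_per_1000(row["deaths"], row["population"])
    for company, row in water_supply.items()
}
for company, rate in rates.items():
    print(f"{company}: {rate:.1f} per 1000")

# Ratio of the two rates: how many times higher the rate was for
# Southwark & Vauxhall customers than for Lambeth customers.
rate_ratio = rates["Southwark & Vauxhall Company"] / rates["Lambeth Company"]
print(f"rate ratio: {rate_ratio:.1f}")
```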
Choice of controls
Neither subjects nor households could be randomised to water company, but Snow writes:
In the sub-districts enumerated ... the mixing of the supply is of the most intimate kind. The pipes of each company go down all the streets, and into nearly all of the courts and alleys. A few houses are supplied by one Company and a few by the other, according to the decision of the owner or occupier when the Water Companies were in active competition. In many cases a single house has a supply different from that on either side. Each company supplies both rich and poor, both large houses and small: there is no difference either in the condition or occupation of the persons receiving the water of the different companies.
In other words: nearly as good as randomisation
3.4 General design considerations
• Decide on the objectives of your investigation: exploratory, hypothesis testing, estimation?
• Identify the units of observation and the factors of interest, both those under your control and those outside it
• Determine the responses to be collected from each unit of observation
• Work out how the data will be analysed when they have been collected
• Determine an appropriate sample size (next lecture)
• Write a protocol for your study, recording both the considerations above and the relevant details from your own subject (and any ethical considerations)
• Check with your supervisor and with a statistician