Michael Festing - MedicReS World Congress 2011


Upload: medicres

Post on 16-Apr-2017


Page 1: Michael Festing - MedicReS World Congress 2011

The design and statistical analysis of laboratory animal experiments

An interactive program aimed at helping scientists to improve the design of their animal experiments and reduce the number of animals which they use.

The main menu is designed to be worked through sequentially, but you are free to dip in wherever you want.

© Michael Festing

This disk may be copied for personal use but must not be used for commercial purposes and must not be sold.

[email protected]

Page 2

Home page

1. Why bother?
2. Types of experiment
3. The experimental unit
4. “A good experiment..”
5. Avoiding bias
6. Power and sample size
7. Controlling variability
8. Strains of mice and rats
9. Experimental designs
10. Factorial experiments
11. Statistical analysis
12. Presenting your results
13. The ARRIVE guidelines
14. Test yourself
15. Summary of main points

Page 3

Why bother?

Because there is a good chance that you will:
1. Improve the quality of your science
2. Get published in journals with a higher impact
3. Save yourself time when doing your research
4. Use fewer animals
5. Save money
6. Afford to do more experiments and buy better apparatus

This section introduces you to the ethical aspects of animal experimentation (the “3Rs”) and then shows you the results of some surveys of published papers which suggest that there is scope for improving animal research.

Picture: a black and tan rabbit

Page 4

Principles of Humane Experimental Technique

In 1959 two British scientists, Bill Russell and Rex Burch, wrote “The Principles of Humane Experimental Technique”, in which they introduced the “3Rs”. These provide a useful background against which we can assess each individual experiment.

Replacement:

Wherever possible live animals should be replaced by non-sentient or less sentient alternatives such as cell cultures, lower forms of life or even mathematical models.

Some people claim that you can’t replace animals in this way. But there are many examples where this has been possible. Rabbits and mice used to be used to assay batches of insulin, but now this can be done chemically. Monoclonal antibodies used to be grown as ascites tumours in mice, but this is now done in vitro. Even in toxicity testing there are a number of tests, such as the Ames test for mutagenesis, which, although they do not completely replace the use of animals, at least partially replace them so that fewer are used.

Russell, W.M.S. and R.L. Burch. 1959. The principles of humane experimental technique. Special Edition, Universities Federation for Animal Welfare. Potters Bar, England.

Page 5

Principles of Humane Experimental Technique

Refinement:

If the use of animals cannot be avoided, then pain, distress or lasting harm should be minimised. Anaesthesia and analgesia should be used where appropriate. In procedures which result in death, humane end-points should be used so that the animals are painlessly killed rather than being allowed to die in pain. Tumours should not be allowed to grow to an excessive size. The animals should be protected from disease and be provided with an enriched environment where they have sufficient space to be able to behave naturally. Food and water should only be withdrawn for strictly limited periods. Social animals such as mice and rats should be housed with other animals.

Picture: Russell and Burch

Page 6

Principles of Humane Experimental Technique

Reduction (the main subject of this program):

Investigators should use the minimum number of animals consistent with achieving the objectives of the study. This involves:
1. The development of a research strategy with clearly defined objectives that can be achieved with the available resources
2. Choice of a suitable animal model including genotype (inbred/outbred, mutant, genetically modified) if using rodents
3. Well designed experiments which use neither too many animals, so that resources are wasted, nor too few animals, so that important effects are missed
4. The correct statistical analysis of the results, including summary statistics such as means and standard deviations as well as indicators of uncertainty such as significance levels and confidence intervals.

Note that there is a possible conflict between Reduction and Refinement. Fewer animals are needed if the response to a treatment is greater. But giving a higher “dose” (or the equivalent) may involve more pain.

Page 7

Test yourself

For each question choose one of: Replacement, Refinement, Reduction.

1. A scientist improves the design of an experiment so that it gives more significant results. Which of the 3Rs does this count as?

2. Routine sterilisation of all materials entering an animal house helps prevent the entry of disease-causing micro-organisms. What does this count as?

3. A neuro-scientist finds that she can use rats instead of non-human primates for a project. What does this count as?

Page 8

Feedback

Reducing the number of animals used and getting more information out of an experiment both count as “Reduction”. In the latter case there will be fewer false negative results, so the research should progress more rapidly.

Preventing a disease from entering the animal house stops the animals getting sick, and therefore it counts as a Refinement. However, because the animals will be more uniform and because diseased animals may give the wrong results, it also counts as “Reduction”.

Rats are regarded as less sentient than non-human primates, so this counts as a Replacement. If she could do the work using Drosophila or C. elegans, that would be a further replacement.

Page 9

Survey of statistical quality of published papers

Surveys of published papers suggest that there is ample scope for improving the quality of animal experimentation. This one (McCance, 1995, Aust. Vet. Journal 72:322), commissioned by the editors, found that, in the opinion of the statistician:

61% would have required statistical revision had they been seen before publication
5% had errors so serious that the conclusions were not supported by the data
30% had deficiencies in the design of the studies, including failure to randomise, inappropriate size, heterogeneity of subjects and possible bias
45% had deficiencies in the statistical analysis, including the use of sub-optimal methods and errors in calculation
33% had deficiencies in the presentation of the results, including unexplained omission of data and inappropriate statistical methods

Page 10

A meta-analysis of 44 randomised controlled animal studies of fluid resuscitation

If humans or animals lose a large proportion of their blood they go into a state of shock. These studies used animal models to find ways of reducing mortality. But a meta-analysis of 44 such studies found that:

Only 2 said how animals had been allocated (i.e. whether they had been randomised)
None had sufficient power to detect reliably a halving in risk of death, a clinically relevant end point
There was substantial scope for bias in the experimental designs
There was substantial heterogeneity in results, due to the method of inducing the bleeding
As a result the odds ratios were impossible to interpret
The authors queried whether these animal experiments made any contribution to human medicine

Roberts et al 2002, BMJ 324:474

Page 11

Poor agreement between animal and human responses

In this study the authors looked at six interventions where the human outcome was known and then did a meta-analysis of all the animal papers on the same topic to see if the results were in agreement.

Intervention                                      Human results        Animal results (meta-analysis)                                  Agree?
Corticosteroids for head injury                   No improvement       Improved neurological outcome (n=17)                            No
Antifibrinolytics for surgery                     Reduces blood loss   Too little good quality data (n=8)                              No
Thrombolysis with TPA for acute ischemic stroke   Reduces death        Reduces death, but publication bias and overstatement (n=113)   Yes

First three interventions. Next page for rest of results.

Perel et al (2007) BMJ 334:197-200

Page 12

Poor agreement between animal and human responses

Three more interventions

Intervention                          Human results             Animal results (meta-analysis)                                 Agree?
Tirilazad for stroke                  Increases risk of death   Reduced infarct volume and improved behavioural score (n=18)   No
Corticosteroids for premature birth   Reduces mortality         Reduces mortality (n=56)                                       Yes
Bisphosphonates for osteoporosis      Increase bone density     Increase bone density (n=16)                                   Yes

Perel et al (2007) BMJ 334:197-200

Overall, in three of the six cases the response in animals and humans differed. Was this poor experimental design or were the models inadequate? This is not known, but the authors stated that in many cases the designs seemed to be inadequate.

Page 13

A survey of a random sample of 271 papers involving live mice, rats or non-human primates found

Of the papers studied:
5% did not clearly state the purpose of the study
6% did not indicate how many separate experiments were done
13% did not identify the experimental unit
26% failed to state the sex of the animals
24% reported neither age nor weight of animals
4% did not mention the number of animals used
0% justified the sample sizes used
In 35% of those which reported the numbers used, the numbers differed between the materials and methods and the results sections
etc.

Kilkenny et al (2009), PLoS One Vol. 4, e7824

Overall conclusion

Most papers reported their results inadequately. None justified the numbers they used, and in many cases the design of the experiments and/or the statistical analysis were inadequate.

Page 14

Test yourself

1. In the survey published in the Aust. Vet. J., what proportion of papers did the statistician think had defects in the design of the experiments?
10-20%   21-30%   31-40%   >40%

2. What proportion did he think would have required statistical revision had he seen them before publication?
10-20%   21-30%   31-40%   41-50%   51-60%   >60%

3. In the survey of 271 papers, what proportion gave a justification for the sample size which they used?
0-20%   21-30%   31-40%   >40%


Page 16

There are several types of experiment which you might use. These are:

Pilot study/experiment
The aim is to study the logistics of a proposed experiment and to obtain preliminary information. These experiments are usually small, and results should be treated with caution. It is probably best not to publish them. They may provide an estimate of the standard deviation which can later be used in a “power analysis” to determine sample size, but because of the small sample size that estimate may be inaccurate.

Exploratory experiment
The aim is to provide data to generate new hypotheses. Typically, these experiments may “work” or “not work”. There are often many outcomes (characters) measured. Statistical analysis may be problematical due to false positive results and data snooping (looking at the results then doing a test of those which seem most interesting). As a result, the p-values may not be correct. However, this sort of experiment can be useful at the start of a new project.

Page 17

Types of experiment

Confirmatory experiments
Formal hypothesis stated a priori. The double-blinded, randomised placebo-controlled experiment is the gold standard. p-values must be correct. It is this type of experiment which is explored in more detail here.

Other types of study
Experiments may be set up just to estimate means or standard deviations. “Uncontrolled” experiments (where there is no comparison between groups) are sometimes done. An LD50 test is an example. Regression and correlation studies may be done to estimate the relationship between variables (e.g. time and weight as in a growth curve).


Page 19

The experimental unit is

“The smallest division of the experimental material such that any two experimental units can receive different treatments”.

It is the unit of randomisation and of statistical analysis to compare groups.

The animals are all housed in one cage but the treatment is given by injection. Any two can receive different treatments, so the animal is the experimental unit and “N” (the total number of subjects) is 8

Page 20

The experimental unit

Here the animals are housed two per cage and the treatment is given in the feed or water. The two animals can not have different treatments.

What do you think is “N”, the total number of experimental units in this case?

2? 4? 8?

Page 21

Here are two tanks each with seven fish. The treatment is given in the water, and two fish have died.

What is N, the total number of experimental units?

One
Two
Seven
Fourteen

Page 22

It is sometimes possible to do within-animal experiments

In a crossover experiment an animal could be given a treatment for a period, then rested and given a different treatment for a period. In this case the experimental unit is an animal for a period of time. It is assumed that the treatment doesn’t alter the animal, so it has to be very mild.

In this experiment animals are given four treatments, sequentially in random order. What do you think is N? 3? 4? 12?

Page 23

Another within-animal experiment

The animal has had its back shaved and four treatments have been applied topically in random locations. In this case, N=12.

Page 24

Teratology: mother treated, young measured

Mothers, not the pups, are the experimental unit because pups from the same mother cannot receive different treatments.

N=2, n=1


Page 26

A well designed experiment should (summary):

1. Have a clear specification of the aims of the experiment.
The hypothesis to be studied needs to be clearly stated before planning the experiment. It would be a serious error to look at the results of the experiment and then adjust the hypothesis to fit them!

2. Be unbiased.
There should be no systematic differences between the treated and control groups apart from the effects of the treatment. Bias may result in false positive results when the effects of some other factor are assumed to be due to the treatment. Bias is avoided by correct identification of the experimental unit, blinding, and randomisation.

3. Be powerful.
If the treatment really has an effect, there should be a high chance that it can be detected. Experiments which lack power have too many false negative results. Power depends on sample size, control of variability and sensitivity of the subject.

Page 27

A well designed experiment should:

4. Have a wide range of applicability.
Where possible the extent to which a result can be generalised across strains, diets, environments or techniques should be known. An experiment where the results can only be replicated in some animal houses but not in others lacks generality. The range of applicability is explored using factorial designs.

5. Be simple.
Experiments should not be so complex that mistakes are made or they are impossible to interpret. Clearly written protocols should be used.

6. Be amenable to statistical analysis.
The statistical analysis and the experiment should be planned at the same time. An investigator should never start an experiment without knowing how it is going to be analysed.

Page 28

Test yourself

How can you increase the power of an experiment (there may be more than one correct answer)?

By:

Using factorial designs

Increasing sample size

Better randomisation

Avoiding bias

Better statistical methods

Controlling variability


Page 30

Avoiding bias

Bias is avoided by:

1. Correct selection of the experimental unit (as discussed previously)

2. Randomisation of the experimental units to the treatment groups in a method appropriate to the type of experiment.

3. Randomisation of the order in which measurements are made and the animals are housed

4. “Blinding” and the use of coded samples to ensure that the investigator or other staff can not easily influence the outcome of the experiment.

Page 31

Randomisation

Why do we randomise?

Because it ensures that there can be no systematic differences between the treatment groups.

Randomisation is easy using a spreadsheet such as Excel, and all good statistical packages provide ways of putting numbers or letters in random order. The next page shows how it might be done using Excel or another spreadsheet.

Page 32

Randomisation of 12 animals to three treatments (A-C) using Excel

Original    =rand()    Sorted on =rand()    =rand() (sorted)    Animal number
A           0.527      A                    0.067               1
A           0.100      A                    0.100               2
A           0.067      A                    0.122               3
A           0.122      C                    0.210               4
B           0.665      B                    0.248               5
B           0.875      C                    0.265               6
B           0.478      B                    0.478               7
B           0.248      A                    0.527               8
C           0.210      C                    0.628               9
C           0.628      B                    0.665               10
C           0.265      B                    0.875               11
C           0.895      C                    0.895               12

1. The treatment designations A-C were put in the first column.
2. A random number was put in the second one (as “values”).
3. The columns were then sorted on the random number column to give column 3 in random order. The animal numbers were then added.
4. In this case the first three animals will be assigned to A, the 4th to C, etc.

Sometimes a random order doesn’t look very random, such as when the first three animals (here) all receive treatment A.

But use this sort of method and you won’t go far wrong.
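The spreadsheet steps above can also be sketched in a few lines of code. This is only an illustration in Python (the program itself uses Excel); the function name `randomise` and the seed value are assumptions made for the example, and shuffling the labels is used as the equivalent of sorting on a column of random numbers.

```python
import random

def randomise(treatments, per_group, seed=None):
    """Return one treatment label per animal, in random order."""
    rng = random.Random(seed)  # a seed is used only so the allocation can be reproduced
    labels = [t for t in treatments for _ in range(per_group)]
    rng.shuffle(labels)  # same effect as sorting the labels on a random-number column
    return labels

# 12 animals, three treatments, four animals per treatment
allocation = randomise(["A", "B", "C"], per_group=4, seed=2011)
for animal, treatment in enumerate(allocation, start=1):
    print(animal, treatment)
```

Animal 1 receives the first label in the list, animal 2 the second, and so on, just as in the sorted spreadsheet column.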

Page 33

Randomisation

Treatment (randomised)    Random number (now sorted)    Animal number
A                         0.067                         1
A                         0.100                         2
A                         0.122                         3
C                         0.210                         4
B                         0.248                         5
C                         0.265                         6
B                         0.478                         7
A                         0.527                         8
C                         0.628                         9
B                         0.665                         10
B                         0.875                         11
C                         0.895                         12

How should the animals be caged?

Single animal/cage:        [A] [A] [A] [C] [B] etc.
Single with companion:     [A,X] [A,X] [A,X] [C,X] [B,X] etc. (X is a companion animal)
Several/cage at random:    [A,A,A,C] [B,C,B,A] [C,B,B,C]
Randomised block design:   [A,B,C] [A,B,C] [A,B,C] [A,B,C]
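For the randomised block layout, each cage (block) holds one animal per treatment, and the order of treatments is randomised separately within each block. A minimal sketch, assuming Python and a hypothetical helper called `randomised_blocks`:

```python
import random

def randomised_blocks(treatments, n_blocks, seed=None):
    """Return a list of blocks, each containing every treatment once in random order."""
    rng = random.Random(seed)
    blocks = []
    for _ in range(n_blocks):
        block = list(treatments)
        rng.shuffle(block)  # independent random order within each block
        blocks.append(block)
    return blocks

blocks = randomised_blocks(["A", "B", "C"], n_blocks=4, seed=2011)
for number, block in enumerate(blocks, start=1):
    print(f"Block {number}: {', '.join(block)}")
```

Unlike simple randomisation, this guarantees that every block contains each treatment exactly once.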

Page 34

Number of animals per cage

There is no correct answer to the numbers of animals housed per cage. It depends on species and the nature of the experiment.

With rats and mice single housing may be stressful. But male mice may fight, depending on the strain and husbandry conditions.

Sometimes, in order to avoid stress, expensive animals (those fitted with telemetry apparatus, for example) can be housed with a companion which is not part of the experiment.

Group housing poses problems if treatment is given in the food or water as the cage is then the experimental unit unless sophisticated apparatus is used so that each animal can have a different diet. This is sometimes done with rats and farm animals.

Group housing may also be a problem if drug treatments are involved, as rats and mice are coprophagous, so control animals may consume metabolites of the test compound if animals of different treatment groups are housed together.

Finally, it is not a good idea to house all the controls in one cage, all of treatment 1 in a second cage etc. as there can be “cage effects” due to social interactions which could seriously bias the results (e.g. if all the controls are fighting, but the treated animals are not).

Page 35

Blinding

We usually have some idea of what we would like to find in our experiments. So it is better, where possible, to use coded samples so that we do not bias our results by favouring (often inadvertently) one or more of the treatment groups.

This is particularly important when scoring histological sections or measuring behaviour.

The next slide shows the consequences of failing to randomise and/or blind a study
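One possible way of producing coded samples is sketched below in Python. The code format (“S01”, “S02”, ...) and the function name `make_blinding_key` are assumptions for illustration only. The key linking codes to animals would be held by someone who is not scoring the samples.

```python
import random

def make_blinding_key(animal_ids, seed=None):
    """Map each animal ID to a unique code, assigned in random order."""
    rng = random.Random(seed)
    codes = [f"S{i:02d}" for i in range(1, len(animal_ids) + 1)]
    rng.shuffle(codes)  # the code carries no information about animal or treatment
    return dict(zip(animal_ids, codes))

key = make_blinding_key(list(range(1, 13)), seed=2011)
coded_labels = sorted(key.values())  # what the person scoring the samples sees
print(coded_labels)
```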

Page 36

Failure to randomise and/or blind leads to false positive results

290 animal studies were scored for blinding, randomisation and whether the outcome was positive or negative, as defined by the authors (Bebarta et al 2003 Acad. Emerg. Med. 10:684-687).

Comparison                                    Odds ratio (95% CI)
Blind / not blind                             3.4 (1.7-6.9)
Random / not random                           3.2 (1.3-7.7)
Blind and random / not blind, not random      5.2 (2.0-13.5)

An odds ratio of one would imply that blinding or randomisation was not associated with the outcome of an experiment. These odds ratios, all greater than one, imply that on average studies which were not blinded and/or randomised produced excessive numbers of false positive results.

In other words, studies where there was no blinding or randomisation were unreliable. They give too many false positive results.
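To show how an odds ratio and its confidence interval are obtained from a 2x2 table, here is a small sketch in Python. The counts are hypothetical and are NOT taken from the survey, which reports only the odds ratios themselves. The interval uses the common log (Woolf) approximation, which may not be the method the authors used.

```python
import math
from statistics import NormalDist

def odds_ratio_ci(a, b, c, d, level=0.95):
    """Odds ratio and approximate (Woolf) confidence interval for a 2x2 table.

    a, b = studies with positive, negative outcomes that were NOT blinded
    c, d = studies with positive, negative outcomes that WERE blinded
    """
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(odds ratio)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper

# Hypothetical counts, for illustration only
or_, lower, upper = odds_ratio_ci(a=30, b=10, c=20, d=20)
print(f"odds ratio {or_:.1f} (95% CI {lower:.1f}-{upper:.1f})")
```

If the interval excludes one, as it does for all three comparisons on this page, the association is unlikely to be due to chance alone.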

Page 37

Test yourself

Randomisation is used to ensure that the means of each group are identical

We randomise our animals so that they won’t fight so much

Randomisation ensures that each experimental unit has an equal probability of being assigned to a particular treatment group

Randomisation is the only way to avoid bias

Blinding is used so that other people can not copy our data

Blinding helps us to avoid unintentionally biasing our results


Page 39

Power and sample size

It is important not to use too many animals (or other experimental units) in an experiment because it costs money, time and effort, and it is unethical.

Conversely, if too few animals are used the experiment may be unable to detect a clinically or scientifically important response to the treatment. This also wastes resources and could have serious consequences, particularly in safety assessment.

We need to avoid making either of these mistakes

Page 40

Minimising statistical errors

The null hypothesis
In a controlled experiment the aim is usually to compare two or more means (or sometimes medians or proportions). We normally set up a “null hypothesis” that there is no difference between the means, and the aim of our experiment is to disprove that null hypothesis. However, as a result of inter-individual variability we may make a mistake. If we fail to find a true difference, then we have a false negative result, also known as a Type II or β error. Conversely, if we think that there is a difference when in fact it is just due to chance, then we have a false positive, Type I, or α error. These are shown in the table below.

                         Experimental conclusion
State of nature          Accept null hypothesis    Reject null hypothesis
Null hypothesis true     Correct conclusion        Type I or α error
Null hypothesis false    Type II or β error        Correct conclusion

Page 41

Power analysis and the control of statistical errors

We can control type I errors because we can estimate the probability that the means could differ to a given degree, knowing the sample sizes and the degree of variability (and making some assumptions about the distribution of the data).

If it is highly unlikely that they came from the same population, we reject the null hypothesis and assume that the treatment has had an effect.

The probability of a type I error is α. We usually set it at 0.05, or 5%. For every 100 experiments in which the null hypothesis is true we would expect, on average, five type I errors to be made.

We don’t usually set it much lower than this because that will increase the probability of a type II error.
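The “five in every 100” figure can be checked by simulation. The sketch below (Python, standard library only) repeatedly draws treated and control groups from the same population, so the null hypothesis is true in every simulated experiment. As a simplifying assumption it uses a z-test with a known standard deviation of 1 rather than a t-test; the group size, number of experiments and seed are arbitrary choices.

```python
import math
import random
from statistics import NormalDist, mean

def false_positive_rate(n_per_group=10, n_experiments=4000, alpha=0.05, seed=1):
    """Proportion of simulated null experiments declared significant."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    se = math.sqrt(2 / n_per_group)  # SE of the difference in means when SD = 1
    false_positives = 0
    for _ in range(n_experiments):
        control = [rng.gauss(0, 1) for _ in range(n_per_group)]
        treated = [rng.gauss(0, 1) for _ in range(n_per_group)]  # same population
        z = (mean(treated) - mean(control)) / se
        if abs(z) > z_crit:
            false_positives += 1
    return false_positives / n_experiments

rate = false_positive_rate()
print(f"Proportion of false positives: {rate:.3f}")
```

The proportion should come out close to 0.05, whatever the group size.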

Page 42

Power analysis and the control of statistical errors

Type II errors are more difficult to control. False negative results occur when there is excessive variation (“noise”) or there is only a small response to the treatment (a low “signal”). We can specify the probability of a type II error (β) or the statistical power (one minus the probability of a type II error) if we use a power analysis.

There is a mathematical relationship between the six variables discussed in the next two slides such that if five of them are specified or fixed, the sixth can be estimated.

Page 43

Power analysis: the variables
(More details on the next slide)

Signal: effect size of scientific interest (or actual response)
Noise (SD): variability of the experimental material
Chance of a false positive result: significance level (0.05?)
Sidedness of statistical test (usually 2-sided)
Power of the experiment (80-90%?)
Sample size

Page 44

Variables involved in a power analysis

1. The effect size of scientific interest (the signal)
This is the magnitude of response to the treatment likely to be of scientific or clinical importance. It has to be specified by the investigator. Alternatively, if the experiment has already been done, it is the actual response (difference between treated and control means).

2. The variability among experimental units (the noise)
This is the standard deviation of the character of interest. It has to come from a previous study or the literature, as the experiment has not yet been done.

3. The power of the proposed experiment
This is 1-β, where β is the probability of a type II error. This also has to be specified by the investigator. It is often set at 0.8 to 0.9 (80 or 90%).

4. The alternative hypothesis
The null hypothesis is that the means of the two groups do not differ. The alternative hypothesis may be that they do differ (two sided), or that they differ in a particular direction (one sided).

5. The significance level
As previously explained, this is usually set at 0.05.

6. The sample size
This is the number in each group. It is usually what we want to estimate. However, we sometimes have only a fixed number of subjects, in which case the power analysis can be used to estimate power or effect size.
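As an illustration of the last point, the sketch below estimates the smallest signal/noise ratio that a fixed number of animals per group could detect. It is written in Python and uses a normal approximation to the two-sample, two-sided t-test, which is an assumption of this example; the approximation is somewhat optimistic for small groups, so the output should be read as a rough guide.

```python
import math
from statistics import NormalDist

def detectable_effect_size(n_per_group, power=0.9, alpha=0.05):
    """Approximate smallest standardised effect size detectable with n per group."""
    z = NormalDist()
    return (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) * math.sqrt(2 / n_per_group)

for n in (8, 16, 32):
    print(n, round(detectable_effect_size(n), 2))
```

Multiplying the result by the standard deviation gives the detectable difference in the original units of measurement.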

Page 45

The standardised effect size or signal/noise ratio

The signal/noise ratio is:

((control mean) - (treated mean)) / (pooled standard deviation)

This is also known as the “standardised effect size” and “Cohen’s d”.

Most statistical packages provide power calculations and there are several web sites that will do them. However, the graph on the next page is probably sufficiently accurate for most people, given the uncertainty in deciding how large an effect is going to be of biological importance, and the fact that the estimate of the standard deviation may not be all that accurate.
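When the ratio is calculated from data rather than from assumed values, the pooled standard deviation combines the two group variances weighted by their degrees of freedom. A short sketch in Python follows; the measurements are hypothetical and the function name is an assumption of this example.

```python
import math
from statistics import mean, variance

def standardised_effect_size(control, treated):
    """(control mean - treated mean) / pooled standard deviation."""
    n1, n2 = len(control), len(treated)
    pooled_var = ((n1 - 1) * variance(control) + (n2 - 1) * variance(treated)) / (n1 + n2 - 2)
    return (mean(control) - mean(treated)) / math.sqrt(pooled_var)

# Hypothetical measurements, for illustration only
control = [10.0, 12.0, 14.0, 16.0]
treated = [8.0, 10.0, 12.0, 14.0]
d = standardised_effect_size(control, treated)
print(round(d, 2))
```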

Page 46

Sample size as a function of signal/noise ratio

[Graph: sample size per group (vertical axis, 10 to 130) plotted against signal to noise ratio (horizontal axis, 0.4 to 2.2)]

Sample size as a function of the signal/noise ratio in a two-sample t-test with a 5% significance level and a two-sided test. Red circles are for a power of 90%, triangles for a power of 80%.

Note that a sample size of about 17-23 in each group, depending on power, is needed to detect a signal/noise ratio of 1.0.

A sample size of about 9-11 in each group, depending on power, is needed to detect a signal/noise ratio of 1.4.

Signal/noise ratios of less than 0.05 require large numbers of animals in each group.
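The curve can be approximated without special software. The sketch below, in Python, uses the normal approximation n = 2 × ((z for α/2 + z for power) / (signal/noise ratio))², rounded up. This is an assumption of the example, not the method used to draw the graph, and it tends to give about one animal per group fewer than the exact t-test calculation.

```python
import math
from statistics import NormalDist

def sample_size_per_group(effect_size, power=0.9, alpha=0.05):
    """Approximate number per group for a two-sided, two-sample comparison."""
    z = NormalDist()
    n = 2 * ((z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) / effect_size) ** 2
    return math.ceil(n)  # round up to a whole animal

for d in (0.56, 1.0, 1.4):
    print(d, sample_size_per_group(d, power=0.8), sample_size_per_group(d, power=0.9))
```

For a signal/noise ratio of 1.0 this gives 16 and 22 per group, against the 17 and 23 read from the graph.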

Page 47: Michael Festing - MedicReS World Congress 2011

4747

Power analysis softwarePower analysis software

Most modern statistical packages will do power analysis calculations for the two-sample situation. Some, such as “nQuery Advisor” will also do the calculations for more complex situations. There are also a number of web sites which will do the calculations for you (do a Google search for “statistical power calculations”).

The R statistical package is a command-driven free package used by professional statisticians. The command, below, generated the output, below right, for the random dogs (see example, next slides), using the signal/noise ratio 0.56 with a 90% power and a 5% significance level. Note that we do not need “n” to five decimal places!

power.t.test(delta=0.56, sd=1, power=0.9, sig.level=0.05)

     Two-sample t test power calculation

              n = 67.98649
          delta = 0.56
             sd = 1
      sig.level = 0.05
          power = 0.9
    alternative = two.sided

NOTE: n is number in *each* group

An example:

A vet wants to compare the effect on blood pressure of two anaesthetics for dogs under clinical conditions. He has some preliminary data. The dogs are unsexed healthy animals: weight 3.8 to 42.6 kg, mean systolic BP 141 (SD 36) mm Hg.

Assume that:

1. a difference of 20 mmHg or more would be of clinical importance (a clinical, not a statistical, decision),
2. a significance level of 0.05,
3. a power of 90%,
4. and a 2-sided t-test.

Then the signal/noise ratio would be 20/36 = 0.56.

From the graph of sample size against signal/noise ratio, the required sample size is about 80 dogs/group (the graph cannot be read very accurately; the R calculation on the previous slide gives 68).

A different vet has some beagles

Male Beagles weighing 17-23 kg, Mean BP of 108 (SD 9) mm Hg.

Assume that a 20 mmHg difference between groups would be of clinical importance (as before), with the same assumptions as on the previous slide:

Signal/noise ratio = 20/9 = 2.22

Referring again to the graph, the required sample size is about 6/group (although it cannot be read very accurately off the graph).

Summary for two sources of dogs: aim is to be able to detect a 20mmHg change in blood pressure

Type of dog     SDev   Signal/noise   Sample size/gp (1)   % Power, n=8 (2)
Random dogs     36     0.56           68                   18
Male beagles    9      2.22           6                    98

(1) Sample size: 90% power
(2) Power, sample size 8/group (this cannot be read off the graph)

Assumes α=5%, 2-sided t-test and effect size 20mmHg

Conclusion: It would not be sensible to do the experiment with the random dogs. Either an investigator should assume that Beagles can represent “Dogs” or the experiment could be done using several breeds, but using a factorial design, discussed later.
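Both columns of the summary table can be approximated with `power.t.test`. This is a sketch under the assumptions stated on the slides (20 mmHg signal, 5% two-sided test); the variable names are illustrative. Because it uses the unrounded ratio 20/36 rather than 0.56, the figure for the random dogs may come out slightly above the 68 quoted in the table.

```r
# Two sources of dogs, as on the slide: detect a 20 mmHg difference
signal <- 20
sds <- c(random_dogs = 36, male_beagles = 9)

# (1) Animals per group for 90% power, two-sided test at the 5% level
n_needed <- sapply(sds, function(s) {
  power.t.test(delta = signal, sd = s, power = 0.9, sig.level = 0.05)$n
})

# (2) Power achieved if only 8 animals per group are used
power_n8 <- sapply(sds, function(s) {
  power.t.test(n = 8, delta = signal, sd = s, sig.level = 0.05)$power
})

print(data.frame(sd = sds,
                 signal_noise = round(signal / sds, 2),
                 n_per_group = ceiling(n_needed),
                 percent_power_n8 = round(100 * power_n8)))
```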

The resource equation

A power analysis is not always possible.

1. If lots of characters are being measured it may not be clear which one is the most important.

2. There may be no estimate of the standard deviation.

3. In fundamental research it may be impossible to specify an effect size likely to be of scientific importance.

4. Experiments may be complex with many treatment groups and possible interactions.

An alternative is the “Resource Equation” method. This depends on the law of diminishing returns.

E = (total number of experimental units) - (number of treatment groups)

E should be between 10 and 20.

[Graph: information (vertical axis, 0 to 12) plotted against degrees of freedom E (horizontal axis, 1 to 29)]

Diminishing returns set in rapidly as the total number of subjects increases. Adding units up to E=10 is good value but increasing E beyond about 20 provides little extra information.

Resource equation example

You decide to do an experiment with four treatment groups (a control and 3 dose levels) and eight animals per group.

E= 32 – 4 = 28. So this is unnecessarily large.

With six animals per group E = 24 – 4 = 20, which is acceptable.

[Diagram: columns of rectangles labelled Control, Low dose, Mid dose and High dose; each rectangle represents a single experimental unit]
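The resource equation arithmetic is easy to check in a few lines of R. This is only a sketch: `resource_E` is a hypothetical helper name, and the limits of 10 and 20 are the ones quoted above.

```r
# Resource equation: E = total experimental units - number of treatment groups
# resource_E is a hypothetical helper name used only for this illustration.
resource_E <- function(n_per_group, n_groups) {
  n_per_group * n_groups - n_groups
}

n_groups <- 4                       # control plus three dose levels
for (n in c(8, 6, 4)) {
  E <- resource_E(n, n_groups)
  verdict <- if (E > 20) "larger than needed" else if (E < 10) "too small" else "acceptable"
  cat(n, "per group: E =", E, "-", verdict, "\n")
}
```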


Controlling variation

Body weight of mice housed 1, 2, 4 or 8 per cage

[Plot: body weight (horizontal axis, 35 to 55) of individual mice at each housing density]

Mice/cage   SD
1           5.8
2           3.9
4           3.2
8           2.9

Chvedoff et al (1980) Arch.Toxicol. Suppl 4:435

If variation can be reduced the signal/noise ratio goes up and either sample size can be reduced, power can be increased or a smaller response can be detected.

The plot shows that mice housed singly are more variable than those housed in pairs or groups, although they weigh slightly more on average.

The consequences of inter-individual variability

Assume you want to do an experiment to see whether a specified drug treatment affects body weight in mice, with individual mice being the experimental unit.

You plan to compare treated and control means and consider that if the two means differ by 5g or more (the signal) this would be of biological interest. You plan to use a two-sided t-test with a significance level of 0.05. Should you house your mice singly or in pairs? (You rule out having more per cage.)

Mice/cage   Mean   SD (noise)   Signal/noise   Number needed/group
1           46.0   5.8          5/5.8=0.86     30
2           44.7   3.9          5/3.9=1.28     14

Assuming that the response (signal) is not affected by the number per cage, you would need only about half the number of animals if they were housed in pairs.
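The group sizes in the table can be recovered from its signal/noise ratios (0.86 and 1.28). The sketch below assumes 90% power, which is not stated on the slide but reproduces the tabulated 30 and 14. It also shows why pair housing roughly halves the numbers: the required sample size is approximately proportional to the variance, and (5.8/3.9) squared is a little over 2.

```r
# Signal/noise ratios taken from the table on the slide
ratio_single <- 0.86   # mice housed singly
ratio_pairs  <- 1.28   # mice housed in pairs

# ASSUMPTION: 90% power (not stated on the slide); 5% level, two-sided
n_single <- ceiling(power.t.test(delta = ratio_single, sd = 1, power = 0.9)$n)
n_pairs  <- ceiling(power.t.test(delta = ratio_pairs,  sd = 1, power = 0.9)$n)
cat("Singly housed:", n_single, "per group\n")
cat("Pair housed:  ", n_pairs, "per group\n")

# Required n is roughly proportional to the variance (SD squared)
cat("Variance ratio:", round((5.8 / 3.9)^2, 2), "\n")
```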

Controlling genetic variability

The table shows the mean and standard deviation (N=25-47) of sleeping time in five inbred strains and two outbred stocks of mice under hexobarbital anaesthetic.

Type     Name       Mean   SD   Sig/noise   Group size*   Power**
Strain   A/N        48     4    1.0         23            86
Strain   BALB/c     41     2    2.0         7             99
Strain   C57BL/He   33     3    1.3         13            98
Strain   C3HB/He    22     3    1.3         13            98
Strain   SWR/HeN    18     4    1.0         23            86
Stock    CFW        48     12   0.3         191           17
Stock    Swiss      43     15   0.2         297           13

* To detect a 4 min. change in the mean (2-sided) with α=0.05, power = 90%
** Power (%) to detect a 4 min. change in the mean with 20 mice/group

Data from Jay 1955 Proc Soc. Exp Biol Med 90:378

Note the much greater variability in the outbred stocks. This substantially reduces the signal/noise ratio (assuming an effect size of 4 minutes), and means much larger sample sizes are needed.
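The group sizes and powers in the table can be recomputed from the standard deviations alone. The following is a sketch using `power.t.test` with the assumptions given in the footnotes (4 minute signal, 5% two-sided test); the variable names are illustrative.

```r
# SD of sleeping time (minutes) from the table; signal is a 4 minute change
sds <- c("A/N" = 4, "BALB/c" = 2, "C57BL/He" = 3, "C3HB/He" = 3,
         "SWR/HeN" = 4, "CFW" = 12, "Swiss" = 15)
signal <- 4

# Group size for 90% power (two-sided test, 5% level)
group_size <- sapply(sds, function(s) {
  ceiling(power.t.test(delta = signal, sd = s, power = 0.9)$n)
})

# Power (%) with 20 mice per group
power_n20 <- sapply(sds, function(s) {
  round(100 * power.t.test(n = 20, delta = signal, sd = s)$power)
})

print(data.frame(sd = sds, group_size, power_n20))
```

Because the required group size rises roughly with the square of the SD, the outbred stocks need many times more animals than the inbred strains.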

Variation in kidney weight in 58 groups of rats

[Bar chart: variability (vertical axis, 0 to 90) for each sample (horizontal axis: sample number, 1 to 57), with bars marked as Mycoplasma, Outbred, F1 or F2]

(re-drawn from Gartner,K. (1990), Laboratory Animals, 24:71-77.)

This study shows the variability of kidney weight in 58 groups of rats (N=approx 30 in each group).

Groups have been ranked in order of variability which is expressed as a percentage.

Some groups were affected with Mycoplasma pulmonis causing chronic respiratory disease (in red), some were outbred, some F1 hybrid and some F2 hybrids.

The effect of these sources of variation is shown on the next slide.

Variation, sample size and power

Suppose the aim of an experiment is to find out whether a drug affects the weight of the kidneys in rats. We can use a power analysis to find out how many rats of each type shown on the previous page would be needed.

Assume that we want to be able to detect a 20% change in kidney weight (either way), we want a power of 80%, a significance level of 5%, and we have data on the variability of each group. The results are shown in the table below.

The table also shows the power of the experiment to detect a 20% change if the sample size is fixed at 10 animals per group with all other assumptions the same.

Factor     Type              Std.Dev   Signal/noise   Sample size   Power (%)
Genetics   F1 hybrid         13.5      1.48           10            87
Genetics   F2 hybrid         18.4      1.09           15            63
Genetics   Outbred           20.1      0.99           18            55
Disease    Mycoplasma free   18.6      1.08           15            53
Disease    With Mycoplasma   43.3      0.46           76            14

Controlling variation: Conclusion

Four examples:

The random dogs versus beagles in the previous section

Housing mice singly or in groups,

Sleeping time under anaesthetics,

Kidney weight in rats of various types

All show that increased variability reduces the signal/noise ratio so larger sample sizes are needed to detect the effect of a treatment.

This will cost money, time and effort, and it is also unethical not to control such variation wherever possible.

Uncontrolled variation in almost any controlled experiment is “bad news”


76% of animals used in research in the UK in 2008 were mice or rats

But there are lots of types of these species. What are they all and what are their properties?

The main types are:

Inbred strains*

Outbred stocks*

Mutants

Genetically modified strains (not discussed here)

* Outbreds are known as “stocks”, inbreds as “strains”

Outbred stocks of rats and mice:

“Sprague-Dawley” and “Wistar” rats
“Swiss”, “CD-1” and “MF-1” mice

They can change rapidly in characteristics due to selection, inbreeding and random genetic drift.

They are “genetically undefined”. Nothing is known about the genotype of any individual.

Stock names such as “Sprague-Dawley” (SD) have no genetic meaning (there are no genetic markers to define them).

Each animal is genetically different, and colonies with the same name will also differ genetically to some extent.

But they are vigorous, cheap and prolific and are widely used in research

Outbred stocks

[Bar chart: percent responders (vertical axis, 0 to 100) in each successive sample (horizontal axis: sample number, 1 to 25)]

Percent responders to a synthetic polypeptide in successive samples of about 30 outbred SD rats from the same commercial breeder over a period of 18 months.

Note that this is not just sampling variation. The high & low responding rats must have come from different colonies.

Seven inbred strains gave consistent results (100% responders or non-responders)

The variability of outbred stocks (compared with inbred strains) leads to lower signal/noise ratios and less powerful experiments

DNA fingerprint shows genetic heterogeneity


Geneticists use outbred stocks only when they have no alternative, or for a few genetic studies. For example, an outbred stock can be used as a base population for a selective breeding experiment. More recently they have sometimes been used in gene association experiments, where the genotypes of many individuals are recorded at many gene loci to see if there are associations with a disease or response to an experimental treatment. But these are specialised (and expensive) studies.

For the vast majority of work geneticists recognise that they should control the genetic background, and that means using inbred strains.

Some scientists attempt to justify the use of outbred stocks on the grounds that “humans are outbred”. But humans and animals differ in many ways. We don’t insist on using animals weighing 70kg on the grounds that humans weigh about that. And why do we so often use albino animals? Possibly because this is the easiest way of making the animals all look the same, and good scientists know that they should control variability if they want powerful experiments.

Inbred strains

All animals in an inbred strain are genetically identical, but each strain is different.

Produced by >20 generations of brother x sister mating

Genetically stable. Can not be changed by selective breeding

Sublines have arisen as a result of “residual heterozygosity” (the sublines were separated before the strain was fully inbred).

Sublines can also arise as a result of new mutations (relatively rare)

There are >400 inbred strains of mice and 150 inbred strains of rats

“Derived” inbred strains. Brief details only are given here

There are a number of more specialised strains derived from straight inbred strains. These include:

Coisogenic strains: A pair of strains which differ at only a single genetic locus as a result of a mutation. “Knockout” strains usually fall into this category. They are used in the study of the mutation

Congenic strains: A pair of strains which differ at a single genetic locus plus a section of chromosome. These are produced by backcrossing a mutation or polymorphism to an inbred strain. The length of associated chromosome depends on the number of backcrossing generations.

Recombinant inbred (RI) strains: These are sets of inbred strains developed from an F1 cross between two standard inbred strains. They are used to determine the mode of inheritance of some measured phenotype.

Recombinant congenic (RC) strains. Like RI strains except they are produced from a backcross generation of a cross between two inbred strains.

Inbred strains: nomenclature

Inbred strains of mice and rats are designated by a code starting with an upper case letter followed by letters and/or numbers

Examples: Rat strains: LEW, F344, BDIX, PVG

Mouse strains: DBA, CBA, BALB, C57BL, SJL

There are some exceptions such as 129P1.

Sublines are designated by a slash followed by a code involving letters and/or numbers.

Examples: Rat: LEW/Ss, F344/N

Mouse: BALB/c, C57BL/10

A code showing the breeder may be appended, e.g. C57BL/6J where the “J” stands for the Jackson Laboratory.

Inbred strains

Petko M. Petkov et al. Genome Res. 2004; 14: 1806-1811

Genetic similarities in mice based on many genetic markers

DNA fingerprints show that within a strain all animals are genetically identical but strains differ

Controlling genetic variability (this slide was shown in a previous section)

Controlling the within-group variation using inbred strains increases statistical power or allows fewer animals to be used

Genetics is important: Twenty two Nobel Prizes since 1960 for work depending on inbred strains

Inbred strains and derivatives (C.C. Little, DBA, 1909; Jackson Laboratory)

Cancer: mmTV
Transmissible encephalopathies/prions: Prusiner
Retroviruses, oncogenes & growth factors: Cohen, Levi-Montalcini, Varmus, Bishop, Baltimore, Temin
Monoclonal antibodies (BALB/c mice): Kohler and Milstein
Smell: Axel & Buck
ES cells: Evans; “knockouts”: Capecchi, Smithies
Genetics of the MHC: Snell
Immunological tolerance: Medawar, Burnet
H2 restriction: Doherty, Zinkernagel
Immune responses: Benacerraf (G.pigs)
T-cell receptor: Jerne
Antibody diversity: Tonegawa

Mutant strains

Clockwise from top left: hairless mouse, nude mouse, obese mouse, Rowett nude rat (with graft of hamster skin), viable yellow mutants, New Zealand nude rat (with skin grafts), dwarf mouse.

These and other mutants are widely used in biomedical research. Refer to textbooks for more details

Experimental designs

There are a number of formal experimental designs available for use. These include:

1. The completely randomised design. Subjects are simply assigned to treatments at random. This is the commonest design in work with laboratory animals.

2. The randomised block design. The experiment is split up into a number of “mini-experiments”. This is for convenience, to have subjects in different treatment groups as similar as possible so as to increase power, to build in some repeatability, or to take account of some natural structure in the material such as litters.

Randomised block designs have several names such as “within-subject”, “repeated measures”, “crossover” or “matched subjects” designs. These depend on the nature of the experimental unit and whether replication is in time or space.

3. Latin square designs. These are used to further balance an experiment in special situations.

The completely randomised design

[Diagram: experimental units numbered 1 to 20, each coloured according to its treatment]

Assume that the experimental units were reasonably homogeneous. Treatments grey, green, red and orange were assigned at random to experimental units 1-20 using EXCEL exactly as described in section 3.

The fact that 4/5 of the grey treatment are in the first ten and 4/5 orange treatments are in the last ten would not matter in most cases. In short-term experiments it is unlikely that there would be any important variables that affect the first and last subjects differently.

This “completely randomised” design in which subjects are assigned at random to the treatments is simple, can tolerate unequal numbers in each group and is perfectly adequate in many experimental situations.
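The slides carry out the randomisation in EXCEL. As an alternative illustration only, the same kind of allocation can be sketched in R with `sample`; the seed and the variable names are assumptions made for this example.

```r
set.seed(1)  # only so that the illustration is reproducible

treatments <- c("grey", "green", "red", "orange")
# Five copies of each treatment, shuffled into a random order
allocation <- sample(rep(treatments, each = 5))

design <- data.frame(unit = 1:20, treatment = allocation)
print(design)
table(design$treatment)  # five units per treatment
```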

However, in long-term experiments, or ones where, for example, those doing the experiments become more skilled or more fatigued, this may introduce extra variation and bias the results. Also, the animals may not be very homogeneous, or they may have some natural structure, such as coming in litters at different times. It may also be more convenient if the experiment can be split up into smaller, more homogeneous bits which are easier to handle, particularly if the experiment is large.

In these cases a “randomised block” design might be more powerful and more convenient.

The randomised block design

[Diagram: five blocks, labelled Block 1 to Block 5, each containing experimental units coloured by treatment]

Here the experiment has been split up into five homogeneous blocks.

Each block has subjects matched for size.

Each block has exactly one experimental unit of each of the four treatments. Randomisation is done separately within each block

This design has one factor “Treatment” which is a “fixed factor” (i.e. we can do treatment 1 again if we wish) and one factor “Block” which is a random factor (you can’t replicate the effect of block one) in which we have little or no interest.
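Randomising separately within each block can be sketched in R as follows. This is an illustration under the layout described above (five blocks, four treatments, one unit of each treatment per block); the treatment colours, seed and variable names are assumptions.

```r
set.seed(2)  # only so that the illustration is reproducible

treatments <- c("grey", "green", "red", "orange")
n_blocks <- 5

# Randomise separately within each block: each block gets every
# treatment exactly once, in an independently shuffled order
design <- do.call(rbind, lapply(seq_len(n_blocks), function(b) {
  data.frame(block = b, position = 1:4, treatment = sample(treatments))
}))
print(design)
```

In the analysis, block would then be fitted as a factor alongside treatment, so that differences between blocks are removed from the error term.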


Blocks can be done at different times (even weeks apart) and/or housed in different locations. Within each block the subjects can be matched.

The main advantages of the RB design are that:

1. It can deal with heterogeneous material by matching subjects in each block (increasing power).

2. It is often more convenient to break the experiment down into smaller bits which can then be done more carefully.

The main disadvantages of the RB design are that:

1. It is not very tolerant of missing observations.

2. It should not be used for very small experiments (say fewer than about 16 experimental units in total) because there may be a slight loss of power.

Various types of randomised block designs

A matched pairs design (might be suitable for comparing mutant and wild-type in each litter as it becomes available)

A before & after experiment. But no randomisation is possible (can’t have an after before a before)

[Diagram: grid of Animal 1 to Animal 3 (rows) against Time 1 to Time 3 (columns)]

A crossover or repeated measures** design (within-subject; the experimental unit is an animal for a period of time).

** The term “repeated measures” means different things to different statisticians. It can also be used to describe an experiment where each individual is measured several times but without receiving a different treatment each time. This is pseudo replication, which needs to be taken into account.

A within-subject design (experimental unit is an area on an animal)

The Latin square design

The number of subjects is the number of treatments squared.

This is a 5x5 Latin square. It has five rows, five columns and five treatments (Grey, Red, Yellow, Green, White).

Note that there is one of each treatment in each row and in each column.

It has not yet been randomised. To maintain the layout we randomise whole rows and then whole columns.

It has one fixed factor (Treatment) and two random factors (Rows and Columns).

We would use it if there are two factors such as day of the week (represented as columns) and time of the day (rows) which may influence the outcome, and we want these balanced out.

Latin squares with more than 7 treatments can become too large, and those with fewer than four are too small. However, small ones (as small as 2x2) can be replicated.
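The construction described above (start from an unrandomised square, then shuffle whole rows and whole columns) can be sketched in R. The cyclic starting square, the seed and the variable names are assumptions made for this illustration.

```r
set.seed(3)  # only so that the illustration is reproducible

treatments <- c("Grey", "Red", "Yellow", "Green", "White")
k <- length(treatments)

# Unrandomised (cyclic) square: each row is the previous one shifted by one
idx <- outer(0:(k - 1), 0:(k - 1), function(r, c) (r + c) %% k + 1)
square <- matrix(treatments[idx], nrow = k)

# Randomise whole rows, then whole columns, which preserves the layout
square <- square[sample(k), ]
square <- square[, sample(k)]
print(square)
```

Permuting entire rows and columns keeps the defining property that every treatment appears once in each row and once in each column.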


Factorial designs

A 2x2 Factorial design

[Diagram: 16 experimental units arranged in Treated and Control groups, each unit coloured Blue or Green]

E = 16 - 4 = 12

Factorial designs are common in research involving laboratory animals.

The design on the right has two factors, the treatment (Control versus Treated) and the colour (Blue versus Green). This might represent the two sexes, or two strains or two diets or any other factor of possible interest.

The aim is usually to see whether that other factor influences the results.

This is a 2x2 factorial because there are two factors each at two levels. The Resource Equation method of sample size determination is shown here. E is more than 10 and less than 20 so size is probably OK.

Factorial designs are efficient because the effect of the treatment is still determined by eight treated and eight control animals but there is the additional information of Blue versus Green (also 8 vs 8) averaging across treatments, and it can be seen whether the response to the treatment is the same in the Blue and Green groups (known as the interaction effect).

Factorial experiments

A 3x3 Factorial design

[Diagram: grid of Diet 1 to Diet 3 (rows) against Control, Dose 1 and Dose 2 (columns)]

E = (36-9) = 27

Sample size is a bit too large. Three animals per group might be better.

Factorial experiments can have any number of factors and each can be at any number of levels.

Here there are two factors: Treatment (with levels Control, Dose 1 and Dose 2) and Diet (with levels 1, 2 and 3).

Levels can be either qualitative such as diets 1, 2, and 3 or quantitative such as dose 0, 1000 and 2000mg/kg.

A 3x3x2 Factorial design

[Diagram: grid of Diet 1 to Diet 3 (rows) against Control, Dose 1 and Dose 2 (columns), with patterns distinguishing the third factor]

E = (36-18) = 18

Sample size is about right with 2 per smallest sub-group.

Here there are three factors represented by Dose (3 levels), Diet (3 levels) and sex (Male and Female (say), represented by the patterns).

Note that the smallest sub-group can be quite small with this type of design because in calculating the means we average over the other groups
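The 3x3x2 layout and its resource equation value can be generated with `expand.grid`. This is a sketch only; the level labels follow the slide, while the variable names are assumptions.

```r
# Full 3 x 3 x 2 layout: every combination of dose, diet and sex
cells <- expand.grid(dose = c("Control", "Dose 1", "Dose 2"),
                     diet = c("Diet 1", "Diet 2", "Diet 3"),
                     sex  = c("Male", "Female"))
n_per_cell <- 2

# One row per animal
design <- cells[rep(seq_len(nrow(cells)), each = n_per_cell), ]

E <- nrow(design) - nrow(cells)   # resource equation: units minus groups
cat("Cells:", nrow(cells), " Animals:", nrow(design), " E:", E, "\n")

# Averaging over the other factors gives 12 animals per dose level
table(design$dose)
```

Although each of the 18 sub-groups holds only two animals, each dose mean is based on 12 animals, which is why small sub-groups are acceptable in this type of design.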

Effect of chloramphenicol on RBC counts (2000mg/kg) in mice of two strains

A real example. We want to know:

1. Does treatment have an effect on RBC counts?
2. Do strains differ in RBC counts?
3. Do strains differ in their response (interaction)?

Strain           Control   Treated   Strain means
BALB/c           10.10     8.95
                 10.08     8.45
                  9.73     8.68
                 10.09     8.89      9.37
C57BL             9.60     8.82
                  9.56     8.24
                  9.14     8.18
                  9.20     8.10      8.86
Treatment mean    9.69     8.54

Clearly the treatment reduces red blood cell (RBC) counts. There is no overlap between treated and control individuals. Also, C57BL seems to have lower counts than BALB/c. Whether or not there is an interaction can best be seen graphically.

[Plot of means: mean RBC count (vertical axis, 8.5 to 10.0) for control (c) and treated (t) groups, with one line for each strain (BALB/c and C57BL)]

A plot of the means shows that the reduction is about the same for each strain.

A 2-way analysis of variance is needed to show statistical significances. This is shown in Section 11. It finds that the treatment and strain differences are statistically significant (unlikely to be due to sampling variation), but the interaction is not significant, exactly as we determined just by looking at the data.
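As a sketch of the 2-way analysis of variance for the BALB/c and C57BL data, the R code below enters the sixteen RBC counts from the slide and fits a model with both factors and their interaction. The object and column names are assumptions made for this illustration.

```r
# RBC counts from the slide (four mice per strain/treatment combination)
rbc <- data.frame(
  Strain = rep(c("BALB/c", "C57BL"), each = 8),
  Treat  = rep(rep(c("Control", "Treated"), each = 4), times = 2),
  RBC    = c(10.10, 10.08, 9.73, 10.09,  8.95, 8.45, 8.68, 8.89,
              9.60,  9.56, 9.14,  9.20,  8.82, 8.24, 8.18, 8.10)
)

# Cell means, as plotted in the plot of means
print(with(rbc, tapply(RBC, list(Strain, Treat), mean)))

# Two-way ANOVA with the Strain x Treat interaction
fit <- aov(RBC ~ Strain * Treat, data = rbc)
print(summary(fit))
```

The ANOVA table has one row each for Strain, Treat and the Strain:Treat interaction, followed by the residual (error) row.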

Effect of chloramphenicol (2000mg/kg) on RBC count

Strain            Control   Treated   Strain means
C3H               7.85      7.81
                  8.77      7.21
                  8.48      6.96
                  8.22      7.10      7.80
CD-1              9.01      9.18
                  7.76      8.31
                  8.42      8.47
                  8.83      8.67      8.58
Treatment means   8.42      7.96

Here are two different mouse strains. In this case the treatment seems to have reduced the RBC counts in C3H but not in CD-1.

Statistical analysis shows a highly significant interaction effect (see section 11). This can be seen in a plot of the means
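What an interaction means numerically can be seen by comparing the treatment effect within each strain. The sketch below uses the C3H and CD-1 counts from the slide; the object names are assumptions made for this illustration.

```r
# RBC counts from the slide (four mice per strain/treatment combination)
rbc <- data.frame(
  Strain = rep(c("C3H", "CD-1"), each = 8),
  Treat  = rep(rep(c("Control", "Treated"), each = 4), times = 2),
  RBC    = c(7.85, 8.77, 8.48, 8.22,  7.81, 7.21, 6.96, 7.10,
             9.01, 7.76, 8.42, 8.83,  9.18, 8.31, 8.47, 8.67)
)

cell_means <- with(rbc, tapply(RBC, list(Strain, Treat), mean))
print(round(cell_means, 2))

# Reduction in RBC count due to treatment, within each strain
reduction <- cell_means[, "Control"] - cell_means[, "Treated"]
print(round(reduction, 2))

# Interaction contrast: how much the reduction differs between strains
interaction_contrast <- unname(reduction["C3H"] - reduction["CD-1"])
print(round(interaction_contrast, 2))
```

The reduction is about 1.06 in C3H, while in CD-1 the treated mean is slightly higher than the control mean. It is this difference between the two reductions that the interaction term in the ANOVA tests.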

[Plot of means: mean RBC count (vertical axis, 7.4 to 8.6) for control (C) and treated (T) groups, with one line for each strain (C3H and CD-1)]

In this case the CD-1 outbred stock of mice was resistant to chloramphenicol at this dose level, but C3H has responded strongly with a highly significant interaction effect.


Statistical analysis

You will need statistical software with good graphics.

EXCEL is not recommended for statistical analysis, although it is useful for data entry prior to reading the data into your chosen package. In some cases EXCEL graphics are quite useful.

If you have access to one of the larger commercial packages such as MINITAB, SPSS, SAS, or Graphpad Prism then use it. But allow time to get to know how it works and how to interpret the output. Better still take a course if one is available.

There are a number of open source programs available. R is widely used by professional statisticians. It is command driven and difficult to learn but there is a front end called “R Commander” (Rcmdr) which is menu driven and a lot easier to use. But still expect to spend time learning how to use it. R and Rcmdr can be downloaded from the CRAN web site.


It would probably be sensible to get a statistical textbook where the examples are analysed using the statistical package that is available to you. Go to the web site associated with your package and see if they recommend any suitable texts.

This section is “about” the statistical analysis, not “how to do the statistical analysis”

Size matters

The aim of most controlled experiments is to estimate the magnitude of any differences between the means (or less frequently the medians or proportion affected) of the treatment groups for a trait of interest.

The statistical analysis normally estimates the probability that differences of the observed magnitude could have arisen by chance sampling variation. These are the so-called “p-values”.

If it is very unlikely that the differences could have arisen by chance, then they are assumed to be the result of the treatment.

[Scatter plot: RBC count (vertical axis, 7.0 to 10.5) of individual mice against dose (horizontal axis, 0 to 2500)]

The first step is to look carefully at the raw data to see whether there are any obvious errors and to get a feel for what is happening.

Graphical methods which show individual observations should be used.

In this case there is one obvious outlier at the 1000 dose level. It was checked and was not a transcription error so it was not deleted. The statistical analysis was done with it and without it. In fact it made no difference to the obvious conclusion that chloramphenicol reduces red blood cell counts.

Red blood cell counts in CBA mice given various dose levels of chloramphenicol

        GP1    GP2
        34     42
        46     42
        35     51
        42     48
        42     44
        42     43
        44     41
        43     45
        39     42
        34     44
        46     44

Means   40.6   44.2
SDs     4.50   2.96

These are body weights of mice (g) fed different diets. The question is whether the difference in means is due to the effect of the diet, or could it just be due to chance sampling variation? There is quite a bit of variation within each group quantified by the standard deviations (SDs).

What is your guess? Is the difference likely to be due to the effect of the treatment?

SD stands for “standard deviation”. It is a measure of the variability in a group of numbers. It has the same units as the numbers, so in this case it is g.

[Plot: individual body weights (vertical axis, 35 to 50) for groups 1 and 2]

A plot with individual observations helps. Differences in means seem to depend on about 3-4 individuals.

We need some objective way of reaching a decision of whether this is likely to be due to chance. The p-value for the difference between the two means provides this.

There are several ways of calculating p-values, one of which is to use a two-sample t-test.

Statistical analysis

The two-sample t-test is shown below. “t” is a “test statistic” which can be used in conjunction with the df (“degrees of freedom”) to estimate the p-value of 0.04114 shown here. In this version of the t-test it is assumed that the variation is the same in each group.

Here the test rejects the null hypothesis that the difference between the means is zero, and it gives a 95% confidence interval for the true difference of -6.93 to -0.157.

If there were really no effect of diet, a difference at least this large would arise from sampling variation only about 4% of the time. We therefore conclude that the difference is “statistically significant at p=0.04” (we usually quote the actual p-value).

Two Sample t-test

data: GP1 and GP2
t = -2.1829, df = 20, p-value = 0.04114
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval: -6.9334705 -0.1574386
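The calculation behind this output can be sketched in Python using only the standard library. This is an illustrative reconstruction, not the code used to produce the slide: the function names are invented here, and the p-value is obtained by numerically integrating the t density with Simpson's rule, which is one simple option among several.

```python
import math
from statistics import mean, variance

gp1 = [34, 46, 35, 42, 42, 42, 44, 43, 39, 34, 46]
gp2 = [42, 42, 51, 48, 44, 43, 41, 45, 42, 44, 44]

def pooled_t(a, b):
    """Return (t, df) for the two-sample t-test assuming equal variances."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / se, df

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """Two-sided p-value via Simpson's rule on the density from 0 to |t|."""
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3          # P(0 < T < |t|)
    return 1 - 2 * central

t, df = pooled_t(gp1, gp2)
p = two_sided_p(t, df)
print(f"t = {t:.4f}, df = {df}, p-value = {p:.5f}")
```

In practice a statistical package would be used; the sketch is only meant to show where t, df and the p-value come from.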

Statistical analysis

We could also use the ANOVA (analysis of variance) to estimate the p-values. When there are only two groups the ANOVA and t-test are mathematically identical. Both give a p-value of 0.041 as shown below.

For an explanation of the ANOVA table see the next page

One-way ANOVA: Wt versus Gp

Analysis of Variance for Wt
Source   DF     SS     MS      F      P
Gp        1   69.1   69.1   4.77  0.041
Error    20  290.2   14.5
Total    21  359.3
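The sums of squares in this table can be checked with a short Python sketch. It is an illustration only; `one_way_anova` is a name invented here, and the p-value column is left to a statistical package.

```python
from statistics import mean

gp1 = [34, 46, 35, 42, 42, 42, 44, 43, 39, 34, 46]
gp2 = [42, 42, 51, 48, 44, 43, 41, 45, 42, 44, 44]

def one_way_anova(groups):
    """Return (ss_between, ss_within, df_between, df_within, F)."""
    all_values = [x for g in groups for x in g]
    grand = mean(all_values)
    # Variation of the group means about the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    # Variation of individuals about their own group mean (error).
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_value

ss_b, ss_w, df_b, df_w, f_value = one_way_anova([gp1, gp2])
print(f"Gp     DF = {df_b:2d}  SS = {ss_b:6.1f}  MS = {ss_b / df_b:5.1f}  F = {f_value:.2f}")
print(f"Error  DF = {df_w:2d}  SS = {ss_w:6.1f}  MS = {ss_w / df_w:5.1f}")
print(f"Total  DF = {df_b + df_w:2d}  SS = {ss_b + ss_w:6.1f}")
```

With only two groups, F is the square of the t statistic: (-2.1829) squared is about 4.77.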

Statistical analysis: The Analysis of Variance (ANOVA)

Analysis of Variance for Wt
Source   DF     SS     MS      F      P
Gp        1   69.1   69.1   4.77  0.041
Error    20  290.2   14.5
Total    21  359.3

Source: the source of variation (groups, error or residual, total)

DF: degrees of freedom (n-1)

SS: quantification of the variation due to each source

MS: mean square (SS/DF). The error mean square is the variance (SD squared)

F: a test statistic like t (actually t²)

P: the p-value

This is the most widely used method of statistical analysis. It is very versatile and is essential for analysing randomised block and factorial designs.

Statistical analysis

[Figure: two residuals diagnostic plots for the response Wt. One panel is “Residuals Versus the Fitted Values” (Residual against Fitted Value); the other is “Normal Probability Plot of the Residuals” (Normal Score against Residual).]

Assumptions

The t-test and the ANOVA are so-called “parametric” tests. They depend on three assumptions:

1. The numbers are independent observations. This depends on correct randomisation of independent experimental units.

2. The residuals (deviation of each observation from its group mean) have a normal (bell-shaped) distribution.

3. The variation is the same in each group.

These “Residuals diagnostic plots” are used to investigate whether these assumptions hold. The top one shows residuals versus fits (group means). All four corners should be equally filled, as shown.

The bottom one should be a straight line if the residuals have a normal distribution, as is the case here.
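The quantities behind these two plots can be computed as follows. This is a sketch for the body-weight data, not the code used for the slides. The plotting position (i - 0.375)/(n + 0.25) used for the normal scores is one common convention and is an assumption here, as is the use of a correlation coefficient as a rough summary of straightness.

```python
from statistics import NormalDist, mean

gp1 = [34, 46, 35, 42, 42, 42, 44, 43, 39, 34, 46]
gp2 = [42, 42, 51, 48, 44, 43, 41, 45, 42, 44, 44]

# Residual = observation minus its own group mean (the fitted value).
residuals = [x - mean(g) for g in (gp1, gp2) for x in g]

# Normal scores for the sorted residuals (assumed plotting position).
n = len(residuals)
ordered = sorted(residuals)
scores = [NormalDist().inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]

def correlation(xs, ys):
    """Pearson correlation; near 1 means the normal plot is close to a straight line."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

r = correlation(scores, ordered)
print(f"correlation of residuals with normal scores: {r:.3f}")
for score, resid in zip(scores, ordered):
    print(f"{score:6.2f}  {resid:6.2f}")
```

Plotting `ordered` against `scores` gives the normal probability plot; plotting `residuals` against the group means gives the residuals versus fits plot.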

Scatter of points should be approximately the same

Points should lie on a straight line

Residuals diagnostic plots

Statistical analysis

[Figure: “Residuals Versus the Fitted Values (response is TumCount)”, Residual against Fitted Value.]

What if these assumptions are not met?

The ANOVA is quite “robust”. Some deviation from the assumptions can be tolerated.

However, if the variation is much greater in the group with a larger mean and the normal plot is not a straight line then the next step is to try a transformation.

Top right shows a plot where the residuals for the smaller counts vary less than those of the larger counts. The bottom plot shows the residuals plot following a transformation X=log(Y+1), where Y is the original value and one has been added to avoid missing data when the count was zero (log of zero is undefined).

An analysis using the transformed data may be more reliable than one using the raw data, although this is a marginal case that may not even need transformation.
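A minimal sketch of the transformation is given below. The counts are hypothetical values invented purely to show the arithmetic (the tumour counts themselves are not listed here), and base 10 logarithms are assumed.

```python
import math

def log_transform(counts):
    """Return log10(y + 1) for each count y; adding one keeps zero counts defined."""
    return [math.log10(y + 1) for y in counts]

# Hypothetical tumour counts, for illustration only.
example_counts = [0, 1, 4, 9, 15]
for y, x in zip(example_counts, log_transform(example_counts)):
    print(f"Y = {y:2d}  ->  X = {x:.3f}")
```

The ANOVA is then run on the transformed values X, and the residuals plots are examined again.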

[Figure: “Residuals Versus the Fitted Values (response is LogTums+)”, Residual against Fitted Value.]

Statistical analysis

More than two treatment groups

A t-test is only suitable for comparing two groups.

But an ANOVA can be used with any number of treatment groups (if the assumptions are reasonably well met). It tests the over-all null hypothesis that the differences in means among groups are zero against the alternative hypothesis that they are not zero.

But it cannot differentiate between the two situations to the right. In both cases it just gives an over-all p-value.

The most common (but not necessarily the best) way of finding out which groups differ significantly from each other is to use “post-hoc comparisons”. These will be available in all good statistical packages. There are many different ones. Use the ones in your package.

Statisticians would usually rather use orthogonal contrasts, but these are not discussed here.

Experiment 1. Three groups all different

Experiment 2. Two groups the same, one group different

Statistical analysis

[Figure: Apoptosis score (vertical axis, 0 to 500) by Week (1, 2, 3) for Control, CGP and STAU.]

Randomised blocks and the two-way ANOVA

The aim of the experiment, right, was to determine whether two drugs, CGP and STAU, affected apoptosis in rat thymocytes (compared with the vehicle control). Each week the investigators humanely killed one rat and prepared three dishes of thymocytes, each of which received one of the three treatments. Apoptosis was scored after incubation for a fixed period.

This was a small randomised block experiment with the blocking factor being Week. Notice the large week-to-week variation, but a similar relationship between groups each week.

Raw data

             C    CGP   STAU
Week 1     365    398    421
Week 2     423    432    459
Week 3     308    320    329
Means    365.3  383.3  403.0

Statistical analysis

Two-way ANOVA without interaction

Source   DF     SS     MS       F      P
Block     2  21764  10882  114.82  0.000
Treat     2   2129   1064   11.23  0.023
Error     4    379     94
Total     8  24272

The experiment needs to be analysed using a 2-way ANOVA without interaction, shown on the right. The over-all test of the null hypothesis that there is no difference among the treatment groups gives a p-value of 0.023. Hence we reject the null hypothesis. Notice that most of the total variation is due to the blocks, but this is of little interest because we know that it is very difficult to get identical absolute measurements each week in such studies.

The SD is the square root of the error mean square (94).

The post-hoc Dunnett’s test shows that STAU but not CGP differs significantly from the control. But this is a very small experiment which lacks power.

          Means   p-value*
Control   365.3   -----
CGP       383.3   0.14
STAU      403.0   0.02
SD          9.7

* Using Dunnett’s test, a post-hoc test for comparing the means of treated groups with controls (not discussed here).
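The two-way ANOVA without interaction for this randomised block experiment can be sketched in Python as below. It is an illustration only: `randomised_block_anova` is a name invented here, the p-values are left to a statistical package, and Dunnett's test is not attempted.

```python
import math
from statistics import mean

# Apoptosis scores: one row per week (block); columns are Control, CGP, STAU.
scores = [
    [365, 398, 421],  # Week 1
    [423, 432, 459],  # Week 2
    [308, 320, 329],  # Week 3
]

def randomised_block_anova(rows):
    """Two-way ANOVA without interaction, one observation per block/treatment cell."""
    n_blocks, n_treats = len(rows), len(rows[0])
    grand = mean(x for row in rows for x in row)
    block_means = [mean(row) for row in rows]
    treat_means = [mean(col) for col in zip(*rows)]
    ss_block = n_treats * sum((m - grand) ** 2 for m in block_means)
    ss_treat = n_blocks * sum((m - grand) ** 2 for m in treat_means)
    ss_total = sum((x - grand) ** 2 for row in rows for x in row)
    ss_error = ss_total - ss_block - ss_treat
    df_error = (n_blocks - 1) * (n_treats - 1)
    ms_error = ss_error / df_error
    return {
        "ss_block": ss_block,
        "ss_treat": ss_treat,
        "ss_error": ss_error,
        "f_block": ss_block / (n_blocks - 1) / ms_error,
        "f_treat": ss_treat / (n_treats - 1) / ms_error,
        "sd": math.sqrt(ms_error),  # pooled SD = square root of error mean square
    }

result = randomised_block_anova(scores)
for key, value in result.items():
    print(f"{key:9s} {value:10.2f}")
```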

Statistical analysis

[Figure: “Residual Model Diagnostics”, four panels: Histogram of Residuals (Frequency against Residual); I Chart of Residuals (Residual against Observation Number, Mean=3.16E-14, UCL=20.17, LCL=-20.17); Residuals vs. Fits (Residual against Fit); Normal Plot of Residuals (Residual against Normal Score).]

The residuals diagnostic plots show that the two assumptions of normality of residuals and homogeneous variances appear to be reasonably well met (plots top left and bottom right).

Statistical analysis

Strain           Control   Treated   Strain means
BALB/c            10.10      8.95
                  10.08      8.45
                   9.73      8.68
                  10.09      8.89        9.37
C57BL              9.60      8.82
                   9.56      8.24
                   9.14      8.18
                   9.20      8.10        8.86
Treatment mean     9.69      8.54

A real example. We want to know:
1. Does treatment have an effect on RBC counts?
2. Do strains differ in RBC counts?
3. Do strains differ in their response (interaction)?

This data was shown earlier. It can be analysed using a two-way ANOVA with interaction.

Effect of chloramphenicol (200mg/kg) on Red Blood Cell counts in mice of two strains

The two-way ANOVA (next page) shows that there are significant effects associated with strain and treatment, but no significant interaction (p=0.40).

[Figure: “Plot of Means”. Mean of Chloramphenicol2$RBC (vertical axis, 8.5 to 10.0) against Chloramphenicol2$Treat (c, t), with one line per Chloramphenicol2$Strain (BALB/c, C57BL).]

A plot of the means shows that the reduction is similar in the two strains, and the 2-way ANOVA with interaction confirms this.

Analysis of Variance for RBCs

Source             DF      SS      MS      F      P
Strain              1  1.0661  1.0661  17.15  0.001
Treatment           1  5.2785  5.2785  84.92  0.000
Strain*Treatment    1  0.0473  0.0473   0.76  0.400
Error              12  0.7459  0.0622
Total              15  7.1377

The effect can be stated in the original units as 9.69-8.54 = 1.15 units, or it can be expressed in standard deviation units. The standard deviation is the square root of the error mean square in the ANOVA, √0.0622 = 0.2494. So the response was 1.15/0.2494 = 4.61 standard deviations. (But note that this should not be done with smaller sample sizes, which need a correction factor.)
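The two-way ANOVA with interaction, and the effect in standard deviation units, can be reproduced with the following Python sketch. It assumes a balanced design (equal numbers per cell), the function name `factorial_anova` is invented here, and p-values are left to a statistical package.

```python
import math
from statistics import mean

# RBC counts keyed by (strain, treatment), four mice per cell.
rbc = {
    ("BALB/c", "Control"): [10.10, 10.08, 9.73, 10.09],
    ("BALB/c", "Treated"): [8.95, 8.45, 8.68, 8.89],
    ("C57BL", "Control"): [9.60, 9.56, 9.14, 9.20],
    ("C57BL", "Treated"): [8.82, 8.24, 8.18, 8.10],
}

def factorial_anova(cells):
    """Two-way ANOVA with interaction for a balanced design keyed by (a, b)."""
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    n = len(next(iter(cells.values())))  # replicates per cell
    grand = mean(x for v in cells.values() for x in v)

    def level_mean(index, level):
        return mean(x for key, v in cells.items() if key[index] == level for x in v)

    ss_a = n * len(b_levels) * sum((level_mean(0, a) - grand) ** 2 for a in a_levels)
    ss_b = n * len(a_levels) * sum((level_mean(1, b) - grand) ** 2 for b in b_levels)
    ss_cells = n * sum((mean(v) - grand) ** 2 for v in cells.values())
    ss_ab = ss_cells - ss_a - ss_b  # interaction
    ss_error = sum((x - mean(v)) ** 2 for v in cells.values() for x in v)
    df_a, df_b = len(a_levels) - 1, len(b_levels) - 1
    ms_error = ss_error / (len(cells) * (n - 1))
    return {
        "ss": (ss_a, ss_b, ss_ab, ss_error),
        "f": (ss_a / df_a / ms_error,
              ss_b / df_b / ms_error,
              ss_ab / (df_a * df_b) / ms_error),
        "ms_error": ms_error,
    }

result = factorial_anova(rbc)
control = mean(x for (_, t), v in rbc.items() if t == "Control" for x in v)
treated = mean(x for (_, t), v in rbc.items() if t == "Treated" for x in v)
effect_in_sd = (control - treated) / math.sqrt(result["ms_error"])

print("SS (strain, treatment, interaction, error):",
      [round(s, 4) for s in result["ss"]])
print("F  (strain, treatment, interaction):",
      [round(f, 2) for f in result["f"]])
print(f"effect = {control - treated:.2f} units = {effect_in_sd:.2f} SDs")
```

Small differences in the last digit from the figures quoted above can arise because the text works from rounded values.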

Effect of chloramphenicol (2000mg/kg) on RBC count

Strain            Control   Treated   Strain means
C3H                 7.85      7.81
                    8.77      7.21
                    8.48      6.96
                    8.22      7.10        7.80
CD-1                9.01      9.18
                    7.76      8.31
                    8.42      8.47
                    8.83      8.67        8.58
Treatment means     8.42      7.96

Here are two different mouse strains. In this case the treatment seems to have reduced the RBC counts in C3H but not in CD-1.

Statistical analysis shows a highly significant interaction effect (see section 11). This can be seen in a plot of the means

[Figure: “Plot of Means”. Mean of Chlorampehicol$RBC (vertical axis, 7.4 to 8.6) against Chlorampehicol$Trt (C, T), with one line per Chlorampehicol$Strain (C3H, CD-1).]

In this case the CD-1 outbred stock of mice was resistant to chloramphenicol at this dose level, but C3H responded strongly, giving a highly significant interaction effect.

Effect of chloramphenicol on RBC counts (2000mg/kg) in mice of two strains

Source      Df   Sum Sq  Mean Sq  F value    Pr(>F)
Trt          1  0.82356  0.82356   4.4302  0.057057 .
Strain       1  2.44141  2.44141  13.1330  0.003489 **
Trt:Strain   1  1.47016  1.47016   7.9084  0.015686 *
Residuals   12  2.23077  0.18590

Note that this ANOVA was done using the R statistical package which doesn’t show totals and heads the p-value column as Pr(>F), but numerically it gives the same results. Note the interaction is significant at p=0.01569.

In this situation the response should be expressed separately for each strain
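Expressing the response separately for each strain might look like this in Python. It is a sketch with names chosen here for illustration, using the C3H and CD-1 data from the 2000mg/kg experiment.

```python
from statistics import mean

rbc = {
    "C3H": {"Control": [7.85, 8.77, 8.48, 8.22], "Treated": [7.81, 7.21, 6.96, 7.10]},
    "CD-1": {"Control": [9.01, 7.76, 8.42, 8.83], "Treated": [9.18, 8.31, 8.47, 8.67]},
}

# Reduction in mean RBC count (control minus treated), strain by strain.
reduction = {
    strain: mean(groups["Control"]) - mean(groups["Treated"])
    for strain, groups in rbc.items()
}
for strain, value in reduction.items():
    print(f"{strain}: control - treated = {value:.2f}")
```

A positive value is a reduction in RBC count on treatment; a value close to zero, as for CD-1, indicates little or no response.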

Presenting your results

The aim of your scientific paper or report is to communicate your results as clearly and concisely as possible.

Sufficient information should be provided to enable somebody else to repeat the experiments. The ARRIVE guidelines, given in the next section, can be used as a check-list to ensure that nothing has been forgotten.

This section gives some general advice on the presentation of the numerical results.

Decimal places

Means, medians and standard deviations should normally be given to no more than three significant digits, e.g. 13.3, 0.0124.
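One simple way of rounding to three significant digits in Python is the `g` format code, sketched below. The helper name is invented here.

```python
def to_three_sig(x):
    """Format a number to three significant digits."""
    return f"{x:.3g}"

print(to_three_sig(13.3456))   # 13.3
print(to_three_sig(0.012449))  # 0.0124
```

Note that this format drops trailing zeros and switches to exponent notation for large numbers, so the output may still need tidying by hand.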

Presenting your results

Standard deviation, standard error or confidence interval?

1. A standard deviation (SD) is used to describe individual variability.

2. A standard error (SE or better SEM) is used to describe the variability of means.

3. A confidence interval (CI) is used to indicate the range within which we can be reasonably sure the true mean lies.

4. In all three cases it is important to know the numbers in each group.

5. Rather than using a ± it is better to use a designation such as “Mean = 10.1 (SD 1.5, n=8)” or “Mean = 10.1 (SE 1.5, n=8)” so that there can be no confusion between standard error and standard deviation.

6. When two means are being compared, the size of the difference between them should be quoted, with a confidence interval.

7. The difference could also be expressed as the Standardised Effect Size, SES (the difference divided by the pooled SD). However, this is biased upwards if “n” is very low (say less than 10). The SES is a ratio without units and can be used to compare different characters.
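As an illustration of points 1, 2, 5 and 7, the sketch below computes the SD, SEM and SES for the body-weight data used in the statistical analysis section. The function names are invented here, and no small-sample correction is applied to the SES.

```python
import math
from statistics import mean, stdev, variance

gp1 = [34, 46, 35, 42, 42, 42, 44, 43, 39, 34, 46]
gp2 = [42, 42, 51, 48, 44, 43, 41, 45, 42, 44, 44]

def describe(values):
    """Return (mean, SD, SEM, n) for one group."""
    n = len(values)
    sd = stdev(values)
    return mean(values), sd, sd / math.sqrt(n), n

def standardised_effect_size(a, b):
    """Difference in means divided by the pooled SD (uncorrected)."""
    df = len(a) + len(b) - 2
    pooled_var = ((len(a) - 1) * variance(a) + (len(b) - 1) * variance(b)) / df
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)

m, sd, sem, n = describe(gp1)
ses = standardised_effect_size(gp1, gp2)
print(f"Mean = {m:.1f} (SD {sd:.2f}, n={n})")
print(f"Mean = {m:.1f} (SE {sem:.2f}, n={n})")
print(f"SES = {ses:.2f}")
```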

Presenting your results

When medians are being quoted, the 25% and 75% centiles can be given.

Where means are tabulated, they should be shown in columns rather than rows as this makes it easier to compare them.

If the means have been compared using an analysis of variance, then the assumption will have been made (and tested using residuals plots) that the variation is the same in each group. In this case a pooled standard deviation should be quoted rather than showing separate SDs for each mean.

When an analysis of variance has been used to analyse the results, an F-value should be quoted with numerator and denominator degrees of freedom, as well as a p-value (e.g. F(3, 9) = 3.91, p = 0.049).
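A small helper for producing this kind of statement consistently might look like the following. The function name and the exact layout are choices made here, not a prescribed format.

```python
def report_f(f_value, df_num, df_den, p_value):
    """Format an F-test result with numerator and denominator degrees of freedom."""
    return f"F({df_num}, {df_den}) = {f_value:.2f}, p = {p_value:.3f}"

print(report_f(3.91, 3, 9, 0.049))
```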

Presenting your results

Your papers should be written in such a way that other scientists can replicate your results

The ARRIVE guidelines shown in this section can be used as a check-list to make sure that you have not forgotten anything.

The points that they make may seem obvious, but many of the errors discussed in section 1 are the result of poorly written papers.

Other errors are the result of poorly designed experiments. If you have got this far in this document, your experiments should have been reasonably well designed.

Remember that a picture is said to be worth a thousand words, but make sure that it is worthwhile. Presenting means as bar diagrams may be good, but in some cases it is just a waste of space.

The ARRIVE Guidelines

Concerns about the quality of research involving animals were expressed in Section 1 “Why bother”. Anyone who has worked through this presentation should have a better idea of how to design and analyse an animal experiment, although the discussion of the statistical analysis needs to be supported by a good statistical textbook and software.

But writing the paper is a major bottleneck, and it doesn’t always get the attention that it deserves. Information vital for assessing the importance and reliability of a paper is often missing.

The ARRIVE (Animals in Research: Reporting In Vivo Experiments) Guidelines are based on the CONSORT statement for randomised clinical trials (see references, right) published in 2001, and now widely used when reporting clinical research.

The following pages list the 20 items which need to be taken into account when writing a paper involving the use of laboratory animals.

Moher D, Schulz KF, Altman DG for the CONSORT Group (2001) The CONSORT statement: revised recommendations for improving the quality of reports of parallel-group randomised trials. Lancet 357: 1191–1194.

Kilkenny C, Browne WJ, Cuthill IC, Emerson M, Altman DG (2010) Improving bioscience research reporting: the ARRIVE guidelines for reporting animal research. PLoS Biol 8: e1000412.

The ARRIVE guidelines (re-formatted from the original publication)

1. TITLE Provide as accurate and concise a description of the content of the article as possible.

2. ABSTRACT Provide an accurate summary of the background, research objectives (including details of the species or strain of animal used), key methods, principal findings, and conclusions of the study.

INTRODUCTION

3. Background.
• a. Include sufficient scientific background (including relevant references to previous work) to understand the motivation and context for the study, and explain the experimental approach and rationale.

• b. Explain how and why the animal species and model being used can address the scientific objectives and, where appropriate, the study’s relevance to human biology.

4. Objectives. Clearly describe the primary and any secondary objectives of the study, or specific hypotheses being tested.

The ARRIVE guidelines

METHODS

5. Ethical statement
• Indicate the nature of the ethical review permissions, relevant licences (e.g. Animal [Scientific Procedures] Act 1986), and national or institutional guidelines for the care and use of animals, that cover the research.

6. Study design
For each experiment, give brief details of the study design, including:
• a. The number of experimental and control groups.
• b. Any steps taken to minimise the effects of subjective bias when allocating animals to treatment (e.g., randomisation procedure) and when assessing results (e.g., if done, describe who was blinded and when).
• c. The experimental unit (e.g. a single animal, group, or cage of animals). A time-line diagram or flow chart can be useful to illustrate how complex study designs were carried out.

The ARRIVE guidelines

METHODS (continued)

7. Experimental procedures.
For each experiment and each experimental group, including controls, provide precise details of all procedures carried out. For example:
• a. How (e.g., drug formulation and dose, site and route of administration, anaesthesia and analgesia used [including monitoring], surgical procedure, method of euthanasia). Provide details of any specialist equipment used, including supplier(s).
• b. When (e.g., time of day).
• c. Where (e.g., home cage, laboratory, water maze).
• d. Why (e.g., rationale for choice of specific anaesthetic, route of administration, drug dose used).

8. Experimental animals
• a. Provide details of the animals used, including species, strain, sex, developmental stage (e.g., mean or median age plus age range), and weight (e.g., mean or median weight plus weight range).
• b. Provide further relevant information such as the source of animals, international strain nomenclature, genetic modification status (e.g. knock-out or transgenic), genotype, health/immune status, drug- or test-naive, previous procedures, etc.

The ARRIVE guidelines

METHODS (continued)

9. Housing and husbandry
Provide details of:
• a. Housing (e.g., type of facility, e.g., specific pathogen free (SPF); type of cage or housing; bedding material; number of cage companions; tank shape and material etc. for fish).
• b. Husbandry conditions (e.g., breeding programme, light/dark cycle, temperature, quality of water etc. for fish, type of food, access to food and water, environmental enrichment).
• c. Welfare-related assessments and interventions that were carried out before, during, or after the experiment.

10. Sample size
• a. Specify the total number of animals used in each experiment and the number of animals in each experimental group.
• b. Explain how the number of animals was decided. Provide details of any sample size calculation used.
• c. Indicate the number of independent replications of each experiment, if relevant.

The ARRIVE guidelines

METHODS (continued)

11. Allocating animals to experimental groups
• a. Give full details of how animals were allocated to experimental groups, including randomisation or matching if done.
• b. Describe the order in which the animals in the different experimental groups were treated and assessed.

12. Experimental outcomes
• Clearly define the primary and secondary experimental outcomes assessed (e.g., cell death, molecular markers, behavioural changes).

13. Statistical methods
• a. Provide details of the statistical methods used for each analysis.
• b. Specify the unit of analysis for each dataset (e.g. single animal, group of animals, single neuron).
• c. Describe any methods used to assess whether the data met the assumptions of the statistical approach.

The ARRIVE guidelines

RESULTS

14. Baseline data
• For each experimental group, report relevant characteristics and health status of animals (e.g., weight, microbiological status, and drug- or test-naive) before treatment or testing (this information can often be tabulated).

15. Numbers analysed
• a. Report the number of animals in each group included in each analysis. Report absolute numbers (e.g. 10/20, not 50%).
• b. If any animals or data were not included in the analysis, explain why.

16. Outcomes and estimation
• Report the results for each analysis carried out, with a measure of precision (e.g., standard error or confidence interval).

17. Adverse events
• a. Give details of all important adverse events in each experimental group.
• b. Describe any modifications to the experimental protocols made to reduce adverse events.

The ARRIVE guidelines

DISCUSSION

18. Interpretation/scientific implications
• a. Interpret the results, taking into account the study objectives and hypotheses, current theory, and other relevant studies in the literature.
• b. Comment on the study limitations including any potential sources of bias, any limitations of the animal model, and the imprecision associated with the results.
• c. Describe any implications of your experimental methods or findings for the replacement, refinement, or reduction (the 3Rs) of the use of animals in research.

19. Generalisability/translation. Comment on whether, and how, the findings of this study are likely to translate to other species or systems, including any relevance to human biology.

20. Funding. List all funding sources (including grant number) and the role of the funder(s) in the study.

Test yourself

Question 1

An investigator plans an experiment with a control and treated group, but reduces the numbers in the treated group and increases those in the control group because she fears that the treated animals may experience pain. Is this an example of:

1. Replacement

2. Refinement

3. Reduction

Question 2

Which is the best way to randomise 12 animals all in the same cage to three treatment groups A, B and C?

1. Roll a die and assign the first animal to group A if the die shows 1 or 2, to group B if it shows 3 or 4 or to group C if it shows 5 or 6.

2. Assign the first animal to group A, the next to Group B and the third to group C, and repeat this four times

3. Assign the first four animals to group A, the next four to group B and the last four to group C.

4. Use EXCEL to randomise a column with 4 As, 4 Bs and 4 Cs and assign the animals according to the random sequence

Feedback

Question 1.

This is an example of Refinement because the aim is to reduce over-all suffering. Many people also feel that it is better for large numbers of animals to have a mild stress rather than fewer having more severe stress/pain, although this is not the situation here.

Question 2.

Option 1. Although random, this would not necessarily result in the same number of animals in each group, so it would not be good.

Option 2. This does not assign the animals at random. If the first animals are easiest to catch there would be a tendency for group A to have more easily caught animals than group C.

Option 3. This would be even worse than option 2.

Option 4. This is the best method. It will result in equal numbers per group, and the animals are truly assigned at random
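The same idea as option 4 can be sketched in Python instead of EXCEL. The function name and the seed value are arbitrary choices made here; fixing the seed simply makes the allocation reproducible so that it can be documented.

```python
import random

def randomise(n_per_group, groups=("A", "B", "C"), seed=None):
    """Return a random sequence of group labels with equal numbers per group."""
    labels = [g for g in groups for _ in range(n_per_group)]
    rng = random.Random(seed)
    rng.shuffle(labels)  # random permutation of the 4 As, 4 Bs and 4 Cs
    return labels

allocation = randomise(4, seed=2011)
for animal, group in enumerate(allocation, start=1):
    print(f"Animal {animal:2d} -> group {group}")
```

Animals are then assigned in the order they are taken from the cage, so any tendency for the first animals to be easier to catch is spread at random across the groups.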

Test yourself

Question 3. Properties of inbred strains and outbred stocks of mice and rats

Cheaper to buy

Phenotypically more uniform

Genetically more stable and less likely to change

Easier genetic quality control

Most commonly used by toxicologists

Most commonly used by geneticists

Like an immortal clone of genetically identical individuals

Large strain differences

Characteristics may change following selective breeding

Well established and widely used strain nomenclature

Inbred strains Outbred stocks Both

Test yourself

4. What are the “3Rs” of humane experimental technique?

Relocation
Removable
Replacement
Research
Resolve
Resource
Results
Resurface
Revealed
Rewrite
Radish
Reading
Ready
Real
Relish
Reduction
Refinement
Refreshments
Related
Re-locatable

Test yourself

In the Power Analysis method of determining sample size, if you make the changes noted below, how would it alter the group (sample) size needed? For each change, sample size would be: Increased or Decreased.

Power. Increase it from 80% to 90%

Significance level. Increase it from 0.05 to 0.10

Standard deviation. Decrease it by choosing uniform animals

Alternative hypothesis. Make it one instead of two sided

Effect size. Increase it by increasing the dose
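The direction of each change can be explored numerically with the sketch below. It uses the common normal approximation n = 2((z_alpha + z_power) × SD / effect)² for comparing two means, which is an assumption made here and slightly underestimates the numbers given by an exact t-based calculation. The baseline SD and effect size of 1.0 are hypothetical.

```python
import math
from statistics import NormalDist

def n_per_group(sd, effect, alpha=0.05, power=0.80, two_sided=True):
    """Approximate animals per group for comparing two means (normal approximation)."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    z_power = z(power)
    return math.ceil(2 * ((z_alpha + z_power) * sd / effect) ** 2)

# Hypothetical baseline: SD = 1.0, effect = 1.0, 5% two-sided, 80% power.
base = n_per_group(sd=1.0, effect=1.0)
changes = {
    "power 80% -> 90%": n_per_group(1.0, 1.0, power=0.90),
    "significance 0.05 -> 0.10": n_per_group(1.0, 1.0, alpha=0.10),
    "smaller SD (0.8)": n_per_group(0.8, 1.0),
    "one-sided alternative": n_per_group(1.0, 1.0, two_sided=False),
    "larger effect (1.25)": n_per_group(1.0, 1.25),
}
print(f"baseline: {base} per group")
for label, n in changes.items():
    print(f"{label}: {n} per group")
```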

Test yourself

An investigator wants to find out whether a drug affects activity of mice when given access to a running wheel over a period of a week. The drug will be given in the food.

He plans to use two strains of mice, three dose levels and both sexes in a factorial design.

Using the Resource Equation method, what group size could be recommended?

2-3

4-5

6-7

8-9

>9

Feedback

With three doses, two strains and both sexes there are 12 treatment groups altogether.

With two animals per group E= 24-12 = 12

With three animals per group E=36-12= 24

E should be between about 10 and 20, so 2-3 animals per group would be adequate.
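The arithmetic above can be written as a short Python sketch, where E is the total number of animals minus the number of treatment groups. The function name is invented here.

```python
def resource_equation_e(n_per_group, n_groups):
    """E = total number of animals minus number of treatment groups."""
    return n_per_group * n_groups - n_groups

n_groups = 3 * 2 * 2  # doses x strains x sexes = 12 treatment groups
for n in (2, 3, 4):
    e = resource_equation_e(n, n_groups)
    print(f"{n} animals per group: E = {n * n_groups} - {n_groups} = {e}")
```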

Most important points

State clearly the purpose of the study

Explain why you have chosen a particular animal model

Think about the 3Rs in relation to your experiments

Explicitly identify your experimental unit

Explain how you decided sample size (power analysis, resource equation or fixed by availability)

Explain how the experimental units were randomised to the treatment groups

Use coded samples where possible to blind yourself (and others) to which treatment group a subject belongs

Think about ways of reducing the variability to increase power e.g. optimum/non-stressful housing, freedom from disease

If using rodents, use inbred strains or justify not using them

Most important points (continued)

Choose a suitable experimental design (completely randomised, randomised block, Latin square, split plot etc)

Consider using a factorial design to explore generality of your results

Decide how you are going to do the statistical analysis before starting the experiment, recognising that methods may need to be modified when the results are obtained.

Choose a good statistical package and learn how to use it.

Make use of graphical methods, particularly those showing individual points, to screen and display your results

Consider quoting/displaying the results in standard deviation units (this will also help those doing a meta-analysis)

Most important points (continued)

Learn some statistics (buy a good statistics textbook, take a course on statistics)

Learn to use the analysis of variance

One way (completely randomised design)

Two-way without interaction (randomised block design)

Two-or-more-way with interaction (factorial design)

Be very honest about deleting observations, but try analysis with/without to see if they make any difference

Use the ARRIVE guidelines to ensure that you have not missed anything when writing your paper/thesis
