Transcript Slide 1

The design of animal experiments
Michael FW Festing
c/o Understanding Animal Research, 25 Shaftesbury Av., London, UK.
1
Principles of Humane Experimental
Technique
(Russell and Burch 1959)
• Replacement
  • e.g. in-vitro methods, less sentient animals
• Refinement
  • e.g. anaesthesia and analgesia, environmental enrichment
• Reduction
  • Research strategy
  • Controlling variability
  • Experimental design and statistics
2
A well designed experiment
• Absence of bias
  • Experimental unit, randomisation, blinding
• High power
  • Low noise (uniform material, blocking, covariance)
  • High signal (sensitive subjects, high dose)
  • Large sample size
• Wide range of applicability
  • Replicate over other factors (e.g. sex, strain): factorial designs
• Simplicity
• Amenable to a statistical analysis
3
The animal as the experimental unit
N=8
n=4
Animals individually treated. May be individually housed or grouped
4
A cage as the experimental unit
[Figure: four cages, labelled Treated, Control, Treated, Control]
N=4
n=2
Treatment in water or diet.
5
An animal for a period of time: repeated measures or crossover design
[Figure: animals 1, 2 and 3, each contributing 4 experimental units (periods), allocated to Treatment 1 or Treatment 2]
N=12
n=6
6
Teratology: mother treated,
young measured
N=2
n=1
Mother is the experimental unit.
7
Failure to identify the experimental unit correctly in a 2 (strains) x 3 (treatments) x 6 (times) factorial design
[Figure: two panels, each labelled "ELD group"]
Single cage of 8 mice killed at each time point (288 mice in total)
8
Experimental units must be randomised to treatments
• Physical: numbers on cards. Shuffle and take one
• Tables of random numbers in most text books
• Use a computer, e.g. EXCEL or a statistical package such as MINITAB
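The computer option can be sketched in Python as well as in EXCEL or MINITAB. This is only an illustration: the function name `randomise_units`, the unit numbering 1-12 and the seed are assumptions made for the example, not part of the slides.

```python
import random


def randomise_units(unit_ids, treatments, seed=None):
    """Randomly allocate experimental units to treatments in equal numbers."""
    if len(unit_ids) % len(treatments) != 0:
        raise ValueError("number of units must be a multiple of the number of treatments")
    per_group = len(unit_ids) // len(treatments)
    # One label per unit, e.g. [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]
    labels = [t for t in treatments for _ in range(per_group)]
    rng = random.Random(seed)  # a fixed seed makes the allocation reproducible
    rng.shuffle(labels)
    return dict(zip(unit_ids, labels))


# Hypothetical example: 12 units (animals or cages) and 3 treatments
allocation = randomise_units(list(range(1, 13)), [1, 2, 3], seed=2009)
for unit, treatment in allocation.items():
    print(f"unit {unit:2d} -> treatment {treatment}")
```

Recording the seed with the experimental records allows the allocation to be reproduced later.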
9
Randomisation
Original  Randomised
1         2
1         3
1         3
1         1
2         2
2         1
2         2
2         1
3         3
3         2
3         3
3         1
NB Randomisation should include housing and the order in which observations are made
10
Failure to randomise and/or blind leads to more “positive” results
Comparison                         Odds ratio (95% CI)
Blind / not blind                  3.4 (1.7-6.9)
Random / not random                3.2 (1.3-7.7)
Blind random / not blind random    5.2 (2.0-13.5)
290 animal studies scored for blinding, randomisation and positive/negative outcome, as defined by authors
Bebarta et al 2003 Acad. emerg. med. 10:684-687
11
Some factors (e.g. strain, sex) cannot be randomised, so special care is needed to ensure comparability
Six cages of 7-9 mice of each strain: error bars are SEMs
"CBA mice showed greater variability in body weights than TO mice..."
Outbred TO (8-12 weeks, commercial)
Inbred CBA (12-16 weeks, home bred)
12
A well designed experiment
• Absence of bias
  • Experimental unit, randomisation, blinding
• High power
  • Low noise (uniform material, blocking, covariance)
  • High signal (sensitive subjects, high dose)
  • Large sample size
• Wide range of applicability
  • Replicate over other factors (e.g. sex, strain): factorial designs
• Simplicity
• Amenable to a statistical analysis
13
High power (a good chance of detecting the effect of a treatment, if there is one)
= High signal/noise ratio
= High standardized effect size, $d = |m_1 - m_2| / s$
= High (difference between means)/SD
= High Student's t, $t = (\bar{X}_1 - \bar{X}_2) / \sqrt{2 S^2 / n}$
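The two formulas on this slide can be tried out with a few lines of Python. The numbers used (means of 10 and 8, SD of 2, 8 animals per group) are hypothetical and chosen only to make the arithmetic easy to follow; equal group sizes and a common SD are assumed.

```python
import math


def standardised_effect_size(mean1, mean2, sd):
    """d = |m1 - m2| / s : the signal/noise ratio."""
    return abs(mean1 - mean2) / sd


def students_t(mean1, mean2, sd, n):
    """t = (X1 - X2) / sqrt(2 * S^2 / n) for two groups of n animals each."""
    return (mean1 - mean2) / math.sqrt(2 * sd ** 2 / n)


# Hypothetical values, not taken from the slides
d = standardised_effect_size(10.0, 8.0, 2.0)
t = students_t(10.0, 8.0, 2.0, 8)
print(f"d = {d:.2f}, t = {t:.2f}")
```

Written this way it is clear that t = d x sqrt(n/2): for a given group size, anything that raises the signal/noise ratio raises t by the same proportion.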
14
Power Analysis for sample size and effects of variation
• A mathematical relationship between six variables
• Needs subjective estimate of effect size to be detected (signal)
• Has to be done separately for each character
• Not easy to apply to complex designs
• Essential for expensive, simple, large experiments (clinical trials)
• Useful for exploring effect of variability
A second method, “The Resource Equation”, is described later
15
Power analysis: the variables
• Signal: a) effect size of scientific interest, or b) actual response
• Noise: variability of the experimental material
• Significance level: chance of a false positive result (0.05)
• Power of the experiment (80-90%?)
• Sidedness of statistical test (usually 2-sided)
• Sample size
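How the six variables fit together can be sketched with the usual normal approximation for a two-sample comparison, n per group ≈ 2(z_alpha + z_power)^2 / d^2, where d = signal/noise. This is an approximation, not the exact t-based calculation a package such as nQuery Advisor would perform, so it will tend to give group sizes about one animal smaller. The function name and the example values are assumptions made for illustration.

```python
import math
from statistics import NormalDist


def sample_size_per_group(signal, noise, alpha=0.05, power=0.90, two_sided=True):
    """Approximate group size for a two-sample comparison (normal approximation)."""
    d = signal / noise  # standardised effect size (signal/noise ratio)
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2) if two_sided else z.inv_cdf(1 - alpha)
    z_power = z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / d ** 2)


# Hypothetical signal of one standard deviation (d = 1)
print(sample_size_per_group(signal=1.0, noise=1.0, power=0.90))
print(sample_size_per_group(signal=1.0, noise=1.0, power=0.80))
# Halving the noise (d = 2) cuts the required group size sharply
print(sample_size_per_group(signal=1.0, noise=0.5, power=0.90))
```

Fixing any five of the six variables determines the sixth; here sample size is the one solved for.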
16
Group size and signal/noise ratio
[Figure: required group size (0-140) plotted against signal/noise ratio, i.e. effect size in standard deviations (0-3), with curves for 80% and 90% power; regions labelled Bad, Neutral and Good]
Assuming 2-sample, 2-sided t-test and 5% significance level
17
Comparison of two anaesthetics for dogs
under clinical conditions
(Vet. Anaesthes. Analges.)
Unsexed healthy clinic dogs,
• Weight 3.8 to 42.6 kg.
• Systolic BP 141 (SD 36) mm Hg
Assume:
• a 20 mmHg difference between
anaesthetics is of clinical
importance,
• a significance level of α=0.05
• a power=90%
• a 2-sided t-test
Signal/Noise ratio 20/36 = 0.56
Required sample size 68/group
18
Power and sample size
calculations using nQuery Advisor
19
A second paper described:
• Male Beagles weight 17-23 kg
• mean BP 108 (SD 9) mm Hg.
• Want to detect a 20 mmHg difference between groups (as before)
With the same assumptions as
previous slide:
Signal/noise ratio = 20/9 = 2.22
Required sample size 6/group
20
Summary for two sources of dogs: aim is to be able to detect a 20 mmHg change in blood pressure
Type of dog    SDev  Signal/noise  Sample size/gp (1)  %Power (n=8) (2)
Random dogs    36    0.56          68                  18
Male beagles   9     2.22          6                   98
(1) Sample size: 90% power
(2) Power, sample size 8/group
Assumes α=5%, 2-sided t-test and effect size 20 mmHg
The scientific dilemma:
With small sample sizes we cannot detect an important effect in genetically heterogeneous animals.
We can detect the effect in genetically homogeneous animals, but are they representative?
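The power column of this summary can be roughly checked with a normal approximation to the two-sided, two-sample t-test. This is a sketch under that assumption; because it ignores the extra uncertainty in estimating the SD from small groups, it comes out a little higher than the exact t-based figures on the slide. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def approx_power(signal, noise, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample comparison (normal approximation)."""
    z = NormalDist()
    d = signal / noise                        # signal/noise ratio
    shift = d * math.sqrt(n_per_group / 2)    # expected value of the test statistic
    z_alpha = z.inv_cdf(1 - alpha / 2)
    # Probability of exceeding the critical value in either tail
    return z.cdf(shift - z_alpha) + z.cdf(-shift - z_alpha)


# SDs from the two papers; 20 mmHg difference; 8 dogs per group
for label, sd in [("Random dogs", 36.0), ("Male beagles", 9.0)]:
    power = approx_power(signal=20.0, noise=sd, n_per_group=8)
    print(f"{label:12s} SD={sd:4.0f}  approx. power = {100 * power:.0f}%")
```

The contrast between the two sources of dogs is the point: the same 8 animals per group give low power with heterogeneous dogs and very high power with uniform beagles.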
21
Variation in kidney weight in 58 groups of rats
[Figure: variability (0-90) plotted against sample number (1-57), with groups labelled Mycoplasma, Outbred, F1 and F2]
Gartner, K. (1990), Laboratory Animals, 24:71-77.
22
Required sample sizes
Factor    Type             Std.Dev  Signal/noise*  Sample size  Power**
Genetics  F1 hybrid        13.5     0.74           30           80
          F2 hybrid        18.4     0.54           55           53
          Outbred          20.1     0.49           67           46
Disease   Mycoplasma free  18.6     0.54           55           53
          With Mycoplasma  43.3     0.23           298          14
* Signal is 10 units, two-sided t-test, α=0.05, power = 80%
** Assuming fixed sample size of 30/group
23
The randomised block design: another method of controlling noise
Treatments A, B & C
Block  Treatment order
B1     B  C  A
B2     A  C  B
B3     B  A  C
B4     A  C  B
B5     B  C  A
• Randomisation is within-block
• Can be multiple differences between blocks
• Heterogeneous age/weight
• Different shelves/rooms
• Natural structure (litters)
• Split experiment in time
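Randomisation within blocks differs from simple randomisation in that every block receives each treatment exactly once, in its own random order. A minimal sketch, assuming five blocks labelled B1-B5 and treatments A, B and C as on this slide; the function name and seed are illustrative only.

```python
import random


def randomise_within_blocks(blocks, treatments, seed=None):
    """Give each block every treatment once, in an independent random order."""
    rng = random.Random(seed)
    layout = {}
    for block in blocks:
        order = list(treatments)  # fresh copy so each block is shuffled separately
        rng.shuffle(order)
        layout[block] = order
    return layout


layout = randomise_within_blocks(["B1", "B2", "B3", "B4", "B5"], ["A", "B", "C"], seed=1)
for block, order in layout.items():
    print(block, " ".join(order))
```

A block here can be a litter, a shelf, a weight class or a separate week of the experiment, as listed above.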
24
A randomised block experiment
Apoptosis score
Week  Control  CGP  STAU
1     365      398  421
2     423      432  459
3     308      320  329
Treatment effect p=0.023 (2-way ANOVA)
25
Analysis of apoptosis data
Analysis of Variance for Score
Source     DF  SS       MS       F       P
Block      2   21764.2  10882.1  114.82  0.000
Treatment  2   2129.6   1064.8   11.23   0.023
Error      4   379.1    94.8
Total      8   24272.9
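The analysis of variance for the apoptosis scores can be reproduced by hand. The sketch below uses the nine scores from the randomised block experiment (weeks as blocks; Control, CGP and STAU as treatments) and plain Python, with no statistical package assumed. The p-value helper relies on the closed form for an F distribution with 2 numerator degrees of freedom, so it applies only to this kind of case.

```python
# Apoptosis scores: one observation per treatment in each block (week)
scores = {
    1: {"Control": 365, "CGP": 398, "STAU": 421},
    2: {"Control": 423, "CGP": 432, "STAU": 459},
    3: {"Control": 308, "CGP": 320, "STAU": 329},
}
weeks = list(scores)
treatments = ["Control", "CGP", "STAU"]

values = [scores[w][t] for w in weeks for t in treatments]
correction = sum(values) ** 2 / len(values)  # correction factor (grand total squared / N)

ss_total = sum(v ** 2 for v in values) - correction
ss_block = sum(sum(scores[w].values()) ** 2 for w in weeks) / len(treatments) - correction
ss_treat = sum(sum(scores[w][t] for w in weeks) ** 2 for t in treatments) / len(weeks) - correction
ss_error = ss_total - ss_block - ss_treat

df_block = len(weeks) - 1
df_treat = len(treatments) - 1
df_error = df_block * df_treat

ms_error = ss_error / df_error
f_block = (ss_block / df_block) / ms_error
f_treat = (ss_treat / df_treat) / ms_error


def p_value_f_2df(f, df_err):
    """Upper-tail probability of F with exactly 2 numerator df (closed form)."""
    return (1 + 2 * f / df_err) ** (-df_err / 2)


print(f"Block      SS={ss_block:9.1f}  F={f_block:7.2f}  p={p_value_f_2df(f_block, df_error):.3f}")
print(f"Treatment  SS={ss_treat:9.1f}  F={f_treat:7.2f}  p={p_value_f_2df(f_treat, df_error):.3f}")
print(f"Error      SS={ss_error:9.1f}  MS={ms_error:.1f}")
```

The large block sum of squares shows how much week-to-week variation would have inflated the error term had the weeks not been treated as blocks.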
26
Residual Model Diagnostics
[Figure: four-panel residual plot. Normal Plot of Residuals (residual against normal score); I Chart of Residuals (residual against observation number; Mean=3.16E-14, UCL=20.17, LCL=-20.17); Histogram of Residuals (frequency against residual); Residuals vs. Fits (residual against fit)]
27
Another method of determining sample size: The Resource Equation
• Depends on the law of diminishing returns
• Simple. No subjective parameters
• Useful for complex designs and/or multiple outcomes (characters)
• Does not require estimate of Standard Deviation
• Crude compared with Power Analysis
E = (Total number of animals) - (number of groups)
10<E<20 (but give some tolerance)
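The resource equation is simple enough to check in a couple of lines. In this sketch the function names are hypothetical, and treating the limits 10 and 20 as inclusive is an assumption made to reflect the "some tolerance" allowed above.

```python
def resource_equation(total_animals, n_groups):
    """E = (total number of animals) - (number of groups)."""
    return total_animals - n_groups


def within_guideline(e, low=10, high=20):
    """True if E falls in the suggested range (limits treated as inclusive here)."""
    return low <= e <= high


# Designs discussed in these slides
for total, groups in [(16, 2), (16, 4), (24, 8), (64, 8)]:
    e = resource_equation(total, groups)
    verdict = "within guideline" if within_guideline(e) else "outside guideline"
    print(f"{total} animals, {groups} groups: E = {e} ({verdict})")
```

An E well above 20 suggests the experiment uses more animals than it needs; an E below 10 suggests it may be too small to be worthwhile.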
28
The Resource Equation & Sample Size
[Figure: Student's t, 5% critical value (2.0-12.0) plotted against degrees of freedom (0-35)]
E = (total numbers) - (number of groups)
10<E<20
But if experimental subjects are cheap (e.g. multi-well plates), E can be much higher
29
A well designed experiment
• Absence of bias
  • Experimental unit, randomisation, blinding
• High power
  • Low noise (uniform material, blocking, covariance)
  • High signal (sensitive subjects, high dose)
  • Large sample size
• Wide range of applicability
  • Replicate over other factors (e.g. sex, strain) to increase generality: factorial designs
• Simplicity
• Amenable to a statistical analysis
30
Factorial designs
• Factorial design (Treated, Control): E = 16-4 = 12
• Single factor design (Treated, Control): E = 16-2 = 14
• One variable at a time (OVAT), two experiments (each Treated, Control): E = 16-2 = 14 and E = 16-2 = 14
31
Factorial designs
(By using a factorial design) “... an experimental
investigation, at the same time as it is made more
comprehensive, may also be made more efficient if
by more efficient we mean that more knowledge
and a higher degree of precision are obtainable by
the same number of observations.”
R.A. Fisher, 1960
32
A 4x2 factorial design
Analysed with Student’s t-test: This is not appropriate because:
1. Each test is based on too few animals (n=3-4), so lacks power
2. It does not indicate whether there are strain differences in protein thiol status
3. It does not indicate whether dose/response differs between strains
4. A two-way design should be analysed using a 2-way ANOVA
33
Incorrect statistical analysis leading to
excessive numbers of animals
One experiment or 4 separate experiments?
• 8 mice per group, 8 groups = 64 mice. E = 64-8 = 56
• Alternative: 3 mice per group, 8 groups = 24 mice. E = 24-8 = 16
• Saving: 40 mice
• Formal test of interaction
34
2 (strains) x 4 (Animal units)
factorial
35
Effect of chloramphenicol (2000 mg/kg) on RBC count
Strain  Control                 Treated
C3H     7.85  8.77  8.48  8.22  7.81  7.21  6.96  7.10
CD-1    9.01  7.76  8.42  8.83  9.18  8.31  8.47  8.67
Should not be analysed using two t-tests
1. Each test lacks power due to small sample size
2. Will not give a test of whether strains differ in response
Use a two-way ANOVA with interaction
Tests:
1. Do the treatment means, averaged across strains, differ?
2. Do the strains differ, averaged across treatments?
3. Do the two strains respond to the same extent?
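The three questions correspond to the treatment, strain and interaction terms of a two-way ANOVA. The sketch below works them out in plain Python from the RBC counts given here; the assignment of the sixteen values to the four strain-by-treatment cells is read from the flattened slide table and should be treated as an assumption, although it does reproduce the sums of squares in the ANOVA table for this experiment. P-values are left out because they need an F distribution function from a statistical package.

```python
# RBC counts, 4 mice per strain x treatment cell (layout assumed from the slide)
rbc = {
    ("C3H", "Control"): [7.85, 8.77, 8.48, 8.22],
    ("C3H", "Treated"): [7.81, 7.21, 6.96, 7.10],
    ("CD-1", "Control"): [9.01, 7.76, 8.42, 8.83],
    ("CD-1", "Treated"): [9.18, 8.31, 8.47, 8.67],
}
strains = ["C3H", "CD-1"]
treatments = ["Control", "Treated"]
n_cell = 4

values = [x for cell in rbc.values() for x in cell]
correction = sum(values) ** 2 / len(values)

ss_total = sum(x * x for x in values) - correction
ss_cells = sum(sum(cell) ** 2 for cell in rbc.values()) / n_cell - correction
ss_strain = sum(
    sum(sum(rbc[(s, t)]) for t in treatments) ** 2 for s in strains
) / (n_cell * len(treatments)) - correction
ss_treat = sum(
    sum(sum(rbc[(s, t)]) for s in strains) ** 2 for t in treatments
) / (n_cell * len(strains)) - correction
ss_inter = ss_cells - ss_strain - ss_treat   # strain x treatment interaction
ss_error = ss_total - ss_cells               # pooled within-cell variation

df_error = len(values) - len(rbc)            # 16 - 4 = 12
ms_error = ss_error / df_error               # pooled variance

# Each effect has 1 degree of freedom, so its mean square equals its SS
f_strain = ss_strain / ms_error
f_treat = ss_treat / ms_error
f_inter = ss_inter / ms_error

for name, ss, f in [("Strain", ss_strain, f_strain),
                    ("Treatment", ss_treat, f_treat),
                    ("Strain*treatment", ss_inter, f_inter)]:
    print(f"{name:17s} SS={ss:.4f}  F={f:.2f}")
print(f"{'Error':17s} SS={ss_error:.4f}  MS={ms_error:.4f}")
```

The interaction F ratio is the formal test of question 3; two separate t-tests provide no equivalent.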
36
A 2x2 factorial design with interaction
Source         DF  SS      MS      F      P
Strain         1   2.4414  2.4414  13.13  0.003
Treatment      1   0.8236  0.8236  4.43   0.057
Strain*treat.  1   1.4702  1.4702  7.91   0.016
Error          12  2.2308  0.1859 (pooled variance)
Total          15  6.9659
[Figure: red blood cell count (6.5-9) by strain and treatment: Control and Treated for CD-1 and for C3H]
37
Use of several inbred strains to reduce noise, increase signal and explore generality
Effect of chloramphenicol on mouse haematology
Number of mice per group, by dose of chloramphenicol (mg/kg)
Type     Strain  0  500  1000  1500  2000  2500
Outbred  CD-1    8  8    8     8     8     8
Inbred   CBA     2  2    2     2     2     2
         C3H     2  2    2     2     2     2
         BALB/c  2  2    2     2     2     2
         C57BL   2  2    2     2     2     2
Festing et al (2001) Fd. Chem.Tox. 39:375
38
Example of a factorial compared with a single factor design
Four inbred strains (WBC)
Strain  Control  Treated
CBA     1.90     0.40
CBA     2.60     0.20
C3H     2.10     0.40
C3H     2.20     0.40
BALB/c  1.60     1.30
BALB/c  0.50     1.40
C57BL   2.30     0.80
C57BL   2.20     1.10
One outbred stock (WBC)
CD-1 Control: 3.00  1.70  1.50  2.00  3.80  0.90  2.60  2.30
CD-1 Treated: 1.90  1.90  3.50  1.20  2.30  1.00  1.30  1.60
39
WBC counts following chloramphenicol at 2500 mg/kg
White blood cell counts
Strain         N   0     2500  Signal (Difference)  Noise (SD)  Signal/noise  p
CD-1           16  2.23  1.83  0.40                 0.86        0.47          0.38

Strain         N   0     2500  Signal (Difference)  Noise (SD)  Signal/noise  p
CBA            4   2.25  0.30  1.95                 0.34        5.73
C3H            4   2.15  0.40  1.85                 0.34        5.44
BALB/c         4   1.05  1.35  (-0.30)              0.34        (-0.88)
C57BL          4   2.25  0.95  1.30                 0.34        3.82
Mean           16  1.93  1.20  0.73                 0.34        2.15          <0.001
Dose * strain                                                                 <0.001
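The signal and noise columns can be recomputed from the individual WBC counts. The sketch below assumes the grouping of the raw values into control and treated animals that reproduces the means in this summary (for CD-1, the first eight listed values as controls and the last eight as treated); that grouping is an inference from the flattened slide, not something the slide states. Noise is taken as the pooled within-group standard deviation.

```python
import math
from statistics import mean, variance


def pooled_sd(groups):
    """Pooled within-group standard deviation across several groups."""
    ss = sum((len(g) - 1) * variance(g) for g in groups)
    df = sum(len(g) - 1 for g in groups)
    return math.sqrt(ss / df)


# Outbred stock: 8 control and 8 treated mice (grouping assumed, see lead-in)
cd1_control = [3.00, 1.70, 1.50, 2.00, 3.80, 0.90, 2.60, 2.30]
cd1_treated = [1.90, 1.90, 3.50, 1.20, 2.30, 1.00, 1.30, 1.60]

# Inbred strains: 2 control and 2 treated mice each
inbred = {
    "CBA": ([1.90, 2.60], [0.40, 0.20]),
    "C3H": ([2.10, 2.20], [0.40, 0.40]),
    "BALB/c": ([1.60, 0.50], [1.30, 1.40]),
    "C57BL": ([2.30, 2.20], [0.80, 1.10]),
}

cd1_noise = pooled_sd([cd1_control, cd1_treated])
cd1_signal = mean(cd1_control) - mean(cd1_treated)
print(f"CD-1    signal={cd1_signal:.2f}  noise={cd1_noise:.2f}  S/N={cd1_signal / cd1_noise:.2f}")

inbred_noise = pooled_sd([group for pair in inbred.values() for group in pair])
for strain, (control, treated) in inbred.items():
    signal = mean(control) - mean(treated)
    print(f"{strain:7s} signal={signal:.2f}  noise={inbred_noise:.2f}  S/N={signal / inbred_noise:.2f}")
```

Small differences from the slide in the last decimal place are expected, since the slide appears to work from rounded means and SDs.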
40
Genetics is important: Twenty two Nobel Prizes since 1960 for work depending on inbred strains
Inbred Strains and derivatives (C.C. Little, DBA, 1909; Jackson Laboratory)
• Cell mediated immunity, immunological tolerance, H2 restriction, immune responses: Medawar, Burnet, Doherty, Zinkernagel, Benacerraf (G.pigs)
• Genetics: Snell
• ES cells & “knockouts”: Evans, Capecchi, Smithies
• Humoral immunity/antibodies, T-cell receptor: Tonegawa, Jerne
• Monoclonal antibodies (BALB/c mice): Köhler and Milstein
• Cancer: MMTV
• Transmissible encephalopathies/prions: Prusiner
• Smell: Axel & Buck
• Retroviruses, oncogenes & growth factors: Cohen, Levi-Montalcini, Varmus, Bishop, Baltimore, Temin
41
18th Annual Short Course on Experimental
Models of Human Cancer
August 21-30, 2009
Bar Harbor, ME
courses.jax.org
42
Conclusions
• Five requirements for a good design
  • Unbiased (randomisation, blinding)
  • Powerful (signal/noise ratio: control variability)
  • Wide range of applicability (factorial designs, common but frequently analysed incorrectly)
  • Simple
  • Amenable to statistical analysis
• Mistakes in design and analysis are common
• Better training in experimental design would improve the quality of research, save money, time and animals
43