The following lecture has been approved for
University Undergraduate Students
This lecture may contain information, ideas, concepts and discursive anecdotes
that may be thought provoking and challenging
It is not intended for the content or delivery to cause offence
Any issues raised in the lecture may require the viewer to engage in further
thought, insight, reflection or critical evaluation
Calculating Sample Sizes for Research
Dr. Craig Jackson
Senior Lecturer in Health Psychology
Faculty of Health
www.hcc.uce.ac.uk/craig_jackson
Keep it simple
“Some people hate the very name of statistics but.....their power of
dealing with complicated phenomena is extraordinary. They are the
only tools by which an opening can be cut through the formidable
thicket of difficulties that bars the path of those who pursue the science
of man.”
Sir Francis Galton, 1889
How Many Make a Sample?
“8 out of 10 owners who expressed a preference, said their cats
preferred it.”
How confident can we be about such statistics?
8 out of 10?
80 out of 100?
800 out of 1000?
80,000 out of 100,000?
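One way to put numbers on that question is a 95% confidence interval for each of the four proportions. The sketch below uses the Wilson score interval, which is a choice made for this example and not something the slide specifies; the function name `wilson_interval` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def wilson_interval(successes, n, confidence=0.95):
    """Wilson score confidence interval for a binomial proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width

# The same 80% preference, observed in samples of increasing size
for successes, n in [(8, 10), (80, 100), (800, 1000), (80_000, 100_000)]:
    low, high = wilson_interval(successes, n)
    print(f"{successes} out of {n}: {low:.3f} to {high:.3f}")
```

The observed proportion is 0.8 in every case, but the interval around it narrows sharply as the sample grows.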
Multiple Measurement of small sample
Four measurements: 25, 22, 24 and 21 cell clusters
Total = 92 cell clusters
Mean = 23 cell clusters
SD = 1.8 cell clusters
It all depends on the size of your needle
Small samples spoil research
N      Age     IQ        N      Age     IQ        N      Age     IQ
1      20      100       1      18      100       1      18      100
2      20      100       2      20      110       2      20      110
3      20      100       3      22      119       3      22      119
4      20      100       4      24      101       4      24      101
5      20      100       5      26      105       5      26      105
6      20      100       6      21      113       6      21      113
7      20      100       7      19      120       7      19      120
8      20      100       8      25      119       8      25      119
9      20      100       9      20      114       9      20      114
10     20      100       10     21      101       10     45      156
Total  200     1000      Total  216     1102      Total  240     1157
Mean   20      100       Mean   21.6    110.2     Mean   24      115.7
SD     0       0         SD     ± 2.6   ± 8.0     SD     ± 7.8   ± 15.9
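The summary rows can be recomputed directly from the listed values. A minimal sketch, assuming the sample SD (n - 1 denominator) is the one intended; the variable names are illustrative. It shows how a single extreme participant in a sample of 10 shifts the mean and inflates the SD.

```python
from statistics import mean, stdev

# Second sample from the slide: ten participants with ordinary variation
ages_typical = [18, 20, 22, 24, 26, 21, 19, 25, 20, 21]
iqs_typical = [100, 110, 119, 101, 105, 113, 120, 119, 114, 101]

# Third sample: identical except that participant 10 is an outlier
ages_outlier = ages_typical[:-1] + [45]
iqs_outlier = iqs_typical[:-1] + [156]

def summarise(values):
    """Return (total, mean, sample standard deviation)."""
    return sum(values), mean(values), stdev(values)

for label, values in [
    ("Age, typical", ages_typical),
    ("IQ, typical", iqs_typical),
    ("Age, one outlier", ages_outlier),
    ("IQ, one outlier", iqs_outlier),
]:
    total, m, sd = summarise(values)
    print(f"{label}: total={total}, mean={m:.1f}, SD={sd:.1f}")
```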
Background on Surveys
• Large-scale
• Quantitative
• Can be descriptive
(“2% of women think they are beautiful”)
• Can be inferential
(“Significantly more single women think they’re beautiful than married women do”)
• Done with a sample of patients, respondents, consumers, or professionals
• Differences between any groups assessed with hypothesis testing
The sample size must be large enough to detect any
such difference if it truly exists
Importance of Sample Size
• “Forgotten” in many studies
• Little consideration given
• Appropriate sample size needed to confirm / refute hypotheses
• Samples are often far too small to detect anything but the grossest difference
• Real differences go undetected and are reported as “non-significant” – Type 2 error
• Too large a sample – unnecessary waste of (clinical) resources
• Ethical considerations – waste of patient time, inconvenience, discomfort
• Essential to make assessment of optimal sample size before starting
investigation
Qualitative studies need to sample wisely too…
Asian GPs’ attitudes to ANP
Objective:
To determine attitudes to ANP among Asian doctors in East Birmingham PCT
Method:
Send invitation to 55 Asian GPs (Approx 47% of East Birmingham PCT)
Intends to interview (30 mins) the first 20 GPs who respond
Sample would be 36% of Asian GPs – and only 17% of GPs in PCT
Severely Biased Research (and ethically dodgy too)
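The two percentages follow from the figures on the slide. A short sketch of the arithmetic, assuming the "approx 47%" figure can be treated as exact for the purpose of estimating the total number of GPs in the PCT; variable names are illustrative.

```python
invited_asian_gps = 55   # invitations sent (from the slide)
share_of_pct = 0.47      # Asian GPs as an approximate share of all GPs in the PCT
interviewed = 20         # first 20 GPs who respond

# Estimated total number of GPs in the PCT (assumes the 47% figure is exact)
total_gps = invited_asian_gps / share_of_pct

fraction_of_asian_gps = interviewed / invited_asian_gps
fraction_of_all_gps = interviewed / total_gps

print(f"Share of Asian GPs interviewed: {fraction_of_asian_gps:.0%}")
print(f"Share of all GPs in the PCT interviewed: {fraction_of_all_gps:.0%}")
```

Because only the first responders are interviewed, the sample is also self-selected, which is a separate source of bias from its small size.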
Have Some Consideration – “The Good”
#1 Pulmonary Valve Replacement on Biventricular Function
following Tetralogy of Fallot
Q. How many participants will be recruited? How many of these participants
will be in a control group?
A. “Power analyses have been undertaken based on previous data provided
by Hazekamp et al. (2001). A sample size of 18 in each group will have
95% power to detect a difference in right-ventricular end-diastolic volume
of 78ml (the difference between preoperative mean of 292ml and the
postoperative mean of 214ml) assuming the common standard deviation
is 62ml and using a two-group t-test with a 5% two-sided significance
level.”
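The quoted figure of 18 per group can be sanity-checked with a normal-approximation formula. This is a sketch only: the applicants' own software and method are not stated, and the small-sample correction term used here (z_alpha squared over 4, often attributed to Guenther) is an assumption chosen to bring the normal approximation closer to a t-test calculation. The function name `n_per_group` is illustrative.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(difference, sd, alpha=0.05, power=0.80):
    """Approximate per-group n for a two-sided, two-group comparison of means.

    Normal approximation plus a small-sample correction of z_alpha^2 / 4.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    d = difference / sd  # standardized difference
    n = 2 * (z_alpha + z_beta) ** 2 / d ** 2 + z_alpha ** 2 / 4
    return ceil(n)

# Values quoted in the application: 78 ml difference, SD 62 ml, 95% power, 5% two-sided
print(n_per_group(difference=78, sd=62, alpha=0.05, power=0.95))
```

Under these assumptions the result agrees with the 18 per group stated in the application.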
Have Some Consideration – “The Bad”
#2 Survey of knowledge and Attitudes regarding
ADHD in Adults among Specialist Adult Psychiatrists
It is a cross sectional questionnaire survey to assess the current
knowledge and attitudes regarding ADHD in Adults amongst ALL General
and Specialist Adult Consultants, Specialist Registrars and Staff-grade /
Associate Specialist Doctors in Birmingham and Solihull
Q. How many participants will be recruited? How many of these participants
will be in a control group?
A. “100.”
Have Some Consideration – “The Ugly”
#3 The Sepsis Study
This is a cross sectional study which will be conducted using a postal
questionnaire with a follow-up reminder letter to non-responders. The sample
will be taken from patients who have been admitted to the ITU department for
severe sepsis or septic shock between Feb 1st 2004 and Aug 1st 2004. Patients
will be over the age of 18 and will have spent at least one day on ITU. The
questionnaire will be a standard health related quality of life questionnaire.
Patients will be contacted by letter a maximum of two times. The patients’
personal details will be stored on a database kept in hospital to maintain
patient confidentiality. Names will not be published in the written report. The
database should highlight any patients who are deceased and obviously
questionnaires will not be sent to the addresses.
Q. How many participants will be recruited? How many of these participants
will be in a control group?
A. “Between 30 and 60.”
Hypothesis testing
All about 2 types of errors
H1: Men perform better than women
H0: Men perform no better than women
Imagine: actual data really shows no difference between sexes

            Decide to accept H0                           Decide to reject H0
H0 true     Correct decision                              Type 1 error (false positive), probability α
H0 false    Type 2 error (false negative), probability β  Correct decision
Errors in hypothesis testing
Type 1 errors “False positive”
Occurs if null-hypothesis rejected when it should be accepted
e.g. a “significant result” obtained when null hypothesis is in fact true
Probability of making Type 1 error denoted as “α”
Type 2 errors “False negative”
Occurs if null-hypothesis accepted when it should be rejected
e.g. a non-significant result obtained when null hypothesis is in fact not true
Probability of making Type 2 error denoted as “β”
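Both error rates can be seen in a small simulation. The sketch below is illustrative only: it assumes normally distributed scores with a known SD of 10, 20 participants per group, a two-sided z-test, and a true difference of 5 points when the null hypothesis is false. None of these values come from the slides.

```python
import random
from statistics import NormalDist, mean

def rejects_null(group_a, group_b, sd, alpha=0.05):
    """Two-sided z-test for a difference in means, SD assumed known."""
    n = len(group_a)
    standard_error = sd * (2 / n) ** 0.5
    z = (mean(group_a) - mean(group_b)) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_value < alpha

def rejection_rate(true_difference, n=20, sd=10, trials=2000, seed=1):
    """Fraction of simulated studies in which the null hypothesis is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        a = [rng.gauss(100 + true_difference, sd) for _ in range(n)]
        b = [rng.gauss(100, sd) for _ in range(n)]
        rejections += rejects_null(a, b, sd)
    return rejections / trials

# Null hypothesis true: every rejection is a false positive (Type 1 error)
type1_rate = rejection_rate(true_difference=0)
# Null hypothesis false: every failure to reject is a false negative (Type 2 error)
type2_rate = 1 - rejection_rate(true_difference=5)

print(f"Estimated alpha: {type1_rate:.3f}")
print(f"Estimated beta:  {type2_rate:.3f}")
```

With these settings the estimated α should sit near the nominal 0.05, while β is large because 20 per group is too few for a difference of half an SD.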
Factors affecting Sample Size
Dependent upon 4 inter-related factors
Possible to calculate each one if the other three are known
[Diagram: N = ? at the centre, surrounded by factors 1 to 4]
1. Power
Probability that a study of a given size would detect a real
difference as statistically significant
Usually between 80% and 90% (.80, .85, .90)
Higher power = higher chance of detecting a genuine significant difference
and low chance of making a type 2 error
With high power, can be reasonably sure any non-significant result is genuine
e.g. ok to accept null-hypothesis
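How power grows with sample size can be sketched with a normal approximation for two equal-sized groups. The effect size used here (a true difference of half a standard deviation) is a hypothetical value chosen for illustration, and `power_two_groups` is an illustrative name.

```python
from statistics import NormalDist

def power_two_groups(n_per_group, standardized_difference, alpha=0.05):
    """Approximate power of a two-sided, two-group comparison of means."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    shift = standardized_difference * (n_per_group / 2) ** 0.5
    # Probability of a significant result in the direction of the true effect;
    # the negligible chance of significance in the wrong direction is ignored.
    return NormalDist().cdf(shift - z_alpha)

# Hypothetical effect: true difference equal to half a standard deviation
for n in (10, 20, 40, 64, 80):
    print(f"n per group = {n:3d}: power = {power_two_groups(n, 0.5):.2f}")
```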
2. Minimal Important Size of difference to be detected
• If difference between treatments is large, small samples can produce
significant results
• If difference between treatments is small, larger samples are needed
• Important to know if any differences are expected to be small
• Determine the min. difference between treatments considered clinically
relevant
• Given large enough sample, any difference can be made statistically
significant
Experience & Judgement needed in deciding minimal treatment effect that is
of any value – to justify effort, time and finance involved
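The effect of the size of the difference on the sample needed can be sketched numerically. The figures below are hypothetical (an outcome with SD 10, 80% power, 5% two-sided significance), and the normal-approximation formula is an assumption of this example.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(difference, sd, alpha=0.05, power=0.80):
    """Approximate per-group n for two equal groups (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return ceil(2 * ((z_alpha + z_beta) * sd / difference) ** 2)

# Hypothetical outcome with SD = 10 units
for difference in (20, 10, 5, 2):
    print(f"difference = {difference:2d}: n per group = {n_per_group(difference, sd=10)}")
```

Halving the difference to be detected roughly quadruples the sample required, which is why the minimal difference worth detecting has to be fixed before the study starts.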
2. Minimum Important Difference to be detected (MID)
Bronchodilator & Chronic Bronchitis Example
New bronchodilator causes a real increase in tidal volume
in patients (10ml average)
Standard deviation (natural variation) in tidal volume in this
clinical population is more than 10ml
Given a huge sample, a statistically significant tidal volume increase in
users could be demonstrated (even though it is smaller than the natural variation)
Expensive & Pointless
Such a small (but stat. significant) increase - the drug is of little clinical use
3. Standard Deviation & Variability
Larger the SD of 2 groups, relative to the MID, the larger the sample needed
Smaller the SD, the smaller the sample required
Ratio of MID to SD is the “standardized difference” – used in calculating
sample sizes
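For reference, one common normal-approximation formula that uses the standardized difference for two equal-sized groups is sketched below. The symbols d, δ, σ and n are notation introduced here, not taken from the slides, and the formula is an approximation that is not necessarily the exact calculation behind any particular chart or package.

```latex
d = \frac{\delta}{\sigma},
\qquad
n \approx \frac{2\,\bigl(z_{1-\alpha/2} + z_{1-\beta}\bigr)^{2}}{d^{2}}
\quad \text{per group}
```

Here δ is the minimum important difference, σ the standard deviation, α the two-sided significance level, 1 − β the power, and z the corresponding quantiles of the standard normal distribution.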
Estimated SD
Estimate of SD may not be available
1. Pilot study
2. Begin trial and estimate SD from initial patients
3. Use SD found in previous trials
4. Use SD found in similar patients / circumstances in other literature
4. Significance Level
• Significance level (α) has an important bearing on the sample size required
• There is a relationship between the significance level (α) and the chance of making a type 2
error (β)
• Smaller significance level (e.g. P=0.01 rather than P=0.05) requires larger
sample size to avoid type 2 error
• As nominated significance level gets smaller, the chance of a type 2 error gets larger (for a fixed sample size)
• Significance level of P=0.05 implies a type 1 error will occur in 1 of every 20 trials where the null hypothesis is true
5 out of 100 such studies will make type 1 errors, purely by chance. Acceptable
Prob. of type 2 error should be approx. 4 times sig. level chosen e.g.
α = 5% then power = 80%
α = 1% then power ≈ 95%
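The trade-off between the significance level and the sample size can be sketched as follows. The standardized difference of 0.5 and the fixed sample of 64 per group are hypothetical values, and the normal approximation is an assumption of this example.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(d, alpha, power):
    """Approximate per-group n for a standardized difference d."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)

def power_at(n, d, alpha):
    """Approximate power for n per group at a standardized difference d."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(d * (n / 2) ** 0.5 - z_alpha)

d = 0.5  # hypothetical standardized difference
for alpha in (0.05, 0.01):
    needed = n_per_group(d, alpha, power=0.80)
    beta = 1 - power_at(64, d, alpha)
    print(f"alpha = {alpha}: n per group for 80% power = {needed}, "
          f"beta with 64 per group = {beta:.2f}")
```

A stricter significance level either costs more participants for the same power or, with the sample held fixed, raises the chance of a type 2 error.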
Calculating Sample Size
Sample size calculations available for all study designs, trials, and data types
e.g. categorical data, continuous data, means, proportions, multiple groups,
paired samples, unpaired samples, equal / unequal sized groups
Calculations are complex but easily done with a PC and www
Statistician helpful
(if s/he can communicate clearly!)
Two approaches for us non-statisticians
1. Altman’s Nomogram
2. Internet
Altman’s Nomogram
Standardized difference = Min. important difference / Standard deviation
[Figure: nomogram with scales for standardized difference (0.0 to 1.2), N, and power (0.05 to 0.995)]
Example Calculation – Effects of Pesticide Study
IQ survey, concerning workers exposed to pesticides
What we already know…
Mean IQ score is 100 points
SD is ± 10 points e.g. Normal IQ= 90-110
What we need to do….
a) Decide on MID. A difference of 11 IQ points seems clinically important to me
b) Calculate Standardized Difference = Min Important Difference / Standard Deviation = 11 / 10 = 1.1
c) Use Altman’s Nomogram to read off N
Altman’s Nomogram - Effects of Pesticide Study
[Figure: nomogram with standardized difference 11 / 10 = 1.1 marked; power values 0.80, 0.85, 0.90 and 0.95 starred]
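The same example can be worked numerically instead of read from the chart. This is a sketch under stated assumptions: a two-sided 5% significance level, two equal-sized groups (exposed and unexposed workers), and a normal-approximation formula, so the totals may differ slightly from values read off the nomogram. The function name `total_n` is illustrative.

```python
from math import ceil
from statistics import NormalDist

mid = 11   # minimum important difference in IQ points (from the slide)
sd = 10    # standard deviation of IQ (from the slide)

standardized_difference = mid / sd

def total_n(d, power, alpha=0.05):
    """Approximate total N across two equal groups (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    per_group = ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)
    return 2 * per_group

# Power levels starred on the nomogram
for power in (0.80, 0.85, 0.90, 0.95):
    print(f"power = {power:.2f}: total N = {total_n(standardized_difference, power)}")
```

Because the standardized difference is large, the totals stay small even at 95% power.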
2. Electronic Calculation of Sample Size
Not covered in most stats packages
e.g. SPSS, Statistica
Many sites available
Real time calculation
Hyperstat by David M Lane
www.davidmlane.com
Other additional software
e.g. Xlstat.com
Summary of Sample Size & Power
Correct sample size helps avoid type I & type II errors
A correct study has balance of four factors
Power (no less than .80): Bigger = Better study
Min. clinical difference (effective difference): Bigger = Better study
Standard deviation (variability): Smaller = Better study
Significance level (0.05): Smaller = Better study
Looking for big differences much easier than smaller differences