The following lecture has been approved for
University Undergraduate Students
This lecture may contain information, ideas, concepts and discursive anecdotes
that may be thought provoking and challenging
It is not intended for the content or delivery to cause offence
Any issues raised in the lecture may require the viewer to engage in further
thought, insight, reflection or critical evaluation
or reading
or watching more TV
or listening to Radio 4
Selecting Samples
Deployment
Allocation
Group Testing
Prof Craig Jackson
Head of Psychology Division
Faculty of Education Law &
Social Sciences
BCU
[email protected]
Keep it simple
“Some people hate the very name of statistics but.....their power of
dealing with complicated phenomena is extraordinary. They are the
only tools by which an opening can be cut through the formidable
thicket of difficulties that bars the path of those who pursue the science
of man.”
Sir Francis Galton, 1889
How Many Make a Sample?
“8 out of 10 owners who expressed a preference, said their cats
preferred it.”
How confident can we be about such statistics?
8 out of 10?
80 out of 100?
800 out of 1000?
80,000 out of 100,000?
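A rough way to see why these four claims are not equally convincing is to put an approximate 95% confidence interval around each proportion. This is only a sketch: it assumes a simple random sample and uses the normal (Wald) approximation, which is itself shaky at n = 10. The function name `proportion_ci` is illustrative and not from the lecture.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, confidence=0.95):
    """Approximate (Wald) confidence interval for a sample proportion."""
    p = successes / n
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    half_width = z * sqrt(p * (1 - p) / n)
    # Clip to [0, 1] because a proportion cannot leave that range.
    return max(0.0, p - half_width), min(1.0, p + half_width)


for successes, n in [(8, 10), (80, 100), (800, 1000), (80_000, 100_000)]:
    low, high = proportion_ci(successes, n)
    print(f"{successes} out of {n}: {low:.3f} to {high:.3f}")
```

The proportion is 80% every time, but the interval narrows sharply as the sample grows.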
Multiple Measurement of small sample
Four measurements: 25, 22, 24 and 21 cell clusters
Total = 92 cell clusters
Mean = 23 cell clusters
SD = 1.8 cell clusters
It all depends on the size of your needle
Small samples spoil research
N      Age    IQ        N      Age    IQ        N      Age    IQ
1      20     100       1      18     100       1      18     100
2      20     100       2      20     110       2      20     110
3      20     100       3      22     119       3      22     119
4      20     100       4      24     101       4      24     101
5      20     100       5      26     105       5      26     105
6      20     100       6      21     113       6      21     113
7      20     100       7      19     120       7      19     120
8      20     100       8      25     119       8      25     119
9      20     100       9      20     114       9      20     114
10     20     100       10     21     101       10     45     156
Total  200    1000      Total  216    1102      Total  240    1157
Mean   20     100       Mean   21.6   110.2     Mean   24     115.7
SD     0      0         SD     ± 4.2  ± 19.2    SD     ± 8.5  ± 30.2
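The effect of a single unusual participant on a sample of ten can be checked directly. The sketch below uses the three Age columns; the variable names are illustrative. The standard deviations it prints use the sample (n - 1) formula, so they need not match the rounded figures quoted on the slide.

```python
from statistics import mean, stdev

ages_identical = [20] * 10
ages_typical = [18, 20, 22, 24, 26, 21, 19, 25, 20, 21]
ages_outlier = [18, 20, 22, 24, 26, 21, 19, 25, 20, 45]  # last value replaced by 45

for label, ages in [("identical", ages_identical),
                    ("typical", ages_typical),
                    ("one outlier", ages_outlier)]:
    # stdev() is the sample standard deviation (n - 1 in the denominator).
    print(f"{label}: total={sum(ages)} mean={mean(ages):.1f} sd={stdev(ages):.1f}")
```

Changing one value out of ten moves the mean from 21.6 to 24 and inflates the spread considerably, which is the point of the slide.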
Background on Surveys
• Large-scale
• Quantitative
• Can be descriptive
(“2% of women think they are beautiful”)
• Can be inferential
(“Significantly more single women think they’re beautiful than married women do”)
• Done with a sample of patients, respondents, consumers, or professionals
• Differences between any groups assessed with hypothesis testing
The sample size must be large enough to detect any
such difference if it truly exists
Importance of Sample Size
• “Forgotten” in many studies
• Little consideration given
• Appropriate sample size needed to confirm / refute hypotheses
• Small samples far too small to detect anything but the grossest difference
• Real differences are reported as “non-significant” – Type 2 error
• Too large a sample – unnecessary waste of (clinical) resources
• Ethical considerations – waste of patient time, inconvenience, discomfort
• Essential to make assessment of optimal sample size before starting
investigation
Qualitative studies need to sample wisely too…
Asian GPs’ attitudes to ANP
Objective:
To determine attitudes to ANP among Asian doctors in East Birmingham PCT
Method:
Send invitation to 55 Asian GPs (Approx 47% of East Birmingham PCT)
Intends to interview (30 mins) the first 20 GPs who respond
Sample would be 36% of Asian GPs – and only 17% of GPs in PCT
Severely Biased Research (and ethically dodgy too)
Have Some Consideration – “The Good”
#1 Pulmonary Valve Replacement on Biventricular Function
following Tetralogy of Fallot
Q. How many participants will be recruited? How many of these participants
will be in a control group?
A. “Power analyses have been undertaken based on previous data provided
by Hazekamp et al. (2001). A sample size of 18 in each group will have
95% power to detect a difference in right-ventricular end-diastolic volume
of 78ml (the difference between preoperative mean of 292ml and the
postoperative mean of 214ml) assuming the common standard deviation
is 62ml and using a two-group t-test with a 5% two-sided significance
level.”
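The quoted figures can be sanity-checked with the usual normal-approximation formula for comparing two means, n per group = 2(z_alpha + z_power)^2 / d^2, where d is the difference divided by the SD. This is a sketch, not the calculation the applicants ran: the helper name `n_per_group` is illustrative, and a t-based calculation (as in the proposal) typically adds a subject or so.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(difference, sd, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided, two-group comparison of means."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    standardized = difference / sd
    return ceil(2 * (z_alpha + z_power) ** 2 / standardized ** 2)


# Figures quoted in the proposal: 78 ml difference, SD 62 ml, 95% power, 5% two-sided.
print(n_per_group(78, 62, alpha=0.05, power=0.95))
```

The approximation gives 17 per group, in line with the 18 per group quoted from the t-test calculation.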
Have Some Consideration – “The Bad”
#2 Survey of knowledge and Attitudes regarding
ADHD in Adults among Specialist Adult Psychiatrists
It is a cross sectional questionnaire survey to assess the current
knowledge and attitudes regarding ADHD in Adults amongst ALL General
and Specialist Adult Consultants, Specialist Registrars and Staff-grade /
Associate Specialist Doctors in Birmingham and Solihull
Q. How many participants will be recruited? How many of these participants
will be in a control group?
A. “100.”
Have Some Consideration – “The Ugly”
#3 The Sepsis Study
This is a cross sectional study which will be conducted using a postal
questionnaire with a follow-up reminder letter to non-responders. The sample
will be taken from patients who have been admitted to the ITU department for
severe sepsis or septic shock between Feb 1st 2004 and Aug 1st 2004. Patients
will be over the age of 18 and will have spent at least one day on ITU. The
questionnaire will be a standard health related quality of life questionnaire.
Patients will be contacted by letter a maximum of two times. The patients’
personal details will be stored on a database kept in hospital to maintain
patient confidentiality. Names will not be published in the written report. The
database should highlight any patients who are deceased and obviously
questionnaires will not be sent to the addresses.
Q. How many participants will be recruited? How many of these participants
will be in a control group?
A. “Between 30 and 60.”
Sampling a Population
Process of selecting units (e.g. people, organisations) from a population
Generalise results to the population
First question should be…
Who do you want to generalize findings to?
[Figure: the POPULATION]
Sampling a Population
A POPULATION
REPRESENTATIVE SAMPLE (theoretical)
ACCESSIBLE SAMPLE (actual)
Is this lot REPRESENTATIVE of the POPULATION?
Sampling a Population
Need to do one more thing...
Develop SAMPLING FRAME
(the method for selecting subjects to include in study)
The SAMPLING FRAME acts like crosshairs and allows selection and exclusion of
people into the study
Sampling Frames
Interest lies in Pig Farmers
Choose “Pig Farmers” from the phone book
The list of farmers is the SAMPLING FRAME. Call them all and see who will take part
“RANDOM-DIGIT-DIALING”
A more selective sampling frame would be:
Criteria
1. PIG FARMERS
2. WITH MORE THAN 1 FAMILY CAR
3. WITH £50,000 TURNOVER p/a
4. AND HAVE MORE THAN TWO CHILDREN
But how many respondents would such specificity yield?
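The shrinking yield can be illustrated by applying the four criteria one after another. Everything in this sketch is hypothetical: the six records are invented for illustration, and reading “£50,000 turnover” as “at least £50,000” is an assumption.

```python
# Hypothetical records: (is_pig_farmer, family_cars, turnover_gbp, children)
farmers = [
    (True, 2, 80_000, 3),
    (True, 1, 60_000, 2),
    (True, 3, 40_000, 4),
    (True, 2, 55_000, 1),
    (False, 2, 90_000, 3),
    (True, 2, 120_000, 5),
]

criteria = [
    ("pig farmers", lambda f: f[0]),
    (">1 family car", lambda f: f[1] > 1),
    ("£50,000 turnover p/a", lambda f: f[2] >= 50_000),  # assumed to mean "at least"
    (">2 children", lambda f: f[3] > 2),
]

frame = farmers
yields = []
for label, keep in criteria:
    # Each extra criterion can only keep or shrink the frame, never grow it.
    frame = [f for f in frame if keep(f)]
    yields.append(len(frame))
    print(f"after '{label}': {len(frame)} remain")
```

Each added criterion makes the frame more specific and leaves fewer people to approach.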
Specificity versus Generality
[Figure: scale running from Specific to General. Sampling frames shown: “pig farmers”; “pig farmers, >1 family car”; “pig farmers, >1 family car, £50,000”; “pig farmers, >1 family car, £50,000, >2 children”. Outlets shown: OEM, OM, Work & Stress, Rural Psychology, National news, Regional news]
Types of Sampling
CONSCRIPTIVE sampling
Ethically unsound
Bias
QUOTA sampling
Favourite of ICM and MORI
Quotas of the population
Efficient
Flaw potential
RANDOM sampling
Theoretically ideal
Costly
Time-consuming
All elements of the population
OPPORTUNISTIC sampling
Desperate measure
Take any subject available
Cheap
Fast
Bias
Distributions
[Figure: N of population against Height (5’6” to 6’4”), annotated with RANDOM, OPPORTUNISTIC, CONSCRIPTIVE and QUOTA sampling]
How many makes a sample?
POWER OF STUDY CALCULATION
Statistical method of calculating the number of subjects needed in a project
Based upon…..
Expected variance of subjects’ scores
Useful size of any differences between groups
Significance level (e.g. 5 % or 1 %)
Power level
The larger the differences you are looking for between groups, then the fewer
subjects are needed. Looking for small differences between groups requires
larger numbers of subjects
Specificity and the acceptable N
Jackson’s paradox
Relative population size
As study populations become smaller, acceptable study sample sizes reduce
[Figure: Acceptable sample size against Population size, for General Pop, Working Pop, Specific Pop and Rare Pop]
Specificity and the acceptable N
Student   Pop                                             N
I.D       Forces yachting training schools                300
E.M       Companies using stress counselling              150
S.M       Divers and ear barotrauma                       142
N.O       Solvent exposure in Myanmar                     80
V.W       Routine flu vaccinations                        900
A.F       Dermatitis in hairdressers                      102
S.M       O.H needs of NHS staff                          23
T.R       NIHL in student employees                       14
I.C       Blood tests in British Army pilots              408
O.Y       Upstream oil company deaths                     161
A.A       Renal colic in flight deck crew                 254
A.C       Hepatitis B in army regulars and territorials   476
In-depth: yes (marked for two studies)
Selection Bias
Gulf War Syndrome
A&E Violence
C dif
Call Centres
Sampling properly is Crucial
Samples may be askew
Specialist publications attract a specialist response group
A self-selection bias exists among those with special interests
Controversial topics, or litigious areas
Depleted Uranium Weaponry
Organophosphate Pesticides
Stress
THIS IS AN INHERENT PROBLEM WITH
HEALTH RESEARCH
COMBAT IT WITH LARGE SAMPLES
AND CLEVER METHODOLOGY
Telecomms
Sampling Methods
Company with 1000 employees
Want to sample 10%
[Figure: grid of 1000 employees]
How workers are listed is crucial
Surname: bias
Age: bias
Employee id.: bias
Salary: bias
d.o.b: bias
Sex: bias
Sampling Methods
Non-randomly select the first 100 employees
[Figure: grid of 1000 employees]
Sampling Methods
Routinely sample 100 employees
[Figure: grid of 1000 employees]
Every 10th name is chosen until reaching criteria of 100 employees
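Taking every 10th name is simple to express in code. The sketch below assumes the employee list is already in some fixed order and uses placeholder names; any pattern in how the list is ordered carries straight through into the sample.

```python
employees = [f"employee_{i:04d}" for i in range(1, 1001)]  # placeholder names

step = len(employees) // 100            # 1000 / 100 = every 10th name
systematic_sample = employees[step - 1::step]

print(len(systematic_sample), systematic_sample[:3])
```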
Sampling Methods
Routinely Random sample 100 employees
[Figure: grid of 1000 employees]
Roll a six-sided die. Count on x from the last name chosen.
There is a
problem…
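The slide does not spell the problem out. One possible reading is that 100 rolls of a six-sided die can move at most 600 names down the list, and about 350 on average, so anyone listed towards the end of 1000 names can never be chosen. A small simulation of that reading, with a fixed seed purely for repeatability:

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable

position = 0
chosen = []
for _ in range(100):
    position += random.randint(1, 6)  # roll the die and count on from the last name
    chosen.append(position)

print("last position reached:", chosen[-1])
```

Whatever the seed, the last position cannot exceed 600, so a large part of the workforce has no chance of selection.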
Sampling Methods
Randomly sample 100 employees
[Figure: grid of 1000 employees]
Random number
generator
Scratch names
off the list
Blindfold & pin
Darts
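Of the options listed, a random number generator is the easiest to reproduce. A minimal sketch with placeholder names, using Python's standard `random` module to draw 100 of the 1000 employees without replacement (the seed is there only to make the example repeatable):

```python
import random

employees = [f"employee_{i:04d}" for i in range(1, 1001)]  # placeholder names

rng = random.Random(42)                       # seed only to make the sketch repeatable
random_sample = rng.sample(employees, k=100)  # sampling without replacement

print(len(random_sample), len(set(random_sample)))
```

Here every employee has the same chance of selection regardless of how the list is ordered.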
Sampling Keywords
POPULATIONS
Can be mundane or extraordinary
SAMPLE
Must be representative
INTERNAL VALIDITY OF SAMPLE
Sometimes validity is more important than generalizability
SELECTION PROCEDURES
Random
Opportunistic
Conscriptive
Quota
Sampling Keywords
THEORETICAL
Developing, exploring, and testing ideas
EMPIRICAL
Based on observations and measurements of reality
NOMOTHETIC
Rules pertaining to the general case (nomos - Greek)
PROBABILISTIC
Based on probabilities
CAUSAL
How causes (treatments) affect the outcomes
Errors in hypothesis testing
Type 1 errors “False positive”
Occurs if null-hypothesis rejected when it should be accepted
e.g. a “significant result” obtained when null hypothesis is in fact true
Probability of making Type 1 error denoted as “α”
Type 2 errors “False negative”
Occurs if null-hypothesis accepted when it should be rejected
e.g. a non-significant result obtained when null hypothesis is in fact not true
Probability of making Type 2 error denoted as “β”
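A Type 1 error rate can be seen by simulation. In this sketch both groups are drawn from the same population, so the null hypothesis is true and every “significant” result is a false positive. The population mean of 100, SD of 10, group size of 30 and the use of a z-test with known SD are all assumptions made to keep the example short.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

rng = random.Random(0)
alpha = 0.05
n, sd, trials = 30, 10.0, 2000
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

false_positives = 0
for _ in range(trials):
    # Both groups come from the same population, so the null hypothesis is true.
    a = [rng.gauss(100, sd) for _ in range(n)]
    b = [rng.gauss(100, sd) for _ in range(n)]
    z = (mean(a) - mean(b)) / (sd * sqrt(2 / n))
    if abs(z) > z_crit:
        false_positives += 1

type1_rate = false_positives / trials
print(f"Type 1 error rate: {type1_rate:.3f}")
```

The observed rate should sit close to the chosen α of 0.05.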
Factors affecting Sample Size
Dependent upon 4 inter-related factors
Possible to calculate each one if the other three are known
[Figure: N=? at the centre of factors 1 to 4]
1. Power
Probability that study of given size would detect a real
statistically significant difference
Usually between 80% to 90%
.80
.85
.90
Higher power = higher chance of detecting a genuine significant difference
and low chance of making a type 2 error
With high power, can be reasonably sure any non-significant result is genuine
e.g. ok to accept null-hypothesis
2. Minimal Important Size of difference to be detected
• If difference between treatments is large, small samples can produce
significant results
• If difference between treatments is small, larger samples are needed
• Important to know if any differences are expected to be small
• Determine the min. difference between treatments considered clinically
relevant
• Given large enough sample, any difference can be made statistically
significant
Experience & Judgement needed in deciding minimal treatment effect that is
of any value – to justify effort, time and finance involved
2. Minimum Important Difference to be detected (MID)
Bronchodilator & Chronic Bronchitis Example
New bronchodilator causes a real increase in tidal volume
in patients (10ml average)
Standard deviation (natural variation) in tidal volume in this
clinical population is more than 10ml
Given huge sample a significant tidal volume increase in
users could be proved (but the increase is smaller than the natural variation)
Expensive & Pointless
Such a small (but stat. significant) increase - the drug is of little clinical use
3. Standard Deviation & Variability
Larger the SD of 2 groups, relative to MID, then the larger the sample needed
Smaller the SD, the smaller the sample required
Ratio of MID to SD is the “standardized difference” – used in calculating
sample sizes
Estimated SD
Estimate of SD may not be available
1. Pilot study
2. Begin trial and estimate SD from initial patients
3. Use SD found in previous trials
4. Use SD found in similar patients / circumstances in other literature
4. Significance Level
• Significance level (α) has an important bearing on sample size required
• There is a relationship between significance level (α) and the chance of making type 2
error (β)
• Smaller significance level (e.g. P=0.01 rather than P=0.05) requires larger
sample size to avoid type 2 error
• As nominated significance level gets smaller, the chance of type 2 error gets larger (for a fixed sample size)
• Significance level of P=0.05 implies a type 1 error will occur in 1 of every 20 trials where the null hypothesis is true
5 out of 100 such studies will make type 1 errors, purely by chance. This rate is usually treated as acceptable
Prob. of type 2 error should be approx. 4 times sig. level chosen e.g.
α =5% then power =80%
α =1% then power =95%
Calculating Sample Size
Sample size calculations available for all study designs, trials, and data types
e.g. categorical data, continuous data, means, proportions, multiple groups,
paired samples, unpaired samples, equal / unequal sized groups
Calculations are complex but easily done with a PC and www
Statistician helpful
(if s/he can communicate clearly!)
Two approaches for us non-statisticians
1. Altman’s Nomogram
2. Internet
Altman’s Nomogram
Standardized difference = Min. important difference / Standard deviation
[Figure: nomogram with a standardized difference scale (0.0 to 1.2), a power scale (0.05 to 0.995) and an N scale between them]
Example Calculation – Effects of Pesticide Study
IQ survey, concerning workers exposed to pesticides
What we already know…
Mean IQ score is 100 points
SD is ± 10 points e.g. Normal IQ= 90-110
What we need to do….
a) Decide on the minimum important difference (MID). A difference of 11 IQ points seems clinically important to me
b) Calculate Standardized Difference = Min. Important Difference / Standard Deviation = 11 / 10 = 1.1
c) Use Altman’s Nomogram to observe N
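As a cross-check on a nomogram reading, the same standardized difference can be put through the normal-approximation formula for two equal groups. This is a sketch rather than Altman's method itself; the function name `total_n` and the two-sided 5% significance level are assumptions.

```python
from math import ceil
from statistics import NormalDist


def total_n(standardized_difference, power, alpha=0.05):
    """Approximate total N (two equal groups) for a two-sided comparison of means."""
    z = NormalDist()
    per_group = (2 * (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) ** 2
                 / standardized_difference ** 2)
    return 2 * ceil(per_group)


standardized_difference = 11 / 10  # MID of 11 IQ points over an SD of 10
for power in (0.80, 0.85, 0.90, 0.95):
    print(f"power {power:.2f}: total N of about {total_n(standardized_difference, power)}")
```

With a standardized difference this large, the approximation suggests a total of roughly 26 subjects at 80% power and roughly 44 at 95% power.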
Altman’s Nomogram - Effects of Pesticide Study
Standardized difference = Min. important difference / Standard deviation = 11 / 10 = 1.1
[Figure: nomogram read at a standardized difference of 1.1, with powers of 0.80, 0.85, 0.90 and 0.95 marked]
2. Electronic Calculation of Sample Size
Not covered in most stats packages
e.g. SPSS, Statistica
Many sites available
Real time calculation
Hyperstat by David M Lane
www.davidmlane.com
Other additional software
e.g. Xlstat.com
Summary of Sample Size & Power
Correct sample size helps avoid type I & type II errors
A correct study has balance of four factors
Power (no less than .80)
Bigger = Better study
Min. clinical difference (effective difference)
Bigger = Better study
Standard deviation (variability)
Smaller = Better study
Significance level (0.05)
Smaller = Better study
Looking for big differences much easier than smaller differences
Deploying Participants
Deployment is crucial
(as important as sampling correctly)
Deployment is only really an issue in “natural experiments”
e.g lab work or clinical trials
Independent Subjects
Matched Subjects
Repeated Subjects
Determined by
subject matter, environment, economics, organization
Deploying Participants
INDEPENDENT SUBJECTS
Subjects in x groups, who are measured in some way
Group means compared between the x groups
MATCHED SUBJECTS (Similar to INDEPENDENT SUBJECTS)
Subjects are in x groups
Each individual subject is matched with a subject in another group
on the basis of one or more matched variables e.g age, sex, ability
REPEATED SUBJECTS / MEASURES
Use one group only
Subjects are measured more than once (e.g. start, during and after)
Any Differences / changes within themselves are looked at
Independent Design
Subjects in x groups
Measured in some way and compared with another of the x groups
Comparing 2 groups is easiest
But can compare more than 2 groups
e.g comparing a high exposure group with a low exposure group
Hi exposure: mean = ?
Low exposure: mean = ?
Matched Design
Similar to INDEPENDENT SUBJECTS
Subjects are in x groups, and each individual subject is matched with a
subject in another group
Matched on the basis of one or more matched variables
e.g age, sex, ability, IQ, health status
Interested in controlling for prognostic factors
[Figure: Hi expo and Lo expo groups with matched subjects (values shown: 27, 35, 42, 28); mean = ? for each group]
Repeated Design
Uses one group only
Subjects are measured more than once on different occasions
The differences between individuals themselves in time are compared
Comparison is still based on the group means not individual scores
Looking at differences in people over a set time period
e.g. time 1 and time 2
[Figure: the same subjects measured pre exposure (t1) and post exposure (t2); mean = ? at each time]
Example 1 - Independent Design
Workers exposed to pesticide versus controls (not exposed to pesticide)
Independent T test
         Exposed n=5      Controls n=5     T      P
Age      25.2 (sd 2.7)    26.4 (sd 2)      -.77   .46
Psych    16.8 (sd 4.7)    14.8 (sd 4.9)    .65    .53
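The T value for the Psych scores in Example 1 can be approximately recovered from the group summaries alone. The sketch below uses the standard pooled-variance formula for an independent-samples t statistic; the function name `t_from_summary` is illustrative, and because the means and SDs are rounded the result will only be close to the reported figure.

```python
from math import sqrt


def t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Independent-samples t statistic (pooled variance) from group summaries."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    standard_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error


# Psych scores from Example 1: exposed 16.8 (sd 4.7), controls 14.8 (sd 4.9), n=5 each
t_psych = t_from_summary(16.8, 4.7, 5, 14.8, 4.9, 5)
print(f"t = {t_psych:.2f} on {5 + 5 - 2} degrees of freedom")
```

This gives a t of about 0.66, close to the .65 reported; with only five subjects per group a difference of two points is nowhere near significant.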
Example 2 - Matched Design
Workers exposed to pesticide versus controls not exposed to pesticide
Paired Samples T test
         Exposed n=5      Controls n=5     T      P
Age      30.8 (sd 7.6)    30.8 (sd 7.6)
Psych    13.8 (sd 2.1)    19.8 (sd 4.5)    -4.8   .008
Example 3 - Repeated Design
Workers before and after exposure to pesticide
Independent T test
         Pre n = 10       Post n = 10      T      P
Psych    14.1 (sd 5.7)    19.9 (sd 4.2)    2.5    .02
N numbers doubled from independent methods
Repeated subjects is efficient
Sampling & Deployment
RANDOM SAMPLING
Selecting a sample from the POPULATION
Related to the EXTERNAL VALIDITY of the research,
Related to the GENERALIZABILITY of the findings to the POPULATION
RANDOM ASSIGNMENT
How to assign the sample into different treatments or groups
Related to the INTERNAL VALIDITY of the research
Ensures groups are similar (EQUIVALENT) to each other prior to TREATMENT
Both RANDOM SAMPLING and RANDOM ASSIGNMENT can be used together,
or singly, or not at all…
Waste of time randomly sampling but not randomly allocating
Having a choice in this matter is a luxury
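Random assignment is a separate step from random sampling: it takes whatever sample has been recruited and deals it into groups by chance. A minimal sketch, assuming 20 already-recruited participants with placeholder identifiers and two equal groups:

```python
import random

sample = [f"participant_{i:02d}" for i in range(1, 21)]  # placeholder identifiers

rng = random.Random(7)   # seed only so the sketch is repeatable
shuffled = sample[:]     # copy, so the original order is left untouched
rng.shuffle(shuffled)

half = len(shuffled) // 2
treatment, control = shuffled[:half], shuffled[half:]

print(len(treatment), len(control))
```

Because chance alone decides who goes where, the two groups should be equivalent on average before any treatment is given.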
Restrictions on Methods
Study may involve clinical interventions
No controls may be available for a comparison
e.g all workers are exposed
e.g all participants male
e.g. all participants need “treatment”
Data may be retrospective
Data collection may be made by other parties
Power Hierarchy of Study Designs
Best - Repeated Subjects / Repeated Measures
comparing like with like
each subject “stays the same” in other factors
reduces the need for covariate adjustment in analyses
“doubles” the number of subjects
Middle - Matched subjects
important factors are matched between groups
unmatched covariates still need to be adjusted for
not comparing like with like in all respects
Weakest - Independent subjects
comparing groups which may be vastly different
covariate adjustment is needed
need to use strict exclusion criteria in order to maintain comparability
Final Points
Bias
Avoiding bias is a good aim to have
Not necessarily everything in research
Existence of some bias in a sample does not ruin a project entirely
Spector et al. (2000) show the “inflating effect” of self-report bias may not be
so prominent
Mostly leads to underestimation rather than overestimation of any main
effects
Spector PE, Chen PY, O’Connell BJ. A longitudinal study of relations between
job stressors and job strains while controlling for prior negative affectivity
and strains. Journal of Applied Psychology 2000; 85: 211-218.
Final Points
Generalizability In epidemiological investigation
Basic principles:
Internal validity is always more important than its generalizability
Never appropriate to generalise an invalid finding
Mant et al. (1996)
Mant J, Dawes M, Graham-Jones S. Internal validity of trials is more important
than generalizability. British Medical Journal 1996; 312: 779.