5-SamplingTechniques(1431-32).ppt

Sampling Techniques
Dr. Shaik Shaffi Ahamed Ph.D.,
Assistant Professor
Department of Family & Community Medicine
College of Medicine
King Saud University
Why should we take a sample? Can’t we study the whole population?
It is possible, depending on the objective
- To know how many people live in a country
-- by age and sex categories
-- the changing pattern of the age structure
-- when planning for the country
CENSUS
-- Deaths in a hospital: record all the deaths
It is not possible
- To test the life of bulbs: we would have to burn every bulb until it fails
- To count the RBCs in blood: we would have to draw all the blood and count it
- To count the stars in the sky
It is not necessary
- To estimate Hb% in blood: a drop of blood is enough, because blood from any part of the body will give the same result
Populations and Sampling
Reasons for using samples
There are many good reasons for studying a sample instead of an entire population:
• Samples can be studied more quickly than populations. Speed can be important if a physician needs to determine something quickly, such as a vaccine or treatment for a new disease.
• A study of a sample is less expensive than a study of an entire population because a smaller number of items or subjects is examined. This consideration is especially important in the design of large studies that require a long follow-up.
• A study of the entire population is impossible in most situations.
• Sample results are often more accurate than results based on a population, because a smaller number of subjects can be measured more carefully.
Sampling in Epidemiology
Why Sample?
– Unable to study all members of a population
– Reduce bias
– Save time and money
– Measurements may be better in a sample than in the entire population
– Feasibility
Sampling
Sampling is the process or technique
of selecting a sample of
appropriate characteristics and
adequate size.
Terminology
Study Population
• A population may be defined as an aggregate of
all things / units possessing a common trait or
characteristic.
• The whole collection of units (“the universe”).
Terminology – Cont.
Target (Study) Population
• The population that possesses a characteristic (parameter) which we wish to estimate, or about which we wish to draw conclusions.
• The population to which you expect the eventual results of the research to apply (target of inference).
• It may be real or hypothetical.
Terminology – Cont.
Sample
• A selected subset of the study population.
• Chosen by some process (e.g. sampling) with the objective of investigating a particular characteristic (parameter) of the study population.
Sampling
• Process of obtaining a sample from the target
population.
Terminology – Cont.
Sampling Frame
• This is the complete list of sampling units in the target population to be subjected to the sampling procedure.
• Completeness and accuracy of this list are essential for the success of the study.
Sampling Units
These are the individual units / entities that make up the frame, just as elements are the entities that make up the population.
Terminology – Cont.
Sampling Error
This arises out of random sampling and is the discrepancy between sample values and the population value.
Sampling Variation
• Due to infinite variations among individuals and their surrounding conditions.
• Produces differences among samples from the population and is due to chance.
If we repeat the same study under exactly similar conditions, we will not necessarily get identical results.
Example: In a clinical trial of 200 patients we find that the efficacy of a particular drug is 75%.
If we repeat the study using the same drug in another group of 200 similar patients, we will not necessarily get the same efficacy of 75%. It could be 78% or 71%.
“Different results from different trials, though all of them were conducted under the same conditions”
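To see this sampling variation concretely, the sketch below repeats a 200-patient trial several times in simulation. It assumes, purely for illustration, that every patient responds independently with probability 0.75 (a binomial model); the function `simulate_trials` and its parameters are hypothetical names, not part of the lecture.

```python
import random

def simulate_trials(true_efficacy, n_patients, n_repeats, seed=None):
    """Repeat a hypothetical trial n_repeats times; return the observed efficacy (%) of each run."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_repeats):
        # Each patient is assumed to respond independently with probability true_efficacy.
        cured = sum(rng.random() < true_efficacy for _ in range(n_patients))
        results.append(100 * cured / n_patients)
    return results

# Five repetitions of a 200-patient trial of a drug whose true efficacy is 75%.
observed = simulate_trials(true_efficacy=0.75, n_patients=200, n_repeats=5, seed=1)
print(observed)
```

Each run uses the same drug, the same number of patients and the same true efficacy, yet the observed percentages differ from run to run; that spread is sampling variation.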
Example:
If two drugs have the same efficacy, then the difference between the cure rates of these two drugs should be zero.
But in practice we may not get a difference of zero.
If we find that the difference is small, say 2%, 3% or 5%, we may accept the hypothesis that the two drugs are equally effective.
On the other hand, if we find the difference to be large, say 25%, we would infer that the difference is too large to be due to chance and conclude that the drugs are not equally effective.
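A short simulation shows how large a purely chance difference between two equally effective drugs tends to be. This sketch assumes, hypothetically, that both drugs cure 75% of patients and that each arm has 200 patients; it then counts how often the observed difference is within 5 percentage points and how often it reaches 25 points.

```python
import random

def cure_rate_differences(p, n_per_arm, n_repeats, seed=None):
    """Simulate n_repeats two-arm trials where both drugs share cure probability p."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_repeats):
        cured_a = sum(rng.random() < p for _ in range(n_per_arm))
        cured_b = sum(rng.random() < p for _ in range(n_per_arm))
        diffs.append(100 * (cured_a - cured_b) / n_per_arm)  # difference in percentage points
    return diffs

diffs = cure_rate_differences(p=0.75, n_per_arm=200, n_repeats=1000, seed=2)
share_small = sum(abs(d) <= 5 for d in diffs) / len(diffs)
share_large = sum(abs(d) >= 25 for d in diffs) / len(diffs)
print(f"|difference| <= 5 points: {share_small:.1%}; >= 25 points: {share_large:.1%}")
```

Under these assumptions most differences fall within a few percentage points, while a 25-point gap essentially never arises by chance alone, which is the intuition behind the slide's reasoning.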
Example:
Suppose we are testing a pharmaceutical company’s claim that the efficacy of a particular drug is 80%.
We may accept the company’s claim if we observe the efficacy in the trial to be 78%, 81%, 83% or 77%.
But if the efficacy in the trial happens to be 50%, we would have good cause to feel that the true efficacy cannot be 80%, because the chance of such a result occurring when the true efficacy is 80% must be very low.
We then tend to dismiss the claim that the efficacy of the drug is 80%.
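The phrase "the chance of such happening must be very low" can be quantified with the binomial distribution. The trial size is not stated on the slide, so the sketch below assumes n = 200 patients (a hypothetical value) and computes the probability of observing an efficacy at or below a given level if the claimed 80% were true. `prob_at_most` is an illustrative name; `math.comb` is from the Python standard library.

```python
from math import comb

def prob_at_most(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p): the chance of k or fewer cured patients."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

n = 200          # hypothetical trial size; the slide does not give one
p_claim = 0.80   # the company's claimed efficacy

print(prob_at_most(156, n, p_claim))  # observed efficacy of 78% (156/200) or lower
print(prob_at_most(100, n, p_claim))  # observed efficacy of 50% (100/200) or lower
```

Under these assumptions, an observed efficacy of 78% or lower is quite common (roughly a one-in-four chance), whereas 50% or lower has a probability far below one in a billion, which is why such a result counts as strong evidence against the claim.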
THEREFORE
“WHILE TAKING DECISIONS BASED ON EXPERIMENTAL DATA WE MUST GIVE SOME ALLOWANCE FOR SAMPLING VARIATION”.
“VARIATION BETWEEN ONE SAMPLE AND ANOTHER SAMPLE IS KNOWN AS SAMPLING VARIATION”.
Terminology – Cont.
Study Participants
• Subjects who are actually participating in the study.
• The subset of the study population who could be contacted and who consented / agreed to participate.
Study Participants - Cont.
Study participants may still not be representative of the target population, even with random sampling, because of:
– An out-of-date sampling frame
– Failure to recruit eligible subjects
– Non-consent or non-response
– Drop-out / withdrawal
Decisions Required for Selecting a Sample
1. Specify the target population. This is entirely determined by the research objective.
2. Specify the study population (e.g. who is eligible for inclusion in the study).
3. Select a sampling design for obtaining a sample for the study.
4. Decide on a strategy to ensure a high response or participation rate; otherwise, inference must take account of non-response.
These decisions have a considerable impact on study validity (soundness of the conclusions or inferences made).
Study populations and sampling summarized schematically
Target population (real or hypothetical)
  ↓ select based on judgment and accessibility
Study population
  ↓ probability sampling
Sample
  ↓ consent or respond
Participants in study
How to sample ?
In general, 2 requirements
1. Sampling frame must be available, otherwise
construct one or use special sampling
techniques. Frame construction may not be
easy.
2. Choose an appropriate sampling method to
draw a sample from the frame.
The Sampling Design Process
1. Define the population
2. Determine the sampling frame
3. Select sampling technique(s)
4. Determine the sample size
5. Execute the sampling process
Classification of Sampling Techniques
Sampling Techniques
– Nonprobability sampling techniques: convenience sampling, judgmental sampling, quota sampling, snowball sampling
– Probability sampling techniques: simple random sampling, systematic sampling, stratified sampling, cluster sampling, other sampling techniques
Simple Random Sampling
A sample may be defined as random if every sampling unit in the study population has an equal chance of being selected.
• Selection of SRS may be done by:
– Drawing the number or name from a hat or box.
– Using a random number table.
– Using a computer to generate the numbers.
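The third option, using a computer, can be sketched with Python's standard `random` module. `random.Random.sample` draws without replacement, so every unit in the frame has the same chance of selection; the frame of 30 numbered units and the function name `simple_random_sample` are illustrative assumptions.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n units without replacement; each unit in `frame` has the same chance."""
    if n > len(frame):
        raise ValueError("sample size cannot exceed the size of the sampling frame")
    rng = random.Random(seed)
    return rng.sample(frame, n)

# Hypothetical sampling frame: 30 units numbered 1 to 30.
frame = list(range(1, 31))
selected = simple_random_sample(frame, 10, seed=42)
print(sorted(selected))
```

Passing a fixed `seed` makes the draw reproducible, which is useful when the selection has to be documented or audited; omitting it gives a fresh draw each time.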
SRS Methods
• Lottery method
• Random number table method
Tables of random numbers are used after numbers have been assigned to members of the study population. Use the random number table to select subjects. Start anywhere, and continue selecting until the desired sample size is reached.
Random Number Table
     1      2      3      4      5
 49486  93775  88744  80091  92732
 94860  36746  04571  13150  65383
 10169  95685  47585  53247  60900
 12018  45351  15671  23026  55344
 45611  71585  61487  87434  07498
 89137  30984  18842  69619  53872
 94541  12057  30771  19598  96069
 89920  28843  87599  30181  26839
 32472  32796  15255  39636  90819
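The random number table method can be mechanized as follows. The reading rule used here (walk the table row by row, take as many leading digits of each five-digit group as N has, and discard numbers outside 1 to N as well as repeats) is one common convention assumed for illustration, not one stated on the slide. The table is entered row by row in the order printed above, and `select_from_table` is a hypothetical name.

```python
RANDOM_NUMBER_TABLE = """
49486 93775 88744 80091 92732
94860 36746 04571 13150 65383
10169 95685 47585 53247 60900
12018 45351 15671 23026 55344
45611 71585 61487 87434 07498
89137 30984 18842 69619 53872
94541 12057 30771 19598 96069
89920 28843 87599 30181 26839
32472 32796 15255 39636 90819
"""

def select_from_table(table, population_size, n, start=0):
    """Pick up to n distinct unit numbers (1..population_size) from a random number table.

    Assumed rule: read the groups row by row from group index `start`
    (wrapping around once), keep the leading digits, skip out-of-range
    numbers and numbers already chosen.
    """
    groups = table.split()
    width = len(str(population_size))      # 2 digits when N = 30
    chosen = []
    for offset in range(len(groups)):
        number = int(groups[(start + offset) % len(groups)][:width])
        if 1 <= number <= population_size and number not in chosen:
            chosen.append(number)
            if len(chosen) == n:
                break
    return chosen                          # may be shorter than n if the table runs out

print(select_from_table(RANDOM_NUMBER_TABLE, population_size=30, n=10))
# [4, 13, 10, 12, 15, 23, 7, 30, 18, 19]
```

The `start` argument plays the role of "start anywhere": a different starting group generally yields a different, equally valid sample.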
How to select a simple random sample
1. Define the population
2. Determine the desired sample size
3. List all members of the population or the potential subjects
For example:
– 4th grade boys who have demonstrated problem behaviors
– Let's select 10
Potential Subject Pool
1. Robert     11. Ken       21. Steve
2. Ralph      12. Wilmer    22. Larry
3. John       13. Alan      23. Rick
4. Andy       14. Kevin     24. Bruce
5. Joel       15. James     25. Clyde
6. Thomas     16. Henry     26. Sam
7. Cooper     17. Don       27. Kent
8. Maurice    18. Walt      28. Travis
9. Terry      19. Doug      29. Woody
10. Carl      20. George    30. Brian
So our selected subjects are numbers 10, 22, 24, 15, 6, 1, 25, 11, 13 and 16: Carl, Larry, Bruce, James, Thomas, Robert, Clyde, Ken, Alan and Henry.
Simple random sampling
– Example: Estimate hemoglobin levels in patients with sickle cell anemia
1. Determine the sample size
2. Obtain a list of all patients with sickle cell anemia in a hospital or clinic
3. The patient is the sampling unit
4. Use the lottery method or a table of random numbers to select units from the sampling frame
5. Measure hemoglobin in all selected patients
6. Calculate the mean and standard deviation of the sample
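Steps 4 to 6 of the hemoglobin example can be sketched as below. The sampling frame is a hypothetical list of 500 patient IDs with simulated hemoglobin values (g/dL), generated only so that the code runs; they are not clinical data, and the sample size of 50 is likewise an assumption. The mean and sample standard deviation come from Python's `statistics` module.

```python
import random
import statistics

def sample_mean_sd(hb_by_patient, n, seed=None):
    """Draw an SRS of n patients (the sampling unit) and summarize their hemoglobin."""
    rng = random.Random(seed)
    sampled_ids = rng.sample(sorted(hb_by_patient), n)    # computer version of the lottery
    values = [hb_by_patient[pid] for pid in sampled_ids]  # "measure" each sampled patient
    return statistics.mean(values), statistics.stdev(values)

# Hypothetical sampling frame: 500 patient IDs with simulated Hb values in g/dL.
# These numbers are illustrative only; they are not clinical data.
gen = random.Random(0)
frame = {f"P{i:03d}": round(gen.gauss(8.0, 1.2), 1) for i in range(1, 501)}

mean_hb, sd_hb = sample_mean_sd(frame, n=50, seed=1)
print(f"sample mean = {mean_hb:.2f} g/dL, sample SD = {sd_hb:.2f} g/dL")
```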
Simple random sampling
– Advantages
• Simple process and easy to understand
• Easy calculation of means and variance
– Disadvantages
• Not the most efficient method, that is, it does not give the most precise estimate for the cost
• Requires knowledge of the complete sampling frame
• Cannot always be certain that there is an equal chance of selection
• Non-respondents or refusals
Sampling in Epidemiology
• Systematic sampling
– The sampling units are spaced regularly throughout the sampling frame, e.g., every 3rd unit would be selected
– May be used as either a probability sample or not
• Not a probability sample unless the starting point is randomly selected
• Non-random sample if the starting point is determined by some mechanism other than chance
Systematic Sampling
The sample is chosen by selecting a random starting point and then picking every ith element in succession from the sampling frame.
The sampling interval, i, is determined by dividing the population size N by the sample size n and rounding to the nearest integer.
For example, there are 100,000 elements in the population and a sample of 1,000 is desired. In this case the sampling interval, i, is 100. A random number between 1 and 100 is selected. If, for example, this number is 23, the sample consists of elements 23, 123, 223, 323, 423, 523, and so on.
Example
If a systematic sample of 500 students were to be carried out in a university with an enrolled population of 10,000, the sampling interval would be:
i = N/n = 10,000/500 = 20
All students would be assigned sequential numbers. The starting point would be chosen by selecting a random number between 1 and 20. If this number was 9, then the 9th student on the list would be selected along with every following 20th student. The sample of students would be those corresponding to student numbers 9, 29, 49, 69, ..., 9929, 9949, 9969 and 9989.
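The same procedure in code: a minimal sketch of systematic sampling with a random start, using the document's notation (interval i = N/n). The function name `systematic_sample` is hypothetical; the example reproduces the university case with the start fixed at 9.

```python
import random

def systematic_sample(N, n, start=None, seed=None):
    """Return the unit numbers (1..N) of a systematic sample with a random start."""
    i = round(N / n)  # sampling interval; note that Python's round() sends exact halves to the even integer
    if start is None:
        start = random.Random(seed).randint(1, i)  # random start between 1 and i inclusive
    if not 1 <= start <= i:
        raise ValueError("start must lie between 1 and the sampling interval")
    return list(range(start, N + 1, i))

students = systematic_sample(N=10_000, n=500, start=9)
print(students[:4], "...", students[-4:])  # [9, 29, 49, 69] ... [9929, 9949, 9969, 9989]
print(len(students))                       # 500
```

When N/n is not a whole number, the rounded interval can yield one or two units more or fewer than n, so the achieved sample size is worth checking.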
Systematic Sampling
• Decide on sample size: n
• Divide population of N individuals into groups of
k individuals: k = N/n
• Randomly select one individual from the 1st group.
• Select every k-th individual thereafter.
Example figure: N = 64, n = 8, k = 8 (one individual is chosen at random from the first group of 8, then every 8th individual thereafter)
Systematic sampling
– Advantages
• Sampling frame does not need to be defined in advance
• Easier to implement in the field
• If there are unrecognized trends in the sampling frame, a systematic sample ensures coverage of the spectrum of units
– Disadvantages
• Variance cannot be estimated unless assumptions are made
Stratified Sampling
• A two-step process in which the population is partitioned into subpopulations, or strata.
• The strata should be mutually exclusive and collectively exhaustive, in that every population element should be assigned to one and only one stratum and no population elements should be omitted.
• Next, elements are selected from each stratum by a random procedure, usually SRS.
• A major objective of stratified sampling is to increase precision without increasing cost.
• Stratified random sample
– The sampling frame comprises groups, or strata, with certain characteristics
– A sample of units is selected from each group or stratum
Sampling in Epidemiology
• Stratified random sample
– Assess dietary intake in adolescents
1. Define three age groups: 11-13, 14-16, 17-19
2. Stratify age groups by sex
3. Obtain a list of children in this age range from schools
4. Randomly select children from each of the 6 strata until the sample size is obtained
5. Measure dietary intake
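A sketch of the dietary-intake design: the frame is split into six age-by-sex strata and a simple random sample is drawn within each stratum. The list of 1,200 children is simulated, and the equal allocation of 20 children per stratum is an assumption (proportional allocation is another common choice); the function names are illustrative.

```python
import random
from collections import defaultdict

AGE_GROUPS = [(11, 13), (14, 16), (17, 19)]

def age_sex_stratum(child):
    """Map a (child_id, age, sex) record to exactly one of the six strata."""
    _, age, sex = child
    for low, high in AGE_GROUPS:
        if low <= age <= high:
            return (f"{low}-{high}", sex)
    raise ValueError(f"age {age} is outside 11-19")

def stratified_sample(frame, stratum_of, n_per_stratum, seed=None):
    """Partition the frame into strata, then take an SRS within each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in frame:
        strata[stratum_of(unit)].append(unit)  # each unit lands in one and only one stratum
    return {
        key: rng.sample(members, min(n_per_stratum, len(members)))
        for key, members in sorted(strata.items())
    }

# Hypothetical school lists: 1,200 children with simulated ages and sexes.
gen = random.Random(0)
children = [(f"C{i:04d}", gen.randint(11, 19), gen.choice("MF")) for i in range(1, 1201)]

sample = stratified_sample(children, age_sex_stratum, n_per_stratum=20, seed=1)
for stratum, kids in sample.items():
    print(stratum, len(kids))
```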
Figure: Stratified random selection for a drug trial in hypertension (strata: mild, moderate, severe)
Stratified random sample
– Advantages
• Assures that certain subgroups are represented in a sample
• Allows the investigator to estimate parameters in different strata
• More precise estimates of the parameters because strata are more homogeneous, i.e., smaller variance within strata
• Strata of interest can be sampled more intensively, e.g., groups with the greatest variance
• Administrative advantages
– Disadvantages
• Loss of precision if a small number of units is sampled from strata
Cluster Sampling
• The population is first divided into mutually exclusive groups of elements called clusters.
• A simple random sample of the clusters is then taken.
• All elements within each sampled (chosen) cluster form the sample.
• Elements within a cluster should be as heterogeneous as possible, but clusters themselves should be as homogeneous as possible. Ideally, each cluster should be a small-scale representation of the population.
• Cluster sampling
– Estimate the prevalence of dental caries in school children
1. Among the schools in the catchment area, list all of the classrooms in each school
2. Take a simple random sample of classrooms, or clusters of children
3. Examine all children in a cluster for dental caries
4. Estimate the prevalence of caries within clusters, then combine into an overall estimate, with its variance
Cluster sampling
– Advantages
• The entire sampling frame need not be enumerated in advance, just the clusters once identified
• More economical in terms of resources than simple random sampling
– Disadvantages
• Loss of precision, i.e., wider variance, but this can be accounted for with a larger number of clusters
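The dental caries cluster design can be sketched as follows: a simple random sample of classrooms is drawn, every child in a chosen classroom is examined, and the classroom-level prevalences are combined. The classroom data are simulated, and the standard-error formula assumes equal-sized classrooms and ignores the finite population correction; both are simplifying assumptions, and the function name is hypothetical.

```python
import random
import statistics

def cluster_sample_prevalence(classrooms, n_clusters, seed=None):
    """SRS of whole classrooms; every child in a chosen classroom is examined."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(classrooms), n_clusters)  # simple random sample of clusters
    # Caries status per child: 1 = caries present, 0 = absent.
    props = [sum(classrooms[c]) / len(classrooms[c]) for c in chosen]
    prevalence = statistics.mean(props)                  # combine the cluster estimates
    # Standard error from between-cluster variation; assumes equal-sized
    # classrooms and ignores the finite population correction.
    se = (statistics.variance(props) / n_clusters) ** 0.5
    return chosen, prevalence, se

# Hypothetical catchment area: 60 classrooms of 25 children with simulated caries status.
gen = random.Random(0)
classrooms = {}
for room in range(1, 61):
    p_room = gen.uniform(0.2, 0.5)  # classrooms differ in caries risk
    classrooms[f"R{room:02d}"] = [int(gen.random() < p_room) for _ in range(25)]

chosen, prev, se = cluster_sample_prevalence(classrooms, n_clusters=12, seed=3)
print(f"{len(chosen)} classrooms examined; prevalence = {prev:.3f} (SE {se:.3f})")
```

Because children in the same classroom tend to resemble each other, the standard error here is driven by how much classrooms differ, which is why adding more clusters recovers precision.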
Multistage Sampling
• Similar to cluster sampling except that there are two sampling events instead of one
– Primary units are randomly selected
– Individual units within primary units are randomly selected for measurement
Multi-Stage Sampling
• This sampling method is actually a combination of the basic sampling methods carried out in stages.
• The aim is to subdivide the population into progressively smaller units by random sampling at each stage.
Sampling in Epidemiology
• Multistage sampling
– Estimate the prevalence of dental caries in school children
1. Among the schools in the catchment area, list all of the classrooms in each school
2. Take a simple random sample of classrooms, or clusters of children
3. Enumerate the children in each selected classroom
4. Take a simple random sample of children within each classroom
5. Examine all the sampled children for dental caries
6. Estimate the prevalence of caries within clusters, then combine into an overall estimate, with its variance
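Compared with the cluster design, the two-stage version adds a second random draw inside each chosen classroom. The sketch below assumes 60 simulated classrooms of 30 children and hypothetical sample sizes of 10 classrooms and 8 children per classroom. Because every classroom is the same size and contributes the same number of children, the simple average of the classroom-level proportions serves as the estimate; unequal sizes would need weighting.

```python
import random

def two_stage_sample(classrooms, n_rooms, n_children, seed=None):
    """Stage 1: SRS of classrooms. Stage 2: SRS of children within each chosen classroom."""
    rng = random.Random(seed)
    sample = {}
    for room in rng.sample(sorted(classrooms), n_rooms):
        roster = classrooms[room]  # enumerate the children in the classroom
        sample[room] = rng.sample(roster, min(n_children, len(roster)))
    return sample

def estimate_prevalence(sample):
    """Mean of classroom-level caries proportions (assumes equal-sized classrooms)."""
    props = [sum(status for _, status in kids) / len(kids) for kids in sample.values()]
    return sum(props) / len(props)

# Hypothetical data: (child_id, caries status) for 60 classrooms of 30 children.
gen = random.Random(0)
classrooms = {
    f"R{room:02d}": [(f"R{room:02d}-{c:02d}", int(gen.random() < 0.3)) for c in range(1, 31)]
    for room in range(1, 61)
}

sample = two_stage_sample(classrooms, n_rooms=10, n_children=8, seed=2)
print(f"examined {sum(len(k) for k in sample.values())} children; "
      f"estimated prevalence = {estimate_prevalence(sample):.3f}")
```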
Sampling Methods
Non-probability samples
Convenience Sampling
Convenience sampling attempts to obtain a sample of convenient elements. Often, respondents are selected because they happen to be in the right place at the right time.
– use of students and members of social organizations
– mall intercept interviews without qualifying the respondents
– department stores using charge account lists
– “people on the street” interviews
• Convenience sample
– Case series of patients with a particular condition at a certain hospital
– “Normal” graduate students walking down the hall are asked to donate blood for a study
– Children with febrile seizures reporting to an emergency room
The investigator decides who is enrolled in a study
Judgmental Sampling
Judgmental sampling is a form of
convenience sampling in which the
population elements are selected
based on the judgment of the
researcher.
– It involves hand-picking from the accessible
population those individuals judged most
appropriate for the study.
Quota Sampling
Quota sampling may be viewed as two-stage restricted judgmental sampling.
– The first stage consists of developing control categories, or quotas, of
population elements.
– In the second stage, sample elements are selected based on convenience or
judgment.
Control          Population           Sample               Sample
characteristic   composition (%)      composition (%)      composition (number)
Sex
  Male           48                   48                   480
  Female         52                   52                   520
  Total          100                  100                  1000
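The two stages of quota sampling can be sketched as follows: quotas are computed from the population composition, then respondents are accepted in whatever order they happen to arrive until each quota is filled. That arrival order is exactly where the convenience (non-random) element enters. The function names and the tiny arrival list in the usage note are illustrative assumptions.

```python
def quotas(composition_pct, sample_size):
    """Stage 1: turn the population composition (%) into a quota for each category."""
    if abs(sum(composition_pct.values()) - 100) > 1e-9:
        raise ValueError("percentages must sum to 100")
    # In general, rounding can make the quotas differ from sample_size by a unit or two.
    return {cat: round(sample_size * pct / 100) for cat, pct in composition_pct.items()}

def fill_quotas(arrivals, quota):
    """Stage 2: accept respondents in the order they turn up until each quota is full."""
    filled = {cat: [] for cat in quota}
    for person, cat in arrivals:
        if cat in filled and len(filled[cat]) < quota[cat]:
            filled[cat].append(person)
    return filled

print(quotas({"Male": 48, "Female": 52}, 1000))  # {'Male': 480, 'Female': 520}
```

For example, `fill_quotas([("p1", "Male"), ("p2", "Male"), ("p3", "Female")], {"Male": 1, "Female": 1})` keeps p1 and p3 and turns away p2 only because the male quota was already full, not by any random mechanism.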
Snowball Sampling
In snowball sampling, an initial group of respondents is
selected, usually at random.
– After being interviewed, these respondents are asked
to identify others who belong to the target population
of interest.
– Subsequent respondents are selected based on the
referrals.
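The referral process behaves like a breadth-first walk through a network of contacts. In this sketch the referral network and the two initial respondents are hypothetical, each person is recruited at most once, and recruitment stops when the target size is reached or nobody new is named.

```python
from collections import deque

def snowball_sample(seeds, referrals, target_size):
    """Interview the initial respondents, then the people they refer, wave by wave."""
    sampled, seen = [], set(seeds)
    queue = deque(seeds)
    while queue and len(sampled) < target_size:
        person = queue.popleft()
        sampled.append(person)                   # interview this respondent
        for contact in referrals.get(person, []):
            if contact not in seen:              # each person is recruited at most once
                seen.add(contact)
                queue.append(contact)
    return sampled

# Hypothetical referral network: who names whom as a member of the target population.
referrals = {"A": ["C", "D"], "B": ["D", "E"], "C": ["F"], "E": ["G", "A"]}
print(snowball_sample(["A", "B"], referrals, target_size=6))  # ['A', 'B', 'C', 'D', 'E', 'F']
```

Because later respondents are reached only through earlier ones, the sample tends to over-represent well-connected people, which is one reason the method is classed as non-probability.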
Consecutive Sample
• Consecutive sample
– A case series of consecutive patients with a condition of interest
– Consecutive series means ALL patients with the condition within a hospital or clinic, not just the patients the investigators happen to know about
• Consecutive sample: examples
– Outcome of 1000 consecutive patients presenting to the emergency room with chest pain
– Natural history of all 125 patients with HIV-associated TB during a 5-year period
Explicit efforts must be made to identify and recruit ALL persons with the condition of interest
Sampling Methods
Non-probability samples
• Depend on the expert’s opinion.
• Probabilities of selection are not considered.
• Advantages: convenience, speed and lower cost.
• Disadvantages:
– Lack of accuracy
– Lack of generalizability of results
Availability sampling: selecting on the basis of convenience.
Random sampling: every combination of a given size has an equal chance of being chosen.
Cluster sampling: dividing the population into clusters, typically on the basis of geography, and taking a sample of the clusters.
Snowball sampling: asking individuals studied to provide references to others.
Multi-stage sampling: sampling subunits within sampled units.
Stratified sampling: dividing the population into groups on the basis of some characteristic and then sampling each group.
Quota sampling: selecting fixed numbers of units in each of a number of categories.
Systematic sampling: choosing every nth item from a list, beginning at a random point.
Strengths and Weaknesses of Basic Sampling Techniques
Nonprobability sampling
• Convenience sampling
  Strengths: least expensive, least time-consuming, most convenient
  Weaknesses: selection bias, sample not representative, not recommended for descriptive or causal research
• Judgmental sampling
  Strengths: low cost, convenient, not time-consuming
  Weaknesses: does not allow generalization, subjective
• Quota sampling
  Strengths: sample can be controlled for certain characteristics
  Weaknesses: selection bias, no assurance of representativeness
• Snowball sampling
  Strengths: can estimate rare characteristics
  Weaknesses: time-consuming
Probability sampling
• Simple random sampling (SRS)
  Strengths: easily understood, results projectable
  Weaknesses: difficult to construct sampling frame, expensive, lower precision, no assurance of representativeness
• Systematic sampling
  Strengths: can increase representativeness, easier to implement than SRS, sampling frame not necessary
  Weaknesses: can decrease representativeness
• Stratified sampling
  Strengths: includes all important subpopulations, precision
  Weaknesses: difficult to select relevant stratification variables, not feasible to stratify on many variables, expensive
• Cluster sampling
  Strengths: easy to implement, cost effective
  Weaknesses: imprecise, difficult to compute and interpret results
Random . . .
• Random Selection vs. Random Assignment
– Random Selection = every member of the population has an equal chance of being selected for the sample.
– Random Assignment = every member of the sample (however chosen) has an equal chance of being placed in the experimental group or the control group.
• Random assignment allows individual differences among test participants to be averaged out.
Subject Selection (Random Selection): choosing which potential subjects will actually participate in the study.
Subject Assignment (Random Assignment): deciding which group or condition (Group A or Group B) each subject will be part of.
Population: 200 8th Graders
– 40 high-IQ students → 30 students selected → 15 to Group A, 15 to Group B
– 120 average-IQ students → 30 students selected → 15 to Group A, 15 to Group B
– 40 low-IQ students → 30 students selected → 15 to Group A, 15 to Group B
Randomization (Random assignment to two treatments)
• Randomization tends to produce study groups comparable with respect to known and unknown risk factors,
• removes investigator bias in the allocation of participants,
• and guarantees that statistical tests will have valid significance levels.
• Trialist’s most powerful weapon against bias
Randomization (Cont.)
• Simple randomization: toss a coin
– AAABBAAAAABABABBAAAABAA…
• Random permuted blocks (block randomization)
– AABB-ABBA-BBAA-BAAB-ABAB-AABB…
Block Randomization
• Each block contains all conditions of the experiment in a randomized order.
Example (E = experimental, C = control):
Block 1: E, C, C, E
Block 2: C, E, C, E
Block 3: E, E, C, C
Experimental Group N = 6; Control Group N = 6
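Both allocation schemes can be sketched in a few lines. Coin tossing can leave the groups unequal, whereas shuffling a block that contains each condition twice guarantees balance after every completed block. The block size of four and the function names are assumptions for illustration.

```python
import random

def simple_randomization(n, seed=None):
    """Coin toss for each participant: group sizes can drift apart."""
    rng = random.Random(seed)
    return [rng.choice("EC") for _ in range(n)]

def block_randomization(n_blocks, block=("E", "E", "C", "C"), seed=None):
    """Permuted blocks: each block holds every condition, in a random order."""
    rng = random.Random(seed)
    sequence = []
    for _ in range(n_blocks):
        b = list(block)
        rng.shuffle(b)  # random order within the block
        sequence.extend(b)
    return sequence

allocation = block_randomization(n_blocks=3, seed=4)
print(allocation)
print("E:", allocation.count("E"), "C:", allocation.count("C"))  # E: 6 C: 6
```

With three blocks of four, the final split is always 6 and 6, matching the example above, while the order within each block stays unpredictable.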
Prevalence and risk factors of HIV-1 and HIV-2 infection in urban and rural areas in TN. Int. J. of STD & AIDS 1998;9:98-103
Objective: Find prevalence and risk factors.
Setting: Centers in a metropolitan city & municipality.
Subjects: Individuals in Tamil Nadu.
Sampling Procedure:
“Health camps were organized in 5 urban and 5 rural centers to cover the entire state geographically”
“Every third person screened, in the active reproductive age group, was recruited as a subject. At each camp the inclusion of subjects continued until 200 persons were recruited”
Sex differences in the use of asthma drugs: cross-sectional study. BMJ 1998; 317: 1434-7
Objective: To assess the use of asthma drugs. Design: Cross-sectional study. Setting: Six general practices in East Anglia.
Subjects: Adults aged 20-54 with asthma
Sampling method
Cases of asthma that had received drugs in the previous year were identified through the database of each participating practice. The sample was stratified into three categories of severity corresponding to the prescribed drugs:
– Bronchodilator alone (mild): 38%
– Steroids (moderate): 57%
– Nebulizer treatment (severe): 5%
SRS was used to select subjects in each practice, in proportion to the use of each type of drug within the practice.
Genital ulcer disease and acquisition of HIV infection.
Indian J Med Microbiol 1992; 10(4):265-269
Objective : To find out the association of HIV infection with
genital ulcer disease .
Setting : Dept. of STD, GGH, Chennai.
Subjects : Individuals attending the STD dept.
Sampling procedure
‘Blood samples from the first 20 patients were taken for analysis once a week for 40 weeks’.
Prevalence of serious eye disease and visual impairment in a north London population: population based, cross sectional study. BMJ 1998; 316:1643-48.
Objective: To estimate the prevalence of eye disorders and of visual impairment
Design: Cross-sectional survey.
Setting: General practices in a metropolitan area in England.
Subjects: People aged 65 or older and registered with the practices
Sampling Procedure
17 general practice groups
  ↓ random sampling
7 were selected
  ↓
People aged 65 or older registered with the general practices: 750-850 in each general practice
  ↓ SRS used to select eligible people in each practice
One third in each practice were selected to form the survey sample
Example
A medical student in a city in South Africa conducted a survey to measure the prevalence of HIV in his village. He used simple random sampling to select the subjects. At the end of his study, he was able to estimate the prevalence in the general population of the village. However, he was not able to calculate the prevalence of HIV in some subgroups, such as homosexual men, due to the absence of this subgroup from his sample. So, to guarantee the presence of such a rare group, what kind of sampling should he have used?
A. Systematic random sample.
B. Cluster sample.
C. Multistage sample.
D. Stratified random sample.
E. None of the above.
Example
A post-graduate trainee of family medicine was assigned
a project to evaluate the effect of teachers’ smoking on
students’ behavior. He presented the following scenario
as an explanation of his method of subjects’ selection:
“Out of 400 schools in Riyadh 30 schools were selected
randomly and then all subjects (teachers) in each
selected school will be included in the study”
The type of sampling method is:
A. Multi-staged sample
B. Cluster sample
C. Simple random sample
D. Stratified random sample
E. None of the above
Example
Stratified random sample:
A. Make use of random number tables
B. Is one type of non-random sample
C. Divide the population into groups or clusters according to a characteristic of interest
D. Take all units in some clusters
E. Increase precision