Transcript MICS WS1

Multiple Indicator Cluster Surveys
Survey Design Workshop
Advanced Sampling
MICS Survey Design Workshop
Major steps in designing MICS sample
• Define objectives
– Key indicators
– Desired level of precision
– Sub-national domains of estimation
• Identify most appropriate sampling frame
– Most recent census of population and housing
– Master sample or sample for another survey
conducted recently
Major steps in designing MICS sample
• Determine sample size and
–Determine availability of previous
MICS or DHS results to provide
measures of sampling parameters
Sampling Frame
• Sampling frame:
– Nationally-representative
– Complete coverage
– Measures of size (households or
population) for small area units
• Generally most recent census is the most
effective sampling frame
Sampling Frame
• In some cases more recent pre-census
listing may be available
• When no census is available, identify
most complete geographic frame
available (e.g. list of villages/localities
with estimated population)
Sampling Frame
• Common problems with area frames:
– Coverage issues
– Census maps of poor quality
– Errors and changes in area boundaries
– Inappropriate type and size of area
– Lack of auxiliary information
• n is the required sample size (number of households)
• 4 is a factor to achieve the 95 percent level of confidence
• r is the predicted or estimated value of the
indicator in target population
• deff is the design effect
• RR is the response rate
• pb is the proportion of the target
subpopulation in total population (upon
which the indicator, r, is based)
• AveSize is the average household size (that is,
average number of persons per household)
• e is the margin of error to be tolerated at the
95% level of confidence
• Currently, e = 0.12r (defined as 12% of r); in this
case the relative standard error of r is 6%,
because e = 2 × standard error (r)
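The formula that these definitions belong to did not survive in this transcript. A minimal sketch, assuming it takes the form n = 4 · r · (1 − r) · deff / (e² · pb · AveSize · RR) with e = 0.12r, which is consistent with the symbols defined above. The function name and all input values are hypothetical, chosen only for illustration.

```python
def mics_sample_size(r, deff, pb, ave_size, rr, rel_margin=0.12):
    """Required number of sample households.

    Assumed formula (not shown in the transcript itself):
        n = 4 * r * (1 - r) * deff / (e**2 * pb * ave_size * rr)
    where e = rel_margin * r is the margin of error at 95% confidence.
    """
    e = rel_margin * r
    return 4 * r * (1 - r) * deff / (e ** 2 * pb * ave_size * rr)


# Hypothetical inputs: an indicator at 40% among children 12-23 months
# (3% of the population), deff 1.5, 5 persons per household, 90% response.
n_households = mics_sample_size(r=0.40, deff=1.5, pb=0.03, ave_size=5.0, rr=0.90)
print(round(n_households))
```

Because e is proportional to r, a lower predicted value of r gives a larger n under this form, which is why the choice of key indicator matters.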
Previously in MICS2
• 2 different values for margin of error
– Margin of error was 5 percentage points for high
values of r (over 25%)
– Margin of error was 3 percentage points for low
values of r (25% or less)
• Difficulty for users in deciding on the sample
size for their surveys.
MICS template for sample size
calculation - EXCEL FILE
Selection of key indicators
• Choose an important indicator that will yield
the largest sample size
• Step 1: Select 2 or 3 target populations, each
representing a small percentage of the
total population (pb); typically
– Children 12-23 months: 2-4%, or
– Children under 5 years: 7-20%
Selection of key indicators
• Step 2: Review important indicators for these
target groups but ignore indicators with very
low or very high prevalence (less than 10% or over
40%, respectively)
• Among indicators for which low coverage is
desirable, do not choose one that is
already acceptably low
• Do not choose childhood and maternal
mortality ratios
Explicit Stratification
• Explicit stratification: dividing the sampling
frame into sub-groups (called strata) of
homogeneous (similar) PSUs.
• Advantages:
– Better precision because reduced variance
within stratum given similarity of units
– Flexible design, sub-national estimates for
smaller domains (differential sampling rates)
• Example of stratification: region, urban/rural
Implicit Stratification
• Sort the sampling frame according to certain
characteristics such as regions, urban-rural
residence, sub-regions, districts, etc., then
select a systematic pps sample.
• Ensures a representative sample for each subgroup
• Automatically provides proportional allocation
by size of subgroup
Allocation of sample to strata/domains
• Proportional allocation
– Effective for precision of estimates at the national level
• Equal allocation to each domain
– Used when each domain requires same level of precision
• Optimum allocation – takes into account differential
variance and costs by stratum
– For example, variability may be higher in urban areas and
enumeration costs may be higher in rural areas – use
higher sampling rate for urban areas
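A minimal sketch of the three allocation rules, assuming the textbook form of optimum allocation in which the stratum sample is proportional to N_h · S_h / √c_h (stratum size, standard deviation, unit cost). All stratum figures below are hypothetical.

```python
import math


def proportional_allocation(total, sizes):
    """Allocate in proportion to stratum size."""
    grand = sum(sizes.values())
    return {h: total * n_h / grand for h, n_h in sizes.items()}


def equal_allocation(total, sizes):
    """Give every domain the same sample size."""
    return {h: total / len(sizes) for h in sizes}


def optimum_allocation(total, sizes, sd, cost):
    """Allocate in proportion to N_h * S_h / sqrt(c_h) (assumed textbook form)."""
    score = {h: sizes[h] * sd[h] / math.sqrt(cost[h]) for h in sizes}
    grand = sum(score.values())
    return {h: total * score[h] / grand for h in sizes}


# Hypothetical frame: households per stratum, variability and relative cost.
sizes = {"urban": 400_000, "rural": 600_000}
sd = {"urban": 0.5, "rural": 0.4}      # higher variability in urban areas
cost = {"urban": 1.0, "rural": 4.0}    # higher enumeration cost in rural areas

prop = proportional_allocation(8000, sizes)
opt = optimum_allocation(8000, sizes, sd, cost)
print(prop, opt)
```

With these made-up inputs the optimum rule shifts sample towards the urban stratum relative to proportional allocation, as the slide describes.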
Subnational estimates
• Number of separate areas (domains) for which
separate, equally reliable estimates are wanted
affects sample size
• For example, if 10 regional estimates are wanted,
theoretically the sample should be increased by
a factor of 10
• As a compromise, larger sampling errors accepted
for subnational estimates
– One proposal (by Dr. Vijay Verma) – increase national
sample size by a factor of D^0.65, where D is the number of
domains
– Results in an average increase in the sampling errors for
domain estimates by a factor of about 1.5
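A small sketch of the D^0.65 compromise compared with the theoretical factor of D. The function name and the national sample size of 5000 households are hypothetical.

```python
def domain_inflation_factor(n_domains, exponent=0.65):
    """Compromise inflation factor D**0.65 for the national sample size."""
    return n_domains ** exponent


national_n = 5000                       # hypothetical national sample size
factor = domain_inflation_factor(10)    # 10 regional domains
print(round(factor, 2), round(national_n * factor))
```

For 10 domains the compromise factor is well below the theoretical factor of 10, at the price of larger sampling errors for each domain.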
Sampling Stages
• Ideal to have two-stage sample design, with
EAs defined as PSUs
• In some countries only frame of larger
administrative units available
– Three-stage sample design: larger area units
selected as PSUs
– Necessary to delineate smaller segments in each
sample PSU
Number of PSUs and Cluster Size
• Survey costs depend not only on number of
households but also on their distribution among
primary sampling units (PSUs)
• Important to determine effective balance
between number of sample PSUs and number
of sample households per cluster
• In general, the more PSUs the better for
reliability but the greater the cost (mostly
costs of travel and listing)
Number of PSUs and Cluster Size
• Example: 8000 households selected in 400
PSUs of 20 sample households each is a much
more reliable sample than 200 PSUs of 40
households each, but more expensive
• Number of sample households per cluster
should be as small as practical for reliability
• A range of 15-25 households for MICS appears
to be effective
Design Effect (DEFF)
• Deff - ratio of variance of estimate based on
stratified multi-stage sample design to
corresponding variance from simple random
sample of same size
• Measure of the relative efficiency of the
sample design
• Effective stratification reduces the deff
• Cluster sampling increases the deff
Design Effect (DEFF)
• In case of cluster sampling, deff generally measures
effect of clustering
deff = 1 + δ (m − 1)
• δ = intraclass correlation coefficient, or measure of
homogeneity within cluster
• m = average number of households per cluster
• Design effect increases with intraclass correlation
and cluster size
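A short sketch applying deff = 1 + δ(m − 1) to the two designs from the earlier example (8000 households in clusters of 20 or of 40). The value δ = 0.05 is a hypothetical intraclass correlation, and dividing the sample size by deff to obtain an "effective" sample size is a common convention rather than something stated on the slide.

```python
def design_effect(delta, m):
    """Approximate design effect due to clustering: 1 + delta * (m - 1)."""
    return 1 + delta * (m - 1)


delta = 0.05                      # hypothetical intraclass correlation
deff_20 = design_effect(delta, 20)
deff_40 = design_effect(delta, 40)

# Equivalent simple-random-sample size for 8000 households.
eff_20 = 8000 / deff_20
eff_40 = 8000 / deff_40
print(deff_20, deff_40, round(eff_20), round(eff_40))
```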
First Stage Selection of PSUs
• Standard methodology for MICS and other
household surveys – select EAs or clusters
systematically with PPS
• Important to sort frame before selection, in
order to ensure effective implicit stratification
• Traditional procedure – cumulate measures of
size, determine sampling interval and random
start, generate selection numbers
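The traditional procedure can be sketched as follows. This is an illustrative implementation, not the MICS template: the function name is hypothetical, the frame is assumed to be already sorted for implicit stratification, and the measures of size are made up.

```python
import random


def systematic_pps(sizes, n_psus, rng=None):
    """Systematic PPS selection from a sorted frame.

    Returns (hits, interval), where hits[i] is the number of times
    frame unit i was selected.
    """
    rng = rng or random.Random()
    total = float(sum(sizes))
    interval = total / n_psus             # sampling interval
    start = rng.random() * interval       # random start in [0, interval)
    hits = [0] * len(sizes)
    idx = 0
    upper = float(sizes[0])               # cumulated size through unit idx
    for k in range(n_psus):
        target = start + k * interval     # k-th selection number
        while target >= upper and idx < len(sizes) - 1:
            idx += 1
            upper += sizes[idx]
        hits[idx] += 1
    return hits, interval


# Hypothetical frame of six EAs with household counts as measures of size.
frame_sizes = [120, 80, 500, 60, 90, 150]
hits, interval = systematic_pps(frame_sizes, 4, random.Random(1))
print(interval, hits)
```

The third EA has a measure of size of twice the sampling interval, so it receives two "hits"; this is the large-PSU situation that needs special handling.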
Large sample PSUs in PPS sampling
• Sometimes a PSU may have a measure of size larger
than the sampling interval
• PSU may be selected more than once in the
systematic PPS selection
• Option 1 – if the PSU is selected two or more times,
multiply the number of households to be selected by
the number of “hits”
• Option 2 – separate the large PSUs and include in
sample with a probability of 1
MICS Sampling Option 1 –
new sample with household listing
• Design new MICS sample
• Two stages with census as frame
• Use of implicit stratification, systematic selection
of census EAs at first stage with pps
• List households in selected EAs/segments
• Select households systematically from listing
• Interview selected households; no replacement is
allowed
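A minimal sketch of systematic household selection from a cluster listing, assuming households are numbered 1 to N in listing order and a fractional sampling interval is used. The function name and the figures (137 listed, 20 selected) are hypothetical; the MICS Excel template may implement the details differently.

```python
import random


def select_households(n_listed, n_select, rng=None):
    """Systematic selection of household serial numbers (1-based)."""
    if n_select > n_listed:
        raise ValueError("cannot select more households than were listed")
    rng = rng or random.Random()
    interval = n_listed / n_select        # fractional sampling interval
    start = rng.random() * interval       # random start in [0, interval)
    return [min(int(start + k * interval) + 1, n_listed)
            for k in range(n_select)]


serials = select_households(137, 20, random.Random(7))
print(serials)
```

Households that cannot be interviewed are simply recorded as nonresponse; the list is not topped up with replacements.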
Sampling Option 1 - continued
• Advantages of option 1
- simple design
- probability-based
- if possible self-weighting (national level)
• Limitations of option 1
- expense of listing households
- time necessary to list households
[Example, sample size of 5000 households may require 25000 to
50000 households to be listed]
MICS Sampling Option 2 –
use an existing sample
• Design MICS as a rider to another survey if timely
and feasible
• Use sample from a previous survey and re-interview
households for MICS
• Or, use old survey sample EAs and construct new
listing of households to select for MICS
• Old sample must be probability-based, national in scope
• Possibilities – DHS, other national health survey, recent
labour force survey
• Important: design parameters must be known (such as
selection probability, stratification, etc.)
Sampling option 2 - continued
• Use of existing master sampling frame
• Some countries use master sample design for
intercensal national household surveys
• Master samples generally sufficiently large for
MICS; subsample of PSUs can be selected
• Advantage – updated maps may be available
for master sample of PSUs, and perhaps
updated listing
Sampling option 2 - continued
• Advantages of using previous sample
- cost savings
- maps available for interviewers
- appropriate sampling plan available
- simplicity
• Limitations of using old sample
- burden on respondents
- sample design may need modification
* sample size
* sub-national coverage
* number of PSUs or clusters
• Balance between loss and gain
Listing and Selection of Households
• Household listing manual is available
• Importance of new listing to represent current
• Problems with using previous listing (older
than 1 year)
– Does not represent newer households
– Distribution of sample population by age group
distorted, generally with higher median age
– Difficulty of finding households in old list
Listing and Selection of Households
• MICS recommends a separate household
listing operation
– More reliable as listing staff are less likely than
interviewers to bias the sample by excluding
households that are difficult to reach
– Allows household selection to be done in a
single central location using reliable and
uniform procedures
Listing and Selection of Households
• Household selection in the office:
– Advantages – conducted by specialized staff,
possible to avoid selection bias in the field,
possible to control overall sample size
– Disadvantage – increased costs from having two
field visits
• Selection in the field: use household selection table
– Advantage – cost savings of having one integrated
field operation
– Disadvantage - correct sampling may be difficult
for field staff, selection may be biased
Listing and Selection of Households
• Excel template for automatically generating
the sample of households based on the
number of households listed (see spreadsheet)
• Common problems found in listing operations
– Problem with quality of sketch maps – difficult to
determine segment boundaries
– Sometimes large differences found between
number of households in frame (census) and
number listed.
Sampling strategy for low fertility
• In MICS 4 and 5, some low fertility countries
are using second-stage stratification of listing
by households with and without children
under 5
• Higher sampling rate used for households
with children
• Increases number of households with children
in MICS sample, and therefore number of
sample children
Sampling strategy for low fertility
countries (continued)
• Improves the reliability of the child indicators
without increasing the sample size to a very high level
• This procedure also increases the variability in the
weights and the design effects for the overall sample
• Important to avoid very large variability in the
weights for households with and without children
– Differential weights between households with and without
children generally should not exceed a factor of about 4
Implications of sampling strategy
on sample size calculations
• One parameter in the sample size calculation
template is the proportion of the target subpopulation
in the total population (pb)
• Using a higher sampling rate for households with
children increases the proportion of children under 5
in the sample
• The proportion of children under 5 (or smaller age
groups) should be multiplied by a factor that reflects
the increase in sample households with children
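One way to derive such a factor, offered as a sketch rather than the template's own method: if households with children are sampled at k times the rate of households without children, and a share p of listed households have children, then the share of households with children in the sample is k·p / (k·p + 1 − p). The factor below is that share divided by p, assuming the average number of children per household with children is unchanged. All names and values are hypothetical.

```python
def child_share_factor(p_with_children, rate_ratio):
    """Factor by which children per sample household increase.

    p_with_children: share of listed households with children under 5
    rate_ratio: sampling rate for households with children divided by
                the rate for households without children
    """
    p = p_with_children
    return rate_ratio / (rate_ratio * p + (1 - p))


# Hypothetical: 25% of households have children under 5, sampled at 3x the rate.
factor = child_share_factor(0.25, 3)
adjusted_pb = 0.07 * factor      # hypothetical pb of 7% for children under 5
print(factor, adjusted_pb)
```

Under this derivation the rate ratio k is also the ratio between the weights of the two household groups within a cluster, so keeping k at or below about 4 respects the limit on differential weights.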
Implications of sampling strategy
on weighting procedures
• Under normal MICS sample design, weights
vary by sample cluster
• With second stage stratification by
households with and without children, two
weights need to be calculated for each
cluster: for households with and without children
Survey weighting procedures
• Survey data collected using a complex design
featuring clustering, unequal probabilities of
selection and stratification:
– All analyses must apply survey weights in order to
prevent biased results
• Formulas for calculating weights depend on
the exact sample design used in each country
• MICS has 4 sets of weights: households,
women, men and children
Survey weighting procedures
• Components of MICS survey weights:
– Design weight: inverse of the final probability of
selection for households
– Adjustment factors for nonresponse (cluster,
household, woman, child level)
• Weights are normalized so that the total weighted
number of observations is equal to the total
unweighted number (sample size)
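The components above can be sketched for household weights in a two-stage design. This is an illustration under stated assumptions: the nonresponse adjustment is taken as selected divided by completed households within each cluster, the field names are hypothetical, and the probabilities and counts are made up. The actual formulas depend on each country's design.

```python
def household_weights(clusters):
    """Normalized household weight for each cluster.

    Each cluster dict holds p1 (PSU selection probability), p2 (household
    selection probability within the PSU), and the numbers of selected and
    completed household interviews.
    """
    raw = []
    for c in clusters:
        design = 1.0 / (c["p1"] * c["p2"])          # inverse selection probability
        adjust = c["selected"] / c["completed"]     # nonresponse adjustment
        raw.append(design * adjust)
    n_completed = sum(c["completed"] for c in clusters)
    weighted_total = sum(w * c["completed"] for w, c in zip(raw, clusters))
    scale = n_completed / weighted_total            # normalization constant
    return [w * scale for w in raw]


clusters = [
    {"p1": 0.02, "p2": 0.10, "selected": 20, "completed": 18},
    {"p1": 0.05, "p2": 0.08, "selected": 20, "completed": 20},
    {"p1": 0.01, "p2": 0.25, "selected": 20, "completed": 16},
]
weights = household_weights(clusters)
print([round(w, 3) for w in weights])
```

After normalization the weights sum, over completed households, to the unweighted number of completed households.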
Survey weighting procedures
Sampling Error Estimation
• Necessary to evaluate reliability of survey estimates
• Possible only when probability sampling is used
• Should be done for 30-50 important indicators
• Methodology is complex and design-specific
• Several software packages:
– SPSS Complex Samples module – used in MICS
– SAS, Stata, SUDAAN, Clusters, WesVar, CENVAR,
PCCarp, etc.
• Standard error, confidence intervals and DEFF
Sampling Error Estimation
SPSS Complex Samples module
• Advantages:
– Simple to use
– Template syntax available for standard indicators
– Supported by MICS Global and Regional staff
• Steps:
– Set up sampling parameter specifications file
– Define variables for stratum, PSU and weight
Sampling Error Estimation
SPSS Complex Samples module
• Stratum should be lowest level of explicit
stratification (for example, province, urban/rural)
• Necessary to have minimum of two sample
PSUs per stratum
Reducing bias
• Accuracy of survey results depends on both variance and bias
(mostly from nonsampling errors)
• Bias should be minimized with quality control for all survey operations
• Basic data quality determined during enumeration
– Important to have good training and supervision in the field
• Data capture should include 100% or sample verification
• Important to have quality control for editing and coding
• Computer consistency and range checks