Workshop on MDG Monitoring
Kampala, 5-8 May 2008
MDG data at the sub-national level:
relevance, challenges and IAEG
recommendations
United Nations Statistics Division
Contents
● IAEG recommendations
● Relevance and challenges of sub-national
data
● Examples
● Data sources
● Combining data sources
● GIS
● Conclusion
IAEG recommendations
● The Inter-agency and Expert Group Meeting on
MDG Indicators
– Recognized that sub-national data are needed for
showing differences within countries and for helping
countries to better allocate their resources.
– In order to improve the availability of reliable sub-national
data, recommended the following:
• To draw up recommendations regarding the use of
censuses to localize the MDGs as well as the use of
small area estimation when data are not available;
• To investigate the availability of sub-national data.
Sub-national data
Relevant
● Key to identify disparities
within the country;
discrepancies can be
substantial (e.g. urban/rural)
● Helps countries to better
allocate their resources;
● Permits identifying areas
which should be prioritized in
policy interventions.
Challenging
● More resources needed
– Statistical capacity
– Cost
● Methodological difficulties
– Sample design
– Variability of estimates
Sub-national data – Example 1
Relevant
● Literacy total population: 93.3%
– National average masks variation within the country
– Population density drives the national average
Sub-national data – Example 2
Challenging for health
indicators (deaths,
disease cases)
● Areas with very high rates are very close to areas with
very small rates
● Are these dramatic contrasts real?
Sub-national data – Example 2
Counting of random events (like deaths, disease
cases)
● Observations behave like Poisson random variables because they are
counts of random events:
Nr infant deaths ~ Poisson(Nr births × Infant mortality rate)
● Feature of the Poisson distribution: mean = variance. This
implies that variance(nr infant deaths / nr births) is inversely
proportional to the number of births.
● Thus, the lower the number of births, the higher the variability
of the infant mortality rate.
● Statistical artifact: areas with the smallest numbers of births
are those with the lowest and highest infant mortality rates.
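The variance argument above can be checked with a small simulation. This is only a sketch under stated assumptions: every area is given the same true infant mortality rate (5 per 1000 births, a value chosen for illustration, not taken from the slides), and the helper names are hypothetical. It compares the simulated spread of the observed rate with the theoretical value sqrt(rate / births).

```python
import math
import random
import statistics


def poisson_draw(lam, rng):
    """Draw one Poisson(lam) count with Knuth's method (fine for small lam)."""
    limit = math.exp(-lam)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def theoretical_rate_sd(births, rate):
    """SD of deaths/births when deaths ~ Poisson(births * rate)."""
    # Var(deaths) = births * rate, so Var(deaths / births) = rate / births.
    return math.sqrt(rate / births)


def simulated_rate_sd(births, rate, n_sim, rng):
    """SD of the observed rate across n_sim simulated areas of equal size."""
    rates = [poisson_draw(births * rate, rng) / births for _ in range(n_sim)]
    return statistics.pstdev(rates)


rng = random.Random(2008)  # fixed seed so the run is repeatable
TRUE_RATE = 0.005          # assumed: 5 infant deaths per 1000 births everywhere
for births in (100, 1000, 8000):
    sim = simulated_rate_sd(births, TRUE_RATE, 2000, rng)
    theory = theoretical_rate_sd(births, TRUE_RATE)
    print(f"births={births:5d}  simulated SD={sim:.5f}  theoretical SD={theory:.5f}")
```

Even though the true rate is identical in every simulated area, the areas with few births show a much wider spread of observed rates, which is the artifact described above.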
Sub-national data – Example 2
Infant mortality in Portuguese counties
The lower the number of births the higher
the variability of the infant mortality rate
[Figure: infant mortality rate (vertical axis, 0 to 30) plotted against number of births (horizontal axis, 0 to 8000)]
Policy makers should be aware of this
statistical artifact
Sub-national data – Example 2
How to cope with this statistical artifact
● Aggregate areas with very small numbers of births,
so that all areas have approximately the same
number of births.
● Use smoothing methods, which produce estimates
for small areas taking into account the Poisson
variability.
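As an illustration of the second option, the sketch below shrinks each area's raw rate towards the overall rate, with more shrinkage where births are few. It is one simple method-of-moments shrinkage estimator in the spirit of empirical Bayes smoothing, not necessarily the method the presenters had in mind; the function name and the example counts are hypothetical.

```python
def smooth_rates(deaths, births):
    """Shrink each area's raw rate towards the overall rate.

    Areas with few births are pulled strongly towards the overall rate;
    areas with many births keep a value close to their raw rate.
    """
    total_births = sum(births)
    overall = sum(deaths) / total_births
    raw = [d / n for d, n in zip(deaths, births)]
    mean_births = total_births / len(births)

    # Birth-weighted variance of the raw rates, minus the part expected
    # from Poisson noise alone (overall / mean_births); floored at zero.
    weighted_var = sum(
        n * (r - overall) ** 2 for n, r in zip(births, raw)
    ) / total_births
    between_area_var = max(weighted_var - overall / mean_births, 0.0)

    smoothed = []
    for n, r in zip(births, raw):
        noise_var = overall / n  # Poisson variance of this area's raw rate
        denom = between_area_var + noise_var
        weight = between_area_var / denom if denom > 0 else 0.0
        smoothed.append(overall + weight * (r - overall))
    return smoothed


# Hypothetical counts for four areas (illustrative only).
example_births = [50, 200, 5000, 8000]
example_deaths = [3, 0, 25, 48]
for n, d, s in zip(example_births, example_deaths,
                   smooth_rates(example_deaths, example_births)):
    print(f"births={n:5d}  raw rate={d / n:.4f}  smoothed rate={s:.4f}")
```

The design choice is that the weight given to an area's own rate grows with its number of births, so extreme rates from very small areas no longer dominate a map.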
Data sources
● Censuses
– Universal coverage makes it possible to obtain data for very small
areas (as long as confidentiality is not compromised);
● Administrative records
– Sometimes have close to universal coverage (e.g. civil
registration);
● Surveys
– Larger sample sizes are required to provide estimates for
small areas (cost can be prohibitive). Defining the small areas
needed before the survey is essential if sample sizes are not
too large.
Sub-national data
Census
Pros
● Universal coverage makes it possible to obtain data for very small
areas (as long as confidentiality is not compromised).
Cons
● Infrequent (usually, every ten years);
● Few topics covered and with little detail;
● Costly.
Surveys
Pros
● Topics covered in more detail;
● More frequent;
● Less expensive.
Cons
● Cannot be used for small areas unless the sample sizes are large
and planning of small areas is done in advance of survey taking.
Combining data from censuses and surveys
Census
Pros
● Universal coverage makes it possible to obtain data for very small
areas (as long as confidentiality is not compromised).
Surveys
Pros
● Topics covered in more detail;
● More frequent;
● Less expensive.
Combining data sources - Example
Poverty maps
● Use survey data to:
– Fit a regression model of the logarithm of household
consumption/income on independent variables which
are common to the census and the survey (national
level).
● Use census data to:
– Apply the model above to predict, for each small area, the
logarithm of household consumption/income from the
independent variables which are common to the census
and the survey.
Combining data sources - Example
Use survey data to:
● Estimate a and b in the model:
Income/consumption = a + b f(x) + e,
where x are common variables
between census and survey which
are good predictors of income.
Combining data sources - Example
Use census data to:
● Predict income/consumption for
each small area, using the
estimated values of a and b and
the model:
Income/consumption = a + b f(x) + e
[Diagram: census data → estimate of household income/consumption for small area]
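The two steps (estimate a and b from the survey, then predict from the census) can be sketched as follows. This is a minimal illustration under strong assumptions: a single common variable x, f(x) = x, ordinary least squares, and the error term e ignored at prediction time. The function names and all data values are hypothetical; actual poverty mapping exercises use many predictors and treat the error term more carefully.

```python
from collections import defaultdict


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def predict_by_area(census_rows, a, b):
    """Average predicted value per small area.

    census_rows is an iterable of (area_id, x) pairs, one per household.
    """
    predictions = defaultdict(list)
    for area_id, x in census_rows:
        predictions[area_id].append(a + b * x)
    return {area: sum(vals) / len(vals) for area, vals in predictions.items()}


# Step 1: survey households (hypothetical): x = years of schooling of the
# household head, y = log household consumption.
survey_x = [0, 2, 4, 6, 8, 10]
survey_y = [1.05, 1.18, 1.42, 1.57, 1.83, 1.98]
a_hat, b_hat = fit_line(survey_x, survey_y)

# Step 2: census households (hypothetical), which record x but not y.
census = [("area_A", 1), ("area_A", 3), ("area_B", 7), ("area_B", 9)]
for area, value in sorted(predict_by_area(census, a_hat, b_hat).items()):
    print(f"{area}: predicted log consumption = {value:.3f}")
```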
Combining data from censuses and surveys
● Can provide small area estimates for topics not
included in the census.
However,
● There may be lack of consistency between the
definitions used in the surveys and those used in the
censuses. The impact of this should be carefully
assessed.
● Censuses and surveys may not be synchronized: they
may be conducted at periods quite distant in time.
Geographic information systems
● In order to present the disaggregated information on
maps, one needs to have some kind of geographic
location coordinate for each observation.
● Geographic information systems (GISs) are useful
computer software programs to handle
geographically referenced data as they use
geographic location as a reference for each
database record.
Geographic information systems
● These systems are used to integrate information
from:
– Very different sources (e.g. surveys, census,
administrative data, satellite images, etc.) into a single
platform, where each observation is matched with the
identifier of the area it covers.
– Data observed at different levels. For instance, poverty
status might be observed at the district level while
climate is recorded at the level of agro-climatic zones.
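The idea of matching each observation to an area identifier, and of linking data observed at different levels, can be sketched without any GIS software. The identifiers, values and the district-to-zone lookup below are all hypothetical; in a real GIS the link between districts and agro-climatic zones would typically come from overlaying the two boundary layers rather than from a hand-written table.

```python
# Hypothetical district-level poverty rates (e.g. from a poverty map).
poverty_by_district = {"D01": 0.42, "D02": 0.31, "D03": 0.55}

# Hypothetical lookup: which agro-climatic zone each district falls in.
district_to_zone = {"D01": "Z1", "D02": "Z1", "D03": "Z2"}

# Hypothetical zone-level climate data (annual rainfall in mm).
rainfall_by_zone = {"Z1": 900, "Z2": 450}


def integrate(poverty, zone_lookup, rainfall):
    """Build one record per district combining both data sources."""
    records = {}
    for district, poverty_rate in poverty.items():
        zone = zone_lookup.get(district)  # None if the district is unmatched
        records[district] = {
            "poverty_rate": poverty_rate,
            "zone": zone,
            "rainfall_mm": rainfall.get(zone),  # None if no climate data
        }
    return records


for district, record in sorted(
    integrate(poverty_by_district, district_to_zone, rainfall_by_zone).items()
):
    print(district, record)
```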
Conclusions
● Small area estimates may be costly to produce from surveys.
They require larger sample sizes and planning of small
areas prior to survey taking.
● Combining surveys with other data sources with universal
coverage (like censuses) may be an option. Administrative
sources can also be useful if they have good coverage.
● Maps with small area statistics can be misleading for health
indicators such as deaths and number of disease cases. High
and low rates may be a consequence of areas with small
populations.
● GIS is a useful tool to handle geographic data and to produce
small area estimates.
THANKS!