Creating a collection of standardized datasets on household consumption Olivier Dupriez World Bank, Development Data Group odupriez@worldbank.org 6 June 2013


Initial objective
• Calculate poverty PPPs
• Price data were available at the basic heading level from the ICP; consumption shares “at the poverty line” were needed for the same breakdown, to be used as weights.
• See: A. Deaton and O. Dupriez, “Purchasing power parity exchange rates for the global poor,” American Economic Journal: Applied Economics, vol. 3, pp. 137-166 (2011); see also “Global Poverty and Global Price Indexes”
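The slide says poverty-line consumption shares serve as weights for the ICP basic-heading prices but does not name the index formula. As a minimal sketch, assuming a bilateral Törnqvist index (a standard share-weighted formula, not necessarily the one used in the cited paper), the weights would enter like this; the heading names and numbers are hypothetical.

```python
import math


def tornqvist_index(prices_a, prices_b, shares_a, shares_b):
    """Törnqvist price index for country b relative to country a.

    prices_a, prices_b: price of each basic heading, keyed by heading.
    shares_a, shares_b: budget shares of each basic heading (here assumed to be
        shares of households near the poverty line); each set sums to 1.
    """
    log_index = 0.0
    for heading, price_a in prices_a.items():
        # Weight each log price relative by the average of the two countries' shares.
        weight = 0.5 * (shares_a[heading] + shares_b[heading])
        log_index += weight * math.log(prices_b[heading] / price_a)
    return math.exp(log_index)


# Hypothetical two-heading example: rice twice as expensive in b, cloth unchanged.
prices_a = {"rice": 1.0, "cloth": 4.0}
prices_b = {"rice": 2.0, "cloth": 4.0}
shares_a = {"rice": 0.7, "cloth": 0.3}
shares_b = {"rice": 0.6, "cloth": 0.4}
print(tornqvist_index(prices_a, prices_b, shares_a, shares_b))  # 2 ** 0.65, about 1.569
```

Because the poor spend a larger share on food, using their shares instead of national-accounts shares gives food prices more weight in the resulting PPP.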
Intermediate output – data files
• A collection of “standard” files
– Individual level: age, sex
– Household level: region, total expenditure (before and after fixing outliers), adult equivalents, household size, etc.
– Household + product level:
• Product code (original as in questionnaire, with labels) and
COICOP code
• Value purchased, home produced, received, total
• Deflated (when available) / non deflated
• NO information on quantities
– The format and structure of the data files are standard; their content is less so
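To make the household + product file concrete, here is a sketch of one record as a Python dataclass. The field names are hypothetical, and treating the total as the sum of purchased, home-produced and received values is an assumption; the slide only lists the four value columns.

```python
from dataclasses import dataclass


@dataclass
class HouseholdProductRecord:
    """One row of the household + product file (field names are hypothetical)."""

    household_id: str
    product_code: str        # original code, as in the questionnaire
    product_label: str       # original label from the questionnaire
    coicop_code: str         # COICOP basic heading the item is mapped to
    value_purchased: float = 0.0
    value_home_produced: float = 0.0
    value_received: float = 0.0
    deflated: bool = False   # True when the values have been deflated
    # Deliberately no quantity fields: the files carry values only.

    @property
    def value_total(self) -> float:
        # Assumption: total = purchased + home produced + received.
        return self.value_purchased + self.value_home_produced + self.value_received


row = HouseholdProductRecord("hh001", "0101", "Rice", "110111", 100.0, 50.0, 25.0)
print(row.value_total)  # 175.0
```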
Multiple uses and users
• Many potential applications
– IFC “Business Opportunities at the Base of the
Pyramid”
– Micro-macro modeling
– Poverty/inequality analysis
– Assessment of reliability and relevance of surveys
• E.g., for each survey, list all health-related items with the percentage of respondents reporting them
• E.g., list all categories not covered by the questionnaires
– And many more
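The first assessment example (health items with the percentage of respondents, per survey) reduces to counting households with a positive value for each item. A minimal sketch, assuming the household + product data are available as `(household_id, item_code, value)` tuples; the function name and item codes are hypothetical.

```python
from collections import defaultdict


def reporting_rates(records, n_households, items):
    """Percentage of households reporting a positive value for each item.

    records: iterable of (household_id, item_code, value) tuples
    n_households: number of households interviewed in the survey
    items: item codes of interest (e.g., those mapped to health headings)
    """
    reporters = defaultdict(set)
    for household_id, item_code, value in records:
        if item_code in items and value > 0:
            reporters[item_code].add(household_id)  # count each household once
    return {item: 100.0 * len(reporters[item]) / n_households for item in items}


records = [("h1", "med", 10), ("h2", "med", 0), ("h2", "doc", 5), ("h3", "med", 3)]
print(reporting_rates(records, 4, {"med", "doc", "dentist"}))
```

An item with a 0% rate (here the hypothetical `"dentist"`) is listed in the questionnaire but never reported, which is a different problem from a category that is absent from the questionnaire altogether.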
Method
• Use household consumption/expenditure surveys
– A VERY diverse set of surveys (HBS, LSMS, HIES, etc.)
– Ex-post harmonization has limits
• Map all products and services to COICOP
– From 6,000+ items in the Brazil survey to fewer than 50 in other countries…
• Annualize values by product/service and household
• Fix outliers
• No attempt to fill gaps (no imputation of values
for missing products/services)
• Generate the 3 standard files
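The annualization step is not detailed in the deck. A minimal sketch, assuming each reported value comes with a recall period that is scaled to one year and then summed by household and COICOP heading; the recall-period labels and factors below are hypothetical and would differ by survey.

```python
from collections import defaultdict

# Hypothetical recall periods and the factors that scale them to one year.
ANNUAL_FACTORS = {
    "7_days": 365 / 7,
    "30_days": 365 / 30,
    "1_month": 12,
    "12_months": 1,
}


def annualize(value, recall_period):
    """Scale a value reported over a recall period to an annual value."""
    return value * ANNUAL_FACTORS[recall_period]


def annual_values(rows):
    """Sum annualized values by (household, COICOP heading).

    rows: iterable of (household_id, coicop_code, value, recall_period).
    Several questionnaire items, possibly with different recall periods, can map
    to the same heading, so their annualized values are added together.
    """
    totals = defaultdict(float)
    for household_id, coicop_code, value, recall_period in rows:
        totals[(household_id, coicop_code)] += annualize(value, recall_period)
    return dict(totals)


rows = [("h1", "110111", 10, "1_month"), ("h1", "110111", 5, "12_months")]
print(annual_values(rows))  # {('h1', '110111'): 125.0}
```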
Principle – Full replicability
• One single Stata program per survey
– Calls one “generic” program to detect and fix
outliers
• Controlled vocabulary for file and folder names
• Survey ID linking each dataset to the online metadata catalog
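The deck does not spell out the controlled vocabulary. As an illustration only, assuming survey IDs of the form `COUNTRY_YEAR_SURVEY_vNN_M` (e.g. `BGD_2000_HIES_v01_M`) and hypothetical suffixes for the three standard files, a validator that refuses non-conforming names could look like this:

```python
import re

# Assumed ID pattern: ISO3 country code, year, survey acronym, version, "M".
# This is a hypothetical convention, not necessarily the catalog's actual rule.
SURVEY_ID_PATTERN = re.compile(
    r"(?P<country>[A-Z]{3})_(?P<year>\d{4})_(?P<survey>[A-Z0-9-]+)_v(?P<version>\d{2})_M"
)

# Hypothetical suffixes for the individual, household and household + product files.
FILE_LEVELS = {"IND", "HH", "HHPROD"}


def parse_survey_id(survey_id):
    """Split a survey ID into its components, rejecting non-conforming IDs."""
    match = SURVEY_ID_PATTERN.fullmatch(survey_id)
    if match is None:
        raise ValueError(f"non-conforming survey ID: {survey_id!r}")
    return match.groupdict()


def standard_file_name(survey_id, level):
    """Build a Stata (.dta) file name from the survey ID and the file level."""
    if level not in FILE_LEVELS:
        raise ValueError(f"unknown file level: {level!r}")
    parse_survey_id(survey_id)  # validate before building the name
    return f"{survey_id}_{level}.dta"


print(standard_file_name("BGD_2000_HIES_v01_M", "HHPROD"))  # BGD_2000_HIES_v01_M_HHPROD.dta
```

Deriving every file name from the survey ID keeps the link to the metadata catalog mechanical, which supports the replicability principle.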
Mapping to COICOP
• ICP/COICOP: 110 basic headings for household
consumption
• 105 are relevant for household surveys
• Situations:
– Many to one (e.g., long list of vegetables)
– One to one
– One to many (lack of detail in questionnaire)
– No data to one (questionnaire missed items)
Grouped categories
• One to many: items in questionnaires are not always detailed enough to be mapped to a single COICOP basic heading
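Given a mapping from questionnaire items to COICOP basic headings, the four situations can be detected mechanically. A minimal sketch, assuming the mapping is stored as a dict from item to a set of headings; the function name and the toy codes are hypothetical.

```python
from collections import Counter


def classify_mapping(item_to_headings, all_headings):
    """Classify a questionnaire-to-COICOP mapping into the four situations.

    item_to_headings: {questionnaire item: set of basic headings it maps to}
    all_headings: the basic headings relevant for household surveys
    """
    items_per_heading = Counter(
        heading for headings in item_to_headings.values() for heading in headings
    )
    result = {"many_to_one": set(), "one_to_one": set(), "one_to_many": set()}
    for item, headings in item_to_headings.items():
        if len(headings) > 1:
            result["one_to_many"].add(item)      # grouped item, needs splitting
        elif len(headings) == 1:
            (heading,) = headings
            if items_per_heading[heading] > 1:
                result["many_to_one"].add(item)  # shares its heading with other items
            else:
                result["one_to_one"].add(item)
    # Headings that no questionnaire item maps to ("no data to one").
    result["no_data"] = {h for h in all_headings if items_per_heading[h] == 0}
    return result


mapping = {"tomatoes": {"A"}, "onions": {"A"}, "rice": {"B"}, "meat": {"C", "D"}}
print(classify_mapping(mapping, {"A", "B", "C", "D", "E"}))
```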
Missing categories
• No questionnaire was found that covers all 105 categories of products and services
• On average, N basic headings missing
– Sometimes for known reasons (e.g., pork in Muslim countries)
– But questionnaire design needs improvement in
all countries
Splitting grouped categories
• The breakdown from the national accounts (data obtained from the ICP) was used to split grouped categories
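One natural reading of “used breakdown from national accounts” is a proportional split: a value reported for a grouped item is allocated across its basic headings in proportion to national-accounts expenditure on those headings. A minimal sketch under that assumption; the equal-split fallback when national accounts show no expenditure is also an assumption.

```python
def split_grouped_value(value, headings, na_values):
    """Split a value reported for a grouped item across its basic headings.

    value: household value reported for the grouped questionnaire item
    headings: basic headings the grouped item covers
    na_values: national-accounts expenditure by basic heading (from the ICP)
    """
    total_na = sum(na_values[h] for h in headings)
    if total_na <= 0:
        # Assumed fallback: equal split when national accounts give no guidance.
        return {h: value / len(headings) for h in headings}
    return {h: value * na_values[h] / total_na for h in headings}


print(split_grouped_value(100.0, ["C", "D"], {"C": 300.0, "D": 100.0}))  # {'C': 75.0, 'D': 25.0}
```

The same national-accounts proportions are applied to every household, so the split cannot reflect differences in tastes between rich and poor households.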
Correlation between SNA and surveys
• From almost perfect (very few cases) to very
low (many countries)
[Figure: scatter plots for Bangladesh 2000 and Thailand 2002, Survey plotted against National accounts]
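The correlation on this slide can be computed from expenditure totals by basic heading in each source. A minimal sketch, assuming both sources are converted to shares over the headings they have in common before taking a Pearson correlation; the restriction to common headings is an assumption.

```python
import math


def budget_shares(totals, headings):
    """Convert expenditure totals into shares over the given headings."""
    grand_total = sum(totals[h] for h in headings)
    return [totals[h] / grand_total for h in headings]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def survey_na_correlation(survey_totals, na_totals):
    """Correlation of basic-heading shares between a survey and national accounts."""
    headings = sorted(set(survey_totals) & set(na_totals))  # headings in both sources
    return pearson(budget_shares(survey_totals, headings), budget_shares(na_totals, headings))
```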
Annualization challenges
• Some problematic items:
– Durables (use value/expenditure)
– Imputed rents
– Out of pocket health expenditure
– Ceremonies, etc.
– Food away from home
• Validation: compare with official estimates when available, and with PovCal aggregates
– Results never replicate them exactly
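For durables, the slide contrasts use value with expenditure without saying how use value is estimated. One common approach (the user-cost method, not necessarily the one used for this collection) values the annual service flow as the current value times the sum of a real interest rate and a depreciation rate, with depreciation inferred from purchase value, current value and age. A minimal sketch under those assumptions:

```python
def depreciation_rate(purchase_value, current_value, age_years):
    """Average annual depreciation rate implied by purchase and current values."""
    return 1 - (current_value / purchase_value) ** (1 / age_years)


def durable_use_value(current_value, real_interest_rate, annual_depreciation):
    """Annual service flow (user cost) of a durable good."""
    return current_value * (real_interest_rate + annual_depreciation)


# Hypothetical item: bought for 1000 two years ago, now worth 810.
delta = depreciation_rate(1000.0, 810.0, 2)       # about 0.10
print(durable_use_value(810.0, 0.05, delta))      # about 121.5
```

The result is sensitive to the assumed interest rate and to respondents' estimates of current value, which is one reason durables are listed among the problematic items.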
Detecting and fixing outliers
• Top outliers only
• Tried multiple options
• Based on per capita or per household values, depending on the item
• Threshold: 75th percentile + 5 times the interquartile range
• Replace outliers with the maximum valid value (zero values not included in calculations)
• If a household is an outlier for multiple items, consider it “rich” and do not fix it
• Would deserve a dedicated research project
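The rules above can be sketched as a two-pass procedure: first flag top outliers item by item, then cap them unless the household is flagged on several items. This is an illustration, not the deck's Stata program. Assumptions: values are already per capita or per household as appropriate, quartiles use `statistics.quantiles` with `method="inclusive"`, and “multiple items” means two or more (the hypothetical `rich_threshold` parameter).

```python
from collections import Counter
from statistics import quantiles


def fix_top_outliers(values_by_item, multiplier=5.0, rich_threshold=2):
    """Cap top outliers item by item; return corrected copies of the values.

    values_by_item: {item: {household_id: value}}
    multiplier: an outlier is a value above Q3 + multiplier * IQR.
    rich_threshold: households flagged on at least this many items are treated
        as genuinely "rich" and left unchanged (hypothetical setting).
    """
    thresholds, flagged = {}, {}
    for item, by_hh in values_by_item.items():
        positive = [v for v in by_hh.values() if v > 0]  # zeros excluded
        if len(positive) < 2:
            continue  # too few observations to compute quartiles
        q1, _, q3 = quantiles(positive, n=4, method="inclusive")
        thresholds[item] = q3 + multiplier * (q3 - q1)
        flagged[item] = {hh for hh, v in by_hh.items() if v > thresholds[item]}

    # Number of items on which each household is an outlier.
    flags_per_hh = Counter(hh for hhs in flagged.values() for hh in hhs)

    fixed = {}
    for item, by_hh in values_by_item.items():
        fixed[item] = dict(by_hh)  # leave the input untouched
        if item not in thresholds:
            continue
        # Maximum valid value: largest positive value not above the threshold.
        cap = max(v for v in by_hh.values() if 0 < v <= thresholds[item])
        for hh in flagged[item]:
            if flags_per_hh[hh] < rich_threshold:
                fixed[item][hh] = cap
    return fixed
```

Only top outliers are handled, as on the slide; very low values are never flagged.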
Outlier fixing – Significant impact
• Example: change in Gini coefficients
http://datavizint.worldbank.org/t/DECDG/views/GiniAnalyses/Ginis?:embed=y&:display_count=no
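The impact of outlier fixing can be measured by computing the Gini coefficient of household consumption before and after the fix. A minimal sketch using the standard sorted-rank formula G = 2·Σ i·x₍ᵢ₎ / (n·Σx) − (n + 1)/n; it ignores sampling weights and household size, which real survey estimates would need.

```python
def gini(values):
    """Unweighted Gini coefficient of non-negative values (sorted-rank formula)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    # Sum of rank * value, with ranks 1..n in ascending order of value.
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


before = [10.0, 12.0, 15.0, 1000.0]  # hypothetical totals, one extreme value
after = [10.0, 12.0, 15.0, 15.0]     # same households after capping
print(gini(before), gini(after))
```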
Past and future
• 160 datasets “standardized” – 90+ low- and middle-income countries
• Many more survey datasets available at WB;
could expand and update the collection if
resources are available
• Conduct in-depth research work on outliers and
formulate recommendations to countries
• Feedback to countries on issues in questionnaire
design
• Dissemination of microdata?
