Creating a collection of standardized datasets on household consumption
Olivier Dupriez
World Bank, Development Data Group
[email protected]
6 June 2013

Initial objective
• Calculate poverty PPPs
• Had price data at basic heading level from the ICP; needed consumption shares “at the poverty line” for the same breakdown, to be used as weights
• See: A. Deaton and O. Dupriez, Purchasing power parity exchange rates for the global poor, American Economic Journal: Applied Economics, vol. 3, pp. 137-166 (2011), and also Global Poverty and Global Price Indexes

Intermediary output – data files
• A collection of “standard” files
  – Individual level: age, sex
  – Household level: region, total expenditure (before and after fixing outliers), adult equivalents, household size, etc.
  – Household + product level:
    • Product code (original as in questionnaire, with labels) and COICOP code
    • Value purchased, home produced, received, total
    • Deflated (when available) / non-deflated
    • NO information on quantities
  – Format/structure of the data files is standard; content not so much

Multiple uses and users
• Many potential applications
  – IFC “Business Opportunities at the Base of the Pyramid”
  – Micro-macro modeling
  – Poverty/inequality analysis
  – Assessment of reliability and relevance of surveys
    • E.g., list all items related to health with percentage of respondents, for each survey
    • E.g., list all categories not covered by questionnaires
  – And many more

Method
• Use household consumption/expenditure surveys
  – A VERY diverse set of surveys (HBS, LSMS, HIES, etc.)
  – Ex-post harmonization has limits
• Map all products and services to COICOP
  – From 6000+ items in the Brazil survey to fewer than 50 in other countries…
• Annualize values by product/service and household
• Fix outliers
• No attempt to fill gaps (no imputation of values for missing products/services)
• Generate the 3 standard files

Principle – Full replicability
• One single Stata program per survey
  – Calls one “generic” program to detect and fix outliers
• Controlled vocabulary for file names, folder names
• Survey ID to link to on-line metadata catalog

Mapping to COICOP
• ICP/COICOP: 110 basic headings for household consumption
• 105 are relevant for household surveys
• Situations:
  – Many to one (e.g., long list of vegetables)
  – One to one
  – One to many (lack of detail in questionnaire)
  – No data to one (questionnaire missed items)

Grouped categories
• One to many: items in questionnaires are not always detailed enough to be mapped to one single COICOP basic heading

Missing categories
• No questionnaire found to cover all 105 categories of products and services
• On average, N basic headings missing
  – Sometimes for known reasons (e.g., pork in Muslim countries)
  – But questionnaire design needs improvement in all countries

Splitting grouped categories
• Used breakdown from national accounts to split grouped categories (data obtained from ICP)

Correlation between SNA and surveys
• From almost perfect (very few cases) to very low (many countries)
[Figure: two panels, Bangladesh 2000 and Thailand 2002; x-axis: National accounts; y-axis: Survey]

Annualization challenges
• Some problematic items:
  – Durables (use value/expenditure)
  – Imputed rents
  – Out-of-pocket health expenditure
  – Ceremonies, etc.
  – Food away from home
• Validation: compare with official estimates when available, and with PovCal aggregates
  – Never replicate exactly

Detecting and fixing outliers
• Top outliers only
• Tried multiple options
• Based on per capita or per household depending on item
• 75th percentile + 5 times interquartile range
• Replace with maximum valid value (zero values not included in calculations)
• If outlier for multiple items, consider “rich” household and do not fix
• Would deserve a specific research project

Outliers fixing – Significant impact
• Example: change in Ginis
  http://datavizint.worldbank.org/t/DECDG/views/GiniAnalyses/Ginis?:embed=y&:display_count=no

Past and future
• 160 datasets “standardized”
  – 90+ low- and middle-income countries
• Many more survey datasets available at WB; could expand and update the collection if resources are available
• Conduct in-depth research work on outliers and formulate recommendations to countries
• Feedback to countries on issues in questionnaire design
• Dissemination of microdata?
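
Illustration – outlier rule
The rule on the “Detecting and fixing outliers” slide (75th percentile + 5 times interquartile range, zeros excluded, replacement by the maximum valid value) can be sketched as below. This is a minimal Python sketch, not the generic Stata program used for the collection. The function names, the default `k=5.0` parameter, and the linear-interpolation percentile are assumptions made for illustration; statistical packages define percentiles slightly differently, so thresholds may not match exactly.

```python
from typing import List


def percentile(sorted_vals: List[float], p: float) -> float:
    """Percentile by linear interpolation between closest ranks.

    Assumption: this interpolation rule is chosen for simplicity only.
    """
    if not sorted_vals:
        raise ValueError("percentile() needs at least one value")
    pos = (len(sorted_vals) - 1) * p / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def upper_fence(values: List[float], k: float = 5.0) -> float:
    """Return P75 + k * IQR, computed on strictly positive values only."""
    positive = sorted(v for v in values if v > 0)
    q1 = percentile(positive, 25)
    q3 = percentile(positive, 75)
    return q3 + k * (q3 - q1)


def fix_top_outliers(values: List[float], k: float = 5.0) -> List[float]:
    """Replace values above the fence with the largest value at or below it.

    Only top outliers are treated; zeros and low values are left untouched.
    """
    positive = [v for v in values if v > 0]
    if len(positive) < 2:
        return list(values)  # too few positive values to compute a fence
    fence = upper_fence(values, k)
    valid_max = max(v for v in positive if v <= fence)
    return [valid_max if v > fence else v for v in values]


# Values for one item across households (hypothetical numbers)
item_values = [0, 10, 12, 11, 13, 9, 500]
print(fix_top_outliers(item_values))
```

The sketch works on one item at a time. It does not implement the per capita versus per household choice, nor the exception for “rich” households flagged on multiple items; the slides do not say how many items count as “multiple”.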
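
Illustration – splitting a grouped category
The “Splitting grouped categories” slide says only that the national accounts breakdown was used to split survey items that cover several COICOP basic headings. One plausible reading, shown here as an assumption rather than as the method actually applied, is a proportional allocation: the reported value is divided among the candidate basic headings in proportion to their national accounts values. The function name, heading labels, and figures are hypothetical.

```python
from typing import Dict, List


def split_grouped_value(
    value: float,
    headings: List[str],
    na_values: Dict[str, float],
) -> Dict[str, float]:
    """Allocate one grouped survey value across several basic headings.

    Assumption: allocation is proportional to national accounts values
    for the headings that the grouped item covers.
    """
    total = sum(na_values[h] for h in headings)
    if total <= 0:
        raise ValueError("national accounts values must sum to a positive number")
    return {h: value * na_values[h] / total for h in headings}


# Hypothetical national accounts values by basic heading
national_accounts = {"rice": 30.0, "bread": 10.0, "beef": 60.0}

# A questionnaire item "rice and bread" reported as a single amount
shares = split_grouped_value(100.0, ["rice", "bread"], national_accounts)
print(shares)
```

Because every household receives the same split, the low correlation between national accounts and survey shares reported for many countries is a caveat for any allocation of this kind.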