IAOS 2014 Conference

Bridging economic statistics with people:

A role for alternative sources of data?

Zeynep Orhun Girard, Statistician, ESCAP Statistics Division
IAOS, Danang, Viet Nam, 9 October 2014

DISCLAIMER: The views presented here are the author’s and do not necessarily reflect the views and position of the United Nations.

“No wind favors he who has no destined port”

Michel de Montaigne

“We can analyze the data without hypotheses about what it might show. We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns science cannot. […] Correlation supersedes causation, and science can advance even without coherent models, unified theories, or really any mechanistic explanation at all.”

Chris Anderson, Editor of Wired Magazine

For official statistics to extract value from alternative sources of data such as big data, their use:

1) has to be guided closely by statistical policy; and
2) should aim to fill actual methodological and data gaps in different domains of statistics.

Methodological and policy developments are guiding economic statistics

Macroeconomic statistical frameworks, e.g. the SNA, are constantly updated. Milestones (1936, 1947, 1952, 1968, 1993, 2008), in chronological order:

- Input-Output analysis
- First econometric model of the business cycle and the General Theory
- Report on measurement of national income and the construction of social accounts
- SNA published
- Allowed for national statistical policies, recommended IOT and constant prices
- Introduced satellite accounts
- Some non-market production in the production boundary
- Concept of employment introduced in the sub-sectoring of the household sector
- Use of PPPs for international comparison
- Balance sheets and SAMs
- Chapter on informal aspects of the economy

Three key policy-related initiatives are shaping the future of economic statistics:

SSF (Stiglitz-Sen-Fitoussi) Commission Report
• Five recommendations on material wellbeing
• Follow-up work on disparities in national accounts, distribution of Household Income, Consumption and Wealth (OECD)

G-20 Data Gaps Initiative
• Recommendations 15-20 on Sectoral and Other Financial and Economic Datasets

Post-2015 development agenda
• Data revolution for targeted policy making
• Measures of progress on sustainable development that complement GDP (SDG 17)

We have witnessed a move towards an integrated approach to statistics and an emphasis on the household perspective and on the distributional aspects of economic activity.

Big data: the 3 Vs, yes, but not only…

• Exhaustiveness in scope (n = all)
• Granularity
• Indexical in identification
• Relational
• Flexible in fields and scalable in size

Big data and economic statistics so far?

Data sources: online search queries/web scraping
Substantive areas: housing market, labour market, prices
Methodologies/results: correlations and predictive modelling

$V_{t \text{ or } t+1} = f(\mathit{Proxy}_{t-1,\,t-2,\,\ldots},\ \mathit{Value}_{t-1,\,t-2,\,\ldots})$
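The predictive models summarised here relate the official value in the current (or next) period to its own past values and to lagged values of a big data proxy such as a search index. A minimal sketch, assuming f is linear and estimated by ordinary least squares (the slide specifies neither); the names `fit_arx`, `nowcast` and `regressors` and the synthetic series are hypothetical:

```python
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b]; rows are copied so the caller's data is untouched.
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Swap in the row with the largest pivot to limit rounding error.
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("singular system: regressors are collinear")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def regressors(value: Sequence[float], proxy: Sequence[float],
               t: int, lags: int) -> List[float]:
    """[1, V_{t-1..t-lags}, Proxy_{t-1..t-lags}] for period t."""
    return ([1.0]
            + [value[t - k] for k in range(1, lags + 1)]
            + [proxy[t - k] for k in range(1, lags + 1)])


def fit_arx(value: Sequence[float], proxy: Sequence[float],
            lags: int = 1) -> List[float]:
    """OLS estimate of V_t = a + sum_k b_k V_{t-k} + sum_k c_k Proxy_{t-k}."""
    rows = [regressors(value, proxy, t, lags) for t in range(lags, len(value))]
    ys = list(value[lags:])
    p = len(rows[0])
    # Normal equations (X'X) beta = X'y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
    return solve(xtx, xty)


def nowcast(value: Sequence[float], proxy: Sequence[float],
            coef: List[float], lags: int = 1) -> float:
    """Predict V for the period right after the last official observation."""
    x = regressors(value, proxy, len(value), lags)
    return sum(c * xi for c, xi in zip(coef, x))


if __name__ == "__main__":
    # Synthetic data generated from a known model, so the fit can be checked.
    proxy = [1, 4, 2, 5, 3, 6, 2, 7, 4]
    value = [10.0]
    for t in range(1, 8):
        value.append(2 + 0.5 * value[t - 1] + 3 * proxy[t - 1])
    coef = fit_arx(value, proxy)
    print([round(c, 6) for c in coef])  # should be close to [2.0, 0.5, 3.0]
    print(nowcast(value, proxy, coef))
```

A natural check, in the spirit of these studies, is to compare the fit of the model with and without the proxy terms; the sketch only shows the mechanics.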

Use of some big data sources for economic statistics

1. Housing market (Google Trends)

– Bank of England: McLaren and Shanbhogue (2011)
– Wu and Brynjolfsson (2009)

2. Labour/employment market (Google Trends and Word Tracker)

– Bank of England: McLaren and Shanbhogue (2011)
– D’Amuri and Marcucci (2009)
– Askitas (2009)
– Ettredge et al. (2005) (Word Tracker)

3. Prices (Scraping and non-traditional enumeration)

– Billion Prices Project
– Premise (hybrid)

Common points of these studies

• Compare aggregate trends of online search data against official/administrative statistics
• Emphasize correlation rather than causality
• Find that online search data can predict observed trends within the appropriate lead time (which depends on the individuals and the area of economic statistics)
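Finding the "appropriate lead time" can be illustrated by correlating the official series with the search proxy shifted back by k periods and picking the k with the strongest correlation. A minimal sketch, assuming both series are aligned on the same periods and that plain Pearson correlation is the comparison measure; `lead_correlations`, `best_lead` and the `min_overlap` threshold are hypothetical:

```python
import math
from typing import Dict, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a series is constant; correlation undefined")
    return sxy / math.sqrt(sxx * syy)


def lead_correlations(proxy: Sequence[float], official: Sequence[float],
                      max_lead: int, min_overlap: int = 6) -> Dict[int, float]:
    """Correlation of proxy_{t-k} with official_t for k = 0..max_lead."""
    n = min(len(proxy), len(official))
    out: Dict[int, float] = {}
    for k in range(max_lead + 1):
        if n - k < min_overlap:
            break  # too few overlapping periods for a meaningful correlation
        out[k] = pearson(proxy[: n - k], official[k:n])
    return out


def best_lead(proxy: Sequence[float], official: Sequence[float],
              max_lead: int) -> int:
    """Lead k with the highest (positive) correlation."""
    corr = lead_correlations(proxy, official, max_lead)
    return max(corr, key=corr.get)
```

A high correlation at some lead is only evidence of co-movement, consistent with these studies' emphasis on correlation rather than causality.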

What can big data do for economic statistics?

Beyond correlations and predictive modelling:

1. Enhance quality and granularity of economic statistics?

– Increase resolution and distributional information, e.g. demographics and geographical location

2. Enhance availability of economic statistics?

– Example: Components of a household balance sheet, e.g. consumer durables

Selecting the Main Source of Data

1. Define the measurement objective based on the policy question, e.g. distribution of wealth across different quintiles of households at provincial level
2. Identify the approach based on statistical policy
3. Identify the main data source based on FPOS and QAF (relevance, accuracy, timeliness, punctuality, accessibility, clarity, and comparability and consistency over time) plus cost efficiency

Data requirement X:
• Traditional data source (surveys, administrative records, registers): existing dataset, or design new data collection
• Alternative data source: big data set
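One way to make the source-selection step explicit is to score each candidate source on the listed quality dimensions and on cost efficiency, then rank by a weighted average. This is a hypothetical illustration, not a procedure given in the slides: the 1-5 scale, the weights, `weighted_score`, `rank_sources` and the candidate scores are all assumptions:

```python
from typing import Dict, List

# Quality dimensions listed on the slide (comparability and consistency over
# time merged into one key), plus cost efficiency.
CRITERIA = ("relevance", "accuracy", "timeliness", "punctuality",
            "accessibility", "clarity", "comparability", "cost_efficiency")


def weighted_score(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted mean of per-criterion scores (e.g. on a 1-5 scale)."""
    missing = [c for c in CRITERIA if c not in scores or c not in weights]
    if missing:
        raise ValueError(f"missing criteria: {missing}")
    total_weight = sum(weights[c] for c in CRITERIA)
    return sum(weights[c] * scores[c] for c in CRITERIA) / total_weight


def rank_sources(candidates: Dict[str, Dict[str, float]],
                 weights: Dict[str, float]) -> List[str]:
    """Candidate source names, best weighted score first."""
    return sorted(candidates,
                  key=lambda name: weighted_score(candidates[name], weights),
                  reverse=True)


if __name__ == "__main__":
    weights = dict.fromkeys(CRITERIA, 1.0)
    weights["relevance"] = 2.0  # hypothetical: policy relevance counts double
    candidates = {  # hypothetical scores for one data requirement
        "existing survey": dict.fromkeys(CRITERIA, 3.0),
        "new survey": dict(zip(CRITERIA, [5, 5, 2, 3, 3, 4, 4, 1])),
        "big data set": dict(zip(CRITERIA, [3, 2, 5, 4, 2, 2, 2, 5])),
    }
    print(rank_sources(candidates, weights))
```

The weights encode a judgement about the policy question, so in practice they would be set from the measurement objective rather than fixed in advance.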

Using big data for the distributional aspect

Select dataset

Example
• Online search keywords, e.g. “insurance” and “repair/garage” for automobiles; yellow pages data for business address searches
• Test correlations with any existing official statistics/other data source, e.g. household surveys covering consumer durables

Select variable of disaggregation

Example
• Location, sex, age, etc.

Test the distribution of groups by demographic characteristics against:
• Population Census data and demographic distribution at the national and sub-national levels
• Household Income and Expenditure data for the item in question, e.g. vehicle ownership and its distribution
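Testing the demographic distribution of a big data source against Census shares can be sketched with two simple quantities: the total variation distance between the two distributions, and a per-group factor showing how far each group is over- or under-represented. Both measures are swapped-in choices, not ones named in the slides, and `compare_to_census` and the example counts are hypothetical:

```python
from typing import Dict, Tuple


def shares(counts: Dict[str, float]) -> Dict[str, float]:
    """Convert group counts to proportions."""
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()}


def compare_to_census(big_data_counts: Dict[str, float],
                      census_shares: Dict[str, float]
                      ) -> Tuple[float, Dict[str, float]]:
    """Return (total variation distance, census/big-data factor per group)."""
    bd = shares(big_data_counts)
    groups = set(bd) | set(census_shares)
    # Half the sum of absolute share differences: 0 = identical, 1 = disjoint.
    tvd = 0.5 * sum(abs(bd.get(g, 0.0) - census_shares.get(g, 0.0))
                    for g in groups)
    # Factor > 1 means the group is under-represented in the big data source.
    factors = {g: census_shares[g] / bd[g]
               for g in groups if g in census_shares and bd.get(g, 0.0) > 0}
    return tvd, factors


if __name__ == "__main__":
    # Hypothetical: searches located 60/40 urban/rural vs. a 40/60 Census split.
    print(compare_to_census({"urban": 600, "rural": 400},
                            {"urban": 0.4, "rural": 0.6}))
```

A large distance signals that the big data source should not be used for distributional analysis without reweighting.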

Apply in analysis

Example
• Apply the distribution of vehicle ownership obtained from big data sources to macroeconomic aggregates
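Applying a distribution to a macroeconomic aggregate amounts to splitting the national total in proportion to the group weights observed in the big data source, optionally after a coverage adjustment. A minimal sketch under those assumptions; `allocate_aggregate`, the quintile labels and the figures are hypothetical:

```python
from typing import Dict, Optional


def allocate_aggregate(total: float, counts: Dict[str, float],
                       adjustment: Optional[Dict[str, float]] = None
                       ) -> Dict[str, float]:
    """Split `total` across groups in proportion to (adjusted) counts.

    counts: e.g. vehicles per household quintile seen in a big data source.
    adjustment: optional per-group factors correcting for coverage bias.
    """
    adj = adjustment or {}
    weighted = {g: c * adj.get(g, 1.0) for g, c in counts.items()}
    weight_sum = sum(weighted.values())
    if weight_sum <= 0:
        raise ValueError("counts must have a positive total")
    # Shares sum to one, so the allocations add back to `total`.
    return {g: total * w / weight_sum for g, w in weighted.items()}


if __name__ == "__main__":
    # Hypothetical national household vehicle stock, split by quintile.
    print(allocate_aggregate(1000.0, {"Q1": 10, "Q5": 30}))
```

Because the allocation is proportional, the result stays consistent with the macroeconomic total by construction.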

Using big data for enhancing data availability

Select dataset

Example
• Value of vehicles owned, derived from purchase and repair data, e.g. insurance databases

Process data

Example
• Blow up to national (if possible, sub-national) level figures
• Calculate depreciation
• Differentiate household enterprises
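The three processing steps can be sketched together: depreciate each recorded vehicle, drop those owned by household enterprises, and blow up the remaining values by a regional expansion factor (known regional vehicle total divided by records sampled in that region). Geometric depreciation at a single rate, the `business_use` flag, the `VehicleRecord` type and `household_vehicle_stock` are all assumptions for illustration:

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class VehicleRecord:
    region: str
    purchase_value: float  # value at purchase (assumed already at current prices)
    age_years: float
    business_use: bool     # hypothetical flag: owned by a household enterprise


def depreciated_value(rec: VehicleRecord, rate: float) -> float:
    """Geometric depreciation: value * (1 - rate) ** age."""
    return rec.purchase_value * (1.0 - rate) ** rec.age_years


def household_vehicle_stock(records: List[VehicleRecord],
                            region_vehicles: Dict[str, float],
                            rate: float) -> Dict[str, float]:
    """Blow up depreciated sample values to regional totals.

    region_vehicles: known total number of vehicles per region (e.g. registers);
    the expansion factor is that total divided by the records sampled there.
    Vehicles flagged as household-enterprise assets are left out.
    """
    sampled: Dict[str, int] = {}
    for rec in records:
        sampled[rec.region] = sampled.get(rec.region, 0) + 1
    stock: Dict[str, float] = {}
    for rec in records:
        if rec.business_use:
            continue  # belongs to household enterprises, treated separately
        factor = region_vehicles[rec.region] / sampled[rec.region]
        stock[rec.region] = (stock.get(rec.region, 0.0)
                             + factor * depreciated_value(rec, rate))
    return stock
```

Summing the regional totals gives a national figure; a uniform depreciation rate is a simplification, and rates by vehicle type or age would be a natural refinement.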

Apply in analysis

• In the construction of balance sheets
• As a memo item for national accounts

Challenges: Big data in official statistics

• Shift away from planned data collection activities
• Possible mismatch between what big data can offer and what economic policy makers need (comprehensiveness and comparability)
• Privacy of individuals and confidentiality of data
• Lack of a code of conduct covering all stakeholders (public and private)

Opportunities: Big data in official statistics

• In the current policy context, we need to integrate different data sources
• Alternative sources of data can respond to such needs (exhaustive, relational, flexible and scalable)
• Maintaining the TRUST of individuals is key: “Fifty-four per cent of global consumers indicated that they would be comfortable with the use of information about them if they believed that the uses would not embarrass them, damage their interests, or otherwise harm them” (BCG Global Consumer Sentiment Survey 2013)

Conclusions

1. Big data to complement official statistics:
   a. Conduct research for innovative statistics development;
   b. Provide quality insights through data confrontation; and
   c. Enhance availability of data by closing data gaps.
2. Statistical policy and actual methodological and data gaps need to guide big data research so that it yields meaningful, usable results.
3. Big data has a potential role in bringing the distributional and household aspects into economic statistics.

Next steps?

• Multiply the number of proposals embedded in methodological and data needs
• Conduct studies with official and private sources of data

Thank you. For comments/questions: Zeynep Orhun Girard
