Transcript PPT
New approaches for data collection and analyses
Per Nymand-Andersen
European Central Bank, Directorate General Statistics
CCSA session on International Statistics
Ankara, 5 September 2013
Agenda
1 Exploring statistics from the internet
2 Characteristics of the statistics
3 Exploring the statistics for analytical purposes
4 Preliminary results
5 Lessons learned and way forward
www.ecb.europa.eu
1 Exploring statistics from the internet
Using Google Trends data - http://www.google.com/trends
Increasing use of internet data for consumer analysis and as a
predictor of selected macroeconomic indicators
Most of the literature is based on Google search data: a database
storing the terms used in Google searches (Search, YouTube,
Images)
Could be useful for nowcasting and short-term forecasting of
consumer trends, mainly where statistics are not available, or to
gauge the direction of change before official statistics are released
Free, publicly available dataset; searches by country, category and period
Google taxonomy of 256 categories (e.g. “jobs”, including “job
listings”, “career resources and planning”, “resumes & portfolios”,
“developing jobs”)
Overview of increases and decreases in the use of search categories
in real time (normalised within search categories)
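Google describes Trends values as a term's searches in each period divided by all searches in the same region and period, then scaled so that the peak equals 100. A simplified sketch of that kind of normalisation follows; the raw counts it takes as input are hypothetical, since Google does not publish them, and `trends_index` is an illustrative name, not a Google API.

```python
def trends_index(term_counts, total_counts):
    """Scale a term's share of all searches to 0-100, with 100 at the peak.

    term_counts[i]: searches for the term in period i (hypothetical raw counts)
    total_counts[i]: all searches in the same region and period i
    """
    shares = [t / total for t, total in zip(term_counts, total_counts)]
    peak = max(shares)
    if peak == 0:
        return [0 for _ in shares]  # no searches at all: a flat zero index
    return [round(100 * s / peak) for s in shares]


print(trends_index([10, 20, 30], [100, 100, 200]))  # -> [50, 100, 75]
```

Only relative shares survive this scaling, so the third period scores below the second even though its raw count is higher.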
www.ecb.europa.eu
2 Characteristics of the statistics
Using Google Trends Data - http://www.google.com/trends
3 Exploring the statistics for analytical purposes
“Nowcasting unemployment rate in Turkey: let’s ask Google”
Meltem Gülenay Chadwick & Gönül Sengül (June 2012)
Central Bank of the Republic of Turkey.
Uses linear regression models and Bayesian Model Averaging to
nowcast the non-agricultural unemployment rate in Turkey.
Finds that models using Google Trends data perform statistically
better than a benchmark model, both in-sample and out-of-sample
(measured by RMSE)
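The slide does not give the paper's exact specification. The sketch below assumes an AR(1) benchmark, one Google index entering in the same period as the target, and a single split into an estimation and an evaluation period; `compare_models` and the other names are hypothetical. It shows how "better in-sample and out-of-sample by RMSE" can be checked.

```python
import math


def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * t for row, t in zip(X, y))]
         for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * p for a, p in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def predict(X, b):
    return [sum(x * c for x, c in zip(row, b)) for row in X]


def rmse(actual, fitted):
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, fitted)) / len(actual))


def compare_models(y, g, n_train):
    """In-sample and out-of-sample RMSE of an AR(1) benchmark and of the
    same model augmented with a Google index g (assumed specification)."""
    target = y[1:]
    benchmark = [[1.0, y[t - 1]] for t in range(1, len(y))]
    augmented = [[1.0, y[t - 1], g[t]] for t in range(1, len(y))]
    results = {}
    for name, X in (("benchmark", benchmark), ("google", augmented)):
        b = ols(X[:n_train], target[:n_train])  # estimate on the first n_train rows
        results[name] = (
            rmse(target[:n_train], predict(X[:n_train], b)),  # in-sample
            rmse(target[n_train:], predict(X[n_train:], b)),  # out-of-sample
        )
    return results
```

In practice the out-of-sample evaluation is often recursive (re-estimating as each new month arrives); a single split keeps the sketch short.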
A new and growing field of experimental nowcasting, mainly for
consumption and selected macroeconomic indicators
→ Since 2008, research institutions and universities have been using
Google Trends data:
Ginsberg (2008) → influenza epidemics
Hal Varian and Choi (2009) → retail sales, home sales, travel
Vosen & Schmidt (2011) → private consumption in Germany
Carriere-Swallow (2011) → car purchases in Chile
Lynn Wu & Erik Brynjolfsson → UK housing prices & sales
Hal Varian and Choi → unemployment rate in the US
Hyunyoung Choi, Rob ON, Hal Varian (2011) → CPI !
Ongoing ECB research: “Nowcasting European Unemployment Using
Internet Search Data” (Morgan, Muzikarova & Onorante, 2013)
Data: individual Google Trends internet searches for DE, FR, IT, ES, and NL
starting in 2004; weekly & monthly frequency
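The search data are partly weekly while unemployment statistics are monthly. One simple bridge, assumed here and not necessarily the one used in the ECB study, averages the weekly index values within each calendar month, assigning each week to the month in which it starts. `weekly_to_monthly` is a hypothetical helper.

```python
from collections import defaultdict
from datetime import date


def weekly_to_monthly(weekly):
    """Average weekly index values into calendar months.

    weekly: list of (week_start_date, value) pairs. Each week is assigned to
    the month in which it starts (a simplifying assumption).
    """
    buckets = defaultdict(list)
    for start, value in weekly:
        buckets[(start.year, start.month)].append(value)
    return {month: sum(vals) / len(vals) for month, vals in sorted(buckets.items())}


print(weekly_to_monthly([(date(2004, 1, 4), 10.0),
                         (date(2004, 1, 11), 20.0),
                         (date(2004, 2, 1), 30.0)]))
```

Weeks that straddle two months are the main source of error in this shortcut; weighting each week by its days in each month is a finer alternative.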
Deliverable: euro area aggregate (using German, French, Italian, Spanish &
Dutch search terms) as an early diagnostic tool for euro area unemployment
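The slide does not say how the five countries are combined into a euro area figure. A minimal sketch, assuming a fixed-weight average (for example by labour-force shares; the weights below are hypothetical) that is renormalised over whichever countries have data:

```python
def euro_area_aggregate(country_values, weights):
    """Weighted average of country-level series values (nowcasts or indices).

    country_values: dict country code -> value for one period
    weights: dict country code -> fixed weight (hypothetical, e.g. labour-force shares)
    Weights are renormalised over the countries present in both dicts.
    """
    common = [c for c in country_values if c in weights]
    if not common:
        raise ValueError("no country has both a value and a weight")
    total = sum(weights[c] for c in common)
    return sum(weights[c] * country_values[c] for c in common) / total


# Hypothetical weights; IT has no value this period, so DE and FR are renormalised.
print(euro_area_aggregate({"DE": 10.0, "FR": 20.0}, {"DE": 0.6, "FR": 0.4, "IT": 0.5}))
```

Renormalising keeps the aggregate on the same scale when a country drops out, though it silently changes coverage, one of the "coverage and weights" issues listed under the preliminary results.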
Empirical method to assess the explanatory power of each search term
(or combination of terms) for unemployment: Bayesian Model Averaging,
weighting models by their in-sample RMSE (hedging against
misspecification)
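The slide does not specify how in-sample RMSE enters the weights. One standard way to turn in-sample fit into approximate posterior model probabilities, assumed here, is the BIC approximation with equal prior model probabilities: for model i with k_i parameters fitted on n observations, BIC_i = n ln(RMSE_i^2) + k_i ln n, and the weight is proportional to exp(-BIC_i / 2). Each "model" would be a regression on one search term or a combination of terms; the function names are hypothetical.

```python
import math


def bic_weights(rmse, n_params, n_obs):
    """Approximate posterior model probabilities from in-sample fit.

    BIC_i = n * ln(RMSE_i^2) + k_i * ln(n); with equal prior model
    probabilities, weight_i is proportional to exp(-BIC_i / 2).
    All RMSE values must be strictly positive.
    """
    bics = [n_obs * math.log(r ** 2) + k * math.log(n_obs)
            for r, k in zip(rmse, n_params)]
    best = min(bics)  # subtract the smallest BIC to avoid overflow/underflow
    raw = [math.exp(-(b - best) / 2) for b in bics]
    total = sum(raw)
    return [w / total for w in raw]


def bma_nowcast(nowcasts, weights):
    """Weighted average of the individual models' nowcasts."""
    return sum(w * f for w, f in zip(weights, nowcasts))
```

Because n multiplies ln(RMSE^2), small differences in fit are amplified in long samples, so the averaging can collapse onto one or two models; that is the intended behaviour of BMA, not a bug.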
Tentative conclusions: Google searches appear informative and can
substantially improve on autoregressive models. The reduction in nowcasting
RMSFE varies across countries but can reach 80% relative to the naïve model
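A reduction in RMSFE "relative to the naïve model" is one minus the ratio of the two root mean squared forecast errors. The sketch assumes the naïve model is a random walk (each month predicted by the previous observed value); the slide does not define it, and the helper names are hypothetical.

```python
import math


def rmsfe(actual, forecast):
    """Root mean squared forecast error."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def naive_forecasts(last_observed, actual):
    """Random-walk nowcasts: each period is predicted by the previous value."""
    return [last_observed] + actual[:-1]


def rmsfe_reduction(actual, model_forecast, naive_forecast):
    """Share by which the model's RMSFE undercuts the naive model's (0.8 = 80%)."""
    return 1 - rmsfe(actual, model_forecast) / rmsfe(actual, naive_forecast)


actual = [1.0, 2.0, 3.0]
print(rmsfe_reduction(actual, [1.2, 2.2, 3.2], naive_forecasts(0.0, actual)))
```

With these toy numbers the naïve errors are 1.0 each and the model errors about 0.2, so the printed reduction is close to 0.8.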
4 Preliminary results
Usability
• nowcasting of retail consumption and selected macroeconomic indicators
• conjunctural analysis
• consumer behaviour
• price indices of products
Availability
• public and free, easy to use
• one system for all countries
• comparability & timeliness
• large taxonomy of searches
Innovation
• trends in communications
• product loyalty
• advertisement
• social patterns in retail markets
Robustness / Methodology / Quality
• stability of search terms
• volatility in analytical results
• based on a single search engine
• coverage and weights
• aggregation methods
• price information
• short time series
• differences across regions
• no measurements; age dependence
• rebasing and time lag
• home and host concept
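Because each Google Trends download is rescaled so that its own peak equals 100, series taken from different download windows are not directly comparable, which is one face of the "rebasing" issue. One common workaround, assumed here rather than taken from the presentation, chain-links two windows using the ratio of their values over the overlapping periods and then rescales the result to a peak of 100. Both function names are hypothetical.

```python
def splice(earlier, later):
    """Chain-link `later` onto the scale of `earlier` via their overlapping periods.

    earlier, later: dict period -> index value from two separate downloads.
    """
    overlap = [p for p in later if p in earlier]
    if not overlap:
        raise ValueError("the two windows must share at least one period")
    later_sum = sum(later[p] for p in overlap)
    if later_sum == 0:
        raise ValueError("the overlap has zero search interest in the later window")
    ratio = sum(earlier[p] for p in overlap) / later_sum
    combined = dict(earlier)  # keep the earlier window's values where both exist
    for period, value in later.items():
        if period not in combined:
            combined[period] = value * ratio
    return dict(sorted(combined.items()))


def rebase_to_peak(series):
    """Rescale a spliced series so that its maximum is 100 again."""
    peak = max(series.values())
    return {p: 100 * v / peak for p, v in series.items()}


earlier = {"2004-01": 50, "2004-02": 100, "2004-03": 80}
later = {"2004-03": 40, "2004-04": 100}
print(rebase_to_peak(splice(earlier, later)))
```

Splicing errors compound across many windows, and Google's own sampling means repeated downloads of the same window can differ slightly, so the overlap ratio is itself an estimate.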
5 Lessons learned and way forward
Large potential for exploring new causal relationships in understanding
consumer behaviour, retail markets and certain macroeconomic
statistics, and for building new consumer indicators, indices
for certain product classes and new economic theories of consumer behaviour
Results have predominantly been tested for unemployment, tourism,
private consumption and housing markets
Increasing use and a growing body of literature
Applying data and statistics from the internet depends on
obtaining sufficient information on the methodology applied
(new private data providers may regard this information as
intellectual property and a competitive advantage)
New ideas for statistical input are always met with a degree of
scepticism
Simple, cheap and easy to put into statistical production
Challenges the statistical communication function
Creates dependencies, even though the data are always free in the start-up phase
Statisticians may need to explore private sources to meet
increasing user demand for statistics