Observation and Reflection on Official Statistics against Big Data Challenge
Yuan Pengfei
Research Institute of Statistical Sciences
National Bureau of Statistics of China
26 October 2013
The situation and our preparation
Big data rolls toward us.
In recent years, NBS has steadily strengthened its statistical informatization.
The characteristics of big data
Most big data is generated automatically.
Big data comes from many sources.
Unstructured data makes up a large proportion of big data.
The value of big data needs to be filtered and extracted.
From 3V to 6V.
Why from 3V to 6V
Volume: data volume is huge.
Value: application value is huge.
Variety: data types are various.
Velocity: processing speed is rapid.
Vendor: the acquisition and transmission of big data are flexible.
Veracity: the data must be truthful and accurate.
The challenges and influences:
About system design
Statistical standard.
Statistical indicators.
Statistical range.
Statistical method.
The challenges and influences:
About data collection
By searching on the internet.
By purchasing.
By cooperation.
The challenges and influences:
About data processing
We must do our best to explore methods and techniques for transforming unstructured data into structured data.
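As a rough illustration of turning unstructured text into structured records, the sketch below parses free-text price listings with a regular expression. The listing format, the `PriceRecord` type and the `parse_listing` helper are hypothetical and chosen only for the example; real web text would need far more robust handling.

```python
import re
from typing import NamedTuple, Optional


class PriceRecord(NamedTuple):
    product: str
    price: float
    unit: str


# Assumed (hypothetical) listing shape: "<product> <price> yuan/<unit>".
PATTERN = re.compile(
    r"(?P<product>[A-Za-z ]+?)\s*[:,-]?\s*"
    r"(?P<price>\d+(?:\.\d+)?)\s*yuan\s*/\s*(?P<unit>\w+)",
    re.IGNORECASE,
)


def parse_listing(text: str) -> Optional[PriceRecord]:
    """Return a structured record, or None if the text has no recognizable price."""
    match = PATTERN.search(text)
    if match is None:
        return None
    return PriceRecord(
        product=match.group("product").strip().lower(),
        price=float(match.group("price")),
        unit=match.group("unit").lower(),
    )


# Made-up sample listings; the last one carries no price and is skipped.
listings = [
    "Rice: 5.20 yuan/kg",
    "fresh pork 26 yuan / kg",
    "call us for today's offers",
]
records = [r for r in map(parse_listing, listings) if r is not None]
```

Text that does not match is dropped rather than guessed at, which keeps the structured output clean at the cost of coverage.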
The challenges and influences:
About data storage
Processing big data with high volume, high speed and high complexity requires a server cluster that supports a variety of tools.
Cloud computing is generally considered the most economical approach.
The challenges and influences:
About data quality assessment
Accuracy
Timeliness
Applicability
Economy
The challenges and influences:
About data release
Release will be more timely.
The choice of release media will be more diverse.
The content of release must be richer.
Some ideas for application:
CPI statistics
Collect online transaction price data by searching the internet.
Explore cooperation with online stores in order to acquire online transaction price data.
Establish a system through which malls, supermarkets and hospitals can submit their transaction records to official statistical departments.
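One way collected online prices could feed a price index is sketched below. It uses the Jevons formula (the geometric mean of price relatives for products observed in both periods), which is a standard unweighted elementary index and not necessarily the method NBS uses; a full CPI would also apply expenditure weights. The function name and the prices are made up for illustration.

```python
import math


def jevons_index(base_prices, current_prices):
    """Unweighted Jevons index (base = 100) over products seen in both periods."""
    matched = [p for p in base_prices if p in current_prices]
    if not matched:
        raise ValueError("no products observed in both periods")
    # Average the log price relatives, then exponentiate: a geometric mean.
    log_sum = sum(math.log(current_prices[p] / base_prices[p]) for p in matched)
    return 100.0 * math.exp(log_sum / len(matched))


# Made-up prices: "eggs" and "milk" appear in only one period and are ignored.
base = {"rice": 5.00, "pork": 25.00, "eggs": 10.00}
current = {"rice": 5.50, "pork": 25.00, "milk": 3.00}
index = jevons_index(base, current)
```

Matching products across periods matters for online data in particular, because listings appear and disappear much faster than items in a fixed survey basket.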
Some ideas for application:
PPI statistics
Collect relevant online data by searching, to provide a useful supplement for the compilation of PPI in NBS.
Establish cooperation with related companies to collect price information on related industries for the evaluation and validation of PPI.
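A simple way to evaluate an external price series against PPI is to check whether it leads the official series and by how many months. The sketch below searches for the lead that maximizes the Pearson correlation; the function names and the series are synthetic and purely illustrative, not actual PPI or commodity index values.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def best_lead(indicator, target, max_lead):
    """Return (lead, r) such that indicator[t] correlates best with target[t + lead]."""
    best = (0, pearson(indicator, target))
    for lead in range(1, max_lead + 1):
        r = pearson(indicator[:-lead], target[lead:])
        if r > best[1]:
            best = (lead, r)
    return best


# Synthetic monthly series: the target repeats the indicator two periods later.
indicator = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 3.0, 7.0]
target = [0.0, 0.0] + indicator[:-2]
lead, r = best_lead(indicator, target, max_lead=3)
```

With real data the series are short and noisy, so a lead found this way is a hint to investigate rather than a validated relationship.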
[Figure: Mysteel (Shanghai Ganglian) commodity index, annotated as leading by about one and a half months. Two panels of monthly data, 2008-1 to 2012-9, in %: the Shanghai Ganglian China Commodity Price Index against year-on-year PPI (right axis), and the same index against the PMI (right axis).]
Some ideas for application:
Employment survey
Statistical analysis of employment-related big data from the internet can, to some extent, be very useful for understanding the situation in the labor market.
Some ideas for application:
Agricultural statistics
The application of spatial data.
The application of data from the Internet of Things.
The application of data on the Internet.
Some ideas for application:
Wholesale and retail statistics
Collecting the base data of e-commerce transactions, including quality assessment.
Adding indicators that reflect e-commerce transactions, such as the total volume of e-commerce transactions, to statistical report forms.
Building an e-commerce index that reflects the level of e-commerce transactions.
Some ideas for application:
Transportation statistics
Making use of the data collected from various transportation infrastructures.
Making use of the data recorded and transmitted by vehicles.
Making use of the data generated by the objects of transportation services.
Thank You!