When Urban Air Quality Meets Big Data
Yu Zheng
Lead Researcher, Microsoft Research
Background
• Air quality
  – NO2, SO2
  – Aerosols: PM2.5, PM10
• Why it matters
  – Healthcare
  – Pollution control and dispersal
• Reality
  – Building a measurement station is not easy
  – A limited number of stations (poor coverage)
Beijing only has 15 air quality monitor stations in its urban areas (50 km × 40 km)
[Figure: air quality monitor stations, 2 PM, June 17, 2013]
Challenges
• Air quality varies by location non-linearly
• Affected by many factors
  – Weather, traffic, land use…
  – Subtle to model with a clear formula
[Figure: A) Beijing (8/24/2012 - 3/8/2013). x-axis: deviation of PM2.5 between S12 and S13 (0 to 480); y-axis: 0.00 to 0.30; annotation: >35%]
We do not really know the air quality of a location without a monitoring station!
30,000+ USD, 10 µg/m³, 202 × 85 × 168 (mm)
Inferring Real-Time and Fine-Grained air quality throughout a city using Big Data
Meteorology, traffic, human mobility, POIs, road networks, historical air quality data, real-time air quality reports
http://www.uairquality.com/
Applications
• Location-based air quality awareness
  – Fine-grained pollution alert
  – Routing based on air quality
• Identify candidate locations for setting up new monitoring stations
• A step towards identifying the root cause of air pollution
[Figure: B) Shanghai, station labels S1 to S10]
Cloud + Client
• Cloud: MS Azure
• Clients
http://urbanair.msra.cn/
Difficulties
• Incorporate multiple heterogeneous data sources into a learning model
  – Spatially-related data: POIs, road networks
  – Temporally-related data: traffic, meteorology, human mobility
• Data sparseness (little training data)
  – Limited number of stations
  – Many places to infer
• Efficiency requirements
  – Massive data
  – Answer instant queries
Methodology Overview
• Partition a city into disjoint grids
• Extract features for each grid from its impacting region
  – Meteorological features
  – Traffic features
  – Human mobility features
  – POI features
  – Road network features
• Co-training-based semi-supervised learning model for each pollutant
  – Predict the AQI labels
  – Data sparsity
  – Two classifiers
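The first two steps (grid partition and feature extraction from an impacting region) can be sketched roughly as below. This is only an illustration under stated assumptions, not the authors' implementation: the 1 km cell size, the planar kilometre coordinates, the one-cell impacting radius, and the names cell_of, impacting_region, and poi_feature are all hypothetical, and a plain POI count stands in for the real features.

```python
from collections import Counter

CELL_KM = 1.0  # assumed cell size; a 50 km x 50 km city then has 2500 cells


def cell_of(x_km, y_km, cell_km=CELL_KM):
    """Return the (row, col) of the grid cell containing a point.

    Coordinates are kilometres east (x) and north (y) of the city's
    south-west corner, a simplifying assumption for this sketch.
    """
    return int(y_km // cell_km), int(x_km // cell_km)


def impacting_region(row, col, n_rows, n_cols, radius=1):
    """Cells within `radius` cells of (row, col), clipped to the city bounds."""
    return [
        (r, c)
        for r in range(max(0, row - radius), min(n_rows, row + radius + 1))
        for c in range(max(0, col - radius), min(n_cols, col + radius + 1))
    ]


def poi_feature(points, n_rows, n_cols, radius=1):
    """Count the POIs falling in each cell's impacting region (a toy feature)."""
    per_cell = Counter(cell_of(x, y) for x, y in points)
    return {
        (row, col): sum(
            per_cell[rc]
            for rc in impacting_region(row, col, n_rows, n_cols, radius)
        )
        for row in range(n_rows)
        for col in range(n_cols)
    }


# Hypothetical POI positions (km) inside a 3 x 3 grid.
pois = [(0.2, 0.7), (1.5, 0.1), (2.9, 2.9)]
features = poi_feature(pois, n_rows=3, n_cols=3)
print(features[(1, 1)])  # 3: the centre cell's region covers the whole grid
print(features[(0, 0)])  # 2: the corner cell's region is clipped to 4 cells
```

The same per-cell aggregation pattern would apply to the other feature families; only the quantity being aggregated changes.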
Semi-Supervised Learning Model
• Philosophy of the model
  – States of air quality
    • Temporal dependency in a location
    • Geo-correlation between locations
  – Generation of air pollutants
    • Emission from a location
    • Propagation among locations
  – Two sets of features
    • Spatially-related (Spatial Classifier): road networks (F_r), POIs (F_p)
    • Temporally-related (Temporal Classifier): traffic (F_t), meteorology (F_m), human mobility (F_h)
[Figure: locations s1 to s4 in geospace at times t1, t2, ..., ti. Legend: a location with AQI labels; a location to be inferred; temporal dependency; spatial correlation.]
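The co-training idea (two classifiers, one per feature view, each feeding its most confident predictions back as extra training labels) can be sketched as below. This is a minimal sketch under stated assumptions, not the model from the talk: a nearest-centroid classifier stands in for both the spatial and the temporal classifier, the confidence score and the "more confident classifier wins" inference rule are illustrative choices, and all names and toy feature values are hypothetical.

```python
import math


class CentroidClassifier:
    """Nearest-centroid classifier, a simple stand-in for the real classifiers."""

    def fit(self, X, y):
        sums, counts = {}, {}
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for i, value in enumerate(x):
                acc[i] += value
            counts[label] = counts.get(label, 0) + 1
        self.centroids = {
            label: [v / counts[label] for v in total]
            for label, total in sums.items()
        }
        return self

    def predict_with_confidence(self, x):
        """Return (label, confidence); confidence is a softmax over -distance."""
        weights = {
            label: math.exp(-math.dist(x, centroid))
            for label, centroid in self.centroids.items()
        }
        label = max(weights, key=weights.get)
        return label, weights[label] / sum(weights.values())


def co_train(labeled, unlabeled, rounds=10, per_round=1):
    """labeled: (spatial_x, temporal_x, label) triples; unlabeled: (spatial_x, temporal_x) pairs."""
    labeled, pool = list(labeled), list(unlabeled)
    sc, tc = CentroidClassifier(), CentroidClassifier()

    def refit():
        labels = [y for _, _, y in labeled]
        sc.fit([s for s, _, _ in labeled], labels)
        tc.fit([t for _, t, _ in labeled], labels)

    refit()
    for _ in range(rounds):
        if not pool:
            break
        # Each classifier labels the examples it is most confident about,
        # using only its own feature view (0 = spatial, 1 = temporal).
        for clf, view in ((sc, 0), (tc, 1)):
            ranked = sorted(
                pool,
                key=lambda ex: clf.predict_with_confidence(ex[view])[1],
                reverse=True,
            )
            for ex in ranked[:per_round]:
                label, _ = clf.predict_with_confidence(ex[view])
                labeled.append((ex[0], ex[1], label))
                pool.remove(ex)
        refit()  # both classifiers learn from the enlarged labeled set
    return sc, tc


def infer(sc, tc, spatial_x, temporal_x):
    """Assumed rule: take the label from whichever classifier is more confident."""
    candidates = (
        sc.predict_with_confidence(spatial_x),
        tc.predict_with_confidence(temporal_x),
    )
    return max(candidates, key=lambda pair: pair[1])[0]


# Toy data: two labeled grids and four unlabeled grids (hypothetical values).
labeled = [
    ([0.1, 0.2], [0.9, 0.1], "Good"),
    ([0.9, 0.8], [0.2, 0.9], "Unhealthy"),
]
unlabeled = [
    ([0.2, 0.1], [0.8, 0.2]),
    ([0.8, 0.9], [0.1, 0.8]),
    ([0.15, 0.25], [0.85, 0.15]),
    ([0.85, 0.75], [0.3, 0.8]),
]
sc, tc = co_train(labeled, unlabeled, rounds=5)
print(infer(sc, tc, [0.2, 0.2], [0.9, 0.2]))  # prints "Good"
```

With only two labeled grids to start from, the unlabeled grids supply most of the training signal, which is the point of using a semi-supervised model when stations are sparse.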
Evaluation
• Datasets
Data sources              Beijing               Shanghai              Shenzhen             Wuhan
POI   2012 Q1             271,634               321,529               107,061              102,467
POI   2012 Q3             272,109               317,829               107,171              104,634
Road  #. Segments         162,246               171,191               45,231               38,477
Road  Highways            1,497 km              1,963 km              256 km               1,193 km
Road  Roads               18,525 km             25,530 km             6,100 km             9,691 km
Road  #. Intersections    49,981                70,293                32,112               25,359
AQI   #. Stations         22                    10                    9                    10
AQI   Hours               23,300                8,588                 6,489                6,741
AQI   Time span           8/24/2012 - 3/8/2013  1/19/2013 - 3/8/2013  2/4/2013 - 3/8/2013  2/4/2013 - 3/8/2013
Urban size (grids)        50 × 50 km (2500)     50 × 50 km (2500)     57 × 45 km (2565)    45 × 25 km (1165)
[Figure: monitoring station locations in A) Beijing, B) Shanghai, C) Shenzhen, D) Wuhan]
Evaluation
• Overall performance of the co-training
[Figure: comparison of U-Air, DT, Linear, CRF-ALL, Gaussian, ANN-ALL, and Classical on PM10 and NO2; y-axis 0.0 to 0.8]
[Figure: SC, TC, and Co-Training versus Num. of Iterations; x-axis 0 to 160, y-axis 0.65 to 0.80]
Status
• Publication at KDD 2013: U-Air: when urban air quality inference meets big data
• Website is publicly available via Azure
• A mobile client "Urban Air" in the WP App store
• Component of Urban Air is in the CityNext platform
• On Bing Map China
• Now working on prediction
http://urbanair.msra.cn/
Thanks!
Yu Zheng
Homepage