When Urban Air Quality Meets Big Data Yu Zheng Lead Researcher, Microsoft Research.

When Urban Air Quality Meets Big Data

Yu Zheng

Lead Researcher, Microsoft Research

Background

• Air quality
  – NO2, SO2
  – Aerosols: PM2.5, PM10
• Why it matters
  – Healthcare
  – Pollution control and dispersal
• Reality
  – Building a measurement station is not easy
  – A limited number of stations (poor coverage)

Beijing has only 15 air quality monitoring stations in its urban area (50 km × 40 km).

[Figure: air quality monitor station]

2PM, June 17, 2013

Challenges

• Air quality varies by location non-linearly
• Affected by many factors
  – Weather, traffic, land use…
  – Subtle to model with a clear formula

[Figure: A) Beijing (8/24/2012 - 3/8/2013). Horizontal axis: deviation of PM2.5 between S12 and S13 (0 to 480); vertical axis ticks from 0.00 to 0.30; annotation: >35%.]
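A distribution like the one in the figure can be computed from paired readings of two stations. The following is a minimal sketch, not the analysis behind the slide: the function name `deviation_histogram`, the bin width of 40 (chosen to match the axis ticks), and the use of `None` for missing readings are all assumptions made for illustration.

```python
def deviation_histogram(series_a, series_b, bin_width=40):
    """Return {bin_start: proportion} for |a - b| over paired readings.

    Pairs where either reading is missing (None) are skipped.
    The bin width of 40 is an assumption taken from the axis ticks.
    """
    counts = {}
    n = 0
    for a, b in zip(series_a, series_b):
        if a is None or b is None:
            continue
        # Left edge of the bin that contains this absolute deviation.
        start = int(abs(a - b) // bin_width) * bin_width
        counts[start] = counts.get(start, 0) + 1
        n += 1
    if n == 0:
        return {}
    return {start: c / n for start, c in sorted(counts.items())}


# Hypothetical readings for two stations (not real data).
s12 = [10, 50, 120, None, 200]
s13 = [20, 100, 60, 30, 195]
hist = deviation_histogram(s12, s13)
```

Each key is the left edge of a deviation bin and each value is the share of paired readings that fall into it.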

We do not really know the air quality of a location without a monitoring station!

30,000+ USD, 10 µg/m³, 202 × 85 × 168 (mm)

Inferring Real-Time and Fine-Grained air quality throughout a city using Big Data

• Meteorology
• Traffic
• Human mobility
• POIs
• Road networks
• Historical air quality data
• Real-time air quality reports

http://www.uairquality.com/

Applications

• Location-based air quality awareness
  – Fine-grained pollution alert
  – Routing based on air quality
• Identify candidate locations for setting up new monitoring stations
• A step towards identifying the root cause of air pollution

[Figure: B) Shanghai, map of monitoring stations]

Cloud + Client

• Cloud: MS Azure
• Clients

http://urbanair.msra.cn/

Difficulties

• Incorporate multiple heterogeneous data sources into a learning model
  – Spatially-related data: POIs, road networks
  – Temporally-related data: traffic, meteorology, human mobility
• Data sparseness (little training data)
  – Limited number of stations
  – Many places to infer
• Efficiency requirements
  – Massive data
  – Answer instant queries

Methodology Overview

• Partition a city into disjoint grids
• Extract features for each grid from its impacting region
  – Meteorological features
  – Traffic features
  – Human mobility features
  – POI features
  – Road network features
• Co-training-based semi-supervised learning model for each pollutant
  – Predict the AQI labels
  – Data sparsity
  – Two classifiers
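The first step, partitioning a city into disjoint grids, can be sketched as a mapping from a coordinate to a grid cell. This is an illustrative sketch only: the function name `grid_index`, the 1 km cell size (consistent with the 2500 grids listed for a 50 × 50km area in the dataset table), the flat-earth approximation, and the corner coordinates are assumptions, not details given in the slides.

```python
import math

KM_PER_DEG_LAT = 111.32  # approximate length of one degree of latitude


def grid_index(lat, lon, south, west, cell_km=1.0, rows=50, cols=50):
    """Map a point to the (row, col) of a disjoint grid cell.

    The grid is anchored at its south-west corner (south, west).
    Returns None when the point falls outside the rows x cols area.
    Uses a simple equirectangular approximation, which is an
    assumption that is only reasonable at city scale.
    """
    km_per_deg_lon = KM_PER_DEG_LAT * math.cos(math.radians(south))
    row = math.floor((lat - south) * KM_PER_DEG_LAT / cell_km)
    col = math.floor((lon - west) * km_per_deg_lon / cell_km)
    if 0 <= row < rows and 0 <= col < cols:
        return row, col
    return None


# Hypothetical south-west corner, chosen only for illustration.
SOUTH, WEST = 39.8, 116.2
cell = grid_index(39.8135, 116.22, SOUTH, WEST)
```

Because every point maps to at most one cell, the cells are disjoint by construction, and features can then be aggregated per cell.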

Semi-Supervised Learning Model

• Philosophy of the model
  – States of air quality
    • Temporal dependency in a location
    • Geo-correlation between locations
  – Generation of air pollutants
    • Emission from a location
    • Propagation among locations
  – Two sets of features
    • Spatially-related (Spatial Classifier)
    • Temporally-related (Temporal Classifier)

[Figure: locations s1 to s4 in geospace at time steps t1, t2, …, ti. Legend: a location with AQI labels; a location to be inferred; temporal dependency; spatial correlation. Spatial features: road networks (Fr), POIs (Fp). Temporal features: traffic (Ft), meteorology (Fm), human mobility (Fh).]
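The co-training idea, two classifiers trained on separate feature views that label data for each other, can be sketched as follows. This is a generic co-training loop under stated assumptions, not the model from the talk: the nearest-centroid classifier is a placeholder for the spatial and temporal classifiers (SC and TC), and the names `CentroidClassifier`, `co_train`, and `infer`, as well as the product rule used to combine the two outputs, are hypothetical.

```python
import math


class CentroidClassifier:
    """Placeholder classifier: nearest centroid with normalized scores."""

    def fit(self, X, y):
        sums, counts = {}, {}
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            sums[label] = [s + v for s, v in zip(acc, x)]
            counts[label] = counts.get(label, 0) + 1
        self.centroids = {
            label: [s / counts[label] for s in total]
            for label, total in sums.items()
        }
        return self

    def predict_proba(self, x):
        # Closer centroids get higher scores; scores are normalized to sum to 1.
        scores = {l: math.exp(-math.dist(x, c)) for l, c in self.centroids.items()}
        total = sum(scores.values())
        return {l: s / total for l, s in scores.items()}


def co_train(spatial, temporal, labels, rounds=10, per_round=1):
    """spatial/temporal: one feature vector per grid for each view.
    labels: an AQI label per grid, or None where no station exists."""
    labels = list(labels)
    sc, tc = CentroidClassifier(), CentroidClassifier()

    def refit():
        known = [i for i, l in enumerate(labels) if l is not None]
        y = [labels[i] for i in known]
        sc.fit([spatial[i] for i in known], y)
        tc.fit([temporal[i] for i in known], y)

    for _ in range(rounds):
        refit()
        if all(l is not None for l in labels):
            break
        # Each classifier pseudo-labels the grids it is most confident about,
        # enlarging the training set used by both classifiers in the next round.
        for clf, view in ((sc, spatial), (tc, temporal)):
            scored = []
            for i, l in enumerate(labels):
                if l is None:
                    proba = clf.predict_proba(view[i])
                    best = max(proba, key=proba.get)
                    scored.append((proba[best], i, best))
            scored.sort(reverse=True)
            for _, i, best in scored[:per_round]:
                labels[i] = best
    refit()
    return sc, tc, labels


def infer(sc, tc, spatial_x, temporal_x):
    """Combine both views by multiplying per-label scores (an assumption)."""
    ps, pt = sc.predict_proba(spatial_x), tc.predict_proba(temporal_x)
    return max(ps, key=lambda label: ps[label] * pt[label])


# Toy data: two labeled grids and two unlabeled grids (hypothetical values).
sc, tc, out = co_train(
    [[0.0], [10.0], [1.0], [9.0]],
    [[0.0], [5.0], [0.5], [4.5]],
    ["good", "bad", None, None],
)
```

The point of the two views is that each classifier sees only its own feature set, so confident predictions from one can act as extra training labels for the other when labeled stations are scarce.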

Evaluation

• Datasets

Data source                Beijing               Shanghai              Shenzhen              Wuhan
POIs (2012 Q1)             271,634               321,529               107,061               102,467
POIs (2012 Q3)             272,109               317,829               107,171               104,634
Road: #. segments          162,246               171,191               45,231                38,477
Road: highways             1,497km               1,963km               256km                 1,193km
Road: roads                18,525km              25,530km              6,100km               9,691km
Road: #. intersections     49,981                70,293                32,112                25,359
AQI: #. stations           22                    10                    9                     10
AQI: hours                 23,300                8,588                 6,489                 6,741
Time span                  8/24/2012 - 3/8/2013  1/19/2013 - 3/8/2013  2/4/2013 - 3/8/2013   2/4/2013 - 3/8/2013
Urban size (grids)         50 × 50km (2500)      50 × 50km (2500)      57 × 45km (2565)      45 × 25km (1165)

[Figure: locations of the monitoring stations in A) Beijing, B) Shanghai, C) Shenzhen, D) Wuhan]

Evaluation

Overall performance of the co-training

[Figure: comparison of U-Air, DT, Linear, CRF-ALL, Gaussian, ANN-ALL, and Classical for PM10 and NO2; vertical axis from 0.0 to 0.8]

[Figure: SC, TC, and Co-Training versus number of iterations (0 to 160); vertical axis from 0.65 to 0.80]

Status

• Publication at KDD 2013: U-Air: when urban air quality inference meets big data
• Website is publicly available via Azure
• A mobile client "Urban Air" in the WP App store
• A component of Urban Air is in the CityNext platform
• On Bing Map China
• Now working on prediction

http://urbanair.msra.cn/

Thanks!

Yu Zheng
