Driving with Knowledge from the Physical World Jing Yuan, Yu Zheng Microsoft Research Asia.


Driving with Knowledge from
the Physical World
Jing Yuan, Yu Zheng
Microsoft Research Asia
What We Do
Finding the customized and practically fastest driving
route for a particular user using
(Historical and real-time) Traffic conditions
Driver behavior (of taxi drivers and end users)
Physical Routes
Traffic flows
Drivers
Application Scenarios
[Figure: Driver A at 8:30 and Driver B at 13:20]
Application Scenarios
[Figure: Driver B at 13:20]
Log user B’s driving routes for 1 month
Motivation
Taxi drivers are experienced drivers
GPS-equipped taxis are mobile sensors
Human Intelligence
Traffic patterns
Rank  City               Country/Region  Taxicabs
1     Mexico City        Mexico          103,000+
2     Bangkok            Thailand        80,000+
3     Seoul              South Korea     73,000+
4     Beijing            China           67,000
5     Tokyo              Japan           60,000
6     Shanghai           China           50,000+
7     New York City      USA             48,300
8     Buenos Aires       Argentina       45,000
9     Moscow             Russia          40,000 (1,000,000)
10    São Paulo          Brazil          37,000
11    Tianjin            China           35,000
12    Taipei             Taiwan          31,000+
13    New Taipei City    Taiwan          23,500
14    Singapore          Singapore       23,000
15    Osaka              Japan           20,000
16    Hong Kong          China           18,000+
17    Wuhan              China           18,000
18    London             England         17,000
19    Harbin             China           17,000
20    Guangzhou          China           16,000+
21    Shenyang           China           15,000+
22    Paris              France          15,000
What We Do
A time-dependent, user-specific, and self-adaptive
driving directions service using
GPS trajectories of a large number of taxicabs
GPS log of an end user
Physical Routes
Traffic flows
Drivers
System Overview
[Figure: system overview spanning the physical world and the cyber world]
Knowledge discovery
Offline mining: historical trajectories and weather are mined into landmark graphs (weekday and weekend; normal and severe weather)
Online inference: real-time taxi trajectories are used to infer real/future traffic
Service providing
1. Send a query Q=(qs, qd, t, α)
2. Route computing
3. Route downloading
4. Logging the real travel with a GPS trace
5. Learning new α
Offline Mining
Building landmark graphs
Mining taxi drivers’ knowledge
Challenges
Intelligence modeling
Data sparseness
Low-sampling-rate
Offline Mining
Building landmark graphs
[Figure: A) Matched taxi trajectories; B) Detected landmarks; C) A landmark graph]
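The three panels above (matched trajectories, detected landmarks, landmark graph) can be sketched in a few lines. This is only an illustration under stated assumptions: landmarks are taken to be the k road segments traversed by the most trajectories, and two landmarks are linked when enough trajectories visit them consecutively. The function name and the parameters `k` and `min_support` are hypothetical, not names from the slides.

```python
from collections import Counter


def build_landmark_graph(trajectories, k, min_support):
    """Sketch of landmark-graph construction from map-matched trajectories.

    trajectories: list of lists of road-segment ids, in travel order.
    k:            number of landmarks to keep (assumed: most traversed segments).
    min_support:  minimum number of consecutive traversals needed for an edge.
    """
    # Count how many trajectories traverse each road segment.
    freq = Counter(seg for traj in trajectories for seg in set(traj))
    landmarks = {seg for seg, _ in freq.most_common(k)}

    # Count how often two landmarks are visited one right after the other.
    support = Counter()
    for traj in trajectories:
        visited = [seg for seg in traj if seg in landmarks]
        for u, v in zip(visited, visited[1:]):
            if u != v:
                support[(u, v)] += 1

    edges = {pair: n for pair, n in support.items() if n >= min_support}
    return landmarks, edges


if __name__ == "__main__":
    # Made-up trajectories reusing the segment labels from the figure.
    trajs = [["r1", "r3", "r6"], ["r1", "r2", "r3"], ["r3", "r6", "r9"],
             ["r9", "r6"], ["r1", "r3"]]
    print(build_landmark_graph(trajs, k=3, min_support=2))
```

Non-landmark segments between two landmarks are skipped, so an edge such as r1 → r3 also collects the trajectory that went through r2 on the way.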
Mining Taxi Drivers’ Knowledge
Learning travel time distributions for each
landmark edge
Traffic patterns vary in time on an edge
Different landmark edges have different distributions
Differentiate taxi drivers’ experiences
in different regions
Sigmoid learning curve
$f(n_i) = \frac{1}{1 + e^{-(a n_i + b)}}$
[Figure: C) Distributions of travel time; proportion of the 3-5 min, 5-10 min, and 10-14 min ranges by time of day (hour)]
[Figure: sigmoid learning curve; familiarity against a n_i + b (from -6 to 6), with a slow beginning, a steep acceleration, and a plateau]
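The sigmoid learning curve can be evaluated directly. In this minimal sketch, n_i is read as the number of times a driver has traversed region i, which is an assumption (the slide does not define it), and the values of `a` and `b` in the usage lines are made up for illustration.

```python
import math


def familiarity(n_i, a, b):
    """Sigmoid learning curve f(n_i) = 1 / (1 + exp(-(a * n_i + b)))."""
    return 1.0 / (1.0 + math.exp(-(a * n_i + b)))


if __name__ == "__main__":
    a, b = 0.05, -3.0  # hypothetical parameters, not taken from the slides
    for n in (0, 30, 60, 120, 200):
        print(n, round(familiarity(n, a, b), 3))
```

With these parameters the curve stays low for small n_i (slow beginning), rises fastest where a n_i + b is near 0, and flattens toward 1 (plateau).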
Online Inference
Predict future traffic conditions (F) on each landmark edge
based on the historical landmark graph (H) and
the recent GPS trajectories of taxis (R)
using an m-th order Markov chain
[Figure: timeline t1, t2, t3, ..., tn sampled at interval τ; H+R→F; the prediction horizon is φ=hτ, so tn+h=tn+φ]
Online Inference
Model: m-th order Markov chain:
$P(Y_n = y_n \mid Y_1 = y_1, Y_2 = y_2, \ldots, Y_{n-1} = y_{n-1})$
$= P(Y_n = y_n \mid Y_{n-m} = y_{n-m}, \ldots, Y_{n-1} = y_{n-1})$
h-step ahead transition probability
$P(Y_{m+h} = y_{m+h} \mid Y_1 = y_1, Y_2 = y_2, \ldots, Y_m = y_m)$
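The one-step probabilities of an m-th order chain can be estimated by counting. The sketch below assumes the traffic condition on an edge has already been discretized into a sequence of state labels; it uses plain maximum-likelihood counts, and the function name is hypothetical.

```python
from collections import Counter, defaultdict


def fit_markov(sequence, m):
    """Estimate P(Y_n = y | previous m states) from one observed sequence.

    Returns a dict mapping each m-tuple of past states (the context)
    to a dict {next_state: probability}.
    """
    counts = defaultdict(Counter)
    for i in range(m, len(sequence)):
        context = tuple(sequence[i - m:i])
        counts[context][sequence[i]] += 1
    return {
        context: {state: c / sum(nxt.values()) for state, c in nxt.items()}
        for context, nxt in counts.items()
    }


if __name__ == "__main__":
    # Made-up sequence of discretized traffic states (0 = fast, 1 = slow).
    model = fit_markov([0, 1, 0, 1, 1, 0, 1, 0], m=2)
    for context, dist in sorted(model.items()):
        print(context, dist)
```

Contexts that never occur in the sequence get no entry, which is one face of the data sparseness listed among the challenges.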
Online Inference
High-dimensional embedding
Advantage: ℙ(h) and 𝚸(h) can be calculated online (h > 1)
𝚸(1) → ℙ(1) → ℙ(h) → 𝚸(h)
[Figure: the chain Yj, Yj+1, ..., Yj+m in the S space and its embedding in the S^m space; mappings S → S^m and S^m → S]
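One way to read the embedding is that an m-th order chain on S becomes a first-order chain on m-tuples in S^m, so h-step probabilities follow from repeated one-step propagation. The sketch below propagates a distribution over m-tuples h times and then marginalizes back to S. This is equivalent to taking one row of the h-th power of the embedded transition matrix, but it is an illustration under that reading, not necessarily the authors' procedure. The model format (context tuple mapped to next-state probabilities) is an assumption.

```python
from collections import defaultdict


def h_step_distribution(model, context, h):
    """h-step-ahead state distribution of an m-th order Markov chain.

    model:   dict mapping an m-tuple of past states to {next_state: prob}.
    context: the m most recent states, oldest first.
    h:       number of steps ahead (h >= 1).
    """
    dist = {tuple(context): 1.0}  # distribution over states of S^m
    for _ in range(h):
        nxt = defaultdict(float)
        for ctx, p in dist.items():
            # Contexts missing from the model contribute nothing here.
            for state, q in model.get(ctx, {}).items():
                nxt[ctx[1:] + (state,)] += p * q  # shift the window by one
        dist = nxt
    # Map back from S^m to S by keeping only the newest component.
    out = defaultdict(float)
    for ctx, p in dist.items():
        out[ctx[-1]] += p
    return dict(out)


if __name__ == "__main__":
    # Made-up second-order model over two states.
    model = {
        (0, 0): {0: 0.5, 1: 0.5},
        (0, 1): {0: 1.0},
        (1, 0): {1: 1.0},
        (1, 1): {0: 0.5, 1: 0.5},
    }
    print(h_step_distribution(model, (0, 0), h=2))
```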
Route Computing
Rough routing
Given a user query (qs, qd, t, α)
Search a landmark graph for a rough route:
a sequence of landmarks
Using a time-dependent routing algorithm
[Figure: a landmark graph; proportion of travel-time clusters C1 to C5; cumulative distribution of travel time (seconds)]
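Rough routing can be illustrated with a time-dependent variant of Dijkstra's algorithm, in which the cost of an edge depends on the time of arrival at its tail. This sketch assumes that leaving later never gets you there earlier (the FIFO property) and treats `travel_time` as a hypothetical callback; in the setting of the slides it could, for instance, return the travel time at the user's α level of the edge's distribution for that time of day.

```python
import heapq


def rough_route(graph, source, target, depart, travel_time):
    """Earliest-arrival search over a landmark graph.

    graph:       dict {landmark: [neighbor, ...]}.
    depart:      departure time in seconds.
    travel_time: function (u, v, t) -> seconds to traverse edge u->v when
                 entering it at time t (assumed FIFO).
    Returns (list of landmarks, arrival time), or (None, None) if unreachable.
    """
    best = {source: depart}
    prev = {}
    heap = [(depart, source)]
    while heap:
        t, u = heapq.heappop(heap)
        if u == target:
            break
        if t > best.get(u, float("inf")):
            continue  # stale heap entry
        for v in graph.get(u, ()):
            arrival = t + travel_time(u, v, t)
            if arrival < best.get(v, float("inf")):
                best[v] = arrival
                prev[v] = u
                heapq.heappush(heap, (arrival, v))
    if target not in best:
        return None, None
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1], best[target]


if __name__ == "__main__":
    graph = {"qs": ["r1", "r2"], "r1": ["qd"], "r2": ["qd"]}
    base = {("qs", "r1"): 60, ("qs", "r2"): 100,
            ("r1", "qd"): 200, ("r2", "qd"): 50}

    def travel_time(u, v, t):
        # Made-up pattern: r1 -> qd is quick only before t = 50 s.
        if (u, v) == ("r1", "qd") and t < 50:
            return 30
        return base[(u, v)]

    print(rough_route(graph, "qs", "qd", 0, travel_time))
```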
Route Computing
Refined routing
Find the fastest path connecting each pair of consecutive landmarks
Can use speed constraints
Dynamic programming
Very efficient
Smaller search spaces
Computed in parallel
[Figure: A) A rough route; B) The refined routing; C) A fastest path]
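The dynamic-programming step can be sketched as a layered shortest-path problem. The sketch assumes each stage of the rough route offers a few candidate points (for example, the start and end of a landmark road segment) and that a hypothetical `cost` table already holds the fastest travel time between candidates of consecutive stages. Those per-pair costs are independent of one another, which is what would allow them to be computed in parallel.

```python
def refine(layers, cost):
    """Pick one candidate per stage so that the total travel time is minimal.

    layers: list of lists of candidate points; first is [qs], last is [qe].
    cost:   dict {(p, q): travel time} for p, q in consecutive layers.
    Returns (total time, list of chosen points).
    Raises ValueError if no connected sequence of candidates exists.
    """
    # best[p] = (cheapest time to reach p, path used to get there)
    best = {p: (0.0, [p]) for p in layers[0]}
    for layer in layers[1:]:
        nxt = {}
        for q in layer:
            options = [(c + cost[(p, q)], path + [q])
                       for p, (c, path) in best.items() if (p, q) in cost]
            if options:
                nxt[q] = min(options, key=lambda option: option[0])
        best = nxt
    return min(best.values(), key=lambda option: option[0])


if __name__ == "__main__":
    # Made-up candidates and costs.
    layers = [["qs"], ["a1", "a2"], ["b1", "b2"], ["qe"]]
    cost = {("qs", "a1"): 1.0, ("qs", "a2"): 2.0,
            ("a1", "b1"): 4.0, ("a1", "b2"): 1.5,
            ("a2", "b1"): 1.0, ("a2", "b2"): 3.0,
            ("b1", "qe"): 1.0, ("b2", "qe"): 2.0}
    print(refine(layers, cost))
```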
Learning an end user’s drive behavior
Drive behavior
Varies across people and places
Varies as driving experience progresses
Custom factor: α = {α1, α2, ..., αn}
$\alpha_i^{M} = CDF_i(T_i(M))$
Weighted Moving Average:
$\alpha_i^{M+1} = \frac{\sum_{j=1}^{n} j \, \alpha_i^{M-n+j}}{\sum_{j=1}^{n} j} = \frac{2}{n(n+1)} \sum_{j=1}^{n} j \, \alpha_i^{M-n+j}$
[Figure: proportion of travel-time clusters C1 to C5; cumulative distribution of travel time (seconds)]
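The two formulas above translate into a few lines. In this sketch the CDF of an edge is approximated by an empirical CDF over historical travel times, which is an assumption; the function names and the sample numbers are made up.

```python
from bisect import bisect_right


def custom_factor(edge_times, observed):
    """alpha = CDF(observed): share of historical travel times on the edge
    that are less than or equal to the user's observed travel time."""
    ordered = sorted(edge_times)
    return bisect_right(ordered, observed) / len(ordered)


def next_alpha(history, n):
    """Weighted moving average over the last n factors (needs >= 1 value).

    The j-th oldest value in the window gets weight j, so recent trips
    count more; the weights sum to n(n+1)/2.
    """
    window = history[-n:]
    n = len(window)
    return 2.0 / (n * (n + 1)) * sum(j * a for j, a in enumerate(window, start=1))


if __name__ == "__main__":
    times = [150, 197, 200, 250, 272, 300, 350]  # made-up travel times (s)
    print(custom_factor(times, 250))
    print(next_alpha([0.2, 0.4, 0.6], n=3))
```

A driver whose recent trips sit higher in the distribution than earlier ones therefore ends up with a prediction pulled toward the recent values.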
Evaluations
Evaluation on traffic prediction
Datasets
Beijing taxi trajectories (on landmark graphs)
Singapore traffic data (on road segments)
Baselines
H method (T-Drive[GIS’10])
R method (ARIMA with AIC criterion)
Measurement: RMSE = $\sqrt{\frac{1}{M} \sum_i (x_i - \hat{x}_i)^2}$
Evaluation on the self-adaptive routing
Datasets
Beijing taxi trajectories
Two users’ GPS logs of 1 year
Baseline: T-Drive[GIS’10]
Measurement: absolute percentage error (APE)
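Both measurements are short to compute. The sketch below uses the usual definitions: RMSE as the square root of the mean squared difference, and APE as the absolute difference divided by the actual value. The slides do not spell out the APE formula, so that part is an assumption.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    return math.sqrt(
        sum((x - y) ** 2 for x, y in zip(actual, predicted)) / len(actual)
    )


def ape(actual, estimated):
    """Absolute percentage error of one estimate, as a fraction of actual."""
    return abs(estimated - actual) / actual


if __name__ == "__main__":
    # Made-up travel times in seconds.
    print(rmse([100, 200], [110, 190]))
    print(ape(200, 250))
```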
Evaluation – Beijing Datasets
Beijing Taxi Trajectories
33,000 taxis in 3 months
Total distance: 400 million km
Total number of points: 790M
Average sampling interval:
3.1 minutes, 600 meters
Beijing Road Network
106,579 road nodes
141,380 road segments
Driving history of users
GPS trajectories from GeoLife
project (Data released)
Evaluations – Singapore Dataset
For evaluating traffic prediction on road segments
We select 50 road segments with a 43-day history of traffic conditions
Each road segment is associated with an aggregated speed
Average update interval: 26 minutes
Evaluation on Traffic Prediction
Beijing Taxi Trajectories
[Figures: RMSE (s) of H (T-Drive), R (ARIMA), and H+R (Ours) by time of day (2pm to 7pm) and over a range of 30 to 240 (min)]
[Figure: proportion of residual error (s), from -300 to 300, for settings with m = 1 and m = 2]
Two months for offline, 12 days for online (6 weekdays, 6 weekends)
[Figure: A) A simple road; B) The traffic patterns on the simple road, speed (km/h) from 11/29 to 12/3]
[Table: Methods H (T-Drive), R (ARIMA), and Ours (H+R) at 90 min and 120 min; values as extracted: 0.643, 0.651, 0.683, 0.688, 0.689, 0.686]
Evaluation on Traffic Prediction
The Singapore Dataset
Overall Precision (over φ)
[Figure: RMSE (km/h) of H (T-Drive), R (ARIMA), and H+R (Ours) over a range of 60 to 180 minutes]
[Figure: A) A complex road; B) Street view of the complex road; C) The traffic patterns on the complex road, speed (km/h) from 11/29 to 12/3]
Datasets    On all segments   On good segments
Beijing     3.1               8.7
Singapore   1.9               2.5
Evaluation on Routing
User A on different routes
[Figure: APE against number of traverses on Route A and Route B; Self-adaptive compared with T-Drive at fixed settings of 1.0, 0.5, and 0.3]
Two users on the same route
[Figure: APE against number of traverses for User A and User B; Self-adaptive compared with T-Drive at a fixed setting of 1.0; annotated with (0.68, 0.16, 0.48, ...) and (0.25, 0.11, 0.58, ...)]
Conclusion
Model traffic patterns and taxi drivers’ intelligence with
landmark graphs
Historical + Real time → Future (m-th order Markov model)
Two-stage routing algorithm
Self-adaptive to a user’s drive behavior
The practically fastest path is
Time-dependent
User-specific (for a particular user)
Self-adaptive
Released Datasets:
T-Drive: taxi trajectories
GeoLife: user-generated GPS trajectories
Thanks!
Yu Zheng
[email protected]