On the Effect of Trajectory Compression in Spatio-temporal Querying. Elias Frentzos and Yannis Theodoridis, Data Management Group, University of Piraeus, http://isl.cs.unipi.gr/db. ADBIS, October 2, 2007.


On the Effect of Trajectory
Compression in Spatio-temporal
Querying
Elias Frentzos, and Yannis Theodoridis
Data Management Group, University of Piraeus
http://isl.cs.unipi.gr/db
ADBIS, October 2 2007
Talk Outline

Problem Statement

Background

Compressing Trajectories

Related work on Error Estimation

Estimating the Effect of Compression in ST Querying

Evaluating the Effect of Compression in ST Querying

Experimental Results

    On the performance

    On the quality

Conclusions and Future Work
Frentzos and Theodoridis, ADBIS 2007
On the Effect of Trajectory Compression in Spatiotemporal Querying
Problem Statement (1)

A trajectory is the data obtained from a moving point object and can be
seen as a string in 3D space (x, y, t)

Trajectory compression is a very promising field since moving objects
recording their position in time produce large amounts of frequently
redundant data

Existing work on trajectory compression is mainly driven by research
advances in the fields of line generalization and time series
compression.

Our interest is in lossy compression techniques which eliminate some
repeated or unnecessary information under well-defined error bounds.
Problem Statement (2)

The objectives for trajectory compression are:

    To obtain a lasting reduction in data size;
    To obtain a data series that still allows various computations at
    acceptable (low) complexity;
    To obtain a data series with known, small margins of error,
    which are preferably parametrically adjustable.

Our goal is to calculate the mean error introduced in query results
over compressed trajectory data, which is by no means a
trivial task

We argue that this mean error can be used for deciding whether
the compressed data are suitable for the user's needs

We restrict our discussion to a special type of spatiotemporal
query, the timeslice query
Compressing Trajectories: SED

Methods exploiting line simplification algorithms for compressing a
trajectory are based on the so-called Synchronous Euclidean
Distance (SED)

[Figure: the SED between a sampled point Pi(xi, yi, ti) and its time-synchronized position Pi'(xi', yi', ti) on the segment from Ps(xs, ys, ts) to Pe(xe, ye, te)]

SED is the distance between the sampled point Pi(xi, yi, ti) under
examination and the point Pi'(xi', yi', ti) of the line (Ps, Pe) where the
moving object would lie, had it been moving along this line, at the
time instance ti of the point under examination
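
The SED definition above can be sketched in a few lines. This is only an illustration, assuming each point is an (x, y, t) tuple and that the segment endpoints have distinct timestamps; the function name `sed` is hypothetical and not taken from the talk.

```python
import math


def sed(ps, pe, pi):
    """Synchronous Euclidean Distance of sample pi from segment (ps, pe).

    Each point is an (x, y, t) tuple; assumes ps and pe have different timestamps.
    """
    xs, ys, ts = ps
    xe, ye, te = pe
    xi, yi, ti = pi
    ratio = (ti - ts) / (te - ts)
    # Position Pi' the object would have at time ti if it moved along (Ps, Pe).
    xi_sync = xs + ratio * (xe - xs)
    yi_sync = ys + ratio * (ye - ys)
    return math.hypot(xi - xi_sync, yi - yi_sync)


# The perpendicular distance of (2, 3) from the x-axis is 3, but at t = 5 the
# object would be at (5, 0), so the SED is larger.
print(sed((0, 0, 0), (10, 0, 10), (2, 3, 5)))
```

Unlike the perpendicular distance, the result depends on the timestamp of the sample, which is what makes the measure spatiotemporal.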
Compressing Trajectories: TD-TR algorithm

The TD-TR algorithm (Meratnia and de By, EDBT 2004) is a
spatiotemporal extension of the well-known top-down Douglas-Peucker
algorithm, which was originally used in cartography

The algorithm preserves directional trends in
the approximated line using a distance threshold

[Figure: line approximation between points A and B]

The TD-TR algorithm uses SED instead of the perpendicular
distance

It is a batch algorithm, since it requires the full line at its start
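
A minimal sketch of the top-down idea, assuming a trajectory is a list of (x, y, t) tuples with strictly increasing timestamps. The names `sed` and `td_tr` are illustrative, and the recursion follows the generic Douglas-Peucker scheme with SED as the distance; it is not claimed to match the authors' implementation.

```python
import math


def sed(ps, pe, pi):
    """Synchronous Euclidean Distance of sample pi from segment (ps, pe)."""
    ratio = (pi[2] - ps[2]) / (pe[2] - ps[2])
    x_sync = ps[0] + ratio * (pe[0] - ps[0])
    y_sync = ps[1] + ratio * (pe[1] - ps[1])
    return math.hypot(pi[0] - x_sync, pi[1] - y_sync)


def td_tr(points, threshold):
    """Return the samples kept by top-down simplification with SED."""
    if len(points) <= 2:
        return list(points)
    first, last = points[0], points[-1]
    # Interior sample with the largest SED from the segment (first, last).
    split, worst = max(
        ((k, sed(first, last, points[k])) for k in range(1, len(points) - 1)),
        key=lambda pair: pair[1],
    )
    if worst <= threshold:
        return [first, last]
    # Otherwise split at that sample and simplify both halves recursively.
    left = td_tr(points[: split + 1], threshold)
    right = td_tr(points[split:], threshold)
    return left[:-1] + right  # drop the duplicated split point


trajectory = [(0, 0, 0), (1, 0, 1), (2, 0, 2), (3, 2, 3), (4, 0, 4)]
print(td_tr(trajectory, 0.5))
```

The whole list must be available before the first split can be chosen, which is why the approach is a batch one.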
Compressing Trajectories: OPW-TR algorithm

Opening window (OW) algorithms anchor the start point of
a potential segment, and then attempt to approximate the
subsequent data series with increasingly longer segments.

The algorithm also preserves directional trends in
the approximated line using a distance threshold

[Figure: opening window over points A, B and C]

The OPW-TR algorithm (Meratnia and de By, EDBT 2004) also
uses SED instead of the perpendicular distance

It can be used as an online algorithm
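
A rough sketch of an opening-window pass with SED, under the same (x, y, t) tuple assumption. When the threshold is violated, this version closes the segment at the sample just before the current window end; that break-point policy is an assumption made for illustration, and the names `sed` and `opw_tr` are hypothetical.

```python
import math


def sed(ps, pe, pi):
    """Synchronous Euclidean Distance of sample pi from segment (ps, pe)."""
    ratio = (pi[2] - ps[2]) / (pe[2] - ps[2])
    x_sync = ps[0] + ratio * (pe[0] - ps[0])
    y_sync = ps[1] + ratio * (pe[1] - ps[1])
    return math.hypot(pi[0] - x_sync, pi[1] - y_sync)


def opw_tr(points, threshold):
    """Return the samples kept by an opening-window pass with SED."""
    if len(points) <= 2:
        return list(points)
    kept = [points[0]]
    anchor, end = 0, 2
    while end < len(points):
        # Does any sample inside the window stray too far from (anchor, end)?
        violated = any(
            sed(points[anchor], points[end], points[k]) > threshold
            for k in range(anchor + 1, end)
        )
        if violated:
            # Close the segment one sample before the window end and re-anchor.
            anchor = end - 1
            kept.append(points[anchor])
            end = anchor + 2
        else:
            end += 1  # keep opening the window
    kept.append(points[-1])
    return kept


trajectory = [(0, 0, 0), (1, 0, 1), (2, 0, 2), (3, 2, 3), (4, 0, 4)]
print(opw_tr(trajectory, 0.5))
```

Each decision only looks at samples up to the current window end, so the loop can consume samples as they arrive.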
Related work on Error Estimation
The only related work estimates the average value
of the Synchronous Euclidean Distance (SED), also
termed Synchronous Error, between an original
trajectory p and its approximation q.

[Figure: an original trajectory p and its approximation q over the x and t axes, between timestamps t1 and tn]

\[
AvgE(p, q) = \sum_{k=1}^{n-1} \int_{t_k}^{t_{k+1}} E_{p,q}(t)\, dt
\]

\[
\int_{t_k}^{t_{k+1}} E_{p,q}(t)\, dt = \left[ \frac{2at + b}{4a} \sqrt{at^2 + bt + c} - \frac{b^2 - 4ac}{8a\sqrt{a}} \operatorname{arcsinh}\left( \frac{2at + b}{\sqrt{4ac - b^2}} \right) \right]_{t_k}^{t_{k+1}}
\]

There is no obvious way to use it in order
to determine the error introduced in query results
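
The closed form above is the antiderivative of the square root of a quadratic in t, which is the shape the distance between two linearly moving points takes within one interval. The sketch below cross-checks it against numerical integration; it assumes a > 0 and 4ac - b^2 > 0, and the function names and coefficient values are arbitrary illustrations.

```python
import math


def sqrt_quadratic_integral(a, b, c, t_lo, t_hi):
    """Integral of sqrt(a*t^2 + b*t + c) over [t_lo, t_hi].

    Assumes a > 0 and 4ac - b^2 > 0, so the arcsinh argument is real.
    """
    disc = 4 * a * c - b * b

    def antiderivative(t):
        root = math.sqrt(a * t * t + b * t + c)
        # -(b^2 - 4ac) / (8a*sqrt(a)) is written here as +disc / (8a*sqrt(a)).
        return ((2 * a * t + b) / (4 * a)) * root + disc / (
            8 * a * math.sqrt(a)
        ) * math.asinh((2 * a * t + b) / math.sqrt(disc))

    return antiderivative(t_hi) - antiderivative(t_lo)


def simpson(f, lo, hi, steps=1000):
    """Composite Simpson rule (even number of steps), used only as a cross-check."""
    h = (hi - lo) / steps
    total = f(lo) + f(hi)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f(lo + k * h)
    return total * h / 3


closed = sqrt_quadratic_integral(2.0, 1.0, 3.0, 0.0, 1.0)
numeric = simpson(lambda t: math.sqrt(2.0 * t * t + t + 3.0), 0.0, 1.0)
print(closed, numeric)
```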
Estimating the Effect of Compression in ST
Querying: Preliminaries

Our goal is to provide closed-form formulas that estimate the number of
false hits introduced in query results over compressed trajectory datasets

Among the query types executed against trajectory datasets, we focus on a
special type of range query, the so-called timeslice query

Two types of errors are introduced in query results when executing a
timeslice query over a compressed trajectory dataset

    false negatives are the trajectories
    which originally qualified for the query
    but whose compressed counterparts
    were not retrieved

    false positives are the compressed
    trajectories retrieved by the query
    while their original counterparts
    do not qualify for it

[Figure: timeslice query windows Q1 to Q5 over four trajectories (labelled 1 to 4) in (x, y, t) space, with timestamps t1 to t6]
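
The two error types can be counted directly for a single timeslice query. The sketch below is an illustration only: it assumes trajectories are lists of (x, y, t) tuples with strictly increasing timestamps, that positions between samples are linearly interpolated, and that a window is given as (x_lo, y_lo, x_hi, y_hi). All function names are hypothetical.

```python
from bisect import bisect_right


def position_at(traj, t):
    """Linearly interpolated (x, y) at time t, or None outside the lifespan."""
    times = [p[2] for p in traj]
    if not times[0] <= t <= times[-1]:
        return None
    k = min(bisect_right(times, t), len(traj) - 1)
    (x0, y0, t0), (x1, y1, t1) = traj[k - 1], traj[k]
    r = (t - t0) / (t1 - t0)
    return x0 + r * (x1 - x0), y0 + r * (y1 - y0)


def inside(pos, window):
    """True if pos lies in the window (x_lo, y_lo, x_hi, y_hi)."""
    if pos is None:
        return False
    x_lo, y_lo, x_hi, y_hi = window
    return x_lo <= pos[0] <= x_hi and y_lo <= pos[1] <= y_hi


def false_hits(originals, compressed, window, t):
    """Count (false negatives, false positives) of one timeslice query."""
    fn = fp = 0
    for orig, comp in zip(originals, compressed):
        hit_orig = inside(position_at(orig, t), window)
        hit_comp = inside(position_at(comp, t), window)
        fn += hit_orig and not hit_comp  # original qualifies, compressed missed
        fp += hit_comp and not hit_orig  # compressed retrieved, original does not qualify
    return fn, fp


original = [(0, 0, 0), (1, 1, 1), (2, 0, 2)]
compressed = [(0, 0, 0), (2, 0, 2)]
print(false_hits([original], [compressed], (0.5, 0.5, 1.5, 1.5), 1))
```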
Estimating the Effect of Compression in ST
Querying: Analysis (1)

We first calculate AvgP_{i,P} / AvgP_{i,N}, which is the average probability of a
single compressed trajectory being retrieved as a false positive / negative,
over all possible timeslice query windows with sides a × b

We then sum up these average probabilities over all dataset trajectories in
order to produce the global average probability

\[
E_P(R_{a \times b}) = \sum_{i=1}^{n} AvgP_{i,P}(R_{a \times b}), \qquad
E_N(R_{a \times b}) = \sum_{i=1}^{n} AvgP_{i,N}(R_{a \times b})
\]

The error introduced in the position of a trajectory can
be calculated as a function of time

\[
\delta x_i(t) = \delta x_{i,k} + (t - t_{i,k}) \frac{\delta x_{i,k+1} - \delta x_{i,k}}{t_{i,k+1} - t_{i,k}}, \qquad
\delta y_i(t) = \delta y_{i,k} + (t - t_{i,k}) \frac{\delta y_{i,k+1} - \delta y_{i,k}}{t_{i,k+1} - t_{i,k}}
\]

[Figure: a query window Wj with sides a × b at t = tj, showing the position errors δx1(tj), δx2(tj), δy1(tj), δy2(tj) and the point p2,j with its counterpart p'2,j]
Estimating the Effect of Compression in ST
Querying: Analysis (2)

We calculate the average probability of a compressed trajectory Ti being retrieved
as a false positive / negative by a timeslice query window at timestamp tj

The set of timeslice query windows that may retrieve a compressed trajectory
as a false positive / negative at timestamp tj can be determined geometrically

We distinguish among 4 cases, regarding the signs of the δx and δy values

[Figure: the case δxi,j < 0, δyi,j > 0 for a query window W at timestamp tj inside the unit space [0,1] × [0,1], with the area Ai,j marked]

\[
A_{i,j} = a \cdot b - (a - |\delta x_{i,j}|) \cdot (b - |\delta y_{i,j}|)
\]

Finally, by integrating the area Ai,j over all the timestamps inside the unit space
we obtain AvgPi,P / AvgPi,N
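
The geometric argument can be checked numerically. The sketch below reads the area as a·b minus the overlap of two a × b rectangles of window positions, one around the original point and one around its compressed counterpart displaced by (δx, δy). Clamping the overlap at zero for displacements larger than the window is an added assumption, and the brute-force grid is only a sanity check with arbitrary sample values.

```python
def false_hit_area(a, b, dx, dy):
    """Area of window positions covering the original point but not the
    compressed one (the same area holds with the roles swapped).

    Assumption: if |dx| >= a or |dy| >= b the overlap is clamped to zero.
    """
    overlap = max(a - abs(dx), 0.0) * max(b - abs(dy), 0.0)
    return a * b - overlap


def grid_estimate(a, b, dx, dy, n=200):
    """Brute-force check over lower-left corners of windows covering the original."""
    px, py = 0.5, 0.5          # original position (arbitrary)
    qx, qy = px + dx, py + dy  # compressed position
    misses = 0
    for i in range(n):
        u = px - a + (i + 0.5) * a / n
        for j in range(n):
            v = py - b + (j + 0.5) * b / n
            covers_q = u <= qx <= u + a and v <= qy <= v + b
            misses += not covers_q
    return a * b * misses / (n * n)


print(false_hit_area(0.2, 0.1, 0.05, -0.02), grid_estimate(0.2, 0.1, 0.05, -0.02))
```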
Estimating the Effect of Compression in ST
Querying: Analysis (3)

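Summing up the average probabilities of all trajectories and performing the
necessary calculations, we obtain:

\[
E_N(R_{a \times b}) = E_P(R_{a \times b}) = \sum_{i=1}^{n} \sum_{k=1}^{m_i - 1} \frac{t_{i,k+1} - t_{i,k}}{(1 + a) \cdot (1 + b)} \left( \frac{b \left( |\delta x_{i,k}| + |\delta x_{i,k+1}| \right)}{2} + \frac{a \left( |\delta y_{i,k}| + |\delta y_{i,k+1}| \right)}{2} - \frac{e}{6} \right)
\]

where

\[
e = 2 |\delta x_{i,k}| |\delta y_{i,k}| + 2 |\delta x_{i,k+1}| |\delta y_{i,k+1}| + |\delta x_{i,k}| |\delta y_{i,k+1}| + |\delta x_{i,k+1}| |\delta y_{i,k}|
\]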
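
The double sum can be evaluated directly once the per-timestamp errors are known. The sketch below is a hedged reading of the formula: it assumes coordinates and time are normalized to the unit space, that the normalizing denominator is (1 + a)(1 + b), and that each trajectory is described by a hypothetical list of (t, δx, δy) tuples, one per original timestamp.

```python
def expected_false_hits(error_series, a, b):
    """Estimated average number of false negatives (equal to false positives).

    error_series: one list per trajectory of (t, dx, dy) tuples, where dx, dy
    are the position errors of the compressed trajectory at timestamp t.
    Assumes unit space and a (1 + a)(1 + b) normalization.
    """
    total = 0.0
    for series in error_series:
        for (t0, dx0, dy0), (t1, dx1, dy1) in zip(series, series[1:]):
            dx0, dy0, dx1, dy1 = abs(dx0), abs(dy0), abs(dx1), abs(dy1)
            # e is 6x the average of |dx| * |dy| over a linearly interpolated segment.
            e = 2 * dx0 * dy0 + 2 * dx1 * dy1 + dx0 * dy1 + dx1 * dy0
            total += (t1 - t0) * (
                b * (dx0 + dx1) / 2 + a * (dy0 + dy1) / 2 - e / 6
            )
    return total / ((1 + a) * (1 + b))


# One trajectory with a constant error of (0.05, 0.02) over the whole unit time.
series = [(0.0, 0.05, 0.02), (0.5, 0.05, 0.02), (1.0, 0.05, 0.02)]
print(expected_false_hits([series], a=0.2, b=0.1))
```

With a constant error, each segment contributes b|δx| + a|δy| - |δx||δy| per unit of time, which equals a·b - (a - |δx|)(b - |δy|), so the sum agrees with the per-timestamp area argument.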
Evaluating the Effect of Compression in ST
Querying

The evaluation of this formula is a costly O(nm) operation; its calculation
requires processing the entire original dataset along with its compressed
counterpart

However, any compression algorithm evaluating SED also needs to calculate
δxi,k and δyi,k at every timestamp

\[
SED_i(t) = \sqrt{\delta x_i(t)^2 + \delta y_i(t)^2}
\]

As a consequence, the evaluation of the average error in the query results
can be integrated into the compression algorithm, introducing only a small
overhead on its execution

\[
E_N(R_{a \times b}) = E_P(R_{a \times b}) = \sum_{i=1}^{n} \sum_{k=1}^{m_i - 1} \frac{t_{i,k+1} - t_{i,k}}{(1 + a) \cdot (1 + b)} \left( \frac{b \left( |\delta x_{i,k}| + |\delta x_{i,k+1}| \right)}{2} + \frac{a \left( |\delta y_{i,k}| + |\delta y_{i,k+1}| \right)}{2} - \frac{e}{6} \right)
\]
Experimental Study: Settings

Datasets

    One real trajectory dataset of a fleet of trucks (273 trajectories, 112K
    entries)

    A synthetic dataset of 2000 trajectories generated using a network-based
    data generator and the San Joaquin road network

Implementation

    We implemented the TD-TR algorithm and compressed the real and
    synthetic datasets, varying its threshold

Experiments

    Average overhead introduced in the TD-TR algorithm

    Average number of false positives and false negatives in 10000
    randomly distributed timeslice queries
Experimental Study: On the performance



Scaling the value of the TD-TR threshold

The algorithm’s execution time
decreases as the value of the TD-TR
threshold increases

The overhead introduced in the
algorithm’s execution is typically
small (below 7%)

In absolute times, the overhead
introduced never exceeds 0.2
milliseconds per trajectory

[Figures: execution time (msec) against the TD-TR threshold (0.001 to 0.02), with model calculations included and excluded; one panel for the Trucks dataset and one for the Synthetic dataset]
Experimental Study: On the quality (1)



Scaling the value of the TD-TR threshold

The average number of false hits
(negatives and positives) grows linearly with
the value of the TD-TR compression
threshold

The average error in the estimation
for the synthetic dataset is around
6%, varying between 0.2% and 14%

In the trucks dataset the average
error increases to around 10.6%, mainly
due to the error introduced at small
values of the TD-TR threshold

[Figures: average false hits (false negatives, false positives and the estimation) against the TD-TR threshold (0.001 to 0.02); one panel for the Trucks dataset and one for the Synthetic dataset]
Experimental Study: On the quality (2)
Scaling the query size

The average number of false hits
(negatives and positives) is sub-linear
in the size of the query

The average error in the estimation
for the synthetic dataset is around
2.9%, varying between 0.2% and
8.7%

In the trucks dataset the average
error increases to around 7.5%

[Figures: average false hits (false negatives, false positives and the estimation) against the query size (a = b, 0.05 to 0.3); one panel for the Trucks dataset and one for the Synthetic dataset]
Summary and Future Work

We provided a closed-form formula for the average number of false
negatives and false positives, covering the case of uniformly distributed
query windows and arbitrarily distributed trajectory data

Through an experimental study we demonstrated the efficiency of the
proposed model

    We illustrated the applicability of our model under real-life requirements:
    it turns out that the estimation of the model parameters introduces only a
    small overhead in the trajectory compression algorithm

    We presented the accuracy of our estimations, with an average error
    of around 6%.

Future work:

    Extension of our model to nearest neighbor and general range queries

    Applicability of our model in the case of spatiotemporal warehouses
Acknowledgements

Research partially supported by:

GEOPKDD (“Geographic Privacy-aware Knowledge Discovery and
Delivery”) project funded by the European Community under contract FP6-014915
On the Effect of Trajectory Compression in
Spatiotemporal Querying
Thank you!