On the Effect of Trajectory Compression in Spatio-temporal Querying
Elias Frentzos and Yannis Theodoridis
Data Management Group, University of Piraeus
http://isl.cs.unipi.gr/db
ADBIS, October 2 2007
Talk Outline
- Problem Statement
- Background
  - Compressing Trajectories
  - Related work on Error Estimation
- Estimating the Effect of Compression in ST Querying
- Evaluating the Effect of Compression in ST Querying
- Experimental Results
  - On the performance
  - On the quality
- Conclusions and Future Work

Problem Statement (1)
- A trajectory is the data obtained from a moving point object and can be seen as a string in 3D space.
- Trajectory compression is a very promising field, since moving objects that record their position over time produce large amounts of frequently redundant data.
- Existing work on trajectory compression is mainly driven by research advances in the fields of line generalization and time series compression.
- Our interest is in lossy compression techniques, which eliminate some repeated or unnecessary information under well-defined error bounds.

Problem Statement (2)
- The objectives for trajectory compression are:
  - to obtain a lasting reduction in data size;
  - to obtain a data series that still allows various computations at acceptable (low) complexity;
  - to obtain a data series with known, small margins of error, which are preferably parametrically adjustable.
- Our goal is to calculate the mean error introduced in query results over compressed trajectory data, which is by no means a trivial task.
- We argue that this mean error can be used for deciding whether the compressed data are suitable for the user's needs.
- We restrict our discussion to a special type of spatiotemporal query, the timeslice query.

Compressing Trajectories: SED
- Methods that exploit line simplification algorithms to compress a trajectory are based on the so-called Synchronous Euclidean Distance (SED).
- SED is the distance between the sampled point $P_i(x_i, y_i, t_i)$ under examination and the point $P_i'(x_i', y_i', t_i)$ of the line $(P_s, P_e)$ where the moving object would lie at time instance $t_i$, had it been moving along this line.
[Figure: SED(P, P') between the sample $P_i(x_i, y_i, t_i)$ and its synchronized position $P_i'(x_i', y_i', t_i)$ on the segment from $P_s(x_s, y_s, t_s)$ to $P_e(x_e, y_e, t_e)$]

Compressing Trajectories: TD-TR algorithm
- The TD-TR algorithm (Meratnia and de By, EDBT 2004) is a spatiotemporal extension of the well-known top-down Douglas-Peucker algorithm, which was originally used in cartography.
- The algorithm preserves directional trends in the approximated line using a distance threshold.
- The TD-TR algorithm uses SED instead of the perpendicular distance.
- It is a batch algorithm, since it requires the full line at its start.
[Figure: line approximation between endpoints A and B]
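The SED definition and the top-down splitting described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names `sed` and `td_tr`, the `(x, y, t)` tuple layout, and the recursive formulation are assumptions made for the example.

```python
import math


def sed(p, ps, pe):
    """Synchronous Euclidean Distance of sample p from segment (ps, pe).

    Every point is an (x, y, t) tuple (an assumed layout). The reference
    position is where the object would be at time p[2] if it moved at
    constant speed from ps to pe.
    """
    x, y, t = p
    xs, ys, ts = ps
    xe, ye, te = pe
    if te == ts:
        # Degenerate segment with no elapsed time: compare against ps.
        return math.hypot(x - xs, y - ys)
    ratio = (t - ts) / (te - ts)
    x_sync = xs + ratio * (xe - xs)
    y_sync = ys + ratio * (ye - ys)
    return math.hypot(x - x_sync, y - y_sync)


def td_tr(points, threshold):
    """Top-down simplification in the style of Douglas-Peucker, using SED."""
    if len(points) <= 2:
        return list(points)
    ps, pe = points[0], points[-1]
    # Find the interior sample with the largest SED from segment (ps, pe).
    worst_idx, worst_dist = 0, -1.0
    for idx in range(1, len(points) - 1):
        d = sed(points[idx], ps, pe)
        if d > worst_dist:
            worst_idx, worst_dist = idx, d
    if worst_dist <= threshold:
        # Every interior sample is close enough: keep only the endpoints.
        return [ps, pe]
    # Otherwise split at the worst sample and simplify both halves.
    left = td_tr(points[: worst_idx + 1], threshold)
    right = td_tr(points[worst_idx:], threshold)
    return left[:-1] + right
```

Because the distance is synchronous, a trajectory that is spatially straight but changes speed is not collapsed to its endpoints, which a purely perpendicular distance would do. The recursion needs the whole trajectory up front, which is what makes this a batch method.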
Compressing Trajectories: OPW-TR algorithm
- Opening window (OW) algorithms anchor the start point of a potential segment and then attempt to approximate the subsequent data series with increasingly longer segments.
- The algorithm also preserves directional trends in the approximated line using a distance threshold.
- The OPW-TR algorithm (Meratnia and de By, EDBT 2004) also uses SED instead of the perpendicular distance.
- It can be used as an online algorithm.
[Figure: opening window over points A, B and C]

Related work on Error Estimation
- The only related work estimates the average value of the Synchronous Euclidean Distance (SED), also termed Synchronous Error, between an original trajectory and its approximation.
- For a trajectory $p$ and its approximation $q$, sampled at timestamps $t_1, \dots, t_n$, the average synchronous error is

$$AvgE(p,q) = \frac{\sum_{k=1}^{n-1} \int_{t_k}^{t_{k+1}} E_{p,q}(t)\,dt}{t_n - t_1}$$

  where, with $E_{p,q}(t) = \sqrt{at^2 + bt + c}$ on each segment,

$$\int_{t_k}^{t_{k+1}} E_{p,q}(t)\,dt = \left[ \frac{2at+b}{4a}\sqrt{at^2+bt+c} - \frac{b^2-4ac}{8a\sqrt{a}}\,\operatorname{arcsinh}\!\left(\frac{2at+b}{\sqrt{4ac-b^2}}\right) \right]_{t_k}^{t_{k+1}}$$

- There is no obvious way to use it in order to determine the error introduced in query results.
[Figure: trajectory p and its approximation q between $t_1$ and $t_n$, plotted over x and t]

Estimating the Effect of Compression in ST Querying: Preliminaries
- Our goal is to provide closed-form formulas that estimate the number of false hits introduced in query results over compressed trajectory datasets.
- Among the query types executed against trajectory datasets, we focus on a special type of range query, the so-called timeslice query.
- Two types of errors are introduced in query results when executing a timeslice query over a compressed trajectory dataset:
  - false negatives are the trajectories which originally qualified the query but whose compressed counterparts were not retrieved;
  - false positives are the compressed trajectories retrieved by the query while their original counterparts do not qualify it.
[Figure: timeslice queries Q1 to Q5 over trajectories 1 to 4 at timestamps t1 to t6, in (x, y, t) space]

Estimating the Effect of Compression in ST Querying: Analysis (1)
- We first calculate $AvgP_{i,P}$ / $AvgP_{i,N}$, the average probability of a single compressed trajectory to be retrieved as false positive / negative, regarding all possible timeslice query windows with sides $a \times b$.
- We then sum up these average probabilities of all dataset trajectories in order to produce the global average probability:

$$E_P(R_{a\times b}) = \sum_{i=1}^{n} AvgP_{i,P}(R_{a\times b}), \qquad E_N(R_{a\times b}) = \sum_{i=1}^{n} AvgP_{i,N}(R_{a\times b})$$

- The error introduced in the position of a trajectory can be calculated as a function of time, for $t_{i,k} \le t \le t_{i,k+1}$:

$$\delta x_i(t) = \delta x_{i,k} + \frac{\delta x_{i,k+1} - \delta x_{i,k}}{t_{i,k+1} - t_{i,k}}\,(t - t_{i,k}), \qquad \delta y_i(t) = \delta y_{i,k} + \frac{\delta y_{i,k+1} - \delta y_{i,k}}{t_{i,k+1} - t_{i,k}}\,(t - t_{i,k})$$

[Figure: query window $W_j$ with sides $a \times b$ at $t = t_j$, showing the displacements $\delta x(t_j)$ and $\delta y(t_j)$ between original and compressed positions]

Estimating the Effect of Compression in ST Querying: Analysis (2)
- We calculate the average probability of a compressed trajectory $T_i$ to be retrieved as false positive / negative regarding a timeslice query window at timestamp $t_j$.
- The set of timeslice query windows that may retrieve a compressed trajectory as false positive / negative at timestamp $t_j$ can be extracted geometrically.
- We distinguish among 4 cases, regarding the signs of the $\delta x$ and $\delta y$ values. In each case the area covered by such windows is

$$A_{i,j} = a\,b - (a - |\delta x_{i,j}|)\,(b - |\delta y_{i,j}|)$$

[Figure: the case $\delta x_{i,j} < 0$, $\delta y_{i,j} > 0$ in the unit space $[0,1] \times [0,1]$ at $t_j$]
- Finally, by integrating the area $A_{i,j}$ over all the timestamps inside the unit space, we obtain $AvgP_{i,P}$ / $AvgP_{i,N}$.

Estimating the Effect of Compression in ST Querying: Analysis (3)
- Summing up the average probabilities of all trajectories and performing the necessary calculations, we obtain:

$$E_N(R_{a\times b}) = E_P(R_{a\times b}) = \sum_{i=1}^{n} \sum_{k=1}^{m_i-1} \frac{t_{i,k+1} - t_{i,k}}{(1+a)(1+b)} \left( \frac{b\,(|\delta x_{i,k}| + |\delta x_{i,k+1}|)}{2} + \frac{a\,(|\delta y_{i,k}| + |\delta y_{i,k+1}|)}{2} - \frac{e_{i,k}}{6} \right)$$

  where

$$e_{i,k} = 2\,|\delta x_{i,k}\,\delta y_{i,k}| + 2\,|\delta x_{i,k+1}\,\delta y_{i,k+1}| + |\delta x_{i,k}\,\delta y_{i,k+1}| + |\delta x_{i,k+1}\,\delta y_{i,k}|$$
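The closed-form sum from Analysis (3) can be checked against a direct numerical integration of the window area from Analysis (2). The sketch below is illustrative only: the function names, the `(t, dx, dy)` input layout, and the midpoint-rule check are assumptions for the example. It also assumes that all coordinates are normalized to the unit space, that the displacements keep their sign within a segment, and that they stay smaller than the window sides.

```python
def segment_term(t0, dx0, dy0, t1, dx1, dy1, a, b):
    """Closed-form contribution of one segment to the expected false hits."""
    dx0, dy0, dx1, dy1 = abs(dx0), abs(dy0), abs(dx1), abs(dy1)
    e = 2 * dx0 * dy0 + 2 * dx1 * dy1 + dx0 * dy1 + dx1 * dy0
    bracket = b * (dx0 + dx1) / 2 + a * (dy0 + dy1) / 2 - e / 6
    return (t1 - t0) * bracket / ((1 + a) * (1 + b))


def expected_false_hits(deltas, a, b):
    """Sum the segment terms over all trajectories.

    deltas holds one list per trajectory; each entry is (t, dx, dy), the
    displacement between original and compressed position at timestamp t.
    """
    total = 0.0
    for traj in deltas:
        for (t0, dx0, dy0), (t1, dx1, dy1) in zip(traj, traj[1:]):
            total += segment_term(t0, dx0, dy0, t1, dx1, dy1, a, b)
    return total


def numeric_segment(t0, dx0, dy0, t1, dx1, dy1, a, b, steps=10000):
    """Midpoint-rule integration of the window area over one segment."""
    total = 0.0
    for j in range(steps):
        s = (j + 0.5) / steps
        # Displacements are linearly interpolated between the timestamps.
        dx = abs(dx0 + s * (dx1 - dx0))
        dy = abs(dy0 + s * (dy1 - dy0))
        # Area of window positions containing one point but not the other.
        overlap = max(0.0, a - dx) * max(0.0, b - dy)
        total += a * b - overlap
    return (t1 - t0) * (total / steps) / ((1 + a) * (1 + b))
```

Under the stated assumptions the two routines agree up to integration error, for example on a segment from `(0.0, 0.0, 0.0)` to `(0.5, 0.02, 0.01)` with `a = b = 0.1`. When a displacement changes sign inside a segment, or exceeds a window side, the closed form no longer matches the geometric area and the numerical version is the safer reference.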
Evaluating the Effect of Compression in ST Querying
- Evaluating the formula for $E_N(R_{a\times b}) = E_P(R_{a\times b})$ is a costly $O(nm)$ operation; it requires processing the entire original dataset along with its compressed counterpart.
- However, any compression algorithm that evaluates SED also needs to calculate $\delta x_{i,k}$ and $\delta y_{i,k}$ at every timestamp:

$$SED_i(t) = \sqrt{\delta x_i(t)^2 + \delta y_i(t)^2}$$

- As a consequence, the evaluation of the average error in the query results can be integrated into the compression algorithm, introducing only a small overhead on its execution.

Experimental Study: Settings
- Datasets
  - One real trajectory dataset of a fleet of trucks (273 trajectories, 112K entries)
  - A synthetic dataset of 2000 trajectories generated using a network-based data generator and the San Joaquin road network
- Implementation
  - We implemented the TD-TR algorithm and compressed the real and synthetic datasets, varying its threshold
- Experiments
  - Average overhead introduced in the TD-TR algorithm
  - Average number of false positives and false negatives in 10000 randomly distributed timeslice queries

Experimental Study: On the performance
Scaling the value of the TD-TR threshold
- The algorithm's execution time decreases as the value of the TD-TR threshold increases.
- The overhead introduced in the algorithm's execution is typically small (below 7%).
- In absolute terms, the overhead introduced never exceeds 0.2 milliseconds per trajectory.
[Figure: execution time (msec) vs. TD-TR threshold (0.001 to 0.02), with model calculations included and excluded; panels: Trucks dataset, Synthetic dataset]

Experimental Study: On the quality (1)
Scaling the value of the TD-TR threshold
- The average number of false hits (negatives and positives) is linear in the value of the TD-TR compression threshold.
- The average error in the estimation for the synthetic dataset is around 6%, varying between 0.2% and 14%.
- In the trucks dataset the average error increases to around 10.6%, mainly due to the error introduced at small values of the TD-TR threshold.
[Figure: average false hits (false negatives, false positives, estimation) vs. TD-TR threshold (0.001 to 0.02); panels: Trucks dataset, Synthetic dataset]

Experimental Study: On the quality (2)
Scaling the query size
- The average number of false hits (negatives and positives) is sub-linear in the size of the query.
- The average error in the estimation for the synthetic dataset is around 2.9%, varying between 0.2% and 8.7%.
- In the trucks dataset the average error increases to around 7.5%.
[Figure: average false hits (false negatives, false positives, estimation) vs. query size (a = b, 0.05 to 0.3); panels: Trucks dataset, Synthetic dataset]

Summary and Future Work
- We provided a closed-form formula for the average number of false negatives and false positives, covering the case of uniformly distributed query windows and arbitrarily distributed trajectory data.
- Through an experimental study we demonstrated the efficiency of the proposed model.
  - We illustrated the applicability of our model under real-life requirements: the estimation of the model parameters introduces only a small overhead in the trajectory compression algorithm.
  - We presented the accuracy of our estimations, with an average error of around 6%.
- Future work:
  - Extension of our model to nearest neighbor and general range queries
  - Applicability of our model in the case of spatiotemporal warehouses

Acknowledgements
- Research partially supported by the GEOPKDD ("Geographic Privacy-aware Knowledge Discovery and Delivery") project, funded by the European Community under contract FP6-014915.

Thank you!