Sampling Time-Based Sliding Windows in Bounded Space
Rainer Gemulla and Wolfgang Lehner, SIGMOD 2008
Chen Yi-Chun 2008/10/02
Outline
Motivation
Priority sampling
Bounded priority sampling
Correctness and analysis
Sampling multiple items
Experimental results
Conclusion
Motivation
Random sampling is an appealing approach to build synopses of large data streams.
The paper is concerned with sampling schemes that maintain a uniform sample of a time-based sliding window in bounded space.
The main challenge is to guarantee an upper bound on the space consumption of the sample.
Notation definition
R(t): the set of items from R with a timestamp smaller than or equal to t
W(t): a sliding window of length Δ
N(t): the size of the window at time t
Window size: the number of items in the window (N(t), varying)
S(t): uniform random sample
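The notation can be made concrete with a small sketch. It assumes that the window at time t holds the items whose timestamps lie in the half-open interval (t - length, t]; the function name `window` and the sample timestamps are illustrative only and do not come from the paper.

```python
from bisect import bisect_right

def window(timestamps, t, length):
    """Return the timestamps that fall in the window (t - length, t].

    `timestamps` must be sorted in ascending order.
    """
    lo = bisect_right(timestamps, t - length)  # index of first timestamp > t - length
    hi = bisect_right(timestamps, t)           # index of first timestamp > t
    return timestamps[lo:hi]

# Hypothetical arrival times, for illustration only.
arrivals = [1.0, 2.5, 3.0, 7.0, 7.5, 9.0]
for t in (3.0, 8.0, 20.0):
    w = window(arrivals, t, length=5.0)
    print(t, w, len(w))  # len(w) plays the role of N(t)
```

The number of items in the window changes with t even though the window length is fixed, which is why a sample of a time-based window is hard to keep within a fixed space budget.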
Priority sampling
The replacement set is the reason for the unbounded space consumption of the sampling scheme.
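A minimal sketch of priority sampling for a single sample item may help to see where the replacement set comes from. It assumes the usual formulation: every arriving item gets a random priority, the sample is the highest-priority item in the window, and the replacement set holds every item that no later item beats. The class and method names are hypothetical, and `sample(t)` assumes t is not smaller than the latest arrival time.

```python
import random
from collections import deque

class PrioritySampler:
    """Priority sampling of one item from a time-based sliding window."""

    def __init__(self, length, rng=None):
        self.length = length
        self.rng = rng or random.Random()
        # (timestamp, priority, item); timestamps ascend, priorities descend.
        # The front entry is the sample, the rest is the replacement set.
        self.items = deque()

    def arrive(self, timestamp, item):
        p = self.rng.random()
        # Stored items with a lower priority are older and expire earlier,
        # so they can never become the sample: drop them.
        while self.items and self.items[-1][1] < p:
            self.items.pop()
        self.items.append((timestamp, p, item))

    def sample(self, t):
        # Remove expired items, then report the highest-priority survivor.
        while self.items and self.items[0][0] <= t - self.length:
            self.items.popleft()
        return self.items[0][2] if self.items else None

sampler = PrioritySampler(length=100.0, rng=random.Random(42))
for ts in range(1000):
    sampler.arrive(float(ts), ts)
print(sampler.sample(999.0), len(sampler.items))
```

The length of `items` depends on the random priorities and on the arrival pattern, so nothing in this scheme caps the space it may use.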
Bounded priority sampling
a) Arrival of item e_i: e_i becomes the new candidate item if there is currently no candidate item
b) Expiration of candidate item: becomes test item
c) Double expiration of test item: discard
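The three events can be sketched as follows. Several details are assumptions that the slide does not spell out: an arriving item also replaces an existing candidate when its priority is higher, "double expiration" means the test item's timestamp is at least two window lengths old, only the timestamp and priority of the test item are kept, and the candidate is reported as the sample only when its priority is not beaten by the test item. The class and method names are hypothetical.

```python
import random

class BoundedPrioritySampler:
    """Sketch of bounded priority sampling with one candidate and one test item."""

    def __init__(self, length, rng=None):
        self.length = length
        self.rng = rng or random.Random()
        self.cand = None  # (timestamp, priority, item)
        self.test = None  # (timestamp, priority)

    def _expire(self, t):
        # b) An expired candidate becomes the test item.
        if self.cand is not None and self.cand[0] <= t - self.length:
            self.test = (self.cand[0], self.cand[1])
            self.cand = None
        # c) A test item that is two window lengths old is discarded.
        if self.test is not None and self.test[0] <= t - 2 * self.length:
            self.test = None

    def arrive(self, timestamp, item):
        self._expire(timestamp)
        p = self.rng.random()
        # a) No candidate, or (assumed) a higher priority than the candidate.
        if self.cand is None or p > self.cand[1]:
            self.cand = (timestamp, p, item)

    def sample(self, t):
        self._expire(t)
        if self.cand is None:
            return None
        # Assumed reporting rule: the candidate must beat the test item.
        if self.test is not None and self.test[1] > self.cand[1]:
            return None
        return self.cand[2]

bps = BoundedPrioritySampler(length=100.0, rng=random.Random(42))
for ts in range(1000):
    bps.arrive(float(ts), ts)
print(bps.sample(999.0))
```

Whatever the arrival pattern, the sketch stores at most two entries. The price is that `sample` may return nothing at some points in time.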
Correctness and analysis
[Figure labels: p', e', p_max, e_max]
Sampling Multiple Items
BPSWOR (BPS without replacement): modify BPS so as to store k candidates and k test items simultaneously.
[Figure labels: p1, e1, p2, e2, p, e; condition |Scand| < k]
Experimental results
Each item of the data stream consists of an 8-byte timestamp and 32 bytes of dummy data.
A space budget of 32 kbytes.
At most 819 items can be stored in 32 kbytes of space.
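The figure of 819 items follows from the item size and the budget. The short check below assumes that 1 kbyte is 1024 bytes and that only the timestamp and the dummy data count toward the size of a stored item.

```python
TIMESTAMP_BYTES = 8
PAYLOAD_BYTES = 32
BUDGET_BYTES = 32 * 1024  # assumption: 1 kbyte = 1024 bytes

item_bytes = TIMESTAMP_BYTES + PAYLOAD_BYTES   # bytes per stored item
max_items = BUDGET_BYTES // item_bytes         # whole items that fit in the budget
print(item_bytes, max_items)  # prints: 40 819
```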
Conclusion
The paper studied bounded-space techniques for maintaining uniform samples over a time-based sliding window of a data stream.