Sampling Time-Based Sliding Windows in Bounded Space


Rainer Gemulla and Wolfgang Lehner, SIGMOD 2008

Chen Yi-Chun, 2008/10/02

Outline

- Motivation
- Priority sampling
- Bounded priority sampling
- Correctness and analysis
- Sampling multiple items
- Experimental results
- Conclusion

Motivation

Random sampling is an appealing approach to building synopses of large data streams.

In this paper, the authors are concerned with sampling schemes that maintain a uniform sample of a time-based sliding window in bounded space.

The main challenge is to guarantee an upper bound on the space consumption of the sample.


Notation

- $R(t)$: the set of items from the stream $R$ with a timestamp smaller than or equal to $t$
- $W_\Delta(t) = R(t) \setminus R(t - \Delta)$: a sliding window of length $\Delta$ at time $t$
- $N(t) = |W_\Delta(t)|$: the window size, i.e., the number of items in the window at time $t$ (it varies over time)
- $S(t)$: a uniform random sample of $W_\Delta(t)$
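The following is a minimal Python sketch of these definitions. It assumes that items are (timestamp, payload) pairs and that a window of length Δ at time t contains the timestamps in (t - Δ, t]. The helper names `arrived_by` and `window` are hypothetical. The sketch shows that N(t) changes with the arrival pattern even though Δ stays fixed.

```python
from typing import List, Tuple

# Hypothetical item representation: (timestamp, payload).
Item = Tuple[float, str]


def arrived_by(stream: List[Item], t: float) -> List[Item]:
    """R(t): items whose timestamp is smaller than or equal to t."""
    return [e for e in stream if e[0] <= t]


def window(stream: List[Item], t: float, delta: float) -> List[Item]:
    """W_delta(t) = R(t) minus R(t - delta): timestamps in (t - delta, t]."""
    older = set(arrived_by(stream, t - delta))
    return [e for e in arrived_by(stream, t) if e not in older]


stream = [(1.0, "a"), (2.5, "b"), (3.0, "c"), (7.0, "d")]
for t in (3.0, 5.0, 8.0):
    w = window(stream, t, delta=3.0)
    # N(t) = len(w) changes with t although delta stays 3.0
    print(t, len(w), [payload for _, payload in w])
```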

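Priority sampling

In priority sampling, every arriving item is assigned an independent random priority, and the sample is the item with the highest priority in the current window. To find a replacement when the sample expires, the scheme also stores a replacement set: all window items whose priority is higher than that of every item that arrived after them. The replacement set is the reason for the unbounded space consumption of the sampling scheme. Its size depends on the drawn priorities and grows with the number of items in the window (logarithmically in expectation), and it has no fixed upper bound.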
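The following is a minimal Python sketch of single-item priority sampling as described above. It assumes uniform random priorities drawn with `random.Random` and the window convention (t - Δ, t]. The class name `PrioritySampler` is hypothetical. The replacement set is kept in a deque in arrival order, so priorities decrease from front to back and the front item is the sample.

```python
import random
from collections import deque
from typing import Deque, Optional, Tuple

Entry = Tuple[float, float, object]  # (timestamp, priority, value)


class PrioritySampler:
    """Single-item priority sampling over a time-based sliding window (sketch)."""

    def __init__(self, delta: float, rng: random.Random) -> None:
        self.delta = delta
        self.rng = rng
        # Replacement set: items with no later item of higher priority.
        self.items: Deque[Entry] = deque()

    def arrive(self, ts: float, value: object) -> float:
        p = self.rng.random()
        # Older items with a lower priority can never become the sample again.
        while self.items and self.items[-1][1] < p:
            self.items.pop()
        self.items.append((ts, p, value))
        return p  # returned only to make the sketch easy to check

    def sample(self, t: float) -> Optional[object]:
        # Expired items (timestamp <= t - delta) sit at the front; drop them.
        while self.items and self.items[0][0] <= t - self.delta:
            self.items.popleft()
        return self.items[0][2] if self.items else None


rng = random.Random(7)
sampler = PrioritySampler(delta=100.0, rng=rng)
sizes = []
for ts in range(10_000):
    sampler.arrive(float(ts), ts)
    sampler.sample(float(ts))
    sizes.append(len(sampler.items))
print("max replacement-set size:", max(sizes))
print("mean replacement-set size:", sum(sizes) / len(sizes))
```

With about 100 items per window, the mean should stay small, roughly the harmonic number H_100 ≈ 5.2. However, nothing in the data structure caps the size: if priorities happen to decrease in arrival order, the deque holds every item of the window.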


Bounded priority sampling

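BPS stores at most two items, a candidate item and a test item, and updates them as follows:

a) Arrival of item $e_i$: if there is currently no candidate item, or if the priority $p_i$ of $e_i$ is higher than that of the current candidate, $e_i$ becomes the new candidate item.
b) Expiration of the candidate item: the candidate becomes the test item.
c) Double expiration of the test item (its timestamp is at or before $t - 2\Delta$, where $\Delta$ is the window length): the test item is discarded.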
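The rules above can be sketched in Python as follows. The slide does not state how the sample is reported. This sketch assumes that the candidate is reported only when its priority exceeds that of the test item, and that an empty sample is reported otherwise. Priorities are drawn with `random.Random`, timestamps must be non-decreasing, and the class name `BoundedPrioritySampler` is hypothetical.

```python
import random
from typing import Optional, Tuple

Entry = Tuple[float, float, object]  # (timestamp, priority, value)


class BoundedPrioritySampler:
    """Single-item bounded priority sampling (sketch): at most two stored items."""

    def __init__(self, delta: float, rng: random.Random) -> None:
        self.delta = delta
        self.rng = rng
        self.cand: Optional[Entry] = None
        self.test: Optional[Entry] = None

    def _advance(self, t: float) -> None:
        # b) An expired candidate (timestamp <= t - delta) becomes the test item.
        if self.cand is not None and self.cand[0] <= t - self.delta:
            self.test, self.cand = self.cand, None
        # c) A doubly expired test item (timestamp <= t - 2*delta) is discarded.
        if self.test is not None and self.test[0] <= t - 2 * self.delta:
            self.test = None

    def arrive(self, ts: float, value: object) -> float:
        self._advance(ts)
        p = self.rng.random()
        # a) The new item becomes the candidate if there is none or it has a higher priority.
        if self.cand is None or p > self.cand[1]:
            self.cand = (ts, p, value)
        return p  # returned only to make the sketch easy to check

    def sample(self, t: float) -> Optional[object]:
        self._advance(t)
        # Assumed reporting rule: the candidate must beat the test item.
        if self.cand is not None and (self.test is None or self.cand[1] > self.test[1]):
            return self.cand[2]
        return None  # empty sample


rng = random.Random(42)
bps = BoundedPrioritySampler(delta=10.0, rng=rng)
for ts in range(50):
    bps.arrive(float(ts), f"item-{ts}")
print(bps.sample(49.5))
```

Whatever the arrival rate, the object holds at most one candidate and one test item. The price of this space bound is that `sample()` sometimes returns `None`.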

Correctness and analysis

[Figure: items $e'$ and $e_{max}$ with priorities $p'$ and $p_{max}$]

Cont.

[Figure (continued): items $e'$ and $e_{max}$ with priorities $p'$ and $p_{max}$]
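The key step behind BPS can be sketched as follows. This assumes (the slide does not show it) that the candidate is reported as the sample exactly when its priority exceeds the test item's priority. The symbols $\tau(e)$ (timestamp of item $e$), $p(e)$ (its priority), $t_T$, $p_T$, $t_1$, $e_c$, and $p_c$ are introduced here for illustration.

Let the test item have timestamp $t_T$ and priority $p_T$, and let $t_1 \ge t_T + \Delta$ be the time at which it became the test item. Every item that arrived in $(t_T, t_1)$ has a priority below $p_T$, since otherwise it would have displaced that item while it was still the candidate. The current candidate $e_c$, with priority $p_c$, has the highest priority among the items that arrived at or after $t_1$, and because it has not expired, it lies in the window. Because the test item has expired, $t_T \le t - \Delta$. Therefore the window $W_\Delta(t)$ of timestamps in $(t - \Delta, t]$ satisfies:

$$
W_\Delta(t) \subseteq \underbrace{\{\, e : t_T < \tau(e) < t_1 \,\}}_{p(e) < p_T} \;\cup\; \underbrace{\{\, e : t_1 \le \tau(e) \le t \,\}}_{p(e) \le p_c}
$$

$$
p_c > p_T \;\Longrightarrow\; e_c = \arg\max_{e \in W_\Delta(t)} p(e)
$$

If $p_c < p_T$, the highest-priority window item may be an unstored item from $(t - \Delta, t_1)$, so no sample is reported. Since priorities are independent and identically distributed, the highest-priority window item is a uniformly random window item. Showing that the decision to report a sample does not bias this choice is part of the paper's analysis and is not reproduced here.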

Sampling Multiple Items

BPSWOR (BPS without replacement): BPS is modified to store k candidate items and k test items simultaneously.

[Figure: candidate set $S_{cand}$ with items $e_1, e_2$ and priorities $p_1, p_2$; case $|S_{cand}| < k$]
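The slide only states that BPSWOR keeps k candidates and k test items. The Python sketch below is one hypothetical way to carry the BPS rules over to k items; it is not necessarily the paper's exact algorithm. It works as follows:

- An arriving item joins the candidate set, and if that set then exceeds k items, its lowest-priority member is discarded.
- Expired candidates become test items.
- Doubly expired test items are discarded.
- The reported sample is assumed to be the set of candidates whose priority exceeds every test item's priority.

Under these assumptions, the reported items are always the highest-priority items of the window, so the sample size varies between 0 and k. The class name `BPSWORSketch` is hypothetical.

```python
import random
from typing import List, Tuple

Entry = Tuple[float, float, object]  # (timestamp, priority, value)


class BPSWORSketch:
    """Hypothetical k-item extension of the BPS rules (sketch, not the paper's code)."""

    def __init__(self, k: int, delta: float, rng: random.Random) -> None:
        self.k = k
        self.delta = delta
        self.rng = rng
        self.cand: List[Entry] = []  # at most k candidate items
        self.test: List[Entry] = []  # expired candidates, not yet doubly expired

    def _advance(self, t: float) -> None:
        # Expired candidates (timestamp <= t - delta) become test items.
        self.test.extend(e for e in self.cand if e[0] <= t - self.delta)
        self.cand = [e for e in self.cand if e[0] > t - self.delta]
        # Doubly expired test items (timestamp <= t - 2*delta) are discarded.
        self.test = [e for e in self.test if e[0] > t - 2 * self.delta]

    def arrive(self, ts: float, value: object) -> float:
        self._advance(ts)
        p = self.rng.random()
        self.cand.append((ts, p, value))
        if len(self.cand) > self.k:
            # Discard the lowest-priority candidate; k higher-priority ones remain.
            self.cand.remove(min(self.cand, key=lambda e: e[1]))
        return p  # returned only to make the sketch easy to check

    def sample(self, t: float) -> List[object]:
        self._advance(t)
        # Assumed reporting rule: keep candidates that beat every test item.
        threshold = max((e[1] for e in self.test), default=-1.0)
        return [e[2] for e in self.cand if e[1] > threshold]


rng = random.Random(11)
wor = BPSWORSketch(k=3, delta=10.0, rng=rng)
for ts in range(40):
    wor.arrive(float(ts), f"item-{ts}")
print(wor.sample(39.5))
```

With k = 1, these rules reduce to the single-item BPS rules.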

Experimental results

Each item of the data stream consists of an 8-byte timestamp and 32 bytes of dummy data.

A space budget of 32 kbytes is used; at most 819 items can be stored in this space.
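The 819-item limit follows from the item layout, assuming 1 kbyte = 1024 bytes and no storage overhead beyond the 40 bytes per item:

$$
\left\lfloor \frac{32 \times 1024\ \text{bytes}}{(8 + 32)\ \text{bytes per item}} \right\rfloor
= \left\lfloor \frac{32768}{40} \right\rfloor
= \lfloor 819.2 \rfloor
= 819 \text{ items}
$$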

Conclusion

The paper studies bounded-space techniques for maintaining uniform samples over a time-based sliding window of a data stream.
