Sketching (1) Alex Andoni (Columbia University) MADALGO Summer School on Streaming Algorithms 2015
Challenge: log statistics of the data, using small space

Stream of packets (by source IP), e.g.:
131.107.65.14, 18.0.1.12, 131.107.65.14, 80.97.56.20, 18.0.1.12, 80.97.56.20, 131.107.65.14, ...

IP             Frequency
131.107.65.14  3
18.0.1.12      2
80.97.56.20    2
127.0.0.1      9
192.168.0.1    8
257.2.5.7      0
16.09.20.11    1
Streaming statistics

- Let x_i = frequency of IP i
- 1st moment (sum): Σ x_i
  - Trivial: keep a total counter
- 2nd moment (variance): Σ x_i^2 = ||x||_2^2
  - Trivially: n counters — too much space
  - Can't do better if we insist on an exact answer
  - Better with a small approximation!
    - Via dimension reduction in ℓ_2

Example:
IP             Frequency
131.107.65.14  3
18.0.1.12      2
80.97.56.20    2
Here Σ x_i = 7 and Σ x_i^2 = 17.
2nd frequency moment

- Let x_i = frequency of IP i
- 2nd moment: Σ x_i^2 = ||x||_2^2

Dimension reduction

- Store a sketch of x:
  L(x) = (G_1·x, G_2·x, ..., G_k·x) = Gx,
  where each G_i is an n-dimensional Gaussian vector
- Estimator:
  (1/k)·||Gx||^2 = (1/k)·[(G_1·x)^2 + (G_2·x)^2 + ... + (G_k·x)^2]
- Updating the sketch:
  - Use linearity of the sketching function: G(x + e_i) = Gx + G·e_i
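The sketch and its linear update can be simulated in a few lines. This is a toy illustration only; the sizes, the stream, and the explicit matrix G are my own choices (the later slides discuss how to avoid storing G explicitly).

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 10_000, 400                      # dimension and sketch size
G = rng.standard_normal((k, n))         # k n-dimensional Gaussian vectors

x = np.zeros(n)                         # frequency vector (for reference)
sketch = np.zeros(k)                    # the stored sketch Gx

# Stream of item ids; each arrival bumps x_i by 1 and, since
# G(x + e_i) = Gx + G e_i, the sketch update just adds column i of G.
for i in rng.integers(0, n, size=2000):
    x[i] += 1
    sketch += G[:, i]

estimate = float(sketch @ sketch / k)   # (1/k) ||Gx||^2
truth = float(x @ x)                    # second moment  Σ x_i^2
```

Each update costs O(k) time, and the estimate is within a small relative error of the true second moment.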
Correctness

- Gaussian pdf = (1/√(2π))·e^(−g^2/2), with E[g] = 0, E[g^2] = 1
- Theorem [Johnson–Lindenstrauss]:
  (1/k)·||Gx||^2 = (1 ± ε)·||x||^2 with probability 1 − e^(−Ω(ε^2 k))
Why Gaussian?

- Stability property: G_i·x = Σ_j G_ij·x_j is distributed as ||x||·g,
  where g is also Gaussian
- Equivalently: G_i is spherically symmetric, i.e., it has a random
  direction, and the projection on a random direction depends only on the
  length of x
- Indeed, the joint pdf factors and depends only on the length:
  (1/√(2π))·e^(−x^2/2) · (1/√(2π))·e^(−y^2/2) = (1/2π)·e^(−(x^2+y^2)/2)
Proof [sketch]

- Gx is distributed as (||x||·g_1, ..., ||x||·g_k),
  where each g_i is a 1D Gaussian (pdf = (1/√(2π))·e^(−g^2/2), E[g] = 0, E[g^2] = 1)
- Estimator: (1/k)·||Gx||^2 = ||x||^2 · (1/k)·Σ_i g_i^2
- Expectation: E[(G_i·x)^2] = ||x||^2
- Standard deviation: σ[(G_i·x)^2] = O(||x||^2)
- Σ_i g_i^2 has the chi-squared distribution with k degrees of freedom
- Fact: chi-squared is very well concentrated:
  (1/k)·Σ_i g_i^2 = 1 + O(ε) with probability 1 − e^(−Ω(ε^2 k))
  - Akin to the central limit theorem
- Claim: for any x ∈ R^n, the estimator is (1 ± ε)·||x||^2
  with probability 1 − e^(−Ω(ε^2 k))
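The chi-squared concentration invoked above is easy to observe numerically. In this illustrative simulation (sample sizes are my own), the normalized sum (1/k)·Σ g_i^2 has mean 1 and standard deviation √(2/k), so its fluctuations shrink as k grows:

```python
import numpy as np

rng = np.random.default_rng(1)
trials = 500
stds = {}
for k in (10, 100, 1000):
    g = rng.standard_normal((trials, k))
    means = (g ** 2).mean(axis=1)   # draws of (1/k) * chi-squared with k dof
    stds[k] = float(means.std())    # should track sqrt(2/k)
```

Growing k by a factor of 100 shrinks the observed standard deviation by about a factor of 10, consistent with the √(2/k) rate.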
2nd frequency moment: overall

- Correctness:
  - (1/k)·||Gx||^2 = (1 ± ε)·||x||^2 with probability 1 − e^(−Ω(ε^2 k))
  - Enough to set k = O(1/ε^2) for constant probability of success
- Space requirement:
  - k = O(1/ε^2) counters of O(log n) bits
  - What about G: store O(nk) reals?
Storing randomness [AMS'96]

- OK if the G_i are "less random": choose each of them 4-wise independent
- Also OK if each entry G_ij is a random ±1
- Only O(k) counters of O(log n) bits
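As a toy illustration of this idea (the construction details below are my own simplification of the standard trick): approximately 4-wise independent ±1 signs can be generated from a degree-3 polynomial hash modulo a prime, so each sketch row stores only 4 coefficients instead of n reals.

```python
import numpy as np

P = 2_147_483_647                  # prime modulus (2^31 - 1)
rng = np.random.default_rng(2)
n, k = 2000, 200
coeffs = rng.integers(1, P, size=(k, 4))   # 4 coefficients per sketch row

def sign(row, i):
    """±1 sign for item i in sketch row `row`, from a degree-3 polynomial
    hash (4-wise independent values mod P; the low bit gives the sign)."""
    a, b, c, d = (int(v) for v in coeffs[row])
    h = (((a * i + b) * i + c) * i + d) % P
    return 1.0 if h & 1 else -1.0

x = rng.integers(0, 5, size=n).astype(float)      # a frequency vector
sketch = np.array([sum(sign(r, i) * x[i] for i in np.flatnonzero(x))
                   for r in range(k)])
estimate = float(sketch @ sketch / k)             # ≈ Σ x_i^2
truth = float(x @ x)
```

The per-row storage is O(log n) bits for the coefficients, matching the slide's O(k) counters total, at the price of recomputing each sign on demand.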
More efficient sketches?

- Smaller space:
  - No: Ω((1/ε^2)·log n) bits [JW'11] — David's lecture
- Faster update time:
  - Yes: Jelani's lecture
Streaming Scenario 2

Two streams, x and y:
  x: 131.107.65.14, 80.97.56.20, 18.0.1.12
  y: 131.107.65.14, 18.0.1.12, 18.0.1.12

x: IP             Frequency      y: IP             Frequency
   131.107.65.14  1                 131.107.65.14  1
   18.0.1.12      1                 18.0.1.12      2
   80.97.56.20    1

Focus: difference in traffic
- 1st moment: Σ |x_i − y_i| = ||x − y||_1   (here: 2)
- 2nd moment: Σ |x_i − y_i|^2 = ||x − y||_2^2   (here: 2)
- Similar questions: average delay/variance in a network,
  differential statistics between logs at different servers, etc.
Definition: Sketching

- Sketching:
  - F : objects → short bit-strings
  - Given F(x) and F(y), we should be able to estimate some function
    of x and y
- Here: sketch the two frequency vectors x and y into short strings,
  e.g. F(x) = 010110 and F(y) = 010101, and estimate ||x − y||_2^2
  from the two sketches alone.
Sketching for ℓ_2

- As before, dimension reduction:
  - Pick G (using common randomness)
  - F(x) = Gx
- Estimator: ||F(x) − F(y)||_2^2 = ||G(x − y)||_2^2
- So from the two sketches Gx and Gy we compute ||Gx − Gy||_2^2.
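A sketch of this two-party setting (illustrative; a shared seed plays the role of the common randomness that fixes the same G on both sides):

```python
import numpy as np

n, k = 3000, 500

def sketch(v, seed=42):
    """F(v) = Gv, with G regenerated from the shared seed."""
    G = np.random.default_rng(seed).standard_normal((k, n))
    return G @ v

rng = np.random.default_rng(3)
x = rng.integers(0, 3, size=n).astype(float)   # frequency vector at server 1
y = rng.integers(0, 3, size=n).astype(float)   # frequency vector at server 2

diff = sketch(x) - sketch(y)                   # equals G(x - y) by linearity
estimate = float(diff @ diff / k)
truth = float(np.sum((x - y) ** 2))
```

Because the sketch is linear, the difference of the two sketches is exactly the sketch of the difference, so the single-vector analysis applies to x − y unchanged.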
Sketching for Manhattan distance (ℓ_1)

- Dimension reduction?
  - Essentially no [CS'02, BC'03, LN'04, JN'10, ...]:
    for n points and approximation D, the required dimension is between
    n^Ω(1/D^2) and O(n/D) [BC03, NR10, ANN10, ...],
    even if the map depends on the dataset!
  - In contrast: [JL] gives O(ε^(−2)·log n) for ℓ_2
  - No distributional dimension reduction either
- Weak dimension reduction is the rescue...
Dimension reduction for ℓ_1?

- Can we do the "analog" of Euclidean projections?
- For ℓ_2, we used the Gaussian distribution
  - Has the stability property:
    g_1·z_1 + g_2·z_2 + ... + g_n·z_n is distributed as g·||z||_2,
    where g is also Gaussian
- Is there something similar for the 1-norm?
  - Yes: the Cauchy distribution!
  - 1-stable: c_1·z_1 + c_2·z_2 + ... + c_n·z_n is distributed as c·||z||_1
- What's wrong then?
  - Cauchys are heavy-tailed: pdf(s) = 1/(π·(s^2 + 1))
  - |c| doesn't even have finite expectation
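The 1-stability property can be checked numerically. Since the expectation is infinite, we compare medians instead: the median of |standard Cauchy| is 1, so 1-stability predicts that the median of |c_1·z_1 + ... + c_n·z_n| is about ||z||_1 (a small illustrative simulation; the vector z and sample size are my own):

```python
import numpy as np

rng = np.random.default_rng(4)
z = np.array([3.0, -1.0, 2.0])            # ||z||_1 = 6
C = rng.standard_cauchy((200_000, 3))     # i.i.d. standard Cauchy entries
samples = np.abs(C @ z)                   # draws of |c_1 z_1 + c_2 z_2 + c_3 z_3|

med = float(np.median(samples))           # ≈ ||z||_1 = 6
```

This is exactly the handle the next slides exploit: the median is robust to the heavy tails that make the mean useless.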
Sketching for ℓ_1 [Indyk'00]

- Still, can consider a map as before:
  - F(x) = (C_1·x, C_2·x, ..., C_k·x) = Cx
- Consider F(x) − F(y) = Cx − Cy = C(x − y) = Cz, where z = x − y
  - Each coordinate of Cz is distributed as ||z||_1 × Cauchy
- Take the 1-norm ||Cz||_1?
  - Does not have finite expectation, but...
  - Can estimate ||z||_1 by the median of the absolute values of the
    coordinates of Cz!
- Correctness claim: for each i
  - Pr[|C_i·z| > ||z||_1·(1 − ε)] > 1/2 + Ω(ε)
  - Pr[|C_i·z| < ||z||_1·(1 + ε)] > 1/2 + Ω(ε)
Estimator for ℓ_1

- Estimator: median(|C_1·z|, |C_2·z|, ..., |C_k·z|)
- Correctness claim: for each i
  - Pr[|C_i·z| > ||z||_1·(1 − ε)] > 1/2 + Ω(ε)
  - Pr[|C_i·z| < ||z||_1·(1 + ε)] > 1/2 + Ω(ε)
- Proof:
  - |C_i·z| = |abs(C_i·z)| is distributed as abs(||z||_1·c) = ||z||_1·|c|,
    where c is a standard Cauchy
  - Easy to verify that
    - Pr[|c| > 1 − ε] > 1/2 + Ω(ε)
    - Pr[|c| < 1 + ε] > 1/2 + Ω(ε)
- Hence, if we take k = O(1/ε^2):
  median(|C_1·z|, ..., |C_k·z|) ∈ (1 ± ε)·||z||_1
  with probability at least 90%
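A minimal version of this median estimator (sizes and inputs are my own; the sketch matrix is dense here, whereas a streaming implementation would generate its entries with bounded randomness):

```python
import numpy as np

rng = np.random.default_rng(5)
n, k = 1000, 2000
C = rng.standard_cauchy((k, n))          # the sketch matrix of Cauchy rows

x = rng.integers(0, 4, size=n).astype(float)
y = rng.integers(0, 4, size=n).astype(float)
z = x - y

# Each coordinate of Cz is ||z||_1 times a standard Cauchy; the median of
# the absolute values recovers ||z||_1 despite the heavy tails.
estimate = float(np.median(np.abs(C @ z)))
truth = float(np.sum(np.abs(z)))
```

By linearity, C@z = C@x − C@y, so two parties can compute this from their separate sketches, exactly as in the ℓ_2 case.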
To finish the ℓ_p norms...

- p-moment: Σ |x_i|^p = ||x||_p^p
- p ≤ 2:
  - Works via p-stable distributions [Indyk'00]
- p > 2:
  - Can do (and need) about n^(1−2/p) counters (up to logarithmic factors)
    [AMS'96, SS'02, BYJKS'02, CKS'03, IW'05, BGKS'06, BO10, AKO'11, G'11,
    BKSV'14]
  - Will see a construction via Precision Sampling
A task: estimate sum

- Given: n quantities a_1, a_2, ..., a_n in the range [0,1]
- Goal: estimate S = a_1 + a_2 + ... + a_n "cheaply"
- Standard sampling: pick a random set J = {j_1, ..., j_m} of size m
  - Estimator: S̃ = (n/m)·(a_{j_1} + a_{j_2} + ... + a_{j_m})
  - Chebyshev bound: with 90% success probability,
    (1/2)·S − O(n/m) < S̃ < 2·S + O(n/m)
  - For constant additive error, need m = Ω(n)
- [Figure: computing an estimate S̃ from the sampled terms a_1, a_3
  out of a_1, a_2, a_3, a_4]
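The standard-sampling baseline above fits in a few lines (sizes are my own choices):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 100_000
a = rng.random(n)                        # quantities a_i in [0, 1]
S = float(a.sum())                       # the sum we want to estimate

m = 10_000                               # sample size
J = rng.integers(0, n, size=m)           # random sample of indices
S_tilde = (n / m) * float(a[J].sum())    # rescaled sample sum
```

Here m = n/10 already gives a small relative error because S is large; the point of the slide is the additive regime: to guarantee a constant additive error for arbitrary inputs, m must grow linearly in n.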
Precision Sampling Framework

- Alternative "access" to the a_i's:
  - For each term a_i, we get a (rough) estimate ã_i,
    up to some precision u_i chosen in advance: |ã_i − a_i| < u_i
- Challenge: achieve a good trade-off between
  - quality of the approximation to S
  - using only weak precisions u_i (minimize the "cost" of estimating S)
- [Figure: computing an estimate S̃ from ã_1, ã_2, ã_3, ã_4,
  each ã_i within precision u_i of a_i]
Formalization

The game between the Sum Estimator and an Adversary:
1. Estimator fixes the precisions u_1, ..., u_n;
   Adversary fixes a_1, a_2, ..., a_n
2. Adversary fixes ã_1, ã_2, ..., ã_n s.t. |ã_i − a_i| < u_i
3. Estimator, given ã_1, ã_2, ..., ã_n, outputs S̃ s.t.
   |Σ_i a_i − γ·S̃| < 1 (for some small γ)

- What is the cost?
  - To achieve precision u_i, one uses 1/u_i "resources": e.g., if a_i is
    itself a sum a_i = Σ_j a_ij computed by subsampling, then one needs
    Θ(1/u_i) samples
  - Here, average cost = (1/n)·Σ_i 1/u_i
- For example, can choose all u_i = 1/n:
  - Average cost ≈ n
Precision Sampling Lemma
[A-Krauthgamer-Onak'11]

- Goal: estimate Σ a_i from {ã_i} satisfying |ã_i − a_i| < u_i
- Precision Sampling Lemma: can get, with 90% success:
  - O(1) additive error and 1.5 multiplicative error:
    Σ a_i − O(1) < S̃ < 1.5·Σ a_i + O(1),
    with average cost equal to O(log n)
  - More generally, ε additive error and 1 + ε multiplicative error:
    Σ a_i − ε < S̃ < (1 + ε)·Σ a_i + ε,
    with average cost O(ε^(−3)·log n)
- Example: distinguish Σ a_i = 3 vs Σ a_i = 0
  - Consider two extreme cases:
    - if three a_i = 1: enough to have a crude approximation for all
      (u_i = 0.1)
    - if all a_i = 3/n: only a few need a good approximation u_i = 1/n,
      and the rest can have u_i = 1
Precision Sampling Algorithm

- Precision Sampling Lemma: can get, with 90% success,
  O(1) additive error and 1.5 multiplicative error:
  Σ a_i − O(1) < S̃ < 1.5·Σ a_i + O(1)
  (in general, ε additive and 1 + ε multiplicative error,
  with average cost O(ε^(−3)·log n))
- Algorithm (concrete):
  - Choose each u_i ∈ [0,1] i.i.d.
    (for the general guarantee: distributed as the minimum of O(ε^(−3))
    uniform random variables)
  - Estimator: S̃ = count of the i's with ã_i/u_i > 6
    (up to a normalization constant; in general, a function of
    [ã_i/u_i − 4/ε])
- Proof of correctness:
  - We use only the ã_i which are 1.5-approximations to a_i
  - E[S̃] ≈ Σ_i Pr[a_i/u_i > 6] = Σ_i a_i/6
  - E[1/u_i] = O(log n) w.h.p.
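A toy simulation of the counting estimator in its basic form (the uniform u_i, the threshold 6, and the count are as on the slide; the noise model and normalization by 6 are my own). For a_i ∈ [0,1] and u_i uniform on [0,1], Pr[a_i/u_i > 6] = a_i/6, so 6 times the count is an unbiased estimate of S when the a_i are exact, and the adversarial perturbation |ã_i − a_i| < u_i shifts it only by a small multiplicative factor, mirroring the 1.5-approximation slack in the lemma:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 50_000
a = rng.random(n) * 0.5                  # true terms a_i in [0, 0.5]
S = float(a.sum())

u = rng.random(n)                        # precisions u_i, uniform in [0, 1]
noise = (rng.random(n) - 0.5) * u        # any |ã_i - a_i| < u_i is allowed
a_tilde = a + noise                      # the rough estimates we receive

# Count how many rough estimates clear the threshold, then normalize.
S_tilde = 6.0 * int(np.count_nonzero(a_tilde / u > 6))
```

Note the cost side: most u_i are constant-sized, so the average of 1/u_i is only logarithmic, even though a few lucky small u_i provide the high-precision "samples".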
ℓ_p via precision sampling

- Theorem: linear sketch for ℓ_p with O(1) approximation,
  and O(n^(1−2/p)·log n) space (90% success probability).
- Sketch:
  - Pick random r_i ∈ {±1}, and u_i as exponential r.v. (density e^(−u))
  - Let y_i = x_i · r_i / u_i^(1/p)
  - Throw the y_i into one hash table H with m = O(n^(1−2/p)·log n) cells;
    each cell stores the sum of the y_i hashed into it
- Estimator: max_j |H[j]|^p
- Linear: works for a difference x − y as well
- Randomness: bounded independence suffices
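A compact simulation of this sketch (all sizes and the hash are my own choices, and there is only a single repetition, so the guarantee is an O(1) approximation with constant probability, not concentration):

```python
import numpy as np

rng = np.random.default_rng(8)
p = 3.0
n, m = 10_000, 2_000                     # dimension and number of cells

x = rng.standard_normal(n)               # input vector
r = rng.choice([-1.0, 1.0], size=n)      # random signs r_i
u = rng.exponential(size=n)              # exponential variables u_i
y = x * r / u ** (1.0 / p)               # y_i = x_i r_i / u_i^{1/p}

cell = rng.integers(0, m, size=n)        # hash each coordinate to a cell
H = np.zeros(m)
np.add.at(H, cell, y)                    # H[j] = sum of y_i with i -> j

estimate = float(np.max(np.abs(H)))      # max_j |H[j]| ≈ ||x||_p
truth = float(np.sum(np.abs(x) ** p) ** (1.0 / p))
```

The signs r_i make the "extra stuff" in each cell cancel in expectation, while the exponential u_i implement the precision-sampling idea: the maximizing coordinate dominates its cell.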
Correctness of ℓ_p estimation

- Sketch: y_i = x_i · r_i / u_i^(1/p), where r_i ∈ {±1} and the u_i are
  exponential r.v.; throw into hash table H
- Theorem: max_j |H[j]|^p is an O(1) approximation with 90% probability,
  for m = O(n^(1−2/p)·log n) cells
- Claim 1: max_i |y_i| is a constant-factor approximation to ||x||_p
  - max_i |y_i|^p = max_i |x_i|^p / u_i
  - Fact [max-stability]: max_i α_i/u_i is distributed as (Σ_i α_i)/u,
    where u is also an exponential r.v.
  - Hence max_i |y_i|^p is distributed as ||x||_p^p / u
  - u is Θ(1) with constant probability
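The max-stability fact admits a quick numerical check (an illustrative simulation with my own parameters). Since Pr[max_i α_i/u_i ≤ t] = Π_i e^(−α_i/t) = e^(−(Σ α_i)/t) = Pr[(Σ α_i)/u ≤ t], the two distributions coincide; we compare their sample medians, which should both be near (Σ α_i)/ln 2:

```python
import numpy as np

rng = np.random.default_rng(9)
alpha = np.array([1.0, 2.0, 3.0])        # α_i >= 0, with Σ α_i = 6
T = 200_000

# Left side: max_i α_i / u_i over independent exponential u_i.
lhs = np.max(alpha / rng.exponential(size=(T, 3)), axis=1)
# Right side: (Σ α_i) / u for a single exponential u.
rhs = alpha.sum() / rng.exponential(size=T)

med_lhs = float(np.median(lhs))          # both ≈ 6 / ln 2
med_rhs = float(np.median(rhs))
```

This is the p-norm analogue of the Gaussian and Cauchy stability used earlier: stability under maxima instead of sums.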
Correctness (cont)

- Claim 2: consider the hash table H and the cell j that y_{i*} falls into,
  for the i* which maximizes |y_{i*}|
  - How much "extra stuff" is there?
    δ^2 = (H[j] − y_{i*})^2 = (Σ_{i≠i*} y_i·χ[i→j])^2
  - E[δ^2] = Σ_{i≠i*} y_i^2·E[χ[i→j]] = Σ_{i≠i*} y_i^2/m ≤ ||y||^2/m
  - We have: E_u[||y||^2] ≤ ||x||^2 · E[1/u^(2/p)] = O(log n)·||x||^2
  - Also ||x||^2 ≤ n^(1−2/p)·||x||_p^2
  - By Markov: δ^2 ≤ ||x||_p^2 · n^(1−2/p) · O(log n)/m with probability 0.9
  - Then: H[j] = y_{i*} + δ = Θ(1)·||x||_p
- Hence max_j |H[j]| = Θ(1)·||x||_p
- Need to argue about the other cells too — concentration