Noise-Insensitive Boolean-Functions are Juntas Guy Kindler & Muli Safra
Slides prepared with help of: Adi Akavia
Influential People
The theory of the influence of variables
on Boolean functions [BL, KKL], and
related issues, was introduced to
tackle social choice problems. It has
furthermore motivated a magnificent
sequence of works related to
economics [K], percolation [BKS], and
hardness of approximation [DS],
revolving around the Fourier/Walsh
analysis of Boolean functions…
And the really important question:
Where to go for Dinner?
Who has suggestions:
Each casts their vote in an
(electronic) envelope, and
has the system decide,
not necessarily according to
majority…
Power
It turns out someone (in the
Florida wing) has the power
to flip some votes
influence
Voting Systems
n agents, each voting either “for” (T) or
“against” (F); the outcome is a Boolean
function f over n variables.
The values of the agents (variables) may
each, independently, flip with some small probability.
It turns out: one cannot design an f that
would be robust to such noise (that is,
would, on average, change value w.p. < O(1))
unless it takes into account only very few of
the votes.
Dictatorship
Def: a Boolean function $f: P([n]) \to \{-1,1\}$ is a
monotone e-dictatorship, denoted $f_e$, if:
$f_e(x) = \begin{cases} T & e \in x \\ F & e \notin x \end{cases}$
Juntas
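Def: a Boolean function $f: P([n]) \to \{-1,1\}$ is a
j-Junta if $\exists J \subseteq [n]$ where $|J| \le j$,
s.t. for every $x \subseteq [n]$: $f(x) = f(x \cap J)$
Def: f is an $[\epsilon, j]$-Junta if
$\exists$ j-Junta f’ s.t. $\Pr_{x \sim U_n}[f(x) \ne f'(x)] \le \epsilon$
Def: f is an $[\epsilon, j, p]$-Junta if
$\exists$ j-Junta f’ s.t. $\Pr_{x \sim \mu_p}[f(x) \ne f'(x)] \le \epsilon$,
where $\mu_p$ is the p-biased product distribution
We would tend to omit p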
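The junta definitions can be checked by brute force on small examples. The following is a minimal Python sketch, assuming subsets of [n] are represented as frozensets over {0,…,n-1} and the uniform distribution is used; the names `all_subsets`, `depends_only_on`, `is_junta`, `distance`, `dictator` and `majority3` are illustrative and do not come from the slides.

```python
from itertools import combinations, product

def all_subsets(n):
    """Every x that is a subset of {0,...,n-1}, as a frozenset (2^n of them)."""
    return [frozenset(i for i in range(n) if bits[i])
            for bits in product([0, 1], repeat=n)]

def depends_only_on(f, n, J):
    """True if f(x) == f(x & J) for every subset x of [n]."""
    J = frozenset(J)
    return all(f(x) == f(x & J) for x in all_subsets(n))

def is_junta(f, n, j):
    """True if some J with |J| <= j determines f (exhaustive search)."""
    return any(depends_only_on(f, n, J)
               for size in range(j + 1)
               for J in combinations(range(n), size))

def distance(f, g, n):
    """Probability, over a uniform subset x of [n], that f(x) != g(x)."""
    xs = all_subsets(n)
    return sum(f(x) != g(x) for x in xs) / len(xs)

def dictator(x):
    """Illustrative example: depends on element 0 only."""
    return 1 if 0 in x else -1

def majority3(x):
    """Illustrative example: majority vote of elements 0, 1, 2."""
    return 1 if len(x & {0, 1, 2}) >= 2 else -1
```

Under these assumptions, `majority3` over n=4 is a 3-junta but not a 2-junta, while its distance from `dictator` shows how far it is from one particular 1-junta.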
Long-Code
In the long-code $L: [n] \to \{0,1\}^{2^n}$
each element is
encoded by $2^n$ bits
This is the most extensive binary code, having
one bit for every subset in P([n])
Long-Code
Encoding an element $e \in [n]$:
$E_e$ legally encodes an element e if $E_e = f_e$
[Figure: a code-word shown as a row of T/F entries, one per subset]
Long-Code Monotone-Dictatorship
The truth-table of a Boolean function
over n elements can be considered as a
$2^n$-bit-long string (each bit corresponding
to one input setting, or a subset of [n])
For a long-code, the legal code-words
are all monotone dictatorships
How about the Hadamard code?
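The correspondence between long-code words and monotone dictatorships can be made concrete on a tiny domain. This is a minimal sketch, assuming elements are numbered 0,…,n-1 and subsets are enumerated in a fixed order; `all_subsets`, `long_code` and `hamming_distance` are hypothetical helper names.

```python
from itertools import product

def all_subsets(n):
    """Every subset of {0,...,n-1} as a frozenset, in a fixed order."""
    return [frozenset(i for i in range(n) if bits[i])
            for bits in product([0, 1], repeat=n)]

def long_code(e, n):
    """Truth table of the monotone dictatorship f_e: one bit per subset x,
    True exactly when e is in x."""
    return [e in x for x in all_subsets(n)]

def hamming_distance(u, v):
    """Number of positions where two code-words differ."""
    return sum(a != b for a, b in zip(u, v))
```

Each code-word has $2^n$ bits, half of them True, and two distinct code-words differ on exactly half of the positions.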
Long-code Tests
Def (a long-code test): given a code-word w, probe it in a constant number of
entries, and
accept w.h.p. if w is a monotone dictatorship
reject w.h.p. if w is not close to any
monotone dictatorship
Efficient Long-code Tests
For some applications, it suffices if the test
accepts some illegal code-words, as long as these
have a short list-decoding:
Def (a long-code list-test): given a code-word w,
probe it in 2 or 3 places, and
accept w.h.p. if w is a monotone dictatorship,
reject w.h.p. if w is not even approximately
determined by a short list of domain elements, that
is, if $\nexists$ a Junta $J \subseteq [n]$ s.t. f is close to f’ and
$f'(x) = f'(x \cap J)$ for all x
Note: a long-code list-test distinguishes
between the case that w is a dictatorship and the
case that w is far from a junta.
Background
Thm (Friedgut): a Boolean function f with small
average-sensitivity is an $[\epsilon,j]$-junta
Thm (Bourgain): a Boolean function f with small
high-frequency weight is an $[\epsilon,j]$-junta
Thm (Kindler & Safra): a Boolean function f with small
high-frequency weight in a p-biased measure is an
$[\epsilon,j]$-junta
Corollary: a Boolean function f with small
noise-sensitivity is an $[\epsilon,j]$-junta
Parameters:
average-sensitivity [BL,KKL,F]
high-frequency weight [KKL,B]
noise-sensitivity [BKS]
Noise-Sensitivity
How often does the value of f change
when the input is perturbed?
[Figure: an input $x \subseteq [n]$, a noise subset I, and its replacement z]
Noise-Sensitivity
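Def ($\mu^{[n]}_{\lambda,p,x}$): Let $0 < \lambda < 1$, and $x \in P([n])$.
Then $y \sim \mu^{[n]}_{\lambda,p,x}$ if $y = (x \setminus I) \cup z$, where
$I \sim \mu^{[n]}_{\lambda}$ is a noise subset, and
$z \sim \mu^{I}_{p}$ is a replacement.
Def ($\lambda$-noise-sensitivity): let $0 < \lambda < 1$, then
$ns_\lambda(f) = \Pr_{x \sim \mu_p^{[n]},\ y \sim \mu^{[n]}_{\lambda,p,x}}[f(x) \ne f(y)]$
[When p=½, equivalent to flipping each
coordinate in x w.p. $\lambda/2$.]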
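For small n the noise-sensitivity can be computed exactly by enumerating all pairs (x, y). The sketch below assumes inputs are encoded as 0/1 tuples (1 meaning the element belongs to the subset), that each coordinate joins the noise subset I independently w.p. `lam`, and that coordinates in I are redrawn from the p-biased distribution; the function names are illustrative.

```python
from itertools import product

def noise_sensitivity(f, n, lam, p=0.5):
    """Exact Pr[f(x) != f(y)] for x p-biased and y a lam-noisy copy of x."""
    points = list(product([0, 1], repeat=n))
    total = 0.0
    for x in points:
        prob_x = 1.0
        for xi in x:
            prob_x *= p if xi else 1 - p
        for y in points:
            prob_y = 1.0
            for xi, yi in zip(x, y):
                # Coordinate outside I (w.p. 1 - lam): y_i must equal x_i.
                keep = (1 - lam) if xi == yi else 0.0
                # Coordinate inside I (w.p. lam): y_i is drawn afresh with bias p.
                resample = lam * (p if yi else 1 - p)
                prob_y *= keep + resample
            if f(x) != f(y):
                total += prob_x * prob_y
    return total

def dictator(x):
    """Illustrative example: the vote of coordinate 0."""
    return 1 if x[0] else -1

def parity3(x):
    """Illustrative example: parity of the first three coordinates."""
    return -1 if (x[0] + x[1] + x[2]) % 2 else 1
```

With p=½ a dictatorship has noise-sensitivity λ/2, in line with the bracketed remark, whereas parity is far more sensitive to the same noise.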
Fourier/Walsh Transform
Write $f: \{-1,1\}^n \to \{-1,1\}$ as a polynomial
What would be the monomials?
For every set $S \subseteq [n]$ we have a monomial $\chi_S$, which is the
product of all variables in S (the only relevant
powers are either 0 or 1):
$f = \sum_{S \subseteq [n]} \hat f(S)\, \chi_S$, where $\chi_S(x) = \prod_{i \in S} x_i$
It now makes sense to consider the degree of f, or
to break it up according to the various degrees of
the monomials.
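The expansion into monomials can be computed directly for small n. This is a brute-force sketch, assuming the uniform measure on $\{-1,1\}^n$ and using the standard formula $\hat f(S) = E_x[f(x)\chi_S(x)]$; `fourier_coefficients` and `majority3` are illustrative names.

```python
from itertools import combinations, product
from math import prod

def fourier_coefficients(f, n):
    """Map each S (a sorted tuple of indices) to E_x[f(x) * prod_{i in S} x_i]."""
    points = list(product([-1, 1], repeat=n))
    coeffs = {}
    for size in range(n + 1):
        for S in combinations(range(n), size):
            coeffs[S] = sum(f(x) * prod(x[i] for i in S) for x in points) / len(points)
    return coeffs

def majority3(x):
    """Illustrative example: sign of x_0 + x_1 + x_2."""
    return 1 if x[0] + x[1] + x[2] > 0 else -1

coeffs = fourier_coefficients(majority3, 3)
# Degree of f: the size of the largest S carrying non-zero weight.
degree = max(len(S) for S, c in coeffs.items() if c != 0)
```

For the 3-variable majority this yields weight on the three singletons and on the full set only, so the function has degree 3.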
High/Low Frequencies and
their Weights
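Def: the high-frequency portion of f: $f^{>k} = \sum_{|S|>k} \hat f(S)\, \chi_S$
Def: the low-frequency portion of f: $f^{\le k} = \sum_{|S| \le k} \hat f(S)\, \chi_S$
Def: the high-frequency-weight is: $\|f^{>k}\|_2^2 = \sum_{|S|>k} \hat f^2(S)$
Def: the low-frequency-weight is: $\|f^{\le k}\|_2^2 = \sum_{|S| \le k} \hat f^2(S)$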
Low High-Frequency Weight
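Prop: the $\lambda$-noise-sensitivity can be expressed in Fourier
transform terms as
$2\, ns_\lambda(f) = 1 - \sum_S (1-\lambda)^{|S|} \hat f^2(S)$
Prop: Low ns $\Rightarrow$ Low high-freq weight
Proof: By the above proposition, low noise-sensitivity
implies
$\sum_S (1-\lambda)^{|S|} \hat f^2(S) \approx 1$
nevertheless, f being a {-1,1} function, Parseval's
formula (that the 2-norm of the function and of its
Fourier transform are equal) implies
$\sum_S \hat f^2(S) = 1$
Since $(1-\lambda)^{|S|}$ is far below 1 for large |S|, both can hold
only if almost all of the weight sits on small sets S.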
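The Fourier expression for the noise-sensitivity can be checked numerically in the p=½ case. The sketch below assumes the uniform measure on $\{-1,1\}^n$, where λ-noise amounts to flipping each coordinate independently w.p. λ/2; all function names are illustrative.

```python
from itertools import combinations, product
from math import prod

def fourier_coefficients(f, n):
    """Map each S (a tuple of indices) to E_x[f(x) * prod_{i in S} x_i]."""
    points = list(product([-1, 1], repeat=n))
    return {S: sum(f(x) * prod(x[i] for i in S) for x in points) / len(points)
            for size in range(n + 1) for S in combinations(range(n), size)}

def ns_direct(f, n, lam):
    """Exact Pr[f(x) != f(y)], where y flips each coordinate of x w.p. lam/2."""
    q = lam / 2
    points = list(product([-1, 1], repeat=n))
    total = 0.0
    for x in points:
        for flips in product([0, 1], repeat=n):
            y = tuple(-xi if fl else xi for xi, fl in zip(x, flips))
            if f(x) != f(y):
                k = sum(flips)
                total += q ** k * (1 - q) ** (n - k)
    return total / len(points)

def ns_fourier(f, n, lam):
    """(1 - sum_S (1-lam)^|S| fhat(S)^2) / 2, as in the proposition."""
    coeffs = fourier_coefficients(f, n)
    return (1 - sum((1 - lam) ** len(S) * c * c for S, c in coeffs.items())) / 2

def high_frequency_weight(f, n, k):
    """Sum of fhat(S)^2 over |S| > k."""
    return sum(c * c for S, c in fourier_coefficients(f, n).items() if len(S) > k)

def majority3(x):
    """Illustrative example: sign of x_0 + x_1 + x_2."""
    return 1 if x[0] + x[1] + x[2] > 0 else -1
```

On small examples the two computations agree up to floating-point error, and `high_frequency_weight` gives the quantity that the second proposition bounds.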
Average and Restriction
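Def: Let $I \subseteq [n]$, $x \in P([n] \setminus I)$;
the restriction function is
$f_I[x]: P(I) \to \{-1,1\}$
$f_I[x](y) = f(x \cup y)$
Def: the average function is
$A_I[f]: P([n] \setminus I) \to [-1,1]$
$A_I[f](x) = E_{y \in P(I)}[f(x \cup y)]$
Note: $A_I[f](x) = E_{y \in P(I)}[f_I[x](y)]$
[Figure: a fixed x outside I, combined with each possible y inside I]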
Fourier Expansion
Prop: $f_I[x] = \sum_{S \subseteq I} \Big( \sum_{T \cap I = S} \hat f(T)\, \chi_{T \setminus I}(x) \Big) \chi_S$
Prop????: $E_{x \in P(I)}[g(x)] = \sum_{S \cap I = \emptyset} \hat g(S)\, \chi_S$
(averaging g over the coordinates in I)
Corollary: $A_I[f] = \sum_{S \cap I = \emptyset} \hat f(S)\, \chi_S$
Variation
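Def: the variation of f:
$\mathrm{variation}_I(f) = E_{x \in P([n] \setminus I)}\big[ var_{y \in P(I)}( f_I[x](y) ) \big]$
Prop: the following are equivalent
definitions of the variation of f:
$\mathrm{variation}_I(f) = \|f - A_I[f]\|_2^2 = \sum_{S \cap I \ne \emptyset} \hat f^2(S)$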
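The equivalence between the combinatorial and the Fourier forms of the variation can be verified on small functions. This is a sketch under the uniform measure on $\{-1,1\}^n$; `variation_direct`, `variation_fourier` and the example function are illustrative names.

```python
from itertools import combinations, product
from math import prod

def fourier_coefficients(f, n):
    """Map each S (a tuple of indices) to E_x[f(x) * prod_{i in S} x_i]."""
    points = list(product([-1, 1], repeat=n))
    return {S: sum(f(x) * prod(x[i] for i in S) for x in points) / len(points)
            for size in range(n + 1) for S in combinations(range(n), size)}

def variation_direct(f, n, I):
    """E over settings x outside I of the variance of f over settings y of I."""
    inside = sorted(I)
    outside = [i for i in range(n) if i not in inside]
    total = 0.0
    for xs in product([-1, 1], repeat=len(outside)):
        values = []
        for ys in product([-1, 1], repeat=len(inside)):
            point = [0] * n
            for i, v in zip(outside, xs):
                point[i] = v
            for i, v in zip(inside, ys):
                point[i] = v
            values.append(f(tuple(point)))
        mean = sum(values) / len(values)
        total += sum((v - mean) ** 2 for v in values) / len(values)
    return total / 2 ** len(outside)

def variation_fourier(f, n, I):
    """Sum of fhat(S)^2 over the sets S that intersect I."""
    I = set(I)
    return sum(c * c for S, c in fourier_coefficients(f, n).items() if I & set(S))

def majority3(x):
    """Illustrative example: sign of x_0 + x_1 + x_2."""
    return 1 if x[0] + x[1] + x[2] > 0 else -1
```

For the 3-variable majority, a single coordinate has variation ½, and a pair of coordinates has variation ¾ under both definitions.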
Low-freq Variation and
Low-freq Average-Sensitivity
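Def: the low-frequency variation is:
$\mathrm{variation}_I^{\le k}(f) = \mathrm{variation}_I(f^{\le k}) = \sum_{S \cap I \ne \emptyset,\ |S| \le k} \hat f^2(S)$
Def: the average sensitivity is $as(f) = \sum_{i \in [n]} \mathrm{variation}_i(f)$
And in Fourier representation: $as(f) = \sum_S \hat f^2(S)\, |S|$
Def: the low-frequency average sensitivity is:
$as^{\le k}(f) = \sum_{i \in [n]} \mathrm{variation}_i^{\le k}(f) = \sum_{|S| \le k} \hat f^2(S)\, |S|$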
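The two expressions for the average sensitivity can be compared numerically. The sketch assumes the uniform measure on $\{-1,1\}^n$, where for a {-1,1}-valued f the variation of a single coordinate i equals the probability that flipping coordinate i changes f; function names are illustrative.

```python
from itertools import combinations, product
from math import prod

def fourier_coefficients(f, n):
    """Map each S (a tuple of indices) to E_x[f(x) * prod_{i in S} x_i]."""
    points = list(product([-1, 1], repeat=n))
    return {S: sum(f(x) * prod(x[i] for i in S) for x in points) / len(points)
            for size in range(n + 1) for S in combinations(range(n), size)}

def average_sensitivity_by_flips(f, n):
    """Sum over i of Pr_x[f(x) != f(x with coordinate i flipped)]."""
    points = list(product([-1, 1], repeat=n))
    changes = 0
    for i in range(n):
        for x in points:
            flipped = x[:i] + (-x[i],) + x[i + 1:]
            if f(x) != f(flipped):
                changes += 1
    return changes / len(points)

def average_sensitivity_fourier(f, n, k=None):
    """sum_S |S| * fhat(S)^2, restricted to |S| <= k when k is given."""
    return sum(len(S) * c * c
               for S, c in fourier_coefficients(f, n).items()
               if k is None or len(S) <= k)

def majority3(x):
    """Illustrative example: sign of x_0 + x_1 + x_2."""
    return 1 if x[0] + x[1] + x[2] > 0 else -1
```

Passing `k` gives the low-frequency average sensitivity, which ignores the contribution of characters larger than k.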
Main Result
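Theorem:
$\exists$ constant $\gamma>0$ s.t. any Boolean function
$f: P([n]) \to \{-1,1\}$ satisfying $\|f^{>k}\|_2^2 \le O\big((\epsilon/k)^2\big)$
is an $[\epsilon,j]$-junta for $j = O(\epsilon^{-2} k^3 \gamma^{2k})$.
Corollary:
fix a p-biased distribution $\mu_p$ over P([n]).
Let $\lambda>0$ be any parameter.
Set $k = \log_{1-\lambda}(1/2)$.
Then $\exists$ constant $\gamma>0$ s.t. any Boolean function
$f: P([n]) \to \{-1,1\}$ satisfying $ns_\lambda(f) \le O\big((\epsilon/k)^2\big)$
is an $[\epsilon,j]$-junta for $j = O(\epsilon^{-2} k^3 \gamma^{2k})$.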
Where to go for Dinner?
Who has suggestions:
Of course
they’ll have to
discuss it over
dinner….
First Attempt:
Following Friedgut’s Proof
Thm: any Boolean function f is an $[\epsilon,j]$-junta for
$j = 2^{O(as(f)/\epsilon)}$
Proof:
1. Specify the junta $J = \{ i \mid \mathrm{variation}_i(f) \ge \tau \}$,
where, let $k = O(as(f)/\epsilon)$ and fix $\tau = 2^{-O(k)}$
2. Show the complement of J has small variation
[Figure: P([n]) with the junta J marked]
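Following Friedgut - Cont
Lemma: $\mathrm{variation}_{\bar J}(f) \le \epsilon/2$, where $\bar J = [n] \setminus J$
Proof:
$\mathrm{variation}_{\bar J}(f) \le \mathrm{variation}_{\bar J}^{\le k}(f) + \|f^{>k}\|_2^2$
Now, let's bound each argument:
Prop: $\|f^{>k}\|_2^2 \le as(f)/k$
Proof: characters of size > k contribute to the
average-sensitivity at least $k \cdot \|f^{>k}\|_2^2$
(since $as(f) = \sum_S \hat f^2(S)\, |S|$)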
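Following Friedgut - Cont
Prop: $\mathrm{variation}_{\bar J}^{\le k}(f) \le \epsilon/4$
Proof (for a norm index r < 2, with $\tau$ the threshold defining J):
$\mathrm{variation}_{\bar J}^{\le k}(f) \le \sum_{i \notin J} \mathrm{variation}_i^{\le k}(f) = \sum_{i \notin J}\ \sum_{S \ni i,\ |S| \le k} \hat f^2(S)$
$\le 2^{O(k)} \sum_{i \notin J} \Big\| \sum_{S \ni i} \hat f(S)\, \chi_S \Big\|_r^2 = 2^{O(k)} \sum_{i \notin J} \mathrm{influence}_i(f)^{2/r}$
$\le 2^{O(k)}\, \tau^{2/r - 1} \sum_{i \notin J} \mathrm{influence}_i(f) \le 2^{O(k)}\, \tau^{2/r - 1}\, as(f)$
Notes on the steps:
The equality with $\mathrm{influence}_i(f)$ is true only since $\sum_{S \ni i} \hat f(S)\, \chi_S$ is a {-1,0,1} function.
We do not know whether as(f) is small!
So we cannot proceed this way with only $as^{\le k}$!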
If k were 1
Easy case (!?!): If we had a bound on the non-linear weight, we would be done.
The linear part is a set of independent
characters (the singletons)
In order for those to hit close to 1 or -1 most of
the time, they must avoid the law of large
numbers, namely be almost entirely placed on
one singleton [by a Chernoff-like bound]
Thm [FKN, ext.]: Assume f is close to linear;
then f is close to shallow (a constant
function or a dictatorship)
How to Deal with Dependency
between Characters
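Recall $\mathrm{variation}_{\bar J}(f) \le \|f^{>k}\|_2^2 + \mathrm{variation}_{\bar J}^{\le k}(f)$,
where the first term is small by the theorem’s premise
Idea: Let $J = \{ i \mid \mathrm{variation}_i^{\le k}(f) \ge \tau \}$
Partition $[n] \setminus J$ into $I_1, \dots, I_r$, for r >> k
w.h.p. $f_I[x]$ is close to linear (low-frequency
characters are expected to intersect I in no more than 1
element, while the high-frequency weight is low).
[Figure: P([n]) split into J and the parts $I_1, I_2, \dots, I_r$, with one part I highlighted]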
So what?
[Figure: P([n]) split into J and the parts $I_1, I_2, \dots, I_r$, with one part I highlighted]
$f_I[x]$ is close to linear
By FKN, $f_I[x]$ is close to either a constant function
or a dictatorship, for any x
Still, $f_I[x]$ could be a different
dictatorship for every x, hence the
variation of each $i \in I$ might be low
almost linear $\Rightarrow$ almost shallow
Theorem ([FKN]): $\exists$ global constant M,
s.t. $\forall$ Boolean function f,
$\exists$ shallow Boolean function g, s.t.
$\|f - g\|_2^2 \le M \cdot \|f^{>1}\|_2^2$
Hence, $\|f_I[x]^{>1}\|_2^2$ is small $\Rightarrow$ $f_I[x]$ is
close to shallow!
Dictatorship and its Singleton
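Prop: if $f_I[x]$ is a dictatorship, then
$\exists$ coordinate i s.t. $|\widehat{f_I[x]}(\{i\})| \ge p$ (where p is the
bias).
[Figure: Fourier weight per character, over {1}, {2}, …, {i}, …, {n}, {1,2}, {1,3}, …, {n-1,n}, …, {1,..,n}; annotation: total weight of no more than 1-p]
Corollary (from [FKN]): $\exists$ global constant M,
s.t. $\forall$ Boolean function h, either $\exists i$ with $|\hat h(\{i\})| \ge p$,
or $\mathrm{variation}_{[n]}(h) \le M \cdot \|h^{>1}\|_2^2$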
fI[x] Mostly Constant
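Lemma: $\exists \delta>0$, s.t. for any $\eta>0$ and any
function $g: P([m]) \to \mathbb{R}$:
$\Pr_{x \sim \mu^m}[\,|g(x)| \ge \eta\,] \le M_1 \delta^{-4k} \|g^{\le k}\|_2^4 + M_2 \|g^{>k}\|_2^2$
Def: Let $D_I$ be the set of $x \in P([n] \setminus I)$ s.t.
$f_I[x]$ is a dictatorship:
$D_I = \{ x \in P([n] \setminus I) : \exists i \in I \text{ s.t. } |\widehat{f_I[x]}(\{i\})| \ge p \}$
Next we show that $|D_I|$ must be small,
hence for most x, $f_I[x]$ is constant.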
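|D_I| must be small
Lemma: $\Pr_{x \sim \mu^{[n]}}[x \in D_I] \le M_1 \delta^{-4k} \tau + M_2 \|f^{>k}\|_2^2$ (with $\tau$ the threshold defining J)
Proof: let $g_i(x) = \widehat{f_I[x]}(\{i\})$, then
$\Pr_{x \sim \mu^{[n]}}[x \in D_I] \le \sum_{i \in I} \Pr_{x \in P([n] \setminus I)}[\,|g_i(x)| \ge p\,]$
$\le M_1 \delta^{-4k} \sum_{i \in I} \|g_i^{\le k}\|_2^4 + M_2 \sum_{i \in I} \|g_i^{>k}\|_2^2$ (by the prev lemma)
$= M_1 \delta^{-4k} \sum_{i \in I} \Big( \sum_{S \subseteq [n],\ |S| \le k,\ S \cap I = \{i\}} \hat f^2(S) \Big)^2 + M_2 \sum_{i \in I}\ \sum_{S \subseteq [n],\ |S| > k,\ S \cap I = \{i\}} \hat f^2(S)$
(by Parseval, using $\widehat{f_I[x]}(S) = \sum_{T \subseteq [n],\ T \cap I = S} \hat f(T)\, \chi_{T \setminus I}(x)$)
$\le M_1 \delta^{-4k} \sum_{i \in I} \Big( \sum_{S \subseteq [n],\ |S| \le k,\ S \cap I = \{i\}} \hat f^2(S) \Big)^2 + M_2 \|f^{>k}\|_2^2$
Each S is counted only for one index $i \in I$. (Otherwise,
if S was counted for both i and j in I, then $|S \cap I| > 1$!)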
Simple Prop
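Prop: let $\{a_i\}_{i \in I}$ be a sub-distribution, that
is, $\sum_{i \in I} a_i \le 1$, $0 \le a_i$; then $\sum_{i \in I} a_i^2 \le \max_{i \in I}\{a_i\}$.
Proof: $\sum_{i \in I} a_i^2 \le \frac{1}{a_{max}} \cdot a_{max}^2 = a_{max}$
(the extreme case is no more than $1/a_{max}$ indices, each with $a_i = a_{max}$; in general $a_i^2 \le a_{max} \cdot a_i$ and $\sum_{i \in I} a_i \le 1$)
[Figure: bars $a_1, a_2, a_3, \dots, a_n$, each of height at most $a_{max}$]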
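The simple proposition is easy to sanity-check on random inputs. This is a small sketch; the helper names and the way random sub-distributions are generated are assumptions made for illustration only.

```python
import random

def sum_of_squares_at_most_max(a, tol=1e-12):
    """Check sum(a_i^2) <= max(a_i) for a sub-distribution a."""
    if any(v < 0 for v in a) or sum(a) > 1 + tol:
        raise ValueError("not a sub-distribution")
    return sum(v * v for v in a) <= max(a) + tol

def random_sub_distribution(size, rng):
    """Non-negative values whose total is a random number below 1."""
    raw = [rng.random() + 1e-9 for _ in range(size)]
    scale = rng.random() / sum(raw)
    return [v * scale for v in raw]

rng = random.Random(0)  # fixed seed so the run is repeatable
all_hold = all(sum_of_squares_at_most_max(random_sub_distribution(10, rng))
               for _ in range(1000))
```

Equality is attained, for instance, by a = (½, ½), which matches the extreme case described in the proof.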
|DI| must be small - Cont
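Therefore
$\sum_{i \in I} \Big( \sum_{S \subseteq [n],\ |S| \le k,\ S \cap I = \{i\}} \hat f^2(S) \Big)^2 \le \max_{i \in I} \sum_{S \subseteq [n],\ |S| \le k,\ S \cap I = \{i\}} \hat f^2(S) \le \max_{i \in I} \mathrm{variation}_i^{\le k}(f) \le \tau$
(since $\sum_{i \in I}\ \sum_{S \subseteq [n],\ |S| \le k,\ S \cap I = \{i\}} \hat f^2(S) \le 1$, and every $i \in I$ lies outside J),
Hence
$\Pr_{x \sim \mu^{[n]}}[x \in D_I] \le M_1 \delta^{-4k} \tau + M_2 \|f^{>k}\|_2^2$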
Obtaining the Lemma
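It remains to show that indeed:
$E_{I \sim \mu^{\bar J}_{1/r}}[\mathrm{variation}_I(f)] \le O\big( \delta^{-4k} \|f^{>k}\|_2^2 \big)$
Prop1: $\mathrm{variation}_I(f) = E_{x \sim \mu^{[n] \setminus I}}[\mathrm{variation}_I(f_I[x])]$
• Recall $\mathrm{variation}_I(f) = E_{x \in P([n] \setminus I)}\big[ var_{y \in P(I)}( f_I[x](y) ) \big]$
• However $\mathrm{variation}_I(f_I[x_0]) = var_{y \in P(I)}( f_I[x_0](y) )$
Prop2: $E_{x \sim \mu^{[n] \setminus I}}\big[ \widehat{f_I[x]}^2(S) \big] = \sum_{T \cap I = S} \hat f^2(T)$ for every $S \subseteq I$
Proof: $\{\chi_S\}_S$ are orthonormal, and $\widehat{f_I[x]}(S) = \sum_{T \cap I = S} \hat f(T)\, \chi_{T \setminus I}(x)$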
Obtaining the Lemma – Cont.
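Prop3: $E_{I \sim \mu^{\bar J}_{1/r}}\Big[ \sum_{|S \cap I| > 1} \hat f^2(S) \Big] \le O\big( \|f^{>k}\|_2^2 \big)$
Proof: separate by freq:
Small freq: $E_{I \sim \mu^{\bar J}_{1/r}}\Big[ \sum_{|S \cap I| > 1,\ |S| \le k} \hat f^2(S) \Big] \le \max_{|S| \le k} \Pr_I\big[\, |S \cap I| > 1 \,\big] \le \binom{k}{2} r^{-2}$, which is negligible for r >> k
Large freq: $E_{I \sim \mu^{\bar J}_{1/r}}\Big[ \sum_{|S \cap I| > 1,\ |S| > k} \hat f^2(S) \Big] \le \|f^{>k}\|_2^2$
Corollary (from props 2,3):
$E_{I \sim \mu^{\bar J}_{1/r},\ x \sim \mu^{[n] \setminus I}}\big[ \|f_I[x]^{>1}\|_2^2 \big] \le O\big( \|f^{>k}\|_2^2 \big)$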
Obtaining the Lemma – Cont.
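Recall: by corollary from [FKN],
either $\exists i$ with $|\hat f(\{i\})| \ge p$, or $\mathrm{variation}_{[n]}(f) \le M \cdot \|f^{>1}\|_2^2$
Hence, for $I \sim \mu^{\bar J}_{1/r}$ and $x \sim \mu^{[n] \setminus I}$:
$E_{I, x}[\mathrm{variation}_I(f_I[x])] \le \Pr_{I, x}[x \in D_I] + E_{I, x}\big[ M \cdot \|f_I[x]^{>1}\|_2^2 \big]$
The first term is small since $|D_I|$ is small; by the Corollary,
$E_{I, x}\big[ \|f_I[x]^{>1}\|_2^2 \big] \le O\big( \|f^{>k}\|_2^2 \big)$
Combined with Prop1 we obtain:
$E_{I \sim \mu^{\bar J}_{1/r}}[\mathrm{variation}_I(f)] \le O\big( \delta^{-4k} \|f^{>k}\|_2^2 \big)$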
Important Lemma
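Lemma: $\exists \delta>0$, s.t. for any $\eta>0$ and any
function $g: P([m]) \to \mathbb{R}$, the following
holds:
$\Pr_{x \sim \mu^m}[\,|g(x)| \ge \eta\,] \le O\big( \delta^{-4k} \|g^{\le k}\|_2^4 + \|g^{>k}\|_2^2 \big)$
(the first term is the low-freq part, the second the high-freq part)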
Beckner/Nelson/Bonami Inequality
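Def: let $T_\delta$ be the following operator on f:
$T_\delta f(x) = E_{y \sim \mu_{1-\delta,p,x}}[f(y)]$
Thm: for any p ≥ r and $\delta \le ((r-1)/(p-1))^{1/2}$:
$\|T_\delta f\|_p \le \|f\|_r$
Corollary: for g s.t. $g^{>k} = 0$:
$\|g\|_4^4 \le \delta^{-4k} \|g\|_2^4$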
Probability Concentration
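Simple Bound: $\Pr_{x \sim \mu^m}[\,|g(x)| \ge \eta\,] \le \|g\|_t^t / \eta^t$
Proof: Markov's inequality, applied to $|g|^t$.
Low-freq Bound: Let $g: P([m]) \to \mathbb{R}$ be of
degree k and $\eta>0$; then $\exists \delta>0$ s.t.
$\Pr_{x \sim \mu^m}[\,|g(x)| \ge \eta\,] \le \eta^{-4} \delta^{-4k} \|g\|_2^4$
Proof: recall the corollary: $\|g\|_4^4 \le \delta^{-4k} \|g\|_2^4$,
and apply the simple bound with t=4.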
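Both tail bounds can be evaluated exactly for a small low-degree function. The sketch below assumes the uniform measure on $\{-1,1\}^m$, for which the hypercontractive constant can be taken as $\delta = 1/\sqrt{3}$, so that $\delta^{-4k} = 9^k$; that choice and all function names are assumptions for illustration.

```python
from itertools import product

def tail_probability(g, m, eta):
    """Exact Pr_x[|g(x)| >= eta] for uniform x in {-1,1}^m."""
    points = list(product([-1, 1], repeat=m))
    return sum(abs(g(x)) >= eta for x in points) / len(points)

def norm_power(g, m, t):
    """E_x[|g(x)|^t], that is, the t-norm of g raised to the power t."""
    points = list(product([-1, 1], repeat=m))
    return sum(abs(g(x)) ** t for x in points) / len(points)

def simple_bound(g, m, eta, t):
    """Markov-type bound: E|g|^t / eta^t."""
    return norm_power(g, m, t) / eta ** t

def low_freq_bound(g, m, eta, k):
    """eta^-4 * 9^k * (E g^2)^2, valid for g of degree at most k (uniform case)."""
    return 9 ** k * norm_power(g, m, 2) ** 2 / eta ** 4

def coordinate_sum(x):
    """Illustrative degree-1 function: x_0 + x_1 + x_2."""
    return x[0] + x[1] + x[2]
```

For `coordinate_sum` with η=2 the true tail is ¼, comfortably below both bounds; the bounds are loose here, but they hold for every function of the stated degree.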
Lemma’s Proof
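Now, let’s prove the lemma:
Bounding low and high freq separately:
$\forall \eta$: $\Pr_{x \sim \mu^m}[\,|g(x)| \ge \eta\,] \le \Pr_{x \sim \mu^m}[\,|g^{\le k}(x)| \ge \eta/2\,] + \Pr_{x \sim \mu^m}[\,|g^{>k}(x)| \ge \eta/2\,]$
$\le (2/\eta)^4\, \delta^{-4k} \|g^{\le k}\|_2^4 + (2/\eta)^2\, \|g^{>k}\|_2^2$
(low-freq bound for the first term; simple bound, with t=2, for the second)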
Shallow Function
Def: a function f is linear, if only singletons
have non-zero weight
Def: a function f is shallow, if f is either a
constant or a dictatorship.
Claim: Boolean linear functions are shallow.
[Figure: Fourier weight plotted against character size 0, 1, 2, 3, …, k, …, n]
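Boolean Linear $\Rightarrow$ Shallow
Claim: Boolean linear functions are
shallow.
Proof: let f be a Boolean linear function;
we next show:
1. $\exists \{i_0\}$ s.t. $f = \hat f(\emptyset)\chi_\emptyset + \hat f(\{i_0\})\chi_{\{i_0\}}$
(i.e. $\forall S \ne \emptyset, \{i_0\}$: $\hat f(S) = 0$)
2. And conclude that either $f = \hat f(\emptyset)\chi_\emptyset$ or $f = \hat f(\{i_0\})\chi_{\{i_0\}}$,
i.e. f is shallow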
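Claim 1
Claim 1: let f be a Boolean linear function;
then $\exists \{i_0\}$ s.t. $f = \hat f(\emptyset)\chi_\emptyset + \hat f(\{i_0\})\chi_{\{i_0\}}$
Proof: w.l.o.g. assume $|\hat f(\{1\})| \ge |\hat f(\{2\})| > 0$
for any $z \subseteq \{3,\dots,n\}$, consider
$x_{00} = z$, $x_{10} = z \cup \{1\}$, $x_{01} = z \cup \{2\}$, $x_{11} = z \cup \{1,2\}$
Since f is linear,
$f(x_{ab}) - f(x_{a'b'}) = \hat f(\{1\})\big(\chi_1(x_{ab}) - \chi_1(x_{a'b'})\big) + \hat f(\{2\})\big(\chi_2(x_{ab}) - \chi_2(x_{a'b'})\big)$
then for $(a,b)$, $(a',b')$ differing in exactly one coordinate: $|f(x_{ab}) - f(x_{a'b'})| \ge \min\{ |\hat f(\{1\})|, |\hat f(\{2\})| \}$.
Next, some value must be far from {-1,1},
A contradiction! (f is a Boolean function)
Therefore $\hat f(\{2\}) = 0$
[Figure: the values $f(x_{ab})$ on a line between -1 and 1]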
Claim 2
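Claim 2: let f be a Boolean function s.t.
$f = \hat f(\emptyset)\chi_\emptyset + \hat f(\{i_0\})\chi_{\{i_0\}}$
Then either $f = \hat f(\emptyset)\chi_\emptyset$ or $f = \hat f(\{i_0\})\chi_{\{i_0\}}$
Proof: consider $f(\emptyset)$ and $f(\{i_0\})$:
$f(\emptyset) = \hat f(\emptyset) + \hat f(\{i_0\})$
$f(\{i_0\}) = \hat f(\emptyset) - \hat f(\{i_0\})$
Then $|f(\{i_0\}) - f(\emptyset)| = 2\,|\hat f(\{i_0\})|$,
but f is Boolean, hence $|f(\{i_0\}) - f(\emptyset)| \in \{0,2\}$,
therefore $|\hat f(\{i_0\})| \in \{0,1\}$
If it is 0, f is the constant $\hat f(\emptyset)$; if it is 1, Parseval forces $\hat f(\emptyset) = 0$.
[Figure: $f(\emptyset)$ and $f(\{i_0\})$ on a line between -1 and 1, placed symmetrically around $\hat f(\emptyset)$]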
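The claim that Boolean linear functions are shallow can be confirmed exhaustively for a tiny number of variables. The sketch below assumes the uniform measure on $\{-1,1\}^3$, treats a function as linear when all its Fourier coefficients on sets of size 2 or more vanish, and counts both $\chi_i$ and $-\chi_i$ as dictatorships; the names are illustrative.

```python
from itertools import combinations, product
from math import prod

def boolean_linear_functions(n):
    """All f: {-1,1}^n -> {-1,1} with no Fourier weight on sets of size >= 2."""
    points = list(product([-1, 1], repeat=n))
    big_sets = [S for size in range(2, n + 1)
                for S in combinations(range(n), size)]
    found = []
    for table in product([-1, 1], repeat=len(points)):
        f = dict(zip(points, table))
        # The unnormalised coefficient is an integer, so testing == 0 is exact.
        if all(sum(f[x] * prod(x[i] for i in S) for x in points) == 0
               for S in big_sets):
            found.append(f)
    return found

def is_shallow(f, n):
    """True if f is constant, or equals x_i or -x_i for some coordinate i."""
    if len(set(f.values())) == 1:
        return True
    return any(all(f[x] == sign * x[i] for x in f)
               for i in range(n) for sign in (1, -1))

linear_functions = boolean_linear_functions(3)
```

Out of the 256 Boolean functions on three variables, only the two constants and the six signed dictatorships survive the linearity filter.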
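Proving FKN:
almost-linear $\Rightarrow$ close to shallow
Theorem: Let $f: P([n]) \to \mathbb{R}$ be linear,
let $\epsilon = \| |f| - 1 \|_2^2$,
let $i_0$ be the index s.t. $|\hat f(\{i_0\})|$ is maximal;
then
$\| f - \hat f(\emptyset)\chi_\emptyset - \hat f(\{i_0\})\chi_{\{i_0\}} \|_2^2 \le (1 + o(1))\,\epsilon$
Note: f is linear, hence $f = \hat f(\emptyset)\chi_\emptyset + \sum_{i=1}^{n} \hat f(\{i\})\chi_{\{i\}}$
w.l.o.g., assume $i_0 = 1$; then all we need
to show is:
$\sum_{i=2}^{n} \hat f^2(\{i\}) \le (1 + o(1))\,\epsilon$
We show that in the following claim and lemma.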
Corollary
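Corollary: Let f be linear, and $\epsilon = \| |f| - 1 \|_2^2$;
then $\exists$ a shallow Boolean function g s.t.
$\| f - g \|_2^2 \le (9 + o(1))\,\epsilon$
Proof: let $l = \hat f(\emptyset)\chi_\emptyset + \hat f(\{i_0\})\chi_{\{i_0\}}$, and let g be the
Boolean function closest to l.
Then $\| f - g \|_2 \le (3 + o(1))\sqrt{\epsilon}$;
this is true, as
$\| f - l \|_2$ is small (by the theorem),
and additionally $\| l - g \|_2$ is small, since
$\| |f| - 1 \|_2^2 = \epsilon$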
Claim 1
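Claim 1: Let f be linear.
w.l.o.g., assume $|\hat f(\{1\})| \ge |\hat f(\{2\})| \ge \dots \ge |\hat f(\{n\})|$;
then $\exists$ global constant c=min{p,1-p}
s.t. $\forall i \in \{2,\dots,n\}$: $|\hat f(\{i\})| \le c$
[Figure: Fourier weight per character, over {}, {1}, {2}, …, {i}, …, {n}, {1,2}, {1,3}, …, {n-1,n}, …, {1,..,n}; annotation: each of weight no more than c]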
Proof of Claim1
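Proof: assume $|\hat f(\{2\})| > c$
for any $z \subseteq \{3,\dots,n\}$, consider
$x_{00} = z$, $x_{10} = z \cup \{1\}$, $x_{01} = z \cup \{2\}$, $x_{11} = z \cup \{1,2\}$
Since f is linear,
$f(x_{ab}) - f(x_{a'b'}) = \hat f(\{1\})\big(\chi_1(x_{ab}) - \chi_1(x_{a'b'})\big) + \hat f(\{2\})\big(\chi_2(x_{ab}) - \chi_2(x_{a'b'})\big)$
then for $(a,b)$, $(a',b')$ differing in exactly one coordinate: $|f(x_{ab}) - f(x_{a'b'})| \ge \min\{ |\hat f(\{1\})|, |\hat f(\{2\})| \} > c$
Next, some value must be far from {-1,1}!
A contradiction! (to $\| |f| - 1 \|_2^2$ being small)
[Figure: the values $f(x_{ab})$ on a line between -1 and 1]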
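Lemma
Lemma: Let g be linear, let $\epsilon = \| |g| - 1 \|_2^2$;
assume $\|g^{>0}\|_2^2 \le c$, then $\|g^{>0}\|_2^2 \le (1 + o(1))\,\epsilon$
Corollary: The theorem follows from the
combination of claim 1 and the lemma:
Let m be the minimal index s.t. $\sum_{i=m}^{n} \hat f^2(\{i\}) \le c$
Consider $g = b + \sum_{i=m}^{n} \hat f(\{i\})\chi_{\{i\}}$ (b a constant), and note $\|g^{>0}\|_2^2 = \sum_{i=m}^{n} \hat f^2(\{i\})$
If m=2: the theorem is obtained (by the lemma)
Otherwise, a contradiction to the minimality
of m: $\sum_{i=m-1}^{n} \hat f^2(\{i\}) \le c^2 + (1 + o(1))\,\epsilon < c$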
Lemma’s Proof
Lemma’s Proof: Note
$var(|g|) \le \| |g| - 1 \|_2^2 = \epsilon$ and $\|g^{>0}\|_2^2 = var(g)$
Hence, all we need to show is that
$var(g) \approx var(|g|)$
Intuition:
Note that |g| and |b| are far from 0
(since |g| is $\epsilon$-close to 1, and c-close to b).
Assume b>0; then for almost all inputs x, g(x)=|g(x)|
(as $\|g^{>0}\|_2^2 \le c$)
Hence $E[g] \approx E[|g(x)|]$, and
therefore $var(g) \approx var(|g|)$
Proof-map:
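|g|,|b| are far from 0
$\Rightarrow$ g(x)=|g(x)| for almost all x
$\Rightarrow$ $E[g] \approx E[|g|]$
$\Rightarrow$ $var(g) \approx var(|g|)$
$E^2[|g|] - E^2[g] \le o(\epsilon)$, since $E[|g|] - E[g] = 2E[\,|g| \cdot 1_{\{g<0\}}]$
(by Azuma’s inequality)
We next show $var(g) \approx var(|g|)$:
By the premise $var(|g|) \le \epsilon$;
however
$var(g) = \|g\|_2^2 - E^2[g] = var(|g|) + E^2[|g|] - E^2[g]$,
therefore $var(g) \le var(|g|) + o(\epsilon)$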
Variation Lemma
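Lemma (variation): $\exists \delta>0$, and r>>k s.t.
$E_{I \sim \mu^{\bar J}_{1/r}}[\mathrm{variation}_I(f)] \le O\big( \delta^{-4k} \|f^{>k}\|_2^2 \big)$
Corollary: for most I and x, $f_I[x]$ is almost
constant
Recall: $\mathrm{variation}_I(f) = E_{x \in P([n] \setminus I)}\big[ var_{y \in P(I)}( f_I[x](y) ) \big]$
[Figure: P([n]) split into J and the parts $I_1, I_2, \dots, I_r$, with one part I highlighted]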
Using Idea2
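By union bound on $I_1, \dots, I_r$:
$\mathrm{variation}_{\bar J}(f) \le r \cdot O\big( \delta^{-4k} \|f^{>k}\|_2^2 \big)$
(with r set accordingly)
Let $f'(x) = sign( A_{\bar J}[f](x \cap J) )$.
f’ is the Boolean function closest to $A_{\bar J}[f]$,
therefore
$\|f - f'\|_2^2 \le \big( \|f - A_{\bar J}[f]\|_2 + \|A_{\bar J}[f] - f'\|_2 \big)^2 \le 4\, \|f - A_{\bar J}[f]\|_2^2 = 4\, \mathrm{variation}_{\bar J}(f)$
Hence f is an $[\epsilon,j]$-junta.
[Figure: P([n]) split into J and the parts $I_1, I_2, \dots, I_r$]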
variation-Lemma - Proof Plan
Lemma (variation): $\exists \delta>0$, and r>>k s.t.
$E_{I \sim \mu^{\bar J}_{1/r}}[\mathrm{variation}_I(f)] \le O\big( \delta^{-4k} \|f^{>k}\|_2^2 \big)$
Sketch for proving the variation lemma:
1. w.h.p. $f_I[x]$ is almost linear
2. w.h.p. $f_I[x]$ is close to shallow
3. $f_I[x]$ cannot be close to a dictatorship
too often.
The End
XOR Test
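Let $\Delta$ be a random procedure for
choosing two disjoint subsets x,y s.t.:
$\forall i \in [n]$,
$i \in x \setminus y$ w.p. 1/3,
$i \in y \setminus x$ w.p. 1/3, and
$i \notin x \cup y$ w.p. 1/3.
Def (XOR-Test): Pick $\langle x,y \rangle \sim \Delta$,
Accept if $f(x) \ne f(y)$,
Reject otherwise.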
Example
Claim: Let f be a dictatorship; then f
passes the XOR-test w.p. 2/3.
Proof: Let i be the dictator; then
$\Pr_{\langle x,y \rangle \sim \Delta}[f(x) \ne f(y)] = \Pr_{\langle x,y \rangle \sim \Delta}[i \in x \cup y] = 2/3$
Claim: Let f’ be $\epsilon$-close to a
dictatorship f; then f’ passes the XOR-test w.p. $2/3 - \tfrac{2}{3}(\epsilon - \epsilon^2)$.
Proof: see next slide…
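The acceptance probability of the XOR-test can be computed exactly for small n by enumerating the three options of every element. This is a minimal sketch, assuming subsets are frozensets over {0,…,n-1}; the function names and the example functions are illustrative.

```python
from fractions import Fraction
from itertools import product

def xor_test_pass_probability(f, n):
    """Exact Pr[f(x) != f(y)] when every element independently goes
    to x only, to y only, or to neither, each w.p. 1/3."""
    passed = 0
    for assignment in product(range(3), repeat=n):
        x = frozenset(i for i in range(n) if assignment[i] == 0)
        y = frozenset(i for i in range(n) if assignment[i] == 1)
        if f(x) != f(y):
            passed += 1
    return Fraction(passed, 3 ** n)

def dictator(x):
    """Illustrative example: dictatorship of element 0."""
    return 1 if 0 in x else -1

def constant(x):
    """Illustrative example: always 1."""
    return 1

def parity2(x):
    """Illustrative example: parity of elements 0 and 1."""
    return -1 if len(x & {0, 1}) % 2 else 1
```

A dictatorship passes with probability exactly 2/3, a constant function never passes, and the parity of two elements passes with probability 4/9.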
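$\Pr_{\langle x,y \rangle \sim \Delta}[f'(x) \ne f'(y)]$
$= \Pr[ f(x) \ne f(y) \wedge f'(x) = f(x) \wedge f'(y) = f(y) ]$
$+ \Pr[ f(x) \ne f(y) \wedge f'(x) \ne f(x) \wedge f'(y) \ne f(y) ]$
$+ \Pr[ f(x) = f(y) \wedge ( ( f'(x) \ne f(x) \wedge f'(y) = f(y) ) \vee ( f'(x) = f(x) \wedge f'(y) \ne f(y) ) ) ]$
$= \tfrac{2}{3}(1-\epsilon)^2 + \tfrac{2}{3}\epsilon^2 + \tfrac{1}{3} \cdot 2\epsilon(1-\epsilon)$
$= \tfrac{2}{3}\big(1 - 2\epsilon(1-\epsilon)\big) + \tfrac{2}{3}\epsilon(1-\epsilon)$
$= \tfrac{2}{3} - \tfrac{2}{3}(\epsilon - \epsilon^2)$
(treating the events $f'(x) \ne f(x)$ and $f'(y) \ne f(y)$ as independent, each of probability $\epsilon$)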
Local Maximality
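Def: f is locally maximal with respect to a
test,
if $\forall$ f’ obtained from f by a change on one
input $x_0$, that is,
$f'(x) = \begin{cases} f(x) & x \ne x_0 \\ -f(x_0) & x = x_0 \end{cases}$
$\Pr_{\langle x,y \rangle \sim \Delta}[f(x) \ne f(y)] \ge \Pr_{\langle x,y \rangle \sim \Delta}[f'(x) \ne f'(y)]$
Def: Let $\Delta(x)$ be the distribution of all y such
that $\langle x,y \rangle \sim \Delta$.
Claim: if f is locally maximal then
$f(x) = -sign(E_{y \sim \Delta(x)}[f(y)])$.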
Claim: $E_{y \sim \Delta(x)}[f(y)] = \sum_{S \subseteq x} \hat f(S)$
Proof: immediate from the Fourier expansion,
and the fact that $y \cap x = \emptyset$
Conjecture: Let f be locally maximal
(with respect to the XOR-test),
assume f passes the XOR-test w.p.
$1/2 + \delta$, for some constant $\delta>0$;
then f is close to a junta.