Noise-Insensitive Boolean-Functions are Juntas
Guy Kindler & Muli Safra
Slides prepared with help of: Adi Akavia
1
Influential People


The theory of the influence of variables on Boolean functions [BL, KKL], and related issues, was introduced to tackle problems of social choice. It has since motivated a magnificent sequence of works related to economics [K], percolation [BKS], and hardness of approximation [DS], all revolving around the Fourier/Walsh analysis of Boolean functions…
And the really important question:
2
Where to go for Dinner?
Who has suggestions?
Each casts their vote in an (electronic) envelope, and the system decides, not necessarily according to majority…
Power: it turns out that someone, in the Florida wing, has the power (influence) to flip some votes.
3
Voting Systems



n agents each vote either “for” (T) or “against” (F); the outcome is a Boolean function f over the n variables.
The value of each agent (variable) may, independently, flip with some probability ε.
It turns out that one cannot design an f that is robust to such noise, that is, one whose value would, on average, change only with small probability, unless f takes into account only very few of the votes.
4
Dictatorship
Def: a Boolean function $f:P([n])\to\{-1,1\}$ is a monotone e-dictatorship, denoted $f_e$, if
$$f_e(x)=\begin{cases}T & e\in x\\ F & e\notin x\end{cases}$$
5
Juntas
Def: a Boolean function $f:P([n])\to\{-1,1\}$ is a j-junta if $\exists J\subseteq[n]$ with $|J|\le j$ s.t. for every $x\subseteq[n]$: $f(x)=f(x\cap J)$.
Def: f is an $[\epsilon,j]$-junta if $\exists$ a j-junta $f'$ s.t. $\Pr_{x\sim U_n}\big[f(x)\ne f'(x)\big]\le\epsilon$.
Def: f is an $[\epsilon,j,p]$-junta if $\exists$ a j-junta $f'$ s.t. $\Pr_{x\sim\mu_p}\big[f(x)\ne f'(x)\big]\le\epsilon$, where $\mu_p$ is the p-biased product distribution.
We will usually omit p.
6
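As a concrete illustration of these two definitions, here is a minimal Python sketch (added for illustration, not part of the original slides). It represents inputs as 0/1 vectors and checks the junta property by brute force, so it is only feasible for small n.

```python
import itertools

# A minimal sketch (not from the slides): Boolean functions on subsets of [n],
# represented as functions on 0/1 vectors with outputs in {-1, +1}.

def dictatorship(e):
    """The monotone dictatorship f_e: +1 iff coordinate e is present in the input."""
    return lambda x: 1 if x[e] == 1 else -1

def is_junta(f, n, J):
    """Check by brute force that f(x) depends only on the coordinates in J."""
    J = set(J)
    for x in itertools.product([0, 1], repeat=n):
        y = tuple(xi if i in J else 0 for i, xi in enumerate(x))  # this is x ∩ J
        if f(x) != f(y):
            return False
    return True

n = 4
f = dictatorship(2)
print(is_junta(f, n, {2}))  # True: a dictatorship is a 1-junta
print(is_junta(f, n, {0}))  # False: f does not depend on coordinate 0 alone
```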
Long-Code


In the long-code $L:[n]\to\{0,1\}^{2^n}$, each element is encoded by $2^n$ bits.
This is the most extensive binary code, having one bit for every subset in P([n]).
7
Long-Code


Encoding an element $e\in[n]$: the code-word $E_e$ legally encodes e if $E_e=f_e$, the monotone dictatorship of e (its truth-table over all subsets of [n]).
8
Long-Code ↔ Monotone-Dictatorship
The truth-table of a Boolean function over n elements can be viewed as a $2^n$-bit-long string, each bit corresponding to one input setting, i.e. to a subset of [n].
For the long-code, the legal code-words are exactly the monotone dictatorships.
How about the Hadamard code?
9
Long-code Tests

Def (a long-code test): given a code-word w, probe it in a constant number of entries, and
- accept w.h.p. if w is a monotone dictatorship,
- reject w.h.p. if w is not close to any monotone dictatorship.
10
Efficient Long-code Tests
For some applications it suffices if the test may accept illegal code-words, as long as they have a short list-decoding:
Def (a long-code list-test): given a code-word w, probe it in 2 or 3 places, and
- accept w.h.p. if w is a monotone dictatorship,
- reject w.h.p. if w is not even approximately determined by a short list of domain elements, that is, if there is no junta $J\subseteq[n]$ s.t. w is close to some f' with $f'(x)=f'(x\cap J)$ for all x.
Note: a long-code list-test distinguishes between the case where w is a dictatorship and the case where w is far from every junta.
11
Background

Thm (Friedgut): a Boolean function f with small average-sensitivity is an [ε,j]-junta.
Thm (Bourgain): a Boolean function f with small high-frequency weight is an [ε,j]-junta.
Thm (Kindler & Safra): a Boolean function f with small high-frequency weight in a p-biased measure is an [ε,j]-junta.
Corollary: a Boolean function f with small noise-sensitivity is an [ε,j]-junta.
Parameters:
- average-sensitivity [BL, KKL, F]
- high-frequency weight [KKL, B]
- noise-sensitivity [BKS]
13
Noise-Sensitivity
How often does the value of f change when the input is perturbed?
[Illustration: an input $x\subseteq[n]$ and a noise subset $I$ whose coordinates are re-drawn as $z$.]
14
Noise-Sensitivity

Def ($(\epsilon,p,x)$-perturbation): let $0<\epsilon<1$ and $x\in P([n])$. Then $y\sim(\epsilon,p,x)$ if $y=(x\setminus I)\cup z$, where
- $I\sim_\epsilon[n]$ is a noise subset (each coordinate enters I independently with probability ε), and
- $z\sim\mu_p^I$ is a replacement (a p-biased random subset of I).
Def (ε-noise-sensitivity): let $0<\epsilon<1$; then
$$ns_\epsilon(f)=\Pr_{x\sim\mu_p^{[n]},\ y\sim(\epsilon,p,x)}\big[f(x)\ne f(y)\big].$$
[When p=½ this is equivalent to flipping each coordinate of x independently with probability ε/2.]
15
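Here is a minimal Monte-Carlo sketch of this definition in Python (my own illustration, not from the slides): it draws x from the p-biased measure, re-draws each coordinate independently with probability ε, and estimates how often f changes.

```python
import random

# A minimal Monte-Carlo sketch (my own illustration): estimate ns_eps(f) by drawing x
# from the p-biased measure and re-drawing each coordinate independently with prob. eps.

def noise_sensitivity(f, n, eps, p=0.5, samples=100_000):
    disagree = 0
    for _ in range(samples):
        x = [1 if random.random() < p else 0 for _ in range(n)]
        # y = (x \ I) ∪ z : each coordinate enters the noise set I with prob. eps,
        # and is then replaced by a fresh p-biased bit
        y = [(1 if random.random() < p else 0) if random.random() < eps else xi
             for xi in x]
        disagree += f(x) != f(y)
    return disagree / samples

majority = lambda x: 1 if 2 * sum(x) > len(x) else -1
parity   = lambda x: 1 if sum(x) % 2 == 0 else -1

print(noise_sensitivity(majority, 11, 0.1))  # small: majority is fairly noise-insensitive
print(noise_sensitivity(parity, 11, 0.1))    # much larger: parity is very noise-sensitive
```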
Fourier/Walsh Transform
Write $f:\{-1,1\}^n\to\{-1,1\}$ as a polynomial. What would the monomials be?
For every set $S\subseteq[n]$ we have a monomial $\chi_S$, the product of all variables in S (the only relevant powers are 0 or 1).
It now makes sense to consider the degree of f, or to break f up according to the various degrees of its monomials.
16
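A minimal brute-force sketch of these coefficients (my own, uniform measure only): the Fourier/Walsh coefficient of f on a set S is the correlation of f with the monomial $\chi_S$.

```python
import itertools

# A minimal brute-force sketch (uniform measure): the Fourier/Walsh coefficient
# hat{f}(S) = E_x[ f(x) * chi_S(x) ], where chi_S(x) is the product of x_i for i in S.

def fourier_coefficient(f, n, S):
    total = 0.0
    for x in itertools.product([-1, 1], repeat=n):
        chi = 1
        for i in S:
            chi *= x[i]
        total += f(x) * chi
    return total / 2 ** n

dictator = lambda x: x[0]                       # the dictatorship of coordinate 0
print(fourier_coefficient(dictator, 3, {0}))    # 1.0: all weight on the singleton {0}
print(fourier_coefficient(dictator, 3, {1}))    # 0.0
```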
High/Low Frequencies and their Weights
Def: the high-frequency portion of f: $f^{>k}=\sum_{|S|>k}\hat f(S)\,\chi_S$.
Def: the low-frequency portion of f: $f^{\le k}=\sum_{|S|\le k}\hat f(S)\,\chi_S$.
Def: the high-frequency weight: $\|f^{>k}\|_2^2=\sum_{|S|>k}\hat f(S)^2$.
Def: the low-frequency weight: $\|f^{\le k}\|_2^2=\sum_{|S|\le k}\hat f(S)^2$.
17
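A minimal sketch of the high-frequency weight (added for illustration, uniform measure, tiny n only), reusing the brute-force coefficient computation:

```python
import itertools

# A minimal sketch: the weight of f above level k, i.e. the sum of hat{f}(S)^2 over
# all |S| > k, computed by enumerating every character (feasible only for tiny n).

def prod(values):
    p = 1
    for v in values:
        p *= v
    return p

def fourier_coefficient(f, n, S):
    return sum(f(x) * prod(x[i] for i in S)
               for x in itertools.product([-1, 1], repeat=n)) / 2 ** n

def weight_above(f, n, k):
    return sum(fourier_coefficient(f, n, S) ** 2
               for r in range(k + 1, n + 1)
               for S in itertools.combinations(range(n), r))

majority = lambda x: 1 if sum(x) > 0 else -1
print(weight_above(majority, 5, 1))  # the weight of 5-bit majority above the linear level
```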
Low High-Frequency Weight
Prop: the -noise-sensitivity can be expressed in Fourier
transform terms as
2
2  ns f =1   1    f S 
S
S
Prop: Low ns Low high-freq weight
Proof: By the above proposition, low noise-sensitivity
implies
S 2
å (1 - l ) f (S) ~ 1
S
nevertheless, f being {-1, 1} function, by Parseval
formula (that the norm 2 of the function and its
Fourier transform are equal) implies
2
 f S   1
S
18
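To see where the first proposition comes from, here is a short derivation sketch for the uniform case p = 1/2 (my own addition), using the convention from the noise-sensitivity slide that each coordinate of x is independently flipped with probability ε/2, so that $\mathbb{E}[x_i y_i]=1-\epsilon$:

```latex
% cross terms vanish, since E[chi_S(x) chi_T(y)] = 0 for S != T under the uniform measure
\mathbb{E}\big[f(x)\,f(y)\big]
  = \sum_{S,T}\hat f(S)\,\hat f(T)\,\mathbb{E}\big[\chi_S(x)\,\chi_T(y)\big]
  = \sum_{S}\hat f(S)^2\,(1-\epsilon)^{|S|},
\qquad
ns_\epsilon(f) = \Pr\big[f(x)\ne f(y)\big] = \frac{1-\mathbb{E}[f(x)f(y)]}{2}.
```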
Average and Restriction
Def: Let I[n], xP([n]\I),
the restriction function is
I
y
fI  x  : P I   1,1
x
fI  x   y   f  x  y 
Def: the average function is
AI f : P I  
AI f  x   E f  x  y 
yP I 
[n]
y
y y
y
y
I
[n]
x
Note: AI f  x   yEP I fI x  y 
 
19
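A minimal sketch of these two operators in Python (my own illustration; it averages uniformly over the coordinates in I, i.e. the p = 1/2 case):

```python
import itertools

# A minimal sketch (uniform measure, p = 1/2) of the restriction f_I[x] and the
# averaging operator A_I f, with subsets of [n] represented as Python frozensets.

def restrict(f, I, x):
    """f_I[x]: the function on subsets y of I given by y -> f(x ∪ y)."""
    return lambda y: f(frozenset(x) | frozenset(y))

def average(f, I):
    """A_I f: averages f over all settings of the coordinates in I."""
    I = list(I)
    subsets_of_I = [frozenset(s) for r in range(len(I) + 1)
                    for s in itertools.combinations(I, r)]
    def AIf(x):
        return sum(f(frozenset(x) | y) for y in subsets_of_I) / len(subsets_of_I)
    return AIf

# Example: majority over [3] = {0, 1, 2}, restricting / averaging out coordinate 2.
maj = lambda s: 1 if len(s) >= 2 else -1
print(restrict(maj, {2}, frozenset({0}))(frozenset({2})))  # f({0, 2}) = 1
A = average(maj, {2})
print(A(frozenset({0, 1})))  # 1.0: both settings of coordinate 2 give +1
print(A(frozenset({0})))     # 0.0: the two settings of coordinate 2 disagree
```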
Fourier Expansion



Prop: $f_I[x]=\sum_{S\subseteq I}\Big(\sum_{T:\ T\cap I=S}\hat f(T)\,\chi_{T\setminus I}(x)\Big)\chi_S$, i.e. $\widehat{f_I[x]}(S)=\sum_{T:\ T\cap I=S}\hat f(T)\,\chi_{T\setminus I}(x)$.
Prop: for any $g:P(I)\to\mathbb{R}$, $\ \mathbb{E}_{x\in P(I)}\big[g(x)\big]=\hat g(\emptyset)$.
Corollary: $A_I f=\sum_{S:\ S\cap I=\emptyset}\hat f(S)\,\chi_S$.
20
Variation
Def: the variation of f:
$$variation_I(f)=\mathbb{E}_{x\in P(\bar I)}\Big[\mathrm{var}_{y\in P(I)}\ f_I[x](y)\Big].$$
Prop: the following are equivalent definitions of the variation of f:
$$variation_I(f)=\big\|f-A_I f\big\|_2^2=\sum_{S:\ S\cap I\ne\emptyset}\hat f(S)^2 .$$
21
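A minimal sketch of the variation, computed via the Fourier identity above (my own, uniform measure, tiny n only):

```python
import itertools

# A minimal sketch: variation_I(f) via the Fourier identity
# variation_I(f) = sum over S with S ∩ I ≠ ∅ of hat{f}(S)^2 (uniform measure, tiny n).

def fourier_coefficient(f, n, S):
    total = 0.0
    for x in itertools.product([-1, 1], repeat=n):
        chi = 1
        for i in S:
            chi *= x[i]
        total += f(x) * chi
    return total / 2 ** n

def variation(f, n, I):
    I = set(I)
    return sum(fourier_coefficient(f, n, S) ** 2
               for r in range(1, n + 1)
               for S in itertools.combinations(range(n), r)
               if I & set(S))

dictator = lambda x: x[0]
print(variation(dictator, 3, {0}))  # 1.0: all of the weight touches coordinate 0
print(variation(dictator, 3, {1}))  # 0.0: coordinate 1 has no influence
```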
Low-freq Variation and
Low-freq Average-Sensitivity
Def: the low-frequency variation is
$$variation_I^{\le k}(f)=\sum_{S:\ S\cap I\ne\emptyset,\ |S|\le k}\hat f(S)^2 .$$
Def: the average sensitivity is $as(f)=\sum_{i\in[n]}variation_i(f)$; in Fourier representation, $as(f)=\sum_S\hat f(S)^2\,|S|$.
Def: the low-frequency average sensitivity is
$$as^{\le k}(f)=\sum_{i\in[n]}variation_i^{\le k}(f)=\sum_{|S|\le k}\hat f(S)^2\,|S| .$$
25
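As a quick illustration of the two equivalent forms of the average sensitivity (my own sketch, uniform measure): one can compute it directly from coordinate flips and from the Fourier identity, and the two values agree.

```python
import itertools

# A minimal sketch (uniform measure): the average sensitivity computed two ways,
# as the sum of coordinate influences Pr[f(x) != f(x with coordinate i flipped)],
# and via the Fourier identity as(f) = sum_S hat{f}(S)^2 * |S|.

def fourier_coefficient(f, n, S):
    total = 0.0
    for x in itertools.product([-1, 1], repeat=n):
        chi = 1
        for i in S:
            chi *= x[i]
        total += f(x) * chi
    return total / 2 ** n

def as_by_flips(f, n):
    total = 0
    for x in itertools.product([-1, 1], repeat=n):
        for i in range(n):
            y = list(x); y[i] = -y[i]
            total += f(x) != f(tuple(y))
    return total / 2 ** n

def as_by_fourier(f, n):
    return sum(fourier_coefficient(f, n, S) ** 2 * len(S)
               for r in range(n + 1)
               for S in itertools.combinations(range(n), r))

maj = lambda x: 1 if sum(x) > 0 else -1
print(as_by_flips(maj, 5), as_by_fourier(maj, 5))  # the two values agree
```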
Main Result
Theorem: $\exists$ a constant $\gamma>0$ s.t. any Boolean function $f:P([n])\to\{-1,1\}$ satisfying $\|f^{>k}\|_2^2\le O(\epsilon^2\gamma^k)$ is an $[\epsilon,j]$-junta for $j=O(\epsilon^{-2}k^3 2^k)$.
Corollary: fix a p-biased distribution $\mu_p$ over P([n]), and let $\epsilon>0$ be any parameter. Set $k=\log_{1-\epsilon}(1/2)$. Then $\exists$ a constant $\gamma>0$ s.t. any Boolean function $f:P([n])\to\{-1,1\}$ satisfying $ns_\epsilon(f)\le O(\epsilon^2)$ is an $[\epsilon,j]$-junta for $j=O(\epsilon^{-2}k^3 2^k)$.
27
Where to go for Dinner?
Of course, they'll have to discuss it over dinner….
Who has suggestions?
Each casts their vote in an (electronic) envelope, and the system decides, not necessarily according to majority…
Power: it turns out that someone, in the Florida wing, has the power (influence) to flip some votes.
28
First Attempt: Following Friedgut's Proof
Thm: any Boolean function f is an $[\epsilon,j]$-junta for $j=2^{O(as(f)/\epsilon)}$.
Proof:
1. Specify the junta $J=\{i\ |\ variation_i(f)>\delta\}$, where $k=O(as(f)/\epsilon)$ and $\delta=2^{-O(k)}$.
2. Show that the complement of J has small variation.
[Figure: P([n]) split into J and its complement.]
29
Following Friedgut - Cont.
Lemma: $variation_{\bar J}(f)\le\epsilon/2$.
Proof: $variation_{\bar J}(f)\ \le\ variation_{\bar J}^{\le k}(f)\ +\ \|f^{>k}\|_2^2$. Now, let's bound each term.
Prop: $\|f^{>k}\|_2^2\ \le\ as(f)/k\ \le\ \epsilon/4$ (by the choice of k).
Proof: characters of size larger than k contribute at least $k\cdot\|f^{>k}\|_2^2$ to the average-sensitivity (since $as(f)=\sum_S\hat f(S)^2\,|S|$).
31
Following Friedgut - Cont.
[Callouts on this slide: the bound below is true since the relevant function is {-1,0,1}-valued; but in our setting we do not know whether as(f) is small, only that $as^{\le k}(f)$ is, so we cannot proceed this way with $as^{\le k}$ alone!]
Prop: variationJk  f  
4
Proof:
k
J
variation
 f   variation  f 
k
i
iJ
2


iJ iS, S k
 2O(k)  
iJ
2
O(k)
2
 2O(k)  
f(S) S
iJ iS, S k
2
 f(S) S
iS
2
r
 2O(k)  
   influence i  f  
2
O(k)
f(S) S
iS

as  f 

r
 f(S) 
iJ
2/ r
iJ

4/r

S
2



2
r
33
If k were 1
Easy case (!?!): if we had a bound on the non-linear weight, we would be done.
The linear part is a sum of independent characters (the singletons).
In order for this sum to hit close to 1 or -1 most of the time, it must avoid the law of large numbers, namely be almost entirely placed on one singleton [by a Chernoff-like bound].
Thm [FKN, ext.]: assume f is close to linear; then f is close to shallow (a constant function or a dictatorship).
34
How to Deal with Dependency
between Characters
Recall $variation_{\bar J}(f)\ \le\ \|f^{>k}\|_2^2\ +\ variation_{\bar J}^{\le k}(f)$, and $\|f^{>k}\|_2^2\le O(\epsilon^2\gamma^k)$ (the theorem's premise).
[Figure: P([n]) partitioned into J and $I_1,\dots,I_r$.]
Idea: let $J=\{i\ |\ variation_i^{\le k}(f)>\delta\}$.
- Partition $[n]\setminus J$ into $I_1,\dots,I_r$, for $r\gg k$.
- W.h.p. $f_I[x]$ is close to linear (a low-frequency character intersects I in at most one element in expectation, while the high-frequency weight is low).
35
So what?
[Figure: P([n]) partitioned into J and $I_1,\dots,I_r$.]
$f_I[x]$ is close to linear.
By FKN, $f_I[x]$ is then close to either a constant function or a dictatorship, for any x.
Still, $f_I[x]$ could be a different dictatorship for every x, hence the variation of each $i\in I$ might still be low.
36
almost linear ⇒ almost shallow
Theorem ([FKN]): $\exists$ a global constant M s.t. for every Boolean function f there exists a shallow Boolean function g s.t.
$$\|f-g\|_2^2\ \le\ M\,\big\|f^{>1}\big\|_2^2 .$$
Hence, if $\|f_I[x]^{>1}\|_2^2$ is small, then $f_I[x]$ is close to shallow!
37
Dictatorship and its Singleton

Prop: if $f_I[x]$ is a dictatorship, then $\exists$ a coordinate i s.t. $\widehat{f_I[x]}(\{i\})\ge p$ (where p is the bias).
[Figure: the weight over the characters $\{1\},\{2\},\dots,\{i\},\dots,\{n\},\{1,2\},\dots,\{1,\dots,n\}$; the total weight outside $\{i\}$ is no more than 1-p.]
Corollary (from [FKN]): $\exists$ a global constant M s.t. for every Boolean function h, either $\hat h(\{i\})\ge p$ for some i, or $\mathrm{variation}(h)\le M\,\|h^{>1}\|_2^2$.
38
fI[x] Mostly Constant

Lemma: >0, s.t. for any  and any
function g:P([m]) 
4k
k 4
k 2
Pr   g  x      M1
g
 M2 g
2
2
x~  m

Def: Let DI be the set of xP(I), s.t.
fI[x] is a dictatorship
DI

x  P I : i  I, s.t. f xi   p
I
Next we show, that |DI| must be small,
hence for most x, fI[x] is constant.
39
[Recall, from the Fourier expansion of the restriction: $\widehat{f_I[x]}(S)=\sum_{T\subseteq[n],\ T\cap I=S}\hat f(T)\,\chi_{T\setminus I}(x)$.]
$|D_I|$ must be small
Lemma: $\Pr_{x\sim\mu^{[n]}}\big[x\in D_I\big]\ \le\ M_1\,\gamma^{4k}\,\delta\ +\ M_2\,\|f^{>k}\|_2^2$.
Proof: let $g_i(x)=\widehat{f_I[x]}(\{i\})$. Then
$$\Pr_{x\sim\mu}\big[x\in D_I\big]\ \le\ \sum_{i\in I}\Pr_{x\in P(\bar I)}\big[\,|g_i(x)|\ge p\,\big]\ \le\ M_1\,\gamma^{4k}\sum_{i\in I}\big\|g_i^{\le k}\big\|_2^4\ +\ M_2\sum_{i\in I}\big\|g_i^{>k}\big\|_2^2 ,$$
and by the recalled expansion and Parseval this is at most
$$M_1\,\gamma^{4k}\sum_{i\in I}\Big(\sum_{S:\ |S|\le k,\ S\cap I=\{i\}}\hat f(S)^2\Big)^2\ +\ M_2\sum_{i\in I}\ \sum_{S:\ |S|>k,\ S\cap I=\{i\}}\hat f(S)^2$$
$$\le\ M_1\,\gamma^{4k}\sum_{i\in I}\Big(\sum_{S:\ |S|\le k,\ S\cap I=\{i\}}\hat f(S)^2\Big)^2\ +\ M_2\,\|f^{>k}\|_2^2 .$$
(Each S is counted for only one index $i\in I$; otherwise, if S were counted for both i and j in I, we would have $|S\cap I|>1$.)
40
Simple Prop


Prop: let $\{a_i\}_{i\in I}$ be a sub-distribution, that is, $\sum_{i\in I}a_i\le1$ and $a_i\ge0$; then $\sum_{i\in I}a_i^2\le\max_{i\in I}\{a_i\}$.
Proof: $\sum_{i\in I}a_i^2\ \le\ \max_{i\in I}\{a_i\}\cdot\sum_{i\in I}a_i\ \le\ \max_{i\in I}\{a_i\}$.
[Figure: a bar chart of the $a_i$, with total mass no more than 1.]
41
$|D_I|$ must be small - Cont.
Therefore
$$\sum_{i\in I}\Big(\sum_{S:\ |S|\le k,\ S\cap I=\{i\}}\hat f(S)^2\Big)^2\ \le\ \max_{i\in I}\ \sum_{S:\ |S|\le k,\ S\cap I=\{i\}}\hat f(S)^2\ \le\ \max_{i\in I}\ variation_i^{\le k}(f)\ \le\ \delta$$
(the first inequality by the simple prop, since $\sum_{i\in I}\sum_{S:\ |S|\le k,\ S\cap I=\{i\}}\hat f(S)^2\le\sum_S\hat f(S)^2\le1$; the last since $I\subseteq[n]\setminus J$).
Hence
$$\Pr_{x\sim\mu^{[n]}}\big[x\in D_I\big]\ \le\ M_1\,\gamma^{4k}\,\delta\ +\ M_2\,\|f^{>k}\|_2^2 .$$
42
Obtaining the Lemma

It remains to show that indeed
$$\mathbb{E}_{I}\big[variation_I(f)\big]\ \le\ O\Big(\frac1r+\gamma^{4k}\,\|f^{>k}\|_2^2\Big),$$
where I is a random part of the partition of $[n]\setminus J$ into r parts.
Prop 1: $variation_I(f)=\mathbb{E}_{x\sim\mu}\big[\mathrm{var}\big(f_I[x]\big)\big]$.
- Recall $variation_I(f)=\mathbb{E}_{x\in P(\bar I)}\big[\mathrm{var}_{y\in P(I)}f_I[x](y)\big]$.
- However, $\mathrm{var}\big(f_I[x_0]\big)=\mathrm{var}_{y\in P(I)}f_I[x_0](y)$.
Prop 2: for every $S\subseteq I$,
$$\mathbb{E}_{x\sim\mu}\Big[\widehat{f_I[x]}(S)^2\Big]=\sum_{T:\ T\cap I=S}\hat f(T)^2$$
(the characters $\{\chi_S\}_S$ are orthonormal, and $\widehat{f_I[x]}(S)=\sum_{T:\ T\cap I=S}\hat f(T)\,\chi_{T\setminus I}(x)$).
43
Obtaining the Lemma – Cont.



Prop 3:
$$\mathbb{E}_{I}\Big[\sum_{S:\ |S\cap I|>1}\hat f(S)^2\Big]\ \le\ O\Big(\frac{k^2}{r^2}\Big)\ +\ \|f^{>k}\|_2^2 .$$
Proof: separate by frequency.
- Small frequencies: for $|S|\le k$, each element of S falls into the random part I with probability 1/r, so $\Pr_I\big[\,|S\cap I|>1\ \big|\ |S|\le k\,\big]\le O(k^2/r^2)$; hence
$$\mathbb{E}_{I\sim(\bar J,1/r)}\Big[\sum_{|S|\le k,\ |S\cap I|>1}\hat f(S)^2\Big]\ \le\ O\Big(\frac{k^2}{r^2}\Big).$$
- Large frequencies:
$$\mathbb{E}_{I\sim(\bar J,1/r)}\Big[\sum_{|S|>k,\ |S\cap I|>1}\hat f(S)^2\Big]\ \le\ \|f^{>k}\|_2^2 .$$
Corollary (from Props 2, 3):
$$\mathbb{E}_{I\sim(\bar J,1/r),\ x\sim\mu}\Big[\big\|f_I[x]^{>1}\big\|_2^2\Big]\ \le\ O\Big(\frac{k^2}{r^2}\ +\ \|f^{>k}\|_2^2\Big).$$
44
Obtaining the Lemma – Cont.
Recall: by the corollary from [FKN], either $\hat h(\{i\})\ge p$ for some i, or $\mathrm{variation}(h)\le M\,\|h^{>1}\|_2^2$.
Hence
$$\mathbb{E}_{I\sim(\bar J,1/r),\ x\sim\mu}\Big[\mathrm{var}\big(f_I[x]\big)\Big]\ \le\ \Pr_{I\sim(\bar J,1/r),\ x\sim\mu}\big[x\in D_I\big]\ +\ \mathbb{E}_{I\sim(\bar J,1/r),\ x\sim\mu}\Big[M\,\big\|f_I[x]^{>1}\big\|_2^2\Big];$$
the first term is small since $|D_I|$ is small, and the second is $O\big(\tfrac{k^2}{r^2}+\|f^{>k}\|_2^2\big)$ by the corollary above.
Combined with Prop 1 we obtain
$$\mathbb{E}_{I}\big[variation_I(f)\big]\ \le\ O\Big(\frac1r+\gamma^{4k}\,\|f^{>k}\|_2^2\Big).$$
45
Important Lemma

Lemma: >0, s.t. for any  and any
function g:P([m]) , the following
holds:
4k
k 4
k 2
Prm  g  x      O 
g
 g
x~ 

Low-freq
2
2

high-freq
47
Beckner/Nelson/Bonami Inequality
Def: let $T_\delta$ be the following operator on f:
$$T_\delta f(x)\ =\ \mathbb{E}_{y\sim(1-\delta,\,p,\,x)}\big[f(y)\big].$$
Thm: for any $p\ge r$ and $\delta\le\big(\tfrac{r-1}{p-1}\big)^{1/2}$:
$$\big\|T_\delta f\big\|_p\ \le\ \|f\|_r .$$
Corollary: for g s.t. $g^{>k}=0$:
$$\|g\|_4^4\ \le\ \gamma^{4k}\,\|g\|_2^4 .$$
48
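As a sanity check of the corollary in the uniform-measure case, here is a brute-force Python sketch (my own; it assumes one may take $\gamma=\sqrt3$, so that $\gamma^{4k}=9^k$, which is not stated on the slides) comparing $\|g\|_4^4$ with $9^k\|g\|_2^4$ for a random polynomial of degree at most k.

```python
import itertools, random

# Sketch only: build a random multilinear polynomial g of degree <= k over {-1,1}^n
# and check the hypercontractive bound ||g||_4^4 <= 9^k * ||g||_2^4 (uniform measure;
# gamma = sqrt(3) is an assumption of this sketch).

def prod(values):
    p = 1.0
    for v in values:
        p *= v
    return p

def random_low_degree(n, k):
    coeffs = {S: random.gauss(0, 1)
              for r in range(k + 1)
              for S in itertools.combinations(range(n), r)}
    def g(x):
        return sum(c * prod(x[i] for i in S) for S, c in coeffs.items())
    return g

def norm(g, n, q):
    return (sum(abs(g(x)) ** q for x in itertools.product([-1, 1], repeat=n)) / 2 ** n) ** (1 / q)

n, k = 6, 2
g = random_low_degree(n, k)
print(norm(g, n, 4) ** 4, 9 ** k * norm(g, n, 2) ** 4)  # left value <= right value
```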
Probability Concentration


Simple Bound: $\Pr_{x\sim\mu^{[m]}}\big[\,|g(x)|\ge\lambda\,\big]\ \le\ \lambda^{-t}\,\|g\|_t^t$.
Proof: Markov's inequality, applied to $|g(x)|^t$.
Low-freq Bound: let $g:P([m])\to\mathbb{R}$ be of degree at most k and $\lambda>0$; then $\exists\gamma>0$ s.t.
$$\Pr_{x\sim\mu^{[m]}}\big[\,|g(x)|\ge\lambda\,\big]\ \le\ \lambda^{-4}\,\gamma^{4k}\,\|g\|_2^4 .$$
Proof: recall the corollary $\|g\|_4^4\le\gamma^{4k}\,\|g\|_2^4$ and apply the simple bound with t=4.
50
Lemma’s Proof


Now, let's prove the lemma by bounding the low and the high frequencies separately: for any $\lambda'<\lambda$,
$$\Pr_{x\sim\mu^{[m]}}\big[\,|g(x)|\ge\lambda\,\big]\ \le\ \Pr_{x\sim\mu^{[m]}}\big[\,|g^{\le k}(x)|\ge\lambda-\lambda'\,\big]\ +\ \Pr_{x\sim\mu^{[m]}}\big[\,|g^{>k}(x)|\ge\lambda'\,\big]$$
$$\le\ \underbrace{(\lambda-\lambda')^{-4}\,\gamma^{4k}\,\big\|g^{\le k}\big\|_2^4}_{\text{low-freq bound}}\ +\ \underbrace{\lambda'^{-2}\,\big\|g^{>k}\big\|_2^2}_{\text{simple bound}}\ .$$
51
Shallow Function
Def: a function f is linear if only singletons (and the empty character) have non-zero weight.
Def: a function f is shallow if f is either a constant or a dictatorship.
Claim: Boolean linear functions are shallow.
[Figure: the weight of f by character size 0, 1, 2, 3, …, k, …, n.]
52
Boolean Linear ⇒ Shallow
Claim: Boolean linear functions are shallow.
Proof: let f be a Boolean linear function. We next show:
1. $\exists\{i_0\}$ s.t. $f=\hat f(\emptyset)+\hat f(\{i_0\})\,\chi_{i_0}$ (i.e. $\hat f(S)=0$ for every $S\notin\{\emptyset,\{i_0\}\}$).
2. And conclude that either $f=\hat f(\emptyset)$ or $f=\hat f(\{i_0\})\,\chi_{i_0}$, i.e. f is shallow.
53
Claim 1
Claim 1: let f be a Boolean linear function; then $\exists\{i_0\}$ s.t. $f=\hat f(\emptyset)+\hat f(\{i_0\})\,\chi_{i_0}$.
Proof: w.l.o.g. assume $\hat f(\{1\})\ge\hat f(\{2\})>0$. For any $z\subseteq\{3,\dots,n\}$, consider
$$x_{00}=z,\quad x_{10}=z\cup\{1\},\quad x_{01}=z\cup\{2\},\quad x_{11}=z\cup\{1,2\}.$$
Then
$$f(x_{ab})-f(x_{a'b'})\ =\ \hat f(\{1\})\big(\chi_1(x_{ab})-\chi_1(x_{a'b'})\big)\ +\ \hat f(\{2\})\big(\chi_2(x_{ab})-\chi_2(x_{a'b'})\big),$$
so some pair $(a,b)\ne(a',b')$ satisfies $|f(x_{ab})-f(x_{a'b'})|=2\min\{\hat f(\{1\}),\hat f(\{2\})\}$; either this difference lies strictly between 0 and 2, or the four values take at least three distinct values, and in both cases some value must be far from $\{-1,1\}$. A contradiction (f is a Boolean function)! Therefore $\hat f(\{2\})=0$, and similarly all singleton coefficients but one vanish.
54
Claim 2

Claim 2: let f be a Boolean function s.t. $f=\hat f(\emptyset)+\hat f(\{i_0\})\,\chi_{i_0}$. Then either $f=\hat f(\emptyset)$ or $f=\hat f(\{i_0\})\,\chi_{i_0}$.
Proof: consider the inputs $\emptyset$ and $\{i_0\}$:
$$f(\emptyset)=\hat f(\emptyset)+\hat f(\{i_0\}),\qquad f(\{i_0\})=\hat f(\emptyset)-\hat f(\{i_0\}).$$
Then $|f(\{i_0\})-f(\emptyset)|=2\,|\hat f(\{i_0\})|\in\{0,2\}$, but f is Boolean, therefore $|\hat f(\{i_0\})|\in\{0,1\}$. If it is 0 then f is the constant $\hat f(\emptyset)$; if it is 1 then, by Parseval, $\hat f(\emptyset)=0$ and $f=\hat f(\{i_0\})\,\chi_{i_0}$, a dictatorship.
56
Proving FKN: almost-linear ⇒ close to shallow
Theorem: let $f:P([n])\to\mathbb{R}$ be linear, and let $\epsilon=\big\|\,|f|-1\,\big\|_2^2$. Let $i_0$ be the index for which $\hat f(\{i_0\})$ is maximal; then
$$\Big\|f-\big(\hat f(\emptyset)+\hat f(\{i_0\})\,\chi_{i_0}\big)\Big\|_2^2\ \le\ \big(1+o(1)\big)\,\epsilon .$$
Note: f is linear, hence $f-\hat f(\emptyset)=\sum_{i=1}^{n}\hat f(\{i\})\,\chi_i$.
W.l.o.g. assume $i_0=1$; then all we need to show is
$$\sum_{i=2}^{n}\hat f(\{i\})^2\ \le\ \big(1+o(1)\big)\,\epsilon .$$
We show that in the following claim and lemma.
58
Corollary

Corollary: let f be linear with $\big\|\,|f|-1\,\big\|_2^2\le\epsilon$; then $\exists$ a shallow Boolean function g s.t.
$$\|f-g\|_2^2\ \le\ \big(9+o(1)\big)\,\epsilon .$$
Proof: let $l=\hat f(\emptyset)+\hat f(\{i_0\})\,\chi_{i_0}$, and let g be the Boolean function closest to l. Then $\|f-g\|_2\le(3+o(1))\sqrt\epsilon$; this is true since
- $\|f-l\|_2$ is small (by the theorem), and
- $\|l-g\|_2$ is small as well, since $\big\|\,|f|-1\,\big\|_2\le\sqrt\epsilon$.
59
Claim 1

Claim 1: let f be linear, and w.l.o.g. assume $\hat f(\{1\})\ge\hat f(\{2\})\ge\dots\ge\hat f(\{n\})$. Then, for the global constant $c=\min\{p,1-p\}$, for every $i\in\{2,\dots,n\}$: $\hat f(\{i\})\le c$.
[Figure: the characters $\{\},\{1\},\{2\},\dots,\{n\},\{1,2\},\dots,\{1,\dots,n\}$; each of the coefficients $\hat f(\{i\})$, $i\ge2$, has weight no more than c.]
60
Proof of Claim 1
Proof: assume, towards a contradiction, that $\hat f(\{2\})>c$. For any $z\subseteq\{3,\dots,n\}$, consider
$$x_{00}=z,\quad x_{10}=z\cup\{1\},\quad x_{01}=z\cup\{2\},\quad x_{11}=z\cup\{1,2\}.$$
Then
$$f(x_{ab})-f(x_{a'b'})\ =\ \hat f(\{1\})\big(\chi_1(x_{ab})-\chi_1(x_{a'b'})\big)\ +\ \hat f(\{2\})\big(\chi_2(x_{ab})-\chi_2(x_{a'b'})\big),$$
so some pair $(a,b)\ne(a',b')$ satisfies $|f(x_{ab})-f(x_{a'b'})|\ \ge\ \min\{\hat f(\{1\}),\hat f(\{2\})\}\ >\ c$, forcing some of these values to be far from $\{-1,1\}$ on a constant fraction of the inputs z. A contradiction to $\big\|\,|f|-1\,\big\|_2^2\le\epsilon$!
61
Lemma
[Note: $\big\|g-\hat g(\emptyset)\big\|_2^2=\sum_{i\ge m}\hat f(\{i\})^2$.]
Lemma: let g be linear, let $\epsilon=\big\|\,|g|-1\,\big\|_2^2$, and assume $\big\|g-\hat g(\emptyset)\big\|_2^2\le c$; then
$$\big\|g-\hat g(\emptyset)\big\|_2^2\ \le\ \big(1+o(1)\big)\,\epsilon .$$
Corollary: the theorem follows from the combination of Claim 1 and the lemma:
- Let m be the minimal index s.t. $\sum_{i\ge m}\hat f(\{i\})^2\le c$.
- Consider $g=b+\sum_{i\ge m}\hat f(\{i\})\,\chi_i$ (for an appropriate constant b).
- If m=2: the theorem is obtained (by the lemma).
- Otherwise, a contradiction to the minimality of m:
$$\sum_{i\ge m-1}\hat f(\{i\})^2\ \le\ c^2+\big(1+o(1)\big)\,\epsilon\ \le\ c-o(1)\ <\ c .$$
63
Lemma’s Proof

Lemma's Proof: note that
$$\mathrm{var}\big(|g|\big)\ \le\ \big\|\,|g|-1\,\big\|_2^2\ \le\ \epsilon\qquad\text{and}\qquad\big\|g-\hat g(\emptyset)\big\|_2^2\ =\ \mathrm{var}(g).$$
Hence, all we need to show is that $\mathrm{var}(g)\approx\mathrm{var}\big(|g|\big)$.
Intuition:
- Note that |g| and |b| are far from 0 (since |g| is close to 1, and g is c-close to the constant $b=\hat g(\emptyset)$).
- Assume b>0; then for almost all inputs x, $g(x)=|g(x)|$ (as $\|g-b\|_2^2\le c$).
- Hence $\mathbb{E}[g]\approx\mathbb{E}\big[|g(x)|\big]$, and therefore $\mathrm{var}(g)\approx\mathrm{var}\big(|g|\big)$.
64
Proof-map:
|g|, |b| are far from 0 ⇒ g(x)=|g(x)| for almost all x ⇒ $\mathbb{E}[g]\approx\mathbb{E}[|g|]$ ⇒ $\mathrm{var}(g)\approx\mathrm{var}(|g|)$.
- $\mathbb{E}^2\big[|g|\big]-\mathbb{E}^2[g]\ \le\ o(\epsilon)$, by Azuma's inequality (bounding $\mathbb{E}\big[\,|g|\cdot 1_{\{g<0\}}\big]$).
We next show $\mathrm{var}(g)\approx\mathrm{var}(|g|)$:
- By the premise, $\epsilon\ge\mathrm{var}\big(|g|\big)$. However,
$$\mathrm{var}(g)\ =\ \|g\|_2^2-\mathbb{E}^2[g]\ =\ \mathrm{var}\big(|g|\big)+\mathbb{E}^2\big[|g|\big]-\mathbb{E}^2[g],$$
therefore $\mathrm{var}(g)\ \le\ \mathrm{var}\big(|g|\big)+o(\epsilon)$.
65
Variation Lemma
[Figure: P([n]) partitioned into J and $I_1,\dots,I_r$.]
Lemma (variation): $\exists\gamma>0$ and $r\gg k$ s.t.
$$\mathbb{E}_{I}\big[variation_I(f)\big]\ \le\ O\Big(\frac1r+\gamma^{4k}\,\|f^{>k}\|_2^2\Big).$$
Corollary: for most I and x, $f_I[x]$ is almost constant:
$$variation_I(f)\ =\ \mathbb{E}_{x\in P(\bar I)}\Big[\mathrm{var}_{y\in P(I)}\ f_I[x](y)\Big]\ \text{is small.}$$
66
Using Idea 2
[Figure: P([n]) partitioned into J and $I_1,\dots,I_r$; set $r\gg k$ appropriately, as a function of $\gamma^{4k}$ and ε.]
By a union bound over $I_1,\dots,I_r$:
$$variation_{\bar J}(f)\ \le\ \sum_{j\le r}variation_{I_j}(f)\ =\ r\cdot\mathbb{E}_I\big[variation_I(f)\big]\ \le\ r\cdot O\Big(\frac1r+\gamma^{4k}\,\|f^{>k}\|_2^2\Big).$$
Let $f'(x)=\mathrm{sign}\big(A_{\bar J}f(x\cap J)\big)$; f' is the Boolean function closest to $A_{\bar J}f$, and it depends only on J, i.e. it is a |J|-junta. Therefore
$$\|f-f'\|_2^2\ \le\ 2\Big(\big\|f-A_{\bar J}f\big\|_2^2+\big\|A_{\bar J}f-f'\big\|_2^2\Big)\ \le\ 4\,\big\|f-A_{\bar J}f\big\|_2^2\ =\ 4\,variation_{\bar J}(f),$$
hence $\Pr\big[f\ne f'\big]=O(\epsilon)$ and f is an $[\epsilon,j]$-junta.
67
Variation-Lemma - Proof Plan
Lemma (variation): $\exists\gamma>0$ and $r\gg k$ s.t.
$$\mathbb{E}_{I}\big[variation_I(f)\big]\ \le\ O\Big(\frac1r+\gamma^{4k}\,\|f^{>k}\|_2^2\Big)\ \le\ O\Big(\epsilon^2+\gamma^{4k}\,\|f^{>k}\|_2^2\Big)\ \le\ \epsilon\,/\,O(k).$$
Sketch for proving the variation lemma:
1. W.h.p. $f_I[x]$ is almost linear.
2. W.h.p. $f_I[x]$ is close to shallow.
3. $f_I[x]$ cannot be close to a dictatorship too often.
68
The End
69
XOR Test


Let T be a random procedure for choosing two disjoint subsets x, y, where independently for each $i\in[n]$:
- $i\in x\setminus y$ w.p. 1/3,
- $i\in y\setminus x$ w.p. 1/3, and
- $i\notin x\cup y$ w.p. 1/3.
Def (XOR-Test): pick $\langle x,y\rangle\sim T$,
- accept if $f(x)\ne f(y)$,
- reject otherwise.
70
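A minimal Python sketch of this sampling procedure and of the test (my own illustration; the name T for the pair distribution follows the reconstruction above):

```python
import random

# Sketch of the XOR test: sample a pair <x, y> of disjoint subsets of [n], placing each
# coordinate in x only, in y only, or in neither, each with probability 1/3; accept iff
# f(x) != f(y).

def sample_pair(n):
    x, y = set(), set()
    for i in range(n):
        r = random.random()
        if r < 1 / 3:
            x.add(i)          # i in x \ y
        elif r < 2 / 3:
            y.add(i)          # i in y \ x
        # else: i is in neither x nor y
    return x, y

def xor_test_pass_rate(f, n, trials=100_000):
    """Estimate the probability that f passes the XOR test."""
    passes = 0
    for _ in range(trials):
        x, y = sample_pair(n)
        passes += f(x) != f(y)
    return passes / trials

dictator = lambda s: 1 if 0 in s else -1   # dictatorship of coordinate 0
print(xor_test_pass_rate(dictator, 10))    # ≈ 2/3, as claimed on the next slide
```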
Example




Claim: let f be a dictatorship; then f passes the XOR-test w.p. 2/3.
Proof: let i be the dictator; then $\Pr_{\langle x,y\rangle\sim T}\big[f(x)\ne f(y)\big]=\Pr_{\langle x,y\rangle\sim T}\big[i\in x\triangle y\big]=2/3$.
Claim: let f' be δ-close to a dictatorship f; then f' passes the XOR-test w.p. at least $2/3-2\delta$.
Proof: see the next slide…
71
Pr f' x   f' y  
x,y ~
Pr  f  x   f  y     f' x   f  x     f' y   f  y  
x,y ~
 Pr  f  x   f  y     f' x   f  x     f' y   f  y   
x,y ~
 Pr  f  x   f  y     f' x   f  x   f' y   f  y      f' x   f  x   f' y   f  y   
x,y ~
2
 2 3   1     2 3   2  1 3  2    1   
2
 2 3   1      2    1   
 2 3  2 3     2 


72
Local Maximality



Def: f is locally maximal with respect to a test if, for every f' obtained from f by a change on one input $x_0$, that is,
$$f'(x)=\begin{cases}f(x) & x\ne x_0\\ -f(x_0) & x=x_0\end{cases}$$
we have $\Pr_{\langle x,y\rangle\sim T}\big[f(x)\ne f(y)\big]\ \ge\ \Pr_{\langle x,y\rangle\sim T}\big[f'(x)\ne f'(y)\big]$.
Def: let T(x) be the distribution of all y such that $\langle x,y\rangle\sim T$.
Claim: if f is locally maximal then $f(x)=-\mathrm{sign}\big(\mathbb{E}_{y\sim T(x)}[f(y)]\big)$.
73


Claim: E f  y  
y~ (x)
 f S
S x
Proof: immediate from the Fourierexpansion, and the fact that yx=
74

Conjecture: let f be locally maximal (with respect to the XOR-test), and assume f passes the XOR-test w.p. $\ge 1/2+\delta$ for some constant $\delta>0$; then f is close to a junta.
75