Transcript Part I

Techniques for Time-Space Tradeoff Lower
Bounds for Branching Programs: Part I
Paul Beame
University of Washington
joint work with Erik Vee, Mike Saks, T.S. Jayram, Xiaodong Sun
1
Branching programs
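To compute $f:\{0,1\}^n \to \{0,1\}$ on input $(x_1,\dots,x_n)$, follow the path from source to sink.
Time $T$ = length of longest path
Space $S = \log_2(\#\text{ of nodes})$
[Figure: example branching program whose nodes query variables $x_1, x_2, x_3, x_4, x_5, x_7, x_8$, with 0/1 labels on edges and sinks, and the example input $x=(1,1,0,1,\dots)$]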
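A minimal sketch of the model in Python. The three-node program `TOY_BP` and the helper names are hypothetical illustrations, not the program drawn on the slide; the sketch only shows how an input selects a path and how $T$ and $S$ are read off the graph.

```python
from math import log2

# Hypothetical toy program: node -> (queried variable index, successor on 0, successor on 1).
# The sinks are the strings "0" and "1".
TOY_BP = {
    "s": (0, "a", "b"),
    "a": (1, "0", "b"),
    "b": (2, "0", "1"),
}

def evaluate(bp, source, x):
    """Follow the path chosen by input x from the source to a sink.
    Returns (output bit, number of queries made on that path)."""
    node, steps = source, 0
    while node not in ("0", "1"):
        var, on_zero, on_one = bp[node]
        node = on_one if x[var] else on_zero
        steps += 1
    return int(node), steps

def longest_path(bp, source):
    """Time T: length of the longest source-to-sink path (the graph is acyclic)."""
    def depth(node):
        if node in ("0", "1"):
            return 0
        _, on_zero, on_one = bp[node]
        return 1 + max(depth(on_zero), depth(on_one))
    return depth(source)

def space(bp):
    """Space S = log2(# of nodes), counting the two sinks as nodes."""
    return log2(len(bp) + 2)
```

For example, `evaluate(TOY_BP, "s", (0, 1, 1))` visits s, a, b and reaches sink 1 after three queries, which is also the longest path in this toy program.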
2
Branching program properties

Simulate random-access machines


Multi-way version for xi in domain D


same time T and space S
good for modeling RAM input registers
BPs will be leveled wlog.


same time T
at most $2^S$ nodes per level
3
Overall approach to lower bounds

If $f: D^n \to \{0,1\}$ is computed using small time and space, then $f^{-1}(1)$ has a special combinatorial structure.
Lower bounds for $f$ follow if $f^{-1}(1)$ does not have the structure.
How do we find such structures?
4
Levelled BPs and Layers
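Assume time $T \le kn$ and wlog that the BP is levelled ($\le 2^S$ nodes per level)
Break BP into $r$ layers $L_1,\dots,L_r$ of height $kn/r$
Partition (a subset of) the layers $L_j$ into sets $\mathcal{L}_1, \mathcal{L}_2, \dots, \mathcal{L}_p$, $p \ge 2$
[Figure: levelled BP with source $v_0$ and sinks 0 and 1, cut into layers $L_1, L_2, \dots, L_r$, each of height $kn/r$]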
5
The Trace of an Input
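Partition of (a subset of) the layers $L_j$ into sets $\mathcal{L}_1, \mathcal{L}_2, \dots, \mathcal{L}_p$, $p \ge 2$
The trace of input $x$
• the sequence of nodes reached on input $x$ as the computation moves from one set $\mathcal{L}_i$ to another
• E.g. trace$(x) = (v_1,v_2,v_3)$
• $a$ = length of trace = # of alternations in the partition
• $\le 2^{Sa}$ possible traces
[Figure: levelled BP with layers $L_1, L_2, \dots, L_5$ of height $kn/r$, source $v_0$, trace nodes $v_1, v_2, v_3$, and sinks 0 and 1]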
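The definition of the trace can be sketched in Python as follows. The representation is an assumption made for illustration: the computation path is a list of node labels, all layers have the same height, and every layer is assigned to one of the sets (the slide also allows partitioning only a subset of the layers). The function names are hypothetical.

```python
def trace(path, height, layer_class):
    """path[t] is the node reached after t steps (path[0] is the source).
    Layer j covers steps j*height .. (j+1)*height - 1, and layer_class[j]
    names the set of the partition that contains layer j.
    Returns the nodes at which the computation passes from one set to another."""
    out = []
    for j in range(1, len(layer_class)):
        if layer_class[j] != layer_class[j - 1]:
            out.append(path[j * height])  # node on the boundary between layers j-1 and j
    return out

def max_traces(S, a):
    """Each of the a trace nodes is one of at most 2^S nodes on its level."""
    return 2 ** (S * a)

# Five layers of height 2, assigned to sets 1, 2, 2, 1, 1: two alternations.
path = [f"u{t}" for t in range(11)]
example = trace(path, 2, [1, 2, 2, 1, 1])
```

Here `example` holds the two boundary nodes `u2` and `u6`, so $a = 2$ for this partition.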
6
Branching program time-space lower bounds using
these ideas

Oblivious - same variable queried per level
[Chandra-Furst-Lipton 83], [Alon-Maass 86], [Babai-Nisan-Szegedy 89]
(Syntactic) read-k - no variable queried more than k times on any path
[Borodin-Razborov-Smolensky 89], [Okol’nishnikova 89]
General BP’s
[B-Jayram-Saks 98], [Ajtai 99a], [Ajtai 99b], [B-Saks-Sun-Vee 00], [B-Vee 02]
7
The Case of Oblivious BP’s
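Partition of the layers $L_j$ into sets $\mathcal{L}_1, \mathcal{L}_2, \dots, \mathcal{L}_p$, $p \ge 2$
When the BP is oblivious
• Each $\mathcal{L}_i$ is associated with the subset $A_i$ of variables read in levels in $\mathcal{L}_i$
• trace$(x)$ can be used as the messages on input $x$ in a communication protocol between $p$ players computing $f$, where the $i$th player has values of the variables in $A_i$
[Figure: levelled BP with layers $L_1, L_2, \dots, L_5$ of height $kn/r$, source $v_0$, trace nodes $v_1, v_2, v_3$, and sinks 0 and 1]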
8
The Oblivious Case



Let $C = \bigcap_{i \le p} A_i$ be the common variables for the players and $A'_i = A_i - C$
For any assignment $s$ to $C$, the trace can be used to compute $f_s$
Space bound: $S \ge CC(f_s; A'_1,\dots,A'_p)/a$ for any $s$
Want: $n-|A'_i|$ large for all $i$; small # of alternations $a$
9
The Read-k Case

Wlog first make the read-k BP uniform
For any pair of nodes $u,v$ the multi-set of variables queried between $u$ and $v$ is the same on any path
Call the set $A_{uv}$
Add extra ‘dummy’ queries on each path if necessary
Then apply levelling etc.
[Figure: nodes $u$ and $v$ with the paths between them]
10
Read-k Case Argument Overview

Variation of the usual argument
First fix the node sequence $s=(v_0,v_1,\dots,v_r)$ for the $r$ layers
Defines sets of input variables $A_{v_0v_1},\dots,A_{v_{r-1}v_r}$ read during these layers
$f_s$ is an AND of functions defined on these sets of variables: a $(k,r)$-rectangle
Then choose a layer partition $\mathcal{L}_1, \mathcal{L}_2$ that is good for $A_{v_0v_1},\dots,A_{v_{r-1}v_r}$
Subsequence of $(v_0,v_1,\dots,v_r)$ at alternations forms the trace - also good
[Figure: BP with node sequence $v_0, v_1, v_2, v_3, v_4, \dots, v_r$ and sinks 0 and 1]
11
Partitioning the layers


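$r$ layers (of height $\le kn/r$)
Let Layers$(x,i)$ be the set of layers in which variable $x_i$ is read on input $x$
$|\mathrm{Layers}(x,i)| \le k$
For a set $\mathcal{L}$ of layers,
$\mathrm{unread}(x,\mathcal{L}) = \{\, i : \mathrm{Layers}(x,i) \cap \mathcal{L} = \emptyset \,\}$
$\mathrm{core}(x,\mathcal{L}) = \{\, i : \mathrm{Layers}(x,i) \subseteq \mathcal{L} \,\}$
Partition is good if these are large for $\mathcal{L} = \mathcal{L}_1, \mathcal{L}_2$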
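The definitions of Layers, unread and core can be made concrete with a small Python sketch. The input format is an assumption for illustration: `reads[t]` is the index of the variable queried at step `t` on a fixed input $x$, and all layers have the same height. The function names are hypothetical.

```python
def layers_of(reads, height, num_vars):
    """Layers(x, i) for every variable i: the set of layers in which x_i is read."""
    layers = [set() for _ in range(num_vars)]
    for t, i in enumerate(reads):
        layers[i].add(t // height)
    return layers

def unread(layers, chosen):
    """Variables that are not read in any layer of the set `chosen`."""
    return {i for i, ls in enumerate(layers) if not (ls & chosen)}

def core(layers, chosen):
    """Variables all of whose reads fall inside the set `chosen`."""
    return {i for i, ls in enumerate(layers) if ls <= chosen}

# Four variables, eight steps, layers of height 2 (so r = 4 and each variable is read twice).
layers = layers_of([0, 1, 0, 2, 3, 1, 2, 3], height=2, num_vars=4)
set1, set2 = {0, 1}, {2, 3}
```

In this toy run `core(layers, set1)` and `unread(layers, set2)` are both `{0}`, matching the identity core = unread of the complementary set when every layer is assigned and every variable is read at least once.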
12
How to partition the layers

Assign every layer to $\mathcal{L}_1$ or $\mathcal{L}_2$
$A = \mathrm{core}(x,\mathcal{L}_1) = \mathrm{unread}(x,\mathcal{L}_2)$
$B = \mathrm{core}(x,\mathcal{L}_2) = \mathrm{unread}(x,\mathcal{L}_1)$
$C$ = set of variables read in common
Two techniques, both using probabilistic method
[Borodin-Razborov-Smolensky 89]: $|A|, |B| \ge n/2^{k+1}$, $a \le r = O(k^2 2^k)$
[Okol’nishnikova 89]: $|A| \ge n/k^{O(k)}$, $|B| \ge n/2$, $a = 2k$, $r = 2k^2$
13
The Read-k Case: Fixing the Trace
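Fix a node sequence and then partition the layers $L_j$ into sets $\mathcal{L}_1, \mathcal{L}_2$ yielding a trace $t$
Again, by uniformity, the trace determines which variables are read in each component of the partition
Define $f_t(x)=1 \iff f(x)=1$ and $x$ follows $t$
$f_t(x) = g(x_{A\cup C}) \wedge h(x_{B\cup C})$
$f_t^{-1}(1)$ is a pseudo-rectangle
[Figure: levelled BP with layers $L_1, L_2, \dots, L_5$ of height $kn/r$, node sequence $v_0, v_1, v_2, v_3, \dots, v_f$, and sinks 0 and 1]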
14
Rectangles and Pseudo-rectangles

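Ordinary combinatorial rectangle in $\{0,1\}^n$
Partition $[n]$ into $A$ and $B$
$R_A \times R_B$ for sets $R_A \subseteq \{0,1\}^A$ and $R_B \subseteq \{0,1\}^B$
Alternatively $\{x : x_A \in R_A \text{ and } x_B \in R_B\}$
Pseudo-rectangle
$[n] = D \cup E$, sets $R_D \subseteq \{0,1\}^D$ and $R_E \subseteq \{0,1\}^E$
$\{x : x_D \in R_D \text{ and } x_E \in R_E\}$
Or, partition $[n]$ into $A$, $B$ and $C$: $\{x : x_{A\cup C} \in R_{A\cup C} \text{ and } x_{B\cup C} \in R_{B\cup C}\}$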
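To make the pseudo-rectangle definition concrete, here is a small Python sketch that enumerates one over $\{0,1\}^3$. The sets `R_AC` and `R_BC` and the helper names are hypothetical choices for illustration, with $A=\{0\}$, $B=\{1\}$ and $C=\{2\}$.

```python
from itertools import product

def restrict(x, coords):
    """The sub-assignment of x on the listed coordinates."""
    return tuple(x[i] for i in coords)

def pseudo_rectangle(n, AC, BC, R_AC, R_BC):
    """All x in {0,1}^n whose restriction to A∪C lies in R_AC
    and whose restriction to B∪C lies in R_BC."""
    return {x for x in product((0, 1), repeat=n)
            if restrict(x, AC) in R_AC and restrict(x, BC) in R_BC}

AC, BC = (0, 2), (1, 2)            # A = {0}, B = {1}, C = {2}
R_AC = {(0, 0), (1, 0), (1, 1)}    # allowed values of (x_0, x_2)
R_BC = {(0, 0), (1, 0), (1, 1)}    # allowed values of (x_1, x_2)
rect = pseudo_rectangle(3, AC, BC, R_AC, R_BC)
```

The resulting set has five points. Fixing the shared coordinate $x_2$ splits it into two ordinary rectangles on $(x_0, x_1)$: all four pairs when $x_2=0$, and only $(1,1)$ when $x_2=1$.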
15
Read-k lower bounds
If $f$ is computed by a (nondeterministic) read-k branching program of size $2^S$ then
the ones of $f$, $f^{-1}(1)$, can be covered by $2^{Sa}$ pseudo-rectangles $R$ with $|A|$ and $|B|$ large and $f(R)=1$
[BRS 89]: $|A|, |B| \ge n/2^{k+1}$, $a = O(k^2 2^k)$
[Okol 89]: $|A| \ge n/k^{O(k)}$, $|B| \ge n/2$, $a = 2k$
Prove upper bound $F$ on # of inputs in any such pseudo-rectangle on which $f$ is constant 1
$2^S \ge (|f^{-1}(1)|/F)^{1/a}$ or $S \ge \frac{1}{a}\log(|f^{-1}(1)|/F)$
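The counting step behind this bound is that $2^{Sa}$ pseudo-rectangles, each containing at most $F$ inputs, must together cover $f^{-1}(1)$, so $2^{Sa} \cdot F \ge |f^{-1}(1)|$. A minimal Python sketch of the resulting bound, with a hypothetical function name and made-up numbers chosen only to show the arithmetic:

```python
from math import log2

def space_lower_bound(num_ones, F, a):
    """S >= (1/a) * log2(|f^{-1}(1)| / F), from 2^{Sa} * F >= |f^{-1}(1)|."""
    return log2(num_ones / F) / a

# Illustrative numbers only: 2^40 accepting inputs, rectangles of size at most 2^20,
# traces of length a = 4.
bound = space_lower_bound(2 ** 40, 2 ** 20, 4)
```

With these numbers the sketch gives $S \ge 5$.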
16
Lower bounds for general BPs [BST 98]

Major problem to handle


Fixing the node sequence and the layer partition does not fix sets $A = \mathrm{core}(x,\mathcal{L}_1)$ or $B = \mathrm{core}(x,\mathcal{L}_2)$
Solutions
Apply one layer partition for all inputs: use extension of [BRS 89] partition method
Ignore inputs for which partition is bad: prob method argument bounds # of bad inputs
Partition remaining inputs based on the values of $\mathrm{core}(x,\mathcal{L}_1)$ and $\mathrm{core}(x,\mathcal{L}_2)$ as well as on their traces
17
Lower bounds for general BPs [BST 98]

Number of rectangles increases
Multiply $2^{Sa}$ by the number of choices of $\mathrm{core}(x,\mathcal{L}_1)$ and $\mathrm{core}(x,\mathcal{L}_2)$
A priori bound is $\le 3^n$ since sets are disjoint
Observation
a pseudo-rectangle w.r.t. $A,B,C$ remains a pseudo-rectangle w.r.t. $A',B',C'$ if $A' \subseteq A$, $B' \subseteq B$, and $C' = C \cup (A-A') \cup (B-B')$
Partition based on only the first $m = n/2^{k+1}$ elements of $\mathrm{core}(x,\mathcal{L}_1)$ and $\mathrm{core}(x,\mathcal{L}_2)$
# of choices is at most $\binom{n}{m,m} \le \binom{n}{m}^2$
18
Lower bounds for general BPs [BST 98]
If $f$ is computed by a (nondeterministic) time $kn$ branching program of size $2^S$
then most of $f^{-1}(1)$ can be covered by $2^{Sa}\binom{n}{m}^2$ pseudo-rectangles with $|A|=|B|=m=n/2^{k+1}$ where $a = O(k^2 2^k)$ (the cover is a partition if the program is deterministic)
# of pseudo-rectangles is at most $2^{4\log_2(n/m)\,m+Sa} = 2^{4(k+1)m+Sa}$
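The step from the binomial count to the exponent $4(k+1)m$ can be checked numerically. The sketch below is an illustration under the assumption that $n$ is a power of two with $n \ge 2^{k+1}$, so that $\log_2(n/m) = k+1$ exactly; the function name is hypothetical.

```python
from math import comb, log2

def rectangle_count_exponents(n, k):
    """Compare log2 of C(n, m)^2 with the exponent 4(k+1)m, for m = n / 2^(k+1)."""
    m = n // 2 ** (k + 1)
    choices = 2 * log2(comb(n, m))   # log2 of the number of choices of the two cores
    bound = 4 * (k + 1) * m          # exponent used on the slide
    return m, choices, bound

# n = 1024 and k = 0..5: the exponent 4(k+1)m dominates in every case.
checks = [rectangle_count_exponents(1024, k) for k in range(6)]
```

Each entry of `checks` has its second component no larger than its third, which is the inequality $\binom{n}{m}^2 \le 2^{4(k+1)m}$ for these parameters.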
Is that good?
19
Using the Bound: Embedded Rectangles




Pseudo-rectangles are hard to reason about
Easier objects: Embedded rectangles
Start with a pseudo-rectangle on $A,B,C$
Fix an assignment to the common set $C$
we get a simpler object with
a combinatorial rectangle $R_A \times R_B$ on $A \times B$
an assignment $s$ to $C = \overline{A \cup B}$, the spine
Result is an embedded rectangle
20
Partition of most of $f^{-1}(1)$ into embedded rectangles
Input space is $D^n$
Each pseudo-rectangle can be partitioned into at most $|D|^{n-2m}$ embedded rectangles $R$ with $|A|=|B|=m=n/2^{k+1}$
$A$, $B$: feet of $R$
Total number of such embedded rectangles partitioning most of $f^{-1}(1)$: $\le 2^{4(k+1)m+Sa}\,|D|^{n-2m}$
Total number of inputs is $|D|^n$
Non-trivial only if, e.g., $|D| \ge 2^{3(k+1)}$: large domain
21
Lower bound on embedded rectangle size for
which f is constant

Suppose $|f^{-1}(1)| \ge \delta |D|^n$
Since at most $2^{4(k+1)m+Sa}\,|D|^{n-2m}$ embedded rectangles, average size is at least $\delta\, 2^{-4(k+1)m-Sa-1}\,|D|^{2m}$
and at least 1/4 of $f^{-1}(1)$ is covered by those of size $\ge \delta\, 2^{-4(k+1)m-Sa-2}\,|D|^{2m}$
Such a rectangle defined by $(s,A,B,R_A,R_B)$ must have $|R_A|/|D^m|,\ |R_B|/|D^m| \ge \delta\, 2^{-4(k+1)m-Sa-2}$
Typical 2-party communication complexity results* say $|R_A|/|D^m|,\ |R_B|/|D^m| \le |D|^{-\epsilon m}$
*With extra work to handle $s$ and easiest $A,B$
22
The time space tradeoff lower bounds [BST 98]


Therefore for such a hard $f$
$\delta\, 2^{-4(k+1)m-Sa-2} \le |D|^{-\epsilon m}$
So if $\delta$ is constant and $|D| \ge 2^{9(k+1)/\epsilon}$
$Sa \ge [\epsilon \log |D| - 4(k+1)]\, m - c \ge (\epsilon/2)\, m \log |D|$
Since $m=n/2^{k+1}$ and $a = O(k^2 2^k)$, for some $C > 1$
$S \ge C^{-k}\, n \log |D|$
Therefore $T/n = k \ge c' \log((n \log|D|)/S)$, i.e.
$T = \Omega\!\left(n \log \frac{n \log |D|}{S}\right)$
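The first two steps of this chain can be spelled out by taking base-2 logarithms. In the sketch below $\delta$ and $\epsilon$ denote the density and exponent constants of the slide, and the identification $c = 2 + \log_2(1/\delta)$ is an assumption about what the slide's constant $c$ stands for.

```latex
\begin{align*}
\delta\, 2^{-4(k+1)m - Sa - 2} \le |D|^{-\epsilon m}
 &\;\Longrightarrow\; Sa \ge \epsilon m \log_2 |D| - 4(k+1)m - 2 - \log_2(1/\delta) \\
 &\;\Longrightarrow\; Sa \ge \bigl[\epsilon \log_2 |D| - 4(k+1)\bigr]\, m - c,
   \qquad c = 2 + \log_2(1/\delta).
\end{align*}
If $|D| \ge 2^{9(k+1)/\epsilon}$ then $4(k+1) \le \tfrac{4}{9}\,\epsilon \log_2 |D|$, so
\[
Sa \ge \tfrac{5}{9}\,\epsilon\, m \log_2 |D| - c \ge \tfrac{\epsilon}{2}\, m \log_2 |D|
\quad\text{once } c \le \tfrac{\epsilon}{18}\, m \log_2 |D| .
\]
```

Dividing by $a$ and substituting $m = n/2^{k+1}$ then gives a bound on $S$ that decays exponentially in $k$, which is what forces $k = T/n$ to grow logarithmically in $(n \log |D|)/S$.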



23
What functions are this hard?

Computing $x^T M x \equiv 0 \pmod q$, qn [BST 98]
Non-optimal bound when $M$ is Sylvester matrix
Let $\gamma < 1/2$ and $c > 2/(1-H_2(\gamma))$
$\mathrm{HAM}_\gamma: [n^c]^n \to \{0,1\}$: Is any pair $(x_i,x_j)$ close in Hamming distance, $\Delta(x_i,x_j) \le \gamma c \log n$?
Any two sets in $[n^c]^m$ each of density $\ge n^{-\beta m}$ contain a pair of coordinates that are within $\gamma c \log n$ of each other
Defined in [Ajtai 99a] where weaker lower bounds proved using generalization of [Okol 89] instead of [BRS 89]
Best bounds follow immediately from [BST 98]
24
What functions are this hard?

Computing $x^T M_y x \equiv 0 \pmod q$
for $x \in GF(q)^n$, $y \in GF(q)^{2n-1}$, qn



Function defined in [Ajtai 99b] and case q=2 used for
Boolean lower bounds
Key to improvement: For some y, My has better
rigidity properties than Sylvester matrices have
Defining these matrices and analyzing their rigidity
properties is the key contribution of [Ajtai 99b]

Most of the hard work in Boolean lower bounds is in the
second half of [Ajtai 99a], much of which does not fit in the
STOC version
25
Ajtai’s matrices
$M_y$ is constant on anti-diagonals below the main diagonal
[Figure: the matrix $M_y$, showing entries $y_1, y_2, y_3, y_4, \dots, y_n, y_{n+1}, y_{n+2}, \dots, y_{2n-2}, y_{2n-1}$ and 0]
26
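$x^T M_y x$ on an embedded $(m,a)$-rectangle
For every $s$ on $\overline{A \cup B}$,
$f(x_{A\cup B}, s, y) = x_A^T M_{AB}\, x_B + g(x_A,y) + h(x_B,y)$
[Figure: the matrix $M_y$, with the coordinates of $x$ along both sides and the sets $A$ and $B$ marked]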
27
Rectangles, rank, & rigidity

Largest rectangle on which $x_A^T M x_B$ is constant has density $\le q^{-\mathrm{rank}(M)}$
[BRS 89]
Lemma [Ajtai 99b] Can fix $y$ s.t. every $\delta n \times \delta n$ minor $M_{AB}$ of $M_y$ has
$\mathrm{rank}(M_{AB}) \ge c\,\delta n/\log^2(1/\delta) \ge \delta^{1+\epsilon} n$
better than comparable rigidity bound of $\delta^2 n$ for Sylvester matrices
[BRS 89], [BST 98]
28
How to partition the layers

Assign every layer to $\mathcal{L}_1$ or $\mathcal{L}_2$
$A = \mathrm{core}(x,\mathcal{L}_1) = \mathrm{unread}(x,\mathcal{L}_2)$
$B = \mathrm{core}(x,\mathcal{L}_2) = \mathrm{unread}(x,\mathcal{L}_1)$
$C$ = set of variables read in common
Two techniques for read-k case, both using probabilistic method
[Borodin-Razborov-Smolensky 89]: $|A|, |B| \ge n/2^{k+1}$, $a \le r = O(k^2 2^k)$
[Okol’nishnikova 89]: $|A| \ge n/k^{O(k)}$, $|B| \ge n/2$, $a = 2k$, $r = 2k^2$
29
Read-k case:
Branching program with node sequence
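[Figure: levelled BP with node sequence $v_0, v_1, v_2, \dots, v_{r-1}, v_r$, layers $L_1, L_2, \dots, L_r$ of height $kn/r$, and sinks 0 and 1]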
30
Partitioning the layers


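$r$ layers (of height $\le kn/r$)
Let Layers$(x,i)$ be the set of layers in which variable $x_i$ is read on input $x$
$|\mathrm{Layers}(x,i)| \le k$
For a set $\mathcal{L}$ of layers,
$\mathrm{unread}(x,\mathcal{L}) = \{\, i : \mathrm{Layers}(x,i) \cap \mathcal{L} = \emptyset \,\}$
$\mathrm{core}(x,\mathcal{L}) = \{\, i : \mathrm{Layers}(x,i) \subseteq \mathcal{L} \,\}$
Partition is good if these are large for $\mathcal{L} = \mathcal{L}_1, \mathcal{L}_2$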
31
Partitioning the layers [Okol’nishnikova 89]



Fix node sequence $s$ and $x$ that follows $s$
Choose a random subset $\mathcal{L}_1$ of $k$ of the $r$ layers
For each index $i$: $\Pr_{\mathcal{L}_1}[\mathrm{Layers}(x,i) \subseteq \mathcal{L}_1] \ge 1/\binom{r}{k}$
Thus $E_{\mathcal{L}_1}[\#\{ i : \mathrm{Layers}(x,i) \subseteq \mathcal{L}_1 \}] \ge n/\binom{r}{k}$
Fix a partition achieving the average
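The averaging argument can be checked exhaustively on a toy instance. In this Python sketch the list `layers` (one set of layer indices per variable, each of size at most $k$) is a hypothetical example, and the function names are illustrative.

```python
from itertools import combinations
from math import comb

def average_core_size(layers, r, k):
    """Exact average, over all k-subsets of the r layers, of the number of
    variables whose reads all fall inside the chosen subset."""
    total = 0
    for chosen in combinations(range(r), k):
        chosen = set(chosen)
        total += sum(1 for ls in layers if ls <= chosen)
    return total / comb(r, k)

def best_core_size(layers, r, k):
    """Core size of a best k-subset; it is at least the average."""
    return max(sum(1 for ls in layers if ls <= set(chosen))
               for chosen in combinations(range(r), k))

# n = 4 variables, r = 4 layers, k = 2 reads per variable.
layers = [{0, 1}, {0, 2}, {1, 3}, {2, 3}]
avg = average_core_size(layers, r=4, k=2)
best = best_core_size(layers, r=4, k=2)
```

Here the average equals $n/\binom{r}{k} = 4/6$ because every variable is read in exactly $k$ layers, and some subset achieves a core of size 1.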
32
Partitioning the layers [Okol’nishnikova 89]

I.e., for each such $x$, $|\mathrm{core}(x,\mathcal{L}_1)| \ge n/\binom{r}{k}$
Only $k$ layers of height $kn/r$
At most $a=2k$ alternations
Total $\le k^2 n/r \le n/2$ vars read in $\mathcal{L}_1$ if $r=2k^2$
$\Rightarrow |\mathrm{core}(x,\mathcal{L}_2)| \ge n/2$
$|\mathrm{core}(x,\mathcal{L}_1)| \ge n/\binom{2k^2}{k} \ge n/k^{O(k)}$
33
Partitioning the layers [BRS 89]

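Assign each layer independently
$\Pr[L_i \in \mathcal{L}_1] = \Pr[L_i \in \mathcal{L}_2] = 1/2$
for $\mathcal{L} = \mathcal{L}_1$ or $\mathcal{L}_2$
Let $c_i = 1$ if $\mathrm{Layers}(x,i) \subseteq \mathcal{L}$ and 0 otherwise
$\Pr[c_i = 1] = \Pr[\mathrm{Layers}(x,i) \subseteq \mathcal{L}] \ge 1/2^k$
each variable is read in at most $k$ layers
$E[\sum_i c_i] = E[\#\{ i : \mathrm{Layers}(x,i) \subseteq \mathcal{L} \}] \ge n/2^k$
i.e., $E[|\mathrm{core}(x,\mathcal{L})|] \ge n/2^k$
$E[|\mathrm{unread}(x,\mathcal{L})|] \ge n/2^k$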
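The expectation can be computed exactly for a small instance by enumerating every assignment of layers. This Python sketch uses hypothetical toy data (`layers_a`, `layers_b`, one set of layer indices per variable) and an illustrative function name.

```python
from itertools import product

def expected_core_size(layers, r):
    """Exact E[|core(x, L)|] when each of the r layers joins the set L
    independently with probability 1/2."""
    total = 0
    for bits in product((0, 1), repeat=r):
        chosen = {j for j in range(r) if bits[j]}
        total += sum(1 for ls in layers if ls <= chosen)
    return total / 2 ** r

# n = 4 variables and r = 4 layers; every variable is read in at most k = 2 layers.
layers_a = [{0, 1}, {0, 2}, {1, 3}, {2, 3}]   # each variable read in exactly 2 layers
layers_b = [{0}, {0, 2}, {1, 3}, {2, 3}]      # variable 0 read in a single layer
exp_a = expected_core_size(layers_a, 4)
exp_b = expected_core_size(layers_b, 4)
```

The first instance meets the bound $n/2^k = 1$ with equality; in the second, the variable read only once raises the expectation to 1.25.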
34
Modification for general BP [BST 98]

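Let $l(i) = |\mathrm{Layers}(x,i)|$
$\sum_i l(i) \le kn$
$\Pr[c_i = 1] = \Pr[\mathrm{Layers}(x,i) \subseteq \mathcal{L}] = 2^{-l(i)}$
$E[|\mathrm{core}(x,\mathcal{L})|] = E[\sum_i c_i] = \sum_i 2^{-l(i)}$
By arithmetic-geometric mean inequality this is $\ge n\,2^{-\sum_i l(i)/n} \ge n\,2^{-k}$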
35
Second Moment Method [BRS 89][BST 98]

If $r$ is big enough $|\mathrm{core}(x,\mathcal{L})|$ is concentrated around its mean
Bound $\mathrm{Var}[|\mathrm{core}(x,\mathcal{L})|] = \mathrm{Var}[\sum_i c_i]$
Events for $c_i$, $c_j$ correlated only if $x_i$ and $x_j$ read in the same layer
At most $l(i)\,kn/r$ vars read in the same layer as $x_i$
Each contributes at most $\Pr[c_i = 1] = 1/2^{l(i)}$ to variance
$\mathrm{Var}[\sum_i c_i] \le (kn/r) \sum_i l(i)\, 2^{-l(i)}$
$\le (k/r)\, (\sum_j l(j)) \sum_i 2^{-l(i)}$ (FKG-like inequality of Chebyshev - terms are anti-correlated)
$\le (k^2 n/r) \sum_i 2^{-l(i)} = (k^2 n/r)\, E[|\mathrm{core}(x,\mathcal{L})|]$
36
Second Moment Method [BRS 89][BST 98]


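$\mathrm{Var}[|\mathrm{core}(x,\mathcal{L})|] \le (k^2 n/r)\, E[|\mathrm{core}(x,\mathcal{L})|] = (k^2 n/r)\, \mu$
By Chebyshev’s inequality
$\Pr[\, \mu/2 \le |\mathrm{core}(x,\mathcal{L})| \le 3\mu/2 \,] \ge 1 - \mathrm{Var}[|\mathrm{core}(x,\mathcal{L})|]/(\mu/2)^2 \ge 1 - 4k^2 2^k/r$
since $\mu \ge n/2^k$
Choose $r = 8k^2 2^k$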
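The last inequality follows by substituting the variance bound into Chebyshev's inequality. A short worked version, writing $\mu$ for the mean $E[|\mathrm{core}(x,\mathcal{L})|]$ and $\mathcal{L}$ for the chosen set of layers:

```latex
\[
\frac{\mathrm{Var}[|\mathrm{core}(x,\mathcal{L})|]}{(\mu/2)^2}
 \le \frac{(k^2 n/r)\,\mu}{\mu^2/4}
 = \frac{4k^2 n}{r\,\mu}
 \le \frac{4k^2 2^k}{r},
\qquad\text{using } \mu \ge n/2^k .
\]
With $r = 8k^2 2^k$ the right-hand side is $1/2$, so
$|\mathrm{core}(x,\mathcal{L})| \ge \mu/2 \ge n/2^{k+1}$ with probability at least $1/2$.
```

This is where the core size $n/2^{k+1}$ used for the sets $A$ and $B$ comes from.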
37
The Boolean case is much harder

[BST 98] Showed only $T \ge 1.017n$ for $S=o(n)$ for quadratic form problem
Uses pseudo-rectangles but specialized to splitting BP only at the $T/2$ level, deterministic
[Ajtai 99a] Shows lower bounds for Element Distinctness over $[n^2]$ that work for density $2^{-\epsilon m}$
Embedded rectangles not pseudo-rectangles, deterministic
[Ajtai 99b] $T=O(n) \Rightarrow S=\Omega(n)$ for Boolean BP’s!!!
[B-Saks-Sun-Vee 00] Improved bounds and extension to $O(n/T)$-error randomized case
Talk later
38
Power of the Large Domain Technique

For oblivious BPs, best bound using two-party CC is
$T=\Omega(n \log (n/S))$ [Alon-Maass 86]
Bounds match for general BPs over large domains
Best oblivious BP bounds use multiparty CC
$T=\Omega(n \log^2(n/S))$ [Babai-Nisan-Szegedy 89]
[B-Vee 02] Matching bounds for general BPs over large domains

Erik Vee talk later
39