Transcript Part I
Techniques for Time-Space Tradeoff Lower
Bounds for Branching Programs: Part I
Paul Beame
University of Washington
joint work with Erik Vee, Mike Saks, T.S. Jayram, Xiaodong Sun
1
Branching programs
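To compute $f:\{0,1\}^n \to \{0,1\}$ on input $(x_1,\dots,x_n)$,
follow the path from source to sink
Time T = length of longest path
Space S = $\log_2$(# of nodes)
[Figure: example branching program whose nodes are labelled by the queried variables ($x_1$, $x_2$, $x_3$, $x_4$, $x_5$, $x_7$, $x_8$), with edges labelled 0/1, sinks 0 and 1, and the path followed on input x = (1,1,0,1,...)]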
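A minimal sketch of the definitions on this slide, assuming a toy dictionary encoding of a branching program (the encoding, the example program `BP`, and all function names are illustrative assumptions, not taken from the talk):

```python
import math

# Hypothetical toy encoding (not from the slides): every internal node maps to
# (index of queried variable, successor on 0, successor on 1); sinks are "0" and "1".
BP = {
    "s": (0, "a", "b"),
    "a": (1, "0", "b"),
    "b": (2, "0", "1"),
}

def evaluate(bp, source, x):
    """Follow the path selected by input x; return (output bit, path length)."""
    node, steps = source, 0
    while node not in ("0", "1"):
        var, on_zero, on_one = bp[node]
        node = on_one if x[var] else on_zero
        steps += 1
    return int(node), steps

def time_bound(bp, node):
    """T = length of the longest path from `node` to a sink (the BP is acyclic)."""
    if node in ("0", "1"):
        return 0
    _, on_zero, on_one = bp[node]
    return 1 + max(time_bound(bp, on_zero), time_bound(bp, on_one))

def space_bound(bp):
    """S = log2(number of nodes), counting the two sinks."""
    return math.log2(len(bp) + 2)
```

For example, `evaluate(BP, "s", (0, 1, 1))` walks s, a, b and reaches the 1-sink after three queries, which is also the longest path in this toy program.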
2
Branching program properties
Simulate random-access machines
Multi-way version for $x_i$ in domain D
same time T and space S
good for modeling RAM input registers
BPs will be leveled wlog.
same time T
at most $2^S$ nodes per level
3
Overall approach to lower bounds
If $f: D^n \to \{0,1\}$ is computed using small time and space
then $f^{-1}(1)$ has a special combinatorial structure.
Lower bounds for f follow if $f^{-1}(1)$ does not have the structure
How do we find such structures?
4
Levelled BPs and Layers
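Assume time $T \le kn$
and wlog that the BP is levelled ($\le 2^S$ nodes per level)
Break BP into r layers $L_1,\dots,L_r$ of height kn/r
Partition (a subset of) the layers $L_j$ into sets $\mathcal{L}_1, \mathcal{L}_2, \dots, \mathcal{L}_p$, $p \ge 2$
[Figure: levelled BP from source $v_0$ to sinks 0 and 1, cut into layers $L_1, L_2, \dots, L_r$, each of height kn/r]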
5
The Trace of an Input
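Partition of (a subset of) the layers $L_j$ into sets $\mathcal{L}_1, \mathcal{L}_2, \dots, \mathcal{L}_p$, $p \ge 2$
The trace of input x
• the sequence of nodes reached on input x as the computation moves from one set $\mathcal{L}_i$ to another
• E.g. trace(x) = $(v_1, v_2, v_3)$
• a = length of trace = # of alternations in the partition
• at most $2^{Sa}$ possible traces
[Figure: levelled BP with source $v_0$, layers $L_1, L_2, \dots, L_5$ of height kn/r, trace nodes $v_1, v_2, v_3$ at the layer boundaries, and sinks 0 and 1]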
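The trace definition above can be sketched as follows, under the simplifying assumption that every layer is assigned to some set of the partition (the slide also allows partitioning only a subset of the layers). The list encoding and the names `trace` and `max_traces` are illustrative assumptions:

```python
def trace(boundary_nodes, layer_class):
    """Nodes at which the computation moves from one set of the partition to another.

    boundary_nodes[j] is the node reached at the end of layer j (0-based);
    layer_class[j] names the set of the partition that layer j belongs to.
    Assumption: every layer is assigned to some set.
    """
    return tuple(
        boundary_nodes[j]
        for j in range(len(layer_class) - 1)
        if layer_class[j] != layer_class[j + 1]
    )

def max_traces(S, a):
    """Each trace entry is one of at most 2^S nodes of a level, so there are at most 2^(S*a) traces."""
    return (2 ** S) ** a
```

With five layers assigned to sets 1, 2, 2, 1, 2, there are three alternations, so `trace(["v1", "u", "v2", "v3", "w"], [1, 2, 2, 1, 2])` keeps only the three boundary nodes where the set changes.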
6
Branching program time-space lower bounds using
these ideas
Oblivious - same variable queried per level
[Chandra-Furst-Lipton 83], [Alon-Maass 86],
[Babai-Nisan-Szegedy 89]
(Syntactic) read k - no variable queried more than k
times on any path
[Borodin-Razborov-Smolensky 89], [Okol’nishnikova 89]
General BP’s
[B-Jayram-Saks 98], [Ajtai 99a], [Ajtai 99b],
[B-Saks-Sun-Vee 00], [B-Vee 02]
7
The Case of Oblivious BP’s
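Partition of the layers $L_j$ into sets $\mathcal{L}_1, \mathcal{L}_2, \dots, \mathcal{L}_p$, $p \ge 2$
When the BP is oblivious
• Each $\mathcal{L}_i$ is associated with the subset $A_i$ of variables read in levels in $\mathcal{L}_i$
• trace(x) can be used as the messages on input x in a communication protocol between p players computing f, where the ith player has values of the variables in $A_i$
[Figure: levelled BP with source $v_0$, layers $L_1, L_2, \dots, L_5$ of height kn/r, trace nodes $v_1, v_2, v_3$, and sinks 0 and 1]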
8
The Oblivious Case
Let $C = \bigcap_{i \le p} A_i$ be the common variables
for the players and $A'_i = A_i - C$
For any assignment $\sigma$ to C, the trace can
be used to compute $f_\sigma$
Space bound
$S \ge CC(f_\sigma; A'_1, \dots, A'_p)/a$
for any $\sigma$
Want:
$n - |A'_i|$ large for all i
small # of alternations a
9
The Read-k Case
Wlog first make the read-k BP uniform
For any pair of nodes u,v the multi-set of variables queried between u and v is the same on any path
Call the set $A_{uv}$
Add extra ‘dummy’ queries on each path if necessary
Then apply levelling etc.
[Figure: paths between two nodes u and v]
10
Read-k Case Argument Overview
Variation of the usual argument
First fix the node sequence $\sigma = (v_0, v_1, \dots, v_r)$ for the r layers
Defines sets of inputs $A_{v_0 v_1}, \dots, A_{v_{r-1} v_r}$ read during these layers
$f_\sigma$ is an AND of functions defined on these sets of variables
(k,r)-rectangle
Then choose a layer partition $\mathcal{L}_1, \mathcal{L}_2$ that is good for $A_{v_0 v_1}, \dots, A_{v_{r-1} v_r}$
Subsequence of $(v_0, v_1, \dots, v_r)$ at alternations forms the trace - also good
[Figure: path from $v_0$ through $v_1, v_2, v_3, v_4, \dots, v_r$ to the sinks 0 and 1]
11
Partitioning the layers
r layers (of height kn/r)
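Let Layers(x,i) be the set of layers in
which variable $x_i$ is read on input x
$|\mathrm{Layers}(x,i)| \le k$
For a set $\mathcal{L}$ of layers,
unread$(x,\mathcal{L}) = \{\, i : \mathrm{Layers}(x,i) \cap \mathcal{L} = \emptyset \,\}$
core$(x,\mathcal{L}) = \{\, i : \mathrm{Layers}(x,i) \subseteq \mathcal{L} \,\}$
Partition is good if these are large for $\mathcal{L} = \mathcal{L}_1, \mathcal{L}_2$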
12
How to partition the layers
Assign every layer to $\mathcal{L}_1$ or $\mathcal{L}_2$
A = core$(x,\mathcal{L}_1)$ = unread$(x,\mathcal{L}_2)$
B = core$(x,\mathcal{L}_2)$ = unread$(x,\mathcal{L}_1)$
C = set of variables read in common
Two techniques, both using probabilistic
method
[Borodin-Razborov-Smolensky 89]
$|A|, |B| \ge n/2^{k+1}$, $a \le r$, $r = O(k^2 2^k)$
[Okol’nishnikova 89]
$|A| \ge n/k^{O(k)}$, $|B| \ge n/2$, $a = 2k$, $r = 2k^2$
13
The Read-k Case: Fixing the Trace
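Fix a node sequence and then partition the layers $L_j$ into sets $\mathcal{L}_1, \mathcal{L}_2$ yielding a trace $\tau$
Again, by uniformity, the trace determines which variables are read in each component of the partition
Define
$f_\tau(x) = 1 \iff f(x) = 1$ and x follows $\tau$
$f_\tau(x) = g(x_{A \cup C}) \wedge h(x_{B \cup C})$
$f_\tau^{-1}(1)$ is a pseudo-rectangle
[Figure: levelled BP with source $v_0$, layers $L_1, L_2, \dots, L_5$ of height kn/r, trace nodes $v_1, v_2, v_3$, final node $v_f$, and sinks 0 and 1]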
14
Rectangles and Pseudo-rectangles
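Ordinary combinatorial rectangle in $\{0,1\}^n$
Partition [n] into A and B
$R_A \times R_B$ for sets $R_A \subseteq \{0,1\}^A$ and $R_B \subseteq \{0,1\}^B$
Alternatively
$\{x : x_A \in R_A \text{ and } x_B \in R_B\}$
Pseudo-rectangle
$[n] = D \cup E$, sets $R_D \subseteq \{0,1\}^D$ and $R_E \subseteq \{0,1\}^E$
$\{x : x_D \in R_D \text{ and } x_E \in R_E\}$
Or, partition [n] into A, B and C
$\{x : x_{A \cup C} \in R_{A \cup C} \text{ and } x_{B \cup C} \in R_{B \cup C}\}$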
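The pseudo-rectangle definition can be checked by brute force on tiny examples. The sketch below relies on the observation that a set R is a pseudo-rectangle with respect to (A, B, C) exactly when it equals the set of all x whose projections onto A∪C and B∪C both occur among the projections of R. The function names and the 0-based indexing are illustrative assumptions:

```python
from itertools import product

def project(x, idx):
    """Restrict the assignment x (a tuple of bits) to the coordinates in idx."""
    return tuple(x[i] for i in idx)

def is_pseudo_rectangle(R, n, A, B, C):
    """Brute-force test over {0,1}^n; only sensible for very small n."""
    AC = sorted(set(A) | set(C))
    BC = sorted(set(B) | set(C))
    R_AC = {project(x, AC) for x in R}
    R_BC = {project(x, BC) for x in R}
    closure = {
        x for x in product((0, 1), repeat=n)
        if project(x, AC) in R_AC and project(x, BC) in R_BC
    }
    return closure == set(R)

# Example: ones of g(x_A, x_C) AND h(x_B, x_C) with A={0}, B={1}, C={2}.
ALL = list(product((0, 1), repeat=3))
GOOD = {x for x in ALL if (x[0] or x[2]) and (x[1] != x[2])}
# Non-example: equality of x_0 and x_1 couples A and B directly.
BAD = {x for x in ALL if x[0] == x[1]}
```

Here `GOOD` passes the test because its defining condition splits into a part on A∪C and a part on B∪C, while `BAD` fails because its projections are full but the set is not.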
15
Read-k lower bounds
If f is computed by a (nondeterministic) read k
branching program of size $2^S$ then
The ones of f, $f^{-1}(1)$, can be covered by $2^{Sa}$ pseudo-rectangles R with |A| and |B| large and f(R)=1
[BRS 89]: $|A|, |B| \ge n/2^{k+1}$, $a = O(k^2 2^k)$
[Okol 89]: $|A| \ge n/k^{O(k)}$, $|B| \ge n/2$, $a = 2k$
Prove upper bound F on # of inputs in any such
pseudo-rectangle on which f is constant 1
$2^S \ge (|f^{-1}(1)|/F)^{1/a}$
or
$S \ge \frac{1}{a} \log(|f^{-1}(1)|/F)$
16
Lower bounds for general BPs [BST 98]
Major problem to handle
Fixing the node sequence and the layer
partition does not fix sets A = core$(x,\mathcal{L}_1)$ or
B = core$(x,\mathcal{L}_2)$
Solutions
Apply one layer partition for all inputs
Ignore inputs for which partition is bad
Use extension of [BRS 89] partition method
Prob method argument bounds # of bad inputs
Partition remaining inputs based on the values of
core$(x,\mathcal{L}_1)$ and core$(x,\mathcal{L}_2)$ as well as on their
traces
17
Lower bounds for general BPs [BST 98]
Number of rectangles increases
Multiply $2^{Sa}$ by the number of choices of
core$(x,\mathcal{L}_1)$ and core$(x,\mathcal{L}_2)$
A priori bound is $3^n$ since sets are disjoint
Observation
a pseudo-rectangle w.r.t. A,B,C remains a pseudo-rectangle w.r.t. A’,B’,C’ if
$A' \subseteq A$, $B' \subseteq B$, and $C' = C \cup (A - A') \cup (B - B')$
Partition based on only the first $m = n/2^{k+1}$ elements
of core$(x,\mathcal{L}_1)$ and core$(x,\mathcal{L}_2)$
# of choices is at most $\binom{n}{m,m} \le \binom{n}{m}^2$
18
Lower bounds for general BPs [BST 98]
If f is computed by a (nondeterministic) time kn
branching program of size $2^S$
Then most of $f^{-1}(1)$ can be covered by $2^{Sa} \binom{n}{m}^2$
pseudo-rectangles with $|A| = |B| = m = n/2^{k+1}$ where
$a = O(k^2 2^k)$ (the cover is a partition if the program is
deterministic)
# of pseudo-rectangles is at most
$2^{4 \log_2(n/m)\, m + Sa} = 2^{4(k+1)m + Sa}$
Is that good?
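The counting step can be sanity-checked numerically. The sketch below compares the multinomial count of choices for the two m-element sets with $\binom{n}{m}^2$ and with $2^{4(k+1)m}$, assuming for simplicity that n is divisible by $2^{k+1}$ (the function name and the sample values of n and k are illustrative assumptions):

```python
from math import comb

def choice_counts(n, k):
    """Return (m, multinomial count, binomial-squared bound, 2^(4(k+1)m)).

    Assumption: n is divisible by 2^(k+1), so that m = n / 2^(k+1) is an integer.
    """
    m = n // 2 ** (k + 1)
    multinomial = comb(n, m) * comb(n - m, m)  # n! / (m! m! (n-2m)!)
    binomial_squared = comb(n, m) ** 2
    power_bound = 2 ** (4 * (k + 1) * m)
    return m, multinomial, binomial_squared, power_bound
```

For n = 64 and k = 1 this gives m = 16, and the three counts appear in increasing order, matching the chain of bounds on the slide.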
19
Using the Bound: Embedded Rectangles
Pseudo-rectangles are hard to reason
about
Easier objects: Embedded rectangles
Start with a pseudo-rectangle on A,B,C
Fix an assignment to the common set C
we get a simpler object with
a combinatorial rectangle $R_A \times R_B$ on $A \times B$
an assignment $\sigma$ to $C = \overline{A \cup B}$ (the spine)
Result is an embedded rectangle
20
Partition of most of $f^{-1}(1)$ into embedded
rectangles
Input space is $D^n$
Each pseudo-rectangle can be partitioned into at
most $|D|^{n-2m}$ embedded rectangles R with
$|A| = |B| = m = n/2^{k+1}$
A,B feet of R
Total number of such embedded rectangles
partitioning most of $f^{-1}(1)$ is at most
$2^{4(k+1)m + Sa}\, |D|^{n-2m}$
Total number of inputs is $|D|^n$
Non-trivial only if, e.g., $|D| \ge 2^{3(k+1)}$: large domain
21
Lower bound on embedded rectangle size for
which f is constant
Suppose $|f^{-1}(1)| \ge \delta |D|^n$
Since there are at most $2^{4(k+1)m + Sa}\, |D|^{n-2m}$ embedded
rectangles, average size is at least
$\delta\, 2^{-4(k+1)m - Sa - 1}\, |D|^{2m}$
and at least 1/4 of $f^{-1}(1)$ is covered by those of size at least
$\delta\, 2^{-4(k+1)m - Sa - 2}\, |D|^{2m}$
Such a rectangle defined by $(\sigma, A, B, R_A, R_B)$ must
have $|R_A|/|D^m|,\ |R_B|/|D^m| \ge \delta\, 2^{-4(k+1)m - Sa - 2}$
Typical 2-party communication complexity
results* say $|R_A|/|D^m|,\ |R_B|/|D^m| \le |D|^{-\varepsilon m}$
*With extra work to handle $\sigma$ and easiest A,B
22
The time space tradeoff lower bounds [BST 98]
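Therefore for such a hard f
$\delta\, 2^{-4(k+1)m - Sa - 2} \le |D|^{-\varepsilon m}$
So if $\delta$ is constant and $|D| \ge 2^{9(k+1)/\varepsilon}$
$Sa \ge [\varepsilon \log |D| - 4(k+1)]\, m - c \ge (\varepsilon/2)\, m \log |D|$
Since $m = n/2^{k+1}$ and $a = O(k^2 2^k)$, for some $C > 1$
$S \ge C^{-k}\, n \log |D|$
Therefore $T/n = k \ge c' \log((n \log |D|)/S)$, i.e.
$T = \Omega\!\left(n \log \frac{n \log |D|}{S}\right)$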
23
What functions are this hard?
Computing $x^T M x \equiv 0 \pmod q$, qn [BST 98]
Non-optimal bound when M is Sylvester matrix
Let $\gamma < 1/2$ and $c > 2/(1 - H_2(\gamma))$
$HAM_\gamma : [n^c]^n \to \{0,1\}$: Is any pair $(x_i, x_j)$ close in
Hamming distance, $\Delta(x_i, x_j) \le \gamma c \log n$?
Any two sets in $[n^c]^m$ each of density $n^{-\beta m}$ contain a
pair of coordinates that are within $\gamma c \log n$ of each
other
Defined in [Ajtai 99a] where weaker lower bounds
proved using generalization of [Okol 89] instead of
[BRS 89]
Best bounds follow immediately from [BST 98]
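To make the Hamming closeness function concrete, here is a minimal sketch. It assumes that elements of $[n^c]$ are given as non-negative integers read as bit strings of length $c \log_2 n$, and that the logarithm in the threshold is base 2; both are assumptions, as are the function names:

```python
import math

def hamming(a, b):
    """Hamming distance between the binary representations of two non-negative ints."""
    return bin(a ^ b).count("1")

def ham_gamma(xs, c, gamma):
    """Return 1 if some pair of distinct positions holds values within gamma*c*log2(n)."""
    n = len(xs)
    threshold = gamma * c * math.log2(n)
    return int(any(
        hamming(xs[i], xs[j]) <= threshold
        for i in range(n)
        for j in range(i + 1, n)
    ))
```

With n = 4, c = 2 and gamma = 0.25 the threshold is 1, so the input `[0b0000, 0b1111, 0b0001, 0b1100]` is accepted (its first and third entries differ in one bit), while `[0b0000, 0b1111, 0b0011, 0b1100]` is rejected because every pair differs in at least two bits.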
24
What functions are this hard?
Computing $x^T M_y x \equiv 0 \pmod q$
for $x \in GF(q)^n$, $y \in GF(q)^{2n-1}$, qn
Function defined in [Ajtai 99b] and case q=2 used for
Boolean lower bounds
Key to improvement: For some y, My has better
rigidity properties than Sylvester matrices have
Defining these matrices and analyzing their rigidity
properties is the key contribution of [Ajtai 99b]
Most of the hard work in Boolean lower bounds is in the
second half of [Ajtai 99a], much of which does not fit in the
STOC version
25
Ajtai’s matrices
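$M_y$ is constant on anti-diagonals
below the main diagonal
[Figure: the matrix $M_y$, with entries labelled $y_1, y_2, y_3, y_4, \dots, y_n, y_{n+1}, y_{n+2}, \dots, y_{2n-2}, y_{2n-1}$ and a region marked 0]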
26
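$x^T M_y x$ on an embedded $(m,\alpha)$-rectangle
For every $\sigma$ on $\overline{A \cup B}$,
$f(x_{A \cup B}, \sigma, y) = x_A^T M_{AB}\, x_B + g(x_A, y) + h(x_B, y)$
[Figure: the matrix $M_y$ with the rows indexed by A and the columns indexed by B marked against the vector x, selecting the minor $M_{AB}$]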
27
Rectangles, rank, & rigidity
Largest rectangle on which $x_A^T M x_B$ is
constant has density $q^{-\mathrm{rank}(M)}$
[BRS 89]
Lemma [Ajtai 99b] Can fix y s.t. every $\delta n \times \delta n$
minor $M_{AB}$ of $M_y$ has
$\mathrm{rank}(M_{AB}) \ge c\, \delta n / \log^2(1/\delta) \ge \delta^{1+\varepsilon} n$
better than comparable rigidity bound of $\delta^2 n$ for
Sylvester matrices
[BRS 89], [BST 98]
28
How to partition the layers
Assign every layer to $\mathcal{L}_1$ or $\mathcal{L}_2$
A = core$(x,\mathcal{L}_1)$ = unread$(x,\mathcal{L}_2)$
B = core$(x,\mathcal{L}_2)$ = unread$(x,\mathcal{L}_1)$
C = set of variables read in common
Two techniques for read-k case, both using
probabilistic method
[Borodin-Razborov-Smolensky 89]
$|A|, |B| \ge n/2^{k+1}$, $a \le r$, $r = O(k^2 2^k)$
[Okol’nishnikova 89]
$|A| \ge n/k^{O(k)}$, $|B| \ge n/2$, $a = 2k$, $r = 2k^2$
29
Read-k case:
Branching program with node sequence
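[Figure: levelled BP cut into layers $L_1, L_2, \dots, L_r$ of height kn/r, with the node sequence $v_0, v_1, v_2, \dots, v_{r-1}, v_r$ at the layer boundaries and sinks 0 and 1]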
30
Partitioning the layers
r layers (of height kn/r)
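Let Layers(x,i) be the set of layers in
which variable $x_i$ is read on input x
$|\mathrm{Layers}(x,i)| \le k$
For a set $\mathcal{L}$ of layers,
unread$(x,\mathcal{L}) = \{\, i : \mathrm{Layers}(x,i) \cap \mathcal{L} = \emptyset \,\}$
core$(x,\mathcal{L}) = \{\, i : \mathrm{Layers}(x,i) \subseteq \mathcal{L} \,\}$
Partition is good if these are large for $\mathcal{L} = \mathcal{L}_1, \mathcal{L}_2$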
31
Partitioning the layers [Okol’nishnikova 89]
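Fix node sequence $\sigma$ and x that follows $\sigma$
Choose a random subset $\mathcal{L}_1$ of k of the r
layers
For each index i
$\Pr_{\mathcal{L}_1}[\mathrm{Layers}(x,i) \subseteq \mathcal{L}_1] \ge 1/\binom{r}{k}$
Thus
$E_{\mathcal{L}_1}[\#\{\, i : \mathrm{Layers}(x,i) \subseteq \mathcal{L}_1 \,\}] \ge n/\binom{r}{k}$
Fix a partition achieving the average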
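The probability bound on this slide can be verified exactly for small parameters by enumerating all k-subsets of the r layers. In this sketch layers are numbered 0..r-1 and the function name is an illustrative assumption:

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def containment_probability(layers, r, k):
    """Exact probability that a uniformly random k-subset of {0,...,r-1} contains `layers`."""
    wanted = set(layers)
    hits = sum(1 for chosen in combinations(range(r), k) if wanted <= set(chosen))
    return Fraction(hits, comb(r, k))
```

A variable read in exactly k layers is covered by exactly one of the $\binom{r}{k}$ subsets, which is the worst case; variables read in fewer layers are covered more often.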
32
Partitioning the layers [Okol’nishnikova 89]
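I.e., for each such x
$|\mathrm{core}(x,\mathcal{L}_1)| \ge n/\binom{r}{k}$
Only k layers of height kn/r
At most a = 2k alternations
Total $\le k^2 n/r \le n/2$ vars read in $\mathcal{L}_1$ if $r = 2k^2$
$|\mathrm{core}(x,\mathcal{L}_2)| \ge n/2$
$|\mathrm{core}(x,\mathcal{L}_1)| \ge n/\binom{2k^2}{k} \ge n/k^{O(k)}$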
33
Partitioning the layers [BRS 89]
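Assign each layer independently
$\Pr[L_i \in \mathcal{L}_1] = \Pr[L_i \in \mathcal{L}_2] = 1/2$
for $\mathcal{L} = \mathcal{L}_1$ or $\mathcal{L}_2$
Let $c_i = 1$ if $\mathrm{Layers}(x,i) \subseteq \mathcal{L}$ and 0 otherwise
$\Pr[c_i] = \Pr[\mathrm{Layers}(x,i) \subseteq \mathcal{L}] \ge 1/2^k$
each variable is read in at most k layers
$E[\sum_i c_i] = E[\#\{\, i : \mathrm{Layers}(x,i) \subseteq \mathcal{L} \,\}] \ge n/2^k$
i.e., $E[|\mathrm{core}(x,\mathcal{L})|] \ge n/2^k$
$E[|\mathrm{unread}(x,\mathcal{L})|] \ge n/2^k$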
34
Modification for general BP [BST 98]
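Let $l(i) = |\mathrm{Layers}(x,i)|$
$\sum_i l(i) \le kn$
$\Pr[c_i] = \Pr[\mathrm{Layers}(x,i) \subseteq \mathcal{L}] = 2^{-l(i)}$
$E[|\mathrm{core}(x,\mathcal{L})|] = E[\sum_i c_i] = \sum_i 2^{-l(i)}$
By arithmetic-geometric mean inequality this is
$\ge n\, 2^{-\sum_i l(i)/n} \ge n\, 2^{-k}$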
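A small exact check of this calculation: enumerate every assignment of r layers to the two sets, average the size of the core for the first set, and compare with the arithmetic-geometric mean bound. The sample data `LAYERS_OF` (one set of layer indices per variable) and the function names are illustrative assumptions:

```python
from fractions import Fraction
from itertools import product

def expected_core_size(layers_of, r):
    """Average, over all 2^r assignments of layers to sets 1 and 2, of the number
    of variables whose layers all land in set 1."""
    total = 0
    for assignment in product((1, 2), repeat=r):
        total += sum(
            all(assignment[j] == 1 for j in layers) for layers in layers_of
        )
    return Fraction(total, 2 ** r)

def am_gm_lower_bound(layers_of):
    """n * 2^(-(sum of l(i)) / n), the bound given by the AM-GM inequality."""
    n = len(layers_of)
    total_reads = sum(len(layers) for layers in layers_of)
    return n * 2.0 ** (-total_reads / n)

# Hypothetical example: 4 variables read in 1, 2, 1 and 3 of the r = 4 layers.
LAYERS_OF = [{0}, {0, 1}, {2}, {1, 2, 3}]
```

For this example the exact expectation is 1/2 + 1/4 + 1/2 + 1/8 = 11/8, which sits above the AM-GM bound of 4 · 2^(-7/4).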
35
Second Moment Method [BRS 89][BST 98]
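If r is big enough $|\mathrm{core}(x,\mathcal{L})|$ is concentrated
around its mean
Bound $\mathrm{Var}[|\mathrm{core}(x,\mathcal{L})|] = \mathrm{Var}[\sum_i c_i]$
Events for $c_i$, $c_j$ correlated only if $x_i$ and $x_j$ read in
the same layer
At most $l(i)\,kn/r$ vars read in the same layer as $x_i$
Each contributes at most $\Pr[c_i] = 1/2^{l(i)}$ to variance
$\mathrm{Var}[\sum_i c_i] \le (kn/r) \sum_i l(i)\, 2^{-l(i)}$
$\le (k/r) \left(\sum_j l(j)\right) \sum_i 2^{-l(i)}$
(FKG-like inequality of Chebyshev: the terms $l(i)$ and $2^{-l(i)}$ are anti-correlated)
$\le (k^2 n/r) \sum_i 2^{-l(i)} = (k^2 n/r)\, E[|\mathrm{core}(x,\mathcal{L})|]$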
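The middle step of the variance bound uses Chebyshev's sum inequality for two oppositely ordered sequences, here $l(i)$ and $2^{-l(i)}$: the sum of products is at most n times the product of the two averages. A minimal exact check, with an illustrative function name and sample read counts:

```python
from fractions import Fraction

def chebyshev_sum_sides(reads):
    """Return (sum_i l(i) 2^-l(i), (1/n) (sum_j l(j)) (sum_i 2^-l(i))) for read counts l."""
    n = len(reads)
    weights = [Fraction(1, 2 ** l) for l in reads]
    lhs = sum(l * w for l, w in zip(reads, weights))
    rhs = Fraction(sum(reads), n) * sum(weights)
    return lhs, rhs
```

Multiplying both sides by kn/r turns this into the step from the first to the second line of the variance bound.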
36
Second Moment Method [BRS 89][BST 98]
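$\mathrm{Var}[|\mathrm{core}(x,\mathcal{L})|] \le (k^2 n/r)\, E[|\mathrm{core}(x,\mathcal{L})|] = (k^2 n/r)\, \mu$
By Chebyshev’s inequality
$\Pr[\mu/2 \le |\mathrm{core}(x,\mathcal{L})| \le 3\mu/2]$
$\ge 1 - \mathrm{Var}[|\mathrm{core}(x,\mathcal{L})|]/(\mu/2)^2$
$\ge 1 - 4k^2 2^k/r$
since $\mu \ge n/2^k$
Choose $r = 8k^2 2^k$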
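As a quick arithmetic check of the final choice of r, the sketch below evaluates the lower bound $1 - 4k^2 2^k/r$ exactly (the function name is an illustrative assumption):

```python
from fractions import Fraction

def concentration_lower_bound(k, r):
    """Lower bound 1 - 4 k^2 2^k / r on Pr[mu/2 <= |core| <= 3 mu/2]."""
    return 1 - Fraction(4 * k * k * 2 ** k, r)
```

With $r = 8k^2 2^k$ the bound equals 1/2 for every k, so for each of the two sets the core has size at least $\mu/2 \ge n/2^{k+1}$ with probability at least 1/2.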
37
The Boolean case is much harder
[BST 98] Showed only $T \ge 1.017n$ for S=o(n) for
quadratic form problem
Uses pseudo-rectangles but specialized to splitting BP only
at the T/2 level, deterministic
[Ajtai 99a] Shows lower bounds for Element
Distinctness over $[n^2]$ that work for density $2^{-\varepsilon m}$
Embedded rectangles not pseudo-rectangles, deterministic
[Ajtai 99b] $T = O(n) \Rightarrow S = \Omega(n)$ for Boolean BP’s!!!
[B-Saks-Sun-Vee 00] Improved bounds and extension to
O(n/T)-error randomized case
Talk later
38
Power of the Large Domain Technique
For oblivious BPs, best bound using two-party CC is
$T = \Omega(n \log(n/S))$ [Alon-Maass 86]
Bounds match for general BPs over large
domains
Best oblivious BP bounds use multiparty CC
$T = \Omega(n \log^2(n/S))$ [Babai-Nisan-Szegedy 89]
[B-Vee 02] Matching bounds for general BPs over
large domains
Erik Vee talk later
39