Transcript Part II

Techniques for Time-Space Tradeoff Lower
Bounds for Branching Programs: Part II
Paul Beame
University of Washington
joint work with Erik Vee, Mike Saks, T.S. Jayram, Xiaodong Sun
1
The Trace of an Input
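[Figure: a layered branching program with nodes v0, v1, v2, v3, layers L1, L2, ..., L5 each of height kn/r, and sinks 0 and 1]
Partition a subset of the layers Lj into sets ℒ1, ℒ2
The trace of input x
• the sequence of nodes reached on input x as the computation moves from one set ℒi to the other
• E.g. trace(x) = (v1,v2,v3)
• a = length of trace = # of alternations in the partition
• ≤ 2^(Sa) possible traces (each of the a nodes in a trace is one of at most 2^S nodes)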
2
Embedded (m,α)-rectangles

An embedded (m,α)-rectangle R ⊆ D^n is a subset defined by
• disjoint sets A, B ⊆ {1,...,n} (the feet),
• a partial assignment σ ∈ D^({1,...,n} ∖ (A∪B)) (the spine),
• sets of assignments R_A ⊆ D^A, R_B ⊆ D^B (the legs)
R = { z | z restricted to the complement of A∪B equals σ, z_A ∈ R_A, z_B ∈ R_B }
• |A|, |B| = m
• |R_A|/|D^A|, |R_B|/|D^B| ≥ α (the density)

3
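An embedded (m,α)-rectangle
[Figure: an embedded rectangle inside D^n, drawn over the variables x1, ..., xn, showing the spine σ, the legs R_A ⊆ D^A and R_B ⊆ D^B, and the feet A and B, each of size m]
R_A and R_B each have density at least α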
Wlog A  B
4
Properties of a set of layers



• r layers (of height ≤ kn/r)
• Let Layers(x,i) be the set of layers in which variable xi is read on input x
• For a set ℒ of layers,
  • unread(x, ℒ) = { i : Layers(x,i) ∩ ℒ = ∅ }
  • core(x, ℒ) = { i : Layers(x,i) ⊆ ℒ }
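A minimal sketch of these two definitions, assuming the computation on input x is summarized by a hypothetical list `queries` of (layer, variable) pairs. The encoding and all names here are illustrative and not from the talk. Under this literal reading, a variable that is never read would belong to every core.

```python
def layers_read(queries, n):
    """Layers(x, i) for each variable i, built from (layer, variable) reads."""
    layers = {i: set() for i in range(n)}
    for layer, var in queries:
        layers[var].add(layer)
    return layers

def unread(layers, chosen):
    """Variables that are never read inside the layer set `chosen`."""
    return {i for i, ls in layers.items() if not (ls & chosen)}

def core(layers, chosen):
    """Variables all of whose reads fall inside `chosen`."""
    return {i for i, ls in layers.items() if ls <= chosen}

# Hypothetical run: n = 4 variables, r = 5 layers numbered 0..4.
queries = [(0, 0), (0, 1), (1, 2), (2, 0), (3, 3), (4, 1), (4, 2)]
layers = layers_read(queries, 4)
L1, L2 = {0, 2}, {1, 4}
all_layers = set(range(5))
```

In this toy run every variable is read at least once, so `core(layers, L1)` coincides with `unread(layers, all_layers - L1)`.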
5
Embedded rectangle partition ℛ(ℒ1,ℒ2) of f^-1(1) induced by ℒ1, ℒ2

• Two inputs x,y ∈ f^-1(1) are equivalent iff
  • trace(x, ℒ1, ℒ2) = trace(y, ℒ1, ℒ2)
  • core(x, ℒ1) = core(y, ℒ1)
  • core(x, ℒ2) = core(y, ℒ2)
  • stem(x, ℒ1, ℒ2) = stem(y, ℒ1, ℒ2), where stem(x, ℒ1, ℒ2) is the partial assignment that has the values of x outside core(x, ℒ1) and core(x, ℒ2)
• Fixing the trace and the two cores induces the partition into pseudo-rectangles we used before
• Fixing the stems fixes the common part of each pseudo-rectangle and produces the embedded rectangles we later reasoned about
6
Previous argument

• Throw out all embedded rectangles in ℛ(ℒ1,ℒ2) for which |core(·, ℒ1)| or |core(·, ℒ2)| is smaller than m
• Compute density bound α on what’s left

Problem with applying it to the Boolean case

• The density bound α is too small
• Denominator contains [formula garbled in extraction; surviving fragments: 2, n, 2(k 1)m, 2, m]
Better density bounds?
7
Boolean bounds

• This talk will cover
  • Density-bounding technique from [Ajtai 99a] with improvements from [B-Saks-Sun-Vee 00]
  • Yields density ≥ 2^(-εm), which is large enough for the Boolean case
  • Yields T = Ω( n √( log(n/S) / loglog(n/S) ) )


8
Generalized method for choosing ℒ1, ℒ2

• Generalization of the method from [BRS 89], [BST 98]
• Distribution 𝒟_q for probability q ≤ 1/2
  • Pr[Li ∈ ℒ1] = Pr[Li ∈ ℒ2] = q
  • Pr[Li ∉ ℒ1 ∪ ℒ2] = 1 - 2q
  • Independent for each i
• E[|core(x, ℒ1)|] = E[|core(x, ℒ2)|] ≥ n q^k
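The expectation can be checked exactly on a toy instance. A variable read in t distinct layers lands in the core of the first layer set exactly when all t of its layers are chosen for that set, which happens with probability q^t, so the expected core size is the sum of q^t over the variables. When the average number of layers per variable is at most k, convexity of q^t in t gives the n q^k lower bound. The sketch below enumerates all 3^r labelings of the layers; the layer sets and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def expected_core_exact(layer_sets, r, q):
    """E[|core|] for the first layer set, by enumerating every labeling of r layers.

    Label 1 or 2 puts a layer in the first or second set (probability q each);
    label 0 leaves it out (probability 1 - 2q). Layers are independent.
    """
    total = Fraction(0)
    for labels in product((1, 2, 0), repeat=r):
        prob = Fraction(1)
        for lab in labels:
            prob *= q if lab in (1, 2) else 1 - 2 * q
        chosen = {j for j, lab in enumerate(labels) if lab == 1}
        core_size = sum(1 for ls in layer_sets if ls <= chosen)
        total += prob * core_size
    return total

def expected_core_formula(layer_sets, q):
    """Closed form: sum over variables of q ** (number of layers reading it)."""
    return sum(q ** len(ls) for ls in layer_sets)

# Hypothetical Layers(x, i) for n = 4 variables over r = 5 layers.
layer_sets = [{0, 2}, {0, 4}, {1, 4}, {3}]
q = Fraction(1, 4)
exact = expected_core_exact(layer_sets, 5, q)
formula = expected_core_formula(layer_sets, q)
```

Here the average number of layers per variable is 7/4, so k = 2 is a valid choice and the bound n q^k = 1/4 sits below the exact expectation.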
9
Second Moment Method


• Var[|core(x, ℒ)|] ≤ (k²n/r) E[|core(x, ℒ)|] = (k²n/r) μ
• By Chebyshev’s inequality
  Pr[ μ/2 ≤ |core(x, ℒ)| ≤ 3μ/2 ]
    ≥ 1 - Var[|core(x, ℒ)|]/(μ/2)²
    ≥ 1 - 4k²(1/q)^k / r
  since μ ≥ n q^k
• Choose r = 8k²(1/q)^k
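Spelling out the arithmetic behind the last two steps, writing μ for the expected core size E[|core(x, ℒ)|] and using only the variance bound and the lower bound on the expectation stated on this slide:

```latex
\Pr\Bigl[\,\bigl|\,|\mathrm{core}(x,\mathcal{L})| - \mu\,\bigr| > \mu/2\,\Bigr]
  \;\le\; \frac{\mathrm{Var}\bigl[|\mathrm{core}(x,\mathcal{L})|\bigr]}{(\mu/2)^2}
  \;\le\; \frac{(k^2 n/r)\,\mu}{\mu^2/4}
  \;=\; \frac{4k^2 n}{r\,\mu}
  \;\le\; \frac{4k^2 n}{r\,n\,q^k}
  \;=\; \frac{4k^2 (1/q)^k}{r}.
```

With r = 8k²(1/q)^k the right-hand side equals 1/2, so the core size lies between μ/2 and 3μ/2 with probability at least 1/2.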
10
First fix the trace
• f^-1(1) and ℛ(ℒ1,ℒ2) are both disjoint unions over the ≤ 2^(Sa) choices of the trace
• we’ll bound densities in each separately
From now on when working with a fixed partition, without saying it explicitly, we will usually assume that the function f = f_t for some trace t
11
A simplifying assumption
On every input the BP reads every variable at least once
• Can easily ensure this by starting with n dummy queries
• Why bother?
• It gives an alternate characterization of core(x, ℒ1)
• core(x, ℒ1) = unread(x, complement of ℒ1)
12
Analyzing density - key observations
Embedded rectangle R in ℛ(ℒ1,ℒ2)
[Figure: the rectangle R with legs R_A and R_B, spine σ, and super-stem ρ]
• Every x in R has A = core(x, ℒ1) and B = core(x, ℒ2)
• Let ρ be the part outside A of some x in R (the super-stem)
• |R_A| = # of ways of varying x on A and staying in R
        = # of ways of extending ρ and staying in R
13
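How the cores can vary
[Figure: layered branching program with layers L1, L2, ..., L5 marked as ℒ1, ℒ2, or rest; nodes v0, ..., v4; sinks 0 and 1; the path of x, the path of y, and the super-stem ρ]
i ∈ core(x, ℒ1)
⇒ xi not read outside ℒ1 on input x
⇒ xi not read outside ℒ1 on input y
⇒ i ∈ core(y, ℒ1)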
14
Analyzing density - key observations
Embedded rectangle R in ℛ(ℒ1,ℒ2)
[Figure: the rectangle R with legs R_A and R_B, spine σ, and super-stem ρ]
• Every x in R has A = core(x, ℒ1) and B = core(x, ℒ2)
• Let ρ be the part outside A of some x in R (the super-stem)
• |R_A| = # of ways of varying x on A and staying in R
        = # of ways of extending ρ and staying in R
• Any input y ∈ D^n agreeing with ρ has A = core(y, ℒ1)
15
Lower-bounding density of rectangles

• Look at rectangles that contain assignments in f^-1(1) ∩ (D^A × ρ)
  • R^1_A, R^2_A, R^3_A, … partition the projection of f^-1(1) ∩ (D^A × ρ) on A
• To show that most inputs are in rectangles R with large |R_A| it suffices to show that
  • Any assignment ρ ∈ super-stems(ℒ1) is consistent with very few rectangles: numrects(ρ)
  • I.e., show numrects(ρ) is small relative to |D|^(n-|ρ|)
16
Bounding numrects(r)

• For ρ ∈ super-stems(ℒ1), any rectangle containing ρ has the same A = core(·, ℒ1)
  • Only option is choice of B = core(·, ℒ2) since the stem will be fixed by ρ
• To count # of choices it suffices to show that B Δ B’ is small for any rectangles R, R’ agreeing with ρ
17
New Goal: Bounding Symmetric Differences
For ρ ∈ super-stems(ℒ1) and x, y agreeing with ρ, show |core(x, ℒ2) Δ core(y, ℒ2)| is small
…
and the same with roles of ℒ1, ℒ2 reversed
18
How the cores can vary
[Figure: layered branching program with layers L1, L2, ..., L5 marked as ℒ1, ℒ2, or rest; nodes v0, ..., v4; sinks 0 and 1; the path of x, the path of y, and ρ]
• Variables read outside ℒ1 are the same on x and y since all are set by ρ
• Only way i ∈ core(x, ℒ2) ∖ core(y, ℒ2) is if xi is read in ℒ1 on input y but not on input x
Key: variables in the symmetric difference are read more!
19
Using the access pattern to bound the core
difference

• Partition f^-1(1) into classes depending on the access pattern of the input
  • For x ∈ f^-1(1) define pattern_x : [r] → [n] given by
    pattern_x(t) = # { i : |Layers(x,i)| = t }
                 = number of variables read in exactly t layers
• For each class C we will define ℒ1, ℒ2 so that
  • for all x in C, variables read in t layers will account for almost all of core(x, ℒ1), core(x, ℒ2)
  • Variables in core(x, ℒ2) Δ core(y, ℒ2) will be read in t layers on input either x or y
20
More precise characterization

• For any t, core(x, ℒ2) Δ core(y, ℒ2) is contained in
  G2(x,t) ∪ G2(y,t) ∪ H2(x,t) ∪ H2(y,t)
  where
  • i ∈ G2(z,t) iff i ∈ core(z, ℒ2) but |Layers(z,i)| ≠ t

iH2(z,t) iff |Layers(z,i)  2|  t,
|Layers(z,i)  1|  1, and
Layers(z,i)  1  2
21
Recall method for choosing ℒ1, ℒ2

• Distribution 𝒟_q for probability q ≤ 1/2
  • Pr[Li ∈ ℒ1] = Pr[Li ∈ ℒ2] = q
  • Pr[Li ∉ ℒ1 ∪ ℒ2] = 1 - 2q
  • Independent for each i
22
Choosing the probabilities
Claim: There is a set Q of 2k probabilities q, each at least k^(-16k), such that for almost all z, there is an integer t = t(z) ≤ k with
E[|G2(z,t) ∪ H2(z,t)|] ≪ E[|core(z, ℒ2)|]
for ℒ1, ℒ2 chosen from 𝒟_q where q = q(z) ∈ Q
With these values
E[|core(z, ℒ2)|] ≥ n q^k ≥ n (k^(-16k))^k = n k^(-16k²)
For k ≤ c √(log n / loglog n) this is ≥ n^(1-ε)
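A worked version of this last estimate, under the assumption that the garbled condition on k reads k ≤ c √(log n / loglog n) with logarithms base 2, a constant c ≤ 1, and n ≥ 4. The explicit exponent 8c² is what this calculation gives and is not taken from the slide.

```latex
k \le c\sqrt{\frac{\log n}{\log\log n}},\quad c \le 1
\;\Longrightarrow\;
\log k \le \tfrac12\log\log n
\quad\text{and}\quad
16\,k^2\log k \;\le\; 16\,c^2\,\frac{\log n}{\log\log n}\cdot\tfrac12\log\log n \;=\; 8c^2\log n,
\\[4pt]
\text{so}\qquad
n\,k^{-16k^2} \;=\; n\,2^{-16k^2\log k} \;\ge\; n\,2^{-8c^2\log n} \;=\; n^{\,1-8c^2}.
```

Choosing c small therefore keeps the expected core size close to n.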
23
Some issues


• Inputs x and y extending some ρ ∈ super-stems(ℒ1) may not have the same q and t
  • We actually apply the above reasoning separately for disjoint subsets I_{q,t} ⊆ f^-1(1) of inputs
• We can bound |core(x, ℒ2) Δ core(y, ℒ2)| relative to max{|core(x, ℒ2)|, |core(y, ℒ2)|} but need it in terms of |core(x, ℒ1)| = |core(y, ℒ1)|
  • Expectations of the cores of an input on ℒ1 and ℒ2 are the same, and concentration of core(z, ℒ1) about its mean says these are similar for x and y because core(x, ℒ1) = core(y, ℒ1)
24
Randomized Lower Bounds

• Recall: once ℒ1, ℒ2 are fixed we obtain the partition ℛ(ℒ1,ℒ2) of f^-1(1) into embedded rectangles
  • We only keep the good part of each partition
• There are 2k choices of ℒ1, ℒ2 that suffice to cover most of f^-1(1)
  • Each input in the good part of f^-1(1) is contained in at most 2k embedded rectangles
  • Implies original error multiplied at most 2k times when looking at embedded rectangles
  • Works with initial error O(1/k)
25
Proof of the Claim: Tailoring q to the access
pattern to bound G2(z,t) and H2(z,t)

• Let p_t = pattern_z(t) = # { i : |Layers(z,i)| = t }
• Define μ(z,q) = Σ_t p_t q^t
  • Note μ(z,q) = E[|core(z, ℒ1)|] = E[|core(z, ℒ2)|]
• Let t(q) be the index of the largest term p_t q^t in Σ_t p_t q^t
  • Pick the smallest such index if there are ties
• Want to choose q so the term with index t(q) is ≫ the rest
26
μ(z,q) = Σ_t p_t q^t

• t(q) does not increase as q decreases
  • Decreasing q shifts weight away from larger terms
• If q ≤ 1/(4k) then t(q) ≤ k
  • Since Σ_t p_t = n it follows that Σ_{t>k} p_t q^t ≤ n q^(k+1)
  • Σ_t p_t q^t = E[|core(z, ℒ1)|] ≥ n q^k
  • First k terms add up to all but a q ≤ 1/(4k) fraction of Σ_t p_t q^t
  • One of the first k terms must be larger than all the other terms
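A small sketch of the dominant-term index, using exact rational arithmetic. The pattern values are hypothetical and chosen so that the index visibly moves down as q shrinks, matching the remark that decreasing q shifts weight away from the larger terms; `dominant_index` is an illustrative name.

```python
from fractions import Fraction

def dominant_index(pattern, q):
    """t(q): the index t maximizing pattern[t] * q**t, smallest index on ties."""
    best_t, best_val = None, None
    for t in sorted(pattern):
        val = pattern[t] * q ** t
        # Strict comparison keeps the smallest index when terms tie.
        if best_val is None or val > best_val:
            best_t, best_val = t, val
    return best_t

# Hypothetical pattern: p_t = number of variables read in exactly t layers.
pattern = {1: 2, 2: 10, 3: 100, 6: 1000}
qs = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 64)]
indices = [dominant_index(pattern, q) for q in qs]
```

For this pattern the dominant index drops from 6 to 3 to 1 and then stays at 1 as q is reduced.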
27
Choices of q


• Q = { q_b = k^(-8b) : 1 ≤ b ≤ 2k }
• Since 1 ≤ t(q_b) ≤ k and t(q_{b+1}) ≤ t(q_b) are integral, by PHP there must be a b such that t(q_{b+1}) = t(q_b) = t(q_{b-1})
  • Set q(z) = q_b
  • t(q_{b+1}) = t(q_b) implies term ≫ terms with smaller t
  • t(q_b) = t(q_{b-1}) implies term ≫ terms with larger t
  • This bounds G2(z,t)
• Bounding H2(z,t) is a little trickier since accesses are divided between ℒ1, ℒ2; forces at least a factor of k decrease between q_b and q_{b+1}
28
What functions are this hard?

• Computing x^T M_y x ≡ 0 (mod 2) for x ∈ {0,1}^n, y ∈ {0,1}^(2n-1)
  • Defined in [Ajtai 99b]
• Given x ∈ {0,1}^n, compute the parity of the number of (i,j) such that x_i ∧ x_j ∧ x_{i+j} is true
  • By reduction from previous problem [Ajtai 99b]
• Element distinctness: Given x ∈ [n²]^n determine whether or not all x_i are distinct.
29
Why ED doesn’t have large embedded rectangles


• Let R_A ⊆ D^A and R_B ⊆ D^B have density more than 2^(-|A|) and 2^(-|B|) respectively
  • Then more than |D|/2 elements of D appear in R_A and similarly for R_B
  • So some element of D appears in both, and the rectangle contains a non-distinct input vector
• If |D| ≥ n² then |ED^-1(1)| ≥ |D^n|/e
• Randomized bounds extend set-disjointness technique of [Babai-Frankl-Simon 86]
  • n^(-2) error
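Two of the counting facts on this slide can be checked with exact arithmetic. The sketch below computes the fraction of inputs with pairwise distinct entries and the largest density a leg can have if it only uses a given number of domain values; the function names are illustrative and not from the talk.

```python
from fractions import Fraction
from math import e

def distinct_fraction(n, d):
    """Fraction of vectors in [d]^n whose n entries are pairwise distinct."""
    frac = Fraction(1)
    for i in range(n):
        frac *= Fraction(d - i, d)
    return frac

def max_density(v, d, a):
    """Largest density in D^A (|A| = a, |D| = d) of a set using only v values of D."""
    return Fraction(v, d) ** a

# With |D| = n^2 the distinct inputs make up at least a 1/e fraction of D^n.
fractions_ok = all(float(distinct_fraction(n, n * n)) >= 1 / e for n in range(1, 30))

# A leg using only |D|/2 values has density at most 2^(-|A|).
half_domain_density = max_density(8, 16, 3)
```

The second computation is the contrapositive of the slide's claim: a leg whose density exceeds 2^(-|A|) must use more than half of the domain.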
30
The end

• Bounds for quadratic form based on rigidity argument [Ajtai 99b]
• Given rigidity, randomized bounds follow from discrepancy argument using pairwise independence (Lindsey’s Lemma) [BSSV 00]
• Open: better bounds, more functions

31