Corruption and Recovery-Efficient Locally Decodable Codes


Corruption and Recovery-Efficient Locally Decodable Codes
David Woodruff
IBM Almaden
Locally Decodable Codes
• A binary (q, δ, ε)-LDC is an encoding C: {0,1}^n -> {0,1}^m for which there is a machine A with access to a noisy version y of C(x)
• ∀ x ∈ {0,1}^n, if Δ(y, C(x)) ≤ δm, then ∀ k ∈ [n], Pr[A^y(k) = x_k] ≥ ½ + ε (probability over A’s coins)
• A always queries at most q coordinates of y
• C is called linear if C is a linear transformation
Locally Decodable Codes
• Tradeoff between message length n, encoding length m, number of queries q, fraction of corrupted bits δ, and recovery probability ½ + ε
• We focus on the popular case when q is constant
• For q = 1, LDCs do not exist (Katz, Trevisan)
• For q = 2, the best known LDC is linear and Hadamard-based, and achieves m = exp(δn), which is optimal for linear codes (Obata, improving upon Goldreich, Karloff, Schulman, Trevisan)
• For q = 3, the best known construction has m = exp(exp(log^{1/2} n · log log n)), assuming ε and δ are constant (Efremenko, improving upon Yekhanin)
• All known constructions of LDCs are linear
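To make the 2-query case concrete: the Hadamard code encodes x as the list of all parities <v, x>, and x_k can be decoded from the two positions a and a ⊕ e_k, since <v_a, x> ⊕ <v_{a ⊕ e_k}, x> = x_k. A minimal Python sketch (illustrative only; the function names are ours, not from the talk):

```python
import random

def hadamard_encode(x):
    """Hadamard code: the codeword lists <v, x> mod 2 for every v in {0,1}^n,
    so the codeword length is m = 2^n."""
    n = len(x)
    x_int = sum(bit << i for i, bit in enumerate(x))
    return [bin(v & x_int).count("1") % 2 for v in range(2 ** n)]

def hadamard_decode(y, k, n):
    """2-query local decoder for x_k: pick a uniformly random position a and
    output y[a] xor y[a ^ e_k]; correct whenever neither query is corrupted."""
    a = random.randrange(2 ** n)
    return y[a] ^ y[a ^ (1 << k)]
```

With no corruption the decoder is always correct; with a δ fraction of flipped bits, both queries are clean with probability at least 1 − 2δ, matching the (2, δ, ½ − 2δ) parameters quoted later.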
Main Result
We give a black box transformation from a linear LDC into a non-linear
LDC with a better dependence on δ and ε.
Can yield significant improvements in applications in which δ and ε may
be flexible (e.g., small constants or sub-constant).
Theorem: Given a family of (q, δ, ½ − βδ)-LDCs of length m(n), where q is a
constant, β > 0 is a constant, and δ < 1/(2β), there is a family of non-linear (q, Θ(δ), ε)-LDCs of length poly(1/(ε+δ)) · m(max(ε, δ)δn).
The Hadamard code is a (2, δ, ½ − 2δ)-LDC with m(n) = 2^n. We get a family of
(2, Θ(δ), ε)-LDCs of length poly(1/(ε+δ)) · exp(max(ε,δ)δn).
This separates linear and non-linear LDCs, as there is an exp(δn) lower
bound for linear 2-query LDCs (answering a question posed by
Kerenidis and de Wolf)
Improves the exponent (in terms of δ, ε) for constant q-query LDCs,
replacing occurrences of n in known constructions with max(δ, ε)δn
Additional Results
• Our result gives a 2-query LDC with m(n) = exp(max(ε,δ)δn). The known lower bound for general (non-linear) 2-query LDCs is m(n) ≥ exp(ε²δn) (Kerenidis, de Wolf)
• What is the optimal dependence on ε, δ?
• We improve the lower bound to m(n) ≥ exp(max(ε,δ)δn) when the decoder is a matching sum decoder (this generalizes the perfectly smooth decoder of Trevisan; all known decoders have this property)
• Assumption: the decoder has partial matchings M_1, …, M_n of edges on vertices {1, …, m}. Given k ∈ [n], the decoder chooses a uniformly random edge {a, b} in M_k and outputs y_a ⊕ y_b (recall that y is the received word). Note that the encoding need not be linear.
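A matching sum decoder is simple to state in code. A minimal Python sketch (illustrative; the matchings table is a hypothetical input, and the Hadamard decoder is the special case where M_k pairs each position a with a ⊕ e_k):

```python
import random

def matching_sum_decode(y, k, matchings):
    """Matching sum decoder: choose a uniformly random edge {a, b} from the
    partial matching M_k and output y_a xor y_b."""
    a, b = random.choice(matchings[k])
    return y[a] ^ y[b]
```

Nothing here requires the encoding that produced y to be linear; only the decoder's form is restricted.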
Additional Results
• Our lower bound technique also yields concrete improvements in the
best known 3-query LDCs for constant δ and ε
• The best known 3-query LDC has m(n) = exp(exp(log^{1/2} n · log log n)), but has recovery probability < ½ as soon as δ > 1/12
• We get a 3-query LDC of length m(n) = exp(exp(log^{1/2} n · log log n)) for any δ < 1/6
• Our LDC, as well as Efremenko’s, has a matching sum decoder, and
there is no 3-query LDC with a matching sum decoder with δ > 1/6
Techniques
Theorem: Given a family of (q, δ, ½ − βδ)-LDCs of length m(n), where q is a
constant, β > 0 is a constant, and δ < 1/(2β), there is a family of non-linear
(q, Θ(δ), ε)-LDCs of length poly(1/(ε+δ)) · m(max(ε, δ)δn).
Take x ∈ {0,1}^n and partition the n coordinates into n/r blocks B_1, …, B_{n/r}, each
containing r = Θ((ε+δ)^{-2}) coordinates. Compute z_j = majority(x_i | i ∈ B_j), and
encode z_1, …, z_{n/r} with a (q, δ, ε)-LDC C.
If k ∈ B_j, then Pr_{x ∈ {0,1}^n}[x_k = z_j] ≥ ½ + 3q(ε+δ).
Choose s_1, …, s_t ∈ {0,1}^n uniformly at random, apply the above procedure to
each of x ⊕ s_1, x ⊕ s_2, …, x ⊕ s_t, and take the concatenation.
s_1, …, s_t are chosen by the probabilistic method so that ∀ x ∈ {0,1}^n and ∀ k ∈
[n], if k ∈ B_j, then Pr_{i ∈ [t]}[(x ⊕ s_i)_k = majority{(x ⊕ s_i)_l | l ∈ B_j}] ≥ ½ + 2q(ε+δ).
The length of the encoding is t · m(n/r) = poly(1/(ε+δ)) · m(n(ε+δ)²)
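The encoding procedure above can be sketched in Python (illustrative only: in the actual construction r = Θ((ε+δ)^{-2}), the shifts come from the probabilistic-method argument rather than fresh randomness, and inner_encode stands for the given LDC):

```python
def block_majorities(x, r):
    """Split x into consecutive blocks of r coordinates and output each
    block's majority bit (ties broken toward 0)."""
    return [int(2 * sum(x[j:j + r]) > r) for j in range(0, len(x), r)]

def shifted_encode(x, shifts, r, inner_encode):
    """For each shift s_i, compress x xor s_i to its block majorities,
    encode the majorities with the inner LDC, and concatenate."""
    out = []
    for s in shifts:
        xs = [xi ^ si for xi, si in zip(x, s)]
        out += inner_encode(block_majorities(xs, r))
    return out
```

The total length is t copies of the inner codeword on n/r bits, matching the t · m(n/r) count above.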
Techniques
• To decode, choose a random i ∈ [t]. Decode from the q positions in the (corrupted) encoding of x ⊕ s_i.
• Two sources of error:
1. An adversarial δ fraction of bits flipped in the encoding
2. Sometimes (x ⊕ s_i)_k is unequal to the majority of the bits in the corresponding block.
• If the error sources were independent, the decoder’s success probability would be Pr_i[(x ⊕ s_i)_k = majority{(x ⊕ s_i)_l | l ∈ B_j}] · (1 − qδ) ≥ (½ + 2qε + 2qδ)(1 − qδ) > ½ + ε
• They are not independent, though, as the adversary can first decode to recover x, then guess a k ∈ [n], then corrupt exactly those encodings of x ⊕ s_i for which (x ⊕ s_i)_k equals the majority of the bits in the corresponding block. However, with probability at least 1 − qδ, no queried position is corrupted. By a union bound, Pr[decode correctly] ≥ (½ + 2qε + 2qδ) − qδ > ½ + ε
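The decoding step can be sketched in Python (illustrative; inner_m is the inner codeword length, inner_decode stands for the inner LDC's q-query decoder, and all names are our placeholders): the inner decoder returns the majority bit of the block of x ⊕ s_i containing k, and xoring with the k-th bit of s_i undoes the shift.

```python
import random

def shifted_decode(y, k, shifts, r, inner_m, inner_decode):
    """Pick a random shift index i, run the inner q-query decoder on the i-th
    inner codeword, and undo the shift to recover x_k."""
    i = random.randrange(len(shifts))
    inner_word = y[i * inner_m:(i + 1) * inner_m]
    j = k // r                       # index of the block containing coordinate k
    return inner_decode(inner_word, j) ^ shifts[i][k]
```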
Techniques
• We have a (q, Θ(δ), ε)-LDC, but the length is poly(1/(ε+δ)) · m(n(ε+δ)²).
• If q = 2, this gives poly(1/(ε+δ)) · exp(n(ε+δ)²). However, there is a linear 2-query LDC with length poly(1/(ε+δ)) · exp(δn).
• If ε > δ, our LDC might be longer. We can handle ε > δ as follows:
• Break the message x ∈ {0,1}^n into Θ(ε/δ) groups, each of size Θ((δ/ε)n).
• Encode each group using the above procedure, and concatenate the encodings. The length is m’ = poly(1/(ε+δ)) · exp((δ/ε)n(ε+δ)²) = poly(1/(ε+δ)) · exp(εδn).
• Inside each group, the adversary can corrupt up to δm’ positions, which, since the group’s encoding has Θ((δ/ε)m’) positions, is an ε fraction of its positions. Thus Pr[decode correctly] ≥ (½ + 2qε + 2qδ) − qε > ½ + ε, as desired.
Recap
[Diagram: message bits x_1, x_2, …, x_n]
Break into d = max(1, Θ(ε/δ)) groups: Group 1, Group 2, …, Group d
For each group:
1. Break into (n/d)(ε+δ)² blocks, each of size 1/(ε+δ)².
2. Compute the majority bit of each block.
3. Encode the majority bits using an existing q-query LDC.
Repeat for several different shifts x ⊕ s_i, for random s_i ∈ {0,1}^n.
Lower Bound Techniques
• How good is our upper bound?
• For 2 queries we achieve m(n) = exp(max(ε,δ)δn)
• Kerenidis and de Wolf show m(n) ≥ exp(ε²δn) for 2-query non-linear LDCs (recall that for 2-query linear LDCs, the bound is m(n) ≥ exp(δn)).
• We improve the bound to a tight m(n) = exp(max(ε,δ)δn) under the assumption that the decoder has partial matchings M_1, …, M_n of edges on vertices {1, …, m}. Given k ∈ [n], the decoder chooses a uniformly random edge {a, b} in M_k and outputs y_a ⊕ y_b (recall that y is the received word).
• Any linear LDC can be assumed to be in this form after minor modifications. Our LDC also has this form (so all known LDCs have this form).
Lower Bound Techniques
• Intuition: fix a matching M_k. For each edge e = {a, b} in M_k, look at the probability p_{k,e} that C(x)_a ⊕ C(x)_b = x_k for a random x ∈ {0,1}^n
• Let q_k be the probability that the decoder succeeds, over a random x ∈ {0,1}^n, assuming no bits are flipped by the adversary
• By our assumptions, q_k = Σ_{e ∈ M_k} p_{k,e} / |M_k|
• (correctness) q_k ≥ ½ + ε
• (restricted decoder) q_k ≥ ½ + δm/|M_k|. Otherwise there is a fixed x ∈ {0,1}^n with fewer than |M_k|/2 + δm edges that can be used to recover x_k, and the adversary can flip one endpoint of exactly δm edges in M_k
• Main “Average-case LDC Lemma”: suppose we have matchings M_k with sizes c_k m such that for all k, q_k ≥ ½ + r_k. Let r = Σ_{k=1}^n r_k / n and c = Σ_{k=1}^n c_k / n. Then m ≥ exp(ncr²).
Lower Bound Techniques
• Main Lemma: suppose we have matchings M_k with sizes c_k m such that for all k, q_k ≥ ½ + r_k. Let r = Σ_{k=1}^n r_k / n and c = Σ_{k=1}^n c_k / n. Then m ≥ exp(ncr²).
• Our claims imply r_k ≥ max(ε, δm/|M_k|) = max(ε, δ/c_k).
• One can show that exp(ncr²) is minimized at exp(max(ε,δ)δn).
• The proof of the main lemma generalizes earlier quantum information theory arguments of Kerenidis and de Wolf.
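For intuition, the minimization in the second bullet can be checked by a short case analysis (our sketch, not from the talk; it treats the averaged bound r ≥ max(ε, δ/c) as if it held directly, and uses that a partial matching satisfies c ≤ 1/2):

```latex
Since $r \ge \max(\varepsilon, \delta/c)$ and $c \le 1/2$:
\[
cr^2 \;\ge\;
\begin{cases}
c\,\varepsilon^2 \;\ge\; \dfrac{\delta}{\varepsilon}\,\varepsilon^2 \;=\; \varepsilon\delta, & \text{if } c \ge \delta/\varepsilon,\\[6pt]
c\left(\dfrac{\delta}{c}\right)^{2} \;=\; \dfrac{\delta^2}{c} \;\ge\; \max(\varepsilon\delta,\; 2\delta^2), & \text{if } c < \delta/\varepsilon.
\end{cases}
\]
The first case forces $\delta \le c\varepsilon \le \varepsilon/2$, so in both cases
$cr^2 = \Omega(\max(\varepsilon,\delta)\,\delta)$, and hence
$m \ge \exp(ncr^2) \ge \exp(\Omega(\max(\varepsilon,\delta)\,\delta n))$.
```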
Conclusions
• Gave a black box transformation from a linear LDC into
a non-linear LDC with a better dependence on δ and ε.
– Separates linear and non-linear 2-query LDCs
– Yields 3-query LDCs with best known dependence
on δ and ε
• Gave a tight lower bound for 2-query LDCs with
matching sum decoders.
• Extended the range of δ for which 3-query LDCs
become non-trivial from δ < 1/12 to δ < 1/6.
• General question: how are the parameters of linear
and non-linear LDCs related?
Additional Perspective
• To prove the main lemma in our lower bound, we need various
transformations between LDCs.
• One such transformation yields concrete improvements in the best
known 3-query LDCs for constant δ and ε
• The best known 3-query LDC has m(n) = exp(exp(log^{1/2} n · log log n)), but has recovery probability < ½ as soon as δ > 1/12
• We get a 3-query LDC of length m(n) = exp(exp(log^{1/2} n · log log n)) with non-trivial recovery probability for any δ < 1/6 (and we preserve linearity)
• Our LDC, as well as Efremenko’s, has a matching sum decoder, and
there is no 3-query LDC with a matching sum decoder with δ > 1/6
An LDC Transformation
• Take Efremenko’s linear 3-query LDC of length m(n). Identify the codeword positions with linear forms v_j, so that the j-th position computes <v_j, x>
• For all k ∈ [n], there is a matching M_k of triples {a, b, c} of codeword positions such that v_a ⊕ v_b ⊕ v_c = e_k, with |M_k| ≥ φm for a constant φ > 0
• Consider a new LDC formed by taking each ordered multiset S = {a_1, …, a_p} of size p = O(ln 1/φ), and creating the entry a_1 ⊕ a_2 ⊕ … ⊕ a_p
• The new codeword length is m(n)^p, which is still exp(exp(log^{1/2} n · log log n))
• With high probability, for any multiset S and any k, there exists an a ∈ S for which {a, b} ∈ M_k for some b, and so one can consider the edge (S, T), where T is the multiset formed by removing a from S and inserting b.
• We boost the size of the matchings, and this gives a better dependence on δ. There are a few minor issues in ensuring the matchings are well-defined.