An Ω(n^{1/3}) Lower Bound for Bilinear Group Based PIR

Transcript

Codes with local decoding procedures
Sergey Yekhanin
Microsoft Research
Error-correcting codes: paradigm

[Figure: Sender → Encoder → Channel → Decoder → Receiver. The message X ∈ F_2^k (e.g., 0110001) is encoded to the codeword E(X) ∈ F_2^n (011000100101); the channel erases up to e coordinates (01*00*10010*); the decoder recovers X (0110001).]
• The paradigm dates back to the 1940s (Shannon / Hamming).
• One limitation: recovering a single message coordinate requires processing the entire corrupted codeword.
Local decoding: paradigm

[Figure: the message X ∈ F_2^k (0110001) is encoded to E(X) ∈ F_2^n (011000100101); the channel erases up to e coordinates (01*00*10010*); a local decoder reads up to r coordinates of the corrupted codeword and outputs a single message coordinate X_i (here, 1).]

The local decoder runs in time much smaller than the message length!
• First account: Reed's decoder for Muller's codes (1954)
• Implicit use (1950s–1990s)
• Formal definition and systematic study (late 1990s) [Levin'95, STV'98, KT'00]
  – Original applications in computational complexity theory
  – Cryptography
  – Most recently used in practice to provide reliability in distributed storage (Microsoft Azure, Windows Server, Windows, Hadoop, etc.)
Local decoding: example

[Figure: the message X = (X1, X2, X3) is encoded to the seven-coordinate codeword
E(X) = (X1, X2, X3, X1⊕X2, X1⊕X3, X2⊕X3, X1⊕X2⊕X3).]

Message length: k = 3
Codeword length: n = 7
Erased locations: e = 3
Locality: r = 2
Local decoding: Decoding tuples for X1

[Figure: the decoding tuples for X1 partition the seven coordinates of E(X):
{X1}, {X2, X1⊕X2}, {X3, X1⊕X3}, {X2⊕X3, X1⊕X2⊕X3}. The XOR of each tuple equals X1.]
Local decoding: Decoding tuples for X2

[Figure: the decoding tuples for X2 partition the seven coordinates of E(X):
{X2}, {X1, X1⊕X2}, {X3, X2⊕X3}, {X1⊕X3, X1⊕X2⊕X3}. The XOR of each tuple equals X2.]
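The example above can be made executable. The following is a minimal Python sketch (all function names are mine, not from the talk) of the k = 3, n = 7 code: each decoding tuple XORs to its target coordinate, the tuples for each coordinate partition [n], and the local decoder survives up to 3 erasures (erased coordinates are marked None).

```python
# Encode X = (X1, X2, X3) into the 7 coordinates shown on the slide.
def encode(x):
    x1, x2, x3 = x
    return [x1, x2, x3, x1 ^ x2, x1 ^ x3, x2 ^ x3, x1 ^ x2 ^ x3]

# Decoding tuples D_i: disjoint tuples of coordinate positions, each
# XOR-ing to X_i; together the tuples for each i cover all 7 positions.
D = {
    0: [(0,), (1, 3), (2, 4), (5, 6)],  # tuples for X1
    1: [(1,), (0, 3), (2, 5), (4, 6)],  # tuples for X2
    2: [(2,), (0, 4), (1, 5), (3, 6)],  # tuples for X3
}

def local_decode(word, i):
    """Recover X_i from a codeword with erasures (None entries)."""
    for tup in D[i]:
        if all(word[t] is not None for t in tup):  # an intact tuple
            val = 0
            for t in tup:
                val ^= word[t]
            return val
    raise ValueError("too many erasures")

w = encode((1, 0, 1))     # [1, 0, 1, 1, 0, 1, 0]
for pos in (0, 3, 5):     # erase e = 3 coordinates
    w[pos] = None
print([local_decode(w, i) for i in range(3)])  # [1, 0, 1]
```

Since an erasure removes at most one tuple, and each coordinate has four disjoint tuples, any three erasures leave an intact tuple for every X_i.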
Codes with local decoding

Setting: Encode k-dimensional messages to n-dimensional codewords.
Main parameters: Redundancy n − k, locality r, and noise level e.
Goal: Understand the true shape of the tradeoff between redundancy and locality, for different settings of noise (e.g., e = δn, e = n^ε, e = O(1)).
[Figure: taxonomy of known families of codes, plotted by locality r (O(1), (log k)^c, k^ε) against noise level e (O(1), n^ε, δn). Locally decodable codes — Reed Muller codes, matching vector codes, multiplicity codes — handle high noise and have applications in crypto / complexity; projective geometry codes and local reconstruction codes handle low noise and have applications to data storage.]
Plan
• Part I: Locally decodable codes
  • Private Information Retrieval (PIR) schemes
  • PIR schemes from smooth codes
  • Reed Muller codes
• Part II: Codes with locality for distributed data storage
  • Erasure coding for data storage
  • Local reconstruction codes for data storage
  • Constructions and limitations
Part I: Locally decodable codes
Smooth codes

[Figure: the codeword coordinates c_1, …, c_n of E(X) are partitioned into decoding tuples for a message coordinate X_i.]
Definition: Consider a code E that encodes k-dimensional messages X to n-dimensional codewords E(X). For every i in [k], we have a family of decoding r-tuples D_i. We say that E is r-smooth if for each i in [k], the r-tuples in D_i partition the set [n].

Note: If the code E is r-smooth, then each X_i can be recovered by reading r coordinates after e = n/r − 1 erasures in E(X).
Private information retrieval [CGKS]

Protocols that allow users to privately retrieve items from replicated DBs.

[Figure: a user queries several servers, each holding the same encoding X → E(X) of the database.]

Protocol:
• Each server encodes the k-bit database X with the same r-query smooth code.
• The user, interested in X_i, picks a random r-tuple T from D_i.
• The user sends the r elements of T to r different servers.
• The servers respond with the respective coordinates of E(X).
• The user finds out X_i.

(Each server observes a sample from a uniform distribution on [n].)
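The protocol can be simulated end to end. Below is my own minimal sketch of its classical 2-server instantiation with the Hadamard code (r = 2, n = 2^k): the decoding pairs {a, a ⊕ e_i} partition F_2^k, each server answers an inner product ⟨query, X⟩ (one coordinate of the Hadamard encoding of X), and each server individually sees only a uniformly random vector, so it learns nothing about i.

```python
import random

K = 8                       # database size (bits)
X = [random.randint(0, 1) for _ in range(K)]

def server_answer(query):
    """Each server stores X and returns the inner product <query, X> mod 2,
    i.e., one coordinate of the Hadamard encoding of X (n = 2^K)."""
    return sum(q & x for q, x in zip(query, X)) % 2

def pir_retrieve(i):
    # Random decoding pair {a, a XOR e_i}: a is a uniformly random vector,
    # so each server's individual view is uniform regardless of i.
    a = [random.randint(0, 1) for _ in range(K)]
    b = a[:]
    b[i] ^= 1               # a XOR e_i
    # <a, X> XOR <a + e_i, X> = <e_i, X> = X_i
    return server_answer(a) ^ server_answer(b)

assert all(pir_retrieve(i) == X[i] for i in range(K))
print("retrieved all", K, "bits correctly")
```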
Private information retrieval

Properties of the protocol:
• Information theoretic privacy ↔ smoothness
• Number of DB replicas needed ↔ locality
• Communication complexity ↔ codeword length

Short smooth codes with low query complexity yield efficient PIR schemes.

Example: 3-server schemes with 2^{O(√log k)} communication to access a k-bit DB. [Dvir Gopi 2014]: a 2-server scheme with the same communication.
Reed Muller codes
• Parameters: q, m, d = q − 2.
• Codewords: evaluations of degree-d polynomials in m variables over F_q.
• A polynomial f ∈ F_q[z_1, …, z_m], deg f ≤ d, yields the codeword (f(x))_{x ∈ F_q^m}.
• The encoder is systematic.
• Parameters: n = q^m, k = C(m + d, m).
• We argue that the code is r-smooth for r = q − 1.
Reed Muller codes: local decoding
• Key observation: the restriction of a codeword to an affine line yields an evaluation of a univariate polynomial f|_L of degree at most d.
• Decoding tuples for the value at x:
  – Consider all affine lines through x.
  – Use polynomial interpolation.

[Figure: the space F_q^m with the affine lines through a point x.]

• Smooth code: the affine lines through x partition the space. The decoder reads q − 1 coordinates.
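A hedged sketch of the line-decoding procedure over a small prime field (q = 5, m = 2, d = q − 2 = 3; the parameter choices and helper names are mine): to read the value at x, pick a random line through x, read its q − 1 other points, and interpolate the degree-≤ d univariate restriction at the missing point.

```python
import random

q, m, d = 5, 2, 3                     # d = q - 2

# A random polynomial f(z1, z2) of total degree <= d over F_q,
# stored as {(i, j): coeff} for the monomials z1^i * z2^j.
random.seed(1)
f = {(i, j): random.randrange(q)
     for i in range(d + 1) for j in range(d + 1 - i)}

def evaluate(p):
    z1, z2 = p
    return sum(c * pow(z1, i, q) * pow(z2, j, q)
               for (i, j), c in f.items()) % q

# The Reed Muller codeword: evaluations of f at all points of F_q^m.
codeword = {(a, b): evaluate((a, b)) for a in range(q) for b in range(q)}

def local_decode(x):
    """Read the q - 1 other points on a random line through x, interpolate."""
    v = (random.randrange(1, q), random.randrange(q))   # direction v != 0
    ts = list(range(1, q))                              # skip t = 0 (x itself)
    ys = [codeword[((x[0] + t * v[0]) % q, (x[1] + t * v[1]) % q)] for t in ts]
    # Lagrange interpolation of g(t) = f(x + t*v), deg g <= d = q - 2,
    # from its values at t = 1..q-1; evaluate at t = 0.
    total = 0
    for ti, yi in zip(ts, ys):
        num = den = 1
        for tj in ts:
            if tj != ti:
                num = num * (0 - tj) % q
                den = den * (ti - tj) % q
        total = (total + yi * num * pow(den, q - 2, q)) % q
    return total

x = (2, 3)
print(local_decode(x), evaluate(x))  # the two values agree
```

The decoder reads q − 1 = 4 coordinates of the n = q^m = 25 coordinate codeword, regardless of which of the 4 random directions it picks.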
Reed Muller codes: parameters

n = q^m,  k = C(m + d, m),  d = q − 2,  r = q − 1,  e = δn.

Setting parameters:
• q = O(1), m → ∞:  r = O(1), n = exp(k^{1/(r−1)}).
• q = m^2:  r = (log k)^2, n = poly(k).
• q → ∞, m = O(1):  r = k^ε, n = O(k).  (Better codes are known.)
Reducing codeword length of locally decodable codes is a major open problem.
Part II: Distributed storage
Data storage
• Store data reliably
• Keep it readily available for users

Data storage: Replication
• Store data reliably
• Keep it readily available for users
• Very large overhead
• Moderate reliability
• Local recovery: lose one machine, access one
Data storage: Erasure coding
• Store data reliably
• Keep it readily available for users

[Figure: k data chunks are supplemented by n − k parity chunks.]

• Low overhead
• High reliability
• No local recovery: lose one machine, access k
Need: Erasure codes with local decoding
Codes for data storage

[Figure: codeword layout X1, X2, …, Xk, P1, …, P(n−k).]

Goals:
• (Cost) minimize the number of parities.
• (Reliability) tolerate any pattern of h + 1 simultaneous failures.
• (Availability) recover any data symbol by accessing at most r other symbols.
• (Computational efficiency) use a small finite field to define parities.
Local reconstruction codes
• Def: An (r, h)-Local Reconstruction Code (LRC) encodes k symbols to n symbols, and
  • corrects any pattern of h + 1 simultaneous failures;
  • recovers any single erased data symbol by accessing at most r other symbols.
• Theorem [GHSY]: In any (r, h)-LRC, the redundancy satisfies n − k ≥ k/r + h.
• Theorem [GHSY]: If r | k and h < r + 1, then any (r, h)-LRC has the following topology:

[Figure: the data symbols X1, …, Xk are split into local groups of size r (X1 … Xr through Xk−r+1 … Xk), each with one light parity L1, …, Lg; h heavy parities H1, …, Hh depend on all data symbols.]

• Fact [HCL]: There exist (r, h)-LRCs with optimal redundancy over a field of size k + h.
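To make the topology concrete, here is my own toy sketch (XOR light parities only; the heavy parities would need a larger field, as discussed later) of local recovery in an (r = 4, h = 3)-style layout with k = 8: a lost data symbol is recovered from the r other symbols of its local group, instead of reading all k symbols.

```python
k, r = 8, 4
data = [1, 0, 1, 1, 0, 0, 1, 0]
groups = [list(range(0, 4)), list(range(4, 8))]  # two local groups of size r

# Light parity of each group: the XOR of its r data symbols.
light = [0, 0]
for g, idx in enumerate(groups):
    for i in idx:
        light[g] ^= data[i]

def recover(lost):
    """Recover one lost data symbol from its group: reads only r symbols."""
    g = next(g for g, idx in enumerate(groups) if lost in idx)
    val = light[g]
    for i in groups[g]:
        if i != lost:
            val ^= data[i]
    return val

assert all(recover(i) == data[i] for i in range(k))
print("every data symbol is locally recoverable from", r, "reads")
```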
Reliability

Set k = 8, r = 4, and h = 3.

[Figure: local groups {X1, X2, X3, X4} with light parity L1 and {X5, X6, X7, X8} with light parity L2, plus heavy parities H1, H2, H3.]

• All 4-failure patterns are correctable.
• Some 5-failure patterns are not correctable.
• Other 5-failure patterns might be correctable.
Combinatorics of correctable failure patterns

Def: A regular failure pattern for an (r, h)-LRC is a pattern that can be obtained by failing at most one symbol in each local group and h extra symbols.

[Figure: two regular failure patterns on the k = 8, r = 4, h = 3 code with groups {X1, …, X4, L1} and {X5, …, X8, L2} and heavy parities H1, H2, H3.]

Theorem:
• If a failure pattern is not regular, then it is not correctable by any LRC.
• There exist LRCs that correct all regular failure patterns.
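A hedged Python sketch of the definition (helper names are mine): a pattern is regular iff, after removing at most one failed symbol per local group, at most h failures remain. The examples reproduce the k = 8, r = 4, h = 3 picture, where light parities belong to their groups and heavy parities only count as extras.

```python
def is_regular(pattern, groups, h):
    """pattern: set of failed symbol names. groups: list of sets, each a
    local group (its data symbols plus its light parity). Heavy parities
    lie outside all groups and always count as 'extra' failures."""
    covered = sum(1 for g in groups if pattern & g)  # one removable per hit group
    return len(pattern) - covered <= h

groups = [{"X1", "X2", "X3", "X4", "L1"}, {"X5", "X6", "X7", "X8", "L2"}]
h = 3

print(is_regular({"X1", "X2", "X3", "L1"}, groups, h))        # True: 1 + 3 extras
print(is_regular({"X1", "X2", "X3", "X4", "L1"}, groups, h))  # False: whole group of 5
print(is_regular({"X1", "X5", "H1", "H2", "H3"}, groups, h))  # True: 1 per group + h heavies
```

Losing an entire group of 5 symbols destroys 4 data symbols, but only h = 3 heavy parities remain to cover them, matching the "not correctable" verdict.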
Maximally recoverable codes

Def: An (r, h)-LRC is maximally recoverable if it corrects all regular failure patterns.
Theorem [BHH]: Maximally recoverable (r, h)-LRCs exist.
Proof sketch: pick the coefficients in heavy parities at random from a large finite field.
Asymptotic setting: h = O(1), r = O(1), k → ∞.
Random choice needs a field of size at least [KM]: Ω(k^{h−1}).
The tradeoff: larger fields allow for more reliable codes, up to maximal recoverability.
We want both: small field size (efficiency) and maximal recoverability.
Explicit maximally recoverable codes

Theorem [GHJY]: There exist maximally recoverable (r, h)-LRCs over a field of size c·k^{(h−1)(1 − 1/2^r)}.

Comparison:
• Our alphabet grows as O(k^{h−1}) or slower.
• Beats random codes for small r and large h.
• Our only lower bound for the alphabet size thus far is k + 1, independent of h.
Code construction

We use dual constraints to specify the code. There are k/r + 1 local groups.

[Parity-check matrix: columns are indexed by the symbols x_1, …, x_r | … | x_{k−r+1}, …, x_k of the local groups. Each light-parity row L_1, …, L_{k/r} has 1s on the columns of its group. The heavy-parity rows H_1, H_2, …, H_h carry the entries α_ij, α_ij^2, …, α_ij^{2^{h−1}}, respectively.]

Element α_ij appears in the j-th column of the i-th group.
We consider a sequence of field extensions F_2 ⊆ F_2^a ⊆ F_2^b.
{ξ_j} ⊆ F_2^a form a basis over F_2.
{λ_i} ⊆ F_2^b are h-independent over F_2^a.
α_ij = ξ_j · λ_i.
Erasure correction

Set k = 8, r = 4, h = 3.

[Parity-check matrix: rows L1 and L2 are the all-ones constraints on the groups {x1, …, x4} and {x5, …, x8}; rows H1, H2, H3 carry the entries α_ij, α_ij^2, α_ij^4.]

Consider a regular failure pattern with two failures in each of the first two groups and one failure in the third group. Adding the failed columns within each group (the light parities account for one failure per group), correctability reduces to the nonsingularity of the matrix with rows

(α11 + α12)      (α21 + α22)      α31
(α11 + α12)^2    (α21 + α22)^2    α31^2
(α11 + α12)^4    (α21 + α22)^4    α31^4

This Moore-type matrix is nonsingular if and only if

α11 + α12 = (ξ1 + ξ2)·λ1,   α21 + α22 = (ξ1 + ξ2)·λ2,   α31 = ξ1·λ3

are linearly independent over F_2, which holds because the {λ_i} are h-independent over F_2^a.
Looking forward
• Codes with locality allow super-fast recovery of individual message coordinates from corrupted codewords.
• Such codes are used to provide reliability in distributed storage and have many applications in theoretical computer science.
• Many questions regarding these codes remain wide open:
  – e = Ω(n):
    • r = 3:  k^2 ≤ n(k) ≤ 2^{2^{√log k}}.
    • r = k^{o(1)}:  k ≤ n(k) ≤ ω(k).
  – e = 1: n(k) is well understood. Maximally recoverable codes?
  – e = O(1): Tight bounds for n(k)?