Coding for Modern Distributed Storage Systems II (slides)
Coding for Modern Distributed
Storage Systems: Part 2.
Locally Repairable Codes
Parikshit Gopalan
Windows Azure Storage, Microsoft.
Rate-distance-locality tradeoffs
Def: An $[n, k, d]_q$ linear code has locality $r$ if each coordinate can be expressed
as a linear combination of $r$ other coordinates.
What are the tradeoffs between $n$, $k$, $d$, $r$?
[G.-Huang-Simitci-Yekhanin'12]: In any linear code with information locality $r$,
$n \geq \frac{r+1}{r}\, k + d - 2.$
• Algorithmic proof using linear algebra.
• [Papailiopoulos-Dimakis'12] Replace rank with entropy.
• [Prakash-Lalitha-Kamath-Kumar'12] Generalized Hamming weights.
• [Barg-Tamo'13] Graph theoretic proof.
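As a quick numerical check of the bound above, here is a minimal Python sketch. It uses the commonly stated ceiling form $n \geq k + \lceil k/r \rceil + d - 2$, which agrees with the bound above whenever $r$ divides $k$. The function names and the sample parameters (length 24, dimension 15, locality 3) are assumptions chosen for illustration.

```python
import math

def min_length(k: int, d: int, r: int) -> int:
    """Smallest n allowed by the bound n >= k + ceil(k/r) + d - 2."""
    return k + math.ceil(k / r) + d - 2

def max_distance(n: int, k: int, r: int) -> int:
    """Largest d allowed by the same bound for given n, k, r."""
    return n - k - math.ceil(k / r) + 2

# Hypothetical parameters: n = 24, k = 15, locality r = 3.
print(max_distance(24, 15, 3))   # 6
# With r = k (no locality requirement) the bound reduces to the Singleton bound.
print(max_distance(24, 15, 15))  # 10
print(min_length(15, 6, 3))      # 24
```

Taking $r = k$ recovers the Singleton bound $d \leq n - k + 1$, so the extra $\lceil k/r \rceil - 1$ symbols are the price of locality.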
Generalizations
• Non-linear codes
[Papailiopoulos-Dimakis, Forbes-Yekhanin].
• Vector codes
[Papailiopoulos-Dimakis, Silberstein-Rawat-Koyluoglu-Vishwanath, Kamath-Prakash-Lalitha-Kumar]
• Codes over bounded alphabets
[Cadambe-Mazumdar]
• Codes with short local MDS codes
[Prakash-Lalitha-Kamath-Kumar, Silberstein-Rawat-Koyluoglu-Vishwanath]
Explicit codes with all-symbol locality.
[Tamo-Papailiopoulos-Dimakis'13]
• Optimal length codes with all-symbol locality for $q = \exp(n)$.
• Construction based on RS code, analysis via matroid theory.
[Silberstein-Rawat-Koyluoglu-Vishwanath'13]
• Optimal length codes with all-symbol locality for $q = 2^n$.
• Construction based on Gabidulin codes (aka linearized RS codes).
[Barg-Tamo'14]
• Optimal length codes with all-symbol locality for $q = O(n)$.
• Construction based on Reed-Solomon codes.
Stronger notions of locality
• Codes with local Regeneration
[Silberstein-Rawat-Koyluoglu-Vishwanath, Kamath-Prakash-Lalitha-Kumar…]
• Codes with short local MDS codes [Prakash-Lalitha-Kamath-Kumar,
Silberstein-Rawat-Koyluoglu-Vishwanath]
Avoids the slowest node bottleneck [Shah-Lee-Ramachandran]
• Sequential local recovery [Prakash-Lalitha-Kumar]
• Multiple disjoint local parities [Wang-Zhang, Barg-Tamo]
Can serve multiple read requests in parallel.
Problem: Consider an $[n, k]_q$ linear code where even after $d$ arbitrary failures,
every (information) symbol has locality $r$. How large does $n$ need to be?
[Barg-Tamo'14] might be a good starting point.
Tutorial on LRCs
Part 1.1: Locality
1. Locality of codeword symbols.
2. Rate-distance-locality tradeoffs: lower bounds and constructions.
Part 1.2: Reliability
1. Beyond minimum distance: Maximum recoverability.
2. Constructions of Maximally Recoverable LRCs.
Beyond minimum distance?
Is minimum distance the right measure of reliability?
Two types of failures:
• Large correlated failures
Power outage, upgrade.
Whole data center offline.
• Can assume further failures are independent.
Beyond minimum distance?
4 Racks
6 Machines per Rack
• Machines fail independently with probability $p$.
• Racks fail independently with probability $q \approx p^3$.
• Some 7-failure patterns are more likely than 5-failure patterns.
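The last claim can be checked with a few lines of arithmetic. The sketch below assumes the rack failure probability is exactly $p^3$, picks $p = 0.01$ purely for illustration, and ignores the $(1 - p)$ factors for machines that stay up.

```python
p = 0.01          # per-machine failure probability (illustrative value)
q = p ** 3        # per-rack failure probability, assumed equal to p^3

# One whole rack (6 machines) plus 1 more machine: 7 failed machines.
prob_rack_plus_one = q * p
# 5 specific machines failing independently, with no rack failure.
prob_five_machines = p ** 5

print(prob_rack_plus_one > prob_five_machines)  # True
```

Under these assumptions the 7-failure pattern has probability about $p^4$, which is larger than $p^5$ by a factor of $1/p$.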
Beyond minimum distance
4 Racks
6 Machines per Rack
Want to tolerate 1 rack failure + 3 additional machine failures.
Beyond minimum distance
• Want to tolerate 1 rack + 3 more failures (9 total).
Solution 1: Use a [24,15,10] Reed-Solomon code.
Corrects any 9 failures.
Poor locality after a single failure.
Beyond minimum distance
• Want to tolerate 1 rack + 3 more failures (9 total).
[Plank-Blaum-Hafner'13]:
Sector-Disk (SD) codes.
Solution 2: Use [24, 15, 6] LRCs derived from Gabidulin codes.
Rack failure gives an [18, 15, 4] MDS code.
Stronger guarantee than minimum distance.
Beyond minimum distance
• Want to tolerate 1 rack + 3 more failures (9 total).
[Plank-Blaum-Hafner'13]:
Partial MDS codes.
Solution 2: Use [24, 15, 6] LRCs derived from Gabidulin codes.
Rack failure gives an [18, 15, 4] MDS code.
Stronger guarantee than minimum distance.
Maximally Recoverable Codes
[Chen-Huang-Li'07, G.-Huang-Jenkins-Yekhanin'14]
Code has a topology that decides linear relations between symbols (locality).
Any erasure with sufficiently many (independent) constraints is correctible.
[G.-Huang-Jenkins-Yekhanin'14]: Let $\alpha_1, \ldots, \alpha_t$ be variables.
1. Topology is given by a parity check matrix, where each entry is a linear
function in the $\alpha_i$'s.
2. A code is specified by a choice of $\alpha_i$'s.
3. The code is Maximally Recoverable if it corrects every error pattern that its
topology permits.
• Relevant determinant is non-singular.
• There is some choice of $\alpha$'s that corrects it.
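To make the definition concrete, here is a small Python sketch on a toy topology that is not from the slides: six symbols in two local groups of three, each with a parity constraint, plus one global constraint whose six coefficients play the role of the variables, over the prime field GF(5). An erasure pattern is correctable exactly when the corresponding columns of the parity check matrix are linearly independent. The helper names `rank_mod_p`, `parity_check`, and `correctable_patterns` are hypothetical.

```python
from itertools import combinations

P = 5  # small prime field, chosen for illustration

def rank_mod_p(rows, p=P):
    """Rank of a matrix over GF(p) by Gaussian elimination."""
    m = [[v % p for v in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(rank, len(m)) if m[r][col]), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = pow(m[rank][col], -1, p)  # modular inverse (Python 3.8+)
        m[rank] = [(v * inv) % p for v in m[rank]]
        for r in range(len(m)):
            if r != rank and m[r][col]:
                f = m[r][col]
                m[r] = [(v - f * w) % p for v, w in zip(m[r], m[rank])]
        rank += 1
    return rank

def parity_check(alphas):
    """Two local parity rows plus one global row with entries alphas."""
    return [[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1], list(alphas)]

def correctable_patterns(alphas, size=3):
    """All erasure sets of the given size whose columns are independent."""
    H = parity_check(alphas)
    good = []
    for erased in combinations(range(6), size):
        cols = [[row[j] for j in erased] for row in H]
        if rank_mod_p(cols) == size:
            good.append(erased)
    return good

# Distinct coefficients inside each group: all 18 patterns that the
# topology permits (not all three erasures in one group) are corrected.
print(len(correctable_patterns([0, 1, 2, 0, 1, 2])))  # 18
# Repeating a coefficient inside a group loses 3 of those patterns.
print(len(correctable_patterns([1, 1, 2, 1, 2, 3])))  # 15
```

In this toy topology, three erasures inside one group can never be corrected for any choice of coefficients, so the first choice corrects everything the topology permits, while the second does not.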
Example 1: MDS codes
$h$ global equations:
$\sum_{i=1}^{n} \alpha_{i,j}\, x_i = 0.$
Reed-Solomon codes are Maximally Recoverable.
Example 2: LRCs (PMDS codes)
Assume $r \mid k$, $(r+1) \mid n$. Want length $n$ codes satisfying
1. Local constraints: Parity of each column is 0.
2. $h$ Global constraints: Linear constraints over all symbols.
The code is MR if puncturing one entry per column gives a $[k+h, k]_q$ MDS code.
The code is SD if puncturing any row gives a $[k+h, k]_q$ MDS code.
Known constructions require fairly large field sizes.
Example 3: Tensor Codes
Assume $r \mid k$, $(r+1) \mid n$. Want length $n$ codes satisfying
1. Column constraints: Parity of each column is 0.
2. $h$ constraints per row: Linear constraints over symbols in the row.
Problem: When is an error pattern correctible?
Tensor of Reed-Solomon with Parity is not necessarily MR.
Maximally Recoverable Codes
[Chen-Huang-Li'07, G.-Huang-Jenkins-Yekhanin'14]
Let $\alpha_1, \ldots, \alpha_t$ be variables.
1. Each entry in the parity check matrix is a linear function in the $\alpha_i$'s.
2. A code is specified by a choice of $\alpha_i$'s.
3. The code is Maximally Recoverable if it corrects every error pattern possible
given its topology.
[G.-Huang-Jenkins-Yekhanin'14] For any topology, random codes over sufficiently
large fields are MR codes.
Do we need explicit constructions?
• Verifying a given construction is good might be hard.
• Large field size is undesirable.
How encoding works
Encoding a file using an $[n, k]_q$ code $C$.
Ideally field elements are byte vectors, so $q$ is a power of $2^8$.
1. Break file into $k$ equal sized parts.
2. Treat each part as a long stream over $\mathbb{F}_q$.
3. Encode each row (of $k$ elements) using $C$, to create $n - k$ more streams.
4. Distribute them to the right nodes.
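The steps above can be sketched in Python for the smallest byte-vector field, GF(2^8). This is a minimal illustration, not the encoder of any particular system: the reduction polynomial 0x11D, the zero-padding of the last part, the function names, and the two parity rows are all assumptions.

```python
def gf_mul(a: int, b: int, poly: int = 0x11D) -> int:
    """Multiply two bytes in GF(2^8): shift-and-add, reducing modulo poly."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return result

def encode_file(data: bytes, k: int, parity_rows):
    """Return k data streams followed by len(parity_rows) parity streams."""
    # Step 1: break the file into k equal sized parts (zero-padded).
    part_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(part_len * k, b"\0")
    streams = [padded[i * part_len:(i + 1) * part_len] for i in range(k)]
    # Steps 2-3: position t of every stream forms one row of k field
    # elements; each parity symbol is a linear combination of that row.
    parities = []
    for coeffs in parity_rows:
        out = bytearray(part_len)
        for c, stream in zip(coeffs, streams):
            for t, byte in enumerate(stream):
                out[t] ^= gf_mul(c, byte)
        parities.append(bytes(out))
    # Step 4 (distribution to nodes) is outside the scope of this sketch.
    return streams + parities

# Hypothetical [5, 3] code: two parity rows over GF(2^8).
PARITY_ROWS = [[1, 1, 1], [1, 2, 4]]
streams = encode_file(b"abcdefghij", 3, PARITY_ROWS)
print(len(streams), [len(s) for s in streams])  # 5 [4, 4, 4, 4, 4]
```

The first parity row is all ones, so that stream is simply the XOR of the three data streams; the second row needs genuine field multiplications.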
How encoding works
Step 3 requires finite field arithmetic over $\mathbb{F}_q$.
• Can use log tables up to $2^{24}$ (a few Gb).
• Speed up via specialized CPU instructions.
• Beyond that, matrix vector multiplication (dimension = bit-length).
Field size matters even at encoding time.
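The log-table approach can be sketched for GF(2^8), where the tables are tiny; the same idea scales to larger fields at the cost of table memory. The sketch assumes the reduction polynomial 0x11D with 2 as a generator of the multiplicative group (a common choice, not one stated in the slides), and the names `EXP`, `LOG`, and `gf_mul_table` are hypothetical.

```python
POLY = 0x11D          # x^8 + x^4 + x^3 + x^2 + 1 (assumed reduction polynomial)

EXP = [0] * 510       # EXP[i] = 2^i in GF(2^8), doubled to avoid a modulo
LOG = [0] * 256       # LOG[x] = i such that 2^i = x (LOG[0] is unused)

x = 1
for i in range(255):
    EXP[i] = x
    LOG[x] = i
    x <<= 1           # multiply by the generator 2
    if x & 0x100:
        x ^= POLY     # reduce modulo the field polynomial
for i in range(255, 510):
    EXP[i] = EXP[i - 255]

def gf_mul_table(a: int, b: int) -> int:
    """Multiply in GF(2^8) with two table lookups and one addition."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]

print(gf_mul_table(3, 7))    # 9
print(gf_mul_table(2, 128))  # 29
```

Each multiplication becomes an integer addition of logarithms, which is why table size, and hence field size, directly affects encoding cost.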
How decoding works
Decoding from erasures = solving a linear system of equations.
• Whether an erasure pattern is correctible can be deduced from the generator
matrix.
• If correctible, each missing stream is a linear combination of the available
streams.
Random codes are as "good" as explicit codes for a given field size.
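Erasure decoding as linear-system solving can be sketched as follows. To keep the arithmetic to Python's built-in modular operations, this sketch works over the prime field GF(257) instead of a byte-vector field, and uses a small Vandermonde (Reed-Solomon style) generator matrix; the field, the [5, 3] parameters, and all function names are assumptions for illustration.

```python
P = 257  # prime field, an assumption to keep the arithmetic simple

def solve_mod_p(A, b, p=P):
    """Solve the square system A x = b over GF(p); return None if singular."""
    n = len(A)
    M = [[v % p for v in row] + [rhs % p] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col]), None)
        if pivot is None:
            return None
        M[col], M[pivot] = M[pivot], M[col]
        inv = pow(M[col][col], -1, p)  # modular inverse (Python 3.8+)
        M[col] = [(v * inv) % p for v in M[col]]
        for r in range(n):
            if r != col and M[r][col]:
                f = M[r][col]
                M[r] = [(v - f * w) % p for v, w in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]

def rs_generator(n, k, p=P):
    """n x k Vandermonde generator matrix: row i is (1, x_i, x_i^2, ...)."""
    return [[pow(i + 1, j, p) for j in range(k)] for i in range(n)]

def encode(G, msg, p=P):
    """Codeword = G * msg over GF(p)."""
    return [sum(g * m for g, m in zip(row, msg)) % p for row in G]

def decode_erasures(G, received, p=P):
    """received[i] is a symbol or None (erased); returns the full codeword."""
    k = len(G[0])
    alive = [i for i, s in enumerate(received) if s is not None][:k]
    if len(alive) < k:
        raise ValueError("too many erasures")
    # For an MDS code any k surviving symbols determine the message.
    msg = solve_mod_p([G[i] for i in alive], [received[i] for i in alive], p)
    if msg is None:
        raise ValueError("erasure pattern is not correctable")
    return encode(G, msg, p)

G = rs_generator(5, 3)
codeword = encode(G, [10, 20, 30])
received = [c if i not in (1, 3) else None for i, c in enumerate(codeword)]
print(decode_erasures(G, received) == codeword)  # True
```

For a stream of rows the linear system is the same for every row, so in practice it would be solved once per erasure pattern and the resulting coefficients reused.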
Maximally Recoverable Codes
[Chen-Huang-Li'07, G.-Huang-Jenkins-Yekhanin'14]
Thm: For any topology, random codes over sufficiently large fields are MR codes.
• Large field size is undesirable.
• Is there a better analysis of the random construction?
[Kopparty-Meka'13]: Random $[k+d-1, k]_q$ codes are MDS only with
probability $\exp(-d)$ for $q \leq k^{d-1}$.
Random codes are MR with constant probability for $q = O(d \cdot n^{d})$.
Could explicit constructions require smaller field size?
Maximally Recoverable LRCs
1. Local constraints: Parity of each column is 0.
2. $h$ Global constraints.
The code is MR if puncturing one entry per
column gives a $[k+h, k]_q$ MDS code.
1. Random gives MR LRCs for $q = O(\ldots)$, SD for $q = O(k^h)$.
2. [Silberstein-Rawat-Koyluoglu-Vishwanath'13] Explicit MR LRCs with $q = 2^n$.
[G.-Huang-Jenkins-Yekhanin]
• Basic construction: Gives $q = O(k^h)$.
• Product construction: Gives $q = O(k^{(1-\epsilon)h})$ for suitable $h$, $r$.
Open Problems:
• Are there MR LRCs over fields of size $O(n)$?
• When is a tensor code MR? Explicit constructions?
• Are there natural topologies for which MR codes only exist over
exponentially large fields? Super-linear sized fields?
Thank you
• The Simons Institute, David Tse, Venkat Guruswami.
• Azure Storage + MSR: Brad Calder, Cheng Huang, Aaron Ogus, Huseyin
Simitci, Sergey Yekhanin.
• My former colleagues at MSR-Silicon Valley.