Erasure Correcting Codes for Highly Available Storage

Thomas Schwarz, S.J.
Error Control Codes
Use redundancy to correct errors
Designed for:
• Ease of encoding
• Ease of decoding (calculation of the syndrome, location of the error)
• Error correction power (burst errors, low redundancy)
Error Control Codes
Block Codes:
Information Symbols + Parity Symbols
(i1 i2 i3 i4 i5 i6 i7 i8 p1 p2 p3)
Error Control Codes
Typical Applications:
Communication:
Deep Space “A match made in heaven”
Telephone
Computer Networks
Streaming Audio, Video (CD, DVD)
Storage (Main Memory, Magnetic & Optical Devices)
Error Correcting Codes
Most applications use hardware-implemented encoding and decoding.
Erasure Correcting Codes
Protect against erasure of data.
Simplest Erasure Correcting Code: Parity
i1 i2 i3 i4 i5 i6 i7 i8 p, where $p = i_1 \oplus i_2 \oplus i_3 \oplus i_4 \oplus i_5 \oplus i_6 \oplus i_7 \oplus i_8$
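A minimal Python sketch of single parity, assuming symbols are plain integers (bits here; bytes or machine words XOR the same way). The helper names `parity` and `recover` are hypothetical. Because the position of an erasure is known, the lost symbol is simply the XOR of all survivors, parity included.

```python
from functools import reduce
from operator import xor


def parity(symbols):
    """Return i1 XOR i2 XOR ... XOR in."""
    return reduce(xor, symbols, 0)


def recover(word):
    """Restore the single erased symbol (marked None) of a parity code word."""
    missing = [k for k, s in enumerate(word) if s is None]
    if len(missing) != 1:
        raise ValueError("simple parity corrects exactly one erasure")
    restored = list(word)
    # The XOR over all symbols, p included, is 0, so the lost one is the XOR of the rest.
    restored[missing[0]] = parity(s for s in word if s is not None)
    return restored


data = [0, 1, 1, 0, 1, 0, 0, 1]   # i1 .. i8 (example bits)
word = data + [parity(data)]       # i1 .. i8 p
damaged = word.copy()
damaged[2] = None                  # erase i3
print(recover(damaged) == word)    # True
```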
Erasure Correcting Codes
Some applications implement encoding and decoding in hardware (e.g. RAIDs).
Software implementation is much more feasible because the decoding problem is simpler: the positions of the lost symbols are known.
Erasure Correcting Codes
Ideal Properties:
• Systematic: Data is stored explicitly. Updating a data symbol does not change other data symbols.
• MDS (maximum distance separable): Only as much parity data is created as is necessary to reconstruct the maximum number of failures.
• Simple encoding and decoding.
Parity Based Codes
Use only the parity of data (XOR operation), for ease of encoding and decoding.
Parity Based Codes
History: Protection for Multitrack Magnetic Recording.
Prusinkiewicz & Budkowski 1976:
Parity 1  XXXXXXXXXX
Data 1    XXXXXXXXXX
Data 2    XXXXXXXXXX
Data 3    XXXXXXXXXX
Parity 2  XXXXXXXXXX
Horizontal and diagonal parity.
Parity Based Codes
Extend the scheme by using lines of different slopes.
Patel 1985: horizontal + 2 diagonals (slopes 0,1,-1)
However, the code is optimal only if the data band is infinite.
If not, there is (slightly) more parity than necessary.
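To see why a finite band costs extra parity, here is a small Python sketch under an assumed layout: d data tracks of equal length L, with the slope-1 diagonal through bit k of track t indexed k + t. The function name `band_parities` is hypothetical. The band needs L horizontal parity bits but L + d - 1 diagonal parity bits; a slope -1 diagonal overhangs the band edges in the same way.

```python
from functools import reduce
from operator import xor


def band_parities(tracks):
    """Horizontal (slope 0) and diagonal (slope 1) parity of a finite band.

    tracks[t][k] is bit k of data track t. Bit k of track t lies on
    horizontal line k and on diagonal line k + t.
    """
    d, length = len(tracks), len(tracks[0])
    horizontal = [0] * length
    diagonal = [0] * (length + d - 1)   # diagonal indices run 0 .. length + d - 2
    for t, track in enumerate(tracks):
        for k, bit in enumerate(track):
            horizontal[k] ^= bit
            diagonal[k + t] ^= bit
    return horizontal, diagonal


tracks = [[1, 0, 1, 1, 0, 0, 1, 0],      # three data tracks of length 8 (example bits)
          [0, 1, 1, 0, 1, 0, 0, 1],
          [1, 1, 0, 0, 0, 1, 1, 0]]
h, g = band_parities(tracks)
print(len(h), len(g))   # 8 10: the diagonal parity needs d - 1 = 2 extra bits
```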
Parity Based Array Codes
Idea: Break up data into m symbols. Arrange
the symbols in columns. Use horizontal and
vertical lines to calculate parity.
0
1

0

1
0
0
0
1
0
0
1
0
0
0
1
0
0
1
0
1
1

0
0

0
1st column: horizontal parity, 2nd column: vertical parity
Parity Based Array Codes
But it is not so simple!
[Figure: a non-zero bit array whose 1s lie only in columns 1 and 3]
is a legitimate code word.
Parity Based Array Codes
0
0

0

0
?
?
?
?
0
0
0
0
?
?
?
?
0
0
0
0
0

0
0

0
But indistinguishable from the zero code word after
failure of columns 1 and 3.
Parity Based Array Codes
The number of data columns needs to be prime.
EvenOdd
• A better version of array codes with two parity columns
• Code words are two-dimensional (m-1) by m arrays with two additional parity columns
EvenOdd
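The EvenOdd code has as code words the (m-1) by (m+2) arrays of symbols $a_{i,j}$ ($0 \le i \le m-2$, $0 \le j \le m+1$) such that
$$a_{i,m} = \bigoplus_{t=0}^{m-1} a_{i,t}$$
$$a_{i,m+1} = S \oplus \bigoplus_{t=0}^{m-1} a_{\langle i-t \rangle_m,\,t}$$
with
$$S = \bigoplus_{t=1}^{m-1} a_{m-1-t,\,t} = \bigoplus_{i=0}^{m-2} a_{i,m} \oplus \bigoplus_{i=0}^{m-2} a_{i,m+1}.$$
Here $\langle x \rangle_m$ is x mod m, and row m-1 is an imaginary row of zeros: $a_{m-1,t} = 0$.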
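The definition translates directly into an encoder. A minimal Python sketch, assuming bit symbols and 0-based indices as in the formulas; the function name `evenodd_encode` is hypothetical. It is run on the m = 5 data array from the encoding example.

```python
def evenodd_encode(data, m):
    """Return (code word, S) for an (m-1) x m array of data bits.

    Row i of the code word is data[i] followed by a_{i,m} and a_{i,m+1}.
    Row m-1 is the imaginary all-zero row; <x>_m is computed as x % m.
    """
    if len(data) != m - 1 or any(len(row) != m for row in data):
        raise ValueError("data must be an (m-1) x m array")

    def a(i, t):
        return 0 if i == m - 1 else data[i][t]

    S = 0
    for t in range(1, m):                   # S = XOR of a_{m-1-t, t}
        S ^= a(m - 1 - t, t)

    code = []
    for i in range(m - 1):
        horizontal, diagonal = 0, S
        for t in range(m):
            horizontal ^= a(i, t)           # builds a_{i,m}
            diagonal ^= a((i - t) % m, t)   # builds a_{i,m+1}
        code.append(list(data[i]) + [horizontal, diagonal])
    return code, S


data = [[0, 1, 1, 0, 0],
        [1, 1, 0, 0, 1],
        [0, 1, 1, 1, 0],
        [0, 0, 0, 1, 1]]
code, S = evenodd_encode(data, 5)
print(S)                           # 1
print([row[5:] for row in code])   # [[0, 1], [1, 0], [1, 0], [0, 0]]
```

The last check in the test confirms the second expression for S: the XOR of both parity columns equals S.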
EvenOdd Encoding
Set m=5. Start with an arbitrary 4 by 5 data array.
0
1

0

0
1
1
1
0
1
0
1
0
0
0
1
1
0
1
0
1






EvenOdd Encoding
Fill in the horizontal parity lines:
0
1

0

0
1
1
1
0
1
0
1
0
0
0
1
1
0
1
0
1
and calculate S to be a3,1+a2,2+a1,3+a0,4
S=0+1+0+0 = 1.
0
1
1
0






EvenOdd Encoding
0 1 1 0 0
1 1 0 0 1

0 1 1 1 0

0
0
0
1
1

o o o o o
S  0 1 0  0
0 1

1 0
1 0

0 0
o o 
EvenOdd Decoding
Assume that the last two data columns have failed.
0
1

0

0
o
1 1 ? ? 0 1

1 0 ? ? 1 0
1 1 ? ? 1 0

0 0 ? ? 0 0
o o o o o o 
EvenOdd Decoding
0 1 1 ? ? 0 1 
1 1 0 ? ? 1 0 


0 1 1 ? ? 1 0 


0 0 0 ? ? 0 0 
 o o o o o o o 
S  0 11 0 1 0  0  0  1
Use the parity columns to calculate S.
EvenOdd Decoding
0
1

0

0
o
1 1 ? ? 0 1

1 0 ? ? 1 0
1 1 ? ? 1 0

0 0 ? 1 0 0
o o o o o o 
Use S=1 and the magenta diagonal to find the data symbol in
the last column.
EvenOdd Decoding
0
1

0

0
o
1 1 ? ? 0 1

1 0 ? ? 1 0
1 1 ? ? 1 0

0 0 1 1 0 0
o o o o o o 
Then use the horizontal parity for one more symbol.
EvenOdd Decoding
0
1

0

0

o
1 1 ? ? 0 1

1 0 ? ? 1 0
1 1 ? 0 1 0

0 0 1 1 0 0
o o o o o o 
The blue diagonal now can be exploited.
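The decoding steps above can be sketched in Python as a peeling decoder: compute S from the parity columns, then repeatedly solve any diagonal or horizontal line that has exactly one unknown symbol. This is a sketch under assumptions (bit symbols, both parity columns intact, hypothetical function name `evenodd_decode`), not a reference EvenOdd decoder. The function erases columns i and j itself, so the example passes in the intact m = 5 code word from the slides.

```python
def evenodd_decode(code, m, i, j):
    """Recover two erased data columns i != j of an EvenOdd code word.

    code is a list of m-1 rows of m+2 bits; both parity columns must survive.
    Lines with a single unknown are solved repeatedly (a peeling decoder).
    """
    code = [list(row) for row in code]
    rows = m - 1
    S = 0
    for r in range(rows):                    # S = XOR of both parity columns
        S ^= code[r][m] ^ code[r][m + 1]
    unknown = {(r, c) for r in range(rows) for c in (i, j)}
    for r, c in unknown:
        code[r][c] = None                    # simulate the failed columns

    def solve(cells, target):
        """If exactly one cell of a line is unknown, set it so the line XORs to target."""
        real = [(r, t) for r, t in cells if r != m - 1]   # imaginary row m-1 is 0
        missing = [cell for cell in real if cell in unknown]
        if len(missing) != 1:
            return False
        value = target
        for r, t in real:
            if (r, t) not in unknown:
                value ^= code[r][t]
        r, t = missing[0]
        code[r][t] = value
        unknown.discard((r, t))
        return True

    progress = True
    while unknown and progress:
        progress = False
        for diag in range(m):                # XOR of a_{<diag-t>,t} = S ^ a_{diag,m+1}
            target = S ^ (code[diag][m + 1] if diag < rows else 0)
            progress |= solve([((diag - t) % m, t) for t in range(m)], target)
        for r in range(rows):                # XOR of a_{r,t} = a_{r,m}
            progress |= solve([(r, t) for t in range(m)], code[r][m])
    if unknown:
        raise ValueError("the erased columns could not be recovered")
    return code


codeword = [[0, 1, 1, 0, 0, 0, 1],
            [1, 1, 0, 0, 1, 1, 0],
            [0, 1, 1, 1, 0, 1, 0],
            [0, 0, 0, 1, 1, 0, 0]]
print(evenodd_decode(codeword, 5, 3, 4) == codeword)   # True
```

Solving every single-unknown line whenever one appears never blocks a later step, so this generic loop recovers at least as much as the hand-picked diagonal/row order on the slides.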
EvenOdd
EvenOdd requires m to be a prime.
Hence, for a given number n of data columns, choose m to be the smallest prime ≥ n.
Set the superfluous data columns to zero:
d0 d1 d2 d3 0 p0 p1
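A Python sketch of the padding rule and of why primality matters. Assumptions: bit symbols, the EvenOdd parity formulas given earlier, and hypothetical helper names. `hidden_codeword` brute-forces small m for a non-zero data array living in only two data columns whose parity columns are all zero; such an array is indistinguishable from the zero code word once those two columns fail. One exists for m = 4 (for instance, 1s in rows 0 and 1 of columns 0 and 2), but none exists for the prime m = 5.

```python
from itertools import product


def is_prime(k):
    return k >= 2 and all(k % q for q in range(2, int(k ** 0.5) + 1))


def choose_m(n):
    """Smallest prime m >= n."""
    m = max(n, 2)
    while not is_prime(m):
        m += 1
    return m


def pad(data, m):
    """Append superfluous all-zero data columns so each row has m symbols."""
    return [list(row) + [0] * (m - len(row)) for row in data]


def parity_columns(data, m):
    """EvenOdd parity pairs (a_{i,m}, a_{i,m+1}); row m-1 is the imaginary zero row."""
    def a(i, t):
        return 0 if i == m - 1 else data[i][t]
    S = 0
    for t in range(1, m):
        S ^= a(m - 1 - t, t)
    pairs = []
    for i in range(m - 1):
        h, d = 0, S
        for t in range(m):
            h ^= a(i, t)
            d ^= a((i - t) % m, t)
        pairs.append((h, d))
    return pairs


def hidden_codeword(m):
    """Return (c1, c2, data) for a non-zero array supported on data columns
    c1, c2 with all-zero parity, or None if no such array exists."""
    for c1 in range(m):
        for c2 in range(c1 + 1, m):
            for bits in product((0, 1), repeat=2 * (m - 1)):
                if not any(bits):
                    continue
                data = [[0] * m for _ in range(m - 1)]
                for r in range(m - 1):
                    data[r][c1], data[r][c2] = bits[2 * r], bits[2 * r + 1]
                if all(p == (0, 0) for p in parity_columns(data, m)):
                    return c1, c2, data
    return None


print(choose_m(4), pad([[1, 0, 1, 1]], choose_m(4)))   # 5 [[1, 0, 1, 1, 0]]
print(hidden_codeword(4) is not None, hidden_codeword(5))   # True None
```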
EvenOdd
Encoding and decoding use only XOR operations.
The given formulae suggest an iterative procedure, but the equations can easily be expanded to calculate the symbols in parallel.
8(2m  2m  1)
XOR operations
m 1
2
Higher Array Codes
There exist array codes using only XOR operations that can correct up to m erasures.
The decoding process involves solving a system of linear equations.
Algebraic Block Codes
Interpret symbols (larger than bits) as elements of a Galois Field.
Calculate parity symbols as linear combinations of the data
symbols.
Galois Fields
Only $GF(2^f)$, for simplicity’s sake.
Elements: bit strings of length f.
Addition: XOR.
Multiplication: much more complicated.
Galois Field Multiplication
For $GF(2^8)$, elements are bytes.
Method 1: Identify a byte with a binary polynomial,
e.g. $(0100\,1001) = x^6 + x^3 + 1$.
Multiply two polynomials as polynomials modulo a generator polynomial,
e.g. modulo $1\,0001\,1101 = x^8 + x^4 + x^3 + x^2 + 1$.
Galois Field Multiplication
Example: (1010 1010) · (1001 0101). Scan the second factor from the left; for each bit, shift r (multiply by x), reduce with 1 0001 1101 if bit 8 is set, and XOR in 1010 1010 if the bit is 1:
1: r = 1010 1010
0: r = 1 0101 0100 ⊕ 1 0001 1101 = 0100 1001
0: r = 1001 0010
1: r = 1 0010 0100 ⊕ 1 0001 1101 ⊕ 1010 1010 = 1001 0011
0: r = 1 0010 0110 ⊕ 1 0001 1101 = 0011 1011
1: r = 0111 0110 ⊕ 1010 1010 = 1101 1100
0: r = 1 1011 1000 ⊕ 1 0001 1101 = 1010 0101
1: r = 1 0100 1010 ⊕ 1 0001 1101 ⊕ 1010 1010 = 1111 1101
Combination of XORs and shifts!
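A Python sketch of this shift-and-XOR multiplication in $GF(2^8)$, assuming the generator polynomial 1 0001 1101 given above; `gf_mul` is a hypothetical name. It scans the second factor from its leftmost bit, doubling the running result (a shift, followed by an XOR with the generator polynomial on overflow) and adding the first factor whenever the bit is 1.

```python
POLY = 0b1_0001_1101   # generator polynomial x^8 + x^4 + x^3 + x^2 + 1


def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) using only shifts and XORs."""
    r = 0
    for bit in range(7, -1, -1):   # leftmost bit of b first
        r <<= 1                    # r * x
        if r & 0x100:              # degree-8 term: reduce modulo POLY
            r ^= POLY
        if (b >> bit) & 1:
            r ^= a                 # add a
    return r


print(format(gf_mul(0b10101010, 0b10010101), "08b"))   # 11111101
```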
Galois Field Multiplication
This multiplication gives $GF(2^f)$ a field structure.
The multiplicative group is cyclic: there are elements α such that all non-zero elements can be written as $\alpha^i$, $i = 0, 1, \ldots, 2^f - 2$.
Galois Field Multiplication
For each non-zero element $x \in GF(2^f)$ define $\log(x) = i$ iff $\alpha^i = x$.
Define $\mathrm{antilog}(i) = \alpha^i$.
Calculate
$xy = \mathrm{antilog}(\log(x) + \log(y))$ if $x \neq 0 \neq y$,
$xy = 0$ if $x = 0$ or $y = 0$.
Galois Field Multiplication
Can be implemented with two tables, using
• two zero comparisons,
• four additions,
• three memory accesses:
9 elementary operations on a processor with sufficient L1 cache to store the $3(2^f - 1)$ table entries.
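A Python sketch of the table method for $GF(2^8)$, assuming the generator polynomial 1 0001 1101 and α = x (0x02) as the generating element; the names are hypothetical. Doubling the antilog table is an assumed implementation choice that avoids reducing log(x) + log(y) modulo 255; with it, the two tables hold about the $3(2^f - 1)$ entries mentioned above.

```python
POLY = 0x11D   # generator polynomial x^8 + x^4 + x^3 + x^2 + 1


def build_tables():
    """log / antilog tables for GF(2^8) with alpha = x (0x02)."""
    log = [0] * 256                # log[0] is never used
    antilog = [0] * (2 * 255)      # doubled: log(x) + log(y) needs no "mod 255"
    value = 1
    for i in range(255):
        antilog[i] = antilog[i + 255] = value
        log[value] = i
        value <<= 1                # multiply by alpha = x
        if value & 0x100:
            value ^= POLY
    return log, antilog


LOG, ANTILOG = build_tables()


def gf_mul(x, y):
    """xy = antilog(log(x) + log(y)), or 0 if x = 0 or y = 0."""
    if x == 0 or y == 0:           # the two zero comparisons
        return 0
    return ANTILOG[LOG[x] + LOG[y]]


print(hex(gf_mul(0b10101010, 0b10010101)))   # 0xfd
```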
Linear Erasure Correcting Block Codes
m data symbols $u = (u_0, u_1, u_2, \ldots, u_{m-1})$
[Figure: data words u, u’, u’’, u’’’, … stored across Buckets 0 to 3, one symbol of each word per bucket; u’’ is marked as a code word]
Linear Erasure Correcting Block Codes
Add k = n − m parity symbols to form the code word a.
[Figure: the data Buckets 0 to 3 extended by Parity Buckets 0 to k−1, which hold the parity symbols p0 … pk−1 of each data word]
Linear Erasure Correcting Block Codes
Calculate the parity symbols as a linear combination of the data symbols:
$$(p_0, p_1, \ldots, p_{k-1}) = (u_0, u_1, \ldots, u_{m-1})\,G'$$
$$a = (u_0, u_1, \ldots, u_{m-1}, p_0, p_1, \ldots, p_{k-1}) = (u_0, u_1, \ldots, u_{m-1})\,G$$
with “Generator Matrix” G.
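A minimal Python sketch of encoding $a = uG$ over $GF(2^8)$, assuming byte symbols and the generator polynomial 1 0001 1101. The 3 by 5 systematic generator below, whose part $G'$ has rows (1, 1), (1, 2), (1, 4), is a hypothetical example, not one from the slides.

```python
POLY = 0x11D   # generator polynomial x^8 + x^4 + x^3 + x^2 + 1


def gf_mul(a, b):
    """Shift-and-XOR multiplication in GF(2^8)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= POLY
    return r


def encode(u, G):
    """Code word a = u G over GF(2^8); u has m symbols, G is an m x n matrix."""
    a = [0] * len(G[0])
    for i, ui in enumerate(u):
        for j, g in enumerate(G[i]):
            a[j] ^= gf_mul(ui, g)   # addition in GF(2^8) is XOR
    return a


G = [[1, 0, 0, 1, 1],   # hypothetical G = (I | G') with m = 3, k = 2
     [0, 1, 0, 1, 2],
     [0, 0, 1, 1, 4]]
u = [0x12, 0x34, 0x56]
print([hex(s) for s in encode(u, G)])   # ['0x12', '0x34', '0x56', '0x70', '0x3f']
```

Because the first parity column of this G consists of ones, the first parity symbol is the plain XOR of the data symbols.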
Properties of a Good Generator Matrix
• Systematic: the left m by m submatrix is the identity matrix.
• MDS: all matrices formed from m different columns of G are invertible.
Thus: any m coordinates of a code word a suffice to calculate the data word u.
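For small matrices the MDS property can be checked by brute force. A Python sketch over $GF(2^8)$ (generator polynomial 1 0001 1101); `is_mds` and the two example matrices are hypothetical. It tests every choice of m columns with Gaussian elimination, so it is only practical for small n.

```python
from itertools import combinations

POLY = 0x11D   # generator polynomial x^8 + x^4 + x^3 + x^2 + 1


def gf_mul(a, b):
    """Shift-and-XOR multiplication in GF(2^8)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= POLY
    return r


def gf_inv(a):
    """a^254 is the inverse of a non-zero a, since a^255 = 1."""
    r = 1
    for _ in range(254):
        r = gf_mul(r, a)
    return r


def is_invertible(M):
    """Gaussian elimination over GF(2^8); False if some column has no pivot."""
    A = [list(row) for row in M]
    n = len(A)
    for col in range(n):
        pivot = next((r for r in range(col, n) if A[r][col]), None)
        if pivot is None:
            return False
        A[col], A[pivot] = A[pivot], A[col]
        inv = gf_inv(A[col][col])
        for r in range(col + 1, n):
            if A[r][col]:
                f = gf_mul(A[r][col], inv)
                A[r] = [x ^ gf_mul(f, y) for x, y in zip(A[r], A[col])]
    return True


def is_mds(G):
    """True if every matrix formed from m different columns of G is invertible."""
    m, n = len(G), len(G[0])
    return all(is_invertible([[G[r][c] for c in cols] for r in range(m)])
               for cols in combinations(range(n), m))


good = [[1, 0, 0, 1, 1],
        [0, 1, 0, 1, 2],
        [0, 0, 1, 1, 4]]
bad = [[1, 0, 0, 1, 1],
       [0, 1, 0, 1, 1],
       [0, 0, 1, 1, 1]]          # parity columns 3 and 4 are equal
print(is_mds(good), is_mds(bad))   # True False
```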
Generation of Generator Matrices
• Find the largest rectangular m by n matrix with the MDS property.
• Multiply it from the left with the inverse of the matrix A formed by its first m columns.
The result is still MDS (every m by m submatrix M becomes $A^{-1}M$, which is still invertible) and now systematic:
$$A^{-1}\,(A \mid B) = (I_m \mid A^{-1}B)$$
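A Python sketch of this step over $GF(2^8)$ (generator polynomial 1 0001 1101). As the MDS starting matrix it uses a 3 by 6 Vandermonde matrix $V_{i,j} = a_j^i$ with the hypothetical points 1 to 6; any m columns of such a matrix with distinct points are invertible. All helper names are hypothetical.

```python
POLY = 0x11D   # generator polynomial x^8 + x^4 + x^3 + x^2 + 1


def gf_mul(a, b):
    """Shift-and-XOR multiplication in GF(2^8)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= POLY
    return r


def gf_pow(a, e):
    r = 1
    for _ in range(e):
        r = gf_mul(r, a)
    return r


def gf_inv(a):
    return gf_pow(a, 254)          # a^255 = 1 for non-zero a


def mat_mul(A, B):
    """Matrix product over GF(2^8)."""
    C = [[0] * len(B[0]) for _ in A]
    for i, row in enumerate(A):
        for k, x in enumerate(row):
            for j, y in enumerate(B[k]):
                C[i][j] ^= gf_mul(x, y)
    return C


def mat_inv(M):
    """Gauss-Jordan inversion over GF(2^8)."""
    n = len(M)
    A = [list(row) + [int(i == j) for j in range(n)] for i, row in enumerate(M)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if A[r][col]), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        A[col], A[pivot] = A[pivot], A[col]
        inv = gf_inv(A[col][col])
        A[col] = [gf_mul(inv, x) for x in A[col]]
        for r in range(n):
            if r != col and A[r][col]:
                f = A[r][col]
                A[r] = [x ^ gf_mul(f, y) for x, y in zip(A[r], A[col])]
    return [row[n:] for row in A]


def systematic(G):
    """Multiply G from the left with the inverse of its first m columns."""
    m = len(G)
    return mat_mul(mat_inv([row[:m] for row in G]), G)


points = [1, 2, 3, 4, 5, 6]                               # hypothetical distinct a_j
V = [[gf_pow(p, i) for p in points] for i in range(3)]    # V[i][j] = a_j^i
G = systematic(V)
print([row[:3] for row in G])   # [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
```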
Large MDS Matrices
There are known families of matrices with the MDS property:
• Cauchy: $m + n = 2^f$
• Vandermonde: $n = 2^f - 1$
• Twice extended Vandermonde: $n = 2^f + 1$
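Vandermonde Matrix
$$W = \begin{pmatrix} a_0^0 & a_1^0 & a_2^0 & \cdots & a_{m-1}^0 \\ a_0^1 & a_1^1 & a_2^1 & \cdots & a_{m-1}^1 \\ a_0^2 & a_1^2 & a_2^2 & \cdots & a_{m-1}^2 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ a_0^{m-1} & a_1^{m-1} & a_2^{m-1} & \cdots & a_{m-1}^{m-1} \end{pmatrix}$$
W is invertible whenever the $a_j$ are pairwise distinct.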
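Vandermonde Generator Matrix
$$V = \begin{pmatrix} a_0^0 & a_1^0 & a_2^0 & \cdots & a_{n-1}^0 \\ a_0^1 & a_1^1 & a_2^1 & \cdots & a_{n-1}^1 \\ a_0^2 & a_1^2 & a_2^2 & \cdots & a_{n-1}^2 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ a_0^{m-1} & a_1^{m-1} & a_2^{m-1} & \cdots & a_{n-1}^{m-1} \end{pmatrix}$$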
Vandermonde Generator Matrix
• Write column m as a linear combination of the first m columns.
• Multiply column i (i = 0, 1, …, m − 1) by its coefficient, which is non-zero according to Cramer’s rule. (This preserves MDS.)
• Multiply from the left with $A^{-1}$, where A is the matrix consisting of columns 0 to m − 1 of the scaled matrix. Column m then consists of ones.
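The three steps above, sketched in Python under assumptions: byte symbols in $GF(2^8)$ with generator polynomial 1 0001 1101, an m by n Vandermonde matrix $V_{i,j} = a_j^i$ built from hypothetical distinct points, and hypothetical helper names. The result is systematic and its column m (the first parity column) consists of ones, so the first parity symbol is the plain XOR of the data symbols.

```python
POLY = 0x11D   # generator polynomial x^8 + x^4 + x^3 + x^2 + 1


def gf_mul(a, b):
    """Shift-and-XOR multiplication in GF(2^8)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= POLY
    return r


def gf_pow(a, e):
    r = 1
    for _ in range(e):
        r = gf_mul(r, a)
    return r


def mat_mul(A, B):
    """Matrix product over GF(2^8)."""
    C = [[0] * len(B[0]) for _ in A]
    for i, row in enumerate(A):
        for k, x in enumerate(row):
            for j, y in enumerate(B[k]):
                C[i][j] ^= gf_mul(x, y)
    return C


def mat_inv(M):
    """Gauss-Jordan inversion over GF(2^8)."""
    n = len(M)
    A = [list(row) + [int(i == j) for j in range(n)] for i, row in enumerate(M)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if A[r][col]), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        A[col], A[pivot] = A[pivot], A[col]
        inv = gf_pow(A[col][col], 254)       # inverse, since a^255 = 1
        A[col] = [gf_mul(inv, x) for x in A[col]]
        for r in range(n):
            if r != col and A[r][col]:
                f = A[r][col]
                A[r] = [x ^ gf_mul(f, y) for x, y in zip(A[r], A[col])]
    return [row[n:] for row in A]


def vandermonde_generator(points, m):
    """Systematic generator from V[i][j] = a_j^i whose column m consists of ones."""
    V = [[gf_pow(p, i) for p in points] for i in range(m)]
    A = [row[:m] for row in V]
    column_m = [[row[m]] for row in V]
    # Step 1: column m = sum of c_i * column i, i.e. solve A c = column m.
    c = [r[0] for r in mat_mul(mat_inv(A), column_m)]
    if not all(c):
        raise ValueError("zero coefficient: V is not MDS")
    # Step 2: multiply column i (i < m) with c_i; this keeps the MDS property.
    scaled = [[gf_mul(x, c[j]) if j < m else x for j, x in enumerate(row)]
              for row in V]
    # Step 3: multiply from the left with the inverse of the first m columns.
    return mat_mul(mat_inv([row[:m] for row in scaled]), scaled)


G = vandermonde_generator([1, 2, 3, 4, 5, 6], 3)   # hypothetical points
print([row[:3] for row in G])   # [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print([row[3] for row in G])    # [1, 1, 1]
```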
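Vandermonde Generator Matrix
$$G = \begin{pmatrix} 1 & 0 & \cdots & 0 & 1 & * & \cdots & * \\ 0 & 1 & \cdots & 0 & 1 & * & \cdots & * \\ \vdots & & \ddots & \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 1 & 1 & * & \cdots & * \end{pmatrix}$$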
RS Erasure Correcting Codes
• The generator matrix is that of a twice extended, generalized Reed-Solomon code.
• Large number of parity symbols: if symbols are bytes, then the code length can be up to 257.
RS Erasure Correcting Codes
Encoding:
Generation of a parity symbol costs
• m multiplications with known coefficients,
• m − 1 XOR operations,
• 7m − 1 elementary operations in total.
RS Erasure Correcting Codes
Change of one data symbol $u_i$ in a data word:
• Calculate the difference $d = u_i^{new} - u_i^{old}$ (an XOR in $GF(2^f)$).
• Send d to the site maintaining the parity symbol.
• Multiply d with the coefficient $g_{i,l}$ of G.
• Add the product to the existing parity.
Cost: 7 elementary operations per parity site, 1 elementary operation at the data site, 1 message.
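A Python sketch of this update protocol, assuming byte symbols in $GF(2^8)$ (generator polynomial 1 0001 1101) and a hypothetical 3 by 2 coefficient matrix $G'$. The data site computes d; each parity site multiplies it with its coefficient $g_{i,l}$ and XORs the product into its stored parity. Because the code is linear, the result matches re-encoding from scratch. The function names are hypothetical.

```python
POLY = 0x11D   # generator polynomial x^8 + x^4 + x^3 + x^2 + 1


def gf_mul(a, b):
    """Shift-and-XOR multiplication in GF(2^8)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= POLY
    return r


def encode_parity(u, Gp):
    """Parity symbols (p_0, ..., p_{k-1}) = u G'."""
    p = [0] * len(Gp[0])
    for i, ui in enumerate(u):
        for j, g in enumerate(Gp[i]):
            p[j] ^= gf_mul(ui, g)
    return p


def data_site_delta(old, new):
    """d = u_i(new) - u_i(old); subtraction in GF(2^8) is XOR."""
    return new ^ old


def parity_site_update(p_l, g_il, d):
    """Multiply d with the coefficient g_{i,l} and add it to the stored parity."""
    return p_l ^ gf_mul(g_il, d)


Gp = [[1, 1], [1, 2], [1, 4]]    # hypothetical coefficients g_{i,l}
u = [0x12, 0x34, 0x56]
p = encode_parity(u, Gp)

i, new = 1, 0x99
d = data_site_delta(u[i], new)   # d is sent to every parity site
p = [parity_site_update(p[j], Gp[i][j], d) for j in range(len(p))]
u[i] = new
print(p == encode_parity(u, Gp))   # True
```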
RS Erasure Correcting Codes
Erasure Correction:
Typical cases:
• A parity site has failed: regenerate the parity from the data sites.
• A data site has failed: since column m of G consists of ones, the first parity site stores the XOR of the data symbols; regenerate the lost data from the other data sites and this XOR.
RS Erasure Correcting Codes
Erasure Correction, General Case:
• Collect m survivors among the data and parity sites.
• Invert the matrix consisting of the corresponding columns of G.
• Each replacement site uses this matrix and G in order to calculate a decoding matrix H.
RS Erasure Correcting Codes
• Send the surviving data to all replacement sites.
• Use the decoding matrix in order to regenerate the lost data or parity.
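A Python sketch of the general case, assuming byte symbols in $GF(2^8)$ (generator polynomial 1 0001 1101) and a small hypothetical systematic MDS generator with m = 3, n = 5. With $G_S$ the columns of G for the m survivors and $G_L$ those for the lost coordinates, the survivors satisfy $a_S = u\,G_S$, so a replacement site can use $H = G_S^{-1} G_L$ and compute $a_L = a_S H$. Helper names are hypothetical.

```python
from itertools import combinations

POLY = 0x11D   # generator polynomial x^8 + x^4 + x^3 + x^2 + 1


def gf_mul(a, b):
    """Shift-and-XOR multiplication in GF(2^8)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= POLY
    return r


def gf_inv(a):
    """a^254 is the inverse of a non-zero a, since a^255 = 1."""
    r = 1
    for _ in range(254):
        r = gf_mul(r, a)
    return r


def mat_mul(A, B):
    """Matrix product over GF(2^8)."""
    C = [[0] * len(B[0]) for _ in A]
    for i, row in enumerate(A):
        for k, x in enumerate(row):
            for j, y in enumerate(B[k]):
                C[i][j] ^= gf_mul(x, y)
    return C


def mat_inv(M):
    """Gauss-Jordan inversion over GF(2^8)."""
    n = len(M)
    A = [list(row) + [int(i == j) for j in range(n)] for i, row in enumerate(M)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if A[r][col]), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        A[col], A[pivot] = A[pivot], A[col]
        inv = gf_inv(A[col][col])
        A[col] = [gf_mul(inv, x) for x in A[col]]
        for r in range(n):
            if r != col and A[r][col]:
                f = A[r][col]
                A[r] = [x ^ gf_mul(f, y) for x, y in zip(A[r], A[col])]
    return [row[n:] for row in A]


def decoding_matrix(G, survivors, lost):
    """H = G_S^{-1} G_L for the columns of G of the surviving / lost sites."""
    m = len(G)
    G_S = [[G[r][c] for c in survivors] for r in range(m)]
    G_L = [[G[r][c] for c in lost] for r in range(m)]
    return mat_mul(mat_inv(G_S), G_L)


def regenerate(surviving_symbols, H):
    """What a replacement site computes: a_L = a_S H."""
    return mat_mul([list(surviving_symbols)], H)[0]


G = [[1, 0, 0, 1, 1],          # hypothetical systematic MDS generator
     [0, 1, 0, 1, 2],
     [0, 0, 1, 1, 4]]
a = mat_mul([[0x12, 0x34, 0x56]], G)[0]          # encode u = (0x12, 0x34, 0x56)

ok = True
for survivors in combinations(range(5), 3):       # every choice of m survivors
    lost = [c for c in range(5) if c not in survivors]
    H = decoding_matrix(G, survivors, lost)
    ok &= regenerate([a[c] for c in survivors], H) == [a[c] for c in lost]
print(ok)   # True
```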
Measurements
XOR Update: 0.45 sec
 EvenOdd Update: 0.48 sec
 RS Update: 1.27 sec
Record Group with 4 records of 100 bytes.
One record is changed. Measured is time to
update parity symbol.
Used 700 MHz Pentium 3 Machine.
