distributed storage and NC - Institute of Network Coding

BASIC Regenerating Codes for Distributed Storage Systems
Kenneth Shum
(Joint work with Minghua Chen, Hanxu Hou and Hui Li)
Windows Azure data centers
Aug 2013
kshum
http://technoblimp.com
Inside a data center
Data distribution
• Encode and distribute a data file to n storage nodes.
Data File: “INC”
Data collector
• The data collector can retrieve the whole file by downloading from any k storage nodes.
“INC” 
Three kinds of disk failures
• Transient error due to noise corruption
– repeat the disk access request
• Disk sector error
– partial failure
– detected and masked by the operating system
• Catastrophic error
– total failure, e.g. due to a disk controller fault
– the whole disk is regarded as erased
Frequency of node failures
Figure from “XORing elephants: novel erasure codes for Big Data” by Sathiamoorthy et al.: number of failed nodes over a single month in a 3000-node production cluster at Facebook.
Outline of this talk
• Repetition scheme
• Traditional erasure-correcting codes
– Reed-Solomon codes
• Network-coding-based scheme
– BASIC regenerating codes
Distributed storage system
• Encode a data file and distribute it to n disks
• (n,k) recovery property
– The data file can be rebuilt from any k disks.
• Repair
– If a node fails, we regenerate a new node by connecting to any d surviving disks and downloading data from them.
– The aim is to minimize the repair bandwidth, i.e. the total amount of data downloaded (Dimakis et al. 2007).
• A coding scheme with the above properties is called a regenerating code.
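For reference, the minimum-storage regenerating (MSR) point of Dimakis et al. relates the file size M, the storage per node α, and the repair bandwidth γ when the new node downloads β from each of d helpers. Plugging in this talk's numbers (a 2G file, k = 2, and, as an assumption, d = 3 helpers) reproduces the 1.5G figure quoted later for BASIC codes, while d = k = 2 gives back the 2G of a Reed-Solomon repair.

```latex
\alpha = \frac{M}{k}, \qquad
\beta = \frac{M}{k\,(d-k+1)}, \qquad
\gamma = d\,\beta = \frac{M\,d}{k\,(d-k+1)}

% M = 2G, k = 2, d = 3:  \alpha = 1G,\ \beta = 0.5G,\ \gamma = 1.5G
% M = 2G, k = 2, d = 2:  \alpha = 1G,\ \beta = 1G,\ \gamma = 2G
```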
Repetition scheme
• GFS: Replicate data 3 times
• Gmail: Replicate data 21 times
2x Repetition scheme
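• Divide the data file into 2 parts, A and B, of 1G each.
• Store two copies of each part on four nodes (A, B, A, B), 1G per node.
• Cannot tolerate double disk failures: if both nodes holding A (or both holding B) fail, the data collector cannot rebuild the file.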
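The double-failure weakness can be checked by brute force. A minimal sketch, assuming (hypothetically) that nodes 1 to 4 hold A, B, A, B in that order; `PLACEMENT` and `recoverable` are illustrative names. It lists every pair of simultaneous failures after which part of the file is gone.

```python
from itertools import combinations

# Hypothetical placement for the 2x repetition example: node -> stored part (1G each)
PLACEMENT = {1: "A", 2: "B", 3: "A", 4: "B"}


def recoverable(surviving):
    """The file is recoverable iff the surviving nodes still hold both A and B."""
    return {PLACEMENT[node] for node in surviving} == {"A", "B"}


# Every pair of simultaneous node failures that loses part of the file
fatal_pairs = [
    pair
    for pair in combinations(PLACEMENT, 2)
    if not recoverable(set(PLACEMENT) - set(pair))
]
print(fatal_pairs)  # [(1, 3), (2, 4)]
```

Any single failure is survivable, but 2 of the 6 possible double failures destroy the file.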
Repair is easy for a repetition-based system
• When a node fails, the new node copies the lost part (1G) from the surviving node that holds the other replica.
• Repair bandwidth = 1G
Reed-Solomon Code
• Divide the file into 2 parts, A and B.
• Store A, B, A+B and A+2B on four nodes.
• The data collector can recover A and B from any 2 nodes, so the code can tolerate double disk failures.
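Decoding from any two nodes amounts to solving a 2×2 linear system. A minimal sketch, assuming arithmetic modulo the prime 257 (the talk does not specify the field; deployed Reed-Solomon codes often use GF(2^8) instead) and treating A and B as single symbols; a real file would repeat the computation at every symbol position. `NODES`, `encode` and `decode` are illustrative names, and `pow(det, -1, P)` needs Python 3.8 or later.

```python
from itertools import combinations

P = 257  # hypothetical prime field size; the talk does not specify the field

# Node -> coefficients (cA, cB): the node stores cA*A + cB*B
NODES = {1: (1, 0), 2: (0, 1), 3: (1, 1), 4: (1, 2)}  # A, B, A+B, A+2B


def encode(a, b):
    """Return the symbol stored on each node, reduced mod P."""
    return {n: (ca * a + cb * b) % P for n, (ca, cb) in NODES.items()}


def decode(i, yi, j, yj):
    """Recover (A, B) from the symbols yi, yj stored on distinct nodes i and j."""
    (a1, b1), (a2, b2) = NODES[i], NODES[j]
    det = (a1 * b2 - a2 * b1) % P
    inv = pow(det, -1, P)  # det != 0 for every pair of nodes, so the inverse exists
    a = (b2 * yi - b1 * yj) * inv % P  # Cramer's rule
    b = (a1 * yj - a2 * yi) * inv % P
    return a, b


stored = encode(42, 200)
for i, j in combinations(NODES, 2):  # all 6 ways of keeping only two nodes
    assert decode(i, stored[i], j, stored[j]) == (42, 200)
print("any 2 of the 4 nodes recover A and B")
```

The coefficient vectors are chosen so that no two are proportional; that single condition is what makes every pair of nodes sufficient.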
Repair essentially requires decoding the whole file
• The new node downloads 1G from each of two surviving nodes (e.g. B and A+B to rebuild A) and decodes.
• Repair bandwidth = 2G
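The repair cost can be made explicit. A minimal sketch, assuming the same hypothetical mod-257 arithmetic and the A, B, A+B, A+2B layout: the new node downloads everything stored on two helper nodes, decodes A and B, and re-encodes only what the failed node held. The `repair` helper and the 1G node size constant are illustrative.

```python
P = 257  # hypothetical prime field size
NODES = {1: (1, 0), 2: (0, 1), 3: (1, 1), 4: (1, 2)}  # A, B, A+B, A+2B
NODE_SIZE_G = 1.0  # each node stores 1G


def repair(failed, helpers, stored):
    """Rebuild node `failed` from two helpers; return (content, repair bandwidth in G)."""
    i, j = helpers
    (a1, b1), (a2, b2) = NODES[i], NODES[j]
    inv = pow((a1 * b2 - a2 * b1) % P, -1, P)
    # Decode the whole file (A, B) from the two helpers ...
    a = (b2 * stored[i] - b1 * stored[j]) * inv % P
    b = (a1 * stored[j] - a2 * stored[i]) * inv % P
    # ... then re-encode only what the failed node held
    ca, cb = NODES[failed]
    bandwidth = len(helpers) * NODE_SIZE_G  # each helper sends everything it stores
    return (ca * a + cb * b) % P, bandwidth


stored = {n: (ca * 42 + cb * 200) % P for n, (ca, cb) in NODES.items()}
print(repair(1, (2, 3), stored))  # (42, 2.0)
```

Even though the new node only needs 1G of content, it must pull 2G, the size of the whole file.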
BASIC regenerating code
• Divide the data file into 4 parts of 0.5G each.
• BASIC stands for Binary Addition and Shift Implementable Convolutional.
• Utilization of bit-wise shift in storage was proposed by Piret and Krol (1983) and by Qureshi, Foh and Cai (2012).
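The acronym names the only two operations the encoder needs. A minimal sketch of these primitives, assuming a packet is modelled as a Python int whose bit i is the coefficient of z^i in a polynomial over GF(2): XOR is polynomial addition, and a bit-wise shift by s is multiplication by z^s. It illustrates the operations only, not the specific BASIC encoding, which the slide gives as a figure; `shift` and `add` are illustrative names.

```python
def shift(packet: int, s: int) -> int:
    """Bit-wise shift by s positions, i.e. multiplication by z**s over GF(2)."""
    return packet << s


def add(p: int, q: int) -> int:
    """Binary addition of two packets: bit-wise XOR (no carries)."""
    return p ^ q


a, b = 0b1011, 0b0110  # hypothetical 4-bit packets

# Shifting distributes over XOR, so shifted-and-added packets stay linear in the data
assert shift(add(a, b), 2) == add(shift(a, 2), shift(b, 2))

# Every packet is its own additive inverse: XORing it in again cancels it
assert add(add(a, b), b) == a

# A packet shifted by s bits needs up to s extra bits of storage
print(a.bit_length(), shift(a, 2).bit_length())  # 4 6
```

The last line is the source of the small storage overhead mentioned in the summary: shifted packets are slightly longer than the originals.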
Data recovery from any two nodes
• Each node stores 1G.
• The data collector rebuilds the file by downloading 1G from each of two nodes; the slides walk through all six choices: nodes 1 and 2, 1 and 3, 1 and 4, 2 and 3, 2 and 4, and 3 and 4.
Zigzag decoding
• à la Gollakota and Katabi (2008)
• Goal: solve for P1 and P2 given P1 ⊕ P2 and P1 ⊕ P2’, where P2’ is a bit-wise shifted copy of P2.
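Zigzag decoding can be sketched bit by bit. Assume P1 and P2 are equal-length bit lists and that P2’ is P2 shifted by exactly one position (the shift amount is an assumption; the slide does not state it). Bit 0 of P1 ⊕ P2’ is free of interference, so it yields the first bit of P1; that bit unlocks the first bit of P2 from P1 ⊕ P2, which in turn unlocks the next bit of P1, and so on. `shift`, `xor` and `zigzag_decode` are illustrative names.

```python
def shift(bits, s):
    """Shift a bit list by s positions by prepending s zeros (result is s bits longer)."""
    return [0] * s + list(bits)


def xor(p, q):
    """Bit-wise XOR of two bit lists, zero-padding the shorter one at the end."""
    n = max(len(p), len(q))
    p = list(p) + [0] * (n - len(p))
    q = list(q) + [0] * (n - len(q))
    return [x ^ y for x, y in zip(p, q)]


def zigzag_decode(x, y, length):
    """Recover P1, P2 from x = P1 xor P2 and y = P1 xor shift(P2, 1)."""
    p1, p2 = [0] * length, [0] * length
    for i in range(length):
        # y[i] = p1[i] xor p2[i-1]; p2[i-1] was found in the previous step (0 when i == 0)
        p1[i] = y[i] ^ (p2[i - 1] if i > 0 else 0)
        # x[i] = p1[i] xor p2[i], so p2[i] follows immediately
        p2[i] = x[i] ^ p1[i]
    return p1, p2


P1 = [1, 0, 1, 1]
P2 = [0, 1, 1, 0]
x = xor(P1, P2)            # 4 bits
y = xor(P1, shift(P2, 1))  # 5 bits: one extra bit of storage for the shift
print(zigzag_decode(x, y, len(P1)))  # ([1, 0, 1, 1], [0, 1, 1, 0])
```

Only XOR is used during decoding, and each step depends solely on the bit found in the step before it.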
Repair of BASIC regenerating code
• Surviving nodes: bit-wise shift and XOR
• New node: XOR
• Repair bandwidth = 1.5G
Repair of BASIC regenerating code (cont.)
• Decode the blue and red packets by zigzag decoding.
• Interference alignment

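Comparison of the three examples
• Storage efficiency
– Repetition scheme: 1/2; Reed-Solomon codes: 1/2; BASIC regenerating codes: 1/2
• Reliability
– Repetition scheme: tolerates one disk failure; Reed-Solomon codes: tolerate two disk failures; BASIC regenerating codes: tolerate two disk failures
• Repair bandwidth
– Repetition scheme: 1G; Reed-Solomon codes: 2G; BASIC regenerating codes: 1.5G
• Computational complexity
– Repetition scheme: very small; Reed-Solomon codes: finite field arithmetic; BASIC regenerating codes: binary addition and bit-wise shift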
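The repair-bandwidth row is just (number of helper nodes) × (amount each helper sends), and the storage-efficiency row is the 2G file divided by the 4G stored on four 1G nodes. A minimal sketch of that arithmetic; the BASIC entry assumes, consistent with the 1.5G figure, that the new node contacts all three surviving nodes and receives 0.5G from each. All constant names are illustrative.

```python
FILE_G = 2.0  # the data file: 2 parts of 1G (or 4 parts of 0.5G)
NODE_G = 1.0  # each of the 4 nodes stores 1G
N_NODES = 4

# Scheme -> (helper nodes contacted, G downloaded from each) when one node fails
REPAIR = {
    "Repetition": (1, NODE_G),    # copy the surviving replica
    "Reed-Solomon": (2, NODE_G),  # download k = 2 whole nodes and decode
    "BASIC": (3, NODE_G / 2),     # assumption: d = 3 helpers send 0.5G each
}

storage_efficiency = FILE_G / (N_NODES * NODE_G)
bandwidth = {scheme: d * beta for scheme, (d, beta) in REPAIR.items()}

print(storage_efficiency)  # 0.5
for scheme, gigabytes in bandwidth.items():
    print(f"{scheme}: {gigabytes}G")  # one line per scheme: 1.0G, 2.0G, 1.5G
```

BASIC contacts more helpers than Reed-Solomon but downloads only half a node from each, which is where the saving comes from.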
Summary
• Network coding can reduce the repair bandwidth.
• BASIC regenerating codes
– A failed storage node can be repaired by simple bit-wise shift and XOR operations.
– The storage overhead due to shifting is small.
References
• Piret and Krol, MDS convolutional codes, IEEE Trans. on Information Theory, 1983.
• Dimakis, Godfrey, Wainwright and Ramchandran, Network coding for distributed storage systems, Proc. IEEE INFOCOM, 2007.
• Gollakota and Katabi, Zigzag decoding: combating hidden terminals in wireless networks, Proc. ACM SIGCOMM, 2008.
• Qureshi, Foh and Cai, Optimal solution for the index coding problem using network coding over GF(2), Proc. IEEE Conf. on Sensor, Mesh and Ad Hoc Communications and Networks, 2012.
• Sung and Gong, A zigzag decodable code with MDS property for distributed storage systems, Proc. IEEE Int. Symp. on Information Theory, 2013.
• Hou, Shum, Chen and Li, BASIC regenerating code: binary addition and shift for exact repair, Proc. IEEE Int. Symp. on Information Theory, 2013.
Two modes of repair
• Exact repair
– The content of the new node is exactly the same as the content of the failed node.
• Functional repair
– Only requires that the (n,k) recovery property is preserved.
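The difference between the two modes can be illustrated with the Reed-Solomon example. A minimal sketch, assuming hypothetical mod-257 arithmetic: with k = 2, the (n,k) recovery property holds exactly when every pair of stored coefficient vectors is linearly independent. Exact repair rebuilds A+2B itself; a functional repair may instead produce, say, A+3B, which is acceptable because the property still holds, whereas 2A+2B would not be. `recovery_property` is an illustrative name.

```python
from itertools import combinations

P = 257  # hypothetical prime field size


def recovery_property(coeffs):
    """(n, 2) recovery property: every pair of stored combinations is invertible mod P."""
    return all(
        (a1 * b2 - a2 * b1) % P != 0
        for (a1, b1), (a2, b2) in combinations(coeffs, 2)
    )


exact = [(1, 0), (0, 1), (1, 1), (1, 2)]       # node 4 rebuilt as A+2B (same as before)
functional = [(1, 0), (0, 1), (1, 1), (1, 3)]  # node 4 rebuilt as A+3B instead
broken = [(1, 0), (0, 1), (1, 1), (2, 2)]      # 2A+2B duplicates the information in A+B

print(recovery_property(exact), recovery_property(functional), recovery_property(broken))
# True True False
```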