NCCloud: A Network-Coding-Based Storage System in a Cloud
NCCloud: A Network-Coding-Based
Storage System in a Cloud-of-Clouds
Henry C. H. Chen
Yuchong Hu
Patrick P. C. Lee
Yang Tang
IEEE Transactions on Computers, 15 August 2013
1
Outline
ﻪIntroduction
ﻪRepair in Multiple Cloud Storage
ﻪFMSR Codes
ﻪNCCloud
ﻪConclusion
2
Introduction
ﻪCloud storage provides an on-demand
remote backup solution.
ﻪRelying on a single cloud storage provider creates
problems such as a single point of failure.
3
Introduction
ﻪThe general solution is to distribute data
across different cloud providers.
ﻩstripe data
ﻪThe diversity of multiple clouds improves
fault tolerance.
4
Introduction-Data Failure
ﻪThis paper focuses on unexpected
permanent cloud failure.
ﻩa cloud fails permanently => activate repair.
ﻩmaintain data redundancy and fault-tolerance.
ﻪA repair operation
ﻩretrieves data from existing surviving clouds.
ﻩreconstructs the lost data in a new cloud.
5
Introduction-Data Failure
ﻪDuring repair, each surviving node
ﻩencodes its stored data chunks.
ﻩsends the encoded chunks to a new node.
ﻪThe new node regenerates the lost data.
6
Introduction-Cost Problem
ﻪToday’s cloud storage providers charge
users for outbound data.
ﻪWhen repairing a failure, moving an
enormous amount of data (repair traffic) can
incur significant monetary costs.
7
Introduction-Repair Traffic Problem
ﻪTo minimize repair traffic, regenerating codes [16]
have been proposed. They
ﻩstore data redundantly in a distributed storage
system.
ﻩrequire less repair traffic for the same
fault-tolerance level.
[16] Network Coding for Distributed Storage Systems
8
Introduction-Regenerating Codes
ﻪHowever, most existing regenerating codes
require storage nodes to
ﻩbe equipped with computation capabilities.
ﻩperform encoding operations during repair.
9
Introduction-Regenerating Codes
ﻪTo make regenerating codes portable to any
cloud storage service, this paper assumes only
a thin-cloud interface, where storage nodes
need only support read/write.
10
Introduction-NCCloud
ﻪIn this paper, we present the design and
implementation of NCCloud,
ﻩa proxy-based storage system
ﻩthat provides fault-tolerant storage
ﻩover multiple cloud storage providers.
11
Introduction-FMSR
ﻪNCCloud is built on top of the
functional minimum-storage regenerating
(FMSR) codes.
ﻪThe FMSR code implementation
ﻩmaintains double-fault tolerance.
ﻩhas the same storage cost as RAID-6.
ﻩuses less repair traffic when recovering from a single-cloud failure.
12
Introduction-FMSR
ﻪFMSR codes are non-systematic:
ﻩthe code chunks are formed by linear
combinations of the original data chunks.
ﻩthe original data chunks are not kept, unlike in
systematic coding schemes.
13
Outline
ﻪIntroduction
ﻪRepair in Multiple Cloud Storage
ﻪFMSR Codes
ﻪNCCloud
ﻪConclusion
14
Repair in Multiple Cloud Storage
ﻪTransient failure
ﻩis short-term, such that the failed cloud will
return to normal after some time and no
outsourced data is lost.
15
Repair in Multiple Cloud Storage
ﻪPermanent failure
ﻩis long-term, in the sense that the outsourced
data on a failed cloud will become permanently
unavailable.
ﻩexamples:
ﻯdata center outages in disasters.
ﻯdata loss and corruption.
ﻯmalicious attacks.
16
Outline
ﻪIntroduction
ﻪRepair in Multiple Cloud Storage
ﻪFMSR Codes
ﻩMotivation
ﻩImplementation
ﻪNCCloud
ﻪConclusion
17
Motivation
ﻪThis paper considers
ﻩdistributed
ﻩmultiple-cloud storage
ﻩdata is striped
ﻩproxy-based design
18
Motivation
19
Fault-tolerant
ﻪMaximum Distance Separable (MDS) property
ﻩ(n, k)-MDS code
ﻯdivide a file into equal-size native chunks.
ﻯlinearly combine them to form code chunks.
ﻩdistribute the chunks over n (larger than k) nodes.
ﻩreconstruct the original file from any k of the n
nodes.
ﻩtolerate the failure of any n − k nodes.
20
Fault-tolerant
ﻪFMSR codes can reconstruct the data of a
failed node from the surviving nodes
ﻩby downloading less data.
ﻩwithout reconstructing the whole file.
21
Different Coding Schemes
Storage size 2M, repair traffic M
Storage size 2M, repair traffic 0.75M
Storage size 2M, repair traffic 0.75M
22
Double-fault Tolerant FMSR Codes
ﻪdivide a file of size M into 2(n − 2) native chunks.
ﻪgenerate 2n code chunks.
ﻪeach node stores two code chunks of size $\frac{M}{2(n-2)}$ each.
ﻪto repair a failed node, the repair traffic is $\frac{M(n-1)}{2(n-2)}$ (a saving of up to 50% over RAID-6 for large n).
ﻪfor RAID-6 codes, the total storage size is $\frac{Mn}{n-2}$ and the repair traffic is M.
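The storage and repair-traffic formulas above can be checked numerically. The following is a minimal sketch, not code from the paper: the function names `fmsr_costs` and `raid6_costs` are illustrative, and the file size M is normalized to 1 by default.

```python
from fractions import Fraction


def fmsr_costs(n, file_size=1):
    """Return (total storage, repair traffic) for double-fault-tolerant FMSR codes."""
    if n <= 2:
        raise ValueError("n must be larger than 2")
    # The file is divided into 2(n - 2) native chunks, so each chunk has this size.
    chunk = Fraction(file_size) / (2 * (n - 2))
    storage = 2 * n * chunk        # 2n code chunks in total, two per node
    repair = (n - 1) * chunk       # one chunk from each of the n - 1 surviving nodes
    return storage, repair


def raid6_costs(n, file_size=1):
    """Return (total storage, repair traffic) for RAID-6 codes, as stated on the slide."""
    if n <= 2:
        raise ValueError("n must be larger than 2")
    storage = Fraction(file_size) * n / (n - 2)
    repair = Fraction(file_size)   # the whole file is downloaded
    return storage, repair


if __name__ == "__main__":
    for n in (4, 6, 10, 100):
        _, fmsr_repair = fmsr_costs(n)
        _, raid6_repair = raid6_costs(n)
        saving = 1 - fmsr_repair / raid6_repair
        print(n, float(fmsr_repair), f"{float(saving):.1%} saved")
```

For n = 4 both schemes store 2M, and the FMSR repair traffic is 0.75M against M for RAID-6; the saving grows toward 50% as n increases.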
23
Outline
ﻪIntroduction
ﻪRepair in Multiple Cloud Storage
ﻪFMSR Codes
ﻩMotivation
ﻩImplementation
ﻪNCCloud
ﻪConclusion
24
FMSR Codes Implementation
ﻪFMSR codes do not require lost chunks to
be exactly reconstructed.
ﻩthe regenerated chunks need not be identical to those in the failed node.
ﻪThis is acceptable as long as the MDS property holds.
25
FMSR Codes Implementation
ﻪThis paper proposes a two-phase checking
scheme to ensure that the code chunks on all
nodes always satisfy the MDS property.
26
FMSR Codes Implementation
ﻪThe implementation assumes a thin-cloud
interface and supports three operations:
1. File upload
2. File download
3. Repair
27
File Upload
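ﻪNative chunks: the file is divided into $k(n-k)$ equal-size chunks.
ﻪCode chunks: $n(n-k)$ linear combinations of the native chunks.
ﻪEncoding matrix of coefficients:
ﻩsize $n(n-k) \times k(n-k)$
ﻩelements in the Galois field GF($p^n$)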
28
File Upload
ﻪGalois field GF($p^n$)
Encoding coefficient vector (ECV): one row of the encoding matrix
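The upload step can be illustrated with a small sketch. Several things here are assumptions made for brevity and are not taken from the slides: the prime field GF(257) stands in for GF($p^n$), chunks are plain lists of integers, and the names `make_encoding_matrix`, `encode`, and `place_on_nodes` are hypothetical. The sketch also draws coefficients at random without checking the MDS property.

```python
import random

P = 257  # assumption: a small prime field replaces the slide's GF(p^n)


def make_encoding_matrix(n, k, rng):
    """Random n(n-k) x k(n-k) matrix of coefficients; each row is one ECV."""
    rows, cols = n * (n - k), k * (n - k)
    return [[rng.randrange(P) for _ in range(cols)] for _ in range(rows)]


def encode(native_chunks, encoding_matrix):
    """Form each code chunk as the linear combination given by one ECV."""
    length = len(native_chunks[0])
    return [
        [sum(coef * chunk[pos] for coef, chunk in zip(ecv, native_chunks)) % P
         for pos in range(length)]
        for ecv in encoding_matrix
    ]


def place_on_nodes(code_chunks, n):
    """Give each of the n nodes an equal share of consecutive code chunks."""
    per_node = len(code_chunks) // n
    return [code_chunks[i * per_node:(i + 1) * per_node] for i in range(n)]


if __name__ == "__main__":
    n, k = 4, 2
    rng = random.Random(0)
    # k(n-k) = 4 native chunks of 8 symbols each (byte values fit in GF(257)).
    native = [[rng.randrange(256) for _ in range(8)] for _ in range(k * (n - k))]
    em = make_encoding_matrix(n, k, rng)
    nodes = place_on_nodes(encode(native, em), n)
    print(len(nodes), "nodes,", len(nodes[0]), "code chunks each")
```

Because code symbols in GF(257) can take the value 256, they do not fit in a byte; a field of size 2^8 avoids that, at the price of table-based arithmetic that would lengthen the sketch.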
29
File Download
1. Download the k(n−k) code chunks from any k of
the n storage nodes.
2. The ECVs of the k(n−k) code chunks can form a
k(n−k)×k(n−k) square matrix.
3. Obtain the original k(n − k) native chunks.
ﻩmultiply the inverse of the square matrix with the code
chunks.
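The decoding step above amounts to solving a linear system over the field. The sketch below is illustrative only: it assumes the prime field GF(257) instead of GF($p^n$), represents chunks as lists of integers, and uses Gauss-Jordan elimination on the augmented matrix, which gives the same result as multiplying the code chunks by the inverse of the ECV matrix. The name `decode` is hypothetical.

```python
P = 257  # assumption: a small prime field replaces the slide's GF(p^n)


def decode(ecvs, code_chunks):
    """Recover the native chunks from a square matrix of ECVs and their code chunks.

    Raises ValueError if the ECVs are linearly dependent (the matrix is singular).
    """
    m = len(ecvs)
    # Augment each ECV with its code chunk so both are transformed together.
    aug = [[x % P for x in list(ecv) + list(chunk)]
           for ecv, chunk in zip(ecvs, code_chunks)]
    for col in range(m):
        pivot = next((r for r in range(col, m) if aug[r][col]), None)
        if pivot is None:
            raise ValueError("ECV matrix is not full rank")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = pow(aug[col][col], -1, P)  # modular inverse (Python 3.8+)
        aug[col] = [(x * inv) % P for x in aug[col]]
        for r in range(m):
            if r != col and aug[r][col]:
                factor = aug[r][col]
                aug[r] = [(x - factor * y) % P for x, y in zip(aug[r], aug[col])]
    # The left block is now the identity; the right block holds the native chunks.
    return [row[m:] for row in aug]


if __name__ == "__main__":
    ecvs = [[1, 1], [2, 3]]
    code_chunks = [[4, 6], [11, 16]]
    print(decode(ecvs, code_chunks))
```

In this toy case the two code chunks were built from the native chunks [1, 2] and [3, 4], so decoding returns exactly those two chunks.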
30
Iterative Repair
ﻪThe MDS property must hold even after iterative
repairs.
ﻪThis paper proposes two-phase checking of
ﻩthe MDS property
ﻩthe rMDS (repair MDS) property
31
Satisfy MDS, but not rMDS
32
Iterative Repair
Step 1. Download the encoding matrix from a surviving node.
Step 2. Select one ECV from each of the n-1 surviving nodes.
Step 3. Generate a repair matrix.
Step 4. Compute the ECVs for the new code chunks and
reproduce a new encoding matrix.
33
Iterative Repair
Step 5. Given the new encoding matrix EM’, verify that both properties are satisfied.
ﻩverify MDS by enumerating all $\binom{n}{k}$ subsets of k nodes.
ﻩverify rMDS by checking $n(n-k)^{n-1}\binom{n}{k}$ cases.
ﻩThe corresponding encoding matrices must have full rank.
Step 6. Download the actual chunk data and regenerate new
chunk data.
ﻩStep 4: the new ECVs
ﻩcode chunks from the surviving nodes
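Steps 2 to 5 can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's implementation: arithmetic is in the prime field GF(257) rather than GF($p^n$), only the MDS phase of the check is implemented (the rMDS phase is omitted), the repair matrix is drawn at random and retried on failure, and all function names are hypothetical.

```python
import random
from itertools import combinations

P = 257  # assumption: a small prime field replaces the slide's GF(p^n)


def rank_mod_p(rows):
    """Rank of a matrix over GF(P), computed by Gauss-Jordan elimination."""
    rows = [[x % P for x in row] for row in rows]
    rank = 0
    for col in range(len(rows[0]) if rows else 0):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][col], -1, P)
        rows[rank] = [(x * inv) % P for x in rows[rank]]
        for r in range(len(rows)):
            if r != rank and rows[r][col]:
                factor = rows[r][col]
                rows[r] = [(x - factor * y) % P for x, y in zip(rows[r], rows[rank])]
        rank += 1
    return rank


def has_mds_property(em, n, k):
    """Step 5 (MDS phase): every subset of k nodes must give a full-rank matrix."""
    per_node = n - k  # node i holds rows i*per_node .. (i+1)*per_node - 1
    for nodes in combinations(range(n), k):
        rows = [em[i * per_node + j] for i in nodes for j in range(per_node)]
        if rank_mod_p(rows) != k * per_node:
            return False
    return True


def new_ecvs(selected_ecvs, repair_matrix):
    """Step 4: each new ECV is a linear combination of the selected ECVs."""
    width = len(selected_ecvs[0])
    return [[sum(g * ecv[c] for g, ecv in zip(row, selected_ecvs)) % P
             for c in range(width)]
            for row in repair_matrix]


def try_repair(em, n, k, failed, rng, max_tries=100):
    """Return a new encoding matrix EM' that satisfies MDS, or None if none is found."""
    per_node = n - k
    survivors = [i for i in range(n) if i != failed]
    for _ in range(max_tries):
        # Step 2: select one ECV from each of the n - 1 surviving nodes.
        picks = [em[i * per_node + rng.randrange(per_node)] for i in survivors]
        # Step 3: random repair matrix with n - k rows and n - 1 columns.
        rm = [[rng.randrange(P) for _ in survivors] for _ in range(per_node)]
        candidate = [list(row) for row in em]
        candidate[failed * per_node:(failed + 1) * per_node] = new_ecvs(picks, rm)
        if has_mds_property(candidate, n, k):
            return candidate
    return None
```

Note that the check runs on ECVs only, so a repair matrix can be rejected before any chunk data is downloaded in Step 6.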
34
rMDS Sustaining
35
Time of Two-phase Checking
36
Double-fault Tolerant Codes
ﻪMarkov Model
37
MTTDL, Compared with RAID-6
MTTDL: Mean Time To Data Loss
38
Outline
ﻪIntroduction
ﻪRepair in Multiple Cloud Storage
ﻪFMSR Codes
ﻪNCCloud
ﻪConclusion
39
NCCloud
ﻪA proxy that bridges user applications and
multiple clouds.
ﻪIts design is built on three layers.
ﻩFile system layer
ﻩCoding layer
ﻩStorage layer
40
NCCloud
ﻪIt is mainly implemented in Python, while
the coding schemes are implemented in C
for better efficiency.
41
Goal of NCCloud
ﻪCompare the costs and response times of
using RAID-6 and FMSR codes.
ﻪShow the cost advantage of FMSR over RAID-6
while maintaining acceptable response time.
42
Goal of NCCloud
ﻪNormal operations
ﻩRAID-6 and FMSR incur similar storage costs.
ﻪRepair operation
ﻩFMSR saves a significant amount in transfer
costs compared with RAID-6.
43
Cost Saving-Price
44
Cost Saving
ﻪNormal operations
ﻩ1.25PB of data stored
ﻯFMSR : $86,851 monthly storage cost
ﻯRAID-6 : $86,851 monthly storage cost
ﻪRepair operation
ﻩRAID-6 : 1PB of data, $56,832
ﻩFMSR : 0.5625PB of data, $33,894
ﻩSaving of $22,938
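The repair figures above can be cross-checked with a few lines of arithmetic. This is a sketch using only the numbers on the slide; the value n = 10 is not stated there and is inferred here from the ratio $\frac{n-1}{2(n-2)} = 0.5625$, so treat it as an assumption.

```python
from fractions import Fraction

# Figures quoted on the slide (repair of a 1PB file).
raid6_traffic_pb, raid6_cost_usd = Fraction(1), 56_832
fmsr_traffic_pb, fmsr_cost_usd = Fraction(5625, 10000), 33_894

saving_usd = raid6_cost_usd - fmsr_cost_usd
saving_pct = 100 * saving_usd / raid6_cost_usd
traffic_saving_pct = float(100 * (1 - fmsr_traffic_pb / raid6_traffic_pb))

# Inferred, not stated on the slide: which n gives repair traffic (n-1)/(2(n-2)) * M?
candidates = [n for n in range(3, 50)
              if Fraction(n - 1, 2 * (n - 2)) == fmsr_traffic_pb]

# Storage for that n, as a multiple of the file size: n / (n - 2).
storage_factors = [Fraction(n, n - 2) for n in candidates]

if __name__ == "__main__":
    print("saving:", saving_usd, "USD,", round(saving_pct, 1), "% of the RAID-6 cost")
    print("traffic saved:", traffic_saving_pct, "%")
    print("n consistent with the slide:", candidates)
    print("storage factor:", [float(f) for f in storage_factors])
```

The inferred n also reproduces the storage figure: for a 1PB file, a factor of n/(n − 2) gives the 1.25PB of stored data listed under normal operations.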
45
Response Time-Local Cloud
46
Response Time-Local Cloud
47
Response Time-Commercial Cloud
48
Outline
ﻪIntroduction
ﻪRepair in Multiple Cloud Storage
ﻪFMSR Codes
ﻪNCCloud
ﻪConclusion
49
Conclusion
ﻪThis paper presents NCCloud, which addresses the reliability of
today’s cloud backup storage.
ﻩproxy-based
ﻩmultiple-cloud storage system
ﻪNCCloud not only provides fault tolerance in storage, but
also allows cost-effective repair.
ﻪThe FMSR code implementation eliminates the encoding
requirement of storage nodes during repair.
50