
1 A Fast Repair Code Based on Regular Graphs for Distributed Storage Systems

Yan Wang, East China Jiao Tong University
Xin Wang, Fudan University

12/11/2013

2 Outline

- Introduction
- Related work
- The code framework
- Performance analysis
- Conclusion

3 Introduction

- Introduces a class of distributed storage codes, the fast repair codes (FRC).
- Based on regular graphs.
- Simple lookup and exact repair.
- Minimum repair bandwidth and low disk I/O overhead.

5 Related work

- Minimum storage regenerating (MSR).
- Minimum bandwidth regenerating (MBR).
- (n, k, f)-SRC code.
- Twin-MDS codes.
- Uncoded repair property and fractional repetition codes.
- Self-repairing code.

6 Related work─ Regular graph

7 Related work─ Twin-MDS codes

- Partition the n storage nodes into two categories and encode them using two different MDS codes.
- Strengths: minimized storage space and repair bandwidth, low complexity of operation, fewer disk reads.
- But in the worst-case analysis, the code can handle the failure of any n − (2k − 1) nodes with respect to data reconstruction.

8 Related work─ Uncoded repair property and fractional repetition codes

- Uses an outer MDS code followed by an inner repetition code.
- Exact repair for the minimum bandwidth regime.
- Can tolerate ρ − 1 node failures in total.
- ρ = 2: achieved based on regular graphs.
- ρ > 2: achieved based on Steiner systems.

9 Related work─ Self-repairing code

- Low complexity and bandwidth consumption.
- To repair one block, two blocks are often enough.
- The only encoding operation is XOR.
- However, the code does not satisfy the (n, k)-MDS property: not every set of k storage nodes can reconstruct the data file.

10 Related work─ References

[1] “Network coding for distributed storage system,” in Proc. IEEE Int. Conf. on Computer Commun., May 2007.
[5] “Simple regenerating codes,” in arXiv, Aug. 2011.
[6] “Enabling node repair in any erasure code for distributed storage,” in ISIT, Sep. 2011.
[7] “Fractional repetition codes for repair in distributed storage systems,” in Proc. Allerton Conf., Sep. 2010.
[9] “Self-repairing homomorphic codes for distributed storage systems,” in INFOCOM, 2011 Proceedings IEEE, Apr. 2011, pp. 1215–1223.

11 Outline

- Introduction
- Related work
- The code framework
  - Code construction
  - Construction of regular graph
  - Example
- Performance analysis
- Conclusion

12 The code framework─Code construction

- The data file, a sequence of 0-1 bits of length M, is to be stored in n distributed storage nodes.
- Use an (n’, k’)-MDS code: partition the file into k’ blocks of equal size and encode them into n’ coded blocks.
- Choose a d-regular graph G(V, E) to deploy the n’ coded blocks to the n storage nodes.
  - G has n nodes; each node corresponds to a storage node and has d neighbors.
  - Each edge corresponds to a coded block, so each node stores d coded blocks.
  - n’ = nd/2.
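The deployment rule can be sketched in a few lines of Python. This is only an illustration, assuming the complete graph K4 as a toy 3-regular graph (n = 4, d = 3); the function name `place_blocks` and the use of edge indices as block identifiers are hypothetical choices, not taken from the slides.

```python
from itertools import combinations

def place_blocks(n, edges):
    """Map each edge (one coded block) to the two storage nodes it joins.

    Returns a dict: node -> list of block indices stored on that node.
    Block i is the coded block assigned to edges[i] (illustrative convention).
    """
    storage = {v: [] for v in range(n)}
    for block_id, (u, v) in enumerate(edges):
        storage[u].append(block_id)
        storage[v].append(block_id)
    return storage

# Toy example (assumption): the complete graph K4 is 3-regular with n = 4 nodes.
n, d = 4, 3
edges = list(combinations(range(n), 2))
storage = place_blocks(n, edges)

n_prime = len(edges)          # number of coded blocks, should equal n * d / 2
print(n_prime)
print(storage)
```

Every block index appears on exactly two nodes, which is what later makes repair a pure copy operation.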

13 The code framework─Code construction

- When a node fails, we select a newcomer to perform exact repair.
- As each block is stored in two nodes, the repair can be done by retrieving one coded block from each neighbor node in the regular graph.
- To reconstruct the file, we need to access several storage nodes to get no fewer than k’ distinct coded blocks.
- The number of distinct coded blocks stored in k storage nodes equals the number of edges covered by the k nodes in the regular graph.
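A minimal sketch of the repair step and of counting distinct blocks, assuming the 3-dimensional cube graph as a toy 3-regular graph on 8 nodes. The helper names (`cube_graph_edges`, `repair`, `distinct_blocks`) and the string payloads are hypothetical and stand in for real coded blocks.

```python
def cube_graph_edges():
    """Edges of the cube graph Q3: 8 nodes, each of degree 3 (toy assumption)."""
    return [(u, u ^ bit) for u in range(8) for bit in (1, 2, 4) if u < (u ^ bit)]

def repair(failed, edges, storage):
    """Rebuild the failed node by copying one block from each of its neighbors."""
    recovered = {}
    for block_id, (u, v) in enumerate(edges):
        if failed in (u, v):
            neighbor = v if u == failed else u
            recovered[block_id] = storage[neighbor][block_id]  # plain copy, no decoding
    return recovered

def distinct_blocks(nodes, storage):
    """Block ids held by a set of nodes, i.e. the edges those nodes cover."""
    return set().union(*(storage[v] for v in nodes))

edges = cube_graph_edges()
storage = {v: {} for v in range(8)}
for block_id, (u, v) in enumerate(edges):
    storage[u][block_id] = storage[v][block_id] = f"coded-block-{block_id}"

lost = dict(storage[5])
storage[5] = {}                      # node 5 fails
rebuilt = repair(5, edges, storage)  # the newcomer contacts the 3 neighbors
storage[5] = rebuilt

print(rebuilt == lost)                         # True
print(len(distinct_blocks([0, 1], storage)))   # 5: two adjacent nodes share one edge
```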

14 The code framework─Code construction

- Let Rc(G, k) denote the minimum number of edges covered by any k nodes in a regular graph G. We require k’ ≤ Rc(G, k).
- We can often get k’ distinct blocks by accessing fewer than k storage nodes, because we choose the storage nodes randomly while Rc(G, k) considers the minimum number of distinct blocks we can get from k storage nodes.
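One way to write this quantity down formally is shown here. The notation e(S), for the number of edges with both ends in S, is introduced only for illustration and does not appear on the slides.

```latex
R_c(G, k) \;=\; \min_{S \subseteq V,\ |S| = k}
  \bigl|\{\, e \in E : e \cap S \neq \emptyset \,\}\bigr|
\;=\; dk \;-\; \max_{S \subseteq V,\ |S| = k} e(S),
\qquad
e(S) = \bigl|\{\, \{u, v\} \in E : u, v \in S \,\}\bigr| .
```

The second equality holds because the k nodes of S have dk edge endpoints in total, and exactly the e(S) edges inside S are counted twice.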

15 The code framework─Construction of regular graph

- We show how to construct a class of regular graphs for any values of (n, k, d) such that n is divisible by d − 1.
- Let n = (d − 1)m; the regular graph is formed from d − 1 cycles of length m.
- Label the nodes of each cycle from 1 to m.
- Connect any two nodes having the same label.
- Thereby, the d − 1 nodes having the same label form a complete graph K_{d−1}.
- Each node has degree d: 2 edges from its cycle and d − 2 edges from the complete graph K_{d−1}.
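The construction can be sketched as follows. This is a minimal illustration under stated assumptions: the name `build_frc_graph` is hypothetical, labels run from 0 to m − 1 instead of 1 to m, and m ≥ 3 is required so that each cycle is a simple cycle.

```python
def build_frc_graph(d, m):
    """d-regular graph on n = (d - 1) * m nodes built from d - 1 cycles of length m.

    Node (c, i) is the node with label i on cycle c.
    """
    if d < 2 or m < 3:
        raise ValueError("need d >= 2 and m >= 3")
    edges = set()
    for c in range(d - 1):
        for i in range(m):
            # cycle edge between consecutive labels on the same cycle
            edges.add(frozenset({(c, i), (c, (i + 1) % m)}))
            # complete-graph edges between nodes that share label i
            for c2 in range(c + 1, d - 1):
                edges.add(frozenset({(c, i), (c2, i)}))
    return edges

def degrees(edges):
    """Degree of every node that appears in the edge set."""
    deg = {}
    for edge in edges:
        for node in edge:
            deg[node] = deg.get(node, 0) + 1
    return deg

edges = build_frc_graph(d=3, m=5)
deg = degrees(edges)
print(len(deg), len(edges), set(deg.values()))
```

For d = 3 and m = 5 this gives 10 nodes and 15 edges, with every node of degree 3.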

16 The code framework─Example

17 The code framework─Example

- Consider a DSS with (n = 10, k = 4, d = 3).
- Choose n’ = 15, as there are 15 edges in the regular graph.
- Choose k’ = 8, as any 4 nodes cover at least 8 distinct edges.
- Use a (15, 8)-MDS code to encode the file into 15 coded blocks.
- Each storage node stores the 3 blocks corresponding to its 3 adjacent edges in the graph.
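The numbers in this example can be checked by brute force. The sketch below assumes the example graph is the one given by the cycle-based construction with two cycles of length 5 (the slides do not state which graph is drawn); the function names are hypothetical.

```python
from itertools import combinations

m = 5                                        # two cycles of length 5: n = 10, d = 3
edges = []
for i in range(m):
    edges.append((i, (i + 1) % m))           # first cycle: nodes 0..4
    edges.append((m + i, m + (i + 1) % m))   # second cycle: nodes 5..9
    edges.append((i, m + i))                 # link the two nodes with the same label

def covered(nodes, edges):
    """Number of edges with at least one end in the given node set."""
    chosen = set(nodes)
    return sum(1 for u, v in edges if u in chosen or v in chosen)

def min_cover(n, k, edges):
    """Minimum number of edges covered over all k-subsets of the n nodes."""
    return min(covered(subset, edges) for subset in combinations(range(n), k))

n_prime = len(edges)
k_prime = min_cover(10, 4, edges)
print(n_prime, k_prime)   # 15 8
```

The minimum is reached by four nodes forming a 4-cycle, for instance nodes 0, 1, 5 and 6, which contain 4 internal edges and so cover 12 − 4 = 8 edges.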

18 Outline

- Introduction
- Related work
- The code framework
- Performance analysis
  - Coding rate
  - Other aspects
  - Trade strict MDS property for better average performance
- Conclusion

19 Performance analysis─Coding rate

- We store 2n’ = nd blocks in all, while the file is of size k’ blocks, so the coding rate is k’/(nd).
- We choose an (nd/2, k’)-MDS code, where k’ is no more than Rc(G, k).
- Maximizing the coding rate is equivalent to maximizing Rc(G, k). However, this is a challenging problem.

20 Performance analysis─Coding rate

Proposition 1: For a general d-regular graph, Rc(G, k) ≥ dk/2.

Proof: Each node has d neighbors, so the k nodes have dk edge endpoints in total, and each covered edge is counted at most twice in dk.

Therefore, letting k’ = Rc(G, k), the coding rate of the FRC code is no less than k/(2n).
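Spelling out the last step, with 2n’ = nd stored blocks in total and k’ = Rc(G, k):

```latex
\text{coding rate}
\;=\; \frac{k'}{2n'}
\;=\; \frac{k'}{nd}
\;=\; \frac{R_c(G, k)}{nd}
\;\ge\; \frac{dk/2}{nd}
\;=\; \frac{k}{2n}.
```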

21 Performance analysis─Other aspects

- When a node fails:
  - We retrieve one block from each of its d neighbor nodes to exactly restore the data.
  - The total repair bandwidth is the same as the storage per node.
  - The number of disk accesses is d.
- Only lookup and replication are performed:
  - The lowest coding complexity for repairing.
  - The lowest repair bandwidth as well.

22 Performance analysis─Trade strict MDS property for better average performance

- Do not access a fixed number of storage nodes.
- Keep accessing new storage nodes until we collect enough distinct coded blocks.
- There is no constraint on k’, and thus we can start with any given coding rate.
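The retrieval rule "keep accessing new nodes until enough distinct blocks are collected" can be simulated as below. This is a rough sketch, assuming a 3-regular graph made of two 5-cycles with same-label nodes linked (n = 10) and a target of k’ = 8; the function names and the fixed random seed are illustrative choices only.

```python
import random

def two_cycle_graph(m):
    """3-regular graph on 2*m nodes: two m-cycles with same-label nodes linked."""
    edges = []
    for i in range(m):
        edges += [(i, (i + 1) % m), (m + i, m + (i + 1) % m), (i, m + i)]
    return edges

def nodes_needed(n, edges, k_prime, rng):
    """Contact nodes in random order until k_prime distinct blocks are seen.

    Returns the number of storage nodes contacted.
    """
    blocks_of = {v: set() for v in range(n)}
    for block_id, (u, v) in enumerate(edges):
        blocks_of[u].add(block_id)
        blocks_of[v].add(block_id)
    order = list(range(n))
    rng.shuffle(order)
    seen = set()
    for count, node in enumerate(order, start=1):
        seen |= blocks_of[node]
        if len(seen) >= k_prime:
            return count
    raise ValueError("k_prime exceeds the number of coded blocks")

rng = random.Random(0)        # fixed seed only to make the sketch repeatable
edges = two_cycle_graph(5)    # n = 10, d = 3, 15 coded blocks
trials = [nodes_needed(10, edges, k_prime=8, rng=rng) for _ in range(10_000)]
average = sum(trials) / len(trials)
print(min(trials), max(trials), average)
```

On this graph at least 3 nodes are always needed (3 nodes hold at most 9 blocks) and 4 nodes always suffice, so the average lies between 3 and 4.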

23

Proposition 2: For any d-regular graph, the expected number of edges covered by k nodes is $dk\left(1 - \frac{k-1}{2(n-1)}\right)$.

Proof: Let $E_k$ denote the expected number of edges covered twice by the k nodes, i.e., edges with both ends among the k selected nodes.

Then the expected total number of covered edges is $dk - E_k$.


24

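- Select the k nodes one by one.
- When selecting the last node, each of its d neighbors is among the k − 1 previously selected nodes with probability $\frac{k-1}{n-1}$.
- With $E_k$ the expected number of edges having both ends among the k selected nodes: $E_k = E_{k-1} + d \cdot \frac{k-1}{n-1}$, with $E_1 = 0$.
- Hence $E_k = \sum_{i=1}^{k-1} \frac{d\,i}{n-1} = dk \cdot \frac{k-1}{2(n-1)}$.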
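The closed form dk(1 − (k − 1)/(2(n − 1))) for the expected number of covered edges can be checked exactly on a small graph by averaging over every k-subset. The sketch below assumes the cube graph on 8 nodes as a toy 3-regular graph; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def expected_covered(n, k, edges):
    """Exact average number of covered edges over all k-subsets of the n nodes."""
    total = 0
    count = 0
    for subset in combinations(range(n), k):
        chosen = set(subset)
        total += sum(1 for u, v in edges if u in chosen or v in chosen)
        count += 1
    return Fraction(total, count)

def closed_form(n, k, d):
    """dk * (1 - (k - 1) / (2 (n - 1))), kept as an exact fraction."""
    return d * k * (1 - Fraction(k - 1, 2 * (n - 1)))

# Toy 3-regular graph (assumption): the cube graph Q3 on 8 nodes.
edges = [(u, u ^ bit) for u in range(8) for bit in (1, 2, 4) if u < (u ^ bit)]
n, d = 8, 3
for k in range(1, n + 1):
    assert expected_covered(n, k, edges) == closed_form(n, k, d)

print(closed_form(n, 3, d))   # 54/7
```

The check passes for every k from 1 to n, including k = n, where the formula gives nd/2, the total number of edges.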


25

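- We can conclude the results.
- When k ≤ n/2, we have $\frac{k-1}{2(n-1)} < \frac{1}{4}$, so the expected number of edges covered by k storage nodes, $dk\left(1 - \frac{k-1}{2(n-1)}\right)$, is greater than $\frac{3}{4}dk$.
- Thus, we can set k’ = $\frac{3}{4}dk$, achieving coding rate $\frac{k'}{nd} = \frac{3k}{4n}$.
- The file can be reconstructed from no more than k storage nodes on average.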


26 Performance analysis─Trade strict MDS property for better average performance

28 Conclusion

- FRC codes minimize bandwidth consumption and coding complexity in the repair process.
- Analytical results show that FRC codes outperform the other codes, with lower repair complexity and disk I/O overhead.

29 Conclusion

- The challenging issue is the relatively small coding rate.
  - We consider this acceptable as a trade-off for the simple repair process.
- As future research:
  - It is challenging to find a class of regular graphs with large coding rates.
