Automatic RAID Construction
Ba-Quy Vuong (Bryan) and Yiying Zhang
Department of Computer Sciences
University of Wisconsin-Madison

Outline
• Introduction
• Architecture and Design
• Implementation
• Performance Evaluation
• Conclusion & Future Work

What is RAID?
• Redundant Arrays of Inexpensive Disks
• Purposes:
  – Reliability
  – Performance

RAID Implementation
• Hardware RAID:
  – Uses dedicated hardware to control the disk array
  – Host independent
• Software RAID:
  – Uses a software layer sitting above the disk drivers to control the disk array
  – Host dependent

Problems with Software RAID
• There are many ways to build RAID systems, including:
  – Different checksum-based schemes
  – Different parity-based schemes
• Not flexible: each RAID level requires a specific RAID driver
• Not robust: writing a new RAID driver is time-consuming and may introduce many bugs

Our Solution: Automatic RAID Construction
• Approach:
  – A way to describe checksum and parity-based schemes
  – Mapping the specified scheme to a RAID driver
• Advantages:
  – Flexibility
  – Robustness

Architecture
• Design consideration
  – Parity on top of checksum
  – Checksum on top of parity

Architecture
• Example:
  – 3-disk RAID 5
  – Mirroring checksum

Automatic Parity
• Goal: allow any parity scheme
• Two data structures
  – Layout matrix: how blocks are laid out
    • The whole matrix corresponds to a stripe
    • Each row corresponds to one strip
    • Zeros mean data blocks, ones mean parity blocks
    • The number of columns is the number of disks

  4-disk RAID 5:
    0 0 0 1
    0 0 1 0
    0 1 0 0
    1 0 0 0

  4-disk RAID 4:
    0 0 0 1

  4-disk RAID 0+1:
    0 1 0 1

Automatic Parity
• Two data structures
  – Parity matrix: which data blocks contribute to each parity block
    • #rows: #parity blocks in one stripe
    • #columns: #data blocks in one stripe
    • A one at row i, column j means that data block j is used to calculate parity block i

  4-disk RAID 5:
    1 1 1 0 0 0 0 0 0 0 0 0
    0 0 0 1 1 1 0 0 0 0 0 0
    0 0 0 0 0 0 1 1 1 0 0 0
    0 0 0 0 0 0 0 0 0 1 1 1

  4-disk RAID 4:
    1 1 1

  4-disk RAID 0+1:
    1 0
    0 1

Automatic Checksum - Goals
• Checksum over data and parity blocks
• Flexible number of blocks as a checksum unit
• Flexible checksum size
• Flexible functions
• Flexible locations

Automatic Checksum - Design
• User-specified parameters:
  – # of blocks as a checksum unit
  – Checksum size for each block
  – Checksum function
• Example:
  – 3 blocks as a checksum unit
  – 1 block for checksums
• One more level of mapping

Implementation
• The RAID driver is implemented as a device driver in Linux
• Checksums and parities are specified by users
• Checksum functions
  – Provided: sum, hash-based

Implementation
• Memory-based version
  – Uses each memory chunk as a disk
  – Easy to build and debug
  – No significant effect on the overall code
• Disk-based version
  – Uses real disks
  – Communicates with disk drivers through the bio structure
  – Synchronization problems due to asynchronous I/O

Performance Evaluation: Setup
• Host: VMware, Fedora 8, Intel Core 2 Duo 2.2 GHz, 1 GB RAM
• Memory-based
• Simulating disk delay
  – Each low-level disk read: 15 ms
  – Each low-level disk write: 17 ms
• Simulating disk failure
  – Unable to read (20%)
  – Read inconsistency (20%)

Performance Evaluation: Settings
• Evaluation settings
  – With and without reconstruction
  – Different layouts, parity logics, and checksum functions
  – Different workloads
• Systems:
  – System 1: 4-disk, no parity, no checksum
  – System 2: 4-disk RAID 0+1 with hash-based checksum
  – System 3: 4-disk RAID 0+1 with sum checksum
  – System 4: 4-disk RAID 5 with hash-based checksum
  – System 5: 4-disk RAID 5 with sum checksum
• Workload:
  – Reading and writing 30 KB files
  – mkfs, mount
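
To make the two Automatic Parity data structures concrete, here is a minimal sketch of how the 4-disk RAID 5 layout and parity matrices could drive parity computation and the reconstruction of a lost data block. It is not the driver's code. The slides do not name the parity function, so XOR is an assumption here, as are the hypothetical helper names (xor_blocks, compute_parities, build_stripe, reconstruct), the tiny block size, and the row-major numbering of data and parity blocks.

```python
from functools import reduce

BLOCK_SIZE = 4  # bytes per block; deliberately tiny, an assumption for illustration

# Layout matrix for one 4-disk RAID 5 stripe: 0 = data block, 1 = parity block.
LAYOUT = [
    [0, 0, 0, 1],
    [0, 0, 1, 0],
    [0, 1, 0, 0],
    [1, 0, 0, 0],
]

# Parity matrix: a 1 at row i, column j means data block j feeds parity block i.
PARITY = [
    [1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1],
]


def xor_blocks(blocks):
    """XOR equal-sized byte blocks together (assumed parity function)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def compute_parities(parity_matrix, data_blocks):
    """Return one parity block per row of the parity matrix."""
    parities = []
    for row in parity_matrix:
        members = [data_blocks[j] for j, used in enumerate(row) if used]
        parities.append(xor_blocks(members))
    return parities


def build_stripe(layout, parity_matrix, data_blocks):
    """Place data and parity blocks on disks as the layout matrix dictates.

    Assumes data and parity blocks are both numbered in row-major order
    of their appearance in the layout matrix.
    """
    parities = iter(compute_parities(parity_matrix, data_blocks))
    data = iter(data_blocks)
    return [[next(parities) if cell else next(data) for cell in row]
            for row in layout]


def reconstruct(parity_matrix, data_blocks, parities, lost):
    """Rebuild data block `lost` from the first parity row that covers it."""
    for i, row in enumerate(parity_matrix):
        if row[lost]:
            survivors = [data_blocks[j] for j, used in enumerate(row)
                         if used and j != lost]
            return xor_blocks(survivors + [parities[i]])
    raise ValueError("no parity block covers this data block")
```

Calling build_stripe(LAYOUT, PARITY, data_blocks) with 12 data blocks returns a 4x4 grid whose columns are the disks. Swapping in the RAID 4 or RAID 0+1 matrices changes the scheme without touching the functions, which is the flexibility the slides argue for.
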

Performance: Avg Read and Write Time

Performance: Deviation of Read & Write

Performance: Reconstruction

Performance: Timeline

Conclusion
• Why automatic RAID?
  – Flexible vs. fixed RAID drivers
  – Robustness
• Approach
  – Automatic Parity with two matrices
  – Automatic Checksum with user-defined parameters
• Lessons learned
  – Performance is a big issue
  – Disk-based RAID is much harder to implement than memory-based RAID

Future Work
• Complete the disk-based version
• Improve the performance
• Check for input correctness
• Extend the parity and checksum layers to handle more schemes

Questions & Answers