
CSE521: Introduction to Computer Architecture
Mazin Yousif
I/O Subsystem
RAID (Redundant Array of Independent Disks)
• Improvements in microprocessor performance (~50% per year) widely exceed improvements in disk access time (~10% per year), which depends on a mechanical system
• Improvements in magnetic media densities have also been slow (~20% per year)
• Solution: Disk Arrays - use parallelism across multiple disks to improve aggregate I/O performance
  – Disk arrays stripe data across multiple disks and access them in parallel
    • Capacity penalty to store redundant data
    • Bandwidth penalty to update it
• Positive Aspects of Disk Arrays:
  – Higher data transfer rate on large data accesses
  – Higher I/O rates on small data accesses
  – Uniform load balancing across all the disks - no hot spots (hopefully)
• Negative Aspects of Disk Arrays:
  – Higher vulnerability to disk failures - redundancy in the form of error-correcting codes must be employed to tolerate failures
• Several data striping and redundancy schemes exist
• Sequential accesses generate the highest data transfer rates with minimal head positioning
• Random accesses generate high I/O rates with lots of head positioning

• Data is striped for improved performance
  – Distributes data over multiple disks to make them appear as a single large, fast disk
  – Allows multiple I/Os to be serviced in parallel
    • Multiple independent requests are serviced in parallel
    • A single block request may be serviced in parallel by multiple disks
• Data is made redundant for improved reliability
  – A large number of disks in an array lowers the reliability of the array
    • Reliability of N disks = Reliability of one disk / N
    • Example (see the sketch below):
      – 50,000 hours / 70 disks ≈ 700 hours
      – The disk system MTTF drops from about 6 years to about 1 month
      – Arrays without redundancy are too unreliable to be useful
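
A quick check of the arithmetic above, assuming independent disk failures so that the array MTTF is the single-disk MTTF divided by N (the function name here is ours, not from the slides):

```python
# Sketch only: MTTF of a non-redundant disk array under the 1/N rule above.
def array_mttf_hours(disk_mttf_hours: float, num_disks: int) -> float:
    return disk_mttf_hours / num_disks

mttf = array_mttf_hours(50_000, 70)                     # the slide's example
print(f"{mttf:.0f} hours, ~{mttf / 720:.1f} months")    # ~714 hours, about one month
```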

• RAID 0 (Non-redundant)
  – Stripes data, but does not employ redundancy
  – Lowest cost of any RAID
  – Best write performance - no redundant information
  – Any single disk failure is catastrophic
  – Used in environments where performance is more important than reliability

RAID 0 data layout - each cell is one stripe unit, each row of stripe units is one stripe (a mapping sketch follows):

  Disk 1   Disk 2   Disk 3   Disk 4
  D0       D1       D2       D3
  D4       D5       D6       D7
  D8       D9       D10      D11
  D12      D13      D14      D15
  D16      D17      D18      D19
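
A minimal sketch of the striping shown above (the helper name and the round-robin stripe-unit mapping are ours; real arrays map at a configurable stripe-unit size):

```python
# Sketch only: map a logical stripe unit to (disk, row) for the RAID 0 layout above.
def raid0_map(unit: int, num_disks: int = 4) -> tuple[int, int]:
    return unit % num_disks, unit // num_disks   # disk index, stripe (row) on that disk

assert raid0_map(5) == (1, 1)   # D5 sits on Disk 2 in the second stripe, as in the figure
```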

• RAID 1 (Mirrored)
  – Uses twice as many disks as non-redundant arrays - 100% capacity overhead - two copies of the data are maintained
  – Data is simultaneously written to both copies
  – Data is read from the copy with the shorter queuing, seek and rotation delays - best read performance (see the sketch below)
  – When a disk fails, the mirrored copy is still available
  – Used in environments where availability and performance (I/O rate) are more important than storage efficiency
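
A small sketch of the read-scheduling idea above, choosing the less busy copy (the queue-depth-only policy here is a simplification of the slide's "shorter queuing, seek and rotation delays"):

```python
# Sketch only: serve a RAID 1 read from whichever mirror is currently less busy.
def choose_mirror(queue_depths: list[int]) -> int:
    return min(range(len(queue_depths)), key=lambda i: queue_depths[i])

assert choose_mirror([3, 1]) == 1   # the read goes to the copy with the shorter queue
```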

• RAID 2 (Memory-Style ECC)
  – Uses a Hamming code - parity for distinct overlapping subsets of the data
  – The # of redundant disks is proportional to the log of the total # of disks (see the sketch below) - relatively better for a large # of disks - e.g., 4 data disks require 3 redundant disks
  – If a disk fails, the other data in the subset is used to regenerate the lost data
  – Multiple redundant disks are needed to identify the faulty disk
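
To make the log relationship concrete, a small sketch using the redundant-disk count given later in the cost comparison (ceiling(log2 N) + 1 check disks for N data disks):

```python
import math

# Sketch only: redundant (check) disks for a RAID 2 group of N data disks.
def raid2_redundant_disks(data_disks: int) -> int:
    return math.ceil(math.log2(data_disks)) + 1

assert raid2_redundant_disks(4) == 3    # the slide's example: 4 data disks need 3 check disks
assert raid2_redundant_disks(16) == 5   # the overhead shrinks relative to mirroring as N grows
```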

• RAID 3 (Bit-Interleaved Parity)
  – Data is bit-wise interleaved over the data disks
  – Uses a single parity disk to tolerate disk failures - overhead is 1/N
  – Logically a single high-capacity, high-transfer-rate disk
  – Reads access data disks only; writes access both data and parity disks
  – Used in environments that require high bandwidth (scientific, image processing, etc.), not high I/O rates

[Figure: individual data bits interleaved across the data disks, with the corresponding parity bit for each bit position stored on the parity disk; an XOR sketch follows.]
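
A minimal sketch of the parity mechanism behind RAID 3 (and RAID 4/5): parity is the XOR of the data, so any single lost disk can be regenerated from the survivors. The data values here are made up for illustration.

```python
from functools import reduce

# Sketch only: XOR parity and reconstruction of a single failed data disk.
def parity(data: list[int]) -> int:
    return reduce(lambda a, b: a ^ b, data)

def rebuild(surviving: list[int], parity_value: int) -> int:
    return reduce(lambda a, b: a ^ b, surviving, parity_value)

disks = [0b1001, 0b0011, 0b0111]                      # toy contents of three data disks
p = parity(disks)
assert rebuild([disks[0], disks[2]], p) == disks[1]   # recover the failed middle disk
```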

• RAID 4 (Block-Interleaved Parity)
  – Similar to the bit-interleaved parity disk array, except data is block-interleaved (striping units)
  – Read requests smaller than one striping unit access only a single striping unit (one disk)
  – Write requests update the data block and the parity block
  – Generating parity requires 4 I/O accesses (read-modify-write)
  – The parity disk is updated on every write - a bottleneck

• RAID 5 (Block-Interleaved Distributed Parity)
  – Eliminates the parity disk bottleneck of RAID 4 - distributes parity among all the disks
  – Data is distributed among all disks
  – All disks participate in read requests - better performance than RAID 4
  – Write requests update the data block and the parity block
  – Generating parity requires 4 I/O accesses (read-modify-write)
  – Left-symmetric vs. right-symmetric parity placement - left-symmetric placement allows each disk to be traversed once before any disk is traversed twice

RAID 5 block-interleaved distributed-parity layout - each cell is one stripe unit, each row one stripe (a placement sketch follows):

  Disk 1   Disk 2   Disk 3   Disk 4   Disk 5
  D0       D1       D2       D3       P
  D4       D5       D6       P        D7
  D8       D9       P        D10      D11
  D12      P        D13      D14      D15
  P        D16      D17      D18      D19
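
A small sketch of the rotated parity placement in the layout above (the helper is ours; parity starts on the last disk and moves one disk to the left with each stripe):

```python
# Sketch only: which disk holds the parity unit for a given stripe in the layout above.
def parity_disk(stripe: int, num_disks: int = 5) -> int:
    return (num_disks - 1) - (stripe % num_disks)

assert [parity_disk(s) for s in range(5)] == [4, 3, 2, 1, 0]   # matches the figure
```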

RAID 5 small write of new data D0’ over D0 (read-modify-write):

  1. Read old data D0
  2. Read old parity P
  3. Write new data D0’
  4. Write new parity P’ = P xor D0 xor D0’

  Before: D0  D1 D2 D3 P        After: D0’ D1 D2 D3 P’
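
A minimal sketch of the new-parity computation implied by the four steps above (parity is XOR, so the unchanged blocks D1-D3 never need to be read):

```python
# Sketch only: RAID 4/5 small-write parity update (read old data & parity, write new data & parity).
def new_parity(old_data: int, new_data: int, old_parity: int) -> int:
    return old_parity ^ old_data ^ new_data

d0, d1, d2, d3 = 0b0001, 0b0010, 0b0100, 0b1000
p = d0 ^ d1 ^ d2 ^ d3
d0_new = 0b0111
assert new_parity(d0, d0_new, p) == d0_new ^ d1 ^ d2 ^ d3   # same as recomputing from scratch
```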

• RAID 6 (P + Q Redundancy)
  – Uses Reed-Solomon codes to protect against up to two disk failures (a P + Q sketch follows)
  – Data is distributed among all disks
  – Two sets of parity, P & Q
  – Write requests update the data block and both parity blocks
  – Generating parity requires 6 I/O accesses (read-modify-write) - both P & Q are updated
  – Used in environments with stringent reliability requirements
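
A hedged sketch of one common P + Q construction (not necessarily the exact code the slides have in mind): P is the XOR of the data blocks, and Q is a Reed-Solomon-style syndrome over GF(2^8); together they allow any two lost disks to be recovered.

```python
# Sketch only: compute P and Q for one stripe of byte-sized symbols.
def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8) with the polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11D)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result

def gf_pow(base: int, exp: int) -> int:
    result = 1
    for _ in range(exp):
        result = gf_mul(result, base)
    return result

def pq_parity(data: list[int]) -> tuple[int, int]:
    p, q = 0, 0
    for i, d in enumerate(data):
        p ^= d                          # P: plain XOR parity
        q ^= gf_mul(gf_pow(2, i), d)    # Q: sum of g^i * D_i in GF(2^8), generator g = 2
    return p, q

p, q = pq_parity([0x12, 0x34, 0x56])    # made-up data bytes
```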

Comparisons
• Read/Write Performance
  – RAID 0 provides the best write performance
  – RAID 1 provides the best read performance
• Cost - Total # of Disks
  – RAID 1 is the most expensive - 100% capacity overhead - 2N disks
  – RAID 0 is the least expensive - N disks - no redundancy
  – RAID 2 needs N + ceiling(log2 N) + 1 disks
  – RAID 3, RAID 4 & RAID 5 need N + 1 disks

• Preferred Environments
  – RAID 0: performance & capacity are more important than reliability
  – RAID 1: high I/O rate, high availability environments
  – RAID 2: large I/O data transfers
  – RAID 3: high-bandwidth applications (scientific, image processing, ...)
  – RAID 4: high bit-bandwidth applications
  – RAID 5 & RAID 6: mixed applications

• Performance:
  – What metric?
    • IOPS?
    • Bytes/sec?
    • Response time?
    • IOPS per $$?
    • Hybrid?
  – Application dependent:
    • Transaction processing: IOPS per $$
    • Scientific applications: Bytes/sec per $$
    • File servers: both IOPS and Bytes/sec
    • Time-sharing applications: user capacity per $$

The table below shows throughput per dollar relative to RAID 0, assuming G drives per error-correcting group:

  RAID Level   Small Reads   Small Writes    Large Reads   Large Writes   Storage Efficiency
  RAID 0       1             1               1             1              1
  RAID 1       1             1/2             1             1/2            1/2
  RAID 3       1/G           1/G             (G-1)/G       (G-1)/G        (G-1)/G
  RAID 5       1             max(1/G, 1/4)   1             (G-1)/G        (G-1)/G
  RAID 6       1             max(1/G, 1/6)   1             (G-2)/G        (G-2)/G

  * RAID 3 performance/cost is always <= RAID 5 performance/cost

Performance Issues
• Improving Small Write Performance for RAID 5:
  – Writes need 4 I/O accesses; the overhead is most pronounced for small writes
    • Response time increases by a factor of 2; throughput decreases by a factor of 4
    • In contrast, RAID 1 writes require two concurrent writes - latency may increase; throughput decreases by a factor of 2
  – Three techniques can improve RAID 5 small-write performance
• Buffering & Caching:
  – The disk cache (write buffering) acknowledges the host before data is written to disk
  – Under high load, write-backs increase and response time goes back to 4 times that of RAID 0
  – During write-back, group sequential writes together
  – Keeping a copy of the old data before writing ==> 3 I/O accesses
  – Keeping the new parity & new data in cache ==> any later update to the same block requires only 2 I/O accesses (see the sketch below)
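
A small accounting sketch of the caching cases above (our own framing of the slide's I/O counts):

```python
# Sketch only: I/O accesses for a RAID 5 small write under the caching optimizations above.
def small_write_ios(old_data_cached: bool = False, new_parity_and_data_cached: bool = False) -> int:
    if new_parity_and_data_cached:
        return 2        # later update: write new data + write new parity
    if old_data_cached:
        return 3        # skip the old-data read: read old parity, write data, write parity
    return 4            # full read-modify-write

assert (small_write_ios(), small_write_ios(True), small_write_ios(True, True)) == (4, 3, 2)
```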

• Floating Parity:
  – Shortens the read-modify-write of small writes to an average of 1 I/O access
  – Clusters parity into cylinders, each containing a track of free blocks
  – When a parity block needs updating, the new parity is written to the closest unallocated block following the old parity
    • A parity update then costs approximately one read plus about 1 ms
  – Overhead: directories for the unallocated blocks and parity blocks, kept in a cache in the RAID adapter - megabytes of memory
  – Floating data?
    • Larger directories
    • Sequential data may become discontiguous on disk

• Parity Logging:
  – Delay writing the new parity
  – Create an "update image" - the difference between the old & new parity - and store it in a log file in the RAID adapter
  – Hopefully, several parity blocks can be grouped together when writing back
  – The log file is stored in NVRAM - and can be extended onto disk space
  – This may cost more I/Os overall, but it is efficient since large chunks of data are processed at once
  – Logging reduces the I/O accesses for a small write from 4 to possibly 2+ (see the sketch below)
  – Overhead: NVRAM, extra disk space, and memory when applying the parity update image to the old parity
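
A minimal sketch of the parity-logging idea (the data structure here is ours, not from the slides): each small write logs an update image, and the images are later applied to the parity blocks in a batch.

```python
from collections import defaultdict

# Sketch only: parity block id -> accumulated update image (old data XOR new data).
parity_log: dict[int, int] = defaultdict(int)

def log_small_write(parity_block: int, old_data: int, new_data: int) -> None:
    # 2 I/Os now (read old data, write new data); the parity write is deferred
    parity_log[parity_block] ^= old_data ^ new_data

def apply_log(parity_blocks: dict[int, int]) -> None:
    # later: read each affected parity block once, fold in its image, write it back
    for block, image in parity_log.items():
        parity_blocks[block] ^= image
    parity_log.clear()
```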

Hardware vs. Software RAID
• RAID can also be implemented in the OS (software RAID)
  – In RAID 1, hardware RAID allows 100% mirroring; OS-implemented mirroring must distinguish between master & slave drives
    • Only the master drive has the boot code; if it fails, you can continue working, but booting is no longer possible
    • Hardware mirroring does not have this drawback
  – Since software RAIDs implement standard SCSI, repair functions such as support for spare drives and hot plugging have not been implemented; in contrast, hardware RAID implements various repair functions
  – Hardware RAID improves system performance with its caching system, especially under high load, and with synchronization
  – Microsoft Windows NT implements RAID 0 and RAID 1

• Which RAID for which application?
  – Fast workstation:
    • Caching is important to improve the I/O rate
    • If large files are installed, then RAID 0 may be necessary
    • It is preferable to put the OS and swap files on drives separate from the user drives, to minimize head movement between the swap-file area & the user area
  – Small server:
    • RAID 1 is preferred
  – Mid-size server:
    • If more capacity is needed, then RAID 5 is recommended
  – Large server (e.g., database servers):
    • RAID 5 is preferred
    • Separate different I/O streams into mechanically independent arrays; place database index & data files in different arrays