CDA 3101
Fall 2013
Computer Storage: Practical Aspects
6, 13 November 2013
Copyright © 2011 Prabhat Mishra
Storage Systems
• Introduction
• Disk Storage
• Dependability and Reliability
• I/O Performance
• Server Computers
• Conclusion
Case for Storage
• Shift in focus from computation to communication and storage of information
  – “The Computing Revolution” (1960s to 1980s): IBM, Control Data Corp., Cray Research
  – “The Information Age” (1990 to today): Google, Yahoo, Amazon, …
• Storage emphasizes reliability and scalability as well as cost-performance
• A program crash is frustrating, but data loss is unacceptable, so dependability is the key concern
• Which software determines HW features?
  – Operating system for storage
  – Compiler for processor
Cost vs. Access Time in DRAM/Disk
• DRAM is 100,000 times faster than disk, and costs 30-150 times more per gigabyte.

Flash Storage
• Nonvolatile semiconductor storage
  – 100×–1000× faster than disk
  – Smaller, lower power, more robust
  – But more $/GB (between disk and DRAM)
Hard Disk Drive
Seek Time is not Linear in Distance
• Example: 4 reads (sectors 26, 100, 724, 9987) serviced in request order require 3 revolutions; with the reads reordered, they require just 3/4 of a revolution.
• RULE OF THUMB: average seek is the time to access 1/3 of the number of cylinders (see the sketch below).
  – Seek time is not linear in distance: the arm accelerates, pauses, decelerates, and waits for settle time.
  – The 1/3 average does not work well in practice due to the locality property of real request streams.
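
The 1/3 rule can be checked numerically: if the current and target cylinders are independent and uniformly distributed over N cylinders, the expected seek distance is N/3. A minimal Monte Carlo sketch (the cylinder count and trial count are arbitrary choices, not from the slides):

    import random

    # Check the "1/3 of the cylinders" rule of thumb: for two positions
    # drawn uniformly from N cylinders, the mean distance approaches N/3.
    N = 10_000          # number of cylinders (arbitrary example value)
    TRIALS = 1_000_000

    total = 0
    for _ in range(TRIALS):
        current = random.randrange(N)
        target = random.randrange(N)
        total += abs(current - target)

    print(f"average seek distance: {total / TRIALS:.1f}  (N/3 = {N / 3:.1f})")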
Dependability
• Fault: failure of a component
  – May or may not lead to system failure
• Service accomplishment: service delivered as specified
• Service interruption: deviation from specified service
• A failure moves the system from service accomplishment to service interruption; restoration moves it back
Dependability Measures
• Reliability: mean time to failure (MTTF)
• Service interruption: mean time to repair (MTTR)
• Mean time between failures: MTBF = MTTF + MTTR
• Availability = MTTF / (MTTF + MTTR)

Improving Availability
• Increase MTTF: fault avoidance, fault tolerance, fault forecasting
• Reduce MTTR: improved tools and processes for diagnosis and repair
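
A quick worked example of these measures (the MTTF and MTTR values below are illustrative, not from the slides):

    # Availability = MTTF / (MTTF + MTTR); MTBF = MTTF + MTTR.
    mttf = 50_000   # mean time to failure, hours (illustrative value)
    mttr = 24       # mean time to repair, hours (illustrative value)

    mtbf = mttf + mttr
    availability = mttf / mtbf

    print(f"MTBF = {mtbf} hours")
    print(f"Availability = {availability:.5f}")   # ~0.99952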
Disk Access Example
• Given: 512B sector, 15,000 rpm, 4ms average seek time, 100MB/s transfer rate, 0.2ms controller overhead, idle disk
• Average read time:
    4ms seek time
    + ½ / (15,000/60) = 2ms rotational latency
    + 512B / 100MB/s = 0.005ms transfer time
    + 0.2ms controller delay
    = 6.2ms
• If the actual average seek time is 1ms, the average read time is 3.2ms
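
The same arithmetic as a sketch, so the parameters are easy to vary:

    # Average read time = seek + rotational latency + transfer + controller.
    sector_bytes = 512
    rpm = 15_000
    seek_ms = 4.0
    transfer_mb_per_s = 100
    controller_ms = 0.2

    rotation_ms = 0.5 / (rpm / 60) * 1000               # half a rotation on average
    transfer_ms = sector_bytes / (transfer_mb_per_s * 1e6) * 1000

    total_ms = seek_ms + rotation_ms + transfer_ms + controller_ms
    print(f"{total_ms:.2f} ms")   # ~6.21 ms, the slide's 6.2ms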
Use Arrays of Small Disks?
• Can smaller disks be used to close the gap in performance between disks and CPUs?
• Improves throughput; latency may not improve
• Conventional: 4 disk designs (3.5”, 5.25”, 10”, 14”), spanning low end to high end
• Disk array: 1 disk design (3.5”)
Array Reliability
• Reliability of N disks = Reliability of 1 disk ÷ N
  – 50,000 hours ÷ 70 disks ≈ 700 hours
  – Disk system MTTF drops from 6 years to 1 month!
• Arrays (w/o redundancy) are too unreliable to be used
• Hot spares support reconstruction in parallel with access: very high media availability can be achieved
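
The slide's arithmetic, restated as a sketch:

    # Without redundancy the array fails when any one disk fails, so its
    # MTTF is roughly the single-disk MTTF divided by the number of disks.
    disk_mttf_hours = 50_000
    n_disks = 70

    array_mttf = disk_mttf_hours / n_disks
    print(f"array MTTF ~ {array_mttf:.0f} hours "
          f"({array_mttf / (24 * 30):.1f} months)")   # ~714 hours, about 1 month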
Redundant Arrays of (Inexpensive) Disks
• Files are "striped" across multiple disks
• Redundancy yields high data availability
  – Availability: service still provided to user, even if some components have failed
• Disks will still fail
  – Contents are reconstructed from data redundantly stored in the array
  – Capacity penalty to store redundant information
  – Bandwidth penalty to update redundant information
RAID 1: Disk Mirroring/Shadowing
• Each disk is fully duplicated onto its “mirror”, forming a recovery group
• Very high availability can be achieved
• Bandwidth sacrifice on write: each logical write = two physical writes
• Reads may be optimized
• Most expensive solution: 100% capacity overhead
RAID 10 vs RAID 01
• RAID 1 + 0: striped mirrors
  – For example, four pairs of disks for four-disk data
• RAID 0 + 1: mirrored stripes
  – For example, a pair of four-disk stripes for four-disk data
RAID 2
• Memory-style error correcting codes in disks
• Not used anymore; other RAID organizations are more attractive
RAID 3: Parity Disk
• Each logical record (e.g., 10010011, 11001101, 10010011, …) is striped into physical records across the data disks, plus one parity disk P
• P contains the sum of the other disks per stripe, mod 2 ("parity")
• If a disk fails, subtract P from the sum of the other disks to find the missing information
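
In binary, "sum mod 2" is just XOR, so parity generation and single-disk recovery take only a few lines; a minimal sketch (the fourth byte is made up to complete the stripe):

    from functools import reduce

    # Parity disk P = XOR of the data disks (addition mod 2, per stripe).
    data_disks = [0b10010011, 0b11001101, 0b10010011, 0b01101100]
    parity = reduce(lambda a, b: a ^ b, data_disks)

    # Disk 2 fails: XOR of P with the surviving disks recovers its contents.
    surviving = data_disks[:2] + data_disks[3:]
    recovered = reduce(lambda a, b: a ^ b, surviving, parity)
    assert recovered == data_disks[2]
    print(f"recovered {recovered:08b}")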
Inspiration for RAID 4
• RAID 3 relies on the parity disk to discover errors on read
• But every sector has an error detection field
• To catch errors on read, rely on the error detection field instead of the parity disk
• This allows independent reads to different disks simultaneously
RAID 4: High I/O Rate Parity
• Parity is computed inside each stripe of 5 disks
• Example: small reads to D0 & D5 can proceed independently; a large write to D12-D15 updates a full stripe
• Block layout (each column is a disk, each row of blocks is a stripe; logical disk addresses increase down the array):

    D0   D1   D2   D3   P
    D4   D5   D6   D7   P
    D8   D9   D10  D11  P
    D12  D13  D14  D15  P
    D16  D17  D18  D19  P
    D20  D21  D22  D23  P
    ...
Inspiration for RAID 5
• RAID 4 works well for small reads
• Small writes (write to one disk):
  – Option 1: read the other data disks, create the new sum, and write it to the parity disk
  – Option 2: since P has the old sum, compare old data to new data and add the difference to P (see the sketch below)
• Small writes are limited by the parity disk: writes to D0 and D5 must both also write to the P disk
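
Option 2 is a single XOR identity: P_new = P_old XOR D_old XOR D_new. A minimal sketch with made-up 4-bit block values:

    # Small-write parity update without reading the other data disks.
    def small_write_parity(p_old: int, d_old: int, d_new: int) -> int:
        return p_old ^ d_old ^ d_new

    # Check against Option 1 (recompute parity over all data disks).
    disks = [0b1010, 0b0110, 0b1111, 0b0001]
    p_old = disks[0] ^ disks[1] ^ disks[2] ^ disks[3]
    d_new = 0b0101
    p_new = small_write_parity(p_old, disks[2], d_new)
    disks[2] = d_new
    assert p_new == disks[0] ^ disks[1] ^ disks[2] ^ disks[3]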
RAID 5: Distributed Parity
• N + 1 disks
• Like RAID 4, but parity blocks are distributed across the disks
  – Avoids the parity disk being a bottleneck
• Widely used
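
One simple way to rotate the parity block across the disks (real layouts such as left-symmetric differ in detail; this placement rule is just an illustrative choice):

    # Stripe s stores its parity block on disk (s mod (N + 1)); the data
    # blocks of that stripe occupy the remaining disks.
    def raid5_layout(stripe: int, n_plus_1: int = 5):
        parity_disk = stripe % n_plus_1
        data_disks = [d for d in range(n_plus_1) if d != parity_disk]
        return parity_disk, data_disks

    for s in range(5):
        p, data = raid5_layout(s)
        print(f"stripe {s}: parity on disk {p}, data on {data}")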
RAID 6: Recovering from 2 failures
• Why > 1 failure recovery?
  – An operator may accidentally replace the wrong disk during a failure
  – Since disk bandwidth is growing more slowly than disk capacity, the MTTR of a disk is increasing
    – This increases the chance of a 2nd failure during repair, since repair takes longer
    – A 500 GB SATA disk could take 3 hours to read sequentially
    – Reading much more data during reconstruction means a greater chance of an uncorrectable media failure, which would result in data loss
  – Increasing number of disks, and use of ATA disks (slower and larger than SCSI disks)
RAID 6: Recovering from 2 failures
• Network Appliance's row-diagonal parity, or RAID-DP
• Like the standard RAID schemes, it uses redundant space based on a parity calculation per stripe
• Since it protects against a double failure, it adds two check blocks per stripe of data
  – With p + 1 disks total, p - 1 disks have data
• The row parity disk is just like in RAID 4
  – Even parity across the other data blocks in its stripe
• Each block of the diagonal parity disk contains the even parity of the blocks in the same diagonal
• Example, p = 5:
  – Row-diagonal parity starts by recovering one of the 4 blocks on a failed disk using diagonal parity
  – Since each diagonal misses one disk, and all diagonals miss a different disk, 2 diagonals are missing only 1 block
  – Once the data for those blocks is recovered, the standard RAID recovery scheme can be used to recover two more blocks in the standard RAID 4 stripes
  – The process continues until both failed disks are restored (see the sketch below)
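
A toy sketch of the idea for p = 5, with XOR standing in for even parity and a simple "peeling" loop that recovers any row or stored diagonal missing exactly one block. The layout and loop are a reconstruction of the scheme described above, not NetApp's implementation:

    import random
    from functools import reduce

    # p = 5: disks 0-3 hold data, disk 4 is row parity, disk 5 is diagonal
    # parity; each stripe has p - 1 = 4 rows. Block values are bytes here.
    p, ROWS = 5, 4

    def xor(vals):
        return reduce(lambda a, b: a ^ b, vals, 0)

    blocks = {(r, d): random.randrange(256) for r in range(ROWS) for d in range(4)}
    for r in range(ROWS):            # row parity over the data disks
        blocks[(r, 4)] = xor(blocks[(r, d)] for d in range(4))
    for diag in range(p - 1):        # diagonal parity over data + row parity
        blocks[(diag, 5)] = xor(blocks[(r, d)] for r in range(ROWS)
                                for d in range(5) if (r + d) % p == diag)

    truth = dict(blocks)
    missing = {(r, d) for r in range(ROWS) for d in (0, 1)}  # lose two disks
    for b in missing:
        blocks[b] = None

    while missing:                   # peel rows and diagonals alternately
        for r in range(ROWS):
            group = [(r, d) for d in range(5)]                 # row incl. parity
            gaps = [b for b in group if b in missing]
            if len(gaps) == 1:
                blocks[gaps[0]] = xor(blocks[b] for b in group if b != gaps[0])
                missing.remove(gaps[0])
        for diag in range(p - 1):    # stored diagonals incl. their parity block
            group = [(r, d) for r in range(ROWS)
                     for d in range(5) if (r + d) % p == diag] + [(diag, 5)]
            gaps = [b for b in group if b in missing]
            if len(gaps) == 1:
                blocks[gaps[0]] = xor(blocks[b] for b in group if b != gaps[0])
                missing.remove(gaps[0])

    assert blocks == truth
    print("both failed disks recovered")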
I/O - Introduction
• I/O devices can be characterized by:
  – Behavior: input, output, storage
  – Partner: human or machine
  – Data rate: bytes/sec, transfers/sec
• I/O bus connections
I/O System Characteristics
• Dependability is important
  – Particularly for storage devices
• Performance measures
  – Latency (response time)
  – Throughput (bandwidth)
• Desktops & embedded systems
  – Primary focus is response time & diversity of devices
• Servers
  – Primary focus is throughput & expandability of devices
Typical x86 PC I/O System
I/O Register Mapping
• Memory mapped I/O
  – Registers are addressed in the same space as memory
  – An address decoder distinguishes between them
  – The OS uses the address translation mechanism to make them accessible only to the kernel
• I/O instructions
  – Separate instructions to access I/O registers
  – Can only be executed in kernel mode
  – Example: x86
Polling
• Periodically check the I/O status register
  – If the device is ready, do the operation
  – If error, take action
• Common in small or low-performance real-time embedded systems
  – Predictable timing
  – Low hardware cost
• In other systems, wastes CPU time (see the sketch below)
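
A minimal sketch of the polling loop; real code would read a device status register, so the register-reading helpers and status bits below are stand-ins invented for illustration:

    import random

    READY, ERROR = 0x1, 0x2      # made-up status bits

    def read_status() -> int:
        """Stand-in for a status-register read; becomes ready eventually."""
        return READY if random.random() < 0.1 else 0

    def read_data() -> int:
        """Stand-in for a data-register read."""
        return 42

    def poll_and_read() -> int:
        while True:              # periodically check the status register
            status = read_status()
            if status & ERROR:   # if error, take action
                raise IOError("device error")
            if status & READY:   # if device ready, do the operation
                return read_data()

    print(poll_and_read())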
Interrupts
• When a device is ready or an error occurs
  – The controller interrupts the CPU
• An interrupt is like an exception
  – But not synchronized to instruction execution
  – Can invoke the handler between instructions
  – Cause information often identifies the interrupting device
• Priority interrupts
  – Devices needing more urgent attention get higher priority
  – Can interrupt the handler for a lower-priority interrupt
I/O Data Transfer
• Polling and interrupt-driven I/O
  – The CPU transfers data between memory and I/O data registers
  – Time-consuming for high-speed devices
• Direct memory access (DMA)
  – The OS provides a starting address in memory
  – The I/O controller transfers to/from memory autonomously
  – The controller interrupts on completion or error (see the sketch below)
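
The DMA handshake from the CPU's point of view, as a toy simulation (the controller class and its behavior are invented to illustrate the sequence, not a real driver API):

    class FakeDMAController:
        """Toy controller: real hardware copies device<->memory autonomously."""
        def start(self, addr: int, length: int, on_complete):
            # The OS has supplied the buffer address and length; the
            # controller runs the transfer, then "interrupts".
            on_complete(status="ok", transferred=length)

    def dma_done(status: str, transferred: int):
        # Interrupt handler: runs on completion or error.
        print(f"DMA interrupt: {transferred} bytes, status={status}")

    dma = FakeDMAController()
    dma.start(addr=0x8000_0000, length=4096, on_complete=dma_done)
    # The CPU is free to do other work while a real transfer is in flight.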
Server Computers
• Applications are increasingly run on servers
  – Web search, office apps, virtual worlds, …
• Requires large data center servers
  – Multiple processors, network connections, massive storage
  – Space and power constraints
• Server equipment is built for 19” racks
  – Multiples of 1.75” (1U) high
Rack-Mounted Servers
• Sun Fire x4150 1U server
  – 4 cores each
  – 16 × 4GB = 64GB DRAM
Concluding Remarks
• I/O performance measures
  – Throughput, response time
  – Dependability and cost are also important
• Buses are used to connect CPU, memory, and I/O controllers
  – Polling, interrupts, DMA
• RAID
  – Improves performance and dependability
• Please read Sections 6.1 – 6.10, P&H 4th Ed.

THINK: Weekend!!
The best way to predict the future is to create it.
Peter Drucker