Lect 11: Prediction Intro/Projects


CS 505: Computer Structures
Memory and Disk I/O
Thu D. Nguyen
Spring 2005
Computer Science
Rutgers University
Main Memory Background
• Performance of Main Memory:
– Latency: Cache Miss Penalty
» Access Time: time between request and word arrives
» Cycle Time: time between requests
– Bandwidth: I/O & Large Block Miss Penalty (L2)
• Main Memory is DRAM: Dynamic Random Access Memory
– Dynamic since needs to be refreshed periodically (8 ms)
– Addresses divided into 2 halves (Memory as a 2D matrix):
» RAS or Row Access Strobe
» CAS or Column Access Strobe
• Cache uses SRAM: Static Random Access Memory
– No refresh (6 transistors/bit vs. 1 transistor)
– Size: DRAM/SRAM ratio of 4-8
– Cost/Cycle time: SRAM/DRAM ratio of 8-16
DRAM logical organization (4 Mbit)

[Figure: a 2,048 x 2,048 memory array; the 11 address bits A0…A10 are used twice, once for the row and once, through the column decoder, for the column; sense amps & I/O connect the array to the D (data in) and Q (data out) pins; each storage cell is selected by a word line.]
4 Key DRAM Timing
Parameters
• tRAC: minimum time from RAS line falling to the valid
data output.
– Quoted as the speed of a DRAM when you buy
– A typical 4Mb DRAM tRAC = 60 ns
• tRC: minimum time from the start of one row access
to the start of the next.
– tRC = 110 ns for a 4Mbit DRAM with a tRAC of 60 ns
• tCAC: minimum time from CAS line falling to valid
data output.
– 15 ns for a 4Mbit DRAM with a tRAC of 60 ns
• tPC: minimum time from the start of one column
access to the start of the next.
– 35 ns for a 4Mbit DRAM with a tRAC of 60 ns
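To make the four parameters concrete, here is a minimal sketch in C using the 4 Mbit figures quoted above. It roughly compares reading four words as four full row cycles against opening the row once and then doing back-to-back column accesses spaced by tPC (the page-mode style of access mentioned later in these slides).

#include <stdio.h>

int main(void) {
    const int tRAC = 60, tRC = 110, tPC = 35;      /* ns, from the slide above */
    const int words = 4;

    int separate_rows = words * tRC;               /* one full row cycle per word      */
    int same_row      = tRAC + (words - 1) * tPC;  /* open the row once, then column   */
                                                   /* accesses roughly every tPC ns    */
    printf("4 words, separate row accesses: %d ns\n", separate_rows);  /* 440 ns */
    printf("4 words, same row:              %d ns\n", same_row);       /* 165 ns */
    return 0;
}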
DRAM Performance
• A 60 ns (tRAC) DRAM can
– perform a row access only every 110 ns (tRC)
– perform column access (tCAC) in 15 ns, but time between column
accesses is at least 35 ns (tPC).
» In practice, external address delays and turning around
buses make it 40 to 50 ns
• These times do not include the time to drive the
addresses off the microprocessor nor the memory
controller overhead!
DRAM History
• DRAMs: capacity +60%/yr, cost –30%/yr
– 2.5X cells/area, 1.5X die size in 3 years
• ‘98 DRAM fab line costs $2B
– DRAM only: density, leakage v. speed
• Rely on increasing no. of computers & memory per
computer (60% market)
– SIMM or DIMM is replaceable unit
=> computers use any generation DRAM
• Commodity, second source industry
=> high volume, low profit, conservative
– Little organization innovation in 20 years
• Order of importance: 1) Cost/bit 2) Capacity
– First RAMBUS: 10X BW, +30% cost => little impact
More esoteric Storage Technologies?
• Tunneling Magnetic Junction RAM (TMJ-RAM):
– Speed of SRAM, density of DRAM, non-volatile
(no refresh)
– New field called “Spintronics”: combination of
quantum spin and electronics
– Same technology used in high-density disk-drives
• MEMs storage devices:
– Large magnetic “sled” floating on top of lots of
little read/write heads
– Micromechanical actuators move the sled back
and forth over the heads
MEMS-based Storage
• Magnetic “sled” floats
on array of read/write
heads
– Approx. 250 Gbit/sq. in.
– Data rates:
IBM: 250 MB/s w 1000
heads
CMU: 3.1 MB/s w 400
heads
• Electrostatic actuators
move media around to
align it with heads
– Sweep sled ±50m in <
0.5s
• Capacity estimated to
be in the 1-10GB in
10cm2
See Ganger et all: http://www.lcs.ece.cmu.edu/research/MEMS
Main Memory Performance
• Simple:
– CPU, Cache, Bus, Memory
same width
(32 or 64 bits)
• Wide:
– CPU/Mux 1 word;
Mux/Cache, Bus, Memory
N words (Alpha: 64 bits &
256 bits; UltraSPARC 512)
• Interleaved:
– CPU, Cache, Bus 1 word:
Memory N Modules
(4 Modules); example is
word interleaved
Main Memory Performance
• Timing model (word size is 32 bits)
– 1 to send address,
– 6 access time, 1 to send data
– Cache Block is 4 words
• Simple M.P. = 4 x (1+6+1) = 32
• Wide M.P. = 1 + 6 + 1 = 8
• Interleaved M.P. = 1 + 6 + 4x1 = 11
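A minimal sketch of the same timing model in C (the 1 + 6 + 1 cycle counts and the 4-word block come from this slide; the wide organization is assumed to move a whole block at once, as described above):

#include <stdio.h>

int main(void) {
    const int addr = 1, access = 6, xfer = 1, block_words = 4;

    int simple      = block_words * (addr + access + xfer);  /* each word sequentially    */
    int wide        = addr + access + xfer;                  /* whole block in one access */
    int interleaved = addr + access + block_words * xfer;    /* banks overlap the access, */
                                                             /* words return 1 per cycle  */
    printf("simple = %d, wide = %d, interleaved = %d\n",
           simple, wide, interleaved);                       /* 32, 8, 11 */
    return 0;
}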
How Many Banks?
• Number of banks ≥ Number of clock cycles to access word in bank
– otherwise will return to original bank before it can have next
word ready
• Increasing DRAM size => fewer chips => harder to
have banks
Minimum Memory Size
DRAMs per PC over Time

                       DRAM Generation
Minimum PC memory     '86     '89     '92     '96     '99     '02
                      1 Mb    4 Mb    16 Mb   64 Mb   256 Mb  1 Gb
  4 MB                 32       8
  8 MB                         16       4
 16 MB                                  8       2
 32 MB                                          4       1
 64 MB                                          8       2
128 MB                                                  4       1
256 MB                                                  8       2

(Each entry is the number of DRAM chips of that generation needed to build the minimum memory size.)
Avoiding Bank Conflicts
• Lots of banks
int x[256][512];
for (j = 0; j < 512; j = j+1)
    for (i = 0; i < 256; i = i+1)
        x[i][j] = 2 * x[i][j];
• Even with 128 banks, since 512 is multiple of 128, conflict on
word accesses
• SW: loop interchange or declaring array not power of 2 (“array padding”), see the sketch below
• HW: Prime number of banks
– bank number = address mod number of banks
– address within bank = address / number of words in bank
– modulo & divide per memory access with prime no. banks?
– address within bank = address mod number of words in bank
– bank number? easy if 2^N words per bank
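A minimal sketch of the two software fixes named above, applied to the x[256][512] example; the padding amount and function name are illustrative, not from the slides.

#define PAD 1
int x[256][512 + PAD];         /* "array padding": a row of 513 words is not a multiple of
                                  128 banks, so walking down a column no longer hits the
                                  same bank on every access                              */

void scale(void) {
    /* loop interchange: i outer, j inner, so consecutive accesses are to consecutive
       addresses and therefore fall in different (word-interleaved) banks */
    for (int i = 0; i < 256; i = i + 1)
        for (int j = 0; j < 512; j = j + 1)
            x[i][j] = 2 * x[i][j];
}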
Fast Memory Systems: DRAM specific
• Multiple CAS accesses: several names (page mode)
– Extended Data Out (EDO): 30% faster in page mode
• New DRAMs to address gap;
what will they cost, will they survive?
– RAMBUS: startup company; reinvent DRAM interface
» Each Chip a module vs. slice of memory
» Short bus between CPU and chips
» Does own refresh
» Variable amount of data returned
» 1 byte / 2 ns (500 MB/s per chip)
– Synchronous DRAM: 2 banks on chip, a clock signal to DRAM, transfer
synchronous to system clock (66 - 150 MHz)
• Niche memory or main memory?
– e.g., Video RAM for frame buffers, DRAM + fast serial output
Potential
DRAM Crossroads?
• After 20 years of 4X every 3 years, running into
wall? (64Mb - 1 Gb)
• How can keep $1B fab lines full if buy fewer
DRAMs per computer?
• Cost/bit –30%/yr if stop 4X/3 yr?
• What will happen to $40B/yr DRAM industry?
Main Memory Summary
• Wider Memory
• Interleaved Memory: for sequential or
independent accesses
• Avoiding bank conflicts: SW & HW
• DRAM specific optimizations: page mode &
Specialty DRAM
• DRAM future less rosy?
Virtual Memory: TB (TLB)

[Figure: three ways of combining the CPU, translation buffer (TB), cache ($), and memory (MEM).
1. Conventional organization: the CPU issues a virtual address (VA), the TB translates it to a physical address (PA), and the physically addressed cache is accessed before memory.
2. Virtually addressed cache: the cache is accessed with the VA and translation is done only on a miss; this raises the synonym problem.
3. Overlapped organization: the cache access is overlapped with the VA translation (the L2 cache is physically addressed); this requires the cache index to remain invariant across translation.]
2. Fast hits by Avoiding Address
Translation
• Send virtual address to cache? Called Virtually Addressed Cache or
just Virtual Cache vs. Physical Cache
– Every time process is switched logically must flush the cache; otherwise get false
hits
» Cost is time to flush + “compulsory” misses from empty cache
– Dealing with aliases (sometimes called synonyms);
Two different virtual addresses map to same physical address
– I/O must interact with cache, so need virtual address
• Solution to aliases
– One possible solution in Wang et al.’s paper
• Solution to cache flush
– Add process identifier tag that identifies process as well as address within
process: can’t get a hit if wrong process
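A minimal sketch of the process-identifier idea, using hypothetical structure and field names rather than any particular machine's format: the tag of a virtually addressed cache line carries an address-space ID, so a line filled by one process cannot hit for another and no flush is needed on a context switch.

#include <stdint.h>
#include <stdbool.h>

struct vline {                /* one line of a virtually addressed cache (sketch) */
    bool     valid;
    uint32_t asid;            /* process identifier stored alongside the tag */
    uint32_t vtag;            /* virtual-address tag */
};

bool vcache_hit(const struct vline *line, uint32_t current_asid, uint32_t vtag)
{
    /* a wrong process or a wrong tag is simply a miss */
    return line->valid && line->asid == current_asid && line->vtag == vtag;
}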
2. Fast Cache Hits by Avoiding
Translation: Process ID impact
• Black is uniprocess
• Light Gray is
multiprocess when
flush cache
• Dark Gray is
multiprocess when
use Process ID tag
• Y axis: Miss Rates
up to 20%
• X axis: Cache size
from 2 KB to 1024
KB
2. Fast Cache Hits by Avoiding
Translation: Index with Physical
Portion of Address
• If index is physical part of address, can start tag
access in parallel with translation so that can
compare to physical tag
[Figure: the address split into Page Address | Page Offset, with the cache fields Address Tag | Index | Block Offset; the Index and Block Offset fall entirely within the Page Offset.]
• Limits cache to page size: what if we want bigger caches and still want to use the same trick?
– Higher associativity is one solution
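A minimal sketch of the constraint behind "limits cache to page size": if the index and block offset must come entirely from the page offset, then cache size <= page size x associativity. The 8 KB page size below is an assumption for illustration.

#include <stdio.h>

int main(void) {
    const int page_size = 8 * 1024;     /* bytes; an assumed page size for illustration */
    for (int assoc = 1; assoc <= 8; assoc *= 2)
        printf("%d-way: largest cache indexed entirely by the page offset = %d KB\n",
               assoc, assoc * page_size / 1024);
    return 0;
}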
Alpha 21064
• Separate Instr & Data
TLB & Caches
• TLBs fully associative
• TLB updates in SW
(“Priv Arch Libr”)
• Caches 8KB direct
mapped, write thru
• Critical 8 bytes first
• Prefetch instr. stream
buffer
• 2 MB L2 cache, direct
mapped, WB (off-chip)
• 256 bit path to main
memory, 4 x 64-bit
modules
• Victim Buffer: to give
read priority over
write
• 4 entry write buffer
between D$ & L2$
Alpha Memory Performance: Miss Rates of SPEC92

[Figure: miss rates on a log scale from 0.0001 to 1 for the 8 KB instruction cache (I$), the 8 KB data cache (D$), and the 2 MB L2 cache, across programs such as AlphaSort, Eqntott, Ora, Alvinn, and Spice.]
Alpha CPI Components

• Instruction stall: branch mispredict (green)
• Data cache (blue); Instruction cache (yellow); L2$ (pink)
• Other: compute + reg conflicts, structural conflicts

[Figure: stacked bars breaking CPI (0 to 5) into Other, I Stall, D$, I$, and L2 components for programs such as AlphaSort, Espresso, Sc, Mdljsp2, Ear, Alvinn, and Mdljp2.]
Pitfall: Predicting Cache Performance from Different Programs (ISA, compiler, ...)

• 4 KB Data cache miss rate 8%, 12%, or 28%?
• 1 KB Instr cache miss rate 0%, 3%, or 10%?
• Alpha vs. MIPS for 8 KB Data $: 17% vs. 10%
• Why 2X Alpha v. MIPS?

[Figure: miss rate (0% to 35%) versus cache size (1 KB to 128 KB) for the instruction and data caches of gcc, espresso, and tomcatv.]
Pitfall: Simulating Too Small an Address Trace

[Figure: cumulative average memory access time (1 to 4.5) versus instructions executed (1 to 12 billion) for a simulated configuration of I$ = 4 KB, B = 16 B; D$ = 4 KB, B = 16 B; L2 = 512 KB, B = 128 B; MP = 12, 200.]
Main Memory Summary
• Wider Memory
• Interleaved Memory: for sequential or
independent accesses
• Avoiding bank conflicts: SW & HW
• DRAM specific optimizations: page mode &
Specialty DRAM
• DRAM future less rosy?
Outline
• Disk Basics
• Disk History
• Disk options in 2000
• Disk fallacies and performance
• Tapes
• RAID
Disk Device Terminology
[Figure: disk assembly showing arm, head, inner and outer tracks, sectors, platter, and actuator.]
• Several platters, with information recorded magnetically on both
surfaces (usually)
• Bits recorded in tracks, which in turn divided into sectors (e.g.,
512 Bytes)
• Actuator moves head (end of arm,1/surface) over track (“seek”),
select surface, wait for sector rotate under head, then read or
write
– “Cylinder”: all tracks under heads
Photo of Disk Head, Arm, Actuator
[Photo: disk spindle, arm, head, actuator, and 12 platters.]
Disk Device Performance
[Figure: platter with inner and outer tracks and sectors; head, arm, actuator, spindle, and controller labeled.]
• Disk Latency = Seek Time + Rotation Time + Transfer
Time + Controller Overhead
• Seek Time? depends on the number of tracks the arm must move and the seek speed of the disk
• Rotation Time? depends on how fast the disk rotates and how far the sector is from the head
• Transfer Time? depends on the data rate (bandwidth) of the disk (bit density) and the size of the request
Disk Device Performance
• Average distance sector from head?
• 1/2 time of a rotation
– 7200 Revolutions Per Minute = 120 Rev/sec
– 1 revolution = 1/120 sec = 8.33 milliseconds
– 1/2 rotation (revolution) = 4.16 ms
• Average no. tracks move arm?
– Sum all possible seek distances
from all possible tracks / # possible
» Assumes average seek distance is random
– Disk industry standard benchmark
Data Rate: Inner vs. Outer Tracks
• To keep things simple, disks originally kept the same number of sectors per track
– Since the outer tracks are longer, they had lower bits per inch
• Competition => decided to keep BPI the same for all tracks (“constant bit density”)
– More capacity per disk
– More sectors per track towards the edge
– Since the disk spins at constant speed, outer tracks have a faster data rate
• Bandwidth of the outer track is 1.7X the inner track!
Devices: Magnetic Disks

• Purpose:
– Long-term, nonvolatile storage
– Large, inexpensive, slow level in the storage hierarchy
• Characteristics:
– Seek Time (~8 ms avg)
• Transfer rate
– 10-30 MByte/sec
– Blocks
– 7200 RPM = 120 RPS => 8 ms per rev, ave rot. latency = 4 ms
– 128 sectors per track => 0.25 ms per sector
– 1 KB per sector => 16 MB/s
• Capacity
– Gigabytes
– Quadruples every 3 years (aerodynamics)

[Figure: platter with track, sector, cylinder, and head labeled.]

Response time = Queue + Controller + Seek + Rot + Xfer
Service time = Controller + Seek + Rot + Xfer
Historical Perspective
• 1956 IBM Ramac — early 1970s Winchester
– Developed for mainframe computers, proprietary interfaces
– Steady shrink in form factor: 27 in. to 14 in.
• 1970s developments
– 5.25 inch floppy disk formfactor (microcode into mainframe)
– early emergence of industry standard disk interfaces
» ST506, SASI, SMD, ESDI
• Early 1980s
– PCs and first generation workstations
• Mid 1980s
– Client/server computing
– Centralized storage on file server
» accelerates disk downsizing: 8 inch to 5.25 inch
– Mass market disk drives become a reality
» industry standards: SCSI, IPI, IDE
» 5.25 inch drives for standalone PCs, End of proprietary interfaces
Disk History

[Figure: early IBM disk drives, with data density (Mbit/sq. in.) and capacity of the unit shown (MBytes):
1973: 1.7 Mbit/sq. in., 140 MBytes
1979: 7.7 Mbit/sq. in., 2,300 MBytes]

source: New York Times, 2/23/98, page C3, “Makers of disk drives crowd even more data into even smaller spaces”
Historical Perspective
• Late 1980s/Early 1990s:
– Laptops, notebooks, (palmtops)
– 3.5 inch, 2.5 inch, (1.8 inch formfactors)
– Formfactor plus capacity drives market, not so much
performance
» Recently Bandwidth improving at 40%/ year
– Challenged by DRAM, flash RAM in PCMCIA cards
» still expensive, Intel promises but doesn’t deliver
» unattractive MBytes per cubic inch
– Optical disk fails on performance but finds niche (CD ROM)
Disk History

[Figure, continued: disk drives with data density (Mbit/sq. in.) and capacity of the unit shown (MBytes):
1989: 63 Mbit/sq. in., 60,000 MBytes
1997: 1,450 Mbit/sq. in., 2,300 MBytes
1997: 3,090 Mbit/sq. in., 8,100 MBytes]

source: New York Times, 2/23/98, page C3, “Makers of disk drives crowd even more data into even smaller spaces”
1 inch disk drive!
• 2000 IBM MicroDrive:
– 1.7” x 1.4” x 0.2”
– 1 GB, 3600 RPM,
5 MB/s, 15 ms seek
– Digital camera, PalmPC?
• 2006 MicroDrive?
• 9 GB, 50 MB/s!
– Assuming it finds a niche
in a successful product
– Assuming past trends continue
Disk Performance Model /Trends
• Capacity
– + 100%/year (2X / 1.0 yrs)
• Transfer rate (BW)
– + 40%/year (2X / 2.0 yrs)
• Rotation + Seek time
– -8%/year (1/2 in 10 yrs)
• MB/$
– > 100%/year (2X / <1.5 yrs)
– Fewer chips + areal density
State of the Art: Ultrastar 72ZX

– 73.4 GB, 3.5 inch disk
– 2¢/MB
– 10,000 RPM; 3 ms = 1/2 rotation
– 11 platters, 22 surfaces
– 15,110 cylinders
– 7 Gbit/sq. in. areal density
– 17 watts (idle)
– 0.1 ms controller time
– 5.3 ms avg. seek
– 50 to 29 MB/s (internal)

Latency = Queuing Time + Controller time + per-access time (Seek Time + Rotation Time) + per-byte time (Size / Bandwidth)

[Figure: disk assembly showing track, sector, cylinder, arm, platter, head, and buffer.]

source: www.ibm.com; www.pricewatch.com; 2/14/00
Disk Performance Example
• Calculate time to read 1 sector (512B) for
UltraStar 72 using advertised performance; sector
is on outer track
• Disk latency = average seek time + average
rotational delay + transfer time + controller
overhead
• = 5.3 ms + 0.5 * 1/(10000 RPM)
+ 0.5 KB / (50 MB/s) + 0.15 ms
• = 5.3 ms + 0.5 /(10000 RPM/(60000ms/M))
+ 0.5 KB / (50 KB/ms) + 0.15 ms
• = 5.3 + 3.0 + 0.01 + 0.15 ms = 8.46 ms
Areal Density
• Bits recorded along a track
– Metric is Bits Per Inch (BPI)
• Number of tracks per surface
– Metric is Tracks Per Inch (TPI)
• Care about bit density per unit area
– Metric is Bits Per Square Inch
– Called Areal Density
– Areal Density = BPI x TPI
Areal Density

Year   Areal Density (Mbit/sq. in.)
1973   1.7
1979   7.7
1989   63
1997   3,090
2000   17,100

– Areal Density = BPI x TPI
– Change in slope from 30%/yr to 60%/yr around 1991
Disk Characteristics in 2000

                                Seagate Cheetah    IBM Travelstar    IBM 1GB Microdrive
                                ST173404LC         32GH DJSA-232     DSCM-11000
                                Ultra160 SCSI      ATA-4
Disk diameter (inches)          3.5                2.5               1.0
Formatted data capacity (GB)    73.4               32.0              1.0
Cylinders                       14,100             21,664            7,167
Disks                           12                 4                 1
Recording surfaces (heads)      24                 8                 2
Bytes per sector                512 to 4096        512               512
Avg sectors per track (512 B)   ~424               ~360              ~140
Max areal density (Gbit/sq.in.) 6.0                14.0              15.2
Disk Characteristics in 2000 (continued)

                                Seagate Cheetah    IBM Travelstar    IBM 1GB Microdrive
                                ST173404LC         32GH DJSA-232     DSCM-11000
                                Ultra160 SCSI      ATA-4
Rotation speed (RPM)            10033              5411              3600
Avg. seek ms (read/write)       5.6/6.2            12.0              12.0
Minimum seek ms (read/write)    0.6/0.9            2.5               1.0
Max. seek ms                    14.0/15.0          23.0              19.0
Data transfer rate MB/second    27 to 40           11 to 21          2.6 to 4.2
Link speed to buffer MB/s       160                67                13
Power idle/operating Watts      16.4 / 23.5        2.0 / 2.6         0.5 / 0.8
Disk Characteristics in 2000 (continued)

                                Seagate Cheetah    IBM Travelstar    IBM 1GB Microdrive
                                ST173404LC         32GH DJSA-232     DSCM-11000
                                Ultra160 SCSI      ATA-4
Buffer size in MB               4.0                2.0               0.125
Size: h x w x d inches          1.6 x 4.0 x 5.8    0.5 x 2.7 x 3.9   0.2 x 1.4 x 1.7
Weight pounds                   2.00               0.34              0.035
Rated MTTF in powered-on hours  1,200,000          (300,000?)        (20K/5 yr life?)
% of POH per month              100%               45%               20%
% of POH seeking, reading,
  writing                       90%                20%               20%
Disk Characteristics in 2000 (continued)

                                Seagate Cheetah    IBM Travelstar    IBM 1GB Microdrive
                                ST173404LC         32GH DJSA-232     DSCM-11000
                                Ultra160 SCSI      ATA-4
Load/Unload cycles
  (disk powered on/off)         250 per year       300,000           300,000
Nonrecoverable read errors
  per bits read                 < 1 per 10^15      < 1 per 10^13     < 1 per 10^13
Seek errors                     < 1 per 10^7       not available     not available
Shock tolerance:
  Operating, Not operating      10 G, 175 G        150 G, 700 G      175 G, 1500 G
Vibration tolerance:
  Operating, Not operating      5-400 Hz @ 0.5G,   5-500 Hz @ 1.0G,  5-500 Hz @ 1G,
  (sine swept, 0 to peak)       22-400 Hz @ 2.0G   2.5-500 Hz @ 5.0G 10-500 Hz @ 5G
Technology Trends

Disk capacity now doubles every 12 months; before 1990 it doubled every 36 months.

• Today: Processing Power Doubles Every 18 months
• Today: Memory Size Doubles Every 18-24 months (4X/3yr)
• Today: Disk Capacity Doubles Every 12-18 months
• Disk Positioning Rate (Seek + Rotate) Doubles Every Ten Years!
=> the I/O GAP
Fallacy: Use Data Sheet “Average
Seek” Time
• Manufacturers needed standard for fair comparison
(“benchmark”)
– Calculate all seeks from all tracks, divide by number of seeks =>
“average”
• Real average would be based on how data laid out
on disk, where seek in real applications, then
measure performance
– Usually, tend to seek to tracks nearby, not to random track
• Rule of Thumb: observed average seek time is
typically about 1/4 to 1/3 of quoted seek time
(i.e., 3X-4X faster)
– UltraStar 72 avg. seek: 5.3 ms => 1.7 ms
Fallacy: Use Data Sheet Transfer
Rate
• Manufacturers quote the speed of the data rate
off the surface of the disk
• Sectors contain an error detection and correction
field (can be 20% of sector size) plus sector
number as well as data
• There are gaps between sectors on track
• Rule of Thumb: disks deliver about 3/4 of internal
media rate (1.3X slower) for data
• For example, UltraStar 72 quotes
50 to 29 MB/s internal media rate
• => Expect 37 to 22 MB/s user data rate
Disk Performance Example
• Calculate time to read 1 sector for UltraStar 72
again, this time using 1/3 quoted seek time, 3/4 of
internal outer track bandwidth (8.46 ms before)
• Disk latency = average seek time + average
rotational delay + transfer time + controller
overhead
• = (0.33 * 5.3 ms) + 0.5 * 1/(10000 RPM)
+ 0.5 KB / (0.75 * 50 MB/s) + 0.15 ms
• = 1.77 ms + 0.5 /(10000 RPM/(60000ms/M))
+ 0.5 KB / (37.5 KB/ms) + 0.15 ms
• = 1.77 + 3.0 + 0.01 + 0.15 ms = 4.93 ms
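The two calculations above fold into one small helper; this is a minimal sketch in C (the parameter values are the Ultrastar 72ZX figures used in these examples, and the helper name is illustrative):

#include <stdio.h>

static double disk_latency_ms(double seek_ms, double rpm,
                              double xfer_kb, double bw_mb_s,
                              double ctrl_ms) {
    double rotation_ms = 0.5 * 60000.0 / rpm;     /* half a revolution, in ms      */
    double transfer_ms = xfer_kb / bw_mb_s;       /* MB/s is the same as KB/ms     */
    return seek_ms + rotation_ms + transfer_ms + ctrl_ms;
}

int main(void) {
    /* advertised numbers: 5.3 ms seek, 10,000 RPM, 50 MB/s, 0.15 ms controller */
    printf("advertised:     %.2f ms\n",
           disk_latency_ms(5.3, 10000, 0.5, 50.0, 0.15));        /* 8.46 ms */
    /* rule-of-thumb numbers: 1/3 of quoted seek, 3/4 of internal media rate   */
    printf("rule of thumb:  %.2f ms\n",
           disk_latency_ms(5.3 / 3.0, 10000, 0.5, 0.75 * 50.0, 0.15));  /* ~4.93 ms */
    return 0;
}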
Future Disk Size and Performance
• Continued advance in capacity (60%/yr) and
bandwidth (40%/yr)
• Slow improvement in seek, rotation (8%/yr)
• Time to read whole disk:
Year   Sequentially   Randomly (1 sector/seek)
1990   4 minutes      6 hours
2000   12 minutes     1 week (!)
• 3.5” form factor make sense in 5-7 yrs?
SCSI: Small Computer System
Interface
• Clock rate: 5 MHz / 10 MHz (Fast) / 20 MHz (Ultra), up to 80 MHz (Ultra3)
• Width: n = 8 bits / 16 bits (wide); up to n – 1 devices to
communicate on a bus or “string”
• Devices can be slave (“target”) or master(“initiator”)
• SCSI protocol: a series of “phases”, during which specific
actions are taken by the controller and the SCSI disks
– Bus Free: No device is currently accessing the bus
– Arbitration: When the SCSI bus goes free, multiple devices may request
(arbitrate for) the bus; fixed priority by address
– Selection: informs the target that it will participate (Reselection if
disconnected)
– Command: the initiator reads the SCSI command bytes from host
memory and sends them to the target
– Data Transfer: data in or out, initiator: target
– Message Phase: message in or out, initiator: target (identify,
save/restore data pointer, disconnect, command complete)
– Status Phase: target, just before command complete
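As a compact reference, the phases above can be written down as an enumeration; this is only a paraphrase of the slide's list in C, not a real driver or host-adapter API.

/* SCSI bus phases, in roughly the order a simple command passes through them */
enum scsi_phase {
    SCSI_BUS_FREE,       /* no device is currently accessing the bus               */
    SCSI_ARBITRATION,    /* devices request the bus; fixed priority by address     */
    SCSI_SELECTION,      /* initiator informs the target it will participate       */
                         /* (reselection if the target had disconnected)           */
    SCSI_COMMAND,        /* command bytes flow from initiator to target            */
    SCSI_DATA_TRANSFER,  /* data in or out between initiator and target            */
    SCSI_MESSAGE,        /* identify, save/restore data pointer, disconnect,       */
                         /* command complete                                       */
    SCSI_STATUS          /* target reports status just before command complete     */
};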
Use Arrays of Small Disks?

• Katz and Patterson asked in 1987: can smaller disks be used to close the gap in performance between disks and CPUs?

[Figure: conventional designs choose one of 4 disk form factors (3.5”, 5.25”, 10”, 14”) spanning low end to high end; a disk array uses many disks of a single 3.5” design.]
Replace Small Number of Large Disks with Large Number of Small Disks! (1988 Disks)

             IBM 3390K    IBM 3.5" 0061   x70
Capacity     20 GBytes    320 MBytes      23 GBytes
Volume       97 cu. ft.   0.1 cu. ft.     11 cu. ft.    9X
Power        3 KW         11 W            1 KW          3X
Data Rate    15 MB/s      1.5 MB/s        120 MB/s      8X
I/O Rate     600 I/Os/s   55 I/Os/s       3900 I/Os/s   6X
MTTF         250 KHrs     50 KHrs         ??? Hrs
Cost         $250K        $2K             $150K

Disk Arrays have potential for large data and I/O rates, high MB per cu. ft., high MB per KW, but what about reliability?
Array Reliability
• Reliability of N disks = Reliability of 1 Disk ÷ N
50,000 Hours ÷ 70 disks = 700 hours
Disk system MTTF: Drops from 6 years to 1 month!
• Arrays (without redundancy) too unreliable to be useful!
Hot spares support reconstruction in parallel with
access: very high media availability can be achieved
Redundant Arrays of (Inexpensive)
Disks
• Files are "striped" across multiple disks
• Redundancy yields high data availability
– Availability: service still provided to user, even if some
components failed
• Disks will still fail
• Contents reconstructed from data redundantly
stored in the array
– Capacity penalty to store redundant info
– Bandwidth penalty to update redundant info
Redundant Arrays of Inexpensive Disks
RAID 1: Disk Mirroring/Shadowing
recovery
group
• Each disk is fully duplicated onto its “mirror”
Very high availability can be achieved
• Bandwidth sacrifice on write:
Logical write = two physical writes
• Reads may be optimized
• Most expensive solution: 100% capacity overhead
• (RAID 2 not interesting, so skip)
Redundant Array of Inexpensive
Disks RAID 3: Parity Disk
[Figure: a logical record (10010011 11001101 10010011 ...) is striped bit-wise as physical records across the data disks of a recovery group, with one parity disk P per stripe.]

P contains the sum of the other disks per stripe, mod 2 (“parity”).
If a disk fails, subtract P from the sum of the other disks to find the missing information.
RAID 3
• Sum computed across recovery group to protect
against hard disk failures, stored in P disk
• Logically, a single high capacity, high transfer rate
disk: good for large transfers
• Wider arrays reduce capacity costs, but decreases
availability
• 33% capacity cost for parity in this configuration
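A minimal sketch of the parity arithmetic in C (the disk count, stripe size, and sample bytes are illustrative; the bytes echo the 10010011/11001101 record on the previous slide): P is the XOR of the data disks, and XORing P with the surviving disks regenerates a lost one.

#include <stdint.h>
#include <stdio.h>

enum { NDATA = 4, STRIPE = 8 };                      /* data disks, bytes per stripe */

/* parity = XOR (sum mod 2) of the data disks, byte by byte */
void make_parity(uint8_t data[NDATA][STRIPE], uint8_t parity[STRIPE]) {
    for (int b = 0; b < STRIPE; b++) {
        parity[b] = 0;
        for (int d = 0; d < NDATA; d++)
            parity[b] ^= data[d][b];
    }
}

/* regenerate one failed disk by XORing P with the surviving disks */
void rebuild(uint8_t data[NDATA][STRIPE], const uint8_t parity[STRIPE], int failed) {
    for (int b = 0; b < STRIPE; b++) {
        uint8_t v = parity[b];
        for (int d = 0; d < NDATA; d++)
            if (d != failed)
                v ^= data[d][b];
        data[failed][b] = v;
    }
}

int main(void) {
    uint8_t data[NDATA][STRIPE] = { {0x93}, {0xCD}, {0x93}, {0x00} }, p[STRIPE];
    make_parity(data, p);
    data[1][0] = 0;                                  /* "lose" disk 1 ...            */
    rebuild(data, p, 1);                             /* ... and recover it from P    */
    printf("recovered byte: 0x%02X\n", data[1][0]);  /* prints 0xCD                  */
    return 0;
}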
Inspiration for RAID 4
• RAID 3 relies on parity disk to discover errors
on Read
• But every sector has an error detection field
• Rely on error detection field to catch errors on
read, not on the parity disk
• Allows independent reads to different disks
simultaneously
Redundant Arrays of Inexpensive Disks RAID 4: High I/O Rate

[Figure: the insides of 5 disks, shown as columns. Each stripe holds four data blocks plus a parity block, and all parity lives on a dedicated disk: D0 D1 D2 D3 + P, D4 D5 D6 D7 + P, D8 D9 D10 D11 + P, D12 D13 D14 D15 + P, D16 D17 D18 D19 + P, D20 D21 D22 D23 + P, and so on. Logical disk addresses increase down the columns.]

Example: a small read touches D0 & D5; a large write touches D12-D15.
Inspiration for RAID 5
• RAID 4 works well for small reads
• Small writes (write to one disk):
– Option 1: read other data disks, create new sum and write to
Parity Disk
– Option 2: since P has old sum, compare old data to new data,
add the difference to P
• Small writes are limited by Parity Disk: Write to
D0, D5 both also write to P disk
[Figure: two stripes, D0 D1 D2 D3 + P and D4 D5 D6 D7 + P, showing that writes to D0 and D5 must both also update the single parity disk P.]
Redundant Arrays of Inexpensive Disks RAID 5: High I/O Rate Interleaved Parity

Independent writes are possible because of the interleaved parity.

[Figure: the insides of 5 disks, shown as columns, with the parity block rotated from stripe to stripe: D0 D1 D2 D3 P, D4 D5 D6 P D7, D8 D9 P D10 D11, D12 P D13 D14 D15, P D16 D17 D18 D19, D20 D21 D22 D23 P, and so on. Logical disk addresses increase down the columns.]

Example: a write to D0 and D5 uses disks 0, 1, 3, and 4.
Problems of Disk Arrays: Small Writes

RAID-5: Small Write Algorithm
1 Logical Write = 2 Physical Reads + 2 Physical Writes

[Figure: writing new data D0' into a stripe D0 D1 D2 D3 with parity P:
1. Read the old data D0.
2. Read the old parity P.
3. Write the new data D0'.
4. XOR the old data with the new data and with the old parity, and write the result as the new parity P'.]
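A minimal sketch of the parity update at the heart of the small-write algorithm (the function name and sample bytes are illustrative): the new parity is the old parity XOR old data XOR new data, which is why only the data disk and the parity disk need be touched.

#include <stdint.h>
#include <stdio.h>

/* bits that changed in the data block must also be flipped in the parity block */
static uint8_t raid5_new_parity(uint8_t old_data, uint8_t new_data, uint8_t old_parity) {
    return old_parity ^ (old_data ^ new_data);
}

int main(void) {
    uint8_t d_old = 0x93, d_new = 0xA7, p_old = 0x3C;
    printf("new parity: 0x%02X\n", raid5_new_parity(d_old, d_new, p_old));  /* 0x08 */
    return 0;
}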
System Availability: Orthogonal RAIDs

[Figure: an array controller drives several string controllers, each with its own string of disks.]

Data Recovery Group: unit of data redundancy
Redundant Support Components: fans, power supplies, controller, cables
End to End Data Integrity: internal parity protected data paths
System-Level Availability

Goal: No Single Points of Failure

[Figure: two hosts, each with an I/O controller, connected to fully dual redundant array controllers; recovery groups of disks are reachable through duplicated paths.]

With duplicated paths, higher performance can be obtained when there are no failures.
Summary: Redundant Arrays of Disks (RAID) Techniques

• Disk Mirroring, Shadowing (RAID 1)
Each disk is fully duplicated onto its "shadow"
Logical write = two physical writes
100% capacity overhead
• Parity Data Bandwidth Array (RAID 3)
Parity computed horizontally
Logically a single high data bw disk
• High I/O Rate Parity Array (RAID 5)
Interleaved parity blocks
Independent reads and writes
Logical write = 2 reads + 2 writes
Parity + Reed-Solomon codes