Introduction to Computer Systems 15-213/18

Storage Technologies and Memory Hierarchy
CSCI 224 / ECE 317: Computer Architecture
Instructor: Prof. Jason Fritts
Slides adapted from Bryant & O'Hallaron's slides

Storage Technologies and Memory Hierarchy

 Storage Technologies and Trends
     RAM
     Nonvolatile memory
     Disk
     Trends
 Locality, Cache, and Memory Hierarchy to the Rescue!

Random-Access Memory (RAM)

 Basic storage unit is normally a cell (one bit per cell).
 Static RAM (SRAM)
     Each cell stores a bit with a four- or six-transistor circuit.
     Retains its value indefinitely, as long as it is kept powered.
 Dynamic RAM (DRAM)
     Each cell stores a bit with a capacitor; one transistor is used for access.
     Value must be refreshed every 10-100 ms.

         Trans. per bit   Access time   Needs refresh?   Cost   Applications
SRAM     4 or 6           1x            No               100x   Cache memories
DRAM     1                10x           Yes              1x     Main memories, frame buffers

Conventional DRAM Organization

 D x W DRAM:
     organized as D supercells of size W bits
     access the desired supercell by first sending the row address (RAS), then the column address (CAS)

[Figure: a 16 x 8 DRAM chip connected to the memory controller by a 2-bit addr bus and an 8-bit data bus. The chip is organized as a 4 x 4 array of 8-bit supercells. To read supercell (2,1), the controller first sends RAS = 2 and the chip copies row 2 into its internal row buffer; the controller then sends CAS = 1, and the chip returns supercell (2,1) from the row buffer over the data bus to the CPU.]
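
As a rough illustration of the RAS/CAS protocol above, here is a minimal sketch (not from the slides; names and the linear index are illustrative) of how a memory controller could split a supercell number into row and column addresses for this 16 x 8 chip:

```c
/* Splitting a linear supercell index into RAS/CAS addresses for a
 * 16 x 8 DRAM organized as a 4 x 4 grid of 8-bit supercells.
 * An illustrative sketch, not a real controller interface. */
#include <stdio.h>

enum { ROWS = 4, COLS = 4 };       /* 16 supercells total */

int main(void) {
    int index = 9;                 /* hypothetical supercell number */
    int ras = index / COLS;        /* row address strobe  -> row 2 */
    int cas = index % COLS;        /* column address strobe -> col 1 */
    printf("supercell %d -> RAS = %d, CAS = %d\n", index, ras, cas);
    /* The chip first copies row `ras` into its internal row buffer,
     * then returns the 8-bit supercell at column `cas`. */
    return 0;
}
```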

Memory Modules

[Figure: a 64 MB memory module consisting of eight 8M x 8 DRAMs (DRAM 0 ... DRAM 7). To fetch the 64-bit doubleword at main memory address A, the memory controller broadcasts addr (row = i, col = j) to all eight chips, and each returns its supercell (i,j): DRAM 0 supplies bits 0-7, DRAM 1 bits 8-15, ..., DRAM 7 bits 56-63. The controller concatenates the eight bytes into the 64-bit doubleword.]
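
The controller's job of gluing the eight byte lanes back together can be sketched in a few lines of C. This is illustrative only; the slides do not specify an ordering convention, so little-endian lane order is assumed:

```c
/* Assembling the 64-bit doubleword at address A from the eight bytes
 * returned by DRAM 0..7, where DRAM k supplies bits 8k..8k+7.
 * Illustrative sketch; the lane ordering is an assumption. */
#include <stdint.h>
#include <stdio.h>

uint64_t assemble_doubleword(const uint8_t lane[8]) {
    uint64_t word = 0;
    for (int k = 0; k < 8; k++)
        word |= (uint64_t)lane[k] << (8 * k);  /* DRAM k -> bits 8k..8k+7 */
    return word;
}

int main(void) {
    /* bytes returned by DRAM 0..7 for some (row, col) */
    uint8_t lanes[8] = {0xEF, 0xBE, 0xAD, 0xDE, 0x78, 0x56, 0x34, 0x12};
    printf("doubleword = 0x%016llx\n",
           (unsigned long long)assemble_doubleword(lanes));
    return 0;
}
```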

Enhanced DRAMs

 The basic DRAM cell has not changed since its invention in 1966
     Commercialized by Intel in 1970
 DRAM cores with better interface logic and faster I/O:
     Synchronous DRAM (SDRAM)
         Uses a conventional clock signal instead of asynchronous control
     Double data-rate synchronous DRAM (DDR SDRAM)
         Double-edge clocking sends two bits per cycle per pin
         Different types are distinguished by the size of a small prefetch buffer:
         – DDR (2 bits), DDR2 (4 bits), DDR3 (8 bits)
     DDR RAM is the standard for most server and desktop systems
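
To make "two bits per cycle per pin" concrete, a back-of-the-envelope peak-bandwidth calculation helps. The DDR3-1600 figures below are illustrative assumptions, not taken from the slides:

```c
/* Peak transfer rate of a DDR module: data moves on both clock edges,
 * so transfers/sec = 2 x bus clock.  DDR3-1600 numbers (800 MHz I/O
 * clock, 64-bit data path) are assumed here for illustration. */
#include <stdio.h>

int main(void) {
    double bus_clock_mhz   = 800.0;                /* I/O bus clock */
    double bus_width_bytes = 8.0;                  /* 64-bit data bus */
    double mtransfers      = 2.0 * bus_clock_mhz;  /* double edge: 1600 MT/s */
    double peak_gb_s       = mtransfers * bus_width_bytes / 1000.0;
    printf("peak transfer rate = %.1f GB/s\n", peak_gb_s);  /* 12.8 GB/s */
    return 0;
}
```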

Nonvolatile Memories

 DRAM and SRAM are volatile memories
     Data is lost when powered off
 Nonvolatile memories retain their value even if powered off
     Read-only memory (ROM) – programmed during production
     Programmable ROM (PROM) – can be programmed once
     Electrically-erasable PROM (EEPROM) – electronic erase capability
     Flash memory – EEPROMs with partial (sector) erase capability
         Wears out after about 100,000 erasings
 Uses for nonvolatile memories
     Firmware programs stored in a ROM (BIOS, disk controllers, …)
     Solid state disks (USB sticks, smart phones, tablets, laptops, …)
     Disk caches

Traditional Bus Structure Connecting CPU and Memory

 A bus is a collection of parallel wires that carry address, data, and control signals
     bus width in the CPU defines the architecture width (32-bit, 64-bit, etc.)
 Buses are typically shared by multiple devices

[Figure: the CPU chip (register file, ALU, bus interface) connects over the system bus to an I/O bridge, which connects over the memory bus to main memory.]

Memory Read Transaction (1)

 CPU places address A on the memory bus.

[Figure: load operation movl A, %eax. The CPU's bus interface drives address A through the I/O bridge onto the memory bus; main memory holds word x at address A.]

Memory Read Transaction (2)

 Main memory reads A from the memory bus, retrieves word x, and places it on the bus.

[Figure: load operation movl A, %eax. Main memory puts word x on the memory bus; the I/O bridge passes it onto the system bus toward the CPU.]

Memory Read Transaction (3)

 CPU reads word x from the bus and copies it into register %eax.

[Figure: load operation movl A, %eax. Word x is now in register %eax; main memory still holds x at address A.]
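
Taken together, the three transactions above service one ordinary load. As a minimal sketch (the variable name is illustrative), reading a global variable in C compiles, on 32-bit x86, to exactly this kind of movl:

```c
/* A C-level view of the load above.  On 32-bit x86, reading the global
 * `a` compiles to roughly `movl a, %eax`, which triggers the three bus
 * transactions shown in the figures. */
int a = 42;           /* word x, stored at some address A in main memory */

int load(void) {
    return a;         /* movl a, %eax */
}
```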

What's Inside A Disk Drive?

[Figure: labeled photograph of an opened drive: spindle, platters, arm, actuator, SCSI connector, and electronics (including a processor and memory!). Image courtesy of Seagate Technology.]

Disk Geometry

 Disks consist of platters, each with two surfaces.
 Each surface consists of concentric rings called tracks.
 Each track consists of sectors separated by gaps.

[Figure: one surface, showing concentric tracks around the spindle; track k is divided into sectors separated by gaps.]

Disk Structure – Top View of Single Platter

[Figure: top view of one platter; the surface is organized into tracks, and each track is divided into sectors.]

Disk Geometry (Multiple-Platter View)

 Aligned tracks form a cylinder.

[Figure: three platters (0-2) on a shared spindle give six surfaces (0-5); track k on each surface lines up vertically to form cylinder k.]

Disk Capacity

 Capacity – maximum number of bits that can be stored.
     Vendors express capacity in units of gigabytes (GB)
 Capacity is determined by these technology factors:
     Recording density (bits/in) – number of bits per segment of a track
     Track density (tracks/in) – number of tracks per radial segment
     Areal density (bits/in²) – product of recording density and track density
 Modern disks partition tracks into disjoint subsets called recording zones
     Each track in a zone has the same number of sectors, determined by the circumference of the innermost track
     Each zone has a different number of sectors/track

Computing Disk Capacity

Capacity = (# bytes/sector) x (avg. # sectors/track) x (# tracks/surface) x (# surfaces/platter) x (# platters/disk)

Example:
     512 bytes/sector
     300 sectors/track (on average)
     20,000 tracks/surface
     2 surfaces/platter
     5 platters/disk

Capacity = 512 x 300 x 20,000 x 2 x 5
         = 30,720,000,000 bytes
         = 30.72 GB
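
The formula transcribes directly into code; this is the slide's own example, just executed:

```c
/* The slide's capacity formula, evaluated with its example numbers. */
#include <stdio.h>

int main(void) {
    double bytes_per_sector     = 512;
    double sectors_per_track    = 300;      /* on average */
    double tracks_per_surface   = 20000;
    double surfaces_per_platter = 2;
    double platters_per_disk    = 5;

    double capacity = bytes_per_sector * sectors_per_track *
                      tracks_per_surface * surfaces_per_platter *
                      platters_per_disk;
    printf("capacity = %.0f bytes = %.2f GB\n", capacity, capacity / 1e9);
    /* prints: capacity = 30720000000 bytes = 30.72 GB */
    return 0;
}
```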

Disk Operation (Single-Platter View)

 The disk surface spins at a fixed rotational rate.
 The read/write head is attached to the end of the arm and flies over the disk surface on a thin cushion of air.
 By moving radially, the arm can position the read/write head over any track.

[Figure: a single platter spinning about its spindle, with the arm sweeping the head radially across the tracks.]

Disk Access

Head in position above a track

Disk Access

Rotation is counter-clockwise

Disk Access – Read

About to read the blue sector

Disk Access – Read

After reading the blue sector

Disk Access – Read

The red request is scheduled next

Disk Access – Seek

Seek to red's track

Disk Access – Rotational Latency

Wait for the red sector to rotate around

Disk Access – Read

Complete the read of red

Disk Access – Service Time Components

Service time = seek + rotational latency + data transfer

Disk Access Time

 Average time to access some target sector is approximated by:
     Taccess = Tavg seek + Tavg rotation + Tavg transfer
 Seek time (Tavg seek)
     Time to position heads over the cylinder containing the target sector.
     Typical Tavg seek is 3-9 ms
 Rotational latency (Tavg rotation)
     Time waiting for the first bit of the target sector to pass under the r/w head.
     Tavg rotation = 1/2 x (1 / RPM) x 60 secs/min
     Typical rotational rate is 7,200 RPM
 Transfer time (Tavg transfer)
     Time to read the bits in the target sector.
     Tavg transfer = (1 / RPM) x 1/(avg # sectors/track) x 60 secs/min

Disk Access Time Example

 Given:
     Rotational rate = 7,200 RPM
     Average seek time = 9 ms
     Avg # sectors/track = 400
 Derived:
     Tavg rotation = 1/2 x (60 secs / 7200 RPM) x 1000 ms/sec ≈ 4 ms
     Tavg transfer = (60 secs / 7200 RPM) x 1/400 x 1000 ms/sec ≈ 0.02 ms
     Taccess = 9 ms + 4 ms + 0.02 ms
 Important points:
     Access time is dominated by seek time and rotational latency.
     The first bit in a sector is the most expensive; the rest are free.
     SRAM access time is about 4 ns/doubleword, DRAM about 60 ns
         Disk is about 40,000 times slower than SRAM, and about 2,500 times slower than DRAM
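
The same model in code, using the slide's numbers (the slide rounds Tavg rotation from 4.17 ms down to 4 ms):

```c
/* The slide's access-time model, Taccess = Tseek + Trotation + Ttransfer,
 * with all times in milliseconds. */
#include <stdio.h>

double t_access_ms(double seek_ms, double rpm, double sectors_per_track) {
    double t_rotation = 0.5 * (60.0 / rpm) * 1000.0;               /* ms */
    double t_transfer = (60.0 / rpm) / sectors_per_track * 1000.0; /* ms */
    return seek_ms + t_rotation + t_transfer;
}

int main(void) {
    /* 7,200 RPM, 9 ms average seek, 400 sectors/track (from the slide) */
    printf("Taccess = %.2f ms\n", t_access_ms(9.0, 7200.0, 400.0));
    /* prints: Taccess = 13.19 ms  (9 + 4.17 + 0.02) */
    return 0;
}
```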

I/O Bus

[Figure: the CPU chip (register file, ALU, bus interface) connects via the system bus to the I/O bridge, which connects via the memory bus to main memory. The I/O bridge also drives the I/O bus, shared by a USB controller (mouse, keyboard), a graphics adapter (monitor), a disk controller (disk), and expansion slots for other devices such as network adapters.]

Reading a Disk Sector (1)

 CPU initiates a disk read by writing a command, logical block number, and destination memory address to a port (address) associated with the disk controller.

Reading a Disk Sector (2)

 Disk controller reads the sector and performs a direct memory access (DMA) transfer into main memory.

Reading a Disk Sector (3)

 When the DMA transfer completes, the disk controller notifies the CPU with an interrupt (i.e., asserts a special "interrupt" pin on the CPU).
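
A hedged sketch of the command sequence described across these three slides; the port addresses, register layout, and names are hypothetical, not any real controller's interface:

```c
/* CPU side of a disk read: write the logical block number, destination
 * address, and command to controller ports, then keep working until the
 * completion interrupt arrives.  All addresses and names hypothetical. */
#include <stdint.h>

#define DISK_LBA_PORT ((volatile uint32_t *)0x8004)  /* hypothetical */
#define DISK_DST_PORT ((volatile uint32_t *)0x8008)  /* hypothetical */
#define DISK_CMD_PORT ((volatile uint32_t *)0x8000)  /* hypothetical */
#define CMD_READ 1

void start_disk_read(uint32_t logical_block, uint32_t dest_addr) {
    *DISK_LBA_PORT = logical_block;  /* which sector to read */
    *DISK_DST_PORT = dest_addr;      /* where the DMA should put it */
    *DISK_CMD_PORT = CMD_READ;       /* kick off the transfer */
    /* The CPU is now free to do other work.  The controller reads the
     * sector, DMAs it into main memory, and then asserts an interrupt
     * to notify the CPU that the transfer is complete. */
}
```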

Solid State Disks (SSDs) vs Rotating Disks

 Advantages
     No moving parts → faster, less power, more rugged
 Disadvantages
     Have the potential to wear out
         Mitigated by "wear leveling logic" in the flash translation layer
         E.g., Intel X25 guarantees 1 petabyte (10^15 bytes) of random writes before it wears out
     As of 2014, about 15 times more expensive per byte
 Applications
     Smart phones and tablets
     Found in some laptops, desktops, and servers

Storage Trends

SRAM
Metric             1980    1985   1990   1995    2000     2005     2010       2010:1980
$/MB               19,200  2,900  320    256     100      75       60         320
access (ns)        300     150    35     15      3        2        1.5        200

DRAM
Metric             1980    1985   1990   1995    2000     2005     2010       2010:1980
$/MB               8,000   880    100    30      1        0.1      0.06       130,000
access (ns)        375     200    100    70      60       50       40         9
typical size (MB)  0.064   0.256  4      16      64       2,000    8,000      125,000

Disk
Metric             1980    1985   1990   1995    2000     2005     2010       2010:1980
$/MB               500     100    8      0.30    0.01     0.005    0.0003     1,600,000
access (ms)        87      75     28     10      8        4        3          29
typical size (MB)  1       10     160    1,000   20,000   160,000  1,500,000  1,500,000

CPU Clock Rates

                           1980   1990   1995     2000   2003   2005    2010     2010:1980
CPU                        8080   386    Pentium  P-III  P-4    Core 2  Core i7  ---
Clock rate (MHz)           1      20     150      600    3300   2000    2500     2500
Cycle time (ns)            1000   50     6        1.6    0.3    0.50    0.4      2500
Cores                      1      1      1        1      1      2       4        4
Effective cycle time (ns)  1000   50     6        1.6    0.3    0.25    0.1      10,000

Inflection point in computer history: designers hit the "Power Wall" (clock rates peak in the 2003 column, then fall while core counts rise).

The CPU-Memory Gap

The gap between DRAM, disk, and CPU speeds.

[Figure: log-scale plot of time (ns) vs. year (1980-2010) for disk seek time, flash SSD access time, DRAM access time, SRAM access time, CPU cycle time, and effective CPU cycle time. Disk seek times stay in the millisecond range, DRAM access times in the tens of nanoseconds, and CPU cycle times fall below a nanosecond, so the gaps widen over time.]

Locality and Cache to the Rescue!

The key to bridging this Processor-Memory gap is a fundamental property of computer programs known as…

Locality

Locality

 Principle of Locality: programs tend to use data and instructions with the same or neighboring addresses as those they have used recently
 Temporal locality:
     Recently referenced items are likely to be referenced again in the near future
 Spatial locality:
     Items with nearby addresses tend to be referenced close together in time

(will discuss locality in more detail later in semester…)
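
A short example in the spirit of Bryant & O'Hallaron's sum-over-a-vector code makes both kinds of locality visible:

```c
/* Summing an array exhibits both kinds of locality:
 *  - temporal: sum and i are referenced on every iteration;
 *  - spatial:  a[0], a[1], ... are accessed in stride-1 order, and the
 *    loop's instructions are fetched repeatedly and in sequence. */
int sum_array(const int a[], int n) {
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += a[i];
    return sum;
}
```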

Caches

Cache memory takes advantage of locality

 Cache:
     A smaller, faster memory device that acts as a staging area for a subset of the data from a larger, slower device
     That subset of data frequently exhibits good locality
 Cache memory can be made very fast using SRAM
     BUT, fast caches are small
     Making caches larger quickly reduces speed
 How to get BOTH size and speed?
     … the Memory Hierarchy

Memory Hierarchy

Smaller, faster, and more expensive per byte toward the top; larger, slower, and cheaper per byte toward the bottom:

L0: Registers – CPU registers hold words retrieved from the L1 cache
L1: L1 cache (SRAM) – holds cache lines retrieved from the L2 cache
L2: L2 cache (SRAM) – holds cache lines retrieved from main memory
L3: Main memory (DRAM) – holds disk blocks retrieved from local disks
L4: Local secondary storage (local disks) – holds files retrieved from disks on remote network servers
L5: Remote secondary storage (tapes, distributed file systems, Web servers)

Caches

 Fundamental idea of a memory hierarchy:
     For each k, the faster, smaller device at level k serves as a cache for the larger, slower device at level k+1.
 Why do memory hierarchies work?
     Because of locality, programs tend to access the data at level k more often than they access the data at level k+1.
     Thus, the storage at level k+1 can be slower, and thus larger and cheaper per bit.
 Big Idea: the memory hierarchy creates a large pool of storage that costs as much as the cheap storage near the bottom, but that serves data to programs at the rate of the fast storage near the top.
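
As a toy illustration of level k caching level k+1 (cache organizations are covered later in the semester; the parameters below are invented for the sketch), a direct-mapped cache can be simulated in a few lines:

```c
/* Toy direct-mapped cache: 8 lines of 16 bytes, fronting a slower
 * "level k+1".  Invented parameters, purely for illustration. */
#include <stdbool.h>
#include <stdio.h>

enum { LINES = 8, LINE_BYTES = 16 };

static long tag_of[LINES];
static bool valid[LINES];

/* Returns true on a hit; on a miss, "fetches" the line from level k+1. */
bool cache_access(long addr, long *misses) {
    long line_no = addr / LINE_BYTES;   /* which memory line */
    int  set     = line_no % LINES;     /* direct-mapped: one slot */
    long tag     = line_no / LINES;
    if (valid[set] && tag_of[set] == tag)
        return true;                    /* served at level k's speed */
    (*misses)++;                        /* go to slower level k+1 */
    valid[set]  = true;
    tag_of[set] = tag;
    return false;
}

int main(void) {
    long misses = 0, accesses = 0;
    /* A stride-1 scan: spatial locality means one miss per 16-byte
     * line, so 15 of every 16 accesses hit. */
    for (long addr = 0; addr < 256; addr++, accesses++)
        cache_access(addr, &misses);
    printf("miss rate = %ld/%ld\n", misses, accesses);  /* 16/256 */
    return 0;
}
```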

Intel Core i7 Cache Hierarchy

[Figure: processor package with four cores (Core 0 … Core 3). Each core has its own registers, L1 d-cache, L1 i-cache, and L2 unified cache; an L3 unified cache is shared by all cores and backed by main memory.]

 L1 i-cache and d-cache: 32 KB, 8-way, access: 4 cycles
 L2 unified cache: 256 KB, 8-way, access: 11 cycles
 L3 unified cache: 8 MB, 16-way, access: 30-40 cycles
 Block size: 64 bytes for all caches

Cache Performance Metrics

 Miss Rate
     Fraction of memory references not found in cache (misses / accesses) = 1 – hit rate
     Typical numbers (in percentages):
         3-10% for L1
         Can be quite small (e.g., < 1%) for L2, depending on size, etc.
 Hit Time
     Time to deliver a line in the cache to the processor
         Includes time to determine whether the line is in the cache
     Typical numbers:
         1-2 clock cycles for L1
         5-20 clock cycles for L2
 Miss Penalty
     Additional time required because of a miss
         Typically 50-200 cycles for main memory (Trend: increasing!)

Let's think about those numbers

 Huge difference between a hit and a miss
     Could be 100x, if just L1 and main memory
 Would you believe 99% hits is twice as good as 97%?
     Consider: cache hit time of 1 cycle, miss penalty of 100 cycles
     Average access time:
         97% hits: 1 cycle + 0.03 x 100 cycles = 4 cycles
         99% hits: 1 cycle + 0.01 x 100 cycles = 2 cycles
 This is why "miss rate" is used instead of "hit rate"
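
The slide's arithmetic as a one-line function:

```c
/* Average access time = hit time + miss rate x miss penalty,
 * evaluated with the slide's numbers. */
#include <stdio.h>

double amat(double hit_time, double miss_rate, double miss_penalty) {
    return hit_time + miss_rate * miss_penalty;
}

int main(void) {
    printf("97%% hits: %.0f cycles\n", amat(1.0, 0.03, 100.0));  /* 4 */
    printf("99%% hits: %.0f cycles\n", amat(1.0, 0.01, 100.0));  /* 2 */
    return 0;
}
```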

Summary

 The speed gap between processor, memory, and mass storage continues to widen.
 Memory hierarchies based on caching close the gap by exploiting locality.