Gross, Joseph MS Defense - University of Maryland, College Park


High-Performance DRAM System Design Constraints and Considerations
by: Joseph Gross
August 2, 2010
Table of Contents

• Background
  ◦ Devices and organizations
• DRAM Protocol
  ◦ Operations and timing constraints
• Power Analysis
• Experimental Setup
  ◦ Policies and Algorithms
• Results
• Conclusions
• Appendix
What is the Problem?
• Controller performance is sensitive to policies and parameters
• Real simulations show surprising behaviors
• Policies interact in non-trivial and non-linear ways
DRAM Devices – 1T1C Cell
• The row address is decoded and chooses the wordline
• Values are sent across the bitline to the sense amps
• Very space-efficient, but must be refreshed

[Figure: 1T1C DRAM cell connected to a wordline and a bitline]
Organization – Rows and Columns
• Can only read from or write to an active row
• Can access a row after it is sensed but before the data is restored
• Can read or write to any column within a row
• Row reuse avoids having to sense and restore new rows

[Figure: DRAM array with the active row held in the sense amps]
DRAM Operation
[Figure: DDR SDRAM device block diagram – control logic, mode register, refresh counter, command decoder, bank controller, row latch/decoders, DRAM arrays with sense amps, I/O gating, read latch, write drivers, and input/output data registers; numbered steps trace a command from the CLK/CKE/CS#/RAS#/CAS#/WE# and ADDR inputs to the DATA pins]
Organization
• One memory controller per channel
• 1-4 ranks/DIMM in a JEDEC system
• Registered DIMMs at slower speeds may have more DIMMs/channel

[Figure: Channel 0 with Memory Controller 0 driving DIMM 0 and DIMM 1; each DIMM holds Ranks 0-3 split across its front and back]
A Read Cycle
• Activate the row and wait for it to be sensed before issuing the read
• Data begins to be sent after tCAS
• Precharge once the row is restored

[Figure: read-cycle timing diagram – ACT, Read, and Pre commands on the command bus; row sense, row restore, bank access, and bank precharge at the bank/sense amps; the data burst follows I/O gating; intervals labeled tRCD, tCAS, tBurst, tRAS, tRP, and tRC]
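The timing parameters in the figure compose in a simple way: the read can issue tRCD after the activate, the first data beat appears tCAS later, and the bank cannot be reactivated until tRAS + tRP (= tRC) after the activate. A minimal sketch, using illustrative DDR3-style cycle counts rather than values from this talk:

```python
# Read-cycle timing relationships from the previous slide.
# The parameter values are illustrative DDR3-style numbers in
# memory-clock cycles, not figures from this talk.

timing = {
    "tRCD":   13,  # ACT -> READ/WRITE to the same bank
    "tCAS":   13,  # READ -> first data beat (CL)
    "tBurst":  4,  # cycles to transfer one burst (BL8 on a DDR bus)
    "tRAS":   36,  # ACT -> PRE (row must be restored first)
    "tRP":    13,  # PRE -> next ACT to the same bank
}

def read_latency(t):
    """Cycles from ACT until the burst finishes, row initially closed."""
    return t["tRCD"] + t["tCAS"] + t["tBurst"]

def row_cycle_time(t):
    """Minimum ACT-to-ACT spacing for one bank: tRC = tRAS + tRP."""
    return t["tRAS"] + t["tRP"]

print("read latency:", read_latency(timing), "cycles")
print("tRC:", row_cycle_time(timing), "cycles")
```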
Command Interactions
• Commands must wait for resources to be available
• The data, address, and command buses must be available
• Other banks and ranks can affect timing (tRTRS, tFAW)

[Figure: timing diagram of a Read and Pre to bank/sense amp A followed by an ACT and Write to bank/sense amp B, with two data bursts on the shared bus; intervals labeled tCMD, tRP, tRCD, tCWD, and tCMD + tRP + tRCD]
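One of the rank-level constraints mentioned above, tFAW, can be checked with a small sliding window: at most four ACTs may issue to one rank in any tFAW-long interval. A sketch of that single rule in isolation, with illustrative names and values:

```python
from collections import deque

class RankActivationWindow:
    """Tracks the four-activate window (tFAW) for one rank."""

    def __init__(self, tFAW=24):
        self.tFAW = tFAW
        self.recent_acts = deque(maxlen=4)  # issue times of the last 4 ACTs

    def earliest_act(self, now):
        """Earliest cycle >= now at which another ACT may issue to this rank."""
        if len(self.recent_acts) < 4:
            return now
        # the fifth ACT must wait tFAW after the fourth-most-recent ACT
        return max(now, self.recent_acts[0] + self.tFAW)

    def issue_act(self, time):
        self.recent_acts.append(time)

# a real controller would also check tRRD, tRC, and bus availability
rank = RankActivationWindow(tFAW=24)
for t in (0, 6, 12, 18):
    rank.issue_act(t)
print(rank.earliest_act(20))  # -> 24: must wait for the window to expire
```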
Power Modeling
• Based on Micron guidelines (TN-41-01)
• Calculates background and event power

[Figure: timing diagram aligning the command stream (ACT, Read, Pre) with the resulting activation, read, and precharge current events on top of the background current]
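The TN-41-01 method splits power into background terms, weighted by how long banks sit open, and per-event terms for activates, reads, and writes. The sketch below follows that structure only roughly; the IDD currents, voltage, and timings are placeholders, not values from a datasheet or from these results:

```python
VDD = 1.5  # volts

IDD2N = 0.035  # precharge standby current (A)
IDD3N = 0.045  # active standby current (A)
IDD0  = 0.065  # one-bank activate/precharge current averaged over tRC (A)
IDD4R = 0.180  # burst read current (A)

tRAS, tRC, tBurst = 36, 49, 4  # cycles
tCK = 1.25e-9                  # seconds per cycle

def background_power(fraction_active):
    """Standby power, weighted by the fraction of time any bank is open."""
    return VDD * (fraction_active * IDD3N + (1 - fraction_active) * IDD2N)

def act_pre_energy():
    """Energy of one activate/precharge pair beyond the standby current."""
    standby = (IDD3N * tRAS + IDD2N * (tRC - tRAS)) / tRC
    return VDD * (IDD0 - standby) * tRC * tCK

def read_energy():
    """Energy of one read burst beyond active standby."""
    return VDD * (IDD4R - IDD3N) * tBurst * tCK

print(background_power(0.6), act_pre_energy(), read_energy())
```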
Controller Design
• Address Mapping Policy
• Row Buffer Management Policy
• Command Ordering Policy
• Pipelined operation with reordering

[Figure: DRAMsimII controller organization – requests from CPUs/networks enter the BIU and transaction queue (transaction ordering algorithm, timing parameters), pass through a decode delay to the command generator/scheduler (command ordering algorithm), and are placed in per-bank command queues for each rank and channel (row buffer management policy, address mapping policy), alongside a refresh queue]
Controller Design
[Figure: simplified DRAMsimII pipeline – transaction queue, refresh queue, and per-bank command queues feeding the command/address/data bus, annotated with where the Address Mapping Policy, Row Buffer Management Policy, and Command Ordering Algorithm apply]
Transaction Queue
• Not varied in this simulation
• Policies
  ◦ Reads go before writes
  ◦ Fetches go before reads
  ◦ Variable number of transactions may be decoded
• Optimized to avoid bottlenecks
• Request reordering
Row Buffer Management Policy
[Figure: example command sequences for the same access stream under the Close Page, Open Page, Close Page Aggressive, and Open Page Aggressive policies – the close-page variants precharge after each column access, while the open-page variants leave the row open for further accesses]
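The policies can be contrasted with a small sketch that only tracks which row each bank has open and counts the commands each request needs. The request stream, command names, and policy behavior below are illustrative, not DRAMsimII's implementation:

```python
def commands_for(request_stream, policy):
    open_row = {}  # bank -> currently open row
    commands = []
    for bank, row, op in request_stream:
        if policy == "close_page":
            # row is always closed: activate, then access with auto-precharge
            commands += [("ACT", bank, row), (op + "+P", bank, row)]
        else:  # open_page
            if open_row.get(bank) == row:             # row hit
                commands += [(op, bank, row)]
            else:                                      # row miss / conflict
                if bank in open_row:
                    commands += [("PRE", bank)]
                commands += [("ACT", bank, row), (op, bank, row)]
                open_row[bank] = row
    return commands

stream = [(0, 5, "READ"), (0, 5, "WRITE"), (0, 5, "READ"), (1, 7, "READ")]
print(len(commands_for(stream, "close_page")))  # 8 commands
print(len(commands_for(stream, "open_page")))   # 6 commands (two row hits)
```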
Address Mapping Policy
• Chosen to work with the row buffer management policy
• Can either improve row locality or bank distribution
• Performance depends on workload

Bit-field ordering (most- to least-significant):
  ◦ Burger Base (BBM): row | bank | rank | column | channel | byte addr
  ◦ SDRAM High Performance (OPBAS): row | rank | bank | column high | channel | column low | byte addr
  ◦ SDRAM Base (SDBAS): rank | row | bank | column high | channel | column low | byte addr
  ◦ Intel 845G (845G): rank | row | bank | column | byte addr
  ◦ SDRAM Close Page (CPBAS): row | column high | rank | bank | channel | column low | byte addr
  ◦ SDRAM Close Page Baseline Optimized: row high | column high | rank | row low | bank | channel | column low | byte addr
  ◦ SDRAM Close Page Low Locality (LOLOC): column high | row | column low | bank | rank | channel | byte addr
  ◦ SDRAM Close Page High Locality (HILOC): rank | bank | channel | column high | row | column low | byte addr
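Each mapping is just a different ordering of the same bit fields, so a sketch of the decomposition is short. The field widths and the example ordering below are illustrative, not the configuration used in the talk:

```python
# Slice a physical address into channel/rank/bank/row/column fields
# according to a given most- to least-significant field ordering.

FIELD_BITS = {"channel": 1, "rank": 1, "bank": 3,
              "row": 14, "column": 10, "byte": 3}

def map_address(addr, order):
    """order lists fields from most- to least-significant."""
    fields = {}
    shift = sum(FIELD_BITS[f] for f in order)        # total bits consumed
    for field in order:
        shift -= FIELD_BITS[field]
        fields[field] = (addr >> shift) & ((1 << FIELD_BITS[field]) - 1)
    return fields

# e.g. a Burger Base style ordering: row | bank | rank | column | channel | byte
bbm = ["row", "bank", "rank", "column", "channel", "byte"]
print(map_address(0x3F2A1B40, bbm))
```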
Address Mapping Policy – 433.calculix

• Low Locality (~5 s) – irregular distribution
• SDRAM Baseline (~3.5 s) – more regular distribution
Command Ordering Algorithm

• Second level of command scheduling
  ◦ FCFS (FIFO)
  ◦ Bank Round Robin
  ◦ Rank Round Robin
  ◦ Command Pair Rank Hop
  ◦ First Available (Age)
  ◦ First Available (Queue)
  ◦ First Available (RIFF)

[Figure: DRAMsimII pipeline diagram from the Controller Design slide, highlighting the Command Ordering Algorithm stage between the per-bank command queues and the command/address/data bus]
Command Ordering Algorithm – First Available

• Requires tracking when rank/bank resources are available
• Evaluates every potential command choice
  ◦ Age, Queue, RIFF – secondary criteria

[Figure: timing of back-to-back CAS and CASW commands to different ranks, showing the tCAS, tCWD, tBurst, tRTRS, and tWTR gaps that determine when each candidate could issue]
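A sketch of the First Available idea: estimate the earliest cycle each head-of-queue candidate could legally issue and pick the soonest, breaking ties with a secondary criterion (oldest first, i.e. the Age variant). The Candidate fields and ready-time bookkeeping are illustrative, not DRAMsimII's data structures:

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    cmd: str            # "ACT", "READ", "WRITE", "PRE"
    rank: int
    bank: int
    enqueue_time: int   # used as the age tiebreaker

def earliest_issue(c, now, bank_ready, bus_free):
    """Earliest cycle >= now at which candidate c satisfies its constraints."""
    t = max(now, bank_ready[(c.rank, c.bank)])
    if c.cmd in ("READ", "WRITE"):      # column commands also need the data bus
        t = max(t, bus_free)
    return t

def pick_first_available(cands, now, bank_ready, bus_free):
    return min(cands, key=lambda c: (earliest_issue(c, now, bank_ready, bus_free),
                                     c.enqueue_time))

bank_ready = {(0, 0): 40, (0, 1): 25, (1, 0): 30}
cands = [Candidate("READ", 0, 0, 5), Candidate("ACT", 0, 1, 9),
         Candidate("READ", 1, 0, 2)]
# the ACT to rank 0, bank 1 can issue first (cycle 25)
print(pick_first_available(cands, now=20, bank_ready=bank_ready, bus_free=28))
```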
Results - Bandwidth

Results - Latency

Results – Execution Time

Results - Energy

Command Ordering Algorithms

Command Ordering Algorithms
Conclusions

• The right combination of policies can achieve good latency/bandwidth for a given benchmark
  ◦ Address mapping policies and row buffer management policies should be chosen together
  ◦ Command ordering algorithms become important as the memory system becomes heavily loaded
• Open Page policies require more energy than Close Page policies in most conditions
• The extra logic for more complex schemes helps improve bandwidth but may not be necessary
• Address mapping policies should balance row reuse and bank distribution to reuse open rows and use available resources in parallel
Appendix

Bandwidth (cont.)

Row Reuse Rate (cont.)

Bandwidth (cont.)

Results – Execution Time
Results – Row Reuse Rate

• Open Page/Open Page Aggressive have the greatest reuse rate
• Close Page Aggressive rarely exceeds 10% reuse
• SDRAM Baseline and SDRAM High Performance work well with open page
• 429.mcf has very little ability to reuse rows, 35% at the most
• 458.sjeng can reuse 80% of rows with SDRAM Baseline or SDRAM High Performance; otherwise the rate is very low
Execution Time (cont.)

Row Reuse Rate (cont.)

Average Latency (cont.)

Average Latency (cont.)
Results - Bandwidth

• High Locality is consistently worse than the other mappings
• Close Page Baseline (Opt) works better with Close Page (Aggressive)
• SDRAM Baseline/High Performance work better with Open Page (Aggressive)
• Greater bandwidth correlates inversely with execution time – configurations that gave benchmarks more bandwidth finished sooner
• 470.lbm (1783%): (1.5 s, 5.1 GB/s) – (26.8 s, 823 MB/s)
• 458.sjeng (120%): (5.18 s, 357 MB/s) – (6.24 s, 285 MB/s)
Results - Energy

• Close Page (Aggressive) generally takes less energy than Open Page (Aggressive)
• The disparity is smaller for bandwidth-heavy applications like 470.lbm
  ◦ Banks are mostly in standby mode
• Doubling the number of ranks
  ◦ Approximately doubles the energy for Open Page (Aggressive)
  ◦ Increases Close Page (Aggressive) energy by about 50%
• Close Page Aggressive can use less energy when row reuse rates are significant
• 470.lbm (424%): (1.5 s, 12350 mJ) – (26.8 s, 52410 mJ)
• 458.sjeng (670%): (5.18 s, 14013 mJ) – (6.24 s, 93924 mJ)
Bandwidth (cont.)

Bandwidth (cont.)

Results – Average Latency

Energy (cont.)

Energy (cont.)

Average Latency (cont.)
Memory System Organization
[Figure: memory system organization – the memory controller connected to an array of DRAM devices over shared command, address, and data buses]
Transaction Queue

• RIFF or FIFO
• Prioritizes reads or fetches
• Allows reordering
• Increases controller complexity
• Avoids hazards

[Figure: an incoming transaction queue filled with reads, writes, and fetches; with RIFF, the reads and fetches are pulled ahead of the queued writes]
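A sketch of the RIFF selection rule: take the oldest read or fetch when one exists, otherwise the oldest write, without hopping over an older write to the same address. The hazard check and queue representation are illustrative, not the DRAMsimII transaction queue:

```python
def next_transaction(queue):
    """queue: list of (op, address) tuples, oldest first."""
    for i, (op, addr) in enumerate(queue):
        if op in ("READ", "FETCH"):
            # don't bypass an older write to the same address (RAW hazard)
            older_write = any(o == "WRITE" and a == addr for o, a in queue[:i])
            if not older_write:
                return queue.pop(i)
    return queue.pop(0) if queue else None  # fall back to FIFO order

q = [("WRITE", 0x100), ("WRITE", 0x200), ("READ", 0x200), ("FETCH", 0x300)]
print(next_transaction(q))  # ('FETCH', 0x300): the READ to 0x200 must wait
```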
Transaction Queue – Decode Window

• Out-of-order decoding
• Avoids queuing delays
• Helps to keep per-bank queues full
• Increases controller complexity
• Allows reordering

[Figure: the decode window covers the first few entries of the incoming transaction queue; any transaction inside the window may be decoded out of order into the per-bank command queues]
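A sketch of the decode window: only the first few queue entries are eligible, and the oldest eligible transaction whose per-bank queue has room is decoded, even if it is not at the head. Window size, queue depth, and the (op, bank) representation are illustrative:

```python
def decode_one(txn_queue, bank_queues, window=4, depth=4):
    """txn_queue: list of (op, bank) tuples, oldest first."""
    for i, (op, bank) in enumerate(txn_queue[:window]):
        if len(bank_queues[bank]) < depth:        # room in the per-bank queue?
            bank_queues[bank].append((op, bank))  # decode out of order
            return txn_queue.pop(i)
    return None                                   # whole window is blocked

bank_queues = {0: [("READ", 0)] * 4, 1: []}       # bank 0's queue is full
txns = [("READ", 0), ("WRITE", 0), ("READ", 1), ("READ", 0)]
print(decode_one(txns, bank_queues))              # ('READ', 1) decodes ahead of older txns
```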
Row Buffer Management Policy

• Close Page / Close Page Aggressive

[Figure: a read transaction passes through the address mapping policy into a per-bank queue; Close Page always enqueues a RAS followed by a CAS with precharge (CAS+P), while Close Page Aggressive enqueues either just a CAS or a RAS and CAS+P]
Row Buffer Management Policy

• Open Page / Open Page Aggressive

[Figure: a read transaction passes through the address mapping policy into a per-bank queue; Open Page enqueues either just a CAS (row already open) or a Pre, RAS, and CAS (row conflict), while Open Page Aggressive may instead enqueue a CAS+P when the precharge can be folded into the column access]