Estimating Influence of Data Layout Optimizations on SDRAM Energy Consumption

†H.S. Kim, †V. Narayanan, †M. Kandemir, ‡E. Brockmeyer, ‡F. Catthoor, †M.J. Irwin
Aug. 2003
†Dept. of Computer Science and Engineering, The Pennsylvania State University
‡IMEC, Belgium
Estimating influence of data layout optimizations on SDRAM energy
 Applications demand much larger memory bandwidth (e.g., video applications)
 There has been much work on reducing off-chip memory access frequency by improving locality in local (intermediate) memory
 Locality in the SDRAM itself also makes a significant difference in energy (a page open operation is about 6 times more expensive than a data read operation)
 Estimating the number of page open operations (page breaks) can serve as an energy estimate for various optimizations
 Data layout optimization
 Conventional layout vs. blocked layout
Preliminaries (SDRAMs)
 Banked architecture
[Figure: SDRAM block diagram. Labels: control logic, commands, mode register, address, row decoder and memory array for each bank (bank 0 shown in front), sense amps, column decoder, data buffer]
Preliminaries (SDRAM operations)
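 One operation
[Figure: timing diagram of the command and DQ lines. Commands: precharge bank 0, activate bank 0, read data; intervals: tRP, tRCD, CAS latency; DQ: D0 D1 D2 D3]
 Two consecutive operations to two different rows of one bank
[Figure: timing diagram of the command and DQ lines for bank 0/page x followed by bank 0/page y. Labels: tRRD, lost cycles; DQ: D0 D1 D2 D3, then D0]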
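As a rough illustration of where the lost cycles come from, the sketch below adds up the latency of one read burst with and without a page break. It is a simplified model, not the timing of any particular device: the function name and all timing values (3 cycles each for tRP, tRCD and CAS latency, burst length 4) are placeholder assumptions rather than datasheet numbers.

```python
def read_latency_cycles(page_hit, t_rp=3, t_rcd=3, cas_latency=3, burst_length=4):
    """Approximate cycles from the first command until the last burst word.

    All default timing values are illustrative placeholders (assumptions).
    """
    # A read to an already open row only pays the CAS latency plus the burst.
    cycles = cas_latency + burst_length
    if not page_hit:
        # A different row is open: precharge (tRP), then activate (tRCD).
        cycles += t_rp + t_rcd
    return cycles


hit = read_latency_cycles(page_hit=True)
miss = read_latency_cycles(page_hit=False)
lost = miss - hit  # extra cycles caused by the page break
print(f"hit={hit} cycles, miss={miss} cycles, lost={lost} cycles")
```

Under these assumed timings the page break adds tRP + tRCD cycles to the access, which is the overhead that a better data layout tries to avoid.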
SDRAM energy consumption
E  Estatic  Edata  Eactivation
 Estat _ data  Edata  Eactivation  Estat _ activation
 Estat _ data  D * ed  D / B * Pmiss * (eact  estat _ act )
 Estat _ data  D * ed (1  Pmiss * ( x  y ) / B)
D words, B burst size, Pmiss miss rate,
eact = x*ed, estat_act = y*ed, where eact is energy per activation, ed
energy per data transfer of one word, estat_act static energy per
activation
(Example) Microns 8MB SDRAM,
eact = 13nJ, estat_act = 7nJ, ed = 3.6nJ, x+y ~ 6
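The model above is easy to evaluate numerically. The following sketch plugs in the example energies from this slide; the function name, the workload (1000 words, burst size 4, miss rate 0.5) and the choice to leave E_stat_data at zero are illustrative assumptions, not values from the study.

```python
def sdram_energy_nj(d_words, burst, p_miss, e_d, e_act, e_stat_act, e_stat_data=0.0):
    """Evaluate E = E_stat_data + D * e_d * (1 + P_miss * (x + y) / B), in nJ."""
    x = e_act / e_d        # activation energy relative to one word transfer
    y = e_stat_act / e_d   # static energy per activation, same scale
    return e_stat_data + d_words * e_d * (1 + p_miss * (x + y) / burst)


# Energies from the example on this slide (nJ); the workload is assumed.
E_D, E_ACT, E_STAT_ACT = 3.6, 13.0, 7.0
ratio = (E_ACT + E_STAT_ACT) / E_D   # x + y, roughly 6
all_hits = sdram_energy_nj(1000, 4, 0.0, E_D, E_ACT, E_STAT_ACT)
half_miss = sdram_energy_nj(1000, 4, 0.5, E_D, E_ACT, E_STAT_ACT)
print(f"x + y = {ratio:.2f}")
print(f"P_miss = 0.0: {all_hits:.0f} nJ, P_miss = 0.5: {half_miss:.0f} nJ")
```

With these numbers the data transfer alone costs 3600 nJ, and a 50% miss rate adds 2500 nJ on top, which shows how strongly the miss rate drives the total.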
Page break estimation of data layouts
 Page break estimation can be used to estimate energy
and performance of various optimization techniques
 Estimation should take little time
 In a blocked layout, different tile/block sizes and shapes result in different numbers of page breaks
[Figure: array divided into blocks. Labels: intra page break, block, inter page break, tile size = page size, array]
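To make the effect of the layout concrete, here is a small brute-force sketch that counts page opens while fetching an array tile by tile, once with a row-major layout and once with a blocked layout whose block equals one page. It is a counting experiment, not the estimation technique presented in this talk. The array size (64x64 words), tile size (16x16), page size (256 words), the single-bank assumption and all function names are made up for illustration.

```python
def row_major_addr(r, c, width):
    """Word address of element (r, c) in a conventional row-major layout."""
    return r * width + c


def blocked_addr(r, c, width, bh, bw):
    """Word address of element (r, c) when bh x bw blocks are stored contiguously."""
    blocks_per_row = width // bw
    block_id = (r // bh) * blocks_per_row + (c // bw)
    offset = (r % bh) * bw + (c % bw)
    return block_id * bh * bw + offset


def tile_fetch_order(height, width, th, tw):
    """Yield (r, c) in the order a tile-by-tile fetch would touch the array."""
    for tr in range(0, height, th):
        for tc in range(0, width, tw):
            for r in range(tr, tr + th):
                for c in range(tc, tc + tw):
                    yield r, c


def count_page_opens(addresses, page_size):
    """Count how often the accessed page changes (single bank assumed)."""
    opens, open_page = 0, None
    for a in addresses:
        page = a // page_size
        if page != open_page:
            opens += 1
            open_page = page
    return opens


H = W = 64          # assumed array size in words
TH = TW = 16        # assumed tile/block size
PAGE = 256          # assumed page size in words (equal to one tile)
order = list(tile_fetch_order(H, W, TH, TW))
row_major_opens = count_page_opens((row_major_addr(r, c, W) for r, c in order), PAGE)
blocked_opens = count_page_opens((blocked_addr(r, c, W, TH, TW) for r, c in order), PAGE)
print(f"row-major: {row_major_opens} page opens, blocked: {blocked_opens} page opens")
```

In this toy setting every tile of the row-major array spans four pages, while each block of the blocked array fits in exactly one page, so the blocked layout needs a quarter of the page opens.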
Estimation Modeling
 Polyhedral modeling of page breaks, implemented using Presburger formulas
   Valid iteration points
   Lexicographical ordering
   Data layouts in memory
   Mapping memory locations to memory banks
   Page break estimation model for blocked layout
 Implementation
   Omega Calculator to simplify the models (existential operators are allowed, which is not possible in PolyLib)
   PolyLib to count the page breaks
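Intra/Inter page break models for blocked data layout
 Intra page breaks
\[
\begin{aligned}
(R_u, i) \in \mathrm{IntraPageBreakType1}(L) \iff{} & i \in I \;\wedge\; \exists b, r : \mathrm{Map}(L_x(F_u(i), x), b, r) \;\wedge \\
& \exists r' : \big( (\mathrm{Map}(L_x(F_u(i) + [b_0, 0], x), b, r') \wedge \mathrm{Map}(L_x(F_u(i) + [0, b_1], x), b, r)) \;\vee \\
& \qquad (\mathrm{Map}(L_x(F_u(i) + [0, b_1], x), b, r') \wedge \mathrm{Map}(L_x(F_u(i) + [b_0, 0], x), b, r)) \big) \;\wedge \\
& (r \neq r')
\end{aligned}
\]
\[
\begin{aligned}
(R_u, i) \in \mathrm{IntraPageBreakType2}(L) \iff{} & i \in I \;\wedge\; \exists b, r : \mathrm{Map}(L_x(F_u(i), x), b, r) \;\wedge \\
& \exists r', r'' : \big( \mathrm{Map}(L_x(F_u(i) + [b_0, 0], x), b, r') \wedge \mathrm{Map}(L_x(F_u(i) + [0, b_1], x), b, r'') \big) \;\wedge \\
& (r \neq r' \neq r'')
\end{aligned}
\]
 Inter page breaks
\[
\begin{aligned}
(R_u, i) \in \mathrm{InterPageBreak}(L) \iff{} & i \in I \;\wedge\; \exists b, r : \mathrm{Map}(L_x(F_u(i), x), b, r) \;\wedge \\
& \exists j, r' : j \in I \;\wedge\; (R_v, j) \prec (R_u, i) \;\wedge\; \mathrm{Map}(L_x(F_u(j) + [b_0, b_1], x), b, r') \;\wedge \\
& (\nexists k, w : w \in I \;\wedge\; (R_v, j) \prec (R_k, w) \prec (R_u, i)) \;\wedge \\
& (r \neq r')
\end{aligned}
\]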
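One possible reading of the intra page break conditions can be checked by brute force on a small example. The sketch below is an interpretation, not the Omega/PolyLib implementation: it assumes the block-size offsets [b0, 0] and [0, b1] are added to the accessed index, treats "r ≠ r' ≠ r''" as three pairwise distinct rows, and uses a row-major linearization and a page-interleaved bank mapping as hypothetical stand-ins for L_x and Map. All function names and parameter values are illustrative.

```python
def layout_row_major(index, width):
    """Stand-in for L_x (assumption): linearize a 2-D index into a word address."""
    i0, i1 = index
    return i0 * width + i1


def mem_map(address, page_size, n_banks=1):
    """Stand-in for Map (assumption): return the (bank, row) holding an address."""
    page = address // page_size
    return page % n_banks, page // n_banks


def classify_intra(origin, block, width, page_size, n_banks=1):
    """Return 'type1', 'type2' or None for the access at `origin`."""
    i0, i1 = origin
    b0, b1 = block
    points = [(i0, i1), (i0 + b0, i1), (i0, i1 + b1)]
    mapped = [mem_map(layout_row_major(p, width), page_size, n_banks) for p in points]
    banks = {bank for bank, _ in mapped}
    if len(banks) != 1:
        return None  # the conditions require one common bank b
    r, r_b0, r_b1 = (row for _, row in mapped)
    if len({r, r_b0, r_b1}) == 3:
        return "type2"  # both neighbours sit in rows different from r and from each other
    if (r_b0 != r) != (r_b1 != r):
        return "type1"  # exactly one neighbour sits in a different row
    return None


WIDTH = 64  # assumed array width in words
print(classify_intra((0, 0), (1, 16), WIDTH, page_size=32))
print(classify_intra((0, 24), (1, 16), WIDTH, page_size=32))
print(classify_intra((0, 0), (2, 16), WIDTH, page_size=256))
```

Enumerating such a check over all iteration points gives a reference count against which a closed-form estimate can be compared on small arrays.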
Experiments
 E_ACT = (IDD0 - IDD3) * Trc * Vdd * Tcycle * (no. of page breaks)
 E_STAT = IDD3 * Vdd * Tcycle * total_cycles
 Benchmarks
   qsdpcm (quadtree-structured motion estimation)
   phods (parallel hierarchical motion estimation)
   an edge_detect code from the UTDSP benchmark suite
 Various fetch tile/block shapes (set_1, set_2, set_3)
 Architectural assumptions
   A block of data is fetched from SDRAM into local data memory via direct memory access (i.e., software-controlled intermediate memory)
   SDRAM: Micron's 8MB, 4-bank device with a 32-bit bus and 1KB pages
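The two energy formulas at the top of this slide translate directly into code. In the sketch below, Trc is taken to be expressed in clock cycles (so that Trc * Tcycle is a time), and every numeric value is a made-up placeholder rather than a Micron datasheet figure; the function names are likewise illustrative.

```python
def activation_energy(idd0, idd3, t_rc_cycles, vdd, t_cycle, page_breaks):
    """E_ACT = (IDD0 - IDD3) * Trc * Vdd * Tcycle * (no. of page breaks), in joules."""
    return (idd0 - idd3) * t_rc_cycles * vdd * t_cycle * page_breaks


def static_energy(idd3, vdd, t_cycle, total_cycles):
    """E_STAT = IDD3 * Vdd * Tcycle * total_cycles, in joules."""
    return idd3 * vdd * t_cycle * total_cycles


# Placeholder operating point (assumptions): currents in A, voltage in V, time in s.
IDD0, IDD3 = 0.100, 0.040
VDD = 3.3
T_CYCLE = 10e-9      # 100 MHz clock
T_RC_CYCLES = 7      # row cycle time in clock cycles

e_act = activation_energy(IDD0, IDD3, T_RC_CYCLES, VDD, T_CYCLE, page_breaks=10_000)
e_stat = static_energy(IDD3, VDD, T_CYCLE, total_cycles=1_000_000)
print(f"E_ACT = {e_act:.3e} J, E_STAT = {e_stat:.3e} J")
```

Because E_ACT scales linearly with the number of page breaks, any layout change that halves the page breaks also halves the activation energy under this model.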
Experiments
 SDRAM power (and cycle) simulator used as the reference against which the estimates are compared
[Figure: simulation flow. C code → ATOMIUM (memory instrumentation tool) → memory reference log (addr., size, time) → SDRAM cycle simulator → no. of page activations → Micron's SDRAM Power Calculator → total activation energy]
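The counting step in the middle of this flow can be approximated in a few lines. The sketch below is not the simulator used in the experiments: it assumes a log of (address, size, time) tuples in bytes, an open-page policy, and a page-interleaved mapping of 1KB pages onto 4 banks, and it ignores timing entirely. The function name and the sample log are illustrative.

```python
def count_page_activations(log, page_bytes=1024, n_banks=4):
    """Count row activations for a memory reference log of (addr, size, time).

    Assumes consecutive pages are interleaved across banks and that each bank
    keeps its last row open until another row of that bank is accessed.
    """
    open_row = [None] * n_banks
    activations = 0
    for addr, size, _time in log:
        first_page = addr // page_bytes
        last_page = (addr + size - 1) // page_bytes
        # An access that straddles a page boundary touches several pages.
        for page in range(first_page, last_page + 1):
            bank = page % n_banks
            row = page // n_banks
            if open_row[bank] != row:
                activations += 1
                open_row[bank] = row
    return activations


# Hypothetical log: (byte address, size in bytes, time stamp)
sample_log = [
    (0, 4, 0),      # bank 0, row 0: activation
    (4, 4, 1),      # same page: hit
    (1024, 4, 2),   # bank 1, row 0: activation
    (0, 4, 3),      # bank 0 still has row 0 open: hit
    (4096, 4, 4),   # bank 0, row 1: activation
    (0, 4, 5),      # bank 0, row 0 again: activation
]
print(count_page_activations(sample_log))
```

The resulting activation count is the quantity that a power calculator would then convert into activation energy.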
Results (qsdpcm, simulation)
 The conventional layout shows energy numbers that vary with the array size (800x640 vs. 176x144)
 The blocked layout shows no variation with the array size
[Figure: bar chart of energy (J), axis from 0 to 8.E-05, for E_STAT and E_ACT. Series: orig(800X640), orig(176X144), block-based]
Results (row-major vs. blocked, phods)
 Estimated numbers match the corresponding simulated numbers reasonably well for both row-major and blocked layouts
[Figure: bar chart of the no. of page breaks, axis from 0 to 35000, for row-major, set_1, set_2, set_3. Series: est, avg. sim]
Results (blocked layout, estimation vs. simulation)
 Arrays with manifest indexes can be estimated without error (edge_detect)
 Arrays with dynamic elements (e.g., motion vectors) can be estimated reasonably well (phods, qsdpcm)
 Energy numbers vary with the block/tile shape (set_1 to set_3)
[Figure: three panels comparing estimation and simulation: qsdpcm, phods, edge_detect]
Conclusions and Future Work
 Estimation framework tracks page breaks well
 Blocked Layout reduces the number of page breaks
significantly
 Tile/Block shapes should be chosen carefully
 On-going work
   Refinement of estimation formulas for conventional/blocked layouts of higher-dimensional arrays
   Automation
     Automatic incorporation of the Omega library and PolyLib
     Automatic code transformation into a main-memory-efficient data layout for each array
   Exploration techniques to find the optimal data layout