Estimating Influence of Data Layout Optimizations on SDRAM Energy Consumption
†H.S. Kim, †V. Narayanan, †M. Kandemir, ‡E. Brockmeyer, ‡F. Catthoor, †M.J. Irwin
Aug. 2003
†Dept. of Computer Science and Engineering, The Pennsylvania State University
‡IMEC, Belgium
Estimating influence of data layout
optimizations on SDRAM energy
Applications demand much larger memory bandwidth (e.g., video applications)
There has been much work on reducing off-chip memory access frequency by improving locality in local (intermediate) memory
Locality in the SDRAM itself also makes a significant difference in energy (a page open operation is 6 times more expensive than a data read operation)
Estimating the number of page open operations (page breaks) can serve as an energy estimate for various optimizations
Data Layout optimization
Conventional Layout vs. Blocked Layout
Preliminaries (SDRAMS)
Banked architecture
[Figure: SDRAM block diagram. Control logic (commands, mode register) and the address input drive several banks (bank 0 shown in front), each with a row decoder and a memory array, followed by sense amps, a column decoder, and a data buffer.]
Preliminaries (SDRAM operations)
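One operation: precharge bank 0 (tRP), activate bank 0 (tRCD), read data (CAS latency), then the burst D0 D1 D2 D3 appears on DQ
Two consecutive operations to two different rows of one bank (bank 0, page x then page y): the second access must wait for precharge and activation, so cycles are lost on DQ between the two bursts
[Figure: command and DQ timing diagrams for both cases, with tRP, tRCD, tRRD, CAS latency, and the lost cycles marked]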
SDRAM energy consumption
\[
\begin{aligned}
E &= E_{static} + E_{data} + E_{activation} \\
  &= E_{stat\_data} + E_{data} + E_{activation} + E_{stat\_activation} \\
  &= E_{stat\_data} + D \cdot e_d + (D / B) \cdot P_{miss} \cdot (e_{act} + e_{stat\_act}) \\
  &= E_{stat\_data} + D \cdot e_d \cdot \bigl(1 + P_{miss} \cdot (x + y) / B\bigr)
\end{aligned}
\]
D: number of words, B: burst size, P_miss: miss rate
e_act = x * e_d and e_stat_act = y * e_d, where e_act is the energy per activation, e_d the energy per data transfer of one word, and e_stat_act the static energy per activation
(Example) Micron's 8 MB SDRAM: e_act = 13 nJ, e_stat_act = 7 nJ, e_d = 3.6 nJ, so x + y ~ 6
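The energy model above can be evaluated directly. The following is a minimal sketch that computes only the data and activation terms (the E_stat_data term is left out), using the example values from this slide as defaults. The function name and the sample word count and burst size are illustrative assumptions, not part of the original work.

```python
def data_and_activation_energy_nj(words, burst, p_miss,
                                  e_d=3.6, e_act=13.0, e_stat_act=7.0):
    """Return D*e_d*(1 + P_miss*(x + y)/B) in nJ.

    Defaults are the example values quoted on the slide (in nJ).
    E_stat_data is not included: this covers only the terms that
    depend on the page miss rate.
    """
    x = e_act / e_d          # activation energy relative to one word transfer
    y = e_stat_act / e_d     # static-per-activation energy, same scale
    return words * e_d * (1.0 + p_miss * (x + y) / burst)


# Hypothetical sweep: 1000 words, burst of 4, several miss rates.
for p_miss in (0.0, 0.25, 1.0):
    print(p_miss, data_and_activation_energy_nj(1000, 4, p_miss))
```

With a miss rate of 0 only the transfer energy remains. Each additional page miss per burst adds roughly (x + y) word transfers' worth of energy, which is why the page miss rate dominates for short bursts.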
Page break estimation of data layouts
Page break estimation can be used to estimate energy
and performance of various optimization techniques
Estimation should take little time
In a blocked layout, different tile/block sizes and shapes
result in different numbers of page breaks
[Figure: an array in blocked layout with tile size = page size. Labels mark a block, an intra page break, and an inter page break.]
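To make the effect of the layout on page breaks concrete, the sketch below counts page openings by brute force for one block fetch under a row-major layout and under a blocked layout. It is not the polyhedral estimator from the talk. It assumes a single bank with one open page, a row-by-row fetch order inside the block, and array and tile dimensions chosen purely for illustration. All function names are hypothetical.

```python
def row_major_addr(r, c, width):
    """Word address of element (r, c) in a conventional row-major layout."""
    return r * width + c


def blocked_addr(r, c, width, th, tw):
    """Word address of element (r, c) when th x tw tiles are stored contiguously."""
    tiles_per_row = width // tw
    tile_index = (r // th) * tiles_per_row + (c // tw)
    return tile_index * th * tw + (r % th) * tw + (c % tw)


def count_page_opens(addresses, page_words):
    """Count page openings, assuming one bank that keeps one page open."""
    opens, open_page = 0, None
    for addr in addresses:
        page = addr // page_words
        if page != open_page:
            opens += 1
            open_page = page
    return opens


def fetch_block(addr_fn, r0, c0, bh, bw):
    """Addresses touched when a bh x bw block is fetched row by row."""
    return [addr_fn(r, c)
            for r in range(r0, r0 + bh)
            for c in range(c0, c0 + bw)]


# Assumed setup: 128x128 array of words, 256-word pages, 16x16 tiles (tile size = page size).
W, PAGE, TH, TW = 128, 256, 16, 16
conventional = fetch_block(lambda r, c: row_major_addr(r, c, W), 0, 0, 16, 16)
blocked = fetch_block(lambda r, c: blocked_addr(r, c, W, TH, TW), 0, 0, 16, 16)
print(count_page_opens(conventional, PAGE), count_page_opens(blocked, PAGE))
```

Moving the fetched block so that it straddles tile boundaries (for example starting at row 8, column 8) makes the blocked layout open pages repeatedly within the block, which is the kind of shape and alignment sensitivity the estimation is meant to expose.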
Estimation Modeling
Polyhedral modeling of page breaks, implemented using Presburger formulas
Valid iteration points
Lexicographical ordering
Data layouts in memory
Mapping memory locations to memory banks
Page break estimation model for blocked layout
Implementation
Omega Calculator to simplify the models (existential operators are allowed, which is not possible in Polylib)
Polylib to count the resulting page breaks
Intra/Inter page break models for
blocked data layout
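Intra page breaks
\[
\begin{aligned}
(R_u, i) \in IntraPageBreakType1(L) \iff{} & i \in I \\
& \wedge\ \exists b, r : Map(L_x(F_u(i), x), b, r) \\
& \wedge\ \exists r' : \bigl( (Map(L_x(F_u(i) + [b_0, 0], x), b, r') \wedge Map(L_x(F_u(i) + [0, b_1], x), b, r)) \\
& \qquad \vee\ (Map(L_x(F_u(i) + [0, b_1], x), b, r') \wedge Map(L_x(F_u(i) + [b_0, 0], x), b, r)) \bigr) \\
& \wedge\ (r \neq r')
\end{aligned}
\]
\[
\begin{aligned}
(R_u, i) \in IntraPageBreakType2(L) \iff{} & i \in I \\
& \wedge\ \exists b, r : Map(L_x(F_u(i), x), b, r) \\
& \wedge\ \exists r', r'' : \bigl( Map(L_x(F_u(i) + [b_0, 0], x), b, r') \wedge Map(L_x(F_u(i) + [0, b_1], x), b, r'') \bigr) \\
& \wedge\ (r \neq r' \neq r'')
\end{aligned}
\]
Inter page breaks
\[
\begin{aligned}
(R_u, i) \in InterPageBreak(L) \iff{} & i \in I \\
& \wedge\ \exists b, r : Map(L_x(F_u(i), x), b, r) \\
& \wedge\ \exists j, r' : j \in I \wedge (R_v, j) \prec (R_u, i) \\
& \qquad \wedge\ Map(L_x(F_u(j) + [b_0, b_1], x), b, r') \\
& \qquad \wedge\ \neg (\exists k, w : w \in I \wedge (R_v, j) \prec (R_k, w) \prec (R_u, i)) \\
& \wedge\ (r \neq r')
\end{aligned}
\]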
Experiments
E_ACT = (IDD0 - IDD3)*Trc*Vdd*Tcycle *#.pagebreaks
E_STAT = IDD3*Vdd* Tcycle *total_cycles
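The two formulas above translate into a few lines of arithmetic. The sketch below assumes currents in amperes, Trc expressed in clock cycles, and Tcycle in seconds, so that the results come out in joules. The numeric values in the usage lines are placeholders chosen for illustration and are not taken from any datasheet. The function names are hypothetical.

```python
def activation_energy_j(idd0_a, idd3_a, trc_cycles, vdd_v, tcycle_s, page_breaks):
    """E_ACT = (IDD0 - IDD3) * Trc * Vdd * Tcycle * number of page breaks."""
    return (idd0_a - idd3_a) * trc_cycles * vdd_v * tcycle_s * page_breaks


def static_energy_j(idd3_a, vdd_v, tcycle_s, total_cycles):
    """E_STAT = IDD3 * Vdd * Tcycle * total_cycles."""
    return idd3_a * vdd_v * tcycle_s * total_cycles


# Placeholder parameters (assumed for illustration only).
e_act = activation_energy_j(idd0_a=0.100, idd3_a=0.040, trc_cycles=7,
                            vdd_v=3.3, tcycle_s=10e-9, page_breaks=1000)
e_stat = static_energy_j(idd3_a=0.040, vdd_v=3.3, tcycle_s=10e-9,
                         total_cycles=1_000_000)
print(e_act, e_stat)
```

Because E_ACT is linear in the number of page breaks, an estimate of the page break count is enough to rank layouts by activation energy once the device parameters are fixed.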
Benchmarks
qsdpcm (quadtree-structured motion estimation)
phods (parallel hierarchical motion estimation)
an edge_detect code from the UTDSP benchmark suite
Various fetch tile/block shapes (set_1, set_2, set_3)
Architectural assumptions
A block of data is fetched from SDRAM into local data memory via Direct Memory Access (i.e., software-controlled intermediate memory)
SDRAM: Micron's 8 MB, 4-bank device with a 32-bit bus and 1 KB pages
Experiments
An SDRAM power (and cycle) simulator provides the reference numbers to compare the estimates with
Flow: C code -> ATOMIUM (memory instrumentation tool) -> memory reference log (address, size, time) -> SDRAM cycle simulator -> number of page activations -> Micron's SDRAM Power Calculator -> total activation energy
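The step from a memory reference log to a page activation count can be sketched as a small trace-driven counter. This is not the simulator used in the experiments. It assumes 1 KB pages and 4 banks (as in the architectural assumptions), consecutive pages interleaved across banks, and one open row per bank. The interleaving scheme and all names are assumptions made for illustration, and timing is ignored.

```python
PAGE_BYTES = 1024   # 1 KB pages
NUM_BANKS = 4       # 4-bank device


def map_address(addr):
    """Map a byte address to (bank, row), assuming page-interleaved banks."""
    page = addr // PAGE_BYTES
    return page % NUM_BANKS, page // NUM_BANKS


def count_activations(log):
    """Count row activations for a log of (address, size_in_bytes, time) records."""
    open_row = {}        # bank -> currently open row
    activations = 0
    for addr, size, _time in log:
        first_page = addr // PAGE_BYTES
        last_page = (addr + size - 1) // PAGE_BYTES
        # An access that crosses a page boundary touches every page it spans.
        for page in range(first_page, last_page + 1):
            bank, row = map_address(page * PAGE_BYTES)
            if open_row.get(bank) != row:
                activations += 1
                open_row[bank] = row
    return activations


# Hypothetical log: the last two records force bank 0 to switch rows twice.
log = [(0, 4, 0), (4, 4, 1), (1024, 4, 2), (0, 4, 3), (4096, 4, 4), (0, 4, 5)]
print(count_activations(log))
```

The resulting count plays the role of the number of page activations in the flow above and could be fed into an activation energy formula.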
Results (qsdpcm, simulation)
With the conventional layout, energy varies with the array size (800x640 vs. 176x144)
With the blocked layout, energy does not vary with the array size
[Figure: bar chart of energy (J), split into E_STAT and E_ACT, for orig (800x640), orig (176x144), and block-based layouts. The y-axis runs from 0 to 8e-05 J.]
Results (row-major vs. blocked, phods)
Estimated numbers match the corresponding simulated numbers reasonably well for both row-major and blocked layouts
[Figure: number of page breaks, estimated (est) vs. average simulated (avg. sim), for row-major, set_1, set_2, and set_3. The y-axis runs from 0 to 35000.]
Results (blocked layout, estimation vs.
simulation)
Arrays with manifest indexes can be estimated without error (edge_detect)
Arrays with dynamic elements (e.g., motion vectors) can be estimated reasonably well (phods, qsdpcm)
Energy numbers vary with the block/tile shapes (set_1 to set_3)
[Figure: three panels comparing estimation and simulation for qsdpcm, phods, and edge_detect]
Conclusions and Future Work
Estimation framework tracks page breaks well
Blocked Layout reduces the number of page breaks
significantly
Tile/Block shapes should be chosen carefully
Ongoing work
Refinement of estimation formulas for conventional/blocked layouts of higher-dimensional arrays
Automation
Automatic incorporation of the Omega library and Polylib
Automatic code transformation into a main-memory-efficient data layout for each array
Exploration techniques to find the optimal data layout