Leakage Power Reduction of
Embedded Memories on FPGAs
through Location Assignment
Yan Meng, Tim Sherwood, and Ryan Kastner
University of California, Santa Barbara
Department of Electrical & Computer Engineering
ExPRESS Group: http://express.ece.ucsb.edu
Outline
Motivation
The leakage problem of embedded memories on
FPGAs is of growing importance
Synthesis techniques for leakage power
optimization of embedded memories
Conclusions
Motivation
FPGAs are attractive options
High processing power, flexibility, reconfigurability
Power is becoming critical
Why worry about power?
Heat dissipation, portability
Where does power go in CMOS?
Dynamic power consumption
Switching power due to charging and
discharging load capacitors
Short circuit currents between supply rails
when both transistors are on during switching
Leakage power consumption
Technology Scaling and
Leakage Power Dissipation
[Figure: leakage power as a share of total power (0% to 100%) by year, 1999 to 2009, with technology nodes 130nm, 70nm, 50nm, 30nm, 20nm, and 15nm marked]
Leakage is dominating over dynamic power
as technology scales down (improving speed,
transistor density and functionality)
On-chip Memory Leakage Control
Why control leakage through on-chip memory?
Caches on microprocessors
Huge portion of chip area
Leakage is proportional to the number of transistors
Major source of leakage consumption [Roy01, Hu01, Flautner02,Mudge04]
50% 2005 [ITRS 02]
Dynamic reshuffling due to cache replacement policies
Cache hierarchy with data replication
Memories on FPGAs
Configuration SRAMs: not on critical paths, high Vth
Embedded memories
Accesses are usually statically scheduled
Not necessarily part of a memory hierarchy with inclusion
Leakage problem of embedded memories is
of growing importance
Embedded memory bits/logic cells > 20x
[Figure: ratio of embedded memory bits to logic cells (axis 0 to 120) per FPGA family, grouped as New, Mainstream, and Mature/others, with the 20x level marked]
FPGA families shown:
(2006) Virtex-5
(2005) Cyclone II
(2005) Spartan-3E
(2004) Stratix II
(2004) Virtex-4 SX
(2004) Virtex-4 FX
(2004) Virtex-4 LX
(2003) Spartan-3/3L
(2002) Stratix
(2002) Stratix GX
(2002) Spartan-IIE
(2002) Cyclone
(2001) Virtex-II Pro
(2001) Virtex-II
(2001) APEX II
(2001) Mercury
(2000) Spartan-II
(2000) Virtex-E EM
(2000) Virtex-E
(2000) ACEX 1K
(1999) APEX 20K
(1998) Spartan/XL
(1996) Virtex
(1998) Spartan
(1997) FLEX 6000
(1994) FLEX 10K
Leakage Power Optimization of
Embedded Memories on FPGAs
Motivating Example
[Figure: three BRAM line timelines from time 0 to t, illustrating temporal information and spatial information]
Outline
Motivation
Synthesis techniques for leakage power
optimization of embedded memories
Temporal
Temporal + spatial
Conclusions
Temporal Information
Precedence order between variables
Saving power on variables
Keep frequently accessed lines active to ensure
high performance
Turn off lines that are not used for a long time
Use low supply voltage to save power for the rest
Using the generalized model to calculate
maximal leakage power savings for variables
[Meng HPCA’05]
Definitions – Intervals
[Figure: timeline of accesses to a variable v, marking a live interval of length |Ii| between two access(v) events and a dead interval after the last use]
Live interval
time between two successive accesses to the same
variable v within a memory entry
Dead interval
time before the first access or after the last access to
a variable
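The two definitions above can be made concrete with a short sketch. It assumes the static schedule supplies the sorted access times of one variable and the schedule length; the names `AccessInterval` and `split_intervals` are hypothetical and do not come from the slides.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { INTERVAL_DEAD, INTERVAL_LIVE } IntervalKind;

/* Hypothetical record for one interval of a variable's timeline. */
typedef struct {
    int begin;
    int end;
    IntervalKind kind;
} AccessInterval;

/* Splits the schedule [0, t_end] of one variable into intervals.
 * access[] holds the n access times in ascending order.
 * out[] must have room for n + 1 records; the count written is returned. */
size_t split_intervals(const int *access, size_t n, int t_end,
                       AccessInterval *out)
{
    size_t k = 0;
    assert(n > 0);
    assert(access[0] >= 0 && access[n - 1] <= t_end);

    /* Dead interval: time before the first access. */
    if (access[0] > 0)
        out[k++] = (AccessInterval){0, access[0], INTERVAL_DEAD};

    /* Live intervals: time between two successive accesses. */
    for (size_t i = 1; i < n; i++) {
        assert(access[i - 1] <= access[i]);
        out[k++] = (AccessInterval){access[i - 1], access[i], INTERVAL_LIVE};
    }

    /* Dead interval: time after the last access. */
    if (access[n - 1] < t_end)
        out[k++] = (AccessInterval){access[n - 1], t_end, INTERVAL_DEAD};

    return k;
}
```

For accesses at times 10, 20, and 30 in a schedule of length 50, this yields one dead interval, two live intervals, and a final dead interval.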
Definitions – Operating Modes
Active mode
Power on the whole line
No power saving
[Figure: supply voltage stays at Vdd over the interval |Ii|]
Sleep mode [Roy01, Hu01]
Sleep/“turn off” transistors
Lose data
[Figure: supply voltage drops from Vdd to 0 and returns over the interval |Ii|, with phases labeled s1, s2, s3]
Drowsy mode [Flautner02,Mudge04]
Use low supply voltage to save
power when it is not needed
Preserve data for fast reaccess
Wake up to the high voltage and
return data
[Figure: supply voltage drops from Vdd to Vddlow and returns over the interval |Ii|, with phases labeled d1, d2, d3]
Choosing Operating Modes
For an interval of length |Ii|: active mode, sleep mode, or drowsy mode?
Inflection Points
Which mode to apply on each interval?
Active-drowsy inflection point a
The least amount of time drowsy mode needs to
save energy: $a = \arg\min_t \{E_{Drowsy\,saving}(t) \ge 0\} = d_1 + d_3$
Sleep-drowsy inflection point b
The time where sleep and drowsy modes
consume the same amount of energy
$b = \{t : E_{Drowsy}(t) = E_{Sleep}(t)\}$
$E_{Drowsy} = \sum_{i=1,2,3} P_L(d_i) \cdot d_i$
$E_{Sleep} = \sum_{i=1,2,3,4} P_L(s_i) \cdot s_i$
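As a rough illustration of the sleep-drowsy inflection point b, the sketch below assumes a simplified linear energy model that is not taken from the slides: each mode has a fixed transition energy and transition time, plus a constant leakage power while resting in the mode. The type `ModeCost` and both function names are hypothetical.

```c
#include <assert.h>

/* Hypothetical cost of one low-power mode under a linear model. */
typedef struct {
    double fixed_energy; /* energy spent entering and leaving the mode */
    double fixed_time;   /* time spent entering and leaving the mode */
    double rest_power;   /* leakage power while resting in the mode */
} ModeCost;

/* Energy consumed over an interval of length t spent in the mode. */
double mode_energy(const ModeCost *m, double t)
{
    assert(t >= m->fixed_time); /* interval must cover the transitions */
    return m->fixed_energy + m->rest_power * (t - m->fixed_time);
}

/* Solves mode_energy(drowsy, b) == mode_energy(sleep, b) for b.
 * Requires drowsy to leak more than sleep while resting; otherwise
 * the two lines never cross in the intended direction. */
double sleep_drowsy_inflection(const ModeCost *drowsy, const ModeCost *sleep)
{
    double slope = drowsy->rest_power - sleep->rest_power;
    assert(slope > 0.0);
    return (sleep->fixed_energy - drowsy->fixed_energy
            + drowsy->rest_power * drowsy->fixed_time
            - sleep->rest_power * sleep->fixed_time) / slope;
}
```

Under this model, sleep pays a larger one-time cost but leaks less while resting, so it only wins for intervals longer than the returned value.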
Selecting Operating Modes
with Inflection Points
[Flowchart: classify each interval I by its length |I|]
|I| ≤ a: active interval, active mode
a < |I| ≤ b: drowsy interval, drowsy mode
|I| > b: sleep interval, sleep mode
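The selection rule can be sketched in a few lines. The slide states only the drowsy range a < |I| ≤ b explicitly; treating shorter intervals as active and longer ones as sleep is the reading assumed here, and the names `Mode` and `select_mode` are hypothetical.

```c
#include <assert.h>

typedef enum { MODE_ACTIVE, MODE_DROWSY, MODE_SLEEP } Mode;

/* Picks the operating mode for an interval of length len, given the
 * active-drowsy inflection point a and the sleep-drowsy point b. */
Mode select_mode(double len, double a, double b)
{
    assert(len >= 0.0);
    assert(a <= b);
    if (len <= a)
        return MODE_ACTIVE;  /* too short for drowsy mode to pay off */
    if (len <= b)
        return MODE_DROWSY;  /* a < len <= b */
    return MODE_SLEEP;       /* long enough that sleep saves the most */
}
```

Because the interval lengths are known from static scheduling, this check can run once per interval at synthesis time rather than at run time.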
Optimal Leakage Management Policy
Oracle knowledge of all interval lengths
based on static scheduling
Applying the appropriate operating mode
on each variable interval
Obtaining maximal leakage power saving
Formal proof of the optimality [Meng HPCA’05]
Outline
Motivation
Synthesis for leakage power
optimization of embedded memories
Temporal
Temporal + spatial
Conclusions
Spatial Information
Spatial layout of data leads to different potentials
of power savings
[Figure: two BRAM layouts over time 0 to t, one variable per entry versus minimal number of entries]
Memory Leakage Optimization Techniques
[Figure: six BRAM line timelines from time 0 to t, one per scheme: the state-of-the-art, used-active, min-entry, sleep-dead, drowsy-long, path-place]
Location Assignment Schemes (I)
The state of the art: no leakage control
[Figure: BRAM lines over time 0 to t under the Full-active scheme]
Location Assignment Schemes (II)
Turning off the unused part
[Figure: BRAM lines over time 0 to t under the Used-active scheme]
Location Assignment Schemes (III)
Packing variables into the minimal number of
entries and turning off the rest
[Figure: BRAM lines over time 0 to t under the Min-entry scheme]
Location Assignment Schemes (IV)
Min entry + sleep dead intervals
[Figure: BRAM lines over time 0 to t under the Sleep-dead scheme]
Location Assignment Schemes (V)
Min entry + sleep dead + drowsy long
[Figure: BRAM lines over time 0 to t under the Drowsy-long scheme]
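The min-entry packing used by schemes (III) to (V) can be illustrated with the classic left-edge interval-partitioning heuristic. The slides do not say which packing method the authors use, so this is only one plausible sketch; `Lifetime`, `MAX_ENTRIES`, and `min_entry_pack` are hypothetical names, and an entry is assumed reusable only when its previous variable's last access is strictly earlier than the new variable's first access.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ENTRIES 64

/* Hypothetical lifetime of one variable, from first to last access. */
typedef struct {
    int first_access;
    int last_access;
    int entry;        /* output: memory entry assigned to the variable */
} Lifetime;

/* Packs variables into as few entries as possible.
 * v[] must be sorted by first_access. Returns the number of entries used. */
int min_entry_pack(Lifetime *v, size_t n)
{
    int free_at[MAX_ENTRIES]; /* last access of the latest occupant */
    int used = 0;

    for (size_t i = 0; i < n; i++) {
        int e = 0;
        assert(i == 0 || v[i - 1].first_access <= v[i].first_access);

        /* Find the first entry whose occupant is already dead. */
        while (e < used && free_at[e] >= v[i].first_access)
            e++;
        if (e == used) {      /* none free: open a new entry */
            assert(used < MAX_ENTRIES);
            used++;
        }
        free_at[e] = v[i].last_access;
        v[i].entry = e;
    }
    return used;
}
```

Entries beyond the returned count never hold data, so they are the ones a min-entry scheme can turn off for the whole schedule.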
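Extended DAG Modeling
[Figure: three variable intervals I1, I2, I3 on a time axis are modeled as a DAG between start and end nodes. With temporal information, vertices I1, I2, I3 carry weights w1, w2, w3 and are connected by edges e1 to e5. Adding spatial information for a memory of 4 entries introduces edges labeled E1 to E4.]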
Path-place Algorithm
Greedily covering DAG with N node-disjoint
paths. The length of a path indicates the
power saving of a memory entry.
First sort all vertices in topological order
A vertex is covered each time to calculate the longest
path reaching it, iff not adjacent to other nodes
Sum the weights of the final level vertices, edges,
and virtual edges from start to end if k < N
Complexity: O((n+e)*N)
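The greedy covering step can be sketched as follows. This is a simplified illustration, not the authors' implementation: it uses vertex weights only (no edge or virtual-edge weights), derives DAG edges from interval times, and all names (`Vertex`, `MAX_VERTICES`, `path_place`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_VERTICES 64

/* Hypothetical DAG vertex: one variable interval. */
typedef struct {
    int start;      /* first cycle of the interval */
    int end;        /* last cycle of the interval, start < end */
    double saving;  /* vertex weight: power saving if placed */
    int entry;      /* output: assigned memory entry, -1 if unplaced */
} Vertex;

/* Greedily covers the DAG with up to num_entries node-disjoint paths.
 * An edge i -> j exists when v[i].end <= v[j].start, so sorting v[] by
 * start time gives a topological order. Returns the total saving. */
double path_place(Vertex *v, size_t n, int num_entries)
{
    double best[MAX_VERTICES]; /* heaviest path ending at each vertex */
    int prev[MAX_VERTICES];    /* predecessor on that path, or -1 */
    double total = 0.0;

    assert(n <= MAX_VERTICES);
    for (size_t i = 0; i < n; i++)
        v[i].entry = -1;

    for (int e = 0; e < num_entries; e++) {
        int tail = -1;
        /* Longest-path DP over the vertices not yet covered. */
        for (size_t j = 0; j < n; j++) {
            if (v[j].entry != -1)
                continue;
            best[j] = v[j].saving;
            prev[j] = -1;
            for (size_t i = 0; i < j; i++) {
                if (v[i].entry == -1 && v[i].end <= v[j].start &&
                    best[i] + v[j].saving > best[j]) {
                    best[j] = best[i] + v[j].saving;
                    prev[j] = (int)i;
                }
            }
            if (tail < 0 || best[j] > best[tail])
                tail = (int)j;
        }
        if (tail < 0)
            break; /* every vertex is already covered */
        total += best[tail];
        /* Assign the whole path to entry e. */
        for (int u = tail; u != -1; u = prev[u])
            v[u].entry = e;
    }
    return total;
}
```

Each round runs one longest-path pass over the remaining graph, which matches the O((n+e)*N) bound on the slide when the edges are stored explicitly; the pairwise scan here trades that for brevity.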
Location Assignment Schemes (VI)
Data layout with leakage awareness
Power savings on unused entries, dead and live intervals
[Figure: BRAM lines over time 0 to t under the Path-place scheme]
Location Assignment Schemes
[Figure: six BRAM line timelines from time 0 to t, one per scheme: the state-of-the-art, used-active, min-entry, sleep-dead, drowsy-long, path-place]
Embedded Memory Leakage-aware
Design Flow
Exploring temporal and spatial information
Path traversal and location assignment
Introduced for deciding the best data layout within
embedded memory to achieve the maximal leakage saving
Radix-2 FFT Example
for ( le=4, k=0; k<2; k++) {
le /= 2;
for ( j=0; j<le; j++) {
...
for ( i=j; i<4; i += 2*le) {
...
tmpi = imag[i];
imag[i] += imag[i + le];
ti = tmpi - imag[i + le];
imag[i + le] = ...
}
...
}
...
}
[Figure: design flow applied to the FFT code above: Compilation, Scheduling, Path traversal, Location assignment. Scheduling yields the access intervals of imag[0] to imag[3] on a time axis from 0 to 50; after location assignment each interval is labeled n=0 or n=1.]
Empirical Study
Experimental setup
Simulation of a configurable dual-port
synchronous RAM with 18 Kbits
Read/write ports: both ports can read the same
memory cell simultaneously, but cannot write to
the same location (no write conflict).
Configurable data width: 1-bit, 2-bit, 4-bit, 9-bit, or 18-bit
eCACTI [Dutt’04]: modeling transistor leakage
DSP benchmarks: dft, idft, fft-2, fft-4, filter, mp
Comparing Different Schemes
[Figure: percentage of power savings (0% to 100%) for Full-active, Used-active, Min-entry, Sleep-dead, Drowsy-long, Path-place, and OPT on idft, dft, fft-4, fft-2, filter, mp, and their average; highlighted values: 37%, 76%, 95%]
Conclusions
Leakage is dominating dynamic power
as technology scaling trends hold
Leakage problem of embedded memories
is of growing importance
Explored temporal and spatial information
for optimizing leakage power, achieving
significant leakage savings (95%)
Backup
[Cartoon: a device offering multimedia, Internet, and cellular telephony won’t work: the machine is too hot, and the battery (50+ lbs) is too heavy.]
Power Optimization Techniques
Dynamic power
Design time: reduced Vdd, logic synthesis, pin ordering, transistor sizing, multi-Vdd islands, path balancing, tradeoff area for power
Non-active modules: clock/power gating
Run time: DVS, DFS (based on workload)
Leakage power
Design time: + multi-Vth MTCMOS (critical/non-critical paths)
Non-active modules: sleep transistors, multi-Vdd, variable Vth
Run time: + variable Vth
Saving Leakage Power without
Performance Degradation
Deriving the interval lengths with
static scheduling
Scheduling any needed data
just before it is needed
Avoiding any performance impact
The Generalized Model
Parameterized model
Inputs
Wake-up latencies
Interval distribution
Leakage power of each state
Transition energy between
states
Output
Maximal power saving
[Meng HPCA’05]
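[Figure: state diagram with states Active, Drowsy, and Sleep, annotated with leakage powers P(Active), P(Drowsy), P(Sleep) and transition energies EAS, EAD, EDA, ESA]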
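A minimal sketch of how such a parameterized model might be evaluated follows. It is an assumption-laden simplification rather than the model from [Meng HPCA’05]: wake-up latencies are ignored, each interval is charged one round-trip transition, and the names `LeakageModel` and `max_saving` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical parameter set mirroring the model inputs. */
typedef struct {
    double p_active, p_drowsy, p_sleep; /* leakage power of each state */
    double e_ad, e_da;                  /* Active <-> Drowsy transition energy */
    double e_as, e_sa;                  /* Active <-> Sleep transition energy */
} LeakageModel;

static double min2(double x, double y) { return x < y ? x : y; }

/* Sums, over all interval lengths, the energy saved by the cheapest
 * mode relative to staying active for the whole interval. */
double max_saving(const LeakageModel *m, const double *len, size_t n)
{
    double saved = 0.0;
    for (size_t i = 0; i < n; i++) {
        assert(len[i] >= 0.0);
        double active = m->p_active * len[i];
        double drowsy = m->e_ad + m->p_drowsy * len[i] + m->e_da;
        double sleep  = m->e_as + m->p_sleep * len[i] + m->e_sa;
        saved += active - min2(active, min2(drowsy, sleep));
    }
    return saved;
}
```

Short intervals contribute nothing because staying active is cheapest, medium ones favor drowsy, and long ones favor sleep, which mirrors the inflection-point policy.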
Example of path-place
[Figure: four intervals I1 to I4 with weights w1 to w4 placed into a memory of 5 entries over time 0 to t; the DAG between start and end uses edges e1 to e6 and edges labeled E1 to E8]
TopList: {I4, I1, I2, I3}