Leakage Power Reduction of Embedded Memories on FPGAs through Location Assignment
Yan Meng, Tim Sherwood, and Ryan Kastner
University of California, Santa Barbara
Department of Electrical & Computer Engineering
ExPRESS Group: http://express.ece.ucsb.edu
Outline
Motivation
The leakage problem of embedded memories on FPGAs is of growing importance
Synthesis techniques for leakage power optimization of embedded memories
Conclusions
Motivation
FPGAs are attractive options
High processing power, flexibility, reconfigurability
Power is becoming critical
Why worry about power?
Heat dissipation, portability
Where does power go in CMOS?
Dynamic power consumption
Switching power due to charging and discharging load capacitors
Short circuit currents between supply rails when both transistors are on during switching
Leakage power consumption
Technology Scaling and Leakage Power Dissipation
[Figure: leakage power as a fraction of total power (0% to 100%) versus year (1999 to 2009), with technology nodes from 130nm through 70nm, 50nm, 30nm, and 20nm down to 15nm.]
Leakage is dominating over dynamic power as technology scales down (improving speed, transistor density and functionality)
On-chip Memory Leakage Control
Why control leakage through on-chip memory?
Caches on microprocessors
Huge portion of chip area
Leakage is proportional to the number of transistors
Major source of leakage consumption [Roy01, Hu01, Flautner02, Mudge04]
50% in 2005 [ITRS 02]
Dynamic reshuffling due to cache replacement policies
Cache hierarchy with data replication
Memories on FPGAs
Configuration SRAMs: not on critical paths, high Vth
Embedded memories
Accesses are usually statically scheduled
Not necessarily a part of a memory hierarchy with inclusion
Leakage problem of embedded memories is of growing importance
Embedded memory bits/logic cells > 20x
[Figure: ratio of embedded memory bits to logic cells (axis 0 to 120, with a 20x marker) for FPGA families released from 1994 (FLEX 10K) to 2006 (Virtex-5), grouped as new, mainstream, and mature/others.]
Leakage Power Optimization of Embedded Memories on FPGAs
Motivating Example
[Figure: three panels of BRAM lines over time (0 to t), illustrating the use of temporal information and spatial information.]
Outline
Motivation
Synthesis techniques for leakage power optimization of embedded memories
Temporal
Temporal + spatial
Conclusions
Temporal Information
Precedence order between variables
Saving power on variables
Keep frequently accessed lines active to ensure high performance
Turn off lines that are not used for a long time
Use low supply voltage to save power for the rest
Using the generalized model to calculate maximal leakage power savings for variables [Meng HPCA’05]
Definitions – Intervals
[Figure: timeline of accesses to a variable v, marking an interval of length |Ii| between two accesses and the last use.]
Live interval
time between two successive accesses to the same variable v within a memory entry
Dead interval
time before the first access or after the last access to a variable
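The two definitions above can be made concrete with a small sketch. It assumes that access times are integer cycle numbers taken from a static schedule and that the schedule spans the range 0 to `horizon`; the function name and the `(start, end)` tuple representation are illustrative and do not come from the slides.

```python
def split_intervals(access_times, horizon):
    """Split the lifetime of one variable into live and dead intervals.

    access_times: cycle numbers at which the variable is read or written.
    horizon: last cycle of the schedule (assumed to start at cycle 0).
    Returns (live, dead), each a list of (start, end) pairs.
    """
    times = sorted(set(access_times))
    if not times:
        # Never accessed: the whole schedule is one dead interval.
        return [], [(0, horizon)]

    # Live intervals: gaps between two successive accesses.
    live = list(zip(times, times[1:]))

    # Dead intervals: before the first access and after the last access.
    dead = []
    if times[0] > 0:
        dead.append((0, times[0]))
    if times[-1] < horizon:
        dead.append((times[-1], horizon))
    return live, dead


live, dead = split_intervals([3, 10, 4], horizon=20)
print(live)  # [(3, 4), (4, 10)]
print(dead)  # [(0, 3), (10, 20)]
```

The length of each pair, `end - start`, plays the role of |Ii| on the later slides.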
Definitions – Operating Modes
Active mode
Power on the whole line
No power saving
Sleep mode [Roy01, Hu01]
Sleep/“turn off” transistors
Lose data
Drowsy mode [Flautner02, Mudge04]
Use low supply voltage to save power when it is not needed
Preserve data for fast reaccess
Wake up to the high voltage and return data
[Figure: supply voltage over an interval |Ii| for each mode. Active stays at Vdd; sleep moves between Vdd and 0 through phases s1, s2, s3; drowsy moves between Vdd and Vddlow through phases d1, d2, d3.]
Choosing Operating Modes
For an interval of length |Ii|: active mode, sleep mode, or drowsy mode?
Inflection Points
Which mode to apply on each interval?
Active-drowsy inflection point a
The least amount of time drowsy mode needs to save energy
$a = \arg\min_t \{ E^{Drowsy}_{saving}(t) \ge 0 \} = d_1 + d_3$
Sleep-drowsy inflection point b
The time where sleep and drowsy modes consume the same amount of energy
$b = \{ t : E_{Drowsy}(t) = E_{Sleep}(t) \}$
$E_{Drowsy} = \sum_{i=1,2,3} P_L(d_i) \cdot d_i$
$E_{Sleep} = \sum_{i=1,2,3,4} P_L(s_i) \cdot s_i$
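As a rough numeric illustration of the two inflection points, the sketch below assumes that each mode consists of a few fixed-length transition phases plus one resident phase that fills the rest of the interval, so that each energy is a sum of leakage power times duration over the phases. The helper names and all numbers are made up for illustration; they are not measured values from the talk.

```python
def mode_energy(t, transitions, resident_power):
    """Leakage energy of one mode over an interval of length t.

    transitions: list of (leakage_power, duration) for the fixed phases.
    resident_power: leakage power of the phase that fills the remaining time.
    """
    overhead_time = sum(d for _, d in transitions)
    if t < overhead_time:
        raise ValueError("interval is shorter than the mode's transition time")
    overhead_energy = sum(p * d for p, d in transitions)
    return overhead_energy + resident_power * (t - overhead_time)


def active_drowsy_point(d1, d3):
    """a = d1 + d3: the time spent entering and leaving drowsy mode."""
    return d1 + d3


def sleep_drowsy_point(drowsy_tr, p_drowsy, sleep_tr, p_sleep):
    """b: interval length at which drowsy and sleep energies are equal.

    Both energies are linear in t under the assumed model, E(t) = c + p * t,
    so the crossing point has a closed form.
    """
    if p_drowsy <= p_sleep:
        raise ValueError("expected drowsy leakage to exceed sleep leakage")
    c_drowsy = sum(p * d for p, d in drowsy_tr) - p_drowsy * sum(d for _, d in drowsy_tr)
    c_sleep = sum(p * d for p, d in sleep_tr) - p_sleep * sum(d for _, d in sleep_tr)
    return (c_sleep - c_drowsy) / (p_drowsy - p_sleep)


# Illustrative parameters (arbitrary units).
DROWSY_TR = [(1.0, 1), (1.0, 1)]   # enter and leave drowsy mode
SLEEP_TR = [(1.0, 3), (2.0, 3)]    # enter sleep mode, wake up and refetch
a = active_drowsy_point(1, 1)
b = sleep_drowsy_point(DROWSY_TR, 0.25, SLEEP_TR, 0.0)
print(a, b)  # 2 30.0
```

With these numbers, sleep mode only pays off for intervals longer than 30 time units, because its wake-up cost is much higher than that of drowsy mode.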
Selecting Operating Modes with Inflection Points
[Flowchart: each interval I is classified by its length |I|.]
|I| ≤ a: active interval, active mode
a < |I| ≤ b: drowsy interval, drowsy mode
|I| > b: sleep interval, sleep mode
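The selection rule can be written as a short function. This is a minimal sketch: the slide states only the middle condition a < |I| ≤ b explicitly, so the handling of the two outer ranges is inferred from it, and the function name and the string labels are illustrative.

```python
def select_mode(length, a, b):
    """Pick an operating mode for an interval of the given length.

    a: active-drowsy inflection point.
    b: sleep-drowsy inflection point (assumed to satisfy a <= b).
    """
    if a > b:
        raise ValueError("expected a <= b")
    if length <= a:
        return "active"   # too short for drowsy mode to save energy
    if length <= b:
        return "drowsy"   # long enough for drowsy, too short for sleep
    return "sleep"        # long enough to amortize the sleep overhead


# Example with illustrative inflection points a = 2 and b = 30.
modes = [select_mode(n, a=2, b=30) for n in (1, 2, 3, 30, 31)]
print(modes)  # ['active', 'active', 'drowsy', 'drowsy', 'sleep']
```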
Optimal Leakage Management Policy
Oracle knowledge of all interval lengths based on static scheduling
Applying the appropriate operating mode on each variable interval
Obtaining maximal leakage power saving
Formal proof of the optimality [Meng HPCA’05]
Outline
Motivation
Synthesis for leakage power optimization of embedded memories
Temporal
Temporal + spatial
Conclusions
Spatial Information
Spatial layout of data leads to different potentials of power savings
[Figure: two layouts of BRAM lines over time (0 to t): one variable per entry, and minimal number of entries.]
Memory Leakage Optimization Techniques
[Figure: six panels of BRAM lines over time (0 to t): the state-of-the-art, used-active, min-entry, sleep-dead, drowsy-long, and path-place.]
Location Assignment Schemes (I)
The state of the art: no leakage control
[Figure: BRAM lines over time (0 to t) under the Full-active scheme.]
Location Assignment Schemes (II)
Turning off the unused part
[Figure: BRAM lines over time (0 to t) under the Used-active scheme.]
Location Assignment Schemes (III)
Packing variables into the minimal number of entries and turning off the rest
[Figure: BRAM lines over time (0 to t) under the Min-entry scheme.]
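One common way to pack variable lifetimes into the fewest memory entries is the left-edge (interval partitioning) heuristic, sketched below. The slides do not say which packing method the authors use, so this is an assumed stand-in; the lifetime representation as half-open `(start, end)` ranges and all names are illustrative.

```python
import heapq


def min_entry_assignment(lifetimes):
    """Assign each variable to a memory entry, reusing entries when possible.

    lifetimes: dict mapping variable name -> (start, end), half-open range.
    Returns (assignment, n_entries), where assignment maps name -> entry index.
    """
    assignment = {}
    busy = []        # min-heap of (end_time, entry_index) for occupied entries
    n_entries = 0
    # Process variables by increasing start time (the "left edge").
    for name, (start, end) in sorted(lifetimes.items(), key=lambda kv: kv[1][0]):
        if busy and busy[0][0] <= start:
            # The entry that frees up earliest is already free: reuse it.
            _, entry = heapq.heappop(busy)
        else:
            # Every entry is still occupied: open a new one.
            entry = n_entries
            n_entries += 1
        assignment[name] = entry
        heapq.heappush(busy, (end, entry))
    return assignment, n_entries


lifetimes = {"v1": (0, 4), "v2": (2, 6), "v3": (4, 8), "v4": (6, 9)}
assignment, n_entries = min_entry_assignment(lifetimes)
print(n_entries)   # 2
print(assignment)  # {'v1': 0, 'v2': 1, 'v3': 0, 'v4': 1}
```

Entries with an index of `n_entries` or higher are never used and can stay off for the whole schedule.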
Location Assignment Schemes (IV)
Min entry + sleep dead intervals
[Figure: BRAM lines over time (0 to t) under the Sleep-dead scheme.]
Location Assignment Schemes (V)
Min entry + sleep dead + drowsy long
[Figure: BRAM lines over time (0 to t) under the Drowsy-long scheme.]
Extended DAG Modeling
[Figure: intervals I1, I2, I3 laid out over time on 4 entries, and the corresponding DAG from start to end. Interval vertices carry weights w1 to w3, edges are labeled e1 to e5, and further labels E1 to E4 appear when spatial information is added to the temporal information.]
Path-place Algorithm
Greedily covering the DAG with N node-disjoint paths. The length of a path indicates the power saving of a memory entry.
First sort all vertices in topological order
A vertex is covered each time to calculate the longest path reaching it, iff not adjacent to other nodes
Sum the weights of the final level vertices, edges, and virtual edges from start to end if k < N
Complexity: O((n+e)*N)
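The greedy covering idea can be sketched as N rounds of a longest-path computation over the vertices that are still uncovered, which matches the stated O((n+e)*N) bound. This is a simplified reading of the slide, not the authors' implementation: it omits the virtual start and end vertices and the savings of unused entries, it assumes non-negative weights, and all names and numbers are illustrative.

```python
def longest_path(order, succ, node_w, edge_w, covered):
    """Heaviest path through uncovered vertices of a DAG.

    order: vertices in topological order.
    succ: dict vertex -> list of successor vertices.
    node_w / edge_w: power saving attached to each vertex / edge.
    """
    best = {v: node_w[v] for v in order if v not in covered}
    prev = {v: None for v in best}
    for v in order:                       # best[v] is final when v is visited
        if v in covered:
            continue
        for u in succ.get(v, ()):
            if u in covered:
                continue
            cand = best[v] + edge_w[(v, u)] + node_w[u]
            if cand > best[u]:
                best[u], prev[u] = cand, v
    end = max(best, key=best.get)
    path = []
    while end is not None:                # walk back along predecessor links
        path.append(end)
        end = prev[end]
    path.reverse()
    return path, best[path[-1]]


def path_place(order, succ, node_w, edge_w, n_entries):
    """Cover the DAG greedily with at most n_entries node-disjoint paths."""
    covered, paths = set(), []
    for _ in range(n_entries):
        if len(covered) == len(order):
            break                         # every interval is already placed
        path, saving = longest_path(order, succ, node_w, edge_w, covered)
        paths.append((path, saving))
        covered.update(path)
    return paths


# I1 and I2 overlap in time; I3 starts after both have ended.
order = ["I1", "I2", "I3"]
succ = {"I1": ["I3"], "I2": ["I3"]}
node_w = {"I1": 3, "I2": 2, "I3": 4}
edge_w = {("I1", "I3"): 1, ("I2", "I3"): 1}
paths = path_place(order, succ, node_w, edge_w, n_entries=2)
print(paths)  # [(['I1', 'I3'], 8), (['I2'], 2)]
```

Each returned path corresponds to one memory entry: the intervals on the path share that entry, and the path weight is the saving credited to it.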
Location Assignment Schemes (VI)
Data layout with leakage awareness
Power savings on unused entries, dead and live intervals
[Figure: BRAM lines over time (0 to t) under the Path-place scheme.]
Location Assignment Schemes
[Figure: six panels of BRAM lines over time (0 to t): the state-of-the-art, used-active, min-entry, sleep-dead, drowsy-long, and path-place.]
Embedded Memory Leakage-aware Design Flow
Exploring temporal and spatial information
Path traversal and location assignment
Introduced for deciding the best data layout within embedded memory to achieve the maximal leakage saving
Radix-2 FFT Example
    for ( le=4, k=0; k<2; k++) {
        le /= 2;
        for ( j=0; j<le; j++) {
            ...
            for ( i=j; i<4; i += 2*le) {
                ...
                tmpi = imag[i];
                imag[i] += imag[i + le];
                ti = tmpi - imag[i + le];
                imag[i + le] = ...
            }
            ...
        }
        ...
    }
[Figure: design flow stages (compilation, scheduling, path traversal, location assignment) applied to this loop. Two timelines (0 to 50) show the intervals of imag[0] through imag[3], with each interval labeled n=0 or n=1.]
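To see where the intervals of imag[0] through imag[3] come from, the loop nest of this example can be traced in software. The sketch below mirrors the four visible statements of the inner loop and assumes, purely for illustration, that each statement takes one time step and that the elided parts ("...") touch no imag element; the real schedule and the 0 to 50 time scale in the figure come from the synthesis flow and will differ.

```python
from collections import defaultdict


def trace_imag_accesses(n=4, stages=2):
    """Record the time steps at which each imag[i] is accessed."""
    accesses = defaultdict(list)
    t = 0
    le = n
    for _ in range(stages):               # for (le=4, k=0; k<2; k++)
        le //= 2                          # le /= 2;
        for j in range(le):               # for (j=0; j<le; j++)
            for i in range(j, n, 2 * le): # for (i=j; i<4; i += 2*le)
                # tmpi = imag[i];
                accesses[i].append(t)
                t += 1
                # imag[i] += imag[i + le];
                accesses[i].append(t)
                accesses[i + le].append(t)
                t += 1
                # ti = tmpi - imag[i + le];
                accesses[i + le].append(t)
                t += 1
                # imag[i + le] = ...
                accesses[i + le].append(t)
                t += 1
    return dict(accesses)


trace = trace_imag_accesses()
for index in sorted(trace):
    print(f"imag[{index}]: {trace[index]}")
# imag[0]: [0, 1, 8, 9]
# imag[1]: [4, 5, 9, 10, 11]
# imag[2]: [1, 2, 3, 12, 13]
# imag[3]: [5, 6, 7, 13, 14, 15]
```

The long gaps, such as time 1 to 8 for imag[0], are the live intervals that are candidates for drowsy or sleep mode.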
Empirical Study
Experimental setup
Simulation of a configurable dual-port synchronous RAM with 18 Kbits
Read/write ports: both ports can read the same memory cell simultaneously, but can’t write to the same location (no write conflict).
Configurable: 1-bit, 2-bit, 4-bit, 9-bit, or 18-bit
eCACTI [Dutt’04]: modeling transistor leakage
DSP benchmarks: dft, idft, fft-2, fft-4, filter, mp
Comparing Different Schemes
[Figure: percentage of power savings (0% to 100%) for the Full-active, Used-active, Min-entry, Sleep-dead, Drowsy-long, Path-place, and OPT schemes on idft, dft, fft-4, fft-2, filter, mp, and their average. Labeled values: 95%, 76%, 37%.]
Conclusions
Leakage is dominating dynamic power as technology scaling trends hold
Leakage problem of embedded memories is of growing importance
Explored temporal and spatial information for optimizing leakage power, achieving a significant leakage saving (95%)
Backup
[Figure: cartoon about multimedia, Internet, and cellular telephony devices that won’t work: "The machine is too hot." and "The battery (50+ lbs) is too heavy."]
Power Optimization Techniques
Design time, dynamic power: reduced Vdd, logic synthesis, pin ordering, transistor sizing, multi-Vdd islands, path balancing, tradeoff area for power
Design time, leakage power: + multi-Vth MTCMOS (critical/non-critical paths)
Non-active modules, dynamic power: clock/power gating
Non-active modules, leakage power: sleep transistors, multi-Vdd, variable Vth
Run time, dynamic power: DVS, DFS (based on workload)
Run time, leakage power: + variable Vth
Saving Leakage Power without Performance Degradation
Deriving the interval lengths with static scheduling
Scheduling any needed data just before it is needed
Avoiding any performance impact
The Generalized Model
Parameterized model
Inputs
Wake-up latencies
Interval distribution
Leakage power of each state
Transition energy between states
Output
Maximal power saving
[Meng HPCA’05]
[Figure: state diagram with states Active, Drowsy, and Sleep, their leakage powers P(Active), P(Drowsy), P(Sleep), and transition energies EAS, EAD, EDA, ESA.]
Example of path-place
[Figure: intervals I1 to I4 (weights w1 to w4) laid out over time (0 to t) on 5 entries, and the corresponding DAG from start to end with edges e1 to e6 and labels E1 to E8.]
TopList: {I4, I1, I2, I3}