Activity Estimation for Field

An Active Glitch Elimination Technique for FPGAs
by Julien Lamoureux, Guy Lemieux, and Steven J.E. Wilton
University of British Columbia
Funding provided by Altera and NSERC
Context of this Research
• FPGAs use a lot of power
• Dynamic power still dominates
[Pie chart: Core Dynamic, Core Static, and I/O power; slices of 67%, 22%, and 11%, with dynamic power the largest]
Source: Altera Stratix II Study (99 Circuits)
Overview
• Reducing power in FPGAs by minimizing glitching
• Using programmable delay circuits to align arrival times
• Results:
– 18% power savings
– 5% area overhead
– 1% critical-path delay overhead
• No changes to the existing CAD flow
What is glitching?
• Unnecessary transitions
• Generated by uneven arrival times
• Propagated by certain gates
[Figure: gates with inputs a, b, c and output f, shown with their waveforms. Panels: Glitch generation; Glitch propagation.]
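
To make the glitch-generation case concrete, the sketch below counts output transitions of a two-input XOR gate whose inputs both rise, first with uneven arrival times and then with aligned ones. It is a minimal zero-delay model; the function name `output_transitions` and the arrival times are illustrative assumptions, not values from the slides.

```python
from operator import xor

def output_transitions(gate, arrivals, initial):
    """Count output transitions of a zero-delay gate.

    arrivals: dict mapping input name -> (time, new_value)
    initial:  dict mapping input name -> starting value
    """
    values = dict(initial)
    names = sorted(values)
    out = gate(*(values[n] for n in names))
    transitions = 0
    # Group input changes by arrival time so simultaneous events apply together.
    times = sorted({t for t, _ in arrivals.values()})
    for t in times:
        for name, (when, new) in arrivals.items():
            if when == t:
                values[name] = new
        new_out = gate(*(values[n] for n in names))
        if new_out != out:
            transitions += 1
            out = new_out
    return transitions

# Both inputs rise 0 -> 1; the functional output of XOR stays at 0.
start = {"a": 0, "b": 0}
skewed = {"a": (1.0, 1), "b": (3.0, 1)}    # uneven arrival times (ns, hypothetical)
aligned = {"a": (3.0, 1), "b": (3.0, 1)}   # arrival times lined up

print(output_transitions(xor, skewed, start))   # 2: a 0 -> 1 -> 0 glitch
print(output_transitions(xor, aligned, start))  # 0: no unnecessary transitions
```

With skewed inputs the output pulses high and returns low, two transitions that carry no functional information; with aligned inputs the output never moves.
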
How much glitching is there?

Circuit   Switching Activity   Functional   Glitching   % Glitching
C1355     0.319                0.231        0.088       27.5
C1908     0.255                0.167        0.088       34.6
C2670     0.267                0.208        0.059       22.2
C3540     0.419                0.230        0.189       45.2
C432      0.260                0.184        0.076       29.3
C499      0.341                0.232        0.109       31.9
C5315     0.400                0.253        0.147       36.7
C6288     1.562                0.295        1.267       81.1
C7552     0.392                0.228        0.165       42.0
C880      0.232                0.186        0.046       19.8
alu4      0.081                0.070        0.011       13.1
apex2     0.049                0.042        0.007       13.7
apex4     0.044                0.030        0.014       32.3
des       0.267                0.169        0.098       36.8
ex1010    0.032                0.015        0.017       52.9
ex5p      0.168                0.082        0.086       51.0
misex3    0.064                0.050        0.013       20.9
pdc       0.035                0.024        0.011       31.8
seq       0.048                0.040        0.008       16.0
spla      0.049                0.028        0.021       42.7
Average   -                    -            -           34.1
Glitching accounts for about 1/3 of dynamic power (34.1% on average)!
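
The % Glitching column in the table above appears to be the glitching activity divided by the total switching activity (functional plus glitching). The sketch below reproduces that arithmetic for three rows; the helper name `glitch_percentage` is an assumption, and small differences from the tabulated percentages are expected because the published activities are rounded to three decimals.

```python
def glitch_percentage(functional, glitching):
    """Share of total switching activity that is due to glitches, in percent."""
    total = functional + glitching
    return 100.0 * glitching / total

# (functional, glitching) activities copied from the table.
rows = {
    "C1355": (0.231, 0.088),
    "C6288": (0.295, 1.267),
    "alu4": (0.070, 0.011),
}

for name, (functional, glitching) in rows.items():
    print(f"{name}: {glitch_percentage(functional, glitching):.1f}% glitching")
```
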
Idea
Use programmable delay circuits to line up arrival times.
[Figure: gate with inputs a, b, c and output f, shown before and after the arrival times are lined up]
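
One way to read "lining up arrival times" is to delay every early input until it matches the latest one. The sketch below computes those per-input delays; the function name `alignment_delays` and the arrival times are hypothetical, and the slides do not state that the delays are computed exactly this way.

```python
def alignment_delays(arrival_times):
    """Delay to add to each input so that all inputs arrive with the latest one."""
    latest = max(arrival_times.values())
    return {name: latest - t for name, t in arrival_times.items()}

# Hypothetical arrival times (ns) at the inputs of one gate.
arrivals = {"a": 1.0, "b": 3.0, "c": 2.5}

delays = alignment_delays(arrivals)
print(delays)  # {'a': 2.0, 'b': 0.0, 'c': 0.5}
```

Under this reading the latest-arriving input always needs zero added delay, so a gate with K inputs needs at most K-1 non-zero delays.
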
ASIC vs. FPGA
• ASIC
– Circuit and delays are known before fabrication
– Fixed delay circuits can be used
• FPGA
– Circuit and delays are unknown before fabrication
– Delay circuits need to be programmable
– Location of the delay circuits must be carefully considered
Where should the delays go?
• Option 1: Global Routing
• Option 2: Logic Blocks (LABs)
Where in the LAB?
[Figure: Configurable Logic Block (CLB) containing BLEs, labelled with the parameters I, K, and N]
Scheme 1: LUT inputs. Too expensive?
4 Schemes
• Scheme 1: LUT inputs.
• Scheme 2: LUT inputs + outputs.
• Scheme 3: CLB and LUT inputs.
• Scheme 4: LUT inputs + bank.
[Figure: four CLB diagrams, one per scheme, labelled with I, K, N, and BLE; one diagram carries the additional labels N+B and B]
Programmable Delay Circuit
[Figure: programmable delay circuit between in and out, with capacitance C and binary-weighted resistances R, 2R, ..., 2^(n-1)R controlled by n SRAM bits. Later builds add biased PMOS and biased NMOS stages with bias voltages Vbp and Vbn, supplied by a biasing circuit.]
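
The binary-weighted resistances suggest a delay that scales with an n-bit SRAM code. The sketch below is a first-order RC model under stated assumptions: bit i of the code places a resistance of 2^i R in the signal path, and the delay is the 50% point of a single-pole RC response. The topology, the function name `delay_seconds`, and all component values are hypothetical; the slides give only the figure labels.

```python
import math

def delay_seconds(code, n_bits, r_unit, c_load):
    """First-order RC delay for an n-bit SRAM code (assumed model)."""
    if not 0 <= code < 2 ** n_bits:
        raise ValueError("code must fit in n_bits")
    # Assumption: bit i set means resistance (2**i) * r_unit is in the signal
    # path, so the total series resistance is simply code * r_unit.
    r_total = code * r_unit
    # 50% point of a single-pole RC step response: t = ln(2) * R * C.
    return math.log(2) * r_total * c_load

N_BITS = 5        # hypothetical number of SRAM bits
R_UNIT = 10e3     # hypothetical unit resistance, ohms
C_LOAD = 20e-15   # hypothetical load capacitance, farads

for code in (0, 1, 16, 31):
    delay_ps = delay_seconds(code, N_BITS, R_UNIT, C_LOAD) * 1e12
    print(f"code {code:2d}: {delay_ps:.1f} ps")
```

In this model n bits give 2^n evenly spaced delay settings, so the unit resistance sets the delay increment and the number of bits sets the maximum delay.
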
Calibrating Scheme 1
Three parameters:
1. Number of delay circuits per LUT
2. Maximum delay of the delay circuit
3. Minimum delay increment of the delay circuit
Number of delay circuits per LUT
[Figure: CLB diagram for Scheme 1: LUT inputs, shown over several builds]
[Chart: % Glitch Elimination (0 to 100) vs. # Inputs with Delay Circuitry (0 to 7), with curves for 4-LUT, 5-LUT, and 6-LUT; callout: K-1]
Maximum Delay
[Chart: Pulse Width Distribution (C6288); % of Pulses (0 to 18) vs. Pulse Width (0 to 19 ns); successive builds mark 4 ns, 6 ns, and 8 ns]
[Chart: % Glitch Elimination (0 to 100) vs. Maximum Delay (0 to 12 ns), with curves for K=4, K=5, and K=6; callout: 8 ns]
Minimum Delay Increment
[Chart: Normalized Power (0.0 to 1.0) vs. Pulse Width (0 to 1000 ps); successive builds mark 100 ps, 200 ps, and 300 ps]
[Chart: % Glitch Elimination (0 to 100) vs. Minimum Delay Increment (0 to 3 ns), with curves for K=4, K=5, and K=6; callout: 250 ps]
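
The two calibration callouts (8 ns maximum delay, 250 ps minimum increment) together bound how closely a delay circuit can match a required delay. The sketch below rounds a required delay to the nearest available setting and reports the leftover misalignment. The function names and the sample delays are assumptions, and rounding to the nearest step is one plausible policy rather than the one the authors necessarily used.

```python
import math

def quantize_delay(needed_ns, step_ns=0.25, max_ns=8.0):
    """Nearest programmable delay (ns), clamped to the available range."""
    if needed_ns < 0:
        raise ValueError("needed delay must be non-negative")
    # Small epsilon guards against floating-point error in the division.
    max_steps = math.floor(max_ns / step_ns + 1e-9)
    steps = min(round(needed_ns / step_ns), max_steps)
    return steps * step_ns

def residual_skew(needed_ns, step_ns=0.25, max_ns=8.0):
    """Misalignment (ns) left after programming the closest available delay."""
    return abs(needed_ns - quantize_delay(needed_ns, step_ns, max_ns))

# Hypothetical required delays (ns).
for needed in (0.6, 1.1, 7.9, 9.3):
    print(f"need {needed:.2f} ns -> program {quantize_delay(needed):.2f} ns, "
          f"residual {residual_skew(needed):.2f} ns")
```

Within range, the residual is at most half an increment (125 ps here); a required delay beyond the maximum leaves a larger residual, which is one reason very wide pulses cannot be fully removed.
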
Glitch Elimination Results

% Glitch Elimination
Circuit   Scheme 1   Scheme 2   Scheme 3   Scheme 4
C1355     92.5       92.2       89.6       92.2
C1908     91.2       88.4       74.2       93.2
C2670     96.3       90.5       77.3       96.1
C3540     84.7       73.3       74.4       83.9
C432      62.8       65.8       63.4       62.0
C499      94.5       92.5       94.5       94.1
C5315     83.9       71.5       69.6       82.3
C6288     62.7       59.2       56.6       48.8
C7552     89.6       76.3       77.8       90.0
C880      84.6       80.5       81.1       75.7
alu4      96.6       85.6       88.4       96.2
apex2     97.0       86.5       85.9       97.0
apex4     95.0       91.6       88.2       95.0
des       86.4       70.3       75.6       87.2
ex1010    89.6       83.0       79.9       89.4
ex5p      82.7       74.5       71.4       82.2
misex3    97.2       88.8       87.9       97.0
pdc       89.3       75.6       77.8       87.1
seq       96.9       88.9       91.5       96.7
spla      95.9       90.6       91.7       95.9
Average   88.5       81.3       79.8       87.1
Overhead
Area
• Counted in minimum-width transistor areas
• 5.3%
Tcrit
• VPR delay estimate + HSPICE simulation of the delay circuit
• 0.21%
Power
• VPR power estimate + HSPICE simulation of the delay circuits
• 0.45%
Overall Results

Final Power Savings (%)
Circuit   Ideal   Scheme 1
C1355     28.8    26.7
C1908     21.1    17.6
C2670     13.4    12.1
C3540     31.7    26.5
C432      17.1    11.2
C499      34.6    33.0
C5315     22.8    19.2
C6288     73.1    46.3
C7552     25.5    22.6
C880      9.6     7.8
alu4      3.6     3.3
apex2     4.3     4.1
apex4     10.1    9.5
des       17.9    15.4
ex1010    18.4    17.1
ex5p      28.1    25.4
misex3    8.1     7.8
pdc       13.3    11.8
seq       6.1     5.9
spla      21.4    20.8
Average   22.6    18.0
Summary
1. Proposed an active glitch elimination technique for FPGAs
2. Examined how to implement the technique
3. Reduced power by 18% with only 5% area and 1% speed overhead
4. Proposed technique requires little or no modification to the CAD flow or routing architecture