VHDL Design Tips and Low Power Design Techniques Jonathan Alexander Applications Consulting Manager Actel Corporation MAPLD 2004

VHDL Design Tips and
Low Power
Design Techniques
Jonathan Alexander
Applications Consulting Manager
Actel Corporation
MAPLD 2004
Agenda
- Advanced VHDL
  - ProASICPlus Synthesis, Options and Attributes
  - Timing Specifications
  - Design Hints
- Power-Conscious Design Techniques
- Summary

Actel ProASICPlus Design Flow
[Figure: design flow diagram. Synthesis stage: VHDL source, directives, attributes, and timing constraints; logic optimization and technology mapping. Place & Route stage: timing, pin, and placement constraints; technology implementation.]

What is Synthesis?
- The mapping of a behavioral description to a specific target technology, i.e. it generates a structural netlist from an HDL description
- Includes optimization steps
  - Optimize the design implementation for
    - Higher speed
    - Smaller area
    - Lower power

ProASICPlus HDL Attributes and Directives
- Attributes direct the way your design is optimized and mapped during synthesis.
- Directives control the way your design is analyzed prior to synthesis. Because of this, directives must be included in your VHDL source code.
- Three important ProASICPlus attributes and directives are available:
  - “syn_maxfan” (attribute)
  - “syn_keep” (directive)
  - “syn_encoding” (attribute)

ProASICPlus HDL Attributes and Directives (cont’d)
- syn_maxfan = “Value”
  - “Value” range: > 4
  - Can be assigned to an input port, register output, or net
  - Overrides the global “Fanout Limit” setting
  - The tool will replicate the signal if this attribute is associated with it
  - Syntax
    - In the HDL code
        attribute syn_maxfan of data_in : signal is 1000;
    - In the constraint file
        define_attribute {clk} syn_maxfan {200}

ProASICPlus HDL Attributes and Directives (cont’d)
- syn_keep = 1
  - When associated with a signal, this directive prevents Synplify from combining or collapsing the node.
  - This directive can be associated with combinatorial signals only
  - Syntax
    - In the HDL code
        attribute syn_keep of st : signal is true;
    - In the constraint file
        define_attribute {st} syn_keep {1}

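The one-line syntax shown for syn_maxfan and syn_keep omits the attribute declarations that VHDL requires before an attribute can be specified. The following is a minimal sketch of where those lines might sit in a design, assuming syn_maxfan is declared as an integer and syn_keep as a boolean (Synplify also ships an attributes package that can be used instead of local declarations). The entity attr_demo and its ports a, b, and q are hypothetical; only data_in and st come from the slides.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity: names other than data_in and st are illustrative only.
entity attr_demo is
  port (
    clk     : in  std_logic;
    data_in : in  std_logic;
    a, b    : in  std_logic;
    q       : out std_logic
  );
  -- An attribute of a port is specified in the entity declarative part.
  attribute syn_maxfan : integer;
  attribute syn_maxfan of data_in : signal is 1000;
end entity attr_demo;

architecture rtl of attr_demo is
  signal st : std_logic;

  -- Assumption: syn_keep is declared as a boolean attribute.
  attribute syn_keep : boolean;
  attribute syn_keep of st : signal is true;
begin
  -- Combinatorial node that the directive asks the tool to preserve.
  st <= a and b;

  process (clk)
  begin
    if rising_edge(clk) then
      q <= st xor data_in;
    end if;
  end process;
end architecture rtl;
```
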
Agenda
- Advanced VHDL
  - ProASICPlus Synthesis, Options and Attributes
  - Timing Specifications
  - Design Hints
- Power-Conscious Design Techniques
- Summary

Timing Constraints Specification
- The Synplify ProASICPlus mapper allows specification of the following:
  - Global design frequency
  - Multi-clock designs
  - Skew between two clocks
  - Input and output delays
  - Functional multi-cycle and false paths
- All of these timing specifications are available in the GUI; this presentation covers the sdc constructs only.

Design Frequency Specification: Multiple Clocks
- The Graphical User Interface “Frequency” item allows specification of a global value for all clocks
  - This setting influences the operator architecture selection (speed or area) during mapping
  - This value should be set to the highest frequency required in the design
- To specify individual values for different clocks, use the following sdc construct
    define_clock {clock_1} -freq <Value1>
    define_clock {clock_2} -freq <Value2>

Skew Specification in Synplify
- To define a skew between two clocks, use the following constraint:
    define_clock_delay -rise {clock1} -rise {clock2} <Value>
- Example
    define_clock_delay -rise {CLK19M} -rise {MPU_CLK} 1.0
    define_clock_delay -rise {MPU_CLK} -rise {CLK19M} 2.0

Input Delay
- Specifies the input arrival time of a signal in relation to the clock.
- It is used at the input ports to model the interface between the FPGA inputs and the outside environment.
- The value entered should represent the delay outside of the chip before the signal arrives at the input pin.
- To specify the “input delay” on an input port, use the following constraint:
    define_input_delay {InputPortName} <Value>

Output Delay
- Specifies the delay of the logic outside the FPGA driven by the top-level outputs.
- Used to model the interface between the FPGA outputs and the outside environment.
- To specify the “output delay”, use the following constraint:
    define_output_delay {OutputPortName} <Value>

Functional False Path
- “define_false_path” lets the user specify paths that are ignored for timing analysis. Synplify still optimizes these paths, but without priority.
- The following options are available:
    -from     <a register or input pin>
    -to       <a register or output pin>
    -through  <a net>
- Example
    # Paths from Register_A are ignored
    define_false_path -from Register_A
    # Paths to Register_B are ignored
    define_false_path -to Register_B
    # Paths through test_net are ignored
    define_false_path -through test_net

Agenda
- Advanced VHDL
  - ProASICPlus Synthesis, Options and Attributes
  - Timing Specifications
  - Design Hints
- Power-Conscious Design Techniques
- Summary

Late Arrival Signals: Prioritization
-- Initial Description
case State is
  when WAIT =>
    if Critical then
      Target <= Source_1;
    else
      Target <= Source_2;
    end if;
  when ACTIVE =>
    if Critical then
      Target <= Source_1;
    else
      Target <= Source_3;
    end if;
  when ….
end case;

-- Modified Description
-- The late-arriving Critical signal is tested first, so it controls only the final multiplexer.
if Critical then
  Target <= Source_1;
else
  case State is
    when WAIT =>
      Target <= Source_2;
    when ACTIVE =>
      Target <= Source_3;
    when ….
  end case;
end if;

[Figure: the two resulting implementations, with inputs Source_1, Source_2, State, and Critical driving Target.]

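Late Arrival Signal: Another Hint!
-- Initial description: A_late passes through the adder and then the comparator
…….
begin
  if ((A_late + B) >= Max) then
    Out <= C;
  else
    Out <= D;
  end if;
… …
end process;

-- Modified description: (Max - B) uses only early signals, so A_late passes through the comparator only
if ((Max - B) <= A_late) then
  Out <= C;
else
  Out <= D;
end if;

[Figure: the two implementations. Initial: A_late and B feed an adder, then a ">=" comparator against Max selects C or D through a mux to Out. Modified: B and Max are combined first, and A_late feeds the comparator directly.]
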
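As a self-contained illustration of the rearranged comparison, the sketch below precomputes Max - B from the early signals and compares the late signal against it. The entity name late_cmp, the 8-bit widths, and the output name result are assumptions (out is a reserved word in VHDL, so the slide's Out cannot be used as written). The operands are widened to 10-bit signed values so the subtraction cannot wrap around, which keeps the comparison equivalent to a full-precision A_late + B >= Max.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity; widths and names are illustrative.
entity late_cmp is
  port (
    a_late : in  unsigned(7 downto 0);  -- late-arriving operand
    b      : in  unsigned(7 downto 0);
    max    : in  unsigned(7 downto 0);
    c, d   : in  std_logic;
    result : out std_logic
  );
end entity late_cmp;

architecture rtl of late_cmp is
begin
  process (a_late, b, max, c, d)
    variable threshold : signed(9 downto 0);
  begin
    -- Depends only on early signals, so it is ready before a_late arrives.
    threshold := signed(resize(max, 10)) - signed(resize(b, 10));

    -- a_late now passes through a single comparator.
    if signed(resize(a_late, 10)) >= threshold then
      result <= c;
    else
      result <= d;
    end if;
  end process;
end architecture rtl;
```
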
Signal vs Variable
- Variable assignments are sensitive to order.
  - Variables are updated immediately
- Signal assignments are order independent.
  - Signal assignments are scheduled

-- Signals, first ordering: infers three registers in series
process (Clk)
begin
  if (Clk'event and Clk = '1') then
    Trgt1 <= In1 xor In2;
    Trgt2 <= Trgt1;
    Trgt3 <= Trgt2;
  end if;
end process;

-- Signals, reversed ordering: same three registers in series
process (Clk)
begin
  if (Clk'event and Clk = '1') then
    Trgt2 <= Trgt1;
    Trgt3 <= Trgt2;
    Trgt1 <= In1 xor In2;
  end if;
end process;

-- Variables assigned before they are read: infers a single register
signal vTrgt3 : std_logic;
process (Clk)
  variable vTrgt1, vTrgt2 : ...
begin
  if (Clk'event and Clk = '1') then
    vTrgt1 := In1 xor In2;
    vTrgt2 := vTrgt1;
    vTrgt3 <= vTrgt2;
  end if;
end process;

-- Variables read before they are assigned: infers three registers in series
process (Clk)
  variable vTrgt1, vTrgt2 : ...
begin
  if (Clk'event and Clk = '1') then
    Trgt3 <= vTrgt2;
    vTrgt2 := vTrgt1;
    vTrgt1 := In1 xor In2;
  end if;
end process;

Resource Sharing and “Operand” Alignment
HDL Code
process (X, Y, Z, Sel)
begin
  if (Sel = '0') then
    Res <= X * Y;
  else
    Res <= Y * Z;
  end if;
end process;

Implementations
- Without resource sharing (larger and slower): two multipliers, X * Y and Y * Z, followed by a multiplexer controlled by Sel
- With resource sharing (smaller): one multiplier with a Sel-controlled multiplexer on each operand
- Operand alignment (faster*): one multiplier with Y connected directly and a Sel-controlled multiplexer choosing X or Z
(*) Especially if Y is a late arrival signal

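One way to steer the tool toward the operand-aligned implementation is to write the multiplexer explicitly, so that Y bypasses it and feeds the single multiplier directly. This is a sketch under assumed 8-bit unsigned operands; the entity name aligned_mult and the variable operand are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity; operand widths are illustrative.
entity aligned_mult is
  port (
    sel     : in  std_logic;
    x, y, z : in  unsigned(7 downto 0);
    res     : out unsigned(15 downto 0)
  );
end entity aligned_mult;

architecture rtl of aligned_mult is
begin
  process (x, y, z, sel)
    variable operand : unsigned(7 downto 0);
  begin
    -- Only the operand that differs between the two branches is multiplexed.
    if sel = '0' then
      operand := x;
    else
      operand := z;
    end if;
    -- y is common to both branches and goes straight to the multiplier.
    res <= operand * y;
  end process;
end architecture rtl;
```
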
Resource Sharing to Avoid Buses
VHDL Code
process (X, Y, Z, T, Sel)
begin
  if (Sel = '0') then
    Eq <= (X = Y);
  else
    Eq <= (Z = T);
  end if;
end process;

Implementation
- With resource sharing (larger and slower): two 16-bit multiplexers select the operands of a single comparator
- Without resource sharing (smaller and faster): two comparators, X = Y and Z = T, followed by a 1-bit multiplexer controlled by Sel

Internal Three-state Buffers
- At the VHDL level, either
  - use the multiplexer-based modified VHDL code, or
  - replace the three-state structure with the equivalent AND-OR structure
[Figure: four three-state buffers (tri_in1 to tri_in4, enabled by tri_en1 to tri_en4) driving tri_out, shown next to the equivalent AND-OR structure and a multiplexer chain (mux_in1 to mux_in4, mux_en1 to mux_en3) driving mux_out.]

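The AND-OR replacement can be written directly as a concurrent assignment. The sketch below reuses the signal names from the figure; the entity name and_or_bus is hypothetical. It assumes at most one enable is active at a time, as on a correctly driven three-state bus. Unlike the original bus, the output is '0' rather than high-impedance when no enable is active.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity wrapping the AND-OR structure.
entity and_or_bus is
  port (
    tri_en1, tri_en2, tri_en3, tri_en4 : in  std_logic;
    tri_in1, tri_in2, tri_in3, tri_in4 : in  std_logic;
    tri_out                            : out std_logic
  );
end entity and_or_bus;

architecture rtl of and_or_bus is
begin
  -- Each enable gates its own data input; the OR merges the gated sources.
  tri_out <= (tri_en1 and tri_in1) or
             (tri_en2 and tri_in2) or
             (tri_en3 and tri_in3) or
             (tri_en4 and tri_in4);
end architecture rtl;
```
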
Agenda
- Advanced VHDL
- Power-Conscious Design Techniques
  - Data Path Selection
  - FSM Encoding
  - Gating Clocks and Signals
  - Advanced Power Design Practices
- Summary

Sources of Dynamic Power Consumption
- Switching
  - CMOS circuits dissipate power during switching
  - The more logic levels used, the more switching activity occurs
- Frequency
  - Dynamic power increases linearly with frequency
- Loading
  - Dynamic power increases with capacitive loading
- Glitch Propagation
  - Glitches cause excessive switching at relatively high frequencies
- Clock Trees
  - Clock trees operate at high frequency under heavy loading, so they contribute significantly to the total power consumption

Data Path Elements Selection
- Basic block selection is critical because the power/speed tradeoff has to be clearly identified
  - Power depends on switching activity and therefore on the input data pattern
- Watch the architecture of the basic arithmetic and logic blocks
  - Check area/speed and fanout distribution/number of logic levels
  - High fanout + large number of logic levels = higher glitch propagation
- Investigate the effect of pipelining on power dissipation
  - Impact on clock tree power consumption
  - Impact on block fanout distribution

Data Path Architectures
- Adder Architectures
  - Architecture Evaluation
  - Test Results
- Multipliers
  - Architectures and Power Implications
- Pipelined Configurations
  - Pipeline Effect on Power
  - Pipelining vs Re-timing

Review: Ripple Adder
- Carry signal switching propagates through all the stages and consumes power

Review: Carry Look-Ahead Adder
- Carry signal switching propagates through fewer stages
- However, it uses a higher number of logic levels

Carry Select Adder Overview
- Principle: compute the result twice (assuming Carry = 0 and Carry = 1), then select the appropriate result when the actual carry is ready
- Carry signal switching propagates through fewer stages
- However, it requires more duplication and complexity

Adder Architectures
[Figure: two plots versus bit width (4 to 32) for the RPL, CLA, CLF, and BK adders: delay (ns) and area (number of tiles).]
- Forward Carry Look Ahead (CLF): fastest but also largest
- Brent and Kung (BK): almost the same speed as CLF but drastically smaller
- Carry Look Ahead (CLA): relatively small and slow
- Ripple (RPL): smallest but slowest
- Brent and Kung: best area/speed tradeoff

Adders Power Dissipation
- Brent and Kung: lowest power dissipation
  - Lowest logic levels
  - Lowest fanout
[Figure: Power Consumption of 32 bit Adder (Speed): power (mW) versus frequency for the RPL, CLA, BK, and CLF adders.]

Data Path Architectures
- Adder Architectures
  - Architecture Evaluation
  - Test Results
- Multipliers
  - Architectures and Power Implications
- Pipelined Configurations
  - Pipeline Effect on Power
  - Pipelining vs Re-timing

Multiplier’s Power Consumption
- Wallace Advantages Over Carry-Save Multiplier (CSM)
  - Uniform switching propagation
  - Fewer logic levels
  - Lower average fanout

Data Path Architectures
- Adder Architectures
  - Architecture Evaluation
  - Test Results
- Multipliers
  - Architectures and Power Implications
- Pipelined Configurations
  - Pipeline Effect on Power
  - Pipelining vs Re-timing

Pipelining for Glitch Reduction
[Figure: a processing unit between two registers, and the pipelined version in which it is split into two half-size processing units separated by an additional register.]
- A logically deep internal net is typically affected by more primary inputs switching, and is therefore more susceptible to glitches
- Pipelining shortens the depth of combinatorial logic by inserting pipeline registers
- Pipelining is very effective for data path elements such as parity trees and multipliers

Pipelining Effect on Power
[Figure: power (mW) versus frequency for the pipelined FFT, the non-pipelined FFT, and their respective clock trees.]
- Pipelining increases clock tree power, but overall power is lowered

Pipelining vs. Re-timing
- Pipelining introduces new registers
- Re-timing does not introduce new registers
  - Example: FIR re-timing
- Re-timing also reduces power
  - Registers prevent glitch propagation through paths with many logic levels (e.g. multipliers)

Agenda
- Advanced VHDL
- Power-Conscious Design Techniques
  - Data Path Selection
  - FSM Encoding
  - Gating Clocks and Signals
  - Advanced Power Design Practices
- Summary

FSM and Counter Encoding: Impact on Power

State                                 One Hot    Gray   Binary
S0                                    00000001   000    000
S1                                    00000010   001    001
S2                                    00000100   011    010
S3                                    00001000   010    011
S4                                    00010000   110    100
S5                                    00100000   111    101
S6                                    01000000   101    110
S7                                    10000000   100    111
Total Number of Transitions           16         8      11
Maximum Transitions Per Clock Cycle   2          1      3
Clock Load                            8          3      3

Counters and FSMs: State Register Transitions
[Figure: number of state register toggles versus number of states (4, 8, 16, 32, 64) for Gray, Binary, and One Hot encodings.]

Counter’s Power Measurement on ProASIC
[Figure: Binary vs Gray power consumption: power (mW) versus frequency for binary and Gray counters.]
- Power dissipation for 200 instances of 8-bit counters
- As expected, Gray counters dissipate less power (~25%)

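For reference, a Gray counter whose state register holds only the Gray code (so exactly one register bit toggles per clock) might be sketched as follows. This is an illustrative implementation, not the one measured above: the entity name gray_counter, the synchronous reset rst, and the helper function gray_to_bin are assumptions. The next state is obtained by converting the current Gray value to binary, incrementing it, and converting back with the usual b xor (b >> 1) rule.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity; WIDTH defaults to the 8-bit case discussed above.
entity gray_counter is
  generic (WIDTH : positive := 8);
  port (
    clk  : in  std_logic;
    rst  : in  std_logic;  -- synchronous reset (assumption)
    gray : out std_logic_vector(WIDTH - 1 downto 0)
  );
end entity gray_counter;

architecture rtl of gray_counter is
  signal state : unsigned(WIDTH - 1 downto 0) := (others => '0');

  -- Gray-to-binary: each binary bit is the XOR of all Gray bits at or above it.
  function gray_to_bin(g : unsigned) return unsigned is
    variable gv : unsigned(g'length - 1 downto 0) := g;
    variable b  : unsigned(g'length - 1 downto 0);
  begin
    b(b'high) := gv(gv'high);
    for i in b'high - 1 downto 0 loop
      b(i) := b(i + 1) xor gv(i);
    end loop;
    return b;
  end function gray_to_bin;
begin
  process (clk)
    variable nxt : unsigned(WIDTH - 1 downto 0);
  begin
    if rising_edge(clk) then
      if rst = '1' then
        state <= (others => '0');
      else
        nxt   := gray_to_bin(state) + 1;        -- next count in binary
        state <= nxt xor shift_right(nxt, 1);   -- back to Gray code
      end if;
    end if;
  end process;

  gray <= std_logic_vector(state);
end architecture rtl;
```

With WIDTH set to 3, the state sequence follows the Gray column of the encoding table: 000, 001, 011, 010, 110, 111, 101, 100.
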
FSM Encoding: Effects on Power
[Figure: 170 States FSM Power Consumption: power (mW) versus frequency for One Hot, Gray, and Binary encodings, with the One Hot and Binary clock power plotted separately.]

Agenda
- Advanced VHDL
- Power-Conscious Design Techniques
  - Data Path Selection
  - FSM Encoding and Effect on Power
  - Gating Clocks & Signals
  - Advanced Power Design Practices
- Summary

Signal Gating
- There are several logic implementations of signal gating:
  - AND gate (&)
  - Latch or flip-flop
  - Tri-state buffer

Gating Clocks
- Most used mechanism to gate clocks
[Figure: an FSM generates LD_Enable; a latch clocked by CLK holds the enable and produces CLK_En, which clocks the N-bit register that loads New_Data and drives Data_Out.]
- Gating clock signals with combinatorial logic is not recommended. The clock gate easily creates glitches, which may result in incorrect triggering of the register.

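A latch-based clock gate along the lines of the figure might be sketched as below. It assumes a latch that is transparent while the clock is low, so the enable can only change the gated clock when the clock is already low, which avoids the glitches of a purely combinatorial gate. The entity name gated_reg and the internal names en_latched and clk_gated are hypothetical; ld_enable, new_data, and data_out follow the figure. Whether a target device and tool flow handle gated clocks well is outside the scope of this sketch.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity; N is the register width.
entity gated_reg is
  generic (N : positive := 8);
  port (
    clk       : in  std_logic;
    ld_enable : in  std_logic;
    new_data  : in  std_logic_vector(N - 1 downto 0);
    data_out  : out std_logic_vector(N - 1 downto 0)
  );
end entity gated_reg;

architecture rtl of gated_reg is
  signal en_latched : std_logic;
  signal clk_gated  : std_logic;
begin
  -- Latch transparent while clk is low: the enable is frozen during the high phase.
  process (clk, ld_enable)
  begin
    if clk = '0' then
      en_latched <= ld_enable;
    end if;
  end process;

  -- The AND gate only sees enable changes while clk is low, so no glitch reaches the register.
  clk_gated <= clk and en_latched;

  -- The register is clocked only in cycles where loading was requested.
  process (clk_gated)
  begin
    if rising_edge(clk_gated) then
      data_out <= new_data;
    end if;
  end process;
end architecture rtl;
```

In simulation the gated clock lags clk by one delta cycle, so data launched by registers on the ungated clock should be checked carefully.
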
Gating Signals: Address Decoder Example
[Figure: a decoder with inputs IN0 and IN1 and outputs OUT0 to OUT3, shown without and with an Enable/Select input.]
- Switching activity on one of the decoder inputs induces a large number of toggling outputs
- The Enable/Select signal prevents the propagation of this switching activity

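A 2-to-4 decoder with an enable input can be sketched as follows. While enable is low, all outputs stay at '0' regardless of activity on the select inputs, so that activity is not propagated downstream. The entity name dec2to4 and the port names are hypothetical stand-ins for IN0/IN1, OUT0 to OUT3, and Enable/Select in the figure.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity; sel(1 downto 0) corresponds to IN1 and IN0.
entity dec2to4 is
  port (
    enable : in  std_logic;
    sel    : in  std_logic_vector(1 downto 0);
    outp   : out std_logic_vector(3 downto 0)
  );
end entity dec2to4;

architecture rtl of dec2to4 is
begin
  process (enable, sel)
  begin
    -- Default: every output inactive, which also covers enable = '0'.
    outp <= (others => '0');
    if enable = '1' then
      case sel is
        when "00"   => outp(0) <= '1';
        when "01"   => outp(1) <= '1';
        when "10"   => outp(2) <= '1';
        when others => outp(3) <= '1';
      end case;
    end if;
  end process;
end architecture rtl;
```
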
Agenda
- Advanced VHDL
- Power-Conscious Design Techniques
  - Data Path Selection
  - FSM Encoding and Effect on Power
  - Gating Clocks and Signals
  - Advanced Practices
- Summary

VHDL Coding Effect on Power
- Example: IF … THEN …. ELSE ….;
[Figure: two multiplexer arrangements of a glitchy expression and stable expressions, before and after re-organizing the code.]
- Re-organizing the code helps to prevent propagation of switching activity

Delay Balancing
- If all primary inputs have the same arrival time and the same switching probability, balancing trees eliminates switching propagation
[Figure: un-balanced chain of three adders versus balanced tree of three adders over the inputs X, Y, Z, and T.]

Guarded Evaluation
- Technique used to reduce switching activity by adding latches or floating gates at the inputs of combinatorial blocks if their outputs are not used.
- Example: the results of a multiplier may or may not be used, depending on a condition. Adding transparent latches or AND gates on the inputs avoids power dissipation because they mask useless input activity.
[Figure: a multiplier feeding a mux controlled by Condition, shown without guarding and with a latch controlled by Condition on the multiplier inputs.]

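The AND-gate variant of guarded evaluation might look like the sketch below: when condition is low the multiplier inputs are forced to zero, so they stop toggling and the multiplier's internal nodes stay quiet. The entity name guarded_mult, the 8-bit widths, and the bypass input (standing for whatever the mux selects when the product is not used) are assumptions.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity; widths and the bypass input are illustrative.
entity guarded_mult is
  port (
    condition : in  std_logic;
    a, b      : in  unsigned(7 downto 0);
    bypass    : in  unsigned(15 downto 0);
    result    : out unsigned(15 downto 0)
  );
end entity guarded_mult;

architecture rtl of guarded_mult is
  signal a_g, b_g : unsigned(7 downto 0);
begin
  -- Guard logic: equivalent to ANDing every input bit with condition.
  a_g <= a when condition = '1' else (others => '0');
  b_g <= b when condition = '1' else (others => '0');

  -- The product is only selected when it was actually evaluated.
  result <= a_g * b_g when condition = '1' else bypass;
end architecture rtl;
```
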
Pre-computation Based Power Reduction
[Figure: registers R1 (gated input) and R2 (pre-computation input) feed the combinatorial logic that produces the outputs; the pre-computation logic controls the gating of R1; both registers share a common clock.]

Operator Reduction
- Based on transformations of operations into computationally equivalent implementations
- Example: distributivity of multiplication over addition (resource sharing)
    (X*Y) + (Z*Y) = (X+Z) * Y
[Figure: two multipliers (X*Y and Z*Y) followed by an adder, versus one adder (X+Z) followed by one multiplier by Y.]

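Written in VHDL, the reduced form uses one adder and one multiplier instead of two multipliers and one adder. This sketch assumes 8-bit unsigned operands; the sum is widened to 9 bits so that no carry is lost, which makes the 17-bit result identical to (X*Y) + (Z*Y). The entity name op_reduce is hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity; operand widths are illustrative.
entity op_reduce is
  port (
    x, y, z : in  unsigned(7 downto 0);
    res     : out unsigned(16 downto 0)
  );
end entity op_reduce;

architecture rtl of op_reduce is
begin
  -- (9-bit sum) * (8-bit y) yields a 17-bit product.
  res <= (resize(x, 9) + resize(z, 9)) * y;
end architecture rtl;
```
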
Input Signals Ordering
- Never forget that adders are commutative and associative
[Figure: two orderings of the adders that sum IN, IN >> 7, and IN >> 8.]
[Figure: switching probability versus bit number for IN, IN >> 7, and IN >> 8, showing sign bit correlation.]
- The amplitude of IN is larger than the amplitude of IN >> 7 and IN >> 8

Summary
- Advanced VHDL Design Tips
  - Identify critical and late arrival signals in your design
  - Write code in a way that reduces the logic levels for such signals
  - Perform functions such as state determination while waiting for late signals
- Low Power Design Techniques
  - Reduce switching activity per clock cycle
  - Reduce propagation of switching activity
  - Use power-efficient architectures and encodings
  - Disable logic blocks whose outputs are not used
  - Re-evaluate expressions to achieve the above

Additional Resources
- Documents available on http://www.actel.com
  - Low Power Resource Center
    http://www.actel.com/products/rescenter/power/index.html
  - Power Conscious Design with ProASIC
    http://www.actel.com/documents/PowerConscious.pdf
  - Low Power Design for Antifuse FPGAs
    http://www.actel.com/documents/lowpower.pdf