Asynchronous Design Using Commercial HDL Synthesis Tools
Michiel Ligthart Karl Fant Ross Smith Alexander Taubin Alex Kondratyev
Outline
Added Value of NCL - Simplification of design
Canonical form of gates - The key for optimization
NCL in CAD flow. An example
Validation of optimization
Experimental results
Conclusion and future work
Potential NCL Advantages
Inherent to asynchronous design:
- no clock system, low EMI, free stand-by mode, etc.
Inherent to delay-insensitive design:
- nicely fits current and future deep-submicron (DSM) technology; easy design reuse; plug 'n' play SoC design; easily portable among technologies
Particular to NULL Convention Logic (NCL):
- ease of design (reduced time to market); standard HDL and commercial tools can be used to simulate and synthesize asynchronous circuits
Data Communication Based on DI Encoding
• DI protocol with spacer (NULL):
- NULL propagation / NULL acknowledge
- DATA propagation / DATA acknowledge
[Figure: DATA encoded by a codeword passes through the combinational circuitry into completion detection, which issues the request for DATA/NULL]
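The DATA/NULL spacer protocol described above can be sketched as a minimal Python model. It assumes each bit is encoded as a dual-rail (true rail, false rail) pair with (0, 0) as NULL. The helper names `encode`, `decode`, `is_data`, `is_null` and `transfer` are hypothetical and exist only for illustration. Completion detection is modeled as a check over all bits, not as a gate-level circuit.

```python
from typing import List, Optional, Tuple

Rail = Tuple[int, int]  # (true rail, false rail) of one dual-rail bit
NULL: Rail = (0, 0)     # spacer: both rails low


def encode(bit: int) -> Rail:
    """Encode one Boolean value as a dual-rail DATA codeword."""
    return (1, 0) if bit else (0, 1)


def decode(rail: Rail) -> Optional[int]:
    """Return the bit carried by a rail pair, or None while it is NULL."""
    if rail == NULL:
        return None
    if rail == (1, 1):
        raise ValueError("illegal dual-rail codeword (1, 1)")
    return rail[0]


def is_data(word: List[Rail]) -> bool:
    """Completion detection: every bit holds exactly one high rail."""
    return all(t ^ f for t, f in word)


def is_null(word: List[Rail]) -> bool:
    """Completion detection: every bit has returned to the spacer."""
    return all(r == NULL for r in word)


def transfer(bits: List[int]) -> List[str]:
    """Run one DATA/NULL cycle and log the acknowledgements."""
    log: List[str] = []
    word: List[Rail] = [NULL] * len(bits)
    # DATA propagation: bits may arrive in any order (delay insensitivity);
    # no acknowledge is possible while any bit is still NULL.
    for i, bit in enumerate(bits):
        assert not is_data(word), "acknowledged an incomplete codeword"
        word[i] = encode(bit)
    if is_data(word):
        log.append("DATA acknowledged")
    # NULL propagation: rails return to the spacer one bit at a time.
    for i in range(len(bits)):
        assert not is_null(word), "acknowledged NULL too early"
        word[i] = NULL
    if is_null(word):
        log.append("NULL acknowledged")
    return log


print(transfer([1, 0, 1]))  # ['DATA acknowledged', 'NULL acknowledged']
```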
NCL: Pushing Two-phase Behavior Down to the Level of Each Gate
[Figure: logic gate, no data present]
[Figure: logic gate, complete data present; the gate output acknowledges the input changes]
Simplest DI encoding: dual rail [Sims'58]
General Implementation of Hysteresis Gates in CMOS
• Dual-rail circuits under two-phase operation:
- a transition from NULL to Set is positively unate: data is monotonic
- an input transition to NULL resets all gates to NULL
[Figure: gate output g driven by a p-tree implementing the reset function $R(x_1,\dots,x_n)$ and an n-tree implementing the set function $S(x_1,\dots,x_n)$]
$g = S + g\,R$
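The hysteresis equation g = S + gR above can be modeled behaviorally in a few lines. This is a sketch, not a transistor-level model. The class name `HysteresisGate` is hypothetical. The example instantiates a two-input C-element with S = x1·x2 and R = x1 + x2, which is an assumption chosen to match the C-element equation used later in these slides.

```python
from typing import Callable, Sequence

BoolFn = Callable[[Sequence[int]], int]


class HysteresisGate:
    """Behavioral model of g' = S(x) + g * R(x): S sets the output,
    and a set output falls only when R(x) becomes 0."""

    def __init__(self, set_fn: BoolFn, reset_fn: BoolFn) -> None:
        self.set_fn = set_fn
        self.reset_fn = reset_fn
        self.g = 0  # every gate starts in NULL (output low)

    def step(self, x: Sequence[int]) -> int:
        self.g = 1 if self.set_fn(x) or (self.g and self.reset_fn(x)) else 0
        return self.g


# Example: two-input C-element, S = x1*x2, R = x1 + x2.
c_element = HysteresisGate(set_fn=lambda x: x[0] and x[1],
                           reset_fn=lambda x: x[0] or x[1])
trace = [c_element.step(x) for x in [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]]
print(trace)  # [0, 0, 1, 1, 0]
```

The fourth step shows the hysteresis: with inputs (0, 1) the set function is 0, but R is still 1, so the output holds.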
Refined Implementation of NCL Hysteresis Gates in CMOS
Canonical form of reset is the key to using synchronous optimization tools:
$R(x_1,\dots,x_n) = x_1 + x_2 + \dots + x_n$ (depends only on the number of inputs)
[Figure: the n-tree implements the set function over $x_1,\dots,x_n$; gate output $g = S + g\,R$]
Reset of each individual gate scales up to the whole network.
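The sketch below illustrates why a canonical reset scales to a whole network. Each gate is given only its set function, and its hold term is fixed to the OR of its inputs. The names `CanonicalGate` and `evaluate` are hypothetical. The model assumes an acyclic network evaluated once in topological order. The three calls show data propagating, a partially withdrawn input being held, and all-NULL inputs resetting every gate.

```python
from typing import Callable, Dict, List, Sequence, Tuple

SetFn = Callable[[Sequence[int]], int]


class CanonicalGate:
    """Gate described only by its set function S; the reset term is fixed
    to R = x1 + ... + xn, so it depends only on the number of inputs."""

    def __init__(self, set_fn: SetFn) -> None:
        self.set_fn = set_fn
        self.g = 0

    def step(self, x: Sequence[int]) -> int:
        hold = any(x)  # canonical R: at least one input is still DATA
        self.g = 1 if self.set_fn(x) or (self.g and hold) else 0
        return self.g


Net = List[Tuple[str, CanonicalGate, List[str]]]


def evaluate(net: Net, inputs: Dict[str, int]) -> Dict[str, int]:
    """Evaluate an acyclic network once, with gates listed in topological order."""
    values = dict(inputs)
    for name, gate, fanin in net:
        values[name] = gate.step([values[s] for s in fanin])
    return values


# Hypothetical two-gate network: u = th22(a, b), z = OR(u, c).
net: Net = [
    ("u", CanonicalGate(lambda x: x[0] and x[1]), ["a", "b"]),
    ("z", CanonicalGate(lambda x: x[0] or x[1]), ["u", "c"]),
]
data = evaluate(net, {"a": 1, "b": 1, "c": 0})     # u = 1, z = 1
partial = evaluate(net, {"a": 0, "b": 1, "c": 0})  # b still DATA: u and z hold 1
null = evaluate(net, {"a": 0, "b": 0, "c": 0})     # all NULL: u = 0, z = 0
```

Because the reset term is the same for every gate of a given fan-in, only the set functions differ between gates. This is what allows those set functions to be handed to a synchronous optimizer.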
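DIMS (Delay-Insensitive Minterm Synthesis) [Muller'62] [Sparso'92]
Family of Logic Gates
M-of-N threshold gates with hysteresis behavior
[Figure: threshold gate symbols labeled with their threshold M (1 to 5); gates with threshold 1 are OR-gate equivalents]
Room for optimization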
Example: 2-of-3 Threshold Gate with Hysteresis
[Figure: 2-of-3 threshold gate with inputs a, b, c and output z]
• The gate switches to DATA when at least M (here 2) inputs are DATA, and to NULL only when all inputs are NULL
• It is possible to use "negative logic", i.e., to reverse the pull-up and pull-down networks
$z = ab + ac + bc + z(a + b + c)$
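A behavioral model of the M-of-N gate follows. It assumes unweighted inputs, and the names `ThresholdGate`, `th23_equation` and `matches_slide_equation` are hypothetical. `matches_slide_equation` exhaustively compares the model's 2-of-3 instance against the next-state equation z = ab + ac + bc + z(a + b + c) for every input combination and previous state.

```python
from itertools import product
from typing import Sequence


class ThresholdGate:
    """Unweighted M-of-N threshold gate with hysteresis."""

    def __init__(self, m: int, n: int) -> None:
        if not 1 <= m <= n:
            raise ValueError("need 1 <= M <= N")
        self.m, self.n = m, n
        self.z = 0  # start in NULL

    def step(self, x: Sequence[int]) -> int:
        if len(x) != self.n:
            raise ValueError(f"expected {self.n} inputs, got {len(x)}")
        active = sum(x)
        if active >= self.m:
            self.z = 1   # at least M inputs are DATA: switch to DATA
        elif active == 0:
            self.z = 0   # all inputs are NULL: switch to NULL
        # otherwise keep the previous output (hysteresis)
        return self.z


def th23_equation(a: int, b: int, c: int, z: int) -> int:
    """Next-state equation z = ab + ac + bc + z(a + b + c)."""
    return int(bool((a and b) or (a and c) or (b and c) or (z and (a or b or c))))


def matches_slide_equation() -> bool:
    """Compare the 2-of-3 model with the equation for every input and state."""
    for a, b, c, z in product((0, 1), repeat=4):
        gate = ThresholdGate(2, 3)
        gate.z = z
        if gate.step((a, b, c)) != th23_equation(a, b, c, z):
            return False
    return True


th23 = ThresholdGate(2, 3)
print([th23.step(x) for x in [(1, 0, 0), (1, 1, 0), (0, 1, 0), (0, 0, 0)]])  # [0, 1, 1, 0]
print(matches_slide_equation())  # True
```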
RTL Design Flow: Combinational Optimization
• Separate combinational logic and registers:
- registers are replaced by NCL registration in the RTL code
- the combinational process is the subject of synthesis and optimization (the topic of this presentation)
[Figure: combinational process between registration stages, with request for DATA/NULL and reset signals]
Two-Step Synthesis Flow (Using Synopsys' Design Compiler)
Inputs: VHDL, generic library, NCL library, dual-rail definition
Step 1. Synthesis: translate HDL into a "synchronous" intermediate netlist.
Step 2. Synthesis: convert the intermediate netlist into an NCL netlist.
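Input to Step 1: RTL Description (Multiplexer Example)
• RTL description (MUX):
  -- ncl_logic is declared in the NCL design package (not shown)
  entity test is
    port (a, b, s : in  ncl_logic;
          z       : out ncl_logic);
  end test;
  architecture rtl of test is
  begin
    process (a, b, s)
    begin
      if s = '1' then z <= a;  -- s selects a
      else z <= b;             -- otherwise b
      end if;
    end process;
  end rtl;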
[Figure: MUX with inputs s, a, b and output z]
MUX Example: Output of Step 1 / Input to Step 2: Intermediate Netlist
[Figure: netlist of two-input NAND gates with inputs s, a, b, internal nets x, y and output z]
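The intermediate netlist can be checked exhaustively against the RTL description. The wiring below (x = NAND(s, a), y = NAND(not s, b), z = NAND(x, y)) is an assumption read from the figure labels. The inverter on s is modeled explicitly here, although in the dual-rail expansion it becomes a rail swap. The function names are hypothetical.

```python
from itertools import product


def nand(p: int, q: int) -> int:
    """Two-input NAND on 0/1 values."""
    return 1 - (p & q)


def mux_netlist(s: int, a: int, b: int) -> int:
    """Assumed wiring of the intermediate netlist (three 2-input NANDs)."""
    x = nand(s, a)       # low only when s = 1 and a = 1
    y = nand(1 - s, b)   # low only when s = 0 and b = 1; inverter on s assumed
    return nand(x, y)    # z = s*a + (not s)*b


def mux_rtl(s: int, a: int, b: int) -> int:
    """Behavior of the RTL process: z <= a when s = '1', else b."""
    return a if s == 1 else b


def netlist_matches_rtl() -> bool:
    """Exhaustive comparison over all 8 input combinations."""
    return all(mux_netlist(s, a, b) == mux_rtl(s, a, b)
               for s, a, b in product((0, 1), repeat=3))


print(netlist_matches_rtl())  # True
```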
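Dual-rail Package
• Define type (std_logic from ieee.std_logic_1164):
  type dual_rail_logic is record
    rail1 : std_logic;
    rail0 : std_logic;
  end record;
• Overload operators:
- function "nand": [Figure: rails a.0, a.1, b.0, b.1 feed four th22 gates and a th13 gate, producing z.0 and z.1; each dual-rail signal takes values {0, 1, N}, each rail takes values {0, 1}]
- function "not": [Figure: a.0 drives z.1 and a.1 drives z.0, i.e., the rails are swapped]
th22 = two-input C-element; th13 = three-input OR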
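A gate-level Python model of the overloaded "nand" and "not" follows. It is a sketch under several assumptions. `DualRail` mirrors the rail1/rail0 record, with the slide's a.1/a.0 read as rail1/rail0. The DIMS NAND is taken to be four th22 gates (one per input minterm) with a th13 gate on rail1. All class and function names are hypothetical. A 1-of-3 threshold gate is modeled as a plain OR, because its hold term z(a + b + c) is subsumed by its set term a + b + c.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DualRail:
    """Mirror of the dual_rail_logic record: rail1 high = 1, rail0 high = 0."""
    rail1: int
    rail0: int


NULL = DualRail(0, 0)  # both rails low


def dr(bit: Optional[int]) -> DualRail:
    """Build a dual-rail value; None stands for NULL."""
    if bit is None:
        return NULL
    return DualRail(1, 0) if bit else DualRail(0, 1)


class CElement:
    """th22 gate: z = ab + z(a + b)."""

    def __init__(self) -> None:
        self.z = 0

    def step(self, a: int, b: int) -> int:
        self.z = 1 if (a and b) or (self.z and (a or b)) else 0
        return self.z


def dr_not(a: DualRail) -> DualRail:
    """Overloaded "not": swap the rails, no gates needed."""
    return DualRail(rail1=a.rail0, rail0=a.rail1)


class DimsNand:
    """Overloaded "nand": one th22 per input minterm, th13 (OR) on rail1."""

    def __init__(self) -> None:
        self.c00, self.c01, self.c10, self.c11 = (CElement() for _ in range(4))

    def step(self, a: DualRail, b: DualRail) -> DualRail:
        m00 = self.c00.step(a.rail0, b.rail0)  # a = 0, b = 0 -> NAND = 1
        m01 = self.c01.step(a.rail0, b.rail1)  # a = 0, b = 1 -> NAND = 1
        m10 = self.c10.step(a.rail1, b.rail0)  # a = 1, b = 0 -> NAND = 1
        m11 = self.c11.step(a.rail1, b.rail1)  # a = 1, b = 1 -> NAND = 0
        return DualRail(rail1=int(m00 or m01 or m10), rail0=m11)


gate = DimsNand()
print(gate.step(dr(1), NULL))   # DualRail(rail1=0, rail0=0): b still NULL
print(gate.step(dr(1), dr(1)))  # DualRail(rail1=0, rail0=1): 1 NAND 1 = 0
```

Continuing the sequence shows the two-phase behavior. If a returns to NULL while b is still DATA, the output holds its value. It returns to NULL only once both inputs are NULL.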
Optimizing with Design Compiler
• Dual-rail expansion
• The two phases (set and reset) are separated:
- the set phase ensures circuit functionality
- the reset phase is implied
• Optimizations are applied to the set phase
Dual-rail Expansion of MUX
[Figure: the MUX netlist (s, a, b, x, y, z) expanded into three dual-rail NAND gates (D-R NAND) on rails s.t/s.f, a.t/a.f and b.t/b.f, producing x.t/x.f, y.t/y.f and z.t/z.f]
Naive semi-static DIMS implementation: 114 transistors (can be reduced to 63 transistors by merging C-elements with OR gates), versus 14 for a synchronous circuit.
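"Images": Boolean Gates
Implementing Set Functions
[Figure: each NCL gate is paired with its Boolean "image", the gate that is equivalent to it in the set phase; projection from NCL gates to images is used for optimization, and mapping from images back to NCL gates is used for implementation]
- OR gate: $z = a + b$; image: $z = a + b$
- th22: $z = ab + z(a+b)$; image: $z = ab$
- th33w2: $z = a(b+c) + z(a+b+c)$; image: $z = a(b+c)$
In the initial state $z = a = b = c = 0$. During the set phase, therefore, the hysteresis (sequential) behavior of each NCL gate reduces to the combinational behavior of its image.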
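The claim that an NCL gate equals its image during the set phase can be checked exhaustively for the three gates listed. The sketch below assumes the canonical hold term x1 + ... + xn. Starting from all zeros, it raises inputs one at a time in every order (a monotonic set phase) and compares the hysteresis gate with its Boolean image. The names `IMAGES`, `ncl_next` and `set_phase_matches_image` are hypothetical.

```python
from itertools import permutations
from typing import Callable, Dict, Sequence, Tuple

Image = Callable[[Sequence[int]], int]

# gate name -> (number of inputs, Boolean image = set function S)
IMAGES: Dict[str, Tuple[int, Image]] = {
    "or2": (2, lambda x: x[0] or x[1]),                 # z = a + b
    "th22": (2, lambda x: x[0] and x[1]),               # z = ab
    "th33w2": (3, lambda x: x[0] and (x[1] or x[2])),   # z = a(b + c)
}


def ncl_next(image: Image, x: Sequence[int], z: int) -> int:
    """Hysteresis gate with canonical reset: z' = S(x) + z(x1 + ... + xn)."""
    return 1 if image(x) or (z and any(x)) else 0


def set_phase_matches_image(n: int, image: Image) -> bool:
    """From z = 0 and all inputs 0, raise inputs one at a time in every
    order and check that the gate output always equals its image."""
    for order in permutations(range(n)):
        x, z = [0] * n, 0
        for i in order:
            x[i] = 1
            z = ncl_next(image, x, z)
            if z != (1 if image(x) else 0):
                return False
    return True


print(all(set_phase_matches_image(n, f) for n, f in IMAGES.values()))  # True
# In the reset phase the two differ: th22 holds 1 while its image drops to 0.
th22 = IMAGES["th22"][1]
print(ncl_next(th22, [0, 1], 1), int(bool(th22([0, 1]))))  # 1 0
```

The last line shows why the equivalence holds only for the set phase. This is consistent with the slides' approach of optimizing the images while the reset phase remains implied.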
Image of Dual-rail NAND Gate
[Figure: D-R NAND with inputs a.t, a.f, b.t, b.f and outputs out.t, out.f, implemented with four C-elements; in its image each C-element is replaced by an AND gate]
C-element equation: $z = ab + z(a+b)$, initially $z = a = b = 0$. In the set phase it behaves like an AND gate, $z = ab$.
Dual-rail Expansion for MUX
[Figure: dual-rail inputs s.t, s.f, a.t, a.f, b.t, b.f; internal signals x.t, x.f, y.t, y.f; outputs z.t, z.f]
Twelve 2-input C-elements and three 3-input OR gates.
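The gate count can be reproduced with an image-level model of the expansion. The model assumes the NAND wiring of the intermediate netlist (x = NAND(s, a), y = NAND(not s, b), z = NAND(x, y)). It also assumes a DIMS NAND made of four C-elements plus one 3-input OR. Each C-element is replaced by its AND image, so the circuit can be evaluated combinationally. All names are hypothetical.

```python
from itertools import product
from typing import Tuple

Rails = Tuple[int, int]  # (t rail, f rail)
NULL: Rails = (0, 0)

C_ELEMENTS_PER_NAND = 4   # one C-element (AND image) per input minterm
OR3_PER_NAND = 1          # 3-input OR collecting the minterms where NAND = 1
NANDS_IN_MUX = 3


def enc(bit: int) -> Rails:
    return (1, 0) if bit else (0, 1)


def nand_image(a: Rails, b: Rails) -> Rails:
    """Image of a DIMS dual-rail NAND: each C-element becomes an AND."""
    (at, af), (bt, bf) = a, b
    m_ff, m_ft, m_tf, m_tt = af & bf, af & bt, at & bf, at & bt
    return (m_ff | m_ft | m_tf, m_tt)   # (out.t, out.f)


def mux_image(s: Rails, a: Rails, b: Rails) -> Rails:
    s_inv = (s[1], s[0])            # inversion is a rail swap: no gates
    x = nand_image(s, a)
    y = nand_image(s_inv, b)
    return nand_image(x, y)


print(NANDS_IN_MUX * C_ELEMENTS_PER_NAND, NANDS_IN_MUX * OR3_PER_NAND)  # 12 3

# Functional check on all valid codewords, plus a NULL input blocking the output.
ok = all(mux_image(enc(s), enc(a), enc(b)) == enc(a if s else b)
         for s, a, b in product((0, 1), repeat=3))
print(ok, mux_image(NULL, enc(1), enc(0)))  # True (0, 0)
```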
Image Circuit of Dual-rail Expansion for MUX
[Figure: the same circuit with every C-element replaced by its AND image; signals s.t, s.f, a.t, a.f, b.t, b.f, x.t, x.f, y.t, y.f, z.t, z.f]
Optimized with Design Compiler
The MUX circuit passes technology-independent optimization and is mapped to "images" of gates from the NCL library.
[Figure: optimized image circuit with inputs s.t, s.f, a.t, a.f, b.t, b.f and outputs z.t, z.f, built from A(B+C) (the image of th33w2) and AB+CD (the image of thXOR)]
Technology Mapping with Design Compiler
NCL circuit: hysteresis images are replaced by the corresponding gates with hysteresis.
[Figure: mapped MUX using th33w2, th22, thXOR and th24w2 gates (internal signals e, f, k, m, n) from inputs s.t, s.f, a.t, a.f, b.t, b.f to outputs z.t, z.f; inset: semi-static CMOS implementation of thXOR]
44 transistors: 30% better than the optimized DIMS implementation (63 transistors).
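The set functions of the weighted threshold gates can be derived mechanically. The sketch below assumes that thMNwW means threshold M and N inputs, with the first input weighted W and the rest weighted 1. This reading reproduces the image A(B+C) given for th33w2 above. thXOR is not a threshold gate and is outside this sketch. `set_terms` is a hypothetical helper.

```python
from itertools import combinations
from typing import List, Sequence


def set_terms(m: int, weights: Sequence[int], names: str) -> List[str]:
    """Minimal product terms of the set function of a weighted threshold gate:
    input subsets whose total weight reaches M and contain no smaller such subset."""
    n = len(weights)
    qualifying = [frozenset(c)
                  for k in range(1, n + 1)
                  for c in combinations(range(n), k)
                  if sum(weights[i] for i in c) >= m]
    minimal = [s for s in qualifying if not any(t < s for t in qualifying)]
    return ["".join(names[i] for i in sorted(s)) for s in minimal]


# Assumed convention: thMNwW = threshold M, N inputs, first input weight W.
print(set_terms(3, (2, 1, 1), "abc"))      # th33w2 -> ['ab', 'ac'], i.e. a(b + c)
print(set_terms(2, (2, 1, 1, 1), "abcd"))  # th24w2 -> ['a', 'bc', 'bd', 'cd']
print(set_terms(2, (1, 1, 1), "abc"))      # th23   -> ['ab', 'ac', 'bc']
```

The full gate then follows the canonical form, for example z = a(b + c) + z(a + b + c) for th33w2.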
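Optimization Flow
[Figure: synchronous Boolean circuit → translation (asynchronous dual-rail package) → DIMS circuit → dual-rail image → optimization (Design Compiler) → optimized circuit → technology mapping (Design Compiler) → circuit mapped to images → hysteresis gates. The DIMS circuit and the final hysteresis-gate circuit are real objects related by DI equivalence; the images are virtual objects.]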
Validation of Optimization
The validity of the transformations (DI equivalence) is based on two properties:
• functional equivalence of the optimized and original circuits (under two-phase operation)
• maintenance of DI properties in the optimized circuit
Both are based on the properties of prime and irredundant networks and the properties of algebraic factorization [Brayton'90, Hachtel'92].
Validation of Optimization: Idea of the Proof
• Starting point: a prime and irredundant Boolean network (known to be 100% stuck-at testable [Scherz'72])
• Algebraic transformations maintain the set of test vectors for stuck-at faults [Hachtel'92]
• Induction in topological order: testability implies that each gate acknowledges its input changes (delay insensitivity)
• The same holds for tree-based technology mapping
Manual vs. Synthesized Designs
[Chart: area (transistor count, 0 to 4500) of manual and synthesized designs]
For bigger circuits the synthesized/manual ratio is better (22% improvement for the biggest example).
Synchronous vs. NCL design
[Chart: gate and transistor counts of clocked and NCL designs]
Penalty in transistors, due to:
- dual-rail implementation
- effective delay insensitivity
To reduce transistor count:
- use four-rail encoding
- improve architectural solutions, e.g., OR instead of MUX
- compromise on delay insensitivity
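Four-rail encoding is not detailed on the slide. A common form is 1-of-4 encoding, where two bits share four wires and exactly one wire is high in each DATA codeword. Assuming that form, the sketch below compares it with two dual-rail bits. Both schemes use four wires, but 1-of-4 switches one wire instead of two per DATA/NULL phase. The helper names are hypothetical. Any transistor savings depend on the gate library, which this sketch does not model.

```python
from typing import Tuple

Code = Tuple[int, ...]


def dual_rail_pair(v: int) -> Code:
    """Two bits as two dual-rail pairs: (b1.t, b1.f, b0.t, b0.f)."""
    b1, b0 = (v >> 1) & 1, v & 1
    return (b1, 1 - b1, b0, 1 - b0)


def one_of_four(v: int) -> Code:
    """Two bits as one 1-of-4 (four-rail) codeword: rail v is high."""
    return tuple(1 if i == v else 0 for i in range(4))


def rails_switched(code: Code) -> int:
    """Wires that rise for NULL -> DATA (and fall again for DATA -> NULL)."""
    return sum(code)


for v in range(4):
    print(v, dual_rail_pair(v), rails_switched(dual_rail_pair(v)),
          one_of_four(v), rails_switched(one_of_four(v)))
```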
Conclusions
• First methodology to use standard HDL and commercial tools both to simulate and to synthesize asynchronous circuits
• The methodology is formally validated
• The results of the synthesis are acceptable
Future Tasks
• Reduce area/power without losing delay insensitivity (e.g., four-rail design)
• Relax DI requirements to reduce area (e.g., using timing assumptions)
• Use peephole optimizations (e.g., merge gates used for registration with their input gates)
• Write DesignWare components to get better performance for arithmetic units (infer hand-designed components)