Asynchronous Design Using Commercial HDL Synthesis Tools

Michiel Ligthart, Karl Fant, Ross Smith, Alexander Taubin, Alex Kondratyev

Outline

Added Value of NCL - Simplification of design

Canonical form of gates - The key for optimization

NCL in CAD flow. An example

Validation of optimization

Experimental results

Conclusion and future work

Potential NCL Advantages

Inherent to asynchronous design
- no clock system, low EMI, free stand-by mode, etc.

Inherent to delay-insensitive design
- fits current and future deep-submicron (DSM) technology, easy design reuse, plug-and-play SoC design, easy porting among technologies

Particular to NULL Convention Logic (NCL)
- ease of design (reduced time to market): standard HDL and commercial tools are used to simulate and synthesize asynchronous circuits

Data Communication Based on DI Encoding

• DI protocol with a spacer (NULL):
- NULL propagation / NULL acknowledge
- DATA propagation / DATA acknowledge
- DATA is represented by a codeword

[Figure: combinational circuitry, request for DATA/NULL, completion detection]
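The following is a minimal behavioral sketch of the DATA/NULL protocol in Python. It assumes a dual-rail codeword (one (rail1, rail0) pair per bit) and zero-delay completion detection. The helper names are hypothetical. The rails of a word switch one at a time in an arbitrary order, as they would with unbounded wire delays. The acknowledge fires only once the whole codeword, or the whole spacer, has arrived.

```python
from typing import List, Tuple

Pair = Tuple[int, int]  # (rail1, rail0) for one bit; (0, 0) is the NULL spacer
NULL_PAIR: Pair = (0, 0)

def encode_word(bits: List[int]) -> List[Pair]:
    """Dual-rail encode each bit: 1 -> (1, 0), 0 -> (0, 1)."""
    return [(1, 0) if b else (0, 1) for b in bits]

def is_complete_data(word: List[Pair]) -> bool:
    """Completion detection for DATA: every pair has exactly one high rail."""
    return all(r1 + r0 == 1 for r1, r0 in word)

def is_complete_null(word: List[Pair]) -> bool:
    """Completion detection for NULL: every rail is low."""
    return all(p == NULL_PAIR for p in word)

def phase_trace(bits: List[int], arrival_order: List[int]) -> List[str]:
    """Raise the DATA rails bit by bit in `arrival_order`, then return them to
    NULL in the same order, recording when each acknowledge would fire."""
    word = [NULL_PAIR] * len(bits)
    target = encode_word(bits)
    events = []
    for i in arrival_order:              # DATA wavefront, arbitrary delays
        word[i] = target[i]
        events.append("DATA ack" if is_complete_data(word) else "-")
    for i in arrival_order:              # NULL wavefront
        word[i] = NULL_PAIR
        events.append("NULL ack" if is_complete_null(word) else "-")
    return events

print(phase_trace([1, 0, 1], [2, 0, 1]))
# ['-', '-', 'DATA ack', '-', '-', 'NULL ack']
```

Neither acknowledge depends on the order in which the rails arrive. That order independence is the delay-insensitive property the protocol relies on.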

NCL: Pushing Two-phase Behavior Down to the Level of Each Gate

- Logic gate with no data present.
- Logic gate with complete data present: the gate output acknowledges the input changes.

Simplest DI encoding - dual rail [Sims’58]
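The dual-rail code can be tabulated directly. This sketch follows the rail1/rail0 order of the dual-rail package shown later. `decode` and `transitions` are hypothetical helper names.

```python
from typing import Optional, Tuple

# Dual-rail code for one bit, written as (rail1, rail0):
# (0, 0) is the NULL spacer, (1, 0) is DATA 1, (0, 1) is DATA 0,
# and (1, 1) is not a valid codeword.
CODE = {(1, 0): "1", (0, 1): "0", (0, 0): "NULL"}

def decode(rail1: int, rail0: int) -> Optional[str]:
    """Return '1', '0' or 'NULL', or None for the illegal (1, 1) state."""
    return CODE.get((rail1, rail0))

def transitions(old: Tuple[int, int], new: Tuple[int, int]) -> int:
    """Number of wires that toggle between two dual-rail states."""
    return sum(a != b for a, b in zip(old, new))
```

`transitions((0, 0), (1, 0))` is 1. A NULL-to-DATA change raises exactly one wire per bit and lowers none, which keeps the set phase monotonic.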

General Implementation of Hysteresis Gates in CMOS

• Dual-rail circuits under two-phase operation:
- a transition from NULL to set is positive unate (data is monotonic);
- an input transition to NULL resets all gates to NULL.

[Figure: the set function S(x1, ..., xn) is implemented by the n-tree and the reset function R(x1, ..., xn) by the p-tree]

$$g = S + g\,\overline{R}$$
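The equation g = S + g·R̄ can be read as a zero-delay update rule. This Python sketch passes S and R in as plain functions. The usage instantiates them as a two-input C-element (S = x1x2, R = x̄1x̄2). It models only the logic, not the CMOS n-tree/p-tree structure.

```python
from typing import Callable, List, Sequence

Inputs = Sequence[int]
BoolFn = Callable[[Inputs], int]

def hysteresis_step(g: int, x: Inputs, S: BoolFn, R: BoolFn) -> int:
    """One zero-delay evaluation of g' = S(x) + g * not R(x)."""
    return int(bool(S(x)) or (bool(g) and not R(x)))

def run(vectors: List[Inputs], S: BoolFn, R: BoolFn, g: int = 0) -> List[int]:
    """Apply input vectors in order and return the gate output after each one."""
    outputs = []
    for x in vectors:
        g = hysteresis_step(g, x, S, R)
        outputs.append(g)
    return outputs

# Example instance (two-input C-element): set when all inputs are 1,
# reset when all inputs are 0, hold the previous value otherwise.
def c_set(x: Inputs) -> int:
    return int(all(x))

def c_reset(x: Inputs) -> int:
    return int(not any(x))

print(run([(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)], c_set, c_reset))
# [0, 0, 1, 1, 0]
```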

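Refined Implementation of NCL Hysteresis Gates in CMOS

The canonical form of the reset function is the key to using synchronous optimization tools:

$$R(x_1,\dots,x_n) = \overline{x}_1\,\overline{x}_2\cdots\overline{x}_n, \qquad g = S + g\,\overline{R} = S + g\,(x_1 + x_2 + \dots + x_n)$$

- The reset function depends only on the number of inputs, so gates differ only in their set function S, implemented by the n-tree.
- The reset of each individual gate scales up to the whole network.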
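To illustrate why the canonical reset scales to the whole network, this zero-delay Python sketch models a small, hypothetical network of canonical gates: two C-elements feeding an OR. It evaluates the gates in topological order. Once every primary input is NULL (0), every gate output returns to 0, whatever its set function. A partial return to NULL leaves the outputs holding their DATA values.

```python
from typing import Callable, Dict, List, Tuple

SetFn = Callable[[List[int]], int]

def canonical_gate(g: int, x: List[int], S: SetFn) -> int:
    """Hysteresis gate with canonical reset: g' = S(x) + g(x1 + ... + xn)."""
    return int(bool(S(x)) or (bool(g) and any(x)))

# Hypothetical network in topological order: (gate name, set function, input names).
NETWORK: List[Tuple[str, SetFn, List[str]]] = [
    ("n1", lambda x: int(all(x)), ["a", "b"]),    # th22 (C-element)
    ("n2", lambda x: int(all(x)), ["c", "d"]),    # th22 (C-element)
    ("z", lambda x: int(any(x)), ["n1", "n2"]),   # th12 (OR)
]

def evaluate(state: Dict[str, int], inputs: Dict[str, int]) -> Dict[str, int]:
    """Evaluate every gate once in topological order (zero-delay model)."""
    values = dict(inputs)
    new_state = {}
    for name, S, ins in NETWORK:
        new_state[name] = canonical_gate(state[name], [values[i] for i in ins], S)
        values[name] = new_state[name]
    return new_state

state = {name: 0 for name, _, _ in NETWORK}
state = evaluate(state, {"a": 1, "b": 1, "c": 0, "d": 0})  # set phase
print(state)  # {'n1': 1, 'n2': 0, 'z': 1}
state = evaluate(state, {"a": 0, "b": 0, "c": 0, "d": 0})  # all inputs NULL
print(state)  # {'n1': 0, 'n2': 0, 'z': 0}
```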

DIMS (Delay-Insensitive Minterm Synthesis) [Muller’62] [Sparso’92]

Family of Logic Gates

• M-of-N threshold gates with hysteresis behavior
• 1-of-N gates are OR gate equivalents
• Room for optimization

[Figure: threshold gate symbols, each labelled with its threshold M and input weights]
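A weighted M-of-N gate sets when the weighted sum of its inputs reaches M, and it resets only when all inputs are 0. This Python sketch assumes the usual NCL naming: in thMNwW, M is the threshold, N is the number of inputs, and W is the weight of the first input. Under that convention it recovers the set functions of the gates used later, th22 (ab), th33w2 (a(b + c)) and th24w2 (a + bc + bd + cd). The function names are hypothetical.

```python
from itertools import product
from typing import List, Sequence, Tuple

def th_set(x: Sequence[int], m: int, weights: Sequence[int]) -> int:
    """Set function of a threshold gate: weighted input sum >= threshold m."""
    return int(sum(w * xi for w, xi in zip(weights, x)) >= m)

def th_gate(z: int, x: Sequence[int], m: int, weights: Sequence[int]) -> int:
    """Threshold gate with hysteresis: set at threshold, reset when all inputs are 0."""
    return int(bool(th_set(x, m, weights)) or (bool(z) and any(x)))

def set_cover(m: int, weights: Sequence[int]) -> List[Tuple[int, ...]]:
    """All input vectors for which the set function is 1."""
    return [x for x in product((0, 1), repeat=len(weights)) if th_set(x, m, weights)]

print(set_cover(2, [1, 1]))      # th22 (C-element): [(1, 1)]
print(set_cover(3, [2, 1, 1]))   # th33w2: [(1, 0, 1), (1, 1, 0), (1, 1, 1)]
```

A 1-of-N gate (m = 1, unit weights) sets on any high input and holds only while some input is high. It therefore behaves exactly like an OR gate, which is why it needs no extra hysteresis.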

Example: 2-of-3 Threshold Gate with Hysteresis

• The gate switches to DATA when at least M (here, 2) inputs are DATA, and to NULL only when all inputs are NULL.
• It is possible to use "negative logic", reversing the pull-up and pull-down networks.

z = ab + ac + bc + z(a + b + c)
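The equation can be traced step by step. This Python sketch applies a hypothetical input sequence to z = ab + ac + bc + z(a + b + c). DATA arrives first on a and then on b, and the inputs then return to NULL one at a time. The output holds its DATA value until the last input returns to NULL.

```python
from typing import List, Tuple

def th23(z: int, a: int, b: int, c: int) -> int:
    """2-of-3 threshold gate with hysteresis: z = ab + ac + bc + z(a + b + c)."""
    return int(bool((a and b) or (a and c) or (b and c) or (z and (a or b or c))))

def trace(vectors: List[Tuple[int, int, int]], z: int = 0) -> List[int]:
    """Apply (a, b, c) vectors in order and collect the output after each."""
    out = []
    for a, b, c in vectors:
        z = th23(z, a, b, c)
        out.append(z)
    return out

print(trace([(1, 0, 0), (1, 1, 0), (0, 1, 0), (0, 0, 0)]))  # [0, 1, 1, 0]
```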

RTL Design Flow: Combinational Optimization

• Separate combinational logic and registers:
- registers are replaced by NCL registration in the RTL code;
- the combinational process is the subject of synthesis and optimization, and the topic of this presentation.

[Figure: combinational process with request for DATA/NULL and reset]

Two-Step Synthesis Flow (Using Synopsys' Design Compiler)

Inputs: VHDL source, generic library, NCL library, dual-rail definition.

Step 1 (synthesis). Translate the HDL into a "synchronous" intermediate netlist.

Step 2 (synthesis). Convert the intermediate netlist into an NCL netlist, using the dual-rail definition.

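Input to Step 1: RTL Description (Multiplexer Example)

    -- ncl_logic: signal type from the NCL (dual-rail) package, declaration not shown
    entity test is
      port (a, b, s : in  ncl_logic;
            z       : out ncl_logic);
    end entity test;

    architecture rtl of test is
    begin
      process (a, b, s) is
      begin
        if s = '1' then
          z <= a;
        else
          z <= b;
        end if;
      end process;
    end architecture rtl;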

MUX Example: Output of Step 1 / Input to Step 2: Intermediate Netlist

[Figure: the MUX (inputs s, a, b; output z) implemented with two-input NAND gates and internal nodes x, y]
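The intermediate netlist can be checked with a zero-delay Python model. The slide only shows that the MUX is built from two-input NAND gates with internal nodes x and y. The assignment x = NAND(s, a), y = NAND(not s, b) and the explicit inverter on s are assumptions for this sketch. With 4 transistors per static CMOS NAND2 and 2 per inverter, this structure gives 3·4 + 2 = 14 transistors, the synchronous figure quoted for the MUX.

```python
from itertools import product

def nand(p: int, q: int) -> int:
    return 1 - (p & q)

def mux_netlist(s: int, a: int, b: int) -> int:
    """Assumed intermediate netlist: z = NAND(NAND(s, a), NAND(not s, b))."""
    x = nand(s, a)
    y = nand(1 - s, b)
    return nand(x, y)

def mux_rtl(s: int, a: int, b: int) -> int:
    """Reference behavior from the RTL: if s = '1' then z <= a else z <= b."""
    return a if s == 1 else b

# Exhaustive equivalence check over all 8 input combinations.
assert all(mux_netlist(*v) == mux_rtl(*v) for v in product((0, 1), repeat=3))

# Static CMOS transistor count: three NAND2 gates (4 each) plus one inverter (2).
TRANSISTORS = 3 * 4 + 1 * 2
print(TRANSISTORS)  # 14
```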

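Dual-rail Package

• Define the type:

    type dual_rail_logic is record
      rail1 : std_logic;
      rail0 : std_logic;
    end record;

• Overload the operators:
- function "nand": each operand (a, b) takes a value in {0, 1, NULL} and is carried on two rails (a.0/a.1, b.0/b.1); four th22 gates detect the four input combinations, a th13 gate merges the three combinations that give 1 onto z.1, and the combination a = b = 1 drives z.0;
- function "not": the rails are swapped (a.0 drives z.1 and a.1 drives z.0), so no gate is needed.

th22 = two-input C-element; th13 = three-input OR.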
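The overloaded "nand" can be modeled gate by gate. This Python sketch builds it from four th22 gates and one th13. It checks the DATA truth table and that a NULL wavefront resets the output. The class and function names are hypothetical, and rails are written as (rail1, rail0), that is (a.1, a.0).

```python
from itertools import product
from typing import List, Tuple

Rail = Tuple[int, int]  # (rail1, rail0); (0, 0) is NULL
NULL: Rail = (0, 0)

def th22(z: int, a: int, b: int) -> int:
    """Two-input C-element: z = ab + z(a + b)."""
    return int(bool((a and b) or (z and (a or b))))

def th13(a: int, b: int, c: int) -> int:
    """Three-input OR: a 1-of-3 gate needs no extra hysteresis term."""
    return int(bool(a or b or c))

class DualRailNand:
    """Sketch of the overloaded "nand": four th22 gates and one th13."""

    def __init__(self) -> None:
        self.c: List[int] = [0, 0, 0, 0]  # states of the four C-elements

    def step(self, a: Rail, b: Rail) -> Rail:
        (a1, a0), (b1, b0) = a, b
        pairs = [(a0, b0), (a0, b1), (a1, b0), (a1, b1)]
        self.c = [th22(z, p, q) for z, (p, q) in zip(self.c, pairs)]
        z1 = th13(self.c[0], self.c[1], self.c[2])  # NAND = 1 unless a = b = 1
        z0 = self.c[3]                              # a = b = 1 gives NAND = 0
        return (z1, z0)

def encode(bit: int) -> Rail:
    return (1, 0) if bit else (0, 1)

gate = DualRailNand()
for a_bit, b_bit in product((0, 1), repeat=2):
    assert gate.step(encode(a_bit), encode(b_bit)) == encode(1 - (a_bit & b_bit))
    assert gate.step(NULL, NULL) == NULL  # NULL phase resets the gate
```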

Optimizing with Design Compiler

• Dual-rail expansion
• The two phases (set and reset) are separated:
- the set phase ensures circuit functionality;
- the reset phase is implied (by the canonical form of the reset).
• Optimizations are applied to the set phase.

Dual-rail Expansion of MUX

[Figure: each NAND gate of the MUX netlist (x, y, z) replaced by a dual-rail NAND (D-R NAND) over the rails s.t/s.f, a.t/a.f, b.t/b.f, producing x.t/x.f, y.t/y.f, z.t/z.f]

A naive semi-static DIMS implementation takes 114 transistors (reducible to 63 by merging C-elements with OR gates), versus 14 for a synchronous circuit.

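"Images": Boolean Gates

Implementing Set Functions

NCL gates and their Boolean "images" are equivalent in the set phase. Projection of NCL gates onto images is used for optimization; mapping of images back to NCL gates is used for implementation.

- OR gate: z = a + b; image: z = a + b
- th22: z = ab + z(a + b); image: z = ab
- th33w2: z = a(b + c) + z(a + b + c); image: z = a(b + c)

In the initial state z = a = b = c = 0, so during the set phase the hysteresis (sequential) behavior of each NCL gate coincides with the combinational behavior of its image.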
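The claim that an NCL gate and its image agree in the set phase can be checked exhaustively for small gates. This Python sketch raises the inputs one at a time in every possible order, which is a monotonic NULL-to-DATA transition. It starts from z = 0 and compares the hysteresis gate z' = S + z(x1 + ... + xn) with the combinational image S after every step. The helper names are hypothetical. The check relies on S being positive unate, as it is for threshold gates.

```python
from itertools import permutations
from typing import Callable, List, Sequence

SetFn = Callable[[Sequence[int]], int]

def hysteresis(S: SetFn) -> Callable[[int, Sequence[int]], int]:
    """Build the NCL gate z' = S(x) + z(x1 + ... + xn) from its image S."""
    def gate(z: int, x: Sequence[int]) -> int:
        return int(bool(S(x)) or (bool(z) and any(x)))
    return gate

def th33w2_image(x: Sequence[int]) -> int:
    a, b, c = x
    return int(bool(a and (b or c)))  # image: z = a(b + c)

def set_phase_matches(S: SetFn, n: int) -> bool:
    """True if, for every order of rising inputs from the all-zero state,
    the hysteresis gate output equals its image after each step."""
    gate = hysteresis(S)
    for order in permutations(range(n)):
        x: List[int] = [0] * n
        z = 0
        for i in order:
            x[i] = 1  # monotonic NULL -> DATA transition on one input
            z = gate(z, x)
            if z != S(x):
                return False
    return True

print(set_phase_matches(th33w2_image, 3))  # True
```

Outside the set phase the two differ. After th22 reaches 1, lowering one input keeps the gate at 1 while its image z = ab drops to 0. This is why optimization is restricted to the set phase.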

Image of Dual-rail NAND Gate

[Figure: D-R NAND with inputs a.t, a.f, b.t, b.f and outputs out.t, out.f, built from four C-elements (C), shown next to its image]

C-element equation: z = ab + z(a + b), initially z = a = b = 0. In a set phase it behaves like an AND gate, z = ab.

Dual-rail Expansion for MUX

[Figure: dual-rail MUX with inputs s.t, s.f, a.t, a.f, b.t, b.f, internal rails x.t, x.f, y.t, y.f and outputs z.t, z.f]

Twelve 2-input C-elements and three 3-input OR gates.
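The gate count follows from composing three dual-rail NANDs. Each has four C-elements and one 3-input OR, and the dual-rail inverter is a free rail swap. This Python sketch assumes the same hypothetical MUX structure z = NAND(NAND(s, a), NAND(not s, b)), since the slides do not spell out the wiring. It checks the dual-rail MUX against the RTL behavior and its reset by a NULL wavefront. Rails are written as (t, f).

```python
from itertools import product
from typing import List, Tuple

Rail = Tuple[int, int]  # (rail.t, rail.f); (0, 0) is NULL

def c_element(z: int, a: int, b: int) -> int:
    return int(bool((a and b) or (z and (a or b))))

class DRNand:
    """Dual-rail NAND: four 2-input C-elements and one 3-input OR."""
    C_ELEMENTS, OR3 = 4, 1

    def __init__(self) -> None:
        self.state: List[int] = [0, 0, 0, 0]

    def __call__(self, a: Rail, b: Rail) -> Rail:
        (at, af), (bt, bf) = a, b
        pairs = [(af, bf), (af, bt), (at, bf), (at, bt)]
        self.state = [c_element(z, p, q) for z, (p, q) in zip(self.state, pairs)]
        return (int(any(self.state[:3])), self.state[3])

def dr_not(a: Rail) -> Rail:
    """Dual-rail inversion is a rail swap and costs no gates."""
    return (a[1], a[0])

class DRMux:
    """Hypothetical dual-rail expansion of z = NAND(NAND(s, a), NAND(not s, b))."""

    def __init__(self) -> None:
        self.gx, self.gy, self.gz = DRNand(), DRNand(), DRNand()

    def __call__(self, s: Rail, a: Rail, b: Rail) -> Rail:
        return self.gz(self.gx(s, a), self.gy(dr_not(s), b))

def enc(bit: int) -> Rail:
    return (1, 0) if bit else (0, 1)

mux = DRMux()
for s, a, b in product((0, 1), repeat=3):
    assert mux(enc(s), enc(a), enc(b)) == enc(a if s else b)
    assert mux((0, 0), (0, 0), (0, 0)) == (0, 0)  # NULL wavefront resets all gates

print(3 * DRNand.C_ELEMENTS, 3 * DRNand.OR3)  # 12 3
```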

Image Circuit of Dual-rail Expansion for MUX

[Figure: the same circuit with every C-element replaced by its AND-gate image; rails s.t, s.f, a.t, a.f, b.t, b.f, x.t, x.f, y.t, y.f, z.t, z.f]

Optimized with Design Compiler

The MUX circuit passes technology-independent optimization and is mapped to "images" of gates from the NCL library: A(B + C) is the image of th33w2, and AB + CD is the image of thXOR.

[Figure: optimized image circuit with inputs s.t, s.f, a.t, a.f, b.t, b.f and outputs z.t, z.f]

Technology Mapping with Design Compiler

NCL circuit: the images are replaced by the corresponding gates with hysteresis (th33w2, th22, th24w2, thXOR).

[Figure: mapped MUX with inputs s.t, s.f, a.t, a.f, b.t, b.f and outputs z.t, z.f; inset: semi-static CMOS implementation of thXOR]

Result: 44 transistors, 30% better than the optimized DIMS implementation (63 transistors), since (63 − 44)/63 ≈ 0.30.

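Optimization Flow

- Translation: a synchronous Boolean circuit is translated with the asynchronous dual-rail package into a DIMS circuit of hysteresis gates (the real object).
- Optimization: the dual-rail image of the circuit (a virtual object) is optimized with Design Compiler.
- Technology mapping: the optimized circuit is mapped to images of NCL gates with Design Compiler, which are then implemented by hysteresis gates.
- The final circuit must be DI equivalent to the DIMS circuit.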

Validation of Optimization

The validity of the transformations (DI equivalence) is based on two properties:
• functional equivalence of the optimized and original circuits (under two-phase operation);
• maintenance of DI properties in the optimized circuit.
Both are based on the properties of prime and irredundant networks and of algebraic factorization [Brayton’90, Hachtel’92].

Validation of Optimization: Idea of the Proof

- Starting point: a prime and irredundant Boolean network, known to be 100% stuck-at testable [Scherz’72].
- Algebraic transformations maintain the set of test vectors for stuck-at faults [Hachtel’92].
- Induction in topological order: testability means that each gate acknowledges its input changes (delay insensitivity).
- The same holds for tree-based technology mapping.
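The starting point of the proof rests on the classical fact that a prime and irredundant network is fully stuck-at testable. This Python sketch illustrates the fact on two illustrative circuits that are not taken from the slides. It injects every single stuck-at fault on every wire and applies all input vectors. The irredundant z = ab + c has no undetectable fault. The redundant z = ab + abc has undetectable faults.

```python
from itertools import product
from typing import Callable, List, Optional, Tuple

Fault = Optional[Tuple[str, int]]  # (wire name, stuck-at value); None = fault-free

def wire(fault: Fault, name: str, value: int) -> int:
    """Value seen on wire `name`, overridden if that wire is stuck."""
    if fault is not None and fault[0] == name:
        return fault[1]
    return value

def irredundant(a: int, b: int, c: int, fault: Fault = None) -> int:
    """z = ab + c: prime and irredundant."""
    a, b, c = wire(fault, "a", a), wire(fault, "b", b), wire(fault, "c", c)
    n1 = wire(fault, "n1", a & b)
    return wire(fault, "z", n1 | c)

def redundant(a: int, b: int, c: int, fault: Fault = None) -> int:
    """z = ab + abc: the cube abc is redundant (covered by ab)."""
    a, b, c = wire(fault, "a", a), wire(fault, "b", b), wire(fault, "c", c)
    n1 = wire(fault, "n1", a & b)
    n2 = wire(fault, "n2", a & b & c)
    return wire(fault, "z", n1 | n2)

def undetectable(circuit: Callable[..., int], wires: List[str]) -> List[Tuple[str, int]]:
    """Single stuck-at faults that no input vector distinguishes from the good circuit."""
    vectors = list(product((0, 1), repeat=3))
    faults = [(w, v) for w in wires for v in (0, 1)]
    return [f for f in faults
            if all(circuit(*x, fault=f) == circuit(*x) for x in vectors)]

print(undetectable(irredundant, ["a", "b", "c", "n1", "z"]))  # []
print(undetectable(redundant, ["a", "b", "c", "n1", "n2", "z"]))
# [('c', 0), ('c', 1), ('n2', 0)]
```

In the NCL reading, an undetectable fault marks a gate whose transitions are never observed at the outputs, so that gate's input changes are not acknowledged.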

Manual vs. Synthesized Designs

[Figure: area (transistor count, 0 to 4500) of manual and synthesized designs]

For bigger circuits the synthesized/manual ratio is better (22% improvement for the biggest example).

Synchronous vs. NCL Design

[Figure: gate and transistor counts (up to 35000 transistors) of clocked and NCL designs]

Penalty in transistors comes from:
- the dual-rail implementation;
- effective delay insensitivity.

To reduce transistor count:
- use four-rail encoding;
- improve architectural solutions, e.g., OR instead of MUX;
- compromise delay insensitivity.
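The slides do not define "four-rail" encoding. Assuming it means the common 1-of-4 code, two bits travel on four wires with exactly one wire high, instead of on two dual-rail pairs. This Python sketch compares the two codes for one 2-bit digit. Both use four wires, but in every NULL-to-DATA transition the 1-of-4 code raises one wire where dual-rail raises two. The function names are hypothetical.

```python
from typing import Tuple

def dual_rail_pair(b1: int, b0: int) -> Tuple[int, ...]:
    """Two bits as two dual-rail pairs: (b1.t, b1.f, b0.t, b0.f)."""
    return (b1, 1 - b1, b0, 1 - b0)

def one_of_four(b1: int, b0: int) -> Tuple[int, ...]:
    """Two bits as a 1-of-4 code: only wire number 2*b1 + b0 is high."""
    return tuple(1 if i == 2 * b1 + b0 else 0 for i in range(4))

def rising(code: Tuple[int, ...]) -> int:
    """Wires that rise in a NULL -> DATA transition (NULL is all zeros)."""
    return sum(code)

for b1 in (0, 1):
    for b0 in (0, 1):
        print(b1, b0, rising(dual_rail_pair(b1, b0)), rising(one_of_four(b1, b0)))
# each printed line ends with "2 1"
```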

Conclusions

• First methodology to use standard HDL and commercial tools both to simulate and to synthesize asynchronous circuits
• The methodology is formally validated
• The results of the synthesis are acceptable

Future Tasks

• Reduce area/power without losing delay insensitivity (e.g., four-rail design)
• Relax DI requirements to reduce area (e.g., using timing assumptions)
• Use peephole optimizations (e.g., merge gates used for registration with their input gates)
• Write DesignWare components to get better performance for arithmetic units (infer hand-designed components)