Delay Insensitive Methods - Vienna University of Technology


Advanced Digital Design Asynchronous Design: DI Methods

by A. Steininger and M. Delvai Vienna University of Technology

Outline

- Delay-insensitive design: principle
- NULL Convention Logic
- Code conditions for DI logic
- Four-State Logic
- Evaluation of asynchronous design styles
  - Bundled Data
  - NULL Convention Logic
  - Four-State Logic

Lecture "Advanced Digital Design" © A. Steininger & M. Delvai / TU Vienna

recall

Asynchronous Philosophy

"The control flow requires agreement between source and sink. For this purpose they need to communicate."

- Source indicates capture condition for sink.
- Sink indicates issue condition for source.

"HANDSHAKE"

recall

Handshake Principle

SRC f(x) SNK

REQ: "Data word valid, you can use it"
- When can SNK use its input? When it is valid and consistent.

ACK: "Data word consumed, send the next"
- When can SRC apply the next input? When SNK has consumed the previous one.

recall

A very Important Detail

- The handshake establishes a closed-loop control for the data flow between sender and receiver.
- This makes operation more robust than in the synchronous (= open-loop) case.
- The art of asynchronous design is to make many of these closed loops interoperate properly.
- This is much more complicated than a synchronous design.

Very disappointing…

- For a closed loop we need to measure the quantity of interest.
- So far we have not done that:
  - We have not measured validity & consistency.
  - We have used time as an indirect measure instead.
- Thus Bounded Delay methods do not provide the benefits of a closed loop.
- BUT: Can we measure validity & consistency at all?

recall

Criticality of ACK

SRC f(x) "latch!"

- cannot measure the "act of latching" as an event
- use the latching command instead
- fork produces a race between the trigger process and the next data wave
- race is uncritical (but still exists!)

recall

Criticality of REQ

SRC f(x) SNK

- cannot use the issue trigger as an event: produces an unacceptable race between data and REQ
- must introduce a timer (bounded delay)
- OR: find a better event (downstream)
- completion detection

Completion Detection

- In order to judge when data are valid & consistent we need to be able to see when this is NOT the case.
- This is not possible with Boolean logic: we need a representation for INVALID.
- An ACK in parallel to data (bundled data) will always cause a race.
- We need more than two signal states for every individual bit (!).
- We need more than one rail per bit.

Multi-level Logic

- use more than two (e.g. three) voltage levels per rail
- allows expressing "invalid" in the currently "forbidden" area between "HI" and "LO"
- requires two thresholds for every gate input
- output must be able to drive three different levels reliably
- causes substantial technological problems
- not further pursued

recall

Our Options

- We must only use consistent input vectors.
- How can we tell an input vector is consistent?

(1) use TIME to mark consistent phases
  - synchronous approach / global time base
  - asynchronous / bounded delay
(2) use CODING to add information
  - asynchronous / delay insensitive

recall

Terminology

- consistent DW (data word): all bits belong to the same context
- valid signal: result of a function applied to a consistent DW

NULL Convention Logic

Add the value NULL to the alphabet. Two-rail coding: signal X is carried on the rails X.a and X.b; TRUE and FALSE together are "DATA".

X.a  X.b  meaning
0    0    NULL (N)
0    1    TRUE (T)
1    0    FALSE (F)
1    1    illegal
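The two-rail coding above can be written down as a small codec. This is only a sketch for illustration: the function names `encode`, `decode` and `is_data` are hypothetical, and Python's `None` is used here as a stand-in for NULL.

```python
from typing import Optional, Tuple

# Rail pairs (X.a, X.b) as listed in the two-rail coding table.
NULL, TRUE, FALSE = (0, 0), (0, 1), (1, 0)

def encode(value: Optional[bool]) -> Tuple[int, int]:
    """Map None/True/False to the rail pair for NULL/TRUE/FALSE."""
    if value is None:
        return NULL
    return TRUE if value else FALSE

def decode(rails: Tuple[int, int]) -> Optional[bool]:
    """Map a rail pair back to None/True/False; (1, 1) is rejected."""
    if rails == (1, 1):
        raise ValueError("illegal code word (1, 1)")
    if rails == NULL:
        return None
    return rails == TRUE

def is_data(rails: Tuple[int, int]) -> bool:
    """A signal carries DATA when it is TRUE or FALSE."""
    return rails in (TRUE, FALSE)
```

Note that NULL is a legal code word of its own, so a receiver can tell "no data yet" apart from a data value on every single signal.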

NCL Functions

Naive approach: if any input is "N" then the output is "N".

AND | T  F  N
 T  | T  F  N
 F  | F  F  N
 N  | N  N  N

OR  | T  F  N
 T  | T  T  N
 F  | T  F  N
 N  | N  N  N

NOT
 T -> F
 F -> T
 N -> N
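The naive rule ("any N input gives N") can be sketched in a few lines. The symbols "T", "F", "N" follow the tables; the function names are hypothetical.

```python
def ncl_not(a: str) -> str:
    """Naive NOT: NULL stays NULL, DATA is inverted."""
    return {"T": "F", "F": "T", "N": "N"}[a]

def ncl_and(a: str, b: str) -> str:
    """Naive AND: any NULL input forces a NULL output."""
    if "N" in (a, b):
        return "N"
    return "T" if (a, b) == ("T", "T") else "F"

def ncl_or(a: str, b: str) -> str:
    """Naive OR: any NULL input forces a NULL output."""
    if "N" in (a, b):
        return "N"
    return "T" if "T" in (a, b) else "F"

# Print one table row per value of the first input.
for a in "TFN":
    print(a, [ncl_and(a, b) for b in "TFN"], [ncl_or(a, b) for b in "TFN"])
```

These functions are memoryless: the output drops to N as soon as the first input returns to N, which is exactly the weakness the later slides address with hysteresis.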

NCL Flow Control

- NULL waves enframe DATA waves.
- Completion detection = check whether all bits are "DATA" (completeness of DATA).

[Figure: timing diagram over t; each bit alternates between NULL and DATA (TRUE/FALSE), and the phases in which all bits are DATA are marked as consistent DATA]

Still Problems …

What about this situation?

[Figure: timing diagram of the individual bits and the output over t]

Fast bits may catch up with a slow bit from the previous word. The word containing the "old" bit is considered consistent!
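The problem can be reproduced with a toy model. In this sketch each bit is tagged with the number of the wave it belongs to; the tag exists only in the model (an assumption for illustration), while the completion detector sees nothing but the values.

```python
def is_complete_data(word):
    """Completion detection: True when every bit is DATA ('T' or 'F')."""
    return all(bit in ("T", "F") for bit in word)

# Each bit is modeled as (wave_number, value).
# The third bit is slow and still shows its value from wave 1,
# while the two fast bits already carry wave 2.
stale_word = [(2, "T"), (2, "F"), (1, "T")]

values = [value for _, value in stale_word]
waves = {wave for wave, _ in stale_word}

print(is_complete_data(values))  # the detector accepts the word
print(len(waves) == 1)           # but the bits do not share one context
```

Checking completeness of DATA alone therefore cannot guarantee consistency; something must keep the fast bits from running ahead.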

Solution Principle

- Enforce "completeness of NULL" as well:
  - The output must not go to NULL before all inputs have changed to NULL.
  - In a closed-loop configuration this keeps the slow paths in synchrony with the fast ones.
- We need a different truth table when the output is NULL.

Two Truth Tables

for DATA waves

AND | T  F  N
 T  | T  F  N
 F  | F  F  N
 N  | N  N  N

for NULL waves

AND | T  F  N
 T  | T  F  D
 F  | F  F  D
 N  | D  D  N

D … DATA (T or F)

- must hold output in last valid state before new input is complete
- need "hysteresis"
- need to consider current output in truth table

Feedback Gate

[Figure: AND gate (&) with inputs A, B and the output Y fed back as an additional input]

[Table: next output Y' for every combination of the current output Y and the inputs A, B (each T, F or N); combinations with Y ≠ Y' are unstable]
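A feedback AND gate with hysteresis can be sketched as a next-state function of the current output Y and the inputs A, B. The rule used here (compute when both inputs are DATA, go to NULL when both are NULL, hold otherwise) is one reading of the two truth tables; the function names are hypothetical.

```python
DATA = ("T", "F")

def and_data(a: str, b: str) -> str:
    """Plain Boolean AND on the DATA values 'T' and 'F'."""
    return "T" if a == "T" and b == "T" else "F"

def next_output(y: str, a: str, b: str) -> str:
    """Next output Y' of an AND gate with hysteresis."""
    if a in DATA and b in DATA:      # input wave is complete DATA
        return and_data(a, b)
    if a == "N" and b == "N":        # input wave is complete NULL
        return "N"
    return y                         # incomplete input: hold the output

# Enumerate all 27 combinations; rows with Y != Y' are unstable.
for y in "TFN":
    for a in "TFN":
        for b in "TFN":
            y_next = next_output(y, a, b)
            note = "unstable" if y_next != y else ""
            print(y, a, b, "->", y_next, note)
```

With the output at NULL, a single DATA input leaves the output at NULL; with the output at DATA, a single NULL input leaves it at DATA. That is the "hold" behavior both tables ask for.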

No more Problems …

Have we solved the problem?

[Figure: timing diagram of the individual bits and the output over t]

YES! The output now remains at DATA with the slowest bit, thus inhibiting (via the closed loop) the fast bits from conveying the next DATA wave.

NCL Gates

The desired hysteresis requires an NCL gate to hold its output until
- all inputs are DATA, or
- all inputs are NULL.

This needs storage capability (or a feedback loop) even in a combinational gate.

[Figure: gate with two-rail inputs X1 (X1.a, X1.b) and X2 (X2.a, X2.b), a memory cell (Mem) per output rail, and two-rail output Y (Y.a, Y.b)]

NCL Gate Implementation

- figure shown for one output rail only
- p- and n-stack not dual
- memory cell at output
- CMOS transistors only, but no standard cells

[Figure: transistor-level gate with inputs X1.a, X1.b, X2.a, X2.b and memory cells (Mem) at the outputs Y.a, Y.b]

[G. Sobelman, K. Fant: CMOS Circuit Design of Threshold Gates with Hysteresis]
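At rail level, a threshold gate with hysteresis is usually described as an m-of-n gate: the output is set once at least m of the n input rails are 1 and is cleared only when all input rails are 0. The sketch below models that behavior only; it is not the transistor circuit from the figure, and the class name `ThresholdGate` is hypothetical.

```python
class ThresholdGate:
    """Behavioral model of an m-of-n threshold gate with hysteresis."""

    def __init__(self, m: int, n: int) -> None:
        if not 1 <= m <= n:
            raise ValueError("need 1 <= m <= n")
        self.m = m
        self.n = n
        self.out = 0  # memory cell at the output, initialized to 0

    def update(self, *rails: int) -> int:
        """Apply new rail values (0 or 1) and return the output rail."""
        if len(rails) != self.n:
            raise ValueError(f"expected {self.n} input rails")
        active = sum(rails)
        if active >= self.m:
            self.out = 1      # threshold reached: set
        elif active == 0:
            self.out = 0      # all inputs back at 0: reset
        # otherwise: hold the stored value
        return self.out

# A 2-of-2 gate behaves like a C-element.
gate = ThresholdGate(2, 2)
print([gate.update(*r) for r in [(1, 0), (1, 1), (0, 1), (0, 0)]])
```

The stored `out` value plays the role of the memory cell at the output mentioned on the slide.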

The Charm of NCL

- self-regulating data flow
  - in a NULL-initialized circuit a DATA front will propagate towards the output
  - alternating waves of NULL and DATA pace the data flow (which, in some sense, forms the "clock")
- based on direct assessment of validity & consistency
- no delay assumptions necessary (ideally), no "worst case", …
- globally applicable solution

Validity and Consistency

- Consistency (multiple bits @ input): all bits that are combined are valid and belong to the same context.
- Validity (single bit @ output): the bit is the stable result of a combination of consistent bits.

Consistency implies validity (by definition) but NOT vice versa!

Validity & Consistency in NCL

- Validity:
  - output is changed only when consistent input is available ("hold" in truth table)
  - coding ensures a direct transition from one valid code to another (NULL is valid but spacer only)
  - continuous validity
- Consistency:
  - NULL spacer between DATA waves allows identification of context
  - synchronization of context by virtue of the "completeness of NULL" condition

What about sync. & BD?

- Timing ensures that every data item is both valid and consistent at the time it is used:
  - choice of clock period (sync)
  - choice of delay values (BD)
- In contrast to NCL, (temporary) invalidity of data is admitted.
- No explicit measures (other than timing) are taken/necessary to cope with these issues.

recall

Softening the restrictions

- synchronous model: known bounds for delays, global timing
- bounded delay model (fundamental): known bounds for absolute delays, local timing
- scalable-delay-insensitive model: bounds for relative deviation between delays known
- quasi-delay-insensitive: output paths of a fork have same delay
- delay insensitive: no restrictions on delays (just finite)

NCL: A Brief Summary

- validity & consistency directly visible
- no timing assumptions required (ideally)
- "delay insensitive" (ideally)
- suitable for CMOS implementation
- coding of one bit on two rails
- 2 memory cells per combinational output
- efficiency: 50% of the data flow are unproductive NULL waves
- patented and industrially used

recall

Our Options

- We must only use consistent input vectors.
- How can we tell an input vector is consistent?

(1) use TIME to mark consistent phases
  - synchronous approach / global time base
  - asynchronous / bounded delay
(2) use CODING to add information
  - asynchronous / delay insensitive

Conditions for DI Coding

(C1) Identification of every context switch: It must be possible to clearly separate two successive data words under all circumstances.

(C2) Unique context membership: The transition from one valid code word to the next must be unambiguous, i.e. no intermediate state may be a valid code.

Conditions for DI coding

(C1) Identification of every context switch: It must be possible to clearly separate two successive data words under all circumstances.

Example: 0,0,0 followed by 0,0,0: where does the second word begin?


Conditions for DI coding

(C2) Unique context membership: The transition from one valid code word to the next must be unambiguous, i.e. no intermediate state may be a valid code.

Example: 0,0,0 -> 1,0,0 -> 1,0,1 -> 1,1,1: are the intermediate states 1,0,0 and 1,0,1 code words of their own?
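Condition (C2) can be checked mechanically for a given code. The sketch below assumes that every rail that differs between two code words switches exactly once, in arbitrary order, and tests whether any state in between is itself a valid code word. The helper names and the 2-bit two-rail example (rail order X.a, X.b per bit) are assumptions for illustration.

```python
from itertools import product

def intermediates(a, b):
    """States strictly between a and b when each differing rail flips once."""
    diff = [i for i in range(len(a)) if a[i] != b[i]]
    states = set()
    for mask in product([False, True], repeat=len(diff)):
        if all(mask) or not any(mask):
            continue  # skip the start word and the end word
        state = list(a)
        for flipped, i in zip(mask, diff):
            if flipped:
                state[i] = b[i]
        states.add(tuple(state))
    return states

def satisfies_c2(valid, a, b):
    """True when no intermediate state between a and b is a valid code word."""
    return not (intermediates(a, b) & valid)

# Plain binary: every 3-bit vector is a valid word, so (C2) is violated.
valid_binary = set(product([0, 1], repeat=3))
print(satisfies_c2(valid_binary, (0, 0, 0), (1, 1, 1)))

# Two-rail word of 2 bits: valid words are all-NULL or all-DATA.
DATA_PAIRS = [(0, 1), (1, 0)]
valid_dual = {(0, 0, 0, 0)} | {p + q for p in DATA_PAIRS for q in DATA_PAIRS}
print(satisfies_c2(valid_dual, (0, 0, 0, 0), (0, 1, 1, 0)))
```

In the two-rail case the intermediate states mix a DATA bit with a NULL bit, so they can never be mistaken for a complete word.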

What about NCL's Coding

- (C1) Return to NULL forces separation between successive data waves.
- (C2) Coding scheme guarantees a direct switch from one legal value to the next (only one rail changes!).

X.a  X.b  value
0    0    NULL
0    1    TRUE
1    0    FALSE
1    1    illegal

Synchronization of Waves

A  B  Y
0  0  0
0  1  0
1  0  0
1  1  1

[Figure: AND gate (&) with inputs A, B and output Y; the output shows the sequence N 0 N 0 N]

- no glitch!
- successive "0"s clearly separable

NCL vs. Transition Signaling

[Figure: waveforms of the rails A0 and A1 for a sequence of values of A, shown once for Transition Signaling and once for NULL Convention Logic]

More Efficient Coding?

- NCL employs a 4-phase (RTZ) version of transition signaling.
- The "return to zero" is due to the NULL waves.
- The NULL waves are unproductive and hence undesired.
- Can we employ 2-phase (NRZ) transition signaling instead?

Four-State Logic (FSL)

- Use 2 codes per logic value.
- Two-rail coding: signal X is carried on the rails X.a and X.b.

FSL Flow Control

- Alternate code sets ("phase").
- Completion detection: check whether all bits belong to the same phase.

[Figure: timing diagram over t comparing NCL and FSL, with the consistent phases marked]

FSL AND-Gate: Truth Table

Y        | IN_1 = l  h  L  H
IN_2 = l |        l  l  *  *
IN_2 = h |        l  h  *  *
IN_2 = L |        *  *  L  L
IN_2 = H |        *  *  L  H

* … hold last valid output
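The truth table can be turned into a small behavioral model. This sketch assumes that lowercase symbols (l, h) belong to one phase and uppercase symbols (L, H) to the other, with l/L meaning low and h/H meaning high; that reading and all function names are assumptions for illustration.

```python
def phase(sym: str) -> int:
    """Phase of a symbol: 0 for lowercase (l, h), 1 for uppercase (L, H)."""
    return 0 if sym.islower() else 1

def value(sym: str) -> bool:
    """Logic value of a symbol, independent of its phase."""
    return sym.lower() == "h"

def same_phase(word) -> bool:
    """Completion detection: all bits belong to the same phase."""
    return len({phase(sym) for sym in word}) == 1

def fsl_and(in1: str, in2: str, last: str) -> str:
    """FSL AND: compute when both inputs share a phase, otherwise hold."""
    if phase(in1) != phase(in2):
        return last                      # '*' entries: hold last valid output
    out = "h" if value(in1) and value(in2) else "l"
    return out if phase(in1) == 0 else out.upper()

print(fsl_and("h", "h", "L"))   # both inputs in the lowercase phase
print(fsl_and("l", "H", "l"))   # phases differ: output is held
print(fsl_and("H", "H", "l"))   # both inputs in the uppercase phase
```

Every wave carries data here; the phase alternation alone separates successive words, so no spacer wave is needed.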

Four-State Logic (FSL)

- An FSL gate holds its output until all inputs are in the same phase.
- This needs storage capability (or a feedback loop) even in a combinational gate.

[Figure: gate with two-rail inputs X1 (X1.a, X1.b) and X2 (X2.a, X2.b), a memory cell (Mem) per output rail, and two-rail output Y (Y.a, Y.b)]

FSL and Code Conditions

- (C1) Phase change forces separation between successive data waves.
- (C2) Coding scheme guarantees a direct switch from a legal value in one phase to a legal value in the next phase (only one rail changes!).

Synchronization of Waves

A  B  Y
0  0  0
0  1  0
1  0  0
1  1  1

[Figure: AND gate (&) with inputs A, B and output Y]

- no glitch!
- successive "0"s clearly separable

FSL: A Brief Summary

- FSL retains all the charm of NCL.
- FSL provides double data throughput.
- Implementation of the 4-phase scheme tends to require more effort (remains to be investigated).

Evaluation of Properties

Bounded Delay & Delay Insensitive Asynchronous Design Methods

recall

Ideal Design Method

An ideal design method …
- minimizes power consumption
- minimizes circuit overhead
- naturally supports composability
- naturally aids testability
- yields robust circuits
- yields fast circuits.

Area Efficiency BD

Area efficiency: proportion of the area devoted to the intended logic function.

$E_{area} = \frac{A_F}{A_F + A_{Ctrl}} \approx 100\%$, with $A_{Ctrl} \approx 0$ (handshake logic negligible)

Area Efficiency NCL

Overheads for
- flow control (Micropipeline)
- two-rail coding
- storage cells

sum up to about
- 500% of $A_F$ with standard cells
- 100%...200% of $A_F$ with custom cells (*)

$E_{area} = \frac{A_F}{A_F + A_{Ctrl}} \approx 50 \ldots 33\%$

(*) [Smith & Ligthart, ASP-DAC 2001]

Area Efficiency FSL

Overheads are similar to NCL, but the NULL state tends to result in a more convenient implementation than the second phase.

Rough estimation:
- 600% of $A_F$ with standard cells
- 150%...250% of $A_F$ with custom cells

$E_{area} = \frac{A_F}{A_F + A_{Ctrl}} \approx 40 \ldots 29\%$

Area Efficiency - Comparison

Method  Area Efficiency
sync.   50%
BD      100%
NCL     50…33%
FSL     40…29%
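The NCL and FSL figures follow directly from the overhead estimates: if the overhead is expressed as a multiple of the functional area A_F, then E_area = A_F / (A_F + overhead * A_F) = 1 / (1 + overhead). A minimal sketch (the function name is hypothetical; the synchronous figure is not derived here because the slides give no overhead for it):

```python
def area_efficiency(overhead: float) -> float:
    """E_area for an overhead given as a multiple of the functional area A_F."""
    return 1.0 / (1.0 + overhead)

# Overheads for custom cells as quoted on the slides.
for name, low, high in [("NCL", 1.0, 2.0), ("FSL", 1.5, 2.5)]:
    best = round(100 * area_efficiency(low))
    worst = round(100 * area_efficiency(high))
    print(f"{name}: {best}% ... {worst}%")
```

The same function applied to the standard-cell overheads (500% and 600%) gives efficiencies well below 20%, which illustrates why the slides quote the custom-cell range.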

Area Efficiency: Summary

- Asynchronous circuits save the need for the clock network, but require (relatively little) area for handshaking.
- In addition, DI circuits cause substantial circuit overheads for coding and completion detection.
- These overheads outweigh the savings for the clock tree, hence BD circuits promise the most area savings (the delay, however, is difficult to implement).

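Power Efficiency BD

Power efficiency: power for the intended function relative to the total dissipated power. The total has a dynamic part, which scales with the circuit utilization $u$, and a static part.

Assumptions: handshaking increases dynamic power by 10%; the dynamic part is about 90% of $P_F$.

$P_{tot} = (0.1 + u \cdot (1 + 0.1) \cdot 0.9) \cdot P_F = (0.1 + u \cdot 0.99) \cdot P_F$

$E_{pwr} = \frac{u \cdot P_F}{P_{tot}} = \frac{1}{0.1 / u + 0.99}$

- $u = 100\%$: $E_{pwr} \approx 92\%$
- $u = 10\%$: $E_{pwr} \approx 50\%$
- $u \to 0$: $E_{pwr} \approx 10 \cdot u$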

Power Efficiency NCL

- logic overhead (≈ 2.5)
- coding overhead (2 trans/bit instead of 0.5 => 4x)

Assumptions: handshaking plus completion detection increase dynamic power by 10%; the dynamic part is about 90% of $P_F$; $u$ is the circuit utilization.

$P_{tot} = (2.5 \cdot 0.1 + u \cdot 2.5 \cdot 4 \cdot (1 + 0.1) \cdot 0.9) \cdot P_F = (0.25 + u \cdot 9.9) \cdot P_F$

$E_{pwr} = \frac{u \cdot P_F}{P_{tot}} = \frac{1}{0.25 / u + 9.9}$

- $u = 100\%$: $E_{pwr} \approx 10\%$
- $u = 10\%$: $E_{pwr} \approx 8\%$
- $u \to 0$: $E_{pwr} \approx 4 \cdot u$

Power Efficiency FSL

- logic overhead (≈ 3)
- coding overhead (1 trans/bit instead of 0.5 => 2x)

Assumptions: handshaking plus completion detection increase dynamic power by 10%; the dynamic part is about 90% of $P_F$; $u$ is the circuit utilization.

$P_{tot} = (3 \cdot 0.1 + u \cdot 3 \cdot 2 \cdot (1 + 0.1) \cdot 0.9) \cdot P_F = (0.3 + u \cdot 5.94) \cdot P_F$

$E_{pwr} = \frac{u \cdot P_F}{P_{tot}} = \frac{1}{0.3 / u + 5.94}$

- $u = 100\%$: $E_{pwr} \approx 16\%$
- $u = 10\%$: $E_{pwr} \approx 11\%$
- $u \to 0$: $E_{pwr} \approx 3.3 \cdot u$

Pwr Efficiency - Comparison

Power efficiency for different values of the circuit utilization $u$:

Utilization   sync.     BD      NCL    FSL
u = 100%      53%       92%     10%    16%
u = 10%       5.3%      50%     8%     11%
u -> 0        0.53·u    10·u    4·u    3.3·u
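The BD, NCL and FSL entries can be re-computed from the stated overheads. The sketch below assumes the model that reproduces the slide figures: at full utilization about 90% of the function's power is dynamic and 10% static, handshaking adds 10% dynamic power, the logic overhead scales both parts, and the coding overhead scales the dynamic part only. The function and parameter names are hypothetical, and the synchronous column is not derived here.

```python
def power_efficiency(u, logic_overhead=1.0, coding_overhead=1.0,
                     handshake=0.1, dynamic_share=0.9):
    """E_pwr = u * P_F / P_tot, with P_F normalized to 1 (assumed model)."""
    static = (1.0 - dynamic_share) * logic_overhead
    dynamic = dynamic_share * (1.0 + handshake) * logic_overhead * coding_overhead
    return u / (static + u * dynamic)

styles = {
    "BD": dict(),
    "NCL": dict(logic_overhead=2.5, coding_overhead=4.0),
    "FSL": dict(logic_overhead=3.0, coding_overhead=2.0),
}
for name, params in styles.items():
    full = round(100 * power_efficiency(1.0, **params))
    tenth = round(100 * power_efficiency(0.1, **params))
    print(f"{name}: {full}% at u = 100%, {tenth}% at u = 10%")
```

For small u the static term dominates, so the efficiency approaches u divided by the static share, which gives the factors 10, 4 and 3.3 in the last table row.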

Pwr Efficiency: Summary

- Asynchronous circuits save the power consumed by the clock network, and require less power for handshaking.
- The DI circuits' additional transitions plus their substantial circuit overheads increase energy consumption.
- In summary, the DI overheads outweigh the savings, hence BD methods are most effective for low-power applications.

Perform. Efficiency BD

$E_{perf} < 50\%$, where $E_{perf}$ relates the delay $t_F$ of the intended function to the total cycle time.

[Figure: chart from [Cortadella, ICCD'04]]

Perform. Efficiency NCL

- no safety margins necessary
- but additional delay $t_{dly}$ for:
  - ACK path (feedback required!)
  - completion detection
  - additional circuit complexity
- and NULL waves halve throughput

$E_{perf} = \frac{t_F}{2 \cdot (t_F + t_{dly})} < 50\%$ (≈ 50%) *

* [Cortadella, ICCD'04]

Perform. Efficiency FSL

- everything like in NCL, but:
- double throughput

$E_{perf} = \frac{t_F}{t_F + t_{dly}} < 100\%$

* [Cortadella, ICCD'04]

Perf. Effic. - Comparison

Method  Perf. Efficiency
sync.   44%
BD      < 50%
NCL     < 50%
FSL     < 100%

Composability

- BD is similar to the synchronous case: delays must be adjusted just like the clock.
- DI circuits work under all conditions; the interface spec can be reduced to the function (plus handshake protocol).
- If a certain execution time / performance must be guaranteed, however, timing analysis is again necessary. BUT: Even if the operation is too slow for any reason, the circuit will not fail to operate!
- No metastability issues! Instead, a handshake is required at all interfaces.

How DI is DI?

- Basic cells are internally SDI (at best)
- Obligatory feedback
- On module level DI is attainable, but
  - Inevitable (?) fork makes ACK path "unsafe"
  - Reg design critical

Robustness

- All async techniques: timing is distributed, clock no longer a single point of failure
- DI only: robust timing due to closed-loop control
- graceful degradation in case of violation
- multi-rail coding of signals
- complexity of interacting control loops
- larger area

Syn versus FSL

[Figure: fault injection results, syn versus FSL] [Thesis Rahbaran]

Testability

- Scan chain is an extremely powerful concept; hard to beat.
- Asynchronous circuits (including BD) are said to be much harder to test.
- Only isolated concepts and ad-hoc solutions are available.

Further Properties

- "Correctness by design" (DI only)
- Beneficial EMR behavior
- Conceptual elegance (DI only)
- Readiness for future technologies (quantum, bio, optical, …)

Gain of Delay Insensitive

No longer an issue with DI:

- need to determine clock period
  - circuit functionality is technology dependent
  - considerable design efforts, large design loops
- need to make worst-case assumptions
  - necessarily pessimistic
  - no robustness wrt. exceeding them
- need to maintain global synchrony
  - clock distribution problems
  - power consumption problems

Current status

- working 16-bit processor "ASPEAR" (on FPGA platform) in FSL
- working design flow based on Synopsys
- formal investigation of delay insensitivity (model checking)
- experimental comparison of robustness: SPEAR versus ASPEAR

Our visions

- autonomous sensor node
  - no crystal oscillator
  - UART-like communication
  - low power (by diverse means)
  - high robustness (harsh environments)
- develop "tailored" lib cells / ASIC
- delay insensitive memory
- (exp.) comparison with other approaches

Conclusion for asyn:

- benefits for
  - low-activation applications
  - high robustness
  - technologies with unknown timing
  - largely varying operating conditions
- not good for
  - small feature sizes (static current!)
  - low area
  - real-time
  - high speed (?)