Transcript Delay Insensitive Methods - Vienna University of Technology
Advanced Digital Design Asynchronous Design: DI Methods
by A. Steininger and M. Delvai Vienna University of Technology
Outline
Delay Insensitive design - principle
NULL-Convention Logic
Code conditions for DI logic
Four-State Logic
Evaluation of async. design styles
  Bundled Data
  NULL-Convention Logic
  Four-State Logic
Lecture "Advanced Digital Design" © A. Steininger & M. Delvai / TU Vienna
recall
Asynchronous Philosophy
„The control flow requires agreement between source and sink. For this purpose they need to communicate.“
Source indicates capture condition for sink.
Sink indicates issue condition for source.
„HANDSHAKE“
recall
Handshake Principle
REQ: „Data word valid, you can use it“
When can SNK use its input?
When it is valid and consistent
SRC f(x) SNK
When can SRC apply the next input?
When SNK has consumed the previous one
ACK: „Data word consumed, send the next“
recall
A very Important Detail
The handshake establishes a closed-loop control for the data flow between sender and receiver.
This makes operation more robust than in the synchronous (= open-loop) case.
The art of asynchronous design is to make many of these closed loops interoperate properly.
This is much more complicated than a synchronous design.
Very disappointing…
For a closed loop we need to measure the quantity of interest.
So far we have not done that:
We have not measured validity & consistency.
We have used time as an indirect measure instead.
Thus Bounded Delay methods do not provide the benefits of a closed loop.
BUT: Can we measure validity & consistency at all?
recall
Criticality of ACK
SRC f(x)
„latch!“
cannot measure „act of latching“ as an event
use latching command instead
fork produces race between trigger process and next data wave
race is uncritical (but still exists!)
recall
Criticality of REQ
SRC f(x) SNK
cannot use issue trigger as an event: produces unacceptable race between data and REQ
must introduce timer (bounded delay)
OR: find better event (downstream): completion detection
Completion Detection
In order to judge when data are valid & consistent we need to be able to see when this is NOT the case
not possible with Boolean logic
need representation for INVALID
an ACK in parallel to data (bundled data) will always cause a race
need more than two signal states for every individual bit (!)
need more than one rail per bit
Multi-level Logic
use more than two (e.g. three) voltage levels per rail
allows expressing „invalid“ in the currently „forbidden“ area between „HI“ and „LO“
requires two thresholds for every gate input
output must be able to drive three different levels reliably
causes substantial technological problems
not further pursued
recall
Our Options
We must only use consistent input vectors.
How can we tell an input vector is consistent?
(1) use TIME to mark consistent phases
  synchronous approach / global time base
  asynchronous / bounded delay
(2) use CODING to add information
  asynchronous / delay insensitive
recall
Terminology
consistent DW: all bits belong to the same context
valid signal: result of function applied to consistent DW
NULL Convention Logic
Add the value NULL to the alphabet
two-rail coding of signal X:
X.a  X.b  meaning
0    0    NULL (N)
0    1    TRUE (T)    („DATA“)
1    0    FALSE (F)   („DATA“)
1    1    illegal
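The two-rail coding can be sketched in a few lines of Python. This is only an illustration: the names `NCL`, `encode` and `decode` are hypothetical, and the rail assignment (X.a, X.b) follows the coding table of this lecture.

```python
from enum import Enum


class NCL(Enum):
    """NCL symbols with their (X.a, X.b) rail values, as in the coding table."""
    NULL = (0, 0)
    TRUE = (0, 1)
    FALSE = (1, 0)


def encode(value):
    """Map None/True/False to a rail pair (None stands for NULL)."""
    if value is None:
        return NCL.NULL.value
    return NCL.TRUE.value if value else NCL.FALSE.value


def decode(a, b):
    """Map a rail pair back to an NCL symbol; (1, 1) is the illegal code."""
    if (a, b) == (1, 1):
        raise ValueError("illegal code: both rails are 1")
    return NCL((a, b))
```

Note that a change between NULL and either DATA value toggles exactly one rail, which is what later makes the transition unambiguous.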
NCL Functions
naive approach: if any input is „N“ then output „N“

AND | T  F  N
T   | T  F  N
F   | F  F  N
N   | N  N  N

OR  | T  F  N
T   | T  T  N
F   | T  F  N
N   | N  N  N

NOT
T -> F
F -> T
N -> N
NCL Flow Control
NULL waves enframe DATA waves
[figure: bit values (NULL, TRUE, FALSE) over time t; NULL waves alternate with DATA waves, the consistent DATA word is marked]
Completion detection = check whether all bits are „DATA“ (completeness of DATA)
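Completion detection as described here amounts to a simple predicate over all rail pairs of a word. The sketch below assumes the two-rail coding NULL = (0, 0), TRUE = (0, 1), FALSE = (1, 0); the function names are hypothetical.

```python
DATA_CODES = {(0, 1), (1, 0)}  # TRUE and FALSE as (X.a, X.b)
NULL_CODE = (0, 0)


def complete_data(word):
    """True if every bit of the word carries DATA (completeness of DATA)."""
    return all(rails in DATA_CODES for rails in word)


def complete_null(word):
    """True if every bit of the word is NULL."""
    return all(rails == NULL_CODE for rails in word)


# A word in the middle of a wave front is neither complete DATA nor complete NULL.
in_transit = [(0, 1), (0, 0), (1, 0)]
print(complete_data(in_transit), complete_null(in_transit))
```

A sink would capture the word only once `complete_data` holds, which replaces the timed REQ of the bounded delay approach.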
Still Problems …
What about this situation?
[figure: output bits over time t with NULL and DATA waves; the word marked as consistent DATA still contains a bit of the previous word]
Fast bits may catch up with a slow bit from the previous word. The word containing the „old“ bit is considered consistent!
Solution Principle
Enforce „completeness of NULL“ as well:
The output must not go to NULL before all inputs have changed to NULL.
In a closed loop configuration this keeps the slow paths in synchrony with the fast ones.
We need a different truth table when the output is NULL.
Two Truth Tables
for DATA waves

AND | T  F  N
T   | T  F  N
F   | F  F  N
N   | N  N  N

for NULL waves

AND | T  F  N
T   | T  F  D
F   | F  F  D
N   | D  D  N

D … DATA (T or F)
must hold output in last valid state before new input is complete
need „hysteresis“
need to consider current output in truth table
Feedback Gate
[figure: AND gate (&) with inputs A, B and output Y; Y is fed back as an additional input]
Next output Y‘ as a function of current output Y and inputs A, B:

Y \ A,B | N,N  N,T  N,F  T,N  T,T  T,F  F,N  F,T  F,F
N       |  N    N    N    N    T    F    N    F    F
F       |  N    F    F    F    F    F    F    F    F
T       |  N    T    T    T    T    T    T    T    T

entries with Y ≠ Y‘ are unstable
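The behaviour of the AND gate with hysteresis can be sketched as a next-state function of the current output and the two inputs. This is an interpretation of the two truth tables, not the authors' implementation; the symbols are plain strings and `next_output` is a hypothetical name.

```python
N, T, F = "N", "T", "F"


def next_output(y, a, b):
    """AND with hysteresis: change only on complete DATA or complete NULL."""
    if a == N and b == N:
        return N                      # completeness of NULL: output may go to NULL
    if y == N and a != N and b != N:
        return T if (a == T and b == T) else F  # completeness of DATA: evaluate AND
    return y                          # otherwise hold the last output


# One DATA wave followed by one NULL wave, with input b arriving late.
y = N
trace = []
for a, b in [(T, N), (T, F), (N, F), (N, N)]:
    y = next_output(y, a, b)
    trace.append(y)
print(trace)
```

The third step is the interesting one: input `a` has already returned to NULL, but the output stays at DATA until the slow input `b` has followed.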
No more Problems …
Have we solved the problem?
[figure: output bits over time t with NULL and DATA waves; consistent DATA marked]
YES! The output now remains at DATA with the slowest bit, thus inhibiting (via the closed loop) the fast bits from conveying the next DATA wave.
NCL Gates
The desired hysteresis requires an NCL gate to hold its output until all inputs are DATA or all inputs are NULL
need storage capability (or feedback loop) even in combinational gate
[figure: NCL gate with two-rail inputs X1 (X1.a, X1.b) and X2 (X2.a, X2.b) and a memory cell (Mem) at each output rail Y.a, Y.b]
NCL Gate Implementation
figure shown for one output rail only
p- and n-stack not dual
memory cell at output
CMOS transistors only, but no standard cells
[figure: transistor-level NCL gate with inputs X1.a, X1.b, X2.a, X2.b and memory cell (Mem) at output rails Y.a, Y.b]
[G. Sobelman, K. Fant: CMOS Circuit Design of Threshold Gates with Hysteresis]
The Charm of NCL
self-regulating data flow
in a NULL initialized circuit a DATA front will propagate towards the output
alternating waves of NULL and DATA pace the data flow (which, in some sense, forms the „clock“)
based on direct assessment of validity & consistency
no delay assumptions necessary (ideally), no „worst case“, …
globally applicable solution
Validity and Consistency
Consistency (multiple bits @ input): all bits that are combined are valid and belong to the same context
Validity (single bit @ output): the bit is the stable result of a combination of consistent bits
Consistency implies validity (by definition) but NOT vice versa!
Val. & Consistcy. in NCL
Validity:
  output is changed only when consistent input is available („hold“ in truth table)
  coding ensures direct transition from one valid code to another (NULL is valid but spacer only)
  continuous validity
Consistency:
  NULL spacer between DATA waves allows identification of context
  synchronization of context by virtue of „completeness of NULL“ condition
What about sync. & BD?
Timing ensures that every data item is both valid and consistent at the time it is used:
  choice of clock period (sync)
  choice of delay values (BD)
In contrast to NCL, (temporary) invalidity of data is admitted.
No explicit measures (other than timing) are taken/necessary to cope with these issues.
recall
Softening the restrictions
synchronous model: known bounds for delays, global timing
bounded delay model (fundamental): known bounds for absolute delays, local timing
scalable-delay-insensitive model: bounds for relative deviation between delays known
quasi-delay-insensitive: output paths of a fork have same delay
delay insensitive: no restrictions on delays (just finite)
NCL: A Brief Summary
validity & consistency directly visible
no timing assumptions required (ideally)
„delay insensitive“ (ideally)
suitable for CMOS implementation
coding of one bit on two rails
2 memory cells per combinational output
efficiency: 50% of the data flow are unproductive NULL waves
patented and industrially used
recall
Our Options
We must only use consistent input vectors.
How can we tell an input vector is consistent?
(1) use TIME to mark consistent phases
  synchronous approach / global time base
  asynchronous / bounded delay
(2) use CODING to add information
  asynchronous / delay insensitive
Conditions for DI Coding
(C1) Identification of every context switch
It must be possible to clearly separate two successive data words under all circumstances
(C2) Unique context membership
The transition from one valid code word to the next must be unambiguous, i.e. no intermediate state may be a valid code
Conditions for DI coding
(C1) Identification of every context switch
It must be possible to clearly separate two successive data words under all circumstances
Example: 0,0,0 followed by 0,0,0 (?)
Conditions for DI coding
(C2) Unique context membership
The transition from one valid code word to the next must be unambiguous, i.e. no intermediate state may be a valid code
Example: transition from 0,0,0 to 1,1,1 via the intermediate states 1,0,0 and 1,0,1 (?)
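Condition (C2) can be checked mechanically by enumerating every state that may be visible while the bits of a word change in arbitrary order. The sketch below is an illustration under the assumption that each bit changes at most once per transition; the function names are hypothetical, and the two-rail example uses NULL = (0, 0), TRUE = (0, 1), FALSE = (1, 0).

```python
from itertools import combinations


def intermediate_states(src, dst):
    """All states strictly between src and dst when differing bits flip in any order."""
    diff = [i for i, (s, d) in enumerate(zip(src, dst)) if s != d]
    states = set()
    for k in range(1, len(diff)):
        for subset in combinations(diff, k):
            state = list(src)
            for i in subset:
                state[i] = dst[i]
            states.add(tuple(state))
    return states


def violates_c2(src, dst, is_valid):
    """True if some intermediate state is itself a valid code word."""
    return any(is_valid(s) for s in intermediate_states(src, dst))


def valid_two_rail(word):
    """Flattened two-rail word: valid if all bits are NULL or all bits are DATA."""
    pairs = [tuple(word[i:i + 2]) for i in range(0, len(word), 2)]
    all_null = all(p == (0, 0) for p in pairs)
    all_data = all(p in {(0, 1), (1, 0)} for p in pairs)
    return all_null or all_data


# Plain binary: every 3-bit pattern is a valid word, so 1,0,0 and 1,0,1 are ambiguous.
binary_bad = violates_c2((0, 0, 0), (1, 1, 1), lambda w: True)
# Two-rail: NULL word to the DATA word T,F,T never passes through a valid word.
two_rail_bad = violates_c2((0, 0, 0, 0, 0, 0), (0, 1, 1, 0, 0, 1), valid_two_rail)
print(binary_bad, two_rail_bad)
```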
What about NCL‘s Coding
(C1) Return to NULL forces separation between successive data waves
(C2) Coding scheme guarantees direct switch from one legal value to next (only one rail changes!)

X.a  X.b  value
0    0    NULL
0    1    TRUE
1    0    FALSE
1    1    illegal
Synchronization of Waves
A  B  Y
0  0  0
0  1  0
1  0  0
1  1  1

[figure: AND gate (&) with inputs A, B and output Y; output sequence N 0 N 0 N]
no glitch!
successive „0“s clearly separable
NCL vs. Trans. Signaling
[figure: state diagrams for a signal A under Transition Signaling and under NULL-Convention Logic, with transitions labeled A=0 and A=1]
More Efficient Coding?
NCL employs a 4-phase (RTZ) version of transition signaling.
The „return to zero“ is due to the NULL waves.
The NULL waves are unproductive and hence undesired.
Can we employ 2-phase (NRZ) transition signaling instead?
Four-State Logic (FSL)
Use 2 codes per logic value
two-rail coding: X is carried on the rails X.a and X.b
FSL Flow Control
Alternate code sets („phase“)
[figure: NCL versus FSL data flow over time t; consistent word in phase φ0]
Completion detection: Check whether all bits belong to the same phase
FSL AND-Gate: Truth Table
Output Y as a function of IN_1 and IN_2:

IN_2 \ IN_1 | l  h  L  H
l           | l  l  *  *
h           | l  h  *  *
L           | *  *  L  L
H           | *  *  L  H

* … hold last valid output
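The FSL AND truth table can be written as a small function that evaluates only when both inputs are in the same phase and otherwise holds. This sketch assumes that lower-case symbols (l, h) denote low/high in one phase and upper-case symbols (L, H) the same values in the other phase; that reading, like the names used, is an assumption.

```python
PHASE = {"l": 0, "h": 0, "L": 1, "H": 1}   # assumed: case encodes the phase
VALUE = {"l": 0, "h": 1, "L": 0, "H": 1}   # assumed: l/L = low, h/H = high
SYMBOL = {(0, 0): "l", (1, 0): "h", (0, 1): "L", (1, 1): "H"}  # (value, phase)


def fsl_and(in1, in2, last):
    """AND of two FSL symbols; 'last' is the held output for mixed phases (*)."""
    if PHASE[in1] != PHASE[in2]:
        return last
    return SYMBOL[(VALUE[in1] & VALUE[in2], PHASE[in1])]


def same_phase(word):
    """Completion detection: all bits of the word belong to the same phase."""
    return len({PHASE[s] for s in word}) == 1


print(fsl_and("h", "h", "l"), fsl_and("H", "l", "h"), fsl_and("L", "H", "h"))
```

Unlike NCL, no NULL wave is needed: the next data word simply arrives in the other phase.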
Four-State Logic (FSL)
An FSL gate holds its output until all inputs are in the same phase
need storage capability (or feedback loop) even in combinational gate
[figure: FSL gate with two-rail inputs X1 (X1.a, X1.b) and X2 (X2.a, X2.b) and a memory cell (Mem) at each output rail Y.a, Y.b]
FSL and Code Conditions
(C1) Phase change forces separation between successive data waves
(C2) Coding scheme guarantees direct switch from one legal value in one phase to legal value in next phase (only one rail changes!)
Synchronization of Waves
A  B  Y
0  0  0
0  1  0
1  0  0
1  1  1

[figure: AND gate (&) with inputs A, B and output Y; successive outputs in phases φ0 and φ1]
no glitch!
successive „0“s clearly separable
FSL: A Brief Summary
FSL retains all the charm of NCL
FSL provides double data throughput
implementation of 4-phase scheme tends to require more effort (remains to be investigated)
Evaluation of Properties
Bounded Delay & Delay Insensitive Asynchronous Design Methods
recall
Ideal Design Method
An ideal design method …
minimizes power consumption
minimizes circuit overhead
naturally supports composability
naturally aids testability
yields robust circuits
yields fast circuits.
! Area Efficiency BD
area proportion devoted to intended logic function:
$E_{area} = \frac{A_F}{A_F + A_{Ctrl}} \approx 100\,\%$ for $A_{Ctrl} \approx 0$ (handshake logic negligible)
! Area Efficiency NCL
overheads for
  flow control (Micropipeline)
  two-rail coding
  storage cells
sum up to about
  500% $A_F$ with standard cells
  100%...200% $A_F$ with custom cells (*)
$E_{area} = \frac{A_F}{A_F + A_{Ctrl}} \approx 50 \ldots 33\,\%$
* [Smith & Ligthart ASP-DAC 2001]
! Area Efficiency FSL
overheads similar to NCL
but NULL state tends to result in more convenient implementation than second phase
rough estimation:
  600% $A_F$ with standard cells
  150%...250% $A_F$ with custom cells
$E_{area} = \frac{A_F}{A_F + A_{Ctrl}} \approx 40 \ldots 29\,\%$
Area Efficiency - Comparison

        Area Efficiency
sync.   50%
BD      100%
NCL     50…33%
FSL     40…29%
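The percentages in this comparison follow directly from the definition of area efficiency as the share of area devoted to the intended function. A minimal sketch, where `overhead` is the control area expressed as a multiple of the function area (the function name is hypothetical):

```python
def area_efficiency(overhead):
    """E_area = A_F / (A_F + A_Ctrl), with overhead = A_Ctrl / A_F."""
    return 1.0 / (1.0 + overhead)


# Custom-cell overheads quoted for NCL (100%...200%) and FSL (150%...250%).
for name, (lo, hi) in {"NCL": (1.0, 2.0), "FSL": (1.5, 2.5)}.items():
    print(name, f"{area_efficiency(lo):.0%} ... {area_efficiency(hi):.0%}")
```

With the standard-cell overheads of 500% and 600% the same formula gives about 17% and 14%, which is why custom cells matter for the DI styles.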
Area Efficiency: Summary
Asynchronous circuits save the need for the clock network, but require (relatively little) area for handshaking.
In addition DI circuits cause substantial circuit overheads for coding and completion detection.
These overheads outweigh the savings for the clock tree, hence BD circuits promise the most area savings (the delay, however, is difficult to implement).
! Power Efficiency BD
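dissipated power (total) $P_{tot}$ versus power for intended function $P_F$, at circuit utilization $\alpha$ (dynamic share 90 %):
$P_{tot} = \underbrace{0.9\,\alpha\,(1 + 0.1)\,P_F}_{\text{dynamic part}} + \underbrace{0.1\,P_F}_{\text{static part}}$
$E_{pwr} = \frac{\alpha\,P_F}{P_{tot}} = \frac{1}{0.1/\alpha + 0.99}$
assumption: handshaking increases dynamic power by 10%
$E_{pwr} \approx 92\,\%$ ($\alpha = 100\,\%$), $\approx 50\,\%$ ($\alpha = 10\,\%$), $\approx 10\,\alpha$ ($\alpha \to 0$)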
! Power Efficiency NCL
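logic overhead (x 2.5)
coding overhead (2 trans/bit instead of 0.5 => 4x)
$P_{tot} = 0.9\,\alpha\,(1 + 0.1) \cdot 2.5 \cdot 4 \cdot P_F + 0.1 \cdot 2.5 \cdot P_F$ at circuit utilization $\alpha$ (dynamic share 90 %)
$E_{pwr} = \frac{\alpha\,P_F}{P_{tot}} = \frac{1}{0.25/\alpha + 9.9}$
assumption: handshaking plus completion detection increase dyn. pwr by 10%
$E_{pwr} \approx 10\,\%$ ($\alpha = 100\,\%$), $\approx 8\,\%$ ($\alpha = 10\,\%$), $\approx 4\,\alpha$ ($\alpha \to 0$)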
! Power Efficiency FSL
logic overhead (x 3)
coding overhead (1 trans/bit instead of 0.5 => 2x)
$P_{tot} = 0.9\,\alpha\,(1 + 0.1) \cdot 3 \cdot 2 \cdot P_F + 0.1 \cdot 3 \cdot P_F$ at circuit utilization $\alpha$ (dynamic share 90 %)
$E_{pwr} = \frac{\alpha\,P_F}{P_{tot}} = \frac{1}{0.3/\alpha + 5.94}$
assumption: handshaking plus completion detection increase dyn. pwr by 10%
$E_{pwr} \approx 16\,\%$ ($\alpha = 100\,\%$), $\approx 11\,\%$ ($\alpha = 10\,\%$), $\approx 3.3\,\alpha$ ($\alpha \to 0$)
Pwr Efficiency - Comparison
Power efficiency at circuit utilization α:

        α=100%   α=10%   α→0
sync.   53%      5.3%    0.53 α
BD      92%      50%     10 α
NCL     10%      8%      4 α
FSL     16%      11%     3.3 α
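The asynchronous rows of this comparison can be reproduced with a small model. The sketch below is a reading of the slide figures, not a formula quoted from the authors: it assumes a 90 % dynamic share scaled by utilization, a 10 % handshake surcharge on dynamic power, and the logic and coding overhead factors stated for NCL and FSL. All parameter names are hypothetical.

```python
def power_efficiency(alpha, logic=1.0, coding=1.0, dyn_share=0.9, handshake=0.1):
    """Useful power (alpha * P_F) divided by total power, with P_F normalized to 1."""
    dynamic = dyn_share * alpha * (1.0 + handshake) * logic * coding
    static = (1.0 - dyn_share) * logic
    return alpha / (dynamic + static)


STYLES = {
    "BD": dict(logic=1.0, coding=1.0),
    "NCL": dict(logic=2.5, coding=4.0),   # logic overhead 2.5, 4x transitions
    "FSL": dict(logic=3.0, coding=2.0),   # logic overhead 3, 2x transitions
}

for name, params in STYLES.items():
    full = power_efficiency(1.0, **params)
    low = power_efficiency(0.1, **params)
    print(f"{name}: {full:.0%} at full utilization, {low:.0%} at 10% utilization")
```

The model makes the trend visible: at low utilization the static part dominates, so the gap between BD and the DI styles narrows from about a factor of nine to a factor of five or six.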
Pwr Efficiency: Summary
Asynchronous circuits save the power consumed by the clock network, and require less power for handshaking.
The DI circuits‘ additional transitions plus their substantial circuit overheads increase energy consumption.
In summary the DI overheads outweigh the savings, hence BD methods are most effective for low-power applications.
! Perform. Efficiency BD
$E_{perf} = \frac{t_F}{t_F + \text{safety margin}} \approx 50\,\%$
[figure: breakdown of delay margins with the values 100, 50, 20, 30, 10, 20; Cortadella, ICCD’04]
! Perform. Efficiency NCL
no safety margins necessary
but additional delay $t_{dly}$ for:
  ACK path (feedback required!)
  completion detection
  additional circuit complexity
and NULL waves halve throughput
$E_{perf} = \frac{t_F}{2\,(t_F + t_{dly})} < 50\,\%$ (50 %)*
* [Cortadella, ICCD’04]
! Perform. Efficiency FSL
everything like in NCL but: double throughput
$E_{perf} = \frac{t_F}{t_F + t_{dly}} < 100\,\%$
Perf. Effic. - Comparison

        Perf. Efficiency
sync.   44%
BD      < 50%
NCL     < 50%
FSL     < 100%
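The NCL and FSL bounds differ only by the factor of two caused by the NULL waves. A minimal sketch, in which the function name and the example delay of 20 % of the function delay are purely illustrative assumptions:

```python
def perf_efficiency(t_f, t_dly, waves_per_datum=1):
    """E_perf = t_F / (waves * (t_F + t_dly)); NCL needs 2 waves, FSL needs 1."""
    return t_f / (waves_per_datum * (t_f + t_dly))


t_f, t_dly = 1.0, 0.2   # hypothetical delays, t_dly covers ACK path and completion detection
ncl = perf_efficiency(t_f, t_dly, waves_per_datum=2)
fsl = perf_efficiency(t_f, t_dly, waves_per_datum=1)
print(f"NCL {ncl:.0%}, FSL {fsl:.0%}")
```

Whatever value `t_dly` takes, NCL stays below 50 % and FSL below 100 %, with FSL always at twice the NCL figure.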
! Composability
BD is similar to the synchronous case: Delays must be adjusted just like the clock
DI circuits work under all conditions, interface spec can be reduced to the function (plus handshake protocol).
If a certain execution time/performance must be guaranteed, however, timing analysis is again necessary.
BUT: Even if the operation is too slow for any reason, the circuit will not fail to operate!
No metastability issues! Instead handshake required at all interfaces.
How DI is DI?
Basic cells are internally SDI (at best)
  Obligatory feedback
On module level DI is attainable, but
  Inevitable (?) fork makes ACK path „unsafe“
  Reg design critical
! Robustness
All async techniques:
  timing is distributed, clock no more single point of failure
DI only:
  robust timing due to closed-loop control
  graceful degradation in case of violation
  multi-rail coding of signals
  complexity of interacting control loops
  larger area
Syn versus FSL
[figure: fault injection results, syn versus FSL] [Thesis Rahbaran]
! Testability
Scan chain is an extremely powerful concept; hard to beat
Asynchronous circuits (including BD) are said to be much harder to test
Only isolated concepts and ad-hoc solutions available
Further Properties
„Correctness by design“ (DI only)
Beneficial EMR behavior
Conceptual elegance (DI only)
Readiness for future technologies (quantum, bio, optical, …)
Gain of Delay Insensitive
need to determine clock period
  circuit functionality is technology dependent
  considerable design efforts, large design loops
need to make worst-case assumptions
  necessarily pessimistic
  no robustness wrt. exceeding them
need to maintain global synchrony
  clock distribution problems
  power consumption problems
Current status
working 16-bit processor „ASPEAR“ (on FPGA platform) in FSL
working design flow based on Synopsys
formal investigation of delay insensitivity (model checking)
experimental comparison of robustness: SPEAR versus ASPEAR
Our visions
autonomous sensor node
  no crystal oscillator
  UART-like communication
  low power (by diverse means)
  high robustness (harsh environments)
develop „tailored“ lib cells / ASIC
delay insensitive memory
(exp.) comparison with other approaches
Conclusion for asyn:
benefits for
  low activation applications
  high robustness
  technologies with unknown timing
  largely varying operating conditions
not good for
  small feature sizes (static current!)
  low area
  real-time
  high speed (?)