Advanced Digital Design - Vienna University of Technology

Advanced Digital Design
Asynchronous EDA
by A. Steininger, J. Lechner and R. Najvirt
Vienna University of Technology
Overview
Synchronous-Asynchronous Direct Translation (SADT)
Null Convention Logic
Syntax Directed Compilation (Balsa)
Martin Synthesis (Caltech Asynchronous Synthesis Tools)
Lecture "Advanced Digital Design"
© A. Steininger & J. Lechner & R. Najvirt / TU Vienna
Synchronous-Asynchronous Direct Translation (SADT)
Starting point: synchronous circuit description in a standard HDL
Synthesis with conventional tools into sync. gate-level netlist
Transformation of synchronous netlist into asynchronous netlist
Technology mapping
Place and Route
Timing Verification
De-synchronization
SADT approach
Design style: Bundled data
Substitution of flip-flops by latches
Substitution of clock by local asynchronous controllers
De-synchronized circuits ...
  never halt (liveness)
  perform same computations as synchronous circuit (flow-equivalence)
De-synchronization
Conversion steps
1. Conversion of flip-flops to latches
   D-FF separated into master/slave latches
2. Generation of delay elements for request signals
   matched to length of critical path of combinational logic
3. Implementation and wiring of asynchronous latch controllers
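The second conversion step sizes each request delay against the critical path of the combinational logic it accompanies. A minimal sketch of that sizing, assuming the logic cloud is given as an acyclic gate graph with per-gate delays; the function names, the `margin` parameter and the example numbers are hypothetical and not taken from the lecture:

```python
def critical_path_delay(gate_delay, fanin):
    """Return the longest accumulated delay through an acyclic gate network.

    gate_delay: dict mapping gate name -> propagation delay (hypothetical units)
    fanin:      dict mapping gate name -> list of gates driving its inputs
    """
    arrival = {}  # memoised latest arrival time at each gate output

    def arrive(gate):
        if gate not in arrival:
            latest_input = max((arrive(p) for p in fanin.get(gate, [])), default=0.0)
            arrival[gate] = latest_input + gate_delay[gate]
        return arrival[gate]

    return max(arrive(g) for g in gate_delay)


def matched_delay(gate_delay, fanin, margin=0.1):
    """Delay for the request line: critical path plus an assumed safety margin."""
    return critical_path_delay(gate_delay, fanin) * (1.0 + margin)


# Hypothetical three-gate logic cloud between two latch stages.
delays = {"g1": 1.0, "g2": 2.0, "g3": 1.5}
drivers = {"g2": ["g1"], "g3": ["g1", "g2"]}
print(critical_path_delay(delays, drivers))  # 4.5 (path g1 -> g2 -> g3)
```

In a real flow the delays would come from static timing analysis of the mapped netlist; the margin here only stands in for whatever guard band the designer chooses.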
De-synchronization
Circuit Architecture
Synchronous circuit
De-synchronized circuit
[Cortadella et al., 06]
De-synchronization
Asynchronous Controllers
Controller for master/slave latches
Different controller implementations with more or less concurrency possible
4-phase protocol
Non-overlapping
Semi-decoupled 4-phase
Fully-decoupled 4-phase
De-synchronization control
More concurrency => fast pipeline
More concurrency => larger controllers
De-synchronization
Flow Equivalence
Definition: Two circuits are flow-equivalent if they ...
  have the same set of latches
  store, for each latch, the same sequence of values in both circuits
[Cortadella et al., 06]
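The definition compares only the order of values stored per latch, not when they are stored. A minimal sketch of such a check, assuming each circuit run is recorded as a list of `(time, latch, value)` events; this trace format and the function names are assumptions made for illustration:

```python
from collections import defaultdict


def stored_sequences(events):
    """Project a timed event trace onto per-latch value sequences (time is dropped)."""
    sequences = defaultdict(list)
    for _time, latch, value in sorted(events, key=lambda e: e[0]):
        sequences[latch].append(value)
    return dict(sequences)


def flow_equivalent(events_a, events_b):
    """True if both traces have the same latches and the same value sequence per latch."""
    return stored_sequences(events_a) == stored_sequences(events_b)


# Synchronous run: both latches update together on each clock edge.
sync_trace = [(0, "L1", 3), (0, "L2", 7), (10, "L1", 4), (10, "L2", 8)]
# De-synchronized run: same values, but each latch updates at its own time.
desync_trace = [(0, "L1", 3), (4, "L2", 7), (6, "L1", 4), (13, "L2", 8)]
print(flow_equivalent(sync_trace, desync_trace))  # True
```

The two example traces differ in timing and in the interleaving between latches, yet they compare equal because each latch sees the same value sequence.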
De-synchronization
Pros/Cons
Advantages
  Use of standard HDLs
  Use of industrial-strength synthesis tools
  Almost no re-education for hardware designers necessary
  Simple porting of legacy designs
  Negligible area overhead compared to synchronous implementation
Disadvantages
  1-to-1 mapping of sync. circuits can lead to sub-optimal designs
Click Elements
Published as an implementation style for data-driven compilation (Haste)
Also useful for implementing asynchronous equivalents of synchronous circuits
Uses flip-flops for storage
Most elements implementable with cells from a standard (sync) library
Arbiter still required (not for SADT)
Null Convention Logic
Synthesis
RTL Synthesis
  Transform VHDL/Verilog to 3NCL netlist
  Off-the-shelf synthesis tools
  Netlist contains just AND & INV gates
  NULL values are treated as “don’t care”
  Logic optimizations
Dual-rail expansion
  3NCL netlist to 2NCL netlist
  DIMS implementation of AND & INV gates
  Produces a delay-insensitive circuit
  Logic optimizations
Dual Rail NAND
DIMS implementation
[Ligthart et al., 2000]
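As a behavioural illustration of a DIMS dual-rail NAND, the sketch below uses one C-element per input minterm and ORs the minterms onto the output rails. It is a minimal model under stated assumptions: signals are `(true_rail, false_rail)` pairs with `(0, 0)` as NULL, and the class and constant names are hypothetical rather than taken from the cited figure.

```python
NULL, ZERO, ONE = (0, 0), (0, 1), (1, 0)  # (true_rail, false_rail) encodings


class CElement:
    """Muller C-element: output follows the inputs when they agree, else holds."""

    def __init__(self):
        self.out = 0

    def update(self, a, b):
        if a and b:
            self.out = 1
        elif not a and not b:
            self.out = 0
        return self.out


class DimsNand:
    """Dual-rail NAND built from four minterm C-elements and OR gates."""

    def __init__(self):
        self.minterms = {(x, y): CElement() for x in (0, 1) for y in (0, 1)}

    def update(self, a, b):
        m = {}
        for (x, y), c_element in self.minterms.items():
            rail_a = a[0] if x else a[1]  # true rail for x=1, false rail for x=0
            rail_b = b[0] if y else b[1]
            m[(x, y)] = c_element.update(rail_a, rail_b)
        out_true = m[(0, 0)] | m[(0, 1)] | m[(1, 0)]  # NAND is 1 unless both inputs are 1
        out_false = m[(1, 1)]
        return (out_true, out_false)


gate = DimsNand()
print(gate.update(ONE, NULL))   # (0, 0): output stays NULL until both inputs are valid
print(gate.update(ONE, ONE))    # (0, 1): NAND(1, 1) = 0
print(gate.update(NULL, ONE))   # (0, 1): output holds until all inputs return to NULL
print(gate.update(NULL, NULL))  # (0, 0)
```

The hold behaviour on both edges is what makes the gate's output a completion indication for its inputs.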
Null Convention Logic
Technology Mapping
DIMS implementation inefficient
Technology mapping on threshold gates
  Circuit functionality fully described by set function of DIMS implementation
  DIMS smoothing: Derive boolean network representing set function
  Threshold gates have specific set function
  Perform logic optimization and map boolean network to available threshold gates
Dual Rail NAND
DIMS implementation
Set function
[Ligthart et al., 2000]
Null Convention Logic
Threshold Gates

Library of threshold gates by Theseus

all unate functions with up to 4 inputs
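A threshold gate with hysteresis can be modelled in a few lines. The sketch below covers only the plain m-of-n case (often written THmn): the output is set once at least m inputs are 1 and is reset only when all inputs are 0. It is a simplified illustration; weighted gates and the actual cell set of the Theseus library are not modelled, and the class name is hypothetical.

```python
class ThresholdGate:
    """m-of-n threshold gate with hysteresis (behavioural model only)."""

    def __init__(self, threshold, n_inputs):
        self.threshold = threshold
        self.n_inputs = n_inputs
        self.out = 0

    def update(self, inputs):
        if len(inputs) != self.n_inputs:
            raise ValueError("wrong number of inputs")
        ones = sum(inputs)
        if ones >= self.threshold:  # set function satisfied
            self.out = 1
        elif ones == 0:             # all inputs back at 0: reset
            self.out = 0
        return self.out             # otherwise hold the previous value


th23 = ThresholdGate(threshold=2, n_inputs=3)
print(th23.update([1, 0, 0]))  # 0: below threshold
print(th23.update([1, 1, 0]))  # 1: threshold reached
print(th23.update([1, 0, 0]))  # 1: holds until all inputs are 0
print(th23.update([0, 0, 0]))  # 0
```

With `threshold == n_inputs` the model behaves like an n-input C-element, and with `threshold == 1` the set condition is a plain OR of the inputs.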
Syntax-Directed Compilation
1-to-1 mapping of language constructs to handshake circuit components
Uses a library of highly optimized standard cell components for simpler physical synthesis and verification
Allows experienced designer to easily envision the resulting circuit but limits optimization potential
Balsa
Handshake Circuits
Approx. 40 handshake components
Connected over channels
  Data path associated
  Pure control channels (no data transferred)
  Active ports initiate communication
  Passive ports respond to request
Push channel
  Data flow from active to passive port
Pull channel
  Data flow from passive to active port
Example: Handshake Components
Fetch ()
  Transfers data upon request
Case (@)
  Conditional control flow element
Source: [Balsa Manual]
Example: Modulo-10 Counter
import [balsa.types.basic]
type C_size is nibble
constant max_count = 9
procedure count10(sync aclk; output count: C_size) is
  variable count_reg : C_size
  variable tmp : C_size
begin
  loop
    sync aclk;
    if count_reg /= max_count then
      tmp := (count_reg + 1 as C_size)
    else
      tmp := 0
    end || count <- count_reg ;
    count_reg := tmp
  end -- loop
end -- begin
Source: [Balsa Manual]
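For readers unfamiliar with Balsa, the behaviour of the count10 procedure can be mirrored in a short sequential model. This is only a sketch: the `sync aclk` handshake is reduced to one loop iteration, the parallel composition is serialised, and the start value of `count_reg` is an assumption (the Balsa listing does not initialise it).

```python
MAX_COUNT = 9


def count10(ticks, start=0):
    """Return the values sent on `count` over `ticks` handshakes on aclk."""
    count_reg = start          # assumed initial value, not given in the Balsa source
    outputs = []
    for _ in range(ticks):     # one iteration per `sync aclk`
        # The next value and the output both read the old count_reg,
        # which is why the Balsa code can run them in parallel (||).
        tmp = count_reg + 1 if count_reg != MAX_COUNT else 0
        outputs.append(count_reg)
        count_reg = tmp        # sequenced after both parallel branches
    return outputs


print(count10(12))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1]
```

The temporary `tmp` plays the same role as in the Balsa code: it keeps the write to `count_reg` from racing with the read that feeds the output channel.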
Martin synthesis
The so-called Martin synthesis process is seminal work of the async group around A. J. Martin at Caltech
Design entry is CHP, result is PRS
Performs several transformations with designer-modifiable intermediate steps
Process Decomposition
First transformation
Reduces processes with complex control structures to simple concurrent subprocesses
Either syntax-directed (SDD) or data-driven (DDD)
Syntax Directed Decomposition
Rule: A process P with construct S can be replaced with processes P1, P2 and a new channel C by replacing S with the communication C and creating P2 of the form *[[#C -> S; C]]
E.g. P: *[A; *[B1 -> S1 [] B2 -> S2]; B]
  P1: *[A; C; B]
  P2: *[[#C & B1 -> S1
       [] #C & B2 -> S2
       [] #C & ~B1 & ~B2 -> C]]
Data Driven Decomposition
More fine-grained than SDD
At the end, clustering can be performed to merge subprocesses again for better performance
First transformation to dynamic single assignment (DSA) form:
  Each variable can be written only once in each main loop iteration, e.g.:
  *[A?a; X!a; B?a; Y!a]
  *[A?a1; X!a1; B?a2; Y!a2]
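The renaming in the DSA example can be sketched as a small pass over the loop body. The sketch assumes a straight-line body consisting only of receives (`chan?var`) and sends (`chan!var`) given as strings; it is an illustration of the idea, not the algorithm of any actual tool, and it does not handle assignments, guards, or a send that precedes the first write of its variable.

```python
def to_dsa(actions):
    """Rename variables so that each one is written once per loop iteration."""
    version = {}  # variable name -> number of writes seen so far
    result = []
    for action in actions:
        if "?" in action:                      # receive: a new write to the variable
            channel, var = action.split("?")
            version[var] = version.get(var, 0) + 1
            result.append(f"{channel}?{var}{version[var]}")
        else:                                  # send: reads the most recent version
            channel, var = action.split("!")
            result.append(f"{channel}!{var}{version[var]}")
    return result


body = ["A?a", "X!a", "B?a", "Y!a"]
print("; ".join(to_dsa(body)))  # A?a1; X!a1; B?a2; Y!a2
```

After renaming, the two uses of `a` no longer share a variable, which is what later allows the loop to be split into independent subprocesses.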
Data Driven Decomposition (2)
Second transformation is projection
First, transformations to allow projection, e.g. variable duplication and channel addition:
  *[A?a; x := a, y := ~a; X!x, Y!y]
  *[A?a; a1 := a, a2 := a; x := a1, y := ~a2; X!x, Y!y]
  *[A?a; {Ax!a, Ax?a1}, {Ay!a, Ay?a2}; x := a1, y := ~a2; X!x, Y!y]
Then projection to some sets of assignments
  Sets: {A?, a, Ax!, Ay!} {Ax?, a1, x, X!} {Ay?, a2, y, Y!}
  Projection: *[A?a; Ax!a, Ay!a], *[Ax?a1; x := a1; X!x], *[Ay?a2; y := ~a2; Y!y]
Handshake Expansion (HSE)
Each communication channel is replaced by handshake signals, e.g.:
  *[…; C; …], *[#C -> …; C]
is transformed to (4-phase handshake)
  *[…; r := 1; [a]; r := 0; [~a]; …],
  *[r -> …; a := 1; [~r]; a := 0]
Reshuffling can then be used to increase concurrency/performance (different handshake controllers)
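The 4-phase expansion above can be traced with a small cooperative simulation. In this sketch each process is a Python generator that performs its assignments and then yields the condition it waits for, mirroring the `[...]` wait of the HSE notation. The scheduler, the signal dictionary and the event log are assumptions made for illustration only.

```python
def active(sig, log):
    """Active side: r := 1; [a]; r := 0; [~a]"""
    while True:
        sig["r"] = 1
        log.append("r+")
        yield lambda: sig["a"] == 1
        sig["r"] = 0
        log.append("r-")
        yield lambda: sig["a"] == 0


def passive(sig, log):
    """Passive side: [r]; a := 1; [~r]; a := 0"""
    while True:
        yield lambda: sig["r"] == 1
        sig["a"] = 1
        log.append("a+")
        yield lambda: sig["r"] == 0
        sig["a"] = 0
        log.append("a-")


def simulate(cycles):
    """Run both processes until `cycles` complete handshakes are logged."""
    sig, log = {"r": 0, "a": 0}, []
    procs = [active(sig, log), passive(sig, log)]
    waits = [next(p) for p in procs]       # run each process up to its first wait
    while len(log) < 4 * cycles:
        for i, proc in enumerate(procs):
            if waits[i]():                 # wait condition satisfied: resume process
                waits[i] = next(proc)
    return log[:4 * cycles]


print(simulate(1))  # ['r+', 'a+', 'r-', 'a-']
```

Reshuffling corresponds to moving other actions of a process between these four events, which changes how much of the neighbouring handshakes can overlap.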
Production Rule Expansion (PRE)
Transforms HSE to PR in three steps:
  State variable insertion
  PR generation
  Symmetrisation
Sequencing must be implemented explicitly
*[[Lr]; Rr := 1; [Ra]; Rr := 0; [~Ra]; La := 1; [~Lr]; La := 0]
Lr  -> Rr+
Ra  -> Rr-
~Ra -> La+
~Lr -> La-
Production Rule Expansion (PRE)
State variable insertion, PR generation
*[[Lr]; Rr := 1; [Ra]; Rr := 0; [~Ra]; La := 1; [~Lr]; La := 0]
*[[Lr]; Rr := 1; [Ra]; x := 1; [x]; Rr := 0; [~Ra]; La := 1; [~Lr]; x := 0; [~x]; La := 0]
~x & Lr -> Rr+
Ra      -> x+
x       -> Rr-
x & ~Ra -> La+
~Lr     -> x-
~x      -> La-
Production Rule Expansion (PRE)
Symmetrisation
*[[Lr]; Rr := 1; [Ra]; Rr := 0; [~Ra]; La := 1; [~Lr]; La := 0]
*[[Lr]; Rr := 1; [Ra]; x := 1; [x]; Rr := 0; [~Ra]; La := 1; [~Lr]; x := 0; [~x]; La := 0]
~x & Lr -> Rr+
Ra      -> x+
~Lr | x -> Rr-
x & ~Ra -> La+
~Lr     -> x-
Ra | ~x -> La-
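One way to check that the symmetrised production rules still implement the HSE with state variable x is to execute them against a simple environment. The sketch below does this under assumptions that are not part of the lecture: all signals start at 0, the left environment is a 4-phase active partner (`~La -> Lr+`, `La -> Lr-`), the right environment is a 4-phase passive partner (`Rr -> Ra+`, `~Rr -> Ra-`), and the simulator fires the first rule in list order whose guard holds and whose firing changes a signal.

```python
# Each rule is (guard over the state dict, target signal, new value).
CIRCUIT = [
    (lambda s: not s["x"] and s["Lr"], "Rr", 1),   # ~x & Lr -> Rr+
    (lambda s: s["Ra"], "x", 1),                   # Ra      -> x+
    (lambda s: not s["Lr"] or s["x"], "Rr", 0),    # ~Lr | x -> Rr-
    (lambda s: s["x"] and not s["Ra"], "La", 1),   # x & ~Ra -> La+
    (lambda s: not s["Lr"], "x", 0),               # ~Lr     -> x-
    (lambda s: s["Ra"] or not s["x"], "La", 0),    # Ra | ~x -> La-
]

ENVIRONMENT = [  # assumed 4-phase partners on the L and R channels
    (lambda s: not s["La"], "Lr", 1),
    (lambda s: s["La"], "Lr", 0),
    (lambda s: s["Rr"], "Ra", 1),
    (lambda s: not s["Rr"], "Ra", 0),
]


def run(rules, steps):
    """Fire effective production rules one at a time and return the event trace."""
    state = dict.fromkeys(["Lr", "La", "Rr", "Ra", "x"], 0)
    trace = []
    for _ in range(steps):
        effective = [(sig, val) for guard, sig, val in rules
                     if guard(state) and state[sig] != val]
        if not effective:
            break                       # no rule can change anything: deadlock
        sig, val = effective[0]
        state[sig] = val
        trace.append(sig + ("+" if val else "-"))
    return trace


print(run(CIRCUIT + ENVIRONMENT, 10))
# ['Lr+', 'Rr+', 'Ra+', 'x+', 'Rr-', 'Ra-', 'La+', 'Lr-', 'x-', 'La-']
```

The circuit's own transitions in this trace (Rr+, x+, Rr-, La+, x-, La-) appear in the same order as the assignments in the HSE with x inserted, interleaved with the environment's responses.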
Summary
Synchronous-Asynchronous Direct Translation
  Synthesis with standard tools
  Synchronous-Asynchronous transformation
Martin Synthesis
  Process decomposition
  Handshake expansion
  Production rule expansion
References
Jordi Cortadella, Alex Kondratyev, Luciano Lavagno, Christos P. Sotiriou. Desynchronization: Synthesis of Asynchronous Circuits From Synchronous Specifications. 2006
Alain J. Martin. Programming in VLSI: From Communicating Processes to Self-timed VLSI Circuits. 1987
Catherine G. Wong and Alain J. Martin. High-Level Synthesis of Asynchronous Systems by Data-Driven Decomposition. 2003
Ad Peeters, Frank te Beest, Mark de Wit, Willem Mallon. Click Elements – An Implementation Style for Data-Driven Compilation. 2010