Introduction to Silicon Programming in the Tangram/Haste language

Material adapted from lectures by:
Prof.dr.ir. Kees van Berkel
[Dr. Johan Lukkien]
[Dr.ir. Ad Peeters]
at the Technical University of Eindhoven, the Netherlands
TU/e
Handshake signaling and data
• push channel: the active side sends request ar together with data ad; the passive side answers with acknowledge ak.
• pull channel: the active side sends request ar; the passive side answers with acknowledge ak together with data ad.
Philips Research, Kees van Berkel, Ad Peeters, 2002-09-10
Handshake signaling: push channel
[Timing diagram: request ar and acknowledge ak over time, with the validity intervals of data ad for early, broad, and late data validity.]
Data bundling
To maintain event ordering at both sides of a channel, the
circuit must satisfy a data bundling constraint:
• for push channel: delay along request wire must exceed delay of
data wire;
• for pull channel: delay along acknowledge wire must exceed
delay of data wire.
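The two constraints above can be written as a small check. This is only a sketch in Python: the function names `control_wire` and `bundling_ok` and the use of plain numeric delays are assumptions made for illustration, not part of the Tangram tools.

```python
def control_wire(channel: str) -> str:
    """Name the control wire that travels in the same direction as the data."""
    wires = {"push": "request", "pull": "acknowledge"}
    if channel not in wires:
        raise ValueError("channel must be 'push' or 'pull'")
    return wires[channel]


def bundling_ok(channel: str, control_delay: float, data_delay: float) -> bool:
    """Return True when the bundling constraint holds for this channel.

    control_delay is the delay of the request wire (push channel) or of the
    acknowledge wire (pull channel); it must exceed the data wire delay.
    """
    control_wire(channel)  # validates the channel kind
    return control_delay > data_delay


if __name__ == "__main__":
    for kind, ctrl, data in [("push", 1.2, 0.9), ("pull", 0.5, 0.9)]:
        verdict = "ok" if bundling_ok(kind, ctrl, data) else "violated"
        print(f"{kind}: {control_wire(kind)} delay {ctrl} vs data delay {data}: {verdict}")
```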
Handshake signaling: pull channel
[Timing diagram: request ar and acknowledge ak over time, with the validity intervals of data ad for early, broad, and late data validity.]
When data wires are invalid, multiple and incomplete transitions are allowed.
Tangram assignment x:= f(y,z)

[Figure: handshake circuit for x := f(y,z). Labeled components and ports: y, z, x, f, |, yw, zw, xw0, xw1, xr.]
Four-phase data transfer
[Timing diagram: signals r / br, ba / cr, ca / a, and data bd / cd over time for a transfer between channels b and c, with five numbered phases.]
Handshake latch
[ [ w ; [w : rd:= wd]
[] r ; r
]]
• 1-bit handshake latch:
wd ∧ wr → rd↑
¬wd ∧ wr → rd↓
wk = wr
rk = rr
[Figure: latch x with write channel w (wires wr, wd) and read channel r (wire rd).]
N-bit handshake latch
[Figure: N 1-bit latches sharing wr and rr, with data wires wd1 ... wdN and rd1 ... rdN, and acknowledges wk and rk.]
Area, delay, energy:
• area: 2(N+1) gate equivalents
• delay per cycle: 4 gate delays
• energy per write cycle: 4 + 0.5*2N transitions, on average
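The three estimates for the N-bit handshake latch are easy to tabulate. The sketch below simply evaluates the formulas from the slide in Python; the function name `latch_costs` is an assumption for illustration.

```python
def latch_costs(n_bits: int) -> tuple[int, int, float]:
    """Evaluate the slide's cost estimates for an N-bit handshake latch.

    Returns (area in gate equivalents, delay per cycle in gate delays,
    average number of transitions per write cycle).
    """
    if n_bits < 1:
        raise ValueError("n_bits must be at least 1")
    area = 2 * (n_bits + 1)
    delay = 4
    energy = 4 + 0.5 * 2 * n_bits
    return area, delay, energy


if __name__ == "__main__":
    for n in (1, 8, 32):
        print(n, latch_costs(n))
```

For a byte-wide latch (N = 8) this gives 18 gate equivalents, 4 gate delays, and 12 transitions per write cycle on average.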
Transferrer
[ [ a : (b ; c)]
; [ a : (b ; cd:= bd ; c ; cd:= )]
]
[Figure: transferrer with channels a (wires ar, ak), b (wires br, bk, bd), and c (wires cr, ck, cd).]
Multiplexer
[ [ a : c ; a : (cd:= ad; c ; cd:= )
[] b : c ; b : (cd:= bd; c ; cd:= )
]]
Restriction:
¬(ar ∧ br) must hold at all times: the requests on a and b are mutually exclusive.
[Figure: multiplexer (|) merging channels a and b into channel c.]
Multiplexer realization
[Figure: multiplexer realization, split into a control circuit and a data circuit.]
Logic/arithmetic operator
[ [ a : (b || c) ]
; [ a : ((b || c) ; ad:= f(bd , cd ))]
]
[Figure: operator component f with channels a, b, and c.]
Cheaper realization (delay sensitive):
[ [ a : (b || c) ]
; [ a : ((b || c) ; ad:= f(bd , cd ))]
; “delay” ; ad:= 
]
A one-place fifo buffer
byte = type [0..255]
& BUF1 = main proc
(a?chan byte & b!chan byte).
begin
x: var byte
| forever do a?x ; b!x od
end
[Figure: BUF1 with input channel a and output channel b.]
[Figure: handshake circuit of BUF1, with a sequencer (;) and variable x between channels a and b.]
2-place buffer
[Figure: two BUF1 cells in series, connecting channel a through internal channel b to channel c.]
byte = type [0..255]
& BUF1 = proc (a?chan byte & b!chan byte).
begin x: var byte | forever do a?x ; b!x od end
& BUF2: main proc (a?chan byte & c!chan byte).
begin b: chan byte | BUF1(a,b) || BUF1(b,c) end
Two-place ripple buffer
[Figure: two-place ripple buffer.]
Two-place wagging buffer

byte = type [0..255]
& wag2: main proc
(a?chan byte & b!chan byte).
begin x,y: var byte
| a?x
; forever do
(a?y || b!x)
; (a?x || b!y)
od
end
[Figure: wag2 with input channel a and output channel b.]
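A behavioral model can help confirm that the wagging buffer still delivers values in FIFO order. The sketch below is Python, not Tangram: it replaces each parallel pair (a?y || b!x) by a read followed by a write, and it drains the last stored value when the input runs out, which is a modeling assumption because the Tangram loop never terminates.

```python
_END = object()  # sentinel marking an exhausted input stream


def wag2(inputs):
    """Yield the output stream of a two-place wagging buffer model."""
    it = iter(inputs)
    x = next(it, _END)          # a?x
    if x is _END:
        return
    while True:
        y = next(it, _END)      # a?y, in parallel with b!x in the original
        yield x                 # b!x
        if y is _END:
            return
        x = next(it, _END)      # a?x, in parallel with b!y in the original
        yield y                 # b!y
        if x is _END:
            return


if __name__ == "__main__":
    print(list(wag2([10, 20, 30, 40, 50])))
```

The variables x and y take turns holding the incoming value, so no value is ever copied from one variable to the other, unlike in the ripple buffer.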
Two-place ripple register
…
begin
x0, x1: var byte
| forever do b!x1 ; x1:=x0; a?x0 od
end
4-place ripple register
byte = type [0..255]
& rip4: main proc (a?chan byte & b!chan byte).
begin
x0, x1, x2, x3: var byte
| forever do b!x3 ; x3:=x2 ; x2:=x1 ; x1:=x0 ; a?x0 od
end
4-place ripple register
[Figure: chain of variables x0, x1, x2, x3.]
• area: N (Avar + Aseq)
• cycle time: Tc = (N+1) T:=
• cycle energy: Ec = N E:=
Introducing vacancies
…
begin
x0, x1, x2, x3, v: var byte
| forever do
(b!x3 ; x3:=x2 ; x2:=v) || (v:=x1 ; x1:=x0 ; a?x0) od
end
• what is wrong?
Introducing vacancies
forever do
((b!x3 ; x3:=x2) || (v:=x1 ; x1:=x0 ; a?x0))
; x2:=v
od
or:
forever do
((b!x3 ; x3:=x2) || (v:=x1 ; x1:=x0))
; (x2:=v || a?x0)
od
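The slides leave "what is wrong?" as a question. One plausible reading, consistent with both corrected versions, is that the first attempt reads v (in x2:=v) in one parallel branch while the other branch writes it (in v:=x1). The Python sketch below checks parallel branches for such interference from hand-written read and write sets; the helper name `conflicts` and the set encoding are assumptions for illustration.

```python
def conflicts(left: dict, right: dict) -> set:
    """Return variables written by one branch and read or written by the other."""
    left_access = left["reads"] | left["writes"]
    right_access = right["reads"] | right["writes"]
    return (left["writes"] & right_access) | (right["writes"] & left_access)


# First attempt: (b!x3 ; x3:=x2 ; x2:=v) || (v:=x1 ; x1:=x0 ; a?x0)
first_left = {"reads": {"x3", "x2", "v"}, "writes": {"x3", "x2"}}
first_right = {"reads": {"x1", "x0"}, "writes": {"v", "x1", "x0"}}

# Corrected: (b!x3 ; x3:=x2) || (v:=x1 ; x1:=x0 ; a?x0), followed by x2:=v
fixed_left = {"reads": {"x3", "x2"}, "writes": {"x3"}}
fixed_right = {"reads": {"x1", "x0"}, "writes": {"v", "x1", "x0"}}

if __name__ == "__main__":
    print("first attempt:", conflicts(first_left, first_right))
    print("corrected:", conflicts(fixed_left, fixed_right))
```

In both corrected versions x2:=v runs only after the parallel part has finished, so v is stable when it is read.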
“synchronous” 4-p ripple register
[Figure: master variables m0, m1, m2, m3 interleaved with slave variables s0, s1, s2, ending in output channel b.]
forever do
(s0:=m0 || s1:=m1 || s2:=m2 || b!m3 )
; ( a?m0 || m1:=s0 || m2:=s1 || m3:=s2)
od
4-place wagging register
[Figure: two 2-place ripple registers, x0, x1 and y0, y1, alternately connected to input a and output b.]
forever do
b!x1 ; x1:=x0 ; a?x0
; b!y1 ; y1:=y0 ; a?y0
od
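To see that the wagging register behaves like the 4-place ripple register, both can be modeled as plain Python loops and compared on the same input. This is a sketch under assumptions: Tangram variables have no defined initial value, so the model starts every variable at an arbitrary `init`, and each loop iteration stands for one output/input cycle.

```python
def rip4(inputs, init=0):
    """Model: forever do b!x3 ; x3:=x2 ; x2:=x1 ; x1:=x0 ; a?x0 od"""
    x0 = x1 = x2 = x3 = init
    out = []
    for value in inputs:
        out.append(x3)               # b!x3
        x3, x2, x1 = x2, x1, x0      # the three shifts, in program order
        x0 = value                   # a?x0
    return out


def wag4(inputs, init=0):
    """Model: forever do b!x1 ; x1:=x0 ; a?x0 ; b!y1 ; y1:=y0 ; a?y0 od"""
    x0 = x1 = y0 = y1 = init
    out = []
    for k, value in enumerate(inputs):
        if k % 2 == 0:               # first half of the loop body
            out.append(x1)
            x1, x0 = x0, value
        else:                        # second half of the loop body
            out.append(y1)
            y1, y0 = y0, value
    return out


if __name__ == "__main__":
    data = [1, 2, 3, 4, 5, 6]
    print(rip4(data))
    print(wag4(data))
```

Both models delay every input by four cycles, but the wagging version performs one assignment per cycle instead of three, which matches the lower cycle time and energy reported for wagging registers.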
8-place register
4-way wagging
forever do
b!u1 ; u1:=u0 ; a?u0
; b!v1 ; v1:=v0 ; a?v0
; b!x1 ; x1:=x0 ; a?x0
; b!y1 ; y1:=y0 ; a?y0
od
Four 8×8 shift registers compared
type                 area         cycle time    energy/message
                     [gate eq.]   [nanosec.]    [nanojoule]
linear               167          43            0.75
pseudo synchronous   264          23            1.46
4-way wagging        238          26            0.29
wagging              201          34            0.48
Tangram/Haste
• Purpose: programming language for asynchronous VLSI
circuits.
• Creator: Tangram team @ Philips Research Labs (proto-Tangram 1986; release 2 in 1998).
• Inspiration: Hoare’s CSP, Dijkstra’s GCL.
• Lectures: no formal introduction; manual hand-out (learn by
example, learn by doing).
• Main tools: compiler, analyzer, simulator, viewer.
Median filter
median: main proc (a? chan W & b! chan W).
begin x,y,z: var W
& xy, yz, zx: var bool
| forever do
((z:=y; y:=x) || yz:=xy) ; a?x
; (xy:= x<=y || zx:= z<=x)
; if zx=xy then b!x
or xy=yz then b!y
or yz=zx then b!z
fi
od
end
[Figure: Median with input channel a and output channel b.]
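The selection rule of the median filter uses three comparison results and picks the value whose two comparisons agree. The Python sketch below restates only that rule (not the shifting of the window) so it can be checked exhaustively; the function name `median3` is an assumption for illustration.

```python
def median3(x, y, z):
    """Return the median of three values using the slide's boolean test."""
    xy = x <= y
    yz = y <= z
    zx = z <= x
    if zx == xy:      # z <= x <= y, or y < x < z
        return x
    if xy == yz:      # x <= y <= z, or z < y < x
        return y
    return z          # here yz == zx necessarily holds


if __name__ == "__main__":
    print(median3(3, 1, 2))
```

Among three booleans at least two are equal, so one of the three guards always holds.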
Greatest Common Divisor
gcd: main proc (ab?chan <<byte,byte>> & c!chan byte).
begin x,y: var byte
| forever do
ab?<<x,y>>
; do x<y then y:= y-x
or x>y then x:= x-y
od
; c!x
od
end
[Figure: GCD with input channel ab and output channel c.]
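The inner do ... od loop is the subtractive form of Euclid's algorithm: it repeats while one of the guards x<y or x>y holds and stops when x = y. A Python sketch of the same loop follows; the check for positive inputs is an added assumption, since with a zero operand the loop would never terminate.

```python
def gcd_subtract(x: int, y: int) -> int:
    """Greatest common divisor by repeated subtraction."""
    if x <= 0 or y <= 0:
        raise ValueError("both operands must be positive")
    while x != y:
        if x < y:
            y = y - x   # guard x<y
        else:
            x = x - y   # guard x>y
    return x


if __name__ == "__main__":
    print(gcd_subtract(12, 18))
```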
Nacking Arbiter
nack: main proc (a?chan bool & b!chan bool).
begin na,nb: var bool
|
<<na,nb>> := <<true,true>>
; forever do
sel probe(a) then a!nb || na:= na#nb
or probe(b) then b!na || nb:= nb#na
les
od
end
[Figure: nacking arbiter with channels a and b.]
C : Tangram → handshake circuit
[Figure: C(T), a handshake circuit for program T with channels a and b.]
[Figure: C(R;S), the circuits for R and S activated in turn by a sequencer (;); channels used by both R and S are merged through a mixer (|).]
[Figure: C(R||S), the circuits for R and S activated by a parallel component (||).]
Tangram Compilation
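[Figure: compilation diagram. H maps a Tangram program T to a handshake process; C maps T to a handshake circuit; parallel composition (||) maps the handshake circuit to a handshake process; E expands the handshake circuit into a VLSI circuit.]
H · T = || · C · T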
VLSI programming of asynchronous
circuits
[Figure: design flow. The compiler translates a Tangram program into a handshake circuit, and the expander turns that into an asynchronous circuit (netlist of gates). The simulator provides feedback on behavior, area, time, energy, and test coverage.]
Tangram tool box
Let Rlin4.tg be a Tangram program:
• htcomp -B Rlin4
– compiles Rlin4.tg into Rlin4.hcl, a handshake circuit
• htmap Rlin4
– produces Rlin4*.v files, a CMOS standard-cell circuit
• htsim Rlin4 a b
– executes Rlin4.hcl with files a, b for input/output
• htview Rlin4
– provides interactive viewing of simulation results
Tangram program “Conway”
[Figure: pipeline of P, Q, and R connected by channels a, b, c, and d.]
B1 = type [0..1]
& B2 = type <<B1,B1>>
& B3 = type <<B1,B1,B1>>
& P = … & Q = … & R = …
& conway: main proc (a?chan B2 & d!chan B3).
begin b,c: chan B1 | P(a,b) || Q(b,c) || R(c,d) end
Tangram program “Conway”
& P = proc(a?chan B2 & b!chan B1).
begin x: var B2
| forever do a?x; b!x.0; b!x.1 od end
& Q= proc(b?chan B1 & c!chan B1).
begin y: var B1
| forever do b?y; c!y od end
& R= proc(c?chan B1 & d!chan B3).
begin x,y,z: var B1
| forever do c?x; c?y; c?z; d!<<x,y,z>> od end
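The three processes repack a stream of pairs into a stream of triples. A rough Python analogue uses chained generators in place of channels; this is a sketch, and dropping an incomplete final triple is an assumption, since the Tangram processes would simply wait for more input.

```python
import itertools


def P(pairs):
    """Unpack each pair into two single items (b!x.0 ; b!x.1)."""
    for x in pairs:
        yield x[0]
        yield x[1]


def Q(items):
    """One-place buffer: pass every item through unchanged."""
    for y in items:
        yield y


def R(items):
    """Collect three items and emit them as one triple."""
    it = iter(items)
    while True:
        triple = tuple(itertools.islice(it, 3))
        if len(triple) < 3:
            return
        yield triple


def conway(pairs):
    """Compose the three stages, as in P(a,b) || Q(b,c) || R(c,d)."""
    return R(Q(P(pairs)))


if __name__ == "__main__":
    print(list(conway([(0, 1), (1, 0), (1, 1)])))
```

Three pairs carry six bits, which come out as two triples; the order of the bits is preserved.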
VLSI programming for …
• Low costs:
– introduce resource sharing.
• Low delay (high throughput):
– introduce parallelism.
• Low energy (low power):
– reduce activity; …
VLSI programming for low costs
• Keep it simple!!
• Introduce resource sharing: commands, auxiliary
variables, expressions, operators.
• Enable resource sharing, by:
– reducing parallelism
– making similar commands equal
Command sharing
S ; … ; S
becomes, with P : proc(). S,
P() ; … ; P()
[Figure: two copies of S replaced by a single S, reached through a mixer (|) with inputs 0 and 1.]
Command sharing: example
a?x ; … ; a?x
becomes, with ax : proc(). a?x,
ax() ; … ; ax()
[Figure: two separate inputs from a into xw, merged by mixers (|), replaced by a single input from a into xw that is shared through one mixer with inputs 0 and 1.]
Procedure definition vs declaration
Procedure definition: P = proc (). S
– provides a textual shorthand (expansion)
– each call generates a copy of the resource, i.e. no
sharing
Procedure declaration: P : proc (). S
– defines a sharable resource
– each call generates access to this resource
Command sharing
• Applies only to sequentially used commands.
• Saves resources, almost always
(i.e. when command is more costly than a mixer).
• Impact on delay and energy often favorable.
• Introduced by means of procedure declaration.
• Makes the Tangram program less readable.
Therefore, apply after the program is correct & sound.
• Should really be applied by the compiler.
Sharing of auxiliary variables
• x:=E is an auto assignment when E depends on x.
This is compiled as aux:=E; x:= aux ,
where aux is a “fresh” auxiliary variable.
• With multiple auto assignments to x, as in:
x:=E; ... ; x:=F
auxiliary variables can be shared, as in:
aux:=E; aux2x(); ... ; aux:=F; aux2x()
with aux2x : proc(). x:=aux
Expression sharing
x:=E ; … ; a!E
becomes, with f : func(). E,
x:=f() ; … ; a!f()
[Figure: two copies of E, with outputs e0 and e1, replaced by a single E shared through a demultiplexer (|).]
Expression sharing
• Applies only to sequentially used expressions.
• Often saves resources (i.e. when the expression is more
costly than the demultiplexer).
• Introduced by means of function declarations.
• Makes the Tangram program less readable.
Therefore, apply after the program is correct & sound.
• Should really be applied by the compiler.
Operator sharing
• Consider x0 := y0+z0 ; … ; x1 := y1+z1 .
• Operator + can be shared by introducing
add : func(a,b? var T): T. a+b
and applying it as in
x0 := add(y0, z0) ; … ; x1 := add(y1,z1) .
Operator sharing: the costs
• Operator sharing may introduce multiplexers to
(all) inputs of the operator and a demultiplexer to
its output.
• This form of sharing only reduces costs when:
– operator is expensive,
– some input(s) and/or output are common.
Operator sharing: example
• Consider x := y+z0 ; … ; x := y+z1 .
• Operator + can be shared by introducing
add2y : proc(b? var T). x:=y+b
and applying it as in
add2y(z0) ; … ; add2y(z1) .
Greatest Common Divisor
gcd: main proc (ab?chan <<byte,byte>> & c!chan byte).
begin x,y: var byte
| forever do
ab?<<x,y>>
; do x<y then y:= y-x
or x>y then x:= x-y
od
; c!x
od
end
[Figure: GCD with input channel ab and output channel c.]
Assignment: make GCD smaller
• Both assignments (y:= y-x and x:= x-y) are auto assignments
and hence require an auxiliary variable.
• Program requires 4 arithmetic resources (twice < and –) .
• Reduce costs of GCD by saving on auxiliary variables and
arithmetic resources. (Beware the costs of multiplexing!)
• The use of ff variables is not allowed for this exercise.