Retiming and Re-synthesis
Outline:
• Retiming
• Retiming and Resynthesis (RnR)
• Resynthesis of Pipelines
Optimizing Sequential Circuits by Retiming
[Figure: netlist of gates and registers, with inputs and outputs]
Various goals:
– Reduce clock cycle time
– Reduce area
• Reduce number of latches

Retiming
Problem
– Pure combinational optimization can be myopic, since relations across register boundaries are disregarded
Solutions
– Retiming: move register(s) so that
• the clock cycle decreases, or the number of registers decreases, and
• input-output behavior is preserved
– RnR: combine retiming with combinational optimization techniques
• Move latches out of the way temporarily
• Optimize larger blocks of combinational logic

Circuit Representation
[Leiserson, Rose and Saxe (1983)]
Circuit representation: G(V, E, d, w)
– V = set of gates
– E = set of wires
– d(v) = delay of gate/vertex v, with $d(v) \ge 0$
– w(e) = number of registers on edge e, with $w(e) \ge 0$

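Circuit Representation
Example: Correlator
[Figure: correlator circuit with a host, comparators $\delta$ and adders $+$ on inputs a and b, next to its directed graph; vertex labels are delays and edge labels are register counts]
$\delta(x, y) = 1$ if $x = y$, $0$ otherwise
Operation delays: $\delta$ = 3, $+$ = 7
Every cycle in the graph has at least one register, i.e. there are no combinational loops.
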
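Preliminaries
For a path $p: v_0 \xrightarrow{e_0} v_1 \xrightarrow{e_1} \cdots \xrightarrow{e_{k-1}} v_k$
– Path delay (includes endpoints): $d(p) = \sum_{i=0}^{k} d(v_i)$
– Path weight: $w(p) = \sum_{i=0}^{k-1} w(e_i)$
Clock cycle: $c = \max_{p:\, w(p) = 0} \{ d(p) \}$
[Figure: correlator graph with a path of $w(p) = 0$ marked]
For the correlator, c = 13
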
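As a minimal sketch of the clock-cycle definition, the Python below computes c as the largest delay over paths that cross no register. The vertex names v0 to v3 and the dictionary encoding are assumptions made for illustration (v0 is taken to be the host); the delays and register counts are those of the correlator graph. The recursion assumes there are no combinational loops.

```python
from functools import lru_cache

def clock_cycle(d, w):
    """Return max d(p) over paths p with w(p) = 0 (endpoints included)."""
    # Keep only register-free edges: a zero-weight path uses only these.
    zero_succ = {v: [] for v in d}
    for (u, v), regs in w.items():
        if regs == 0:
            zero_succ[u].append(v)

    @lru_cache(maxsize=None)
    def longest_from(u):
        # d(u) plus the best continuation through register-free edges.
        return d[u] + max((longest_from(v) for v in zero_succ[u]), default=0)

    return max(longest_from(v) for v in d)

# Correlator graph (illustrative encoding): vertex delays and edge weights.
d = {"v0": 0, "v1": 3, "v2": 3, "v3": 7}
w = {("v0", "v1"): 2, ("v1", "v2"): 0, ("v1", "v3"): 0,
     ("v2", "v3"): 0, ("v3", "v0"): 0}

print(clock_cycle(d, w))  # 13, from the path v1 -> v2 -> v3 (3 + 3 + 7)
```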
Basic Operation
• Movement of registers from the input to the output of a gate, or vice versa
[Figure: a gate retimed by -1 and by +1]
• Does not affect gate functionalities
• A mathematical definition: retardation
– $r: V \to \mathbb{Z}$, an integer vertex labeling
– $w_r(e) = w(e) + r(v) - r(u)$ for edge $e = (u, v)$

Basic Operation
Thus in the example, r(u) = -1, r(v) = -1 results in
[Figure: correlator graph before and after retiming vertices u and v by -1]
• For a path $p: s \leadsto t$, $w_r(p) = w(p) + r(t) - r(s)$
• Retardation
– $r: V \to \mathbb{Z}$, an integer vertex labeling
– $w_r(e) = w(e) + r(v) - r(u)$ for edge $e = (u, v)$
– A retiming r is legal if $w_r(e) \ge 0$, $\forall e \in E$

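The retardation formula can be applied mechanically. The sketch below is only an illustration: the names `retime` and `is_legal` and the vertex names v0 to v3 are assumptions, and u, v of the example are taken to be the two comparator vertices v1 and v2 of the correlator graph.

```python
def retime(w, r):
    """Return w_r with w_r(u, v) = w(u, v) + r(v) - r(u) for every edge."""
    return {(u, v): regs + r[v] - r[u] for (u, v), regs in w.items()}

def is_legal(w_r):
    """A retiming is legal if no edge ends up with a negative register count."""
    return all(regs >= 0 for regs in w_r.values())

# Correlator edge weights (illustrative encoding).
w = {("v0", "v1"): 2, ("v1", "v2"): 0, ("v1", "v3"): 0,
     ("v2", "v3"): 0, ("v3", "v0"): 0}

# Retime the two comparators by -1 and leave the other vertices alone.
r = {"v0": 0, "v1": -1, "v2": -1, "v3": 0}
w_r = retime(w, r)
print(w_r)            # one register each on v0->v1, v1->v3 and v2->v3
print(is_legal(w_r))  # True

# Retiming v1 by +1 alone would need a register that v1->v2 does not have.
print(is_legal(retime(w, {"v0": 0, "v1": 1, "v2": 0, "v3": 0})))  # False
```

Summing the retimed weights around any cycle gives the same total as before, which is the path formula with s = t.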
Retiming for minimum clock cycle
Problem statement (minimum cycle time):
Given G(V, E, d, w), find a legal retiming r so that
$c = \max_{p:\, w_r(p) = 0} \{ d(p) \}$
is minimized
Retiming: 2 important matrices
• Register weight matrix
$W(u, v) = \min_p \{ w(p) : u \overset{p}{\leadsto} v \}$
• Delay matrix
$D(u, v) = \max_p \{ d(p) : u \overset{p}{\leadsto} v,\ w(p) = W(u, v) \}$

Retiming for minimum clock cycle
[Figure: correlator graph with vertices v0 to v3]
W = register path weight matrix (minimum # latches on all paths between u and v)

W    v0  v1  v2  v3
v0    0   2   2   2
v1    0   0   0   0
v2    0   2   0   0
v3    0   2   2   0

D = path delay matrix (maximum delay over the minimum-weight paths between u and v)

D    v0  v1  v2  v3
v0    0   3   6  13
v1   13   3   6  13
v2   10  13   3  10
v3    7  10  13   7

$c \le \phi \iff \forall p$, if $d(p) > \phi$ then $w(p) \ge 1$

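One way to obtain both matrices is an all-pairs shortest-path pass over ordered pairs, in the style of the Leiserson-Saxe construction: each edge (u, v) gets the pair (w(e), -d(u)), pairs are compared lexicographically, and D(u, v) is recovered as d(v) minus the second component. The sketch below is a minimal illustration under that assumption, not necessarily the procedure used for the slides; the function name and vertex names are hypothetical, and it assumes every cycle carries at least one register.

```python
import math

def wd_matrices(d, w):
    """Floyd-Warshall on lexicographically ordered pairs (registers, -delay)."""
    V = list(d)
    dist = {u: {v: (math.inf, 0) for v in V} for u in V}
    for u in V:
        dist[u][u] = (0, 0)                      # empty path from u to itself
    for (u, v), regs in w.items():
        dist[u][v] = min(dist[u][v], (regs, -d[u]))
    for k in V:
        for u in V:
            for v in V:
                a, b = dist[u][k], dist[k][v]
                if math.inf in (a[0], b[0]):
                    continue                     # no path through k
                cand = (a[0] + b[0], a[1] + b[1])
                if cand < dist[u][v]:            # fewer registers, then more delay
                    dist[u][v] = cand
    W = {u: {v: dist[u][v][0] for v in V} for u in V}
    D = {u: {v: d[v] - dist[u][v][1] for v in V} for u in V}
    return W, D

# Correlator graph (illustrative encoding).
d = {"v0": 0, "v1": 3, "v2": 3, "v3": 7}
w = {("v0", "v1"): 2, ("v1", "v2"): 0, ("v1", "v3"): 0,
     ("v2", "v3"): 0, ("v3", "v0"): 0}

W, D = wd_matrices(d, w)
for u in d:
    print(u, [W[u][v] for v in d], [D[u][v] for v in d])
```

The printed rows can be compared against the W and D tables for the correlator.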
Conditions for Retiming
Assume that we are asked to check if a retiming exists for a clock cycle $\phi$
Legal retiming: $w_r(e) \ge 0$ for all e. Hence
$w_r(e) = w(e) + r(v) - r(u) \ge 0$, or
$r(u) - r(v) \le w(e)$
For all paths $p: u \leadsto v$ such that $d(p) > \phi$, we require $w_r(p) \ge 1$
– Thus
$$
\begin{aligned}
1 \le w_r(p) &= \sum_{i=0}^{k-1} w_r(e_i) \\
&= \sum_{i=0}^{k-1} \left[ w(e_i) + r(v_{i+1}) - r(v_i) \right] \\
&= w(p) + r(v_k) - r(v_0) \\
&= w(p) + r(v) - r(u)
\end{aligned}
$$
Or, taking the least w(p) (tightest constraint): $r(u) - r(v) \le W(u, v) - 1$
Note: this is independent of the path from u to v, so we just need to apply it to u, v such that $D(u, v) > \phi$

Solving the constraints
• All constraints are in difference-of-2-variables form
• Related to the shortest path problem
Correlator: $\phi = 7$
Legal: $r(u) - r(v) \le w(e)$
$r(v_0) - r(v_1) \le 2$
$r(v_1) - r(v_2) \le 0$
$r(v_1) - r(v_3) \le 0$
$r(v_2) - r(v_3) \le 0$
$r(v_3) - r(v_0) \le 0$
D > 7: $r(u) - r(v) \le W(u, v) - 1$
$r(v_0) - r(v_3) \le 1$
$r(v_1) - r(v_0) \le -1$
$r(v_1) - r(v_3) \le -1$
$r(v_2) - r(v_0) \le -1$
$r(v_2) - r(v_1) \le 1$
$r(v_2) - r(v_3) \le -1$
$r(v_3) - r(v_1) \le 1$
$r(v_3) - r(v_2) \le 1$
[Figure: correlator graph with its W and D matrices, entries with D > 7 marked]

Solving the constraints
• Do shortest path on the constraint graph: $O(|V|^3)$
• A solution exists if and only if there exists no negative-weight cycle
For the correlator constraints with $\phi = 7$ (the legality constraints plus those for pairs with D > 7), a solution is r(v0) = r(v3) = 0, r(v1) = r(v2) = -1
[Figure: constraint graph on r(0), r(1), r(2), r(3), with edge weights taken from the constraints]

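A system of difference constraints like this one can be checked with Bellman-Ford relaxation from a virtual source. The sketch below is one possible realization (the slide's $O(|V|^3)$ bound suggests an all-pairs method instead); the function name and the triple encoding (u, v, b) for $r(u) - r(v) \le b$ are assumptions.

```python
def solve_difference_constraints(variables, constraints):
    """constraints: triples (u, v, b) meaning r(u) - r(v) <= b.
    Returns a satisfying dict r, or None if a negative cycle exists."""
    r = {x: 0 for x in variables}      # virtual source at distance 0 from all
    for _ in range(len(variables)):
        changed = False
        for u, v, b in constraints:
            if r[v] + b < r[u]:        # relax the constraint-graph edge v -> u
                r[u] = r[v] + b
                changed = True
        if not changed:
            return r
    return None                        # still relaxing: negative-weight cycle

V = ["v0", "v1", "v2", "v3"]
legal = [("v0", "v1", 2), ("v1", "v2", 0), ("v1", "v3", 0),
         ("v2", "v3", 0), ("v3", "v0", 0)]
timing = [("v0", "v3", 1), ("v1", "v0", -1), ("v1", "v3", -1),
          ("v2", "v0", -1), ("v2", "v1", 1), ("v2", "v3", -1),
          ("v3", "v1", 1), ("v3", "v2", 1)]

print(solve_difference_constraints(V, legal + timing))
# {'v0': 0, 'v1': -1, 'v2': -1, 'v3': 0}
```

On the correlator constraints this returns the same labeling as the solution quoted above. Adding a constraint such as r(v3) - r(v3) <= -1, which is what a target below the adder delay of 7 would require, makes the system infeasible.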
Retiming
To find the minimum cycle time, do a binary search among the entries of the D matrix: $O(|V|^3 \log |V|)$
[Figure: correlator graph with its W and D matrices]
Retimed correlator:
[Figure: correlator with host and inputs a, b before retiming, clock cycle = 3 + 3 + 7 = 13, and after retiming, clock cycle = 7]

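The binary search can be sketched as follows. This is an illustration under assumptions: vertices are indexed 0 to 3 for v0 to v3, the W and D matrices of the correlator are hard-coded, and feasibility of a candidate cycle time is tested with a Bellman-Ford pass over the difference constraints. The function names are hypothetical.

```python
def feasible(phi, n, edges, W, D):
    """True if some legal retiming achieves clock cycle <= phi."""
    cons = list(edges)                                   # r(u) - r(v) <= w(e)
    cons += [(u, v, W[u][v] - 1)                         # r(u) - r(v) <= W(u,v) - 1
             for u in range(n) for v in range(n) if D[u][v] > phi]
    r = [0] * n                                          # virtual source at distance 0
    for _ in range(n):
        changed = False
        for u, v, b in cons:
            if r[v] + b < r[u]:
                r[u] = r[v] + b
                changed = True
        if not changed:
            return True
    return False                                         # negative cycle

def min_clock_cycle(n, edges, W, D):
    """Binary search over the sorted distinct entries of D."""
    cands = sorted({D[u][v] for u in range(n) for v in range(n)})
    lo, hi = 0, len(cands) - 1                           # the largest entry is always feasible
    while lo < hi:
        mid = (lo + hi) // 2
        if feasible(cands[mid], n, edges, W, D):
            hi = mid
        else:
            lo = mid + 1
    return cands[lo]

edges = [(0, 1, 2), (1, 2, 0), (1, 3, 0), (2, 3, 0), (3, 0, 0)]   # (u, v, w(e))
W = [[0, 2, 2, 2], [0, 0, 0, 0], [0, 2, 0, 0], [0, 2, 2, 0]]
D = [[0, 3, 6, 13], [13, 3, 6, 13], [10, 13, 3, 10], [7, 10, 13, 7]]

print(min_clock_cycle(4, edges, W, D))  # 7
```

The search relies on feasibility being monotone in the target: any retiming that meets a given cycle time also meets every larger one.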
Retiming: 2 more algorithms
1. Relaxation based:
– Repeatedly find the critical path;
– retime the vertex at the end of the path by +1: $O(|V||E| \log |V|)$
[Figure: critical path from u to v, with v retimed by +1]
2. Also, a Mixed Integer Linear Program formulation

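A hedged sketch of the relaxation idea for a fixed target cycle c: compute the arrival time at each vertex over register-free paths, add 1 to r(v) wherever the arrival time exceeds c, and repeat |V| - 1 times. This follows the feasibility test known from the Leiserson-Saxe retiming literature, which may differ in detail from the procedure the slide has in mind; function and vertex names are assumptions.

```python
def arrival_times(d, w_r):
    """Largest delay of a register-free path ending at each vertex."""
    pred0 = {v: [] for v in d}
    for (u, v), regs in w_r.items():
        if regs == 0:
            pred0[v].append(u)
    memo = {}
    def delta(v):
        if v not in memo:
            memo[v] = d[v] + max((delta(u) for u in pred0[v]), default=0)
        return memo[v]
    return {v: delta(v) for v in d}

def relax_retime(d, w, c):
    """Return a retiming r with clock cycle <= c, or None if the test fails."""
    r = {v: 0 for v in d}
    def retimed():
        return {(u, v): regs + r[v] - r[u] for (u, v), regs in w.items()}
    for _ in range(len(d) - 1):
        for v, t in arrival_times(d, retimed()).items():
            if t > c:
                r[v] += 1          # pull one register onto the inputs of v
    return r if max(arrival_times(d, retimed()).values()) <= c else None

# Correlator graph (illustrative encoding).
d = {"v0": 0, "v1": 3, "v2": 3, "v3": 7}
w = {("v0", "v1"): 2, ("v1", "v2"): 0, ("v1", "v3"): 0,
     ("v2", "v3"): 0, ("v3", "v0"): 0}

print(relax_retime(d, w, 7))  # {'v0': 1, 'v1': 0, 'v2': 0, 'v3': 1}
print(relax_retime(d, w, 6))  # None
```

The labeling found for c = 7 differs from r(v0) = r(v3) = 0, r(v1) = r(v2) = -1 only by a constant shift of +1 on every vertex, so it places the registers on the same edges.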
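Retiming for minimum area (minimum # latches)
Goal: minimize number of registers used
$$
\begin{aligned}
\min N_r &= \sum_{e \in E} w_r(e) \\
&= \sum_{e: u \to v} \left( w(e) + r(v) - r(u) \right) \\
&= \sum_{e \in E} w(e) + \sum_{e: u \to v} \left( r(v) - r(u) \right) \\
&= N + \sum_{u \to v} \left( r(v) - r(u) \right) \\
&= N + \sum_{v \in V} r(v) \left( \#\mathrm{fanin}(v) - \#\mathrm{fanout}(v) \right) \\
&= N + \sum_{v \in V} a_v\, r(v)
\end{aligned}
$$
where $a_v$ is a constant.
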
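The last two steps of the derivation can be checked numerically. The sketch below computes $a_v$ as fan-in minus fan-out and compares $N + \sum_v a_v r(v)$ with the direct sum of retimed edge weights; the function names and the correlator encoding are assumptions made for illustration.

```python
def area_coefficients(w):
    """a_v = #fanin(v) - #fanout(v), counted over the edges of the graph."""
    a = {}
    for (u, v) in w:
        a[v] = a.get(v, 0) + 1     # the edge enters v
        a[u] = a.get(u, 0) - 1     # the edge leaves u
    return a

def register_count(w, r):
    """N_r = N + sum over v of a_v * r(v)."""
    a = area_coefficients(w)
    return sum(w.values()) + sum(a[v] * r[v] for v in a)

# Correlator edge weights and a retiming of it (illustrative encoding).
w = {("v0", "v1"): 2, ("v1", "v2"): 0, ("v1", "v3"): 0,
     ("v2", "v3"): 0, ("v3", "v0"): 0}
r = {"v0": 0, "v1": -1, "v2": -1, "v3": 0}

direct = sum(regs + r[v] - r[u] for (u, v), regs in w.items())
print(area_coefficients(w))          # {'v1': -1, 'v0': 0, 'v2': 0, 'v3': 1}
print(register_count(w, r), direct)  # 3 3
```

Here the retiming that brings the clock cycle to 7 costs one extra register, since N = 2 and the only nonzero contribution is $a_{v_1} r(v_1) = (-1)(-1) = 1$.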
Minimum registers formulation
Minimize: $\sum_{v \in V} a_v\, r(v)$
Subject to: $w_r(e) = w(e) + r(v) - r(u) \ge 0$
• Reducible to a flow problem

Retiming and resynthesis: motivation
Goal: incorporate combinational optimization into sequential optimization
• Naïve approach: carve out combinational regions, do optimization on each region. Only local gains made.
• Can we do any better?
RnR: a new approach
Sentovich, Malik, Brayton and Sangiovanni-Vincentelli ('89)
3 step approach
1. Move registers to boundary of circuit
2. Optimize network
3. Move registers back in an optimal way

RnR: circuit representation
Circuit representation: communication graph
– internal/peripheral edges
– edge-weight = register count
[Figure: communication graph with nodes a, b, c, i, j, register counts on the edges, and internal and peripheral edges marked]

Extended Retiming
• Move registers to the periphery
• Negative edge-weights permitted
[Figure: an edge of weight -1 drawn as a negative latch]
• A negative latch has the interpretation that it advances its output by 1 clock cycle instead of delaying it

Peripheral Retiming
A retiming is called a peripheral retiming if it results in all internal edges having zero weight
Peripheral edges can have negative weight
[Figure: circuit whose internal edge weights are all 0, with the remaining weights on the peripheral edges]

Peripheral Retiming
A circuit that undergoes peripheral retiming followed by a legal retiming, i.e. one that results in all weights $\ge 0$, is “functionally equivalent” to the original circuit
Functional equivalence: equivalence of finite state automata
But we have to be careful about the initial conditions and initializing sequences. The resulting circuit may only exhibit equivalence after an appropriate delay. See [Singhal et al., ICCAD 1995].
This holds even for regular retiming

Peripheral Retiming - an example
[Figure: communication graph with nodes a, b, c, i, j before and after peripheral retiming]
Not possible for all circuits:
[Figure: circuit with inputs i, j, nodes a, b, c, d and outputs o1, o2]

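Path Weight Matrix (PWM)
Matrix W
– rows: inputs
– columns: outputs
$$
W_{ij} =
\begin{cases}
* & \text{no path } i \leadsto j \\
\sum_{e \in i \leadsto j} w(e) & \text{same weight for all paths} \\
\sim & \text{if paths with different weights}
\end{cases}
$$
Example:
[Figure: circuit with inputs i, j, k, nodes a, b and outputs o1, o2]

     o1  o2
i     1   *
j     0   ~
k     *   0
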
PWM and Peripheral Retiming
Satisfiable path weight matrix:
1. $W_{ij} \ne \sim$, $\forall i, j$, and
2. $\exists\, \alpha_i, \beta_j$ such that $W_{ij} = \alpha_i + \beta_j$, $\forall\, W_{ij} \ne *$
Peripheral retiming possible $\iff$ matrix is satisfiable
– $\alpha_i$, $\beta_j$ specify registers on the peripheral edges
– some $\alpha_i$, $\beta_j$ can be negative
Complexity: linear in size of communication graph

Path Weight Matrix - generation
From inputs to outputs: generate output columns of W.
Example (each node carries the vector of path weights from the inputs [i j k] to that node):
i: [0 * *]
j: [* 0 *]
k: [* * 0]
o1: ([0 * *] + 1) & ([* 0 *] + 0) = [1 * *] & [* 0 *] = [1 0 *]
internal node: ([* 0 *] + 0) & ([* * 0] + 0) = [* 0 *] & [* * 0] = [* 0 0]
o2: ([* 0 *] + 1) & [* 0 0] = [* 1 *] & [* 0 0] = [* ~ 0]
Resulting matrix:

     o1  o2
i     1   *
j     0   ~
k     *   0

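The vector propagation in the example can be written out as a short sketch. The `shift` and `merge` helpers mirror the "+" and "&" steps shown in the example; the fan-in encoding, the node name `n` for the internal node, and the assumption that nodes are visited in topological order are all hypothetical choices made for illustration.

```python
STAR, TILDE = "*", "~"

def shift(vec, regs):
    """Add the registers of an edge to every defined path weight."""
    return [x if x in (STAR, TILDE) else x + regs for x in vec]

def merge(a, b):
    """Combine the vectors arriving on two fan-in edges (the '&' step)."""
    out = []
    for x, y in zip(a, b):
        if x == STAR:
            out.append(y)              # no path on one side: take the other
        elif y == STAR:
            out.append(x)
        elif TILDE in (x, y) or x != y:
            out.append(TILDE)          # paths with different weights
        else:
            out.append(x)
    return out

def path_weight_vectors(inputs, fanins, order):
    """fanins[node] = [(pred, registers_on_edge), ...]; order is topological."""
    vec = {x: [0 if y == x else STAR for y in inputs] for x in inputs}
    for node in order:
        acc = [STAR] * len(inputs)
        for pred, regs in fanins[node]:
            acc = merge(acc, shift(vec[pred], regs))
        vec[node] = acc
    return vec

inputs = ["i", "j", "k"]
fanins = {"o1": [("i", 1), ("j", 0)],
          "n": [("j", 0), ("k", 0)],
          "o2": [("j", 1), ("n", 0)]}
vec = path_weight_vectors(inputs, fanins, ["o1", "n", "o2"])
print(vec["o1"], vec["o2"])  # [1, 0, '*'] ['*', '~', 0]
```

The vectors at o1 and o2 are the two columns of the path weight matrix.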
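Computing $\alpha$, $\beta$
Each constraint is of the form $\alpha_i + \beta_j = W_{ij}$
Procedure
1. set $\alpha_1 = 0$
2. use first row to generate $\beta$'s
3. determine $\alpha$'s from $\beta$'s
4. check for consistency
Example:
[Figure: circuit with inputs i, j, nodes a, b, c and output o; edges labeled 0/2, 1/0, 2/0, 1/0, 1/1]

     o
i    2
j    3

$\alpha_1 = 0$, $\beta_1 = 2$, $\alpha_2 = 1$
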
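Computing $\alpha$, $\beta$
Example:
[Figure: circuit with inputs i, j, nodes a, b, c, d and outputs o1, o2]

     o1  o2
i     0   0
j     0   1

$\alpha_1 + \beta_1 = 0$ and $\alpha_1 + \beta_2 = 0$ give $\beta_1 - \beta_2 = 0$
$\alpha_2 + \beta_1 = 0$ and $\alpha_2 + \beta_2 = 1$ give $\beta_1 - \beta_2 = -1$
contradiction
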
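The procedure for $\alpha$ and $\beta$ can be sketched as a propagation over the matrix. This version generalizes the four steps slightly, as an assumption: instead of relying on the first row alone, it propagates from any still-unassigned input, so that `*` entries do not block it. The function name and the list-of-rows encoding are hypothetical.

```python
from collections import deque

def solve_alpha_beta(W):
    """W: rows = inputs, columns = outputs; entries are ints, '*' or '~'.
    Returns (alpha, beta) with alpha[i] + beta[j] == W[i][j], or None."""
    if any(x == "~" for row in W for x in row):
        return None                                   # condition 1 fails
    n_in, n_out = len(W), len(W[0])
    alpha, beta = [None] * n_in, [None] * n_out
    for start in range(n_in):
        if alpha[start] is not None:
            continue
        alpha[start] = 0                              # free choice per component
        queue = deque([("in", start)])
        while queue:
            side, k = queue.popleft()
            if side == "in":
                for j in range(n_out):
                    if W[k][j] == "*":
                        continue
                    want = W[k][j] - alpha[k]
                    if beta[j] is None:
                        beta[j] = want
                        queue.append(("out", j))
                    elif beta[j] != want:
                        return None                   # inconsistent
            else:
                for i in range(n_in):
                    if W[i][k] == "*":
                        continue
                    want = W[i][k] - beta[k]
                    if alpha[i] is None:
                        alpha[i] = want
                        queue.append(("in", i))
                    elif alpha[i] != want:
                        return None                   # inconsistent
    return alpha, beta

print(solve_alpha_beta([[2], [3]]))          # ([0, 1], [2])
print(solve_alpha_beta([[0, 0], [0, 1]]))    # None (the contradiction above)
```

An output with no path from any input keeps `None` in `beta`, since no constraint mentions it.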
Optimizing Acyclic Sequential Circuits
Acyclic circuits with satisfiable W
– Do peripheral retiming, putting $\alpha_i$, $\beta_j$ registers at the I/O
– Resynthesize interior
– Do a legal retiming (move registers in)
• May not always be possible
Acyclic circuits with unsatisfiable W
– Identify maximal sub-circuits with satisfiable W
– Cut connections
– Repeat previous procedure

Acyclic Sequential Circuits
[Figure: circuit cut into sub-circuits]
Note that both of the cut circuits can be peripherally retimed, unlike the original.

Optimizing Cyclic Sequential Circuits
Cyclic circuits
1. Make circuit acyclic by breaking cycles
– Different feedback-cuts give different W's
2. Find sub-circuits with satisfiable W's
3. Repeat procedure

FSM Optimization
[Figure sequence: FSM with inputs X, Y, output OUT and blocks A, B, carried through the steps below]
1. Break feedback
2. Peripheral retiming (the interior becomes combinational)
3. Resynthesize
4. Retime
5. Reconnect
[Figure: original circuit and resynthesized circuit, each with output OUT and blocks A, B]

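Resynthesis of Pipelines
Goal: Performance optimization of pipeline circuits
Example: Pipeline circuit C
[Figure: pipeline with stages C1, ..., Ci, ..., Cn, stage inputs I1, ..., Ii, ..., In and stage outputs O1, ..., Oi, ..., On. After peripheral retiming the stages form a combinational circuit, with peripheral weights -(i-1) and i-1 at stage i, up to -(n-1) and n-1 at stage n]
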
Resynthesis of Pipelines
Parameters:
– A: vector of arrival times for inputs
– R: vector of required times for outputs
– c: target clock cycle
– C: circuit
Pipeline performance problem: $P_P(C_P, c_P, A_P, R_P)$
Combinational performance problem: $P_C(C_C, c_C, A_C, R_C)$

Resynthesis of Pipelines
Problem transformation:
– Relax $P_P(C_P, c_P, A_P, R_P)$ to $P'_P(C_P, c_P + \epsilon, A_P, R_P)$
• where $\epsilon$ = largest possible single gate delay
– Convert: pipeline $P'_P$ to combinational problem $P'_C$
$C_P \xrightarrow{\text{peripheral retime}} C_C$
$R_C^{\text{stage } i} = (i - 1) \cdot c + R_P^{\text{stage } i}$
$A_C^{\text{stage } i} = (i - 1) \cdot c + A_P^{\text{stage } i}$

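The conversion of arrival and required times is a per-stage offset of (i - 1) clock cycles. A small sketch, with a hypothetical function name and made-up numbers that serve only to show the arithmetic:

```python
def to_combinational_times(c, arrival_p, required_p):
    """Map stage-i pipeline times to times for the combinational problem.
    arrival_p[i-1] and required_p[i-1] hold the values for stage i (1-based)."""
    arrival_c = [(i - 1) * c + a for i, a in enumerate(arrival_p, start=1)]
    required_c = [(i - 1) * c + r for i, r in enumerate(required_p, start=1)]
    return arrival_c, required_c

# Illustrative 3-stage pipeline with clock cycle c = 10 (values are invented).
arrival_c, required_c = to_combinational_times(10, [0, 2, 1], [10, 9, 10])
print(arrival_c)   # [0, 12, 21]
print(required_c)  # [10, 19, 30]
```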
Performance Synthesis of Pipeline
• Arrival and required times for $P'_C$:
[Figure: pipeline stages C1, ..., Ci, ..., Cn before and after peripheral retiming; in the combinational circuit, stage i is labeled with arrival time (i-1)c + Ai and required time (i-1)c + Ri]
Theoretical contribution:
– If $P'_C(c_P + \epsilon)$ has a solution, then retiming yields a solution to $P'_P(c_P + \epsilon)$
– If there is a solution to $P_P(c_P)$, then peripheral retiming yields a solution to $P'_C(c_P + \epsilon)$