
Efficient Timing-Driven Incremental
Routing for VLSI Circuits Using
DFS and Localized Slack-Satisfaction
Computations
Hasan Arslan and Shantanu Dutt
Electrical & Computer Eng.
University of Illinois at Chicago
DATE 2006
Outline
• Introduction
  – Importance of Incremental Routing
  – Previous Work
  – Our Goals
• A DFS-Based TD Incr Routing Algorithm (TIDE)
  – Previous work on global TD routing
  – Global Routing with slack tolerance concepts
  – Detailed Routing with DFS-based B&R
• Experimental Results
• Conclusion
Dutt & Arslan, UIC
Incremental Routing
• After chip layout is completed
  – Need correcting changes to the circuit/system
  – Time/noise/thermal violation
  – One or more optimization metrics (speed/power/area) unsatisfactory
• Engineering Change Order (ECO) process
  – Enormous resources and time already spent
  – Time to meet market requirements
  – Most ECOs lead to a requirement of routing changes after various design changes at earlier levels
  – The ECO could also be at the routing level
• Incremental routing & interconnects critical
• Need a time-efficient & effective TD-incremental routing algorithm
Incremental Routing (Cont.)
• Incremental Routing Problem
  – Set of existing routed nets R = E – D, E = original nets before ECO, D = deleted nets
  – Set of new nets S (resulting from correcting re-synthesis at different levels of the VLSI design flow)
• Quality metrics
  – Time-efficient near-optimal incr solns for S subject to given constraints (slack satisfaction, crosstalk bounding, etc.)
  – Minimal changes to previous routing results
  – Complete incr routing in the available metal layers (if such a soln exists)
Prior Work on Incremental Routing
1) Emmert and Bhatia, “Incremental Routing in FPGA”, IEEE Int. ASIC Conference, 1998.
2) Cong and Sarrafzadeh, “Incremental Physical Design”, ISPD 2000.
3) Dutt, Shanmugavel and Trimberger, “Efficient Incremental Rerouting for Fault Reconfiguration in FPGAs”, ICCAD 1999.
4) Dutt, Verma and Arslan, “A Search-Based Bump and Refit Approach to Incremental Routing for ECO Applications in FPGAs”, TODAES 2002.
5) Xiang, Chao, Wong, “An ECO Algorithm for Eliminating Crosstalk Violations”, ISPD 2004.
6) S. Raman, et al., “A Timing-Constrained Incremental Routing Algorithm for Symmetrical FPGAs”, DATE 1996.
7) H. Arslan and S. Dutt, “A Depth-First Search Controlled Gridless Incremental Routing Algorithm for VLSI Circuits”, ICCD 2004.
• No work on TD incremental routing for ASICs
Prior Work (Cont.)
• Emmert-Bhatia (ASIC’98)
  – Nets connected to faulty PLB are deleted and rerouted
  – Standard single-net routing mode (global then detailed)
  – Do not perturb or move existing nets
• Cong-Sarrafzadeh (ISPD’00)
  – Standard Net Routing: Route new nets without perturbing existing nets
  – Rip & Reroute: If some nets cannot be routed, rip up “blocking” existing nets. Reroute the ripped-up nets.
Our Goals
• TD-incremental routing for VLSI (ASIC) circuits
• Address quality metrics of incr. routing and satisfy constraints
  – Satisfy slack constraints on new and existing nets that may be affected by new net routing
  – Fast incr solutions that are near-optimal in WL and vias
  – Min. changes to existing routing: bounded WL, via increase
  – Complete incr routing in the available metal layers: aggressive exploration of routing space within above constraints
High-Level Approach
Global Routing of new net based on:
1) A new iterative slack-satisfaction algorithm IntAl for connecting next pin on the net based on local slack tolerances
2) Congestion + WL + efficiency-based min-cost on a grid-graph for each 2-pin path
Detailed routing of new net based on:
1) WL + via + bumping (degree of bumped net) min-cost path for each global route
2) Constraint-satisfying DFS-based partial-B&R process for “overlapped” or “bumped” existing nets so that: a) slacks not violated, b) WL-increment bounded
[Figure: partial routing tree with driver V0 and nodes V1, V2, Vi, Vj, Vm; detailed-routing example with nets n1, n2, n3, nj, adjacent vias (Adj-via), and a bumped seg.]
Our Approach – TD Global Routing
• In iteratively connecting pins to the routing tree, the most important question is: where to connect the next pin for slack satisfaction of all pins and min-WL?
• Simplified rule of thumb:
  – The closer the connection is to CC (the closest connection point), the more interconnect sharing there is w/ partial tree T and the less additional delay is seen by other sinks. But there is more “baggage” (accumulated delay) for new pin vu.
  – The higher up and away from CC the connection is, the less accumulated delay of T is seen by vu, but there is less sharing and more delay seen by other sinks due to more wire-cap load.
[Figure: partial tree T with driver v0, sinks v1, v2, v3, node vi, closest connection point CC, and new pin vu]
• Classic Prim-Dijkstra tradeoffs discussed in [Alpert et al., TCAD’95]
• Optimal solution (even just a slack-satisfying soln.) lies somewhere in between
• No one has solved it exactly (satisfying all slacks) or optimally (w/ min-WL)
• We provide a near-min-WL all-slack-satisfying solution here
Various Approaches to TD Global Routing
• In [Boese, et al., TCAD-95] (SERT/ERT algorithm):
  – The delay on any sink is a concave function of lx, the distance from CC of the connection point
  – Same for weighted sum of all sink delays (obj. func): min. @ either vi or CC
  – Choose vi or CC based on min-WL
[Figure: plot of delay vs. lx between CC and vi; routing tree with v0, v1, v2, v3, vi, CC, connection point vx at distance lx from CC, and new pin vu]
• Does not solve the core TD problem of slack satisfaction
Various Approaches to TD Global Routing (contd)
• In [Hou & Sapatnekar, ISPD’98] (MVERT):
  – Constraint satisfaction of all sinks vk is explicitly considered: d(vk) - slack(vk) <= 0 (LHS is also concave); uses non-Hanan points
  – The technique involves navigating max[d(vk) - slack(vk)] via intersection points of the various concave curves
  – Use binary search to find min-WL point for constr. satisfaction
[Figure: max envelope of the (delay - slack) curves vs. lx between CC and vi, marking the optimal slack-satisfying conn. point; routing tree with v0, v1, v2, v3, vi, vk, CC, vx, and new pin vu]
• Time complexity: [formula lost in extraction], where k = # sink pins (our analysis)
• Misses some slack satisfaction solutions from initial SERT/ERT handoff
Various Approaches to TD Global Routing (contd)
• As our competitor, we consider a mix of SERT [Boese, TCAD’95] and SOAR [Wang and Kuh, MCMC’97] (SERT/SOAR):
  – Check if connection to CC satisfies all slacks
  – Else make a connection to driver v0
  – Rationale:
    • If connection to CC violates the slack of vu, then this is most likely due to the “baggage” delay of shared interconnects.
    • This can be avoided maximally by routing directly to v0
    • Classical Prim-Dijkstra tradeoff
    • Fast
[Figure: routing tree with v0, v1, v2, v3, vi, CC, connection point vx at distance lx, and new pin vu]
Our Approach to TD Global Routing
• Exact slack satisfaction of all sinks in “constant” time by checking satisfaction of derived slacks (called tolerances) as a function of lx for only 3 classes of sinks:
  – vu
  – sinks in T(CC), where T(u) is routing subtree rooted at u (e.g., v3)
  – sinks below T(vi)/T(CC) (e.g., v1, v2)
• For this we need tolerance concepts discussed next
[Figure: routing tree with v0, v1, v2, v3, vi, CC, connection point vx at distance lx, and new pin vu, annotated with derived tolerances]
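Elmore Delay Model
• $D(v_j) = D(v_i) + \frac{r \, c \, l_{ij}^2}{2} + r \, l_{ij} \, C_{dn_j}$
• $C_{dn_j}$ = gate + wiring capacitance of subtree rooted at $v_j$
• If $v_j$ is a sink pin, $C_{dn_j} = C_g(v_j)$ [gate cap of $v_j$]
• $r$, $c$ = unit wire res, cap
• Has good fidelity
[Figure: path from driver v0 through vi to vj over a wire of length lij, with downstream cap Cdnj at vj]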
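The recurrence on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: the `Node` class, the function names, and the zero delay assumed at the driver (`d_root=0.0`) are hypothetical choices made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical routing-tree node (names are illustrative only)."""
    name: str
    length: float = 0.0      # wire length l_ij from the parent to this node
    gate_cap: float = 0.0    # C_g(v_j) for a sink pin, 0 for a Steiner node
    children: list = field(default_factory=list)


def downstream_cap(node, c):
    """C_dn of `node`: its gate cap plus all wire and gate cap below it."""
    return node.gate_cap + sum(
        c * child.length + downstream_cap(child, c) for child in node.children
    )


def elmore_delays(root, r, c, d_root=0.0):
    """Apply D(v_j) = D(v_i) + r*c*l^2/2 + r*l*C_dn(v_j) top-down.

    d_root is the delay at the driver; 0.0 is an assumption of this sketch.
    """
    delays = {root.name: d_root}
    stack = [root]
    while stack:
        parent = stack.pop()
        for child in parent.children:
            l = child.length
            delays[child.name] = (
                delays[parent.name]
                + r * c * l * l / 2
                + r * l * downstream_cap(child, c)
            )
            stack.append(child)
    return delays


# Small example: v0 -> vi -> {v1, v2}, with unit r and c.
v1 = Node("v1", length=1.0, gate_cap=1.0)
v2 = Node("v2", length=3.0, gate_cap=2.0)
vi = Node("vi", length=2.0, children=[v1, v2])
v0 = Node("v0", children=[vi])
delays = elmore_delays(v0, r=1.0, c=1.0)
```

The sketch recomputes the downstream capacitance at every node for brevity; a production router would cache it per node.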
Tolerance Concepts—Delay Tolerance
• $Tol_{del}(v_j)$ = max delay increase D that can be tolerated at $v_j$
• $Tol_{del}(v_j) = \min_{v_m \in children(v_j)} Tol_{del}(v_m)$
Tolerance Concepts—Capacitance Tolerances
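• $Tol_{cap}(v_j)$ = max cap incr. downstream in the subtree $T_{v_j}$ such that no sink pin on $T_{v_j}$ will be violated
• $R^{up}_{v_j}$ = upstream res. from $v_0$ to $v_j$
• $Tol_{cap}(v_j) = \min_{v_m \in children(v_j)} \left\{ Tol_{cap}(v_m) \cdot \frac{R^{up}_{v_m}}{R^{up}_{v_j}} \right\}$
• E.g., for a node $v_j$ with children $v_1$ and $v_3$: $Tol_{cap}(v_j) = \min\left\{ Tol_{cap}(v_1) \cdot \frac{R^{up}_{v_1}}{R^{up}_{v_j}},\ Tol_{cap}(v_3) \cdot \frac{R^{up}_{v_3}}{R^{up}_{v_j}} \right\}$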
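The two tolerance recursions can be illustrated with a short bottom-up pass over the routing tree. This is a hedged sketch: the slides do not give the base case at sink pins, so the code assumes a sink's delay tolerance equals its slack and its capacitance tolerance equals slack divided by its upstream resistance. It also assumes the upstream resistance is wire resistance only (no driver resistance). The function name and the dictionary layout are hypothetical.

```python
def tolerances(children, length, slack, r, root="v0"):
    """Return (r_up, tol_del, tol_cap) for every non-root node.

    children: node -> list of child nodes
    length:   node -> wire length from its parent
    slack:    sink -> slack (ASSUMED base case for both tolerances)
    r:        unit wire resistance
    """
    r_up = {root: 0.0}   # upstream resistance from the driver
    tol_del, tol_cap = {}, {}

    def visit(node):
        kids = children.get(node, [])
        for child in kids:
            r_up[child] = r_up[node] + r * length[child]
            visit(child)
        if node == root:
            return
        if not kids:
            # Sink pin: assumed base cases, not stated on the slides.
            tol_del[node] = slack[node]
            tol_cap[node] = slack[node] / r_up[node]
        else:
            # Steiner node: the recursions from the two tolerance slides.
            tol_del[node] = min(tol_del[m] for m in kids)
            tol_cap[node] = min(
                tol_cap[m] * r_up[m] / r_up[node] for m in kids
            )

    visit(root)
    return r_up, tol_del, tol_cap


# Example matching the slide's picture: vj has children v1 and v3.
children = {"v0": ["vj"], "vj": ["v1", "v3"]}
length = {"vj": 2.0, "v1": 2.0, "v3": 6.0}
slack = {"v1": 8.0, "v3": 4.0}
r_up, tol_del, tol_cap = tolerances(children, length, slack, r=1.0)
```

In this example the far sink v3 is the binding one for both tolerances at vj, even though it sits on a different branch from v1.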
Global Routing: Connecting a New Pin with the Interval Intersection (IntAl) Algorithm
[Figure: partial routing tree T with driver V0, nodes V1 to V6, Vi, Vj, V’, closest connection point CC, and new pin Vu connecting at distance x; sink classes Set1, Set2, Set3; Δcap at the connection; plot of delay vs. x showing the concave functions, the intersection of their valid intervals, and the optimal point on the nearest valid edge]
Slack-satisfaction inequalities in x:
• $f(x) = -r c x^2 + b x + d \le 0$ (concave function)
• $g(x) = -r c x^2 + e x + m \le 0$
• $h(x) = c x + l \le 0$
Properties
• $Tol_{cap}(v') = Tol_{cap}(v_j) \cdot R^{up}(v_j) / R^{up}(v')$
• Theorem 1: The tree truncation method in the IntAl algorithm will always find a slack-satisfying connection point for the new pin on the nearest valid edge, if one exists, of the partial routing tree T
• Updates of tolerances are done on an as-needed basis. E.g., a change in delay at a sink pin due to re-routing is only propagated to ancestors’ tolerances. Later, when a node’s tolerance is needed, it may not be updated, but this is accomplished by scanning all its ancestors.
• Θ(h) time complexity per re-routing of T, where h is T’s height
IntAl Algorithm
• Determine valid intervals for each inequality
• If all intervals are non-empty, take intersection
• If intersection non-empty
  – then take bottom point of intersection as min-WL slack-satisfaction point
  – else prune tree and repeat process with next nearest pruned-tree branch
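The interval steps of the algorithm above can be sketched as follows. This is a simplified illustration under stated assumptions, not the IntAl implementation. Each inequality is assumed to have the form a·x² + b·x + c <= 0 on one tree edge of a given length, and x is assumed to be measured from CC, so the smallest feasible x is taken as the min-WL point. The tree-pruning fallback is left out: the sketch just returns `None` when the intersection is empty. All function names are hypothetical.

```python
import math


def valid_intervals(a, b, c, length):
    """Sub-intervals of [0, length] where a*x^2 + b*x + c <= 0."""
    if a == 0:
        if b == 0:
            pieces = [(-math.inf, math.inf)] if c <= 0 else []
        elif b > 0:
            pieces = [(-math.inf, -c / b)]   # satisfied left of the root
        else:
            pieces = [(-c / b, math.inf)]    # satisfied right of the root
    else:
        disc = b * b - 4 * a * c
        if disc < 0:
            # No real roots: the sign of the quadratic is the sign of a.
            pieces = [(-math.inf, math.inf)] if a < 0 else []
        else:
            sq = math.sqrt(disc)
            x1, x2 = sorted(((-b - sq) / (2 * a), (-b + sq) / (2 * a)))
            # Concave (a < 0): valid outside the roots; convex: between them.
            pieces = [(-math.inf, x1), (x2, math.inf)] if a < 0 else [(x1, x2)]
    clipped = []
    for lo, hi in pieces:
        lo, hi = max(lo, 0.0), min(hi, float(length))
        if lo <= hi:
            clipped.append((lo, hi))
    return clipped


def intersect(set_a, set_b):
    """Intersect two lists of closed intervals."""
    out = []
    for lo1, hi1 in set_a:
        for lo2, hi2 in set_b:
            lo, hi = max(lo1, lo2), min(hi1, hi2)
            if lo <= hi:
                out.append((lo, hi))
    return sorted(out)


def min_wl_point(inequalities, length):
    """Smallest x in [0, length] satisfying every (a, b, c), or None."""
    feasible = [(0.0, float(length))]
    for a, b, c in inequalities:
        feasible = intersect(feasible, valid_intervals(a, b, c, length))
        if not feasible:
            return None   # the real algorithm would prune the tree and retry
    return feasible[0][0]


# One concave quadratic and two linear constraints on an edge of length 5.
x = min_wl_point([(-1, 4, -3), (0, 1, -4), (0, -1, 0.5)], length=5)
```

Because a concave inequality is satisfied outside its roots, each constraint can contribute two disjoint intervals, which is why the sketch intersects lists of intervals rather than single intervals.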
Timing-Driven Incremental Detailed Routing
Detailed routing of new net based on:
1) WL + via + bumping min-cost path for each global route
2) Constraint-satisfying DFS-based partial-B&R process for “overlapped” or “bumped” existing nets so that: a) slacks not violated, b) WL-increment bounded
[Figure: new net nj overlapping existing nets n1, n2, n3, with adjacent vias (Adj-via) and a bumped seg.]
• If a portion of net ni is overlapped
  – Length of overlapped portion might be increased.
  – Increase the capacitance.
  – Slack of sink pins might be violated.
• Possible overlapping:
  – With leaf interconnect
  – Interior edge
  – Steiner point
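TD Incr. Detailed Routing: Overlapping a Leaf Interconnect
[Figure: net n1 overlapping a leaf interconnect of routing tree T of net n2 (driver V0; nodes Vi, Vj, V1, V2, V4, V5, V6), adding Δcap at the overlapped leaf edge]
• Self Test: $D(v_2) \le S(v_2)$
• Downstream Test: $\Delta cap \le Tol_{cap}(v_j)$
• Upstream Test: $\Delta cap \le Anc\_Tol_{cap}(v_j)$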
TD Incr. Detailed Routing: Overlapping an Interior Interconnect
[Figure: net n1 overlapping an interior edge of tree T of net n2 between Vi and Vj, creating points V’j and V’’j and adding Δcap; nodes V0, V2 to V6, Vm; annotations D(v’j), D(v2), D(vm)]
• Upstream Test: $\Delta cap \le Anc\_Tol_{cap}(v_j)$
• Downstream Test:
  1) D(v’j) <= Toldel(v’j) = Toldel(vj)
  2) For each child vk (sink pin or Steiner node): D(vk) <= Toldel(vk) [e.g., D(v2) <= Toldel(v2), D(vm) <= Toldel(vm)]
Do these checks only
TD Incr. Detailed Routing: Overlapping Steiner Point
[Figure, panels (a), (b), (c): net n1 overlapping Steiner point Vj of tree T of net n2 (nodes Vn, Vi, Vk, V1 to V5); the Steiner point is relocated to V’j or V’k, with the resulting Δcap]
(b) Moving vj upwards will increase capacitance and resistance
(c) Moving vj downwards will decrease cap. and change the Steiner node
Constraint Satisfying DFS-Controlled Routing with Partial B&R
• Adapted from [Arslan & Dutt, ICCD’04]
[Figure: layout with new net nj and existing nets n1, n2, n3 (segments n1.b-seg, n2.h1, n2.h2, n3.h1, n3.v1), and the DFS tree rooted at n1.b-seg with branches P1 and P2 leading to “n2.pin or obs” and to n2.h2; Pi = i-via path is explored]
• Exploring a richer solution space via partial bump-&-reroute (B&R) of existing nets
• Constraint of minimal effect on B&R’ed nets needs to be satisfied: slack satisfaction, min-WL
• DFS retractions:
  – pin or logic as obstacles
  – ancestor nets bumped
  – slack violation of current net
  – WL of currently re-routed net excessive
  – other constraints
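The DFS control with retractions can be illustrated on a toy model. This sketch makes strong simplifying assumptions and is not the authors' router. Each net has an ordered list of candidate re-route paths (P1, P2, ...), and each path is described only by what it runs into: the string `"obs"` for a pin or logic obstacle, or the name of another net that would have to be bumped. The `constraints_ok` callback stands in for the slack and WL checks. The function name and data layout are hypothetical, and conflicts between sibling sub-plans are ignored.

```python
def reroute(net, paths, constraints_ok, ancestors=()):
    """Depth-first search for a re-route of `net`.

    paths[net] is an ordered list of candidate paths; each path is the list
    of blockers it hits ("obs" or a net name); an empty list is a free path.
    Returns {net: chosen path index, ...} for every net re-routed, or None.
    """
    for idx, blockers in enumerate(paths[net]):
        # Retraction: slack violated or WL excessive for this candidate.
        if not constraints_ok(net, idx):
            continue
        # Retraction: obstacle, or the path would bump an ancestor net.
        if any(b == "obs" or b == net or b in ancestors for b in blockers):
            continue
        plan = {net: idx}
        for blocker in blockers:
            # Partial B&R: recursively find a new route for the bumped net.
            sub_plan = reroute(blocker, paths, constraints_ok,
                               ancestors + (net,))
            if sub_plan is None:
                break             # this candidate fails; try the next one
            plan.update(sub_plan)
        else:
            return plan           # every bumped net found a valid re-route
    return None                   # all candidates exhausted: retract


# Toy instance: n1's segment was bumped by the new net.
paths = {
    "n1": [["obs"], ["n2"]],         # P1 hits an obstacle, P2 bumps n2
    "n2": [["n1"], ["n3"], []],      # P1 hits ancestor n1, P2 bumps n3, P3 free
    "n3": [["obs"], ["n1"]],         # both candidates are dead ends
}
plan = reroute("n1", paths, constraints_ok=lambda net, idx: True)
```

Here n1 takes its second candidate and n2 its third, while n3 is left untouched because neither of its candidates survives the retraction rules.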
DFS-Controlled Routing with Partial B&R
[Figure, built up over five slides: layout with new net nj and existing nets n1, n2, n3 (segments n1.b-seg, n2.h1, n2.h2, n3.h1, n3.v1; label VGL), and the growing DFS tree rooted at n1.b-seg with nodes n2.h1, n2.h2, n3.v1, n3.h1; edges labeled P1, P2, P2-P3, P2-P4 (Pi = i-via path is explored); retraction leaves labeled “obs”, “n2.pin or obs”, “anc”, and “obs or anc.n1 or anc.nj”]
• Net lengths and # of vias of all modified/bumped nets unchanged
Benchmark Circuits
• Benchmarks:
  – 10 circuits ranging from 1643 to 10435 nets & 7200 to 47520 pins
  – Base 2x2 tile of Mcc1 bench. is repl. with diff. cell sizes and diff. # of pins
  – Nets randomly generated & routed using SERT/SOAR
  – Net distribution: 2-pin: 30%, 3-4 pins: each 20%, 5 pins: 10%, 6-7 pins: each 5%, 8-10 pins: each 2%, 11-14 pins: each 1%
  – Pin slacks normally distributed in range [0, 5% max delay on net]
• Ran on 2.6 GHz Pentium Linux machines, 1 GB RAM
• Simulation: Randomly deleted 10% nets & rand. gen. 10% new nets
• Evaluation: Crash Test: routing as many nets as possible under the constraint of only 2 metal layers & slack satisfaction (TD-S, TD-R, TIDE)
  – TD-S (TD-R) is SERT/SOAR overlaid on Std (R&R)
Results
Times TIDE is better
1) % Unrouted Nets
[Bar charts: % unrouted nets for TD-S, TD-R, and TIDE, per circuit (Ckt1 1.6k, Ckt2 2.3k, Ckt3 2.9k, Ckt4 3.6k, Ckt5 4.3k, Ckt6 4.9k, Ckt7 5.6k, Ckt8 6.3k, Ckt9 7.6k, Ckt10 10.4k nets; Avg 5.0k; Glob nets) and as averages over Ckts and Glob nets; chart annotations: 7x, 6.7x, 9.8x, 9.5x]
2) Slack Violations
[Bar charts: slack violations for TD-S, TD-R, and TIDE over the same circuits, plus averages over Ckts and Glob nets; chart annotations: 5.6x, 4.7x, 5.3x, 4.2x]
Results
Times TIDE is better
3) Average Routed Net Length
[Bar charts: average routed net length for TD-S, TD-R, and TIDE on Ckt1 to Ckt10, plus averages over Ckts and Glob nets; chart annotations: 6.7x, 3x]
4) Vias per New Net
[Bar charts: vias per new net for TD-S, TD-R, and TIDE on Ckt1 to Ckt10, plus averages over Ckts and Glob nets; chart annotations: 4.4x, 2.6x]
Results
Times TIDE is better
5) Modified Nets per New Net
[Bar charts: modified nets per new net for TD-S, TD-R, and TIDE on Ckt1 to Ckt10, plus averages over Ckts and Glob nets; chart annotations: 9.5x, 2x]
6) Runtime
[Bar charts: runtime (secs) for TD-S, TD-R, and TIDE on Ckt1 to Ckt10, plus the average; chart annotations: 2.4x, 0.5x]
Conclusions
• New TD Incremental Routing Algorithm TIDE
  – Uses new concepts of derived tolerances @ Steiner nodes to:
    a) In global routing: quickly determine slack-satisfying near-min-WL connection of the next pin for a new net routing
    b) In detailed routing: quickly determine slack satisfaction of B&R’ed nets
  – In global routing, the IntAl algorithm is the first to prune trees (proven correct) for determining the next nearest connection point after the most recent attempt failed.
  – In detailed routing, high-level DFS control → good routing soln. for new nets with min. impact on existing nets
  – Produces significant improvement over TD-Std and TD-R&R in all important metrics of interest and is reasonably fast
• Future Work
  – TD incremental placement and integration with TIDE
THANK YOU