Transcript ppt
Efficient Timing-Driven Incremental Routing for VLSI Circuits Using DFS and Localized Slack-Satisfaction Computations
Hasan Arslan and Shantanu Dutt
Electrical & Computer Eng., University of Illinois at Chicago
DATE 2006

Outline
• Introduction
  – Importance of Incremental Routing
  – Previous Work
  – Our Goals
• A DFS-Based TD Incr Routing Algorithm (TIDE)
  – Previous work on global TD routing
  – Global Routing with slack tolerance concepts
  – Detailed Routing with DFS-based B&R
• Experimental Results
• Conclusion

Dutt & Arslan, UIC

Incremental Routing
• After chip layout is completed
• Need correcting changes to the circuit/system
  – Time/noise/thermal violation
  – One or more optimization metrics (speed/power/area) unsatisfactory
• Engineering Change Order (ECO) process
  – Enormous resources and time already spent
  – Time to meet market requirements
• Most ECOs lead to a requirement of routing changes after various design changes at earlier levels
• The ECO could also be at the routing level
• Incremental routing & interconnects critical
• Need a time-efficient & effective TD-incremental routing algorithm

Incremental Routing (Cont.)
Incremental Routing Problem
• Set of existing routed nets R = E – D, E = original nets before ECO, D = deleted nets
• Set of new nets S (resulting from correcting re-synthesis at different levels of the VLSI design flow)
Quality metrics
• Time-efficient near-optimal incr solns for S subject to given constraints (slack satisfaction, crosstalk bounding, etc.)
• Minimal changes to previous routing results
• Complete incr routing in the available metal layers (if such a soln exists)

Prior Work on Incremental Routing
1) Emmert and Bhatia, “Incremental Routing in FPGA”, IEEE Int. ASIC Conference, 1998.
2) Cong and Sarrafzadeh, “Incremental Physical Design”, ISPD 2000.
3) Dutt, Shanmugavel and Trimberger, “Efficient Incremental Rerouting for Fault Reconfiguration in FPGAs”, ICCAD 1999.
4) Dutt, Verma and Arslan, “A Search-Based Bump and Refit Approach to Incremental Routing for ECO Applications in FPGAs”, TODAES 2002.
5) Xiang, Chao and Wong, “An ECO Algorithm for Eliminating Crosstalk Violations”, ISPD 2004.
6) S. Raman, et al., “A Timing-Constrained Incremental Routing Algorithm for Symmetrical FPGAs”, DATE 1996.
7) H. Arslan and S. Dutt, “A Depth-First Search Controlled Gridless Incremental Routing Algorithm for VLSI Circuits”, ICCD 2004.
No work on TD incremental routing for ASICs

Prior Work (Cont.)
• Emmert-Bhatia (ASIC’98)
  – Nets connected to a faulty PLB are deleted and rerouted
  – Standard single-net routing mode (global then detailed)
  – Do not perturb or move existing nets
• Cong-Sarrafzadeh (ISPD’00)
  – Standard Net Routing: Route new nets without perturbing existing nets
  – Rip & Reroute: If some nets cannot be routed, rip up “blocking” existing nets. Reroute the ripped-up nets.

Our Goals
• TD-incremental routing for VLSI (ASIC) circuits
• Address quality metrics of incr. routing and satisfy constraints
  – Satisfy slack constraints on new and existing nets that may be affected by new net routing
  – Fast incr solutions that are near-optimal in WL and vias
  – Min. changes to existing routing: bounded WL and via increase
  – Complete incr routing in the available metal layers: aggressive exploration of routing space within above constraints

High-Level Approach
Global routing of new net based on:
1) A new iterative slack-satisfaction algorithm IntAl for connecting the next pin on the net based on local slack tolerances
2) Congestion + WL + efficiency-based min-cost on a grid-graph for each 2-pin path
Detailed routing of new net based on:
1) WL + via + bumping (degree of bumped net) min-cost path for each global route
2) Constraint-satisfying DFS-based partial B&R process for “overlapped” or “bumped” existing nets so that: a) slacks not violated, b) WL-increment bounded
[Figure: grid-graph global route (nodes V0, V1, V2, Vi, Vj, Vm) and a detailed route showing adj-vias and a bumped seg. among nets n1, n2, n3, nj]
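The min-cost 2-pin path on a grid-graph mentioned in the High-Level Approach can be illustrated with a small sketch. This is not the cost function used in TIDE, which the slides do not spell out: it assumes a hypothetical cost of one unit of WL per grid step plus `alpha` times the congestion of the tile entered, and it leaves out the efficiency term entirely. The names `min_cost_path`, `congestion` and `alpha` are illustrative only.

```python
import heapq

def min_cost_path(congestion, src, dst, alpha=1.0):
    """Dijkstra search on a 4-connected grid-graph.

    Assumed (hypothetical) step cost: 1 unit of wirelength plus
    alpha * congestion of the tile being entered.
    Returns (path, cost), or (None, inf) if dst is unreachable.
    """
    rows, cols = len(congestion), len(congestion[0])
    dist = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == dst:
            break
        if cost > dist[node]:
            continue  # stale heap entry
        r, c = node
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                new_cost = cost + 1.0 + alpha * congestion[nr][nc]
                if new_cost < dist.get((nr, nc), float("inf")):
                    dist[(nr, nc)] = new_cost
                    prev[(nr, nc)] = node
                    heapq.heappush(heap, (new_cost, (nr, nc)))
    if dst not in dist:
        return None, float("inf")
    # Walk the predecessor links back from dst to src.
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    path.reverse()
    return path, dist[dst]

# A congested column in the middle pushes the route around it.
grid = [[0, 9, 0],
        [0, 9, 0],
        [0, 0, 0]]
path, cost = min_cost_path(grid, (0, 0), (0, 2), alpha=1.0)
```

With `alpha=0.0` the same call reduces to a plain shortest-WL path straight through the congested column, which shows how the weight trades wirelength against congestion.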
Our Approach – TD Global Routing
• In iteratively connecting pins to the routing tree, the most imp. Q: Where to connect the next pin for slack-satisfaction of all pins and min-WL?
• Simplified rule of thumb:
  – The closer the connection is to CC, the more interconnect sharing there is w/ partial tree T & the less additional delay is seen by other sinks. But more “baggage” (accumulated delay) for new pin vu.
  – The higher up and farther from CC the connection is, the smaller the accumulated delay of T seen by vu, but there is less sharing and more delay seen by other sinks due to more wire-cap load
• Classic Prim-Dijkstra tradeoffs discussed in [Alpert et al., TCAD’95]
• Optimal solution (even just a slack-satisfying soln.) lies somewhere in between
• No one has solved it exactly (satisfying all slacks) or optimally (w/ min-WL)
• We provide a near-min-WL all-slack-satisfying solution here
[Figure: partial routing tree T with driver v0, sinks v1, v2, v3, node vi, closest connection point CC, and new pin vu]

Various Approaches to TD Global Routing
• In [Boese, et al., TCAD’95] (SERT/ERT algorithm):
  – The delay on any sink is a concave function of lx, the distance from CC of the connection point
  – Same for weighted sum of all sink delays (obj. func): min. @ either vi or CC
  – Choose vi or CC based on min-WL delay
• Does not solve the core TD problem of slack satisfaction
[Figure: routing tree with connection point vx at distance lx from CC, between CC and vi]

Various Approaches to TD Global Routing (contd)
• In [Hou & Sapatnekar, ISPD’98] (MVERT):
  – Constraint satisfaction of all sinks vk is explicitly considered: $d(v_k) - slack(v_k) \le 0$ (LHS is also concave); uses non-Hanan points
  – The technique involves navigating the max envelope $\max_k [d(v_k) - slack(v_k)]$ via intersection points of the various concave curves
  – Use binary search to find min-WL point for constr. satisfaction
[Figure: max envelope of (delay - slack) curves versus lx, marking the optimal slack-satisfying conn. point]
• Time complexity: [expression missing in transcript], where k = # sink pins (our analysis)
• Misses some slack satisfaction solutions from initial SERT/ERT handoff

Various Approaches to TD Global Routing (contd)
• As our competitor, we consider a mix of SERT [Boese, TCAD’95] and SOAR [Wang and Kuh, MCMC’97] (SERT/SOAR):
  – Check if connection to CC satisfies all slacks
  – Else make a connection to driver v0
  – Rationale: If connection to CC violates the slack of vu, then this is most likely due to the “baggage” delay of shared interconnects. This can be avoided maximally by routing directly to v0
• Classical Prim-Dijkstra tradeoff
• Fast

Our Approach to TD Global Routing
• Exact slack satisfaction of all sinks in “constant” time by checking satisfaction of derived slacks (called tolerances) as a function of lx for only 3 classes of sinks:
  – vu
  – sinks in T(CC), where T(u) is the routing subtree rooted at u (e.g., v3)
  – sinks below T(vi)/T(CC) (e.g., v1, v2)
• For this we need tolerance concepts discussed next
[Figure: routing tree with derived tolerances at its nodes, connection point vx at distance lx from CC, and new pin vu]

Elmore Delay Model
• $D(v_j) = D(v_i) + \frac{r\,c\,l_{ij}^2}{2} + r\,l_{ij}\,C_{dn_j}$
• $C_{dn_j}$ = gate + wiring capacitance of subtree rooted at $v_j$
• If $v_j$ is a sink pin, $C_{dn_j} = C_g(v_j)$ [gate cap of $v_j$]
• r, c = unit wire res, cap
• Has good fidelity

Tolerance Concepts: Delay Tolerance
• $Tol_{del}(v_j)$ = max delay increase $\Delta D$ that can be tolerated at $v_j$
• $Tol_{del}(v_j) = \min_{v_m \in children(v_j)} Tol_{del}(v_m)$

Tolerance Concepts: Capacitance Tolerances
• $Tol_{cap}(v_j)$ = max cap incr. downstream in the subtree $T_{v_j}$ such that no sink pin on $T_{v_j}$ is violated
• $R^{up}_{v_j}$ = upstream res. from $v_0$ to $v_j$
• $Tol_{cap}(v_j) = \min_{v_m \in children(v_j)} \frac{Tol_{cap}(v_m)\,R^{up}_{v_m}}{R^{up}_{v_j}}$
• Example with children $v_1$ and $v_3$: $\min\left(\frac{Tol_{cap}(v_1)\,R^{up}_{v_1}}{R^{up}_{v_j}},\ \frac{Tol_{cap}(v_3)\,R^{up}_{v_3}}{R^{up}_{v_j}}\right)$

Global Routing: Connecting a New Pin with the Interval Intersection (IntAl) Algorithm
• Theorem 1: The tree truncation method in the IntAl algorithm will always find a slack-satisfying connection point for the new pin, if one exists, of the partial routing tree T
• Properties: $Tol_{cap}(v') = Tol_{cap}(v_j)\,R^{up}_{v_j} / R^{up}_{v'}$
• Updates of tolerances done on an as-needed basis. E.g., a change in delay at a sink pin due to re-routing is only propagated to ancestors’ tolerances. Later, when a node’s tolerance is needed, it may not be updated, but this is accomplished by scanning all its ancestors. Θ(h) time complexity per re-routing of T, where h is T’s height
• Inequalities in the connection position x (concave functions):
  – $f(x) = -r\,c\,x^2 + b\,x + d \le 0$
  – $g(x) = -r\,c\,x^2 + e\,x + m \le 0$
  – $h(x) = c\,x + l \le 0$
[Figure: routing tree (V0 to V6, Vi, Vj, V’, new pin Vu) with delay and Δcap labels; valid sets Set1, Set2, Set3 along x, their intersection, the optimal point, and the nearest valid edge from CC]
IntAl Algorithm
• Determine valid intervals for each inequality
• If all intervals are non-empty, take intersection
• If intersection non-empty
  – then take bottom point of intersection as min-WL slack-satisfaction point
  – else prune tree and repeat process with next nearest pruned-tree branch

Timing-Driven Incremental Detailed Routing
Detailed routing of new net based on:
1) WL + via + bumping min-cost path for each global route
2) Constraint-satisfying DFS-based partial B&R process for “overlapped” or “bumped” existing nets so that: a) slacks not violated, b) WL-increment bounded
• If a portion of net ni is overlapped:
  – Length of overlapped portion might be increased
  – This increases the capacitance
  – Slack of sink pins might be violated
• Possible overlapping:
  – With leaf interconnect
  – Interior edge
  – Steiner point
[Figure: detailed route showing adj-vias and a bumped seg. among nets n1, n2, n3, nj]
Detailed Routing: Overlapping a Leaf Interconnect
• Self Test: $\Delta D(v_2) \le S(v_2)$
• Downstream Test: $\Delta cap \le Tol_{cap}(v_j)$
• Upstream Test: $\Delta cap \le Anc\_Tol_{cap}(v_j)$
[Figure: tree T of net n1 (V0, Vi, Vj, V1, V2, V4, V5, V6) with net n2 overlapping the leaf interconnect to V2, adding Δcap]

TD Incr. Detailed Routing: Overlapping an Interior Interconnect
• Upstream Test: $\Delta cap \le Anc\_Tol_{cap}(v_j)$
• Downstream Test:
  1) $\Delta D(v'_j) \le Tol_{del}(v'_j) = Tol_{del}(v_j)$
  2) For each child $v_k$ (sink pin or Steiner node): $\Delta D(v_k) \le Tol_{del}(v_k)$ [e.g., $\Delta D(v_2) \le Tol_{del}(v_2)$, $\Delta D(v_m) \le Tol_{del}(v_m)$]
• Do these checks only [text truncated in source]
[Figure: tree T of net n1 with net n2 overlapping an interior interconnect; nodes V’j, Vj, V’’j, Vm, V2, V3, V4 with delay changes at V’j, V2 and Vm]

TD Incr. Detailed Routing: Overlapping Steiner Point
• (b) Moving vj upwards will increase capacitance and resistance
• (c) Moving vj downwards decreases the cap. change
[Figure: panels (a) to (c) showing net n2 overlapping Steiner node Vj of tree T on net n1, the Steiner node moved to V’j, and the resulting Δcap]

Constraint Satisfying DFS-Controlled Routing with Partial B&R
• Adapted from [Arslan & Dutt, ICCD’04]
• Exploring a richer solution space via partial bump-&-reroute (B&R) of existing nets
• Constraint of minimal effect on B&R’ed nets needs to be satisfied: slack satisfaction, min-WL
DFS retractions:
• pin or logic as obstacles
• ancestor nets bumped
• slack violation of current net
• WL of currently re-routed net excessive
• other constraints
[Figure: DFS search tree rooted at the bumped segment n1.b-seg, with child nodes n2.h2 and n2.pin or obs reached via paths P1 and P2; Pi = i-via path is explored]

DFS-Controlled Routing with Partial B&R
[Figure sequence: the DFS search tree grown step by step over bumped segments n1.b-seg, n2.h1, n2.h2, n3.v1 and n3.h1. Branches are labelled with the explored path Pi (i-via path) and retractions are marked “obs” (pin or obstacle) or “anc” (ancestor net n1 or nj bumped)]
• Net lengths and # of vias of all modified/bumped nets unchanged

Benchmark Circuits
• Benchmarks: 10 circuits ranging from 1643 to 10435 nets & 7200 to 47520 pins
• Base 2x2 tile of Mcc1 bench. is repl. with diff. cell sizes and diff. # of pins
• Nets randomly generated & routed using SERT/SOAR
• Net distribution: 2-pin: 30%, 3-4 pins: each 20%, 5 pins: 10%, 6-7 pins: each 5%, 8-10 pins: each 2%, 11-14 pins: each 1%
• Pin slacks normally distributed in range [0, 5% max delay on net]
• Ran on 2.6 GHz Pentium Linux machines, 1GB RAM
• Simulation: Randomly deleted 10% nets & rand. gen.
10% new nets
• Evaluation: Crash Test, i.e., routing as many nets as possible under the constraint of only 2 metal layers & slack satisfaction (TD-S, TD-R, TIDE)
• TD-S (TD-R) is SERT/SOAR overlaid on Std (R&R)

Results
[Charts: each metric is plotted for TD-S, TD-R and TIDE per circuit (Ckt1 1.6k, Ckt2 2.3k, Ckt3 2.9k, Ckt4 3.6k, Ckt5 4.3k, Ckt6 4.9k, Ckt7 5.6k, Ckt8 6.3k, Ckt9 7.6k, Ckt10 10.4k nets; Avg 5.0k) and in an “Avg Glob nets” summary chart. Chart annotations give “Times TIDE is better”; the factors below are listed as they appear in the transcript]
1) % Unrouted Nets: 7x, 6.7x, 9.8x, 9.5x
2) Slack Violations: 5.6x, 4.7x, 5.3x, 4.2x
3) Average Routed Net Length: 6.7x, 3x
4) Vias per New Net: 4.4x, 2.6x
5) Modified Nets per New Net: 9.5x, 2x
6) Runtime (secs): 2.4x, 0.5x

Conclusions
• New TD Incremental Routing Algorithm TIDE
• Uses new concepts of derived tolerances @ Steiner nodes to:
  a) In global routing: quickly determine slack-satisfying near-min-WL connection of the next pin for a new net routing
  b) In detailed routing: quickly determine slack satisfaction of B&R’ed nets
• In global routing, the IntAl algorithm is the first to prune trees (proven correct) for determining the next nearest connection point after the most recent attempt has failed
• In detailed routing, high-level DFS control yields good routing solns. for new nets with min. impact on existing nets
• Produces significant improvement over TD-Std (TD-S) and TD-R&R (TD-R) in all important metrics of interest and is reasonably fast
Future Work
• TD incremental placement and integration with TIDE

THANK YOU
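The interval-intersection step of the IntAl algorithm described in the global routing slides can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each slack inequality is taken to have the form a·x² + b·x + d <= 0 with a <= 0 (concave or linear) over a candidate edge of length `length`, and the "bottom point" is assumed to be the smallest feasible x, measured from the end of the edge nearest CC. The names `valid_intervals`, `intersect` and `connection_point` are hypothetical, and tree pruning is left to the caller.

```python
import math

def valid_intervals(a, b, d, length):
    """Sub-intervals of [0, length] where a*x^2 + b*x + d <= 0, for a <= 0."""
    if a == 0:
        if b == 0:
            return [(0.0, length)] if d <= 0 else []
        root = -d / b
        if b > 0:                      # increasing line: valid for x <= root
            lo, hi = 0.0, min(length, root)
        else:                          # decreasing line: valid for x >= root
            lo, hi = max(0.0, root), length
        return [(lo, hi)] if lo <= hi else []
    disc = b * b - 4 * a * d
    if disc <= 0:                      # concave curve never rises above zero
        return [(0.0, length)]
    sq = math.sqrt(disc)
    x1, x2 = sorted(((-b - sq) / (2 * a), (-b + sq) / (2 * a)))
    out = []                           # curve is positive strictly between x1 and x2
    if x1 >= 0:
        out.append((0.0, min(x1, length)))
    if x2 <= length:
        out.append((max(x2, 0.0), length))
    return out

def intersect(first, second):
    """Pairwise intersection of two lists of closed intervals."""
    out = []
    for lo1, hi1 in first:
        for lo2, hi2 in second:
            lo, hi = max(lo1, lo2), min(hi1, hi2)
            if lo <= hi:
                out.append((lo, hi))
    return sorted(out)

def connection_point(inequalities, length):
    """Smallest x satisfying every inequality, or None if the edge fails.

    On None, IntAl would prune the tree and retry on the next nearest branch.
    """
    feasible = [(0.0, length)]
    for a, b, d in inequalities:
        feasible = intersect(feasible, valid_intervals(a, b, d, length))
        if not feasible:
            return None
    return feasible[0][0]

# -x^2 + 4x - 3 <= 0 holds for x <= 1 or x >= 3; -x + 2 <= 0 holds for x >= 2.
x = connection_point([(-1.0, 4.0, -3.0), (0.0, -1.0, 2.0)], 5.0)
```

Because a concave inequality can be valid on two disjoint pieces of the edge, the sketch keeps lists of intervals rather than a single interval per inequality.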