Buffering Interconnect From Basics to Breakthroughs

A Faster Approximation Scheme for Timing Driven Minimum Cost Layer Assignment
Shiyan Hu*, Zhuo Li**, and Charles J. Alpert**
*Dept of ECE, Michigan Technological University
**IBM Austin Research Lab
Outline
Introduction
Problem Formulation
The Algorithm
•Linear time dynamic programming
•Bound independent oracle search
Experimental Results
Conclusion
2
Layer Assignment
[Figure: metal layers labeled 4X, 2X and 1X]
In 45nm technology, layer assignment is critical for timing and buffer area optimization
3
Wire RC and Delay
[Figure: wire resistance and capacitance for layers M2, M4 and M6]
[Figure: wire delay versus wire length for layers M2, M4 and M6]
Wire in higher layer has much smaller delay
4
Impact to Buffering

• A buffer can drive longer distance in higher layer
• Timing is improved
• Fewer buffers are needed
5
Impact to Routing/Buffering
IP
IP
6
Problem Formulation
Given
– A buffered Steiner tree with n wire segments
– Timing constraint
– m wire layers with RC parameters and cost
Find a minimal cost layer assignment such that the timing constraint is satisfied.
[Figure: buffered tree annotated "Same Layer" and "Can be different layers"]
• A layer refers to a pair of horizontal and vertical layers with similar RC characteristics
• Between any buffers, one layer is used
• In early design stage, when buffering effect is considered, wire shaping is not important [Alpert TCAD’01]
• In post-routing stage, wire shaping could improve timing, reduce vias and reduce coupling and so forth

7
Fully Polynomial Time
Approximation Scheme (FPTAS)
• Provably good
• Within (1+ɛ) optimal cost for any ɛ>0
• Runs in time polynomial in n (segments), m (layers) and 1/ɛ
• Ultimate solution for an NP-hard problem in theory
• Highly practical
[Figure: metal layers labeled 4X, 2X and 1X]
8
Previous Work in ICCAD’08

• Previous FPTAS: runtime depends on M, the ratio between upper and lower bounds of the cost of optimal layer assignment
– New: bound independent oracle query
• Previous FPTAS: uses a DP of O(mn^3/ɛ^2) time, an iterative DP with incremental W
– New: our DP needs one run for all W
• New FPTAS runs in O(mn^2/ɛ) time
9
The Rough Picture
W*: the cost of optimal solution
Make guess on W*
Check it
Not good
Good (close to W*)
Return the solution
Key 1: Efficient checking
Key 2: Smart guess
10
Key 1: Efficient Checking
Benefit of guess
• Only maintain the solutions with cost no greater than the guessed cost
• Accelerate DP
11
The Oracle

Oracle (x): the checker, able to decide whether x>W* or not
– Without knowing W*
– Answer efficiently
Set up upper and lower bounds of cost W*
Guess x within the bounds
Oracle (x)
Update the bounds
12
Construction of Oracle(x)
Only interested in whether there is a solution with cost up to x satisfying timing constraint
Dynamic Programming
• Scale and round each wire cost
\[ w \leftarrow \left\lfloor \frac{w}{x\epsilon/n} \right\rfloor \]
• Perform DP to scaled problem with cost bound n/ɛ. Time polynomial in n/ɛ
13
Scaling and Rounding
• Rounding error at each wire is at most xɛ/n, total rounding error at most xɛ
• Wire cost is integer after scaling and rounding, with upper bound n/ɛ. Total # of solutions is bounded in DP
• Larger x: larger error, fewer distinct costs and faster
• Smaller x: smaller error, more distinct costs and slower
• Rounding is the reason of acceleration
[Figure: wire cost axis with rounding grid at 0, xɛ/n, 2xɛ/n, 3xɛ/n, 4xɛ/n]
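The scaling and rounding step can be illustrated with a minimal Python sketch. The function name `scale_and_round` and the sample costs are hypothetical, and rounding down (floor) is an assumption chosen to match the "cost at most n/ɛ • xɛ/n + xɛ" bound used for the oracle.

```python
import math

def scale_and_round(wire_costs, x, eps):
    """Scale each wire cost by the rounding factor x*eps/n and round down.

    Rounding down (floor) is an assumption of this sketch.
    """
    n = len(wire_costs)
    unit = x * eps / n                      # rounding factor x*eps/n
    scaled = [math.floor(w / unit) for w in wire_costs]
    return scaled, unit

costs = [1.0, 3.0, 3.0]                     # hypothetical wire costs
scaled, unit = scale_and_round(costs, x=7.0, eps=0.5)
# Error introduced at each wire when the integer cost is scaled back.
errors = [w - s * unit for w, s in zip(costs, scaled)]
print(scaled)                               # integer costs seen by the DP
print(sum(errors) <= 7.0 * 0.5)             # total error stays within x*eps
```

A larger guess x gives a coarser grid, so more wires collapse onto the same integer cost and the DP has fewer distinct costs to track.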
14
Dynamic Programming Results
DP result when all w are integers and cost ≤ n/ɛ
• Yes, there is a solution satisfying timing constraint: with cost rounding back, the solution has cost at most n/ɛ • xɛ/n + xɛ = (1+ɛ)x > W*
• No, no such solution: with cost rounding back, the solution has cost at least n/ɛ • xɛ/n = x ≤ W*
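The yes/no behavior of the oracle can be sketched on a deliberately simplified model: a single chain of wire segments whose delays simply add up, rather than a buffered tree with a real delay model. The function `oracle`, the `(delay, cost)` option lists and all numbers are hypothetical; floor rounding is again an assumption.

```python
import math

def oracle(segments, timing_budget, x, eps):
    """Return True if some layer choice with scaled cost <= n/eps meets timing.

    segments: one list per wire segment, holding a (delay, cost) pair per layer.
    Simplification (assumption): total delay is the sum of segment delays.
    """
    n = len(segments)
    unit = x * eps / n                       # rounding factor
    budget = math.floor(n / eps)             # cost bound in scaled units
    inf = float("inf")
    best = [inf] * (budget + 1)              # best[c]: min delay at scaled cost c
    best[0] = 0.0
    for options in segments:
        nxt = [inf] * (budget + 1)
        for c, d in enumerate(best):
            if d == inf:
                continue
            for delay, cost in options:
                c2 = c + math.floor(cost / unit)
                if c2 <= budget and d + delay < nxt[c2]:
                    nxt[c2] = d + delay
        best = nxt
    return min(best) <= timing_budget

# Three segments, two layers each: thin (slow, cheap) and thick (fast, costly).
segs = [[(10.0, 1.0), (4.0, 3.0)]] * 3
# Cheapest assignment meeting a delay budget of 20 uses two thick wires: W* = 7.
print(oracle(segs, 20.0, x=7.0, eps=0.5))    # "yes": W* < (1+eps)*x
print(oracle(segs, 20.0, x=3.0, eps=0.5))    # "no":  x <= W*
```

Only scaled costs up to n/ɛ are stored, which is what keeps the table small regardless of the magnitude of the original costs.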
15
Solution Characterization

To model effect to upstream, a candidate solution is associated with
• v: a node
• Q: required arrival time
• W: cumulative wire cost
16
Cost (W)-Bounded Dynamic
Programming (DP)



• Start from sinks
• Candidate solutions are generated
• Candidate solutions are propagated toward the source
• Two operations
– Subtree processing
– Solution update at buffer
• Solution Pruning
17
Subtree Processing

Three paths
– pa: a -> u
– pb: b -> u
– pc: c -> u
Wires are in the same layer l
Qu(l) = min{Qa - d(pa,l), Qb - d(pb,l), Qc - d(pc,l)}
Wu(l) = Wa + Wb + Wc + w(T,l)
[Figure: subtree rooted at u with downstream solutions (Qa,Wa), (Qb,Wb), (Qc,Wc) and resulting solution (Qu,Wu)]
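The two subtree formulas translate directly into code. The sketch below is illustrative only: `process_subtree`, the delay table standing in for d(p,l), the cost table standing in for w(T,l), and every number are hypothetical.

```python
def process_subtree(branches, path_delay, tree_cost, layer):
    """Combine downstream solutions at node u when all wires use `layer`.

    branches: name -> (Q, W) of the downstream solution on that branch.
    path_delay[(name, layer)]: stand-in for d(p, l) on the path to u.
    tree_cost[layer]: stand-in for w(T, l), the wire cost of the subtree.
    """
    q_u = min(q - path_delay[(name, layer)] for name, (q, _) in branches.items())
    w_u = sum(w for _, w in branches.values()) + tree_cost[layer]
    return q_u, w_u

branches = {"a": (100.0, 2), "b": (90.0, 1), "c": (120.0, 3)}
path_delay = {("a", "M2"): 30.0, ("b", "M2"): 10.0, ("c", "M2"): 45.0,
              ("a", "M6"): 10.0, ("b", "M6"): 4.0, ("c", "M6"): 15.0}
tree_cost = {"M2": 3, "M6": 9}

print(process_subtree(branches, path_delay, tree_cost, "M2"))  # (70.0, 9)
print(process_subtree(branches, path_delay, tree_cost, "M6"))  # (86.0, 15)
```

In this made-up example the higher layer buys 16 units of required arrival time for 6 extra units of cost, which is the trade-off the DP explores per layer.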
18
Exponential # of Solutions

For two solutions at a W
node
with
the same
(=n/ɛ)
solutions
,W )
at Q
each
downstream
W, the (Qone
with smaller
is dominated
buffer
(Q ,W )
Try to konly generate
non-dominated
(Q ,W )
 Naïve merging
takes
k) solutions
solutions since(Q most
of
O(W
,W )
O(Wk) time with k
,W )
are dominated(Q solutions
branches
u

a
(Qa,1,Wa,1)
(Qb,1,Wb,1)
(Qa,2,Wa,2)
(Qb,2,Wb,2)
(Qa,3,Wa,3)
(Qb,3,Wb,3)
(Qa,4,Wa,4)
(Qb,4,Wb,4)
c,1
c,1
c,2
c,2
c,3
c,3
c,4
c,4
19
Multi-Way Merging

If best Q for cost w is obtained by merging Q(a_{1,i_1}), Q(a_{2,i_2}), ..., Q(a_{k,i_k}), where i_1+i_2+…+i_k=w, best Q for cost w+1 is obtained by
\[ \max_{1 \le r \le k} \min\{Q(a_{1,i_1}), Q(a_{2,i_2}), \ldots, Q(a_{r,i_r+1}), \ldots, Q(a_{k,i_k})\} \]
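The incremental rule can be sketched as follows. This is a sketch under stated assumptions, not the authors' implementation: `multiway_merge` is a hypothetical name, each branch is given as a list whose entry i is the best Q for cost budget i (non-decreasing in i), and ties are broken by always advancing a branch that currently attains the minimum, which is the choice that can raise the minimum.

```python
def multiway_merge(branch_q, max_cost):
    """Best merged Q for every total cost 0..max_cost.

    branch_q[r][i]: best Q of branch r with cost budget i, non-decreasing in i.
    Each list must have at least max_cost + 1 entries.
    Assumption: advance the bottleneck branch (the one attaining the minimum).
    """
    k = len(branch_q)
    idx = [0] * k                            # current index i_r of each branch
    merged = [min(q[0] for q in branch_q)]   # best Q for total cost 0
    for _ in range(max_cost):
        # Pick the branch that limits Q and give it one more unit of cost.
        r = min(range(k), key=lambda j: branch_q[j][idx[j]])
        idx[r] += 1
        merged.append(min(branch_q[j][idx[j]] for j in range(k)))
    return merged

branch_q = [[1, 5, 6, 9], [1, 2, 10, 11], [3, 4, 100, 101]]
print(multiway_merge(branch_q, 3))           # [1, 1, 2, 3]
```

Each unit of cost is handled with O(k) work, so one layer costs O(kW) instead of the O(W^k) of enumerating all combinations.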
20
Four-Branch Example
Solution(w=8, Q=9) is shown.
To compute Solution (w=9, Q)
21
Four-Branch Example – Case 1
Candidate Solution (w=9, Q=8)
22
Four-Branch Example – Case 2
Candidate Solution (w=9, Q=4)
23
Four-Branch Example – Case 3
Candidate Solution (w=9, Q=5)
24
Four-Branch Example – Case 4
Candidate Solution (w=9, Q=7)
25
Linear Time Multi-Way Merging

Lemma: given a subtree with m layers, k
branches and W non-dominated solutions at each
downstream buffer, one can merge them in
O(mkW) time.
26
Solution Update at Buffer

• After merging, one non-dominated solution per layer per cost, totally O(mW) solutions
• For each cost, find largest Q for all layers after buffer and propagate it
[Figure: buffer at u with downstream solutions (Qa,Wa), (Qb,Wb), (Qc,Wc) and resulting solution (Qu,Wu)]
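A minimal sketch of the update at a buffer, assuming a hypothetical per-layer buffer delay in place of a real buffer delay model. The name `update_at_buffer` and all values are made up for illustration.

```python
def update_at_buffer(q_by_layer, buffer_delay):
    """For each cost, keep the largest Q over all layers after the buffer.

    q_by_layer[layer][w]: best Q at cost w when the subtree uses `layer`
    (use -inf where no solution exists).
    buffer_delay[layer]: hypothetical delay through the buffer for that layer.
    """
    num_costs = len(next(iter(q_by_layer.values())))
    return [
        max(q[w] - buffer_delay[layer] for layer, q in q_by_layer.items())
        for w in range(num_costs)
    ]

NEG_INF = float("-inf")
q_by_layer = {"M2": [50.0, 60.0, 70.0], "M6": [NEG_INF, 80.0, 95.0]}
buffer_delay = {"M2": 5.0, "M6": 8.0}
print(update_at_buffer(q_by_layer, buffer_delay))   # [45.0, 72.0, 87.0]
```

After this step only one solution per cost is propagated upstream, so the O(mW) solutions at the buffer input shrink to at most W.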
27
Cost-Bounded DP

Lemma: given a tree with n wire segments and m layers, the optimal layer assignment subject to cost budget W=n/ɛ can be computed in O(mnW)=O(mn^2/ɛ) time.
28
Key 2: Bound Independent Guess


U (L): upper (lower) bound on W*
Naive binary search style approach
Set U and L on W*
x=(U+L)/2 and W=n/ɛ
Oracle (x)
• W*<(1+ɛ)x: U=(1+ɛ)x
• W* ≥ x: L=x
Runtime depends on the initial bounds U and L
29
Adapt ɛ

Rounding factor xɛ/n for cost

Larger ɛ: faster with rough estimation

Smaller ɛ: slower with accurate estimation

Adapt ɛ and relate it with U and L
30
U/L Related Scale & Round
[Figure: wire cost axis starting at 0 with rounding intervals of xɛ/n, related to U/L]
31
Conceptually

Begin with large ɛ’ and progressively reduce it
according to U/L as x approaches W*
• Set ɛ’ as a geometric sequence of …, 8, 4, 2, 1,
1/2, …, ɛ
• One run of DP takes about O(n/ɛ) time. Total
runtime is O(… + n/8 + n/4 + n/2 + … + n/ɛ) =
O(n/ɛ). Independent of # of iterations
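The "independent of # of iterations" claim follows from a geometric sum. As a worked check, assume the sequence is exactly ɛ'_j = 2^j ɛ for j = 0, 1, 2, … (the most accurate run is j = 0) and that one DP run with ɛ'_j costs n/ɛ'_j:

```latex
\sum_{j \ge 0} \frac{n}{\epsilon'_j}
  = \sum_{j \ge 0} \frac{n}{2^{j}\epsilon}
  = \frac{n}{\epsilon} \sum_{j \ge 0} 2^{-j}
  = \frac{2n}{\epsilon}
  = O\!\left(\frac{n}{\epsilon}\right)
```

The last and most accurate run dominates; all coarser runs together cost no more than that final run.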
32
Oracle Query Till U/L<2
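Adapted ɛ and guess at iteration i:
\[ \epsilon'_i = \sqrt{\frac{W^*_{u,i}}{W^*_{l,i}}} - 1, \qquad x = \sqrt{\frac{W^*_{l,i}\, W^*_{u,i}}{1+\epsilon'_i}} \]
Ratio of the bounds after each oracle query:
\[ \frac{W^*_{u,i+1}}{W^*_{l,i+1}} = \left(\frac{W^*_{u,i}}{W^*_{l,i}}\right)^{3/4}, \qquad \frac{W^*_{u,i}}{W^*_{l,i}} = \left(\frac{W^*_{u,t}}{W^*_{l,t}}\right)^{(4/3)^{t-i}} \]
Total runtime of the oracle queries:
\[ O\Big(mn^2 \sum_{1 \le i \le t} \frac{1}{\epsilon'_i}\Big) = O\Big(mn^2 \sum_{1 \le i \le t} \Big(\frac{W^*_{l,i}}{W^*_{u,i}}\Big)^{1/2}\Big) = O\Big(mn^2 \sum_{1 \le i \le t} \Big(\frac{W^*_{l,t}}{W^*_{u,t}}\Big)^{\frac{1}{2}(4/3)^{t-i}}\Big) = O\Big(mn^2 \sum_{0 \le j \le t} 0.59^{\frac{1}{2}(4/3)^{j}}\Big) = O(mn^2) \]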
33
When U/L<2
• Scale and round each cost by Lɛ/n
• W=2n/ɛ
• Run DP. Runs in O(mn^2/ɛ) time
• At least one feasible solution, otherwise no solution w/ cost 2n/ɛ • Lɛ/n = 2L ≥ U
• Pick min cost solution satisfying timing at driver
34
FPTAS for Layer Assignment
Theorem: a (1+ɛ) approximation to the timing constrained minimum cost layer assignment problem can be computed in O(mn^2/ɛ) time for any ɛ>0.
35
The Algorithmic Flow
Set U and L of W*
While U/L ≥ 2:
– Adapting ɛ = (U/L)^(1/2) - 1
– Set x = [UL/(1+ɛ)]^(1/2)
– Oracle (x)
– Update U or L
When U/L<2: compute final solution
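The loop of this flow can be sketched in Python. The real oracle is a cost-bounded DP; here it is replaced by a mock that knows a hidden W* and answers consistently with the oracle guarantee ("yes" implies W* < (1+ɛ)x, "no" implies W* ≥ x). The names `bound_independent_search` and `make_mock_oracle` and all numbers are hypothetical.

```python
import math

def bound_independent_search(oracle, lower, upper):
    """Shrink [lower, upper] around W* until upper/lower < 2."""
    iterations = 0
    while upper / lower >= 2.0:
        eps = math.sqrt(upper / lower) - 1.0         # adapted epsilon
        x = math.sqrt(lower * upper / (1.0 + eps))   # guess within the bounds
        if oracle(x, eps):                           # "yes": W* < (1+eps)*x
            upper = (1.0 + eps) * x
        else:                                        # "no": W* >= x
            lower = x
        iterations += 1
    return lower, upper, iterations

def make_mock_oracle(w_star):
    """Stand-in for the DP-based oracle (assumption: W* is known here)."""
    def oracle(x, eps):
        return w_star < (1.0 + eps / 2.0) * x
    return oracle

lo, hi, iters = bound_independent_search(make_mock_oracle(137.0), 1.0, 1.0e6)
print(lo <= 137.0 <= hi, hi / lo < 2.0, iters)
```

With these update rules either branch turns the ratio U/L into (U/L)^(3/4), so the number of queries depends only doubly-logarithmically on the initial ratio, and the coarse early queries use a large ɛ and are therefore cheap.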
36
Experiments

Experimental Setup
– 1000 industrial nets
Compared to Dynamic Programming
and the previous FPTAS [ICCAD’08]

37
Cost Ratio Compared to DP
[Figure: Wire Cost Ratio (relative to DP) versus Approximation Ratio ɛ, for Old FPTAS and New FPTAS]
38
Speedup Compared to DP
[Figure: Speedup (relative to DP) versus Approximation Ratio ɛ, for Old FPTAS and New FPTAS]
39
Observations

FPTAS always achieves the theoretical guarantee

Larger ɛ leads to more speedup

3.9x faster with 2.2% additional wire area compared to DP

Up to 6.5x faster than DP

On average about 2x faster than previous FPTAS
40
Conclusion

Propose a (1+ɛ) approximation for timing constrained layer assignment for any ɛ > 0 running in O(mn^2/ɛ) time
– Linear time DP running in O(mnW) time
– Bound independent oracle query
– Up to 6.5x faster than DP and 2x faster than previous FPTAS
– Few percent additional wire area compared to DP as guaranteed theoretically
41
Thanks
42
