Buffering Interconnect From Basics to Breakthroughs
A Faster Approximation Scheme for Timing Driven Minimum Cost Layer Assignment
Shiyan Hu*, Zhuo Li**, and Charles J. Alpert**
*Dept of ECE, Michigan Technological University
**IBM Austin Research Lab
Outline
Introduction
Problem Formulation
The Algorithm
•Linear time dynamic programming
•Bound independent oracle search
Experimental Results
Conclusion
Layer Assignment
Figure: wire layer stack with layers labeled 4X, 2X and 1X.
In 45nm technology, layer assignment is critical for timing and buffer area optimization
Wire RC and Delay
Figure: wire resistance and capacitance on layers M2, M4 and M6.
Figure: wire delay versus wire length on layers M2, M4 and M6.
Wire in higher layer has much smaller delay
Impact to Buffering
A buffer can drive longer distance in higher layer
Timing is improved
Fewer buffers are needed
Impact to Routing/Buffering
Figure: layout with blocks labeled IP.
Problem Formulation
Given
– A buffered Steiner tree with n wire segments
– Timing constraint
– m wire layers with RC parameters and cost
Find a minimal cost layer assignment such that the timing constraint is satisfied.
A layer refers to a pair of horizontal and vertical layers with similar RC characteristics
Between any buffers, one layer is used
Figure: buffered tree annotated "Same Layer" and "Can be different layers".
In early design stage, when buffering effect is considered, wire shaping is not important [Alpert TCAD’01]
In post-routing stage, wire shaping could improve timing, reduce vias and reduce coupling and so forth
Fully Polynomial Time Approximation Scheme (FPTAS)
• Provably good
• Within (1+ɛ) optimal cost for any ɛ>0
• Runs in time polynomial in n (segments), m (layers) and 1/ɛ
• Ultimate solution for an NP-hard problem in theory
• Highly practical
Figure: wire layers labeled 4X, 2X and 1X.
Previous Work in ICCAD’08
It depends on M and uses a DP of O(mn³/ɛ²) time
– M: ratio between upper and lower bounds of the cost of optimal layer assignment
– An iterative DP with incremental W
New FPTAS runs in O(mn²/ɛ) time
– Bound independent oracle query
– Our DP needs one run for all W
The Rough Picture
W*: the cost of optimal solution
Make guess on W*
Check it
– Not good: make a new guess
– Good (close to W*): return the solution
Key 1: Efficient checking
Key 2: Smart guess
Key 1: Efficient Checking
Benefit of guess
• Only maintain the solutions with cost no greater than the guessed cost
• Accelerate DP
The Oracle
Oracle (x): the checker, able to decide whether x>W* or not
– Without knowing W*
– Answer efficiently
Flow: Setup upper and lower bounds of cost W* → Guess x within the bounds → Oracle (x) → Update the bounds
Construction of Oracle(x)
Dynamic Programming
Only interested in whether there is a solution with cost up to x satisfying timing constraint
Scale and round each wire cost: w' = ⌊w / (xɛ/n)⌋
Perform DP to scaled problem with cost bound n/ɛ. Time polynomial in n/ɛ
Scaling and Rounding
Figure: wire cost axis with rounding grid points 0, xɛ/n, 2xɛ/n, 3xɛ/n, 4xɛ/n.
Rounding error at each wire xɛ/n, total rounding error xɛ.
Wire cost is integer after scaling and rounding, with upper bound n/ɛ. Total # solutions is bounded in DP
• Larger x: larger error, fewer distinct costs and faster
• Smaller x: smaller error, more distinct costs and slower
• Rounding is the reason of acceleration
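The scaling and rounding step can be illustrated with a minimal sketch. It assumes rounding down (floor), which is consistent with the per-wire error bound xɛ/n above; the function name `scale_and_round` and the sample numbers are illustrative, not from the talk.

```python
import math

def scale_and_round(wire_costs, x, eps):
    """Scale each wire cost by x*eps/n and round down to an integer.

    Assumption: floor rounding; the slides only state the error bound.
    """
    n = len(wire_costs)
    step = x * eps / n  # rounding granularity x*eps/n
    rounded = [math.floor(w / step) for w in wire_costs]
    return rounded, step

# Illustrative data: three wires, guess x = 20, eps = 0.5.
costs = [3.0, 7.5, 12.0]
rounded, step = scale_and_round(costs, x=20.0, eps=0.5)

# Scaling back shows the rounding error introduced on each wire.
restored = [r * step for r in rounded]
errors = [w - v for w, v in zip(costs, restored)]
```

Each entry of `errors` stays below `step`, so the total error over n wires stays below xɛ.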
Dynamic Programming Results
DP on the scaled problem: all w are integers, cost bound n/ɛ
Yes, there is a solution satisfying timing constraint
– With cost rounding back, the solution has cost at most n/ɛ • xɛ/n + xɛ = (1+ɛ)x > W*
No, no such solution
– With cost rounding back, the solution has cost at least n/ɛ • xɛ/n = x ≤ W*
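To make the yes/no behaviour of Oracle(x) concrete, here is a sketch on a deliberately simplified model: a single chain of wire segments with no buffers or branches, where each segment picks one layer and the timing constraint is a limit on total delay. This model, the floor rounding, and all names (`oracle`, `seg_cost`, `seg_delay`, `delay_limit`) are assumptions for illustration; the talk's DP works on buffered trees.

```python
import math

def oracle(x, eps, seg_cost, seg_delay, delay_limit):
    """Return True if some assignment with rounded cost <= n/eps meets timing.

    seg_cost[i][l], seg_delay[i][l]: cost and delay of segment i on layer l.
    True implies W* <= (1 + eps) * x; False implies W* > x.
    """
    n = len(seg_cost)
    step = x * eps / n              # rounding granularity
    budget = math.floor(n / eps)    # cost bound of the scaled problem
    inf = float("inf")
    # best[c] = minimum total delay with rounded cost exactly c
    best = [inf] * (budget + 1)
    best[0] = 0.0
    for i in range(n):
        nxt = [inf] * (budget + 1)
        for c in range(budget + 1):
            if best[c] == inf:
                continue
            for l in range(len(seg_cost[i])):
                rc = c + math.floor(seg_cost[i][l] / step)
                if rc <= budget:  # only keep solutions within the guessed cost
                    nxt[rc] = min(nxt[rc], best[c] + seg_delay[i][l])
        best = nxt
    return min(best) <= delay_limit

# Illustrative data: two segments, layer 0 is cheap and slow, layer 1 costly and fast.
seg_cost = [[1.0, 4.0], [1.0, 4.0]]
seg_delay = [[10.0, 3.0], [10.0, 3.0]]
# With delay_limit = 13, one segment must use layer 1, so W* = 5.
answer_at_5 = oracle(5.0, 0.5, seg_cost, seg_delay, 13.0)
answer_at_2 = oracle(2.0, 0.5, seg_cost, seg_delay, 13.0)
```

The table has n/ɛ + 1 entries per segment, which is where the dependence on n/ɛ rather than on the raw cost values comes from.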
Solution Characterization
To model effect to upstream, a candidate solution is associated with
• v: a node
• Q: required arrival time
• W: cumulative wire cost
Cost (W)-Bounded Dynamic Programming (DP)
Start from sinks
Candidate solutions are generated
Candidate solutions are propagated toward the source
Two operations
– Subtree processing
– Solution update at buffer
Solution Pruning
Subtree Processing
Figure: node u with solution (Qu,Wu) and downstream buffers a, b, c with solutions (Qa,Wa), (Qb,Wb), (Qc,Wc).
Three paths
– pa: a -> u
– pb: b -> u
– pc: c -> u
Wires are in the same layer l
Qu(l) = min{Qa - d(pa,l), Qb - d(pb,l), Qc - d(pc,l)}
Wu(l) = Wa + Wb + Wc + w(T,l)
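The two formulas for Qu(l) and Wu(l) can be sketched directly. The path delays d(p,l) and the subtree wire cost w(T,l) are taken as given inputs here, since the slides do not state the delay model; the function name and the numbers are illustrative assumptions.

```python
def process_subtree(children, path_delay, tree_cost):
    """Combine downstream solutions at node u, once per layer.

    children: list of (Q, W) pairs, one per downstream buffer.
    path_delay[l][j]: delay d(p_j, l) of the path from child j to u on layer l.
    tree_cost[l]: wire cost w(T, l) of the subtree on layer l.
    Returns a list of (Q_u(l), W_u(l)), indexed by layer l.
    """
    total_w = sum(w for _, w in children)
    result = []
    for l in range(len(tree_cost)):
        # The required arrival time at u is limited by the worst path.
        q_u = min(q - path_delay[l][j] for j, (q, _) in enumerate(children))
        result.append((q_u, total_w + tree_cost[l]))
    return result

# Illustrative data: children a, b, c; layer 0 is slow and cheap, layer 1 fast and costly.
children = [(100, 2), (90, 3), (95, 1)]
path_delay = [[30, 20, 25], [10, 8, 9]]
tree_cost = [3, 9]
per_layer = process_subtree(children, path_delay, tree_cost)
```

In this example the higher layer gives a larger Q at u for a larger W, which is the trade-off the DP explores.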
Exponential # of Solutions
W (=n/ɛ) solutions at each downstream buffer
Naïve merging takes O(Wᵏ) time with k branches
For two solutions at a node with the same W, the one with smaller Q is dominated
Try to only generate non-dominated solutions since most of O(Wᵏ) solutions are dominated
Figure: node u with branches a, b, c; each downstream buffer holds solutions (Qa,i,Wa,i), (Qb,i,Wb,i), (Qc,i,Wc,i) for i = 1, ..., 4.
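The domination rule stated above (same W, smaller Q is dominated) can be sketched as a single pass that keeps the largest Q per cost. The function name and sample data are illustrative assumptions; stronger pruning across different costs is not shown.

```python
def prune_same_cost(solutions):
    """Keep, for each cost W, only the solution with the largest Q.

    solutions: iterable of (Q, W) pairs at one node.
    Returns the surviving (Q, W) pairs sorted by W.
    """
    best = {}
    for q, w in solutions:
        if w not in best or q > best[w]:
            best[w] = q
    return [(best[w], w) for w in sorted(best)]

# Illustrative data: two solutions at cost 2 and two at cost 3.
kept = prune_same_cost([(5, 2), (7, 2), (3, 1), (9, 3), (8, 3)])
```

Since costs are integers bounded by W after rounding, at most W+1 solutions survive per node and layer.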
Multi-Way Merging
If best Q for cost w is obtained by merging Q(a_1,i_1), Q(a_2,i_2), ..., Q(a_k,i_k), where i_1+i_2+…+i_k = w, best Q for cost w+1 is obtained by
$$\max_{1 \le r \le k} \min\{Q(a_1,i_1), Q(a_2,i_2), \ldots, Q(a_r,i_r+1), \ldots, Q(a_k,i_k)\}$$
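A minimal sketch of incremental multi-way merging follows. It assumes each branch stores its best Q per integer cost and that Q is non-decreasing in cost (as after pruning), and it ignores wire delays and layers. Among the k candidates in the formula it always raises the bottleneck branch, which is one specific tie-breaking choice and not necessarily the one used in the talk; all names are illustrative.

```python
def merge_branches(branches):
    """Best merged Q for every total cost w.

    branches[r][i] = best Q of branch r at integer cost i, non-decreasing in i.
    Returns best, where best[w] is the maximum over i_1 + ... + i_k = w of
    min_r branches[r][i_r].
    """
    k = len(branches)
    idx = [0] * k
    max_total = sum(len(b) - 1 for b in branches)
    best = [min(b[0] for b in branches)]
    for _ in range(max_total):
        # Branches whose cost index can still be increased by one.
        movable = [r for r in range(k) if idx[r] + 1 < len(branches[r])]
        # Raise the branch with the smallest currentQ (the bottleneck).
        r = min(movable, key=lambda j: branches[j][idx[j]])
        idx[r] += 1
        best.append(min(branches[j][idx[j]] for j in range(k)))
    return best

# Illustrative data: two branches with three cost levels each.
merged = merge_branches([[1, 1, 5], [2, 3, 4]])
```

Each step costs O(k), so producing all totals up to W costs O(kW) per layer instead of enumerating all index combinations.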
Four-Branch Example
Solution(w=8, Q=9) is shown.
To compute Solution (w=9, Q)
Four-Branch Example – Case 1
Candidate Solution (w=9, Q=8)
Four-Branch Example – Case 2
Candidate Solution (w=9, Q=4)
Four-Branch Example – Case 3
Candidate Solution (w=9, Q=5)
Four-Branch Example – Case 4
Candidate Solution (w=9, Q=7)
Linear Time Multi-Way Merging
Lemma: given a subtree with m layers, k branches and W non-dominated solutions at each downstream buffer, one can merge them in O(mkW) time.
Solution Update at Buffer
Figure: buffer at node u with solution (Qu,Wu) and downstream solutions (Qa,Wa), (Qb,Wb), (Qc,Wc).
After merging, one non-dominated solution per layer per cost, totally O(mW) solutions
For each cost, find largest Q for all layers after buffer and propagate it
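The buffer update can be sketched as a maximum over layers for each cost. The per-layer buffer delay is a hypothetical input standing in for whatever buffer delay model is used; the function name and the numbers are also illustrative.

```python
def update_at_buffer(q_by_layer, buffer_delay):
    """For each cost, keep the largest Q over all layers after the buffer.

    q_by_layer[l][w]: best Q on layer l at cost w (use float('-inf') if none).
    buffer_delay[l]: hypothetical delay of the buffer when driving layer l.
    Returns a list indexed by cost w.
    """
    num_costs = len(q_by_layer[0])
    return [
        max(q[w] - buffer_delay[l] for l, q in enumerate(q_by_layer))
        for w in range(num_costs)
    ]

# Illustrative data: two layers, three cost levels.
after_buffer = update_at_buffer([[10, 12, 15], [11, 16, 18]], [2, 3])
```

The pass touches each of the O(mW) solutions once and leaves one solution per cost to propagate upstream.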
Cost-Bounded DP
Lemma: given a tree with n wire segments and m layers, the optimal layer assignment subject to cost budget W=n/ɛ can be computed in O(mnW)=O(mn²/ɛ) time.
Key 2: Bound Independent Guess
U (L): upper (lower) bound on W*
Naive binary search style approach
– Set U and L on W*
– x=(U+L)/2 and W=n/ɛ
– Oracle (x)
– If W*<(1+ɛ)x, set U=(1+ɛ)x
– If W*≥x, set L=x
Runtime depends on the initial bounds U and L
Adapt ɛ
Rounding factor xɛ/n for cost
Larger ɛ: faster with rough estimation
Smaller ɛ: slower with accurate estimation
Adapt ɛ and relate it with U and L
U/L Related Scale & Round
Figure: wire cost axis starting at 0 with rounding intervals of xɛ/n, annotated with U/L.
Conceptually
Begin with large ɛ’ and progressively reduce it according to U/L as x approaches W*
• Set ɛ’ as a geometric sequence of …, 8, 4, 2, 1, 1/2, …, ɛ
• One run of DP takes about O(n/ɛ’) time. Total runtime is O(… + n/8 + n/4 + n/2 + … + n/ɛ) = O(n/ɛ). Independent of # of iterations
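The geometric sum can be written out explicitly. As a sketch, assume the values of ɛ’ are ɛ, 2ɛ, 4ɛ, ... (the index j and the exact doubling are illustrative assumptions) and that one DP run with ɛ’ costs about n/ɛ’:

```latex
\sum_{j \ge 0} \frac{n}{2^{j}\epsilon}
  = \frac{n}{\epsilon} \sum_{j \ge 0} 2^{-j}
  \le \frac{2n}{\epsilon}
  = O\!\left(\frac{n}{\epsilon}\right)
```

The last run, with ɛ’ = ɛ, dominates the total, so the number of iterations does not appear in the bound.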
Oracle Query Till U/L<2
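Let W*_{u,i} and W*_{l,i} be the upper and lower bounds on W* before the i-th oracle query, with
$$\epsilon'_i = \sqrt{W^*_{u,i}/W^*_{l,i}} - 1, \qquad x = \sqrt{\frac{W^*_{l,i}\,W^*_{u,i}}{1+\epsilon'_i}}$$
Each query shrinks the ratio of the bounds:
$$\frac{W^*_{u,i+1}}{W^*_{l,i+1}} = \left(\frac{W^*_{u,i}}{W^*_{l,i}}\right)^{3/4}, \qquad \frac{W^*_{u,i}}{W^*_{l,i}} = \left(\frac{W^*_{u,t}}{W^*_{l,t}}\right)^{(4/3)^{t-i}} \ge 2^{(4/3)^{t-i}}$$
Total time of the t queries:
$$O\Big(mn^2 \sum_{1 \le i \le t} \frac{1}{\epsilon'_i}\Big) = O\Big(mn^2 \sum_{1 \le i \le t} \frac{1}{2^{\frac{1}{2}(4/3)^{t-i}} - 1}\Big) = O\Big(mn^2 \sum_{0 \le j \le t} \frac{1}{2^{\frac{1}{2}(4/3)^{j}} - 1}\Big) = O(mn^2)$$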
When U/L<2
Scale and round each cost by Lɛ/n
W=2n/ɛ
Run DP
– At least one feasible solution, otherwise no solution w/ cost 2n/ɛ • Lɛ/n = 2L > U
– Runs in O(mn²/ɛ) time
Pick min cost solution satisfying timing at driver
FPTAS for Layer Assignment
Theorem: a (1+ɛ) approximation to the timing constrained minimum cost layer assignment problem can be computed in O(mn²/ɛ) time for any ɛ>0.
The Algorithmic Flow
Set U and L of W*
Adapting ɛ’ = (U/L)^(1/2) − 1
Set x = [UL/(1+ɛ’)]^(1/2)
Oracle (x)
Update U or L
Repeat from "Adapting ɛ’" until U/L<2
Compute final solution
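The loop can be sketched as follows, with the adaptive ɛ’ taken as the square root of U/L minus 1, which is the choice that makes each query reduce U/L to its 3/4 power. The oracle here is a stand-in that simply knows W*; `bound_independent_search`, `exact_oracle` and `W_STAR` are hypothetical names for illustration only.

```python
import math

def bound_independent_search(lower, upper, oracle):
    """Shrink [lower, upper] around W* until upper / lower < 2.

    oracle(x, eps) must return True only if W* <= (1 + eps) * x,
    and False only if W* >= x.
    """
    queries = 0
    while upper / lower >= 2:
        eps = math.sqrt(upper / lower) - 1          # adaptive eps'
        x = math.sqrt(lower * upper / (1 + eps))    # next guess
        if oracle(x, eps):
            upper = (1 + eps) * x
        else:
            lower = x
        queries += 1
    return lower, upper, queries

# Stand-in oracle for demonstration: it compares against a known W*.
W_STAR = 37.0

def exact_oracle(x, eps):
    return W_STAR <= x

low, high, num_queries = bound_independent_search(1.0, 1000.0, exact_oracle)
```

Because early queries use a large ɛ’, they are cheap; only the final DP with the target ɛ pays the full O(mn²/ɛ) cost.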
Experiments
Experimental Setup
– 1000 industrial nets
Compared to Dynamic Programming and the previous FPTAS [ICCAD’08]
Cost Ratio Compared to DP
Figure: wire cost ratio relative to DP versus approximation ratio ɛ (0.05 to 0.5), for Old FPTAS and New FPTAS.
Speedup Compared to DP
Figure: speedup over DP versus approximation ratio ɛ (0.05 to 0.5), for Old FPTAS and New FPTAS.
Observations
FPTAS always achieves the theoretical guarantee
Larger ɛ leads to more speedup
3.9x faster with 2.2% additional wire area compared to DP
Up to 6.5x faster than DP
On average about 2x faster than previous FPTAS
Conclusion
Propose a (1+ɛ) approximation for timing constrained layer assignment for any ɛ > 0 running in O(mn²/ɛ) time
– Linear time DP running in O(mnW) time
– Bound independent oracle query
– Up to 6.5x faster than DP and 2x faster than previous FPTAS
– Few percent additional wire area compared to DP as guaranteed theoretically
Thanks