Implementation and Extensibility of an Analytic Placer UCSD CSE Department


Implementation and Extensibility of
an Analytic Placer
Andrew B. Kahng and Qinke Wang
UCSD CSE Department
{abk, qiwang}@cs.ucsd.edu
Work partially supported by Cadence Design Systems, Inc., the California
MICRO program, the MARCO Gigascale Silicon Research Center, NSF
MIP-9901174 and the Semiconductor Research Corporation.
Motivation
• Automated placement: always critical
– new challenges: larger design sizes, shorter
turnaround times, a variety of additional physical
and geometrical constraints, etc.
• New analytical methods: simultaneously
spread cells and optimize wirelength
– force-directed placement [Eisenmann et al. 98]
– cell attracting and repelling (ARP)
[Etawil et al. 99]
• Problem: wirelength is easily damaged by
improper forces and attractors
Our Contribution
• A novel objective function for spreading cells was
recently proposed [Naylor et al. 01]
• We implement an analytic placer (APlace)
based on this idea
– study characteristics of the objective function
– extend the objective function with congestion
information
– implement a top-down multi-level placer:
WL outperforms that of QPlace, Dragon and Capo
– extend the placer to perform I/O-core co-placement
for area-array I/O designs
– extend the placer with constraint handling for mixed-signal designs
Outline
• Problem Formulation
• Implementation & Results
• Extensions
• Conclusion and Ongoing Work
Outline
• Problem Formulation
– cell spreading = density control
– wirelength minimization
• Implementation & Results
• Extensions
• Conclusion and Ongoing Work
Cell Spreading (I)
• Common strategy
– divide the placement area into grids
– equalize the total cell area in each grid
• Penalty of an uneven cell distribution
– not smooth or differentiable
– difficult to optimize
Cell Spreading (II)
• A bell-shaped cell potential function
[Naylor et al. US Patent 2001]
• Cell c has potential(c, g) with respect to grid g
• Cell c at (CellX, CellY) has area A
• Grid point g = (GridX, GridY)
• p(d) : bell-shaped function
$$p(d) = \begin{cases} 1 - 2d^2/r^2, & 0 \le d \le r/2 \\ 2(r-d)^2/r^2, & r/2 \le d \le r \end{cases}$$
• r : the radius of cells' potential
• C : a proportionality factor, s.t. each cell's total potential equals its area A
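A minimal Python sketch of the bell-shaped function, using the two branches 1 - 2d²/r² (up to r/2) and 2(r - d)²/r² (up to r). The names `bell` and `cell_potential`, the product of p over the x- and y-distances, and the way C is computed are assumptions made for illustration, not the authors' code.

```python
def bell(d, r):
    """Bell-shaped p(d): 1 at d = 0, 0.5 at d = r/2, and 0 for d >= r."""
    d = abs(d)
    if d <= r / 2:
        return 1.0 - 2.0 * d * d / (r * r)
    if d <= r:
        return 2.0 * (r - d) ** 2 / (r * r)
    return 0.0


def cell_potential(cell_x, cell_y, area, grid_points, r):
    """Spread a cell's area over nearby grid points (assumed product form).

    Returns {grid_point: potential}. The factor C is chosen so that the
    potentials of this one cell sum to its area.
    """
    raw = {
        (gx, gy): bell(cell_x - gx, r) * bell(cell_y - gy, r)
        for gx, gy in grid_points
    }
    total = sum(raw.values())
    if total == 0.0:
        return raw  # the cell is farther than r from every grid point
    c = area / total
    return {g: c * v for g, v in raw.items()}


# Hypothetical 5x5 grid with unit spacing and one cell of area 4.
grid = [(x, y) for x in range(5) for y in range(5)]
pot = cell_potential(2.3, 1.7, area=4.0, grid_points=grid, r=2.0)
```

Both branches meet at d = r/2 with the same value and slope, which is what makes the resulting penalty smooth enough for a gradient-based solver.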
Cell Spreading (III)
• Penalty function
– conjugate gradient solver
– stop when max movement of any cell between iterations is small
• Discrepancy(A)
– max ratio of actual total cell area to expected cell area over all windows with area A
– measures evenness of cell distribution
– disc = Discrepancy(1% area)
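The discrepancy measure can be sketched as follows. This is an illustrative version only: it assumes cell area has already been accumulated into a rectangular array of bins, and it only considers windows aligned to bin boundaries with a fixed shape (`win_rows` x `win_cols`), which is a simplification of "all windows with area A".

```python
def discrepancy(bin_area, win_rows, win_cols):
    """Max ratio of actual to expected cell area over bin-aligned windows.

    bin_area[i][j] is the total cell area in bin (i, j). The expected area
    of a window is its share of the total cell area under a perfectly even
    distribution.
    """
    rows, cols = len(bin_area), len(bin_area[0])
    total = sum(sum(row) for row in bin_area)
    expected = total * (win_rows * win_cols) / (rows * cols)
    # 2-D prefix sums so that each window sum costs O(1).
    pre = [[0.0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows):
        for j in range(cols):
            pre[i + 1][j + 1] = (bin_area[i][j] + pre[i][j + 1]
                                 + pre[i + 1][j] - pre[i][j])
    worst = 0.0
    for i in range(rows - win_rows + 1):
        for j in range(cols - win_cols + 1):
            actual = (pre[i + win_rows][j + win_cols] - pre[i][j + win_cols]
                      - pre[i + win_rows][j] + pre[i][j])
            worst = max(worst, actual / expected)
    return worst
```

A perfectly even distribution gives a value of 1; the further above 1, the more some window is overfilled.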
EXPERIMENT: Cell distribution results with different numbers of grids and cell potential radii (r's) for the ibm01-easy circuit.
grids | r=2 disc | r=2 iter | r=2 CPU (s) | r=4 disc | r=4 iter | r=4 CPU (s)
10 | 1.656 | 50 | 56 | 1.890 | 183 | 399
20 | 1.159 | 29 | 30 | 2.799 | 113 | 237
30 | 1.121 | 46 | 47 | 1.162 | 46 | 108
40 | 1.125 | 83 | 81 | 1.089 | 42 | 92
50 | 1.079 | 235 | 206 | 1.128 | 54 | 116
60 | 1.063 | 758 | 695 | 1.111 | 56 | 123
70 | - | - | - | 1.077 | 83 | 190
80 | - | - | - | 1.083 | 111 | 235
90 | - | - | - | 1.083 | 167 | 343
100 | - | - | - | 1.081 | 273 | 577
Outline
• Problem Formulation
– cell spreading = density control
– wirelength minimization
• Implementation & Results
• Extensions
• Conclusion and Ongoing Work
Wirelength Formulation (I)
• Linear vs. quadratic objective functions
• Approximation of linear objectives
– precise
– continuously differentiable
• Previous works
– Gordian-L objective [Sigl et al. 91]
– α-order objective function [Lillis et al. 95]
– convex approximations of HPWL
[Alpert et al. 98] [Baldick et al. 99] [Kennings
and Markov 00]
Wirelength Formulation (II)
• Approximation of HPWL [Naylor et al. 01]
– log-sum-exp formula: pick the most dominant
terms among pin coordinates
– α : smoothing parameter
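A small Python sketch of a log-sum-exp approximation of half-perimeter wirelength for a single net. It uses the standard identity that α·ln Σ exp(x/α) approaches max(x) as α shrinks; the function names and the max/min shift used for numerical stability are choices made here, not details taken from the slides.

```python
import math


def lse_wirelength(xs, ys, alpha):
    """Smooth approximation of one net's HPWL from its pin coordinates."""
    def smooth_span(vals):
        hi, lo = max(vals), min(vals)
        # alpha * ln(sum(exp(v / alpha))), shifted by hi to avoid overflow.
        up = hi + alpha * math.log(
            sum(math.exp((v - hi) / alpha) for v in vals))
        # alpha * ln(sum(exp(-v / alpha))), shifted by lo to avoid overflow.
        down = -lo + alpha * math.log(
            sum(math.exp((lo - v) / alpha) for v in vals))
        return up + down
    return smooth_span(xs) + smooth_span(ys)


def hpwl(xs, ys):
    """Exact half-perimeter wirelength of the pins' bounding box."""
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


# Hypothetical three-pin net.
xs, ys = [0.0, 10.0, 4.0], [1.0, 3.0, 2.0]
tight = lse_wirelength(xs, ys, alpha=0.01)
loose = lse_wirelength(xs, ys, alpha=5.0)
```

The approximation never falls below the exact HPWL, and the gap shrinks as α decreases, which is the accuracy side of the trade-off that α controls.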
Wirelength Formulation (III)
• Experiments
– init HPWL = 7.311
– 300 iterations
– α smaller → wirelength formulation more accurate
– α larger → WL minimized more quickly, and smaller final HPWL
EXPERIMENT: Wirelength minimization results with different smoothing parameters (α's) for the ibm01-easy circuit.
grids | alpha | init WL | final HPWL
10 | 3336 | 7.533 | 0.803
20 | 1668 | 7.369 | 0.836
30 | 1112 | 7.337 | 0.913
40 | 834 | 7.326 | 1.274
50 | 667 | 7.321 | 1.400
60 | 556 | 7.318 | 1.499
70 | 476 | 7.316 | 1.583
80 | 417 | 7.315 | 1.653
90 | 370 | 7.314 | 1.712
100 | 333.6 | 7.314 | 1.764
Outline
• Problem Formulation
• Implementation & Results
– Conjugate gradient optimizer
– Control factors
– Top-down hierarchical algorithm
– Placement results
• Extensions
• Conclusion and Ongoing Work
Conjugate Gradient Optimizer
• A series of line minimizations
– one-dimensional function minimization along
some search direction
• g_k : the gradient ∇f(x_k)
• d_k : the search direction
• s_k : a step length obtained by a Golden Section search algorithm
• β_k : ensures that d_k is the conjugate direction when the function is quadratic and the line search finds the exact minimum along the direction
– Polak-Ribiere formula
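The optimizer described above can be sketched in plain Python as nonlinear conjugate gradient with a golden-section line search and the Polak-Ribiere coefficient. This is a textbook-style illustration, not APlace's implementation: the fixed line-search bracket `[0, max_step]`, the clipping of the coefficient at zero (a common restart safeguard), and all function names are assumptions.

```python
import math


def golden_section(phi, lo, hi, tol=1e-8):
    """Minimize a unimodal 1-D function phi on [lo, hi]."""
    inv = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - inv * (b - a), a + inv * (b - a)
    fc, fd = phi(c), phi(d)
    while b - a > tol:
        if fc < fd:  # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv * (b - a)
            fc = phi(c)
        else:        # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv * (b - a)
            fd = phi(d)
    return (a + b) / 2.0


def conjugate_gradient(f, grad, x0, max_step=1.0, iters=200, tol=1e-6):
    """A series of line minimizations along conjugate directions."""
    x = list(x0)
    g = grad(x)
    d = [-gi for gi in g]
    for _ in range(iters):
        if math.sqrt(sum(gi * gi for gi in g)) < tol:
            break
        # Step length s_k from a 1-D minimization along d_k.
        s = golden_section(
            lambda t: f([xi + t * di for xi, di in zip(x, d)]), 0.0, max_step)
        x = [xi + s * di for xi, di in zip(x, d)]
        g_new = grad(x)
        # Polak-Ribiere: beta = g_new . (g_new - g) / (g . g), clipped at 0.
        beta = (sum(gn * (gn - go) for gn, go in zip(g_new, g))
                / sum(go * go for go in g))
        beta = max(beta, 0.0)
        d = [-gn + beta * di for gn, di in zip(g_new, d)]
        g = g_new
    return x
```

For example, minimizing the hypothetical quadratic (x - 1)² + 10(y + 2)² from the origin drives the iterate toward (1, -2).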
Control Factors
• Weights of wirelength and density objectives
– density weight: fixed
• larger: spreads the cells out hastily, without a good wirelength
– wirelength weight
• larger: contracts cells together and prevents them from spreading out
• set to be large in the beginning
• divided by 2 when the solver slows down and an optimal solution appears
• repeated until cells are spread evenly over the placement area
• #grids
– coarser grids at the beginning: spread out the cells faster
– finer grids at the final stages: a more even distribution
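The weight schedule above can be sketched as an outer loop around the solver. Everything concrete here is hypothetical: `solve` and `discrepancy` are caller-supplied callbacks standing in for the conjugate gradient solver and the evenness measure, and the starting weight, target discrepancy, and round limit are placeholder values. Grid refinement is not shown.

```python
def spread_with_weight_schedule(solve, discrepancy, placement,
                                wl_weight=1024.0, density_weight=1.0,
                                target_disc=1.1, max_rounds=30):
    """Solve, then halve the wirelength weight until cells are spread.

    solve(placement, wl_weight, density_weight) -> new placement
    discrepancy(placement) -> float, 1.0 meaning perfectly even
    """
    for _ in range(max_rounds):
        # Run the solver until it slows down for the current weights.
        placement = solve(placement, wl_weight, density_weight)
        if discrepancy(placement) <= target_disc:
            break  # cells are spread evenly enough
        wl_weight /= 2.0  # density weight stays fixed
    return placement, wl_weight
```

Keeping the density weight fixed and relaxing only the wirelength weight means a single knob controls how far the cells are allowed to spread in each round.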
Top-Down Multi-Level Algorithm
• A hierarchy of clusters
– MLPart [Caldwell et al. 99]
• Coarse grid: average cluster size
• Density penalty
– regard each cluster as a macro cell
– area of the macro cell = total area of the cluster
• Wirelength
– cells: at center of clusters
Discrepancy and Wirelength
[Figure: Discrepancy as a function of iterations for the ibm01-easy circuit. x-axis: Iteration (0 to 2500); y-axis: Discrepancy (0 to 20).]
[Figure: HPWL as a function of iterations for the ibm01-easy circuit. x-axis: Iteration (0 to 2500); y-axis: HPWL (3x10^7 to 6x10^7).]
Placement Process
• Iter 100
– WL: 4.06E5
– disc: 10.69
• Iter 200
– WL: 5.05E5
– disc: 4.17
• Iter 300
– WL: 4.04E5
– disc: 2.53
• Iter 400
– WL: 4.31E5
– disc: 1.86
Legalization
• A simple Tetris legalization algorithm
[Hill 02]
– sort cells according to vertical coordinates
– from left to right, search the current nearest
available position for each cell
– fast
– increases WL by 5% on average for IBM-PLACE 2.0 circuits
• Orientation optimization and row ironing
[UCLApack]
Placement Results
Placement results of APlace for eight IBM-PLACE 2.0 circuits.
ckt | cells | nets | QPlace WL_l | Dragon WL_l | Capo WL_l | APlace 1.0 WL | APlace 1.0 WL_l | disc | iter | CPU (m)
ibm01_easy | 12282 | 11507 | 0.59 | 0.57 | 0.57 | 0.48 | 0.52 | 1.19 | 1098 | 12.6
ibm01_hard | 12028 | 11507 | 0.56 | 0.55 | 0.56 | 0.46 | 0.50 | 1.18 | 1006 | 21.2
ibm02_easy | 19321 | 18429 | 1.56 | 1.60 | 1.60 | 1.41 | 1.45 | 1.12 | 1097 | 30.3
ibm02_hard | 19062 | 18429 | 1.52 | 1.47 | 1.56 | 1.38 | 1.44 | 1.11 | 1208 | 32.5
ibm07_easy | 45135 | 44394 | 3.72 | 3.66 | 3.71 | 3.17 | 3.29 | 1.14 | 968 | 63.8
ibm07_hard | 44811 | 44394 | 3.70 | 3.44 | 3.56 | 3.09 | 3.24 | 1.15 | 968 | 50.8
ibm08_easy | 50977 | 47944 | 3.95 | 3.61 | 3.93 | 3.51 | 3.65 | 1.11 | 887 | 75.4
ibm08_hard | 50672 | 47944 | 3.85 | 3.45 | 3.90 | 3.45 | 3.68 | 1.11 | 806 | 55.3
• Comparison (HPWL): average WL_l improvement of APlace, with range over the eight circuits
– over Cadence QPlace (SE5.4): 9.0% (4.5% ~ 12.7%)
– over UCLA Dragon (2002): 4.8% (-6.5% ~ 10.2%)
– over Capo (v8.7): 8.7% (5.7% ~ 11.4%)
• Comparison (Running Time)
– Xeon server (2.4GHz CPU, double-threaded)
– faster than Dragon (0.8X), much slower than Capo (13.2X)
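As a worked check of the HPWL comparison, the per-circuit improvement can be recomputed from the WL_l columns of the placement results table as (other - APlace) / other. The snippet below does only that arithmetic. Because the table values are rounded to two decimals, the recomputed averages and ranges agree with the quoted figures only to within a few tenths of a percentage point.

```python
# Legalized wirelengths (WL_l) copied from the placement results table,
# in the order ibm01e, ibm01h, ibm02e, ibm02h, ibm07e, ibm07h, ibm08e, ibm08h.
aplace = [0.52, 0.50, 1.45, 1.44, 3.29, 3.24, 3.65, 3.68]
others = {
    "QPlace": [0.59, 0.56, 1.56, 1.52, 3.72, 3.70, 3.95, 3.85],
    "Dragon": [0.57, 0.55, 1.60, 1.47, 3.66, 3.44, 3.61, 3.45],
    "Capo": [0.57, 0.56, 1.60, 1.56, 3.71, 3.56, 3.93, 3.90],
}


def improvements(base, ours):
    """Percent reduction of `ours` relative to `base`, per circuit."""
    return [100.0 * (b - o) / b for b, o in zip(base, ours)]


summary = {}
for name, wl in others.items():
    imp = improvements(wl, aplace)
    summary[name] = (sum(imp) / len(imp), min(imp), max(imp))
    avg, lo, hi = summary[name]
    print(f"{name}: avg {avg:.1f}% ({lo:.1f}% ~ {hi:.1f}%)")
```

The negative lower bound for Dragon comes from the two ibm08 circuits, where Dragon's legalized wirelength is shorter than APlace's.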
Outline
• Problem Formulation
• Implementation & Results
• Extensions
– Congestion-directed placement
– IO-core co-placement
– Constraint handling
• Conclusion and Ongoing Work
Congestion-Directed Placement (I)
• Accurate bend-based congestion estimator
[Kahng and Xu, SLIP-03]
• Congestion-directed placement
– ExpPotential(g)
• expected total potential at grid point g
• reduced, if g is congested
• γ : congestion adjustment factor
Congestion-Directed Placement (II)
• Experiments
– routability
• WL in gcell grid
• # over-capacity gcells
– routability 38% better with γ = 0.05
– routability deteriorates with larger γ
Placement and global routing results with varying congestion adjustment factors (γ's) for the ibm01-hard circuit.
γ | Placement WL | Placement WL_l | disc | Iter | Global Routing WL | over-cap
0.00 | 0.459 | 0.502 | 1.18 | 1006 | 0.119 | 4035
0.02 | 0.462 | 0.505 | 1.18 | 997 | 0.118 | 3488
0.04 | 0.464 | 0.509 | 1.23 | 1006 | 0.119 | 3249
0.05 | 0.474 | 0.523 | 1.26 | 1086 | 0.120 | 2486
0.06 | 0.477 | 0.529 | 1.28 | 1086 | 0.121 | 2576
0.08 | 0.486 | 0.541 | 1.36 | 1006 | 0.123 | 2806
0.10 | 0.507 | 0.562 | 1.56 | 804 | 0.129 | 3350
Experimental Results
Placement and routing results of APlace for eight IBM-PLACE 2.0 circuits with comparison to QPlace, Dragon and Capo.
Ckt | Placer | Placement WL | Placement CPU | Routing violations | Routing WL | Routing vias | Routing CPU
ibm01e | QPlace | 0.59 | 3 | 0 | 0.84 | 138563 | 58
ibm01e | Dragon | 0.57 | 27 | 0 | 0.86 | 141304 | 60
ibm01e | Capo | 0.57 | 1 | 587 | 0.85 | 146706 | 1446*
ibm01e | APlace | 0.53 | 23 | 0 | 0.75 | 139134 | 100
ibm01h | QPlace | 0.56 | 3 | 0 | 0.80 | 138593 | 82
ibm01h | Dragon | 0.55 | 26 | 0 | 0.80 | 139993 | 90
ibm01h | Capo | 0.56 | 1 | 1029 | 0.84 | 173715 | 1446*
ibm01h | APlace | 0.51 | 22 | 0 | 0.71 | 138745 | 82
ibm07e | QPlace | 3.72 | 12 | 0 | 4.61 | 572512 | 98
ibm07e | Dragon | 3.66 | 65 | 0 | 4.58 | 569087 | 103
ibm07e | Capo | 3.71 | 7 | 42 | 4.93 | 599806 | 996
ibm07e | APlace | 3.30 | 68 | 0 | 3.99 | 532963 | 74
ibm07h | QPlace | 3.70 | 12 | 0 | 5.04 | 617942 | 184
ibm07h | Dragon | 3.44 | 66 | 15 | 4.63 | 606561 | 135
ibm07h | Capo | 3.56 | 8 | 1799 | 5.14 | 631456 | 1483*
ibm07h | APlace | 3.26 | 55 | 0 | 4.10 | 547398 | 96
• Comparison (Routed WL)
– Cadence QPlace (SE5.4): 8.2%
– UCLA Dragon (2002): 4.2%
– Capo (v8.7): 10.4%
• With orientation optimization and row ironing
– Cadence QPlace (SE5.4): 12.0%
– UCLA Dragon (2002): 8.1%
– Capo (v8.7): 14.1%
I/O-Core Co-Placement
• Peripheral I/O
– constrained clock/power distribution
– coupling and power issues for off-chip signaling
• Area-array I/O
– improved pad count and reliability
– reduced noise coupling
• Simultaneous I/O and core placement
– I/Os are spread over the placement area, in the
same way and at the same time as core cells
– DensityWeight * DensityPenalty +
IODensityWeight * IODensityPenalty
I/O-Core Co-Placement Results
I/O-core co-placement results
with different number of I/Os.
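APlace with IO-Core Co-Placement (IBM-Place v.2 circuits)
ckt | I/Os | WL | WL_l | IO disc | disc | iter | CPU (m)
ibm01e | 0 | 0.48 | 0.52 | - | 1.19 | 1098 | 12.6
ibm01e | 400 | 0.48 | 0.52 | 1.50 | 1.24 | 1144 | 24.9
ibm01e | 1000 | 0.50 | 0.54 | 1.34 | 1.31 | 1274 | 28.4
ibm02e | 0 | 1.41 | 1.45 | - | 1.12 | 1097 | 30.3
ibm02e | 400 | 1.36 | 1.41 | 1.62 | 1.09 | 1107 | 30.2
ibm02e | 1000 | 1.45 | 1.50 | 1.36 | 1.16 | 1279 | 34.4
ibm07e | 0 | 3.17 | 3.29 | - | 1.14 | 968 | 63.8
ibm07e | 400 | 3.36 | 3.47 | 1.70 | 1.14 | 1049 | 63.5
ibm07e | 1000 | 3.32 | 3.43 | 1.55 | 1.14 | 1049 | 58.8
ibm08e | 0 | 3.51 | 3.65 | - | 1.11 | 887 | 75.4
ibm08e | 400 | 3.59 | 3.83 | 1.55 | 1.12 | 887 | 55.5
ibm08e | 1000 | 3.55 | 3.73 | 1.20 | 1.11 | 887 | 61.0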
• Randomly select 400 or 1000 cells
and regard them as I/Os
• I/Os: distributed fairly evenly
• WL, disc of core cell distribution,
and running times: not seriously
impaired
I/O-core co-placement with 400
I/Os for ibm01-easy circuit.
Placement with Geometric Constraints
• Mixed-signal ASIC designs: parasitic effects
– a large number of constraints
• Constraints in APlace: convert to penalty functions
– alignment constraint, e.g.
– spacing constraint, e.g.
– axial symmetry, e.g.
– nodal symmetry, e.g.
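To illustrate what "convert to penalty functions" can look like, here are generic quadratic penalties for the four constraint types. These are assumptions chosen for illustration and are not the example formulas from the slide: each function is zero exactly when its constraint holds and grows smoothly with the violation, so it can be added to the objective with a weight.

```python
def alignment_penalty(y_a, y_b):
    """Zero when two cells share the same y-coordinate."""
    return (y_a - y_b) ** 2


def spacing_penalty(x_a, x_b, gap):
    """Zero when cell b sits exactly `gap` to the right of cell a."""
    return (x_b - x_a - gap) ** 2


def axial_symmetry_penalty(a, b, axis_x):
    """Zero when points a and b mirror each other about the line x = axis_x."""
    (xa, ya), (xb, yb) = a, b
    return (xa + xb - 2.0 * axis_x) ** 2 + (ya - yb) ** 2


def nodal_symmetry_penalty(a, b, node):
    """Zero when points a and b are symmetric about the point `node`."""
    (xa, ya), (xb, yb), (nx, ny) = a, b, node
    return (xa + xb - 2.0 * nx) ** 2 + (ya + yb - 2.0 * ny) ** 2
```

Quadratic forms are a convenient choice because they are differentiable everywhere, which keeps the combined objective usable by a conjugate gradient solver.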
Constraint Handling Results
Placement results of APlace with 90 artificial geometric constraints.
APlace with Constraints (IBM-Dragon circuits)
ckt | WL | WL_l | disc | iter | CPU (m)
ibm01e | 0.54 | 0.57 | 1.17 | 1309 | 57.3
ibm02e | 1.50 | 1.55 | 1.10 | 1208 | 70.2
ibm07e | 3.36 | 3.48 | 1.16 | 1049 | 122.8
ibm08e | 3.78 | 3.99 | 1.12 | 887 | 129.9
• Average WL increase: 8.2%
• Blue: Alignments
• Red: Nodal Symmetries + Spacing
• Black: Axial Symmetries + Spacing
Placement of APlace with 90 artificial geometric constraints for ibm01-easy circuit.
Conclusion and Ongoing Work
• Implemented APlace and conducted an in-depth analysis
of its characteristics and results
– placed and routed wirelengths outperform QPlace,
Capo and Dragon
• Extended the basic formulation
– top-down hierarchical placement, congestion-directed placement,
I/O-core co-placement, and constraint handling
• Ongoing work:
– timing-driven placement
– mixed-size placement
Thanks
Prof. C.-K. Cheng, UCSD
Bo Yao, UCSD
Prof. Igor Markov, Michigan
Saurabh Adya, Michigan
Shubhyant Chaturvedi, Michigan
Prof. C.-K. Koh, Purdue
Chen Li, Purdue
Prof. Andrew Kennings, Waterloo
Thank You !
