APLACE: A General and
Extensible Large-Scale Placer
Andrew B. Kahng* Sherief Reda Qinke Wang
VLSI CAD Lab
UCSD CSE and ECE Departments
http://vlsicad.ucsd.edu
*Currently on leave of absence at Blaze DFM, Inc.
Goals and Plan
Goals:
• Build a new placer to win the competition
• Scalable, robust, high-quality implementation
• Leave no stone unturned / QOR on the table
Plan and Schedule:
• Work within most promising framework: APlace
• 30 days for coding + 30 days for tuning
Philosophy
Respect the competition
• Well-funded groups with decades of experience
– ABKGroup’s Capo, MLPart, APlace = all unfunded side projects
– No placement-related industry interactions
• QOR target: 24-26% better than Capo v9r6 on all known
benchmarks
– Nearly pulled out 10 days before competition
Work smart
• Solve scalability and speed basics first
– Slimmed-down data structure, -msse compiler options, etc.
• Ordered list of ~15 QOR ideas to implement
• Daily regressions on all known benchmarks
• Synthetic testcases to predict bb3, bb4, etc.
Implementation Framework
New APlace Flow
APlace weaknesses:
• Weak clustering
• Poor legalization / detailed placement
New APlace:
1. New clustering
2. Adaptive parameter setting for scalability
3. New legalization + iterative detailed placement
Flow:
• Global Phase: Clustering → Adaptive APlace engine → Unclustering
• Detailed Phase: Legalization → WS arrangement → Cell order polishing → Global moving
Clustering/Unclustering
• A multi-level paradigm with clustering ratio ≈ 10
• Top-level clusters ≈ 2000
• Similar in spirit to [HuM04] and [AlpertKNRV05]
Algorithm Sketch
• For each clustering level:
– Calculate the clustering score of each node to its neighbors based on the number of connections
– Sort all scores and process nodes in order as long as cluster size upper bounds are not violated
– If a node’s score needs updating then update score and insert in order
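The algorithm sketch above can be illustrated with a small priority-queue loop. This is a minimal sketch under stated assumptions, not the APlace code: the score is simply the number of nets two nodes share, stale scores are refreshed lazily when a node reaches the front of the queue, and the names `cluster_one_level`, `max_area` and `target_count` are hypothetical.

```python
import heapq
from collections import defaultdict


def cluster_one_level(nets, areas, max_area, target_count):
    """One level of greedy, score-ordered clustering (illustrative sketch)."""
    # conn[u][v] = number of nets shared by u and v (the assumed score).
    conn = defaultdict(lambda: defaultdict(int))
    for net in nets:
        for u in net:
            for v in net:
                if u != v:
                    conn[u][v] += 1

    area = dict(areas)
    alive = set(areas)
    parent = {n: n for n in areas}   # every node starts as its own cluster

    def best(u):
        # Best-scoring neighbor that keeps the merged cluster within max_area.
        cands = [(c, v) for v, c in conn[u].items()
                 if area[u] + area[v] <= max_area]
        return max(cands) if cands else None

    heap = []
    for u in sorted(alive):
        b = best(u)
        if b:
            heapq.heappush(heap, (-b[0], u, b[1]))

    while heap and len(alive) > target_count:
        neg_score, u, v = heapq.heappop(heap)
        if u not in alive:
            continue                 # u was absorbed by another cluster
        b = best(u)
        if b is None:
            continue                 # no neighbor fits the size bound
        if b != (-neg_score, v):
            # Score is out of date: update it and re-insert in order.
            heapq.heappush(heap, (-b[0], u, b[1]))
            continue
        # Merge v into u and fold v's connections into u.
        alive.discard(v)
        parent[v] = u
        area[u] += area[v]
        for w, c in conn.pop(v).items():
            del conn[w][v]
            if w != u:
                conn[u][w] += c
                conn[w][u] += c
        b = best(u)
        if b:
            heapq.heappush(heap, (-b[0], u, b[1]))

    def root(n):
        while parent[n] != n:
            n = parent[n]
        return n

    return {n: root(n) for n in areas}
```

Calling the function once per level, with the clustered netlist of one level as the input of the next, would give the multi-level hierarchy; how the netlist is rebuilt between levels is left out here.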
Adaptive Tuning / Legalization
Adaptive Parameterization:
1. Automatically decide the initial weight for the
wirelength objective according to the gradients
2. Decrease wirelength weight based on the current
placement process
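The slide does not give formulas for either step, so the following is only one plausible reading. It assumes the initial weight balances the gradient magnitude of the wirelength term against that of the other (density) term, and that the weight shrinks by a fixed ratio, more aggressively when spreading has stalled. The function names, the `density_grad` argument and the default ratio are all hypothetical.

```python
def initial_wl_weight(wl_grad, density_grad):
    """Assumed rule: make the two gradient terms start at comparable size."""
    wl = sum(abs(g) for g in wl_grad)
    dens = sum(abs(g) for g in density_grad)
    return dens / wl if wl > 0 else 1.0


def next_wl_weight(weight, discrepancy, prev_discrepancy, ratio=0.5):
    """Assumed schedule: shrink the weight, faster if discrepancy did not drop."""
    stalled = discrepancy >= prev_discrepancy
    return weight * (ratio * ratio if stalled else ratio)
```

In an outer loop, each weight would be followed by a bounded number of conjugate-gradient iterations until the target placement discrepancy is reached.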
Legalization:
1. Sort all cells from left to right: move each cell in order
(or a group of cells) to the closest legal position(s)
2. Sort all cells from right to left: move each cell in order
(or a group of cells) to the closest legal position(s)
3. Pick the best of (1) and (2)
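The three legalization steps can be sketched as two greedy sweeps followed by a comparison. The sketch below makes several assumptions that the slide does not state: cells are moved one at a time (no groups), rows are uniform, "best" is judged by total Manhattan displacement rather than wirelength, and the right-to-left sweep is implemented by mirroring the x coordinates. The names `sweep` and `legalize` are hypothetical.

```python
def sweep(cells, num_rows, row_width, row_height=1.0):
    """Left-to-right greedy pass. cells: list of (x, y, width).

    Returns (positions, total displacement), or None if some cell
    cannot be placed without overlapping earlier cells.
    """
    frontier = [0.0] * num_rows          # next free x in each row
    order = sorted(range(len(cells)), key=lambda i: cells[i][0])
    placed = [None] * len(cells)
    total = 0.0
    for i in order:
        x, y, w = cells[i]
        best = None
        for r in range(num_rows):
            # Closest x that is inside the row and right of placed cells.
            nx = max(frontier[r], min(x, row_width - w))
            if nx + w > row_width:
                continue
            cost = abs(nx - x) + abs(r * row_height - y)
            if best is None or cost < best[0]:
                best = (cost, r, nx)
        if best is None:
            return None
        cost, r, nx = best
        frontier[r] = nx + w
        placed[i] = (nx, r * row_height)
        total += cost
    return placed, total


def legalize(cells, num_rows, row_width, row_height=1.0):
    """Run both sweep directions and keep the cheaper legal result."""
    results = []
    fwd = sweep(cells, num_rows, row_width, row_height)
    if fwd is not None:
        results.append(fwd)
    # Right-to-left sweep = left-to-right sweep on the mirrored placement.
    mirrored = [(row_width - x - w, y, w) for x, y, w in cells]
    bwd = sweep(mirrored, num_rows, row_width, row_height)
    if bwd is not None:
        pos, cost = bwd
        unmirrored = [(row_width - nx - w, ny)
                      for (nx, ny), (_, _, w) in zip(pos, cells)]
        results.append((unmirrored, cost))
    if not results:
        raise ValueError("neither sweep produced a legal placement")
    return min(results, key=lambda res: res[1])
```

With two overlapping cells near the right edge of a single row, the left-to-right sweep runs out of room while the right-to-left sweep succeeds, which is one reason to try both directions.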
Detailed Placement
Whitespace Compaction:
• For each layout row:
– Optimally arrange whitespace to minimize wirelength while maintaining relative cell order [KahngTZ99], [KahngRM04].
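To make the idea of order-preserving whitespace arrangement concrete, here is a small dynamic program over discrete sites. It is a swapped-in illustration, not the algorithm of [KahngTZ99] or [KahngRM04]: it assumes integer cell widths in sites and a caller-supplied `cost(i, x)` giving the wirelength cost of cell `i` with its left edge at site `x`, treated as independent per cell. The name `arrange_whitespace` is hypothetical.

```python
def arrange_whitespace(widths, cost, num_sites):
    """Keep the cell order, choose left edges minimizing sum of cost(i, x)."""
    INF = float("inf")
    # prev[e]: best cost of the cells handled so far, all ending at or
    # before site boundary e.
    prev = [0.0] * (num_sites + 1)
    choice = []
    for i, w in enumerate(widths):
        cur = [INF] * (num_sites + 1)
        arg = [None] * (num_sites + 1)
        for end in range(w, num_sites + 1):
            x = end - w                      # cell i occupies [x, end)
            c = prev[x] + cost(i, x)
            if c < cur[end - 1]:
                cur[end], arg[end] = c, x
            else:                            # carry the prefix minimum
                cur[end], arg[end] = cur[end - 1], arg[end - 1]
        prev = cur
        choice.append(arg)
    if prev[num_sites] == INF:
        raise ValueError("cells do not fit in the row")
    # Walk back from the last cell to recover the chosen left edges.
    xs, end = [0] * len(widths), num_sites
    for i in reversed(range(len(widths))):
        xs[i] = choice[i][end]
        end = xs[i]
    return xs, prev[num_sites]
```

The run time is proportional to the number of cells times the number of sites in the row, which is acceptable for a sketch but not tuned for large rows.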
Cell Order Polishing:
• For a window of neighboring cells:
– Optimally arrange cell orders and whitespace to minimize wirelength
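For small windows, cell order polishing can be illustrated by brute-force enumeration. The sketch below is a simplification under stated assumptions: it tries every permutation of the window's cells, packs them from the window's left edge with no whitespace redistribution, and scores only horizontal half-perimeter wirelength. The names `hpwl_x`, `polish_window` and `fixed_x` are hypothetical.

```python
from itertools import permutations


def hpwl_x(nets, x_of):
    """Horizontal half-perimeter wirelength; a net is a list of cell names."""
    return sum(max(x_of[c] for c in net) - min(x_of[c] for c in net)
               for net in nets)


def polish_window(order, widths, left, fixed_x, nets):
    """Try every ordering of the window's cells and return the best one.

    order:   cell names currently in the window, left to right
    widths:  cell name -> width
    left:    x coordinate of the window's left edge
    fixed_x: x centers of all cells outside the window
    """
    best_order, best_cost = None, float("inf")
    for perm in permutations(order):
        x_of = dict(fixed_x)
        x = left
        for c in perm:
            x_of[c] = x + widths[c] / 2.0    # cell center after packing
            x += widths[c]
        cost = hpwl_x(nets, x_of)
        if cost < best_cost:
            best_order, best_cost = list(perm), cost
    return best_order, best_cost
```

The number of permutations grows factorially, so this only makes sense for windows of a handful of cells, slid along each row.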
Global Moving:
• Optimally move a cell to a better available position to minimize wirelength
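Global moving can be sketched as scoring each available position by the wirelength it would add. This is a minimal sketch, assuming each net of the cell is summarized by the bounding box of its other pins and that a list of free positions is already known; the names `added_hpwl`, `best_move`, `boxes` and `free_slots` are hypothetical.

```python
def added_hpwl(pos, boxes):
    """Extra HPWL caused by a cell at pos.

    boxes: one (xlo, ylo, xhi, yhi) per net, the bounding box of the
    net's other pins. A cell inside the box adds nothing to that net.
    """
    x, y = pos
    total = 0.0
    for xlo, ylo, xhi, yhi in boxes:
        total += max(0.0, xlo - x, x - xhi) + max(0.0, ylo - y, y - yhi)
    return total


def best_move(current, free_slots, boxes):
    """Return the position (current or a free slot) with the least added HPWL."""
    return min([current] + list(free_slots), key=lambda p: added_hpwl(p, boxes))
```

Because the current position is among the candidates, the cell stays put unless a free slot is strictly better.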
Parameterization and Parallelizing
Tuning Knobs:
• Clustering ratio, # top-level clusters, cluster area constraints
• Initial wirelength weight, wirelength weight reduction ratio
• Max # CG iterations for each wirelength weight
• Target placement discrepancy
• Detailed placement parameters, etc.
Resources:
• SDSC ROCKS Cluster: 8 Xeon CPUs at 2.8GHz
• Michigan Prof. Sylvester’s Group: 8 various CPUs
• UCSD FWGrid: 60 Opteron CPUs at 1.6GHz
• UCSD VLSICAD Group: 8 Xeon CPUs at 2.4GHz
Wirelength Improvement after Tuning: 2-3%
Artificial Benchmark Synthesis
• Synthetic benchmarks to test code scalability and performance
• Rapid response to broadcast of s00-nam.pdf
• Created “synthetic” versions of bigblue3 and bigblue4 within 48 hours
Mimicked fixed-block layout diagrams in the artificial benchmark creation
This process was useful: we identified (and solved) a problem with clustering in the presence of many small fixed blocks
Results
Circuit    GP HPWL   Leg HPWL   DP HPWL   CPU (h)
adaptec1     80.20      81.80     79.50         3
adaptec2     84.70      92.18     87.31         3
adaptec3    218.00     230.00    218.00        10
adaptec4    182.90     194.75    187.71        13
bigblue1     93.67      97.85     94.64         5
bigblue2    140.68     147.85    143.80        12
bigblue3    357.28     407.09    357.89        22
bigblue4    813.91     868.07    833.21        50
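To read the three HPWL columns against each other, a few lines of arithmetic give the relative cost of legalization and the recovery from detailed placement. The numbers are copied from the results above; the helper name `pct_change` is hypothetical and nothing here goes beyond percentage differences.

```python
# circuit: (GP HPWL, legalized HPWL, DP HPWL), copied from the results above
RESULTS = {
    "adaptec1": (80.20, 81.80, 79.50),
    "adaptec2": (84.70, 92.18, 87.31),
    "adaptec3": (218.00, 230.00, 218.00),
    "adaptec4": (182.90, 194.75, 187.71),
    "bigblue1": (93.67, 97.85, 94.64),
    "bigblue2": (140.68, 147.85, 143.80),
    "bigblue3": (357.28, 407.09, 357.89),
    "bigblue4": (813.91, 868.07, 833.21),
}


def pct_change(before, after):
    """Signed percentage change from before to after."""
    return 100.0 * (after - before) / before


for name, (gp, leg, dp) in RESULTS.items():
    print(f"{name}: legalization {pct_change(gp, leg):+.2f}%, "
          f"detailed placement {pct_change(leg, dp):+.2f}%, "
          f"net vs. GP {pct_change(gp, dp):+.2f}%")
```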
Bigblue4 Placement
HPWL = 833.21
Conclusions
ISPD05 = an exercise in process and philosophy
At end, we were still 4% short of where we wanted
Not happy with how we handled 5-day time frame
Auto-tuning ⇒ first results ~ best results
During competition, wrote but then left out “annealing” DP
improvements that gained another 0.5%
• Students and IBM ARL did a really, really great job
• Currently restoring capabilities (congestion, timing-driven, etc.) and cleaning (antecedents in Naylor patent)