Tool Development for Multi

Capo: Robust and Scalable
Open-Source Min-cut Floorplacer
http://vlsicad.eecs.umich.edu/BK/PDtools/tar.gz/LATEST/
Jarrod A. Roy, David A. Papa, Saurabh N. Adya,
Hayward H. Chan, James F. Lu, Aaron N. Ng,
Igor L. Markov
EECS Department
University of Michigan, Ann Arbor
Credits for original Capo: Caldwell, Kahng & Markov
Original Motivation (ca. 2000)
• Co-developed at UCLA with Andy Caldwell
under the guidance of Andrew Kahng
– First fixed-die placer in the literature
– First academic placer using
multi-level partitioning (MLPart)
– First academic placer in the US to compete head-on
with commercial placers on large industrial netlists
– All code written from scratch
• Capo - an experiment in min-cut placement
• DAC 2000: “Can Recursive Bisection Alone
Produce Routable Placements?”
– Message: minimizing HPWL is not enough
Min-cut Bisection
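The core operation on this slide is min-cut bisection: split a region's cells into two area-balanced halves so that as few nets as possible cross the cut. The brute-force sketch below only illustrates that objective and the balance constraint on a toy netlist. It is exponential and is not how MLPart or FMPart work; the function names and the 10% balance tolerance are illustrative assumptions.

```python
from itertools import combinations


def cut_size(nets, side):
    """Number of nets whose cells are not all on the same side."""
    return sum(1 for net in nets if len({side[c] for c in net}) > 1)


def exact_min_cut_bisection(cells, nets, areas, tolerance=0.1):
    """Try every bipartition whose area split is within `tolerance` (fraction
    of total area) of 50/50; return (cut, sorted cells of side 0) with the
    fewest cut nets. Exponential in len(cells): toy instances only."""
    total = sum(areas[c] for c in cells)
    best = None
    for k in range(len(cells) + 1):
        for part0 in combinations(cells, k):
            area0 = sum(areas[c] for c in part0)
            if abs(area0 - total / 2) > tolerance * total:
                continue  # violates the balance constraint
            side = {c: (0 if c in part0 else 1) for c in cells}
            cut = cut_size(nets, side)
            if best is None or cut < best[0]:
                best = (cut, sorted(part0))
    return best


# Toy netlist: nets a-b, c-d and b-c, all cells of unit area.
print(exact_min_cut_bisection(list("abcd"), [["a", "b"], ["c", "d"], ["b", "c"]],
                              {c: 1 for c in "abcd"}))  # (1, ['a', 'b'])
```

Only the net between b and c is cut, which is the best any balanced split of this chain can do.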
Basic Components & Techniques
• Overall runtime O(P (log P)²)
• Three min-cut partitioners
– MLPart and FMPart (heuristic) & BBPart (optimal):
ASPDAC 2000, JEA 2000, ISPD ’98/TCAD ’01
– Capo makes several different calls to MLPart
for every partitioning instance (some use V-cycling)
• Shifting cut-lines (not a grid): ASIC ’98
• Optimal end-case placers (B&B): ISPD ’98
– Also used in detailed placement (“RowIroning”)
• Uniform whitespace allocation: TCAD ’03
• Non-uniform whitespace allocation: ICCAD ’03
• Feedback / cycling (Kahng & Reda, DAC ’04)
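Optimal end-case placement can be illustrated with a small branch-and-bound over cell orderings in a single row. This is a sketch under stated assumptions, not Capo's end-case placer: cells abut left to right from x0, cost is x-direction HPWL over cell centers and fixed pins, and the function names are hypothetical. The HPWL of a partial placement is a valid lower bound because adding pins to a net can only widen its span, so any prefix whose bound already matches or exceeds the best complete ordering is pruned.

```python
def partial_hpwl(nets, pos, fixed):
    """Sum of net x-spans over pins that already have a position.

    For a partial placement this is a lower bound on the final HPWL,
    since adding pins to a net can only widen its span."""
    total = 0.0
    for net in nets:
        xs = [pos[p] for p in net if p in pos] + [fixed[p] for p in net if p in fixed]
        if len(xs) > 1:
            total += max(xs) - min(xs)
    return total


def bb_row_placer(widths, nets, fixed, x0=0.0):
    """Order the cells of one row optimally by branch-and-bound.

    widths: cell -> width; cells abut left to right starting at x0.
    fixed:  fixed pin -> x coordinate. Returns (order, HPWL)."""
    cells = list(widths)
    best = {"cost": float("inf"), "order": None}

    def branch(order, pos, x):
        bound = partial_hpwl(nets, pos, fixed)
        if bound >= best["cost"]:
            return  # prune: this prefix cannot beat the best full ordering
        if len(order) == len(cells):
            best["cost"], best["order"] = bound, list(order)
            return
        for c in cells:
            if c not in pos:
                pos[c] = x + widths[c] / 2  # cell center
                order.append(c)
                branch(order, pos, x + widths[c])
                order.pop()
                del pos[c]

    branch([], {}, x0)
    return best["order"], best["cost"]


print(bb_row_placer({"a": 1, "b": 1, "c": 1},
                    [["P", "c"], ["c", "b"], ["a", "Q"]],
                    {"P": 0.0, "Q": 3.0}))  # (['c', 'b', 'a'], 2.0)
```

With a fixed pin P at x = 0 tied to c, a net between c and b, and a fixed pin Q at x = 3 tied to a, the search places c next to P and a next to Q.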
Capo’s Distinctive Features
(can be found in some industry tools,
but rarely in academic placers)
• Global placement with Capo
often produces legal placements
– No cell-shifting / legalization is necessary
– Very low via counts
– Top-down estimates of cell locations and
interconnect are very accurate
– This seems to improve “generic routability”
– Robust handling of obstacles since 2000
• Can make any netlist routable with a more
generous floorplan (greater whitespace)
– Request uniform whitespace distribution!
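One simple way to obtain a uniform whitespace distribution when shifting a cutline is to make each child region's capacity proportional to the cell area assigned to it, so both children inherit the parent's whitespace fraction. The sketch assumes a vertical cut through a region of fixed height and ignores snapping the cutline to row or site boundaries; the function names are hypothetical and this is not Capo's whitespace allocator.

```python
def uniform_ws_cutline(lo, hi, area0, area1):
    """Position of a vertical cutline in [lo, hi] that gives both children the
    same whitespace fraction. The children share the region's height, so equal
    whitespace means each child's width is proportional to its cell area."""
    if area0 + area1 <= 0:
        return (lo + hi) / 2  # no cells: split the region in half
    return lo + (hi - lo) * area0 / (area0 + area1)


def whitespace(width, height, cell_area):
    """Fraction of a region's area that is not covered by cells."""
    return 1.0 - cell_area / (width * height)


cut = uniform_ws_cutline(0.0, 100.0, 300.0, 500.0)
print(cut)  # 37.5
print(whitespace(cut, 10.0, 300.0), whitespace(100.0 - cut, 10.0, 500.0))
```

For a region spanning x = 0 to 100 with height 10 and child cell areas 300 and 500, both children end up with 20% whitespace, the same as the parent.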
The Integration of Block Packing
(using Parquet): ICCAD ’04
Advanced Details
• Weighted terminal propagation
– Described in a technical report by Karypis & Selvakkumaran
– Captures HPWL more accurately in min-cut partitioning
– Nets whose terminals are too close to the cutline have cost < 1
– Some original nets are modeled by two nets for min-cut
• To improve terminal propagation, use
an SOR-based quadratic placer
after every round of partitioning
– Also tried ACG (Alpert, Nam & Villarrubia; ICCAD ’03)
• Look-ahead criteria for Parquet were sharpened
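The weighted terminal propagation bullets can be made concrete with a small 1-D derivation. It is offered as an illustration under simplifying assumptions, not as the exact formulation used in Capo or in the Karypis & Selvakkumaran report: the movable cells of a net are collapsed to the centers c0 < c1 of the two partitions, the net's outside pins span [lo, hi] along the cut direction, and HPWL is measured in that direction only. Net A carries a fixed terminal in partition 0 and is cut whenever some cell lands in partition 1; net B is its mirror image. Choosing w_A = H_cut - H_0 and w_B = H_cut - H_1 makes the partitioning cost of every outcome (all cells in 0 → w_B, all in 1 → w_A, cells on both sides → w_A + w_B) differ from the true HPWL by the same constant H_0 + H_1 - H_cut, so minimizing cut cost minimizes this HPWL. The function name is hypothetical.

```python
def propagated_weights(lo, hi, c0, c1):
    """Weights (w_a, w_b) of two propagated-terminal nets that reproduce a
    net's 1-D HPWL, up to a constant, when its movable cells sit at partition
    centers c0 < c1 and its outside pins span [lo, hi]. Both weights are
    normalized by c1 - c0, so a classic propagated terminal has weight 1.

    Net A: fixed terminal in partition 0, cut if any cell is in partition 1.
    Net B: fixed terminal in partition 1, cut if any cell is in partition 0.
    """
    def span(*xs):
        return max(xs) - min(xs)

    h0 = span(lo, hi, c0)          # HPWL with all cells in partition 0
    h1 = span(lo, hi, c1)          # HPWL with all cells in partition 1
    h_cut = span(lo, hi, c0, c1)   # HPWL with cells on both sides
    d = c1 - c0
    w_a = (h_cut - h0) / d  # HPWL increase once some cell leaves partition 0
    w_b = (h_cut - h1) / d  # HPWL increase once some cell leaves partition 1
    return w_a, w_b


print(propagated_weights(0, 0, 2, 6))    # (1.0, 0.0): terminal far left
print(propagated_weights(3, 4, 2, 6))    # (0.5, 0.25): pins near the cutline
print(propagated_weights(0, 10, 2, 6))   # (0.0, 0.0): net straddles both
```

A terminal far to one side reproduces classic terminal propagation with weight 1. Outside pins lying between the two centers give two fractional weights, which is the "cost < 1, modeled by two nets" case from the bullets above, and pins straddling both centers make the net irrelevant to this cut.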
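The SOR-based quadratic placer can be sketched in one dimension as successive over-relaxation on the linear system of a quadratic wirelength objective. Assumptions, all ours: nets are already decomposed into weighted two-pin connections (the net model Capo uses is not stated on the slide), x and y are solved independently, region constraints from partitioning are omitted, and the function name is hypothetical. Each update moves a cell toward the weighted average of its neighbors and fixed pins, over-relaxed by omega; for 0 < omega < 2 this converges when every group of connected cells is tied to at least one fixed pin.

```python
def sor_quadratic_place(n, edges, anchors, omega=1.5, tol=1e-9, max_iter=10000):
    """1-D quadratic placement solved by successive over-relaxation (SOR).

    Minimizes sum w*(x[i] - x[j])**2 over movable-movable edges (i, j, w)
    plus sum w*(x[i] - p)**2 over anchors (i, p, w) to fixed pins at p.
    Each connected group of movable cells needs at least one anchor;
    otherwise the system is singular.
    """
    nbrs = [[] for _ in range(n)]
    diag = [0.0] * n  # total connection weight of each cell
    rhs = [0.0] * n   # weighted sum of fixed-pin positions
    for i, j, w in edges:
        nbrs[i].append((j, w))
        nbrs[j].append((i, w))
        diag[i] += w
        diag[j] += w
    for i, p, w in anchors:
        diag[i] += w
        rhs[i] += w * p
    x = [0.0] * n
    for _ in range(max_iter):
        max_step = 0.0
        for i in range(n):
            # Gauss-Seidel target: weighted average of neighbors and fixed pins
            target = (rhs[i] + sum(w * x[j] for j, w in nbrs[i])) / diag[i]
            step = omega * (target - x[i])  # over-relaxed move
            x[i] += step
            max_step = max(max_step, abs(step))
        if max_step < tol:
            break
    return x


# Two cells chained between fixed pins at x = 0 and x = 3, unit weights.
print(sor_quadratic_place(2, [(0, 1, 1.0)], [(0, 0.0, 1.0), (1, 3.0, 1.0)]))
# approximately [1.0, 2.0]
```

The updated positions are reused within the same sweep (Gauss-Seidel order), which is what distinguishes SOR from a simple Jacobi iteration.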
Recent Improvements
• Whitespace allocation rewritten entirely
– New params: -minLocalWS and -safeWS
• Cutline positioning relative to obstacles
– “Feature locations” are corners of fixed macros
– Among those, we minimize cutlength
– Cutline direction is selected based on cutlength
• MLPart & FMPart are 2x faster (loop unrolling)
– Also, a bug fix in V-cycling
– Also, variable-effort partitioning in Capo
• Stronger, much faster RowIroning
• Stronger macro legalizer, more scalable cell legalizer
• Meta-options: -faster and -tryHarder
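The cutline-positioning bullets can be illustrated under explicitly hypothetical definitions, since the slide does not spell them out: here "cutlength" is taken to be the part of a candidate cutline that does not pass through a fixed macro, candidates are the macro edge coordinates (plus the region's midpoint) that fall within a balance window around the middle, and both cut directions are tried. Rectangles are (x_lo, y_lo, x_hi, y_hi). The function names and the `window` parameter are assumptions; this is a sketch of the idea, not Capo's cutline code.

```python
def free_cut_length(pos, vertical, region, macros):
    """Length of the cutline at coordinate `pos` that does not pass through a
    fixed macro. vertical=True means the line x = pos; otherwise y = pos.
    region and macros are (x_lo, y_lo, x_hi, y_hi) rectangles."""
    rx0, ry0, rx1, ry1 = region
    lo, hi = (ry0, ry1) if vertical else (rx0, rx1)
    blocked = []
    for mx0, my0, mx1, my1 in macros:
        crosses = mx0 < pos < mx1 if vertical else my0 < pos < my1
        if crosses:
            a, b = (my0, my1) if vertical else (mx0, mx1)
            a, b = max(a, lo), min(b, hi)  # clip to the region
            if a < b:
                blocked.append((a, b))
    covered, end = 0.0, lo
    for a, b in sorted(blocked):  # length of the union of blocked intervals
        a = max(a, end)
        if b > a:
            covered += b - a
            end = b
    return (hi - lo) - covered


def choose_cutline(region, macros, window=0.25):
    """Try both directions; candidates are macro edge coordinates ("feature
    locations") plus the midpoint, within window * extent of the middle.
    Returns (free length, vertical, position) of the shortest free cut."""
    rx0, ry0, rx1, ry1 = region
    best = None
    for vertical in (True, False):
        lo, hi = (rx0, rx1) if vertical else (ry0, ry1)
        mid = (lo + hi) / 2
        feats = {mid}
        for mx0, my0, mx1, my1 in macros:
            feats.update((mx0, mx1) if vertical else (my0, my1))
        for pos in sorted(feats):
            if abs(pos - mid) <= window * (hi - lo):
                length = free_cut_length(pos, vertical, region, macros)
                if best is None or length < best[0]:
                    best = (length, vertical, pos)
    return best


print(choose_cutline((0, 0, 100, 50), [(0, 0, 40, 30)]))  # (50.0, True, 40)
```

In a 100 × 50 region with one macro in the lower-left corner, the vertical cut along the macro's right edge has free length 50, while the horizontal cut at the midpoint crosses the macro and still leaves 60 units free, so the vertical direction wins; ties go to the first candidate examined.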
Large-scale Visualization
• Example: different whitespace allocation modes
• Plots of large designs with data compression
– Less than 1 bit per cell
– The plotter is in the GSRC bookshelf under PlaceUtils
Capo Memory & Runtime Data
with -faster (Opteron @ 2.8 GHz, Linux)
[Plot: memory usage (MB, 0 to 6000) vs. pin count (0 to 1.0E+07) in 32-bit and 64-bit modes; runtime annotations: 17 mins and 7h40m]
Example of Self-profiling in Capo
Component        Adaptec 1 (250K)      Big Blue 4 (2.1M)
MLPart           396.06 s (51.58%)     9360.9 s (39.25%)
FMPart           38.29 s (4.99%)       661.4 s (2.77%)
BBPart           20.13 s (2.62%)       255.3 s (1.07%)
SmPlace          139.14 s (18.12%)     1727.6 s (7.24%)
ProblemSetup     151.01 s (19.67%)     11365.5 s (47.66%)
SmPlProbSetup    4.85 s (0.63%)        133.5 s (0.56%)
Level Stats      10.79 s (1.41%)       151.2 s (0.63%)
Measured total   760.43 s (99.03%)     23657.1 s (99.2%)
RowIroning takes an additional 10-15% of runtime
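Per-component percentages like those in the profile above can be gathered by accumulating wall-clock time per named component and dividing by total elapsed time; time spent outside any measured component is why the measured total stays just under 100%. A minimal sketch with hypothetical class and method names (not Capo's profiler), with an injectable clock so it can be tested deterministically:

```python
import time
from contextlib import contextmanager


class SelfProfiler:
    """Accumulates time per named component and reports each one's share of
    the total elapsed time since the profiler was created."""

    def __init__(self, clock=time.perf_counter):
        self.clock = clock
        self.start = clock()
        self.totals = {}

    @contextmanager
    def measure(self, name):
        t0 = self.clock()
        try:
            yield
        finally:  # record the time even if the component raises
            self.totals[name] = self.totals.get(name, 0.0) + self.clock() - t0

    def report(self):
        elapsed = self.clock() - self.start
        lines = [f"{name} took: {secs:.2f}sec ({100 * secs / elapsed:.2f}%)"
                 for name, secs in self.totals.items()]
        measured = sum(self.totals.values())
        lines.append("Total runtime of measured components: "
                     f"{measured:.2f}sec ({100 * measured / elapsed:.2f}%)")
        return lines


prof = SelfProfiler()
with prof.measure("MLPart"):
    sum(i * i for i in range(10**6))  # stand-in for real work
print("\n".join(prof.report()))
```

Repeated calls to the same component (for example, many MLPart invocations across placement levels) add up under one name, matching the one-line-per-component layout of the report.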
Infrastructure Available
in GSRC Bookshelf (April 2005)
• Source code & binaries
– Converters: Bookshelf to/from LEF/DEF
– Gnuplotter with data compression
– Macro & cell legalizer
– Improved stand-alone RowIroning
– Complete Capo 9.1 (with MLPart)
for Linux (32/64), Solaris (32/64) and Windows
http://vlsicad.eecs.umich.edu/BK/PDtools/
• Compatible with OpenAccess
• Ongoing work on further improvements