Transcript Document

Physical Synthesis Comes of Age
Chuck Alpert, IBM Corp.
Chris Chu, Iowa State University
Paul Villarrubia, IBM Corp.
Physical Synthesis Family Tree
[Figure: Synthesis and Layout shown as the two parents of Physical Synthesis]

Roles of layout as a parent:
  Clean up the mess created by physical synthesis (implement the netlist generated by physical synthesis)
  Provide guidance to physical synthesis so that it will do things right
Is layout mature enough to serve the role?
Is there still room for layout to grow?
New Requirements of Placement
1. Super fast
  4 to 8 million objects now
2. Stable in handling incremental placement
  Physical synthesis constantly makes changes to the netlist
3. Flexible objective function
  Timing, power, routability
4. Provide quick feedback to physical synthesis to refine the netlist
5. Handle mixed-size modules
  Hierarchical design and use of IP blocks are common
Placement As a Baby
Simulated annealing based placement
  Popularized by Timberwolf [DAC-86]
Greedy algorithm vs. simulated annealing:
  Greedy algorithm: You only have 1 chance. If you get stuck, I will terminate you!
  Simulated annealing: OK to make mistakes. Keep trying! Evaluation/feedback is important.
Strength:
  Good quality for small designs
  Easy to consider different objective functions
  Handles incremental changes well
Weakness:
  Very slow (crawling)
  Non-trivial to handle modules of different sizes
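A minimal sketch of the annealing loop described above, assuming a toy setting that is not Timberwolf's: unit-size cells that exactly fill a small grid, a pairwise swap move, half-perimeter wirelength (HPWL) as the cost, and a geometric cooling schedule. The helper names `hpwl` and `anneal` and all parameter values are hypothetical.

```python
import math
import random


def hpwl(nets, pos):
    """Half-perimeter wirelength: sum of bounding-box width + height over all nets."""
    total = 0
    for net in nets:
        xs = [pos[c][0] for c in net]
        ys = [pos[c][1] for c in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


def anneal(num_cells, nets, width, height, t0=10.0, alpha=0.95,
           moves_per_temp=200, t_min=0.01, seed=0):
    """Swap-based simulated annealing on a width x height grid, one cell per slot."""
    assert num_cells == width * height, "this toy version fills every slot"
    rng = random.Random(seed)
    slots = [(x, y) for x in range(width) for y in range(height)]
    rng.shuffle(slots)
    pos = {c: slots[c] for c in range(num_cells)}
    cost = hpwl(nets, pos)
    best_cost, best_pos = cost, dict(pos)
    t = t0
    while t > t_min:
        for _ in range(moves_per_temp):
            a, b = rng.sample(range(num_cells), 2)
            pos[a], pos[b] = pos[b], pos[a]  # propose: swap two cells
            delta = hpwl(nets, pos) - cost
            # Metropolis rule: accept every improvement, and a worse move
            # with probability exp(-delta / t) ("OK to make mistakes")
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                cost += delta
                if cost < best_cost:
                    best_cost, best_pos = cost, dict(pos)
            else:
                pos[a], pos[b] = pos[b], pos[a]  # reject: undo the swap
        t *= alpha  # geometric cooling schedule
    return best_pos, best_cost
```

For example, `anneal(9, [[i, i + 1] for i in range(8)], 3, 3)` places a 9-cell chain on a 3x3 grid; every two-pin net needs length at least 1, so 8 is a lower bound on the returned cost. Even this toy shows the weakness noted above: every move re-evaluates the cost, and many moves are needed per temperature.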
Placement As a Kid
Min-cut placement (or partitioning-based placement)
  An old idea [Breuer, DAC-77]
[Figure: the circuit and the placement region are partitioned together]
Strength:
  Efficient and scalable
    Capo [DAC-00] leverages breakthroughs in partitioning using the multilevel technique (e.g., hMetis [DAC-97], MLFM [DAC-97])
    Dragon [ICCAD-00] combines hierarchical partitioning with annealing
  Very good wirelength, but can we do better?
Weakness:
  More difficult to handle other objectives
  Not stable in handling incremental changes
  Poor at white space management
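The recursive idea can be sketched as follows, assuming nets given as lists of cell ids and a simple greedy pair-swap pass in place of a real multilevel partitioner such as hMetis or MLFM; terminal propagation is ignored. The function names are hypothetical.

```python
import itertools


def cut_size(nets, side):
    """Number of nets whose cells lie on both sides of the cut."""
    return sum(1 for net in nets if len({side[c] for c in net}) > 1)


def bisect(cells, nets):
    """Split cells into two equal-size halves, then greedily swap pairs across
    the cut while the cut size strictly decreases (a simplified KL-style pass)."""
    cells = sorted(cells)
    half = len(cells) // 2
    side = {c: (0 if i < half else 1) for i, c in enumerate(cells)}
    # Keep only the part of each net inside this subproblem (no terminal propagation).
    local = [[c for c in net if c in side] for net in nets]
    local = [net for net in local if len(net) > 1]
    best = cut_size(local, side)
    improved = True
    while improved:
        improved = False
        for a, b in itertools.combinations(cells, 2):
            if side[a] == side[b]:
                continue
            side[a], side[b] = side[b], side[a]
            new = cut_size(local, side)
            if new < best:
                best, improved = new, True
            else:
                side[a], side[b] = side[b], side[a]  # undo a non-improving swap
    return [c for c in cells if side[c] == 0], [c for c in cells if side[c] == 1]


def mincut_place(cells, nets, x0, y0, x1, y1, pos=None):
    """Recursively bisect the cells and the region together, cutting the longer
    side of the region; each cell ends up at the center of its leaf region."""
    pos = {} if pos is None else pos
    if len(cells) == 1:
        pos[cells[0]] = ((x0 + x1) / 2, (y0 + y1) / 2)
        return pos
    left, right = bisect(cells, nets)
    share = len(left) / len(cells)  # region split proportional to cell count
    if x1 - x0 >= y1 - y0:
        xm = x0 + (x1 - x0) * share
        mincut_place(left, nets, x0, y0, xm, y1, pos)
        mincut_place(right, nets, xm, y0, x1, y1, pos)
    else:
        ym = y0 + (y1 - y0) * share
        mincut_place(left, nets, x0, y0, x1, ym, pos)
        mincut_place(right, nets, x0, ym, x1, y1, pos)
    return pos
```

Because the region is split in proportion to cell count, every leaf region is filled evenly; this is one way to see why white space is hard to control in a pure min-cut flow.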
White Space in Min-Cut Placement
[Figure: adaptec2 placed by Capo (min-cut), HPWL = 9955, and by APlace (analytical), HPWL = 8715]
Courtesy: IBM
Placement Maturing
Analytical placement
  Used by 4 of the top 5 placers in the ISPD-05 Placement Contest and by the top 5 placers in the ISPD-06 Placement Contest
Strength:
  Fastest and scalable
  Best wirelength
  Robust framework to incorporate different objectives and constraints
  Stable in handling incremental changes
  Good at white space management
Why does analytical placement work so well?
  It can see the big picture
Why was it not popular in the past?
  It is hard to spread modules evenly in the placement region
Attempt Still Relying on Partitioning
Gordian: Global Optimization and Rectangle Dissection [TCAD-91]
[Figure: partitioning with center-of-mass constraints]
  Artificial center-of-mass constraints disturb the globally optimal solution too drastically
Another Partitioning-based Spreading
Quadratic optimization with quadrisection [Vygen, DAC-97]
Courtesy: IBM
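Spreading by Density-based Force
Kraftwerk [DAC-98]
  Quadratic wirelength: $f(\mathbf{p}) = \tfrac{1}{2}\mathbf{p}^T C \mathbf{p} + \mathbf{d}^T \mathbf{p} + \text{const}$, minimized where $C\mathbf{p} + \mathbf{d} = 0$
  Spread cells by additional forces: $C\mathbf{p} + \mathbf{d} + \mathbf{f} = 0$
  Density-based force pushes cells away from dense regions toward sparse regions:
  $\mathbf{f}(\mathbf{r}) = \frac{k}{2\pi} \int D(\mathbf{r}') \, \frac{\mathbf{r} - \mathbf{r}'}{\lVert \mathbf{r} - \mathbf{r}' \rVert^{2}} \, d\mathbf{r}'$, where $D(\mathbf{r}')$ is the density at position $\mathbf{r}'$
Great idea:
  Quadratic wirelength minimization
  Spreads cells smoothly
  Very good wirelength
But not too fast:
  Constant force, hard to control convergence
  Density-based force expensive to compute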
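The two equations on this slide can be checked on a toy one-dimensional instance. This sketch assumes two-pin connections only, uses dense Gaussian elimination instead of a sparse iterative solver, and takes the extra force f as a given vector rather than computing Kraftwerk's density-based force; `solve`, `build_system`, and `place` are hypothetical names.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (small dense systems only)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x


def build_system(n, cell_edges, fixed_edges):
    """Assemble C and d so that f(p) = 1/2 p^T C p + d^T p + const equals
    the sum of 1/2 * w * (x_i - x_j)^2 over all two-pin connections (one dimension).

    cell_edges:  (i, j, w) between movable cells i and j
    fixed_edges: (i, a, w) between movable cell i and a fixed pin at coordinate a
    """
    c = [[0.0] * n for _ in range(n)]
    d = [0.0] * n
    for i, j, w in cell_edges:
        c[i][i] += w
        c[j][j] += w
        c[i][j] -= w
        c[j][i] -= w
    for i, a, w in fixed_edges:
        c[i][i] += w
        d[i] -= w * a
    return c, d


def place(c, d, f=None):
    """Return p with C p + d + f = 0; with f=None this is the wirelength optimum C p + d = 0."""
    f = f if f is not None else [0.0] * len(d)
    return solve(c, [-(di + fi) for di, fi in zip(d, f)])
```

For two cells chained between fixed pins at 0 and 3, `place` gives positions 1 and 2; adding f = [1, -1] moves them to 2/3 and 7/3, pulling them apart. The hard part that Kraftwerk addresses, and that this sketch skips, is choosing f from the cell density at every iteration.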
Dramatic Speedup
FastPlace [ISPD-04]
  repeat
    Solve quadratic program to minimize wirelength
    Spread the cells
  until cell distribution is roughly even
  Reduce wirelength by iterative heuristic
Hybrid net model
  Speeds up solving of the QP
Cell shifting
  Simple technique to compute spreading force
  Fast convergence due to the use of pseudo-nets [Hu et al., ISPD-02] instead of constant force
Iterative local refinement
  More efficient than using QP to refine the solution
  Minimizes wirelength based on a linear objective
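Cell shifting can be pictured in one dimension (a row or column of bins). The sketch below is a simplified variant, not FastPlace's exact update: it assumes unit-size cells, resizes every bin in proportion to its cell count plus a constant `delta`, and then maps each cell linearly from its old bin into the resized one. The name `shift_cells` and the default `delta` are hypothetical.

```python
def shift_cells(xs, num_bins, width, delta=1.0):
    """One 1D shifting pass: crowded bins widen, sparse bins shrink, and each cell
    keeps its relative position inside its (resized) bin."""
    bin_w = width / num_bins

    def bin_of(x):
        return min(max(int(x / bin_w), 0), num_bins - 1)

    counts = [0] * num_bins
    for x in xs:
        counts[bin_of(x)] += 1
    weights = [c + delta for c in counts]  # delta keeps empty bins from collapsing
    total = sum(weights)
    new_bounds = [0.0]
    for w in weights:
        new_bounds.append(new_bounds[-1] + width * w / total)
    shifted = []
    for x in xs:
        b = bin_of(x)
        t = (x - b * bin_w) / bin_w  # relative position inside the old bin
        shifted.append(new_bounds[b] + t * (new_bounds[b + 1] - new_bounds[b]))
    return shifted
```

Four cells crowded into the first of four unit bins (at 0.1 to 0.4) end up spread over 0.25 to 1.0. In a FastPlace-style loop, the shifted positions would serve as targets for pseudo-nets in the next quadratic solve rather than being used directly.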
Linearization of Quadratic Wirelength
New Kraftwerk [ICCAD-06]
  BoundingBox net model for multi-pin nets:
    Needs to know the outermost pins of a net
    Accurately models HPWL
    Faster and uses less memory than the clique model
[Figure: BoundingBox net model vs. clique net model]
  Two fundamental components of spreading force:
    Hold force: constant force
    Move force: enforced by a pseudo-net to a fixed point
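The claim that the BoundingBox model accurately models HPWL can be checked numerically. This sketch assumes the Bound2Bound form we recall from the Kraftwerk work (one dimension; weight 2/((p-1)·|x_i - x_j|) for a p-pin net); treat the exact constant as an assumption. With weights computed from the current positions, half the weighted quadratic length equals the net's HPWL at those positions. `b2b_edges`, `quadratic_cost`, and `min_dist` are hypothetical names.

```python
def b2b_edges(xs, min_dist=1e-6):
    """Two-pin connections (i, j, w) for one net in one dimension: the two boundary
    pins are connected to each other, and every inner pin to both boundary pins,
    with weight w = 2 / ((p - 1) * |x_i - x_j|)."""
    p = len(xs)
    if p < 2:
        return []
    # Ties broken by index so lo and hi are always distinct pins.
    lo = min(range(p), key=lambda i: (xs[i], i))
    hi = max(range(p), key=lambda i: (xs[i], i))
    pairs = [(lo, hi)] + [(i, b) for i in range(p) if i not in (lo, hi) for b in (lo, hi)]
    edges = []
    for i, j in pairs:
        dist = max(abs(xs[i] - xs[j]), min_dist)  # avoid division by zero for coincident pins
        edges.append((i, j, 2.0 / ((p - 1) * dist)))
    return edges


def quadratic_cost(xs, edges):
    """1/2 * sum of w * (x_i - x_j)^2 over the given connections."""
    return 0.5 * sum(w * (xs[i] - xs[j]) ** 2 for i, j, w in edges)
```

For pins at 0, 1, 4, and 2.5 the model produces 5 connections instead of the clique's 6, and `quadratic_cost` returns 4.0, the net's HPWL. The saving grows with pin count: 2p - 3 connections versus p(p-1)/2.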


Relaxation Rather than Linearization
RQL [DAC-07]
  Adds force vector modulation to the FastPlace framework
  Currently the fastest, with the best wirelength
[Figure: spreading force magnitude vs. module index]
  Rank modules based on the spreading force magnitude
  Nullify the spreading force for the top 5-10% of modules
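The modulation rule fits in a few lines. This sketch assumes 2D force vectors given per module and a fixed fraction (5% by default, the low end of the slide's 5-10% range); how RQL computes the forces and treats nullified modules in later iterations is not modeled. `modulate_forces` is a hypothetical name.

```python
import math


def modulate_forces(forces, fraction=0.05):
    """Rank modules by spreading-force magnitude and zero the force of the top
    `fraction` of them; all other force vectors are returned unchanged."""
    n = len(forces)
    k = min(n, int(fraction * n + 0.5))  # number of modules to nullify (rounded half up)
    ranked = sorted(range(n), key=lambda i: math.hypot(*forces[i]), reverse=True)
    nullified = set(ranked[:k])
    return [(0.0, 0.0) if i in nullified else f for i, f in enumerate(forces)]
```

The intuition is that the largest forces tend to come from modules stuck in very dense spots, and letting them move freely in one step would distort the wirelength-driven solution most.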
An Alternative Analytical Approach
APlace [ISPD-04], mPL5 [ISPD-05], NTUPlace3 [ICCAD-06]
Placer   Wirelength model      Spreading force                            Objective function
APlace   Log-sum-exponential   Density potential based (bell-shaped)      Non-linear & non-convex
NTUP3    Log-sum-exponential   Density potential based (bell-shaped)      Non-linear & non-convex
mPL6     Log-sum-exponential   Density potential based (Poisson smoothed) Non-linear & non-convex
RQL      Quadratic             Fixed-point based                          Quadratic
Log-sum-exponential function to approximate HPWL [Naylor et al., US Patent 2001]:
  $\mathrm{lse}_\gamma(x_1, \ldots, x_n) = \gamma \ln\left(\sum_{i=1}^{n} e^{x_i/\gamma}\right) \approx \max(x_1, \ldots, x_n)$, with smoothing parameter $\gamma$
Density constraint is directly formulated into the objective function
Very competitive wirelength and runtime
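The log-sum-exponential approximation is easy to evaluate stably. This sketch assumes one dimension and names the smoothing parameter `gamma` (the symbol is our choice); the smoothed net extent lies between the exact HPWL contribution and that value plus 2·gamma·ln n, so it tightens as gamma shrinks. The function names are hypothetical.

```python
import math


def lse_max(xs, gamma):
    """Smooth maximum gamma * ln(sum_i exp(x_i / gamma)).
    Factoring out the true maximum m avoids overflow in exp()."""
    m = max(xs)
    return m + gamma * math.log(sum(math.exp((x - m) / gamma) for x in xs))


def lse_wirelength(xs, gamma):
    """Smoothed x-extent of one net: smooth max(x) plus smooth max(-x), which
    approximates max(x) - min(x), the net's contribution to HPWL in x."""
    return lse_max(xs, gamma) + lse_max([-x for x in xs], gamma)
```

Unlike the exact max and min, this function is differentiable everywhere, which is what lets a gradient-based solver optimize it together with a density penalty in one non-linear objective.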
Placement: Getting Old or Still Young?
  Better approach than the quadratic / analytical approach?
  Massive parallelism to speed up placement
  Better clustering technique
  Macro placement / floorplanning
  True timing-driven placement
Sufficient Parental Guidance?
All that physical synthesis gets from placement is distance information
Physical synthesis has a distorted world view!
  Wirelength estimation is inaccurate (especially for nets with high pin count)
  Congestion estimation is inaccurate
[Figure: routing of a bus from sources S0-S3 to sinks T0-T3, in three panels: Routing of a Bus, A Simple Solution, Probabilistic Estimation]
    Probabilistic usage = 1 + 1/2 + 1/3 + 1/4 (harmonic series)
  Without buffering and gate sizing:
    Area estimation is inaccurate
    Timing estimation is very inaccurate
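The harmonic-series usage from the bus example can be evaluated exactly. This sketch only computes the sum 1 + 1/2 + ... + 1/n for an n-bit bus (treating n as a parameter is our generalization of the 4-bit figure); it does not reproduce the probabilistic model that produces the series. `harmonic_usage` is a hypothetical name.

```python
from fractions import Fraction


def harmonic_usage(n):
    """Probabilistic usage 1 + 1/2 + ... + 1/n (the harmonic series H_n), as an exact fraction."""
    return sum(Fraction(1, k) for k in range(1, n + 1))
```

For the 4-bit bus shown, the sum is 25/12, about 2.08; for wider buses it keeps growing, roughly like ln n.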
Routing-Driven Physical Synthesis
Need a more integrated approach:
  Past: placement-driven physical synthesis
  Future: routing-driven physical synthesis
Main obstacle:
  Runtime
Two possibilities:
  1. Construct Steiner trees to guide synthesis and placement
  2. Perform global routing to guide synthesis and placement
Fast Steiner Tree Construction
FLUTE (Fast LookUp Table Estimation) [ICCAD 04, ISPD 05]
  An extremely fast and accurate rectilinear Steiner tree algorithm
  Very suitable for VLSI applications:
    Optimal up to degree 9
    Very accurate up to degree 100
[Figure: error (%) vs. runtime (s) over all 1.57 million nets in 18 IBM circuits [ISPD 98], comparing FLUTE with RMST, RSTT, SPAN, BGA, and BI1S]
Is Steiner Tree Sufficient?
Steiner trees do not consider detours due to routing congestion or buffering congestion
Can we predict the impact of congestion on routing?
  There is no way for generic estimators to accurately estimate the congestion of arbitrary global routers!
Circuit  Labyrinth(70%)  Labyrinth(50%)   Chi Dispersion
         #cong           #cong  #match    #cong  #match
ibm01    238             268    54        122    44
ibm02    368             390    89        46     7
ibm03    247             214    47        1      0
ibm04    588             596    261       273    161
ibm06    367             391    81        9      1
ibm07    568             643    162       122    55
ibm08    486             655    138       30     18
ibm09    377             399    69        12     3
ibm10    501             376    93        27     16
[Figure: congestion maps by router 1 and by router 2]
Traditional Global Routing
Simultaneous approach (e.g., ILP)
  Very slow
Sequential approach
  Net-by-net routing, rip-up and reroute
  Maze routing for a net: Lee’s, Dijkstra’s, A*-search algorithms
  Reasonably fast
  Reasonably good quality
Is it good enough to handle the demand of physical synthesis?
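A minimal sketch of sequential, net-by-net maze routing with Dijkstra's algorithm, assuming a grid of tiles, a per-tile capacity, and a fixed penalty for entering a tile that is already full; rip-up and reroute is not shown. `maze_route`, `route_all`, and the cost values are hypothetical.

```python
import heapq


def maze_route(width, height, src, dst, usage=None, capacity=1, penalty=10.0):
    """Dijkstra maze routing on a width x height tile grid. Entering a tile costs 1,
    or 1 + penalty if its usage has already reached capacity. Returns the tile path."""
    usage = usage or {}
    dist = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    while heap:
        d, (x, y) = heapq.heappop(heap)
        if (x, y) == dst:
            break  # first pop of dst carries its shortest distance
        if d > dist[(x, y)]:
            continue  # stale heap entry
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if not (0 <= nxt[0] < width and 0 <= nxt[1] < height):
                continue
            step = 1.0 if usage.get(nxt, 0) < capacity else 1.0 + penalty
            if d + step < dist.get(nxt, float("inf")):
                dist[nxt] = d + step
                prev[nxt] = (x, y)
                heapq.heappush(heap, (d + step, nxt))
    if dst not in dist:
        return None  # unreachable
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


def route_all(width, height, nets, capacity=1):
    """Route two-pin nets one at a time; each routed path adds to tile usage,
    so later nets see the congestion created by earlier ones."""
    usage = {}
    paths = []
    for src, dst in nets:
        path = maze_route(width, height, src, dst, usage, capacity)
        for tile in path or []:
            usage[tile] = usage.get(tile, 0) + 1
        paths.append(path)
    return paths, usage
```

With tile (2, 0) already full, `maze_route(5, 5, (0, 0), (4, 0), {(2, 0): 1})` detours through row 1 (6 steps) instead of paying the penalty. The result depends on net order, which is exactly why sequential routers need rip-up and reroute.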
Progress in Global Routing
Pattern routing [Kastner et al., ICCAD-00]
  L-shaped, Z-shaped routes
  Much faster because of much less reliance on maze routing
Better cost functions for maze routing [Hadsell & Madden, DAC-03; Pan & Chu, ICCAD-06]
  Reduce overflow significantly
Negotiated congestion by PathFinder [FPGA-95]
  Used by BoxRouter [ICCAD-07], FGA [ICCAD-07], Archer [ICCAD-07]
  Excellent routing ability
  Very slow because it takes a long time to build congestion history
Congestion-driven Steiner tree construction [Pan & Chu, ICCAD-06]
  Faster
Wanted: techniques that are both fast and high quality
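Pattern routing restricts each two-pin connection to a few fixed shapes instead of running a full maze search. This sketch shows only the two L-shaped patterns, scoring each by the overflow it would add on a tile grid with a uniform capacity; Z-shaped patterns and the cost functions used by the cited routers are not modeled. `l_routes` and `pattern_route` are hypothetical names.

```python
def l_routes(src, dst):
    """The two L-shaped patterns from src to dst: horizontal-first and vertical-first.
    Each route is the list of grid tiles it passes through."""
    (x0, y0), (x1, y1) = src, dst

    def seg(a, b):  # inclusive integer range from a to b in either direction
        step = 1 if b >= a else -1
        return list(range(a, b + step, step))

    h_first = [(x, y0) for x in seg(x0, x1)] + [(x1, y) for y in seg(y0, y1)[1:]]
    v_first = [(x0, y) for y in seg(y0, y1)] + [(x, y1) for x in seg(x0, x1)[1:]]
    return h_first, v_first


def pattern_route(src, dst, usage, capacity=1):
    """Pick the L-shape that would add the least overflow (ties: horizontal first)."""
    def overflow(route):
        return sum(max(0, usage.get(t, 0) + 1 - capacity) for t in route)
    return min(l_routes(src, dst), key=overflow)
```

Each net needs only two cost evaluations here, versus a graph search in maze routing; this is the source of the speedup, at the price of never finding detours.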
What Should We Do Next?
Integration of global routing into placement
  An initial attempt: IPR [DAC-07]
    Integration of FastPlace, FastDP, FLUTE and FastRoute
    Significantly improves routability and wirelength in good runtime
Incorporate buffering and gate sizing into integrated placement and routing
  Much more accurate timing information
  Should also help congestion and placement density control
Integration with logic synthesis
In other words, we need:
  Better basic algorithms: placement, Steiner tree, global routing, buffering, gate sizing, etc.
  Clever ways of integration
It is a (EDA) family problem. Let’s work together!
Thank You