© KLMH
EECS 527 Paper Presentation
Techniques for Fast Physical Synthesis
By Charles J. Alpert, Shrirang K. Karandikar,
Zhuo Li, Gi-Joon Nam, Stephen T. Quay,
Haoxing Ren, C. N. Sze, Paul G. Villarrubia,
and Mehmet C. Yildiz
Presented by Lingfeng Xu
Department of Electrical Engineering and Computer Science
University of Michigan, Ann Arbor
VLSI Physical Design: From Graph Partitioning to Timing Closure
Chapter 5: Global Routing
1
Lienig
11/2011
Outline
Introduction
Buffering Trends
Major Phases of Physical Synthesis
Closer Look at Optimization
Selected Techniques
Fast Timing-Driven Buffering
Layout Aware Buffer Trees
Diffusion-Based Legalization
Q&A
Introduction
Purpose of physical synthesis
Timing closure
Physical synthesis
Iterations
Iterate between manual design work and automatic physical synthesis
Philosophy
As fast as possible even if a little optimality is sacrificed
IBM’s physical synthesis tool
PDS (Placement-Driven Synthesis) system
Buffering trends
“Buffering Explosion”
Thinner wires increase wire resistance
Wire delays increasingly dominate gate delays
Saxena et al. [3] predict that half of all logic will consist of buffers
20% to 25% of the cells in today’s 90 nm designs are buffers or inverters
Figure: Percentage of block-level nets requiring repeaters [3]
Figure: Intra-block communication repeaters as a percentage of the total cell count for the block [3]
Buffering trends
Challenges
Buffer insertion needs to be performed fast
Area and power
Layout awareness
Buffering constricts or seeds global routing
Major Phases of Physical Synthesis
PDS stages
Initial placement and optimization
Timing-driven placement and optimization
Timing-driven detailed placement
Optimization techniques
Clock insertion and optimization
Routing and post routing optimization
Early-mode timing optimization
Closer look at Optimization
Optimization phases
Electrical correction
Critical path optimization
Histogram compression
Figure: An example of physical synthesis breakdown
Phase 1: Initial Placement, Electrical Correction, Legalization, Critical Slack Optimization, Legalization
Phase 2: Timing-driven Placement, Electrical Correction, Critical Slack Optimization, Legalization, Compression, Legalization
Phase 3: Timing-driven Detailed Placement
Phase 4: Electrical Correction, Legalization, Critical Slack Optimization, Legalization, Critical Slack Optimization, Legalization, Compression, Legalization
How to Achieve Fast Physical Synthesis?
Selected Techniques
Fast Timing-Driven Buffering
Layout Aware Buffer Trees
Diffusion-Based Legalization
Fast Timing-Driven Buffering
Motivation
Designs can contain over a million buffers
Rebuffering rips up all existing buffers and reinserts buffers from scratch
Considerations
Buffering resources vs. delay
Runtime
Slew, noise and capacitance constraints
Fast Timing-Driven Buffering
Classical Buffering Algorithm
Goal: maximize the required arrival time (RAT) at the source
Dynamic programming
Candidate solutions are generated at the sinks and propagated toward the source
Each candidate solution at an internal node is characterized by (q, c, w)
q: required arrival time
c: downstream load capacitance
w: accumulated cost of the buffer insertion decisions
Example: each sink starts with the solution (q = RAT, c = sink load capacitance, w = 0)
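The bottom-up dynamic program described above can be sketched for the simplest case: a two-pin net modeled as a chain of wire segments with a candidate buffer site between consecutive segments. This is a minimal van Ginneken-style sketch, not the PDS implementation; the Elmore wire model, the linear buffer delay (intrinsic delay plus output resistance times load), the units (kOhm x fF = ps) and the names `Cand`, `Buffer` and `buffer_two_pin` are assumptions for illustration.

```python
from typing import List, NamedTuple, Tuple


class Cand(NamedTuple):
    q: float  # required arrival time at this node (ps)
    c: float  # downstream load capacitance (fF)
    w: float  # accumulated buffer cost


class Buffer(NamedTuple):
    delay: float  # intrinsic delay (ps)
    res: float    # output resistance (kOhm)
    cap: float    # input capacitance (fF)
    cost: float


def prune(cands: List[Cand]) -> List[Cand]:
    """Remove every candidate dominated by another (q >=, c <=, w <=)."""
    uniq = set(cands)
    return [a for a in uniq
            if not any(b != a and b.q >= a.q and b.c <= a.c and b.w <= a.w
                       for b in uniq)]


def add_wire(cands: List[Cand], r: float, cw: float) -> List[Cand]:
    # Elmore delay of a segment: r * (cw / 2 + downstream load)
    return [Cand(a.q - r * (cw / 2 + a.c), a.c + cw, a.w) for a in cands]


def add_buffers(cands: List[Cand], library: List[Buffer]) -> List[Cand]:
    out = list(cands)  # keep the "no buffer here" option
    for a in cands:
        for b in library:
            # the buffer drives load a.c and presents its input cap upstream
            out.append(Cand(a.q - (b.delay + b.res * a.c), b.cap, a.w + b.cost))
    return out


def buffer_two_pin(sink_rat: float, sink_cap: float,
                   segments: List[Tuple[float, float]],
                   library: List[Buffer],
                   driver_res: float) -> List[Tuple[float, float]]:
    """segments: (r, cw) pairs ordered from the sink toward the source.
    Returns (cost, best source RAT) pairs, i.e. the cost-RAT tradeoff."""
    cands = [Cand(sink_rat, sink_cap, 0.0)]
    for i, (r, cw) in enumerate(segments):
        cands = prune(add_wire(cands, r, cw))
        if i < len(segments) - 1:  # buffer sites between segments only
            cands = prune(add_buffers(cands, library))
    best = {}
    for a in cands:
        q_src = a.q - driver_res * a.c  # the source driver sees load c
        best[a.w] = max(best.get(a.w, float("-inf")), q_src)
    return sorted(best.items())
```

With four 1 kOhm / 50 fF segments and a single buffer type, the cost-0 entry is the unbuffered Elmore result, and the buffered entries trade cost for a larger source RAT.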
Fast Timing-Driven Buffering
Classical Buffering Algorithm
Two solutions α1, α2
α2 dominates α1 if q2 ≥ q1, c2 ≤ c1 and w2 ≤ w1
α1 is redundant and can be pruned
At the end of the algorithm
A set of solutions with different cost-RAT tradeoffs is obtained
Choose one in the middle of the tradeoff
“10 ps rule”: if the marginal RAT gain of a costlier solution is more than 10 ps, choose the solution with the bigger RAT
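One possible reading of the “10 ps rule” is sketched below, assuming the final solutions are given as (cost, source RAT) pairs: walk from the cheapest solution toward costlier ones and stop at the first step that gains 10 ps or less. The function name and this exact stopping criterion are hypothetical; the slide does not spell them out.

```python
def pick_solution(tradeoff, min_gain_ps=10.0):
    """tradeoff: (cost, source RAT in ps) pairs from a buffering run.
    Start from the cheapest solution and move to the next costlier one only
    while it improves RAT by more than min_gain_ps."""
    pts = sorted(tradeoff)
    chosen = pts[0]
    for nxt in pts[1:]:
        if nxt[1] - chosen[1] > min_gain_ps:
            chosen = nxt
        else:
            break  # diminishing returns: stop in the middle of the curve
    return chosen
```

On a curve with gains of 50 ps, 8 ps and 12 ps, this stops after the first step, which is the "middle" pick the slide describes.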
Fast Timing-Driven Buffering
Prebuffer Slack Pruning (PSP)
Applied at the current node being processed, before buffer insertion
If q1 < q2, c1 < c2 and (q2 - q1)/(c2 - c1) ≤ Rmin, then α2 is pruned early: every driver has resistance of at least Rmin, so α2’s RAT advantage over α1 is lost once it is driven
Setting Rmin to the minimum driver resistance in the buffer library guarantees optimality; a larger value prunes more aggressively but does not hurt solution quality in practice
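A minimal sketch of prebuffer slack pruning, assuming candidates are `(q, c, w)` triples and that pruning is applied only between candidates of equal cost (the slide does not say how cost is handled). Sorting by capacitance means each candidate only needs to be compared with the last kept one: if every slope between consecutive kept candidates exceeds Rmin, the slope to any earlier kept candidate is a weighted average of those and exceeds Rmin too. The names are hypothetical.

```python
from typing import List, NamedTuple


class Cand(NamedTuple):
    q: float  # required arrival time (ps)
    c: float  # downstream capacitance (fF)
    w: float  # cost


def prebuffer_slack_prune(cands: List[Cand], r_min: float) -> List[Cand]:
    """Drop a candidate whose RAT gain over the previous kept candidate
    (same cost, smaller capacitance) is at most r_min per unit of extra
    capacitance: any driver with resistance >= r_min erases that gain."""
    kept: List[Cand] = []
    # same cost together, then increasing c, then decreasing q
    for a in sorted(cands, key=lambda s: (s.w, s.c, -s.q)):
        if kept and kept[-1].w == a.w:
            k = kept[-1]
            if a.q <= k.q:
                continue  # no smaller c and no larger q: dominated
            if (a.q - k.q) / (a.c - k.c) <= r_min:
                continue  # RAT gain too small to survive any driver
        kept.append(a)
    return kept
```

For example, with Rmin = 0.5 kOhm, a candidate that gains 3 ps for 10 fF of extra load is pruned, because a 0.5 kOhm driver already costs it 5 ps.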
Fast Timing-Driven Buffering
Squeeze Pruning
Three partial solutions α1, α2, α3 with the same cost, ordered so that c1 < c2 < c3 and q1 < q2 < q3
If (q2 - q1)/(c2 - c1) ≤ (q3 - q2)/(c3 - c2), then α2 is pruned
For a two-pin net, the middle solution is always dominated by either the first or the third once a buffer or driver is attached; for multi-sink nets, optimality is not guaranteed, but the pruning causes no degradation in the solution most of the time
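Squeeze pruning keeps exactly the candidates on the upper concave hull of the (c, q) points: after a driver of resistance R, each candidate's slack is q - R·c, a linear function that is always maximized at a hull vertex, so a point on or below the chord between its neighbors can never be the best choice. The sketch below finds that hull with a single monotone-chain scan. It assumes equal-cost, mutually non-dominated candidates (so capacitances are distinct) and cross-multiplies the slope test to avoid division; `squeeze_prune` is a hypothetical name.

```python
def squeeze_prune(points):
    """points: (c, q) pairs of equal-cost, non-dominated candidates.
    Returns the candidates on the upper concave hull, sorted by c."""
    hull = []
    for p in sorted(points):
        while len(hull) >= 2:
            (c1, q1), (c2, q2) = hull[-2], hull[-1]
            c3, q3 = p
            # alpha2 is squeezed out if slope(1->2) <= slope(2->3);
            # both denominators are positive because c1 < c2 < c3
            if (q2 - q1) * (c3 - c2) <= (q3 - q2) * (c2 - c1):
                hull.pop()
            else:
                break
        hull.append(p)
    return hull
```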
Fast Timing-Driven Buffering
Library Lookup
Every buffer and inverter in the library is examined at each candidate buffer location
With m buffer and inverter types and n candidate solutions, m·n new candidate solutions are generated
However, many of these candidate solutions are not worth considering
Precompute a buffer table and an inverter table
Only 2n new candidate solutions are generated: n with inverters and n with buffers
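The table lookup can be sketched as follows. The selection criterion is an assumption: here each table stores, for a grid of load capacitances, the gate with the smallest delay `delay + res * load`, and a load between grid points uses the entry at or below it. Gate names, values and function names are hypothetical. Ignoring the input-capacitance tradeoff when picking a gate is one way such a speedup can give up a little optimality.

```python
import bisect
from typing import List, NamedTuple


class Gate(NamedTuple):
    name: str
    delay: float  # intrinsic delay (ps)
    res: float    # output resistance (kOhm)
    cap: float    # input capacitance (fF)


def build_table(gates: List[Gate], cap_grid: List[float]) -> List[Gate]:
    """For each load on the grid, store the gate with minimum delay."""
    return [min(gates, key=lambda g: g.delay + g.res * load) for load in cap_grid]


def lookup(table: List[Gate], cap_grid: List[float], load: float) -> Gate:
    # use the grid point at or below the load (clamped to the first entry)
    i = max(bisect.bisect_right(cap_grid, load) - 1, 0)
    return table[i]


def expand(cands, buf_table, inv_table, cap_grid):
    """Each (q, c) candidate yields one buffered and one inverted candidate,
    so n candidates produce 2n new solutions instead of m*n."""
    out = []
    for q, c in cands:
        for table in (buf_table, inv_table):
            g = lookup(table, cap_grid, c)
            out.append((q - g.delay - g.res * c, g.cap, g.name))
    return out
```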
Fast Timing-Driven Buffering
Results and Summary
Results on 5000 high-capacitance nets from an ASIC chip: 20x speedup with 3% quality degradation
Philosophy: as fast as possible even if a little optimality is sacrificed
Nets can later be ripped up and rebuffered with more accurate techniques if desired
Layout Aware Fast and Flexible Buffer Trees
Layout problems in buffering (figure: (a) Alley, (b) Pile-ups, Holes in large blocks)
Layout constraints
Holes in large blocks
Navigating blocks and dense regions
Critical and non-critical routes
Avoiding routing congestion
Layout Aware Fast and Flexible Buffer Trees
Layout aware buffer tree flow
Step 1: Construct a fast timing-driven Steiner tree
Step 2: Reroute the Steiner tree, preserving its topology while navigating environmental constraints
Step 3: Insert buffers (e.g. with Fast Timing-Driven Buffering)
This work focuses on Step 2
Layout Aware Fast and Flexible Buffer Trees
Algorithm
Break the existing Steiner tree into disjoint 2-paths, i.e., paths that start and end at the source, a sink, or a Steiner point, with no such point in between
Each 2-path is routed in turn to minimize cost, starting from the sinks and ending at the source
Maze routing for each 2-path with a cost function
If a Steiner point lies in a congested region, move it within a specified “plate region”
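The decomposition step can be sketched on a tree given as an adjacency dictionary. Assumptions: every leaf is the source or a sink, any node of degree three or more is a Steiner point, and the remaining degree-2 nodes are ordinary route points. A post-order walk from the source lists the 2-paths nearest the sinks first, matching the sink-to-source routing order. The function name and representation are hypothetical.

```python
def two_paths(adj, source, sinks):
    """Split a tree (adjacency dict) into 2-paths whose end points are the
    source, a sink, or a Steiner point; interior nodes are degree-2 route
    points. Each path is listed upstream -> downstream, sink side first."""
    ends = {source, *sinks} | {v for v, nbrs in adj.items() if len(nbrs) >= 3}
    paths = []

    def walk(start, parent):
        for nxt in adj[start]:
            if nxt == parent:
                continue
            path, prev, cur = [start], start, nxt
            while cur not in ends:
                path.append(cur)
                # a route point has exactly one neighbor other than prev
                prev, cur = cur, next(n for n in adj[cur] if n != prev)
            path.append(cur)
            walk(cur, prev)     # finish everything downstream first
            paths.append(path)  # post-order: paths nearer the sinks come first

    walk(source, None)
    return paths
```

For a source S connected through route point a to Steiner point P, which feeds sink T1 through route point b and sink T2 directly, the result is `[["P", "b", "T1"], ["P", "T2"], ["S", "a", "P"]]`.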
Layout Aware Fast and Flexible Buffer Trees
General Maze routing cost function
Tradeoff parameter 0 ≤ K ≤ 1
Tile cost: cost(t) = 1 + K e(t)
Merging branches:
cost(t) = max(cost(L), cost(R)) + K min(cost(L), cost(R))
Sink initialization:
cost(s) = (K - 1) RAT(s)/DpT
Use K = 1 for electrical correction; use K = 0.1 for critical path optimization
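The cost formulas can be exercised on a toy example. Beyond the slide, the sketch assumes that a branch's cost is its sink initialization plus the sum of the tile costs along its route, with e(t) passed in as a plain per-tile number; the function names are hypothetical. With K = 0 and e(t) = 0, the merged cost equals minus the RAT at the merge point measured in tiles (the critical branch has RAT 200 ps - 5 tiles x 10 ps = 150 ps, i.e. -15), a pure timing view, while K = 1 yields the total number of tiles used.

```python
def tile_cost(k, env_penalty):
    # cost(t) = 1 + K * e(t)
    return 1.0 + k * env_penalty


def sink_cost(k, rat, delay_per_tile):
    # cost(s) = (K - 1) * RAT(s) / DpT
    return (k - 1.0) * rat / delay_per_tile


def branch_cost(k, rat, delay_per_tile, penalties):
    """Assumed accumulation: sink cost plus tile costs from the sink
    toward the merge point."""
    cost = sink_cost(k, rat, delay_per_tile)
    for e in penalties:
        cost += tile_cost(k, e)
    return cost


def merge_cost(k, cost_l, cost_r):
    # K = 1 adds both branches (total resources); K = 0 keeps only the
    # worse (more timing-critical) branch
    return max(cost_l, cost_r) + k * min(cost_l, cost_r)
```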
Layout Aware Fast and Flexible Buffer Trees
Example and Summary
A 7-pin net of an industrial design
(a) K=1.0, 4134ps slack improvement
(b) K=0.1, 4646ps slack improvement
Diffusion-Based Placement Techniques for Legalization
Classical legalization
After optimization, local regions can be overfull
Run periodically to snap cells from overlapping positions to legal positions
If one waits too long between two legalizations, cells may end up far away from their optimal positions, which can severely hurt timing
Diffusion-Based Legalization
Avoids moving cells too far from their current positions
Fast: runs in minutes on designs with millions of gates
Diffusion-Based Placement Techniques for Legalization
Diffusion as a Physical Process
Moves elements from a state with non-zero potential energy to a state of
equilibrium
Can be modeled by breaking down into finite time steps
Relationship of material concentration with time and space
$\frac{\partial d_{x,y}(t)}{\partial t} = \nabla^2 d_{x,y}(t)$, where $d_{x,y}(t)$ is the cell density at location (x, y) at time t
Diffusion-Based Placement Techniques for Legalization
Diffusion as a Physical Process
Cell velocity
$v^H_{x,y}(t) = -\frac{\partial d_{x,y}(t)/\partial x}{d_{x,y}(t)}, \qquad v^V_{x,y}(t) = -\frac{\partial d_{x,y}(t)/\partial y}{d_{x,y}(t)}$
Cell new location
$x(t) = x(0) + \int_0^t v^H_{x(t'),y(t')}(t')\,dt'$
$y(t) = y(0) + \int_0^t v^V_{x(t'),y(t')}(t')\,dt'$
Diffusion-Based Placement Techniques for Legalization
Diffusion Based Placement
Coordinates are scaled so that the width and height of each bin are one
Location (x, y) lies in bin $(j, k) = (\lfloor x \rfloor, \lfloor y \rfloor)$
Forward Time Centered Space (FTCS) scheme
New bin density
$d_{j,k}(n+1) = d_{j,k}(n) + \frac{\Delta t}{2}\left(d_{j+1,k}(n) + d_{j-1,k}(n) - 2d_{j,k}(n)\right) + \frac{\Delta t}{2}\left(d_{j,k+1}(n) + d_{j,k-1}(n) - 2d_{j,k}(n)\right)$
Bin velocity
$v^H_{j,k}(n) = -\frac{d_{j+1,k}(n) - d_{j-1,k}(n)}{2\,d_{j,k}(n)}, \qquad v^V_{j,k}(n) = -\frac{d_{j,k+1}(n) - d_{j,k-1}(n)}{2\,d_{j,k}(n)}$
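A plain-Python sketch of one FTCS density step and the central-difference bin velocities. Assumptions: densities are stored as `d[j][k]` with j along x and k along y; out-of-range neighbors in the density update reuse the edge bin's value, so no density flows across the chip boundary and the total is conserved; every bin has positive density so the velocity division is defined; and velocity components normal to the chip boundary are set to zero. The helper names are hypothetical.

```python
def _at(d, j, k):
    """Density of bin (j, k); out-of-range indices reuse the nearest edge bin."""
    j = min(max(j, 0), len(d) - 1)
    k = min(max(k, 0), len(d[0]) - 1)
    return d[j][k]


def ftcs_step(d, dt):
    """d(n+1) = d(n) + dt/2 * (second difference in x)
    + dt/2 * (second difference in y). Returns a new grid."""
    nj, nk = len(d), len(d[0])
    return [[d[j][k]
             + dt / 2 * (_at(d, j + 1, k) + _at(d, j - 1, k) - 2 * d[j][k])
             + dt / 2 * (_at(d, j, k + 1) + _at(d, j, k - 1) - 2 * d[j][k])
             for k in range(nk)]
            for j in range(nj)]


def bin_velocity(d):
    """vH = -(d[j+1][k] - d[j-1][k]) / (2 d[j][k]), vV likewise along k.
    vH is zero in the first and last columns of bins, vV in the first and
    last rows, so cells cannot be pushed out of the placement region."""
    nj, nk = len(d), len(d[0])
    vh = [[0.0 if j in (0, nj - 1) else
           -(d[j + 1][k] - d[j - 1][k]) / (2 * d[j][k])
           for k in range(nk)] for j in range(nj)]
    vv = [[0.0 if k in (0, nk - 1) else
           -(d[j][k + 1] - d[j][k - 1]) / (2 * d[j][k])
           for k in range(nk)] for j in range(nj)]
    return vh, vv
```

On a 5 x 3 grid of unit density with a peak of 5 in bin (2, 1), one step lowers the peak, raises its neighbors, and leaves the total unchanged; the bins left and right of the peak get velocities of -2 and +2, i.e. cells flow away from the dense bin.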
Diffusion-Based Placement Techniques for Legalization
Diffusion Based Placement
Enforce $v^H = 0$ at the left and right boundaries and $v^V = 0$ at the top and bottom boundaries, so cells cannot diffuse out of the placement region
Two adjacent cells in different bins can be assigned very different velocities, which could change their relative ordering. To remedy this, each cell’s velocity is interpolated from the four closest bins
New locations (x, y) for the next time step
$x(n+1) = x(n) + v^H_{x(n),y(n)}\,\Delta t$
$y(n+1) = y(n) + v^V_{x(n),y(n)}\,\Delta t$
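A sketch of the cell update with velocity interpolation from the four closest bins. It assumes each bin's velocity sits at the bin center (j + 0.5, k + 0.5) and uses bilinear interpolation clamped at the grid edge; the slide only says that the four closest bins are used, so the exact weighting is an assumption, as are the function names.

```python
def interp(v, x, y):
    """Bilinear interpolation of bin field v[j][k] at (x, y), assuming bin
    (j, k) covers [j, j+1) x [k, k+1) and its value sits at its center."""
    nj, nk = len(v), len(v[0])
    gx = min(max(x - 0.5, 0.0), nj - 1.0)  # position in bin-center coordinates
    gy = min(max(y - 0.5, 0.0), nk - 1.0)
    j0, k0 = int(gx), int(gy)
    j1, k1 = min(j0 + 1, nj - 1), min(k0 + 1, nk - 1)
    fx, fy = gx - j0, gy - k0
    return ((1 - fx) * (1 - fy) * v[j0][k0] + fx * (1 - fy) * v[j1][k0]
            + (1 - fx) * fy * v[j0][k1] + fx * fy * v[j1][k1])


def move_cells(cells, vh, vv, dt):
    """x(n+1) = x(n) + vH * dt and y(n+1) = y(n) + vV * dt, using velocities
    interpolated from the four closest bins at each cell's position."""
    return [(x + interp(vh, x, y) * dt, y + interp(vv, x, y) * dt)
            for x, y in cells]
```

Because the interpolated field is continuous, two cells on either side of a bin border (for example at x = 0.99 and x = 1.01) receive nearly the same velocity, which is what keeps their relative order intact.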
Diffusion-Based Placement Techniques for Legalization
Diffusion Based Placement: Getting It to Work
The diffusion process reaches equilibrium only when every bin has the same (average) density, so running it to equilibrium can cause unnecessary spreading even if every bin’s density is already well below dmax
Idea: run diffusion only in regions that require it
Local diffusion: run diffusion on the cells in a window around bins that violate the target density constraint
If the FTCS error exceeds a certain threshold, update the density from the actual cell placement and restart the diffusion algorithm
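Selecting the local diffusion region could look like the following sketch, which marks every bin within a hypothetical `margin` of a bin whose density exceeds dmax. The square (Chebyshev-distance) window and the helper name are assumptions; the slide does not specify the window's shape or size.

```python
def diffusion_window(d, d_max, margin):
    """Return the set of bins (j, k) within `margin` bins of any bin whose
    density exceeds d_max; only cells in these bins would be diffused."""
    nj, nk = len(d), len(d[0])
    hot = [(j, k) for j in range(nj) for k in range(nk) if d[j][k] > d_max]
    window = set()
    for j, k in hot:
        for jj in range(max(j - margin, 0), min(j + margin, nj - 1) + 1):
            for kk in range(max(k - margin, 0), min(k + margin, nk - 1) + 1):
                window.add((jj, kk))
    return window
```

If no bin violates dmax, the window is empty and no diffusion runs at all, which avoids the unnecessary spreading described above.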
Diffusion-Based Placement Techniques for Legalization
Example
Figure: before legalization, after traditional legalization, and after diffusion legalization
4% total wirelength reduction
48% worst slack improvement
36% fewer negative-slack paths
Summary
Diffusion-based legalization is less likely to disrupt the state of the design
Summary
Buffering trends
“Buffer Explosion”
Physical synthesis phases
4 phases
Fast Timing-Driven Buffering
Layout Aware Buffer Trees
Diffusion-Based Legalization
Thanks!
Q&A