Transcript Timing Closure Today - Zhejiang University
Timing Closure Today
Lou Scheffer, Cadence, San Jose, CA
Hangzhou, April 2002 Lou Scheffer 1
Timing Closure Today
Flow: Design Entry -> Synthesis -> Timing -> Place -> Timing -> Route -> Timing
• Timing more accurate as flow progresses
• Sometimes an earlier stage thinks timing is OK, but it fails a later stage
• Need to repeat one or more steps with tighter constraints
• We have a timing closure problem when this process fails. Symptoms include:
  – Non-convergence
  – Too many iterations
  – Solution achievable, but this flow cannot find it.
The Timing Closure Problem
[Figure: performance of Circuit Test 7 by flow stage (PKS/WLM, P&R, IPO, P&R), comparing the pks and regular flows.]
Examples of Problems
Worst slack / # misses
Design   Synthesis      Placed        Cycle time   Tech
C1       -1 / 2000      -12 / 38k     7.5 ns       .25 µm
V1       0 / 0          -12 / 15k     7.5 ns       .18 µm
T1       -0.5 / 2000    -48 / 164k    2.5-10 ns    .18 µm
P1       -0.4 / 100     -97 / 43k     8 ns         .25 µm
V2       -0.5 / 500     -11 / 2000    7.5 ns       .18 µm
Agenda
• Timing Analysis Overview
• Traditional design flows
• Summary of DSM Problems
• Correction Methods Overview
• Hierarchy and Timing Closure
• Block Level Timing Closure
• Experimental Results
• Summary
Timing Analysis
• Give accurate time values on each pin/port of the network
• Has to deal with design changes in optimization toolbox
• Static Timing Analysis
  – Simulation far too slow in optimization environment
  – Accuracy is more than enough
Timing Analysis Requirements
• Choose a combination of timing analyzer and delay calculator which
  – are appropriate for the level of design
  – give the best accuracy for the performance that can be tolerated
• Timing analysis / delay calculation must be able to cope with logic design changes
  – Incremental
  – Highest performance possible
  – Non-linear delay models
Timing Analysis Requirements
Must handle…
• Difference between rising and falling delays
• Delay dependent on slew rate
• Slew and delay dependent on output load
• Non-linear delay equations
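One common way to meet these requirements is a table-based delay model indexed by input slew and output load, with separate tables for rising and falling outputs. The sketch below is only an illustration of that idea, not any particular tool's delay calculator; the axes, table values, and function names are invented for the example.

```python
from bisect import bisect_right

# Hypothetical characterization data (not from the slides):
# rows are input slews (ns), columns are output loads (pF).
SLEWS = [0.1, 0.5]
LOADS = [0.0, 0.2, 0.4]
DELAY = {
    "rise": [[0.10, 0.20, 0.35], [0.15, 0.27, 0.45]],
    "fall": [[0.08, 0.17, 0.30], [0.12, 0.23, 0.40]],
}

def _bracket(axis, v):
    """Return the lower grid index and the fractional position of v in that cell."""
    i = min(max(bisect_right(axis, v) - 1, 0), len(axis) - 2)
    return i, (v - axis[i]) / (axis[i + 1] - axis[i])

def gate_delay(edge, slew, load):
    """Bilinear interpolation in the slew/load table for a rising or falling output.

    Points outside the table are extrapolated linearly from the nearest cell.
    """
    table = DELAY[edge]
    i, fs = _bracket(SLEWS, slew)
    j, fl = _bracket(LOADS, load)
    low = table[i][j] * (1 - fl) + table[i][j + 1] * fl
    high = table[i + 1][j] * (1 - fl) + table[i + 1][j + 1] * fl
    return low * (1 - fs) + high * fs

print(gate_delay("rise", 0.3, 0.1), gate_delay("fall", 0.3, 0.1))
```

The same lookup structure would be used for output slew, so that slew can be propagated to the next gate.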
Late Mode Analysis Definitions
[Figure: circuit with inputs a, b, c, internal net y, and output x; arrival times AT a, AT b are asserted at the inputs, required arrival time RAT x at the output, and d_ax is the delay from a to x.]
Constraints: assertions at the boundaries
– Arrival times: AT a, AT b
– Required arrival time: RAT x
Delay from a to x is the longest time it takes to propagate a signal from a to x
Slack is required arrival time - arrival time.
Example
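[Figure: two gates, each with delay 1. The first has inputs a, b and output y; the second has inputs y, c and output x.]
AT a = 0, AT b = 1, AT c = 0, RAT x = 2
AT y = 2, AT x = 3
SL a = 0 - 0 = 0
SL b = 0 - 1 = -1
SL y = 1 - 2 = -1
SL c = 1 - 0 = 1
SL x = 2 - 3 = -1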
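The late-mode numbers in this example can be reproduced with one forward pass for arrival times and one backward pass for required arrival times. This is a minimal sketch; the two-gate netlist (a, b into y; y, c into x; unit gate delays) is an assumption read off the slide's values, and the function name is made up for illustration.

```python
# Assumed netlist for the slide's example: (inputs, output, delay) in topological order.
GATES = [
    (("a", "b"), "y", 1),
    (("y", "c"), "x", 1),
]

def late_mode_slack(gates, input_at, output_rat):
    """Late mode: AT is the longest path from the inputs, slack = RAT - AT."""
    at = dict(input_at)
    for ins, out, delay in gates:                 # forward pass
        at[out] = max(at[i] for i in ins) + delay
    rat = dict(output_rat)
    for ins, out, delay in reversed(gates):       # backward pass
        for i in ins:
            rat[i] = min(rat.get(i, float("inf")), rat[out] - delay)
    slack = {net: rat[net] - at[net] for net in at}
    return at, rat, slack

at, rat, slack = late_mode_slack(GATES, {"a": 0, "b": 1, "c": 0}, {"x": 2})
print(slack)
```

The critical path b -> y -> x shows up as the chain of nets that all carry the worst slack of -1.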
Early mode analysis
Definitions change as follows
– longest becomes shortest
– slack = arrival – required
Not as important since early violations are easier to fix
[Figure: two-gate circuit analyzed in early mode; inputs a, b drive y, inputs y, c drive x, and each gate has delay 1.]
AT a = 0, AT b = 1, AT c = 0, RAT x = 2
AT y = 1, AT x = 1
SL a = 0 - 0 = 0
SL b = 1 - 0 = 1
SL y = 1 - 1 = 0
SL c = 0 - 1 = -1
SL x = 1 - 2 = -1
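Early mode uses the same two passes with the roles of max and min exchanged and the sign of slack reversed. The sketch below assumes the two-gate circuit with unit delays inferred from the slide's values; names are illustrative only.

```python
# Assumed netlist: (inputs, output, delay) in topological order.
GATES = [
    (("a", "b"), "y", 1),
    (("y", "c"), "x", 1),
]

def early_mode_slack(gates, input_at, output_rat):
    """Early mode: AT is the shortest path from the inputs, slack = AT - RAT."""
    at = dict(input_at)
    for ins, out, delay in gates:                 # forward pass uses min
        at[out] = min(at[i] for i in ins) + delay
    rat = dict(output_rat)
    for ins, out, delay in reversed(gates):       # backward pass uses max
        for i in ins:
            rat[i] = max(rat.get(i, float("-inf")), rat[out] - delay)
    slack = {net: at[net] - rat[net] for net in at}
    return at, rat, slack

at, rat, slack = early_mode_slack(GATES, {"a": 0, "b": 1, "c": 0}, {"x": 2})
print(slack)
```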
Delay modeling
[Figure: timing model. A gate with inputs a, b and output x has propagation arcs d_ax and d_bx. A clocked element with pins d, cl, o has a propagation arc d_cl_o and a test arc t_cl_d.]
Agenda
• Timing Analysis Overview
• Traditional design flows
• Summary of DSM Problems
• Timing Correction Overview
• Approaches to Fixing Timing Closure
• Experimental Results
• Summary
Traditional Design Flows
Flow: Design Entry -> Synthesis -> Timing -> Place -> Timing -> Route -> Timing
Synthesis consists of:
1. Tech independent optimization
2. Tech mapping
3. Rudimentary timing correction
Logic Synthesis
• Technology independent optimization
  – General goal: reduce connections, literals, redundancies, area
• Technology mapping
  – Map logic into technology library
• Timing correction added next
  – Find and fix critical timing paths
  – Fix electrical violations (load, slew)
Traditional Design Flows
Flow: Design Entry -> Synthesis w/Timing -> Place w/Timing -> Route -> Timing
Synthesis w/Timing consists of:
1. Tech independent optimization
2. Tech mapping
3. Timing correction
Integrate timing with synthesis and placement
Traditional Design Flows
Flow: Design Entry -> Synthesis/Placement w/Timing -> Global Route -> Detailed Route -> Timing
Synthesis/Placement w/Timing consists of:
1. Tech independent optimization
2. Tech mapping
3. Placement
4. Timing correction
Integrate timing with synthesis and placement
Traditional Design Flows
Flow: Design Entry -> Synthesis and Placement w/Timing and Global route -> Detailed Route -> Timing
Synthesis and Placement w/Timing and Global route consists of:
1. Tech independent optimization
2. Tech mapping
3. Placement
4. Timing correction
5. Global route
Integrate timing with synthesis, placement and global route
Agenda
• Timing Analysis Overview
• Traditional design flows
• Summary of DSM Problems
• Correction Methods Overview
• Hierarchy and Timing Closure
• Block Level Timing Closure
• Experimental Results
• Summary
The Wall
• Logic designers concentrate on logic and timing (as understood by synthesis)
  – Design work done in abstract world
  – Was gates and wire load models
  – Now may include placement and global route
• Throw design “over the wall” when complete
• Physical designers concentrate on layout and ability to route
• Effective method for many years
General CMOS Problems
• Low drive strengths / low power
• Capacitance (not intrinsic delay) plays a large role in performance
• Huge variability – range between slowest possible and fastest possible
• Noise affects delay
  – IR drop a big percentage of supply
  – Crosstalk can change delay by a factor of 2
Additional DSM Problems
• High density / huge designs
• Very thin and resistive wires
• Very high frequencies
  – Inductance becomes more important
• Smaller voltages
  – IR drop a bigger fraction of signal swing
• Clock skew and latency
• Electromigration and noise
Clock Distribution Problems
• Most common design approach requires close to zero skew
• CMOS / DSM problems all affect clocks
• Distribution problem increasing
  – Number of latches/flip-flops growing significantly
  – Power consumed in clock tree significant
I
and noise also of concern Hangzhou Lou Scheffer II-23
Process Designers are trying to help
• Many metal layers
• Different metal pitches
  – Small pitch for local interconnect
  – Big pitch/thick metal for long, fast wires
• Copper wires, thick metal to lower R
• SOI – Silicon On Insulator
• Low k dielectrics
• These help but are not enough
Agenda
• Timing Analysis Overview
• Traditional design flows
• Summary of DSM Problems
• Correction Methods Overview
• Hierarchy and Timing Closure
• Block Level Timing Closure
• Experimental Results
• Summary
Timing Correction
• Fix electrical violations (slew and load). Takes priority since needed for reliability.
  – Resize cells
  – Buffer nets
  – Copy (clone) cells
• Fix timing problems
  – Local transforms (bag of tricks)
  – Path-based transforms
Local Transforms
• Resize cells
• Buffer or clone to reduce load on critical nets
• Decompose large cells
• Swap connections on commutative pins or among equivalent nets
• Move critical signals forward
• Pad early paths
• Area recovery
Transform Example
Double Inverter Removal
[Figure: a path before and after removing a pair of inverters; Delay = 4 before, Delay = 2 after.]
Resizing
[Figure: a gate with inputs a, b drives sinks d, e, f with loads 0.2, 0.2, 0.3. A plot of delay (0 to 0.05) versus load (0 to 1) compares cell sizes A, B, C; the slide marks 0.035 for A and 0.026 for C.]
Cloning
[Figure: a gate with inputs a, b drives sinks d, e, f, g, h, each with load 0.2. A plot of delay (0 to 0.05) versus load (0 to 1) compares cell sizes A, B, C. After cloning, two copies of the gate (A and B) share the sinks.]
Can also isolate critical sinks
Buffering
[Figure: a gate with inputs a, b drives sinks d, e, f, g, h, each with load 0.2. A plot of delay (0 to 0.05) versus load (0 to 1) compares cell sizes A, B, C. After buffering, the gate drives d, e and a buffer with input load 0.1; the buffer drives f, g, h.]
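The load arithmetic behind the buffering example can be checked directly. This sketch uses the sink loads (0.2 each) and buffer input load (0.1) shown on the slide; the function name and the choice of which sinks move behind the buffer follow my reading of the figure and should be treated as assumptions.

```python
SINKS = {"d": 0.2, "e": 0.2, "f": 0.2, "g": 0.2, "h": 0.2}  # sink loads from the slide
BUFFER_INPUT_LOAD = 0.1                                      # buffer pin load from the slide

def loads_after_buffering(sinks, buffered, buffer_input_load):
    """Return (load on the original driver, load on the buffer) after insertion."""
    driver = sum(c for name, c in sinks.items() if name not in buffered)
    driver += buffer_input_load                 # the driver now also sees the buffer input
    buffer_load = sum(sinks[name] for name in buffered)
    return driver, buffer_load

before = sum(SINKS.values())
driver, buffer_load = loads_after_buffering(SINKS, {"f", "g", "h"}, BUFFER_INPUT_LOAD)
print(before, driver, buffer_load)
```

The driver's load drops from about 1.0 to about 0.5, which moves it to a faster point on the delay-versus-load curve; sinks f, g, h pay the extra buffer delay, so critical sinks are best left on the driver side.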
Redesign Fan-in Tree
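[Figure: fan-in tree of two-input gates, each with delay 1, with input arrival times Arr(a)=4, Arr(b)=3, Arr(c)=1, Arr(d)=0. Before: a balanced tree combining (a, b) and (c, d) gives Arr(e)=6. After: c and d are combined first, then b, then a, giving Arr(e)=5.]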
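The improvement from 6 to 5 can be reproduced by always combining the two earliest-arriving signals first. This is a sketch of that greedy idea, assuming an associative and commutative gate function (for example a wide AND) and unit gate delay; it is not claimed to be the exact procedure used by any tool.

```python
import heapq

def balanced_arrival(arrivals, gate_delay=1):
    """Output arrival of a balanced tree that pairs inputs in the given order."""
    level = list(arrivals)
    while len(level) > 1:
        level = [
            max(level[i:i + 2]) + gate_delay if i + 1 < len(level) else level[i]
            for i in range(0, len(level), 2)
        ]
    return level[0]

def greedy_arrival(arrivals, gate_delay=1):
    """Output arrival when the two earliest signals are always combined first."""
    heap = list(arrivals)
    heapq.heapify(heap)
    while len(heap) > 1:
        first = heapq.heappop(heap)
        second = heapq.heappop(heap)
        heapq.heappush(heap, max(first, second) + gate_delay)
    return heap[0]

arrivals = [4, 3, 1, 0]  # Arr(a), Arr(b), Arr(c), Arr(d) from the slide
print(balanced_arrival(arrivals), greedy_arrival(arrivals))
```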
Redesign Fan-out Tree
[Figure: fan-out tree before and after redesign. Longest path = 5 before, longest path = 4 after. Note the slowdown of a buffer due to load.]
Decomposition
Swap Commutative Pins
[Figure: inputs a, b, c with different arrival times connected to commutative pins with different pin delays, before and after swapping.]
Simple sorting on arrival times and delay works
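The sorting rule can be sketched as follows: give the latest-arriving signal the fastest pin. The arrival times, pin names, and pin delays below are invented for illustration, since the slide's figure values did not survive extraction.

```python
def assign_pins(arrivals, pin_delays):
    """Pair the latest-arriving signal with the fastest pin, and so on down both lists."""
    signals = sorted(arrivals, key=arrivals.get, reverse=True)   # latest first
    pins = sorted(pin_delays, key=pin_delays.get)                # fastest first
    return dict(zip(pins, signals))

def output_arrival(assignment, arrivals, pin_delays):
    """Late-mode arrival at the gate output for a given pin assignment."""
    return max(arrivals[sig] + pin_delays[pin] for pin, sig in assignment.items())

# Hypothetical data: three signals and three logically equivalent pins.
arrivals = {"a": 0, "b": 1, "c": 3}
pin_delays = {"p1": 2, "p2": 1, "p3": 1}

original = {"p1": "c", "p2": "b", "p3": "a"}     # late signal on the slow pin
swapped = assign_pins(arrivals, pin_delays)
print(output_arrival(original, arrivals, pin_delays),
      output_arrival(swapped, arrivals, pin_delays))
```

With these numbers the output arrival improves from 5 to 4 without adding or resizing any cell.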
Move Critical Signals Forward
[Figure: a network with inputs a, b, c, d and output e, before and after rewiring; the positions of c and d are exchanged.]
• Based on ATPG
  – Linear in circuit size
  – Detects redundancies efficiently
• Efficiently finds wires to be added and removed
  – Based on mandatory assignments
Path-based Transforms
• Path-based resizing
• Unmap / remap a path or cone
• Slack stealing
• Retiming
Slack Stealing
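Take advantage of timing behavior of level sensitive registers (latches)
[Figure: a two-stage path clocked by C1 and C2. The first stage has slack = -1 and the second has slack = +1; letting the late signal pass through the transparent latch gives slack = 0.]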
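The effect in the figure can be reproduced with a very simplified model in which each stage may borrow time from the next one through a transparent latch. The cycle time, stage delays, and transparency window below are invented so that the stage slacks match the slide's -1 and +1; the model ignores setup times and clock skew.

```python
def flop_slacks(delays, cycle):
    """Edge-triggered registers: every stage must fit in one cycle on its own."""
    return [cycle - d for d in delays]

def latch_slacks(delays, cycle, window):
    """Level-sensitive latches: a late stage may borrow up to `window` from the next stage."""
    slacks = []
    borrowed = 0                      # time the previous stage took from this one
    for index, delay in enumerate(delays):
        available = cycle - borrowed
        late = delay - available      # positive if data arrives after the latch opens
        is_last = index == len(delays) - 1
        borrow = 0 if is_last else min(max(late, 0), window)
        slacks.append(available + borrow - delay)
        borrowed = borrow
    return slacks

# Hypothetical numbers: 10-unit cycle, stage delays of 11 and 9, latch open for 5 units.
print(flop_slacks([11, 9], 10))       # per-stage slack with flip-flops
print(latch_slacks([11, 9], 10, 5))   # per-stage slack with latches
```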
Retiming
[Figure: registers moved backward or forward across logic; delay = 3 before, delay = 2 after.]
A more aggressive optimization since it changes the function
Solutions to Timing Closure
• Carry hierarchical logic design into physical
• Hand / Custom design
• Improved analysis
• More sophisticated clock design
• Modify existing flows
• More physically knowledgeable tools
  – Many variations: combined synthesis/place/route, gain based synthesis, etc.
Agenda
• Analysis Methods Overview
• Traditional design flows
• Summary of DSM Problems
• Correction Methods Overview
• Hierarchy and Timing Closure
• Block Level Timing Closure
• Experimental Results
• Summary
Hierarchy and Physical Design
• Logical hierarchy can be carried over into physical design
• Seems a natural top-down approach, using floorplanning as a firm guide to physical design
• Use of hierarchy offers many advantages and many possible problems
• A new generation of tools for this problem
Pin Assignment and Timing Budgeting
[Figure: Block 1, Block 2, and Block 3 connected through latches (L).]
Each block requires:
• Content definition
• Partitioning
• Pin locations
• Clock/timing definition
• Set_input_delay
• Set_output_delay
• Set_drive
• Set_load
• Path exceptions (false, multicycle paths)
Hierarchy and Physical Design Advantages…
• Run time of P&R tools
• Blocks can be built independently
• Early (and valuable) knowledge of global wires
• Limited wire delay within macro may allow simpler methodologies
• Contains the problem size
• Extends naturally to SOC and mixed A/D chips
• May be the only real method available
Physical Hierarchy Disadvantages
• Possible to overconstrain the design in many ways (see next slide)
• Hierarchy usually logic-based, not physically-based
  – Designed for logical correctness, not physical implementation
Physical Hierarchy Overconstraints
• Placement solution perhaps overconstrained
  – Logical gates may not fit naturally in a rectangle
• Ability to find a routable solution hindered
  – Can’t detour through neighboring cell
• Boundary conditions explode and must be managed carefully to avoid surprises
  – A recent IBM design had 17,000 top level connections. A bad timing constraint on any one can make the whole design infeasible
Hierarchy Example Plots
The Challenges
How to derive sensible partitioning?
How to achieve die utilization similar to “flat” approach?
How to achieve clock speed and skews similar to “flat” approach?
How to automatically generate optimal pin assignments for each module?
How to automatically come up with realistic timing budgets for each module?
Basic Approach to solution
Example tool – First Encounter
Start with a “Silicon Virtual Prototype”
• A near final quality ‘flat’ placement
• Near legal routing
• Known feasible solution for timing and routability
Use this solution to guide the final implementation
• Partitioning, pin assignment, timing constraints
• Build the blocks with more detailed tools.
[Figure: flow from RTL / gates to the Silicon Virtual Prototype (hierarchical partitioning and placement), then Top Level (top level buffering, clock balancing, and power grid) and Block Level (physical synthesis / placement and routing), then chip assembly routing and GDSII.]
Basic Approach continued
[Figure: flow diagram. Inputs: logic design (RTL such as a = b + c, gates, IP, “black box” IP), netlist, const., .lib, and physical data.]
• Physical Prototype: complete “flat” physical design (proves timing and routability), VERY FAST
• Silicon Virtual Prototype / Full-Chip Physical Prototype: accurate timing and routability data in hours instead of days or weeks
• Hand off a floorplan OR full placement
• Block-Level Physical Synthesis and/or Route
• Confidence the design will work once the blocks are re-assembled into the complete IC
In-Context Hierarchical Partitioning
[Figure: flow. Partitioning (pin assignment, timing budgeting, clock tree generation, power grid planning) -> Independent block-level implementation -> SoC assembly.]
In-Context Pin Assignment
[Figure: accurate physical prototype, shown as a flat full-chip view and a top level partition view.]
• Full-chip prototype results in optimal pin placement
• Results in narrower channels and reduced die size
• Reduces the routing congestion
• Improves the chip timing
In Context Timing Budgeting
[Figure: Block 1, Block 2, and Block 3 connected through latches (L).]
Each block requires:
• Clock definition
• Set_input_delay
• Set_output_delay
• Set_drive
• Set_load
• Path exceptions (false, multicycle paths)
Accurate timing budgets result in predictable timing convergence
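One simple way to turn flat-prototype path delays into per-block budgets is to scale each segment's delay so the whole path exactly fills the cycle, then read off how much time is consumed before and after each block. This is only a sketch of one possible heuristic (proportional allocation), not a description of how First Encounter computes budgets; the block names and delays are hypothetical.

```python
def budget_path(segments, cycle):
    """segments: list of (block, delay) along one register-to-register path.

    Returns, per block, the time already used when the signal enters it
    (a candidate input delay) and the time still needed after it leaves
    (a candidate output delay).
    """
    total = sum(delay for _, delay in segments)
    scale = cycle / total                      # stretch or shrink every segment equally
    budgets, elapsed = [], 0.0
    for block, delay in segments:
        allocation = delay * scale
        budgets.append({
            "block": block,
            "budget": allocation,
            "input_delay": elapsed,
            "output_delay": cycle - elapsed - allocation,
        })
        elapsed += allocation
    return budgets

# Hypothetical path: leaves Block 1, crosses top-level wiring, ends in Block 2.
for row in budget_path([("Block 1", 2.0), ("top", 1.0), ("Block 2", 3.0)], cycle=7.5):
    print(row)
```

The resulting numbers are the kind of values that would be handed to each block as input and output delay constraints.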
Agenda
• Analysis Methods Overview
• Traditional design flows
• Summary of DSM Problems
• Correction Methods Overview
• Hierarchy and Timing Closure
• Block Level Timing Closure
• Experimental Results
• Summary
Blocks have timing closure problems, too
Didn’t the big flat placement guarantee blocks are feasible? No, because
• Block may not have been defined when global constraints were set
• Global placer does not deal with all DSM effects
• Block may be too hard for the relatively simple global placer (which must be very fast)
• Requirements change as project progresses
• Process technology may have changed
• …..
Hand/Custom Design
• Mentioned for completeness
• Hurts productivity
• Yields highest performance
• Can only fix a few things – for example:
  – Can realistically fix timing or crosstalk problems on a few nets
  – Cannot realistically change the size of blocks
Improved Analysis Helps
• Plot shows slack by net for two designs
• A 10% timing delta -> many more bad nets
• Often the difference between success and failure
[Figure: number of nets (0 to 3500) versus slack relative to worst net (-5 to 20 ns) for Series1 and Series2.]
More accurate analysis
• Crosstalk induced delay
  – Old approach – overestimate coupling C
  – Better – compute nominal timing + xtalk delta
• Customer example from CadMos
  – Ignore crosstalk completely (not an acceptable alternative): 400 MHz
  – Coupling caps overestimated by 60%: 300 MHz
  – Nominal delays + computed crosstalk: 333 MHz
• More accurate analysis gains 10% margin
Increased accuracy helps
• Global/detailed route correlation
• Any global route better than Wire Load Models or Steiner trees, since global routes consider congestion
• But to get that last 10%, need global/detailed router link
  – Knowing some nets must detour is good, but….
  – Which net takes which detour is needed for good correlation
Modified clock design
• Zero skew is not necessary, and often not even desirable
• We have the freedom to adjust clock arrival times at memory elements
• This obtains more margin and thus helps convergence
• Similar to retiming but less disruptive
• Improvement very design dependent
  – If worst path is flip-flop to itself, doesn’t help
• May impact scan chains
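A two-stage example shows where the extra margin comes from, and why a flip-flop-to-itself path gains nothing. The sketch assumes late-mode (setup) constraints only and ignores hold checks; the delays and cycle time are invented for illustration.

```python
def stage_slacks(d1, d2, cycle, skew):
    """Path FF_A -> FF_B -> FF_C, with the clock at FF_B delayed by `skew`."""
    slack_in = cycle + skew - d1      # FF_B captures later, so stage 1 gets more time
    slack_out = cycle - skew - d2     # FF_B also launches later, so stage 2 gets less
    return slack_in, slack_out

def best_skew(d1, d2, cycle):
    """Balancing the two slacks maximizes the worst one."""
    skew = (d1 - d2) / 2
    return skew, min(stage_slacks(d1, d2, cycle, skew))

def self_loop_slack(delay, cycle, skew):
    """A path from a flip-flop back to itself: launch and capture shift together."""
    return cycle + skew - skew - delay

print(min(stage_slacks(11, 9, 10, 0)))   # worst slack with zero skew
print(best_skew(11, 9, 10))              # chosen skew and resulting worst slack
print(self_loop_slack(11, 10, 1.0))      # unchanged by skew
```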
Previous attempts to fix block closure
Without the radical step of combining synthesis and placement, designers have tried:
• Allow placer to do sizing and buffering
• Do post placement optimization
  – Simple transformations
  – Use existing placement
• Do post placement re-synthesis
  – Complex transformations allowed
  – Needs incremental placement and extraction
But these have not been fully successful. Why?
• Re-examine the root cause of discrepancies
  – Wire load models and their limitations
• Combined Synthesis/Placement/Routing
Post-Placement Optimization
Flow: Design Entry -> Synthesis w/Timing -> Place -> Re-run Synthesis w/Timing -> Route -> Timing
Re-run Synthesis w/Timing uses:
1. In-place optimizations
2. Optimizations that minimally disturb placement
Post-Placement Optimization
• In-place (little or no placement impact)
  – Resizing (carefully)
  – Pin swapping, some tree rebuilding
  – Wire sizing / typing
• Minimally disruptive
  – Resizing
  – Buffering
  – Cloning
  – Tree rebuilding
  – Cell removal
In-place Optimization
• Not too difficult
• Can use extracted electrical data (C, RC) from placement tool
  – Some changes affect pin locations, but may be ignored
  – Tree rebuilding needs incremental extraction
• Can use timing reports for timing data
• But, accuracy suffers as changes are made
  – Real RC data replaced by estimates again
In-place Optimization
[Figure: flow. Placement & extraction produces the placed netlist and C/RC data; Optimization (resize, swap pins, rebuild trees) produces the opt’d netlist.]
Place-disruptive Optimization
Nets changing implies…
• Must be able to recompute C and RC
• May need to incrementally place new cells
• Need incremental timing capability
Place-disruptive Optimization
[Figure: flow. Placement & extraction produces the placed netlist and C/RC data; Optimization with placer, timer, extractor (resize, buffer, clone, cell removal, rebuild trees) produces the opt’d netlist.]
What are the problems?
• Getting the timing right
  – Different timers used at different stages
  – Do the optimizer and placer see the same worst paths as the static timer?
• Design size / tool capacity
  – Using synthesis technology on flat designs
More problems
• Incompatible tools, formats
  – Placer, synthesizer, timer may all use different file formats, may all be from different vendors
  – Basic interoperability issues
• Incremental placer needed for new cells
  – Doesn’t have to be smart
  – But might produce some infeasible solutions
  – Must be integrated with optimizer
Still more challenges/problems
• Extraction/Estimation of net data
• Any optimization which significantly alters net topology needs this ability
  – Insert cells
  – Remove cells
  – Move connections from one cell to another
• Steiner tree estimation
• Net C and delay (RC) calculator
• Do results match detail router and other extraction tools?
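A rough net estimator of the kind an optimizer needs after changing a net can be sketched as below. It substitutes half-perimeter wirelength for a true Steiner tree (the two agree for nets of two or three pins and the half-perimeter is otherwise a lower bound) and uses a lumped Elmore-style delay; the per-unit resistance and capacitance, the driver resistance, and the pin coordinates are placeholders in arbitrary but consistent units.

```python
def hpwl(pins):
    """Half-perimeter wirelength of the bounding box of the pin locations."""
    xs, ys = zip(*pins)
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

def estimate_net(pins, sink_caps, r_per_unit, c_per_unit, r_driver):
    """Return (length, total capacitance, delay estimate) for one net.

    The wire is treated as a single segment with all sinks lumped at its far end.
    """
    length = hpwl(pins)
    c_wire = c_per_unit * length
    r_wire = r_per_unit * length
    c_sinks = sum(sink_caps)
    c_total = c_wire + c_sinks
    delay = r_driver * c_total + r_wire * (c_wire / 2 + c_sinks)
    return length, c_total, delay

# Hypothetical three-pin net: driver at the origin and two sinks.
pins = [(0, 0), (100, 50), (40, 200)]
print(estimate_net(pins, sink_caps=[2.0, 2.0], r_per_unit=0.1, c_per_unit=0.2, r_driver=50.0))
```

Whether estimates like this track the detail router and sign-off extraction is exactly the correlation question the slide raises.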
Sample Optimization Results
Worst slack / # misses
Design   Synthesized    Placed        Opt          Cycle time   Tech
C1       -1 / 2000      -12 / 38k     -2 / 1400    7.5 ns       .25 µm
V1       0 / 0          -12 / 15k     -0.3 / 100   7.5 ns       .18 µm
T1       -0.5 / 2000    -48 / 164k    -6 / 62k     2.5-10 ns    .18 µm
P1       -0.4 / 100     -97 / 43k     -13 / 20k    8 ns         .25 µm
V2       -0.5 / 500     -11 / 2000    -4 / 1000    7.5 ns       .18 µm
Root Problem is Wire Load Models
• Main problem: correlation between pre-P&R estimates and post-P&R extraction
• If correlation is good…
  – Problems detected and potentially fixed early
• If correlation is bad…
  – Problems detected late
  – Not a good situation! Need to re-write RTL is worst case for timing closure.
Why are Wire Load Models Used?
• Can’t complete layout until logic design is complete
• Can’t complete logic design without timing
• Can’t time without load and net delay data
• Can’t extract load and net delay data until layout is complete
• Can’t complete layout …
WLM solution – use statistics
• Don’t know specific layout data
• But we know something about statistical properties
  – Average net load, average net delay
• Further refine using other characteristics
  – Number of sinks
  – Size of design (number of circuits)
  – Physical size
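A wire load model in this statistical sense is essentially a table of average net capacitance indexed by fan-out. The sketch below builds such a table from extracted data of an earlier layout and looks up an estimate for a new net; the data, function names, and nearest-fan-out fallback are assumptions for illustration, not any library format.

```python
from collections import defaultdict
from statistics import mean

def build_wlm(nets):
    """nets: iterable of (fanout, extracted_cap). Returns fanout -> average capacitance."""
    by_fanout = defaultdict(list)
    for fanout, cap in nets:
        by_fanout[fanout].append(cap)
    return {fanout: mean(caps) for fanout, caps in by_fanout.items()}

def wlm_cap(wlm, fanout):
    """Estimate a net's capacitance; fall back to the nearest characterized fan-out."""
    if fanout in wlm:
        return wlm[fanout]
    nearest = min(wlm, key=lambda f: abs(f - fanout))
    return wlm[nearest]

# Hypothetical extracted nets from a previous pass through placement and routing.
extracted = [(1, 0.01), (1, 0.03), (2, 0.04), (2, 0.06), (2, 0.20)]
wlm = build_wlm(extracted)
print(wlm_cap(wlm, 1), wlm_cap(wlm, 2), wlm_cap(wlm, 5))
```

Separate tables per design size or physical size would be the further refinement the slide mentions.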
Correlation Pre/Post-P&R using averages
• Wire load models of physical design give synthesis an estimate
• We can correlate averages pre- and post-P&R as accurately as needed
• If specific design has average behavior, its timing, on average, can be predicted
• Otherwise, a pass through placement can provide correct WLM for a design, and get the averages right
Timing and averages
• WLMs OK for area, power (properties that are sums are well handled by statistics)
• But, timing dictated by the worst specific path
• That path is built of individual nets
• One net can determine the speed of an entire design
• Reality: poor correlation for relatively few nets can cause major headaches
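The difference between a sum-type property and a worst-case property is easy to see numerically. In this sketch every net is estimated by the average capacitance, as a wire load model would do; the capacitance values are invented, with one long net among several short ones.

```python
from statistics import mean

def wlm_errors(actual_caps):
    """Compare per-net average estimates with actual values.

    Returns (error in total capacitance, error on the single worst net).
    """
    estimate = mean(actual_caps)                       # one number for every net
    total_error = sum(actual_caps) - estimate * len(actual_caps)
    worst_error = max(actual_caps) - estimate
    return total_error, worst_error

# Hypothetical extracted capacitances for nets with the same fan-out.
total_error, worst_error = wlm_errors([0.02, 0.03, 0.02, 0.03, 0.40])
print(total_error, worst_error)
```

The total, which drives area and power estimates, is predicted essentially exactly, while the one long net, which may sit on the critical path, is underestimated by a factor of four.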
Correlation Pre/Post-P&R Averages and Wire Loads
[Figure: histogram “Distribution of C / fan-out”: number of nets (0 to 30000) versus pF per fan-out (0 to 110), with the median and mean marked.]
Note the very long tails of this distribution
Correlation Pre/Post-P&R: C wire Data by Logic Design
[Figure: C wire versus number of fan-outs.]
Better Wire Load Models
• How can we use information from one pass through physical design?
  – Adjust wire load model coefficients
  – Back annotate specific net load and delay data to the logic design
• New problem: correlation of logic pre- and post-synthesis
• But, there are fundamental limits to statistical models – a new approach is needed.
A better (but harder) approach: Combine Synthesis, P & R
• Don’t use wire load models at all
• Synthesis does a trial placement as it runs
• Loading found from estimated routes
• For best results, must include global routing
  – Then, feed global route to detailed router
  – Or, do detailed route itself
• Much better correlation and timing closure
• No inter-tool data transfer headaches
Example of Combined SP&R
• Video Graphics Engine
• 160k instances
• 70 macros (blocks)
• 5 layers, 0.18 micron
• Target freq: 100 MHz
Conventional Flow
• More than 20 iterations
• 89 MHz best result w/manual changes
[Figure: flow. Synthesis (DC) -> Static Timing (PT) -> syn2GCF -> SE (placement base optimization, global route, detail route) -> Extraction -> Delay calc -> Pearl -> DRC. Inputs: func. & timing .lib, floorplan DEF, func. & timing .TLF, physical LEF.]
Combined SP&R Flow
• 100 MHz final result, met timing
• Correlation within ±2.1%
• One pass
• 12 hrs 20 min runtime
[Figure: flow. Static Timing (PT) -> write_constraints -> TCL constraints; SE-PKS (PKS optimization, global route, static timing); HE (detail route); then Extraction -> Delay calc -> Pearl -> DRC. Inputs: floorplan DEF, func. & timing .TLF, EDIF netlist, physical LEF.]
Slack Correlation
[Figure: slack correlation against routed results, for PKS and for the wire load based flow.]
Enlargement of SP&R slack
[Figure: enlarged view of the SP&R slack correlation.]
Results from combined SP&R
Case   Size (k instances)   Macros   PKS timing error (%)   Max freq, conventional (MHz)   Max freq, SP&R (MHz)
1      350                  56       ±3%                    140                            140
2      250                  50       ±3%                    97                             100
3      50                   4        ±0.96%                 93                             95
4      160                  70       ±2.1%                  89                             100
Agenda
• Traditional design flows
• Summary of DSM Problems
• Analysis Methods Overview
• Correction Methods Overview
• Approaches to Fixing Timing Closure
• Experimental Results
• Summary
Experimental Results
• For Hierarchical design, two objectives
  – Should be faster and higher capacity than a fully detailed flat design, to find problems earlier
  – Resulting partitions should be realizable
• For Block Design, compare different strategies
  – Can overconstrain clock or wire models
  – Can do IPO or not after placing
  – Can allow placer to change size or not
  – Can test combined synthesis/placement against running the two tools separately
Hierarchical Experimental results
• Design 580K cells, 0.25um process, 5LM, 100MHz
• Data collected on a 500MHz processor workstation
• Resulting blocks were realizable
[Figure: comparison of First Encounter Flow and Traditional Flow for the steps Design Import, Detail Place, Detail Route*, RC Extract, Delay Calculation, Timing Analysis, IPO, and Design Iteration; the ratios shown are 6x, 1x, 60x, 56x, 57x, 7x, 33x, and 5x. (*) SPC Trial Route]
How do different block design approaches compare?
Jay McDougal of Agilent ran many flows on the same design
1. Overconstrain clock by various amounts
2. Accurate or conservative WLMs
   – Tried many levels of conservatism
3. Allow placer to size or not
4. Do post placement optimization or not
5. Physically knowledgeable synthesis
Characteristics of sample design
• Design not very difficult
  – ColdFire processor
  – 80K instances
  – 0.25 micron library
  – 5 layer process, not congestion dominated
  – Design goal was 180 MHz, known to be possible with this design
  – 85% of delay in gates; 15% in interconnect
• 0.18/0.13 micron, bigger designs will show bigger differences between techniques
Key to the plot of results
• Basic flow – Design Compiler & QPlace
• TDD = timing driven design
  – In addition to minimizing wire length and congestion, placer is given timing constraints and allowed to change gate sizes
• IPO and PBO are post placement optimizers
  – IPO – runs on synthesis DB with back annotation
  – PBO – runs on physical DB with synthesis transforms
• PKS = Physically Knowledgeable Synthesis (combined Synthesis/Place/Route)
Comparison of Approaches
[Figure: cycle time (5 to 9.5) versus relative size (0.95 to 1.25), with the required cycle time marked. Series: No custom WLM; 90% WLM; 3ns;50%WL; IPO 5ns NoWL; IPO 3ns NoWL; TDD/PBO 50%WL; TDD/PBO 90%WL; PKS.]
Comparison of Approaches
[Figure: cycle time (5 to 9.5) versus relative size (0.95 to 1.25). Series: No WLM; 90% WLM; 3ns;50%WL; IPO 5ns NoWL; IPO 3ns NoWL; TDD/PBO 50%WL; TDD/PBO 90%WL; PKS.]
Annotations on the plot:
• Good area, but iterates between placement and synthesis, worst TTM, didn’t hit timing target
• One tool, no iteration, better TTM, hit timing target
Agenda
• Traditional design flows
• Summary of DSM Problems
• Analysis Methods Overview
• Correction Methods Overview
• Approaches to Fixing Timing Closure
• Experimental Results
• Summary
Good News
• At least we understand the problem
  – Analysis of timing is well understood
  – Transformations that help timing are well understood
  – DSM effects are painful but can be controlled
Bad News
• Cycle time and technology advances demand more and more sophisticated optimization techniques
• In previous flows, corrections must be applied in separate tools
• Disconnects among the various tools involved increase turn-around time and limit optimization
Good News
• The Bad News is commonly recognized
• Many tool vendors, academics, in-house EDA researchers are working to solve these problems
• A new generation of tools is already available that was designed from the ground up to address timing closure
  – Hierarchical and block design
Bad News
These problems won’t be the last!
• Each process generation brings new problems
  – Increased size
  – Weird process rules (antenna)
  – Possible new effects (single event upset)
Summary
• Timing closure is a very real problem
• Large chips need hierarchical tools to help with partitioning and budgeting
• Block tools must understand synthesis, placement and routing
• Wire load models have serious limitations
• Best approach is combined synthesis/P&R
• Experimental data backs this up
Acknowledgements
Tony Drumm wrote the original set of slides for this lecture, including many of the examples. He credits:
• Alex Suess
• José Neves
• Bill Joyner
• IBM Rochester EDA folks
But the conclusions, and any mistakes, are mine