Timing Closure Today - Zhejiang University



Timing Closure Today

Lou Scheffer, Cadence, San Jose, CA

Hangzhou, April 2002 Lou Scheffer 1

Timing Closure Today

Flow: Design Entry -> Synthesis -> Timing -> Place -> Timing -> Route -> Timing

• Timing becomes more accurate as the flow progresses
• Sometimes an earlier stage thinks timing is OK, but the design then fails at a later stage
• One or more steps must then be repeated with tighter constraints

We have a timing closure problem when this process fails. Symptoms include:

• Non-convergence
• Too many iterations
• A solution is achievable, but this flow cannot find it

The Timing Closure Problem

[Chart: performance of circuit Test 7 (y-axis 75 to 100) at each flow stage (PKS/WLM, P&R, IPO P&R), comparing a PKS flow with a regular flow.]

Examples of Problems

Design  Worst slack / # misses      Cycle time  Tech
        Synthesis    Placed
C1      -1 / 2000    -12 / 38k      7.5 ns      .25 µm
V1      0 / 0        -12 / 15k      7.5 ns      .18 µm
T1      -0.5 / 2000  -48 / 164k     2.5-10 ns   .18 µm
P1      -0.4 / 100   -97 / 43k      8 ns        .25 µm
V2      -0.5 / 500   -11 / 2000     7.5 ns      .18 µm

Agenda

• Timing Analysis Overview
• Traditional design flows
• Summary of DSM Problems
• Correction Methods Overview
• Hierarchy and Timing Closure
• Block Level Timing Closure
• Experimental Results
• Summary

Timing Analysis

• Gives accurate time values on each pin/port of the network
• Has to deal with design changes made by the optimization toolbox

Static Timing Analysis

• Simulation is far too slow in an optimization environment
• Accuracy is more than enough

Timing Analysis Requirements

• Choose a combination of timing analyzer and delay calculator appropriate for the level of design
  – Gives the best accuracy
  – For the performance that can be tolerated
• Timing analysis / delay calculation must be able to cope with logic design changes
  – Incremental
  – Highest performance possible
  – Non-linear delay models

Timing Analysis Requirements

• Must handle…
  – Difference between rising and falling delays
  – Delay dependent on slew rate
  – Slew and delay dependent on output load
  – Non-linear delay equations
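Requirements like these are commonly met with table-based (non-linear) delay models: a cell's delay is looked up in a 2-D table indexed by input slew and output load, with separate tables for rising and falling transitions. The sketch below shows bilinear interpolation in such a table. The table values, axis points and function names are hypothetical, and real cell libraries add details (for example separate output-slew tables) that are not modeled here.

```python
from bisect import bisect_right

def _interval(axis, x):
    """Index i of the axis interval [axis[i], axis[i+1]] used for x (edges extrapolate)."""
    i = bisect_right(axis, x) - 1
    return min(max(i, 0), len(axis) - 2)

def nldm_lookup(slews, loads, table, slew, load):
    """Bilinear interpolation in a table indexed by input slew (rows) and output load (columns)."""
    i, j = _interval(slews, slew), _interval(loads, load)
    ts = (slew - slews[i]) / (slews[i + 1] - slews[i])
    tl = (load - loads[j]) / (loads[j + 1] - loads[j])
    return ((1 - ts) * (1 - tl) * table[i][j]
            + (1 - ts) * tl * table[i][j + 1]
            + ts * (1 - tl) * table[i + 1][j]
            + ts * tl * table[i + 1][j + 1])

# Hypothetical cell data: slew in ns, load in pF, delays in ns.
SLEWS = [0.05, 0.2, 0.5]
LOADS = [0.01, 0.05, 0.2]
RISE = [[0.05, 0.09, 0.22],
        [0.07, 0.11, 0.25],
        [0.10, 0.15, 0.30]]
FALL = [[0.04, 0.07, 0.18],
        [0.06, 0.09, 0.21],
        [0.09, 0.13, 0.26]]

# A point halfway between grid points in both slew and load.
rise = nldm_lookup(SLEWS, LOADS, RISE, 0.125, 0.03)
fall = nldm_lookup(SLEWS, LOADS, FALL, 0.125, 0.03)
print(f"rise {rise:.3f} ns, fall {fall:.3f} ns")  # rise 0.080 ns, fall 0.065 ns
```

Outside the table, this sketch extrapolates from the outermost interval; that is one common choice, and tools differ on it.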

Late Mode Analysis Definitions

[Figure: inputs a and b drive a gate with output y; y and input c drive a gate with output x; $d_{ax}$ is the delay from a to x.]

• Constraints: assertions at the boundaries
  – Arrival times: $AT_a$, $AT_b$
  – Required arrival time: $RAT_x$
• The delay from a to x is the longest time it takes to propagate a signal from a to x
• Slack is required arrival time minus arrival time: $SL = RAT - AT$

Example

[Figure: the same circuit with unit gate delays and assertions $AT_a = 0$, $AT_b = 1$, $AT_c = 0$, $RAT_x = 2$.]

$AT_y = 2$, $AT_x = 3$

$SL_a = 0 - 0 = 0$
$SL_b = 0 - 1 = -1$
$SL_c = 1 - 0 = 1$
$SL_y = 1 - 2 = -1$
$SL_x = 2 - 3 = -1$

Early mode analysis

• Definitions change as follows:
  – "longest" becomes "shortest"
  – slack = arrival - required
• Not as important, since early violations are easier to fix

[Figure: same circuit and assertions as the late-mode example: $AT_a = 0$, $AT_b = 1$, $AT_c = 0$, $RAT_x = 2$, unit gate delays.]

$AT_y = 1$, $AT_x = 1$

$SL_a = 0 - 0 = 0$
$SL_b = 1 - 0 = 1$
$SL_c = 0 - 1 = -1$
$SL_y = 1 - 1 = 0$
$SL_x = 1 - 2 = -1$

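Delay modeling

[Figure: timing model of a combinational cell with propagation arcs from inputs a and b to output x (delays $d_{ax}$, $d_{bx}$), and of a clocked cell with a test arc ($t_{cl,d}$) between clock cl and data d and a propagation arc ($d_{cl,o}$) from clock cl to output o.]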

Agenda

• Timing Analysis Overview
• Traditional design flows
• Summary of DSM Problems
• Timing Correction Overview
• Approaches to Fixing Timing Closure
• Experimental Results
• Summary

Traditional Design Flows

[Diagram: Design Entry -> Synthesis (tech independent optimization, tech mapping, rudimentary timing correction) -> Timing -> Place -> Timing -> Route -> Timing]

Logic Synthesis

• Technology independent optimization
  – General goal: reduce connections, literals, redundancies, area
• Technology mapping
  – Map logic into technology library
• Timing correction added next
  – Find and fix critical timing paths
  – Fix electrical violations (load, slew)

Traditional Design Flows

[Diagram: Design Entry -> Synthesis w/Timing (tech independent optimization, tech mapping, timing correction) -> Place w/Timing -> Route -> Timing]

Integrate timing with synthesis and placement

Traditional Design Flows

[Diagram: Design Entry -> Synthesis/Placement w/Timing (tech independent optimization, tech mapping, placement, timing correction) -> Global Route -> Detailed Route -> Timing]

Integrate timing with synthesis and placement

Traditional Design Flows

[Diagram: Design Entry -> Synthesis and Placement w/Timing and Global Route (tech independent optimization, tech mapping, placement, timing correction, global route) -> Detailed Route -> Timing]

Integrate timing with synthesis, placement and global route

Agenda

• Timing Analysis Overview
• Traditional design flows
• Summary of DSM Problems
• Correction Methods Overview
• Hierarchy and Timing Closure
• Block Level Timing Closure
• Experimental Results
• Summary

The Wall

• Logic designers concentrate on logic and timing (as understood by synthesis)
  – Design work done in an abstract world
  – Was gates and wire load models
  – Now may include placement and global route
• Throw the design “over the wall” when complete
• Physical designers concentrate on layout and ability to route
• Effective method for many years

General CMOS Problems

• Low drive strengths / low power
• Capacitance (not intrinsic delay) plays a large role in performance
• Huge variability: range between slowest possible and fastest possible
• Noise affects delay
  – IR drop a big percentage of supply
  – Crosstalk can change delay by a factor of 2

Additional DSM Problems

• High density / huge designs
• Very thin and resistive wires
• Very high frequencies
  – Inductance becomes more important
• Smaller voltages
  – IR drop a bigger fraction of signal swing
• Clock skew and latency
• Electromigration and noise

Clock Distribution Problems

• Most common design approach requires close to zero skew
• CMOS / DSM problems all affect clocks
• Distribution problem increasing
  – Number of latches/flip-flops growing significantly
• Power consumed in clock tree significant
• ΔI and noise also of concern

Process Designers are trying to help

• Many metal layers
• Different metal pitches
  – Small pitch for local interconnect
  – Big pitch/thick metal for long, fast wires
• Copper wires, thick metal to lower R
• SOI (Silicon On Insulator)
• Low-k dielectrics
• These help but are not enough

Agenda

• Timing Analysis Overview
• Traditional design flows
• Summary of DSM Problems
• Correction Methods Overview
• Hierarchy and Timing Closure
• Block Level Timing Closure
• Experimental Results
• Summary

Timing Correction

• Fix electrical violations (slew and load). Takes priority, since needed for reliability.
  – Resize cells
  – Buffer nets
  – Copy (clone) cells
• Fix timing problems
  – Local transforms (bag of tricks)
  – Path-based transforms

Local Transforms

• Resize cells
• Buffer or clone to reduce load on critical nets
• Decompose large cells
• Swap connections on commutative pins or among equivalent nets
• Move critical signals forward
• Pad early paths
• Area recovery

Transform Example

[Diagram: double inverter removal; delay = 4 before the transform, delay = 2 after.]

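Resizing

[Figure: a cell with inputs a and b drives sinks d, e and f (loads 0.2, 0.2 and 0.3); which size should the driver be? A plot of delay (0 to 0.05) versus load (0 to 1) for cell sizes A, B and C shows a delay of 0.035 for A and 0.026 for C at this load.]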
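The resizing question can be sketched with a linear delay model per cell size, delay = intrinsic + drive resistance × load, picking the smallest cell that meets a delay target. The cell parameters below are hypothetical, chosen only so that sizes A and C roughly reproduce the slide's 0.035 and 0.026 at a load of 0.7; the model also ignores that a bigger cell presents more input load to the stage driving it.

```python
# Hypothetical linear delay models: delay = intrinsic + drive_resistance * load.
CELLS = {
    # name: (area, intrinsic delay, drive resistance)
    "A": (1.0, 0.010, 0.036),
    "B": (2.0, 0.012, 0.024),
    "C": (3.0, 0.014, 0.017),
}

def cell_delay(name, load):
    _, intrinsic, drive_r = CELLS[name]
    return intrinsic + drive_r * load

def pick_size(load, max_delay):
    """Smallest-area cell meeting max_delay at this load; otherwise the fastest cell."""
    for name in sorted(CELLS, key=lambda n: CELLS[n][0]):
        if cell_delay(name, load) <= max_delay:
            return name
    return min(CELLS, key=lambda n: cell_delay(n, load))

load = 0.2 + 0.2 + 0.3  # sinks d, e, f from the slide
for name in CELLS:
    print(name, round(cell_delay(name, load), 4))
print(pick_size(load, max_delay=0.03))  # B
```

Preferring the smallest passing size is what keeps resizing from undoing area recovery.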

Cloning

[Figure: a cell with inputs a and b drives sinks d, e, f, g and h (load 0.2 each); the cell is duplicated into copies A and B that share the sinks. The same delay-versus-load plot for cell sizes A, B and C applies.]

Cloning can also isolate critical sinks.

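Buffering

[Figure: a cell with inputs a and b drives sinks d, e, f, g and h (load 0.2 each). After buffering, the driver keeps sinks d and e plus a buffer B (input load 0.1), and the buffer drives f, g and h. The same delay-versus-load plot for cell sizes A, B and C applies.]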
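Cloning and buffering both reduce the load seen by the driver of the critical sinks. The sketch below compares them under a hypothetical linear delay model, using the slide's sink loads (0.2 each) and buffer input load (0.1); the cell parameters are assumptions, and wire delay is ignored.

```python
def gate_delay(intrinsic, drive_r, load):
    """Hypothetical linear delay model: delay = intrinsic + drive resistance * load."""
    return intrinsic + drive_r * load

DRIVER = (0.010, 0.036)  # hypothetical driving cell (intrinsic, drive resistance)
BUFFER = (0.008, 0.030)  # hypothetical buffer
BUFFER_IN_CAP = 0.1      # buffer input load, as drawn on the slide
SINK = 0.2               # load of each sink d, e, f, g, h

# Original: one driver sees all five sinks.
original = gate_delay(*DRIVER, 5 * SINK)

# Cloning: two copies of the driver split the sinks into (d, e) and (f, g, h).
# The net feeding the driver now sees two input pins instead of one.
clone_de = gate_delay(*DRIVER, 2 * SINK)
clone_fgh = gate_delay(*DRIVER, 3 * SINK)

# Buffering: the driver keeps critical sinks d, e plus the buffer input;
# the buffer drives the non-critical sinks f, g, h.
buffered_de = gate_delay(*DRIVER, 2 * SINK + BUFFER_IN_CAP)
buffered_fgh = buffered_de + gate_delay(*BUFFER, 3 * SINK)

for label, value in [("original", original), ("clone d,e", clone_de),
                     ("clone f,g,h", clone_fgh), ("buffered d,e", buffered_de),
                     ("buffered f,g,h", buffered_fgh)]:
    print(f"{label:15s} {value:.4f}")
```

With these numbers buffering speeds up d and e (0.046 to 0.028) at the cost of slowing f, g and h (to 0.054), which is only a win when d and e are the critical sinks.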

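Redesign Fan-in Tree

[Figure: signals with arrival times Arr(a)=4, Arr(b)=3, Arr(c)=1 and Arr(d)=0 are combined by two-input gates of delay 1. A balanced tree gives Arr(e)=6; a chain that combines c and d first, then b, then a gives Arr(e)=5.]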
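For an associative, commutative function (an AND or XOR tree, say) built from two-input gates of equal delay, one way to rebuild the tree is to repeatedly combine the two earliest-arriving signals, the same merge order Huffman coding uses. This sketch assumes unit gate delays as in the figure; the function name is illustrative.

```python
import heapq

def build_fanin_tree(arrivals, gate_delay=1):
    """Combine signals with 2-input gates, always merging the two earliest arrivals."""
    heap = [(t, name) for name, t in arrivals.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        t1, n1 = heapq.heappop(heap)
        t2, n2 = heapq.heappop(heap)
        heapq.heappush(heap, (max(t1, t2) + gate_delay, f"({n1} {n2})"))
    return heap[0]  # (output arrival time, tree structure)

arrival, tree = build_fanin_tree({"a": 4, "b": 3, "c": 1, "d": 0})
print(arrival, tree)  # 5 (((d c) b) a)
```

The result matches the slide's improved tree: the late signal a enters at the last gate, giving Arr(e)=5 instead of 6 for the balanced tree.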

Redesign Fan-out Tree

[Figure: two buffer trees driving the same sinks. The original tree has a longest path of 5; the redesigned tree has a longest path of 4. Note the slowdown of a buffer due to its load.]

Decomposition


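Swap Commutative Pins

[Figure: signals a, b and c with different arrival times drive input pins with different pin-to-output delays; reassigning the signals to the pins reduces the output arrival time.]

Simple sorting on arrival times and delays works.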
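The "simple sorting" can be sketched directly: sort signals from latest to earliest, sort pins from fastest to slowest, and pair them in order. The arrival times, pin names and pin delays below are hypothetical.

```python
def assign_pins(arrivals, pin_delays):
    """Latest-arriving signal goes to the fastest pin (both sorted, then paired)."""
    signals = sorted(arrivals, key=arrivals.get, reverse=True)
    pins = sorted(pin_delays, key=pin_delays.get)
    assignment = dict(zip(signals, pins))
    output_arrival = max(arrivals[s] + pin_delays[p] for s, p in assignment.items())
    return assignment, output_arrival

# Hypothetical arrival times and pin-to-output delays of a 3-input commutative gate.
arrivals = {"a": 0, "b": 1, "c": 2}
pin_delays = {"p1": 3, "p2": 2, "p3": 1}

assignment, t_out = assign_pins(arrivals, pin_delays)
print(assignment, t_out)  # {'c': 'p3', 'b': 'p2', 'a': 'p1'} 3
```

Connecting c to the slowest pin p1 instead would give an output arrival of 5.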

Move Critical Signals Forward

[Figure: logic with inputs a, b, c, d and output e; after the transform, the positions of c and d are exchanged.]

• Based on ATPG
  – Linear in circuit size
  – Detects redundancies efficiently
• Efficiently finds wires to be added and removed
  – Based on mandatory assignments

Path-based Transforms

• Path-based resizing
• Unmap / remap a path or cone
• Slack stealing
• Retiming

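Slack Stealing

• Take advantage of the timing behavior of level-sensitive registers (latches)

[Figure: two consecutive stages clocked by C1 and C2; with edge-triggered timing the first stage has slack = -1 and the second slack = +1; with latches, time borrowed through the transparent latch gives slack = 0.]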

Retiming

[Figure: registers moved backward or forward across logic; the longest delay between registers drops from 3 to 2.]

A more aggressive optimization, since it changes the function computed between registers.
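On a simple linear pipeline, retiming amounts to choosing where the internal registers sit so that the longest register-to-register delay is minimized. The brute-force sketch below assumes registers at both ends of a chain of hypothetical unit-delay gates; general circuits need a graph formulation (such as Leiserson and Saxe's retiming) rather than enumeration.

```python
from itertools import combinations

def clock_period(delays, cuts):
    """Longest combinational delay between registers on a linear pipeline.

    delays: gate delays in pipeline order.
    cuts: positions k meaning "a register sits after the first k gates".
    """
    bounds = [0, *sorted(cuts), len(delays)]
    return max(sum(delays[lo:hi]) for lo, hi in zip(bounds, bounds[1:]))

def best_retiming(delays, num_registers):
    """Brute-force every placement of the internal registers and keep the fastest."""
    candidates = combinations(range(1, len(delays)), num_registers)
    best = min(candidates, key=lambda cuts: clock_period(delays, cuts))
    return best, clock_period(delays, best)

gates = [1, 1, 1, 1]              # hypothetical unit-delay gates
print(clock_period(gates, (3,)))  # 3: register after the third gate
print(best_retiming(gates, 1))    # ((2,), 2): register moved back by one gate
```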

Solutions to Timing Closure

• Carry hierarchical logic design into physical design
• Hand / custom design
• Improved analysis
• More sophisticated clock design
• Modify existing flows
• More physically knowledgeable tools
  – Many variations: combined synthesis/place/route, gain-based synthesis, etc.

Agenda

• Analysis Methods Overview
• Traditional design flows
• Summary of DSM Problems
• Correction Methods Overview
• Hierarchy and Timing Closure
• Block Level Timing Closure
• Experimental Results
• Summary

Hierarchy and Physical Design

• Logical hierarchy can be carried over into physical design
• Seems a natural top-down approach, using floorplanning as a firm guide to physical design
• Use of hierarchy offers many advantages and many possible problems
• A new generation of tools addresses this problem

Pin Assignment and Timing Budgeting

[Figure: Blocks 1, 2 and 3 connected by top-level nets labeled L.]

Each block requires:

• Content definition
• Partitioning
• Pin locations
• Clock/timing definition
  – set_input_delay
  – set_output_delay
  – set_drive
  – set_load
  – Path exceptions (false, multicycle paths)

Hierarchy and Physical Design Advantages…

• Run time of P&R tools
• Blocks can be built independently
• Early (and valuable) knowledge of global wires
• Limited wire delay within a macro may allow simpler methodologies
• Contains the problem size
• Extends naturally to SOC and mixed A/D chips
• May be the only real method available

Physical Hierarchy Disadvantages

• Possible to overconstrain the design in many ways (see next slide)
• Hierarchy usually logic-based, not physically based
  – Designed for logical correctness, not physical implementation

Physical Hierarchy Overconstraints

• Placement solution perhaps overconstrained
  – Logical gates may not fit naturally in a rectangle
• Ability to find a routable solution hindered
  – Can’t detour through a neighboring cell
• Boundary conditions explode and must be managed carefully to avoid surprises
  – A recent IBM design had 17,000 top-level connections. A bad timing constraint on any one can make the whole design infeasible.

Hierarchy Example Plots

[Figures: three example plots of hierarchical designs.]

The Challenges

• How to derive sensible partitioning?
• How to achieve die utilization similar to the “flat” approach?
• How to achieve clock speed and skews similar to the “flat” approach?
• How to automatically generate optimal pin assignments for each module?
• How to automatically come up with realistic timing budgets for each module?

Basic Approach to solution

• Example tool: First Encounter
• Start with a Silicon Virtual Prototype
  – A near-final-quality “flat” placement
  – Near-legal routing
  – A known feasible solution for timing and routability
• Use this solution to guide the final implementation
  – Partitioning, pin assignment, timing constraints
• Build the blocks with more detailed tools

[Diagram: RTL / gates -> Silicon Virtual Prototype -> hierarchical partitioning and placement; top level: top-level buffering, clock balancing and power grid; block level: physical synthesis / placement and routing; then chip assembly routing -> GDSII]

Basic Approach continued

[Diagram: logic design (RTL, gates, IP, “black box” IP, e.g. a = b + c), netlist, constraints, .lib and physical data feed the Silicon Virtual Prototype, a very fast full-chip physical prototype.]

• Physical prototype: complete “flat” physical design (proves timing and routability)
• Accurate timing and routability data in hours instead of days or weeks
• Hand off a floorplan OR a full placement to block-level physical synthesis and/or route
• Confidence the design will work once the blocks are re-assembled into the complete IC

In-Context Hierarchical Partitioning

[Diagram: Partitioning (pin assignment, timing budgeting, clock tree generation, power grid planning) -> independent block-level implementation -> SoC assembly]

In-Context Pin Assignment

[Figure: accurate physical prototype; flat full-chip view versus top-level partition view.]

• Full-chip prototype results in optimal pin placement
  – Results in narrower channels and reduced die size
  – Reduces the routing congestion
  – Improves the chip timing

In Context Timing Budgeting

[Figure: Blocks 1, 2 and 3 connected by top-level nets labeled L.]

Each block requires:

• Clock definition
• set_input_delay
• set_output_delay
• set_drive
• set_load
• Path exceptions (false, multicycle paths)

Accurate timing budgets result in predictable timing convergence.
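One simple way to turn a path from the flat prototype into per-block constraints is to share the path's slack among the blocks in proportion to their estimated internal delays. The sketch below does this for one path from Block 1 through a top-level wire into Block 2 and emits SDC `set_output_delay` / `set_input_delay` commands. The proportional rule, the function and port names, and all numbers are assumptions for illustration, not how any particular tool computes budgets.

```python
def budget_path(period, block_delays, wire_delay):
    """Split one clock period between two blocks on a single top-level path.

    Slack (positive or negative) is shared in proportion to each block's
    estimated delay; the top-level wire keeps its estimated delay.
    """
    d1, d2 = block_delays
    slack = period - (d1 + wire_delay + d2)
    b1 = d1 + slack * d1 / (d1 + d2)
    b2 = d2 + slack * d2 / (d1 + d2)
    return b1, b2

def sdc_for_path(clock, period, out_port, in_port, block_delays, wire_delay):
    b1, _ = budget_path(period, block_delays, wire_delay)
    out_delay = period - b1      # Block 1: everything after its output port is external
    in_delay = b1 + wire_delay   # Block 2: everything before its input port is external
    return [
        f"set_output_delay -clock {clock} {out_delay:.3f} [get_ports {out_port}]",
        f"set_input_delay -clock {clock} {in_delay:.3f} [get_ports {in_port}]",
    ]

# Hypothetical path: 10 ns clock, 3 ns in Block 1, 1 ns top-level wire, 4 ns in Block 2.
for line in sdc_for_path("clk", 10.0, "blk1_out", "blk2_in", (3.0, 4.0), 1.0):
    print(line)
```

A real budgeter must reconcile many paths through the same port; a single path just makes the arithmetic visible.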

Agenda

• Analysis Methods Overview
• Traditional design flows
• Summary of DSM Problems
• Correction Methods Overview
• Hierarchy and Timing Closure
• Block Level Timing Closure
• Experimental Results
• Summary

Blocks have timing closure problems, too

Didn’t the big flat placement guarantee that the blocks are feasible? No, because:

• The block may not have been defined when global constraints were set
• The global placer does not deal with all DSM effects
• The block may be too hard for the relatively simple global placer (which must be very fast)
• Requirements change as the project progresses
• Process technology may have changed
• …

Hand/Custom Design

• Mentioned for completeness
• Hurts productivity
• Yields highest performance
• Can only fix a few things, for example:
  – Can realistically fix timing or crosstalk problems on a few nets
  – Cannot realistically change the size of blocks

Improved Analysis Helps

• Plot shows slack by net for two designs
• A 10% timing delta -> many more bad nets
• Often the difference between success and failure

[Figure: number of nets (0 to 3500) versus slack relative to worst net (ns, -5 to 20) for the two designs (Series1, Series2).]

More accurate analysis

• Crosstalk-induced delay
  – Old approach: overestimate coupling C
  – Better: compute nominal timing + crosstalk delta
• Customer example from CadMos
  – Ignore crosstalk completely: 400 MHz (not an acceptable alternative)
  – Coupling caps overestimated by 60%: 300 MHz
  – Nominal delays + computed crosstalk: 333 MHz
  – More accurate analysis gains 10% margin

Increased accuracy helps

• Global/detailed route correlation
  – Any global route is better than wire load models or Steiner trees, since global routes consider congestion
  – But to get that last 10%, a global/detailed router link is needed
• Knowing some nets must detour is good, but…
  – Which net takes which detour is needed for good correlation

Modified clock design

• Zero skew is not necessary, and often not even desirable
• We have the freedom to adjust clock arrival times at memory elements
  – This obtains more margin and thus helps convergence
  – Similar to retiming but less disruptive
• Improvement very design dependent
  – If the worst path is from a flip-flop to itself, it doesn’t help
  – May impact scan chains
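The effect of adjusting clock arrival times shows up in the usual setup check for a flip-flop-to-flip-flop path: slack = period + (capture clock arrival) - (launch clock arrival) - path delay. This sketch uses hypothetical path delays, folds setup time and clock-to-Q into the path delay, and ignores hold checks, which a later capture clock makes harder to meet.

```python
def setup_slacks(paths, period, clock_arrival):
    """Setup slack per flop-to-flop path (setup time and clock-to-Q folded into delay)."""
    return {
        (src, dst): period + clock_arrival[dst] - clock_arrival[src] - delay
        for (src, dst), delay in paths.items()
    }

PERIOD = 5.0
PATHS = {("F1", "F2"): 6.0, ("F2", "F3"): 4.0}  # hypothetical path delays (ns)

zero_skew = setup_slacks(PATHS, PERIOD, {"F1": 0.0, "F2": 0.0, "F3": 0.0})
useful_skew = setup_slacks(PATHS, PERIOD, {"F1": 0.0, "F2": 1.0, "F3": 0.0})
print(zero_skew)    # {('F1', 'F2'): -1.0, ('F2', 'F3'): 1.0}
print(useful_skew)  # {('F1', 'F2'): 0.0, ('F2', 'F3'): 0.0}

# A flip-flop feeding itself: its clock arrival appears on both sides, so skew cannot help.
self_loop = setup_slacks({("F1", "F1"): 6.0}, PERIOD, {"F1": 1.0})
print(self_loop)    # {('F1', 'F1'): -1.0}
```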

Previous attempts to fix block closure

• Without the radical step of combining synthesis and placement, designers have tried:
  – Allowing the placer to do sizing and buffering
  – Post-placement optimization: simple transformations, using the existing placement
  – Post-placement re-synthesis: complex transformations allowed; needs incremental placement and extraction
• But these have not been fully successful
  – Why? Re-examine the root cause of discrepancies
• Wire load models and their limitations
• Combined synthesis/placement/routing

Post-Placement Optimization

[Diagram: Design Entry -> Synthesis w/Timing -> Place -> Re-run Synthesis w/Timing -> Route -> Timing]

1. In-place optimizations
2. Optimizations that minimally disturb placement

Post-Placement Optimization

• In-place (little or no placement impact)
  – Resizing (carefully)
  – Pin swapping, some tree rebuilding
  – Wire sizing / typing
• Minimally disruptive
  – Resizing
  – Buffering
  – Cloning
  – Tree rebuilding
  – Cell removal

In-place Optimization

• Not too difficult
  – Can use extracted electrical data (C, RC) from the placement tool
  – Some changes affect pin locations, but may be ignored
  – Tree rebuilding needs incremental extraction
  – Can use timing reports for timing data
• But accuracy suffers as changes are made
  – Real RC data replaced by estimates again

In-place Optimization

[Diagram: placed netlist -> placement & extraction -> C/RC data -> optimization (resize, swap pins, rebuild trees) -> optimized netlist]

Place-disruptive Optimization

• Nets changing implies…
  – Must be able to recompute C and RC
  – May need to incrementally place new cells
  – Need incremental timing capability

Place-disruptive Optimization

[Diagram: placed netlist -> placement & extraction -> C/RC data -> optimization with placer, timer and extractor (resize, buffer, clone, cell removal, rebuild trees) -> optimized netlist]

What are the problems?

• Getting the timing right
  – Different timers used at different stages
  – Do the optimizer and placer see the same worst paths as the static timer?
• Design size / tool capacity
  – Using synthesis technology on flat designs

More problems

• Incompatible tools, formats
  – Placer, synthesizer and timer may all use different file formats and may all come from different vendors
  – Basic interoperability issues
• Incremental placer needed for new cells
  – Doesn’t have to be smart
  – But might produce some infeasible solutions
  – Must be integrated with optimizer

Still more challenges/problems

• Extraction/estimation of net data
  – Any optimization which significantly alters net topology needs this ability: insert cells, remove cells, move connections from one cell to another
  – Steiner tree estimation
  – Net C and delay (RC) calculator
  – Do results match the detail router and other extraction tools?

Sample Optimization Results

Design  Worst slack / # misses                Cycle time  Tech
        Synthesized  Placed      Opt
C1      -1 / 2000    -12 / 38k   -2 / 1400    7.5 ns      .25 µm
V1      0 / 0        -12 / 15k   -0.3 / 100   7.5 ns      .18 µm
T1      -0.5 / 2000  -48 / 164k  -6 / 62k     2.5-10 ns   .18 µm
P1      -0.4 / 100   -97 / 43k   -13 / 20k    8 ns        .25 µm
V2      -0.5 / 500   -11 / 2000  -4 / 1000    7.5 ns      .18 µm

Root Problem is Wire Load Models

• Main problem: correlation between pre-P&R estimates and post-P&R extraction
• If correlation is good…
  – Problems are detected, and potentially fixed, early
• If correlation is bad…
  – Problems are detected late
  – Not a good situation! Having to re-write the RTL is the worst case for timing closure.

Why are Wire Load Models Used?

• Can’t complete layout until logic design is complete
• Can’t complete logic design without timing
• Can’t time without load and net delay data
• Can’t extract load and net delay data until layout is complete
• Can’t complete layout …

WLM solution: use statistics

• Don’t know specific layout data
• But we know something about statistical properties
  – Average net load, average net delay
• Further refine using other characteristics
  – Number of sinks
  – Size of design (number of circuits)
  – Physical size
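A wire load model of this kind is essentially a table from fanout (number of sinks) to an estimated net length, scaled into capacitance and resistance, with one table chosen per design size. The sketch below loosely follows the shape of a Liberty-style wire load group (a fanout-to-length table plus a slope for extrapolation); every number is hypothetical, and the units are arbitrary.

```python
# Hypothetical wire load model: estimated net length per fanout count,
# with linear extrapolation beyond the largest tabulated fanout.
FANOUT_LENGTH = {1: 10.0, 2: 18.0, 3: 25.0, 4: 31.0}  # arbitrary length units
SLOPE = 6.0            # extra length per additional fanout beyond the table
CAP_PER_LENGTH = 0.2   # capacitance per length unit (hypothetical)
RES_PER_LENGTH = 0.05  # resistance per length unit (hypothetical)

def wire_length(fanout):
    if fanout in FANOUT_LENGTH:
        return FANOUT_LENGTH[fanout]
    largest = max(FANOUT_LENGTH)
    if fanout > largest:
        return FANOUT_LENGTH[largest] + SLOPE * (fanout - largest)
    raise ValueError("fanout not covered by this sketch")

def wire_rc(fanout):
    """Estimated (C, R) of a net; note that it depends only on the fanout."""
    length = wire_length(fanout)
    return CAP_PER_LENGTH * length, RES_PER_LENGTH * length

print(wire_rc(2))
print(wire_rc(6))  # beyond the table: 31 + 2 * 6 = 43 length units
```

Every net with the same fanout gets the same estimate, wherever it ends up being placed; that is exactly the statistical simplification discussed next.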

Correlation Pre/Post-P&R using averages

• Wire load models give synthesis a statistical estimate of physical design
• We can correlate averages pre- and post-P&R as accurately as needed
• If a specific design has average behavior, its timing, on average, can be predicted
• Otherwise, a pass through placement can provide the correct WLM for a design, and get the averages right

Timing and averages

• WLMs are OK for area and power (properties that are sums are well handled by statistics)
• But timing is dictated by the worst specific path
• That path is built of individual nets
• One net can determine the speed of an entire design
• Reality: poor correlation for relatively few nets can cause major headaches
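The difference between sums and worst cases can be made concrete with a toy example: assume a WLM calibrated so that its per-net estimate equals the true average. The total is then exact, but a single long-tail net is badly mispredicted. The ten net loads below are hypothetical.

```python
# Hypothetical actual wire loads for ten nets (arbitrary units): nine typical, one long-tail.
actual = [1.0] * 9 + [11.0]

# Assume the WLM is calibrated so its per-net estimate equals the true average.
estimate = sum(actual) / len(actual)

total_error = sum(actual) - estimate * len(actual)       # what area/power depend on
worst_net_error = max(abs(a - estimate) for a in actual)  # what the critical path depends on

print(estimate, total_error, worst_net_error)  # 2.0 0.0 9.0
```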

Correlation Pre/Post-P&R Averages and Wire Loads

[Figure: distribution of C per fan-out: number of nets (0 to 30,000) versus pF per fan-out (0 to 110), with median and mean marked.]

Note the very long tails of this distribution.

Correlation Pre/Post-P&R $C_{wire}$ Data by Logic Design

[Figure: $C_{wire}$ versus number of fan-outs.]

Better Wire Load Models

• How can we use information from one pass through physical design?
  – Adjust wire load model coefficients
  – Back annotate specific net load and delay data to the logic design
• New problem: correlation of logic pre- and post-synthesis
• But there are fundamental limits to statistical models; a new approach is needed.

A better (but harder) approach: Combine Synthesis, P & R

• Don’t use wire load models at all
• Synthesis does a trial placement as it runs
  – Loading found from estimated routes
  – For best results, must include global routing
• Then, feed global route to detailed router
  – Or, do detailed route itself
• Much better correlation and timing closure
• No inter-tool data transfer headaches

Example of Combined SP&R

Video Graphics Engine
• 160k instances
• 70 macros (blocks)
• 5 layers, 0.18 micron
• Target frequency: 100 MHz

Conventional Flow

• More than 20 iterations
• 89 MHz best result, with manual changes

[Diagram: Synthesis (DC, with Func. & Timing .lib) -> Static Timing (PT) -> syn2GCF -> SE placement-based optimization (with Floorplan DEF, Func. & Timing .TLF, Physical LEF) -> Global route -> Detail route -> Extraction -> Delay calc -> Pearl -> DRC]

Combined SP&R Flow

• 100 MHz final result, met timing
• Correlation within ±2.1%
• One pass
• 12 hrs 20 min runtime

[Diagram: EDIF netlist, TCL constraints (from PT write_constraints), Floorplan DEF, Func. & Timing .TLF and Physical LEF feed SE-PKS (PKS optimization, global route, static timing) -> Detail route (HE) -> Extraction -> Delay calc -> Pearl -> DRC]

Slack Correlation

[Figure: slack correlation against routed results, PKS versus wire-load based.]

Enlargement of SP&R slack


Results from combined SP&R

Case  Instances (k)  Macros  PKS timing error  Max freq (MHz)
                                               Conventional  SP&R
1     350            56      ±3%               140           140
2     250            50      ±3%               97            100
3     50             4       ±0.96%            93            95
4     160            70      ±2.1%             89            100

Agenda

• Traditional design flows
• Summary of DSM Problems
• Analysis Methods Overview
• Correction Methods Overview
• Approaches to Fixing Timing Closure
• Experimental Results
• Summary

Experimental Results

• For hierarchical design, two objectives
  – Should be faster and higher capacity than a fully detailed flat design, to find problems earlier
  – Resulting partitions should be realizable
• For block design, compare different strategies
  – Can overconstrain clock or wire models
  – Can do IPO or not after placing
  – Can allow placer to change size or not
  – Can test combined synthesis/placement against running the two tools separately

Hierarchical Experimental results

• Design: 580K cells, 0.25 µm process, 5LM, 100 MHz
• Data collected on a 500 MHz processor workstation
• Resulting blocks were realizable

[Chart: First Encounter flow versus traditional flow, run-time ratio per step (Design Import, Detail Place, Detail Route*, RC Extract, Delay Calculation, Timing Analysis, IPO, Design Iteration); the ratios shown are 6x, 1x, 60x, 56x, 57x, 7x, 33x and 5x.]

(*) SPC Trial Route

How do different block design approaches compare?

Jay McDougal of Agilent ran many flows on the same design:

1. Overconstrain the clock by various amounts
2. Accurate or conservative WLMs (tried many levels of conservatism)
3. Allow the placer to size or not
4. Do post-placement optimization or not
5. Physically knowledgeable synthesis

Characteristics of sample design

• Design not very difficult
  – ColdFire processor
  – 80K instances
  – 0.25 micron library
  – 5-layer process, not congestion dominated
  – Design goal was 180 MHz, known to be possible with this design
  – 85% of delay in gates; 15% in interconnect
• At 0.18/0.13 micron, bigger designs will show bigger differences between techniques

Key to the plot of results

• Basic flow: Design Compiler & QPlace
• TDD = timing driven design
  – In addition to minimizing wire length and congestion, the placer is given timing constraints and allowed to change gate sizes
• IPO and PBO are post-placement optimizers
  – IPO runs on the synthesis DB with back annotation
  – PBO runs on the physical DB with synthesis transforms
• PKS = Physically Knowledgeable Synthesis (combined Synthesis/Place/Route)

Comparison of Approaches

[Plot: required cycle time (5 to 9.5) versus relative size (0.95 to 1.25) for the flows No custom WLM, 90% WLM, 3ns;50%WL, IPO 5ns NoWL, IPO 3ns NoWL, TDD/PBO 50%WL, TDD/PBO 90%WL and PKS.]

Comparison of Approaches

[Plot: the same comparison (flows No WLM, 90% WLM, 3ns;50%WL, IPO 5ns NoWL, IPO 3ns NoWL, TDD/PBO 50%WL, TDD/PBO 90%WL, PKS), with two annotations: “Good area, but iterates between placement and synthesis, worst TTM, didn’t hit timing target” and “One tool, no iteration, better TTM, hit timing target”.]

Agenda

• Traditional design flows
• Summary of DSM Problems
• Analysis Methods Overview
• Correction Methods Overview
• Approaches to Fixing Timing Closure
• Experimental Results
• Summary

Good News

• At least we understand the problem
  – Analysis of timing is well understood
  – Transformations that help timing are well understood
  – DSM effects are painful but can be controlled

Bad News

• Cycle time and technology advances demand more and more sophisticated optimization techniques
• In previous flows, corrections must be applied in separate tools
• Disconnects among the various tools involved increase turn-around time and limit optimization

Good News

• The bad news is commonly recognized
• Many tool vendors, academics and in-house EDA researchers are working to solve these problems
• A new generation of tools, designed from the ground up to address timing closure, is already available
  – Hierarchical and block design

Bad News

• These problems won’t be the last!
• Each process generation brings new problems
  – Increased size
  – Weird process rules (antenna)
  – Possible new effects (single event upset)

Summary

• Timing closure is a very real problem
• Large chips need hierarchical tools to help with partitioning and budgeting
• Block tools must understand synthesis, placement and routing
  – Wire load models have serious limitations
  – Best approach is combined synthesis/P&R
  – Experimental data backs this up

Acknowledgements

• Tony Drumm wrote the original set of slides for this lecture, including many of the examples. He credits:
  – Alex Suess
  – José Neves
  – Bill Joyner
  – IBM Rochester EDA folks
• But the conclusions, and any mistakes, are mine
