Scaling and CbC Design


The Scaling Challenge: Can Correct-by-Construction Design Help?

Prashant Saxena, Noel Menezes, Pasquale Cocchini, Desmond Kirkpatrick

Intel Labs (CAD Research), Hillsboro, OR

International Symposium on Physical Design, Monterey, CA

Apr 16, 2003

ISPD’03

Repeaters, which are already a full-chip headache, will become critical at the block level also


Outline

Some scaling experiments

– SPICE simulations

Implications for post-RTL design

Correct-by-Construction (CbC) design

– What’s the promise? What’s missing?


A Scaling Primer

Process scaling:

– Devices shrink 0.7x, delay 0.7x

– Wires shrink 0.7x

– R/µm increases 2x, C/µm unchanged

– So, (delay/scaled µm) increases 1.4x

Block area often stays same

– # cells, # nets doubles

– Wiring histogram shape invariant
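The primer's numbers can be checked with a few lines of arithmetic. The sketch below assumes a linear shrink factor s = 0.7 applied to both wire width and thickness, so the cross-section shrinks by s² and R/µm grows by 1/s². It also reads "delay per scaled µm" as the per-µm RC product taken over a length that has itself shrunk by s. Both are assumptions; the slide states only the resulting 2x and 1.4x figures.

```python
# Back-of-the-envelope check of the scaling primer.
# Assumption (not stated on the slide): wire width and thickness both
# shrink by s, while capacitance per unit length stays constant.

s = 0.7  # linear shrink per process generation

r_per_um_scale = 1.0 / (s * s)   # cross-section shrinks by s^2, so R/um grows
c_per_um_scale = 1.0             # C/um unchanged (from the slide)
rc_per_um_scale = r_per_um_scale * c_per_um_scale

# One possible reading of "delay per scaled um": RC per um, taken over
# a um that has itself shrunk by s.
delay_per_scaled_um_scale = rc_per_um_scale * s

# Same block area, each cell's area shrinks by s^2, so roughly 2x the cells.
cells_scale = 1.0 / (s * s)

print(f"R/um scale:            {r_per_um_scale:.2f}x")
print(f"delay/scaled-um scale: {delay_per_scaled_um_scale:.2f}x")
print(f"# cells in same area:  {cells_scale:.2f}x")
```

Under these assumptions the three factors come out near 2x, 1.4x and 2x, matching the rounded values on the slide.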

Critical Repeater Lengths

[Figure: relative critical repeater length for M3 and M6 at 90nm, 65nm, 45nm and 32nm; annotated 0.57x]

Optimally-sized uniformly for min delay

Min distance at which inserting a repeater speeds up the line

In line with scaling theory: s·√s ≈ 0.586 (for s = 0.7)

“Ideally shrunk” circuit requires additional repeaters (0.7x vs 0.57x)
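The scaling-theory value of roughly 0.586 can be reproduced from a first-order repeater model. As a sketch, assume the critical length scales as the square root of (gate delay / wire RC per unit length squared), with gate delay scaling by s and wire RC per unit length by 1/s². The model behind the slide's SPICE data is not given, so the function names and the model choice here are illustrative assumptions.

```python
import math

def critical_repeater_length_scale(s: float = 0.7) -> float:
    """Per-generation scale factor of the critical repeater length.

    Assumed first-order model: L_crit ~ sqrt(gate_delay / (r * c)),
    with gate_delay -> s * gate_delay and r*c per unit length -> r*c / s^2.
    """
    gate_delay_scale = s
    rc_per_len_scale = 1.0 / (s * s)
    return math.sqrt(gate_delay_scale / rc_per_len_scale)  # equals s * sqrt(s)

def extra_repeater_factor(s: float = 0.7) -> float:
    """How much denser repeaters become on an ideally shrunk wire.

    The wire shrinks by s but the critical length shrinks faster, so the
    number of critical lengths per wire grows by s / L_crit_scale.
    """
    return s / critical_repeater_length_scale(s)

if __name__ == "__main__":
    print(f"critical length scale:     {critical_repeater_length_scale():.3f}")
    print(f"repeaters per shrunk wire: {extra_repeater_factor():.2f}x")
```

With s = 0.7 the first function evaluates to about 0.586, slightly above the 0.57x observed in the simulations. The second suggests an ideally shrunk wire spans roughly 1.2x as many critical lengths as before.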

Critical Sequential Lengths

[Figure: relative critical sequential length and # repeaters between FFs, for M3 and M6 at 90nm, 65nm, 45nm and 32nm; annotated 0.43x and 0.75x]

Optimized for max distance in one clock period

Assumes:

– 2x frequency scaling, 5GHz on 90nm

– Ignores setup, hold, skew

“Ideally shrunk” circuit:

– Requires much new wire pipelining (0.7x vs 0.43x)

– Ratio of regular to clocked repeaters decreasing
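A first-order estimate lands close to the 0.43x figure for critical sequential length. The sketch assumes that the delay per unit length of an optimally repeated wire scales as the square root of (wire RC per unit length × gate delay), and that the clock period halves each generation (the slide's 2x frequency scaling). Setup, hold and skew are ignored, as on the slide. The model choice and function names are assumptions, not something the slides spell out.

```python
import math

def repeated_wire_delay_per_len_scale(s: float = 0.7) -> float:
    """Assumed model: delay/length of an optimally repeated wire
    ~ sqrt(r*c * gate_delay); r*c -> r*c / s^2, gate_delay -> s * gate_delay."""
    return math.sqrt((1.0 / (s * s)) * s)  # equals 1 / sqrt(s)

def critical_sequential_length_scale(s: float = 0.7,
                                     freq_scale: float = 2.0) -> float:
    """Scale factor of the max distance reachable in one clock period."""
    period_scale = 1.0 / freq_scale
    return period_scale / repeated_wire_delay_per_len_scale(s)

def repeaters_between_ffs_scale(s: float = 0.7,
                                freq_scale: float = 2.0) -> float:
    """Scale factor of (critical sequential length / critical repeater length)."""
    critical_repeater_scale = s * math.sqrt(s)
    return critical_sequential_length_scale(s, freq_scale) / critical_repeater_scale

if __name__ == "__main__":
    print(f"critical sequential length scale: {critical_sequential_length_scale():.2f}")
    print(f"repeater segments between FFs:    {repeaters_between_ffs_scale():.2f}")
```

With the defaults this gives about 0.42x for the critical sequential length, close to the 0.43x on the slide, and about 0.71x for the number of repeater segments between flip-flops.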

Block Wiring Histogram and Critical Repeater Lengths

[Figure: block wiring histogram with critical repeater lengths marked for M3 and M6 at 90nm, 65nm, 45nm and 32nm]

Critical lengths migrating rapidly to the left… (zoomed view coming up)

Block Wiring Histogram: Zoomed View

[Figure: zoomed block wiring histogram with critical repeater lengths marked for M3 and M6 at 90nm, 65nm, 45nm and 32nm]

Increasingly steep slope of curve (log scale) => # impacted nets exploding!
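A toy calculation shows why a steep histogram makes the count of impacted nets explode as the critical length moves left. The histogram below is synthetic (a made-up inverse-square falloff, not data from the slides) and is held fixed across generations for simplicity. The 0.57x per-generation shrink comes from the critical repeater length slide. The function name and the starting critical length are hypothetical.

```python
def nets_needing_repeaters(histogram, critical_length):
    """Count nets whose length is at least the critical repeater length.

    histogram: list of (net_length, net_count) pairs.
    """
    return sum(count for length, count in histogram if length >= critical_length)

# Synthetic wiring histogram: many short nets, few long ones.
# Illustrative numbers only; NOT taken from the slides.
toy_histogram = [(length, 100000 // length**2) for length in range(1, 101)]

critical_length = 80.0  # hypothetical starting point, arbitrary units
for node in ["90nm", "65nm", "45nm", "32nm"]:
    impacted = nets_needing_repeaters(toy_histogram, critical_length)
    print(f"{node}: critical length {critical_length:5.1f} -> {impacted} nets")
    critical_length *= 0.57  # per-generation shrink from the slide
```

Because each step left sweeps in a denser part of the histogram, the impacted count grows faster than the critical length shrinks.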

Block Wiring Histogram and Critical Sequential Lengths

[Figure: block wiring histogram with critical sequential lengths marked for M3 and M6 at 90nm, 65nm, 45nm and 32nm]

# pipelined nets growing from negligible (90nm) to substantial (32nm)

Repeated Block-level Nets

[Figure: percentage of block-level nets requiring repeaters, M3 and M6, at 90nm, 65nm, 45nm and 32nm]

Ever-increasing %age of block level nets requires repeaters

Even the rate of growth is accelerating!

…especially for clocked repeaters

[Figure: corresponding chart for clocked repeaters, M3 and M6, at 90nm, 65nm, 45nm and 32nm]

Total Repeater Count

[Figure: repeater counts (clk-rep, rep, tot-rep) at 90nm, 65nm, 45nm and 32nm]

Ever-increasing fractions of total cell count will be repeaters

– 70% in 32nm (and this omits FC repeaters within block!)

Total repeater count is independent of frequency scaling assumptions

So, what’s changing?

Interconnects scaling worse than devices

– …in spite of optimal (re-)buffering

# repeaters increasing exponentially

Interconnect repeaters will comprise significant fraction of cells in block

Even block-level nets will need to be pipelined

Implications on Synthesis

Literal/Gate count and fanout metrics misleading

– Major delay contribution from communication

– Fanouts often isolated by repeaters

– Area often wire-limited

Sizing often determined by (predictable) repeater load

– Pre-layout sizing wasted

Implications on Synthesis

Less logic per pipeline stage

Combinational synthesis: max benefit shrinking

Synthesis across sequential boundaries

Methodological support for retiming

Implications on Synthesis

Bandwidth ceiling

– Hard to move data around for computation

Logic replication

Encourage low fans

Dense encodings

Distribution of computation across channel

Implications on Layout

Routing

– Must understand repeater insertion

– Fine power grid => templated routing?

Placement with repeaters

– Intra-block nets: # repeaters depends on routing

– OTH routes: fixed obstructions

– Add buffering into placement core … as opposed to ECO postprocessing

Implications on Layout

Latency-constrained placement

– µarch sub-optimality

– Hard constraint per stage (unlike delay)

OR

Post-RTL latency optimization

– Methodological nightmare

Delay insensitive design?

Implications on FC Assembly

What if we reduce block area to avoid wire effects?

Many of the new physical synthesis problems go away

BUT # blocks triples! (and block assembly is the hardest part of chip design!)

Flat assembly (Fragmentation of paths across blocks)

OR

Increased hierarchy (Lack of visibility across hierarchy levels)

The CbC Link

Process scaling => worsening predictability

Predictability => CbC design

But current CbC approaches too rigid

Can we still apply them?

Principles of CbC Design

More predictability

Reduced estimation error improves high-level optimizations

Break the design-verification loop

Sequence of small, guaranteed-correct transformations

No unexpected deterioration of secondary metrics

Avoid micro-engineering

Design productivity gap

Abstract Fabrics

Structural fabrics: too resource-intensive

e.g. DWF: 50% routing tracks 

Use algorithmic fabrics instead

Prune to subspace with desirable CbC properties

e.g. Non-uniform power grid using “min power pitch” (ISPD’02); guaranteed throughput bus design (ICCAD’02)

CbC rules-of-thumb

e.g. Bound on max adjacent runs of signals

Performance with predictability


CbC Block Construction

“Vertical” partitioning and successive refinement

Coarse layout of unsynthesized design

Successive refinement of “vertical” partitions

Critical partitions first

Different partitions exist at different levels of refinement

Hierarchical engines

Enables early repeater prediction

[Figure: partitions at different levels of refinement: RTL, synth/mapped netlist, placed/buffered netlist, GR/track-assigned layout]

CbC Full Chip Assembly

Latency prediction for full-chip interconnects

Preferential routing for performance-critical nets

Flip-flop staging on non-critical nets

Performance prediction with cycle latency ranges

Block area mis-prediction tolerance

Move blocks without re-implementation

Global communication grids
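Performance prediction with cycle latency ranges might be sketched as follows. The sketch assumes a net needs one clock cycle per critical sequential length of wire, and that before layout the routed length is known only to lie within a range. The function names and the numbers in the usage line are hypothetical, not taken from the slides.

```python
import math
from typing import Tuple

def cycles_for_length(length: float, critical_seq_length: float) -> int:
    """Clock cycles needed to traverse a wire, assuming one cycle per
    critical sequential length (setup, hold and skew ignored)."""
    if critical_seq_length <= 0:
        raise ValueError("critical_seq_length must be positive")
    return max(1, math.ceil(length / critical_seq_length))

def latency_range(min_length: float, max_length: float,
                  critical_seq_length: float) -> Tuple[int, int]:
    """Range of cycle latencies for a net whose final routed length is
    only known to lie in [min_length, max_length]."""
    if min_length > max_length:
        raise ValueError("min_length must not exceed max_length")
    return (cycles_for_length(min_length, critical_seq_length),
            cycles_for_length(max_length, critical_seq_length))

# Hypothetical example: a 4.2 to 6.5 mm net, 2 mm critical sequential length.
print(latency_range(4.2, 6.5, 2.0))
```

With these made-up numbers the call returns (3, 4). In this reading, a block could move within that length range without changing its neighbours' latency assumptions beyond the predicted range.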

Summing Up

Repeaters becoming critical at the block level

Most post-RTL design problems changing fundamentally

– Combination of algorithmic and methodological advances required

CbC approaches viable, but at the abstract level

– Current structural fabrics too resource intensive

– Achieve predictability through algorithmic fabrics

Backup Slides


PIE (Process Independent Exploration) Models

To provide an easier way to study interconnect structures and their trends in future CMOS processes

To be used in place of fudged process files

Analytical models directly correlating to device and interconnect physics

Device models based on BSIM3 equations including major 2nd order effects

– Accurate mobility and velocity saturation models, DIBL and channel length modulation approximation

– Continuous from weak to strong inversion

Interconnect models with 2D fringe capacitance approximation

– Scattering not accounted for

Entire process expressed by small set of physically meaningful process parameters (e.g. T_ox, V_th, k_ild, etc.) in PEF (Process Exploration File) files

– 16 for devices

– 6 each metal layer

Test cases simulated as SPICE netlists

PIE models implemented as behavioral sources

Calibrated against existing process files