Scaling and CbC Design



The Scaling Challenge: Can Correct-by-Construction Design Help?

Prashant Saxena, Noel Menezes, Pasquale Cocchini, Desmond Kirkpatrick

Intel Labs (CAD Research), Hillsboro, OR

International Symposium on Physical Design, Monterey, CA

Apr 16, 2003

ISPD’03

Repeaters, which are already a full-chip headache, will also become critical at the block level


Outline

Some scaling experiments
– SPICE simulations

Implications for post-RTL design

Correct-by-Construction (CbC) design
– What’s the promise? What’s missing?


A Scaling Primer

Process scaling:
– Devices shrink 0.7x, delay 0.7x
– Wires shrink 0.7x
– R/μm increases 2x, C/μm unchanged
– So, the delay of a scaled wire relative to device delay increases 1.4x

Block area often stays the same:
– # cells, # nets double
– Wiring histogram shape invariant
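The primer's arithmetic can be checked in a few lines. This is a minimal sketch, assuming an ideal 0.7x shrink of every linear dimension and a first-order distributed-RC delay proportional to r·c·L² for an unrepeated wire; the constant and function names are illustrative, not from the talk.

```python
# First-order check of the scaling primer (illustrative assumptions only).
SHRINK = 0.7        # linear shrink per process generation
DEVICE_DELAY = 0.7  # device (gate) delay factor per generation
R_PER_UM = 2.0      # wire resistance per unit length: ~2x per generation
C_PER_UM = 1.0      # wire capacitance per unit length: unchanged


def unrepeated_wire_delay_factor() -> float:
    """Delay factor of an ideally shrunk, unrepeated wire.

    Distributed RC delay grows as r * c * L**2, and the wire length L
    itself shrinks by SHRINK.
    """
    return R_PER_UM * C_PER_UM * SHRINK ** 2


def wire_to_device_delay_factor() -> float:
    """Growth of the shrunk wire's delay relative to device delay."""
    return unrepeated_wire_delay_factor() / DEVICE_DELAY


if __name__ == "__main__":
    print(f"unrepeated wire delay: {unrepeated_wire_delay_factor():.2f}x")
    print(f"wire/device delay:     {wire_to_device_delay_factor():.2f}x")
```

It prints 0.98x and 1.40x: the shrunk wire is about as slow as before, while the devices driving it are 0.7x faster.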

Critical Repeater Lengths

[Figure: relative critical repeater length for M6 and M3 at 90nm, 65nm, 45nm and 32nm, shrinking about 0.57x per generation]

Critical repeater length:
– Repeaters sized optimally and uniformly for min delay
– Min distance at which inserting a repeater speeds up the line

In line with scaling theory: $s\sqrt{s} \approx 0.586$ for $s = 0.7$

“Ideally shrunk” circuit requires additional repeaters (wire lengths shrink 0.7x, but critical repeater length shrinks 0.57x)
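The $s\sqrt{s}$ factor follows from a standard first-order model, assumed here rather than taken from the slides: critical repeater length scales as $\sqrt{R_0 C_0 / (r c)}$, where $R_0 C_0$ is the driver's intrinsic delay ($\propto s$) and $r$, $c$ are wire resistance and capacitance per unit length ($r \propto 1/s^2$, $c$ constant). A small sketch with hypothetical function names:

```python
import math

S = 0.7  # assumed ideal linear shrink per generation
NODES = ["90nm", "65nm", "45nm", "32nm"]


def critical_length_factor(s: float = S) -> float:
    """Per-generation change in critical repeater length.

    First-order model: L_crit ~ sqrt(R0*C0 / (r*c)), with the driver's
    intrinsic delay R0*C0 ~ s, wire r ~ 1/s**2 and wire c ~ constant.
    """
    device_rc = s
    wire_rc = (1.0 / s ** 2) * 1.0
    return math.sqrt(device_rc / wire_rc)


def relative_critical_lengths(s: float = S) -> dict:
    """Critical repeater length at each node, normalized to 90nm."""
    f = critical_length_factor(s)
    return {node: f ** i for i, node in enumerate(NODES)}


if __name__ == "__main__":
    print(f"per-generation factor: {critical_length_factor():.3f}")
    for node, rel in relative_critical_lengths().items():
        print(f"{node}: {rel:.3f}")
```

The model gives about 0.586x per generation versus the simulated 0.57x. Because wire lengths shrink only 0.7x, each ideally shrunk net grows by roughly 0.7/0.586 ≈ 1.2x relative to the critical length every generation.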

Critical Sequential Lengths

[Figure: relative critical sequential length (about 0.43x per generation) and # repeaters between FFs (about 0.75x per generation) for M6 and M3, 90nm to 32nm]

Critical sequential length:
– Optimized for max distance in one clock period
– Assumes 2x frequency scaling per generation, 5GHz at 90nm
– Ignores setup, hold, skew

“Ideally shrunk” circuit:
– Requires much new wire pipelining (wire lengths shrink 0.7x, but critical sequential length shrinks 0.43x)
– Ratio of regular to clocked repeaters decreasing
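The 0.43x trend can be approximated with the same style of first-order model (our assumption, not the authors' SPICE setup): with optimal repeaters, wire delay per unit length scales as $\sqrt{R_0 C_0 \, r c}$, so it grows by $1/\sqrt{s}$ per generation, while the slide's 2x frequency scaling halves the clock period. Function names are hypothetical.

```python
import math

S = 0.7          # assumed ideal linear shrink per generation
FREQ_GAIN = 2.0  # slide's assumption: 2x frequency scaling per generation


def repeated_wire_delay_per_length_factor(s: float = S) -> float:
    """Optimally repeated wire: delay/length ~ sqrt(R0*C0 * r*c).

    With R0*C0 ~ s, r ~ 1/s**2 and c ~ constant this grows as 1/sqrt(s).
    """
    return math.sqrt(s * (1.0 / s ** 2) * 1.0)


def critical_seq_length_factor(s: float = S, freq_gain: float = FREQ_GAIN) -> float:
    """Max distance covered in one clock period, relative to the previous node."""
    clock_period = 1.0 / freq_gain
    return clock_period / repeated_wire_delay_per_length_factor(s)


def repeaters_between_ffs_factor(s: float = S, freq_gain: float = FREQ_GAIN) -> float:
    """Critical sequential length over critical repeater length (~ s*sqrt(s))."""
    critical_repeater_length = s * math.sqrt(s)
    return critical_seq_length_factor(s, freq_gain) / critical_repeater_length


if __name__ == "__main__":
    print(f"critical sequential length: {critical_seq_length_factor():.2f}x")
    print(f"repeaters between FFs:      {repeaters_between_ffs_factor():.2f}x")
```

Under these idealized assumptions the factors come out near 0.42x and 0.71x, close to the simulated 0.43x and 0.75x; like the slide, the model ignores setup, hold and skew.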

Block Wiring Histogram and Critical Repeater Lengths

[Figure: block wiring histogram (net count, log scale) with critical repeater lengths for M6 and M3 at 90nm, 65nm, 45nm and 32nm]

Critical lengths migrating rapidly to the left, toward shorter nets (zoomed view follows)

Block Wiring Histogram: Zoomed View

[Figure: zoomed block wiring histogram (log scale) with critical repeater lengths for M6 and M3 at 90nm, 65nm, 45nm and 32nm]

Increasingly steep slope of the curve => # impacted nets exploding!
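The "exploding" count follows from the histogram shape: a net needs repeaters once it is longer than the critical length, so moving the threshold left along a steeply falling curve sweeps in many more nets. A minimal sketch; the histogram numbers are a made-up toy distribution (not the authors' data), lengths are in gate pitches so the shape stays invariant as on the primer slide, and the 0.57x and 0.7x factors are taken from the earlier slides.

```python
from bisect import bisect_left

NODES = ["90nm", "65nm", "45nm", "32nm"]

# Toy histogram (hypothetical numbers): lengths in gate pitches, so the
# shape stays fixed across nodes; net count falls off steeply with length.
LENGTHS = [10, 20, 50, 100, 200, 500, 1000, 2000]
COUNTS = [100000, 40000, 12000, 4000, 1200, 300, 60, 10]


def nets_at_or_above(lengths, counts, threshold):
    """Count nets in bins whose length is >= threshold.

    lengths: ascending bin lengths; counts: number of nets in each bin.
    """
    start = bisect_left(lengths, threshold)
    return sum(counts[start:])


def impacted_per_node(crit_90nm, crit_shrink=0.57, wire_shrink=0.7):
    """Impacted nets per node for an ideally shrunk block.

    In gate pitches the critical length shrinks by crit_shrink / wire_shrink
    per generation, while the number of nets doubles.
    """
    result = {}
    crit = crit_90nm
    scale = 1
    for node in NODES:
        result[node] = scale * nets_at_or_above(LENGTHS, COUNTS, crit)
        crit *= crit_shrink / wire_shrink
        scale *= 2
    return result


if __name__ == "__main__":
    print(impacted_per_node(600))
```

With this toy input the impacted count grows from 70 nets at 90nm to 2960 at 32nm, both because the threshold drops into denser bins and because the net count doubles each generation.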

Block Wiring Histogram and Critical Sequential Lengths

[Figure: block wiring histogram with critical sequential lengths for M6 and M3, 90nm to 32nm]

# pipelined nets growing from negligible (90nm) to substantial (32nm)

Repeated Block-level Nets

[Figure: percentage of block-level nets requiring repeaters, M3 and M6, 90nm to 32nm]

An ever-increasing percentage of block-level nets requires repeaters. Even the rate of growth is accelerating!

…especially for clocked repeaters

[Figure: clocked-repeater data, M3 and M6, 90nm to 32nm]

Total Repeater Count

[Figure: clk-rep, rep and tot-rep counts relative to total cell count, 90nm to 32nm]

Ever-increasing fractions of total cell count will be repeaters
– 70% in 32nm (and this omits full-chip (FC) repeaters within the block!)

Total repeater count is independent of frequency scaling assumptions


So, what’s changing?

Interconnects scaling worse than devices
…in spite of optimal (re-)buffering

# repeaters increasing exponentially

Interconnect repeaters will comprise a significant fraction of cells in a block

Even block-level nets will need to be pipelined

Implications on Synthesis

Literal/gate count and fanout metrics misleading
– Major delay contribution from communication
– Fanouts often isolated by repeaters
– Area often wire-limited

Sizing often determined by (predictable) repeater load
– Pre-layout sizing wasted

Implications on Synthesis

Less logic per pipeline stage

Combinational synthesis: max benefit shrinking

Synthesis across sequential boundaries

Methodological support for retiming

Implications on Synthesis

Bandwidth ceiling
– Hard to move data around for computation

Logic replication
– Encourage low fanouts

Dense encodings

Distribution of computation across the channel

Implications on Layout

Routing
– Must understand repeater insertion
– Fine power grid => templated routing?

Placement with repeaters
– Intra-block nets: # repeaters depends on routing
– OTH routes: fixed obstructions
– Add buffering into placement core … as opposed to ECO postprocessing

[Figure: signal routing tracks between VSS power rails]
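Because repeaters are inserted along the routed path, the same net can need a different number of repeaters depending on how it is routed around obstructions such as OTH routes. A minimal sketch, assuming Manhattan route lengths and our own rule of one repeater between consecutive segments no longer than the critical length; the helper names and numbers are hypothetical.

```python
import math


def manhattan_length(path):
    """Total Manhattan length of a route given as a list of (x, y) points."""
    return sum(abs(x1 - x0) + abs(y1 - y0)
               for (x0, y0), (x1, y1) in zip(path, path[1:]))


def repeaters_needed(route_length, critical_length):
    """Assumed rule: split the route into segments no longer than
    critical_length and place one repeater between adjacent segments."""
    if route_length <= critical_length:
        return 0
    return math.ceil(route_length / critical_length) - 1


if __name__ == "__main__":
    crit = 300.0  # hypothetical critical repeater length (um)
    direct = [(0, 0), (500, 0), (500, 200)]
    detour = [(0, 0), (0, -300), (500, -300), (500, 200)]  # around an obstruction
    for name, path in [("direct", direct), ("detour", detour)]:
        length = manhattan_length(path)
        print(name, length, repeaters_needed(length, crit))
```

Here the direct route (700 um) needs 2 repeaters and the detour (1300 um) needs 4, which is why a placer that ignores routing cannot know the repeater count, or the area it consumes.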

Implications on Layout

Latency-constrained placement
– µarch sub-optimality
– Hard constraint per stage (unlike delay)

OR

Post-RTL latency optimization
– Methodological nightmare
– Delay-insensitive design?

[Figure: 90nm and 32nm comparison]

Implications on FC Assembly

What if we reduce block area to avoid wire effects?
– Many of the new physical synthesis problems go away
– BUT # blocks triples! (and block assembly is the hardest part of chip design!)

Flat assembly (fragmentation of paths across blocks)

OR

Increased hierarchy (lack of visibility across hierarchy levels)

The CbC Link

Process scaling => worsening predictability

Predictability => CbC design

But current CbC approaches too rigid

Can we still apply them?

Principles of CbC Design

More predictability
– Reduced estimation error improves high-level optimizations

Break the design-verification loop
– Sequence of small, guaranteed-correct transformations
– No unexpected deterioration of secondary metrics

Avoid micro-engineering
– Design productivity gap
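One way to picture a sequence of small, guaranteed-correct transformations is a loop that accepts each step only if it improves the target metric without degrading secondary metrics beyond a bound, so no later verification pass is needed to catch surprises. This is a generic sketch under our own assumptions; the metric names, the 1% tolerance and the toy moves are hypothetical, not the authors' flow.

```python
from dataclasses import dataclass, replace
from typing import Callable, List


@dataclass(frozen=True)
class Metrics:
    delay: float  # primary metric (lower is better)
    area: float   # secondary metric
    power: float  # secondary metric


Transform = Callable[[Metrics], Metrics]


def apply_guarded(state: Metrics, transforms: List[Transform],
                  tolerance: float = 0.01) -> Metrics:
    """Apply small transforms in order, accepting one only if it reduces
    delay without degrading area or power by more than `tolerance`."""
    for transform in transforms:
        candidate = transform(state)
        improves = candidate.delay < state.delay
        area_ok = candidate.area <= state.area * (1 + tolerance)
        power_ok = candidate.power <= state.power * (1 + tolerance)
        if improves and area_ok and power_ok:
            state = candidate
    return state


# Hypothetical toy moves, standing in for real netlist transformations.
def upsize_gate(m: Metrics) -> Metrics:
    return replace(m, delay=m.delay * 0.95, area=m.area * 1.005)


def greedy_restructure(m: Metrics) -> Metrics:
    return replace(m, delay=m.delay * 0.90, power=m.power * 1.20)


if __name__ == "__main__":
    start = Metrics(delay=100.0, area=50.0, power=10.0)
    print(apply_guarded(start, [upsize_gate, greedy_restructure]))
```

The upsizing step is kept, while the restructuring step is rejected because it would raise power by 20%, an "unexpected deterioration" of a secondary metric.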

Abstract Fabrics

Structural fabrics: too resource-intensive
– e.g. DWF: 50% routing tracks

Use algorithmic fabrics instead
– Prune to subspace with desirable CbC properties, e.g. non-uniform power grid using “min power pitch” (ISPD’02); guaranteed-throughput bus design (ICCAD’02)
– CbC rules-of-thumb, e.g. bound on max adjacent runs of signals

Performance with predictability
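A rule-of-thumb such as a bound on max adjacent runs of signals can be enforced with a simple check on a track assignment: for every pair of signals on neighboring tracks, measure how far they run side by side and flag pairs above the bound. A minimal sketch, assuming each signal occupies a single horizontal interval on one track; the data layout, example and bound are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Segment = Tuple[int, float, float]  # (track index, start x, end x)

# Hypothetical track assignment: signal name -> segment.
EXAMPLE: Dict[str, Segment] = {
    "a": (0, 0.0, 400.0),
    "b": (1, 100.0, 600.0),
    "c": (2, 500.0, 900.0),
    "d": (0, 500.0, 700.0),
}


def parallel_run(a: Segment, b: Segment) -> float:
    """Length over which two segments on adjacent tracks run side by side."""
    (ta, a0, a1), (tb, b0, b1) = a, b
    if abs(ta - tb) != 1:
        return 0.0
    return max(0.0, min(a1, b1) - max(a0, b0))


def violations(assignment: Dict[str, Segment],
               max_run: float) -> List[Tuple[str, str, float]]:
    """Pairs of signals whose adjacent run exceeds max_run."""
    out = []
    for (na, sa), (nb, sb) in combinations(sorted(assignment.items()), 2):
        run = parallel_run(sa, sb)
        if run > max_run:
            out.append((na, nb, run))
    return out


if __name__ == "__main__":
    print(violations(EXAMPLE, max_run=200.0))
```

Because the check is local and cheap, a router could apply it on every track assignment, trading a little routing freedom for predictable coupling.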

CbC Block Construction

“Vertical” partitioning and successive refinement
– Coarse layout of unsynthesized design
– Successive refinement of “vertical” partitions
– Critical partitions first
– Different partitions exist at different levels of refinement
– Hierarchical engines
– Enables early repeater prediction

[Figure: refinement levels: RTL, synthesized/mapped netlist, placed/buffered netlist, GR/track-assigned layout]
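Successive refinement with critical partitions first can be pictured as a priority queue over partitions, each holding its own refinement level. A minimal sketch, assuming a hypothetical numeric criticality score and one-level-per-step refinement; the level names follow the slide, everything else is our illustration.

```python
import heapq
from dataclasses import dataclass

# Refinement levels named on the slide, coarsest first.
LEVELS = ["RTL", "synthesized/mapped netlist", "placed/buffered netlist",
          "GR/track-assigned layout"]


@dataclass
class Partition:
    name: str
    criticality: float  # hypothetical score: higher = more timing-critical
    level: int = 0      # index into LEVELS


def refine(partitions, steps):
    """Advance partitions one level per step, most critical first.

    Each step pops the most critical partition that is not yet fully
    refined and moves it down one level, so different partitions can sit
    at different levels of refinement at the same time.
    """
    # The index i breaks ties so Partition objects are never compared.
    heap = [(-p.criticality, i, p) for i, p in enumerate(partitions)]
    heapq.heapify(heap)
    for _ in range(steps):
        if not heap:
            break
        neg_crit, i, p = heapq.heappop(heap)
        p.level += 1
        if p.level < len(LEVELS) - 1:
            heapq.heappush(heap, (neg_crit, i, p))
    return {p.name: LEVELS[p.level] for p in partitions}


if __name__ == "__main__":
    parts = [Partition("A", 0.9), Partition("B", 0.5), Partition("C", 0.1)]
    print(refine(parts, steps=4))
```

After four steps the critical partition A has reached track assignment, B is only mapped and C is still RTL, so repeater estimates for A are available early.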

CbC Full Chip Assembly

Latency prediction for full-chip interconnects
– Preferential routing for performance-critical nets
– Flip-flop staging on non-critical nets
– Performance prediction with cycle latency ranges

Block area mis-prediction tolerance
– Move blocks without re-implementation
– Global communication grids
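Performance prediction with cycle latency ranges can be illustrated by mapping an uncertain route length to a range of clock cycles: if a full-chip net's length is only known to lie within [L_min, L_max] before assembly, its latency is known only as a range. A minimal sketch, assuming uniform flip-flop staging with at most one critical sequential length of wire per cycle; names and numbers are hypothetical.

```python
import math
from typing import Tuple


def cycles_for_length(length: float, seq_length: float) -> int:
    """Clock cycles to traverse `length` when each cycle covers at most
    `seq_length` of repeated wire (assumed uniform flip-flop staging)."""
    return max(1, math.ceil(length / seq_length))


def latency_range(l_min: float, l_max: float, seq_length: float) -> Tuple[int, int]:
    """Cycle latency range for a net whose length lies in [l_min, l_max]."""
    return cycles_for_length(l_min, seq_length), cycles_for_length(l_max, seq_length)


def flops_needed(length: float, seq_length: float) -> int:
    """Flip-flops staged along the net (one fewer than the cycle count)."""
    return cycles_for_length(length, seq_length) - 1


if __name__ == "__main__":
    # Hypothetical net whose length is uncertain by block placement.
    print(latency_range(2500.0, 4200.0, seq_length=1000.0))
```

A net whose range spans several cycle counts could be a candidate for preferential routing if it is performance-critical, or simply staged for the upper bound if it is not.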

Summing Up

Repeaters becoming critical at the block level

Most post-RTL design problems changing fundamentally

Combination of algorithmic and methodological advances required

CbC approaches viable, but at the abstract level
– Current structural fabrics too resource-intensive
– Achieve predictability through algorithmic fabrics

Backup Slides


PIE (Process Independent Exploration) Models

To provide an easier way to study interconnect structures and their trends in future CMOS processes

To be used in place of fudged process files

Analytical models directly correlating to device and interconnect physics
– Device models based on BSIM3 equations, including major 2nd-order effects
– Accurate mobility and velocity saturation models; DIBL and channel-length modulation approximations
– Continuous from weak to strong inversion
– Interconnect models with 2D fringe capacitance approximation
– Scattering not accounted for

Entire process expressed by a small set of physically meaningful process parameters (e.g. $T_{ox}$, $V_{th}$, $k_{ild}$) in PEF (Process Exploration File) files
– 16 for devices
– 6 for each metal layer

Test cases simulated as SPICE netlists

PIE models implemented as behavioral sources

Calibrated against existing process files
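The PIE interconnect equations are not given on the slide. As an illustration of what a 2D fringe-capacitance approximation driven by a few per-layer parameters looks like, the sketch below swaps in the well-known Sakurai–Tamaru single-wire-over-ground formula, $C/\varepsilon = 1.15\,(W/H) + 2.80\,(T/H)^{0.222}$, plus $r = \rho/(WT)$. The parameter set and names are our assumptions, not the PEF format. It also shows the primer's claim: an ideal 0.7x shrink of W, T and H leaves c per unit length unchanged and roughly doubles r.

```python
from dataclasses import dataclass

EPS0_F_PER_UM = 8.854e-18  # vacuum permittivity in F/um


@dataclass
class MetalLayer:
    # Hypothetical per-layer parameters (dimensions in um).
    width: float
    thickness: float
    height: float      # dielectric height above the ground plane
    rho_ohm_um: float  # resistivity in ohm*um (copper is roughly 0.017)
    k_ild: float       # inter-layer dielectric constant


def r_per_um(m: MetalLayer) -> float:
    """Wire resistance per unit length in ohm/um (no scattering correction)."""
    return m.rho_ohm_um / (m.width * m.thickness)


def c_per_um(m: MetalLayer) -> float:
    """Ground capacitance per unit length in F/um (Sakurai-Tamaru, single wire)."""
    eps = EPS0_F_PER_UM * m.k_ild
    return eps * (1.15 * (m.width / m.height)
                  + 2.80 * (m.thickness / m.height) ** 0.222)


def shrink(m: MetalLayer, s: float) -> MetalLayer:
    """Ideal shrink of all geometric dimensions by s."""
    return MetalLayer(m.width * s, m.thickness * s, m.height * s,
                      m.rho_ohm_um, m.k_ild)


if __name__ == "__main__":
    layer = MetalLayer(width=0.2, thickness=0.4, height=0.3,
                       rho_ohm_um=0.017, k_ild=3.0)
    small = shrink(layer, 0.7)
    print(f"r ratio: {r_per_um(small) / r_per_um(layer):.2f}")
    print(f"c ratio: {c_per_um(small) / c_per_um(layer):.2f}")
```

Because the capacitance term depends only on the ratios W/H and T/H, it is invariant under ideal scaling, while resistance grows as 1/s² ≈ 2.04x.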
