Transcript: Scaling and CbC Design
The Scaling Challenge: Can Correct-by-Construction Design Help?
Prashant Saxena, Noel Menezes, Pasquale Cocchini, Desmond Kirkpatrick
Intel Labs (CAD Research), Hillsboro, OR
International Symposium on Physical Design, Monterey, CA
Apr 16, 2003
ISPD’03
Repeaters, which are already a full-chip headache, will become critical at the block level as well
Outline
Some scaling experiments
– SPICE simulations
Implications for post-RTL design
Correct-by-Construction (CbC) design
– What’s the promise? What’s missing?
A Scaling Primer
Process scaling:
– Devices shrink 0.7x, delay 0.7x
– Wires shrink 0.7x
– R/µm increases 2x, C/µm unchanged
– So, (delay/scaled µm) increases 1.4x
Block area often stays the same
– # cells, # nets double
– Wiring histogram shape invariant
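The primer's factors can be checked with a few lines of arithmetic. The sketch below takes one plausible reading of the 1.4x: an ideally shrunk wire's RC delay stays roughly flat while device delay drops to 0.7x, so wire delay relative to device delay grows by about 1/0.7. It assumes (not stated on the slide) that unbuffered wire delay scales as r·c·L²; the function name `generation_scaling` is hypothetical.

```python
# First-order, one-generation scaling sketch. Assumption (not from the slides):
# unbuffered wire delay is proportional to r * c * L**2 (distributed RC).


def generation_scaling(s: float = 0.7) -> dict[str, float]:
    """Scaling factors for one process generation with linear shrink s."""
    device_delay = s            # devices shrink 0.7x, delay 0.7x
    r_per_um = 1.0 / s**2       # width and thickness shrink, so R/um rises ~2x
    c_per_um = 1.0              # C/um roughly unchanged
    length = s                  # an ideally shrunk wire is 0.7x as long
    wire_delay = r_per_um * c_per_um * length**2   # stays ~1.0x
    return {
        "r_per_um": r_per_um,
        "wire_delay": wire_delay,
        "wire_vs_device": wire_delay / device_delay,  # ~1.43x
    }


for name, factor in generation_scaling().items():
    print(f"{name}: {factor:.2f}x")
```

Under this model the wire itself does not get slower in absolute terms; it simply stops keeping pace with the devices it connects.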
Critical Repeater Lengths
[Figure: relative critical repeater length vs. process node (90nm, 65nm, 45nm, 32nm) for M6 and M3, shrinking 0.57x per generation]
Repeaters optimally sized, uniformly, for min delay
– Critical length: min distance at which inserting a repeater speeds up the line
In line with scaling theory: $s\sqrt{s} = 0.586$ (for $s = 0.7$)
“Ideally shrunk” circuit requires additional repeaters (0.7x vs. 0.57x)
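The $s\sqrt{s}$ trend follows from a break-even argument: a repeater pays off once the wire delay it removes exceeds its own delay. A minimal sketch, assuming a distributed wire delay of 0.5·r·c·L² and a repeater delay that scales like device delay (s per generation); both are simplifications, not the SPICE setup behind the plot, and `critical_length` is a hypothetical helper.

```python
import math


def critical_length(r_per_um: float, c_per_um: float, t_rep: float) -> float:
    """Break-even length for inserting one mid-point repeater (simplified model).

    Assumed model: distributed wire delay 0.5*r*c*L**2. Splitting the line in two
    leaves 2 * 0.5*r*c*(L/2)**2 = 0.25*r*c*L**2, i.e. saves 0.25*r*c*L**2, at the
    cost of the repeater's own delay t_rep. Break-even: 0.25*r*c*L**2 == t_rep.
    """
    return math.sqrt(4.0 * t_rep / (r_per_um * c_per_um))


s = 0.7
base = critical_length(r_per_um=1.0, c_per_um=1.0, t_rep=1.0)           # normalized
next_gen = critical_length(r_per_um=1.0 / s**2, c_per_um=1.0, t_rep=s)  # one shrink later
ratio = next_gen / base

print(f"critical length: {ratio:.3f}x per generation (s*sqrt(s) = {s * math.sqrt(s):.3f})")
print(f"ideal shrink:    {s:.3f}x")
```

Only the ratio matters here: the constant in front of $\sqrt{R_0C_0/(rc)}$ cancels, which is why a crude model still lands near the simulated 0.57x.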
Critical Sequential Lengths
[Figure: relative critical sequential length (0.43x per generation) and # repeaters between FFs (0.75x per generation) vs. process node (90nm to 32nm) for M6 and M3]
Optimized for max distance in one clock period
– Assumes: 2x frequency scaling, 5GHz on 90nm
– Ignores setup, hold, skew
“Ideally shrunk” circuit:
– Requires much new wire pipelining (0.7x vs. 0.43x)
– Ratio of regular to clocked repeaters decreasing
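A first-order estimate of the sequential numbers, as a sketch under stated assumptions: optimally repeated wire delay per unit length scales as $\sqrt{R_0C_0\,rc}$ and repeater spacing as $\sqrt{R_0C_0/(rc)}$ (Bakoglu-style models, not the slide's SPICE flow), and "2x frequency scaling" is read as 2x per generation. The function name is hypothetical.

```python
import math


def per_generation_ratios(s: float = 0.7, f_scale: float = 2.0) -> dict[str, float]:
    """First-order per-generation ratios for repeated wires (assumed model).

    Assumptions: optimally repeated wire delay per unit length ~ sqrt(R0C0 * r * c)
    and repeater spacing ~ sqrt(R0C0 / (r * c)) (Bakoglu-style), with R0C0 ~ s,
    r ~ 1/s**2, c ~ 1, and a clock f_scale times faster each generation.
    """
    r0c0, r, c = s, 1.0 / s**2, 1.0
    delay_per_len = math.sqrt(r0c0 * r * c)   # ~1.2x slower per um
    period = 1.0 / f_scale                    # clock period shrinks 0.5x
    seq_length = period / delay_per_len       # max distance covered in one cycle
    rep_length = math.sqrt(r0c0 / (r * c))    # critical repeater length, s**1.5
    return {
        "seq_length": seq_length,
        "rep_length": rep_length,
        "repeaters_per_stage": seq_length / rep_length,
    }


for name, ratio in per_generation_ratios().items():
    print(f"{name}: {ratio:.2f}x per generation")
```

Under these assumptions the ratios come out near 0.42x, 0.59x and 0.71x, in the same neighborhood as the simulated 0.43x, 0.57x and 0.75x; like the slide, the sketch ignores setup, hold, and skew.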
Block Wiring Histogram and Critical Repeater Lengths
[Figure: block wiring histogram (net count, log scale) overlaid with critical repeater lengths for M6 and M3 at 90nm, 65nm, 45nm, 32nm]
Critical lengths migrating rapidly to the left… (zoomed view coming up)
Block Wiring Histogram: Zoomed View
[Figure: zoomed wiring histogram (log scale) with critical repeater lengths for M6 and M3 at 90nm to 32nm]
Increasingly steep slope of curve
=> # impacted nets exploding!
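To make the histogram argument concrete, the sketch below counts nets longer than the critical repeater length as that length shrinks 0.57x per generation, while the block's net count doubles and the histogram shape stays fixed (as in the primer). The power-law distribution and the 300 µm starting length are made up for illustration; they are not the block data behind the plots, and the helper names are hypothetical.

```python
# Hypothetical illustration only: the wirelength distribution and starting
# critical length below are made up, not the block data plotted on the slides.


def toy_histogram(scale: float) -> list[tuple[float, float]]:
    """(length_um, net_count) pairs on a log grid from 1 um to 1000 um.

    Counts fall off as length**-2 (a made-up, loosely Rent's-rule-like shape).
    """
    lengths = [10 ** (k / 10) for k in range(31)]
    return [(length, scale * 1e5 * length**-2) for length in lengths]


def nets_over(hist: list[tuple[float, float]], l_crit_um: float) -> tuple[float, float]:
    """Count and fraction of nets longer than the critical repeater length."""
    total = sum(count for _, count in hist)
    over = sum(count for length, count in hist if length > l_crit_um)
    return over, over / total


L_CRIT_90NM_UM = 300.0  # hypothetical starting critical length

counts, fractions = [], []
for gen, node in enumerate(["90nm", "65nm", "45nm", "32nm"]):
    hist = toy_histogram(scale=2.0**gen)      # # nets doubles, shape invariant
    l_crit = L_CRIT_90NM_UM * 0.57**gen       # critical length shrinks 0.57x
    count, frac = nets_over(hist, l_crit)
    counts.append(count)
    fractions.append(frac)
    print(f"{node}: l_crit {l_crit:6.1f} um -> {count:8.1f} nets ({frac:.2%}) need repeaters")
```

Because the histogram rises toward short lengths, every leftward step of the critical length sweeps in more nets than the last, and the doubling net count multiplies the absolute total on top of that.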
Block Wiring Histogram and Critical Sequential Lengths
[Figure: block wiring histogram with critical sequential lengths for M6 and M3 at 90nm to 32nm]
# pipelined nets growing from negligible (90nm) to substantial (32nm)
Repeated Block-level Nets
[Figure: % of block-level nets requiring repeaters vs. process node (90nm to 32nm) for M3 and M6]
Ever-increasing percentage of block-level nets requires repeaters
Even the rate of growth is accelerating!
…especially for clocked repeaters
[Figure: block-level nets requiring clocked repeaters vs. process node (90nm to 32nm) for M3 and M6]
Total Repeater Count
[Figure: repeaters as % of total cell count (clk-rep, rep, tot-rep) vs. process node (90nm to 32nm)]
Ever-increasing fractions of total cell count will be repeaters
– 70% in 32nm (and this omits FC repeaters within the block!)
Total repeater count is independent of frequency scaling assumptions
So, what’s changing?
Interconnects scaling worse than devices
– …in spite of optimal (re-)buffering
# repeaters increasing exponentially
Interconnect repeaters will comprise a significant fraction of cells in a block
Even block-level nets will need to be pipelined
Implications on Synthesis
Literal/gate count and fanout metrics misleading
– Major delay contribution from communication
– Fanouts often isolated by repeaters
– Area often wire-limited
Sizing often determined by (predictable) repeater load
– Pre-layout sizing wasted
Implications on Synthesis
Less logic per pipeline stage
Combinational synthesis: max benefit shrinking
Synthesis across sequential boundaries
Methodological support for retiming
Implications on Synthesis
Bandwidth ceiling
– Hard to move data around for computation
Logic replication
– Encourage low fans
Dense encodings
Distribution of computation across channel
Implications on Layout
Routing
– Must understand repeater insertion
– Fine power grid => templated routing?
[Figure: templated routing of signals a and b between power-grid (V, VSS) lines]
Placement with repeaters
– Intra-block nets: # repeaters depends on routing
– OTH routes: fixed obstructions
– Add buffering into placement core … as opposed to ECO postprocessing
Implications on Layout
Latency-constrained placement
– µarch sub-optimality
– Hard constraint per stage (unlike delay)
OR
Post-RTL latency optimization
– Methodological nightmare
– Delay-insensitive design?
[Figure: 90nm vs. 32nm]
Implications on FC Assembly
What if we reduce block area to avoid wire effects?
– Many of the new physical synthesis problems go away
– BUT # blocks triples! (and block assembly is the hardest part of chip design!)
Flat assembly (fragmentation of paths across blocks)
OR
Increased hierarchy (lack of visibility across hierarchy levels)
The CbC Link
Process scaling => worsening predictability
Predictability => CbC design
But current CbC approaches too rigid
Can we still apply them?
Principles of CbC Design
More predictability
– Reduced estimation error improves high-level optimizations
Break the design-verification loop
– Sequence of small, guaranteed-correct transformations
– No unexpected deterioration of secondary metrics
Avoid micro-engineering
– Design productivity gap
Abstract Fabrics
Structural fabrics: too resource-intensive
– e.g. DWF: 50% of routing tracks
Use algorithmic fabrics instead
– Prune to subspace with desirable CbC properties
– e.g. Non-uniform power grid using “min power pitch” (ISPD’02), guaranteed-throughput bus design (ICCAD’02)
– CbC rules-of-thumb
– e.g. Bound on max adjacent runs of signals
Performance with predictability
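A rule-of-thumb such as "bound on max adjacent runs of signals" is cheap to check, which is part of its CbC appeal. The sketch below assumes the bound applies to the total parallel overlap of two nets on neighboring tracks of one layer (presumably to cap coupling); the `Segment` representation and `adjacent_run_violations` are hypothetical, not taken from the paper.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A routed wire segment on one layer (hypothetical representation)."""
    net: str
    track: int      # routing-track index; tracks t and t + 1 are neighbors
    start: float    # position along the track (um)
    end: float


def adjacent_run_violations(
    segments: list[Segment], max_run_um: float
) -> list[tuple[str, str, float]]:
    """Net pairs whose total side-by-side run on neighboring tracks exceeds the bound."""
    by_track: dict[int, list[Segment]] = defaultdict(list)
    for seg in segments:
        by_track[seg.track].append(seg)

    runs: dict[tuple[str, str], float] = defaultdict(float)
    for track, segs in by_track.items():
        neighbors = by_track.get(track + 1, [])   # .get avoids inserting empty tracks
        for a in segs:
            for b in neighbors:
                if a.net == b.net:
                    continue
                overlap = min(a.end, b.end) - max(a.start, b.start)
                if overlap > 0:
                    key = (min(a.net, b.net), max(a.net, b.net))
                    runs[key] += overlap

    return [(a, b, run) for (a, b), run in sorted(runs.items()) if run > max_run_um]


segments = [
    Segment("a", track=0, start=0.0, end=400.0),
    Segment("b", track=1, start=100.0, end=600.0),
    Segment("c", track=2, start=500.0, end=550.0),
]
print(adjacent_run_violations(segments, max_run_um=200.0))
```

Here nets a and b run side by side for 300 µm and are flagged, while the 50 µm b/c overlap passes; a router could enforce the same bound during track assignment instead of checking afterward.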
CbC Block Construction
“Vertical” partitioning and successive refinement
– Coarse layout of unsynthesized design
– Successive refinement of “vertical” partitions
– Critical partitions first
– Different partitions exist at different levels of refinement
– Hierarchical engines
– Enables early repeater prediction
[Figure: refinement levels: RTL, synth/mapped netlist, placed/buffered netlist, GR/track-assigned layout]
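Early repeater prediction from a coarse layout might look like the sketch below. It assumes net length is estimated by half-perimeter wirelength (HPWL) of the pin bounding box, and that repeaters are added until no segment exceeds the critical repeater length. The helper names, pin coordinates, and the 250 µm critical length are all hypothetical.

```python
import math


def hpwl(pins: list[tuple[float, float]]) -> float:
    """Half-perimeter wirelength of a net's pin bounding box (um)."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def predicted_repeaters(length_um: float, l_crit_um: float) -> int:
    """Fewest repeaters that keep every segment at or below the critical length."""
    if length_um <= l_crit_um:
        return 0
    return math.ceil(length_um / l_crit_um) - 1


# Hypothetical coarse placement: pin coordinates (um) per net.
coarse_nets = {
    "n1": [(0.0, 0.0), (120.0, 40.0)],
    "n2": [(0.0, 0.0), (900.0, 300.0), (450.0, 50.0)],
}
L_CRIT_UM = 250.0  # hypothetical critical repeater length for the target layer

estimate = {name: predicted_repeaters(hpwl(pins), L_CRIT_UM) for name, pins in coarse_nets.items()}
print(estimate, "total:", sum(estimate.values()))
```

HPWL underestimates routed length for multi-pin nets, so a real flow would likely refine these counts as partitions move from placed/buffered netlists to track-assigned layout.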
CbC Full Chip Assembly
Latency prediction for full-chip interconnects
– Preferential routing for performance-critical nets
– Flip-flop staging on non-critical nets
– Performance prediction with cycle latency ranges
Block area mis-prediction tolerance
– Move blocks without re-implementation
– Global communication grids
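Cycle latency ranges can be derived directly from the critical sequential length. A minimal sketch, assuming one clock cycle covers at most `l_seq_um` of repeated wire and that block-area mis-prediction shows up as bounds on route length; the function names, the 1800 µm value, and the ±20% bounds are hypothetical.

```python
import math


def cycle_latency(length_um: float, l_seq_um: float) -> int:
    """Clock cycles for a wire when one cycle covers at most l_seq_um."""
    return max(1, math.ceil(length_um / l_seq_um))


def flops_needed(length_um: float, l_seq_um: float) -> int:
    """Flip-flop stages to insert: one fewer than the cycle latency."""
    return cycle_latency(length_um, l_seq_um) - 1


def latency_range(min_len_um: float, max_len_um: float, l_seq_um: float) -> tuple[int, int]:
    """Cycle-latency range for a route whose length is only known within bounds."""
    return cycle_latency(min_len_um, l_seq_um), cycle_latency(max_len_um, l_seq_um)


L_SEQ_UM = 1800.0  # hypothetical critical sequential length (um) for the target layer

# Hypothetical full-chip route: 5000 um estimate, +/-20% for block-area mis-prediction.
print(latency_range(4000.0, 6000.0, L_SEQ_UM))   # cycles
print(flops_needed(6000.0, L_SEQ_UM))            # worst-case FF stages
```

Reporting a range rather than a single number lets the microarchitecture budget for the worst case up front, so blocks can later move within the bounds without re-implementation.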
Summing Up
Repeaters becoming critical at the block level
Most post-RTL design problems changing fundamentally
Combination of algorithmic and methodological advances required
CbC approaches viable, but at the abstract level
– Current structural fabrics too resource-intensive
– Achieve predictability through algorithmic fabrics
Backup Slides
PIE (Process Independent Exploration) Models
To provide an easier way to study interconnect structures and their trends in future CMOS processes
To be used in place of fudged process files
Analytical models directly correlating to device and interconnect physics
– Device models based on BSIM3 equations, including major 2nd-order effects
– Accurate mobility and velocity saturation models, DIBL and channel-length modulation approximation
– Continuous from weak to strong inversion
– Interconnect models with 2D fringe capacitance approximation
– Scattering not accounted for
Entire process expressed by a small set of physically meaningful process parameters (e.g. $T_{ox}$, $V_{th}$, $k_{ild}$, etc.) in PEF (Process Exploration File) files
– 16 for devices
– 6 for each metal layer
Test cases simulated as SPICE netlists
PIE models implemented as behavioral sources
Calibrated against existing process files