are wires plannable?
Ralph H.J.M. Otten
Eindhoven University of Technology
Eindhoven, The Netherlands
[email protected]
Giuseppe S. Garcea
Delft University of Technology
Delft, The Netherlands
[email protected]
wire planning
Ralph Otten

1987: providing floorplan design with alignment constraints
floorplan is a data structure capturing the relative positions
(i.e. no geometry, possibly overlap, several optimizations)
 alignment to save wire area (data path generator)
 often tremendous reduction in routing complexity
 in essence not limited to "data path" regularity


1998: fixing (and maximizing) time budgets for modules
remove global iteration from synthesis
 fix total path delay
 provide pre-placement and pin positioning data
 enable early retiming, layer assignments, system partitioning
 ensure satisfaction of system timing requirements


this talk:
iteration free synthesis: what is needed?
 trends in chip industry: where do the wires go?
 some directions

iteration free synthesis (silicon compilers)
[flow diagram: conceptual design, behavioral synthesis, logic synthesis, layout synthesis; inputs: library, foot print, data preparation, technology; intermediate results: gate and net list, weighted incidence structure]
wire length and area minimization under technology constraints
timing was an incidental, usually surprisingly good, result of a synthesis flow
with size as its prime objective
iterative timing optimization
[flow diagram: conceptual design, behavioral synthesis, logic synthesis, timing optimization, timing analysis, layout synthesis; inputs: library, foot print, data preparation, technology]
annotations: wire loads, resistances, critical paths; buffer insertion, transistor sizing, fanout trees
timing awareness in conventional flows
synthesis: uses delay models but has very limited information
resynthesis: accepts additional constraints
and wire load models
layout synthesis: tries to reduce total wire length and area
timing is the (arbitrary) outcome of
• a sequence of optimizations with other objectives
• adding constraints and resynthesis
 bringing it to a local optimum
• adding more constraints and resynthesis
 bringing it to another local optimum
desired:
a flow that satisfies timing constraints exactly whenever possible
TIMING CLOSURE
sutherland's delay formula
$\tau = \frac{g}{f} + p = b\,r_o c_o\,\frac{C_L}{C_{in}} + b\,r_o c_p$
with $R = r_o / s$, $C_{in} = s\,c_o$, $C_p = s\,c_p$
(so $g = b\,r_o c_o$, $p = b\,r_o c_p$, $f = C_{in}/C_L$)
note: the absence of resistance between driver and load
g: computing effort, size independent!
p: inherent (parasitic) delay, size independent
1/f: restoring effort
g/f: effort delay
delay depends on:
• $C_L$
• function
• topology
• device size
if f is kept constant, then delay stays constant
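A minimal sketch of the constant delay property, assuming the RC gate model of this slide (driver resistance ro/s, input capacitance s·co, parasitic capacitance s·cp). The function names, the value b = 0.7 and all numeric values are hypothetical; the sketch only illustrates that the delay depends on f = Cin/CL and not on the size s.

```python
def gate_delay(s, c_load, ro, cp, b=0.7):
    """Delay b * R * (C_L + C_p) of a gate of size s, with R = ro / s and C_p = s * cp."""
    return b * (ro / s) * (c_load + s * cp)


def effort_form(f, ro, co, cp, b=0.7):
    """The same delay written as g / f + p, with g = b*ro*co and p = b*ro*cp."""
    g = b * ro * co
    p = b * ro * cp
    return g / f + p


# Hypothetical technology values (arbitrary but consistent units).
RO, CO, CP = 10.0, 2.0, 1.0
F = 0.25  # assumed Cin / CL, kept constant for every size

# Scale the gate and its load together so that Cin / CL stays equal to F.
sizes = (1.0, 2.0, 8.0)
delays = [gate_delay(s, (s * CO) / F, RO, CP) for s in sizes]
reference = effort_form(F, RO, CO, CP)
```

With these assumed values every entry of `delays` coincides with `reference`, whatever the size.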
continuously sized networks
the size of a gate with constant delay varies linearly with the load:
gate size = a f C
[figure: a gate with input capacitances $c_a, c_b, \ldots, c_n$ driving a load C]
$c_a = g_a f C,\quad c_b = g_b f C,\quad \ldots,\quad c_n = g_n f C$
gate size $\propto \sum_{x \in \text{inputs}} c_x = \sum_{x \in \text{inputs}} g_x f C \;\Rightarrow\;$ gate size $= a f C$
f, the scaling factor, is the same for all inputs
a, the area sensitivity, is a property of the gate, that is function, topology, sizing
continuously sized networks
the size of a gate with constant delay varies linearly with the load:
gate size = a f C
[figure: gate i with fanout gates j and k; labels $q_i$, $C_j$, $C_k$]
$c_i = q_i + n_{ij} f_j c_j + n_{ik} f_k c_k = q_i + \sum_{j \in fo(i)} n_{ij} f_j c_j$
in vector notation:
$c = q + N f^D c$
$(I - N f^D)\,c = q$
$c = (I - N f^D)^{-1} q$
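The size assignment above amounts to solving one linear system. The following is a minimal sketch, assuming that f^D denotes the diagonal matrix built from the vector f, so that entry (i, j) of N f^D is n_ij · f_j. The three-gate chain, its weights and all numeric values are hypothetical, and the small Gaussian elimination only keeps the example free of dependencies.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's data is left untouched.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def size_assignment(n_matrix, f, q):
    """Return c satisfying c = q + N f^D c, that is (I - N f^D) c = q."""
    size = len(q)
    system = [[(1.0 if i == j else 0.0) - n_matrix[i][j] * f[j]
               for j in range(size)] for i in range(size)]
    return solve_linear(system, q)


# Hypothetical chain: gate 0 drives gate 1, gate 1 drives gate 2.
N = [[0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0],
     [0.0, 0.0, 0.0]]
f = [0.25, 0.25, 0.25]   # assumed Cin/Cout ratio of every gate
q = [2.0, 2.0, 8.0]      # assumed imposed capacitances
c = size_assignment(N, f, q)
```

For a netlist without feedback the matrix N f^D is strictly triangular in topological order, so the same result could also be obtained by a single pass from the outputs back to the inputs.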
timing closure
the size of a gate with constant delay varies linearly with the load:
gate size = a f C
[Grodstein, et al., ICCAD, 1995]
$\tau = \frac{g}{f} + p$
[Sutherland and Sproull: VLSI, 1991]
[diagram labels: constant delay methodology, gain-based synthesis, fixed delays, guaranteed timing, fixed timing, performance planning]
synthesis under timing constraints
no iterative loop has been created!
solve the corresponding Leontief system
$c = (I - N f^D)^{-1} q$
insert buffers to reduce area
[flow diagram: conceptual design, behavioral synthesis, logic synthesis, size assignment, timing analysis, area optimization, layout synthesis; inputs: library, foot print, data preparation, technology]
size assignment
[data-flow diagram labels: synthesis; a vector + weighted incidence matrix (N); netlist; netlist possibly modified by inserted buffers (N'); restoring effort (f); floorplan optimization; wire lengths; imposed capacitances (q); sizes; layout synthesis]
$c = (I - N f^D)^{-1} q$
f: vector of effort reciprocals, that is Cin/Cout
sizes: implied by the calculated input capacitances
TIMING GUARANTEED
- for f was fixed
- buffers inserted for area recovery only where enough slack is available!
resistive interconnect
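[circuit: driver (voltages $v_{tr}$, $v_{st}$) with resistance $R_{tr}$ driving a wire with resistance $R_W$ and capacitance $C_w = c \cdot l$, modeled with $0.5\,C_w + C_p$ at the driver side and $0.5\,C_W$ at the far end]
$\tau = R_{tr}\,(C_p + C_w) + \frac{1}{2} R_w C_w$
$\quad = \left(r_o c_g + \frac{1}{2} R_w C_{in}\right) \frac{C_w}{C_{in}} + r_o c_p$
$\quad = (g + r)\,h + p$
r is not size independent
problem 1:
how to cope with resistive interconnect
while their delay models cannot be made size independent?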
the new synthesis problem
logic synthesis is to provide an initial netlist
and the restoring effort 1/f for every gate !
how can synthesis be guided to produce networks
that lead to "fast enough" implementations?
• sutherland's principle of uniform stage effort
• brayton's uniform stage delay
• technology mapping for speed
wire planning
problem 2:
how can we prevent synthesis from generating networks
that preclude satisfying timing constraints,
while timing correct networks exist?
synthesis with wire planning
no iterative loop has been created!
[flow diagram: conceptual design, behavioral synthesis, wire planning, logic synthesis, size assignment, timing analysis, area optimization, layout synthesis; inputs: library, foot print, data preparation, technology; other labels: timing budgets, preplacement, pin assignment, layer assignment, wire structures]
global wire theory
global wires are interconnects whose delay can be improved
by inserting restoring circuitry
assumptions:
• global interconnections are always point-to-point wires
• first moment matching is accurate enough
• restoring circuits are modeled with sakurai's first order model

the length of a section, the critical length, depends
on the wiring layer, but not on the buffer size,
and tends to be constant when measured in feature sizes

the delay of an optimally segmented line is linear in its length;
path delay is therefore independent of the position
of the restoring circuits on the path

the delay of a section of an optimally buffered line
is the same for all layers
wire planning considerations

the definition of global wires creates a two-level hierarchy

global wires will be optimally buffered

a wire planning scenario:
allocate delays to global paths
 assign time budgets to modules
 create net lists for the modules
 assign size to all gates


given the path delays, and
convex trade-off between module size and delay,
size optimization is efficiently solvable,
and produces time budgets for each module

logic synthesis has to create net lists for the modules
with given time budgets, and assign restoring efforts
to the gates

size assignment is done by solving the Leontief system
remaining problems
optimally buffered lines have fixed input/output capacitance ($C_{in}$, $C_{out}$)
problem 3:
optimally buffered lines fix input and output capacitances,
and therefore constrain the total effort along a path,
and thus the delay of that path.
discrete libraries
derivation assumes continuous sizability!
libraries are mostly discrete and offer limited range in sizes
problem 4:
does the fact that libraries are not continuously sizable
defeat timing closure by fixing individual gate delays?
some problems of timing closure
problem 5:
can the efficiency of load independent mapping for speed
be advantageous under a constant delay methodology?
problem 4:
does the fact that libraries are not continuously sizable
defeat timing closure by fixing individual gate delays?
problem 3:
optimally buffered lines fix input and output capacitances,
and therefore constrain the total effort along a path,
and thus the delay of that path.
problem 2:
how can we prevent synthesis from generating networks
that preclude satisfying timing constraints,
while timing correct networks exist?
problem 1:
how to cope with resistive interconnect
while their delay models cannot be made size independent?
are wires plannable?

iteration free synthesis
size assignment to achieve proper timing

resistive interconnect and guiding synthesis
wire planning
for that we need:

a solid basis for wire planning
pin placement for detour free routing
 valid retiming
 early layer assignment
.......

wire plans

a wire plan for a functional network is
a position for each of its function nodes, and
a pin assignment for all its primary inputs and outputs

a global wire plan is a wire plan
of which all arcs represent global wires, and
will be laid out as optimally buffered lines.

a wire plan is monotonic if
all its arcs can be laid out such that
the L1-length of every directed path in the network
is equal to the L1-distance between its end points

given a pin assignment, no global wire plan is faster than
a monotonic wire plan (if functions have fixed delays)

given a pin assignment, monotonic wire plans have
the least wire capacitance
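The definition of a monotonic wire plan can be checked directly on small examples. The sketch below assumes an acyclic network, integer node positions, and arcs routed at their minimum (L1) length; the function names and the example plan are hypothetical. It walks every directed path and compares its accumulated L1-length with the L1-distance between its end points.

```python
def l1_distance(p, q):
    """Manhattan (L1) distance between two points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def is_monotonic(positions, arcs):
    """True if every directed path is as long as the L1-distance of its end points.

    Assumes an acyclic network whose arcs are laid out with minimum L1-length.
    """
    successors = {}
    for tail, head in arcs:
        successors.setdefault(tail, []).append(head)

    def paths_ok(start, node, length):
        # Extend the path start -> ... -> node by one arc at a time.
        for nxt in successors.get(node, []):
            new_length = length + l1_distance(positions[node], positions[nxt])
            if new_length != l1_distance(positions[start], positions[nxt]):
                return False
            if not paths_ok(start, nxt, new_length):
                return False
        return True

    return all(paths_ok(node, node, 0) for node in positions)


# Hypothetical wire plan: pins "in" and "out", function nodes "u" and "v".
plan = {"in": (0, 0), "u": (2, 1), "v": (3, 4), "out": (5, 4)}
arcs = [("in", "u"), ("u", "v"), ("v", "out")]
straight = is_monotonic(plan, arcs)

# Moving u below the input pin forces a detour on the path in -> u -> v.
detour = dict(plan, u=(2, -1))
bent = is_monotonic(detour, arcs)
```

The exhaustive path walk is exponential in the worst case; it is meant as an illustration of the definition, not as a practical checker.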
wire plans for given pin assignment

the inbox of a node is the smallest iso-rectangle
containing its support

the outbox of a node is the smallest iso-rectangle
containing its range

a bridge of a node is a minimum L2-length line
connecting the inbox and the outbox
a functional network
has a monotonic wire plan
with respect to
a given pin assignment
if and only if
every node has
one and only one bridge
existence criterion
the existence of a monotonic wire plan
of a functional network for a given pin assignment
can be checked on a node-by-node basis:

its in- or outbox is a single point

its inbox and outbox are perpendicular iso-lines

its outbox is in the projection of the inbox
are wires plannable?

iteration free synthesis
size assignment to achieve timing

resistive interconnect and guiding synthesis
wire planning

delay prediction is needed and should be enabled
optimally buffered global interconnect
trends in chip industry
many laws in chip industry fit a specific generic form:
$\frac{dU}{dV} = \frac{f(U)}{h(V)}$
differential equation with an integral
(solvable by separation of variables)
moore's law
the growth rate of chip complexity will be proportional
to the achieved complexity to date [Gordon Moore, 1964]
$\frac{dN}{dt} = m\,N$
proportionality constant, "moore exponent m":
0.2 for processors, and 0.4 for memory
N = numerical complexity of the module (e.g. the chip)
[plot: number of transistors ($10^3$ to $10^{11}$, logarithmic) versus year (1970 to 2000), for static memory (1K to 1G) and intel microprocessors]
rent's rule
the growth rate of the terminal count with the complexity of the module
will be proportional to the average number of terminals per submodule
[Landman, Russo, 1971]
$\frac{dT}{dN} = r\,\frac{T}{N}$
proportionality constant, "rent exponent r"
T(N) = the number of terminals of a module with numerical complexity N
N = numerical complexity of the module (e.g. the chip)
rent’s curves
[plot: rent's curves on logarithmic axes (ticks 10 to 10,000 and 1,000 to 1,000,000), with $\frac{dT}{dN} = r\,\frac{T}{N}$] [Bakoglu, 1987]
board level, high performance computers: r=0.25, K=82
chip level, high performance computers: r=0.63, K=1.4
gate arrays: r=0.5, K=1.9
microprocessors: r=0.45, K=0.82
static ram: r=0.12, K=6
dynamic ram: r=0.1, K=4
process exponents
the reduction rate of device sizes will be proportional
to the achieved device size [Status2000, ICE, 2000]
$\frac{dL}{dt} = -p\,L$
proportionality constants are pretty close in value,
and will be called the "process exponent p"
[plot: device size in µm ($10^{1}$ down to $10^{-3}$, logarithmic) versus year (1960 to 2010), with a marker at about 3 atom layers]
straverius laws
many laws in chip industry have generic form:
$\frac{dU}{dV} = \frac{f(U)}{h(V)}$
differential equation with an integral
(solvable by separation of variables)
$\frac{dN}{dt} = m\,N$: moore's law on chip industry
$\frac{dN}{dT} = \frac{1}{r}\,\frac{N}{T}$: rent's rule on intra-module communication
$\frac{dL}{dt} = -p\,L$: observed miniaturization in chip technology
there are many more!!!
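Separation of variables gives closed forms for the three laws listed here. The sketch below takes the differential forms literally (natural exponentials, time in arbitrary units); the function names, the starting values and the process exponent 0.1 are hypothetical, while m = 0.2 and the gate array values r = 0.5, K = 1.9 are the ones quoted on the earlier slides. A central finite difference is used to confirm that each closed form satisfies its differential equation.

```python
import math


def moore(n0, m, t):
    """dN/dt = m N  =>  N(t) = N0 * exp(m t)."""
    return n0 * math.exp(m * t)


def rent(k, r, n):
    """dT/dN = r T / N  =>  T(N) = K * N**r."""
    return k * n ** r


def feature_size(l0, p, t):
    """dL/dt = -p L  =>  L(t) = L0 * exp(-p t)."""
    return l0 * math.exp(-p * t)


def derivative(fn, x):
    """Central finite difference with a step relative to the size of x."""
    h = 1e-6 * max(1.0, abs(x))
    return (fn(x + h) - fn(x - h)) / (2.0 * h)


# Left-hand sides (numerical derivatives) and right-hand sides of the three laws.
moore_lhs = derivative(lambda t: moore(1e3, 0.2, t), 5.0)
moore_rhs = 0.2 * moore(1e3, 0.2, 5.0)

rent_lhs = derivative(lambda n: rent(1.9, 0.5, n), 1e4)
rent_rhs = 0.5 * rent(1.9, 0.5, 1e4) / 1e4

size_lhs = derivative(lambda t: feature_size(1.0, 0.1, t), 10.0)
size_rhs = -0.1 * feature_size(1.0, 0.1, 10.0)
```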
another old rule
how primary memory should be supplied
to a processor with a given speed
in a balanced computer system
the size of primary memory in bytes
is close to the number of instructions per second [Richard P. Case, 60's]
[plot: memory size (Mb, 1 to 1k) versus processor speed (MIPS, 1 to 1k); regions: massive memory machines, massive parallel machines; line: amdahl's constant; data points: ibm 360, vax 11, cray 1, cray 2, 80486, pentium IV]
memory-to-compute ratio
memory-to-compute ratio at time $t_o$: $M(t_o) / C(t_o)$
downscaling makes memory (by $S_m$) and processor (by $S_c$) smaller: $S_m M$, $S_c C$
processing became A times faster due to downscaling
to rebalance the system memory has to be extended
memory-to-compute ratio at time t: $M(t) / C(t)$
down scaling forces the memory-to-compute ratio
to increase very fast !!! [Paul Stravers, 2000]
$\frac{M(t)/C(t)}{M(t_o)/C(t_o)} = \frac{S_c}{S_m}\,A = \frac{s}{L^b}$
(b  1)
$\frac{d}{dt}\!\left(\frac{M}{C}\right) = \frac{d(M/C)}{dL}\,\frac{dL}{dt} \propto \frac{s\,b\,p}{L^b}$
buffer area under global wire assumptions
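[figure: a wire of length l divided into n sections, each with resistance $r\,l/n$ and capacitance $c\,l/n$, each driven by a buffer of size s with output resistance $r_o/s$ and input capacitance $c_o s$]
$T(l,n,s) = b\,r_o c_o\,n + b\left(\frac{r_o c}{s} + r\,c_o s\right) l + a\,r c\,\frac{l^2}{n}$
$\frac{\partial T}{\partial s} = b\left(r\,c_o - \frac{r_o c}{s^2}\right) l = 0 \;\Rightarrow\; s_{opt} = \sqrt{\frac{r_o c}{r\,c_o}}$
$\frac{\partial T}{\partial n} = b\,r_o c_o - a\,r c\,\frac{l^2}{n^2} = 0 \;\Rightarrow\; l_{crit} = \frac{l}{n_{opt}} = \sqrt{\frac{b\,r_o c_o}{a\,r c}}$
$\frac{s_{opt}}{l_{crit}} = \sqrt{\frac{r_o c}{r\,c_o}}\,\sqrt{\frac{a\,r c}{b\,r_o c_o}} = \frac{c}{c_o}\sqrt{\frac{a}{b}}$
note: buffer area is independent of wire resistance
buffer area $= \frac{s_{opt}}{l_{crit}}\; N_I \int_{l_{crit}}^{l_{max}} l\,P(l)\,dl$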
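A minimal numerical sketch of the optimal buffering result, assuming the delay model T(l, n, s) = b·ro·co·n + b·(ro·c/s + r·co·s)·l + a·r·c·l²/n. The coefficients a = 0.4 and b = 0.7 and all technology values are hypothetical, and n is treated as a continuous quantity. The sketch evaluates the critical length and the optimal buffer size, and shows that the delay of an optimally buffered line is linear in its length.

```python
import math


def line_delay(l, n, s, r, c, ro, co, a=0.4, b=0.7):
    """Delay of a line of length l split into n sections with buffers of size s."""
    return b * ro * co * n + b * (ro * c / s + r * co * s) * l + a * r * c * l * l / n


def critical_length(r, c, ro, co, a=0.4, b=0.7):
    """Section length l / n_opt that minimizes the delay; independent of s."""
    return math.sqrt(b * ro * co / (a * r * c))


def optimal_size(r, c, ro, co):
    """Buffer size that minimizes the delay; independent of l and n."""
    return math.sqrt(ro * c / (r * co))


def optimal_delay(l, r, c, ro, co, a=0.4, b=0.7):
    """Delay of the optimally buffered line of length l (n continuous)."""
    l_crit = critical_length(r, c, ro, co, a, b)
    s_opt = optimal_size(r, c, ro, co)
    return line_delay(l, l / l_crit, s_opt, r, c, ro, co, a, b)


# Hypothetical values: ohm per um, farad per um, ohm, farad, length in um.
R, C, RO, CO = 0.1, 0.2e-15, 1.0e4, 1.0e-15
LENGTH = 10000.0

l_crit = critical_length(R, C, RO, CO)
s_opt = optimal_size(R, C, RO, CO)
t_opt = optimal_delay(LENGTH, R, C, RO, CO)
# Closed form obtained by substituting the optima into T(l, n, s).
t_closed = 2.0 * LENGTH * math.sqrt(R * C * RO * CO) * (math.sqrt(0.4 * 0.7) + 0.7)
```

Doubling `LENGTH` doubles `t_opt`, which is the linearity that makes path delay independent of where the restoring circuits sit on the path.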
wire length distribution
P(l), the wire length distribution, is usually obtained
by requiring that rent's rule must be satisfied
• donath-feuer: pareto-levy distribution, $P(l) = g\,l^{2r-3}$
• sastry-parker: weibull distribution, $P(l) = g\,r\,l^{r-1} \exp(-g\,l^{r})$
• davis-de-meindl: explicit (long) formulas separate for two regions
buffer area $= \frac{s_{opt}}{l_{crit}}\; N_I \int_{l_{crit}}^{l_{max}} l\,P(l)\,dl$
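For the donath-feuer distribution the integral in the buffer area expression has a closed form. The sketch below assumes P(l) = g·l^(2r-3) with a hypothetical constant g, hypothetical integration bounds, and a hypothetical interconnect count; it compares the closed form with a midpoint-rule integration and then scales the result by s_opt/l_crit and N_I.

```python
import math


def donath(g, r):
    """Return the distribution P(l) = g * l**(2r - 3) as a function of l."""
    return lambda l: g * l ** (2.0 * r - 3.0)


def first_moment_closed_form(g, r, l_crit, l_max):
    """Integral of l * P(l) from l_crit to l_max for the donath distribution."""
    k = 2.0 * r - 1.0
    if abs(k) < 1e-12:
        # r = 0.5: the integrand is g / l, which integrates to a logarithm.
        return g * math.log(l_max / l_crit)
    return g * (l_max ** k - l_crit ** k) / k


def first_moment_numeric(pdf, l_crit, l_max, steps=20000):
    """Midpoint-rule approximation of the integral of l * P(l)."""
    h = (l_max - l_crit) / steps
    total = 0.0
    for i in range(steps):
        l = l_crit + (i + 0.5) * h
        total += l * pdf(l)
    return h * total


def buffer_area(size_per_length, n_interconnects, moment):
    """(s_opt / l_crit) * N_I * integral, as in the expression above."""
    return size_per_length * n_interconnects * moment


# Hypothetical values: rent exponent 0.63, lengths in units of feature size.
G, RENT, L_CRIT, L_MAX = 1.0, 0.63, 10.0, 1000.0
closed = first_moment_closed_form(G, RENT, L_CRIT, L_MAX)
numeric = first_moment_numeric(donath(G, RENT), L_CRIT, L_MAX)
area = buffer_area(0.15, 1.0e6, closed)
```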
relative buffer area
[plots: total buffer area / die area versus L (5·10^-2 to 2·10^-1 µm), once on a linear scale (0.05 to 0.5) and once on a logarithmic scale (10^-3 to 10^-1), for r = 0.45, r = 0.55, r = 0.63 and r = 0.75]
using formulae of davis-de-meindl
buffer area $= \frac{s_{opt}}{l_{crit}}\; N_I \int_{l_{crit}}^{l_{max}} l\,P(l)\,dl$
are wires plannable?

iteration free synthesis
size assignment to achieve timing

resistive interconnect and guiding synthesis
wire planning

delay prediction is needed and should enable wire planning
optimally buffered global interconnect

the memory share of a balanced processor
chip area will increase very fast with scaling
new architectures
optimal buffering
forces almost all functionality from a single layer chip
new technologies

multilayer integration
vertical integration
growing: seeding, layer growth, recrystallization
main disadvantage: early layers have to go through many cycles
the true 3D integration: already tried before 1980
stacking: sidewall metallization, film transfer
main disadvantage: poor alignment of inter-layer via's
benefits

global interconnect length considerably reduced

folding datapaths over layers and determining optimum
crossing points can shorten cycle time

much smaller total footprint for the same functionality

different technologies for different layers are feasible
why not fully exploited today ?

industry sustained its miraculous growth up to now without it

technological feasibility for vlsi only shown recently

economical feasibility not yet proven

virtually no adequate cad-support

no design experience with multilayer integration
possible layer dedication
[figure: stack of four Si layers, separated by polyimide and SiO2/Al]
• buffers, optical receivers, i/o: optical clock receivers, line repeaters, regular i/o [Otten, 1980]
• processor, first level cache: processors (the main heat source), first level memory
• second level cache interfaces: second level cache for performance improvement
• advanced memory technology: high density advanced memory technology
[M.B. Kleiner, S.A. Kühn, P. Ramm, W. Weber, 1995]
thermal analysis
[plot: temperature increase (0.5 to 1.5 ºC) against position in the layer stack (250 to 400 µm); layers: buffers, optical receivers, i/o; processor, first level cache; second level cache interfaces; advanced memory technology; Si layers separated by polyimide and SiO2/Al]
[M.B. Kleiner, S.A. Kühn, P. Ramm, W. Weber, 1995]
are wires plannable?

iteration free synthesis
size assignment to achieve timing

resistive interconnect and guiding synthesis
wire planning

delay prediction is needed and should enable wire planning
optimally buffered global interconnect

the memory share of a balanced processor
chip area will increase very fast with scaling
new architectures
optimal buffering
forces almost all functionality from a single layer chip
new technologies
multilayer integration
new theories
may ease all of the above


today we are far from plannable wiring!