Transcript timing.ppt

EECS 219B
Spring 2003
Timing Optimization
Andreas Kuehlmann
1
Restructuring for Timing Optimization
Outline:
• Definitions and problem statement
• Overview of techniques (motivated by adders)
– Tree height reduction (THR)
– Generalized bypass transform (GBX)
– Generalized select transform (GST)
– Partial collapsing
2
Timing Optimization
Factors determining delay of circuit:
• Underlying circuit technology
– Circuit type (e.g. domino, static CMOS, etc.)
– Gate type
– Gate size
• Logical structure of circuit
– Length of computation paths
– False paths
– Buffering
• Parasitics
– Wire loads
– Layout
3
Problem Statement
Given:
• Initial circuit function description
• Library of primitive functions
• Performance constraints (arrival/required times)
Generate:
an implementation of the circuit using the primitive functions, such
that:
– performance constraints are met
– circuit area is minimized
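The arrival/required-time constraints above can be made concrete with a small sketch. It assumes a combinational DAG given as a fanin map with one fixed delay per gate; the function name `timing` and its argument names are hypothetical, chosen only for illustration.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def timing(fanins, delay, arrival_in, required_out):
    """Return (arrival, required, slack) dicts for a combinational DAG.

    fanins:       gate -> list of fanin nodes (primary inputs have no entry)
    delay:        gate -> gate delay (assumed fixed, load independent)
    arrival_in:   primary input -> arrival time
    required_out: primary output -> required time
    """
    order = list(TopologicalSorter(fanins).static_order())  # fanins come first

    # Forward pass: latest arrival at each node.
    arrival = {}
    for n in order:
        ins = fanins.get(n, [])
        if ins:
            arrival[n] = max(arrival[i] for i in ins) + delay[n]
        else:
            arrival[n] = arrival_in[n]

    # Backward pass: earliest required time at each node.
    required = {n: float("inf") for n in order}
    required.update(required_out)
    for n in reversed(order):
        for i in fanins.get(n, []):
            required[i] = min(required[i], required[n] - delay[n])

    slack = {n: required[n] - arrival[n] for n in order}
    return arrival, required, slack


# Hypothetical example: y = g2(g1(a, b), c), unit gate delays, y required at t = 2.
arrival, required, slack = timing(
    fanins={"x": ["a", "b"], "y": ["x", "c"]},
    delay={"x": 1, "y": 1},
    arrival_in={"a": 0, "b": 0, "c": 0},
    required_out={"y": 2},
)
```

A performance constraint is met when every slack is non-negative; nodes with the smallest slack form the critical region that the restructuring techniques target.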
4
Current Design Process
Behavioral description
-> Behavior optimization (scheduling)
-> Logic and latches
-> Partitioning (retiming)
-> Logic equations
-> Logic synthesis
   • Technology independent
   • Technology mapping
   Inputs to logic synthesis:
   • Gate library
   • Perf. constraints
   • Delay models
-> Gate netlist
-> Timing driven place and route
-> Layout
5
Technology Mapping for Delay
[Figure: a function tree and a buffer tree]
6
Overview of Solutions for Delay
• Circuit re-structuring
– Rescheduling operations to reduce time of computation
• Implementation of function trees (technology mapping)
– Selection of gates from library
  • Minimum delay (load independent model - Kukimoto)
  • Minimize delay and area (Jongeneel, DAC’00)
    (combines Lehman-Watanabe and Kukimoto)
• Implementation of buffer trees
– Touati (LT-trees)
– Singh
• Resizing
• Constant delay synthesis
7
Circuit Restructuring
Approaches:
Local:
• Mimic optimization techniques in adders
– Carry lookahead (THR tree height reduction)
– Conditional sum (GST transformation)
– Carry bypass (GBX transformation)
Global:
• Reduce depth of entire circuit
– Partial collapsing
– Boolean simplification
8
Restructuring Methods
Performance measured by
• levels,
• sensitizable paths,
• technology dependent delays
• Level based optimizations:
– Tree height reduction (Singh ‘88)
– Partial collapsing and simplification (Touati ‘91)
– Generalized select transform (Berman ‘90)
• Sensitizable paths
– Generalized bypass transform (McGeer ‘91)
9
Tree-Height Reduction (THR)
Singh’88:
[Figure: example network shown twice. Primary inputs a, b, c, d, e, f, g feed nodes h, i, j, k, l, m, n, each annotated with a numeric timing value. Labels: "Critical region", "Collapsed critical region" (new node n’), "Duplicated logic".]
10
Tree-Height Reduction
[Figure: two views of the example network with primary inputs a, b, c, d, e, f, g and nodes h, i, j, k, m, n’, each annotated with a numeric timing value. Labels: "Collapsed critical region" (n’), "Duplicated logic". Annotation: New delay = 5.]
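As a rough illustration of why re-decomposing a collapsed region helps, the sketch below treats the simplest case only: one associative 2-input operation, unit gate delay, and known input arrival times. It is not Singh's algorithm (which collapses and re-decomposes general logic); the function names are hypothetical, and combining the two earliest signals first is used here as a simple greedy rule.

```python
import heapq


def chain_arrival(arrivals, gate_delay=1):
    """Output arrival time when signals are combined in a linear chain, in order."""
    t = arrivals[0]
    for a in arrivals[1:]:
        t = max(t, a) + gate_delay
    return t


def reduced_arrival(arrivals, gate_delay=1):
    """Output arrival time when the two earliest signals are always combined first."""
    heap = list(arrivals)
    heapq.heapify(heap)
    while len(heap) > 1:
        t1 = heapq.heappop(heap)
        t2 = heapq.heappop(heap)
        heapq.heappush(heap, max(t1, t2) + gate_delay)
    return heap[0]


# Eight inputs arriving at t = 0: a chain needs 7 levels, a balanced tree 3.
print(chain_arrival([0] * 8), reduced_arrival([0] * 8))

# One late input (t = 3) at the start of the chain versus near the root of the tree.
print(chain_arrival([3, 0, 0, 0, 0]), reduced_arrival([3, 0, 0, 0, 0]))
```

The second case shows the effect of arrival times: the late signal ends up close to the output instead of at the bottom of the chain.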
11
Generalized bypass transform (GBX)
• Make critical path false
– Speed up the circuit
• Bypass logic of critical path(s)
McGeer’91:
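[Figure: path fm = f, fm+1, …, fn = g shown before and after the transform. After the transform a multiplexer with inputs labeled 0 and 1 produces the new output g’; its control input is the Boolean difference dg/df. Annotation on the control input: s-a-0 redundant.]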
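The bypass idea can be checked on the adder that motivates it. The sketch below assumes a ripple-carry chain with carry-in `c_in` as the late signal f and carry-out as g; the helper names are hypothetical. The Boolean difference is computed as the XOR of the two cofactors, which for this chain reduces to the AND of all propagate bits. Because the carry-out is monotone in `c_in`, no inversion is needed on the bypass in this particular example.

```python
from itertools import product


def ripple_carry(c_in, a, b):
    """Carry-out of a ripple-carry chain; a and b are equal-length bit tuples."""
    c = c_in
    for ai, bi in zip(a, b):
        c = (ai & bi) | ((ai ^ bi) & c)  # generate OR (propagate AND carry)
    return c


def carry_difference(a, b):
    """Boolean difference d(c_out)/d(c_in): XOR of the cofactors c_in=0 and c_in=1."""
    return ripple_carry(0, a, b) ^ ripple_carry(1, a, b)


def bypassed_carry(c_in, a, b):
    """Multiplexer: pass c_in straight through when c_out is sensitive to it."""
    if carry_difference(a, b):
        return c_in                     # bypass: the long path is skipped
    return ripple_carry(c_in, a, b)     # c_in does not matter on this branch


def check(n=4):
    """Exhaustively compare the bypassed circuit with the original for n-bit operands."""
    for a in product((0, 1), repeat=n):
        for b in product((0, 1), repeat=n):
            all_propagate = int(all(ai ^ bi for ai, bi in zip(a, b)))
            if carry_difference(a, b) != all_propagate:
                return False
            for c_in in (0, 1):
                if bypassed_carry(c_in, a, b) != ripple_carry(c_in, a, b):
                    return False
    return True
```

Functionally nothing changes; the gain is that the long c_in-to-c_out path is never the one that determines the output, which is what makes it false.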
12
GBX and KMS transform
GBX gives little area increase, BUT creates an untestable fault
(on control input to multiplexer)
KMS transform: (remove false paths without increasing delay)
• fk is last node on false path that fans out.
• Duplicate false path {f1,…, fk} -> {f’1, … , f’k}
• f’j fans out to every fanout of fj except fj+1, and fj just fans out to fj+1
• Set the f0 input of f1 to a controlling value and propagate the constant
(possible because the path is false and does not fan out)
KMS results
– Function of every node, except f1, … ,fk is unchanged
– Added k nodes
– Area added is linear in the length of the false paths; in practice
the area increase is small.
13
KMS
Keutzer, Malik, Saldanha’90:
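[Figure: before and after the KMS transform. The path segment fm+1, fm+2, …, fm+k is duplicated as f’m+1, f’m+2, …, f’m+k, a constant 0 is applied at the head of the path, and the logic continues through fm+k+1, …, fn. Annotation: delay is not increased.]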
14
Generalized select transform (GST)
Berman’90: Late signal feeds multiplexor
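[Figure: original logic with inputs a, b, c, d, e, f, g and output out, where a is the late signal. After the transform, two copies of the logic on b, c, d, e, f, g, one evaluated with a=0 and one with a=1, feed multiplexer inputs 0 and 1; a drives the multiplexer select and the multiplexer produces out.]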
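The select transform is a Shannon expansion around the late signal, which the following sketch checks exhaustively. The 7-input function `example_logic` is a made-up stand-in (the gates in the slide's figure are not recoverable from the transcript), and `select_transform` is a hypothetical helper name.

```python
from itertools import product


def example_logic(a, b, c, d, e, f, g):
    """Hypothetical chain of gates in which input a has the longest path to out."""
    return (((((a & b) | c) & d) | e) & f) | g


def select_transform(func, late_index):
    """Build an equivalent function as a mux between the two cofactors of func."""
    def transformed(*bits):
        early = list(bits)
        early[late_index] = 0
        out0 = func(*early)          # copy of the logic with late input = 0
        early[late_index] = 1
        out1 = func(*early)          # copy of the logic with late input = 1
        return out1 if bits[late_index] else out0  # late signal only drives the mux
    return transformed


gst_logic = select_transform(example_logic, late_index=0)
equivalent = all(
    gst_logic(*bits) == example_logic(*bits)
    for bits in product((0, 1), repeat=7)
)
```

Both copies can be evaluated before a arrives, so after the transform the late signal passes through the multiplexer only. Applied to the carry-in of an adder block, this is the carry-select structure.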
15
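GST vs GBX
[Figure: the same example, with late input a, remaining inputs b, c, d, e, f, g and output h, transformed both ways. GBX panel: a multiplexer with inputs labeled 0 and 1 produces the new output g’, and its control input is the Boolean difference dh/da. GST panel: two copies of the logic, one with a=0 and one with a=1, feed multiplexer inputs 0 and 1, and a drives the select to produce out.]
Note: Boolean difference $\frac{dh}{da} = h_a \oplus h_{\bar{a}}$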
16
GST vs GBX
• Select transform appears to be more area efficient
• But the Boolean difference is generally formed more efficiently in
practice
• No delay/speedup advantage for either transform
• Can reuse parts of the critical paths for multiple fanouts on GST
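[Figure: GST with multiple fanouts. Copies of the logic on inputs b, c, d, e, f, g, evaluated with a=0 and with a=1, feed the 0 and 1 inputs of two multiplexers selected by a, producing out1 and out2.]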
17
Technology Independent Delay Reductions
Generally THR, GBX, GST (critical path based methods) work OK,
– but very greedy and computationally expensive
Why are technology independent delay reductions hard?
Lack of fast and accurate delay models (listed from fastest and crudest to best and slowest):
– # levels: fast but crude
– # levels + correction term (fanout, wires,… ): a little better,
but still crude (what coefficients to use?)
– Technology mapped: reasonable, but very slow
– Place and route: better but extremely slow
– Silicon: best, but infeasibly slow (except for FPGAs)
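To make the first two models concrete, here is a minimal sketch of a level count with an optional fanout correction. The form of the correction (a constant times the number of fanouts) and the coefficient value are assumptions for illustration only, which is exactly the "what coefficients to use?" problem; `level_delay` is a hypothetical name.

```python
def level_delay(fanins, fanout_coeff=0.0):
    """Longest-path estimate where each gate costs 1 + fanout_coeff * fanout count.

    fanins: gate -> list of fanin nodes (primary inputs have no entry).
    With fanout_coeff = 0 this is the plain "# levels" model.
    """
    # Count how many gates each node drives.
    fanout = {}
    for ins in fanins.values():
        for i in ins:
            fanout[i] = fanout.get(i, 0) + 1

    memo = {}

    def arrival(n):
        if n not in memo:
            ins = fanins.get(n, [])
            if not ins:
                memo[n] = 0.0  # primary input
            else:
                cost = 1 + fanout_coeff * fanout.get(n, 0)
                memo[n] = max(arrival(i) for i in ins) + cost
        return memo[n]

    return max(arrival(n) for n in fanins)


# Hypothetical network: x drives both y and z.
net = {"x": ["a", "b"], "y": ["x", "c"], "z": ["x", "d"]}
levels_only = level_delay(net)                       # 2 levels
with_fanout = level_delay(net, fanout_coeff=0.5)     # x is penalized for fanout 2
```

Both estimates are cheap to compute; neither sees gate sizes or wire loads, which is why they remain crude.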
18
Conclusions
• Variety of methods for delay optimization
– No single technique dominates
– When applied to a ripple-carry adder, the result is:
– Carry-lookahead adder (THR)
– Carry-bypass adder (GBX)
– Carry-select adder (GST)
– Clustering/Partial collapse
• All techniques ignore false paths when assessing the delay and
critical regions
– Can use KMS transform to eliminate false paths without
increasing delay (area increase however).
19