A Highly Testable Pass Transistor Based Structured ASIC Design Methodology


Kanupriya Gulati
Nikhil Jayakumar
Sunil P. Khatri
Motivation for Structured ASICs
Process (microns)   Single Mask Cost ($K)   # of Masks   Mask Set Cost ($K)
2.0                 1.5                     12           18
0.8                 1.5                     12           18
0.6                 2.5                     12           30
0.35                4.5                     16           72
0.25                7.5                     20           150
0.18                12                      26           312
0.13                40                      30           1000
0.1                 60                      34           2000


A full set of lithography masks can cost between $1-3M.
ASIC design starts have fallen by roughly 25% in the past 7 years. [Sematech Annual Report 2002], [A. Sangiovanni-Vincentelli, “The Tides of EDA”, keynote talk, DAC 2003].
Our Solution

Use a regular array of pass transistor logic based if-then-else (ITE) cells, with flip-flops along the edges of the die, as the underlying circuit structure.
  Stock such arrays pre-processed up until the metallization step.
  Or, use previously generated masks for all other layers and use new masks for only the METAL and VIA layers.
To create an ASIC for a given design, technology-map this design to the smallest available array.
  Only METAL and VIA masks require changes.
Advantages

Can share masks for several layers.
  Reduces NRE.
No need for the designer to worry about DFM issues.
  Improved yield.
New designs can be implemented faster.
Task of engineering change simplified: a design modification requires only METAL and VIA mask changes.
Generating test patterns for such a design is easy.
  100% test coverage in time linear in the size of the network.
  No redundant faults in the design.
The Gap between FPGA and ASIC
FPGA
  Low speed
  High power
  Cost-effective for low volume products
ASIC
  High speed
  Low power
  Cost-effective for high volume products
  Necessary for products requiring high performance or low power.
What bridges the gap?
Taxonomy of Regular Logic Fabrics


As we move further away from standard cell (ASIC), we lose
  Area
  Speed
  Power
As we move closer to FPGAs, we gain
  Flexibility
  Lower NRE
[Figure: spectrum of regular logic fabrics between standard cell and FPGA, with "Our Approach" marked]
“Exploring Regular Fabrics to Optimize the Performance-Cost Trade-off”, L. Pileggi et al.
Overview



Convert a logic netlist to a partitioned Reduced Ordered Binary Decision Diagram (ROBDD).
Each ROBDD node is implemented as an ITE cell.
Place these ITE cells in an area and delay efficient manner on a pre-fabricated array of ITE cells.
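Each ROBDD node computes "if i then T else E", which is the function of one ITE cell. The sketch below is only an illustration of that correspondence: the `ITE` record and the `evaluate` helper are hypothetical names, not part of the flow described in these slides, and the diagram for f = x1 AND x2 is built by hand.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class ITE:
    """One ROBDD node, modeled as one ITE cell: out = T if i else E."""
    var: str                      # select variable i
    then: Union["ITE", bool]      # T input
    else_: Union["ITE", bool]     # E input


def evaluate(node: Union[ITE, bool], assignment: Dict[str, bool]) -> bool:
    """Follow the T or E branch at each cell until a constant is reached."""
    while isinstance(node, ITE):
        node = node.then if assignment[node.var] else node.else_
    return node


# f = x1 AND x2 as two chained ITE cells (hand-built example)
f = ITE("x1", ITE("x2", True, False), False)
```

Calling `evaluate(f, {"x1": True, "x2": True})` follows the T branch twice and reaches the constant true terminal; any assignment with x1 false stops after a single cell.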
ITE Cell Structure
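[Figure: ITE cell schematic with select input i and its complement, data inputs T and E, and outputs out and its complement]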
Used an NMOS pass-gate based structure.
Each ITE cell generates a buffered output and its complement.
The NMOS pass-gate ITE cell was found to have a delay similar to that of a CMOS pass-gate based ITE cell, with a smaller area.
  Probably due to the increased diffusion capacitance in CMOS pass-gates.
ITE Cell Design

[Figure: ITE cell layout between the VDD and GND rails, with horizontal wires for i1, i2, i3 and their complements]
MUX control signals run along the length of the cell.
Each ITE cell has 3 variable signals and 3 complemented variable signals running horizontally in metal 3.
Appropriate placement of stacked vias at the horizontal metal 3 wires allows the ITE cell to be connected to any one of the 3 variables in the corresponding row of the array.
Metal layers 1 and 2 are used for most of the layout; metal layer 3 is used to route variables and their complements.
Synthesis – Partitioned ROBDD


Synthesis of the logic netlist into a partitioned ROBDD structure is done in VIS.
Primary input variables are ordered using a DFS ordering.
Enable dynamic variable ordering before building ROBDDs.
Do bottom-up construction of ROBDDs.
  Let the set of variables in the ROBDD manager be V (initially the PIs).
  If the size of any ROBDD exceeds a user-specified threshold B:
    Introduce a new variable v (an intermediate ROBDD variable) and continue building ROBDDs on the set of variables V ∪ {v}.
Results in a series of ROBDDs.
  Size of each ROBDD is bounded by B.
  The output of each of these ROBDDs represents either a primary output or an intermediate ROBDD variable.
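The threshold-based partitioning can be illustrated with a toy ROBDD manager. This is a sketch under stated assumptions, not the VIS implementation: the `BDD` class, `build_partitioned`, the two-input gate format, and the policy of cutting the fanins of the gate that crosses the threshold are all hypothetical choices, and dynamic variable reordering is not modeled. New intermediate variables are simply appended to the bottom of the variable order.

```python
class BDD:
    """Toy ROBDD manager; node ids 0 and 1 are the terminals."""

    def __init__(self):
        self.nodes = {}    # node id -> (level, low, high)
        self.unique = {}   # (level, low, high) -> node id
        self.names = []    # variable name at each level

    def add_var(self, name):
        self.names.append(name)            # new variables go to the bottom
        return self.mk(len(self.names) - 1, 0, 1)

    def mk(self, level, low, high):
        if low == high:                    # reduction: drop redundant tests
            return low
        key = (level, low, high)
        if key not in self.unique:         # reduction: share identical nodes
            self.unique[key] = len(self.nodes) + 2
            self.nodes[self.unique[key]] = key
        return self.unique[key]

    def cofactors(self, u, level):
        if u > 1 and self.nodes[u][0] == level:
            return self.nodes[u][1], self.nodes[u][2]
        return u, u

    def apply(self, op, u, v, memo=None):
        memo = {} if memo is None else memo
        if u <= 1 and v <= 1:
            return op(u, v)
        if (u, v) not in memo:
            level = min(self.nodes[w][0] for w in (u, v) if w > 1)
            u0, u1 = self.cofactors(u, level)
            v0, v1 = self.cofactors(v, level)
            memo[(u, v)] = self.mk(level, self.apply(op, u0, v0, memo),
                                   self.apply(op, u1, v1, memo))
        return memo[(u, v)]

    def size(self, u, seen=None):
        """Number of non-terminal nodes reachable from u."""
        seen = set() if seen is None else seen
        if u > 1 and u not in seen:
            seen.add(u)
            self.size(self.nodes[u][1], seen)
            self.size(self.nodes[u][2], seen)
        return len(seen)


def build_partitioned(pis, gates, outputs, B):
    """gates: (out, op, in_a, in_b) in topological order. Returns (bdd, partitions)."""
    bdd = BDD()
    f = {x: bdd.add_var(x) for x in pis}
    variables = set(pis)                   # the set V
    partitions = {}                        # cut variable or PO name -> ROBDD root
    for out, op, a, b in gates:
        root = bdd.apply(op, f[a], f[b])
        if bdd.size(root) > B:             # threshold exceeded: cut the fanins
            for fanin in (a, b):
                if fanin not in variables:
                    partitions[fanin] = f[fanin]
                    f[fanin] = bdd.add_var(fanin)   # new intermediate variable v
                    variables.add(fanin)
            root = bdd.apply(op, f[a], f[b])
        f[out] = root
    partitions.update({po: f[po] for po in outputs})
    return bdd, partitions


AND, OR, XOR = (lambda a, b: a & b), (lambda a, b: a | b), (lambda a, b: a ^ b)
gates = [("y1", AND, "x1", "x2"), ("y2", XOR, "x3", "x4"), ("z", OR, "y1", "y2")]
bdd, parts = build_partitioned(["x1", "x2", "x3", "x4"], gates, ["z"], B=3)
```

With B = 3 in this hypothetical netlist, building z directly over the primary inputs would need 5 nodes, so y1 and y2 become intermediate variables and z is rebuilt as a 2-node ROBDD over them.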
Example
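[Figure: a multi-level logic network with output z, intermediate nodes y1 and y2, and inputs x1 to x4, shown next to the corresponding partitioned ROBDDs]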
Given a multi-level logic network with primary inputs {x1, x2, x3, x4}.
As bottom-up ROBDD construction proceeds, new variables y1 and y2 are created.
z is built in terms of {y1, y2}.
Placement

First, replicate ITE cells whose outputs are heavily loaded, in order to limit fanout.
  These correspond to ROBDD nodes with high in-degrees.
  If the in-degree of an ROBDD node is k, then replicate this node $\lceil k/K \rceil - 1$ times.
  We use K = 3.
Compute an initial estimate of the number of ITE cells n in any row of the ITE array and the number of rows m of the ITE array as follows:
  $x\,n = y\,m$
  $n\,m = N$
where x = width of each ITE cell
      y = height of each ITE cell
      N = total number of ITE cells
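A short worked sketch of these two estimates follows. It assumes the replication count is ceil(k/K) - 1 and reads the two array relations as x·n = y·m (a square array) and n·m = N, which gives n = sqrt(N·y/x); the function names and the cell dimensions in the example are hypothetical.

```python
import math


def replicas(k, K=3):
    """Extra copies needed so that no copy of the node drives more than K loads."""
    return max(0, math.ceil(k / K) - 1)


def array_dims(N, x, y):
    """Initial (cells per row n, rows m) for N cells of width x and height y."""
    n = max(1, round(math.sqrt(N * y / x)))   # from x*n = y*m and n*m = N
    m = math.ceil(N / n)                      # enough rows to hold all N cells
    return n, m


# Hypothetical cell of width 2 and height 8 (arbitrary units), 100 cells in total
n, m = array_dims(100, 2.0, 8.0)
```

For this hypothetical cell the estimate is 20 cells per row and 5 rows, so the array is 40 units wide and 40 units tall.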
Placement

Sort the N ITE cells in increasing order of their ROBDD variable index.
  Variable index is a measure of the closeness of a variable to the root of the ROBDD.
  A variable closer to the root has a smaller index than one further from the root.
Assign ITE cells to rows of the ITE array.
Assigning ITE cells to rows

If there are $n_j$ ITE cells with variable index $v_j$ such that $n_j > n$ (n = number of ITE cells that can fit in one row):
  The ITE cells need to span $\lceil n_j / n \rceil$ rows.
  Sort these $n_j$ cells in decreasing order of cost C:
    $C = \sum_{i,j} [\mathrm{index}(c) - \mathrm{index}(c_j)] - [\mathrm{index}(c_i) - \mathrm{index}(c)]$
    $c_i$ = children of node c
    $c_j$ = parents of node c
  Helps keep routes short.
[Figure: ROBDD nodes a and b drawn among levels 2 to 6, annotated Cost(a) = 5 - 2 = 3 and Cost(b) = 3 - 3 = 0]
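The cost can be sketched as follows, assuming it is the summed index distance to the parents minus the summed index distance to the children. The small graph is hypothetical; it was chosen only so that the two costs reproduce the values annotated in the figure (3 for a, 0 for b).

```python
def cost(c, index, parents, children):
    """Summed distance to parents minus summed distance to children."""
    up = sum(index[c] - index[p] for p in parents.get(c, ()))
    down = sum(index[k] - index[c] for k in children.get(c, ()))
    return up - down


# Hypothetical nodes: a and b sit at level 4
index = {"p2": 2, "q2": 2, "p3": 3, "a": 4, "b": 4, "c5": 5, "c6": 6}
parents = {"a": ["p2", "q2", "p3"], "b": ["p2", "p3"]}
children = {"a": ["c6"], "b": ["c5", "c6"]}

# Cells with the larger cost (pulled harder toward the root) come first
ordered = sorted(["b", "a"],
                 key=lambda c: cost(c, index, parents, children),
                 reverse=True)
```

Under this reading, a cell whose parents are far above it and whose children are close below it gets a high cost and is placed in the earlier of the rows it could occupy.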
Assigning ITE cells to rows

If there are $n_j$ ITE cells with variable index $v_j$ such that $n_j < n$:
  Attempt to populate the corresponding row of the ITE array with additional ITE cells with variable index $v_{j+1}$.
  If the row is still not full, add ITE cells with variable index $v_{j+2}$ as well.
  Each row can hold ITE cells which depend on at most 3 variables, since the number of variables that can be routed over any ITE cell is 3.
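The row-filling rules can be sketched as a simple greedy pass. This is an assumption-laden simplification: `assign_rows` is a hypothetical name, cells are taken in increasing variable index, a new row is started when the current row is full or would need a fourth variable, and the cost-based ordering within a variable index is ignored.

```python
from collections import defaultdict


def assign_rows(cells, n, max_vars=3):
    """cells: list of (name, variable_index). Returns a list of rows of cell names."""
    by_index = defaultdict(list)
    for name, idx in cells:
        by_index[idx].append(name)

    rows, current, current_vars = [], [], set()
    for idx in sorted(by_index):                    # increasing variable index
        for name in by_index[idx]:
            needs_new_var = idx not in current_vars
            row_full = len(current) == n
            too_many_vars = needs_new_var and len(current_vars) == max_vars
            if row_full or too_many_vars:           # close this row, start the next
                rows.append(current)
                current, current_vars = [], set()
            current.append(name)
            current_vars.add(idx)
    if current:
        rows.append(current)
    return rows


# Hypothetical example: five cells on variable 1, one cell each on variables 2 to 5
cells = [(f"a{i}", 1) for i in range(1, 6)] + [("b", 2), ("c", 3), ("d", 4), ("e", 5)]
rows = assign_rows(cells, n=4)
```

Here the five cells on variable 1 spill into a second row, which is then topped up with cells on variables 2 and 3; the cell on variable 4 starts a third row because the second already carries 3 variables.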

Placement of ITE cells within rows




ITE cells are arranged within rows to reduce crossings in the induced circuit graph (after planarization of the array of ITE cells).
Use DOT (graphviz.org) to do this.
DOT only re-arranges cells in each ITE row in a manner that minimizes graph crossings.
DOT is not allowed to modify the assignment of ITE cells to rows.
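One way to hand a fixed row assignment to DOT is sketched below. The actual DOT input used in this flow is not shown in the slides, so this is an assumption: each row becomes a `rank=same` group, invisible edges between consecutive rows keep the rows in order, and DOT is left free to choose the left-to-right order within each rank. The helper name `rows_to_dot` is hypothetical and every row is assumed to be non-empty.

```python
def rows_to_dot(rows, edges):
    """Emit DOT text that pins each cell to its row but not to a column."""
    lines = ["digraph ite_array {"]
    for row in rows:
        members = " ".join(f'"{cell}";' for cell in row)
        lines.append(f"  {{ rank=same; {members} }}")
    # Invisible edges keep consecutive rows on consecutive ranks
    for upper, lower in zip(rows, rows[1:]):
        lines.append(f'  "{upper[0]}" -> "{lower[0]}" [style=invis];')
    for src, dst in edges:
        lines.append(f'  "{src}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines)


dot_text = rows_to_dot([["a", "b"], ["c", "d"]],
                       [("a", "d"), ("b", "c")])
```

Saving `dot_text` to a file and running `dot -Tplain` on it prints node coordinates; the x coordinates within a rank then give a left-to-right cell order for that row.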
Implementing Sequential Designs


Each row of ITE cells has a bank of 3 flip-flops.
Outputs of the flops can drive one of the inputs by means of a METAL and VIA mask change.
Route


Use WROUTE (in Cadence’s Silicon Ensemble for DSM) to route the ITE cell array.
Use 4 metal layers for the route.
Example: alu2
Summary of Design Flow





Convert netlist to partitioned ROBDD in VIS.
Perform cell replication if required to limit fanout.
Perform ITE cell assignment to rows.
Re-arrange ITE cells within rows using DOT to minimize crossings in the graph induced by the interconnections among the ITE cells.
Use the result of DOT as the final placement and perform routing using WROUTE (or any other routing tool).
Ease of Testability

In traditional scanned standard-cell based circuits
  The ATPG problem is NP-complete.
In our scanned ITE cell based approach
  In functional mode
    Partitioned ROBDD outputs are regular inputs to other partitions.
  In test mode
    Primary inputs and the outputs of each partition are scanned in to allow independent testability of the different partitions.
Abstract View of Partitioned ROBDDs
[Figure: partitioned ROBDDs stacked between the PIs and the PO z, with intermediate variables such as y1 and y2 marked as additional scan-able nodes]
Ease of Testability - Excitation
$T_{x \text{ stuck-at } v} = (x \oplus v) \wedge \left( \frac{\partial f}{\partial x} \right)$
  Path from x to v
  Linear time BDD operation
[Figure: ROBDD of f, showing node x and terminal v]
Ease of Testability - Propagation
$T_{x \text{ stuck-at } v} = (x \oplus v) \wedge \left( \frac{\partial f}{\partial x} \right)$
  Path from f to x
  Again a linear time BDD operation
• Support variables for both conditions are non-overlapping!
• Circuit is guaranteed irredundant.
• 100% stuck fault coverage guaranteed in time linear in the size of the circuit.
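Both conditions come down to finding a path in an ROBDD. The sketch below shows only that path-finding primitive, under stated assumptions: nodes are hypothetical `(var, low, high)` tuples with integer terminals 0 and 1, and the diagram is reduced, so every internal node can reach both terminals and no backtracking is needed. It does not build the full test vector.

```python
def path_to(node, target):
    """Return an assignment that steers the ROBDD from node to terminal target."""
    assignment = {}
    while node not in (0, 1):
        var, low, high = node
        # In a reduced BDD low != high, so at least one branch is not the
        # wrong terminal and still leads to the target.
        if low != 1 - target:
            assignment[var] = 0
            node = low
        else:
            assignment[var] = 1
            node = high
    return assignment if node == target else None


# Hand-built ROBDD for f = x AND y
f = ("x", 0, ("y", 0, 1))
to_one = path_to(f, 1)
to_zero = path_to(f, 0)
```

Each loop iteration descends one level, so the work is bounded by the number of variables on the path, in line with the linear-time operations named above.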
Experiments

To compare with standard-cell based design, the circuits were mapped to a library of 20 gates.
  Used SIS for optimization (script.rugged) and mapping.
  Placement and routing done using SEDSM with a 0.1um process and 4 metal layers.
Delay of standard-cell based designs:
  Pre-characterized the library using SPICE (0.1um BPTM).
  Used the sense package in SIS.
    “sense” returns the longest sensitizable path (false paths are implicitly ignored).
Experiments


Partitioned ROBDD construction done using the “frontier method” in VIS.
Tried the following partitioning thresholds (B):
  5, 10, 15, 20 and 1000.
For each circuit, the result that yielded the smallest number of ROBDD nodes was selected.
This partitioned ROBDD structure was then taken through our design flow.
Experiments

Delay of ITE cell array:
  Found by traversing the longest topological path (in terms of number of ITE cells) between any circuit PI and PO.
  Delay at each ITE cell is given by:
    If the variable is a primary input:
      D(cell) = MAX[ D(leftchild), D(rightchild) ] + D(ITE block)
    If the variable is an internal node:
      D(cell) = MAX[ D(variable), D(leftchild), D(rightchild) ] + D(ITE block)
  D(ITE block) found from SPICE simulations (0.1um BPTM).
    Assumed that the ITE cell drove the maximum load allowed, hence delay estimates are conservative.
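The delay recurrence can be sketched as a memoized recursion. The data layout is an assumption: `cells` maps a cell name to `(variable, left child, right child)` with `None` standing for a constant terminal, `drivers` maps each intermediate variable to the cell that produces it, and the 50.0 delay value is a placeholder rather than a SPICE result.

```python
from functools import lru_cache


def make_delay(cells, drivers, d_ite):
    """Build D(cell) for the given ITE cell graph and per-cell delay d_ite."""

    @lru_cache(maxsize=None)
    def D(cell):
        if cell is None:                       # constant terminal: no delay
            return 0.0
        var, left, right = cells[cell]
        arrivals = [D(left), D(right)]
        if var in drivers:                     # variable is an internal node
            arrivals.append(D(drivers[var]))
        return max(arrivals) + d_ite

    return D


# Hypothetical two-partition example: y1 = f(x1, x2), z = g(y1)
cells = {
    "c2": ("x2", None, None),
    "y1_root": ("x1", None, "c2"),
    "z_root": ("y1", None, None),
}
drivers = {"y1": "y1_root"}
D = make_delay(cells, drivers, d_ite=50.0)
```

In this example `D("z_root")` is three cell delays: two to produce y1 and one more for the cell that y1 controls.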
Results (Combinational designs)



Delay penalty is ~2X.
Area penalty is ~6X.
FPGAs typically have a 25X delay penalty and a 10X area penalty.

Ovh = ITE / StdCell.

Ckt.        Delay StdCell   Delay ITE   Delay Ovh   Area StdCell   Area ITE   Area Ovh
alu2        770             500         0.65        1314.1         2560       1.95
alu4        1020            527         0.52        2500           5068.8     2.03
apex6       500             1310        2.57        2678.1         14585.6    5.45
apex7       440             1030        2.34        885.1          4608       5.21
C1908       880             2590        2.91        1827.6         8288       4.53
C3540       1250            3050        2.44        4323.1         29491.2    6.82
C432        930             3070        3.3         715.6          4640       6.48
C499        600             1070        1.78        1827.6         3974.4     2.17
C880        1210            2750        2.27        1463.1         8985.6     6.14
dalu        1110            2460        2.22        3164.1         39916.8    12.62
frg2        810             1700        2.1         2575.6         24441.6    9.49
i8          880             1560        1.77        4064.1         40320      9.92
i9          850             810         0.95        2383.2         14035.2    5.89
t481        720             600         0.83        2626.6         6080       2.31
term1       320             730         2.28        663.1          2355.2     3.55
too_large   510             1550        3.04        1105.6         10560      9.55
vda         650             600         0.92        1508.03        6080       4.03
x1          380             950         2.5         1105.6         9625.6     8.71
x3          510             1660        3.25        2756.25        16844.8    6.11
x4          440             650         1.48        1314.1         11264      8.57
Avg                                     2.01                                  6.08
Results (Sequential designs)



Delay penalty is ~1.6X.
Area penalty is ~3.4X.
FPGAs typically have a 25X delay penalty and a 10X area penalty.

Ovh = ITE / StdCell.

Ckt.        Delay StdCell   Delay ITE   Delay Ovh   Area StdCell   Area ITE   Area Ovh
s1488       630             650         1.03        3277.6         6240       1.9
s1494       650             600         0.92        3108.1         6400       2.06
s208        270             550         2.04        105.1          1459.2     13.88
s344        390             650         1.67        715.6          2649.6     3.7
s349        410             650         1.59        742.6          2649.6     3.57
s386        290             550         1.9         885.1          2060.8     2.33
s444        380             700         1.84        1105.6         2880       2.6
s510        390             400         1.03        1105.6         3161.6     2.86
s526        330             700         2.12        1314.1         2355.2     1.79
s526n       330             700         2.12        1314.1         2457.6     1.87
s820        560             650         1.16        1827.6         3968       2.17
s832        570             650         1.14        1827.6         3968       2.17
Avg                                     1.55                                  3.41
Speed-up of ATPG
ATPG is about 30X faster for ITE cell based circuits.
ITE based circuits are guaranteed irredundant and 100% testable in linear time!

Improve = Regular ATPG (SIS) / ATPG for ITE.

Ckt         Regular ATPG (SIS)   ATPG for ITE   Improve
C1908       0.78                 0.02           39.00
C3540       4.84                 0.02           242.00
C432        0.1                  0.52           0.19
C499        0.32                 0.01           32.00
C880        0.16                 0.01           16.00
frg2        17.21                0.45           38.24
i8          16.26                0.16           101.63
i9          0.6                  0.03           20.00
apex7       0.05                 0.04           1.25
x3          1.95                 0.19           10.26
apex6       0.94                 0.27           3.48
term1       0.56                 0.02           28.00
alu2        0.3                  0.02           15.00
alu4        1.47                 0.47           3.13
too_large   8.83                 0.41           21.54
vda         3.42                 4.37           0.78
x1          0.26                 0.43           0.60
x4          0.32                 0.28           1.14
Avg.                                            31.90
Conclusions



We have a method that can implement circuits quicker and with NRE amortized over a large number of designs.
Strikes a reasonable compromise between ASICs and FPGAs.
An ITE cell based design is easily testable.



Testability gains arise from the use of the partitioned ROBDD based PTL design approach


100% testable in linear time
Guaranteed irredundant
Same gains can be reaped in a regular PTL design approach
Can be modified to efficiently test for other faults

Delay faults, stuck open faults etc.
Questions ?