Transcript slides

Enhancing Random Access Scan for
Soft Error Tolerance
Fan Wang*
Vishwani D. Agrawal
Department of Electrical and Computer Engineering,
Auburn University, AL 36849
*Now with Juniper Networks, Inc. Sunnyvale, CA, 94086
42nd IEEE Southeastern Symposium on System Theory, March, 2010
1
Motivation for This Work
• Recent work on random access scan (RAS)
has shown its advantages in reducing test
time, test volume and test power over serial
scan (SS).
• The RAS structure can also improve the
fault tolerance ability in both normal
function mode and test mode.
2
Outline
• Background
• Review of RAS design
• Soft error tolerance of RAS
• A new scan-out structure
• Further enhancing error tolerance using
RAS structure
• Conclusion
3
Soft Errors
• Soft errors are caused by the operating
environment.
• They are not due to permanent hardware faults.
• Soft errors are intermittent or random, which
makes their testing unreliable.
• One way to deal with soft errors is to make
hardware robust:
– Capable of detecting soft errors
– Capable of correcting soft errors
– Both measures are probabilistic
4
Effect on Digital Circuit
[Figure: a sequential circuit (IN, flip-flops, combinational logic, flip-flops, OUT, clock CK) exposed to charged-particle strikes.]
M. Nicolaidis (Editor), Soft Errors in Modern Electronic Systems, Springer, 2010.
5
Random Access Scan (RAS)
• Testing requires that flip-flops be controllable
and observable. Two methods are:
– Serial scan (SS) using shift register
– Random access scan (RAS) using memory-like
addressing
• RAS reduces both test application time and test
power, which are otherwise conflicting
objectives in SS.
• Previous and current publications on RAS:
• Ando, COMPCON-80
• Wagner, COMPCON-83
• Ito, DAC-90
• Bushnell & Agrawal, textbook, pp. 484-485
• Mudlapur et al., ITC-05
• Saluja et al., VLSI Design-04, ITC-05, ATS-05, VLSI Design-06,
VLSI Design-10.
6
Background
Error-tolerant computing techniques are
classified by the level at which they provide reliability:
1. Device-level error tolerance techniques
either increase the device critical charge or
decrease the collected charge to reduce the soft error rate (SER).
2. Circuit- or system-level error tolerance
techniques include error detection and
correction (EDAC) codes and time/space
redundancy.
7
BISER Design With C-Element
S. Mitra, N. Seifert, M. Zhang, Q. Shi, and K. S. Kim, “Robust System Design with
Built-In Soft-Error Resilience,” Computer, vol. 38, no. 2, pp. 43-52, February 2005.
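As a rough illustration of why a C-element helps against soft errors, the sketch below models a generic two-input, non-inverting C-element: the output follows the inputs when they agree and holds its previous value when they disagree. The function names and the input sequence are hypothetical, and the actual BISER cell in the cited paper may differ in polarity and in its keeper circuitry.

```python
def c_element(a: int, b: int, prev: int) -> int:
    """Behavioral model of a two-input C-element (assumed non-inverting).

    The output follows the inputs when they agree; otherwise it holds
    the previous output value.
    """
    return a if a == b else prev


def run(inputs, initial=0):
    """Apply a sequence of (a, b) input pairs and record the output."""
    out = initial
    trace = []
    for a, b in inputs:
        out = c_element(a, b, out)
        trace.append(out)
    return trace


# Hypothetical scenario: both copies hold 1, then a particle strike flips
# the second copy to 0 for one step, then both copies move to 0.
trace = run([(1, 1), (1, 0), (1, 1), (0, 0)])
print(trace)  # [1, 1, 1, 0]
```

In the second step the two copies disagree, so the output keeps its earlier value and the single upset does not propagate.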
8
The “Toggle” RAS Flip-Flop
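[Figure: the "toggle" RAS flip-flop (RAS-FF), with a combinational logic data input, multiplexers (MUX), clock, signals x and y, and a connection to the output bus; a row decoder and a column decoder, each driving √nff lines from a log2(nff)-bit address, together with control and the output bus.]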
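The memory-like addressing in the RAS figure (a log2(nff)-bit address, with row and column decoders each driving about √nff lines) can be sketched as follows. This is a minimal illustration only: the function names and the row-major mapping from a flip-flop index to a (row, column) pair are assumptions, not taken from the slides.

```python
import math


def ras_geometry(n_ff: int):
    """Return (lines per decoder, address bits) for n_ff flip-flops.

    Assumes a square grid with ceil(sqrt(n_ff)) lines per decoder.
    """
    if n_ff < 1:
        raise ValueError("n_ff must be at least 1")
    side = math.isqrt(n_ff - 1) + 1          # ceil(sqrt(n_ff))
    address_bits = max(1, (n_ff - 1).bit_length())  # ceil(log2(n_ff)), min 1
    return side, address_bits


def select_lines(index: int, n_ff: int):
    """Map a flip-flop index to (row, column) select lines.

    The row-major mapping is a hypothetical choice for illustration.
    """
    side, _ = ras_geometry(n_ff)
    if not 0 <= index < n_ff:
        raise IndexError("flip-flop index out of range")
    return divmod(index, side)


print(ras_geometry(16))     # (4, 4)
print(select_lines(6, 16))  # (1, 2)
```

Only the flip-flop at the intersection of the active row and column lines is accessed, which is what distinguishes RAS from shifting data through every cell of a serial chain.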
9
Natural Soft Error Tolerance of RAS
• For SS, a soft error can be induced on each SFF as it
transports test data to the output.
• For RAS, the result is affected only when the selected
RAS cell has an induced error.
10
SER Analysis
• For a 4-cell RAS structure, the SER is
N · (A + Δ) · P · αf
• For a 4-cell SS structure, the SER is
4N · A · P · αf
where
• N is the particle flux in #particles · cm⁻² · s⁻¹
• A is sensitive area per FF in cm²
• P is probability of SET per strike in a FF
• Δ is average area overhead (routing, decoder,
etc.) per FF to implement RAS
• αf is a temporal derating factor between 0 and 1
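The two SER expressions above can be compared numerically with a short sketch. The parameter values below are purely illustrative placeholders, not measured data, and extending the 4-cell SS expression to an arbitrary cell count `n_cells` is an assumption made here for illustration. Note that N in these formulas is the particle flux, not the number of flip-flops.

```python
import math


def ser_ras(flux, area, delta, p_set, alpha_f):
    """SER of a RAS structure: N * (A + delta) * P * alpha_f."""
    return flux * (area + delta) * p_set * alpha_f


def ser_ss(n_cells, flux, area, p_set, alpha_f):
    """SER of a serial scan chain with n_cells cells.

    For n_cells = 4 this is the slide's 4N * A * P * alpha_f;
    other values of n_cells are an assumed generalization.
    """
    return n_cells * flux * area * p_set * alpha_f


# Illustrative placeholder values (arbitrary units, not measured data).
flux, area, delta, p_set, alpha_f = 1.0, 1.0, 0.2, 0.01, 0.5

ratio = ser_ras(flux, area, delta, p_set, alpha_f) / ser_ss(
    4, flux, area, p_set, alpha_f
)
print(math.isclose(ratio, (area + delta) / (4 * area)))  # True
```

The flux, strike probability, and derating factor cancel in the ratio, leaving (A + Δ) / (4A). The RAS advantage therefore depends on how small the per-FF overhead Δ is relative to A.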
11
Fault Tolerant Design Using BISER-RAS
[Figure: BISER-RAS structure. Three register copies (Copy R1, Copy R2, Copy R3), each built from RAS flip-flops (RAS FF) and clocked by ck, feed C-elements; labeled signals include R11 to R33, x1 to x3, and y1 to y4, with outputs going to the next level.]
12
Hardware overhead comparison for ISCAS’89 circuits
                        BISER design overhead (%)     TMR design overhead (%)
Ckt      # FF  # Gate       SS     RAS  Reduction       SS     RAS  Reduction
s208        8     112    87.41   82.41       5.09   283.33   97.14     186.19
s349       11     176    80.77   71.67       9.10   261.54   83.26     178.28
s510        6     211    46.49   46.45       0.04   150.55   55.49      95.06
s1196      18     529    53.31   43.63       9.68   172.64   49.61     123.02
s1494       6     647    17.82   17.81       0.02    57.71   21.27      36.44
s5378     179    2779    82.27   53.45      28.82   266.40   56.38     210.02
s9234     221    5597    57.49   37.00      20.49   186.17   38.88     147.28
s13207    638    7951    93.49   57.30      36.19   302.73   59.06     243.67
s15850    534    9772    74.21   45.77      28.44   240.29   47.30     192.99
s35932   1728   16065   108.83   64.93      43.90   352.39   66.18     286.21
s38417   1636   22179    89.15   53.25      35.90   288.66   54.30     234.36
s38584   1426   19253    89.36   53.54      35.82   289.34   54.67     234.68
Average                  73.14   52.14      20.51   236.83   57.55     179.28
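The Reduction columns appear to be the difference, in percentage points, between the SS and RAS overheads. The sketch below checks this for three TMR rows copied from the table; the dictionary and function names are hypothetical.

```python
# (SS overhead %, RAS overhead %) for the TMR design, copied from the table.
tmr_overhead = {
    "s5378": (266.40, 56.38),
    "s13207": (302.73, 59.06),
    "s38417": (288.66, 54.30),
}


def reduction(ss_pct, ras_pct):
    """Overhead reduction of RAS relative to SS, in percentage points."""
    return round(ss_pct - ras_pct, 2)


for ckt, (ss, ras) in tmr_overhead.items():
    print(ckt, reduction(ss, ras))
```

For these rows the computed values match the tabulated reductions of 210.02, 243.67, and 234.36.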
13
Conclusion
• The RAS design has a natural soft error
tolerance capability that is inherited from its
unique structure and operation.
• In a circuit with N FFs, the SER of RAS can
be nearly 1/N of the SER of SS.
• BISER-RAS saves on average 20.51%
hardware over BISER applied to SS, and
TMR-RAS saves on average 179.28% over
TMR-SS for the ISCAS’89 benchmarks.
14