Enhancing Random Access Scan for Soft Error Tolerance
Fan Wang* and Vishwani D. Agrawal
Department of Electrical and Computer Engineering, Auburn University, AL 36849
*Now with Juniper Networks, Inc., Sunnyvale, CA 94086
42nd IEEE Southeastern Symposium on System Theory, March 2010

Motivation for This Work
• Recent work on random access scan (RAS) has shown its advantages over serial scan (SS) in reducing test time, test volume and test power.
• The RAS structure can also improve fault tolerance in both normal function mode and test mode.

Outline
• Background
• Review of RAS design
• Soft error tolerance of RAS
• A new scan-out structure
• Further enhancing error tolerance using RAS structure
• Conclusion

Soft Errors
• Soft errors are caused by the operating environment.
• They are not due to permanent hardware faults.
• Soft errors are intermittent or random, which makes their testing unreliable.
• One way to deal with soft errors is to make hardware robust:
  – Capable of detecting soft errors
  – Capable of correcting soft errors
  – Both measures are probabilistic

Effect on Digital Circuit
[Figure: charged particles striking the combinational logic and the flip-flops of a digital circuit; signals IN, OUT and CK.]
M. Nicolaidis (Editor), Soft Errors in Modern Electronic Systems, Springer, 2010.

Random Access Scan (RAS)
• Testing requires that flip-flops be controllable and observable. Two methods are:
  – Serial scan (SS) using a shift register
  – Random access scan (RAS) using memory-like addressing
• RAS reduces test application time and test power, which are otherwise complementary objectives in SS.
• Previous and current publications on RAS:
  – Ando, COMPCON-80
  – Wagner, COMPCON-83
  – Ito, DAC-90
  – Bushnell & Agrawal, textbook, pp. 484-485
  – Mudlapur et al., ITC-05
  – Saluja et al., VLSI Design-04, ITC-05, ATS-05, VLSI Design-06, VLSI Design-10

Background
Error tolerant computing techniques are characterized by the level of reliability:
1. Device-level error tolerance techniques either increase the device critical charge or decrease the collected charge to reduce the soft error rate (SER).
2. Circuit- or system-level error tolerance techniques include error detection and correction (EDAC) codes and time/space redundancy.

BISER Design With C-Element
S. Mitra, N. Seifert, M. Zhang, Q. Shi, and K. S. Kim, “Robust System Design with Built-In Soft-Error Resilience,” Computer, vol. 38, no. 2, pp. 43-52, February 2005.

The “Toggle” RAS Flip-Flop
[Figure: RAS flip-flop (RAS-FF) with a MUX on the data path from the combinational logic, a clock input, x and y select lines, and a connection to the output bus; a row decoder and a column decoder, each driving √n_ff lines from a log2(n_ff)-bit address; output bus control.]

Natural Soft Error Tolerance of RAS
• For SS, a soft error can be induced in each scan flip-flop (SFF) as it transports test data to the output.
• For RAS, the result is affected only when the selected RAS cell has an induced error.

SER Analysis
• For a 4-cell RAS structure, the SER is N · (A + Δ) · P · α_f
• For a 4-cell SS structure, the SER is 4N · A · P · α_f
where
• N is the particle flux in particles · cm⁻² · s⁻¹
• A is the sensitive area per FF in cm²
• P is the probability of an SET per strike in a FF
• Δ is the average area overhead (routing, decoder, etc.) per FF to implement RAS
• α_f is a temporal derating factor between 0 and 1

Fault Tolerant Design Using BISER-RAS
[Figure: three copies (Copy R1, Copy R2, Copy R3) of a row of RAS flip-flops, each copy feeding a C-element; select lines x1 to x3 and y1 to y4, clock ck, outputs labeled "To Next Level".]

Hardware Overhead Comparison for ISCAS'89 Circuits

                         BISER design overhead (%)     TMR design overhead (%)
Circuit   # FF  # Gates      SS     RAS  Reduction       SS     RAS  Reduction
s208         8      112   87.41   82.41       5.09   283.33   97.14     186.19
s349        11      176   80.77   71.67       9.10   261.54   83.26     178.28
s510         6      211   46.49   46.45       0.04   150.55   55.49      95.06
s1196       18      529   53.31   43.63       9.68   172.64   49.61     123.02
s1494        6      647   17.82   17.81       0.02    57.71   21.27      36.44
s5378      179     2779   82.27   53.45      28.82   266.40   56.38     210.02
s9234      221     5597   57.49   37.00      20.49   186.17   38.88     147.28
s13207     638     7951   93.49   57.30      36.19   302.73   59.06     243.67
s15850     534     9772   74.21   45.77      28.44   240.29   47.30     192.99
s35932    1728    16065  108.83   64.93      43.90   352.39   66.18     286.21
s38417    1636    22179   89.15   53.25      35.90   288.66   54.30     234.36
s38584    1426    19253   89.36   53.54      35.82   289.34   54.67     234.68
Average      -        -   73.14   52.14      20.51   236.83   57.55     179.28

Conclusion
• The RAS design has a natural soft error tolerance capability that is inherited from its unique structure and operation.
• In a circuit with N FFs, the SER of RAS can be nearly 1/N of the SER of SS.
• BISER-RAS saves on average 20.51% hardware compared with BISER applied to SS, and TMR-RAS saves on average 179.28% compared with TMR-SS, for the ISCAS'89 benchmarks.
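
Worked example for the SER Analysis slide
The two SER expressions can be compared numerically. The Python sketch below is only an illustration, not code from the presentation. It assumes the 4-cell serial scan expression generalizes to n_cells · N · A · P · α_f, which the "nearly 1/N" remark in the conclusion suggests, and all function names and numeric values are hypothetical.

```python
def ser_ras(flux, area, overhead, p_set, derating):
    """SER of a RAS structure: N * (A + delta) * P * alpha_f."""
    return flux * (area + overhead) * p_set * derating


def ser_ss(flux, area, p_set, derating, n_cells=4):
    """SER of an n-cell serial scan chain: n * N * A * P * alpha_f.

    n_cells=4 matches the 4-cell case on the slide; any other value is an
    assumed generalization.
    """
    return n_cells * flux * area * p_set * derating


def ras_to_ss_ratio(area, overhead, n_cells=4):
    """SER(RAS) / SER(SS). Flux, P and alpha_f cancel out."""
    return (area + overhead) / (n_cells * area)


if __name__ == "__main__":
    # Hypothetical values, chosen only to exercise the formulas.
    flux = 1.0e-2      # N: particles per cm^2 per second
    area = 1.0e-8      # A: sensitive area per FF in cm^2
    overhead = 2.0e-9  # delta: RAS area overhead per FF in cm^2
    p_set = 0.1        # P: probability of an SET per strike
    derating = 0.5     # alpha_f: temporal derating factor

    for n in (4, 64, 1024):
        ras = ser_ras(flux, area, overhead, p_set, derating)
        ss = ser_ss(flux, area, p_set, derating, n_cells=n)
        print(f"n_cells={n}: SER_RAS/SER_SS = {ras / ss:.6f}")
```

When the overhead Δ is small compared with A, the ratio approaches 1/n_cells, which is consistent with the conclusion's statement that the SER of RAS can be nearly 1/N of the SER of SS.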