Reconfiguration Based Fault Tolerant Systems Design


Reconfiguration Based Fault-Tolerant Systems Design - Survey of Approaches

Jan Balach, Ondřej Novák
FIT, CTU in Prague
MEMICS 2010

Outline

► Introduction
► FPGAs and SEU
► Reconfiguration based Fault-Tolerant designs
  - Improved testing
  - FT structures based on partial reconfiguration
  - High-performance FT design
► Transistor & gate level reconfiguration
► Flash-based FPGAs
► Reconfigurable Electronics for Space
► Conclusion

Introduction

► SRAM-based FPGAs
► FPGAs are the most widely used platform for developing new designs and systems
► FPGA dependability and reliability are among the most discussed issues

FPGAs and SEU

► FPGAs are sensitive to natural radiation effects; the most discussed ones are the so-called Single Event Upsets (SEUs)
► An SEU can impact an FPGA in different ways:
  - Changing the configuration memory
  - Generating a pulse on the interconnect
  - Causing latch-up
  - Affecting a non-programmed part of the FPGA
  - Affecting clock domain distribution
► Each situation requires a specific solution
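To illustrate the first effect listed above (a change of configuration memory), the following is a minimal Python sketch that models a 2-input LUT as a 4-bit configuration word and flips one bit, as an SEU might. The names `lut_eval`, `inject_seu`, and `truth_table` are hypothetical and do not correspond to any vendor tool; a real configuration memory is far larger and also controls routing.

```python
def lut_eval(config: int, a: int, b: int) -> int:
    """Evaluate a 2-input LUT: bit (2*a + b) of `config` is the output."""
    index = (a << 1) | b
    return (config >> index) & 1


def inject_seu(config: int, bit: int) -> int:
    """Return the configuration word with one bit flipped (a modeled SEU)."""
    return config ^ (1 << bit)


def truth_table(config: int) -> list:
    """Outputs for inputs (a, b) = (0,0), (0,1), (1,0), (1,1)."""
    return [lut_eval(config, a, b) for a in (0, 1) for b in (0, 1)]


# Assumed example configuration: only index 3 (a=1, b=1) yields 1, i.e. AND.
AND_CONFIG = 0b1000

if __name__ == "__main__":
    print("original :", truth_table(AND_CONFIG))
    upset = inject_seu(AND_CONFIG, 1)  # flip the entry for a=0, b=1
    print("after SEU:", truth_table(upset))
```

In this sketch, flipping bit 1 turns the AND function into one that simply follows input `b`. The wrong function persists until the configuration word is rewritten.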

Reconfiguration based Fault-Tolerant designs

► Fault-tolerant design = redundancy
► Redundancy protects the design only for a limited time
► Reconfiguration is needed to maintain the FPGA's FT parameters
► Reconfiguration can be used in different ways to achieve an FT design
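A common form of redundancy is triple modular redundancy (TMR), which the survey later uses as a baseline. The sketch below is a minimal illustration, assuming three replicas that each produce an integer word and a bitwise majority voter; `vote` and `faulty_replicas` are hypothetical names, not taken from any of the surveyed papers.

```python
def vote(a: int, b: int, c: int) -> int:
    """Bitwise majority of three replica outputs."""
    return (a & b) | (a & c) | (b & c)


def faulty_replicas(a: int, b: int, c: int) -> list:
    """Indices of replicas whose output disagrees with the voted result."""
    voted = vote(a, b, c)
    return [i for i, value in enumerate((a, b, c)) if value != voted]


if __name__ == "__main__":
    good = 0b1011
    bad = 0b0011  # assumed corrupted output of a faulty replica

    # One faulty replica: the voter masks the fault and identifies it.
    print(bin(vote(good, good, bad)), faulty_replicas(good, good, bad))

    # Two faulty replicas: the majority is now wrong.
    print(bin(vote(good, bad, bad)), faulty_replicas(good, bad, bad))
```

With one faulty replica the voter still returns the correct word, but once a second replica fails in the same way the majority itself is wrong. This is the sense in which redundancy alone protects only for a limited time; the index returned by `faulty_replicas` is the kind of information a reconfiguration controller could use to decide which replica to reload.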

Improved testing I.

► Testing is an important part of a dependable design flow
► Testing allows us to:
  - Prove that the design functions correctly
  - Localize faults
  - Prevent latent faults

Improved testing II.

► BIST architecture based on reconfiguration
► Improved Test Access Mechanism
► Can incur high overhead caused by bus macros

Picture from: Rozkovec, M., Novak, O., "Structural test of programmed FPGA circuits"

FT structures based on partial reconfiguration I.

► Reconfiguration offers various options for implementing an FT design
► The basic idea is to divide the design into smaller parts that can be reconfigured or replaced
► The smaller the parts, the bigger the overhead, so a trade-off has to be found
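The trade-off between part size and overhead can be illustrated with a toy cost model. This is only a sketch under stated assumptions: the design is split into equally sized modules, and every module adds a fixed interface cost (for example, for bus macros). The function name, the linear cost model, and the numbers are all hypothetical and are not taken from the surveyed papers.

```python
def partition_tradeoff(logic_cells: int, n_modules: int,
                       interface_cells_per_module: int) -> dict:
    """Toy model: more modules localize faults better but cost more interface logic."""
    if n_modules < 1:
        raise ValueError("n_modules must be at least 1")
    overhead_cells = n_modules * interface_cells_per_module
    return {
        # Size of the part that must be reconfigured when a fault is found.
        "module_cells": logic_cells / n_modules,
        # Extra interface logic relative to the useful logic.
        "overhead_ratio": overhead_cells / logic_cells,
        # Fraction of the design affected by a single localized fault.
        "area_hit_by_fault": 1 / n_modules,
    }


if __name__ == "__main__":
    # Assumed example: 1000 logic cells, 20 interface cells per module.
    for n in (1, 4, 16):
        print(n, partition_tradeoff(1000, n, 20))
```

Under these assumptions, going from 4 to 16 modules shrinks the area affected by one fault from a quarter to a sixteenth of the design, while the interface overhead grows fourfold.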

FT structures based on partial reconfiguration - App. A*

► Each application is divided into many small, so-called partial reconfigurable modules
► Reconfiguration is supervised by a partial reconfiguration controller
► Good fault localization: a fault impacts only a small area of the design
► Drawbacks: potentially high HW overhead (bus macros) and synchronization issues after reconfiguration

*) Straka M., Kastil J., Kotasek Z., "Fault Tolerant Structure for SRAM-based FPGA via Partial Dynamic Reconfiguration"

FT structures based on partial reconfiguration - App. B*

*) Borecky J., Kohlik M., Kubatova H., Kubalik P., "Fault Coverage Improvement based on Fault Simulation and Partial Duplication"

FT structures based on partial reconfiguration - App. B

► A fault impacts a relatively large part of the design
► The resulting HW overhead is smaller
► Synchronization after reconfiguration has to be solved

FT structures based on partial reconfiguration - App. C*

► A self-repairing dual-FPGA architecture is used
► The design is divided into columns; spare columns enable self-repair
► A soft microcontroller evaluates flags from the second FPGA; in case of an error, the faulty FPGA is reconfigured by the other one
► Achieves a good trade-off between overhead and fault localization
► Using the same bitstream in both FPGAs can be risky

*) S. Mitra, W.-J. Huang, N. R. Saxena, S.-Y. Yu, E.J. McCluskey, "Reconfigurable Architecture for Autonomous Self Repair"

High-performance FT system*

► The SEU rate varies with the position on the orbit, so reconfiguration can be used to switch modes
► When the SEU density is lower, the system can switch to a high-performance or power-saving mode
► The high-performance mode speeds up computation by 2.3x compared with standard TMR

*) Jacobs, A., George, A.D., Cieslewski, G., "Reconfigurable fault tolerance: A framework for environmentally adaptive fault mitigation in space"
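The mode-switching idea can be sketched as a simple threshold policy. This is a minimal illustration, not the framework from the cited paper: the `Mode` names, the `select_mode` and `plan_reconfigurations` functions, the single threshold, and the sample upset rates are all assumptions made for the example.

```python
from enum import Enum


class Mode(Enum):
    TMR = "tmr"                            # full redundancy, lower throughput
    HIGH_PERFORMANCE = "high_performance"  # less redundancy, higher throughput


def select_mode(upsets_per_hour: float, threshold: float) -> Mode:
    """Pick the protected mode whenever the expected upset rate is high."""
    if upsets_per_hour >= threshold:
        return Mode.TMR
    return Mode.HIGH_PERFORMANCE


def plan_reconfigurations(rates: list, threshold: float) -> list:
    """Return (orbit_step, mode) pairs for every step where the mode changes."""
    plan = []
    current = None
    for step, rate in enumerate(rates):
        mode = select_mode(rate, threshold)
        if mode is not current:
            plan.append((step, mode))
            current = mode
    return plan


if __name__ == "__main__":
    # Assumed upset rates sampled along one orbit (arbitrary units).
    orbit_rates = [0.1, 0.2, 5.0, 6.0, 0.3]
    for step, mode in plan_reconfigurations(orbit_rates, threshold=1.0):
        print(f"step {step}: reconfigure to {mode.value}")
```

Each entry in the returned plan corresponds to one reconfiguration of the device, so a policy like this reconfigures only when the environment crosses the threshold, not at every orbit step.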

Transistor and gate level reconfiguration*

► Reconfiguration is performed on transistor/gate level
► Redundant N/P diffusions can tolerate faults in silicon

[Figure: transistor-level schematics a) and b) with inputs in1, in2, output out, and supply rails VDD and GND; redundant transistors are marked]

Transistor and gate level reconfiguration

► The whole faulty gate is replaced
► The resulting HW overhead is between 30% and 120%
► Requires supervision at the layout level

Taken from: H. T. Vierhaus, "Transistor and Gate Level Self Repair for Logic Circuits"

Flash-based FPGA

► Configuration is stored in Flash memory
► An alternative platform for developing FT designs
► Intrinsically SEU-hard configuration memory
► Slower than SRAM-based FPGAs
► A higher voltage is required for programming

Reconfigurable Electronics for Space*

► NASA rovers on Mars
► On-board Xilinx FPGA
► Reconfiguration performed by ASIC Analog/Digital SRAAs
► The FPGA implements the digital interface between the PC and the proto board

*) Didier Keymeulen, "Self-Repairing and Tuning Reconfigurable Electronics for Space"

Conclusion I.

► Reconfiguration allows us to create FT designs in FPGAs
► Reconfiguration-based systems struggle with high area overhead
► Synchronization issues are mostly overlooked, but they have to be solved

Conclusion II.

► FPGA reconfiguration for space applications is unreliable due to the harsh environment
► Most approaches do not take industrial requirements into account
► Areas such as aerospace and railway can benefit from reconfiguration

Thank you for your attention