SEU Mitigation Techniques for Xilinx Virtex-II Pro FPGA
Mandy M. Wang, JPL R&TD Mobility Avionics
Wang-110, D/MAPLD 2004
Agenda
• Project Background
• SEU Sensitive Areas and Mitigation Approaches
• Design Details
• Conclusion

Project Objective
The Mobility Avionics project aims to develop an embedded platform for space flight instruments and systems that is scalable, configurable, and capable of withstanding low- to medium-radiation environments.

Multi-Tiered Strategy
Applications are classified by time criticality and mission criticality (low to medium radiation tolerance is assumed):
• Simple Strategy (not mission critical): Science Data Processor, Image Processor, Motor Control, Ground Support Equipment
• Robust Strategy: Orbiter Command Data Handler, Micro-Mobility Controller
• Always Available Strategy (mission critical): EDL Controller

Strategies
Simple Strategy: A quick-and-dirty approach. It relies on less-than-desirable techniques, such as device reset and reconfiguration, as the means of error correction, and it may require an external computer to check the configuration.
Robust Strategy: A refinement of the simple strategy. It uses an SEU-immune FPGA as a monitoring device for a system board based on a Xilinx FPGA. As a result, no external computer is needed.
SEU Sensitive Areas
Xilinx Virtex-II Pro SEU-sensitive areas include:
• PPC405 core registers
• Configuration memory (LUT equations and routing)
• Data path registers
• User memory (block or distributed RAMs)
[Figure: normalized data for the XC2VP20, based on predicted upset rates: PPC L1 cache 10.8 (64%), configuration memory 3.61 (22%), Block SelectRAM 1.78 (11%), registers 0.46 (3%).]

Mitigation Approaches
| Area | Detection | Indicator | Mitigation | Fault Injection |
|---|---|---|---|---|
| Processor registers | Comparison at the CoreConnect bus | Internal FF | Processor reset | Serial port |
| User memory | EDAC | Internal FF | EDAC | Serial port |
| Configuration memory | CRC | (None) | FPGA reconfiguration | Serial port |
| Data registers | TMR | (None) | TMR | (None) |

System Design - Overview
[Figure: system block diagram. A serial port decoder drives fault-injection (FI) points that inject fault signals into the design. Two PPC405 processors feed a comparator (C). Memories: OCM BRAM (8 KB) with EDC, PLB BRAMs holding the firmware (32 KB) with EDC, status BRAMs (4 KB), and 128 MB of external DDR SDRAM behind a DDR SDRAM controller with EDC. Buses and peripherals: PLB and OPB arbiters, a PLB2OPB bridge, UARTs, critical and non-critical interrupt controllers (INTC), and external devices.]
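The mitigation table assigns TMR to the data registers. As a rough behavioral sketch (not the author's HDL), the bitwise 2-of-3 majority vote that a TMR voter computes can be modeled in Python; the function names are hypothetical.

```python
def tmr_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority vote over three redundant register copies.

    Each output bit takes the value held by at least two copies, so an
    upset confined to a single copy is masked.
    """
    return (a & b) | (a & c) | (b & c)


def tmr_disagreement(a: int, b: int, c: int) -> int:
    """Mask of bit positions where the three copies are not all equal."""
    return (a ^ b) | (a ^ c)


if __name__ == "__main__":
    reg = 0b1011_0010
    upset = reg ^ 0b0000_1000  # single-event upset in one copy
    print(bin(tmr_vote(reg, upset, reg)))          # majority restores reg
    print(bin(tmr_disagreement(reg, upset, reg)))  # marks the upset bit
```

Because the vote is purely combinational, TMR masks an upset without signaling it, which is consistent with the "(None)" entry in the Indicator column; the disagreement mask is an optional extra a design could expose for logging.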
Dual-Processor Comparator
[Figure: PPC405 blocks 1 and 2, each with cache units, MMU, CPU, timers and debug, and a PLB IPIF, connected to a comparator (C) and, over the PLB bus, to the DDR SDRAM controller and the off-chip external SDRAM. Yellow lines: PLB master read/write signals for the D-cache. Green lines: PLB master read signals for the I-cache. FI: fault insertion point. PC: parity check.]

Dual-Processor Voting Simulation
[Figure]

EDAC: OCM BRAMs (Read/Write)
• Hamming code (39,32): 32 data bits plus 7 check bits
• Read-modify-write supports the byte-enable feature
• Error information is stored in a separate memory space
• A single-bit error triggers a CPU interrupt
• A double-bit error triggers a CPU reset
[Figure: glue logic between PPC405 #1/#2 and the 8 KB OCM BRAMs. A parity encoder generates the 7-bit PARITY_OUT from the 32-bit ENCIN; the error detection/correction block takes DECIN (32 bits) and PARITY_IN (7 bits) and produces DECOUT (32 bits) and ERROR. A FORCE ERROR input supports fault injection, and the parity bits are discarded on data out. BRAM ports: ADDR, EN, W_EN[3:0], CLK. Source: Xilinx XAPP645.]

EDAC: PLB BRAMs (Read Only)
• Hamming code (72,64): 64 data bits plus 8 check bits
• Read-modify-write supports the byte-enable feature
• Single-bit error information is stored in a separate memory space
• A single-bit error triggers a CPU interrupt
• A double-bit error triggers device reconfiguration
[Figure: glue logic between the PLB BRAM controller (PLB interface) and the BRAMs (32 KB + 8 KB), with 64-bit ENCIN/ENOUT/DECIN/DECOUT paths, 8-bit PARITY_OUT/PARITY_IN, a 2-bit FORCE ERROR input, and a 2-bit ERROR output; parity bits are discarded on data out. Source: Xilinx XAPP645.]

EDAC: DDR SDRAM
• Hamming code (72,64)
• Read-modify-write supports byte enables and 2-word bursts
• Single-bit error information is stored in a separate memory space
• A single-bit error triggers a CPU interrupt
• A double-bit error triggers device reconfiguration
[Figure: glue logic between the PLB interface modules and the DDR SDRAM controller. The 64-bit data and 8-bit parity words are multiplexed to 32 data bits and 4 parity bits for the DDR SDRAM (128 MB + 32 MB) and demultiplexed on reads; signals include ADDR, CLK, CLKn, a 2-bit FORCE ERROR input, and a 2-bit ERROR output. Source: Xilinx XAPP645.]
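The EDAC slides rely on SEC-DED Hamming codes: (39,32) for the OCM BRAMs and (72,64) for the PLB BRAMs and DDR SDRAM. The sketch below is a textbook extended Hamming construction in Python, parameterized by the data width k; its bit ordering is an assumption and is not claimed to match the XAPP645 encoder. `rmw_write` shows why byte-enabled writes need read-modify-write: the check bits cover the whole word, so a partial write must merge the new bytes into the decoded old word and re-encode. All function names, and the byte-lane order, are hypothetical.

```python
from typing import List, Optional, Tuple


def check_bits(k: int) -> int:
    """Smallest r with 2**r >= k + r + 1 (Hamming bound for k data bits)."""
    r = 0
    while (1 << r) < k + r + 1:
        r += 1
    return r


def _data_positions(n: int) -> List[int]:
    """Codeword positions 1..n that are not powers of two carry data bits."""
    return [p for p in range(1, n + 1) if p & (p - 1) != 0]


def encode(data: int, k: int) -> int:
    """Extended Hamming SEC-DED encode.

    Bits 1..n of the result hold the Hamming codeword (check bits at the
    power-of-two positions); bit 0 holds the overall parity bit.
    """
    r = check_bits(k)
    n = k + r
    word, syndrome = 0, 0
    for i, pos in enumerate(_data_positions(n)):
        if (data >> i) & 1:
            word |= 1 << pos
            syndrome ^= pos
    # Setting check bit 2**j for each set bit j of the syndrome makes the
    # XOR of all set-bit positions zero.
    for j in range(r):
        if (syndrome >> j) & 1:
            word |= 1 << (1 << j)
    return word | (bin(word).count("1") & 1)  # even overall parity in bit 0


def decode(word: int, k: int) -> Tuple[Optional[int], str]:
    """Return (data, status); status is "ok", "corrected" or "double"."""
    n = k + check_bits(k)
    syndrome = 0
    for pos in range(1, n + 1):
        if (word >> pos) & 1:
            syndrome ^= pos
    odd = bin(word).count("1") & 1
    if syndrome == 0 and not odd:
        status = "ok"
    elif odd and syndrome <= n:
        word ^= 1 << syndrome  # syndrome 0 means the parity bit itself flipped
        status = "corrected"
    else:
        # Even parity with a nonzero syndrome: uncorrectable (double-bit).
        return None, "double"
    data = 0
    for i, pos in enumerate(_data_positions(n)):
        data |= ((word >> pos) & 1) << i
    return data, status


def rmw_write(stored: int, new_data: int, byte_en: int, k: int) -> int:
    """Byte-enabled write: decode, merge the enabled bytes, re-encode.

    Assumes byte-enable bit b selects data bits 8*b .. 8*b+7.
    """
    old, _ = decode(stored, k)
    if old is None:
        raise ValueError("uncorrectable error during read-modify-write")
    mask = 0
    for b in range(k // 8):
        if (byte_en >> b) & 1:
            mask |= 0xFF << (8 * b)
    return encode((old & ~mask) | (new_data & mask), k)
```

With k = 32 the code adds 6 Hamming bits plus 1 overall parity bit (7 total), and with k = 64 it adds 7 + 1 = 8, matching the PARITY_OUT widths in the figures. A "corrected" status corresponds to the single-bit-error interrupt, and "double" to the double-bit-error action listed for each memory.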
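The dual-processor comparator checks the two PPC405s' PLB master signals against each other. A behavioral sketch, assuming both processors run in lockstep from the same inputs and that one PLB request snapshot per cycle is available from each; the `PlbRequest` fields are a hypothetical simplification of the real PLB signal set.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class PlbRequest:
    """Hypothetical per-cycle snapshot of one processor's PLB master outputs."""
    addr: int
    wr_data: int
    is_write: bool


def first_mismatch(cpu1: Iterable[PlbRequest],
                   cpu2: Iterable[PlbRequest]) -> Optional[int]:
    """Return the first cycle where the processors' bus requests differ,
    or None if they agree.

    zip() stops at the shorter trace, so only cycles present in both
    traces are compared.
    """
    for cycle, (a, b) in enumerate(zip(cpu1, cpu2)):
        if a != b:
            return cycle
    return None
```

In the system described here, a detected mismatch maps to the CPU Reset level of the mitigation state machine.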
Self Configuration Checker
[Figure: design flow and checker architecture. Implementation of the digital design produces top.bit and top.ll (which contains the frame addresses used by the design). A C script formats the frame address data for BRAMs, which form the frame address memory. The readback commands (44 bytes) and a 4-byte entry are also held in BRAMs. Inside the Virtex-II Pro, an ICAP controller drives the ICAP, and a CRC checker processes the readback data.]
This portion can be ported to a radiation-hardened FPGA in the case of the robust strategy.

Self Configuration Checker Design Highlights
• No external I/O access required
• Frame-by-frame readback required
• 32-bit CRC algorithm implemented (a CRC signature is generated after device power-up)
• No SRL16s or distributed SelectRAMs used in the design

LabVIEW Fault Injection Panel
Screenshot of the fault injection emulator that interfaces with the prototype board.
[Figure: panel with process bus fault injection buttons, an ASCII command input window, fault injection error counters, a processor mismatch LED indicator, and a fault location map. The program counter resets to zero when a CPU reset occurs.]
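The configuration checker reads the configuration frames back through the ICAP one frame at a time and compares a 32-bit CRC against the signature generated after power-up. A Python sketch of that comparison, assuming the standard CRC-32 polynomial used by `zlib.crc32` (the slide does not name the polynomial) and dummy frame contents in place of real readback data; `config_signature` and `config_check` are hypothetical names. Frames holding SRL16 or distributed SelectRAM contents would change during normal operation, which is presumably why the design avoids those resources.

```python
import zlib
from typing import Iterable


def config_signature(frames: Iterable[bytes]) -> int:
    """CRC-32 accumulated frame by frame over configuration readback data.

    Feeding each frame into zlib.crc32 with the running value gives the
    same result as one CRC over the concatenated frames, so frames can be
    read back and checked one at a time without buffering the bitstream.
    """
    crc = 0
    for frame in frames:
        crc = zlib.crc32(frame, crc)
    return crc


def config_check(frames: Iterable[bytes], golden: int) -> bool:
    """True if the readback signature matches the power-up signature."""
    return config_signature(frames) == golden


if __name__ == "__main__":
    # Dummy frames standing in for ICAP readback of the used frame addresses.
    frames = [bytes([i]) * 16 for i in range(4)]
    golden = config_signature(frames)  # captured once after power-up
    upset = list(frames)
    upset[2] = bytes([upset[2][0] ^ 0x01]) + upset[2][1:]  # one flipped bit
    print(config_check(frames, golden), config_check(upset, golden))
```

A failed check corresponds to the "configuration check fail" event, which the mitigation state machine handles with FPGA reconfiguration.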
XC2VP20 Device Utilization (without TMR)
| Resource | Used | Available | Utilization |
|---|---|---|---|
| External IOBs | 57 | 564 | 10% |
| PPC405s | 2 | 2 | 100% |
| RAMB16s | 30 | 88 | 34% |
| SLICEs | 4334 | 9280 | 46% |
| BUFGMUXs | 6 | 16 | 37% |
| DCMs | 2 | 8 | 25% |
| ICAPs | 1 | 1 | 100% |
| JTAGPPCs | 1 | 1 | 100% |

Slice Utilization (without TMR)
[Figure: bar chart of slice utilization for VP20, VP40, and VP100 devices (axis 0 to 45,000 slices).]
| Module | Slices | Share |
|---|---|---|
| Fault Injection Module | 93 | 5.4% |
| Configuration Checker | 156 | 9.0% |
| Dual-processor comparator | 178 | 10.3% |
| OCB EDAC 32-bit Module | 233 | 13.4% |
| PLB EDAC 64-bit Module | 467 | 26.9% |
| Hardware Status Memory Controller | 606 | 35.0% |
| OPB Arbiter | 40 | 2.0% |
| OPB2DCR Bridge | 90 | 4.5% |
| PLB BRAM Controller | 163 | 8.2% |
| OCM BRAM Controller | 278 | 14.0% |
| Interrupt Controller | 341 | 17.2% |
| UART Transceiver | 368 | 18.6% |
| PLB2OPB Bridge | 700 | 35.4% |
| PLB Arbiter | 1005 | 50.8% |
Note: The shaded modules can be replaced by other approaches.

Mitigation State Machine
States, in order of increasing mitigation severity: Normal, CPU Interrupt, CPU Reset, System Reset, FPGA Reconfiguration.
• CPU Interrupt: 1) OCM BRAM single-bit error 2) PLB BRAM single-bit error 3) DDR SDRAM single-bit error
• CPU Reset: 1) CPU mismatch 2) CPU watchdog timer 3) OCM EDC double-bit error
• System Reset: 1) OPB bus error 2) PLB bus error; also entered when the CPU reset counter is full
• FPGA Reconfiguration: 1) configuration check failure 2) PLB EDC double-bit error 3) DDR SDRAM double-bit error; also entered when the system reset counter is full

Conclusion
• Identified error-prone regions on the Virtex-II Pro and categorized them into four types
• Developed mitigation strategies for each region
• Radiation testing of the overall system is in progress
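The mitigation state machine can be sketched as an event-to-action table plus two escalation counters. The event-to-action mapping follows the slide; the counter limits, clearing the CPU reset counter on a system reset, and clearing both counters after reconfiguration are assumptions, since the slide only says that a full counter escalates to the next level. The event names and `MitigationFsm` are hypothetical.

```python
from enum import Enum


class Action(Enum):
    CPU_INTERRUPT = 1
    CPU_RESET = 2
    SYSTEM_RESET = 3
    FPGA_RECONFIG = 4


# Event -> base mitigation action, as listed on the state machine slide.
BASE_ACTION = {
    "ocm_single_bit": Action.CPU_INTERRUPT,
    "plb_bram_single_bit": Action.CPU_INTERRUPT,
    "ddr_single_bit": Action.CPU_INTERRUPT,
    "cpu_mismatch": Action.CPU_RESET,
    "cpu_watchdog": Action.CPU_RESET,
    "ocm_double_bit": Action.CPU_RESET,
    "opb_bus_error": Action.SYSTEM_RESET,
    "plb_bus_error": Action.SYSTEM_RESET,
    "config_check_fail": Action.FPGA_RECONFIG,
    "plb_double_bit": Action.FPGA_RECONFIG,
    "ddr_double_bit": Action.FPGA_RECONFIG,
}


class MitigationFsm:
    """Escalating mitigation: repeated resets climb to the next severity."""

    def __init__(self, cpu_reset_limit: int = 3, system_reset_limit: int = 3):
        self.cpu_reset_limit = cpu_reset_limit        # assumed value
        self.system_reset_limit = system_reset_limit  # assumed value
        self.cpu_resets = 0
        self.system_resets = 0

    def handle(self, event: str) -> Action:
        action = BASE_ACTION[event]
        if action is Action.CPU_RESET:
            if self.cpu_resets < self.cpu_reset_limit:
                self.cpu_resets += 1
                return action
            action = Action.SYSTEM_RESET   # CPU reset counter full: escalate
        if action is Action.SYSTEM_RESET:
            if self.system_resets < self.system_reset_limit:
                self.system_resets += 1
                self.cpu_resets = 0        # assumption: restart CPU count
                return action
            action = Action.FPGA_RECONFIG  # system reset counter full: escalate
        if action is Action.FPGA_RECONFIG:
            self.cpu_resets = 0            # assumption: counters start fresh
            self.system_resets = 0         # after reconfiguration
        return action
```

Between events the system sits in the Normal state; the sketch only returns the action to take for each incoming event.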
Acronyms
• SEU: Single Event Upset
• FPGA: Field Programmable Gate Array
• LUT: Look-Up Table
• PLB: Processor Local Bus
• OPB: On-Chip Peripheral Bus
• OCM: On-Chip Memory
• EDAC: Error Detection and Correction
• ICAP: Internal Configuration Access Port