SEU Mitigation Techniques for Xilinx Virtex-II Pro FPGA
Mandy M. Wang
JPL R&TD Mobility Avionics
Wang-110, D/MAPLD 2004

Agenda
• Project Background
• SEU Sensitive Areas and Mitigation Approaches
• Design Details
• Conclusion

Project Objective
The Mobility Avionics project aims to develop an embedded platform for space flight instruments and systems that is scalable, configurable, and capable of withstanding low- to medium-radiation environments.

Multi-Tiered Strategy
[Figure: chart placing example applications along two axes, time critical vs. not time critical and mission critical vs. not mission critical. Applications: science data processor, image processor, motor control, ground support equipment, orbiter command data handler, micro-mobility controller, EDL controller. Strategy tiers: Simple Strategy, Robust Strategy, Always Available Strategy.]
Low to medium radiation tolerance is assumed.

Strategies
Simple Strategy: A quick-and-dirty approach. It relies on less-than-desirable techniques, such as device reset and reconfiguration, as the means of error correction, and it may require an external computer to check the configuration.
Robust Strategy: A refinement of the simple strategy. It uses an SEU-immune FPGA as a monitoring device for a system board based on a Xilinx FPGA, so no external computer is needed.

SEU Sensitive Areas
Xilinx Virtex-II Pro SEU-sensitive areas include:
• PPC405 core registers
• Configuration memory (LUT equations and routing)
• Data path registers
• User memory (block or distributed RAMs)

Normalized data based on predicted upset rates (XC2VP20):
Area | Normalized upset rate | Share
PPC L1 cache | 10.8 | 64%
Configuration memory | 3.61 | 22%
Block SelectRAM | 1.78 | 11%
Registers | 0.46 | 3%

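The shares in the table follow directly from the normalized rates. A short Python check (rates copied from the chart; rounding to one decimal place is an illustrative choice, not something the slide specifies) recomputes them:

```python
# Normalized predicted upset rates for the XC2VP20, as read from the chart.
RATES = {
    "PPC L1 cache": 10.8,
    "Configuration memory": 3.61,
    "Block SelectRAM": 1.78,
    "Registers": 0.46,
}

def upset_shares(rates):
    """Return each area's share of the total upset rate, in percent (1 decimal)."""
    total = sum(rates.values())
    return {area: round(100 * rate / total, 1) for area, rate in rates.items()}

shares = upset_shares(RATES)
# Print the areas from most to least upset-prone.
for area, pct in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{area:22s} {pct:5.1f}%")
```

To one decimal place this gives 64.9%, 21.7%, 10.7% and 2.8%; the slide's whole-number figures (64%, 22%, 11%, 3%) sum to 100%. Either way, the PPC L1 caches account for roughly two thirds of the predicted upsets.
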
Mitigation Approaches
Area | Detection | Indicator | Mitigation | Fault Injection
Processor registers | Comparison at the CoreConnect bus | Internal FF | Processor reset | Serial port
User memory | EDAC | Internal FF | EDAC | Serial port
Configuration memory | CRC | (None) | FPGA reconfiguration | Serial port
Data registers | TMR | (None) | TMR | (None)

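For data registers the table lists TMR as both detection and mitigation. As a minimal sketch of the voting step only (written in Python rather than HDL, treating each register copy as an integer bit vector; `tmr_vote` and `tmr_disagree` are illustrative names), a bitwise 2-of-3 majority masks an upset in any single copy:

```python
def tmr_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority of three register copies."""
    return (a & b) | (b & c) | (a & c)

def tmr_disagree(a: int, b: int, c: int) -> int:
    """Bit mask of positions where the three copies do not all agree.

    Illustrative extra: the slide lists no indicator for TMR.
    """
    return (a ^ b) | (b ^ c)

golden = 0b1011_0110
upset = golden ^ 0b0000_0100  # an SEU flips bit 2 in one copy
print(bin(tmr_vote(golden, upset, golden)), bin(tmr_disagree(golden, upset, golden)))
```

The vote returns the original value as long as no two copies are upset in the same bit position.
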
System Design - Overview
[Figure: system block diagram. Components: two PPC405 cores with a comparator (C); OCM BRAM (8K) with EDC; PLB and PLB arbiter; PLB BRAMs for firmware (32K); status BRAMs (4K); EDC controller; DDR SDRAM controller with EDC and 128 MB of external memory; PLB2OPB bridge; OPB and OPB arbiter; UARTs; critical and non-critical interrupt controllers (INTC); external devices; and a serial port decoder that injects fault signals at the fault-injection (FI) points.]

Dual-processor Comparator
[Figure: PPC405 blocks 1 and 2, each containing cache units, MMU, CPU, timers and debug, and a PLB IPIF, connect over the PLB bus to the DDR SDRAM controller and, in the off-chip area, the external SDRAM. A comparator (C) checks the two cores' PLB master signals, with parity checks (PC) and fault insertion points (FI) along the paths.]
Note: Yellow lines: PLB master read/write signals for the D-cache. Green lines: PLB master read signals for the I-cache. FI: fault insertion point. PC: parity check.

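The comparator checks the two cores against each other at the CoreConnect bus. The behavioral Python sketch below assumes each core's PLB master outputs are sampled once per cycle as (address, write data, read/write flag); the `BusCycle` type and `first_mismatch` function are hypothetical, and real hardware compares the raw signals on every clock rather than recorded traces:

```python
from typing import NamedTuple, Optional, Sequence

class BusCycle(NamedTuple):
    """Hypothetical per-cycle snapshot of one core's PLB master outputs."""
    address: int
    write_data: int
    is_write: bool

def first_mismatch(core1: Sequence[BusCycle], core2: Sequence[BusCycle]) -> Optional[int]:
    """Return the first cycle where the two lockstep cores disagree, or None."""
    for cycle, (c1, c2) in enumerate(zip(core1, core2)):
        if c1 != c2:
            return cycle
    if len(core1) != len(core2):
        # One core issued more bus activity than the other.
        return min(len(core1), len(core2))
    return None

trace = [BusCycle(0x100, 0, False), BusCycle(0x104, 0xCAFE, True)]
faulty = [trace[0], BusCycle(0x104, 0xCAFF, True)]  # injected data-bit fault
print(first_mismatch(trace, faulty))
```

Per the mitigation table, a detected mismatch leads to a processor reset.
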
Dual-Processor Voting Simulation

EDAC OCM BRAMs (Read/Write)
• Hamming code (39,32): 32 data bits protected by 7 check bits
• Read-modify-write to support the byte-enable feature
• Error information is stored in a separate memory space
• A single-bit error triggers a CPU interrupt
• A double-bit error triggers a CPU reset
[Figure: EDAC datapath. PPC405 #1 and #2 connect through glue logic to a parity encoder (32-bit ENCIN, 7-bit PARITY_OUT) and to the 8 KB BRAMs (ADDR, EN, W_EN[3:0], CLK, 7-bit PARITY_IN). On reads, the error detection and correction block (32-bit DECIN and DECOUT) returns corrected data and raises ERROR; a FORCE ERROR input injects faults, and the parity bits are discarded from Data Out.]
Source: Xilinx XAPP645

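A (39,32) SEC-DED code corrects any single flipped bit and detects, without correcting, any double flip. The Python sketch below uses a textbook extended-Hamming layout (check bits at codeword positions 1, 2, 4, 8, 16 and 32, overall parity at position 0). That layout is an assumption and need not match the XAPP645 encoder bit for bit; the `OCM_ACTION` table simply restates the slide's error responses.

```python
DATA_BITS = 32
CODE_BITS = 39  # positions 0..38; position 0 is the overall parity bit

# Assumed textbook layout: non-power-of-two positions 1..38 carry data bits.
DATA_POSITIONS = [p for p in range(1, CODE_BITS) if p & (p - 1)]
assert len(DATA_POSITIONS) == DATA_BITS

def _syndrome(word: int) -> int:
    """XOR of the positions (1..38) of all set bits."""
    s = 0
    for pos in range(1, CODE_BITS):
        if (word >> pos) & 1:
            s ^= pos
    return s

def _parity(word: int) -> int:
    return bin(word).count("1") & 1

def encode(data: int) -> int:
    """Return the 39-bit SEC-DED codeword for a 32-bit data word."""
    word = 0
    for i, pos in enumerate(DATA_POSITIONS):
        if (data >> i) & 1:
            word |= 1 << pos
    s = _syndrome(word)
    for k in range(6):  # check bits at positions 1, 2, 4, 8, 16, 32 zero the syndrome
        if (s >> k) & 1:
            word |= 1 << (1 << k)
    return word | _parity(word)  # overall parity bit makes the total even

def decode(word: int):
    """Return (data, status); status is 'ok', 'single' (corrected) or 'double'."""
    s, odd = _syndrome(word), _parity(word)
    if not odd:
        status = "ok" if s == 0 else "double"
    elif s < CODE_BITS:
        word ^= 1 << s  # s == 0 means the overall parity bit itself flipped
        status = "single"
    else:
        status = "double"  # odd parity, impossible syndrome: treat as uncorrectable
    data = 0
    for i, pos in enumerate(DATA_POSITIONS):
        data |= ((word >> pos) & 1) << i
    return data, status

# Responses the slide assigns to each outcome on the OCM path.
OCM_ACTION = {"ok": None, "single": "CPU interrupt", "double": "CPU reset"}
```

For example, `decode(encode(x) ^ (1 << 17))` returns `(x, "single")`, which on the OCM path maps to a CPU interrupt.
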
EDAC PLB BRAMs (Read Only)
• Hamming code (72,64): 64 data bits protected by 8 check bits
• Read-modify-write to support the byte-enable feature
• Single-bit error information is stored in a separate memory space
• A single-bit error triggers a CPU interrupt
• A double-bit error triggers a device reconfiguration
[Figure: EDAC datapath in the PLB BRAM controller. The PLB interface connects through glue logic to a parity encoder (64-bit ENCIN, 8-bit PARITY_OUT) and to the BRAMs (32 KB + 8 KB; ADDR, EN, W_EN, CLK, 8-bit PARITY_IN). The error detection and correction block (64-bit DECIN and DECOUT) returns corrected data and a 2-bit ERROR output; a 2-bit FORCE ERROR input injects faults, and the parity bits are discarded from Data Out.]
Source: Xilinx XAPP645

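Because the check bits cover the whole word, a write that enables only some bytes cannot be encoded by itself: the controller reads the stored word, merges in the enabled bytes, and re-encodes the full result. A Python sketch of that merge follows. `merge_bytes` and `read_modify_write` are hypothetical helpers, one enable per byte is assumed, and numbering bytes from the least significant end is also an assumption (PowerPC/CoreConnect conventions number bytes from the most significant end, so a real controller may need the mask mirrored).

```python
def merge_bytes(old: int, new: int, byte_enable: int, width_bytes: int = 8) -> int:
    """Take enabled bytes from `new` and the rest from `old`.

    Assumption: bit i of `byte_enable` selects byte i (bits 8*i .. 8*i+7).
    """
    mask = 0
    for i in range(width_bytes):
        if (byte_enable >> i) & 1:
            mask |= 0xFF << (8 * i)
    return (old & ~mask) | (new & mask)

def read_modify_write(memory: dict, addr: int, new: int, byte_enable: int,
                      width_bytes: int = 8) -> int:
    """Partial write modeled on a dict: read, merge, write back the whole word."""
    old = memory.get(addr, 0)        # read (already EDAC-corrected in hardware)
    merged = merge_bytes(old, new, byte_enable, width_bytes)
    memory[addr] = merged            # full-word write: check bits are recomputed
    return merged
```

With `byte_enable = 0xFF` the merge degenerates to an ordinary full-word write, so only partial writes pay for the extra read.
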
EDAC DDR SDRAM
• Hamming code (72,64)
• Read-modify-write to support the byte-enable and two-word burst features
• Single-error information is stored in a separate memory space
• A single error triggers a CPU interrupt
• A double error triggers device reconfiguration
[Figure: EDAC datapath inside the DDR SDRAM controller. PLB interface modules and glue logic feed a parity encoder (64-bit ENCIN, 8-bit PARITY_OUT). Multiplexers narrow the 64 data bits and 8 check bits to 32 data bits and 4 check bits toward the DDR SDRAM (128 MB + 32 MB; ADDR, CLK, CLKn, PARITY_IN); on reads, demultiplexers rebuild the 64-bit DECIN for the error detection and correction block (64-bit DECOUT, 2-bit ERROR). A 2-bit FORCE ERROR input injects faults, and the parity bits are discarded from Data Out.]
Source: Xilinx XAPP645

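On the DDR path one 72-bit codeword (64 data bits plus 8 check bits) crosses a 32-bit data path with 4 check-bit lines, so it travels as two transfers and is reassembled before decoding. A Python sketch of that packing; `split_beats` and `merge_beats` are hypothetical names, and sending the low half first is an assumption:

```python
def split_beats(data64: int, check8: int):
    """Split a 72-bit codeword into two transfers of (32 data bits, 4 check bits).

    Assumption: the low data half and low check nibble go out first.
    """
    beat0 = (data64 & 0xFFFF_FFFF, check8 & 0xF)
    beat1 = ((data64 >> 32) & 0xFFFF_FFFF, (check8 >> 4) & 0xF)
    return beat0, beat1

def merge_beats(beat0, beat1):
    """Reassemble (data64, check8) from the two transfers read back from SDRAM."""
    (lo, clo), (hi, chi) = beat0, beat1
    return (hi << 32) | lo, (chi << 4) | clo
```

The decoder can only run once both halves have arrived, since the 8 check bits cover all 64 data bits together.
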
Self Configuration Checker
[Figure: design flow and on-chip checker. Implementing the digital design produces top.bit and top.ll; top.ll contains the frame addresses used by the design. A C script formats the frame-address data for BRAMs, and the result is stored in a frame address memory (BRAMs). The readback commands (44 bytes) and a 4-byte entry are also held in BRAMs. On the Virtex-II Pro, an ICAP controller drives the ICAP, and a CRC checker processes the readback data.]
This portion can be ported to a radiation-hardened FPGA in the case of the robust strategy.

Self Configuration Checker Design Highlights
• No external I/O access required
• Frame-by-frame readback required
• 32-bit CRC algorithm implemented (a CRC signature is generated after device power-up)
• No SRL16s or distributed SelectRAMs used in the design, since their contents change during operation and would cause readback CRC mismatches

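The checker folds each frame read back through the ICAP into a running 32-bit CRC and compares the result with the signature captured after power-up. A Python model of that loop follows. Frames are plain byte strings here, the function names are hypothetical, and the polynomial (the common reflected CRC-32, 0xEDB88320) is an assumption, because the slide does not state which 32-bit CRC the design uses.

```python
CRC32_POLY = 0xEDB88320  # reflected IEEE 802.3 polynomial (assumed, not from the slide)

def crc32_update(crc: int, data: bytes) -> int:
    """Bitwise CRC-32 update of the running (pre-inverted) register."""
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (CRC32_POLY if crc & 1 else 0)
    return crc

def readback_signature(frames) -> int:
    """CRC over all configuration frames, read back one frame at a time."""
    crc = 0xFFFF_FFFF
    for frame in frames:  # frame-by-frame readback
        crc = crc32_update(crc, frame)
    return crc ^ 0xFFFF_FFFF

def configuration_ok(frames, golden: int) -> bool:
    """Compare a fresh readback against the power-up signature."""
    return readback_signature(frames) == golden
```

A single flipped configuration bit always changes a CRC-32, so one upset in any frame is enough to fail the check and trigger reconfiguration.
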
LabVIEW Fault Injection Panel
[Figure: screenshot of the fault injection emulator that interfaces with the prototype board.] Callouts:
• Process bus fault injection buttons
• Program counter (resets to zero when a CPU reset occurs)
• ASCII command input window
• Fault injection error counters
• Processors mismatch LED indicator
• Fault location map

XC2VP20 Device Utilization (without TMR)
Resource | Used | Available | Utilization
External IOBs | 57 | 564 | 10%
PPC405s | 2 | 2 | 100%
RAMB16s | 30 | 88 | 34%
SLICEs | 4334 | 9280 | 46%
BUFGMUXs | 6 | 16 | 37%
DCMs | 2 | 8 | 25%
ICAPs | 1 | 1 | 100%
JTAGPPCs | 1 | 1 | 100%

Slice Utilization (without TMR)
[Figure: slice-count chart (0 to 45,000 slices) for the VP20, VP40 and VP100 devices, with the following per-module values:]
Module | Slices | Percentage
Fault Injection Module | 93 | 5.4%
Configuration Checker | 156 | 9.0%
Dual-processor comparator | 178 | 10.3%
OCM EDAC 32-bit Module | 233 | 13.4%
PLB EDAC 64-bit Module | 467 | 26.9%
Hardware Status Memory Controller | 606 | 35.0%

OPB Arbiter | 40 | 2.0%
OPB2DCR Bridge | 90 | 4.5%
PLB BRAM Controller | 163 | 8.2%
OCM BRAM Controller | 278 | 14.0%
Interrupt Controller | 341 | 17.2%
UART Transceiver | 368 | 18.6%
PLB2OPB Bridge | 700 | 35.4%
PLB Arbiter | 1005 | 50.8%
Note: The shaded modules can be replaced by other approaches.

Mitigation State Machine
From the Normal state, each class of error moves the system to a mitigation state. The states below are listed in order of increasing mitigation severity.
• CPU Interrupt: 1) OCM BRAM single-bit error; 2) PLB BRAM single-bit error; 3) DDR SDRAM single-bit error
• CPU Reset: 1) CPU mismatch; 2) CPU watchdog timer; 3) OCM EDC double-bit error
• System Reset: 1) OPB bus error; 2) PLB bus error; also entered when the CPU reset counter is full
• FPGA Reconfiguration: 1) configuration check fail; 2) PLB EDC double-bit error; 3) DDR SDRAM double-bit error; also entered when the system reset counter is full

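The escalation logic can be modeled as a table from error events to mitigation levels plus two counters: repeated CPU resets escalate to a system reset, and repeated system resets escalate to FPGA reconfiguration. The Python sketch below follows the event lists on the slide; the event names, the counter limit of 3, and clearing the counters after escalation or reconfiguration are hypothetical choices, not taken from the design.

```python
from enum import IntEnum

class Action(IntEnum):
    """Mitigation levels in increasing severity."""
    NONE = 0
    CPU_INTERRUPT = 1
    CPU_RESET = 2
    SYSTEM_RESET = 3
    FPGA_RECONFIGURATION = 4

# Event -> first-line action, per the slide (event names are illustrative).
EVENT_ACTION = {
    "ocm_single_bit": Action.CPU_INTERRUPT,
    "plb_bram_single_bit": Action.CPU_INTERRUPT,
    "ddr_single_bit": Action.CPU_INTERRUPT,
    "cpu_mismatch": Action.CPU_RESET,
    "cpu_watchdog": Action.CPU_RESET,
    "ocm_double_bit": Action.CPU_RESET,
    "opb_bus_error": Action.SYSTEM_RESET,
    "plb_bus_error": Action.SYSTEM_RESET,
    "config_check_fail": Action.FPGA_RECONFIGURATION,
    "plb_edc_double_bit": Action.FPGA_RECONFIGURATION,
    "ddr_double_bit": Action.FPGA_RECONFIGURATION,
}

class MitigationFSM:
    def __init__(self, limit: int = 3):  # hypothetical counter depth
        self.limit = limit
        self.cpu_resets = 0
        self.system_resets = 0

    def handle(self, event: str) -> Action:
        action = EVENT_ACTION[event]
        if action == Action.CPU_RESET:
            self.cpu_resets += 1
            if self.cpu_resets >= self.limit:     # CPU reset counter full
                self.cpu_resets = 0
                action = Action.SYSTEM_RESET
        if action == Action.SYSTEM_RESET:
            self.system_resets += 1
            if self.system_resets >= self.limit:  # system reset counter full
                self.system_resets = 0
                action = Action.FPGA_RECONFIGURATION
        if action == Action.FPGA_RECONFIGURATION:
            # Reconfiguration restarts everything, so clear both counters.
            self.cpu_resets = self.system_resets = 0
        return action
```

Counting escalated CPU resets toward the system reset counter means a persistently failing processor eventually forces a full reconfiguration.
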
Conclusion
• Identified error-prone regions on the Virtex-II Pro and categorized them into four types
• Developed mitigation strategies for each region
• Radiation testing of the overall system is in progress

Acronyms
• SEU: Single Event Upset
• FPGA: Field Programmable Gate Array
• LUT: Look-Up Table
• PLB: Processor Local Bus
• OPB: On-Chip Peripheral Bus
• OCM: On-Chip Memory
• EDAC: Error Detection and Correction
• ICAP: Internal Configuration Access Point