Transcript Slide 1

TMR Schemes
Voting Matrix
Melanie Berg
MEI Technologies/NASA GSFC
[email protected]
Overview
Premise: Why do various FPGAs require
separate mitigation strategies?
Radiation Effects in FPGA devices
Mitigation and Actel Anti-fuse Devices
Mitigation and Xilinx Virtex Devices
Tools
European Space Agency FPGA Tool Workshop. Noordwijk, NL; Melanie Berg
Page 2
Radiation Effects in FPGA devices
Single Event Transients (SETs)
Single Event Upsets (SEUs)
Single Event Functional Interrupts (SEFIs)
Page 3
Single Event Effects (SEEs) and IC
System Error
SEUs or SETs can occur in:
Combinatorial Logic
Sequential Logic
Configuration Memory Cells
Depending on the Device and the design,
each fault type will:
Have a probability of occurrence
Have either a significant or an insignificant
contribution to system error
Every Device has different Error Responses – We
must understand the differences and design
appropriately
Page 4
Combinatorial Logic Blocks and Potential
Upsets… SETs in Anti-fuse FPGAs
Page 5
Basic Combinatorial Logic Blocks and Potential
Upsets
TRANSIENT: PSET
STUCK UNTIL OVERWRITTEN: Probability of Configuration Fault, PConfiguration
Page 6
DFFs: SEUs and SEFIs
Strike Caught in Loop: Probability of SEU, PDFFSEU
[Figure: DFF with D, Q, CLK, and reset pins]
Probability of SEFI: PSEFI
Page 7
Transient Capture on A DFF Data Input Pin
(SET→SEU)
[Figure: DFF (D, Q, SET, CLR, clock) capturing an SET pulse of width Tpulse on its data input; clock period tp = 1/fs]
fs: System Frequency
T(fs)pulse: SET Pulse Width
P(fs)SETgen: Probability SET generated with sufficient amplitude
P(fs)SETprop: Probability SET can propagate with sufficient amplitude
PDFFEn: Probability DFF is enabled (active)
P(fs)SET→SEU: Probability SET can be caught by clock edge
$P(f_s)_{SET \to SEU} \propto T(f_s)_{pulse} \cdot P(f_s)_{SETgen} \cdot P(f_s)_{SETprop} \cdot P_{DFFEn}$

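The frequency dependence of SET capture can be explored with a few lines of Python. This is only a sketch under stated assumptions: the factors are combined as a simple product, and the T(fs)pulse term is represented by the fraction of the clock period covered by the pulse, Tpulse/tp = Tpulse x fs. The function name and every numeric value are hypothetical and not taken from the slides.

```python
def set_capture_factor(fs_hz, pulse_width_s, p_gen, p_prop, p_enabled):
    """Relative (unnormalized) likelihood that an SET is latched as an SEU.

    Assumption: the chance that the pulse overlaps a capturing clock edge
    scales with pulse width divided by clock period (tp = 1/fs).
    """
    clock_period_s = 1.0 / fs_hz
    window_fraction = min(1.0, pulse_width_s / clock_period_s)
    return window_fraction * p_gen * p_prop * p_enabled


# Hypothetical 1 ns transient evaluated at two system frequencies.
slow = set_capture_factor(15e6, 1e-9, p_gen=0.5, p_prop=0.5, p_enabled=1.0)
fast = set_capture_factor(120e6, 1e-9, p_gen=0.5, p_prop=0.5, p_enabled=1.0)
print(f"relative capture factor at 15 MHz:  {slow:.4g}")
print(f"relative capture factor at 120 MHz: {fast:.4g}")
```

Under these assumptions the capture factor grows linearly with fs, which is why the SET→SEU term is treated as dynamic while PDFFSEU is treated as static.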
Page 8
Frequency Effects and
Conventional DFF Upset Theory
[Figure: Composite Cross Section σDFFerror versus Frequency; annotations: PDFFSEU & PDFFMBU, ~0]
$P(f_s)_{DFFerror} \propto P_{DFFSEU} + P(f_s)_{SET \to SEU} + P_{DFFMBU}$
Page 9
Summary: Most Significant Factors of
System Error Probability P(fs)error
Configuration (SRAM Based FPGAs): PConfiguration
DFFs, STATIC (SEU): PDFFSEU
DFFs, Dynamic (SET→SEU): P(fs)SET→SEU
SEFIs (Clocks & Resets, Inaccessible control circuitry): PSEFI
$P(f_s)_{error} \propto P_{Configuration} + P_{DFFSEU} + P(f_s)_{SET \to SEU} + P_{SEFI}$
Page 10
Reducing System Error: Common Mitigation
Techniques
Mitigation can be:
Embedded: built into the device library cells
User does not verify the mitigation – manufacturer does
User inserted: part of the actual design process
User must verify mitigation… Complexity is a RISK!!!!!!!!
Common Mitigation Types:
Local Triple Modular Redundancy (LTMR)
Global Triple Modular Redundancy (GTMR)
$P(f_s)_{error} \propto P_{Configuration} + P_{DFFSEU} + P(f_s)_{SET \to SEU} + P_{SEFI}$
Page 11
Example Mitigation Schemes
will use Majority Voting
$MajorityVoter = (I_1 \wedge I_2) \vee (I_0 \wedge I_2) \vee (I_0 \wedge I_1)$

I0  I1  I2  Majority Voter
0   0   0   0
0   0   1   0
0   1   0   0
0   1   1   1
1   0   0   0
1   0   1   1
1   1   0   1
1   1   1   1
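The voter equation can be checked against its truth table with a short Python sketch. The function name `majority_voter` is illustrative only; the logic is the two-of-three expression given on the slide.

```python
from itertools import product


def majority_voter(i0: int, i1: int, i2: int) -> int:
    """Return the value held by at least two of the three one-bit inputs."""
    return (i1 & i2) | (i0 & i2) | (i0 & i1)


# Enumerate all eight input combinations, with I0 as the slowest-changing input.
truth_table = [
    (i0, i1, i2, majority_voter(i0, i1, i2))
    for i0, i1, i2 in product((0, 1), repeat=3)
]

print("I0 I1 I2 Voter")
for i0, i1, i2, voted in truth_table:
    print(f"{i0}  {i1}  {i2}  {voted}")
```

Any single wrong input is outvoted by the other two, which is the property every TMR scheme in this talk relies on.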
Page 12
Mitigation and Actel Antifuse Devices
Page 13
ACTEL RTAX-S Architecture Basics
Super Cluster:
•Combinatorial Cells: C CELLS
•DFF Cells: R Cells
Source: RTAX-S/SL RadTolerant FPGAs 2009 Actel.com
Embedded RHBD:
Hardened Global Clocks and Resets
Antifuse Configuration is SEU immune
Embedded Localized TMR (LTMR) at each DFF (RCELL)
Page 14
Local Triple Modular Redundancy
(LTMR): Smallest Area & Power
Non-Mitigated
Mitigated
Triple Each DFF + Vote…
Data paths are not redundant – can only have one voter
Unprotected:
Clocks and Resets… SEFI
Transients (SET->SEU)
Internal/hidden device logic: SEFI
$P(f_s)_{error} \propto P_{Configuration} + P_{DFFSEU} + P(f_s)_{SET \to SEU} + P_{SEFI}$
Low: PDFFSEU
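What LTMR does and does not protect can be illustrated with a small behavioral model. This is a sketch under simplifying assumptions, not vendor code: the class `LtmrDff` and its methods are hypothetical, one shared data input feeds three storage copies, and an SET on the shared data path is modeled as an inverted value arriving at the clock edge.

```python
def majority(a: int, b: int, c: int) -> int:
    """Two-of-three vote on one-bit values."""
    return (a & b) | (a & c) | (b & c)


class LtmrDff:
    """Behavioral model of one LTMR register bit (illustrative only)."""

    def __init__(self) -> None:
        self.copies = [0, 0, 0]

    def clock(self, d: int) -> None:
        # The single, non-redundant data path drives all three copies.
        self.copies = [d, d, d]

    def upset(self, index: int) -> None:
        # Model an SEU by flipping one stored copy.
        self.copies[index] ^= 1

    @property
    def q(self) -> int:
        return majority(*self.copies)


reg = LtmrDff()

# Case 1: an SEU in one copy is outvoted by the other two.
reg.clock(1)
reg.upset(0)
seu_masked = reg.q == 1

# Case 2: an SET on the shared data path is captured by all three copies.
intended = 1
reg.clock(intended ^ 1)
set_captured = reg.q != intended

print("SEU masked by voter:", seu_masked)
print("SET captured despite LTMR:", set_captured)
```

The second case is why the slide lists SET→SEU as unprotected: triplicating only the DFF leaves every copy sampling the same, possibly corrupted, data.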
Page 15
ACTEL RTAX-S Embedded
Mitigation… LTMR and SETs
[Figure: Super Cluster built from combinatorial logic C-CELLs (C) and sequential logic R-CELLs (R) arranged in C-C-R groups, with B, TX, and RX blocks]
Page 16
RTAX Example: Probability of Error
Reduction
$P(f_s)_{error} \propto P_{Configuration} + P_{DFFSEU} + P(f_s)_{SET \to SEU} + P_{SEFI}$
PConfiguration: 0
PDFFSEU: Low
PSEFI: ~0
Error Probability is Per DFF bit
Error Rate must reflect frequency of operation
Page 17
Upper-Bound Error Prediction RHBD
Anti-fuse FPGA
DFF (near) Static Error Bit Rate, no CCells, PDFFSEU (Source: Actel):
$\frac{dE_{bit}}{dt} < 1 \times 10^{-10}\ \frac{Errors}{bit \cdot day}$
15 MHz to 120 MHz: Dynamic Error Bit Rate with 8 levels of CCells, P(fs)SET→SEU (Source: NASA Goddard):
$1 \times 10^{-9} < \frac{dE_{bit}(f_s)}{dt} < 6 \times 10^{-8}\ \frac{Errors}{bit \cdot day}$
Page 18
Upper-Bound Error Prediction Actel
RHBD Anti-fuse FPGA
With embedded LTMR Mitigation + Hardened Clocks:
$P(f_s)_{error} \propto P(f_s)_{SET \to SEU}$
$\frac{dE}{dt} \propto \frac{dE_{bit}(f_s)}{dt} \times \#UsedDFFs < 6 \times 10^{-8}\ \frac{Errors}{bit \cdot day} \times n\ \frac{bits}{design}$
$\frac{dE}{dt} < 3 \times 10^{-3}\ \frac{Errors}{design \cdot day}$
Thousands of years in LEO !!!!!
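The scaling from a per-bit rate to a per-design bound is a single multiplication, sketched below in Python. The helper name is hypothetical, and the DFF count of 50,000 is an assumption chosen only because it reproduces a bound of 3x10^-3 errors per design-day from the 6x10^-8 per-bit upper bound; the slides do not state n.

```python
def design_error_rate_bound(bit_rate_per_day: float, used_dffs: int) -> float:
    """Upper bound on design errors per day: per-bit bound times used DFFs."""
    return bit_rate_per_day * used_dffs


# Upper end of the dynamic per-bit range (errors per bit-day).
DYNAMIC_BIT_RATE_UPPER = 6e-8

# Hypothetical design size, not given in the source.
hypothetical_used_dffs = 50_000

bound = design_error_rate_bound(DYNAMIC_BIT_RATE_UPPER, hypothetical_used_dffs)
print(f"upper bound: {bound:.1e} errors per design-day")
```

Because the per-bit figure is itself an upper bound measured with 8 levels of CCells between DFFs, a design with shallower logic or fewer used DFFs would sit below this value.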
Page 19
Mitigation and Xilinx Virtex Devices
Page 20
Xilinx XQR4VSX55: Radiation Test
Data
Xilinx Consortium: VIRTEX-4VQ STATIC SEU CHARACTERIZATION SUMMARY: April/2008
Error source                      Probability      Error Rate           LEO (Upsets/device·day)   GEO (Upsets/device·day)
Configuration Memory: XQR4VSX55   PConfiguration   dEconfiguration/dt   7.43                      4.2
Combined SEFIs per device         PSEFI            dESEFI/dt            7.5x10^-5                 2.7x10^-5

For non-mitigated designs the most significant upset factor is: PConfiguration
M Berg, Trading ASIC and FPGA Considerations for System Insertion; IEEE Nuclear Science Radiation Effects Conference 2009
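The gap between the two error sources can be made concrete with a short Python sketch. It assumes the test data reads row by row, that is, 7.43 (LEO) and 4.2 (GEO) configuration upsets per device-day, and 7.5x10^-5 (LEO) and 2.7x10^-5 (GEO) SEFIs per device-day; the dictionary layout and names are illustrative.

```python
# Upset rates in events per device-day, as read from the test data summary.
rates = {
    "configuration": {"LEO": 7.43, "GEO": 4.2},
    "sefi": {"LEO": 7.5e-5, "GEO": 2.7e-5},
}

# How many configuration upsets occur for every SEFI, per orbit.
ratios = {
    orbit: rates["configuration"][orbit] / rates["sefi"][orbit]
    for orbit in ("LEO", "GEO")
}

for orbit, ratio in ratios.items():
    print(f"{orbit}: about {ratio:,.0f} configuration upsets per SEFI")
```

With roughly five orders of magnitude between the two rates, configuration memory is the term that mitigation and scrubbing must address first in an unmitigated SRAM-based design.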
Page 21
Global Triple Modular Redundancy (GTMR):
Largest Area → Greatest Complexity
Non-Mitigated
Mitigated
Triple Entire Design
Triple I/O and Voters
Unprotected – hidden device logic SEFIs
Cannot be an embedded strategy: complex to verify
Xilinx offers XTMR
$P(f_s)_{error} \propto P_{Configuration} + P_{DFFSEU} + P(f_s)_{SET \to SEU} + P_{SEFI}$
Low: PConfiguration, PDFFSEU, P(fs)SET→SEU
Page 22
XTMR – Capturing
Asynchronous Input data
Dynamic Analysis:
•One domain leads the other two
[Figure: three redundant domains (Async_data_tr0, Async_data_tr1, Async_data_tr2), each with a Metastability Filter and an Edge Detect Circuit feeding a VOTER]
[EDGE DETECT TIMING WAVEFORM: cycles n to n+5; INPUT: Async_DATA_tr0, Async_DATA_tr1, Async_DATA_tr2 with INPUT SKEW; Edge_detect_tr0, Edge_detect_tr1, Edge_detect_tr2; Voted rising edge detect]
Page 23
Time Domain Considerations: XTMR
Single Bit Failures …Not Detected by
Static Node Analysis
[Timing waveform: cycles n to n+5; INPUT: Async_DATA_tr0, Async_DATA_tr1, Async_DATA_tr2; CONFIGURATION BIT HIT in one domain; Edge_detect_tr0, Edge_detect_tr1, Edge_detect_tr2; Voted rising edge detect: NO EDGE DETECTION]
Page 24
Voters and Asynchronous Signal
Capture
Place voter after metastability filters
It satisfies skew constraints because voter is anchored at DFF control points
[Figure: three domains (INPUT: Async_DATA_tr0, Async_DATA_tr1, Async_DATA_tr2) with Metastability Filters, VOTERs, and Edge Detect Circuits; cycles n+1 to n+5]
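The effect of voter placement can be reproduced with a cycle-level Python sketch. It is a simplified model under several assumptions: each list holds the level seen at the output of one domain's metastability filter, domain tr0 leads the other two by one clock cycle, a configuration bit hit is modeled as domain tr1 stuck low, and a single voter stands in for the triplicated voters. All names are illustrative.

```python
def majority(a: int, b: int, c: int) -> int:
    """Two-of-three vote on one-bit values."""
    return (a & b) | (a & c) | (b & c)


def rising_edges(levels):
    """One-cycle pulse wherever a sampled level goes from 0 to 1."""
    pulses, previous = [], 0
    for level in levels:
        pulses.append(level & (previous ^ 1))
        previous = level
    return pulses


def vote(s0, s1, s2):
    """Vote three equally long sample streams cycle by cycle."""
    return [majority(a, b, c) for a, b, c in zip(s0, s1, s2)]


# Filtered levels per domain over six clock cycles.
tr0 = [0, 1, 1, 1, 1, 1]  # leading domain
tr1 = [0, 0, 0, 0, 0, 0]  # assumed stuck low after a configuration bit hit
tr2 = [0, 0, 1, 1, 1, 1]  # lags tr0 by one cycle

# Voter placed after the edge detectors: the one-cycle pulses never overlap.
vote_after_edge = vote(rising_edges(tr0), rising_edges(tr1), rising_edges(tr2))

# Voter placed after the metastability filters, before edge detection.
vote_before_edge = rising_edges(vote(tr0, tr1, tr2))

print("voter after edge detect: ", vote_after_edge)
print("voter after filters:     ", vote_before_edge)
```

In this model the single fault removes one of the two lagging domains, so the surviving edge pulses land in different cycles and are never seen by two voter inputs at once. Voting the levels first lets the skewed domains agree before the edge is extracted.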
Page 25
Upper-Bound Error Prediction:
Xilinx FPGA XTMR
PConfiguration ???
SEUs are insignificant
MBUs may be insignificant (still under investigation)
Assumes proper scrubbing
 
Assumes Unmitigated SEFIs are the most predominant source:
$P(f_s)_{error} \propto P_{SEFI}$
$\frac{dE_{SEFI}}{dt} < 3 \times 10^{-5}\ \frac{Errors}{Device \cdot day}$
$\frac{dE}{dt} \propto \frac{dE_{SEFI}}{dt} \times n_{Device} < 3 \times 10^{-5} \times n\ \frac{Errors}{day}$
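The SEFI-limited bound scales with the number of devices and is easy to turn into a time between errors. The Python sketch below assumes SEFIs are the only remaining error source, uses the 3x10^-5 errors per device-day upper bound, and picks a hypothetical system of four devices; the function names and the device count are not from the slides.

```python
DAYS_PER_YEAR = 365.25

# Upper bound on SEFIs per device-day.
SEFI_RATE_UPPER = 3e-5


def system_sefi_rate_bound(n_devices: int, rate_per_device: float = SEFI_RATE_UPPER) -> float:
    """Upper bound on system errors per day for n identical devices."""
    return rate_per_device * n_devices


def mean_days_between_errors(rate_per_day: float) -> float:
    """Reciprocal of a constant error rate."""
    return 1.0 / rate_per_day


hypothetical_devices = 4
bound = system_sefi_rate_bound(hypothetical_devices)
years = mean_days_between_errors(bound) / DAYS_PER_YEAR

print(f"upper bound: {bound:.1e} errors per day")
print(f"mean time between errors: at least {years:.1f} years")
```

Because the rate is an upper bound, the resulting interval is a lower bound on the mean time between SEFI-induced errors.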


Page 26
Tools
Page 27
Mitigation and Actel Tools
Mentor Graphics has offered LTMR for anti-fuse
devices
There is a desire to apply LTMR to Actel Flash-based
products
DTMR is another approach (GTMR with no
clock redundancy):
•Flash
•Assists with SETs in anti-fuse devices
Page 28
Mitigation and Xilinx Tools
Currently XTMR is commercially available from
Xilinx
NASA REAG has identified some issues:
Asynchronous domain crossings
Verification of XTMR insertion
Mentor is now evaluating GTMR with Formal
Checking
NASA REAG is expecting to use Mentor GTMR
(preliminary version) for V5 radiation testing
Page 29