Presentation - IEEE High Performance Extreme Computing
Scrubbing Optimization via Availability Prediction (SOAP) for
Reconfigurable Space Computing
HPEC 2012
Quinn Martin
Alan George
SOAP
Background
FPGAs and Radiation in Space
Traditional Scrubbing Methods
SOAP Approach
Mission Parameters
Markov Models
Mission Case Studies
Results
Conclusions
2
FPGAs
Field-Programmable Gate Arrays (FPGAs)
Implement custom digital logic hardware with
fabric of logic resources and interconnect
Lookup tables (LUTs) implement combinational logic
User flip flops (FFs) implement sequential logic
Switch and connection boxes route among resources
Many are reconfigurable
Allows update of routing and logic state
Partial reconfiguration can update partition of device
E.g., Virtex from Xilinx and Stratix from Altera
3
Reconfigurable FPGAs in Space
Advantages
Very high performance/power ratio
Reconfigurable (fully and partially)
Adaptable to changing environments and mission
requirements
Can update design after launch
Disadvantages
Relatively difficult to design/test applications
Configuration memory vulnerable to radiation
Upsets can change the application or processor architecture in
unpredictable ways
Must repair upsets via configuration scrubbing
4
Radiation Effects on FPGAs
Single-event Effects (SEE)
Single-event Latchup (SEL) – Causes current
spike that may damage device
Single-event Upset (SEU) – Changes state of
bit(s), e.g. from logic ‘0’ to ‘1’
Can be single-bit upset (SBU) or multi-bit upset (MBU)
Single-event Functional Interrupt (SEFI) – Like
SEU, but affecting critical device resource
Total Ionizing Dose
Degrades performance over time leading to
eventual device failure
5
Xilinx V-5/V-6 Configuration
Programmed via SelectMAP interface
Runtime configuration interface
Also allows readback of existing configuration
32 bits per configuration word
Parallel bus width of 8, 16, or 32 bits
Max clock frequency 100 MHz
Configuration memory arranged in frames
Minimum unit of access to config. memory
Virtex-5 – 41 words per frame
Virtex-6 – 81 words per frame
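The frame figures above translate directly into a per-frame transfer time. The following is a minimal sketch, assuming an idealized bus that moves one bus-width of data per clock with no protocol overhead; the function names are illustrative and not part of any Xilinx tool.

```python
WORD_BITS = 32  # bits per configuration word (from the slide)
WORDS_PER_FRAME = {"Virtex-5": 41, "Virtex-6": 81}  # from the slide


def frame_bits(device: str) -> int:
    """Size of one configuration frame in bits."""
    return WORDS_PER_FRAME[device] * WORD_BITS


def frame_transfer_time(device: str, bus_width_bits: int, cclk_hz: float) -> float:
    """Idealized seconds to move one frame over the bus (assumes no overhead)."""
    if bus_width_bits not in (8, 16, 32):
        raise ValueError("SelectMAP bus width must be 8, 16, or 32 bits")
    return frame_bits(device) / (bus_width_bits * cclk_hz)


for dev in WORDS_PER_FRAME:
    t = frame_transfer_time(dev, bus_width_bits=8, cclk_hz=100e6)
    print(f"{dev}: {frame_bits(dev)} bits/frame, {t * 1e6:.2f} us per frame")
```

Under these assumptions a Virtex-5 frame is 1312 bits and a Virtex-6 frame is 2592 bits; real transfers will take longer once command and addressing overhead is included.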
6
FPGA Scrubbing
FPGA Configuration Scrubbing
Quickly repairs SEUs before accumulation
Accumulation defeats redundancy strategies (e.g.,
TMR)
Fast repair can prevent SEUs from manifesting as
errors
Can be decomposed into basic scrubbing
techniques
Correction techniques repair upsets
Detection techniques discover and locate upsets
7
FPGA Scrubbing Techniques
Correction Techniques
Golden Copy – Repairs configuration based on
known “golden” copy (e.g., in rad-hard PROM)
Frame ECC – Repairs based on per-frame error
syndrome code stored on-chip
Detection Techniques
Frame ECC – Detects based on per-frame
SECDED Hamming code
CRC-32 – Detects using device-wide CRC-32
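The idea behind device-wide CRC detection can be illustrated in software. This is a sketch only: it uses Python's `zlib.crc32`, which is not necessarily the same polynomial or procedure as the device's built-in check, and the `golden` byte string is a hypothetical stand-in for configuration data.

```python
import zlib


def crc32_of(config: bytes) -> int:
    """CRC-32 over the whole configuration image."""
    return zlib.crc32(config) & 0xFFFFFFFF


def flip_bit(config: bytes, bit_index: int) -> bytes:
    """Return a copy of config with one bit inverted (a simulated SEU)."""
    corrupted = bytearray(config)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)


golden = bytes(range(256)) * 4        # hypothetical configuration data
reference_crc = crc32_of(golden)      # computed once, while known good

upset = flip_bit(golden, 1234)
upset_detected = crc32_of(upset) != reference_crc
clean_flagged = crc32_of(golden) != reference_crc
print(upset_detected, clean_flagged)
```

A mismatch against the stored reference value signals that the configuration has changed; correction still requires one of the techniques listed above.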
8
FPGA Scrubbing Strategies
Scrubbing Strategies
Any combination of detection and correction
techniques with controller to implement algorithm
Blind Scrubbing – Golden copy correction only
Readback Scrubbing – Some detection
technique used
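The difference between the two strategies can be sketched as two loops over configuration frames. This is an illustrative model, not a scrubber controller: frames are plain byte strings, and the comparison against the golden copy stands in for whichever detection technique (Frame ECC or CRC-32) a real readback scrubber would use.

```python
from typing import List


def blind_scrub(config: List[bytes], golden: List[bytes]) -> int:
    """Blind scrubbing: rewrite every frame from the golden copy, no detection."""
    for i, frame in enumerate(golden):
        config[i] = frame
    return len(golden)  # number of frames written


def readback_scrub(config: List[bytes], golden: List[bytes]) -> int:
    """Readback scrubbing: read each frame, rewrite only frames found upset."""
    writes = 0
    for i, frame in enumerate(golden):
        if config[i] != frame:  # stand-in for Frame ECC or CRC-32 detection
            config[i] = frame
            writes += 1
    return writes


golden = [bytes([n]) * 8 for n in range(4)]  # four hypothetical 8-byte frames
upset = list(golden)
upset[2] = b"\xff" + upset[2][1:]            # inject one upset into frame 2

memory_a, memory_b = list(upset), list(upset)
print(blind_scrub(memory_a, golden), readback_scrub(memory_b, golden))
```

Both loops leave the memory equal to the golden copy; the blind version writes every frame on every pass, while the readback version writes only where an upset was found.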
9
FPGA Scrubbing Strategies
10
SOAP Approach
Scrubbing Optimization via Availability
Prediction (SOAP)
Uses system availability as primary metric for
scrubbing efficacy
Models scrubbing strategies as Markov diagrams
Vary free parameters to find optimal scrubbing
system
Environmental parameters λ and α (orbits)
System parameters B and fCCLK (memory and pin
constraints)
Scrubbing parameters μ and γ (device configuration
capability)
11
SOAP Approach
12
Environmental Parameters
λ – SEU rates for devices in various orbits of
interest
Calculated per-bit and per-device using CREME96
α – Correction factors for single-bit and multi-bit upsets (SBU/MBU)
From beam tests on Virtex-5 devices
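A per-bit rate can be scaled to a per-device rate by multiplying by the number of configuration bits. The sketch below assumes that simple linear scaling and does not model the α correction factors; the numeric inputs are hypothetical placeholders, not CREME96 output.

```python
SECONDS_PER_DAY = 86_400


def device_upset_rate(per_bit_per_day: float, config_bits: int) -> float:
    """Upsets per device per second, assuming independent, identical bits."""
    return per_bit_per_day * config_bits / SECONDS_PER_DAY


def mean_time_between_upsets(rate_per_s: float) -> float:
    """Mean seconds between upsets for a constant-rate (Poisson) process."""
    return 1.0 / rate_per_s


# Hypothetical inputs for illustration only.
lam = device_upset_rate(per_bit_per_day=1e-9, config_bits=10_000_000)
print(f"lambda = {lam:.3e} upsets/s")
print(f"mean time between upsets = {mean_time_between_upsets(lam) / SECONDS_PER_DAY:.1f} days")
```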
13
System Parameters
Factors chosen by the system designer
based on available memories, power
budget, etc.
Affect scrubbing detection and correction
rates (see equations on next slide)
B – Configuration bus width in bits
fCCLK – Configuration clock speed in Hz
14
Scrubbing Parameters
μ – Repair rate for scrubbing technique
(per second)
γ – Detection rate for scrubbing technique
(per second)
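The equations relating these rates to B and fCCLK are not reproduced in this transcript. As a rough stand-in, the sketch below assumes that one full pass over the configuration memory takes (configuration bits) / (B × fCCLK) seconds and that the rate is the reciprocal of that pass time; both are simplifying assumptions, and the device size used is hypothetical.

```python
def scrub_cycle_time(config_bits: int, bus_width_bits: int, cclk_hz: float) -> float:
    """Idealized seconds for one full pass over configuration memory."""
    return config_bits / (bus_width_bits * cclk_hz)


def rate_from_cycle(config_bits: int, bus_width_bits: int, cclk_hz: float) -> float:
    """Assumed rate (per second): one repair or detection opportunity per pass."""
    return 1.0 / scrub_cycle_time(config_bits, bus_width_bits, cclk_hz)


# Hypothetical device size chosen so that an 8-bit bus at 33 MHz takes 1 s.
bits = 264_000_000
print(scrub_cycle_time(bits, 8, 33e6), rate_from_cycle(bits, 8, 33e6))
print(rate_from_cycle(bits, 32, 100e6))  # wider, faster bus gives a higher rate
```

The point of the sketch is the direction of the dependence: doubling B or fCCLK halves the pass time and so doubles the assumed rate.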
15
Markov Algorithm Models
Blind
No detection
Built-in CRC-32
Basic detection
Frame ECC with CRC-32
CRC acts as “safety net” for upsets
undetected by Frame ECC
Frame ECC with CRC-32 and Essential
Bits (EB)
Only scrubs errors that may be critical
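The Markov diagrams themselves are figures and are not captured in this transcript. To show how availability falls out of such a model, here is a deliberately simplified sketch, assuming each strategy can be approximated by a single cycle of states: up, then (optionally) upset-but-undetected, then detected-and-repairing, then back to up. For a continuous-time chain of that shape, the steady-state probability of each state is proportional to the mean time spent in it.

```python
from typing import List


def cyclic_chain_steady_state(rates: List[float]) -> List[float]:
    """Steady state of a CTMC whose states form one cycle.

    rates[i] is the rate of leaving state i for state i+1 (wrapping around).
    Flow balance gives pi_i * rates[i] = constant, so pi_i is proportional
    to the mean dwell time 1 / rates[i].
    """
    dwell = [1.0 / r for r in rates]
    total = sum(dwell)
    return [d / total for d in dwell]


def availability_blind(lam: float, mu: float) -> float:
    """Two states: up --lam--> upset --mu--> up."""
    return cyclic_chain_steady_state([lam, mu])[0]


def availability_readback(lam: float, gamma: float, mu: float) -> float:
    """Three states: up --lam--> undetected --gamma--> detected --mu--> up."""
    return cyclic_chain_steady_state([lam, gamma, mu])[0]


# Hypothetical rates (per second) for illustration only.
print(availability_blind(lam=1e-6, mu=1.0))
print(availability_readback(lam=1e-6, gamma=2.0, mu=2.0))
```

For the two-state case this reduces to the familiar μ / (λ + μ). The models in the presentation distinguish more states (for example critical versus non-critical upsets), so these numbers are not comparable to the reported results.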
16
Blind Scrubbing
17
Readback CRC-32 Scrubbing
18
CRC-32 w/ Frame ECC Scrubbing
19
Case Study
Applies SOAP method to hypothetical
systems with realistic parameters
Devices
Xilinx Virtex-5
Xilinx Virtex-6
Orbits
ISS low earth orbit (LEO)
Molniya highly elliptical orbit (HEO)
8-bit SelectMAP bus at 33 MHz
Accounts for access speed of slow rad-hard
PROM
20
Case Study
Two mission types
Non upset critical (non-UC) – System continues
to run upon detection and correction of upset
Only count critical upsets as system “unavailable”
Upset critical (UC) – System requires reset
upon detection of upset to ensure state integrity
Requires detection
All detected upsets render system unavailable for
reset period
Will benefit from essential bits mask used in
detection
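The benefit of an essential bits mask in the UC case can be sketched with a two-state approximation: the system is up until a detected upset occurs, then unavailable for a fixed reset period. The sketch assumes detection is immediate and that the mask simply scales the rate of upsets that trigger a reset; `essential_fraction` and all numeric values are hypothetical.

```python
def uc_availability(upset_rate: float, reset_time_s: float,
                    essential_fraction: float = 1.0) -> float:
    """Availability when every detected upset forces a reset.

    upset_rate: device upset rate (per second)
    reset_time_s: seconds the system is unavailable per reset
    essential_fraction: assumed share of upsets that hit essential bits
    """
    if not 0.0 <= essential_fraction <= 1.0:
        raise ValueError("essential_fraction must be between 0 and 1")
    effective_rate = upset_rate * essential_fraction
    # Mean up time is 1 / effective_rate; mean down time is reset_time_s.
    return 1.0 / (1.0 + effective_rate * reset_time_s)


# Hypothetical inputs for illustration only.
no_mask = uc_availability(upset_rate=1e-5, reset_time_s=10.0)
with_mask = uc_availability(upset_rate=1e-5, reset_time_s=10.0, essential_fraction=0.1)
print(no_mask, with_mask)
```

With fewer upsets treated as reset-worthy, the system spends less time in reset, which is the mechanism by which the mask raises availability for UC missions.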
21
Non-UC Results
Continuous blind scrubbing offers highest
availability
CRC-32 offers similar availability with low
implementation complexity
Frame ECC suffers because triple-bit upsets (TBUs) can be
falsely corrected, resulting in further errors
22
UC Results
23
UC Results
24
Results
Frame ECC with CRC-32 and Essential Bits
mask offers highest availability
Roughly one extra nine over other methods
Xilinx-provided soft-error mitigation (SEM) core
implements similar strategy
Other strategies still competitive
Complex state machine or software and additional
memory required for Frame ECC/EB
Model does not account for vulnerability associated
with internal scrubbing
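"One extra nine" means unavailability drops by about a factor of ten. A small helper makes the conversion explicit; the availability values below are hypothetical examples, not figures from the case study.

```python
import math


def nines(availability: float) -> float:
    """Number of nines: -log10 of the unavailable fraction."""
    if not 0.0 <= availability < 1.0:
        raise ValueError("availability must be in [0, 1)")
    return -math.log10(1.0 - availability)


# Hypothetical availabilities for illustration only.
for a in (0.999, 0.9999):
    print(f"A = {a}: {nines(a):.2f} nines")
```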
25
Conclusions
Predicts availability for various FPGA
scrubbing strategies on real and hypothetical
platforms
Uses analytical models rather than
experimentation
Markov availability modeling with parametric
approach
Allows optimization of scrubbing strategy during
design phase
In case study, blind scrubbing best for non-UC
and Frame ECC with EB mask best for UC
26