Preprocessing Input Data to Augment Fault Tolerance in Space Applications
Jayakrishnan K. Nair
Zahava Koren
Israel Koren
C. Mani Krishna
Architecture and Real-Time Systems Lab – University of Massachusetts, Amherst
Motivation
Applications in harsh environments
  Onboard processing of huge amounts of sensor data in real time
  Vital to anticipate and counter faults preemptively
Example: space systems are vulnerable to many faults
  Bombardment by charged particles in space
    Alpha particles
    Cosmic rays
  Power glitches and stray-capacitance effects
  Crosstalk at CCD sensors in the detector array of imaging systems
Data Faults
Advanced real-time applications in hostile environments
  High likelihood of input data faults
  Data faults occur at the source, in transit from the source, or while in memory
We focus on input data errors
  Re-running the process, or running a secondary copy, is useless because the input remains the same
  Current schemes handle process faults well, but not input data faults
Input precision and reliability are vital to good performance
  Corruption at the input translates into unreliable, imprecise output
Proposed Solution – Input Preprocessing
Input data can be preprocessed to detect, and dynamically recover from, input errors
  Use the inherent redundancy in natural data and the application semantics
    Spatial, spectral and temporal correlation
Dynamic preprocessing algorithms
  Application-specific: use domain knowledge of the input datasets
  Statistically analyze the input data to find potential outliers
  Use locality modeling of the data in space, spectrum and/or time
  Use absolute theoretical bounds on natural data
  Automatically adjust to changing turbulence in the data
    Better results with more cohesive datasets
    Fewer false alarms (pseudo-corrections)
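A minimal sketch of the statistical-outlier and absolute-bound checks listed above, in Python. It assumes a 1-D sequence of samples; the helper name `flag_outliers`, the bounds `lo`/`hi`, the factor `k`, and the use of the median step between consecutive samples as a "turbulence" index are illustrative assumptions, not the authors' algorithm.

```python
import statistics

def flag_outliers(samples, lo, hi, k=6.0):
    """Return indices of suspect samples (hypothetical helper, not the authors' code).

    A sample is suspect if it lies outside the absolute bounds [lo, hi], or if
    even its closer neighbour is more than k times the turbulence index away.
    """
    n = len(samples)
    # Turbulence index: median absolute step between consecutive samples. The
    # median keeps a few corrupted samples from inflating the threshold.
    steps = [abs(b - a) for a, b in zip(samples, samples[1:])]
    turbulence = statistics.median(steps) if steps else 0.0
    threshold = k * max(turbulence, 1e-12)   # tiny floor for perfectly flat data

    suspects = []
    for i, x in enumerate(samples):
        if not lo <= x <= hi:                 # violates an absolute bound
            suspects.append(i)
        elif 0 < i < n - 1:                   # interior: compare with both neighbours
            nearest = min(abs(x - samples[i - 1]), abs(x - samples[i + 1]))
            if nearest > threshold:
                suspects.append(i)
    return suspects
```

Because the threshold is derived from the data rather than fixed, it grows with the turbulence of the data set; computing it over a sliding window instead of the whole sequence would let it track local changes.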
Next Generation Space Telescope
* Ref: NASA
A deep-space telescope spacecraft to replace Hubble
  Detectors sample once every 1000 s and are exposed to heavy radiation
  Limited downlink bandwidth (6 GB/day) -> onboard processing
  COTS-processor-based system -> increased vulnerability
  Cosmic rays can corrupt pixel data; these must be cleaned
Multiple readouts during each baseline (N = 64)
  This redundancy is used to identify, and recover from, transient effects
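The readout redundancy can be illustrated with a simple jump-rejection sketch. It assumes non-destructive readouts in which the accumulated signal rises roughly linearly, so a cosmic-ray hit shows up as one outsized step between consecutive readouts. The function `clean_ramp`, the rejection factor `k`, and the median/MAD rule are hypothetical choices, not the NGST flight pipeline.

```python
import statistics

def clean_ramp(readouts, k=5.0):
    """Estimate a pixel's signal per readout interval from repeated readouts.

    Hypothetical sketch: a cosmic-ray hit appears as one abnormally large step
    between consecutive readouts, so such steps are rejected before averaging.
    Returns (mean_step, indices_of_rejected_steps).
    """
    steps = [b - a for a, b in zip(readouts, readouts[1:])]
    med = statistics.median(steps)
    # Median absolute deviation: a spread estimate that ignores rare jumps.
    mad = statistics.median([abs(s - med) for s in steps])
    limit = k * max(mad, 1.0)            # floor of one count avoids a zero threshold
    rejected = [i for i, s in enumerate(steps) if abs(s - med) > limit]
    bad = set(rejected)
    kept = [s for i, s in enumerate(steps) if i not in bad]
    return statistics.mean(kept), rejected
```

With 64 readouts there are 63 steps, so discarding one or two of them costs little precision.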
Input Analytical Model
Gaussian Correlation Model (GCM): the difference between consecutive pixel intensities follows a Gaussian distribution
$$x(i+1) = x(i) + \epsilon_i$$
where $x(i)$ are the pristine pixels in a dataset and $\epsilon_i$ is a Gaussian random variable with zero mean and a standard deviation representative of simulated NGST datasets
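A GCM dataset can be synthesized directly from the recurrence. This sketch assumes 16-bit unsigned pixels; the function name `gcm_dataset`, the starting intensity, `sigma`, and the clamping to [0, 65535] are assumptions chosen for illustration, not values from the NGST simulations.

```python
import random

def gcm_dataset(n, start=30000, sigma=50.0, seed=None):
    """Generate n pristine pixels under the Gaussian Correlation Model (GCM).

    x(i+1) = x(i) + eps_i, with eps_i ~ N(0, sigma^2).
    start, sigma and the 16-bit clamp are illustrative assumptions.
    """
    rng = random.Random(seed)
    pixels = [start]
    for _ in range(n - 1):
        nxt = pixels[-1] + rng.gauss(0.0, sigma)
        # Round to an integer intensity and clamp to the unsigned 16-bit range.
        pixels.append(min(max(round(nxt), 0), 0xFFFF))
    return pixels
```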
Fault Models
Uncorrelated model: bitflips occur independently with a fixed probability $p_0$
Correlated model: block faults affecting contiguous memory regions show a correlated pattern
  Correlation in both the vertical and horizontal directions is considered
  The bitflip probability $p_{\mathrm{corr}}$ increases with the length $R$ of the adjacent run of bitflips:
$$p_{\mathrm{corr}} = \sum_{j=1}^{R} \left(p_{\mathrm{ini}}\right)^{j}$$
  where $p_{\mathrm{ini}}$ is the probability of initializing a fresh run, and $R$ is the length of the longer run among both directions
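Both fault models can be sketched as injectors over a list of 16-bit words. The uncorrelated injector flips each bit independently with a fixed probability p0. The correlated injector is a hypothetical reading of the run-length model: it treats the data as a grid (words as rows, bit positions as columns), takes the horizontal run from lower-order bits in the same word and the vertical run from the same bit in preceding words, and assumes R counts the current bit, so an isolated bit starts a fresh run with the run-initialization probability p_ini. All function names are illustrative.

```python
import random

def p_corr(run_length, p_ini):
    """Correlated-model bitflip probability: sum of p_ini**j for j = 1..R."""
    return sum(p_ini ** j for j in range(1, run_length + 1))

def inject_uncorrelated(words, p0, bits=16, rng=random):
    """Flip each of the low `bits` bits of every word independently with probability p0."""
    out = []
    for w in words:
        for b in range(bits):
            if rng.random() < p0:
                w ^= 1 << b
        out.append(w)
    return out

def inject_correlated(words, p_ini, bits=16, rng=random):
    """Hypothetical correlated injector over a (word x bit-position) grid.

    R = 1 + the longer of the runs of already-flipped bits ending just below this
    bit position in the same word (horizontal) or at the same position in the
    preceding words (vertical), so an isolated bit flips with probability p_ini.
    """
    flipped = [[False] * bits for _ in words]
    out = list(words)
    for r in range(len(words)):
        for c in range(bits):
            left = 0                          # horizontal run in this word
            while c - left - 1 >= 0 and flipped[r][c - left - 1]:
                left += 1
            up = 0                            # vertical run in earlier words
            while r - up - 1 >= 0 and flipped[r - up - 1][c]:
                up += 1
            if rng.random() < p_corr(1 + max(left, up), p_ini):
                flipped[r][c] = True
                out[r] ^= 1 << c
    return out
```

Passing a seeded `random.Random` instance as `rng` makes an injection campaign reproducible.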
Algo_NGST for Dynamic Preprocessing
Application-specific for NGST; uses temporal correlation
Dynamic statistical analysis to obtain a voter matrix
  Pixels are paired with their immediate neighbors in front and behind, within a pixel window of width $w$, so as to minimize the mean distance
  Indices of the turbulence across the data are obtained
Voters are filtered out based on a sensitivity parameter $s \in [1, 100]$
  For trading off effectiveness against computational overhead
Three bit windows are identified using dynamic bitmasks
  Window A is the most stable bit window and holds the MSBs
  Window C holds the LSBs, which change with every pixel, and is therefore ignored
  Window B, in the middle, uses a temporal model for bitwise consistency
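The slide does not state how the dynamic bitmasks are computed; the following is one hypothetical reconstruction. Given the voters for a pixel, Window A is taken as the leading bits on which all voters agree, Window C as the low bits spanned by the largest step between consecutive voters, and Window B as everything in between. As a simplified stand-in for the temporal consistency model, a bit in A or B is overwritten only when every voter agrees on it. All names and rules here are assumptions.

```python
def _disagreement(voters):
    """Bit positions in which at least one voter differs from the first voter."""
    mask = 0
    for v in voters:
        mask |= v ^ voters[0]
    return mask

def bit_windows(voters, bits=16):
    """Return (mask_a, mask_b, mask_c) for a list of voter pixel values.

    A: leading bits identical in every voter (the stable MSBs)
    C: low bits that may change between consecutive voters (ignored)
    B: the bits in between
    """
    full = (1 << bits) - 1
    a_low = _disagreement(voters).bit_length()      # bits >= a_low agree everywhere
    step = max((abs(b - a) for a, b in zip(voters, voters[1:])), default=0)
    c_high = min(step.bit_length(), a_low)           # bits < c_high treated as noise
    mask_a = full & ~((1 << a_low) - 1)
    mask_c = (1 << c_high) - 1
    mask_b = full & ~mask_a & ~mask_c
    return mask_a, mask_b, mask_c

def repair_pixel(pixel, voters, bits=16):
    """Overwrite bits of `pixel` in windows A and B on which all voters agree."""
    mask_a, mask_b, _ = bit_windows(voters, bits)
    unanimous = ~_disagreement(voters) & ((1 << bits) - 1)
    trusted = (mask_a | mask_b) & unanimous
    return (pixel & ~trusted) | (voters[0] & trusted)
```

For voters 0x1234, 0x1236 and 0x1235, Window A covers bits 2 to 15, so a pixel read as 0x9234 (MSB flipped) is restored to 0x1234, while a legitimate LSB difference such as 0x1235 is left alone.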
Image Smoothing Algorithms
Optimal Median Smoothing
  Each pixel is replaced by the median of a sliding window
  More robust than mean smoothing
Bitwise Majority Voting
  Each bit in a pixel is replaced by the majority vote of the corresponding bit position in a sliding window
  Preserves bit-wise information in the uncorrupted bits
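Both smoothing filters can be sketched over a 1-D sequence of integer pixels. The window width, the truncation at the edges, and the tie rule in the majority vote (keep the pixel's own bit) are assumptions; the "optimal" window selection implied by the slide title is not reproduced.

```python
import statistics

def _window(pixels, i, width):
    """Centred sliding window around index i, truncated at the edges."""
    half = width // 2
    return pixels[max(0, i - half):i + half + 1]

def median_smooth(pixels, width=3):
    """Replace each pixel by the median of its sliding window."""
    # median_low keeps results integral when edge windows have even length
    return [statistics.median_low(_window(pixels, i, width)) for i in range(len(pixels))]

def majority_smooth(pixels, width=3, bits=16):
    """Replace each bit by the majority value of that bit position in the window."""
    out = []
    for i, own in enumerate(pixels):
        window = _window(pixels, i, width)
        value = 0
        for b in range(bits):
            ones = sum((p >> b) & 1 for p in window)
            if 2 * ones > len(window) or (2 * ones == len(window) and (own >> b) & 1):
                value |= 1 << b          # majority is 1, or a tie and own bit is 1
        out.append(value)
    return out
```

Majority voting changes only the bit positions where a pixel disagrees with most of its window, whereas the median substitutes a whole neighbouring value.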
Precision Improvement for GCM datasets
[Figure: relative error in dataset (%) vs. probability of a bitflip in data]
A promising reduction factor in the average relative error of the input, ranging from ~50 to ~1000, is obtained over the practical range of bitflip probabilities below 10%
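The slides do not define "average relative error" precisely; a plausible reading, assumed here, is the mean of |observed - pristine| / |pristine| over all samples, with the reduction factor being the ratio of that error before and after preprocessing. The function names are illustrative.

```python
def avg_relative_error(observed, pristine):
    """Mean of |observed - pristine| / |pristine| over all samples, in percent.

    Assumes no pristine sample is zero.
    """
    total = sum(abs(o - p) / abs(p) for o, p in zip(observed, pristine))
    return 100.0 * total / len(pristine)

def reduction_factor(pristine, corrupted, repaired):
    """How many times smaller the relative error is after preprocessing."""
    before = avg_relative_error(corrupted, pristine)
    after = avg_relative_error(repaired, pristine)
    return float("inf") if after == 0 else before / after
```

For example, one pixel doubled by an MSB flip out of four gives a 25% average relative error; if preprocessing leaves only a 1% residual on a single pixel, the error drops to 0.25%, a reduction factor of 100.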
Computational Overhead
The sensitivity parameter can be adjusted to scale the algorithm, striking an appropriate balance between correction and computational overhead
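How the sensitivity parameter filters voters is not specified on the slides. The following sketch assumes one possible interpretation: each interior pixel gets a deviation score (its distance to the closer of its two temporal neighbours), and only the top `sensitivity` percent, with `sensitivity` in [1, 100], are passed on to a more expensive repair step, so the work grows roughly linearly with the parameter. `select_suspects` is a hypothetical name.

```python
def select_suspects(pixels, sensitivity):
    """Pick the pixel indices to examine, given a sensitivity in [1, 100].

    Hypothetical interpretation: interior pixels are ranked by their distance to
    the closer temporal neighbour, and the top `sensitivity` percent are kept.
    """
    if not 1 <= sensitivity <= 100:
        raise ValueError("sensitivity must lie in [1, 100]")
    scores = []
    for i in range(1, len(pixels) - 1):
        d = min(abs(pixels[i] - pixels[i - 1]), abs(pixels[i] - pixels[i + 1]))
        scores.append((d, i))
    scores.sort(reverse=True)                         # most suspicious first
    count = max(1, round(len(scores) * sensitivity / 100))
    return sorted(i for _, i in scores[:count])
```

A low sensitivity examines only the most glaring anomalies cheaply; raising it widens the net, catching subtler faults at a higher computational cost.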
Results for correlated input faults
[Figure: relative error in dataset (%) vs. probability of a bitflip in data]
The two smoothing algorithms perform very similarly, but Algo_NGST performs better across all probabilities because it produces fewer false alarms
Orbital Thermal Imaging Spectroscope (OTIS)
Reads radiation reflected by the Earth’s surface at various wavelengths
Computes emissivity and temperature for each coordinate
Input and output are represented as three-dimensional floating-point arrays
Unlike NGST, there is no temporal redundancy
  Spectral correlation – unreliable, as it falls sharply outside a band
  Spatial correlation with locality bounds – usable for preprocessing
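Spatial correlation with locality bounds can be sketched on one 2-D band (a single wavelength slice) of the 3-D array. The bound used here, max(rel_tol * |median|, abs_tol) around the median of the up-to-eight spatial neighbours, and the replacement of suspect cells by that median are assumptions for illustration, not the authors' rule; `spatial_repair` is a hypothetical name.

```python
import statistics

def spatial_repair(grid, rel_tol=0.05, abs_tol=1e-3):
    """Repair outliers in one 2-D band of a 3-D OTIS-style float array.

    Hypothetical locality bound: a cell is suspect if it differs from the median
    of its (up to 8) spatial neighbours by more than max(rel_tol*|median|, abs_tol).
    Suspect cells are replaced by that median. Assumes the band has more than one
    cell. Returns (repaired_grid, suspects); the input grid is not modified.
    """
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    suspects = []
    for r in range(rows):
        for c in range(cols):
            # Neighbours are read from the original grid so repairs do not cascade.
            neigh = [grid[rr][cc]
                     for rr in range(max(0, r - 1), min(rows, r + 2))
                     for cc in range(max(0, c - 1), min(cols, c + 2))
                     if (rr, cc) != (r, c)]
            med = statistics.median(neigh)
            if abs(grid[r][c] - med) > max(rel_tol * abs(med), abs_tol):
                out[r][c] = med
                suspects.append((r, c))
    return out, suspects
```

Using the median rather than the mean keeps a single corrupted cell from distorting the bound applied to its neighbours.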
OTIS Datasets
Three distinctive datasets from OTIS
* Ref: E. Ciocca
  Blob: broad areas of unchanging temperature; high correlation
  Stripe: prominent vertical turbulence; other regions benign
  Spots: a plethora of spots; turbulence distributed over the entire region
Assumptions for Preprocessing
  Exceptions occur as trends, never as single outliers
  Single-bit anomalies are faults
  Any theoretically out-of-bound value is a fault
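The three assumptions can be combined into a per-value check. This sketch assumes OTIS values are IEEE-754 single-precision floats and uses emissivity, which is physically bounded by [0, 1], as the example quantity; the spatial `estimate` (for instance a neighbour median), the tolerance `tol`, and all function names are hypothetical. A value that is out of bounds or isolated from its estimate is first tested for a single-bit explanation; failing that, it is treated as a lone outlier and replaced by the estimate.

```python
import struct

def f32_bits(x):
    """Bit pattern of x as an IEEE-754 single-precision float."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bits_f32(u):
    """Single-precision float whose bit pattern is u."""
    return struct.unpack("<f", struct.pack("<I", u))[0]

def single_bit_repair(value, estimate, tol):
    """Undo one bitflip if that brings value within tol of estimate, else None."""
    u = f32_bits(value)
    best = None
    for b in range(32):
        cand = bits_f32(u ^ (1 << b))
        if cand == cand and abs(cand - estimate) <= tol:   # cand == cand skips NaN
            if best is None or abs(cand - estimate) < abs(best - estimate):
                best = cand
    return best

def check_emissivity(e, estimate, tol=0.02):
    """Apply the out-of-bound, single-bit and no-lone-outlier rules to one value."""
    if 0.0 <= e <= 1.0 and abs(e - estimate) <= tol:
        return e                          # in bounds and consistent with neighbours
    repaired = single_bit_repair(e, estimate, tol)
    # No single-bit explanation: a lone outlier is still a fault, so use the estimate.
    return repaired if repaired is not None else estimate
```

For instance, flipping the top exponent bit of 0.93 yields roughly 3.2e38, which is out of bounds; flipping that bit back is the only single-bit change that lands near a neighbourhood estimate of 0.92, so the original value is recovered.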
Performance Comparison for “Blob”
[Figure: relative error in dataset (%) vs. probability of a bitflip in data]
A very high gain in precision is obtained when bitflips are present in highly correlated data.
Performance Comparison for “Stripe”
[Figure: relative error in dataset (%) vs. probability of a bitflip in data]
Performance Comparison for “Spots”
[Figure: relative error in dataset (%) vs. probability of a bitflip in data]
OTIS Results for correlated input faults
[Figure: relative error in dataset (%) vs. probability of a bitflip in data]
Beyond a certain bitflip probability, the preprocessing becomes profligate in generating false positives
Conclusions
Input faults in space systems
  Process-fault tolerance schemes cannot handle input faults
Input preprocessing
  Exploits the inherent redundancy in the input for proactive error correction
  Natural correlation in temporal, spectral or spatial locality
  Application-specific preprocessing algorithms for dynamic recovery
  Uses application semantics and domain knowledge of the input data
Results
  Works well for both uncorrelated and correlated faults
  Significant improvements in input precision across varying fault probabilities and statistically diverse datasets
Thank You
URL: http://www.ecs.umass.edu/ece/realtime