A Simplified Approach to Fault
Tolerant State Machine Design
for Single Event Upsets
Melanie Berg
Overview
Presentation describes “Hardened by Design” techniques at a high level of abstraction for FPGA/ASIC logic design
Background
— Definition of Fault Tolerance
— State Machines
— Synchronous Design Theory
Proposed Method of SEU detection
Proposed Method of SEU correction
SLIDE 2
D219/MAPLD 2004
Definition of Fault Tolerance
Masking or recovering from erroneous conditions in a system once they have been detected
The degree of fault tolerance implementation is defined by your system level requirements, i.e. what actually is acceptable behavior upon error
Questions that must be answered within the system requirements documentation:
— Does your system only need to detect an error?
— How quickly must the system respond to an error?
— Must your system also correct the error?
— Is the system susceptible to more than one error per clock cycle?
SLIDE 3
Synchronous Design with Asynchronous Events
This discussion focuses on sequential Single Event Upsets (SEUs) within a synchronous design environment.
The SEU is considered a soft (temporary) error which has occurred due to a DFF being hit by a charged particle.
Configuration or SRAM errors will not be considered
Although the design is synchronous, it is very important to note that the SEU is an asynchronous event…
— Generally not taken into account
— Metastability and unpredictable events can occur
— Can invoke a Single Event Functional Interrupt (SEFI)
SLIDE 4
Common Fault Tolerant Implementation
Triple Mode Redundancy (TMR) is the most commonly implemented solution for SEU tolerance.
— Why? Because it is a very simple solution
In many cases it is not implemented correctly
— Glitches within the TMR voting logic (due to mitigation across separate clock domains or hazardous combinational logic) must be taken into account in case an SEU occurs near a clock edge
TMR can be very area extensive
[Figure: three DFFs (D, Q, SET, CLR) whose Q outputs feed the Voting Logic]
SLIDE 5
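The voting idea behind TMR can be sketched with a small behavioral model. This is only an illustration in Python, not the presenter's circuit: the names `majority` and `flip_bit` and the 4-bit example value are assumptions, and the model has no notion of time, so it cannot show the glitches or clock-edge effects the slide warns about.

```python
def majority(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority vote: (a AND b) OR (a AND c) OR (b AND c)."""
    return (a & b) | (a & c) | (b & c)


def flip_bit(value: int, bit: int) -> int:
    """Model a single event upset by inverting one bit of a stored value."""
    return value ^ (1 << bit)


# Three redundant copies of the same (hypothetical) 4-bit register value.
stored = 0b1011
copies = [stored, stored, stored]

# An SEU hits copy C only; the voter still returns the original value.
copies[2] = flip_bit(copies[2], 1)
print(format(majority(*copies), "04b"))

# If two copies are upset in the same bit, the vote follows the wrong value.
copies[1] = flip_bit(copies[1], 1)
print(format(majority(*copies), "04b"))
```

The second case is why the system requirements must state whether more than one upset per clock cycle is possible.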
Glitches in TMR Circuitry: Example
For this example, C will be hit by an SEU; the TMR logic should stay stable. However, poor TMR circuitry was synthesized and a glitch occurs on OutSig
If OutSig glitches near a clock edge, unpredictable results within the counter occur
[Figure: signals A, B, and C feed the TMR Circuit; its output OutSig drives input E of a 32-bit Counter with sysclk and Reset]
SLIDE 6
Glitchy TMR Circuitry Continued
SLIDE 7
Proposed EDAC Methodology
Goal: The proposed Error Detection and Correction (EDAC) techniques are:
— Targeted for synchronous Finite State Machine Designs
— Less area extensive than TMR
— Glitch Free and synchronous: Reduces the rate of SEFI
Note: Synchronous Design techniques referred to in this presentation are derived from the ASIC industry and are implemented using HDL…
— DFF data inputs should not change within the setup and hold window of the DFF; otherwise metastability and unpredictable functionality will occur
— Within a synchronous design, metastability will only happen at clock domain crossings… Must use metastability filters (synchronizers) to protect against these asynchronous events
— Synchronous design theory minimizes clock boundary crossings
— This is a challenge when SEUs can occur at any point in time anywhere in the circuit
SLIDE 8
Synchronous State Machines
A Finite State Machine (FSM) is designed to deterministically transition through a pattern of defined states
A synchronous FSM utilizes flip-flops to hold its current state, transitions according to a clock edge and only accepts inputs that have been synchronized to the same clock
Generally FSMs are utilized as control mechanisms
Concern/Challenge:
— If an SEU occurs within an FSM, the entire system can lock up into an unreachable state: SEFI!!!
SLIDE 9
Synchronous State Machines
The structure consists of four major parts:
— Inputs
— Current State Register
— Next State Logic
— Output logic
[Figure: FSM block diagram with labels Inputs, Clock, Next State, Current State, and Outputs]
SLIDE 10
Encoding Schemes
Each state of an FSM must be mapped into some type of encoding (pattern of bits)
Once the state is mapped, it is then considered a defined (legal) state
Unmapped bit patterns are illegal states
Example: Five states need to be mapped. There is only one input: Start
[Figure: state diagram with states IDLE, GetData, ProcessData, BadData, and SendData; transition labels Start=0 and Start=1]
SLIDE 11
Encoding Schemes
Registers: binary encoding
STATES (5):
IDLE        :000
GET_DATA    :001
PROCESS_DATA:010
BAD_DATA    :011
SEND_DATA   :100
Good state: SEND_DATA (1 0 0)
Bad state: unmapped (1 1 0)
Registers: One Hot encoding
STATES (5):
IDLE        :00001
GET_DATA    :00010
PROCESS_DATA:00100
BAD_DATA    :01000
SEND_DATA   :10000
Good state: SEND_DATA (1 0 0 0 0)
Bad state: unmapped (1 1 0 0 0)
SLIDE 12
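The difference between the two encodings can be made concrete by counting how many register patterns are left unmapped. The sketch below is a plain Python model of the two tables on this slide; the helper name `unmapped` is an assumption made for illustration.

```python
STATES = ["IDLE", "GET_DATA", "PROCESS_DATA", "BAD_DATA", "SEND_DATA"]

# Binary encoding: state i is stored as the 3-bit value i.
binary = {name: format(i, "03b") for i, name in enumerate(STATES)}

# One-hot encoding: state i is stored as a 5-bit word with only bit i set.
one_hot = {name: format(1 << i, "05b") for i, name in enumerate(STATES)}


def unmapped(encoding: dict) -> list:
    """List every bit pattern of the register width that maps to no state."""
    width = len(next(iter(encoding.values())))
    legal = set(encoding.values())
    patterns = [format(v, f"0{width}b") for v in range(2 ** width)]
    return [p for p in patterns if p not in legal]


print(binary["SEND_DATA"], one_hot["SEND_DATA"])
print(len(unmapped(binary)), "unmapped binary patterns:", unmapped(binary))
print(len(unmapped(one_hot)), "unmapped one-hot patterns")
```

With five states, the 3-bit binary register leaves only three illegal patterns, while the 5-bit one-hot register leaves 27 of its 32 patterns illegal.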
Safe State Machines???
A “Safe” State Machine has been defined as one that:
— Has a set of defined states
— Can deterministically jump to a defined state if an illegal state has been reached (due to an SEU).
Synthesis tools offer a “Safe” option (demand from our industry):
-- Enumerated state type; the synthesis tool chooses the bit encoding
TYPE states IS ( IDLE, GET_DATA, PROCESS_DATA, SEND_DATA, BAD_DATA );
SIGNAL current_state, next_state : states;
-- Request the "Safe" implementation for this state type
attribute SAFE_FSM: Boolean;
attribute SAFE_FSM of states: type is true;
However… Designers Beware!
— The synthesis tool's Safe option is not deterministic if an SEU occurs near a clock edge!
SLIDE 13
Binary Encoding: How Safe is the “Safe” Attribute?
If a Binary encoded FSM flips into an illegal (unmapped) state, the safe option will return the FSM into a known state that is defined by the others or default clause
If a Binary encoded FSM flips into a good state, this error will go undetected.
— If the FSM is controlling a critical output, this phenomenon can be very detrimental!
— How safe is this?
SLIDE 14
Safe State Machines???
STATES (5):
IDLE     :000
TURNON_A :001
TURNOFF_A:010
TURNON_B :011
TURNOFF_B:100
State(1) Flips upon SEU:
Using the “Safe” attribute will transition the user to a specified legal state upon an SEU
— State bits (2 1 0): Good State 1 0 0 becomes Illegal State 1 1 0 (unmapped)
Using the “Safe” attribute will not detect the SEU: This could cause detrimental behavior
— State bits (2 1 0): Good State TURNON_A 0 0 1 becomes legal State TURNON_B 0 1 1
SLIDE 15
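The two cases on this slide can be generalized by trying every single-bit flip on every state of the binary encoding. The following is a hedged Python model, not tool behavior: `classify_flips` and `count_undetected` are names invented for this sketch, and "detected" here only means that the flipped pattern is unmapped, which is the condition the “Safe” recovery logic reacts to.

```python
# Binary encoding taken from the slide (bit 2 is the leftmost bit).
ENCODING = {
    "IDLE": 0b000,
    "TURNON_A": 0b001,
    "TURNOFF_A": 0b010,
    "TURNON_B": 0b011,
    "TURNOFF_B": 0b100,
}
WIDTH = 3
LEGAL = {code: name for name, code in ENCODING.items()}


def classify_flips(state: str) -> dict:
    """Map each bit index to the state reached when that bit flips.

    The value is None when the flipped pattern is unmapped (illegal).
    """
    code = ENCODING[state]
    return {bit: LEGAL.get(code ^ (1 << bit)) for bit in range(WIDTH)}


def count_undetected() -> tuple:
    """Return (flips that land in a legal state, total single-bit flips)."""
    outcomes = [r for s in ENCODING for r in classify_flips(s).values()]
    return sum(r is not None for r in outcomes), len(outcomes)


print(classify_flips("TURNOFF_B"))  # bit 1 flip gives 110: unmapped
print(classify_flips("TURNON_A"))   # bit 1 flip gives 011: TURNON_B
print(count_undetected())
```

For this particular encoding the model finds that 10 of the 15 possible single-bit flips land in another legal state, so most upsets would pass unnoticed by a recovery mechanism that only reacts to unmapped patterns.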
One-Hot vs. Binary
There used to be a consensus suggesting that Binary is “safer” than One-Hot
— Based on the idea that One-Hot requires more DFFs to implement an FSM and thus has a higher probability of incurring an error
This theory has been changed!
— Most of the community now understands that although One-Hot requires more registers, it has the built-in detection that is necessary for safe design
— Binary encoding can lead to a very “un-safe” design
SLIDE 16
Proposed SEU Error Detection: One-Hot
One-Hot requires that only one bit be active high per clock period
If more than one bit is turned on, then an error will be detected.
Combinational XNOR over the FSM bits is sufficient for SEU detection… even if an SEU occurs near a clock edge
A MUX can be used to transition the current state into a defined “ERROR STATE” if the parity check fails
If the system cannot receive Multiple Event Upsets within one clock period, then the circuitry can never flip into a legal state (illegally)!
[Figure: one-hot FSM block diagram with labels Inputs, Clock, combinational logic, Next State, Current State, Outputs, XNOR, Metastability filter to synchronize SEU, MUX, and Error State Pattern]
SLIDE 17
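The parity check and the MUX described above can be modeled in a few lines. This is a behavioral sketch under stated assumptions: `ERROR_PATTERN` is a hypothetical placeholder for whichever code the designer assigns to the error state, the parity is computed as an XOR reduction (the slide's XNOR is the same check with inverted polarity), and the metastability filter is not modeled.

```python
WIDTH = 5
ONE_HOT_CODES = [1 << i for i in range(WIDTH)]  # 00001, 00010, ..., 10000

# Hypothetical: the register value the designer reserves for the error state.
ERROR_PATTERN = 0b00001


def parity_ok(state_bits: int) -> bool:
    """True when an odd number of bits is set, as in any valid one-hot code."""
    return bin(state_bits).count("1") % 2 == 1


def mux_next_state(current: int, normal_next: int) -> int:
    """Pass the normal next state, or force the error pattern on bad parity."""
    return normal_next if parity_ok(current) else ERROR_PATTERN


# Every single-bit flip of every one-hot code leaves 0 or 2 bits set,
# so the parity check fails and the upset is detected.
single_flips = [code ^ (1 << b) for code in ONE_HOT_CODES for b in range(WIDTH)]
print(all(not parity_ok(p) for p in single_flips))

# Two flips in the same cycle can produce another valid one-hot code,
# which is the multiple-upset case the slide excludes.
print(parity_ok(0b00001 ^ 0b00011))
```

A parity check accepts any pattern with an odd number of set bits, so it relies on the single-upset assumption stated on the slide.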
FSM SEU: Error Correction: Using Companion States
There exist many publications on Error Correction theory.
None directly address how to correctly implement FSM fault correction while using current day synthesis tools.
— Glitch control: Generally synthesis tools will produce “glitchy” logic
— Synthesis “optimization” algorithms will erase the necessary redundancy for EDAC
— The user must sometimes hand instantiate logic
— The user must place the necessary attributes to avoid redundant logic erasure.
SLIDE 18
Error Correction within One Cycle: Using Companion States
We’ll base the derivation on a 4 state FSM:
[Figure: Original FSM with states STATEA (Outsig='1'), STATEB (Outsig='0'), STATEC (Outsig='0'), and STATED (Outsig='0'); transition labels Intrans='0' and Intrans='1']
SLIDE 19
Error Correction within One Cycle: Using Companion States
1. Find an encoding such that the states have a Hamming distance of 3 (at least 3 bits must be different from state to state)...
— 00000 (state-A),
— 11100 (state-B),
— 01111 (state-C),
— 10011 (state-D).
— Five bits are necessary to encode a four-state machine in order to achieve the required Hamming distance of three.
SLIDE 20
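Both claims in step 1 can be checked mechanically: that the four codes are pairwise at least three bits apart, and that no shorter register would do. The Python sketch below uses helper names (`hamming`, `min_distance`, `smallest_width`) that are assumptions for illustration; the width search is a brute-force enumeration, practical only for small machines like this one.

```python
from itertools import combinations

# Encoding from the slide.
CODES = {"A": 0b00000, "B": 0b11100, "C": 0b01111, "D": 0b10011}


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two codes differ."""
    return bin(a ^ b).count("1")


def min_distance(codes: dict) -> int:
    """Smallest Hamming distance over all pairs of state codes."""
    return min(hamming(a, b) for a, b in combinations(codes.values(), 2))


def smallest_width(n_states: int, distance: int, max_width: int = 8):
    """Brute-force the narrowest register that admits such an encoding."""
    for width in range(1, max_width + 1):
        for candidate in combinations(range(2 ** width), n_states):
            if all(hamming(a, b) >= distance
                   for a, b in combinations(candidate, 2)):
                return width
    return None


for (n1, c1), (n2, c2) in combinations(CODES.items(), 2):
    print(n1, n2, hamming(c1, c2))
print("minimum distance:", min_distance(CODES))
print("narrowest register for 4 states at distance 3:", smallest_width(4, 3))
```

The search confirms the slide's statement that five bits are needed for four states at distance three.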
Error Correction within One Cycle: Using Companion States
2. For each encoding, calculate the companion encodings such that the Hamming distance is one… for example:
— Companion encoding for state A (00000) is: 00001, 00010, 00100, 01000, 10000
— Companion encoding for state B (11100) is: 11101, 11110, 11000, 10100, 01100
SLIDE 21
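Step 2 amounts to flipping each bit of a state code once. A minimal Python sketch follows, assuming the 5-bit encoding from step 1; the function name `companions` is invented for this example.

```python
WIDTH = 5
CODES = {"A": 0b00000, "B": 0b11100, "C": 0b01111, "D": 0b10011}


def companions(code: int, width: int = WIDTH) -> list:
    """All patterns at Hamming distance one from code, lowest bit first."""
    return [code ^ (1 << bit) for bit in range(width)]


def as_bits(values: list) -> list:
    """Format register values as fixed-width binary strings."""
    return [format(v, f"0{WIDTH}b") for v in values]


for name, code in CODES.items():
    print(name, format(code, "05b"), as_bits(companions(code)))

# Because the state codes are at least three bits apart, no pattern can be
# a code or companion of two different states.
covered = set()
for code in CODES.values():
    covered.update([code] + companions(code))
print(len(covered))
```

Each state owns six patterns (its code plus five companions), so the four states together claim 24 distinct patterns of the 32 available.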
Error Correction within One Cycle: Using Companion States
When implementing the state machine, state A is encoded as 00000 and then (theoretically) “OR-ed” with all of its companion encodings. This covers all possible single-bit SEUs
Do the same for all other states
Use the output of the “OR-ed” states to determine next state logic.
— Thus if a bit flips… the companion state will catch it and the FSM will be able to correctly determine the next state
Be careful! The “OR” logic is more complex than simply using a string of “OR” gates.
SLIDE 22
Error Correction within One Cycle: Glitch Control
One major issue that is frequently overlooked is SEUs occurring near clock edges
If this occurs, your error checking logic may cause a glitch
Due to routing timing differences, this can cause incorrect values to be latched into the current state registers.
Refer to a Karnaugh Map for glitch-less implementation
The designer may have to hand instantiate the logic if the synthesis tool does not adhere to the VHDL as expected
SLIDE 23
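Error Correction within One Cycle: Glitch Control
[Figure: Karnaugh map over State(0), State(1), State(2), and State(3), with rows and columns labeled 00, 01, 11, 10 and five cells marked 1]
StateA companion states SOP (including State(4) dimension):
$\overline{State(0)}\,\overline{State(1)}\,\overline{State(2)}\,\overline{State(3)} + \overline{State(0)}\,\overline{State(1)}\,\overline{State(2)}\,\overline{State(4)} +$
$\overline{State(0)}\,\overline{State(1)}\,\overline{State(3)}\,\overline{State(4)} + \overline{State(0)}\,\overline{State(2)}\,\overline{State(3)}\,\overline{State(4)} +$
$\overline{State(1)}\,\overline{State(2)}\,\overline{State(3)}\,\overline{State(4)}$
SLIDE 24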
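The sum of products for StateA can be checked exhaustively. The sketch below makes one assumption explicit: every literal in the five product terms is read as complemented, since the all-zero code of StateA can only satisfy the terms that way. It then tests two things: that the expression is true exactly for 00000 and its five companions, and that every pair of adjacent true patterns shares a product term, which is the usual textbook condition for avoiding a static-1 hazard when a single input changes. The function names are invented for this example.

```python
from itertools import combinations

WIDTH = 5

# Each product term lists the four bit positions that must all be 0.
TERMS = [frozenset(c) for c in combinations(range(WIDTH), WIDTH - 1)]


def term_true(term: frozenset, pattern: int) -> bool:
    """A term is true when every bit it names is 0 in the pattern."""
    return all(((pattern >> bit) & 1) == 0 for bit in term)


def sop(pattern: int) -> bool:
    """OR of the five product terms."""
    return any(term_true(t, pattern) for t in TERMS)


def in_state_a_group(pattern: int) -> bool:
    """True for 00000 and for every pattern one bit away from it."""
    return bin(pattern).count("1") <= 1


def adjacent_pairs_covered() -> bool:
    """Check that adjacent true patterns always share one product term."""
    on_set = [p for p in range(2 ** WIDTH) if sop(p)]
    for p, q in combinations(on_set, 2):
        if bin(p ^ q).count("1") == 1:
            if not any(term_true(t, p) and term_true(t, q) for t in TERMS):
                return False
    return True


print(all(sop(p) == in_state_a_group(p) for p in range(2 ** WIDTH)))
print(adjacent_pairs_covered())
```

Passing both checks in a model says nothing about what a synthesis tool actually emits, so the gate-level netlist still has to be inspected.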
Error Correction within One Cycle: Glitch Control
The designer will have to include the synthesis directives in order to turn off the tool's “optimization”:
— Preserve_driver
— Preserve_signal
Always check the gate level output of the synthesis tool.
SLIDE 25
Conclusion
This presentation proposes methods of Fault Tolerant State Machine implementation to address potential IC SEU susceptibility.
Be aware of potential glitches due to asynchronous SEUs occurring near a clock edge…
— Mitigation Techniques must be Glitch Free!
— Mitigation may need a synchronization circuit
— Due to metastability and routing delay differences, SEUs can be more catastrophic than expected
Special directives must be used in order to drive the synthesis tools when implementing fault tolerant redundant logic because the tools are generally focused on area and speed optimization.
SLIDE 26