Fault-Tolerance in VHDL Description:
Transient-Fault Injection &
Early Reliability Estimation
Fabian Vargas, Alexandre Amory
[email protected]
Catholic University – PUCRS
Electrical Engineering Dept.
Av. Ipiranga, 6681
90619-900 Porto Alegre
Brazil

Raoul Velazco
[email protected]
TIMA-INPG Laboratory
46, Av. Félix Viallet
38031 – Grenoble
France

Summary
1. Motivation: Important issues in the design of FT circuits for space applications
2.1. The Proposed Approach:
- Built-In Reliability Functions Library
- Target Architecture: main blocks
2.2. Reliability Early-Estimation:
- Main steps of the procedure and fault-coverage estimation
- Fault-Injection Mechanism: LFSR to inject single/multiple faults
- Example of fault injection in the VHDL: Generate Statement
3. Conclusions & Future Work
1. Motivation:
Important concerns of computer designers for space applications:
- Power consumption, area usage, weight, and dependability (availability, reliability, and testability).
Main Characteristics & Drawbacks:
- Application-specific systems (requirements change frequently from application to application): very expensive systems!
- Synthesis (EDA) tools are not effective development facilities:
  - the short time available for making remedial changes to a faulty application in time-critical systems is often not respected;
  - compilers are not optimized.
- There is a lack of commercial libraries with special components (incorporating FT facilities).
- Developing an FPGA/ASIC board to test/validate the FT strategies takes time and money!
1. Motivation:
- Radiation causes Single-Event Upsets (SEUs) in memory elements:
  - Processor latches and cache memory cells are sensitive to SEUs.
  - FPGAs store logic/routing configuration in latches.
[Figure: (a) cross-section of an NFET (terminals labeled body, S, gate, and D, biased at 0 V and 5 V; n+ diffusions in a p substrate) showing the ion track, funneling, and the electron drift and diffusion currents; (b) collected current versus time (ns), with a prompt component (drift + funneling) followed by a delayed component (diffusion).]
Fig. 1. Illustration of the charge-collection mechanism that causes a single-event upset:
(a) particle strike and charge generation;
(b) shape of the current pulse generated in the n+p junction during charge collection.
2.1. The Proposed Approach:
Built-In Reliability Functions Library:
achieving the desired circuit fault-tolerance
[Flowchart: the VHDL Circuit Description and the Built-In Reliability Functions Library feed the Generation of the Fault-Tolerant HW, which produces the High-Reliability HW Part. In the Circuit Reliability Verification Step, the VHDL Simulator, driven by the Fault-Injection Constraints, yields the Transient-Fault Coverage. Desired Reliability Level? NO: try to select different reliability functions. YES: HW Synthesis.]
Fig. 2. Block diagram of the FT-PRO tool being developed to automate the generation of complex circuits whose storage elements tolerate transient faults.
2.1. The Proposed Approach:
Built-In Reliability Functions Library:
achieving the desired circuit fault-tolerance
Fig. 3. Target block diagram generated by the FT-PRO Tool:
(a) for a single register;
(b) for an n-register bank.
2.1. The Proposed Approach:
Built-In Reliability Functions Library:
achieving the desired circuit fault-tolerance
Fig. 4. Control block diagrams: (a) Parity Generator;
(b) Checker/Corrector.
2.2. Reliability Early-Estimation:
injecting transient faults (SEUs) in VHDL
code

1. Insert the transient (single or multiple) fault into the VHDL code according to a predefined MTBF (mean time between failures).
2. Simulate the circuit.
3. After simulation, examine the primary outputs (POs) of the circuit to verify, for each injected transient fault, whether it affected the functional operation of the circuit.
2.2. Reliability Early-Estimation:
injecting transient faults (SEUs) in VHDL
code

For each injected fault, one of three conclusions can be drawn:
- The fault did not propagate to the POs; it is then considered redundant.
- The fault propagated to the POs of the circuit and was detected by the built-in reliability functions appended to the memory elements. (This can be verified by reading out the outputs of the comparators in the VHDL code after simulation.) The reliability of the circuit is then maintained.
- The fault produced an erroneous PO and was not detected by the appended hardware; the reliability of the circuit is then reduced.
  - This happens either because the reliability functions used fail to detect such a fault, or because the choice of memory elements made fault-tolerant is inadequate (important blocks of storage elements remain in their original form).
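
The three outcomes above can be captured in a small helper for a simulation testbench. This is only a sketch: the package name, the type FAULT_CLASS, and the function CLASSIFY are hypothetical and do not come from the FT-PRO tool. It assumes that PO_CORRUPTED is obtained by comparing the POs with a fault-free reference run, and that ERROR_FLAGGED reflects the comparator outputs of the appended hardware.

```vhdl
-- Hypothetical helper package, not part of the original slides.
package FAULT_CLASS_PKG is
  -- one value per conclusion listed above
  type FAULT_CLASS is (REDUNDANT, DETECTED, UNDETECTED);

  -- PO_CORRUPTED  : the fault reached the primary outputs (assumed to be
  --                 found by comparison with a fault-free reference run)
  -- ERROR_FLAGGED : the built-in reliability functions signalled an error
  function CLASSIFY(PO_CORRUPTED  : boolean;
                    ERROR_FLAGGED : boolean) return FAULT_CLASS;
end FAULT_CLASS_PKG;

package body FAULT_CLASS_PKG is
  function CLASSIFY(PO_CORRUPTED  : boolean;
                    ERROR_FLAGGED : boolean) return FAULT_CLASS is
  begin
    if not PO_CORRUPTED then
      return REDUNDANT;    -- never propagated to the POs
    elsif ERROR_FLAGGED then
      return DETECTED;     -- propagated, caught by the appended hardware
    else
      return UNDETECTED;   -- propagated and missed: reliability is reduced
    end if;
  end CLASSIFY;
end FAULT_CLASS_PKG;
```

Counting the DETECTED and REDUNDANT results over all injected faults gives the quantities needed to estimate the transient-fault coverage.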
2.2. Reliability Early-Estimation:
injecting transient faults (SEUs) in VHDL
code

At the end of this process, we compute the overall transient-fault coverage as a function of the predefined MTBF for the target application as follows:

$$\text{Transient\_Fault\_Coverage}(\text{MTBF}) = \frac{K}{M - E}$$

Where:
K is the number of detected transient faults;
M is the total number of injected transient faults;
E is the number of redundant transient faults in the VHDL code.
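
As a worked illustration of the coverage ratio K / (M - E), take purely hypothetical counts (not measured results): M = 1000 injected faults, of which E = 200 are redundant and K = 760 are detected.

```latex
\text{Transient\_Fault\_Coverage}(\text{MTBF})
  = \frac{K}{M - E}
  = \frac{760}{1000 - 200}
  = \frac{760}{800}
  = 0.95
```

Redundant faults are removed from the denominator because they never reach the POs, so they say nothing about whether the appended hardware would have caught them.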
2.2. Reliability Early-Estimation:
injecting transient faults (SEUs) in VHDL
code
Fig. 5. Approach used to inject faults in the VHDL code. (Example for
a circuit that operates with 8 information bits plus 5 check bits).
2.2. Reliability Early-Estimation:
injecting transient faults (SEUs) in VHDL
code
Three different operating modes:
(a) normal_mode. No fault injection is possible during simulation.
(b) precision_fault-injection_mode. Single or multiple faults can be injected into the selected memory register. The user defines which bits are flipped, and in which sequence, by loading specific seeds into the LFSR before clocking it. Clocking the LFSR then injects the fault(s) into the selected memory element. To inject further faults, reset the LFSR, load another seed, and repeat the operation.
(c) random_fault-injection_mode. A single reset is performed at the beginning of the process to load the first seed. After this, each time the user wants to inject a fault into the selected memory element, only the LFSR clock signal needs to be activated: a fault is then pseudo-randomly injected into the selected memory element by the LFSR.
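
A minimal sketch of an LFSR that fits the modes described above is given below. The slides do not show the LFSR itself, so everything here is an assumption: the 4-bit width, the feedback polynomial x^4 + x^3 + 1, the asynchronous seed load on reset, and the architecture name RTL. Only the port names follow the port map used in the REG_FT listing.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical 4-bit LFSR; width, polynomial and reset style are assumptions.
entity LFSR is
  port(
    LFSR_IN  : in  std_logic_vector(3 downto 0);  -- seed, loaded while RST_IN = '1'
    LFSR_OUT : out std_logic_vector(3 downto 0);  -- pattern driven into the register
    CLK_IN   : in  std_logic;                     -- each rising edge produces a new pattern
    RST_IN   : in  std_logic);
end LFSR;

architecture RTL of LFSR is
  signal STATE : std_logic_vector(3 downto 0);
begin
  -- LFSR_IN is in the sensitivity list so that a seed change during
  -- reset is also seen in simulation.
  process(CLK_IN, RST_IN, LFSR_IN)
  begin
    if RST_IN = '1' then
      -- asynchronous seed load; an all-zero seed would lock the LFSR
      STATE <= LFSR_IN;
    elsif rising_edge(CLK_IN) then
      -- Fibonacci LFSR for x^4 + x^3 + 1: shift left, feed back bit 3 xor bit 2
      STATE <= STATE(2 downto 0) & (STATE(3) xor STATE(2));
    end if;
  end process;

  LFSR_OUT <= STATE;
end RTL;
```

With any non-zero seed this polynomial cycles through all 15 non-zero states. In the precision mode the seed is reloaded before each injection, whereas in the random mode a single seed is loaded and the LFSR is simply clocked again for each new fault.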
2.2. Reliability Early-Estimation:
injecting transient faults (SEUs) in VHDL
code

At the VHDL code level, the LFSR can be instantiated by means of a generate statement.

This mechanism conditionally elaborates a portion of a VHDL description, so the fault-injection logic is included only when the corresponding mode is selected.
library ieee;
use ieee.std_logic_1164.all;

package FAULT_INJECTION_PKG is
  ...
  -- fault injection mode
  --   0 => normal mode
  --   1 => precision fault injection mode
  --   2 => random fault injection mode
  constant FAULT_INJECTION : integer := 0;
  -- set to '1' to allow fault injection into the high-order data bits
  constant FAULT_DATA_HIGH : std_logic := '1';
  ...
end FAULT_INJECTION_PKG;
-----------------------------------------------
library ieee;
use ieee.std_logic_1164.all;
use work.FAULT_INJECTION_PKG.all;

entity REG_FT is
  port(
    CLOCK,
    RESET,
    -- chip enable
    CE    : in  std_logic;
    -- input from data bus
    D     : in  std_logic_vector(7 downto 0);
    -- output to data bus
    Q     : out std_logic_vector(7 downto 0);
    ERROR : out std_logic_vector(1 downto 0) );
end REG_FT;
architecture REG_FT of REG_FT is
  -- register (info + check bits)
  signal REG : std_logic_vector(12 downto 0);
  -- info bits
  alias INFO_REG  : std_logic_vector(7 downto 0) is REG(12 downto 5);
  -- check bits
  alias CHECK_REG : std_logic_vector(4 downto 0) is REG(4 downto 0);
  ...
begin
  ...
  NORMAL_MODE:
  if FAULT_INJECTION = 0 generate
    -- input data from data bus
    INFO_REG  <= D;
    -- parity from parity generator
    CHECK_REG <= PARITY_GEN;
  end generate;

  PRECISION_FAULT_INJECTION_MODE:
  if FAULT_INJECTION = 1 generate
    FAULT_DATA_HIGH_BLOCK:
    if FAULT_DATA_HIGH = '1' generate
      LFSR_DATA_HIGH: LFSR port map(
        LFSR_IN  => INFO_REG(7 downto 4),
        LFSR_OUT => LFSR_OUT_DATA_HIGH,
        CLK_IN   => CLK_LFSR,
        RST_IN   => RESET);
      -- insert a fault in the 4 MSB bits
      INFO_REG <= LFSR_OUT_DATA_HIGH & INFO_REG(3 downto 0);
    end generate;
  end generate;
  ...
end REG_FT;
2.2. Reliability Early-Estimation:
injecting transient faults (SEUs) in VHDL
code
Consider the clock signal C1 used to drive the LFSR. The goal of this control signal is to determine the moment when the LFSR evaluates, i.e. the exact moment when a fault is injected into the selected memory element.
Possible implementation at the VHDL code level: the after clause of a signal assignment, which attaches timing to the transitions of C1.
...
C1 <= '1' after 100 ms,
      '0' after 200 ms,
      '1' after 300 ms,
      '0' after 400 ms;
...
Note that the number of faults injected depends on the seed loaded into the LFSR.
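
When many injection instants are needed, the same C1 timing can be produced by a testbench process instead of one long waveform. The sketch below is an assumption-laden illustration, not the authors' testbench: the entity name C1_STIMULUS_TB, the type TIME_LIST, and the constants INJECT_AT and PULSE_WIDTH are all hypothetical. The chosen values reproduce the 100 ms / 200 ms / 300 ms / 400 ms transitions of the slide.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical stand-alone testbench; all names are illustrative.
entity C1_STIMULUS_TB is
end C1_STIMULUS_TB;

architecture SIM of C1_STIMULUS_TB is
  type TIME_LIST is array (natural range <>) of time;
  -- injection instants, assumed to be in increasing order and spaced
  -- by more than PULSE_WIDTH
  constant INJECT_AT   : TIME_LIST := (100 ms, 300 ms);
  constant PULSE_WIDTH : time := 100 ms;
  signal C1 : std_logic := '0';
begin
  -- raise C1 at each injection instant, then lower it again
  STIMULUS: process
  begin
    for I in INJECT_AT'range loop
      wait for INJECT_AT(I) - now;  -- wait until the next injection instant
      C1 <= '1';
      wait for PULSE_WIDTH;
      C1 <= '0';
    end loop;
    wait;                           -- no further injections
  end process;

  -- log every rising edge, i.e. every moment the LFSR would evaluate
  MONITOR: process(C1)
  begin
    if rising_edge(C1) then
      report "C1 rising edge at " & time'image(now);
    end if;
  end process;
end SIM;
```

Keeping the instants in a constant array makes it easy to derive them from the predefined MTBF without editing the stimulus process.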
3. Conclusions & Future Work:
We presented a new approach to automate the generation of fault-tolerant complex circuits described in VHDL. The approach uses coding techniques associated with registers or groups of registers to detect the occurrence of a bit-flip (Single-Event Upset, SEU) and to locate the affected memory element (thus performing error correction).

In a second step, this approach also estimates the reliability of such complex circuits with respect to SEUs. This procedure is also performed at an early stage of the design process, i.e., at the circuit VHDL specification level.

This approach is being automated through the development of the FT-PRO tool.
A test vehicle (a Z80-like microprocessor) is being implemented in a commercial FPGA to be exercised under radiation at the Lawrence Berkeley Lab facility (88-inch cyclotron). The experimental results will make it possible to verify the effectiveness of the reliability early-estimation procedure, and will provide valuable feedback for future improvements of the built-in reliability functions database.
