Fault-Tolerance in VHDL Description: Transient

Circuit Modeling and Fault Injection Approach to
Predict SEU Rate and MTTF in Complex Circuits
Fabian Vargas, Alexandre Amory
[email protected]
Catholic University – PUCRS
Electrical Engineering Dept.
Av. Ipiranga, 6681
90619-900 Porto Alegre
Brazil
Summary





1. Motivation and Preliminary Considerations
2. Circuit Modeling and Fault Injection Approach
2.1. VHDL Skeleton and the Error Management Unit
3. Development of the Statistical Models
3.1. SEU Rate
– Cross-Section
– Error (SEU) Rate
3.2. MTTF Rate
– Reliability Model
4. Discussions & Example of Computation
4.1. 3 FT-implementations: Data-path, Control-path, Both
5. Final Considerations
1. Motivation and Preliminary Considerations

1962: first predictions that Single-Event Upsets (SEUs) could occur.

By 1975: the existence of SEUs was verified in practice.

From that time to now:
- Extensive theoretical work to explain failure mechanisms,
- Sophisticated test techniques and procedures developed to
extrapolate the laboratory data failure rates to realistic
radiation environments like space, nuclear power plants,
or commercial flights (33,000 feet).
• Laboratory experiments are typically performed using the in-flux test method, with a high-energy particle accelerator (cyclotron).
• In practice, only one ion species is used, because changing ion sources in the accelerator is costly.
• The DUT package lid is removed and the device is placed in a vacuum chamber. The test socket is mounted on a platform that can be rotated to change the angle of incidence between the ion beam and the chip surface.
• The DUT is electrically exercised by a tester connected to the test socket through a set of cables and special connectors into the vacuum chamber.
In summary, the in-flux test method:
• Provides very accurate SEU rate predictions.
• Drawbacks:
- High cost: two or three cyclotron hours may cost tens of thousands of dollars.
- Requires the development of specific HW (and SW) interfaces, which costs money and time during the design process.
- Time-to-market is affected: rigorous test sets must be developed and validated through long procedures before the device characterization step itself takes place.
Compared to traditional in-flux test methods, the proposed approach:
• does not require laboratory experiments to characterize microelectronic devices for operation in radiation environments;
• is simple to execute;
• is intrinsically low-cost;
• provides not only fault injection mechanisms adapted to circuits modeled in VHDL, but also a fault modeling strategy that represents real radiation-induced transient faults (SEUs) in memory elements.
[Figure omitted] Fig. 1. Comparison between the design flows of devices for operation in radiation environments: (a) the traditional in-flux method and (b) the proposed approach.
2. Circuit Modeling & Fault Injection Approach
Goal: prepare the VHDL code to run in a fault simulation process.
1) Generate and instantiate an "Error Management Unit" (EMU) inside the architecture of the circuit's main VHDL code.
Goal: control the fault injection process during fault simulation.
2) Run the Srand algorithm from ANSI C.
Goal: randomly generate the time instants at which faults are injected during simulation.
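The random generation of injection instants in step 2 can be sketched as follows. This is a minimal Python illustration, not the authors' C program: `random.Random(seed)` stands in for seeding the generator with `srand`, and the function name, the seed value, and the use of integer seconds as time units are assumptions made for the example. The count of 239 instants over 4 hours is taken from the case study in Section 4.

```python
import random


def generate_injection_times(seed, count, sim_time):
    """Return `count` distinct, sorted time instants in [0, sim_time).

    Hypothetical helper: the seeded generator plays the role of srand/rand.
    """
    rng = random.Random(seed)  # same seed -> same list, so runs are repeatable
    return sorted(rng.sample(range(sim_time), count))


# Assumed example values: seed 1, 239 instants, 4 hours expressed in seconds.
times = generate_injection_times(seed=1, count=239, sim_time=4 * 3600)
print(len(times), "instants between", times[0], "and", times[-1])
```

Sampling without replacement keeps the instants distinct, so no two faults are scheduled at the same time; whether the original tool enforces this is not stated in the slides.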
Inside the EMU architecture, two Linear Feedback Shift Register (LFSR) entities, LFSR_Reg_Selector and LFSR_Bit_Selector, are instantiated as components:
a) LFSR_Reg_Selector selects the memory element into which the transient fault will be injected.
b) LFSR_Bit_Selector selects the bit position in the memory element that will be upset.

Note: these LFSRs are implemented with modified primitive polynomials in order to generate all 2^n memory element addresses and 2^m bit positions (an unmodified maximal-length LFSR never reaches the all-zero state and yields only 2^n - 1 values).
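To illustrate how a modified LFSR can reach every address, the sketch below applies one common modification (the de Bruijn construction, which inserts the all-zero state into the maximal-length cycle). The slides do not say which modification or which polynomials the authors use, so the widths (4 and 3 bits), the tap positions, and all function names here are assumptions chosen for the example.

```python
def modified_lfsr_step(state, width, tap_shifts):
    """Advance a right-shifting Fibonacci LFSR by one step.

    The linear feedback is the XOR of the tapped bits. The extra term
    inverts the feedback when all bits except bit 0 are zero, which
    splices the all-zero state into the cycle (de Bruijn modification).
    """
    bit = 0
    for shift in tap_shifts:
        bit ^= (state >> shift) & 1
    if (state >> 1) == 0:
        bit ^= 1
    return (state >> 1) | (bit << (width - 1))


def lfsr_cycle(seed, width, tap_shifts):
    """Collect 2**width successive states, starting from `seed`."""
    states = []
    state = seed
    for _ in range(1 << width):
        states.append(state)
        state = modified_lfsr_step(state, width, tap_shifts)
    return states


# Assumed example: 16 memory elements (n = 4) and 8 bit positions (m = 3).
# Tap shifts (0, 1) correspond to x^4 + x^3 + 1 and x^3 + x^2 + 1.
reg_addresses = lfsr_cycle(seed=1, width=4, tap_shifts=(0, 1))
bit_positions = lfsr_cycle(seed=1, width=3, tap_shifts=(0, 1))
print(reg_addresses)
print(bit_positions)
```

With the modification removed, the same taps would cycle through the 15 (or 7) non-zero states only, so one memory element and one bit position would never be selected.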
Functions of the Error Management Unit (EMU):

a) reads data from the external file randtime.txt to determine:
• the time instants to inject faults,
• the initial seeds for the LFSR processes;
b) generates a simulation report file, result.txt, which contains information about:
• the total # of faults injected,
• the list of memory elements and bit positions affected by faults,
• the # of faults injected in each one of these elements.
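The bookkeeping behind such a report can be sketched in a few lines. This is an illustrative Python model, not the EMU's VHDL: the register file is a plain list of integers, a fault is modeled as a single bit flip (XOR with a one-hot mask), and the function names and the report layout are assumptions, since the slides do not specify the format of result.txt.

```python
from collections import Counter


def inject_seu(registers, reg_index, bit_index):
    """Model a transient fault by flipping one bit of one register in place."""
    registers[reg_index] ^= 1 << bit_index


def run_injections(registers, selections):
    """Apply each (register, bit) selection and tally what was injected."""
    per_element = Counter()
    for reg_index, bit_index in selections:
        inject_seu(registers, reg_index, bit_index)
        per_element[(reg_index, bit_index)] += 1
    return {
        "total_faults": sum(per_element.values()),
        "per_element": dict(per_element),
    }


# Hypothetical run: 4 registers, three injections, one location hit twice.
registers = [0, 0, 0, 0]
report = run_injections(registers, [(0, 3), (2, 0), (0, 3)])
print(report["total_faults"], report["per_element"])
```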
This skeleton-based VHDL code is meant to run in a fault simulation setup.
Main characteristic: the skeleton can easily be generated automatically from a synthesizable VHDL circuit description.
Code Structure
library IEEE;
use IEEE.std_logic_1164.all;

-- Circuit under test: the EMU is instantiated inside its architecture.
Entity circuit_example is
  port ( ... );
End circuit_example;

Architecture arch of circuit_example is
  -- ...
  Component ErrorManagementUnit
    port ( ... );
  End Component;
  -- ...
Begin
  -- ...
  EMU: ErrorManagementUnit
    port map ( ... );
  -- ...
End arch;
--------------------------------------------------------------------------------
library IEEE;
use IEEE.std_logic_1164.all;
use STD.textio.all;  -- provides the TEXT file type used below

-- Error Management Unit: controls fault injection and reporting.
Entity ErrorManagementUnit is
  port ( ... );
End ErrorManagementUnit;

Architecture arch_EMU of ErrorManagementUnit is
  -- ...
  Component LFSR is
    port ( ... );
  End Component;
  -- ...
  -- Input: fault injection time instants and LFSR seeds.
  File RandomTimeFile : TEXT open READ_MODE is "randtime.txt";
  -- Output: fault simulation report.
  File ResultFile : TEXT open WRITE_MODE is "result.txt";
  -- ...
Begin
  -- ...
  LFSR_Reg_Selector: LFSR  -- selects the memory element to upset
    port map ( ... );
  -- ...
  LFSR_Bit_Selector: LFSR  -- selects the bit position to upset
    port map ( ... );
  -- ...
End arch_EMU;
3. Development of the Statistical Models
3.1. SEU Rate
Cross section:

$$L = \frac{N}{\Phi_1 \, R}$$, given in [(errors.device)/(faults.bit)]

where:
N: # of functional errors [errors].
$\Phi_1$: total # of faults injected during fault simulation [faults/device].
R: # of memory cells [bits].

SEU rate:

$$\lambda = L \, \Phi_2$$, given in [errors/(bit.s)]

where:
L: cross section of the CUT [(errors.device)/(faults.bit)].
$\Phi_2$: frequency at which faults are injected in the circuit in the real environment [faults/(device.s)].
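As a worked illustration of the two formulas above, the sketch below computes a cross section and the resulting SEU rate. The total of 239 injected faults and the worst-case frequency of 10 upsets per hour come from the case study in Section 4; the number of functional errors (38) and the number of memory bits (256) are placeholder values assumed only for this example, and the function names are hypothetical.

```python
def cross_section(functional_errors, faults_injected, memory_bits):
    """Cross section: functional errors per injected fault and per memory bit."""
    return functional_errors / (faults_injected * memory_bits)


def seu_rate(cross_section_value, fault_frequency):
    """SEU rate in errors/(bit.s): cross section times fault frequency."""
    return cross_section_value * fault_frequency


# 239 faults and 10 upsets/hour are from the case study; 38 errors and
# 256 memory bits are illustrative assumptions, not reported results.
sigma = cross_section(functional_errors=38, faults_injected=239, memory_bits=256)
rate = seu_rate(sigma, fault_frequency=10 / 3600.0)
print(f"cross section = {sigma:.3e}, SEU rate = {rate:.3e} errors/(bit.s)")
```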
3.2. MTTF Rate
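$$P_{1,1} = 1 - e^{-\lambda t} \qquad \text{(I)}$$

$$P_{r,n} = C_{n,r} \, P_{1,1}^{\,r} \, (1 - P_{1,1})^{\,n-r} \qquad \text{(II)}$$

$$R_1(t) = 1 - \sum_{r=d+1}^{n} P_{r,n} \qquad \text{(III)}$$

$$R_1(Nt) = \left[ 1 - \sum_{r=d+1}^{n} P_{r,n} \right]^{N} \qquad \text{(IV)}$$

$$R_w(Nt) = \left[ 1 - \sum_{r=d+1}^{n} P_{r,n} \right]^{NW} \qquad \text{(V)}$$

$$MTTF = \int_0^{\infty} R_w(T) \, dT \qquad \text{(VI)}$$

$$R_w(T) = \left\{ \left[ 1 - \sum_{r=d+1}^{n} P_{r,n} \right]^{W/t} \right\}^{T}, \quad T = Nt \qquad \text{(VII)}$$

$$MTTF = \frac{-t}{W \cdot \ln\left[ 1 - \sum_{r=d+1}^{n} P_{r,n} \right]} \qquad \text{(VIII)}$$

$$MTTF_1 = \frac{-t}{W_1 \cdot \ln\left[ 1 - \sum_{r=d+1}^{n_1} P_{r,n_1} \right]} \qquad \text{(IX)}$$

$$MTTF_{CC} = \min(MTTF_1, MTTF_2, \ldots, MTTF_n) \qquad \text{(X)}$$

Here $\lambda$ is the SEU rate of Section 3.1, and (VIII) follows from integrating (VII) over the elapsed time $T$.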
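The MTTF model can be evaluated numerically as sketched below. The slides do not define every symbol, so the code assumes the following reading: the SEU rate is given per bit and per second, t is the time interval over which upsets accumulate, n is the number of bits per word, d is the number of upsets per word that the code can tolerate, and W is the number of words. All function names and example values are illustrative assumptions.

```python
import math


def word_failure_probability(rate, t, n, d):
    """Probability that more than d of the n bits of a word are upset in t.

    Implements Eq. (I) for one bit and sums Eq. (II) over r = d+1 .. n.
    """
    p_bit = -math.expm1(-rate * t)  # 1 - exp(-rate * t), accurate for small rates
    return sum(
        math.comb(n, r) * p_bit**r * (1.0 - p_bit) ** (n - r)
        for r in range(d + 1, n + 1)
    )


def mttf(rate, t, n, d, words):
    """Eq. (VIII): MTTF of a block of `words` words of n bits each."""
    q = word_failure_probability(rate, t, n, d)
    return -t / (words * math.log1p(-q))  # log1p(-q) == ln(1 - q)


def circuit_mttf(block_mttfs):
    """Eq. (X): the circuit MTTF is the smallest block MTTF."""
    return min(block_mttfs)


# Assumed values: rate 1e-6 errors/(bit.s), t = 1 s, 16 words.
unprotected = mttf(1e-6, 1.0, n=16, d=0, words=16)  # any upset is a failure
protected = mttf(1e-6, 1.0, n=22, d=1, words=16)    # one upset per word tolerated
print(unprotected, protected, circuit_mttf([unprotected, protected]))
```

For d = 0 the expression reduces to 1/(W·n·rate), which gives a quick sanity check on the implementation.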
4. Discussions and Example Computation
General Characteristics             | Value       | Remarks
Type of architecture                | Von Neumann | Without pipeline; multiplexed instruction/data bus and memory (Von Neumann)
Number and type of instructions     | 27          | 5 (Branch), 5 (Logic), 4 (Arithmetic), 5 (Mem. Access), 8 (Others); 16-bit instructions
Data/Instruction cache              | No          |
Number of general purpose registers | 16          | 16-bit registers
Type of register bank               | Dual Port   | A Single-Port version was also implemented
Control Registers                   | 3           | Program Counter (PC), Stack Pointer (SP), Instruction Register (IR)
Number of flags                     | 4           | Carry-out, Overflow, Negative, Zero
Data format                         | 16 bits     |

Table 1. General characteristics of the Microprocessor R3: a primary case-study for the proposed methodology.
We generated 3 different fault-tolerant (FT) implementations of the R3 processor, based on the use of information redundancy (Hamming code + 1 parity bit per register) to protect the memory elements:
a) only in the datapath (version 1);
b) only in the control path (version 2);
c) in both parts of the processor (version 3).
Case study:
• Commercial aircraft flying at an altitude of 33,000 feet for a period of 10 hours.
• According to [10], in this region high-energy particles are mainly neutrons, with energies up to 100 MeV and a flux of up to 10 particles/(cm².hour).
(Note: this energy is large enough to produce an upset in circuits designed with present state-of-the-art submicron technologies.)
• We assumed the "worst case": 10 particles incident on the circuit per hour of operation, all of them producing an upset.
• We developed an application program that was run on the 3 different FT versions of the processor, one at a time.
• The Srand program generated a total of 239 time instants for a pre-specified VHDL fault simulation time of 4 hours.

Version of the processor                                                   | Number of faults escaping detection (functional errors) | Number of faults detected | Number of faults corrected
1: FT memory elements implemented only in the datapath                     | 38                                                      | 201                       | 189
2: FT memory elements implemented only in the control part                 | 211                                                     | 28                        | 25
3: FT memory elements implemented in both parts: datapath and control      | 10                                                      | 229                       | 217

Table 2. Fault simulation summary for the 3 FT versions of the R3 processor described in VHDL.
5. Final Considerations:
We presented a novel approach based on a VHDL description to predict the SEU rate and the mean time to failure (MTTF) of complex circuits.

Compared to traditional in-flux test methods, the approach:
• does not require laboratory experiments;
• is intrinsically low-cost, due to its simplicity;
• provides not only fault injection mechanisms adapted to circuits modeled in VHDL, but also a fault modeling strategy that represents real radiation-induced transient faults (SEUs) in memory elements.

The methodology core: the Error Management Unit (EMU), described in VHDL as an entity that is parameterized by the user.

Methodology automation: CAD tool to perform circuit modeling, fault injection and simulation data analysis.
EMU Program (emu.vhd): www.ee.pucrs.br/~vargas/Programs .