Fault Tolerant State Machines Gary Burke, Stephanie Taft Jet Propulsion Laboratory, California Institute of Technology Burke D_160 / MAPLD - 2004


Fault Tolerant State Machines
Gary Burke, Stephanie Taft
Jet Propulsion Laboratory, California Institute of Technology
D_160 / MAPLD - 2004
Reasons for Fault Tolerant State Machines
• Reliable designs are essential for Flight
systems
• The state machine needs to be tolerant of
single event upsets
State Machines
• A state machine is a sequential machine that, when
built into an FPGA or ASIC, controls the
sequencing of actions in the digital logic
• The current state of a machine is held in a state
register, which is updated on each clock
• The next value of the state register (next state) is
derived from the current state and the inputs
• Outputs from the state machine are decoded from
the state register and can also be combined with
the inputs
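The structure described above can be modeled in software. The following is a minimal Python sketch, not the authors' hardware description: the class `StateMachine`, its `clock` method, and the two-state toggle example are hypothetical names and data chosen only for illustration.

```python
from typing import Callable, Iterable, List


class StateMachine:
    """Software model of a clocked state machine (illustrative only)."""

    def __init__(self, initial_state: int,
                 next_state: Callable[[int, int], int],
                 decode_output: Callable[[int, int], int]) -> None:
        self.state = initial_state          # contents of the state register
        self.next_state = next_state        # combinational next-state logic
        self.decode_output = decode_output  # output decode (may also use the input)

    def clock(self, inp: int) -> int:
        """Model one clock: decode the output, then update the state register."""
        out = self.decode_output(self.state, inp)
        self.state = self.next_state(self.state, inp)
        return out


def run(machine: StateMachine, inputs: Iterable[int]) -> List[int]:
    """Clock the machine once per input value and collect the outputs."""
    return [machine.clock(i) for i in inputs]


# Hypothetical example: a 2-state machine that toggles whenever the input is 1
# and outputs 1 while it is in state 1.
toggler = StateMachine(0,
                       next_state=lambda s, i: s ^ i,
                       decode_output=lambda s, i: s)
outputs = run(toggler, [1, 0, 1, 1])
```

The next state depends only on the current state and the input, and the output is decoded from the state register, which mirrors the bullets above.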
State Machine Encoding
• Each distinct state of a state machine is
represented by a unique binary code
• Encoding is the assignment of binary codes
to states
Different Methods of Encoding States
• Binary
– The simplest encoding method in which each
state is given the next available binary number
in sequence
• One Hot
– The number of bits in the code is equal to the
number of states
– Each encoded state has just 1 bit in the encoded
word set to a 1 (the rest are 0)
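As a rough illustration of the two encodings above, the sketch below generates the code words for a given number of states. The function names `binary_codes` and `one_hot_codes` are hypothetical, and the bit order chosen for the one-hot words is an arbitrary assumption.

```python
from typing import List


def binary_codes(num_states: int) -> List[str]:
    """Give each state the next binary number, using the minimum register width."""
    width = max(1, (num_states - 1).bit_length())
    return [format(i, f"0{width}b") for i in range(num_states)]


def one_hot_codes(num_states: int) -> List[str]:
    """Give each state a word with exactly one bit set; width equals the state count."""
    return [format(1 << i, f"0{num_states}b") for i in range(num_states)]
```

For example, `binary_codes(4)` needs 2 bits per state, while `one_hot_codes(4)` needs 4.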
Different Methods of Encoding States
Continued
• Hamming Distance of 2 (H2)
– Compared to Binary encoding, Hamming 2 uses one
extra bit to ensure that all codes are separated by a
Hamming distance of at least 2
– At least 2 bits of the state register must change to
reach another valid state, so a single-bit upset always
produces an invalid code
• Hamming Distance of 3 (H3)
– This extension of Hamming 2 encoding uses additional
bits to ensure that all codes are separated by a
Hamming distance of at least 3
– At least 3 bits of the state register must change to
reach another valid state
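The separation property can be checked numerically. The slides do not say how the H2 and H3 codes were constructed, so the sketch below makes two assumptions: H2 is the binary code plus one even-parity bit, and H3 is a Hamming single-error-correcting code over the binary state number. All function names are hypothetical.

```python
from itertools import combinations
from typing import List


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two code words differ."""
    return bin(a ^ b).count("1")


def min_distance(codes: List[int]) -> int:
    """Smallest Hamming distance between any two distinct code words."""
    return min(hamming_distance(a, b) for a, b in combinations(codes, 2))


def h2_codes(num_states: int) -> List[int]:
    """Assumed H2 construction: binary state number plus one even-parity bit."""
    return [(i << 1) | (bin(i).count("1") & 1) for i in range(num_states)]


def h3_codes(num_states: int) -> List[int]:
    """Assumed H3 construction: Hamming code with check bits at power-of-two positions."""
    k = max(1, (num_states - 1).bit_length())   # data bits
    r = 0
    while (1 << r) < k + r + 1:                 # check bits needed for k data bits
        r += 1
    n = k + r
    codes = []
    for value in range(num_states):
        word, data_bit = 0, 0
        # Data bits go in the positions (1-indexed) that are not powers of two.
        for pos in range(1, n + 1):
            if pos & (pos - 1):
                if (value >> data_bit) & 1:
                    word |= 1 << (pos - 1)
                data_bit += 1
        # Check bit 2**p gives even parity over positions whose index has bit p set.
        for p in range(r):
            parity = 0
            for pos in range(1, n + 1):
                if pos & (1 << p) and (word >> (pos - 1)) & 1:
                    parity ^= 1
            if parity:
                word |= 1 << ((1 << p) - 1)
        codes.append(word)
    return codes
```

With these constructions, plain binary codes for 16 states have a minimum distance of 1, the parity-extended codes have 2, and the Hamming codes have 3.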
Synthesis
• Each state machine was synthesized individually to
check its overhead
• Finite state machine optimization was turned off
• A clock frequency of 50 MHz was used
• Target device was a Xilinx Spartan 2, speed grade 6
• Error injection circuitry was not included
Synthesis Results

Encoding    State Machine  # Slice     # of 4-input  Clock        Max Synthesized  Minimum
            Size           Flip Flops  LUTs          Period (ns)  Frequency (MHz)  Period (ns)
Binary      4              2           7             20           272.1            3.7
Binary      8              3           15            20           178.8            5.6
Binary      12             4           25            20           129.6            7.7
Binary      16             4           38            20           122.1            8.2
Binary      24             5           50            20           109.6            9.1
Binary      32             5           96            20           94.5             10.6
One Hot     4              4           10            20           238.2            4.2
One Hot     8              8           20            20           194.8            5.1
One Hot     12             12          31            20           173.0            5.8
One Hot     16             16          41            20           148.9            6.7
One Hot     24             24          63            20           148.9            6.7
One Hot     32             32          237           20           68.6             14.6
Hamming 2   4              3           8             20           226.6            4.4
Hamming 2   8              4           22            20           133.5            7.5
Hamming 2   12             5           41            20           124.5            8.0
Hamming 2   16             5           49            20           117.8            8.5
Hamming 2   24             6           84            20           91.5             10.9
Hamming 2   32             6           107           20           87.3             11.5
Hamming 3   4              5           15            20           162.8            6.1
Hamming 3   8              6           42            20           117.4            8.5
Hamming 3   12             7           55            20           105.0            9.5
Hamming 3   16             7           71            20           102.6            9.8
Hamming 3   24             9           91            20           88.7             11.3
Hamming 3   32             9           137           20           83.5             12.0
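The flip-flop counts in the synthesis results are consistent with simple width formulas. The sketch below is a cross-check under assumptions, not a statement of how the authors sized their registers: it assumes one parity bit for Hamming 2 and standard Hamming check bits for Hamming 3. The function names are hypothetical.

```python
def binary_width(num_states: int) -> int:
    """Minimum number of bits that can number every state."""
    return max(1, (num_states - 1).bit_length())


def one_hot_width(num_states: int) -> int:
    """One flip-flop per state."""
    return num_states


def h2_width(num_states: int) -> int:
    """Binary width plus one extra bit (assumed to be a parity bit)."""
    return binary_width(num_states) + 1


def h3_width(num_states: int) -> int:
    """Binary width plus r check bits, where 2**r >= k + r + 1 (assumed Hamming code)."""
    k = binary_width(num_states)
    r = 0
    while (1 << r) < k + r + 1:
        r += 1
    return k + r
```

For the six machine sizes synthesized (4, 8, 12, 16, 24, and 32 states), these functions reproduce the reported slice flip-flop counts for all four encodings.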
Four Bit State Encoding
[Bar chart "4 Bit State Encoding": # of slice flip flops, # of four-input LUTs, and clock period (ns) for the Binary, One Hot, Hamming 2, and Hamming 3 encodings]
Eight Bit State Encoding
[Bar chart "8 Bit State Encoding": # of slice flip flops, # of four-input LUTs, and clock period (ns) for the Binary, One Hot, Hamming 2, and Hamming 3 encodings]
Twelve Bit State Encoding
[Bar chart "12 Bit State Encoding": # of slice flip flops, # of four-input LUTs, and clock period (ns) for the Binary, One Hot, Hamming 2, and Hamming 3 encodings]
Sixteen Bit State Encoding
[Bar chart "16 Bit State Encoding": # of slice flip flops, # of four-input LUTs, and clock period (ns) for the Binary, One Hot, Hamming 2, and Hamming 3 encodings]
Twenty-Four Bit State Encoding
[Bar chart "24 Bit State Encoding": # of slice flip flops, # of four-input LUTs, and clock period (ns) for the Binary, One Hot, Hamming 2, and Hamming 3 encodings]
Thirty-Two Bit State Encoding
[Bar chart "32 Bit State Encoding": # of slice flip flops, # of four-input LUTs, and clock period (ns) for the Binary, One Hot, Hamming 2, and Hamming 3 encodings]
Fault Injection Test
• A test circuit is generated with an example of each
state machine executing the same task, plus a
reference state machine
• The task chosen requires a 16-state state machine
to detect a 16-bit pattern in a serial input stream
• An error generator injects faults into all state
machines except the reference state machine
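To make the task concrete, here is a minimal Python sketch of a 16-state detector for a 16-bit pattern in a serial bit stream. The slides do not give the key pattern or the transition table, so `KEY` is a made-up pattern, and the prefix-matching construction of the next-state table is an assumption about how such a detector could be built.

```python
from typing import List

# Hypothetical 16-bit key pattern; the pattern used in the test is not given.
KEY = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1]


def build_detector(pattern: List[int]) -> List[List[int]]:
    """Build table[state][bit] -> next state.

    State s means "the last s input bits match the first s pattern bits",
    so a 16-bit pattern needs states 0 through 15.
    """
    n = len(pattern)
    table = [[0, 0] for _ in range(n)]
    for state in range(n):
        for bit in (0, 1):
            seen = pattern[:state] + [bit]
            # Longest pattern prefix that is a suffix of the bits seen, capped at
            # n - 1 so that a complete match folds back into a valid state.
            for length in range(min(len(seen), n - 1), -1, -1):
                if seen[len(seen) - length:] == pattern[:length]:
                    table[state][bit] = length
                    break
    return table


def detect(pattern: List[int], stream: List[int]) -> List[int]:
    """Return 1 at every stream position where the pattern has just completed."""
    table = build_detector(pattern)
    last = len(pattern) - 1
    state = 0
    hits = []
    for bit in stream:
        # Output combines the state register with the current input bit.
        hits.append(1 if state == last and bit == pattern[last] else 0)
        state = table[state][bit]
    return hits
```

Calling `detect(KEY, stream)` on a stream that contains `KEY` flags the clock on which the last bit of the pattern arrives.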
Error Injection Test Continued
• The outputs of each state machine are compared to
the reference output
• A set of counters tallies the comparison outputs
• 2 types of failure are logged for each state
machine:
– Failure to detect pattern
– False detection of pattern (false-positive)
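The comparison and tallying step can be sketched as follows. This is a software stand-in for the hardware counters, under the assumption that the reference and the machine under test produce one output bit per clock; `FailureCounters` and `tally` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class FailureCounters:
    missed: int = 0          # reference detected the pattern, machine under test did not
    false_positive: int = 0  # machine under test reported a pattern the reference did not


def tally(reference: Iterable[int], under_test: Iterable[int]) -> FailureCounters:
    """Compare the two output streams clock by clock and count both failure types."""
    counters = FailureCounters()
    for ref, out in zip(reference, under_test):
        if ref and not out:
            counters.missed += 1
        elif out and not ref:
            counters.false_positive += 1
    return counters
```

For instance, `tally([0, 1, 0, 1, 0], [0, 0, 1, 1, 0])` logs one failure to detect and one false positive.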
Error Injection Test Continued
• Non-key patterns differ from the key pattern by 1 bit,
to increase the likelihood of a false match
• The error rate is adjustable; it is set to 1 fault per
199 clocks in this example
• Errors are weighted by distributing them pseudo-randomly
over 16 bits, so a state machine with a word size of n
receives n/16 of the total faults
• Synchronous faults are injected before the state register
• Asynchronous faults are injected after the state register
• All results are from actual implementations of the test
circuits in a Spartan 2 FPGA
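The weighting scheme can be illustrated with a small simulation. The details are assumptions, since the slides do not describe the error generator's internals: faults are taken to occur on every 199th clock, Python's `random.Random` stands in for the hardware pseudo-random source, and a machine is hit only when the selected bit position falls inside its state register. The names `inject` and `count_hits` are hypothetical.

```python
import random
from typing import Dict

FAULT_PERIOD = 199   # one fault per 199 clocks, as in the example
FAULT_BITS = 16      # faults are spread over 16 bit positions


def inject(state: int, width: int, position: int) -> int:
    """Flip bit `position` of the state word if the register is wide enough."""
    return state ^ (1 << position) if position < width else state


def count_hits(widths: Dict[str, int], clocks: int, seed: int = 0) -> Dict[str, int]:
    """Count how many faults land inside each machine's state register."""
    rng = random.Random(seed)  # stand-in for the hardware pseudo-random generator
    hits = {name: 0 for name in widths}
    for clock in range(1, clocks + 1):
        if clock % FAULT_PERIOD:
            continue
        position = rng.randrange(FAULT_BITS)
        for name, width in widths.items():
            if position < width:
                hits[name] += 1
    return hits
```

With the 16-state register widths from the synthesis results (Binary 4, Hamming 2 5, Hamming 3 7, One Hot 16), the One Hot machine is hit by every fault and the Binary machine by roughly a quarter of them, which matches the n/16 weighting.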
Error Rate – Synchronous Faults
[Bar chart "Synchronous (rate=199)": errors per pattern (axis 0 to 0.1) for the Binary, 1-Hot, H2, and H3 encodings; series: single, false-pos single, double, false-pos double]
Error Rate – Asynchronous Faults
[Bar chart "Asynchronous (rate=199)": errors per pattern (axis 0 to 0.02) for the Binary, 1-Hot, H2, and H3 encodings; series: single, false-pos single, double, false-pos double]
Error Rate – Asynchronous Pulse Faults
[Bar chart "Pulse (rate=199)": errors per pattern (axis 0 to 0.018) for the Binary, 1-Hot, H2, and H3 encodings; series: single, false-pos single, double, false-pos double]
Results: Binary Encoding
• Lowest resources used
• Second fastest speed after One Hot
– Fastest for small number of states
• Second-most sensitive to errors
• Generates false-positive errors, i.e., reports
false pattern matches
Results: One Hot Encoding
• No false-positive errors (single faults)
• Fastest speed, except for small and large numbers
of states
• Uses more resources than Binary
• Inefficient for large numbers of states
• Worst fault tolerance of all encodings tested
• Has 2x the error rate of Binary encoding
Results: Hamming Distance of 2 (H2)
Encoding
• No false-positive errors (single faults)
• Better Fault Tolerance than Binary
• More resources needed than One Hot,
except for large number of states
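One way to see why single faults behave differently across encodings is to count how many single-bit flips turn a valid state code into another valid state code. The sketch below does this for 16-state machines. It assumes the H2 code is binary plus an even-parity bit, and it says nothing about how the hardware reacts to an invalid code, which the slides do not describe. The name `silent_flip_fraction` is hypothetical.

```python
from typing import List


def silent_flip_fraction(codes: List[int], width: int) -> float:
    """Fraction of single-bit flips that land on another valid state code."""
    valid = set(codes)
    flips = [(code ^ (1 << bit)) in valid
             for code in codes for bit in range(width)]
    return sum(flips) / len(flips)


binary = list(range(16))                                       # 4-bit binary
one_hot = [1 << i for i in range(16)]                          # 16-bit one hot
h2 = [(i << 1) | (bin(i).count("1") & 1) for i in range(16)]   # binary + parity (assumed)

binary_fraction = silent_flip_fraction(binary, 4)
one_hot_fraction = silent_flip_fraction(one_hot, 16)
h2_fraction = silent_flip_fraction(h2, 5)
```

In this model every single-bit flip of a 4-bit binary code reaches another valid state, whereas no single-bit flip of a One Hot or H2 code does.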
Results: Hamming Distance of 3 (H3)
Encoding
• Zero single-fault errors
– Immune to synchronous and asynchronous
errors
• Lowest double-fault errors
• Most resources used (*): ~2x Binary encoding
• Slowest speed (*)
(*) Except for large numbers of states
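A distance of 3 between codes means that a word with a single flipped bit is still closer to its original code than to any other. The slides do not show how the H3 machines exploit this, so the sketch below is only one possible illustration: it builds a 7-bit Hamming code for 16 states (an assumed construction) and maps any word within one bit of a valid code back to that code. The names `h3_code`, `CODES`, and `correct` are hypothetical.

```python
from typing import List, Optional


def h3_code(d: int) -> int:
    """Assumed 7-bit code for state number d (0 to 15): 4 data bits plus 3 check bits."""
    d0, d1, d2, d3 = [(d >> i) & 1 for i in range(4)]
    p0 = d0 ^ d1 ^ d3
    p1 = d0 ^ d2 ^ d3
    p2 = d1 ^ d2 ^ d3
    return d | (p0 << 4) | (p1 << 5) | (p2 << 6)


CODES: List[int] = [h3_code(d) for d in range(16)]


def correct(word: int) -> Optional[int]:
    """Return the valid code within Hamming distance 1 of `word`, if any."""
    for code in CODES:
        if bin(word ^ code).count("1") <= 1:
            return code
    return None
```

Flipping any one of the 7 bits of any code in `CODES` and passing the result to `correct` returns the original code, which is the behavior that makes a single fault harmless in this model.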
Summary
• Binary encoding gives unpredictable results when
faults are injected, generating false-positive errors in the
pattern matching example
• One Hot encoding provides false-positive protection, but
at the cost of considerably more errors
• Hamming 2 encoded state machines will provide
significantly better fault tolerance at a cost of about 25%
more resources than binary
• Hamming 3 encoded state machines give excellent fault
tolerance but at a ~2x increase in resources
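The resource cost can be examined directly from the reported synthesis figures. The sketch below uses only the 4-input LUT counts from the synthesis results and computes each encoding's LUT count relative to Binary at the same machine size. It ignores flip-flops, so it is a partial view and may not be the measure behind the percentages quoted in the summary. The name `lut_overhead` is hypothetical.

```python
from typing import Dict, List

SIZES = [4, 8, 12, 16, 24, 32]   # number of states

# 4-input LUT counts as reported in the synthesis results.
LUTS: Dict[str, List[int]] = {
    "Binary":    [7, 15, 25, 38, 50, 96],
    "One Hot":   [10, 20, 31, 41, 63, 237],
    "Hamming 2": [8, 22, 41, 49, 84, 107],
    "Hamming 3": [15, 42, 55, 71, 91, 137],
}


def lut_overhead(encoding: str) -> List[float]:
    """LUT count at each machine size, relative to Binary at the same size."""
    return [round(count / base, 2)
            for count, base in zip(LUTS[encoding], LUTS["Binary"])]
```

Calling `lut_overhead("Hamming 3")` gives one ratio per entry of `SIZES`, which makes it easy to see how the overhead changes as the number of states grows.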