Fault Tolerant State Machines
Gary Burke, Stephanie Taft
Jet Propulsion Laboratory, California Institute of Technology
Burke, D_160 / MAPLD - 2004

Reasons for Fault Tolerant State Machines
• Reliable designs are essential for flight systems
• The state machine needs to be tolerant of single event upsets

State Machines
• A state machine is a sequential machine that, when built into an FPGA or ASIC, controls the sequencing of actions in the digital logic
• The current state of a machine is held in a state register, which is updated on a clock
• The next value of the state register (next state) is derived from the current state and the inputs
• Outputs from the state machine are decoded from the state register and can also be combined with the inputs

State Machine Encoding
• Each distinct state of a state machine is represented by a unique binary code
• Encoding is the assignment of binary codes to states

Different Methods of Encoding States
• Binary
  – The simplest encoding method, in which each state is given the next available binary number in sequence
• One Hot
  – The number of bits in the code is equal to the number of states
  – Each encoded state has just 1 bit in the encoded word set to 1 (the rest are 0)

Different Methods of Encoding States Continued
• Hamming Distance of 2 (H2)
  – Compared to Binary encoding, Hamming 2 uses one extra bit to ensure all codes are separated by a Hamming distance of 2
  – It will take 2 changes in the state register to reach another known state
• Hamming Distance of 3 (H3)
  – This extension of Hamming distance of 2 encoding uses additional bits to ensure all codes are separated by a Hamming distance of 3
  – It will take 3 changes in the state register to reach another known state

Synthesis
• To check the overhead of each of the state machines, they were individually synthesized
• Finite state machine optimization is turned off
• A clock frequency of 50 MHz is used
• Target device is a Xilinx Spartan 2, speed grade 6
• Error injection circuitry is not included

Synthesis Results

Encoding    State Machine  # Slice     # of 4-input  Clock        Max Synthesized  Minimum
            Size           Flip Flops  LUTs          Period (ns)  Frequency (MHz)  Period (ns)
Binary       4              2            7           20           272.1             3.7
Binary       8              3           15           20           178.8             5.6
Binary      12              4           25           20           129.6             7.7
Binary      16              4           38           20           122.1             8.2
Binary      24              5           50           20           109.6             9.1
Binary      32              5           96           20            94.5            10.6
One Hot      4              4           10           20           238.2             4.2
One Hot      8              8           20           20           194.8             5.1
One Hot     12             12           31           20           173.0             5.8
One Hot     16             16           41           20           148.9             6.7
One Hot     24             24           63           20           148.9             6.7
One Hot     32             32          237           20            68.6            14.6
Hamming 2    4              3            8           20           226.6             4.4
Hamming 2    8              4           22           20           133.5             7.5
Hamming 2   12              5           41           20           124.5             8.0
Hamming 2   16              5           49           20           117.8             8.5
Hamming 2   24              6           84           20            91.5            10.9
Hamming 2   32              6          107           20            87.3            11.5
Hamming 3    4              5           15           20           162.8             6.1
Hamming 3    8              6           42           20           117.4             8.5
Hamming 3   12              7           55           20           105.0             9.5
Hamming 3   16              7           71           20           102.6             9.8
Hamming 3   24              9           91           20            88.7            11.3
Hamming 3   32              9          137           20            83.5            12.0

Four Bit State Encoding
[Bar chart: # of Slice Flip Flops, # of Four Input LUTs, and Clock Period (ns) for Binary, One Hot, Hamming 2, and Hamming 3]

Eight Bit State Encoding
[Bar chart: # of Slice Flip Flops, # of Four Input LUTs, and Clock Period (ns) for Binary, One Hot, Hamming 2, and Hamming 3]

Twelve Bit State Encoding
[Bar chart: # of Slice Flip Flops, # of Four Input LUTs, and Clock Period (ns) for Binary, One Hot, Hamming 2, and Hamming 3]

Sixteen Bit State Encoding
[Bar chart: # of Slice Flip Flops, # of Four Input LUTs, and Clock Period (ns) for Binary, One Hot, Hamming 2, and Hamming 3]

Twenty-Four Bit State Encoding
[Bar chart: # of Slice Flip Flops, # of Four Input LUTs, and Clock Period (ns) for Binary, One Hot, Hamming 2, and Hamming 3]

Thirty-Two Bit State Encoding
[Bar chart: # of Slice Flip Flops, # of Four Input LUTs, and Clock Period (ns) for Binary, One Hot, Hamming 2, and Hamming 3]

Fault Injection Test
• A test circuit is generated with an example of each state machine executing the same task, plus a reference state machine
• The task chosen requires a 16-state state machine to detect a 16-bit pattern in a serial input stream
• An error generator injects faults into all state machines except the reference state machine

Error Injection Test Continued
• The outputs of each state machine are compared to the reference output
• A set of counters tallies the comparison outputs
• 2 types of failure are logged for each state machine:
  – Failure to detect pattern
  – False detection of pattern (false-positive)

Error Injection Test Continued
• Non-key patterns are 1 bit different from the key pattern, to increase the likelihood of a false match
• Error rate is adjustable; set to 1:199 clocks in the example
• Errors are weighted by distributing them pseudo-randomly over 16 bits. A state machine with a word size of n receives n/16 of the total faults
• Synchronous fault injection is before the state register
• Asynchronous fault injection is after the state register
• All results are from actual implementation of the test circuits in a Spartan 2 FPGA

Error Rate – Synchronous Faults
[Bar chart, Synchronous (rate=199): errors per pattern for Binary, 1-Hot, H2, and H3; series: single, false-pos single, double, false-pos double]

Error Rate – Asynchronous Faults
[Bar chart, Asynchronous (rate=199): errors per pattern for Binary, 1-Hot, H2, and H3; series: single, false-pos single, double, false-pos double]

Error Rate – Asynchronous Pulse Faults
[Bar chart, Pulse (rate=199): errors per pattern for Binary, 1-Hot, H2, and H3; series: single, false-pos single, double, false-pos double]

Results: Binary Encoding
• Lowest resources used
• Second fastest speed after One Hot
  – Fastest for small number of states
• Second-most sensitive to errors
• Generates false-positive errors, i.e., reports false pattern matches

Results: One Hot Encoding
• No false-positive errors (single faults)
• Fastest speed, except for small and large numbers of states
• Uses more resources than Binary
• Inefficient for large number of states
• Worst fault tolerance of all encodings tested
• Has 2x the error rate of Binary encoding

Results: Hamming Distance of 2 (H2) Encoding
• No false-positive errors (single faults)
• Better fault tolerance than Binary
• More resources needed than One Hot, except for large number of states

Results: Hamming Distance of 3 (H3) Encoding
• Zero single-fault errors
  – Immune to synchronous and asynchronous errors
• Lowest double-fault errors
• Most resources used (*), ~2x Binary encoding
• Slowest speed (*)
(*) Except for large number of states

Summary
• Binary encoding will give unpredictable results when faults are injected, generating false-positive errors in the pattern matching example
• One Hot encoding provides false-positive protection, but at the cost of considerably more errors
• Hamming 2 encoded state machines will provide significantly better fault tolerance at a cost of about 25% more resources than Binary
• Hamming 3 encoded state machines give excellent fault tolerance, but at a ~2x increase in resources
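
Illustration: checking the encodings numerically

The encoding definitions given earlier (Binary, One Hot, H2, H3) can be checked with a short script. What follows is a minimal Python sketch under stated assumptions, not the authors' implementation. The slides do not say how the H2 and H3 codes were built, so the sketch assumes an even-parity bit for H2 (consistent with "one extra bit") and a simple greedy search for H3. All function names are hypothetical. For 16 states, the assumed H2 and H3 widths (5 and 7 bits) match the flip-flop counts in the synthesis table.

```python
from itertools import combinations


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two codes differ."""
    return bin(a ^ b).count("1")


def min_distance(codes):
    """Smallest pairwise Hamming distance within a set of codes."""
    return min(hamming_distance(a, b) for a, b in combinations(codes, 2))


def binary_codes(n_states):
    """Each state gets the next binary number in sequence."""
    return list(range(n_states))


def one_hot_codes(n_states):
    """One bit per state; exactly one bit is set in each code."""
    return [1 << i for i in range(n_states)]


def h2_codes(n_states):
    """ASSUMED construction: binary code plus one even-parity bit."""
    return [(s << 1) | (bin(s).count("1") & 1) for s in range(n_states)]


def greedy_codes(n_states, width, distance):
    """ASSUMED construction: scan values in counting order and keep each
    one that is at least `distance` away from every code already kept."""
    chosen = []
    for candidate in range(1 << width):
        if all(hamming_distance(candidate, c) >= distance for c in chosen):
            chosen.append(candidate)
            if len(chosen) == n_states:
                return chosen
    raise ValueError("width too small for the requested distance")


def silent_single_upsets(codes, width):
    """Count single-bit flips that turn a valid code into another valid
    code, i.e. upsets that move the machine to a wrong but known state."""
    valid = set(codes)
    return sum((c ^ (1 << bit)) in valid for c in codes for bit in range(width))


if __name__ == "__main__":
    n = 16  # same state count as the pattern-detector example
    encodings = {
        "Binary": (binary_codes(n), 4),
        "One Hot": (one_hot_codes(n), 16),
        "H2": (h2_codes(n), 5),
        "H3": (greedy_codes(n, 7, 3), 7),
    }
    for name, (codes, width) in encodings.items():
        print(name, width, min_distance(codes), silent_single_upsets(codes, width))
```

For each encoding the script prints the register width, the minimum Hamming distance, and the number of single-bit upsets that land on another valid state. Only an encoding with a minimum distance of 1 can be moved to a different known state by a single upset, which is the property the H2 and H3 slides describe.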