FPGA IP Verification for Use in Severe Environments 2005 MAPLD International Conference September 2005 Paper #237 Ian Land Ian Bryant.

FPGA IP Verification for Use
in Severe Environments
2005 MAPLD International Conference
September 2005
Paper #237
Ian Land
Ian Bryant
Trends
Smaller geometries allow more functions
Synthesizable HDL makes design reuse practical
Gate-level design is difficult with high density
- Resource-intensive
- Takes a long time
- Increases likelihood of error
Thus, block-level design is needed
- Intellectual property (IP) reduces effort and risk, if done right…
  - A robust design process is followed, with thorough verification
  - IP is proven in many applications, including space and severe environments
- A MIL-STD-1553 example demonstrates this
Land
2
MAPLD 2005/237
Robust Design Process
Structured design flow should be phase-gated
- Proposal
  - Justification for development and creation of the project plan
- Definition and Planning
  - Preliminary datasheet creation defining the core
  - Test plan is needed
- Development
  - The core is implemented and deliverables are created
- Verification and Validation
  - Testing against plan and specification (e.g. MIL-STD-1553, PCI)
- Release
  - Release of product for volume sales
- Configuration Management, Feedback and Revision
[Figure: phase-gate diagram]
MIL-STD-1553 Example
- Actel has developed three products
  - A full-featured BC, RT, MT
  - A ‘simple’ bus controller
  - A ‘simple’ remote terminal
- Highlight: the simple remote terminal, Core1553BRT
  - Originally released in 2002 (first production August 2002)
    - 12 and 16 MHz versions
  - Updated for minor changes in 12/2002
    - Loopback test, version text in code, etc.
  - Updated for a Verilog translation issue in April 2004
  - Updated in 9/2004 and 11/2004 to work with design tool updates
  - Revised to include 20 and 24 MHz versions in January 2005
    - Manchester encoders/decoders tested as part of full-featured BC, RT, MT
  - ProASIC3/E FPGA family support added
MIL-STD-1553 RT Development
Proposal
- Substantial customer demand for a MIL-STD-1553 bus interface
- Review of the specification and competitive products suggested we could improve market offerings with a rad-tolerant 1553 FPGA
Definition
- MIL-STD-1553 specification
- Preliminary datasheet highlighting the features in the proposal
Development
- Developed remote terminal
- Paid careful attention to Manchester encoder/decoder blocks that would be re-used across the product family
- Built two testbenches
  - Verification: runs the full set of tests and mimics validation
  - User: runs fewer tests, for incorporation into a larger system design
RT Development, p.2
- Verification and Validation
  - Stable, tested code with reviewed test results
  - Check corner cases and key parameters
    - Make sure parity errors are injected on every bit
    - 12 and 16 MHz; 12 MHz is the harder case due to clock extraction
  - Tested against an existing MIL-STD-1553 COTS tester and Certified Development Kit at Test Systems, Inc.
    - Completely for 16 MHz and partially for 12 MHz
  - Validated Core1553 Evaluation Board
    - This is important to use with the verification test bench for future updates
- Release gives first-rate integration
  - Core builds complete, board release, release note, user guide, data sheet, certification papers
- Solution improves integration
  - Developed application note, reference design and example designs since 2002
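The "parity errors injected on every bit" check can be pictured with a small behavioural model. The sketch below is Python rather than the core's HDL. It assumes the standard 1553 word payload of 16 data bits plus one odd-parity bit, and all function names are hypothetical rather than taken from the Core1553BRT test bench.

```python
def odd_parity_bit(data: int) -> int:
    """Parity bit that gives the 17-bit word an odd number of ones."""
    ones = bin(data & 0xFFFF).count("1")
    return 0 if ones % 2 == 1 else 1


def make_word(data: int) -> int:
    """Pack 16 data bits (bits 16..1) and the parity bit (bit 0)."""
    return ((data & 0xFFFF) << 1) | odd_parity_bit(data)


def parity_ok(word: int) -> bool:
    """A received word passes when its 17 bits hold an odd number of ones."""
    return bin(word & 0x1FFFF).count("1") % 2 == 1


def inject_on_every_bit(data: int) -> list:
    """Flip each of the 17 bits in turn; return positions that went undetected."""
    good = make_word(data)
    return [bit for bit in range(17) if parity_ok(good ^ (1 << bit))]


if __name__ == "__main__":
    for data in (0x0000, 0xFFFF, 0xA5A5, 0x0001):
        assert parity_ok(make_word(data))
        print(hex(data), "undetected single-bit errors:", inject_on_every_bit(data))
```

An empty list for every data pattern is the expected outcome: a single flipped bit always changes the parity of the word, wherever it lands.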
Updates for Speed and Space
Added 20 and 24 MHz in early 2005 (v2.2)
- Manchesters validated in full-featured BC, RT, MT core
- Moved CLKSPD generic to 2-bit input port
  - Allows a single netlist to support four frequencies
- Modified top-level and backend timers
- Updated test benches for 20 and 24 MHz and port maps
- Fixed erroneous SYNCOUT pulses
  - Occur with some non-Actel transmitters on the bus
Updating for space in late 2005 (v3.0)
- Protect the core from entering illegal states
- Hardware test for a babbling transmitter
- Re-qualify the core at Test Systems, Inc.
Severe Environment Considerations
- Level 3 verification minimum; level 4 validation
  - MIL-STD-1553 cores have 3rd-party review at Test Systems, Inc.
  - Requires a validation report review: actions and responses
  - Have a certification envelope: test VHDL and Verilog versions at different speeds
- Have exceptional documentation and support
  - Tool flow documented with versions for exact design replication
  - Minimize the possibility of integration engineer problems
- High coverage standards and well-explained variances
  - Code coverage target of 100% for RTL
  - Consider using error detection and correction for memory
- Protect the core from entering illegal states and memory upsets
  - Synplicity default state machine could lock up if upset by an SEU
  - Adds redundancy and reduces risk
  - Use EDAC for memory
- Avoid the possibility of a babbling transmitter
  - Can occur if a redundant system fails
- Continuously investigate other means to improve quality
  - Over-sampling
  - The need for incorporating DO-254
MIL-STD-1553B Tool Issues
Limit tools and document for validated cores
- Version 3.0 core will be qualified in hardware with
  - Synplicity 8.1 used for synthesis
  - Designer 6.2 used for layout
  - ModelSim 6.0c Actel OEM used for simulation
- So what happens if a customer uses
  - Exemplar, or even Synplicity 7.71?
  - The qualification is not repeatable…
  - The customer still needs to qualify their system
- IP vendors should document what tool versions are used for qualified IP cores to be used in severe environments, for
  - Repeatability
  - Re-use
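One way to act on documented tool versions is to compare the flow actually used against the qualified flow before relying on the vendor's qualification. The following Python sketch is only an illustration: the tool names and versions come from the slide, while the dictionary layout and the function `flow_deviations` are hypothetical.

```python
# Qualified flow for the v3.0 core, as listed on the slide.
QUALIFIED_FLOW = {
    "synthesis": ("Synplicity", "8.1"),
    "layout": ("Designer", "6.2"),
    "simulation": ("ModelSim Actel OEM", "6.0c"),
}


def flow_deviations(used_flow: dict) -> list:
    """Return the steps where the tool or version differs from the qualified flow."""
    deviations = []
    for step, qualified in QUALIFIED_FLOW.items():
        used = used_flow.get(step)
        if used != qualified:
            deviations.append((step, qualified, used))
    return deviations


if __name__ == "__main__":
    customer_flow = {
        "synthesis": ("Synplicity", "7.71"),   # older synthesis release
        "layout": ("Designer", "6.2"),
        "simulation": ("ModelSim Actel OEM", "6.0c"),
    }
    for step, qualified, used in flow_deviations(customer_flow):
        print(f"{step}: qualified with {qualified}, built with {used}")
```

Any reported deviation means the vendor's qualification result is not repeatable as-is; the system-level qualification remains the customer's job in every case.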
Code Coverage
A way to prove that the test benches actually test all the designed-in functions
- Allows you to verify that all lines of code are covered
- Today’s tools allow
  - Statement coverage
  - Branch coverage
  - Condition coverage
  - Expression coverage
  - Toggle coverage
BUT
- Does not guarantee that the design actually implements the specification
  - Both the core and the testbench may omit a function
Core1553BRT Code Coverage
- Modular core design allows us to create tests to exercise a particular portion of code
- Verification testbench reaches >99%
  - Uncovered lines are inspected and verified; they are typically conversion functions or branches that are coded purely for safety
Coverage is Actually 100%
Branch coverage does not show 100%, but it is.
The reason is that we use safe coding that checks conditions before acting; these conditions are always true, but the code is better and safer with these statements. Other uncovered branches are "when others" clauses such as this one:

    when INIT => case MUXSEL is
        when "000" => DSTATE <= WRITE0;   -- RX Mode Code
        when "010" => DSTATE <= TXSTAT;   -- TX Mode Code
        when "001" => DSTATE <= WRITE0;   -- RX Data Transfer
        when "011" => DSTATE <= TXSTAT;   -- TX Data Transfer
        when "100" => DSTATE <= WRITE0;   -- Bcast RX Mode Code
        when "110" => DSTATE <= MSGSTAT;  -- Bcast TX Mode Code
                      LATCHSW <= '1';
        when "101" => DSTATE <= WRITE0;   -- Bcast RX Data Transfer
        when "111" => DSTATE <= MSGSTAT;  -- Bcast TX Data Transfer
                      LATCHSW <= '1';
        when others =>
    end case;

We never take the others branch, because we list valid states 0-7 above, but the VHDL language requires us to cover all possible states, including "ZZZ" in std_logic. This could be rewritten as follows, which would give 100% coverage but whose meaning is not so obvious!

    when INIT => case MUXSEL is
        when "000" => DSTATE <= WRITE0;   -- RX Mode Code
        when "010" => DSTATE <= TXSTAT;   -- TX Mode Code
        when "001" => DSTATE <= WRITE0;   -- RX Data Transfer
        when "011" => DSTATE <= TXSTAT;   -- TX Data Transfer
        when "100" => DSTATE <= WRITE0;   -- Bcast RX Mode Code
        when "110" => DSTATE <= MSGSTAT;  -- Bcast TX Mode Code
                      LATCHSW <= '1';
        when "101" => DSTATE <= WRITE0;   -- Bcast RX Data Transfer
        when others => DSTATE <= MSGSTAT; -- Bcast TX Data Transfer
                      LATCHSW <= '1';
    end case;

There is a trade-off here between coverage and readability.
In the first example it is clear what the "111" condition does; not so in the second. They synthesize to the same circuit.
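The claim that both forms describe the same circuit can be checked by enumerating the eight binary values of MUXSEL. The sketch below is a Python model, not VHDL: the branch contents are transcribed from the slide, while the function names and the dictionary representation of signal assignments are illustrative assumptions.

```python
from itertools import product

# Assignments made by the branches that are identical in both forms.
COMMON = {
    "000": {"DSTATE": "WRITE0"},                    # RX Mode Code
    "010": {"DSTATE": "TXSTAT"},                    # TX Mode Code
    "001": {"DSTATE": "WRITE0"},                    # RX Data Transfer
    "011": {"DSTATE": "TXSTAT"},                    # TX Data Transfer
    "100": {"DSTATE": "WRITE0"},                    # Bcast RX Mode Code
    "110": {"DSTATE": "MSGSTAT", "LATCHSW": "1"},   # Bcast TX Mode Code
    "101": {"DSTATE": "WRITE0"},                    # Bcast RX Data Transfer
}
BCAST_TX_DATA = {"DSTATE": "MSGSTAT", "LATCHSW": "1"}


def explicit_form(muxsel: str) -> dict:
    """First form: "111" is listed and the others branch assigns nothing."""
    if muxsel in COMMON:
        return COMMON[muxsel]
    if muxsel == "111":
        return BCAST_TX_DATA
    return {}


def others_form(muxsel: str) -> dict:
    """Second form: "111" is folded into the others branch."""
    return COMMON.get(muxsel, BCAST_TX_DATA)


if __name__ == "__main__":
    binary_values = ["".join(bits) for bits in product("01", repeat=3)]
    assert all(explicit_form(m) == others_form(m) for m in binary_values)
    # Only a simulation-only value such as "ZZZ" separates the two forms.
    print(explicit_form("ZZZ"), others_form("ZZZ"))
```

The two forms differ only for non-binary std_logic values, which exist in simulation but not in hardware. That is why the coverage tool sees an unexercised branch in the first form while synthesis sees no difference.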
Coverage
From 99% to 100%
Getting the last 1% of coverage is time consuming
- Especially in designs that include lots of error detection and recovery logic
- Often in attempting to do this you will by accident force the design into an unexpected state that highlights an issue
Core1553BRT
- In going from 99% to 100% we discovered that, when we are transmitting and verifying the looped-back data, if the last word of a burst (Data or Status) contained all zeros and a Manchester error was introduced by the transceiver, then we did not detect the error
  - We did detect Manchester errors on their own
  - We did detect data errors on their own
- Additional tests have been added to the test benches to verify this in all future releases.
Safe State Machines
- Although space FPGAs incorporate redundancy through triple flip-flops and voting, RTL code also needs to be safe
- Commercial FPGA synthesis tools can generate ‘unsafe’ state machines
  - Optimized for small area or speed
  - One-hot state machines by default
  - Some have the option of safe state machines
- Make sure all illegal states are covered
- BUT HOW DO YOU PROVE IT IS SAFE?
  - For example, beware of hidden illegal conditions in the code, like counters that count to a value and reset
    - What happens if the count toggles to a value greater than the reset condition?
- In reality: design redundancy in and test it
  - Fix the state encoding
    - Synthesis tool independent
  - Make test benches to force illegal states
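The hidden illegal condition in a count-and-reset counter can be shown with a short behavioural model. This is a Python sketch under assumed values (a 4-bit counter that resets at 10); the names `step_equal`, `step_safe` and `cycles_to_recover` are hypothetical and do not come from the core.

```python
WIDTH = 4    # assumed counter width in bits
LIMIT = 10   # assumed value at which the counter resets


def step_equal(count: int) -> int:
    """Counter that resets only when the count equals LIMIT exactly."""
    return 0 if count == LIMIT else (count + 1) % (1 << WIDTH)


def step_safe(count: int) -> int:
    """Counter that resets whenever the count is at or beyond LIMIT."""
    return 0 if count >= LIMIT else count + 1


def cycles_to_recover(step, upset_value: int) -> int:
    """Clock cycles needed to get back to zero after an upset."""
    count, cycles = upset_value, 0
    while count != 0:
        count = step(count)
        cycles += 1
    return cycles


if __name__ == "__main__":
    # In normal operation the two counters behave identically.
    assert all(step_equal(c) == step_safe(c) for c in range(LIMIT + 1))
    # An upset flips the count to 12, beyond the reset value.
    print("equality compare:", cycles_to_recover(step_equal, 12), "cycles")
    print("range compare:   ", cycles_to_recover(step_safe, 12), "cycles")
```

With an equality compare the counter only recovers by wrapping around, which takes longer the wider the counter is. No normal test exercises this path, so a test bench has to force the illegal value to expose it.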
Safe State Machines
Design
- Hard-code states using bit_vectors
- Make sure all 2**N values are specified
  - In the case statement
  - Do not use an others clause; list all states. The simulator will warn if you’ve forgotten any states
  - Using bit_vector means that you need not worry about the ‘X’ and ‘Z’ branches in the case
- In illegal states
  - Clear critical signals, e.g. transmit enable
  - Send FSM back to IDLE state
  - Create a FSM_ERROR output
    - One for each state machine
- Synthesis
  - Make sure state registers are not duplicated; if they are, you may not detect the illegal state
  - Make sure any FSM optimization in the synthesis tool is disabled

    -- RT Data word transfers signals
    -- Hard encoded for safe state machines
    signal DSTATE : bit_vector(3 downto 0);
    constant IDLE    : bit_vector(3 downto 0) := "0000";
    …..
    constant ALLDONE : bit_vector(3 downto 0) := "1100";
    constant UNUSED0 : bit_vector(3 downto 0) := "1101";
    constant UNUSED1 : bit_vector(3 downto 0) := "1110";
    constant UNUSED2 : bit_vector(3 downto 0) := "1111";
    attribute syn_preserve of DSTATE : signal is true;
    attribute syn_encoding of DSTATE : signal is "original";
    attribute syn_replicate of DSTATE : signal is false;

    case DSTATE is
        ….
        when UNUSED0 | UNUSED1 | UNUSED2 =>
            FSMD_ERROR <= '1';
            DSTATE  <= IDLE; -- clear critical controls
            BENDREQ <= '0';
            ENC_STB <= '0';
            DBUSY   <= '0';
            CMDDONE <= '0';
    end case;
Safe State Machines
Testing
- How do you prove that the resultant netlist includes the safe state machine?
- Identify the STATE registers in the netlist.
- Using the simulator, force the state register to all states
  - Reset core after each test to prevent side effects of forcing states
- Verify that the FSM_ERROR output is asserted

    printf("Testing Main State Machine - 16 states, 13-15 Illegal");
    for state in 0 to 15 loop
        resetcore(RSTNOW,CLK16);
        printf(" Testing State %d : Restart by typing : do forcefsm.do 0 %04b",fmt(state)&fmt(state));
        assert FALSE report "Ignore ERROR, restart simulation ^^^^^^" severity ERROR ;
        -- before restarting state machine is forced to the illegal state
        wait for 1 us; -- allow time for tcl script to force error
        check_state(state, (state>=13), status, ERR);
    end loop;
    resetcore(RSTNOW,CLK16);

Simulator script that forces the state register bits in the netlist:

    force -deposit sim:/tbench/u12__0/uut1/DSTATE_3/Q $state_bit3 0
    force -deposit sim:/tbench/u12__0/uut1/DSTATE_2/Q $state_bit2 0
    force -deposit sim:/tbench/u12__0/uut1/DSTATE_1/Q $state_bit1 0
    force -deposit sim:/tbench/u12__0/uut1/DSTATE_0/Q $state_bit0 0
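The force-and-check loop can be modelled in a few lines to make the pass criterion explicit. The following Python sketch is a stand-in, not the actual test bench. It assumes the encoding from the design slide (states 13 to 15 unused), does not model legal-state transitions, and all function names are hypothetical.

```python
IDLE = 0b0000
ILLEGAL_STATES = {0b1101, 0b1110, 0b1111}   # UNUSED0, UNUSED1, UNUSED2


def clock_fsm(state: int):
    """One clock of a stand-in FSM; returns (next_state, fsm_error).

    Legal states simply hold here, because only the illegal-state
    recovery path is being illustrated.
    """
    if state in ILLEGAL_STATES:
        return IDLE, 1
    return state, 0


def force_commands(state: int, path: str = "sim:/tbench/u12__0/uut1") -> list:
    """The four force lines from the slide with the bit values filled in."""
    bits = format(state, "04b")   # bits[0] is DSTATE bit 3
    return [f"force -deposit {path}/DSTATE_{3 - i}/Q {bit} 0"
            for i, bit in enumerate(bits)]


def run_force_test() -> list:
    """Force every 4-bit state; return the states whose error flag was wrong."""
    failures = []
    for state in range(16):
        next_state, fsm_error = clock_fsm(state)
        expected_error = 1 if state >= 13 else 0
        if fsm_error != expected_error or (expected_error and next_state != IDLE):
            failures.append(state)
    return failures


if __name__ == "__main__":
    print("\n".join(force_commands(13)))
    print("failing states:", run_force_test())
```

The check mirrors `check_state(state, (state>=13), ...)` on the slide: the error flag must be set for exactly the three unused encodings and for no legal state.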
Safe State Machines
Results and Memory Protection
Has an effect on gate count and performance compared to normal implementation flows
- Causes a 7% increase in gate count
- Causes a 1% drop in performance
- But still fits in the device and meets performance requirements
Memory Usage
- Make sure that EDAC memory is used
- Consider scrub rates, etc.
- Avoid memory where possible, because it is more easily upset by radiation
What is a ‘Babbling’ Transmitter?
- Requirements
  - All RTs are required to monitor their outputs to detect if they are babbling and, if so, stop; this is referred to as a Fail Safe Timer
  - If babbling is detected by the bus controller, it sends a message to the terminal using the other bus to stop the babbling transmitter
- How can an RT babble?
  - Two errors (failures) have to occur within the terminal:
    1. The logic that controls the enable signal to the transmitter has to fail, and
    2. The terminal's fail-safe timer (maximum of 800.0 microseconds) has to have failed.
  - Some designs use a digital counter for the fail-safe timer; a single failure in a clock line could cause a babbling transmitter
Avoid Babbling Transmitter
Design
- Transmit Timeout
  - MIL-STD-1553 requires that a separate circuit monitors the transmissions and stops the transmitter if a babbling transmission is detected, i.e. greater than 33 words transmitted
  - Even though the protocol state machines may never theoretically cause this, it is a requirement to include this logic
  - Separate circuit that monitors the transmit enables and detects if active for greater than 680 us
  - If it triggers, then the enable to the external transceiver is disabled and an error condition is generated.

    process(CLKSPD)
    begin
        case CLKSPD is
            when "00" => HWTIMVALUE <= "0100001"; -- 12MHz
            when "01" => HWTIMVALUE <= "0101011"; -- 16MHz
            when "10" => HWTIMVALUE <= "0110110"; -- 20MHz
            when others => HWTIMVALUE <= "1000001"; -- 24MHz
        end case;
    end process;

    PTXTTIM: process(CLK,RSTn)
        variable TXT_TIMER : std_logic_vector(14 downto 0);
    begin
        if RSTn='0' then
            TXT_TIMER := ( others => '0');
            TXT_ERROR <= '0';
        elsif CLK'event and CLK='1' then
            TXT_ERROR <= '0';
            if TXT_TXBUSY='1' then
                TXT_TIMER := TXT_TIMER + 1;
            else
                TXT_TIMER := ( others => '0');
            end if;
            if TXT_TIMER(14 downto 8) = HWTIMVALUE then
                TXT_ERROR <= '1';
            end if;
        end if;
    end process;
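The HWTIMVALUE constants can be sanity-checked with a little arithmetic. Because only TXT_TIMER(14 downto 8) is compared, each constant is effectively multiplied by 256 clock cycles. The Python sketch below works out the resulting timeout per clock speed; the constants are copied from the slide, while the function name and the approximation (the first clock on which the compare matches) are assumptions of this sketch.

```python
# HWTIMVALUE per clock speed in MHz, copied from the case statement.
HWTIMVALUE = {
    12: "0100001",
    16: "0101011",
    20: "0110110",
    24: "1000001",
}


def timeout_us(clock_mhz: int) -> float:
    """Approximate time of continuous transmission before TXT_ERROR is raised."""
    # Comparing bits 14..8 means the low 8 bits are zero at the match point.
    clocks = int(HWTIMVALUE[clock_mhz], 2) << 8
    return clocks / clock_mhz   # clocks divided by MHz gives microseconds


if __name__ == "__main__":
    for mhz in sorted(HWTIMVALUE):
        t = timeout_us(mhz)
        # Must exceed the 680 us detection threshold and stay under
        # the 800 us fail-safe maximum quoted on the earlier slide.
        assert 680.0 < t < 800.0
        print(f"{mhz} MHz: {t:.1f} us")
```

All four settings land between roughly 688 and 704 microseconds, above the 680 us figure on this slide and below the 800 microsecond fail-safe maximum.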
Babbling Transmitter Testing
How do you test this?
- Protocol state machines do not do this in normal operation
Create test mode input: TESTTXTTOUT
- Modifies the protocol state machine
  - When high, causes >32 data words to be transmitted
- Test benches set this and verify that the core detects the babbling transmitter
- Allows testing, but does this create an additional failure mechanism?
  - May be pulled inactive by an external resistor; if this were to fail then the core would fail
External input can be disabled
- Can remove logic from core to prevent this error condition
  - Synthesis will remove the error injection logic.
Another consideration
Over Sampling
- Some systems can be improved by over-sampling input streams
  - Then filtering or voting
- 1553B
  - Already has a well-protected data stream
    - Manchester coding
      - “00” and “11” patterns are error conditions
    - Parity on data words
- Core1553BRT
  - Samples incoming data at 6X, 8X, 10X or 12X the base 2 MHz rate
    - Required for clock extraction and the ability to handle 1553B jitter and noise requirements
  - Additional over-sampling is not implemented at present because
    - As is, Core1553BRT passes all requirements of the 1553B RT test
    - It would require higher speed clocks
      - Higher power consumption
      - Larger device
    - It would require a major redesign
      - Adds additional risk
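The protection that Manchester coding already provides can be illustrated with a minimal decoder that flags the "00" and "11" patterns. This Python sketch works on ideal half-bit samples and ignores clock extraction, jitter and sync patterns. The bit polarity ("10" as logic 1, "01" as logic 0) is an assumption of the sketch, and the function names are hypothetical.

```python
BASE_RATE_MHZ = 2   # base rate quoted on the slide


def oversampling_factor(clock_mhz: int) -> int:
    """Samples per base-rate period for a given core clock."""
    return clock_mhz // BASE_RATE_MHZ


def decode_manchester(half_bits: str):
    """Decode a string of half-bit levels into bits and error positions.

    Each data bit is a pair of half-bits with a mid-bit transition.
    A pair without a transition ("00" or "11") is reported as an error.
    """
    bits, errors = [], []
    for i in range(0, len(half_bits) - 1, 2):
        pair = half_bits[i:i + 2]
        if pair == "10":
            bits.append(1)
        elif pair == "01":
            bits.append(0)
        else:
            errors.append(i // 2)   # index of the offending bit cell
    return bits, errors


if __name__ == "__main__":
    print([oversampling_factor(f) for f in (12, 16, 20, 24)])
    print(decode_manchester("100110"))   # clean stream
    print(decode_manchester("101100"))   # two cells lack a transition
```

Because every valid bit cell must contain a transition, a stuck or corrupted half-bit is caught at the coding level before the parity check is even reached.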
RTCA/DO-254
Design Assurance Guidance for Electronic HW
- Advisory Circular 20-152
  - Ratified 6/30/05, calls for DO-254 compliance for design assurance levels A, B or C
  - DO-254 standard originally developed in 2000
- DO-254 is a hardware standard; IP is hardware
  - There are many misunderstandings about this standard
  - So far, there is no precedent for DO-254 certified IP
  - We are focusing on section 10 and are considering providing Hardware Design Life Cycle Data for relevant cores
- What does it require?
  - A DO-254 development flow in addition to the ISO-certified flow
  - More documentation
    - It forces discipline to follow a test plan and document against that plan
    - PHAC and HAS are important elements
  - Without this, customers treat our IP as COTS products (section 11)
Lessons Learned
- High quality = attention to detail
  - You cannot do too much verification for IP in severe environments
    - We found a bug while increasing code coverage from 98% to 100%
  - Have gate reviews backed with data
  - Document variations from perfect
    - For example, if code coverage is 99%, understand why
- Experience matters
  - Design
  - Products
  - Customers
- There needs to be a way to add objectivity to verification
  - Against a tester
  - By a third party
  - Have another person review the code or perform verification
- You can always improve
  - Core originally tested at multiple speeds, but not multiple languages
  - DO-254 adds additional discipline to the development process
Conclusion
Pre-built and verified IP can reduce risk, if
- A structured, robust development process is followed
  - Phase-gate process, even if simplified
- Additional concerns for severe environments are considered
  - Safe state machines
  - Redundant check for babbling
- Verification and validation is demonstrated
  - Code coverage near 100%
  - Certification of demonstration board design
- Deliverables and documentation ease use
  - Helps integration and design re-use
- Many customers prove the core in a variety of environments
  - More than one company can do on its own
Conclusion
Block-based Design Enables Development
Spacecraft I/O Board Example
[Figure: block diagram of a spacecraft I/O board. Blocks: ASM51 MCU (8051), Shared Memory (on or off-chip), PCI, 1553 RT, Serial Channel, Remote Monitor, Prog. I/O, Sensor Module. Internal buses: Memory Data Bus, Special Function Register Bus. External connections: PCI bus to instrument panel, 1553 bus to rest of craft, Synchronous Serial Channel (SDLC), Asynchronous Serial Channel (UART), Data Transfer Port, Avionics Control Port.]