FPGA IP Verification for Use
in Severe Environments
2005 MAPLD International Conference
September 2005
Paper #237
Ian Land
Ian Bryant
Trends
Smaller geometries allow more functions
Synthesizable HDL makes design-reuse practical
Gate-level design is difficult with high density
Resource-intensive
Takes a long time
Increases likelihood of error
Thus, block-level design is needed
Intellectual property (IP) reduces effort and risk, if done right…
A robust design process is followed, with thorough verification
IP is proven in many applications, including space and severe environments
A MIL-STD-1553 example demonstrates this
Land
2
MAPLD 2005/237
Robust Design Process
Structured design flow should be phase-gated
Proposal
Justification for development and creation of the project plan
Definition and Planning
Preliminary datasheet creation defining the core
Test plan is needed
Development
The core is implemented and deliverables are created
Verification and Validation
Testing against plan and specification (e.g. MIL-STD-1553, PCI)
Release
Release of product for volume sales
Configuration Management, Feedback and Revision
[Figure: Phase / Gate diagram]
MIL-STD-1553 Example
Actel has developed three products
A full-featured BC, RT, MT
A ‘simple’ bus controller
A ‘simple’ remote terminal
Highlight: the simple remote terminal, Core1553BRT
Originally released in 2002 (first production August, 2002)
12 and 16 MHz versions
Updated for minor changes in 12/2002
Loop back test, version text in code, etc.
Updated for Verilog translation issue in April 2004
Updated in 9/2004 and 11/2004 to work with design tool updates
Revised to include 20 and 24 MHz versions in January 2005
Manchester encoders/decoders tested as part of full-featured BC, RT, MT
ProASIC3/E FPGA Family support added
MIL-STD-1553 RT Development
Proposal
Substantial customer demand for MIL-STD-1553 bus interface
Review of specification and competitive products suggested we could improve market offerings with rad-tolerant 1553 FPGA
Definition
MIL-STD-1553 Specification
Preliminary datasheet highlighting the features in the proposal
Development
Developed remote terminal
Paid careful attention to Manchester encoder/decoder blocks that would be re-used across the product family
Built two testbenches
Verification – runs full set of tests and mimics validation
User – runs fewer tests for incorporation into larger system design
RT Development, p.2
Verification and Validation
Stable, tested code with reviewed test results
Check corner cases and key parameters
Make sure parity errors are injected on every bit
12 and 16 MHz; 12 is the harder case due to clock extraction
Tested against existing MIL-STD-1553 COTS tester and Certified Development Kit at Test Systems, Inc.
Completely for 16 MHz and partially for 12 MHz
Validated Core1553 Evaluation Board
This is important to use with the verification test bench for future updates
Release gives first-rate integration
Core builds complete, board release, release note, user guide, data sheet, certification papers
Solution improves integration
Developed application note, reference design and example designs since 2002
Updates for Speed and Space
Added 20 and 24 MHz in early 2005 (v2.2)
Manchesters validated in full-featured BC, RT, MT core
Moved CLKSPD generic to 2-bit input port
Allows single netlist to support four frequencies
Modified top-level and backend timers
Updated test benches for 20 and 24 MHz and port maps
Fixed erroneous SYNCOUT pulses
Occur with some non-Actel transmitters on the bus
Updating for space in late 2005 (v3.0)
Protect the core from entering illegal states
Hardware test for a babbling transmitter
Re-qualify the core at Test Systems, Inc.
Severe Environment Considerations
Level 3 verification minimum; level 4 validation
MIL-STD-1553 cores have 3rd-party review at Test Systems, Inc.
Requires a validation report review - actions and responses
Have a certification envelope - test VHDL & Verilog versions at different speeds
Have exceptional documentation and support
Tool flow documented with versions for exact design replication
Minimize possibility of integration engineer problems
High coverage standards and well-explained variances
Code coverage target of 100% for RTL
Consider using error detection and correction for memory
Protect the core from entering illegal states and memory upsets
Synplicity's default state machine implementation could lock up if upset by an SEU
Adds redundancy and reduces risk
Use EDAC for memory
Avoid the possibility of a babbling transmitter
Can occur if a redundant system fails
Continuously investigate other means to improve quality
Over-sampling
The need for incorporating DO-254
MIL-STD-1553B Tool Issues
Limit tools and document for validated cores
Version 3.0 core will be qualified in hardware with
Synplicity 8.1 used for synthesis
Designer 6.2 used for layout
ModelSim 6.0c Actel OEM used for simulation
So what happens if a customer uses Exemplar, or even Synplicity 7.71?
The qualification is not repeatable…
The customer still needs to qualify their system
IP vendors should document which tool versions are used for qualified IP cores intended for severe environments, for
Repeatability
Re-use
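One way to make the documented tool flow checkable is to record it as data and compare each build against it. The following is only a sketch: the tool names and versions are the ones listed on this slide, while the dictionary layout, the step names and the function name are assumptions made for illustration and are not part of any vendor flow.

```python
# Tool versions named on the slide for the version 3.0 qualification.
# The step names ("synthesis", "layout", "simulation") are illustrative labels.
QUALIFIED_TOOLS = {
    "synthesis": ("Synplicity", "8.1"),
    "layout": ("Designer", "6.2"),
    "simulation": ("ModelSim Actel OEM", "6.0c"),
}


def deviations(used_tools):
    """List every step whose tool or version differs from the qualified flow."""
    report = []
    for step, qualified in QUALIFIED_TOOLS.items():
        used = used_tools.get(step)
        if used != qualified:
            report.append((step, qualified, used))
    return report


# A customer build that swaps in an older synthesis release.
customer_flow = dict(QUALIFIED_TOOLS, synthesis=("Synplicity", "7.71"))

for step, qualified, used in deviations(customer_flow):
    print(f"{step}: qualified with {qualified}, built with {used}")
```

Any reported deviation means the vendor's qualification cannot simply be reused for that build.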
Code Coverage
A way to prove that the test benches actually test all the designed-in functions
Allows you to verify that all lines of code are covered
Today’s tools allow
Statement coverage
Branch coverage
Condition Coverage
Expression Coverage
Toggle Coverage
BUT
Does not guarantee that the design actually implements the specification
Both the core and the testbench may omit the same function
Core1553BRT Code Coverage
Modular core design allows us to create tests to exercise a particular portion of code
Verification testbench reaches >99%
Uncovered lines are inspected and verified; typically they are conversion functions or branches that are coded purely for safety
Coverage is Actually 100%
Branch coverage does not show 100%, but in effect it is.
The reason is that we use safe coding that checks conditions before acting. These conditions are always true, but the code is better and safer with these statements.
Other uncovered branches look like this (first example):

    when INIT => case MUXSEL is
        when "000" => DSTATE <= WRITE0;    -- RX Mode Code
        when "010" => DSTATE <= TXSTAT;    -- TX Mode Code
        when "001" => DSTATE <= WRITE0;    -- RX Data Transfer
        when "011" => DSTATE <= TXSTAT;    -- TX Data Transfer
        when "100" => DSTATE <= WRITE0;    -- Bcast RX Mode Code
        when "110" => DSTATE <= MSGSTAT;   -- Bcast TX Mode Code
                      LATCHSW <= '1';
        when "101" => DSTATE <= WRITE0;    -- Bcast RX Data Transfer
        when "111" => DSTATE <= MSGSTAT;   -- Bcast TX Data Transfer
                      LATCHSW <= '1';
        when others =>
      end case;

We never take the others branch, because valid states 0-7 are all listed above it, but the VHDL language requires us to cover all possible values, including "ZZZ" in std_logic. This could be rewritten as follows (second example), which would give 100% coverage but whose meaning is not so obvious:

    when INIT => case MUXSEL is
        when "000" => DSTATE <= WRITE0;    -- RX Mode Code
        when "010" => DSTATE <= TXSTAT;    -- TX Mode Code
        when "001" => DSTATE <= WRITE0;    -- RX Data Transfer
        when "011" => DSTATE <= TXSTAT;    -- TX Data Transfer
        when "100" => DSTATE <= WRITE0;    -- Bcast RX Mode Code
        when "110" => DSTATE <= MSGSTAT;   -- Bcast TX Mode Code
                      LATCHSW <= '1';
        when "101" => DSTATE <= WRITE0;    -- Bcast RX Data Transfer
        when others => DSTATE <= MSGSTAT;  -- Bcast TX Data Transfer
                       LATCHSW <= '1';
      end case;

There is a trade-off here between coverage and readability
In the first example it is clear what the "111" condition does; not so in the second. They synthesize to the same circuit
Coverage
From 99% to 100%
Getting the last 1% of coverage is time consuming
Especially in designs that include lots of error detection and recovery logic
Often in attempting to do this you will by accident force the design into an unexpected state that highlights an issue
Core1553BRT
In going from 99% to 100% we discovered a missed error case: when transmitting and verifying the looped-back data, if the last word of a burst (Data or Status) contained all zeros and a Manchester error was introduced by the transceiver, the error was not detected
Manchester errors on their own were detected
Data errors on their own were detected
Additional tests have been added to the test benches to verify this in all future releases.
Safe State Machines
Although space FPGAs incorporate redundancy through triple flip-flops and voting, RTL code also needs to be safe
Commercial FPGA synthesis tools can generate ‘unsafe’ state machines
Optimized for small area or speed
One-hot state machines by default
Some have the option of safe state machines
Make sure all illegal states are covered
BUT HOW DO YOU PROVE IT IS SAFE?
For example, beware of hidden illegal conditions in the code, like counters that count to a value and reset
What happens if the count toggles to a value > the reset condition?
In reality - design redundancy in and test it
Fix the state encoding
Synthesis tool independent
Make test benches to force illegal states
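The counter hazard mentioned on this slide can be illustrated with a small behavioural model. This Python sketch is not taken from the core: the 4-bit width, the terminal count of 9 and the function names are assumptions chosen for illustration. It compares a counter that resets on an equality test with one that resets on a greater-or-equal test, after an upset has pushed the count past the terminal value.

```python
WIDTH = 4                    # assumed counter width in bits
TERMINAL = 9                 # assumed terminal count (normal sequence is 0..9)
MASK = (1 << WIDTH) - 1


def step_equal(count):
    """Next count when the reset condition is 'count == TERMINAL'."""
    return 0 if count == TERMINAL else (count + 1) & MASK


def step_range(count):
    """Next count when the reset condition is 'count >= TERMINAL'."""
    return 0 if count >= TERMINAL else (count + 1) & MASK


def cycles_to_recover(step, start):
    """Clock cycles until the counter is back inside the legal range 0..TERMINAL."""
    count, cycles = start, 0
    while count > TERMINAL:
        count = step(count)
        cycles += 1
    return cycles


# Simulate an upset that leaves the counter at 12, a value the normal
# sequence never reaches.
upset_value = 12
print("equality reset recovers after", cycles_to_recover(step_equal, upset_value), "cycles")
print("range reset recovers after", cycles_to_recover(step_range, upset_value), "cycles")
```

With the equality test the counter has to run on until it wraps around, and in a wide counter that can take a long time. The range test returns to zero on the next clock.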
Safe State Machines
Design
Hard code states using bit_vectors
Make sure all 2**N values are specified in the case statement
Do not use an others clause; list all states. Simulator will warn if you’ve forgotten any states
Using bit_vector means that you need not worry about the ‘X’ and ‘Z’ branches in the case
In Illegal States
Clear critical signals, e.g. Transmit enable
Send FSM back to IDLE state
Create a FSM_ERROR output
One for each state machine
Synthesis
Make sure state registers are not duplicated; if they are, you may not detect the illegal state
Make sure any FSM optimization in the synthesis tool is disabled

    -- RT Data word transfers signals
    -- Hard encoded for safe state machines
    signal DSTATE : bit_vector(3 downto 0);
    constant IDLE    : bit_vector(3 downto 0) := "0000";
    …..
    constant ALLDONE : bit_vector(3 downto 0) := "1100";
    constant UNUSED0 : bit_vector(3 downto 0) := "1101";
    constant UNUSED1 : bit_vector(3 downto 0) := "1110";
    constant UNUSED2 : bit_vector(3 downto 0) := "1111";
    attribute syn_preserve of DSTATE : signal is true;
    attribute syn_encoding of DSTATE : signal is "original";
    attribute syn_replicate of DSTATE : signal is false;

    case DSTATE is
      ….
      when UNUSED0 | UNUSED1 | UNUSED2 =>
        FSMD_ERROR <= '1';
        DSTATE  <= IDLE;   -- clear critical controls
        BENDREQ <= '0';
        ENC_STB <= '0';
        DBUSY   <= '0';
        CMDDONE <= '0';
    end case;
Safe State Machines
Testing
How do you prove that the resultant netlist includes the safe state machine?
Identify the STATE registers in the netlist.
Using the simulator, force the state register to all states
Reset core after each test to prevent side effects of forcing states
Verify that the FSM_ERROR output is asserted
    printf("Testing Main State Machine - 16 states, 13-15 Illegal");
    for state in 0 to 15 loop
      resetcore(RSTNOW,CLK16);
      printf(" Testing State %d : Restart by typing : do forcefsm.do 0 %04b",fmt(state)&fmt(state));
      assert FALSE report "Ignore ERROR, restart simulation ^^^^^^" severity ERROR ;
      -- before restarting state machine is forced to the illegal state
      wait for 1 us; -- allow time for tcl script to force error
      check_state(state, (state>=13), status, ERR);
    end loop;
    resetcore(RSTNOW,CLK16);
    ----------------------------------------------------------------
    force -deposit sim:/tbench/u12__0/uut1/DSTATE_3/Q $state_bit3 0
    force -deposit sim:/tbench/u12__0/uut1/DSTATE_2/Q $state_bit2 0
    force -deposit sim:/tbench/u12__0/uut1/DSTATE_1/Q $state_bit1 0
    force -deposit sim:/tbench/u12__0/uut1/DSTATE_0/Q $state_bit0 0
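The force-and-check loop above can be mirrored in a small behavioural model to show what the test is asserting. This Python sketch is an illustration only: the 4-bit register with encodings 13-15 illegal follows the slide, but the transition function is a placeholder (the real transitions are not given here) and all function names are assumptions.

```python
IDLE = 0b0000
LEGAL_STATES = set(range(13))     # encodings 0..12, matching "13-15 Illegal"
ALL_ENCODINGS = range(16)         # every value a 4-bit state register can hold


def next_state(state):
    """Return (next_state, fsm_error) for one clock of a placeholder FSM.

    Legal states simply advance to the next legal encoding. Illegal
    encodings go back to IDLE and raise the error flag, as the safe
    coding style requires.
    """
    if state in LEGAL_STATES:
        return (state + 1) % len(LEGAL_STATES), False
    return IDLE, True


def force_and_check():
    """Force every encoding in turn and record the observed error flag."""
    results = {}
    for forced in ALL_ENCODINGS:
        _, fsm_error = next_state(forced)   # stands in for: reset, force, clock once
        results[forced] = fsm_error
    return results


results = force_and_check()
for encoding, flagged in results.items():
    expected = encoding >= 13               # same criterion as check_state(state, state>=13, ...)
    assert flagged == expected, f"state {encoding:04b} gave the wrong error flag"

print("illegal encodings flagged:", [f"{s:04b}" for s, e in results.items() if e])
```

The important property is that all 16 encodings are exercised, not only the 13 that normal operation can reach.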
Safe State Machines
Results and Memory Protection
Has an effect on gate count and performance compared to normal implementation flows
Causes a 7% increase in gate count
Causes a 1% drop in performance
But still fits in device and meets performance requirements
Memory Usage
Make sure that EDAC memory is used
Consider scrub rates, etc.
Avoid memory where possible because it is more easily upset by radiation
What is a ‘Babbling’ Transmitter?
Requirements
All RTs are required to monitor their outputs to detect babbling and, if they are babbling, to stop; this is referred to as a Fail Safe Timer
If the bus controller detects babbling, it sends a message to the terminal on the other bus to stop the babbling transmitter
How can an RT babble?
Two errors (failures) have to occur within the terminal:
1. The logic that controls the enable signal to the transmitter has to fail, and
2. The terminal's fail-safe timer (maximum of 800.0 microseconds) has to have failed.
Some designs use a digital counter for the fail-safe timer; a single failure in a clock line could then cause a babbling transmitter
Avoid Babbling Transmitter
Design
MIL-STD-1553 requires that a separate circuit monitors the transmissions and stops the transmitter if a babbling transmission is detected, i.e. greater than 33 words transmitted
Even though the protocol state machines may never theoretically cause this, it is a requirement to include this logic
Transmit Timeout
Separate circuit that monitors the Transmit enables and detects if active for greater than 680us
If it triggers, then the enable to the external transceiver is disabled and an error condition is generated.

    process(CLKSPD)
    begin
      case CLKSPD is
        when "00"   => HWTIMVALUE <= "0100001"; -- 12MHz
        when "01"   => HWTIMVALUE <= "0101011"; -- 16MHz
        when "10"   => HWTIMVALUE <= "0110110"; -- 20MHz
        when others => HWTIMVALUE <= "1000001"; -- 24MHz
      end case;
    end process;

    PTXTTIM: process(CLK,RSTn)
      variable TXT_TIMER : std_logic_vector(14 downto 0);
    begin
      if RSTn='0' then
        TXT_TIMER := ( others => '0');
        TXT_ERROR <= '0';
      elsif CLK'event and CLK='1' then
        TXT_ERROR <= '0';
        if TXT_TXBUSY='1' then
          TXT_TIMER := TXT_TIMER + 1;
        else
          TXT_TIMER := ( others => '0');
        end if;
        if TXT_TIMER(14 downto 8) = HWTIMVALUE then
          TXT_ERROR <= '1';
        end if;
      end if;
    end process;
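The HWTIMVALUE constants can be cross-checked against the timing figures quoted in this talk. The Python sketch below is a worked calculation, not part of the core: it assumes that comparing TXT_TIMER(14 downto 8) means the error fires when the count first reaches HWTIMVALUE x 256 clock cycles, and it ignores a cycle or so of register latency. The function and constant names are illustrative.

```python
# HWTIMVALUE per clock frequency in MHz, copied from the case statement above.
HWTIMVALUE = {
    12: 0b0100001,
    16: 0b0101011,
    20: 0b0110110,
    24: 0b1000001,
}

LOW_BITS = 8   # only TXT_TIMER(14 downto 8) is compared, so bits 7..0 are ignored


def timeout_cycles(clock_mhz):
    """First timer value whose upper seven bits equal HWTIMVALUE."""
    return HWTIMVALUE[clock_mhz] << LOW_BITS


def timeout_us(clock_mhz):
    """Approximate continuous TXT_TXBUSY time, in microseconds, before TXT_ERROR."""
    return timeout_cycles(clock_mhz) / clock_mhz


for mhz in sorted(HWTIMVALUE):
    t = timeout_us(mhz)
    print(f"{mhz} MHz: {timeout_cycles(mhz)} cycles = {t:.1f} us")
    # Longer than the 680 us detection figure, shorter than the 800 us
    # fail-safe timer maximum quoted earlier.
    assert 680.0 < t < 800.0
```

Under these assumptions the four settings give about 704, 688, 691.2 and 693.3 microseconds, so each one falls between the 680 us detection figure and the 800 us maximum.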
Babbling Transmitter Testing
How do you test this ?
Protocol State machines do not do this in normal operation
Create test mode input - TESTTXTTOUT
Modifies the protocol state machine
When high, causes >32 data words to be transmitted
Test benches set this and verify that the core detects the
babbling transmitter
Allows testing, but does this create an additional failure mechanism?
May be pulled inactive by an external resistor; if this were to fail then the core would fail
External input can be disabled
Can remove logic from core to prevent this error condition
Synthesis will remove the error injection logic.
Another consideration
Over Sampling
Some systems can be improved by over-sampling input streams
Then filtering or voting
1553B
Already has well protected data stream
Manchester coding
“00” and “11” patterns are error conditions
Parity on data words
Core1553BRT
Samples incoming data at 6X, 8X, 10X or 12X the base 2 MHz rate
Required for clock extraction and ability to handle 1553B jitter and noise requirements
Additional over-sampling is not implemented at present because
As is, Core1553BRT passes all requirements of the 1553B RT test
Would require higher speed clocks
Higher power consumption
Larger device
Would require a major redesign
Adds additional risk with a major redesign
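The protection already built into the 1553B data stream can be sketched in a few lines. This Python model is an illustration, not the core's decoder: it assumes the 1553 convention that a logic 1 is sent high-then-low, it models only the 16 data bits plus the odd parity bit (the sync field is left out), and all function names are made up for the example.

```python
def manchester_encode(bits):
    """Encode bits as half-bit pairs: 1 -> (1, 0), 0 -> (0, 1)."""
    out = []
    for b in bits:
        out.extend((1, 0) if b else (0, 1))
    return out


def manchester_decode(half_bits):
    """Return (bits, error). A "00" or "11" pair is a coding violation."""
    if len(half_bits) % 2:
        return [], True
    bits = []
    for i in range(0, len(half_bits), 2):
        pair = (half_bits[i], half_bits[i + 1])
        if pair == (1, 0):
            bits.append(1)
        elif pair == (0, 1):
            bits.append(0)
        else:
            return bits, True
    return bits, False


def odd_parity_bit(data_bits):
    """Parity bit that makes the total number of ones odd."""
    return 0 if sum(data_bits) % 2 == 1 else 1


def check_word(half_bits):
    """Classify a received word as 'ok', 'manchester' or 'parity'."""
    bits, coding_error = manchester_decode(half_bits)
    if coding_error or len(bits) != 17:
        return "manchester"
    return "ok" if sum(bits) % 2 == 1 else "parity"


data = [1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0]   # arbitrary example word
word = data + [odd_parity_bit(data)]
line = manchester_encode(word)
print("clean word:", check_word(line))

# Flip each of the 17 bits in turn: every single-bit error must be caught by parity.
for position in range(len(word)):
    corrupted = list(word)
    corrupted[position] ^= 1
    assert check_word(manchester_encode(corrupted)) == "parity"

# Corrupt one half-bit so the first pair becomes "11": a Manchester violation.
bad_line = list(line)
bad_line[1] = 1
print("corrupted half-bit:", check_word(bad_line))
```

The loop over every bit position mirrors the verification rule stated earlier in the talk, that parity errors are injected on every bit.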
RTCA/DO-254
Design Assurance Guidance for Electronic HW
Advisory Circular 20-152
Ratified 6/30/05, calls for DO-254 compliance for design assurance levels A, B or C
DO-254 standard originally developed in 2000
DO-254 is a hardware standard, IP is hardware
There are many misunderstandings about this standard
So far, there is no precedent for DO-254 certified IP
We are focusing on section 10 and considering providing Hardware Design Life Cycle Data for relevant cores
What does it require?
A DO-254 development flow in addition to the ISO-certified flow
More documentation
It forces discipline to follow a test plan and document against that plan
PHAC and HAS are important elements
Without this, customers treat our IP as COTS products (section 11)
Lessons Learned
High quality = attention to detail
You cannot do too much verification for IP in severe environments
We found a bug while increasing code coverage from 98% to 100%
Have gate reviews backed with data
Document variations from perfect
For example, if code coverage is 99%, understand why
Experience matters
Design
Products
Customers
There needs to be a way to add objectivity to verification
Against a tester
By a third party
Have another person review the code or perform verification
You can always improve
Core originally tested at multiple speeds, but not multiple languages
DO-254 adds additional discipline to the development process
Conclusion
Pre-built and verified IP can reduce risk, if
A structured, robust development process is followed
Phase-gate process, even if simplified
Additional concerns for severe environments are considered
Safe state machines
Redundant check for babbling
Verification and validation is demonstrated
Code coverage near 100%
Certification of demonstration board design
Deliverables and documentation ease use
Helps integration and design re-use
Many customers prove the core in a variety of environments
More environments than one company can cover on its own
Conclusion
Block-based Design Enables Development
Spacecraft I/O Board Example
[Figure: block diagram of a spacecraft I/O board. Blocks: ASM51 MCU (8051), Shared Memory (on or off-chip), Serial Channel, Remote Monitor, Prog. I/O, Sensor Module, PCI, 1553 RT. Buses and ports: Memory Data Bus, Special Function Register Bus, Synchronous Serial Channel (SDLC), Asynchronous Serial Channel (UART), Data Transfer Port, Avionics Control Port. External links: PCI bus to instrument panel, 1553 bus to rest of craft.]