Using Software Rules To Enhance FPGA Reliability Chandru Mirchandani Lockheed-Martin Transportation & Security Solutions September 7-9, 2005 MIRCHANDANI P226/MAPLD2005



Using Software Rules To Enhance
FPGA Reliability
Chandru Mirchandani
Lockheed-Martin Transportation & Security Solutions
September 7-9, 2005
MIRCHANDANI
1
P226/MAPLD2005
Introduction

To meet:
• System Objectives

Develop a Process to:
• Verify FPGA Capability
• Validate FPGA Reliability
• Enhance FPGA Quality

By developing an Adaptive Model using Software Rules.
Problem Statement



• Requirement: Display sensor data in near-real time
• Constraints: No loss of data; data quality and integrity; timeliness
• Information: The information available for making the design decision with the lowest risk of failure is uncertain
• Solution: Adaptive Model
Software Reliability

• Develop Criteria for Design Objective Acceptance
• Prioritize tasks or functions in order of criticality
• Develop metrics to measure the performance of tasks with respect to constraints
• Evaluate design options based on measured reliability metrics
Typical Software Options

Critical software functions are distributed as redundant instances on multiple processors, minimizing the loss of service due to a processor failure.
[Figure: Application A1 runs on Processor 1 (primary) and a redundant instance of Application A1 runs on Processor 2 (secondary)]
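The benefit of a primary/secondary pair can be put in numbers. The following is a minimal Python sketch, assuming the two instances fail independently with a constant failure rate (exponential model); the function names and the example rate are hypothetical, not taken from the slides.

```python
import math


def single_reliability(failure_rate: float, t: float) -> float:
    """Reliability of one instance with a constant failure rate (exponential model)."""
    return math.exp(-failure_rate * t)


def duplex_reliability(r_single: float) -> float:
    """Reliability of two independent redundant instances (1-out-of-2):
    service is lost only if both the primary and the secondary fail."""
    return 1.0 - (1.0 - r_single) ** 2


if __name__ == "__main__":
    r = single_reliability(failure_rate=0.1, t=1.0)  # hypothetical rate and time
    print(f"single instance:     {r:.4f}")
    print(f"primary + secondary: {duplex_reliability(r):.4f}")
```

Independence is the key assumption here: a fault shared by both instances (for example, the same software defect running on both processors) defeats the redundancy, which is what the common-cause terms later in the deck account for.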
Typical Software Options (contd.)

Distributing system-level functions so that multiple users can use the function independently.
[Figure: Application B1 runs on both Processor 1 and Processor 2]
Typical Software Options (contd.)

Data replication to minimize the loss of critical data in the event of a processor failure or software system failure.
[Figure: Processor 1 runs Application C1 with Storage 1; Processor 2 runs a second instance of Application C1 with Storage 2]
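Data replication can be sketched as writing every record to all healthy storage replicas and reading from any replica that survives. This is a minimal in-memory Python sketch; the `Replica` and `ReplicatedStore` classes and the `failed` flag used to simulate a processor or storage failure are hypothetical illustrations, not part of the original design.

```python
from typing import Dict, List, Optional


class Replica:
    """One storage location (hypothetical stand-in for Storage 1 or Storage 2)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data: Dict[str, bytes] = {}
        self.failed = False  # set True to simulate a processor or storage failure


class ReplicatedStore:
    """Writes each record to every healthy replica, so losing one replica
    does not lose the critical data."""

    def __init__(self, replicas: List[Replica]) -> None:
        self.replicas = replicas

    def write(self, key: str, value: bytes) -> int:
        """Store value on all healthy replicas; return how many copies were made."""
        copies = 0
        for replica in self.replicas:
            if not replica.failed:
                replica.data[key] = value
                copies += 1
        if copies == 0:
            raise RuntimeError("no healthy replica available")
        return copies

    def read(self, key: str) -> Optional[bytes]:
        """Return the value from the first healthy replica that holds it."""
        for replica in self.replicas:
            if not replica.failed and key in replica.data:
                return replica.data[key]
        return None


if __name__ == "__main__":
    store = ReplicatedStore([Replica("Storage 1"), Replica("Storage 2")])
    store.write("frame-1", b"sensor data")
    store.replicas[0].failed = True  # Storage 1 is lost
    print(store.read("frame-1"))     # still served by Storage 2
```

A real design would also have to resynchronize a replica after it recovers; the sketch omits that step.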
Redundant Instances of Software



• First, detect, contain, and recover from faults as soon as possible; if this is not possible,
• Pass control to the redundant instance within the reliability and availability requirements levied on the system
• Finally, include language-defined mechanisms to detect and prevent the propagation of errors
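The detect, contain, recover, and fail-over sequence can be sketched in Python, using exceptions as the language-defined mechanism for detecting faults and stopping their propagation. This is a minimal sketch under stated assumptions: `TaskFault`, `run_with_failover`, and the single local retry are hypothetical choices, not the author's implementation.

```python
from typing import Callable, List, Optional


class TaskFault(Exception):
    """Signals a fault that an instance detected but could not handle itself."""


def run_with_failover(instances: List[Callable[[bytes], bytes]], data: bytes,
                      max_local_retries: int = 1) -> bytes:
    """Run the task on the first instance that succeeds (primary first).

    Detect:    a fault surfaces as a TaskFault exception.
    Recover:   retry the same instance up to max_local_retries more times.
    Fail over: if local recovery fails, pass control to the next instance.
    Contain:   TaskFault is caught here, so it does not propagate to callers
               unless every instance has failed.
    """
    last_fault: Optional[TaskFault] = None
    for instance in instances:
        for _ in range(1 + max_local_retries):
            try:
                return instance(data)
            except TaskFault as fault:
                last_fault = fault
    raise RuntimeError("all redundant instances failed") from last_fault
```

Retrying locally before failing over reflects the ordering on the slide: recovery in place is attempted first, and control passes to the redundant instance only when that is not possible.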
Methodology



• Estimate the reliability based on the instruction set and operational usage
• Re-design critical elements to decrease risk
• Re-evaluate the risk of failure after a change in critical-task design, considering performance and requirements
• Re-evaluate the reliability based on the failure rate
• Factor in the uncertainty in the evaluation
Task Times
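Task Class | Steps | Step Time ($s_{task}$) | Steps × Step Time | Task Time
Reading ($r$) | $x_{ri}$ | $s_r$ | $s_r \cdot x_{ri}$ | $t_r = (s_r \cdot x_{ri}) \cdot n_r$
Parsing ($p$) | $x_{pi}$ | $s_p$ | $s_p \cdot x_{pi}$ | $t_p = (s_p \cdot x_{pi}) \cdot n_p$
Pre-processing ($p1$) | $x_{p1i}$ | $s_{p1}$ | $s_{p1} \cdot x_{p1i}$ | $t_{p1} = (s_{p1} \cdot x_{p1i}) \cdot n_{p1}$
Monitoring ($M$) | $x_{Mi}$ | $s_M$ | $s_M \cdot x_{Mi}$ | $t_M = (s_M \cdot x_{Mi}) \cdot n_M$
Sorting ($s$) | $x_{si}$ | $s_s$ | $s_s \cdot x_{si}$ | $t_s = (s_s \cdot x_{si}) \cdot n_s$
Processing ($P$) | $x_{Pi}$ | $s_P$ | $s_P \cdot x_{Pi}$ | $t_P = (s_P \cdot x_{Pi}) \cdot n_P$
Post-processing ($p2$) | $x_{p2i}$ | $s_{p2}$ | $s_{p2} \cdot x_{p2i}$ | $t_{p2} = (s_{p2} \cdot x_{p2i}) \cdot n_{p2}$
Status-gathering ($S$) | $x_{Si}$ | $s_S$ | $s_S \cdot x_{Si}$ | $t_S = (s_S \cdot x_{Si}) \cdot n_S$
Writing ($w$) | $x_{wi}$ | $s_w$ | $s_w \cdot x_{wi}$ | $t_w = (s_w \cdot x_{wi}) \cdot n_w$

Total Tasks Time: $t_{task} = t_r + t_p + t_{p1} + t_M + t_s + t_P + t_{p2} + t_S + t_w$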
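The task-time arithmetic can be sketched in a few lines of Python. This is a minimal illustration: the slide does not define $n_k$, so the sketch assumes it is the number of times task class $k$ runs; the `TaskClass` type and all numeric values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class TaskClass:
    steps: float      # x_ki: steps per task (illustrative value)
    step_time: float  # s_k: time per step (illustrative value)
    count: int        # n_k: ASSUMED to be the number of times the task runs

    def task_time(self) -> float:
        """t_k = (s_k * x_ki) * n_k"""
        return (self.step_time * self.steps) * self.count


def total_task_time(classes: Dict[str, TaskClass]) -> float:
    """t_task: sum of the task times of all task classes."""
    return sum(c.task_time() for c in classes.values())


if __name__ == "__main__":
    # Hypothetical values for three of the task classes
    tasks = {
        "Reading": TaskClass(steps=4, step_time=2.0, count=10),
        "Parsing": TaskClass(steps=3, step_time=1.5, count=10),
        "Pre-processing": TaskClass(steps=5, step_time=0.5, count=20),
    }
    print(f"t_task = {total_task_time(tasks)}")
```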
FPGA System - Conceptual

Consider an FPGA-based system comprising the Reading, Parsing and Pre-Processing tasks...
[Figure: Input → S_R (Reading) → S_P (Parsing) → S_PP (Pre-Processing) → Output, with two instances of each subsystem]
...each task is a subsystem.
Task Reliability Block Diagram
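[Figure: Reliability block diagram of the Reading task. Two redundant Reading channels, each Reading HW in series with Reading SW, are combined in parallel (OR); the pair is in series (AND) with a Reading common-cause-failure (CCF) block.]

$$R_{SS_i}(t) = \left[1 - \left(1 - e^{-(1-\gamma_h)\,u_h\,\lambda_{hwi}\,t}\; e^{-(1-\gamma_s)\,u_s\,\lambda_{swi}\,t}\right)^{2}\right] \cdot e^{-\gamma_h\,u_h\,\lambda_{hwi}\,t}\; e^{-\gamma_s\,u_s\,\lambda_{swi}\,t}$$

The bracketed factor (OR) is the redundant pair of independent HW/SW channels, whose exponents are $(1-\gamma_h) u_h \lambda_{hwi} = \lambda_{shwi}$ and $(1-\gamma_s) u_s \lambda_{swi} = \lambda_{sswi}$; the second factor (AND) is the common-cause block.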
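The subsystem reliability expression translates directly into code. This minimal Python sketch assumes the independent-failure intensity of one channel is $(1-\gamma_h) u_h \lambda_{hwi}$ for hardware and $(1-\gamma_s) u_s \lambda_{swi}$ for software (that is, $\lambda_{shwi}$ and $\lambda_{sswi}$ as defined under Parameters & Derivations), with the remaining $\gamma$ fraction acting as common cause; the function name and parameter names are hypothetical.

```python
import math


def subsystem_reliability(t: float, u_h: float, u_s: float,
                          lam_hw: float, lam_sw: float,
                          gamma_h: float, gamma_s: float) -> float:
    """R_SSi(t) for a duplex subsystem with common-cause failures.

    Each channel is HW in series with SW; the two channels are in parallel
    (OR), and the pair is in series (AND) with the common-cause block.
    """
    # Independent (non-common-cause) failure intensities of one channel
    lam_shw = (1.0 - gamma_h) * u_h * lam_hw
    lam_ssw = (1.0 - gamma_s) * u_s * lam_sw
    channel = math.exp(-lam_shw * t) * math.exp(-lam_ssw * t)
    parallel = 1.0 - (1.0 - channel) ** 2  # OR: at least one channel survives
    # Common-cause intensities strike both channels at once
    ccf = math.exp(-gamma_h * u_h * lam_hw * t) * math.exp(-gamma_s * u_s * lam_sw * t)
    return parallel * ccf  # AND: in series with the CCF block
```

Two limiting cases make a quick sanity check: with $\gamma_h = \gamma_s = 0$ the result is a plain 1-out-of-2 pair, and with $\gamma_h = \gamma_s = 1$ the redundancy gives no benefit and the result equals the reliability of a single channel.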
Definitions
Calendar Time ($\tau$): Mission time over which the reliability is calculated
Execution ($e_i$): Percentage of the mission time used by the task (or subsystem)
Execution Time ($t$): $e_i \cdot \tau$
Usage for SW ($u_s$): Percentage of the total software used by the task
Usage for HW ($u_h$): Percentage of the area of the active portion of the device used by the task
$\lambda_{shwi}$: Failure intensity of task i hardware with respect to execution time
$\lambda_{sswi}$: Failure intensity of task i software with respect to execution time
$\gamma_{hi}$: Fraction of task i hardware failures that are common-cause failures
$\gamma_{si}$: Fraction of task i software failures that are common-cause failures
Parameters & Derivations






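Failure Intensity: $\lambda_{shwi} = \lambda_{hwi} \cdot u_h \cdot (1-\gamma_h)$
Failure Intensity: $\lambda_{sswi} = \lambda_{swi} \cdot u_s \cdot (1-\gamma_s)$
Common Cause: $\lambda_{hwi} \cdot u_h \cdot \gamma_h$ and $\lambda_{swi} \cdot u_s \cdot \gamma_s$
Execution Time: $t = e_i \cdot \tau$
$R_{SS_i}$: Subsystem Reliability
System Reliability: $R_S = R_{SS1} \cdot R_{SS2} \cdot R_{SS3}$

Parameter | Reading | Parsing | Pre-Processing
Usage SW ($u_s$) | 0.3 | 0.3 | 0.4
Usage HW ($u_h$) | 0.3 | 0.4 | 0.3
$\lambda_{hwi}$ | 0.3 | 0.4 | 0.3
$\lambda_{swi}$ | 0.3 | 0.4 | 0.3
Execution ($e_i$) | 0.2 | 0.1 | 0.7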
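The derivations above can be applied mechanically to the parameter table. This minimal Python sketch encodes the tabulated values and computes the independent intensities, the common-cause intensities, and the execution time for one choice of $\gamma_h$, $\gamma_s$, and $\tau$; the dictionary keys and the `derive` function are hypothetical names, and $\tau$ must be supplied by the caller because the slides do not state it.

```python
from typing import Dict

# Values from the parameter table (per subsystem)
PARAMS: Dict[str, Dict[str, float]] = {
    "Reading":        {"u_s": 0.3, "u_h": 0.3, "lam_hw": 0.3, "lam_sw": 0.3, "e": 0.2},
    "Parsing":        {"u_s": 0.3, "u_h": 0.4, "lam_hw": 0.4, "lam_sw": 0.4, "e": 0.1},
    "Pre-Processing": {"u_s": 0.4, "u_h": 0.3, "lam_hw": 0.3, "lam_sw": 0.3, "e": 0.7},
}


def derive(p: Dict[str, float], gamma_h: float, gamma_s: float,
           tau: float) -> Dict[str, float]:
    """Apply the Parameters & Derivations formulas to one subsystem."""
    return {
        "lam_shw": p["lam_hw"] * p["u_h"] * (1.0 - gamma_h),  # independent HW
        "lam_ssw": p["lam_sw"] * p["u_s"] * (1.0 - gamma_s),  # independent SW
        "ccf_hw": p["lam_hw"] * p["u_h"] * gamma_h,           # common-cause HW
        "ccf_sw": p["lam_sw"] * p["u_s"] * gamma_s,           # common-cause SW
        "t": p["e"] * tau,                                    # execution time
    }


if __name__ == "__main__":
    for name, p in PARAMS.items():
        print(name, derive(p, gamma_h=0.01, gamma_s=1.0, tau=1.0))
```

Note that the independent and common-cause intensities always add back up to $u_h \lambda_{hwi}$ (or $u_s \lambda_{swi}$): $\gamma$ only decides how that usage-weighted failure intensity is split between the two paths.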
System Configuration Options
Configuration | HW Common Cause Fraction ($\gamma_h$) | SW Common Cause Fraction ($\gamma_s$)
Same Code & Same Device | 0.01 | 1
Same Code & Diff Devices | 0.0025 | 0.9975
Diff Code & Same Device | 0.01 | 0.5
Diff Code & Diff Devices | 0.0025 | 0.1
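Putting the pieces together, the system reliability $R_S = R_{SS1} \cdot R_{SS2} \cdot R_{SS3}$ can be evaluated for each configuration. This Python sketch reuses the tabulated parameters and common-cause fractions, and it reads the channel exponents as $(1-\gamma) u \lambda$ (an assumption about the intended form). The slides do not state the mission time $\tau$, so the sketch assumes $\tau = 1$; its printed values are illustrative only and are not expected to reproduce the Results table. All function and variable names are hypothetical.

```python
import math
from typing import Dict, Tuple

# (u_s, u_h, lambda_hw, lambda_sw, e) per subsystem, from the parameter table
SUBSYSTEMS: Dict[str, Tuple[float, float, float, float, float]] = {
    "Reading": (0.3, 0.3, 0.3, 0.3, 0.2),
    "Parsing": (0.3, 0.4, 0.4, 0.4, 0.1),
    "Pre-Processing": (0.4, 0.3, 0.3, 0.3, 0.7),
}

# (gamma_h, gamma_s) per configuration option
CONFIGS: Dict[str, Tuple[float, float]] = {
    "Same Code & Same Device": (0.01, 1.0),
    "Same Code & Diff Devices": (0.0025, 0.9975),
    "Diff Code & Same Device": (0.01, 0.5),
    "Diff Code & Diff Devices": (0.0025, 0.1),
}


def subsystem_reliability(u_s: float, u_h: float, lam_hw: float, lam_sw: float,
                          t: float, gamma_h: float, gamma_s: float) -> float:
    """Duplex HW/SW channels in parallel, in series with the common-cause block."""
    channel = math.exp(-(1 - gamma_h) * u_h * lam_hw * t - (1 - gamma_s) * u_s * lam_sw * t)
    ccf = math.exp(-gamma_h * u_h * lam_hw * t - gamma_s * u_s * lam_sw * t)
    return (1 - (1 - channel) ** 2) * ccf


def system_reliability(gamma_h: float, gamma_s: float, tau: float) -> float:
    """R_S: product of the subsystem reliabilities (subsystems in series)."""
    r = 1.0
    for u_s, u_h, lam_hw, lam_sw, e in SUBSYSTEMS.values():
        r *= subsystem_reliability(u_s, u_h, lam_hw, lam_sw, e * tau, gamma_h, gamma_s)
    return r


if __name__ == "__main__":
    TAU = 1.0  # assumption: the slides do not state the mission time
    for name, (g_h, g_s) in CONFIGS.items():
        print(f"{name}: {system_reliability(g_h, g_s, TAU):.6f}")
```

Whatever $\tau$ is chosen, lowering $\gamma_s$ moves software failures from the common-cause block into the redundant pair, where a single failure no longer takes the subsystem down; that is why the different-code options come out ahead.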
Results
Option | Configuration | FPGA-based System Reliability
1 | Same Code, Same Devices | 0.895726564
2 | Same Code, Diff Devices | 0.895973815
3 | Diff Code, Same Devices | 0.944752579
4 | Diff Code, Diff Devices | 0.98356125
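One way to compare the reported options is by failure probability $1 - R$ rather than by $R$ alone. This short Python sketch uses only the reported values; the helper names are hypothetical.

```python
from typing import Dict, Tuple

# Reported FPGA-based system reliabilities: option -> (configuration, R_S)
RESULTS: Dict[int, Tuple[str, float]] = {
    1: ("Same Code, Same Devices", 0.895726564),
    2: ("Same Code, Diff Devices", 0.895973815),
    3: ("Diff Code, Same Devices", 0.944752579),
    4: ("Diff Code, Diff Devices", 0.98356125),
}


def best_option(results: Dict[int, Tuple[str, float]]) -> int:
    """Option with the highest system reliability."""
    return max(results, key=lambda k: results[k][1])


def unreliability_ratio(results: Dict[int, Tuple[str, float]],
                        baseline: int, candidate: int) -> float:
    """How many times smaller the failure probability (1 - R) of the candidate is."""
    return (1.0 - results[baseline][1]) / (1.0 - results[candidate][1])


if __name__ == "__main__":
    best = best_option(RESULTS)
    print(f"best: option {best} ({RESULTS[best][0]})")
    print(f"option 1 vs option {best}: {unreliability_ratio(RESULTS, 1, best):.2f}x lower failure probability")
```

For these values, going from option 1 to option 4 lowers the failure probability by a factor of about 6.3, while option 2 improves on option 1 only in the fourth decimal place.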
Conclusions

• Cost and Schedule Slips
• Development Delays and Costs
• Adaptive Model
• Optimization and Design Constraints
Contact Address: [email protected]