Software Certification Will it Fly?


Holistic reliability management
Keene & Associates
Software and System Reliability
Concepts to Assess and Improve
Products and Their Underlying
Product Development Process
Hewlett Packard February 15, 2006
Dr. Samuel Keene, FIEEE
Dr. Samuel Keene, FIEEE
• Six Sigma Sr. Master Black Belt
• Past President of IEEE RS
• HW, SW and System Reliability
• Reliability Engineer of the Year in 1996
• Reliability Engineer, Consultant, and Educator
• Education: Physics (BS and MS), Operations Research, and MBA
ho·lis·tic (hō-lĭs′tĭk)
a. Emphasizing the importance of the whole and the interdependence of its parts.
b. Concerned with wholes rather than analysis or separation into parts: holistic medicine; holistic ecology.
Note: Safety, security, reliability, and survivability are system attributes.
The whole is more than the sum of its parts
• Hardware reliability
• Hardware to software reliability comparison
• System concepts
• Software reliability concepts
• Measuring software reliability
• Building more reliable software and systems
• Managing variation
Recent Ref: The Quest for Imperfection, Machine Design, 10.5.5 (C,
Misapplied measurements and focus)
Notorious Failures (assignable cause)
• Patriot missile misfire (1991): operational profile change
• Jupiter flyby: programmed to switch power supplies if no communication was received within 7 days (15-year mission)
• Mars Climate Orbiter (1998): mix of metric and Imperial units
• DSC Communications failure (1991): 4 bits changed in 13 LOC but not regression tested
Allegedly, the first time the F-15 crossed the equator
Normal everyday flight?
One more special-cause-driven reliability problem
• Pfizer pharmaceutical products were experiencing intermittent failures in a Paris operating room.
• On-site investigation revealed that a doctor’s cell phone was interfering with Pfizer’s medical equipment.
• Solution: redesign the chassis covers, reducing the orifices (holes) in the equipment covers to block radiation.
Bathtub curve
• The slope provides insight into the failure mechanism
[Figure: bathtub curve of failure rate λ(t) versus time. Infant mortality: β < 1.0. Random failures: β = 1.0. Wearout: β > 1.0.]
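The link between the slope parameter and the failure mechanism can be illustrated with the Weibull hazard function, h(t) = (β/η)(t/η)^(β-1). The sketch below is only an illustration: it assumes the slide's b is the Weibull shape parameter β, and the scale η and the two sample times are hypothetical values chosen for the demonstration.

```python
def weibull_hazard(t, beta, eta=1.0):
    """Weibull hazard rate h(t) = (beta/eta) * (t/eta)**(beta - 1)."""
    if t <= 0 or beta <= 0 or eta <= 0:
        raise ValueError("t, beta and eta must be positive")
    return (beta / eta) * (t / eta) ** (beta - 1)


def hazard_trend(beta, t_early=0.5, t_late=2.0):
    """Classify the hazard slope between two (hypothetical) sample times."""
    early = weibull_hazard(t_early, beta)
    late = weibull_hazard(t_late, beta)
    if abs(late - early) < 1e-12:
        return "constant (random failures)"
    if late < early:
        return "decreasing (infant mortality)"
    return "increasing (wearout)"


if __name__ == "__main__":
    for beta in (0.5, 1.0, 3.0):
        print(beta, hazard_trend(beta))
```

A shape below 1 gives a falling failure rate, a shape of exactly 1 gives a constant rate, and a shape above 1 gives a rising rate, which matches the three regions of the curve.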
Hardware failures are special-cause driven also
History:
• Parts count
• MIL-HDBK-217
  • Part type
  • Stress
  • Some application factors
• My IBM FA experience
• PRISM model
Reliability Prediction Failure Analysis Experience
• My IBM failure analysis experience: Pareto (80-20) effect observed
• Special cause vs. common cause
• Actually, a 99-1% breakdown of parts experiencing reliability problems
PRISM Reliability Model
Based upon extensive military and commercial field return data, modified by a broad spectrum of expert application factors (e.g., EMC-related questions):
• Are the equipment orifices smaller than 1/10 of the wavelength of the emissions the equipment is exposed to?
• Will the product be EMC certified by the EU for emissions and susceptibility?
• Do traces on alternate layers run orthogonal to each other?
• Are adjacent traces separated by twice their width?
• Plus 5 additional EMC application questions
Built-in best practices and lessons learned
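The 1/10-wavelength question can be turned into a quick calculation, since free-space wavelength is λ = c/f. The sketch below is a rough illustration only; the 900 MHz frequency in the usage note is a hypothetical example, not a value from the PRISM model, and the function names are invented for this example.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def max_aperture_m(frequency_hz, fraction=0.1):
    """Largest opening allowed by a 'smaller than 1/10 wavelength' rule."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    wavelength_m = SPEED_OF_LIGHT_M_PER_S / frequency_hz
    return fraction * wavelength_m


def aperture_ok(aperture_m, frequency_hz):
    """True if the opening is smaller than 1/10 of the wavelength."""
    return aperture_m < max_aperture_m(frequency_hz)
```

For an assumed 900 MHz source the wavelength is roughly a third of a metre, so `aperture_ok(0.01, 900e6)` passes while `aperture_ok(0.05, 900e6)` does not.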
Failures vs Faults
Failure: Departure of system behavior in execution from user needs. User-oriented.
Fault: Defect in system implementation that (can) cause the failure when executed. Developer-oriented.
The path to failure
Programming oversight (Error) → Fault (Failure susceptibility) → Fault Activation (Failure trigger) → Failure
E.g., F(x) = 1/(x+234); well behaved except at x = -234
A programming error can occur anywhere in the process, from requirements development to test
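The F(x) example can be made concrete. In the minimal sketch below, the fault (the missing guard) is present on every call, but a failure only occurs when the one triggering input arrives. The helper `first_failure` and the input ranges in the usage note are hypothetical, added only for illustration.

```python
def f(x):
    """The slide's example: well behaved except at x = -234."""
    return 1 / (x + 234)  # latent fault: no guard for x == -234


def first_failure(inputs):
    """Return the first input that activates the fault, or None if none does."""
    for x in inputs:
        try:
            f(x)
        except ZeroDivisionError:
            return x
    return None
```

Testing only with non-negative inputs, e.g. `first_failure(range(0, 1000))`, never activates the fault, whereas `first_failure(range(-300, 0))` finds it.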
A+B=C
1 + 0=
1+1=
1 + .5 =
1 +A=
Software Maintenance
• Perfective: adding functionality, which might be new or overlooked
• Adaptive: making the code work in a changed environment
• Corrective: fixing bugs
• Preventive: precluding a problem
[Figure: pie chart of percentage of activity by maintenance type. Categories: Perfective, Adaptive, Corrective, Preventative.]
[Figure: reliability and failure intensity plotted against time.]
Operational profile
Established definition: The operational profile is the set of input events that the software will receive during execution, along with the probability that the events will occur.
Modified definition: The operational profile (usage) is (1) the set of input events that the software will receive during execution, along with the probability that the events will occur, and (2) the set of context-sensitive input events generated by external hardware and software systems that the software can interact with during execution. This is the configuration (C) and machine (M). One could also add the operator variation (O) as impacting the software reliability.
Operational Profile Example
Operation                     Occurrence Probability
Enter card                    0.332
Verify PIN                    0.332
Withdraw checking             0.199
Withdraw savings              0.066
Deposit checking              0.040
Deposit savings               0.020
Query status                  0.00664
Test terminal                 0.00332
Input to stolen cards list    0.00058
Backup files                  0.000023
Total                         1.000000
Table 1. Operational Profile for ATM Machine
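One way to use such a profile is to draw test operations in proportion to their occurrence probabilities. The sketch below assumes the values in Table 1; the function names and the fixed seed are hypothetical. The listed probabilities add up to about 0.9996 rather than exactly 1, presumably because of rounding, so the sketch normalizes them before reporting and relies on `random.choices`, which accepts unnormalized weights.

```python
import random

# Occurrence probabilities copied from Table 1.
ATM_PROFILE = {
    "Enter card": 0.332,
    "Verify PIN": 0.332,
    "Withdraw checking": 0.199,
    "Withdraw savings": 0.066,
    "Deposit checking": 0.040,
    "Deposit savings": 0.020,
    "Query status": 0.00664,
    "Test terminal": 0.00332,
    "Input to stolen cards list": 0.00058,
    "Backup files": 0.000023,
}


def normalized(profile):
    """Rescale the probabilities so they sum to exactly 1."""
    total = sum(profile.values())
    return {op: p / total for op, p in profile.items()}


def draw_test_operations(profile, n, seed=0):
    """Sample n operations, weighted by their occurrence probability."""
    rng = random.Random(seed)  # fixed seed so a test run is repeatable
    operations = list(profile)
    weights = [profile[op] for op in operations]
    return rng.choices(operations, weights=weights, k=n)
```

A test campaign built this way spends most of its effort on the operations users exercise most, which is the idea behind operational-profile-driven testing.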
Reliability Estimation during Testing
[Figure: FI/FIO (0 to 18) versus Mcalls (0 to 0.5) for conventional test and operational-profile-driven test. The operational-profile-driven test reaches FIO faster.]
Failure intensity plot
CASRE model selection rules for picking the “best fit model”
1. Do a goodness-of-fit test (i.e., KS or Chi-Square) on the model results
2. Rank the models according to their prequential likelihood values (larger is better; for -ln(Prequential Likelihood), though, smaller is better)
3. In case of a tie in prequential likelihood, break the tie using the values of model bias
4. In case of a tie in model bias, break the tie using the values of model bias trend
5. Optional: in case of a tie in model bias trend, break the tie using model noise
From Dr. Allen Nikora, NASA JPL, CASRE Developer
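The selection rules above amount to a filter followed by a multi-key sort. The sketch below is not CASRE's implementation; the record type, field names and any numbers fed to it are hypothetical. It also assumes that smaller values of bias, bias trend and noise are better, and it treats ties as exact equality, which a real tool would likely relax.

```python
from typing import NamedTuple


class ModelResult(NamedTuple):
    name: str
    passes_gof: bool   # goodness-of-fit test (KS or Chi-Square) passed
    neg_ln_pl: float   # -ln(prequential likelihood); smaller is better
    bias: float        # assumption: smaller is better
    bias_trend: float  # assumption: smaller is better
    noise: float       # assumption: smaller is better


def rank_models(results):
    """Drop models that fail goodness of fit, then rank with tie-breakers."""
    candidates = [r for r in results if r.passes_gof]
    return sorted(
        candidates,
        key=lambda r: (r.neg_ln_pl, r.bias, r.bias_trend, r.noise),
    )
```

Sorting on a tuple applies each later criterion only when all earlier ones are tied, which mirrors the order of the rules.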
Software Reliability Predictive Models
Model Name          Data Inputs
Keene               KSLOCs; SEI Level; fault density; years to maturity
Musa Basic          Error count; time of error detection
Musa Logarithmic    Error count; time of error detection
Shooman             Error count; time of error detection
Jelinski-Moranda    Error count; time of error detection
Lipow               Error count; time of error detection; intervals
Goel-Okumoto        Error count; time of error detection; intervals
Schick-Wolverton    Error count; time of error detection
Dual Test           Common error count; error count from both groups
Weibull             Error count; time of error detection
Testing Success     # of test runs successful; total # of runs
Rayleigh Model Reliability Prediction
Based on Profile of Development Process Defect Discovery
[Figure: development phases (Requirements, Design, Code, Unit Test, System Test, Operation). Process/product characteristics and estimation and development faults/failure data collection feed an early-stage prediction, code-phase prediction, unit-test-phase prediction, system-test-phase prediction and operation-phase prediction, leading to software reliability estimation/performance evaluation.]
Inspection Exercise

Task: You have 60 seconds to document the number of times
the 6th letter of the alphabet appears in the following text:
The Necessity of Training Farm Hands for First
Class Farms in the Fatherly Handling of Farm Live
Stock is Foremost in the Eyes of Farm Owners.
Since the Forefathers of the Farm Owners Trained
the Farm Hands for First Class Farms in the
Fatherly Handling of Farm Live Stock, the Farm
Owners Feel they should carry on with the Family
Tradition of Training Farm Hands of First Class
Farmers in the Fatherly Handling of Farm Live
Stock Because they Believe it is the Basis of Good
Fundamental Farm Management.
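After attempting the exercise by hand, the count can be checked mechanically. The sketch below is a simple checker, with hypothetical function names; it also reports how many of the occurrences sit inside the word "of", which manual inspectors commonly skip over.

```python
import re

TEXT = """The Necessity of Training Farm Hands for First
Class Farms in the Fatherly Handling of Farm Live
Stock is Foremost in the Eyes of Farm Owners.
Since the Forefathers of the Farm Owners Trained
the Farm Hands for First Class Farms in the
Fatherly Handling of Farm Live Stock, the Farm
Owners Feel they should carry on with the Family
Tradition of Training Farm Hands of First Class
Farmers in the Fatherly Handling of Farm Live
Stock Because they Believe it is the Basis of Good
Fundamental Farm Management."""


def count_letter(text, letter="f"):
    """Case-insensitive count of a single letter."""
    return text.lower().count(letter.lower())


def count_in_word(text, word="of", letter="f"):
    """Count occurrences of the letter that fall inside one specific word."""
    words = re.findall(r"[a-z]+", text.lower())
    return sum(w.count(letter) for w in words if w == word)


if __name__ == "__main__":
    print("total:", count_letter(TEXT))
    print("inside 'of':", count_in_word(TEXT))
```

Comparing the manual tally with the program's total gives a feel for how many defects a single human inspection pass can miss.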
The Reality
Quantitatively measuring software quality is more like finding flaws in silk than measuring the size of pearls or the firmness of fruit
Time Assertion
• Software does not wear out over time! If it is logically incorrect today, it will be logically incorrect tomorrow.
• Models need to consider the quality of the test cases and the complexity of the software, e.g., 1 LOC vs. 1M LOC.
Reliability Focus
• “System Management” failures (Brandon Murphy)
• Requirements deficiencies
• Interface deficiencies
• The best products result from the best development process; for example, “the defect prevention process” used by IBM to be the first to achieve SEI Level 5 for their SW development process.
Customer Fulfillment: Kano Diagram
[Figure: Kano diagram. Vertical axis: Dissatisfaction to Satisfaction. Horizontal axis: Requirement Unfulfilled to Requirement Fulfilled. Curves: Unexpected (Unspoken) and Expected (Unspoken).]
Conclusion: Design, Software, Requirements Capture, and the Development Process (especially the quality of communications) made a big difference in reliability!
Keene Process-Based (a priori) SW Reliability Model
• Process Capability (SEI Level)
• Development Organization
• Maintaining Organization
• Code Extent (SLOC)
• Exponential growth to a plateau level
• Historical Factors
• R growth profile
• Usage level
• Fault latency
• % Severity 1 and 2 failures
• Fault activation rate
• MTTR
Fault Profile Curves vis-à-vis the CMM Level
• I have observed a 10:1 variation in latent fault rate among developers of military quality systems.
• The best documented software fault rate has been on the highly touted space shuttle program. It has a published fault rate of 0.1 faults/KSLOC on newly released code (but this is only after 8 months of further customer testing).
• The fault rate at customer turnover is 0.5 faults/KSLOC, based upon private correspondence with the lead space shuttle (SS) assurance manager.
• The entire code base approaches 6 sigma level of fault rate or 3-4 faults/KSLOC. Boeing Missiles and Space Division, another Level 5 developer, told me they have achieved like levels of fault rate in their mature code base.
Mapping of the SEI process capability levels (I, II, III, IV, V) against probable fault density distributions of the developed code (depiction)
Level 1: Initial (ad hoc)
Level 2: Repeatable (policies)
Level 3: Defined (documented)
Level 4: Managed (measured and capable)
Level 5: Optimized (optimizing)
Combined Results (Curt Smith, ISSRE 99)
[Figure: software errors remaining per 100 KSLOC (0 to 700) by month, Feb-96 through Apr-98. Series: SWEEP actuals, SWEEP prediction, CASRE actuals, Generalized Poisson estimate, DPM prediction. Annotations: 35%, 91%, 99%.]
Synonyms
• The Keene Process-Based Model is the same as the Development Process Model (Smith)
• SWEEP (SW Error Estimation Process), developed by the Software Productivity Consortium, is an implementation of the Rayleigh model (Smith). The Rayleigh prediction model was developed by John Gaffney of IBM.
Progressive Software Reliability Prediction
Steps:
1) Collect data: Get fault rates for the defect data profile (defect data from the SW development phases through system test).
2) Curve fit: Use the Rayleigh model to project the latent fault density, fi, at delivery.
3) Predict steady-state MTBF: Insert the observed fi into Keene’s model for the operational MTBF profile.
[Figure: fault density versus development phase, showing actual data and the fitted curve ending at fi = latent fault density at delivery, followed by operational MTBF versus time t.]
Rayleigh models: Stephen Kan and John Gaffney
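The curve-fit step can be sketched with the standard Rayleigh cumulative form, F(t) = K(1 - exp(-t²/(2c²))), where K is the total number of defects and c is the time of peak discovery. This is an illustration under stated assumptions, not the SWEEP or Keene implementation: time is measured in development-phase units, the peak location c is taken as known rather than fitted, and the function names are hypothetical.

```python
import math


def rayleigh_cumulative(t, total_defects, peak_time):
    """Expected defects discovered by time t: K * (1 - exp(-t^2 / (2 c^2)))."""
    return total_defects * (1.0 - math.exp(-(t ** 2) / (2.0 * peak_time ** 2)))


def estimate_total_defects(found_so_far, t, peak_time):
    """Solve the cumulative curve for K, given the defects found by time t > 0."""
    fraction_found = 1.0 - math.exp(-(t ** 2) / (2.0 * peak_time ** 2))
    return found_so_far / fraction_found


def latent_fault_density(found_so_far, t, peak_time, ksloc):
    """Projected faults per KSLOC still undiscovered at time t (delivery)."""
    total = estimate_total_defects(found_so_far, t, peak_time)
    return (total - found_so_far) / ksloc
```

The value returned by `latent_fault_density` plays the role of fi in step 2; how it is then converted into an operational MTBF depends on Keene's model, which is not reproduced here.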
Development Focus
Rigorous Development Process
Requirements Capture
“Voice of the Customer”
Prototypes
Lessons Learned
High Level Flow Diagrams
Data Descriptions
Architecture
Firewalls
Partitions
Safe Subset of Language
Development Focus Continued
Safety Emphasis is Number 1
FTA
FMEA
Clean Room Approach
Development
Cross Functional Team
Design Reviews
Walkthroughs,
Inspections
Built in Safety Monitor
Robust Human Interface
Development Focus cont.
Fault Avoidance
Fault Tolerance
FMEA
PFMEA
DPP ******
Failure Review Board
Manage Failure/Anomaly Logs
Fault Insertion
Customer Feedback
Alpha Testing
Beta Testing
COTS challenge
• Assure interoperability of COTS: incompatibility of data format, protocol, operating assumptions
• Version compatibility, migration and extensibility
• Vendor responsiveness
• Vendor participation and cooperativeness
Visualizations:
• Flow Graphs (devise tests, reduce coupling, manage complexity, prioritize analysis and verification)
• Entity Relationship Diagrams
• State Transition Diagrams
• Data Structures
• Swim Lane Diagrams
• Message Handling Diagrams
• GUI Screens
• Prototypes
• User Feedback
• Data Flow Diagrams
Swim Lanes: Previous Implementation
[Figure: swim lane diagram of an ECR workflow. Lanes: BI Review, BI Admin, BI Creator, Core Team, Analyst, Admin, Creator. Activities: Create ECR, Create Tasks, Investigate, Review ECR, Tech Review ECR, Disposition, Approve ECR, Assign Tasks, Execute Assignment, Execute Tasks, Complete, Business Decision, Claim Task, Draft Doc, Close Task, Close ECR, Doc/Part Admin, Reject Doc/Part, Review, Approve, Release Doc/Part.]
Looking at Failures
1. Failures cluster
   • In particular code areas
   • In type or cause
2. All failures count; don’t dismiss any
3. Prediction models count test failures only once during testing, but count every failure in the field
4. Software development has been said to be a “defect removal process”
Software changes degrade the architecture and increase code complexity
Design for maintenance
Small Changes are Error Prone
LOC Changed    Likelihood of error
1 line         50%
5 lines        75%
20 lines       35%
Classic example: at DSC Corp, Plano, Texas, 3 bits of a MSLOC program were changed, leading to municipal phone outages in major metropolitan areas.
Edwards, William, “Lessons Learned from 2 Years Inspection Data”, Crosstalk Magazine, No. 39, Dec 1992, citing Weinberg, G., “Kill That Code!”, IEEE Tutorial on Software Restructuring, 1986, p. 131.
Good design practices
• Design for change
• Question requirements
• Design for “nots” as well as “shalls”
• FMEA
• Use and maintain documentation, e.g., flow graphs, control charts, entity-relationship diagrams, …
• Question data
FAILURE MODE AND EFFECTS ANALYSIS
Product Panel Assembly Process    Date ______    Team Members ________________    Page __ of __

Process Description: Procure zinc plated plastic panel
Process Function: Provide conductive surface
Failure Mode: Plating may not adhere to plastic surface completely
Causes: Dirty parts during plating process
Effects: Product malfunction
SEV 7, FREQ 3, DET 10, RPN 350
Recommended Control: Use carbonized plastic instead of plating.

Process Description: Mount fuel gage
Process Function: To provide fuel reading
Failure Mode: Gage may be mounted upside down
Causes: Operator error
Effects: Customer will need to send product for repair
SEV 7, FREQ 2, DET 2, RPN 28
Recommended Control: Design the mounting holes in different sizes so the gage cannot be mounted wrong.

Process Description: Assemble function indicators
Process Function: To snap in lamp cover in proper sequence
Failure Mode: Cover installed in wrong sequence
Causes: Operator error
Effects: Customer confused and gets false indications
SEV 8, FREQ 4, DET 2, RPN 64
Recommended Control: Silk screen letters on the panel instead of lamp cover.
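The risk priority number (RPN) in an FMEA is conventionally the product SEV × FREQ × DET, so the worksheet values can be cross-checked. The sketch below uses the ratings as transcribed from the worksheet; the assignment of the third row's ratings to particular columns is an assumption (its product is the same either way), and the function names are hypothetical.

```python
def rpn(sev, freq, det):
    """Risk priority number: severity x frequency x detection."""
    return sev * freq * det


# (process step, SEV, FREQ, DET, RPN recorded on the worksheet)
ROWS = [
    ("Procure zinc plated plastic panel", 7, 3, 10, 350),
    ("Mount fuel gage", 7, 2, 2, 28),
    ("Assemble function indicators", 8, 4, 2, 64),  # column order assumed
]


def inconsistent_rows(rows):
    """Return (step, computed RPN, recorded RPN) where the two disagree."""
    return [
        (step, rpn(sev, freq, det), recorded)
        for step, sev, freq, det, recorded in rows
        if rpn(sev, freq, det) != recorded
    ]
```

Run against these values, the check flags the first row, whose transcribed ratings multiply to 210 rather than the recorded 350, which suggests that one of its ratings was lost or altered in transcription.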
Why Testing Under Expected Operational Scenarios is Not Sufficient
Samuel Keene
Software Fault Injection
• A form of software testing: not statistical testing, not correctness proofs
• A “what-if game”
• The more you play, the more confident you can become that your software can deal with anomalous situations (unanticipated events)
• Determines the consequences of incorrect code or input data
• Crash testing software
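A very small version of the "what-if game" is to feed a function deliberately anomalous inputs and record what happens. The sketch below is only an illustration of the idea, not a fault-injection tool: `safe_ratio`, the list of anomalies and the outcome labels are all hypothetical.

```python
import math


def safe_ratio(numerator, denominator):
    """Hypothetical function under test."""
    return numerator / denominator


# Anomalous values to inject in place of a well-formed denominator.
ANOMALOUS_INPUTS = [0, -1, 1e308, float("inf"), float("nan"), None, "7", []]


def inject_faults(func, anomalies, good_value=1.0):
    """Call func with each anomaly as the second argument; record the outcome."""
    report = []
    for value in anomalies:
        try:
            result = func(good_value, value)
        except Exception as exc:  # record the crash instead of stopping the run
            outcome = "raised " + type(exc).__name__
        else:
            finite = isinstance(result, (int, float)) and math.isfinite(result)
            outcome = "ok" if finite else "non-finite result"
        report.append((repr(value), outcome))
    return report


if __name__ == "__main__":
    for injected, outcome in inject_faults(safe_ratio, ANOMALOUS_INPUTS):
        print(injected, "->", outcome)
```

Each line of the report answers one "what if" question; outcomes other than "ok" point to anomalous situations the software does not yet handle.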
Six Sigma – World Class Process
[Figure: diagram developed by Sam Keene, Jan 2006, relating System Management, VOC and the Development Process. Labels: Requirements & Interfaces; Capability (CMM); Communication; KANO (M,S,D); Collaborative Tools; Needs & Context; Common Vocabulary; Spiral Model; Team Focus (CFDT); Operational Profile; GQM; Regression Cases; Traceability; Design Point Centering; Variational Testing; Optimization; Alpha - Beta; KPIV Variability; Product Verification; Robustness & Margin; DOE; Hardware; Understandability; Cpk; Program Structure; MSA (GR+R); Design Documentation; Application; Process Improvement; Environment; Metrics; Off-Nominal Modes; DPP; Stat Analysis.]
Useful References
• Draft Standard for Software Reliability Prediction, IEEE P1633
• [IEEE 90] Institute of Electrical and Electronics Engineers. IEEE Standard Computer Dictionary: A Compilation of IEEE Standard Computer Glossaries. New York, NY: 1990.
• CASRE model: http://www.openchannelfoundation.org/projects/CASRE_3.0