
Software Reliability
CSCI 5801: Software Engineering
Software Reliability
• What you know after testing:
– Software passes all cases in test suite
• What customer wants to know:
– Is the code well written in general?
– How often will it fail?
– What has to happen for it to fail?
– What happens when it fails?
Larger Context of Reliability
• Fault detection (testing and validation)
– Detect faults before the system is put into
operation
• Fault avoidance
– Build systems with the objective of creating
fault-free software
• Fault tolerance
– Build systems that continue to operate when
faults occur
Code Reviews
• Examining code without running it
– Remove dependency on test cases
• Methodology: look for typical flaws
• Best done by others with a different point of view
– Code walkthroughs done by other programmers
– Pair programming in XP
– Static analysis tools
• Goal: Detect flaws before they become faults
(fault avoidance)
Code Walkthroughs
• Going through code by hand, statement by statement
– 90 – 125 statements/hour on average
• Team with ~4 members, with specific roles:
– Moderator: runs the session, ensures it proceeds smoothly
– Code author
– Inspectors (at least 2)
– Scribe: writes down results/suggestions
• Estimated to find 60% to 90% of code errors
Code Walkthroughs
• Preparation
– Developer provides colleagues with code listing
and documentation
– Participants study the documentation in advance
• Meeting
– Developer leads reviewers through the code,
describing what each section does and
encouraging questions
– Inspectors look for possible flaws and suggest
improvements
Code Walkthroughs
• Example checklist:
– Data faults: Initialization, constants, array bounds, character
strings
– Control faults: Conditions, loop termination, compound
statements, case statements
– Input/output faults: All inputs used; all outputs assigned a
value
– Interface faults: Parameter numbers, types, and order;
structures and shared memory
– Storage management faults: Modification of links, allocation
and de-allocation of memory
– Exceptions: Possible errors, error handlers
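
As an illustration, a small hypothetical Java fragment (invented for this purpose) containing the kinds of flaws the checklist targets:

    public class WalkthroughExample {
        // Each comment names the checklist category the line violates.
        static double averageScore() {
            int[] scores = new int[10];
            int total = 0;
            // Control fault: "<=" makes the loop run one step too far
            for (int i = 0; i <= scores.length; i++) {
                total += scores[i];   // Data fault: reads past the array bound
            }
            // Data fault: integer division truncates before the widening assignment
            double average = total / scores.length;
            return average;
        }
    }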
Static Analysis Tools
• Scan source code for possible faults and anomalies
– Lint for C programs
– PMD for Java
• Good programming practice: eliminate all warnings from the source code
• Examples:
– Control flow: Loops with multiple exit or entry points
– Data use: Undeclared or uninitialized variables, unused
variables, multiple assignments, array bounds
– Interface faults: Parameter mismatches, non-use of
function results, uncalled procedures
– Storage management: Unassigned pointers, pointer
arithmetic
PMD Example
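
A hypothetical fragment of the kind PMD flags; the code is invented, but UnusedLocalVariable and EmptyCatchBlock are actual PMD rule names:

    import java.io.FileReader;
    import java.io.IOException;

    public class RosterLoader {
        void load(String path) {
            int retries = 0;                 // PMD: UnusedLocalVariable
            try {
                FileReader in = new FileReader(path);
                in.close();
            } catch (IOException e) {
                // PMD: EmptyCatchBlock - the error is silently swallowed
            }
        }
    }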
Static Analysis Tools
• Cross-reference table: Shows every use of a
variable, procedure, object, etc.
• Information flow analysis: Identifies input
variables on which an output depends.
• Path analysis: Identifies all possible paths
through the program.
Software Reliability
• Definition:
Probability that the system will not fail during a
certain period of time in a certain environment
– Failures/CPU hour, etc.
• Questions:
– How much more testing is needed to reach the
required reliability?
– What is the expected reliability gain from further
testing?
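
For intuition, a minimal sketch assuming a constant failure rate (the simple exponential model; the slides do not commit to a particular model), with invented numbers:

    public class ReliabilityEstimate {
        public static void main(String[] args) {
            double lambda = 0.002;   // assumed failure rate: failures per CPU hour
            double t = 100.0;        // operating period of interest, CPU hours
            // R(t) = e^(-lambda * t): probability of surviving t hours unfailed
            double r = Math.exp(-lambda * t);
            System.out.printf("P(no failure in %.0f CPU h) = %.3f%n", t, r);
            // prints ~0.819: about an 82% chance of a failure-free period
        }
    }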
Statistical Testing
• Testing software for reliability rather than fault
detection
– Measuring the number of errors/transaction
allows the reliability of the software to be
predicted
• Key problem: Software will never be 100% reliable!
– An acceptable level of reliability should be specified in
the RSD, and the software tested and modified until that
level of reliability is reached
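
A minimal sketch of the measurement itself, using invented counts; the required level is whatever the RSD specifies:

    public class StatisticalTest {
        public static void main(String[] args) {
            long transactions = 50_000;   // test cases run against the system
            long failures = 12;           // observed failures (invented)
            double rate = (double) failures / transactions;   // ~2.4e-4
            double required = 1e-4;       // e.g., at most 1 failure per 10,000
            System.out.printf("observed: %.1e failures/transaction%n", rate);
            System.out.println("meets requirement? " + (rate <= required));
        }
    }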
Reliability Prediction
• Reliability growth model
– Mathematical model of how system reliability is
predicted to change over time as faults are found
and removed
– Extrapolated from current data about failures
• Can be used to determine whether system meets
reliability requirements
– Mean time to failure
– Average failures per transaction
• Can be used to predict when testing will be
completed and what level of reliability is feasible
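
A deliberately crude sketch of extrapolation from failure data; the inter-failure times are invented, and a real growth model fits a curve rather than a constant factor:

    public class ReliabilityGrowth {
        public static void main(String[] args) {
            // Hours between successive failures during testing (invented data);
            // the widening gaps are the reliability growth being modeled.
            double[] interFailureHours = {1.2, 1.9, 2.4, 4.1, 6.8, 11.5};
            double requiredMttf = 40.0;   // mean time to failure required by RSD

            // Crude extrapolation: assume each fix multiplies MTTF by a
            // constant factor estimated from the last two observations.
            int n = interFailureHours.length;
            double growth = interFailureHours[n - 1] / interFailureHours[n - 2];
            double mttf = interFailureHours[n - 1];
            int fixesNeeded = 0;
            while (mttf < requiredMttf) {
                mttf *= growth;
                fixesNeeded++;
            }
            System.out.printf("growth factor ~%.2f; about %d more fault fixes "
                    + "to reach %.0f h MTTF%n", growth, fixesNeeded, requiredMttf);
        }
    }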
Operational Profile
• Problem:
Statistical testing requires a large number of test
cases for statistical significance (thousands)
• Where do such test cases come from?
– Often too many to create by hand
– Random generation not sufficient
Operational Profile
• Operational profile:
Set of test data whose frequency matches the actual
frequency of those inputs in ‘normal’ usage of the
system
– A close match with actual usage is necessary, or the
measured reliability will not reflect the reliability seen in
actual use of the system
• Can be generated from real data collected from an
existing system, or (more often) from assumptions
made about the pattern of usage of the system.
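
A minimal sketch of sampling from an operational profile; the input classes and usage fractions are invented:

    import java.util.Random;

    public class ProfileSampler {
        public static void main(String[] args) {
            // Input classes and their observed frequency in normal usage
            String[] inputClass = {"add course", "drop course", "query roster", "print report"};
            double[] usageFraction = {0.55, 0.25, 0.15, 0.05};

            Random rng = new Random(42);
            for (int i = 0; i < 10; i++) {
                // Weighted draw so generated tests mirror real usage
                double u = rng.nextDouble(), cumulative = 0.0;
                for (int c = 0; c < inputClass.length; c++) {
                    cumulative += usageFraction[c];
                    if (u < cumulative) {
                        System.out.println("generate test of type: " + inputClass[c]);
                        break;
                    }
                }
            }
        }
    }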
Example Operational Profile
[Figure: histogram of number of inputs per input class. Note that some types of inputs are much more likely than others.]
LPM Estimates
• Logarithmic Poisson execution time model
(LPM)
– Major bugs found quickly
– Those major bugs cause most failures
– Effectiveness of fault correction decreases over
time
– There exists a point at which further testing has
little gain
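
This LPM is the Musa-Okumoto logarithmic Poisson execution time model, in which expected cumulative failures grow as mu(t) = (1/theta) * ln(lambda0 * theta * t + 1). A sketch with invented parameters shows the diminishing returns:

    public class LpmEstimate {
        public static void main(String[] args) {
            double lambda0 = 10.0;   // initial failure intensity (failures/CPU h)
            double theta = 0.05;     // intensity decay per failure experienced
            for (double t = 0; t <= 500; t += 100) {
                double mu = Math.log(lambda0 * theta * t + 1) / theta;
                System.out.printf("after %3.0f CPU h: ~%3.0f failures expected%n", t, mu);
            }
            // Each additional 100 hours finds fewer new failures: the point
            // of diminishing returns for further testing.
        }
    }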
Reliability Prediction
Reliability Measurement Problems
• Operational profile uncertainty
– The operational profile may not be an accurate
reflection of the real use of the system
• High costs of test data generation
– Costs can be very high if the test data for the
system cannot be generated automatically
• Statistical uncertainty
– You need a statistically significant number of
failures to compute the reliability but highly
reliable systems will rarely fail
Stress Testing
• Goal of stress testing:
Determine what it will take to “break” system
– “Break” = no longer meets requirements in some way
– Functional: fails to perform required functions
– Reliability: fails more often than specified
– Performance: slower than required
• Approaches:
– Increase load/decrease resources until system breaks
– Perform “attacks” designed to produce undesirable result
Stress Testing
• Increase load on system in different ways
– Number of students simultaneously adding courses
– Size of files/databases that must be read
–…
• Decrease resources available to system (may require
fault injection software)
– Increase number of other processes running on system
– Increase lag time of networked resources
• Goal: the point at which the system fails should be well
beyond the scenarios listed in the RSD
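
A sketch of the increase-load approach; addCourse() is a stub standing in for the real operation under test:

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.atomic.AtomicInteger;

    public class LoadRamp {
        // Stub for the real system call (e.g., a student adding a course)
        static boolean addCourse() {
            return Math.random() > 0.001;   // placeholder behavior
        }

        public static void main(String[] args) throws InterruptedException {
            for (int users = 100; users <= 6400; users *= 2) {
                AtomicInteger failures = new AtomicInteger();
                ExecutorService pool = Executors.newFixedThreadPool(64);
                for (int i = 0; i < users; i++) {
                    pool.submit(() -> {
                        if (!addCourse()) failures.incrementAndGet();
                    });
                }
                pool.shutdown();
                pool.awaitTermination(1, TimeUnit.MINUTES);
                System.out.printf("%d simultaneous adds -> %d failures%n",
                        users, failures.get());
                // "Break" point: the first load level whose failure rate
                // exceeds what the requirements allow.
            }
        }
    }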
Stress Testing
• “Attack” testing common in security
– Goal of normal testing:
Input for specific test case → System → Desired response for specific test case
– Goal of secure programming:
Any input → System → Does not produce an undesirable result
Stress Testing
• Based on risk analysis from design stage:
– Can roster database be deleted?
– Can intruder read files (in violation of FERPA)?
– Can a student add a course but not be added to the roster?
Fault Tolerance
• Goals:
– System continues to operate when problems occur
– System avoids critical failures (data loss, etc.)
• Problems can occur from many sources
– Anticipated at design stage
– Unanticipated (hardware faults, etc.)
• Cannot prevent all failures!
Fault Tolerance
• Usually based on idea of “backward recovery”
– Record system state at specific events (checkpoints).
After failure, recreate state at last checkpoint.
– Combine checkpoints with system log (audit trail of
transactions) that allows transactions from last
checkpoint to be repeated automatically.
• Note that backward recovery software must
also be thoroughly tested!
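
A minimal in-memory sketch of checkpoint-plus-log recovery; a real system would write both the checkpoint and the log to stable storage:

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class BackwardRecovery {
        private Map<String, Integer> state = new HashMap<>();      // live state
        private Map<String, Integer> checkpoint = new HashMap<>(); // last snapshot
        private final List<String[]> log = new ArrayList<>();      // since snapshot

        // A "transaction": apply the change and append it to the audit trail
        void apply(String key, int value) {
            state.put(key, value);
            log.add(new String[] {key, String.valueOf(value)});
        }

        // Checkpoint: snapshot the state and start a fresh log
        void takeCheckpoint() {
            checkpoint = new HashMap<>(state);
            log.clear();
        }

        // Recovery: restore the last checkpoint, then replay the logged
        // transactions to rebuild the state as of the failure
        void recover() {
            state = new HashMap<>(checkpoint);
            for (String[] entry : log) {
                state.put(entry[0], Integer.parseInt(entry[1]));
            }
        }
    }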