Dependable software development

Critical systems development
Modified from Sommerville’s originals
Software Engineering, 7th edition. Chapter 20&24
Slide 1
Topics covered




Dependable processes
Dependable programming
Fault tolerance
Fault tolerant architectures
Approaches for developing
dependable software
Fault avoidance
• The system is developed in such a way that human error is
  avoided and thus system faults are minimised.
• The development process is organised so that faults in the
  system are detected and repaired before delivery to the
  customer.
• Activities: dependable programming practices
Fault detection (Chapter 24)
• Verification and validation techniques are used to discover and
  remove faults in a system before it is deployed.
• Activities: reviews, formal verification, testing
Fault tolerance
• The system is designed so that faults in the delivered software
  do not result in system failure.
• Activities: fault detection, damage assessment, recovery, repair.
Fault-free software


Fault-free software means software which
conforms to its specification. It does NOT mean
software which will always perform correctly as there
may be specification errors.
The cost of producing fault free software is very
high. It is only cost-effective in exceptional
situations. It is often cheaper to accept software
faults and pay for their consequences than to
expend resources on developing fault-free software.
Fault removal costs
[Figure: fault removal cost plotted against the number of residual
errors (many, few, very few)]
Fault-free software development


Current methods of software engineering now allow
for the production of fault-free software, at least for
relatively small systems.
Examples
• Dependable software processes
• Quality management
• Formal specification
• Static verification
• Strong typing
• Safe programming
• Protected information
Dependable processes



To ensure a minimal number of software
faults, it is important to have a well-defined,
repeatable software process.
A well-defined repeatable process is one that
does not depend entirely on individual skills;
rather, it can be enacted by different people.
For fault detection, it is clear that the process
activities should include significant effort
devoted to verification and validation.
Dependable process characteristics
Documentable
The process should have a defined process model that sets
out the activities in the process and the documentation that is
to be produced during these activities.
Standardised
A comprehensive set of software development standards that
define how the software is to be produced and documented
should be available.
Auditable
The process should be understandable by people apart from
process participants who can check that process standards are
being followed and make suggestions for process
improvement.
Diverse
The process should include redundant and diverse verification
and validation activities.
Robust
The process should be able to recover from failures of
individual process activities.
Validation activities







Requirements inspections.
Requirements management.
Model checking.
Design and code inspection.
Static analysis.
Test planning and management.
Configuration management.
Safe programming practices



Faults in programs are usually a consequence of
programmers making mistakes.
These mistakes occur because people lose track of
the relationships between program variables.
Examples of safe programming practices
• Design for simplicity.
• Protect information from unauthorised access.
• Avoid “unsafe” programming language constructs.
• Handle exceptions consistently.
Information protection


Information should only be exposed to those parts of
the program which need to access it. This involves
the creation of objects or abstract data types that
maintain state and that provide operations on that
state.
This avoids faults for three reasons:
• the probability of accidental corruption of information is
  reduced;
• the information is surrounded by ‘firewalls’ so that
  problems are less likely to spread to other parts of the
  program;
• as all information is localised, you are less likely to make
  errors and reviewers are more likely to find errors.
A queue specification in Java
interface Queue {
public void put (Object o) ;
public void remove (Object o) ;
public int size () ;
} //Queue
Signal declaration in Java
class Signal {
static public final int red = 1 ;
static public final int amber = 2 ;
static public final int green = 3 ;
public int sigState ;
}
Structured programming





First proposed in 1968 as an approach to
development that makes programs easier to
understand and that avoids programmer errors.
Programming without gotos.
While loops and if statements as the only
control statements.
Top-down design.
An important development because it promoted
thought and discussion about programming.
Error-prone constructs

Floating-point numbers
• Inherently imprecise. The imprecision may lead to invalid
  comparisons.
Pointers
• Pointers referring to the wrong memory areas can corrupt
  data. Aliasing can make programs difficult to understand
  and change.
Dynamic memory allocation
• Run-time allocation can cause memory overflow.
Parallelism
• Can result in subtle timing errors because of unforeseen
  interaction between parallel processes.
Recursion
• Errors in recursion can cause memory overflow.
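The point about invalid floating-point comparisons can be illustrated with a small sketch. The class name FloatCompare, the method nearlyEqual and the tolerance value are assumptions made for this example; they do not come from the original slides.

```java
// Illustrative sketch only: names and tolerance are assumptions.
class FloatCompare {
    // Treats two doubles as equal when they differ by no more than a
    // tolerance scaled to the magnitude of the operands.
    static boolean nearlyEqual(double a, double b, double relTol) {
        double scale = Math.max(1.0, Math.max(Math.abs(a), Math.abs(b)));
        return Math.abs(a - b) <= relTol * scale;
    }

    public static void main(String[] args) {
        double sum = 0.1 + 0.2;
        // Exact comparison fails because 0.1 and 0.2 have no exact
        // binary representation.
        System.out.println(sum == 0.3);                   // prints false
        // Tolerant comparison accepts the small rounding error.
        System.out.println(nearlyEqual(sum, 0.3, 1e-9));  // prints true
    }
}
```

A suitable tolerance depends on the application, so the value used here should be read as a placeholder.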
Error-prone constructs





Interrupts
• Interrupts can cause a critical operation to be terminated
  and make a program difficult to understand.
Inheritance
• Code is not localised. This can result in unexpected
  behaviour when changes are made and problems of
  understanding.
Aliasing
• Using more than 1 name to refer to the same state
  variable.
Unbounded arrays
• Buffer overflow failures can occur if there is no bounds
  checking on arrays.
Default input processing
• An input action that occurs irrespective of the input.
Exception handling



A program exception is an error or some
unexpected event such as a power failure.
Using normal control constructs to detect
exceptions needs many additional statements to be
added to the program. This adds a significant
overhead and is potentially error-prone.
Exception handling constructs allow for such events
to be handled without the need for continual status
checking to detect exceptions.
Exceptions in Java 1
class SensorFailureException extends Exception {
    SensorFailureException (String msg) {
        super (msg) ;
        Alarm.activate (msg) ;
    }
} // SensorFailureException
Exceptions in Java 2
class Sensor {
    int readVal () throws SensorFailureException {
        try {
            int theValue = DeviceIO.readInteger () ;
            if (theValue < 0)
                throw new SensorFailureException ("Sensor failure") ;
            return theValue ;
        }
        catch (deviceIOException e)
        {
            throw new SensorFailureException ("Sensor read error") ;
        }
    } // readVal
} // Sensor
Fault tolerance




In critical situations, software systems must be fault
tolerant.
Fault tolerance is required where there are high
availability requirements or where system failure
costs are very high.
Fault tolerance means that the system can continue
in operation in spite of software failure.
Even if the system has been proved to conform to its
specification, it must also be fault tolerant as there
may be specification errors or the validation may be
incorrect.
Fault tolerance actions

Fault detection
• The system must detect that a fault (an incorrect system
  state) has occurred.
Damage assessment
• The parts of the system state affected by the fault must be
  detected.
Fault recovery
• The system must restore its state to a known safe state.
Fault repair
• The system may be modified to prevent recurrence of the
  fault. As many software faults are transitory, this is often
  unnecessary.
Fault detection

Preventative fault detection
• The fault detection mechanism is initiated before the state
  change is committed. If an erroneous state is detected,
  the change is not made.
Retrospective fault detection
• The fault detection mechanism is initiated after the system
  state has been changed. This is used when an incorrect
  sequence of correct actions leads to an erroneous state
  or when preventative fault detection involves too much
  overhead.
Type system extension


Preventative fault detection really involves
extending the type system by including
additional constraints as part of the type
definition.
These constraints are implemented by
defining basic operations within a class
definition.
PositiveEvenInteger 1
class PositiveEvenInteger {
    int val = 0 ;
    PositiveEvenInteger (int n) throws NumericException
    {
        if (n < 0 | n%2 == 1)
            throw new NumericException () ;
        else
            val = n ;
    } // PositiveEvenInteger
PositiveEvenInteger 2
    public void assign (int n) throws NumericException
    {
        if (n < 0 | n%2 == 1)
            throw new NumericException () ;
        else
            val = n ;
    } // assign
    int toInteger ()
    {
        return val ;
    } // toInteger
    boolean equals (PositiveEvenInteger n)
    {
        return (val == n.val) ;
    } // equals
} // PositiveEvenInteger
Damage assessment



Analyse system state to judge the extent of
corruption caused by a system failure.
The assessment must check what parts of
the state space have been affected by the
failure.
Generally based on ‘validity functions’ that
can be applied to the state elements to
assess if their value is within an allowed
range.
Robust array 1
class RobustArray {
    // Checks that all the objects in an array of objects
    // conform to some defined constraint
    boolean [] checkState ;
    CheckableObject [] theRobustArray ;
    RobustArray (CheckableObject [] theArray)
    {
        checkState = new boolean [theArray.length] ;
        theRobustArray = theArray ;
    } // RobustArray
Robust array 2
    public void assessDamage () throws ArrayDamagedException
    {
        boolean hasBeenDamaged = false ;
        for (int i = 0; i < this.theRobustArray.length ; i++)
        {
            if (! theRobustArray [i].check ())
            {
                checkState [i] = true ;
                hasBeenDamaged = true ;
            }
            else
                checkState [i] = false ;
        }
        if (hasBeenDamaged)
            throw new ArrayDamagedException () ;
    } // assessDamage
} // RobustArray
Damage assessment techniques



Checksums are used for damage
assessment in data transmission.
Redundant pointers can be used to check
the integrity of data structures.
Watchdog timers can check for non-terminating processes. If there
is no response after a certain time, a problem is assumed.
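A minimal sketch of checksum-based damage assessment, using the standard java.util.zip.CRC32 class. The class ChecksumCheck, its methods and the sample message are assumptions made for illustration; the slides do not prescribe a particular checksum.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Illustrative sketch only: class, method names and message are assumptions.
class ChecksumCheck {
    // Computes a CRC-32 checksum over the whole byte array.
    static long checksum(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    // Reports damage when the recomputed checksum differs from the one sent.
    static boolean isDamaged(byte[] received, long expectedChecksum) {
        return checksum(received) != expectedChecksum;
    }

    public static void main(String[] args) {
        byte[] sent = "dose=4".getBytes(StandardCharsets.UTF_8);
        long sentChecksum = checksum(sent);

        byte[] received = sent.clone();
        received[5] ^= 0x01;  // simulate a single-bit transmission error

        System.out.println(isDamaged(sent, sentChecksum));      // prints false
        System.out.println(isDamaged(received, sentChecksum));  // prints true
    }
}
```

A checksum only shows that damage has occurred; it does not say which part of the data is wrong or how to repair it.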
Fault recovery and repair

Forward recovery
• Apply repairs to a corrupted system state.
Backward recovery
• Restore the system state to a known safe state.
Forward recovery is usually application specific;
domain knowledge is required to compute
possible state corrections.
Backward error recovery is simpler. Details of a
safe state are maintained and this replaces the
corrupted system state.
Forward recovery

Corruption of data coding
• Error coding techniques which add redundancy to coded
  data can be used for repairing data corrupted during
  transmission.
Redundant pointers
• When redundant pointers are included in data structures
  (e.g. two-way lists), a corrupted list or filestore may be
  rebuilt if a sufficient number of pointers are uncorrupted.
• Often used for database and file system repair.
Backward recovery


Transactions are a frequently used method
of backward recovery. Changes are not
applied until computation is complete. If an
error occurs, the system is left in the state
preceding the transaction.
Periodic checkpoints allow the system to 'roll back' to a correct state.
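Backward recovery with a checkpoint might be sketched as follows. CheckpointedList and its methods are hypothetical names introduced for this example, and the rule that negative values are invalid is an arbitrary stand-in for an application-specific error.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: all names and the validity rule are assumptions.
class CheckpointedList {
    private List<Integer> state = new ArrayList<>();
    private List<Integer> saved = new ArrayList<>();

    // Records a copy of the current state as the known safe state.
    void checkpoint() { saved = new ArrayList<>(state); }

    // Replaces the current state with the last recorded safe state.
    void rollback() { state = new ArrayList<>(saved); }

    List<Integer> snapshot() { return new ArrayList<>(state); }

    // Applies all updates as one unit. If any update is invalid, the
    // state that held before the call is restored.
    boolean applyAll(int[] values) {
        checkpoint();
        try {
            for (int v : values) {
                if (v < 0)
                    throw new IllegalArgumentException("negative value: " + v);
                state.add(v);
            }
            return true;
        } catch (IllegalArgumentException e) {
            rollback();
            return false;
        }
    }

    public static void main(String[] args) {
        CheckpointedList list = new CheckpointedList();
        list.applyAll(new int[] {1, 2});
        list.applyAll(new int[] {3, -1, 4});  // fails part-way and rolls back
        System.out.println(list.snapshot());  // prints [1, 2]
    }
}
```

Copying the whole state at each checkpoint is the simplest choice; larger systems usually log changes instead, which this sketch does not attempt.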
Safe sort procedure




A sort operation monitors its own execution and
assesses if the sort has been correctly executed.
It maintains a copy of its input so that if an error
occurs, the input is not corrupted.
Based on identifying and handling exceptions.
Possible in this case as the condition for a ‘valid’ sort
is known. However, in many cases it is difficult to
write validity checks.
Safe sort 1
class SafeSort {
    static void sort (int [] intarray, int order) throws SortError
    {
        int [] copy = new int [intarray.length] ;
        // copy the input array
        for (int i = 0; i < intarray.length ; i++)
            copy [i] = intarray [i] ;
        try {
            Sort.bubblesort (intarray, intarray.length, order) ;
Safe sort 2
            // braces stop the else from binding to the inner if
            if (order == Sort.ascending) {
                for (int i = 0; i <= intarray.length-2 ; i++)
                    if (intarray [i] > intarray [i+1])
                        throw new SortError () ;
            }
            else {
                for (int i = 0; i <= intarray.length-2 ; i++)
                    if (intarray [i+1] > intarray [i])
                        throw new SortError () ;
            }
        } // try block
        catch (SortError e)
        {
            // restore the original input before reporting the failure
            for (int i = 0; i < intarray.length ; i++)
                intarray [i] = copy [i] ;
            throw new SortError ("Array not sorted") ;
        } // catch
    } // sort
} // SafeSort
Fault tolerant architecture




Defensive programming cannot cope with faults that
involve interactions between the hardware and the
software.
Misunderstandings of the requirements may mean
that checks and the associated code are incorrect.
Where systems have high availability requirements,
a specific architecture designed to support fault
tolerance may be required.
This must tolerate both hardware and software
failure.
Diversity and redundancy

Redundancy
• Keep more than 1 version of a critical component
  available so that if one fails then a backup is available.
Diversity
• Provide the same functionality in different ways so that
  they will not fail in the same way.
However, adding diversity and redundancy adds
complexity and this can increase the chances of
error.
Some engineers argue that simplicity and extensive
V & V are a more effective route to software
dependability.
Diversity and redundancy examples


Redundancy. Where availability is critical
(e.g. in e-commerce systems), companies
normally keep backup servers and switch to
these automatically if failure occurs.
Diversity. To provide resilience against
external attacks, different servers may be
implemented using different operating
systems (e.g. Windows and Linux)
Hardware fault tolerance




Depends on triple-modular redundancy (TMR).
There are three replicated identical components that
receive the same input and whose outputs are
compared.
If one output is different, it is ignored and component
failure is assumed.
Based on most faults resulting from component
failures rather than design faults and a low
probability of simultaneous component failure.
Hardware reliability with TMR
[Figure: three replicated components A1, A2 and A3 feed an output
comparator]
Output selection



The output comparator is a (relatively) simple
hardware unit.
It compares its input signals and, if one is
different from the others, it rejects it.
Essentially, the selection of the actual output
depends on the majority vote.
The output comparator is connected to a
fault management unit that can either try to
repair the faulty unit or take it out of service.
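The comparator is a hardware unit, but its majority-vote logic can be sketched in software. MajorityVoter, vote and faultyUnit are hypothetical names, and integer outputs are assumed for simplicity.

```java
// Illustrative sketch only: names and the use of int outputs are assumptions.
class MajorityVoter {
    // Returns the value produced by at least two of the three replicas.
    static int vote(int a, int b, int c) {
        if (a == b || a == c) return a;
        if (b == c) return b;
        throw new IllegalStateException("no majority: all three outputs differ");
    }

    // Returns the index (0, 1 or 2) of the replica that disagrees with the
    // other two, or -1 if all three agree. This is the information a fault
    // management unit would need.
    static int faultyUnit(int a, int b, int c) {
        if (a == b && b == c) return -1;
        if (a == b) return 2;
        if (a == c) return 1;
        if (b == c) return 0;
        throw new IllegalStateException("no majority: all three outputs differ");
    }

    public static void main(String[] args) {
        System.out.println(vote(7, 7, 9));        // prints 7
        System.out.println(faultyUnit(7, 7, 9));  // prints 2
    }
}
```

If all three outputs differ there is no majority, which corresponds to the simultaneous failure that TMR assumes is improbable.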
Fault tolerant software architectures

The success of TMR at providing fault tolerance is
based on two fundamental assumptions
• The hardware components do not include common design
  faults;
• Components fail randomly and there is a low probability of
  simultaneous component failure.
Neither of these assumptions is true for software
• It isn’t possible simply to replicate the same component
  as they would have common design faults;
• Simultaneous component failure is therefore virtually
  inevitable.
Software systems must therefore be diverse.
Design diversity


Different versions of the system are designed and
implemented in different ways. They therefore ought
to have different failure modes.
Different approaches to design (e.g. object-oriented
and function-oriented)
• Implementation in different programming languages;
• Use of different tools and development environments;
• Use of different algorithms in the implementation.
Software analogies to TMR

N-version programming
• The same specification is implemented in a number of
  different versions by different teams. All versions compute
  simultaneously and the majority output is selected using a
  voting system.
• This is the most commonly used approach e.g. in many
  models of the Airbus commercial aircraft.
Recovery blocks
• A number of explicitly different versions of the same
  specification are written and executed in sequence.
• An acceptance test is used to select the output to be
  transmitted.
N-version programming


The different system versions are designed
and implemented by different teams. It is
assumed that there is a low probability that
they will make the same mistakes. The
algorithms used should but may not be
different.
There is some empirical evidence that teams
commonly misinterpret specifications in the
same way and choose the same algorithms in
their systems.
Recovery blocks
[Figure: recovery blocks. Algorithm 1 is tried and its result is
passed to an acceptance test. If the acceptance test fails, algorithm 2
and then algorithm 3 are retried and re-tested. Execution continues if
the acceptance test succeeds; an exception is signalled if all
algorithms fail.]
Recovery blocks



These force a different algorithm to be used
for each version so they reduce the
probability of common errors.
However, the design of the acceptance test
is difficult as it must be independent of the
computation used.
There are problems with this approach for
real-time systems because of the sequential
operation of the redundant versions.
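The recovery block scheme can be sketched roughly as below. RecoveryBlock, run and isAscending are hypothetical names, and sorting is used only as an example computation. The acceptance test here checks ordering alone, which is a deliberately simple assumption; a fuller test would also check that the output is a permutation of the input.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

// Illustrative sketch only: all names are assumptions.
class RecoveryBlock {
    // Runs each alternative in sequence on a fresh copy of the input and
    // returns the first result that passes the acceptance test.
    static int[] run(int[] input,
                     List<UnaryOperator<int[]>> alternatives,
                     Predicate<int[]> acceptanceTest) {
        for (UnaryOperator<int[]> alternative : alternatives) {
            int[] candidate;
            try {
                candidate = alternative.apply(input.clone());
            } catch (RuntimeException e) {
                continue;  // a crash is treated like a failed acceptance test
            }
            if (candidate != null && acceptanceTest.test(candidate))
                return candidate;
        }
        throw new IllegalStateException("all alternatives failed the acceptance test");
    }

    // Acceptance test that is independent of how the sort was computed.
    static boolean isAscending(int[] a) {
        for (int i = 0; i + 1 < a.length; i++)
            if (a[i] > a[i + 1]) return false;
        return true;
    }

    public static void main(String[] args) {
        List<UnaryOperator<int[]>> alternatives = new ArrayList<>();
        alternatives.add(a -> a);                            // faulty: does nothing
        alternatives.add(a -> { Arrays.sort(a); return a; }); // working fallback
        int[] result = run(new int[] {3, 1, 2}, alternatives, RecoveryBlock::isAscending);
        System.out.println(Arrays.toString(result));         // prints [1, 2, 3]
    }
}
```

Because the alternatives run one after another, the worst-case execution time is the sum of all of them, which is the real-time concern raised above.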
Problems with design diversity


Teams are not culturally diverse so they tend to
tackle problems in the same way.
Characteristic errors
• Different teams make the same mistakes. Some parts of
  an implementation are more difficult than others so all
  teams tend to make mistakes in the same place;
• Specification errors;
• If there is an error in the specification then this is reflected
  in all implementations;
• This can be addressed to some extent by using multiple
  specification representations.
Specification dependency



Both approaches to software redundancy are
susceptible to specification errors. If the specification
is incorrect, the system could fail.
This is also a problem with hardware but software
specifications are usually more complex than
hardware specifications and harder to validate.
This has been addressed in some cases by
developing separate software specifications from the
same user specification.
Critical Systems Validation
Topics covered




Reliability validation
Safety assurance
Security assessment
Safety and dependability cases
Validation of critical systems

Verification and validation of critical systems involves
more validation processes and analysis than for
non-critical systems, and therefore costs more:
• The costs and consequences of failure are high so it is
  cheaper to find and remove faults than to pay for system
  failure;
• You may have to make a formal case to customers or to a
  regulator that the system meets its dependability
  requirements. This dependability case may require
  specific V & V activities to be carried out.
Reliability validation



Reliability validation involves exercising the program
to assess whether or not it has reached the required
level of reliability.
This cannot normally be included as part of a normal
defect testing process because data for defect
testing is (usually) atypical of actual usage data.
Reliability measurement therefore requires a
specially designed data set that replicates the
pattern of inputs (the operational profile) to be
processed by the system.
The reliability measurement process
Reliability validation activities




Establish the operational profile for the
system.
Construct test data reflecting the operational
profile.
Test the system and observe the number of
failures and the times of these failures.
Compute the reliability after a statistically
significant number of failures have been
observed.
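As a rough sketch of the final step, assume reliability is expressed as ROCOF (failures per unit of operating time) and that the times of observed failures are logged in hours. ReliabilityMetrics and its methods are hypothetical names, and the sample figures are invented for illustration rather than taken from any real test run.

```java
// Illustrative sketch only: names, units and sample data are assumptions.
class ReliabilityMetrics {
    // Rate of occurrence of failures: observed failures per unit of
    // operating time.
    static double rocof(double[] failureTimes, double totalOperatingTime) {
        if (totalOperatingTime <= 0)
            throw new IllegalArgumentException("operating time must be positive");
        return failureTimes.length / totalOperatingTime;
    }

    // Mean interval between successive failures, given failure times in
    // ascending order.
    static double meanTimeBetweenFailures(double[] failureTimes) {
        if (failureTimes.length < 2)
            throw new IllegalArgumentException("need at least two failures");
        double span = failureTimes[failureTimes.length - 1] - failureTimes[0];
        return span / (failureTimes.length - 1);
    }

    public static void main(String[] args) {
        double[] failureTimes = {10.0, 30.0, 70.0, 150.0};  // invented data, in hours
        System.out.println(rocof(failureTimes, 200.0));     // 4 failures in 200 hours
        System.out.println(meanTimeBetweenFailures(failureTimes));
    }
}
```

Four failures is far too few to be statistically significant; the sample is kept small only so the arithmetic is easy to follow.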
Statistical testing



Testing software for reliability rather than fault
detection.
Measuring the number of errors allows the reliability
of the software to be predicted. Note that, for
statistical reasons, more errors than are allowed for
in the reliability specification must be induced.
An acceptable level of reliability should be
specified and the software tested and amended until
that level of reliability is reached.
Reliability measurement problems

Operational profile uncertainty
• The operational profile may not be an accurate
  reflection of the real use of the system.
High costs of test data generation
• Costs can be very high if the test data for the
  system cannot be generated automatically.
Statistical uncertainty
• You need a statistically significant number of
  failures to compute the reliability but highly
  reliable systems will rarely fail.
Operational profiles


An operational profile is a set of test data whose
frequency matches the actual frequency of these
inputs from ‘normal’ usage of the system. A close
match with actual usage is necessary otherwise the
measured reliability will not be reflected in the actual
usage of the system.
It can be generated from real data collected from an
existing system or (more often) depends on
assumptions made about the pattern of usage of a
system.
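One way to generate test data that follows an operational profile is to sample input classes according to their assumed probabilities. The sketch below is illustrative: OperationalProfile, the input class names and the probabilities are all assumptions, not data from a real system.

```java
import java.util.Random;

// Illustrative sketch only: names, classes and probabilities are assumptions.
class OperationalProfile {
    private final String[] inputClasses;
    private final double[] cumulative;
    private final Random random;

    OperationalProfile(String[] inputClasses, double[] probabilities, long seed) {
        if (inputClasses.length == 0 || inputClasses.length != probabilities.length)
            throw new IllegalArgumentException("need one probability per input class");
        this.inputClasses = inputClasses.clone();
        this.cumulative = new double[probabilities.length];
        double total = 0.0;
        for (int i = 0; i < probabilities.length; i++) {
            total += probabilities[i];
            cumulative[i] = total;
        }
        if (Math.abs(total - 1.0) > 1e-9)
            throw new IllegalArgumentException("probabilities must sum to 1");
        this.random = new Random(seed);  // fixed seed keeps test runs repeatable
    }

    // Picks the next input class with the frequency given by the profile.
    String nextInputClass() {
        double r = random.nextDouble();
        for (int i = 0; i < cumulative.length; i++)
            if (r < cumulative[i]) return inputClasses[i];
        return inputClasses[inputClasses.length - 1];  // guards against rounding
    }

    public static void main(String[] args) {
        OperationalProfile profile = new OperationalProfile(
            new String[] {"query", "update", "report"},
            new double[] {0.7, 0.25, 0.05}, 42L);
        for (int i = 0; i < 5; i++)
            System.out.println(profile.nextInputClass());
    }
}
```

The sampled class would then be used to pick or construct a concrete test input; that step is application specific and is left out here.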
An operational profile
[Figure: number of inputs plotted against input classes]
Reliability prediction


A reliability growth model is a mathematical model of
the system reliability change as it is tested and faults
are removed.
It is used as a means of reliability prediction by
extrapolating from current data
• Simplifies test planning and customer negotiations.
• You can predict when testing will be completed and
  demonstrate to customers whether or not the required
  reliability will ever be achieved.
Prediction depends on the use of statistical testing to
measure the reliability of a system version.
Equal-step reliability growth
[Figure: reliability (ROCOF) plotted against time, improving in equal
steps at t1, t2, t3, t4 and t5]
Random-step reliability growth
[Figure: reliability (ROCOF) plotted against time, changing in
unequal steps at t1, t2, t3, t4 and t5. Annotations: note different
reliability improvements; a fault repair adds a new fault and
decreases reliability (increases ROCOF).]
Growth model selection




Many different reliability growth models have
been proposed.
There is no universally applicable growth
model.
Reliability should be measured and observed
data should be fitted to several models.
The best-fit model can then be used for
reliability prediction.
Reliability prediction
[Figure: measured reliability points and a fitted reliability model
curve plotted against time. The curve is extrapolated to the required
reliability to give the estimated time of reliability achievement.]
Safety assurance

Safety assurance and reliability
measurement are quite different:
• Within the limits of measurement error, you
  know whether or not a required level of
  reliability has been achieved;
• However, quantitative measurement of safety is
  impossible. Safety assurance is concerned with
  establishing a confidence level in the system.
Safety assurance activities




Safety reviews
Safety arguments
Safety-oriented processes
Run-time checking
Safety reviews





Review for correct intended function.
Review for maintainable, understandable
structure.
Review to verify algorithm and data structure
design against specification.
Review to check code consistency with
algorithm and data structure design.
Review adequacy of system testing.
Safety arguments



Safety arguments are intended to show that the
system cannot reach an unsafe state.
These are weaker than correctness arguments
which must show that the system code conforms to
its specification.
They are generally based on proof by contradiction
• Assume that an unsafe state can be reached;
• Show that this is contradicted by the program code.
A graphical model of the safety argument may be
developed.
Construction of a safety argument




Establish the safe exit conditions for a component or
a program.
Starting from the END of the code, work backwards
until you have identified all paths that lead to the exit
of the code.
Assume that the exit condition is false.
Show, for each path leading to the exit, that the
assignments made in that path contradict the
assumption of an unsafe exit from the component.
Insulin delivery code
currentDose = computeInsulin () ;
// Safety check - adjust currentDose if necessary
// if statement 1
if (previousDose == 0)
{
    if (currentDose > 16)
        currentDose = 16 ;
}
else
    if (currentDose > (previousDose * 2) )
        currentDose = previousDose * 2 ;
// if statement 2
if ( currentDose < minimumDose )
    currentDose = 0 ;
else if ( currentDose > maxDose )
    currentDose = maxDose ;
// Safety question at this point: currentDose > maxDose?
administerInsulin (currentDose) ;
Safety argument model
Program paths
Neither branch of if-statement 2 is executed
• Can only happen if currentDose is >= minimumDose and <= maxDose.
then branch of if-statement 2 is executed
• currentDose = 0.
else branch of if-statement 2 is executed
• currentDose = maxDose.
In all cases, the post conditions contradict the unsafe condition that the dose administered is greater than maxDose.
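The path analysis above can also be cross-checked mechanically. The following is a minimal sketch and not part of the original slides: it wraps the safety check from the insulin delivery code in a pure function and searches a bounded input range for any case where the result exceeds maxDose. The class name DoseSafetyCheck, the method names and the limit values MINIMUM_DOSE = 1 and MAX_DOSE = 4 are assumptions chosen only for illustration.

```java
class DoseSafetyCheck {
    // Hypothetical limits, chosen only for this illustration.
    static final int MINIMUM_DOSE = 1;
    static final int MAX_DOSE = 4;

    // The safety check from the insulin delivery code, written as a pure function.
    static int safeDose(int computedDose, int previousDose) {
        int currentDose = computedDose;
        // if statement 1: limit how fast the dose may increase
        if (previousDose == 0) {
            if (currentDose > 16) {
                currentDose = 16;
            }
        } else if (currentDose > previousDose * 2) {
            currentDose = previousDose * 2;
        }
        // if statement 2: force the dose to 0 or cap it at MAX_DOSE
        if (currentDose < MINIMUM_DOSE) {
            currentDose = 0;
        } else if (currentDose > MAX_DOSE) {
            currentDose = MAX_DOSE;
        }
        return currentDose;
    }

    // Counts the input pairs in 0..limit for which the unsafe condition
    // (dose administered > MAX_DOSE) holds.
    static int countUnsafe(int limit) {
        int unsafe = 0;
        for (int computed = 0; computed <= limit; computed++) {
            for (int previous = 0; previous <= limit; previous++) {
                if (safeDose(computed, previous) > MAX_DOSE) {
                    unsafe++;
                }
            }
        }
        return unsafe;
    }

    public static void main(String[] args) {
        System.out.println("Unsafe cases found: " + countUnsafe(100));
    }
}
```

A bounded search of this kind complements the safety argument but does not replace it: the argument covers every path through the code, whereas the search covers only the inputs it tries.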
Slide 92
Safety-related process activities
Creation of a hazard logging and monitoring system.
Appointment of project safety engineers.
Extensive use of safety reviews.
Creation of a safety certification system.
Detailed configuration management (see Chapter 29).
Slide 94
Run-time safety checking
During program execution, safety checks can be incorporated as assertions to check that the program is executing within a safe operating ‘envelope’.
Assertions can be included as comments (or using an assert statement in some languages). Code can be generated automatically to check these assertions.
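As a minimal sketch of the second option, the fragment below uses Java's assert statement to state the envelope and backs it with an explicit check. The class name SafeEnvelope, the method checkedDose and the bound MAX_DOSE = 4 are hypothetical and chosen only for illustration.

```java
class SafeEnvelope {
    // Hypothetical upper bound of the safe operating envelope.
    static final int MAX_DOSE = 4;

    // Returns the dose unchanged if it lies inside the envelope 0..MAX_DOSE.
    static int checkedDose(int dose) {
        // Java skips assert statements unless the JVM is started with -ea,
        // so the assertion documents the envelope during development.
        assert dose >= 0 && dose <= MAX_DOSE : "dose outside safe envelope: " + dose;
        // The same condition is repeated as ordinary code so that it is
        // still enforced when assertions are disabled.
        if (dose < 0 || dose > MAX_DOSE) {
            throw new IllegalStateException("dose outside safe envelope: " + dose);
        }
        return dose;
    }
}
```

Repeating the condition as explicit code is a design choice for this sketch: a check that can be switched off at run time is not enough on its own for a safety-critical constraint.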
Slide 97
Insulin administration with assertions
static void administerInsulin ( ) throws SafetyException {
    int maxIncrements = InsulinPump.maxDose / 8 ;
    int increments = InsulinPump.currentDose / 8 ;
    // assert currentDose <= InsulinPump.maxDose
    if (InsulinPump.currentDose > InsulinPump.maxDose)
        throw new SafetyException (Pump.doseHigh);
    else
        for (int i=1; i<= increments; i++)
        {
            generateSignal () ;
            if (i > maxIncrements)
                throw new SafetyException (Pump.incorrectIncrements);
        } // for loop
} //administerInsulin
Slide 98
Topics covered
Reliability validation
Safety assurance
Security assessment
Safety and dependability cases
Slide 99
Security assessment
Security assessment has something in common with safety assessment.
It is intended to demonstrate that the system cannot enter some state (an unsafe or an insecure state) rather than to demonstrate that the system can do something.
However, there are differences:
• Safety problems are accidental; security problems are deliberate.
• Security problems are more generic: many systems suffer from the same problems. Safety problems are mostly related to the application domain.
Slide 100
Security validation
Experience-based validation
• The system is reviewed and analysed against the types of attack that are known to the validation team.
Tool-based validation
• Various security tools such as password checkers are used to analyse the system in operation.
Tiger teams
• A team is established whose goal is to breach the security of the system by simulating attacks on the system.
Formal verification
• The system is verified against a formal security specification.
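To make tool-based validation concrete, here is a minimal sketch of the kind of rule a password checker might apply. It is not taken from any real tool: the class name PasswordChecker, the four-entry word list and the minimum length of 8 are assumptions made for illustration, and a practical checker would use much larger word lists and a site-specific policy.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class PasswordChecker {
    // Tiny illustrative word list; real checkers use far larger ones.
    static final Set<String> COMMON =
            new HashSet<>(Arrays.asList("password", "123456", "qwerty", "letmein"));
    // Assumed policy value, not taken from the slides.
    static final int MIN_LENGTH = 8;

    // Returns a list of weaknesses found; an empty list means none of
    // these simple rules flagged the password.
    static List<String> weaknesses(String password) {
        List<String> found = new ArrayList<>();
        if (password.length() < MIN_LENGTH) {
            found.add("too short");
        }
        if (COMMON.contains(password.toLowerCase(Locale.ROOT))) {
            found.add("common password");
        }
        boolean lettersOnly = !password.isEmpty();
        for (char c : password.toCharArray()) {
            if (!Character.isLetter(c)) {
                lettersOnly = false;
            }
        }
        if (lettersOnly) {
            found.add("letters only");
        }
        return found;
    }
}
```

Run over the passwords in use on an operational system, a checker of this kind reports accounts that an attacker could guess easily.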
Slide 101
Safety and dependability cases
Safety and dependability cases are structured documents that set out detailed arguments and evidence that a required level of safety or dependability has been achieved.
They are normally required by regulators before a system can be certified for operational use.
Slide 104
The system safety case
It is now normal practice for a formal safety case to be required for all safety-critical computer-based systems, e.g. railway signalling, air traffic control, etc.
A safety case is:
• A documented body of evidence that provides a convincing and valid argument that a system is adequately safe for a given application in a given environment.
Arguments in a safety or dependability case can be based on formal proof, design rationale, safety proofs, etc. Process factors may also be included.
Slide 105
Components of a safety case
Component: Description
System description: An overview of the system and a description of its critical components.
Safety requirements: The safety requirements abstracted from the system requirements specification.
Hazard and risk analysis: Documents describing the hazards and risks that have been identified and the measures taken to reduce risk.
Design analysis: A set of structured arguments that justify why the design is safe.
Verification and validation: A description of the V & V procedures used and, where appropriate, the test plans for the system. Results of system V & V.
Review reports: Records of all design and safety reviews.
Team competences: Evidence of the competence of all of the team involved in safety-related systems development and validation.
Process QA: Records of the quality assurance processes carried out during system development.
Change management processes: Records of all changes proposed, actions taken and, where appropriate, justification of the safety of these changes.
Associated safety cases: References to other safety cases that may impact on this safety case.
Slide 106
Argument structure
Slide 107
Insulin pump argument
Claim: The maximum single dose computed by the insulin pump will not exceed maxDose.
Evidence: Safety argument for insulin pump as shown in Figure 24.7
Evidence: Test data sets for insulin pump
Evidence: Static analysis report for insulin pump software
Argument: The safety argument presented shows that the maximum dose of insulin that can be computed is equal to maxDose. In 400 tests, the value of Dose was correctly computed and never exceeded maxDose. The static analysis of the control software revealed no anomalies. Overall, it is reasonable to assume that the claim is justified.
Slide 108
Claim hierarchy
The insulin pump will not deliver a single dose of insulin that is unsafe
• The maximum single dose computed by the pump software will not exceed maxDose
  • In normal operation, the maximum dose computed will not exceed maxDose
  • If the software fails, the maximum dose computed will not exceed maxDose
• maxDose is set up correctly when the pump is configured
• maxDose is a safe dose for the user of the insulin pump
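A claim hierarchy is a tree, so it can be held in a simple recursive data structure. The sketch below is an illustration only: the class Claim, its methods and the rule used in isSupported (a leaf claim counts as supported when it cites at least one evidence item, a parent when all of its sub-claims are supported) are assumptions, not a notation defined in the source. Judging whether evidence is actually convincing remains a matter for human review.

```java
import java.util.ArrayList;
import java.util.List;

class Claim {
    final String statement;
    final List<String> evidence = new ArrayList<>();
    final List<Claim> subClaims = new ArrayList<>();

    Claim(String statement) {
        this.statement = statement;
    }

    Claim addEvidence(String item) {
        evidence.add(item);
        return this;
    }

    Claim addSubClaim(Claim claim) {
        subClaims.add(claim);
        return this;
    }

    // Simplifying assumption: a leaf needs at least one evidence item,
    // and a parent needs every sub-claim to be supported.
    boolean isSupported() {
        if (subClaims.isEmpty()) {
            return !evidence.isEmpty();
        }
        for (Claim claim : subClaims) {
            if (!claim.isSupported()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Evidence items are those listed in the insulin pump argument.
        Claim computed = new Claim("The maximum single dose computed by the pump software will not exceed maxDose")
                .addEvidence("Safety argument for insulin pump")
                .addEvidence("Test data sets for insulin pump")
                .addEvidence("Static analysis report for insulin pump software");
        // The slides list no evidence for the remaining claims, so they are left empty here.
        Claim top = new Claim("The insulin pump will not deliver a single dose of insulin that is unsafe")
                .addSubClaim(computed)
                .addSubClaim(new Claim("maxDose is set up correctly when the pump is configured"))
                .addSubClaim(new Claim("maxDose is a safe dose for the user of the insulin pump"));
        System.out.println("Top-level claim supported: " + top.isSupported());
    }
}
```

Walking the tree in this way makes it easy to list which claims still lack evidence before the safety case is submitted.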
Slide 109
Key points (Chapter 20)
Approaches to developing dependable systems: fault avoidance, fault detection and fault tolerance.
Fault avoidance measures include:
• The use of a well-defined repeatable process.
• Dependable programming practices.
The four aspects of program fault tolerance are: failure detection, damage assessment, fault recovery and fault repair.
Fault tolerant architectures employ redundancy and diversity.
Slide 110
Key points (Chapter 24)
Reliability measurement and prediction rely on exercising the system using an operational profile and modelling how the reliability of a software system improves as it is tested and faults are removed.
Safety arguments or proofs are a way of demonstrating that a hazardous condition can never occur.
Security validation may involve experience-based analysis, tool-based analysis or the use of ‘tiger teams’ to attack the system.
Slide 111