SENG 521 - Nanjing University

SENG 521
Software Reliability &
Testing
Fault Tolerant Software Systems:
Techniques (Part 4b)
Department of Electrical & Computer Engineering, University of Calgary
B.H. Far
([email protected])
http://www.enel.ucalgary.ca/~far/Lectures/SENG521/04b/
SENG521 (Fall 2002)
[email protected]
Fault Tolerance (Review)



A fault-tolerant computing system must be capable
of providing specified services in the presence of a
bounded number of failures.
These failures could occur because of faults present
in either the components of the system or in the
system’s design.
Most software faults are due to deficiencies
of design, so almost none of the hardware fault
tolerance techniques, which target component failures,
can be applied to software directly.
Acceptance Testing


A program-specific error detection mechanism to
check on the results of program execution.
Usually evaluates to either “true” or “false”.
ensure <acceptance test> by P0 else-by P1 else fail

Examples:


Checksums for program parts
Internal check points:
ABS[(SQRT(x)*SQRT(x)) – x] < E
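The square-root check above can be sketched in Python as follows. This is a minimal illustration, not the course's implementation: the names `sqrt_acceptance_test`, `ensure_sqrt`, `p0`, `p1` and the tolerance value are assumptions, and `p0` is deliberately faulty so that the `else-by` path is exercised.

```python
import math


def sqrt_acceptance_test(x, result, eps=1e-9):
    """Acceptance test from the slide: ABS[(r * r) - x] < E."""
    return abs(result * result - x) < eps


def ensure_sqrt(x, alternates, eps=1e-9):
    """Mimic 'ensure <test> by P0 else-by P1 else fail'."""
    for alternate in alternates:
        result = alternate(x)
        if sqrt_acceptance_test(x, result, eps):
            return result
    raise RuntimeError("all alternates failed the acceptance test")


def p0(x):
    # Hypothetical faulty primary, included only to trigger the fallback.
    return x / 2.0


def p1(x):
    # Alternate based on the standard library.
    return math.sqrt(x)


value = ensure_sqrt(9.0, [p0, p1])
print(value)
```

Here `p0` returns 4.5, which fails the test, so the result of `p1` is delivered instead.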
External Consistency


A kind of external error detection mechanism
to judge correctness of execution of a
program.
Examples:




Exception signal when dividing by zero
Integer overflow signal
Interrupt signal for a program stuck in a loop
Floating-point numerical failure check
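Some of these signals can be illustrated with Python's built-in exceptions. The sketch below is only an analogy under stated assumptions: the wrapper name `checked` is hypothetical, Python integers do not overflow (so `OverflowError` here comes from floating-point library calls), and loop detection by interrupt is not shown.

```python
import math


def checked(operation, *args):
    """Run operation and return (ok, value_or_reason)."""
    try:
        value = operation(*args)
    except ZeroDivisionError:
        return False, "division by zero"
    except OverflowError:
        return False, "overflow"
    # NaN or infinity indicates a floating-point numerical failure.
    if isinstance(value, float) and not math.isfinite(value):
        return False, "non-finite floating-point result"
    return True, value


def divide(a, b):
    return a / b


print(checked(divide, 1.0, 0.0))
print(checked(math.exp, 1000.0))
print(checked(lambda: 1e308 * 10))
print(checked(divide, 6.0, 3.0))
```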
Example
$$x \cdot y = \sum_{i=1}^{6} x_i y_i$$
$$x = (10^{20},\ 1223,\ 10^{24},\ 10^{18},\ 3,\ -10^{21})$$
$$y = (10^{30},\ 2,\ -10^{26},\ 10^{22},\ 2111,\ 10^{19})$$
The correct answer is 8779.
But an ordinary floating-point implementation will
typically return zero, because rounding absorbs the
small summands next to terms that are many orders of
magnitude larger.
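The effect can be checked with a short Python sketch. The vectors are the six-element values from the slide, with signs chosen so that the exact sum is 8779; the function names are illustrative. The naive version accumulates in double precision, while the exact version uses Python's arbitrary-precision integers. The printed floating-point value may depend on the platform, so only its disagreement with the exact result is relied on.

```python
x = [10**20, 1223, 10**24, 10**18, 3, -10**21]
y = [10**30, 2, -10**26, 10**22, 2111, 10**19]


def dot_float(a, b):
    """Left-to-right accumulation in double precision."""
    total = 0.0
    for ai, bi in zip(a, b):
        total += float(ai) * float(bi)
    return total


def dot_exact(a, b):
    """Exact result using arbitrary-precision integers."""
    return sum(ai * bi for ai, bi in zip(a, b))


naive = dot_float(x, y)
exact = dot_exact(x, y)
# A numerical consistency check flags the disagreement.
consistent = (naive == exact)
print(naive, exact, consistent)
```

The terms 2446 and 6333 are lost because each is added to a partial sum of magnitude around 10^50 or 10^40, far beyond the 15 to 17 significant decimal digits a double can hold.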
Redundancy

Dual software technique:


Implementing two (or more) distinct versions of
the same software and executing them for the
same set of inputs. Any discrepancy in the
outputs of the two versions may trigger an alarm.
The effectiveness of redundancy techniques is
limited by coincident, correlated and dependent
faults.
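A minimal sketch of the dual software technique, assuming two independently written versions of a mean computation. The function names, the tolerance, and the faulty version are all hypothetical and serve only to show when the alarm fires.

```python
import math


def version_a(values):
    """Mean computed from the total."""
    return sum(values) / len(values)


def version_b(values):
    """Mean computed incrementally (a distinct algorithm)."""
    mean = 0.0
    for count, value in enumerate(values, start=1):
        mean += (value - mean) / count
    return mean


def faulty_version(values):
    # Hypothetical off-by-one bug, for illustration only.
    return sum(values) / (len(values) - 1)


def run_dual(first, second, values, tolerance=1e-9):
    """Run both versions on the same input; alarm on any discrepancy."""
    a = first(values)
    b = second(values)
    alarm = not math.isclose(a, b, rel_tol=tolerance, abs_tol=tolerance)
    return a, b, alarm


data = [1.0, 2.0, 3.0, 4.0]
print(run_dual(version_a, version_b, data))
print(run_dual(version_a, faulty_version, data))
```

Note that the comparison detects a discrepancy but cannot tell which version is wrong, and it stays silent if both versions fail identically.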
Coincident Faults


Coincident Faults: occur when two or more
functionally equivalent software components
fail on the same input.
When two or more software versions give the
same incorrect response, an identical-and-wrong
(IAW) answer is obtained.
Correlated & Dependent Faults


Correlated Faults: Two faults are correlated
when the measured probability of coincident
failures is significantly higher than what would
be expected by chance, that is, if the failures
were independent.
If
$$P(i \text{ fails} \mid j \text{ fails}) \neq P(i \text{ fails})$$
then the failures of i and j are not independent.
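The independence condition can be checked against test records. The sketch below estimates both probabilities from per-input failure flags; the function name and the records are made up purely for illustration and are not measurements from any real system.

```python
def failure_probabilities(records):
    """records: list of (i_failed, j_failed) flags, one pair per test input.

    Returns (P(i fails), P(i fails | j fails)); the conditional value is
    None when j never fails.
    """
    total = len(records)
    i_failures = sum(1 for i_failed, _ in records if i_failed)
    j_failures = sum(1 for _, j_failed in records if j_failed)
    both = sum(1 for i_failed, j_failed in records if i_failed and j_failed)
    p_i = i_failures / total
    p_i_given_j = both / j_failures if j_failures else None
    return p_i, p_i_given_j


# Made-up records: 100 inputs, 8 of which make both versions fail.
records = (
    [(True, True)] * 8
    + [(True, False)] * 2
    + [(False, True)] * 2
    + [(False, False)] * 88
)

p_i, p_i_given_j = failure_probabilities(records)
print(p_i, p_i_given_j)
```

With these illustrative numbers, version i fails on 10% of all inputs but on 80% of the inputs where j fails, so the two versions cannot be treated as failing independently.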
Possible Failure Scenario

What if the software
components produce
doublet or triplet IAW
responses?
[Figure: components P1, P2 and P3 feed an adjudication algorithm; annotation: doublet & triplet IAW faults.]
Adjudication by Voting


A “voter” compares results from two or more
functionally equivalent software components
and decides which of the answers provided
by those components is correct.
Various versions of voting algorithm:



Majority voting
2-of-N voting
Consensus voting
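The three voting variants can be sketched as follows. The slide does not define them, so the definitions used here are assumptions based on common usage: majority voting needs agreement among more than half of the N outputs, 2-of-N voting needs any two outputs to agree, and consensus voting picks the largest agreeing group. Returning `None` on failure or a tie is a simplification, and outputs are assumed to be hashable and comparable by exact equality.

```python
from collections import Counter


def majority_vote(outputs):
    """Return the value produced by more than half of the versions."""
    if not outputs:
        return None
    value, count = Counter(outputs).most_common(1)[0]
    return value if count > len(outputs) / 2 else None


def two_of_n_vote(outputs):
    """Return a value that at least two versions agree on."""
    if not outputs:
        return None
    value, count = Counter(outputs).most_common(1)[0]
    return value if count >= 2 else None


def consensus_vote(outputs):
    """Return the value of the largest agreeing group; None on a tie."""
    ranked = Counter(outputs).most_common()
    if not ranked:
        return None
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


outputs = [7, 7, 3, 4, 5]
print(majority_vote(outputs), two_of_n_vote(outputs), consensus_vote(outputs))
```

For the sample outputs, two of five versions agree on 7: that is not a majority, but it satisfies both 2-of-N and consensus voting. All three voters would still accept an IAW answer if enough versions share it.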
Techniques





Recovery blocks
N-version programming
Consensus recovery block
Acceptance voting
N self-checking programming
Recovery Blocks (RB)


Uses multiple
versions of a software
module and an
acceptance test.
The output of the 1st
module is tested for
acceptability and, if it
fails, the 2nd module
is executed after
backward state
recovery.
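A minimal sketch of a recovery block, assuming the program state is a dictionary that can be checkpointed with a deep copy. The names `recovery_block`, `primary`, `secondary` and `is_sorted_triple` are hypothetical, and the primary is deliberately buggy so that backward recovery is exercised.

```python
import copy


def recovery_block(state, alternates, acceptance_test):
    """Try alternates in order, restoring the checkpoint after each failure."""
    checkpoint = copy.deepcopy(state)
    for alternate in alternates:
        try:
            result = alternate(state)
            if acceptance_test(result):
                return result
        except Exception:
            pass  # a crash is treated like a failed acceptance test
        # Backward state recovery: undo any changes the alternate made.
        state.clear()
        state.update(copy.deepcopy(checkpoint))
    raise RuntimeError("all alternates failed the acceptance test")


def primary(state):
    # Hypothetical bug: drops an element and corrupts the shared state.
    state["items"].pop()
    return list(state["items"])


def secondary(state):
    return sorted(state["items"])


def is_sorted_triple(result):
    ordered = all(a <= b for a, b in zip(result, result[1:]))
    return len(result) == 3 and ordered


state = {"items": [3, 1, 2]}
result = recovery_block(state, [primary, secondary], is_sorted_triple)
print(result, state)
```

The checkpoint matters here: without the restore step, `secondary` would run on the state that `primary` had already damaged.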
N-Version Programming




Parallel execution of N
independently developed
functionally equivalent
modules.
Adjudication is via voting.
The voter accepts all N
outputs and selects the
correct one among them.
Advantage of NVP: no
service interruption.
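N-version execution with a majority voter might look like the sketch below. It assumes thread-based parallelism and exact-equality comparison of hashable outputs; the three versions are toy stand-ins for independently developed modules, and one of them is deliberately wrong.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def n_version_execute(versions, value):
    """Run all versions in parallel and adjudicate by majority vote."""
    with ThreadPoolExecutor(max_workers=len(versions)) as pool:
        futures = [pool.submit(version, value) for version in versions]
        outputs = []
        for future in futures:
            try:
                outputs.append(future.result())
            except Exception:
                pass  # a crashed version casts no vote
    if outputs:
        winner, count = Counter(outputs).most_common(1)[0]
        if count > len(versions) / 2:
            return winner
    raise RuntimeError("voter found no majority")


def square_a(v):
    return v * v


def square_b(v):
    return v ** 2


def square_faulty(v):
    # Hypothetical fault: doubles instead of squaring.
    return v + v


print(n_version_execute([square_a, square_b, square_faulty], 3))
```

The faulty version is outvoted without any rollback or retry, which is the "no service interruption" property noted above.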
Consensus Recovery Block


Composed of NVP
and RB.
If NVP fails, the
system reverts to RB
using the same
blocks.
[Figure: the input goes to NVP; on success the correct output is delivered; on failure control passes to RB, which yields either the correct output or a system failure.]
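The two-stage flow can be sketched as follows. This is one possible reading, with assumptions stated plainly: the NVP stage uses a majority vote, the RB stage reuses the outputs of the same versions and applies the acceptance test to them in order, and execution is sequential for brevity. All names are illustrative.

```python
import math
from collections import Counter


def consensus_recovery_block(versions, acceptance_test, value):
    """Return (output, stage), where stage is 'NVP' or 'RB'."""
    results = []
    for version in versions:
        try:
            results.append(version(value))
        except Exception:
            continue  # a crashed version contributes no output
    # Stage 1: NVP with majority voting.
    if results:
        winner, count = Counter(results).most_common(1)[0]
        if count > len(versions) / 2:
            return winner, "NVP"
    # Stage 2: RB over the same blocks, checked one by one.
    for result in results:
        if acceptance_test(value, result):
            return result, "RB"
    raise RuntimeError("system failure: no acceptable output")


def isqrt_test(value, root):
    """Acceptance test for an integer square root."""
    return root * root <= value < (root + 1) * (root + 1)


good = math.isqrt


def halving_bug(value):
    return value // 2


def off_by_one_bug(value):
    return math.isqrt(value) + 1


print(consensus_recovery_block([good, halving_bug, off_by_one_bug], isqrt_test, 16))
print(consensus_recovery_block([good, good, halving_bug], isqrt_test, 16))
```

In the first call the three outputs all differ, so voting fails and the acceptance test rescues the correct output; in the second call the vote succeeds directly.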
Acceptance Voting



As in NVP, all
versions are executed
in parallel.
The output of each
module goes to an
acceptance test.
If the acceptance test is
successful, the output
goes to a voter.
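A minimal sketch of acceptance voting, assuming a majority vote among only those outputs that passed the acceptance test. The versions run sequentially here for brevity, and the names and the toy absolute-value task are illustrative.

```python
from collections import Counter


def acceptance_voting(versions, acceptance_test, value):
    """Filter outputs through the acceptance test, then vote on the rest."""
    accepted = []
    for version in versions:
        try:
            output = version(value)
        except Exception:
            continue  # a crashed version produces nothing to test
        if acceptance_test(value, output):
            accepted.append(output)
    if accepted:
        winner, count = Counter(accepted).most_common(1)[0]
        if count > len(accepted) / 2:
            return winner
    raise RuntimeError("no accepted output won the vote")


def non_negative(value, output):
    return output >= 0


def abs_a(value):
    return abs(value)


def abs_b(value):
    return value if value >= 0 else -value


def abs_faulty(value):
    # Hypothetical fault: forgets to negate.
    return value


print(acceptance_voting([abs_a, abs_b, abs_faulty], non_negative, -5))
```

Because the voter sees a variable number of outputs, the majority threshold is computed from the accepted outputs rather than from N.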
N Self-Checking Programming


In N Self-Checking
Programming (NSCP),
N modules are
executed in pairs.
The pairs’ outputs can
be compared or
assessed for
correctness.
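One way to sketch pairwise execution with comparison is shown below. It assumes that a pair is trusted when its two modules agree exactly and that pairs are tried in order until one agrees; the slide does not specify these details, and the function names are hypothetical.

```python
def n_self_checking(pairs, value):
    """Return the output of the first pair whose two modules agree."""
    for first, second in pairs:
        try:
            a = first(value)
            b = second(value)
        except Exception:
            continue  # a crashing pair is skipped
        if a == b:
            return a
    raise RuntimeError("no self-checking pair produced agreeing outputs")


def double_a(v):
    return v * 2


def double_b(v):
    return v + v


def double_faulty(v):
    # Hypothetical fault: off by one.
    return v + v + 1


pairs = [(double_a, double_faulty), (double_a, double_b)]
print(n_self_checking(pairs, 21))
```

The first pair disagrees (42 versus 43) and is discarded, so the second pair's output is delivered.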