Object-Oriented
Software Testing
Fundamentals
© D. Deugo 2003 – 2008
4004 L2 -
1
An Initial Challenge
Myers’s Famous Problem
• Read 3 integers taken to represent the lengths of the sides
of a triangle. Decide if triangle is isosceles, equilateral or
scalene.
• Math: A valid triangle must meet 2 conditions. No side may
have a length of zero, and each side must be shorter than
half the sum of all sides. If s is this half-sum:
– s = (a + b + c) / 2
– then s > a, s > b, and s > c must hold
– if a == b == c then equilateral, if exactly 2 sides are equal then
isosceles, else scalene
• Experienced programmers find 7.8 test cases on average
• Myers suggests 14, Binder: 65 wrt a Java implementation:
– Figure 1.1 p.4: class hierarchy
– Figure 1.3 p.6: Java interface
– Table 1.1:
» permutations, invalid and boundary inputs are important
» one must exercise all ways of violating a condition
– Tables 1.2 and 1.3: code (esp. drawing and inheritance) considerations
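The validity conditions and the three categories above can be sketched in Java as follows. This is a minimal illustration, not Binder's implementation: the class name `TriangleClassifier`, the `Kind` enum and the `classify` method are hypothetical. The check s > a is written as perimeter > 2 * a so that integer division does not truncate s (with a = b = c = 1, s = 3 / 2 would truncate to 1 and wrongly reject the triangle).

```java
// Hypothetical names throughout; a sketch, not Binder's Java implementation.
final class TriangleClassifier {
    enum Kind { INVALID, EQUILATERAL, ISOSCELES, SCALENE }

    static Kind classify(int a, int b, int c) {
        // Condition 1: no side may be zero (negative lengths are rejected too).
        if (a <= 0 || b <= 0 || c <= 0) {
            return Kind.INVALID;
        }
        // Condition 2: s > a, s > b, s > c with s = (a + b + c) / 2.
        // Compared as perimeter > 2 * side, in long arithmetic, to avoid
        // integer-division truncation and int overflow.
        long perimeter = (long) a + b + c;
        if (perimeter <= 2L * a || perimeter <= 2L * b || perimeter <= 2L * c) {
            return Kind.INVALID;
        }
        if (a == b && b == c) {
            return Kind.EQUILATERAL;
        }
        if (a == b || b == c || a == c) {
            return Kind.ISOSCELES;
        }
        return Kind.SCALENE;
    }
}
```

For example, `TriangleClassifier.classify(1, 2, 3)` is INVALID (a degenerate triangle), while permutations such as (3, 3, 5), (3, 5, 3) and (5, 3, 3) should all be ISOSCELES, which is why permutations and boundary inputs matter.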
On Quality Attributes
Lessons from Lecture 1
• Quality is frequently downplayed in the software
development industry
– despite the appearance of concern through standards such as
ISO 9000…
– just look at the lack of concern for quality in design books, and the
lists of known bugs in commercial software...
• Quality can be approached from different viewpoints:
– we will emphasize product quality engineering, and downplay
process issues and measurement.
» but Myers’s question remains: how to derive test cases?
– it is widely accepted that quality must be customer-oriented
» Eric Yu discusses how to capture the goals of a customer:
more on this later!
• Code cannot be tested on its own: you test against some
specification
– but, in reality, specs and models are seldom kept in sync with
code… traceability is mostly nonexistent...
– and we must test as early and frequently as possible!
In the Eyes of the Beholder
• A product has several stakeholders, each with their view of
quality and its attributes:
– end-users: functional requirements, usability, reliability, etc.
– operators: ease of integration with other systems, etc.
– administrators: ease of configuration
– purchasers: cost, added-value, return on investment…
– sales people: needs assessment, ease of contract creation
– architects: scalable and understandable modeling to be used by
designers, traceability to requirements, reusability (libraries and
frameworks)
– designers: similar to architects’ viewpoint but with a concern for
the implementation
– implementers: completeness, consistency and correctness of the
design, availability of languages and tools (CASE, debugging, etc.)
– testers: testability (e.g., controllability and observability), and
possibly integrability
– managers: ease of planning and tracking, confidence assessment,
etc.
Some Quality Attributes
• Correctness (a.k.a. validity): extent to which specifications are satisfied and
user’s goals fulfilled
• Reliability: extent to which repeated correct behavior is obtained
• Robustness: extent to which correct behavior is obtained in any context
• Responsiveness: extent to which timing requirements for responses are
satisfied
• Efficiency: extent to which use of computing resources is acceptable (e.g.,
memory leaks, complexity of algorithms wrt time and space)
• Integrity (a.k.a. security): extent to which invalid access is prevented
• Usability: extent to which learning and interpretative times are minimized
• Maintainability: extent to which correction times can be minimized
• Testability: extent to which quality can be assessed
• Flexibility: extent to which it is easy to modify the system
• Portability: extent to which the system can be ported to another
environment
• Reusability: extent to which a subset of the system can be reused
• Interoperability: extent to which the system can interact with others
Before Going Any Further
• One must keep in mind that testing is an endless
task…
– Prioritization with respect to stakeholders’ goals and quality
attributes is crucial
• If models, code and tests are to be traced back to
stakeholders’ goals and quality attributes, then
some satisfaction criterion must be associated
with each of these goals and attributes!!!
– Prioritization is not enough: we must also have some approach
that allows us to verify that the goals of stakeholders are met.
– It’s one thing to measure, it’s another to know what an actual
measurement tells us or does not tell us about quality…
» beware of metrics defined in a vacuum...
Typical Metrics
• For reliability: serious failure rate per year
• For maintainability: % of inserted faults detected
• For responsiveness: % of responses that satisfy
the constraints
• For capacity: call capacity (actual vs. targeted)
– definition: maximum load that can be processed while all
performance parameters are simultaneously met
• For stress tolerance: stress capacity vs. normal
capacity
– definition: extent to which critical functions are supported
when capacity is exceeded
• For fault tolerance: % of recovered faults
– definition: extent to which the product can recover from
failures
Some Definitions
About Testing
• Recall testing is only one aspect of SQE.
• IEEE Spectrum (1992):
– an error results in a fault in the software, which can lead to a
defect in the product, which can result in a failure of the function
• Testing is specifically concerned with code:
– Failure: observation of incorrect system behavior
» failure ‘intensity’ (or density) should decrease rapidly over the
duration of each iteration
– Faults or defects: root cause of a failure
» often called a bug
» reliability and robustness may be assessed by tracking a
metric such as “number of defects per million lines of code”
» what constitutes a defect versus a set of defects is not clear!
» Defect elimination does NOT guarantee quality:
• defects/MLOC is not a customer-oriented metric
• number of hours of failure-free behavior is more customer-oriented
• Binder’s glossary is VERY impressive!
Some Definitions for Testing
– Faults and defects are detected by a successful test case
– Testing is about detecting defects
» you cannot prove the absence of faults…
– A test specifies a fault or set of faults to detect
– A test case is a specific setup-execute-report-teardown context
associated with a test:
» a test is typically associated with several test cases
» tests and test cases can be organized into test suites
– A test driver is software to run test cases
– A test manager is software to track the success of test cases
– OA & M: operations, administration and maintenance
» a phase typically at the end of release cycles
» may involve tracking a set of metrics for each quality attribute
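The setup-execute-report-teardown context and the role of a test driver can be sketched as below. The `TestCase` interface and `TestDriver` class are hypothetical names introduced for illustration; they do not come from Binder or from any testing framework.

```java
import java.util.ArrayList;
import java.util.List;

// A test case as a setup-execute-report-teardown context (hypothetical interface).
interface TestCase {
    String name();
    void setUp();
    boolean execute();   // true = pass, false = no pass
    void tearDown();
}

// A test driver: software that runs test cases.
final class TestDriver {
    // Runs every case in the suite and reports the names of those that did not pass.
    static List<String> run(List<TestCase> suite) {
        List<String> notPassed = new ArrayList<>();
        for (TestCase tc : suite) {
            tc.setUp();
            try {
                if (!tc.execute()) {
                    notPassed.add(tc.name());   // report: no pass
                }
            } catch (RuntimeException e) {
                notPassed.add(tc.name());       // report: an exception counts as no pass
            } finally {
                tc.tearDown();                  // teardown runs even after a failure
            }
        }
        return notPassed;
    }
}
```

A test manager, in the sense used above, would sit on top of such a driver and keep the returned lists over time.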
Validation & Verification
• From Probert:
– an activity is said to be a validation activity if it involves the
construction of part of the binary relation conforms <actual
behaviors, worthwhile behaviors>
– validation: are we building the right things?
» we are establishing whether the actual behaviors we
observe correspond to the ones deemed to be of value wrt
goals/reqs/specifications for what is under test
» e.g., black-box behavior of the system, or of a procedure
– an activity is said to be a verification activity if it involves
checking the subset relation <instances of invalid constructs,
component constructs> [component verification] OR <actual
transform sequence, legal sequences of specified transform
rules> [transform verification]
– verification: are we building things right?
» Includes component verification (e.g., model-checking)
and transform verification (e.g., traceability between
scenarios and FSMs)
Certification
• From Probert:
– Formal: A certification activity decides (gives a yes/no answer
to) whether an actual measure is at least as great as a preset
metric. Actual measure is denoted the certification measure,
preset metric is denoted the certification objective.
– Informal: A certification activity measures the completeness of
a quality assurance activity or program against stated
requirements.
– Example: extent of code coverage achieved by the execution of
a specific test suite.
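The formal definition reduces to a yes/no comparison, which the following sketch illustrates with the code-coverage example. The class and method names are hypothetical, and the use of statement counts as the coverage measure is an assumption; the source does not say which kind of coverage is meant.

```java
// Hypothetical names; a sketch of certification as a yes/no decision.
final class Certification {
    // Yes/no answer: is the certification measure at least the certification objective?
    static boolean certify(double certificationMeasure, double certificationObjective) {
        return certificationMeasure >= certificationObjective;
    }

    // Example measure (assumed here to be statement coverage, in percent).
    static double coveragePercent(int coveredStatements, int totalStatements) {
        if (totalStatements <= 0) {
            throw new IllegalArgumentException("totalStatements must be positive");
        }
        return 100.0 * coveredStatements / totalStatements;
    }
}
```

With an objective of 80%, a suite covering 85 of 100 statements is certified, while one covering 3 of 4 statements (75%) is not.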
A Certification Process
Probert’s SQE process:
1. set quality objectives
2. define measurable quality (product & process) metrics
3. identify process certification points
4. apply metrics at each point
5. review exit criterion for the current iteration
6. assess quality objectives and decide on continuous
improvement activities
What’s downplayed?
– Step 2… It’s easier said than done… For example, we would
like to be sure that one iteration is traceable to the previous
one in order to measure the convergence entailed by an
incremental process… But how do we do this?
An Overview of
Software Testing
(Binder chapter 3)
What is Software Testing?
• Binder (p.41): it is the design of a special kind of
s/w system that is:
– fault-directed: target system is to exercise another s/w system
with the intent of finding bugs
and/or
– conformance-directed: target system is to demonstrate
satisfaction of stakeholders’ goals
• Combinational Logic (ch. 6) and FSMs (ch. 7)
provide general test models for which systematic
approaches exist for the generation of test suites.
• UML(-RT) provides a modeling language for
application-specific capturing. We will also
consider UCMs.
– Binder has interesting stuff to say about UML in ch.8!
• Figure 3.1 p.43 gives the overall strategy.
The Steps of Test Design
For Binder, p. 41, test design involves:
– identifying, modeling and analyzing the responsibilities
of the system under test
– designing tests based on these models
– deriving test cases from these tests
» responsibility-based: expected behavior from models
» implementation-based: expected behavior from code
– adding test cases based on code analysis, suspicions
and heuristics
– developing expected results or some other pass/fail
criterion for each test case
Test design aims for interesting test cases, that is, ones that
have a good chance of revealing a failure.
The Steps of Test Execution
According to Binder, p. 43:
• Establish that the implementation under test is minimally
operational by exercising the interfaces between its parts.
• Execute the test suite: the result of each test case is
evaluated as pass or no pass
• Use a coverage tool to instrument the implementation under
test. Rerun the test suite and evaluate the reported
coverage.
• If necessary, develop additional tests to exercise uncovered
code.
• Stop testing when the coverage goal is met and all test
cases pass.
Two issues:
– we need to categorize the expected results and failures
– we need to study code coverage later: lines, branches?
Classifying Failures
As one executes test cases, one may stumble on,
figure 3.3, p. 49:
• a bug (i.e., a fault)
• an omission: the absence of some required functionality
• a surprise: the execution of behavior that is not required
– e.g., reusing an inherited capability when one should not
Fault-based testing is usually a luxury:
• purposely introduce faults in code (called mutations) to see
if these faults are revealed by the test suite...
Debugging is not part of testing:
• debugging is about finding the cause of a failure...
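The idea of seeding a mutation and checking whether the test suite reveals it can be sketched as follows. Everything here is hypothetical: `max` stands in for the code under test, `maxMutant` is a hand-written mutant (a real mutation tool would generate such variants automatically), and the two suites are illustrative.

```java
import java.util.function.IntBinaryOperator;

// Hypothetical example of fault-based (mutation) testing.
final class MutationDemo {
    // Code under test: the larger of two ints.
    static int max(int a, int b) {
        return a > b ? a : b;
    }

    // Mutant: the relational operator '>' is replaced by '<' (a seeded fault).
    static int maxMutant(int a, int b) {
        return a < b ? a : b;
    }

    // A suite that exercises both orderings; true means all test cases pass.
    static boolean suitePasses(IntBinaryOperator impl) {
        return impl.applyAsInt(1, 2) == 2
            && impl.applyAsInt(5, 3) == 5
            && impl.applyAsInt(4, 4) == 4;
    }

    // A weak suite that only tries equal values; true means all test cases pass.
    static boolean weakSuitePasses(IntBinaryOperator impl) {
        return impl.applyAsInt(4, 4) == 4;
    }
}
```

`MutationDemo.suitePasses(MutationDemo::maxMutant)` is false, so the first suite reveals the seeded fault; the weak suite passes on the mutant and therefore says little about its own fault-revealing power.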
What Can be Achieved?
• Limiting factors, p. 54:
– The size of the input/state space
– The number of possible execution sequences (or paths):
» loops, conditions, and dynamic binding contribute to the
combinatorial explosion of the number of possible paths
– Fault sensitivity:
» does the test suite hide faults?
– Coincidental correctness:
» faulty code can still produce on occasion correct behavior
– Absolute limitations:
» exhaustive testing is intractable
» spurious tests may be produced if reqs and goals are incorrect
» the test cases themselves can be incorrect
» without trusted expected results to compare with actual results,
pass/no pass evaluation is dubious.
About Execution Sequences
(from Binder section 3.3.2)
Consider:
for (int i = 0; i < n; ++i) {
    if (a.get(i) == b.get(i))
        x[i] = x[i] + 100;
    else
        x[i] = x[i] - 2;
}
How many paths if n = 2?
[Figure: flow graph with nodes "Loop Header", "Condition", "+ 100 line" and "- 2 line"]
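One way to answer the question is to enumerate the branch sequences. The sketch below assumes the loop body runs exactly n times, so each iteration contributes one of two branches and the count is 2^n; if paths with fewer iterations are also counted, the total is larger. The class `PathEnumerator` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: enumerates branch sequences for exactly n loop iterations.
final class PathEnumerator {
    // 'T' marks the "+ 100" branch, 'F' marks the "- 2" branch.
    static List<String> paths(int n) {
        List<String> result = new ArrayList<>();
        build(n, "", result);
        return result;
    }

    // Extends the current prefix with each possible branch until n choices are made.
    private static void build(int remaining, String prefix, List<String> out) {
        if (remaining == 0) {
            out.add(prefix);
            return;
        }
        build(remaining - 1, prefix + "T", out);
        build(remaining - 1, prefix + "F", out);
    }
}
```

Under that assumption, `PathEnumerator.paths(2)` yields the four sequences TT, TF, FT and FF, and each extra iteration doubles the count.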
About Coincidental Correctness
(from Binder section 3.3.3)
• x + x and x * x both work for x = 2…
• consider
int scale (int j) {
    j = j - 1; // should be j = j + 1;
    j = j / 30000;
    return j;
}
According to Binder:
For j = -30001, -30000, -1, 0, 29999, and 30000 the answer is wrong!
That is 99.9908% of the input space works!
Amusingly enough, Binder is wrong!!!
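The slide does not say where the count goes wrong. One possible explanation, offered here as an assumption, is the rounding mode of integer division. The sketch below compares the faulty and intended versions over a 16-bit input range (inferred from the 99.9908% figure, which corresponds to 6 wrong answers out of 65536) under both truncating division, as in Java's `/`, and flooring division, as in `Math.floorDiv`. The class `ScaleCheck` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntBinaryOperator;

// Hypothetical checker for the scale() example.
final class ScaleCheck {
    // Inputs j in the 16-bit range for which the faulty (j - 1) and the
    // intended (j + 1) versions disagree, under the supplied integer division.
    static List<Integer> mismatches(IntBinaryOperator div) {
        List<Integer> bad = new ArrayList<>();
        for (int j = Short.MIN_VALUE; j <= Short.MAX_VALUE; j++) {
            int faulty = div.applyAsInt(j - 1, 30000);
            int intended = div.applyAsInt(j + 1, 30000);
            if (faulty != intended) {
                bad.add(j);
            }
        }
        return bad;
    }

    // Java's '/' truncates toward zero.
    static List<Integer> withTruncation() {
        return mismatches((x, y) -> x / y);
    }

    // Math.floorDiv rounds toward negative infinity.
    static List<Integer> withFloor() {
        return mismatches((x, y) -> Math.floorDiv(x, y));
    }
}
```

With flooring division the mismatches are exactly the six values quoted above (-30001, -30000, -1, 0, 29999, 30000). With Java's truncating division only four inputs give a wrong answer: -30000, -29999, 29999 and 30000.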
About Coincidental Correctness
• a subclass may override a method and in doing so
introduce a fault in a method it inherits.
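public class Account extends Object {
    protected Date lastTxDate, today;   // Date: assumed to be the example's own class with a day() method
    // …
    int quartersSinceLastTx() {
        return (90 / daysSinceLastTx());
    }
    int daysSinceLastTx() {
        return (today.day() - lastTxDate.day() + 1);
    }
}
public class TimeDepositAccount extends Account {
    // the override drops the "+ 1": it can return 0, so the
    // inherited quartersSinceLastTx() may divide by zero
    int daysSinceLastTx() {
        return (today.day() - lastTxDate.day());
    }
}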
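A runnable simplification of this example is sketched below. It assumes that the base class adds 1 to the day difference and that the subclass override omits it, and it replaces the Date objects with plain int day numbers; both are simplifications made for illustration, as are the constructors.

```java
// Simplified sketch: int day numbers stand in for the example's Date objects.
class Account {
    protected int lastTxDay, today;

    Account(int lastTxDay, int today) {
        this.lastTxDay = lastTxDay;
        this.today = today;
    }

    // Inherited unchanged by subclasses; relies on daysSinceLastTx() being non-zero.
    int quartersSinceLastTx() {
        return 90 / daysSinceLastTx();
    }

    // Returns at least 1 whenever today >= lastTxDay.
    int daysSinceLastTx() {
        return today - lastTxDay + 1;
    }
}

class TimeDepositAccount extends Account {
    TimeDepositAccount(int lastTxDay, int today) {
        super(lastTxDay, today);
    }

    // Override without the "+ 1": returns 0 for a same-day transaction.
    @Override
    int daysSinceLastTx() {
        return today - lastTxDay;
    }
}
```

`new TimeDepositAccount(10, 40).quartersSinceLastTx()` returns 3, so most inputs look fine, but `new TimeDepositAccount(10, 10).quartersSinceLastTx()` throws an ArithmeticException in a method the subclass never touched.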
Bugs that Testing Can Catch
• Figure 3.6 p.60: Some faults (bugs) and when they can be
addressed.
Fault Models
(Binder chapter 4)
Fault Model
• Any rational testing strategy is guided by a fault model
• Answers the question: why do the features called out by a
technique warrant our effort?
– Common sense
– Experience
– Suspicion
– Analysis
– Experiment
• Identifies relationships and components of the system under
test that are most likely to have faults.
• Software testing strategies are effective to the extent that their
fault model is a good predictor of faults.
Fault Models
• Conformance-directed testing
– Conformance to requirements or specifications
– Relies on nonspecific fault model
– Establish a test suite that is sufficiently representative of the
requirements of the system
– Should be fault sufficient (exercise specified features)
• Fault-directed testing
– Seeks to reveal implementation faults
– A specific fault model is required to direct potentially large probing
of the implementation
– Should be fault efficient (high probability of revealing a fault)
Bug Hazards of OOP
• Newly written code is 48.8 times more likely to have a
bug than verbatim reused code [Basili+96a]
– Verbatim reused code, 0.125 faults per KLOC (thousand lines of
code)
– Code slightly modified, 1.500 faults per KLOC
– Code extensively modified, 4.89 faults per KLOC
– Not reused, newly written code, 6.11 faults per KLOC
• On average, a bug is found in every 150 lines of code
[Fiedler 89]
• Classes
– That send more messages to instance variables and message
parameter objects are more likely to be buggy
– That have more superclasses and higher specialization are more
likely to be buggy
Bug Hazards of OOP
• Encapsulation
– Obstacle for testing
• Inheritance
– Weakens encapsulation, creating global data problems
– Overloading, reuse, specialization
– Incorrect Initialization and Forgotten Methods
– Inheritance structure
– Multiple Inheritance
– Abstract classes, Interfaces.
Bug Hazards of OOP
• Polymorphism
– Dynamically bound messages are hard to understand and error-prone
– Can’t change a polymorphic server without regard to its clients
– Code is deceptively simple, but complex
– Can produce strange results if the class hierarchy is not well defined
– Messages can be bound to the wrong server
• Dynamic Binding
– Many classes may use the same method name, which creates bugs
– Methods are typically small
• Message Sequence and State
– Cooperative control bugs
– Delocalization bugs
Bug Lists/Errors and Failures
• Errors and Failures: Binder, p.87
• Method Scope Fault Taxonomy: Binder, p.88-89
• Class Scope Fault Taxonomy: Binder, p.90-91
• Cluster/Subsystem Scope Fault Taxonomy: Binder,
p.92
An OO Testing Manifesto
• Binder, p.103-107
About Test Models
(Binder chapter 5)
Model-Based Testing
• Test models must ideally support:
– the systematic enumeration of input and state combinations
– automated, systematic and repeatable generation of tests
• But Beizer reduces models to intuition joggers:
– “it does not matter that they are imperfect as long as the resulting
tests are good…”
– even checklists are considered by some to be test models...
• If testing is to proceed from models, then models must be
validated and verified:
– validation: tracing back to stakeholders’ goals
– verification:
» intra-model: syntax and semantics are ok (wrt a meta-model)
» inter-model: a model is consistent with the others of this iteration
• inconsistency allows the derivation of a statement AND of its negation
» inter-iteration: a model is traceable to its previous version
• Verification also requires tracing back the code to the models of
its iteration and to its previous version.
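The first requirement, systematic enumeration of input and state combinations, can be illustrated with a plain cross product. This is a deliberately small sketch: the class name, the string representation of states and inputs, and the example values in the note below are all hypothetical, and a real model would also attach an expected result to each combination.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: enumerates every (state, input) combination.
final class CombinationEnumerator {
    // Returns all pairs in a fixed order, so that generation is repeatable.
    static List<String> enumerate(List<String> states, List<String> inputs) {
        List<String> combos = new ArrayList<>();
        for (String state : states) {
            for (String input : inputs) {
                combos.add(state + " / " + input);
            }
        }
        return combos;
    }
}
```

For two states (say "Open" and "Closed") and three inputs ("deposit", "withdraw", "close"), this produces six combinations, each a candidate test. The count grows multiplicatively, which is why abstraction in the model matters.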
Binder’s Model-Based Testing
Consider Figure 5.1 p.115:
• A meta-model is the definition of a modeling technique:
symbols used in its notation, rules for using these symbols,
concepts associated with the symbols, and composition of
symbols.
• Consistency checking (i.e., model verification) requires a
meta-model and a traceability model. It is not part of
Binder’s concerns.
• Nor is model validation, which also requires a traceability
model.
• Binder’s definition of verification is restricted to code and
is not addressed...
– “Verification attempts to show that implementation is correct with
respect to its representation, without executing it. This effort may be
either informal (using a checklist) or formal (constructing a proof).”
About Cartoon-Based Testing
• Binder (p.116):
– “Most OOA/D methodologies provide a loose graphical syntax
and symbol set. This is accompanied by minimal guidance for
impressionistic rendering of behavior and structure that
happen to come to the designer’s attention. These are
cartoons: they do not demand complete information,
consistent usage […] Cartoons are useful for sketching,
refining and documenting solutions, but they are not test-ready:
they lack content and consistency necessary to produce
executable test cases.”
– Most OOA/D methods and models are ambiguous, fragmentary,
and incomplete: “no explicit definition exists for the necessary
components of a well-formed behavior model”.
– “CASE tools contribute to this problem. Nearly all CASE
implementations of methodologies are incorrect, distorted, and
incomplete.”
Requirements for a testable model
From Binder p.117:
• It is a complete and accurate reflection of the kind of
implementations to be tested. The model must represent all
features to be exercised.
• It abstracts details that would make the cost of testing prohibitive.
• It preserves detail that is essential for revealing faults and
demonstrating conformance.
• It represents all events (of a state model) so that we can generate
these events, typically as messages sent to the IUT.
• It represents all actions (of the state model) so that we can
determine whether a required action has been produced.
• It represents state so that we have an executable means to
determine what state has (or has not) been achieved.
From this perspective, Barber argues for the
usefulness of formal specifications!
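The last three requirements (explicit events, observable actions, and an executable way to determine the state) can be sketched with a two-state model. `DoorModel`, its states, events and action names are all hypothetical and chosen only to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical two-state model; all names are illustrative.
final class DoorModel {
    enum State { OPEN, CLOSED }
    enum Event { OPEN_DOOR, CLOSE_DOOR }

    private State state = State.CLOSED;
    private final List<String> actions = new ArrayList<>();

    // Events are explicit, so a test can generate them as messages.
    void handle(Event e) {
        if (state == State.CLOSED && e == Event.OPEN_DOOR) {
            state = State.OPEN;
            actions.add("lightOn");
        } else if (state == State.OPEN && e == Event.CLOSE_DOOR) {
            state = State.CLOSED;
            actions.add("lightOff");
        }
        // Any other event in the current state is ignored.
    }

    // Executable means of determining which state has been reached.
    State currentState() {
        return state;
    }

    // Actions are recorded so a test can check that a required action was produced.
    List<String> producedActions() {
        return new ArrayList<>(actions);
    }
}
```

A test can then send OPEN_DOOR, check that `currentState()` is OPEN and that `producedActions()` contains "lightOn", without inspecting any private field.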