Experimental Evaluation of SDL
and One-Op Mutation for C
Marcio E. Delamaro, Lin Deng, Vinicius H. S.
Durelli, Nan Li, and Jeff Offutt
Universidade de São Paulo & George Mason University
Brazil
&
USA
www.cs.gmu.edu/~offutt/
Mutation Analysis
Mutation Analysis—A Generic View
We perform mutation analysis when we …
… use well defined rules
… defined on syntactic descriptions
… to make systematic changes
… to the syntax or to objects developed from the syntax
Mutation Analysis—For Programs
… use mutation operators
… defined on language syntax
… to create mutated versions
… of programs or program components
ICST 2014
© Delamaro, Deng, Durelli, Li, & Offutt
Mutation Testing
We use mutation analysis for testing to :
1) Help testers design high quality tests
or …
2) Evaluate the quality of existing tests
Why Mutation Works
Fundamental Premise of Mutation Testing
If the software contains a fault, there will
usually be a set of mutants that can only be
killed by a test case that also detects that fault
• This is not an absolute !
• The mutants guide the tester to an effective set of tests
• A very challenging problem :
– Find a fault and a set of mutation-adequate tests that do not find
the fault
• Of course, this depends on the mutation operators …
Benefits & Cost
• Numerous studies have found mutation-adequate tests to
be very effective at finding faults in programs
• These studies have also shown mutation is relatively costly
1. Lots of test requirements (mutants)
2. Lots of tests are needed
3. Lots of equivalent mutants to detect
• Three general strategies for reducing costs
1. Do-fewer : Fewer mutants
2. Do-smarter : Techniques to reduce the amount of work
3. Do-faster : Algorithms and tools to speed up process
Do-Fewer Mutants
[Figure: the full set of mutants M1 to M15, with refined mutants M13' and M14', and four ways of using fewer of them]
• Sampling : Use X% of the mutants
• Selective : Use certain mutation operators
• Refining : Redefine mutation operators to create fewer mutants
• SDL : Use one operator, delete each statement (Untch, ACM-SE 2009; Deng et al., ICST 2013)
Two Studies
1. Evaluate statement deletion (SDL) in C
– Compare with previous results in Java
2. Generalize statement deletion to one-op mutation
– Try each operator by itself
Statement Deletion in C
• Systematically remove each statement as well as all inner
statements
• Ensures that each statement affects behavior
– This is far more than just statement coverage
void test ()
{
    int a, b, c, t, i;
    if (a == 0)
    {
        b = 3;
    }
    for (i = 0; i < 5; i++)
        t = t + b + c;
}
[Figure: labels M1 to M6 mark the SDL mutants of the function above]
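A minimal sketch of what SDL mutants look like for a function of this shape. The function below is a hypothetical variant of the slide's test(), not the authors' subject code: a and c are turned into parameters and b and t are initialized so that results are deterministic. Only two of the possible deletions are shown, and the names original, mutant_no_assign, mutant_no_body, kills, and sdl_demo are illustrative. The input (a = 0, c = -3) executes every statement yet leaves the loop-body deletion alive, which illustrates why killing SDL mutants asks for more than statement coverage.

```c
#include <assert.h>

/* Hypothetical variant of the slide's test(): a and c become parameters and
   b and t get initial values, so every run is deterministic. */
int original(int a, int c)
{
    int b = 1, t = 0, i;
    if (a == 0)
    {
        b = 3;
    }
    for (i = 0; i < 5; i++)
        t = t + b + c;
    return t;
}

/* SDL mutant: the statement "b = 3;" is deleted. */
int mutant_no_assign(int a, int c)
{
    int b = 1, t = 0, i;
    if (a == 0)
    {
        ;                       /* deleted: b = 3; */
    }
    for (i = 0; i < 5; i++)
        t = t + b + c;
    return t;
}

/* SDL mutant: the loop body "t = t + b + c;" is deleted. */
int mutant_no_body(int a, int c)
{
    int b = 1, t = 0, i;
    if (a == 0)
    {
        b = 3;
    }
    for (i = 0; i < 5; i++)
        ;                       /* deleted: t = t + b + c; */
    (void)b;                    /* b and c are unused after the deletion */
    (void)c;
    return t;
}

/* A test input kills a mutant when the mutant's result differs from the original's. */
int kills(int (*mutant)(int, int), int a, int c)
{
    return original(a, c) != mutant(a, c);
}

/* Checks the claims made above; aborts through assert on a mismatch. */
void sdl_demo(void)
{
    assert(kills(mutant_no_assign, 0, 0));   /* 15 vs 5 */
    assert(kills(mutant_no_body, 0, 0));     /* 15 vs 0 */
    assert(!kills(mutant_no_body, 0, -3));   /* 0 vs 0: executed but not killed */
}
```

Calling sdl_demo() from a main function runs the three checks. The input (a = 1, c = 0) skips the assignment entirely, so it cannot kill mutant_no_assign either.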
Advantages of SDL
1. Few mutants : O(LOC)
2. Every program generates SDL mutants
3. Relatively few equivalent mutants
Evaluating 1-Op Mutation
• Tool : Proteum—75 mutation operators
• Subjects (P) : 39 C programs
• Mutants (MS) : All mutants for all programs (63,275
mutants)
• Test Universes (TS) : Created a “universe” of tests for each
subject—each test set killed all mutants for its program
• Equivalent Mutants (ES) : Identified by hand while creating
tests (7518 equivalent mutants)
• Test Subsets (TSop) : For each program, found a subset of
tests that killed all mutants from each mutation operator
– 10 sets of tests for each operator, averaged
• Experimental procedure …
Experimental Procedure
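[Figure: experimental procedure]
1. For each program P, generate all mutants M and add tests until MS = 100, giving the test universe T
2. For each operator op1 to op75, take its mutants Mop and build 10 operator-adequate test sets Top from T
3. Run each Top against all mutants M to get the MS of the tests for each op on all mutants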
SDL Results
                Mutants    Equivalent       Tests
SSDL mutants    2062       176 (8.54%)      203
All mutants     63,275     7518 (11.88%)    815

• Total mutants killed by SSDL tests : 96%
• Mean of all mutants killed over the 39 programs : 92%
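The percentages in these results can be reproduced with a small helper. The sketch below uses the usual definition of mutation score (killed mutants divided by non-equivalent mutants). That the 96% and 92% figures were computed exactly this way is an assumption, and the function names are illustrative.

```c
#include <assert.h>

/* Mutation score as a percentage: killed mutants over non-equivalent mutants. */
double mutation_score(long killed, long total, long equivalent)
{
    long killable = total - equivalent;
    if (killable <= 0)
        return 0.0;   /* assumption: report 0 when no mutant can be killed */
    return 100.0 * (double)killed / (double)killable;
}

/* Share of mutants that are equivalent, as a percentage. */
double equivalent_percent(long equivalent, long total)
{
    if (total <= 0)
        return 0.0;
    return 100.0 * (double)equivalent / (double)total;
}

/* Checks the helpers against the counts reported for SSDL and for all mutants. */
void score_demo(void)
{
    double ssdl = equivalent_percent(176, 2062);    /* reported as 8.54% */
    double all = equivalent_percent(7518, 63275);   /* reported as 11.88% */
    assert(ssdl > 8.53 && ssdl < 8.54);
    assert(all > 11.88 && all < 11.89);
}
```

With the reported counts, 63,275 - 7518 = 55,757 mutants are non-equivalent, so that is the denominator mutation_score would use for the full mutant set.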
One-Op Results
• 52 Proteum operators (out of 75) generated mutants for at
least one program
• We computed the mutation score for each operator
– Averaged over 10 test sets, all drawn from “universe”
– Averaged over all 39 software subjects
                         Low       High    SSDL
Mutation score           26.00%    96%     92.0%
Relative test set size   2.04%     53%     28.8%
% of all mutants         0.04%     14%     3.8%
% equivalent             0.00%     80%     5.1%

Operators labeled on the slide next to the low and high values : CRCR, Vsrr; VDTR; OCOR, SGLR; OIPM; OAEA, OASA, OBNG, OIPM, SBRn, SCRB

How to balance effectiveness (mutation score) with cost (tests, mutants, equivalent mutants) ?
Cost vs. Effectiveness
• Evaluating mutation operators balances two things :
– Benefit : Mutation score
– Cost : Number of tests, equivalent mutants, and mutants
– Number of mutants affects machine cost, which is minor
• We first create a cost function :
Cost (op) = (%NormalizedTests (op) * Wt ) +
(%NormalizedEquivMutants (op) * We )
– Normalization function is given in the paper
– Wt and We are weights based on the relative costs of creating
tests and analyzing mutants for equivalence
Cost vs. Effectiveness (2)
• We next create a cost-effectiveness measure
• Mutation Operator Cost Effectiveness Analysis (MOCEA)
MOCEA (op) = Cost (op) / MS (op)
• Note that low values are more cost-effective
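The cost function and the MOCEA ratio can be sketched in a few lines. The normalization step is given in the paper and is not reproduced here, so the sketch takes already-normalized values as inputs. The function names, parameter names, and sample numbers are illustrative assumptions.

```c
#include <assert.h>

/* Cost(op) = NormalizedTests(op) * Wt + NormalizedEquivMutants(op) * We.
   Inputs are assumed to be normalized already. */
double cost(double norm_tests, double norm_equiv, double wt, double we)
{
    return norm_tests * wt + norm_equiv * we;
}

/* MOCEA(op) = Cost(op) / MS(op). Lower values are more cost-effective.
   The caller must pass a mutation score ms greater than 0. */
double mocea(double norm_tests, double norm_equiv, double wt, double we, double ms)
{
    return cost(norm_tests, norm_equiv, wt, we) / ms;
}

/* Returns 1 when score_a marks the more cost-effective operator. */
int more_cost_effective(double score_a, double score_b)
{
    return score_a < score_b;
}

/* Checks the two functions on made-up inputs chosen to be exact in binary. */
void mocea_demo(void)
{
    assert(cost(0.5, 0.25, 1.0, 1.0) == 0.75);
    assert(mocea(0.5, 0.25, 1.0, 1.0, 0.5) == 1.5);
    assert(mocea(0.5, 0.25, 0.0, 1.0, 0.5) == 0.5);   /* tests are free */
}
```

Setting Wt to 0 models free tests and setting We to 0 models free equivalence analysis, so a single function covers every weighting scenario.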
Cost-Effectiveness Scores
Cost-effectiveness scores with different weights

Operator   Wt = 1    Wt = .5   Wt = 0    Wt = 1    Wt = 1
           We = 1    We = 1    We = 1    We = .5   We = 0
CRCR       .73       .38       .03       .72       .70
Ccsr       .66       .35       .03       .65       .64
SSDL       .57       .31       .06       .54       .51
ORBN       .69       .44       .18       .59       .50

• Wt = 1, We = 1 : Assumes tests & detecting equivalent mutants cost the same
• Wt = .5, We = 1 : Assumes tests are cheaper
• Wt = 0, We = 1 : Assumes tests are free
• No ORBN mutants in many programs

CRCR : Required Constant Replacement
Ccsr : Constant for Scalar Replacement
SSDL : Statement Deletion
ORBN : Relational Operator by Bitwise Operator
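Because Cost (op) is a weighted sum and MS (op) does not depend on the weights, each operator's score should be linear in Wt and We. The sketch below checks this against the reported scores, taking the values for the five weight settings as .73, .38, .03, .72, .70 for CRCR, .57, .31, .06, .54, .51 for SSDL, and .69, .44, .18, .59, .50 for ORBN. The function names are illustrative, and the tolerance is an assumption that allows for the two-decimal rounding of the reported values.

```c
#include <assert.h>

/* Score predicted for weights (wt, we) from the two single-weight columns:
   tests_only is the score at Wt = 1, We = 0,
   equiv_only is the score at Wt = 0, We = 1. */
double predicted_score(double tests_only, double equiv_only, double wt, double we)
{
    return wt * tests_only + we * equiv_only;
}

/* Returns 1 when two scores agree to within the rounding of the reported values. */
int close_enough(double x, double y)
{
    double d = x - y;
    if (d < 0.0)
        d = -d;
    return d <= 0.0151;   /* assumed tolerance: three roundings of at most 0.005 */
}

/* Checks the SSDL row: .51 from tests alone and .06 from equivalents alone. */
void weights_demo(void)
{
    assert(close_enough(predicted_score(0.51, 0.06, 1.0, 1.0), 0.57));
    assert(close_enough(predicted_score(0.51, 0.06, 0.5, 1.0), 0.31));
    assert(close_enough(predicted_score(0.51, 0.06, 1.0, 0.5), 0.54));
}
```

The same check passes for the other rows, for example CRCR with tests_only = .70 and equiv_only = .03. This suggests most of each operator's cost comes from tests, not from equivalent mutants.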
Contributions & Conclusions
Generalized SDL-mutation to one-op mutation
Evaluated SDL-mutation in C
• 39 C programs
• Mutation score of SDL-adequate tests on all mutants
Evaluated one-op mutation in C
• 75 Proteum mutation operators
• Mutation scores of OP-adequate tests on all mutants
Introduced a cost-effectiveness measure
• Tests, equivalent mutants, mutation score
SDL best if generating tests and detecting
equivalent mutants cost the same
If tests are free, CRCR or Ccsr is best
Future Directions
Two-op mutation ?
Cost effectiveness formula should include
whether mutants appear in all programs
What is the true minimum number of mutants
needed to achieve near 100% mutation score ?
Contacts
Marcio Delamaro
http://www.icmc.usp.br/pessoas/delamaro/
Jeff Offutt
http://cs.gmu.edu/~offutt/