
Improving Cluster Selection
Techniques of Regression Testing
by Slice Filtering
Yongwei Duan, Zhenyu Chen, Zhihong Zhao, Ju Qian and
Zhongjun Yang
Software Institute, Nanjing University, Nanjing, China
http://software.nju.edu.cn/zychen
1
Outline
• Introduction
• Our Approach
• Experiment and Evaluation
• Future Work
2
Introduction
• Test selection techniques
• Cluster selection techniques
• Problems
3
Test selection techniques
• Rerunning all of the existing test cases is costly in regression testing
• Test selection techniques: choose a subset of test cases to rerun
4
Cluster Selection
Run test cases → Collect execution profiles (basic-block level) → Clustering → Clusters of test cases → Sampling → A reduced test suite

Cluster selection overview
5
Problems
• Too much data to cluster
– A huge number of execution traces
– Often very high-dimensional
Just focus on the code fragments that are actually relevant to the program modification!
6
Our approach
• Overview
• Slice filtering
• Clustering analysis
• Sampling
7
Our approach
• Overview
Running test cases → Execution traces → Trace filtering → Filtered traces → Cluster analysis → Clusters → Sampling → Reduced test suite
8
Slice filtering
• The execution traces are too detailed to be used directly in clustering analysis
• We use program slicing to filter out fragments that are irrelevant to the program modification.
9
Slice filtering cont’d
• Statement 2 is changed from ‘if(m<n)’ to ‘if(m<=n)’
• We compute a program slice with respect to statement 2 and intersect it with each execution trace.
• Given 3 test cases, we compare their execution traces and filtered execution traces.
10
Slice filtering cont’d
Test case | Input (m, n) | Execution trace (statement no.)  | Statement no. after filtering
t1        | 1, 0         | 1,2,4,5,6,7,8,9,10,11,12,13,14   | 2,4,5,6,7,8
t2        | -1, 0        | 1,2,3,5,6,7,8,9,10,11,12,13,14   | 2,3,5,6,7,8
t3        | -1, 1        | 1,2,3,5,6,7,8,9                  | 2,3,5,6,7,8
• Execution traces are much smaller after program slice filtering.
• After filtering, the traces of t2 and t3 are identical, while the difference between t1 and t2 is magnified.
• To condense the traces further, adjacent statements within a basic block are combined into one statement.
• Patterns are easier to reveal with simple execution traces.
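The filtering step on this slide can be sketched as a simple intersection of each trace with the statements in the slice. The slice set below is a hypothetical reconstruction chosen to be consistent with the table above, not taken from the paper.

```python
# Sketch of the slice-filtering step: intersect each execution trace with the
# set of statements in the slice of the modified statement.

def filter_trace(trace, slice_stmts):
    """Keep only the trace entries that appear in the program slice."""
    return [s for s in trace if s in slice_stmts]

# Hypothetical slice of statement 2 (consistent with the table above).
SLICE = {2, 3, 4, 5, 6, 7, 8}

traces = {
    "t1": [1, 2, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14],
    "t2": [1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14],
    "t3": [1, 2, 3, 5, 6, 7, 8, 9],
}

filtered = {name: filter_trace(t, SLICE) for name, t in traces.items()}
# t1 -> [2, 4, 5, 6, 7, 8]; t2 and t3 both -> [2, 3, 5, 6, 7, 8]
```

With this slice set, t2 and t3 collapse onto the same filtered trace, as the slide describes.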
11
Slice filtering cont’d
• But the number of test cases is still large.
• If a trace is too small (below a threshold) after intersection with the program slice, it is unlikely to be a fault-revealing test case, so we remove it from the test suite.
12
Slice filtering cont’d
• Filtering rate
– We define the filtering rate FR as follows: if the threshold is M and the size of the program slice is N, then FR = M / N * 100%.
– When FR gets lower, the effect of filtering diminishes, i.e., fewer test cases can be eliminated.
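A minimal sketch of the threshold rule above: with filtering rate FR and slice size N, a test case is dropped when its filtered trace has fewer than M = FR * N statements. The function and data names here are illustrative, not from the paper.

```python
# Sketch of the trace-size threshold rule: drop test cases whose filtered
# trace falls below M = FR * N statements.

def prune_suite(filtered_traces, slice_size, fr):
    """Return only the test cases whose filtered trace meets the threshold."""
    threshold = fr * slice_size
    return {name: trace for name, trace in filtered_traces.items()
            if len(trace) >= threshold}

# Example: slice of size 10 and FR = 0.3 give a threshold of 3 statements.
kept = prune_suite({"t1": [2, 4, 5], "t2": [2]}, slice_size=10, fr=0.3)
# t1 is kept (3 >= 3); t2 is dropped (1 < 3).
```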
13
Slice filtering cont’d
• Why not just use dynamic slicing?
– Computing dynamic slices is complex and time-consuming
– Effective dynamic slicing tools are hard to come by
14
Clustering analysis
• Distance measure
– For a filtered trace f_i = <a_i1, a_i2, …, a_im>, where a_ik is the execution count of a basic block, the distance between two filtered traces f_i and f_j is:

D(f_i, f_j) = Σ_{k=1}^{m} (a_ik − a_jk)²
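The distance measure above (the squared Euclidean distance between vectors of basic-block execution counts) is straightforward to compute; this is a sketch, with illustrative names.

```python
# Sketch of the slide's distance measure over filtered traces represented as
# equal-length vectors of basic-block execution counts.

def distance(fi, fj):
    """D(f_i, f_j) = sum over k of (a_ik - a_jk)^2."""
    return sum((a - b) ** 2 for a, b in zip(fi, fj))
```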
15
Sampling
• We use adaptive sampling in our approach
– We first sample a certain number of test cases from each cluster. If a sampled test case is fault-revealing, the entire cluster from which it was sampled is selected. This strategy favors small clusters and has a high probability of selecting fault-revealing test cases.
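The strategy above can be sketched as follows. This is a minimal interpretation of the slide, assuming the initial sample is kept even when it reveals no fault; `is_fault_revealing` stands in for a hypothetical test oracle.

```python
import random

# Sketch of adaptive sampling: draw an initial sample from each cluster;
# if any sampled test case is fault-revealing, select the whole cluster.

def adaptive_sample(clusters, is_fault_revealing, n_initial=1, seed=0):
    rng = random.Random(seed)
    selected = []
    for cluster in clusters:
        sample = rng.sample(cluster, min(n_initial, len(cluster)))
        if any(is_fault_revealing(t) for t in sample):
            selected.extend(cluster)   # take the whole (often small) cluster
        else:
            selected.extend(sample)    # keep only the initial sample
    return selected
```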
16
Experiment & Evaluation
• Subject program
– space, from SIR (Software-artifact Infrastructure Repository)
– 5902 LOC
– 1533 basic blocks
– 38 modified versions (each version is augmented with a real fault)
– 13585 test cases
17
Experiment & Evaluation
• Subject program
• Measurements
• Experimental results
• Observations
18
Experiment & Evaluation
• 3 measurements
– Precision
– Reduction
– Recall
19
Experiment & Evaluation
• Precision
– If in a certain run the technique selects a subset of N test cases, of which M are fault-revealing, then the precision of the technique is M / N * 100%.
– Precision measures the extent to which a selection method omits non-fault-revealing test cases in a run
20
Experiment & Evaluation
• Reduction
– If a selection technique selects M test cases out of all N existing test cases in a certain run, then the reduction of the technique is M / N * 100%.
– Reduction measures the extent to which a technique can reduce the size of the original test suite.
– A low reduction means a selection technique greatly reduces the original test suite.
21
Experiment & Evaluation
• Recall
– If a selection technique selects M fault-revealing test cases out of the N existing fault-revealing test cases in a certain run, then the recall of the technique is M / N * 100%.
– Recall measures the extent to which a selection technique can include fault-revealing test cases.
– Recall indicates the fault-detecting capability of a technique. A safe selection technique achieves 100% recall.
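The three measurements defined on the preceding slides can be sketched for a single run as set operations; all names here are illustrative.

```python
# Sketch of the three measurements, with test cases modeled as set elements:
# `selected` is the reduced suite, `suite` the full suite, `faults` the
# fault-revealing test cases.

def precision(selected, faults):
    """Percentage of selected test cases that are fault-revealing."""
    return 100.0 * len(selected & faults) / len(selected)

def reduction(selected, suite):
    """Percentage of the original suite that is selected (lower is better)."""
    return 100.0 * len(selected) / len(suite)

def recall(selected, faults):
    """Percentage of fault-revealing test cases that are selected."""
    return 100.0 * len(selected & faults) / len(faults)
```

A safe technique is one where `recall` returns 100.0.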
22
Experiment & Evaluation
• Experimental results
– A comparison between our approach and Dejavu. Dejavu is known as an effective algorithm for the high precision of its test selection.
– A comparison between two different filtering rates: FR = 0.3 and FR = 0.5
23
Experiment & Evaluation
Comparison of precision between our approach when FR=0.3 and Dejavu
24
Experiment & Evaluation
Comparison of reduction between our approach when FR=0.3 and Dejavu
25
Experiment & Evaluation
We achieve a certain improvement except on versions 13, 25, 26, 35, 37, and 38.
Comparison of recall between our approach when FR=0.3 and Dejavu
26
Experiment & Evaluation
• Analysis
– The key to our approach is to isolate the fault-revealing test cases into small clusters
– Failures detected on versions 13, 25, 26, 35, 37, and 38 are mostly memory access violation failures. Those failures cause premature termination of the execution flows.
– Program slicing cannot predict runtime execution flow changes, and therefore cannot provide enough information to differentiate these test cases and separate them into different clusters.
27
Experiment & Evaluation
Comparison of precision between FR=0.3 and FR=0.5
28
Experiment & Evaluation
Comparison of reduction between FR=0.3 and FR=0.5
29
Experiment & Evaluation
If we raise FR to 0.5, a certain improvement in precision, reduction, and recall can be achieved
Comparison of recall between FR=0.3 and FR=0.5
30
Experiment & Evaluation
• Observations
– For most versions, our approach has higher precision and lower reduction (lower is better) than Dejavu. This means that we can select fault-revealing test cases from the original test suite while selecting relatively few non-fault-revealing test cases
31
Experiment & Evaluation
• Observations
– The effectiveness of our approach depends largely on the level of isolation of fault-revealing test cases. By choosing appropriate parameters such as the filtering rate, sampling rate, initial cluster number, etc., we can enhance the level of isolation.
32
Future work
• We will try to answer the following questions in our future work
– How do distance metrics and clustering algorithms affect the results of cluster selection techniques?
– Given a program, how do we find the best filtering rate and other parameters?
33
Q&A
34