Feedback directed random test generation


Feedback-Directed Random Test Generation
Carlos Pacheco¹, Shuvendu K. Lahiri², Michael D. Ernst¹, and Thomas Ball²
¹MIT CSAIL, ²Microsoft Research
Presented by: Aliasgar Kagalwala
Outline:
1. Concept
2. Technique
3. Evaluation
4. Related work
5. Conclusion
Concept
• This paper presents a technique that improves random test
generation by incorporating feedback obtained from
executing test inputs as they are created.
• The feedback obtained from executing sequences guides
the search toward sequences that yield new and legal object
states.
• Inputs that create redundant or illegal states are never extended, which
prunes the search space.
• Each input is checked against a set of contracts and filters.
Concept…
• The work addresses random generation of unit tests for object-oriented
programs. The technique is implemented in a tool called RANDOOP.
• RANDOOP is fully automatic: it requires no input from the user other than
the name of a binary for .NET or a class directory for Java.
• RANDOOP has found serious errors in widely deployed commercial
software.
Test Case:
Test case for java.util
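import java.util.*;
import org.junit.Assert;

public class JavaUtilTest {
    public static void test1() {
        LinkedList l1 = new LinkedList();
        Object o1 = new Object();
        l1.addFirst(o1);
        TreeSet t1 = new TreeSet(l1); // spec requires ClassCastException: Object is not Comparable
        Set s1 = Collections.unmodifiableSet(t1);
        // This assertion fails
        Assert.assertTrue(s1.equals(s1));
    }
}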
• The test case shows a violation of the equals contract.
• The set s1 returned by unmodifiableSet(Set) returns false for
s1.equals(s1), which violates the reflexivity of equals as specified in Sun's API documentation.
• The test case also reveals a second error: the TreeSet(Collection) constructor fails to throw
ClassCastException as required by its specification.
An object-oriented unit test
consists of a sequence of method
calls that set up a state (such
as creating and mutating objects),
and an assertion about the result
of the final call.
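Extending sequences:
The extension operator extend(m, seqs, vals) takes:
• m, a method with formal parameters of types T1, ..., Tk;
• seqs, a list of sequences;
• vals, a list of values v1, ..., vk of types T1, ..., Tk, where each value
is either a primitive value or the return value s.i of the ith method call
of some sequence s in seqs.
The result is a new sequence: the concatenation of the sequences in seqs,
in order, followed by the call m(vals).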
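A minimal Java sketch of the extension operator, under simplifying assumptions: the classes Arg, Val, Call and Sequence are hypothetical stand-ins (not RANDOOP's real data structures), calls are recorded by name rather than executed, and the receiver of an instance method is treated as its first argument. The point it illustrates is that concatenating seqs shifts call positions, so a value s.i must be re-pointed to where call i of s lands in the new sequence.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of a sequence (not RANDOOP's real classes).

/** Argument of a call inside a sequence: a primitive literal, or the result of an earlier call. */
final class Arg {
    final int ref;          // index of the earlier call whose result is used; -1 for a literal
    final Object literal;   // primitive value, used only when ref == -1

    private Arg(int ref, Object literal) { this.ref = ref; this.literal = literal; }
    static Arg lit(Object v) { return new Arg(-1, v); }
    static Arg ref(int i) { return new Arg(i, null); }

    /** Re-points a reference after its sequence is copied to start at position offset. */
    Arg shift(int offset) { return ref < 0 ? this : ref(ref + offset); }

    @Override public String toString() { return ref < 0 ? String.valueOf(literal) : "v" + ref; }
}

/** A value passed to extend(): a primitive literal, or s.i (call i of input sequence number s). */
final class Val {
    final int seq;
    final int call;
    final Object literal;

    private Val(int seq, int call, Object literal) {
        this.seq = seq;
        this.call = call;
        this.literal = literal;
    }
    static Val lit(Object v) { return new Val(-1, -1, v); }
    static Val of(int seq, int call) { return new Val(seq, call, null); }
}

/** One method call; for instance methods the receiver is the first argument. */
final class Call {
    final String method;
    final List<Arg> args;

    Call(String method, List<Arg> args) { this.method = method; this.args = args; }

    @Override public String toString() { return method + args; }
}

final class Sequence {
    final List<Call> calls = new ArrayList<>();

    /** extend(m, seqs, vals): concatenate seqs in order, then append the call m(vals). */
    static Sequence extend(String m, List<Sequence> seqs, List<Val> vals) {
        Sequence result = new Sequence();
        int[] offsets = new int[seqs.size()];
        for (int j = 0; j < seqs.size(); j++) {
            offsets[j] = result.calls.size();   // where sequence j starts in the result
            for (Call c : seqs.get(j).calls) {
                List<Arg> shifted = new ArrayList<>();
                for (Arg a : c.args) shifted.add(a.shift(offsets[j]));
                result.calls.add(new Call(c.method, shifted));
            }
        }
        List<Arg> args = new ArrayList<>();
        for (Val v : vals) {
            // s.i becomes a reference to call i of s at its new position
            args.add(v.seq < 0 ? Arg.lit(v.literal) : Arg.ref(offsets[v.seq] + v.call));
        }
        result.calls.add(new Call(m, args));
        return result;
    }

    @Override public String toString() { return calls.toString(); }
}
```

For example, extending the one-call sequences "new LinkedList" and "new Object" with addFirst and the values s0.0 and s1.0 yields a three-call sequence whose last call is addFirst[v0, v1], the shape of l1.addFirst(o1) in the java.util test case.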
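Feedback-directed algorithm for sequence generation:
• It builds sequences incrementally.
• It has four inputs: the classes under test, the contracts, the filters,
and a time limit.
• It selects a public method of the classes under test to create a new
sequence.
• randomSeqsAndVals() builds a list of sequences and values, selected at
random from previously generated sequences.
• newSeq is the result of applying the extension operator.
• execute() executes each method call in the sequence and checks the
contracts after each call.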
List of default contracts
checked by RANDOOP.
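The table itself did not survive in this transcript. As a hedged illustration only, the sketch below checks three object contracts of the kind RANDOOP applies to each value a sequence creates: reflexivity of equals (the contract violated in the java.util test case) and that equals, hashCode and toString do not throw. The class name ObjectContracts and this exact contract list are assumptions, not RANDOOP's API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a few object contracts checked on every value a sequence creates. */
final class ObjectContracts {
    /** Returns a description of each contract that o violates; empty if none. */
    static List<String> violations(Object o) {
        List<String> failed = new ArrayList<>();
        try {
            if (!o.equals(o)) failed.add("o.equals(o) returned false");  // reflexivity of equals
        } catch (RuntimeException e) {
            failed.add("o.equals(o) threw " + e.getClass().getName());
        }
        try {
            o.hashCode();
        } catch (RuntimeException e) {
            failed.add("o.hashCode() threw " + e.getClass().getName());
        }
        try {
            o.toString();
        } catch (RuntimeException e) {
            failed.add("o.toString() threw " + e.getClass().getName());
        }
        return failed;
    }
}
```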
RANDOOP outputs the two
sets of sequences, nonErrorSeqs
and errorSeqs, as JUnit/NUnit
tests, along with assertions
representing the contracts
checked.
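To make the loop concrete, here is a deliberately small Java sketch of the feedback-directed process. It is not RANDOOP's implementation: the class under test is fixed to ArrayList<Integer>, the three "methods" (add1, add2, removeFirst) are hypothetical, each new sequence extends a single earlier sequence, and an iteration budget replaces the time limit. It does show the feedback: duplicate sequences are discarded, sequences that throw are never extended, contract-violating sequences go to errorSeqs, and only results not equal to an earlier value stay extensible.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

/** Hypothetical, heavily simplified sketch of the feedback-directed generation loop. */
final class FeedbackLoop {
    static final List<String> METHODS = Arrays.asList("add1", "add2", "removeFirst");

    final Set<List<String>> seen = new HashSet<>();              // every sequence generated so far
    final List<List<String>> nonErrorSeqs = new ArrayList<>();
    final List<List<String>> errorSeqs = new ArrayList<>();
    final List<List<String>> extensible = new ArrayList<>();     // sequences whose result may be reused
    final Set<List<Integer>> allObjs = new HashSet<>();          // EQUALITY filter state

    /** Replays a sequence on a fresh list; a call that throws makes the whole replay throw. */
    static List<Integer> execute(List<String> seq) {
        List<Integer> list = new ArrayList<>();
        for (String call : seq) {
            switch (call) {
                case "add1": list.add(1); break;
                case "add2": list.add(2); break;
                case "removeFirst": list.remove(0); break;    // IndexOutOfBoundsException if empty
                default: throw new IllegalArgumentException(call);
            }
        }
        return list;
    }

    void run(int iterations, long seed) {
        Random rnd = new Random(seed);
        List<String> empty = new ArrayList<>();               // just "new ArrayList()"
        seen.add(empty);
        extensible.add(empty);
        allObjs.add(execute(empty));
        for (int k = 0; k < iterations; k++) {
            // Pick an extensible sequence and a method at random, then extend.
            List<String> newSeq = new ArrayList<>(extensible.get(rnd.nextInt(extensible.size())));
            newSeq.add(METHODS.get(rnd.nextInt(METHODS.size())));
            if (!seen.add(newSeq)) continue;                  // duplicate sequence: discard
            List<Integer> result;
            try {
                result = execute(newSeq);
            } catch (RuntimeException e) {
                continue;                                     // EXCEPTION filter: never kept or extended
            }
            if (!result.equals(result)) {                     // contract check: reflexive equals
                errorSeqs.add(newSeq);
                continue;
            }
            nonErrorSeqs.add(newSeq);
            if (allObjs.add(result)) extensible.add(newSeq);  // EQUALITY filter: only new values reused
        }
    }
}
```

Running it with, say, new FeedbackLoop().run(300, 42) never keeps [removeFirst], because that sequence throws on an empty list; and if [add1, removeFirst] is generated, it lands in nonErrorSeqs but is not extensible, because it rebuilds the empty list that already exists.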
Filtering
• Filtering determines which values of a sequence are extensible, i.e., which
should be used as inputs to new method calls.
• Applying a filter to a sequence s may set some s.i.extensible flags to
false, so that those values will not be used as inputs to new method calls.
• RANDOOP uses three filters by default:
• EQUALITY, NULL, and EXCEPTION.
• Equality:
• This filter uses the equals() method to determine whether the resulting
object has been created before.
• The filter maintains a set allobjs of all extensible objects that have been
created by the algorithm across all sequence executions.
• This heuristic prunes any object with the same abstract value as a
previously created object, even if their concrete representations differ.
• This might cause RANDOOP to miss an error if method calls on such objects
behave differently.
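The equality filter can be sketched in Java with a set of previously seen values. This assumes values implement hashCode() consistently with equals(), so a HashSet can stand in for comparing against every object in allobjs; the class EqualityFilter is hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch of the EQUALITY filter. */
final class EqualityFilter {
    private final Set<Object> allObjs = new HashSet<>();   // all extensible objects seen so far

    /** Extensible flag for a newly created value: false if an equal object already exists. */
    boolean isExtensible(Object value) {
        return allObjs.add(value);   // Set.add returns false when an equal element is present
    }
}
```

This shows the abstract-value pruning described above: after an ArrayList holding 1, 2 has been recorded, a LinkedList holding 1, 2 is reported as non-extensible, because List.equals compares elements regardless of the concrete implementation.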
• NULL:
• A null dereference exception that occurs when no input value is null
often signifies an internal problem with the method.
• Null arguments are hard to detect statically, because the arguments in a
sequence are themselves outputs of other sequences.
• Instead, the null filter checks the values computed by executing a specific
sequence and marks any null value as non-extensible.
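A hedged Java sketch of the null filter and of the null-dereference check described above. NullFilter and its two methods are hypothetical names, and a call is modeled as a java.util.function.Function over its argument array.

```java
import java.util.List;
import java.util.function.Function;

/** Hypothetical sketch of the NULL filter and the related null-dereference check. */
final class NullFilter {
    /** Extensible flags for the values s.1 ... s.n computed by executing a sequence. */
    static boolean[] extensibleFlags(List<Object> values) {
        boolean[] flags = new boolean[values.size()];
        for (int i = 0; i < values.size(); i++) {
            flags[i] = values.get(i) != null;   // a null result is never reused as an input
        }
        return flags;
    }

    /** True if the call throws a NullPointerException although none of its inputs was null. */
    static boolean isNullDereferenceViolation(Function<Object[], Object> call, Object... inputs) {
        boolean anyNullInput = false;
        for (Object in : inputs) {
            if (in == null) anyNullInput = true;
        }
        try {
            call.apply(inputs);
            return false;
        } catch (NullPointerException e) {
            return !anyNullInput;   // an NPE caused by a null argument is not counted
        }
    }
}
```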
• Exception:
Exceptions frequently correspond to precondition violations for a method, so
there is little point in extending a sequence that throws one: any extension
would lead to the same exception before its execution completes.
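A minimal Java sketch of the exception filter, assuming each call of a sequence is modeled as a Runnable that mutates shared state. ExceptionFilter is a hypothetical name, and treating every RuntimeException as a likely precondition violation is a simplification: exceptions that violate a contract would be reported in errorSeqs instead.

```java
import java.util.List;

/** Hypothetical sketch of the EXCEPTION filter, applied while executing a sequence's calls. */
final class ExceptionFilter {
    /**
     * Returns true if every call completes normally, so the sequence may join nonErrorSeqs
     * and its values may be extended.
     */
    static boolean completesNormally(List<Runnable> calls) {
        for (Runnable call : calls) {
            try {
                call.run();
            } catch (RuntimeException e) {
                return false;   // execution stops here; later calls never run
            }
        }
        return true;
    }
}
```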
Repetition.
Repeated calls to add may be necessary to reach the code that increases the
capacity of a container object.
Alternatively, repeated calls may be required to create two equivalent objects
that can cause a method like equals to go down certain branches.
Thus, repetition is built into the generator as follows.
When generating a new sequence, with probability N, instead of appending a
single call of a chosen method m to create a new sequence, the generator
appends M calls, where M is chosen uniformly at random between
0 and some upper limit max. (max and N are user-settable;
the default values are max = 100 and N = 0.1.)
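The repetition rule can be sketched in Java as follows. Repetition and its methods are hypothetical, a sequence is modeled as a list of call names, and because the slide does not say whether max itself can be drawn, the sketch assumes M ranges over 0..max inclusive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch of the repetition heuristic; n and max mirror the user-settable N and max. */
final class Repetition {
    static final double DEFAULT_N = 0.1;
    static final int DEFAULT_MAX = 100;

    /** Number of calls of m to append: 1 normally; with probability n, M drawn uniformly from 0..max. */
    static int callsToAppend(Random rnd, double n, int max) {
        if (rnd.nextDouble() < n) {
            return rnd.nextInt(max + 1);   // assumption: both 0 and max can be drawn
        }
        return 1;
    }

    /** Extends a sequence (modeled as a list of call names) with one or M calls of m. */
    static List<String> extend(List<String> seq, String m, Random rnd, double n, int max) {
        List<String> out = new ArrayList<>(seq);
        int count = callsToAppend(rnd, n, max);
        for (int i = 0; i < count; i++) {
            out.add(m);
        }
        return out;
    }
}
```

With the defaults, roughly one extension in ten repeats the chosen method, and under the inclusive-range assumption a repeated extension appends 50 calls on average.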
Evaluation.
Evaluation…
Container classes have been used to evaluate input generation techniques.
Four container classes were used: a binary tree (BinTree, 154 LOC), a
binomial heap (BHeap, 355 LOC), a Fibonacci heap (FibHeap, 286 LOC), and
a red-black tree (TreeMap, 580 LOC).
They compared the coverage achieved by six techniques:
(1) model checking,
(2) model checking with state matching,
(3) model checking with abstract state matching,
(4) symbolic execution,
(5) symbolic execution with abstract state matching, and
(6) undirected random generation.
Evaluation…
• For each <technique, container> pair, they report the maximum coverage
achieved and the time at which maximum coverage was reached.
Checking API Contract:
• In this experiment, they used feedback-directed random generation,
undirected random generation, and systematic generation to create test
suites for 14 widely used libraries comprising a total of 780 KLOC.
• To reduce the number of test cases they had to inspect, they implemented
a test runner called REDUCE.
• REDUCE shows only a subset of the failing tests.
• REDUCE partitions the failing tests into equivalence classes: two tests fall
into the same class if their execution leads to a contract violation after the
same method call. Only one test per class is reported.
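A small Java sketch of the REDUCE grouping, assuming each failing test is summarized by the method call after which its violation was observed. Reduce and FailingTest are hypothetical names, and keeping the first test of each class is an arbitrary choice for the sketch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of REDUCE-style grouping of failing tests. */
final class Reduce {
    /** A failing test and the method call after which its contract violation was observed. */
    static final class FailingTest {
        final String name;
        final String violatingCall;

        FailingTest(String name, String violatingCall) {
            this.name = name;
            this.violatingCall = violatingCall;
        }
    }

    /** Keeps one representative per equivalence class (here: the first test seen in each class). */
    static List<FailingTest> reduce(List<FailingTest> failing) {
        Map<String, FailingTest> byCall = new LinkedHashMap<>();   // preserves first-seen order
        for (FailingTest t : failing) {
            byCall.putIfAbsent(t.violatingCall, t);
        }
        return new ArrayList<>(byCall.values());
    }
}
```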
• They ran RANDOOP on each library, specifying all of its public classes as
targets for testing and using RANDOOP's default parameters. For each
library, they reported:
• Test cases generated: the size of the test suite (number of unit tests)
output by RANDOOP.
• Violation-inducing test cases: the number of violation-inducing test cases
output by RANDOOP.
• REDUCE-reported test cases: the number of violation-inducing test cases
reported by REDUCE.
• Errors: the number of distinct errors uncovered by the error-revealing test
cases. Two errors count as distinct if fixing them would involve modifying
different source code.
• Errors per KLOC: the number of distinct errors divided by the KLOC count
for the library.
Feedback-Directed Random Generation
• Errors discovered: RANDOOP created a total of 4200 distinct
violation-inducing test cases. Of those, REDUCE reported approximately 10%.
Out of the 424 tests that REDUCE reported, 254 were error-revealing; the
other 170 were illegal uses of the libraries.
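The reported figures are mutually consistent, as a quick check shows; the last ratio (the fraction of REDUCE-reported tests that were error-revealing) is derived here rather than stated on the slide:

```latex
\frac{424}{4200} \approx 0.101 \approx 10\%, \qquad 254 + 170 = 424, \qquad \frac{254}{424} \approx 0.60
```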
Undirected Random Testing
• RANDOOP was run with the same parameters, but with the use of filters
and contracts to guide generation disabled.
• The result:
Undirected generation did not find any errors in java.util or javax.xml, and
was unable to create the sequence that uncovered the infinite loop in
System.Xml.
Regression and compliance testing
• They also used feedback-directed random testing to find inconsistencies
between different implementations of the same API.
• RANDOOP guesses observer methods using a simple strategy: a method is
an observer if all of the following hold:
• (i) it has no parameters,
• (ii) it is public and non-static
• (iii) it returns values of primitive type (or String), and
• (iv) its name is size, count, length, or toString, or begins with get or is.
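The four conditions translate directly into a Java reflection check. ObserverGuess is a hypothetical class, but java.lang.reflect.Method and Modifier are standard JDK APIs. One subtlety: Class.isPrimitive() is true for void.class, so void methods must be excluded explicitly.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch of the observer-method heuristic (i)-(iv), using Java reflection. */
final class ObserverGuess {
    private static final Set<String> NAMES =
        new HashSet<>(Arrays.asList("size", "count", "length", "toString"));

    static boolean isObserver(Method m) {
        int mods = m.getModifiers();
        Class<?> ret = m.getReturnType();
        String name = m.getName();
        boolean noParams = m.getParameterTypes().length == 0;                          // (i)
        boolean publicInstance = Modifier.isPublic(mods) && !Modifier.isStatic(mods);  // (ii)
        boolean primitiveOrString =                                                     // (iii)
            (ret.isPrimitive() && ret != void.class) || ret == String.class;            // void.class counts as primitive
        boolean observerName =                                                          // (iv)
            NAMES.contains(name) || name.startsWith("get") || name.startsWith("is");
        return noParams && publicInstance && primitiveOrString && observerName;
    }

    /** Convenience lookup of a public method; false if the method does not exist. */
    static boolean isObserver(Class<?> c, String name, Class<?>... params) {
        try {
            return isObserver(c.getMethod(name, params));
        } catch (NoSuchMethodException e) {
            return false;
        }
    }
}
```

For example, ArrayList.size() and String.isEmpty() qualify, while ArrayList.get(int) (has a parameter, returns Object), Object.getClass() (returns Class) and String.hashCode() (name not in the list) do not.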
Related Work:
• Automatic test generation is an active research area; this discussion
focuses on input generation techniques that create method sequences.
Related Work…
• Random Testing:
• JCrasher: creates test inputs by using a parameter graph to find method
calls whose return values can serve as input parameters. RANDOOP instead
uses a component set of previously created sequences.
• Feedback directed test generation was introduced by the Eclat tool.
Eclat's performance is sensitive to the quality of the sample execution
given as an input to the tool. Since RANDOOP does not require a sample
execution, it is not sensitive to this parameter.
• An experimental comparison of Eclat and RANDOOP is an interesting
avenue for future work.
Related Work…
Systematic Testing:
• Bounded exhaustive generation has been implemented in tools such as
Rostra and JPF. Like RANDOOP, they use state matching to prune redundant
sequences, with some differences in which values are matched.
• An alternative to the bounded exhaustive approach is symbolic execution,
implemented in tools such as Symstra.
• Check 'n' Crash creates abstract constraints over inputs that cause
exceptional behavior and uses a constraint solver to derive concrete test inputs.
• Combining Random and Systematic Testing:
• DART: a symbolic execution approach that integrates random input
generation.
• RANDOOP sits closer to the random end of the random-systematic spectrum:
it is primarily a random input generator, but it uses some systematization
to be more effective.
Conclusion
• Feedback-directed random testing scales to large systems and quickly
finds errors in heavily tested applications.
• Combining random testing and systematic testing gives the advantages
of both.
• The notions of exploration using a component set, and of state matching
when there are many objects, could be translated into the exhaustive
testing domain.
References:
• C. Pacheco, S. K. Lahiri, M. D. Ernst, and T. Ball. Feedback-directed random
test generation. In Proc. 29th ACM/IEEE International Conference on
Software Engineering (ICSE), pages 75-84. IEEE, May 2007.
• W. Visser, C. S. Pasareanu, and R. Pelánek. Test input generation for Java
containers using state matching. In ISSTA, pages 37-48, July 2006.
• C. Pacheco and M. D. Ernst. Eclat: Automatic generation and classification of
test inputs. In ECOOP, pages 504-527, July 2005.
• T. Xie, D. Marinov, and D. Notkin. Rostra: A framework for detecting
redundant object-oriented unit tests. In ASE, pages 196-205, Nov. 2004.
• T. Xie, D. Marinov, W. Schulte, and D. Notkin. Symstra: A framework for
generating object-oriented unit tests using symbolic execution. In TACAS,
pages 365-381, Apr. 2005.
• Questions?
• Thank you.