A Randomized Scheduler with Probabilistic Guarantees of Finding Bugs
Sebastian Burckhardt (Microsoft Research), Madanlal Musuvathi (Microsoft Research), Pravesh Kothari (Indian Institute of Technology, Kanpur), Santosh Nagarakatte (University of Pennsylvania)


What is Concurrency Testing?

Whether a test finds a bug depends on
◦ the configuration
◦ the inputs
◦ the schedule

Concurrency bugs are bugs that surface only for some schedules

The Concurrency Testing Problem
◦ How to cover buggy schedules as best we can?
◦ Testing all schedules is infeasible!
Idea: Randomize the Schedule
Parent
void* p = 0;
RandDelay();
CreateThd(child);
RandDelay();
p = malloc(…);

Child
RandDelay();
Init();
RandDelay();
DoMoreWork();
RandDelay();
p->f ++;

1. Instrument code with calls to insert random delays
2. If we are lucky, delay exposes bugs
3. But: how long to delay? where not to delay?
What is a Randomized Algorithm?

A randomized algorithm:
◦ Not: “An algorithm that makes nondeterministic choices”
◦ But: An algorithm using a random source with a precisely defined distribution

A probabilistic guarantee:
◦ Not: “A guarantee that doesn’t always hold”
◦ But: A lower bound on the probability of success
What we did / Talk Outline
1. Define bug depth in such a way that common bugs have low depth
2. Develop PCT algorithm (probabilistic concurrency testing), a randomized scheduling algorithm with a good probabilistic guarantee to find bugs of low depth
3. Build it into Cuzz, a concurrency fuzzing tool that improves the efficiency of stress testing
Part I
BUG DEPTH
Bug Depth
Bug Depth = the number of ordering constraints a schedule has to satisfy to find the bug.
More constraints means more things have to go “just right” to find the bug.
Conjecture: many typical bugs have low depth.
Let’s look at 3 examples.
Ordering Violation Example:
A Bug of Depth 1
Bug depth = the number of ordering constraints sufficient to find the bug.
Parent Thread
…
start(child);
p = malloc();
…
Child Thread
…
do_init();
p->f ++;
…
All schedules that satisfy the single ordering constraint (the child's p->f ++ executes before the parent's p = malloc()) find the bug.
Atomicity Violation Example:
A Bug of Depth 2
Parent Thread
p = malloc();
start(child);
…
if (p != null)
    p->f++;
…
Child Thread
…
p = null;
…
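All schedules that satisfy both ordering constraints (the child's p = null executes after the parent's check p != null and before the parent's p->f++) find the bug.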
Deadlock Example:
A Bug of Depth 2
Parent Thread
…
Lock(A);
…
Lock(B);
…
Child Thread
…
Lock(B);
…
Lock(A);
…
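All schedules that satisfy both ordering constraints (the parent's Lock(A) executes before the child's Lock(A), and the child's Lock(B) executes before the parent's Lock(B)) find the bug.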
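To make the notion of ordering constraints concrete, the sketch below enumerates every interleaving of the four lock calls in the deadlock example and counts those that satisfy both constraints. It is an illustration only: the names Step, Constraint, countSchedules and deadlockExample are hypothetical, the "…" work between the lock calls is ignored, and the lock calls are treated as plain steps (a real run would block at the second lock instead of completing it).

```cpp
#include <cassert>
#include <utility>
#include <vector>

// A step is identified by its thread (0 = parent, 1 = child) and its index
// within that thread.
struct Step { int thread; int index; };

// An ordering constraint: `first` must come before `second` in the schedule.
struct Constraint { Step first; Step second; };

// Position of a step in a schedule, or -1 if it does not occur.
int positionOf(const std::vector<Step>& schedule, Step s) {
    for (int i = 0; i < static_cast<int>(schedule.size()); ++i)
        if (schedule[i].thread == s.thread && schedule[i].index == s.index)
            return i;
    return -1;
}

// True if the schedule satisfies every ordering constraint.
bool satisfiesAll(const std::vector<Step>& schedule,
                  const std::vector<Constraint>& cs) {
    for (const Constraint& c : cs)
        if (positionOf(schedule, c.first) >= positionOf(schedule, c.second))
            return false;
    return true;
}

// Recursively extends `schedule` with the next step of either thread.
void enumerate(int a, int b, int ia, int ib, std::vector<Step>& schedule,
               const std::vector<Constraint>& cs, int& hits, int& total) {
    if (ia == a && ib == b) {
        ++total;
        if (satisfiesAll(schedule, cs)) ++hits;
        return;
    }
    if (ia < a) {
        schedule.push_back(Step{0, ia});
        enumerate(a, b, ia + 1, ib, schedule, cs, hits, total);
        schedule.pop_back();
    }
    if (ib < b) {
        schedule.push_back(Step{1, ib});
        enumerate(a, b, ia, ib + 1, schedule, cs, hits, total);
        schedule.pop_back();
    }
}

// Returns {schedules satisfying all constraints, total schedules} for two
// threads with `a` and `b` steps.
std::pair<int, int> countSchedules(int a, int b,
                                   const std::vector<Constraint>& cs) {
    int hits = 0, total = 0;
    std::vector<Step> schedule;
    enumerate(a, b, 0, 0, schedule, cs, hits, total);
    return {hits, total};
}

// Deadlock example. Parent: step 0 = Lock(A), step 1 = Lock(B).
// Child: step 0 = Lock(B), step 1 = Lock(A).
std::pair<int, int> deadlockExample() {
    std::vector<Constraint> cs = {
        Constraint{Step{0, 0}, Step{1, 1}},  // parent Lock(A) before child Lock(A)
        Constraint{Step{1, 0}, Step{0, 1}},  // child Lock(B) before parent Lock(B)
    };
    return countSchedules(2, 2, cs);
}
```

In this stripped-down model, 4 of the 6 interleavings satisfy both constraints. Once the elided work between the lock calls is counted as steps, the fraction of schedules that satisfy both constraints becomes much smaller.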
Part II
THE PCT ALGORITHM
PCT Algorithm:
Randomly Assign & Change Thread Priorities
Input:
  int k;          // no. of steps - guessed from previous runs
  int d;          // target bug depth - randomly chosen
State:
  int pri[];      // thread priorities
  int change[];   // when to change priorities
  int stepCnt;    // current step count

PCT::Init() {
  stepCnt = 0;
  foreach tid
    pri[tid] = rand() + d;
  for( i=0; i<d-1; i++ )
    change[i] = rand() % k;
}

PCT::RandDelay( tid ) {
  stepCnt ++;
  if stepCnt == change[i] for some i
    pri[tid] = i;
  if (tid is not highest pri enabled thread)
    spin;
}
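The pseudocode above can be tried out on a small abstract program without real threads. The following is a minimal sketch under stated assumptions, not the Cuzz implementation: Model, Event, pctRun, hitRate and the two example models are hypothetical names, initial priorities are taken to be a random permutation of d, …, d+n-1, change points are drawn uniformly from 1..k, and a thread's priority is lowered right after it executes the step at a change point.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical program model: thread t has steps[t] steps and becomes enabled
// once thread startedBy[t] has executed its step number startStep[t]
// (startedBy[t] == -1 means the thread is enabled from the beginning).
struct Model {
    std::vector<int> steps;
    std::vector<int> startedBy;
    std::vector<int> startStep;
};

struct Event { int thread; int step; };

// One simulated PCT run for target depth d; returns the resulting schedule.
std::vector<Event> pctRun(const Model& m, int d, std::mt19937& rng) {
    const int n = static_cast<int>(m.steps.size());
    const int k = std::accumulate(m.steps.begin(), m.steps.end(), 0);

    // Initial priorities: a random permutation of d, d+1, ..., d+n-1.
    std::vector<int> pri(n);
    std::iota(pri.begin(), pri.end(), d);
    std::shuffle(pri.begin(), pri.end(), rng);

    // d-1 priority change points, each drawn uniformly from [1, k].
    std::uniform_int_distribution<int> pick(1, k);
    std::vector<int> change(d > 1 ? d - 1 : 0);
    for (int& c : change) c = pick(rng);

    std::vector<int> pc(n, 0);  // next step index of each thread
    std::vector<Event> schedule;
    for (int stepCnt = 1; stepCnt <= k; ++stepCnt) {
        // Pick the enabled, unfinished thread with the highest priority.
        int best = -1;
        for (int t = 0; t < n; ++t) {
            const int s = m.startedBy[t];
            const bool started = s < 0 || pc[s] > m.startStep[t];
            if (started && pc[t] < m.steps[t] &&
                (best < 0 || pri[t] > pri[best]))
                best = t;
        }
        if (best < 0) break;  // nothing can run (not expected in these models)
        schedule.push_back(Event{best, pc[best]});
        ++pc[best];
        // Assumption: the priority drops right after a change-point step.
        for (int i = 0; i < static_cast<int>(change.size()); ++i)
            if (change[i] == stepCnt) pri[best] = i + 1;
    }
    return schedule;
}

int positionOf(const std::vector<Event>& s, int thread, int step) {
    for (int i = 0; i < static_cast<int>(s.size()); ++i)
        if (s[i].thread == thread && s[i].step == step) return i;
    return -1;
}

// Depth 1. Parent (thread 0): start(child); p = malloc().
// Child (thread 1): do_init(); p->f++.
Model orderingModel() { return Model{{2, 2}, {-1, 0}, {0, 0}}; }
bool orderingBug(const std::vector<Event>& s) {
    return positionOf(s, 1, 1) < positionOf(s, 0, 1);  // use before malloc
}

// Depth 2. Parent: p = malloc(); start(child); if (p != null); p->f++.
// Child: p = null.
Model atomicityModel() { return Model{{4, 1}, {-1, 0}, {0, 1}}; }
bool atomicityBug(const std::vector<Event>& s) {
    const int write = positionOf(s, 1, 0);  // child's p = null
    return positionOf(s, 0, 2) < write && write < positionOf(s, 0, 3);
}

// Fraction of `runs` simulated PCT runs whose schedule exhibits the bug.
double hitRate(const Model& m, int d, bool (*isBug)(const std::vector<Event>&),
               int runs, unsigned seed) {
    std::mt19937 rng(seed);
    int hits = 0;
    for (int r = 0; r < runs; ++r)
        if (isBug(pctRun(m, d, rng))) ++hits;
    return static_cast<double>(hits) / runs;
}
```

For instance, hitRate(orderingModel(), 1, orderingBug, 10000, 42) should land near 1/2, and hitRate(atomicityModel(), 2, atomicityBug, 20000, 1) near 1/10. With d = 1 the atomicity violation is never found in this model, because without a priority change nothing preempts the parent between the check and the use.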
The PCT Guarantee

Given a program with
◦ n threads (~tens)
◦ k steps (~millions)
◦ a bug of depth d (1,2)

Each run PCT finds the bug with a probability p of at least
$$p \ge \frac{1}{n \, k^{d-1}}$$
(this is a worst-case guarantee)
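As a rough worked example of the bound $1/(n\,k^{d-1})$, plug in magnitudes of the kind the slide mentions. The concrete values n = 10 and k = 10^6 are illustrative assumptions, not measurements.

```latex
\[
p \;\ge\; \frac{1}{n\,k^{d-1}}
\]
% Illustrative values (assumptions): n = 10 threads, k = 10^6 steps.
\[
d = 1:\quad p \ge \frac{1}{10 \cdot (10^{6})^{0}} = \frac{1}{10},
\qquad
d = 2:\quad p \ge \frac{1}{10 \cdot (10^{6})^{1}} = 10^{-7}.
\]
```

Each additional unit of depth costs another factor of k in the worst case, which is why the guarantee is most useful for bugs of low depth.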
Part III
THE CUZZ TOOL & RESULTS
How it Works
[Diagram labels: Program; Cuzz (Randomized Algorithm); Win32 API; Kernel Scheduler; binary instrumentation for data accesses (optional)]

Intercept at synchronization points
◦ Detour Win32 synchronization calls
◦ Optionally instrument data accesses
◦ No manual instrumentation required
Some Results
Practice Beats Worst-Case

Measured probability is often significantly better than the worst-case guaranteed probability
Why Does Practice Beat Worst-Case?


Worst-case guarantee applies to hardest-to-find bug of given depth
If bugs can be found in multiple ways, probabilities add up!
Example: Increasing the number of threads helps:
[Chart: Measured Probability (y-axis, 0 to 0.02) vs. Number of Threads (x-axis: 2, 3, 5, 9, 17, 33, 65)]
Internal Tool Status
◦ The Cuzz tool is available internally at Microsoft
◦ We are working with several product groups that actively use Cuzz to improve their stress testing
DEMO
Demo Conclusion

Measure probabilities on cluster
◦ Without Cuzz: 1 fail in 238,820 runs (ratio = 0.000004817)
◦ With Cuzz: 12 fails in 320 runs (ratio = 0.0375)
◦ Resource savings: factor 7,800
1 day of stress testing = 11 seconds of Cuzz testing
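Taking the two failure ratios as stated on the slide, the savings factor and the "11 seconds" figure follow from simple division. The value 86,400 is the number of seconds in one day.

```latex
\[
\frac{0.0375}{0.000004817} \approx 7785 \approx 7{,}800,
\qquad
\frac{86{,}400\ \text{s}}{7785} \approx 11.1\ \text{s}.
\]
```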
Conclusions

Bug depth is a useful metric to focus testing efforts

Systematic randomization improves concurrency testing

No reason not to use Cuzz