A Case for an Interleaving Constrained
Shared-Memory Multi-Processor
Jie Yu and Satish Narayanasamy
University of Michigan
Why is Parallel Programming Hard?
• Is single-threaded programming relatively easy?
– Verification is NP-hard
– BUT, properties such as a function’s pre/post-conditions,
loop invariants are verifiable in polynomial time
• Parallel programming is harder
– Verifying properties for even small code regions is NP-hard
– Reason: Unbounded number of legal thread interleavings
exposed to the parallel runtime
– Impractical to test/verify properties for all legal interleavings
Too much freedom
given to parallel runtime?
Legal Thread Interleavings
• Tested correct interleavings
• Incorrect interleavings found during testing
– Eliminated by adding synchronization constraints
• Untested interleavings
– Cause of concurrency bugs
Solution: Limit Freedom
• Programmer tests as many legal interleavings
as practically possible
• Interleaving constraints from correct test runs are
encoded in the program binary
• Runtime system avoids untested interleavings,
i.e. avoids corner cases
Result of Constraining Interleavings
• A majority of the concurrency bugs are
avoidable
– Data races, atomicity violations, and also order
violations
• Performance overhead is low
– Untested interleavings in well-tested programs
are likely to manifest rarely
– Processor support helps reduce the cost of
enforcing interleaving constraints
Challenges
• How to encode tested interleavings
in a program’s binary?
– Predecessor Set (PSet)
interleaving constraints
• How to efficiently enforce interleaving
constraints at runtime?
– Detect violations of PSet constraints
using processor support
– Avoid violations by stalling or using
rollback-and-re-execution support
Encoding Tested Interleavings
• Interleaving Constraints from Test Runs
– Too specific to a test input → performance loss for a
different input
– Too generic → might allow untested interleavings
• Predecessor Set (PSet)
– PSet(m) is defined for each static memory operation m
– pred ∈ PSet(m) if m is immediately and remotely
memory dependent on pred in at least one tested
execution
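The PSet definition above can be sketched as a small trace analysis. This is a minimal sketch and not the authors' tool: the trace format, the name learn_psets, and the reading of "immediately dependent" (a read depends on the most recent write; a write depends on the reads since the most recent write, or on that write if there were none) are assumptions made for illustration. "Remotely" is taken to mean that pred ran in a different thread.

```python
from collections import defaultdict

def learn_psets(trace, psets=None):
    """Accumulate PSets from one tested execution.

    trace: list of (thread_id, op, kind, addr) in global order, where op is a
    static instruction label such as "W1" and kind is "R" or "W".
    psets: PSets learnt from earlier test runs, extended in place if given.
    """
    if psets is None:
        psets = defaultdict(set)
    last_write = {}                   # addr -> (thread_id, op) of the last write
    reads_since = defaultdict(list)   # addr -> reads since that write
    for tid, op, kind, addr in trace:
        psets[op]  # ensure every executed op has an entry, possibly empty
        if kind == "R":
            preds = [last_write[addr]] if addr in last_write else []
        else:
            preds = reads_since[addr] or (
                [last_write[addr]] if addr in last_write else [])
        for pred_tid, pred_op in preds:
            if pred_tid != tid:       # only remote dependences are recorded
                psets[op].add(pred_op)
        if kind == "R":
            reads_since[addr].append((tid, op))
        else:
            last_write[addr] = (tid, op)
            reads_since[addr] = []
    return psets

# Hypothetical test run on one shared location "x".
trace = [
    (1, "W1", "W", "x"),
    (2, "R1", "R", "x"),
    (2, "W2", "W", "x"),
    (1, "R2", "R", "x"),
]
psets = learn_psets(trace)
```

Passing the returned psets into a later call merges the constraints of several test runs, which matches the "in at least one tested execution" wording.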
A Test Run
[Figure: a test run with three threads executing memory operations W1 to W3 and R1 to R4, each annotated with its PSet]
PSet(W1) = {}
PSet(R1) = {}
PSet(R2) = {W1}
PSet(R3) = {W1}
PSet(R4) = {}
PSet(W2) = {R3, R4}
PSet(W3) = {W2}
Enforcing Tested Interleaving
• Processor support for detecting and avoiding PSet constraint violations
• Detecting PSet constraint violations
– For each memory location, track its last accessor
• Cache extension
– Detect PSet constraint violation
• Piggyback cache coherence reply with last accessor
• Processor executes the PSet membership test by executing additional micro-ops
• Overcoming a PSet Constraint violation
– Stall
– Re-execute using checkpoint-and-rollback support
• E.g. SafetyNet, ReVive, etc.
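The detection step can be modelled in software as follows. This is a sketch under stated assumptions, not the proposed hardware: the class PSetChecker and its method names are hypothetical, the per-location metadata stands in for the cache extension, and the dependence rule (a read checks the last write; a write checks the reads since the last write, or that write if there were none) is one possible reading of "last accessor".

```python
from collections import defaultdict

class PSetChecker:
    """Software model of the per-location last-accessor metadata (hypothetical)."""

    def __init__(self, psets):
        self.psets = psets                     # op -> set of allowed predecessors
        self.last_write = {}                   # addr -> (thread_id, op)
        self.reads_since = defaultdict(list)   # addr -> reads since the last write

    def violators(self, tid, op, kind, addr):
        """Return the remote predecessors of this access that are not in PSet(op)."""
        if kind == "R":
            preds = [self.last_write[addr]] if addr in self.last_write else []
        else:
            preds = self.reads_since[addr] or (
                [self.last_write[addr]] if addr in self.last_write else [])
        allowed = self.psets.get(op, set())
        return [p_op for p_tid, p_op in preds
                if p_tid != tid and p_op not in allowed]

    def commit(self, tid, op, kind, addr):
        """Record the access once it is allowed to complete."""
        if kind == "R":
            self.reads_since[addr].append((tid, op))
        else:
            self.last_write[addr] = (tid, op)
            self.reads_since[addr] = []

# Hypothetical constraints for three static instructions.
psets = {"Wa": set(), "Rb": {"Wa"}, "Wc": set()}
checker = PSetChecker(psets)
checker.commit(1, "Wa", "W", "x")
ok_read = checker.violators(2, "Rb", "R", "x")     # Wa is in PSet(Rb)
bad_write = checker.violators(2, "Wc", "W", "x")   # Wa is not in PSet(Wc)
```

An empty result means the access may proceed; a non-empty result is the point at which the runtime would stall the thread or roll back.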
Two Case Studies
• Case Study 1
– An Atomicity Violation Bug in MySQL
– Avoided using stall
• Case Study 2
– An order violation bug in Mozilla
• neither a data race nor an atomicity violation
– Avoided using rollback and re-execution
An Atomicity Violation Bug in MySQL
Thread 1 (sql/log.cc)
MYSQL_LOG::new_file()
{
  …
  log_status = LOG_CLOSED;   // W1
  close();
  …
  open(…);
  …
  log_status = LOG_OPEN;     // W2
  …
}
Thread 2 (sql/sql_insert.cc)
mysql_insert(…)
{
  …
  if (log_status != LOG_CLOSED)   // R1
  {
    // write into a log file
  }
  …
}
Correct Interleaving #1
-- “frequent”, therefore likely to be tested
Execution order (PSet of each instruction in braces):
• Thread 2: R1  log_status != LOG_CLOSED ?  {}
• Thread 1: W1  log_status = LOG_CLOSED  {R1}
• Thread 1: W2  log_status = LOG_OPEN  {}
PSet(W1) = {R1}
PSet(W2) = {}
PSet(R1) = {}
Correct Interleaving #2
-- “frequent”, therefore likely to be tested
Execution order (PSet of each instruction in braces):
• Thread 1: W1  log_status = LOG_CLOSED  {R1}
• Thread 1: W2  log_status = LOG_OPEN  {}
• Thread 2: R1  log_status != LOG_CLOSED ?  {} becomes {W2}
PSet(W1) = {R1}
PSet(W2) = {}
PSet(R1) = {W2}
Incorrect Interleaving
-- rare, and therefore likely to be untested
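Execution order (PSet of each instruction in braces):
• Thread 1: W1  log_status = LOG_CLOSED  {R1}
• Thread 2: R1  log_status != LOG_CLOSED ?  {W2}
– Constraint violation: W1 ∉ PSet(R1), so R1 is stalled
• Thread 1: W2  log_status = LOG_OPEN  {}
• Thread 2: R1 proceeds, since W2 ∈ PSet(R1)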
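The stall in this case study can be replayed with a toy scheduler. It is a sketch only: run_with_stall and its arguments are hypothetical names, a single shared location is assumed, and the check uses just the last accessor of that location, with the PSets taken from the two correct interleavings above.

```python
def run_with_stall(threads, psets, preferred):
    """Replay operations on one shared location, stalling on PSet violations.

    threads: {thread_id: [(op, kind), ...]} in program order.
    preferred: thread ids in the order the scheduler would like to run them.
    Returns the order in which the operations actually execute.
    """
    pending = {tid: list(ops) for tid, ops in threads.items()}
    last = None                # (thread_id, op, kind) of the last accessor
    executed = []
    queue = list(preferred)
    while queue:
        progressed = False
        for i, tid in enumerate(queue):
            op, kind = pending[tid][0]
            remote_dep = (last is not None and last[0] != tid
                          and "W" in (kind, last[2]))
            if remote_dep and last[1] not in psets[op]:
                continue       # PSet violation: stall this thread, try the next
            pending[tid].pop(0)
            executed.append(op)
            last = (tid, op, kind)
            del queue[i]
            progressed = True
            break
        if not progressed:
            raise RuntimeError("all threads stalled; stalling cannot resolve this")
    return executed

threads = {1: [("W1", "W"), ("W2", "W")], 2: [("R1", "R")]}
psets = {"W1": {"R1"}, "W2": set(), "R1": {"W2"}}
# Attempt the incorrect interleaving W1, R1, W2.
avoided = run_with_stall(threads, psets, [1, 2, 1])
```

Under these assumptions the attempted order W1, R1, W2 executes as W1, W2, R1, which is Correct Interleaving #2.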
An Order Violation Bug in Mozilla
Correct Test Run
Thread 1 (TimerThread.cpp)
TimerThread::Run() {
  ...
  Lock(lock);
  mProcessing = TRUE;
  while (mProcessing) {
    ...
    mWaiting = TRUE;      // W
    Wait(cond, lock);
    mWaiting = FALSE;
  }
  Unlock(lock);
  ...
}
Thread 2 (TimerThread.cpp)
TimerThread::Shutdown() {
  ...
  Lock(lock);
  mProcessing = FALSE;
  if (mWaiting)           // R
    Notify(cond, lock);
  Unlock(lock);
  ...
  mThread->Join();
  return NS_OK;
}
Execution order (PSet of each instruction in braces):
• Thread 1: W  mWaiting = TRUE  {}
• Thread 2: R  if (mWaiting) ?  {W}
PSet(W) = {}
PSet(R) = {W}
Avoiding Order Violation
(Same TimerThread::Run() and TimerThread::Shutdown() code from TimerThread.cpp as in the correct test run.)
Execution order (PSet of each instruction in braces):
• Thread 2: R  if (mWaiting) ?  {W}
• Thread 1: W  mWaiting = TRUE  {}
– Constraint violation: R ∉ PSet(W)
• Rollback and re-execution: R executes again after W, and W ∈ PSet(R)
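The rollback path can be illustrated with a similar toy model. It rests on several assumptions: run_with_rollback is a hypothetical name, there is a single shared location, the checkpoint is taken at the start of the attempted order, and re-execution is modelled simply as retrying with the two conflicting accesses swapped. Real checkpoint-and-rollback support such as SafetyNet or ReVive is far more involved.

```python
def run_with_rollback(attempt, psets, max_rollbacks=10):
    """Replay accesses to one location, rolling back on PSet violations.

    attempt: list of (thread_id, op, kind) in the order first attempted.
    Returns the order that finally commits.
    """
    order = list(attempt)
    for _ in range(max_rollbacks + 1):
        last = None            # (thread_id, op, kind) of the last accessor
        executed = []
        for i, (tid, op, kind) in enumerate(order):
            remote_dep = (last is not None and last[0] != tid
                          and "W" in (kind, last[2]))
            if remote_dep and last[1] not in psets[op]:
                # Violation: discard this attempt and re-execute with the
                # conflicting predecessor moved after the current access.
                order[i - 1], order[i] = order[i], order[i - 1]
                break
            executed.append(op)
            last = (tid, op, kind)
        else:
            return executed    # no violation: this attempt commits
    raise RuntimeError("rollback did not reach a tested interleaving")

psets = {"W": set(), "R": {"W"}}
# Untested order: Shutdown() reads mWaiting before Run() writes it.
committed = run_with_rollback([(2, "R", "R"), (1, "W", "W")], psets)
```

Stalling alone would not help here, because the violation is detected at W only after R has already executed.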
Methodology
• Pin-based analysis
• 17 documented bugs analyzed
– MySQL, Apache, Mozilla, pbzip2, aget, pfscan
+ Parsec, Splash for performance study
• Applications tested using regression test
suites when available, or random test inputs otherwise
PSet Constraints from Test Runs
[Figure: total PSet pairs learnt (y-axis) versus number of test runs (x-axis), one panel each for MySQL, FFT, and Pbzip2]
• Concurrent workload
– MySQL: run regression test
suite in parallel with OSDB
– FFT, pbzip2: random test
input
Bug Avoidance Capability
• 17 bugs from MySQL, Apache, Mozilla, pbzip2, aget, pfscan
• 15/17 bugs avoided by enforcing PSet constraints
– Including a bug that is neither a data race nor an
atomicity violation bug
• 2/17 false negatives
– a multi-variable atomicity violation
– a context sensitive deadlock bug
• 6 bugs are avoided using the stalling mechanism. The others
require the rollback mechanism.
PSet Violations in Bug-Free Execution
[Figure: number of PSet violations per billion instructions (y-axis, 0 to 50), split into Stall, Rollback, and Cannot Resolve]
• 2 PSet constraint violations in MySQL are not
avoided
– In MySQL, bmove512 unrolls a loop 128 times
PSet Size of Instructions
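[Figure: distribution of PSet sizes (0, 1, 2~5, 6~10, > 10) over static memory instructions for pbzip2, aget, pfscan, apache, mysql, fft, fmm, lu, radix, blackscholes, and canneal; axis from 90% to 100%]
• Over 95% of the instructions
have PSets of size zero
• Less than 2% of static
memory instructions have a
PSet of size greater than two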
Summary
• Multi-threaded programming is hard
– Existing shared-memory programming model exposes too
many legal interleavings to the runtime
– Most interleavings remain untested in production code
• Interleaving constrained shared-memory
multiprocessor
– Avoids untested (rare) interleavings to avoid concurrency
bugs
• Predecessor Set interleaving constraints
– 15/17 concurrency bugs are avoidable
– Acceptable performance and space overhead
Thanks
• Q&A
Memory Space Overhead
Program        App. Size   # PSet Pairs   Overhead w.r.t. App.
Pbzip2         39KB        201            2.16%
Aget           90KB        365            1.69%
Pfscan         17KB        295            7.34%
Apache         2435KB      4119           0.69%
MySQL          4284KB      6604           0.64%
FFT            24KB        158            2.74%
FMM            73KB        1764           10.13%
LU             24KB        244            4.31%
Radix          21KB        255            5.00%
Blackscholes   54KB        41             0.32%
Canneal        59KB        752            5.24%
Space Overhead
• In the worst case, 10% code
size increase
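As a rough consistency check on the table, the reported overhead can be divided by the number of PSet pairs to get the implied storage cost per pair. The slides do not state an encoding size, so this is only an estimate derived from the table values; it assumes 1KB = 1024 bytes, and the helper name bytes_per_pair is hypothetical.

```python
# (program, application size in KB, PSet pairs, overhead in percent), from the table
rows = [
    ("Pbzip2", 39, 201, 2.16),
    ("Aget", 90, 365, 1.69),
    ("Pfscan", 17, 295, 7.34),
    ("Apache", 2435, 4119, 0.69),
    ("MySQL", 4284, 6604, 0.64),
    ("FFT", 24, 158, 2.74),
    ("FMM", 73, 1764, 10.13),
    ("LU", 24, 244, 4.31),
    ("Radix", 21, 255, 5.00),
    ("Blackscholes", 54, 41, 0.32),
    ("Canneal", 59, 752, 5.24),
]

def bytes_per_pair(size_kb, pairs, overhead_pct, kb=1024):
    """Implied bytes of PSet storage per pair (assumes overhead is a share of app size)."""
    return size_kb * kb * overhead_pct / 100 / pairs

per_pair = {name: bytes_per_pair(size, pairs, pct)
            for name, size, pairs, pct in rows}
```

Under that assumption every program comes out at roughly 4 bytes per PSet pair, which suggests the overhead column scales with the pair count rather than with application size.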