Transcript slides
Dataflow Analysis for
Datarace-Free Programs
(ESOP '11)
Arnab De
Joint work with Deepak D’Souza
and Rupesh Nasre
Indian Institute of Science, Bangalore
Why Datarace-Free Programs?
Java, C++, … programs fall into two classes:
– Racy programs: very weak guarantees.
– DRF programs: sequentially consistent (SC) semantics.
Dataraces are often indicators of
bugs.
SC for DRF
Verifier: is the program DRF?
– No: bug / memory-model-specific reasoning required.
– Yes: analysis for DRF programs!
Compiler: assume DRF, perform optimization, produce optimized code.
Datarace-Free Programs
In an execution, a release action
synchronizes-with (sw) all acquire actions on
the same variable that come after it.
In an execution, the happens-before (hb)
relation is the reflexive, transitive closure of
synchronizes-with and program-order (po).
A program is datarace-free if, in all SC executions,
all conflicting accesses are ordered by happens-before.
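The definitions above can be checked mechanically on a single execution. The sketch below is only an illustration: the event encoding (thread, kind, variable) and the function names are assumptions, and it examines one given execution, whereas datarace-freedom quantifies over all SC executions.

```python
from itertools import combinations

def happens_before(events):
    """events: list of (thread, kind, var) tuples in execution order.
    kind is 'read', 'write', 'acquire', or 'release' (hypothetical encoding)."""
    n = len(events)
    # Start from the reflexive relation.
    hb = [[i == j for j in range(n)] for i in range(n)]
    for i, j in combinations(range(n), 2):  # always i < j
        (ti, ki, vi), (tj, kj, vj) = events[i], events[j]
        if ti == tj:
            hb[i][j] = True  # program-order edge
        if ki == "release" and kj == "acquire" and vi == vj:
            hb[i][j] = True  # synchronizes-with edge
    # Transitive closure (Warshall's algorithm).
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if hb[i][k] and hb[k][j]:
                    hb[i][j] = True
    return hb

def dataraces(events):
    """Pairs of conflicting accesses not ordered by happens-before."""
    hb = happens_before(events)
    found = []
    for i, j in combinations(range(len(events)), 2):
        (ti, ki, vi), (tj, kj, vj) = events[i], events[j]
        accesses = {ki, kj} <= {"read", "write"}
        conflicting = accesses and ti != tj and vi == vj and "write" in (ki, kj)
        if conflicting and not (hb[i][j] or hb[j][i]):
            found.append((i, j))
    return found

# Two threads that each bump a private counter and write x under lock l.
locked = [
    (1, "write", "t1"), (1, "acquire", "l"), (1, "write", "x"), (1, "release", "l"),
    (2, "write", "t2"), (2, "acquire", "l"), (2, "write", "x"), (2, "release", "l"),
]
# The same two writes to x without any locking.
unlocked = [(1, "write", "x"), (2, "write", "x")]

print(dataraces(locked))    # []
print(dataraces(unlocked))  # [(0, 1)]
```

In the locked execution, the release of l by thread 1 synchronizes-with the later acquire by thread 2, so the two writes to x are ordered by hb.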
Datarace-Free Programs
Thread 1:
  t1++;
  lock l;
  x = 1;
  unlock l;
Thread 2:
  t2++;
  lock l;
  x = 2;
  unlock l;
po edges: between consecutive statements of the same thread.
sw edge: from the unlock l that runs first to the lock l of the other thread.
buf *p; lock l;
p = new (...);
p->data = new (...);
*p->data = VAL;
spawn ("prod"); spawn ("cons");
prod () {
  while (1) {
    lock (l);
    oldv = *p->data;
    free (p->data);
    newv = nextv (oldv);
    p->data = new (...);
    *p->data = newv;
    unlock (l);
  }
}
cons () {
  while (1) {
    lock (l);
    v = *p->data;
    unlock (l);
  }
}
Dataflow Analysis for
Concurrent Programs
Limitations of existing approaches, each followed by the corresponding goal of this work:
Kill dataflow facts conservatively.
– More precise.
Track interleavings precisely.
– More efficient.
Handle simple program constructs.
– Handle modern language constructs.
Handle simple analyses.
– Handle more complex analyses.
Facts in the main thread, in slide order: p | p,p->data | p,p->data
buf *p; lock l;
p = new (...);
p->data = new (...);
*p->data = VAL;
spawn ("prod"); spawn ("cons");
prod () {
  while (1) {
    [p,p->data]  lock (l);
    [p,p->data]  oldv = *p->data;
    [p,p->data]  free (p->data);
    [p]          newv = nextv (oldv);
    [p]          p->data = new (...);
    [p,p->data]  *p->data = newv;
    [p,p->data]  unlock (l);
  }
}
cons () {
  while (1) {
    lock (l);
    v = *p->data;
    unlock (l);
  }
}
Facts in cons, in slide order: p,p->data | p,p->data | p,p->data | p,p->data
Facts in the main thread, in slide order: p | p,p->data | p,p->data
buf *p; lock l;
p = new (...);
p->data = new (...);
*p->data = VAL;
spawn ("prod"); spawn ("cons");
prod () {
  while (1) {
    [p,p->data]  lock (l);
    [p,p->data]  oldv = *p->data;
    [p,p->data]  free (p->data);
    [p]          unlock (l);
    [p]          newv = nextv (oldv);
    [p]          lock (l);
    [p]          p->data = new (...);
    [p,p->data]  *p->data = newv;
    [p,p->data]  unlock (l);
  }
}
cons () {
  while (1) {
    lock (l);
    v = *p->data;
    unlock (l);
  }
}
Facts in cons, in slide order: p,p->data | p | p | p
Our Algorithm for Lifting Sequential
Analyses for Concurrent Programs
Build sync-CFG: add may-synchronize-with edges
from release to corresponding acquire
instructions, if they can run in parallel.
– From fork to first instruction of child thread.
– From unlock to lock instructions on same lock
variable.
– From last instruction of a child thread to join
instruction waiting for it.
– …
– May need to over-approximate the edges.
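A minimal sketch of the edge construction listed above, under simplifying assumptions that are not from the talk: threads are straight-line instruction lists, each static thread is started once, and "can run in parallel" is crudely over-approximated by "belongs to a different thread". The names Instr and build_sync_cfg are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Instr:
    thread: str
    index: int
    op: str        # "lock", "unlock", "fork", "join", or "other"
    arg: str = ""  # lock name, or child thread name for fork/join

def build_sync_cfg(threads):
    """threads: dict mapping thread name -> list of (op, arg) pairs.
    Returns (program-order edges, may-synchronize-with edges)."""
    nodes = {t: [Instr(t, i, op, arg) for i, (op, arg) in enumerate(body)]
             for t, body in threads.items()}
    po_edges = [(a, b) for body in nodes.values() for a, b in zip(body, body[1:])]
    everything = [ins for body in nodes.values() for ins in body]
    sync_edges = []
    for ins in everything:
        if ins.op == "fork":
            # fork -> first instruction of the child thread
            sync_edges.append((ins, nodes[ins.arg][0]))
        elif ins.op == "join":
            # last instruction of the child thread -> join
            sync_edges.append((nodes[ins.arg][-1], ins))
        elif ins.op == "unlock":
            # unlock -> every lock on the same lock variable in another thread
            # (stand-in for a real may-happen-in-parallel analysis)
            for other in everything:
                if (other.op == "lock" and other.arg == ins.arg
                        and other.thread != ins.thread):
                    sync_edges.append((ins, other))
    return po_edges, sync_edges

threads = {
    "main": [("fork", "child"), ("lock", "l"), ("other", ""),
             ("unlock", "l"), ("join", "child")],
    "child": [("lock", "l"), ("other", ""), ("unlock", "l")],
}
po_edges, sync_edges = build_sync_cfg(threads)
for src, dst in sync_edges:
    print(f"{src.thread}[{src.index}] {src.op} -> {dst.thread}[{dst.index}] {dst.op}")
```

Extra edges only cost precision, which is why over-approximating the edge set is acceptable.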
Our Algorithm for Lifting Sequential
Analyses for Concurrent Programs
Sequential analysis on sync-CFG:
– Treat the flow function of
synchronization instructions as the identity (id).
– Construct flow equations on sync-CFG.
– Compute least fixed point (lfp) of flow
equations.
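As an illustration of these steps, here is a small worklist solver run on a hand-built sync-CFG. It is a sketch under assumptions: the facts are sets of lvalues known to be non-null (so join is intersection and None plays the role of bottom), the node names are hypothetical, and the graph is only loosely modeled on the producer/consumer variant in which prod releases the lock between free and the new allocation.

```python
def join(a, b):
    """Must-facts: None is bottom (unreached); otherwise intersect."""
    if a is None:
        return b
    if b is None:
        return a
    return a & b

def analyze(flow, edges, entry, entry_fact):
    """flow: node -> (gen, kill). edges: CFG edges plus may-synchronize-with
    edges, treated alike. Returns node -> fact holding before the node."""
    succs = {}
    for src, dst in edges:
        succs.setdefault(src, []).append(dst)
    before = {node: None for node in flow}
    before[entry] = frozenset(entry_fact)
    worklist = [entry]
    while worklist:
        node = worklist.pop()
        gen, kill = flow[node]
        after = (before[node] - kill) | gen
        for nxt in succs.get(node, []):
            merged = join(before[nxt], after)
            if merged != before[nxt]:
                before[nxt] = merged
                worklist.append(nxt)
    return before

ID = (frozenset(), frozenset())  # identity flow function for sync instructions
flow = {
    "spawn": ID,
    "p_lock1": ID,
    "p_free": (frozenset(), frozenset({"p->data"})),
    "p_unlock1": ID,
    "p_lock2": ID,
    "p_alloc": (frozenset({"p->data"}), frozenset()),
    "p_unlock2": ID,
    "c_lock": ID, "c_deref": ID, "c_unlock": ID,
}
cfg_edges = [
    ("p_lock1", "p_free"), ("p_free", "p_unlock1"), ("p_unlock1", "p_lock2"),
    ("p_lock2", "p_alloc"), ("p_alloc", "p_unlock2"), ("p_unlock2", "p_lock1"),
    ("c_lock", "c_deref"), ("c_deref", "c_unlock"), ("c_unlock", "c_lock"),
]
sync_edges = [
    ("spawn", "p_lock1"), ("spawn", "c_lock"),
    ("p_unlock1", "c_lock"), ("p_unlock2", "c_lock"),
    ("c_unlock", "p_lock1"), ("c_unlock", "p_lock2"),
]
result = analyze(flow, cfg_edges + sync_edges, "spawn", {"p", "p->data"})
print(sorted(result["c_deref"]))  # ['p']
```

Because the first unlock in prod follows the free, the fact that p->data is non-null no longer reaches the dereference in cons, so that dereference is not proved safe.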
Restrictions on Analysis
Value Set analysis:
– Collects the set of values of each lvalue at
each program point; loses the correlation between lvalues.
– l := e : evaluate e on the input value
set and update the value set of l.
– if(e) : propagate values that can make e
true to true branch, similarly for false
branch.
– Join operation is point-wise union.
– Treats aliases conservatively.
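The flow functions described above might look roughly like the following. This is a sketch with hypothetical names (assign, branch, join); states are dictionaries from lvalue names to sets of values, expressions are passed as Python functions over named operands, and aliasing is not modeled at all.

```python
from itertools import product

def assign(state, target, operands, fn):
    """l := e. Evaluate e on every combination of operand values."""
    combos = product(*(state[v] for v in operands))
    new = dict(state)
    new[target] = {fn(*c) for c in combos}
    return new

def branch(state, operands, pred):
    """if(e). Returns (true_state, false_state), keeping for each operand
    only the values that can make e true (respectively false)."""
    combos = list(product(*(state[v] for v in operands)))

    def restrict(keep):
        new = dict(state)
        for pos, v in enumerate(operands):
            new[v] = {c[pos] for c in combos if bool(pred(*c)) == keep}
        return new

    return restrict(True), restrict(False)

def join(a, b):
    """Point-wise union of two states."""
    return {k: a.get(k, set()) | b.get(k, set()) for k in a.keys() | b.keys()}

s = {"x": {1, 2}}
s = assign(s, "y", ["x"], lambda x: x)             # y := x
s = assign(s, "z", ["x", "y"], lambda x, y: x - y) # z := x - y
print(sorted(s["z"]))                              # [-1, 0, 1]

on_true, on_false = branch(s, ["x"], lambda x: x > 1)
print(on_true["x"], on_false["x"])                 # {2} {1}
```

The value set of z shows the lost correlation: in every concrete run z is 0, but the analysis no longer knows that x and y are equal.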
Restrictions on Analysis (2)
Abstractions of value set analysis:
– A is an abstraction of VS if there are α and
γ such that α(lfp of VS) ≤ lfp of A and lfp of
VS ≤ γ(lfp of A).
– Null-pointer analysis, Interval analysis,
Constant propagation, May pointer
analysis…
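To make the α/γ condition concrete, here is a sketch for interval analysis restricted to integers. The single value set and interval stand in for the two least fixed points at one lvalue and program point; alpha, gamma, leq, and the sample numbers are illustrative assumptions.

```python
def alpha(values):
    """Abstract a set of ints to an interval (lo, hi); None for the empty set."""
    return (min(values), max(values)) if values else None

def gamma(interval):
    """Concretize an interval back to the set of ints it covers."""
    if interval is None:
        return set()
    lo, hi = interval
    return set(range(lo, hi + 1))

def leq(a, b):
    """Interval ordering: a is contained in b."""
    if a is None:
        return True
    if b is None:
        return False
    return b[0] <= a[0] and a[1] <= b[1]

vs_result = {1, 5, 9}      # stand-in for the lfp of VS
interval_result = (0, 9)   # hypothetical lfp of the interval analysis

print(leq(alpha(vs_result), interval_result))  # True
print(vs_result <= gamma(interval_result))     # True
```

Both checks mirror the two inequalities on the slide: the abstracted value set is below the abstract result, and the value set is contained in the concretized abstract result.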
Interpreting the Result
We assume that the value set of an
lvalue (or its abstraction) is relevant
only at those program points where
that lvalue is read.
– Result of null-pointer analysis (NPA) is important only where the
pointer is dereferenced.
– Result of constant propagation (CP) is important only where that
variable is read.
Our result is sound only for relevant
lvalues at a given program point.
Why does it work?
For Value Set analysis:
– LFP of sequential analysis over-approximates join-over-all-paths in sync-CFG.
– It is enough to show that if an execution
produces a value v for an lvalue l relevant
at a program point E, then there is a path
in sync-CFG that includes v in VS(l) at E.
Path in Sync-CFG
W: x = y
R: … = x
• Induction over execution length.
• W and R are related by hb.
• hb = (po ∪ sw)*
• Flow functions of po edges over-approximate execution behavior.
• Flow functions of sw edges are identity.
Context-Sensitive Analysis
Analysis domain:
– call string -> abstract state
On a call site c,
– [s -> a] -> [sc -> a]
On return to call site c,
– [sc -> a] -> [s -> a]
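The two call-string updates above can be sketched as follows, assuming unbounded call strings represented as tuples of call-site labels and abstract states as opaque values. The function names at_call and at_return are hypothetical.

```python
def at_call(state, site):
    """[s -> a] becomes [sc -> a]: append the call site to every call string."""
    return {cs + (site,): a for cs, a in state.items()}

def at_return(state, site):
    """[sc -> a] becomes [s -> a]: only call strings ending in this site return here."""
    return {cs[:-1]: a for cs, a in state.items() if cs and cs[-1] == site}

caller = {(): "A"}
callee = at_call(caller, "c1")
print(callee)                      # {('c1',): 'A'}

# The callee body is analyzed under two different calling contexts.
inside = {("c1",): "A", ("c2",): "B"}
print(at_return(inside, "c1"))     # {(): 'A'}
```

State B, which entered through call site c2, is not propagated back to c1. That separation is what makes the analysis context-sensitive.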
Context-Sensitive Analysis for
Concurrent Programs
Use a summary component at each
may-synchronize-with edge.
Join all the states at the release and put the result in the
summary.
Join the summary with all (non-bottom) states at the acquire.
Results
[Figure: bar chart of the % of dereferences proved safe (y-axis 0 to 100) for the benchmarks jdbf, jtds, and jdbm, comparing our technique with sequential analysis. Legend: all derefs, seq analysis, our analysis, actually safe.]
Comparison with RADAR
Sources of Imprecision
Alias analysis, may-happen-in-parallel
analysis, …
Representation of multiple dynamic
threads by a single static thread.
Paths in sync-CFG that do not
correspond to any real execution.
main() {
  fork(foo);
  …
  fork(foo);
}
foo() {
  lock l;
  x++;
  unlock l;
}
bar() {
  lock l;
  x++;
  unlock l;
}
baz() {
  lock l;
  x++;
  unlock l;
}
Conclusion
A dataflow analysis technique for DRF
programs.
Defined the conditions for soundness.
Demonstrated scalability and
precision.