Dataflow Analysis for Datarace-Free Programs (ESOP ’11)
Arnab De
Joint work with Deepak D’Souza and Rupesh Nasre
Indian Institute of Science, Bangalore
Why Datarace-Free Programs?
• Java, C++, … programs fall into two classes:
– Racy programs: very weak guarantees.
– DRF programs: sequentially consistent semantics.
• Dataraces are often indicators of bugs.
SC for DRF
• Verifier: is the program DRF?
– No: bug / memory-model-specific reasoning required.
– Yes: analysis for DRF programs!
• Compiler: assume DRF, perform optimization, produce optimized code.
Datarace-Free Programs
• In an execution, a release action synchronizes-with (sw) every acquire action on the same variable that comes after it.
• In an execution, the happens-before (hb) relation is the reflexive, transitive closure of synchronizes-with and program-order.
• A program is datarace-free if, in all its SC executions, all conflicting accesses are ordered by happens-before.
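The definitions above can be tried out on a single recorded execution. The following is a minimal sketch, not taken from the slides: it assumes an execution is a list of events `(thread, kind, var)` in execution order, with hypothetical kinds `"acq"`, `"rel"`, `"rd"` and `"wr"`; the function names are illustrative only.

```python
from itertools import combinations

def happens_before(trace):
    """Return the set of index pairs (i, j) such that trace[i] hb trace[j]."""
    n = len(trace)
    hb = {(i, i) for i in range(n)}                 # reflexive
    for i, j in combinations(range(n), 2):          # i comes before j
        ti, ki, vi = trace[i]
        tj, kj, vj = trace[j]
        if ti == tj:                                # program-order
            hb.add((i, j))
        elif ki == "rel" and kj == "acq" and vi == vj:
            hb.add((i, j))                          # synchronizes-with
    for k in range(n):                              # transitive closure
        for i in range(n):
            for j in range(n):
                if (i, k) in hb and (k, j) in hb:
                    hb.add((i, j))
    return hb

def races(trace):
    """Return pairs of conflicting accesses that are not ordered by hb."""
    hb = happens_before(trace)
    found = []
    for i, j in combinations(range(len(trace)), 2):
        ti, ki, vi = trace[i]
        tj, kj, vj = trace[j]
        conflicting = (ti != tj and vi == vj
                       and {ki, kj} <= {"rd", "wr"} and "wr" in (ki, kj))
        if conflicting and (i, j) not in hb:
            found.append((i, j))
    return found

# Two threads writing x, each inside a critical section on lock l.
locked = [("t1", "acq", "l"), ("t1", "wr", "x"), ("t1", "rel", "l"),
          ("t2", "acq", "l"), ("t2", "wr", "x"), ("t2", "rel", "l")]
# The same two writes without any synchronization.
unlocked = [("t1", "wr", "x"), ("t2", "wr", "x")]
```

Here `races(locked)` is empty because the release in `t1` synchronizes-with the later acquire in `t2`, while `races(unlocked)` reports the two writes. A program is DRF only if this holds for all of its SC executions, which a check of one trace does not establish.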
Datarace-Free Programs
Thread 1:
t1++;
lock l;
x = 1;
unlock l;
Thread 2:
t2++;
lock l;
x = 2;
unlock l;
po edges: between consecutive actions of the same thread.
sw edge: from unlock l in thread 1 to lock l in thread 2.
buf *p; lock l;
p = new (...);
p->data = new (...);
*p->data = VAL;
spawn ("prod"); spawn ("cons");
prod () {
    while (1) {
        lock (l);
        oldv = *p->data;
        free (p->data);
        newv = nextv (oldv);
        p->data = new (...);
        *p->data = newv;
        unlock (l);
    }
}
cons () {
    while (1) {
        lock (l);
        v = *p->data;
        unlock (l);
    }
}
Dataflow Analysis for
Concurrent Programs
Limitations of existing approaches, each followed (–) by what this work aims for:
Kill dataflow facts conservatively.
– More precise.
Track interleavings precisely.
– More efficient.
Handle simple program constructs.
– Handle modern language constructs.
Handle simple analyses.
– Handle more complex analyses.
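[Annotated example: the bracketed set before a statement lists the pointers known to be non-null at that point.]
Annotations beside main, in slide order: [p], [p, p->data], [p, p->data]
buf *p; lock l;
p = new (...);
p->data = new (...);
*p->data = VAL;
spawn ("prod"); spawn ("cons");
prod () {
    while (1) {
        [p, p->data]  lock (l);
        [p, p->data]  oldv = *p->data;
        [p, p->data]  free (p->data);
        [p]           newv = nextv (oldv);
        [p]           p->data = new (...);
        [p, p->data]  *p->data = newv;
        [p, p->data]  unlock (l);
    }
}
cons () {
    while (1) {
        lock (l);
        v = *p->data;
        unlock (l);
    }
}
Annotations beside cons, in slide order: [p, p->data], [p, p->data], [p, p->data], [p, p->data]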
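[Annotated variant: prod releases l after free (p->data) and re-acquires it before the new allocation. The bracketed set before a statement lists the pointers known to be non-null at that point.]
Annotations beside main, in slide order: [p], [p, p->data], [p, p->data]
buf *p; lock l;
p = new (...);
p->data = new (...);
*p->data = VAL;
spawn ("prod"); spawn ("cons");
prod () {
    while (1) {
        [p, p->data]  lock (l);
        [p, p->data]  oldv = *p->data;
        [p, p->data]  free (p->data);
        [p]           unlock (l);
        [p]           newv = nextv (oldv);
        [p]           lock (l);
        [p]           p->data = new (...);
        [p, p->data]  *p->data = newv;
        [p, p->data]  unlock (l);
    }
}
cons () {
    while (1) {
        lock (l);
        v = *p->data;
        unlock (l);
    }
}
Annotations beside cons, in slide order: [p, p->data], [p], [p], [p]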
Our Algorithm for Lifting Sequential
Analyses for Concurrent Programs
• Build sync-CFG: add may-synchronize-with edges from release to corresponding acquire instructions, if they can run in parallel.
– From fork to the first instruction of the child thread.
– From unlock to lock instructions on the same lock variable.
– From the last instruction of a child thread to the join instruction waiting for it.
– …
– May need to over-approximate the edges.
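The edge rules listed above can be sketched as follows. This is an illustration under assumptions that are not in the slides: instructions are given as a hypothetical table `id -> (thread, op, arg)`, `first_of` and `last_of` map a thread name to its first and last instruction, and `mhp` stands in for a may-happen-in-parallel oracle. The default oracle answers "yes" for every pair, which over-approximates the edges.

```python
def may_sync_edges(instrs, first_of, last_of, mhp=lambda a, b: True):
    """Return the set of may-synchronize-with edges (release, acquire)."""
    edges = set()
    for a, (_, op_a, arg_a) in instrs.items():
        if op_a == "fork":
            # fork -> first instruction of the child thread
            edges.add((a, first_of[arg_a]))
        elif op_a == "join":
            # last instruction of the child thread -> join waiting for it
            edges.add((last_of[arg_a], a))
        elif op_a == "unlock":
            # unlock -> every lock on the same lock variable that may run in parallel
            for b, (_, op_b, arg_b) in instrs.items():
                if op_b == "lock" and arg_b == arg_a and mhp(a, b):
                    edges.add((a, b))
    return edges

# A cut-down producer/consumer program (instruction ids are arbitrary).
instrs = {
    0: ("main", "fork", "prod"),
    1: ("main", "fork", "cons"),
    2: ("prod", "lock", "l"),
    3: ("prod", "other", None),
    4: ("prod", "unlock", "l"),
    5: ("cons", "lock", "l"),
    6: ("cons", "other", None),
    7: ("cons", "unlock", "l"),
}
first_of = {"prod": 2, "cons": 5}
last_of = {"prod": 4, "cons": 7}

coarse = may_sync_edges(instrs, first_of, last_of)
# A slightly sharper oracle: an instruction never runs in parallel with its own thread.
other_thread = lambda a, b: instrs[a][0] != instrs[b][0]
finer = may_sync_edges(instrs, first_of, last_of, mhp=other_thread)
```

The `other_thread` oracle is only valid when each static thread has a single dynamic instance; with the default oracle the result also contains unlock-to-lock edges inside one thread.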
Our Algorithm for Lifting Sequential
Analyses for Concurrent Programs
Sequential analysis on sync-CFG:
– Treat the flow function of every synchronization instruction as the identity (id).
– Construct flow equations on sync-CFG.
– Compute least fixed point (lfp) of flow
equations.
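As an illustration of these steps, the sketch below runs a plain worklist iteration over a hand-built sync-CFG for the producer/consumer example from the earlier slides, tracking the set of pointers known to be non-null. The node names (`m0`, `p0`, `c0`, …), the choice of sync edges and the helper names are assumptions made for this example, not the authors' implementation.

```python
def solve(succs, entry, init, transfer, join):
    """Forward worklist iteration; a fact of None means 'not reached yet'."""
    fact_in = {n: None for n in succs}
    fact_in[entry] = init
    work = [entry]
    while work:
        n = work.pop()
        out = transfer(n, fact_in[n])
        for s in succs[n]:
            new = out if fact_in[s] is None else join(fact_in[s], out)
            if new != fact_in[s]:
                fact_in[s] = new
                work.append(s)
    return fact_in

# node -> (facts generated, facts killed). Nodes not listed are identity,
# which covers lock, unlock and spawn.
GEN_KILL = {
    "m0": ({"p"}, set()),          # p = new (...)
    "m1": ({"p->data"}, set()),    # p->data = new (...)
    "p2": (set(), {"p->data"}),    # free (p->data)
    "p4": ({"p->data"}, set()),    # p->data = new (...)
}

def transfer(node, fact):
    gen, kill = GEN_KILL.get(node, (set(), set()))
    return frozenset((fact - kill) | gen)

def chain(*nodes):
    """Edges between consecutive nodes."""
    return list(zip(nodes, nodes[1:]))

cfg_edges = (
    chain("m0", "m1", "m2", "m3", "m4")                        # main
    + chain("p0", "p1", "p2", "p3", "p4", "p5", "p6", "p0")    # prod loop
    + chain("c0", "c1", "c2", "c0")                            # cons loop
)
sync_edges = [
    ("m3", "p0"), ("m4", "c0"),    # spawn -> first instruction of the child
    ("p6", "c0"), ("c2", "p0"),    # unlock -> lock on l in the other thread
]

succs = {}
for a, b in cfg_edges + sync_edges:
    succs.setdefault(a, []).append(b)
    succs.setdefault(b, [])

# Must-analysis: a pointer is non-null only if it is non-null on every incoming edge.
nonnull = solve(succs, "m0", frozenset(), transfer, lambda x, y: x & y)
```

In this run `nonnull["c1"]` contains both `p` and `p->data`, so the dereference `v = *p->data` in `cons` is proved safe, while `nonnull["p3"]` contains only `p` because `free (p->data)` kills the second fact inside the critical section.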
Restrictions on Analysis
Value Set analysis:
– Collects the set of values of each lvalue at each program point, losing the correlation between lvalues.
– l := e : evaluate e on the input value
set and update the value set of l.
– if(e) : propagate values that can make e
true to true branch, similarly for false
branch.
– Join operation is point-wise union.
– Treats aliases conservatively.
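A small sketch of these transfer functions, under simplifying assumptions that are not in the slides: a state is a dict from variable name to a finite set of values, there are no aliases, and a branch condition tests a single variable. All function names are illustrative.

```python
from itertools import product

def assign(state, lhs, expr, operands):
    """l := e. Evaluate expr on every combination of operand values."""
    combos = product(*(state[v] for v in operands))
    new = dict(state)
    new[lhs] = frozenset(expr(*c) for c in combos)
    return new

def branch(state, var, pred):
    """if (e): split the values of var between the true and false branches."""
    on_true, on_false = dict(state), dict(state)
    on_true[var] = frozenset(v for v in state[var] if pred(v))
    on_false[var] = frozenset(v for v in state[var] if not pred(v))
    return on_true, on_false

def join(s1, s2):
    """Point-wise union of two states."""
    empty = frozenset()
    return {k: s1.get(k, empty) | s2.get(k, empty) for k in s1.keys() | s2.keys()}

s0 = {"x": frozenset({1, 2})}
s1 = assign(s0, "y", lambda x: x + 1, ["x"])          # y in {2, 3}
s2 = assign(s1, "z", lambda y, x: y - x, ["y", "x"])  # z in {0, 1, 2}
pos, nonpos = branch(s2, "z", lambda v: v > 0)
merged = join(pos, nonpos)
```

In every concrete run `z` equals 1, but the analysis reports `{0, 1, 2}` because the value sets of `x` and `y` are combined independently. This is the loss of correlation mentioned above.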
Restrictions on Analysis (2)
Abstractions of value set analysis:
– A is an abstraction of VS if there are α and
γ such that α(lfp of VS) ≤ lfp of A and lfp of
VS ≤ γ(lfp of A).
– Null-pointer analysis, Interval analysis,
Constant propagation, May pointer
analysis…
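To make α and γ concrete, here is a minimal sketch for one of the listed abstractions, interval analysis, over finite non-empty sets of integers. It only illustrates the abstraction and concretization maps on individual value sets; it does not check the condition on least fixed points stated above. The function names are assumptions.

```python
def alpha(values):
    """Abstract a finite, non-empty set of ints to its enclosing interval."""
    return (min(values), max(values))

def gamma(interval):
    """Concretize an interval to the set of ints it contains."""
    lo, hi = interval
    return frozenset(range(lo, hi + 1))

def join(a, b):
    """Least interval containing both a and b."""
    return (min(a[0], b[0]), max(a[1], b[1]))

vs1 = frozenset({1, 5})
vs2 = frozenset({7, 9})
abstract = join(alpha(vs1), alpha(vs2))   # interval covering both value sets
```

The round trip `gamma(alpha(vs1))` contains `vs1` together with the extra values 2, 3 and 4, which is the imprecision an abstraction is allowed to add.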
Interpreting the Result
We assume that the value set of an
lvalue (or its abstraction) is relevant
only at those program points where
that lvalue is read.
– Result of NPA is important only where the
pointer is dereferenced.
– Result of CP is important only where that
variable is read.
Our result is sound only for relevant
lvalues at a given program point.
Why does it work?
For Value Set analysis:
– LFP of sequential analysis over-approximates the join over all paths in the sync-CFG.
– It is enough to show that if an execution
produces a value v for an lvalue l relevant
at a program point E, then there is a path
in sync-CFG that includes v in VS(l) at E.
Path in Sync-CFG
W: x = y (write), followed later in the execution by R: … = x (read)
• Induction over execution length.
• W and R are related by hb.
• hb = (po ∪ sw)*
• Flow functions of po edges over-approximate execution behavior.
• Flow functions of sw edges are identity.
Context-Sensitive Analysis
Analysis domain:
– call string -> abstract state
On a call site c,
– [s -> a] -> [sc -> a]
On return to call site c,
– [sc -> a] -> [s -> a]
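The two rules above can be sketched as follows, assuming (for illustration only) that a call string is a tuple of call-site names, that its length is unbounded, and that the analysis state is a dict from call string to abstract state.

```python
def on_call(state, c):
    """[s -> a] becomes [sc -> a]: append call site c to every call string."""
    return {s + (c,): a for s, a in state.items()}

def on_return(state, c):
    """[sc -> a] becomes [s -> a]: keep only call strings ending in c and pop it."""
    return {s[:-1]: a for s, a in state.items() if s and s[-1] == c}

at_entry = {(): "a0"}
in_callee = on_call(at_entry, "c1")       # {("c1",): "a0"}
back = on_return(in_callee, "c1")         # {(): "a0"}
```

Returning to a call site other than the one on top of the call string yields no state, which is what keeps different calling contexts apart.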
Context-Sensitive Analysis for
Concurrent Programs
Use a summary component at each
may-synchronize-with edge.
Join all the states at the release and put the result in the
summary.
Join the summary with all (non-bottom) states at the acquire.
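The two join steps can be sketched independently of which end of the edge they run at. The sketch assumes a state is a dict from call string to abstract state, uses `None` as a hypothetical marker for bottom, and takes the join of the underlying analysis as a parameter.

```python
from functools import reduce

BOTTOM = None  # marks a call string with no reachable state

def summarize(states, join):
    """Join the non-bottom abstract states of all call strings into one summary."""
    live = [a for a in states.values() if a is not BOTTOM]
    return reduce(join, live) if live else BOTTOM

def absorb(states, summary, join):
    """Join the summary into every non-bottom state, keeping call strings apart."""
    if summary is BOTTOM:
        return dict(states)
    return {s: (a if a is BOTTOM else join(a, summary)) for s, a in states.items()}

union = lambda a, b: a | b
one_end = {("c1",): frozenset({"p"}), ("c2",): frozenset({"q"})}
other_end = {("c3",): frozenset({"r"}), ("c4",): BOTTOM}

summary = summarize(one_end, union)
result = absorb(other_end, summary, union)
```

The summary forgets which call string each fact came from, since the call strings of two different threads are unrelated; call strings that are still bottom at the receiving end stay bottom.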
Results
[Bar chart: % of dereferences proved safe (axis 0 to 100) for jdbf, jtds and jdbm; series: our technique, sequential analysis.]
[Diagram labels: all derefs, seq analysis, our analysis, actually safe.]
Comparison with RADAR
Sources of Imprecision
Alias analysis, may-happen-in-parallel
analysis, …
Representation of multiple dynamic
threads by a single static thread.
Paths in sync-CFG that do not
correspond to any real execution.
main() {
    fork(foo);
    …
    fork(foo);
}
foo() {
    lock l;
    x++;
    unlock l;
}
bar() {
    lock l;
    x++;
    unlock l;
}
baz() {
    lock l;
    x++;
    unlock l;
}
Conclusion
A dataflow analysis technique for DRF
programs.
Defined the conditions for soundness.
Demonstrated scalability and
precision.