Dynamic Data Race Detection

Sources
• Eraser: A Dynamic Data Race Detector for Multithreaded Programs
– Stefan Savage, Michael Burrows, Greg Nelson, Patrick Sobalvarro, Thomas Anderson. ACM Transactions on Computer Systems, Vol. 15, No. 4, November 1997.
• RaceTrack: Efficient Detection of Data Race Conditions via Adaptive Tracking
– Yuan Yu, Tom Rodeheffer, Wei Chen. Proceedings of SOSP '05, ACM, 2005.
The Shared Problem
• Problem: Data race detection in multithreaded
programs. (Implies shared memory)
• Solution: a tool that automates the detection of
potential data races
– Each paper describes a different method or technique
• Basic idea: look for “unprotected” accesses to
shared variables.
• Why important: synchronization errors based on
data races are
– Timing dependent
– Hard to find
Data Race
• “A data race occurs when two concurrent
threads access a shared variable and …
– at least one access is a write and
– the threads use no explicit mechanism to
prevent the accesses from being
simultaneous”
• In other words, a data race can lead to a
potential violation of mutual exclusion.
Data Race Example: Threads with
unsynchronized access to a shared array

Thread 1
int i;
…
for (i = 1; i < MAX; i++)
{
    cin >> x;
    A[i] = 2*x;
}
…

Thread 2
int i;
…
for (i = 1; i < MAX; i++)
{
    if (A[i] < B[i])
        B[i] = A[i];
}
…
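For contrast, here is a hedged sketch of the same two loops with the shared data protected by a single lock; the mutex m, the array size, and the use of C++ std::thread are assumptions added for illustration, not part of the slide. Because both threads hold the same lock around every access to A and B, the accesses can no longer be simultaneous.

#include <iostream>
#include <mutex>
#include <thread>

constexpr int MAX = 100;
int A[MAX], B[MAX];
std::mutex m;                       // assumed lock protecting A and B

void thread1() {
    int x;
    for (int i = 1; i < MAX; i++) {
        std::cin >> x;              // input kept outside the critical section
        std::lock_guard<std::mutex> guard(m);
        A[i] = 2 * x;
    }
}

void thread2() {
    for (int i = 1; i < MAX; i++) {
        std::lock_guard<std::mutex> guard(m);
        if (A[i] < B[i])
            B[i] = A[i];
    }
}

int main() {
    std::thread t1(thread1), t2(thread2);
    t1.join();
    t2.join();
}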
Static Data Race Detection
• Static data race detection can be done at
compile time.
– Type-based methods; a language-level approach
– Path analysis of code; a compile-time approach
• Hard to apply to dynamically allocated data
• Doesn’t scale well to large programs
• Many false positives – it’s hard to reason
about execution behavior.
Dynamic Data Race Detection
• Dynamic detection is done by code that
monitors the software during execution.
– The program may be “instrumented” with
additional instructions
– The additions don’t change program
functionality but are used to monitor
conditions of interest - in this case, access to
shared variables and synchronization
operations.
Dynamic Detection
• Post mortem or on-the-fly analysis of code
traces
• Problems:
– Can only check paths that are actually executed
– Adds overhead at runtime
• Techniques
– Happens-before (earliest dynamic technique)
– Lockset analysis (Eraser)
– Various hybrids (RaceTrack)
Dynamic Race Detection Using
happens-before
• Definition of happens-before for data race
detection uses accesses to
synchronization objects (locks) to
synchronize separate threads.
– Compare to use of messages to synchronize
separate processes in previous applications.
• In a single thread, happens-before reflects
the temporal order of event occurrence (as
always).
Happens-before Relation
• Between threads, events can be causally
connected when a lock is accessed in a
thread (A) and the next access to that lock is
in a different thread B (lock access replaces
message exchange)
• Accesses must obey the semantics of locks:
– only the owner of a lock can unlock it,
– two threads can’t hold the same lock
simultaneously.
Happens-before Relation
• Let event a be in thread A and event b be
in thread B.
– If a = unlock(mu) and b = lock(mu) then
a → b (a happens-before b)
• Data races between threads are possible if
accesses to shared variables are not
ordered by happens-before.
EXAMPLE: Fig. 1

Thread 1
lock(mu);
v = v + 1;
unlock(mu);

Thread 2
lock(mu);
v = v + 1;
unlock(mu);

In the figure, the arrows represent happens-before.
The events represent an actual execution of the two threads.
Instead of a logical clock, each thread might maintain a "most recent event" variable. In T1, the most recent event is unlock(mu); when T2 executes lock(mu) the system can establish the happens-before relation.
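A minimal sketch of that bookkeeping, assuming a scalar "most recent event" counter per thread and per lock rather than the papers' actual data structures; the names TrackedLock, tracked_lock, and tracked_unlock are invented for illustration.

#include <algorithm>
#include <cstdio>
#include <mutex>

// Each thread keeps a logical clock; each lock records the clock value of its
// most recent unlock.  When another thread later acquires the lock, everything
// that preceded that unlock is known to happen-before everything that follows
// the acquire.
struct TrackedLock {
    std::mutex mu;
    long last_release_clock = 0;    // "most recent event" recorded at unlock(mu)
};

struct ThreadState {
    int  id;
    long clock = 0;                 // this thread's logical clock
};

void tracked_lock(TrackedLock& l, ThreadState& t) {
    l.mu.lock();
    // Establish the happens-before edge: unlock(mu) in the releasing thread
    // precedes this lock(mu).
    t.clock = std::max(t.clock, l.last_release_clock) + 1;
}

void tracked_unlock(TrackedLock& l, ThreadState& t) {
    l.last_release_clock = ++t.clock;
    l.mu.unlock();
}

int main() {
    TrackedLock mu;
    ThreadState t1{1, 0}, t2{2, 0};
    int v = 0;

    // Shown serialized for clarity; in a real program each block runs in its own thread.
    tracked_lock(mu, t1);  v = v + 1;  tracked_unlock(mu, t1);   // Thread 1
    tracked_lock(mu, t2);  v = v + 1;  tracked_unlock(mu, t2);   // Thread 2

    std::printf("T1 clock = %ld, T2 clock = %ld, v = %d\n", t1.clock, t2.clock, v);
}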
EXAMPLE: Fig. 2

Thread 1
y = y + 1;
lock(mu);
v = v + 1;
unlock(mu);

Thread 2
lock(mu);
v = v + 1;
unlock(mu);
y = y + 1;

Accesses to both y and v are ordered by happens-before, so no data race occurred.
But … a different execution ordering could get different results.
Happens-before only detects data races if the incorrect order shows up in an execution trace.
EXAMPLE: Fig. 2 (a different interleaving)

Thread 2
lock(mu);
v = v + 1;
unlock(mu);
y = y + 1;

Thread 1
y = y + 1;
…
lock(mu);
v = v + 1;
unlock(mu);

The accesses to y are "concurrent," since neither a → b nor b → a.
If Thread 2 executes before Thread 1, happens-before no longer orders the two accesses to y, so a potential data race exists and should be reported to the programmer.
Problems with happens-before
• Eraser would find the error for any test case that
included both code paths, regardless of the
order; happens-before analysis only works if the
dangerous schedule is executed.
• Since there are many possible interleavings, you
can’t be sure to test them all so you might miss a
potential error.
• Eraser might miss some data races, but it will
catch more than tools based only on happens-before.
Lockset Analysis Background
• Lock: a synchronization object that is
either available, or owned (by a thread).
– Operations: lock(mu) and unlock(mu).
– No explicit initialize operation.
• Compare to binary semaphore
– Lock( ) ~ P( ); Unlock ~ V( )
– The Lock( ) operation is blocking if the lock is
owned by another thread.
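A hedged sketch of the Lock() ~ P(), Unlock() ~ V() correspondence described above, assuming C++20's std::binary_semaphore (an API choice made here, not something the papers use).

#include <semaphore>                 // C++20

// A binary semaphore initialized to 1 behaves like an available lock:
// acquire() (P) blocks while another thread "owns" it, release() (V) makes it
// available again.  Unlike a true mutex, nothing enforces that only the owner
// releases it, which the slide lists as part of lock semantics.
class SemaphoreLock {
    std::binary_semaphore sem{1};    // 1 = available
public:
    void lock()   { sem.acquire(); } // P()
    void unlock() { sem.release(); } // V()
};

int main() {
    SemaphoreLock mu;
    mu.lock();
    // ... critical section ...
    mu.unlock();
}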
Background
• Simple mutex locks are not the only kind.
• Some systems provide others:
– Read/write locks permit multiple readers, but
only one writer.
• Some shared-memory accesses don’t
need locks at all
– Read-only data: initialized and then never
written again.
Basic Premise of Eraser
• Observe all instances where a shared
variable is accessed by a thread.
• If there is a chance that a data race can
occur, be sure the shared variable is
protected by a lock.
– Simple algorithm – basic locks
– Advanced algorithm – reader/writer locks
• If a variable isn’t protected, issue a warning.
How Eraser Works
• Requires each shared variable to be protected
by a lock. (the same lock for all threads)
• Eraser will monitor all reads and writes (loads
and stores) of a variable as the program runs.
• Eraser must deduce which locks protect each
shared variable.
• Eraser assumes that it knows the full set of locks
in advance (they must be declared in the code).
• Protects at the word level; i.e., a word is
considered to be a variable.
How It Works
(see Section 2)
• For each variable v build a set of locks
C(v) that holds candidate locks (locks that
may be protecting v).
– l is in C(v) if every thread that has accessed v
so far was holding l at the time of access.
• Lockset refinement: C(v) is adjusted every
time v is accessed.
• If C(v) becomes empty, the variable is
assumed to be unprotected.
The First Lockset Algorithm
• Let locks_held(t) be the set of locks held by
thread t. (a per-thread structure)
• For each v, initialize C(v) to the set of all
locks. (a per-variable structure)
• Lock sets change over time.
• Each time a thread t accesses variable v
– Set C(v) = C(v) ∩ locks_held(t)
– If C(v) = ∅, issue a warning
Example (Fig. 3)
• If a program has two locks, mu1 and mu2,
then C(v) is initially {mu1, mu2}.
• If the first access to v is in a thread holding
mu1, then C(v) ∩ locks_held(t) = {mu1}.
• If the second access to v is in a thread
holding mu2, then C(v) ∩ locks_held(t) = ∅.
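A minimal sketch of the first lockset algorithm, replaying the Fig. 3 scenario; the names (Variable, on_access, intersect) are invented for illustration, and this is not the paper's implementation.

#include <cstdio>
#include <set>
#include <string>

using Lock    = std::string;
using LockSet = std::set<Lock>;

LockSet intersect(const LockSet& a, const LockSet& b) {
    LockSet out;
    for (const Lock& l : a)
        if (b.count(l)) out.insert(l);
    return out;
}

struct Variable {
    std::string name;
    LockSet C;                                 // candidate locks C(v)
};

// Lockset refinement: on every access, C(v) is intersected with the locks the
// accessing thread holds; an empty C(v) means no lock protects v consistently.
void on_access(Variable& v, const LockSet& locks_held) {
    v.C = intersect(v.C, locks_held);
    if (v.C.empty())
        std::printf("warning: %s appears unprotected\n", v.name.c_str());
}

int main() {
    // The Fig. 3 scenario: two locks, first access holds mu1, second holds mu2.
    Variable v{"v", {"mu1", "mu2"}};           // C(v) starts as the set of all locks
    on_access(v, {"mu1"});                     // C(v) becomes {mu1}
    on_access(v, {"mu2"});                     // C(v) becomes {} -> warning
}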
Refining the Lockset Algorithm
• The previous algorithm is correct, but flags some
situations as potential race conditions when in
fact they aren’t: False alarms
– Variable initialization (restricted to one thread)
– Shared variables that are read-only
• Sections 2.2 and 2.3 discuss refinements to the
algorithm for avoiding some false alarms and
handling read-write locks as well as simple
locks.
Refinements
• Until a variable is accessed by a second
thread, there is no danger of a data race, so
there is no need to monitor it.
Figure 4: state transitions for a memory location, based on whether it has been accessed at all, accessed by more than one thread, accessed in read mode only, etc.
– virgin → exclusive: first write
– exclusive → shared: read by a new thread
– exclusive → shared-modified: write by a new thread
– shared → shared-modified: write
Race conditions are reported only for locations in the shared-modified state.
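A sketch of the Figure 4 state machine for a single memory location; the enum and the transition function are illustrative names, not Eraser's code, and the lockset checking that produces warnings in the shared-modified state is not shown.

#include <cstdio>

enum class State { Virgin, Exclusive, Shared, SharedModified };

struct Location {
    State state = State::Virgin;
    int   first_thread = -1;                    // thread that made the first access
};

// Transitions follow the slide's version of Figure 4; races are reported
// only once the location reaches the shared-modified state.
void on_access(Location& loc, int thread, bool is_write) {
    switch (loc.state) {
    case State::Virgin:                         // untouched so far
        if (is_write) {                         // the diagram shows the write transition
            loc.state = State::Exclusive;
            loc.first_thread = thread;
        }
        break;
    case State::Exclusive:                      // used by one thread only, so far
        if (thread != loc.first_thread)
            loc.state = is_write ? State::SharedModified : State::Shared;
        break;
    case State::Shared:                         // read-shared until someone writes
        if (is_write)
            loc.state = State::SharedModified;
        break;
    case State::SharedModified:                 // stays here; races reported here
        break;
    }
}

int main() {
    Location x;
    on_access(x, 1, true);                      // virgin -> exclusive
    on_access(x, 2, false);                     // exclusive -> shared (read, new thread)
    on_access(x, 2, true);                      // shared -> shared-modified
    std::printf("shared-modified reached: %d\n", x.state == State::SharedModified);
}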
Implementing Eraser
• Eraser instruments the program binary by
inserting calls to the Eraser runtime
functions.
• Each load and store is instrumented if it
accesses global or heap data. Stack data
is assumed not to be shared.
• The storage allocator is also instrumented
to initialize C(v) for dynamic data.
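A hypothetical before/after view of what the instrumentation conceptually adds around loads and stores; the hook names eraser_on_load and eraser_on_store are invented for illustration and are not Eraser's actual runtime interface.

#include <cstdio>

// Stand-in runtime hooks: a real tool would look up C(v) for the accessed word
// here and intersect it with the current thread's lock set.
static void eraser_on_load(const void* addr)  { std::printf("load  %p\n", addr); }
static void eraser_on_store(const void* addr) { std::printf("store %p\n", addr); }

// Original code:      *count = *count + 1;
// Instrumented form:  the same accesses, each preceded by a call into the runtime.
void increment(int* count) {
    eraser_on_load(count);
    int tmp = *count;
    eraser_on_store(count);
    *count = tmp + 1;
}

int main() {
    static int shared_counter = 0;   // global/heap data is instrumented; stack data is not
    increment(&shared_counter);
}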
Implementing Eraser
• Each call to the lock operation is
instrumented to keep locks_held(t)
updated.
• When a race is suspected (reference to a
shared variable that isn’t protected by a
lock) Eraser indicates the file and line #
plus other information that can help the
programmer locate the problem.
Conclusions
• A number of systems (AltaVista, the Petal
distributed file system) were used as
testbeds.
• Undergraduate programs were also
tested.
• Eraser found a number of potential race
conditions and had a few false alarms.
• Experienced programmers did better than
undergraduates!
Summary/Review
• Data race detection can be done statically
or dynamically
– Static: compile time analysis, examine all paths
or modify language type system to include
synchronization relationships
– Dynamic: run-time analysis, can only catch
errors if they are observed – can’t examine all
paths
• Eraser does a better job than happens-before
methods; it detects potential races in monitored
code regardless of the interleaving that happens
to be executed
EXAMPLE: Fig. 2

Thread 1
y = y + 1;
lock(mu);
v = v + 1;
unlock(mu);

Thread 2
lock(mu);
v = v + 1;
unlock(mu);
y = y + 1;

Eraser would notice that y is unprotected by a lock and thus detect a data race, even though happens-before would not.
Summary/Review
• Two earlier techniques
– Lockset analysis (Eraser): enforces the requirement
that every shared variable is protected by a lock
• Possible false positives, slow, not “sound”, but relatively
unaffected by execution order.
• May miss some races if a dangerous path is not tested.
– Happens-before analysis: based on Lamport’s
relation; establishes a partial ordering of statements
based on synchronization events
• No false positives, but may have false negatives
RaceTrack
• Claim: Improves on lockset analysis by
only looking for data races when shared
data is being accessed concurrently.
– Eraser does this too, but in a limited fashion
• Able to handle locks as well as fork-join
parallelism
• Monitors library code also
• Is sensitive to execution traces
Fork-Join
• A way to achieve parallelism
– Parent thread creates (forks) several subthreads
– Parent thread pauses
– Forked threads report results, parent thread
resumes execution (the join) and combines
child results
• Similar to UNIX approach with processes,
but finer granularity
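A minimal fork-join sketch using std::thread (an assumption; the slide does not name a particular threading API). The parent forks workers, waits for all of them (the join), and then combines their results.

#include <cstdio>
#include <numeric>
#include <thread>
#include <vector>

int main() {
    const int n_workers = 4;
    std::vector<int> partial(n_workers, 0);      // one result slot per child
    std::vector<std::thread> workers;

    for (int w = 0; w < n_workers; ++w)          // fork
        workers.emplace_back([w, &partial] {
            for (int i = 0; i < 1000; ++i)       // each child writes only its own slot,
                partial[w] += 1;                 // so there is no race on partial[w]
        });

    for (auto& t : workers)                      // join: parent pauses until all
        t.join();                                // children have finished

    int total = std::accumulate(partial.begin(), partial.end(), 0);
    std::printf("combined result: %d\n", total); // parent combines child results
}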
RaceTrack
• Does not claim to detect all concurrent
accesses (i.e., there may be false
negatives)
• Why: to detect all instances of concurrency
the tool would have to keep a complete
access history for each shared variable
– RaceTrack uses estimation techniques to
prune the threadset (set of accesses) and the
lockset.
Tool Environment
• Large multithreaded OO programs running
on the .NET platform
– All code is translated into an intermediate
language (IL), which is later compiled into
platform-specific code by the JIT compiler in
the Common Language Runtime (CLR). (Fig. 1)
• The CLR manages all runtime activities:
object allocation, thread creation, garbage
collection, exception handling.
Tool Environment
• RaceTrack instruments at the virtual
machine level (CLR ~ JVM)
– The JIT compiler in the CLR inserts calls to
RaceTrack routines as it generates native code
• RaceTrack is language independent, as
applications run directly on the modified
runtime environment.
RaceTrack versus Lockset
• Lockset-based detection does not take into
account fork/join operations (while only one
thread exists, no data race is possible) or
asynchronous (non-blocking) calls.
– Result: false alarms
• Observation: a data race can occur only if
several threads are concurrently accessing
the variable
RaceTrack Approach
• RaceTrack maintains a lockset Cx for each shared
variable x, but it also maintains a current threadset, Sx.
• Threadset = a set of concurrent accesses, where
“concurrent” is defined in terms of vector clocks.
– A thread’s virtual clock ticks at certain synchronization
operations;
– Synchronization ops transfer information about clock
values to other threads, which use it to update their own
vector clocks (just as messages are used in earlier
examples).
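A hedged sketch of that vector-clock bookkeeping; the class and function names are invented here, and the update rules follow the standard vector-clock definitions rather than RaceTrack's exact code.

#include <algorithm>
#include <vector>

// Bt[u] is thread t's latest knowledge of thread u's virtual clock.
struct VectorClock {
    std::vector<long> c;
    explicit VectorClock(int n_threads) : c(n_threads, 0) {}

    void tick(int self) { ++c[self]; }          // local clock advances at a sync op

    void merge(const VectorClock& other) {      // clock values received at a sync op
        for (std::size_t i = 0; i < c.size(); ++i)
            c[i] = std::max(c[i], other.c[i]);
    }
};

// An access made by thread u at timestamp stamp_u happens-before the current
// operation of the thread whose vector clock is B iff B already covers that stamp.
bool happened_before(int u, long stamp_u, const VectorClock& B) {
    return B.c[u] >= stamp_u;
}

int main() {
    VectorClock b0(2), b1(2);
    b0.tick(0);              // thread 0 performs a synchronization operation ...
    b1.merge(b0);            // ... and thread 1 later learns of it (e.g., via a lock)
    return happened_before(0, b0.c[0], b1) ? 0 : 1;   // ordered, so returns 0
}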
Threadsets
• Whenever a thread Tj accesses a shared
variable, it adds an entry (label) to that variable’s
threadset.
– Label = (thread id, timestamp of the access)
• Tj then uses happens-before analysis, based on
the vector clocks, to “prune” the threadset.
– Any label Li in the threadset which “happens-before”
the current access made by Tj is removed
– Any remaining accesses are considered “concurrent”
• Races are not considered to be a threat if the
threadset is a singleton
Basic Algorithm - threads
• Each thread t has a lockset Lt (locks-held)
and a vector clock Bt.
– Lockset: contains currently held locks
– Vector clock: most recent information about
the logical clocks of t and all other threads
• Lock and unlock operations update Lt
• Fork/join also update vector clocks.
• A thread’s local clock is set to 1 at thread creation;
its lockset starts out empty
Basic Algorithm - variables
• Each variable x has a lockset Cx and a
threadset Sx, where Cx is the set of locks
that are (potentially) currently protecting x
and Sx is the current set of concurrent
accesses to x.
– Initially, Sx is the empty set { } and Cx is
initialized to the set of all possible locks
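Putting the last two slides together, a simplified sketch of the per-access update (illustrative names; not the paper's exact pseudocode): prune the threadset using happens-before, then refine the lockset only when concurrent accesses remain.

#include <cstdio>
#include <set>
#include <string>
#include <vector>

using LockSet = std::set<std::string>;

struct Thread {
    int id;
    LockSet L;                       // Lt: locks currently held
    std::vector<long> B;             // Bt: vector clock
};

struct Label { int thread; long stamp; };   // (thread id, timestamp of access)

struct Variable {
    LockSet C;                       // Cx: candidate locks, initially all locks
    std::vector<Label> S;            // Sx: current set of concurrent accesses
};

LockSet intersect(const LockSet& a, const LockSet& b) {
    LockSet out;
    for (const auto& l : a)
        if (b.count(l)) out.insert(l);
    return out;
}

void on_access(Variable& x, const Thread& t, const char* name) {
    // Prune labels whose access happens-before this one (already covered by Bt),
    // then add a label for the current access.
    std::vector<Label> still_concurrent;
    for (const Label& lab : x.S)
        if (lab.stamp > t.B[lab.thread])
            still_concurrent.push_back(lab);
    still_concurrent.push_back({t.id, t.B[t.id]});
    x.S = still_concurrent;

    if (x.S.size() > 1) {                     // concurrent accesses: refine lockset
        x.C = intersect(x.C, t.L);
        if (x.C.empty())
            std::printf("potential race on %s\n", name);
    } else {
        x.C = t.L;                            // singleton threadset: no threat now;
    }                                         // restart candidates from current locks
}

int main() {
    Variable v{ {"mu"}, {} };
    Thread t0{0, {"mu"}, {1, 0}};             // holds mu, clock (1,0)
    Thread t1{1, {},     {0, 1}};             // holds nothing, clock (0,1)
    on_access(v, t0, "v");                    // single accessor: no warning
    on_access(v, t1, "v");                    // concurrent, no common lock -> warning
}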
RaceTrack Approach
• Adjusts monitoring granularity from object
level to field level based on program
conditions.
• Issues warnings on-the-fly and then
performs a more careful post-mortem
analysis
RaceTrack Benefits
• Coverage: JIT compiler enables any code to be
instrumented and monitored.
• Accuracy: The ability to monitor at a fine granularity
(field, individual array element) improves
detection accuracy.
• Happens-before analysis filters out some false
positives that would be flagged by lockset
analysis alone.
• Performance: Monitoring is adaptive – the level of
monitoring is reduced when races are unlikely
• Scalability: good, due to low overhead and ease
of instrumentation.
Future Work (RaceTrack)
• Add deadlock detection mechanisms to
flag lock acquisitions that are ordered
incorrectly.
Example of Potential Deadlock:
global variables x, y; semaphores sx = sy = 1

Thread 1
P(sx);
P(sy);
x = f1(x,y);
y = f2(x,y);
V(sy);
V(sx);

Thread 2
P(sy);
P(sx);
x = p1(x,y);
y = p2(x,y);
V(sx);
V(sy);
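A hedged C++ rendering of this scenario, with std::mutex standing in for the semaphores and simple arithmetic standing in for f1, f2, p1, p2. The deadlock-prone acquisition order from the slide is kept in the comments; the runnable code uses std::scoped_lock, whose multi-lock constructor acquires both locks with a deadlock-avoidance algorithm, so the example terminates.

#include <mutex>
#include <thread>

std::mutex sx, sy;                   // stand-ins for the semaphores sx and sy
int x = 0, y = 0;                    // the shared globals

void thread1() {
    // Deadlock-prone order from the slide: P(sx); then P(sy);
    // If thread2 simultaneously does P(sy); P(sx); each thread can end up
    // holding one lock while waiting forever for the other.
    std::scoped_lock both(sx, sy);   // acquires both without risking deadlock
    x = x + y;                       // stand-in for x = f1(x,y);
    y = x - y;                       // stand-in for y = f2(x,y);
}                                    // releases both: V(sy); V(sx);

void thread2() {
    // Opposite order from the slide: P(sy); then P(sx);
    std::scoped_lock both(sy, sx);   // same deadlock-avoiding acquisition
    x = 2 * x;                       // stand-in for x = p1(x,y);
    y = 2 * y;                       // stand-in for y = p2(x,y);
}                                    // releases both: V(sx); V(sy);

int main() {
    std::thread t1(thread1), t2(thread2);
    t1.join();
    t2.join();
}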