Transcript 2016

Architectural Support for Detecting Data
Races and Atomicity Violations
Abdullah Muzahid, Dario Suarez, Shanxiang Qi, Josep Torrellas
University of Illinois at Urbana-Champaign
http://iacoma.cs.uiuc.edu
Motivation
• Multicore era has arrived
– More parallel programs will be written
– More concurrency bugs will occur
• Two important concurrency bugs
– Data Race
– Atomicity Violation
• Existing concurrency bug detection
proposals
– Either do it in software, resulting in huge
overhead
– Or do it in hardware with small
coverage
– Lack generality
• We propose to detect these bugs in
hardware using signatures
Signature Based Race Detection
• It combines signatures with the happened-before
algorithm
• Each processor keeps a vector clock
– It is updated at each synchronization operation
• The value of the clock during an epoch is
its timestamp
– An epoch runs from one synchronization point to the
next
– Each epoch is further divided into blocks
• A block is a fixed number of dynamic instructions
• Timestamp and signature are sent to the Race
Detection Module (RDM) after each block
• RDM is an on-chip module
– It has one FIFO buffer of timestamps and signatures for
each processor
• RDM compares the incoming timestamp
with the buffered timestamps of each
processor
• For unordered blocks, it intersects their
signatures
– A non-empty intersection indicates a potential data race
• To filter false positives due to aliasing
in signatures
– A checkpoint is taken periodically
– Processors roll back once a potential race is found
– Detailed analysis is done during that time with
support from software
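The race check described above can be sketched in a few lines. This is only an illustration under stated assumptions: vector timestamps are modeled as tuples of integers, signatures as integer bitmasks, and the names happens_before, unordered, and potential_race are hypothetical rather than taken from the poster.

```python
from typing import Sequence, Tuple

# Assumption: a block is (vector timestamp, read signature, write signature),
# with each signature modeled as an integer bitmask.
Block = Tuple[Sequence[int], int, int]


def happens_before(a: Sequence[int], b: Sequence[int]) -> bool:
    """True if vector timestamp a is ordered strictly before b."""
    pairs = list(zip(a, b))
    return all(x <= y for x, y in pairs) and any(x < y for x, y in pairs)


def unordered(a: Sequence[int], b: Sequence[int]) -> bool:
    """True if neither timestamp happens before the other."""
    return not happens_before(a, b) and not happens_before(b, a)


def potential_race(block_i: Block, block_j: Block) -> bool:
    """Check two blocks from different processors for a potential race."""
    ts_i, r_i, w_i = block_i
    ts_j, r_j, w_j = block_j
    if not unordered(ts_i, ts_j):
        return False  # ordered by synchronization, so no race
    # A non-empty intersection of R/W, W/R, or W/W signatures flags a
    # potential race; aliasing can still make this a false positive.
    return bool((r_i & w_j) or (w_i & r_j) or (w_i & w_j))
```

A flagged pair is only a candidate: in the scheme described here, the rollback and software-assisted analysis decide whether it is a real race.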
If T1 & TJ unordered
R1 ∩ WJ
P1
Data Race
W1 ∩ RJ
Else stop
T1 R1 W1
Block TS1
Epoch
• Two threads access the same variable
without intervening synchronization, and
at least one access is a write
• Common bug
[Figure: threads T1 and T2 each execute x++; one of them does so between lock L and unlock L]
[Figure: chip with the Race Detection Module (RDM); labels Sig1, Q1, Q2]
Extracting Atomic Regions and Detecting Violations
• Two steps
– Infer atomic regions without programmer
annotations
– Find violations of them at runtime
Inferring Atomic Regions
• Input:
– Traces of multiple correct runs
– Each trace is a total order of memory accesses
during one execution
Detecting Atomicity Violations
• It detects violations with a given set of
atomic regions
• It tries to serialize concurrent atomic
regions based on dependence directions
• If serialization is possible, then there is
no violation
• Otherwise, there is a violation
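The serialization check on concurrent atomic regions can be illustrated with a small sketch. It assumes a trace is a list of (region, op, var) tuples in execution order and handles only pairs of regions: dependences in both directions between two regions mean they cannot be serialized. The function names and the trace encoding are hypothetical, not taken from the poster.

```python
from itertools import combinations


def conflicts(a, b):
    """Accesses from different regions conflict if they touch the same
    variable and at least one of them is a write."""
    return a[0] != b[0] and a[2] == b[2] and "Wr" in (a[1], b[1])


def dependence_edges(trace):
    """Collect (earlier_region, later_region) for every conflicting pair."""
    edges = set()
    for i, j in combinations(range(len(trace)), 2):
        if conflicts(trace[i], trace[j]):
            edges.add((trace[i][0], trace[j][0]))
    return edges


def has_violation(trace):
    """Pairwise simplification: dependences in both directions between two
    regions mean no serial order of those regions exists."""
    edges = dependence_edges(trace)
    return any((later, earlier) in edges for earlier, later in edges)
```

For example, a write by AR2 that lands between a read and a write of the same variable in AR1 produces edges in both directions, so the pair is reported as a violation.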
Atomicity Violation Bug
• Harder problem than data races
• Might occur even if there is no data race
• Most existing proposals require
explicit annotations
• No uniform solution
for all types of atomic
regions
• Hardware-based
solutions also lack
generality
lock (L)
x=…
…
unlock (L)
lock (L)
x=…
…
unlock (L)
lock (L)
… = x
unlock (L)
TJ RJ WJ
T1 R1 W1
T0 R0 W0
sync
• Software-based solutions
impose overhead
• Hardware-based solutions complicate
the cache and suffer from low coverage
• Algorithm:
– Greedy approach
– Consider each thread’s trace in turn
– Join successive references of a thread into an atomic
region if its atomicity is not violated by other threads
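A rough sketch of the greedy approach for one thread and one trace follows. The poster does not spell out when atomicity counts as "violated", so the criterion here is an assumption: a remote access that conflicts with region accesses both before and after it. The trace encoding of (thread, op, var) tuples and all function names are hypothetical, and combining multiple traces is left out.

```python
def conflict(a, b):
    """Two accesses conflict if they touch the same variable and at least
    one of them is a write."""
    return a[2] == b[2] and "Wr" in (a[1], b[1])


def violated(trace, members, candidate, thread):
    """Assumed criterion: some remote access inside the extended region
    conflicts with region accesses on both sides of it."""
    region = members + [candidate]
    for p in range(members[0] + 1, candidate):
        if trace[p][0] == thread:
            continue  # only accesses by other threads can violate atomicity
        before = any(conflict(trace[m], trace[p]) for m in region if m < p)
        after = any(conflict(trace[m], trace[p]) for m in region if m > p)
        if before and after:
            return True
    return False


def infer_regions(trace, thread):
    """Greedily join successive references of `thread` into atomic regions,
    returned as lists of trace indices."""
    regions, current = [], []
    for i, access in enumerate(trace):
        if access[0] != thread:
            continue
        if current and violated(trace, current, i, thread):
            regions.append(current)  # close the region before this access
            current = []
        current.append(i)
    if current:
        regions.append(current)
    return regions
```

The sketch closes a region as soon as joining the next reference would break it, which is what makes it greedy; it never revisits an earlier decision.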
[Figure: example traces of threads T1 and T2 reading (Rd) and writing (Wr) variables X and Y, with successive references merged into atomic regions AR1 and AR2; one panel is labeled "No Violation"]
Signature
[Figure panel: atomic regions AR2 and AR1, labeled "No Violation"]
Conclusions
ROB
…
W1 ∩ WJ
sync
• Root cause of many system failures
including the blackout of 2003
• Hard to detect and
reproduce
P2
ld/st
addr
…
[Ceze et al, ISCA06]
• Hardware bloom filter to store addresses
• Logical AND for intersection, OR for
union
• Has false positives but no false negatives
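A software model of such a signature might look like the sketch below. It assumes a 1024-bit vector and two hash functions derived from SHA-256; these parameters and the Signature class are illustrative choices, not the hardware design from the cited work.

```python
import hashlib


class Signature:
    """Bloom-filter style address set held in a single integer bit vector."""

    def __init__(self, bits=1024, hashes=2, value=0):
        self.bits = bits      # assumed signature width
        self.hashes = hashes  # assumed number of hash functions
        self.value = value

    def _positions(self, addr):
        # Derive one bit position per hash function from the address.
        for k in range(self.hashes):
            digest = hashlib.sha256(bytes([k]) + addr.to_bytes(8, "little")).digest()
            yield int.from_bytes(digest[:8], "little") % self.bits

    def insert(self, addr):
        for pos in self._positions(addr):
            self.value |= 1 << pos

    def may_contain(self, addr):
        # Can return True for an address never inserted (false positive),
        # but never False for an inserted one (no false negatives).
        return all((self.value >> pos) & 1 for pos in self._positions(addr))

    def intersect(self, other):
        return Signature(self.bits, self.hashes, self.value & other.value)

    def union(self, other):
        return Signature(self.bits, self.hashes, self.value | other.value)

    def is_empty(self):
        return self.value == 0
```

Two signatures that both contain the same address always have a non-empty intersection, which is why the check can miss no real conflict; a non-empty intersection between disjoint address sets is the aliasing case.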
• We propose to detect data races and
atomicity violations in hardware
• Our experiments show that our scheme
– Finds 29% more existing races in SPLASH2
applications than a state-of-the-art cache-based
scheme
– Detects 5 out of 5 representative atomicity bugs
from real world applications
• Both schemes require a similar type of
hardware
[Figure panel: atomic regions AR1 and AR2, labeled "Violation"]