Transcript 2016
Architectural Support for Detecting Data Races and Atomicity Violations
Abdullah Muzahid, Dario Suarez, Shanxiang Qi, Josep Torrellas
University of Illinois at Urbana-Champaign
http://iacoma.cs.uiuc.edu

Motivation
• The multicore era has arrived
  – More parallel programs will be written
  – More concurrency bugs will occur
• Two important concurrency bugs
  – Data race
  – Atomicity violation
• Existing concurrency bug detection proposals
  – Either work in software, resulting in huge overhead
  – Or work in hardware, with small coverage
  – Lack generality
• We propose to detect these bugs in hardware using signatures

Data Race
• Two threads access the same variable without intervening synchronization, and at least one access is a write
• Common bug
Figure: threads T1 and T2 each execute x++, with a single lock L / unlock L pair shown.

Extracting Atomic Regions and Detecting Violations
• Two steps
  – Infer atomic regions without programmer's annotations
  – Find violations of them at runtime

Signature Based Race Detection
• Combines signatures with the happened-before algorithm
• Each processor keeps a vector clock
  – It is updated at each synchronization operation
• The value of the clock during an epoch is the epoch's timestamp
  – An epoch runs from one synchronization point to the next
  – An epoch is further divided into blocks
• A block is a fixed number of dynamic instructions
• The timestamp and signature are sent to the Race Detection Module (RDM) after each block
• The RDM is an on-chip module
  – It has one FIFO buffer of timestamps and signatures for each processor
• The RDM compares the incoming timestamp with the buffered timestamps of each processor
• For unordered blocks, it intersects their signatures
  – A non-null intersection indicates a potential data race
• To filter false positives due to aliasing in signatures
  – Checkpoints are taken periodically
  – Processors roll back once a potential race is found
  – Detailed analysis is then done with support from software
Figure: on the chip, each processor (P1, ...) sends its block timestamp (TS1) and signature (Sig1) to the RDM queues (Q1, Q2). If T1 and TJ are unordered, the RDM checks R1 ∩ WJ and W1 ∩ RJ for a data race; otherwise it stops.
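The check described under Signature Based Race Detection can be sketched in software as follows. This is a minimal illustration, not the poster's hardware design: the bank count, bank size, hash functions, queue depth, and all class and function names (Signature, Block, RaceDetectionModule, bank_positions) are assumptions made for the example.

```python
from collections import deque
from dataclasses import dataclass

BANKS = 2          # number of signature banks (assumed for illustration)
BANK_BITS = 256    # bits per bank (assumed for illustration)


def bank_positions(addr):
    """Map an address to one bit per bank (hypothetical hash functions)."""
    return ((addr * 2654435761) % BANK_BITS,
            ((addr >> 3) * 40503 + 7) % BANK_BITS)


class Signature:
    """Banked Bloom filter over addresses: false positives, no false negatives."""

    def __init__(self, addrs=()):
        self.banks = [0] * BANKS
        for addr in addrs:
            self.insert(addr)

    def insert(self, addr):
        for i, pos in enumerate(bank_positions(addr)):
            self.banks[i] |= 1 << pos

    def intersects(self, other):
        # Bitwise AND per bank; a shared address sets a common bit in every bank.
        return all(a & b for a, b in zip(self.banks, other.banks))


@dataclass
class Block:
    proc: int            # processor that executed the block
    ts: tuple            # vector-clock timestamp of the enclosing epoch
    reads: Signature     # addresses read in the block
    writes: Signature    # addresses written in the block


def happens_before(a, b):
    """Vector clock a precedes b: componentwise <= and not equal."""
    return a != b and all(x <= y for x, y in zip(a, b))


def unordered(a, b):
    return not happens_before(a, b) and not happens_before(b, a)


class RaceDetectionModule:
    """One bounded FIFO of recent blocks per processor."""

    def __init__(self, num_procs, depth=4):
        self.queues = [deque(maxlen=depth) for _ in range(num_procs)]

    def receive(self, blk):
        """Return (proc, timestamp) of buffered blocks that may race with blk."""
        suspects = []
        for proc, queue in enumerate(self.queues):
            if proc == blk.proc:
                continue
            for old in queue:
                if unordered(blk.ts, old.ts) and (
                        blk.reads.intersects(old.writes)
                        or blk.writes.intersects(old.reads)
                        or blk.writes.intersects(old.writes)):
                    suspects.append((old.proc, old.ts))
        self.queues[blk.proc].append(blk)
        return suspects


X = 0x1000
rdm = RaceDetectionModule(num_procs=2)
print(rdm.receive(Block(0, (1, 0), Signature(), Signature([X]))))
print(rdm.receive(Block(1, (0, 1), Signature([X]), Signature())))
```

In the demo, processor 0 writes X and processor 1 reads X in blocks whose timestamps are unordered, so the second call should report processor 0's block as a potential race. Because signatures can alias, a reported pair is only a candidate, which is why the poster follows detection with rollback and detailed analysis.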
Signature [Ceze et al, ISCA06]
• Hardware Bloom filter to store addresses
• Logical AND for intersection, OR for union
• Has false positives but no false negatives
Figure: load/store addresses (ld/st addr) from the ROB are inserted into the signature.

Data Race (continued)
• Root cause of many system failures, including the blackout of 2003
• Hard to detect and reproduce
• Software based solutions impose overhead
• Hardware based solutions complicate the cache and suffer from low coverage
Figure: buffered RDM entries (T0 R0 W0, T1 R1 W1, ..., TJ RJ WJ) separated by sync markers, with the W1 ∩ WJ check.

Atomicity Violation Bug
• Harder problem than data races
• Might occur even if there is no data race
• Most existing proposals require explicit annotations
• No uniform solution for all types of atomic regions
• Hardware based solutions also lack generality
Figure: three critical sections, each enclosed in lock (L) / unlock (L): x=…, x=…, and …=x.

Inferring Atomic Regions
• Input:
  – Traces of multiple correct runs
  – Each trace is a total order of memory accesses during one execution
• Algorithm:
  – Greedy approach
  – Consider each thread's trace in turn
  – Join successive references of a thread into an atomic region if its atomicity is not violated by other threads
Figure: example traces of threads T1 and T2 (Rd/Wr accesses to X and Y) grouped step by step into atomic regions AR1 and AR2.

Detecting Atomicity Violations
• Detects violations given a set of atomic regions
• Tries to serialize concurrent atomic regions based on dependence directions
• If serialization is possible, there is no violation
• Otherwise, there is a violation
Figure: atomic regions AR1 and AR2 with their accesses to X and Y; two example orderings are labeled No Violation and one is labeled Violation.

Conclusions
• We propose to detect data races and atomicity violations in hardware
• Our experiments show that our scheme
  – Finds 29% more existing races in SPLASH2 applications than a state-of-the-art cache based scheme
  – Detects 5 out of 5 representative atomicity bugs from real world applications
• Both schemes require similar types of hardware
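The serialization test described under Detecting Atomicity Violations can be illustrated with a standard conflict-serializability check: build a dependence graph between atomic regions and look for a cycle. This is a software sketch of that textbook technique, not the poster's hardware mechanism. The trace format (region, op, var) and the function names are hypothetical, and the atomic regions are assumed to be already known.

```python
from collections import defaultdict


def conflicts(a, b):
    """Two accesses conflict if they touch the same variable and one is a write."""
    return a[2] == b[2] and "Wr" in (a[1], b[1])


def violates_atomicity(trace):
    """trace: list of (region, op, var) in observed order; op is 'Rd' or 'Wr'.

    Returns True if the regions cannot be serialized, i.e. the dependence
    graph between regions contains a cycle.
    """
    # Edge A -> B: an access in region A precedes a conflicting access in B.
    edges = defaultdict(set)
    for i, earlier in enumerate(trace):
        for later in trace[i + 1:]:
            if earlier[0] != later[0] and conflicts(earlier, later):
                edges[earlier[0]].add(later[0])

    WHITE, GREY, BLACK = 0, 1, 2
    color = defaultdict(int)  # every region starts WHITE

    def has_cycle(node):
        # Depth-first search; reaching a GREY node means a back edge (cycle).
        color[node] = GREY
        for nxt in edges[node]:
            if color[nxt] == GREY or (color[nxt] == WHITE and has_cycle(nxt)):
                return True
        color[node] = BLACK
        return False

    return any(color[r] == WHITE and has_cycle(r) for r in list(edges))


# AR2's write lands between AR1's read and write of X: not serializable.
interleaved = [("AR1", "Rd", "X"), ("AR2", "Wr", "X"), ("AR1", "Wr", "X")]
# AR2's write comes after all of AR1: serializable as AR1 then AR2.
serial = [("AR1", "Rd", "X"), ("AR1", "Wr", "X"), ("AR2", "Wr", "X")]
print(violates_atomicity(interleaved), violates_atomicity(serial))
```

The pairwise scan is quadratic in trace length, which is acceptable for a small illustration. A hardware scheme would instead work on summarized access sets such as signatures.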