Concurrency Bug Detection

Race Detection for Event-driven Mobile Applications
Chun-Hung Hsiao (University of Michigan)
Jie Yu (University of Michigan / Twitter)
Satish Narayanasamy (University of Michigan)
Ziyun Kong (University of Michigan)
Cristiano Pereira (Intel)
Gilles Pokam (Intel)
Peter Chen (University of Michigan)
Jason Flinn (University of Michigan)
Rise of Event-Driven Systems
• Mobile apps
• Web apps
• Data centers
Tools for finding concurrency errors in these systems are lacking
Why Event-Driven Programming Model?
Need to process asynchronous input from a rich set of sources
Events and Threads in Android
[Diagram: a looper thread with an event queue runs events such as onServiceConnected() and onClick(); regular threads run alongside it. Operations shown: send(event), signal(m), wait(m), wr(x), rd(x).]
Conventional Race Detection
e.g., FastTrack [PLDI’09]
• Causal order: happens-before, defined by synchronization operations
• Conflict: read-write or write-write data accesses to the same location
• Race: conflicts that are not causally ordered
[Diagram: a looper thread running onClick() and onServiceConnected(), next to regular threads. Operations shown: send(event), signal(m), wait(m), wr(x), rd(x).]
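The three definitions on this slide (causal order, conflict, race) can be sketched with plain vector clocks. This is a minimal illustration, not FastTrack's optimized algorithm; the dictionary layout and the helper names `happens_before`, `conflicts`, and `is_race` are assumptions made for the example.

```python
def happens_before(c1, c2):
    """True if vector clock c1 is strictly earlier than vector clock c2."""
    keys = set(c1) | set(c2)
    no_later = all(c1.get(k, 0) <= c2.get(k, 0) for k in keys)
    strictly = any(c1.get(k, 0) < c2.get(k, 0) for k in keys)
    return no_later and strictly


def conflicts(a, b):
    """Read-write or write-write accesses to one location from different threads."""
    return (a["loc"] == b["loc"]
            and a["thread"] != b["thread"]
            and "wr" in (a["kind"], b["kind"]))


def is_race(a, b):
    """A race is a conflict whose two accesses are not causally ordered."""
    if not conflicts(a, b):
        return False
    return (not happens_before(a["clock"], b["clock"])
            and not happens_before(b["clock"], a["clock"]))


# Regular thread T writes x (and later signals m).
wr_x = {"thread": "T", "kind": "wr", "loc": "x", "clock": {"T": 1}}
# The looper thread L reads x after wait(m): its clock has absorbed T's clock.
rd_synced = {"thread": "L", "kind": "rd", "loc": "x", "clock": {"L": 2, "T": 2}}
# The same read without any synchronization with T.
rd_unsynced = {"thread": "L", "kind": "rd", "loc": "x", "clock": {"L": 1}}

race_without_sync = is_race(wr_x, rd_unsynced)
race_with_sync = is_race(wr_x, rd_synced)
```

In this sketch only the unsynchronized read is reported, because the signal/wait pair places the write before the synchronized read.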
Conventional Race Detection: Problem
[Diagram: events on the looper thread, shown next to the regular threads]
    onClick() {
        send(event);
    }
    onReceive() {
        *p;
    }
    onDestroy() {
        p = null;
    }
NullPointerException! (*p after p = null)
Conventional race detectors cannot find such errors in Android
Problem: Causality model is too strict
• Should not assume program order between events
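Model Events as Threads?
[Diagram: each event is drawn as its own thread, next to the regular threads]
    Event: onClick() {
        send(event);
    }
    Event: onReceive() {
        *p;
    }
    Event: onDestroy() {
        p = null;
    }
Race: *p in onReceive() and p = null in onDestroy()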
Events as Threads: Problem
[Diagram: regular threads with two send(...) calls, and the two events below]
    Event: onServiceConnected() {
        *p;
    }
    Event: onDestroy() {
        p = null;
    }
False race: missing causal order!
Problem: Causality model is too weak
• Android system guarantees certain causal orders between events
Challenge 1: Modeling Causality
Goal: Precisely infer the causal order between events that programmers can assume
Looper Thread:
    A: onClick() {
        send(B);
    }
    B: onReceive() {
        *p;
    }
    C: onDestroy() {
        p = null;
    }
A → B
C || B
Challenge 2: Not All Races are Bugs
Races between events (e.g., ~9000 in ConnectBot)
• Order violations
• Atomicity violations: not a problem in Android events!
  – One looper thread executes all events non-preemptively
[Diagram: example events built from p = new T; p = null; and *p;]
Solution: Commutativity analysis identifies races that cause order violations
Outline
• Causality Model
• Commutativity Analysis
• Implementation & Results
Conventional causal order; Event atomicity; Event queue order
Causality Model
• Android uses both thread-based and event-based models
• Causal order is derived from the following rules:
1. Conventional causal order in the thread-based model
2. Event atomicity
3. Event queue order
Conventional causal order
• Program order
• Fork-join: fork(thread) → begin(thread); end(thread) → join(thread)
• Signal-wait: signal(m) → wait(m)
• Send: send(event) → begin(event)
[Diagram: the looper thread runs events A and B; A forks a regular thread, which sends B. Operations shown: begin(A), fork(thread), end(A), begin(thread), send(B), begin(B), signal(m), end(B), wait(m).]
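The rules on this slide can be sketched as edges in a graph over trace operations, with causal order checked by reachability. The trace format `(task, op, arg)`, the helper names, and the example trace are assumptions made for illustration; each event is treated as its own task, so no program order is assumed between events.

```python
def conventional_edges(trace):
    """Derive causal edges (i, j) between trace indices from the conventional rules."""
    edges, last_in_task, begin, end = set(), {}, {}, {}
    for i, (task, op, arg) in enumerate(trace):
        if task in last_in_task:
            edges.add((last_in_task[task], i))      # program order within a task
        last_in_task[task] = i
        if op == "begin":
            begin[task] = i
        elif op == "end":
            end[task] = i
    signals = {}
    for i, (task, op, arg) in enumerate(trace):
        if op in ("fork", "send") and arg in begin:
            edges.add((i, begin[arg]))              # fork(t) -> begin(t), send(e) -> begin(e)
        elif op == "join" and arg in end:
            edges.add((end[arg], i))                # end(t) -> join(t)
        elif op == "signal":
            signals.setdefault(arg, []).append(i)
        elif op == "wait":
            for s in signals.get(arg, []):
                edges.add((s, i))                   # signal(m) -> wait(m), simplified matching
    return edges


def reachable(edges, src, dst):
    """Depth-first search: is dst causally after src?"""
    succ = {}
    for u, v in edges:
        succ.setdefault(u, []).append(v)
    stack, seen = [src], set()
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(succ.get(node, []))
    return False


# Hypothetical trace: event A forks thread T, which sends event B.
trace = [
    ("A", "begin", None), ("A", "fork", "T"), ("T", "begin", None),
    ("A", "end", None), ("T", "send", "B"), ("B", "begin", None),
    ("B", "signal", "m"), ("T", "wait", "m"), ("B", "end", None),
]
edges = conventional_edges(trace)
begin_a_before_end_b = reachable(edges, 0, 8)
end_a_before_begin_b = reachable(edges, 3, 5)
```

With only these rules, begin(A) is ordered before end(B), but end(A) is not ordered before begin(B).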
Event atomicity
One looper thread executes all events non-preemptively => events are atomic
• Rule: if begin(A) → end(B), then end(A) → begin(B)
[Diagram: on the looper thread, event A (begin(A), fork(thread), end(A)) is followed by event B (begin(B), end(B)); the regular thread runs begin(thread) and send(B). end(A) and begin(B) are ordered due to event atomicity.]
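The atomicity rule can be sketched as a fixpoint over an existing causal graph: whenever begin(A) reaches end(B), add the edge end(A) → begin(B). The node numbering, the `events` map, and the function names are assumptions made for this example.

```python
def reachable(edges, src, dst):
    """Depth-first search: is dst causally after src?"""
    succ = {}
    for u, v in edges:
        succ.setdefault(u, []).append(v)
    stack, seen = [src], set()
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(succ.get(node, []))
    return False


def apply_atomicity(edges, events):
    """events maps an event name to its (begin, end) node ids."""
    edges = set(edges)
    changed = True
    while changed:                      # repeat until no new edge can be derived
        changed = False
        for a, (begin_a, end_a) in events.items():
            for b, (begin_b, _end_b) in events.items():
                if a == b or (end_a, begin_b) in edges:
                    continue
                if reachable(edges, begin_a, events[b][1]):
                    edges.add((end_a, begin_b))     # end(A) -> begin(B)
                    changed = True
    return edges


# Nodes: 0 begin(A), 1 fork(T), 2 begin(T), 3 end(A), 4 send(B), 5 begin(B), 6 end(B)
base = {(0, 1), (1, 3), (1, 2), (2, 4), (4, 5), (5, 6)}
events = {"A": (0, 3), "B": (5, 6)}
closed = apply_atomicity(base, events)
```

Here begin(A) reaches end(B) through the fork and the send, so the sketch adds the edge from node 3 to node 5.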
Event queue order
• Rule: if send(A) → send(B), then end(A) → begin(B)
[Diagram: a regular thread calls send(A) and then send(B); the event queue holds A and B; the looper thread runs begin(A), end(A), begin(B), end(B). end(A) and begin(B) are ordered due to FIFO queue order.]
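The FIFO rule can be sketched the same way: if send(A) is causally before send(B), add end(A) → begin(B). This assumes both events go to the same queue through the plain send API; the node numbering and the names are made up for the example.

```python
def reachable(edges, src, dst):
    """Depth-first search: is dst causally after src?"""
    succ = {}
    for u, v in edges:
        succ.setdefault(u, []).append(v)
    stack, seen = [src], set()
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(succ.get(node, []))
    return False


def apply_fifo(edges, events):
    """events maps an event name to the node ids of its send, begin and end."""
    edges = set(edges)
    for a in events:
        for b in events:
            if a != b and reachable(edges, events[a]["send"], events[b]["send"]):
                edges.add((events[a]["end"], events[b]["begin"]))   # end(A) -> begin(B)
    return edges


# Nodes: 0 send(A), 1 send(B), 2 begin(A), 3 end(A), 4 begin(B), 5 end(B)
events = {
    "A": {"send": 0, "begin": 2, "end": 3},
    "B": {"send": 1, "begin": 4, "end": 5},
}
send_edges = {(0, 2), (2, 3), (1, 4), (4, 5)}

# Same thread sends A then B: program order gives send(A) -> send(B).
ordered_sends = apply_fifo(send_edges | {(0, 1)}, events)
# Sends from two unordered threads: the rule does not fire.
unordered_sends = apply_fifo(send_edges, events)
```

Only the first case gains the edge from node 3 to node 4; when the two sends are unordered, the events stay concurrent.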
It’s Not That Simple…
Special send APIs can overrule the FIFO order
– Event with execution delay
– Prioritize an event
  • sendAtFront(event): inserts the event at the front of the queue
Special event queue rules handle these APIs.
See paper for details.
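A toy queue shows why front insertion breaks the FIFO assumption. The class `ToyEventQueue` and its methods are hypothetical stand-ins for illustration, not Android's API, and the sketch does not reproduce the special rules from the paper.

```python
from collections import deque


class ToyEventQueue:
    """Minimal event queue with a normal send and a front-of-queue send."""

    def __init__(self):
        self._queue = deque()

    def send(self, event):
        self._queue.append(event)        # normal FIFO enqueue

    def send_at_front(self, event):
        self._queue.appendleft(event)    # jumps ahead of everything already queued

    def drain(self):
        """Return the events in the order the looper would run them."""
        order = list(self._queue)
        self._queue.clear()
        return order


fifo = ToyEventQueue()
fifo.send("A")
fifo.send("B")
fifo_order = fifo.drain()

prioritized = ToyEventQueue()
prioritized.send("A")
prioritized.send_at_front("B")
prioritized_order = prioritized.drain()
```

In the second queue B runs before A even though A was sent first, so send order alone no longer implies execution order.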
Event Orders due to External Input
Assume all events generated by the external environment are ordered
Looper Thread:
    A: onClick() {
        send(B);
    }
    B: onReceive() {
        *p;
    }
    C: onDestroy() {
        p = null;
    }
What is External Input?
[Diagram: the external environment (surfaceflinger, context_manager, system_server) connects to the App through IPC.]
Outline
• Causality Model
• Commutativity Analysis
• Implementation & Results
Problem: Not All Races are Bugs
Races between events
• Order violations
• Atomicity violations: not a problem in Android events!
Order Violations in Events
[Diagram: two looper-thread timelines for the events below]
    onReceive() {
        *p;
    }
    onDestroy() {
        p = null;
    }
Race between non-commutative events => order violation
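The order violation can be reproduced by running the two events in both orders. This is a Python analogue written for illustration: the `App` class and `run` helper are hypothetical, and an `AttributeError` on `None` stands in for the NullPointerException.

```python
class Resource:
    value = 42


class App:
    def __init__(self):
        self.p = Resource()

    def on_receive(self):
        return self.p.value      # corresponds to *p (a use)

    def on_destroy(self):
        self.p = None            # corresponds to p = null (a free)


def run(order):
    """Run the named events one after another on a fresh app, as a looper would."""
    app = App()
    try:
        for name in order:
            getattr(app, name)()
    except AttributeError:
        return "crash"
    return "ok"


use_then_free = run(["on_receive", "on_destroy"])
free_then_use = run(["on_destroy", "on_receive"])
```

The outcome depends on the order of the two events, which is what makes the race between them an order violation.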
Races in Commutative Events
[Diagram: two looper-thread timelines for the events below]
    onLayout() {
        if (!flag)
            return;
        resize();
    }
    onPause() {
        flag = false;
    }
Racy events are commutative => not a race bug
Hard to determine if events are commutative!
Solution: Commutativity Analysis
Report races between known non-commutative operations: uses and frees
Looper Thread:
    A: onClick() {
        send(B);
    }
    B: onReceive() {
        *p;          (use)
    }
    C: onDestroy() {
        p = null;    (free)
    }
Heuristics handle commutative events with uses and frees.
See paper for details.
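The filter described on this slide can be sketched as: report a pair of accesses only if one is a use and the other a free of the same pointer, and their events are not causally ordered. The record layout and the function name are assumptions; the `ordered` set is taken to be an already transitively closed causal order, and the paper's heuristics for commutative events are not modeled.

```python
def use_free_races(accesses, ordered):
    """Return (use_event, free_event, pointer) for unordered use/free pairs."""
    races = []
    for i, a in enumerate(accesses):
        for b in accesses[i + 1:]:
            if a["ptr"] != b["ptr"] or a["event"] == b["event"]:
                continue
            if {a["kind"], b["kind"]} != {"use", "free"}:
                continue                                  # keep only use/free pairs
            if (a["event"], b["event"]) in ordered or (b["event"], a["event"]) in ordered:
                continue                                  # causally ordered: not a race
            use, free = (a, b) if a["kind"] == "use" else (b, a)
            races.append((use["event"], free["event"], a["ptr"]))
    return races


# Events from the slide: B uses p, C frees p, and A -> B is the only known order.
# The accesses to q are an extra, made-up pair that is ordered by A -> B.
accesses = [
    {"event": "B", "kind": "use", "ptr": "p"},
    {"event": "C", "kind": "free", "ptr": "p"},
    {"event": "A", "kind": "use", "ptr": "q"},
    {"event": "B", "kind": "free", "ptr": "q"},
]
ordered = {("A", "B")}
reported = use_free_races(accesses, ordered)
```

Only the use in B and the free in C are reported, since C || B; the pair on q is skipped because A → B.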
Outline
• Causality Model
• Commutativity Analysis
• Implementation & Results
CAFA: Race Detection Tool for Android
[Architecture diagram: App and system_server stacks (Java Libs, Dalvik VM, Native Libs), alongside surfaceflinger and context_manager, on top of the Android Kernel with IPC Binder and a Logger; traces feed the offline CAFA Analyzer.]
• Logger device in the kernel for trace collection
• Logs synchronization operations for causality
• Logs data access operations related to uses and frees
• Also logs the system service processes for complete causality inference
• Offline race detector based on graph reachability test
Tested Applications
Use-after-Free Races
115 races; 69 race bugs (67 unknown bugs)
• Races in conventional causality model: 31 (27.0%)
• Races in Android causality model (between events, between threads): 38 (33.0%), in segments of 13 (11.3%) and 25 (21.7%)
• False positives: 46 (40.0%)
  – 32 benign races (27.8%): imprecise commutativity analysis
  – 14 false races (12.2%): imprecise causal order (imperfect implementation)
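The percentages on this slide follow from the raw counts, and the share of race bugs is the 60% precision quoted in the summary. A quick arithmetic check (the variable names are chosen for this example):

```python
total_races = 115
counts = {
    "race bugs": 69,
    "benign races": 32,
    "false races": 14,
}

# The three categories partition the reported races.
categories_sum = sum(counts.values())

# Share of each category, rounded to one decimal place as on the slide.
shares = {name: round(100 * n / total_races, 1) for name, n in counts.items()}

# False positives are the benign races plus the false races.
false_positives = counts["benign races"] + counts["false races"]
false_positive_share = round(100 * false_positives / total_races, 1)
```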
Performance Overhead
• Trace collection
– 2x to 6x; avg: ~3.2x
– Interactive performance is fair
• Offline analysis
– Depends on number of events
– 30 min. to 16 hrs. for analyzing ~3000 to ~7000 events
Summary
• Races due to asynchronous events are widespread
• Contributions
– Causality model for Android events
– Commutativity analysis identifies races that can cause order violations
– Found 67 unknown race bugs with 60% precision
• Future work
– Commutativity analysis for finding a broader set of order violations
– Optimize performance