DTHREADS: Efficient Deterministic Multithreading
Insanity: Doing the same thing over and over again and expecting different results.
Tongping Liu, Charlie Curtsinger, Emery Berger
In the Beginning…
There was the Core.
And it was Good.
It gave us our Daily Speed.
Until the Apocalypse.
And the Speed was no Moore.
And then came a False Prophet…
Want speed?
I BRING YOU THE GIFT OF PARALLELISM!
color = ; row = 0; // globals
void nextStripe(){
for (c = 0; c < Width; c++)
drawBox (c,row,color);
color = (color == )?  : ;
row++;
}
for (n = 0; n < 9; n++)
pthread_create(t[n], nextStripe);
for (n = 0; n < 9; n++)
pthread_join(t[n]);
JUST USE THREADS…
race conditions
pthreads
atomicity violations
deadlock
order violations
Salvation?
race conditions
pthreads
DTHREADS
atomicity violations
deadlock
order violations
deterministic
DTHREADS Enables…
Race-free Executions
Replay Debugging w/o Logging
Replicated State Machines
DTHREADS: Efficient Determinism
[Figure: runtime relative to pthreads for CoreDet, dthreads, and pthreads; y-axis 0 to 6, with two off-scale values labeled 8.4 and 7.8]
Usually faster than the state of the art
Generally as fast or faster than pthreads
DTHREADS: Easy to Use
% g++ myprog.cpp -lpthread
Isolation
shared address space
disjoint address spaces
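The contrast between a shared address space and disjoint address spaces can be seen in a small experiment. The sketch below is only an illustration of the two isolation models, not DTHREADS code; the function names `counter_after_thread` and `counter_after_fork` are made up for this example. A thread's write to a global is visible to its creator, while a forked child's write lands in the child's private copy.

```c
#include <assert.h>
#include <pthread.h>
#include <sys/wait.h>
#include <unistd.h>

static int counter = 0;  /* global touched by a thread and by a child process */

static void *bump(void *arg) {
    (void)arg;
    counter++;           /* a thread writes the creator's own copy */
    return NULL;
}

/* Value of counter seen by the caller after a thread increments it. */
int counter_after_thread(void) {
    pthread_t t;
    counter = 0;
    int rc = pthread_create(&t, NULL, bump, NULL);
    assert(rc == 0);
    pthread_join(t, NULL);
    return counter;      /* shared address space: the write is visible */
}

/* Value of counter seen by the caller after a forked child increments it. */
int counter_after_fork(void) {
    counter = 0;
    pid_t pid = fork();
    assert(pid >= 0);
    if (pid == 0) {      /* child: the write goes to its private copy */
        counter++;
        _exit(0);
    }
    waitpid(pid, NULL, 0);
    return counter;      /* disjoint address spaces: parent is unchanged */
}
```

Calling both functions from `main` should show the thread's increment in the parent and the child's increment nowhere outside the child, which is the isolation property the slide contrasts.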
Performance: Processes vs. Threads
[Figure: normalized execution time (y-axis 0.0 to 1.4) of threads vs. processes, plotted against thread execution time from 1 to 1024 ms]
“Shared Memory”
Snapshot pages before modifications
Write back diffs
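The two steps above, snapshotting a page before it is modified and later writing back only the differences, can be sketched as follows. This is a minimal illustration under stated assumptions, not the DTHREADS implementation: the 4096-byte `PAGE_SIZE`, the byte-granularity comparison, and the helper names `snapshot_page`, `write_back_diffs`, and `demo_merge` are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE 4096   /* assumed page size, for illustration only */

/* Take a snapshot ("twin") of a page before the local copy is modified. */
void snapshot_page(unsigned char *twin, const unsigned char *local) {
    memcpy(twin, local, PAGE_SIZE);
}

/* Copy to the shared page only the bytes that differ from the snapshot.
   Returns the number of bytes written back. */
size_t write_back_diffs(unsigned char *shared, const unsigned char *local,
                        const unsigned char *twin) {
    size_t changed = 0;
    for (size_t i = 0; i < PAGE_SIZE; i++) {
        if (local[i] != twin[i]) {
            shared[i] = local[i];
            changed++;
        }
    }
    return changed;
}

/* Two workers modify different bytes of private copies of one page;
   committing both diffs merges their updates. Returns 1 on success. */
int demo_merge(void) {
    static unsigned char shared[PAGE_SIZE], a[PAGE_SIZE], b[PAGE_SIZE];
    static unsigned char twin_a[PAGE_SIZE], twin_b[PAGE_SIZE];
    memset(shared, 0, PAGE_SIZE);
    memcpy(a, shared, PAGE_SIZE); snapshot_page(twin_a, a);
    memcpy(b, shared, PAGE_SIZE); snapshot_page(twin_b, b);
    a[10] = 7;           /* worker A's private write */
    b[20] = 9;           /* worker B's private write */
    size_t na = write_back_diffs(shared, a, twin_a);
    size_t nb = write_back_diffs(shared, b, twin_b);
    assert(na == 1 && nb == 1);
    return shared[10] == 7 && shared[20] == 9;
}
```

Because each worker writes back only what it changed, A's commit does not overwrite B's byte with a stale value, and both updates survive in the shared page.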
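Update in Deterministic Time & Order
[Diagram: “Thread” 1, “Thread” 2, and “Thread” 3 alternate between parallel and serial phases; the synchronization operations mutex_lock, cond_wait, and pthread_create mark where each one leaves the parallel phase]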
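One way to picture updates in a deterministic order is a round in which workers compute in parallel and then commit serially, one at a time, in a fixed order. The sketch below uses a simple turn counter guarded by a mutex and condition variable. The slides do not spell out the mechanism DTHREADS uses, so the `turn` variable, `worker`, and `run_round` are assumptions for illustration only.

```c
#include <assert.h>
#include <pthread.h>

#define NTHREADS 3

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t turn_changed = PTHREAD_COND_INITIALIZER;
static int turn;                  /* index of the thread allowed to commit next */
static int commit_log[NTHREADS];  /* order in which commits actually happened */
static int commits;

static void *worker(void *arg) {
    int id = *(int *)arg;
    /* Parallel phase: private work; lower ids are given more work so they
       tend to arrive at the commit point last. */
    volatile long x = 0;
    for (long i = 0; i < 100000L * (NTHREADS - id); i++) x += i;

    /* Serial phase: wait for this thread's turn, then commit. */
    pthread_mutex_lock(&lock);
    while (turn != id)
        pthread_cond_wait(&turn_changed, &lock);
    commit_log[commits++] = id;
    turn++;
    pthread_cond_broadcast(&turn_changed);
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Runs one parallel/serial round; returns 1 if commits occurred in id order. */
int run_round(void) {
    pthread_t t[NTHREADS];
    int ids[NTHREADS];
    turn = 0;
    commits = 0;
    for (int n = 0; n < NTHREADS; n++) {
        ids[n] = n;
        int rc = pthread_create(&t[n], NULL, worker, &ids[n]);
        assert(rc == 0);
    }
    for (int n = 0; n < NTHREADS; n++)
        pthread_join(t[n], NULL);
    for (int n = 0; n < NTHREADS; n++)
        if (commit_log[n] != n) return 0;
    return 1;
}
```

The commit order depends only on thread ids, not on which thread finishes its parallel work first, so repeated runs record the same order.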
DTHREADS performance analysis
[Figure: runtime relative to pthreads for dthreads and pthreads; y-axis 0 to 4]
The Culprit: False Sharing
[Diagram: Thread 1 on Core 1 and Thread 2 on Core 2, above main memory, with "Invalidate" messages passing between the cores; labeled "20x"]
DTHREADS: Eliminates False Sharing!
[Diagram: Process 1 on Core 1 and Process 2 on Core 2, each shown separately from the global state]
DTHREADS: Detailed Analysis
[Figure: runtime relative to pthreads for three configurations: ordering only, isolation only, and dthreads; y-axis 0 to 6]
DTHREADS: Scalable Determinism
[Figure: speedup of 8 cores over 2 cores for CoreDet, dthreads, and pthreads; y-axis 0 to 4]
DTHREADS
% g++ myprog.cpp -lpthread