Transcript of talk slides

Serialization Sets: A Dynamic Dependence-Based Parallel Execution Model
Matthew D. Allen
Srinath Sridharan
Gurindar S. Sohi
University of Wisconsin-Madison
Motivation
• Multicore processors ubiquitous
– Performance via parallel execution
• Multithreaded programming is problematic
– Dependences encoded statically
– Difficult to reason about locks, synchronization
– Many errors not found in sequential programs
– Execution is nondeterministic
• Need better parallel execution models!
February 16, 2009
PPoPP 2009
Serialization Sets Overview
• Sequential program with annotations
– Identify potentially independent methods
– Associate a serializer with these methods
• Serializer groups dependent method invocations
into serialization sets
– Runtime executes in order to honor dependences
• Serializer attempts to map independent method
invocations into different sets
– Runtime opportunistically parallelizes execution
Serialization Sets Overview
• Sequential program with no locks and
no explicit synchronization
• Deterministic, race-free execution
• Comparable performance to multithreading
– Sometimes better!
Outline
• Overview
• Serialization Sets Execution Model
• Prometheus: C++ Library for SS
• Experimental Evaluation
• Related Work & Conclusions
Running Example
trans_t* trans;
while ((trans = get_trans ()) != NULL) {    // # of transactions?
  account_t* account = trans->account;      // Points to?
  if (trans->type == DEPOSIT)
    account->deposit (trans->amount);
  if (trans->type == WITHDRAW)
    account->withdraw (trans->amount);
}                                           // Loop-carried dependence?
Several static unknowns!
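For reference, a self-contained sequential version of the running example. The names trans_t, account_t, get_trans, DEPOSIT and WITHDRAW come from the slide; their fields, the vector-backed transaction source, and the helper process_all are assumptions made only so the loop compiles and runs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum trans_type { DEPOSIT, WITHDRAW };

struct account_t {
    int number = 0;
    long balance = 0;
    void deposit(long amount) { balance += amount; }
    void withdraw(long amount) { balance -= amount; }
};

struct trans_t {
    trans_type type;
    account_t* account;  // which account: known only at run time
    long amount;
};

// Hypothetical transaction source: hands out transactions from a vector,
// then returns nullptr, playing the role of get_trans() on the slide.
static std::vector<trans_t>* g_input = nullptr;
static std::size_t g_next = 0;

trans_t* get_trans() {
    if (g_input == nullptr || g_next >= g_input->size()) return nullptr;
    return &(*g_input)[g_next++];
}

// The loop from the slide (NULL written as nullptr).
void process_all(std::vector<trans_t>& input) {
    g_input = &input;
    g_next = 0;
    trans_t* trans;
    while ((trans = get_trans()) != nullptr) {
        account_t* account = trans->account;
        if (trans->type == DEPOSIT)
            account->deposit(trans->amount);
        if (trans->type == WITHDRAW)
            account->withdraw(trans->amount);
    }
}
```

The static unknowns are visible here: the trip count depends on the input, and two transactions may or may not point to the same account_t.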
Multithreading Strategy
1) Read all transactions into an array
2) Divide chunks of the array among multiple threads
Each thread runs the loop body over its chunk:
  account_t* account = trans[i]->account;
  if (trans[i]->type == DEPOSIT)
    account->deposit (trans[i]->amount);
  if (trans[i]->type == WITHDRAW)
    account->withdraw (trans[i]->amount);
Oblivious to which accounts each thread may access!
→ Methods must lock the account to ensure mutual exclusion
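A minimal sketch of this conventional strategy, assuming std::thread for the workers and one std::mutex per account. The names locked_account and run_chunked are illustrative, not from the talk. Because no thread knows which accounts its chunk touches, every method takes its account's lock.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

enum trans_type { DEPOSIT, WITHDRAW };

// Every method locks: any thread may reach any account.
struct locked_account {
    std::mutex m;
    long balance = 0;
    void deposit(long amount)  { std::lock_guard<std::mutex> g(m); balance += amount; }
    void withdraw(long amount) { std::lock_guard<std::mutex> g(m); balance -= amount; }
};

struct trans_t {
    trans_type type;
    locked_account* account;
    long amount;
};

// 1) transactions already read into an array (trans)
// 2) the array is split into contiguous chunks, one per thread
void run_chunked(const std::vector<trans_t>& trans, unsigned num_threads) {
    assert(num_threads > 0);
    std::vector<std::thread> workers;
    const std::size_t n = trans.size();
    const std::size_t chunk = (n + num_threads - 1) / num_threads;
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = t * chunk;
        const std::size_t end = std::min(n, begin + chunk);
        if (begin >= end) break;
        workers.emplace_back([&trans, begin, end] {
            for (std::size_t i = begin; i < end; ++i) {
                locked_account* account = trans[i].account;
                if (trans[i].type == DEPOSIT)  account->deposit(trans[i].amount);
                if (trans[i].type == WITHDRAW) account->withdraw(trans[i].amount);
            }
        });
    }
    for (auto& w : workers) w.join();
}
```

The final balances come out right here only because deposits and withdrawals commute; the order in which a given account sees its operations still changes from run to run, which is the nondeterminism the Motivation slide refers to.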
Serialization Sets
• Potentially independent methods
– Modify only data owned by object
  – Fields / Data members
  – Pointers to non-shared data
– Consistent with OO practices (modularity,
encapsulation, information hiding)
• Modifying methods for independence
– Store return value in object, retrieve with accessor
– Copy pointer data
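To illustrate the two rewrites, a hypothetical account class before and after. In the "before" version withdraw returns its result and writes through a pointer to a log that other objects may share; in the "after" version the result is stored in the object and read back through an accessor, and the log is the object's own copy. All member names are assumptions for illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Before: the caller consumes the return value immediately, and the
// method writes through a pointer to data that may be shared.
struct account_before {
    long balance = 0;
    std::vector<std::string>* shared_log = nullptr;
    bool withdraw(long amount) {
        if (amount > balance) return false;
        balance -= amount;
        if (shared_log) shared_log->push_back("withdraw");
        return true;
    }
};

// After: the method touches only data owned by the object.
struct account_after {
    long balance = 0;
    std::vector<std::string> log;   // copied into the object, not pointed to
    bool last_ok = false;           // return value stored in the object

    void withdraw(long amount) {    // no return value, so it can be deferred
        last_ok = amount <= balance;
        if (last_ok) {
            balance -= amount;
            log.push_back("withdraw");
        }
    }
    // Accessor the program reads later (for example, after the epoch ends).
    bool last_withdraw_ok() const { return last_ok; }
};
```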
Serialization Sets
• Divide program into isolation epochs
– Data partitioned into domains
• Privately writable: data that may be read or
written by a single serialization set
– Object or set of objects
– Serializer dynamically identifies serialization set
for each method invocation
• Shared read-only: data that may be read (but
not written) by any method
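One way to picture the two domains in C++ is with wrapper types: a read-only wrapper that hands out only const references, and a privately-writable wrapper that asks its serializer which serialization set owns the object. The names read_only, private_writable, acct and acct_by_number are hypothetical, not the Prometheus declarations, and this sketch does not enforce epoch boundaries.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Shared read-only: any method may read; only const access is exposed.
template <typename T>
class read_only {
public:
    explicit read_only(T value) : value_(std::move(value)) {}
    const T& get() const { return value_; }
private:
    T value_;
};

// Privately writable: the serializer maps the object to the single
// serialization set allowed to read or write it during the epoch.
template <typename T, typename Serializer>
class private_writable {
public:
    explicit private_writable(T value) : value_(std::move(value)) {}
    std::size_t serialization_set() const { return Serializer{}(value_); }
    // Meant to be called only from the owning set (not checked here).
    T& get_for_owner() { return value_; }
private:
    T value_;
};

// Example types for the sketch (hypothetical).
struct acct { int number; long balance; };
struct acct_by_number {
    std::size_t operator()(const acct& a) const {
        return static_cast<std::size_t>(a.number);
    }
};
```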
Example with Serialization Sets
writable <account_t, acct_ser_t> pw_account_t;  // Declare privately-writable account type
                                                 // Serializer (acct_ser_t): uses account number
                                                 // to compute serialization set
begin_isolation ();                              // Begin isolation epoch
trans_t* trans;
while ((trans = get_trans ()) != NULL) {
  pw_account_t* account = trans->account;
  if (trans->type == DEPOSIT)
    delegate(account, deposit, trans->amount);   // Delegate indicates potentially
  if (trans->type == WITHDRAW)                   // independent operations
    delegate(account, withdraw, trans->amount);
}
end_isolation ();                                // End isolation epoch
At execution, delegate:
1) Executes serializer
2) Identifies serialization set
3) Inserts invocation in serialization set
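The three steps performed by delegate can be modeled with a small, purely sequential C++ sketch. Everything here (epoch_t, the std::function invocation lists, the body of acct_ser_t) is an assumption, not the Prometheus implementation. In particular, this model defers all work to end_isolation and drains the sets one after another instead of running them in parallel, which is enough to show the grouping and the per-set program order.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <vector>

struct account_t {
    int number;
    long balance;
    int get_number() const { return number; }
    void deposit(long amount) { balance += amount; }
    void withdraw(long amount) { balance -= amount; }
};

// Serializer: the account number names the serialization set.
struct acct_ser_t {
    int operator()(const account_t& a) const { return a.get_number(); }
};

using ss_t = int;

class epoch_t {
public:
    void begin_isolation() { sets_.clear(); }

    // delegate: 1) run the serializer, 2) identify the set,
    // 3) append the invocation to that set, preserving program order.
    template <typename Serializer, typename Method>
    void delegate(account_t* obj, Method m, long arg) {
        ss_t ss = Serializer{}(*obj);
        sets_[ss].push_back([obj, m, arg] { (obj->*m)(arg); });
    }

    // Each set runs its invocations in order. Sets touch disjoint
    // accounts, so they could run concurrently; this model does not.
    void end_isolation() {
        for (auto& entry : sets_)
            for (auto& call : entry.second) call();
        sets_.clear();
    }

    std::size_t num_sets() const { return sets_.size(); }

private:
    std::map<ss_t, std::vector<std::function<void()>>> sets_;
};
```

A call reads ep.delegate<acct_ser_t>(&acct, &account_t::withdraw, 50). With the deposits and withdrawals on accounts 100, 200 and 300 from the next slide, the invocations fall into three sets, and each account's balance depends only on the order of its own set.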
[Diagram: program context delegating invocations into the delegate context]
Serializer: delegate computes SS with account number
  ss_t ss = account->get_number();
Each delegated invocation lands in the serialization set for its account:
– SS #100: deposit $2000, withdraw $50, withdraw $20, deposit $300
– SS #200: withdraw $1000, withdraw $1000
– SS #300: withdraw $350, deposit $5000
[Diagram: the program thread delegates invocations to delegate threads (Delegate 0, Delegate 1), which execute serialization sets SS #100, SS #200 and SS #300 in parallel; each set's invocations run in program order]
Race-free, deterministic execution without synchronization!
Parallel Execution w/o Sharing
1. Vary data in privately-writable/read-only
domains in alternating epochs
   • Outputs of one epoch become inputs of the next
2. Associative, commutative methods
   • Operate on local copy of state
   • Reduction to summarize result
3. Containers manipulated by program context
   • Delegate operations on underlying data
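Pattern 1 can be pictured as double buffering: in one epoch buffer a is shared read-only input and each chunk of buffer b is privately writable by exactly one worker; the roles swap for the next epoch. The 1-D three-point averaging step and the use of one std::thread per chunk are assumptions chosen for illustration, not taken from the talk.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <utility>
#include <vector>

// One epoch: `in` is read-only for everyone; each chunk of `out` is
// written by exactly one worker (one serialization set per chunk).
void epoch(const std::vector<double>& in, std::vector<double>& out,
           std::size_t num_chunks) {
    const std::size_t n = in.size();
    std::vector<std::thread> workers;
    for (std::size_t c = 0; c < num_chunks; ++c) {
        const std::size_t begin = n * c / num_chunks;
        const std::size_t end = n * (c + 1) / num_chunks;
        workers.emplace_back([&in, &out, begin, end, n] {
            for (std::size_t i = begin; i < end; ++i) {
                double left = i > 0 ? in[i - 1] : in[i];
                double right = i + 1 < n ? in[i + 1] : in[i];
                out[i] = (left + in[i] + right) / 3.0;  // writes only its own slice
            }
        });
    }
    for (auto& w : workers) w.join();
}

// Alternate the domains: the buffer written in one epoch becomes the
// read-only input of the next.
std::vector<double> run_epochs(std::vector<double> a, int epochs,
                               std::size_t num_chunks) {
    std::vector<double> b(a.size());
    for (int e = 0; e < epochs; ++e) {
        epoch(a, b, num_chunks);
        std::swap(a, b);
    }
    return a;
}
```

Since no element is written by two workers and no buffer is read and written in the same epoch, the result does not depend on how many chunks are used.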
Prometheus: C++ Library for SS
• Template library
– Compile-time instantiation of SS data structures
– Metaprogramming for static type checking
• Runtime orchestrates parallel execution
• Portable
– x86, x86_64, SPARC V9
– Linux, Solaris
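As an illustration of metaprogramming for static type checking, a sketch in which delegate refuses, at compile time, any object that was not declared privately writable. The writable template name echoes the declaration on the earlier example slide, but its definition here, the is_writable trait, and this delegate signature are assumptions; the sketch also runs the method inline instead of enqueuing it.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical marker wrapper: T is privately writable with serializer S.
template <typename T, typename S>
struct writable : T {};

template <typename X> struct is_writable : std::false_type {};
template <typename T, typename S>
struct is_writable<writable<T, S>> : std::true_type {};

// Passing an unannotated object is a compile-time error.
template <typename Obj, typename Method, typename... Args>
void delegate(Obj* obj, Method m, Args... args) {
    static_assert(is_writable<Obj>::value,
                  "delegate() requires a privately-writable object");
    (obj->*m)(args...);   // inline call; a runtime would enqueue it instead
}

struct counter { int n = 0; void add(int k) { n += k; } };
struct by_id {};          // hypothetical serializer tag
using pw_counter = writable<counter, by_id>;
```

With a plain counter c, the call delegate(&c, &counter::add, 1) fails to compile with the static_assert message.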
Prometheus Serializers
• Serializers
– Subclass serializer base class and override method
– Or use built-in serializer supplied by library
• Reducibles
– Subclass reducible base class and override virtual
reduce method
– Reduction automatically performed on first use
after isolation epoch ends
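A sketch of both extension points using virtual functions. The base-class names (serializer_base, reducible), the member names (get_ss, local, reduce, get) and the one-copy-per-delegate-thread layout are assumptions about the shape of such an interface, not the actual Prometheus declarations. The reducible merges its local copies lazily, the first time get() is called after updates.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

using ss_t = std::size_t;

// Serializer: subclass and override the method that names the set.
template <typename T>
struct serializer_base {
    virtual ~serializer_base() = default;
    virtual ss_t get_ss(const T& obj) const = 0;
};

struct account_t { int number; long balance; };

struct acct_ser_t : serializer_base<account_t> {
    ss_t get_ss(const account_t& a) const override { return a.number; }
};

// Reducible: each delegate thread updates its own copy; reduce() merges.
template <typename T>
class reducible {
public:
    explicit reducible(std::size_t num_threads) : locals_(num_threads, T{}) {}
    virtual ~reducible() = default;

    T& local(std::size_t thread_id) { dirty_.store(true); return locals_[thread_id]; }

    // First use after updates performs the reduction.
    // T{} is assumed to be the identity of reduce (0 for a sum).
    const T& get() {
        if (dirty_.exchange(false)) {
            result_ = T{};
            for (const T& part : locals_) result_ = reduce(result_, part);
        }
        return result_;
    }

protected:
    virtual T reduce(const T& a, const T& b) const = 0;

private:
    std::vector<T> locals_;
    T result_{};
    std::atomic<bool> dirty_{false};
};

// An associative, commutative reduction: a sum.
struct sum_t : reducible<long> {
    using reducible<long>::reducible;
    long reduce(const long& a, const long& b) const override { return a + b; }
};
```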
Prometheus Runtime
[Diagram: one program thread feeding delegate threads 0, 1 and 2]
• Delegate assignment: SS % NUM_THREADS
• Communication queues: Fast-Forward [PPoPP 2008] +
polymorphic interface
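A simplified model of this organization, assuming one std::thread per delegate and a mutex/condition-variable queue standing in for the lock-free Fast-Forward queue named on the slide. The program thread routes each invocation to queue[SS % NUM_THREADS]; one thread drains each queue in FIFO order, so all invocations of a serialization set run in program order while different sets may run concurrently. The class and member names are hypothetical, and one object serves a single epoch.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

class delegate_runtime {
public:
    explicit delegate_runtime(std::size_t num_threads) : queues_(num_threads) {
        assert(num_threads > 0);
        for (std::size_t t = 0; t < num_threads; ++t)
            threads_.emplace_back([this, t] { drain(queues_[t]); });
    }

    // Program thread: route the invocation by SS % NUM_THREADS.
    void delegate(std::size_t ss, std::function<void()> call) {
        queue& q = queues_[ss % queues_.size()];
        { std::lock_guard<std::mutex> g(q.m); q.items.push_back(std::move(call)); }
        q.cv.notify_one();
    }

    // End of epoch: let each delegate thread finish its queue, then wait.
    void end_isolation() {
        for (queue& q : queues_) {
            { std::lock_guard<std::mutex> g(q.m); q.done = true; }
            q.cv.notify_one();
        }
        for (std::thread& th : threads_) th.join();
        threads_.clear();
    }

    ~delegate_runtime() { if (!threads_.empty()) end_isolation(); }

private:
    struct queue {
        std::mutex m;
        std::condition_variable cv;
        std::deque<std::function<void()>> items;
        bool done = false;
    };

    static void drain(queue& q) {
        for (;;) {
            std::function<void()> call;
            {
                std::unique_lock<std::mutex> lk(q.m);
                q.cv.wait(lk, [&q] { return q.done || !q.items.empty(); });
                if (q.items.empty()) return;   // done and nothing left
                call = std::move(q.items.front());
                q.items.pop_front();
            }
            call();   // run outside the lock, in FIFO (program) order
        }
    }

    std::vector<queue> queues_;
    std::vector<std::thread> threads_;
};
```

With three delegate threads, sets 100, 200 and 300 map to threads 1, 2 and 0. Note that with two threads all three would share thread 0, which stays correct but serializes them.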
Debugging Support
• Tag all data accessed by serialization set
– Objects
– Smart pointers
• Any data accessed by multiple serialization
sets indicates programmer error
• Problem: can’t detect some kinds of missing
annotations
– Future work: static checking of annotations
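A sketch of the tagging idea, assuming each tracked object carries an owner tag that is reset at the start of an epoch: the first serialization set to touch the object claims it, and an access from any other set is flagged. The ss_tag class, the kNoOwner sentinel and reporting through a return value are illustrative assumptions. An atomic compare-and-swap keeps the tag itself race-free when delegate threads touch it concurrently.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <limits>

using ss_t = std::size_t;

class ss_tag {
public:
    static constexpr ss_t kNoOwner = std::numeric_limits<ss_t>::max();

    void begin_epoch() { owner_.store(kNoOwner); }

    // Record an access by serialization set `ss`. Returns false if a
    // different set already touched this object in the current epoch,
    // which points to a missing or wrong annotation.
    bool record_access(ss_t ss) {
        ss_t expected = kNoOwner;
        if (owner_.compare_exchange_strong(expected, ss)) return true;  // claimed
        return expected == ss;   // on failure, expected holds the current owner
    }

    ss_t owner() const { return owner_.load(); }

private:
    std::atomic<ss_t> owner_{kNoOwner};
};
```

As the slide notes, this catches only data that carries a tag; an object the programmer forgot to annotate is never checked.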
Debugging Support
• Deterministic model means we can simulate
SS execution in sequential program
– Prometheus support for compiling debug version
– Do all debugging on sequential program!
• Correct sequential → correct parallel
(caveat: for a given input)
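Because each serialization set runs in program order and sets share no writable data, executing every delegated invocation immediately, in program order, gives the same final state as the deferred, set-by-set execution. A sketch of such a debug switch, assuming a hypothetical executor chosen at compile time (a template parameter here rather than a build flag, so both versions can be compared in one program). grouped_executor is only a stand-in for the parallel build: it changes the global order, not the degree of parallelism.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <utility>
#include <vector>

// Sequential debug build: delegate just calls the method now.
struct inline_executor {
    void delegate(std::size_t, std::function<void()> call) { call(); }
    void end_isolation() {}
};

// Stand-in for the parallel build: calls are grouped by serialization
// set and run set by set at the end of the epoch.
struct grouped_executor {
    std::map<std::size_t, std::vector<std::function<void()>>> sets;
    void delegate(std::size_t ss, std::function<void()> call) {
        sets[ss].push_back(std::move(call));
    }
    void end_isolation() {
        // Descending set ids: a global order unlike program order.
        for (auto it = sets.rbegin(); it != sets.rend(); ++it)
            for (auto& call : it->second) call();
        sets.clear();
    }
};

// Same program text for both builds; only the executor type differs.
template <typename Executor>
std::vector<long> run_bank(Executor& ex) {
    std::vector<long> balance = {0, 5000, 1000};   // accounts 0, 1, 2
    auto deposit = [&](std::size_t a, long amt) {
        ex.delegate(a, [&balance, a, amt] { balance[a] += amt; });
    };
    auto withdraw = [&](std::size_t a, long amt) {
        ex.delegate(a, [&balance, a, amt] { balance[a] -= amt; });
    };
    deposit(0, 2000);  withdraw(1, 1000); withdraw(0, 50);   withdraw(2, 350);
    withdraw(0, 20);   deposit(2, 5000);  withdraw(1, 1000); deposit(0, 300);
    ex.end_isolation();
    return balance;
}
```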
Evaluation Methodology
• Benchmarks
– Lonestar, NU-MineBench, PARSEC, Phoenix
• Conventional Parallelization
– pthreads, OpenMP
• Prometheus versions
– Port program to sequential C++ program
– Idiomatic C++: OO, inheritance, STL
– Parallelize with serialization sets
Results Summary
[Bar chart: speedup relative to the original sequential program for each benchmark, Conventional Parallel vs. Serialization Sets; y-axis 0 to 20]
4-socket AMD Barcelona (4-way multicore) = 16 total cores
Results Summary
[Bar chart: speedup relative to the original sequential program, Conventional Parallel vs. Serialization Sets, on four platforms: AMD Barcelona Multicore (4), AMD Barcelona ccNUMA (16), Sun UltraSPARC T-1 Multicore (32), Sun UltraSPARC III+ SMP (8); y-axis 0 to 10]
Related Work
• Actors / Active Objects
– Hewitt [JAI 1977]
• MultiLisp
– Halstead [ACM TOPLAS 1985]
• Inspector-Executor
– Wu et al. [ICPP 1991]
• Jade
– Rinard and Lam [ACM TOPLAS 1998]
• Cilk
– Frigo et al. [PLDI 1998]
• OpenMP
Conclusions
• Sequential program with annotations
– No explicit synchronization, no locks
• Programmers focus on keeping computation
private to object state
– Consistent with OO programming practices
• Dependence-based model
– Deterministic race-free parallel execution
• Performance close to, and sometimes better than,
multithreading
Questions