Software Transactions:
A Programming-Languages Perspective
Dan Grossman
University of Washington
29 January 2008
Atomic
An easier-to-use and harder-to-implement primitive
void deposit(int x){
  synchronized(this){
    int tmp = balance;
    tmp += x;
    balance = tmp;
}}

void deposit(int x){
  atomic {
    int tmp = balance;
    tmp += x;
    balance = tmp;
}}

synchronized: lock acquire/release
atomic: (behaves as if) no interleaved computation
Viewpoints
Software transactions good for:
• Software engineering (avoid races & deadlocks)
• Performance (optimistic “no conflict” without locks)
Research should be guiding:
• New hardware with transactional support
• Software support
– Semantic mismatch between language & hardware
– Prediction: hardware for the common/simple case
– May be fast enough without hardware
– Lots of nontransactional hardware exists
PL Perspective
Complementary to lower-level implementation work
Motivation:
– What is the essence of the advantage over locks?
Language design:
– Rigorous high-level semantics
– Interaction with rest of the language
Language implementation:
– Interaction with modern compilers
– New optimization needs
Answers urgently needed for the multicore era
Today, part 1
Language design, semantics:
1. Motivation: Example + the GC analogy [OOPSLA07]
2. Semantics: strong vs. weak isolation [PLDI07]* [POPL08]
3. Interaction w/ other features [ICFP05][SCHEME07][POPL08]
* Joint work with Intel PSL
Today, part 2
Implementation:
4. On one core [ICFP05] [SCHEME07]
5. Static optimizations for strong isolation [PLDI07]*
6. Multithreaded transactions
* Joint work with Intel PSL
Code evolution
void deposit(…) { synchronized(this) { … }}
void withdraw(…) { synchronized(this) { … }}
int balance(…) { synchronized(this) { … }}
Code evolution
void deposit(…)  { synchronized(this) { … }}
void withdraw(…) { synchronized(this) { … }}
int  balance(…)  { synchronized(this) { … }}
void transfer(Acct from, int amt) {
  if(from.balance()>=amt && amt < maxXfer) {
    from.withdraw(amt);
    this.deposit(amt);
  }
}
Code evolution
void deposit(…) { synchronized(this) { … }}
void withdraw(…) { synchronized(this) { … }}
int balance(…) { synchronized(this) { … }}
void transfer(Acct from, int amt) {
  synchronized(this) {
    //race
    if(from.balance()>=amt && amt < maxXfer) {
      from.withdraw(amt);
      this.deposit(amt);
    }
  }
}
Code evolution
void deposit(…) { synchronized(this) { … }}
void withdraw(…) { synchronized(this) { … }}
int balance(…) { synchronized(this) { … }}
void transfer(Acct from, int amt) {
  synchronized(this) {
    synchronized(from) { //deadlock (still)
      if(from.balance()>=amt && amt < maxXfer) {
        from.withdraw(amt);
        this.deposit(amt);
      }
    }
  }
}
Code evolution
void deposit(…) { atomic { … }}
void withdraw(…) { atomic { … }}
int balance(…) { atomic { … }}
Code evolution
void deposit(…)  { atomic { … }}
void withdraw(…) { atomic { … }}
int  balance(…)  { atomic { … }}
void transfer(Acct from, int amt) {
  //race
  if(from.balance()>=amt && amt < maxXfer) {
    from.withdraw(amt);
    this.deposit(amt);
  }
}
Code evolution
void deposit(…) { atomic { … }}
void withdraw(…) { atomic { … }}
int balance(…) { atomic { … }}
void transfer(Acct from, int amt) {
  atomic {
    //correct and parallelism-preserving!
    if(from.balance()>=amt && amt < maxXfer){
      from.withdraw(amt);
      this.deposit(amt);
    }
  }
}
But can we generalize?
So transactions sure look appealing…
But what is the essence of the benefit?
Transactional Memory (TM) is to
shared-memory concurrency
as
Garbage Collection (GC) is to
memory management
GC in 60 seconds
• Allocate objects in the heap
• Deallocate objects to reuse heap space
– If too soon, dangling-pointer dereferences
– If too late, poor performance / space exhaustion
[Figure: roots pointing into a graph of heap objects]
Automate deallocation via reachability approximation
GC Bottom-line
Established technology with widely accepted benefits
• Even though it can perform arbitrarily badly in theory
• Even though you can’t always ignore how GC works
(at a high level)
• Even though an active research area after 40 years
Now about that analogy…
The problem, part 1
Why memory management is hard:
• Balance correctness (avoid dangling pointers)
• And performance (no space waste or exhaustion)
Manual approaches require whole-program protocols
• Example: Manual reference count for each object
  – Must avoid garbage cycles
(The slide overlays the concurrency analogs: concurrent programming,
race conditions, loss of parallelism, deadlock, lock, lock acquisition.)
The problem, part 2
Manual memory-management is non-modular:
• Caller and callee must know what each other access
  or deallocate to ensure the right memory is live
• A small change can require wide-scale code changes
  – Correctness requires knowing what data
    subsequent computation will access
(The slide overlays the concurrency analogs: synchronization, release,
locks are held, concurrent.)
The solution
Move whole-program protocol to language implementation
• One-size-fits-most implemented by experts
  – Usually inside the compiler and run-time
• GC system uses subtle invariants, e.g.:
  – Object header-word bits
  – No unknown mature pointers to nursery objects
(The slide overlays the TM analogs: TM, thread-shared, thread-local.)
So far…

                 memory management     concurrency
correctness      dangling pointers     races
performance      space exhaustion      deadlock
automation       garbage collection    transactional memory
new objects      nursery data          thread-local data
Incomplete solution
GC is a bad idea when “reachable” is a bad approximation of “cannot-be-deallocated”
  (TM: when “memory conflict” is a bad approximation of “cannot run in parallel”)
• Weak pointers overcome this fundamental limitation (TM: open nested txns)
  – Best used by experts for well-recognized idioms
    (e.g., software caches; TM: unique-id generation)
• In the extreme, programmers can encode manual memory management
  on top of GC (TM: locking on top of TM)
  – Destroys most of GC’s (TM’s) advantages
Circumventing TM
class SpinLock {
  private boolean b = false;

  void acquire() {
    while(true)
      atomic {
        if(b) continue;
        b = true;
        return;
      }
  }

  void release() {
    atomic { b = false; }
  }
}
It really keeps going (see the essay)

                        memory management        concurrency
correctness             dangling pointers        races
performance             space exhaustion         deadlock
automation              garbage collection       transactional memory
new objects             nursery data             thread-local data
key approximation       reachability             memory conflicts
manual circumvention    weak pointers            open nesting
complete avoidance      object pooling           lock library
uncontrollable approx.  conservative collection  false memory conflicts
eager approach          reference-counting       update-in-place
lazy approach           tracing                  update-on-commit
external data           I/O of pointers          I/O in transactions
forward progress        real-time                obstruction-free
static optimizations    liveness analysis        escape analysis
Lesson
Transactional memory is to
shared-memory concurrency
as
garbage collection is to
memory management
Huge but incomplete help for correct, efficient software
Analogy should help guide transactions research
Today, part 1
Language design, semantics:
1. Motivation: Example + the GC analogy [OOPSLA07]
2. Semantics: strong vs. weak isolation [PLDI07]* [POPL08]
[Katherine Moore]
3. Interaction w/ other features [ICFP05][SCHEME07][POPL08]
* Joint work with Intel PSL
“Weak” isolation
initially y == 0

// Thread 1:
atomic {
  y = 1;
  x = 3;
  y = x;
}

// Thread 2:
x = 2;
print(y); //1? 2? 5577?
Widespread misconception:
“Weak” isolation violates the “all-at-once” property
only if corresponding lock code has a race
(May still be a bad thing, but smart people disagree.)
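To make that claim concrete, here is a minimal runnable sketch of the corresponding lock-based code (the lock object lk and the two-thread harness are illustrative assumptions, not from the slide). Thread 2 touches x and y without holding the lock, so the lock version already has a data race; that race is exactly what licenses the strange outcomes above.

import static java.lang.System.out;

class WeakIsolationSketch {
  static int x = 0, y = 0;                 // initially y == 0
  static final Object lk = new Object();   // assumed lock guarding the "atomic" code

  public static void main(String[] args) throws InterruptedException {
    Thread t1 = new Thread(() -> {
      synchronized (lk) {                  // stands in for the atomic block above
        y = 1;
        x = 3;
        y = x;
      }
    });
    Thread t2 = new Thread(() -> {
      x = 2;                               // no lock held: data race with t1
      out.println(y);                      // racy read; several outcomes possible
    });
    t1.start(); t2.start();
    t1.join(); t2.join();
  }
}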
It’s worse
Privatization: One of several examples where lock code
works and weak-isolation transactions do not
initially ptr.f == ptr.g

// Thread 1 (privatizes the object, then uses it without synchronization):
sync(lk) {
  r = ptr;
  ptr = new C();
}
assert(r.f==r.g);

// Thread 2:
sync(lk) {
  ++ptr.f;
  ++ptr.g;
}

[Figure: ptr points to an object with fields f and g]

(Example adapted from [Rajwar/Larus] and [Hudson et al])
It’s worse
Every published weak-isolation system lets the assertion fail!
• Eager-update or lazy-update: with eager update, Thread 2’s writes can be
  rolled back after Thread 1 has privatized the object; with lazy update, they
  can be written back after Thread 1 has already read r.f and r.g

initially ptr.f == ptr.g

// Thread 1:
atomic {
  r = ptr;
  ptr = new C();
}
assert(r.f==r.g);

// Thread 2:
atomic {
  ++ptr.f;
  ++ptr.g;
}

[Figure: ptr points to an object with fields f and g]
The need for semantics
• Which is wrong: the privatization code or the
transactions implementation?
• What other “gotchas” exist?
– What language/coding restrictions suffice to avoid
them?
• Can programmers correctly use transactions without
understanding their implementation?
– What makes an implementation correct?
Only rigorous source-level semantics can answer
What we did
Formal operational semantics for a collection of similar
languages that have different isolation properties

Program state allows at most one live transaction:
  a; H; e1 || … || en   →   a’; H’; e1’ || … || en’

Multiple languages, including:
1. “Strong”: If one thread is in a transaction, no other
   thread may use shared memory or enter a transaction
2. “Weak-1-lock”: If one thread is in a transaction, no
   other thread may enter a transaction
3. “Weak-undo”: Like weak, plus a transaction may abort
   at any point, undoing its changes and restarting
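As an illustration only (a minimal sketch, not the POPL08 rules), one simplified strong-isolation step rule could be written as follows, assuming a single-thread heap-step judgment H; e → H'; e' and writing a = • when no transaction is live. While thread i holds the transaction (a = i), this simplified rule lets only thread i step at all, which is slightly more restrictive than the informal description above.

% Minimal sketch with assumed judgment forms, not the paper's actual rules:
\[
\frac{H;\; e_i \;\rightarrow\; H';\; e_i'
      \qquad a = \bullet \ \text{ or } \ a = i}
     {a;\; H;\; e_1 \,\|\, \cdots \,\|\, e_i \,\|\, \cdots \,\|\, e_n
        \;\rightarrow\;
      a;\; H';\; e_1 \,\|\, \cdots \,\|\, e_i' \,\|\, \cdots \,\|\, e_n}
\]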
A family
Now we have a family of languages:
“Strong”: … other threads can’t use memory
or start transactions
“Weak-1-lock”: … other threads can’t start transactions
“Weak-undo”: like weak, plus undo/restart
So we can study how family members differ and
conditions under which they are the same
Oh, and we have a kooky, ooky name:
The AtomsFamily
Easy Theorems
Theorem: Every program behavior in strong is possible in
weak-1-lock
Theorem: weak-1-lock allows behaviors strong does not
Theorem: Every program behavior in weak-1-lock is
possible in weak-undo
Theorem (slightly more surprising): weak-undo allows
behaviors weak-1-lock does not
Hard theorems
Consider a (formally defined) type system that ensures
any mutable memory is either:
– Only accessed in transactions
– Only accessed outside transactions
Theorem: If a program type-checks, it has the same
possible behaviors under strong and weak-1-lock
Theorem: If a program type-checks, it has the same
possible behaviors under weak-1-lock and weak-undo
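As a concrete but hypothetical illustration of the restriction (the classification below is an assumption for exposition, not the paper's formal type system): think of every mutable field as classified either transaction-only or non-transaction-only, so a program like the last method below would be rejected.

class Account {
  int balance;       // suppose: transaction-only (every access is inside atomic)
  int statsCounter;  // suppose: non-transaction-only (never accessed inside atomic)

  void deposit(int x) {
    atomic { balance += x; }   // ok: transaction-only field used inside atomic
  }
  void bumpStats() {
    statsCounter++;            // ok: non-transaction-only field used outside atomic
  }
  void badPeek() {
    print(balance);            // rejected: transaction-only field accessed outside atomic
  }
}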
A few months in 1 picture
[Figure: relationships among the languages strong, strong-undo, weak-1-lock, and weak-undo]
Lesson
Weak isolation has surprising behavior;
formal semantics lets us model the behavior and
prove sufficient conditions for avoiding it
In other words: With a (too) restrictive type system,
get semantics of strong and performance of weak
Today, part 1
Language design, semantics:
1. Motivation: Example + the GC analogy [OOPSLA07]
2. Semantics: strong vs. weak isolation [PLDI07]* [POPL08]
3. Interaction w/ other features [ICFP05][SCHEME07][POPL08]
* Joint work with Intel PSL
What if…
Real languages need precise semantics for all feature
interactions. For example:
• Native Calls [Ringenburg]
• Exceptions [Ringenburg, Kimball] (a sketch of this question appears below)
• First-class continuations [Kimball]
• Thread-creation [Moore]
• Java-style class-loading [Hindman]
• Open: Bad interactions with memory-consistency model
See joint work with Manson and Pugh [MSPC06]
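For instance, the Exceptions item above hides a real design question. The sketch below, in the slides' pseudo-Java, only poses it; balance, limit, and OverdraftError are made-up names, and it does not report the talk's answer.

void m() {
  atomic {
    balance += 100;
    if (balance > limit)
      throw new OverdraftError();  // if this escapes the atomic block,
                                   // should the += 100 commit or roll back?
  }
}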
Today, part 2
Implementation:
4. On one core [ICFP05] [SCHEME07]
[Michael Ringenburg, Aaron Kimball]
5. Static optimizations for strong isolation [PLDI07]*
6. Multithreaded transactions
* Joint work with Intel PSL
Interleaved execution
The “uniprocessor (and then some)” assumption:
Threads communicating via shared memory don't
execute in “true parallel”
Important special case:
• Uniprocessors still exist
• Many language implementations assume it
(e.g., OCaml, Scheme48)
• Multicore may assign one core to an application
Implementing atomic
Key pieces:
• Execution of an atomic block logs writes
• If the scheduler pre-empts during atomic, roll back the
thread
• Duplicate code or bytecode-interpreter dispatch so
non-atomic code is not slowed by logging
Logging example
int x=0, y=0;
void f() {
  int z = y+1;
  x = z;
}
void g() {
  y = x+1;
}
void h() {
  atomic {
    y = 2;
    f();
    g();
  }
}

Executing atomic block:
• build LIFO log of old values: y:0, z:?, x:0, y:2

Rollback on pre-emption:
• Pop log, doing assignments
• Set program counter and stack to beginning of atomic

On exit from atomic:
• Drop log
Logging efficiency
Log from the example: y:0, z:?, x:0, y:2
Keep the log small:
• Don’t log reads (key uniprocessor advantage)
• Need not log memory allocated after atomic entered
– Particularly initialization writes
• Need not log an address more than once
– To keep logging fast, switch from array to
hashtable after “many” (50) log entries
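A minimal sketch of such a log in Java follows (illustrative names, not AtomCaml's actual implementation). It uses closures to capture old values and, for simplicity, omits the array-to-hashtable switch and the duplicate-address check described above.

import java.util.ArrayDeque;
import java.util.Deque;

final class WriteLog {
  // LIFO log of undo actions, one per logged write.
  private final Deque<Runnable> undo = new ArrayDeque<>();

  // Called before each write inside atomic: remember how to restore the old value.
  void logWrite(Runnable restoreOldValue) { undo.push(restoreOldValue); }

  // Pre-emption during atomic: pop the log, restoring old values in LIFO order.
  void rollback() { while (!undo.isEmpty()) undo.pop().run(); }

  // Normal exit from atomic: just drop the log.
  void commit() { undo.clear(); }
}

class LoggingSketch {                        // mirrors h() from the earlier slide
  static int x = 0, y = 0;
  static void h(WriteLog log) {
    final int oldY = y;
    log.logWrite(() -> y = oldY);            // log y:0
    y = 2;
    // ... f() and g() would log x and y the same way before writing ...
  }
}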
Evaluation
Strong isolation on uniprocessors at little cost
– See papers for “in the noise” performance
• Memory-access overhead
           not in atomic    in atomic
read       none             none
write      none             log (2 more writes)
Recall initialization writes need not be logged
• Rare rollback
Lesson
Implementing transactions in software for a uniprocessor
is so efficient it deserves special-casing
Note: Don’t run other multicore services on a
uniprocessor either
Today, part 2
Implementation:
4. On one core [ICFP05] [SCHEME07]
5. Static optimizations for strong isolation [PLDI07]*
[Steven Balensiefer, Benjamin Hindman]
6. Multithreaded transactions
* Joint work with Intel PSL
Strong performance problem
Recall uniprocessor overhead:

           not in atomic    in atomic
read       none             none
write      none             some

With parallelism:

           not in atomic    in atomic
read       none iff weak    some
write      none iff weak    some
Optimizing away strong’s cost
Data that needs no extra overhead under strong isolation:
• Thread-local
• Not accessed in transaction
• Immutable
New: static analysis for not-accessed-in-transaction …
Not-accessed-in-transaction
Revisit overhead of not-in-atomic for strong isolation,
given information about how data is used in atomic
                         not in atomic, data with:
            no atomic access   no atomic write   atomic write      in atomic
read        none               none              some              some
write       none               some              some              some
Yet another client of pointer-analysis
Analysis details
• Whole-program, context-insensitive, flow-insensitive
  – Scalable, but needs whole program
• Can be done before method duplication
  – Keep lazy code generation without losing precision
• Given pointer information, just two more passes:
  1. How is an “abstract object” accessed transactionally?
  2. What “abstract objects” might a non-transactional access use?
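A rough sketch of those two passes in Java (the types Access and AbstractObject and the method names are hypothetical stand-ins for the real analysis infrastructure, not the PLDI07 code). The barrier decision mirrors the table above: a non-transactional read keeps its barrier only if the data may be written in a transaction, and a non-transactional write keeps it if the data is accessed in a transaction at all.

import java.util.*;

class NotAccessedInTx {
  enum TxUse { NONE, READ_ONLY, WRITTEN }   // ordered from weakest to strongest

  interface AbstractObject {}
  interface Access {
    boolean isWrite();
    boolean insideAtomic();
    Set<AbstractObject> pointsTo();         // abstract objects this access may touch
  }

  // Pass 1: how is each abstract object accessed transactionally?
  static Map<AbstractObject, TxUse> summarizeTxUse(List<Access> allAccesses) {
    Map<AbstractObject, TxUse> txUse = new HashMap<>();
    for (Access a : allAccesses) {
      if (!a.insideAtomic()) continue;
      TxUse u = a.isWrite() ? TxUse.WRITTEN : TxUse.READ_ONLY;
      for (AbstractObject o : a.pointsTo())
        txUse.merge(o, u, (p, q) -> p.compareTo(q) >= 0 ? p : q);   // keep the stronger use
    }
    return txUse;
  }

  // Pass 2: does a non-transactional access still need a strong-isolation barrier?
  static boolean needsBarrier(Access a, Map<AbstractObject, TxUse> txUse) {
    TxUse worst = TxUse.NONE;
    for (AbstractObject o : a.pointsTo()) {
      TxUse u = txUse.getOrDefault(o, TxUse.NONE);
      if (u.compareTo(worst) > 0) worst = u;
    }
    return a.isWrite() ? worst != TxUse.NONE : worst == TxUse.WRITTEN;
  }
}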
Collaborative effort
• UW: static analysis using pointer analysis
– Via Paddle/Soot from McGill
• Intel PSL: high-performance STM
– Via compiler and run-time
Static analysis annotates bytecodes, so the compiler
back-end knows what it can omit
Benchmarks
[Chart: Tsp benchmark. Time (s) vs. number of threads (1, 2, 4, 8, 16), comparing
Synch, Weak Atom, Strong Atom No Opts, +JIT Opts, +DEA, +Static Opts]
Benchmarks
[Chart: JBB benchmark. Average time per 10,000 ops (s) vs. number of threads
(1, 2, 4, 8, 16), comparing Synch, Weak Atom, Strong Atom No Opts, +JIT Opts,
+DEA, +Static Opts]
Lesson
The cost of strong isolation is in the nontransactional code;
compiler optimizations help a lot
Today, part 2
Implementation:
4. On one core [ICFP05] [SCHEME07]
5. Static optimizations for strong isolation [PLDI07]*
6. Multithreaded transactions [Aaron Kimball]
Caveat: ongoing work
* Joint work with Intel PSL
Multithreaded Transactions
• Most implementations (hw or sw) assume code inside
a transaction is single-threaded
– But isolation and parallelism are orthogonal
– And Amdahl’s Law will strike with manycore
• Language design: need nested transactions
• Currently modifying Microsoft’s Bartok STM
– Key: correct logging without sacrificing parallelism
• Work perhaps ahead of the technology curve
– like concurrent garbage collection
Credit
Semantics: Katherine Moore
Uniprocessor: Michael Ringenburg, Aaron Kimball
Optimizations: Steven Balensiefer, Ben Hindman
Implementing multithreaded transactions: Aaron Kimball
Memory-model issues: Jeremy Manson, Bill Pugh
High-performance strong STM: Tatiana Shpeisman,
Vijay Menon, Ali-Reza Adl-Tabatabai, Richard Hudson, Bratin Saha
wasp.cs.washington.edu
Please read
High-Level Small-Step Operational Semantics for Transactions
(POPL08) Katherine F. Moore and Dan Grossman
The Transactional Memory / Garbage Collection Analogy
(OOPSLA07) Dan Grossman
Software Transactions Meet First-Class Continuations
(SCHEME07) Aaron Kimball and Dan Grossman
Enforcing Isolation and Ordering in STM
(PLDI07) Tatiana Shpeisman, Vijay Menon,
Ali-Reza Adl-Tabatabai, Steve Balensiefer, Dan Grossman,
Richard Hudson, Katherine F. Moore, and Bratin Saha
Atomicity via Source-to-Source Translation
(MSPC06) Benjamin Hindman and Dan Grossman
What Do High-Level Memory Models Mean for Transactions?
(MSPC06) Dan Grossman, Jeremy Manson, and William Pugh
AtomCaml: First-Class Atomicity via Rollback
(ICFP05) Michael F. Ringenburg and Dan Grossman
Lessons
1. Transactions: the garbage collection of shared memory
2. Semantics lets us prove sufficient conditions for
avoiding weak-isolation anomalies
3. Must define interaction with features like exceptions
4. Uniprocessor implementations are worth special-casing
5. Compiler optimizations help remove the overhead in
nontransactional code resulting from strong isolation
6. Amdahl’s Law suggests multithreaded transactions,
which we believe we can make scalable