Software Transactions: A Programming-Languages Perspective
Dan Grossman, University of Washington
26 March 2008
26 March 2008 Dan Grossman, Software Transactions 2
Atomic
An easier-to-use and harder-to-implement primitive
lock acquire/release:
void deposit(int x) {
  synchronized(this) {
    int tmp = balance;
    tmp += x;
    balance = tmp;
  }
}
(behave as if) no interleaved computation:
void deposit(int x) {
  atomic {
    int tmp = balance;
    tmp += x;
    balance = tmp;
  }
}
Viewpoints
Software transactions good for:
• Software engineering (avoid races & deadlocks)
• Performance (optimistic “no conflict” without locks)
Research should be guiding:
• New hardware with transactional support
• Software support
  – Semantic mismatch between language & hardware
  – Prediction: hardware for the common/simple case
  – May be fast enough without hardware
  – Lots of nontransactional hardware exists
PL Perspective
Complementary to lower-level implementation work
Motivation:
– The essence of the advantage over locks
Language design:
– Rigorous high-level semantics
– Interaction with rest of the language
Language implementation:
– Interaction with modern compilers
– New optimization needs
Answers urgently needed for the multicore era
Today, part 1
Language design, semantics:
1. Motivation: Example + the GC analogy [OOPSLA07]
2. Semantics: strong vs. weak isolation [PLDI07]* [POPL08]
3. Interaction w/ other features [ICFP05][SCHEME07][POPL08]
* Joint work with Intel PSL
Today, part 2
Implementation:
4. On one core [ICFP05][SCHEME07]
5. Static optimizations for strong isolation [PLDI07]*
6. Multithreaded transactions
* Joint work with Intel PSL
Code evolution
void deposit(…) { synchronized(this) { … }}
void withdraw(…) { synchronized(this) { … }}
int balance(…) { synchronized(this) { … }}
void transfer(Acct from, int amt) {
  if(from.balance()>=amt && amt < maxXfer) {
    from.withdraw(amt);
    this.deposit(amt);
  }
}
void transfer(Acct from, int amt) {
  synchronized(this) { //race
    if(from.balance()>=amt && amt < maxXfer) {
      from.withdraw(amt);
      this.deposit(amt);
    }
  }
}
void transfer(Acct from, int amt) {
  synchronized(this) {
    synchronized(from) { //deadlock (still)
      if(from.balance()>=amt && amt < maxXfer) {
        from.withdraw(amt);
        this.deposit(amt);
      }
    }
  }
}
Code evolution
void deposit(…) { atomic { … }}
void withdraw(…) { atomic { … }}
int balance(…) { atomic { … }}
void transfer(Acct from, int amt) { //race
  if(from.balance()>=amt && amt < maxXfer) {
    from.withdraw(amt);
    this.deposit(amt);
  }
}
void transfer(Acct from, int amt) {
  atomic { //correct and parallelism-preserving!
    if(from.balance()>=amt && amt < maxXfer) {
      from.withdraw(amt);
      this.deposit(amt);
    }
  }
}
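Java has no atomic keyword, so the composed transfer can only be approximated in plain Java. The following is a minimal sketch, assuming a hypothetical helper Atomic.run backed by one global reentrant lock. It reproduces the "behave as if no interleaved computation" meaning and shows that nested atomic sections compose, but it gives up the parallelism a real transactional implementation aims to keep. The names Atomic, MAX_XFER, and the Acct constructor are illustrative assumptions, not part of the talk.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for an `atomic` block: one global reentrant lock.
final class Atomic {
    private static final ReentrantLock GLOBAL = new ReentrantLock();

    static void run(Runnable body) {
        GLOBAL.lock();
        try {
            body.run();
        } finally {
            GLOBAL.unlock();
        }
    }
}

class Acct {
    static final int MAX_XFER = 1000; // assumed limit, stands in for maxXfer
    private int balance;

    Acct(int initial) { balance = initial; }

    void deposit(int x)  { Atomic.run(() -> balance += x); }
    void withdraw(int x) { Atomic.run(() -> balance -= x); }

    int balance() {
        int[] out = new int[1];
        Atomic.run(() -> out[0] = balance);
        return out[0];
    }

    // Nested atomic sections compose: the check and both updates happen together.
    void transfer(Acct from, int amt) {
        Atomic.run(() -> {
            if (from.balance() >= amt && amt < MAX_XFER) {
                from.withdraw(amt);
                this.deposit(amt);
            }
        });
    }
}
```

Because the lock is reentrant and global, transfer needs no knowledge of which locks deposit and withdraw take, which is the modularity point of the slide. The cost is that unrelated accounts serialize.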
But can we generalize
So transactions sure look appealing…
But what is the essence of the benefit?
Transactional Memory (TM) is to shared-memory concurrency as Garbage Collection (GC) is to memory management
GC in 60 seconds
Automate deallocation via reachability approximation
[Figure: roots pointing to heap objects]
• Allocate objects in the heap
• Deallocate objects to reuse heap space
  – If too soon, dangling-pointer dereferences
  – If too late, poor performance / space exhaustion
GC Bottom-line
Established technology with widely accepted benefits
• Even though it can perform arbitrarily badly in theory
• Even though you can’t always ignore how GC works (at a high-level)
• Even though it is still an active research area after 40+ years
Now about that analogy…
The problem, part 1
Why memory management is hard:
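Balance correctness (avoid dangling pointers)
and performance (space waste or exhaustion)
Manual approaches require whole-program protocols
Example: Manual reference count for each object
• Must avoid garbage cycles
[Slide overlay, the same text read for concurrency: memory management → concurrent programming; dangling pointers → race conditions; space waste or exhaustion → loss of parallelism, deadlock; reference count → lock; garbage cycles → lock acquisition cycles]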
The problem, part 2
Manual memory-management is non-modular:
• Caller and callee must know what each other access or deallocate to ensure right memory is live
• A small change can require wide-scale code changes
  – Correctness requires knowing what data subsequent computation will access
[Slide overlay terms for the concurrency reading of the same text: synchronization (for memory management); locks are held; release; concurrent (for subsequent)]
The solution
Move whole-program protocol to language implementation
• One-size-fits-most implemented by experts
  – Usually inside the compiler and run-time
• GC system uses subtle invariants, e.g.:
  – Object header-word bits
  – No unknown mature pointers to nursery objects
[Slide overlay, the same text read for concurrency: GC → TM; mature → thread-shared; nursery → thread-local]
So far…
              memory management     concurrency
correctness   dangling pointers     races
performance   space exhaustion      deadlock
automation    garbage collection    transactional memory
new objects   nursery data          thread-local data
Incomplete solution
GC a bad idea when “reachable” is a bad approximation of “cannot-be-deallocated”
Weak pointers overcome this fundamental limitation
– Best used by experts for well-recognized idioms (e.g., software caches)
In extreme, programmers can encode manual memory management on top of GC
– Destroys most of GC’s advantages
[Slide overlay, the same text read for concurrency: GC → TM; “reachable” → “memory conflict”; “cannot-be-deallocated” → “cannot-run-in-parallel”; weak pointers → open nested txns; software caches → unique id generation; manual memory management → locking]
Circumventing TM
class SpinLock {
  private boolean b = false;
  void acquire() {
    while(true)
      atomic {
        if(b) continue;
        b = true;
        return;
      }
  }
  void release() {
    atomic { b = false; }
  }
}
It really keeps going (see the essay)
                         memory management         concurrency
correctness              dangling pointers         races
performance              space exhaustion          deadlock
automation               garbage collection        transactional memory
new objects              nursery data              thread-local data
key approximation        reachability              memory conflicts
manual circumvention     weak pointers             open nesting
complete avoidance       object pooling            lock library
uncontrollable approx.   conservative collection   false memory conflicts
eager approach           reference-counting        update-in-place
lazy approach            tracing                   update-on-commit
external data            I/O of pointers           I/O in transactions
forward progress         real-time                 obstruction-free
static optimizations     liveness analysis         escape analysis
Lesson
Transactional memory is to shared-memory concurrency as garbage collection is to memory management
Huge but incomplete help for correct, efficient software
Analogy should help guide transactions research
Today, part 1
Language design, semantics:
1. Motivation: Example + the GC analogy [OOPSLA07]
2. Semantics: strong vs. weak isolation [PLDI07]* [POPL08]
[Katherine Moore]
3. Interaction w/ other features [ICFP05][SCHEME07][POPL08]
* Joint work with Intel PSL
“Weak” isolation
Widespread misconception:
“Weak” isolation violates the “all-at-once” property only if corresponding lock code has a race
(May still be a bad thing, but smart people disagree.)
initially y==0

Thread 1:
atomic {
  y = 1;
  x = 3;
  y = x;
}

Thread 2:
x = 2;
print(y); //1? 2? 666?
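To see which values the second thread can print, the example can be explored exhaustively. The sketch below is a minimal model, assuming x also starts at 0, that individual statements interleave in a sequentially consistent way, and that the transaction updates memory in place without rolling back. The class and method names are illustrative. It enumerates every interleaving of the two threads, once with the atomic block treated as a single indivisible step (strong isolation) and once with its three statements exposed to the non-transactional thread (weak isolation).

```java
import java.util.Set;
import java.util.TreeSet;

class WeakIsolationDemo {
    // Thread 1 has three steps (y = 1; x = 3; y = x); thread 2 has two (x = 2; print(y)).
    // t1 and t2 count how many steps each thread has completed.
    static void explore(int t1, int t2, int x, int y, boolean strong, Set<Integer> printed) {
        if (t1 < 3) {
            if (strong) {
                // Strong isolation: the whole block runs as one indivisible step.
                explore(3, t2, 3, 3, true, printed);
            } else if (t1 == 0) {
                explore(1, t2, x, 1, false, printed);  // y = 1
            } else if (t1 == 1) {
                explore(2, t2, 3, y, false, printed);  // x = 3
            } else {
                explore(3, t2, x, x, false, printed);  // y = x
            }
        }
        if (t2 == 0) {
            explore(t1, 1, 2, y, strong, printed);     // x = 2
        } else if (t2 == 1) {
            printed.add(y);                            // print(y)
            explore(t1, 2, x, y, strong, printed);
        }
    }

    // Collects every value print(y) can observe under the chosen model.
    static Set<Integer> printedValues(boolean strong) {
        Set<Integer> printed = new TreeSet<>();
        explore(0, 0, 0, 0, strong, printed);
        return printed;
    }

    public static void main(String[] args) {
        System.out.println("strong: " + printedValues(true));
        System.out.println("weak:   " + printedValues(false));
    }
}
```

Under these assumptions the strong model can print only 0 or 3, while the weak model can also print 1 or 2. The slide's "666" stands for outcomes of implementation strategies this simple model does not cover.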
It’s worse
Privatization: One of several examples where lock code works and weak-isolation transactions do not
(Example adapted from [Rajwar/Larus] and [Hudson et al])
initially ptr.f == ptr.g
[Figure: ptr refers to an object with fields f and g]

Thread 1:
sync(lk) {
  r = ptr;
  ptr = new C();
}
assert(r.f==r.g);

Thread 2:
sync(lk) {
  ++ptr.f;
  ++ptr.g;
}
(Almost?) every published weak-isolation system lets the assertion fail!
• Eager-update or lazy-update
initially ptr.f == ptr.g
[Figure: ptr refers to an object with fields f and g]

Thread 1:
atomic {
  r = ptr;
  ptr = new C();
}
assert(r.f==r.g);

Thread 2:
atomic {
  ++ptr.f;
  ++ptr.g;
}
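One schedule that makes the assertion fail can be replayed by hand. The following is a minimal single-threaded sketch, assuming an eager-update (in-place) STM that detects conflicts only between transactions and undoes an aborted transaction's writes from a log. It is an illustration of the anomaly, not a description of any particular published system, and the names are hypothetical.

```java
class PrivatizationDemo {
    static final class C { int f; int g; }

    static C ptr;

    // Replays one interleaving and reports whether thread 1's assertion would hold.
    static boolean replay() {
        ptr = new C();                  // initially ptr.f == ptr.g

        // Thread 2 starts its transaction: reads ptr, then does ++ptr.f in place,
        // remembering the old value in case it must undo.
        C seen = ptr;
        int oldF = seen.f;
        seen.f = oldF + 1;

        // Thread 1 runs its whole transaction and commits: the object is now "private".
        C r = ptr;
        ptr = new C();

        // Thread 1's check runs outside any transaction, so under weak isolation
        // it is not protected from thread 2's in-flight write.
        boolean assertionHolds = (r.f == r.g);

        // Thread 2 now notices that ptr changed, aborts, and undoes its write.
        if (ptr != seen) {
            seen.f = oldF;
        }
        return assertionHolds;
    }
}
```

With locks this schedule is impossible, because thread 2 would hold lk for both increments. Here the non-transactional read of r.f observes a write that is later rolled back.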
The need for semantics
• Which is wrong: the privatization code or the transactions implementation?
• What other “gotchas” exist?
  – What language/coding restrictions suffice to avoid them?
• Can programmers correctly use transactions without understanding their implementation?
  – What makes an implementation correct?
Only rigorous source-level semantics can answer
What we did
Formal operational semantics for a collection of similar languages that have different isolation properties
Program state allows at most one live transaction:
a; H; e1 || … || en  →  a’; H’; e1’ || … || en’
Multiple languages, including:
1. “Strong”: If one thread is in a transaction, no other thread may use shared memory or enter a transaction
2. “Weak-1-lock”: If one thread is in a transaction, no other thread may enter a transaction
3. “Weak-undo”: Like weak, plus a transaction may abort at any point, undoing its changes and restarting
A family
Now we have a family of languages:
“Strong”: … other threads can’t use memory or start transactions
“Weak-1-lock”: … other threads can’t start transactions
“Weak-undo”: like weak, plus undo/restart
So we can study how family members differ and conditions under which they are the same
Oh, and we have a kooky, ooky name:
The AtomsFamily
Easy Theorems
Theorem: Every program behavior in strong is possible in weak-1-lock
Theorem: weak-1-lock allows behaviors strong does not
Theorem: Every program behavior in weak-1-lock is possible in weak-undo
Theorem (slightly more surprising): weak-undo allows behavior weak-1-lock does not
Hard theorems
Consider a (formally defined) type system that ensures any mutable memory is either:
– Only accessed in transactions
– Only accessed outside transactions
Theorem: If a program type-checks, it has the same possible behaviors under strong and weak-1-lock
Theorem: If a program type-checks, it has the same possible behaviors under weak-1-lock and weak-undo
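The partition that the type system enforces can be illustrated with a dynamic check over a recorded trace of accesses. This is only a sketch of the discipline, assuming a hypothetical Access record of (location, inside-transaction flag). The talk's result concerns a static type system, which rules out violations before the program runs rather than detecting them on one trace.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PartitionCheck {
    // One memory access: which location, and whether it occurred inside a transaction.
    static final class Access {
        final String location;
        final boolean inTransaction;

        Access(String location, boolean inTransaction) {
            this.location = location;
            this.inTransaction = inTransaction;
        }
    }

    // Returns true if no location is accessed both inside and outside transactions.
    static boolean respectsPartition(List<Access> accesses) {
        Map<String, Boolean> mode = new HashMap<>();
        for (Access a : accesses) {
            Boolean previous = mode.putIfAbsent(a.location, a.inTransaction);
            if (previous != null && previous != a.inTransaction) {
                return false; // same location seen in both modes
            }
        }
        return true;
    }
}
```

A trace that touches x only inside transactions and y only outside passes. A trace that touches x in both modes fails, and such programs are the ones where strong and weak isolation can differ.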
A few months in 1 picture
[Figure: diagram relating the languages strong, weak-1-lock, weak-undo, and strong-undo]
Lesson
Weak isolation has surprising behavior; formal semantics lets us model the behavior and prove sufficient conditions for avoiding it
In other words: With a (too) restrictive type system, get semantics of strong and performance of weak
Today, part 1
Language design, semantics:
1. Motivation: Example + the GC analogy [OOPSLA07]
2. Semantics: strong vs. weak isolation [PLDI07]* [POPL08]
3. Interaction w/ other features [ICFP05][SCHEME07][POPL08]
* Joint work with Intel PSL
What if…
Real languages need precise semantics for all feature interactions. For example:
• Native Calls [Ringenburg]
• Exceptions [Ringenburg, Kimball]
• First-class continuations [Kimball]
• Thread-creation [Moore]
• Java-style class-loading [Hindman]
• Open: Bad interactions with memory-consistency model
See joint work with Manson and Pugh [MSPC06]
Today, part 2
Implementation:
4. On one core [ICFP05] [SCHEME07]
[Michael Ringenburg, Aaron Kimball]
5. Static optimizations for strong isolation [PLDI07]*
6. Multithreaded transactions
* Joint work with Intel PSL
Interleaved execution
The “uniprocessor (and then some)” assumption:
Threads communicating via shared memory don't execute in “true parallel”
Important special case:
• Uniprocessors still exist
• Many language implementations assume it (e.g., OCaml, Scheme48)
• Multicore may assign one core to an application
Implementing atomic
Key pieces:
• Execution of an atomic block logs writes
• If scheduler pre-empts during atomic, roll back the thread
• Duplicate code or bytecode-interpreter dispatch so non-atomic code is not slowed by logging
Logging example
Executing atomic block:
• build LIFO log of old values:
  y:0 z:? x:0 y:2
Rollback on pre-emption:
• Pop log, doing assignments
• Set program counter and stack to beginning of atomic
On exit from atomic:
• Drop log

int x=0, y=0;
void f() {
  int z = y+1;
  x = z;
}
void g() {
  y = x+1;
}
void h() {
  atomic {
    y = 2;
    f();
    g();
  }
}
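The logging scheme above can be sketched in plain Java. This is a minimal illustration, assuming a two-slot array as the shared heap and explicit begin, commit, and rollback calls in place of compiler and scheduler support. It logs only heap writes, so the local z from the slide does not appear in the log, and resetting the program counter and stack is not modeled. All names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class UndoLogDemo {
    // Shared "heap": index 0 is x, index 1 is y.
    static final int X = 0, Y = 1;
    final int[] heap = new int[2];

    // LIFO log of (address, old value) pairs, filled only while inside atomic.
    private final Deque<int[]> log = new ArrayDeque<>();
    private boolean inAtomic = false;

    void write(int addr, int value) {
        if (inAtomic) {
            log.push(new int[] { addr, heap[addr] }); // remember the old value
        }
        heap[addr] = value;
    }

    void begin() { inAtomic = true; }

    // On exit from atomic: drop the log.
    void commit() { log.clear(); inAtomic = false; }

    // On pre-emption: pop the log, restoring old values newest first.
    void rollback() {
        while (!log.isEmpty()) {
            int[] entry = log.pop();
            heap[entry[0]] = entry[1];
        }
        inAtomic = false;
    }

    void f() { int z = heap[Y] + 1; write(X, z); }
    void g() { write(Y, heap[X] + 1); }

    // Body of h() from the slide, left uncommitted so the caller picks the outcome.
    void hBody() { begin(); write(Y, 2); f(); g(); }
}
```

Calling hBody() and then rollback() restores x and y to 0. Calling hBody() and then commit() leaves x at 3 and y at 4. Reads are never logged, which is the uniprocessor advantage the talk relies on.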
Logging efficiency
Keep the log small:
• Don’t log reads (key uniprocessor advantage)
• Need not log memory allocated after atomic entered
  – Particularly initialization writes
• Need not log an address more than once
  – To keep logging fast, switch from array to hashtable after “many” (50) log entries
Evaluation
Strong isolation on uniprocessors at little cost
– See papers for “in the noise” performance
• Memory-access overhead

          not in atomic   in atomic
  read    none            none
  write   none            log (2 more writes)

  Recall initialization writes need not be logged
• Rare rollback
Lesson
Implementing transactions in software for a uniprocessor is so efficient it deserves special-casing
Note: Don’t run other multicore services on a uniprocessor either
Today, part 2
Implementation:
4. On one core [ICFP05] [SCHEME07]
5. Static optimizations for strong isolation [PLDI07]*
[Steven Balensiefer, Benjamin Hindman]
6. Multithreaded transactions
* Joint work with Intel PSL
Strong performance problem
Recall uniprocessor overhead:

          not in atomic   in atomic
  read    none            none
  write   none            some

With parallelism:

          not in atomic   in atomic
  read    none iff weak   some
  write   none iff weak   some
Optimizing away strong’s cost
Thread local
Immutable
Not accessed in transaction
New: static analysis for not-accessed-in-transaction …
Not-accessed-in-transaction
Revisit overhead of not-in-atomic for strong isolation, given information about how data is used in atomic
Yet another client of pointer-analysis
          not in atomic                                         in atomic
          no atomic access   no atomic write   atomic write
  read    none               none              some             some
  write   none               some              some             some
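The overhead table can be read as a small decision procedure for non-transactional code. The sketch below assumes the column order no atomic access, no atomic write, atomic write for the not-in-atomic case, and assumes the classification of each abstract object is supplied by the pointer-based analysis. The enum and method names are hypothetical.

```java
class BarrierTable {
    // How the (assumed) static analysis says an abstract object is used inside atomic blocks.
    enum AtomicUse { NO_ATOMIC_ACCESS, NO_ATOMIC_WRITE, ATOMIC_WRITE }

    // A read outside atomic needs instrumentation only if some transaction may write the data.
    static boolean nonTxnReadNeedsBarrier(AtomicUse use) {
        return use == AtomicUse.ATOMIC_WRITE;
    }

    // A write outside atomic needs instrumentation if some transaction may read or write the data.
    static boolean nonTxnWriteNeedsBarrier(AtomicUse use) {
        return use != AtomicUse.NO_ATOMIC_ACCESS;
    }
}
```

Under this reading, data that no transaction ever touches costs nothing outside atomic, and data that transactions only read costs nothing for non-transactional reads.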
Analysis details
• Whole-program, context-insensitive, flow-insensitive
  – Scalable, but needs whole program
• Can be done before method duplication
  – Keep lazy code generation without losing precision
• Given pointer information, just two more passes
1. How is an “abstract object” accessed transactionally?
2. What “abstract objects” might a non-transactional access use?
Collaborative effort
• UW: static analysis using pointer analysis
  – Via Paddle/Soot from McGill
• Intel PSL: high-performance STM
  – Via compiler and run-time
Static analysis annotates bytecodes, so the compiler back-end knows what it can omit
Benchmarks
[Figure: Tsp benchmark. Time (s) versus # Threads (1, 2, 4, 8, 16). Legend: Synch, Weak Atom, Strong Atom No Opts, +JIT Opts, +DEA, +Static Opts]
Benchmarks
[Figure: JBB benchmark. Average time per 10,000 ops (s) versus # Threads (1, 2, 4, 8, 16). Legend: Synch, Weak Atom, Strong Atom No Opts, +JIT Opts, +DEA, +Static Opts]
Lesson
The cost of strong isolation is in the nontransactional code; compiler optimizations help a lot
Today, part 2
Implementation:
4. On one core [ICFP05] [SCHEME07]
5. Static optimizations for strong isolation [PLDI07]*
6. Multithreaded transactions [Aaron Kimball]
Caveat: ongoing work
* Joint work with Intel PSL
Multithreaded Transactions
• Most implementations (hw or sw) assume code inside a transaction is single-threaded
  – But isolation and parallelism are orthogonal
  – And Amdahl’s Law will strike with manycore
• Language design: need nested transactions
• Currently modifying Microsoft’s Bartok STM
  – Key: correct logging without sacrificing parallelism
• Work perhaps ahead of the technology curve
  – like concurrent garbage collection
Credit
Semantics: Katherine Moore
Uniprocessor: Michael Ringenburg, Aaron Kimball
Optimizations: Steven Balensiefer, Ben Hindman
Implementing multithreaded transactions: Aaron Kimball
Memory-model issues: Jeremy Manson, Bill Pugh
High-performance strong STM: Tatiana Shpeisman, Vijay Menon, Ali-Reza Adl-Tabatabai, Richard Hudson, Bratin Saha
wasp.cs.washington.edu
Please read
High-Level Small-Step Operational Semantics for Transactions [POPL08] Katherine F. Moore, Dan Grossman
The Transactional Memory / Garbage Collection Analogy [OOPSLA07] Dan Grossman
Software Transactions Meet First-Class Continuations [SCHEME07] Aaron Kimball, Dan Grossman
Enforcing Isolation and Ordering in STM [PLDI07] Tatiana Shpeisman, Vijay Menon, Ali-Reza Adl-Tabatabai, Steve Balensiefer, Dan Grossman, Richard Hudson, Katherine F. Moore, Bratin Saha
Atomicity via Source-to-Source Translation [MSPC06] Benjamin Hindman, Dan Grossman
What Do High-Level Memory Models Mean for Transactions? [MSPC06] Dan Grossman, Jeremy Manson, William Pugh
AtomCaml: First-Class Atomicity via Rollback [ICFP05] Michael F. Ringenburg, Dan Grossman
Lessons
1. Transactions: the garbage collection of shared memory
2. Semantics: prove sufficient conditions for avoiding weak-isolation anomalies
3. Must define interaction with features like exceptions
4. Uniprocessor implementations are worth special-casing
5. Compiler optimizations help remove the overhead in nontransactional code resulting from strong isolation
6. Amdahl’s Law suggests multithreaded transactions, which we believe we can make scalable