Making sense of transactional memory
Tim Harris (MSR Cambridge)
Based on joint work with colleagues at MSR Cambridge, MSR Mountain View, MSR Redmond, the Parallel Computing Platform group, Barcelona Supercomputing Centre, and the University of Cambridge Computer Lab.
![Page 1: Making sense of transactional memory](https://reader035.vdocuments.site/reader035/viewer/2022062814/568167a0550346895ddceabb/html5/thumbnails/1.jpg)
![Page 2: Making sense of transactional memory](https://reader035.vdocuments.site/reader035/viewer/2022062814/568167a0550346895ddceabb/html5/thumbnails/2.jpg)
Example: double-ended queue
[Figure: a doubly-linked queue holding 10, 20 and 30 between a left sentinel and a right sentinel; Thread 1 works at one end and Thread 2 at the other]
• Support push/pop on both ends
• Allow concurrency where possible
• Avoid deadlock
![Page 3: Making sense of transactional memory](https://reader035.vdocuments.site/reader035/viewer/2022062814/568167a0550346895ddceabb/html5/thumbnails/3.jpg)
Implementing this: atomic blocks
class Q {
    QElem leftSentinel;
    QElem rightSentinel;

    void pushLeft(int item) {
        atomic {
            QElem e = new QElem(item);
            e.right = this.leftSentinel.right;
            e.left = this.leftSentinel;
            this.leftSentinel.right.left = e;
            this.leftSentinel.right = e;
        }
    }
    ...
}
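A minimal runnable sketch of the same queue, in Python. The slide's `atomic` keyword is hypothetical, so this sketch assumes the simplest possible stand-in: a single global re-entrant lock, which gives the "no interleaving" behaviour but no concurrency between atomic blocks and no roll-back. The names `atomic`, `push_left`, `pop_right` and `items` are illustrative, not taken from the talk.

```python
import threading
from contextlib import contextmanager

# Assumption: one global re-entrant lock emulates the slide's `atomic` block.
_GLOBAL_LOCK = threading.RLock()


@contextmanager
def atomic():
    """Run the enclosed block without interleaving with other atomic blocks."""
    with _GLOBAL_LOCK:
        yield


class QElem:
    def __init__(self, item=None):
        self.item = item
        self.left = None
        self.right = None


class Q:
    def __init__(self):
        # An empty queue is just the two sentinels pointing at each other.
        self.left_sentinel = QElem()
        self.right_sentinel = QElem()
        self.left_sentinel.right = self.right_sentinel
        self.right_sentinel.left = self.left_sentinel

    def push_left(self, item):
        with atomic():
            e = QElem(item)
            e.right = self.left_sentinel.right
            e.left = self.left_sentinel
            self.left_sentinel.right.left = e
            self.left_sentinel.right = e

    def pop_right(self):
        with atomic():
            e = self.right_sentinel.left
            if e is self.left_sentinel:
                return None  # queue is empty
            e.left.right = self.right_sentinel
            self.right_sentinel.left = e.left
            return e.item

    def items(self):
        """Return the items from left to right."""
        with atomic():
            out = []
            e = self.left_sentinel.right
            while e is not self.right_sentinel:
                out.append(e.item)
                e = e.right
            return out
```

A global lock satisfies "avoid deadlock" trivially but not "allow concurrency where possible"; that gap is what a TM-based or lock-inference implementation is meant to close.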
![Page 4: Making sense of transactional memory](https://reader035.vdocuments.site/reader035/viewer/2022062814/568167a0550346895ddceabb/html5/thumbnails/4.jpg)
Design questions
Questions raised by the pushLeft code from the previous slide (each is a callout attached to part of the code):
• “What happens to this object if the atomic block is rolled back?”
• “What happens if this fails with an exception; are the other updates rolled back?”
• “What if another thread tries to access one of these fields without being in an atomic block?”
• “What if another atomic block updates one of these fields? Will I see the value change mid-way through my atomic block?”
• “What about I/O?”
• “What about memory access violations, exceptions, security error logs, ...?”
![Page 5: Making sense of transactional memory](https://reader035.vdocuments.site/reader035/viewer/2022062814/568167a0550346895ddceabb/html5/thumbnails/5.jpg)
Example: a privatization idiom
Thread 1:
    atomic {
        if (x_shared) {
            x = 100;
        }
    }

Thread 2:
    atomic {
        x_shared = false;
    }
    x++;

Initial state: x_shared = true; x = 0;

The slides then step through one execution, apparently on an STM that updates memory in place and logs old values:
1. Thread 1 starts its atomic block and reads x_shared == true. State: x_shared = true; x = 0
2. Thread 1 logs the old value of x ("Old val x=0"). State: x_shared = true; x = 0
3. Thread 2's atomic block sets x_shared to false. State: x_shared = false; x = 0
4. Thread 2 runs x++ outside any atomic block. State: x_shared = false; x = 1
5. Thread 1 writes x = 100 in place. State: x_shared = false; x = 100
6. Thread 1 is rolled back and its logged old value is restored. State: x_shared = false; x = 0
7. Final state: x_shared = false; x = 0. Thread 2's increment has been lost.
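The schedule above can be replayed deterministically. The sketch below is a toy model, not the STM from the talk: it assumes an in-place-update design with an undo log and validates reads by comparing values (a real STM would normally compare version numbers). The function and variable names are illustrative.

```python
def run_privatization_schedule():
    """Replay the problematic interleaving and return the state after each step."""
    mem = {"x_shared": True, "x": 0}  # initial state from the slide
    read_log = {}   # Thread 1: location -> value seen inside the atomic block
    undo_log = {}   # Thread 1: location -> old value to restore on roll-back
    trace = []

    def record(step):
        trace.append((step, mem["x_shared"], mem["x"]))

    # Thread 1 enters its atomic block and reads x_shared.
    read_log["x_shared"] = mem["x_shared"]
    record("T1 reads x_shared")

    # Thread 1 logs the old value of x before updating it in place.
    undo_log["x"] = mem["x"]
    record("T1 logs old x")

    # Thread 2's atomic block commits x_shared = false.
    mem["x_shared"] = False
    record("T2 atomic block")

    # Thread 2 now treats x as private and increments it without a transaction.
    mem["x"] += 1
    record("T2 x++")

    # Thread 1, still acting on its stale read, writes x in place.
    if read_log["x_shared"]:
        mem["x"] = 100
    record("T1 writes x")

    # Thread 1 fails validation and rolls back, clobbering Thread 2's update.
    if any(mem[loc] != seen for loc, seen in read_log.items()):
        for loc, old in undo_log.items():
            mem[loc] = old
    record("T1 rolls back")
    return trace
```

The last entry of the returned trace has x == 0, a result that no execution with indivisible atomic blocks can produce (those end with x == 1 or x == 101).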
![Page 13: Making sense of transactional memory](https://reader035.vdocuments.site/reader035/viewer/2022062814/568167a0550346895ddceabb/html5/thumbnails/13.jpg)
The main argument
[Diagram: Program (threads, atomic blocks) → Language implementation → TM (StartTx, CommitTx, TxRead, TxWrite)]
1. We need a methodical way to define these constructs.
2. We should focus on defining this programmer-visible interface, rather than the internal “TM” interface.
![Page 14: Making sense of transactional memory](https://reader035.vdocuments.site/reader035/viewer/2022062814/568167a0550346895ddceabb/html5/thumbnails/14.jpg)
An analogy
[Diagram: Program (garbage collected “infinite” memory) → Language implementation → GC (low-level, broad, platform-specific API, no canonical definition)]
![Page 15: Making sense of transactional memory](https://reader035.vdocuments.site/reader035/viewer/2022062814/568167a0550346895ddceabb/html5/thumbnails/15.jpg)
Defining “atomic”, not “TM”
Implementing atomic over TM
Current performance
![Page 16: Making sense of transactional memory](https://reader035.vdocuments.site/reader035/viewer/2022062814/568167a0550346895ddceabb/html5/thumbnails/16.jpg)
Strong semantics: a simple interleaved model
[Figure: threads 1 to 5]
• Sequential interleaving of operations by threads.
• No program transformations (optimization, weak memory, etc.)
• Thread 5 enters an atomic block: prohibits the interleaving of operations from other threads
![Page 17: Making sense of transactional memory](https://reader035.vdocuments.site/reader035/viewer/2022062814/568167a0550346895ddceabb/html5/thumbnails/17.jpg)
Example: a privatization idiom
Thread 1:
    atomic {
        if (x_shared) {
            x = 100;
        }
    }

Thread 2:
    atomic {
        x_shared = false;
    }
    x++;

Initial state: x_shared = true; x = 0;

The slides step through two executions under the interleaved model.

Execution 1 (Thread 1's atomic block runs first):
1. Thread 1's atomic block runs. State: x_shared = true; x = 100
2. Thread 2's atomic block runs. State: x_shared = false; x = 100
3. Thread 2 runs x++. State: x_shared = false; x = 101

Execution 2 (Thread 2's atomic block runs first):
1. Thread 2's atomic block runs. State: x_shared = false; x = 0
2. Thread 1's atomic block runs, sees x_shared == false and leaves x alone. State: x_shared = false; x = 0
3. Thread 2 runs x++. State: x_shared = false; x = 1
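The interleaved model is small enough to enumerate exhaustively for this example. The sketch below treats each atomic block as one indivisible step and tries every interleaving of the two threads; it is an illustration under that assumption, and the function names are made up for this sketch. Treating `x++` as a single step does not change the outcomes here, because no other thread touches `x` concurrently.

```python
def t1_atomic(state):
    # atomic { if (x_shared) { x = 100; } }
    if state["x_shared"]:
        state["x"] = 100


def t2_atomic(state):
    # atomic { x_shared = false; }
    state["x_shared"] = False


def t2_increment(state):
    # x++ outside any atomic block
    state["x"] += 1


THREAD_1 = [t1_atomic]
THREAD_2 = [t2_atomic, t2_increment]


def interleavings(a, b):
    """Yield every merge of step lists a and b that keeps each list's own order."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def outcomes():
    """Return the set of final (x_shared, x) states over all interleavings."""
    results = set()
    for schedule in interleavings(THREAD_1, THREAD_2):
        state = {"x_shared": True, "x": 0}
        for step in schedule:
            step(state)
        results.add((state["x_shared"], state["x"]))
    return results
```

Calling `outcomes()` yields only the two final states shown in Execution 1 and Execution 2; the third interleaving (Thread 1's block running last) ends in the same state as Execution 2.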
![Page 27: Making sense of transactional memory](https://reader035.vdocuments.site/reader035/viewer/2022062814/568167a0550346895ddceabb/html5/thumbnails/27.jpg)
Pragmatically, do we care about...
Thread 1:
    atomic {
        x = 100;
        x = 200;
    }

Thread 2:
    temp = x;
    Console.WriteLine(temp);

Initial state: x = 0;
![Page 28: Making sense of transactional memory](https://reader035.vdocuments.site/reader035/viewer/2022062814/568167a0550346895ddceabb/html5/thumbnails/28.jpg)
How: strong semantics for race-free programs
Strong semantics: simple interleaved model of multi-threaded execution
[Figure: threads 1 to 5, with thread 4 in an atomic block; two threads each perform Write(x)]
Data race: concurrent accesses to the same location, at least one a write
Race-free: no data races (under strong semantics)
![Page 29: Making sense of transactional memory](https://reader035.vdocuments.site/reader035/viewer/2022062814/568167a0550346895ddceabb/html5/thumbnails/29.jpg)
Hiding TM from programmers
• Programming discipline(s): what does it mean for a program to use the constructs correctly?
• Low-level semantics & actual implementations: transactions, lock inference, optimistic concurrency, program transformations, weak memory models, ...
• Strong semantics (atomic, retry, ...): what, ideally, should these constructs do?
![Page 30: Making sense of transactional memory](https://reader035.vdocuments.site/reader035/viewer/2022062814/568167a0550346895ddceabb/html5/thumbnails/30.jpg)
Example: a privatization idiom
Thread 1:
    atomic {
        if (x_shared) {
            x = 100;
        }
    }

Thread 2:
    atomic {
        x_shared = false;
    }
    x++;

Initial state: x_shared = true; x = 0;
Correctly synchronized: no concurrent access to “x” under strong semantics
![Page 31: Making sense of transactional memory](https://reader035.vdocuments.site/reader035/viewer/2022062814/568167a0550346895ddceabb/html5/thumbnails/31.jpg)
Example: a “racy” publication idiom
Thread 1:
    atomic {
        x = new Foo(...);
        x_shared = true;
    }

Thread 2:
    if (x_shared) {
        // Use x
    }

Initial state: x_shared = false; x = null;
Not correctly synchronized: race on “x_shared” under strong semantics
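One way to make the two verdicts concrete is a brute-force checker over the interleaved model. The sketch below rests on a simplifying assumption that is not stated in the slides: two accesses race if, in some interleaving, they come from different threads in adjacent steps, touch the same location, at least one is a write, and they are not both inside atomic blocks. Every name in it is illustrative.

```python
def interleavings(a, b):
    """Yield every merge of step lists a and b that keeps each list's own order."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def find_races(thread_a, thread_b, initial):
    """Return the locations with a race. A step is (thread_id, in_atomic, body);
    body(mem) performs the step and returns its set of (kind, location) accesses."""
    races = set()
    for schedule in interleavings(thread_a, thread_b):
        mem = dict(initial)
        prev = None
        for tid, in_atomic, body in schedule:
            accesses = body(mem)
            if prev and prev[0] != tid and not (prev[1] and in_atomic):
                for kind1, loc1 in prev[2]:
                    for kind2, loc2 in accesses:
                        if loc1 == loc2 and "w" in (kind1, kind2):
                            races.add(loc1)
            prev = (tid, in_atomic, accesses)
    return races


# "Racy" publication idiom ------------------------------------------------
def pub_t1_atomic(mem):
    mem["x"], mem["x_shared"] = "Foo", True
    return {("w", "x"), ("w", "x_shared")}

def pub_t2_test(mem):
    mem["t2.flag"] = mem["x_shared"]      # "t2.flag" models a thread-local register
    return {("r", "x_shared")}

def pub_t2_use(mem):
    return {("r", "x")} if mem["t2.flag"] else set()

PUB_T1 = [(1, True, pub_t1_atomic)]
PUB_T2 = [(2, False, pub_t2_test), (2, False, pub_t2_use)]
PUB_INIT = {"x": None, "x_shared": False, "t2.flag": False}


# Privatization idiom -----------------------------------------------------
def priv_t1_atomic(mem):
    accesses = {("r", "x_shared")}
    if mem["x_shared"]:
        mem["x"] = 100
        accesses.add(("w", "x"))
    return accesses

def priv_t2_atomic(mem):
    mem["x_shared"] = False
    return {("w", "x_shared")}

def priv_t2_increment(mem):
    mem["x"] += 1
    return {("r", "x"), ("w", "x")}

PRIV_T1 = [(1, True, priv_t1_atomic)]
PRIV_T2 = [(2, True, priv_t2_atomic), (2, False, priv_t2_increment)]
PRIV_INIT = {"x": 0, "x_shared": True}
```

Under these assumptions `find_races(PUB_T1, PUB_T2, PUB_INIT)` reports `x_shared`, while `find_races(PRIV_T1, PRIV_T2, PRIV_INIT)` reports nothing, matching the two slides.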
![Page 32: Making sense of transactional memory](https://reader035.vdocuments.site/reader035/viewer/2022062814/568167a0550346895ddceabb/html5/thumbnails/32.jpg)
What about...
• ...I/O?
• ...volatile fields?
• ...locks inside/outside atomic blocks?
• ...condition variables?
Methodical approach: what happens under the simple, interleaved model?
1. Ideally, what does it do?
2. Which uses are race-free?
![Page 33: Making sense of transactional memory](https://reader035.vdocuments.site/reader035/viewer/2022062814/568167a0550346895ddceabb/html5/thumbnails/33.jpg)
What about I/O?
atomic {
    Console.WriteLine("What is your name?");
    x = Console.ReadLine();
    Console.WriteLine("Hello " + x);
}
The entire write-read-write sequence should run (as if) without interleaving with other threads
![Page 34: Making sense of transactional memory](https://reader035.vdocuments.site/reader035/viewer/2022062814/568167a0550346895ddceabb/html5/thumbnails/34.jpg)
What about C#/Java volatile fields?
volatile int x, y = 0;

Thread 1:
    atomic {
        x = 5;
        y = 10;
        x = 20;
    }

Thread 2:
    r1 = x;
    r2 = y;
    r3 = x;

Results shown on the slide:
• r1=20, r2=10, r3=20
• r1=0, r2=10, r3=20
• r1=0, r2=0, r3=20
• r1=0, r2=0, r3=0
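The four results correspond to the four places where the atomic block can fall relative to Thread 2's three reads. A small sketch that enumerates them, assuming the atomic block executes as one indivisible step and that both fields start at 0 (the helper names are illustrative):

```python
def atomic_block(mem):
    # atomic { x = 5; y = 10; x = 20; } executed as one indivisible step
    mem["x"] = 5
    mem["y"] = 10
    mem["x"] = 20


def run(position):
    """Run Thread 2's reads (x, y, x) with the atomic block placed immediately
    before read number `position`; position 3 places it after all three reads."""
    mem = {"x": 0, "y": 0}
    results = []
    for i, loc in enumerate(["x", "y", "x"]):
        if i == position:
            atomic_block(mem)
        results.append(mem[loc])
    return tuple(results)


# One (r1, r2, r3) tuple per placement of the atomic block.
OUTCOMES = [run(position) for position in range(4)]
```

No placement lets Thread 2 observe the intermediate value x == 5, which is the difference between this model and one where the three writes are merely individual volatile writes.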
![Page 35: Making sense of transactional memory](https://reader035.vdocuments.site/reader035/viewer/2022062814/568167a0550346895ddceabb/html5/thumbnails/35.jpg)
What about locks?
Thread 1:
    atomic {
        lock(obj1);
        x = 42;
        unlock(obj1);
    }

Thread 2:
    lock(obj1);
    x = 42;
    unlock(obj1);
Correctly synchronized: both threads would need “obj1” to access “x”
![Page 36: Making sense of transactional memory](https://reader035.vdocuments.site/reader035/viewer/2022062814/568167a0550346895ddceabb/html5/thumbnails/36.jpg)
What about locks?
Thread 1:
    atomic {
        x = 42;
    }

Thread 2:
    lock(obj1);
    x = 42;
    unlock(obj1);
Not correctly synchronized: no consistent synchronization
![Page 37: Making sense of transactional memory](https://reader035.vdocuments.site/reader035/viewer/2022062814/568167a0550346895ddceabb/html5/thumbnails/37.jpg)
What about condition variables?
atomic {
    lock(buffer);
    while (!full) buffer.wait();
    full = true;
    ...
    unlock(buffer);
}
Correctly synchronized: ...and works OK in this example
![Page 38: Making sense of transactional memory](https://reader035.vdocuments.site/reader035/viewer/2022062814/568167a0550346895ddceabb/html5/thumbnails/38.jpg)
What about condition variables?
Correctly synchronized: ...but program doesn’t work in this example
atomic {
    lock(barrier);
    waiters++;
    while (waiters < N) {
        barrier.wait();
    }
    unlock(barrier);
}
Callouts on the slide:
• Should run before waiting
• Should run after waiting
• Programmer says must run atomically
![Page 39: Making sense of transactional memory](https://reader035.vdocuments.site/reader035/viewer/2022062814/568167a0550346895ddceabb/html5/thumbnails/39.jpg)
Defining “atomic”, not “TM”
Implementing atomic over TM
Current performance
![Page 40: Making sense of transactional memory](https://reader035.vdocuments.site/reader035/viewer/2022062814/568167a0550346895ddceabb/html5/thumbnails/40.jpg)
Division of responsibility
• Desired semantics: atomic blocks, retry, ...
• STM primitives: StartTx, CommitTx, ReadTx, WriteTx, ...
• Hardware primitives: conventional h/w: read, write, CAS
Lets us keep a very relaxed view of what the STM must do... zombie tx, etc.
Build strong guarantees by segregating tx / non-tx in the runtime system
![Page 41: Making sense of transactional memory](https://reader035.vdocuments.site/reader035/viewer/2022062814/568167a0550346895ddceabb/html5/thumbnails/41.jpg)
Implementation 1: “classical” atomic blocks on TM
[Diagram: Program (threads, atomic blocks, retry, OrElse) → Language implementation (simple transformation) → Strong TM (lazy update, opacity, ordering guarantees...)]
![Page 42: Making sense of transactional memory](https://reader035.vdocuments.site/reader035/viewer/2022062814/568167a0550346895ddceabb/html5/thumbnails/42.jpg)
Implementation 2: very weak TM
[Diagram: Program (threads, atomic blocks) → Language implementation → Very weak STM (StartTx, CommitTx, ValidateTx, ReadTx(addr)->val, WriteTx(addr, val))]
Provided by the language implementation around the STM:
• Sandboxing for zombies
• Isolation of tx via MMU
• Program analyses
• GC support
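To give a feel for the five primitives named on the slide, here is a toy STM in Python. It is a sketch under loudly stated assumptions, not the STM from the talk: it buffers writes until commit (deferred update), keeps one version number per location, and serializes commits with a single lock. Only the primitive names come from the slide; `Heap`, `Tx` and the `atomic` retry loop are hypothetical.

```python
import threading


class Heap:
    """Shared memory: a value and a version number per location."""
    def __init__(self, initial):
        self.values = dict(initial)
        self.versions = {addr: 0 for addr in initial}
        self.commit_lock = threading.Lock()


class Tx:
    def __init__(self, heap):
        self.heap = heap
        self.read_versions = {}   # addr -> version seen at first read
        self.write_buffer = {}    # addr -> value to install at commit


def start_tx(heap):
    return Tx(heap)


def read_tx(tx, addr):
    if addr in tx.write_buffer:           # read our own pending write
        return tx.write_buffer[addr]
    tx.read_versions.setdefault(addr, tx.heap.versions[addr])
    return tx.heap.values[addr]


def write_tx(tx, addr, val):
    tx.write_buffer[addr] = val           # nothing is visible until commit


def validate_tx(tx):
    """True if nothing this transaction read has been overwritten since."""
    return all(tx.heap.versions[a] == v for a, v in tx.read_versions.items())


def commit_tx(tx):
    with tx.heap.commit_lock:
        if not validate_tx(tx):
            return False                  # caller should re-execute the block
        for addr, val in tx.write_buffer.items():
            tx.heap.values[addr] = val
            tx.heap.versions[addr] += 1
        return True


def atomic(heap, body):
    """What a language implementation might emit for `atomic { body }`."""
    while True:
        tx = start_tx(heap)
        result = body(tx)
        if commit_tx(tx):
            return result
```

Between `validate_tx` calls a transaction may keep running on stale data (a "zombie"); the slide's point is that the surrounding language implementation, rather than the STM itself, is responsible for containing such transactions.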
![Page 43: Making sense of transactional memory](https://reader035.vdocuments.site/reader035/viewer/2022062814/568167a0550346895ddceabb/html5/thumbnails/43.jpg)
Implementation 3: lock inference
[Diagram: Program (threads, atomic blocks, retry, OrElse) → Language implementation (lock inference analysis) → Locks (lock, unlock)]
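A rough sketch of what code produced by lock inference could look like. It assumes the analysis has already worked out which locations each atomic block touches, uses one lock per location, and acquires the locks in a fixed (sorted) order so that two blocks can never wait on each other in a cycle. The `LOCKS` table, the `atomic(*locations)` helper and the account example are all hypothetical.

```python
import threading
from contextlib import contextmanager

# Assumption: one lock per shared location, created up front.
LOCKS = {"a": threading.Lock(), "b": threading.Lock()}
balances = {"a": 100, "b": 100}


@contextmanager
def atomic(*locations):
    """Hold the locks for `locations`, always acquired in sorted order."""
    ordered = sorted(set(locations))
    for loc in ordered:
        LOCKS[loc].acquire()
    try:
        yield
    finally:
        for loc in reversed(ordered):
            LOCKS[loc].release()


def transfer(src, dst, amount):
    # The arguments to atomic() stand in for the set a static analysis
    # would infer from the body of the block.
    with atomic(src, dst):
        balances[src] -= amount
        balances[dst] += amount
```

Blocks that touch disjoint locations can run in parallel, while `transfer("a", "b", ...)` and `transfer("b", "a", ...)` both lock "a" before "b" and so cannot deadlock.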
![Page 44: Making sense of transactional memory](https://reader035.vdocuments.site/reader035/viewer/2022062814/568167a0550346895ddceabb/html5/thumbnails/44.jpg)
Integrating non-TM features
• Prohibit
  – Normal mutable state in STM-Haskell
  – “Dangerous” feature combinations, e.g., condition variables inside atomic blocks
• Directly execute over TM
  – e.g., an “ordinary” library abstraction used in an atomic block
  – Is this possible? Will it scale well? Will this be correctly synchronized?
• Use irrevocable execution
  – Prevent roll-back, ensure the transaction wins all conflicts.
  – Fall-back case for I/O operations.
  – Use for rare cases, e.g., class initializers
• Integrate it with TM
  – Provide conflict detection, recovery, etc., e.g. via 2-phase commit
  – Low-level integration of GC, memory management, etc.
![Page 48: Making sense of transactional memory](https://reader035.vdocuments.site/reader035/viewer/2022062814/568167a0550346895ddceabb/html5/thumbnails/48.jpg)
Defining “atomic”, not “TM”
Implementing atomic over TM
Current performance
![Page 49: Making sense of transactional memory](https://reader035.vdocuments.site/reader035/viewer/2022062814/568167a0550346895ddceabb/html5/thumbnails/49.jpg)
Performance figures depend on...
• Workload: What do the atomic blocks do? How long is spent inside them?
• Baseline implementation: Mature existing compiler, or prototype?
• Intended semantics: Support static separation? Violation freedom (TDRF)?
• STM implementation: In-place updates, deferred updates, eager/lazy conflict detection, visible/invisible readers?
• STM-specific optimizations: e.g. to remove or downgrade redundant TM operations
• Integration: e.g. dynamically between the GC and the STM, or inlining of STM functions during compilation
• Implementation effort: low-level perf tweaks, tuning, etc.
• Hardware: e.g. performance of CAS and memory system
![Page 50: Making sense of transactional memory](https://reader035.vdocuments.site/reader035/viewer/2022062814/568167a0550346895ddceabb/html5/thumbnails/50.jpg)
Labyrinth
[Figure: grid with a routed path between points s1 and e1]
• STAMP v0.9.10
• 256x256x3 grid
• Routing 256 paths
• Almost all execution inside atomic blocks
• Atomic blocks can attempt 100K+ updates
• C# version derived from original C
• Compiled using Bartok, whole program mode, C# -> x86 (~80% perf of original C with VS2008)
• Overhead results with Core2 Duo running Windows Vista
“STAMP: Stanford Transactional Applications for Multi-Processing”, Chi Cao Minh, JaeWoong Chung, Christos Kozyrakis, Kunle Olukotun, IISWC 2008
![Page 51: Making sense of transactional memory](https://reader035.vdocuments.site/reader035/viewer/2022062814/568167a0550346895ddceabb/html5/thumbnails/51.jpg)
Sequential overhead
[Chart: 1-thread execution time, normalized to seq. baseline (y-axis 0 to 14). STM: 11.86; Dynamic filtering: 3.14; Dataflow opts: 1.99; Filter opts: 1.71; Re-use logs: 1.71]
STM implementation supporting static separation
• In-place updates
• Lazy conflict detection
• Per-object STM metadata
• Addition of read/write barriers before accesses
  – Read: log per-object metadata word
  – Update: CAS on per-object metadata word
  – Update: log value being overwritten
![Page 52: Making sense of transactional memory](https://reader035.vdocuments.site/reader035/viewer/2022062814/568167a0550346895ddceabb/html5/thumbnails/52.jpg)
Sequential overhead
[Chart repeated from the first “Sequential overhead” slide; Dynamic filtering: 3.14, down from 11.86]
Dynamic filtering to remove redundant logging
• Log size grows with #locations accessed
• Consequential reduction in validation time
• 1st level: per-thread hashtable (1024 entries)
• 2nd level: per-object bitmap of updated fields
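The first filter level can be pictured as a small direct-mapped table that remembers the last object logged in each slot. The sketch below is an illustration only: the hash (word-aligned address bits masked to 1024 slots), the class names and the `open_for_read` barrier are assumptions, and the second level (the per-object bitmap of updated fields) is not modelled.

```python
class LogFilter:
    """Per-thread, direct-mapped filter: one remembered address per slot."""
    SIZE = 1024  # entries, as on the slide

    def __init__(self):
        self.table = [None] * self.SIZE

    def seen(self, obj_addr):
        """Return True on a filter hit; otherwise remember obj_addr and return False."""
        slot = (obj_addr >> 2) & (self.SIZE - 1)  # assumed hash function
        if self.table[slot] == obj_addr:
            return True
        self.table[slot] = obj_addr
        return False

    def clear(self):
        """Reset between transactions."""
        self.table = [None] * self.SIZE


class Tx:
    def __init__(self):
        self.filter = LogFilter()
        self.read_log = []

    def open_for_read(self, obj_addr):
        # Only log an object the filter has not just seen.
        if not self.filter.seen(obj_addr):
            self.read_log.append(obj_addr)
```

The filter is deliberately lossy: two addresses that share a slot evict each other, so an object can occasionally be logged twice, which costs log space but not correctness.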
![Page 53: Making sense of transactional memory](https://reader035.vdocuments.site/reader035/viewer/2022062814/568167a0550346895ddceabb/html5/thumbnails/53.jpg)
Sequential overhead
[Chart repeated from the first “Sequential overhead” slide; Dataflow opts: 1.99]
Data-flow optimizations
• Remove repeated log operations
• Open-for-read/update on a per-object basis
• Log-old-value on a per-field basis
• Remove concurrency control on newly-allocated objects
![Page 54: Making sense of transactional memory](https://reader035.vdocuments.site/reader035/viewer/2022062814/568167a0550346895ddceabb/html5/thumbnails/54.jpg)
Sequential overhead
[Chart repeated from the first “Sequential overhead” slide; Filter opts: 1.71]
Inline optimized filter operations
• Re-use table_base between filter operations
• Avoids caller save/restore on filter hits
mov eax <- obj_addr
and eax <- eax, 0xffc
mov ebx <- [table_base + eax]
cmp ebx, obj_addr
![Page 55: Making sense of transactional memory](https://reader035.vdocuments.site/reader035/viewer/2022062814/568167a0550346895ddceabb/html5/thumbnails/55.jpg)
Sequential overhead
[Chart repeated from the first “Sequential overhead” slide; Re-use logs: 1.71]
Re-use STM logs between transactions
• Reduces pressure on per-page allocation lock
• Reduces time spent in GC
![Page 56: Making sense of transactional memory](https://reader035.vdocuments.site/reader035/viewer/2022062814/568167a0550346895ddceabb/html5/thumbnails/56.jpg)
Scaling – Genome
[Chart: execution time / seq. baseline (y-axis, 0.0 to 2.0) against #threads (x-axis, 1 to 8); series: Static separation, Strong atomicity]
![Page 57: Making sense of transactional memory](https://reader035.vdocuments.site/reader035/viewer/2022062814/568167a0550346895ddceabb/html5/thumbnails/57.jpg)
Scaling – Labyrinth
[Chart: execution time / seq. baseline (y-axis, 0.0 to 2.0) against #threads (x-axis, 1 to 8); series: Static separation, Strong atomicity]
1.0 = wall-clock execution time of sequential code without concurrency control
![Page 58: Making sense of transactional memory](https://reader035.vdocuments.site/reader035/viewer/2022062814/568167a0550346895ddceabb/html5/thumbnails/58.jpg)
Making sense of TM
• Focus on the interface between the language and the programmer
  – Talk about atomicity, not TM
  – Permit a range of tx and non-tx implementations
• Define idealized “strong semantics” for the language (cf. sequential consistency)
• Define what it means for a program to be “correctly synchronized” under these semantics
• Treat complicated cases methodically (I/O, locking, etc.)