LogTM: Log-Based Transactional Memory
Kevin E. Moore, Jayaram Bobba, Michelle J. Moravan, Mark D. Hill, & David A. Wood
Presented by Colleen Lewis
Credits
Animations from the original LogTM HPCA presentation
Original graphs modified for readability
Big Picture
• Hardware transaction motivation
• Per-thread log
• Optimize commits (hardware)
Design Decisions
• Version Management
– Eager: write in place
– Lazy: write on commit
• Conflict Detection
– Eager: detect at read/write time
– Lazy: detect at commit time
Transaction Logs
• Pointer to the beginning of the log
• Pointer to the end of the log
• Read and Write bits for each cache line
2/15/06 HPCA-12
Transaction Log Example
• Initial State
– LogBase = LogPointer
– TM count > 0

VA    Data block          R W
00    12--------------    0 0
40    --------------23    0 0
C0    34--------------    0 0

Log (VA 1000, 1040, 1080): empty
Log Base = 1000, Log Ptr = 1000, TM count = 1
Transaction Log Example
• Store r2, (c0) /* r2 = 56 */
– Set W bit for block (c0)
– Store address (c0) and old data on the log
– Increment Log Ptr to 1048
– Update memory

VA    Data block          R W
00    12--------------    0 0
40    --------------23    0 0
C0    56--------------    0 1

Log (starting at VA 1000): c0 34--------------
Log Base = 1000, Log Ptr = 1048, TM count = 1
Transaction Log Example
• Commit transaction
– Clear R & W for all blocks
– Reset Log Ptr to Log Base (1000)
– Clear TM count

VA    Data block          R W
00    12--------------    0 0
40    --------------23    0 0
C0    56--------------    0 0

Log Base = 1000, Log Ptr = 1000 (was 1048), TM count = 0 (was 1)
Transaction Log Example
• Abort transaction
– Replay log entries to “undo” the transaction
– Reset Log Ptr to Log Base (1000)
– Clear R & W bits for all blocks
– Clear TM count

VA    Data block          R W
00    12--------------    0 0
40    --------------23    0 0
C0    34--------------    0 0

Log entry replayed: c0 34--------------
Log Base = 1000, Log Ptr = 1000 (was 1048), TM count = 0
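The store, commit, and abort steps of the log example can be sketched as a toy model. This is only an illustration, not the LogTM hardware: the class and method names are hypothetical, addresses are assumed to be hexadecimal, and each log entry is assumed to occupy 0x48 bytes (which would match the Log Ptr moving from 1000 to 1048).

```python
ENTRY_SIZE = 0x48  # assumption: one log entry = address + old block contents


class LogTMThread:
    """Toy per-thread undo log with eager version management (hypothetical names)."""

    def __init__(self, memory, log_base):
        self.memory = memory        # dict: block address -> current value
        self.log_base = log_base    # start of the log
        self.log_ptr = log_base     # next free position in the log
        self.log = []               # (address, old value) pairs, oldest first
        self.w_bits = set()         # blocks whose W bit is set
        self.tm_count = 0

    def begin(self):
        self.tm_count += 1

    def store(self, addr, value):
        if addr not in self.w_bits:
            # First write to this block: set W and save the old value on the log.
            self.w_bits.add(addr)
            self.log.append((addr, self.memory[addr]))
            self.log_ptr += ENTRY_SIZE
        self.memory[addr] = value   # eager: new value is written in place

    def commit(self):
        # Commit is cheap: memory already holds the new values.
        self._reset()

    def abort(self):
        # Replay the log newest-first to restore the old values.
        for addr, old in reversed(self.log):
            self.memory[addr] = old
        self._reset()

    def _reset(self):
        self.log.clear()
        self.w_bits.clear()
        self.log_ptr = self.log_base
        self.tm_count = 0


memory = {0x00: 12, 0x40: 23, 0xC0: 34}
thread = LogTMThread(memory, log_base=0x1000)
thread.begin()
thread.store(0xC0, 56)
ptr_after_store = thread.log_ptr    # 0x1048 under the entry-size assumption
value_after_store = memory[0xC0]    # 56, written in place
thread.abort()
value_after_abort = memory[0xC0]    # 34, restored from the log
```

The sketch makes the trade-off visible: commit only resets pointers and bits, while abort has to walk the log.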
Conflict Detection
• Checked at every read/write
• Directory forwards read requests
• Directory can have “sticky” data
• Individual nodes responsible for detecting conflicts
• Needs
– Transaction mode bit
– Overflow bit
Conflict Detection (example)
• P0 store
– P0 sends get exclusive (GETX) request
– Directory responds with data (old)
– P0 executes store

Messages: GETX, DATA
Directory: I [old], then M@P0 [old]
P0: I (--) [none], then M (--) [old], then M (-W) [new]; TM mode = 1, Overflow = 0
P1: I (--) [none]; TM mode = 0, Overflow = 0
Conflict Detection (example)
• In-cache transaction conflict
– P1 sends get shared (GETS) request
– Directory forwards to P0
– P0 detects conflict and sends NACK

Messages: GETS, Fwd_GETS, NACK
Directory: M@P0 [old]
P0: M (-W) [new]; TM mode = 1, Overflow = 0
P1: I (--) [none]; TM mode = 0, Overflow = 0
Conflict Detection (example)
• Cache overflow
– P0 sends put exclusive (PUTX) request
– Directory acknowledges
– P0 sets overflow bit
– P0 writes data back to memory

Messages: PUTX, ACK, DATA
Directory: M@P0 [old], then Msticky@P0 [new]
P0: M (-W) [new], then I (--) [none]; TM mode = 1, Overflow = 0, then 1
P1: I (--) [none]; TM mode = 0, Overflow = 0
Conflict Detection (example)
• Out-of-cache conflict
– P1 sends GETS request
– Directory forwards to P0
– P0 detects a (possible) conflict
– P0 sends NACK

Messages: GETS, Fwd_GETS, NACK
Directory: Msticky@P0 [new]
P0: I (--) [none]; TM mode = 1, Overflow = 1
P1: I (--) [none]; TM mode = 0, Overflow = 0
Conflict Detection (example)
• Commit
– P0 clears TM mode and Overflow bits

Directory: Msticky@P0 [new]
P0: I (--) [none]; TM mode = 1, then 0; Overflow = 1, then 0
P1: I (--) [none]; TM mode = 0, Overflow = 0
Conflict Detection (example)
• Lazy cleanup
– P1 sends GETS request
– Directory forwards request to P0
– P0 detects no conflict, sends CLEAN
– Directory sends Data to P1

Messages: GETS, Fwd_GETS, CLEAN, DATA
Directory: Msticky@P0 [new], then S(P1) [new]
P0: I (--) [none]; TM mode = 0, Overflow = 0
P1: I (--) [none], then S (--) [new]; TM mode = 0, Overflow = 0
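The walkthrough above (store, in-cache conflict, overflow, out-of-cache conflict, commit, lazy cleanup) can be followed with a small model of how one node answers a forwarded request. This is a simplified sketch, not the coherence protocol itself: `Node`, `respond`, and `evict` are hypothetical names, "CLEAN" is used here simply to mean "no conflict", and the model does not track ordinary cache contents or the directory state.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy processor: per-block R/W bits plus the TM mode and Overflow bits."""
    read_set: set = field(default_factory=set)    # cached blocks with R set
    write_set: set = field(default_factory=set)   # cached blocks with W set
    tm_mode: bool = False
    overflow: bool = False

    def respond(self, block, is_write):
        """Answer a forwarded GETS (is_write=False) or GETX (is_write=True)."""
        if not self.tm_mode:
            return "CLEAN"          # not in a transaction: nothing to protect
        if block in self.write_set:
            return "NACK"           # any access to a written block conflicts
        if is_write and block in self.read_set:
            return "NACK"           # a write to a read block conflicts
        if self.overflow:
            return "NACK"           # bits of evicted blocks are gone: assume conflict
        return "CLEAN"

    def evict(self, block):
        """A transactional block leaves the cache; its R/W bits are lost."""
        if block in self.read_set or block in self.write_set:
            self.read_set.discard(block)
            self.write_set.discard(block)
            self.overflow = True

    def commit(self):
        self.read_set.clear()
        self.write_set.clear()
        self.tm_mode = False
        self.overflow = False


p0 = Node(tm_mode=True)
p0.write_set.add("A")                               # P0 store: W bit set for block A
in_cache = p0.respond("A", is_write=False)          # P1 GETS forwarded: in-cache conflict
p0.evict("A")                                       # cache overflow: Overflow bit set
out_of_cache = p0.respond("A", is_write=False)      # possible conflict, answered with NACK
p0.commit()                                         # TM mode and Overflow cleared
after_commit = p0.respond("A", is_write=False)      # lazy cleanup: no conflict
```

Keeping only a single Overflow bit per node is what makes the check cheap, at the cost of answering conservatively once any transactional block has been evicted.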
False Positives?
What if P0 has started a new transaction without cleaning the sticky data?
False Positive Example
• Cache overflow
– P0 sends put exclusive (PUTX) request
– Directory acknowledges
– P0 sets overflow bit
– P0 writes data back to memory

Messages: PUTX, ACK, DATA
Directory: M@P0 [old], then Msticky@P0 [new]
P0: M (-W) [new], then I (--) [none]; TM mode = 1, Overflow = 0, then 1
P1: I (--) [none]; TM mode = 0, Overflow = 0
False Positive Example
• Commit
– P0 clears TM mode and Overflow bits
• Start new transaction
– P0 sets TM mode
– The new transaction eventually overflows
– P0 sets the Overflow bit

Directory: Msticky@P0 [new] (left over from the committed transaction)
P0: I (--) [none]; TM mode = 1, then 0, then 1; Overflow = 1, then 0, then 1
P1: I (--) [none]; TM mode = 0, Overflow = 0
Conflict Detection (example)
• Out-of-cache conflict
– P1 sends GETS request
– Directory forwards to P0
– P0 detects a (possible) conflict
– P0 sends NACK

Messages: GETS, Fwd_GETS, NACK
Directory: Msticky@P0 [new]
P0: I (--) [none]; TM mode = 1, Overflow = 1
P1: I (--) [none]; TM mode = 0, Overflow = 0
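The false positive in this sequence comes from the Overflow bit being a one-bit summary. A minimal sketch of that reasoning follows; the function name and its arguments are hypothetical, and the check is reduced to the three inputs that matter for this example.

```python
def sends_nack(tm_mode, overflow, block_has_rw_bits):
    """Conservative answer to a forwarded request for a sticky block.

    block_has_rw_bits: True if the block is cached with R or W set
    by the current transaction.
    """
    return tm_mode and (block_has_rw_bits or overflow)


# Transaction 1 overflowed block A and then committed, so the directory still
# records A as sticky at P0. Transaction 2 never touches A.

# Before transaction 2 overflows anything, a forwarded GETS for A is harmless:
before_overflow = sends_nack(tm_mode=True, overflow=False, block_has_rw_bits=False)

# After transaction 2 evicts some other transactional block, Overflow is set,
# and the same request for A is now NACKed although A was never accessed:
after_overflow = sends_nack(tm_mode=True, overflow=True, block_has_rw_bits=False)
```

The NACK in the second case is safe but unnecessary; it costs the requester a retry rather than correctness.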
Conflict Resolution and Deadlock Avoidance
• Options
– Wait: risk deadlock?
– Abort: risk livelock?
• Current Behavior
– Wait
– Abort if waiting on a logically younger process
• Future Behavior?
– Software contention manager
Evaluation
• 32 SPARC processors
• Solaris 9 OS
• SIMICS: full system simulator
– Magic no-ops
• Tests
– Micro-benchmarks
– SPLASH suite
Microbenchmarks
High Contention / Short Transactions
• Comparing:
– EXP: TTS locks with exponential backoff
– MCS: software queue-based locks
BEGIN_TRANSACTION();
new_total = total.count + 1;
private_data[id].count++;
total.count = new_total;
COMMIT_TRANSACTION();
[Figure: execution time (in millions of cycles, 0 to 90) vs. threads (on 32 processors, 0 to 35) for EXP, MCS, and LogTM.]
SPLASH2 Benchmark Results
[Figure: bar chart by benchmark (OCEAN, RADIOSITY, CHOLESKY, RT-OPT, RT-BASE, BARNES, WATER); y-axis labeled "Execution Time (in millions of cycles)", scale 0 to 2; two off-scale bars labeled 4.18 and 2.68.]
• Data presented as: (PARMACS locks execution time) / (LogTM execution time)
• Modified version: (LogTM execution time) / (PARMACS locks execution time)
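[Figure: modified bar chart by benchmark (OCEAN, RADIOSITY, CHOLESKY, RT-OPT, RT-BASE, BARNES, WATER); y-axis labeled "Speedup", scale 0 to 100. Bar labels in extraction order, which may not match the benchmark order: 62.7%, 24.6%, 10.9%, 18.6%, 18.3%, 4.3%, 76.1%.]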
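One reading that is consistent with the numbers on these slides: if the original labels are ratios of PARMACS locks time to LogTM time, then a percentage of the form 100 × (1 − LogTM time / PARMACS time) reproduces two of the labels on the modified chart. The sketch below checks only that arithmetic for the two off-scale values; the function name is hypothetical, and the interpretation is an assumption rather than something the slides state.

```python
def percent_time_reduction(ratio):
    """ratio = PARMACS locks time / LogTM time (assumed meaning of the original labels).

    Returns 100 * (1 - LogTM time / PARMACS time), rounded to one decimal place.
    """
    return round(100.0 * (1.0 - 1.0 / ratio), 1)


off_scale_labels = [4.18, 2.68]
converted = [percent_time_reduction(r) for r in off_scale_labels]
```

Under this assumption the two off-scale bars of the original chart correspond to the 76.1% and 62.7% labels of the modified chart.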
Conclusions
• Optimize commits
• Aborts handled by software
• Stall to avoid wasting work
• Allow sticky data because overflow is rare
• Good performance on microbenchmark
• False sharing has a big impact on LogTM
Questions?