![Page 1: Hardware Transactional Memory](https://reader036.vdocuments.site/reader036/viewer/2022062407/56812a45550346895d8d7c1e/html5/thumbnails/1.jpg)
Hardware Transactional Memory
Shimin Chen (LBA Reading Group)
![Page 2: Hardware Transactional Memory](https://reader036.vdocuments.site/reader036/viewer/2022062407/56812a45550346895d8d7c1e/html5/thumbnails/2.jpg)
Outline
• Transaction Concept
• A simple HTM
• Common Case Transaction Behaviors
• HTM Research Directions
• Description of Papers
• Summary
![Page 3: Hardware Transactional Memory](https://reader036.vdocuments.site/reader036/viewer/2022062407/56812a45550346895d8d7c1e/html5/thumbnails/3.jpg)
Transaction
• A finite sequence of instructions
• Atomicity: all or nothing
• Serializability (isolation): steps of one transaction never appear to be interleaved with the steps of another
• A and B cannot be concurrent if ReadSet(A) ∩ WriteSet(B) ≠ ∅, or WriteSet(A) ∩ ReadSet(B) ≠ ∅, or WriteSet(A) ∩ WriteSet(B) ≠ ∅
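The concurrency rule on this slide can be sketched as a set-intersection check. This is only an illustration: the function name `conflicts` and the use of Python sets to stand for read and write sets are assumptions, not part of the slides.

```python
def conflicts(read_a, write_a, read_b, write_b):
    """Return True if transactions A and B cannot run concurrently.

    Each argument is an iterable of addresses. A conflict exists when one
    transaction writes something the other reads or writes.
    """
    read_a, write_a = set(read_a), set(write_a)
    read_b, write_b = set(read_b), set(write_b)
    return bool(
        (read_a & write_b)      # A reads what B writes
        or (write_a & read_b)   # A writes what B reads
        or (write_a & write_b)  # both write the same location
    )


print(conflicts({"x"}, {"y"}, {"z"}, {"w"}))   # False: disjoint sets
print(conflicts({"x"}, set(), set(), {"x"}))   # True: A reads x, B writes x
```

Two transactions that only read the same location do not conflict under this rule.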
![Page 4: Hardware Transactional Memory](https://reader036.vdocuments.site/reader036/viewer/2022062407/56812a45550346895d8d7c1e/html5/thumbnails/4.jpg)
A simple HTM
New hardware mechanisms to:
• checkpoint register state
  • Checkpoint register renaming table
• buffer transactional writes in private cache
• record transactional read-set and write-set
  • R bit and W bit per cache line
  • Or dedicated state buffer on the side
• detect conflict
  • leverage cache coherence protocol
• resolve conflict
  • e.g. requester wins
![Page 5: Hardware Transactional Memory](https://reader036.vdocuments.site/reader036/viewer/2022062407/56812a45550346895d8d7c1e/html5/thumbnails/5.jpg)
Simple HTM Operations
• TxBegin
  • Checkpoint register state
• Load/Store
  • Set state bits in cache; abort upon cache eviction
• Incoming coherence message
  • Check conflicts with state bits; abort if conflicted
• TxCommit
  • Flash clear state bits
• Abort
  • Flash invalidate write sets and read sets
  • Restore register checkpoint
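The operations on this slide can be modelled in a few lines of software. The sketch below is a toy, not a hardware design: the class `SimpleHTMCore`, the dictionary-based "cache", the `capacity` parameter, and publishing buffered writes to a shared dict at commit are all assumptions made for illustration.

```python
class TxAbort(Exception):
    """Signals that the current transaction was aborted."""


class SimpleHTMCore:
    """Toy model of one core; every name here is illustrative."""

    def __init__(self, memory, capacity=4):
        self.memory = memory        # shared dict: address -> value
        self.capacity = capacity    # transactional lines the cache can hold
        self.lines = {}             # address -> {"R", "W", "value"}
        self.regs = {}              # architectural registers
        self.checkpoint = {}
        self.in_tx = False

    def tx_begin(self):
        self.checkpoint = dict(self.regs)    # checkpoint register state
        self.in_tx = True

    def _line(self, addr):
        if not self.in_tx:
            raise TxAbort("transaction already aborted")
        if addr not in self.lines:
            if len(self.lines) >= self.capacity:
                self.abort()                 # a tracked line would be evicted
                raise TxAbort("cache overflow")
            self.lines[addr] = {"R": False, "W": False,
                                "value": self.memory.get(addr, 0)}
        return self.lines[addr]

    def load(self, addr):
        line = self._line(addr)
        line["R"] = True                     # set R bit
        return line["value"]

    def store(self, addr, value):
        line = self._line(addr)
        line["W"] = True                     # set W bit
        line["value"] = value                # buffered; memory is unchanged

    def on_coherence(self, addr, is_write):
        """Another core requests addr; the requester wins on conflict."""
        line = self.lines.get(addr)
        if self.in_tx and line and (line["W"] or (is_write and line["R"])):
            self.abort()

    def tx_commit(self):
        if not self.in_tx:
            return False                     # aborted earlier
        for addr, line in self.lines.items():
            if line["W"]:
                self.memory[addr] = line["value"]   # model: publish writes
        self.lines.clear()                   # flash clear state bits
        self.in_tx = False
        return True

    def abort(self):
        self.lines.clear()                   # flash invalidate read/write sets
        self.regs = dict(self.checkpoint)    # restore register checkpoint
        self.in_tx = False
```

In this model a remote read of a line with the W bit set, or a remote write to a line with the R bit set, aborts the local transaction, which matches the "requester wins" policy mentioned earlier in the deck.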
![Page 7: Hardware Transactional Memory](https://reader036.vdocuments.site/reader036/viewer/2022062407/56812a45550346895d8d7c1e/html5/thumbnails/7.jpg)
“The Common Case Transactional Memory Behavior of Multithreaded Programs”. Stanford Team (Kozyrakis, Olukotun, and their students: Chung, Chafi, Minh, McDonald, Carlstrom). HPCA 2006.
• Studied 35 applications
  • Java, C+Pthread, C+OpenMP, Parallel Processing Macros
• Assume high-level parallelism structure remains the same: convert lock/unlock into begin/end, etc.
• Trace-based analysis
![Page 8: Hardware Transactional Memory](https://reader036.vdocuments.site/reader036/viewer/2022062407/56812a45550346895d8d7c1e/html5/thumbnails/8.jpg)
Non-blocking synchronization
![Page 9: Hardware Transactional Memory](https://reader036.vdocuments.site/reader036/viewer/2022062407/56812a45550346895d8d7c1e/html5/thumbnails/9.jpg)
ReadSet and WriteSet Size
• For 95% of transactions, RS < 4KB, WS < 1KB
• Weighted by time: 52KB RS, 30KB WS needed for covering 80% of time (assuming 32B cache lines)
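To relate these byte counts to cache capacity, they can be converted to numbers of cache lines. The short calculation below assumes the 32B line size applies to all four figures, which the slide does not state explicitly; the helper name `lines_needed` is illustrative.

```python
LINE_BYTES = 32  # cache line size quoted on the slide


def lines_needed(size_kb, line_bytes=LINE_BYTES):
    """Number of cache lines covering size_kb kilobytes."""
    return size_kb * 1024 // line_bytes


for label, kb in [("95% of transactions, RS", 4),
                  ("95% of transactions, WS", 1),
                  ("80% of time, RS", 52),
                  ("80% of time, WS", 30)]:
    print(f"{label}: {kb}KB = {lines_needed(kb)} lines")
# 4KB = 128 lines, 1KB = 32 lines, 52KB = 1664 lines, 30KB = 960 lines
```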
![Page 10: Hardware Transactional Memory](https://reader036.vdocuments.site/reader036/viewer/2022062407/56812a45550346895d8d7c1e/html5/thumbnails/10.jpg)
Nesting
• Nesting distance could be high
  • Partial rollback may be needed
• Two levels of nesting are common
![Page 11: Hardware Transactional Memory](https://reader036.vdocuments.site/reader036/viewer/2022062407/56812a45550346895d8d7c1e/html5/thumbnails/11.jpg)
Speculative Parallelization
![Page 13: Hardware Transactional Memory](https://reader036.vdocuments.site/reader036/viewer/2022062407/56812a45550346895d8d7c1e/html5/thumbnails/13.jpg)
Directions
• Dealing with overflows
  • Virtualizing HTM
• Mixing HTM with STM
  • Two code paths
  • Use hardware mechanisms to speed up STM
![Page 14: Hardware Transactional Memory](https://reader036.vdocuments.site/reader036/viewer/2022062407/56812a45550346895d8d7c1e/html5/thumbnails/14.jpg)
Terminology
• Conflict Detection
  • Eager: at coherence message
  • Lazy: at commit time
• Version Management
  • Eager: save old version, update in place
  • Lazy: buffer updates
• Conflict Resolution
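The two version-management styles can be contrasted with a small sketch. Both classes are illustrative assumptions (a dict stands in for memory, and every address is assumed to exist already); they are not taken from any of the papers.

```python
class EagerVersioning:
    """Update in place; keep old values in an undo log."""

    def __init__(self, memory):
        self.memory = memory
        self.undo_log = []                   # (address, old value), in write order

    def write(self, addr, value):
        self.undo_log.append((addr, self.memory[addr]))
        self.memory[addr] = value            # new value is visible in place

    def commit(self):
        self.undo_log.clear()                # cheap: discard the log

    def abort(self):
        while self.undo_log:                 # expensive: undo in reverse order
            addr, old = self.undo_log.pop()
            self.memory[addr] = old


class LazyVersioning:
    """Buffer updates; memory keeps the old values until commit."""

    def __init__(self, memory):
        self.memory = memory
        self.write_buffer = {}

    def read(self, addr):
        if addr in self.write_buffer:        # see our own buffered write
            return self.write_buffer[addr]
        return self.memory[addr]

    def write(self, addr, value):
        self.write_buffer[addr] = value

    def commit(self):
        self.memory.update(self.write_buffer)   # expensive: publish all writes
        self.write_buffer.clear()

    def abort(self):
        self.write_buffer.clear()            # cheap: drop the buffer
```

The sketch makes the trade-off visible: eager versioning makes commit cheap and abort costly, while lazy versioning does the opposite.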
![Page 16: Hardware Transactional Memory](https://reader036.vdocuments.site/reader036/viewer/2022062407/56812a45550346895d8d7c1e/html5/thumbnails/16.jpg)
• “Transactional Memory: Architectural Support for Lock-Free Data Structures.” Herlihy (DEC) & Moss (UMass). ISCA 1993.
• “Multiple Reservations and the Oklahoma Update.” Stone, Stone, Heidelberger, Turek (IBM). IEEE Parallel & Distributed Technology. 1993.
• “Speculative Lock Elision: Enabling Highly Concurrent Multithreaded Execution.” Rajwar & Goodman. (Wisconsin). ISCA 2001.
• “Transactional Lock-Free Execution of Lock-Based Programs.” Rajwar & Goodman. (Wisconsin). ASPLOS 2002.
• “Transactional Memory Coherence and Consistency.” Stanford team. ISCA 2004.
• “Unbounded Transactional Memory.” Ananian, Asanovic, Kuszmaul, Leiserson, Lie (MIT). HPCA 2005.
• “Virtualizing Transactional Memory.” Rajwar, Herlihy, Lai. (Intel & Brown). ISCA 2005.
• “LogTM: Log-based Transactional Memory.” Moore, Bobba, Moravan, Hill, Wood. (Wisconsin team). HPCA 2006.
• “Hybrid Transactional Memory.” Kumar, Chu, Hughes, Kundu, Nguyen. PPoPP 2006.
• “Architectural Semantics for Practical Transactional Memory.” Stanford team. ISCA 2006.
![Page 17: Hardware Transactional Memory](https://reader036.vdocuments.site/reader036/viewer/2022062407/56812a45550346895d8d7c1e/html5/thumbnails/17.jpg)
• “Bulk Disambiguation of Speculative Threads in Multiprocessors.” Ceze, Tuck, Cascaval, Torrellas. (UIUC). ISCA 2006.
• “Supporting Nested Transactional Memory in LogTM.” Wisconsin team. ASPLOS 2006.
• “Unbounded Page-Based Transactional Memory.” Chuang, Narayanasamy, Venkatesh, Sampson, Biesbrouck, Pokam, Colavin, Calder. (UCSD, ST Microelectronics, Microsoft). ASPLOS 2006.
• “Tradeoffs in Transactional Memory Virtualization.” Stanford team. ASPLOS 2006.
• “Hybrid Transactional Memory.” Damron, Fedorova, Lev, Luchangco, Moir, Nussbaum. (Sun). ASPLOS 2006.
• “Making the Fast Case Common and the Uncommon Case Simple in Unbounded Transactional Memory.” Blundell, Devietti, Lewis, Martin. (UPenn, VMware). ISCA 2007.
• “An Effective Hybrid Transactional Memory System with Strong Isolation Guarantees.” Stanford team. ISCA 2007.
• “An Integrated Hardware-Software Approach to Flexible Transactional Memory.” Shriraman, Spear, Hossain, Marathe, Dwarkadas, Scott. (U Rochester). ISCA 2007.
• “Performance Pathologies in Hardware Transactional Memory.” Wisconsin team. ISCA 2007.
![Page 18: Hardware Transactional Memory](https://reader036.vdocuments.site/reader036/viewer/2022062407/56812a45550346895d8d7c1e/html5/thumbnails/18.jpg)
Non-overflowed HTM
![Page 19: Hardware Transactional Memory](https://reader036.vdocuments.site/reader036/viewer/2022062407/56812a45550346895d8d7c1e/html5/thumbnails/19.jpg)
“Transactional Memory: Architectural Support for Lock-Free Data Structures.” Herlihy (DEC) & Moss (UMass). ISCA 1993.
• First HTM paper
• Similar to the simple HTM
  • Transactional cache alongside L1D
• Abort, roll-back: not fully automatic
  • HW discards transactional updates
  • SW jumps back and retries transaction (with exponential backoff)
• Conflict detection: eager (coherence)
• Conflict resolution: requester aborts
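The software side of this scheme, retrying an aborted transaction with exponential backoff, might look like the sketch below. The function name, the `try_transaction` callback (returning True on commit), and the delay parameters are hypothetical; the slide only says that software jumps back and retries with exponential backoff.

```python
import random
import time


def run_with_retry(try_transaction, max_attempts=8, base_delay=1e-6,
                   sleep=time.sleep):
    """Call try_transaction() until it commits; return the attempts used.

    try_transaction is a hypothetical callback that returns True when the
    transaction committed and False when the hardware discarded its updates.
    """
    for attempt in range(max_attempts):
        if try_transaction():
            return attempt + 1
        # Wait a random time whose upper bound doubles after every abort.
        sleep(random.uniform(0, base_delay * 2 ** attempt))
    raise RuntimeError("transaction did not commit")


outcomes = iter([False, False, True])        # two aborts, then a commit
print(run_with_retry(lambda: next(outcomes)))  # 3
```

Randomizing the delay reduces the chance that two conflicting transactions retry in lockstep and abort each other again.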
![Page 20: Hardware Transactional Memory](https://reader036.vdocuments.site/reader036/viewer/2022062407/56812a45550346895d8d7c1e/html5/thumbnails/20.jpg)
“Multiple Reservations and the Oklahoma Update.” Stone, Stone, Heidelberger, Turek (IBM). IEEE Parallel & Distributed Technology. 1993.
• Single reservation: LL-SC
• Multiple reservations: all or nothing, transactions with read-modify-writes
• Oklahoma update (in the musical “Oklahoma!”, there is a song titled “All er Nothin”)
• Similar to the simple HTM
  • Batch updates and detection at commit time
![Page 21: Hardware Transactional Memory](https://reader036.vdocuments.site/reader036/viewer/2022062407/56812a45550346895d8d7c1e/html5/thumbnails/21.jpg)
“Speculative Lock Elision: Enabling Highly Concurrent Multithreaded Execution.” Rajwar & Goodman. (Wisconsin). ISCA 2001. (SLE)
• Idea: speculate lock-unlock critical section while eliding locks
  • using simple HTM
  • fall back to locking upon conflicts & overflows
• Novelty: recognizing lock and unlock
  • Lock: LL-SC with predictors
  • Unlock: a store to restore value changed by LL-SC
![Page 22: Hardware Transactional Memory](https://reader036.vdocuments.site/reader036/viewer/2022062407/56812a45550346895d8d7c1e/html5/thumbnails/22.jpg)
“Transactional Lock-Free Execution of Lock-Based Programs.” Rajwar & Goodman. (Wisconsin). ASPLOS 2002. (TLR)
• SLE + resolve conflicts
• Timestamp
  • <# of committed TLR on the local CPU, CPU ID>
  • Stall or abort the younger transaction upon conflicts
• Non-trivial addition to cache coherence protocol for avoiding deadlocks
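The timestamp-based resolution can be sketched as a tuple comparison. The sketch assumes that a smaller commit count means an older transaction and that the CPU ID only breaks ties; the names `Timestamp` and `loser` are illustrative and not from the paper.

```python
from typing import NamedTuple


class Timestamp(NamedTuple):
    committed: int   # transactions committed so far on the local CPU
    cpu_id: int      # tie-breaker, so timestamps are totally ordered


def loser(a, b):
    """Return the timestamp of the younger transaction (must stall or abort).

    Assumption: tuples compare lexicographically, and the larger tuple is
    treated as the younger transaction.
    """
    return max(a, b)


print(loser(Timestamp(3, 0), Timestamp(2, 1)))  # Timestamp(committed=3, cpu_id=0)
print(loser(Timestamp(2, 0), Timestamp(2, 1)))  # Timestamp(committed=2, cpu_id=1)
```

Because the order is total, two conflicting transactions always agree on which one yields, which is what lets the scheme avoid livelock between them.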
![Page 23: Hardware Transactional Memory](https://reader036.vdocuments.site/reader036/viewer/2022062407/56812a45550346895d8d7c1e/html5/thumbnails/23.jpg)
“Transactional Memory Coherence and Consistency.” Stanford team. ISCA 2004. (TCC)
• Conflict detection: lazy
• Novelty: propose to use transactional memory to replace cache coherence
  • Illusion of shared memory
  • Batch communication like message passing
![Page 24: Hardware Transactional Memory](https://reader036.vdocuments.site/reader036/viewer/2022062407/56812a45550346895d8d7c1e/html5/thumbnails/24.jpg)
“Bulk Disambiguation of Speculative Threads in Multiprocessors.” Ceze, Tuck, Cascaval, Torrellas. (UIUC). ISCA 2006.
• Conflict detection: lazy
• Use Bloom filter signature to do batch detection
  • 2000-bit Bloom filter, avg. 70 read lines and 20 write lines per transaction
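A signature of this kind can be approximated in software with a Bloom filter over cache-line addresses. The sketch below is not the hardware encoding used in the paper: the class `Signature`, the use of SHA-256 as the hash, and the choice of two hash functions are assumptions made for illustration. Only the 2000-bit size comes from the slide.

```python
import hashlib


class Signature:
    """Bloom-filter summary of a set of cache-line addresses."""

    def __init__(self, bits=2000, hashes=2):
        self.bits = bits
        self.hashes = hashes
        self.field = 0                       # bit vector stored as an int

    def _positions(self, line_addr):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{line_addr}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.bits

    def add(self, line_addr):
        for p in self._positions(line_addr):
            self.field |= 1 << p

    def may_contain(self, line_addr):
        """False means definitely absent; True may be a false positive."""
        return all((self.field >> p) & 1 for p in self._positions(line_addr))


def conflicts_at_commit(committed_write_lines, local_read_sig):
    """Lazy detection: test a committer's writes against a local read signature."""
    return any(local_read_sig.may_contain(a) for a in committed_write_lines)


reads = Signature()
for line in (0x40, 0x80):
    reads.add(line)
print(conflicts_at_commit([0x80], reads))    # True: 0x80 was read locally
```

A Bloom filter never misses a real conflict but can report a spurious one, so the cost of a compact signature is an occasional unnecessary abort.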
![Page 25: Hardware Transactional Memory](https://reader036.vdocuments.site/reader036/viewer/2022062407/56812a45550346895d8d7c1e/html5/thumbnails/25.jpg)
Virtualizing HTM
![Page 26: Hardware Transactional Memory](https://reader036.vdocuments.site/reader036/viewer/2022062407/56812a45550346895d8d7c1e/html5/thumbnails/26.jpg)
How?
• Generally: save transaction states in virtual memory
  • Read set, write set
  • Or readers, writers per block in memory
• Conflict detection needs to check this structure
• Question: how to make it efficient?
![Page 27: Hardware Transactional Memory](https://reader036.vdocuments.site/reader036/viewer/2022062407/56812a45550346895d8d7c1e/html5/thumbnails/27.jpg)
“Unbounded Transactional Memory.” Ananian, Asanovic, Kuszmaul, Leiserson, Lie (MIT). HPCA 2005.
• First paper on overflowed transactions
• UTM (“Unbounded TM”):
  • Idealized (very complicated)
• LTM (“Large TM”):
  • Lazy versioning
  • Limitations: less than a time slice, no migration, smaller than physical memory
![Page 28: Hardware Transactional Memory](https://reader036.vdocuments.site/reader036/viewer/2022062407/56812a45550346895d8d7c1e/html5/thumbnails/28.jpg)
“Virtualizing Transactional Memory.” Rajwar, Herlihy, Lai. (Intel & Brown). ISCA 2005. (VTM)
• A fairly complete description
• Novelty:
  • XSW: transaction status word
    • load/store entries point to XSW; can change transaction state with a single atomic update
  • Filter for conflict detection
• Lazy versioning (buffer updates)
• Eager conflict detection
![Page 29: Hardware Transactional Memory](https://reader036.vdocuments.site/reader036/viewer/2022062407/56812a45550346895d8d7c1e/html5/thumbnails/29.jpg)
“LogTM: Log-based Transactional Memory.” Moore, Bobba, Moravan, Hill, Wood. (Wisconsin team). HPCA 2006.
• Overflow handling
• Eager versioning: per-thread undo log
  • Update in place, save old values in log
  • Favors commits
• Eager conflict detection
  • Cache has a single overflow bit
  • Use directory to remember the transactional access to a line even if the line is evicted from cache
![Page 30: Hardware Transactional Memory](https://reader036.vdocuments.site/reader036/viewer/2022062407/56812a45550346895d8d7c1e/html5/thumbnails/30.jpg)
“Architectural Semantics for Practical Transactional Memory.” Stanford team. ISCA 2006.
• Provide support to call software callbacks
  • Commit, abort, violation
• Nested transactions
  • Flattening: a violation rolls back to the beginning of the top-most transaction
  • Closed nesting: allow partial roll-backs
  • Open nesting: allow partial commits
![Page 31: Hardware Transactional Memory](https://reader036.vdocuments.site/reader036/viewer/2022062407/56812a45550346895d8d7c1e/html5/thumbnails/31.jpg)
“Supporting Nested Transactional Memory in LogTM.” Wisconsin team. ASPLOS 2006.
• Undo log is organized as transaction log frames (just like stack frames)
• LIFO
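A stack of log frames makes partial rollback of a nested transaction straightforward. The sketch below is an assumption-laden model, not the LogTM design itself: the class `NestedUndoLog` is hypothetical, a dict stands in for memory, and merging a committed child's frame into its parent is one plausible way to model closed nesting.

```python
class NestedUndoLog:
    """Undo log kept as a LIFO stack of frames, one per nesting level."""

    def __init__(self, memory):
        self.memory = memory
        self.frames = []                     # each frame: list of (addr, old value)

    def begin(self):
        self.frames.append([])               # push a frame, like a call frame

    def write(self, addr, value):
        self.frames[-1].append((addr, self.memory[addr]))
        self.memory[addr] = value            # eager versioning: update in place

    def commit(self):
        frame = self.frames.pop()
        if self.frames:                      # inner commit: parent inherits the
            self.frames[-1].extend(frame)    # undo entries (closed nesting)

    def abort(self):
        frame = self.frames.pop()            # roll back only the innermost level
        for addr, old in reversed(frame):
            self.memory[addr] = old


mem = {"a": 1, "b": 1}
log = NestedUndoLog(mem)
log.begin()
log.write("a", 2)
log.begin()
log.write("b", 2)
log.abort()                                  # partial rollback of the inner level
print(mem)                                   # {'a': 2, 'b': 1}
```

Aborting the inner transaction restores only `b`; the outer transaction's write to `a` survives and can still commit or be undone later.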
![Page 32: Hardware Transactional Memory](https://reader036.vdocuments.site/reader036/viewer/2022062407/56812a45550346895d8d7c1e/html5/thumbnails/32.jpg)
“Unbounded Page-Based Transactional Memory.” Chuang, Narayanasamy, Venkatesh, Sampson, Biesbrouck, Pokam, Colavin, Calder. (UCSD, ST Microelectronics, Microsoft). ASPLOS 2006.
• Shadow page + home page
• Conflict detection: special cache for overflow info before traversing memory structure
![Page 33: Hardware Transactional Memory](https://reader036.vdocuments.site/reader036/viewer/2022062407/56812a45550346895d8d7c1e/html5/thumbnails/33.jpg)
“Making the Fast Case Common and the Uncommon Case Simple in Unbounded Transactional Memory.” Blundell, Devietti, Lewis, Martin. (UPenn, VMware). ISCA 2007.
• Making the fast case common:
  • Permission-only cache
  • Cache RW bits for overflowed cache lines
• Making the uncommon case simple:
  • Allow only a single overflowed transaction
  • OneTM-serialized: stall all other transactions
  • OneTM-concurrent: allow other non-overflowed transactions
  • Each block in memory requires RW bits + transaction ID
![Page 34: Hardware Transactional Memory](https://reader036.vdocuments.site/reader036/viewer/2022062407/56812a45550346895d8d7c1e/html5/thumbnails/34.jpg)
“Performance Pathologies in Hardware Transactional Memory.” Wisconsin team. ISCA 2007.
• Seven pathological scenarios in which different HTMs may do poorly
  • Livelock cases, starvation, convoy, futile stalling for a transaction that eventually aborts
• Enhancements:
  • Conflict resolution: back-offs, priorities
  • Predicting writes in a transaction, so that one can get ownership at reads
![Page 35: Hardware Transactional Memory](https://reader036.vdocuments.site/reader036/viewer/2022062407/56812a45550346895d8d7c1e/html5/thumbnails/35.jpg)
Combining HTM and STM
![Page 36: Hardware Transactional Memory](https://reader036.vdocuments.site/reader036/viewer/2022062407/56812a45550346895d8d7c1e/html5/thumbnails/36.jpg)
“Hybrid Transactional Memory.” Kumar, Chu, Hughes, Kundu, Nguyen. PPoPP 2006.
• Enhance the Dynamic STM (Herlihy et al.: wrap objects with indirection/replication)
  • HTM mode
  • STM mode
  • Tries HTM first
• A trick for conflict detection between HTM and STM:
  • STM also starts a hardware transaction
  • But only accesses a single state word transactionally
  • Performs all other actions nontransactionally
![Page 37: Hardware Transactional Memory](https://reader036.vdocuments.site/reader036/viewer/2022062407/56812a45550346895d8d7c1e/html5/thumbnails/37.jpg)
“Tradeoffs in Transactional Memory Virtualization.” Stanford team. ASPLOS 2006. (XTM)
• Two modes: all in hardware, all in software
• If HTM overflows, abort the transaction and run it in software mode
• Software mode:
  • Per-transaction page table
  • Copy-on-first-access: check at commit that read data has not changed
  • Copy-on-write: buffer transactional writes
![Page 38: Hardware Transactional Memory](https://reader036.vdocuments.site/reader036/viewer/2022062407/56812a45550346895d8d7c1e/html5/thumbnails/38.jpg)
“Hybrid Transactional Memory.” Damron, Fedorova, Lev, Luchangco, Moir, Nussbaum. (Sun). ASPLOS 2006.
• Compiler generates two code paths, choose at runtime:
  • STM
  • HTM
• Word-based
• Metadata access per memory operation required even for HTM (to detect conflict with STM)
![Page 39: Hardware Transactional Memory](https://reader036.vdocuments.site/reader036/viewer/2022062407/56812a45550346895d8d7c1e/html5/thumbnails/39.jpg)
“An Effective Hybrid Transactional Memory System with Strong Isolation Guarantees.” Stanford team. ISCA 2007.
• SigTM:
  • Enhance an STM system with hardware signatures
![Page 40: Hardware Transactional Memory](https://reader036.vdocuments.site/reader036/viewer/2022062407/56812a45550346895d8d7c1e/html5/thumbnails/40.jpg)
“An Integrated Hardware-Software Approach to Flexible Transactional Memory.” Shriraman, Spear, Hossain, Marathe, Dwarkadas, Scott. (U Rochester). ISCA 2007. (RTM)
• Two hardware mechanisms to improve the performance of an STM (RSTM):
  • Alert-on-update: allow software callbacks for invalidation and eviction of selected cache lines
  • Programmable data isolation: control cache to hold transactional blocks
![Page 41: Hardware Transactional Memory](https://reader036.vdocuments.site/reader036/viewer/2022062407/56812a45550346895d8d7c1e/html5/thumbnails/41.jpg)
Summary
• Simple HTM is nice
• Major complexity comes in because of space and time limitations
  • Logs, shadow pages, filters, caches, etc.
• Combine HTM and STM