Memory Consistency Models
CSE 451, James Bornholt (courses.cs.washington.edu)
Memory consistency models
The short version:
• Multiprocessors reorder memory operations in unintuitive, scary ways
• This behavior is necessary for performance
• Application programmers rarely see this behavior
• But kernel developers see it all the time
Multithreaded programs
Initially A = B = 0

Thread 1:
  A = 1;
  if (B == 0) print "Hello";

Thread 2:
  B = 1;
  if (A == 0) print "World";

What can be printed?
• “Hello”?
• “World”?
• Nothing?
• “Hello World”?
Things that shouldn’t happen
This program should never print “Hello World”.

Thread 1:
  A = 1;
  if (B == 0) print "Hello";

Thread 2:
  B = 1;
  if (A == 0) print "World";

A “happens-before” graph shows the order in which events must execute to get a desired outcome.
• If there’s a cycle in the graph, an outcome is impossible: an event must happen before itself!

For “Hello World”, the graph needs four edges:
• A = 1 before the read of B (program order in Thread 1)
• The read of B before B = 1 (so Thread 1 sees B == 0)
• B = 1 before the read of A (program order in Thread 2)
• The read of A before A = 1 (so Thread 2 sees A == 0)
These edges form a cycle, so “Hello World” is impossible.
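The happens-before argument can be checked mechanically. A minimal Python sketch, assuming we label the two reads "read A" and "read B" (names chosen here for illustration) and detect cycles with a plain depth-first search:

```python
# Happens-before edges required for the "Hello World" outcome.
# Each node is one event; an edge (a, b) means "a must happen before b".
EDGES = [
    ("A = 1", "read B"),   # program order in Thread 1
    ("read B", "B = 1"),   # Thread 1 must read B == 0, i.e. before B = 1
    ("B = 1", "read A"),   # program order in Thread 2
    ("read A", "A = 1"),   # Thread 2 must read A == 0, i.e. before A = 1
]

def has_cycle(edges):
    """Return True if the directed graph given by `edges` contains a cycle."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, [])
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {node: WHITE for node in graph}

    def visit(node):
        color[node] = GRAY  # node is on the current DFS path
        for succ in graph[node]:
            if color[succ] == GRAY:
                return True  # back edge: succ would have to happen before itself
            if color[succ] == WHITE and visit(succ):
                return True
        color[node] = BLACK  # fully explored, no cycle through here
        return False

    return any(color[n] == WHITE and visit(n) for n in graph)

print(has_cycle(EDGES))      # True: "Hello World" is impossible
print(has_cycle(EDGES[:3]))  # False: printing just "Hello" is possible
```

Dropping the last edge corresponds to asking only for “Hello”, which leaves an acyclic graph, matching the intuition that “Hello” alone can be printed.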
Making each read explicit as a load into a register gives the same program:

Thread 1:
  A = 1;
  r0 = B;
  if (r0 == 0) print "Hello";

Thread 2:
  B = 1;
  r1 = A;
  if (r1 == 0) print "World";

Printing “Hello World” needs r0 = 0 and r1 = 0, so only the memory operations matter:

Thread 1: A = 1; r0 = B;
Thread 2: B = 1; r1 = A;

Not allowed: r0 = 0 and r1 = 0
Sequential consistency
• All operations are executed in some sequential order
  • As if they were manipulating a single shared memory
• Each thread’s operations happen in program order
(This is the interleaving model you probably remember from CSE 332)

Thread 1: A = 1; r0 = B;
Thread 2: B = 1; r1 = A;
Not allowed: r0 = 0 and r1 = 0
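A minimal Python sketch of the two SC invariants, assuming a tiny instruction encoding invented here for illustration: enumerate every interleaving that preserves each thread's program order, run it against one shared memory, and collect the possible (r0, r1) results.

```python
from itertools import combinations

# Each instruction is ("store", var, value) or ("load", reg, var).
THREAD1 = [("store", "A", 1), ("load", "r0", "B")]
THREAD2 = [("store", "B", 1), ("load", "r1", "A")]

def interleavings(t1, t2):
    """Yield every merge of t1 and t2 that keeps each thread's program order."""
    n = len(t1) + len(t2)
    for slots in combinations(range(n), len(t1)):  # positions taken by Thread 1
        it1, it2 = iter(t1), iter(t2)
        yield [next(it1) if i in slots else next(it2) for i in range(n)]

def run_sc(order):
    """Execute one instruction at a time against a single shared memory."""
    memory = {"A": 0, "B": 0}
    regs = {}
    for op in order:
        if op[0] == "store":
            _, var, value = op
            memory[var] = value
        else:
            _, reg, var = op
            regs[reg] = memory[var]
    return regs["r0"], regs["r1"]

outcomes = {run_sc(order) for order in interleavings(THREAD1, THREAD2)}
print(sorted(outcomes))  # [(0, 1), (1, 0), (1, 1)]; (0, 0) never appears
```

All six interleavings are tried, and none of them yields r0 = 0 and r1 = 0, which is exactly the “not allowed” outcome above.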
Sequential consistency
Can be seen as a “switch” running one instruction at a time

Core 1: A = 1; r0 = B;
Core 2: B = 1; r1 = A;
Memory: initially A = 0, B = 0

Executed, in order:
• A = 1 (memory: A = 1, B = 0)
• B = 1 (memory: A = 1, B = 1)
• r1 = A (= 1)
• r0 = B (= 1)
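The walkthrough above can be replayed with a minimal Python sketch, assuming a string encoding of instructions chosen here for illustration; the schedule [1, 2, 2, 1] says which core the switch picks at each step.

```python
# A "switch" that runs one instruction at a time, in an order we choose.
# The string encoding of instructions is hypothetical, for illustration only.
PROGRAM = {
    1: ["A = 1", "r0 = B"],  # Core 1
    2: ["B = 1", "r1 = A"],  # Core 2
}

def run_switch(schedule):
    """Run the cores in `schedule` order (e.g. [1, 2, 2, 1]) and return a trace."""
    memory = {"A": 0, "B": 0}
    regs = {}
    pc = {1: 0, 2: 0}  # next instruction index for each core
    trace = []
    for core in schedule:
        instr = PROGRAM[core][pc[core]]
        pc[core] += 1
        dst, src = (s.strip() for s in instr.split("="))
        if dst in memory:          # store: write straight to the single memory
            memory[dst] = int(src)
            trace.append(instr)
        else:                      # load: read the single memory into a register
            regs[dst] = memory[src]
            trace.append(f"{instr} (= {regs[dst]})")
    return trace, memory, regs

trace, memory, regs = run_switch([1, 2, 2, 1])
for line in trace:
    print(line)  # prints A = 1, B = 1, r1 = A (= 1), r0 = B (= 1), one per line
```

Any other schedule that keeps each core's instructions in order is also a legal SC execution; the switch just picks one.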
Sequential consistency
Two invariants:
• All operations are executed in some sequential order
• Each thread’s operations happen in program order
Says nothing about which sequential order the operations happen in
• Any interleaving of threads is allowed
Due to Leslie Lamport in 1979
• Won the 2013 Turing Award, in part for this idea!
Memory consistency models
• A memory consistency model defines the permitted reorderings of memory operations during execution
• A contract between hardware and software: the hardware will only mess with your memory operations in these ways
• Sequential consistency is the strongest memory model: it allows the fewest reorderings/strange behaviors
  • (At least until you take CSE 452!)
Pop Quiz!
Assume sequential consistency, and all variables are initially 0.

Thread 1:
  (1) X = 1;
  (2) Y = 1;

Thread 2:
  (3) r0 = Y;
  (4) r1 = X;

• Can r0 = 0 and r1 = 0? Yes: (3) → (4) → (1) → (2)
• Can r0 = 1 and r1 = 1? Yes: (1) → (2) → (3) → (4)
• Can r0 = 0 and r1 = 1? Yes: (1) → (3) → (4) → (2)
• Can r0 = 1 and r1 = 0? No! r0 = 1 means (2) ran before (3). Program order puts (1) before (2) and (3) before (4), so (1) also ran before (4), and r1 must be 1.
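The quiz answers can be double-checked by brute force. A minimal Python sketch, assuming the labels (1) to (4) above: try every ordering of the four operations, keep those that respect each thread's program order, and record one witnessing order per outcome. The witness found may differ from the order given on the slide, since several orders can produce the same result.

```python
from itertools import permutations

# Pop-quiz program, labeled as on the slide.
OPS = {
    1: ("store", "X", 1),    # Thread 1
    2: ("store", "Y", 1),    # Thread 1
    3: ("load", "r0", "Y"),  # Thread 2
    4: ("load", "r1", "X"),  # Thread 2
}
PROGRAM_ORDER = [(1, 2), (3, 4)]  # (a, b): a must come before b

def respects_program_order(order):
    return all(order.index(a) < order.index(b) for a, b in PROGRAM_ORDER)

def run(order):
    """Execute the labeled operations in `order` against one shared memory."""
    memory = {"X": 0, "Y": 0}
    regs = {}
    for label in order:
        kind, dst, src = OPS[label]
        if kind == "store":
            memory[dst] = src
        else:
            regs[dst] = memory[src]
    return regs["r0"], regs["r1"]

# Map each reachable (r0, r1) to the first SC order found that produces it.
witness = {}
for order in permutations(OPS):
    if respects_program_order(order):
        witness.setdefault(run(order), order)

for outcome in [(0, 0), (1, 1), (0, 1), (1, 0)]:
    print(outcome, witness.get(outcome, "impossible under SC"))
```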
Why sequential consistency?
• Agrees with programmer intuition!

Why not sequential consistency?
• Horribly slow to guarantee in hardware
• The “switch” model is overly conservative
The problem with SC
Core 1: A = 1; r0 = B;
Core 2: B = 1; r1 = A;
Executed: A = 1

Core 1’s two instructions, A = 1 and r0 = B, touch different addresses, so they don’t conflict: there’s no need to wait for the first one to finish before executing the second.
And writing to memory takes forever! (about 100 cycles ≈ 30 ns)
Optimization: Store buffers
• Store writes in a local buffer and then proceed to the next instruction immediately
• The cache will pull writes out of the store buffer when it’s ready

Core 1 (Thread 1): A = 1; r0 = B;
Store buffer: (initially empty)
Caches: A = 0, B = 0
Memory: A = 0, B = 0

A = 1 goes into the store buffer, and r0 = B executes right away, reading B from the cache without waiting for the write to finish.
The same core, now with a store and a load to the same location:

Core 1 (Thread 1): C = 1; r0 = C;
Caches: C = 0
Memory: C = 0

C = 1 is still sitting in the store buffer when r0 = C runs, so the core checks its own store buffer before the cache: r0 = 1, even though the cache still holds C = 0.
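A minimal Python sketch of a single core with a store buffer, assuming a FIFO buffer and a dict standing in for the cache (both simplifications chosen here, not a description of any particular CPU). `Core`, `store`, `load`, and `drain_one` are hypothetical names.

```python
from collections import deque

class Core:
    """One core with a FIFO store buffer in front of a shared cache (a sketch)."""

    def __init__(self, cache):
        self.cache = cache            # shared dict standing in for the cache
        self.store_buffer = deque()   # pending (var, value) writes, oldest first

    def store(self, var, value):
        # Don't wait for the cache: park the write and move on immediately.
        self.store_buffer.append((var, value))

    def load(self, var):
        # Check our own store buffer first (newest entry wins), then the cache.
        for buffered_var, value in reversed(self.store_buffer):
            if buffered_var == var:
                return value
        return self.cache[var]

    def drain_one(self):
        # The cache pulls the oldest write out of the buffer when it is ready.
        var, value = self.store_buffer.popleft()
        self.cache[var] = value

cache = {"C": 0}
core1 = Core(cache)
core1.store("C", 1)      # C = 1 sits in the store buffer
r0 = core1.load("C")     # r0 = C is forwarded from the buffer
print(r0, cache["C"])    # 1 0: the thread sees its write, the cache does not yet
core1.drain_one()
print(cache["C"])        # 1
```

Searching the buffer newest-first matters when a thread writes the same location twice before the buffer drains: the load must see the latest of its own writes.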
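Store buffers change memory behavior
Thread 1 runs on Core 1 and Thread 2 on Core 2. Each core has its own store buffer between it and memory (initially A = 0, B = 0).
Thread 1: A = 1; r0 = B
Thread 2: B = 1; r1 = A
Can r0 = 0 and r1 = 0? SC: No!
Executed:
• A = 1 and B = 1 go into Core 1's and Core 2's store buffers, not into memory
• r0 = B (= 0): Core 1 reads memory, where B is still 0
• r1 = A (= 0): Core 2 reads memory, where A is still 0
• The store buffers drain: memory now holds A = 1, B = 1
Store buffers: Yes!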
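To check the slide's answer mechanically, here is a small C sketch that enumerates every execution of this litmus test under two toy models. It is an illustrative model, not how any real CPU is specified. Under SC each store goes straight to memory; under the store-buffer model each store first sits in its own thread's buffer and may drain at any later point. The names `State`, `explore`, and `sb_both_zero_reachable` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Store-buffering litmus test from the slide:
 *   Thread 1 (index 0): A = 1; r0 = B;
 *   Thread 2 (index 1): B = 1; r1 = A;
 * Variable 0 is A, variable 1 is B: thread t writes var t, reads var 1 - t. */
typedef struct {
    int mem[2];       /* shared memory, initially A = B = 0 */
    int pc[2];        /* 0: next op is the store, 1: the load, 2: done */
    bool buffered[2]; /* thread t's store is still in its store buffer */
    int r[2];         /* r[0] = r0, r[1] = r1 */
} State;

static bool seen[2][2]; /* seen[r0][r1]: outcome reached by some execution */

static void explore(State s, bool tso) {
    bool moved = false;
    for (int t = 0; t < 2; t++) {
        if (s.pc[t] == 0) {                /* store: var t = 1 */
            State n = s;
            if (tso) n.buffered[t] = true; /* park it in the store buffer */
            else n.mem[t] = 1;             /* SC: write memory directly */
            n.pc[t] = 1;
            explore(n, tso);
            moved = true;
        } else if (s.pc[t] == 1) {         /* load: r[t] = var 1 - t */
            State n = s;
            /* t never buffers a store to var 1 - t, so no store-to-load
               forwarding is needed: the load reads memory */
            n.r[t] = s.mem[1 - t];
            n.pc[t] = 2;
            explore(n, tso);
            moved = true;
        }
        if (s.buffered[t]) {               /* the buffer may drain at any time */
            State n = s;
            n.mem[t] = 1;
            n.buffered[t] = false;
            explore(n, tso);
            moved = true;
        }
    }
    if (!moved) {                          /* both threads done, buffers empty */
        assert(s.mem[0] == 1 && s.mem[1] == 1);
        seen[s.r[0]][s.r[1]] = true;
    }
}

/* Returns true if r0 == 0 && r1 == 0 is reachable under SC (tso = false)
 * or with one store buffer per thread (tso = true). */
bool sb_both_zero_reachable(bool tso) {
    State init = {{0, 0}, {0, 0}, {false, false}, {0, 0}};
    for (int i = 0; i < 2; i++)
        for (int j = 0; j < 2; j++)
            seen[i][j] = false;
    explore(init, tso);
    return seen[0][0];
}
```

`sb_both_zero_reachable(false)` returns false: with stores going straight to memory, r0 = 0 needs Thread 2's store to come after Thread 1's load, and r1 = 0 needs the reverse, which is a cycle. `sb_both_zero_reachable(true)` returns true through exactly the execution shown on the slide.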
So, who uses store buffers?
Every modern CPU!
• x86
• ARM
• PowerPC
• …
A Volatile-by-Default JVM for Server Applications. Liu, Millstein, Musuvathi. OOPSLA 2017.
[Figure: slowdown without store buffers (y-axis 0.0 to 2.0) on the benchmarks avrora, fop, h2, jython, luindex, pmd, sunflow, tomcat, xalan]
Java is 7–81% slower without store buffers!
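Total Store Ordering (TSO)
• Sequential consistency plus store buffers
  • Each store buffer drains in FIFO order, so stores still reach memory in program order; the only new behavior is a load completing before an earlier store has left the buffer
• Allows more behaviors than SC
  • Harder to program!
• x86 specifies TSO as its memory model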
More esoteric memory models
• Partial Store Ordering (used by SPARC)
• Write coalescing: merge writes to the same cache line inside the write buffer to save memory bandwidth
• Allows writes to be reordered with other writes
Thread 1: X = 1; Y = 1; Z = 1
Assume X and Z are on the same cache line.
Executed (order in which the writes reach memory): X = 1, Z = 1, Y = 1
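The coalescing step above can be sketched as a toy write buffer in C. This is a hypothetical model, not any real CPU's design: one buffer entry per cache line, a write to a line that already has an entry is merged into it, and entries drain oldest first. Real PSO hardware may drain in other orders; this shows just one policy that already breaks program order. `Entry`, `WriteBuffer`, `buffer_write`, `buffer_drain`, and `pso_example` are illustrative names.

```c
#include <assert.h>
#include <stddef.h>

/* Toy write buffer with coalescing (hypothetical model). */
enum { MAX_ENTRIES = 4, MAX_VARS = 4 };

typedef struct {
    int line;             /* cache line this entry covers */
    char vars[MAX_VARS];  /* names of the variables written to it, in order */
    size_t nvars;
} Entry;

typedef struct {
    Entry entries[MAX_ENTRIES]; /* entries[0] is the oldest */
    size_t n;
} WriteBuffer;

static void buffer_write(WriteBuffer *wb, char var, int line) {
    for (size_t i = 0; i < wb->n; i++) {
        Entry *e = &wb->entries[i];
        if (e->line == line) {             /* same line: coalesce */
            assert(e->nvars < MAX_VARS);
            e->vars[e->nvars++] = var;
            return;
        }
    }
    assert(wb->n < MAX_ENTRIES);
    Entry *e = &wb->entries[wb->n++];      /* new line: new entry */
    e->line = line;
    e->nvars = 0;
    e->vars[e->nvars++] = var;
}

/* Drains the oldest entry first; all writes in one entry reach memory
 * together. Records the order in which variables become visible. */
static void buffer_drain(WriteBuffer *wb, char *out) {
    size_t k = 0;
    for (size_t i = 0; i < wb->n; i++)
        for (size_t j = 0; j < wb->entries[i].nvars; j++)
            out[k++] = wb->entries[i].vars[j];
    out[k] = '\0';
    wb->n = 0;
}

/* The slide's program: X = 1; Y = 1; Z = 1; with X and Z on cache line 0
 * and Y on line 1. `out` needs room for 4 chars. */
void pso_example(char out[4]) {
    WriteBuffer wb = { .n = 0 };
    buffer_write(&wb, 'X', 0);
    buffer_write(&wb, 'Y', 1);
    buffer_write(&wb, 'Z', 0);  /* merged into X's entry */
    buffer_drain(&wb, out);
}
```

`pso_example` produces the visibility order `XZY`, matching the slide: another core can observe Z = 1 while Y is still 0.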
More esoteric memory models
• Weak ordering (ARM, PowerPC)
• No ordering guarantees for ordinary (non-synchronization) memory operations
• Almost everything can be reordered! 😱
• One exception: dependent operations are ordered
  ldr r0, #y        int **r0 = y;   // y held in r0
  ldr r1, [r0]      int *r1 = *r0;  // address comes from r0
  ldr r2, [r1]      int r2 = *r1;   // address comes from r1
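In C you cannot rely on the hardware's dependency ordering directly, because the compiler is free to break the dependency. The portable way to express "publish a pointer, then read through it" is a release store paired with an acquire load. C11 also has `memory_order_consume`, meant to order only dependent loads, but current compilers treat it as acquire. Below is a minimal sketch; `Payload`, `published`, and `publish_and_read` are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

typedef struct { int value; } Payload;  /* hypothetical payload type */

static Payload payload;
static _Atomic(Payload *) published;    /* NULL until the writer publishes */

static void *writer(void *arg) {
    (void)arg;
    payload.value = 42;                 /* initialize first... */
    /* ...then publish; release orders the store above before this one */
    atomic_store_explicit(&published, &payload, memory_order_release);
    return NULL;
}

static void *reader(void *arg) {
    Payload *p;
    /* acquire stands in for consume, which compilers strengthen anyway */
    while ((p = atomic_load_explicit(&published, memory_order_acquire)) == NULL)
        ;                               /* spin until published */
    *(int *)arg = p->value;             /* address depends on the loaded p */
    return NULL;
}

/* Runs one writer and one reader; returns the value the reader saw. */
int publish_and_read(void) {
    int seen = 0;
    pthread_t r, w;
    atomic_store(&published, (Payload *)NULL);
    payload.value = 0;
    int rc = pthread_create(&r, NULL, reader, &seen);
    assert(rc == 0);
    rc = pthread_create(&w, NULL, writer, NULL);
    assert(rc == 0);
    (void)rc;
    pthread_join(r, NULL);
    pthread_join(w, NULL);
    return seen;
}
```

Because the reader's acquire load synchronizes with the writer's release store, the reader is guaranteed to see 42, never the stale 0.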
Even more esoteric memory models
• DEC Alpha
  • A successor to VAX…
  • Killed in 2001
• Dependent operations can be reordered!
• Lowest common denominator for the Linux kernel
This seems like a nightmare!
• Every architecture provides synchronization primitives to make memory ordering stricter
  • Fence instructions prevent reorderings, but are expensive
  • Other synchronization primitives: read-modify-write/compare-and-swap/atomics, transactional memory, …
Store buffering on x86:
Thread 1: movl $1,%[x]; movl %[y],%eax
Thread 2: movl $1,%[y]; movl %[x],%ebx
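Fix 1: replace each store with an atomic read-modify-write (xchg is implicitly locked on x86); the loads stay the same:
Thread 1: movl $1,%ecx; xchg %ecx,%[x]
Thread 2: movl $1,%ecx; xchg %ecx,%[y]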
Fix 2: put a fence between each store and the following load:
Thread 1: movl $1,%[x]; mfence; movl %[y],%eax
Thread 2: movl $1,%[y]; mfence; movl %[x],%ebx
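A portable C11 counterpart to the mfence fix is `atomic_thread_fence(memory_order_seq_cst)` between each thread's store and load; GCC and Clang typically emit `mfence` for it on x86. The C11 fence rules forbid r0 == 0 && r1 == 0 here. The sketch below runs the test repeatedly and counts that outcome; the function names are hypothetical. One caveat: spawning fresh threads per trial often staggers them, so a zero count *without* the fences would not prove anything.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Store-buffering test with a seq_cst fence between each store and load. */
static atomic_int x, y;
static int r0, r1;  /* each written by one thread, read only after join */

static void *thread1(void *arg) {
    (void)arg;
    atomic_store_explicit(&x, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);  /* plays the role of mfence */
    r0 = atomic_load_explicit(&y, memory_order_relaxed);
    return NULL;
}

static void *thread2(void *arg) {
    (void)arg;
    atomic_store_explicit(&y, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);  /* plays the role of mfence */
    r1 = atomic_load_explicit(&x, memory_order_relaxed);
    return NULL;
}

/* Runs the test `trials` times; returns how often r0 == 0 && r1 == 0. */
int fenced_sb_both_zero(int trials) {
    int hits = 0;
    for (int i = 0; i < trials; i++) {
        pthread_t a, b;
        atomic_store(&x, 0);
        atomic_store(&y, 0);
        int rc = pthread_create(&a, NULL, thread1, NULL);
        assert(rc == 0);
        rc = pthread_create(&b, NULL, thread2, NULL);
        assert(rc == 0);
        (void)rc;
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        if (r0 == 0 && r1 == 0)
            hits++;
    }
    return hits;
}
```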
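But it’s not just hardware…
Thread 1, as written:
  X = 0
  for i=0 to 100:
    X = 1
    print X
Thread 1, after the compiler hoists the loop-invariant store out of the loop:
  X = 1
  for i=0 to 100:
    print X
Thread 2 (runs concurrently with either version):
  X = 0
Possible output:
• Either version: 11111111111… (e.g. Thread 2's store lands before Thread 1's first X = 1)
• As written: 11111011111… (the next iteration's X = 1 overwrites Thread 2's 0)
• After the compiler's change: 11111000000… (nothing ever sets X back to 1)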
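Here is a C11 sketch of the same two threads with X declared atomic, so the program has no data race. The function names are hypothetical. C11's coherence rules require each load to see the store its own iteration just made, or a later write. As a result, a 0 can be printed at most once: 11111011111… is allowed, but the hoisted program's 11111000000… is not, so the compiler may not transform the loop into one that prints it. Relaxed ordering suffices because only one variable is involved.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

enum { N = 100 };

static atomic_int X;
static char output[N + 1];  /* what Thread 1 "prints" */

static void *thread1(void *arg) {
    (void)arg;
    atomic_store_explicit(&X, 0, memory_order_relaxed);
    for (int i = 0; i < N; i++) {
        /* each iteration performs its own atomic store and load */
        atomic_store_explicit(&X, 1, memory_order_relaxed);
        output[i] = (char)('0' + atomic_load_explicit(&X, memory_order_relaxed));
    }
    output[N] = '\0';
    return NULL;
}

static void *thread2(void *arg) {
    (void)arg;
    atomic_store_explicit(&X, 0, memory_order_relaxed);
    return NULL;
}

/* Runs both threads once; returns how many 0s Thread 1 printed. */
int count_zeros(void) {
    pthread_t t1, t2;
    int rc = pthread_create(&t1, NULL, thread1, NULL);
    assert(rc == 0);
    rc = pthread_create(&t2, NULL, thread2, NULL);
    assert(rc == 0);
    (void)rc;
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    int zeros = 0;
    for (int i = 0; i < N; i++)
        if (output[i] == '0')
            zeros++;
    return zeros;
}
```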
Are computers broken?
• Every example so far has involved a data race:
  • Two accesses to the same memory location
  • At least one is a write
  • Unordered by synchronization operations
• If there are no data races, reordering behavior doesn’t matter
  • Accesses are ordered by synchronization, and synchronization forces sequential consistency
  • Note this is not the same as determinism
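The last point, data-race-free but not deterministic, can be illustrated with a small sketch: two threads append their IDs to a shared log, and every access goes through a mutex. There is no data race, so every execution is sequentially consistent. The interleaving of A and B entries still varies from run to run; only the per-thread counts are fixed. `append_id` and `run_log` are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>

enum { PER_THREAD = 1000 };

/* Hypothetical shared log. Every access to log_buf and log_len happens with
 * log_lock held, so the program has no data race. */
static pthread_mutex_t log_lock = PTHREAD_MUTEX_INITIALIZER;
static char log_buf[2 * PER_THREAD];
static int log_len;

static void *append_id(void *arg) {
    char id = *(const char *)arg;
    for (int i = 0; i < PER_THREAD; i++) {
        pthread_mutex_lock(&log_lock);
        log_buf[log_len++] = id;  /* which thread goes next is up to the scheduler */
        pthread_mutex_unlock(&log_lock);
    }
    return NULL;
}

/* Runs two appenders, then counts each thread's entries in the log. */
void run_log(int *count_a, int *count_b) {
    static char id_a = 'A', id_b = 'B';
    pthread_t ta, tb;
    log_len = 0;
    int rc = pthread_create(&ta, NULL, append_id, &id_a);
    assert(rc == 0);
    rc = pthread_create(&tb, NULL, append_id, &id_b);
    assert(rc == 0);
    (void)rc;
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);
    *count_a = *count_b = 0;
    for (int i = 0; i < log_len; i++) {
        if (log_buf[i] == 'A') (*count_a)++;
        else if (log_buf[i] == 'B') (*count_b)++;
    }
}
```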
Memory models in the real world
• Modern (C11, C++11) and not-so-modern (Java 5) languages guarantee sequential consistency for data-race-free programs (“SC for DRF”)
  • Compilers will insert the necessary synchronization to cope with the hardware memory model
• In C and C++: no guarantees (undefined behavior) if your program contains even a single data race! (Java gives racy programs weaker, but still defined, semantics.)
  • The intuition is that most programmers would consider a racy program to be buggy
  • Use a synchronization library!
• Incredibly difficult to get right in the compiler and kernel
  • Countless bugs and mailing list arguments
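As a sketch of “SC for DRF” in C11, here is the store-buffering test again with `x` and `y` declared `_Atomic`. Plain reads and writes of `_Atomic` objects are sequentially consistent atomic operations, so the program is data-race-free and r0 == 0 && r1 == 0 is forbidden. On x86 the compiler inserts the ordering itself, typically `xchg` or `mov` followed by `mfence` for the seq_cst store, depending on the compiler. Function names here are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Plain syntax on _Atomic objects means seq_cst atomic accesses. */
static _Atomic int x, y;
static int r0, r1;  /* each written by one thread, read only after join */

static void *thread1(void *arg) { (void)arg; x = 1; r0 = y; return NULL; }
static void *thread2(void *arg) { (void)arg; y = 1; r1 = x; return NULL; }

/* Runs the test `trials` times; returns how often r0 == 0 && r1 == 0. */
int sb_drf_both_zero(int trials) {
    int hits = 0;
    for (int i = 0; i < trials; i++) {
        pthread_t a, b;
        x = 0;
        y = 0;
        int rc = pthread_create(&a, NULL, thread1, NULL);
        assert(rc == 0);
        rc = pthread_create(&b, NULL, thread2, NULL);
        assert(rc == 0);
        (void)rc;
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        if (r0 == 0 && r1 == 0)
            hits++;
    }
    return hits;
}
```

Declaring `x` and `y` as plain `int` instead would reintroduce the data race and, in C, undefined behavior.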
Memory models in the Linux kernel
Manfred Spraul, spin_unlock optimization (i386):
  the current spin_unlock asm code is
    lock; btrl $0,%0
  it takes ~ 22 ticks on my PII/350. I think it's possible to replace that with
    movl $0,%0
  which would be a simple, pairable single-tick instruction.
Linus Torvalds, Re: spin_unlock optimization (i386):
  It does NOT WORK! Let the FreeBSD people use it, and then get faster timings. They will crash, eventually. […] the above CAN return 1 […] I might be proven wrong, but I don’t think I am.
Erich Boleyn (PMD IA32 Architecture, Intel), Re: spin_unlock optimization (i386):
  It will always return 0. […]
[119 emails later …]
Linus Torvalds, Re: spin_unlock optimization (i386):
  I’m happy. Everybody has convinced me that yes, the Intel ordering rules are strong enough that all of this really is legal
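The lock under discussion can be sketched in C11 as a test-and-set spinlock. This is a hypothetical `struct spinlock`, not the kernel's implementation. Locking uses an atomic exchange, a locked `xchg` on x86. Unlocking uses a release store, which GCC and Clang typically compile to a plain `mov` on x86, essentially Spraul's `movl $0,%0` in place of a locked read-modify-write like `lock; btrl`. It is sound there because x86 never reorders a store with earlier loads or earlier stores, so nothing in the critical section can drift past the unlock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical test-and-set spinlock, not the kernel's implementation. */
struct spinlock { atomic_bool locked; };

static void spin_lock(struct spinlock *l) {
    /* atomic exchange: a locked xchg on x86 */
    while (atomic_exchange_explicit(&l->locked, true, memory_order_acquire))
        ;  /* spin until we swap false -> true */
}

static void spin_unlock(struct spinlock *l) {
    /* release store: typically a plain mov on x86, no lock prefix */
    atomic_store_explicit(&l->locked, false, memory_order_release);
}

enum { ITERS = 10000 };
static struct spinlock lock = { false };
static long counter;  /* protected by lock */

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < ITERS; i++) {
        spin_lock(&lock);
        counter++;
        spin_unlock(&lock);
    }
    return NULL;
}

/* Two workers increment the counter under the lock; returns the total. */
long spinlock_counter(void) {
    pthread_t a, b;
    counter = 0;
    int rc = pthread_create(&a, NULL, worker, NULL);
    assert(rc == 0);
    rc = pthread_create(&b, NULL, worker, NULL);
    assert(rc == 0);
    (void)rc;
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return counter;
}
```

With a correct lock no increment is lost, so `spinlock_counter()` returns 2 × ITERS.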
Memory models in the Linux kernel
• New in 2018: a formal Linux kernel memory model
  • tools/memory-model/Documentation/explanation.txt
  • Only 12,000 words!
“Reordering” in computer architecture
• Today: memory consistency models
  • Ordering of memory accesses to different locations
  • Visible to programmers!
• Cache coherence protocols
  • Ordering of memory accesses to the same location
  • Not visible to programmers
• Out-of-order execution
  • Ordering of execution of a single thread’s instructions
  • Significant performance gains from dynamic scheduling
  • Not visible to programmers
    • Except through bugs: Spectre/Meltdown
Memory consistency models
• Multiprocessors reorder memory operations in unintuitive, scary ways
• This behavior is necessary for performance
• Application programmers rarely see this behavior
• But kernel developers see it all the time