Shared Memory Consistency Models: A Tutorial
Sarita V. Adve, Kourosh Gharachorloo
Western Research Laboratory
September 1995
Goals
• Expand intuition about concurrent program behavior
• Explore execution sequences due to compiler or hardware optimizations
• Introduce shared memory consistency models
• Explore execution sequences due to a particular memory model
• Demonstrate Memory Barriers (“fences”)
What happens?
Example of a mutual exclusion (“Dekker’s Algorithm”)
Global variables initially: Flag1 = 0, Flag2 = 0
P1:
  Flag1 = 1
  if (Flag2 == 0)
    critical section
P2:
  Flag2 = 1
  if (Flag1 == 0)
    critical section
Uniprocessor Hardware
P1:  Flag1 = 1;  if (Flag2 == 0) critical section
P2:  Flag2 = 1;  if (Flag1 == 0) critical section
Memory: Flag1 == 1, Flag2 == 1
T0  Flag1 = 0 and Flag2 = 0
T1  P1 writes Flag1 = 1
T2  P1 reads Flag2 == 0
T3  P2 writes Flag2 = 1
T4  P2 reads Flag1 == 1
Critical Section Protected
Uniprocessor Hardware Optimizations: Buffer (Cache)
• Writes take about 100 cycles
• Reads take about 1 cycle
• Use Write Buffer Bypass
Uniprocessor Hardware Optimizations: Write Buffer Bypass
P1:  Flag1 = 1;  if (Flag2 == 0) critical section
P2:  Flag2 = 1;  if (Flag1 == 0) critical section
Write buffer: Flag1 = 1, Flag2 = 1
Memory: Flag1 == 0, Flag2 == 0
T0  Flag1 = 0 and Flag2 = 0
T1  P1 writes Flag1 = 1
T2  P1 reads Flag2 == 0
T3  P2 writes Flag2 = 1
T4  P2 reads Flag1 == 1
Critical Section Protected
Multiprocessor Hardware Optimizations: Write Buffer Bypass
P1:  Flag1 = 1;  if (Flag2 == 0) critical section
P2:  Flag2 = 1;  if (Flag1 == 0) critical section
P1 write buffer: Flag1 = 1
P2 write buffer: Flag2 = 1
Memory (Shared Bus): Flag1 == 0, Flag2 == 0
T0  Flag1 = 0 and Flag2 = 0
T1  P1 writes Flag1 = 1
T2  P1 reads Flag2 == 0
T3  P2 writes Flag2 = 1
T4  P2 reads Flag1 == 0
Both in Critical Section
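The T1 to T4 trace can be replayed in a few lines. This is a minimal sketch, assuming each processor has a private write buffer that other processors cannot see and that buffered writes never drain during the trace. The function name run_dekker and the dictionary layout are illustrative and do not come from the slides.

```python
def run_dekker(write_buffer_bypass):
    """Replay T1..T4 for P1 and P2 and report who enters the critical section."""
    memory = {"Flag1": 0, "Flag2": 0}
    buffers = {"P1": {}, "P2": {}}  # assumption: one private write buffer each

    def write(proc, name, value):
        if write_buffer_bypass:
            buffers[proc][name] = value  # the write waits in the buffer
        else:
            memory[name] = value         # the write reaches memory at once

    def read(proc, name):
        # A processor sees its own buffered writes, never another processor's.
        return buffers[proc].get(name, memory[name])

    write("P1", "Flag1", 1)               # T1
    p1_enters = read("P1", "Flag2") == 0  # T2
    write("P2", "Flag2", 1)               # T3
    p2_enters = read("P2", "Flag1") == 0  # T4
    return p1_enters, p2_enters


print(run_dekker(write_buffer_bypass=False))
print(run_dekker(write_buffer_bypass=True))
```

Without the bypass only P1 enters. With it, P2 reads the stale Flag1 from memory and both processors enter.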
Producer Consumer
Example of a Producer and Consumer
Global variables initially: Data = 0, Head = 0
P1:
  Data = 2
  Head = 1
P2:
  while (Head == 0);
  print Data;
Multiprocessor Hardware Optimizations: Overlapped Writes (General Interconnect)
P1:  Data = 2;  Head = 1
P2:  while (Head == 0);  print Data
Writes issued by P1: Data = 2, Head = 1
Memory: Data == 2, Head == 1
T0  Data = 0, Head = 0
T1  GI: write Head = 1 arrives
T2  P2 reads Head == 1
T3  P2 reads Data == 0
T4  GI: write Data = 2 arrives
Wrong Data
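The effect of the arrival order can be sketched as a tiny simulation. It assumes P2 reads Data immediately after it first sees Head == 1. The function name consumer_output and its arrival_order parameter are hypothetical and are used only for this illustration.

```python
def consumer_output(arrival_order):
    """P1 issues Data = 2 and Head = 1; the interconnect delivers them in arrival_order."""
    memory = {"Data": 0, "Head": 0}
    issued = {"Data": 2, "Head": 1}
    for name in arrival_order:
        memory[name] = issued[name]   # one write arrives at memory
        if memory["Head"] != 0:       # P2's loop exits as soon as Head is set
            return memory["Data"]     # assumption: P2 reads Data right away
    return None                       # Head never arrived; P2 keeps spinning


print(consumer_output(["Data", "Head"]))  # writes arrive in program order
print(consumer_output(["Head", "Data"]))  # Head overtakes Data
```

When Head overtakes Data on the interconnect, P2 prints the initial value 0 instead of 2.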
What was expected?
Example of a Producer and Consumer
Global variables initially: Data = 0, Head = 0
P1:
  Data = 2
  Head = 1
P2:
  while (Head == 0);
  print Data;
Simplify Example and the Operations
Simple Program
Global variables initially: A = 0, B = 0
P1:
  A = 1      (WX)
  B = 2      (WY)
P2:
  print A    (RX)
  print B    (RY)
Reason about possible sequences: Expected Output
P1:  A = 1 (WX);  B = 2 (WY)
P2:  print A (RX);  print B (RY)
Sequence       Output
WX WY RX RY    12
WX RX WY RY    12
WX RX RY WY    10
RX WX RY WY    00
RX WX WY RY    02
RX RY WX WY    00
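The six sequences and their outputs can be generated mechanically. The sketch below merges the two programs in every way that keeps each processor's program order and then replays each merge. The helper names interleavings and run are illustrative, not part of the tutorial.

```python
def interleavings(first, second):
    """Yield every merge of two operation lists that keeps each list's own order."""
    if not first or not second:
        yield tuple(first) + tuple(second)
        return
    for rest in interleavings(first[1:], second):
        yield (first[0],) + rest
    for rest in interleavings(first, second[1:]):
        yield (second[0],) + rest


def run(sequence):
    """Replay one sequence: WX is A = 1, WY is B = 2, RX prints A, RY prints B."""
    memory = {"A": 0, "B": 0}
    printed = ""
    for op in sequence:
        if op == "WX":
            memory["A"] = 1
        elif op == "WY":
            memory["B"] = 2
        elif op == "RX":
            printed += str(memory["A"])
        elif op == "RY":
            printed += str(memory["B"])
    return printed


outputs = {" ".join(s): run(s) for s in interleavings(["WX", "WY"], ["RX", "RY"])}
for sequence, printed in outputs.items():
    print(sequence, printed)
```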
Reason about possible sequences: We get them all?
P1:  A = 1 (WX);  B = 2 (WY)
P2:  print A (RX);  print B (RY)
WX WY RX RY
WX RX WY RY
WX RX RY WY
RX WX RY WY
RX WX WY RY
RX RY WX WY
Similar Reasoning
Example of a Producer and Consumer
Global variables initially: Data = 0, Head = 0
P1:
  Data = 2              (WX)
  Head = 1              (WY)
P2:
  while (Head == 0);    (RY)
  ... = Data;           (RX)
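Reason about possible sequences: Expected Outcomes
P1:  Data = 2 (WX);  Head = 1 (WY)
P2:  while (Head == 0); (RY)  print Data; (RX)
WX WY RY RX
WX RY WY RX
WX RY RX WY
RY WX RX WY
RY WX WY RX
RY RX WX WY
Sequences in which RX follows WX print 2.
RY RX WX WY reads Data before it is written and prints 0. Expect this?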
Multiprocessor Hardware Optimizations: Overlapped Writes (General Interconnect)
P1:  Data = 2 (WX);  Head = 1 (WY)
P2:  while (Head == 0); (RY)  print Data (RX)
Writes issued by P1: Data = 2, Head = 1
Memory: Data == 2, Head == 1
T0  Data = 0, Head = 0
T1  GI: write Head = 1 arrives     (WY)
T2  P2 reads Head == 1             (RY)
T3  P2 reads Data == 0             (RX)
T4  GI: write Data = 2 arrives     (WX)
Sequence: WY RY RX WX
Output: 0 (Wrong Data)
Compiler Optimizations
• Constant Propagation
• Register Allocation
• Loop Transformation
• Instruction Scheduling
• Common Subexpression Elimination
• Et Cetera
More H/W Optimizations
• Speculative Execution
• Execution reordering (e.g. pipelining)
• Speculative Store
• Read to Write reordering
• Write to Read reordering
• Write to Write reordering
• Read to Read reordering
• Et Cetera
Possible Outcomes
P1:  Data = 2 (WX);  Head = 1 (WY)
P2:  while (Head == 0); (RY)  print Data; (RX)
Expected sequences (output 2):
WX WY RY RX
WX RY WY RX
WX RY RX WY
RY WX RX WY
RY WX WY RX
RY RX WX WY
Get These Too (each outputs 0):
WY RY RX WX
WY RX RY WX
WY RX WX RY
RX WX RY WY
RX WY RY WX
RX WY WX RY
What’s missing?
P1:  A = 1 (WX);  B = 2 (WY)
P2:  print A (RX);  print B (RY)
WX WY RX RY
WX RX WY RY
WX RX RY WY
RX WX RY WY
RX WX WY RY
RX RY WX WY
Simple Program: All possible sequences
P1:  A = 1 (WX);  B = 2 (WY)
P2:  print A (RX);  print B (RY)
WX WY RX RY
WX WY RY RX
WX RX WY RY
WX RX RY WY
WX RY RX WY
WX RY WY RX
WY WX RX RY
WY WX RY RX
WY RX WX RY
WY RY WX RX
WY RX RY WX
WY RY RX WX
RX WX RY WY
RX WX WY RY
RX WY RY WX
RX WY WX RY
RX RY WX WY
RX RY WY WX
RY WX WY RX
RY WX RX WY
RY WY WX RX
RY WY RX WX
RY RX WX WY
RY RX WY WX
Dekker’s Algorithm: Simplify the Operations
Example of mutual exclusion (“Dekker’s Algorithm”)
Global variables initially: Flag1 = 0, Flag2 = 0
P1:
  Flag1 = 1            (WX)
  if (Flag2 == 0)      (RY)
    critical section
P2:
  Flag2 = 1            (WY)
  if (Flag1 == 0)      (RX)
    critical section
Dekker’s Algorithm: All possible sequences
Example of a Synchronization (“Dekker’s Algorithm”)
Which of these sequences will prevent concurrent execution?
WX WY RX RY    OK
WX WY RY RX    OK
WX RX WY RY    OK
WX RX RY WY    OK
WX RY RX WY    OK
WX RY WY RX    OK
WY WX RX RY    OK
WY WX RY RX    OK
WY RX WX RY    OK
WY RY WX RX    OK
WY RX RY WX    OK
WY RY RX WX    OK
RX WX RY WY    Wrong
RX WX WY RY    OK
RX WY RY WX    OK
RX WY WX RY    OK
RX RY WX WY    Wrong
RX RY WY WX    Wrong
RY WX WY RX    OK
RY WX RX WY    OK
RY WY WX RX    OK
RY WY RX WX    Wrong
RY RX WX WY    Wrong
RY RX WY WX    Wrong
Dekker’s Algorithm: Sequences and Outcomes
P1:  Flag1 = 1 (WX);  if (Flag2 == 0) critical section (RY)
P2:  Flag2 = 1 (WY);  if (Flag1 == 0) critical section (RX)
Need to restrict certain sequences.
Works whenever WX precedes RX or WY precedes RY.
Dekker’s Algorithm: All possible sequences
P1:  Flag1 = 1 (WX);  if (Flag2 == 0) critical section (RY)
P2:  Flag2 = 1 (WY);  if (Flag1 == 0) critical section (RX)
WX WY RX RY | WX WY RY RX | WX RX WY RY | WX RX RY WY | WX RY RX WY | WX RY WY RX
WY WX RX RY | WY WX RY RX | WY RX WX RY | WY RY WX RX | WY RX RY WX | WY RY RX WX
RX WX RY WY | RX WX WY RY | RX WY RY WX | RX WY WX RY | RX RY WX WY | RX RY WY WX
RY WX WY RX | RY WX RX WY | RY WY WX RX | RY WY RX WX | RY RX WX WY | RY RX WY WX
Works whenever WX precedes RX or WY precedes RY
18 are OK
6 are Wrong
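The 18 and 6 split follows directly from the stated rule. A minimal sketch, assuming a sequence is a tuple of the four labels and that "OK" means exactly "WX precedes RX or WY precedes RY"; the names before and dekker_ok are illustrative.

```python
from itertools import permutations

OPS = ["WX", "WY", "RX", "RY"]


def before(sequence, first, second):
    """True when `first` appears earlier than `second` in the sequence."""
    return sequence.index(first) < sequence.index(second)


def dekker_ok(sequence):
    # At least one processor must observe the other's flag as already set.
    return before(sequence, "WX", "RX") or before(sequence, "WY", "RY")


all_sequences = list(permutations(OPS))
ok = [s for s in all_sequences if dekker_ok(s)]
wrong = [s for s in all_sequences if not dekker_ok(s)]

print(len(ok), "are OK")
print(len(wrong), "are Wrong")
for s in wrong:
    print(" ".join(s))
```

The wrong sequences are exactly those in which both reads come before the write they are meant to observe.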
Simple Program: All possible sequences
P1:  A = 1 (WX);  B = 2 (WY)
P2:  print A (RX);  print B (RY)
WX WY RX RY | WX WY RY RX | WX RX WY RY | WX RX RY WY | WX RY RX WY | WX RY WY RX
WY WX RX RY | WY WX RY RX | WY RX WX RY | WY RY WX RX | WY RX RY WX | WY RY RX WX
RX WX RY WY | RX WX WY RY | RX WY RY WX | RX WY WX RY | RX RY WX WY | RX RY WY WX
RY WX WY RX | RY WX RX WY | RY WY WX RX | RY WY RX WX | RY RX WX WY | RY RX WY WX
No ordering requirement
All 24 are “OK”
0 are “Wrong”
Producer Consumer: All sequences
P1:  Data = 2 (WX);  Head = 1 (WY)
P2:  while (Head == 0); (RY)  print Data; (RX)
WX WY RX RY | WX WY RY RX | WX RX WY RY | WX RX RY WY | WX RY RX WY | WX RY WY RX
WY WX RX RY | WY WX RY RX | WY RX WX RY | WY RY WX RX | WY RX RY WX | WY RY RX WX
RX WX RY WY | RX WX WY RY | RX WY RY WX | RX WY WX RY | RX RY WX WY | RX RY WY WX
RY WX WY RX | RY WX RX WY | RY WY WX RX | RY WY RX WX | RY RX WX WY | RY RX WY WX
Require WX precede RX and WY precede RY and WY precede RX
5 are OK
19 are Wrong
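The three-part requirement can be checked against all 24 orderings. This sketch assumes a sequence is a tuple of labels; the predicate name producer_consumer_ok is illustrative.

```python
from itertools import permutations

OPS = ["WX", "WY", "RX", "RY"]


def producer_consumer_ok(sequence):
    """WX before RX, WY before RY, and WY before RX must all hold."""
    pos = {op: i for i, op in enumerate(sequence)}
    return (pos["WX"] < pos["RX"]
            and pos["WY"] < pos["RY"]
            and pos["WY"] < pos["RX"])


sequences = list(permutations(OPS))
ok = [s for s in sequences if producer_consumer_ok(s)]
wrong = [s for s in sequences if not producer_consumer_ok(s)]

print(len(ok), "are OK")
print(len(wrong), "are Wrong")
for s in ok:
    print(" ".join(s))
```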
Producer Consumer: All sequences
P1:  Data = 2 (WX);  Head = 1 (WY)
P2:  while (Head == 0); (RY)  print Data; (RX)
When RY precedes WY, the while-RY loop spins. Eventually we get WY < RY.
5 are OK
19 are Wrong
Producer Consumer: All sequences
P1:  Data = 2 (WX);  Head = 1 (WY)
P2:  while (Head == 0); (RY)  print Data; (RX)
We have RY, RY, RY … Sequences with RY < WY will eventually end with RY
5 are OK
19 are Wrong?
Producer Consumer: All sequences
P1:  Data = 2 (WX);  Head = 1 (WY)
P2:  while (Head == 0); (RY)  print Data; (RX)
WX WY RX RY | WX WY RY RX | WX RX WY RY | WX RX RY WY RY | WX RY RX WY RY | WX RY WY RX RY
WY WX RX RY | WY WX RY RX | WY RX WX RY | WY RY WX RX | WY RX RY WX | WY RY RX WX
RX WX RY WY RY | RX WX WY RY | RX WY RY WX | RX WY WX RY | RX RY WX WY RY | RX RY WY WX RY
RY WX WY RX RY | RY WX RX WY RY | RY WY WX RX RY | RY WY RX WX RY | RY RX WX WY RY | RY RX WY WX RY
We have RY, RY, RY … Sequences with RY < WY will eventually end with RY
5 are OK
19 are Wrong?
Producer Consumer: All sequences
P1:  Data = 2 (WX);  Head = 1 (WY)
P2:  while (Head == 0); (RY)  print Data; (RX)
We can remove the earlier RY in those sequences.
5 are OK
19 are Wrong?
Producer Consumer: All sequences
P1:  Data = 2 (WX);  Head = 1 (WY)
P2:  while (Head == 0); (RY)  print Data; (RX)
WX WY RX RY | WX WY RY RX | WX RX WY RY | WX RX WY RY | WX RX WY RY | WX WY RX RY
WY WX RX RY | WY WX RY RX | WY RX WX RY | WY RY WX RX | WY RX RY WX | WY RY RX WX
RX WX WY RY | RX WX WY RY | RX WY RY WX | RX WY WX RY | RX WX WY RY | RX WY WX RY
WX WY RX RY | WX RX WY RY | WY WX RX RY | WY RX WX RY | RX WX WY RY | RX WY WX RY
Remove all of the duplicated sequences
5 are OK
19 are Wrong?
Producer Consumer: All sequences
P1:  Data = 2 (WX);  Head = 1 (WY)
P2:  while (Head == 0); (RY)  print Data; (RX)
WX WY RX RY | WX WY RY RX | WX RX WY RY | WY WX RX RY | WY WX RY RX | WY RX WX RY
WY RY WX RX | WY RX RY WX | WY RY RX WX | RX WX WY RY | RX WY RY WX | RX WY WX RY
Remove all of the duplicated sequences
5 are OK
7 are Wrong
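The reduction from 24 sequences to 12 can be reproduced step by step. The sketch below assumes the spin loop is modeled by moving an early RY (one that precedes WY) to the end of the sequence, which is how the preceding slides treat it. The name settle_spin is illustrative.

```python
from itertools import permutations

OPS = ["WX", "WY", "RX", "RY"]


def settle_spin(sequence):
    """If RY ran before WY, the loop spins; keep only the final, successful RY."""
    ops = list(sequence)
    if ops.index("RY") < ops.index("WY"):
        ops.remove("RY")   # drop the earlier RY that saw Head == 0
        ops.append("RY")   # the loop eventually ends with an RY after WY
    return tuple(ops)


def producer_consumer_ok(sequence):
    pos = {op: i for i, op in enumerate(sequence)}
    return (pos["WX"] < pos["RX"]
            and pos["WY"] < pos["RY"]
            and pos["WY"] < pos["RX"])


distinct = sorted({settle_spin(s) for s in permutations(OPS)})
ok = [s for s in distinct if producer_consumer_ok(s)]
wrong = [s for s in distinct if not producer_consumer_ok(s)]

print(len(distinct), "distinct sequences")
print(len(ok), "are OK")
print(len(wrong), "are Wrong")
```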
Producer Consumer: Possible sequences w/write acknowledge
P1:  Data = 2 (WX);  Head = 1 (WY)
P2:  while (Head == 0); (RY)  print Data; (RX)
WX WY RX RY | WX WY RY RX | WX RX WY RY | WY WX RX RY | WY WX RY RX | WY RX WX RY
WY RY WX RX | WY RX RY WX | WY RY RX WX | RX WX WY RY | RX WY RY WX | RX WY WX RY
Some H/W provides write acknowledgment (i.e. waits for pending writes to complete).
Remove all sequences where WY < WX.
5 are OK
7 are Wrong
Producer Consumer: Possible sequences w/write acknowledge
P1:  Data = 2 (WX);  Head = 1 (WY)
P2:  while (Head == 0); (RY)  print Data; (RX)
WX WY RX RY
WX WY RY RX
WX RX WY RY
RX WX WY RY
Remove all sequences where WY < WX.
2 are OK
2 are Wrong
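The effect of write acknowledgment can be sketched as one more filter. It assumes the starting point is the 12 sequences in which WY precedes RY, and that write acknowledgment means WX always precedes WY. All names are illustrative.

```python
from itertools import permutations

OPS = ["WX", "WY", "RX", "RY"]


def before(sequence, first, second):
    return sequence.index(first) < sequence.index(second)


def producer_consumer_ok(sequence):
    return (before(sequence, "WX", "RX")
            and before(sequence, "WY", "RY")
            and before(sequence, "WY", "RX"))


# The 12 sequences left once the spin loop has been accounted for.
candidates = [s for s in permutations(OPS) if before(s, "WY", "RY")]

# Write acknowledgment: the second write is not issued until the first completes.
acknowledged = [s for s in candidates if before(s, "WX", "WY")]

ok = [s for s in acknowledged if producer_consumer_ok(s)]
wrong = [s for s in acknowledged if not producer_consumer_ok(s)]

for s in acknowledged:
    print(" ".join(s), "OK" if producer_consumer_ok(s) else "Wrong")
```

The two wrong sequences both read Data (RX) before the loop has seen Head (WY), which write acknowledgment alone cannot prevent.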
Review. What does the H/W provide?
• Reordering of loads and stores – doesn’t help
• Write acknowledge – almost helps
• Memory Models
Memory Models
Sequential Consistency
Definition: [A multiprocessor system is sequentially consistent if] the result of any execution is the same as if the operations of all the processors were executed in some sequential order, and the operations of each individual processor appear in this sequence in the order specified by its program. [Lamport 1979]
Pros
• Simple view of program
• OK for Uniprocessor environments
Cons
• Not OK for Multiprocessor environments
• Too restrictive for processor performance
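The second half of the definition (each processor's operations appear in program order) can be expressed as a small predicate. This sketch applies it to the Simple Program, with P1 performing WX then WY and P2 performing RX then RY. The name keeps_program_order is illustrative.

```python
from itertools import permutations


def keeps_program_order(sequence, programs):
    """True when every program's operations appear in the sequence in program order."""
    for program in programs:
        positions = [sequence.index(op) for op in program]
        if positions != sorted(positions):
            return False
    return True


programs = [["WX", "WY"],   # P1: A = 1; B = 2
            ["RX", "RY"]]   # P2: print A; print B

all_orders = list(permutations(["WX", "WY", "RX", "RY"]))
sequentially_consistent = [s for s in all_orders if keeps_program_order(s, programs)]

for s in sequentially_consistent:
    print(" ".join(s))
```

Of the 24 orderings, only the six interleavings that respect both program orders remain.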
Memory Models
Relaxed Consistency
Description: Relaxed memory consistency models are already implemented on the multiprocessors available. They specify what memory operations may be expected to be reordered by the hardware.
• Write to Read
• Write to Write
• Read to Read / Write
• Read Others’ Write Early
• Read Own Write Early
They all have methods to force a particular ordering, and these are known as the Safety Net.
Available Relaxed Memory Models
Columns: Relaxation | W->R Order | W->W Order | R->RW Order | Read Others’ Write Early | Read Own Write Early | Safety Net
Model      Safety Net
IBM 370    serialization instructions
TSO        RMW
PC         RMW
PSO        RMW, STBAR
WO         synchronization
RCsc       release, acquire, nsync, RMW
RCpc       release, acquire, nsync, RMW
Alpha      MB, WMB
RMO        various MEMBARs
PowerPC    SYNC
Producer Consumer: Relaxed W->R memory model
WX WY RX RY | WX WY RY RX | WX RX WY RY | WX RX RY WY | WX RY RX WY | WX RY WY RX
WY WX RX RY | WY WX RY RX | WY RX WX RY | WY RY WX RX | WY RX RY WX | WY RY RX WX
RX WX RY WY | RX WX WY RY | RX WY RY WX | RX WY WX RY | RX RY WX WY | RX RY WY WX
RY WX WY RX | RY WX RX WY | RY WY WX RX | RY WY RX WX | RY RX WX WY | RY RX WY WX
Which of these sequences can be expected with all the memory models listed?
Producer Consumer: Possible sequences
P1:  Data = 2 (WX);  Head = 1 (WY)
P2:  while (Head == 0); (RY)  print Data; (RX)
WX WY RX RY | WX WY RY RX | WX RX WY RY | WY WX RX RY | WY WX RY RX | WY RX WX RY
WY RY WX RX | WY RX RY WX | WY RY RX WX | RX WX WY RY | RX WY RY WX | RX WY WX RY
Start with Sequential Consistency
5 are OK
7 are Wrong
Producer Consumer with sequential consistency
P1:  Data = 2 (WX);  Head = 1 (WY)
P2:  while (Head == 0); (RY)  print Data; (RX)
WX WY RY RX
Start with Sequential Consistency
1 is OK
0 are Wrong
Producer Consumer: Relaxed W->R ordering sequences
P1:  Data = 2 (WX);  Head = 1 (WY)
P2:  while (Head == 0); (RY)  print Data; (RX)
WX WY RY RX
Add sequences due to the relaxation of W->R ordering
No change
1 is OK
0 are Wrong
Producer Consumer: Relaxed W->R ordering sequences
P1:  Data = 2 (WX);  Head = 1 (WY)
P2:  while (Head == 0); (RY)  print Data; (RX)
WX WY RY RX
Most processors have relaxed W->W orderings also.
1 is OK
0 are Wrong
Producer Consumer: Relaxed W->R, and W->W ordering sequences
P1:  Data = 2 (WX);  Head = 1 (WY)
P2:  while (Head == 0); (RY)  print Data; (RX)
Started with sequential consistency, then added relaxed W->R and W->W orderings
3 are OK
1 is Wrong
WX WY RY RX
WY WX RY RX
WY RY WX RX
WY RY RX WX
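One way to see where these four sequences come from is to treat each program-order pair as an edge that a relaxation may drop. The sketch assumes P1 contributes a single W->W edge (WX before WY), P2 contributes a single R->R edge (RY before RX), and the spin loop always forces WY before RY. The table PROGRAM_ORDER and the function allowed are illustrative.

```python
from itertools import permutations

OPS = ["WX", "WY", "RX", "RY"]

# (earlier op, later op, kind of program-order edge)
PROGRAM_ORDER = [("WX", "WY", "W->W"),   # P1: Data = 2; Head = 1
                 ("RY", "RX", "R->R")]   # P2: while (Head == 0); print Data
SPIN_LOOP = ("WY", "RY")                 # the loop exits only after it sees Head = 1


def allowed(relaxed):
    """Sequences that remain possible when the given edge kinds are relaxed."""
    edges = [(a, b) for a, b, kind in PROGRAM_ORDER if kind not in relaxed]
    edges.append(SPIN_LOOP)
    return [s for s in permutations(OPS)
            if all(s.index(a) < s.index(b) for a, b in edges)]


sc = allowed(set())
w_r = allowed({"W->R"})
w_r_w_w = allowed({"W->R", "W->W"})
wrong = [s for s in w_r_w_w if s.index("RX") < s.index("WX")]

print(len(sc), len(w_r), len(w_r_w_w), len(wrong))
```

Relaxing W->R changes nothing because neither processor issues a write followed by a read. Relaxing W->W lets Head overtake Data and admits the one wrong sequence.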
Dekker’s Algorithm: Relaxed W->R memory model
WX WY RX RY | WX WY RY RX | WX RX WY RY | WX RX RY WY | WX RY RX WY | WX RY WY RX
WY WX RX RY | WY WX RY RX | WY RX WX RY | WY RY WX RX | WY RX RY WX | WY RY RX WX
RX WX RY WY | RX WX WY RY | RX WY RY WX | RX WY WX RY | RX RY WX WY | RX RY WY WX
RY WX WY RX | RY WX RX WY | RY WY WX RX | RY WY RX WX | RY RX WX WY | RY RX WY WX
Which of these sequences can be expected with all the memory models listed?
Dekker’s Algorithm: Relaxed W->R ordering sequences
P1:  Flag1 = 1 (WX);  if (Flag2 == 0) critical section (RY)
P2:  Flag2 = 1 (WY);  if (Flag1 == 0) critical section (RX)
WX WY RX RY | WX WY RY RX | WX RX WY RY | WX RX RY WY | WX RY RX WY | WX RY WY RX
WY WX RX RY | WY WX RY RX | WY RX WX RY | WY RY WX RX | WY RX RY WX | WY RY RX WX
RX WX RY WY | RX WX WY RY | RX WY RY WX | RX WY WX RY | RX RY WX WY | RX RY WY WX
RY WX WY RX | RY WX RX WY | RY WY WX RX | RY WY RX WX | RY RX WX WY | RY RX WY WX
Works whenever WX precedes RX or WY precedes RY
Start with Sequential Consistency
18 are OK
6 are Wrong
Dekker’s Algorithm: Relaxed W->R ordering sequences
P1:  Flag1 = 1 (WX);  if (Flag2 == 0) critical section (RY)
P2:  Flag2 = 1 (WY);  if (Flag1 == 0) critical section (RX)
WX WY RX RY
WX WY RY RX
WX RY WY RX
WY WX RX RY
WY WX RY RX
WY RX WX RY
Start with Sequential Consistency
6 are OK
0 are Wrong
Dekker’s Algorithm: Relaxed W->R ordering sequences
P1:  Flag1 = 1 (WX);  if (Flag2 == 0) critical section (RY)
P2:  Flag2 = 1 (WY);  if (Flag1 == 0) critical section (RX)
Add sequences due to relaxed memory model
WX WY RX RY | WX WY RY RX | WX RX WY RY | WX RX RY WY | WX RY RX WY | WX RY WY RX
WY WX RX RY | WY WX RY RX | WY RX WX RY | WY RY WX RX | WY RX RY WX | WY RY RX WX
RX WX RY WY | RX WX WY RY | RX WY RY WX | RX WY WX RY | RX RY WX WY | RX RY WY WX
RY WX WY RX | RY WX RX WY | RY WY WX RX | RY WY RX WX | RY RX WX WY | RY RX WY WX
18 are OK
6 are Wrong
Safety Nets
• Atomic instruction (RMW)
• Code delineation (serialization instructions)
• Synchronization instructions (SYNC)
• Identify Data and Synch operations (Weak Ordering model, and Release Consistency model)
• Memory Bars (aka “fences”)
Producer Consumer w/Fence
Insert a memory barrier between the instructions we want ordered.
Global variables initially: Data = 0, Head = 0
P1:
  Data = 2              (WX)
  Head = 1              (WY)
P2:
  while (Head == 0);    (RY)
  ... = Data;           (RX)
Producer Consumer w/Fence
Example of a Producer and Consumer with a Memory Barrier applied.
Global variables initially: Data = 0, Head = 0
P1:
  Data = 2              (WX)
  memory_barrier
  Head = 1              (WY)
P2:
  while (Head == 0);    (RY)
  memory_barrier
  ... = Data;           (RX)
All memory operations before the memory barrier must complete before proceeding to memory operations after the memory barrier.
Producer Consumer w/Fence: Relaxed W->R, and W->W ordering sequences
P1:  Data = 2 (WX);  Head = 1 (WY)
P2:  while (Head == 0); (RY)  print Data; (RX)
Started with sequential consistency, then added relaxed W->R and W->W orderings
3 are OK
1 is Wrong
WX WY RY RX
WY WX RY RX
WY RY WX RX
WY RY RX WX
Producer Consumer w/Fence: Relaxed W->R, and W->W ordering sequences
P1:  Data = 2 (WX);  memory_barrier;  Head = 1 (WY)
P2:  while (Head == 0); (RY)  memory_barrier;  print Data; (RX)
Add memory barriers to force WX < WY and RY < RX. At first the list looks the same:
WX WY RY RX
WY WX RY RX
WY RY WX RX
WY RY RX WX
3 are OK
1 is Wrong
Reasoning:
• Due to MB, RY < RX is enforced.
• WY < RY < MB. The while-RY loop waits for WY.
• Due to MB, WX < WY is enforced.
• WX < WY and WY < RY and RY < RX are enforced, therefore WX < RX is enforced.
• With WX < WY and RY < RX enforced with memory barriers, RX < WX is not possible.
Producer Consumer w/Fence: Relaxed W->R, and W->W ordering sequences
P1:  Data = 2 (WX);  memory_barrier;  Head = 1 (WY)
P2:  while (Head == 0); (RY)  memory_barrier;  print Data; (RX)
With WX < WY and RY < RX enforced with memory barriers, RX < WX is not possible.
WX WY RY RX
WY WX RY RX
WY RY WX RX
3 are OK
0 are Wrong
Note that enforcing WX < WY also excludes the two remaining sequences in which WY precedes WX, so only WX WY RY RX can actually occur.
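The chain argument (WX < WY, WY < RY and RY < RX together imply WX < RX) can be checked exhaustively. This is a sketch under the assumption that each memory barrier acts as a hard ordering constraint between the operations on either side of it. The names ENFORCED and satisfies are illustrative.

```python
from itertools import permutations

OPS = ["WX", "WY", "RX", "RY"]

ENFORCED = [("WX", "WY"),   # barrier on P1: Data is written before Head
            ("WY", "RY"),   # spin loop: P2 leaves the loop only after Head = 1
            ("RY", "RX")]   # barrier on P2: Head is read before Data


def satisfies(sequence, constraints):
    """True when every (earlier, later) pair holds in the sequence."""
    return all(sequence.index(a) < sequence.index(b) for a, b in constraints)


fenced = [s for s in permutations(OPS) if satisfies(s, ENFORCED)]

# In every remaining sequence the consumer reads Data after it was written.
data_is_ready = all(s.index("WX") < s.index("RX") for s in fenced)

print(fenced)
print(data_is_ready)
```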
Dekker’s Algorithm w/Fence
Example of mutual exclusion (“Dekker’s Algorithm”)
Global variables initially: Flag1 = 0, Flag2 = 0
P1:
  Flag1 = 1            (WX)
  memory_barrier
  if (Flag2 == 0)      (RY)
    critical section
P2:
  Flag2 = 1            (WY)
  memory_barrier
  if (Flag1 == 0)      (RX)
    critical section
All memory operations before the memory barrier must complete before proceeding to memory operations after the memory barrier.
Dekker’s Algorithm w/Fence: Relaxed W->R ordering sequences
P1:  Flag1 = 1 (WX);  if (Flag2 == 0) critical section (RY)
P2:  Flag2 = 1 (WY);  if (Flag1 == 0) critical section (RX)
WX WY RX RY | WX WY RY RX | WX RX WY RY | WX RX RY WY | WX RY RX WY | WX RY WY RX
WY WX RX RY | WY WX RY RX | WY RX WX RY | WY RY WX RX | WY RX RY WX | WY RY RX WX
RX WX RY WY | RX WX WY RY | RX WY RY WX | RX WY WX RY | RX RY WX WY | RX RY WY WX
RY WX WY RX | RY WX RX WY | RY WY WX RX | RY WY RX WX | RY RX WX WY | RY RX WY WX
Started with sequential consistency, then added relaxed W->R orderings
18 are OK
6 are Wrong
Dekker’s Algorithm w/Fence: Relaxed W->R ordering sequences
P1:  Flag1 = 1 (WX);  memory_barrier;  if (Flag2 == 0) critical section (RY)
P2:  Flag2 = 1 (WY);  memory_barrier;  if (Flag1 == 0) critical section (RX)
Add memory barriers to force WX < RY and WY < RX
Before the barriers take effect, all 24 sequences are possible:
18 are OK
6 are Wrong
Dekker’s Algorithm w/Fence: Relaxed W->R ordering sequences
P1:  Flag1 = 1 (WX);  memory_barrier;  if (Flag2 == 0) critical section (RY)
P2:  Flag2 = 1 (WY);  memory_barrier;  if (Flag1 == 0) critical section (RX)
Add memory barriers to force WX < RY and WY < RX
6 are OK
0 are Wrong
WX WY RX RY
WX WY RY RX
WX RY WY RX
WY WX RX RY
WY WX RY RX
WY RX WX RY
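The before-and-after counts for Dekker’s algorithm can be reproduced with two filters. This sketch assumes that relaxed W->R ordering makes all 24 permutations possible and that the barriers restore exactly WX < RY and WY < RX. The helper names are illustrative.

```python
from itertools import permutations

OPS = ["WX", "WY", "RX", "RY"]


def before(sequence, first, second):
    return sequence.index(first) < sequence.index(second)


def dekker_ok(sequence):
    """Mutual exclusion holds when at least one read sees the other flag set."""
    return before(sequence, "WX", "RX") or before(sequence, "WY", "RY")


relaxed = list(permutations(OPS))   # W->R relaxed: any order may occur
fenced = [s for s in relaxed
          if before(s, "WX", "RY") and before(s, "WY", "RX")]

wrong_without_fence = [s for s in relaxed if not dekker_ok(s)]
wrong_with_fence = [s for s in fenced if not dekker_ok(s)]

print(len(relaxed), len(wrong_without_fence))
print(len(fenced), len(wrong_with_fence))
```

With the barriers in place, each write precedes the read on the same processor, so the six remaining sequences are the sequentially consistent ones and none of them is wrong.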
Serialization of Writes (Fig 6)w/Fence
Insert a memory barrier between the instructions we want ordered.
Global variables initially: A = 0, B = 0, C= 0
P1:
  A = 1               (WX)
  B = 1               (WY)
P2:
  A = 2               (WX)
  C = 1               (WZ)
P3:
  while (B != 1);     (RY)
  while (C != 1);     (RZ)
  Register1 = A
P4:
  while (B != 1);     (RY)
  while (C != 1);     (RZ)
  Register2 = A
W1 W2
Higher Level Abstractions
• Lower level of complexity
• Explicit Parallel Constructs
  – Fortran 90
  – MPI
Conclusion
• The Uniprocessor programming model is simple, but does not work on Multiprocessors
• Hardware and compilers make many optimizations that reorder loads and stores
• Memory models exist on the hardware and need to be considered for program correctness
• The Sequential Consistency model was considered for concurrent programs on the Uniprocessor
• Relaxed Memory Consistency models are considered on the Multiprocessor because SC is too restrictive for hardware performance.
• Use memory barriers (fences) to override the relaxed memory model when ordering between memory operations must be maintained.
Other Processors
• R to R
• W to R
• W to W
• R to W