Page 1

Shared Memory Consistency Models: A Tutorial

Sarita V. Adve, Kourosh Gharachorloo

Western Research Laboratory

September 1995

Page 2

Goals

• Expand intuition about concurrent program behavior

• Explore execution sequences due to compiler or hardware optimizations

• Introduce shared memory consistency models

• Explore execution sequences due to a particular memory model

• Demonstrate Memory Barriers (“fences”)

Page 3

What happens?

Example of mutual exclusion (“Dekker’s Algorithm”)

Global variables initially: Flag1 = 0, Flag2 = 0

P1:
    Flag1 = 1
    If (Flag2 == 0)
        Critical section

P2:
    Flag2 = 1
    If (Flag1 == 0)
        Critical section

Page 4

Uniprocessor Hardware

Flag2 = 1If(Flag1 == 0)

Critical section

P2

Flag 1 == 0 Flag 2 == 0

Flag1 = 1If(Flag2 == 0)

Critical section

P1

T0Flag 1 = 0 and Flag 2 = 0

Page 5

Uniprocessor Hardware

Flag2 = 1If(Flag1 == 0)

Critical section

P2

Flag 1 == 1 Flag 2 == 0

T1Write Flag 1

Flag1 = 1If(Flag2 == 0)

Critical section

P1

T0Flag 1 = 0 and Flag 2 = 0

T1 P1 Flag1 = 1

Page 6

Uniprocessor Hardware

Flag2 = 1If(Flag1 == 0)

Critical section

P2

Flag 1 == 1 Flag 2 == 0

T1Write Flag 1

T2Read Flag 2

Flag1 = 1If(Flag2 == 0)

Critical section

P1

T0Flag 1 = 0 and Flag 2 = 0

T2 P1 Flag2 == 0

T1 P1 Flag1 = 1

Page 7

Uniprocessor Hardware

Flag2 = 1If(Flag1 == 0)

Critical section

P2

Flag 1 == 1 Flag 2 == 1

T1Write Flag 1

T2Read Flag 2

Flag1 = 1If(Flag2 == 0)

Critical section

P1

T3Write Flag 2

T0Flag 1 = 0 and Flag 2 = 0

T2 P1 Flag2 == 0

T1 P1 Flag1 = 1

T3 P2 Flag2 = 1

Page 8

Uniprocessor Hardware

P1:
    Flag1 = 1
    If (Flag2 == 0)
        Critical section

P2:
    Flag2 = 1
    If (Flag1 == 0)
        Critical section

Memory: Flag1 == 1, Flag2 == 1

T0: Flag1 = 0 and Flag2 = 0
T1: P1 writes Flag1 (Flag1 = 1)
T2: P1 reads Flag2 (Flag2 == 0)
T3: P2 writes Flag2 (Flag2 = 1)
T4: P2 reads Flag1 (Flag1 == 1)

Critical Section Protected

Page 9

Uniprocessor Hardware Optimizations: Buffer (Cache)

• Writes take about 100 cycles
• Reads take about 1 cycle
• Use Write Buffer Bypass

Page 10

Uniprocessor Hardware Optimizations: Write Buffer Bypass

Flag2 = 1If(Flag1 == 0)

Critical section

P2

Flag 1 == 0 Flag 2 == 0

Flag1 = 1If(Flag2 == 0)

Critical section

P1

T0Flag 1 = 0 and Flag 2 = 0

Page 11

Uniprocessor Hardware Optimizations: Write Buffer Bypass

Flag2 = 1If(Flag1 == 0)

Critical section

P2

Flag 1 = 1

Flag 1 == 0 Flag 2 == 0

T1Write Flag 1Flag1 = 1

If(Flag2 == 0)Critical section

P1

T0Flag 1 = 0 and Flag 2 = 0

T1 P1 Flag1 = 1

Page 12

Uniprocessor Hardware Optimizations: Write Buffer Bypass

Flag2 = 1If(Flag1 == 0)

Critical section

P2

Flag 1 = 1

Flag 1 == 0 Flag 2 == 0

T2Read Flag 2

T1Write Flag 1Flag1 = 1

If(Flag2 == 0)Critical section

P1

T0Flag 1 = 0 and Flag 2 = 0

T2 P1 Flag2 == ?

T1 P1 Flag1 = 1

Page 13

Uniprocessor Hardware Optimizations: Write Buffer Bypass

Flag2 = 1If(Flag1 == 0)

Critical section

P2

Flag 1 = 1

Flag 1 == 0 Flag 2 == 0

T2Read Flag 2

T1Write Flag 1Flag1 = 1

If(Flag2 == 0)Critical section

P1

T0Flag 1 = 0 and Flag 2 = 0

T2 P1 Flag2 == 0

T1 P1 Flag1 = 1

Page 14

Uniprocessor Hardware Optimizations: Write Buffer Bypass

Flag2 = 1If(Flag1 == 0)

Critical section

P2

Flag 1 = 1Flag 2 = 1

Flag 1 == 0 Flag 2 == 0

T2Read Flag 2

T1Write Flag 1Flag1 = 1

If(Flag2 == 0)Critical section

P1 T3Write Flag 2

T0Flag 1 = 0 and Flag 2 = 0

T2 P1 Flag2 == 0

T1 P1 Flag1 = 1

T3 P2 Flag2 = 1

Page 15

Uniprocessor Hardware Optimizations: Write Buffer Bypass

Flag2 = 1If(Flag1 == 0)

Critical section

P2

Flag 1 = 1Flag 2 = 1

Flag 1 == 0 Flag 2 == 0

T2Read Flag 2

T1Write Flag 1Flag1 = 1

If(Flag2 == 0)Critical section

P1 T3Write Flag 2

T4Read Flag 1

T0Flag 1 = 0 and Flag 2 = 0

T2 P1 Flag2 == 0

T1 P1 Flag1 = 1

T4 P2 Flag1 == ?

T3 P2 Flag2 = 1

Page 16

Uniprocessor Hardware Optimizations: Write Buffer Bypass

P1:
    Flag1 = 1
    If (Flag2 == 0)
        Critical section

P2:
    Flag2 = 1
    If (Flag1 == 0)
        Critical section

Write buffer: Flag1 = 1, Flag2 = 1
Memory: Flag1 == 0, Flag2 == 0

T0: Flag1 = 0 and Flag2 = 0
T1: P1 writes Flag1 (Flag1 = 1)
T2: P1 reads Flag2 (Flag2 == 0)
T3: P2 writes Flag2 (Flag2 = 1)
T4: P2 reads Flag1 (Flag1 == 1)

Critical Section Protected

Page 17

Multiprocessor Hardware Optimizations: Write Buffer Bypass

Flag2 = 1If(Flag1 == 0)

Critical section

P2

Flag 1 == 0 Flag 2 == 0

Flag1 = 1If(Flag2 == 0)

Critical section

P1

T0Flag 1 = 0 and Flag 2 = 0

Shared Bus

Page 18

Multiprocessor Hardware Optimizations: Write Buffer Bypass

Flag2 = 1If(Flag1 == 0)

Critical section

P2

Flag 1 = 1

Flag 1 == 0 Flag 2 == 0

T1Write Flag 1Flag1 = 1

If(Flag2 == 0)Critical section

P1

T0Flag 1 = 0 and Flag 2 = 0

T1 P1 Flag1 = 1

Shared Bus

Page 19

Multiprocessor Hardware Optimizations: Write Buffer Bypass

Flag2 = 1If(Flag1 == 0)

Critical section

P2

Flag 1 = 1

Flag 1 == 0 Flag 2 == 0

T2Read Flag 2

T1Write Flag 1Flag1 = 1

If(Flag2 == 0)Critical section

P1

T0Flag 1 = 0 and Flag 2 = 0

T2 P1 Flag2 == ?

T1 P1 Flag1 = 1

Shared Bus

Page 20

Multiprocessor Hardware Optimizations: Write Buffer Bypass

Flag2 = 1If(Flag1 == 0)

Critical section

P2

Flag 1 = 1

Flag 1 == 0 Flag 2 == 0

T2Read Flag 2

T1Write Flag 1Flag1 = 1

If(Flag2 == 0)Critical section

P1

T0Flag 1 = 0 and Flag 2 = 0

T2 P1 Flag2 == 0

T1 P1 Flag1 = 1

Shared Bus

Page 21

Multiprocessor Hardware Optimizations: Write Buffer Bypass

Flag2 = 1If(Flag1 == 0)

Critical section

P2

Flag 1 = 1

Flag 1 == 0 Flag 2 == 0

T2Read Flag 2

T1Write Flag 1Flag1 = 1

If(Flag2 == 0)Critical section

P1 T3Write Flag 2

T0Flag 1 = 0 and Flag 2 = 0

T2 P1 Flag2 == 0

T1 P1 Flag1 = 1

T3 P2 Flag2 = 1Flag 2 = 1

Shared Bus

Page 22

Multiprocessor Hardware Optimizations: Write Buffer Bypass

Flag2 = 1If(Flag1 == 0)

Critical section

P2

Flag 1 = 1

Flag 1 == 0 Flag 2 == 0

T2Read Flag 2

T1Write Flag 1Flag1 = 1

If(Flag2 == 0)Critical section

P1 T3Write Flag 2

T0Flag 1 = 0 and Flag 2 = 0

T2 P1 Flag2 == 0

T1 P1 Flag1 = 1

T3 P2 Flag2 = 1Flag 2 = 1

Shared Bus

T4Read Flag 1

T4 P2 Flag1 == ?

Page 23

Multiprocessor Hardware Optimizations: Write Buffer Bypass

P1:
    Flag1 = 1
    If (Flag2 == 0)
        Critical section

P2:
    Flag2 = 1
    If (Flag1 == 0)
        Critical section

P1 write buffer: Flag1 = 1
P2 write buffer: Flag2 = 1
Shared Bus
Memory: Flag1 == 0, Flag2 == 0

T0: Flag1 = 0 and Flag2 = 0
T1: P1 writes Flag1 (Flag1 = 1)
T2: P1 reads Flag2 (Flag2 == 0)
T3: P2 writes Flag2 (Flag2 = 1)
T4: P2 reads Flag1 (Flag1 == 0)

Both in Critical Section
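
The T0 to T4 traces above can be replayed with a small simulation. This is only a sketch: the `Processor` class, its `write_buffer` dictionary, and `run_dekker` are hypothetical names invented for illustration, and the model assumes a read consults the issuing processor's own buffer before shared memory and never waits for buffered writes to drain.

```python
class Processor:
    """Toy processor with a private write buffer (hypothetical model)."""

    def __init__(self, memory):
        self.memory = memory          # shared main memory
        self.write_buffer = {}        # writes issued but not yet in memory

    def write(self, var, value):
        # The write is buffered; other processors cannot see it yet.
        self.write_buffer[var] = value

    def read(self, var):
        # Bypass: the read does not wait for buffered writes to drain.
        # It sees this processor's own pending write, else main memory.
        if var in self.write_buffer:
            return self.write_buffer[var]
        return self.memory[var]

    def drain(self):
        # Pending writes finally reach shared memory.
        self.memory.update(self.write_buffer)
        self.write_buffer.clear()


def run_dekker(p1, p2):
    """Replay T1..T4 from the slides; report who enters the critical section."""
    p1.write("Flag1", 1)                  # T1
    p1_enters = p1.read("Flag2") == 0     # T2
    p2.write("Flag2", 1)                  # T3
    p2_enters = p2.read("Flag1") == 0     # T4
    p1.drain()
    p2.drain()
    return p1_enters, p2_enters


# Uniprocessor: both threads run on one processor and share its buffer.
cpu = Processor({"Flag1": 0, "Flag2": 0})
uniprocessor = run_dekker(cpu, cpu)

# Multiprocessor: each processor has its own buffer in front of shared memory.
shared = {"Flag1": 0, "Flag2": 0}
multiprocessor = run_dekker(Processor(shared), Processor(shared))

print("uniprocessor:", uniprocessor)
print("multiprocessor:", multiprocessor)
```

With one shared buffer the second read finds the pending Flag1 write, so only one side enters. With private buffers neither read can see the other processor's pending write, so both enter.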

Page 24

Producer Consumer

Example of a Producer and Consumer

Global variables initially: Data = 0, Head = 0

P1:
    Data = 2
    Head = 1

P2:
    while (Head == 0);
    print Data;

Page 25

General Interconnect

Multiprocessor Hardware Optimizations: Overlapped Writes

while(Head == 0);print Data

P2

Data == 0 Head == 0

Data = 2Head = 1

P1

T0Data = 0, Head = 0

P1 Head = 1

P1 Data = 2

Page 26

General Interconnect

Multiprocessor Hardware Optimizations: Overlapped Writes

while(Head == 0);print Data

P2

Data == 0 Head == 1

T1Write Head = 1

Data = 2Head = 1

P1

T0Data = 0, Head = 0

T1 GI Head = 1

P1 Head = 1

P1 Data = 2

Page 27

General Interconnect

Multiprocessor Hardware Optimizations: Overlapped Writes

while(Head == 0);print Data

P2

Data == 0 Head == 1

T1Write Head = 1

Data = 2Head = 1

P1

T0Data = 0, Head = 0

T2 P2 Head == 1

T1 GI Head = 1

T2Read Head = 1

P1 Head = 1

P1 Data = 2

Page 28

General Interconnect

Multiprocessor Hardware Optimizations: Overlapped Writes

while(Head == 0);print Data

P2

Data == 0 Head == 1

T1Write Head = 1

Data = 2Head = 1

P1

T3Read Data = 0

T0Data = 0, Head = 0

T2 P2 Head == 1

T1 GI Head = 1

T3 P2 Data == 0T2Read Head = 1

P1 Head = 1

P1 Data = 2

Page 29

General Interconnect

Multiprocessor Hardware Optimizations: Overlapped Writes

P1:
    Data = 2
    Head = 1

P2:
    while (Head == 0);
    print Data

P1 issued: Data = 2, Head = 1
Memory: Data == 2, Head == 1

T0: Data = 0, Head = 0
T1: GI writes Head = 1
T2: P2 reads Head == 1
T3: P2 reads Data == 0
T4: GI writes Data = 2

Wrong Data
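
The overlapped-write trace can be reproduced with a few lines of simulation. This is a sketch under stated assumptions: `run_consumer` and its `delivery_order` parameter are hypothetical names, and the consumer is assumed to poll Head after every delivered write and to read Data as soon as Head is nonzero.

```python
def run_consumer(delivery_order):
    """Deliver P1's writes to memory in the given order; return what P2 prints."""
    memory = {"Data": 0, "Head": 0}
    printed = None
    for var, value in delivery_order:
        memory[var] = value                  # the interconnect completes one write
        if printed is None and memory["Head"] != 0:
            printed = memory["Data"]         # the loop exits and P2 reads Data
    return printed


# P1 always issues Data = 2 before Head = 1; only the arrival order differs.
in_order = run_consumer([("Data", 2), ("Head", 1)])
overlapped = run_consumer([("Head", 1), ("Data", 2)])

print("in order:  ", in_order)
print("overlapped:", overlapped)
```

When Head arrives first, the consumer leaves its loop at T2 and reads the stale Data value at T3, which is the "Wrong Data" outcome in the trace.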

Page 30

What was expected?

Example of a Producer and Consumer

Global variables initially: Data = 0, Head = 0

P1:
    Data = 2
    Head = 1

P2:
    while (Head == 0);
    print Data;

Page 31

Simplify Example and the Operations

Simple Program

Global variables initially: A = 0, B = 0

P1:
    A = 1      (WX)
    B = 2      (WY)

P2:
    print A    (RX)
    print B    (RY)

Page 32

Reason about possible sequences: Expected Output

P1:
    A = 1      (WX)
    B = 2      (WY)

P2:
    print A    (RX)
    print B    (RY)

Sequence       Output
WX WY RX RY    12
WX RX WY RY    12
WX RX RY WY    10
RX WX RY WY    00
RX WX WY RY    02
RX RY WX WY    00
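
The six sequences and their outputs can be derived mechanically. The sketch below enumerates every ordering of the four operations, keeps those that preserve each processor's program order, and replays them against a two-variable memory. The helper names (`preserves_program_order`, `printed_output`) are illustrative only.

```python
from itertools import permutations

P1 = ["WX", "WY"]   # A = 1, then B = 2
P2 = ["RX", "RY"]   # print A, then print B


def preserves_program_order(seq):
    """Keep only interleavings where each processor's operations stay in order."""
    return (seq.index("WX") < seq.index("WY")
            and seq.index("RX") < seq.index("RY"))


def printed_output(seq):
    """Replay one sequence against memory and collect what P2 prints."""
    memory = {"A": 0, "B": 0}
    out = []
    for op in seq:
        if op == "WX":
            memory["A"] = 1
        elif op == "WY":
            memory["B"] = 2
        elif op == "RX":
            out.append(memory["A"])
        elif op == "RY":
            out.append(memory["B"])
    return "".join(str(v) for v in out)


interleavings = [s for s in permutations(P1 + P2) if preserves_program_order(s)]
results = {" ".join(s): printed_output(s) for s in interleavings}

for seq, out in results.items():
    print(seq, "->", out)
```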

Page 33

Reason about possible sequences. We get them all?

P1:
    A = 1      (WX)
    B = 2      (WY)

P2:
    print A    (RX)
    print B    (RY)

WX WY RX RY

WX RX WY RY

WX RX RY WY

RX WX RY WY

RX WX WY RY

RX RY WX WY

Page 34

Similar Reasoning

Example of a Producer and Consumer

Global variables initially: Data = 0, Head = 0

P1:
    Data = 2              (WX)
    Head = 1              (WY)

P2:
    while (Head == 0);    (RY)
    ... = Data;           (RX)

Page 35

Reason about possible sequences. Expected Outcomes

P1:
    Data = 2              (WX)
    Head = 1              (WY)

P2:
    while (Head == 0);    (RY)
    print Data;           (RX)

WX WY RY RX
WX RY WY RX
WX RY RX WY
RY WX RX WY
RY WX WY RX
RY RX WX WY

2

Page 36

Reason about possible sequences. Expected Outcomes

P1:
    Data = 2              (WX)
    Head = 1              (WY)

P2:
    while (Head == 0);    (RY)
    print Data;           (RX)

WX WY RY RX
WX RY WY RX
WX RY RX WY
RY WX RX WY
RY WX WY RX
RY RX WX WY

2

0    Expect This?

Page 37

General Interconnect

Multiprocessor Hardware Optimizations: Overlapped Writes

while(Head == 0);print Data

P2

Data == 0 Head == 0

Data = 2Head = 1

P1

T0Data = 0, Head = 0

P1 Head = 1

P1 Data = 2

Page 38

General Interconnect

Multiprocessor Hardware Optimizations: Overlapped Writes

while(Head == 0);print Data

P2

Data == 0 Head == 1

T1Write Head = 1

Data = 2Head = 1

P1

T0Data = 0, Head = 0

T1 GI Head = 1

P1 Head = 1

P1 Data = 2

WY

Page 39

General Interconnect

Multiprocessor Hardware Optimizations: Overlapped Writes

while(Head == 0);print Data

P2

Data == 0 Head == 1

T1Write Head = 1

Data = 2Head = 1

P1

T0Data = 0, Head = 0

T2 P2 Head == 1

T1 GI Head = 1

T2Read Head = 1

P1 Head = 1

P1 Data = 2

WY RY

Page 40

General Interconnect

Multiprocessor Hardware Optimizations: Overlapped Writes

while(Head == 0);print Data

P2

Data == 0 Head == 1

T1Write Head = 1

Data = 2Head = 1

P1

T3Read Data = 0

T0Data = 0, Head = 0

T2 P2 Head == 1

T1 GI Head = 1

T3 P2 Data == 0T2Read Head = 1

P1 Head = 1

P1 Data = 2

WY RY RX

Page 41

General Interconnect

Multiprocessor Hardware Optimizations: Overlapped Writes

P1:
    Data = 2
    Head = 1

P2:
    while (Head == 0);
    print Data

P1 issued: Data = 2, Head = 1
Memory: Data == 2, Head == 1

T0: Data = 0, Head = 0
T1: GI writes Head = 1
T2: P2 reads Head == 1
T3: P2 reads Data == 0
T4: GI writes Data = 2

Wrong Data

Sequence: WY RY RX WX
Output: 0

Page 42

Compiler Optimizations

• Constant Propagation
• Register Allocation
• Loop Transformation
• Instruction Scheduling
• Common Subexpression Elimination
• Et Cetera

Page 43

More H/W Optimizations

• Speculative Execution
• Execution reordering (e.g. pipelining)
• Speculative Store
• Read to Write reordering
• Write to Read reordering
• Write to Write reordering
• Read to Read reordering
• Et Cetera

Page 44

Possible Outcomes

P1:
    Data = 2              (WX)
    Head = 1              (WY)

P2:
    while (Head == 0);    (RY)
    print Data;           (RX)

Get These Too (each prints 0):
WY RY RX WX
WY RX RY WX
WY RX WX RY
RX WX RY WY
RX WY RY WX
RX WY WX RY

Expected (prints 2):
WX WY RY RX
WX RY WY RX
WX RY RX WY
RY WX RX WY
RY WX WY RX
RY RX WX WY

Page 45

What’s missing?

P1:
    A = 1      (WX)
    B = 2      (WY)

P2:
    print A    (RX)
    print B    (RY)

WX WY RX RY

WX RX WY RY

WX RX RY WY

RX WX RY WY

RX WX WY RY

RX RY WX WY

Page 46

Simple Program: All possible sequences

P1:
    A = 1      (WX)
    B = 2      (WY)

P2:
    print A    (RX)
    print B    (RY)

WX WY RX RY

WX WY RY RX

WX RX WY RY

WX RX RY WY

WX RY RX WY

WX RY WY RX

WY WX RX RY

WY WX RY RX

WY RX WX RY

WY RY WX RX

WY RX RY WX

WY RY RX WX

RX WX RY WY

RX WX WY RY

RX WY RY WX

RX WY WX RY

RX RY WX WY

RX RY WY WX

RY WX WY RX

RY WX RX WY

RY WY WX RX

RY WY RX WX

RY RX WX WY

RY RX WY WX

Page 47

Dekker’s Algorithm: Simplify the Operations

Example of mutual exclusion (“Dekker’s Algorithm”)

Global variables initially: Flag1 = 0, Flag2 = 0

P1:
    Flag1 = 1            (WX)
    If (Flag2 == 0)      (RY)
        Critical section

P2:
    Flag2 = 1            (WY)
    If (Flag1 == 0)      (RX)
        Critical section

Page 48

Dekker’s Algorithm: All possible sequences

WX WY RX RY

WX WY RY RX

WX RX WY RY

WX RX RY WY

WX RY RX WY

WX RY WY RX

WY WX RX RY

WY WX RY RX

WY RX WX RY

WY RY WX RX

WY RX RY WX

WY RY RX WX

RX WX RY WY

RX WX WY RY

RX WY RY WX

RX WY WX RY

RX RY WX WY

RX RY WY WX

RY WX WY RX

RY WX RX WY

RY WY WX RX

RY WY RX WX

RY RX WX WY

RY RX WY WX

Example of a Synchronization (“Dekker’s Algorithm”)

Which of these sequences will prevent concurrent execution?

Page 49

Dekker’s Algorithm: Sequences and Outcomes

P1:
    Flag1 = 1            (WX)
    If (Flag2 == 0)      (RY)
        Critical section

P2:
    Flag2 = 1            (WY)
    If (Flag1 == 0)      (RX)
        Critical section

Sequence       Outcome
WX WY RX RY    OK
WX WY RY RX    OK
WX RX WY RY    OK
WX RX RY WY    OK
WX RY RX WY    OK
WX RY WY RX    OK
WY WX RX RY    OK
WY WX RY RX    OK
WY RX WX RY    OK
WY RY WX RX    OK
WY RX RY WX    OK
WY RY RX WX    OK
RX WX RY WY    Wrong
RX WX WY RY    OK
RX WY RY WX    OK
RX WY WX RY    OK
RX RY WX WY    Wrong
RX RY WY WX    Wrong
RY WX WY RX    OK
RY WX RX WY    OK
RY WY WX RX    OK
RY WY RX WX    Wrong
RY RX WX WY    Wrong
RY RX WY WX    Wrong

Page 50

Dekker’s Algorithm: Sequences and Outcomes

P1:
    Flag1 = 1            (WX)
    If (Flag2 == 0)      (RY)
        Critical section

P2:
    Flag2 = 1            (WY)
    If (Flag1 == 0)      (RX)
        Critical section

OK OK OK OK OK OK
OK OK OK OK OK OK
Wrong OK OK OK Wrong Wrong
OK OK OK Wrong Wrong Wrong

Need to restrict certain sequences

Page 51

OK OK OK OK OK OK

OK OK OK OK OK OK

Wrong OK OK OK Wrong Wrong

OK OK OK Wrong Wrong Wrong

WX WY RX RY

WX WY RY RX

WX RX WY RY

WX RX RY WY

WX RY RX WY

WX RY WY RX

WY WX RX RY

WY WX RY RX

WY RX WX RY

WY RY WX RX

WY RX RY WX

WY RY RX WX

RX WX RY WY

RX WX WY RY

RX WY RY WX

RX WY WX RY

RX RY WX WY

RX RY WY WX

RY WX WY RX

RY WX RX WY

RY WY WX RX

RY WY RX WX

RY RX WX WY

RY RX WY WX

P1:
    Flag1 = 1            (WX)
    If (Flag2 == 0)      (RY)
        Critical section

P2:
    Flag2 = 1            (WY)
    If (Flag1 == 0)      (RX)
        Critical section

Works whenever WX precedes RX or WY precedes RY

Dekker’s Algorithm: Sequences and Outcomes

Page 52

Dekker’s Algorithm: All possible sequences

WX WY RX RY

WX WY RY RX

WX RX WY RY

WX RX RY WY

WX RY RX WY

WX RY WY RX

WY WX RX RY

WY WX RY RX

WY RX WX RY

WY RY WX RX

WY RX RY WX

WY RY RX WX

RX WX RY WY

RX WX WY RY

RX WY RY WX

RX WY WX RY

RX RY WX WY

RX RY WY WX

RY WX WY RX

RY WX RX WY

RY WY WX RX

RY WY RX WX

RY RX WX WY

RY RX WY WX

P1:
    Flag1 = 1            (WX)
    If (Flag2 == 0)      (RY)
        Critical section

P2:
    Flag2 = 1            (WY)
    If (Flag1 == 0)      (RX)
        Critical section

Works whenever WX precedes RX or WY precedes RY

18 are OK

6 are Wrong
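
The 18/6 split can be checked by brute force. The sketch below applies the rule stated on the slide (the algorithm works whenever WX precedes RX or WY precedes RY) to all 24 orderings. The function name `both_enter` is illustrative only.

```python
from itertools import permutations

OPS = ("WX", "RY", "WY", "RX")


def both_enter(seq):
    """Mutual exclusion fails only when each read precedes the other side's write."""
    pos = {op: i for i, op in enumerate(seq)}
    p1_enters = pos["RY"] < pos["WY"]   # P1 still sees Flag2 == 0
    p2_enters = pos["RX"] < pos["WX"]   # P2 still sees Flag1 == 0
    return p1_enters and p2_enters


all_sequences = list(permutations(OPS))
wrong = [s for s in all_sequences if both_enter(s)]
ok = [s for s in all_sequences if not both_enter(s)]

print(len(ok), "are OK")
print(len(wrong), "are Wrong")
for s in wrong:
    print("  ", " ".join(s))
```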

Page 53

Simple Program: All possible sequences

P1:
    A = 1      (WX)
    B = 2      (WY)

P2:
    print A    (RX)
    print B    (RY)

WX WY RX RY

WX WY RY RX

WX RX WY RY

WX RX RY WY

WX RY RX WY

WX RY WY RX

WY WX RX RY

WY WX RY RX

WY RX WX RY

WY RY WX RX

WY RX RY WX

WY RY RX WX

RX WX RY WY

RX WX WY RY

RX WY RY WX

RX WY WX RY

RX RY WX WY

RX RY WY WX

RY WX WY RX

RY WX RX WY

RY WY WX RX

RY WY RX WX

RY RX WX WY

RY RX WY WX

No ordering requirement

Page 54

Simple Program: All possible sequences

P1:
    A = 1      (WX)
    B = 2      (WY)

P2:
    print A    (RX)
    print B    (RY)

WX WY RX RY

WX WY RY RX

WX RX WY RY

WX RX RY WY

WX RY RX WY

WX RY WY RX

WY WX RX RY

WY WX RY RX

WY RX WX RY

WY RY WX RX

WY RX RY WX

WY RY RX WX

RX WX RY WY

RX WX WY RY

RX WY RY WX

RX WY WX RY

RX RY WX WY

RX RY WY WX

RY WX WY RX

RY WX RX WY

RY WY WX RX

RY WY RX WX

RY RX WX WY

RY RX WY WX

No ordering requirement

All 24 are “OK”

0 are “Wrong”

Page 55

Producer Consumer: All sequences

P1:
    Data = 2              (WX)
    Head = 1              (WY)

P2:
    while (Head == 0);    (RY)
    print Data;           (RX)

WX WY RX RY

WX WY RY RX

WX RX WY RY

WX RX RY WY

WX RY RX WY

WX RY WY RX

WY WX RX RY

WY WX RY RX

WY RX WX RY

WY RY WX RX

WY RX RY WX

WY RY RX WX

RX WX RY WY

RX WX WY RY

RX WY RY WX

RX WY WX RY

RX RY WX WY

RX RY WY WX

RY WX WY RX

RY WX RX WY

RY WY WX RX

RY WY RX WX

RY RX WX WY

RY RX WY WX

Require WX precede RX and WY precede RY and WY precede RX

Page 56

Producer Consumer: All sequences

P1:
    Data = 2              (WX)
    Head = 1              (WY)

P2:
    while (Head == 0);    (RY)
    print Data;           (RX)

WX WY RX RY

WX WY RY RX

WX RX WY RY

WX RX RY WY

WX RY RX WY

WX RY WY RX

WY WX RX RY

WY WX RY RX

WY RX WX RY

WY RY WX RX

WY RX RY WX

WY RY RX WX

RX WX RY WY

RX WX WY RY

RX WY RY WX

RX WY WX RY

RX RY WX WY

RX RY WY WX

RY WX WY RX

RY WX RX WY

RY WY WX RX

RY WY RX WX

RY RX WX WY

RY RX WY WX

Require WX precede RX and WY precede RY and WY precede RX

5 are OK

19 are Wrong
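
The 5/19 count follows directly from the three ordering requirements. As a rough check, the sketch below tests every ordering of the four operations against them. `meets_requirements` is a hypothetical helper name.

```python
from itertools import permutations

OPS = ("WX", "WY", "RY", "RX")


def meets_requirements(seq):
    """WX precedes RX, WY precedes RY, and WY precedes RX."""
    pos = {op: i for i, op in enumerate(seq)}
    return (pos["WX"] < pos["RX"]
            and pos["WY"] < pos["RY"]
            and pos["WY"] < pos["RX"])


sequences = list(permutations(OPS))
ok = [s for s in sequences if meets_requirements(s)]
wrong = [s for s in sequences if not meets_requirements(s)]

print(len(ok), "are OK")
print(len(wrong), "are Wrong")
for s in ok:
    print("  ", " ".join(s))
```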

Page 57

Producer Consumer: All sequences

P1:
    Data = 2              (WX)
    Head = 1              (WY)

P2:
    while (Head == 0);    (RY)
    print Data;           (RX)

WX WY RX RY

WX WY RY RX

WX RX WY RY

WX RX RY WY

WX RY RX WY

WX RY WY RX

WY WX RX RY

WY WX RY RX

WY RX WX RY

WY RY WX RX

WY RX RY WX

WY RY RX WX

RX WX RY WY

RX WX WY RY

RX WY RY WX

RX WY WX RY

RX RY WX WY

RX RY WY WX

RY WX WY RX

RY WX RX WY

RY WY WX RX

RY WY RX WX

RY RX WX WY

RY RX WY WX

When RY precedes WY, while-RY-loop spins. Eventually we get WY < RY.

5 are OK

19 are Wrong

Page 58

Producer Consumer: All sequences

P1:
    Data = 2              (WX)
    Head = 1              (WY)

P2:
    while (Head == 0);    (RY)
    print Data;           (RX)

WX WY RX RY

WX WY RY RX

WX RX WY RY

WX RX RY WY

WX RY RX WY

WX RY WY RX

WY WX RX RY

WY WX RY RX

WY RX WX RY

WY RY WX RX

WY RX RY WX

WY RY RX WX

RX WX RY WY

RX WX WY RY

RX WY RY WX

RX WY WX RY

RX RY WX WY

RX RY WY WX

RY WX WY RX

RY WX RX WY

RY WY WX RX

RY WY RX WX

RY RX WX WY

RY RX WY WX

We have RY, RY, RY … Sequences with RY < WY will eventually end with RY

5 are OK

19 are Wrong?

Page 59

Producer Consumer: All sequences

P1:
    Data = 2              (WX)
    Head = 1              (WY)

P2:
    while (Head == 0);    (RY)
    print Data;           (RX)

WX WY RX RY

WX WY RY RX

WX RX WY RY

WX RX RY WY

WX RY RX WY

WX RY WY RX

WY WX RX RY

WY WX RY RX

WY RX WX RY

WY RY WX RX

WY RX RY WX

WY RY RX WX

RX WX RY WY

RX WX WY RY

RX WY RY WX

RX WY WX RY

RX RY WX WY

RX RY WY WX

RY WX WY RX

RY WX RX WY

RY WY WX RX

RY WY RX WX

RY RX WX WY

RY RX WY WX

We have RY, RY, RY … Sequences with RY < WY will eventually end with RY

5 are OK

19 are Wrong?

Page 60

Producer Consumer: All sequences

P1:
    Data = 2              (WX)
    Head = 1              (WY)

P2:
    while (Head == 0);    (RY)
    print Data;           (RX)

WX WY RX RY
WX WY RY RX
WX RX WY RY
WX RX RY WY RY
WX RY RX WY RY
WX RY WY RX RY
WY WX RX RY
WY WX RY RX
WY RX WX RY
WY RY WX RX
WY RX RY WX
WY RY RX WX
RX WX RY WY RY
RX WX WY RY
RX WY RY WX
RX WY WX RY
RX RY WX WY RY
RX RY WY WX RY
RY WX WY RX RY
RY WX RX WY RY
RY WY WX RX RY
RY WY RX WX RY
RY RX WX WY RY
RY RX WY WX RY

We have RY, RY, RY … Sequences with RY < WY will eventually end with RY

5 are OK

19 are Wrong?

Page 61

Producer Consumer: All sequences

P1:
    Data = 2              (WX)
    Head = 1              (WY)

P2:
    while (Head == 0);    (RY)
    print Data;           (RX)

WX WY RX RY
WX WY RY RX
WX RX WY RY
WX RX RY WY RY
WX RY RX WY RY
WX RY WY RX RY
WY WX RX RY
WY WX RY RX
WY RX WX RY
WY RY WX RX
WY RX RY WX
WY RY RX WX
RX WX RY WY RY
RX WX WY RY
RX WY RY WX
RX WY WX RY
RX RY WX WY RY
RX RY WY WX RY
RY WX WY RX RY
RY WX RX WY RY
RY WY WX RX RY
RY WY RX WX RY
RY RX WX WY RY
RY RX WY WX RY

We can remove the earlier RY in those sequences.

5 are OK

19 are Wrong?

Page 62

Producer Consumer: All sequences

P1:
    Data = 2              (WX)
    Head = 1              (WY)

P2:
    while (Head == 0);    (RY)
    print Data;           (RX)

WX WY RX RY

WX WY RY RX

WX RX WY RY

WX RX WY RY

WX RX WY RY

WX WY RX RY

WY WX RX RY

WY WX RY RX

WY RX WX RY

WY RY WX RX

WY RX RY WX

WY RY RX WX

RX WX WY RY

RX WX WY RY

RX WY RY WX

RX WY WX RY

RX WX WY RY

RX WY WX RY

WX WY RX RY

WX RX WY RY

WY WX RX RY

WY RX WX RY

RX WX WY RY

RX WY WX RY

Remove all of the duplicated sequences

5 are OK

19 are Wrong?

Page 63

Producer Consumer: All sequences

P1:
    Data = 2              (WX)
    Head = 1              (WY)

P2:
    while (Head == 0);    (RY)
    print Data;           (RX)

WX WY RX RY

WX WY RY RX

WX RX WY RY

WY WX RX RY

WY WX RY RX

WY RX WX RY

WY RY WX RX

WY RX RY WX

WY RY RX WX

RX WX WY RY

RX WY RY WX

RX WY WX RY

Remove all of the duplicated sequences

5 are OK

7 are Wrong
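
The reduction from 24 sequences to 12 can be reproduced step by step. The sketch below follows the slides' procedure: when RY precedes WY the loop spins, so a final RY is appended and the earlier one is dropped, then duplicates are removed. The names `fold_spin` and `is_ok` are illustrative only.

```python
from itertools import permutations

OPS = ("WX", "WY", "RY", "RX")


def fold_spin(seq):
    """If the loop read happens before Head is written, only a later RY counts."""
    if seq.index("RY") < seq.index("WY"):
        return tuple(op for op in seq if op != "RY") + ("RY",)
    return seq


def is_ok(seq):
    """Data must be written, and the loop must be released, before Data is read."""
    return (seq.index("WX") < seq.index("RX")
            and seq.index("WY") < seq.index("RX"))


distinct = sorted({fold_spin(s) for s in permutations(OPS)})
ok = [s for s in distinct if is_ok(s)]
wrong = [s for s in distinct if not is_ok(s)]

print(len(distinct), "distinct sequences")
print(len(ok), "are OK")
print(len(wrong), "are Wrong")
```

After folding, every remaining sequence already has WY before RY, so only the two conditions on RX still separate OK from Wrong.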

Page 64

Producer Consumer: Possible sequences w/ write acknowledge

P1:
    Data = 2              (WX)
    Head = 1              (WY)

P2:
    while (Head == 0);    (RY)
    print Data;           (RX)

WX WY RX RY

WX WY RY RX

WX RX WY RY

WY WX RX RY

WY WX RY RX

WY RX WX RY

WY RY WX RX

WY RX RY WX

WY RY RX WX

RX WX WY RY

RX WY RY WX

RX WY WX RY

Some H/W provides write acknowledgment (i.e. wait for pending writes to complete)

5 are OK

7 are Wrong

Page 65

Producer Consumer: Possible sequences w/ write acknowledge

P1:
    Data = 2              (WX)
    Head = 1              (WY)

P2:
    while (Head == 0);    (RY)
    print Data;           (RX)

WX WY RX RY

WX WY RY RX

WX RX WY RY

WY WX RX RY

WY WX RY RX

WY RX WX RY

WY RY WX RX

WY RX RY WX

WY RY RX WX

RX WX WY RY

RX WY RY WX

RX WY WX RY

Remove all sequences where WY < WX.

5 are OK

7 are Wrong

Page 66

Producer Consumer: Possible sequences w/ write acknowledge

P1:
    Data = 2              (WX)
    Head = 1              (WY)

P2:
    while (Head == 0);    (RY)
    print Data;           (RX)

WX WY RX RY

WX WY RY RX

WX RX WY RY

RX WX WY RY

Remove all sequences where WY < WX.

2 are OK

2 are Wrong
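
The effect of write acknowledgment can be shown the same way. This sketch starts from the 12 sequences in which the loop has been released (WY before RY), then assumes the acknowledgment forces WX to complete before WY. The variable names are illustrative.

```python
from itertools import permutations

OPS = ("WX", "WY", "RY", "RX")

# The 12 distinct sequences left after folding the spin loop: WY precedes RY.
candidates = [s for s in permutations(OPS) if s.index("WY") < s.index("RY")]

# Write acknowledgment: Data's write completes before Head's write is issued.
acknowledged = [s for s in candidates if s.index("WX") < s.index("WY")]

ok = [s for s in acknowledged
      if s.index("WX") < s.index("RX") and s.index("WY") < s.index("RX")]
wrong = [s for s in acknowledged if s not in ok]

for s in acknowledged:
    print(" ".join(s), "->", "OK" if s in ok else "Wrong")
```

The two sequences that remain wrong both read Data before the loop is released, which ordering the writes alone cannot prevent.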

Page 67

Review. What does the H/W provide?

• Reordering of loads and stores – doesn’t help
• Write acknowledge – almost helps
• Memory Models

Page 68

Memory Models

Sequential Consistency

Definition: [A multiprocessor system is sequentially consistent if] the result of any execution is the same as if the operations of all the processors were executed in some sequential order, and the operations of each individual processor appear in this sequence in the order specified by its program. [Lamport 1979]

Pros

• Simple view of program
• OK for Uniprocessor environments

Cons

• Not OK for Multiprocessor environments
• Too restrictive for processor performance
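
One way to read the definition is as a filter on total orders: a sequence is admissible only if each processor's operations appear in program order. The sketch below applies that filter to the producer/consumer labels used in these slides. It checks program order only and is not a full consistency checker. `respects_program_order` is a hypothetical helper.

```python
from itertools import permutations


def respects_program_order(sequence, programs):
    """True if each processor's operations appear in the order its program gives."""
    for program in programs:
        seen = [op for op in sequence if op in program]
        if seen != list(program):
            return False
    return True


# Producer: Data = 2 (WX), Head = 1 (WY). Consumer: loop read (RY), print Data (RX).
programs = [("WX", "WY"), ("RY", "RX")]
all_ops = [op for program in programs for op in program]

sc_orders = [s for s in permutations(all_ops)
             if respects_program_order(s, programs)]

# The loop exits only once Head has been written, so WY must precede the final RY.
loop_exits = [s for s in sc_orders if s.index("WY") < s.index("RY")]

print(len(sc_orders), "orders respect program order")
print("after the loop exits:", [" ".join(s) for s in loop_exits])
```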

Page 69

Memory Models

Relaxed Consistency

Description: Relaxed memory consistency models are already implemented on available multiprocessors. They specify which memory operations the hardware may reorder.

• Write to Read
• Write to Write
• Read to Read / Write
• Read Others’ Write Early
• Read Own Write Early

They all provide methods to force a particular ordering; these are known as the Safety Net.

Page 70

Available Relaxed Memory Models

Relaxation: W->R Order, W->W Order, R->RW Order, Read Others’ Write Early, Read Own Write Early

Model      Safety Net
IBM 370    serialization instructions
TSO        RMW
PC         RMW
PSO        RMW, STBAR
WO         synchronization
RCsc       release, acquire, nsync, RMW
RCpc       release, acquire, nsync, RMW
Alpha      MB, WMB
RMO        various MEMBARs
PowerPC    SYNC

Page 71

Producer Consumer: Relaxed W->R memory model

WX WY RX RY

WX WY RY RX

WX RX WY RY

WX RX RY WY

WX RY RX WY

WX RY WY RX

WY WX RX RY

WY WX RY RX

WY RX WX RY

WY RY WX RX

WY RX RY WX

WY RY RX WX

RX WX RY WY

RX WX WY RY

RX WY RY WX

RX WY WX RY

RX RY WX WY

RX RY WY WX

RY WX WY RX

RY WX RX WY

RY WY WX RX

RY WY RX WX

RY RX WX WY

RY RX WY WX

Which of these sequences can be expected with all the memory models listed?
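The 24 orderings listed are all the permutations of the four operations WX, WY, RX, and RY. A minimal sketch that regenerates them (the listing order may differ from the slide):

```python
from itertools import permutations

OPS = ("WX", "WY", "RX", "RY")

# Every possible global ordering of the four memory operations.
sequences = [" ".join(p) for p in permutations(OPS)]
for s in sequences:
    print(s)
print(len(sequences))  # 24
```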

Page 72: Shared Memory Consistency Models: A Tutorial Sarita V. Adve Kouroush Ghrachorloo Western Research Laboratory September 1995

Producer Consumer

All sequences

P1:
Data = 2            (WX)
Head = 1            (WY)

P2:
while(Head == 0);   (RY)
print Data;         (RX)

WX WY RX RY

WX WY RY RX

WX RX WY RY

WX RX RY WY

WX RY RX WY

WX RY WY RX

WY WX RX RY

WY WX RY RX

WY RX WX RY

WY RY WX RX

WY RX RY WX

WY RY RX WX

RX WX RY WY

RX WX WY RY

RX WY RY WX

RX WY WX RY

RX RY WX WY

RX RY WY WX

RY WX WY RX

RY WX RX WY

RY WY WX RX

RY WY RX WX

RY RX WX WY

RY RX WY WX

Required: WX precedes RX, WY precedes RY, and WY precedes RX

Page 73: Shared Memory Consistency Models: A Tutorial Sarita V. Adve Kouroush Ghrachorloo Western Research Laboratory September 1995

Producer Consumer

Possible sequences

P1:
Data = 2            (WX)
Head = 1            (WY)

P2:
while(Head == 0);   (RY)
print Data;         (RX)

WX WY RX RY

WX WY RY RX

WX RX WY RY

WY WX RX RY

WY WX RY RX

WY RX WX RY

WY RY WX RX

WY RX RY WX

WY RY RX WX

RX WX WY RY

RX WY RY WX

RX WY WX RY

Required: WX precedes RX, WY precedes RY, and WY precedes RX

5 are OK

7 are Wrong
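The counts can be checked mechanically. The sketch below assumes the 12 "possible" sequences are the ones in which WY precedes RY (the while loop cannot exit before Head is written), and treats a sequence as OK when it also meets the other two stated requirements. The helper name `before` is hypothetical.

```python
from itertools import permutations


def before(seq, a, b):
    """True when operation a appears earlier than operation b in seq."""
    return seq.index(a) < seq.index(b)


all_seqs = list(permutations(("WX", "WY", "RX", "RY")))

# The consumer's loop only exits after Head = 1, so WY must precede RY.
possible = [s for s in all_seqs if before(s, "WY", "RY")]

# Remaining requirements from the slide: WX precedes RX and WY precedes RX.
ok = [s for s in possible if before(s, "WX", "RX") and before(s, "WY", "RX")]
wrong = [s for s in possible if s not in ok]

print(len(possible), len(ok), len(wrong))  # 12 5 7
```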

Page 74: Shared Memory Consistency Models: A Tutorial Sarita V. Adve Kouroush Ghrachorloo Western Research Laboratory September 1995

Producer Consumer

with sequential consistency

P1:
Data = 2            (WX)
Head = 1            (WY)

P2:
while(Head == 0);   (RY)
print Data;         (RX)

WX WY RX RY
WX WY RY RX
WX RX WY RY
WY WX RX RY
WY WX RY RX
WY RX WX RY
WY RY WX RX
WY RX RY WX
WY RY RX WX
RX WX WY RY
RX WY RY WX
RX WY WX RY

Start with Sequential Consistency

5 are OK

7 are Wrong

Page 75: Shared Memory Consistency Models: A Tutorial Sarita V. Adve Kouroush Ghrachorloo Western Research Laboratory September 1995

Producer Consumer

with sequential consistency

P1:
Data = 2            (WX)
Head = 1            (WY)

P2:
while(Head == 0);   (RY)
print Data;         (RX)

WX WY RY RX

Start with Sequential Consistency

1 is OK

0 are Wrong
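Why only one sequence survives can be shown by filtering the permutations with the orderings that sequential consistency keeps. This is a sketch under the assumption that the constraints are each processor's program order plus the loop exit condition. The helper name `satisfies` is hypothetical.

```python
from itertools import permutations


def satisfies(seq, constraints):
    """True when every (earlier, later) pair appears in that order in seq."""
    return all(seq.index(a) < seq.index(b) for a, b in constraints)


program_order = [("WX", "WY"), ("RY", "RX")]  # P1 and P2 program order, kept by SC
loop_exit = [("WY", "RY")]                    # the while loop exits only after Head = 1

sc_sequences = [
    s for s in permutations(("WX", "WY", "RX", "RY"))
    if satisfies(s, program_order + loop_exit)
]
print(sc_sequences)  # [('WX', 'WY', 'RY', 'RX')]
```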

Page 76: Shared Memory Consistency Models: A Tutorial Sarita V. Adve Kouroush Ghrachorloo Western Research Laboratory September 1995

Producer Consumer

Relaxed W->R ordering sequences

P1:
Data = 2            (WX)
Head = 1            (WY)

P2:
while(Head == 0);   (RY)
print Data;         (RX)

WX WY RY RX

Add sequences due to the relaxation of W->R ordering

1 is OK

0 are Wrong

Page 77: Shared Memory Consistency Models: A Tutorial Sarita V. Adve Kouroush Ghrachorloo Western Research Laboratory September 1995

Producer Consumer

Relaxed W->R ordering sequences

P1:
Data = 2            (WX)
Head = 1            (WY)

P2:
while(Head == 0);   (RY)
print Data;         (RX)

WX WY RY RX

No change

1 is OK

0 are Wrong

Page 78: Shared Memory Consistency Models: A Tutorial Sarita V. Adve Kouroush Ghrachorloo Western Research Laboratory September 1995

Producer Consumer

Relaxed W->R, and W->W ordering sequences

P1:
Data = 2            (WX)
Head = 1            (WY)

P2:
while(Head == 0);   (RY)
print Data;         (RX)

WX WY RY RX

Most processors also relax W->W ordering.

1 is OK

0 are Wrong

Page 79: Shared Memory Consistency Models: A Tutorial Sarita V. Adve Kouroush Ghrachorloo Western Research Laboratory September 1995

Producer Consumer

Relaxed W->R, and W->W ordering sequences

P1:
Data = 2            (WX)
Head = 1            (WY)

P2:
while(Head == 0);   (RY)
print Data;         (RX)

Started with sequential consistency, then added relaxed w->r and w->w orderings

3 are OK

1 is Wrong

WX WY RY RX

WY WX RY RX

WY RY WX RX

WY RY RX WX
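The four sequences follow from dropping one constraint. The sketch below assumes the W->W relaxation removes the guarantee that WX precedes WY, while the loop exit (WY before RY) and the consumer's read order (RY before RX) still hold. A sequence is counted as wrong when Data is read before it is written. The helper name `in_order` is hypothetical.

```python
from itertools import permutations


def in_order(seq, a, b):
    """True when a appears before b in seq."""
    return seq.index(a) < seq.index(b)


# Kept: loop exit and P2's read order. Dropped: P1's write order (WX before WY).
kept = [("WY", "RY"), ("RY", "RX")]

allowed = [
    s for s in permutations(("WX", "WY", "RX", "RY"))
    if all(in_order(s, a, b) for a, b in kept)
]
ok = [s for s in allowed if in_order(s, "WX", "RX")]
wrong = [s for s in allowed if not in_order(s, "WX", "RX")]

print(len(ok), len(wrong))  # 3 1
print(wrong)                # [('WY', 'RY', 'RX', 'WX')]
```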

Page 80: Shared Memory Consistency Models: A Tutorial Sarita V. Adve Kouroush Ghrachorloo Western Research Laboratory September 1995

Dekker’s Algorithm

Relaxed W->R memory model

WX WY RX RY

WX WY RY RX

WX RX WY RY

WX RX RY WY

WX RY RX WY

WX RY WY RX

WY WX RX RY

WY WX RY RX

WY RX WX RY

WY RY WX RX

WY RX RY WX

WY RY RX WX

RX WX RY WY

RX WX WY RY

RX WY RY WX

RX WY WX RY

RX RY WX WY

RX RY WY WX

RY WX WY RX

RY WX RX WY

RY WY WX RX

RY WY RX WX

RY RX WX WY

RY RX WY WX

Which of these sequences can be expected with all the memory models listed?

Page 81: Shared Memory Consistency Models: A Tutorial Sarita V. Adve Kouroush Ghrachorloo Western Research Laboratory September 1995

Dekker’s Algorithm

Relaxed W->R ordering sequences

WX WY RX RY

WX WY RY RX

WX RX WY RY

WX RX RY WY

WX RY RX WY

WX RY WY RX

WY WX RX RY

WY WX RY RX

WY RX WX RY

WY RY WX RX

WY RX RY WX

WY RY RX WX

RX WX RY WY

RX WX WY RY

RX WY RY WX

RX WY WX RY

RX RY WX WY

RX RY WY WX

RY WX WY RX

RY WX RX WY

RY WY WX RX

RY WY RX WX

RY RX WX WY

RY RX WY WX

P1:
Flag1 = 1           (WX)
If(Flag2 == 0)      (RY)
    Critical section

P2:
Flag2 = 1           (WY)
If(Flag1 == 0)      (RX)
    Critical section

Works whenever WX precedes RX or WY precedes RY

18 are OK

6 are Wrong
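The 18/6 split can be reproduced by classifying every permutation. The sketch assumes a sequence is wrong exactly when both processors read the other's flag before it is written, so both enter the critical section. The helper name `mutual_exclusion_holds` is hypothetical.

```python
from itertools import permutations


def mutual_exclusion_holds(seq):
    """False only when both processors see the other's flag still at 0."""
    # P1 reads Flag2 at RY; it enters if that read precedes P2's write WY.
    p1_enters = seq.index("RY") < seq.index("WY")
    # P2 reads Flag1 at RX; it enters if that read precedes P1's write WX.
    p2_enters = seq.index("RX") < seq.index("WX")
    return not (p1_enters and p2_enters)


seqs = list(permutations(("WX", "WY", "RX", "RY")))
ok = [s for s in seqs if mutual_exclusion_holds(s)]
wrong = [s for s in seqs if not mutual_exclusion_holds(s)]
print(len(ok), len(wrong))  # 18 6
```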

Page 82: Shared Memory Consistency Models: A Tutorial Sarita V. Adve Kouroush Ghrachorloo Western Research Laboratory September 1995

Dekker’s Algorithm

Relaxed W->R ordering sequences

WX WY RX RY

WX WY RY RX

WX RX WY RY

WX RX RY WY

WX RY RX WY

WX RY WY RX

WY WX RX RY

WY WX RY RX

WY RX WX RY

WY RY WX RX

WY RX RY WX

WY RY RX WX

RX WX RY WY

RX WX WY RY

RX WY RY WX

RX WY WX RY

RX RY WX WY

RX RY WY WX

RY WX WY RX

RY WX RX WY

RY WY WX RX

RY WY RX WX

RY RX WX WY

RY RX WY WX

P1:
Flag1 = 1           (WX)
If(Flag2 == 0)      (RY)
    Critical section

P2:
Flag2 = 1           (WY)
If(Flag1 == 0)      (RX)
    Critical section

Start with Sequential Consistency

18 are OK

6 are Wrong

Page 83: Shared Memory Consistency Models: A Tutorial Sarita V. Adve Kouroush Ghrachorloo Western Research Laboratory September 1995

Dekker’s Algorithm

Relaxed W->R ordering sequences

WX WY RX RY

WX WY RY RX

WX RY WY RX

WY WX RX RY

WY WX RY RX

WY RX WX RY

P1:
Flag1 = 1           (WX)
If(Flag2 == 0)      (RY)
    Critical section

P2:
Flag2 = 1           (WY)
If(Flag1 == 0)      (RX)
    Critical section

Start with Sequential Consistency

6 are OK

0 are Wrong
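Under sequential consistency each processor's write precedes its own read, which is enough to rule out every failing sequence. A minimal check, assuming the only constraints are WX before RY (P1) and WY before RX (P2). The helper names are hypothetical.

```python
from itertools import permutations

# Program order kept by sequential consistency for Dekker's algorithm.
SC_PROGRAM_ORDER = [("WX", "RY"), ("WY", "RX")]


def respects(seq, pairs):
    """True when every (earlier, later) pair appears in that order in seq."""
    return all(seq.index(a) < seq.index(b) for a, b in pairs)


def both_enter(seq):
    """True when both processors read the other's flag before it is set."""
    return seq.index("RY") < seq.index("WY") and seq.index("RX") < seq.index("WX")


sc = [s for s in permutations(("WX", "WY", "RX", "RY")) if respects(s, SC_PROGRAM_ORDER)]
violations = [s for s in sc if both_enter(s)]
print(len(sc), len(violations))  # 6 0
```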

Page 84: Shared Memory Consistency Models: A Tutorial Sarita V. Adve Kouroush Ghrachorloo Western Research Laboratory September 1995

Dekker’s Algorithm

Relaxed W->R ordering sequences

WX WY RX RY

WX WY RY RX

WX RY WY RX

WY WX RX RY

WY WX RY RX

WY RX WX RY

P1:
Flag1 = 1           (WX)
If(Flag2 == 0)      (RY)
    Critical section

P2:
Flag2 = 1           (WY)
If(Flag1 == 0)      (RX)
    Critical section

Add sequences due to relaxed memory model

6 are OK

0 are Wrong

Page 85: Shared Memory Consistency Models: A Tutorial Sarita V. Adve Kouroush Ghrachorloo Western Research Laboratory September 1995

Dekker’s Algorithm

Relaxed W->R ordering sequences

WX WY RX RY

WX WY RY RX

WX RX WY RY

WX RX RY WY

WX RY RX WY

WX RY WY RX

WY WX RX RY

WY WX RY RX

WY RX WX RY

WY RY WX RX

WY RX RY WX

WY RY RX WX

RX WX RY WY

RX WX WY RY

RX WY RY WX

RX WY WX RY

RX RY WX WY

RX RY WY WX

RY WX WY RX

RY WX RX WY

RY WY WX RX

RY WY RX WX

RY RX WX WY

RY RX WY WX

P1:
Flag1 = 1           (WX)
If(Flag2 == 0)      (RY)
    Critical section

P2:
Flag2 = 1           (WY)
If(Flag1 == 0)      (RX)
    Critical section

Add sequences due to relaxed memory model

18 are OK

6 are Wrong
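The effect of the relaxation can be seen as a set difference. The sketch below assumes the relaxed W->R model allows all 24 orderings, as the slide lists, and compares them with the 6 that sequential consistency allows.

```python
from itertools import permutations


def pos(seq, op):
    """Position of op in the sequence."""
    return seq.index(op)


all_seqs = set(permutations(("WX", "WY", "RX", "RY")))

# Sequential consistency: each write precedes the same processor's read.
sc = {s for s in all_seqs if pos(s, "WX") < pos(s, "RY") and pos(s, "WY") < pos(s, "RX")}

# Sequences that only become possible once W->R ordering is relaxed.
added = all_seqs - sc

# Failing sequences: both reads happen before the other processor's write.
broken = {s for s in all_seqs if pos(s, "RY") < pos(s, "WY") and pos(s, "RX") < pos(s, "WX")}

print(len(added), len(broken), broken <= added)  # 18 6 True
```

Every failing sequence is among those the relaxation adds, which is why Dekker's algorithm needs a safety net on such hardware.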

Page 86: Shared Memory Consistency Models: A Tutorial Sarita V. Adve Kouroush Ghrachorloo Western Research Laboratory September 1995

Safety Nets

• Atomic instruction (RMW)
• Code delineation (serialization instructions)
• Synchronization instructions (SYNC)
• Identify Data and Synch operations (Weak Ordering model, and Release Consistency model)
• Memory barriers (aka “fences”)

Page 87: Shared Memory Consistency Models: A Tutorial Sarita V. Adve Kouroush Ghrachorloo Western Research Laboratory September 1995

Producer Consumer w/Fence

Insert a memory barrier between the instructions we want ordered.

Global variables initially: Data = 0, Head = 0

P1:
Data = 2            (WX)
Head = 1            (WY)

P2:
while(Head == 0);   (RY)
... = Data;         (RX)

Page 88: Shared Memory Consistency Models: A Tutorial Sarita V. Adve Kouroush Ghrachorloo Western Research Laboratory September 1995

Producer Consumer w/Fence

Example of a Producer and Consumer with a Memory Barrier applied.

Global variables initially: Data = 0, Head = 0

P1:
Data = 2            (WX)
memory_barrier
Head = 1            (WY)

P2:
while(Head == 0);   (RY)
memory_barrier
... = Data;         (RX)

All memory operations before the memory barrier must complete before proceeding to memory operations after the memory barrier.
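The barrier rule can be read as a way to derive ordering constraints from a program. The sketch below is a model, not real barrier code: it assumes a program is a list of operation labels with the string "MB" marking each memory barrier, and the helper name `barrier_constraints` is hypothetical.

```python
def barrier_constraints(program):
    """Return (earlier, later) pairs implied by 'MB' markers in one processor's program."""
    constraints = []
    before_mb = []  # operations that sit before at least one barrier seen so far
    segment = []    # operations issued since the last barrier
    for op in program:
        if op == "MB":
            # Everything issued so far must complete before anything that follows.
            before_mb.extend(segment)
            segment = []
        else:
            constraints.extend((earlier, op) for earlier in before_mb)
            segment.append(op)
    return constraints


p1 = ["WX", "MB", "WY"]  # Data = 2; memory_barrier; Head = 1
p2 = ["RY", "MB", "RX"]  # while(Head == 0); memory_barrier; read Data
print(barrier_constraints(p1) + barrier_constraints(p2))
# [('WX', 'WY'), ('RY', 'RX')]
```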

Page 89: Shared Memory Consistency Models: A Tutorial Sarita V. Adve Kouroush Ghrachorloo Western Research Laboratory September 1995

Producer Consumer w/Fence

Relaxed W->R, and W->W ordering sequences

P1:
Data = 2            (WX)
Head = 1            (WY)

P2:
while(Head == 0);   (RY)
print Data;         (RX)

Started with sequential consistency, then added relaxed w->r and w->w orderings

3 are OK

1 is Wrong

WX WY RY RX

WY WX RY RX

WY RY WX RX

WY RY RX WX

Page 90: Shared Memory Consistency Models: A Tutorial Sarita V. Adve Kouroush Ghrachorloo Western Research Laboratory September 1995

Producer Consumer w/Fence

Relaxed W->R, and W->W ordering sequences

P1:
Data = 2            (WX)
memory_barrier
Head = 1            (WY)

P2:
while(Head == 0);   (RY)
memory_barrier
print Data;         (RX)

Add memory barriers to force WX < WY and RY < RX

3 are OK

1 is Wrong

WX WY RY RX

WY WX RY RX

WY RY WX RX

WY RY RX WX

Page 91: Shared Memory Consistency Models: A Tutorial Sarita V. Adve Kouroush Ghrachorloo Western Research Laboratory September 1995

Producer Consumer w/Fence

Relaxed W->R, and W->W ordering sequences

P1:
Data = 2            (WX)
memory_barrier
Head = 1            (WY)

P2:
while(Head == 0);   (RY)
memory_barrier
print Data;         (RX)

Looks the same.

3 are OK

1 is Wrong

WX WY RY RX
WY WX RY RX
WY RY WX RX
WY RY RX WX

Page 92: Shared Memory Consistency Models: A Tutorial Sarita V. Adve Kouroush Ghrachorloo Western Research Laboratory September 1995

Producer Consumer w/Fence

Relaxed W->R, and W->W ordering sequences

P1:
Data = 2            (WX)
memory_barrier
Head = 1            (WY)

P2:
while(Head == 0);   (RY)
memory_barrier
print Data;         (RX)

With WX < WY and RY < RX enforced with memory barriers, RX < WX is not possible.

3 are OK

1 is Wrong

WX WY RY RX
WY WX RY RX
WY RY WX RX
WY RY RX WX

Page 93: Shared Memory Consistency Models: A Tutorial Sarita V. Adve Kouroush Ghrachorloo Western Research Laboratory September 1995

Producer Consumer w/Fence

Relaxed W->R, and W->W ordering sequences

P1:
Data = 2            (WX)
memory_barrier
Head = 1            (WY)

P2:
while(Head == 0);   (RY)
memory_barrier
print Data;         (RX)

Due to MB, RY < RX is enforced

3 are OK

1 is Wrong

WX WY RY RX
WY WX RY RX
WY RY WX RX
WY RY RX WX

Page 94: Shared Memory Consistency Models: A Tutorial Sarita V. Adve Kouroush Ghrachorloo Western Research Laboratory September 1995

Producer Consumer w/Fence

Relaxed W->R, and W->W ordering sequences

P1:
Data = 2            (WX)
memory_barrier
Head = 1            (WY)

P2:
while(Head == 0);   (RY)
memory_barrier
print Data;         (RX)

WY < RY < MB: the while loop on RY waits for WY.

3 are OK

1 is Wrong

WX WY RY RX
WY WX RY RX
WY RY WX RX
WY RY RX WX

Page 95: Shared Memory Consistency Models: A Tutorial Sarita V. Adve Kouroush Ghrachorloo Western Research Laboratory September 1995

Producer Consumer w/Fence

Relaxed W->R, and W->W ordering sequences

P1:
Data = 2            (WX)
memory_barrier
Head = 1            (WY)

P2:
while(Head == 0);   (RY)
memory_barrier
print Data;         (RX)

Due to MB, WX < WY is enforced

3 are OK

1 is Wrong

WX WY RY RX
WY WX RY RX
WY RY WX RX
WY RY RX WX

Page 96: Shared Memory Consistency Models: A Tutorial Sarita V. Adve Kouroush Ghrachorloo Western Research Laboratory September 1995

Producer Consumer w/Fence

Relaxed W->R, and W->W ordering sequences

P1:
Data = 2            (WX)
memory_barrier
Head = 1            (WY)

P2:
while(Head == 0);   (RY)
memory_barrier
print Data;         (RX)

WX < WY, WY < RY, and RY < RX are enforced; therefore WX < RX is enforced.

3 are OK

1 is Wrong

WX WY RY RX
WY WX RY RX
WY RY WX RX
WY RY RX WX
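The conclusion that WX < RX follows from the three enforced orderings is a transitivity argument. A small sketch that derives it by computing the transitive closure of the enforced pairs (the helper name `transitive_closure` is hypothetical):

```python
def transitive_closure(pairs):
    """Add (a, d) whenever (a, b) and (b, d) are present, until nothing changes."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


# Barrier in P1, loop exit condition, barrier in P2.
enforced = {("WX", "WY"), ("WY", "RY"), ("RY", "RX")}
closure = transitive_closure(enforced)

print(("WX", "RX") in closure)  # True
print(("RX", "WX") in closure)  # False
```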

Page 97: Shared Memory Consistency Models: A Tutorial Sarita V. Adve Kouroush Ghrachorloo Western Research Laboratory September 1995

Producer Consumer w/Fence

Relaxed W->R, and W->W ordering sequences

P1:
Data = 2            (WX)
memory_barrier
Head = 1            (WY)

P2:
while(Head == 0);   (RY)
memory_barrier
print Data;         (RX)

With WX < WY and RY < RX enforced with memory barriers, RX < WX is not possible.

3 are OK

1 is Wrong

WX WY RY RX
WY WX RY RX
WY RY WX RX
WY RY RX WX

Page 98: Shared Memory Consistency Models: A Tutorial Sarita V. Adve Kouroush Ghrachorloo Western Research Laboratory September 1995

Producer Consumer w/Fence

Relaxed W->R, and W->W ordering sequences

P1:
Data = 2            (WX)
memory_barrier
Head = 1            (WY)

P2:
while(Head == 0);   (RY)
memory_barrier
print Data;         (RX)

With WX < WY and RY < RX enforced with memory barriers, RX < WX is not possible.

WX WY RY RX
WY WX RY RX
WY RY WX RX

3 are OK

0 are Wrong

Page 99: Shared Memory Consistency Models: A Tutorial Sarita V. Adve Kouroush Ghrachorloo Western Research Laboratory September 1995

Dekker’s Algorithm w/Fence

Example of a mutual exclusion (“Dekker’s Algorithm”)

Global variables initially: Flag1 = 0, Flag2 = 0

P1:
Flag1 = 1           (WX)
memory_barrier
If(Flag2 == 0)      (RY)
    Critical section

P2:
Flag2 = 1           (WY)
memory_barrier
If(Flag1 == 0)      (RX)
    Critical section

All memory operations before the memory barrier must complete before proceeding to memory operations after the memory barrier.

Page 100: Shared Memory Consistency Models: A Tutorial Sarita V. Adve Kouroush Ghrachorloo Western Research Laboratory September 1995

Dekker’s Algorithm w/Fence

Relaxed W->R ordering sequences

WX WY RX RY

WX WY RY RX

WX RX WY RY

WX RX RY WY

WX RY RX WY

WX RY WY RX

WY WX RX RY

WY WX RY RX

WY RX WX RY

WY RY WX RX

WY RX RY WX

WY RY RX WX

RX WX RY WY

RX WX WY RY

RX WY RY WX

RX WY WX RY

RX RY WX WY

RX RY WY WX

RY WX WY RX

RY WX RX WY

RY WY WX RX

RY WY RX WX

RY RX WX WY

RY RX WY WX

P1:
Flag1 = 1           (WX)
If(Flag2 == 0)      (RY)
    Critical section

P2:
Flag2 = 1           (WY)
If(Flag1 == 0)      (RX)
    Critical section

Started with sequential consistency, then added relaxed w->r orderings

18 are OK

6 are Wrong

Page 101: Shared Memory Consistency Models: A Tutorial Sarita V. Adve Kouroush Ghrachorloo Western Research Laboratory September 1995

Dekker’s Algorithm w/Fence

Relaxed W->R ordering sequences

WX WY RX RY

WX WY RY RX

WX RX WY RY

WX RX RY WY

WX RY RX WY

WX RY WY RX

WY WX RX RY

WY WX RY RX

WY RX WX RY

WY RY WX RX

WY RX RY WX

WY RY RX WX

RX WX RY WY

RX WX WY RY

RX WY RY WX

RX WY WX RY

RX RY WX WY

RX RY WY WX

RY WX WY RX

RY WX RX WY

RY WY WX RX

RY WY RX WX

RY RX WX WY

RY RX WY WX

Add memory barriers to force WX < RY and WY < RX

18 are OK

6 are Wrong

P1:
Flag1 = 1           (WX)
memory_barrier
If(Flag2 == 0)      (RY)
    Critical section

P2:
Flag2 = 1           (WY)
memory_barrier
If(Flag1 == 0)      (RX)
    Critical section

Page 102: Shared Memory Consistency Models: A Tutorial Sarita V. Adve Kouroush Ghrachorloo Western Research Laboratory September 1995

Dekker’s Algorithm w/Fence

Relaxed W->R ordering sequences

Add memory barriers to force WX < RY and WY < RX

6 are OK

0 are Wrong

P1:
Flag1 = 1           (WX)
memory_barrier
If(Flag2 == 0)      (RY)
    Critical section

P2:
Flag2 = 1           (WY)
memory_barrier
If(Flag1 == 0)      (RX)
    Critical section

WX WY RX RY

WX WY RY RX

WX RY WY RX

WY WX RX RY

WY WX RY RX

WY RX WX RY
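The six remaining sequences can be recovered by deriving constraints from the barrier positions and filtering all 24 orderings. This is a model of the reasoning rather than real fence code: "MB" is an assumed marker for memory_barrier, and the helper names `fence_pairs` and `both_enter` are hypothetical.

```python
from itertools import permutations


def fence_pairs(program):
    """(earlier, later) pairs for operations that have an 'MB' marker between them."""
    pairs = []
    for i, op in enumerate(program):
        if op == "MB":
            continue
        for j in range(i + 1, len(program)):
            later = program[j]
            if later != "MB" and "MB" in program[i + 1:j]:
                pairs.append((op, later))
    return pairs


def both_enter(seq):
    """True when each processor reads the other's flag before it is written."""
    return seq.index("RY") < seq.index("WY") and seq.index("RX") < seq.index("WX")


p1 = ["WX", "MB", "RY"]  # Flag1 = 1; memory_barrier; If(Flag2 == 0)
p2 = ["WY", "MB", "RX"]  # Flag2 = 1; memory_barrier; If(Flag1 == 0)
constraints = fence_pairs(p1) + fence_pairs(p2)

allowed = [
    s for s in permutations(("WX", "WY", "RX", "RY"))
    if all(s.index(a) < s.index(b) for a, b in constraints)
]
wrong = [s for s in allowed if both_enter(s)]
print(constraints)               # [('WX', 'RY'), ('WY', 'RX')]
print(len(allowed), len(wrong))  # 6 0
```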

Page 103: Shared Memory Consistency Models: A Tutorial Sarita V. Adve Kouroush Ghrachorloo Western Research Laboratory September 1995

Serialization of Writes (Fig 6) w/Fence

Insert a memory barrier between the instructions we want ordered.

Global variables initially: A = 0, B = 0, C = 0

P1:
A = 1               (WX)
B = 1               (WY)

P2:
A = 2               (WX)
C = 1               (WZ)

P3:
while(B != 1);      (RY)
while(C != 1);      (RZ)
Register1 = A

P4:
while(B != 1);      (RY)
while(C != 1);      (RZ)
Register2 = A

W1 W2
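What serialization of writes buys can be sketched by enumerating what P3 and P4 might end up reading from A. The model below is an assumption for illustration only: W1 and W2 are taken to name the two writes to A (A = 1 and A = 2), and each reader is assumed to end with the value of whichever write it observed last.

```python
from itertools import permutations, product

# Assumed labels: W1 is the write A = 1, W2 is the write A = 2.
writes = {"W1": 1, "W2": 2}


def final_value(observed_order):
    """Value of A a processor holds after seeing the writes in this order."""
    return writes[observed_order[-1]]


orders = list(permutations(writes))

# Without serialization, P3 and P4 may each observe the writes in any order.
unserialized = {(final_value(o3), final_value(o4)) for o3, o4 in product(orders, repeat=2)}

# With serialization, every processor observes the same single order.
serialized = {(final_value(o), final_value(o)) for o in orders}

print(sorted(unserialized))  # [(1, 1), (1, 2), (2, 1), (2, 2)]
print(sorted(serialized))    # [(1, 1), (2, 2)]
```

Each pair is (Register1, Register2). The mixed pairs only appear when the two readers disagree about the order of the writes.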

Page 104: Shared Memory Consistency Models: A Tutorial Sarita V. Adve Kouroush Ghrachorloo Western Research Laboratory September 1995

Higher Level Abstractions

• Lower level of complexity
• Explicit Parallel Constructs
– Fortran 90
– MPI
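The slide names Fortran 90 and MPI. As a stand-in that is not from the slides, the sketch below shows the same idea with Python's standard `queue.Queue`: the producer hands Data to the consumer through a blocking queue, so the program contains no Head flag and no explicit barrier.

```python
import queue
import threading

channel = queue.Queue()
received = []


def producer():
    data = 2           # plays the role of "Data = 2"
    channel.put(data)  # the hand-off replaces the explicit "Head = 1" flag


def consumer():
    # get() blocks until the producer has published a value.
    received.append(channel.get())


threads = [threading.Thread(target=consumer), threading.Thread(target=producer)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(received)  # [2]
```

The ordering guarantees live inside the queue implementation, which is the point of a higher level abstraction.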

Page 105: Shared Memory Consistency Models: A Tutorial Sarita V. Adve Kouroush Ghrachorloo Western Research Laboratory September 1995

Conclusion

• The Uniprocessor programming model is simple, but does not work on Multiprocessors

• Hardware and compilers make many optimizations that reorder loads and stores

• Memory models exist on the hardware and need to be considered for program correctness

• The Sequential Consistency model was considered for concurrent programs on the Uniprocessor

• Relaxed Memory Consistency models are considered on the Multiprocessor because SC is too restrictive for hardware performance.

• Use memory barriers (fences) to override the relaxed memory model when ordering between memory operations must be maintained.

Page 106: Shared Memory Consistency Models: A Tutorial Sarita V. Adve Kouroush Ghrachorloo Western Research Laboratory September 1995

Other Processors

R to R
W to R
W to W
R to W