Operating Systems
Thread Synchronization: Advanced Topics
Thomas Ropars
2021
References
The content of these lectures is inspired by:
• The lecture notes of Prof. Andre Schiper.
• The lecture notes of Prof. David Mazieres.
• Operating Systems: Three Easy Pieces by R. Arpaci-Dusseau and A. Arpaci-Dusseau
• A Primer on Memory Consistency and Cache Coherence by D. Sorin, M. Hill and D. Wood
Included in this lecture
Several concepts related to concurrent programming:
• Performance
  - Amdahl's law
• Characteristics of shared memory
  - Coherence
  - Consistency (SC, TSO)
• Correctness of algorithms
  - Data race / race condition
  - Avoiding deadlocks
Agenda
Efficiency of concurrent code
Consistency and Coherence
Memory models
Data Races
Other topics
First a reminder
Threads have two main purposes:
• Overlapping computation and I/O
• Taking advantage of multicore (multi-)processors for compute-intensive code

The performance improvement that can be expected from using multiple threads is not infinite.
Amdahl’s Law
Speedup of parallel code is limited by the sequential part of the code:

T(n) = (1 − p) · T(1) + (p / n) · T(1)

where:
• T(x) is the time it takes to complete the task on x CPUs
• p is the portion of time spent in parallel code during a sequential execution
• n is the number of CPUs

Even with a massive number of CPUs:

lim_{n→∞} T(n) = (1 − p) · T(1)
Amdahl’s Law
Figure: Speedup as a function of the number of CPUs ("AmdahlsLaw" by Daniels220 at English Wikipedia)
Efficient code
Critical sections are pieces of code that are executed sequentially!

Finding the right granularity for critical sections:
• Correctness always comes first!
• Try to reduce the size of the critical sections that are executed most often
• Do not forget that lock/unlock operations also take time
A modern multicore processor
(Diagram: four cores connected through an interconnection network to a memory controller and main memory)
A modern multicore processor with caches
(Diagram: four cores, each with a private cache and cache controller, connected through an interconnection network to a shared last-level cache (LLC), a memory controller, and main memory)
A modern multicore processor with caches
Cache
• Goal: reducing memory access latency
• Mandatory for good performance
• Stores a copy of the last accessed cache lines
• Multiple levels of private cache + a larger shared cache

Network interconnect
• Legacy architectures were based on a bus
• Modern CPUs use a multi-dimensional network (mesh)
Memory system properties
Basic properties
• Each core reads (load) and writes (store) to a single shared address space.
• The size of loads and stores is typically 1 to 8 bytes on 64-bit systems.
• The cache block size is 64 bytes on common architectures (a cache line).
Questions
• How is correctness defined for such a system?
• How do caches impact correctness?
Memory system properties

Consistency (Memory Model)
• Defines correct shared memory behavior in terms of loads and stores
  - What value a load is allowed to return
• Specifies the allowed behavior of multi-threaded programs executing with shared memory
  - Problem: multiple executions can be correct
• Defines ordering across memory addresses

Coherence
• Ensures that load/store behavior remains consistent despite caches
  - The cache coherence protocol makes caches logically invisible to the user
• Deals with the case where multiple cores access the same memory address
Cache coherence (Sorin, Hill and Wood)

Single-Writer, Multiple-Reader (SWMR) invariant
For any given memory location, at any given moment in time, there is either a single core that may write it (and that may also read it), called a read-write epoch, or some number of cores that may read it, called a read epoch.

Data-Value Invariant
The value of the memory location at the start of an epoch is the same as the value of the memory location at the end of its last read-write epoch.
Implementation
• Invalidate protocols (outside the scope of this course)
Illustrating the problem

C0:                       C1:
data = NEW;               while(flag != SET){;}
flag = SET;               d = data;

At the instruction level:

C0                        C1
S1: Store data = NEW;
S2: Store flag = SET;     L1: Load r1 = flag;
                          B1: if(r1!=SET) goto L1;
                          L2: Load r2 = data;

Final value of r2?

First guess:
• The obvious answer is r2 = NEW
• This assumes the hardware does not re-order load/store operations.

In practice:
• We cannot answer
• It depends what re-orderings between operations the hardware can do
Re-ordering memory accesses
A modern core might re-order memory accesses to different addresses in different ways:
• Store-store reordering
  - Non-FIFO write buffer
  - S1 misses while S2 hits in the local cache
• Load-load reordering
  - Out-of-order cores (dynamic execution): instructions execute in an order governed by the availability of their input data
• Load-store / store-load reordering
  - Out-of-order cores

Re-ordering is used to improve performance.
Re-ordering memory accesses
If store-store or load-load reordering is allowed, the answer to the previous question can be r2 ≠ NEW.

The memory consistency model defines what behavior the programmer can expect and what optimizations might be used in hardware.

Note that with a single thread executing on one core, all these re-orderings are fully safe.
Some consistency models:
• Sequential consistency
• Total store order
Sequential Consistency (SC)
Definition
The result of any execution is the same as if the operations of all processors (cores) were executed in some sequential order, and the operations of each individual processor (core) appear in this sequence in the order specified by its program. – L. Lamport

This boils down to two requirements:
• All operations of a core appear in program order from the memory's point of view
• All write operations appear atomic
  - A write is made visible to all cores at the same time
Sequential Consistency (SC)
              Second op
              Load    Store
First  Load    X        X
op     Store   X        X

Table: SC ordering rules (the ordering enforced between requests issued by one core; X = order enforced)

With SC, the answer to the previous question is r2 = NEW.

Problem
Some hardware optimizations do not work efficiently or are complex to implement with SC.
Total Store Order (TSO)
Definition
Same as SC, except that:
• A load is allowed to return before an earlier store to a different location.
• A core is allowed to read its own writes before they are made visible to all cores.

TSO appears to be the memory model of AMD and Intel x86 architectures (there is no formal specification).
• It is the consequence of introducing write buffers.
Write Buffer
Each core has a write buffer where pending write operations are stored. It prevents the core from stalling when it does not have immediate read/write access to a cache line.
• TSO formalizes the behavior observed when write buffers are used

              Second op
              Load    Store
First  Load    X        X
op     Store   WB       X

Table: TSO ordering rules

WB = no ordering guarantee, except that the most recent value should be returned from the write buffer if both operations are to the same address.
Impact of TSO

C0                        C1
S1: Store data = NEW;
S2: Store flag = SET;     L1: Load r1 = flag;
                          B1: if(r1!=SET) goto L1;
                          L2: Load r2 = data;

Final value of r2?
• r2 = NEW
• TSO behaves like SC for most programs
• You can assume SC unless you start designing some low-level synchronization
• Some architectures (e.g., ARM) have a weaker memory model (no ordering enforced between store operations)
TSO versus SC

/* x=0, y=0 initially */
Core C0                   Core C1
S1: Store x = NEW;        S2: Store y = NEW;
L1: Load r1 = y;          L2: Load r2 = x;

Is the final result r1=0, r2=0 possible?
• With SC? No
• With TSO? Yes

Fence
A memory fence (or memory barrier) can be used to force the hardware to follow program order:

S1: Store x = NEW;        S2: Store y = NEW;
FENCE                     FENCE
L1: Load r1 = y;          L2: Load r2 = x;
Another Example (1)

volatile int flag1 = 0;
volatile int flag2 = 0;

void p1(void *ignored) {
    flag1 = 1;
    if (!flag2) { critical_section_1(); }
}

void p2(void *ignored) {
    flag2 = 1;
    if (!flag1) { critical_section_2(); }
}

int main() {
    tid id = thread_create(p1, NULL);
    p2(NULL);
    thread_join(id);
}

What does volatile mean?
• No compiler optimization can be applied to this variable (typically because it is accessed by multiple threads)

Can both critical sections run?
• with SC? No
• with TSO? Yes
• Peterson's algorithm doesn't work as is with TSO!
Another Example (2)

volatile int data = 0;
volatile int ready = 0;

void p1(void *ignored) {
    data = 2000;
    ready = 1;
}

void p2(void *ignored) {
    while (!ready)
        ;
    use(data);
}

int main() { ... }

Can use be called with value 0?
• with SC? No
• with TSO? No
Another Example (3)

volatile int flag1 = 0;
volatile int flag2 = 0;

int p1(void) {
    int f, g;
    flag1 = 1;
    f = flag1;
    g = flag2;
    return 2*f + g;
}

int p2(void) {
    int f, g;
    flag2 = 1;
    f = flag2;
    g = flag1;
    return 2*f + g;
}

int main() { ... }

Can both return 2?
• with SC? No
• with TSO? Yes
Data races and race conditions

Data race
A data race occurs when two threads access the same memory location, at least one of the accesses is a write, and no synchronization prevents the two accesses from occurring concurrently.

Race condition
A race condition is a flaw in a program that occurs because of the timing or ordering of some events.

• A data race may lead to a race condition, but not always
• A race condition might be due to a data race, but not always
• Peterson's synchronization algorithm is an example where data races do not lead to race conditions
Data races and race conditions

Transfer_Money(amount, account_from, account_to){
    if(account_from.balance < amount){
        return FAILED;
    }
    account_to.balance += amount;
    account_from.balance -= amount;
    return SUCCESS;
}

• Data race? Yes (concurrent updates of the balances)
• Race condition? Yes (a balance can get below 0)
Data races and race conditions: New try

Transfer_Money(amount, account_from, account_to){
    mutex1.lock();
    val = account_from.balance;
    mutex1.unlock();
    if(val < amount){
        return FAILED;
    }
    mutex2.lock();
    account_to.balance += amount;
    mutex2.unlock();
    mutex1.lock();
    account_from.balance -= amount;
    mutex1.unlock();
    return SUCCESS;
}

• Data race? No
• Race condition? Yes (a balance can get below 0, because the balance can change between the check and the withdrawal)
Deadlocks (with condition variables)
mutex_t m1, m2;
cond_t c1;

void p1(void *ignored) {
    lock(m1); lock(m2);
    while (!ready) { wait(c1, m2); }   /* releases only m2 while waiting */
    /* do something */
    unlock(m2); unlock(m1);
}

void p2(void *ignored) {
    lock(m1); lock(m2);
    /* do something */
    signal(c1);
    unlock(m2); unlock(m1);
}

If p1 starts waiting first, it still holds m1, so p2 blocks forever on lock(m1) and can never signal c1: deadlock.

One lesson: it is dangerous to hold locks when crossing abstraction boundaries!
• e.g., lock(a), then call a function that uses a condition variable
Dealing with deadlocks
Prevention
Eliminate one of the necessary conditions for deadlock:
• Non-blocking (wait-free) algorithms
• Wait on all resources at once
• Use optimistic concurrency control (transactions)
• Always lock resources in the same order

Avoidance
Prevent the system from entering an unsafe state (requires knowing a priori the resource requirements of each process)
Dealing with deadlocks
Detection + corrective action
Problem: what corrective action?
• Process termination
• Rollback (Is it possible? At what cost?)

Ignore the problem
This is the solution chosen by most OSes (including UNIX systems):
• Assume deadlocks occur rarely enough to avoid the need for a complex solution

Read "Operating Systems: Three Easy Pieces", chapter 32, for a more detailed discussion of deadlocks.
Concurrent algorithms without locks
Non-blocking concurrent algorithms do not rely on locks.

Progress conditions
• Lock freedom: if the algorithm runs long enough, it is guaranteed that some operations will finish.
  - Note that with locks there is no such guarantee: if the thread holding the lock crashes, there is no progress anymore.
• Wait freedom: each operation takes a finite number of steps to complete.
  - Newcomers need to help older requests before executing their own operation.

These algorithms strongly rely on compare_and_swap() operations.