distributed shared memory and sequential consistency
![Page 1: Distributed Shared Memory and Sequential Consistency](https://reader031.vdocuments.site/reader031/viewer/2022032311/56649dc85503460f94abe88a/html5/thumbnails/1.jpg)
Distributed Shared Memory and Sequential Consistency
![Page 2: Distributed Shared Memory and Sequential Consistency](https://reader031.vdocuments.site/reader031/viewer/2022032311/56649dc85503460f94abe88a/html5/thumbnails/2.jpg)
Outline
Consistency Models
Memory Consistency Models
Distributed Shared Memory
Implementing Sequential Consistency in Distributed Shared Memory
![Page 3: Distributed Shared Memory and Sequential Consistency](https://reader031.vdocuments.site/reader031/viewer/2022032311/56649dc85503460f94abe88a/html5/thumbnails/3.jpg)
Consistency
Consistency has many aspects, but at its core it is the way people reason about a system: which behaviors should be considered “correct” or “suitable”.
A consistency model is the set of constraints on a system's behavior that can be observed from outside the system.
Consistency problems arise in many distributed-system settings, including DSM (distributed shared memory), shared-memory multiprocessors (where the model is called a memory model), and replicas stored on multiple servers.
![Page 4: Distributed Shared Memory and Sequential Consistency](https://reader031.vdocuments.site/reader031/viewer/2022032311/56649dc85503460f94abe88a/html5/thumbnails/4.jpg)
Examples for consistency
Memory:
◦ step1: write x=5; step2: read x;
◦ The read in step2 should return 5, because it follows the write and must reflect the write's effect. This is single-object consistency, also called “coherence”.
Database: Bank Transaction
◦ (transfer 1000 from acctA to acctB)
◦ ACT1: acctA=acctA-1000; ACT2: acctB=acctB+1000;
◦ acctA+acctB should stay the same. No intermediate state should be visible from the outside.
Replica in Distributed System
◦ All replicas of the same data should be identical despite network or server problems.
![Page 5: Distributed Shared Memory and Sequential Consistency](https://reader031.vdocuments.site/reader031/viewer/2022032311/56649dc85503460f94abe88a/html5/thumbnails/5.jpg)
Consistency Challenges
There are no right or wrong consistency models. Choosing one is often a tradeoff between ease of programming and efficiency.
There is no consistency problem when a single thread reads and writes data, because a read always returns the result of the most recent write.
Consistency problems therefore arise with concurrent access to a single object or to multiple objects.
Note that this can be less obvious than you might think.
We will focus on building a distributed shared memory system.
![Page 6: Distributed Shared Memory and Sequential Consistency](https://reader031.vdocuments.site/reader031/viewer/2022032311/56649dc85503460f94abe88a/html5/thumbnails/6.jpg)
Many systems involve consistency
Many systems have storage/memory with concurrent readers and writers, and all of them face consistency problems.
◦ Multiprocessors, databases, AFS, lab extent server, lab YFS
You often want to improve a system in ways that risk changing its behavior:
◦ add caching
◦ split over multiple servers
◦ replication for fault tolerance
How can we figure out that such optimizations are “correct”?
We need a way to think about correct execution of distributed programs. Most of these ideas come from work on multiprocessors (memory models) and databases (transactions) 20 to 30 years ago.
The following discussion focuses on correctness and efficiency, not fault tolerance.
![Page 7: Distributed Shared Memory and Sequential Consistency](https://reader031.vdocuments.site/reader031/viewer/2022032311/56649dc85503460f94abe88a/html5/thumbnails/7.jpg)
Distributed Shared Memory
Multiple processes connect to a virtual shared memory. The virtual shared memory may be physically located on distributed hosts connected by a network.
So how do we implement a distributed shared memory system?
![Page 8: Distributed Shared Memory and Sequential Consistency](https://reader031.vdocuments.site/reader031/viewer/2022032311/56649dc85503460f94abe88a/html5/thumbnails/8.jpg)
Naive Distributed Shared Memory
Each machine has a local copy of all memory (mem0, mem1, and mem2 should be kept identical)
Read: from local memory
Write: send an update message to every other host (but don’t wait)
This is fast because a process never waits for communication
Does this memory work well?
![Page 9: Distributed Shared Memory and Sequential Consistency](https://reader031.vdocuments.site/reader031/viewer/2022032311/56649dc85503460f94abe88a/html5/thumbnails/9.jpg)
Example 1:
M0:
  v0=f0();
  done0=1;
M1:
  while(done0==0) ;
  v1=f1(v0);
  done1=1;
M2:
  while(done1==0) ;
  v2=f2(v0,v1);
Intuitive intent: M2 should execute f2() with the results from M0 and M1; waiting for M1 implies waiting for M0.
![Page 10: Distributed Shared Memory and Sequential Consistency](https://reader031.vdocuments.site/reader031/viewer/2022032311/56649dc85503460f94abe88a/html5/thumbnails/10.jpg)
Will the naive distributed memory work for example 1?
Problem A
M0’s writes of v0 and done0 may be reordered by the network, leaving v0 unset at a reader while done0=1
How to fix?
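Problem A can be reproduced with a small model of the naive design. This is only a sketch under stated assumptions: the class NaiveDSM, its method names, and the value 42 standing in for f0() are hypothetical, and the "network" is a list from which the test picks the delivery order by hand.

```python
# Hypothetical model of the naive DSM: every machine keeps a full local copy,
# and a write is applied locally, then sent to the other machines without waiting.
class NaiveDSM:
    def __init__(self, n_machines):
        self.mem = [dict() for _ in range(n_machines)]  # one local copy per machine
        self.in_flight = []  # undelivered (destination, variable, value) updates

    def write(self, machine, var, value):
        self.mem[machine][var] = value
        for dest in range(len(self.mem)):
            if dest != machine:
                self.in_flight.append((dest, var, value))

    def read(self, machine, var):
        return self.mem[machine].get(var, 0)  # unset locations read as 0

    def deliver(self, index):
        # Deliver one chosen message; the network is free to pick any of them.
        dest, var, value = self.in_flight.pop(index)
        self.mem[dest][var] = value


dsm = NaiveDSM(3)
dsm.write(0, "v0", 42)    # M0: v0 = f0()   (42 is a stand-in value)
dsm.write(0, "done0", 1)  # M0: done0 = 1

# The network happens to deliver the done0 update to M1 before the v0 update.
dsm.deliver(dsm.in_flight.index((1, "done0", 1)))

m1_done0 = dsm.read(1, "done0")  # M1 leaves its wait loop
m1_v0 = dsm.read(1, "v0")        # but v0 has not arrived yet
print(m1_done0, m1_v0)           # prints "1 0"
```

M1 would now compute f1() on an unset v0, which is exactly the behavior the slide warns about.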
![Page 11: Distributed Shared Memory and Sequential Consistency](https://reader031.vdocuments.site/reader031/viewer/2022032311/56649dc85503460f94abe88a/html5/thumbnails/11.jpg)
Will naive distributed memory work for example 1?
Problem B
M2 sees M1’s writes before M0’s writes, i.e. M2 and M1 disagree on the order of M0’s and M1’s writes.
How to fix?
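Problem B can occur even if each sender's updates arrive in order. The sketch below assumes one FIFO channel per (sender, receiver) pair, which is one possible fix for problem A and is not something the slides prescribe; FifoDSM and the values 7 and v0+1 are hypothetical stand-ins for f0() and f1().

```python
from collections import deque


class FifoDSM:
    """Naive DSM variant with one FIFO channel per (sender, receiver) pair."""

    def __init__(self, n):
        self.mem = [dict() for _ in range(n)]
        self.chan = {(s, d): deque() for s in range(n) for d in range(n) if s != d}

    def write(self, src, var, value):
        self.mem[src][var] = value
        for (s, _d), queue in self.chan.items():
            if s == src:
                queue.append((var, value))

    def read(self, machine, var):
        return self.mem[machine].get(var, 0)

    def deliver_all(self, src, dst):
        # Drain one channel in FIFO order.
        queue = self.chan[(src, dst)]
        while queue:
            var, value = queue.popleft()
            self.mem[dst][var] = value


dsm = FifoDSM(3)
dsm.write(0, "v0", 7)
dsm.write(0, "done0", 1)
dsm.deliver_all(0, 1)                       # M1 sees v0, then done0
dsm.write(1, "v1", dsm.read(1, "v0") + 1)   # M1: v1 = f1(v0)
dsm.write(1, "done1", 1)
dsm.deliver_all(1, 2)                       # the M1 -> M2 link is fast
# The M0 -> M2 link is slow: nothing from M0 has reached M2 yet.
m2_view = (dsm.read(2, "done1"), dsm.read(2, "v1"), dsm.read(2, "v0"))
print(m2_view)  # (1, 8, 0)
```

M2 leaves its wait loop with a valid v1 but an unset v0, so per-sender ordering alone does not give all machines the same view.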
![Page 12: Distributed Shared Memory and Sequential Consistency](https://reader031.vdocuments.site/reader031/viewer/2022032311/56649dc85503460f94abe88a/html5/thumbnails/12.jpg)
Naive distributed memory is fast
But it has unexpected behavior
Maybe it is not “correct”?
Or maybe we should never have expected example 1 to work.
So?
How can we write correct distributed programs with shared storage?
◦ The memory system promises to behave according to certain rules.
◦ We write programs assuming those rules.
◦ The rules are a “consistency model”.
◦ This is the contract between the memory system and the programmer.
![Page 13: Distributed Shared Memory and Sequential Consistency](https://reader031.vdocuments.site/reader031/viewer/2022032311/56649dc85503460f94abe88a/html5/thumbnails/13.jpg)
What makes a good consistency model?
There is no “right” or “wrong” consistency model.
A model may be harder to program against but offer good efficiency.
A model may be easier to program against but perform poorly.
Some consistency models can produce astonishing results.
Applications might use different kinds of memory models, such as Web pages or shared memory, according to the type of application.
![Page 14: Distributed Shared Memory and Sequential Consistency](https://reader031.vdocuments.site/reader031/viewer/2022032311/56649dc85503460f94abe88a/html5/thumbnails/14.jpg)
Strict Consistency
How do we define strict consistency?
◦ Suppose we can tag each operation with a timestamp (global time).
◦ Suppose each operation completes instantaneously.
Thus:
◦ A read returns the most recently written value.
◦ This is what uniprocessors support.
![Page 15: Distributed Shared Memory and Sequential Consistency](https://reader031.vdocuments.site/reader031/viewer/2022032311/56649dc85503460f94abe88a/html5/thumbnails/15.jpg)
Strict Consistency
This follows strict consistency:
◦ a=1; a=2; print a; always prints the most recently written value of a (2)
Is this strictly consistent?
◦ P0: w(x)1
◦ P1: r(x)0 r(x)1
Strict consistency is a very intuitive consistency model.
So, would strict consistency avoid problems A and B?
![Page 16: Distributed Shared Memory and Sequential Consistency](https://reader031.vdocuments.site/reader031/viewer/2022032311/56649dc85503460f94abe88a/html5/thumbnails/16.jpg)
Implementation of Strict Consistency Model
How is R@2 aware of W@1?
How does W@4 know to pause until R@3 has finished? How long to wait?
This is too hard to implement.
![Page 17: Distributed Shared Memory and Sequential Consistency](https://reader031.vdocuments.site/reader031/viewer/2022032311/56649dc85503460f94abe88a/html5/thumbnails/17.jpg)
Sequential Consistency
Sequential consistency (serializability): the results are the same as if the operations of all processors were executed in some interleaved sequential order, with the operations of each individual processor appearing in the order specified by its program.
Example of a sequentially consistent execution (not strictly consistent, because it violates physical-time ordering):
P1: W(x)1
P2: R(x)0 R(x)1
Sequential consistency is inefficient: we want to weaken the model further
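The definition can be checked by brute force on tiny histories such as the P1/P2 example. This is a sketch, not a practical checker: the function names and the tuple encoding ("W" or "R", variable, value) are assumptions made for illustration, unwritten locations are assumed to read as 0, and the search is exponential in the number of operations.

```python
def interleavings(seqs):
    """Yield every merge of the per-process lists that keeps each in program order."""
    if all(len(s) == 0 for s in seqs):
        yield []
        return
    for i, s in enumerate(seqs):
        if s:
            rest = seqs[:i] + [s[1:]] + seqs[i + 1:]
            for tail in interleavings(rest):
                yield [s[0]] + tail


def legal(order):
    """True if every read returns the most recent write in this total order."""
    mem = {}
    for kind, var, value in order:
        if kind == "W":
            mem[var] = value
        elif mem.get(var, 0) != value:
            return False
    return True


def sequentially_consistent(history):
    return any(legal(order) for order in interleavings(history))


p1 = [("W", "x", 1)]
p2 = [("R", "x", 0), ("R", "x", 1)]
ok = sequentially_consistent([p1, p2])   # True: order R(x)0, W(x)1, R(x)1

p2_reversed = [("R", "x", 1), ("R", "x", 0)]
bad = sequentially_consistent([p1, p2_reversed])  # False: no legal total order
print(ok, bad)  # True False
```

The second history fails because once R(x)1 has been placed after W(x)1, nothing later can make x read as 0 again.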
![Page 18: Distributed Shared Memory and Sequential Consistency](https://reader031.vdocuments.site/reader031/viewer/2022032311/56649dc85503460f94abe88a/html5/thumbnails/18.jpg)
What does sequential consistency imply?
Sequential consistency defines a total order of operations:
◦ Within each machine, the operations (instructions) appear in the total order in the order defined by the program. The results are defined by the total order.
◦ All machines see results consistent with the total order (all machines agree on the order of operations applied to the shared memory). All reads see the most recent write in the total order. All machines see the same total order.
Sequential consistency performs better than strict consistency
◦ The system has some freedom in how to interleave operations from different machines.
◦ It is not forced to order by operation time (as in the strict consistency model) and can delay a read or write while it finds current values.
![Page 19: Distributed Shared Memory and Sequential Consistency](https://reader031.vdocuments.site/reader031/viewer/2022032311/56649dc85503460f94abe88a/html5/thumbnails/19.jpg)
Problems A and B under sequential consistency
Problem A
◦ M0's execution order was v0= done0=
◦ M1 saw done0= v0=
Each machine's operations must appear in execution order, so this cannot happen with sequential consistency.
Problem B
◦ M1 saw v0= done0= done1=
◦ M2 saw done1= v0=
This cannot occur given a single total order, so this cannot happen with sequential consistency.
![Page 20: Distributed Shared Memory and Sequential Consistency](https://reader031.vdocuments.site/reader031/viewer/2022032311/56649dc85503460f94abe88a/html5/thumbnails/20.jpg)
The performance bottleneck
Once a machine’s write completes, other machines’ reads must see the new data.
Thus communication cannot be omitted or much delayed.
Thus either reads or writes (or both) will be expensive.
![Page 21: Distributed Shared Memory and Sequential Consistency](https://reader031.vdocuments.site/reader031/viewer/2022032311/56649dc85503460f94abe88a/html5/thumbnails/21.jpg)
The implementation of sequential consistency
Use a single server. Each machine sends its read/write operations to the server, where they are queued.
(Each machine must send its operations in program order, and the server must queue them in that order.)
The server picks an order among the waiting operations.
The server executes them one by one, sending back the replies.
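The single-server design can be sketched as follows. MemoryServer and m1_reads are hypothetical names, the random choice stands in for "the server picks an order", and machines submit all their operations up front rather than blocking on each reply, which a real client would do.

```python
from collections import deque
import random


class MemoryServer:
    """Single server: one FIFO queue per machine, operations executed one at a time."""

    def __init__(self, n_machines):
        self.mem = {}
        self.queues = [deque() for _ in range(n_machines)]
        self.log = []  # (machine, op, var, value, reply) in execution order

    def submit(self, machine, op, var, value=None):
        self.queues[machine].append((op, var, value))

    def run(self, rng):
        while any(self.queues):
            ready = [m for m, q in enumerate(self.queues) if q]
            m = rng.choice(ready)  # any machine may go next
            op, var, value = self.queues[m].popleft()  # but in its own program order
            if op == "W":
                self.mem[var] = value
                reply = None
            else:
                reply = self.mem.get(var, 0)
            self.log.append((m, op, var, value, reply))
        return self.log


def m1_reads(seed):
    server = MemoryServer(2)
    server.submit(0, "W", "v0", 5)
    server.submit(0, "W", "done0", 1)
    server.submit(1, "R", "done0")
    server.submit(1, "R", "v0")
    log = server.run(random.Random(seed))
    replies = {var: reply for m, op, var, _value, reply in log if m == 1}
    return replies["done0"], replies["v0"]


results = {m1_reads(seed) for seed in range(50)}
# Whatever order the server picks, M1 never sees done0=1 together with an unset v0.
print((1, 0) in results)  # False
```

Because every operation passes through one queue discipline and one memory, the execution log itself is the total order that sequential consistency requires.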
![Page 22: Distributed Shared Memory and Sequential Consistency](https://reader031.vdocuments.site/reader031/viewer/2022032311/56649dc85503460f94abe88a/html5/thumbnails/22.jpg)
Performance problem of the simple implementation
A single server will soon become overloaded.
No local cache! Every operation waits for a reply from the server. (This is a severe performance killer for multicore processors.)
So:
◦ Partition memory across multiple servers to eliminate the single-server bottleneck.
◦ Can serve many machines in parallel if they don’t use the same memory.
Lamport's 1979 paper shows that a system is sequentially consistent if:
◦ 1. each machine executes one operation at a time, waiting for it to complete.
◦ 2. each memory location executes operations one at a time.
i.e. you can have lots of independent machines and memory systems.
![Page 23: Distributed Shared Memory and Sequential Consistency](https://reader031.vdocuments.site/reader031/viewer/2022032311/56649dc85503460f94abe88a/html5/thumbnails/23.jpg)
Distributed shared memory
If a memory location is not being written, it can be replicated, i.e. cached on each machine so that reads are fast.
But we have to ensure that reads and writes are ordered
◦ Once a write modifies the location, no read should return the old value.
◦ Must revoke cached copies before writing.
◦ This delays writes to improve read performance.
![Page 24: Distributed Shared Memory and Sequential Consistency](https://reader031.vdocuments.site/reader031/viewer/2022032311/56649dc85503460f94abe88a/html5/thumbnails/24.jpg)
IVY: memory coherence in shared virtual memory systems (Kai Li and Paul Hudak)
![Page 25: Distributed Shared Memory and Sequential Consistency](https://reader031.vdocuments.site/reader031/viewer/2022032311/56649dc85503460f94abe88a/html5/thumbnails/25.jpg)
IVY: Distributed Shared Memory
IVY connects multiple desktops/servers through a LAN and provides the illusion of a single powerful machine.
A single powerful machine: one machine with shared memory where all CPUs are visible to the applications.
Applications can use multi-threaded programming concepts and harness the power of many machines!
Applications do not need to communicate explicitly (unlike MPI).
![Page 26: Distributed Shared Memory and Sequential Consistency](https://reader031.vdocuments.site/reader031/viewer/2022032311/56649dc85503460f94abe88a/html5/thumbnails/26.jpg)
Page operations
IVY operates on pages of memory, stored in machine DRAM (there is no memory server, unlike the single-server implementation).
Uses VM (virtual memory) hardware to intercept reads/writes.
Let’s build the IVY system step by step.
![Page 27: Distributed Shared Memory and Sequential Consistency](https://reader031.vdocuments.site/reader031/viewer/2022032311/56649dc85503460f94abe88a/html5/thumbnails/27.jpg)
Simplified IVY
Only one copy of a page at a time (on only one machine)
All other copies are marked invalid in the VM tables
If M0 faults on PageX (either read or write):
◦ Find the one copy, e.g. on M1
◦ Invalidate PageX on M1
◦ Move PageX to M0
◦ M0 marks the page R/W in its VM tables
Provides sequential consistency: the order of reads/writes can be set by the order in which the page moves.
Slow: consider an application that performs many reads without any writes; this mechanism requires many faults and page moves.
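The single-copy scheme and its cost for read-heavy sharing can be sketched for one page. SimplifiedIVY and its fields are hypothetical names, and the page-fault mechanism is modeled as an explicit access check rather than real VM hardware.

```python
class SimplifiedIVY:
    """One page, one copy: any access by a non-holder moves the page."""

    def __init__(self, n_machines, initial_holder=0):
        self.holder = initial_holder
        self.page = {}                      # contents of the single page
        self.access = ["nil"] * n_machines  # per-machine VM table entry
        self.access[initial_holder] = "rw"
        self.moves = 0

    def _fault(self, machine):
        self.access[self.holder] = "nil"    # invalidate at the current holder
        self.holder = machine               # move the page
        self.access[machine] = "rw"         # new holder marks it R/W
        self.moves += 1

    def read(self, machine, var):
        if self.access[machine] == "nil":
            self._fault(machine)
        return self.page.get(var, 0)

    def write(self, machine, var, value):
        if self.access[machine] == "nil":
            self._fault(machine)
        self.page[var] = value


ivy = SimplifiedIVY(3)
ivy.write(0, "x", 1)  # M0 already holds the page: no move
values = []
for _ in range(3):    # M1 and M2 take turns reading, with no writes at all
    values.append(ivy.read(1, "x"))
    values.append(ivy.read(2, "x"))
print(values, ivy.moves)  # [1, 1, 1, 1, 1, 1] 6
```

Every one of the six reads returns the right value, but each one also costs a fault and a page move, which motivates allowing multiple read-only copies.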
![Page 28: Distributed Shared Memory and Sequential Consistency](https://reader031.vdocuments.site/reader031/viewer/2022032311/56649dc85503460f94abe88a/html5/thumbnails/28.jpg)
Multiple reads in IVY
IVY allows multiple reader copies between writes.
No need to force an order for reads that occur between two writes.
IVY puts a copy of the page at each reader, so the reads can be performed concurrently.
![Page 29: Distributed Shared Memory and Sequential Consistency](https://reader031.vdocuments.site/reader031/viewer/2022032311/56649dc85503460f94abe88a/html5/thumbnails/29.jpg)
IVY core strategy
Either:
◦ multiple read-only copies and no writeable copies,
◦ or one writeable copy and no other copies
Before a write, invalidate all other copies
Must track the one writer (owner) and the copies (copy_set)
![Page 30: Distributed Shared Memory and Sequential Consistency](https://reader031.vdocuments.site/reader031/viewer/2022032311/56649dc85503460f94abe88a/html5/thumbnails/30.jpg)
Why is it crucial to invalidate all copies before a write?
Once a write completes, all subsequent reads *must* see the new data. Otherwise different machines might see different orders.
If one could read stale data, this could occur:
M0: wv=0 wv=99 wdone=1
M1: rv=0 rdone=1 rv=0
But we know that can't happen with sequential consistency.
![Page 31: Distributed Shared Memory and Sequential Consistency](https://reader031.vdocuments.site/reader031/viewer/2022032311/56649dc85503460f94abe88a/html5/thumbnails/31.jpg)
IVY Implementation
Manager: the process that tracks which machine owns each page. The manager acts like a map that helps a process find the page it needs. In IVY, the manager can be either fixed or dynamic.
Owner: the owner of a page has the write privilege; all other processes have read-only privilege.
copy_set: records the copies of a specific page. If the page is read-only, the copy_set lists the copies of the page together with their locations. If the page is writable, the copy_set contains only one entry, i.e. the owner.
![Page 32: Distributed Shared Memory and Sequential Consistency](https://reader031.vdocuments.site/reader031/viewer/2022032311/56649dc85503460f94abe88a/html5/thumbnails/32.jpg)
IVY Messages
RQ (read query, reader to MGR)
RF (read forward, MGR to owner)
RD (read data, owner to reader)
RC (read confirm, reader to MGR)
WQ (write query, writer to MGR)
IV (invalidate, MGR to copy_set)
IC (invalidate confirm, copy_set to MGR)
WF (write forward, MGR to owner)
WD (write data, owner to writer)
WC (write confirm, writer to MGR)
![Page 33: Distributed Shared Memory and Sequential Consistency](https://reader031.vdocuments.site/reader031/viewer/2022032311/56649dc85503460f94abe88a/html5/thumbnails/33.jpg)
Scenarios
scenario 1: M0 has writeable copy, M1 wants to read
◦ 0. page fault on M1, since page must have been marked invalid
◦ 1. M1 sends RQ to MGR
◦ 2. MGR sends RF to M0, MGR adds M1 to copy_set
◦ 3. M0 marks page as access=read, sends RD to M1
◦ 4. M1 marks access=read, sends RC to MGR
scenario 2: now M2 wants to write
◦ 0. page fault on M2
◦ 1. M2 sends WQ to MGR
◦ 2. MGR sends IV to copy_set (i.e. M1)
◦ 3. M1 sends IC msg to MGR
◦ 4. MGR sends WF to M0, sets owner=M2, copy_set={}
◦ 5. M0 sends WD to M2, access=none
◦ 6. M2 marks r/w, sends WC to MGR
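The two scenarios can be replayed with a small single-page model that records the messages exchanged. This is a sketch only: the class IVY and its methods are hypothetical, messages are delivered synchronously in call order, and the per-entry locks are left out.

```python
class IVY:
    """One page under the manager protocol; locks are omitted in this sketch."""

    def __init__(self, n_machines, owner):
        self.access = ["nil"] * n_machines  # ptable access on each machine
        self.access[owner] = "W"
        self.owner = owner                  # MGR info: owner
        self.copy_set = set()               # MGR info: machines with read-only copies
        self.trace = []                     # (message, sender, receiver)

    def _send(self, msg, src, dst):
        self.trace.append((msg, src, dst))

    def read_fault(self, reader):
        self._send("RQ", reader, "MGR")
        self._send("RF", "MGR", self.owner)
        self.copy_set.add(reader)
        self.access[self.owner] = "R"       # owner downgrades to read-only
        self._send("RD", self.owner, reader)
        self.access[reader] = "R"
        self._send("RC", reader, "MGR")

    def write_fault(self, writer):
        self._send("WQ", writer, "MGR")
        for holder in sorted(self.copy_set):  # invalidate every read-only copy first
            self._send("IV", "MGR", holder)
            self.access[holder] = "nil"
            self._send("IC", holder, "MGR")
        old_owner = self.owner
        self._send("WF", "MGR", old_owner)
        self.owner, self.copy_set = writer, set()
        self.access[old_owner] = "nil"
        self._send("WD", old_owner, writer)
        self.access[writer] = "W"
        self._send("WC", writer, "MGR")


ivy = IVY(3, owner=0)
ivy.read_fault(1)    # scenario 1: M1 wants to read
ivy.write_fault(2)   # scenario 2: M2 wants to write
names = [msg for msg, _src, _dst in ivy.trace]
print(names)
print(ivy.access, ivy.owner, ivy.copy_set)
```

The trace follows the message order listed in the scenarios, and the final state has M2 as the only machine with access to the page.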
![Page 34: Distributed Shared Memory and Sequential Consistency](https://reader031.vdocuments.site/reader031/viewer/2022032311/56649dc85503460f94abe88a/html5/thumbnails/34.jpg)
Per-page state on CPU0, CPU1, and CPU2 / MGR:
ptable (all CPUs): lock, access, owner?
◦ access: R, W, or nil
◦ owner?: T or F
info (MGR only): lock, copy_set, owner
◦ copy_set: list of CPUs with read-only copies
◦ owner: CPU that can write page
![Page 35: Distributed Shared Memory and Sequential Consistency](https://reader031.vdocuments.site/reader031/viewer/2022032311/56649dc85503460f94abe88a/html5/thumbnails/35.jpg)
Operation: read. Messages so far: none.
ptable (lock, access, owner?):
◦ CPU0: F, W, T
◦ CPU1: F, nil, F
◦ CPU2 / MGR: F, nil, F
info at MGR (lock, copy_set, owner): F, {}, CPU0
![Page 36: Distributed Shared Memory and Sequential Consistency](https://reader031.vdocuments.site/reader031/viewer/2022032311/56649dc85503460f94abe88a/html5/thumbnails/36.jpg)
Operation: read. Messages so far: RQ.
ptable (lock, access, owner?):
◦ CPU0: F, W, T
◦ CPU1: T, nil, F
◦ CPU2 / MGR: F, nil, F
info at MGR (lock, copy_set, owner): F, {}, CPU0
![Page 37: Distributed Shared Memory and Sequential Consistency](https://reader031.vdocuments.site/reader031/viewer/2022032311/56649dc85503460f94abe88a/html5/thumbnails/37.jpg)
Operation: read. Messages so far: RQ.
ptable (lock, access, owner?):
◦ CPU0: F, W, T
◦ CPU1: T, nil, F
◦ CPU2 / MGR: F, nil, F
info at MGR (lock, copy_set, owner): T, {}, CPU0
![Page 38: Distributed Shared Memory and Sequential Consistency](https://reader031.vdocuments.site/reader031/viewer/2022032311/56649dc85503460f94abe88a/html5/thumbnails/38.jpg)
Operation: read. Messages so far: RQ, RF.
ptable (lock, access, owner?):
◦ CPU0: F, W, T
◦ CPU1: T, nil, F
◦ CPU2 / MGR: F, nil, F
info at MGR (lock, copy_set, owner): T, {CPU1}, CPU0
![Page 39: Distributed Shared Memory and Sequential Consistency](https://reader031.vdocuments.site/reader031/viewer/2022032311/56649dc85503460f94abe88a/html5/thumbnails/39.jpg)
Operation: read. Messages so far: RQ, RF.
ptable (lock, access, owner?):
◦ CPU0: T, W, T
◦ CPU1: T, nil, F
◦ CPU2 / MGR: F, nil, F
info at MGR (lock, copy_set, owner): T, {CPU1}, CPU0
![Page 40: Distributed Shared Memory and Sequential Consistency](https://reader031.vdocuments.site/reader031/viewer/2022032311/56649dc85503460f94abe88a/html5/thumbnails/40.jpg)
Operation: read. Messages so far: RQ, RF, RD.
ptable (lock, access, owner?):
◦ CPU0: T, R, T
◦ CPU1: T, nil, F
◦ CPU2 / MGR: F, nil, F
info at MGR (lock, copy_set, owner): T, {CPU1}, CPU0
![Page 41: Distributed Shared Memory and Sequential Consistency](https://reader031.vdocuments.site/reader031/viewer/2022032311/56649dc85503460f94abe88a/html5/thumbnails/41.jpg)
Operation: read. Messages so far: RQ, RF, RD, RC.
ptable (lock, access, owner?):
◦ CPU0: F, R, T
◦ CPU1: T, nil, F
◦ CPU2 / MGR: F, nil, F
info at MGR (lock, copy_set, owner): T, {CPU1}, CPU0
![Page 42: Distributed Shared Memory and Sequential Consistency](https://reader031.vdocuments.site/reader031/viewer/2022032311/56649dc85503460f94abe88a/html5/thumbnails/42.jpg)
Operation: read. Messages so far: RQ, RF, RD, RC.
ptable (lock, access, owner?):
◦ CPU0: F, R, T
◦ CPU1: T, R, F
◦ CPU2 / MGR: F, nil, F
info at MGR (lock, copy_set, owner): T, {CPU1}, CPU0
![Page 43: Distributed Shared Memory and Sequential Consistency](https://reader031.vdocuments.site/reader031/viewer/2022032311/56649dc85503460f94abe88a/html5/thumbnails/43.jpg)
Operation: read. Messages so far: RQ, RF, RD, RC.
ptable (lock, access, owner?):
◦ CPU0: F, R, T
◦ CPU1: F, R, F
◦ CPU2 / MGR: F, nil, F
info at MGR (lock, copy_set, owner): T, {CPU1}, CPU0
![Page 44: Distributed Shared Memory and Sequential Consistency](https://reader031.vdocuments.site/reader031/viewer/2022032311/56649dc85503460f94abe88a/html5/thumbnails/44.jpg)
Operation: read (complete). Messages so far: RQ, RF, RD, RC.
ptable (lock, access, owner?):
◦ CPU0: F, R, T
◦ CPU1: F, R, F
◦ CPU2 / MGR: F, nil, F
info at MGR (lock, copy_set, owner): F, {CPU1}, CPU0
![Page 45: Distributed Shared Memory and Sequential Consistency](https://reader031.vdocuments.site/reader031/viewer/2022032311/56649dc85503460f94abe88a/html5/thumbnails/45.jpg)
Operation: write. Messages so far: none.
ptable (lock, access, owner?):
◦ CPU0: F, R, T
◦ CPU1: F, R, F
◦ CPU2 / MGR: F, nil, F
info at MGR (lock, copy_set, owner): F, {CPU1}, CPU0
![Page 46: Distributed Shared Memory and Sequential Consistency](https://reader031.vdocuments.site/reader031/viewer/2022032311/56649dc85503460f94abe88a/html5/thumbnails/46.jpg)
Operation: write. Messages so far: WQ.
ptable (lock, access, owner?):
◦ CPU0: F, R, T
◦ CPU1: F, R, F
◦ CPU2 / MGR: T, nil, F
info at MGR (lock, copy_set, owner): F, {CPU1}, CPU0
![Page 47: Distributed Shared Memory and Sequential Consistency](https://reader031.vdocuments.site/reader031/viewer/2022032311/56649dc85503460f94abe88a/html5/thumbnails/47.jpg)
Operation: write. Messages so far: WQ, IV.
ptable (lock, access, owner?):
◦ CPU0: F, R, T
◦ CPU1: F, R, F
◦ CPU2 / MGR: T, nil, F
info at MGR (lock, copy_set, owner): T, {CPU1}, CPU0
![Page 48: Distributed Shared Memory and Sequential Consistency](https://reader031.vdocuments.site/reader031/viewer/2022032311/56649dc85503460f94abe88a/html5/thumbnails/48.jpg)
Operation: write. Messages so far: WQ, IV, IC.
ptable (lock, access, owner?):
◦ CPU0: F, R, T
◦ CPU1: T, nil, F
◦ CPU2 / MGR: T, nil, F
info at MGR (lock, copy_set, owner): F, {CPU1}, CPU0
![Page 49: Distributed Shared Memory and Sequential Consistency](https://reader031.vdocuments.site/reader031/viewer/2022032311/56649dc85503460f94abe88a/html5/thumbnails/49.jpg)
Operation: write. Messages so far: WQ, IV, IC, WF.
ptable (lock, access, owner?):
◦ CPU0: F, R, T
◦ CPU1: F, nil, F
◦ CPU2 / MGR: T, nil, F
info at MGR (lock, copy_set, owner): T, {}, CPU0
![Page 50: Distributed Shared Memory and Sequential Consistency](https://reader031.vdocuments.site/reader031/viewer/2022032311/56649dc85503460f94abe88a/html5/thumbnails/50.jpg)
Operation: write. Messages so far: WQ, IV, IC, WF, WD.
ptable (lock, access, owner?):
◦ CPU0: T, nil, F
◦ CPU1: F, nil, F
◦ CPU2 / MGR: T, nil, F
info at MGR (lock, copy_set, owner): T, {}, CPU0
![Page 51: Distributed Shared Memory and Sequential Consistency](https://reader031.vdocuments.site/reader031/viewer/2022032311/56649dc85503460f94abe88a/html5/thumbnails/51.jpg)
Operation: write. Messages so far: WQ, IV, IC, WF, WD, WC.
ptable (lock, access, owner?):
◦ CPU0: F, nil, F
◦ CPU1: F, nil, F
◦ CPU2 / MGR: T, W, T
info at MGR (lock, copy_set, owner): T, {}, CPU0
![Page 52: Distributed Shared Memory and Sequential Consistency](https://reader031.vdocuments.site/reader031/viewer/2022032311/56649dc85503460f94abe88a/html5/thumbnails/52.jpg)
Operation: write. Messages so far: WQ, IV, IC, WF, WD, WC.
ptable (lock, access, owner?):
◦ CPU0: F, nil, F
◦ CPU1: F, nil, F
◦ CPU2 / MGR: F, W, T
info at MGR (lock, copy_set, owner): T, {}, CPU2
![Page 53: Distributed Shared Memory and Sequential Consistency](https://reader031.vdocuments.site/reader031/viewer/2022032311/56649dc85503460f94abe88a/html5/thumbnails/53.jpg)
What if Two CPUs Want to Write to the Same Page at the Same Time?
A write has several steps and modifies multiple tables.
Invariants for tables:
◦ MGR must agree with CPUs about single owner
◦ MGR must agree with CPUs about copy_set
◦ copy_set != {} must agree with read-only for owner
Write operation should thus be atomic!
What enforces atomicity?
![Page 54: Distributed Shared Memory and Sequential Consistency](https://reader031.vdocuments.site/reader031/viewer/2022032311/56649dc85503460f94abe88a/html5/thumbnails/54.jpg)
What if there were no RC message?
MGR unlocked after sending RF?
◦ could RF be overtaken by a subsequent WF?
◦ or does IV/IC + ptable[p].lock hold up any subsequent RF? but invalidate can't acquire ptable lock -- deadlock?
no IC?
◦ i.e. MGR didn't wait for holders of copies to ack?
no WC?
◦ e.g. MGR unlocked after sending WF to M0?
◦ MGR would send subsequent RF, WF to M2 (new owner)
◦ What if such a WF/RF arrived at M2 before WD?
◦ No problem! M2 has ptable[p].lock locked until it gets WD
◦ RC + info[p].lock prevents RF from being overtaken by a WF
◦ so it's not clear why WC is needed! but I am not confident in this conclusion.
![Page 55: Distributed Shared Memory and Sequential Consistency](https://reader031.vdocuments.site/reader031/viewer/2022032311/56649dc85503460f94abe88a/html5/thumbnails/55.jpg)
Does IVY provide strict consistency?
no: MGR might process two Ws in the order opposite to their issue times
no: W may take a long time to revoke read access on other machines
◦ so Rs may get old data long after the W issues.
![Page 56: Distributed Shared Memory and Sequential Consistency](https://reader031.vdocuments.site/reader031/viewer/2022032311/56649dc85503460f94abe88a/html5/thumbnails/56.jpg)
Performance
In what situations will IVY perform well?
1. Page read by many machines, written by none
2. Page written by just one machine at a time, not used at all by others
Cool that IVY moves pages around in response to changing use patterns
![Page 57: Distributed Shared Memory and Sequential Consistency](https://reader031.vdocuments.site/reader031/viewer/2022032311/56649dc85503460f94abe88a/html5/thumbnails/57.jpg)
What about the page size?
Will a page size of e.g. 4096 bytes be good or bad?
good if spatial locality, i.e. program looks at large blocks of data
bad if program writes just a few bytes in a page
◦ subsequent readers copy whole page just to get a few new bytes
bad if false sharing
◦ i.e. two unrelated variables on the same page and at least one is frequently written
◦ page will bounce between different machines
◦ even read-only users of a non-changing variable will get invalidations
◦ even though those computers never use the same location
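The cost of false sharing can be estimated by counting page moves. The sketch below assumes a single writable copy per page that moves to whichever machine writes next; page_transfers, the byte addresses, and the choice of machine 0 as initial owner are all hypothetical.

```python
PAGE_SIZE = 4096


def page_transfers(accesses, initial_owner=0):
    """Count page moves for a sequence of (machine, byte_address) writes."""
    owner = {}  # page number -> machine currently holding the writable copy
    transfers = 0
    for machine, addr in accesses:
        page = addr // PAGE_SIZE
        if owner.get(page, initial_owner) != machine:
            transfers += 1  # page must move to the writing machine
        owner[page] = machine
    return transfers


# M0 writes variable a and M1 writes variable b, alternating 100 times each.
same_page = [(m, 0 if m == 0 else 8) for _ in range(100) for m in (0, 1)]
separate_pages = [(m, 0 if m == 0 else PAGE_SIZE) for _ in range(100) for m in (0, 1)]

shared = page_transfers(same_page)       # a and b share page 0
split = page_transfers(separate_pages)   # b lives on its own page
print(shared, split)  # 199 1
```

The two machines never touch the same location, yet placing the variables on one page turns almost every write into a page move.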
![Page 58: Distributed Shared Memory and Sequential Consistency](https://reader031.vdocuments.site/reader031/viewer/2022032311/56649dc85503460f94abe88a/html5/thumbnails/58.jpg)
Discussions
What about IVY's performance?
◦ after all, the point was speedup via parallelism
What's the best we could hope for in terms of performance?
◦ Nx faster on N machines
What might prevent us from getting Nx speedup?
◦ Network traffic (moving lots of pages)
◦ locks
◦ Many machines writing the same page
◦ application is inherently non-scalable
![Page 59: Distributed Shared Memory and Sequential Consistency](https://reader031.vdocuments.site/reader031/viewer/2022032311/56649dc85503460f94abe88a/html5/thumbnails/59.jpg)
Summary
There must exist a total order of operations such that:
◦ All CPUs see results consistent with that total order (i.e., LDs see the most recent ST in the total order)
◦ Each CPU’s instructions appear in order in the total order
Two rules are sufficient to implement sequential consistency [Lamport, 1979]:
◦ Each CPU must execute reads and writes in program order, one at a time
◦ Each memory location must execute reads and writes in arrival order, one at a time
![Page 60: Distributed Shared Memory and Sequential Consistency](https://reader031.vdocuments.site/reader031/viewer/2022032311/56649dc85503460f94abe88a/html5/thumbnails/60.jpg)
Thank you! Any Questions?