Chapter-IV: Distributed Shared Memory (DSM)
Introduction
There are two IPC paradigms in a distributed operating system (DOS):
1. Message passing and RPC
2. Distributed Shared Memory (DSM)
DSM provides processes in the system with a shared address space. Processes use this address space in the same way as they use local memory. The primitives used are:
1. data = Read(address)
2. Write(address, data)
Shared memory is natural in a tightly coupled system. In a loosely coupled system there is no physically shared memory available to support the DSM paradigm; the term DSM therefore refers to the shared-memory paradigm applied to loosely coupled distributed-memory systems (a form of virtual memory).
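The two primitives above can be pictured as a minimal interface. The following is a toy, in-process sketch (the class name and the dict-backed storage are illustrative assumptions, not a real DSM implementation):

```python
# Minimal sketch of the two DSM primitives, assuming a toy in-process
# "shared address space" backed by a dict. All names are hypothetical.

class DsmAddressSpace:
    """Toy shared address space accessed only via read/write primitives."""

    def __init__(self):
        self._memory = {}  # address -> data

    def read(self, address):
        # data = Read(address); unwritten addresses read as 0 here
        return self._memory.get(address, 0)

    def write(self, address, data):
        # Write(address, data)
        self._memory[address] = data

dsm = DsmAddressSpace()
dsm.write(0x10, 42)
print(dsm.read(0x10))  # -> 42
```

The point is only that processes see ordinary read/write operations; everything else in this chapter is about how such an interface is realized across nodes.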
[Figure: DSM block diagram (architecture). Each node (Node-1, Node-2, …, Node-n) contains CPUs, a local memory, and a memory-mapping manager; the nodes are connected by a communication network and together present the (virtual) Distributed Shared Memory.]
DSM architecture
DSM provides a virtual address space shared among processes on loosely coupled processors. DSM is essentially an abstraction that integrates the local memory of different machines in a network environment into a single logical entity shared by cooperating processes executing on multiple sites.
The shared memory exists only virtually; hence it is also called Distributed Shared Virtual Memory (DSVM). In DSM, each node of the system has one or more CPUs and a memory unit. The nodes are connected by a high-speed communication network. The DSM abstraction presents a large shared-memory space to the processors of all nodes. A software memory-mapping manager routine in each node maps the local memory onto the shared virtual memory. To facilitate the mapping operation, the shared-memory space is partitioned into blocks. The idea of data caching is used to reduce network latency: the main memory of the individual nodes is used to cache blocks of the shared-memory space.
DSM operation
When a process on a node wants to access some data from a memory block of the shared-memory space:
1. The local memory-mapping manager takes charge of the request.
2. If the memory block containing the accessed data is resident in local memory, the request is satisfied locally.
3. Otherwise, a network block fault is generated and control is passed to the operating system.
4. The OS then sends a message to the node on which the desired memory block is located to get the block.
5. The missing block is migrated from the remote node to the client process's node, and the OS maps it into the application's address space.
6. Data blocks keep migrating from one node to another on demand, but no communication is visible to the user processes.
7. Copies of data cached in local memory reduce network traffic.
8. DSM allows replication/migration of data blocks.
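The fault-handling steps above can be sketched as follows. This is a deliberately simplified single-threaded model (the `Node`/`DsmSystem` classes and the dict-based bookkeeping are assumptions for illustration only):

```python
# Sketch of the block-fault sequence: a request is satisfied locally when
# the block is resident, otherwise the block migrates from its current
# holder to the requesting node. All names are hypothetical.

class Node:
    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block_id -> data resident at this node

class DsmSystem:
    def __init__(self, nodes):
        self.nodes = nodes
        self.location = {}  # block_id -> node currently holding the block

    def access(self, node, block_id):
        # Steps 1-2: the local memory-mapping manager checks residency.
        if block_id in node.blocks:
            return node.blocks[block_id]
        # Steps 3-5: network block fault; the OS fetches the block.
        owner = self.location[block_id]
        data = owner.blocks.pop(block_id)   # block leaves the remote node
        node.blocks[block_id] = data        # mapped into the local space
        self.location[block_id] = node      # ownership moves with the block
        return data

a, b = Node("A"), Node("B")
system = DsmSystem([a, b])
b.blocks["blk0"] = [0] * 1024
system.location["blk0"] = b
system.access(a, "blk0")             # fault: blk0 migrates from B to A
print(system.location["blk0"].name)  # -> A
```

None of this machinery is visible to the user process, which only sees the `access` call succeed.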
Design and implementation issues of DSM
1. Granularity: Refers to the block size of a DSM system, i.e., the unit of sharing or the unit of data transfer across the network. The chosen block size determines both the granularity of parallelism and the amount of network traffic generated by network block faults.
2. Structure of shared-memory space: Refers to the layout of the shared data in memory, which depends on the type of application the DSM system is intended to support.
3. Memory coherence and access synchronization: In a DSM system, replicas/copies of shared data may exist at several nodes, which makes it difficult to keep accesses to the data at any particular node coherent/consistent. Concurrent access to shared data requires synchronization primitives such as semaphores, event counts, locks, etc.
4. Data location and access: To share data in DSM, it is necessary to locate and retrieve the data accessed by a user process.
5. Replacement strategy: If the local memory of a node is full, a data block in local memory must be replaced to make room for an incoming shared data block, so a cache replacement strategy is necessary in DSM.
6. Thrashing: In a DSM system, data blocks migrate between nodes on demand. If two nodes compete for write access to a single data item, the corresponding data block is transferred back and forth at such a high rate that no real work can get done. This situation is called thrashing; a DSM system should use a policy that avoids it.
7. Heterogeneity: The DSM system should be designed to handle heterogeneity so that it functions properly across machines with different architectures.
Granularity
Factors influencing block-size selection:
1. Paging overhead: Shared-memory programs exhibit locality of reference, so a process is likely to access a large region of its shared address space in a small amount of time. Paging overhead is therefore lower for large block sizes than for small ones.
2. Directory size: The larger the block size, the smaller the directory, which reduces directory-management overhead.
3. Thrashing: The thrashing problem may occur with any block size, but it is more likely with large block sizes, since different regions in the same block may be updated by processes on different nodes.
4. False sharing: This occurs when two different processes access two unrelated variables that reside in the same data block. In this situation, even though the original variables are not shared, the data block appears to be shared by the two processes. The larger the block size, the higher the probability of false sharing; false sharing of a block may lead to a thrashing problem.

Using page size as block size
A suitable compromise in granularity adopted in DSM systems is to use the typical page size of a virtual-memory system as the block size.
Advantages:
1. It allows the use of existing page-fault handling schemes.
2. The memory-coherence problem may be resolved at the page level.
3. It allows access control to be readily integrated into the functionality of the memory-management unit.
4. As long as a page can fit into a packet, the page size (data block size) does not impose undue communication overhead.
5. Memory contention is reduced.
[Figure: False sharing: processes p1 and p2 access data in different areas of the same data block.]
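Whether two variables falsely share a block is just a question of whether their addresses fall in the same block-sized region. A small sketch (the addresses and block sizes are illustrative values, not from any real system):

```python
# Sketch: two unrelated variables falsely share a block whenever their
# addresses land in the same block-sized region of the address space.

def block_of(address, block_size):
    return address // block_size

def falsely_shared(addr1, addr2, block_size):
    """True if two distinct variables land in the same DSM block."""
    return addr1 != addr2 and block_of(addr1, block_size) == block_of(addr2, block_size)

# With a 4 KB block, variables at 0x1010 and 0x1FF0 share block 1 ...
print(falsely_shared(0x1010, 0x1FF0, 4096))  # -> True
# ... but with a 1 KB block they do not. Larger blocks raise the
# probability of false sharing, as noted above.
print(falsely_shared(0x1010, 0x1FF0, 1024))  # -> False
```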
Structure of shared-memory space
The structure defines the abstract view of the shared-memory space presented to the application programmers of a DSM system. The space can be viewed as a storage for data objects.
Three approaches are used for structuring the shared-memory space:
1. No structuring: The shared-memory space is simply a linear array of words. A suitable page size is chosen as the unit of sharing, and a fixed grain size is used for all applications. This approach is simple, easy to design and implement, and works with any data structure.
2. Structuring by data type: The memory space is structured as a collection of data objects, with the granularity defined by the size of an object or variable. This complicates the design and implementation.
3. Structuring as a database: The shared-memory space is ordered as an associative memory, addressed by content rather than by name or address (a tuple space). Processes select tuples by specifying the number of their fields and their values or types. Access to shared data is non-transparent.
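The tuple-space idea (approach 3) can be sketched as follows. The `put`/`match` API is a hypothetical, heavily simplified model loosely inspired by Linda-style tuple spaces, not any particular system's interface:

```python
# Sketch of a tuple space: memory addressed by content, with processes
# selecting tuples by field count and by values or types.

tuple_space = []

def put(*fields):
    tuple_space.append(fields)

def match(*pattern):
    """Pattern fields are concrete values, or types acting as wildcards."""
    for t in tuple_space:
        if len(t) != len(pattern):
            continue  # field count must match exactly
        ok = all(isinstance(f, p) if isinstance(p, type) else f == p
                 for f, p in zip(t, pattern))
        if ok:
            return t
    return None

put("temperature", "room-1", 21.5)
put("temperature", "room-2", 19.0)
# Select by number of fields plus a mix of values and types:
print(match("temperature", "room-2", float))  # -> ('temperature', 'room-2', 19.0)
```

Note how the caller never names an address; the content itself is the key, which is why this style of access is non-transparent relative to ordinary memory.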
• Consistency models
A consistency model refers to the degree of consistency that has to be maintained for the shared-memory data in order to satisfy a set of applications.
1. Strict consistency model: The strongest form of memory coherence, with the most stringent consistency requirement. The value returned by a read operation on a memory address is always the same as the value written by the most recent write operation to that address, irrespective of the locations of the processes performing the read/write operations. All write operations are instantaneously visible to all processes.
2. Sequential consistency model: All processes see the same order of all memory-access operations on the shared memory. No new memory operation is started until all the previous ones have been completed. It uses single-copy (one-copy) semantics and is an acceptable consistency model for distributed applications.
3. Causal consistency model: A memory reference operation (read or write) is said to be potentially causally related to another memory reference operation if the second one might have been influenced in any way by the first one (for example, a write operation that follows a read of a value produced by an earlier write). Write operations that are not potentially causally related (say w1, w2) may be seen in different orders by different processes. The system keeps track of which memory reference operations are dependent on which other memory reference operations.
4. Pipelined Random-Access Memory (PRAM) consistency model: It ensures that all write operations performed by a single process are seen by all other processes in the order in which they were performed, as if all the write operations of a single process were in a pipeline. It can be implemented by simply sequencing the write operations performed at each node independently. Since all write operations of a single process are pipelined, the model is simple, easy to implement, and has good performance.
5. Processor consistency model: It ensures that all write operations performed on the same memory location (no matter by which process they are performed) are seen by all processes in the same order. This enhances memory coherence: for any memory location, all processes agree on the same order of all write operations to that location.
6. Weak consistency model: It is not necessary to make the change performed by every write operation visible to the other processes; the results of several write operations can be combined and sent to other processes only when they need them. Isolated accesses to shared variables are rare, and it is expensive to propagate and keep track of every individual change. The model uses a special variable called a synchronization variable (SV) for memory synchronization; the SV is visible to all processes.
Conditions on accessing a synchronization variable (SV):
1. All accesses to synchronization variables must obey sequential consistency.
2. All previous write operations must be completed before an access to an SV is allowed.
3. All previous accesses to SVs must be completed before an access to any other variable is allowed.
7. Release consistency model: On a release, all changes made to the memory by the process are propagated to the other nodes; on an acquire, all changes made to the memory by other processes are propagated from the other nodes to the process's node. It uses two synchronization operations, acquire and release: acquire is used by a process to tell the system that it is entering a critical section, and release is used by a process to tell the system that it is exiting the critical section. A release-consistent system may also use a synchronization mechanism based on barriers: a barrier defines the end of a phase of execution of a group of concurrent processes, and no process is allowed to proceed past it until all processes in the group have reached it.
Conditions for the release consistency model:
1. All accesses to acquire and release operations must obey processor consistency.
2. All previous acquires performed by a process must be completed before the process accesses shared data.
3. All data accesses by a process must be completed before the process performs a release.
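The acquire/release idea can be sketched as buffered writes that become visible only at release. This is a single-threaded toy model under strong assumptions (no concurrency, a dict per node standing in for memory); the `RcNode` class and its fields are hypothetical:

```python
# Sketch of release consistency: writes inside a critical section are
# buffered locally and propagated to the other nodes only at release().

class RcNode:
    def __init__(self, name):
        self.name = name
        self.peers = []     # other nodes' visible memories
        self.visible = {}   # memory as other processes see it
        self.pending = {}   # local writes buffered until release

    def acquire(self):
        # Entering the critical section: nothing buffered yet.
        self.pending = {}

    def write(self, addr, value):
        self.pending[addr] = value  # combined, not yet visible elsewhere

    def release(self):
        # Exiting: propagate all buffered changes to every other node.
        for peer in self.peers:
            peer.visible.update(self.pending)
        self.visible.update(self.pending)
        self.pending = {}

a, b = RcNode("A"), RcNode("B")
a.peers, b.peers = [b], [a]
a.acquire()
a.write("x", 1)
a.write("x", 2)            # several writes combined into one update
print(b.visible.get("x"))  # -> None: nothing is visible before release
a.release()
print(b.visible["x"])      # -> 2: only the combined result propagates
```

The two consecutive writes to "x" travel as a single update, which is exactly the saving the weak and release models aim for.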
Implementing the sequential consistency model
• Sequential consistency is the most commonly used consistency model in DSM systems.
• The protocol for implementing it depends on whether the DSM system allows replication and/or migration of shared-memory blocks.
• The different strategies are:
1. Non-Replicated, Non-Migrating Blocks (NRNMBs)
2. Non-Replicated, Migrating Blocks (NRMBs)
3. Replicated, Migrating Blocks (RMBs)
4. Replicated, Non-Migrating Blocks (RNMBs)
1. Non-Replicated, Non-Migrating Blocks (NRNMBs)
This is the simplest strategy: each block of the shared memory has a single copy whose location is always fixed. All access requests to a block from any node are sent to the owner node of the block, which has the only copy of the block. On receiving a request from a client node, the MMU and OS of the owner node perform the access and return a response to the client.
Drawbacks: serializing data access creates a bottleneck, and parallelism is not possible because there is a single copy of each block in the system.
Data locating is trivial, since the location of a block never changes.
[Figure: the client node sends a request to the owner node, which returns a response.]
2. Non-Replicated, Migrating Blocks (NRMBs)
Each block of the shared memory has a single copy in the entire system; however, each access to a block causes the block to migrate from its current node to the node from which it is accessed. The owner node of a block changes as soon as the block migrates to a new node.
Advantage: high locality of reference. Drawbacks: the strategy is prone to thrashing, and parallelism is not possible.

Data locating in the NRMB strategy
There is a single copy of each block, and the location of a block changes dynamically. The following approaches are used:
1. Broadcasting: Each node maintains an owned-blocks table that contains an entry for each block for which the node is the current owner. On a fault, the faulting node broadcasts its request, and the current owner of the block responds.
[Figure: the client node sends a block request to the owner node, and the block migrates to the client node.]
[Figure: owned-blocks tables at Node-1, …, Node-m, separated by node boundaries; each table maps a block address to its current owner node.]
2. Centralized-server algorithm: A centralized server maintains a block table that contains the location information for all blocks in the shared-memory space. On a fault, the faulting node sends a request to the server; the server extracts the location information and forwards the request to that node. The location and identity of the centralized server are well known to all nodes.
Drawback: failure of the server causes the DSM system to stop functioning.
[Figure: the centralized server's block table, with an entry for each block mapping the block address to its owner node (which changes dynamically); Node-1, …, Node-m lie across node boundaries.]
3. Fixed distributed-server algorithm: A direct extension of the centralized-server scheme. It overcomes the problems of the centralized server by distributing the server's role: there is a block manager on each of several nodes, and each block manager is given a predetermined subset of the data blocks to manage.
[Figure: block tables at Node-1, Node-2, …, Node-m, separated by node boundaries; each table contains an entry for each managed block, mapping the block address to its owner node (dynamic).]
4. Dynamic distributed-server algorithm: It does not use any block manager; instead, every node attempts to keep track of the ownership information of all blocks. Each node has a block table that contains, for every block, a probable-owner field. This field gives the node a hint about the location of the block's owner; if the hint is wrong, the request is forwarded along the chain of probable owners until the true owner is reached.
[Figure: block tables at Node-1, Node-2, …, Node-m; each table contains an entry for each block, mapping the block address to its probable owner node (dynamic).]
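The probable-owner lookup amounts to following hints until a node names itself. A minimal sketch (the table contents are invented for illustration; real systems also shorten the chain after a successful lookup):

```python
# Sketch of the probable-owner chain: follow each node's hint until the
# hint points at the node itself, which is then the true owner.

probable_owner = {          # node -> its hint for the owner of block B
    "N1": "N2",
    "N2": "N4",
    "N3": "N4",
    "N4": "N4",             # N4's hint is itself: N4 is the true owner
}

def find_owner(start):
    node = start
    while probable_owner[node] != node:
        node = probable_owner[node]  # forward the request along the hints
    return node

print(find_owner("N1"))  # -> N4, reached via the chain N1 -> N2 -> N4
```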
3. Replicated, Migrating Blocks (RMBs)
To increase parallelism, virtually all DSM systems replicate blocks. With replicated blocks, read operations can be carried out in parallel at multiple nodes, so the average cost of a read operation is reduced. However, replication tends to increase the cost of write operations, because for a write to a block all its replicas must be invalidated or updated to maintain consistency. Replication also complicates the memory-coherence protocol.
There are two basic protocols for enforcing sequential consistency with replicated blocks:
1. Write-invalidate: In this scheme, all copies of a piece of data except one are invalidated before a write operation can be performed on it. When a write fault occurs, the fault handler copies the accessed block from one of the block's current nodes to its own node and invalidates all other copies by sending an invalidate message. The write operation is then performed on the local copy; the local node holds the modified version of the block, which may later be replicated to other nodes as they access it.
[Figure: Write-invalidate. (1) The client node requests the block; (2) one of the nodes having a valid copy replicates the block to the client; (3) invalidate messages are sent to Node-1, Node-2, …, Node-m. After the write operation, only the client node has a valid copy of the data block.]
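The three steps in the write-invalidate figure can be sketched as one function. The data layout (a dict mapping nodes to block copies) and all names are assumptions made for illustration:

```python
# Sketch of the write-invalidate steps: on a write fault the block is
# copied to the writer, every other valid copy is invalidated, and the
# write then proceeds on the now-unique local copy.

def write_fault(writer, updates, copies):
    """copies: node -> block data, for all nodes with a valid copy."""
    # Steps 1-2: request and replicate the block from one current holder.
    source = next(iter(copies))
    data = dict(copies[source])
    # Step 3: invalidate all other copies.
    for node in list(copies):
        del copies[node]
    # Perform the write locally on the only remaining copy.
    data.update(updates)
    copies[writer] = data
    return copies

copies = {"N1": {"x": 0}, "N2": {"x": 0}, "N3": {"x": 0}}
copies = write_fault("N9", {"x": 7}, copies)
print(sorted(copies))     # -> ['N9']: only the writer holds a valid copy
print(copies["N9"]["x"])  # -> 7
```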
2. Write-update: A write operation is carried out by updating all copies of the data on which the write is performed. When a write fault occurs at a node, the fault handler copies the accessed block from one of the block's current nodes to its own node, performs the write operation on the local copy, and then updates all the other copies of the block by sending the address of the modified memory location and its new value to the nodes having a copy of the block. The write operation completes only after all copies of the block have been successfully updated. Consequently, after a write operation completes, all the nodes that had a copy of the block before the write also have a valid copy of the block after the write.
[Figure: Write-update. (1) The client node requests the block; (2) one of the nodes having a valid copy replicates the block to the client; (3) update messages are sent to Node-1, Node-2, …, Node-m. All the nodes that had a valid copy of the data block before the write still have a valid copy after it.]
Global sequencing mechanism
To keep the replicas consistent under write-update, a global sequencer is used to sequence the write operations to all the nodes. The intended modification of each write operation is first sent to the global sequencer, which assigns it a sequence number and multicasts the modification with this sequence number to all the nodes where a replica of the data block is located.
[Figure: the client node sends its modification to the global sequencer, which multicasts the sequenced modification to nodes 1, 2, …, n holding a replica of the data.]
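The sequencing mechanism can be sketched in a few lines. The class names and the synchronous "multicast" loop are simplifying assumptions; a real sequencer would communicate over the network:

```python
# Sketch of the global sequencing mechanism: every write goes through a
# single sequencer, which numbers it and multicasts it to all replicas,
# so every replica applies the writes in the same order.

class Replica:
    def __init__(self):
        self.memory = {}
        self.applied = []   # sequence numbers, in application order

    def apply(self, seq, addr, value):
        self.memory[addr] = value
        self.applied.append(seq)

class GlobalSequencer:
    def __init__(self, replicas):
        self.replicas = replicas
        self.next_seq = 0

    def submit(self, addr, value):
        seq = self.next_seq
        self.next_seq += 1
        for replica in self.replicas:   # multicast, with sequence number
            replica.apply(seq, addr, value)
        return seq

r1, r2 = Replica(), Replica()
seq = GlobalSequencer([r1, r2])
seq.submit("x", 1)
seq.submit("x", 2)
print(r1.applied == r2.applied)  # -> True: identical write order everywhere
print(r1.memory["x"])            # -> 2
```

Because every write passes through one numbering point, all replicas agree on a single total order of writes, which is what sequential consistency requires.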
Data locating in the RMB strategy
The following issues are involved in the write-invalidate protocol:
1. Locating the owner of a block.
2. Keeping track of the nodes that currently have a valid copy of the block.
The following approaches address these issues:
1. Broadcasting: Each node has an owned-blocks table with an entry for each block for which the node is the owner. Each entry has a copy-set field that contains a list of the nodes that currently have a valid copy of the corresponding block.
[Figure: owned-blocks tables at Node-1, Node-2, …, Node-n, separated by node boundaries; each entry (one per owned block) has a block-address field and a copy-set field, both dynamic.]
2. Centralized-server algorithm: Each entry of the block table managed by the centralized server has an owner-node field that indicates the current owner node of the block and a copy-set field that contains a list of the nodes having a valid copy of the block.
[Figure: the centralized server's block table, with an entry for each block containing a block address (fixed), owner node (dynamic), and copy-set (dynamic); Node-1, …, Node-i, …, Node-n lie across node boundaries.]
3. Fixed distributed-server algorithm: The role of the centralized server is distributed to several servers: there is a block manager on each of several nodes, each block manager manages a predetermined subset of the blocks, and a mapping function is used to map a block to its block manager and the corresponding node.
[Figure: block tables (block managers) at Node-1, Node-2, …, Node-n, separated by node boundaries; each table contains entries for a fixed subset of all blocks, each entry holding a block address (fixed), owner node (dynamic), and copy-set (dynamic).]
4. Dynamic distributed-server algorithm: Each node has a block table that contains an entry for all blocks. Each entry has a probable-owner field that gives the node a hint about the location of the block's owner, and the entry at the true owner additionally contains a copy-set field that lists the nodes having a valid copy of the block.
[Figure: block tables at Node-1, Node-2, …, Node-n, separated by node boundaries; each table contains an entry for each block, holding a block address (fixed) and a probable-owner field (dynamic); the entry at the true owner node also holds a copy-set field (dynamic).]
4. Replicated, Non-Migrating Blocks (RNMBs)
In this strategy, a shared-memory block may be replicated at multiple nodes of the system, but the location of each replica is fixed. A read or write access to a memory address is carried out by sending the access request to one of the nodes having a replica of the block containing that memory address. The write-update protocol is used, and sequential consistency is ensured by using a global sequencer.
Assignment: Release consistency DSM system (the MUNIN system).
Replacement strategies in DSM
A DSM system allows shared-memory blocks to be migrated/replicated, so a node's local memory can fill up. To provide space for an addressed block, two questions must be answered:
1. Which block should be replaced to make space for the new block?
2. Where should the replaced block be placed?

Which block to replace?
1. Usage-based versus non-usage-based: Usage-based algorithms keep track of the history of usage of a cache line (or page) and use this information to make replacement decisions; recently reused lines are retained. Example: LRU (Least Recently Used). Non-usage-based algorithms do not take the record of use of cache lines into account when making replacement decisions. Examples: FIFO and Rand (random replacement).
2. Fixed space versus variable space: In a fixed-space algorithm the cache size is fixed, and replacement simply involves the selection of a specific cache line. In a variable-space algorithm the cache size changes dynamically, and replacement requires a swap-out.
Usage-based, fixed-space algorithms are the most suitable for DSM.
In DSM, each memory block at a node is classified into one of the following five types:
1. Unused: a free memory block.
2. Nil: an invalidated block.
3. Read-only: a block with only read access rights.
4. Read-owned: a block owned by the node, with read access rights.
5. Writable: a block with write access permission.

Replacement priority:
1. Both unused and nil blocks have the highest replacement priority.
2. Read-only blocks have the next priority: a copy of a read-only block is available with its owner, so it can simply be discarded.
3. Read-owned and writable blocks for which a replica exists on some other node come next, because it is easy to pass ownership to one of the replica nodes.
4. Read-owned and writable blocks with no replica elsewhere have the lowest priority, because their replacement involves transferring the block's ownership as well as the block itself.
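The priority ordering above can be written as a small selection function. The numeric encoding (lower number replaced first) and the `has_remote_replica` flag are conventions chosen here for illustration:

```python
# Sketch of the five block states and the replacement priority above:
# lower number means "replaced first". State names follow the text.

PRIORITY = {"unused": 0, "nil": 0, "read-only": 1}

def replacement_priority(state, has_remote_replica=False):
    if state in PRIORITY:
        return PRIORITY[state]
    # read-owned / writable: cheaper when another node holds a replica,
    # since ownership can simply be passed to that node.
    return 2 if has_remote_replica else 3

blocks = [
    ("b1", "writable", False),
    ("b2", "read-only", False),
    ("b3", "read-owned", True),
    ("b4", "nil", False),
]
victim = min(blocks, key=lambda b: replacement_priority(b[1], b[2]))
print(victim[0])  # -> b4: nil blocks are replaced first
```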
Where to place a replaced block?
Once a memory block has been selected for replacement, it must be ensured that the useful information it contains is not lost. Two approaches are used:
1. Using secondary store: Transfer the block to a local disk. Advantages: simple, easy for the next access, and no network access is required.
2. Using the memory space of other nodes: Keep track of the free memory space at all nodes and transfer the replaced block to the memory of a node with available space. Disadvantage: it requires maintaining a table of the free memory space of all nodes.
Thrashing
Thrashing occurs when the system spends a large amount of time transferring shared data blocks from one node to another, compared with the time spent doing the useful work of executing application processes. It may occur in the following situations:
1. The system allows data blocks to migrate from one node to another on demand.
2. Interleaved data accesses made by two or more nodes cause a data block to move back and forth from one node to another in quick succession (the ping-pong effect).
3. Blocks with read-only permissions are repeatedly invalidated soon after they are replicated.
Thrashing degrades system performance considerably and creates poor locality of reference.
The following methods are used to solve the thrashing problem:
1. Providing application-controlled locks: Locking data prevents other nodes from accessing that data for a short period of time. Application-controlled locks, however, make the DSM non-transparent.
2. Nailing a block to a node for a minimum amount of time: Disallow a block from being taken away from a node until a minimum amount of time t has elapsed after its allocation to that node. The time t may be fixed statically or tuned dynamically on the basis of the observed access patterns; if a process is accessing a block for writing, other processes are prevented from accessing the block until time t elapses. Dynamic tuning of t is preferred because it reduces processor idle time; the MMU's reference bits can be used to choose t, and t may also depend on the length of the queue of processes waiting to access the block.
3. Tailoring the coherence algorithm to the shared-data usage patterns: Use different coherence protocols for shared data having different usage patterns; for example, a coherence protocol for write-shared variables that avoids thrashing. Complete transparency of the DSM is compromised while minimizing thrashing in this way.
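The "nailing" policy (method 2) reduces to a single timestamp check per block. A minimal sketch, assuming an injectable clock so the behaviour can be shown without real waiting; the class and its fields are hypothetical:

```python
# Sketch of "nailing" a block: a migration request is refused until a
# minimum residence time t has elapsed since the block arrived here.

import time

class NailedBlock:
    def __init__(self, t_seconds, clock=time.monotonic):
        self.t = t_seconds
        self.clock = clock
        self.arrived = clock()   # allocation time at this node

    def may_migrate(self):
        # The block can be taken away only after t has elapsed.
        return self.clock() - self.arrived >= self.t

fake_now = [0.0]                 # controllable clock for the demo
blk = NailedBlock(t_seconds=5.0, clock=lambda: fake_now[0])
print(blk.may_migrate())  # -> False: t has not elapsed yet
fake_now[0] = 6.0
print(blk.may_migrate())  # -> True: the block may now be taken away
```

Tuning t then amounts to adjusting `t_seconds` per block from observed access patterns rather than using one fixed value.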