operating systems - distributed systems · 2013-12-09 · data and process migration is under the...

Operating Systems: Distributed Systems

Stephan Sigg

Distributed and Ubiquitous Systems, Technische Universität Braunschweig

February 11, 2011


Overview and Structure

Introduction to operating systems: History, Architectures

Processes: Processes, Threads, IPC, Scheduling; Synchronisation; Deadlocks

Memory management: Paging, Segmentation

Filesystems

Security and Protection

Distributed systems

Cryptography


Outline: Distributed systems

1 Distributed systems: Distributed systems, Distributed file systems, Distributed synchronisation


Distributed systems: Introduction

A distributed system is a collection of loosely coupled processors interconnected by a communication network

[Figure: Sites A, B and C connected by a network; labels: Server, Resources, Client]



Reasons for building distributed systems

Resource sharing

Computation speed-up

Reliability

Communication




Resource sharing:

Different sites are connected to one another

A user at one site may be able to use the resources available at another site

Computation speed-up:

A computation can be partitioned into sub-computations that run concurrently

Reliability:

If one site fails in a distributed system, the remaining sites can continue operating

This increases the reliability of the system

Communication:

When several sites are connected, users at different physical locations can communicate



Types of network-based operating systems

Network operating systems

Remote login

Remote file transfer

Distributed operating systems

Data migration

Computation migration

Process migration



Network operating systems

The network operating system provides an environment in which users can access remote resources.

Remote login:

Users log in remotely

Typically a login name and password are required

After login, the process that handles the remote session acts as a proxy for the user

The remote user can perform the same actions on the remote machine as a local user

Remote file transfer:

File transfer from one machine to another

ftp, scp, ...




In a distributed operating system, users access remote resources in the same way they access local resources

Data and process migration are under the control of the distributed operating system

Data migration:

The user could work on a local copy of the remote file

Or only small parts of the file are transferred, and modifications are forwarded to the remote host






Computation migration:

Transfer the computational load rather than the data

A message is sent to trigger the remote computation

Or a routine is executed on a remote system via RPC (the method is invoked on the local system, runs on the remote system, and its result is returned to the local caller)






Process migration:

Extension of computation migration

Used for load balancing, computational speed-up, hardware/software preference, data access





Distributed systems: Distributed file systems

Definition

A service is a software entity running on one or more machines and providing a particular type of function to clients

Definition

A client is a process that can invoke a service using a set of operations that form its client interface

Definition

A server is the service software running on a single machine



A file system provides file services to clients

A client interface for a file service is formed by a set of primitive file operations (create, delete, read, write)

The primary hardware component that a file server controls is a set of local secondary-storage devices on which files are stored

A distributed file system is a file system whose clients, servers and storage devices are dispersed among the machines of a distributed system



Naming

Naming is a mapping between logical and physical objects

In a local file system, symbolic file names are translated to a numerical identifier that is in turn mapped to disk blocks

In a distributed file system, another abstraction layer (the network) is added to this naming scheme



Remote file access

When accessing a remote file, data from the file has to be transferred

Basic caching scheme

Cache update policy

Consistency

[Figure: Sites B and C connected by a network; a server with server disk storage and a workstation with local disk storage]



Basic caching scheme

A copy of the data is brought to the client system

Access is performed on the copy

After some time or event the cached file is transferred back to the original location

The challenge is to keep the cached copies and the master file consistent with reasonable network traffic



Cache update policy

The policy used to write modified data blocks back to the server's master copy has a critical effect on the system's performance and reliability

Write-through policy

Write data to the server as soon as they are placed in any cache

Little information is lost when a client system crashes

Each write access has to wait until the information is sent to the server

Delayed-write policy (write-back caching)

Write data to the server at a later time

Write accesses complete more quickly since they do not have to wait for data to be transferred

Data may be overwritten before they are written back, so only the last version has to be sent

This reduces the required network traffic, but unwritten data is lost when a client system crashes
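The difference between the two policies can be sketched in a few lines of Python. This is only an illustrative model: the class names `Server`, `WriteThroughCache` and `WriteBackCache` and the explicit `flush` call are assumptions made for the example, not part of any real file system.

```python
class Server:
    """Holds the master copy and counts the writes that cross the network."""
    def __init__(self):
        self.blocks = {}
        self.writes = 0

    def write(self, block_id, data):
        self.blocks[block_id] = data
        self.writes += 1


class WriteThroughCache:
    def __init__(self, server):
        self.server = server
        self.cache = {}

    def write(self, block_id, data):
        self.cache[block_id] = data
        self.server.write(block_id, data)  # every write waits for the server


class WriteBackCache:
    def __init__(self, server):
        self.server = server
        self.cache = {}
        self.dirty = set()

    def write(self, block_id, data):
        self.cache[block_id] = data
        self.dirty.add(block_id)           # only marked; nothing is sent yet

    def flush(self):
        """Send the latest version of every modified block to the server."""
        for block_id in sorted(self.dirty):
            self.server.write(block_id, self.cache[block_id])
        self.dirty.clear()


def demo():
    through_server, back_server = Server(), Server()
    through = WriteThroughCache(through_server)
    back = WriteBackCache(back_server)
    for value in ("v1", "v2", "v3"):       # the same block is overwritten three times
        through.write(0, value)
        back.write(0, value)
    back.flush()
    return through_server.writes, back_server.writes
```

In this model `demo()` returns `(3, 1)`: write-through sends all three versions, while write-back sends only the last one. The price is that everything in `dirty` is lost if the client crashes before `flush` runs.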



Consistency

A client machine has to decide whether its locally cached copy is consistent with the master copy

Client-initiated approach

The client initiates a validity check to find out whether its data is consistent

Open question: when and how often should the check be done?

Every access coupled with a validity check is delayed

Server-initiated approach

The server records, for each client, the files that it caches

When the server detects a potential inconsistency (two clients accessing the file in conflicting modes), it must react

With UNIX semantics, the server must be notified whenever a file is opened, together with the intended mode (read or write)

In conflicting cases, the server could e.g. disable caching for the particular file





Distributed systems: Distributed synchronisation

Event ordering

In a distributed system it might be hard to say which of two events occurred first

The reason is that no common clock may exist for these two events

[Figure: Site A and Site B connected by a network]



Event ordering – The happened-before relation

All events executed in a single process are totally ordered

The happened-before relation is defined as follows:

If A and B are events in the same process and A was executed before B, then A → B

If A is the event of sending a message by one process and B is the event of receiving that message by another process, then A → B

If A → B and B → C, then A → C




If two events A and B are not related by the happened-before relation, then A did not happen before B and B did not happen before A

These events were executed concurrently

Neither event can causally affect the other

If A→ B then A can affect B




We do not know which of the concurrent events happened first.

Since neither event can affect the other, this is, however, not important

It is important only that the processes that care about the order of two concurrent events agree on some order



Event ordering – Lamport clocks

To synchronise logical clocks on distributed systems, Lamport used the happened-before relation

Each message carries its sending time according to the sender's local clock

A process synchronised by Lamport clocks advances its clock whenever a received message shows that its clock is behind, i.e. when the message's timestamp is not smaller than the local clock value

[Figure: processes P1, P2 and P3, whose local clocks tick at different rates, exchange messages m1 to m5; the first panel shows the uncorrected clock values, the second panel the values after Lamport's correction]
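The clock correction described above can be sketched as follows. The class `LamportClock` and the increment of 1 per event are assumptions for illustration; the slides' figure uses larger tick rates per process.

```python
class LamportClock:
    def __init__(self):
        self.time = 0

    def tick(self):
        """Local event: advance the clock."""
        self.time += 1
        return self.time

    def send(self):
        """Sending is an event; the returned value is attached to the message."""
        return self.tick()

    def receive(self, message_time):
        """Jump past the sender's timestamp if the local clock lags behind."""
        self.time = max(self.time, message_time) + 1
        return self.time


def lamport_demo():
    p1, p2 = LamportClock(), LamportClock()
    for _ in range(5):
        p1.tick()                        # p1 runs "fast": its clock reaches 5
    stamp = p1.send()                    # the message is stamped 6
    p2.tick()                            # p2 runs "slow": its clock is 1
    received_at = p2.receive(stamp)      # corrected to max(1, 6) + 1 = 7
    return stamp, received_at
```

Here `lamport_demo()` returns `(6, 7)`: the receive event gets a later timestamp than the send event, as the happened-before relation requires.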



Event ordering – Lamport clocks

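With Lamport clocks, a total ordering of events is possible (ties between equal time values can be broken by process number)

However, nothing can be said about the causal relationship between two events a and b by merely comparing their time values: a smaller time value does not imply that a happened before b.

Example here: m1 and m2. With Lamport clocks we cannot say which event happened first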

[Figure: the Lamport clock example with processes P1, P2, P3 and messages m1 to m5, before and after clock correction]



Event ordering – Implementation

Vector clocks are designed to ensure that a message is delivered only if all messages that causally precede it have been received as well.

With vector clocks, each process maintains and updates a vector of event counters, one counter for every process in the system

[Figure: processes P1, P2 and P3 exchange messages m1 and m2, which carry the vector timestamps (1,0,0) and (1,1,0)]
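A minimal sketch of vector timestamps and the causal-delivery test follows. It assumes the convention in which each counter counts the messages sent by that process, and a receiver merges without incrementing; the function names are chosen for this example only.

```python
def vc_increment(clock, i):
    """Return a copy of `clock` with the counter of process i advanced."""
    updated = list(clock)
    updated[i] += 1
    return tuple(updated)


def vc_merge(local, received):
    """Component-wise maximum, applied when a message is delivered."""
    return tuple(max(a, b) for a, b in zip(local, received))


def happened_before(a, b):
    """a -> b iff a <= b in every component and a != b."""
    return all(x <= y for x, y in zip(a, b)) and a != b


def concurrent(a, b):
    return a != b and not happened_before(a, b) and not happened_before(b, a)


def can_deliver(local, stamp, sender):
    """A message is deliverable if it is the next one expected from `sender`
    and the receiver has already seen everything the sender had seen."""
    return all(
        (stamp[k] == local[k] + 1) if k == sender else (stamp[k] <= local[k])
        for k in range(len(local))
    )


# P1 (index 0) sends m1; P2 (index 1) delivers m1 and then sends m2.
m1 = vc_increment((0, 0, 0), 0)                  # (1, 0, 0)
m2 = vc_increment(vc_merge((0, 0, 0), m1), 1)    # (1, 1, 0)
```

If m2 reaches P3 before m1, `can_deliver((0, 0, 0), m2, 1)` is false, so P3 holds m2 back until m1 has been delivered. Unlike Lamport time values, the vectors also reveal concurrency: `(1, 0, 0)` and `(0, 1, 0)` are not ordered in either direction.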



Approaches to implement mutual exclusion

Assumptions

The system consists of n processes

Processes are numbered uniquely from 1 to n

Each process resides at a different processor, i.e. each process has its own processor




Centralised approach:

One of the processes coordinates the entry to the critical section

Each process that wants to invoke mutual exclusion sends a request message to the coordinator

The process receives a reply message once its request is approved

The process then enters its critical section

After exiting its critical section, the process sends a release message to the coordinator
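The request/reply/release exchange can be modelled by a small coordinator object. This is a single-machine sketch under the assumption that waiting requests are served in arrival order; `Coordinator` and its method names are illustrative.

```python
from collections import deque


class Coordinator:
    def __init__(self):
        self.holder = None          # process currently in its critical section
        self.waiting = deque()      # requests that have not been answered yet

    def request(self, pid):
        """Return True if the reply (permission) is sent immediately."""
        if self.holder is None:
            self.holder = pid
            return True
        self.waiting.append(pid)    # the reply is withheld until a release arrives
        return False

    def release(self, pid):
        """Handle a release message; return the process that is granted next."""
        if pid != self.holder:
            raise ValueError("only the current holder may release")
        self.holder = self.waiting.popleft() if self.waiting else None
        return self.holder
```

Each entry to the critical section costs three messages in this scheme (request, reply, release), but the coordinator is a single point of failure.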




Fully distributed approach:

Decision making is distributed across the entire system

When process Pi wants to enter its critical section, it

generates a new timestamp TSi

sends the message request(Pi, TSi) to all processes in the system

On receiving a request message, a process may reply immediately or it may defer the reply

A process that has received a reply message from all other processes enters its critical section

After leaving the critical section, the process sends a reply message to all requests it has deferred

Positive aspects

Mutual exclusion is obtained

Freedom from deadlock is ensured

Freedom from starvation is ensured

The number of messages per critical section entry is 2 · (n − 1)

Negative aspects

Processes need to know the identity of all other processes in the system

If one process fails, the entire scheme collapses

Processes that have not entered their critical section must pause frequently, to assure other processes that they intend to enter the critical section
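The slides leave open when a reply is deferred. In the scheme commonly attributed to Ricart and Agrawala, the rule is the one sketched below; the state names and the use of (timestamp, process number) pairs for tie-breaking are assumptions of this example.

```python
def should_defer(state, own_request, incoming_request):
    """Decide whether to defer the reply to an incoming request.

    state: "idle", "wanting" or "in_cs" (illustrative names)
    own_request, incoming_request: (timestamp, pid) pairs; own_request is
    only looked at when state == "wanting".
    """
    if state == "in_cs":
        return True                 # reply only after leaving the critical section
    if state == "idle":
        return False                # not interested: reply immediately
    # Both processes want to enter: the smaller (timestamp, pid) pair goes first.
    return own_request < incoming_request


def messages_per_entry(n):
    """n - 1 requests plus n - 1 replies."""
    return 2 * (n - 1)
```

Because every pair of processes compares the same two timestamps, exactly one of them defers, which is what gives mutual exclusion without a coordinator.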




Token passing approach:

A token is circulated among the processes, which are organised in a logical ring

Only the process that currently holds the token is allowed to enter the critical section

Possible failure cases

When the token is lost, an election must be called to generate a new token

If a process fails, a new logical ring must be established



Atomicity

Distributed systems also require atomic operations

Example: a transaction that has to be executed either by all participating nodes or by none

To ensure atomicity, the following protocols can be implemented

Two-phase commit protocol

Three-phase commit protocol



Atomicity – Two-phase commit

In 1978, Gray introduced the two-phase commit protocol

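In the protocol, a coordinator asks all participating nodes to vote on the commit

If even one node does not answer or does not agree, the transaction is aborted

Otherwise, all nodes commit

State machines (transitions labelled message received / message sent):

Coordinator: init → wait on Commit / Vote-request

Coordinator: wait → abort on Vote-abort / Global-abort

Coordinator: wait → commit on Vote-commit / Global-commit

Participant: init → abort on Vote-request / Vote-abort

Participant: init → ready on Vote-request / Vote-commit

Participant: ready → abort on Global-abort / ack

Participant: ready → commit on Global-commit / ack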
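The coordinator's decision rule can be written down compactly. The sketch below models each participant as a function that returns its vote, with `None` standing for "no answer within the timeout"; these conventions and the name `two_phase_commit` are assumptions for illustration, and logging and recovery are left out.

```python
def two_phase_commit(participants):
    """participants: mapping name -> vote function returning
    "commit", "abort", or None (no answer within the timeout).

    Returns the global decision and the collected votes."""
    # Phase 1: send Vote-request to every participant and collect the votes.
    votes = {name: vote() for name, vote in participants.items()}

    # Phase 2: commit only if every single participant voted to commit.
    if all(v == "commit" for v in votes.values()):
        decision = "global-commit"
    else:
        decision = "global-abort"
    return decision, votes
```

For example, `two_phase_commit({"a": lambda: "commit", "b": lambda: None})` yields `"global-abort"`, because the missing answer from `b` counts against the commit.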



Atomicity – Two-phase commit

The two-phase commit protocol has several problems when single nodes fail. Such a failure can block all participating nodes until the failed node recovers

If the coordinator crashes and has to restart in its wait state, it might have missed some of the participants' answers




Atomicity – Three-phase commit

The three-phase commit protocol solves this problem by introducing an intermediate precommit state

When the coordinator fails in the precommit state, the distributed nodes can still take independent decisions

It is only possible to arrive in precommit when all nodes have already agreed to commit

Therefore, a node in the precommit state can safely commit regardless of the current state of all other nodes



Atomicity – Three-phase commit

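[Figure: state machines of the three-phase commit protocol; states: init, wait (coordinator) or ready (participant), abort, precommit, commit]

Transitions legible in the figure (message received / message sent):

Coordinator: init → wait on Commit / Vote-request

Coordinator: wait → precommit on Vote-commit / Prepare-commit

Coordinator: precommit → commit on Ready-commit / Global-commit

Participant: ready → abort on Global-abort / ack

Participant: ready → precommit on Prepare-commit / Ready-commit

Participant: precommit → commit on Global-commit / ack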




Deadlock handling

To handle deadlocks in distributed systems, the same deadlock-prevention and deadlock-avoidance algorithms as for non-distributed systems can be applied

However, some modifications must be applied

All resources in the whole distributed system must be assigned unique numbers (for resource ordering)

For the banker's algorithm, one process must hold all information necessary to carry out the algorithm



Deadlock handling

A timestamp-based deadlock prevention scheme for distributed systems (preemptive)

Each process is assigned a timestamp at creation time

Older processes are allowed to pre-empt newer processes



Deadlock handling

A timestamp-based deadlock prevention scheme for distributed systems (non-preemptive)

Each process is assigned a timestamp at creation time

Older processes wait for younger processes to finish their tasks

Younger processes quit and restart when encountering older processes holding desired resources

Processes that quit and restart keep their timestamp
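Both timestamp schemes reduce to a single comparison when a process requests a resource that another process holds. They are usually known as wound-wait (preemptive) and wait-die (non-preemptive). The sketch below assumes that a smaller timestamp means an older process; in the preemptive case it also assumes the usual completion of the rule, namely that a younger requester simply waits.

```python
def wound_wait(requester_ts, holder_ts):
    """Preemptive scheme: an older requester pre-empts a younger holder."""
    if requester_ts < holder_ts:
        return "preempt holder"
    return "wait"


def wait_die(requester_ts, holder_ts):
    """Non-preemptive scheme: an older requester waits,
    a younger requester quits and restarts with its old timestamp."""
    if requester_ts < holder_ts:
        return "wait"
    return "restart with same timestamp"
```

In both schemes waiting is only ever allowed in one direction of age, so no cycle of waiting processes can form. Keeping the timestamp across restarts lets a restarted process grow relatively older and eventually succeed, which avoids starvation.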



Deadlock detection

Deadlocks can be detected by creating resource allocation graphs

The challenge for distributed systems is to decide how to maintain the graph

Centralised approach

Local resource allocation graphs are merged into a global view

Due to delay in the network, the global view may differ temporarily from the actual situation

[Figure: Site A and Site B report their local graphs to a Coordinator]
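The coordinator's job in the centralised approach can be sketched as "merge, then look for a cycle". For brevity the example uses wait-for graphs (an edge P → Q means P waits for a resource held by Q) rather than full resource allocation graphs; that simplification and the function names are assumptions of this sketch.

```python
def merge_graphs(*local_graphs):
    """Union of the edges reported by the individual sites."""
    merged = {}
    for graph in local_graphs:
        for process, waits_for in graph.items():
            merged.setdefault(process, set()).update(waits_for)
    return merged


def has_cycle(graph):
    """Depth-first search; meeting a node that is still on the current path
    (GREY) means that the graph contains a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {}

    def visit(node):
        colour[node] = GREY
        for successor in graph.get(node, ()):
            state = colour.get(successor, WHITE)
            if state == GREY:
                return True
            if state == WHITE and visit(successor):
                return True
        colour[node] = BLACK
        return False

    return any(colour.get(n, WHITE) == WHITE and visit(n) for n in list(graph))


site_a = {"P1": {"P2"}}    # at site A, P1 waits for P2
site_b = {"P2": {"P1"}}    # at site B, P2 waits for P1
```

Neither `site_a` nor `site_b` contains a cycle on its own, but `has_cycle(merge_graphs(site_a, site_b))` is true. Since the reports arrive with network delay, a cycle found this way may already be outdated.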



Deadlock detection – Fully distributed approach

Controllers share the responsibility for detecting deadlock equally

Every site hosts a resource allocation graph

This graph contains an additional node Pex

An arc Pi → Pex indicates that Pi is waiting for a data item at another site that is held by any process

An arc Pex → Pi indicates that a process at another site is waiting to acquire a resource held by Pi

Due to delay in the network, the global view may differ temporarily from the actual situation

A cycle including Pex does not necessarily mean that the system is in a deadlocked state

[Figure: local graphs at Site A and Site B]



Election algorithms

Many distributed algorithms employ a coordinator process

In particular, the coordinator may fail, so that a new coordinator must be elected

Typically, processes are assigned unique priority numbers

To choose an appropriate process as the coordinator, several election algorithms can be applied

The Bully algorithm

The Ring algorithm



Election algorithms – The Bully algorithm

A process that notices the absence of a coordinator sends an election message with its priority number to all processes with a higher number

If it does not receive an answer during a time interval T, it becomes the coordinator and informs all active processes

A process receiving an election message answers only if it has a higher priority (and then starts an election itself)

When the old coordinator becomes available again, it also starts an election
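The rules above can be simulated without real messages or timeouts. In this sketch "no answer during T" is modelled by a process not being in the set `alive`; the function name, the parameters and the message counter are assumptions made for the example.

```python
def bully_election(initiator, processes, alive):
    """Simulate a Bully election started by `initiator` (which must be alive).

    processes: all priority numbers in the system
    alive:     the subset that currently answers messages
    Returns (coordinator, number of election messages sent)."""
    pending = [initiator]
    started = set()
    election_messages = 0
    coordinator = None
    while pending:
        p = pending.pop()
        if p in started:
            continue                      # p already runs its own election
        started.add(p)
        higher = [q for q in processes if q > p]
        election_messages += len(higher)  # one election message per higher process
        responders = [q for q in higher if q in alive]
        if responders:
            pending.extend(responders)    # each answers "ok" and starts an election
        else:
            coordinator = p               # silence during T: p becomes coordinator
    return coordinator, election_messages
```

With processes 1 to 8, process 8 down and process 4 starting the election, `bully_election(4, range(1, 9), {1, 2, 3, 4, 5, 6, 7})` returns `(7, 10)`: the highest running process always wins, at the cost of a number of messages that grows quickly with the number of higher-priority processes.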



Election algorithms – The Bully algorithm

[Figure: six panels showing a Bully election among processes 1 to 8, with election, ok and coordinator messages]



Election algorithms – The Ring algorithm

The ring algorithm assumes that the processes are arranged in an unambiguous order in which messages are sent

A process that notices the absence of a coordinator creates a list with its priority value as the first entry and sends this list to its successor

If the successor is down, the message is sent to the successor's successor (and so on)

A process receiving such a list adds its priority to the end of the list if its own number is not on the list, and forwards the list to its successor

If its own priority is already on the list, the list has travelled around the whole ring, and the process searches the list for the process with the highest priority

The message is then circulated a second time as a coordinator message to inform all nodes of the new coordinator
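A simulation of the first round of the ring algorithm might look as follows. As before, a process that is down is modelled as missing from `alive`, and the function name and parameters are illustrative.

```python
def ring_election(initiator, ring, alive):
    """ring:  priority numbers in ring order
    alive: the subset of processes that currently respond
    Returns (new coordinator, list as it arrives back at the initiator)."""
    if initiator not in alive:
        raise ValueError("the initiator must be a running process")
    members = [initiator]
    position = ring.index(initiator)
    while True:
        position = (position + 1) % len(ring)
        current = ring[position]
        if current not in alive:
            continue                 # successor is down: try its successor
        if current == initiator:
            break                    # own number is already on the list
        members.append(current)
    return max(members), members
```

For the situation in the slides' figure, `ring_election(4, [1, 2, 3, 4, 5, 6, 7, 8], {1, 2, 3, 4, 5, 6, 7})` returns `(7, [4, 5, 6, 7, 1, 2, 3])`: process 8 does not respond, the list comes back to process 4, and process 7 is announced as coordinator in the second round.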



Election algorithms – The Ring algorithm

[Figure: ring election among processes 1 to 8, started by process 4. The list grows from [4] via [4,5], [4,5,6], [4,5,6,7], [4,5,6,7,1] and [4,5,6,7,1,2] to [4,5,6,7,1,2,3]; a process that is down gives no response and is skipped; process 4 recognises its own index. Circulated coordinator message: [7 - 4,5,6,1,2,3]]


Distributed systems: Questions, discussion, remarks

Questions?


Literature: Recommended literature

A. Tanenbaum, Moderne Betriebssysteme, 2nd edition, Prentice Hall, 2009.

A. Tanenbaum, Modern operating systems, 3rd edition, Prentice Hall, 2008.

A. Silberschatz et al., Operating system concepts, Wiley, 2004.

W. Stallings, Operating systems, 6th edition, Prentice Hall, 2008.

