Operating Systems – Distributed Systems
Stephan Sigg
Distributed and Ubiquitous Systems
Technische Universität Braunschweig
February 11, 2011
Stephan Sigg Operating Systems 1/53
Overview and Structure
Introduction to operating systems
  History
  Architectures
Processes
  Processes, Threads, IPC, Scheduling
  Synchronisation
  Deadlocks
Memory management
  Paging
  Segmentation
Filesystems
Security and Protection
Distributed systems
Cryptography
Outline – Distributed systems
1 Distributed systems
  Distributed systems
  Distributed file systems
  Distributed synchronisation
Distributed systems – Introduction
A distributed system is a collection of loosely coupled processors interconnected by a communication network
[Figure: sites A, B and C connected by a network; labels: Server, Resources, Client]
Reasons for building distributed systems
Resource sharing
Computation speed-up
Reliability
Communication
Resource sharing:
  Different sites are connected to one another
  A user at one site may be able to use the resources available at another site
Computation speed-up:
  Partition a computation into sub-computations that can be run concurrently at different sites
Reliability:
  If one site fails in a distributed system, the remaining sites can continue operating
  This increases the reliability of the system
Communication:
  When several sites are connected, users at different physical locations can communicate
Types of network-based operating systems
Network operating systems
  Remote login
  Remote file transfer
Distributed operating systems
  Data migration
  Computation migration
  Process migration
Network operating systems
The network operating system provides an environment in which users can access remote resources.
Remote login:
  Users log in remotely
  Typically a login name and password are required
  After login, the process used for access acts as a proxy for the user
  The remote user can perform any action on the remote machine, just like a local user
Remote file transfer:
  Files are transferred from one machine to another
  ftp, scp, ...
Distributed operating systems
In a distributed operating system, users access remote resources in the same way they access local resources
Data and process migration are under the control of the distributed operating system
Data migration:
  The user works on a local copy of the remote file
  Or only small parts of the file are transferred, and modifications are forwarded to the remote host
Computation migration:
  Transfer the computational load rather than the data
  Send a message to trigger the remote computation
  Or execute a routine on a remote system via RPC (the method is invoked on the local system and the result is returned to it from the remote system)
Process migration:
  Extension of computation migration
  Used for load balancing, computational speed-up, hardware/software preference, data access
Distributed systems – Distributed file systems
Definition
A service is a software entity running on one or more machines and providing a particular type of function to clients
Definition
A client is a process that can invoke a service using a set of operations that form its client interface
Definition
A server is the service software running on a single machine
A file system provides file services to clients
A client interface for a file service is formed by a set of primitive file operations (create, delete, read, write)
The primary hardware component that a file server controls is a set of local secondary-storage devices on which files are stored
A distributed file system is a file system whose clients, servers and storage devices are dispersed among the machines of a distributed system
Naming
Naming is a mapping between logical and physical objects
In a local file system, symbolic file names are translated to a numerical identifier that is in turn mapped to disk blocks
In a distributed file system, another abstraction layer (the network) is added to this naming scheme
Remote file access
When accessing a remote file, data from the file has to be transferred
  Basic caching scheme
  Cache update policy
  Consistency
[Figure: sites B and C connected by a network; a server with server disk storage and a workstation with local disk storage]
Basic caching scheme
A copy of the data is brought to the client system
Access is performed on the copy
After some time or event, the cached file is transferred back to the original location
The challenge is to keep the cached copies and the master file consistent with reasonable network traffic
Cache update policy
The policy used to write modified data blocks back to the server's master copy has a critical effect on the system's performance and reliability
Write-through policy
  Write data to the server as soon as they are placed in any cache
  Little information is lost when a client system crashes
  Each write access has to wait until the information has been sent to the server
Delayed-write policy (write-back caching)
  Write data to the server at a later time
  Write accesses complete more quickly since they do not have to wait for data to be transferred
  Data may be overwritten before they are written back, so only the last update needs to be sent
  This reduces the required network traffic
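The difference between the two policies can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the class names `Server`, `WriteThroughCache` and `WriteBackCache` are hypothetical, and the "network" is a plain method call that counts how many writes reach the server.

```python
class Server:
    """Holds the master copy and counts writes that arrive over the network."""
    def __init__(self):
        self.blocks = {}
        self.writes_received = 0

    def write(self, key, value):
        self.blocks[key] = value
        self.writes_received += 1


class WriteThroughCache:
    """Every write is forwarded to the server immediately."""
    def __init__(self, server):
        self.server = server
        self.cache = {}

    def write(self, key, value):
        self.cache[key] = value
        self.server.write(key, value)  # the writer waits for this transfer


class WriteBackCache:
    """Writes complete locally; dirty blocks are sent to the server later."""
    def __init__(self, server):
        self.server = server
        self.cache = {}
        self.dirty = set()

    def write(self, key, value):
        self.cache[key] = value  # may overwrite a value that was never sent
        self.dirty.add(key)

    def flush(self):
        for key in sorted(self.dirty):
            self.server.write(key, self.cache[key])
        self.dirty.clear()
```

Writing the same block three times costs three server writes with write-through but only one with write-back, at the price of losing the unflushed data if the client crashes before `flush()`.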
Consistency
A client machine has to decide whether its locally cached copy is consistent with the master copy
Client-initiated approach
  The client initiates a validity check to find out whether its cached data is consistent
  Open question: when and how often should the check be done?
  Every access coupled with a validity check is delayed
Server-initiated approach
  The server records, for each client, the files that it caches
  When the server detects a potential inconsistency (two clients accessing the file in conflicting modes), it must react
  In UNIX, the server must be notified whenever a file is opened, together with the intended mode (read or write)
  In conflicting cases, the server could e.g. disable caching for the particular file
Distributed systems – Distributed synchronisation
Event ordering
In a distributed system it can be hard to say which of two events occurred first
The reason is that the two events may not share a common clock
[Figure: Site A and Site B connected by a network]
Event ordering – The happened-before relation
All events executed in a single process are totally ordered
The happened-before relation (→) is defined as follows:
  If A and B are events in the same process and A was executed before B, then A → B
  If A is the event of sending a message by one process and B is the event of receiving that message by another process, then A → B
  If A → B and B → C, then A → C
If two events A and B are not related by the happened-before relation, then A did not happen before B and B did not happen before A
These events were executed concurrently
Neither event can causally affect the other
If A → B, then A can affect B
We do not know which of two concurrent events happened first
Since neither event can affect the other, this is, however, not important
It is important only that the processes that care about the order of two concurrent events agree on some order
Event ordering – Lamport clocks
To synchronise logical clocks in distributed systems, Lamport used the happened-before relation
Each message carries its creation time according to the clock of the sending machine
Processes synchronised by Lamport clocks adjust their clock value whenever message passing reveals an inconsistency, i.e. a message arrives with a timestamp that is not smaller than the receiver's clock
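The clock rule can be written down compactly. The following Python sketch is illustrative only; the class name `LamportClock` and its method names are assumptions made for this example, and every event advances the clock by one tick.

```python
class LamportClock:
    """Logical clock of one process."""
    def __init__(self):
        self.time = 0

    def tick(self):
        """A local event advances the clock."""
        self.time += 1
        return self.time

    def send(self):
        """Sending is an event; the returned value is attached to the message."""
        return self.tick()

    def receive(self, msg_time):
        """On receipt, jump past the message timestamp if the local clock lags."""
        self.time = max(self.time, msg_time) + 1
        return self.time
```

With this rule the receive event always gets a larger timestamp than the corresponding send event, so A → B implies that the timestamp of A is smaller than that of B.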
[Figure: processes P1, P2 and P3, whose local clocks run at different rates, exchange messages m1 to m5; shown once without correction and once with the receivers' clocks adjusted on message arrival]
With Lamport clocks, a total ordering of events is possible (ties between equal timestamps can be broken by process number)
However, merely comparing the time values of two events a and b says nothing about their causal relationship: C(a) < C(b) does not imply a → b
Example: for messages m1 and m2, Lamport timestamps alone do not tell us which event happened first
Event ordering – Implementation
Vector clocks are designed to ensure that a message is delivered only if all messages that causally precede it have been received as well
With vector clocks, each process maintains and updates a vector of event counters, one entry per process in the system
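A minimal Python sketch of vector clocks follows. It is an illustration under assumptions: the names `VectorClock`, `happened_before` and `concurrent` are hypothetical, and this variant counts every local, send and receive event, whereas some causal-delivery schemes count only send events.

```python
class VectorClock:
    """Vector clock of process `pid` in a system of `n` processes."""
    def __init__(self, n, pid):
        self.pid = pid
        self.v = [0] * n

    def local_event(self):
        self.v[self.pid] += 1
        return tuple(self.v)

    def send(self):
        """Sending is an event; the returned vector travels with the message."""
        return self.local_event()

    def receive(self, other):
        """Take the element-wise maximum, then count the receive event."""
        self.v = [max(a, b) for a, b in zip(self.v, other)]
        self.v[self.pid] += 1
        return tuple(self.v)


def happened_before(a, b):
    """a -> b iff a <= b in every component and the vectors differ."""
    return all(x <= y for x, y in zip(a, b)) and a != b


def concurrent(a, b):
    return a != b and not happened_before(a, b) and not happened_before(b, a)
```

Unlike Lamport timestamps, two vector timestamps can be compared to decide whether the events are causally ordered or concurrent.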
[Figure: processes P1, P2 and P3 exchange messages m1 and m2, which carry the vector timestamps (1,0,0) and (1,1,0)]
Approaches to implement mutual exclusion
Assumptions
  The system consists of n processes
  Processes are numbered uniquely from 1 to n
  Each process resides at a different processor, i.e. each process has its own processor
Centralised approach:
  One of the processes coordinates the entry to the critical section
  Each process that wants to invoke mutual exclusion sends a request message to the coordinator
  The process receives a reply message once its request is approved
  The process then enters its critical section
  After exiting its critical section, the process sends a release message to the coordinator
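The coordinator's bookkeeping can be sketched as follows. This is a simplified illustration, not a networked implementation: the class `Coordinator` is a hypothetical name, messages are modelled as method calls, and waiting requests are served in arrival order (an assumption; the slides do not fix the queueing discipline).

```python
from collections import deque


class Coordinator:
    """Grants access to the critical section to one process at a time."""
    def __init__(self):
        self.holder = None
        self.waiting = deque()

    def request(self, pid):
        """Return True if the reply (permission) is sent immediately."""
        if self.holder is None:
            self.holder = pid
            return True
        self.waiting.append(pid)  # reply is withheld until a release arrives
        return False

    def release(self, pid):
        """Return the process that now receives the reply, or None."""
        if pid != self.holder:
            raise ValueError("only the current holder can release")
        self.holder = self.waiting.popleft() if self.waiting else None
        return self.holder
```

Each entry to the critical section costs three messages in this scheme: request, reply and release.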
Fully distributed approach:
  Distribute decision making across the entire system
  When process Pi wants to enter its critical section:
    It generates a new timestamp TSi
    It sends the message request(Pi, TSi) to all processes in the system
    On receiving a request message, a process may reply immediately or it may defer the reply
    A process that has received a reply message from all other processes enters its critical section
    After leaving its critical section, the process sends a reply message to all its deferred requests
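The slides leave open when a reply is deferred. A common rule, assumed here and known from the textbook literature as the Ricart-Agrawala algorithm, is sketched below; the function name `should_defer` and its parameters are hypothetical. Requests are compared as (timestamp, process number) pairs so that equal timestamps are resolved by process number.

```python
def should_defer(in_critical_section, wants_to_enter, own_request, incoming_request):
    """Decide whether the reply to an incoming request is deferred.

    Requests are (timestamp, process_number) tuples; the smaller tuple wins.
    own_request is only consulted when the process itself wants to enter.
    """
    if in_critical_section:
        return True  # reply after leaving the critical section
    if wants_to_enter:
        # Defer only if our own request is older than the incoming one.
        return own_request < incoming_request
    return False  # not interested: reply immediately
```

Because exactly one of two competing requests has the smaller pair, two processes can never both defer to each other or both grant each other permission first.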
Positive aspects
  Mutual exclusion is obtained
  Freedom from deadlock is ensured
  Freedom from starvation is ensured
  The number of messages per critical-section entry is 2 · (n − 1)
Negative aspects
  Processes need to know the identity of all other processes in the system
  If one process fails, the entire scheme collapses
  Processes that have not entered their critical section must pause frequently to assure other processes that they intend to enter the critical section
Token passing approach:
  A token is circulated among the processes
  Only the process holding the token is allowed to access the critical section
Possible failure cases
  When the token is lost, an election must be called to generate a new token
  If a process fails, a new logical ring must be established
Atomicity
Distributed systems also require atomic operations
Example: a transaction that has to be executed by either none or all of the participating nodes
To ensure atomicity, the following protocols can be implemented
  Two-phase commit protocol
  Three-phase commit protocol
Atomicity – Two-phase commit
In 1978, Gray introduced the two-phase commit protocol
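In the protocol, a coordinator asks all participating nodes whether they can commit
If even one of the nodes does not answer or does not agree, the commit is aborted
Otherwise, the coordinator instructs all nodes to commit
[Figure: state machines of coordinator and participant; transitions are labelled received message / sent message]
  Coordinator: init → wait (Commit / Vote-request); wait → abort (Vote-abort / Global-abort); wait → commit (Vote-commit / Global-commit)
  Participant: init → ready (Vote-request / Vote-commit); init → abort (Vote-request / Vote-abort); ready → abort (Global-abort / ack); ready → commit (Global-commit / ack)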
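The coordinator's decision logic in the failure-free case can be sketched as follows. This is a simplified, single-process illustration: the function name `two_phase_commit` is hypothetical, each participant is modelled as a callable that returns its vote, and a missing answer (timeout) is modelled as `None`.

```python
def two_phase_commit(participants):
    """Run one round of two-phase commit.

    participants maps a node name to a function returning True (vote-commit),
    False (vote-abort) or None (no answer before the timeout).
    Returns the global decision and the outcome applied at every node.
    """
    # Phase 1: send vote-request to every node and collect the votes.
    votes = {name: vote() for name, vote in participants.items()}
    # Phase 2: global-commit only if every single node voted to commit.
    decision = "commit" if all(v is True for v in votes.values()) else "abort"
    return decision, {name: decision for name in participants}
```

A single negative or missing vote is enough to abort the transaction at all nodes, which is what makes the outcome atomic.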
The two-phase commit protocol runs into problems when a single node fails: such a failure frequently blocks all participating nodes until the failed node recovers
When the coordinator has to restart while in its wait state, it may miss some of the participants' answers
Atomicity – Three-phase commit
The three-phase commit protocol addresses this problem by introducing an intermediate precommit state
When the coordinator fails in the precommit state, the distributed nodes can still take independent decisions
The precommit state can only be reached when all nodes have already agreed to commit
Therefore, a node in the precommit state can safely commit regardless of the current state of all other nodes
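[Figure: state machines of coordinator and participant for three-phase commit; transitions are labelled received message / sent message]
  Coordinator: init → wait (Commit / Vote-request); wait → abort (Vote-abort / Global-abort); wait → precommit (Vote-commit / Prepare-commit); precommit → commit (Ready-commit / Global-commit)
  Participant: init → ready (Vote-request / Vote-commit); init → abort (Vote-request / Vote-abort); ready → abort (Global-abort / ack); ready → precommit (Prepare-commit / Ready-commit); precommit → commit (Global-commit / ack)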
Deadlock handling
To handle deadlocks in distributed systems, the same deadlock-prevention and deadlock-avoidance algorithms as for non-distributed systems can be applied
However, some modifications are necessary
  All resources in the whole distributed system must be assigned unique numbers (for resource ordering)
  For the banker's algorithm, one process must hold all the information necessary to carry out the algorithm
A timestamp-based deadlock prevention scheme for distributed systems (preemptive)
  Each process is assigned a timestamp at creation time
  Older processes are allowed to pre-empt newer processes
A timestamp-based deadlock prevention scheme for distributed systems (non-preemptive)
  Each process is assigned a timestamp at creation time
  Older processes wait for younger processes to finish their tasks
  Younger processes quit and restart when encountering older processes holding desired resources
  Processes that quit and restart keep their timestamp
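Both timestamp schemes reduce to one comparison per resource request. The sketch below is illustrative; the function names follow the textbook terms wound-wait (preemptive) and wait-die (non-preemptive), which the slides do not use, and the rule that a younger requester waits in the preemptive scheme is taken from that literature rather than from the slides. A smaller timestamp means an older process.

```python
def wound_wait(requester_ts, holder_ts):
    """Preemptive scheme: an older requester pre-empts a younger holder."""
    if requester_ts < holder_ts:
        return "preempt holder"
    return "wait"  # a younger requester waits for the older holder


def wait_die(requester_ts, holder_ts):
    """Non-preemptive scheme: an older requester waits, a younger one restarts."""
    if requester_ts < holder_ts:
        return "wait"
    return "restart with same timestamp"
```

Keeping the original timestamp after a restart means a process eventually becomes the oldest one in the system, so it cannot be restarted forever.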
Deadlock detection
Deadlocks can be detected by creating resource allocation graphs
The challenge for distributed systems is to decide how to maintain the graph
Centralised approach
  Local resource allocation graphs are merged into a global view
  Due to delay in the network, the global view may differ temporarily from the actual situation
[Figure: Site A and Site B report their local graphs to a coordinator]
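The centralised approach can be sketched as merging the local graphs and searching the result for a cycle. The example is a simplification under assumptions: it uses wait-for graphs (an edge P → Q means P waits for a resource held by Q) instead of full resource allocation graphs, and the function names `merge_graphs` and `has_cycle` are hypothetical.

```python
def merge_graphs(*local_graphs):
    """Union of the edges reported by all sites (the coordinator's global view)."""
    merged = {}
    for graph in local_graphs:
        for node, successors in graph.items():
            merged.setdefault(node, set()).update(successors)
    return merged


def has_cycle(graph):
    """Depth-first search; reaching a node that is still on the stack is a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {}

    def visit(node):
        colour[node] = GREY
        for nxt in graph.get(node, ()):
            state = colour.get(nxt, WHITE)
            if state == GREY or (state == WHITE and visit(nxt)):
                return True
        colour[node] = BLACK
        return False

    return any(colour.get(n, WHITE) == WHITE and visit(n) for n in list(graph))
```

In the example used for testing, neither site sees a cycle locally, but the merged view contains P1 → P2 → P1. Because the reports may be out of date, a cycle in the merged view can also be a false alarm.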
Deadlock detection – Fully distributed approach
  All controllers share equally the responsibility for detecting deadlock
  Every site hosts a resource allocation graph
  This graph contains an additional node Pex
    An arc Pi → Pex indicates that Pi is waiting for a data item at another site that is held by some process
    An arc Pex → Pi indicates that some process at another site is waiting to acquire a resource held by Pi
  Due to delay in the network, the global view may differ temporarily from the actual situation
  A cycle including Pex does not necessarily mean that the system is in a deadlocked state
[Figure: local graphs at Site A and Site B]
Election algorithms
Many distributed algorithms employ a coordinator process
In particular, the coordinator may fail, so that a new coordinator must be elected
Typically, processes are assigned unique priority numbers
To choose an appropriate process as the coordinator, several election algorithms can be applied
  The Bully algorithm
  The Ring algorithm
Election algorithms – The Bully algorithm
A process that notices the absence of a coordinator sends an election message with its priority number to all processes with a higher number
If it does not receive an answer within a time interval T, it becomes the coordinator and informs all active processes
A process receiving an election message answers only if it has a higher priority (and then starts an election itself)
When the old coordinator becomes available again, it also starts an election
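The outcome of the Bully algorithm can be traced with a short simulation. It is a sketch under assumptions: the function name `bully_election` is hypothetical, message exchange and the timeout T are replaced by a lookup in the set of responding processes, and only one of the elections started by the answering processes is followed.

```python
def bully_election(initiator, alive):
    """Return the number of the process that ends up as coordinator.

    alive is the set of process numbers that currently answer messages;
    a higher number means a higher priority.
    """
    higher = [p for p in alive if p > initiator]
    if not higher:
        # Nobody with a higher number answered within T: the initiator wins.
        return initiator
    # Every answering process starts its own election; following any one
    # of them (here the lowest) leads to the same final coordinator.
    return bully_election(min(higher), alive)
```

The election always ends with the highest-numbered responding process, which is where the algorithm gets its name.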
[Figure: eight processes numbered 1 to 8 run the Bully algorithm in successive steps: election messages are sent to higher-numbered processes, which answer with ok and start their own elections; the winner finally sends coordinator messages to the other processes]
Election algorithms – The Ring algorithm
The ring algorithm assumes that processes are arranged in an unambiguous order in which messages are sent
A process that notices the absence of a coordinator creates a list with its priority value as the first entry and sends this list to its successor
If the successor is down, the message is sent to the successor's successor (and so on)
A process receiving such a list adds its priority to the end of the list if its own number is not yet on the list, and forwards the list to its successor
If its own priority is already on the list, the list has travelled once around the ring, and the process determines the entry with the highest priority
The message is then circulated a second time as a coordinator message to inform all nodes of the new coordinator
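One round of the ring algorithm can be simulated as below. The sketch rests on assumptions: the function name `ring_election` is hypothetical, the ring is a Python list in sending order, a process that is down is simply skipped, and the priority of a process equals its number.

```python
def ring_election(ring, alive, initiator):
    """Circulate the election list once around the ring.

    ring: process numbers in ring order; alive: processes that respond.
    Returns the new coordinator and the list as seen by the initiator.
    """
    n = len(ring)
    members = [initiator]
    i = (ring.index(initiator) + 1) % n
    while ring[i] != initiator:
        if ring[i] in alive:  # a dead successor is skipped
            members.append(ring[i])
        i = (i + 1) % n
    # The initiator recognises its own entry and picks the highest priority.
    return max(members), members
```

For a ring of processes 1 to 8 in which process 8 does not respond and process 4 starts the election, the list grows to [4, 5, 6, 7, 1, 2, 3] and process 7 becomes the coordinator.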
[Figure: ring of processes 1 to 8 in which process 8 gives no response; process 4 starts with the list [4], which grows via [4,5], [4,5,6], [4,5,6,7], [4,5,6,7,1] and [4,5,6,7,1,2] to [4,5,6,7,1,2,3]; process 4 then recognises its own index and the coordinator message naming process 7 is circulated]