
Distributed Systems: Lecture 3

Box Leangsuksun, SWECO Endowed Professor, Computer Science, Louisiana Tech University

CTO, PB Tech International Inc.

From Coulouris, Dollimore, Kindberg and Blair, Distributed Systems: Concepts and Design, Edition 5, © Addison-Wesley 2012

Slides for Chapter 7: Operating System support

Outline

•  Background
•  OS support

Background

•  With user/customer requirements
•  System modeling concepts
–  Physical
–  Architectural
–  Fundamental (interaction/performance, failure and security)
•  What-if analysis & design

4/22/14 Towards survivable architecture 4

Instructor’s Guide for Coulouris, Dollimore, Kindberg and Blair, Distributed Systems: Concepts and Design Edn. 5 © Pearson Education 2012

Client-server architecture: processes & communication

[Figure: clients send invocations to servers and receive results; a server may in turn invoke another server. Key: process, computer]


Peer-to-peer architecture


Sockets and ports interaction model

[Figure: a client socket on any port sends a message to a server socket on an agreed port; the two hosts have Internet addresses 138.37.88.249 and 138.37.94.248, and each has other ports]
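The interaction in the figure can be sketched with two UDP sockets. This is only an illustration: it assumes both endpoints run on the loopback address instead of the two Internet addresses shown, and it lets the OS choose the "agreed" port, which the client then learns from the variable `agreed_port`.

```python
import socket
import threading

HOST = "127.0.0.1"  # assumption: loopback stands in for the two hosts in the figure

# Server side: bind a socket to a port that both sides agree on.
server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.settimeout(5)
server.bind((HOST, 0))                  # port 0: the OS picks a free port
agreed_port = server.getsockname()[1]

def serve_once():
    # client_addr carries the client's address and its "any port"
    data, client_addr = server.recvfrom(1024)
    server.sendto(data.upper(), client_addr)

t = threading.Thread(target=serve_once)
t.start()

# Client side: no explicit bind, so the OS assigns any free port on first send.
client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.settimeout(5)
client.sendto(b"hello", (HOST, agreed_port))
reply, _ = client.recvfrom(1024)

t.join()
client.close()
server.close()
print(reply)
```

The server must publish its port in advance, while the client's port only needs to be valid for the duration of the exchange.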


Figure 7.1 System layers

[Figure: layers from top to bottom: applications and services; middleware; the platform, made up of the OS (kernel, libraries & servers) and the computer & network hardware. Node 1 runs OS1 and Node 2 runs OS2, each providing processes, threads, communication, ...]

Instructor’s Guide for Coulouris, Dollimore and Kindberg, Distributed Systems: Concepts and Design Edn. 3 © Addison-Wesley Publishers 2000

OS Support

•  The important roles of the system kernel/OS
•  The advantages and disadvantages of splitting functionality between protection domains (kernel and user-level code)
•  The relation between the operating system layer and the middleware layer
•  How well the operating system meets the requirements of middleware:
–  Efficient and robust access to physical resources
–  The flexibility to implement a variety of resource-management policies




OS Support

•  The task of any operating system is to provide problem-oriented abstractions of the underlying physical resources (for example, sockets rather than raw network access):
–  Processors
–  Memory
–  Communications
–  Storage media
•  The operating system takes over the physical resources on a single node and manages them to present these resource abstractions through the system call interface

OS Support

•  Network operating systems
–  They have a networking capability built into them and so can be used to access remote resources. Access is network-transparent for some, but not all, types of resource.
–  Multiple system images
•  The nodes running a network operating system retain autonomy in managing their own processing resources
•  By contrast, an operating system that presents a single system image for all the resources in a distributed system is called a distributed operating system




OS Support

•  The nodes running a network operating system retain autonomy in managing their own processing resources
•  Single system image
–  An OS in which users are never concerned with where their programs run, or with the location of any resources. The operating system has control over all the nodes in the system




Middleware and network operating systems

•  There are no real distributed operating systems in general use, only network operating systems
–  Users are more focused on their application software, which often meets their current problem-solving needs
–  Users tend to prefer to have a degree of autonomy for their machines, even in a closely knit organization

The combination of middleware and network operating systems provides an acceptable balance between the requirement for autonomy on the one hand and network-transparent resource access on the other


System layers

[Figure 7.1 (System layers) repeated, with Mac OS, Windows and Linux pictured alongside the OS layer]


What are the core distinct features of OS support?

•  To enable DS applications
•  Network-enabled features


OS to solve Challenges in DSs

•  Heterogeneity
•  Openness
•  Security
•  Scalability
•  Fault handling
•  Concurrency
•  Transparency





Core OS functionality

•  Process manager: handles the creation of and operations upon processes
•  Thread manager: thread creation, synchronization and scheduling
•  Communication manager: communication between threads attached to different processes on the same computer
•  Memory manager: management of physical and virtual memory
•  Supervisor: dispatching of interrupts, system call traps and other exceptions




Kernel and Protection

•  The kernel is a system program that always runs with complete access privileges for the physical resources on its host computer
•  It executes in supervisor (privileged) mode; the kernel arranges that other processes execute in user (unprivileged) mode
•  It sets up address spaces to protect itself and other processes and to provide processes with their required virtual memory layout
•  A process can safely transfer from a user-level address space to the kernel’s address space via an exception such as an interrupt or a system call trap


OS with Hardware-supported Protection

•  Dual-Mode Operation
•  I/O Protection
•  Memory Protection
•  CPU Protection


Dual-Mode Operation

•  Sharing system resources requires the operating system to ensure that an incorrect program cannot cause other programs to execute incorrectly.
•  Provide hardware support to differentiate between at least two modes of operation.
1.  User mode: execution done on behalf of a user.
2.  Monitor mode (also kernel mode or system mode): execution done on behalf of the operating system.


Dual-Mode Operation (Cont.)

•  Mode bit added to computer hardware to indicate the current mode: monitor (0) or user (1).
•  When an interrupt or fault occurs, hardware switches to monitor mode.
•  Privileged instructions can be issued only in monitor mode.

[Figure: an interrupt or fault switches from user mode to monitor mode; a "set user mode" operation switches from monitor mode back to user mode]


I/O Protection

•  All I/O instructions are privileged instructions.
•  Must ensure that a user program can never gain control of the computer in monitor mode (for example, a user program that, as part of its execution, stores a new address in the interrupt vector).


Use of A System Call to Perform I/O


Memory Protection

•  Must provide memory protection at least for the interrupt vector and the interrupt service routines.
•  In order to have memory protection, add two registers that determine the range of legal addresses a program may access:
–  Base register: holds the smallest legal physical memory address.
–  Limit register: contains the size of the range.
•  Memory outside the defined range is protected.
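The base and limit check can be written out as a small sketch. The function names, the `AddressingError` exception and the register values are illustrative assumptions; real hardware performs this comparison on every memory access and traps to the operating system on failure.

```python
class AddressingError(Exception):
    """Hypothetical stand-in for the trap raised on an illegal access."""

def is_legal(address, base, limit):
    # Legal addresses satisfy base <= address < base + limit.
    return base <= address < base + limit

def access(address, base, limit):
    if not is_legal(address, base, limit):
        raise AddressingError(f"address {address} outside [{base}, {base + limit})")
    return address

BASE, LIMIT = 300040, 120900   # illustrative register contents

print(is_legal(300040, BASE, LIMIT))   # first legal address
print(is_legal(420939, BASE, LIMIT))   # last legal address
print(is_legal(420940, BASE, LIMIT))   # one past the end of the range
```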


Use of A Base and Limit Register


Hardware Address Protection


Hardware Protection

•  When executing in monitor mode, the operating system has unrestricted access to both monitor and user memory.
•  The load instructions for the base and limit registers are privileged instructions.


CPU Protection

•  Timer: interrupts the computer after a specified period to ensure the operating system maintains control.
–  Timer is decremented every clock tick.
–  When the timer reaches the value 0, an interrupt occurs.
•  Timer commonly used to implement time sharing.
•  Timer also used to compute the current time.
•  Load-timer is a privileged instruction.




Processes and threads

•  A thread is the operating system abstraction of an activity (the term derives from the phrase “thread of execution”)
•  An execution environment is the unit of resource management: a collection of local kernel-managed resources to which its threads have access
•  An execution environment primarily consists of:
–  An address space
–  Thread synchronization and communication resources such as semaphores and communication interfaces
–  High-level resources such as open files and windows

Single and Multithreaded Processes

Benefits

•  Responsiveness
•  Resource sharing
•  Utilization of multiprocessor (MP) & multicore architectures

A comparison of processes and threads

•  Creating a new thread within an existing process is cheaper than creating a process.
•  More importantly, switching to a different thread within the same process is cheaper than switching between threads belonging to different processes.
•  Threads within a process may share data and other resources conveniently and efficiently compared with separate processes.
•  But, by the same token, threads within a process are not protected from one another.
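The last two points can be illustrated with a short sketch: several threads update one object in the shared address space, and because nothing protects them from one another, the update is guarded by a lock. The names `counter`, `lock` and `add` are illustrative only.

```python
import threading

counter = {"value": 0}      # lives in the process's address space, visible to every thread
lock = threading.Lock()

def add(n):
    for _ in range(n):
        with lock:           # threads are not protected from one another, so guard the update
            counter["value"] += 1

threads = [threading.Thread(target=add, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter["value"])
```

Separate processes would need an explicit shared-memory region or message passing to achieve the same sharing.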

A comparison of processes and threads (2)

•  The overheads associated with creating a process are in general considerably greater than those of creating a new thread.
–  A new execution environment must first be created, including address space tables
•  The second performance advantage of threads concerns switching between threads, that is, running one thread instead of another at a given processor

Types

•  User-level thread
•  Kernel-level thread

User Threads

•  Thread management done by user-level threads library
•  Three primary thread libraries:
–  POSIX Pthreads
–  Win32 threads
–  Java threads

Threading Issues

•  Semantics of fork() and exec() system calls
•  Thread cancellation
•  Signal handling
•  Thread pools
•  Thread-specific data
•  Scheduler activations

Semantics of fork() and exec()

•  Does fork() duplicate only the calling thread or all threads?

Thread Cancellation

•  Terminating a thread before it has finished
•  Two general approaches:
–  Asynchronous cancellation terminates the target thread immediately
–  Deferred cancellation allows the target thread to periodically check if it should be cancelled
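Deferred cancellation can be sketched with a flag that the target thread polls at points where it is safe to stop. This is a minimal illustration using Python's `threading.Event`; the names `cancel_requested`, `progress` and `worker` are assumptions made for the example.

```python
import threading
import time

cancel_requested = threading.Event()
progress = []

def worker():
    step = 0
    while not cancel_requested.is_set():   # cancellation point: checked once per iteration
        progress.append(step)
        step += 1
        time.sleep(0.01)
    progress.append("cleaned up")          # the thread can release resources before exiting

t = threading.Thread(target=worker)
t.start()
time.sleep(0.05)
cancel_requested.set()                     # request cancellation; the worker decides when to stop
t.join(timeout=2)
print(progress[-1])
```

The cost of this approach is latency: the thread stops only when it next reaches a check, but it never stops in the middle of an update.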

"

Signal Handling

•  Signals are used in UNIX systems to notify a process that a particular event has occurred
•  A signal handler is used to process signals
1.  Signal is generated by a particular event
2.  Signal is delivered to a process
3.  Signal is handled
•  Options:
–  Deliver the signal to the thread to which the signal applies
–  Deliver the signal to every thread in the process
–  Deliver the signal to certain threads in the process
–  Assign a specific thread to receive all signals for the process

Thread Programming Paradigms

•  On-demand: create a thread whenever you need one
–  Easy to program
–  More overhead
•  Thread pool: create a pool of threads, and then assign tasks to them
–  More efficient
–  Harder to program, because you have to manage the threads in your code

Thread Pools

•  Create a number of threads in a pool where they await work
•  Advantages (over the thread-on-demand approach):
–  Usually slightly faster to service a request with an existing thread than to create a new thread
–  Allows the number of threads in the application(s) to be bound to the size of the pool
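A thread pool can be sketched with Python's standard `concurrent.futures.ThreadPoolExecutor`. The request handler `handle` and the pool size of 3 are illustrative assumptions; the point is that ten requests are served by at most three threads.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

def handle(request_id):
    # Pretend to service a request; report which pool thread did the work.
    return request_id * request_id, threading.current_thread().name

# The pool bounds the number of threads: at most 3 exist, however many requests arrive.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(handle, range(10)))

squares = [r[0] for r in results]
threads_used = {r[1] for r in results}
print(squares)
print(len(threads_used))
```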

Thread-Specific Data

•  Allows each thread to have its own copy of data
•  Useful when you do not have control over the thread creation process (for example, when using a thread pool)
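As a sketch of thread-specific data, Python's `threading.local` gives each thread its own attribute namespace. Here the pool creates the threads, so the code cannot pass per-thread state at creation time; the names `local`, `handle` and `per_thread` are illustrative.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

local = threading.local()   # each thread sees its own attributes on this object

def handle(request_id):
    if not hasattr(local, "seen"):
        local.seen = []      # first use in this thread: create its private list
    local.seen.append(request_id)
    return threading.get_ident(), list(local.seen)

with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(handle, range(6)))

# Keep the last snapshot reported by each thread.
per_thread = {}
for thread_id, seen in results:
    per_thread[thread_id] = seen
print(per_thread)
```

Each request id ends up in exactly one thread's list, because no thread can see another thread's copy.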

Scheduler Activations

•  Both M:M and two-level models require communication to maintain the appropriate number of kernel threads allocated to the application
•  Scheduler activations provide upcalls: a communication mechanism from the kernel to the thread library
•  This communication allows an application to maintain the correct number of kernel threads

Address spaces

•  An address space consists of one or more regions, separated by inaccessible areas of virtual memory
•  Regions do not overlap
•  Each region is specified by the following properties
–  Its extent (lowest virtual address and size)
–  Read/write/execute permissions for the process’s threads
–  Whether it can be grown upwards or downwards


Address space

[Figure: an address space running from address 0 to 2^N, containing text, heap and stack regions together with auxiliary regions]

Address spaces (2)

•  A mapped file is one that is accessed as an array of bytes in memory. The virtual memory system ensures that accesses made in memory are reflected in the underlying file storage
•  A shared memory region is one that is backed by the same physical memory as one or more regions belonging to other address spaces
•  The uses of shared regions include the following
–  Libraries
–  Kernel
–  Data sharing and communication
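The mapped-file idea can be tried with Python's standard `mmap` module. This sketch assumes a temporary file created just for the demonstration: bytes are changed through the in-memory mapping and then read back through ordinary file I/O.

```python
import mmap
import os
import tempfile

# Create a small file to map.
fd, path = tempfile.mkstemp()
os.write(fd, b"hello world")
os.close(fd)

with open(path, "r+b") as f:
    with mmap.mmap(f.fileno(), 0) as region:   # length 0 maps the whole file
        region[0:5] = b"HELLO"                 # an ordinary memory write
        region.flush()                         # push the change to the underlying file

with open(path, "rb") as f:
    contents = f.read()
os.remove(path)
print(contents)
```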




Process Creation

•  Supported by the operating system, for example by the UNIX fork system call
•  For a distributed system, the design of the process creation mechanism has to take account of the utilization of multiple computers
•  The creation of a new process can be separated into two independent aspects
–  The choice of a target host
–  The creation of an execution environment

Choice of process host

•  The choice of node at which the new process will reside (the process allocation decision) is a matter of policy
•  Transfer policy
–  Determines whether to situate a new process locally or remotely. This may depend, for example, on whether the local node is lightly or heavily loaded
•  Location policy
–  Determines which node should host a new process selected for transfer. This decision may depend on the relative loads of nodes, on their machine architectures and on any specialized resources they may possess
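The two policies can be separated in code. The sketch below is one simple instance under stated assumptions: the transfer policy is a load threshold, and the location policy picks the least-loaded other node with a matching architecture. The function names, node names and load figures are all hypothetical.

```python
def should_transfer(local_load, threshold):
    """Transfer policy: create the process remotely only if the local node is heavily loaded."""
    return local_load > threshold

def choose_host(local, loads, archs, required_arch, threshold):
    """Location policy: among other nodes with the right architecture, pick the least loaded."""
    if not should_transfer(loads[local], threshold):
        return local
    candidates = [n for n in loads if n != local and archs[n] == required_arch]
    if not candidates:
        return local           # nowhere suitable to transfer to
    return min(candidates, key=lambda n: loads[n])

loads = {"n1": 0.9, "n2": 0.2, "n3": 0.1}
archs = {"n1": "x86_64", "n2": "x86_64", "n3": "arm64"}

print(choose_host("n1", loads, archs, "x86_64", 0.7))
print(choose_host("n2", loads, archs, "x86_64", 0.7))
```

Here n3 has the lowest load but the wrong architecture, so a process leaving n1 goes to n2.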




Choice of process host (2)

•  Process location policies may be
–  Static
–  Adaptive
•  In load-sharing systems, load managers collect information about the nodes and use it to allocate new processes to nodes. Load-sharing systems may be
–  Centralized: one load manager component
–  Hierarchical: several load managers organized in a tree structure
–  Decentralized: nodes exchange information with one another directly to make allocation decisions




Choice of process host (3)

•  In sender-initiated load-sharing algorithms, the node that requires a new process to be created is responsible for initiating the transfer decision
•  In receiver-initiated algorithms, a node whose load is below a given threshold advertises its existence to other nodes so that relatively loaded nodes will transfer work to it
•  Migratory load-sharing systems can shift load at any time, not just when a new process is created. They use a mechanism called process migration

Creation of a new execution environment

•  There are two approaches to defining and initializing the address space of a newly created process
–  The first approach is used where the address space is of a statically defined format
•  For example, it could contain just a program text region, heap region and stack region
•  Address space regions are initialized from an executable file or filled with zeroes as appropriate

Creation of a new execution environment (2)

–  Alternatively, the address space can be defined with respect to an existing execution environment
•  For example, the newly created child process physically shares the parent’s text region, and has heap and stack regions that are copies of the parent’s in extent (as well as in initial contents)
•  When parent and child share a region, the page frames belonging to the parent’s region are mapped simultaneously into the corresponding child region

New ideas: cloud computing to deal with multiple apps or processes & runtime issues

•  Virtualization
•  System image (OS or VM)
•  App to run on VM
•  Create, pause, ship, resume
•  This approach eases runtime environment requirements by sending both the runtime & the app to the target host





Copy-on-write

[Figure: a) before write, b) after write. Region RB in process B’s address space is copied from region RA in process A’s address space; A’s page table and B’s page table in the kernel initially refer to a shared frame]

•  The pages are initially write-protected at the hardware level
•  A write causes a page fault: the page fault handler allocates a new frame for process B and copies the original frame’s data into it byte by byte
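The copy-on-write steps can be simulated in a few lines. This is a toy model under loud assumptions: `Frame` and `AddressSpace` are hypothetical classes, a page table is a dictionary, and the "page fault" is an ordinary branch in `write`. It is not how any particular kernel implements the mechanism.

```python
class Frame:
    def __init__(self, data):
        self.data = bytearray(data)
        self.refs = 1                      # number of page tables that map this frame

class AddressSpace:
    def __init__(self):
        self.page_table = {}               # page number -> [frame, writable]

    def map_new(self, page, data):
        self.page_table[page] = [Frame(data), True]

    def fork(self):
        child = AddressSpace()
        for page, entry in self.page_table.items():
            entry[1] = False               # write-protect the parent's mapping
            entry[0].refs += 1
            child.page_table[page] = [entry[0], False]   # child shares the same frame
        return child

    def read(self, page):
        return bytes(self.page_table[page][0].data)

    def write(self, page, offset, value):
        entry = self.page_table[page]
        frame, writable = entry
        if not writable:                   # simulated page fault on a protected page
            if frame.refs > 1:
                frame.refs -= 1
                frame = Frame(frame.data)  # allocate a new frame and copy the data
                entry[0] = frame
            entry[1] = True
        frame.data[offset] = value

a = AddressSpace()
a.map_new(0, b"abc")
b = a.fork()
shared_before = a.page_table[0][0] is b.page_table[0][0]
b.write(0, 0, ord("X"))
shared_after = a.page_table[0][0] is b.page_table[0][0]
print(shared_before, shared_after, a.read(0), b.read(0))
```

Until the first write, creating process B costs only a page-table copy; the frame copy is paid for only on pages that are actually modified.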

Client and server with threads

[Figure: in the client, thread 1 generates results and thread 2 makes requests to the server. In the server, an input-output thread handles receipt and queuing of requests, which are served by a worker pool of N threads]

•  A disadvantage of this architecture is its inflexibility
•  Another disadvantage is the high level of switching between the I/O and worker threads as they manipulate the shared queue

Alternative server threading architectures

[Figure: a. Thread-per-request: an I/O thread hands each request to a worker acting on remote objects. b. Thread-per-connection: per-connection threads act on remote objects. c. Thread-per-object: an I/O thread hands requests to per-object threads, one for each remote object]

a. Thread-per-request
•  Advantage: the threads do not contend for a shared queue, and throughput is potentially maximized
•  Disadvantage: the overheads of the thread creation and destruction operations

b. Thread-per-connection
•  Associates a thread with each connection

c. Thread-per-object
•  Associates a thread with each object

In each of these last two architectures the server benefits from lowered thread-management overheads compared with the thread-per-request architecture.

Their disadvantage is that clients may be delayed while a worker thread has several outstanding requests but another thread has no work to perform

Process & Context Switch


Stop here


Process Control Block (PCB)

Diagram of Process State

Ready Queue And Various I/O Device Queues

CPU Switch From Process to Process

User program & kernel interface

Note: This picture is excerpted from Write a Linux Hardware Device Driver, Andrew O’Shauqhnessy, Unix world

IPC Communications Models


Two processes & socket comm

[Figure: a client socket on any port sends a message to a server socket on an agreed port; the two hosts have Internet addresses 138.37.88.249 and 138.37.94.248, and each has other ports]

Describe a detailed cost function or model based on this architecture model

[Figure: Process A and Process B]

Cost Model

•  Process A reads data
•  Process A makes a system call to write the data to a socket
•  Context switch on Process A
•  Data is transmitted to Process B on 138.37.88.249
•  Process B makes a system call to read the data from a socket
•  Context switch on Process B
•  Process B gets the data from the socket
•  Assume that there are 100,000 writes/reads between A & B
•  Reading data from a variable costs 1 time unit
•  A context switch costs 50 time units

Cost Model

•  Assume that there are 100,000 writes/reads between A & B
•  Writing/reading data from a variable costs 1 time unit
•  A context switch costs 50 time units
•  Data transmission costs 100 time units
•  What is the estimated total cost?
•  What are possible solutions to reduce the cost?
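One possible reading of the exercise, offered as a sketch and not as the intended answer: assume each exchange is charged one variable read on A, one context switch for A's system call, one transmission, one context switch for B's system call, and one variable write on B. The batching function further assumes that transmission cost does not grow with message size, which is a simplification.

```python
READ_WRITE = 1          # cost of reading or writing a variable (from the slide)
CONTEXT_SWITCH = 50     # cost of one context switch (from the slide)
TRANSMISSION = 100      # cost of one data transmission (from the slide)
EXCHANGES = 100_000     # number of write/read exchanges between A and B

# Assumed per-exchange sequence: read, switch on A, transmit, switch on B, write.
per_exchange = READ_WRITE + CONTEXT_SWITCH + TRANSMISSION + CONTEXT_SWITCH + READ_WRITE
total = per_exchange * EXCHANGES

def batched_total(batch_size):
    """Cost if batch_size items share one system call pair and one transmission."""
    calls = -(-EXCHANGES // batch_size)    # ceiling division
    return EXCHANGES * 2 * READ_WRITE + calls * (2 * CONTEXT_SWITCH + TRANSMISSION)

print(per_exchange)        # 202
print(total)               # 20200000
print(batched_total(100))  # 400000
```

Under these assumptions the context switches and transmissions dominate, so batching several items per system call, or sharing memory between A and B when they are on the same host, are natural candidates for reducing the cost.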





State associated with execution environments and threads

Execution environment:
•  Address space tables
•  Communication interfaces, open files
•  Semaphores, other synchronization objects
•  List of thread identifiers

Thread:
•  Saved processor registers
•  Priority and execution state (such as BLOCKED)
•  Software interrupt handling information
•  Execution environment identifier

Both:
•  Pages of address space resident in memory; hardware cache entries


Java thread constructor and management methods

Thread(ThreadGroup group, Runnable target, String name)
Creates a new thread in the SUSPENDED state, which will belong to group and be identified as name; the thread will execute the run() method of target.

setPriority(int newPriority), getPriority()
Set and return the thread’s priority.

run()
A thread executes the run() method of its target object, if it has one, and otherwise its own run() method (Thread implements Runnable).

start()
Changes the state of the thread from SUSPENDED to RUNNABLE.

sleep(long millisecs)
Causes the thread to enter the SUSPENDED state for the specified time.

yield()
Causes the thread to enter the READY state and invoke the scheduler.

destroy()
Destroys the thread.


Java thread synchronization calls

thread.join(long millisecs)
Blocks the calling thread for up to the specified time until thread has terminated.

thread.interrupt()
Interrupts thread: causes it to return from a blocking method call such as sleep().

object.wait(long millisecs, int nanosecs)
Blocks the calling thread until a call made to notify() or notifyAll() on object wakes the thread, or the thread is interrupted, or the specified time has elapsed.

object.notify(), object.notifyAll()
Wakes, respectively, one or all of any threads that have called wait() on object.


Scheduler activations

[Figure: A. Assignment of virtual processors to processes: the kernel assigns virtual processors to Process A and Process B. B. Events between user-level scheduler & kernel: P added, SA preempted, SA unblocked and SA blocked pass from the kernel to the process; P idle and P needed pass from the process to the kernel. Key: P = processor; SA = scheduler activation]

The four types of event that the kernel notifies to the user-level scheduler

•  Virtual processor allocated
–  The kernel has assigned a new virtual processor to the process, and this is the first timeslice upon it; the scheduler can load the SA with the context of a READY thread, which can thus recommence execution
•  SA blocked
–  An SA has blocked in the kernel, and the kernel is using a fresh SA to notify the scheduler: the scheduler sets the state of the corresponding thread to BLOCKED and can allocate a READY thread to the notifying SA
•  SA unblocked
–  An SA that was blocked in the kernel has become unblocked and is ready to execute at user level again; the scheduler can now return the corresponding thread to the READY list. In order to create the notifying SA, the kernel either allocates a new virtual processor to the process or preempts another SA in the same process. In the latter case, it also communicates the preemption event to the scheduler, which can re-evaluate its allocation of threads to SAs.
•  SA preempted
–  The kernel has taken away the specified SA from the process (although it may do this to allocate a processor to a fresh SA in the same process); the scheduler places the preempted thread in the READY list and re-evaluates the thread allocation.

Invocation performance

•  Invocation performance is a critical factor in distributed system design

•  Network technologies continue to improve, but invocation times have not decreased in proportion with increases in network bandwidth


Invocations between address spaces

[Figure: (a) System call: a thread transfers control between user and kernel via a trap instruction, crossing the protection domain boundary. (b) RPC/RMI (within one computer): thread 1 in User 1 and thread 2 in User 2, with control transfer via privileged instructions in the kernel. (c) RPC/RMI (between computers): thread 1 in User 1 on Kernel 1 and thread 2 in User 2 on Kernel 2, connected by the network]




RPC delay against parameter size

[Figure: RPC delay plotted against requested data size in bytes (0 to 2000), with the packet size marked on the horizontal axis]

Client delay against requested data size. The delay is roughly proportional to the size until the size reaches a threshold at about network packet size




The following are the main components accounting for remote invocation delay, besides network transmission times:

•  Marshalling
–  Marshalling and unmarshalling, which involve copying and converting data, become a significant overhead as the amount of data grows
•  Data copying
–  Potentially, even after marshalling, message data is copied several times in the course of an RPC:
1.  Across the user-kernel boundary, between the client or server address space and kernel buffers
2.  Across each protocol layer (for example, RPC/UDP/IP/Ethernet)
3.  Between the network interface and kernel buffers
•  Packet initialization
–  This involves initializing protocol headers and trailers, including checksums. The cost is therefore proportional, in part, to the amount of data sent
•  Thread scheduling and context switching
1.  Several system calls (that is, context switches) are made during an RPC, as stubs invoke the kernel’s communication operations
2.  One or more server threads are scheduled
3.  If the operating system employs a separate network manager process, then each Send involves a context switch to one of its threads
•  Waiting for acknowledgements
–  The choice of RPC protocol may influence delay, particularly when large amounts of data are sent
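To make the marshalling component concrete, here is a minimal sketch using Python's standard `struct` module. The message layout (a 4-byte id, an 8-byte float, a 2-byte length and a UTF-8 string) and the function names are assumptions chosen for illustration, not the format of any real RPC system.

```python
import struct

HEADER = "!IdH"   # network byte order: unsigned int, double, unsigned short (name length)

def marshal(account_id, amount, name):
    # Convert the arguments to an external byte representation; note that the data is copied.
    encoded = name.encode("utf-8")
    return struct.pack(HEADER, account_id, amount, len(encoded)) + encoded

def unmarshal(message):
    # Rebuild the original values from the byte sequence on the receiving side.
    size = struct.calcsize(HEADER)
    account_id, amount, name_len = struct.unpack(HEADER, message[:size])
    name = message[size:size + name_len].decode("utf-8")
    return account_id, amount, name

message = marshal(7, 12.5, "alice")
print(len(message))
print(unmarshal(message))
```

Both functions touch every byte of the arguments, which is why this cost grows with the amount of data.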




A lightweight remote procedure call

•  The LRPC design is based on optimizations concerning data copying and thread scheduling.

•  Client and server are able to pass arguments and values directly via an A stack. The same stack is used by the client and server stubs

•  In LRPC, arguments are copied once: when they are marshalled onto the A stack. In an equivalent RPC, they are copied four times


A lightweight remote procedure call (2)

[Figure: the client and server share an A stack. 1. The client’s user stub copies args onto the A stack; 2. Trap to kernel; 3. Upcall to the server stub; 4. The server executes the procedure and copies results; 5. Return (trap)]

Asynchronous operation

•  A common technique to defeat high latencies is asynchronous operation, which arises in two programming models:
–  Concurrent invocations
–  Asynchronous invocations
•  An asynchronous invocation is one that is performed asynchronously with respect to the caller. That is, it is made with a non-blocking call, which returns as soon as the invocation request message has been created and is ready for dispatch
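The difference between serialized and concurrent invocations can be sketched as follows. The function `remote_invoke` is a hypothetical stand-in that sleeps to imitate network latency; `pool.submit` plays the role of the non-blocking call, returning a future immediately.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def remote_invoke(arg):
    time.sleep(0.05)          # stand-in for the latency of a remote invocation
    return arg * 2

args = [1, 2, 3, 4]

# Serialized: each invocation waits for the previous reply.
start = time.perf_counter()
serial_results = [remote_invoke(a) for a in args]
serial_time = time.perf_counter() - start

# Concurrent: all requests are issued before any reply is collected.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=len(args)) as pool:
    futures = [pool.submit(remote_invoke, a) for a in args]   # returns without blocking
    concurrent_results = [f.result() for f in futures]        # block only when results are needed
concurrent_time = time.perf_counter() - start

print(serial_results, concurrent_results)
print(f"serial {serial_time:.2f}s, concurrent {concurrent_time:.2f}s")
```

The results are identical; only the waiting overlaps, so the concurrent version should take roughly the time of one invocation.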


Times for serialized and concurrent invocations

[Figure: client and server timelines for serialised invocations and for concurrent invocations. In each invocation the client processes args, marshals and sends; after transmission the server receives, unmarshals, executes the request, marshals and sends; the client then receives, unmarshals and processes results. In the concurrent case these stages overlap across successive invocations. Vertical axis: time]


Monolithic kernel and microkernel

[Figure: a monolithic kernel, with servers S1 to S4 inside the kernel code and data, beside a microkernel, with S1 to S4 as dynamically loaded server programs outside it. Key: server; kernel code and data; dynamically loaded server program]


The role of the microkernel

[Figure: layers from top to bottom: middleware; language support subsystems and an OS emulation subsystem; microkernel; hardware]

The microkernel supports middleware via subsystems

Comparison

•  The chief advantage of a microkernel-based operating system is its extensibility

•  A relatively small kernel is more likely to be free of bugs than one that is large and more complex

•  The advantage of a monolithic design is the relative efficiency with which operations can be invoked
