comunicac¸ao˜ aula 5: revisao e programac¸˜ ao com sockets...

MC714: Sistemas Distribuıdos

Prof. Lucas Wanner

Instituto de Computacao, Unicamp

Comunicacao

Aula 5: Revisao e Programacao com SocketsAula 6: Troca de Mensagens e Multicast

Aula 7: Chamada de Procedimento Remoto

Revision: Threads and Distributed Systems

Improve performance

Starting a thread is typically much cheaper than starting a new process.Having a single-threaded server prohibits simple scale-up to a multiprocessor system.As with clients: hide network latency by reacting to next request while previous one is beingreplied.

Better structure

Most servers have high I/O demands. Using simple, well-understood blocking calls simplifiesthe overall structure.Multithreaded programs tend to be smaller and easier to understand due to simplified flow ofcontrol.

Source: Maarten van Steen, Distributed Systems: Principles and Paradigms 2 / 52

Revision: Architecture of VMs

ObservationVirtualization can take place at very different levels, strongly depending on the interfacesas offered by various systems components:

Privileged instructions

System calls

Library functions

General instructions

Hardware

Operating system

Library

Application


Revision: Process VMs versus VM Monitors

Runtime systemRuntime system

Hardware

Operating system

Hardware

Operating systemOperating system

Operating system

Applications

Virtual machine monitor

(a) (b)

Runtime system

Application

Process VM: A program is compiled to intermediate (portable) code, which is thenexecuted by a runtime system (Example: Java VM).VM Monitor: A separate software layer mimics the instruction set of hardware⇒ acomplete operating system and its applications can be supported (Example:VMware, VirtualBox).


Revision: Servers and state

Stateless serversNever keep accurate information about the status of a client after having handled a request:

Don’t record whether a file has been opened (simply close it again after access)Don’t promise to invalidate a client’s cacheDon’t keep track of your clients

Consequences

Clients and servers are completely independentState inconsistencies due to client or server crashes are reducedPossible loss of performance because, e.g., a server cannot anticipate client behavior (thinkof prefetching file blocks)

QuestionDoes connection-oriented communication fit into a stateless design?


Revision: Servers and state

Stateful serversKeeps track of the status of its clients:

Record that a file has been opened, so that prefetching can be doneKnows which data a client has cached, and allows clients to keep local copies ofshared data

ObservationThe performance of stateful servers can be extremely high, provided clients are allowedto keep local copies.


Revision: Server clusters

Logical switch (possibly multiple)

Application/compute servers Distributed file/database

system

Client requests

Dispatched request

First tier Second tier Third tier

Crucial elementThe first tier is generally responsible for passing requests to an appropriate server.


Revision: Request Handling

ObservationHaving the first tier handle all communication from/to the cluster may lead to a bottleneck.

SolutionVarious, but one popular one is TCP-handoff

Switch Client

Server

Server

RequestRequest

(handed off)

ResponseLogically asingle TCPconnection


Revision: Code MigrationBefore execution After execution

Client Server Client Server

Client-Server

code

state

resource

code

state*

resource

Remote Evaluation

code

→ state

resource

→

code

state*

resource

Code-on-Demand state

resource

←

code code

state*

resource

←

Mobile Agents

code

state

resource

→

resource resource

→

code

state*

resource


Revisao: Exercıcios

1 Considere um servico que leva um total de 10 ms para atender um pedido desdeque os dados necessarios estejam em uma cache na memoria principal. Nos casosonde os dados nao estao na cache, uma operacao de disco que leva 90 ms enecessaria antes de completar o pedido, e durante este tempo a thread queprocessa o pedido e suspensa. Assuma que os dados estao na cache para 50% dospedidos. Quantos pedidos por segundo o servidor pode tratar se for implementadocom uma unica thread? E se o servidor usar multiplas threads?

2 Faz sentido limitar o numero de threads em um processo servidor? Argumente.3 Existem casos onde um servidor single-thread tem desempenho melhor do que um

servidor multi-thread? Argumente.4 Um servidor multi-processos tem algumas vantagens e desvantagens quando

comparado com um servidor multi-threads. De alguns exemplos.5 Um servidor que mantem uma conexao TCP/IP para um cliente e stateful ou

stateless?10 / 52

Layered Protocols

Low-level layersTransport layerApplication layerMiddleware layer


Basic networking model

Physical

Data link

Network

Transport

Session

Application

Presentation

Application protocol

Presentation protocol

Session protocol

Transport protocol

Network protocol

Data link protocol

Physical protocol

Network

1

2

3

4

5

7

6

DrawbacksFocus on message-passing onlyOften unneeded or unwanted functionalityViolates access transparency


Low-level layers

RecapPhysical layer: contains the specification and implementation of bits, and theirtransmission between sender and receiverData link layer: prescribes the transmission of a series of bits into a frame to allow forerror and flow controlNetwork layer: describes how packets in a network of computers are to be routed.

ObservationFor many distributed systems, the lowest-level interface is that of the network layer.


Transport Layer

ImportantThe transport layer provides the actual communication facilities for most distributedsystems.

Standard Internet protocolsTCP: connection-oriented, reliable, stream-oriented communicationUDP: unreliable (best-effort) datagram communication


Middleware Layer

ObservationMiddleware is invented to provide common services and protocols that can be used bymany different applications

A rich set of communication protocols(Un)marshaling of data, necessary for integrated systemsNaming protocols, to allow easy sharing of resourcesSecurity protocols for secure communicationScaling mechanisms, such as for replication and caching

NoteWhat remains are truly application-specific protocols...such as?


Types of communication

Client

Server

Synchronize after processing by server

Synchronize at request delivery

Synchronize at request submission

Request

Reply

Storage facility

Transmission interrupt

Time

Distinguish

Transient versus persistent communicationAsynchrounous versus synchronous communication



Client

Server




Request

Reply

Storage facility


Time

Transient versus persistent

Transient communication: Communication server discards message when it cannot bedelivered at the next server, or at the receiver.Persistent communication: A message is stored at a communication server as long as ittakes to deliver it.



Client

Server




Request

Reply

Storage facility


Time

Places for synchronization

At request submissionAt request deliveryAfter request processing


Client/Server

Some observationsClient/Server computing is generally based on a model of transient synchronouscommunication:

Client and server have to be active at time of communicationClient issues request and blocks until it receives replyServer essentially waits only for incoming requests, and subsequently processesthem

Drawbacks synchronous communicationClient cannot do any other work while waiting for replyFailures have to be handled immediately: the client is waitingThe model may simply not be appropriate (mail, news)


Message-Oriented Communication

Transient MessagingMessage-Queuing SystemMessage BrokersExample: System-V and POSIX Message QueuesExample: RabbitMQExample: ØMQ


Revision: sockets

Berkeley socket interface

SOCKET Create a new communication endpointBIND Attach a local address to a socketLISTEN Announce willingness to accept N connectionsACCEPT Block until request to establish a connectionCONNECT Attempt to establish a connectionSEND Send data over a connectionRECEIVE Receive data over a connectionCLOSE Release the connection


Revision: sockets

connect

socket

socket

bind listen read

read

write

write

accept close

close

Server

Client

Synchronization point Communication


Sockets: Python code

Serverimport socketHOST = ’’PORT = SERVERPORTs = socket.socket(socket.AF INET, socket.SOCK STREAM)s.bind((HOST, PORT))s.listen(N) # listen to max N queued connectionconn, addr = s.accept() # returns new socket + addr clientwhile 1: # forever

data = conn.recv(1024)if not data: breakconn.send(data)

conn.close()

Clientimport socketHOST = ’distsys.cs.vu.nl’PORT = SERVERPORTs = socket.socket(socket.AF INET, socket.SOCK STREAM)s.connect((HOST, PORT))s.send(’Hello, world’)data = s.recv(1024)s.close()

23 / 52

Messaging

Message-oriented middlewareAims at high-level persistent asynchronous communication:

Processes send each other messages, which are queuedSender need not wait for immediate reply, but can do other thingsMiddleware often ensures fault tolerance


Message-oriented middleware

EssenceAsynchronous persistent communication through support of middleware-level queues.Queues correspond to buffers at communication servers.

PUT Append a message to a specified queueGET Block until the specified queue is nonempty, and re-

move the first messagePOLL Check a specified queue for messages, and remove

the first. Never blockNOTIFY Install a handler to be called when a message is put

into the specified queue


Message broker

ObservationMessage queuing systems assume a common messaging protocol: all applications agreeon message format (i.e., structure and data representation)

Message brokerCentralized component that takes care of application heterogeneity in an MQ system:

Transforms incoming messages to target formatVery often acts as an application gatewayMay provide subject-based routing capabilities⇒ Enterprise Application Integration


Message broker

Queuinglayer

Brokerprogram

Repository with conversion rules and programsSource client Destination client

OS OSOS

Message broker

Network


Example: System-V and POSIX Message Queues

ObjectiveLocal inter-process communication through message exchanges.

Persistent, asynchronous communicationCreate queue with a key or nameSent messages are stored in the kernelAsynchronous receiveSystem-V: msgget, msgrcv, msgsndPOSIX: mq open, mq send, mq receive, ...

28 / 52

Example: ØMQ

ObjectiveØMQ is a networking library.

FeaturesAsynchronous I/OPublish/Subscribe (fanout) – N-to-N communicationTopic-based exchange (filtering)High performanceMultiple language bindings

29 / 52

Example: RabbitMQ

ObjectiveRabbitMQ is a message broker. It accepts and forwards messages.

Persistent, asynchronous communicationSender

Create a communication channel and declare a message queuePublish data to the queue

Receiver

Create a communication channel and declare a message queueDefine a callback function to handle incoming informationStart consuming data from the channel

30 / 52

Example: RabbitMQ

FeaturesRound-robin dispatchingDurable (persistent) messagesPublish/Subscribe (fanout)Topic-based exchange (filtering)RPC InterfaceMultiple language bindings

31 / 52

Multicast communication

Application-level multicastingGossip-based data dissemination


Application-level multicasting

EssenceOrganize nodes of a distributed system into an overlay network and use that network todisseminate data.

Chord-based tree building1 Initiator generates a multicast identifier mid.2 Lookup succ(mid), the node responsible for mid.3 Request is routed to succ(mid), which will become the root.4 If P wants to join, it sends a join request to the root.5 When request arrives at Q:

Q has not seen a join request before⇒ it becomes forwarder; P becomes child of Q.Join request continues to be forwarded.Q knows about tree⇒ P becomes child of Q. No need to forward join request anymore.


ALM: Some costs

A

BD

C

Ra

RbRd

Rc

Internet

RouterEnd host

Overlay network

75

1

1

1

1

30

40

Re 20

Link stress: How often does an ALM message cross the same physical link?Example: message from A to D needs to cross 〈Ra,Rb〉 twice.Stretch: Ratio in delay between ALM-level path and network-level path. Example:messages B to C follow path of length 71 at ALM, but 47 at network level⇒ stretch =71/47.


Epidemic Algorithms

General backgroundUpdate modelsRemoving objects


Principles

Basic ideaAssume there are no write–write conflicts:

Update operations are performed at a single serverA replica passes updated state to only a few neighborsUpdate propagation is lazy, i.e., not immediateEventually, each update should reach every replica

Two forms of epidemics

Anti-entropy: Each replica regularly chooses another replica at random, and exchanges statedifferences, leading to identical states at both afterwardsGossiping: A replica which has just been updated (i.e., has been contaminated), tells anumber of other replicas about its update (contaminating them as well).


Anti-entropy

Principle operationsA node P selects another node Q from the system at random.Push: P only sends its updates to QPull: P only retrieves updates from QPush-Pull: P and Q exchange mutual updates (after which they hold the sameinformation).

ObservationFor push-pull it takes O(log(N)) rounds to disseminate updates to all N nodes (round =when every node as taken the initiative to start an exchange).


Gossiping

Basic modelA server S having an update to report, contacts other servers. If a server is contacted towhich the update has already propagated, S stops contacting other servers withprobability 1/k .

ObservationIf s is the fraction of ignorant servers (i.e., which are unaware of the update), it can beshown that with many servers

s = e−(k+1)(1−s)

NoteIf we really have to ensure that all servers are eventually updated, gossiping alone is notenough


Deleting values

Fundamental problemWe cannot remove an old value from a server and expect the removal to propagate.Instead, mere removal will be undone in due time using epidemic algorithms

SolutionRemoval has to be registered as a special update by inserting a death certificate


Deleting values

Next problem

When to remove a death certificate (it is not allowed to stay for ever):

Run a global algorithm to detect whether the removal is known everywhere, and then collectthe death certificates (looks like garbage collection)Assume death certificates propagate in finite time, and associate a maximum lifetime for acertificate (can be done at risk of not reaching all servers)

NoteIt is necessary that a removal actually reaches all servers.


Example applications

Typical appsData dissemination: Perhaps the most important one. Note that there are manyvariants of dissemination.Aggregation: Let every node i maintain a variable xi . When two nodes gossip, theyeach reset their variable to

xi ,xj ← (xi + xj)/2

Result: in the end each node will have computed the average x = ∑i xi/N.

QuestionWhat happens if initially xi = 1 and xj = 0, j 6= i?


Remote Procedure Call (RPC)

Basic RPC operationParameter passingVariations


Basic RPC operation

ObservationsApplication developers are familiar with simple procedure modelWell-engineered procedures operate in isolation (black box)There is no fundamental reason not to execute procedures on separate machine

ConclusionCommunication between caller & callee canbe hidden by using procedure-callmechanism.

Call local procedureand return results

Call remoteprocedure

Returnfrom call

Client

Request Reply

ServerTime

Wait for result


Basic RPC operation

Implementationof add

Client OS Server OS

Client machine Server machine

Client stub

Client process Server process1. Client call to

procedure

2. Stub buildsmessage

5. Stub unpacksmessage

6. Stub makeslocal call to "add"

3. Message is sentacross the network

4. Server OShands messageto server stub

Server stubk = add(i,j) k = add(i,j)

proc: "add"int: val(i)int: val(j)



1 Client procedure calls client stub.2 Stub builds message; calls local OS.3 OS sends message to remote OS.4 Remote OS gives message to stub.5 Stub unpacks parameters and calls server.

6 Server makes local call and returns result tostub.

7 Stub builds message; calls OS.8 OS sends message to client’s OS.9 Client’s OS gives message to stub.

10 Client stub unpacks result and returns to theclient.


RPC: Parameter passing

Parameter marshalingThere’s more than just wrapping parameters into a message:

Client and server machines may have different data representations (think of byteordering)Wrapping a parameter means transforming a value into a sequence of bytesClient and server have to agree on the same encoding:

How are basic data values represented (integers, floats, characters)How are complex data values represented (arrays, unions)

Client and server need to properly interpret messages, transforming them intomachine-dependent representations.


RPC: Parameter passing

RPC parameter passing: some assumptions

Copy in/copy out semantics: while procedure is executed, nothing can be assumed aboutparameter values.All data that is to be operated on is passed by parameters. Excludes passing references to(global) data.

ConclusionFull access transparency cannot be realized.

ObservationA remote reference mechanism enhances access transparency:

Remote reference offers unified access to remote dataRemote references can be passed as parameter in RPCs


Asynchronous RPCs

EssenceTry to get rid of the strict request-reply behavior, but let the client continue without waitingfor an answer from the server.

Call local procedure


Returnfrom call

Request Accept request

Wait for acceptance

Call local procedureand return results


Returnfrom call

Client Client

Request Reply

Server ServerTime Time

Wait for result

(a) (b)


Deferred synchronous RPCs

Call local procedure


Returnfrom call

Client

RequestAcceptrequest

ServerTime

Wait foracceptance

Interrupt client

Returnresults Acknowledge

Call client withone-way RPC

VariationClient can also do a (non)blocking poll at the server to see whether results are available.


RPC in practice

C compiler

Uuidgen

IDL compiler

C compiler C compiler

Linker Linker

C compiler

Server stubobject file

Serverobject file

Runtimelibrary

Serverbinary

Clientbinary

Runtimelibrary

Client stubobject file

Clientobject file

Client stubClient code Header Server stub

Interfacedefinition file

Server code

#include#include


Client-to-server binding (DCE)

Issues(1) Client must locate server machine, and (2) locate the server.

Endpointtable

Server

DCEdaemon

Client1. Register endpoint

2. Register service3. Look up server

4. Ask for endpoint

5. Do RPC

Directoryserver

Server machineClient machine

Directory machine


Exercıcios

1 Descreva o processo de conexao entre cliente e servidor com sockets TCP/IP.2 Diferencie comunicacao sıncrona e assıncrona, persistente e transiente. De

exemplos de cada combinacao.3 Descreva um problema de escalabilidade com comunicacao sıncrona transiente.4 Qual e o papel de um broker na comunicacao orientada a mensagens?5 Na Figura 4.31, qual e o fator de stretch da rede de overlay na rota A→C?6 Explique o princıpio de anti-entropia usado em protocolos epidemicos.7 Descreva o problema de remocao de dados em protocolos epidemicos e apresente

uma solucao.8 Descreva um algoritmo epidemico que calcule o tamanho de uma rede.

51 / 52

Exercıcios

8 Descreva o funcionamento e implementacao de RPC.9 Considere um procedimento incr com dois parametros inteiros. O procedimento

adiciona um em cada parametro. Suponha que este procedimento foi chamado umavez com a mesma variavel nos dois parametros, por exemplo incr(i,i). Se iinicialmente e 0, qual sera o seu valor se incr for chamada por referencia? E se incrusar copia e restauracao?

10 Uma union em C permite que um campo de uma estrutura guarde uma de variasalternativas possıveis. Em tempo de execucao, nao ha como diretamente saber qualtipo a union guarda em algum momento. Unions apresentam alguma dificuldadepara implementacao de RPC?

11 O que e implementado em uma biblioteca de runtime para RPC?

52 / 52

comunicac¸ao˜ aula 5: revisao e programac¸˜ ao com sockets...

Documents