comunicac¸ao˜ aula 5: revisao e programac¸˜ ao com sockets...
TRANSCRIPT
MC714: Sistemas Distribuıdos
Prof. Lucas Wanner
Instituto de Computacao, Unicamp
Comunicacao
Aula 5: Revisao e Programacao com SocketsAula 6: Troca de Mensagens e Multicast
Aula 7: Chamada de Procedimento Remoto
Revision: Threads and Distributed Systems
Improve performance
Starting a thread is typically much cheaper than starting a new process.Having a single-threaded server prohibits simple scale-up to a multiprocessor system.As with clients: hide network latency by reacting to next request while previous one is beingreplied.
Better structure
Most servers have high I/O demands. Using simple, well-understood blocking calls simplifiesthe overall structure.Multithreaded programs tend to be smaller and easier to understand due to simplified flow ofcontrol.
Source: Maarten van Steen, Distributed Systems: Principles and Paradigms 2 / 52
Revision: Architecture of VMs
ObservationVirtualization can take place at very different levels, strongly depending on the interfacesas offered by various systems components:
Privileged instructions
System calls
Library functions
General instructions
Hardware
Operating system
Library
Application
Source: Maarten van Steen, Distributed Systems: Principles and Paradigms 3 / 52
Revision: Process VMs versus VM Monitors
Runtime systemRuntime system
Hardware
Operating system
Hardware
Operating systemOperating system
Operating system
Applications
Virtual machine monitor
(a) (b)
Runtime system
Application
Process VM: A program is compiled to intermediate (portable) code, which is thenexecuted by a runtime system (Example: Java VM).VM Monitor: A separate software layer mimics the instruction set of hardware⇒ acomplete operating system and its applications can be supported (Example:VMware, VirtualBox).
Source: Maarten van Steen, Distributed Systems: Principles and Paradigms 4 / 52
Revision: Servers and state
Stateless serversNever keep accurate information about the status of a client after having handled a request:
Don’t record whether a file has been opened (simply close it again after access)Don’t promise to invalidate a client’s cacheDon’t keep track of your clients
Consequences
Clients and servers are completely independentState inconsistencies due to client or server crashes are reducedPossible loss of performance because, e.g., a server cannot anticipate client behavior (thinkof prefetching file blocks)
QuestionDoes connection-oriented communication fit into a stateless design?
Source: Maarten van Steen, Distributed Systems: Principles and Paradigms 5 / 52
Revision: Servers and state
Stateful serversKeeps track of the status of its clients:
Record that a file has been opened, so that prefetching can be doneKnows which data a client has cached, and allows clients to keep local copies ofshared data
ObservationThe performance of stateful servers can be extremely high, provided clients are allowedto keep local copies.
Source: Maarten van Steen, Distributed Systems: Principles and Paradigms 6 / 52
Revision: Server clusters
Logical switch (possibly multiple)
Application/compute servers Distributed file/database
system
Client requests
Dispatched request
First tier Second tier Third tier
Crucial elementThe first tier is generally responsible for passing requests to an appropriate server.
Source: Maarten van Steen, Distributed Systems: Principles and Paradigms 7 / 52
Revision: Request Handling
ObservationHaving the first tier handle all communication from/to the cluster may lead to a bottleneck.
SolutionVarious, but one popular one is TCP-handoff
Switch Client
Server
Server
RequestRequest
(handed off)
ResponseLogically asingle TCPconnection
Source: Maarten van Steen, Distributed Systems: Principles and Paradigms 8 / 52
Revision: Code MigrationBefore execution After execution
Client Server Client Server
Client-Server
code
state
resource
code
state*
resource
Remote Evaluation
code
→ state
resource
→
code
state*
resource
Code-on-Demand state
resource
←
code code
state*
resource
←
Mobile Agents
code
state
resource
→
resource resource
→
code
state*
resource
Source: Maarten van Steen, Distributed Systems: Principles and Paradigms 9 / 52
Revisao: Exercıcios
1 Considere um servico que leva um total de 10 ms para atender um pedido desdeque os dados necessarios estejam em uma cache na memoria principal. Nos casosonde os dados nao estao na cache, uma operacao de disco que leva 90 ms enecessaria antes de completar o pedido, e durante este tempo a thread queprocessa o pedido e suspensa. Assuma que os dados estao na cache para 50% dospedidos. Quantos pedidos por segundo o servidor pode tratar se for implementadocom uma unica thread? E se o servidor usar multiplas threads?
2 Faz sentido limitar o numero de threads em um processo servidor? Argumente.3 Existem casos onde um servidor single-thread tem desempenho melhor do que um
servidor multi-thread? Argumente.4 Um servidor multi-processos tem algumas vantagens e desvantagens quando
comparado com um servidor multi-threads. De alguns exemplos.5 Um servidor que mantem uma conexao TCP/IP para um cliente e stateful ou
stateless?10 / 52
Layered Protocols
Low-level layersTransport layerApplication layerMiddleware layer
Source: Maarten van Steen, Distributed Systems: Principles and Paradigms 11 / 52
Basic networking model
Physical
Data link
Network
Transport
Session
Application
Presentation
Application protocol
Presentation protocol
Session protocol
Transport protocol
Network protocol
Data link protocol
Physical protocol
Network
1
2
3
4
5
7
6
DrawbacksFocus on message-passing onlyOften unneeded or unwanted functionalityViolates access transparency
Source: Maarten van Steen, Distributed Systems: Principles and Paradigms 12 / 52
Low-level layers
RecapPhysical layer: contains the specification and implementation of bits, and theirtransmission between sender and receiverData link layer: prescribes the transmission of a series of bits into a frame to allow forerror and flow controlNetwork layer: describes how packets in a network of computers are to be routed.
ObservationFor many distributed systems, the lowest-level interface is that of the network layer.
Source: Maarten van Steen, Distributed Systems: Principles and Paradigms 13 / 52
Transport Layer
ImportantThe transport layer provides the actual communication facilities for most distributedsystems.
Standard Internet protocolsTCP: connection-oriented, reliable, stream-oriented communicationUDP: unreliable (best-effort) datagram communication
Source: Maarten van Steen, Distributed Systems: Principles and Paradigms 14 / 52
Middleware Layer
ObservationMiddleware is invented to provide common services and protocols that can be used bymany different applications
A rich set of communication protocols(Un)marshaling of data, necessary for integrated systemsNaming protocols, to allow easy sharing of resourcesSecurity protocols for secure communicationScaling mechanisms, such as for replication and caching
NoteWhat remains are truly application-specific protocols...such as?
Source: Maarten van Steen, Distributed Systems: Principles and Paradigms 15 / 52
Types of communication
Client
Server
Synchronize after processing by server
Synchronize at request delivery
Synchronize at request submission
Request
Reply
Storage facility
Transmission interrupt
Time
Distinguish
Transient versus persistent communicationAsynchrounous versus synchronous communication
Source: Maarten van Steen, Distributed Systems: Principles and Paradigms 16 / 52
Types of communication
Client
Server
Synchronize after processing by server
Synchronize at request delivery
Synchronize at request submission
Request
Reply
Storage facility
Transmission interrupt
Time
Transient versus persistent
Transient communication: Communication server discards message when it cannot bedelivered at the next server, or at the receiver.Persistent communication: A message is stored at a communication server as long as ittakes to deliver it.
Source: Maarten van Steen, Distributed Systems: Principles and Paradigms 17 / 52
Types of communication
Client
Server
Synchronize after processing by server
Synchronize at request delivery
Synchronize at request submission
Request
Reply
Storage facility
Transmission interrupt
Time
Places for synchronization
At request submissionAt request deliveryAfter request processing
Source: Maarten van Steen, Distributed Systems: Principles and Paradigms 18 / 52
Client/Server
Some observationsClient/Server computing is generally based on a model of transient synchronouscommunication:
Client and server have to be active at time of communicationClient issues request and blocks until it receives replyServer essentially waits only for incoming requests, and subsequently processesthem
Drawbacks synchronous communicationClient cannot do any other work while waiting for replyFailures have to be handled immediately: the client is waitingThe model may simply not be appropriate (mail, news)
Source: Maarten van Steen, Distributed Systems: Principles and Paradigms 19 / 52
Message-Oriented Communication
Transient MessagingMessage-Queuing SystemMessage BrokersExample: System-V and POSIX Message QueuesExample: RabbitMQExample: ØMQ
Source: Maarten van Steen, Distributed Systems: Principles and Paradigms 20 / 52
Revision: sockets
Berkeley socket interface
SOCKET Create a new communication endpointBIND Attach a local address to a socketLISTEN Announce willingness to accept N connectionsACCEPT Block until request to establish a connectionCONNECT Attempt to establish a connectionSEND Send data over a connectionRECEIVE Receive data over a connectionCLOSE Release the connection
Source: Maarten van Steen, Distributed Systems: Principles and Paradigms 21 / 52
Revision: sockets
connect
socket
socket
bind listen read
read
write
write
accept close
close
Server
Client
Synchronization point Communication
Source: Maarten van Steen, Distributed Systems: Principles and Paradigms 22 / 52
Sockets: Python code
Serverimport socketHOST = ’’PORT = SERVERPORTs = socket.socket(socket.AF INET, socket.SOCK STREAM)s.bind((HOST, PORT))s.listen(N) # listen to max N queued connectionconn, addr = s.accept() # returns new socket + addr clientwhile 1: # forever
data = conn.recv(1024)if not data: breakconn.send(data)
conn.close()
Clientimport socketHOST = ’distsys.cs.vu.nl’PORT = SERVERPORTs = socket.socket(socket.AF INET, socket.SOCK STREAM)s.connect((HOST, PORT))s.send(’Hello, world’)data = s.recv(1024)s.close()
23 / 52
Messaging
Message-oriented middlewareAims at high-level persistent asynchronous communication:
Processes send each other messages, which are queuedSender need not wait for immediate reply, but can do other thingsMiddleware often ensures fault tolerance
Source: Maarten van Steen, Distributed Systems: Principles and Paradigms 24 / 52
Message-oriented middleware
EssenceAsynchronous persistent communication through support of middleware-level queues.Queues correspond to buffers at communication servers.
PUT Append a message to a specified queueGET Block until the specified queue is nonempty, and re-
move the first messagePOLL Check a specified queue for messages, and remove
the first. Never blockNOTIFY Install a handler to be called when a message is put
into the specified queue
Source: Maarten van Steen, Distributed Systems: Principles and Paradigms 25 / 52
Message broker
ObservationMessage queuing systems assume a common messaging protocol: all applications agreeon message format (i.e., structure and data representation)
Message brokerCentralized component that takes care of application heterogeneity in an MQ system:
Transforms incoming messages to target formatVery often acts as an application gatewayMay provide subject-based routing capabilities⇒ Enterprise Application Integration
Source: Maarten van Steen, Distributed Systems: Principles and Paradigms 26 / 52
Message broker
Queuinglayer
Brokerprogram
Repository with conversion rules and programsSource client Destination client
OS OSOS
Message broker
Network
Source: Maarten van Steen, Distributed Systems: Principles and Paradigms 27 / 52
Example: System-V and POSIX Message Queues
ObjectiveLocal inter-process communication through message exchanges.
Persistent, asynchronous communicationCreate queue with a key or nameSent messages are stored in the kernelAsynchronous receiveSystem-V: msgget, msgrcv, msgsndPOSIX: mq open, mq send, mq receive, ...
28 / 52
Example: ØMQ
ObjectiveØMQ is a networking library.
FeaturesAsynchronous I/OPublish/Subscribe (fanout) – N-to-N communicationTopic-based exchange (filtering)High performanceMultiple language bindings
29 / 52
Example: RabbitMQ
ObjectiveRabbitMQ is a message broker. It accepts and forwards messages.
Persistent, asynchronous communicationSender
Create a communication channel and declare a message queuePublish data to the queue
Receiver
Create a communication channel and declare a message queueDefine a callback function to handle incoming informationStart consuming data from the channel
30 / 52
Example: RabbitMQ
FeaturesRound-robin dispatchingDurable (persistent) messagesPublish/Subscribe (fanout)Topic-based exchange (filtering)RPC InterfaceMultiple language bindings
31 / 52
Multicast communication
Application-level multicastingGossip-based data dissemination
Source: Maarten van Steen, Distributed Systems: Principles and Paradigms 32 / 52
Application-level multicasting
EssenceOrganize nodes of a distributed system into an overlay network and use that network todisseminate data.
Chord-based tree building1 Initiator generates a multicast identifier mid.2 Lookup succ(mid), the node responsible for mid.3 Request is routed to succ(mid), which will become the root.4 If P wants to join, it sends a join request to the root.5 When request arrives at Q:
Q has not seen a join request before⇒ it becomes forwarder; P becomes child of Q.Join request continues to be forwarded.Q knows about tree⇒ P becomes child of Q. No need to forward join request anymore.
Source: Maarten van Steen, Distributed Systems: Principles and Paradigms 33 / 52
ALM: Some costs
A
BD
C
Ra
RbRd
Rc
Internet
RouterEnd host
Overlay network
75
1
1
1
1
30
40
Re 20
Link stress: How often does an ALM message cross the same physical link?Example: message from A to D needs to cross 〈Ra,Rb〉 twice.Stretch: Ratio in delay between ALM-level path and network-level path. Example:messages B to C follow path of length 71 at ALM, but 47 at network level⇒ stretch =71/47.
Source: Maarten van Steen, Distributed Systems: Principles and Paradigms 34 / 52
Epidemic Algorithms
General backgroundUpdate modelsRemoving objects
Source: Maarten van Steen, Distributed Systems: Principles and Paradigms 35 / 52
Principles
Basic ideaAssume there are no write–write conflicts:
Update operations are performed at a single serverA replica passes updated state to only a few neighborsUpdate propagation is lazy, i.e., not immediateEventually, each update should reach every replica
Two forms of epidemics
Anti-entropy: Each replica regularly chooses another replica at random, and exchanges statedifferences, leading to identical states at both afterwardsGossiping: A replica which has just been updated (i.e., has been contaminated), tells anumber of other replicas about its update (contaminating them as well).
Source: Maarten van Steen, Distributed Systems: Principles and Paradigms 36 / 52
Anti-entropy
Principle operationsA node P selects another node Q from the system at random.Push: P only sends its updates to QPull: P only retrieves updates from QPush-Pull: P and Q exchange mutual updates (after which they hold the sameinformation).
ObservationFor push-pull it takes O(log(N)) rounds to disseminate updates to all N nodes (round =when every node as taken the initiative to start an exchange).
Source: Maarten van Steen, Distributed Systems: Principles and Paradigms 37 / 52
Gossiping
Basic modelA server S having an update to report, contacts other servers. If a server is contacted towhich the update has already propagated, S stops contacting other servers withprobability 1/k .
ObservationIf s is the fraction of ignorant servers (i.e., which are unaware of the update), it can beshown that with many servers
s = e−(k+1)(1−s)
NoteIf we really have to ensure that all servers are eventually updated, gossiping alone is notenough
Source: Maarten van Steen, Distributed Systems: Principles and Paradigms 38 / 52
Deleting values
Fundamental problemWe cannot remove an old value from a server and expect the removal to propagate.Instead, mere removal will be undone in due time using epidemic algorithms
SolutionRemoval has to be registered as a special update by inserting a death certificate
Source: Maarten van Steen, Distributed Systems: Principles and Paradigms 39 / 52
Deleting values
Next problem
When to remove a death certificate (it is not allowed to stay for ever):
Run a global algorithm to detect whether the removal is known everywhere, and then collectthe death certificates (looks like garbage collection)Assume death certificates propagate in finite time, and associate a maximum lifetime for acertificate (can be done at risk of not reaching all servers)
NoteIt is necessary that a removal actually reaches all servers.
Source: Maarten van Steen, Distributed Systems: Principles and Paradigms 40 / 52
Example applications
Typical appsData dissemination: Perhaps the most important one. Note that there are manyvariants of dissemination.Aggregation: Let every node i maintain a variable xi . When two nodes gossip, theyeach reset their variable to
xi ,xj ← (xi + xj)/2
Result: in the end each node will have computed the average x = ∑i xi/N.
QuestionWhat happens if initially xi = 1 and xj = 0, j 6= i?
Source: Maarten van Steen, Distributed Systems: Principles and Paradigms 41 / 52
Remote Procedure Call (RPC)
Basic RPC operationParameter passingVariations
Source: Maarten van Steen, Distributed Systems: Principles and Paradigms 42 / 52
Basic RPC operation
ObservationsApplication developers are familiar with simple procedure modelWell-engineered procedures operate in isolation (black box)There is no fundamental reason not to execute procedures on separate machine
ConclusionCommunication between caller & callee canbe hidden by using procedure-callmechanism.
Call local procedureand return results
Call remoteprocedure
Returnfrom call
Client
Request Reply
ServerTime
Wait for result
Source: Maarten van Steen, Distributed Systems: Principles and Paradigms 43 / 52
Basic RPC operation
Implementationof add
Client OS Server OS
Client machine Server machine
Client stub
Client process Server process1. Client call to
procedure
2. Stub buildsmessage
5. Stub unpacksmessage
6. Stub makeslocal call to "add"
3. Message is sentacross the network
4. Server OShands messageto server stub
Server stubk = add(i,j) k = add(i,j)
proc: "add"int: val(i)int: val(j)
proc: "add"int: val(i)int: val(j)
proc: "add"int: val(i)int: val(j)
1 Client procedure calls client stub.2 Stub builds message; calls local OS.3 OS sends message to remote OS.4 Remote OS gives message to stub.5 Stub unpacks parameters and calls server.
6 Server makes local call and returns result tostub.
7 Stub builds message; calls OS.8 OS sends message to client’s OS.9 Client’s OS gives message to stub.
10 Client stub unpacks result and returns to theclient.
Source: Maarten van Steen, Distributed Systems: Principles and Paradigms 44 / 52
RPC: Parameter passing
Parameter marshalingThere’s more than just wrapping parameters into a message:
Client and server machines may have different data representations (think of byteordering)Wrapping a parameter means transforming a value into a sequence of bytesClient and server have to agree on the same encoding:
How are basic data values represented (integers, floats, characters)How are complex data values represented (arrays, unions)
Client and server need to properly interpret messages, transforming them intomachine-dependent representations.
Source: Maarten van Steen, Distributed Systems: Principles and Paradigms 45 / 52
RPC: Parameter passing
RPC parameter passing: some assumptions
Copy in/copy out semantics: while procedure is executed, nothing can be assumed aboutparameter values.All data that is to be operated on is passed by parameters. Excludes passing references to(global) data.
ConclusionFull access transparency cannot be realized.
ObservationA remote reference mechanism enhances access transparency:
Remote reference offers unified access to remote dataRemote references can be passed as parameter in RPCs
Source: Maarten van Steen, Distributed Systems: Principles and Paradigms 46 / 52
Asynchronous RPCs
EssenceTry to get rid of the strict request-reply behavior, but let the client continue without waitingfor an answer from the server.
Call local procedure
Call remoteprocedure
Returnfrom call
Request Accept request
Wait for acceptance
Call local procedureand return results
Call remoteprocedure
Returnfrom call
Client Client
Request Reply
Server ServerTime Time
Wait for result
(a) (b)
Source: Maarten van Steen, Distributed Systems: Principles and Paradigms 47 / 52
Deferred synchronous RPCs
Call local procedure
Call remoteprocedure
Returnfrom call
Client
RequestAcceptrequest
ServerTime
Wait foracceptance
Interrupt client
Returnresults Acknowledge
Call client withone-way RPC
VariationClient can also do a (non)blocking poll at the server to see whether results are available.
Source: Maarten van Steen, Distributed Systems: Principles and Paradigms 48 / 52
RPC in practice
C compiler
Uuidgen
IDL compiler
C compiler C compiler
Linker Linker
C compiler
Server stubobject file
Serverobject file
Runtimelibrary
Serverbinary
Clientbinary
Runtimelibrary
Client stubobject file
Clientobject file
Client stubClient code Header Server stub
Interfacedefinition file
Server code
#include#include
Source: Maarten van Steen, Distributed Systems: Principles and Paradigms 49 / 52
Client-to-server binding (DCE)
Issues(1) Client must locate server machine, and (2) locate the server.
Endpointtable
Server
DCEdaemon
Client1. Register endpoint
2. Register service3. Look up server
4. Ask for endpoint
5. Do RPC
Directoryserver
Server machineClient machine
Directory machine
Source: Maarten van Steen, Distributed Systems: Principles and Paradigms 50 / 52
Exercıcios
1 Descreva o processo de conexao entre cliente e servidor com sockets TCP/IP.2 Diferencie comunicacao sıncrona e assıncrona, persistente e transiente. De
exemplos de cada combinacao.3 Descreva um problema de escalabilidade com comunicacao sıncrona transiente.4 Qual e o papel de um broker na comunicacao orientada a mensagens?5 Na Figura 4.31, qual e o fator de stretch da rede de overlay na rota A→C?6 Explique o princıpio de anti-entropia usado em protocolos epidemicos.7 Descreva o problema de remocao de dados em protocolos epidemicos e apresente
uma solucao.8 Descreva um algoritmo epidemico que calcule o tamanho de uma rede.
51 / 52
Exercıcios
8 Descreva o funcionamento e implementacao de RPC.9 Considere um procedimento incr com dois parametros inteiros. O procedimento
adiciona um em cada parametro. Suponha que este procedimento foi chamado umavez com a mesma variavel nos dois parametros, por exemplo incr(i,i). Se iinicialmente e 0, qual sera o seu valor se incr for chamada por referencia? E se incrusar copia e restauracao?
10 Uma union em C permite que um campo de uma estrutura guarde uma de variasalternativas possıveis. Em tempo de execucao, nao ha como diretamente saber qualtipo a union guarda em algum momento. Unions apresentam alguma dificuldadepara implementacao de RPC?
11 O que e implementado em uma biblioteca de runtime para RPC?
52 / 52