
MC714: Distributed Systems

Prof. Lucas Wanner

Institute of Computing (Instituto de Computação), Unicamp

Lecture 1: Introduction and Fundamentals

MC714 – Distributed Systems

Professor: Lucas Wanner – [email protected]

Schedule: Tuesdays 21:00-23:00, Room CB 06; Thursdays 19:00-21:00, Room CB 05

Website: http://www.lucaswanner.com/sd

Mailing list: https://groups.google.com/d/forum/sd-2016-2. All enrolled students have been added to the list with their DAC email addresses. If you have not received a notification, request to join the list.



Syllabus
• Distributed systems
• Interprocess communication
• File systems
• Naming services
• Coordination
• Replication
• Security

Bibliography
Main text: A. S. Tanenbaum and M. van Steen. Distributed Systems: Principles and Paradigms. Second edition, Pearson, 2006. Download link on the course page.
G. Coulouris, J. Dollimore, T. Kindberg, and G. Blair. Distributed Systems: Concepts and Design. Fifth edition, Addison-Wesley, 2011.
A. D. Kshemkalyani and M. Singhal. Distributed Computing: Principles, Algorithms, and Systems. Paperback edition, Cambridge University Press, 2011.


Course Program: Part One

Topic                                          Chapter
Introduction and Fundamentals                  1
Distributed system architectures               2
Processes and Threads                          Review, 3
Clients/Servers, Virtualization and Cloud      3
Communication: Review, Sockets                 Review, 4
Message Passing, Multicast                     4
Information dissemination                      4
Remote Procedure Call                          4
Naming                                         5
Clock synchronization                          6
Logical clocks                                 6
Mutual exclusion                               6
Leader election                                6


Course Program: Part Two

Topic                                                       Chapter
Consistency: Fundamentals, Models                           7
Replication: Management, Content distribution               7
Fault tolerance: Fundamentals, Reliable communication       8
Distributed commit                                          8
Recovery, Checkpointing                                     8
Files: Architecture, Communication, Synchronization         11
Files: Consistency and replication, Fault tolerance         11
Peer-to-Peer: Introduction, Distributed Hash Table (DHT)    Coulouris 10
Peer-to-Peer: Chord, Kademlia, BitTorrent                   Singhal 18
Web: Architecture, Communication, HTTP, SOAP, Caching       12
Security in distributed systems                             9


Assessment

Components
Exams (P): Two theory exams, P1 and P2, will be given.
Seminars (S): Seminars will be presented in class. Groups, dates, and presentation topics will be defined during the semester.
Tests (T): A series of short tests and implementation exercises will be given. The test grade T is the arithmetic mean of all tests given.

Late policy: Each day of delay incurs a deduction of 2.5 points (out of 10) on each deliverable.



Course average: The course average M is computed as:

$$M = 0.3\,P_1 + 0.4\,P_2 + 0.2\,T + 0.1\,S$$

Exam: Students with an average $2.5 \le M < 5$ may take a final exam (E).

Final grade: The final grade F is computed as:

$$F = \begin{cases} \min\left\{5,\ \dfrac{M+E}{2}\right\} & \text{if } 2.5 \le M < 5 \text{ and the student took the exam,} \\ M & \text{otherwise.} \end{cases}$$
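The grading rules can be checked with a short Python sketch. It assumes grades on a 0 to 10 scale (consistent with the 2.5/10 late deduction) and models "did not take the exam" as `e=None`; the function names are illustrative, not part of any official course tool.

```python
from typing import Optional


def course_average(p1: float, p2: float, t: float, s: float) -> float:
    """M = 0.3*P1 + 0.4*P2 + 0.2*T + 0.1*S (all grades on a 0-10 scale)."""
    return 0.3 * p1 + 0.4 * p2 + 0.2 * t + 0.1 * s


def final_grade(m: float, e: Optional[float] = None) -> float:
    """F = min(5, (M + E) / 2) if 2.5 <= M < 5 and the exam was taken; otherwise F = M."""
    if e is not None and 2.5 <= m < 5:
        return min(5.0, (m + e) / 2)
    return m


print(final_grade(4.0, 8.0))  # 5.0 (capped at 5)
print(final_grade(4.0, 5.0))  # 4.5
print(final_grade(2.0, 9.0))  # 2.0 (not eligible for the exam)
```

Note that the cap means the exam can raise a failing average to at most 5.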



Important dates
P1: 06/10/2016
P2: 06/12/2016
Exam: 20/12/2016


Academic Integrity

Zero-tolerance policy: Any violation of academic integrity will be punished to the full extent of the instructor's authority, including but not limited to a grade of zero in the final course average for everyone involved.

Examples (non-exhaustive) of violations

• Cheating and plagiarism
• Sharing solutions and code (e.g., "taking a look" at someone else's code)
• Falsifying data and results

Non-violations
• Study groups
• Discussing implementation strategies, excluding code details


Assessment

How to do well in the course (in order of importance):
1. Solve the exercises from each lecture.
2. Read the textbook chapters before the corresponding lecture.
3. Submit test solutions on time.
4. Give a good seminar presentation.
5. Attend the lectures.


Lecture Format

1. Brief review of the previous lecture.
2. Discussion of the previous lecture's exercises.
3. Presentation of the questions for the lecture.
4. Content.
5. (In some lectures) Tests.

Participation

Participation will be actively encouraged in the discussion, review, and presentation of the content.


Exercises

1. Define and compare distributed systems and parallel systems.
2. What is the role of middleware in distributed systems?
3. Give examples of, and define, different types of distribution transparency.
4. What is the difference between migration transparency and relocation transparency?
5. Define scalability. Which techniques are used to achieve scalability?
6. What is the difference between replication and caching?
7. The traditional view of transactions says that when a transaction is aborted, it is as if the transaction never happened. Give an example where this is not true.
8. What is the role of a transaction coordinator?


Distributed System: Definition

A distributed system is a piece of software that ensures that:

a collection of independent computers appears to its users as a single coherent system

Two aspects: (1) independent computers and (2) single system ⇒ middleware.

[Figure: A distributed system organized as middleware. Applications A, B, and C run on a distributed system layer (middleware) that extends over Computers 1 to 4, each with its own local OS, connected by a network; application B spans two computers.]

Source: Maarten van Steen, Distributed Systems: Principles and Paradigms

Distributed System: Alternative Definition

"You know you have [a distributed system] when the crash of a computer you've never heard of stops you from getting any work done." – Leslie Lamport


Goals of Distributed Systems

• Making resources available
• Distribution transparency
• Openness
• Scalability


Distribution Transparency

Transparency   Description
Access         Hide differences in data representation and invocation mechanisms
Location       Hide where an object is located
Relocation     Hide that an object may be moved to another location while in use
Migration      Hide that an object may move to another location
Replication    Hide that an object is replicated
Concurrency    Hide that an object may be shared by several independent users
Failure        Hide the failure and possible recovery of an object

Note: Distribution transparency is a nice goal, but achieving it is a different story.
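Location transparency can be illustrated with a minimal Python sketch in which clients refer to an object only by a logical name and a name service maps that name to a current address. `NameService`, `Client`, the name `file-server`, and the addresses are hypothetical. The sketch hides where the object lives and that it moved between uses; true relocation transparency (moving it while in use) would also require redirecting open connections, which is not modeled here.

```python
from typing import Dict


class NameService:
    """Hypothetical name service: logical name -> current network address."""

    def __init__(self) -> None:
        self._table: Dict[str, str] = {}

    def register(self, name: str, address: str) -> None:
        self._table[name] = address

    def lookup(self, name: str) -> str:
        return self._table[name]


class Client:
    """Client code that only uses logical names, never hard-coded addresses."""

    def __init__(self, names: NameService) -> None:
        self.names = names

    def locate(self, name: str) -> str:
        # Resolve on every use, so a moved object is found without code changes.
        return self.names.lookup(name)


names = NameService()
names.register("file-server", "10.0.0.5:2049")
client = Client(names)
print(client.locate("file-server"))  # 10.0.0.5:2049

# The object moves: only the name service entry changes, not the client.
names.register("file-server", "10.0.1.7:2049")
print(client.locate("file-server"))  # 10.0.1.7:2049
```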


Degree of Transparency

Observation: Aiming at full distribution transparency may be too much:

• Users may be located in different continents
• Completely hiding failures of networks and nodes is (theoretically and practically) impossible
  ◦ You cannot distinguish a slow computer from a failing one
  ◦ You can never be sure that a server actually performed an operation before a crash
• Full transparency will cost performance, exposing distribution of the system
  ◦ Keeping Web caches exactly up-to-date with the master
  ◦ Immediately flushing write operations to disk for fault tolerance
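Why a slow computer cannot be told apart from a failing one can be seen in a small Python sketch. It assumes a client that waits for a reply with a timeout; the simulated servers (`start_server`, `call`) are hypothetical stand-ins built on a thread and a queue, not a real RPC library. Both the slow server and the crashed server produce the same observation: no reply within the timeout.

```python
import queue
import threading
import time
from typing import Optional


def start_server(replies: "queue.Queue[str]", delay: Optional[float]) -> None:
    """Simulated server: replies after `delay` seconds, or never if delay is None (crashed)."""
    def run() -> None:
        if delay is None:
            return  # crashed: no reply will ever arrive
        time.sleep(delay)
        replies.put("ok")
    threading.Thread(target=run, daemon=True).start()


def call(delay: Optional[float], timeout: float) -> str:
    """Send a request and wait at most `timeout` seconds for the reply."""
    replies: "queue.Queue[str]" = queue.Queue()
    start_server(replies, delay)
    try:
        return replies.get(timeout=timeout)
    except queue.Empty:
        # A timeout only lets the client *suspect* a failure:
        # a slow server and a crashed one look exactly the same from here.
        return "suspected"


print(call(delay=0.5, timeout=0.1))   # suspected (server is merely slow)
print(call(delay=None, timeout=0.1))  # suspected (server has crashed)
print(call(delay=0.0, timeout=1.0))   # ok
```

Choosing the timeout is a trade-off: a short one suspects healthy but slow servers, a long one delays reacting to real crashes.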


Openness of Distributed Systems

Open distributed system: Be able to interact with services from other open systems, irrespective of the underlying environment:

• Systems should conform to well-defined interfaces
• Systems should support portability of applications
• Systems should easily interoperate

Achieving openness: At least make the distributed system independent from heterogeneity of the underlying environment:

• Hardware
• Platforms
• Languages
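Conforming to a well-defined interface is what lets independently built components interoperate. As a sketch, a Python `typing.Protocol` stands in for an interface definition; in a real open system the interface would more likely be specified in an IDL or a protocol specification. `KeyValueStore`, `InMemoryStore`, `AuditedStore`, and `copy_key` are hypothetical names.

```python
from typing import Dict, List, Protocol


class KeyValueStore(Protocol):
    """A well-defined interface: what a client may call, not how it is implemented."""

    def get(self, key: str) -> str: ...

    def put(self, key: str, value: str) -> None: ...


class InMemoryStore:
    """One independent implementation of the interface."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def get(self, key: str) -> str:
        return self._data[key]

    def put(self, key: str, value: str) -> None:
        self._data[key] = value


class AuditedStore:
    """A different implementation that also records every operation."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}
        self.log: List[str] = []

    def get(self, key: str) -> str:
        self.log.append(f"get {key}")
        return self._data[key]

    def put(self, key: str, value: str) -> None:
        self.log.append(f"put {key}")
        self._data[key] = value


def copy_key(src: KeyValueStore, dst: KeyValueStore, key: str) -> None:
    """Client code written only against the interface, so any conforming store works."""
    dst.put(key, src.get(key))


a = InMemoryStore()
a.put("config", "v1")
b = AuditedStore()
copy_key(a, b, "config")
print(b.get("config"), b.log)  # v1 ['put config', 'get config']
```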


Policies versus Mechanisms

Implementing openness

Requires support for different policies:

• What level of consistency do we require for client-cached data?
• Which operations do we allow downloaded code to perform?
• Which QoS requirements do we adjust in the face of varying bandwidth?
• What level of secrecy do we require for communication?

Implementing openness

Ideally, a distributed system provides only mechanisms:

• Allow (dynamic) setting of caching policies
• Support different levels of trust for mobile code
• Provide adjustable QoS parameters per data stream
• Offer different encryption algorithms
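The first mechanism in the list, dynamic setting of caching policies, can be sketched as a cache that stores entries but delegates the decision "is this copy still acceptable?" to a pluggable function. The `Cache` class, the `ttl_policy` and `always_revalidate` policies, and the fake clock are hypothetical illustrations, not an existing library.

```python
import time
from typing import Callable, Dict, Tuple

# A freshness policy decides, from an entry's age in seconds, whether it may be served.
FreshnessPolicy = Callable[[float], bool]


def ttl_policy(ttl: float) -> FreshnessPolicy:
    """Policy: accept cached data younger than `ttl` seconds."""
    return lambda age: age < ttl


def always_revalidate(age: float) -> bool:
    """Policy: never trust the cache (strongest consistency, most traffic)."""
    return False


class Cache:
    """Mechanism: stores entries and consults a pluggable policy instead of fixing one."""

    def __init__(self, fetch: Callable[[str], str], policy: FreshnessPolicy,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.fetch = fetch    # how to get data from the origin server
        self.policy = policy  # may be replaced at run time
        self.clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}
        self.fetches = 0

    def get(self, key: str) -> str:
        entry = self._entries.get(key)
        if entry is not None and self.policy(self.clock() - entry[1]):
            return entry[0]  # served from the cache
        self.fetches += 1
        value = self.fetch(key)
        self._entries[key] = (value, self.clock())
        return value


now = [0.0]  # fake clock so the example is deterministic
cache = Cache(fetch=lambda key: key.upper(), policy=ttl_policy(10.0), clock=lambda: now[0])
cache.get("a")                     # miss: fetched from the origin
now[0] = 5.0
cache.get("a")                     # hit: 5 s old, the TTL policy accepts it
cache.policy = always_revalidate   # new policy, same mechanism
cache.get("a")                     # policy rejects the cached copy: fetched again
print(cache.fetches)               # 2
```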


Scale in Distributed Systems

Observation: Many developers of modern distributed systems easily use the adjective "scalable" without making clear why their system actually scales.

Scalability

At least three components:

• Number of users and/or processes (size scalability)
• Maximum distance between nodes (geographical scalability)
• Number of administrative domains (administrative scalability)

Observation: Most systems account only, to a certain extent, for size scalability. The (non)solution: powerful servers. Today, the challenge lies in geographical and administrative scalability.


Techniques for Scaling

Hide communication latencies: Avoid waiting for responses; do something else:

• Make use of asynchronous communication
• Have a separate handler for incoming responses
• Problem: not every application fits this model


[Figure: Hiding communication latency]
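The effect of asynchronous communication on latency can be sketched with Python's `asyncio`. The remote call is simulated with `asyncio.sleep` (an assumption standing in for network latency), and `remote_request`, `one_at_a_time`, and `overlapped` are hypothetical names. Waiting for each reply makes total time grow with the number of requests; issuing all requests first and collecting replies as they arrive overlaps the waits.

```python
import asyncio
import time
from typing import Any, Coroutine, List, Tuple


async def remote_request(i: int, latency: float) -> str:
    """Simulated remote call; asyncio.sleep stands in for network latency."""
    await asyncio.sleep(latency)
    return f"reply {i}"


async def one_at_a_time(n: int, latency: float) -> List[str]:
    # Synchronous style: wait for each reply before sending the next request.
    return [await remote_request(i, latency) for i in range(n)]


async def overlapped(n: int, latency: float) -> List[str]:
    # Asynchronous style: send all requests, then handle replies as they arrive.
    return list(await asyncio.gather(*(remote_request(i, latency) for i in range(n))))


def timed(coro: Coroutine[Any, Any, List[str]]) -> Tuple[List[str], float]:
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start


seq, t_seq = timed(one_at_a_time(5, 0.1))  # about 5 x 0.1 s
par, t_par = timed(overlapped(5, 0.1))     # about 0.1 s
print(seq == par, f"{t_seq:.2f} s vs {t_par:.2f} s")
```

As the slide warns, this only helps when the client has independent work to overlap; a request that depends on the previous reply still has to wait.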


Distribution: Partition data and computations across multiple machines:

• Move computations to clients (Java applets)
• Decentralized naming services (DNS)
• Decentralized information systems (WWW)


[Figure: Distribution: DNS]
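One simple way to partition data across machines is hash partitioning, sketched below in Python. This is a swapped-in technique for illustration: DNS itself partitions its name space hierarchically, not by hashing. `machine_for` and `partition` are hypothetical names, and machines are just integer indices here.

```python
import hashlib
from typing import Dict, List


def machine_for(key: str, num_machines: int) -> int:
    """Pick the machine responsible for `key` using a stable hash.

    hashlib is used instead of the built-in hash(), which is randomized per
    process for strings and so would disagree between machines.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_machines


def partition(keys: List[str], num_machines: int) -> Dict[int, List[str]]:
    """Group keys by the machine that stores them."""
    layout: Dict[int, List[str]] = {m: [] for m in range(num_machines)}
    for key in keys:
        layout[machine_for(key, num_machines)].append(key)
    return layout


layout = partition([f"user{i}" for i in range(1000)], num_machines=4)
print({m: len(ks) for m, ks in layout.items()})  # roughly 250 keys per machine
```

A limitation of `hash % N`: changing the number of machines remaps most keys, which is one motivation for the distributed hash tables covered later in the course.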


Replication/caching: Make copies of data available at different machines:

• Replicated file servers and databases
• Mirrored Web sites
• Web caches (in browsers and proxies)
• File caching (at server and client)


Scaling – The Problem

Observation: Applying scaling techniques is easy, except for one thing:

• Having multiple copies (cached or replicated) leads to inconsistencies: modifying one copy makes that copy different from the rest.
• Always keeping copies consistent, and in a general way, requires global synchronization on each modification.
• Global synchronization precludes large-scale solutions.

Observation: If we can tolerate inconsistencies, we may reduce the need for global synchronization, but tolerating inconsistencies is application dependent.
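The trade-off can be made concrete with a small Python sketch of lazily propagated replicas: a write updates one replica immediately and the others only when updates are propagated, so no global synchronization is needed per write, but a read at another replica may return stale data in the meantime. `LazyReplicatedStore` and the replica names are hypothetical, and the sketch assumes a single writer (concurrent writes at different replicas would need ordering rules it does not model).

```python
from typing import Dict, List, Optional, Tuple


class LazyReplicatedStore:
    """Hypothetical store: a write updates one replica at once; others catch up later."""

    def __init__(self, replica_names: List[str]) -> None:
        self.replicas: Dict[str, Dict[str, int]] = {n: {} for n in replica_names}
        self.pending: List[Tuple[str, int]] = []  # updates not yet propagated

    def write(self, at: str, key: str, value: int) -> None:
        self.replicas[at][key] = value  # no global synchronization here
        self.pending.append((key, value))

    def read(self, at: str, key: str) -> Optional[int]:
        return self.replicas[at].get(key)

    def propagate(self) -> None:
        """Apply pending updates, in write order, to every replica."""
        for key, value in self.pending:
            for data in self.replicas.values():
                data[key] = value
        self.pending.clear()


store = LazyReplicatedStore(["amsterdam", "campinas"])
store.write("amsterdam", "x", 1)
store.propagate()
store.write("amsterdam", "x", 2)
print(store.read("amsterdam", "x"), store.read("campinas", "x"))  # 2 1
store.propagate()
print(store.read("campinas", "x"))  # 2
```

Whether reading the stale value `1` is acceptable is exactly the application-dependent question raised above.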


Developing Distributed Systems: Pitfalls

Observation: Many distributed systems are needlessly complex because of mistakes that required patching later on. There are many false assumptions:

• The network is reliable
• The network is secure
• The network is homogeneous
• The topology does not change
• Latency is zero
• Bandwidth is infinite
• Transport cost is zero
• There is one administrator
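As one illustration of not assuming that the network is reliable, the sketch below retries a failed call with exponential backoff and jitter. It is a hedged example: `call_with_retries` and the `flaky` operation are hypothetical, network failures are modeled as Python's `ConnectionError`, and retrying is only safe for idempotent operations, since a lost reply looks the same to the client as a lost request.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(op: Callable[[], T], attempts: int = 5, base_delay: float = 0.01) -> T:
    """Retry `op` on ConnectionError, doubling the wait after each failure.

    Only safe if `op` is idempotent: the server may have executed a request
    whose reply was lost, so a retry can execute it again.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return op()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # give up and surface the failure instead of assuming success
            # Exponential backoff with random jitter to avoid synchronized retries.
            time.sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.5))
    raise RuntimeError("unreachable")


failures = iter([True, True, False])  # simulated: two lost messages, then success


def flaky() -> str:
    if next(failures):
        raise ConnectionError("simulated lost message")
    return "ok"


print(call_with_retries(flaky))  # ok
```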


Types of Distributed Systems

• Distributed Computing Systems
• Distributed Information Systems
• Distributed Pervasive Systems


Distributed Computing Systems

Observation: Many distributed systems are configured for high-performance computing.

Cluster computing: Essentially a group of high-end systems connected through a LAN:

• Homogeneous: same OS, near-identical hardware
• Single managing node



[Figure: An example of a cluster computing system. A master node (management application) and several compute nodes (components of a parallel application, parallel libraries), each running its own local OS. All nodes are connected by a standard network and a high-speed network; the master node is also attached to a remote access network.]
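The master/compute-node split in a cluster can be sketched, by analogy only, on a single machine: a "master" function divides a job into chunks, hands each to a worker standing in for a compute node, and combines the partial results. Threads from `concurrent.futures` replace real nodes and the network here, and `component` and `master` are hypothetical names; the job (a sum of squares) is arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def component(chunk: List[int]) -> int:
    """Work done by one 'compute node': here, a partial sum of squares."""
    return sum(x * x for x in chunk)


def master(data: List[int], num_nodes: int) -> int:
    """'Master node': split the job, hand one chunk to each node, combine the results."""
    if not data:
        return 0
    size = -(-len(data) // num_nodes)  # ceiling division: chunk size per node
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=num_nodes) as nodes:
        return sum(nodes.map(component, chunks))


print(master(list(range(1, 101)), num_nodes=4))  # 338350
```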



Grid computing: The next step: lots of nodes from everywhere:

• Heterogeneous
• Dispersed across several organizations
• Can easily span a wide-area network

Note: To allow for collaborations, grids generally use virtual organizations. In essence, this is a grouping of users (or better: their IDs) that allows for authorization on resource allocation.


Distributed Computing Systems: Clouds

[Figure: The layers of cloud computing. Hardware: datacenters (CPU, memory, disk, bandwidth). Infrastructure as a Service: computation (VM), storage (block); e.g., Amazon EC2. Platform as a Service: software frameworks (Java/Python/.Net), storage (DB, file); e.g., MS Azure, Amazon S3. Software as a Service (application layer): Web services, multimedia, business apps; e.g., Google Apps, YouTube, Flickr.]



Cloud computing: Make a distinction between four layers:

• Hardware: Processors, routers, power and cooling systems. Customers normally never get to see these.
• Infrastructure: Deploys virtualization techniques. Revolves around allocating and managing virtual storage devices and virtual servers.
• Platform: Provides higher-level abstractions for storage and such. Example: the Amazon S3 storage system offers an API for (locally created) files to be organized and stored in so-called buckets.
• Application: Actual applications, such as office suites (text processors, spreadsheet applications, presentation applications). Comparable to the suite of apps shipped with OSes.


Distributed Information Systems

Observation: The vast majority of distributed systems in use today are forms of traditional information systems, which now integrate legacy systems. Example: transaction processing systems.

BEGIN_TRANSACTION(server, transaction)
READ(transaction, file-1, data)
WRITE(transaction, file-2, data)
newData := MODIFIED(data)
IF WRONG(newData) THEN
    ABORT_TRANSACTION(transaction)
ELSE
    WRITE(transaction, file-2, newData)
    END_TRANSACTION(transaction)
END IF

Note: Transactions form an atomic operation.
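The transaction pseudocode can be mirrored in a runnable Python sketch. It assumes a single in-memory "server" and implements atomicity by buffering writes and applying them only on commit, so an abort also discards the earlier `WRITE(transaction, file-2, data)`. `Store`, `Transaction`, and `run` are hypothetical; the commit is atomic here only because there is one process and no crashes, which real transaction systems cannot assume.

```python
from typing import Callable, Dict


class Store:
    """Hypothetical single server holding named files (file name -> contents)."""

    def __init__(self, files: Dict[str, str]) -> None:
        self.files = dict(files)


class Transaction:
    """Buffers writes privately; they reach the store only on commit."""

    def __init__(self, store: Store) -> None:
        self.store = store
        self.writes: Dict[str, str] = {}

    def read(self, name: str) -> str:
        # A transaction sees its own earlier writes first.
        return self.writes.get(name, self.store.files.get(name, ""))

    def write(self, name: str, data: str) -> None:
        self.writes[name] = data

    def commit(self) -> None:
        # One in-memory update: atomic only in this single-process, no-crash setting.
        self.store.files.update(self.writes)

    def abort(self) -> None:
        self.writes.clear()  # discard everything, including earlier writes


def run(store: Store, modified: Callable[[str], str], wrong: Callable[[str], bool]) -> bool:
    tx = Transaction(store)         # BEGIN_TRANSACTION(server, transaction)
    data = tx.read("file-1")        # READ(transaction, file-1, data)
    tx.write("file-2", data)        # WRITE(transaction, file-2, data)
    new_data = modified(data)       # newData := MODIFIED(data)
    if wrong(new_data):             # IF WRONG(newData) THEN
        tx.abort()                  #     ABORT_TRANSACTION(transaction)
        return False
    tx.write("file-2", new_data)    #     WRITE(transaction, file-2, newData)
    tx.commit()                     #     END_TRANSACTION(transaction)
    return True


s = Store({"file-1": "abc", "file-2": "old"})
print(run(s, str.upper, lambda d: not d.isupper()), s.files["file-2"])  # True ABC
s = Store({"file-1": "abc", "file-2": "old"})
print(run(s, str.upper, lambda d: True), s.files["file-2"])  # False old
```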


Distributed Information Systems: Transactions

Model: A transaction is a collection of operations on the state of an object (database, object composition, etc.) that satisfies the following properties (ACID):

Atomicity: All operations either succeed, or all of them fail. When the transaction fails, the state of the object will remain unaffected by the transaction.

Consistency: A transaction establishes a valid state transition. This does not exclude the possibility of invalid, intermediate states during the transaction's execution.

Isolation: Concurrent transactions do not interfere with each other. It appears to each transaction T that other transactions occur either before T or after T, but never both.

Durability: After the execution of a transaction, its effects are made permanent: changes to the state survive failures.


Transaction Processing Monitor

Observation: In many cases, the data involved in a transaction is distributed across several servers. A TP monitor is responsible for coordinating the execution of a transaction.

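[Figure: The role of a TP monitor in distributed systems. A client application sends its requests to the TP monitor, which forwards a request to each of three servers, collects their replies, and returns a reply to the client; the whole exchange forms one transaction.]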
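The coordinating role can be sketched under strong simplifying assumptions: each server first votes on whether it can commit its part, and the coordinator commits everywhere only if every vote is yes, otherwise it aborts everywhere. This is a bare two-phase-commit-style outline (distributed commit is covered in Chapter 8); it ignores timeouts, logging, and coordinator crashes, and the `Server` class and `coordinate` function are hypothetical.

```python
from typing import List


class Server:
    """Hypothetical resource server taking part in a distributed transaction."""

    def __init__(self, name: str, can_commit: bool = True) -> None:
        self.name = name
        self.can_commit = can_commit
        self.state = "idle"

    def prepare(self) -> bool:
        # Vote: can this server make its part of the transaction permanent?
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"

    def abort(self) -> None:
        self.state = "aborted"


def coordinate(servers: List[Server]) -> str:
    """Commit everywhere only if every server votes yes; otherwise abort everywhere."""
    votes = [s.prepare() for s in servers]  # phase 1: collect votes
    if all(votes):
        for s in servers:                   # phase 2: global commit
            s.commit()
        return "committed"
    for s in servers:                       # phase 2: global abort
        s.abort()
    return "aborted"


servers = [Server("db"), Server("files"), Server("billing", can_commit=False)]
print(coordinate(servers), [s.state for s in servers])  # aborted ['aborted', 'aborted', 'aborted']
```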


Distr. Info. Systems: Enterprise Application Integration

Problem: A TP monitor doesn't separate apps from their databases. Also needed are facilities for direct communication between apps.

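[Figure: Middleware as a communication facilitator in enterprise application integration. Server-side applications and client applications communicate directly with each other through a communication middleware layer.]

Communication middleware:
• Remote Procedure Call (RPC)
• Message-Oriented Middleware (MOM)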


Distributed Pervasive Systems

Observation: An emerging next generation of distributed systems in which nodes are small, mobile, and often embedded in a larger system.

Some requirements

• Contextual change: The system is part of an environment in which changes should be immediately accounted for.
• Ad hoc composition: Each node may be used in very different ways by different users. Requires ease of configuration.
• Sharing is the default: Nodes come and go, providing sharable services and information. Calls again for simplicity.

Note: Pervasiveness and distribution transparency: a good match?



Pervasive Systems: Examples

Home systems: Should be completely self-organizing:

• There should be no system administrator
• Simplest solution: a centralized home box?

Monitoring a person: Devices are physically close to a person:

• Where and how should monitored data be stored?
• How can we prevent loss of crucial data?
• What is needed to generate and propagate alerts?
• How can security be enforced?
• How can the environment provide online feedback?


Sensor networks

Characteristics: The nodes to which sensors are attached are:

• Many (10s-1000s)
• Simple (small memory/compute/communication capacity)
• Often battery-powered (or even battery-less)


Sensor networks as distributed systems

[Figure: Two ways of organizing a sensor network. (a) Sensor data is sent directly to the operator's site. (b) The operator's site sends a query into the sensor network; each sensor can process and store data, and sensors send only answers.]

Source: Maarten van Steen, Distributed Systems: Principles and Paradigms
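The difference between organizations (a) and (b) can be sketched for a hypothetical query such as "average temperature". In (a) every reading travels to the operator; in (b) each sensor combines its own reading with its children's partial results and forwards a single (sum, count) pair. The tree-shaped routing, the `Sensor` class, and the readings are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Sensor:
    reading: float
    children: List["Sensor"] = field(default_factory=list)  # sensors routing through this one


def centralized(node: Sensor) -> List[float]:
    """(a) Every raw reading is forwarded to the operator's site."""
    readings = [node.reading]
    for child in node.children:
        readings.extend(centralized(child))
    return readings


def in_network(node: Sensor) -> Tuple[float, int]:
    """(b) Each sensor aggregates its subtree and forwards only a (sum, count) pair."""
    total, count = node.reading, 1
    for child in node.children:
        s, c = in_network(child)
        total += s
        count += c
    return total, count


root = Sensor(20.0, [Sensor(22.0, [Sensor(24.0)]), Sensor(18.0)])
readings = centralized(root)   # operator receives all 4 readings
total, count = in_network(root)  # operator receives one (sum, count) pair
print(sum(readings) / len(readings), total / count)  # 21.0 21.0
```

Both give the same answer; (b) sends far less data toward the operator, at the cost of requiring each sensor to process and store data.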
