Post on 01-Apr-2015


Distributed Systems Technologies

CM0356/CM0456

Andrew Harrison
a.b.harrison@cs.cf.ac.uk

Who am I?

• Studied Fine Art many moons ago.
  – Worked as an artist and theatre designer for 15 years.
  – Then got into the Web…
• Did the MSc here in 2002-3 and a PhD after that
• Currently work as an RA
• My research is in the areas this course covers
  – P2P, the Web, Web Services and Grid computing
  – Currently working in the Health Informatics domain

Structure

• Course is 21 lectures. Omer gives 3, I give the rest.
  – Omer will give next week's lectures
• The assessment is 100% exam for CM0356
• The assessment is 80% exam and 20% coursework for CM0456
• CM0456 will receive the coursework in week 2 (after the second lecture). Hand-in is the beginning of week 10.
• The exam is do-able from the content of the lectures
  – A more detailed understanding can be gained from the recommended reading: “From P2P and Grids to Services on the Web” (see Web site)
  – Labs: weeks 5 to 11, on a Thursday and Friday
  – Main course site is:
    • http://users.cs.cf.ac.uk/A.B.Harrison/distributed/
  – Link on Blackboard (Learning Central) to this site.

Lecture Overview

• Week 1
  – Lecture 1: Introduction
  – Lecture 2: Peer-to-Peer Systems
  – Lecture 3: Gnutella
• Week 2
  – Lecture 1: Fault Tolerance I (Omer Rana)
  – Lecture 2: Trust and Reputation (Omer Rana)
  – Lecture 3: Trust and Reputation (Omer Rana)
• Week 3
  – Lecture 1: Scalability
  – Lecture 2: Security in Distributed Systems
  – Lecture 3: Fault Tolerance II
• Week 4
  – Lecture 1: RMI and Jini
  – Lecture 2: BitTorrent
  – Lecture 3: Freenet

Lecture Overview

• Week 5
  – Lecture 1: The Web
  – Lecture 2: The Programmable Web I
  – Lecture 3: The Programmable Web II
• Week 6
  – Lecture 1: Web Services I
  – Lecture 2: Web Services II
  – Lecture 3: Grid Computing
• Week 7
  – Lecture 1: Cloud
  – Lecture 2: CAN
  – Lecture 3: Jxta
• Week 8
  – Lecture 1: FREE
  – Lecture 2: FREE
  – Lecture 3: FREE

Lecture Overview

• Week 9
  – Lecture 1: FREE
  – Lecture 2: FREE
  – Lecture 3: FREE
• Week 10
  – Lecture 1: Revision
  – Lecture 2: Revision
  – Lecture 3: FREE
• Week 11
  – Lecture 1: FREE
  – Lecture 2: FREE
  – Lecture 3: FREE
• This may change slightly over time because of lecturers’ availability. The online version on the course website will be accurate.

Aims

• The aim of the course is to provide a comparative understanding of a wide variety of distributed systems
• Some of the systems are middleware
  – Generic tooling to support distributed communications
    • E.g. XML and Web Service technologies, Jini
• Some of them are applications
  – Provide particular functionalities for users
    • E.g. BitTorrent and Freenet
• In general, the exam will test you on this comparative understanding, rather than on being able to recite every last detail of a particular technology
  – However, understanding the technologies is the first step to having a comparative understanding :-)


Why does Distributed Computing Matter?

• Clay Shirky’s 4 rungs of group interaction
  – Here Comes Everybody (2008)
• Links in a fully connected mesh grow quadratically with the number of nodes
  – nodes * (nodes – 1) / 2
• Internet technology enables ridiculously easy group forming
  – e.g. email Reply All button
• Describes an evolution from
  – simple/selfish sharing
    • example: delicious
  – conversation
    • example: high dynamic range photography on Flickr
  – collaboration
    • anime – writing software to subtitle Japanese anime for Western consumption
  – collective action
    • flash mobs in Belarus – now Tunisia and Egypt
• Each rung requires greater co-ordination amongst members
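
The link-count formula on this slide is easy to check numerically. The sketch below is only an illustration (the function name `mesh_links` is made up here); it shows that doubling the group size roughly quadruples the number of pairwise links.

```python
def mesh_links(nodes: int) -> int:
    """Number of links in a fully connected mesh: nodes * (nodes - 1) / 2."""
    if nodes < 0:
        raise ValueError("nodes must be non-negative")
    return nodes * (nodes - 1) // 2


for n in (2, 5, 10, 100):
    print(f"{n:>3} nodes -> {mesh_links(n):>4} links")
```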

A Distributed System

A distributed system is a collection of independent computers that appears to its users as a single coherent system.

[Figure: layered diagram. Top to bottom: Distributed Applications; Middleware Services; the local OS on each of Machine A, Machine B and Machine C (e.g. Windows, Mac OS X, Linux); Network. The machines exchange messages.]

• Heterogeneous computers – vendors/OS should be able to interoperate
• Should mask the heterogeneity from users (applications)
• Should be easy to expand and scale
• Should be permanently available (even though parts of it are not)
• Communication is based on messages

Layering

• The distributed applications and middleware we will consider occupy the Application Layer of the OSI model
• Some interact more or less closely with underlying layers
• Typically these systems have various layers within them as well.

Layering

• For example, a Web Service:
  – Serializes programming objects to XML, perhaps using the W3C XML Schema specification
  – Wraps this data in a SOAP envelope.
  – Sticks the SOAP envelope into an HTTP POST message payload.
  – Then HTTP uses what it uses: TCP/IP, DNS
• A Peer to Peer application may:
  – Define custom messages for file sharing
  – Use UPnP to negotiate NATs
  – Tunnel custom messages through another protocol (e.g. HTTP)
• In the end, systems will do whatever they need to get their job done and don’t care much for nicely defined layers.
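
The Web Service layering can be sketched with the Python standard library. This is a simplified illustration, not a real SOAP toolkit. The host `example.org`, the path `/quotes` and the `GetQuote` operation are hypothetical, and the request is only built as bytes, never sent.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace


def to_xml(name, fields):
    """Layer 1: serialize a simple object (a dict of fields) to an XML element."""
    element = ET.Element(name)
    for key, value in fields.items():
        ET.SubElement(element, key).text = str(value)
    return element


def wrap_in_soap(payload):
    """Layer 2: wrap the payload element in a SOAP Envelope/Body."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    body.append(payload)
    return ET.tostring(envelope, encoding="utf-8")


def to_http_post(host, path, soap_bytes):
    """Layer 3: place the envelope in the payload of an HTTP POST message."""
    headers = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: text/xml; charset=utf-8\r\n"
        'SOAPAction: ""\r\n'
        f"Content-Length: {len(soap_bytes)}\r\n"
        "\r\n"
    )
    return headers.encode("ascii") + soap_bytes


# Hypothetical operation and endpoint, for illustration only.
request = to_http_post(
    "example.org", "/quotes", wrap_in_soap(to_xml("GetQuote", {"symbol": "ABC"}))
)
print(request.decode("utf-8"))
```

Handing these bytes to a TCP connection would be the final step, where HTTP "uses what it uses".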

Messages

• These define the protocol of the distributed application or middleware.
• Provide the insulation from heterogeneous OSs
• Are capable of traversing the network.
• This involves some form of serialization
  – i.e. the message can be converted into a stream of bytes and sent over the wire
  – Often an in-memory representation of an Object is serialized at the sending side, and de-serialized at the receiving side to create an in-memory representation.
  – Strings, XML, Java serialization, bencoding…
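
As one concrete instance of serialization, here is a minimal bencoding round trip. It is a sketch for illustration only: it covers integers, byte strings, lists and dictionaries, does little validation of malformed input, and the message fields (`name`, `ttl`) are invented.

```python
def bencode(value):
    """Serialize ints, str/bytes, lists and dicts to bencoded bytes."""
    if isinstance(value, bool):
        raise TypeError("bencoding has no boolean type")
    if isinstance(value, int):
        return b"i" + str(value).encode("ascii") + b"e"
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return str(len(value)).encode("ascii") + b":" + value
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Keys are byte strings and are written in sorted order.
        items = [(k.encode("utf-8") if isinstance(k, str) else k, v)
                 for k, v in value.items()]
        items.sort(key=lambda kv: kv[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def bdecode(data, pos=0):
    """Deserialize one value starting at pos; return (value, next_pos)."""
    lead = data[pos:pos + 1]
    if lead == b"i":
        end = data.index(b"e", pos)
        return int(data[pos + 1:end]), end + 1
    if lead == b"l":
        pos += 1
        items = []
        while data[pos:pos + 1] != b"e":
            item, pos = bdecode(data, pos)
            items.append(item)
        return items, pos + 1
    if lead == b"d":
        pos += 1
        result = {}
        while data[pos:pos + 1] != b"e":
            key, pos = bdecode(data, pos)
            result[key], pos = bdecode(data, pos)
        return result, pos + 1
    # Otherwise a byte string: <length>:<bytes>
    colon = data.index(b":", pos)
    length = int(data[pos:colon])
    start = colon + 1
    return data[start:start + length], start + length


wire = bencode({"name": "ping", "ttl": 7})   # sending side
print(wire)                                  # b'd4:name4:ping3:ttli7ee'
message, _ = bdecode(wire)                   # receiving side
print(message)
```

Note that the receiver gets byte strings back, not Python `str` objects: the in-memory representation on each side need not be identical, only the bytes on the wire.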

Messages

• All messages are one-way
  – Protocols may build on top of this to create two-way message exchange patterns
  – E.g. HTTP is request/response based.
  – RPC emulates a local computing paradigm with the concept of a remote procedure with return values
• Messages are typically independent of the transport protocol they use (TCP, UDP)
  – But some rely on or presume certain transport protocols
  – E.g. BitTorrent and HTTP keep the TCP connection open to reduce overhead. This would not work with UDP.
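
A request/response pattern is just two one-way messages that the protocol correlates. The sketch below illustrates this over a local socket pair. The length-prefixed JSON framing and the `id` / `reply_to` fields are assumptions made for this example, not part of any protocol discussed in the course.

```python
import json
import socket


def send_message(sock, message):
    """One-way send: 4-byte big-endian length, then a JSON body."""
    body = json.dumps(message).encode("utf-8")
    sock.sendall(len(body).to_bytes(4, "big") + body)


def recv_exact(sock, count):
    """Read exactly `count` bytes; a stream socket may return fewer per call."""
    chunks = []
    while count > 0:
        chunk = sock.recv(count)
        if not chunk:
            raise ConnectionError("peer closed the connection")
        chunks.append(chunk)
        count -= len(chunk)
    return b"".join(chunks)


def recv_message(sock):
    """Receive one framed message and decode it."""
    length = int.from_bytes(recv_exact(sock, 4), "big")
    return json.loads(recv_exact(sock, length).decode("utf-8"))


def request_response_demo():
    """Build a two-way exchange out of two one-way messages."""
    client, server = socket.socketpair()
    with client, server:
        send_message(client, {"id": 1, "op": "add", "args": [2, 3]})  # message 1
        request = recv_message(server)
        reply = {"reply_to": request["id"], "result": sum(request["args"])}
        send_message(server, reply)                                   # message 2
        return recv_message(client)


print(request_response_demo())
```

The `reply_to` field is what turns two independent messages into an exchange; RPC systems hide this correlation behind something that looks like a local procedure call.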

Distributed vs Local Systems

• Distributed systems are inherently different from non-distributed systems.
• Latency - network speed
• Memory access - not shared
• Partial Failure
  – remote failure does not mean local failure
  – no global coordination (like an OS)
• Guaranteed Concurrency
  – combined with latency, events are not necessarily received in the same order as they are generated
• Indeterminacy
  – Your system is not in control of the whole system
  – With partial failure, a system may just disappear with no indication of status.
    • was it the remote machine, remote user or a network link?
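
The point about latency and ordering can be shown with a tiny model. This is an illustrative sketch with invented numbers: each event has a send time and a network latency, and the receiver sees events in order of arrival, not order of sending.

```python
def arrival_order(events):
    """events: (name, send_time_ms, latency_ms) tuples, listed in send order.

    Returns the event names in the order a receiver would see them.
    """
    return [name for name, sent, latency in
            sorted(events, key=lambda e: e[1] + e[2])]


# "a" is sent first but travels over a slow link.
events = [("a", 0, 300), ("b", 100, 50), ("c", 200, 150)]
print(arrival_order(events))  # ['b', 'a', 'c']
```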

Some Terminology

• Resource: any hardware or software resource shared on a distributed network, e.g. a file storage system, RAM, CPU, a file, a service or a communication channel
• Node (computer/device): generic term - any computer on a distributed network
• Server: a provider of data
• Client: a consumer of data
• Peer: both a provider and a consumer of data
• Service: “A network-enabled entity that provides some capability”

Taxonomy for Distributed Systems

Taxonomy is based on the following factors and their relation to centralization:

1. Resource Discovery: mechanism for discovering resources on a distributed system
   • Examples: DNS, JXTA Rendezvous, Jini LUS, UDDI etc
2. Resource Availability: scalability – do resources scale with the network? Does access to them scale with the network?

See example...

Mp3.com, Napster and Gnutella

[Figure: three panels. "MP3.com Scenario" (User, Mp3.com), "Napster Scenario" (User, Napster.com) and "Gnutella Scenario".]

Taxonomy for Distributed Systems

Taxonomy is based on the following factors and their relation to centralization:

1. Resource Discovery: mechanism for discovering resources on a distributed system
   • Examples: DNS, JXTA Rendezvous, Jini LUS, UDDI etc
2. Resource Availability: scalability – do resources scale with the network? Does access to them scale with the network?
3. Resource Communication: two types:
   – Brokered communication (centralized): communication is passed through a central server - resources do not have direct references to each other.
   – Point to point (decentralized, peer to peer): a direct connection between the sender and the receiver.

Centralization of Point-to-Point Connections

• Web Server: many-to-one relationship between users and the web server, and therefore this can be considered centralized communication
• True Peer to Peer, e.g. Gnutella: equal peers; communication is supposed to be even, i.e. each provider is also a consumer of information and each node has an equal number of connections
  – This is not always the case … as we will learn in lecture 4

Taxonomy for Distributed Systems

• Centralized systems - typically client/server based systems
• Hybrid - combinations of the 2 extremes, e.g. brokered architecture
• Decentralized systems - Peer to Peer (P2P)

Taxonomy is based on the following factors:

1. Resource Discovery: mechanism for discovering resources on a distributed system
   • Examples: DNS, JXTA Rendezvous, Jini LUS, UDDI etc
2. Resource Availability: scalability – do resources scale with the network?
3. Resource Communication: two types:
   – Brokered communication (centralized): communication is passed through a central server - resources do not have direct references to each other.
   – Point to point (decentralized, peer to peer): a direct connection (although the connection may be multi-hop) between the sender and the receiver.

A Web Server: Centralized

- Clients (i.e. users) use their web browser to navigate web pages on one or more web sites.
- Web site is static to a particular domain

• Discovery: centralized, DNS
• Availability: available or not
• Communication: centralized to the particular web server

[Figure: Web Server placed on three axes (Resource Discovery, Resource Availability, Resource Communication), each running from Centralized to Decentralized.]

The Web as a whole

• Discovery - ad hoc
  – Often highly centralized, e.g. Google, but is also highly decentralized - the Web of links, e.g. the blogosphere and out of bounds
• Availability - depends on the granularity of the request
  – There is a level of replication on the Web and caching can be used to duplicate availability
• Communication - centralized
  – Communication happens via a centralized entity, e.g. Facebook, MySpace, Flickr, blogs, etc

Napster: Brokered

Clients search through the Napster web site (well, they used to….)

• Discovery: centralized through the web site
• Availability: once discovered via the web site, availability is decentralized.
• Communication: decentralized between peers (MP3 sharers)

[Figure: Napster placed on three axes (Resource Discovery, Resource Availability, Resource Communication), each running from Centralized to Decentralized; users connected to Napster.com.]

25

Gnutella: Decentralized

• Discovery: Decentralized through Gnutella messages (ping/pong mechanisms)

• Availability: Often an alternate path to resource • Communication: point to point: decentralized between peers

ResourceAvailability

ResourceDiscovery

ResourceCommunication

Centralized

Decentralized

Gnutella
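
Decentralized discovery by flooding can be modelled as a breadth-first walk over the peer graph with a hop limit. The sketch below is a simplified model, not the Gnutella wire protocol: the topology is invented, the `seen` set stands in for each peer's duplicate detection, and the routing of pongs back to the origin is not modelled.

```python
from collections import deque


def flood_ping(neighbours, origin, ttl):
    """Return the set of peers reached by a ping flooded from `origin`.

    neighbours maps each peer to the peers it is directly connected to.
    Each hop uses up one unit of TTL; peers drop pings they have already seen.
    """
    seen = {origin}
    queue = deque([(origin, ttl)])
    while queue:
        peer, remaining = queue.popleft()
        if remaining == 0:
            continue  # TTL exhausted: this peer does not forward the ping
        for nxt in neighbours.get(peer, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, remaining - 1))
    return seen - {origin}


# Hypothetical topology: a-b-c-d in a line, plus a-e.
neighbours = {
    "a": ["b", "e"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c"],
    "e": ["a"],
}
print(sorted(flood_ping(neighbours, "a", ttl=2)))  # ['b', 'c', 'e']
```

No single node holds the whole picture: what a peer can discover depends on where it sits in the graph and on the TTL it chooses.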

SETI@Home

• Launched in 1999
• Scientific experiment - uses Internet-connected computers in the Search for Extraterrestrial Intelligence (SETI)
• Distributes a screen saver–based application to users
• Applies signal analysis algorithms to different data sets to process radio-telescope data.
• Has more than 3 million users - used over a million years of CPU time to date

[Figure: SETI@HOME (Client/Server). A main server holding radio-telescope data and a client, with the steps:]

1. Install screen saver
2. SETI client (screen saver) starts
3. SETI client gets data from server and runs
4. Client sends results back to server
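
Steps 3 and 4 amount to a simple work-unit loop between a client and a central server. The following is a minimal in-process sketch under stated assumptions: the names `WorkServer`, `analyse` and `run_client` are invented, and the "analysis" is a placeholder that just reports the strongest sample rather than doing any real signal processing.

```python
from collections import deque


class WorkServer:
    """Central server: splits the data into work units and collects results."""

    def __init__(self, samples, unit_size):
        self._pending = deque(
            (start, samples[start:start + unit_size])
            for start in range(0, len(samples), unit_size)
        )
        self.results = {}

    def get_work(self):
        """Hand out the next work unit, or None when nothing is left."""
        return self._pending.popleft() if self._pending else None

    def submit(self, unit_id, result):
        self.results[unit_id] = result


def analyse(chunk):
    """Placeholder for the signal analysis: report the strongest sample."""
    return max(chunk)


def run_client(server):
    """Fetch a unit, process it locally, send the result back; repeat."""
    done = 0
    while (work := server.get_work()) is not None:
        unit_id, chunk = work
        server.submit(unit_id, analyse(chunk))
        done += 1
    return done


server = WorkServer([3, 9, 1, 4, 7, 2], unit_size=2)
print(run_client(server), server.results)
```

Clients never talk to each other here; all coordination goes through the server, which is why the slide labels the system client/server even though the computation is widely distributed.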

SETI@home

• Search for Extraterrestrial Intelligence@home – volunteer computing system
• Generalized to the BOINC API

[Figure: SETI@home placed on three axes (Resource Discovery, Resource Availability, Resource Communication), each running from Centralized to Decentralized.]

Web Services

Note: this is for the current Web Services technology stack; in principle you can host web services in a number of ways

• Discovery: centralized through a registry
• Availability: once discovered via the registry, availability is decentralized.
• Communication: decentralized between provider and consumer

[Figure: Web Services placed on three axes (Resource Discovery, Resource Availability, Resource Communication), each running from Centralized to Decentralized.]

Concluding Remarks

1. Taxonomy
   a) Criteria
      a) Resource Discovery
      b) Resource Availability
      c) Resource Communication
   b) Taxonomy
      a) Centralized
      b) Hybrid
      c) Decentralized
2. Relevance
   a) Course relates distributed systems to this taxonomy
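
The taxonomy can be written down as a small lookup table. The values below are taken from the bullets on the individual slides, with two assumptions of my own: the web server's availability ("available or not") is read as centralized, and the rule in `overall` (all three criteria agree, otherwise hybrid) is inferred from the slide titles rather than stated in the lecture.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Classification:
    discovery: str
    availability: str
    communication: str


SYSTEMS = {
    # Availability "available or not" is interpreted here as centralized.
    "Web server": Classification("centralized", "centralized", "centralized"),
    "Napster": Classification("centralized", "decentralized", "decentralized"),
    "Gnutella": Classification("decentralized", "decentralized", "decentralized"),
    "Web Services": Classification("centralized", "decentralized", "decentralized"),
}


def overall(c):
    """Assumed rule: a system is hybrid unless all three criteria agree."""
    values = {c.discovery, c.availability, c.communication}
    return values.pop() if len(values) == 1 else "hybrid"


for name, classification in SYSTEMS.items():
    print(f"{name:<13} {overall(classification)}")
```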