
Distributed System: Lecture 2

Box Leangsuksun SWECO Endowed Professor, Computer Science Louisiana Tech University [email protected]

CTO, PB Tech International Inc. naibox@gmail.com


3/31/14 2

System Models

Based on Professor Paul Francis’s notes, Cornell University

Models

•  A model is a simplified representation of a system or phenomenon.

•  Its purpose is to provide an abstract, simplified, but consistent description of a relevant aspect of a distributed system
–  Mathematical representation
–  Graphical notation

Why do we need modeling?

•  To study physical systems without actually building them.

•  To help produce a better design

•  To understand important aspects such as
–  Performance
–  Reliability
–  Functionality (confirming that the system does what it should)

Why? Goals

•  Compare alternatives
•  Determine impacts (per feature)
•  System tuning
•  Quantify relative reliability, availability, and performance
•  Debugging
•  Set expectations

How to measure or estimate

•  Measurements
•  Simulations
•  Analytical modeling

Measurements

•  Requires constructing the actual system
•  Create a workload per the requirements
•  Provides the best results
•  Inherently difficult and inflexible
•  Almost impossible to use for what-if analysis

Measurements (continued)

•  Measure system or subsystem performance with tools
–  gprof
–  top, ps, etc.
–  Benchmark programs (e.g. LINPACK, SPECmark, Winmark)
–  PAPI, perfctr, perfmon, PerfSuite

•  What about reliability measurement? Use logs, traces, and outage records.

Simulation

•  A program to simulate important characteristics of targeted systems

•  Flexible and easy to modify
•  Good for what-if analysis
•  Difficult to model every small detail
•  Popular: cost-effective and flexible
•  Accuracy suffers where details are left out

Analytical Modeling

•  Mathematical description of the system
•  Provides quick insight
–  Helps guide a more detailed simulation or measurement-based study
•  Results are less believable and less accurate than those from simulation or measurement

Performance

•  Computation
–  CPU
–  Memory
–  I/O, etc.

•  Communication
–  Latency
–  Bandwidth

•  Transaction
–  Possibly involving more than the database

Some Criteria

•  Throughput – number of completed requests per unit of time

•  Response time – amount of time from when a request is submitted until the first response is produced (not until all output is complete)

•  CPU utilization – fraction of time the CPU is busy; the goal is to keep the CPU as busy as possible

•  Turnaround time – amount of time to execute a particular request (finishing time – arrival time)
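The criteria above can be computed directly from a request log. The following is a minimal sketch in C, assuming a hypothetical `request_t` record with arrival, first-response, and finish timestamps in seconds; the type and function names are illustrative and do not come from the lecture.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-request record; all times are in seconds. */
typedef struct {
    double arrival;        /* when the request was submitted */
    double first_response; /* when the first response was produced */
    double finish;         /* when the request completed */
} request_t;

/* Turnaround time = finishing time - arrival time. */
double turnaround_time(const request_t *r) {
    return r->finish - r->arrival;
}

/* Response time = time from submission until the first response. */
double response_time(const request_t *r) {
    return r->first_response - r->arrival;
}

/* Throughput = completed requests per unit of observation time. */
double throughput(size_t completed, double window_seconds) {
    assert(window_seconds > 0.0);
    return (double)completed / window_seconds;
}

/* CPU utilization = busy time / total observed time, in [0, 1]. */
double cpu_utilization(double busy_seconds, double window_seconds) {
    assert(window_seconds > 0.0);
    return busy_seconds / window_seconds;
}
```

For a request that arrives at 1.0 s, first responds at 1.5 s, and finishes at 4.0 s, the response time is 0.5 s and the turnaround time is 3.0 s.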

Stop Mar 25

•  Important announcement: midterm exam on April 10, 2014.

Performance issue discovery phase

[Figure: project timeline with the phases Requirement, Architecture/design, Development/code, and Test, followed by Re-design, code, re-test]

Telecom industry architecture review: 1/3 of the issues found were related to performance.

Simple example of effective memory access time

•  Example
–  H = cache hit probability
–  Tm = memory access time
–  Tc = cache access time

•  What is the effective memory access time?
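One common way to answer this question, offered as a sketch rather than as the lecture's own derivation, is to weight each access time by its probability. The formula below assumes that a miss costs one full memory access Tm; some texts instead charge Tc + Tm for a miss because the cache is probed first. The numeric values are assumed for illustration only.

```latex
\begin{align*}
T_{\mathrm{eff}} &= H\,T_c + (1 - H)\,T_m \\
                 &= 0.9 \times 10\,\mathrm{ns} + 0.1 \times 100\,\mathrm{ns} = 19\,\mathrm{ns}
                 \qquad (H = 0.9,\ T_c = 10\,\mathrm{ns},\ T_m = 100\,\mathrm{ns})
\end{align*}
```

Under the alternative convention the same values give Tc + (1 − H) Tm = 10 ns + 10 ns = 20 ns.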

[Figure: CPU, cache, and memory]

Example of modeling problem in DS

•  Operation/transaction modeling for an e-commerce system
–  Browsing (time Tb) and submitting an order (time Ts)
–  90% vs. 10% (volume)
–  Weight: 20% vs. 80% (order)
–  Order = 50 instructions + 10 memory accesses

Comparison (Lilja’s book)

Factor | Analytical Modeling | Simulation | Measurement
Flexibility | High | High | Low
Cost | Low | Medium | High
Believability | Low | Medium | High
Accuracy | Low | Medium | High

System Models

•  The physical model represents the underlying hardware elements of a distributed system while abstracting away from specific details of the computer and networking technologies employed.

•  The architectural model defines how the components of the system are placed, how they interact with one another, and how they are mapped onto the underlying network of computers.

•  Fundamental models:
–  The interaction model deals with communication among the components and with their timing and performance.
–  The failure model specifies the faults that can occur and defines reliable communication and correct processes.
–  The security model specifies possible threats and defines the concept of secure channels.

Physical Model

•  Represents the underlying hardware elements

[Figures credited to http://www.krug-soft.com/ and http://cisco.com/]

Architectural Model

•  Concerned with the placement of the system's parts and the relationships among them.
•  Examples: client-server model, peer-to-peer model
•  Abstracts the functions of the individual components.
•  Defines patterns for the distribution of data and workload.
•  Defines patterns of communication among the components.
•  Example: definitions of the server process, client process, and peer process, and of the protocols for communication among processes; definition of the client/server model and its variations.

Software and hardware service layers in distributed systems

[Figure: layers, from top to bottom: Applications, services; Middleware; Operating system; Computer and network hardware. The operating system and the hardware together form the platform.]

Example of distributed weather monitoring systems (Architecture Model)

[Figure: components include the National Weather Service Web Site, Data Aggregator, RMI WeatherInfo Server, RMI WeatherInfo Client Application, Weather Web Service Server, Weather Web Service Web Client, Analytics, Relational Database (MySQL), and Weather Google Map Client. They are connected over a LAN using RMI, the IP Socket API, HTTP, and SOAP/REST XML. Interactions are numbered 1 to 7.]

Middleware

•  A layer of software whose purpose is to mask heterogeneity and to provide a convenient programming model for application programmers.

•  Middleware supports abstractions such as remote method invocation, group communication, event notification, replication of shared data, and real-time data streaming.

•  Examples: Java RMI, grid software (Globus, Open Grid Services), Web services.

Clients invoke individual servers

[Figure: clients send invocations to servers and receive results; a server may itself invoke another server. The key distinguishes processes from computers.]

EX: browser, web client
EX: Web server
EX: 1. File server, 2. Web crawler

A service provided by multiple servers

[Figure: clients access a service that is provided by several servers]

EX: Akamai (data duplication), now Amazon AWS (zones)

Web proxy server and caches

[Figure: clients reach web servers through a proxy server]

Proxy servers with caches are used to increase availability and performance. They also play a major role in firewall-based security. http://www.interhack.net/pubs/fwfaq/

A distributed application based on peer processes

[Figure: three peer processes, each consisting of application code and coordination code]

EX: distributed whiteboard application; music sharing

Web applets

a) A client request results in the downloading of applet code
[Figure: the client obtains applet code from the web server]

b) The client interacts with the applet
[Figure: the applet runs in the client and communicates with the web server]

EX: code streaming; mobile code

Interaction Models

•  Within an address space (using paths as addresses)

•  Socket-based communication: connection-oriented or connectionless
–  A socket is an endpoint of communication
–  Let's look at some code and details

Socket based communication

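/* TCP client (POSIX sockets).
   Needs <string.h>, <sys/socket.h>, <netinet/in.h>, <arpa/inet.h>. */
int sockfd;
struct sockaddr_in addr;
memset(&addr, 0, sizeof(addr));                     /* clear unused fields */
addr.sin_family = AF_INET;                          /* IPv4 */
addr.sin_addr.s_addr = inet_addr(SERV_HOST_ADDR);   /* dotted-quad string */
addr.sin_port = htons(SERV_TCP_PORT);               /* network byte order */
sockfd = socket(AF_INET, SOCK_STREAM, 0);           /* SOCK_STREAM selects TCP */
connect(sockfd, (struct sockaddr *) &addr, sizeof(addr));
do_stuff(stdin, sockfd);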
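The slide code omits error handling. The sketch below shows one way the same steps could be wrapped with return-value checks, assuming a POSIX environment. The helper names `make_addr` and `connect_tcp` are hypothetical, and `inet_pton` is used in place of `inet_addr` because it reports malformed addresses unambiguously.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Fill *out with an IPv4 address and port. Returns 0 on success,
   -1 if ip is not a valid dotted-quad string. (Hypothetical helper.) */
int make_addr(const char *ip, uint16_t port, struct sockaddr_in *out) {
    assert(ip != NULL && out != NULL);
    memset(out, 0, sizeof(*out));
    out->sin_family = AF_INET;
    out->sin_port = htons(port);            /* network byte order */
    return inet_pton(AF_INET, ip, &out->sin_addr) == 1 ? 0 : -1;
}

/* Open a TCP connection. Returns a connected descriptor, or -1 on error. */
int connect_tcp(const char *ip, uint16_t port) {
    struct sockaddr_in addr;
    if (make_addr(ip, port, &addr) != 0)
        return -1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);                          /* do not leak the descriptor */
        return -1;
    }
    return fd;
}
```

A caller would check the result of `connect_tcp` before reading or writing, and call `close` on the descriptor when finished.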

Classic view of network API

•  Start with a host name (maybe), e.g. foo.bar.com
•  Get an IP address: gethostbyname() maps foo.bar.com to 10.5.4.3
•  Make a socket (protocol, address): socket(); connect(); … yields sock_id
•  Send a byte stream (TCP) or packets (UDP) over the network
–  TCP socket: bytes 1,2,3,4,5,6,7,8,9 … eventually arrive in order
–  UDP socket: packets may or may not arrive

Protocol layering

•  The communications stack consists of a set of layered services, each providing a service to the layer above and using the services of the layer below
–  Each service has a programming API, just like any software module
•  Each service has to convey information to one or more peers across the network
•  This information is contained in a header
–  The headers are transmitted in the same order as the layered services

Protocol layering example

[Figure: the browser process and the web server process each run HTTP over TCP over IP over a link layer. A router between them runs IP over Link1 and Link2 and is joined to the two hosts by Physical Link 1 and Physical Link 2. The headers H (HTTP), T (TCP), I (IP), and L1/L2 (link) are added to the request one at a time as it moves down the stack.]

•  The browser wants to request a page. It calls HTTP with the web address (URL). HTTP’s job is to convey the URL to the web server. HTTP learns the IP address of the web server, adds its header (H), and calls TCP.

•  TCP’s job is to work with the server to make sure bytes arrive reliably and in order. TCP adds its header (T) and calls IP. (Before that, TCP establishes a connection with its peer.)

•  IP’s job is to get the packet routed to the peer through zero or more routers. IP determines the next hop from the destination IP address. IP adds its header (I) and calls the link layer (e.g. Ethernet) with the next hop address.

•  The link’s job is to get the packet to the next physical box (here a router). It adds its header (L1) and sends the resulting packet over the “wire”.

•  The router’s link layer receives the packet, strips the link header, and hands the result to the IP forwarding process.

•  The router’s IP forwarding process looks at the destination IP address, determines what the next hop is, and hands the packet to the appropriate link layer with the appropriate next hop link address.

•  The packet goes over the link (now with link header L2) to the web server, after which each layer processes and strips its corresponding header: H T I, then H T, then H.
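The walk-through above can be mimicked with a toy model in which every header is a single tag character placed in front of the payload. This is only an illustrative sketch: the `packet_t` type and the `push_header` / `pop_header` helpers are hypothetical, and real headers are multi-byte structures.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_PACKET 64

/* A toy packet: headers are single tag characters in front of the payload. */
typedef struct {
    char   bytes[MAX_PACKET];
    size_t len;
} packet_t;

/* Start a packet that holds only the application payload. */
void packet_init(packet_t *p, const char *payload) {
    p->len = strlen(payload);
    assert(p->len < MAX_PACKET);
    memcpy(p->bytes, payload, p->len);
}

/* A sending layer prepends its header, so the outermost header comes first. */
void push_header(packet_t *p, char tag) {
    assert(p->len + 1 <= MAX_PACKET);
    memmove(p->bytes + 1, p->bytes, p->len);
    p->bytes[0] = tag;
    p->len += 1;
}

/* A receiving layer checks and strips its own header.
   Returns 0 on success, -1 if the outermost header is not the expected one. */
int pop_header(packet_t *p, char expected) {
    if (p->len == 0 || p->bytes[0] != expected)
        return -1;
    memmove(p->bytes, p->bytes + 1, p->len - 1);
    p->len -= 1;
    return 0;
}
```

Pushing 'H', 'T', 'I', and 'L' in that order onto the payload "GET /" gives the wire image "LITHGET /". The router pops 'L' and pushes a new link header, and the web server pops 'L', 'I', 'T', and 'H' to recover the payload.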


Basic elements of any protocol header

•  Demuxing field
–  Indicates which is the next higher layer (or process, or context, etc.)
•  Length field or header delimiter
–  For the header, optionally for the whole packet
•  Header format may be text (HTTP, SMTP (email)) or binary (IP, TCP, Ethernet)

Demuxing fields

•  Ethernet: Protocol Number
–  Indicates IPv4, IPv6, (old: AppleTalk, SNA, DECnet, etc.)
•  IP: Protocol Number
–  Indicates TCP, UDP, SCTP
•  TCP and UDP: Port Number
–  Well-known ports indicate FTP, SMTP, HTTP, SIP, many others
–  Dynamically negotiated ports indicate specific processes (for these and other protocols)
•  HTTP: Host field
–  Indicates “virtual web server” within a physical web server
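The demultiplexing chain can be sketched as a series of lookups, one per layer. The function names below are hypothetical. The constants are the standard assigned values: EtherType 0x0800 for IPv4 and 0x86DD for IPv6; IP protocol numbers 6, 17, and 132 for TCP, UDP, and SCTP; and a few well-known ports.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* EtherType values carried in the Ethernet header. */
const char *demux_ethertype(uint16_t ethertype) {
    switch (ethertype) {
    case 0x0800: return "IPv4";
    case 0x86DD: return "IPv6";
    default:     return "unknown";
    }
}

/* Protocol numbers carried in the IP header. */
const char *demux_ip_protocol(uint8_t protocol) {
    switch (protocol) {
    case 6:   return "TCP";
    case 17:  return "UDP";
    case 132: return "SCTP";
    default:  return "unknown";
    }
}

/* A few well-known TCP/UDP destination ports. */
const char *demux_port(uint16_t port) {
    switch (port) {
    case 21:   return "FTP";
    case 25:   return "SMTP";
    case 80:   return "HTTP";
    case 5060: return "SIP";
    default:   return "unknown";
    }
}

/* Walk the chain Ethernet -> IP -> port for one packet's header fields. */
void demux_packet(uint16_t ethertype, uint8_t protocol, uint16_t port,
                  const char *out[3]) {
    assert(out != NULL);
    out[0] = demux_ethertype(ethertype);
    out[1] = demux_ip_protocol(protocol);
    out[2] = demux_port(port);
}
```

Each layer inspects only its own field, which is what lets the layers above and below evolve independently.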


IP (Internet Protocol)

•  Three services:
–  Unicast: transmits a packet to a specific host
–  Multicast: transmits a packet to a group of hosts
–  Anycast: transmits a packet to one of a group of hosts (typically the nearest)
•  Destination and source are identified by IP addresses (32 bits for IPv4, 128 bits for IPv6)
•  All services are unreliable
–  A packet may be dropped, duplicated, or received in a different order

IP(v4) address format

•  In binary, a 32-bit integer
•  In text, this: “128.52.7.243”
–  Each decimal number represents 8 bits (0–255)
•  “Private” addresses are not globally unique:
–  Used behind NAT boxes
–  10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16
•  Multicast addresses start with 1110 as the first 4 bits (Class D address)
–  224.0.0.0/4
•  Unicast and anycast addresses come from the same space
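The address rules above translate directly into bit operations on the 32-bit value. The following is a minimal sketch with hypothetical function names; the parser is deliberately simple (it relies on `sscanf`, which tolerates leading whitespace) and is not a replacement for `inet_pton`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Parse "a.b.c.d" into a 32-bit host-order integer.
   Returns 0 on success, -1 on malformed input. */
int parse_ipv4(const char *text, uint32_t *out) {
    unsigned a, b, c, d;
    char extra;
    assert(text != NULL && out != NULL);
    /* Exactly four numbers and no trailing character must match. */
    if (sscanf(text, "%u.%u.%u.%u%c", &a, &b, &c, &d, &extra) != 4)
        return -1;
    if (a > 255 || b > 255 || c > 255 || d > 255)
        return -1;
    *out = ((uint32_t)a << 24) | ((uint32_t)b << 16) | ((uint32_t)c << 8) | d;
    return 0;
}

/* True if addr lies inside prefix/len, e.g. 10.0.0.0/8. */
int in_prefix(uint32_t addr, uint32_t prefix, int len) {
    uint32_t mask = len == 0 ? 0 : 0xFFFFFFFFu << (32 - len);
    return (addr & mask) == (prefix & mask);
}

/* Private ranges listed on the slide. */
int is_private(uint32_t addr) {
    return in_prefix(addr, 0x0A000000u, 8)      /* 10.0.0.0/8 */
        || in_prefix(addr, 0xAC100000u, 12)     /* 172.16.0.0/12 */
        || in_prefix(addr, 0xC0A80000u, 16);    /* 192.168.0.0/16 */
}

/* Multicast: the first 4 bits are 1110, i.e. 224.0.0.0/4. */
int is_multicast(uint32_t addr) {
    return (addr >> 28) == 0xE;
}
```

For example, "128.52.7.243" parses to 0x803407F3, which is neither private nor multicast, while "10.5.4.3" falls inside 10.0.0.0/8.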


UDP (User Datagram Protocol)

•  Runs above IP
•  Same unreliable service as IP
–  Packets can get lost anywhere:
   •  Outgoing buffer at source
   •  Router or link
   •  Incoming buffer at destination
•  But adds port numbers
–  Used to identify “application layer” protocols or processes
•  Also adds a checksum (optional)

TCP (Transmission Control Protocol)

•  Runs above IP
–  Port number and checksum like UDP
•  Service is an in-order byte stream
–  The application does not know exactly how the bytes are packaged into packets
•  Flow control and congestion control
•  Connection setup and teardown phases
•  There can be considerable delay between bytes in at the source and bytes out at the destination
–  Because of timeouts and retransmissions
•  Works only with unicast (not multicast or anycast)

UDP vs. TCP

•  UDP is more real-time
–  A packet is sent or dropped, but is not delayed
•  UDP has more of a “message” flavor
–  One packet = one message
–  But reliability mechanisms must be added on top of it
•  TCP is great for transferring a file or a bunch of email, but kind of frustrating for messaging
–  Interrupts to the application don’t conform to message boundaries
–  No “Application Layer Framing”
•  TCP is vulnerable to DoS (Denial of Service) attacks, because the initial packet (SYN) consumes resources at the receiver
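Because TCP delivers a byte stream without message boundaries, applications that need messages usually add their own framing. One widely used approach, shown here as a sketch rather than something prescribed by the lecture, is to prefix each message with its length. The function names and the 4-byte big-endian length are assumptions made for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Write a 4-byte big-endian length followed by the message body.
   Returns the number of bytes written, or 0 if dst is too small. */
size_t frame_encode(const void *msg, uint32_t msg_len,
                    uint8_t *dst, size_t dst_cap) {
    assert(msg != NULL && dst != NULL);
    if (dst_cap < 4 || dst_cap - 4 < msg_len)
        return 0;
    dst[0] = (uint8_t)(msg_len >> 24);
    dst[1] = (uint8_t)(msg_len >> 16);
    dst[2] = (uint8_t)(msg_len >> 8);
    dst[3] = (uint8_t)msg_len;
    memcpy(dst + 4, msg, msg_len);
    return 4 + (size_t)msg_len;
}

/* Try to extract one message from the bytes received so far.
   Returns the bytes consumed (header + body), or 0 if the stream does
   not yet hold a complete message. *body points into stream. */
size_t frame_decode(const uint8_t *stream, size_t avail,
                    const uint8_t **body, uint32_t *body_len) {
    if (avail < 4)
        return 0;
    uint32_t len = ((uint32_t)stream[0] << 24) | ((uint32_t)stream[1] << 16)
                 | ((uint32_t)stream[2] << 8)  |  (uint32_t)stream[3];
    if (avail - 4 < len)
        return 0;                   /* body not fully received yet */
    *body = stream + 4;
    *body_len = len;
    return 4 + (size_t)len;
}
```

The receiver keeps appending bytes from `recv` to a buffer and calls `frame_decode` until it returns 0, which handles both a message split across reads and several messages arriving in a single read.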

Instructor’s Guide for Coulouris, Dollimore and Kindberg, Distributed Systems: Concepts and Design, Edn. 4 © Pearson Education 2005

Figure 2.8 Real-time ordering of events

[Figure: timelines of processes X, Y, Z, and A against physical time, showing the send and receive events for messages m1, m2, and m3. Events are numbered 1 to 4, and the receipt times at A are marked t1, t2, t3.]

Figure 2.9 Processes and channels

[Figure: process p sends message m from its outgoing message buffer over a communication channel to the incoming message buffer of process q, which receives it.]

Failure Model: Omission and arbitrary failures

Class of failure | Affects | Description
Fail-stop | Process | Process halts and remains halted. Other processes may detect this state.
Crash | Process | Process halts and remains halted. Other processes may not be able to detect this state.
Omission | Channel | A message inserted in an outgoing message buffer never arrives at the other end’s incoming message buffer.
Send-omission | Process | A process completes a send, but the message is not put in its outgoing message buffer.
Receive-omission | Process | A message is put in a process’s incoming message buffer, but that process does not receive it.
Arbitrary (Byzantine) | Process or channel | Process/channel exhibits arbitrary behaviour: it may send/transmit arbitrary messages at arbitrary times or commit omissions; a process may stop or take an incorrect step.

Figure 2.11 Timing failures

Class of failure | Affects | Description
Clock | Process | Process’s local clock exceeds the bounds on its rate of drift from real time.
Performance | Process | Process exceeds the bounds on the interval between two steps.
Performance | Channel | A message’s transmission takes longer than the stated bound.

Dependability Modeling

•  Includes reliability modeling and availability modeling
•  Can show that a designed system meets its performance and dependability requirements
•  Provides a good mechanism for examining the behavior of a system, from the design stage through implementation to final deployment

Dependability

•  Two measures
–  Reliability (MTTF)
–  Availability (ratio of uptime to total time)

Reliability

•  Definition: The reliability R(t) of a system at time t is the probability that no system failure has occurred in the interval [0,t). If X is a random variable that represents the time to occurrence of system failure, then R(t) = P(X > t).

•  Unreliability = 1 − R(t)

Reliability

•  Definition: The MTTF (mean time to failure) of a system is the expected time until the occurrence of the (first) system failure. If X is a random variable that represents the time to occurrence of system failure, then MTTF = E[X].

•  Given the system reliability R(t), the MTTF can be computed as

$$\mathrm{MTTF} = \int_0^{\infty} R(t)\,dt$$
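As a worked example, assume a constant failure rate λ, which gives an exponentially distributed time to failure. The slides do not state this distribution; it is chosen here only because it makes the integral easy to evaluate.

```latex
\begin{align*}
R(t) &= P(X > t) = e^{-\lambda t} \\
\mathrm{MTTF} &= \int_0^{\infty} e^{-\lambda t}\,dt
              = \left[-\frac{1}{\lambda} e^{-\lambda t}\right]_0^{\infty}
              = \frac{1}{\lambda}
\end{align*}
```

With an assumed λ of 0.001 failures per hour, the MTTF is 1000 hours, and the probability of surviving to t = 1000 hours is R(1000) = e^{-1} ≈ 0.368.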

Availability

•  A measure expressed as the ratio of uptime to total time

•  High availability - ability of a system to perform its function continuously (without interruption) for a significantly longer period of time than the reliabilities of its individual components would suggest.

•  High availability is most often achieved through fault tolerance.

Degree of Availability

System Type | Unavailability (minutes/year) | Availability (in percent) | Availability Class
Unmanaged | 50,000 | 90 | 1
Managed | 5,000 | 99 | 2
Well-managed | 500 | 99.9 | 3
Fault-tolerant | 50 | 99.99 | 4
High Availability | 5 | 99.999 | 5
Very High Availability | 0.5 | 99.9999 | 6
Ultra Availability | 0.05 | 99.99999 | 7

Availability

•  Definition: Availability A(t) of a system at time t is the probability that the system is functioning correctly at time t.

•  As with reliability, in some applications it is more convenient to compute the system unavailability U(t) = 1 − A(t).

•  Steady-state availability: A_steady = lim A(t) as t → ∞
•  Availability = MTTF / (MTTF + MTTR), where MTTR is the mean time to repair
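The availability formula and the downtime figures used to classify systems can be related with a few lines of arithmetic. This is a minimal sketch with hypothetical function names; it assumes a 365-day year and that MTTF and MTTR are given in the same unit.

```c
#include <assert.h>

#define MINUTES_PER_YEAR (365.0 * 24.0 * 60.0)   /* 525,600 */

/* Steady-state availability from mean time to failure and mean time to repair. */
double availability(double mttf, double mttr) {
    assert(mttf > 0.0 && mttr >= 0.0);
    return mttf / (mttf + mttr);
}

/* Expected downtime per year, in minutes, for a given availability. */
double downtime_minutes_per_year(double a) {
    assert(a >= 0.0 && a <= 1.0);
    return (1.0 - a) * MINUTES_PER_YEAR;
}
```

With an assumed MTTF of 999 hours and MTTR of 1 hour, the availability is 0.999, which corresponds to roughly 526 minutes of downtime per year, in line with the rounded figure of 500 minutes for a 99.9% system.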


Modeling Techniques

•  Non-state-space
–  Fault tree
–  Reliability block diagram
•  State-space
–  Continuous-time Markov chain
–  Stochastic Petri net
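For a reliability block diagram, the standard combination rules are short enough to state here. They assume that components fail independently, which is an assumption of this sketch and not something the slides state. The component values are illustrative.

```latex
\begin{align*}
R_{\text{series}}   &= \prod_{i=1}^{n} R_i
  && \text{(the system works only if every block works)} \\
R_{\text{parallel}} &= 1 - \prod_{i=1}^{n} (1 - R_i)
  && \text{(the system fails only if every block fails)} \\
\text{Two blocks with } R_1 = R_2 = 0.99: \quad
R_{\text{series}} &= 0.99^2 = 0.9801, &
R_{\text{parallel}} &= 1 - 0.01^2 = 0.9999
\end{align*}
```

The parallel case is the reason redundancy, such as a dual-head server configuration, can raise availability above that of any single component.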


Example of system


Fault Tree

Availability model: HA-OSCAR dual head model

[Figure: timelines for servers S1 and S2, each alternating between “server up” and “server down & repair” periods, together with the combined S1&S2 timeline]

HA-OSCAR SRN model

• Server sub-model

• Switches

• Compute nodes

Server Sub Model

• P server up
• P server down
• Failover
• P server repair
• Failback

• S is up and ready
• S takes control
• S server down
• S repair

Compute node sub model

Switch sub model


Class discussion/Exercise

•  Say we have to design and develop a disaster warning system that interfaces with multiple systems and performs event analysis to detect possible disasters and dangers

•  High-level requirements
–  Open interface
–  Scalable to many subscribers for event notification
–  24/7 availability

Summary

•  When designing or analyzing a system, first examine the architectural model at a high level.

•  Subsequent steps explore the fundamental models, such as the interaction model, security model, failure model, and reliability model.

Case study in Cloud-based EKG system
