Lecture 9: Naming Services
Posted on 20-Dec-2015
EECE 411: Design of Distributed Software Applications
Logistics / reminders
Subscribe to the mailing list!
Assignment 1 marks: you should have received them. Comments: share ideas / DO NOT share code. See the TA grading scheme.
Assignment 2: due this Friday 11:59pm.
Quizzes: Q1 Thursday next week (10/14); Q2 11/16.
TA's grading scheme
(1 mark) The code does not properly read the data from the socket: it does not reattempt to read the rest of the message if the message is not completely received by the first read operation.
(1-3 marks) Incorrect implementation of the protocol: the implementation does not skip the server authentication data, or the code does not correctly handle the endianness of the integers. Such code works only because you are lucky that the message size and secret code length fit into one byte; it will break if either exceeds 255.
(1 mark) Bad software engineering practice: the code does not handle exceptions.
(1 mark) Bad programming practice: a hardcoded message length (the message length is a variable that you should parse from the message), or a hardcoded server name, port number, and student ID.
(1 mark) The instructions for running the code do not work, or are improper. It is not acceptable to require an IDE when distributing your code; a Makefile, an Ant XML file, or simply instructions to run the jar file would have worked.
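Two of the deductions above (partial reads, endianness) come up so often that a sketch may help. This assumes, purely for illustration, a protocol with a 4-byte big-endian length prefix; the assignment's actual field layout may differ.

```python
import struct

def recv_exact(sock, n):
    """Loop until exactly n bytes arrive: one recv() may return fewer."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed before full message arrived")
        buf += chunk
    return buf

def read_message(sock):
    """Parse the length in network byte order ('!I'); never assume one byte."""
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, length)
```

The point is the loop: TCP delivers a byte stream, so a single read may stop anywhere inside a message, and the `!` format forces network byte order regardless of the host's endianness.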
[Last time] Servers and State (I)
Stateless servers: never keep accurate information about the status of a client after having handled a request.
Examples: don't record whether a file has been opened (simply close it again after access); don't promise to invalidate a client's cache; don't keep track of your clients.
Consequences: clients and servers are completely independent; state inconsistencies resulting from client or server crashes are reduced; possible loss of performance because, e.g., a server cannot anticipate client behavior (think of prefetching file blocks).
Servers and State (II)
Stateful servers: keep track of client status. E.g., record that a file has been opened, so that prefetching can be done; know which data a client has cached, and allow clients to keep local copies of shared data while promising to invalidate them.
Observation: the performance of stateful servers can be extremely high, provided clients are allowed to keep local copies. [Depending on the application] reliability may not be a major problem.
Quiz question: Stateless or stateful?
eSocks is an e-commerce site. The current implementation of the server side is stateful (no replication is used). There are about 100 application servers. The MTTF is one year.
Anticipated load is 1000 new users per second. On average a user session lasts 15 minutes.
The business cost of a lost session is $100.
A vendor offers to revamp eSocks' infrastructure and move it to a stateless implementation for 'only' $10M.
Should eSocks' CTO consider this offer? Should the CTO ask more questions (of his team or the vendor)?
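One hedged back-of-envelope the CTO might run, assuming crashes are independent, load is spread uniformly, and a crash loses every session live on that server:

```python
servers = 100
mttf_years = 1.0                # per server
arrival_rate = 1000             # new users per second, site-wide
session_s = 15 * 60             # average session length, in seconds
cost_per_session = 100          # dollars per lost session

crashes_per_year = servers / mttf_years            # about 100 crashes/year
per_server_rate = arrival_rate / servers           # 10 new users/s per server
# Little's law: concurrent sessions = arrival rate * session duration
sessions_per_server = per_server_rate * session_s  # 9000 live sessions
cost_per_crash = sessions_per_server * cost_per_session
annual_loss = crashes_per_year * cost_per_crash    # $90M/year under these assumptions
```

Under these (debatable) assumptions the $10M rewrite pays for itself quickly; the questions worth asking include whether a crash really loses whole sessions, whether failures are independent, and whether cheaper session replication would achieve the same.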
Next
A distributed system is: a collection of independent computers that
appears to its users as a single coherent system
Components need to communicate and cooperate => support needed: naming, synchronization.
Naming systems
Functionality: map names → access points (addresses).
Names are used to denote entities in a distributed system.
To operate on an entity, we need to access it at an access point (address).
Note: a location-independent name for an entity E is independent of the addresses of the access points offered by E.
Naming systems
Functionality: map names → access points (addresses).
One challenge: scaling (# of names, # of clients, geographical distribution) and management!
War stories (1)
Saving pics in a student registration database
Issues: naming conflicts; system- vs. user-chosen names.
War stories (2) ISPs merge!
Story: Bob's email is [email protected]. But the ISP running superman.com is bought by superwoman.com, and they want to translate all emails to their domain.
Issues: overloading (the address is not location independent!)
Solutions?
War stories (3): ZIP codes – more overloading
US zip code structure: 1st digit: zone (e.g., New England, NW); 2nd-3rd digits: 'section'; 4th-5th digits: post office.
Situation: congestion at the Boston section (021). Solution adopted: split it in two (021 and 024). Result?
Issue: Overloading
War stories (4): Running out of phone numbers
10 digits: area (3) + switch (3) + identifier (4)
Running out of numbers: splitting vs. overlay.
Terminology
Names Names vs. identifiers
Identifiers have three properties: an identifier refers to at most one entity; each entity is referred to by at most one identifier; an identifier always refers to the same entity (it is never reused).
Human friendly vs. arbitrary (random strings)
Namespace: flat (names have no structure) vs. hierarchical (names have structure).
Naming system implementation
Strawman #1: Why not centralize?
Single point of failure
High latency (distant centralized database)
Scalability bottleneck: traffic volume
Management: single point of update
Naming system implementation
Strawman #2: Why not use /etc/hosts?
Original name-to-address mapping: a flat namespace in /etc/hosts; SRI kept the main copy, downloaded regularly.
The count of hosts was increasing: from a machine per domain to a machine per user, so many more downloads and many more updates.
Still a scalability bottleneck.
Naming system implementation
Idea: partition the namespace
Hierarchical namespace (e.g., DNS)
Naming system implementation
Idea: partition the namespace
What if I want to keep the namespace flat?
Implementation options: Flat namespace
Problem: given an essentially unstructured name, how can we locate its associated address?
Possible designs: simple solutions (broadcasting, forwarding pointers); home-based approaches; consistent hashing and Distributed Hash Tables (a.k.a. structured p2p).
Flat namespaces – simple solutions
Broadcasting: Simply broadcast the ID, requesting the entity to return its current address.
Can never scale beyond local-area networks (think of ARP/RARP)
Requires all processes to listen to incoming location requests
Forwarding pointers: each time an entity moves, it leaves behind a pointer telling where it has gone; update a client's reference as soon as the present location has been found.
Geographical scalability problems: long chains are not fault tolerant, and network latency increases at dereferencing.
Functionality to implement: map names → access points (addresses).
Similar to a hash table: manage a (huge) list of (name, access point) pairs.
Put(key, value); Lookup(key) → value.
Key idea: partitioning. Allocate parts of the list to different nodes.
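Before consistent hashing, it may help to see why the obvious partitioning, hash(key) mod N, fails: adding a single node remaps almost all keys. (The key names below are made up for illustration.)

```python
import hashlib

def owner(key: str, n_nodes: int) -> int:
    """Naive placement: hash the name, take it mod the node count."""
    h = int(hashlib.sha1(key.encode()).hexdigest(), 16)
    return h % n_nodes

keys = [f"name-{i}" for i in range(1000)]
before = {k: owner(k, 10) for k in keys}
after = {k: owner(k, 11) for k in keys}        # one node joins
moved = sum(before[k] != after[k] for k in keys)
# Roughly N/(N+1) of all keys change owner; consistent hashing moves only O(K/N).
```

This is the scalability trap: every membership change forces a near-total reshuffle, which is exactly what the partitioning scheme on the next slides avoids.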
Partition Solution: Consistent hashing
Consistent hashing: the output range of a hash function is treated as a fixed circular space or "ring".
[Figure: circular ID space 0..128 with nodes N10, N32, N60, N80, N100; keys K5, K11, K30, K33, K52, K99 are hashed onto the same ring]
Partition Solution: Consistent hashing
Mapping keys to nodes. Advantages: incremental scalability, load balancing.
[Figure: the same ring; each key is stored at its successor node: K5, K10 at N10; K11, K30 at N32; K33, K40, K52 at N60; K65, K70 at N80; K99 at N100]
Consistent hashing
How do store & lookup work?
[Figure: the same ring; a lookup for K5 is answered "Key 5 is at N10"]
Consistent Hashing -- Summary
Mechanism: nodes get an identity by hashing their IP address; keys are hashed into the same space. A key with id k is assigned to the first node whose hashed id is equal to or follows k in the circular space: successor(k).
Advantages: incremental scalability, balanced distribution. Theoretical results [N = number of nodes, K = number of keys in the system]:
[With high probability] each node is responsible for at most (1+ε)K/N keys.
[With high probability] the joining or leaving of a node relocates O(K/N) keys (and only to or from the responsible node).
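The successor(k) mechanism above can be sketched with a sorted list of hashed node ids and a binary search. The node names and the 2^32 ring size are assumptions of this sketch, not something fixed by the scheme.

```python
import bisect
import hashlib

RING = 2 ** 32  # size of the circular id space; an assumption for this sketch

def h(name: str) -> int:
    """Hash a node or key name onto the ring."""
    return int(hashlib.sha1(name.encode()).hexdigest(), 16) % RING

class ConsistentHashRing:
    def __init__(self, nodes):
        self.ids = sorted(h(n) for n in nodes)
        self.node_of = {h(n): n for n in nodes}

    def successor(self, key: str) -> str:
        """First node whose hashed id equals or follows hash(key), wrapping around."""
        i = bisect.bisect_left(self.ids, h(key))
        return self.node_of[self.ids[i % len(self.ids)]]
```

Calling `ConsistentHashRing(["N10", "N32", "N60"]).successor("K5")` names the node storing K5; when a node joins, only the keys between its predecessor and itself need to move, which is the O(K/N) relocation claim above.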
Extra consistent hashing tricks:
Virtual nodes: each physical node can be responsible for multiple virtual nodes.
Advantage: load balancing. Dealing with heterogeneity: the number of virtual nodes that a node is responsible for can be decided based on its capacity, accounting for heterogeneity in the physical infrastructure.
If a node becomes unavailable, the load it handled is evenly dispersed across the remaining available nodes.
When a node becomes available again, it accepts a roughly equivalent amount of load from each of the other available nodes.
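The virtual-node trick can be layered on the same ring by hashing each physical node several times, with counts proportional to capacity. The node names, capacities, and ring size here are illustrative assumptions.

```python
import bisect
import hashlib

RING = 2 ** 32

def h(name: str) -> int:
    return int(hashlib.sha1(name.encode()).hexdigest(), 16) % RING

def build_ring(capacity):
    """capacity: {physical node: number of virtual nodes}.
    Returns sorted (ring position, physical node) pairs."""
    points = []
    for node, vnodes in capacity.items():
        for i in range(vnodes):
            points.append((h(f"{node}#vn{i}"), node))
    points.sort()
    return points

def owner(points, key):
    """Successor rule over virtual-node positions, wrapping around."""
    i = bisect.bisect_left(points, (h(key), ""))
    return points[i % len(points)][1]
```

A node with five times as many virtual nodes should receive roughly five times the keys, and when it fails its many small arcs scatter across the survivors rather than dumping everything on one neighbor.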
Key functionality: manage a list of tuples.
Naming service: names → access points (addresses). Hash table: key → values.
API: Store(key, value); Lookup(key) → value.
Many other applications. Cooperative web caches: pages → cache location. P2P file sharing: filenames → list of nodes that store them (eDonkey, Azureus).
Problem: scalability. Key idea: partitioning: allocate parts of the list to participating nodes. The functionality is similar to a hash table.
Partition Solution: Consistent hashing
Consistent hashing: the output range of a hash function is treated as a fixed circular space or "ring"; keys and node IDs are mapped into it using the hash function.
[Figure: circular ID space 0..128 with nodes N10, N32, N60, N80, N100 and keys K5, K11, K30, K33, K52, K99]
Partition Solution: Consistent hashing
Discussion: load balancing, impact of failures
[Figure: the same ring, with each key stored at its successor node]
Consistent hashing
How do store & lookup work?
[Figure: the same ring; a lookup for K5 is answered "Key 5 is at N10"]
Additional trick: Virtual Nodes
Problem: how to do load balancing when nodes are heterogeneous?
Solution idea: each node owns an ID space proportional to its 'power'.
Virtual nodes: each physical node is responsible for multiple (similar) virtual nodes; virtual nodes are treated the same.
Advantages: load balancing, incremental scalability, dealing with failures and heterogeneity: the number of virtual nodes that a node is responsible for can be decided based on its capacity.
When a node joins (if it supports many virtual nodes), it accepts a roughly equivalent amount of load from each of the other existing nodes.
If a node becomes unavailable, the load it handled is evenly dispersed across the remaining available nodes.
Consistent Hashing – Summary so far
Mechanism: nodes get an identity by hashing their IP address; keys are hashed into the same space. A key with id k is assigned to the first node whose hashed id is equal to or follows k in the circular space: successor(k).
Advantages: incremental scalability, load balancing. Theoretical results [N = number of nodes, K = number of keys in the system]:
[With high probability] each node is responsible for at most (1+ε)K/N keys.
[With high probability] the joining or leaving of a node relocates O(K/N) keys (and only to or from the responsible node).
But: a problem with consistent hashing
How large is the state maintained at each node? O(N), where N is the number of nodes.
[Figure: the same ring; to answer "Key 5 is at N10" in one step, a node must know every other node]
Basic Lookup (non-solution)
[Figure: ring with nodes N5, N10, N20, N32, N40, N60, N80, N99, N110; a "Where is key 50?" query is forwarded around the ring until the answer "Key 50 is at N60" returns]
• Lookups find the ID's successor
• Correct if successors are correct
Successor Lists Ensure Robust Lookup
• Each node remembers r successors
• Lookup can skip over dead nodes
[Figure: the ring with r = 3 successor lists, e.g., N5 holds (10, 20, 32), N40 holds (60, 80, 99), and N110 holds (5, 10, 20)]
“Finger Table” Accelerates Lookups
N80
[Figure: node N80's fingers point 1/2, 1/4, 1/8, 1/16, 1/32, 1/64, and 1/128 of the way around the ring]
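Those halving distances are exactly finger[i] = successor(n + 2^i mod 2^m). A sketch, assuming m = 7 so the space has 128 positions like the ring pictured earlier:

```python
M_BITS = 7
RING = 2 ** M_BITS  # 128-position id space, matching the slides' example

def successor(node_ids, k):
    """First node id equal to or following k on the circle."""
    for nid in sorted(node_ids):
        if nid >= k:
            return nid
    return min(node_ids)  # wrapped past the largest id

def finger_table(n, node_ids):
    """finger[i] points at successor(n + 2**i) for i = 0 .. m-1."""
    return [successor(node_ids, (n + 2 ** i) % RING) for i in range(M_BITS)]
```

For node N80 on the ring {5, 10, 20, 32, 40, 60, 80, 99, 110}, finger[0] is N99 and the last finger, halfway around at id 16, is N20; because some finger at least halves the remaining distance to any target, lookups need only a logarithmic number of hops.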
Lookups take O(log N) hops
[Figure: Lookup(K19) issued on the ring of N5..N110 reaches K19's successor, N20, in a few finger hops]
Summary of Performance Characteristics
Efficient: O(log N) messages per lookup. Scalable: O(log N) state per node. Robust: survives massive membership changes.
Joining the Ring: a three-step process
Initialize all fingers of the new node; update the fingers of existing nodes; transfer keys from the successor to the new node.
Two invariants to maintain to ensure correctness: each node's successor list is maintained, and successor(k) is responsible for monitoring k.
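The first invariant (successor pointers stay correct) is typically maintained by a periodic stabilization round. A simplified, single-process sketch, with the repair logic modeled on Chord-style stabilize/notify (the exact RPCs and node ids here are illustrative):

```python
class Node:
    def __init__(self, nid):
        self.id = nid
        self.successor = self       # points at itself until it joins a ring
        self.predecessor = None

def between(x, a, b):
    """True iff x lies on the circular open interval (a, b)."""
    return (a < x < b) if a < b else (x > a or x < b)

def stabilize(n):
    """Adopt our successor's predecessor if it has slipped in between us."""
    p = n.successor.predecessor
    if p is not None and between(p.id, n.id, n.successor.id):
        n.successor = p
    notify(n.successor, n)

def notify(s, n):
    """n tells s: 'I might be your predecessor.'"""
    if s.predecessor is None or between(n.id, s.predecessor.id, s.id):
        s.predecessor = n
```

A joining node only has to set its own successor correctly; a few stabilization rounds then repair everyone else's pointers, which is what makes concurrent joins safe.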
Join: Initialize New Node's Finger Table
Locate any node p in the ring; ask node p to look up the fingers of the new node.
[Figure: new node N36 joins the ring of N5, N20, N40, N60, N80, N99 and issues Lookup(37,38,40,…,100,164) to fill its finger table]
Join: Update Fingers of Existing Nodes
The new node calls an update function on existing nodes; existing nodes recursively update the fingers of other nodes.
[Figure: the same ring; existing nodes' fingers are updated to point at the new node N36]
Join: Transfer Keys
Only keys in the range are transferred.
[Figure: keys 21..36 are copied from N40 to the new node N36; K30 moves to N36 while K38 stays at N40]
Handling Failures
Problem: failures could cause incorrect lookups. Solution: as a fallback, keep track of the successor's successor (i.e., keep a list of r successors).
[Figure: Lookup(90) on a ring with N10, N80, N85, N102, N113, N120 still succeeds when an intermediate node fails]
Choosing Successor List Length
r = length of the successor list; N = number of nodes in the system.
Assume 1/2 of the nodes fail. P(successor list all dead for a specific node) = (1/2)^r, i.e., P(this node breaks the ring); this depends on the independent-failure assumption.
P(no broken nodes) = (1 - (1/2)^r)^N
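Plugging the formula in (the half-dead assumption is the slide's; the values of r and N below are illustrative):

```python
def p_ring_intact(r, n, p_dead=0.5):
    """P(no node has all r successors dead), failures independent."""
    p_node_breaks = p_dead ** r
    return (1 - p_node_breaks) ** n

# At n = 1000 nodes, r = 1 is hopeless even before half die,
# while r = 16 keeps the ring intact with high probability.
```

The payoff is exponential in r, so a short successor list (a couple dozen entries) buys ring integrity even under massive simultaneous failures.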
DHT – Summary so far
Mechanism: nodes get an identity by hashing their IP address; keys are hashed into the same space. A key with id k is assigned to the first node whose hashed id is equal to or follows k in the circular space: successor(k).
Properties: incremental scalability, good load balancing. Efficient: O(log N) messages per lookup. Scalable: O(log N) state per node. Robust: survives massive membership changes.
Trackerless BitTorrent
A client wants to download a file: it contacts the tracker identified in the .torrent file (using HTTP), and the tracker sends the client a (random) list of peers who have or are downloading the file.
The client contacts the peers on the list to see which segments of the file they have, then requests segments from them.
The client reports to the other peers it knows about that it has a segment; those peers start to contact the client to get the segment (while the client is fetching other segments).
An Example Application: The CD Database
[Figure: a client computes a disc fingerprint and sends it to the service; if the fingerprint is recognized, the service returns the album & track titles]
[Figure: if there is no such fingerprint, the user types in the album and track titles, which are then associated with the fingerprint]
A DHT-Based FreeDB Cache
FreeDB is a volunteer service; it has suffered outages as long as 48 hours, and service costs are borne largely by volunteer mirrors.
Idea: build a cache of FreeDB with a DHT to add to the availability of the main service. Goal: explore how easy this is to do.
Cache Illustration
[Figure: clients send a disc fingerprint to the DHT cache and get back the disc info; new albums are inserted into the DHT]
OpenDHT Applications (application: what it uses OpenDHT for)
Croquet Media Manager: replica location
DOA: indexing
HIP: name resolution
DTN Tetherless Computing Architecture: host mobility
Place Lab: range queries
QStream: multicast tree construction
VPN Index: indexing
DHT-Augmented Gnutella Client: rare object search
FreeDB: storage
Instant Messaging: rendezvous
CFS: storage
i3: redirection
Steps: partition the keyspace; build an overlay network; route queries.