Grapevine: An Exercise in Distributed Computing Landon Cox February 16, 2016

Page 1: Grapevine: An Exercise in Distributed Computing

Landon Cox
February 16, 2016

Page 2: Naming other computers

• Low-level interface
  • Provide the destination MAC address
  • 00:13:20:2E:1B:ED
• Middle-level interface
  • Provide the destination IP address
  • 152.3.140.183
• High-level interface
  • Provide the destination hostname
  • www.cs.duke.edu

Page 3: Translating hostname to IP addr

• Hostname → IP address
  • Performed by Domain Name System (DNS)
  • Used to be a central server
  • /etc/hosts at SRI
• What’s wrong with this approach?
  • Doesn’t scale to the global Internet

Page 4: DNS

• Centralized naming doesn’t scale
  • Server has to learn about all changes
  • Server has to answer all lookups
• Instead, split up data
  • Use a hierarchical database
  • Hierarchy allows local management of changes
  • Hierarchy spreads lookup work across many computers
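
The hierarchical split can be sketched as a toy lookup in which each zone knows only its own children. This is a minimal illustration, not how a real resolver is implemented: the `ZONES` table, the `"delegate"` marker, and the `resolve` function are all hypothetical, and pairing www.cs.duke.edu with 152.3.140.183 is assumed from the earlier naming slide.

```python
# Toy hierarchical name database. Each zone stores only its direct children,
# so changes are managed locally and lookups are spread across zones.
# The zone contents are illustrative assumptions.
ZONES = {
    ".": {"edu.": "delegate"},
    "edu.": {"duke.edu.": "delegate"},
    "duke.edu.": {"cs.duke.edu.": "delegate"},
    "cs.duke.edu.": {"www.cs.duke.edu.": "152.3.140.183"},
}


def resolve(hostname):
    """Walk from the root toward the leaf, asking one zone per step.

    Returns (address or None, list of zones consulted).
    """
    labels = hostname.rstrip(".").split(".")
    zone = "."
    visited = [zone]
    # Build suffixes from shortest to longest: edu., duke.edu., cs.duke.edu., ...
    for i in range(len(labels) - 1, -1, -1):
        name = ".".join(labels[i:]) + "."
        answer = ZONES[zone].get(name)
        if answer is None:
            return None, visited          # this zone has never heard of the name
        if answer == "delegate":
            zone = name                   # follow the delegation one level down
            visited.append(zone)
        else:
            return answer, visited        # reached an address record
    return None, visited
```

No single zone in the sketch holds the whole database, and a name outside duke.edu never touches the duke.edu zone.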

Page 6: Where is www.wikipedia.org?

Page 7: Example: linux.cs.duke.edu

• nslookup in interactive mode

Page 8: Translating IP to MAC addrs

• IP address → MAC address
  • Performed by ARP within a LAN
• How does a router know the MAC address of 152.3.140.183?
  • ARP (Address Resolution Protocol)
  • If it doesn’t know the mapping, broadcast through switch
  • “Whoever has this IP address, please tell me your MAC address”
  • Cache the mapping
  • “/sbin/arp”
• Why is broadcasting over a LAN ok?
  • Number of computers connected to a switch is relatively small
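
The broadcast-then-cache behavior can be sketched as follows. This is a simplified model rather than the real ARP packet exchange: the `Host`, `Lan`, and `ArpCache` classes are hypothetical, and pairing the example IP address with the example MAC address from the naming slide is an assumption.

```python
class Host:
    def __init__(self, ip, mac):
        self.ip = ip
        self.mac = mac


class Lan:
    """A broadcast domain: every attached host sees every request."""

    def __init__(self, hosts):
        self.hosts = hosts
        self.broadcasts = 0   # how many "who has this IP?" requests were sent

    def who_has(self, ip):
        self.broadcasts += 1
        for host in self.hosts:          # every host on the LAN hears the question
            if host.ip == ip:
                return host.mac          # only the owner answers
        return None


class ArpCache:
    def __init__(self, lan):
        self.lan = lan
        self.table = {}                  # ip -> mac

    def lookup(self, ip):
        if ip not in self.table:         # miss: fall back to a broadcast
            mac = self.lan.who_has(ip)
            if mac is None:
                return None
            self.table[ip] = mac         # cache the mapping for next time
        return self.table[ip]
```

Repeated lookups for the same address cost one broadcast in total, which is part of why broadcasting stays cheap on a small LAN.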

Page 9: Broadcast on local networks

• On wired ethernet switch
  • ARP requests/replies are broadcast
  • For the most part, IP communication is not broadcast (w/ caveats)
• What about on a wireless network?
  • Everything is broadcast
  • Means hosts can see all unencrypted traffic
• Why might this be dangerous?
  • Means any unencrypted traffic is visible to others
  • Open WiFi access points + non-SSL web requests and pages
  • Many sites send cookie credentials in the clear …
• Use secure APs and SSL!

Page 10: High-level network overview

[Figure: diagram labeled with Workstations, Servers, Gateways, and three Ethernets]

Page 11: Client-server

• Classic and convenient structure for distributed systems
• How do clients and servers differ?
  • Servers have more physical resources (disk, RAM, etc.)
  • Servers are trusted by all clients
• Why are servers more trustworthy?
  • Usually have better, more reliable hardware
  • Servers are better administered (paid staff watch over them)
• Servers are kind of like the kernel of a distributed system
  • Centralized concentration of trust
  • Support coordinated activity of mutually distrusting clients

Page 12: Client-server

• Why not put everything on one server?
  • Scalability problems (server becomes overloaded)
  • Availability problems (server becomes single point of failure)
  • Want to retain organizational control of some data (some distrust)
• How do we address these issues?
  • Replicate servers
  • Place multiple copies of server in network
  • Allow clients to talk to any server with appropriate functionality
• What are some drawbacks to replication?
  • Data consistency (need sensible answers from servers)
  • Resource discovery (which server should I talk to?)

Page 13: Client-server

• Kernels are centralized too
  • Subject to availability, scalability problems
• Does it make sense to replicate kernels?
  • Perhaps for multi-core machines
  • Assign a kernel to each core
  • Separate address spaces of each kernel
  • Coordinate actions via message passing
  • Multi-core starts to look a lot like a distributed system

Page 14: Grapevine services

• Message delivery
  • Send data to specified users
• Access control
  • Only allow specified users to access name
• Resource discovery
  • Where can I find a printer?
• Authentication
  • How do I know who I am talking to?

Page 15: Registration servers

• What logical data structure is replicated?
  • The registry
  • RName → Group entry | Individual entry
• What does an RName look like?
  • Character string F.R
  • F is a name (individual or group)
  • R is a registry corresponding to a data partition
• At what grain is registration data replicated?
  • Servers contain copies of whole registries
  • Individual server unlikely to have copy of all registries

Page 16: RNames

• RName: name.registry
• Group: {RName1, …, RNameN}
• Individual: Authenticator (password), Inbox sites, Connect site

What two entities are represented by an individual entry? Users and servers

Page 17: RNames

How does an individual entry allow communication with a user? Inbox sites for users

Page 18: RNames

How does an individual entry allow communication with a server? Connect site for servers
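
The three record shapes on these slides can be written down as plain data types. This is a sketch only: the class and field names are hypothetical, and splitting an RName at its last dot is an assumption about how "F.R" strings are parsed.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass(frozen=True)
class RName:
    name: str       # F: an individual or group name
    registry: str   # R: the registry (data partition) that holds the entry

    @classmethod
    def parse(cls, text):
        # Assumption: split at the last dot, so "MailDrop.ms" -> ("MailDrop", "ms").
        name, sep, registry = text.rpartition(".")
        if not sep or not name or not registry:
            raise ValueError(f"not an RName: {text!r}")
        return cls(name, registry)

    def __str__(self):
        return f"{self.name}.{self.registry}"


@dataclass
class Group:
    members: Set[RName] = field(default_factory=set)


@dataclass
class Individual:
    authenticator: str                                     # password
    inbox_sites: List[str] = field(default_factory=list)   # how to reach a user
    connect_site: Optional[str] = None                     # how to reach a server
```

An individual entry covers both users and servers: a user would typically fill in `inbox_sites`, while a server would fill in `connect_site`.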

Page 19: Namespace

• RNames provide a symbolic namespace
  • Similar to file-system hierarchy or DNS
  • Autonomous control of names within a registry
• What is the most important part of the namespace?
  • *.gv (for Grapevine)
  • *.gv is replicated at every registration server
• Who gets to define the other registries?
  • All other registries must have group entry under *.gv
  • Owners of *.gv have complete control over other registries
• In what way do file systems and DNS operate similarly?
  • ICANN’s root DNS servers decide top-level domains
  • Root user controls root directory “/”

Page 20: Resource discovery

• How do clients locate server replicas?
  • Get list of all registries via “gv.gv”
  • Find registry name for service (e.g., “ms”)
  • Lookup group ms.gv at registration server
  • ms.gv returns a list of available servers (e.g., *.ms)
• At this point control is transferred to service
  • Service has autonomous control of its namespace
  • Service can define its own namespace conventions
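
The lookup steps on this slide can be sketched against an in-memory table standing in for a registration server. The table contents (`printers.gv`, `alpha.ms`, `beta.ms`) and the `find_servers` function are hypothetical; only `gv.gv` and `ms.gv` come from the slide.

```python
# Hypothetical registration database: group RName -> member RNames.
REGISTRY_DB = {
    "gv.gv": ["ms.gv", "printers.gv"],   # list of all registries
    "ms.gv": ["alpha.ms", "beta.ms"],    # servers for the message service
}


def find_servers(service):
    """Return the server RNames registered for a service such as "ms"."""
    group = f"{service}.gv"
    # Step 1: consult gv.gv to check that the registry exists.
    if group not in REGISTRY_DB["gv.gv"]:
        raise LookupError(f"no registry named {service!r}")
    # Step 2: look up the group entry; its members are the available servers.
    # From here on, the meaning of names inside the registry is up to the service.
    return REGISTRY_DB.get(group, [])
```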

Page 21: Implementing services

• Mail servers are replicated
  • Any message server accepts any delivery request
  • All message servers can forward to others
  • An individual may have inboxes on many servers
• How does a client identify a server to send a message?
  • Find well-known name “MailDrop.ms” in *.ms
  • MailDrop.ms maps to mail servers
  • Any mail server can accept a message
  • Mail servers forward message to servers hosting users’ inboxes
• Note that the mail service makes “MailDrop.ms” special
  • Grapevine only defines semantics of *.gv
  • Grapevine delegates control of semantics of *.ms to mail service
  • Similar to imap.cs.duke.edu or www.google.com

Page 22: Resource discovery

• Bootstrapping resource discovery
  • Rely on lower-level methods
  • Broadcast to name lookup server on Ethernet
  • Broadcast to registration server on Ethernet
• What data does the name lookup server store?
  • Simple string to Internet address mappings
  • Infrequently updated (minimal consistency issues)
  • Well-known GrapevineRServer addrs of registration servers
• What does this remind you of on today’s networks?
  • Dynamic host configuration protocol (DHCP)
  • Clients broadcast DHCP request on Ethernet
  • DHCP server (usually on gateway) responds with IP addr, DNS info

Page 23: Updating replicated servers

• At some point need to update registration database
  • Want to add new machines
  • Want to reconfigure server locations
• Why not require updates to be atomic at all servers?
  • Requires that most servers be accessible to even start
  • All kinds of reasons why this might not be true
  • Trans-Atlantic phone line might be down
  • Servers might be offline for maintenance
  • Servers might be offline due to failure
• Instead embrace the chaos of eventual consistency
  • Might have transient differences between server state
  • Eventually everything will look the same (probably!)

Page 24: Updating the database

• Information included in timestamps
  • Time + server address
  • Timestamps are guaranteed to be unique
  • Provides a total order on updates from a server
• Does the entry itself need a timestamp (a version)?
  • Not really, can just compute as the max of item timestamps
  • Entry version is a convenient optimization

[Figure: Registration Entry]
• List 1
  • Active items: {str1|t1, …, strn|tn}
  • Deleted items: {str1|t1, …, strm|tm}
• List 2
  • Active items
  • Deleted items
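
A timestamp of the form "time + server address" can be sketched as an ordered pair, with the entry version derived as the maximum item timestamp. The `Timestamp` class and `entry_version` function are hypothetical names, and uniqueness here assumes a server never stamps two updates with the same clock reading.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class Timestamp:
    # Field order matters: dataclass ordering compares time first,
    # then uses the server address to break ties between servers.
    time: int
    server: str


def entry_version(active, deleted):
    """Compute an entry's version as the max of its item timestamps.

    `active` and `deleted` map item strings to Timestamp values.
    Returns None for an entry with no items.
    """
    stamps = list(active.values()) + list(deleted.values())
    return max(stamps) if stamps else None
```

Storing the version explicitly would avoid rescanning the items on every comparison, which is the optimization the slide mentions.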

Page 25: Updating the database

• Operations on an entry
  • Can add/delete items from lists
  • Can merge lists
  • Operations update item timestamps, modify list content
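
One plausible way to merge two copies of a list is to keep, for each item, whichever add or delete carries the latest timestamp. The sketch below illustrates that idea and is not taken from the Grapevine implementation: `merge_lists` and the dictionary layout are assumptions, and timestamps are modeled as `(time, server)` tuples.

```python
def merge_lists(a, b):
    """Merge two copies of a list.

    Each copy is {"active": {...}, "deleted": {...}}, mapping an item string
    to a (time, server) timestamp. For every item, the most recent add or
    delete across both copies wins.
    """
    latest = {}  # item -> (timestamp, state)
    for copy in (a, b):
        for state in ("active", "deleted"):
            for item, ts in copy[state].items():
                if item not in latest or ts > latest[item][0]:
                    latest[item] = (ts, state)

    merged = {"active": {}, "deleted": {}}
    for item, (ts, state) in latest.items():
        merged[state][item] = ts
    return merged
```

Keeping deleted items around with their timestamps is what lets a delete beat an older add that arrives later from another replica.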

Page 26: Updating the database

• How are updates propagated?
  • Asynchronously via the messaging service (i.e., *.ms)
  • Does not require all servers to be online
  • Updates can be buffered and ordered

Page 27: Updating the database

• How fast is convergence?
  • Registration servers check their inbox every 30 seconds
  • If all are online, state will converge in ~30 seconds
  • If server is offline, may take longer

Page 28: Updating the database

• What happens if two admins update concurrently?
  • “it is hard to predict which one of them will prevail.”
  • “acceptable” because admins aren’t talking to each other
  • Anyone make sense of this?

Page 29: Updating the database

• Why not just use a distributed lock?
  • What if a replica is offline during acquire, but reappears?
  • What if lock owner crashes?
  • What if lock maintainer crashes?

Page 30: Updating the database

• What if clients get different answers from servers?
  • Clients just have to deal with it (•_•) ( •_•)>⌐■-■ (⌐■_■)
• Inconsistencies are guaranteed to be transient
  • May not be good enough for some applications

Page 31: Updating the database

• What happens if a change message is lost during propagation?
  • Could lead to permanent inconsistency
  • Periodic replica comparisons and mergers if needed
  • Not perfect since partitions can prevent propagation

Page 32: Updating the database

• What happens if namespace is modified concurrently?
  • Use timestamps to pick a winner (last writer wins)
• Why is this potentially dangerous?
  • Later update could be trapped in offline machine
  • Updates to first namespace accumulate
  • When offline machine goes online, all work to first is thrown out
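
The danger described here can be replayed in a few lines. This is a toy scenario with made-up server names, member names, and timestamps; `last_writer_wins` and `simulate_concurrent_creation` are hypothetical helpers, not Grapevine code.

```python
def last_writer_wins(a, b):
    """Keep whichever registry version has the later creation timestamp."""
    return a if a["created"] > b["created"] else b


def simulate_concurrent_creation():
    # Two admins create the same registry name at nearly the same time.
    first = {"created": (10, "server-a"), "members": set()}
    second = {"created": (11, "server-b"), "members": set()}  # stuck on an offline server

    # While server-b is offline, work accumulates in the first version.
    first["members"].update({"alice", "bob", "carol"})

    # server-b comes back online. Its creation has the later timestamp,
    # so it wins and everything added to the first version is discarded.
    winner = last_writer_wins(first, second)
    lost = first["members"] - winner["members"]
    return winner, lost
```

The rule is applied exactly as designed, yet the outcome surprises everyone who was working against the first version.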

Page 33: Updating the database

• What was the solution?
  • “Shouldn’t happen in practice.”
  • Humans should coordinate out-of-band
  • Probably true, but a little unsatisfying

Page 34: Why read Grapevine?

• Describes many fundamental problems
  • Performance and availability
  • Caching and replication
  • Consistency problems
• We still deal with many of these issues

Page 35: Keeping replicas consistent

• Requirement: members of write set agree
  • Write request only returns if WS members agree
• Problem: things fall apart
  • What do we do if something fails in the middle?
  • This is why we had multiple replicas in first place
• Need agreement protocols that are robust to failures

Page 36: Two-phase commit

• Two phases
  • Voting phase
  • Completion phase
• During the voting phase
  • Coordinator proposes value to rest of group
  • Other replicas tentatively apply update, reply “yes” to coordinator
• During the completion phase
  • Coordinator tallies votes
  • Success (entire group votes “yes”): coordinator sends “commit” message
  • Failure (some “no” votes or no reply): coordinator sends “abort” message
  • On success, group member commits update, sends “ack” to coordinator
  • On failure, group member aborts update, sends “ack” to coordinator
  • Coordinator aborts/applies update when all “acks” have been received
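
The two phases can be sketched as a single-process simulation. It models only the failure-free message flow: there are no timeouts, crashes, or network, and the `Replica` class, its `will_vote_yes` flag, and `two_phase_commit` are hypothetical names chosen for illustration.

```python
class Replica:
    def __init__(self, name, will_vote_yes=True):
        self.name = name
        self.will_vote_yes = will_vote_yes   # stand-in for "can I take the write lock?"
        self.value = None                    # committed state
        self.tentative = None                # update held during the voting phase

    def propose(self, value):
        if not self.will_vote_yes:
            return "no"
        self.tentative = value               # tentatively apply the update
        return "yes"

    def commit(self):
        self.value, self.tentative = self.tentative, None
        return "ack"

    def abort(self):
        self.tentative = None
        return "ack"


def two_phase_commit(replicas, value):
    # Voting phase: the coordinator proposes the value to every replica.
    votes = [r.propose(value) for r in replicas]

    # Completion phase: commit only if the entire group voted yes.
    decision = "commit" if all(v == "yes" for v in votes) else "abort"
    acks = [r.commit() if decision == "commit" else r.abort() for r in replicas]

    # The coordinator finishes once every replica has acknowledged.
    assert all(a == "ack" for a in acks)
    return decision
```

A single "no" vote is enough to abort the update at every replica, including those that voted "yes".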

Page 37: Two-phase commit, Phase 1

[Figure: a coordinator and three replicas]

Page 38: Two-phase commit, Phase 1

[Figure: coordinator sends “Propose: X ← 1” to each of the three replicas]

Page 39: Two-phase commit, Phase 1

[Figure: each replica tentatively holds X ← 1 and replies “Yes”]

Page 40: Two-phase commit, Phase 2

[Figure: replicas hold X ← 1; coordinator tallies 3 Yes votes]

Page 41: Two-phase commit, Phase 2

[Figure: coordinator sends “Commit: X ← 1” to each replica]

Page 42: Two-phase commit, Phase 2

[Figure: all three replicas hold X ← 1]

Page 43: Two-phase commit, Phase 2

[Figure: each replica sends “ACK” to the coordinator]

Page 44: Two-phase commit, Phase 1

• What if fewer than 3 Yes votes?
  • Replicas will time out and assume update is aborted

[Figure: replicas vote Yes, No, Yes; coordinator tallies 2 Yes votes]

Page 45: Two-phase commit, Phase 1

• What if fewer than 3 Yes votes?
  • Replicas do not commit

[Figure: coordinator tallies 2 Yes votes and sends “Abort: X ← 1” to each replica]

Page 46: Two-phase commit, Phase 1

• Why might replica vote No?
  • Replicas will time out and assume update is aborted

[Figure: replicas vote Yes, No, Yes; coordinator tallies 2 Yes votes]

Page 47: Two-phase commit, Phase 1

• Why might replica vote No?
  • Might not be able to acquire local write lock
  • Might be committing w/ another coord.

[Figure: replicas vote Yes, No, Yes; coordinator tallies 2 Yes votes]

Pages 48-49: Two-phase commit, Phase 2

• What if coord. fails after vote msg, before decision msg?
  • Replicas will time out and assume update is aborted

[Figure: replicas hold X ← 1; coordinator has tallied 3 Yes votes]

Pages 50-51: Two-phase commit, Phase 2

• What if coord. fails after decision messages are sent?
  • Replicas commit update

[Figure: coordinator sends “Commit: X ← 1” to each replica]

Pages 52-54: Two-phase commit, Phase 2

• What if coord. fails while decision messages are sent?
  • If one replica receives a commit, all must commit
  • If a replica times out, check with other members
  • If any member receives a commit, all commit

[Figure: replicas hold X ← 1; a single “Commit: X ← 1” message is shown]
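
The rule "if any member receives a commit, all commit" can be sketched as the check a replica runs after timing out on the coordinator. This is a simplification under a strong assumption: every other group member can be reached and reports honestly. The function name and the `"commit"`/`None` encoding are hypothetical.

```python
def decide_after_timeout(peer_reports):
    """Decide what to do after the coordinator goes silent.

    `peer_reports` lists what each other group member last heard from the
    coordinator: "commit", "abort", or None if it heard no decision.
    Assumes every member was reachable; with unreachable members, a real
    protocol could not safely conclude "abort" from silence alone.
    """
    if "commit" in peer_reports:
        return "commit"   # someone saw the decision, so everyone must follow it
    return "abort"        # nobody saw a commit: treat the update as aborted
```

The need to reach every other member before deciding is one reason coordinator failure makes two-phase commit awkward in practice.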

Pages 55-56: Two-phase commit, Phase 1 or 2

• What if replica crashes during 2PC?
  • Coordinator removes it from the replica group
  • If replica recovers it can rejoin the group later

[Figure: a coordinator and three replicas holding X ← 1]

Page 57: Two-phase commit

• Anyone detect circular dependencies here?
  • How do we agree on the coordinator?
  • How do we agree on the group membership?
• Need more powerful consensus protocols
  • Can become very complex
  • Protocols vary depending on what a “failure” is
  • Will cover in-depth very soon
• Two classes of failures
  • Fail-stop: failed nodes do not respond
  • Byzantine: failed nodes generate arbitrary outputs

Page 58: Two-phase commit

• What’s another problem with this protocol?
  • It’s really slow
  • And it’s slow even when there are no failures (the common case)
• Consistency often requires taking a performance hit
  • As we saw it can also undermine availability
  • Can think of an unavailable service as a really slow service

Page 59: Course administration

• Project 2 questions?
  • Animesh is working on a test suite
• Mid-term exam
  • Friday, March 11
  • Responsible for everything up to that point