distributed computation

Lucas Shen, Aug 5, 2014. A presentation on "A Note on Distributed Computing" by Jim Waldo, Geoff Wyant, Ann Wollrath, Sam Kendall


DESCRIPTION

Presentation on the paper.

TRANSCRIPT

Page 1: Distributed computation

Lucas Shen Aug/5/2014

A Note on Distributed Computing, by Jim Waldo, Geoff Wyant, Ann Wollrath, Sam Kendall

Page 2: Distributed computation

✤ Why this subject? For who?

✤ Terminology

✤ Unified vision

✤ What’s the problem?

✤ Example: NFS @Sun

✤ Conclusion

Page 3: Distributed computation

Why this subject? For who?

[Diagram: the landscape an app faces today: cloud providers (Google, Azure, Amazon, Dropbox), IaaS and SaaS offerings, simple instances, CPU and GPU clusters, Hadoop, Spark; relevant to both designers and programmers.]

Page 4: Distributed computation

Terminology

<Local computing> programs are confined to a single address space.

<Distributed computing> programs make calls into other address spaces, possibly even on another machine.
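To make the terminology concrete, here is a minimal Python sketch (not from the paper; the helper names are made up for illustration): a local call confined to one address space versus the same computation carried out in another address space, where arguments and results must cross a process boundary in serialized form.

```python
import subprocess
import sys

def add_local(a, b):
    # Local computing: a plain call, confined to one address space.
    return a + b

def add_remote(a, b):
    # Distributed computing in miniature: the work happens in another
    # address space (a child process), so arguments and results must
    # cross the process boundary in a serialized form (text here).
    result = subprocess.run(
        [sys.executable, "-c", f"print({a} + {b})"],
        capture_output=True, text=True, check=True,
    )
    return int(result.stdout)

print(add_local(2, 3))   # 5
print(add_remote(2, 3))  # 5, but computed in a separate address space
```

The two calls return the same answer, which is exactly the "unified vision" the paper challenges: the boundary crossing is invisible in the interface.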

Page 5: Distributed computation

Unified Vision

from the programmer’s point of view, there is no essential distinction between objects that share an address space and objects that are on two machines with different architectures located on different continents.

Page 6: Distributed computation

How?

1. write the application without worrying about where objects are located and how their communication is implemented.

2. tune performance by “concretizing” object locations and communication methods.

3. test with “real bullets” (e.g., networks being partitioned, machines going down)

Page 7: Distributed computation

Advantages of doing so:

✤ Changes can be made at any granularity, from the entire system down to an individual object.

✤ As long as the interfaces between objects remain constant, the implementations of those objects can be altered at will.

✤ An object can be repaired and the repair installed without worrying that the change will affect the other objects that make up the system.

Page 8: Distributed computation

Based on what beliefs?

1. there is a single natural object-oriented design for a given application, regardless of the context in which that application will be deployed

2. failure and performance issues are tied to the implementation of the components of an application, and consideration of these issues should be left out of an initial design

3. the interface of an object is independent of the context in which that object is used.

Page 9: Distributed computation


What’s wrong?

✤ Local and distributed computing are very different. You should take the differences into account from the very beginning.

✤ You? Who?

Page 10: Distributed computation

Stop avoiding problems

Designer vs Programmer

The danger lies in promoting the myth that “remote access and local access are exactly the same” and not enforcing the myth.

Page 11: Distributed computation

Differences

✤ Latency

✤ Memory Access

✤ Partial failure

Page 12: Distributed computation

Latency

✤ local vs remote object invocation: a difference of four to five orders of magnitude in latency

✤ the designer must decide which objects should be local and which could be remote

✤ two solutions:

1. Ignore the issue and hope that hardware advances will make the difference irrelevant

2. Build tools that let one see the pattern of communication between the objects that make up an application, then tune the system accordingly
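The size of that gap can be felt with a toy Python sketch; the 10 ms round trip is a made-up stand-in for a real network, not a measurement, and the function names are hypothetical:

```python
import time

def local_get():
    # An in-process call: typically tens of nanoseconds to microseconds.
    return 42

def remote_get(simulated_rtt=0.01):
    # A pretend remote call: sleep stands in for a ~10 ms network round trip.
    time.sleep(simulated_rtt)
    return 42

start = time.perf_counter()
for _ in range(1000):
    local_get()
local_elapsed = time.perf_counter() - start

start = time.perf_counter()
remote_get()
remote_elapsed = time.perf_counter() - start

# A single simulated remote call outweighs a thousand local ones.
print(remote_elapsed > local_elapsed)  # True
```

This is why the placement of objects, local or remote, cannot be treated as a mere tuning detail.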

Page 13: Distributed computation

Memory access

✤ pointers: a pointer that is valid in the local address space is meaningless in another address space

✤ two choices:

1. all memory access is mediated by an underlying system, such as distributed shared memory

2. the programmer must be aware of the different kinds of memory access


Page 14: Distributed computation

Partial failure

✤ Component failures are common, not exceptional

✤ there is no common agent that can determine which component has failed and inform the others of that failure

✤ since there is no global state in a distributed system, how do we detect failures and recover from them quickly?
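The resulting ambiguity can be sketched in Python; `FlakyServer` and its `deposit` call are invented for illustration. The client cannot distinguish "server failed before doing the work" from "server did the work but the reply was lost", so its only recourse, retrying, can double-apply a non-idempotent operation.

```python
class FlakyServer:
    """Toy server whose reply can be lost after the work is done."""
    def __init__(self):
        self.balance = 0
        self.fail_once = True

    def deposit(self, amount):
        self.balance += amount                   # the side effect happens...
        if self.fail_once:
            self.fail_once = False
            raise ConnectionError("reply lost")  # ...but the reply never arrives

server = FlakyServer()

def deposit_with_retry(amount, retries=2):
    # From the client's side, a lost reply looks identical to a lost request.
    for _ in range(retries):
        try:
            server.deposit(amount)
            return
        except ConnectionError:
            continue

deposit_with_retry(10)
print(server.balance)  # 20: the retry applied the deposit a second time
```

A purely local call never fails halfway like this, which is why interfaces designed as if everything were local cannot express partial failure.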

Page 15: Distributed computation

Two paths

1. design interfaces of objects as if they were all local

✤ fragile and not robust in any sense

2. design interfaces as if they were all remote

✤ worst case scenario

✤ introduces unnecessary guarantees for objects that are never intended to be used remotely

why so hard?

A distributed system has no single point of resource allocation, synchronization, or failure recovery, and is thus conceptually very different.

Compare GFS, with its single master node, to a fully distributed design.

Page 16: Distributed computation

Lesson learned : NFS@Sun

✤ NFS: Sun’s distributed file system

✤ Designers were unwilling to change the interface to the file system to reflect the distributed nature of file access.

✤ an example of a non-distributed API (open, read, write, close) reimplemented in a distributed way

Page 17: Distributed computation

Soft mount: NFS@Sun

✤ exposes network or server failures to the client program: read and write operations return a failure status much more often than in the single-system case

✤ programs written with no allowance for these failures can easily corrupt the files used by the program.
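That failure mode can be sketched in Python; `SoftMountedFile` is an invented stand-in (real NFS errors surface through the OS file API), but the pattern of swallowed write errors is the same:

```python
class SoftMountedFile:
    """Toy stand-in for a file on a soft-mounted NFS volume:
    writes start failing once the server becomes unreachable."""
    def __init__(self, writes_until_outage=2):
        self.data = []
        self.writes_until_outage = writes_until_outage

    def write(self, chunk):
        if self.writes_until_outage == 0:
            raise OSError("NFS server not responding")
        self.writes_until_outage -= 1
        self.data.append(chunk)

def careless_copy(chunks, f):
    # A program "written with no allowance for these failures":
    # errors are swallowed, so the file silently ends up truncated.
    for chunk in chunks:
        try:
            f.write(chunk)
        except OSError:
            pass  # pretend nothing happened

f = SoftMountedFile()
careless_copy(["a", "b", "c", "d"], f)
print("".join(f.data))  # "ab": the last two chunks were silently dropped
```

A program written against the single-system API has no habit of checking these errors, so the corruption goes unnoticed until the file is read back.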

Page 18: Distributed computation

Hard mount: NFS@Sun

✤ the application hangs until the server comes back up

✤ one server crashes, and many workstations—even those apparently having nothing to do with that server—freeze

Page 19: Distributed computation

why?

✤ The limitations on the reliability and robustness of NFS are not due to the implementation of the parts of that system.

✤ In NFS, an interface designed for non-distributed computing, where partial failure is not possible, was reimplemented in a distributed setting.

✤ These limitations on robustness have in turn limited the scalability of NFS.

Page 20: Distributed computation

conclusion (knowing the difference is the start of advancement), as of 1994

✤ They are different, and you should take the differences seriously.

✤ Be conscious of those differences at all stages of the design and implementation of distributed applications.

✤ Organizations: can allocate research and engineering resources more wisely. Rather than spending those resources trying to paper over the differences between the two kinds of computing, they can be directed at improving the performance and reliability of each.

✤ Engineers: have to know whether they are sending messages to local or remote objects, and must access those objects differently.

✤ As users of today's cloud services, we find they work pretty well. But if we want to build a private cloud or a cluster in the garage, we need to handle these details ourselves.

Page 21: Distributed computation


Thanks for your time