
Introduction

Radu Nicolescu, Department of Computer Science

University of Auckland

16 July 2018


1 Organisation

2 Distributed algorithms

3 Basics

4 Echo algorithm

5 Further algorithms

6 Readings

7 Project and practical work


Organisation

• Before the break: weeks 1–4; after the break: weeks 11–12

• Contents: an introduction to a few fundamental distributed algorithms

• Exam: theory of the distributed algorithms

• Two projects: practical implementations (emulations of specific distributed algorithms on a single computer)


Distributed algorithms

• Computing: computer systems, networking, computer architectures, computer programming, algorithms

• Parallel computing: multi-processors/cores, parallel systems, threading, concurrency, data sharing and races, parallel programming (data and task parallelism), parallel algorithms

• Distributed computing: distributed systems, message passing, communication channels, distributed architectures, distributed programming, distributed algorithms

• concurrency of components
• paramount messaging time
• lack of global clock in the async case
• independent failure of components

• read more: http://en.wikipedia.org/w/index.php?title=Distributed_computing&oldid=616081922


Overlap between parallel and distributed computing

• One litmus test:

• Parallel computing: tight coupling between tasks

• Distributed computing: loose coupling between nodes

• Another difference – specific to algorithms:

• In classical algorithms, the problem is encoded and given to the (one single) processing element

• In parallel algorithms, the problem is encoded and partitioned, and then its chunks are given to the processing elements

• In distributed algorithms, the problem is given by the network itself and solved by coordination protocols

• read more: http://en.wikipedia.org/w/index.php?title=Distributed_algorithm&oldid=594511669


Typical scenario in distributed computing

• computing nodes have local memories and unique IDs

• nodes are arranged in a network: graph, digraph, ring, ...

• neighbouring nodes can communicate by message passing

• independent failures are possible: nodes, communication channels

• the network topology (size, diameter, neighbourhoods) and other characteristics (e.g. latencies) are often unknown to individual nodes

• the network may or may not dynamically change

• nodes solve parts of a bigger problem or have individual problems but still need some sort of coordination – which is achieved by distributed algorithms


Recall from graph theory

• Graphs, edges, digraphs, arcs, degrees, connected, complete graphs, size, distance, diameter, radius, paths, weights/costs, shortest paths, spanning trees, ...

• distance between a pair of nodes = minimum length of a connecting path, aka the length of a shortest path (hop count in unweighted graphs)

• dist(u, v) = min { length(p) : p a path between u and v }

• diameter = maximum distance between any pair of nodes

• diam(G) = max over all u, v of dist(u, v) (a "max min" over path lengths)

• radius = minimum, over all nodes, of the maximum distance to any other node (minimum attained at centres)

• rad(G) = min over u of max over v of dist(u, v) (a "min max min" over path lengths)
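To make these definitions concrete, here is a small C# sketch (not part of the original slides; the class and the example graph are illustrative) that computes hop-count distances by breadth-first search and derives the diameter and radius of a connected, unweighted, undirected graph:

    using System;
    using System.Collections.Generic;
    using System.Linq;

    static class GraphMetrics
    {
        // Hop-count distances from a single source, by breadth-first search.
        // Assumes a connected graph (unreached nodes would keep distance -1).
        static int[] Distances(List<int>[] adj, int source)
        {
            var dist = Enumerable.Repeat(-1, adj.Length).ToArray();
            var queue = new Queue<int>();
            dist[source] = 0;
            queue.Enqueue(source);
            while (queue.Count > 0)
            {
                int u = queue.Dequeue();
                foreach (int v in adj[u])
                    if (dist[v] < 0) { dist[v] = dist[u] + 1; queue.Enqueue(v); }
            }
            return dist;
        }

        static void Main()
        {
            // Example: the path graph 0 - 1 - 2 - 3.
            var adj = new[] {
                new List<int> { 1 }, new List<int> { 0, 2 },
                new List<int> { 1, 3 }, new List<int> { 2 } };

            // eccentricity(u) = max distance from u; diameter = max ecc; radius = min ecc.
            var ecc = Enumerable.Range(0, adj.Length)
                                .Select(u => Distances(adj, u).Max())
                                .ToArray();
            Console.WriteLine($"diameter = {ecc.Max()}, radius = {ecc.Min()}");
            // Prints: diameter = 3, radius = 2 (centres: nodes 1 and 2).
        }
    }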


Recall from graph theory

• Three distinct spanning trees (rooted at 1) on the same undirected graph

• Distances? Diameter? Radius? Centres?


Rounds and steps

• Nodes work in rounds (macro-steps), which have three sub-steps (see the code sketch below):

1 Receive sub-step: receive incoming messages

2 Process sub-step: change local node state

3 Send sub-step: send outgoing messages

• Note: some algorithms expect null or empty messages, as an explicit confirmation that nothing was sent
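For illustration only (not from the slides; the type and member names are hypothetical), one round of this Receive/Process/Send discipline could be sketched in C# as:

    using System;
    using System.Collections.Generic;

    // Hypothetical round-based node skeleton; all names are illustrative.
    class RoundNode<TState, TMsg>
    {
        public TState State;
        public List<TMsg> Inbox = new List<TMsg>();   // filled by the runtime before each round
        public List<TMsg> Outbox = new List<TMsg>();  // drained by the runtime after each round

        // One round (macro-step) = Receive, Process, Send.
        public void Round(Func<TState, IReadOnlyList<TMsg>, (TState NewState, List<TMsg> ToSend)> step)
        {
            List<TMsg> received = Inbox;              // 1. Receive: take all incoming messages
            Inbox = new List<TMsg>();                 //    (possibly none, or explicit null messages)
            var result = step(State, received);       // 2. Process: compute the new local state
            State = result.NewState;
            Outbox.AddRange(result.ToSend);           // 3. Send: queue the outgoing messages
        }
    }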


Timing models

1 Synchronous models

• nodes work totally synchronised – in lock-step

• easiest to develop, but often unrealistic and less efficient

2 Asynchronous models

• messages can take an arbitrary unbounded time to arrive

• often unrealistic and sometimes impossible

3 Partially synchronous models

• some time bounds or guarantees

• more realistic, but most difficult

• Quiz: guess which models were studied first. Heard of Nasreddin Hodja's lamp?


Synchronous model – equivalent versions

• Synchronous model – version 1

• all nodes: process takes 1 time unit

• all messages: transit time (send → receive) takes 0 time units

• Synchronous model – version 2

• all nodes: process takes 0 time units

• all messages: transit time (send → receive) takes 1 time unit

• The second (and equivalent) version ensures that synchronised models are a particular case of asynchronous models


Asynchronous model

• Asynchronous model

• all nodes: process takes 0 time units

• each message (individually): transit time (send → receive) takes any real number of time units

• or, normalised: any real number in the [0, 1] interval

• thus synchronous models (version 2) are a special case

• often, a FIFO guarantee, which however may affect the above timing assumption (see next slide)


Asynchronous model

• Asynchronous model with FIFO channels

• the FIFO guarantee: faster messages cannot overtake earlier, slower messages sent over the same channel

• congestion (pileup) may occur and this should be accounted for

• the timing assumption of the previous slide only applies to the top (oldest) message in the FIFO queue

• after the top message is delivered, the next top message is delivered after an additional arbitrary delay (in R or [0, 1]) ... [Lynch]

• essentially, a FIFO "channel" may not be a simple channel, but may need some middleware (to manage the message queue)

• suggestion: develop robust models, which do not rely on any implicit FIFO assumption [Tel]
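As a minimal sketch (illustrative only, not Tel's or Lynch's formalisation; the class name is made up), such FIFO "middleware" can be pictured as a queue that releases only its oldest message, with an extra arbitrary delay charged to that head message:

    using System;
    using System.Collections.Generic;

    // Illustrative FIFO channel "middleware": messages leave strictly in sending
    // order; only the head of the queue is subject to the delivery-delay assumption.
    class FifoChannel<TMsg>
    {
        private readonly Queue<TMsg> queue = new Queue<TMsg>();
        private readonly Random rng = new Random();

        public void Send(TMsg message) => queue.Enqueue(message);   // may pile up (congestion)

        // Delivers the oldest queued message, if any, together with the extra
        // arbitrary delay it incurs, here normalised to the [0, 1] interval.
        public bool TryDeliver(out TMsg message, out double extraDelay)
        {
            extraDelay = rng.NextDouble();
            if (queue.Count == 0) { message = default; return false; }
            message = queue.Dequeue();      // later, faster messages cannot overtake it
            return true;
        }
    }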


Nondeterminism

• In general, both models are non-deterministic

• Sync and async: often a choice needs to be made between different requests (e.g. to consider an order among neighbours)

• Async: message delivery times can vary (even very widely)

• However, all executions must arrive at a valid decision (not necessarily the same one, but a valid one)


Starting and stopping options

• Starting options

• Single-source or centralised or diffusing: starts from a designated initiator node

• Many-source or decentralised: starts from many nodes, often from all nodes, at the same time

• Termination options

• Easy to notice from outside, but how do the nodes themselves know this?

• Sometimes one node (often the initiator) takes the final decision and can notify the rest (a usually omitted phase)

• In general, this can be very challenging and requires sophisticated control algorithms for termination detection


Echo algorithm

• Echo is a fundamental diffusing (single source) algorithm, which is the basis for several others

• combines two phases (imagine a stone thrown into a pond, creating waves which echo back)

• a broadcast, "top-down" phase, which builds child-to-parent links in a spanning tree

• a convergecast, "bottom-up" or echo phase, which confirms the termination

• parent-to-child links can be built either at convergecast time or by additional confirmation messages immediately after the broadcast (not shown in the next slides)

• after receiving echo tokens from all its neighbours, the source node decides termination (and can optionally start a third phase, to inform the rest of this termination)


Echo Algorithm - Sync

• Animation on an example graph with four nodes (1, 2, 3, 4); the figure itself is not reproduced in this transcript, only the per-frame counters:

  Time Units:  0   1   2   3
  Messages:    0   3   7   10

• Final frame: Time Units = 3 = 2D + 1, Messages = 10 = 2|E|


Echo programs (Tel)

• Each node has a list, Neigh, indicating its neighbours

• There are two different programs: (1) for the initiator, and (2) for the non-initiators

• Here, the programs are high-level pseudocode, not in the basic Receive/Process/Send state-machine format (but can be reduced to this)

• With the exception of the initiator's first step, nodes become active only after receiving at least one message; i.e. nodes are idle (passive) between sending and receiving new messages

• Exercise: try to translate the following pseudocode programs into a state-machine format


Echo program for initiator (Tel)

let parent = null
let rec = 0

for q in Neigh do
    send tok to q

while rec < |Neigh| do
    receive tok        // blocking call or not?
    rec += 1

decide


Echo program for non-initiators (Tel)

let parent = null
let rec = 0

receive tok from q     // non-deterministic choice!
parent = q
rec += 1

for q in Neigh \ {parent} do
    send tok to q

while rec < |Neigh| do // count all received tokens
    receive tok        // forward and return
    rec += 1

send tok to parent
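For the exercise above, and purely as a sketch (assuming a hypothetical blocking Send/Receive interface; the WCF-based lab API will look different), the two programs could be written in C# as:

    using System.Collections.Generic;

    // Hypothetical blocking channel API, for illustration only.
    interface IChannels
    {
        void Send(int to, string token);
        (int From, string Token) Receive();      // blocks until some message arrives
    }

    static class Echo
    {
        public static void RunInitiator(IChannels net, IReadOnlyCollection<int> neigh)
        {
            foreach (int q in neigh)             // broadcast the token to all neighbours
                net.Send(q, "tok");

            int rec = 0;
            while (rec < neigh.Count)            // wait for a token from every neighbour
            {
                net.Receive();
                rec += 1;
            }
            // decide: every neighbour has echoed back
        }

        public static void RunNonInitiator(IChannels net, IReadOnlyCollection<int> neigh)
        {
            var (parent, _) = net.Receive();     // first token received; its sender becomes
            int rec = 1;                         // the parent (non-deterministic choice)

            foreach (int q in neigh)             // forward the token to the other neighbours
                if (q != parent) net.Send(q, "tok");

            while (rec < neigh.Count)            // count all received tokens (forward + return)
            {
                net.Receive();
                rec += 1;
            }

            net.Send(parent, "tok");             // echo back to the parent
        }
    }

Each node runs its program as an ordinary sequential procedure; the distributed behaviour emerges only from the token exchanges between neighbours.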


Sync vs. Async

• Using the informal complexity measures appearing in the preceding slides (there are others), we conclude:

• Sync: time complexity = O(D), message complexity = O(|E|)

• The same Echo programs also run in the async mode and obtain valid results

• However, the runtime complexity measure changes (drastically)

• Async: time complexity = O(|V|), message complexity = O(|E|)

• Why?

• For the time complexity, we take the supremum over all possible normalised executions (delivery time in [0, 1])


Echo Algorithm - Async

• Animation on the same example graph, now with arbitrary message delays (normalised to [0, 1]); only the per-frame counters are reproduced:

  Time Units:  0   ε   2ε   3ε   4ε   1   2   3   4
  Messages:    0   3   4    6    7    7   8   9   10

• Final frame: Time Units on Broadcast = 1 = D, Time Units on Convergecast = 3 = |V| − 1, Total Time Units = 4, Messages = 10


Further algorithms

• Besides some basic fundamental algorithms, we intend to cover (or have a good look at) some of the most challenging distributed algorithms

• Distributed MST (Minimum Spanning Tree): Dijkstra prize 2004

• Byzantine agreement: "the crown jewel of distributed algorithms" – Dijkstra prize 2005, 2001

• The CAP theorem (Consistency, Availability, Partition tolerance)

• The relation between Byzantine algorithms and blockchains

• all these have practical significance and applications


Readings

• Textbooks (in library, also limited previews on Google books)

• Lynch, Nancy (1996). Distributed Algorithms. San Francisco, CA: Morgan Kaufmann Publishers. ISBN 978-1-55860-348-6

• Tel, Gerard (2000). Introduction to Distributed Algorithms, Second Edition. Cambridge University Press. ISBN 978-0-52179-483-1

• Research and overview articles (generally overlapping the textbooks) will be indicated for each topic


Readings (optional)

• Optionally, my articles (in collaboration) on bio-inspired models for distributed algorithms may offer additional views

• network discovery

• firing squad synchronisation (graphs and digraphs)

• DFS, BFS, edge and node disjoint paths

• Byzantine agreement

• dynamic programming, optimisations

• NP complete/hard problems (SAT, TSP)

• image processing (stereo-vision, skeletonisation, region growing)

• asynchronicity

• Several pre-print versions published in the CDMTCS research reports: https://www.cs.auckland.ac.nz/staff-cgi-bin/mjd/secondcgi.pl?date


Project and practical work

• Practical work is necessary to properly understand the concepts

• Unfortunately, we cannot afford to experiment with real distributed systems (sorry for this)

• We need to play the "Game of Distribution" and simulate or, better, emulate distributed systems on a single lab machine

• For uniformity and fairness in marking, we use .NET, specifically with C# (or F#)

• The final exam is focused on concepts and does not contain "technological" questions (related to .NET)


Prerequisites (cf. 335)

• Familiarity with C#, at least to the level presented in the C# Specifications, Chapter Introduction: http://www.microsoft.com/en-nz/download/details.aspx?id=7029

• Familiarity with the basic .NET API, or the ability to learn this on the fly, as needed

• Familiarity with the specific API that we will use to emulate distributed systems

• Familiarity with Visual Studio 2017 (as in the labs)

• Familiarity with searching and reading the MSDN library

• Familiarity with Linqpad http://www.linqpad.net/


How to emulate sync and async distributed systems?

• Over time, we have considered a variety of technologies

• .NET Remoting (similar to Java RMI)

• WCF = Windows Communication Foundation (a high-level layer over TCP, HTTP, etc.)

• WebAPI

• TPL = Task Parallel Library

• TPL Dataflow Library, which promotes actor/agent-oriented designs

• This year we will use... WCF!

• We will have one dedicated lecture on this
