unbounded barrier-synchronized concurrent asms for

28
Zhejiang University-University of Illinois at Urbana Champaign Institute Unbounded Barrier-Synchronized Concurrent ASMs for Effective MapReduce Processing on Streams Zilinghan Li 1 , Shilan He 1 , Yiqing Du 1 , Senรฉn Gonzรกlez 2 , Klaus-Dieter Schewe 1 1 Zhejiang University, UIUC Institute, Haining, China 2 P&T Connected, Linz, Austria {zilinghan.18|shilan.18|yiqing.18|kd.schewe}@intl.zju.edu.cn, [email protected]

Upload: others

Post on 10-Jul-2022

6 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Unbounded Barrier-Synchronized Concurrent ASMs for

Zhejiang University-University of Illinois at Urbana Champaign Institute

Unbounded Barrier-Synchronized

Concurrent ASMs for Effective MapReduce

Processing on Streams

Zilinghan Li1, Shilan He1, Yiqing Du1, Senรฉn Gonzรกlez2,

Klaus-Dieter Schewe1

1 Zhejiang University, UIUC Institute, Haining, China 2 P&T Connected, Linz, Austria

{zilinghan.18|shilan.18|yiqing.18|kd.schewe}@intl.zju.edu.cn,

[email protected]

Page 2: Unbounded Barrier-Synchronized Concurrent ASMs for

ZJUI

Contents

Introduction

Research Questions

Methodology

โ€ข Stream Query Classification and their MapReduce

Model

โ€ข Extended BSP Model (Inf-Ag-BSP Model)

Next Steps

Conclusions

Page 3: Unbounded Barrier-Synchronized Concurrent ASMs for

ZJUI

Introduction

MapReduce Algorithm [1]

โ€ข Processing large but finite data sets in an asynchronous and

distributed way.

โ€ข Comprising map phase, shuffle phase and reduce phase.

Bulk Synchronous Parallel (BSP) Bridging Model [2]

โ€ข Model for parallel computation.

โ€ข Finite and fixed number of agents.

โ€ข Comprising a sequence of supersteps, and each superstep is composed

of a computation phase and communication phase.

โ€ข The synchronization method is barrier-synchronization.

[1] Dean, J., Ghemawat, S.: MapReduce: Simplified data processing on large clusters. In: Proceedings of the 6th Conference

on Symposium on Opearting Systems Design & Implementation โ€“ Volume 6. pp. 10โ€“10

[2] Valiant, L.G.: A bridging model for parallel computation. Commun. ACM 33(8), 103โ€“111 (1990).

https://doi.org/10.1145/79173.79181

Page 4: Unbounded Barrier-Synchronized Concurrent ASMs for

ZJUI

Connection between BSP model and MapReduce algorithm:

MapReduce can be realized based on BSP model.

Introduction

Page 5: Unbounded Barrier-Synchronized Concurrent ASMs for

ZJUI

Introduction Stream query ๐’ฌ: ๐‘†๐‘ก๐‘Ÿ๐‘’๐‘Ž๐‘š โ†’ ๐‘†๐‘ก๐‘Ÿ๐‘’๐‘Ž๐‘š is abstractly compuatable if there exists

a kernel function ๐พ: ๐‘“๐‘–๐‘›๐‘†๐‘ก๐‘Ÿ๐‘’๐‘Ž๐‘š โ†’ ๐‘“๐‘–๐‘›๐‘†๐‘ก๐‘Ÿ๐‘’๐‘Ž๐‘š such that ๐’ฌ can be

obtained by concatenating (โŠ™)1 the results of kernel function ๐พ applied

to larger and larger prefixes of the input. [1]

Behavioral Theory: It consists of three parts

โ€ข An axiomatization of a class of algorithms

โ€ข A definition of an abstract machine model

โ€ข A proof that the abstract machine model captures the class of

algorithms stipulated by the axiomatization.

1. The concatenation (โŠ™) used here is not same as the common concatenation denoted by โˆ‘. It works more like aggregation and its

real functionality varies among different scenarios, but we still use the term concatenation to be consistent with the initial literature.

[1] Gurevich, Y., Leinders, D., Van den Bussche, J.: A theory of stream queries. In: Arenas, M., Schwartzbach, M.I. (eds.) DBPL 2007.

LNCS, vol. 4797, pp. 153โ€“168. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-75987-4_11

Page 6: Unbounded Barrier-Synchronized Concurrent ASMs for

ZJUI

Research Question

Primary Question: Can MapReduce algorithm, based on BSP model, be

used to handle stream, which is assumed to come indefinitely?

Following Question: After solving the primary question, we find that the

general BSP model is not sufficient for some stream queries, which brings

about another question on how to extend the general BSP model.

Page 7: Unbounded Barrier-Synchronized Concurrent ASMs for

ZJUI

Methodology: Stream Query Classification

When handling stream queries, queries may require some

information on previous stream.

According to the different requirements for previous data, we

classify stream queries into three different classes, and each class

has its own MapReduce model.

Three classes of stream queries:

โ€ข Memoryless

โ€ข Semi-Memoryless

โ€ข Memorable

Page 8: Unbounded Barrier-Synchronized Concurrent ASMs for

ZJUI

Memoryless Stream Queries

Property: Results of new coming stream are independent of previous

input streams, and the outputs of larger stream can be obtained by

direct concatenation of the outputs of previous streams.

Mathematically,

Example: Query ๐’ฌ1 returns an output stream containing all numbers

greater than a threshold value ๐‘ก๐‘ฅ. For this query, nothing about

previous stream is needed when handling the current input stream.

Page 9: Unbounded Barrier-Synchronized Concurrent ASMs for

ZJUI

Memoryless Stream Queries MapReduce Model: Since there are no dependency on previous input, we

can simply adapt the general MapReduce model based on BSP.

Page 10: Unbounded Barrier-Synchronized Concurrent ASMs for

ZJUI

Semi-Memoryless Stream Queries

Property: To obtain the result of new coming stream, in addition to the

outputs of previous streams, some aggregated information of previous

streams, denoted as ๐ˆ(๐’”๐’‘๐’“๐’†๐’—), is also required.

Mathematically, denote previous stream as ๐’”๐Ÿ, current stream as ๐’”๐Ÿ:

Example: Query ๐’ฌ2 returns the running average value of the numbers

arrived so far. For ๐’ฌ2, we not only need to know the average of previous

stream ๐‘Ž๐‘ฃ๐‘” ๐’”1 , we also need its length. Namely, ๐ˆ ๐’”1 = {๐‘™๐‘’๐‘›(๐’”1)}.

Page 11: Unbounded Barrier-Synchronized Concurrent ASMs for

ZJUI

Semi-Memoryless Stream Queries MapReduce Model: The processors in Map phase should not only give the

result of kernel/map function ๐พ(๐’”๐’Š), it should also give the information set

(or called tag) ๐ˆ ๐’”๐’Š .

Page 12: Unbounded Barrier-Synchronized Concurrent ASMs for

ZJUI

Memorable Stream Queries

Property: To obtain the result of new coming stream, in addition to the

outputs of previous streams, a โ€œlargeโ€ set of information, or even the

whole previous input stream ๐’” is required.

Memorable query is just a special semi-memoryless query with the

cardinality of information set ๐ˆ ๐’”๐’Š in the ๐›ฉ-class1 of the cardinality of the

stream ๐’”๐’Š.

Mathematically,

1. ฮ˜-Class is the intersection of O-Class and ฮฉ-Class which provides an asymptotically tight bound for functions. For example,

3๐‘ฅ is in the ๐‘‚(๐‘ฅ2) and ๐›ฉ(๐‘ฅ), but NOT in ๐›ฉ(๐‘ฅ2), since ๐‘ฅ2 is not the tight bound of 3๐‘ฅ.

Page 13: Unbounded Barrier-Synchronized Concurrent ASMs for

ZJUI

Memorable Stream Queries

Example: Query ๐’ฌ3 returns the median of the numbers arrived so far.

Given any sequence of numbers, every single number can be the

candidate of the median, which means that the whole stream ๐’” is

required to determine the median. Therefore, when denoting previous

stream as ๐’”๐Ÿ and current stream as ๐’”๐Ÿ, we have ๐ˆ ๐’”๐Ÿ = ๐’”๐Ÿ, and

๐ˆ ๐’”๐Ÿ = ๐’”๐Ÿ.

MapReduce Model: Since Memorable query is simply a special type of

semi-memoryless query, the MapReduce model is same.

Introduced Problems: When handling memorable stream query via

MapReduce, it requires a large amount of computing resources and a

large storage space. In this case, the BSP model, which has finite and

fixed number of agents, does not capture it well.

Page 14: Unbounded Barrier-Synchronized Concurrent ASMs for

ZJUI

Methodology: Extended BSP Model (Inf-Ag-BSP)

The extended BSP Model is named as Infinite-Agent-BSP (Inf-Ag-

BSP) Model.

Features of Inf-Ag-BSP Model:

โ€ข Inherit most features of general BSP Model;

โ€ข Additional feature 1: Total number of available agents can be

countably infinite1;

โ€ข Additional feature 2: The agents can join or leave the set of

active agents in the BSP computation.

1. In theory, we can assume that the number is countably infinite, provided we restrict the model such that only finitely many of

them will be simultaneously active.

Page 15: Unbounded Barrier-Synchronized Concurrent ASMs for

ZJUI

Inf-Ag-BSP Model

Handling additional features:

โ€ข (1) Modify the number of available agents to countably infinite

with other corresponding modifications: It is feasible since

BSP computation is restricted concurrent computation, which

allows infinite number of agents in total.

โ€ข (2.1) Introducing local variable ๐‘—๐‘œ๐‘–๐‘›๐‘— to the local signature of

each algorithm, and a joining set ๐’ฅ.

โ€ข (2.2) Agent should only join in the communication phase, and

leave in the computation phase.

Page 16: Unbounded Barrier-Synchronized Concurrent ASMs for

ZJUI

Inf-Ag-BSP: Signature Restriction Signature1 Restriction

1. A signature โˆ‘ is a finite set of function symbols, and each ๐‘“ โˆˆ โˆ‘ is associated with arity ๐‘Ž๐‘Ÿ(๐‘“) โˆˆ โ„•.

Page 17: Unbounded Barrier-Synchronized Concurrent ASMs for

ZJUI

Inf-Ag-BSP: Communication-and-Joining Postulation

Page 18: Unbounded Barrier-Synchronized Concurrent ASMs for

ZJUI

Inf-Ag-BSP Abstract State Machine: Rules Intuitive interpretation for rules:

โ€ข process rules: rules in computation phase

โ€ข barrier rules: rules in communication phase

โ€ข join rules: rules for inactive agents to join in the BSP computation

Rules for Inf-Ag-BSP Abstract State Machine:

โ€ข process rule: Assignment rule ๐‘“ ๐‘ก1, โ€ฆ , ๐‘ก๐‘› โ‰” ๐‘ก0 with ๐‘“ โˆˆ โˆ‘๐‘–,๐‘™๐‘œ๐‘\ {๐‘๐‘Ž๐‘Ÿ๐‘– , ๐‘—๐‘œ๐‘–๐‘›๐‘–} is

a process rule. Assignment rules ๐‘๐‘Ž๐‘Ÿ๐‘– โ‰” ๐ญ๐ซ๐ฎ๐ž and ๐‘—๐‘œ๐‘–๐‘›๐‘– โ‰” ๐Ÿ๐š๐ฅ๐ฌ๐ž are also

process rules.

โ€ข barrier rule: Rule ๐‘๐‘–,๐‘— ๐‘ก1, โ€ฆ , ๐‘ก๐‘› โ‰” ๐‘ก0 with ๐‘— โ‰  ๐‘– is a barrier rule. Rule

๐‘๐‘Ž๐‘Ÿ๐‘– โ‰” ๐Ÿ๐š๐ฅ๐ฌ๐ž is also a barrier rule.

โ€ข join rule : The only possible join rule is the parallel composition of ๐‘—๐‘œ๐‘–๐‘›๐‘– โ‰”๐ญ๐ซ๐ฎ๐ž and ๐‘๐‘Ž๐‘Ÿ๐‘– โ‰” ๐ญ๐ซ๐ฎ๐ž.

โ€ข The parallel composition (๐‘Ÿ1|๐‘Ÿ2 โ€ฆ ๐‘Ÿ๐‘š and conditional composition

(๐ˆ๐… ๐œ‘ ๐“๐‡๐„๐ ๐‘Ÿ) of process rule (or barrier rule respectively) is also process rule

(or barrier rule respectively).

Page 19: Unbounded Barrier-Synchronized Concurrent ASMs for

ZJUI

Inf-Ag-BSP Abstract State Machine

Definition: Inf-Ag-BSP Abstract State Machine

โ€ข Concurrent ASM {(๐‘Ž๐‘– , โ„ณ๐‘–)|๐‘– โˆˆ โ„•} โ€ข Rule for โ„ณ0: used to switch between computation and communication phase.

๐’ฅ is a finite set with ๐’ฅ = ๐‘– ๐‘ฃ๐‘Ž๐‘™ ๐‘—๐‘œ๐‘–๐‘›๐‘– = ๐ญ๐ซ๐ฎ๐ž}.

โ€ข Rule for โ„ณ๐‘– ๐‘– โ‰  0 : Inf-Ag-BSP rule defined as below. ๐‘Ÿ๐‘–,๐‘๐‘Ÿ๐‘œ๐‘ is process rule,

๐‘Ÿ๐‘–,๐‘๐‘œ๐‘š๐‘š is barrier rule and ๐‘Ÿ๐‘–,๐‘—๐‘œ๐‘–๐‘› is join rule.

Page 20: Unbounded Barrier-Synchronized Concurrent ASMs for

ZJUI

Inf-Ag-BSP: Characterization Theorem

Characterization theorem

โ€ข (1) Inf-Ag-BSP-ASM satisfies the signature restriction

and communication-and-joining postulate in the

axiomatization

โ€ข (2) Every Inf-Ag-BSP algorithm defined by the

axiomatization can be simulated by an Inf-Ag-BSP-ASM.

Page 21: Unbounded Barrier-Synchronized Concurrent ASMs for

ZJUI

MapReduce for Memoryless Query via

Inf-Ag-BSP Model Compared with general BSP model, the main difference is allowing join and

leave behavior.

General BSP Model Inf-Ag-BSP Model

Page 22: Unbounded Barrier-Synchronized Concurrent ASMs for

ZJUI

MapReduce for Semi-Memoryless

Query via Inf-Ag-BSP Model

General BSP Model Inf-Ag-BSP Model

Page 23: Unbounded Barrier-Synchronized Concurrent ASMs for

ZJUI

Next Steps

What we have now is a computation model, however, to really

achieve good performance in practice, we should develop a

scheduler to assign work to active processors efficiently and

effectively.

The extended model, Inf-Ag-BSP model is not only bounded to

implement MapReduce algorithm, its application on various fields can

be investigated further.

The API for extended BSP model should be developed as the

general BSP model.

Page 24: Unbounded Barrier-Synchronized Concurrent ASMs for

ZJUI

Conclusions MapReduce algorithm can be extended to handle stream, which is

assumed to come indefinitely, via BSP model.

For different classes of stream queries (memoryless, semi-

memoryless and memorable), there are different MapReduce models

to capture them.

The requirement for large computing resources and large storage

space incentives the extension of general BSP model:

โ€ข BSP model gets extended to allow countably infinite number of

agents in total, and each agent can join/leave the BSP

computation.

โ€ข A behavioral theory is developed on the basis of general BSP

model.

Page 25: Unbounded Barrier-Synchronized Concurrent ASMs for

ZJUI

Thanks for your attention!

Page 26: Unbounded Barrier-Synchronized Concurrent ASMs for

ZJUI

Appendix: Proof for Characterization Theorem Definition of signature of ASM is consistent with signature restriction, it is satisfied trivially.

Proof: Inf-Ag-BSP-ASM satisfies communication-and-joining postulate.

(1) The cardinality of joining set ๐’ฅ๐‘› is finite;

โ€ข Trivially satisfied since ๐’ฅ is required to be always finite in the definition of โ„ณ.

(2) If ๐‘– โˆ‰ ๐’ฅ๐‘›, then either ๐‘Ž๐‘– โˆ‰ ๐ด๐‘”๐‘› or (๐‘—๐‘œ๐‘–๐‘›๐‘– , ๐ญ๐ซ๐ฎ๐ž) โˆˆ ฮ”๐‘›;

โ€ข From Inf-Ag-BSP rule, if val(๐‘—๐‘œ๐‘–๐‘›๐‘–) = ๐Ÿ๐š๐ฅ๐ฌ๐ž, the first two if-conditions of rule cannot

be satisfied and third one must be satisfied. Then the machine either ๐‘ ๐‘˜๐‘–๐‘ (i.e. ๐‘– โˆ‰ ๐’ฅ๐‘›)

or sets both ๐‘—๐‘œ๐‘–๐‘›๐‘– and ๐‘๐‘Ž๐‘Ÿ๐‘– to ๐ญ๐ซ๐ฎ๐ž (thus (๐‘—๐‘œ๐‘–๐‘›๐‘– , ๐ญ๐ซ๐ฎ๐ž) โˆˆ ฮ”๐‘›).

(3) & (4) conditions are related to barrier synchronization which are already proved in

the characterization theorem of general BSP modelโ€™s behavioral theorem.

(5) If non-trivial update (๐‘—๐‘œ๐‘–๐‘›๐‘– , ๐ญ๐ซ๐ฎ๐ž) โˆˆ ฮ”๐‘›, then (๐‘๐‘Ž๐‘Ÿ๐‘– , ๐ญ๐ซ๐ฎ๐ž) โˆˆ ฮ”๐‘›;

โ€ข Non-trivial update (๐‘—๐‘œ๐‘–๐‘›๐‘– , ๐ญ๐ซ๐ฎ๐ž) indicates that the initial value of ๐‘—๐‘œ๐‘–๐‘›๐‘– is ๐Ÿ๐š๐ฅ๐ฌ๐ž, so it

corresponds to the join rule ๐‘Ÿ๐‘–,๐‘—๐‘œ๐‘–๐‘›. According to the definition of join rule, we must

also have ๐‘๐‘Ž๐‘Ÿ๐‘– โ‰” ๐Ÿ๐š๐ฅ๐ฌ๐ž โˆˆ ๐‘Ÿ๐‘–,๐‘—๐‘œ๐‘–๐‘›. Hence, (๐‘๐‘Ž๐‘Ÿ๐‘– , ๐ญ๐ซ๐ฎ๐ž) โˆˆ ฮ”๐‘›.

Page 27: Unbounded Barrier-Synchronized Concurrent ASMs for

ZJUI

Appendix: Proof for Characterization Theorem (contโ€™d)

Proof: Inf-Ag-BSP-ASM satisfies communication-and-joining postulate (contโ€™d) (6) If non-trivial update (๐‘—๐‘œ๐‘–๐‘›๐‘– , ๐Ÿ๐š๐ฅ๐ฌ๐ž) โˆˆ ฮ”๐‘›, then val๐‘†๐‘Ž๐‘– ๐‘›

(๐‘๐‘Ž๐‘Ÿ๐‘–) = ๐Ÿ๐š๐ฅ๐ฌ๐ž;

โ€ข The update (๐‘—๐‘œ๐‘–๐‘›๐‘– , ๐Ÿ๐š๐ฅ๐ฌ๐ž) โˆˆ ฮ”๐‘› must stem from a process rule, which can only be

executed if val ๐‘๐‘Ž๐‘Ÿ๐‘– = ๐Ÿ๐š๐ฅ๐ฌ๐ž.

(7) Whenever (๐‘๐‘Ž๐‘Ÿ๐‘– โˆง ยฌ๐‘๐‘Ž๐‘Ÿ๐‘Ÿ๐‘–๐‘’๐‘Ÿ) โˆจ (ยฌ๐‘๐‘Ž๐‘Ÿ๐‘– โˆง ๐‘๐‘Ž๐‘Ÿ๐‘Ÿ๐‘–๐‘’๐‘Ÿ) holds for any agent ๐‘– โˆˆ ๐’ฅ๐‘› in state

๐‘†๐‘›, then ฮ”๐‘Ž๐‘– ๐‘› ๐‘Ÿ๐‘’๐‘  ๐‘†๐‘›, ฮฃ๐‘– = โˆ…;

โ€ข (๐‘๐‘Ž๐‘Ÿ๐‘– โˆง ยฌ๐‘๐‘Ž๐‘Ÿ๐‘Ÿ๐‘–๐‘’๐‘Ÿ) โˆจ (ยฌ๐‘๐‘Ž๐‘Ÿ๐‘– โˆง ๐‘๐‘Ž๐‘Ÿ๐‘Ÿ๐‘–๐‘’๐‘Ÿ) for ๐‘– โˆˆ ๐’ฅ๐‘› is equivalent to (๐‘—๐‘œ๐‘–๐‘›๐‘– โˆง ๐‘๐‘Ž๐‘Ÿ๐‘– โˆงยฌ๐‘๐‘Ž๐‘Ÿ๐‘Ÿ๐‘–๐‘’๐‘Ÿ) โˆจ (๐‘—๐‘œ๐‘–๐‘›๐‘– โˆง ยฌ๐‘๐‘Ž๐‘Ÿ๐‘– โˆง ๐‘๐‘Ž๐‘Ÿ๐‘Ÿ๐‘–๐‘’๐‘Ÿ). In this case, none of the three if-conditions of

Inf-Ag-BSP rule are satisfied, so ASM does nothing and the update set is empty.

(8) The location ๐‘๐‘Ž๐‘Ÿ๐‘Ÿ๐‘–๐‘’๐‘Ÿ is lazy, it only changes value when all ๐‘๐‘Ž๐‘Ÿ๐‘– for ๐‘– โˆˆ ๐’ฅ๐‘› have

changed their values from ๐ญ๐ซ๐ฎ๐ž to ๐Ÿ๐š๐ฅ๐ฌ๐ž (or ๐Ÿ๐š๐ฅ๐ฌ๐ž to ๐ญ๐ซ๐ฎ๐ž respectively).

โ€ข According to the definition of SWITCH rule, ๐‘๐‘Ž๐‘Ÿ๐‘Ÿ๐‘–๐‘’๐‘Ÿ gets changed only if all active

agents have same values.

Page 28: Unbounded Barrier-Synchronized Concurrent ASMs for

ZJUI

Proof: Every Inf-Ag-BSP algorithm defined by the axiomatization can be simulated

by an Inf-Ag-BSP-ASM.

โ€ข For an Inf-Ag-BSP algorithm โ„ฌ = {(๐‘Ž๐‘– , ๐’œ๐‘–|๐‘– โˆˆ โ„•}, we assume to have a concurrent

ASM โ„ณ = {(๐‘Ž๐‘– , โ„ณ๐‘–|๐‘– โˆˆ โ„•} with the same signature and concurrent runs.

โ€ข For โ„ณ0, its rule ๐‘Ÿ0 which is used to take care of the barrier synchronization and

the joining behavior of Inf-Ag-BSP algorithm.

โ€ข For other concurrent machines โ„ณ๐‘–, their rules ๐‘Ÿ๐‘– have general form of:

IF ๐œ‘1 THEN ๐‘Ÿ๐‘–,1 โ€ฆ IF ๐œ‘๐‘š THEN ๐‘Ÿ๐‘–,๐‘š

โ€ข Exploit the proof in general BSP mode gives that the general form can be written

as the Inf-Ag-BSP rule with:

๐‘Ÿ๐‘–,๐‘๐‘Ÿ๐‘œ๐‘ = IF ๐œ‘1 THEN ๐‘Ÿ๐‘–,1 โ€ฆ IF ๐œ‘๐‘šโ€ฒ THEN ๐‘Ÿ๐‘–,๐‘šโ€ฒ

๐‘Ÿ๐‘–,๐‘๐‘œ๐‘š๐‘š = IF ๐œ‘๐‘šโ€ฒ+1 THEN ๐‘Ÿ๐‘–,๐‘šโ€ฒ+1 โ€ฆ IF ๐œ‘๐‘šโ€ฒโ€ฒ THEN ๐‘Ÿ๐‘–,๐‘šโ€ฒโ€ฒ

๐‘Ÿ๐‘–,๐‘—๐‘œ๐‘–๐‘› = IF ๐œ‘๐‘šโ€ฒโ€ฒ+1 THEN ๐‘Ÿ๐‘–,๐‘šโ€ฒโ€ฒ+1 โ€ฆ IF ๐œ‘๐‘š THEN ๐‘Ÿ๐‘–,๐‘š

Appendix: Proof for Characterization Theorem (contโ€™d)