Cascalog Workshop

Post on 29-Jan-2018

Example query

Execution

1. Pre-aggregation

2. Aggregation

3. Post-aggregation

Variable dependencies

Pre-aggregation

• Start from generator variables

• Resolve as many variables as possible using:

• Joins

• Functions

• Use as many filters as possible

• Join all sources into one set of tuples

Aggregation

• Group by resolved output variables

• Apply all aggregators to each group

Post-aggregation

• Resolve the rest of the variables

• Apply rest of filters
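The three phases can be modelled on toy data in plain Python. This is only a sketch of the semantics, not Cascalog or its API. The `age` and `follows` datasets, the age-difference function, and the `delta > 0` filter are hypothetical, chosen to echo the variable names in the planner walkthrough.

```python
from collections import Counter

# Hypothetical generators (not taken from the slides).
age = [("alice", 28), ("bob", 33), ("carol", 40), ("dave", 33)]
follows = [("alice", "bob"), ("alice", "dave"),
           ("bob", "carol"), ("carol", "alice")]

def run_query(age, follows):
    age_of = dict(age)
    # 1. Pre-aggregation: join all sources into one set of tuples,
    #    applying functions and filters as soon as their inputs exist.
    rows = []
    for person1, person2 in follows:
        if person1 not in age_of or person2 not in age_of:
            continue                      # inner join drops unmatched rows
        age1, age2 = age_of[person1], age_of[person2]
        delta = age2 - age1               # function
        if delta > 0:                     # filter
            rows.append((person1, age1, person2, age2, delta))
    # 2. Aggregation: group by the resolved output variable, count each group.
    counts = Counter(row[4] for row in rows)
    # 3. Post-aggregation: no filters remain here, so only project the output.
    return sorted(counts.items())

print(run_query(age, follows))  # [(5, 2), (7, 1)]
```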

Query planner

1. Start with generators
2. Add functions and filters until fixed point: [?person2 ?age2 ?double-age2]
3. Do a join: [?person1 ?person2 ?age2 ?double-age2]
4. Add functions and filters until fixed point (no new fields)
5. Do a join: [?person1 ?age1 ?person2 ?age2 ?double-age2]
6. Add functions and filters until fixed point: [?person1 ?age1 ?person2 ?age2 ?double-age2 ?delta]
7. Group by already satisfied output vars: Group by ?delta
8. Execute aggregators on each group: [?delta ?count]
9. Add functions and filters until fixed point (no new fields)
10. Project fields to [?delta ?count]
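The pre-aggregation part of this walkthrough can be sketched as a loop over sets of variables. This is a simplified model, not Cascalog's planner: the generators and functions below are hypothetical, filters are left out, and the join order is chosen naively (first pair of branches that share a variable), so intermediate steps may differ from the slides.

```python
def add_functions(tail, functions):
    """Apply every function whose inputs are available until nothing changes."""
    tail = set(tail)
    changed = True
    while changed:
        changed = False
        for inputs, outputs in functions:
            if inputs <= tail and not outputs <= tail:
                tail |= outputs
                changed = True
    return tail

def plan_pre_aggregation(generators, functions):
    """Return the variables available once all branches are joined."""
    tails = [add_functions(g, functions) for g in generators]
    while len(tails) > 1:
        # Pick the first pair of branches that share at least one variable.
        pair = next(((i, j) for i in range(len(tails))
                     for j in range(i + 1, len(tails))
                     if tails[i] & tails[j]), None)
        if pair is None:
            raise ValueError("no join possible between remaining branches")
        i, j = pair
        joined = add_functions(tails[i] | tails[j], functions)
        tails = [t for k, t in enumerate(tails) if k not in (i, j)] + [joined]
    return tails[0]

# Hypothetical query shape using the variable names from the slides.
generators = [{"?person1", "?age1"},
              {"?person1", "?person2"},
              {"?person2", "?age2"}]
functions = [({"?age2"}, {"?double-age2"}),
             ({"?age1", "?age2"}, {"?delta"})]
print(sorted(plan_pre_aggregation(generators, functions)))
```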

Cascading pipes

• Each: can occur in a Map or a Reduce step

• GroupBy: causes a Reduce step

• Every: one or more follow a GroupBy

• CoGroup: join implementation; causes a Reduce step

To Cascading

Each planner step maps to a Cascading pipe:

• [?person2 ?age2 ?double-age2]: Each
• [?person1 ?person2 ?age2 ?double-age2]: CoGroup
• [?person1 ?age1 ?person2 ?age2 ?double-age2]: CoGroup
• [?person1 ?age1 ?person2 ?age2 ?double-age2 ?delta]: Each, Each
• Group by ?delta: GroupBy
• Execute aggregators on each group, giving [?delta ?count]: Every
• Functions and filters after aggregation: Each
• Project fields to [?delta ?count]: Each

To MapReduce

The same plan runs as three MapReduce jobs (Job 1, Job 2, Job 3), consistent with its three Reduce-causing pipes: two CoGroups and one GroupBy.
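The job count can be estimated from the pipe list. The sketch below assumes one MapReduce job per Reduce-causing pipe (GroupBy or CoGroup) and at least one job overall. That is a simplification for illustration, not how Cascading's planner is implemented, and `count_jobs` is a hypothetical helper.

```python
REDUCE_PIPES = {"GroupBy", "CoGroup"}

def count_jobs(pipes):
    """Estimate MapReduce jobs: one per Reduce-causing pipe, minimum one."""
    reduce_steps = sum(1 for pipe in pipes if pipe in REDUCE_PIPES)
    return max(1, reduce_steps)

# Pipe sequence from the "To Cascading" walkthrough.
assembly = ["Each", "CoGroup", "CoGroup", "Each", "Each",
            "GroupBy", "Every", "Each", "Each"]
print(count_jobs(assembly))  # 3
```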

defmapop

[A1, B1, C1] -> [A1, B1, C1, D1, E1]

[A2, B2, C2] -> [A2, B2, C2, D2, E2]

[A3, B3, C3] -> [A3, B3, C3, D3, E3]

Appends fields to each tuple
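A map operation can be modelled in plain Python as a function whose results are appended to each input tuple. This is a sketch of the behaviour only, not the defmapop macro itself; `map_op` and `sum_and_product` are hypothetical names.

```python
def map_op(fn, tuples):
    """Append the fields returned by fn to every input tuple."""
    return [t + tuple(fn(*t)) for t in tuples]

def sum_and_product(a, b, c):
    """Hypothetical operation that produces two new fields."""
    return (a + b + c, a * b * c)

print(map_op(sum_and_product, [(1, 2, 3), (4, 5, 6)]))
# [(1, 2, 3, 6, 6), (4, 5, 6, 15, 120)]
```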

deffilterop

[A1, B1, C1] -> true -> kept

[A2, B2, C2] -> false -> dropped

[A3, B3, C3] -> true -> kept

Output: [A1, B1, C1], [A3, B3, C3]
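A filter operation keeps only the tuples for which a predicate returns true and leaves the fields unchanged. The sketch below models that behaviour in plain Python; `filter_op` and the predicate are hypothetical, and this is not the deffilterop macro.

```python
def filter_op(pred, tuples):
    """Keep the tuples for which pred returns a truthy value."""
    return [t for t in tuples if pred(*t)]

def first_is_odd(a, b, c):
    """Hypothetical predicate over a three-field tuple."""
    return a % 2 == 1

print(filter_op(first_is_odd, [(1, 10, 100), (2, 20, 200), (3, 30, 300)]))
# [(1, 10, 100), (3, 30, 300)]
```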

defmapcatop

Map, then concat: each input tuple can produce zero or more output tuples.

[“a red dog”] -> [“a red dog”, “a”], [“a red dog”, “red”], [“a red dog”, “dog”]

[“ ”] -> (no output tuples)

[“hello”] -> [“hello”, “hello”]
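The map-then-concat behaviour can be sketched in plain Python: the operation returns a sequence of values per input tuple, and each value becomes its own output tuple. `mapcat_op` and `split` are hypothetical names; this models the behaviour, not the defmapcatop macro.

```python
def mapcat_op(fn, tuples):
    """Emit one output tuple per value that fn returns for an input tuple."""
    return [t + (value,) for t in tuples for value in fn(*t)]

def split(sentence):
    """Split on whitespace; a blank sentence yields no words."""
    return sentence.split()

print(mapcat_op(split, [("a red dog",), (" ",), ("hello",)]))
# [('a red dog', 'a'), ('a red dog', 'red'), ('a red dog', 'dog'), ('hello', 'hello')]
```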

Aggregators

Regular aggregators - all data goes to reducers

Input, spread across Map Task 1 and Map Task 2: [“key1”, 1], [“key1”, 2], [“key2”, 3], [“key3”, 3], [“key3”, 1]

All five tuples are passed on unchanged to Reduce Task 1 and Reduce Task 2.

Output: [“key1”, 3], [“key2”, 3], [“key3”, 4]
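A regular aggregator can be modelled as "collect every value for a key, then aggregate". The sketch below uses `sum` as the aggregator, which matches the totals shown. The `aggregate` helper is hypothetical and ignores how tuples are distributed across tasks.

```python
from collections import defaultdict

def aggregate(tuples, agg):
    """Group values by key (the shuffle), then aggregate each group."""
    groups = defaultdict(list)
    for key, value in tuples:      # every tuple reaches a reducer unchanged
        groups[key].append(value)
    return sorted((key, agg(values)) for key, values in groups.items())

data = [("key1", 1), ("key1", 2), ("key2", 3), ("key3", 3), ("key3", 1)]
print(aggregate(data, sum))  # [('key1', 3), ('key2', 3), ('key3', 4)]
```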

defparallelagg

Parallel aggregators - partial aggregation done in mappers

Input
Map Task 1: [“nathan”], [“alice”], [“nathan”]
Map Task 2: [“nathan”], [“sally”]

Init
Map Task 1: [“nathan”, 1], [“alice”, 1], [“nathan”, 1]
Map Task 2: [“nathan”, 1], [“sally”, 1]

Combine (Map)
Map Task 1: [“nathan”, 2], [“alice”, 1]
Map Task 2: [“nathan”, 1], [“sally”, 1]

Combine (Reduce)
Reduce Task 1: [“nathan”, 3]
Reduce Task 2: [“sally”, 1], [“alice”, 1]
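The init and combine stages can be sketched as two small functions applied first inside each map task and then again across tasks. This is a plain-Python model of a count, not the defparallelagg macro. The helper names are hypothetical, and the assignment of tuples to map tasks follows the Init stage above.

```python
def init_count(name):
    """Init: turn one input tuple into a partial result."""
    return 1

def combine_counts(a, b):
    """Combine: merge two partial results (must be associative)."""
    return a + b

def partial_aggregate(tuples):
    """Map side: init every tuple, then combine per key within the task."""
    acc = {}
    for (key,) in tuples:
        value = init_count(key)
        acc[key] = combine_counts(acc[key], value) if key in acc else value
    return acc

def merge(partials):
    """Reduce side: combine the partial results from all map tasks."""
    total = {}
    for partial in partials:
        for key, value in partial.items():
            total[key] = combine_counts(total[key], value) if key in total else value
    return total

map_task_1 = [("nathan",), ("alice",), ("nathan",)]
map_task_2 = [("nathan",), ("sally",)]
partials = [partial_aggregate(map_task_1), partial_aggregate(map_task_2)]
print(partials)         # [{'nathan': 2, 'alice': 1}, {'nathan': 1, 'sally': 1}]
print(merge(partials))  # {'nathan': 3, 'alice': 1, 'sally': 1}
```

Because part of the combining happens in the map tasks, fewer tuples need to be sent to the reducers than with a regular aggregator.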

combine

Inputs: [1], [2], [3] and [3], [4], [5]

Output: [1], [2], [3], [3], [4], [5] (duplicates kept)

union

Inputs: [1], [2], [3] and [3], [4], [5]

Output: [1], [2], [3], [4], [5] (duplicates removed)
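The difference between the two operations can be shown in a few lines of plain Python. This models the behaviour in the examples above, with duplicates kept versus removed. The function bodies are a sketch, not Cascalog's implementation.

```python
def combine(*sources):
    """Concatenate all sources, keeping duplicate tuples."""
    return [t for source in sources for t in source]

def union(*sources):
    """Concatenate all sources, keeping only the first copy of each tuple."""
    seen, out = set(), []
    for t in combine(*sources):
        if t not in seen:
            seen.add(t)
            out.append(t)
    return out

a = [(1,), (2,), (3,)]
b = [(3,), (4,), (5,)]
print(combine(a, b))  # [(1,), (2,), (3,), (3,), (4,), (5,)]
print(union(a, b))    # [(1,), (2,), (3,), (4,), (5,)]
```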

ElephantDB

Generation of domain of data

Key/Value pairs -> Pre-shard and index in MapReduce -> Shard 0, Shard 1, Shard 2, Shard 3, Shard 4, Shard 5 -> Distributed Filesystem

Serving domain of data

Shard 0 through Shard 5 on the DFS -> served by three ElephantDB Servers
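The idea of pre-sharding key/value pairs and then serving lookups from the right shard can be sketched as follows. The hash-mod-shard-count scheme, the use of MD5, and every function name here are assumptions for illustration. The slides do not say how ElephantDB assigns keys to shards, and this is not its API.

```python
import hashlib

NUM_SHARDS = 6  # Shard 0 through Shard 5, as on the slides

def shard_for(key, num_shards=NUM_SHARDS):
    """Assumed scheme: stable hash of the key, modulo the shard count."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_shards

def build_shards(pairs, num_shards=NUM_SHARDS):
    """Generation: place each key/value pair in its shard."""
    shards = [dict() for _ in range(num_shards)]
    for key, value in pairs:
        shards[shard_for(key, num_shards)][key] = value
    return shards

def lookup(shards, key):
    """Serving: route the read to the one shard that can hold the key."""
    return shards[shard_for(key, len(shards))].get(key)

shards = build_shards([("nathan", 3), ("sally", 1), ("alice", 1)])
print(lookup(shards, "nathan"))  # 3
```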
