Cascalog Workshop

Upload: nathanmarz

Post on 29-Jan-2018


Page 1: Cascalog workshop

Cascalog Workshop

Page 2: Cascalog workshop

Example query

Page 3: Cascalog workshop

Execution

1. Pre-aggregation

2. Aggregation

3. Post-aggregation

Page 4: Cascalog workshop

Variable dependencies

Page 5: Cascalog workshop

Pre-aggregation

• Start from generator variables

• Resolve as many variables as possible using:

• Joins

• Functions

• Use as many filters as possible

• Join all sources into one set of tuples

Page 6: Cascalog workshop

Aggregation

• Group by resolved output variables

• Apply all aggregators to each group

Page 7: Cascalog workshop

Post-aggregation

• Resolve the rest of the variables

• Apply rest of filters
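The three execution phases can be sketched in plain Python. This is only an illustration: the slide's example query is not in the transcript, so the `age` and `follows` datasets, the definition of the delta as `age2 - age1`, and both filters are assumptions made up for this sketch. It is not Cascalog code.

```python
from collections import defaultdict

# Hypothetical generators (assumed data, not from the slides).
age = [("alice", 28), ("bob", 33), ("carol", 25), ("dave", 28)]
follows = [("alice", "bob"), ("carol", "bob"), ("dave", "bob"), ("bob", "alice")]

def run_query():
    # 1. Pre-aggregation: join the generators, then apply functions and filters.
    age_of = dict(age)
    rows = []
    for person1, person2 in follows:
        if person1 in age_of and person2 in age_of:    # join on the person fields
            delta = age_of[person2] - age_of[person1]  # function resolves delta
            if delta > 0:                              # filter applied as early as possible
                rows.append((person1, person2, delta))
    # 2. Aggregation: group by the resolved output variable, count each group.
    counts = defaultdict(int)
    for _, _, delta in rows:
        counts[delta] += 1
    # 3. Post-aggregation: filters that depend on aggregator output.
    return sorted((d, c) for d, c in counts.items() if c >= 2)
```

The filter on `delta` runs before grouping because its input is already resolved there, while the filter on the count can only run after the aggregator has produced it.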

Page 8: Cascalog workshop

Example query

Page 9: Cascalog workshop

Query planner

Start with generators

Page 10: Cascalog workshop

Query planner

[?person2 ?age2 ?double-age2]

Add functions and filters until fixed point

Page 11: Cascalog workshop

Query planner

[?person2 ?age2 ?double-age2]

[?person1 ?person2 ?age2 ?double-age2]

Do a join

Page 12: Cascalog workshop

Query planner

[?person2 ?age2 ?double-age2]

[?person1 ?person2 ?age2 ?double-age2]

Add functions and filters until fixed point

Page 13: Cascalog workshop

Query planner

[?person2 ?age2 ?double-age2]

[?person1 ?person2 ?age2 ?double-age2]

[?person1 ?age1 ?person2 ?age2 ?double-age2]

Do a join

Page 14: Cascalog workshop

Query planner

[?person2 ?age2 ?double-age2]

[?person1 ?person2 ?age2 ?double-age2]

[?person1 ?age1 ?person2 ?age2 ?double-age2]

[?person1 ?age1 ?person2 ?age2 ?double-age2 ?delta]

Add functions and filters until fixed point

Page 15: Cascalog workshop

Query planner

[?person2 ?age2 ?double-age2]

[?person1 ?person2 ?age2 ?double-age2]

[?person1 ?age1 ?person2 ?age2 ?double-age2]

[?person1 ?age1 ?person2 ?age2 ?double-age2 ?delta]

Group by ?delta

Group by already satisfied output vars

Page 16: Cascalog workshop

Query planner

[?person2 ?age2 ?double-age2]

[?person1 ?person2 ?age2 ?double-age2]

[?person1 ?age1 ?person2 ?age2 ?double-age2]

[?person1 ?age1 ?person2 ?age2 ?double-age2 ?delta]

Group by ?delta

[?delta ?count]

Execute aggregators on each group

Page 17: Cascalog workshop

Query planner

[?person2 ?age2 ?double-age2]

[?person1 ?person2 ?age2 ?double-age2]

[?person1 ?age1 ?person2 ?age2 ?double-age2]

[?person1 ?age1 ?person2 ?age2 ?double-age2 ?delta]

Group by ?delta

[?delta ?count]

Add functions and filters until fixed point

Page 18: Cascalog workshop

Query planner

[?person2 ?age2 ?double-age2]

[?person1 ?person2 ?age2 ?double-age2]

[?person1 ?age1 ?person2 ?age2 ?double-age2]

[?person1 ?age1 ?person2 ?age2 ?double-age2 ?delta]

Group by ?delta

[?delta ?count]

Project fields to [?delta ?count]
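The "add functions and filters until fixed point" step can be sketched as a loop over variable dependencies: an operation is applied as soon as all of its input variables are resolved. The operation names and their input/output variables below are assumptions chosen to match the variable names on the slides; the real planner is more involved.

```python
def resolve(available, operations):
    """Apply every operation whose inputs are available, until nothing changes.

    operations: list of (name, input_vars, output_vars) triples.
    Returns (available_vars, applied_names, pending_operations).
    """
    available = set(available)
    pending = list(operations)
    applied = []
    changed = True
    while changed:
        changed = False
        for op in list(pending):          # iterate over a copy while removing
            name, inputs, outputs = op
            if set(inputs) <= available:
                available |= set(outputs)
                applied.append(name)
                pending.remove(op)
                changed = True
    return available, applied, pending

# Hypothetical operations for the walkthrough above.
OPS = [
    ("double", ["?age2"], ["?double-age2"]),
    ("delta", ["?age1", "?age2"], ["?delta"]),
]
```

Starting from `{"?person2", "?age2"}`, only `double` can run; `delta` stays pending until a join brings in `?age1`.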

Page 19: Cascalog workshop

Cascading pipes

• Each: can occur in Map or Reduce

• GroupBy: Causes a Reduce step

• Every: One or more follow GroupBy

• CoGroup: Join implementation, causes Reduce step

Page 20: Cascalog workshop

To Cascading

Page 21: Cascalog workshop

To Cascading

[?person2 ?age2 ?double-age2]

Each

Page 22: Cascalog workshop

To Cascading

[?person2 ?age2 ?double-age2]

[?person1 ?person2 ?age2 ?double-age2]

CoGroup

Page 23: Cascalog workshop

To Cascading

[?person2 ?age2 ?double-age2]

[?person1 ?person2 ?age2 ?double-age2]

[?person1 ?age1 ?person2 ?age2 ?double-age2]

CoGroup

Page 24: Cascalog workshop

To Cascading

[?person2 ?age2 ?double-age2]

[?person1 ?person2 ?age2 ?double-age2]

[?person1 ?age1 ?person2 ?age2 ?double-age2]

[?person1 ?age1 ?person2 ?age2 ?double-age2 ?delta]

Each

Each

Page 25: Cascalog workshop

To Cascading

[?person2 ?age2 ?double-age2]

[?person1 ?person2 ?age2 ?double-age2]

[?person1 ?age1 ?person2 ?age2 ?double-age2]

[?person1 ?age1 ?person2 ?age2 ?double-age2 ?delta]

Group by ?delta

GroupBy

Page 26: Cascalog workshop

To Cascading

[?person2 ?age2 ?double-age2]

[?person1 ?person2 ?age2 ?double-age2]

[?person1 ?age1 ?person2 ?age2 ?double-age2]

[?person1 ?age1 ?person2 ?age2 ?double-age2 ?delta]

Group by ?delta

[?delta ?count]

Execute aggregators on each group

Every

Page 27: Cascalog workshop

To Cascading

[?person2 ?age2 ?double-age2]

[?person1 ?person2 ?age2 ?double-age2]

[?person1 ?age1 ?person2 ?age2 ?double-age2]

[?person1 ?age1 ?person2 ?age2 ?double-age2 ?delta]

Group by ?delta

[?delta ?count]

Each

Page 28: Cascalog workshop

To Cascading

[?person2 ?age2 ?double-age2]

[?person1 ?person2 ?age2 ?double-age2]

[?person1 ?age1 ?person2 ?age2 ?double-age2]

[?person1 ?age1 ?person2 ?age2 ?double-age2 ?delta]

Group by ?delta

[?delta ?count]

Project fields to [?delta ?count]

Each

Page 29: Cascalog workshop

To MapReduce

[?person2 ?age2 ?double-age2]

[?person1 ?person2 ?age2 ?double-age2]

[?person1 ?age1 ?person2 ?age2 ?double-age2]

[?person1 ?age1 ?person2 ?age2 ?double-age2 ?delta]

Group by ?delta

[?delta ?count]

Project fields to [?delta ?count]

Job 1

Page 30: Cascalog workshop

To MapReduce

[?person2 ?age2 ?double-age2]

[?person1 ?person2 ?age2 ?double-age2]

[?person1 ?age1 ?person2 ?age2 ?double-age2]

[?person1 ?age1 ?person2 ?age2 ?double-age2 ?delta]

Group by ?delta

[?delta ?count]

Project fields to [?delta ?count]

Job 2

Page 31: Cascalog workshop

To MapReduce

[?person2 ?age2 ?double-age2]

[?person1 ?person2 ?age2 ?double-age2]

[?person1 ?age1 ?person2 ?age2 ?double-age2]

[?person1 ?age1 ?person2 ?age2 ?double-age2 ?delta]

Group by ?delta

[?delta ?count]

Project fields to [?delta ?count]

Job 3
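A rough way to see why this pipe assembly needs three jobs: each GroupBy or CoGroup causes a Reduce step, and one MapReduce job has only one Reduce step. The sketch below is a simplified model built on that rule alone; it is not how Cascading's planner is implemented, and the function name is hypothetical.

```python
REDUCE_PIPES = {"GroupBy", "CoGroup"}

def split_into_jobs(pipes):
    """Cut a linear pipe sequence into jobs with one reduce-causing pipe each."""
    jobs = []
    current = []
    seen_reduce = False
    for pipe in pipes:
        if pipe in REDUCE_PIPES and seen_reduce:
            # A second reduce step cannot fit in the current job: start a new one.
            jobs.append(current)
            current = []
            seen_reduce = False
        if pipe in REDUCE_PIPES:
            seen_reduce = True
        current.append(pipe)
    if current:
        jobs.append(current)
    return jobs

# Pipe sequence read off the "To Cascading" slides.
PIPES = ["Each", "CoGroup", "CoGroup", "Each", "Each",
         "GroupBy", "Every", "Each", "Each"]
```

In this model an Each that follows a reduce-causing pipe runs in that job's Reduce phase, which matches "Each: can occur in Map or Reduce".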

Page 32: Cascalog workshop

defmapop

[A1, B1, C1]

[A2, B2, C2]

[A3, B3, C3]

[A1, B1, C1, D1, E1]

[A2, B2, C2, D2, E2]

[A3, B3, C3, D3, E3]

Appends fields to tuple
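The behaviour of a map operation can be mimicked in plain Python: the function sees the input fields and its results are appended to the tuple. `map_op` and `sum_and_product` are hypothetical names for this sketch, not part of Cascalog.

```python
def map_op(tuples, fn):
    """Append the fields returned by fn to each input tuple."""
    return [list(t) + list(fn(*t)) for t in tuples]

def sum_and_product(a, b, c):
    # Hypothetical operation producing two new fields (D and E on the slide).
    return (a + b + c, a * b * c)
```

Every input tuple yields exactly one output tuple, so the tuple count never changes.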

Page 33: Cascalog workshop

deffilterop

[A1, B1, C1]

[A2, B2, C2]

[A3, B3, C3]

true

false

true

[A1, B1, C1]

[A3, B3, C3]
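A filter operation keeps a tuple only when the predicate returns true, and never changes its fields. The sketch below uses hypothetical names (`filter_op`, `sum_is_even`) and is plain Python rather than Cascalog.

```python
def filter_op(tuples, pred):
    """Keep only the tuples for which pred returns a truthy value."""
    return [t for t in tuples if pred(*t)]

def sum_is_even(a, b, c):
    # Hypothetical predicate: true, false, true for the sample data below.
    return (a + b + c) % 2 == 0

SAMPLE = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
```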

Page 34: Cascalog workshop

defmapcatop

[“a red dog”]

[“ ”]

[“hello”]

Map Concat

[“a red dog”, “a”]

[“a red dog”, “red”]

[“a red dog”, “dog”]

[“hello”, “hello”]
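A map-concat operation returns a sequence of results per input tuple and emits one output tuple per result, so an input can produce zero, one, or many outputs. This is a plain-Python sketch with hypothetical names (`mapcat_op`, `split_words`), assuming the slide's operation splits a sentence into words.

```python
def mapcat_op(tuples, fn):
    """Emit one output tuple per value returned by fn for each input tuple."""
    out = []
    for t in tuples:
        for result in fn(*t):
            out.append(list(t) + [result])
    return out

def split_words(sentence):
    # Splitting a blank string yields no words, hence no output tuples.
    return sentence.split()
```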

Page 35: Cascalog workshop

Aggregators

[“key1”, 1]

[“key1”, 2]

[“key2”, 3]

[“key3”, 3]

[“key3”, 1]

Map Task 2

Map Task 1

[“key1”, 1]

[“key1”, 2]

[“key2”, 3]

[“key3”, 3]

[“key3”, 1]

Reduce Task 2

Reduce Task 1

Regular aggregators - all data goes to reducers

[“key1”, 3]

[“key2”, 3]

[“key3”, 4]
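With a regular aggregator every tuple crosses the network to a reducer before any summing happens. A minimal simulation follows; the split of tuples between the two map tasks and the byte-sum partitioner are assumptions for illustration only.

```python
from collections import defaultdict

# Assumed split of the five tuples across two map tasks.
MAP_OUTPUTS = [
    [("key1", 1), ("key2", 3), ("key3", 1)],
    [("key1", 2), ("key3", 3)],
]

def shuffle(map_outputs, num_reducers):
    """Send every tuple, unaggregated, to the reducer that owns its key."""
    reducers = [defaultdict(list) for _ in range(num_reducers)]
    for task in map_outputs:
        for key, value in task:
            idx = sum(key.encode("utf-8")) % num_reducers  # toy partitioner
            reducers[idx][key].append(value)
    return reducers

def reduce_sum(reducers):
    """Sum the values of each key group on the reduce side."""
    result = {}
    for groups in reducers:
        for key, values in groups.items():
            result[key] = sum(values)
    return result
```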

Page 36: Cascalog workshop

defparallelagg

Parallel aggregators - partial aggregation done in mappers

Input

Map Task 1: [“nathan”] [“alice”] [“nathan”]

Map Task 2: [“nathan”] [“sally”]

Init

Map Task 1: [“nathan”, 1] [“alice”, 1] [“nathan”, 1]

Map Task 2: [“nathan”, 1] [“sally”, 1]

Combine (Map)

Map Task 1: [“nathan”, 2] [“alice”, 1]

Map Task 2: [“nathan”, 1] [“sally”, 1]

Combine (Reduce)

Reduce Task 1: [“nathan”, 3]

Reduce Task 2: [“sally”, 1] [“alice”, 1]
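The init/combine stages of a parallel aggregator can be simulated in plain Python for a count. The function names here are hypothetical and this is not the `defparallelagg` API; the sketch only shows that the same combine function runs first inside each map task and then again on the reduce side.

```python
def init(_name):
    # Every input tuple starts as a partial count of 1.
    return 1

def combine(a, b):
    # Must be associative and commutative so it can run in any order.
    return a + b

def map_side(task_tuples):
    """Init each tuple, then combine partial values within one map task."""
    partial = {}
    for (name,) in task_tuples:
        value = init(name)
        partial[name] = combine(partial[name], value) if name in partial else value
    return partial

def reduce_side(partials):
    """Combine the partial values emitted by all map tasks."""
    total = {}
    for partial in partials:
        for name, value in partial.items():
            total[name] = combine(total[name], value) if name in total else value
    return total

TASK_1 = [("nathan",), ("alice",), ("nathan",)]
TASK_2 = [("nathan",), ("sally",)]
```

Only four partial tuples leave the mappers here instead of five raw ones; on skewed real data the saving is much larger.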

Page 37: Cascalog workshop

combine

[1]

[2]

[3]

[3]

[4]

[5]

[1]

[2]

[3]

[4]

[5]

[3]

Page 38: Cascalog workshop

union

[1]

[2]

[3]

[3]

[4]

[5]

[1]

[2]

[3]

[4]

[5]
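The difference between the two operations, as the slides show it, is that combine keeps duplicate tuples while union removes them. A plain-Python sketch of that behaviour (not the Cascalog implementation):

```python
def combine(*sources):
    """Concatenate the sources, keeping duplicate tuples."""
    out = []
    for source in sources:
        out.extend(source)
    return out

def union(*sources):
    """Concatenate the sources, keeping each distinct tuple once."""
    seen = set()
    out = []
    for t in combine(*sources):
        if t not in seen:
            seen.add(t)
            out.append(t)
    return out

A = [(1,), (2,), (3,)]
B = [(3,), (4,), (5,)]
```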

Page 39: Cascalog workshop

ElephantDB

Generation of domain of data

Key/Value pairs

Pre-shard and index in MapReduce

Shard 0

Shard 1

Shard 2

Shard 3

Shard 4

Shard 5

Distributed Filesystem

Page 40: Cascalog workshop

ElephantDB

Shard 0

Shard 1

Shard 2

Shard 3

Shard 4

Shard 5

ElephantDB Server

ElephantDB Server

ElephantDB Server

Serving domain of data

DFS
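The two ElephantDB slides describe pre-sharding key/value pairs and then serving the shards from a set of servers. A toy sketch of that idea follows; the CRC32-modulo sharding function and the round-robin shard assignment are assumptions for illustration, not ElephantDB's actual scheme, and all names are hypothetical.

```python
import zlib

def shard_for(key, num_shards):
    """Pick a shard deterministically from the key (assumed scheme)."""
    return zlib.crc32(key.encode("utf-8")) % num_shards

def build_shards(pairs, num_shards):
    """Generation step: write each key/value pair into its shard."""
    shards = [dict() for _ in range(num_shards)]
    for key, value in pairs:
        shards[shard_for(key, num_shards)][key] = value
    return shards

def assign_shards(num_shards, servers):
    """Serving step: spread shards over servers round-robin (assumed)."""
    return {
        server: [s for s in range(num_shards) if s % len(servers) == i]
        for i, server in enumerate(servers)
    }

def lookup(key, shards):
    """Read a key by going straight to the shard that must hold it."""
    return shards[shard_for(key, len(shards))].get(key)
```

Because the shard is a pure function of the key, a reader can route a lookup to the right shard without consulting an index of all keys.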