8 advanced database systems - cmu 15-721 · 2020. 6. 11. · partitioning phase approach #1:...
TRANSCRIPT
Lecture #18
Parallel Join Algorithms (Sorting)
@Andy_Pavlo // 15-721 // Spring 2020
ADVANCED DATABASE SYSTEMS
15-721 (Spring 2020)
PROJECT #2
This Week
→ Status Meetings
Wednesday April 8th
→ Code Review Submission
→ Update Presentation
→ Design Document
PARALLEL JOIN ALGORITHMS
Perform a join between two relations on multiple threads simultaneously to speed up the operation.
Two main approaches:
→ Hash Join
→ Sort-Merge Join
Background
Sorting Algorithms
Parallel Sort-Merge Join
Evaluation
SORT-MERGE JOIN (R⨝S)
Phase #1: Sort
→ Sort the tuples of R and S based on the join key.
Phase #2: Merge
→ Scan the sorted relations and compare tuples.
→ The outer relation R only needs to be scanned once.
[Figure: Relation R and Relation S are each sorted (SORT!), and the two sorted relations are then merged (MERGE!) to compute R⨝S.]
PARALLEL SORT-MERGE JOINS
Sorting is the most expensive part.
Use hardware correctly to speed up the join algorithm as much as possible.
→ Utilize as many CPU cores as possible.
→ Be mindful of NUMA boundaries.
→ Use SIMD instructions where applicable.
MULTI-CORE, MAIN-MEMORY JOINS: SORT VS. HASH REVISITED
VLDB 2013
PARALLEL SORT-MERGE JOIN (R⨝S)
Phase #1: Partitioning (optional)
→ Partition R and assign the partitions to workers / cores.
Phase #2: Sort
→ Sort the tuples of R and S based on the join key.
Phase #3: Merge
→ Scan the sorted relations and compare tuples.
→ The outer relation R only needs to be scanned once.
PARTITIONING PHASE
Approach #1: Implicit Partitioning
→ The data was partitioned on the join key when it was loaded into the database.
→ No extra pass over the data is needed.
Approach #2: Explicit Partitioning
→ Divide only the outer relation and redistribute among the different CPU cores.
→ Can use the same radix partitioning approach we talked about last time.
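Explicit partitioning can be sketched as a single radix pass: build a histogram over some bits of the join key, turn it into write offsets with a prefix sum, and scatter the tuples into contiguous partitions. This is a minimal single-threaded sketch. The function name `radix_partition`, the `(key, payload)` tuple layout, and the use of the low `bits` bits of the key are assumptions made for illustration, not details taken from the lecture.

```python
def radix_partition(tuples, bits=2):
    """Split (key, payload) tuples into 2**bits partitions by the low key bits."""
    fanout = 1 << bits
    mask = fanout - 1

    # Pass 1: histogram of how many tuples fall into each partition.
    histogram = [0] * fanout
    for key, _ in tuples:
        histogram[key & mask] += 1

    # Exclusive prefix sum: starting offset of each partition in the output.
    offsets = [0] * fanout
    for p in range(1, fanout):
        offsets[p] = offsets[p - 1] + histogram[p - 1]

    # Pass 2: scatter every tuple to the next free slot of its partition.
    output = [None] * len(tuples)
    cursor = list(offsets)
    for key, payload in tuples:
        p = key & mask
        output[cursor[p]] = (key, payload)
        cursor[p] += 1

    # One contiguous slice per partition (hypothetically one per worker / core).
    return [output[offsets[p]:offsets[p] + histogram[p]] for p in range(fanout)]


outer = [(9, "a"), (4, "b"), (7, "c"), (12, "d"), (1, "e"), (6, "f")]
partitions = radix_partition(outer, bits=2)
```

Because the histogram fixes every partition's size before any tuple moves, each partition is written sequentially into its own region of one output array.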
SORT PHASE
Create runs of sorted chunks of tuples for both input relations.
It used to be that Quicksort was good enough and it usually still is.
We can explore other methods that try to take advantage of NUMA and parallel architectures …
CACHE-CONSCIOUS SORTING
Level #1: In-Register Sorting
→ Sort runs that fit into CPU registers.
Level #2: In-Cache Sorting
→ Merge Level #1 output into runs that fit into CPU caches.
→ Repeat until sorted runs are ½ cache size.
Level #3: Out-of-Cache Sorting
→ Used when the runs of Level #2 exceed the size of caches.
SORT VS. HASH REVISITED: FAST JOIN IMPLEMENTATION ON MODERN MULTI-CORE CPUS
VLDB 2009
[Figure: the three levels (Level #1, Level #2, Level #3) take the data from UNSORTED to SORTED.]
LEVEL #1 SORTING NETWORKS
Abstract model for sorting keys.
→ Fixed wiring “paths” for lists with the same # of elements.
→ Efficient to execute on modern CPUs because of limited data dependencies and no branches.
[Figure: a four-wire sorting network. Input: 9, 5, 3, 6. The comparators move the keys through intermediate orderings until the Output reads 3, 5, 6, 9.]
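wires = [9, 5, 3, 6]
# Step 1: compare-exchange wire pairs (0,1) and (2,3).
# Each pair is assigned at once so the max sees the original values.
wires[0], wires[1] = min(wires[0], wires[1]), max(wires[0], wires[1])
wires[2], wires[3] = min(wires[2], wires[3]), max(wires[2], wires[3])
# Step 2: compare-exchange wire pairs (0,2) and (1,3).
wires[0], wires[2] = min(wires[0], wires[2]), max(wires[0], wires[2])
wires[1], wires[3] = min(wires[1], wires[3]), max(wires[1], wires[3])
# Step 3: compare-exchange wire pair (1,2).
wires[1], wires[2] = min(wires[1], wires[2]), max(wires[1], wires[2])
# wires is now [3, 5, 6, 9]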
<64-bit Join Key, 64-bit Tuple Pointer>
Input:
12 21 4 13
9 8 6 7
1 14 3 0
5 11 15 10
Instructions:
→ 4 LOAD
Sort Across Registers:
1 8 3 0
5 11 4 7
9 14 6 10
12 21 15 13
Instructions:
→ 10 MIN/MAX
Transpose Registers:
1 5 9 12
8 11 14 21
3 4 6 15
0 7 10 13
Instructions:
→ 8 SHUFFLE
→ 4 STORE
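The register-level steps above can be simulated in plain Python, with each list standing in for one register and each list position for one lane. This is only a sketch of the data movement under that assumption: the helper names (`compare_exchange`, `sort_across_registers`, `transpose`) are hypothetical, and real code would use SIMD min/max and shuffle instructions instead of list comprehensions.

```python
def compare_exchange(lo, hi):
    """Lane-wise min and max of two 'registers' (one MIN plus one MAX)."""
    mins = [min(a, b) for a, b in zip(lo, hi)]
    maxs = [max(a, b) for a, b in zip(lo, hi)]
    return mins, maxs


def sort_across_registers(r0, r1, r2, r3):
    """Five compare-exchanges (10 MIN/MAX) sort every lane independently."""
    r0, r1 = compare_exchange(r0, r1)
    r2, r3 = compare_exchange(r2, r3)
    r0, r2 = compare_exchange(r0, r2)
    r1, r3 = compare_exchange(r1, r3)
    r1, r2 = compare_exchange(r1, r2)
    return [r0, r1, r2, r3]


def transpose(registers):
    """Turn sorted columns into sorted rows (done with SHUFFLEs on hardware)."""
    return [list(column) for column in zip(*registers)]


# The four loaded registers from the slide.
loaded = [
    [12, 21, 4, 13],
    [9, 8, 6, 7],
    [1, 14, 3, 0],
    [5, 11, 15, 10],
]
column_sorted = sort_across_registers(*loaded)
sorted_runs = transpose(column_sorted)
```

The comparator sequence is the same four-wire network as in the `wires` example, applied to four lanes at once, which is why the result is four sorted runs of four keys each.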
LEVEL #2 BITONIC MERGE NETWORK
Like a Sorting Network but it can merge two locally-sorted lists into a globally-sorted list.
Can expand network to merge progressively larger lists up to ½ LLC size.
Intel’s Measurements
→ 2.25–3.5× speed-up over SISD implementation.
EFFICIENT IMPLEMENTATION OF SORTING ON MULTI-CORE
VLDB 2008
[Figure: bitonic merge network. Input: a Sorted Run (a1, a2, a3, a4) and a Reverse Sorted Run (b4, b3, b2, b1). Three min/max stages with SHUFFLE steps between them produce one Sorted Run as Output.]
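A bitonic merge can be sketched in scalar Python to show the fixed comparator pattern. The second run is reversed so the combined sequence is bitonic, and then log2(n) min/max stages with halving distances sort it. The function name `bitonic_merge` and the power-of-two length restriction are assumptions of this sketch. A SIMD version would run each stage as one min and one max instruction, with shuffles in between.

```python
def bitonic_merge(a, b):
    """Merge two ascending runs of equal power-of-two length."""
    n = len(a)
    if n == 0 or n != len(b) or n & (n - 1):
        raise ValueError("runs must have the same power-of-two length")

    # Reversing b makes the concatenation bitonic: ascending, then descending.
    wires = list(a) + list(reversed(b))
    total = len(wires)

    distance = total // 2
    while distance >= 1:
        # One min/max stage: all comparators in a stage are independent.
        for start in range(0, total, 2 * distance):
            for i in range(start, start + distance):
                lo, hi = wires[i], wires[i + distance]
                wires[i], wires[i + distance] = min(lo, hi), max(lo, hi)
        distance //= 2
    return wires


merged = bitonic_merge([1, 5, 9, 12], [0, 7, 10, 13])
```

For two runs of four keys the loop executes three stages (distances 4, 2, 1), matching the three min/max columns in the figure.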
LEVEL #3 MULTI-WAY MERGING
Use the Bitonic Merge Networks but split the process up into tasks.
→ Still one worker thread per core.
→ Link together tasks with a cache-sized FIFO queue.
A task blocks when either its input queue is empty, or its output queue is full.
Requires more CPU instructions but brings bandwidth and compute into balance.
[Figure: Sorted Runs feed a network of seven MERGE tasks; the tasks are linked by Cache-Sized Queues.]
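The blocking behaviour of the merge tasks can be illustrated with bounded FIFO queues from Python's standard library. This sketch simplifies the slide in two ways that should be read as assumptions: it starts one thread per task rather than scheduling tasks onto one worker thread per core, and each task does a scalar two-way merge instead of using a bitonic merge network. `QUEUE_CAPACITY`, `feed`, `merge_task` and `multiway_merge` are hypothetical names.

```python
import queue
import threading

DONE = object()        # sentinel marking the end of a stream
QUEUE_CAPACITY = 4     # stand-in for a "cache-sized" queue


def feed(run, out):
    """Leaf task: push one sorted run into its output queue."""
    for item in run:
        out.put(item)              # blocks while the output queue is full
    out.put(DONE)


def merge_task(left, right, out):
    """Two-way merge task between two input queues and one output queue."""
    a, b = left.get(), right.get()  # get() blocks while an input is empty
    while a is not DONE or b is not DONE:
        if b is DONE or (a is not DONE and a <= b):
            out.put(a)
            a = left.get()
        else:
            out.put(b)
            b = right.get()
    out.put(DONE)


def multiway_merge(sorted_runs):
    """Merge a power-of-two number of sorted runs through a tree of tasks."""
    count = len(sorted_runs)
    if count == 0 or count & (count - 1):
        raise ValueError("number of runs must be a power of two")
    threads, level = [], []
    for run in sorted_runs:
        q = queue.Queue(maxsize=QUEUE_CAPACITY)
        threads.append(threading.Thread(target=feed, args=(run, q)))
        level.append(q)
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level), 2):
            q = queue.Queue(maxsize=QUEUE_CAPACITY)
            task = threading.Thread(target=merge_task,
                                    args=(level[i], level[i + 1], q))
            threads.append(task)
            next_level.append(q)
        level = next_level
    for t in threads:
        t.start()
    result = []
    while True:
        item = level[0].get()
        if item is DONE:
            break
        result.append(item)
    for t in threads:
        t.join()
    return result


runs = [[1, 9], [4, 6], [0, 8], [3, 7], [2, 5], [10, 11], [12], []]
merged_runs = multiway_merge(runs)
```

With eight input runs the tree contains seven merge tasks, the same count as in the figure. The small queue capacity keeps each task's working set bounded, which is the point of using cache-sized queues.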
IN-PLACE SUPERSCALAR SAMPLESORT
Recursively partition the table by sampling keys to determine partition boundaries.
It copies data into output buffers during the partitioning phases. But when a buffer gets full, it writes it back into portions of the input array already distributed instead of allocating a new buffer.
IN-PLACE PARALLEL SUPER SCALAR SAMPLESORT
ESA 2017
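The sampling and recursive partitioning idea can be shown with a deliberately simplified samplesort. This sketch is not in-place: it allocates fresh bucket lists at every level, so it leaves out the buffer write-back trick that the slide describes. The parameter names (`num_buckets`, `oversample`, `base_case`) and their defaults are assumptions chosen for illustration.

```python
import bisect
import random


def samplesort(keys, num_buckets=4, oversample=4, base_case=16, rng=None):
    """Recursive samplesort sketch; NOT in-place, unlike the slide's algorithm."""
    rng = rng or random.Random(0)
    if len(keys) <= base_case:
        return sorted(keys)

    # Sample keys and pick evenly spaced splitters as partition boundaries.
    sample_size = min(len(keys), num_buckets * oversample)
    sample = sorted(rng.sample(keys, sample_size))
    step = len(sample) // num_buckets
    splitters = [sample[i * step] for i in range(1, num_buckets)]

    # Distribute: bucket i receives keys in (splitters[i-1], splitters[i]].
    buckets = [[] for _ in range(num_buckets)]
    for key in keys:
        buckets[bisect.bisect_left(splitters, key)].append(key)

    # If every key landed in one bucket (many duplicates), stop recursing.
    if max(len(bucket) for bucket in buckets) == len(keys):
        return sorted(keys)

    result = []
    for bucket in buckets:
        result.extend(samplesort(bucket, num_buckets, oversample, base_case, rng))
    return result


data = [(i * 7919) % 1000 for i in range(500)]
sorted_data = samplesort(data)
```

Oversampling (drawing several sample keys per bucket) makes the chosen splitters more likely to produce balanced buckets, which matters because bucket sizes determine the recursion depth.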
MERGE PHASE
Iterate through the outer table and inner table in lockstep and compare join keys.
May need to backtrack if there are duplicates.
Can be done in parallel at the different cores without synchronization if there are separate output buffers.
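A minimal sketch of the merge phase, including the backtracking needed for duplicate join keys, is shown here. It assumes both inputs are lists of `(key, payload)` tuples already sorted by key; the function name `merge_join` and the output tuple layout are illustrative choices.

```python
def merge_join(outer, inner):
    """Join two lists of (key, payload) tuples that are sorted by key."""
    output = []
    i = j = 0
    while i < len(outer) and j < len(inner):
        if outer[i][0] < inner[j][0]:
            i += 1
        elif outer[i][0] > inner[j][0]:
            j += 1
        else:
            key = outer[i][0]
            mark = j                  # start of the inner run for this key
            while i < len(outer) and outer[i][0] == key:
                j = mark              # backtrack for each duplicate outer key
                while j < len(inner) and inner[j][0] == key:
                    output.append((key, outer[i][1], inner[j][1]))
                    j += 1
                i += 1
    return output


outer_sorted = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
inner_sorted = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]
joined = merge_join(outer_sorted, inner_sorted)
```

Each call appends only to its own `output` list, so running one call per core on disjoint chunks needs no synchronization, as the slide notes.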
SORT-MERGE JOIN VARIANTS
Multi-Way Sort-Merge (M-WAY)
Multi-Pass Sort-Merge (M-PASS)
Massively Parallel Sort-Merge (MPSM)
MULTI-WAY SORT-MERGE
Outer Table
→ Each core sorts in parallel on local data (levels #1/#2).
→ Redistribute sorted runs across cores using the multi-way merge (level #3).
Inner Table
→ Same as outer table.
Merge phase is between matching pairs of chunks of outer/inner tables at each core.
MULTI-CORE, MAIN-MEMORY JOINS: SORT VS. HASH REVISITED
VLDB 2013
[Figure: Multi-Way Sort-Merge. Outer table: Local-NUMA Partitioning, then Sort, then Multi-Way Merge. The inner table goes through the same steps as the outer table. Each core then performs a Local Merge Join (⨝) on its pair of chunks.]
MULTI-PASS SORT-MERGE
Outer Table
→ Same level #1/#2 sorting as Multi-Way.
→ But instead of redistributing, it uses a multi-pass naïve merge on sorted runs.
Inner Table
→ Same as outer table.
Merge phase is between matching pairs of chunks of outer table and inner table.
MULTI-CORE, MAIN-MEMORY JOINS: SORT VS. HASH REVISITED
VLDB 2013
[Figure: Multi-Pass Sort-Merge. Outer and inner tables each go through Local-NUMA Partitioning and Sort, followed by a Global Merge Join (⨝).]
MASSIVELY PARALLEL SORT-MERGE
Outer Table
→ Range-partition outer table and redistribute to cores.
→ Each core sorts its partition in parallel.
Inner Table
→ Not redistributed like outer table.
→ Each core sorts its local data.
Merge phase is between entire sorted run of outer table and a segment of inner table.
MASSIVELY PARALLEL SORT-MERGE JOINS IN MAIN MEMORY MULTI-CORE DATABASE SYSTEMS
VLDB 2012
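The MPSM data flow can be simulated sequentially to make the asymmetry between the two tables concrete. In this sketch the loop over `num_workers` stands in for cores, integer join keys are assumed so that a simple equal-width range partition works, and `bisect` locates the segment of each inner run that overlaps a worker's outer key range. All function and parameter names are hypothetical.

```python
import bisect


def merge_join_sorted(outer_run, inner_run):
    """Merge join of two key-sorted runs, rescanning on duplicate outer keys."""
    out, i, j = [], 0, 0
    while i < len(outer_run) and j < len(inner_run):
        ko, ki = outer_run[i][0], inner_run[j][0]
        if ko < ki:
            i += 1
        elif ko > ki:
            j += 1
        else:
            mark = j
            while j < len(inner_run) and inner_run[j][0] == ko:
                out.append((ko, outer_run[i][1], inner_run[j][1]))
                j += 1
            i += 1
            if i < len(outer_run) and outer_run[i][0] == ko:
                j = mark
    return out


def mpsm_join(outer, inner, num_workers=4):
    """Sequential simulation of MPSM over lists of (int key, payload) tuples."""
    if not outer or not inner:
        return []
    by_key = lambda t: t[0]

    # Inner table: not redistributed; each worker sorts the chunk it holds.
    chunk = -(-len(inner) // num_workers)   # ceiling division
    inner_runs = [sorted(inner[w * chunk:(w + 1) * chunk], key=by_key)
                  for w in range(num_workers)]
    inner_keys = [[k for k, _ in run] for run in inner_runs]

    # Outer table: range-partition on the key, redistribute, sort each part.
    lo = min(k for k, _ in outer)
    hi = max(k for k, _ in outer)
    width = (hi - lo) // num_workers + 1
    outer_runs = [[] for _ in range(num_workers)]
    for key, payload in outer:
        outer_runs[(key - lo) // width].append((key, payload))
    outer_runs = [sorted(run, key=by_key) for run in outer_runs]

    # Merge: each worker joins its whole outer run with the matching
    # segment of every inner run.
    output = []
    for run in outer_runs:
        if not run:
            continue
        first, last = run[0][0], run[-1][0]
        for keys, inner_run in zip(inner_keys, inner_runs):
            start = bisect.bisect_left(keys, first)
            end = bisect.bisect_right(keys, last)
            output.extend(merge_join_sorted(run, inner_run[start:end]))
    return output


mpsm_outer = [((i * 7) % 10, f"r{i}") for i in range(12)]
mpsm_inner = [((i * 3) % 10, f"s{i}") for i in range(8)]
mpsm_result = mpsm_join(mpsm_outer, mpsm_inner, num_workers=4)
```

Only the outer table crosses worker boundaries. The inner runs stay where they are and are read sequentially, which fits the rule about sequential reads on non-local memory later in the lecture.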
[Figure: Massively Parallel Sort-Merge. Outer table: Cross-NUMA Partitioning, then Sort, which leaves it Globally Sorted. Inner table: each core sorts its local data (SORT!). The final step is a Cross-Partition Merge Join (⨝) at each core.]
HYPER's RULES FOR PARALLELIZATION
Rule #1: No random writes to non-local memory
→ Chunk the data, redistribute, and then each core sorts/works on local data.
Rule #2: Only perform sequential reads on non-local memory
→ This allows the hardware prefetcher to hide remote access latency.
Rule #3: No core should ever wait for another
→ Avoid fine-grained latching or sync barriers.
Source: Martina-Cezara Albutiu
EVALUATION
Compare the different join algorithms using a synthetic data set.
→ Sort-Merge: M-WAY, M-PASS, MPSM
→ Hash: Radix Partitioning
Hardware:
→ 4 Socket Intel Xeon E5-4640 @ 2.4GHz
→ 8 Cores with 2 Threads Per Core
→ 512 GB of DRAM
MULTI-CORE, MAIN-MEMORY JOINS: SORT VS. HASH REVISITED
VLDB 2013
RAW SORTING PERFORMANCE
[Figure: single-threaded sorting performance of C++ STL Sort and SIMD Sort. X-axis: Number of Tuples (in 2^20), 1 to 256. Y-axis: Throughput (M Tuples/sec), 0 to 36. Annotation: 2.5–3x Faster.]
Source: Cagri Balkesen
COMPARISON OF SORT-MERGE JOINS
[Figure: M-WAY, M-PASS, and MPSM compared. Axes: Cycles / Output Tuple (0 to 25), broken down into Partition, Sort, S-Merge, and J-Merge, and Throughput (M Tuples/sec, 0 to 400). Value labels on the chart: 13.6, 7.6, 22.9.]
Workload: 1.6B⋈128M (8-byte tuples)
Source: Cagri Balkesen
M-WAY JOIN VS. MPSM JOIN
[Figure: Multi-Way and Massively Parallel compared. X-axis: Number of Threads (1 to 64), with part of the range marked Hyper-Threading. Y-axis: Throughput (M Tuples/sec), 0 to 400. Value labels on the chart: 108 M/sec, 315 M/sec, 130 M/sec, 54 M/sec, 259 M/sec, 90 M/sec.]
Workload: 1.6B⋈128M (8-byte tuples)
Source: Cagri Balkesen
SORT-MERGE JOIN VS. HASH JOIN
[Figure: SORT and HASH compared on four workloads: 128M⨝128M, 1.6B⨝1.6B, 128M⨝512M, 1.6B⨝6.4B. Y-axis: Cycles / Output Tuple (0 to 8), broken down into Partition, Sort, S-Merge, J-Merge, and Build+Probe.]
Workload: Different Table Sizes (8-byte tuples)
Source: Cagri Balkesen
SORT-MERGE JOIN VS. HASH JOIN
[Figure: Multi-Way Sort-Merge Join and Radix Hash Join as the size of the input relations varies. X-axis: Millions of Tuples (128 to 1920). Y-axis: Throughput (M Tuples/sec), 0 to 750.]
Source: Cagri Balkesen
PARTING THOUGHTS
Both join approaches are equally important.
Every serious OLAP DBMS supports both.
We did not consider the impact of queries where the output needs to be sorted.
NEXT CLASS
Optimizers – The Hardest Topic in Databases