TRANSCRIPT

Work-Efficient Parallel Skyline Computation for the GPU
Kenneth S. Bøgh, Sean Chester, Ira Assent
[email protected]
Data-Intensive Systems Group, Aarhus University, Denmark
Harvard University, 11 February 2016
What this talk will cover

1. An introduction to General Purpose computing on Graphics Processing Units (GPGPU)
2. An introduction to the skyline operator
3. A review of state-of-the-art algorithms for computing skylines
4. An introduction of parallel search trees for:
   - multicore CPUs
   - GPUs
5. Current research at DASlab

Kenneth S. Bøgh, Sean Chester, Ira Assent (Aarhus University). Parallel Skyline Computation. HU, 11 Feb 2016.
What is a GPU?

1. Graphics Processing Unit: specialized hardware for graphics
2. Massively parallel (2688 cores in our card)
3. More power efficient than CPUs (21 vs. 5 GFLOPS/watt)
4. More processing power per $
5. Used as an accelerator card: the extreme in terms of scale-up
Key differences between CPU and GPU

- Separate memory: data must be transferred back and forth
- Higher memory bandwidth (×4) and latency (×2)
- No prefetcher, and a small cache (1.5 MB for 2688 cores)
- 2048 threads per 192 cores (2 threads per core on the CPU)
- Groups of 32 threads execute in lockstep

[Diagram: the two memory hierarchies. CPU: CPU RAM; shared cache, 2 MB per core; 256 KB L2 and 2×32 KB L1 per core. GPU: GPU RAM; shared cache, 1.5 MB; 64 KB read/write plus 48 KB read-only per group of 192 cores.]
The CPU and GPU threading models

- CPU threads execute independently
- GPU threads execute in step-locked groups of 32 called warps
- Threads of a warp must agree on what instruction to execute next
- Otherwise some threads will halt while the others execute

[Diagram: a branching search tree over nodes A–F. CPU threads CPU1 and CPU2 follow different branches independently; on the GPU, when thread 1 of a warp takes one branch, threads 2–32 are halted until execution reconverges.]
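The cost of divergence described above can be sketched with a toy CPU-side simulation. The slot accounting below is a deliberate simplification of real warp schedulers, meant only to show why a half-divergent warp issues far more execution slots than it uses:

```python
# Toy simulation of warp-style lockstep execution (not real GPU code).
# All "threads" must issue the same instruction each cycle; threads whose
# branch is not currently executing sit idle, wasting their slot.

def simulate_warp(branch_taken, work_if_taken, work_if_not):
    """Count execution slots for a warp that may diverge on one branch.

    branch_taken: list of bools, one per thread in the warp.
    work_if_taken / work_if_not: instruction counts on each path.
    Returns (slots issued by the warp, slots doing useful work).
    """
    n = len(branch_taken)
    cycles = 0
    # The warp serializes the two paths: first the threads whose predicate
    # holds run (the rest halt), then the other group runs.
    if any(branch_taken):
        cycles += work_if_taken
    if not all(branch_taken):
        cycles += work_if_not
    useful = sum(work_if_taken if t else work_if_not for t in branch_taken)
    return cycles * n, useful

# A fully convergent warp wastes nothing; a divergent one pays for both paths.
print(simulate_warp([True] * 32, 10, 20))                # (320, 320)
print(simulate_warp([True] * 16 + [False] * 16, 10, 20)) # (960, 480)
```

In the divergent case only half of the issued slots do useful work, which is why skyline algorithms for the GPU try to make the threads of a warp agree on their control flow.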
Example: Finding a conference hotel

- Close to the conference location, to make you happy
- Cheap, to make your department happy
- Skyline query: minimize price and distance, returning all best trade-offs

[Figure: hotels plotted by price (x-axis) and distance (y-axis), with two points p and q highlighted.]
Example: Finding a conference hotel

A point p dominates* another point q if:
- p is preferable or equivalent to q in all dimensions
- p is strictly preferable to q in at least one dimension

The skyline [1] consists of the points that are not dominated.

[Figure: the price/distance plot again; q lies above and to the right of p, so p dominates q.]

*This is the same concept as Pareto dominance from economics, but applied to databases.
[1] S. Börzsönyi et al., "The skyline operator", Proc. ICDE, 2001.
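The definition above translates directly into code. A minimal quadratic baseline, assuming all dimensions are to be minimized (price, distance, ...):

```python
def dominates(p, q):
    """p dominates q: p <= q in every dimension, p < q in at least one."""
    return (all(pi <= qi for pi, qi in zip(p, q))
            and any(pi < qi for pi, qi in zip(p, q)))

def skyline(points):
    """Return all points not dominated by any other (O(n^2) baseline)."""
    return [q for q in points
            if not any(dominates(p, q) for p in points if p != q)]

hotels = [(50, 9), (120, 1), (80, 4), (90, 5), (140, 2)]  # (price, distance)
print(skyline(hotels))  # (90, 5) and (140, 2) are dominated
```

Every algorithm in this talk computes exactly this set; they differ only in how many of the pairwise dominance tests they manage to avoid.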
The state of parallel skylines

- GGS [3] is the state-of-the-art GPU skyline algorithm, run on an Nvidia GTX Titan with 2688 cores at 0.8 GHz
- BSkyTree [7] is the sequential state-of-the-art, run on a 3.4 GHz Intel i7-3770

[Figure: running time (s) and dominance tests per point vs. cardinality (×10⁶), comparing BSkyTree and GGS.]

[3] K.S. Bøgh et al., "Efficient GPU-based skyline computation", Proc. DaMoN, 2013.
[7] J. Lee and S.-w. Hwang, "Scalable skyline computation using a balanced pivot selection technique", Inf. Syst., 2014.
Monotonic sorting

1. Compute a monotonic score for each data point
2. Sort the data by the score
3. for i = 0, ..., n − 1 do
4.   Append point i to the candidate buffer if no point in the candidate buffer dominates i

[Diagram: a candidate buffer growing on the left, unprocessed points on the right.]
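The four steps above can be sketched as follows, using the coordinate sum as the monotonic score (an arbitrary choice; any function that is monotone in every dimension works). Because a dominating point always has a strictly lower sum than the points it dominates, a dominator is always processed first, so comparing each point only against the candidate buffer is sufficient:

```python
def dominates(p, q):
    """p dominates q under minimization in all dimensions."""
    return (all(a <= b for a, b in zip(p, q))
            and any(a < b for a, b in zip(p, q)))

def skyline_sort_first(points):
    # Steps 1-2: compute a monotonic score (here: sum) and sort by it.
    ordered = sorted(points, key=sum)
    candidates = []
    # Steps 3-4: append point i if no buffered candidate dominates it.
    for point in ordered:
        if not any(dominates(c, point) for c in candidates):
            candidates.append(point)
    return candidates

hotels = [(50, 9), (120, 1), (80, 4), (90, 5), (140, 2)]
print(skyline_sort_first(hotels))  # [(50, 9), (80, 4), (120, 1)]
```

The buffer never needs pruning: a point that enters it can never be dominated by a later, higher-scoring point.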
Object-based partitioning

- Partitions the data recursively
- Builds a search tree on the fly to minimize data point comparisons
- Stores bit masks in nodes to minimize dominance tests

[Figure: six points A–F in the plane, and the search tree built over them step by step.]
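One way to see how bit masks cut down dominance tests: pick a pivot point and record, for each point, the set of dimensions in which it is worse than the pivot. A sketch under stated assumptions (the pivot choice and helper names are illustrative, not the exact scheme of the cited algorithms; all dimensions are minimized):

```python
def mask(point, pivot):
    """Bit d is set iff the point is worse (larger) than the pivot in dim d."""
    m = 0
    for d, (x, v) in enumerate(zip(point, pivot)):
        if x > v:
            m |= 1 << d
    return m

def may_dominate(mask_p, mask_q):
    """If p dominates q, then every dimension where p is worse than the
    pivot is also a dimension where q is worse, so p's mask must be a
    subset of q's. Pairs failing this check need no full dominance test."""
    return mask_p & ~mask_q == 0

pivot = (100, 5)
p, q, r = (80, 4), (90, 6), (120, 3)
print(may_dominate(mask(p, pivot), mask(q, pivot)))  # True: must test p vs. q
print(may_dominate(mask(r, pivot), mask(q, pivot)))  # False: skip r vs. q
```

Comparing two small integers with bitwise operations is far cheaper than a full per-dimension dominance test, which is why storing such masks in the tree nodes pays off.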
Control flow of Hybrid

[Diagram: worker threads α run Phase I against the solution tree, then Phase II against the solution tree, followed by a sequential update of the solution tree.]

Phase I is ideal; Phase II is cache-resident; the update phase is sequential.
Static median/quartile based partitioning

- Fixed two-level tree, based on median and quartile values
- Can be built in parallel
- Enables predictable branching

[Figure: the two-level tree over points p0–p6. The first level branches on median masks M; the second on quartile masks Q. Under M = 01 sit p0 (Q = 10), p6 (Q = 11), and p5 (Q = 01); under M = 10 sit p4 (Q = 01), p3 (Q = 10), and p2 (Q = 11); under M = 11 sits p1 (Q = 00).]

Point  Median    Quartile
p0     M0 = 01   Q0 = 10
p6     M6 = 01   Q6 = 11
p5     M5 = 01   Q5 = 01
p4     M4 = 10   Q4 = 01
p3     M3 = 10   Q3 = 10
p2     M2 = 10   Q2 = 11
p1     M1 = 11   Q1 = 00
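The masks in the table can be computed as in the sketch below. This assumes bit d of the median mask records whether a point lies above the global median in dimension d, and that the quartile mask refines this within the point's median half; that is my reading of the slide, not a verified reconstruction of the paper's scheme:

```python
# Sketch of static two-level (median/quartile) masks. Assumption: bit d of
# M means "above the median in dimension d"; bit d of Q means "above the
# quartile boundary within that half". Illustrative, not the exact scheme.
from statistics import median

def level_masks(points):
    dims = len(points[0])
    meds = [median(p[d] for p in points) for d in range(dims)]
    masks = []
    for p in points:
        m = q = 0
        for d in range(dims):
            above = p[d] > meds[d]
            if above:
                m |= 1 << d
            # The quartile boundary is the median of p's half in dim d.
            half = [x[d] for x in points if (x[d] > meds[d]) == above]
            if p[d] > median(half):
                q |= 1 << d
        masks.append((m, q))
    return masks

points = [(1, 1), (2, 2), (3, 3), (4, 4)]
print(level_masks(points))  # [(0, 0), (0, 3), (3, 0), (3, 3)]
```

Because the boundaries are fixed global values, every point's masks can be computed independently, which is what makes the tree buildable in parallel with predictable branching. As with pivot masks, p can dominate q only if p's median mask is a subset of q's, with the quartile mask refining the check one level down.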
The SkyAlign workflow

[Figure: warps w1–w4 traverse the static two-level tree over p0–p6 (median masks M at the first level, quartile masks Q at the second). At each node a warp compares masks, descends when the comparison is inconclusive, and runs full dominance tests only where needed, e.g., w3 against P3 and P2. All threads of a warp stay aligned on the same step: Compare, Descent, or Dominance test.]
Experimental setup

- Intel i7-3770 with 4 cores at 3.4 GHz and hyperthreading enabled
- Nvidia GTX Titan with 2688 cores at 0.8 GHz
- Transfer of data to and from the GPU is included in the running time
- Tree building is included in the running time

Compared algorithms:
- BSkyTree [7]: state-of-the-art sequential algorithm
- Hybrid [4]: the proposed multicore algorithm (run with 8 threads)
- GGS [3]: previous state-of-the-art, tree-less GPU algorithm
- SkyAlign [2]: the proposed GPU algorithm

Download all code: http://cs.au.dk/research-at-cs/data-intensive-systems/repository/

[2] K.S. Bøgh et al., "Work-efficient parallel skyline computation for the GPU", PVLDB, 8:9, 962–973, 2015.
[3] K.S. Bøgh et al., "Efficient GPU-based skyline computation", Proc. DaMoN, 2013.
[4] S. Chester et al., "Scalable parallelization of skyline computation for multi-core processors", Proc. ICDE, 2015.
[7] J. Lee and S.-w. Hwang, "Scalable skyline computation using a balanced pivot selection technique", Inf. Syst., 2014.
Evaluating running time

[Figure: running time (ms) vs. cardinality (×10⁶) and vs. dimensionality, on anticorrelated and independent data, for BSkyTree, Hybrid, GGS, and SkyAlign.]
Evaluating dominance tests

[Figure: dominance tests per point vs. cardinality (×10⁶) and vs. dimensionality, on anticorrelated and independent data, for BSkyTree, Hybrid, GGS, and SkyAlign.]
Evaluating work

[Figure: total work vs. cardinality (×10⁶) and vs. dimensionality, on anticorrelated and independent data, for BSkyTree, Hybrid, GGS, and SkyAlign.]
Evaluating running time on the CPU

[Figure: running time (ms) vs. cardinality and vs. dimensionality, on anticorrelated and independent data, for Hybrid, GGS, and SkyAlign.]
Scalability

[Figure: running time (ms) vs. number of cores on a 2×14-core and a 4×8-core machine, on anticorrelated and independent data, for Hybrid, GGS, and SkyAlign.]
Evaluating clocks per instruction

[Figure: CPI vs. cardinality and vs. dimensionality, on anticorrelated and independent data, for Hybrid, GGS, and SkyAlign.]
Current research: The RUM conjecture

- Trade-offs are present in all parts of computer science
- Each field has its own major components between which trade-offs are made
- The Data Systems Laboratory has recently formalized this for data systems
- The result is the RUM conjecture
Current research: The RUM conjecture

- Read overhead: the overhead of reading data
- Update overhead: the overhead of updating data
- Memory overhead: the additional storage used
- Optimize for at most two, at the cost of the third

[Figure: a triangle with Read, Update, and Memory at its corners; an example array 8 4 9 1 5 0 2, into which a 3 is then inserted.]
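A toy illustration of the trade-off (my own example, not from the talk): a sorted array is read-optimized but pays on updates, an append-only log is update-optimized but pays on reads, and indexing the log buys reads back by spending memory:

```python
# Illustrative RUM trade-off: three containers with the same interface
# but different read/update/memory costs.
import bisect

class SortedArray:
    """Read-optimized: O(log n) lookups, but O(n) shifting on insert."""
    def __init__(self):
        self.data = []
    def insert(self, x):
        bisect.insort(self.data, x)
    def contains(self, x):
        i = bisect.bisect_left(self.data, x)
        return i < len(self.data) and self.data[i] == x

class AppendLog:
    """Update-optimized: O(1) appends, but O(n) scans on lookup."""
    def __init__(self):
        self.data = []
    def insert(self, x):
        self.data.append(x)
    def contains(self, x):
        return x in self.data

class IndexedLog(AppendLog):
    """Buys reads back by spending memory on an auxiliary hash index."""
    def __init__(self):
        super().__init__()
        self.index = set()
    def insert(self, x):
        super().insert(x)
        self.index.add(x)
    def contains(self, x):
        return x in self.index

for structure in (SortedArray(), AppendLog(), IndexedLog()):
    for v in (8, 4, 9, 1, 5, 0, 2):
        structure.insert(v)
    print(type(structure).__name__, structure.contains(5))  # all True
```

All three answer the same queries; each simply pays for them in a different currency, which is the point of the conjecture.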
(Animation frames condensed: the slide steps through the same example under three layouts.)
Unsorted log: 8 4 9 1 5 0 2 — appending 3 or updating 9 to 7 is cheap; every read must scan the whole log
Sorted array: 0 1 2 4 5 8 9 — reads are cheap; inserting 3 shifts elements to keep order
Partitioned array with fences <4 and <9: 1 2 0 | 5 8 4 | 9 — inserting 3 only touches one partition; ghost values (G) pre-allocate empty slots so inserts avoid shifting, at the cost of extra memory
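One way to spend memory to lower both read and update overhead, loosely following the partitioned example on the slide (class and method names are my own sketch, not from the talk): fence keys route each operation to a single partition, and pre-allocated ghost slots let inserts land without shifting anything.

```python
GHOST = None  # a pre-allocated empty slot ("ghost value")

class PartitionedStore:
    """Fixed fence keys split the key space; each partition keeps
    spare ghost slots so an insert only touches one small partition."""
    def __init__(self, fences, capacity=4):
        self.fences = fences                               # e.g. [4, 9]
        self.parts = [[GHOST] * capacity
                      for _ in range(len(fences) + 1)]

    def _part(self, x):
        for i, f in enumerate(self.fences):                # route by fences
            if x < f:
                return self.parts[i]
        return self.parts[-1]

    def insert(self, x):
        part = self._part(x)
        for i, slot in enumerate(part):                    # fill a ghost slot
            if slot is GHOST:
                part[i] = x
                return
        raise OverflowError("partition full: would need a split")

    def contains(self, x):                                 # scan one partition
        return x in self._part(x)

s = PartitionedStore(fences=[4, 9])
for v in [1, 2, 0, 5, 8, 4, 9, 3]:
    s.insert(v)
print(s.contains(3))   # True
```

Reads scan only one partition and inserts never shift data, but the ghost slots are pure memory overhead: the RUM trade-off paid in the third corner.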
Open questions
Which of the approaches is better?
How many partitions should we choose?
How should the partitions be distributed?
How should ghost values be distributed?
Can we extend this idea to indexes?
Kenneth S. Bøgh, Sean Chester, Ira Assent (Aarhus University)Parallel Skyline Computation HU, 11 Feb 2016 22 / 22