Implementing Database Operations Using SIMD Instructions
By: Jingren Zhou, Kenneth A. Ross
Presented by: Ioan Stefanovici
CSC2531: Advanced Topics in Database Systems, Fall 2011
The Problem
Databases have become bottlenecked on CPU and memory performance
Need to fully utilize available architectures’ features to maximize performance
Cache performance, e.g., cache-conscious B+ trees, PAX
Proposal: use SIMD instructions
Single-Instruction, Multiple-Data (SIMD)
X = (X0, X1, X2, X3)
Y = (Y0, Y1, Y2, Y3)
Result = (X0 OP Y0, X1 OP Y1, X2 OP Y2, X3 OP Y3), computed by one instruction
The same operation OP is applied to every lane
Let S = number of operands processed by one instruction (the degree of parallelism)
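The lane-wise picture above can be sketched in C. This is a minimal example, assuming GCC or Clang vector extensions (`__attribute__((vector_size(16)))`, a compiler extension rather than ISO C) with S = 4 lanes of 32-bit integers; `simd_add` is a hypothetical name, not code from the paper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { S = 4 };  /* degree of parallelism: four 32-bit lanes */

/* One 128-bit SIMD register holding S int32 lanes (GCC/Clang extension). */
typedef int32_t v4si __attribute__((vector_size(16)));

/* z[i] = x[i] + y[i]; each vector '+' performs S additions at once. */
static void simd_add(const int32_t *x, const int32_t *y, int32_t *z, size_t n) {
    assert(n % S == 0);  /* sketch assumption: no scalar tail loop */
    for (size_t i = 0; i < n; i += S) {
        v4si a, b;
        memcpy(&a, x + i, sizeof a);  /* load S lanes without alignment demands */
        memcpy(&b, y + i, sizeof b);
        v4si c = a + b;               /* the same OP on every lane */
        memcpy(z + i, &c, sizeof c);  /* store S results */
    }
}
```

For n = 8 the loop body runs twice instead of eight times, which is where the ideal speed-up of S comes from.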
Single-Instruction, Multiple-Data (SIMD)
Focus
Goal
Achieve speed-ups close to S (the degree of parallelism), or even higher
Outline
Motivation & Problem Statement
SIMD Instructions and Implementation Details
Algorithm Improvements:
Scan algorithms
Index traversals
Join algorithms
A few points...
Compiler auto-vectorization is difficult
Explicit use of SIMD instructions
SIMD data alignment
Column-oriented storage
Targets
Scan-like operations
Index traversals
Join algorithms
Comparison Result Example
Want to perform: X < Y
X:      0x00000001 0x00000003 0x00000004 0x00000007
Y:      0x00000002 0x00000003 0x00000005 0x00000006
X < Y:  0xFFFFFFFF 0x00000000 0xFFFFFFFF 0x00000000
(a true lane becomes all ones, a false lane all zeros)
SIMD_bit_vector(mask) = 1 0 1 0 (one bit per lane)
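The comparison above, and the packing of its mask into SIMD_bit_vector, can be sketched as follows. Assumptions: GCC/Clang vector extensions, S = 4 int32 lanes, and the slide's bit order (lane 1 becomes the most significant bit); `simd_less` and `simd_bit_vector` are hypothetical helpers written for illustration.

```c
#include <assert.h>
#include <stdint.h>

enum { S = 4 };
typedef int32_t v4si __attribute__((vector_size(16)));  /* GCC/Clang extension */

/* Lane-wise X < Y: a lane becomes -1 (0xFFFFFFFF) if true, 0 if false. */
static v4si simd_less(v4si x, v4si y) {
    return x < y;
}

/* Pack one bit per lane into an integer in [0, 2^S - 1].
   Convention (matching the slide): lane 1 is the most significant bit. */
static unsigned simd_bit_vector(v4si mask) {
    unsigned v = 0;
    for (int j = 0; j < S; j++) {
        assert(mask[j] == 0 || mask[j] == -1);  /* must be a comparison mask */
        v = (v << 1) | (unsigned)(mask[j] & 1);
    }
    return v;
}
```

On x86, SSE's `_mm_movemask_ps` does a similar packing in one instruction but places lane 0 in bit 0 (the least significant bit), so the bit order is a convention the scan code has to agree on.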
Scan
Typical scan:
for i = 1 to N {
  if (condition(x[i])) then process1(y[i]);
  else process2(y[i]);
}
[Figure: column x (condition: x1 ... x6) beside column y (data: y1 ... y6), one row per iteration]
SIMD Scan
Typical SIMD scan (N is assumed to be a multiple of S):
for i = 1 to N step S {
  Mask[1..S] = SIMD_condition(x[i..i+S-1]);
  SIMD_Process(Mask[1..S], y[i..i+S-1]);
}
[Figure, for S = 4: columns x (condition) and y (data), four rows per iteration]
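One way to fill in SIMD_Process without any branch is a masked aggregate. The sketch below computes a hypothetical query SUM(y) WHERE x < c: ANDing the all-ones/all-zeros mask with y keeps matching values and zeroes the rest. It assumes GCC/Clang vector extensions, S = 4, and N a multiple of S; `simd_scan_sum` is an illustrative name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { S = 4 };
typedef int32_t v4si __attribute__((vector_size(16)));  /* GCC/Clang extension */

/* SUM(y) WHERE x < c, processing S rows per iteration. */
static int64_t simd_scan_sum(const int32_t *x, const int32_t *y, size_t n, int32_t c) {
    assert(n % S == 0);              /* assumption: N is a multiple of S */
    v4si cv = {c, c, c, c};          /* broadcast the constant to all lanes */
    v4si acc = {0, 0, 0, 0};         /* S partial sums (32-bit: demo only, may overflow) */
    for (size_t i = 0; i < n; i += S) {
        v4si xv, yv;
        memcpy(&xv, x + i, sizeof xv);
        memcpy(&yv, y + i, sizeof yv);
        v4si mask = xv < cv;         /* SIMD_condition */
        acc += mask & yv;            /* SIMD_Process: add y only where mask is set */
    }
    int64_t sum = 0;
    for (int j = 0; j < S; j++)
        sum += acc[j];               /* combine the S partial sums */
    return sum;
}
```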
Scan: Return First Match
SIMD Return First Match
SIMD_Process(mask[1..S], y[1..S]) {
  V = SIMD_bit_vector(mask);  /* V = number between 0 and 2^S - 1 */
  if (V != 0) {
    for j = 1 to S
      if ((V >> (S-j)) & 1) {  /* jth bit */
        result = y[j]; return;
      }
  }
}
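A runnable version of the first-match procedure, assuming the condition is equality with a search key (the slides leave the condition abstract), GCC/Clang vector extensions, S = 4, and N a multiple of S. `simd_first_match` and `bit_vector` are hypothetical names. With 0-based lanes, lane j sits at bit S-1-j, mirroring `(V >> (S-j)) & 1` for 1-based j.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { S = 4 };
typedef int32_t v4si __attribute__((vector_size(16)));  /* GCC/Clang extension */

/* Pack a comparison mask into S bits, lane 0 as the most significant bit. */
static unsigned bit_vector(v4si mask) {
    unsigned v = 0;
    for (int j = 0; j < S; j++)
        v = (v << 1) | (unsigned)(mask[j] & 1);
    return v;
}

/* Finds the first row i with x[i] == key. On success stores y[i] in
   *result and returns 1; returns 0 if no row matches. */
static int simd_first_match(const int32_t *x, const int32_t *y, size_t n,
                            int32_t key, int32_t *result) {
    assert(n % S == 0);
    v4si kv = {key, key, key, key};         /* broadcast the search key */
    for (size_t i = 0; i < n; i += S) {
        v4si xv;
        memcpy(&xv, x + i, sizeof xv);
        unsigned v = bit_vector(xv == kv);  /* SIMD_condition + SIMD_bit_vector */
        if (v != 0) {                       /* at least one lane matched */
            for (int j = 0; j < S; j++) {
                if ((v >> (S - 1 - j)) & 1) {  /* j-th lane, left to right */
                    *result = y[i + j];
                    return 1;
                }
            }
        }
    }
    return 0;
}
```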
Scan: Return All Matches
SIMD All Matches Alternative 1 (branch on each set bit)
SIMD_Process(mask[1..S], y[1..S]) {
  V = SIMD_bit_vector(mask);  /* V = number between 0 and 2^S - 1 */
  if (V != 0) {
    for j = 1 to S
      if ((V >> (S-j)) & 1)  /* jth bit */
        { result[pos++] = y[j]; }
  }
}
SIMD All Matches Alternative 2 (no branch inside the loop)
SIMD_Process(mask[1..S], y[1..S]) {
  V = SIMD_bit_vector(mask);  /* V = number between 0 and 2^S - 1 */
  if (V != 0) {
    for j = 1 to S {
      tmp = (V >> (S-j)) & 1;  /* jth bit */
      result[pos] = y[j];      /* always store */
      pos += tmp;              /* keep the store only if the bit is set */
    }
  }
}
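Alternative 2 as a runnable sketch, under the same assumptions as before (equality condition, GCC/Clang vector extensions, S = 4, N a multiple of S; `simd_all_matches` and `bit_vector` are hypothetical names). The store happens unconditionally and `pos` advances by the match bit, so the inner loop has no data-dependent branch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { S = 4 };
typedef int32_t v4si __attribute__((vector_size(16)));  /* GCC/Clang extension */

/* Pack a comparison mask into S bits, lane 0 as the most significant bit. */
static unsigned bit_vector(v4si mask) {
    unsigned v = 0;
    for (int j = 0; j < S; j++)
        v = (v << 1) | (unsigned)(mask[j] & 1);
    return v;
}

/* Appends y[i] to result for every row with x[i] == key and returns the
   number of matches. result must have room for n values. */
static size_t simd_all_matches(const int32_t *x, const int32_t *y, size_t n,
                               int32_t key, int32_t *result) {
    assert(n % S == 0);
    v4si kv = {key, key, key, key};
    size_t pos = 0;
    for (size_t i = 0; i < n; i += S) {
        v4si xv;
        memcpy(&xv, x + i, sizeof xv);
        unsigned v = bit_vector(xv == kv);
        if (v != 0) {
            for (int j = 0; j < S; j++) {
                unsigned tmp = (v >> (S - 1 - j)) & 1u;  /* j-th lane's bit */
                result[pos] = y[i + j];  /* store unconditionally */
                pos += tmp;              /* keep it only if the lane matched */
            }
        }
    }
    return pos;
}
```

Because every lane is written before the decision to keep it, the output buffer needs room for n values rather than just the number of matches.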
Scan: Return All Matches Performance
Index Structures (B+ trees)
[Figure (Source: Wikipedia): a B+ tree, with its height annotated as Log2 (n)]
[Figure: example of a B+-tree internal node]
Internal Node Search
5 Ways to Search
Binary Search (SISD)
SIMD Binary Search
SIMD Sequential Search 1
SIMD Sequential Search 2
Hybrid Search
Internal Node Search
Naive SIMD Binary Search (looking for “4”)
1 3 4 5 7 8 10 13 14 17 19 20 23 24 25 32
0 0 0 0
1 3 4 5 7 8 10 13 14 17 19 20 23 24 25 32
0 1 0 0 Got it!
Internal Node Search
SIMD Sequential Search 1 (looking for “4”)
1 3 4 5 7 8 10 13 14 17 19 20 23 24 25 32
Compare every group of S = 4 keys with the search key (≤ 4) and keep a running total:
Keys 1 3 4 5: mask 1 1 1 0, total ≤ 4: 3
Keys 7 8 10 13: mask 0 0 0 0, total ≤ 4: 3
Keys 14 17 19 20: mask 0 0 0 0, total ≤ 4: 3
Keys 23 24 25 32: mask 0 0 0 0, total ≤ 4: 3, Got it!
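Sequential Search 1 maps directly onto a compare-and-count loop. In this sketch (GCC/Clang vector extensions, S = 4, node size a multiple of S, hypothetical name `simd_seq_search1`), each `<=` comparison yields -1 in true lanes, so subtracting the mask adds one per qualifying key. The final count is taken here as the position of the child pointer to follow, which is an assumed node layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { S = 4 };
typedef int32_t v4si __attribute__((vector_size(16)));  /* GCC/Clang extension */

/* Counts keys <= k in a sorted node of n keys, touching every key
   (no data-dependent branch inside the loop). */
static int simd_seq_search1(const int32_t *keys, size_t n, int32_t k) {
    assert(n % S == 0);
    v4si kv = {k, k, k, k};      /* broadcast the search key */
    v4si cnt = {0, 0, 0, 0};     /* per-lane running totals */
    for (size_t i = 0; i < n; i += S) {
        v4si v;
        memcpy(&v, keys + i, sizeof v);
        cnt -= (v <= kv);        /* true lanes are -1: subtracting adds 1 */
    }
    return cnt[0] + cnt[1] + cnt[2] + cnt[3];  /* total keys <= k */
}
```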
Internal Node Search
SIMD Sequential Search 2 (looking for “4”)
1 3 4 5 7 8 10 13 14 17 19 20 23 24 25 32
Keys 1 3 4 5: mask (≤ 4) 1 1 1 0, total ≤ 4: 3
Is there a key > the search key in this SIMD register? Yes! Got it! (the keys are sorted, so no later group can contain a key ≤ 4)
Pro: processes fewer keys (50% fewer on average)
Con: extra conditional test per group
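The early-exit variant adds exactly one test per group: if fewer than S lanes satisfied `<= k`, a larger key has been seen and the scan stops. A minimal sketch, assuming GCC/Clang vector extensions, S = 4, a node size that is a multiple of S, and the hypothetical name `simd_seq_search2`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { S = 4 };
typedef int32_t v4si __attribute__((vector_size(16)));  /* GCC/Clang extension */

/* Counts keys <= k in a sorted node, stopping at the first group of S keys
   that contains a key > k (every later key is larger too). */
static int simd_seq_search2(const int32_t *keys, size_t n, int32_t k) {
    assert(n % S == 0);
    v4si kv = {k, k, k, k};
    int total = 0;
    for (size_t i = 0; i < n; i += S) {
        v4si v;
        memcpy(&v, keys + i, sizeof v);
        v4si le = v <= kv;                         /* -1 where key <= k */
        int c = -(le[0] + le[1] + le[2] + le[3]);  /* qualifying keys in group */
        total += c;
        if (c < S)      /* the extra conditional test: a key > k was seen */
            break;
    }
    return total;
}
```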
Internal Node Search
Hybrid Search (looking for “4”)
1 3 4 5 7 8 10 13 14 17 19 20 23 24 25 32 ...
Pick some L (say L = 3)
Binary search on the last element of each “segment”
Sequential SIMD scan inside the correct segment
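The two hybrid steps can be sketched as follows. The slides do not define the segment size in terms of L, so this sketch simply takes a segment length `seg` (for example L * S keys) as a parameter and requires it to divide the node; that choice, GCC/Clang vector extensions, S = 4, and the name `simd_hybrid_search` are all assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { S = 4 };
typedef int32_t v4si __attribute__((vector_size(16)));  /* GCC/Clang extension */

/* Returns the number of keys <= k in a sorted node of n keys split into
   segments of seg keys (assumed: n % seg == 0 and seg % S == 0). */
static int simd_hybrid_search(const int32_t *keys, size_t n, size_t seg, int32_t k) {
    assert(seg % S == 0 && n % seg == 0);
    /* Step 1: binary search on the last key of each segment. */
    size_t lo = 0, hi = n / seg;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (keys[(mid + 1) * seg - 1] <= k)
            lo = mid + 1;            /* all of segment mid is <= k */
        else
            hi = mid;
    }
    int total = (int)(lo * seg);     /* keys in segments entirely <= k */
    if (lo == n / seg)
        return total;                /* k is >= every key in the node */
    /* Step 2: sequential SIMD scan inside segment lo, with early exit. */
    v4si kv = {k, k, k, k};
    for (size_t i = lo * seg; i < (lo + 1) * seg; i += S) {
        v4si v;
        memcpy(&v, keys + i, sizeof v);
        v4si le = v <= kv;
        int c = -(le[0] + le[1] + le[2] + le[3]);
        total += c;
        if (c < S)
            break;
    }
    return total;
}
```

In this sketch, a larger `seg` means fewer binary-search steps but more SIMD comparisons inside the chosen segment; L tunes that trade-off.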
Internal Node Search Performance
Internal Node Search – Branch Misprediction
Nested Loop Join – O(n²)
Nested Loop
[Figure: keys of the outer relation (Outer Loop) and the inner relation (Inner Loop) drawn as two columns of integers]
Nested Loop Join – O(n²)
SISD Algorithm
[Figure: Outer Loop and Inner Loop key columns]
Outer loop: iterate 1 key at a time
Inner loop: iterate 1 key at a time
Nested Loop Join – O(n²)
SIMD Duplicate-Outer
[Figure: Outer Loop and Inner Loop key columns]
Outer loop: fix one key and duplicate it S times
Inner loop: iterate S keys at a time
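Duplicate-Outer as a sketch: each outer key is broadcast into all S lanes and compared with S inner keys per instruction. Assumptions: an equality join (as in Q1), GCC/Clang vector extensions, S = 4, an inner relation whose size is a multiple of S, and a hypothetical `join_dup_outer` that counts matching pairs instead of emitting them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { S = 4 };
typedef int32_t v4si __attribute__((vector_size(16)));  /* GCC/Clang extension */

/* Equality nested-loop join, Duplicate-Outer: returns the number of
   (outer, inner) pairs with equal keys. */
static long join_dup_outer(const int32_t *outer, size_t n_outer,
                           const int32_t *inner, size_t n_inner) {
    assert(n_inner % S == 0);
    v4si acc = {0, 0, 0, 0};                   /* per-lane match counters */
    for (size_t r = 0; r < n_outer; r++) {
        int32_t key = outer[r];
        v4si rv = {key, key, key, key};        /* fix & duplicate S times */
        for (size_t s = 0; s < n_inner; s += S) {
            v4si sv;
            memcpy(&sv, inner + s, sizeof sv); /* next S inner keys */
            acc -= (rv == sv);                 /* -1 per match, so subtract */
        }
    }
    return (long)acc[0] + acc[1] + acc[2] + acc[3];
}
```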
Nested Loop Join – O(n²)
SIMD Duplicate-Inner
[Figure: Outer Loop and Inner Loop key columns]
Outer loop: iterate S keys at a time
Inner loop: fix one key and duplicate it S times
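Duplicate-Inner swaps the roles: S outer keys stay in a register while each inner key is broadcast and compared against them. Same assumptions as the Duplicate-Outer sketch (equality join, GCC/Clang vector extensions, S = 4, count instead of output), except that here the outer size must be a multiple of S; `join_dup_inner` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { S = 4 };
typedef int32_t v4si __attribute__((vector_size(16)));  /* GCC/Clang extension */

/* Equality nested-loop join, Duplicate-Inner: returns the number of
   (outer, inner) pairs with equal keys. */
static long join_dup_inner(const int32_t *outer, size_t n_outer,
                           const int32_t *inner, size_t n_inner) {
    assert(n_outer % S == 0);
    v4si acc = {0, 0, 0, 0};                   /* per-lane match counters */
    for (size_t r = 0; r < n_outer; r += S) {
        v4si rv;
        memcpy(&rv, outer + r, sizeof rv);     /* S outer keys per step */
        for (size_t s = 0; s < n_inner; s++) {
            int32_t key = inner[s];
            v4si sv = {key, key, key, key};    /* fix & duplicate S times */
            acc -= (rv == sv);                 /* -1 per match, so subtract */
        }
    }
    return (long)acc[0] + acc[1] + acc[2] + acc[3];
}
```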
Nested Loop Join – O(n²)
SIMD Rotate-Inner (Rotate & Compare S times)
[Figure: Outer Loop and Inner Loop key columns]
Outer loop: iterate S keys at a time
Inner loop: iterate S keys at a time
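Rotate-Inner compares S outer keys with S inner keys, rotates the inner register by one lane, and compares again; after S comparisons every outer lane has met every inner lane of the block. The sketch assumes an equality join, GCC/Clang vector extensions, S = 4, both sizes multiples of S, and a hypothetical `join_rotate_inner` that counts matches. The rotation is written by rebuilding the vector from its lanes; on x86 a single shuffle such as `_mm_shuffle_epi32` could do it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { S = 4 };
typedef int32_t v4si __attribute__((vector_size(16)));  /* GCC/Clang extension */

/* Rotate lanes left by one: (a, b, c, d) -> (b, c, d, a). */
static v4si rotate1(v4si v) {
    v4si t = {v[1], v[2], v[3], v[0]};
    return t;
}

/* Equality nested-loop join, Rotate-Inner: returns the number of
   (outer, inner) pairs with equal keys. */
static long join_rotate_inner(const int32_t *outer, size_t n_outer,
                              const int32_t *inner, size_t n_inner) {
    assert(n_outer % S == 0 && n_inner % S == 0);
    v4si acc = {0, 0, 0, 0};
    for (size_t r = 0; r < n_outer; r += S) {
        v4si rv;
        memcpy(&rv, outer + r, sizeof rv);     /* S outer keys */
        for (size_t s = 0; s < n_inner; s += S) {
            v4si sv;
            memcpy(&sv, inner + s, sizeof sv); /* S inner keys */
            for (int k = 0; k < S; k++) {      /* rotate & compare S times */
                acc -= (rv == sv);
                sv = rotate1(sv);
            }
        }
    }
    return (long)acc[0] + acc[1] + acc[2] + acc[3];
}
```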
Nested Loop Join – Performance
Queries
Q1. SELECT ... FROM R, S WHERE R.Key = S.Key (integer)
Q2. SELECT ... FROM R, S WHERE R.Key = S.Key (floating-point)
Q3. SELECT ... FROM R, S WHERE R.Key < S.Key < 1.01 * R.Key
Q4. SELECT ... FROM R, S WHERE R.Key < S.Key < R.Key + 5
Nested Loop Join – Branch Misprediction
Conclusion
Thank you!
Questions?