A Relational Algebra Processor
6.375 Final Project
Ming Liu, Shuotao Xu
Motivation
Today’s Database Management Systems (DBMS): software running on a standard operating system on a general-purpose CPU
DBMS are frequently used in analytics and scientific computing, but are bottlenecked by processor speed, software overhead, latency & bandwidth
Proposal: FPGA-based Relational Algebra Processor
[Diagram: Host PC (DBMS), FPGA Relational Algebra Processor, Physical Storage]
Background | Relational Algebra (RA)
Many database queries are fundamentally decomposable into five basic RA operators, although SQL is capable of much more
Operator: Function
Selection: Filter rows based on a Boolean condition
Projection: Keep selected attributes (columns) of a table, eliminating the rest; remove duplicate results
Cartesian Product: Combine several tables with unique attributes, pairing every row of one table with every row of the other
Union: Combine several tables with the same attributes
Difference: Select rows of the first table that do not match any row of the second
Design dedicated processors on the FPGA for each operator
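The five operators can be pinned down with a small software reference model. The sketch below is an illustration only, not the FPGA design or the project's host code: it assumes a hypothetical table representation of (column names, list of row tuples) and uses set semantics, so Projection and Union drop duplicate rows.

```python
from itertools import product

# Hypothetical representation: a table is (columns, rows), where columns is a
# tuple of attribute names and rows is a list of equal-length tuples.

def select(table, pred):
    """Selection: keep the rows for which the Boolean predicate holds."""
    cols, rows = table
    return cols, [r for r in rows if pred(dict(zip(cols, r)))]

def project(table, keep):
    """Projection: keep only the listed columns, then drop duplicate rows."""
    cols, rows = table
    idx = [cols.index(c) for c in keep]
    out, seen = [], set()
    for r in rows:
        t = tuple(r[i] for i in idx)
        if t not in seen:          # set semantics: first occurrence wins
            seen.add(t)
            out.append(t)
    return tuple(keep), out

def cartesian(a, b):
    """Cartesian product: pair every row of a with every row of b."""
    (ca, ra), (cb, rb) = a, b
    if set(ca) & set(cb):
        raise ValueError("attribute names must be unique across the inputs")
    return ca + cb, [x + y for x, y in product(ra, rb)]

def union(a, b):
    """Union: rows of a or b (same attributes), duplicates removed."""
    if a[0] != b[0]:
        raise ValueError("union needs tables with the same attributes")
    return project((a[0], a[1] + b[1]), list(a[0]))

def difference(a, b):
    """Difference: rows of a that do not appear in b."""
    if a[0] != b[0]:
        raise ValueError("difference needs tables with the same attributes")
    other = set(b[1])
    return a[0], [r for r in a[1] if r not in other]
```

For example, select(stars, lambda r: r["mass"] > 80000) keeps the rows of a hypothetical stars table whose mass column exceeds 80000, mirroring the "mass > 80000" predicate used in the benchmarks.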
Project Goal
Design and implement an in-memory relational algebra processor on the FPGA
Explore the types of queries that can benefit from FPGA acceleration
Secondary: Outperform SQLite!
Some assumptions:
32-bit-wide table entries
Tables fit in memory
Max number of columns is 32
Read-only
Microarchitecture | Host Software
[Diagram: host software and FPGA]
Microarchitecture | Top-Level RA Processor
[Diagram: Host PC (C++ functions), PCIe, RA Processor, DRAM; inset: Host PC (DBMS), RA Processor, Physical Storage]
Microarchitecture | Row Marshaller
Exposes a simple interface for operators to access tables in DRAM
Address translation, burst aggregation, truncation & alignment
Multiplexes requests
Table values sent/received as 32-bit bursts
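A minimal sketch of what the Row Marshaller's address translation and burst aggregation might compute, assuming a row-major table of 32-bit entries starting at a word-aligned base address. MAX_BURST_WORDS and the function names are hypothetical; the slide gives neither the burst length nor the table layout.

```python
WORD_BYTES = 4        # 32-bit table entries, per the project assumptions
MAX_BURST_WORDS = 8   # hypothetical maximum DRAM burst length, in words

def row_address(base, row, ncols, col=0):
    """Translate (row, column) into a byte address in a row-major table."""
    return base + (row * ncols + col) * WORD_BYTES

def bursts_for_rows(base, ncols, first_row, nrows):
    """Split nrows consecutive rows into (byte_address, n_words) bursts.

    Bursts are aggregated up to MAX_BURST_WORDS, aligned so that no burst
    crosses a MAX_BURST_WORDS boundary, and the final one is truncated at
    the end of the requested rows. Assumes base is word-aligned.
    """
    word = row_address(base, first_row, ncols) // WORD_BYTES
    end = word + nrows * ncols                  # one past the last word
    bursts = []
    while word < end:
        boundary = (word // MAX_BURST_WORDS + 1) * MAX_BURST_WORDS
        n = min(boundary, end) - word
        bursts.append((word * WORD_BYTES, n))
        word += n
    return bursts
```

For example, rows 1 and 2 of a 3-column table at address 0 come out as [(12, 5), (32, 1)]: five words up to the first 8-word boundary, then the one remaining word.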
Microarchitecture | Selection
Filters rows based on predicates (e.g., age < 40)
16 predicate evaluators, internally comparators
A tree of gates combines the predicate results into a pass/fail decision per row; max: 4 ORs of 4 ANDs
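The selection logic amounts to a sum of products: each evaluator compares a column against another column or a constant, and the gate tree ORs together up to four AND-terms of up to four predicates each. The sketch below models that in Python; the operator set in OPS and the (column, op, (kind, value)) predicate encoding are assumptions for illustration. All comparisons are computed before the reduction, mirroring the parallel evaluators, rather than short-circuiting.

```python
import operator

# Comparison operators assumed to be supported by each predicate evaluator.
OPS = {"<": operator.lt, ">": operator.gt, "=": operator.eq,
       "<=": operator.le, ">=": operator.ge, "!=": operator.ne}
MAX_TERMS, MAX_PREDS_PER_TERM = 4, 4   # "Max: 4 ORs of 4 ANDs"

def eval_predicate(row, pred):
    """One evaluator: row[lhs] <op> (another column or a constant)."""
    lhs, op, (kind, value) = pred
    rhs = row[value] if kind == "col" else value
    return OPS[op](row[lhs], rhs)

def qualify(row, terms):
    """OR together AND-terms of predicates (sum of products)."""
    if len(terms) > MAX_TERMS or any(len(t) > MAX_PREDS_PER_TERM for t in terms):
        raise ValueError("exceeds 4 ORs of 4 ANDs (16 evaluators)")
    # Evaluate every comparator first, as the parallel evaluators would ...
    results = [[eval_predicate(row, p) for p in term] for term in terms]
    # ... then reduce through the AND level and the OR level of the gate tree.
    return any(all(r) for r in results)
```

With columns (mass, pos_x, pos_z) at indices 0 to 2, the terms [[(0, ">", ("const", 80000)), (1, ">", ("const", 10))], [(1, "<", ("col", 2))]] encode (mass > 80000 AND pos_x > 10) OR pos_x < pos_z.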
Microarchitecture | Projection
Selects columns of a table
Column mask: one-hot encoded
No need to buffer the row; operates directly on data bursts
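Because the table arrives as a stream of 32-bit entries, projection can be done with a column counter and a mask, without a row buffer. This sketch assumes row-major streaming and reads the column mask as one bit per column (bit i set when column i is kept), which fits in 32 bits given the 32-column limit; the function names are hypothetical, and the sketch covers only column selection, not duplicate removal.

```python
def column_mask(cols):
    """Build a 32-bit mask with bit i set for each kept column index i."""
    mask = 0
    for c in cols:
        if not 0 <= c < 32:
            raise ValueError("tables have at most 32 columns")
        mask |= 1 << c
    return mask

def project_stream(words, ncols, mask):
    """Forward only the 32-bit entries whose column bit is set in mask.

    words is the table streamed row-major, one entry at a time. A column
    counter replaces the row buffer: it wraps to 0 at the end of each row.
    """
    col = 0
    for w in words:
        if (mask >> col) & 1:
            yield w
        col = col + 1 if col + 1 < ncols else 0
```

For a 3-column stream [1, 2, 3, 4, 5, 6] and mask 0b101, the generator yields 1, 3, 4, 6: columns 0 and 2 of both rows.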
Microarchitecture | Binary Operators
Cartesian Product, Union, Difference and Deduplication
Nested-loop implementation
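A nested-loop implementation compares every row of one input against every row of the other, costing O(n·m) row comparisons (about one million for two 1k-row tables). One plausible reason for the choice is that it needs no hashing or sorting logic. The software sketch below shows the four operators in that style, with rows as tuples; the function names are hypothetical.

```python
def nested_loop_xprod(a, b):
    """Cartesian product: outer loop over a, inner loop over b."""
    return [ra + rb for ra in a for rb in b]

def nested_loop_difference(a, b):
    """Rows of a with no equal row in b; b is rescanned for every row of a."""
    out = []
    for ra in a:
        if not any(ra == rb for rb in b):           # inner loop over b
            out.append(ra)
    return out

def nested_loop_dedup(a):
    """Keep a row only if no earlier row of a is equal to it."""
    out = []
    for i, ra in enumerate(a):
        if not any(ra == a[j] for j in range(i)):   # inner loop over earlier rows
            out.append(ra)
    return out

def nested_loop_union(a, b):
    """Union with set semantics: concatenate, then deduplicate."""
    return nested_loop_dedup(a + b)
```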
Microarchitecture | Inter-operator Bypassing
Operators enabled concurrently; data passed directly between operators
No intermediate storage
Conditions:
1. A singly linked chain of unary operators
2. Each operator has a single target output
3. No structural hazards
Software reorders and schedules the RA commands
Data source/destination encoded in each command
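A sketch of how host software might check the three bypassing conditions and reorder a command list into a chain. It assumes a hypothetical Cmd record with one source and one destination table, treats SELECT and PROJECT as the only unary operators, and reads "no structural hazards" as "no operator type used twice", since there is one dedicated unit per operator; none of these details come from the slides.

```python
from typing import NamedTuple

class Cmd(NamedTuple):
    op: str    # operator name, e.g. "SELECT" or "PROJECT"
    src: str   # source table
    dst: str   # destination table

UNARY_OPS = {"SELECT", "PROJECT"}   # assumed set of unary hardware operators

def schedule_chain(cmds):
    """Reorder cmds into a bypassable chain, or return None if impossible."""
    if any(c.op not in UNARY_OPS for c in cmds):
        return None                      # condition 1: unary operators only
    if len({c.op for c in cmds}) != len(cmds):
        return None                      # condition 3: one unit per operator
    by_src = {}
    for c in cmds:
        if c.src in by_src:
            return None                  # condition 2: one consumer per output
        by_src[c.src] = c
    produced = {c.dst for c in cmds}
    heads = [c for c in cmds if c.src not in produced]
    if len(heads) != 1:
        return None                      # condition 1: exactly one chain start
    chain = [heads[0]]
    while chain[-1].dst in by_src:       # follow destination -> next source
        chain.append(by_src[chain[-1].dst])
    return chain if len(chain) == len(cmds) else None
```

For example, [Cmd("PROJECT", "t1", "out"), Cmd("SELECT", "in", "t1")] is reordered to SELECT then PROJECT, so the selection output can stream straight into projection through an output FIFO.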
Microarchitecture | Inter-operator Bypassing
Multiple 32-bit-wide output FIFOs to other operators
Implementation Evaluation
Timing
Maximum frequency: 55.786 MHz
Critical path: Row Marshaller mux
Area
Slice Registers: 50%, LUTs: 85%, BRAM/FIFOs: 47%
Module             Slice Registers  LUTs         BRAM/FIFOs
TOTAL              34649 (50%)      59328 (85%)  71 (47%)
Row Marshaller     2804             6627         0
Controller         4570             6277         29
Selection          3137             19633        0
Projection         739              654          0
Cartesian Product  1935             1478         0
Union              1939             1983         0
Difference         1875             1949         0
Deduplication      1822             1970         0
Performance Benchmark | Setup
SQLite: the internal SQLite timer reports the execution time of each query; ThinkPad T430, Core i7-3520M @ 2.90 GHz, 1x8GB DDR3-1600
RA Processor: performance counters count cycles from the start to the ack of an operator
Benchmark queries (table, relational algebra query, SQL query):
1 table, 100k x 30
RA: SELECT,starLong,tableOut,mass,>,80000,AND,pos_x,>,10,OR,pos_x,<,pos_z,OR,col12,>,col14,AND,col20,<,col21
SQL: SELECT * FROM starLong WHERE mass > 80000 AND pos_x > 10 OR pos_x < pos_z OR col12 > col14 AND col20 < col21;
1 table, 100k x 30
RA: PROJECT,starLong,tableOut,pos_x,col19,col25,col29
SQL: SELECT pos_x, col19, col25, col29 FROM starLong;
2 tables, 1k x 30
RA: UNION,starMed1,starMed2,starUnion
SQL: SELECT * FROM starMed1 UNION SELECT * FROM starMed2;
2 tables, 1k x 30
RA: XPROD,starMed1,starMed2,starXprod
RENAME,starXprod,0,iOrder0,1,mass0,8,phi0
SELECT,starXprod,starFiltered,iOrder0,=,iOrder,AND,phi0,>,1,AND,mass0,>,mass
PROJECT,starFiltered,starOut,mass0
SQL: SELECT s1.mass FROM starMed1 s1, starMed2 s2 WHERE s1.vx > s2.vx AND s1.phi > 1 AND s1.mass > s2.mass;
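The RA commands are comma-separated token lists. A minimal parser for the SELECT form, assuming AND binds tighter than OR (as in the SQL version of the query), turns the predicate list into OR-ed AND-terms, the sum-of-products shape the Selection unit's gate tree evaluates. The function name and output structure are hypothetical; operands are left as strings because the slides do not say how columns and constants are distinguished.

```python
def parse_select(cmd):
    """Parse 'SELECT,src,dst,lhs,op,rhs,{AND|OR},lhs,op,rhs,...'.

    Returns (src, dst, terms): terms is a list of AND-terms, each a list of
    (lhs, op, rhs) string triples, and the terms are ORed together.
    """
    tokens = [t.strip() for t in cmd.split(",")]
    if tokens[0] != "SELECT" or len(tokens) < 6:
        raise ValueError("not a SELECT command")
    src, dst, rest = tokens[1], tokens[2], tokens[3:]
    terms, current, i = [], [], 0
    while i < len(rest):
        pred = rest[i:i + 3]
        if len(pred) != 3:
            raise ValueError("incomplete predicate")
        current.append(tuple(pred))
        i += 3
        if i < len(rest):                 # a connective follows
            conj = rest[i]
            i += 1
            if i == len(rest):
                raise ValueError("dangling connective")
            if conj == "OR":              # close the current AND-term
                terms.append(current)
                current = []
            elif conj != "AND":
                raise ValueError(f"expected AND or OR, got {conj!r}")
    terms.append(current)
    return src, dst, terms
```

For the first benchmark command this yields three terms, (mass > 80000 AND pos_x > 10), (pos_x < pos_z) and (col12 > col14 AND col20 < col21), within the 4 ORs of 4 ANDs limit.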
Performance Benchmark | Results
[Chart: Query Execution Time for Select, Project, Union, Difference, Xprod, Dedup and Complex Join; FPGA RA Processor vs. SW SQLite; y-axis: time (s), lower is better]
Limitation: memory bandwidth, 200 MB/s on the FPGA vs. 12.8 GB/s on the host
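A back-of-envelope estimate shows how much the bandwidth gap alone matters for the 100k x 30 Select and Project inputs. Assuming every 32-bit entry is read exactly once, the FPGA needs about 60 ms just to stream the 12 MB table at 200 MB/s, versus under 1 ms at 12.8 GB/s, a 64x gap. These are lower bounds derived from the slide's numbers, not measurements.

```python
ROWS, COLS = 100_000, 30      # starLong benchmark table
WORD_BYTES = 4                # 32-bit entries
FPGA_BW = 200e6               # bytes/s, from the limitation on the slide
HOST_BW = 12.8e9              # bytes/s, from the limitation on the slide

def stream_time_s(n_bytes, bandwidth):
    """Minimum time to read n_bytes once at the given bandwidth."""
    return n_bytes / bandwidth

table_bytes = ROWS * COLS * WORD_BYTES
for name, bw in (("FPGA", FPGA_BW), ("host", HOST_BW)):
    print(f"{name}: {stream_time_s(table_bytes, bw) * 1e3:.2f} ms "
          f"to read {table_bytes} bytes")
```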
Performance Benchmark | Results
[Chart: Select (Filter) Execution Time with Varying Number of Predicates (1, 2, 4, 8, 16); FPGA RA Processor vs. SW SQLite; y-axis: time (s), lower is better]
Select operator most competitive with SQLite
What happens with more predicates?
Improvements
Increasing data burst width from 32-bit to 256-bit: potential 8x speedup, at the cost of increased area and critical path
Maximizing memory bandwidth: additional row buffers to buffer data from DDR2 memory
Larger, faster DRAM
Higher clock speed
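The "potential 8x" can be checked against the data path. Assuming one burst beat per clock at the reported 55.786 MHz, a 32-bit path peaks at about 223 MB/s, in the same range as the 200 MB/s limitation on the results slide, while a 256-bit path peaks at about 1.8 GB/s. For the 30-column benchmark rows the gain is 30 beats versus 4 per row, i.e. 7.5x, if rows are not packed across beats so the last 256-bit beat of each row is only partly filled. The one-beat-per-cycle model and the per-row beat count are assumptions for illustration.

```python
import math

CLOCK_HZ = 55.786e6   # maximum frequency from the implementation results
ENTRY_BITS = 32       # width of one table entry

def beats_per_row(width_bits, ncols):
    """Bus transfers needed for one row when each beat carries width_bits."""
    entries_per_beat = width_bits // ENTRY_BITS
    return math.ceil(ncols / entries_per_beat)

def peak_bytes_per_s(width_bits, clock_hz=CLOCK_HZ):
    """Peak data-path throughput, assuming one beat per clock cycle."""
    return clock_hz * width_bits / 8

for width in (32, 256):
    print(f"{width}-bit: {beats_per_row(width, 30)} beats per 30-column row, "
          f"peak {peak_bytes_per_s(width) / 1e6:.0f} MB/s")
```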
Conclusion & Future Work
Complex filtering operations perform well on the FPGA
Better than SQLite with sufficient memory bandwidth
Data-intensive operators do not perform well
Future opportunities:
An accelerator alongside SQLite
Integration with HDD/SSD controller