b17 eliminating the database bottleneck
TRANSCRIPT
![Page 1: B17 Eliminating the database bottleneck](https://reader033.vdocuments.site/reader033/viewer/2022052322/557567bcd8b42a2e248b49e6/html5/thumbnails/1.jpg)
Eliminating the Database Bottleneck What makes Vectorwise so fast
Mark Van de Wiel
Thursday, November 01, 2012
Director Product Management, Vectorwise
1 of 9 1 of 9 1 of 9 1 of 9 1 of 9 Confidential © 2012 Actian Corporation
![Page 2: B17 Eliminating the database bottleneck](https://reader033.vdocuments.site/reader033/viewer/2022052322/557567bcd8b42a2e248b49e6/html5/thumbnails/2.jpg)
Agenda
2
Why traditional RDBMSs are slow for analytics
Why Vectorwise is fast
The I/O challenge
Efficient updates
Confidential © 2012 Actian Corporation
![Page 3: B17 Eliminating the database bottleneck](https://reader033.vdocuments.site/reader033/viewer/2022052322/557567bcd8b42a2e248b49e6/html5/thumbnails/3.jpg)
100x (+) Performance Difference – 2003 Custom C versus Relational Database
Confidential © 2012 Actian Corporation 3
26.2 28.1
0.2 0.6 0
5
10
15
20
25
30
MySQL DBMS 'X' C program Vectorwise
TPC-H 1 GB query 1 (runtime in s)
MySQL DBMS 'X' C program Vectorwise
![Page 4: B17 Eliminating the database bottleneck](https://reader033.vdocuments.site/reader033/viewer/2022052322/557567bcd8b42a2e248b49e6/html5/thumbnails/4.jpg)
Traditional Relational Database for Analytics Inefficiencies
Confidential © 2012 Actian Corporation 4
Inefficient storage
Inefficient processing
![Page 5: B17 Eliminating the database bottleneck](https://reader033.vdocuments.site/reader033/viewer/2022052322/557567bcd8b42a2e248b49e6/html5/thumbnails/5.jpg)
Inefficient Storage for Analytics
Confidential © 2012 Actian Corporation 5
Row-based storage model Predominant in 2003, still very common today
Works well for OLTP
Joe 101 27 Black
Edward 103 21 Scissorhand
![Page 6: B17 Eliminating the database bottleneck](https://reader033.vdocuments.site/reader033/viewer/2022052322/557567bcd8b42a2e248b49e6/html5/thumbnails/6.jpg)
Inefficient Storage – Row-based
Confidential © 2012 Actian Corporation 6
Pages on disk – example
27 Black Joe
21 Scissorhand Edward
101
103
pointers to tuples
Var-width attribute pointers
![Page 7: B17 Eliminating the database bottleneck](https://reader033.vdocuments.site/reader033/viewer/2022052322/557567bcd8b42a2e248b49e6/html5/thumbnails/7.jpg)
Issues with Row-based Storage
Confidential © 2012 Actian Corporation 7
Always read all attributes Poor bandwidth
Poor use of memory buffer
Complex row structure and navigation E.g. compressing out null fields
E.g. row chaining
![Page 8: B17 Eliminating the database bottleneck](https://reader033.vdocuments.site/reader033/viewer/2022052322/557567bcd8b42a2e248b49e6/html5/thumbnails/8.jpg)
Efficient Storage for Analytics
Confidential © 2012 Actian Corporation 8
Columnar storage: store attributes separtely
Retrieve only attributes required by the query
Used by “traditional” column stores, e.g. Sybase IQ, Vertica
![Page 9: B17 Eliminating the database bottleneck](https://reader033.vdocuments.site/reader033/viewer/2022052322/557567bcd8b42a2e248b49e6/html5/thumbnails/9.jpg)
Inefficient Processing
Confidential © 2012 Actian Corporation 9
How a traditional database runs a query
Query: SELECT name, salary*.19 AS tax FROM employee WHERE age > 25
![Page 10: B17 Eliminating the database bottleneck](https://reader033.vdocuments.site/reader033/viewer/2022052322/557567bcd8b42a2e248b49e6/html5/thumbnails/10.jpg)
Inefficient Processing
Confidential © 2012 Actian Corporation 10
How a traditional database runs a query
Tuple-at-a-time iterator interface: - open() - next(): tuple - close() next() is called: - for each operator - for each tuple Complex code repeated over and over
![Page 11: B17 Eliminating the database bottleneck](https://reader033.vdocuments.site/reader033/viewer/2022052322/557567bcd8b42a2e248b49e6/html5/thumbnails/11.jpg)
Inefficient Processing
Confidential © 2012 Actian Corporation 11
How a traditional database runs a query
Data-specific computational functionality Called once for every operation on every tuple Worse for complex tuple representations
![Page 12: B17 Eliminating the database bottleneck](https://reader033.vdocuments.site/reader033/viewer/2022052322/557567bcd8b42a2e248b49e6/html5/thumbnails/12.jpg)
Inefficient Processing (Part 1 of 2)
Confidential © 2012 Actian Corporation 12
Lots of repeated, unnecessary code Operator logic
Function calls
Attribute access
Most instructions interpreting a query
Very few instructions processing actual data!
Many instructions per tuple
![Page 13: B17 Eliminating the database bottleneck](https://reader033.vdocuments.site/reader033/viewer/2022052322/557567bcd8b42a2e248b49e6/html5/thumbnails/13.jpg)
CPU Features – Inefficient Processing Part 2
Confidential © 2012 Actian Corporation 13
In the last 20 years… Chip cache because RAM access is too slow and congested
Branch-sensitive CPU pipelines
Superscalar features
SIMD instructions (SSE and AVX)
Great for multimedia processing, scientific computing…
… but NOT for traditional relational databases Complex code: function calls, branches
Poor use of CPU cache (both data and instructions)
Processing one value at a time
![Page 14: B17 Eliminating the database bottleneck](https://reader033.vdocuments.site/reader033/viewer/2022052322/557567bcd8b42a2e248b49e6/html5/thumbnails/14.jpg)
Inefficient Processing
Confidential © 2012 Actian Corporation 14
Traditional RDBMS Many instructions per tuple
Many cycles per instruction
Very many cycles per tuple
![Page 15: B17 Eliminating the database bottleneck](https://reader033.vdocuments.site/reader033/viewer/2022052322/557567bcd8b42a2e248b49e6/html5/thumbnails/15.jpg)
Vectorwise – Vector-based Processing
Confidential © 2012 Actian Corporation 15
Query: SELECT name, salary*.19 AS tax FROM employee WHERE age > 25
![Page 16: B17 Eliminating the database bottleneck](https://reader033.vdocuments.site/reader033/viewer/2022052322/557567bcd8b42a2e248b49e6/html5/thumbnails/16.jpg)
Vectorwise – Vector-based Processing
Confidential © 2012 Actian Corporation 16
Vector contains data of multiple tuples (1024) All operations consume and produce entire vectors Effect: much less operator.next() and primitive calls. AND: pipelined query evaluation
![Page 17: B17 Eliminating the database bottleneck](https://reader033.vdocuments.site/reader033/viewer/2022052322/557567bcd8b42a2e248b49e6/html5/thumbnails/17.jpg)
Why is Vectorwise so Fast?
Confidential © 2012 Actian Corporation 17
Reduced interpretation overhead 100+ times fewer function calls
Good CPU cache use High locality in primitives
Cache-conscious algorithms
No tuple navigation Primitives only see arrays
Vectorization allows algorithmic optimization CPU and compiler-friendly function bodies
Multiple work units, loop-pipelining, SIMD…
BONUS: PARALLEL QUERY
![Page 18: B17 Eliminating the database bottleneck](https://reader033.vdocuments.site/reader033/viewer/2022052322/557567bcd8b42a2e248b49e6/html5/thumbnails/18.jpg)
Some Numbers
Confidential © 2012 Actian Corporation 18
Traditional RDBMS: <200 MB/s per core
Vectorwise (lab environment): >1.5 GB/s per core
![Page 19: B17 Eliminating the database bottleneck](https://reader033.vdocuments.site/reader033/viewer/2022052322/557567bcd8b42a2e248b49e6/html5/thumbnails/19.jpg)
Addressing the I/O Challenge
Confidential © 2012 Actian Corporation 19
Columnar storage
Smart column buffer (memory)
Data compression On disk: less I/O
In memory: best use of column buffer
Ultra-efficient decompression algorithms to get sufficient throughput
Large contiguous data blocks for optimum disk I/O
In-memory min-max indexes per block (i.e. per column) Eliminate data blocks based on implicit/explicit filter criteria
![Page 20: B17 Eliminating the database bottleneck](https://reader033.vdocuments.site/reader033/viewer/2022052322/557567bcd8b42a2e248b49e6/html5/thumbnails/20.jpg)
Efficient Updates in a Column Store
Confidential © 2012 Actian Corporation 20
Positional Delta Trees (PDTs) In-memory representation of small data changes
Efficiently merged with on-disk data
Periodically propagated to disk
Provide snapshot read consistency
ACID compliant
![Page 21: B17 Eliminating the database bottleneck](https://reader033.vdocuments.site/reader033/viewer/2022052322/557567bcd8b42a2e248b49e6/html5/thumbnails/21.jpg)
Agenda
21
Why traditional RDBMSs are slow for analytics
Why Vectorwise is fast
The I/O challenge
Efficient updates
Confidential © 2012 Actian Corporation
![Page 22: B17 Eliminating the database bottleneck](https://reader033.vdocuments.site/reader033/viewer/2022052322/557567bcd8b42a2e248b49e6/html5/thumbnails/22.jpg)
Confidential © 2012 Actian Corporation