m.kersten 2008 1 monetdb/sql : the challenges of a scientific database, milena ivanova, niels nes,...
TRANSCRIPT
M.Kersten 2008 1
MonetDB/SQL :the Challenges of a Scientific Database,
Milena Ivanova, Niels Nes, Romulo Goncalves, Martin Kersten
CWI, Amsterdam
M.Kersten 2008 3
Outline
• MonetDB/SQL, a column-store• Breaking the Memory Wall in MonetDB
Communication ACM, Dec 2008
• Database Cracking• SkyServer porting• Challenges
M.Kersten 2008 6
John 32 HoustonOK
Early 80s: tuple storage structures for PCs were simple
Mary 31 HoustonOK
Easy to access at the cost of wasted space
Try to keep things simple
M.Kersten 2008 7
Slotted pages Logical pages equated physical pages
32 John Houston
31 Mary Houston
Try to keep things simple
M.Kersten 2008 8
Slotted pages Logical pages equated multiple physical pages
32 John Houston
31 Mary Houston
Try to keep things simple
M.Kersten 2008 10
A column orientation is as simple and acts like an array
Attributes of a tuple are correlated by offset
Avoid moving too much around
M.Kersten 2008 11
• MonetDB Binary Association TablesID Day Discount
10 4/4/98 0.19511 9/4/98 0.06512 1/2/98 0.17513 7/2/98 0
OID ID100 10101 11102 12103 13104 14
OID Day100 4/4/98101 9/4/98102 1/2/98103 7/2/98104 1/2/99
OID Discount100 0.195101 0.065102 0.175103 0104 0.065
Try to keep things simple
M.Kersten 2008 12
Physical data organization• Binary Association Tables
head tail
100 10101 11102 12103 13104 14
Bat Unitfixed size
Densesequence
Memory mappedfiles
Try to avoid doing things twice
M.Kersten 2008 13
• Binary Association Tables acceleratorsOID ID
100 10101 11102 12103 13104 14Hash-based
access
Try to avoid doing things twice
Column properties:key-nessnon-nulldenseordered
M.Kersten 2008 14
• Binary Association Tables storage controlOID ID
100 Amsterdam101 Seattle102 New York103 London104 Paris
ID
AmsterdamSeattleNew YorkLondon
OID ID
100
101
102
103
104
A BAT can be usedas an encoding table
A VID datatypecan be used torepresent denseenumerations
Type remappingsare used to squeezespace
100
Try to avoid doing things twice
M.Kersten 2008 15
• Column orientation benefits datawarehousing
• Brings a much tighter packaging and improves transport through the memory hierarchy
• Each column can be more easily optimized for storage using compression schemes
• Each column can be replicated for read-only access
Mantra: Try to keep things simple
M.Kersten 2008 16
Try to maximize performance
Paste
Present
PotencyMaterialize All Model
Vectorizedmodel
Volcanomodel
M.Kersten 2008 17
Volcano Refresher
Query
SELECT name, salary*.19 AS tax
FROMemployee
WHERE age > 25
Try to maximize performance
M.Kersten 2008 18
Volcano Refresher
Operators
Iterator interface-open()-next(): tuple-close()
Try to maximize performance
M.Kersten 2008 19
• The Volcano model is based on a simple pull-based iterator model for programming relational operators.
• The Volcano model minimizes the amount of intermediate store
• The Volcano model is CPU intensive and inefficient
Try to maximize performance
Volcano paradigm
M.Kersten 2008 20
MonetDB paradigm
• The MonetDB kernel is a programmable relational algebra machine
• Relational operators operate on ‘array’-like structures
• Based on experiences in database machines 25 years ago, RAP, CASSM,…ICL RAP… PRISMA… IDIOMS…
Try to use simple a software pattern
M.Kersten 2008 21
SQL
MonetDB Server
MonetDB Kernel
XQuery
MAL
function user.s3_1():void; X1:bat[:oid,:lng] := sql.bind("sys","photoobjall","objid",0); X6:bat[:oid,:lng] := sql.bind("sys","photoobjall","objid",1); X9:bat[:oid,:lng] := sql.bind("sys","photoobjall","objid",2); X13:bat[:oid,:oid] := sql.bind_dbat("sys","photoobjall",1); X8 := algebra.kunion(X1,X6); X11 := algebra.kdifference(X8,X9); X12 := algebra.kunion(X11,X9); X14 := bat.reverse(X13); X15 := algebra.kdifference(X12,X14); X16 := calc.oid(0@0); X18 := algebra.markT(X15,X16); X19 := bat.reverse(X18); X20 := aggr.count(X19); sql.exportValue(1,"sys.","count_","int",32,0,6,X20,"");end s3_1;
select count(*) from photoobjall;
Try to use simple a software pattern
M.Kersten 2008 22
Operator implementation
• All algebraic operators materialize their result
• Local optimization decisions
• Heavy use of code expansion to reduce cost• 55 selection routines• 149 unary operations• 335 join/group operations• 134 multi-join operations• 72 aggregate operations
Try to use simple a software pattern
M.Kersten 2008 24
Micro-benchmark
MonetDB/SQL 0.34 N 44
MySQL 25.1 N 238
PostgreSQL 10.6 N 1230
Commercial 1 39.0 N 800
Commercial 2 17 N 150
In milliseconds/10KFixed cost in ms
• Keeping the query result in a new table is often too expensive
select * into tmp from tapestry where attr1>=0 and attr1 <=@range
create table tmp( attr0 int, attr1 int);insert into tmp select * from tapestry
where attr1>=0 and attr1 <=@range;
M.Kersten 2008 25
Multi-column tapestry
Experiments ran on Athlon 1.4, Linux
commercial
MonetDB/SQL
#joins
ms
M.Kersten 2008 26
• A column store should be designed from scratch to benefit from its characteristics
• Simulation of a column store on top of an n-ary system using the Volcano model does not work
M.Kersten 2008 27
Try to maximize performance
Paste
Present
Potency
Execution Paradigm
DatabaseStructures
Queryoptimizer
M.Kersten 2008 28
• Applications have different characteristics• Platforms have different characteristics• The actual state of computation is crucial
• A generic all-encompassing optimizer cost-
model does not work
Try to avoid the search space trap
M.Kersten 2008 29
SQL
MonetDB Server
MonetDB Kernel
XQuery
MAL
MAL
Operational optimizer:– Exploit everything you know at runtime– Re-organize if necessary
Try to disambiguate decisions
M.Kersten 2008 30
SQL
MonetDB Server
MonetDB Kernel
XQuery
MAL
MAL
Strategic optimizer:– Exploit the semantics of the language– Rely on heuristics
Operational optimizer:– Exploit everything you know at runtime– Re-organize if necessary
Try to disambiguate decisions
M.Kersten 2008 31
SQL
MonetDB Server
Tactical Optimizer
MonetDB Kernel
XQuery
MAL
MAL
y1:bat[:oid,:dbl]:= bpm.take("sys_photoobjall_ra");y2 := bpm.new(:oid,:oid);barrier rs:= bpm.newIterator(y1,A0,A1); t1:= algebra.uselect(rs,A0,A1); bpm.addSegment(y2,t1);redo rs:= bpm.hasMoreElements(y1,A0,A1);exit rs;
x1:bat[:oid,:dbl]:= sql.bind("sys","photoobjall","ra",0);x14:= algebra.uselect(x1,A0,A1);
Tactical MAL optimizer:– No changes in front-ends and no direct human guidance– Minimal changes in the engine
Try to disambiguate decisions
M.Kersten 2008 32
Code Inliner. Constant Expression Evaluator.
Accumulator Evaluations.Strength Reduction. Common Term Optimizer.
Join Path Optimizer. Ranges Propagation. Operator Cost Reduction. Foreign Key handling. Aggregate Groups.
Code Parallizer. Replication Manager. Result Recycler.
MAL Compiler. Dynamic Query Scheduler. Memo-based Execution. Vector Execution.
Alias Removal. Dead Code Removal. Garbage Collector.
Try to disambiguate decisions
M.Kersten 2008 33
Try to maximize performance
Paste
Present
Potency
Execution Paradigm
DatabaseStructures
Queryoptimizer
M.Kersten 2008 34
Execution paradigms
• The MonetDB kernel is set up to accommodate different execution engines
• The MonetDB assembler program is • Interpreted in the order presented• Interpreted in a dataflow driven manner• Compiled into a C program• Vectorised processing
• X100 project
No data from persistent store to the memory trash
M.Kersten 2008 35
MonetDB/x100
Combine Volcano model withvector processing.
All vectors together should fit the CPU cache
Vectors are compressed
Optimizer should tune this,given the query characteristics.
ColumnBM (buffer manager)
X100 query engine
CPUcache
networkedColumnBM-s
RAM
M.Kersten 2008 36
• Varying the vector size on TPC-H query 1
mysql, oracle,
db2
X100
MonetDB
low IPC, overhead
RAM bandwidth
bound
No data from persistent store to the memory trash
M.Kersten 2008 37
• Vectorized-Volcano processing can be used for both multi-core and distributed processing
• The architecture and the parameters are influenced heavily by• Hardware characteristics• Data distribution to compress columns
No data from persistent store to the memory trash
M.Kersten 2008 38
Does MonetDB stand a ‘real’ test?
Is the main memory orientation a bottleneck?
Is it functionally complete?
The proof of the pudding is in the eating
M.Kersten 2008 39
TPC-H
ATHLON X2 3800+ (2000mhz) 2 disks in raid 0, 2G main memory
TPC-H 60K rows line_item table
Comfortably fit in memoryPerformance in milliseconds
M.Kersten 2008 40
TPC-H
ATHLON X2 3800+ (2000mhz) 2 disks in raid 0, 2G main memory
Scale-factor 16M row line-item table
Out of the box performanceQueries produce empty
or erroneous results
M.Kersten 2008 43
• Code base for MonetDB/SQL is 1.2M lines of C• Nightly regression testing on 17 platforms
M.Kersten 2008 44
If you want to play with a generic column-store, we recommend you download the academic version of Vertica, the commercialization of the C-Store project, or download MonetDB, an open-source column-store.
M.Kersten 2008 45
Try to maximize performance
Paste
Present
Potency
CrackingB-tree, HashIndices
MaterializedViews
M.Kersten 2008 46
• Indices in database systems focus on:
• All tuples are equally important for fast retrieval
• There are ample resources to maintain indices
• MonetDB cracks the database into pieces based on actual query load
Find a trusted fortune teller
M.Kersten 2008
Cracking algorithms
Physical reorganization happens per column based on selection predicates.
Split a piece of a column in two new pieces
A<10
A>=10
A<10
M.Kersten 2008
Cracking algorithms
Physical reorganization happens per column
Split a piece of a column in two new pieces
Split a piece of a column in three new pieces
A<10
A>=10
A<10
5<A<10
A>=10
5<A<10
A<5
M.Kersten 2008
Cracking example
3
8
6
2
15
13
4
17
12
select A>5 and A<10
3
8
6
2
15
13
4
17
12 >=10
>=10
M.Kersten 2008
Cracking example
3
8
6
2
15
13
4
17
12
select A>5 and A<10
3
8
6
2
15
13
4
17
12
>=10
M.Kersten 2008
Cracking example
3
8
6
2
15
13
4
17
12
select A>5 and A<10
3
8
6
2
15
13
4
17
12
>=10
<=5
M.Kersten 2008
Cracking example
3
8
6
2
15
13
4
17
12
select A>5 and A<10
3
8
6
2
15
13
4
17
12
>=10
<=5
M.Kersten 2008
Cracking example
3
8
6
2
15
13
4
17
12
select A>5 and A<10
3
8
6
2
15
13
4
17
12
>=10
<=5
M.Kersten 2008
Cracking example
3
8
6
2
15
13
4
17
12
select A>5 and A<10
3
8
6
2
15
13
4
17
12
>=10
<=5
M.Kersten 2008
Cracking example
3
8
6
2
15
13
4
17
12
select A>5 and A<10
3
8
6
2
15
13
4
17
12
>=10
M.Kersten 2008
Cracking example
3
8
6
2
15
13
4
17
12
select A>5 and A<10
3
8
6
2
15
13
4
17
12
>=10
M.Kersten 2008
Cracking example
3
8
6
2
15
13
4
17
12
select A>5 and A<10
3
8
6
2
15
13
4
17
12
<=5
<=5
M.Kersten 2008
Cracking example
3
8
6
2
15
13
4
17
12
select A>5 and A<10
3
8
6
2
15
13
4
17
12
<=5
>5 and <10
M.Kersten 2008
Cracking example
3
8
6
2
15
13
4
17
12
select A>5 and A<10
3
8
6 2
15
13
4
17
12
<=5
>5 and <10
M.Kersten 2008
Cracking example
3
8
6
2
15
13
4
17
12
select A>5 and A<10
3
8
6 2
15
13
4
17
12
<=5
>5 and <10
M.Kersten 2008
Cracking example
3
8
6
2
15
13
4
17
12
select A>5 and A<10
3
8
6
2
15
13
4
17
12
<=5
>5 and <10
M.Kersten 2008
Cracking example
3
8
6
2
15
13
4
17
12
select A>5 and A<10
3
8
6
2
15
13
4
17
12
>5 and <10
M.Kersten 2008
Cracking example
3
8
6
2
15
13
4
17
12
select A>5 and A<10
3
8
6
2
13
4
17
12
<= 5
>= 10
> 5
15
M.Kersten 2008
Cracking example
3
8
6
2
15
13
4
17
12
select A>5 and A<10
3
8
6
2
15
13
4
17
12
<= 5
>= 10
> 5
Improve data access for
future queries
M.Kersten 2008
Cracking example
3
8
6
2
15
13
4
17
12
select A>5 and A<10
3
8
6
2
15
13
4
17
12
<= 5
>= 10
> 5
Improve data access for
future queries
select A>3 and A<14
M.Kersten 2008
Cracking example
3
8
6
2
15
13
4
17
12
select A>5 and A<10
3
8
6
2
15
13
4
17
12
<= 5
>= 10
> 5
Improve data access for
future queries
select A>3 and A<14
3
8
6
2
15
13
4
17
12
<= 5
>= 10
> 5
M.Kersten 2008
Cracking example
3
8
6
2
15
13
4
17
12
select A>5 and A<10
3
8
6
2
15
13
4
17
12
<= 5
>= 10
> 5
Improve data access for
future queries
select A>3 and A<14
3
8
6
2
15
13
4
17
12
<= 5
>= 10
> 5
M.Kersten 2008
Cracking example
3
8
6
2
15
13
4
17
12
select A>5 and A<10
3
8
6
2
15
13
4
17
12
<= 5
>= 10
> 5
Improve data access for
future queries
select A>3 and A<14
3
8
6
2
15
13
4
17
12
<= 5
>= 10
> 5
M.Kersten 2008
racking example
3
8
6
2
15
13
4
17
12
select A>5 and A<10
3
8
6
2
15
13
4
17
12
<= 5
>= 10
> 5
Improve data access for
future queries
select A>3 and A<14
3
8
6
2
15
13
4
17
12
<= 5
>= 10
> 5
>3 and <14
<=3
M.Kersten 2008
Cracking example
3
8
6
2
15
13
4
17
12
select A>5 and A<10
3
8
6
2
15
13
4
17
12
<= 5
>= 10
> 5
Improve data access for
future queries
select A>3 and A<14
3
8
6
2
15
13
4
17
12
<= 5
>= 10
> 5
>3 and <14
<=3
M.Kersten 2008
Cracking example
3
8
6
2
15
13
4
17
12
select A>5 and A<10
3
8
6
2
15
13
4
17
12
<= 5
>= 10
> 5
Improve data access for
future queries
select A>3 and A<14
3
8
6
2
15
13
4
17
12
<= 5
>= 10
> 5
>3 and <14
<=3
M.Kersten 2008
Cracking example
3
8
6
2
15
13
4
17
12
select A>5 and A<10
3
8
6
2
15
13
4
17
12
<= 5
>= 10
> 5
Improve data access for
future queries
select A>3 and A<14
3
8
6
2
15
13
4
17
12
<= 5
>= 10
> 5
>3 and <14
<=3
M.Kersten 2008
Cracking example
3
8
6
2
15
13
4
17
12
select A>5 and A<10
3
8
6
2
15
13
4
17
12
<= 5
>= 10
> 5
Improve data access for
future queries
select A>3 and A<14
3
8
6
2
15
13
4
17
12
<= 5
>= 10
> 5
<=3
M.Kersten 2008
Cracking example
3
8
6
2
15
13
4
17
12
select A>5 and A<10
3
8
6
2
15
13
4
17
12
<= 5
>= 10
> 5
Improve data access for
future queries
select A>3 and A<14
3
8
6
2
15
13
4
17
12
> 3
>= 10
> 5
<=3
M.Kersten 2008
Cracking example
3
8
6
2
15
13
4
17
12
select A>5 and A<10
3
8
6
2
15
13
4
17
12
<= 5
>= 10
> 5
Improve data access for
future queries
select A>3 and A<14
3
8
6
2
15
13
4
17
12
> 3
>= 10
> 5
<=3
M.Kersten 2008
Cracking example
3
8
6
2
15
13
4
17
12
select A>5 and A<10
3
8
6
2
15
13
4
17
12
<= 5
>= 10
> 5
Improve data access for
future queries
select A>3 and A<14
3
8
6
2
15
13
4
17
12
> 3
>= 10
> 5
<=3
M.Kersten 2008
Cracking example
3
8
6
2
15
13
4
17
12
select A>5 and A<10
3
8
6
2
15
13
4
17
12
<= 5
>= 10
> 5
Improve data access for
future queries
select A>3 and A<14
3
8
6
2
15
13
4
1712
> 3
>= 10
> 5
<=3
M.Kersten 2008
Cracking example
3
8
6
2
15
13
4
17
12
select A>5 and A<10
3
8
6
2
15
13
4
17
12
<= 5
>= 10
> 5
Improve data access for
future queries
select A>3 and A<14
3
8
6
2
15
13
4
1712
> 3
>= 10
> 5
<=3
M.Kersten 2008
Cracking example
3
8
6
2
15
13
4
17
12
select A>5 and A<10
3
8
6
2
15
13
4
17
12
<= 5
>= 10
> 5
Improve data access for
future queries
select A>3 and A<14
3
8
6
2
15
13
4
17
12
> 3
>= 10
> 5
<=3
M.Kersten 2008
Cracking example
3
8
6
2
15
13
4
17
12
select A>5 and A<10
3
8
6
2
15
13
4
17
12
<= 5
>= 10
> 5
Improve data access for
future queries
select A>3 and A<14
3
8
6
2
15
13
4
17
12
> 3
>= 10
> 5
<=3
M.Kersten 2008
Cracking example
3
8
6
2
15
13
4
17
12
select A>5 and A<10
3
8
6
2
15
13
4
17
12
<= 5
>= 10
> 5
Improve data access for
future queries
select A>3 and A<14
3
8
6
2
15
13
4
17
12
> 3
>= 10
> 5
<=3
M.Kersten 2008
Cracking example
3
8
6
2
15
13
4
17
12
select A>5 and A<10
3
8
6
2
15
13
4
17
12
<= 5
>= 10
> 5
Improve data access for
future queries
select A>3 and A<14
3
8
6
2
15
13
4
17
12
> 3
>= 14
> 5
<=3
>=10
M.Kersten 2008
Cracking example
3
8
6
2
15
13
4
17
12
select A>5 and A<10
3
8
6
2
15
13
4
17
12
<= 5
>= 10
> 5
Improve data access for
future queries
select A>3 and A<14
3
8
6
2
15
13
4
17
12
> 3
>= 14
> 5
<=3
>=10
M.Kersten 2008
Cracking example
3
8
6
2
15
13
4
17
12
select A>5 and A<10
3
8
6
2
15
13
4
17
12
<= 5
>= 10
> 5
Improve data access for
future queries
select A>3 and A<14
3
8
6
2
15
13
4
17
12
>3
>= 14
> 5
<=3
>=10
The more we crack the more
we learn
M.Kersten 2008
Design
The first time a range query is posed on an attribute A, a cracking DBMS makes a copy of column A, called the cracker column of A A cracker column is continuously physically reorganized based on queries that need to touch attribute such as the result is in a contiguous space
For each cracker column, there is a cracker index
Cracker Index
Cracker Column
M.Kersten 2008 91
• Cracking is easy in a column store and is part of the critical execution path
• Cracking works under high volume updates
Try to avoid useless investments
M.Kersten 2008
Updates
Base columns are updated as normally
We need to update the cracker column and the cracker index
Efficiently
Maintain the self-organization properties
Two issues: When How
M.Kersten 2008
When to propagate updates in cracking
Follow the workload to maintain self-organization
Updates become part of query processing
When an update arrives, it is not applied
For each cracker column there is a pending insertions column and a pending deletions column
Pending updates are applied only when a query needs the specific values
M.Kersten 2008
Updates aware select
We extended the cracker select operator to apply the needed updates before cracking
The select operator:1. Search the pending insertions column2. Search the pending deletions column3. If Steps 1 or 2 find tuples run an update algorithm4. Search the cracker index5. Physically reorganize the cracker column6. Update the cracker index7. Return a slice of the cracker column
M.Kersten 2008
Merging
7
2
10
29
25
31
57
42
53
Start position: 7values: >35
Start position: 4values: >12
Start position: 1values: >1
Insert a new tuple with value 9
The new tuple belongs to the blue piece 9
M.Kersten 2008
Merging
7
2
10
29
25
31
57
42
53
Start position: 8values: >35
Start position: 5values: >12
Start position: 1values: >1
Insert a new tuple with value 9
The new tuple belongs to the blue piece
9
Pieces in the cracker column are ordered
Tuples inside a piece are not ordered
Shifting is not a viable solution
M.Kersten 2008
Merging by Hopping
7
2
10
29
25
31
42
53 Start position: 8values: >35
Start position: 4values: >12
Start position: 1values: >1
57
9Insert a new tuple with value 9
We need to make enough room to fit the new tuples
M.Kersten 2008
Merge Gradually
A query merges only the qualifying values, i.e., only the values that it needs for a correct and complete result
Average cost increases significantly
We avoid the large peaks but...
Merge CompletelyMerge Gradually
M.Kersten 2008
The Ripple
7
2
10
29
25
31
57
42
53
Start position: 7values: >35
Start position: 4values: >22
Start position: 1values: >1
Touch only the pieces that are relevant for the current query
M.Kersten 2008
The Ripple
7
2
10
29
25
31
57
42
53
Start position: 7values: >35
Start position: 4values: >22
Start position: 1values: >1
Select 7<= A< 15Touch only the pieces that are relevant for the current query
M.Kersten 2008
The Ripple
7
2
10
29
25
31
57
42
53
Start position: 7values: >35
Start position: 4values: >22
Start position: 1values: >1
Select 7<= A< 15
5
9
16
35
Pending insertions
Touch only the pieces that are relevant for the current query
M.Kersten 2008
The Ripple
7
2
10
29
25
31
57
42
53
Start position: 7values: >35
Start position: 4values: >22
Start position: 1values: >1 5
9
16
35
Pending insertions
Touch only the pieces that are relevant for the current querySelect 7<= A< 15
M.Kersten 2008
The Ripple
7
2
10
29
25
31
57
42
53
Start position: 7values: >35
Start position: 4values: >22
Start position: 1values: >1 5
9
16
35
Pending insertions
Touch only the pieces that are relevant for the current querySelect 7<= A< 15
M.Kersten 2008
The Ripple
7
2
10
29
25
31
57
42
53
Start position: 7values: >35
Start position: 4values: >22
Start position: 1values: >1 5
9
16
35
Pending insertions
Touch only the pieces that are relevant for the current querySelect 7<= A< 15
M.Kersten 2008
The Ripple
7
2
10
29
25
31
57
42
53
Start position: 7values: >35
Start position: 4values: >22
Start position: 1values: >1 5
9
16
35
Pending insertions
Touch only the pieces that are relevant for the current query
Avoid shifting down non interesting pieces
Select 7<= A< 15
M.Kersten 2008
The Ripple
7
2
10
29
25
31
57
42
53
Start position: 7values: >35
Start position: 4values: >22
Start position: 1values: >1 5
9
16
35
Pending insertions
Touch only the pieces that are relevant for the current query
Avoid shifting down non interesting pieces
Select 7<= A< 15
M.Kersten 2008
The Ripple
7
2
1029
25
31
57
42
53
Start position: 7values: >35
Start position: 4values: >22
Start position: 1values: >1 5
9
16
35
Pending insertions
Touch only the pieces that are relevant for the current query
Immediately make room for the new tuples
Avoid shifting down non interesting pieces
Select 7<= A< 15
M.Kersten 2008
The Ripple
7
2
1029
25
31
57
42
53
Start position: 7values: >35
Start position: 4values: >22
Start position: 1values: >1 5
916
35
Pending insertions
Touch only the pieces that are relevant for the current query
Immediately make room for the new tuples
Avoid shifting down non interesting pieces
Select 7<= A< 15
M.Kersten 2008
The Ripple
7
2
1029
25
31
57
42
53
Start position: 7values: >35
Start position: 4values: >22
Start position: 1values: >1 5
916
35
Pending insertions
Touch only the pieces that are relevant for the current query
Immediately make room for the new tuples
Avoid shifting down non interesting pieces
Select 7<= A< 15
M.Kersten 2008
The Ripple
7
2
10
25
31
57
42
53
Start position: 7values: >35
Start position: 4values: >22
Start position: 1values: >1 5
916
35
Pending insertions
29
Touch only the pieces that are relevant for the current query
Immediately make room for the new tuples
Avoid shifting down non interesting pieces
Select 7<= A< 15
M.Kersten 2008
The Ripple
7
2
10
25
31
57
42
53
Start position: 7values: >35
Start position: 5values: >22
Start position: 1values: >1 5
916
35
Pending insertions
29
Touch only the pieces that are relevant for the current query
Immediately make room for the new tuples
Avoid shifting down non interesting pieces
Select 7<= A< 15
M.Kersten 2008
The Ripple
7
2
10
25
31
57
42
53
Start position: 7values: >35
Start position: 5values: >22
Start position: 1values: >1 5
916
35
Pending insertions
29
Touch only the pieces that are relevant for the current query
Immediately make room for the new tuples
Avoid shifting down non interesting pieces
Select 7<= A< 15
M.Kersten 2008
The Ripple
Maintain high performance through the whole query sequence in a self-organizing way
M.Kersten 2008
The Ripple
Maintain high performance through the whole query sequence in a self-organizing way
Merge Gradually Merge Completely
Merge Ripple
M.Kersten 2008 115
MonetDB/SQL Meets SkyServer:the Challenges of a Scientific Database,
Milena Ivanova, Niels Nes, Romulo Goncalves, Martin Kersten
CWI, Amsterdam
M.Kersten 2008 116
SkyServer with MonetDB
Goal: To provide SkyServer mirror with similar functionality using MonetDB
Three phases: 1%, 10%, entire SDSS data setCan we • Do better in terms of performance and functionality?• Improve query processing by novel parallelism and query
cracking techniques?• Extend functionality to support, e.g. LOFAR?
M.Kersten 2008 117
Portability Lessons
• Need for rich SQL environment (PSM)
• Cast to SQL:2003 standard• Replacement of data types and operations• Specific extensions ignored or replaced
• Avoid data redundancy• Auxiliary tables replaced by views:10% size
reduction
M.Kersten 2008 118
SkyServer Schema
446 columns377 million rows
Vertical fragment of 100+ popular columns
Materialized join of Photo and Spectra
6 columns12B rows
M.Kersten 2008 119
Spatial Search Lesson
• HTM (Hierarchical Triangular Mesh)• Implemented in C++, C#• Good for point-near-point and point-in-region queries
• Zones• Implemented in SQL• Good for point-near-point (x3)• Efficient for batch-oriented spatial join(x32)• Enables SQL optimizer usage
M.Kersten 2008 120
Query Log Lessons
• Query logs important for both application and science
• Analysed 1.2M queries, August 2006• Spatial access prevails (83%)• Small core of photo and spectro tables
accessed• 64% photo, 44% spectro, 27% both
M.Kersten 2008 121
Common Patterns
• Limited number of query patterns • Correlation to web site interface
• Most popular query (25%)SELECT top 10 p.objID, p.run, p.rerun, p.camcol, p.field,
p.obj, p.type, p.ra, p.dec, p.u, p.g, p.r, p.i, p.z, p.Err_u, p.Err_g, p.Err_r, p.Err_i, p.Err_z
FROM fGetNearbyObjEq(195,2.5,3) n, PhotoPrimary pWHERE n.objID = p.objID;
M.Kersten 2008 123
Spatial Overlap
• 24% queries overlap• Mean sequence length of 9.4,
max of 6200• Overlap and equality patterns for script-based
interaction• Zoom in/zoom out patterns for manual
interaction
M.Kersten 2008 125
Evaluation on 100GB
• ‘Color-cut’ for low-z quasarsSELECT g, run, rerun, camcol, field, objID, FROM GalaxyWHERE ( ( g <= 22) and
(u - g >= -0.27) and (u - g < 0.71) and (g - r >= -0.24) and (g - r < 0.35) and(r - i >= -0.27) and (r - i < 0.57) and(i - z >= -0.35) and (i - z < 0.7) );
• Moving asteroidsSELECT objID,
sqrt(power(rowv,2) + power(colv,2)) as velocityFROM PhotoObjWHERE power(rowv,2) + power(colv,2) > 50
and rowv >= 0 and colv >= 0;
1135,532
405
Quasars Asteroids
Tim
e in
se
c.
MonetDB MS SQL
M.Kersten 2008 126
Status
• Staircase to the sky• Jun 07 1GB• Jan 08 100GB• Oct 08 Entire DR6:
loaded and running MonetDB MS SQL 1.6 TB 2.6 TB Q2 0:18 0:01 Q3 5:32 2:47 Q4 1:14 2:46 Q5 1:09 10:15 Q6 1:03 0:10 Q7 0:40 0:01 Q8 0:26 1:05 9:56 16:34
M.Kersten 2008 127
Scale out to the Petabyte
• Large compute platforms are not well used by database engines; they have static designs
• Develop novel techniques to partial replicate hot data throughout a cluster/network of thousands of processors adaptively
• Replicate – Compute- Aggregate query plans to reach for massive parallel processing under convergence criteria (answer comes given enough time and resources)
M.Kersten 2008 128
Scale out to the Petabyte
• Loading scientific data is a major cost
• Develop adaptive caching techniques that interface with external sources and map them to database structures on the fly
• NETCDF, HDF2
M.Kersten 2008 129
Arrays as first class citizens
• The set-based computational model underlying SQL is insufficient, extend it to include the (sparse) array model
• Matlab-like operations scale to TB with just in time processingCREATE ARRAY A1(idx INTEGER DIMENSION[4],val FLOAT);
CREATE ARRAY A1(idx INTEGER NOT NULL PRIMARY KEYCHECK(idx>=0 AND idx<4),val FLOAT);
CREATE SEQUENCE range AS INTEGERSTART WITH 0INCREMENT BY 1MAXVALUE 3 ;
CREATE ARRAY A3 (idx range),val FLOAT);
M.Kersten 2008 131
• MonetDB Research & Development line• XQuery and Information Retrieval• Astronomy databases (SkyServer)• Event Stream Engine• Relational Cache Effectiveness• Multi-core Parallelism• DataStorage Rings• RDF Engine• ….
• Array Databases become en vogue
M.Kersten 2008 132
Summary
• MonetDB is a mature column store
• Cracking based on physical re-organization in the critical path can be made to work
• Let’s scale up to 100TB database(!) processing
M.Kersten 2008 134
Martin Kersten Peter Boncz Niels Nes Stefan Manegold Fabian Groffen Sjoerd Mullender Steffen GoeldnerArjen de VriesMenzo WindhouwerTim RuhlRomulo Goncalves
Jan Rittinger Wouter Alink Jennie Zhang Stratos IdreosErietta LiarouLefteris SidirourgosFlorian Waas Albrecht Schmidt Jonas Karlsson Martin van Dinther Peter Bosch Carel van den Berg Wilco Quak
Acknowledgements
Alex van Ballegooij Johan List Georgina Ramirez Marcin Zukowski Roberto CornacchiaSandor HemanTorsten Grust Jens Teubner Maurice van Keulen Jan Flokstra Milena Ivanova
MonetDB/{SQL,XQuery} open-source platformMonetDB/SQL Version 5 experimentation platformMonetDB/X100 aimed at cpu & IO squeezingMonetDB/RAM aimed at IR and science DBMonetDB/Armada aimed at evolving databasesMonetDB/SkyServer portal for Astronomy