M.Kersten 2008 1 MonetDB/SQL : the Challenges of a Scientific Database, Milena Ivanova, Niels Nes, Romulo Goncalves, Martin Kersten CWI, Amsterdam


M.Kersten 2008 1

MonetDB/SQL: the Challenges of a Scientific Database,

Milena Ivanova, Niels Nes, Romulo Goncalves, Martin Kersten

CWI, Amsterdam

Outline

• MonetDB/SQL, a column-store
• Breaking the Memory Wall in MonetDB (Communications of the ACM, Dec 2008)
• Database Cracking
• SkyServer porting
• Challenges

[Figure: storage models on a Past / Present / Potency timeline: N-ary stores, PAX stores, column stores]

Early 80s: tuple storage structures for PCs were simple

John  32  Houston  OK
Mary  31  Houston  OK

Easy to access at the cost of wasted space

Try to keep things simple

Slotted pages
Logical pages equated to physical pages

32  John  Houston
31  Mary  Houston

Try to keep things simple

Slotted pages
Logical pages equated to multiple physical pages

32  John  Houston
31  Mary  Houston

Try to keep things simple

Not all attributes are equally important

Avoid things you don’t always need

A column orientation is just as simple and acts like an array

Attributes of a tuple are correlated by offset

Avoid moving too much around

• MonetDB Binary Association Tables

ID  Day     Discount
10  4/4/98  0.195
11  9/4/98  0.065
12  1/2/98  0.175
13  7/2/98  0

OID  ID      OID  Day       OID  Discount
100  10      100  4/4/98    100  0.195
101  11      101  9/4/98    101  0.065
102  12      102  1/2/98    102  0.175
103  13      103  7/2/98    103  0
104  14      104  1/2/99    104  0.065

Try to keep things simple
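The decomposition into Binary Association Tables can be sketched in a few lines of Python. This is only an illustration, not MonetDB code: the helper names `decompose` and `reconstruct` and the base OID of 100 are assumptions taken from the slide's example.

```python
def decompose(rows, names, base_oid=100):
    """Split n-ary rows into one (oid, value) table per attribute."""
    return {
        name: [(base_oid + i, row[col]) for i, row in enumerate(rows)]
        for col, name in enumerate(names)
    }


def reconstruct(bats, names, oid, base_oid=100):
    """Rebuild one tuple: with dense OIDs the offset is simply oid - base_oid."""
    offset = oid - base_oid
    return tuple(bats[name][offset][1] for name in names)


names = ["ID", "Day", "Discount"]
rows = [
    (10, "4/4/98", 0.195),
    (11, "9/4/98", 0.065),
    (12, "1/2/98", 0.175),
    (13, "7/2/98", 0.0),
    (14, "1/2/99", 0.065),
]
bats = decompose(rows, names)
row_103 = reconstruct(bats, names, 103)
```

Because the OID column is a dense sequence, it never has to be searched: the attributes of a tuple sit at the same offset in every column.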

Physical data organization
• Binary Association Tables

head  tail
100   10
101   11
102   12
103   13
104   14

BAT unit: fixed size
Dense sequence
Memory mapped files

Try to avoid doing things twice

• Binary Association Tables accelerators

OID  ID
100  10
101  11
102  12
103  13
104  14

Hash-based access

Column properties:
• key-ness
• non-null
• dense
• ordered

Try to avoid doing things twice

• Binary Association Tables storage control

OID  ID
100  Amsterdam
101  Seattle
102  New York
103  London
104  Paris

[Figure: the ID values (Amsterdam, Seattle, New York, London) shown as a separate table next to an OID column 100 to 104]

A BAT can be used as an encoding table
A VID datatype can be used to represent dense enumerations
Type remappings are used to squeeze space

Try to avoid doing things twice
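Using a table as an encoding table is, in essence, dictionary encoding. The sketch below is a generic illustration in Python, not MonetDB's implementation: the functions `encode` and `decode` are hypothetical, and the repeated city names at the end of the list are added only to show the space saving.

```python
def encode(values):
    """Replace each value by a small integer code; return (dictionary, codes)."""
    dictionary = []        # position in this list is the code
    code_of = {}
    codes = []
    for value in values:
        if value not in code_of:
            code_of[value] = len(dictionary)
            dictionary.append(value)
        codes.append(code_of[value])
    return dictionary, codes


def decode(dictionary, codes):
    """Look every code up in the encoding table again."""
    return [dictionary[code] for code in codes]


cities = ["Amsterdam", "Seattle", "New York", "London", "Paris",
          "Seattle", "Amsterdam"]
dictionary, codes = encode(cities)
```

The codes form a dense enumeration, so the encoding table needs no stored key column: a code is its own position.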

• Column orientation benefits data warehousing

• Brings a much tighter packaging and improves transport through the memory hierarchy

• Each column can be more easily optimized for storage using compression schemes

• Each column can be replicated for read-only access

Mantra: Try to keep things simple

Try to maximize performance

[Figure: execution models on a Past / Present / Potency timeline: Volcano model, Materialize All model, Vectorized model]

Volcano Refresher

Query

SELECT name, salary*.19 AS tax
FROM employee
WHERE age > 25

Try to maximize performance

Volcano Refresher

Operators

Iterator interface
- open()
- next(): tuple
- close()

Try to maximize performance
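A minimal Python sketch of the iterator interface, applied to the refresher query. The classes `Scan`, `Select` and `Project`, the helper `run` and the three employee rows are all made up for illustration; only the open()/next()/close() protocol and the query come from the slides.

```python
class Scan:
    """Leaf operator: hands out stored tuples one at a time."""
    def __init__(self, rows):
        self.rows = rows
    def open(self):
        self.pos = 0
    def next(self):
        if self.pos >= len(self.rows):
            return None                      # end of stream
        row = self.rows[self.pos]
        self.pos += 1
        return row
    def close(self):
        pass


class Select:
    """Pulls from its child until a tuple satisfies the predicate."""
    def __init__(self, child, predicate):
        self.child, self.predicate = child, predicate
    def open(self):
        self.child.open()
    def next(self):
        while True:
            row = self.child.next()
            if row is None or self.predicate(row):
                return row
    def close(self):
        self.child.close()


class Project:
    """Applies a function to every tuple pulled from its child."""
    def __init__(self, child, fn):
        self.child, self.fn = child, fn
    def open(self):
        self.child.open()
    def next(self):
        row = self.child.next()
        return None if row is None else self.fn(row)
    def close(self):
        self.child.close()


def run(plan):
    """Drive the plan from the root: one next() call per result tuple."""
    plan.open()
    out = []
    while (row := plan.next()) is not None:
        out.append(row)
    plan.close()
    return out


# (name, age, salary) tuples; hypothetical data
employee = [("John", 32, 1000), ("Mary", 31, 2000), ("Tim", 20, 3000)]
plan = Project(Select(Scan(employee), lambda r: r[1] > 25),
               lambda r: (r[0], r[2] * .19))
result = run(plan)
```

Every result tuple costs a chain of next() calls through the whole operator tree, which is where the per-tuple interpretation overhead of this model comes from.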

Volcano paradigm

• The Volcano model is based on a simple pull-based iterator model for programming relational operators.

• The Volcano model minimizes the amount of intermediate storage

• The Volcano model is CPU intensive and inefficient

Try to maximize performance

MonetDB paradigm

• The MonetDB kernel is a programmable relational algebra machine

• Relational operators operate on ‘array’-like structures

• Based on experiences in database machines 25 years ago: RAP, CASSM, … ICL RAP … PRISMA … IDIOMS …

Try to use a simple software pattern

[Figure: software stack: SQL and XQuery front-ends, MAL, MonetDB Server, MonetDB Kernel]

select count(*) from photoobjall;

function user.s3_1():void;
    X1:bat[:oid,:lng] := sql.bind("sys","photoobjall","objid",0);
    X6:bat[:oid,:lng] := sql.bind("sys","photoobjall","objid",1);
    X9:bat[:oid,:lng] := sql.bind("sys","photoobjall","objid",2);
    X13:bat[:oid,:oid] := sql.bind_dbat("sys","photoobjall",1);
    X8 := algebra.kunion(X1,X6);
    X11 := algebra.kdifference(X8,X9);
    X12 := algebra.kunion(X11,X9);
    X14 := bat.reverse(X13);
    X15 := algebra.kdifference(X12,X14);
    X16 := calc.oid(0@0);
    X18 := algebra.markT(X15,X16);
    X19 := bat.reverse(X18);
    X20 := aggr.count(X19);
    sql.exportValue(1,"sys.","count_","int",32,0,6,X20,"");
end s3_1;

Try to use a simple software pattern

Operator implementation

• All algebraic operators materialize their result

• Local optimization decisions

• Heavy use of code expansion to reduce cost
  • 55 selection routines
  • 149 unary operations
  • 335 join/group operations
  • 134 multi-join operations
  • 72 aggregate operations

Try to use a simple software pattern
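To make "all algebraic operators materialize their result" concrete, here is a hedged Python sketch of the earlier refresher query (name, salary*.19 where age > 25) evaluated one column at a time. The operator names `select_gt`, `fetch` and `multiply` and the sample columns are assumptions for illustration; they are not MAL operators.

```python
def select_gt(column, threshold):
    """Return the positions (oids) of all values greater than threshold."""
    return [oid for oid, value in enumerate(column) if value > threshold]


def fetch(column, oids):
    """Materialize the values found at the given positions."""
    return [column[oid] for oid in oids]


def multiply(column, factor):
    """Materialize a new column with every value scaled."""
    return [value * factor for value in column]


# One array per attribute; hypothetical data
name = ["John", "Mary", "Tim"]
age = [32, 31, 20]
salary = [1000, 2000, 3000]

qualifying = select_gt(age, 25)                  # intermediate result 1
names = fetch(name, qualifying)                  # intermediate result 2
tax = multiply(fetch(salary, qualifying), .19)   # intermediate results 3 and 4
```

Each operator is a tight loop over an array and is called once per column rather than once per tuple; the price is that every intermediate result is stored in full.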

Micro-benchmark

• Keeping the query result in a new table is often too expensive

select * into tmp from tapestry where attr1>=0 and attr1 <=@range

create table tmp( attr0 int, attr1 int);
insert into tmp select * from tapestry
    where attr1>=0 and attr1 <=@range;

System        In milliseconds/10K   Fixed cost in ms
MonetDB/SQL    0.34 N                 44
MySQL         25.1 N                 238
PostgreSQL    10.6 N                1230
Commercial 1  39.0 N                 800
Commercial 2  17 N                   150

Multi-column tapestry

[Figure: time in ms against #joins, commercial system vs. MonetDB/SQL]

Experiments ran on Athlon 1.4, Linux

• A column store should be designed from scratch to benefit from its characteristics

• Simulation of a column store on top of an n-ary system using the Volcano model does not work

Try to maximize performance

[Figure: Past / Present / Potency timeline with three topics: Database Structures, Execution Paradigm, Query optimizer]

• Applications have different characteristics
• Platforms have different characteristics
• The actual state of computation is crucial

• A generic all-encompassing optimizer cost model does not work

Try to avoid the search space trap

[Figure: software stack: SQL and XQuery front-ends, MAL, MonetDB Server, MAL, MonetDB Kernel]

Strategic optimizer:
– Exploit the semantics of the language
– Rely on heuristics

Operational optimizer:
– Exploit everything you know at runtime
– Re-organize if necessary

Try to disambiguate decisions

[Figure: software stack: SQL and XQuery front-ends, MAL, MonetDB Server, Tactical Optimizer, MAL, MonetDB Kernel]

y1:bat[:oid,:dbl] := bpm.take("sys_photoobjall_ra");
y2 := bpm.new(:oid,:oid);
barrier rs := bpm.newIterator(y1,A0,A1);
    t1 := algebra.uselect(rs,A0,A1);
    bpm.addSegment(y2,t1);
    redo rs := bpm.hasMoreElements(y1,A0,A1);
exit rs;

x1:bat[:oid,:dbl] := sql.bind("sys","photoobjall","ra",0);
x14 := algebra.uselect(x1,A0,A1);

Tactical MAL optimizer:
– No changes in front-ends and no direct human guidance
– Minimal changes in the engine

Try to disambiguate decisions

Code Inliner. Constant Expression Evaluator.

Accumulator Evaluations. Strength Reduction. Common Term Optimizer.

Join Path Optimizer. Ranges Propagation. Operator Cost Reduction. Foreign Key handling. Aggregate Groups.

Code Parallelizer. Replication Manager. Result Recycler.

MAL Compiler. Dynamic Query Scheduler. Memo-based Execution. Vector Execution.

Alias Removal. Dead Code Removal. Garbage Collector.

Try to disambiguate decisions

Try to maximize performance

[Figure: Past / Present / Potency timeline with three topics: Database Structures, Execution Paradigm, Query optimizer]

Execution paradigms

• The MonetDB kernel is set up to accommodate different execution engines

• The MonetDB assembler program is
  • Interpreted in the order presented
  • Interpreted in a dataflow driven manner
  • Compiled into a C program
  • Vectorised processing
    • X100 project

No data from persistent store to the memory trash

MonetDB/X100

Combine Volcano model with vector processing.

All vectors together should fit the CPU cache

Vectors are compressed

Optimizer should tune this, given the query characteristics.

[Figure: X100 query engine working out of the CPU cache, above RAM and the ColumnBM buffer manager, with networked ColumnBM-s]
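The combination of Volcano-style pulling with vector processing can be sketched with Python generators. This is an illustration of the idea only, not X100 code: the operator names `vectors`, `select_gt`, `scale` and `total`, the data and the vector sizes are assumptions.

```python
def vectors(column, vector_size):
    """Scan: hand out the column one vector (slice) at a time."""
    for start in range(0, len(column), vector_size):
        yield column[start:start + vector_size]


def select_gt(vector_stream, threshold):
    """Selection: one call processes a whole vector, not a single tuple."""
    for vec in vector_stream:
        yield [v for v in vec if v > threshold]


def scale(vector_stream, factor):
    """Map: multiply every value of every vector."""
    for vec in vector_stream:
        yield [v * factor for v in vec]


def total(vector_stream):
    """Aggregate: consume the stream and keep only a running sum."""
    return sum(sum(vec) for vec in vector_stream)


values = list(range(20, 40))
# Pull-based pipeline, as in Volcano, but each pull moves up to 4 values
answer = total(scale(select_gt(vectors(values, 4), 25), 2))
# The result does not depend on the vector size; only the cost profile does
same_answer = total(scale(select_gt(vectors(values, 7), 25), 2))
```

With a vector size of 1 this degenerates to tuple-at-a-time processing; with a vector as large as the column it becomes full materialization. The vector size is the knob the slide asks the optimizer to tune.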

• Varying the vector size on TPC-H query 1

[Figure: TPC-H query 1 performance against vector size for X100 and MonetDB, with mysql, oracle and db2 for comparison; annotations: "low IPC, overhead" and "RAM bandwidth bound"]

No data from persistent store to the memory trash

• Vectorized-Volcano processing can be used for both multi-core and distributed processing

• The architecture and the parameters are influenced heavily by
  • Hardware characteristics
  • Data distribution to compress columns

No data from persistent store to the memory trash

Does MonetDB stand a ‘real’ test?

Is the main memory orientation a bottleneck?

Is it functionally complete?

The proof of the pudding is in the eating

TPC-H

ATHLON X2 3800+ (2000 MHz), 2 disks in RAID 0, 2 GB main memory

TPC-H 60K rows line_item table

Comfortably fit in memory
Performance in milliseconds

TPC-H

ATHLON X2 3800+ (2000 MHz), 2 disks in RAID 0, 2 GB main memory

Scale-factor 1, 6M row line-item table

Out of the box performance
Queries produce empty or erroneous results


• Code base for MonetDB/SQL is 1.2M lines of C
• Nightly regression testing on 17 platforms


If you want to play with a generic column-store, we recommend you download the academic version of Vertica, the commercialization of the C-Store project, or download MonetDB, an open-source column-store.

Try to maximize performance

[Figure: Past / Present / Potency timeline: B-tree and hash indices, materialized views, cracking]

• Indices in database systems assume that:

  • All tuples are equally important for fast retrieval

  • There are ample resources to maintain indices

• MonetDB cracks the database into pieces based on actual query load

Find a trusted fortune teller

Cracking algorithms

Physical reorganization happens per column based on selection predicates.

Split a piece of a column in two new pieces: A<10 | A>=10

Split a piece of a column in three new pieces: A<5 | 5<A<10 | A>=10
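Both splits amount to an in-place partition of one piece of the column. The Python sketch below shows one way to write them; the function names and the `inclusive` flag are assumptions, and the bounds follow the example query "A>5 and A<10", so the three pieces are A<=5, 5<A<10 and A>=10.

```python
def crack_in_two(col, lo, hi, pivot, inclusive=False):
    """Partition col[lo:hi] in place and return where the right piece starts.

    The left piece receives values < pivot (<= pivot when inclusive is True).
    """
    i, j = lo, hi - 1
    while i <= j:
        goes_left = col[i] <= pivot if inclusive else col[i] < pivot
        if goes_left:
            i += 1
        else:
            col[i], col[j] = col[j], col[i]   # move the value to the right piece
            j -= 1
    return i


def crack_in_three(col, lo, hi, low, high):
    """Pieces: values <= low | low < values < high | values >= high."""
    mid_start = crack_in_two(col, lo, hi, low, inclusive=True)
    mid_end = crack_in_two(col, mid_start, hi, high)
    return mid_start, mid_end


column = [3, 8, 6, 2, 15, 13, 4, 17, 12]
start, end = crack_in_three(column, 0, len(column), 5, 10)
answer = column[start:end]        # the values with 5 < A < 10, contiguous
```

The values inside a piece stay unordered; only the piece boundaries carry information, which keeps each crack a single linear pass.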

Cracking example

Column A (values as shown on the slides): 3, 8, 6, 2, 15, 13, 4, 17, 12

[Animation: the query "select A>5 and A<10" is answered on a copy of the column. Step by step, the copy is physically reorganized into three pieces labelled "<=5", ">5 and <10" and ">=10"; the middle piece is the query result.]

Improve data access for future queries

[Animation: a second query, "select A>3 and A<14", reuses the existing pieces. The piece "<=5" is split into "<=3" and ">3", and the piece ">=10" is split into ">=10" and ">=14"; the piece ">5" in between is left as it is.]

The more we crack the more we learn

Design

The first time a range query is posed on an attribute A, a cracking DBMS makes a copy of column A, called the cracker column of A.

A cracker column is continuously physically reorganized based on the queries that touch attribute A, such that each result is in a contiguous space.

For each cracker column, there is a cracker index.

[Figure: Cracker Index pointing into the Cracker Column]
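A compact Python sketch of this design, under stated assumptions: the class name `CrackerColumn`, the representation of the cracker index as two parallel sorted lists, and the restriction to open range queries (low < A < high) are choices made for illustration, not MonetDB's data structures.

```python
import bisect


class CrackerColumn:
    def __init__(self, base_column):
        self.col = list(base_column)   # the copy; the base column stays untouched
        self.keys = []                 # cracker index: sorted (value, inclusive) bounds
        self.pos = []                  # pos[i]: first position to the right of keys[i]

    def _crack(self, value, inclusive):
        """Make sure a piece boundary exists at `value`; return its position."""
        key = (value, 1 if inclusive else 0)
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.pos[i]                     # an earlier query already cracked here
        lo = self.pos[i - 1] if i > 0 else 0       # only one piece needs to be touched
        hi = self.pos[i] if i < len(self.keys) else len(self.col)
        a, b = lo, hi - 1
        while a <= b:
            v = self.col[a]
            if v < value or (inclusive and v == value):
                a += 1
            else:
                self.col[a], self.col[b] = self.col[b], self.col[a]
                b -= 1
        self.keys.insert(i, key)
        self.pos.insert(i, a)
        return a

    def select(self, low, high):
        """Answer low < A < high as one contiguous slice of the cracker column."""
        start = self._crack(low, inclusive=True)
        end = self._crack(high, inclusive=False)
        return self.col[start:end]


base = [3, 8, 6, 2, 15, 13, 4, 17, 12]
cracker = CrackerColumn(base)
first = cracker.select(5, 10)     # cracks the whole column
second = cracker.select(3, 14)    # only cracks the two outer pieces further
```

Each query leaves the index with more boundaries, so later queries partition ever smaller pieces. A repeated query finds both of its bounds in the index and returns its slice without moving any data.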

A simple range query

Try to avoid useless investments

TPC-H query 6

Try to avoid useless investments


• Cracking is easy in a column store and is part of the critical execution path

• Cracking works under high volume updates

Try to avoid useless investments

Updates

Base columns are updated as normal

We need to update the cracker column and the cracker index
• Efficiently
• Maintain the self-organization properties

Two issues: When? How?

When to propagate updates in cracking

Follow the workload to maintain self-organization

Updates become part of query processing

When an update arrives, it is not applied immediately

For each cracker column there is a pending insertions column and a pending deletions column

Pending updates are applied only when a query needs the specific values

Update-aware select

We extended the cracker select operator to apply the needed updates before cracking

The select operator:
1. Search the pending insertions column
2. Search the pending deletions column
3. If Steps 1 or 2 find tuples, run an update algorithm
4. Search the cracker index
5. Physically reorganize the cracker column
6. Update the cracker index
7. Return a slice of the cracker column

Merging

Cracker column: 7, 2, 10 | 29, 25, 31 | 57, 42, 53
Cracker index:
  Start position: 1  values: >1
  Start position: 4  values: >12
  Start position: 7  values: >35

Insert a new tuple with value 9

The new tuple belongs to the blue piece (the first piece, values >1)

[Animation: once 9 is added to the first piece, the pieces with values >12 and >35 start at positions 5 and 8]

Pieces in the cracker column are ordered

Tuples inside a piece are not ordered

Shifting is not a viable solution

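Merging by Hopping

Insert a new tuple with value 9

We need to make enough room to fit the new tuples

[Animation: column 7, 2, 10 | 29, 25, 31 | 57, 42, 53 with start positions 1 (values >1), 4 (values >12) and 7 (values >35). In the frame shown, 57 has been moved to the end of the column and the piece with values >35 starts at position 8.]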
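Since tuples inside a piece are not ordered, room can be made by moving one tuple per piece instead of shifting everything. The Python sketch below is one reading of the animation, not the authors' code: `hop_insert` is a hypothetical name and, unlike the slides, start positions are 0-based.

```python
def hop_insert(col, starts, piece, value):
    """Insert `value` into piece number `piece` of a cracker column.

    col    : the cracker column (list of values)
    starts : 0-based start position of every piece, in ascending order
    """
    col.append(None)                  # free slot at the very end of the column
    hole = len(col) - 1
    # Walk back from the last piece: the first tuple of each piece hops into
    # the free slot behind that piece, which frees the slot in front of it.
    for p in range(len(starts) - 1, piece, -1):
        first = starts[p]
        col[hole] = col[first]
        hole = first
        starts[p] += 1                # piece p now starts one position later
    col[hole] = value                 # the hole ended up right behind `piece`


column = [7, 2, 10, 29, 25, 31, 57, 42, 53]
starts = [0, 3, 6]                    # slides: start positions 1, 4, 7
hop_insert(column, starts, 0, 9)
```

One insert still touches every piece behind the target piece, including pieces that no query is interested in.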

Merge Gradually

A query merges only the qualifying values, i.e., only the values that it needs for a correct and complete result

We avoid the large peaks but...

Average cost increases significantly

[Figure: query cost for Merge Completely vs. Merge Gradually]

The Ripple

Touch only the pieces that are relevant for the current query

Immediately make room for the new tuples

Avoid shifting down non-interesting pieces

Cracker column: 7, 2, 10 | 29, 25, 31 | 57, 42, 53
Cracker index:
  Start position: 1  values: >1
  Start position: 4  values: >22
  Start position: 7  values: >35
Pending insertions: 5, 9, 16, 35

Select 7<= A< 15

[Animation: the query needs the pending value 9. It is placed at position 4, directly behind the first piece; the displaced value 29 moves to the pending insertions, and the piece with values >22 now starts at position 5. The piece with values >35 is not touched.]

The Ripple

Maintain high performance through the whole query sequence in a self-organizing way

[Figure: cost over the query sequence for Merge Completely, Merge Gradually and Merge Ripple]
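The ripple step from the example can be sketched as follows. This is a simplified reading of the animation, not the published algorithm: the function names are hypothetical, start positions are 0-based, and the caller is assumed to know that the whole query range falls inside the piece `piece`.

```python
def ripple_insert(col, starts, pending, piece, value):
    """Add `value` at the end of `piece`, displacing at most one tuple."""
    pos = starts[piece + 1] if piece + 1 < len(starts) else len(col)
    if pos == len(col):
        col.append(value)             # nothing stored behind the piece: just grow
    else:
        pending.append(col[pos])      # displaced tuple waits as a pending insertion
        col[pos] = value
    # Every following piece that started at `pos` now starts one position later.
    for p in range(piece + 1, len(starts)):
        if starts[p] != pos:
            break
        starts[p] = pos + 1


def merge_for_query(col, starts, pending, piece, low, high):
    """Merge only the pending values the query low <= A < high needs."""
    needed = [v for v in pending if low <= v < high]
    pending[:] = [v for v in pending if not (low <= v < high)]
    for v in needed:
        ripple_insert(col, starts, pending, piece, v)


column = [7, 2, 10, 29, 25, 31, 57, 42, 53]
starts = [0, 3, 6]                    # slides: start positions 1, 4, 7
pending = [5, 9, 16, 35]
merge_for_query(column, starts, pending, 0, 7, 15)
```

The column keeps its length: the value 9 takes the place of 29, which joins the pending insertions until some query needs it. Pieces behind the first displaced tuple are never read or written.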

MonetDB/SQL Meets SkyServer: the Challenges of a Scientific Database

Milena Ivanova, Niels Nes, Romulo Goncalves, Martin Kersten

CWI, Amsterdam

SkyServer with MonetDB

Goal: To provide a SkyServer mirror with similar functionality using MonetDB

Three phases: 1%, 10%, entire SDSS data set

Can we
• Do better in terms of performance and functionality?
• Improve query processing by novel parallelism and query cracking techniques?
• Extend functionality to support, e.g. LOFAR?

Portability Lessons

• Need for rich SQL environment (PSM)

• Cast to SQL:2003 standard
  • Replacement of data types and operations
  • Specific extensions ignored or replaced

• Avoid data redundancy
  • Auxiliary tables replaced by views: 10% size reduction

SkyServer Schema

[Figure: schema diagram with the following annotations]

446 columns, 377 million rows

Vertical fragment of 100+ popular columns

Materialized join of Photo and Spectra

6 columns, 12B rows

Spatial Search Lesson

• HTM (Hierarchical Triangular Mesh)
  • Implemented in C++, C#
  • Good for point-near-point and point-in-region queries

• Zones
  • Implemented in SQL
  • Good for point-near-point (x3)
  • Efficient for batch-oriented spatial join (x32)
  • Enables SQL optimizer usage

Query Log Lessons

• Query logs important for both application and science

• Analysed 1.2M queries, August 2006
  • Spatial access prevails (83%)
  • Small core of photo and spectro tables accessed
  • 64% photo, 44% spectro, 27% both

Common Patterns

• Limited number of query patterns
  • Correlation to web site interface

• Most popular query (25%)

SELECT top 10 p.objID, p.run, p.rerun, p.camcol, p.field,
       p.obj, p.type, p.ra, p.dec, p.u, p.g, p.r, p.i, p.z,
       p.Err_u, p.Err_g, p.Err_r, p.Err_i, p.Err_z
FROM fGetNearbyObjEq(195,2.5,3) n, PhotoPrimary p
WHERE n.objID = p.objID;

Query Coverage

Spatial Overlap

• 24% queries overlap
  • Mean sequence length of 9.4, max of 6200
• Overlap and equality patterns for script-based interaction
• Zoom in/zoom out patterns for manual interaction

Evaluation on 1GB

Evaluation on 100GB

• ‘Color-cut’ for low-z quasars

SELECT g, run, rerun, camcol, field, objID
FROM Galaxy
WHERE ( ( g <= 22) and
    (u - g >= -0.27) and (u - g < 0.71) and
    (g - r >= -0.24) and (g - r < 0.35) and
    (r - i >= -0.27) and (r - i < 0.57) and
    (i - z >= -0.35) and (i - z < 0.7) );

• Moving asteroids

SELECT objID,
       sqrt(power(rowv,2) + power(colv,2)) as velocity
FROM PhotoObj
WHERE power(rowv,2) + power(colv,2) > 50
  and rowv >= 0 and colv >= 0;

[Figure: bar chart of time in sec. for the Quasars and Asteroids queries, MonetDB vs. MS SQL; value labels as extracted from the slide: "1135,532" and "405"]

Status

• Staircase to the sky
  • Jun 07 1GB
  • Jan 08 100GB
  • Oct 08 Entire DR6: loaded and running

        MonetDB   MS SQL
Size    1.6 TB    2.6 TB
Q2      0:18      0:01
Q3      5:32      2:47
Q4      1:14      2:46
Q5      1:09      10:15
Q6      1:03      0:10
Q7      0:40      0:01
Q8      0:26      1:05
        9:56      16:34

Scale out to the Petabyte

• Large compute platforms are not well used by database engines; they have static designs

• Develop novel techniques to partially replicate hot data adaptively throughout a cluster/network of thousands of processors

• Replicate-Compute-Aggregate query plans to reach for massively parallel processing under convergence criteria (the answer comes given enough time and resources)


Scale out to the Petabyte

• Loading scientific data is a major cost

• Develop adaptive caching techniques that interface with external sources and map them to database structures on the fly

• NETCDF, HDF2

Arrays as first class citizens

• The set-based computational model underlying SQL is insufficient; extend it to include the (sparse) array model

• Matlab-like operations scale to TB with just-in-time processing

CREATE ARRAY A1(idx INTEGER DIMENSION[4], val FLOAT);

CREATE ARRAY A1(idx INTEGER NOT NULL PRIMARY KEY
                CHECK(idx>=0 AND idx<4), val FLOAT);

CREATE SEQUENCE range AS INTEGER
    START WITH 0 INCREMENT BY 1 MAXVALUE 3;

CREATE ARRAY A3(idx range, val FLOAT);

• MonetDB Research & Development line
  • XQuery and Information Retrieval
  • Astronomy databases (SkyServer)
  • Event Stream Engine
  • Relational Cache Effectiveness
  • Multi-core Parallelism
  • Data Storage Rings
  • RDF Engine
  • ….

• Array Databases become en vogue


Summary

• MonetDB is a mature column store

• Cracking based on physical re-organization in the critical path can be made to work

• Let’s scale up to 100TB database(!) processing

[Cartoon: C-Store, SQL Server, DB2, PostgreSQL and MySQL; caption: "Whoa MonetDB! Speed lines!"]

Acknowledgements

Martin Kersten, Peter Boncz, Niels Nes, Stefan Manegold, Fabian Groffen, Sjoerd Mullender, Steffen Goeldner, Arjen de Vries, Menzo Windhouwer, Tim Ruhl, Romulo Goncalves

Jan Rittinger, Wouter Alink, Jennie Zhang, Stratos Idreos, Erietta Liarou, Lefteris Sidirourgos, Florian Waas, Albrecht Schmidt, Jonas Karlsson, Martin van Dinther, Peter Bosch, Carel van den Berg, Wilco Quak

Alex van Ballegooij, Johan List, Georgina Ramirez, Marcin Zukowski, Roberto Cornacchia, Sandor Heman, Torsten Grust, Jens Teubner, Maurice van Keulen, Jan Flokstra, Milena Ivanova

MonetDB/{SQL,XQuery} open-source platform
MonetDB/SQL Version 5 experimentation platform
MonetDB/X100 aimed at CPU & IO squeezing
MonetDB/RAM aimed at IR and science DB
MonetDB/Armada aimed at evolving databases
MonetDB/SkyServer portal for Astronomy