ingres/ vectorwise

24
Ingres/VectorWise Doug Inkster – Ingres Development

Upload: mickey

Post on 23-Feb-2016

63 views

Category:

Documents


0 download

DESCRIPTION

Doug Inkster – Ingres Development. Ingres/ VectorWise. Abstract. Ingres Corp. recently announced a cooperative project with the database research firm VectorWise of Amsterdam. This session discusses the nature of the relationship between Ingres and VectorWise and its impact on Ingres users. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Ingres/ VectorWise

Ingres/VectorWise

Doug Inkster – Ingres Development

Page 2: Ingres/ VectorWise

Abstract

Ingres Corp. recently announced a cooperative project with the database research firm VectorWise of Amsterdam. This session discusses the nature of the relationship between Ingres and VectorWise and its impact on Ingres users

Page 3: Ingres/ VectorWise

Overview

What is VectorWise Significance of project (to existing users) Column store v.s. row store VectorWise innovations Ingres/VectorWise interface details

Page 4: Ingres/ VectorWise

VectorWise

Small Dutch startup spun from CWI Currently 6-7 employees Exciting development project based on Ph. D. research of

Marcin Zukowski under guidance of Peter Boncz Ingres provided seed money and currently has exclusive

rights to their technology

Page 5: Ingres/ VectorWise

New Ingres Storage Type

VectorWise technology will take the form of new Ingres storage type

Column store – but turbo-charged Users will just define tables as usual, but with new

storage type Extremely fast

Page 6: Ingres/ VectorWise

Performance in Traditional RDBMS

TPC H query 1: 6 million rows MySQL: 26.2 seconds DBMS x: 28.1 seconds Hand coded C program: 0.2 seconds

Initiated thought about why traditional RDBMS is so slow

Page 7: Ingres/ VectorWise

Row Stores – Storage Model

Row store : data stored on disk and processed by query engine in the form of rows– Ingres and all other commercial RDBMS’ follow same

model– Data stored on disk as rows packed into blocks/pages– Extracted from pages and passed between query plan

operations (joins, sorts, etc.) as rows Techniques improved over time, but same basic

model as research engines of 1970’s (System R, Berkeley Ingres)

Page 8: Ingres/ VectorWise

Row Stores – Storage Model

Select a, b from c:– Even if 200 columns in c, reads whole row image for all

qualified rows– Inefficient use of disk (excessive I/O)– Inefficient use of memory (large row-sized buffers)

Page 9: Ingres/ VectorWise

Row Stores – Execution Model

Select a*b from c where d+e >= 250:– Execution model turns column expressions into

pseudo-code (ADF in Ingres)– Pseudo-code evaluation almost always by

interpretation – one row at a time– For each instruction, operand addresses are built

(using known buffers) – all row induced overhead– Big switch on “opcode” (integer addition, float

compare, type coercions, etc.)– Very poor locality (code and data), very inefficient– No benefit from modern hardware features (cache,

instruction pipelines, etc.)

Page 10: Ingres/ VectorWise

Row Stores - Performance

Poor disk bandwidth High instructions per tuple Poor locality (data and instructions), poor exploitation

of newer hardware – high cycles per instruction Extremely high cycles per tuple!! Designed for OLTP, not large scale analytical

processing

Page 11: Ingres/ VectorWise

Column Stores – Storage Model

Derived from research in 1990’s One of the earliest was MonetDB – Peter Boncz’s

thesis Data stored in columns (all values from a column in

contiguous storage) Ordinal position in column store dictates row Only columns actually requested are accessed

(compare to “select a, b from c” example) Far more efficient disk usage

Page 12: Ingres/ VectorWise

Column Stores – Execution Model

Some column stores (Sybase IQ, Vertica) just read columns from disk and re-compose rows in memory

Execution model same as row stores Only deal with I/O problem

Page 13: Ingres/ VectorWise

MonetDB – Improved Performance

Column store improved I/O performance New execution model improved CPU performance Column wise (not row wise) computation Exploitation of newer hardware, compiler features

Page 14: Ingres/ VectorWise

MonetDB – Improved Performance

CPU Efficiency depends on “nice” code– out-of-order execution– few dependencies (control,data)– compiler support

Compilers love simple loops over arrays– loop-pipelining– automatic SIMD

Page 15: Ingres/ VectorWise

MonetDB – Expression Execution

SELECT id, name, (age-30)*50 as bonusFROM peopleWHERE age > 30

/* Returns vector of oid’s satisfying comparison

** and count of entries in vector. */ int select_gt_float( oid* res, float* column,

float val, int n){ for(int j=0,i=0; i<n; i++) if (column[i] >val) res[j++] = i; return j;}

Compiles into loop that performs many (10, 100, 1000) comparisons in parallel

Page 16: Ingres/ VectorWise

MonetDB - Problem

Operations required on whole columns at a time Lots of expensive intermediate result materialization Memory requirements Doesn’t scale into very large databases Still way faster than anything before

– Q1: MySQL – 26.2 seconds– MonetDB - 3.7 seconds

Page 17: Ingres/ VectorWise

MonetDB/X100 - Innovations

Fix MonetDB scaling problem– Columns broken into “chunks” or vectors (sized to fit cache)– Expression operators execute on vectors

Most primitives take just 0.5 (!) to 10 cycles per tuple– 10-100+ times faster than tuple-at-a-time

Other performance improvements– Tuned to disk/memory/cache– Lightweight compression to maximize throughput– Cooperative scans (future)

Q1: MySQL – 26.2 seconds– MonetDB – 3.7 seconds– X100 – 0.6 seconds!

Page 18: Ingres/ VectorWise

MonetDB/X100 – Hardware Trends

CPU– Increased cache, memory (64-bit)– Instruction pipelining, multiple simultaneous

instructions, SIMD (single instruction, multiple data) Disk

– Increased disk capacity– Increased disk bandwidth– Increased random access speeds – but not as much

as bandwidth– Sequential access increasing in importance

Page 19: Ingres/ VectorWise

MonetDB/X100 - Innovations

Reduced interpretation overhead– 100x less Function Calls

Good CPU cache use– High locality in the primitives– Cache-conscious data placement

No Tuple Navigation– Primitives only see arrays

Vectorization allows algorithmic optimization CPU and compiler-friendly function bodies

– Multiple work units, loop-pipelining, SIMD…

Page 20: Ingres/ VectorWise

Ingres/VectorWise Integration

Ingres offers:– Infrastructure– User interfaces, SQL compilation, utilities, etc.– Existing user community

VectorWise offers:– Execution engine (QEF/DMF/ADF equivalent)– Its own relational algebraic language

Page 21: Ingres/ VectorWise

Ingres/VectorWise Integration

SQL query parsed, optimized in Ingres Ingres builds query plan from OPF-generated QEP

– Query plan contains cross-compiled VectorWise query syntax

QEF passes query to VectorWise engine (and gets out of the way!)

VectorWise passes result set back to Ingres to be returned to caller (array of result rows – mapped to Ingres fetch buffers, many rows at a time)

Page 22: Ingres/ VectorWise

Ingres/VectorWise Integration

Currently:– Create table including unique/referential constraint

definitions (with structure = …)– Copy table– Drop table– Select– Insert– Update– Delete– Create table … as select …– Insert into … select …– Create index (VectorWise style)

Page 23: Ingres/ VectorWise

Ingres/VectorWise – Initial Release

Individual queries limited to either VectorWise or Ingres native tables– Probably lifted in the near future

Seamless as possible– Integrated ingstart/ingstop– Same utilities– Same user interfaces– Same recovery processes

Page 24: Ingres/ VectorWise

Ingres/VectorWise - Conclusions

Tremendously exciting development in Ingres Industry leading technology Opens up new applications Reduces pressure for new hardware