Performance Tradeoffs in Read-Optimized Databases Stavros Harizopoulos* MIT CSAIL joint work with: Velen Liang, Daniel Abadi, and Sam Madden massachusetts institute of technology *seeking an academic or research lab position in 2007


Page 1: Performance Tradeoffs in Read-Optimized Databases

Stavros Harizopoulos*
MIT CSAIL

joint work with: Velen Liang, Daniel Abadi, and Sam Madden

massachusetts institute of technology

*seeking an academic or research lab position in 2007

Page 2: Read-optimized databases

• Row stores: SQL Server, DB2, Oracle
    1 Joe 45
    2 Sue 37
    … … …
• Column stores: Sybase IQ, MonetDB, CStore
    1 2 …
    Joe Sue …
    45 37 …
• Read optimizations: materialized views, multiple indices, compression

How does column-orientation affect performance?

Page 3: Rows vs. columns

• Row data: single file
    1 Joe 45
    2 Sue 37
    … … …
  – project → Joe 45
• Column data: 3 files
    1 2 …
    Joe Sue …
    45 37 …
  – seek between files, then reconstruct (Joe + 45 → Joe 45)

Study performance tradeoffs solely in data storage
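The project and reconstruct steps on this slide can be sketched in a few lines of Python. This is only an illustration: the lists and the dictionary below are hypothetical in-memory stand-ins for the single row file and the three column files, and the function names are not taken from the talk.

```python
# Hypothetical in-memory stand-ins for the on-disk layouts on the slide.
rows = [(1, "Joe", 45), (2, "Sue", 37)]   # row store: one file of whole tuples

columns = {                                # column store: one file per attribute
    "id": [1, 2],
    "name": ["Joe", "Sue"],
    "age": [45, 37],
}

def project_rows(rows, attr_indexes):
    """Row store: read every tuple in full, then keep only the wanted attributes."""
    return [tuple(row[i] for i in attr_indexes) for row in rows]

def reconstruct_columns(columns, attr_names):
    """Column store: read only the wanted columns, then stitch values by position."""
    return list(zip(*(columns[name] for name in attr_names)))

print(project_rows(rows, [1, 2]))
print(reconstruct_columns(columns, ["name", "age"]))
```

Both calls produce the same (name, age) tuples. The difference the talk studies is in how many bytes each layout has to read, and how much work it takes to get there.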

Page 4: Performance study

• Methodology
  – Built storage manager from scratch
  – Sequential scans
  – Analyze CPU, disk, memory
• Findings
  – Columns are generally more I/O efficient
  – Competing traffic favors columns
  – Conditions where columns are CPU-constrained
  – Conditions where rows are MemBW-constrained

Page 5: Talk outline

• System architecture

• Workload and Experiments

• Analysis

• Conclusions

Page 6: System architecture

• Block-iterator operators
  – Single-threaded, C++, Linux AIO
• No buffer pool
  – Use filesystem, bypass OS cache
• Compression
• Dense-pack (60% full → 100% full)

Page 7: Storage engine

Example query: SELECT name, age WHERE age > 40

• Row scanner: a single scan node (S) reads whole tuples and applies the predicate(s), producing (Joe, 45), …
• Column scanner: one scan node (S) per column. The age node applies predicate #1 and passes (#POS, 45), … upward; the name node adds the matching names, producing (Joe, 45), …
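A minimal sketch of the two scanners for the example query follows. It assumes that #POS is the tuple's position within the column and that later scan nodes fetch values only at the positions passed to them. It is tuple-at-a-time for brevity, whereas the system described in the talk uses block-iterator operators. All function names are hypothetical.

```python
def row_scan(rows, predicate, out_attrs):
    """Row scanner: one pass over whole tuples; apply the predicate, then project."""
    for row in rows:
        if predicate(row):
            yield tuple(row[a] for a in out_attrs)

def column_scan_first(column, predicate):
    """First column scan node: apply the predicate, emit (position, values) pairs."""
    for pos, value in enumerate(column):
        if predicate(value):
            yield pos, (value,)

def column_scan_next(upstream, column):
    """Later column scan node: fetch this column's value only at qualifying positions."""
    for pos, values in upstream:
        yield pos, values + (column[pos],)

# SELECT name, age WHERE age > 40
rows = [{"name": "Joe", "age": 45}, {"name": "Sue", "age": 37}]
age = [45, 37]
name = ["Joe", "Sue"]

row_result = list(row_scan(rows, lambda r: r["age"] > 40, ["name", "age"]))

pipeline = column_scan_next(column_scan_first(age, lambda v: v > 40), name)
col_result = [(n, a) for _pos, (a, n) in pipeline]

print(row_result, col_result)
```

In this sketch the name column is touched only at positions that survive the age predicate, so fewer qualifying tuples means less work for the later scan nodes.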

Page 8: Platform

• CPU: 3.2GHz
• L2 cache: 1MB
• RAM: 1GB, 3.2 GB/sec
• Disks (striped): 180 MB/sec, direct IO
• Prefetching: 100ms read, 10ms seek
• L2 cache prefetching: read 128 bytes

Page 9: Workload

• LINEITEM (wide): 150 bytes, 50 bytes
  – 60m rows → 9.5 GB
• ORDERS (narrow): 32 bytes, 12 bytes
  – 60m rows → 1.9 GB
• Query
    SELECT a1, a2, a3, … WHERE a1 yields variable selectivity
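As a rough sanity check on these sizes, the sketch below estimates the I/O time of a sequential scan. It assumes that the 180 MB/sec figure on the platform slide is the sequential bandwidth of the striped disks and that GB and MB are decimal units. It ignores seeks and CPU time, so it gives a lower bound only, and the function names are hypothetical.

```python
DISK_BYTES_PER_SEC = 180e6   # assumed: 180 MB/sec sequential disk bandwidth
ROWS = 60e6                  # 60m rows in both tables

def full_scan_seconds(table_bytes, bandwidth=DISK_BYTES_PER_SEC):
    """Row store: a sequential scan reads the whole table."""
    return table_bytes / bandwidth

def column_scan_seconds(selected_bytes_per_tuple, rows=ROWS,
                        bandwidth=DISK_BYTES_PER_SEC):
    """Column store: a scan reads only the bytes of the selected columns."""
    return rows * selected_bytes_per_tuple / bandwidth

lineitem = full_scan_seconds(9.5e9)   # wide table, all bytes
orders = full_scan_seconds(1.9e9)     # narrow table, all bytes
one_int = column_scan_seconds(4)      # a single 4-byte column

print(f"{lineitem:.1f} s, {orders:.1f} s, {one_int:.1f} s")
```

Under these assumptions a full scan of LINEITEM takes roughly 53 seconds and ORDERS roughly 11 seconds, while reading a single 4-byte column takes a little over a second.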

Page 10: Wide tuple: 10% selectivity

[Figure: time (sec), 0 to 60, vs. selected bytes per tuple, 4 to 148. Series: Row, Row (CPU only), Column, Column (CPU only). Attribute annotations: int 4B, char 1B, text 25B, text 10B, text 69B.]

• Large prefetch hides disk seeks in columns

Page 11: Wide tuple: 10% sel. (CPU)

[Figure: CPU time (sec), 0 to 12, vs. # attributes selected, 1 to 16, for row store and column store. Stacked components: System, Busy (user), Memory stalls (user), Other stalls (user).]

• Row-CPU suffers from memory stalls

Page 12: Wide tuple: 10% sel. (CPU)

[Figure: CPU time (sec), 0 to 12, vs. # attributes selected, 1 to 16, for row store and column store, with an additional panel labeled 0.1%. Stacked components: System, Busy (user), Memory stalls (user), Other stalls (user).]

• Column-CPU efficiency with lower selectivity

Page 13: Narrow tuple: 10% selectivity

[Figure, first panel: time (sec), 0 to 12, vs. selected bytes per tuple, 4 to 32. Series: Row, Column.]
[Figure, second panel: CPU time (sec), 0 to 12, vs. # attributes selected, 1 to 7, for row store and column store. Stacked components: CPU system, CPU user, Memory, Other.]

• Memory stalls disappear in narrow tuples
• Compression: similar to narrow (not shown)

Page 14: Varying prefetch size

[Figure: time (sec), 0 to 40, vs. selected bytes per tuple, 4 to 32, with no competing disk traffic. Series: Row (any prefetch size), Column 48 (x 128KB), Column 16, Column 8, Column 2.]

• No prefetching hurts columns in single scans
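One way to see why small prefetch sizes hurt a column scan is to ask what fraction of elapsed time goes to seeking when every prefetched chunk costs one seek. The sketch below uses the 100ms read and 10ms seek figures from the platform slide. Two things are assumptions made only for illustration: that read time scales linearly with prefetch size, and that the 100ms read corresponds to the largest prefetch size shown (48 units of 128KB).

```python
def seek_overhead(read_ms, seek_ms):
    """Fraction of elapsed time spent seeking when each prefetch costs one seek."""
    return seek_ms / (read_ms + seek_ms)

# 100 ms of reading per 10 ms seek, as on the platform slide.
base = seek_overhead(100, 10)

# Assumption: read time is proportional to prefetch size, and 100 ms
# corresponds to a prefetch of 48 units (48 x 128KB).
for units in (48, 16, 8, 2):
    read_ms = 100 * units / 48
    print(units, f"{seek_overhead(read_ms, 10):.0%}")
```

Under these assumptions, seeks take under a tenth of the time at the largest prefetch size and well over half of it at the smallest.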

Page 15: Varying prefetch size

[Figure, no competing disk traffic: time (sec), 0 to 40, vs. selected bytes per tuple, 4 to 32.]
[Figure, with competing disk traffic, two panels: time (sec), 0 to 40, vs. selected bytes per tuple, 4 to 28. Series: Column, 48 and Row, 48 in one panel; Column, 8 and Row, 8 in the other.]

• No prefetching hurts columns in single scans
• Under competing traffic, columns outperform rows for any prefetch size

Page 16: Analysis

• Central parameter in analysis: cycles per disk byte (cpdb)
• What can it model:
  – More / fewer disks
  – More / fewer CPUs
  – CPU / disk competing traffic
• Trends in cpdb:
  – 10 → 30 from 1995 to 2006
  – Further increase with multicore chips
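A small sketch of how cpdb might be computed, assuming it means the CPU cycles available per byte read sequentially from disk. The clock rate and disk bandwidth are the platform figures from the talk. The function and its share parameters are hypothetical, added only to show how the three modeled scenarios move the value.

```python
def cpdb(clock_hz, disk_bytes_per_sec, num_cpus=1, cpu_share=1.0, disk_share=1.0):
    """Cycles per disk byte: available CPU cycles divided by available disk bytes.

    cpu_share and disk_share are the fractions left over for this query
    when competing CPU or disk traffic is present (hypothetical parameters).
    """
    cycles_per_sec = clock_hz * num_cpus * cpu_share
    bytes_per_sec = disk_bytes_per_sec * disk_share
    return cycles_per_sec / bytes_per_sec

platform = cpdb(3.2e9, 180e6)                     # 3.2GHz CPU, 180 MB/sec disks
more_disks = cpdb(3.2e9, 2 * 180e6)               # twice the disk bandwidth
more_cpus = cpdb(3.2e9, 180e6, num_cpus=2)        # twice the CPUs
disk_traffic = cpdb(3.2e9, 180e6, disk_share=0.5) # half the disk bandwidth available

print(f"{platform:.1f} {more_disks:.1f} {more_cpus:.1f} {disk_traffic:.1f}")
```

With the platform numbers this gives about 17.8, close to the value of 18 that appears on the cpdb axis of the analysis figure. Adding disks lowers cpdb, while adding CPUs or losing disk bandwidth to competing traffic raises it.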

Page 17: Analysis

[Figure: speedup of cols over rows at 10% selectivity and 50% projection. X-axis: tuple width, 8 to 36. Y-axis: cycles per disk byte (cpdb) at 9, 18, 36, 72, 144. Legend bands: 0.4 – 0.8, 0.8 – 1.2, 1.2 – 1.6, 1.6 – 2, 2.]

• Rows favored by narrow tuples and low cpdb
  – Disk-bound workloads have higher cpdb

Page 18: See our paper for the rest

• CPU time breakdowns, L2 prefetcher

• Disk prefetching implementation

• Compression results

• Non-pipelined column scanner

• Analysis

Page 19: Conclusions

• Given enough space for prefetching, columns outperform rows in most workloads
• Competing traffic favors columns
• Memory-bandwidth bottleneck in rows
• Future work
  – Column scanners, random I/O, write performance

Page 20: Thank you

db.csail.mit.edu/projects/cstore

Page 21: Compression methods

• Dictionary
    … ‘low’ … ‘high’ … ‘low’ … ‘normal’ …  →  … 00 … 10 … 00 … 01 …
• Bit-pack
  – Pack several attributes inside a 4-byte word
  – Use as many bits as the max value requires
• Delta
  – Base value per page
  – Arithmetic differences
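The three methods can be sketched as below. This is an illustration under stated assumptions, not the storage manager's actual format: the dictionary codes follow the low/normal/high example on the slide, the bit order inside the 4-byte word is an arbitrary choice, and delta is assumed to store differences from the page's base value (the slide does not say whether differences are taken from the base or from the previous value). All function names are hypothetical.

```python
def dictionary_encode(values, codes):
    """Dictionary: replace each value with its small integer code."""
    return [codes[v] for v in values]

def bits_needed(max_value):
    """Bit-pack: use as many bits as the maximum value requires (at least 1)."""
    return max(1, max_value.bit_length())

def bit_pack(values, widths):
    """Pack several attributes into one 4-byte word, first value in the low bits."""
    word, shift = 0, 0
    for value, width in zip(values, widths):
        assert 0 <= value < (1 << width), "value does not fit its bit width"
        word |= value << shift
        shift += width
    assert shift <= 32, "attributes do not fit in a 4-byte word"
    return word

def bit_unpack(word, widths):
    """Recover the packed attributes in their original order."""
    out = []
    for width in widths:
        out.append(word & ((1 << width) - 1))
        word >>= width
    return out

def delta_encode(page):
    """Delta: one base value per page plus arithmetic differences from it (assumed)."""
    base = page[0]
    return base, [v - base for v in page]

def delta_decode(base, deltas):
    return [base + d for d in deltas]

codes = {"low": 0b00, "normal": 0b01, "high": 0b10}
encoded = dictionary_encode(["low", "high", "low", "normal"], codes)

widths = [bits_needed(max(codes.values()))] * len(encoded)
word = bit_pack(encoded, widths)

base, deltas = delta_encode([1000, 1003, 1001])
print(encoded, word, bit_unpack(word, widths), base, deltas)
```

Here four dictionary codes of two bits each fit in a single byte's worth of the word, and the delta-encoded page stores small differences instead of the full values.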

Page 22: Analysis

parameter | what it can model
SizeFile, TupleWidth | various DB schemas
MemBytesCycle | memory bus speed
f | # of selected attributes
I | CPU work
cpdb (cycles per disk byte) | more / fewer disks; more / fewer CPUs; CPU / disk competing traffic