performance of fractal-tree databases · pdf filestate of the art (algorithmic perspective):...

53
Performance of Fractal-Tree Databases Michael A. Bender

Upload: tranphuc

Post on 20-Mar-2018

217 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Performance of Fractal-Tree Databases · PDF fileState of the art (algorithmic perspective): ... (modulo swapping, ... Performance of Fractal-Tree Databases. B

Performance of Fractal-Tree Databases

Michael A. Bender

Page 2: Performance of Fractal-Tree Databases · PDF fileState of the art (algorithmic perspective): ... (modulo swapping, ... Performance of Fractal-Tree Databases. B

Michael Bender -- Performance of Fractal-Tree Databases

The Problem

Problem: maintain a dynamic dictionary on disk.Motivation: file systems, databases, etc.

State of the art: • B-tree [Bayer, McCreight 72]

• cache-oblivious B-tree [Bender, Demaine, Farach-Colton 00]

• buffer tree [Arge 95]

• buffered-repository tree[Buchsbaum,Goldwasser,Venkatasubramanian,Westbrook 00]

• Bε tree [Brodal, Fagerberg 03]

• log-structured merge tree [O'Neil, Cheng, Gawlick, O'Neil 96]

• string B-tree [Ferragina, Grossi 99]

• etc, etc!

State of the practice: • B-trees + industrial-strength features

2

Page 3: Performance of Fractal-Tree Databases · PDF fileState of the art (algorithmic perspective): ... (modulo swapping, ... Performance of Fractal-Tree Databases. B

Michael Bender -- Performance of Fractal-Tree Databases

The Problem

Problem: maintain a dynamic dictionary on disk.Motivation: file systems, databases, etc.

State of the art (algorithmic perspective): • B-tree [Bayer, McCreight 72]

• cache-oblivious B-tree [Bender, Demaine, Farach-Colton 00]

• buffer tree [Arge 95]

• buffered-repository tree[Buchsbaum,Goldwasser,Venkatasubramanian,Westbrook 00]

• Bε tree [Brodal, Fagerberg 03]

• log-structured merge tree [O'Neil, Cheng, Gawlick, O'Neil 96]

• string B-tree [Ferragina, Grossi 99]

• etc, etc!

State of the practice: • B-trees + industrial-strength features

3

Page 4: Performance of Fractal-Tree Databases · PDF fileState of the art (algorithmic perspective): ... (modulo swapping, ... Performance of Fractal-Tree Databases. B

Michael Bender -- Performance of Fractal-Tree Databases

The Problem

Problem: maintain a dynamic dictionary on disk.Motivation: file systems, databases, etc.

State of the art (algorithmic perspective): • B-tree [Bayer, McCreight 72]

• cache-oblivious B-tree [Bender, Demaine, Farach-Colton 00]

• buffer tree [Arge 95]

• buffered-repository tree[Buchsbaum,Goldwasser,Venkatasubramanian,Westbrook 00]

• Bε tree [Brodal, Fagerberg 03]

• log-structured merge tree [O'Neil, Cheng, Gawlick, O'Neil 96]

• string B-tree [Ferragina, Grossi 99]

• etc, etc!

State of the practice: • B-trees + industrial-strength features/optimizations

4

Page 5: Performance of Fractal-Tree Databases · PDF fileState of the art (algorithmic perspective): ... (modulo swapping, ... Performance of Fractal-Tree Databases. B

Michael Bender -- Performance of Fractal-Tree Databases

Sequential inserts in B-trees have near-optimal data locality

B-trees are Fast at Sequential Inserts

5

Page 6: Performance of Fractal-Tree Databases · PDF fileState of the art (algorithmic perspective): ... (modulo swapping, ... Performance of Fractal-Tree Databases. B

Michael Bender -- Performance of Fractal-Tree Databases

Sequential inserts in B-trees have near-optimal data locality

• One disk I/O per leaf (which contains many inserts).

• Sequential disk I/O.

• Performance is disk-bandwidth limited.

B-trees are Fast at Sequential Inserts

6

These B-tree nodes reside in memory

Insertions are into this leaf node

Page 7: Performance of Fractal-Tree Databases · PDF fileState of the art (algorithmic perspective): ... (modulo swapping, ... Performance of Fractal-Tree Databases. B

Michael Bender -- Performance of Fractal-Tree Databases

High entropy inserts (e.g., random) in B-trees have poor data locality

• Most nodes are not in main memory.

• Most insertions require a random disk I/O.

• Performance is disk-seek limited.

• ≤ 100 inserts/sec/disk (≤ 0.05% of disk bandwidth).

B-Trees Are Slow at Ad Hoc Inserts

7

These B-tree nodes reside in memory

Page 8: Performance of Fractal-Tree Databases · PDF fileState of the art (algorithmic perspective): ... (modulo swapping, ... Performance of Fractal-Tree Databases. B

Michael Bender -- Performance of Fractal-Tree Databases

B-trees Have a Similar Story for Range Queries

Range queries in newly built B-trees have good locality

Range queries in aged B-trees have poor locality• Leaf blocks are scattered across disk.• For page-sized nodes, as low as 1% disk bandwidth.

8

Leaf nodes are scattered across disk in aged B-tree.

Page 9: Performance of Fractal-Tree Databases · PDF fileState of the art (algorithmic perspective): ... (modulo swapping, ... Performance of Fractal-Tree Databases. B

Michael Bender -- Performance of Fractal-Tree Databases

B-trees Have a Similar Story for Range Queries

Range queries in newly built B-trees have good locality

Range queries in aged B-trees have poor locality• Leaf blocks are scattered across disk.• For page-sized nodes, as low as 1% disk bandwidth.

9

Leaf nodes are scattered across disk in aged B-tree.

Page 10: Performance of Fractal-Tree Databases · PDF fileState of the art (algorithmic perspective): ... (modulo swapping, ... Performance of Fractal-Tree Databases. B

Michael Bender -- Performance of Fractal-Tree Databases

Results

Cache-Oblivious Streaming B-tree [Bender, Farach-Colton, Fineman, Fogel, Kuszmaul, Nelson 07]

• Replacement for Traditional B-tree

• High entropy inserts/deletes run up to 100x faster

• No aging --> always fast range queries

• Streaming B-tree is cache-oblivious‣ Good data locality without memory-specific parameterization.

10

Page 11: Performance of Fractal-Tree Databases · PDF fileState of the art (algorithmic perspective): ... (modulo swapping, ... Performance of Fractal-Tree Databases. B

Michael Bender -- Performance of Fractal-Tree Databases

Results (cont)

Fractal TreeTM database• TokuDB is a storage engine for MySQL‣ A storage engine is a structure that stores on-disk data.‣ Traditionally a storage engine is a B-tree.

• MySQL is an open-source database‣ Most installations of any database

• Built in context of our startup Tokutek.

Performance• 10x-100x faster index inserts

• No aging

• Faster queries in important cases

11

File System

Database

Application Layer

SQL Processing, Query Optimization…

TokuDB for MySQL

Page 12: Performance of Fractal-Tree Databases · PDF fileState of the art (algorithmic perspective): ... (modulo swapping, ... Performance of Fractal-Tree Databases. B

Michael Bender -- Performance of Fractal-Tree Databases

Creative Fundraising for Startup

12

Page 13: Performance of Fractal-Tree Databases · PDF fileState of the art (algorithmic perspective): ... (modulo swapping, ... Performance of Fractal-Tree Databases. B

Michael Bender -- Performance of Fractal-Tree Databases

Algorithmic Performance Model

Minimize # of block transfers per operation

Disk-Access Machine (DAM) [Aggrawal, Vitter 88]

• Two-levels of memory.• Two parameters:

block-size B, memory-size M.

Cache-Oblivious Model (CO) [Frigo,

Leiserson, Prokop, Ramachandran 99]

• Parameters B and M are unknown to the algorithm or coder.

• (Of course, used in proofs.)

Memory

Disk

B

B

Page 14: Performance of Fractal-Tree Databases · PDF fileState of the art (algorithmic perspective): ... (modulo swapping, ... Performance of Fractal-Tree Databases. B

Michael Bender -- Performance of Fractal-Tree Databases

Algorithmic Performance Model

Minimize # of block transfers per operation

Disk-Access Machine (DAM) [Aggrawal, Vitter 88]

• Two-levels of memory.• Two parameters:

block-size B, memory-size M.

Cache-Oblivious Model (CO) [Frigo,

Leiserson, Prokop, Ramachandran 99]

• Parameters B and M are unknown to the algorithm or coder.

• (Of course, used in proofs.)

Memory

Disk

B

B =?

=?

Page 15: Performance of Fractal-Tree Databases · PDF fileState of the art (algorithmic perspective): ... (modulo swapping, ... Performance of Fractal-Tree Databases. B

Michael Bender -- Performance of Fractal-Tree Databases

Fractal Tree Inserts (and Deletes)

15

Example: N=1 billion, B=4096• 1 billion 128-byte rows (128 gigabytes) ‣ log2 (1 billion) = 30

• Half-megabyte blocks that hold 4096 rows each‣ log2 (4096) = 12

• B-trees require = 30/12 = 3 disk seeks (modulo swapping, but at least 1*

• Streaming B-trees require = 30/4096 = 0.007 disk seeks

B-tree Streaming B-tree

Insert O(logBN)=O( ) O( )logNlogB

logNB

logNlogB

logNB

Page 16: Performance of Fractal-Tree Databases · PDF fileState of the art (algorithmic perspective): ... (modulo swapping, ... Performance of Fractal-Tree Databases. B

Michael Bender -- Performance of Fractal-Tree Databases

Fractal Tree Inserts (and Deletes)

16

Example: N=1 billion, B=4096• 1 billion 128-byte rows (128 gigabytes) ‣ log2 (1 billion) = 30

• Half-megabyte blocks that hold 4096 rows each‣ log2 (4096) = 12

• B-trees require = 30/12 = 3 disk seeks (modulo caching, insertion pattern)

• Streaming B-trees require = 30/4096 = 0.007 disk seeks

B-tree Streaming B-tree

Insert O(logBN)=O( ) O( )logNlogB

logNB

logNlogB

logNB

Page 17: Performance of Fractal-Tree Databases · PDF fileState of the art (algorithmic perspective): ... (modulo swapping, ... Performance of Fractal-Tree Databases. B

Michael Bender -- Performance of Fractal-Tree Databases

Random Inserts into Fractal Tree (“streaming B-tree”) and B-tree (Berkeley DB)

Inserts into Prototype Fractal Tree

17

Fractal Tree

B- Tree

Page 18: Performance of Fractal-Tree Databases · PDF fileState of the art (algorithmic perspective): ... (modulo swapping, ... Performance of Fractal-Tree Databases. B

Michael Bender -- Performance of Fractal-Tree Databases

Point searches ~3.5x slower (N=230)• Searches/sec improves as more of data structure fits in

cache)

Searches in Prototype Fractal Tree

18

S B-tree

B- TreeFractal Tree

Page 19: Performance of Fractal-Tree Databases · PDF fileState of the art (algorithmic perspective): ... (modulo swapping, ... Performance of Fractal-Tree Databases. B

Asymmetry Between Inserts and Key Searches

Small specification changes affect complexity

E.g., duplicate keys• Slow: Return an error when a duplicate key is inserted‣ Hidden search

• Fast: Overwrite duplicates or maintain all versions‣ No hidden search

E.g. deletes• Return number elements deleted is slow‣ Hidden search

• Delete without feedback is fast‣ No hidden search

19

Page 20: Performance of Fractal-Tree Databases · PDF fileState of the art (algorithmic perspective): ... (modulo swapping, ... Performance of Fractal-Tree Databases. B

Asymmetry Between Inserts and Key Searches

Small specification changes affect complexity

E.g., duplicate keys• Slow: Return an error when a duplicate key is inserted‣ Hidden search

• Fast: Overwrite duplicates or maintain all versions‣ No hidden search

E.g. deletes• Slow: Return number of elements deleted‣ Hidden search

• Fast: Delete without feedback‣ No hidden search

20

Page 21: Performance of Fractal-Tree Databases · PDF fileState of the art (algorithmic perspective): ... (modulo swapping, ... Performance of Fractal-Tree Databases. B

Asymmetry Between Inserts and Key Searches

Small specification changes affect complexity

E.g., duplicate keys• Slow: Return an error when a duplicate key is inserted‣ Hidden search

• Fast: Overwrite duplicates or maintain all versions‣ No hidden search

E.g. deletes• Slow: Return number of elements deleted‣ Hidden search

• Fast: Delete without feedback‣ No hidden search

Next slide: extra difficulty of key searches

21

Page 22: Performance of Fractal-Tree Databases · PDF fileState of the art (algorithmic perspective): ... (modulo swapping, ... Performance of Fractal-Tree Databases. B

Extra Difficulty of Key Searches

Page 23: Performance of Fractal-Tree Databases · PDF fileState of the art (algorithmic perspective): ... (modulo swapping, ... Performance of Fractal-Tree Databases. B

Michael Bender -- Performance of Fractal-Tree Databases

Asymmetry Between Inserts and Key Searches

Inserts/point query asymmetry has impact on• System design. How to redesign standard mechanisms

(e.g., concurrency-control mechanism).

• System use. How to take advantage of faster inserts (e.g., to enable faster queries).

23

Page 24: Performance of Fractal-Tree Databases · PDF fileState of the art (algorithmic perspective): ... (modulo swapping, ... Performance of Fractal-Tree Databases. B

Overview of Talk

24

Page 25: Performance of Fractal-Tree Databases · PDF fileState of the art (algorithmic perspective): ... (modulo swapping, ... Performance of Fractal-Tree Databases. B

Michael Bender -- Performance of Fractal-Tree Databases

Overview

External-memory dictionaries

Performance limitations of B-trees

Fractal-Tree data structure (Streaming B-tree)

Search/point-query asymmetry

Impact of search/point-query asymmetry on database use

How to build a streaming B-tree

Impact of search/point-query asymmetry on system design

Scaling into the future

25

Page 26: Performance of Fractal-Tree Databases · PDF fileState of the art (algorithmic perspective): ... (modulo swapping, ... Performance of Fractal-Tree Databases. B

Search/point-query asymmetry affecting database use

Page 27: Performance of Fractal-Tree Databases · PDF fileState of the art (algorithmic perspective): ... (modulo swapping, ... Performance of Fractal-Tree Databases. B

key value

a b c d e

key value

a b c d e

select d where 270 ≤ a ≤ 538

Select via Index

select d where 270 ≤ e ≤ 538

Select via Table Scan

How B-trees Are Used in Databases

Data maintained in rows and stored in B-trees.

Page 28: Performance of Fractal-Tree Databases · PDF fileState of the art (algorithmic perspective): ... (modulo swapping, ... Performance of Fractal-Tree Databases. B

key value

a b c d e

key value

a b c d e

select d where 270 ≤ a ≤ 538

Select via Index

select d where 270 ≤ e ≤ 538

Select via Table Scan

How B-trees Are Used in Databases

Data maintained in rows and stored in B-trees.

Page 29: Performance of Fractal-Tree Databases · PDF fileState of the art (algorithmic perspective): ... (modulo swapping, ... Performance of Fractal-Tree Databases. B

Michael Bender -- Performance of Fractal-Tree Databases29

select d where 270 ≤ b ≤ 538

key value

a b c d e

key value

b a

key value

c a

Selecting via an index can be slow, if it is coupled with point queries.

How B-trees Are Used in Databases (Cont.)

main table index

Page 30: Performance of Fractal-Tree Databases · PDF fileState of the art (algorithmic perspective): ... (modulo swapping, ... Performance of Fractal-Tree Databases. B

Michael Bender -- Performance of Fractal-Tree Databases

Covering index can speed up selects • Key contains all columns necessary to answer query.

30

key

a

value

b c d e

key value

bd a

key value

c a

select d where 270 ≤ b ≤ 538

But coverirock.

How B-trees Are Used in Databases (Cont.)

main table covering index

Page 31: Performance of Fractal-Tree Databases · PDF fileState of the art (algorithmic perspective): ... (modulo swapping, ... Performance of Fractal-Tree Databases. B

Michael Bender -- Performance of Fractal-Tree Databases

Insertion Pain Can Masquerade as Query Pain

People often don’t use these indexes.They use simplistic schema.

• Sequential inserts via autoincrement key

• Few indexes, few covering indexes

Then insertions are fast but queries are slow.

31

key

t

value

a b c d eAutoincrement key

(effectively a timestanp)

Page 32: Performance of Fractal-Tree Databases · PDF fileState of the art (algorithmic perspective): ... (modulo swapping, ... Performance of Fractal-Tree Databases. B

Michael Bender -- Performance of Fractal-Tree Databases

Insertion Pain Can Masquerade as Query Pain

People often don’t use these indexes.They use simplistic schema.

• Sequential inserts via autoincrement key

• Few indexes, few covering indexes

Then insertions are fast but queries are slow.

Adding sophisticated indexes helps queries • B-trees cannot afford to maintain them.

Fractal Trees can.

32

key

t

value

a b c d eAutoincrement key

(effectively a timestanp)

Page 33: Performance of Fractal-Tree Databases · PDF fileState of the art (algorithmic perspective): ... (modulo swapping, ... Performance of Fractal-Tree Databases. B

How to Build a Fractal Tree and How it Performs

Page 34: Performance of Fractal-Tree Databases · PDF fileState of the art (algorithmic perspective): ... (modulo swapping, ... Performance of Fractal-Tree Databases. B

Michael Bender -- Performance of Fractal-Tree Databases

Simplified (Cache-Oblivious) Fractal Tree

O((logN)/B) insert cost & O(log2N) search cost • Sorted arrays of exponentially increasing size.

• Arrays are completely full or completely empty(depends on the bit representation of # of elmts).

• Insert into the smallest array. Merge arrays to make room.

34

20 21 22 23

Page 35: Performance of Fractal-Tree Databases · PDF fileState of the art (algorithmic perspective): ... (modulo swapping, ... Performance of Fractal-Tree Databases. B

Michael Bender -- Performance of Fractal-Tree Databases

Simplified (Cache-Oblivious) Fractal Tree (Cont.)

35

Page 36: Performance of Fractal-Tree Databases · PDF fileState of the art (algorithmic perspective): ... (modulo swapping, ... Performance of Fractal-Tree Databases. B

Michael Bender -- Performance of Fractal-Tree Databases

Analysis of Simplified Fractal Tree

Insert Cost:• cost to flush buffer of size X = O(X/B)

• cost per element to flush buffer = O(1/B)

• max # of times each element is flushed = log N

• insert cost = O((log N))/B) amortized memory transfers

Search Cost• Binary search at each level

• log(N/B) + log(N/B) - 1 + log(N/B) - 2 + ... + 2 + 1 = O(log2(N/B))

36

Page 37: Performance of Fractal-Tree Databases · PDF fileState of the art (algorithmic perspective): ... (modulo swapping, ... Performance of Fractal-Tree Databases. B

Idea of Faster Key Searches in Fractal Tree

O(log (N/B)) search cost • Some redundancy of elements between levels

• Arrays can be partially full

• Horizontal and vertical pointers to redundant elements

• (Fractional Cascading)

37

Page 38: Performance of Fractal-Tree Databases · PDF fileState of the art (algorithmic perspective): ... (modulo swapping, ... Performance of Fractal-Tree Databases. B

Michael Bender -- Performance of Fractal-Tree Databases

Why The Previous Data Structure is a Simplification

• Need concurrency-control mechanisms

• Need crash safety

• Need transactions, logging+recovery

• Need better search cost

• Need to store variable-size elements

• Need better amortization

• Need to be good for random and sequential inserts

• Need to support multithreading.

• Need compression

38

Page 39: Performance of Fractal-Tree Databases · PDF fileState of the art (algorithmic perspective): ... (modulo swapping, ... Performance of Fractal-Tree Databases. B

iiBench Insertion Benchmark

Fractal Trees scale with disk bandwidth not seek time. • In fact, now we are compute bound, so cannot yet take full advantage of more

cores or disks. (This will change.)

39

0!

5,000!

10,000!

15,000!

20,000!

25,000!

30,000!

35,000!

40,000!

45,000!

50,000!

0! 200,000,000! 400,000,000! 600,000,000! 800,000,000! 1,000,000,000!

Ro

ws/S

ec

on

d!

Rows Inserted!

iiBench - 1B Row Insert Test!

InnoDB!

TokuDB!

Page 40: Performance of Fractal-Tree Databases · PDF fileState of the art (algorithmic perspective): ... (modulo swapping, ... Performance of Fractal-Tree Databases. B

iiBench Deletions

40

0!

5,000!

10,000!

15,000!

20,000!

25,000!

30,000!

35,000!

40,000!

0! 100,000,000! 200,000,000! 300,000,000! 400,000,000! 500,000,000!

Ro

ws/S

ec

on

d!

Rows Inserted!

iiBench - 500M Row Insert/Delete Test!

TokuDB!

InnoDB!

Insertions only here Insertions + deletions here

Page 41: Performance of Fractal-Tree Databases · PDF fileState of the art (algorithmic perspective): ... (modulo swapping, ... Performance of Fractal-Tree Databases. B

Search/point query asymmetry when building Fractal-Tree Database

41

Page 42: Performance of Fractal-Tree Databases · PDF fileState of the art (algorithmic perspective): ... (modulo swapping, ... Performance of Fractal-Tree Databases. B

Michael Bender -- Performance of Fractal-Tree Databases

Building TokuDB Storage Engine for MySQL

Engineering to do list• Need concurrency-control mechanisms

• Need crash safety

• Need transactions, logging+recovery

• Need better search cost

• Need to store variable-size elements

• Need better amortization

• Need to be good for random and sequential inserts

• Need to support multithreading.

• Need compression

42

Page 43: Performance of Fractal-Tree Databases · PDF fileState of the art (algorithmic perspective): ... (modulo swapping, ... Performance of Fractal-Tree Databases. B

Michael Bender -- Performance of Fractal-Tree Databases

Building TokuDB Storage Engine for MySQL

Engineering to do list• Need concurrency-control mechanisms

• Need crash safety

• Need transactions, logging+recovery

• Need better search cost

• Need to store variable-size elements

• Need better amortization

• Need to be good for random and sequential inserts

• Need to support multithreading.

• Need compression

43

Page 44: Performance of Fractal-Tree Databases · PDF fileState of the art (algorithmic perspective): ... (modulo swapping, ... Performance of Fractal-Tree Databases. B

Michael Bender -- Performance of Fractal-Tree Databases

Concurrency Control for Transactions

Transactions• Sequence of durable operations.• Happen atomically.

Atomicity in TokuDB via pessimistic locking• readers lock: A and B can both read row x of database.• writers lock: if A writes to row x, B cannot read x until A

completes.

44

A DB E

C

A DE CB

Page 45: Performance of Fractal-Tree Databases · PDF fileState of the art (algorithmic perspective): ... (modulo swapping, ... Performance of Fractal-Tree Databases. B

Michael Bender -- Performance of Fractal-Tree Databases

B-tree implementation: maintain locks in leaves• Insert row t• Search for row u• Search for row v and put a cursor • Increment cursor. Now cursor points to row w.

Doesn’t work for Fractal Trees: maintaining locks involves implicit searches on writes.

45

v w t

writer lock

u

reader lock reader range lock

Concurrency Control for Transactions (cont)

Page 46: Performance of Fractal-Tree Databases · PDF fileState of the art (algorithmic perspective): ... (modulo swapping, ... Performance of Fractal-Tree Databases. B

Scaling Fractal Trees into the Future

Page 47: Performance of Fractal-Tree Databases · PDF fileState of the art (algorithmic perspective): ... (modulo swapping, ... Performance of Fractal-Tree Databases. B

iiBench on SSD

B-trees are slow on SSDs, probably b/c they waste bandwidth. • When inserting one row, a whole block (much larger) is written.

47

0

5000

10000

15000

20000

25000

30000

35000

0 5e+07 1e+08 1.5e+08

Insert

ion R

ate

Cummulative Insertions

RAID10X25-EFusionIO

InnoDB

TokuDB

RAID10

X25E

FusionIO

Page 48: Performance of Fractal-Tree Databases · PDF fileState of the art (algorithmic perspective): ... (modulo swapping, ... Performance of Fractal-Tree Databases. B

B-tree Inserts Are Slow on SSDs

Inserting an element of size x into a B-tree dirties a leaf block of size B.

We can write keys of size x into a B-tree using at most a O(x/B) fraction of disk bandwidth.

Fractal trees do efficient inserts into SSDs because they transform random I/O into sequential I/O.

48

Bx

Page 49: Performance of Fractal-Tree Databases · PDF fileState of the art (algorithmic perspective): ... (modulo swapping, ... Performance of Fractal-Tree Databases. B

B-tree Inserts Are Slow on SSDs

Inserting an element of size x into a B-tree dirties a leaf block of size B.

We can write keys of size x into a B-tree using at most a O(x/B) fraction of disk bandwidth.

Fractal trees do efficient inserts on SSDs because they transform random I/O into sequential I/O.

49

Bx

Page 50: Performance of Fractal-Tree Databases · PDF fileState of the art (algorithmic perspective): ... (modulo swapping, ... Performance of Fractal-Tree Databases. B

Disk Hardware Trends

Disk capacity will continue to grow quickly

but seek times will change slowly.

• Bandwidth scales as square root of capacity.

50

Year Capacity Bandwidth

2008 2 TB 100MB/s

2012 4.5 TB 150MB/s

2017 67 TB 500MB/s

Source: http://blocksandfiles.com/article/4501

Page 51: Performance of Fractal-Tree Databases · PDF fileState of the art (algorithmic perspective): ... (modulo swapping, ... Performance of Fractal-Tree Databases. B

Fractal Trees Enable Compact Systems

B-trees require capacity, bandwidth, and random I/O

• B-tree based systems achieve large random I/O rates by using more spindles and lower capacity disks.

Fractal Trees require only capacity & bandwidth• Fractal Trees enable the use of high-capacity disks.

51

Page 52: Performance of Fractal-Tree Databases · PDF fileState of the art (algorithmic perspective): ... (modulo swapping, ... Performance of Fractal-Tree Databases. B

Fractal Trees Enable Big Disks

B-trees require capacity, bandwidth, and seeks.

Fractal trees require only capacity and bandwidth.

Today, for a 50TB database,• Fractal tree with 25 2TB disks gives 500K ins/s.• B-tree with 25 2TB disks gives 2.5K ins/s.• B-tree with 500 100GB disks gives 50K ins/s but costs $, racks, and

power.

In 2017, for a 1500TB database:• Fractal tree with 25 67TB disks gives 2500K ins/s.• B-tree with 25 67TB disks gives 2.5K ins/s.

B-trees need spindles, and spindle density increases slowly.

52

Page 53: Performance of Fractal-Tree Databases · PDF fileState of the art (algorithmic perspective): ... (modulo swapping, ... Performance of Fractal-Tree Databases. B

Michael Bender -- Performance of Fractal-Tree Databases

Using Big Disks Also Saves Energy

Power consumption of disks• Enterprise 80 to 160 GB disk runs at 4W (idle power).

• Enterprise 1-2 TB disk runs at 8W (idle power).

Data centers/server farms use 80-160 GB disks• Use many small-capacity disks, not large ones.

Using large disks may save factor >10 in Storage Costs

• Other considerations modify this factor‣ e.g., CPUs necessary to drive disks, scale-out infrastructure, cooling, etc. ‣ Metric: e.g., Watts/MB versus Inserts/Joule

53

Power Management in High-Density Data Centers

Michael Bender

2