data management on non-volatile memoryjarulraj/talks/2018.job.talk.pdf · 2018-09-25 · 68...

79
DATA MANAGEMENT ON NON-VOLATILE MEMORY JOY ARULRAJ Carnegie Mellon University

Upload: others

Post on 07-Jul-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: DATA MANAGEMENT ON NON-VOLATILE MEMORYjarulraj/talks/2018.job.talk.pdf · 2018-09-25 · 68 WRITE-BEHIND LOGGING: RELATED WORK •NVM-centric logging, but only support linear-time

1

DATA MANAGEMENT ONNON-VOLATILE MEMORYJOY ARULRAJCarnegie Mellon University

Page 2: DATA MANAGEMENT ON NON-VOLATILE MEMORYjarulraj/talks/2018.job.talk.pdf · 2018-09-25 · 68 WRITE-BEHIND LOGGING: RELATED WORK •NVM-centric logging, but only support linear-time

2

1960 1980 2000 2020

DRAM NON-VOLATILEMEMORY

FLASHMEMORY

EVOLUTION OF MEMORY TECHNOLOGY

1970 1990 2010

Page 3: DATA MANAGEMENT ON NON-VOLATILE MEMORYjarulraj/talks/2018.job.talk.pdf · 2018-09-25 · 68 WRITE-BEHIND LOGGING: RELATED WORK •NVM-centric logging, but only support linear-time

3

NON-VOLATILE MEMORY [NVM]

PERFORMANCE

DURABILITY

FAST

SLOW

VOLATILE NON-VOLATILE

SSD

NVMDRAM

Page 4: DATA MANAGEMENT ON NON-VOLATILE MEMORYjarulraj/talks/2018.job.talk.pdf · 2018-09-25 · 68 WRITE-BEHIND LOGGING: RELATED WORK •NVM-centric logging, but only support linear-time

4

DEVICE CHARACTERISTICS

CHARACTERISTIC DRAM NVM SSDDevice Latency 1x 10x 1000xByte-AddressabilityDurability

Cost/GB 100x 10x 1xHigh Capacity

Page 5: DATA MANAGEMENT ON NON-VOLATILE MEMORYjarulraj/talks/2018.job.talk.pdf · 2018-09-25 · 68 WRITE-BEHIND LOGGING: RELATED WORK •NVM-centric logging, but only support linear-time

5

50 YEARS OF DATABASE SYSTEMS RESEARCH

SEPARATEMEMORY & STORAGE

DRAM

SSD SSD

CPU

SSD

COMPUTE/STORAGEBALANCE

RANDOM VS. SEQUENTIAL

21 3

Page 6: DATA MANAGEMENT ON NON-VOLATILE MEMORYjarulraj/talks/2018.job.talk.pdf · 2018-09-25 · 68 WRITE-BEHIND LOGGING: RELATED WORK •NVM-centric logging, but only support linear-time

6

RESEARCH OVERVIEW

• How to manage data on NVM?– Challenging because of NVM’s unique characteristics– Important given the sudden shift in compute/storage balance

DATABASE SYSTEM

NVM

Page 7: DATA MANAGEMENT ON NON-VOLATILE MEMORYjarulraj/talks/2018.job.talk.pdf · 2018-09-25 · 68 WRITE-BEHIND LOGGING: RELATED WORK •NVM-centric logging, but only support linear-time

7

#1: INDUSTRY STANDARDS

• Standardization of NVM technologies– Design standards– Interface specifications

JUNE 2016

Page 8: DATA MANAGEMENT ON NON-VOLATILE MEMORYjarulraj/talks/2018.job.talk.pdf · 2018-09-25 · 68 WRITE-BEHIND LOGGING: RELATED WORK •NVM-centric logging, but only support linear-time

8

#2: OPERATING SYSTEM SUPPORT

• Major operating systems natively support NVM– Linux 4.8– Windows 10

AUGUST 2015

Page 9: DATA MANAGEMENT ON NON-VOLATILE MEMORYjarulraj/talks/2018.job.talk.pdf · 2018-09-25 · 68 WRITE-BEHIND LOGGING: RELATED WORK •NVM-centric logging, but only support linear-time

9

#3: ARCHITECTURAL SUPPORT

• New assembly instructions in ISA updates– Efficiently flush data from volatile CPU cache to NVM– Kaby Lake processor

MAY 2016

Page 10: DATA MANAGEMENT ON NON-VOLATILE MEMORYjarulraj/talks/2018.job.talk.pdf · 2018-09-25 · 68 WRITE-BEHIND LOGGING: RELATED WORK •NVM-centric logging, but only support linear-time

10

HOW DO TODAY’SDATABASE SYSTEMS PERFORM

ON NON-VOLATILE MEMORY?

Page 11: DATA MANAGEMENT ON NON-VOLATILE MEMORYjarulraj/talks/2018.job.talk.pdf · 2018-09-25 · 68 WRITE-BEHIND LOGGING: RELATED WORK •NVM-centric logging, but only support linear-time

11

NVM HARDWARE EMULATOR [INTEL]

• Emulates a wide range of NVM technologies– Special CPU microcode– Supports recently added assembly instructions

STORECACHE LINE

WRITE-BACK

CPU CACHE NVM

L1 CACHE

L2 CACHE

Page 12: DATA MANAGEMENT ON NON-VOLATILE MEMORYjarulraj/talks/2018.job.talk.pdf · 2018-09-25 · 68 WRITE-BEHIND LOGGING: RELATED WORK •NVM-centric logging, but only support linear-time

12

TODAY’S DATABASE SYSTEMS ON NVM

• Database System: MySQL• Storage device performance– NVM’s performance compared to that of disk– I/O benchmark

• Database system performance– On NVM compared to that on disk– TPC-C benchmark

Page 13: DATA MANAGEMENT ON NON-VOLATILE MEMORYjarulraj/talks/2018.job.talk.pdf · 2018-09-25 · 68 WRITE-BEHIND LOGGING: RELATED WORK •NVM-centric logging, but only support linear-time

13

STORAGE HIERARCHIES

NVMDRAM

SSD

DISK-BASED HIERARCHY NVM-BASED HIERARCHY

BOTH MEMORY& STORAGE

21

Page 14: DATA MANAGEMENT ON NON-VOLATILE MEMORYjarulraj/talks/2018.job.talk.pdf · 2018-09-25 · 68 WRITE-BEHIND LOGGING: RELATED WORK •NVM-centric logging, but only support linear-time

14

I/OThroughput(Operations/sec

/thread)

0

100,000

200,000

300,000

STORAGE DEVICE PERFORMANCE

>100x

Higher isBetter

278971

2541

SOLID-STATE DISK

NON-VOLATILEMEMORY

Page 15: DATA MANAGEMENT ON NON-VOLATILE MEMORYjarulraj/talks/2018.job.talk.pdf · 2018-09-25 · 68 WRITE-BEHIND LOGGING: RELATED WORK •NVM-centric logging, but only support linear-time

15

TransactionalThroughput

(Transactions/sec)

0

3,000

6,000

9,000

DATABASE SYSTEM PERFORMANCE

Higher isBetter

Disk-centric designhurts performance on NVM

SOLID-STATE DISK

NON-VOLATILEMEMORY

1750

5510

3x

Page 16: DATA MANAGEMENT ON NON-VOLATILE MEMORYjarulraj/talks/2018.job.talk.pdf · 2018-09-25 · 68 WRITE-BEHIND LOGGING: RELATED WORK •NVM-centric logging, but only support linear-time

16

INDEXING(VLDB’18)

STORAGEMANAGEMENT

(SIGMOD’15)LOGGING AND RECOVERY(VLDB’17)

QUERYPROCESSING

(SIGMOD’16)

LAYERS OF A DATABASE SYSTEM

PELOTON NVM DATABASE SYSTEM

Page 17: DATA MANAGEMENT ON NON-VOLATILE MEMORYjarulraj/talks/2018.job.talk.pdf · 2018-09-25 · 68 WRITE-BEHIND LOGGING: RELATED WORK •NVM-centric logging, but only support linear-time

17

WRITE-BEHINDLOGGING

BZTREEINDEX

FUTUREDIRECTIONS

WRITE-BEHINDLOGGING

BZTREEINDEX

FUTUREDIRECTIONS

Page 18: DATA MANAGEMENT ON NON-VOLATILE MEMORYjarulraj/talks/2018.job.talk.pdf · 2018-09-25 · 68 WRITE-BEHIND LOGGING: RELATED WORK •NVM-centric logging, but only support linear-time

18

LOGGING & RECOVERY: MOTIVATION

BEGIN TRANSACTIONAdd OrderUpdate Stock COMMIT TRANSACTION

DATABASE SYSTEM

SHOPPINGAPPLICATION

DURABILITY

FAULT-TOLERANCE

ATOMICITY

ALL-OR-NOTHING

Page 19: DATA MANAGEMENT ON NON-VOLATILE MEMORYjarulraj/talks/2018.job.talk.pdf · 2018-09-25 · 68 WRITE-BEHIND LOGGING: RELATED WORK •NVM-centric logging, but only support linear-time

19

WRITE-AHEAD LOGGING: DURABILITYBUFFER POOL

DATABASE

ORDER

STOCK

ORDER

STOCK

LOG

BEGIN TRANSACTIONAdd OrderUpdate Stock COMMIT TRANSACTION

DRAM

SSD

1

ORDER

STOCK

23

Page 20: DATA MANAGEMENT ON NON-VOLATILE MEMORYjarulraj/talks/2018.job.talk.pdf · 2018-09-25 · 68 WRITE-BEHIND LOGGING: RELATED WORK •NVM-centric logging, but only support linear-time

20

WRITE-AHEAD LOGGING: ATOMICITYLOG

TRANSACTION #1 – BEGIN

TRANSACTION #1 – ADD ORDER

SYSTEM CRASH

TRANSACTION #2 – BEGIN

TRANSACTION #1 – UPDATE STOCK

TRANSACTION #3 – BEGIN

TRANSACTION #3 – ADD ORDER

BEGIN TRANSACTIONAdd OrderUpdate Stock COMMIT TRANSACTION

BEGIN TRANSACTIONAdd OrderUpdate Stock COMMIT TRANSACTION

BEGIN TRANSACTIONAdd OrderUpdate Stock COMMIT TRANSACTION

Page 21: DATA MANAGEMENT ON NON-VOLATILE MEMORYjarulraj/talks/2018.job.talk.pdf · 2018-09-25 · 68 WRITE-BEHIND LOGGING: RELATED WORK •NVM-centric logging, but only support linear-time

21

WRITE-AHEAD LOGGING: RECOVERY PROTOCOL

ACTIVE TRANSACTION TABLETXN ID STATUS LATEST CHANGETXN #2 RUNNING LOG RECORD #7TXN #3 RUNNING ---

DIRTY PAGE TABLEPAGE ID CHANGE THAT DIRTIED PAGE

PAGE #30 LOG RECORD #5PAGE #40 LOG RECORD #7

BUFFER POOL

DATABASELOG

LOG ANALYSIS1 REDO CHANGES2 UNDO CHANGES3

DRAM

SSD

Page 22: DATA MANAGEMENT ON NON-VOLATILE MEMORYjarulraj/talks/2018.job.talk.pdf · 2018-09-25 · 68 WRITE-BEHIND LOGGING: RELATED WORK •NVM-centric logging, but only support linear-time

22

PROBLEM #1: DATA DUPLICATION

STORAGE COST

PERFORMANCE

BUFFER POOL

DATABASELOG

NVM

DATA

DATA

DATA

23

1

Page 23: DATA MANAGEMENT ON NON-VOLATILE MEMORYjarulraj/talks/2018.job.talk.pdf · 2018-09-25 · 68 WRITE-BEHIND LOGGING: RELATED WORK •NVM-centric logging, but only support linear-time

23

PROBLEM #2: SLOW RECOVERYBUFFER POOL

DATABASELOG

AVAILABILITY

LINEAR-TIME RECOVERY

NEEDS TO REDO LOG

NVM

Page 24: DATA MANAGEMENT ON NON-VOLATILE MEMORYjarulraj/talks/2018.job.talk.pdf · 2018-09-25 · 68 WRITE-BEHIND LOGGING: RELATED WORK •NVM-centric logging, but only support linear-time

24

24

HOW TO IMPROVEPERFORMANCE AND AVAILABILTY

ON NON-VOLATILE MEMORY?

WRITE-BEHIND LOGGINGVLDB 2017

Page 25: DATA MANAGEMENT ON NON-VOLATILE MEMORYjarulraj/talks/2018.job.talk.pdf · 2018-09-25 · 68 WRITE-BEHIND LOGGING: RELATED WORK •NVM-centric logging, but only support linear-time

25

WRITE-BEHIND LOGGING: OVERVIEW

• NVM-centric design– Improves availability by enabling instant recovery– Provides same guarantees as write-ahead logging

• Key techniques– Directly propagate changes to the database– Only record meta-data in log

Page 26: DATA MANAGEMENT ON NON-VOLATILE MEMORYjarulraj/talks/2018.job.talk.pdf · 2018-09-25 · 68 WRITE-BEHIND LOGGING: RELATED WORK •NVM-centric logging, but only support linear-time

26

STOCK STOCK’ STOCK’’DATAVERSION

0 100 200TRANSACTIONTIMESTAMP

DATA VERSIONING USING TIMESTAMPS

DATABASE

STOCKSTOCK’

STOCK’’

Page 27: DATA MANAGEMENT ON NON-VOLATILE MEMORYjarulraj/talks/2018.job.talk.pdf · 2018-09-25 · 68 WRITE-BEHIND LOGGING: RELATED WORK •NVM-centric logging, but only support linear-time

27

WRITE-BEHIND LOGGING: DURABILITYBUFFER POOL

DATABASE

ORDER ORDER

LOG

BEGIN TRANSACTIONAdd OrderUpdate Stock COMMIT TRANSACTION

BEGIN TRANSACTIONAdd OrderUpdate Stock COMMIT TRANSACTION

BEGIN TRANSACTIONAdd OrderUpdate Stock COMMIT TRANSACTION

NVM

STOCK’

STOCK

ORDER

STOCK’’

1

L1 CACHE

L2 CACHE

Page 28: DATA MANAGEMENT ON NON-VOLATILE MEMORYjarulraj/talks/2018.job.talk.pdf · 2018-09-25 · 68 WRITE-BEHIND LOGGING: RELATED WORK •NVM-centric logging, but only support linear-time

28

<10, 20>

WRITE-BEHIND LOGGING: ATOMICITY

<10, 20>

TIMESTAMP INTERVAL FOR EACH TRANSACTION BATCH

DATABASE

DATABASE SYSTEM

LOG

BEGIN TRANSACTIONAdd OrderUpdate Stock COMMIT TRANSACTION

BEGIN TRANSACTIONAdd OrderUpdate Stock COMMIT TRANSACTION

BEGIN TRANSACTIONAdd OrderUpdate Stock COMMIT TRANSACTION

ORDER ORDER

STOCK’

STOCK

ORDER

STOCK’’NVM

Page 29: DATA MANAGEMENT ON NON-VOLATILE MEMORYjarulraj/talks/2018.job.talk.pdf · 2018-09-25 · 68 WRITE-BEHIND LOGGING: RELATED WORK •NVM-centric logging, but only support linear-time

29

WRITE-BEHIND LOGGING: RECOVERY PROTOCOL

BATCH 1TRANSACTION

BATCH

—FAILED

TIMESTAMPINTERVAL

DATABASE

10

<10, 20>

<20, 30>

<30, 40>

<40, 50>

<50, 60>

BATCH 2

BATCH 3

BATCH 4

<20, 30>

BATCH 5

15

2025

35

10

1535

LOGNVM

<20, 30>

Page 30: DATA MANAGEMENT ON NON-VOLATILE MEMORYjarulraj/talks/2018.job.talk.pdf · 2018-09-25 · 68 WRITE-BEHIND LOGGING: RELATED WORK •NVM-centric logging, but only support linear-time

30

SOLUTION #1: NO DATA DUPLICATION

DATA META-DATA

DATABASE

ORDER

STOCK

<10, 20>

NVM

LOG

1

STORAGE COSTPERFORMANCE

Page 31: DATA MANAGEMENT ON NON-VOLATILE MEMORYjarulraj/talks/2018.job.talk.pdf · 2018-09-25 · 68 WRITE-BEHIND LOGGING: RELATED WORK •NVM-centric logging, but only support linear-time

31

SOLUTION #2: INSTANT RECOVERY

REDO2ANALYSIS1 UNDO3

ANALYSIS1

WRITE-AHEAD LOGGING Linear-Time Recovery

WRITE-BEHIND LOGGING Constant-Time Recovery

AVAILABILITY

Page 32: DATA MANAGEMENT ON NON-VOLATILE MEMORYjarulraj/talks/2018.job.talk.pdf · 2018-09-25 · 68 WRITE-BEHIND LOGGING: RELATED WORK •NVM-centric logging, but only support linear-time

32

WRITE-BEHIND LOGGING

• Enables instant recovery from failures • Eliminates data duplication• Generalizes to single-versioned database systems• Supports a multi-tier storage hierarchy• Handles long lived transactions• Copes with failures during recovery

Page 33: DATA MANAGEMENT ON NON-VOLATILE MEMORYjarulraj/talks/2018.job.talk.pdf · 2018-09-25 · 68 WRITE-BEHIND LOGGING: RELATED WORK •NVM-centric logging, but only support linear-time

33

EVALUATION

• Logging Protocols: Write-Behind vs. Write-Ahead Logging– Recovery Time– Database System Performance

• Workload: TPC-C benchmark on Peloton• Storage devices– Solid-state disk– Non-volatile memory

Page 34: DATA MANAGEMENT ON NON-VOLATILE MEMORYjarulraj/talks/2018.job.talk.pdf · 2018-09-25 · 68 WRITE-BEHIND LOGGING: RELATED WORK •NVM-centric logging, but only support linear-time

34

RECOVERY TIME

0

100

200

300

SOLID-STATE DISK NON-VOLATILE MEMORY

WRITE-BEHIND LOGGINGWRITE-AHEAD LOGGING

160x250x

RecoveryTime (sec)

Lower isBetter

260

481.7 0.3

Page 35: DATA MANAGEMENT ON NON-VOLATILE MEMORYjarulraj/talks/2018.job.talk.pdf · 2018-09-25 · 68 WRITE-BEHIND LOGGING: RELATED WORK •NVM-centric logging, but only support linear-time

35

DATABASE SYSTEM PERFORMANCE

0

40,000

80,000

120,000

SOLID-STATE DISK NON-VOLATILE MEMORY

2x

5x8K 41K

1.7K

88KWrite-behind logging only

works well on NVM

WRITE-BEHIND LOGGINGWRITE-AHEAD LOGGING

TransactionalThroughput

(Transactions/sec)

Higher isBetter

Page 36: DATA MANAGEMENT ON NON-VOLATILE MEMORYjarulraj/talks/2018.job.talk.pdf · 2018-09-25 · 68 WRITE-BEHIND LOGGING: RELATED WORK •NVM-centric logging, but only support linear-time

36

WRITE-BEHIND LOGGING: SUMMARY

Advances the state of the art by shifting the complexity class of the recovery protocol on NVM

STORAGE COSTPERFORMANCEAVAILABILITY

Page 37: DATA MANAGEMENT ON NON-VOLATILE MEMORYjarulraj/talks/2018.job.talk.pdf · 2018-09-25 · 68 WRITE-BEHIND LOGGING: RELATED WORK •NVM-centric logging, but only support linear-time

37

PELOTON NVM DATABASE SYSTEM

INDEXING(VLDB’18)

STORAGEMANAGEMENT

(SIGMOD’ 15)LOGGING AND RECOVERY(VLDB’17)

QUERYPROCESSING

(SIGMOD’16)

LAYERS OF A DATABASE SYSTEM

Page 38: DATA MANAGEMENT ON NON-VOLATILE MEMORYjarulraj/talks/2018.job.talk.pdf · 2018-09-25 · 68 WRITE-BEHIND LOGGING: RELATED WORK •NVM-centric logging, but only support linear-time

38

WRITE-BEHINDLOGGING

BZTREEINDEX

FUTUREDIRECTIONS

WRITE-BEHINDLOGGING

BZTREEINDEX

FUTUREDIRECTIONS

Page 39: DATA MANAGEMENT ON NON-VOLATILE MEMORYjarulraj/talks/2018.job.talk.pdf · 2018-09-25 · 68 WRITE-BEHIND LOGGING: RELATED WORK •NVM-centric logging, but only support linear-time

39

INDEXING DATA: MOTIVATION

BEGIN TRANSACTIONUpdate Stock By Stock IDCOMMIT TRANSACTION

DATABASE SYSTEM

SHOPPINGAPPLICATION

1010

STOCK INDEX

STOCK ID > 5 AND < 15

5

1 5

15

10 15

15

10

Page 40: DATA MANAGEMENT ON NON-VOLATILE MEMORYjarulraj/talks/2018.job.talk.pdf · 2018-09-25 · 68 WRITE-BEHIND LOGGING: RELATED WORK •NVM-centric logging, but only support linear-time

40

SYNCHRONIZATION WITH LOCKS

BEGIN TRANSACTIONUpdate Stock by Stock IDCOMMIT TRANSACTION

BEGIN TRANSACTIONUpdate Stock by Stock IDCOMMIT TRANSACTION

BEGIN TRANSACTIONUpdate Stock by Stock IDCOMMIT TRANSACTION

1010

5

1 5

15

10 15

15

105

10

5

1

10

5

Page 41: DATA MANAGEMENT ON NON-VOLATILE MEMORYjarulraj/talks/2018.job.talk.pdf · 2018-09-25 · 68 WRITE-BEHIND LOGGING: RELATED WORK •NVM-centric logging, but only support linear-time

41

BWTREE: LOCK-FREE B+TREE [MICROSOFT]

10

5

1 5

15

10 151051

SINGLE-WORD COMPARE-AND-SWAP

INSTRUCTION CPU

Page 42: DATA MANAGEMENT ON NON-VOLATILE MEMORYjarulraj/talks/2018.job.talk.pdf · 2018-09-25 · 68 WRITE-BEHIND LOGGING: RELATED WORK •NVM-centric logging, but only support linear-time

42

BWTREE: DURABILITY & ATOMICITY

1

3

BUFFER POOL

LOGINDEX

2

BEGIN TRANSACTIONUpdate Stock by Stock IDCOMMIT TRANSACTION

INDEX-SPECIFICLOGGING &RECOVERY

DRAM

SSD

Page 43: DATA MANAGEMENT ON NON-VOLATILE MEMORYjarulraj/talks/2018.job.talk.pdf · 2018-09-25 · 68 WRITE-BEHIND LOGGING: RELATED WORK •NVM-centric logging, but only support linear-time

43

PROBLEM #1: HIGH CODE COMPLEXITY

S

AB BA

S’

SPLITTING A NODE

1

3

2

4

SINGLE-WORDCOMPARE-AND-SWAP

INSTRUCTION CPU

LOCK-FREEDOM

INTERMEDIATE STATES

Page 44: DATA MANAGEMENT ON NON-VOLATILE MEMORYjarulraj/talks/2018.job.talk.pdf · 2018-09-25 · 68 WRITE-BEHIND LOGGING: RELATED WORK •NVM-centric logging, but only support linear-time

44

PROBLEM #2: INDEX-SPECIFIC PROTOCOL

1

3

BUFFER POOL

INDEX

2DURABILITY &

ATOMICITY

INDEX-SPECIFICLOGGING &RECOVERY

LOG

NVM

Page 45: DATA MANAGEMENT ON NON-VOLATILE MEMORYjarulraj/talks/2018.job.talk.pdf · 2018-09-25 · 68 WRITE-BEHIND LOGGING: RELATED WORK •NVM-centric logging, but only support linear-time

45

45

HOW TO SIMPLIFY PROGRAMMING ON NON-VOLATILE MEMORY?

BZTREE: A HIGH-PERFORMANCE LATCH-FREE INDEX FOR NON-VOLATILE MEMORYVLDB 2018

Page 46: DATA MANAGEMENT ON NON-VOLATILE MEMORYjarulraj/talks/2018.job.talk.pdf · 2018-09-25 · 68 WRITE-BEHIND LOGGING: RELATED WORK •NVM-centric logging, but only support linear-time

46

BZTREE: OVERVIEW

• NVM-centric design– Uses a new software primitive to simplify programming– Provides same guarantees as disk-centric BwTree

• BzTree supersedes BwTree – But, we skipped BxTree & ByTree– Because we think it’s the “last” index you will ever need!

• Key techniques– Offload programming complexity to the software primitive– Adopt a simpler NVM-centric architecture

Page 47: DATA MANAGEMENT ON NON-VOLATILE MEMORYjarulraj/talks/2018.job.talk.pdf · 2018-09-25 · 68 WRITE-BEHIND LOGGING: RELATED WORK •NVM-centric logging, but only support linear-time

47

NVM-CENTRIC SOFTWARE PRIMITIVE

VOLATILESINGLE-WORD

COMPARE-AND-SWAP

1

3

2

HARDWARE PRIMITIVE

1

SOFTWARE PRIMITIVE

1

1NVMDRAM

PERSISTENTMULTI-WORD

COMPARE-AND-SWAP

Page 48: DATA MANAGEMENT ON NON-VOLATILE MEMORYjarulraj/talks/2018.job.talk.pdf · 2018-09-25 · 68 WRITE-BEHIND LOGGING: RELATED WORK •NVM-centric logging, but only support linear-time

48

BZTREE: NVM-CENTRIC ARCHITECTURE

1

BUFFER POOL

INDEX

NVMLOG

BEGIN TRANSACTIONUpdate Stock by Stock IDCOMMIT TRANSACTION

BEGIN TRANSACTIONUpdate Stock by Stock IDCOMMIT TRANSACTION

BEGIN TRANSACTIONUpdate Stock by Stock IDCOMMIT TRANSACTION

SOFTWARE PRIMITIVE

L1 CACHE

L2 CACHE

Page 49: DATA MANAGEMENT ON NON-VOLATILE MEMORYjarulraj/talks/2018.job.talk.pdf · 2018-09-25 · 68 WRITE-BEHIND LOGGING: RELATED WORK •NVM-centric logging, but only support linear-time

49

BZTREE: DURABILITY AND ATOMICITY

OPERATION TABLELOCATION EXPECTED OLD VALUE NEW VALUE FLUSHED

NVM0x300 OLD PARENT POINTER NEW PARENT POINTER 0

0x100 OLD CHILD POINTER NEW CHILD POINTER 1

0x200 OLD NODE STATUS NEW NODE STATUS 1

SOFTWARE PRIMITIVE

1

1

1

PERSISTENTMULTI-WORD

COMPARE-AND-SWAP

Page 50: DATA MANAGEMENT ON NON-VOLATILE MEMORYjarulraj/talks/2018.job.talk.pdf · 2018-09-25 · 68 WRITE-BEHIND LOGGING: RELATED WORK •NVM-centric logging, but only support linear-time

50

SOLUTION #1: LOW CODE COMPLEXITY

S

AB BA

S’

SPLITTING A NODE

1

1

1

1EXPONENTIALLY

FEWERINTERMEDIATE

STATES

SOFTWARE PRIMITIVE

Page 51: DATA MANAGEMENT ON NON-VOLATILE MEMORYjarulraj/talks/2018.job.talk.pdf · 2018-09-25 · 68 WRITE-BEHIND LOGGING: RELATED WORK •NVM-centric logging, but only support linear-time

51

SOLUTION #2: NO INDEX-SPECIFIC PROTOCOL

LOCATION OLD VALUE NEW VALUE FLUSHED

0x100 OLD CHILD POINTER NEW CHILD POINTER 1

0x200 OLD NODE STATUS NEW NODE STATUS 1

0x300 OLD PARENT POINTER NEW PARENT POINTER 0

1

INDEX

NVM

DURABILITY & ATOMICITY

SOFTWARE PRIMITIVE

Page 52: DATA MANAGEMENT ON NON-VOLATILE MEMORYjarulraj/talks/2018.job.talk.pdf · 2018-09-25 · 68 WRITE-BEHIND LOGGING: RELATED WORK •NVM-centric logging, but only support linear-time

52

EVALUATION

• Index data structures: BzTree vs. BwTree index– Code complexity– Index Performance

• Benchmark: Yahoo Cloud Serving benchmark– Read-mostly workload– Balanced workload

• Storage devices– Non-volatile memory (BzTree only works on NVM)

Page 53: DATA MANAGEMENT ON NON-VOLATILE MEMORYjarulraj/talks/2018.job.talk.pdf · 2018-09-25 · 68 WRITE-BEHIND LOGGING: RELATED WORK •NVM-centric logging, but only support linear-time

53

CODE COMPLEXITY [NODE SPLIT PROTOCOL]

CODE COMPLEXITY METRIC BWTREE BZTREE

Lower isBetter

2x4xLINES OF CODE 750 200

FEWER INTERMEDIATESTATES

NO INDEX-SPECIFICPROTOCOL

CYCLOMATIC COMPLEXITY 12 7

1 2

Page 54: DATA MANAGEMENT ON NON-VOLATILE MEMORYjarulraj/talks/2018.job.talk.pdf · 2018-09-25 · 68 WRITE-BEHIND LOGGING: RELATED WORK •NVM-centric logging, but only support linear-time

54

INDEX PERFORMANCE

NVM-CENTRIC BZTREEDISK-CENTRIC BWTREE

Throughput(M Operations/sec)

Higher isBetter

0

30

60

90

READ-MOSTLY WORKLOAD

BALANCED WORKLOAD

4x2x

27M 7M45M

31M

In addition to simplifying programming, BzTree also delivers better performance

Page 55: DATA MANAGEMENT ON NON-VOLATILE MEMORYjarulraj/talks/2018.job.talk.pdf · 2018-09-25 · 68 WRITE-BEHIND LOGGING: RELATED WORK •NVM-centric logging, but only support linear-time

55

BZTREE: SUMMARY

Advances the state of the art by illustrating a simpler way to design data structures for NVM

STORAGE COSTPERFORMANCEAVAILABILITY

Page 56: DATA MANAGEMENT ON NON-VOLATILE MEMORYjarulraj/talks/2018.job.talk.pdf · 2018-09-25 · 68 WRITE-BEHIND LOGGING: RELATED WORK •NVM-centric logging, but only support linear-time

56

56

OTHER RESEARCH PROJECTS

Page 57: DATA MANAGEMENT ON NON-VOLATILE MEMORYjarulraj/talks/2018.job.talk.pdf · 2018-09-25 · 68 WRITE-BEHIND LOGGING: RELATED WORK •NVM-centric logging, but only support linear-time

57

57

CURRENT RESEARCH AGENDA

AREA #1:NON-VOLATILE MEMORY

DATABASE SYSTEMS

AREA #2:SELF-DRIVING

DATABASE SYSTEMS

Page 58: DATA MANAGEMENT ON NON-VOLATILE MEMORYjarulraj/talks/2018.job.talk.pdf · 2018-09-25 · 68 WRITE-BEHIND LOGGING: RELATED WORK •NVM-centric logging, but only support linear-time

58

AREA #1: NVM DATABASE SYSTEM

INDEXING(VLDB’18)

STORAGEMANAGEMENT

(SIGMOD’15)LOGGING AND RECOVERY(VLDB’17)

QUERYPROCESSING

(SIGMOD’16)

LAYERS OF A DATABASE SYSTEM

Page 59: DATA MANAGEMENT ON NON-VOLATILE MEMORYjarulraj/talks/2018.job.talk.pdf · 2018-09-25 · 68 WRITE-BEHIND LOGGING: RELATED WORK •NVM-centric logging, but only support linear-time

59

AREA #2: SELF-DRIVING DATABASE SYSTEM

INDEX TUNING(UNDER REVIEW)

STORAGE LAYOUT TUNING

(SIGMOD’16)SELF-DRIVINGDESIGN(CIDR’17)

SELF-DRIVINGDATABASE SYSTEM

Page 60: DATA MANAGEMENT ON NON-VOLATILE MEMORYjarulraj/talks/2018.job.talk.pdf · 2018-09-25 · 68 WRITE-BEHIND LOGGING: RELATED WORK •NVM-centric logging, but only support linear-time

60

WRITE-BEHINDLOGGING

BZTREEINDEX

FUTUREDIRECTIONS

WRITE-BEHINDLOGGING

BZTREEINDEX

FUTUREDIRECTIONS

Page 61: DATA MANAGEMENT ON NON-VOLATILE MEMORYjarulraj/talks/2018.job.talk.pdf · 2018-09-25 · 68 WRITE-BEHIND LOGGING: RELATED WORK •NVM-centric logging, but only support linear-time

61

• Storage management tailored for a multi-tier hierarchy– Industry collaboration: Intel Labs, Samsung Research

CROSS-MEDIA STORAGE MANAGEMENT

DRAM SSDCPU NVM

Page 62: DATA MANAGEMENT ON NON-VOLATILE MEMORYjarulraj/talks/2018.job.talk.pdf · 2018-09-25 · 68 WRITE-BEHIND LOGGING: RELATED WORK •NVM-centric logging, but only support linear-time

62

DECLARATIVE HARDWARE MANAGEMENT

DATABASE SYSTEM

MACHINE LEARNING

SYSTEM

• Most hardware-centric optimizations are system-specific

NVM TPU

This approach does not scale!

Page 63: DATA MANAGEMENT ON NON-VOLATILE MEMORYjarulraj/talks/2018.job.talk.pdf · 2018-09-25 · 68 WRITE-BEHIND LOGGING: RELATED WORK •NVM-centric logging, but only support linear-time

63

DECLARATIVE HARDWARE MANAGEMENT

DECLARATIVEHARDWARE MANAGER

DECLARATIVEREQUESTS

HARDWARE-CENTRICMECHANISMS

DATABASE SYSTEM

MACHINE LEARNING

SYSTEM

NVM TPU

Page 64: DATA MANAGEMENT ON NON-VOLATILE MEMORYjarulraj/talks/2018.job.talk.pdf · 2018-09-25 · 68 WRITE-BEHIND LOGGING: RELATED WORK •NVM-centric logging, but only support linear-time

64

CONCLUSION

• Non-volatile memory invalidates age-old design assumptions• Presented the design of a new NVM-centric database system• Broader impact on other types of data processing systems

BZTREEINDEX

WRITE-BEHINDLOGGING

Page 65: DATA MANAGEMENT ON NON-VOLATILE MEMORYjarulraj/talks/2018.job.talk.pdf · 2018-09-25 · 68 WRITE-BEHIND LOGGING: RELATED WORK •NVM-centric logging, but only support linear-time

65

http://pelotondb.io

INDEXING(VLDB’18)

STORAGEMANAGEMENT

(SIGMOD’15) LOGGING AND RECOVERY(VLDB’17)

QUERYPROCESSING

(SIGMOD’16)

Page 66: DATA MANAGEMENT ON NON-VOLATILE MEMORYjarulraj/talks/2018.job.talk.pdf · 2018-09-25 · 68 WRITE-BEHIND LOGGING: RELATED WORK •NVM-centric logging, but only support linear-time

66

66

BACKUP SLIDES

Page 67: DATA MANAGEMENT ON NON-VOLATILE MEMORYjarulraj/talks/2018.job.talk.pdf · 2018-09-25 · 68 WRITE-BEHIND LOGGING: RELATED WORK •NVM-centric logging, but only support linear-time

67

DECLARATIVE STORAGE MANAGEMENT

DECLARATIVEHARDWARE MANAGER

DECLARATIVESTORAGE ALGEBRA

STORAGE-CENTRICMECHANISMS

DATABASE SYSTEM

MACHINE LEARNINGSYSTEM

• DATA LAYOUT• DATA MIGRATION• DATA ORDERING• DATA DISTRIBUTION• DATA PACKING

NVMDRAM SSD

Page 68: DATA MANAGEMENT ON NON-VOLATILE MEMORYjarulraj/talks/2018.job.talk.pdf · 2018-09-25 · 68 WRITE-BEHIND LOGGING: RELATED WORK •NVM-centric logging, but only support linear-time

68

WRITE-BEHIND LOGGING: RELATED WORK

• NVM-centric logging, but only support linear-time recovery– NVM Group Commit [VLDB’13], Passive Group Commit [VLDB’14]

• NVM-centric logging with non-commodity hardware features– MARS [SOSP’13], BPFS [SOSP’09]

68

Write-behind logging enables constant-time recovery using only commodity hardware features

Page 69: DATA MANAGEMENT ON NON-VOLATILE MEMORYjarulraj/talks/2018.job.talk.pdf · 2018-09-25 · 68 WRITE-BEHIND LOGGING: RELATED WORK •NVM-centric logging, but only support linear-time

69

BZTREE: RELATED WORK

• NVM-centric indexing, but with index-specific recovery logic– FP-Tree [SIGMOD’16], NV-Tree [FAST’15]

• DRAM-centric indexing– ART [ICDE’13], MassTree [Eurosys’12]

69

BzTree illustrates a simpler way to design persistent data structures by obviating the need for index-specific recovery

Page 70: DATA MANAGEMENT ON NON-VOLATILE MEMORYjarulraj/talks/2018.job.talk.pdf · 2018-09-25 · 68 WRITE-BEHIND LOGGING: RELATED WORK •NVM-centric logging, but only support linear-time

70

RESEARCH IMPACT

• Research groups are shaping their systems for NVM– Oracle, SAP HANA

• Byte-addressable NVM is still not commercially available– Intel shipped block-addressable NVM in 2017– Intel plans to ship byte-addressable NVM in 2019– Peloton will be the only open-source database system ready for NVM

70

Page 71: DATA MANAGEMENT ON NON-VOLATILE MEMORYjarulraj/talks/2018.job.talk.pdf · 2018-09-25 · 68 WRITE-BEHIND LOGGING: RELATED WORK •NVM-centric logging, but only support linear-time

71

SHADOW PAGING71

RECOVERY PROTOCOLCHARACTERISTIC

SHADOWPAGING

WRITE-BEHINDLOGGING

PROTOCOL TYPE NO REDO/ NO UNDO NO REDO/ UNDOCOMMIT OVERHEAD HIGH LOWSYSTEM INTEGRATION COMPLEX SIMPLECONCURRENCY SUPPORT NEED LOGGING YES

Page 72: DATA MANAGEMENT ON NON-VOLATILE MEMORYjarulraj/talks/2018.job.talk.pdf · 2018-09-25 · 68 WRITE-BEHIND LOGGING: RELATED WORK •NVM-centric logging, but only support linear-time

72

VISTA RECOVERABLE MEMORY72

SYSTEMCHARACTERISTIC

RECOVERABLEMEMORY

WRITE-BEHINDLOGGING

UNDO MECHANISM PHYSICAL UNDO LOGICAL UNDOCONCURRENCY SUPPORT NO YESDBMS INTEGRATION COMPLEX SIMPLE

Page 73: DATA MANAGEMENT ON NON-VOLATILE MEMORYjarulraj/talks/2018.job.talk.pdf · 2018-09-25 · 68 WRITE-BEHIND LOGGING: RELATED WORK •NVM-centric logging, but only support linear-time

73

THE LOG IS THE DATABASE73

SYSTEMCHARACTERISTIC

AMAZONAURORA

WRITE-BEHINDLOGGING

PROTOCOL TYPE REDO/ UNDO NO REDO/ UNDOMATERIALIZATION YES NOREAD OVERHEAD HIGH LOW

Page 74: DATA MANAGEMENT ON NON-VOLATILE MEMORYjarulraj/talks/2018.job.talk.pdf · 2018-09-25 · 68 WRITE-BEHIND LOGGING: RELATED WORK •NVM-centric logging, but only support linear-time

74

BATTERY-BACKED DRAM

• Available only in specialized environments– General-purpose database systems are not designed to leverage it

• Limitations of battery-backed DRAM– Physical form factor, Availability, Reliability, Cost

74

DRAM

Page 75: DATA MANAGEMENT ON NON-VOLATILE MEMORYjarulraj/talks/2018.job.talk.pdf · 2018-09-25 · 68 WRITE-BEHIND LOGGING: RELATED WORK •NVM-centric logging, but only support linear-time

75

MEDIA FAILURE

• Write-behind logging focuses on software failures– Transaction failures, System failures– Software failures outnumber media failures 10-to-1

• Media failure– Replicating data to another machine’s non-volatile memory

75

NVM-1 NVM-2

REPLICATION

Page 76: DATA MANAGEMENT ON NON-VOLATILE MEMORYjarulraj/talks/2018.job.talk.pdf · 2018-09-25 · 68 WRITE-BEHIND LOGGING: RELATED WORK •NVM-centric logging, but only support linear-time

76

SECURITY/PROTECTION

• Virtual memory protection mechanism– All accesses should go through the TLB– Using write-permission bits in the page table

76

PAGE PAGE PAGE

ENABLE WRITES

DISABLE WRITES

Page 77: DATA MANAGEMENT ON NON-VOLATILE MEMORYjarulraj/talks/2018.job.talk.pdf · 2018-09-25 · 68 WRITE-BEHIND LOGGING: RELATED WORK •NVM-centric logging, but only support linear-time

77

TRANSACTIONAL MEMORY77

SYSTEMCHARACTERISTIC

SOFTWARETRANSACTIONAL

MEMORY

HARDWARETRANSACTIONAL

MEMORY

PERSISTENTMULTI-WORD

CAS

DURABILITY YES NO YES

PERFORMANCE HEAVYWEIGHT LIGHTWEIGHT LIGHTWEIGHT

FALSE ABORTS NO YES NO

Page 78: DATA MANAGEMENT ON NON-VOLATILE MEMORYjarulraj/talks/2018.job.talk.pdf · 2018-09-25 · 68 WRITE-BEHIND LOGGING: RELATED WORK •NVM-centric logging, but only support linear-time

78

READ-COPY-UPDATE78

SYNCHRONIZATIONPRIMITIVECHARACTERISTIC

READ-COPY-

UPDATE

PERSISTENTMULTI-WORD

CAS

MULTI-LOCATION NO YES

DURABILITY NO YES

WRITERS USE LOCKS YES NO

Page 79: DATA MANAGEMENT ON NON-VOLATILE MEMORYjarulraj/talks/2018.job.talk.pdf · 2018-09-25 · 68 WRITE-BEHIND LOGGING: RELATED WORK •NVM-centric logging, but only support linear-time

79

CANDIDATE NVM TECHNOLOGIES79

FLASH-BACKEDDRAM

SPIN-TRANSFERTORQUE MRAM

MEMRISTOR

PHASE-CHANGEMEMORY

2D & 3DFLASH

Faster thanDRAM

Slower thanDRAM