Magical Parallel OLTP Databases

Andy Pavlo

November 10th, 2011, MIT


Page 1: Andy Pavlo

Magical Parallel OLTP Databases

Andy Pavlo

November 10th, 2011, MIT CSAIL

Page 2: Andy Pavlo

Databases?

Evan Jones?

Page 3: Andy Pavlo

The McRib will be back!

Michael Jackson is in trouble!

Lebron is going to Miami!

Page 4: Andy Pavlo
Page 5: Andy Pavlo

On-Line Transaction Processing

Page 6: Andy Pavlo

Fast

Repetitive

Small

Page 7: Andy Pavlo

H-Store

Page 8: Andy Pavlo

H-Store Partitioning

[Figure: the Tables WAREHOUSE, DISTRICT, CUSTOMER, ORDERS, ORDER_ITEM, and STOCK are split across Partitions P1 to P5; a copy of the Replicated ITEM table appears at each partition.]
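The layout on this slide can be sketched as a small routing function. This is a minimal illustration only: the slide does not say how rows are assigned to partitions, so modulo hashing on the warehouse id is an assumption, and `NUM_PARTITIONS`, `REPLICATED_TABLES`, and `partitions_for` are hypothetical names.

```python
from typing import Optional, Set

NUM_PARTITIONS = 5            # P1..P5 on the slide, numbered 0..4 here
REPLICATED_TABLES = {"ITEM"}  # assumption: ITEM is the table marked "Replicated"


def partitions_for(table: str, w_id: Optional[int] = None) -> Set[int]:
    """Return the partitions that hold a row of `table` (hypothetical helper)."""
    if table in REPLICATED_TABLES:
        # A replicated table can be read at any partition.
        return set(range(NUM_PARTITIONS))
    if w_id is None:
        raise ValueError(f"{table} is partitioned; a warehouse id is required")
    # Assumption: rows are placed by warehouse id modulo the partition count.
    return {w_id % NUM_PARTITIONS}
```

Under these assumptions, every row that shares a warehouse id lands on one partition, so a transaction that touches a single warehouse only needs that partition plus its local ITEM copy.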

Page 9: Andy Pavlo
Page 10: Andy Pavlo

Client Application

H-Store Cluster

Procedure Name

Input Parameters

Page 11: Andy Pavlo

This transaction will execute 4 queries on partitions 1, 3, and 6!

Page 12: Andy Pavlo

Optimization #1

[Figure: Client Application; partitions P1, P2, P3, P4]

Page 13: Andy Pavlo

Optimization #2

[Figure: two Client Application boxes; partitions P1, P2, P3, P4]

Page 14: Andy Pavlo

Optimization #2

[Figure: two Client Application boxes; partitions P1, P2, P3, P4]

Page 15: Andy Pavlo

Optimization #3

InsertOrder, InsertOrderLine, UpdateStock, InsertOrderLine, UpdateStock

STOP LOGGING

Page 16: Andy Pavlo


Optimization #4

Page 17: Andy Pavlo

Why this Matters

[Figure: Throughput (txn/s)]

Page 18: Andy Pavlo

Pro Tip:

Canadians do not like unnecessary surgeries.

Page 19: Andy Pavlo

On Predictive Modeling for Optimizing Transaction Execution in Parallel OLTP Systems, in PVLDB, vol. 5, issue 2, October 2011

Page 20: Andy Pavlo

Houdini

Client Application

Page 21: Andy Pavlo

Main Idea:

Use models to predict before execution.

Page 22: Andy Pavlo
Page 23: Andy Pavlo
Page 24: Andy Pavlo

Step #1:

Estimate the path that a transaction will take.

Page 25: Andy Pavlo

Step #2:

Determine which optimizations to enable.

Page 26: Andy Pavlo

Input Parameters:

w_id=0, i_w_id=[0,1], i_ids=[1001,1002]

Current State:

GetWarehouse:

SELECT * FROM WAREHOUSE WHERE W_ID = ?

????

Page 27: Andy Pavlo

Input Parameters:

w_id=0, i_w_id=[0,1], i_ids=[1001,1002]

Transaction Estimate:

Confidence Coefficient: 0.56

Best Partition: 0

Use Undo Logging: Yes

Partitions Read: { 0 }

Partitions Written: { 0 }

Partitions Done: { 1, 2, 3 }

[Figure annotations: +1, shown six times]
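One way to picture how such an estimate could be produced is a transition-count model over the queries a procedure executes. The sketch below is illustrative rather than Houdini's implementation: the slides do not say how the confidence coefficient is computed, so multiplying edge probabilities is an assumption, as is reading the "+1" annotations as counter increments. `PathModel`, `Estimate`, and `WRITE_QUERIES` are hypothetical names; the query names come from the slides.

```python
from collections import defaultdict
from dataclasses import dataclass, field

BEGIN, COMMIT = "begin", "commit"
# Query names appear on the slides; which of them write is an assumption.
WRITE_QUERIES = {"InsertOrder", "InsertOrderLine", "UpdateStock"}


@dataclass
class Estimate:
    confidence: float = 1.0
    path: list = field(default_factory=list)
    partitions_read: set = field(default_factory=set)
    partitions_written: set = field(default_factory=set)


class PathModel:
    """Counts state-to-state transitions; a state is (query name, partition)."""

    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))

    def observe(self, path):
        # Each edge on an executed transaction's path gets +1.
        for cur, nxt in zip(path, path[1:]):
            self.counts[cur][nxt] += 1

    def estimate(self, max_steps=50):
        # Follow the most frequently seen edge out of each state.
        est = Estimate(path=[BEGIN])
        state = BEGIN
        for _ in range(max_steps):
            edges = self.counts.get(state)
            if not edges:
                break  # terminal state or never observed
            nxt, seen = max(edges.items(), key=lambda kv: kv[1])
            est.confidence *= seen / sum(edges.values())
            est.path.append(nxt)
            if isinstance(nxt, tuple):
                name, partition = nxt
                touched = (est.partitions_written if name in WRITE_QUERIES
                           else est.partitions_read)
                touched.add(partition)
            state = nxt
        return est


model = PathModel()
local = [BEGIN, ("GetWarehouse", 0), ("CheckStock", 0), ("InsertOrder", 0), COMMIT]
remote = [BEGIN, ("GetWarehouse", 0), ("CheckStock", 1), ("InsertOrder", 0), COMMIT]
for observed in (local, local, local, remote):
    model.observe(observed)
est = model.estimate()
```

With three all-local runs and one run that checks stock at partition 1, the estimate follows the all-local path with confidence 0.75 and reports partition 0 as both read and written.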

Page 28: Andy Pavlo

Limitations:

1) Long/wide models.

2) Keeping models in synch.

3) Incorrect predictions.

Page 29: Andy Pavlo

Input Parameters:

w_id=0, i_w_id=[0,1], i_ids=[1001,1002]

Current State:

CheckStock:

SELECT S_QTY FROM STOCK WHERE S_W_ID = ? AND S_I_ID = ?;

InsertOrder:

INSERT INTO ORDERS (o_id, o_w_id) VALUES (?, ?);

[Figure annotations: ?, ?, X]

Page 30: Andy Pavlo
Page 31: Andy Pavlo

Refinement:

Partition models based on input properties.
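The refinement can be sketched as keeping one model per class of inputs rather than a single model per procedure. This is a minimal sketch under stated assumptions: the slides do not name the input properties used, so the feature key below (number of items, and whether every item's warehouse matches `w_id`) is hypothetical, and each "model" is reduced to a counter of whole paths for brevity. `input_features` and `PartitionedModels` are illustrative names.

```python
from collections import Counter, defaultdict


def input_features(params: dict) -> tuple:
    """Hypothetical feature key derived from a procedure's input parameters."""
    n_items = len(params["i_ids"])
    all_local = all(w == params["w_id"] for w in params["i_w_id"])
    return (n_items, all_local)


class PartitionedModels:
    """Keeps a separate (toy) model for each feature key."""

    def __init__(self):
        # feature key -> Counter mapping an observed path to its frequency
        self.models = defaultdict(Counter)

    def observe(self, params: dict, path: list) -> None:
        self.models[input_features(params)][tuple(path)] += 1

    def predict(self, params: dict):
        model = self.models.get(input_features(params))
        if not model:
            return None  # no model trained for this class of inputs yet
        return list(model.most_common(1)[0][0])


# Parameters as shown on the slides.
slide_params = {"w_id": 0, "i_w_id": [0, 1], "i_ids": [1001, 1002]}
models = PartitionedModels()
models.observe(slide_params, ["CheckStock@0", "CheckStock@1", "InsertOrder@0"])
```

Because each model only sees transactions with similar inputs, its paths are shorter and less ambiguous, which is one plausible reading of how partitioning addresses the long/wide models limitation.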

Page 32: Andy Pavlo

w_id=0, i_w_id=[0,1], i_ids=[1001,1002]

Page 33: Andy Pavlo

Input Parameters:

w_id=0, i_w_id=[0,1], i_ids=[1001,1002]

CheckStock:

SELECT S_QTY FROM STOCK WHERE S_W_ID = ? AND S_I_ID = ?

[Figure annotations: ?, ?]

Page 34: Andy Pavlo

Houdini:

1) Execute txn at the best partition.

2) Only lock the partitions needed.

3) Disable undo logging if not needed.

4) Speculatively commit transactions.

Page 35: Andy Pavlo

Houdini:

5) Estimate initial path.

6) Update as transaction executes.

7) Recompute if workload changes.

8) Partition for better accuracy.
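Items 1 to 4 can be read as decisions derived from a transaction estimate, and item 6 as a check made while the transaction runs. The sketch below illustrates that reading under stated assumptions and is not Houdini's code: the confidence threshold, the fallback to locking every partition, and the interpretation of "Partitions Done" as partitions eligible for speculative work are all assumptions, and `TxnEstimate`, `Plan`, `plan_from_estimate`, and `mispredicted` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TxnEstimate:
    confidence: float
    best_partition: int
    partitions_touched: frozenset  # union of partitions read and written
    may_abort: bool                # "Use Undo Logging: Yes" on the slide
    partitions_done: frozenset


@dataclass(frozen=True)
class Plan:
    base_partition: int        # 1) where the transaction's control code runs
    lock_partitions: frozenset # 2) partitions to lock
    undo_logging: bool         # 3) keep undo logging on?
    speculative: frozenset     # 4) partitions free for speculative work


def plan_from_estimate(est, all_partitions, threshold=0.5):
    """Turn an estimate into optimization choices (threshold is an assumption)."""
    if est.confidence < threshold:
        # Low confidence: fall back to the conservative settings.
        return Plan(est.best_partition, frozenset(all_partitions), True, frozenset())
    return Plan(est.best_partition, est.partitions_touched,
                est.may_abort, est.partitions_done)


def mispredicted(plan, partition):
    """6) During execution: did the transaction touch an unlocked partition?"""
    return partition not in plan.lock_partitions


# Values from the Transaction Estimate slide.
slide_est = TxnEstimate(0.56, 0, frozenset({0}), True, frozenset({1, 2, 3}))
plan = plan_from_estimate(slide_est, range(4))
```

For the slide's estimate this yields a plan that runs at partition 0, locks only partition 0, keeps undo logging on, and leaves partitions 1, 2, and 3 available; raising the threshold above 0.56 would instead lock all four partitions.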

Page 36: Andy Pavlo

Experimental Evaluation

Page 37: Andy Pavlo

Model Accuracy

TATP: 94.9%

TPC-C: 95.0%

AuctionMark: 90.2%

Page 38: Andy Pavlo

Estimation Overhead

[Figure: TATP, TPC-C, AuctionMark]

Page 39: Andy Pavlo

Throughput (txn/s)

TATP: +57%

TPC-C: +126%

AuctionMark: +117%

Page 40: Andy Pavlo

Conclusion:

Small overhead cost improves throughput.

Page 41: Andy Pavlo

November 9, 2011

Page 42: Andy Pavlo

H-Store

hstore.cs.brown.edu

Twitter: @andy_pavlo

Page 43: Andy Pavlo

Future work