class 13 scans vs indexes - harvard...

scans vs indexes prof. Stratos Idreos HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS165/ class 13


Page 1: class 13 scans vs indexes - Harvard SEASdaslab.seas.harvard.edu/.../CS165Fall2017Class13BeforeClass.pdf · class 13 scans vs indexes. Title: class13 copy Created Date: 10/23/2017

scans vs indexes

prof. Stratos Idreos

HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS165/

class 13


/24CS165, Fall 2017 Stratos Idreos 2

[Figure: b-tree with root (35,50); inner nodes (12,20), (35,…), (50,…); leaves (1,2,3…), (12,15,17), (20,…), …]

b-tree - dynamic tree - always balanced


(secondary) index vs scan: the eternal battle

[Figure: column A and an index on A]

select … from R where A<v and ….

Just having indexes in the system is or can be useless… or even bad for performance

Knowing when to use an index is key

Primary index vs secondary vs scan?


design/implement numerous possible algorithms + data representations

choose the best data source, algorithms and path for each query

[Figure: applications send sql to the database kernel, which consists of algorithms/operators on top of data]


scan

secondary index scan



base data (columns A, B, C):

A: a1 a2 a3 a4 a5
B: b1 b2 b3 b4 b5
C: c1 c2 c3 c4 c5

secondary index on A: values out of order with base data

values: a5 a3 a2 a1 a4
positions: 5 3 2 1 4

a query that selects on A and then needs B:

select on the index returns positions 2 1 4, which are then used to fetch from B (b1 b2 b3 b4 b5)

intermediate out of order
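The slide's example can be reproduced with a minimal sketch. Assumptions: the concrete values in `A`, the predicate `A >= 30`, the sorted-list stand-in for the tree leaves, and all function names are hypothetical and not from the slides; positions are 0-based here, while the slide counts from 1.

```python
from bisect import bisect_left

# Base data in insertion order. Values are made up so that the sorted order
# of A is a5 < a3 < a2 < a1 < a4, as on the slide.
A = [40, 30, 20, 50, 10]
B = ["b1", "b2", "b3", "b4", "b5"]

def build_secondary_index(column):
    """Sorted (value, position) pairs; a stand-in for the leaves of a tree index."""
    return sorted((value, pos) for pos, value in enumerate(column))

def index_select_at_least(index, v):
    """Positions of all entries with value >= v, in index (value) order."""
    start = bisect_left(index, (v,))
    return [pos for _, pos in index[start:]]

def fetch(column, positions):
    """Read the given positions from a base column, in the order given."""
    return [column[p] for p in positions]

index_on_a = build_secondary_index(A)

# Index order of positions (0-based): 4 2 1 0 3, i.e. 5 3 2 1 4 on the slide.
index_order = [pos for _, pos in index_on_a]

# Select on A through the index: positions come back in value order,
# not in base-data order (1 0 3 here, i.e. 2 1 4 on the slide).
positions = index_select_at_least(index_on_a, 30)

# Fetching B with out-of-order positions jumps around in the column;
# sorting the positions first restores a sequential access pattern.
b_out_of_order = fetch(B, positions)
b_in_order = fetch(B, sorted(positions))
```

Sorting the intermediate positions is one possible way to get back to base-data order before touching B; it is extra work that a plain scan never has to do.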


covering index: contains all columns needed for a set of queries

[Figure: column A next to an index that holds A and B]

no need to go to base data but…
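A rough sketch of the idea, under assumptions: the data values, the query `select B from R where A >= v`, and the names `build_covering_index` and `covered_select` are illustrative only. The index is modeled as a sorted list of (A, B) pairs rather than a real tree.

```python
from bisect import bisect_left

# Hypothetical base columns.
A = [40, 30, 20, 50, 10]
B = ["b1", "b2", "b3", "b4", "b5"]

def build_covering_index(key_col, payload_col):
    """Sorted (key, payload) pairs: the index itself carries a copy of B."""
    return sorted(zip(key_col, payload_col))

def covered_select(index, v):
    """Answer 'select B from R where A >= v' from the index alone."""
    start = bisect_left(index, (v,))
    return [payload for _, payload in index[start:]]

covering = build_covering_index(A, B)
result = covered_select(covering, 30)  # B values in A order, base data untouched
```

The base columns are never read at query time, but the result still arrives in index (A) order, and the index now stores a second copy of B that has to be kept up to date.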


index: random access to traverse the tree & need to sort result

vs.

scan: sequential access pattern but needs to access all data
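One way to make this trade-off concrete is a toy cost model. Everything in it is an assumption for illustration: the cost constants, the fanout, and the formulas are not taken from the slides or from any real system.

```python
import math

def scan_cost(n, seq_cost=1.0):
    """A scan reads all n values sequentially, regardless of selectivity."""
    return n * seq_cost

def index_cost(n, selectivity, random_cost=20.0, seq_cost=1.0,
               fanout=100, sort_cost=0.1):
    """Tree traversal (random accesses) + reading qualifying entries + sorting them."""
    k = selectivity * n                                   # qualifying entries
    height = max(1, math.ceil(math.log(n, fanout)))       # levels to traverse
    traverse = height * random_cost
    read = k * seq_cost
    sort = sort_cost * k * math.log2(k) if k > 1 else 0.0
    return traverse + read + sort

def choose_access_path(n, selectivity):
    """Pick the cheaper path under this toy model."""
    return "index" if index_cost(n, selectivity) < scan_cost(n) else "scan"

n = 1_000_000
low = choose_access_path(n, 0.001)   # few qualifying rows
high = choose_access_path(n, 1.0)    # every row qualifies
```

Under these made-up constants the index wins at low selectivity and the scan wins once most of the data qualifies, because the sort term grows faster than the fixed cost of reading everything.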


the standard solution: 1) maintain statistics, 2) optimizer chooses access path depending on estimated selectivity

what is wrong with that?
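A minimal sketch of the two steps, assuming an equi-width histogram as the statistic and a fixed selectivity threshold for the decision. The 1% threshold, the bucket count, and all names are hypothetical. The last lines show one thing that can go wrong: the data changes but the statistics do not.

```python
def build_histogram(values, num_buckets=10):
    """Step 1: equi-width histogram as (low bound, bucket width, counts)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_buckets or 1.0   # avoid zero width if all values are equal
    counts = [0] * num_buckets
    for x in values:
        bucket = min(int((x - lo) / width), num_buckets - 1)
        counts[bucket] += 1
    return lo, width, counts

def estimate_selectivity(hist, v):
    """Estimated fraction of rows with A < v, assuming uniform values within a bucket."""
    lo, width, counts = hist
    qualifying = 0.0
    for i, count in enumerate(counts):
        bucket_lo = lo + i * width
        bucket_hi = bucket_lo + width
        if v >= bucket_hi:
            qualifying += count                              # whole bucket qualifies
        elif v > bucket_lo:
            qualifying += count * (v - bucket_lo) / width    # part of the bucket
    return qualifying / sum(counts)

def choose_access_path(hist, v, threshold=0.01):
    """Step 2: use the index only if the estimated selectivity is below the threshold."""
    return "index" if estimate_selectivity(hist, v) < threshold else "scan"

old_data = list(range(1000))
hist = build_histogram(old_data)

# The data changes (every value is now below 5) but the histogram is not refreshed.
new_data = [x % 5 for x in old_data]
decision = choose_access_path(hist, 5)                              # based on stale stats
true_selectivity = sum(a < 5 for a in new_data) / len(new_data)     # what actually qualifies
```

With the stale histogram the estimate for `A < 5` is about 0.5%, so the index is chosen, while in the changed data every row qualifies.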


Motivation

[Figure: two plots of execution time (sec; the first in thousands) vs. result selectivity (%), comparing Index Scan and Full Scan]

[Figure: Motivation TPCH (SF10) 2/2: normalized execution time (log scale) per TPC-H query, Q1 to Q22, Original vs. Tuned]

ROBUSTNESS


All together

[Figure: three plots of execution time (sec) vs. result selectivity (%); series: Full Scan, Index Scan, Optimizer decision, Avg. statistics collection]

basic stats per column for pair

can we just recompute the statistics?


2012, somewhere in Germany

if I keep 30 data systems researchers “trapped” in a castle for a week, we might be able to define “robust query processing” and find a few solutions


robust query processing (best definition to date by Goetz): graceful degradation when the environment changes


Renata Borovica, University of Melbourne

Campbell Fraser, Google

Marcin Zukowski, Snowflake

[Figure: response time vs. selectivity for index and scan]

Can we avoid bad access path selection (secondary index vs scan) when we have stale (or no) statistics?


select min(A) from R where B<10 and C<80

logical plan

optimizer

physical plan execution

mid query reoptimization

Page 24: class 13 scans vs indexes - Harvard SEASdaslab.seas.harvard.edu/.../CS165Fall2017Class13BeforeClass.pdf · class 13 scans vs indexes. Title: class13 copy Created Date: 10/23/2017

/24CS165, Fall 2017 Stratos Idreos 17

SWITCH SCAN

while index probing, switch to scan if cardinality > estimation

good: avoids worst case

bad: performance cliff

SMOOTH SCAN

goal: avoid performance cliff, close to optimal

[Figure: Cardinality Estimate based SS: cost vs. cardinality for Index Scan, Table Scan, Switch Scan and Smooth Scan; annotations: TS cost, ~ TS cost, 𝑋 ∗ 𝐸𝐶]

Design smooth scan
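The Switch Scan rule can be sketched as follows. This is only an illustration of "switch to scan if cardinality > estimation": the sorted-list index, the function names, and the choice to restart the scan from the beginning are assumptions, not the design from the Smooth Scan paper. Smooth Scan itself is deliberately left out, since designing it is the exercise.

```python
def full_scan(column, v):
    """Positions of all values < v, in base-data order."""
    return [pos for pos, a in enumerate(column) if a < v]

def build_secondary_index(column):
    """Sorted (value, position) pairs; a stand-in for a tree index."""
    return sorted((value, pos) for pos, value in enumerate(column))

def switch_scan(column, index, v, estimated_cardinality):
    """Probe the index for A < v; switch to a full scan as soon as more
    qualifying entries show up than the optimizer estimated."""
    positions = []
    for value, pos in index:              # walk index entries in value order
        if value >= v:
            break
        positions.append(pos)
        if len(positions) > estimated_cardinality:
            # The estimate was too low: discard the index work and rescan.
            return full_scan(column, v), "scan"
    return sorted(positions), "index"

column = [40, 30, 20, 50, 10]
index = build_secondary_index(column)

ok_result, ok_path = switch_scan(column, index, 25, estimated_cardinality=5)
bad_result, bad_path = switch_scan(column, index, 100, estimated_cardinality=2)
```

Both paths return the same positions a plain scan would. In the second call the index work done before the switch is wasted and the full scan is paid on top of it, which is where the performance cliff comes from.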


Extra: Efficient mid-query re-optimization of sub-optimal query execution plans
Navin Kabra and David DeWitt
ACM SIGMOD International Conference on Management of Data, 1998

Browse: Smooth Scan: Statistics-Oblivious Access Paths
Renata Borovica, Stratos Idreos, Anastasia Ailamaki, Marcin Zukowski and Campbell Fraser
IEEE International Conference on Data Engineering (ICDE), 2015

Read: Access Path Selection in Main-Memory Optimized Data Systems: Should I Scan or Should I Probe?
Mike Kester, Manos Athanassoulis, and Stratos Idreos
ACM SIGMOD International Conference on Management of Data, 2017


DATA SYSTEMS

prof. Stratos Idreos

class 13

scans vs indexes