the value of merge join and hash join in microsoft sql ...hy460/pdf/sql server.pdf · sql server...

18
September 22, 1999 1 The value of merge join and hash join in Microsoft SQL Server and relational query processing Goetz Graefe Microsoft SQL Server

Upload: others

Post on 23-Jun-2020

23 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: The value of merge join and hash join in Microsoft SQL ...hy460/pdf/SQL Server.pdf · SQL Server 7.0 query processor • Nested loops with stored or temporary indexes • Merge join

September 22, 1999 1

The value ofmerge join and hash joinin Microsoft SQL Serverand relational query processing

Goetz GraefeMicrosoft SQL Server

Page 2: The value of merge join and hash join in Microsoft SQL ...hy460/pdf/SQL Server.pdf · SQL Server 7.0 query processor • Nested loops with stored or temporary indexes • Merge join

September 22, 1999 2

Why this study?• Blasgen & Eswaran – 20 years ago

Merge join & (index) nested loops cover allcases pretty well

• DeWitt, Sacco, others – 10-15 years agoHash join is great for large unsorted inputs

• Analytical studies, simulation, experiments

Page 3: The value of merge join and hash join in Microsoft SQL ...hy460/pdf/SQL Server.pdf · SQL Server 7.0 query processor • Nested loops with stored or temporary indexes • Merge join

September 22, 1999 3

Success without merge/hash join• Sybase & Microsoft SQL Server

Until recently used only nested loopsSuccessful for over 10 years!Even used in data warehousing!

• Focus on OLTPSybase invented stored proceduresMicrosoft leads SMP TPC-C efficiency

• Focus on canned reportsPerfectly possible with tuned index sets

Page 4: The value of merge join and hash join in Microsoft SQL ...hy460/pdf/SQL Server.pdf · SQL Server 7.0 query processor • Nested loops with stored or temporary indexes • Merge join

September 22, 1999 4

Are the prior studies wrong?• Small evaluation sets

Few tables, few queries

• Insufficient credit to index tuningFixed set of indexes

• This study:Still limited yet non-trivial queries & tablesIndexes tuned using a “tuning wizard” tool

• Large set of possible indexes, integrated with query optimizer

• Next studyIndexes tuned specifically for available algorithms

Page 5: The value of merge join and hash join in Microsoft SQL ...hy460/pdf/SQL Server.pdf · SQL Server 7.0 query processor • Nested loops with stored or temporary indexes • Merge join

September 22, 1999 5

SQL Server 7.0 query processor• Nested loops with stored or temporary indexes• Merge join & hash join (incl. hash teams)• Index intersection, union, difference, & join• Star joins: star indexes, cross-product, & semi-

join reduction• Constraints exploited for selectivity estimation &

cost calculation & query simplification• Parallelism on SMPs• Content queries (“contains”, “near”, “about”)• Optimized update plans (indexes, constraints)• Heterogeneous & distributed queries

Page 6: The value of merge join and hash join in Microsoft SQL ...hy460/pdf/SQL Server.pdf · SQL Server 7.0 query processor • Nested loops with stored or temporary indexes • Merge join

September 22, 1999 6

Relevant SQL Server tools• Graphical show plan• Profiler

Captures workloads & events (e.g., deadlocks)Filters on application, database, user, operation,

elapsed time, etc.

• Index tuning wizardOptimizes a workload captured with the profilerReconsider all indexes – only add indexesIncrease / decrease database sizeUses query optimizer to assess choices

Page 7: The value of merge join and hash join in Microsoft SQL ...hy460/pdf/SQL Server.pdf · SQL Server 7.0 query processor • Nested loops with stored or temporary indexes • Merge join

September 22, 1999 7

Experimental setup• TPC-D database

scale factor = 1 (1 GB raw data)

• Old & new TPC queries22 queries total

• Flags to disableIndex join, merge join, hash join, hash teamsStream aggregation, hash aggregation

• Indexes in simple database designPrimary keys, foreign keys, dates

Page 8: The value of merge join and hash join in Microsoft SQL ...hy460/pdf/SQL Server.pdf · SQL Server 7.0 query processor • Nested loops with stored or temporary indexes • Merge join

September 22, 1999 8

Simple indexes

0

1

2

3

4

5

6

7

8

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23

Queries & Algorithms

Tim

e [%

of e

ntir

e N

L run]

NL

MJ

HJ

All

Performance with simple indexes

Page 9: The value of merge join and hash join in Microsoft SQL ...hy460/pdf/SQL Server.pdf · SQL Server 7.0 query processor • Nested loops with stored or temporary indexes • Merge join

September 22, 1999 9

Performance with simple indexes• NL=MJ >> HJ=All: #1, #15

Hashing improves performanceAggregation, not join, make the differenceEarly aggregation missing in sort code

• NL=MJ=HJ=All: #2, #13, #16, #17No really meaningful differenceIndexes are sufficient to select & retrieve rows

• NL > MJ > HJ=All: #3, #5, #7, #8, #9, #11• NL >> MJ=HJ=All: #4, #14, #19

Need some method for large unindexed inputs

Page 10: The value of merge join and hash join in Microsoft SQL ...hy460/pdf/SQL Server.pdf · SQL Server 7.0 query processor • Nested loops with stored or temporary indexes • Merge join

September 22, 1999 10

Workload performance

Workload performance

0

10

20

30

40

50

60

70

80

90

100

Entire workload

Tim

e [%

of N

L run]

NL

MJ

HJ

All

Page 11: The value of merge join and hash join in Microsoft SQL ...hy460/pdf/SQL Server.pdf · SQL Server 7.0 query processor • Nested loops with stored or temporary indexes • Merge join

September 22, 1999 11

Workload performance• Only NLJ is not competitive

Due to simplistic index design

• Hash-based query processor performs best• NLJ + MJ are very competitive

40% difference to full QP with hash joinThat’s 9 month of hardware improvements

• Presuming 2x CPU speed in 18 months

Poor indexing strongly favors hash joinBlasgen & Eswaran were right all along …?

Page 12: The value of merge join and hash join in Microsoft SQL ...hy460/pdf/SQL Server.pdf · SQL Server 7.0 query processor • Nested loops with stored or temporary indexes • Merge join

September 22, 1999 12

Tuned index setTuning wizard retains primary keys indexes• 7 indexes on line item, up to 7 columns

Total 26 columns indexes

• 4 indexes on orders, lots of redundant keys• 2 indexes on part supply

Page 13: The value of merge join and hash join in Microsoft SQL ...hy460/pdf/SQL Server.pdf · SQL Server 7.0 query processor • Nested loops with stored or temporary indexes • Merge join

September 22, 1999 13

Performance with tuned indexesTuned indexes

0

1

2

3

4

5

6

7

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23

Queries & Algorithms

Tim

e [%

of N

L run o

n s

imple

index

es]

NLMJ

HJAll

Page 14: The value of merge join and hash join in Microsoft SQL ...hy460/pdf/SQL Server.pdf · SQL Server 7.0 query processor • Nested loops with stored or temporary indexes • Merge join

September 22, 1999 14

Performance with tuned indexes• Overall performance improvements

Except queries 6, 12, 19Tuning wizard minimizes workload time

• Not the time for each individual query

• More queries in these patternsNL > MJ=HJ=AllNL=MJ=HJ=All

Page 15: The value of merge join and hash join in Microsoft SQL ...hy460/pdf/SQL Server.pdf · SQL Server 7.0 query processor • Nested loops with stored or temporary indexes • Merge join

September 22, 1999 15

Workload performanceEntire workload, tuned indexes

0

5

10

15

20

25

30

35

40

45

50

Tim

e [%

of N

L run o

n s

imple

index

es]

NL

MJ

HJ

All

Page 16: The value of merge join and hash join in Microsoft SQL ...hy460/pdf/SQL Server.pdf · SQL Server 7.0 query processor • Nested loops with stored or temporary indexes • Merge join

September 22, 1999 16

Workload performance• All algorithm combinations are fast

Maximal difference 45 vs. 20, or 21 months

• Either MJ or HJ serve wellHaving both adds 20% performance – 5 months

Page 17: The value of merge join and hash join in Microsoft SQL ...hy460/pdf/SQL Server.pdf · SQL Server 7.0 query processor • Nested loops with stored or temporary indexes • Merge join

September 22, 1999 17

Conclusions• Either indexing or merge / hash join• Are hash join & merge join just an excuse

for poor (non-automatic) indexes?• Next steps

Tune & analyze for specific algorithmsAnalyze bitmap operations & star joinsLook for orders of magnitude – multiple years

• Pre-computed query result – indexed views• Fully automatic indexing & tuning• Caching data & query results on desktops

Page 18: The value of merge join and hash join in Microsoft SQL ...hy460/pdf/SQL Server.pdf · SQL Server 7.0 query processor • Nested loops with stored or temporary indexes • Merge join

September 22, 1999 18

More information• www.microsoft.com/sql• Msdn.microsoft.com• Technet.microsoft.com• Research.microsoft.com