lecture 25: query optimization
DESCRIPTION
Lecture 25: Query Optimization. Wednesday, November 26, 2003. Outline. Cost-based optimization 16.5, 16.6. Cost-based Optimizations. Main idea: apply algebraic laws, until estimated cost is minimal Practically: start from partial plans, introduce operators one by one - PowerPoint PPT PresentationTRANSCRIPT
![Page 1: Lecture 25: Query Optimization](https://reader036.vdocuments.site/reader036/viewer/2022062422/568135d4550346895d9d3eee/html5/thumbnails/1.jpg)
1
Lecture 25:Query Optimization
Wednesday, November 26, 2003
![Page 2: Lecture 25: Query Optimization](https://reader036.vdocuments.site/reader036/viewer/2022062422/568135d4550346895d9d3eee/html5/thumbnails/2.jpg)
2
Outline
• Cost-based optimization 16.5, 16.6
![Page 3: Lecture 25: Query Optimization](https://reader036.vdocuments.site/reader036/viewer/2022062422/568135d4550346895d9d3eee/html5/thumbnails/3.jpg)
3
Cost-based Optimizations
• Main idea: apply algebraic laws, until estimated cost is minimal
• Practically: start from partial plans, introduce operators one by one– Will see in a few slides
• Problem: there are too many ways to apply the laws, hence too many (partial) plans
![Page 4: Lecture 25: Query Optimization](https://reader036.vdocuments.site/reader036/viewer/2022062422/568135d4550346895d9d3eee/html5/thumbnails/4.jpg)
4
Cost-based Optimizations
Approaches:
• Top-down: the partial plan is a top fragment of the logical plan
• Bottom up: the partial plan is a bottom fragment of the logical plan
![Page 5: Lecture 25: Query Optimization](https://reader036.vdocuments.site/reader036/viewer/2022062422/568135d4550346895d9d3eee/html5/thumbnails/5.jpg)
5
Search Strategies
• Branch-and-bound:– Remember the cheapest complete plan P seen so far
and its cost C
– Stop generating partial plans whose cost is > C
– If a cheaper complete plan is found, replace P, C
• Hill climbing:– Remember only the cheapest partial plan seen so far
• Dynamic programming:– Remember the all cheapest partial plans
![Page 6: Lecture 25: Query Optimization](https://reader036.vdocuments.site/reader036/viewer/2022062422/568135d4550346895d9d3eee/html5/thumbnails/6.jpg)
6
Dynamic Programming
Unit of Optimization: select-project-join
• Push selections down, pull projections up
![Page 7: Lecture 25: Query Optimization](https://reader036.vdocuments.site/reader036/viewer/2022062422/568135d4550346895d9d3eee/html5/thumbnails/7.jpg)
7
Join Trees
• R1 ⋈ R2 ⋈ …. ⋈ Rn• Join tree:
• A plan = a join tree• A partial plan = a subtree of a join tree
R3 R1 R2 R4
![Page 8: Lecture 25: Query Optimization](https://reader036.vdocuments.site/reader036/viewer/2022062422/568135d4550346895d9d3eee/html5/thumbnails/8.jpg)
8
Types of Join Trees
• Left deep:
R3 R1
R5
R2
R4
![Page 9: Lecture 25: Query Optimization](https://reader036.vdocuments.site/reader036/viewer/2022062422/568135d4550346895d9d3eee/html5/thumbnails/9.jpg)
9
Types of Join Trees
• Bushy:
R3
R1
R2 R4
R5
![Page 10: Lecture 25: Query Optimization](https://reader036.vdocuments.site/reader036/viewer/2022062422/568135d4550346895d9d3eee/html5/thumbnails/10.jpg)
10
Types of Join Trees
• Right deep:
R3
R1R5
R2 R4
![Page 11: Lecture 25: Query Optimization](https://reader036.vdocuments.site/reader036/viewer/2022062422/568135d4550346895d9d3eee/html5/thumbnails/11.jpg)
11
Problem
• Given: a query R1 ⋈ R2 ⋈ … ⋈ Rn
• Assume we have a function cost() that gives us the cost of every join tree
• Find the best join tree for the query
![Page 12: Lecture 25: Query Optimization](https://reader036.vdocuments.site/reader036/viewer/2022062422/568135d4550346895d9d3eee/html5/thumbnails/12.jpg)
12
Dynamic Programming
• Idea: for each subset of {R1, …, Rn}, compute the best plan for that subset
• In increasing order of set cardinality:– Step 1: for {R1}, {R2}, …, {Rn}– Step 2: for {R1,R2}, {R1,R3}, …, {Rn-1, Rn}– …– Step n: for {R1, …, Rn}
• It is a bottom-up strategy• A subset of {R1, …, Rn} is also called a subquery
![Page 13: Lecture 25: Query Optimization](https://reader036.vdocuments.site/reader036/viewer/2022062422/568135d4550346895d9d3eee/html5/thumbnails/13.jpg)
13
Dynamic Programming
• For each subquery Q ⊆ {R1, …, Rn} compute the following:– Size(Q)– A best plan for Q: Plan(Q)– The cost of that plan: Cost(Q)
![Page 14: Lecture 25: Query Optimization](https://reader036.vdocuments.site/reader036/viewer/2022062422/568135d4550346895d9d3eee/html5/thumbnails/14.jpg)
14
Dynamic Programming
• Step 1: For each {Ri} do:– Size({Ri}) = B(Ri)– Plan({Ri}) = Ri– Cost({Ri}) = (cost of scanning Ri)
![Page 15: Lecture 25: Query Optimization](https://reader036.vdocuments.site/reader036/viewer/2022062422/568135d4550346895d9d3eee/html5/thumbnails/15.jpg)
15
Dynamic Programming
• Step i: For each Q ⊆ {R1, …, Rn} of cardinality i do:– Compute Size(Q) (later…)– For every pair of subqueries Q’, Q’’
s.t. Q = Q’ Q’’compute cost(Plan(Q’) ⋈ Plan(Q’’))
– Cost(Q) = the smallest such cost– Plan(Q) = the corresponding plan
![Page 16: Lecture 25: Query Optimization](https://reader036.vdocuments.site/reader036/viewer/2022062422/568135d4550346895d9d3eee/html5/thumbnails/16.jpg)
16
Dynamic Programming
• Return Plan({R1, …, Rn})
![Page 17: Lecture 25: Query Optimization](https://reader036.vdocuments.site/reader036/viewer/2022062422/568135d4550346895d9d3eee/html5/thumbnails/17.jpg)
17
Dynamic Programming
To illustrate, we will make the following simplifications:
• Cost(P1 ⋈ P2) = Cost(P1) + Cost(P2) + size(intermediate result(s))
• Intermediate results:– If P1 = a join, then the size of the intermediate result is
size(P1), otherwise the size is 0– Similarly for P2
• Cost of a scan = 0
![Page 18: Lecture 25: Query Optimization](https://reader036.vdocuments.site/reader036/viewer/2022062422/568135d4550346895d9d3eee/html5/thumbnails/18.jpg)
18
Dynamic Programming
• Example:
• Cost(R5 ⋈ R7) = 0 (no intermediate results)
• Cost((R2 ⋈ R1) ⋈ R7) = Cost(R2 ⋈ R1) + Cost(R7) + size(R2 ⋈ R1) = size(R2 ⋈ R1)
![Page 19: Lecture 25: Query Optimization](https://reader036.vdocuments.site/reader036/viewer/2022062422/568135d4550346895d9d3eee/html5/thumbnails/19.jpg)
19
Dynamic Programming
• Relations: R, S, T, U• Number of tuples: 2000, 5000, 3000, 1000
• Size estimation: T(A ⋈ B) = 0.01*T(A)*T(B)
![Page 20: Lecture 25: Query Optimization](https://reader036.vdocuments.site/reader036/viewer/2022062422/568135d4550346895d9d3eee/html5/thumbnails/20.jpg)
20
Subquery Size Cost Plan
RS
RT
RU
ST
SU
TU
RST
RSU
RTU
STU
RSTU
![Page 21: Lecture 25: Query Optimization](https://reader036.vdocuments.site/reader036/viewer/2022062422/568135d4550346895d9d3eee/html5/thumbnails/21.jpg)
21
Subquery Size Cost Plan
RS 100k 0 RS
RT 60k 0 RT
RU 20k 0 RU
ST 150k 0 ST
SU 50k 0 SU
TU 30k 0 TU
RST 3M 60k (RT)S
RSU 1M 20k (RU)S
RTU 0.6M 20k (RU)T
STU 1.5M 30k (TU)S
RSTU 30M 60k+50k=110k (RT)(SU)
![Page 22: Lecture 25: Query Optimization](https://reader036.vdocuments.site/reader036/viewer/2022062422/568135d4550346895d9d3eee/html5/thumbnails/22.jpg)
22
Dynamic Programming
• Summary: computes optimal plans for subqueries:– Step 1: {R1}, {R2}, …, {Rn}– Step 2: {R1, R2}, {R1, R3}, …, {Rn-1, Rn}– …– Step n: {R1, …, Rn}
• We used naïve size/cost estimations• In practice:
– more realistic size/cost estimations (next time)– heuristics for Reducing the Search Space
• Restrict to left linear trees• Restrict to trees “without cartesian product”
– need more than just one plan for each subquery:• “interesting orders”
![Page 23: Lecture 25: Query Optimization](https://reader036.vdocuments.site/reader036/viewer/2022062422/568135d4550346895d9d3eee/html5/thumbnails/23.jpg)
23
Reducing the Search Space
• Left-linear trees v.s. Bushy trees
• Trees without cartesian product
Example: R(A,B) S(B,C) T(C,D)⋈ ⋈Plan: (R(A,B) T(C,D)) S(B,C) has a cartesian ⋈ ⋈
product – most query optimizers will not consider it
![Page 24: Lecture 25: Query Optimization](https://reader036.vdocuments.site/reader036/viewer/2022062422/568135d4550346895d9d3eee/html5/thumbnails/24.jpg)
24
Counting the Number of Join Orders
• The mathematics we need to know:
Given x1, x2, ..., xn, how many permutations can we construct ? Answer: n!
• Example: permutations of x1x2x3x4 are: x1x2x3x4
x2x4x3x1
.. . . (there are 4! = 4*3*2*1 = 24 permutations)
![Page 25: Lecture 25: Query Optimization](https://reader036.vdocuments.site/reader036/viewer/2022062422/568135d4550346895d9d3eee/html5/thumbnails/25.jpg)
25
Counting the Number of Join Orders
• The mathematics we need to know:
Given the product x0x1x2...xn, in how many ways can we place n pairs of parenthesis around them ? Answer: 1/(n+1)*C2n
n = (2n)!/((n+1)*(n!)2)
• Example: for n=3 ((x0x1)(x2x3))
((x0(x1x2))x3)
(x0((x1x2)x3))
((x0x1)x2)x3)
(x0(x1(x2x3)))• There are 6!/(4*3!*3!) = 5 ways
![Page 26: Lecture 25: Query Optimization](https://reader036.vdocuments.site/reader036/viewer/2022062422/568135d4550346895d9d3eee/html5/thumbnails/26.jpg)
26
Counting the Number of Join Orders (Exercise)
R0(A0,A1) R⋈ 1(A1,A2) . . . R⋈ ⋈ n(An,An+1)• The number of left linear join trees is:
• The number of left linear join trees without cartesian products is:
• The number of bushy join trees is:
• The number of bushy join trees without cartesian product is:
![Page 27: Lecture 25: Query Optimization](https://reader036.vdocuments.site/reader036/viewer/2022062422/568135d4550346895d9d3eee/html5/thumbnails/27.jpg)
27
Number of Subplans Inspected by Dynamic Programming
R0(A0,A1) R⋈ 1(A1,A2) . . . R⋈ ⋈ n(An,An+1)• The number of left linear subplans inspected is:
• The number of left linear subplans without cartesian products inspected is:
• The number of bushy join subplans inspected is:
• The number of bushy join subplans without cartesian product:
![Page 28: Lecture 25: Query Optimization](https://reader036.vdocuments.site/reader036/viewer/2022062422/568135d4550346895d9d3eee/html5/thumbnails/28.jpg)
28
Happy Thanksgivings !