8/6/2019 QueryOptimization_Siao
1/24
Query Optimization
CS 157B
Ch. 14
Mien Siao
-
Outline
- Introduction
- Steps in Cost-based query optimization
- Query Flow
- Projection Example
- Query Interaction in DBMS
- Cost-based query Optimization: Algebraic Expressions
-
Introduction
What is Query Optimization?
Suppose you were given a chance to visit 15 different pre-selected cities in Europe. The only constraint would be time.
-> Would you visit the cities in an arbitrary order, or would you make a plan?
-
Plan:
-> Place the 15 cities in different groups based on their proximity to each other.
-> Start with one group and move on to the next group.
The important point here is that you would have visited the cities in a more organized manner, and the time constraint mentioned earlier would have been dealt with efficiently.
-
Starting with System-R, most commercial DBMSs use cost-based optimizers.
Cost estimates should be accurate and easy to compute. They also need to be logically consistent, so that the least-cost plan is consistently estimated as the cheapest one.
-
Steps in Cost-based query optimization
1. Parsing
2. Transformation
3. Implementation
4. Plan selection based on cost estimates
-
Query Flow
SQL -> Parser -> Optimizer -> Code Generator/Interpreter -> Processor
-
- Query Parser: Verifies the validity of the SQL statement and translates the query into an internal structure using relational calculus.
- Query Optimizer: Finds the best expression among the various equivalent algebraic expressions. The criterion used is cheapness.
- Code Generator/Interpreter: Makes calls to the Query Processor based on the work done by the optimizer.
- Query Processor: Executes the calls obtained from the code generator.
-
The cost of a physical plan includes processor time and communication time. The most important factor to consider is disk I/O, because it is the most time-consuming operation.
Some other associated costs are:
- Operations (joins, unions, intersections).
- The order of operations. Why?
-
Joins, unions, and intersections are associative and commutative, so the same result can be computed in different orders at different costs.
- Management of the storage of arguments and how they are passed.
The factors mentioned above should be limited and minimized when creating the best physical plan.
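To make the point about the order of operations concrete, here is a minimal sketch using Python sets. The three sets and the helper name `intersect_in_order` are invented for illustration and do not come from the slides. Because intersection is associative and commutative, both orders give the same final answer, but the intermediate result that has to be stored and passed along differs greatly in size.

```python
# Three made-up sets standing in for relations of different sizes.
A = set(range(1000))          # 1000 elements
B = set(range(0, 1000, 2))    # 500 even numbers
C = {0, 2, 4}                 # 3 elements

def intersect_in_order(first, second, third):
    """Intersect three sets in the given order.

    Returns the final result and the size of the intermediate result.
    """
    intermediate = first & second
    return intermediate & third, len(intermediate)

result_1, size_1 = intersect_in_order(A, B, C)  # (A & B) & C
result_2, size_2 = intersect_in_order(B, C, A)  # (B & C) & A

print(result_1 == result_2)  # same answer either way
print(size_1, size_2)        # intermediate sizes differ
```

An optimizer that is free to reorder such operations would prefer the second order here, since a smaller intermediate result means less storage and less data passed between operators.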
-
We can fit 5 tuples into 1 block.
- 5 tuples * 190 bytes/tuple = 950 bytes can fit into 1 block
- For 20,000 tuples, we would require 4,000 blocks (20,000 tuples / 5 tuples per block = 4,000 blocks)
With a projection resulting in elimination of column c (150 bytes), we could estimate that each tuple would decrease to 40 bytes (190 - 150 = 40 bytes).
-
Now, the new estimate will be 25 tuples in 1 block.
- 25 tuples * 40 bytes/tuple = 1000 bytes will be able to fit into 1 block
- With 20,000 tuples, the new estimate is 800 blocks (20,000 tuples / 25 tuples per block = 800 blocks)
The result is a reduction by a factor of 5.
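The block arithmetic above can be reproduced with a short sketch. The slides do not state the block size, so the 1024-byte value below is an assumption chosen because it is consistent with both estimates (5 tuples of 190 bytes and 25 tuples of 40 bytes per block). The function names are illustrative only.

```python
import math

BLOCK_SIZE = 1024  # bytes; ASSUMED, the slides do not state the block size

def tuples_per_block(tuple_size, block_size=BLOCK_SIZE):
    """Whole tuples that fit in one block (tuples do not span blocks)."""
    return block_size // tuple_size

def blocks_needed(num_tuples, tuple_size, block_size=BLOCK_SIZE):
    """Blocks required to store num_tuples tuples of tuple_size bytes."""
    return math.ceil(num_tuples / tuples_per_block(tuple_size, block_size))

before = blocks_needed(20_000, 190)        # full tuples
after = blocks_needed(20_000, 190 - 150)   # column c (150 bytes) projected away

print(before, after, before / after)
```

Under that assumption the sketch gives 4,000 blocks before the projection and 800 after, matching the factor-of-5 reduction in the example.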
-
Query interaction in DBMS
How does a query interact with a DBMS?
- Interactive users
- Embedded queries in programs written in C, C++, etc.
What is the difference between these two?
-
Interactive Users:
- When there is an interactive user query, the query goes through the Query Parser, Query Optimizer, Code Generator, and Query Processor each time.
-
- In an embedded query, the calls generated by the code generator are stored in the database. Each time the query is reached within the program at run-time, the Query Processor invokes the stored calls in the database.
- Optimization is therefore independent of execution in embedded queries: it is not repeated each time the query runs.
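The difference between the two modes can be sketched as follows. This is a toy model, not how any particular DBMS implements it: `optimize`, `execute`, `stored_calls`, and `stats` are hypothetical stand-ins, and a Python dictionary plays the role of the calls stored in the database.

```python
stats = {"optimizer_runs": 0}  # counts how often the optimizer is invoked
stored_calls = {}              # stands in for calls stored in the database

def optimize(sql):
    """Stand-in for parsing, optimization, and code generation."""
    stats["optimizer_runs"] += 1
    return ("calls-for", sql)

def execute(calls):
    """Stand-in for the Query Processor executing the generated calls."""
    return f"executed {calls[1]}"

def run_interactive(sql):
    # Interactive: the whole pipeline runs on every submission.
    return execute(optimize(sql))

def run_embedded(sql):
    # Embedded: generate the calls once, then reuse the stored calls.
    if sql not in stored_calls:
        stored_calls[sql] = optimize(sql)
    return execute(stored_calls[sql])

query = "SELECT pname FROM Patients"
for _ in range(3):
    run_interactive(query)
interactive_runs = stats["optimizer_runs"]

stats["optimizer_runs"] = 0
for _ in range(3):
    run_embedded(query)
embedded_runs = stats["optimizer_runs"]

print(interactive_runs, embedded_runs)
```

In this model, running the same query three times invokes the optimizer three times in interactive mode but only once in embedded mode.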
-
Cost-based query Optimization: Algebraic Expressions
If we had the following query:
SELECT p.pname, d.dname
FROM Patients p, Doctors d
WHERE p.doctor = d.dname
AND d.dgender = 'M'
-
projection
filter
join
Scan (Patients) Scan (Doctors)
-
Cost-based query Optimization: Transformation
projection                      projection
filter                          join
join                            filter
Scan(Patients) Scan(Doctors)    Scan(Patients) Scan(Doctors)
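The transformation step relies on the two plans being equivalent. The sketch below checks that on a few invented rows: the table contents and function names are made up for illustration, and the filter is assumed to apply to Doctors because dgender is a Doctors column in the query.

```python
# Toy rows; the table contents are invented for illustration only.
patients = [
    {"pname": "Ann", "doctor": "Lee"},
    {"pname": "Bob", "doctor": "Kim"},
    {"pname": "Cy", "doctor": "Lee"},
]
doctors = [
    {"dname": "Lee", "dgender": "M"},
    {"dname": "Kim", "dgender": "F"},
]

def join(ps, ds):
    """Nested-loop join on p.doctor = d.dname."""
    return [{**p, **d} for p in ps for d in ds if p["doctor"] == d["dname"]]

def plan_filter_after_join():
    """projection <- filter <- join <- scans"""
    joined = join(patients, doctors)
    kept = [r for r in joined if r["dgender"] == "M"]
    return [(r["pname"], r["dname"]) for r in kept]

def plan_filter_before_join():
    """projection <- join <- (scan Patients, filter <- scan Doctors)"""
    male_doctors = [d for d in doctors if d["dgender"] == "M"]
    joined = join(patients, male_doctors)
    return [(r["pname"], r["dname"]) for r in joined]

print(plan_filter_after_join() == plan_filter_before_join())
```

Both plans return the same rows, but the second one feeds fewer Doctors tuples into the join, which is why an optimizer would consider it as an alternative.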
-
Cost-based query Optimization: Implementation
projection                      projection
filter                          hash join
natural join                    filter
Scan(Patients) Scan(Doctors)    Scan(Patients) Scan(Doctors)
-
Cost-based query Optimization: Plan selection based on costs
projection                      projection
filter                          hash join
natural join                    filter
Scan(Patients) Scan(Doctors)    Scan(Patients) Scan(Doctors)
Estimated cost = 100 ms         Estimated cost = 50 ms
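The final step, choosing the plan with the lowest estimated cost, can be sketched in a few lines. The two cost figures are the ones given above; the plan labels and the pairing of each cost with a plan (100 ms for the filter-above-join plan, 50 ms for the hash-join plan) are assumptions based on the order in which the costs appear.

```python
# Estimated costs in milliseconds; plan labels and cost pairing are assumed.
estimated_cost_ms = {
    "filter above natural join": 100,
    "hash join above filter": 50,
}

def select_plan(costs):
    """Return the name of the plan with the smallest estimated cost."""
    return min(costs, key=costs.get)

best_plan = select_plan(estimated_cost_ms)
print(best_plan, estimated_cost_ms[best_plan])
```

A real optimizer compares many more candidate plans, but the selection rule is the same: keep the cheapest estimate.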