Query Processing and Optimization
Basic Concepts
• Query Processing: the activities involved in retrieving data from the database:
  – Translation of the SQL query into a low-level language implementing relational algebra
  – Query execution
• Query Optimization: selection of an efficient query execution plan
Phases of Query Processing
Relational Algebra
• Relational algebra defines basic operations on relation instances
• Results of operations are also relation instances
Basic Operations
• Unary algebra operations:
  – Selection
  – Projection
• Binary algebra operations:
  – Union
  – Set difference
  – Cross-product
Additional Operations
• Can be expressed through the 5 basic operations:
  – Join
  – Intersection
  – Division
Selection
$\sigma_{criterion}(I)$
where criterion is the selection condition and I is an instance of a relation.
• Result:
  – The same schema
  – A subset of the tuples from the instance I
• Criterion: comparisons combined by conjunction (AND) and disjunction (OR)
• Comparison operators: <, <=, =, ≠, >=, >
Projection
• Vertical subset of the input relation instance
• The schema of the result:
  – is determined by the list of desired fields
  – types of fields are inherited
$\pi_{a_1, a_2, \ldots, a_m}(I)$,
where a1, a2, …, am are the desired fields from the relation with the instance I
Binary Operations
• Union-compatible relations:
  – The same number of fields
  – Corresponding fields have the same domains
• Union of 2 relations
• Intersection of 2 relations
• Set-difference
• Cross-product: does not require union-compatibility
Marina G. Erechtchoukova
Joins
• A join is defined as a cross-product followed by selections
• Based on the conditions, joins are classified as:
  – Theta-joins
  – Natural joins
  – Other…
Theta Join
$R \bowtie_{Cond} S = \sigma_{Cond}(R \times S)$
where Cond refers to the attributes of both relations R and S in the form of comparison expressions with the operators:
<, <=, =, ≠, >=, >
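The definitions of selection, projection, cross-product and theta join can be sketched in a few lines of Python. This is only an illustration: the relations `students` and `marks`, their field names, and the helper function names are hypothetical, and relation instances are modelled as lists of dicts with non-overlapping field names.

```python
from itertools import product

# Hypothetical sample relation instances; each tuple is a dict of field -> value.
students = [
    {"sid": 1, "sname": "Ann", "age": 21},
    {"sid": 2, "sname": "Bob", "age": 25},
]
marks = [
    {"m_sid": 1, "mark": 80},
    {"m_sid": 2, "mark": 90},
    {"m_sid": 1, "mark": 60},
]

def select(criterion, instance):
    """Selection: same schema, subset of the tuples."""
    return [t for t in instance if criterion(t)]

def project(fields, instance):
    """Projection: vertical subset; duplicate rows are eliminated."""
    seen, result = set(), []
    for t in instance:
        row = tuple(t[f] for f in fields)
        if row not in seen:
            seen.add(row)
            result.append(dict(zip(fields, row)))
    return result

def cross_product(r, s):
    """Cross-product: every tuple of r paired with every tuple of s."""
    return [{**a, **b} for a, b in product(r, s)]

def theta_join(r, s, cond):
    """Theta join, defined as a selection over the cross-product."""
    return select(cond, cross_product(r, s))

joined = theta_join(students, marks, lambda t: t["sid"] == t["m_sid"])
names = project(["sname"], select(lambda t: t["mark"] > 75, joined))
print(names)
```

Because every operation returns a list of dicts, the result of one operation can be fed directly into another, which mirrors the closure property of relational algebra.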
Relational Algebra Expressions
• The result of a relational operation is a relation instance
• Relational algebra expression combines relation instances using relational algebra operations
• Relational algebra expression produces the result of a query
Simple SQL Query
SELECT select-list       π select-list
FROM from-list           Cross Product
WHERE qualification;     σ qualification
Conceptual Evaluation Strategy for Simple Query
• Compute the cross-product of tables in from-list
• Delete those rows which fail the qualification condition
• Delete all columns that do not appear in the select-list
• If DISTINCT clause is specified, eliminate duplicate rows.
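The four steps above can be sketched directly in Python. This follows the conceptual strategy literally and is not how a DBMS would actually execute the query; the function `evaluate`, the sample tables and their field names are assumptions made for illustration.

```python
from itertools import product

def evaluate(select_list, from_list, qualification, distinct=False):
    # Step 1: compute the cross-product of the tables in the from-list.
    rows = [{k: v for t in combo for k, v in t.items()}
            for combo in product(*from_list)]
    # Step 2: delete the rows that fail the qualification condition.
    rows = [r for r in rows if qualification(r)]
    # Step 3: delete all columns that do not appear in the select-list.
    rows = [tuple(r[c] for c in select_list) for r in rows]
    # Step 4: if DISTINCT is specified, eliminate duplicate rows (order kept).
    if distinct:
        rows = list(dict.fromkeys(rows))
    return rows

# Hypothetical tables with non-overlapping column names.
student = [
    {"sid": 1, "sname": "Ann", "age": 21},
    {"sid": 2, "sname": "Bob", "age": 25},
    {"sid": 3, "sname": "Cy", "age": 22},
]
mark = [
    {"m_sid": 1, "mark": 80},
    {"m_sid": 1, "mark": 90},
    {"m_sid": 2, "mark": 95},
    {"m_sid": 3, "mark": 70},
]

def qualification(r):
    return r["sid"] == r["m_sid"] and r["mark"] > 75 and r["age"] < 23

plain = evaluate(["sname"], [student, mark], qualification)
unique = evaluate(["sname"], [student, mark], qualification, distinct=True)
print(plain, unique)
```

Without DISTINCT the student with two qualifying marks appears twice, which is why the last step is needed for set semantics.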
Nested Queries
• Query block:
  – Single SELECT-FROM-WHERE expression
  – May include GROUP BY and HAVING
• Query block: the basic unit that is translated into an RA expression and optimized
• An SQL query is decomposed into query blocks
Different Processing Strategies
• Algorithms implementing basic relational algebra operations
• Algorithms implementing additional relational algebra operations
• Example: Find the students who have marks higher than 75 and are younger than 23
Query Decomposition
• Analysis
  – Relational algebra tree
• Normalization
• Semantic analysis
• Simplification
• Query restructuring
Analysis
• Analyze the query using compiler techniques
• Verify that relations and attributes exist
• Verify that operations are appropriate for the object type
• Transform the query into some internal representation
Relational Algebra Tree
• Leaf nodes are created for each base relation.
• Non-leaf nodes are created for each intermediate relation produced by an RA operation.
• The root of the tree represents the query result.
• The sequence of operations is directed from the leaves to the root.
Relational Algebra Tree (Cont…)
[Figure: relational algebra tree with levels labelled Root, Intermediate operations, and Leaves]
Criterion Normalization
• Conjunctive normal form (CNF): a sequence of boolean expressions connected by conjunction (AND):
  – Each expression contains comparison terms connected by disjunction (OR)
• Disjunctive normal form (DNF): a sequence of boolean expressions connected by disjunction (OR):
  – Each expression contains comparison terms connected by conjunction (AND)
Criterion Normalization (Cont…)
• An arbitrarily complex qualification condition can be converted into one of the normal forms
• Algorithms for computation:
  – CNF: only the tuples that satisfy all expressions
  – DNF: the union of the tuples that satisfy the individual expressions
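The two ways of computing a normalized qualification can be sketched as follows. The representation (a list of lists of predicate functions), the sample rows and the condition itself are assumptions chosen for illustration; the condition is (mark > 75 OR year = 1) AND age < 23, written once in CNF and once in the equivalent DNF.

```python
# A term is a predicate on a tuple; a normal form is a list of lists of terms.
def satisfies_cnf(clauses, t):
    """CNF: every clause must hold, and a clause holds if any of its terms holds."""
    return all(any(term(t) for term in clause) for clause in clauses)

def satisfies_dnf(conjuncts, t):
    """DNF: at least one conjunct must hold, and a conjunct needs all of its terms."""
    return any(all(term(t) for term in conj) for conj in conjuncts)

def select_dnf_by_union(conjuncts, rows):
    """DNF computed as the union of the tuples selected by each conjunct."""
    result = []
    for conj in conjuncts:
        for t in rows:
            if all(term(t) for term in conj) and t not in result:
                result.append(t)
    return result

high = lambda t: t["mark"] > 75
first_year = lambda t: t["year"] == 1
young = lambda t: t["age"] < 23

cnf = [[high, first_year], [young]]
dnf = [[high, young], [first_year, young]]

rows = [
    {"id": "a", "mark": 80, "age": 21, "year": 2},
    {"id": "b", "mark": 60, "age": 20, "year": 1},
    {"id": "c", "mark": 90, "age": 30, "year": 1},
    {"id": "d", "mark": 50, "age": 19, "year": 3},
]

cnf_rows = [t for t in rows if satisfies_cnf(cnf, t)]
dnf_rows = [t for t in rows if satisfies_dnf(dnf, t)]
union_rows = select_dnf_by_union(dnf, rows)
print([t["id"] for t in cnf_rows])
```

All three computations select the same tuples, since the CNF and DNF forms here are logically equivalent.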
Semantic Analysis
• Applied to normalized queries
• Rejects contradictory queries:
  – The qualification condition cannot be satisfied by any tuple
• Rejects incorrectly formulated queries:
  – Condition components do not contribute to the generation of the result.
Relation Connection Graph
• Applies to conjunctive queries without negation
• A node is created for each base relation and for the result
• An edge between two nodes is created:
  – If there is a join between them
  – If a node is the source for a projection.
• If the graph is not connected, the query is incorrectly formulated
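The connectivity test on a relation connection graph can be sketched with a simple graph traversal. The node names (`Student`, `Mark`, `Course`, `RESULT`) and the edge lists are hypothetical; only the rule "not connected means incorrectly formulated" comes from the slide.

```python
def is_connected(nodes, edges):
    """Return True if every node is reachable from the first one."""
    adjacency = {n: set() for n in nodes}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    start = nodes[0]
    seen, stack = {start}, [start]
    while stack:
        for nxt in adjacency[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen == set(nodes)

nodes = ["Student", "Mark", "Course", "RESULT"]

# Student joins Mark, and Student is the source of the projection;
# Course appears in the from-list but is never joined.
missing_join = [("Student", "Mark"), ("Student", "RESULT")]

# The same query with a join between Mark and Course added.
with_join = missing_join + [("Mark", "Course")]

print(is_connected(nodes, missing_join), is_connected(nodes, with_join))
```

In the first graph `Course` is isolated, which typically signals a forgotten join condition and therefore an unintended cross-product.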
Simplification
• Eliminates redundancy in the qualification
• Queries against views:
  – Access privileges
  – Redundancy in qualification
• Transforms the query into an equivalent form that is computed efficiently
• Main tool: the rules of boolean algebra
Queries against Views
• View resolution:
  – The view select-list is translated into the corresponding select-list in the view-defining query
  – The from-list of the query is modified to hold the names of the base tables
  – Qualifications from the WHERE clauses are combined
  – GROUP BY and HAVING clauses are modified
Rules of Boolean Algebra
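$$
\begin{aligned}
p \land p &\equiv p, & p \lor p &\equiv p \\
p \land \text{false} &\equiv \text{false}, & p \lor \text{false} &\equiv p \\
p \land \text{true} &\equiv p, & p \lor \text{true} &\equiv \text{true} \\
p \land (\lnot p) &\equiv \text{false}, & p \lor (\lnot p) &\equiv \text{true} \\
p \land (p \lor q) &\equiv p, & p \lor (p \land q) &\equiv p
\end{aligned}
$$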
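The standard simplification rules of boolean algebra (idempotence, identity, domination, complement and absorption) can be checked exhaustively with a truth table. The sketch below assumes these ten rules are the ones intended on the slide; the dictionary `rules` and its labels are illustrative.

```python
from itertools import product

# Each rule is written as a Python expression that must be True for all p, q.
rules = {
    "p AND p == p":           lambda p, q: (p and p) == p,
    "p OR p == p":            lambda p, q: (p or p) == p,
    "p AND false == false":   lambda p, q: (p and False) == False,
    "p OR false == p":        lambda p, q: (p or False) == p,
    "p AND true == p":        lambda p, q: (p and True) == p,
    "p OR true == true":      lambda p, q: (p or True) == True,
    "p AND (NOT p) == false": lambda p, q: (p and (not p)) == False,
    "p OR (NOT p) == true":   lambda p, q: (p or (not p)) == True,
    "p AND (p OR q) == p":    lambda p, q: (p and (p or q)) == p,
    "p OR (p AND q) == p":    lambda p, q: (p or (p and q)) == p,
}

truth_table = list(product([False, True], repeat=2))
failed = [name for name, rule in rules.items()
          if not all(rule(p, q) for p, q in truth_table)]
all_hold = not failed
print(all_hold)
```

A simplifier applies such rules left to right, for example reducing a qualification of the form `c AND (c OR d)` to just `c`.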
Query Restructuring
• Rewriting a query using relational algebra operations
• Modifying the relational algebra expression to provide a more efficient implementation
Query Optimization
• Optimization criteria:
  – Reduce the total execution time of the query:
    • Minimize the sum of the execution times of all individual operations
    • Reduce the number of disk accesses
  – Reduce the response time of the query:
    • Maximize parallel operations
• Dynamic vs. static optimization
Heuristic Approach
• Heuristic - problem-solving by experimental methods
• Applying general rules to choose the most appropriate internal query representation
• Based on transformation rules for relational algebra operations
Transformation Rules
• Cascade of selection operations:
$\sigma_{p \land q \land r}(R) = \sigma_p(\sigma_q(\sigma_r(R)))$
• Commutativity of selection operations:
$\sigma_p(\sigma_q(R)) = \sigma_q(\sigma_p(R))$
• Sequence of projection operations:
$\pi_L(\pi_M(\ldots \pi_N(R))) = \pi_L(R)$, where $L \subseteq M \subseteq \ldots \subseteq N$
Transformation Rules (Cont…)
• Commutativity of selection and projection:
$\pi_{A_1,\ldots,A_m}(\sigma_p(R)) = \sigma_p(\pi_{A_1,\ldots,A_m}(R))$
where p involves only attributes from {A1,…,Am}
• Commutativity of binary operations:
$R \times S = S \times R$; $R \bowtie_p S = S \bowtie_p R$; $R \cup S = S \cup R$; $R \cap S = S \cap R$
Transformation Rules (Cont…)
• Commutativity of selection and theta join:
$\sigma_p(R \bowtie_r S) = (\sigma_p(R)) \bowtie_r S$
where p involves only attributes from R
• Commutativity of projection and theta join:
$\pi_{A_1 \cup A_2}(R \bowtie_r S) = (\pi_{A_1}(R)) \bowtie_r (\pi_{A_2}(S))$
where A1 contains only attributes from R, A2 contains only attributes from S, and the join condition r involves only attributes in A1 ∪ A2
Transformation Rules (Cont…)
• Commutativity of projection and union:
$\pi_L(R \cup S) = \pi_L(R) \cup \pi_L(S)$
• Associativity of binary operations:
$(R \bowtie S) \bowtie T = R \bowtie (S \bowtie T)$;
$(R \times S) \times T = R \times (S \times T)$;
$(R \cup S) \cup T = R \cup (S \cup T)$;
$(R \cap S) \cap T = R \cap (S \cap T)$.
Heuristic Rules
• Perform selection as early as possible
• Combine a cross-product with a subsequent selection
• Rearrange base relations so that the most restrictive selection is executed first.
• Perform projection as early as possible
• Compute common expressions once.
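The effect of performing selection as early as possible can be illustrated by comparing the size of the intermediate result in two equivalent plans. The generated tables, the field names and the predicates are hypothetical; the example reuses the query about students with marks above 75 who are younger than 23.

```python
from itertools import product

def cross(r, s):
    return [{**a, **b} for a, b in product(r, s)]

def select(pred, rows):
    return [t for t in rows if pred(t)]

# Hypothetical generated data: 100 students and 300 marks.
student = [{"sid": i, "age": 18 + i % 10} for i in range(100)]
mark = [{"m_sid": i % 100, "mark": 50 + (i * 7) % 50} for i in range(300)]

join_cond = lambda t: t["sid"] == t["m_sid"]
young = lambda t: t["age"] < 23
high = lambda t: t["mark"] > 75

# Plan 1: cross-product first, all selections afterwards.
late_intermediate = cross(student, mark)
late = select(lambda t: join_cond(t) and young(t) and high(t), late_intermediate)

# Plan 2: push each selection down to its base relation, then join.
early_intermediate = cross(select(young, student), select(high, mark))
early = select(join_cond, early_intermediate)

print(len(late_intermediate), len(early_intermediate), early == late)
```

Both plans return the same tuples, but the second builds a much smaller intermediate relation, which is what the transformation rules for selection and join permit.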
Cost Estimation Components
• Cost of access to secondary storage
• Storage cost: the cost of storing intermediate results
• Computation cost
• Memory usage cost: the usage of RAM buffers
Cost Estimation for Relational Algebra Expressions
• Formulae for cost estimation of each operation
• Estimation of the cost of a relational algebra expression
• Choosing the expression with the lowest cost
Cost Estimation in Query Optimization
• Based on the relational algebra tree
• For each node in the tree, an estimate is made of:
  – the cost of performing the operation;
  – the size of the result of the operation;
  – whether the result is sorted.
Database Statistics for a Relation
• Cardinality of the relation instance
• Block (of tuples): a page
• Number of blocks required to store a relation (data)
• Blocking factor: the number of tuples in one block
• Number of blocks required to store an index
Database Statistics for an Attribute of a Relation
• The number of distinct values
• Possible minimum and maximum values
• Selection cardinality of an attribute:
  – For an equality condition on the attribute
  – For an inequality condition on the attribute
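These statistics are typically combined with simple formulas. The sketch below assumes that attribute values are uniformly distributed, which is a common textbook simplification and not something the slides state; the function names and the sample numbers are hypothetical.

```python
import math

def n_blocks(n_tuples, blocking_factor):
    """Blocks needed to store a relation with the given blocking factor."""
    return math.ceil(n_tuples / blocking_factor)

def sc_equality(n_tuples, n_distinct):
    """Selection cardinality for 'attribute = value' (uniform distribution assumed)."""
    return n_tuples / n_distinct

def sc_greater_than(n_tuples, c, min_value, max_value):
    """Selection cardinality for 'attribute > c' (uniform distribution assumed)."""
    return n_tuples * (max_value - c) / (max_value - min_value)

# Hypothetical statistics: 10,000 tuples, 40 tuples per block,
# 25 distinct ages, marks ranging from 0 to 100.
blocks = n_blocks(10_000, 40)
same_age = sc_equality(10_000, 25)
high_marks = sc_greater_than(10_000, 75, 0, 100)
print(blocks, same_age, high_marks)
```

An optimizer would use estimates like these to predict the size of each intermediate result in the relational algebra tree.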
Algorithms for Relational Algebra Operations Implementation
• Linear search
• Binary search
• Sort-merge
• External sorting
• Hashing
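The difference between linear and binary search is easiest to see by counting block reads. In this sketch a file is a list of blocks and each block is a sorted list of keys; the layout (64 blocks of 4 keys) is a hypothetical example, and binary search is only applicable because the file is ordered on the search key.

```python
def linear_search(blocks, key):
    """Scan blocks in order; return the number of blocks read."""
    reads = 0
    for block in blocks:
        reads += 1
        if key in block:
            break
    return reads

def binary_search(blocks, key):
    """Binary search over an ordered file; return the number of blocks read."""
    lo, hi, reads = 0, len(blocks) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        reads += 1
        block = blocks[mid]
        if key < block[0]:
            hi = mid - 1
        elif key > block[-1]:
            lo = mid + 1
        else:
            break  # the key, if present, is in this block
    return reads

# Hypothetical ordered file: 64 blocks, 4 keys per block, keys 0..255.
blocks = [list(range(i * 4, i * 4 + 4)) for i in range(64)]

linear_reads = linear_search(blocks, 255)
binary_reads = binary_search(blocks, 255)
print(linear_reads, binary_reads)
```

For a key in the last block the linear scan reads every block, while the binary search needs a number of reads that grows only with the logarithm of the number of blocks.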
File Organization
• The physical arrangement of data in a file into records and blocks (pages) on secondary storage
• Storing and retrieving data depends on the file organization
Heap Files
• Unordered files
• Records are placed in the file in the same order as they are inserted
• If there is insufficient space in the last block, a new block is added.
• Records are retrieved by scanning the file
Ordered Files
• Files sorted on the values of the ordering fields
• Ordering key – ordering fields with unique constraint
• Under certain conditions records can be retrieved based on binary search
Hash Files
• Records are randomly distributed across the available space
• To store a record, the address of the block (page) is calculated by a hash function
• Blocks are kept at about 80% occupancy
• To retrieve all the data, all blocks are scanned, which is about 1.25 times as many blocks as for a heap file
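The factor of 1.25 follows from the occupancy: at 80% occupancy a file needs 1 / 0.8 = 1.25 times as many blocks. The sketch below shows this together with a division-remainder hash function; the hash function, the function names and the numbers are assumptions for illustration, not the method of any particular DBMS.

```python
import math

def block_address(key, n_blocks):
    """Hypothetical division-remainder hash: map a key to a block number."""
    return key % n_blocks

def hash_file_blocks(n_tuples, blocking_factor, occupancy=0.8):
    """Blocks needed when each block is kept only partly full."""
    return math.ceil(n_tuples / (blocking_factor * occupancy))

heap_blocks = math.ceil(10_000 / 40)        # fully packed heap file
hash_blocks = hash_file_blocks(10_000, 40)  # hash file at 80% occupancy
address = block_address(1983, hash_blocks)
print(heap_blocks, hash_blocks, round(hash_blocks / heap_blocks, 2), address)
```

A lookup by key touches only the block at the computed address, whereas a full scan pays for the empty space left in every block.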
Indexes
• A data structure that allows the DBMS to locate particular records
• Index files are not required but are very helpful
• Index files can be ordered by the values of the indexing fields
Retrieval Algorithms
• Files without indexes:
  – Records are selected by scanning data files
• Indexed files:
  – Matching selection condition
  – Records are selected by scanning index files and finding the corresponding blocks in data files
Search Space
• Collection of possible execution strategies for a query
• Strategies can use:
  – Different join orderings
  – Different selection methods
  – Different join methods
• Enumeration algorithm: an algorithm to determine an optimal strategy from the search space
Pipelining
• Materialization - saving intermediate results in a temporary table
• Pipelining – submitting the results of one operation to another operation without creating a temporary table
• A pipeline is implemented for each join operation
• Requires specific algorithms
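The contrast between materialization and pipelining can be sketched with Python generators, which hand each tuple to the next operation as soon as it is produced. The table `student`, its fields and the operator functions are hypothetical; a real DBMS implements pipelines inside its execution engine.

```python
# Hypothetical sample table.
student = [
    {"sname": "Ann", "age": 21},
    {"sname": "Bob", "age": 25},
    {"sname": "Cy", "age": 22},
]

# Materialization: the intermediate result is stored before the next step runs.
temp_table = [r for r in student if r["age"] < 23]
materialized = [(r["sname"],) for r in temp_table]

# Pipelining: each operator is a generator that pulls tuples from its input.
def scan(rows):
    for r in rows:
        yield r

def select(pred, source):
    for r in source:
        if pred(r):
            yield r

def project(fields, source):
    for r in source:
        yield tuple(r[f] for f in fields)

pipeline = project(["sname"], select(lambda r: r["age"] < 23, scan(student)))
first = next(pipeline)       # produced without building any temporary table
pipelined = [first] + list(pipeline)
print(materialized, pipelined)
```

The first result tuple is available after reading a single input tuple, and no temporary table is ever created.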
Linear Trees
• In a linear tree at least one child of a join node is a base relation
• Left-deep tree – the right child of each join node is a base relation
• Right-deep tree – the left child of each join node is a base relation
• Bushy tree – non-linear tree
Left-Deep Tree
• Supports fully pipelined strategies
• Advantage:
  – Reduces the search space
• Disadvantage:
  – Excludes alternative strategies which may be of a lower cost
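How much the search space shrinks can be estimated by counting join trees. The sketch below uses standard combinatorial counts: n! left-deep trees for n relations, and (2(n-1))!/(n-1)! join trees of any shape when the order of the two children matters. It ignores join methods and any pruning of cross-products, so it is only a rough illustration; the function names are hypothetical.

```python
from math import factorial

def left_deep_trees(n):
    """Left-deep trees: one per ordering of the n base relations."""
    return factorial(n)

def all_join_trees(n):
    """All binary join trees (including bushy) with ordered children."""
    return factorial(2 * (n - 1)) // factorial(n - 1)

for n in (3, 4, 6):
    print(n, left_deep_trees(n), all_join_trees(n))
```

Restricting the enumeration to left-deep trees leaves far fewer candidates to cost, at the price of never considering a bushy plan that might have been cheaper.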
Query Optimization in Oracle
• Rule-based optimizer
  – Specify the goal in the init.ora file:
    OPTIMIZER_MODE = RULE
• Cost-based optimizer
  – Specify the goal in the init.ora file:
    OPTIMIZER_MODE = CHOOSE
Rule-Based Optimizer
• 15 rules are ranked
• RowID describes the physical location of the record
• RowID is associated with table indexes
• An access path for a table is only chosen if the statement contains a predicate or other construct that makes that access path available.
Cost-Based Optimizer
• Statistics:
  – ANALYZE: a command that generates statistics
  – PL/SQL package DBMS_STATS
• Hints:
  – To access the full table
  – To use a rule
  – To use a certain index
  – …
Example
• SELECT /*+ full(student) */ sname FROM student WHERE Y_of_B = 1983;