07.overview of query processing
TRANSCRIPT
-
7/30/2019 07.Overview of Query Processing
1/35
Distributed Database SystemsAutumn, 2008
Chapter 7
Overview of Query
Processing
1Distributed Database Systems
-
7/30/2019 07.Overview of Query Processing
2/35
SQL: Non-Procedural Language of RDB
Tuple calculus
{ t|F(t) } where:
t: tuple variable
F(t) : well formed formula
Example
Get the No. and name of all managers
2Distributed Database Systems
""|, MANAGERTITLEtEMPtENAMEENOt
-
7/30/2019 07.Overview of Query Processing
3/35
SQL: Non-Procedural Language of RDB
Domain calculus
where:
xi : domain variables
: well formed formula
Example
{x,y |E(x,y, "manager") }
3Distributed Database Systems
,,,|,,, 2121 nn xxxFxxx
nxxxF ,,, 21
Variables are position sensitive!
-
7/30/2019 07.Overview of Query Processing
4/35
SQL: Non-Procedural Language of RDB
SQL is a tuple calculus language
SELECT ENO,ENAME
FROM EMP
WHERE TITLE=manager
4Distributed Database Systems
End user uses non-procedural languages
to express queries.
-
7/30/2019 07.Overview of Query Processing
5/35
Query Processor
Query processor transforms queries into
procedural operations to access data
5Distributed Database Systems
-
7/30/2019 07.Overview of Query Processing
6/35
Query Processor
Distributed query processor has to deal
with
query decomposition, and
data localization
6Distributed Database Systems
-
7/30/2019 07.Overview of Query Processing
7/35
7.1 Query Processing Problems
Distributed Database Systems 7
-
7/30/2019 07.Overview of Query Processing
8/35
7.1 Query Processing Problems
Centralized query processor must transform calculus query intoalgebra operation, and
choose the best execution plan Example:
SELECT ENAME
FROME,G
WHERE E.ENO = G.ENO
AND RESP=manager
8Distributed Database Systems
-
7/30/2019 07.Overview of Query Processing
9/35
7.1 Query Processing Problems
Relational Algebra 1
Relational Algebra 2
9Distributed Database Systems
GE ManagerRESPENOENAME ""
GEENOGENOEManagerRESPENAME ..""
Execution plan 2 is better for consumingless resources!
-
7/30/2019 07.Overview of Query Processing
10/35
7.1 Query Processing Problems
In DDB, the query processor mustconsider the communication cost and
select the best site!
Same query as last example, but G and Eare distributed.
Simple plan:
To transport all segments to query site andexecute there. This causes too much network
traffic, very costly.
10Distributed Database Systems
-
7/30/2019 07.Overview of Query Processing
11/35
7.1 Query Processing Problems
Distributed Query Example
Distribution of E and G
11Distributed Database Systems
-
7/30/2019 07.Overview of Query Processing
12/35
7.1 Query Processing Problems
Distributed Query Example
Query
12Distributed Database Systems
GE ManagerREPSPENOENAME ""
-
7/30/2019 07.Overview of Query Processing
13/35
7.1 Query Processing Problems
Distributed Query Example
Optimized Processing
13Distributed Database Systems
-
7/30/2019 07.Overview of Query Processing
14/35
7.2 Objectives of Query Processing
Distributed Database Systems 14
-
7/30/2019 07.Overview of Query Processing
15/35
7.2 Objectives of Query Processing
Two-fold objectives:
Transformation, and
Optimization
15Distributed Database Systems
-
7/30/2019 07.Overview of Query Processing
16/35
7.2 Objectives of Query Processing
Cost to be considered for optimization:
CPU time
I/O time, and
Communication time
16Distributed Database Systems
WAN: the last cost is dominant
LAN: all three are equal
-
7/30/2019 07.Overview of Query Processing
17/35
7.3 Complexity of Relational Algebra Operations
Distributed Database Systems 17
-
7/30/2019 07.Overview of Query Processing
18/35
7.3 Complexity of Relational Algebra Operations
Measured by n (cardinality) and tuples aresorted on comparison attributes
Distributed Database Systems 18
O(n)
O(nlogn)O(nlogn)
O(n2)
)duplicates(with,
GROUP),duplicates(with
,,,,
-
7/30/2019 07.Overview of Query Processing
19/35
7.4 Characterization of Query Processor
Distributed Database Systems 19
-
7/30/2019 07.Overview of Query Processing
20/35
7.4.1 Languages
For users:
calculus or algebra based languages.
For query processor:
map the input into internal form of
algebra augmented with
communication primitives.
Distributed Database Systems 20
-
7/30/2019 07.Overview of Query Processing
21/35
7.4.2 Types of Optimization
Exhaustive search
Workable for small solution space
Heuristics
Perform first,semi-join, etc. for largesolution space
Distributed Database Systems 21
,
-
7/30/2019 07.Overview of Query Processing
22/35
7.4.3 Optimization Timing
Static
Do it at compiling time by using statistics,
appropriate for exhaustive search, optimized
once, but executed many times. Dynamic
Do it at execution time, accurate, repeated
for every execution, expensive.
Distributed Database Systems 22
-
7/30/2019 07.Overview of Query Processing
23/35
7.4.4 Statistics
Facts of
Cardinalities
Attribute value distribution
Size of relation, etc.
Provided to query optimizer and
periodically updated.
Distributed Database Systems 23
-
7/30/2019 07.Overview of Query Processing
24/35
7.4.5 Decision Site
For query optimization, it may be done by Single site centralizedapproach, or
All the sites involved distributed, or
Hybrid
one site makes major decision incooperation with other sites making local
decisions
Distributed Database Systems 24
-
7/30/2019 07.Overview of Query Processing
25/35
7.4.6 Exploration of the Network Topology
WAN communication cost is dominant
LAN
communication cost is comparable to I/Ocost. Broadcasting capability, star network,
satellite network should be considered.
Distributed Database Systems 25
-
7/30/2019 07.Overview of Query Processing
26/35
7.4.7 Exploration of Replicated Fragments
Use replications to minimize
communication costs.
Distributed Database Systems 26
-
7/30/2019 07.Overview of Query Processing
27/35
7.4.8 Use of Semi-joins
Reduce the size of operand
relations to cut down
communication costs whenoverhead is not significant.
Distributed Database Systems 27
-
7/30/2019 07.Overview of Query Processing
28/35
7.5 Layers of Query Processing
Distributed Database Systems 28
-
7/30/2019 07.Overview of Query Processing
29/35
-
7/30/2019 07.Overview of Query Processing
30/35
7.5.1 Query Decomposition
Decompose calculus query into algebraquery using global conceptual schema
information.
Distributed Database Systems 30
Step 1 calculus normalization
Step 2 semantic analysis to rejectincorrect queries
Step 3 simplification to eliminateredundant components
Step 4 translation of calculus queryinto optimized algebra query.
-
7/30/2019 07.Overview of Query Processing
31/35
7.5.2 Data Localization
Distributed query is mapped intoa fragment query and simplified
to produce agoodone.
Distributed Database Systems 31
-
7/30/2019 07.Overview of Query Processing
32/35
7.5.3 Global Query Optimization
Find an execution strategy close tooptimal.
Find the best ordering of operations in
the fragment query, includingcommunication operations.
Cost function defined in time is required.
Distributed Database Systems 32
-
7/30/2019 07.Overview of Query Processing
33/35
7.5.4 Local Query Optimization
Centralized system algorithms
(to be discussed in chapter 9)
Distributed Database Systems 33
-
7/30/2019 07.Overview of Query Processing
34/35
-
7/30/2019 07.Overview of Query Processing
35/35
7.6 Conclusions
Query processor
must be able to findgood execution plan for a calculus query, s.
t. CPU time, I/O time and communication
time are minimized. Method: laying of
decomposition
localization global query optimization
local query optimization