ICS 421 Spring 2010
Query Evaluation (i)
Asst. Prof. Lipyeow Lim
Information & Computer Science Department
University of Hawaii at Manoa
2/9/2010 Lipyeow Lim -- University of Hawaii at Manoa
[Figure: query evaluation pipeline. Query → Parse Query → Optimizer (Enumerate Plans → Estimate Cost → Choose Best Plan) → Evaluate Query Plan → Result.
Example query: SELECT * FROM Reserves WHERE sid=101
Plan A: SCAN (sid=101) on Reserves, estimated cost 32.0
Plan B: IDXSCAN (sid=101) on Index(sid), then fetch from Reserves, estimated cost 25.0
The optimizer picks B.]
Parse Query
• Input: SQL
– E.g. SELECT-FROM-WHERE, CREATE TABLE, DROP TABLE statements
• Output: some data structure to represent the “query”
– Relational algebra?
• Also checks syntax, resolves aliases, binds names in SQL to objects in the catalog
• How?
Enumerate Plans
• Input: a data structure representing the “query”
• Output: a collection of equivalent query evaluation plans
• Query Execution Plan (QEP): tree of database operators.
– high-level: RA operators are used
– low-level: RA operators with a particular implementation algorithm.
• Plan enumeration: find equivalent plans
– Different QEPs that return the same results
– Query rewriting: transformation of one QEP to another equivalent QEP.
Estimate Cost
• Input: a collection of equivalent query evaluation plans
• Output: a cost estimate for each QEP in the collection
• Cost estimation: a mapping of a QEP to a cost
– Cost Model: a model of what counts in the cost estimate. E.g. disk accesses, CPU cost …
• Statistics about the data and the hardware are used.
Choose Best Plan
• Input: a collection of equivalent query evaluation plans and their cost estimates
• Output: best QEP in the collection
• The steps enumerate plans, estimate cost, and choose best plan are collectively called the Query Optimizer:
– Explores the space of equivalent plans for a query
– Chooses the best plan according to a cost model
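The last optimizer step can be sketched in a few lines of Python. This is only an illustration: the `Plan` class and `choose_best_plan` function are hypothetical names, and the two candidate plans and their costs (32.0 for the table scan, 25.0 for the index scan) are taken from the overview example as best it can be read.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    name: str
    description: str
    estimated_cost: float  # produced by the cost-estimation step


def choose_best_plan(plans):
    """Return the plan with the lowest estimated cost."""
    return min(plans, key=lambda plan: plan.estimated_cost)


# Candidate plans for: SELECT * FROM Reserves WHERE sid=101
candidate_plans = [
    Plan("A", "SCAN (sid=101) on Reserves", 32.0),
    Plan("B", "IDXSCAN (sid=101) on Index(sid), then fetch from Reserves", 25.0),
]

best = choose_best_plan(candidate_plans)
print(best.name, best.estimated_cost)
```

A real optimizer rarely enumerates every plan before costing; the sketch only shows that "best" means "lowest estimated cost under the cost model".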
Evaluate Query Plan
• Input: a QEP (hopefully the best)
• Output: query results
• Often includes a “code generation” step to generate a lower-level QEP in executable “code”.
• The query evaluation engine is a “virtual machine” that executes some code representing the low-level QEP.
Query Execution Plans (QEPs)
• A tree of database operators: each operator is an RA operator with a specific implementation
• Selection σ: Index Scan or Table Scan
• Projection π:
– Without DISTINCT: Table Scan
– With DISTINCT: requires sorting or index scan
• Join ⋈:
– Nested loop joins (naïve)
– Index nested loop joins
– Sort merge joins
• Sort:
– In-memory sort
– External sort
QEP Examples
SELECT S.sname
FROM Reserves R, Sailors S
WHERE R.sid=S.sid AND R.bid=100 AND S.rating>5

[Figure: equivalent QEP trees for this query.
First plan: π_{S.sname} (on the fly) over σ_{S.rating>5 AND R.bid=100} (on the fly) over the join R.sid=S.sid (Nested Loop Join) of Reserves (SCAN) and Sailors (SCAN).
Rewritten plans push the selections below the join: σ_{R.bid=100} on Reserves (SCAN), materialized as Temp T1, and σ_{S.rating>5} on Sailors (SCAN), followed by the join R.sid=S.sid (Nested Loop Join) and π_{S.sname} (on the fly).]
Access Paths
• An access path is a method of retrieving tuples. E.g. given a query with a selection condition:
– File or table scan
– Index scan
• Index matching problem: given a selection condition, which indexes can be used for the selection, i.e., match the selection?
– The selection condition is normalized to conjunctive normal form (CNF), where each term is a conjunct
– E.g. (day<8/9/94 AND rname=‘Paul’) OR bid=5 OR sid=3
– CNF: (day<8/9/94 OR bid=5 OR sid=3) AND (rname=‘Paul’ OR bid=5 OR sid=3)
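The CNF rewrite in the example can be reproduced mechanically by distributing OR over AND. The sketch below assumes the condition is given as an OR of AND-groups (a list of lists of term strings); the function name `dnf_to_cnf` is made up for illustration.

```python
from itertools import product


def dnf_to_cnf(dnf):
    """Distribute OR over AND: each CNF clause picks one term per OR-operand."""
    cnf = []
    for choice in product(*dnf):
        clause = list(dict.fromkeys(choice))  # drop repeated terms, keep order
        cnf.append(clause)
    return cnf


# (day<8/9/94 AND rname='Paul') OR bid=5 OR sid=3
condition = [["day<8/9/94", "rname='Paul'"], ["bid=5"], ["sid=3"]]

cnf = dnf_to_cnf(condition)
for clause in cnf:
    print("(" + " OR ".join(clause) + ")")
```

Each printed line is one conjunct of the CNF; the two conjuncts correspond to the two clauses in the example.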
[Figure: the example QEP with two access paths for σ_{R.bid=100}: a SCAN of Reserves, or an IDXSCAN on Index(R.bid) followed by a Fetch from Reserves. The result is materialized as Temp T1 and joined with σ_{S.rating>5} on Sailors (SCAN) by a Nested Loop Join on R.sid=S.sid, with π_{S.sname} computed on the fly.]
Index Matching
• A tree index matches a selection condition if the attributes in the selection condition form a prefix of the index search key.
• A hash index matches a selection condition if the selection condition has a term attribute=value for every attribute in the index search key.

Indexes:
I1: Tree Index (a,b,c)
I2: Tree Index (b,c,d)
I3: Hash Index (a,b,c)

Selection conditions:
Q1: a=5 AND b=3
Q2: a=5 AND b>6
Q3: b=3
Q4: a=5 AND b=3 AND c=5
Q5: a>5 AND b=3 AND c=5
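The two matching rules can be tried out on I1 to I3 and Q1 to Q5 with a small Python sketch. It makes a simplifying assumption: a tree index is treated as matching the terms whose attributes form the longest prefix of its search key, so the function returns that prefix (empty means no match). The function names and the `(attribute, operator)` encoding of terms are illustrative only.

```python
def tree_index_match(index_key, terms):
    """Return the longest prefix of index_key whose attributes all occur in terms."""
    attrs = {attr for attr, _op in terms}
    prefix = []
    for key_attr in index_key:
        if key_attr not in attrs:
            break
        prefix.append(key_attr)
    return tuple(prefix)


def hash_index_matches(index_key, terms):
    """A hash index needs an equality term on every attribute of its key."""
    eq_attrs = {attr for attr, op in terms if op == "="}
    return all(key_attr in eq_attrs for key_attr in index_key)


I1 = ("a", "b", "c")  # tree index
I2 = ("b", "c", "d")  # tree index
I3 = ("a", "b", "c")  # hash index

queries = {
    "Q1": [("a", "="), ("b", "=")],
    "Q2": [("a", "="), ("b", ">")],
    "Q3": [("b", "=")],
    "Q4": [("a", "="), ("b", "="), ("c", "=")],
    "Q5": [("a", ">"), ("b", "="), ("c", "=")],
}

for name, terms in queries.items():
    print(name,
          "I1:", tree_index_match(I1, terms),
          "I2:", tree_index_match(I2, terms),
          "I3:", hash_index_matches(I3, terms))
```

Under this reading, Q3 cannot use I1 because the leading attribute a is missing, and only Q4 satisfies the hash index I3.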
One Approach to Selections
1. Find the most selective access path and retrieve tuples using it.
2. Apply the remaining terms in the selection that are not matched by the chosen access path.
• The selectivity of an access path is the size of the result set (in terms of tuples or pages).
– Sometimes selectivity is also used to mean reduction factor: the fraction of tuples in a table retrieved by the access path or selection condition.
• E.g. consider the selection: day<8/9/94 AND bid=5 AND sid=3
– Tree Index (day)
– Hash Index (bid,sid)
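The two steps can be sketched in Python for the example selection. Everything concrete here is an assumption made for illustration: the Reserves tuples, the estimated result sizes of the two access paths, and the reading of 8/9/94 as month/day/year. A list comprehension stands in for the actual index lookup.

```python
from datetime import date

# Hypothetical Reserves tuples.
reserves = [
    {"sid": 3, "bid": 5, "day": date(1994, 7, 1)},
    {"sid": 3, "bid": 5, "day": date(1994, 9, 1)},
    {"sid": 3, "bid": 7, "day": date(1994, 6, 1)},
    {"sid": 4, "bid": 5, "day": date(1994, 5, 1)},
]


def day_term(t):
    return t["day"] < date(1994, 8, 9)  # assumes month/day/year


def bid_term(t):
    return t["bid"] == 5


def sid_term(t):
    return t["sid"] == 3


all_terms = [day_term, bid_term, sid_term]

# Each access path lists the terms it matches and an assumed size estimate.
access_paths = [
    {"name": "Tree Index (day)", "terms": [day_term], "estimated_tuples": 3},
    {"name": "Hash Index (bid,sid)", "terms": [bid_term, sid_term], "estimated_tuples": 2},
]


def evaluate_selection(tuples, access_paths, all_terms):
    # Step 1: pick the most selective access path and retrieve with it.
    path = min(access_paths, key=lambda p: p["estimated_tuples"])
    retrieved = [t for t in tuples if all(term(t) for term in path["terms"])]
    # Step 2: apply the terms the chosen path did not match.
    remaining = [term for term in all_terms if term not in path["terms"]]
    result = [t for t in retrieved if all(term(t) for term in remaining)]
    return path["name"], result


chosen, result = evaluate_selection(reserves, access_paths, all_terms)
print(chosen, result)
```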
Join Algorithms
• Cost model
– Single DBMS server: I/Os in number of pages
– Distributed DBMS: network I/Os + local disk I/Os
– td: time to read/write one page to local disk
– ts: time to ship one page over the network to another node
• Single server:
– Nested Loop Join
– Index Nested Loop Join
– Sort Merge Join
– Hash Join
• Distributed:
– Semi-Join
– Bloom Join
Nested Loop Join
For each data page PS1 of S1
    For each tuple s in PS1
        For each data page PR1 of R1
            For each tuple r in PR1
                if (s.sid==r.sid) then output <s,r>
• Worst case number of local disk reads = Npages(S1) + |S1|*Npages(R1)

R1:
sid  bid  day
22   101  10/10/96
58   103  11/12/96

S1:
sid  sname   rating  age
22   Dustin  7       45.0
31   Lubber  8       55.5
58   Rusty   10      35.0
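The nested loop above can be run on the sample tables with a counter for page reads. This is a sketch under assumptions: pages are modelled as Python lists, the page size of 2 tuples is arbitrary, and the function names are invented for illustration.

```python
def paginate(tuples, page_size):
    """Split a list of tuples into fixed-size pages (lists)."""
    return [tuples[i:i + page_size] for i in range(0, len(tuples), page_size)]


def nested_loop_join(s_pages, r_pages):
    """Tuple-at-a-time nested loop join on sid (field 0); counts page reads."""
    page_reads = 0
    output = []
    for s_page in s_pages:
        page_reads += 1              # read one page of S1
        for s in s_page:
            for r_page in r_pages:
                page_reads += 1      # re-read every page of R1 for each s
                for r in r_page:
                    if s[0] == r[0]:
                        output.append((s, r))
    return output, page_reads


R1 = [(22, 101, "10/10/96"), (58, 103, "11/12/96")]
S1 = [(22, "Dustin", 7, 45.0), (31, "Lubber", 8, 55.5), (58, "Rusty", 10, 35.0)]

s_pages = paginate(S1, 2)  # 2 pages
r_pages = paginate(R1, 2)  # 1 page

joined, page_reads = nested_loop_join(s_pages, r_pages)
print(joined)
print(page_reads, len(s_pages) + len(S1) * len(r_pages))
```

With these inputs the counted reads agree with Npages(S1) + |S1|*Npages(R1) = 2 + 3*1.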
Index Nested Loop Join
For each data page PS1 of S1
    For each tuple s in PS1
        if (s.sid ∈ Index(R1.sid)) then fetch r & output <s,r>
• Worst case number of local disk reads with tree index = Npages(S1) + |S1|*(1 + log_F Npages(R1))
• Worst case number of local disk reads with hash index = Npages(S1) + |S1|*2

R1 (with Index(R1.sid)):
sid  bid  day
22   101  10/10/96
58   103  11/12/96

S1:
sid  sname   rating  age
22   Dustin  7       45.0
31   Lubber  8       55.5
58   Rusty   10      35.0
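A rough Python sketch of the index nested loop join follows, with a dictionary standing in for Index(R1.sid). The two cost functions simply evaluate the worst-case formulas above; the function names and the sample values passed to them (100 pages, fan-out 10) are assumptions for illustration.

```python
import math
from collections import defaultdict


def build_index(r_tuples):
    """Stand-in for Index(R1.sid): maps sid to the matching R1 tuples."""
    index = defaultdict(list)
    for r in r_tuples:
        index[r[0]].append(r)
    return index


def index_nested_loop_join(s_pages, index):
    """Probe the index once per S1 tuple instead of scanning R1."""
    output = []
    for s_page in s_pages:
        for s in s_page:
            for r in index.get(s[0], []):
                output.append((s, r))
    return output


def tree_index_cost(npages_s, ntuples_s, npages_r, fanout):
    """Npages(S1) + |S1| * (1 + log_F Npages(R1))"""
    return npages_s + ntuples_s * (1 + math.log(npages_r, fanout))


def hash_index_cost(npages_s, ntuples_s):
    """Npages(S1) + |S1| * 2"""
    return npages_s + ntuples_s * 2


R1 = [(22, 101, "10/10/96"), (58, 103, "11/12/96")]
S1 = [(22, "Dustin", 7, 45.0), (31, "Lubber", 8, 55.5), (58, "Rusty", 10, 35.0)]
s_pages = [S1[0:2], S1[2:3]]  # assumed page size of 2 tuples

joined = index_nested_loop_join(s_pages, build_index(R1))
print(joined)
print(hash_index_cost(len(s_pages), len(S1)))
print(tree_index_cost(len(s_pages), len(S1), npages_r=100, fanout=10))
```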
Sort Merge Join
1. Sort S1 on SID
2. Sort R1 on SID
3. Compute join on SID using the merging algorithm
• If join attributes are relatively unique, the number of disk page I/Os = Npages(S1) log Npages(S1) + Npages(R1) log Npages(R1) + Npages(S1) + Npages(R1)
• If the number of duplicates in the join attributes is large, the number of disk page I/Os approaches that of nested loop join.

R1:
sid  bid  day
19   100  8/8/99
22   101  10/10/96
22   99   10/12/95
58   103  11/12/96

S1:
sid  sname   rating  age
22   Dustin  7       45.0
31   Lubber  8       55.5
58   Rusty   10      35.0
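The three steps can be sketched in Python on the sample tables. This is an in-memory illustration only: `sorted` stands in for an external sort, and the function name is made up. Groups of equal sid values on both sides are joined pairwise, which is where duplicates make the merge more expensive.

```python
def sort_merge_join(s_tuples, r_tuples):
    """Sort both inputs on sid (field 0), then merge matching groups."""
    s_sorted = sorted(s_tuples, key=lambda t: t[0])
    r_sorted = sorted(r_tuples, key=lambda t: t[0])
    output = []
    i = j = 0
    while i < len(s_sorted) and j < len(r_sorted):
        s_key, r_key = s_sorted[i][0], r_sorted[j][0]
        if s_key < r_key:
            i += 1
        elif s_key > r_key:
            j += 1
        else:
            # Find the end of the group of equal keys on each side.
            i_end = i
            while i_end < len(s_sorted) and s_sorted[i_end][0] == s_key:
                i_end += 1
            j_end = j
            while j_end < len(r_sorted) and r_sorted[j_end][0] == s_key:
                j_end += 1
            # Every tuple in one group joins with every tuple in the other.
            for s in s_sorted[i:i_end]:
                for r in r_sorted[j:j_end]:
                    output.append((s, r))
            i, j = i_end, j_end
    return output


R1 = [(19, 100, "8/8/99"), (22, 101, "10/10/96"),
      (22, 99, "10/12/95"), (58, 103, "11/12/96")]
S1 = [(22, "Dustin", 7, 45.0), (31, "Lubber", 8, 55.5), (58, "Rusty", 10, 35.0)]

joined = sort_merge_join(S1, R1)
for s, r in joined:
    print(s, r)
```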
Distributed Joins
• Consider:
– Reserves join Sailors
• Depends on:
– Which node gets the query
– Whether tables are fragmented/partitioned or not
• Node 1 gets query
– Perform join at Node 3 (or 4), ship results to Node 1?
– Ship tables to Node 1?
• Node 3 gets query
– Fetch sailors in loop?
– Cache sailors locally?

[Figure: four nodes connected by a network. Node 1: Boats1; Node 2: Boats2; Node 3: Reserves; Node 4: Sailors.]
Distributed Joins over Fragments
[Figure: four nodes connected by a network. Node 1: Reserves1; Node 2: Reserves2; Node 3: Sailors1; Node 4: Sailors2.]

R join S = σ_{R.sid=S.sid}(R × S)
= σ_{R.sid=S.sid}((R1 ∪ R2) × (S1 ∪ S2))
= σ_{R.sid=S.sid}((R1 × S1) ∪ (R1 × S2) ∪ (R2 × S1) ∪ (R2 × S2))
= σ_{R.sid=S.sid}(R1 × S1) ∪ σ_{R.sid=S.sid}(R1 × S2) ∪ σ_{R.sid=S.sid}(R2 × S1) ∪ σ_{R.sid=S.sid}(R2 × S2)
= (R1 join S1) ∪ (R1 join S2) ∪ (R2 join S1) ∪ (R2 join S2)

Equivalent to a union of joins over each pair of fragments.
This equivalence applies to splitting a relation into pages in a single-server DBMS too!
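The equivalence can be checked on a small example with Python sets. The split of the sample Reserves and Sailors tuples into fragments R1, R2, S1, S2 is an arbitrary assumption made for this check.

```python
def join(r, s):
    """Equi-join on sid (field 0) of two sets of tuples."""
    return {(rt, st) for rt in r for st in s if rt[0] == st[0]}


# Hypothetical fragments of Reserves (sid, bid) and Sailors (sid, sname).
R1 = {(22, 101), (58, 103)}
R2 = {(22, 99), (19, 100)}
S1 = {(22, "Dustin"), (31, "Lubber")}
S2 = {(58, "Rusty")}

whole = join(R1 | R2, S1 | S2)
union_of_fragment_joins = (
    join(R1, S1) | join(R1, S2) | join(R2, S1) | join(R2, S2)
)

print(whole == union_of_fragment_joins)
print(sorted(whole))
```

Some of the fragment joins are empty here (for example R2 join S2), which is exactly the case a distributed optimizer would like to detect and skip.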
Distributed Nested Loop
• Consider performing R1 join S2 on Node 1
• Page-oriented nested loop join:
For each page r of R1
    Fetch r from local disk
    For each page s of S2
        Fetch s if s ∉ cache
        Output r join s
• Cost = Npages(R1)*td + Npages(R1)*Npages(S2)*(td + ts)
• If cache can hold entire S2, cost is Npages(R1)*td + Npages(S2)*(td + ts)

[Figure: Node 1 (R1) and Node 2 (S2) connected by a network. For each R1 page r, Node 1 fetches each S2 page s from Node 2 and computes r join s.]
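The two cost formulas are easy to compare numerically. The sketch below just evaluates them; the page counts and the values of td and ts are made-up numbers in arbitrary time units, and the function name is hypothetical.

```python
def distributed_nested_loop_cost(npages_r1, npages_s2, td, ts, cache_holds_s2=False):
    """Cost of a page-oriented nested loop join of R1 (local) with S2 (remote)."""
    local_reads = npages_r1 * td
    if cache_holds_s2:
        # Each S2 page is read at Node 2 and shipped only once.
        return local_reads + npages_s2 * (td + ts)
    # Every S2 page is read and shipped again for every page of R1.
    return local_reads + npages_r1 * npages_s2 * (td + ts)


# Assumed values: 100 pages of R1, 50 pages of S2, td = 1, ts = 10.
no_cache = distributed_nested_loop_cost(100, 50, td=1, ts=10)
with_cache = distributed_nested_loop_cost(100, 50, td=1, ts=10, cache_holds_s2=True)
print(no_cache, with_cache)
```

Under these assumed numbers the uncached plan is dominated by repeated shipping of S2, which is the motivation for caching or for reducing what gets shipped.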
Semijoins
• Consider performing R1 join S2 on Node 1
• S2 needs to be shipped to R1
• Does every tuple in S2 join with R1?
• Semijoin:
– Don’t ship all of S2
– Ship only those S2 rows that will join with R1
– Assumes that the join causes a reduction in S2!
• Cost = Npages(R1)*td + Npages(π_{sid}(R1))*ts + Cost() + Npages(σ_{sid∈jsid}(S2))*ts + Cost(R1 join σ_{sid∈jsid}(S2))

[Figure: Node 1 (R1) and Node 2 (S2) connected by a network. Node 1 ships π_{sid}(R1) to Node 2; Node 2 computes jsid from π_{sid}(R1) and π_{sid}(S2) and ships σ_{sid∈jsid}(S2) back; Node 1 computes R1 join σ_{sid∈jsid}(S2).]
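The semijoin steps can be simulated in a single Python process, with comments marking which node would do each step. This is a sketch under assumptions: the sample Sailors rows play the role of S2, jsid is computed as the set of sid values common to both sides, and no network transfer is actually modelled.

```python
def semijoin_then_join(r1, s2):
    """Simulate R1 join S2 at Node 1 using a semijoin to reduce S2 first."""
    # Node 1: project R1 on sid and ship the projection to Node 2.
    sids_r1 = {r[0] for r in r1}
    # Node 2: find the sid values that occur on both sides.
    jsid = sids_r1 & {s[0] for s in s2}
    # Node 2: keep only the S2 rows that will join, ship them to Node 1.
    reduced_s2 = [s for s in s2 if s[0] in jsid]
    # Node 1: join R1 with the reduced S2.
    result = [(r, s) for r in r1 for s in reduced_s2 if r[0] == s[0]]
    return jsid, reduced_s2, result


R1 = [(22, 101, "10/10/96"), (58, 103, "11/12/96")]
S2 = [(22, "Dustin", 7, 45.0), (31, "Lubber", 8, 55.5), (58, "Rusty", 10, 35.0)]

jsid, reduced_s2, result = semijoin_then_join(R1, S2)
print(sorted(jsid))
print(len(S2), "rows in S2,", len(reduced_s2), "rows shipped")
print(result)
```

The benefit depends on how many S2 rows are filtered out; if every S2 row joins, the extra round trip only adds cost.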
Bloomjoins
• Consider performing R1 join S2 on Node 1
• Can we do better than semijoin?
• Bloomjoin:
– Don’t ship all of π_{sid}(R1)
– Node 1: ship a “bloom filter” (like a signature) of π_{sid}(R1)
  • Hash each sid
  • Set the bit for the hash value in a bit vector
  • Send the bit vector v1
– Node 2:
  • Hash each sid in π_{sid}(S2) to bit vector v2
  • Compute (v1 ∧ v2)
  • Send rows of S2 in the intersection
• False positives: hash collisions can cause some S2 rows that do not join to be shipped

[Figure: Node 1 (R1) and Node 2 (S2) connected by a network. Node 1 computes v1 = Bloom(π_{sid}(R1)) and ships it; Node 2 computes v2 = Bloom(π_{sid}(S2)) and jsid = v1 ∧ v2, then ships σ_{sid∈jsid}(S2); Node 1 computes R1 join σ_{sid∈jsid}(S2).]
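A minimal Python sketch of the bloomjoin steps is given below. It makes several simplifying assumptions: a single hash function (`sid % m`), an 8-bit vector stored in a Python integer, and one extra hypothetical sailor (sid 30) added to S2 so that a false positive shows up.

```python
M = 8  # number of bits in the bit vector (assumed, deliberately tiny)


def bloom(sids, m=M):
    """Build a bit vector with one hash function: bit (sid mod m) is set."""
    vector = 0
    for sid in sids:
        vector |= 1 << (sid % m)
    return vector


def bloomjoin(r1, s2, m=M):
    # Node 1: hash every sid of R1 and ship the bit vector v1.
    v1 = bloom({r[0] for r in r1}, m)
    # Node 2: hash every sid of S2 into v2 and intersect with v1.
    v2 = bloom({s[0] for s in s2}, m)
    jsid = v1 & v2
    # Node 2: ship the S2 rows whose hash bit is set in the intersection.
    shipped = [s for s in s2 if (jsid >> (s[0] % m)) & 1]
    # Node 1: the real join removes any false positives.
    result = [(r, s) for r in r1 for s in shipped if r[0] == s[0]]
    return shipped, result


R1 = [(22, 101, "10/10/96"), (58, 103, "11/12/96")]
S2 = [(22, "Dustin", 7, 45.0), (31, "Lubber", 8, 55.5),
      (58, "Rusty", 10, 35.0),
      (30, "Extra", 5, 20.0)]  # hypothetical row; 30 % 8 == 22 % 8

shipped, result = bloomjoin(R1, S2)
print([s[0] for s in shipped])
print(result)
```

Sailor 30 is shipped even though it does not join, because it hashes to the same bit as sid 22. A larger bit vector or more hash functions would lower the false-positive rate at the price of a bigger filter.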