CS 4432 query processing 1
CS4432: Database Systems II
Query Processing
Query in SQL → Query Plan in Algebra
Example
Data: relation R (A, B, C), relation S (C, D, E)
Query: SELECT B, D
       FROM R, S
       WHERE R.A = “c” and S.E = 2 and R.C = S.C
R   A  B  C        S   C   D  E
    a  1  10           10  x  2
    b  1  20           20  y  2
    c  2  10           30  z  2
    d  2  35           40  x  1
    e  3  45           50  y  3

Answer   B  D
         2  x

SELECT B, D FROM R, S WHERE R.A = “c” and S.E = 2 and R.C = S.C
• How do we execute the query?
One idea:
- Form the Cartesian product of all tables in the FROM-clause
- Select tuples that match the WHERE-clause
- Project columns that occur in the SELECT-clause
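The three steps of this first idea can be sketched directly in Python. This is only an illustration under simple assumptions: relations are lists of tuples holding the sample data for R and S, and the function name naive_plan is made up for this sketch.

```python
from itertools import product

# Sample relations from the example: R(A, B, C) and S(C, D, E).
R = [("a", 1, 10), ("b", 1, 20), ("c", 2, 10), ("d", 2, 35), ("e", 3, 45)]
S = [(10, "x", 2), (20, "y", 2), (30, "z", 2), (40, "x", 1), (50, "y", 3)]

def naive_plan(r, s):
    """Cross product, then WHERE filter, then SELECT-list projection."""
    # Step 1: Cartesian product; each row is (R.A, R.B, R.C, S.C, S.D, S.E).
    crossed = [rt + st for rt, st in product(r, s)]
    # Step 2: keep rows with R.A = 'c' AND S.E = 2 AND R.C = S.C.
    selected = [t for t in crossed
                if t[0] == "c" and t[5] == 2 and t[2] == t[3]]
    # Step 3: project onto (B, D).
    answer = [(t[1], t[4]) for t in selected]
    return answer, len(crossed)

answer, intermediate = naive_plan(R, S)
print(answer)        # the (B, D) pairs in the result
print(intermediate)  # number of tuples built by the cross product
```

Even on five-tuple inputs the cross product materializes |R| x |S| tuples to produce a single answer row, which is the performance concern raised next.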
R × S   R.A  R.B  R.C  S.C  S.D  S.E
        a    1    10   10   x    2
        a    1    10   20   y    2
        .    .
        c    2    10   10   x    2     <- Bingo! Got one...
        .    .
But ?
• Performance would be unacceptable!
• We need a better approach to reasoning about queries, their execution
orders and their respective costs
Formal Relational Query Languages
• Two mathematical Query Languages form basis for “real” languages (e.g. SQL), and for implementation:
– Relational Calculus: Lets users describe what they want, rather than how to compute it. (Non-operational, declarative.)
– Relational Algebra: More operational, very useful for representing execution plans.
Relational Algebra
• Tuple: ordered set of data values
• Relation: a set of tuples
• Algebra: formal mathematical system consisting of a set of objects and operations on those objects
• Relational algebra: algebra whose objects are relations and whose operators transform relations into other relations
Relational Algebra ?
Query: SELECT B, D
       FROM R, S
       WHERE R.A = “c” and S.E = 2 and R.C = S.C
Relational Algebra - can be used to describe plans...
Ex: Plan I
        π B,D
          |
        σ R.A=“c” ∧ S.E=2 ∧ R.C=S.C
          |
          ×
         / \
        R   S
OR: π_{B,D} [ σ_{R.A=“c” ∧ S.E=2 ∧ R.C=S.C} (R × S) ]
Example Instances
“Sailors” and “Reserves” relations

R1 (Reserves):
sid  bid  day
22   101  10/10/96
58   103  11/12/96

S1 (Sailors):
sid  sname   rating  age
22   dustin  7       45.0
31   lubber  8       55.5
58   rusty   10      35.0

S2 (Sailors):
sid  sname   rating  age
28   yuppy   9       35.0
31   lubber  8       55.5
44   guppy   5       35.0
58   rusty   10      35.0
Relational Algebra
• Basic operations:
– Selection (σ): Selects a subset of rows from a relation.
– Projection (π): Deletes unwanted columns from a relation.
– Cross-product (×): Allows us to combine two relations.
– Set-difference (−): Tuples in reln. 1, but not in reln. 2.
– Union (∪): Tuples in reln. 1 or in reln. 2.
• Additional operations:
– Intersection, join, division, renaming: not essential, since they can be expressed with the basic operations, but very useful.
• Algebra is “closed”: since each operation returns a relation, operations can be composed!
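The basic operations and the closure property can be illustrated with a small Python sketch. It assumes a relation is modeled as a pair (schema, set of tuples); the function names and this representation are choices made for the sketch, not part of the lecture.

```python
# A relation is modeled as (schema, rows): column names plus a set of tuples.
def select(rel, pred):
    """Selection: keep rows for which pred(row-as-dict) is true."""
    schema, rows = rel
    return schema, {t for t in rows if pred(dict(zip(schema, t)))}

def project(rel, cols):
    """Projection: keep only cols; the set removes duplicate rows."""
    schema, rows = rel
    idx = [schema.index(c) for c in cols]
    return tuple(cols), {tuple(t[i] for i in idx) for t in rows}

def cross(rel1, rel2):
    """Cross-product: pair every row of rel1 with every row of rel2.
    Column names may clash (e.g. sid); renaming is not handled here."""
    return rel1[0] + rel2[0], {a + b for a in rel1[1] for b in rel2[1]}

def difference(rel1, rel2):
    """Set-difference of two union-compatible relations."""
    return rel1[0], rel1[1] - rel2[1]

def union(rel1, rel2):
    """Union of two union-compatible relations."""
    return rel1[0], rel1[1] | rel2[1]

SAILOR_SCHEMA = ("sid", "sname", "rating", "age")
S1 = (SAILOR_SCHEMA, {(22, "dustin", 7, 45.0), (31, "lubber", 8, 55.5),
                      (58, "rusty", 10, 35.0)})
S2 = (SAILOR_SCHEMA, {(28, "yuppy", 9, 35.0), (31, "lubber", 8, 55.5),
                      (44, "guppy", 5, 35.0), (58, "rusty", 10, 35.0)})

# Closure: the output of select is a relation, so project can consume it.
high_rated = project(select(S2, lambda t: t["rating"] > 8), ["sname", "rating"])
ages = project(S2, ["age"])
# Intersection is not basic: S1 ∩ S2 = S1 − (S1 − S2).
both = difference(S1, difference(S1, S2))
```

Because every function returns the same (schema, rows) shape, any output can be fed to any other operator, which is what "closed" means here.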
Projection

π_{sname,rating}(S2):
sname   rating
yuppy   9
lubber  8
guppy   5
rusty   10

π_{age}(S2):
age
35.0
55.5
Selection

σ_{rating>8}(S2):
sid  sname  rating  age
28   yuppy  9       35.0
58   rusty  10      35.0

π_{sname,rating}(σ_{rating>8}(S2)):
sname  rating
yuppy  9
rusty  10
Union, Intersection, Set Difference
• Operate on two union-compatible relations:
– Same number of fields.
– ‘Corresponding’ fields have the same type.

S1 ∪ S2:
sid  sname   rating  age
22   dustin  7       45.0
31   lubber  8       55.5
58   rusty   10      35.0
44   guppy   5       35.0
28   yuppy   9       35.0

S1 ∩ S2:
sid  sname   rating  age
31   lubber  8       55.5
58   rusty   10      35.0

S1 − S2:
sid  sname   rating  age
22   dustin  7       45.0
Cross-Product
• Each row of S1 is paired with each row of R1.
• Conflict: Both S1 and R1 have a field called sid.
• Renaming operator: ρ(C(1 → sid1, 5 → sid2), S1 × R1)

(sid)  sname   rating  age   (sid)  bid  day
22     dustin  7       45.0  22     101  10/10/96
22     dustin  7       45.0  58     103  11/12/96
31     lubber  8       55.5  22     101  10/10/96
31     lubber  8       55.5  58     103  11/12/96
58     rusty   10      35.0  22     101  10/10/96
58     rusty   10      35.0  58     103  11/12/96
Joins
• Condition Join: R ⋈_c S = σ_c(R × S)
• Result schema same as that of cross-product.

S1 ⋈_{S1.sid < R1.sid} R1:
(sid)  sname   rating  age   (sid)  bid  day
22     dustin  7       45.0  58     103  11/12/96
31     lubber  8       55.5  58     103  11/12/96
Joins
• Equi-Join: condition contains only equalities.
• Result schema has only one copy of the fields for which equality is specified.
• Natural Join: Equijoin on all common fields.

S1 ⋈_{sid} R1:
sid  sname   rating  age   bid  day
22   dustin  7       45.0  101  10/10/96
58   rusty   10      35.0  103  11/12/96
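As a rough illustration of the two join flavors on the S1 and R1 instances, the sketch below treats relations as lists of tuples. The helper names condition_join and equi_join_on_sid are hypothetical, and both use a plain nested loop rather than any particular join algorithm.

```python
# S1(sid, sname, rating, age) and R1(sid, bid, day) from the example instances.
S1 = [(22, "dustin", 7, 45.0), (31, "lubber", 8, 55.5), (58, "rusty", 10, 35.0)]
R1 = [(22, 101, "10/10/96"), (58, 103, "11/12/96")]

def condition_join(left, right, cond):
    """Condition join: the cross product filtered by cond (both sid columns kept)."""
    return [l + r for l in left for r in right if cond(l, r)]

def equi_join_on_sid(sailors, reserves):
    """Equi-join on sid; the result keeps a single copy of sid."""
    return [s + r[1:] for s in sailors for r in reserves if s[0] == r[0]]

# Condition S1.sid < R1.sid
theta = condition_join(S1, R1, lambda s, r: s[0] < r[0])
# sid is the only common field, so this equi-join is also the natural join.
natural = equi_join_on_sid(S1, R1)

for row in theta:
    print(row)
for row in natural:
    print(row)
```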
Division
• Not a primitive operator, but useful: Find sailors who have reserved all boats.
• A has 2 fields x and y; B has only field y:
– A/B = { ⟨x⟩ ∈ π_x(A) | ∀⟨y⟩ ∈ B : ⟨x,y⟩ ∈ A }
– i.e., A/B contains all x tuples (sailors) such that for every y tuple (boat) in B, there is an xy tuple in A.
Examples of Division A/B

A:
sno  pno
s1   p1
s1   p2
s1   p3
s1   p4
s2   p1
s2   p2
s3   p2
s4   p2
s4   p4

B1: pno = p2
B2: pno = p2, p4
B3: pno = p1, p2, p4

A/B1: sno = s1, s2, s3, s4
A/B2: sno = s1, s4
A/B3: sno = s1
Expressing A/B Using Basic Operators
• Division is useful shorthand.
• Idea: For A/B, compute all x values that are not ‘disqualified’ by some y value in B.
Disqualified x values: π_x((π_x(A) × B) − A)
A/B: π_x(A) − all disqualified tuples
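The "disqualified values" construction can be checked against the A, B1, B2, B3 examples with a short Python sketch. It assumes A is a set of (x, y) pairs and B is a set of y values; the function name divide is made up for this illustration.

```python
# A(sno, pno) from the division examples.
A = {("s1", "p1"), ("s1", "p2"), ("s1", "p3"), ("s1", "p4"),
     ("s2", "p1"), ("s2", "p2"), ("s3", "p2"), ("s4", "p2"), ("s4", "p4")}
B1 = {"p2"}
B2 = {"p2", "p4"}
B3 = {"p1", "p2", "p4"}

def divide(a, b):
    """A/B built only from projection, cross product and set difference."""
    xs = {x for x, _ in a}                        # projection of A onto x
    all_pairs = {(x, y) for x in xs for y in b}   # (projection of A onto x) × B
    missing = all_pairs - a                       # pairs that A lacks
    disqualified = {x for x, _ in missing}        # x values with a missing y
    return xs - disqualified

print(sorted(divide(A, B1)))
print(sorted(divide(A, B2)))
print(sorted(divide(A, B3)))
```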
Find names of sailors who’ve reserved boat #103
Solution 1: π_{sname}((σ_{bid=103} Reserves) ⋈ Sailors)
Solution 2: ρ(Temp1, σ_{bid=103} Reserves)
            ρ(Temp2, Temp1 ⋈ Sailors)
            π_{sname}(Temp2)
Solution 3: π_{sname}(σ_{bid=103}(Reserves ⋈ Sailors))
Find names of sailors who’ve reserved a red boat
• Information about boat color is only available in Boats; so we need an extra join:
π_{sname}((σ_{color=‘red’} Boats) ⋈ Reserves ⋈ Sailors)
A more efficient solution:
π_{sname}(π_{sid}((π_{bid} σ_{color=‘red’} Boats) ⋈ Reserves) ⋈ Sailors)
A query optimizer can find this, given the first solution!
Find sailors who’ve reserved a red or a green boat
• Can identify all red or green boats, then find sailors who’ve reserved one of these boats:
ρ(Tempboats, (σ_{color=‘red’ ∨ color=‘green’} Boats))
π_{sname}(Tempboats ⋈ Reserves ⋈ Sailors)
Can also define Tempboats using union! (How?)
What happens if ∨ is replaced by ∧ in this query?
Find sailors who’ve reserved a red and a green boat
• Must identify sailors who’ve reserved red boats, sailors who’ve reserved green boats, then find the intersection (sid is a key for Sailors):
ρ(Tempred, π_{sid}((σ_{color=‘red’} Boats) ⋈ Reserves))
ρ(Tempgreen, π_{sid}((σ_{color=‘green’} Boats) ⋈ Reserves))
π_{sname}((Tempred ∩ Tempgreen) ⋈ Sailors)
Find names of sailors who’ve reserved all boats
• Uses division; schemas of the input relations to / must be carefully chosen:
ρ(Tempsids, (π_{sid,bid} Reserves) / (π_{bid} Boats))
π_{sname}(Tempsids ⋈ Sailors)
To find sailors who’ve reserved all ‘Interlake’ boats:
..... / π_{bid}(σ_{bname=‘Interlake’} Boats)
Relational Algebra
representation used to describe plans...
Example
Data: relation R (A, B, C) relation S (C, D, E)
Query: SELECT B, D
       FROM R, S
       WHERE R.A = “c” and S.E = 2 and R.C = S.C
Relational Algebra - to describe plan
Ex: Plan I
        π B,D
          |
        σ R.A=“c” ∧ S.E=2 ∧ R.C=S.C
          |
          ×
         / \
        R   S
OR: π_{B,D} [ σ_{R.A=“c” ∧ S.E=2 ∧ R.C=S.C} (R × S) ]
Another idea: Plan II
              π B,D
                |
                ⋈   (natural join)
              /   \
   σ R.A=“c”       σ S.E=2
       |              |
       R              S
R              σ_{R.A=“c”}(R)    σ_{S.E=2}(S)    S
A  B  C        A  B  C           C   D  E        C   D  E
a  1  10       c  2  10          10  x  2        10  x  2
b  1  20                         20  y  2        20  y  2
c  2  10                         30  z  2        30  z  2
d  2  35                                         40  x  1
e  3  45                                         50  y  3

SELECT B, D FROM R, S WHERE R.A = “c” and S.E = 2 and R.C = S.C
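Plan II can be sketched in Python to show how much smaller the join inputs become once the selections run first. The sketch assumes list-of-tuple relations and a nested-loop natural join on C; plan_two is a name invented for this illustration.

```python
# Sample relations: R(A, B, C) and S(C, D, E).
R = [("a", 1, 10), ("b", 1, 20), ("c", 2, 10), ("d", 2, 35), ("e", 3, 45)]
S = [(10, "x", 2), (20, "y", 2), (30, "z", 2), (40, "x", 1), (50, "y", 3)]

def plan_two(r, s):
    """Select on each relation first, then natural-join on C, then project."""
    r_sel = [t for t in r if t[0] == "c"]   # selection R.A = 'c'
    s_sel = [t for t in s if t[2] == 2]     # selection S.E = 2
    # Natural join on C; joined rows are (A, B, C, D, E) with one copy of C.
    joined = [rt + st[1:] for rt in r_sel for st in s_sel if rt[2] == st[0]]
    answer = [(t[1], t[3]) for t in joined]  # projection onto (B, D)
    return answer, len(r_sel), len(s_sel)

answer, r_size, s_size = plan_two(R, S)
print(answer)          # the (B, D) pairs in the result
print(r_size, s_size)  # sizes of the two join inputs after selection
```

The join now compares r_size x s_size tuple pairs instead of the |R| x |S| pairs formed by the cross product in Plan I.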
Yet another idea: Plan III
          π B,D
            |
          σ S.E=2
            |
            ⋈   (natural join)
          /   \
 σ R.A=“c”     S
     |
     R
Plan III
Use R.A and S.C Indexes
(1) Use the R.A index to select R tuples with R.A = “c”
(2) For each R.C value found, use the S.C index to find matching tuples
(3) Eliminate S tuples with S.E ≠ 2
(4) Join matching R, S tuples, project B, D attributes and place in result
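Steps (1) to (4) can be mimicked in Python with dictionaries standing in for the two indexes. This is a sketch under that assumption only; build_index and plan_three are hypothetical names, and a real system would use disk-based index structures rather than in-memory dictionaries.

```python
from collections import defaultdict

# Sample relations: R(A, B, C) and S(C, D, E).
R = [("a", 1, 10), ("b", 1, 20), ("c", 2, 10), ("d", 2, 35), ("e", 3, 45)]
S = [(10, "x", 2), (20, "y", 2), (30, "z", 2), (40, "x", 1), (50, "y", 3)]

def build_index(rel, col):
    """Map each value of column col to the list of tuples holding it."""
    index = defaultdict(list)
    for t in rel:
        index[t[col]].append(t)
    return index

I1 = build_index(R, 0)  # stand-in for the index on R.A
I2 = build_index(S, 0)  # stand-in for the index on S.C

def plan_three(i1, i2):
    result = []
    for rt in i1.get("c", []):            # (1) R tuples with R.A = 'c'
        for st in i2.get(rt[2], []):      # (2) S tuples with S.C = R.C
            if st[2] != 2:                # (3) drop S tuples with S.E != 2
                continue
            result.append((rt[1], st[1])) # (4) join and project (B, D)
    return result

print(plan_three(I1, I2))
```

Only the tuples reached through the two lookups are touched, so neither relation is scanned in full.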
R   A  B  C        S   C   D  E
    a  1  10           10  x  2
    b  1  20           20  y  2
    c  2  10           30  z  2
    d  2  35           40  x  1
    e  3  45           50  y  3

Index I1 on R.A, index I2 on S.C
I1 lookup A = “c”   →  <c,2,10>
I2 lookup C = 10    →  <10,x,2>
check E = 2?        →  output: <2,x>
next tuple: <c,7,15>
Overview of Query Optimization
SQL query
  → parse → parse tree
  → convert → logical query plan
  → apply laws → “improved” l.q.p.
  → estimate result sizes (uses statistics) → l.q.p. + sizes
  → consider physical plans → {P1, P2, …}
  → estimate costs → {(P1,C1), (P2,C2), …}
  → pick best → Pi
  → execute → answer
Example: SQL query
Query: Find the movies with stars born in 1960
SELECT title
FROM StarsIn
WHERE starName IN (
    SELECT name
    FROM MovieStar
    WHERE birthdate LIKE ‘%1960’
);
Example: Parse Tree
<Query>
  <SFW>
    SELECT <SelList>
      <Attribute>
        title
    FROM <FromList>
      <RelName>
        StarsIn
    WHERE <Condition>
      <Tuple>
        <Attribute>
          starName
      IN
      <Query>
        ( <Query> )
          <SFW>
            SELECT <SelList>
              <Attribute>
                name
            FROM <FromList>
              <RelName>
                MovieStar
            WHERE <Condition>
              <Attribute>
                birthDate
              LIKE
              <Pattern>
                ‘%1960’
Example: Generating Relational Algebra
            π title
              |
              σ
            /   \
     StarsIn     <condition>
                /     |     \
          <tuple>     IN     π name
             |                 |
        <attribute>     σ birthdate LIKE ‘%1960’
             |                 |
         starName          MovieStar
Fig. 16.14: An expression using a two-argument σ, midway between a parse tree and relational algebra
Example: Logical Query Plan
        π title
          |
        σ starName=name
          |
          ×
        /   \
  StarsIn    π name
               |
             σ birthdate LIKE ‘%1960’
               |
             MovieStar
Fig. 16.16: Applying the rule for IN conditions
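One way to sanity-check the IN rewrite is to run both forms against a small in-memory SQLite database using Python's standard sqlite3 module. The rows below are hypothetical placeholders invented for this sketch, and declaring name as a primary key is an assumption; it guarantees the subquery result has no duplicates, which is what keeps the product-and-select form from repeating titles.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE StarsIn (title TEXT, starName TEXT);
    CREATE TABLE MovieStar (name TEXT PRIMARY KEY, birthdate TEXT);
    -- Hypothetical rows, made up for illustration only.
    INSERT INTO StarsIn VALUES
        ('Movie One', 'Star A'), ('Movie Two', 'Star B'), ('Movie Three', 'Star A');
    INSERT INTO MovieStar VALUES
        ('Star A', '03/14/1960'), ('Star B', '07/01/1975');
""")

# Original form: condition with an IN subquery.
in_form = conn.execute("""
    SELECT title FROM StarsIn
    WHERE starName IN (SELECT name FROM MovieStar WHERE birthdate LIKE '%1960')
    ORDER BY title
""").fetchall()

# Rewritten form: product with the subquery result, then select starName = name.
join_form = conn.execute("""
    SELECT title
    FROM StarsIn, (SELECT name FROM MovieStar WHERE birthdate LIKE '%1960') AS m
    WHERE starName = m.name
    ORDER BY title
""").fetchall()

print(in_form)
print(join_form)
conn.close()
```

Agreement on one data set is not a proof of equivalence; it only illustrates the rule on a concrete instance.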
Example: Improved Logical Query Plan
        π title
          |
          ⋈ starName=name
        /   \
  StarsIn    π name
               |
             σ birthdate LIKE ‘%1960’
               |
             MovieStar
Fig. 16.16: An improvement on the previous figure
Question: Push project to StarsIn?
Example: Estimate Result Sizes
Need the expected size of each intermediate result in the plan over StarsIn and MovieStar.
Example: One Physical Plan
            Hash join           Parameters: join order, memory size, project attributes, ...
           /         \
     SEQ scan      index scan   Parameters: select condition, ...
        |              |
     StarsIn       MovieStar
Example: Estimate costs
L.Q.P
P1 P2 …. Pn
C1 C2 …. Cn Pick best!
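The "estimate costs, pick best" step can be sketched as pairing each candidate plan with a cost and taking the minimum. The cost model below (number of tuple pairs the join step must compare) is a toy assumption chosen for this sketch, applied to the R and S sample data and to Plans I and II from earlier in the lecture; real optimizers estimate costs from statistics rather than by inspecting the data.

```python
# Sample relations: R(A, B, C) and S(C, D, E).
R = [("a", 1, 10), ("b", 1, 20), ("c", 2, 10), ("d", 2, 35), ("e", 3, 45)]
S = [(10, "x", 2), (20, "y", 2), (30, "z", 2), (40, "x", 1), (50, "y", 3)]

def cost_plan_one(r, s):
    """Toy cost for Plan I: every R tuple is paired with every S tuple."""
    return len(r) * len(s)

def cost_plan_two(r, s):
    """Toy cost for Plan II: only tuples surviving the selections are paired."""
    r_sel = sum(1 for t in r if t[0] == "c")
    s_sel = sum(1 for t in s if t[2] == 2)
    return r_sel * s_sel

plans = {"Plan I": cost_plan_one, "Plan II": cost_plan_two}

# Build the set {(P1, C1), (P2, C2), ...} and pick the cheapest pair.
costed = [(name, cost_fn(R, S)) for name, cost_fn in plans.items()]
best = min(costed, key=lambda pc: pc[1])

print(costed)
print(best)
```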
Query Optimization
• Relational algebra level …
• Detailed query plan level …
  – Estimate costs
  – Generate and compare plans