chapter 5: part 2. group-by &...
TRANSCRIPT
1
SQL: Structured Query Language
Chapter 5: Part 2.
Group-By & Joins.
2
Running Example
sid sname rating age
22 dustin 7 45.0
31 lubber 8 55.5
58 rusty 10 35.0
sid sname rating age
28 yuppy 9 35.0
31 lubber 8 55.5
44 guppy 5 35.0
58 rusty 10 35.0
sid bid day
22 101 10/10/96
58 103 11/12/96
R1
S1
S2
Instances of the Sailorsand Reserves relations in our examples.
3
Aggregation andHaving Clauses
4
Aggregate Operators
Significant extension of relational algebra.
COUNT (*)COUNT ( [DISTINCT] A)SUM ( [DISTINCT] A)AVG ( [DISTINCT] A)MAX (A)MIN (A)
Why no Distinct?
5
Aggregate Operators
COUNT (*)COUNT ( [DISTINCT] A)SUM ( [DISTINCT] A)AVG ( [DISTINCT] A)MAX (A)MIN (A)
SELECT AVG (S.age)FROM Sailors SWHERE S.rating=10
SELECT COUNT (*)FROM Sailors S
SELECT COUNT (DISTINCT S.rating)FROM Sailors SWHERE S.sname=‘Bob’
7
Find name and age of the oldest sailor(s)
SELECT S.sname, MAX (S.age)FROM Sailors S
SELECT S.sname, S.ageFROM Sailors SWHERE S.age =
(SELECT MAX (S2.age)FROM Sailors S2)
• What does this query do ? Is this query legal?
What does this query do ? Is this query legal?
•No! Why not ?
8
Find name and age of the oldest sailor(s)
SELECT S.sname, S.ageFROM Sailors SWHERE S.age =
(SELECT MAX (S2.age)FROM Sailors S2)
SELECT S.sname, S.ageFROM Sailors SWHERE (SELECT MAX (S2.age)
FROM Sailors S2)= S.age
Example queries are equivalent in SQL/92
But 2nd one does not always work in some systems
9
Motivation for Grouping
What are the problems with above ?
• We may not know how many rating levels exist.
• Nor what the rating values for these levels are.
• Serious performance overhead due DB to programming connections
SELECT MIN (S.age)FROM Sailors SWHERE S.rating = level
For level = 1, 2, ... , 10:
Find age of youngest sailor for each rating level.
10
Add Group By Clause to SQL
11
Queries With GROUP BY
GROUP BY: A group is a set of tuples that each have the same value for all attributes in grouping-list.
SELECT [DISTINCT] target-listFROM relation-listWHERE qualificationGROUP BY grouping-list
12
Are GROUP BY queries valid or not ?
SELECT S.nameFROM Sailors SGROUP BY S.rating
SELECT avg ( S.salary)FROM Sailors SGROUP BY S.rating
13
Guidelines on Attributes: GROUP BY
target-list contains :(i) attribute names from grouping-list,
or (ii) aggregate-op (column-name)
REQUIREMENT: - Each answer tuple of a group must have single value.
SELECT [DISTINCT] target-listFROM relation-listWHERE qualificationGROUP BY grouping-list
14
Find age of youngest sailor for each rating
SELECT S.rating, MIN (S.age) AS min-ageFROM Sailors SGROUP BY S.rating
Is Target List Valid ?
15
Find age of the youngest sailor with age >= 18, for each rating
SELECT S.rating, MIN (S.age) AS min-ageFROM Sailors SWHERE S.age >= 18GROUP BY S.rating
16
Queries With GROUP BY and HAVING
HAVING: A restriction on each group.
SELECT [DISTINCT] target-listFROM relation-listWHERE qualificationGROUP BY grouping-listHAVING group-qualification
17
Find age of the youngest sailor with age 18, for each rating with at least 2 such sailors in the group
SELECT S.rating, MIN (S.age) AS min-ageFROM Sailors SWHERE S.age >= 18GROUP BY S.ratingHAVING COUNT (*) > 1
Query With Having Clause
What does query below mean ?
18
GroupBy --- Conceptual Evaluation
1. Compute the cross-product of relation-list (From)
2. Discard tuples that fail qualification (Where)
3. Delete `unnecessary’ fields
4. Partition the remaining tuples into groups by the value of attributes in grouping-list. (GroupBy)
5. Eliminate groups using the group-qualification (Having)
6. Apply selection to each group to produce output tuple (Select)
Reminder: one answer tuple is generated per qualifying group.
19
GroupBy --- Conceptual Evaluation
Step by step example.
20
Find age of the youngest sailor with age 18, for each rating with at least 2 such sailors
SELECT S.rating, MIN (S.age) AS min-age
FROM Sailors SWHERE S.age >= 18GROUP BY S.ratingHAVING COUNT (*) > 1
sid sname rating age
22 dustin 7 45.0
29 brutus 1 33.0
31 lubber 8 55.5
32 andy 8 25.5
58 rusty 10 35.0
64 horatio 7 35.0
71 zorba 10 16.0
74 horatio 9 35.0
85 art 3 25.5
95 bob 3 63.5
96 frodo 3 25.5
Sailors instance:
21
Find age of the youngest sailor with age 18, for each rating with at least 2 such sailors.
rating age
7 45.0
1 33.0
8 55.5
8 25.5
10 35.0
7 35.0
10 16.0
9 35.0
3 25.5
3 63.5
3 25.5
rating minage
3 25.5
7 35.0
8 25.5
rating age
1 33.0
3 25.5
3 63.5
3 25.5
7 45.0
7 35.0
8 55.5
8 25.5
9 35.0
10 35.0
22
Again: Find age of youngest sailor with age 18, for each rating with at least 2 such sailors
SELECT S.rating, MIN (S.age) AS min-age
FROM Sailors SWHERE S.age >= 18GROUP BY S.ratingHAVING COUNT (*) > 1
Now: Find age of the youngest sailor with age 18, for each rating with at least 2 such sailors and with every sailor under 60.
Options:
1. Put 60 age condition into WHERE clause ?
2. Put 60 age condition into HAVING clause ?
23
Find age of youngest sailor with age >= 18, for each rating with at least 2 such sailors and with every sailor under 60.
rating age
7 45.0
1 33.0
8 55.5
8 25.5
10 35.0
7 35.0
10 16.0
9 35.0
3 25.5
3 63.5
3 25.5
rating age
1 33.0
3 25.5
3 63.5
3 25.5
7 45.0
7 35.0
8 55.5
8 25.5
9 35.0
10 35.0
rating minage
7 35.0
8 25.5
HAVING COUNT (*) > 1 AND EVERY (S.age <=60)
EVERY :
Must hold
for all tuples
in the group.
24
Find age of the youngest sailor with age 18, for each rating with at least 2 sailors between 18 and 60.
SELECT S.rating, MIN (S.age) AS min-age
FROM Sailors SWHERE S.age >= 18 ???GROUP BY S.ratingHAVING COUNT (*) > 1 ???
sid sname rating age
22 dustin 7 45.0
29 brutus 1 33.0
31 lubber 8 55.5
32 andy 8 25.5
58 rusty 10 35.0
64 horatio 7 35.0
71 zorba 10 16.0
74 horatio 9 35.0
85 art 3 25.5
95 bob 3 63.5
96 frodo 3 25.5
Sailors instance:
Now can check
age<=60 before
making groups !
25
Find age of the youngest sailor with age 18, for each rating with at least 2 sailors between 18 and 60.
SELECT S.rating, MIN (S.age) AS min-age
FROM Sailors SWHERE S.age >= 18 AND S.age <= 60GROUP BY S.ratingHAVING COUNT (*) > 1
sid sname rating age
22 dustin 7 45.0
29 brutus 1 33.0
31 lubber 8 55.5
32 andy 8 25.5
58 rusty 10 35.0
64 horatio 7 35.0
71 zorba 10 16.0
74 horatio 9 35.0
85 art 3 25.5
95 bob 3 63.5
96 frodo 3 25.5
Sailors instance:
Answer relation:
rating minage
3 25.5
7 35.0
8 25.5
Check age<=60 before making groups.
26
Join, GroupBy and Nesting.
27
For each red boat, find the number of reservations for this boat
Sailors : sid, name, …
Boats : bid, color, …
Reserves: sid, bid, day
28
For each red boat, find the number of reservations for this boat
Grouping over Join of three relations.
SELECT B.bid, COUNT (*) AS s-countFROM Sailors S, Boats B, Reserves RWHERE S.sid=R.sid AND R.bid=B.bid AND B.color=‘red’GROUP BY B.bid
29
For each red boat, find the number of reservations for this boat
• Illegal !!!
• Only column in GroupBy can appear in Having clause, unless in aggregate operator of Having clause;
• E.g., HAVING count (B.color = ‘red’ ) > 1;
• E.g., HAVING EVERY (B.color = ‘red’ );
SELECT B.bid, COUNT (*) AS s-countFROM Sailors S, Boats B, Reserves RWHERE S.sid=R.sid AND R.bid=B.bid GROUP BY B.bidHAVING (B.color=‘red’)
Q: What if we move B.color=‘red’ from WHERE to HAVING?
30
Find age of the youngest sailor with age > 18, for each rating with at least 2 sailors (of any age)
SELECT S.rating, MIN (S.age)FROM Sailors SWHERE S.age > 18GROUP BY S.ratingHAVING 1 < (SELECT COUNT (*)
FROM Sailors S2WHERE S2.rating=S.rating)
Hint : HAVING clause can also contain a subquery.
31
Find age of the youngest sailor, for each rating with at least 2 sailors with age over 18;
SELECT S.rating, MIN (S.age)FROM Sailors SGROUP BY S.ratingHAVING 1 < (SELECT COUNT (*)
FROM Sailors S2WHERE S2.age > 18 and S.rating=S2.rating)
32
Aggregation with nesting
33
Find those ratings for which the average age is the minimum over all ratings
Above query has a problem ! What ?
Aggregate operations cannot be nested!
SELECT S.ratingFROM Sailors SWHERE S.age = (SELECT MIN (AVG (S2.age))
FROM Sailors S2)
34
Find those ratings for which the average age is the minimum over all ratings
SELECT Temp.rating, Temp.avg-age
FROM (SELECT S.rating, AVG (S.age) AS avg-ageFROM Sailors SGROUP BY S.rating) AS Temp
WHERE Temp.avg-age = (SELECT MIN (Temp.avg-age)FROM Temp)
Correct solution (in SQL/92):
Note: Not all SQL engines use the “AS” syntax for
naming a temporary relation. Then just drop “AS”.
35
Special Joins
36
Outer Joins : Special Operators
Left Outer Join;
SELECT S.sid, R.bid
FROM Sailors S LEFT OUTER JOIN Reserves R
WHERE S.sid = R.sid
SELECT S.sid, R.bid
FROM Sailors S NATURAL LEFT OUTER JOIN Reserves R
Sailors rows (left) without a matching Reserves row (right) appear in result, but not vice versa.
Right Outer Join; and Full Outer Join
37
Null Values
38
Null Values
Field values in a tuple are sometimes :
Unknown (e.g., a rating has not been assigned) or
Inapplicable (e.g., no maiden-name when male)
SQL provides special value null for such situations.
The presence of null complicates many issues.
39
Allowing Null Values
SQL special operators IS NULL or IS NOT NULL to check if value is/is not null.
Disallow NULL value:
rating INTEGER NOT NULL
We need a 3-valued logic :
• condition can be true, false or unknown.
40
Working with NULL values Question :
Predicate (S.rating = 8) can be TRUE or FALSE.
What if S.rating value is a null value?
Comparison operators on NULL return UNKNOWN
Recall: A result is only returned if WHERE clause is TRUE
(not FALE nor not UNKNOWN)
41
Working with NULL values
Question :
Arithmetic expression (S.rating + 8) usually is an INT.
What if S.rating value is a null value?
Arithmetic operations on NULL return NULL.
42
Truth table with UNKNOWN
WHERE clause is satisfied only when it evaluates to TRUE.
UNKNOWN AND TRUE = UNKNOWN
UNKNOWN OR TRUE = TRUE
UNKNOWN AND FALSE = FALSE
UNKNOWN OR FALSE = UNKNOWN
UNKNOWN AND UNKNOWN = UNKNOWN
UNKNOWN OR UNKNOWN = UNKNOWN
NOT UNKNOWN = UNKNOWN
44
Summary: To Recap
SQL easy to understand language, yet very powerful for expressing complex requests
SQL clauses : Nested subqueries
AGGREGATION, GROUPBY and HAVING
Special joins
Handling NULLS
Many alternative ways to write same query:
optimizer required to find efficient evaluation plan.
In practice, users should be aware of how queries are optimized and evaluated for best results.