Post on 06-Aug-2020


Source slides: web.cs.wpi.edu/~cs542/s17/material/Ch5_SQL_QUERY_part2.pdf

SQL: Structured Query Language

Chapter 5: Part 2.

Group-By & Joins.

Running Example

Instances of the Sailors and Reserves relations in our examples.

S1
sid  sname   rating  age
22   dustin  7       45.0
31   lubber  8       55.5
58   rusty   10      35.0

S2
sid  sname   rating  age
28   yuppy   9       35.0
31   lubber  8       55.5
44   guppy   5       35.0
58   rusty   10      35.0

R1
sid  bid  day
22   101  10/10/96
58   103  11/12/96

Aggregation and Having Clauses

Aggregate Operators

Significant extension of relational algebra.

COUNT (*)
COUNT ( [DISTINCT] A)
SUM ( [DISTINCT] A)
AVG ( [DISTINCT] A)
MAX (A)
MIN (A)

Why no DISTINCT for MAX and MIN?

SELECT AVG (S.age)
FROM Sailors S
WHERE S.rating = 10

SELECT COUNT (*)
FROM Sailors S

SELECT COUNT (DISTINCT S.rating)
FROM Sailors S
WHERE S.sname = 'Bob'
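As a rough illustration, the three example queries can be run against the S2 instance from the running example. The sketch assumes Python's built-in sqlite3 module as the engine, which the slides do not mention; the helper name scalar and the two extra COUNT queries on age are made up for this example.

```python
import sqlite3

# In-memory database holding the S2 instance of Sailors (assumed engine: SQLite).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sailors (sid INTEGER, sname TEXT, rating INTEGER, age REAL)")
conn.executemany(
    "INSERT INTO Sailors VALUES (?, ?, ?, ?)",
    [(28, "yuppy", 9, 35.0), (31, "lubber", 8, 55.5),
     (44, "guppy", 5, 35.0), (58, "rusty", 10, 35.0)],
)

def scalar(sql):
    """Run a query that yields one row with one column and return that value."""
    return conn.execute(sql).fetchone()[0]

# The three queries from the slide.
avg_age_rating_10 = scalar("SELECT AVG(S.age) FROM Sailors S WHERE S.rating = 10")
sailor_count = scalar("SELECT COUNT(*) FROM Sailors S")
bob_ratings = scalar(
    "SELECT COUNT(DISTINCT S.rating) FROM Sailors S WHERE S.sname = 'Bob'"
)

# DISTINCT changes what COUNT returns: three sailors share the age 35.0.
age_count = scalar("SELECT COUNT(S.age) FROM Sailors S")
distinct_age_count = scalar("SELECT COUNT(DISTINCT S.age) FROM Sailors S")

print(avg_age_rating_10, sailor_count, bob_ratings, age_count, distinct_age_count)
```

No sailor in S2 is named Bob, so the third query counts zero ratings. Removing duplicates can change a count, a sum, or an average, but never a maximum or minimum, which is one way to read the missing DISTINCT on MAX and MIN.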

Find name and age of the oldest sailor(s)

SELECT S.sname, MAX (S.age)
FROM Sailors S

• What does this query do? Is this query legal?

• No! Why not? MAX (S.age) yields a single value for the whole relation, while S.sname has one value per row, so the two cannot appear together in the target-list without grouping.

SELECT S.sname, S.age
FROM Sailors S
WHERE S.age = (SELECT MAX (S2.age)
               FROM Sailors S2)

• What does this query do? Is this query legal?

Find name and age of the oldest sailor(s)

SELECT S.sname, S.age
FROM Sailors S
WHERE S.age = (SELECT MAX (S2.age)
               FROM Sailors S2)

SELECT S.sname, S.age
FROM Sailors S
WHERE (SELECT MAX (S2.age)
       FROM Sailors S2) = S.age

The two example queries are equivalent in SQL/92.

However, some systems do not accept the second form, which puts the subquery on the left of the comparison.
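Both forms can be tried on the S2 instance from the running example. This is only a sketch under the assumption that SQLite (through Python's sqlite3 module) is the engine; SQLite happens to accept the subquery on either side, which says nothing about other systems.

```python
import sqlite3

# S2 instance of Sailors in an in-memory SQLite database (assumed engine).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sailors (sid INTEGER, sname TEXT, rating INTEGER, age REAL)")
conn.executemany(
    "INSERT INTO Sailors VALUES (?, ?, ?, ?)",
    [(28, "yuppy", 9, 35.0), (31, "lubber", 8, 55.5),
     (44, "guppy", 5, 35.0), (58, "rusty", 10, 35.0)],
)

# Subquery on the right-hand side of the comparison.
oldest = conn.execute("""
    SELECT S.sname, S.age
    FROM Sailors S
    WHERE S.age = (SELECT MAX(S2.age) FROM Sailors S2)
""").fetchall()

# Same query with the subquery on the left-hand side.
oldest_reversed = conn.execute("""
    SELECT S.sname, S.age
    FROM Sailors S
    WHERE (SELECT MAX(S2.age) FROM Sailors S2) = S.age
""").fetchall()

print(oldest, oldest_reversed)
```

If several sailors shared the maximum age, both queries would return all of them, which is why the slide title says "sailor(s)".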

Motivation for Grouping

Find age of youngest sailor for each rating level.

For level = 1, 2, ... , 10:

SELECT MIN (S.age)
FROM Sailors S
WHERE S.rating = level

What are the problems with the above?

• We may not know how many rating levels exist.

• Nor what the rating values for these levels are.

• Serious performance overhead: each level needs its own query sent over the connection between the program and the database.


Add Group By Clause to SQL

Queries With GROUP BY

SELECT [DISTINCT] target-list
FROM relation-list
WHERE qualification
GROUP BY grouping-list

GROUP BY: A group is a set of tuples that each have the same value for all attributes in grouping-list.

Are these GROUP BY queries valid or not?

SELECT S.name
FROM Sailors S
GROUP BY S.rating

SELECT AVG (S.salary)
FROM Sailors S
GROUP BY S.rating

Guidelines on Attributes: GROUP BY

SELECT [DISTINCT] target-list
FROM relation-list
WHERE qualification
GROUP BY grouping-list

target-list contains:
(i) attribute names from grouping-list, or
(ii) aggregate-op (column-name)

REQUIREMENT: Each group produces one answer tuple, so every entry in target-list must have a single value per group.

Find age of youngest sailor for each rating

SELECT S.rating, MIN (S.age) AS minage
FROM Sailors S
GROUP BY S.rating

Is the target-list valid?

Find age of the youngest sailor with age >= 18, for each rating

SELECT S.rating, MIN (S.age) AS minage
FROM Sailors S
WHERE S.age >= 18
GROUP BY S.rating
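To make the contrast with the one-query-per-level loop concrete, here is a minimal sketch that runs both approaches and compares them. It assumes SQLite through Python's sqlite3 module and borrows the eleven-row Sailors instance from the step-by-step example later in the deck; the variable names are invented for the illustration.

```python
import sqlite3

# Sailors instance from the step-by-step example: (sid, sname, rating, age).
SAILORS = [
    (22, "dustin", 7, 45.0), (29, "brutus", 1, 33.0), (31, "lubber", 8, 55.5),
    (32, "andy", 8, 25.5), (58, "rusty", 10, 35.0), (64, "horatio", 7, 35.0),
    (71, "zorba", 10, 16.0), (74, "horatio", 9, 35.0), (85, "art", 3, 25.5),
    (95, "bob", 3, 63.5), (96, "frodo", 3, 25.5),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sailors (sid INTEGER, sname TEXT, rating INTEGER, age REAL)")
conn.executemany("INSERT INTO Sailors VALUES (?, ?, ?, ?)", SAILORS)

# One GROUP BY query: the engine discovers the rating levels itself.
grouped = conn.execute("""
    SELECT S.rating, MIN(S.age) AS minage
    FROM Sailors S
    WHERE S.age >= 18
    GROUP BY S.rating
    ORDER BY S.rating
""").fetchall()

# The loop from the motivation slide: one round trip per guessed level.
per_level = {}
for level in range(1, 11):
    youngest = conn.execute(
        "SELECT MIN(S.age) FROM Sailors S WHERE S.age >= 18 AND S.rating = ?",
        (level,),
    ).fetchone()[0]
    if youngest is not None:  # MIN over no rows is NULL: no such level
        per_level[level] = youngest

print(grouped)
```

The loop needs ten queries and a guess at the range of levels; the grouped query needs neither.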

Queries With GROUP BY and HAVING

SELECT [DISTINCT] target-list
FROM relation-list
WHERE qualification
GROUP BY grouping-list
HAVING group-qualification

HAVING: A restriction on each group.

Query With Having Clause

What does the query below mean?

SELECT S.rating, MIN (S.age) AS minage
FROM Sailors S
WHERE S.age >= 18
GROUP BY S.rating
HAVING COUNT (*) > 1

Find age of the youngest sailor with age >= 18, for each rating with at least 2 such sailors in the group

GroupBy --- Conceptual Evaluation

1. Compute the cross-product of relation-list (From)

2. Discard tuples that fail qualification (Where)

3. Delete 'unnecessary' fields

4. Partition the remaining tuples into groups by the value of attributes in grouping-list. (GroupBy)

5. Eliminate groups using the group-qualification (Having)

6. Evaluate target-list on each remaining group to produce one output tuple (Select)

Reminder: one answer tuple is generated per qualifying group.
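The six steps can be traced by hand in plain Python. This is a sketch of the conceptual strategy only, not of how a real engine evaluates the query; it hard-wires the query "youngest sailor with age >= 18, for each rating with at least 2 such sailors", and the function and parameter names are invented.

```python
from collections import defaultdict

# Sailors instance from the step-by-step example: (sid, sname, rating, age).
SAILORS = [
    (22, "dustin", 7, 45.0), (29, "brutus", 1, 33.0), (31, "lubber", 8, 55.5),
    (32, "andy", 8, 25.5), (58, "rusty", 10, 35.0), (64, "horatio", 7, 35.0),
    (71, "zorba", 10, 16.0), (74, "horatio", 9, 35.0), (85, "art", 3, 25.5),
    (95, "bob", 3, 63.5), (96, "frodo", 3, 25.5),
]

def youngest_per_rating(sailors, min_age=18, min_group_size=2):
    # Step 1 (From): cross-product; with a single relation it is just its rows.
    rows = list(sailors)
    # Step 2 (Where): discard tuples that fail the qualification.
    rows = [row for row in rows if row[3] >= min_age]
    # Step 3: delete fields the rest of the query does not need.
    pairs = [(rating, age) for (_sid, _sname, rating, age) in rows]
    # Step 4 (GroupBy): partition by the value of the grouping attribute.
    groups = defaultdict(list)
    for rating, age in pairs:
        groups[rating].append(age)
    # Step 5 (Having): eliminate groups that fail COUNT(*) > 1.
    kept = {r: ages for r, ages in groups.items() if len(ages) >= min_group_size}
    # Step 6 (Select): one output tuple per remaining group.
    return sorted((rating, min(ages)) for rating, ages in kept.items())

answer = youngest_per_rating(SAILORS)
print(answer)
```

Rating 10 drops out in step 5: zorba (age 16.0) was already removed in step 2, which leaves only one sailor in that group.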

GroupBy --- Conceptual Evaluation: step by step example.

Find age of the youngest sailor with age >= 18, for each rating with at least 2 such sailors

SELECT S.rating, MIN (S.age) AS minage
FROM Sailors S
WHERE S.age >= 18
GROUP BY S.rating
HAVING COUNT (*) > 1

Sailors instance:

sid  sname    rating  age
22   dustin   7       45.0
29   brutus   1       33.0
31   lubber   8       55.5
32   andy     8       25.5
58   rusty    10      35.0
64   horatio  7       35.0
71   zorba    10      16.0
74   horatio  9       35.0
85   art      3       25.5
95   bob      3       63.5
96   frodo    3       25.5

Find age of the youngest sailor with age >= 18, for each rating with at least 2 such sailors.

Sailors projected onto (rating, age):

rating  age
7       45.0
1       33.0
8       55.5
8       25.5
10      35.0
7       35.0
10      16.0
9       35.0
3       25.5
3       63.5
3       25.5

After WHERE S.age >= 18, partitioned into groups by rating:

rating  age
1       33.0
3       25.5
3       63.5
3       25.5
7       45.0
7       35.0
8       55.5
8       25.5
9       35.0
10      35.0

Answer, keeping only groups with COUNT (*) > 1:

rating  minage
3       25.5
7       35.0
8       25.5

Again: Find age of the youngest sailor with age >= 18, for each rating with at least 2 such sailors

SELECT S.rating, MIN (S.age) AS minage
FROM Sailors S
WHERE S.age >= 18
GROUP BY S.rating
HAVING COUNT (*) > 1

Now: Find age of the youngest sailor with age >= 18, for each rating with at least 2 such sailors and with every sailor under 60.

Options:

1. Put the age-60 condition into the WHERE clause?

2. Put the age-60 condition into the HAVING clause?

Find age of youngest sailor with age >= 18, for each rating with at least 2 such sailors and with every sailor under 60.

HAVING COUNT (*) > 1 AND EVERY (S.age <= 60)

EVERY: Must hold for all tuples in the group.

Sailors projected onto (rating, age):

rating  age
7       45.0
1       33.0
8       55.5
8       25.5
10      35.0
7       35.0
10      16.0
9       35.0
3       25.5
3       63.5
3       25.5

After WHERE S.age >= 18, partitioned into groups by rating:

rating  age
1       33.0
3       25.5
3       63.5
3       25.5
7       45.0
7       35.0
8       55.5
8       25.5
9       35.0
10      35.0

Answer (the rating 3 group is eliminated because it contains age 63.5):

rating  minage
7       35.0
8       25.5
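Not every engine provides EVERY. The sketch below assumes SQLite (via Python's sqlite3 module), which has no EVERY aggregate, and swaps in an equivalent test: the minimum of a 1/0 flag is 1 only when the condition holds for all tuples in the group. This substitution is an illustration, not the slide's own syntax.

```python
import sqlite3

# Sailors instance from the step-by-step example: (sid, sname, rating, age).
SAILORS = [
    (22, "dustin", 7, 45.0), (29, "brutus", 1, 33.0), (31, "lubber", 8, 55.5),
    (32, "andy", 8, 25.5), (58, "rusty", 10, 35.0), (64, "horatio", 7, 35.0),
    (71, "zorba", 10, 16.0), (74, "horatio", 9, 35.0), (85, "art", 3, 25.5),
    (95, "bob", 3, 63.5), (96, "frodo", 3, 25.5),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sailors (sid INTEGER, sname TEXT, rating INTEGER, age REAL)")
conn.executemany("INSERT INTO Sailors VALUES (?, ?, ?, ?)", SAILORS)

# EVERY (S.age <= 60) emulated as MIN(flag) = 1, where flag is 1 when the
# condition holds for a tuple and 0 otherwise.
every_under_60 = conn.execute("""
    SELECT S.rating, MIN(S.age) AS minage
    FROM Sailors S
    WHERE S.age >= 18
    GROUP BY S.rating
    HAVING COUNT(*) > 1
       AND MIN(CASE WHEN S.age <= 60 THEN 1 ELSE 0 END) = 1
    ORDER BY S.rating
""").fetchall()

print(every_under_60)
```

The condition is checked after grouping, so bob (age 63.5) still belongs to the rating 3 group and causes the whole group to be discarded.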

Find age of the youngest sailor with age >= 18, for each rating with at least 2 sailors between 18 and 60.

SELECT S.rating, MIN (S.age) AS minage
FROM Sailors S
WHERE S.age >= 18 ???
GROUP BY S.rating
HAVING COUNT (*) > 1 ???

Sailors instance: same as in the step-by-step example.

Now we can check age <= 60 before making groups!

Find age of the youngest sailor with age >= 18, for each rating with at least 2 sailors between 18 and 60.

SELECT S.rating, MIN (S.age) AS minage
FROM Sailors S
WHERE S.age >= 18 AND S.age <= 60
GROUP BY S.rating
HAVING COUNT (*) > 1

Sailors instance: same as in the step-by-step example.

Answer relation:

rating  minage
3       25.5
7       35.0
8       25.5

Check age <= 60 before making groups.
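Running this version shows why rating 3 reappears in the answer: the age test now removes individual sailors before any group is formed. The sketch assumes SQLite through Python's sqlite3 module; the second query, which lists the sailors filtered out, is an addition for illustration.

```python
import sqlite3

# Sailors instance from the step-by-step example: (sid, sname, rating, age).
SAILORS = [
    (22, "dustin", 7, 45.0), (29, "brutus", 1, 33.0), (31, "lubber", 8, 55.5),
    (32, "andy", 8, 25.5), (58, "rusty", 10, 35.0), (64, "horatio", 7, 35.0),
    (71, "zorba", 10, 16.0), (74, "horatio", 9, 35.0), (85, "art", 3, 25.5),
    (95, "bob", 3, 63.5), (96, "frodo", 3, 25.5),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sailors (sid INTEGER, sname TEXT, rating INTEGER, age REAL)")
conn.executemany("INSERT INTO Sailors VALUES (?, ?, ?, ?)", SAILORS)

# Both age bounds sit in WHERE, so they apply tuple by tuple before grouping.
answer = conn.execute("""
    SELECT S.rating, MIN(S.age) AS minage
    FROM Sailors S
    WHERE S.age >= 18 AND S.age <= 60
    GROUP BY S.rating
    HAVING COUNT(*) > 1
    ORDER BY S.rating
""").fetchall()

# Sailors that never reach the grouping step.
filtered_out = [name for (name,) in conn.execute("""
    SELECT S.sname
    FROM Sailors S
    WHERE NOT (S.age >= 18 AND S.age <= 60)
    ORDER BY S.sid
""")]

print(answer, filtered_out)
```

With bob removed up front, the rating 3 group still holds two sailors (art and frodo) and passes COUNT (*) > 1.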


Join, GroupBy and Nesting.

For each red boat, find the number of reservations for this boat

Sailors: sid, name, …
Boats: bid, color, …
Reserves: sid, bid, day

For each red boat, find the number of reservations for this boat

Grouping over Join of three relations.

SELECT B.bid, COUNT (*) AS scount
FROM Sailors S, Boats B, Reserves R
WHERE S.sid=R.sid AND R.bid=B.bid AND B.color='red'
GROUP BY B.bid
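A small run of this query, assuming SQLite through Python's sqlite3 module. Sailors is the S1 instance and the first two Reserves rows are R1 from the running example; the Boats rows and the remaining Reserves rows are hypothetical data invented so that the red boats have something to count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sailors (sid INTEGER, sname TEXT, rating INTEGER, age REAL)")
conn.execute("CREATE TABLE Boats (bid INTEGER, color TEXT)")
conn.execute("CREATE TABLE Reserves (sid INTEGER, bid INTEGER, day TEXT)")

# S1 from the running example.
conn.executemany("INSERT INTO Sailors VALUES (?, ?, ?, ?)", [
    (22, "dustin", 7, 45.0), (31, "lubber", 8, 55.5), (58, "rusty", 10, 35.0),
])
# Hypothetical boats: one blue, two red.
conn.executemany("INSERT INTO Boats VALUES (?, ?)", [
    (101, "blue"), (102, "red"), (103, "red"),
])
# First two rows are R1; the other four are hypothetical.
conn.executemany("INSERT INTO Reserves VALUES (?, ?, ?)", [
    (22, 101, "10/10/96"), (58, 103, "11/12/96"),
    (22, 102, "10/11/96"), (31, 102, "10/12/96"),
    (58, 102, "10/13/96"), (31, 103, "11/13/96"),
])

# Join the three relations, keep red boats, then count rows per boat.
red_boat_counts = conn.execute("""
    SELECT B.bid, COUNT(*) AS scount
    FROM Sailors S, Boats B, Reserves R
    WHERE S.sid = R.sid AND R.bid = B.bid AND B.color = 'red'
    GROUP BY B.bid
    ORDER BY B.bid
""").fetchall()

print(red_boat_counts)
```

Boat 101 never appears in the result because the color test in WHERE removes its reservation before grouping.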

For each red boat, find the number of reservations for this boat

Q: What if we move B.color='red' from WHERE to HAVING?

SELECT B.bid, COUNT (*) AS scount
FROM Sailors S, Boats B, Reserves R
WHERE S.sid=R.sid AND R.bid=B.bid
GROUP BY B.bid
HAVING (B.color='red')

• Illegal !!!

• Only columns in the GROUP BY list can appear in the HAVING clause, unless they appear inside an aggregate operator;

• E.g., HAVING COUNT (B.color = 'red') > 1;

• E.g., HAVING EVERY (B.color = 'red');

Find age of the youngest sailor with age > 18, for each rating with at least 2 sailors (of any age)

Hint: HAVING clause can also contain a subquery.

SELECT S.rating, MIN (S.age)
FROM Sailors S
WHERE S.age > 18
GROUP BY S.rating
HAVING 1 < (SELECT COUNT (*)
            FROM Sailors S2
            WHERE S2.rating = S.rating)

Find age of the youngest sailor, for each rating with at least 2 sailors with age over 18

SELECT S.rating, MIN (S.age)
FROM Sailors S
GROUP BY S.rating
HAVING 1 < (SELECT COUNT (*)
            FROM Sailors S2
            WHERE S2.age > 18 AND S.rating = S2.rating)
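The last two queries look alike but answer different questions, which shows up on the eleven-row Sailors instance from the step-by-step example. A sketch assuming SQLite through Python's sqlite3 module; an ORDER BY is added only to make the output order predictable.

```python
import sqlite3

# Sailors instance from the step-by-step example: (sid, sname, rating, age).
SAILORS = [
    (22, "dustin", 7, 45.0), (29, "brutus", 1, 33.0), (31, "lubber", 8, 55.5),
    (32, "andy", 8, 25.5), (58, "rusty", 10, 35.0), (64, "horatio", 7, 35.0),
    (71, "zorba", 10, 16.0), (74, "horatio", 9, 35.0), (85, "art", 3, 25.5),
    (95, "bob", 3, 63.5), (96, "frodo", 3, 25.5),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sailors (sid INTEGER, sname TEXT, rating INTEGER, age REAL)")
conn.executemany("INSERT INTO Sailors VALUES (?, ?, ?, ?)", SAILORS)

# Youngest sailor over 18, for ratings with at least 2 sailors of any age.
any_age_group = conn.execute("""
    SELECT S.rating, MIN(S.age)
    FROM Sailors S
    WHERE S.age > 18
    GROUP BY S.rating
    HAVING 1 < (SELECT COUNT(*) FROM Sailors S2 WHERE S2.rating = S.rating)
    ORDER BY S.rating
""").fetchall()

# Youngest sailor of any age, for ratings with at least 2 sailors over 18.
over_18_group = conn.execute("""
    SELECT S.rating, MIN(S.age)
    FROM Sailors S
    GROUP BY S.rating
    HAVING 1 < (SELECT COUNT(*) FROM Sailors S2
                WHERE S2.age > 18 AND S.rating = S2.rating)
    ORDER BY S.rating
""").fetchall()

print(any_age_group)
print(over_18_group)
```

Rating 10 separates the two: it has two sailors in total (rusty and zorba), so it qualifies for the first query, but only one of them is over 18, so it fails the second.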


Aggregation with nesting

Find those ratings for which the average age is the minimum over all ratings

SELECT S.rating
FROM Sailors S
WHERE S.age = (SELECT MIN (AVG (S2.age))
               FROM Sailors S2)

Above query has a problem! What?

Aggregate operations cannot be nested!

Find those ratings for which the average age is the minimum over all ratings

Correct solution (in SQL/92):

SELECT Temp.rating, Temp.avgage
FROM (SELECT S.rating, AVG (S.age) AS avgage
      FROM Sailors S
      GROUP BY S.rating) AS Temp
WHERE Temp.avgage = (SELECT MIN (Temp.avgage)
                     FROM Temp)

Note: Not all SQL engines use the "AS" syntax for naming a temporary relation. Then just drop "AS".
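Engines differ on whether a temporary relation named in FROM can be referenced again inside a subquery. The sketch below assumes SQLite through Python's sqlite3 module and swaps in a WITH clause (a common table expression) to name Temp, which is a different technique from the slide's derived table. It also tries the nested-aggregate query to confirm that SQLite refuses it.

```python
import sqlite3

# Sailors instance from the step-by-step example: (sid, sname, rating, age).
SAILORS = [
    (22, "dustin", 7, 45.0), (29, "brutus", 1, 33.0), (31, "lubber", 8, 55.5),
    (32, "andy", 8, 25.5), (58, "rusty", 10, 35.0), (64, "horatio", 7, 35.0),
    (71, "zorba", 10, 16.0), (74, "horatio", 9, 35.0), (85, "art", 3, 25.5),
    (95, "bob", 3, 63.5), (96, "frodo", 3, 25.5),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sailors (sid INTEGER, sname TEXT, rating INTEGER, age REAL)")
conn.executemany("INSERT INTO Sailors VALUES (?, ?, ?, ?)", SAILORS)

# Nested aggregates: expected to be rejected when the statement is prepared.
nested_rejected = False
try:
    conn.execute("""
        SELECT S.rating FROM Sailors S
        WHERE S.age = (SELECT MIN(AVG(S2.age)) FROM Sailors S2)
    """)
except sqlite3.Error:
    nested_rejected = True

# Two-level aggregation: average per rating first, then the minimum average.
lowest_avg = conn.execute("""
    WITH Temp AS (SELECT S.rating, AVG(S.age) AS avgage
                  FROM Sailors S
                  GROUP BY S.rating)
    SELECT Temp.rating, Temp.avgage
    FROM Temp
    WHERE Temp.avgage = (SELECT MIN(T2.avgage) FROM Temp T2)
""").fetchall()

print(nested_rejected, lowest_avg)
```

Rating 10 wins because its two sailors, aged 35.0 and 16.0, average 25.5, lower than any other rating's average.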


Special Joins

Outer Joins : Special Operators

Left Outer Join:

SELECT S.sid, R.bid
FROM Sailors S LEFT OUTER JOIN Reserves R
ON S.sid = R.sid

SELECT S.sid, R.bid
FROM Sailors S NATURAL LEFT OUTER JOIN Reserves R

Sailors rows (left) without a matching Reserves row (right) appear in the result, but not vice versa.

Also: Right Outer Join and Full Outer Join.
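The S1 and R1 instances from the running example are enough to see a left outer join at work. A minimal sketch, assuming SQLite through Python's sqlite3 module; the join condition is written with ON, and an ORDER BY is added so the row order is predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sailors (sid INTEGER, sname TEXT, rating INTEGER, age REAL)")
conn.execute("CREATE TABLE Reserves (sid INTEGER, bid INTEGER, day TEXT)")

# S1 and R1 from the running example.
conn.executemany("INSERT INTO Sailors VALUES (?, ?, ?, ?)", [
    (22, "dustin", 7, 45.0), (31, "lubber", 8, 55.5), (58, "rusty", 10, 35.0),
])
conn.executemany("INSERT INTO Reserves VALUES (?, ?, ?)", [
    (22, 101, "10/10/96"), (58, 103, "11/12/96"),
])

# Explicit join condition.
left_join = conn.execute("""
    SELECT S.sid, R.bid
    FROM Sailors S LEFT OUTER JOIN Reserves R ON S.sid = R.sid
    ORDER BY S.sid
""").fetchall()

# NATURAL form: joins on the only column name the tables share, sid.
natural_left_join = conn.execute("""
    SELECT S.sid, R.bid
    FROM Sailors S NATURAL LEFT OUTER JOIN Reserves R
    ORDER BY S.sid
""").fetchall()

print(left_join)
```

Sailor 31 has no reservation, so the row is kept and padded with null for R.bid, which sqlite3 returns as None.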


Null Values

Field values in a tuple are sometimes:

Unknown (e.g., a rating has not been assigned) or

Inapplicable (e.g., no maiden-name when male)

SQL provides a special value, null, for such situations.

The presence of null complicates many issues.

Allowing Null Values

SQL provides the special operators IS NULL and IS NOT NULL to check whether a value is or is not null.

Disallow NULL value:

rating INTEGER NOT NULL

We need a 3-valued logic:

• A condition can be true, false or unknown.

Working with NULL values

Question:

Predicate (S.rating = 8) can be TRUE or FALSE.

What if S.rating value is a null value?

Comparison operators on NULL return UNKNOWN.

Recall: A result is only returned if the WHERE clause is TRUE (not FALSE and not UNKNOWN).

Working with NULL values

Question:

Arithmetic expression (S.rating + 8) usually is an INT.

What if S.rating value is a null value?

Arithmetic operations on NULL return NULL.
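Both questions can be checked on a small table. The sketch assumes SQLite through Python's sqlite3 module and adds one hypothetical sailor (sid 99, "nemo") with a null rating to the S1 instance; the helper name sids is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sailors (sid INTEGER, sname TEXT, rating INTEGER, age REAL)")
conn.executemany("INSERT INTO Sailors VALUES (?, ?, ?, ?)", [
    (22, "dustin", 7, 45.0), (31, "lubber", 8, 55.5), (58, "rusty", 10, 35.0),
    (99, "nemo", None, 30.0),  # hypothetical row: rating not yet assigned
])

def sids(where):
    """Return the sids of the sailors whose row makes the WHERE clause TRUE."""
    sql = f"SELECT S.sid FROM Sailors S WHERE {where} ORDER BY S.sid"
    return [sid for (sid,) in conn.execute(sql)]

# Comparisons with NULL are UNKNOWN, so sailor 99 satisfies neither
# the predicate nor its negation.
rating_is_8 = sids("S.rating = 8")
rating_is_not_8 = sids("NOT (S.rating = 8)")

# IS NULL is the operator that actually finds the row.
rating_missing = sids("S.rating IS NULL")

# Arithmetic on NULL yields NULL (None in Python).
rating_plus_8 = conn.execute(
    "SELECT S.rating + 8 FROM Sailors S WHERE S.sid = 99"
).fetchone()[0]

print(rating_is_8, rating_is_not_8, rating_missing, rating_plus_8)
```

The two comparison queries together return only three of the four sailors, a direct consequence of the rule that a row is returned only when the WHERE clause is TRUE.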


Truth table with UNKNOWN

WHERE clause is satisfied only when it evaluates to TRUE.

UNKNOWN AND TRUE = UNKNOWN

UNKNOWN OR TRUE = TRUE

UNKNOWN AND FALSE = FALSE

UNKNOWN OR FALSE = UNKNOWN

UNKNOWN AND UNKNOWN = UNKNOWN

UNKNOWN OR UNKNOWN = UNKNOWN

NOT UNKNOWN = UNKNOWN
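The table can be reproduced by asking an engine to evaluate each expression. This sketch assumes SQLite through Python's sqlite3 module, where UNKNOWN is represented by NULL, TRUE by 1 and FALSE by 0; the helper name truth is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# How SQLite's integer/NULL results map onto the three truth values.
LABELS = {1: "TRUE", 0: "FALSE", None: "UNKNOWN"}

def truth(expression):
    """Evaluate a boolean SQL expression and name the resulting truth value."""
    value = conn.execute(f"SELECT {expression}").fetchone()[0]
    return LABELS[value]

# NULL stands for UNKNOWN, 1 for TRUE, 0 for FALSE.
table = {
    "UNKNOWN AND TRUE": truth("NULL AND 1"),
    "UNKNOWN OR TRUE": truth("NULL OR 1"),
    "UNKNOWN AND FALSE": truth("NULL AND 0"),
    "UNKNOWN OR FALSE": truth("NULL OR 0"),
    "UNKNOWN AND UNKNOWN": truth("NULL AND NULL"),
    "UNKNOWN OR UNKNOWN": truth("NULL OR NULL"),
    "NOT UNKNOWN": truth("NOT NULL"),
}

for expression, result in table.items():
    print(f"{expression} = {result}")
```

The two definite results follow from short-circuit reasoning: anything AND FALSE is FALSE, and anything OR TRUE is TRUE, whatever the unknown operand turns out to be.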

Summary: To Recap

SQL is an easy-to-understand language, yet very powerful for expressing complex requests.

SQL clauses:

• Nested subqueries

• AGGREGATION, GROUP BY and HAVING

• Special joins

• Handling NULLs

Many alternative ways to write the same query: an optimizer is required to find an efficient evaluation plan.

In practice, users should be aware of how queries are optimized and evaluated for best results.