SQL Unit 5: Aggregation, GROUP BY, and HAVING (Kirk Scott)

Post on 28-Dec-2015

TRANSCRIPT

1

SQL Unit 5: Aggregation, GROUP BY, and HAVING

Kirk Scott

2

• 5.1 Grouping By One Field

• 5.2 Grouping By More than One Field

• 5.3 GROUP BY with HAVING

• 5.4 More on Nulls

3

5.1 Grouping By One Field

4

• 1. Recall that the term aggregation referred to built-in functions like these:

• COUNT, SUM, AVG, MAX, MIN, etc.

• The results of such a function are based on the contents of more than one row in a table.

5

• A simple example of the use of such a function would be:

    SELECT SUM(salesprice)
    FROM Carsale

• This would find the sum of the salesprices of all of the cars listed in the Carsale table.

6

• 2. Remember also that the records in the Carsale table include the spno, and it is possible to write a query that orders its results by that field:

    SELECT *
    FROM Carsale
    ORDER BY spno

7

• 3. What if you would like subtotals of the salesprices for the cars sold by each salesperson?

• This would involve finding a SUM, and it would also depend on the spno.

• Both of these fields are in the Carsale table.

• Here is a query that accomplishes this:

    SELECT spno, SUM(salesprice)
    FROM Carsale
    GROUP BY spno

8

• This query will give the subtotal for each spno in the Carsale table.

• There will be only one row for each spno in the results of the query.

• In a sense when you GROUP BY, it is like having the keyword DISTINCT in the query.
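The subtotal query can be tried out with a small sketch. It assumes Python's built-in sqlite3 module as a stand-in for the course database, and the three Carsale rows are invented for illustration; the real table has more columns.

```python
import sqlite3

# Hypothetical sample data, not the actual contents of the Carsale table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Carsale (spno INTEGER, custno INTEGER, salesprice INTEGER)")
conn.executemany(
    "INSERT INTO Carsale VALUES (?, ?, ?)",
    [(111, 1, 20000), (111, 2, 15000), (222, 3, 30000)],
)

# One output row per distinct spno, each carrying that salesperson's subtotal.
# ORDER BY is added so the row order does not depend on the database engine.
subtotals = conn.execute(
    "SELECT spno, SUM(salesprice) FROM Carsale GROUP BY spno ORDER BY spno"
).fetchall()
print(subtotals)  # [(111, 35000), (222, 30000)]
```

Three input rows collapse to two output rows, which is the DISTINCT-like effect described above.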

9

• The aggregate functions ignore nulls, but GROUP BY does not.

• If any sales records had a null spno, the query results would also include a row showing the sum of the salesprices for those records.

• However, in calculating the sums, null values for salesprice would still be ignored.
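A minimal sketch of both behaviors, assuming sqlite3 as a stand-in database and invented rows: two sales have a null spno, and one sale by salesperson 222 has a null salesprice.

```python
import sqlite3

# Hypothetical rows chosen to include nulls in both fields.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Carsale (spno INTEGER, salesprice INTEGER)")
conn.executemany(
    "INSERT INTO Carsale VALUES (?, ?)",
    [(111, 20000), (None, 5000), (None, 7000), (222, None), (222, 30000)],
)

# The null spno values form a group of their own (key None in Python),
# while the null salesprice is simply left out of the sum for 222.
sums_by_spno = dict(
    conn.execute("SELECT spno, SUM(salesprice) FROM Carsale GROUP BY spno").fetchall()
)
print(sums_by_spno[None])  # 12000
print(sums_by_spno[222])   # 30000
```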

10

• The keyword GROUP has this in common with the keyword ORDER:

• The results of this query will typically come back sorted by the spno values, although only an explicit ORDER BY guarantees the order.

11

• 4. Here is another example, using COUNT, where the function is applied to * rather than to a single field in the table.

• The results will be what you would expect, the count of the number of car sales by each salesperson:

    SELECT spno, COUNT(*)
    FROM Carsale
    GROUP BY spno

12

• Recall that COUNT(*) counts records rather than the values of any one field, so a null in an individual field does not cause a record to be skipped.

• None of the records can be all null, so this counts all records.

• GROUP BY will include in the results a group that counts how many records had a null spno, if there were any such records.
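The difference between counting records and counting a field shows up when the two are put side by side. This is a sketch under the same assumptions as before: sqlite3 stands in for the course database and the rows are invented.

```python
import sqlite3

# Hypothetical rows: two sales with a null spno, one sale with a null salesprice.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Carsale (spno INTEGER, salesprice INTEGER)")
conn.executemany(
    "INSERT INTO Carsale VALUES (?, ?)",
    [(111, 20000), (None, 5000), (None, 7000), (222, None), (222, 30000)],
)

# COUNT(*) counts every record in the group; COUNT(salesprice) skips nulls.
counts = {
    spno: (n_records, n_prices)
    for spno, n_records, n_prices in conn.execute(
        "SELECT spno, COUNT(*), COUNT(salesprice) FROM Carsale GROUP BY spno"
    )
}
print(counts[None])  # (2, 2)
print(counts[222])   # (2, 1)
```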

13

• 5. It's not necessary to include the GROUP BY field in the query results.

• These results may not be very useful, but this query is syntactically OK:

    SELECT SUM(salesprice)
    FROM Carsale
    GROUP BY spno

14

• On the other hand, there are limitations on what fields can be included in the results of a GROUP BY query.

• A query like this is wrong:

    SELECT spno, custno, SUM(salesprice)
    FROM Carsale
    GROUP BY spno

15

• The reason is simple.

• By definition, there will only be one row per spno in the results of the query.

• However, it is possible that there would be more than one custno per salesperson.

16

• It would not be possible to show the multiple custno's belonging to a single spno, so this is not allowed.

• It's true that in some cases there may only be one custno for a given spno, but even so, the syntax will not support exceptions like these.

17

• The bottom line is that in a GROUP BY query, the SELECT can include at most the GROUP BY field and the aggregate calculated on another field; that other field cannot appear on its own.

18

• 6. It is possible to use GROUP BY and ORDER BY together in a single query.

• This is a simple, practical example.

• It illustrates the fact that you can order the results by the aggregate if you want to.

• Recall that the default order is by the GROUP BY field.

    SELECT spno, COUNT(*)
    FROM Carsale
    GROUP BY spno
    ORDER BY COUNT(*)
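Ordering by the aggregate can be checked with a short sketch, again assuming sqlite3 and invented rows in which each salesperson has a different number of sales.

```python
import sqlite3

# Hypothetical data: 111 has three sales, 222 has one, 333 has two.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Carsale (spno INTEGER, salesprice INTEGER)")
conn.executemany(
    "INSERT INTO Carsale VALUES (?, ?)",
    [(111, 100), (111, 200), (111, 300), (222, 400), (333, 500), (333, 600)],
)

# The groups are formed on spno, but the rows come back in order of the count.
by_count = conn.execute(
    "SELECT spno, COUNT(*) FROM Carsale GROUP BY spno ORDER BY COUNT(*)"
).fetchall()
print(by_count)  # [(222, 1), (333, 2), (111, 3)]
```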

19

5.2 Grouping By More than One Field

20

• 1. It is also possible to GROUP BY more than one field at a time in a query.

• For example:

    SELECT make, model, SUM(stickerprice)
    FROM Car
    GROUP BY make, model

21

• This query will give the sum of the stickerprices for every combination of make and model that occurs in the Car table.

• Each of these combinations will appear only once in the results.

• Again, the effect is similar to having the keyword DISTINCT in a query.

22

• The results would also include rows for the three cases where the make, the model, or both fields were null in the original records in the Car table, if any such records exist.

• No fields other than make and model (and the aggregate) could be included in the SELECT clause.

• Also, both make and model are optional in the SELECT, although in most cases the query results would probably be more useful if they were included.
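Grouping on two fields can be sketched as follows. The Car rows are invented for illustration, and sqlite3 is assumed as a stand-in for the course database.

```python
import sqlite3

# Hypothetical Car rows; two of them share the same make and model.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Car (make TEXT, model TEXT, year INTEGER, stickerprice INTEGER)")
conn.executemany(
    "INSERT INTO Car VALUES (?, ?, ?, ?)",
    [
        ("Chevrolet", "Malibu", 2006, 20000),
        ("Chevrolet", "Malibu", 2007, 22000),
        ("Chevrolet", "Impala", 2006, 25000),
        ("Ford", "Focus", 2005, 18000),
    ],
)

# Each (make, model) combination that occurs in the table appears exactly once.
sums = conn.execute(
    "SELECT make, model, SUM(stickerprice) FROM Car "
    "GROUP BY make, model ORDER BY make, model"
).fetchall()
for row in sums:
    print(row)
# ('Chevrolet', 'Impala', 25000)
# ('Chevrolet', 'Malibu', 42000)
# ('Ford', 'Focus', 18000)
```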

23

• 2. It is again useful to compare the GROUP BY query with the analogous ORDER BY query:

    SELECT make, model
    FROM Car
    ORDER BY make, model

24

• In this query the primary sort key is make and the secondary sort key is model.

• The results of the query will show every combination of make and model that occurs in the Car table sorted first by make, and within make by model.

• The corresponding GROUP BY query will show the sums of the stickerprices for every combination of make and model in the table and the results will be given in the same order as the ORDER BY query.

25

• 3. Observe that it would also be possible to write queries where the order that the fields are selected is changed.

• The sums for the various combinations of make and model wouldn't change, but the orders of the columns and rows in the results would change.

• The first example would put the model column before the make column, but the sort order of the rows would be the same as in the previous example.

26

• It is conceivable that someone might want to write a query like this:

    SELECT model, make, SUM(stickerprice)
    FROM Car
    GROUP BY make, model

27

• The second example would put the make column first and the model column second, but the sort order has been changed to sort first by model and then by make.

• It seems unlikely that anyone would write the query in this way intentionally, but it is possible that all they're interested in is the sum for each combination of make and model and the sort order doesn't make a difference.

28

• In any case, it's syntactically OK:

    SELECT make, model, SUM(stickerprice)
    FROM Car
    GROUP BY model, make

29

• 4. It bears repeating that including a GROUP BY field in the SELECT is optional.

• For example, the following query would be OK.

• The results will only show the make and sum in each row, but there will be a row for each combination of make and model:

    SELECT make, SUM(stickerprice)
    FROM Car
    GROUP BY make, model
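The effect of leaving model out of the SELECT is easy to misread, so here is a small sketch. It assumes sqlite3 and invented Car rows; the ORDER BY is added only to make the output order predictable.

```python
import sqlite3

# Hypothetical Car rows with two Chevrolet models and one Ford model.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Car (make TEXT, model TEXT, stickerprice INTEGER)")
conn.executemany(
    "INSERT INTO Car VALUES (?, ?, ?)",
    [
        ("Chevrolet", "Malibu", 20000),
        ("Chevrolet", "Malibu", 22000),
        ("Chevrolet", "Impala", 25000),
        ("Ford", "Focus", 18000),
    ],
)

# model is a grouping field but is not selected, so Chevrolet shows up twice:
# once for Impala and once for Malibu.
make_sums = conn.execute(
    "SELECT make, SUM(stickerprice) FROM Car "
    "GROUP BY make, model ORDER BY make, model"
).fetchall()
print(make_sums)  # [('Chevrolet', 25000), ('Chevrolet', 42000), ('Ford', 18000)]
```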

30

• It also bears repeating that it is not possible to include in the SELECT any fields except for the aggregate field and the fields in the GROUP BY.

• This is because there may be multiple values for the additional field for each combination of the GROUP BY fields.

31

• For example, this query is wrong:

    SELECT make, model, year, SUM(stickerprice)
    FROM Car
    GROUP BY make, model

32

• 5. It is always possible to specify an order for the results of a query in addition to doing GROUP BY.

• This example is kind of silly, because it simply accomplishes what could be accomplished by putting the fields in the GROUP BY in the other order.

• But it does illustrate how the syntax for ORDER BY will override the ordering that otherwise would be used by GROUP BY:

    SELECT make, model, SUM(stickerprice)
    FROM Car
    GROUP BY model, make
    ORDER BY make, model

33

• This example illustrates a more practical use of the syntax.

• Notice again that it's possible to use the aggregate function in the ORDER BY:

    SELECT make, model, SUM(stickerprice)
    FROM Car
    GROUP BY make, model
    ORDER BY SUM(stickerprice) DESC
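A sketch of the descending sort on the aggregate, assuming sqlite3 and invented Car rows whose sums are all different:

```python
import sqlite3

# Hypothetical Car rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Car (make TEXT, model TEXT, stickerprice INTEGER)")
conn.executemany(
    "INSERT INTO Car VALUES (?, ?, ?)",
    [
        ("Chevrolet", "Malibu", 20000),
        ("Chevrolet", "Malibu", 22000),
        ("Chevrolet", "Impala", 25000),
        ("Ford", "Focus", 18000),
    ],
)

# The largest sum comes first, regardless of make and model.
ranked = conn.execute(
    "SELECT make, model, SUM(stickerprice) FROM Car "
    "GROUP BY make, model ORDER BY SUM(stickerprice) DESC"
).fetchall()
print(ranked[0])   # ('Chevrolet', 'Malibu', 42000)
print(ranked[-1])  # ('Ford', 'Focus', 18000)
```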

34

5.3 GROUP BY with HAVING

35

• 1. In a simple query, a WHERE clause causes the SELECT to pick out only certain sets of records in a table based on a condition on the value of an individual field.

• This is known as a selection or a restriction.

• It might also be called a refinement of the query's results.

• A query with a WHERE clause will potentially give as its results a subset of the results that would be returned by the same query without the WHERE clause.

36

• In a query with GROUP BY, the HAVING clause can be used to achieve results similar to those of the WHERE clause in a simple query.

• In other words, it can be used to restrict which groups appear in the results, based on the value of the aggregate function for each group.

37

• For example, this query will show the spno's and the sums of the salesprices of cars that they sold, but only for those salespeople who sold a total of at least 50000 dollars worth of cars overall:

    SELECT spno, SUM(salesprice)
    FROM Carsale
    GROUP BY spno
    HAVING SUM(salesprice) >= 50000

38

• Here is another straightforward example which will find the salespeople and the counts of the numbers of cars they sold, if they sold more than 4 cars:

    SELECT spno, COUNT(*)
    FROM Carsale
    GROUP BY spno
    HAVING COUNT(*) > 4
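Both HAVING queries can be run against a small invented data set. The sketch assumes sqlite3 in place of the course database; the rows are chosen so that each condition keeps a different set of salespeople.

```python
import sqlite3

# Hypothetical sales: 111 sold 2 cars for 55000, 222 sold 1 car for 20000,
# and 333 sold 5 cars for 50000 in total.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Carsale (spno INTEGER, salesprice INTEGER)")
conn.executemany(
    "INSERT INTO Carsale VALUES (?, ?)",
    [(111, 30000), (111, 25000), (222, 20000)] + [(333, 10000)] * 5,
)

# Keep only the groups whose total is at least 50000.
big_totals = conn.execute(
    "SELECT spno, SUM(salesprice) FROM Carsale "
    "GROUP BY spno HAVING SUM(salesprice) >= 50000 ORDER BY spno"
).fetchall()
print(big_totals)  # [(111, 55000), (333, 50000)]

# Keep only the groups with more than 4 sales.
many_sales = conn.execute(
    "SELECT spno, COUNT(*) FROM Carsale "
    "GROUP BY spno HAVING COUNT(*) > 4 ORDER BY spno"
).fetchall()
print(many_sales)  # [(333, 5)]
```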

39

• 2. For better or worse, the HAVING clause can also be applied to the GROUP BY field or fields.

• So, for example, this query is possible.

• It will find the sum of the stickerprices for all of the Chevrolets and only the Chevrolets.

40

• There will be only one row in the results:

    SELECT make, SUM(stickerprice)
    FROM Car
    GROUP BY make
    HAVING make = 'Chevrolet'

41

• There is nothing wrong with the previous example, but the following alternative may be preferable.

• It is possible to have both WHERE and GROUP BY in the same query, and it might be helpful to use WHERE instead of HAVING whenever that is possible.

42

• Here is a query that has the same results as the previous one:

    SELECT make, SUM(stickerprice)
    FROM Car
    WHERE make = 'Chevrolet'
    GROUP BY make
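That the two forms give the same results can be confirmed with a sketch, assuming sqlite3 and invented Car rows:

```python
import sqlite3

# Hypothetical Car rows: three Chevrolets and one Ford.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Car (make TEXT, model TEXT, stickerprice INTEGER)")
conn.executemany(
    "INSERT INTO Car VALUES (?, ?, ?)",
    [
        ("Chevrolet", "Malibu", 20000),
        ("Chevrolet", "Malibu", 22000),
        ("Chevrolet", "Impala", 25000),
        ("Ford", "Focus", 18000),
    ],
)

# Condition applied to the groups after they are formed.
with_having = conn.execute(
    "SELECT make, SUM(stickerprice) FROM Car "
    "GROUP BY make HAVING make = 'Chevrolet'"
).fetchall()

# Condition applied to the individual records before grouping.
with_where = conn.execute(
    "SELECT make, SUM(stickerprice) FROM Car "
    "WHERE make = 'Chevrolet' GROUP BY make"
).fetchall()

print(with_having)                # [('Chevrolet', 67000)]
print(with_having == with_where)  # True
```

With WHERE, the Ford record is discarded before any grouping takes place; with HAVING, a Ford group is formed and then dropped.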

43

• Keep in mind that it is possible to do inequalities on text fields.

• This query would find the sums of the stickerprices for all makes whose names appear after Chevrolet in alphabetical order:

    SELECT make, SUM(stickerprice)
    FROM Car
    WHERE make > 'Chevrolet'
    GROUP BY make

44

• 3. It is possible to have both a condition on a GROUP BY field (a non-aggregate field) and the aggregate field in a query.

• Again, it may be helpful to keep them straight by using WHERE for the condition on the GROUP BY field.

• You have to use HAVING on the aggregate field in any case.

45

• So, for example, this query will find the makes and the sums of their stickerprices for makes that appear after Chevrolet in alphabetical order, and whose stickerprice sums are greater than or equal to 50000.

• Notice that even though the word "and" appears in the verbal description, the keyword AND does not belong in the syntax of a correct query implementing this:

46

    SELECT make, SUM(stickerprice)
    FROM Car
    WHERE make > 'Chevrolet'
    GROUP BY make
    HAVING SUM(stickerprice) >= 50000

47

• 4. All of the examples so far have concentrated on conditions on the GROUP BY fields or the aggregate.

• As usual, most things in SQL mix and match.

• It is also possible to have a condition on any field or fields.

48

• For example:

    SELECT make, SUM(stickerprice)
    FROM Car
    WHERE make > 'Chevrolet'
    AND year > 2005
    GROUP BY make
    HAVING SUM(stickerprice) >= 50000
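The order in which the conditions take effect matters here: WHERE filters individual records first, then the groups are formed, then HAVING filters the groups. A sketch with invented rows, assuming sqlite3, makes each step visible.

```python
import sqlite3

# Hypothetical Car rows, each chosen to exercise one of the conditions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Car (make TEXT, model TEXT, year INTEGER, stickerprice INTEGER)")
conn.executemany(
    "INSERT INTO Car VALUES (?, ?, ?, ?)",
    [
        ("Chevrolet", "Malibu", 2006, 90000),  # dropped by WHERE: make is not > 'Chevrolet'
        ("Ford", "Focus", 2006, 30000),        # kept
        ("Ford", "Fusion", 2007, 25000),       # kept
        ("Ford", "Taurus", 2004, 40000),       # dropped by WHERE: year is not > 2005
        ("Toyota", "Camry", 2006, 20000),      # group dropped by HAVING: sum < 50000
    ],
)

result = conn.execute(
    "SELECT make, SUM(stickerprice) FROM Car "
    "WHERE make > 'Chevrolet' AND year > 2005 "
    "GROUP BY make HAVING SUM(stickerprice) >= 50000"
).fetchall()
print(result)  # [('Ford', 55000)]
```

The 2004 Ford is excluded before the sum is taken, which is why the Ford total is 55000 rather than 95000.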

49

• 5. The ability to mix and match extends to joins.

• It is possible to have a join query where the grouping is done on the field of one table, while the aggregate is done on a field of the other table.

• Such a query could also include the keyword HAVING as well as other elements of SQL queries unrelated to grouping.

50

• This last example dispenses with HAVING and WHERE clauses, except for the joining condition, in order to clearly illustrate doing a join and GROUP BY together.

    SELECT commrate, SUM(salesprice)
    FROM Salesperson, Carsale
    WHERE Salesperson.spno = Carsale.spno
    GROUP BY commrate
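A sketch of the join with grouping, assuming sqlite3 as the database. The commrate values follow the Salesperson table used in this unit, but only two columns are kept, and the Carsale rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Salesperson (spno INTEGER, commrate REAL)")
conn.execute("CREATE TABLE Carsale (spno INTEGER, salesprice INTEGER)")
conn.executemany(
    "INSERT INTO Salesperson VALUES (?, ?)",
    [(111, 0.03), (333, 0.05), (444, 0.05), (555, 0.03)],
)
# Hypothetical sales, one per salesperson.
conn.executemany(
    "INSERT INTO Carsale VALUES (?, ?)",
    [(111, 20000), (333, 10000), (444, 30000), (555, 5000)],
)

# Grouping is on a Salesperson field; the aggregate is on a Carsale field.
by_rate = conn.execute(
    "SELECT commrate, SUM(salesprice) "
    "FROM Salesperson, Carsale "
    "WHERE Salesperson.spno = Carsale.spno "
    "GROUP BY commrate ORDER BY commrate"
).fetchall()
print(by_rate)  # [(0.03, 25000), (0.05, 40000)]
```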

51

5.4 More on Nulls

52

1. For the purposes of the following discussion, here are the contents of the Salesperson table:

Salesperson

spno  name                addr          city       state  phone     bossno  commrate
111   Fred Flintstone     123 C Street  Anchorage  AK     723-6666  333     0.03
222   Wile E. Coyote      456 Karluk    Anchorage  AK     724-7777  333     (null)
333   Bugs Bunny          789 Otis      Anchorage  AK     725-8888  (null)  0.05
444   Rocky the Squirrel  345 Tudor     Anchorage  AK     727-3333  333     0.05
555   Yosemite Sam        678 Muldoon   Anchorage  AK     525-2222  333     0.03

53

• The Salesperson table is the table in the example database which includes nulls.

• Recall that the aggregate functions, COUNT, SUM, AVG, MAX, MIN and so on, ignore nulls.

• The following query will return a result of 4:

    SELECT COUNT(commrate)
    FROM Salesperson

54

• The following query will return an average calculated by dividing by 4 rather than 5:

    SELECT AVG(commrate)
    FROM Salesperson

55

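• If you want to make sure that nulls are included, you have to convert them to a value, which Microsoft Access does with the NZ function (other database systems offer COALESCE for the same purpose).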

• For sums, if nulls are treated as zero, this won't make a difference, but for counts and averages, it will.

• In the query below the average will be calculated by dividing by 5 rather than 4:

    SELECT AVG(NZ(commrate, 0))
    FROM Salesperson
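The arithmetic behind the two averages can be checked with a sketch. It assumes sqlite3 as the database; since SQLite has no NZ function, COALESCE(commrate, 0) is swapped in to play the same role. The five commrate values are the ones from the Salesperson table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Salesperson (spno INTEGER, commrate REAL)")
conn.executemany(
    "INSERT INTO Salesperson VALUES (?, ?)",
    [(111, 0.03), (222, None), (333, 0.05), (444, 0.05), (555, 0.03)],
)

# COALESCE stands in for NZ here: it replaces a null commrate with 0.
count_non_null, avg_ignoring_nulls, avg_nulls_as_zero = conn.execute(
    "SELECT COUNT(commrate), AVG(commrate), AVG(COALESCE(commrate, 0)) "
    "FROM Salesperson"
).fetchone()

print(count_non_null)                # 4
print(round(avg_ignoring_nulls, 4))  # 0.04   (0.16 / 4)
print(round(avg_nulls_as_zero, 4))   # 0.032  (0.16 / 5)
```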

56

• 2. The thing to remember is that GROUP BY will include a group for null values even though the aggregate functions ignore nulls.

• Focus on the last two columns in the Salesperson table and consider this query:

    SELECT bossno, AVG(commrate)
    FROM Salesperson
    GROUP BY bossno

57

This is what the results look like:

Query1

bossno  Expr1001
(null)  0.05
333     3.66666666666667E-02

58

• There is nothing surprising here.

• GROUP BY returns a row for the case where bossno is null.

• There is only one record that meets this condition, and the commrate for that salesperson is 0.05.

• That means that the average is also 0.05.

59

• GROUP BY also returns a row for the case where bossno equals 333, which happens to be a group where 4 records in the Salesperson table have that value.

• 3 of those 4 records in the Salesperson table have non-null commrates.

• The average for them is calculated as (0.03 + 0.05 + 0.03) / 3, giving the value shown above.

• The null value is ignored both in the sum in the numerator and in the count in the denominator.
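The same results can be reproduced in a sketch, assuming sqlite3 as the database and using only the spno, bossno, and commrate columns of the Salesperson table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Salesperson (spno INTEGER, bossno INTEGER, commrate REAL)")
conn.executemany(
    "INSERT INTO Salesperson VALUES (?, ?, ?)",
    [
        (111, 333, 0.03),
        (222, 333, None),
        (333, None, 0.05),
        (444, 333, 0.05),
        (555, 333, 0.03),
    ],
)

# The null bossno gets its own group (key None in Python); within the 333
# group, the null commrate is left out of both the sum and the count.
avg_by_boss = dict(
    conn.execute(
        "SELECT bossno, AVG(commrate) FROM Salesperson GROUP BY bossno"
    ).fetchall()
)
print(round(avg_by_boss[None], 6))  # 0.05
print(round(avg_by_boss[333], 6))   # 0.036667
```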

60

• Other examples could be devised.

• The point simply is that you need to keep this in mind:

• GROUP BY will return rows for those cases where the GROUP BY fields are null.

• However, the aggregate functions still do ignore nulls in the aggregate fields, unless you include NZ in the expression.

61

The End