SQL Chapters 4, 5 (ed. 7 Chaps. 6,7)

Upload: piers-lawrence

Post on 31-Dec-2015


SQL or SEQUEL (Structured English Query Language)

• Based on relational algebra
• First called ‘Square’
• Developed in the 1970s, released in the early 1980s
• Standardized: SQL-92 (SQL2), SQL-3, SQL:1999 (SQL-99), SQL:2003 (aka SQL:200n), SQL:2008
  – current standard: SQL:2011

• 2011 includes better support for temporal databases

• High-level DB language used in ORACLE, etc.; created at IBM with System R

• SQL provides DDL and DML
  – DDL: create table, alter table, drop table
  – DML: queries in SQL

OLTP

• Will be talking about On Line Transaction Processing OLTP for most of this course

SQL

• Is SQL useful?

• http://www.langpop.com/

SQL

• Basic building block of SQL is the Select Statement

SELECT <attribute list>

FROM <table list >

[WHERE <search conditions>]

Select Statement

• Select - chooses columns (project operation in relational algebra)

• From - combines tables if > 1 table (join operation |X| in relational algebra)

• Where - chooses rows (select operation in relational algebra)
  – Result of a query is usually considered another relation
  – Results may contain duplicate tuples

Queries

• Select specified columns for all rows of a table
• Select all columns for some of the rows of a table
• Select specified columns for some rows of a table
• Select all rows and columns of a table
• All of the above for multiple tables

select lname from employee

LNAME

----------

Smith

Wong

Zelaya

Wallace

Narayan

English

Jabbar

Borg

select salary from employee;

SALARY

----------

30000

40000

25000

43000

38000

25000

25000

55000

Differences with relational model

• Relation not a set of tuples - a multiset or bag of tuples

• Therefore, 2 or more tuples may be identical
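A minimal sketch of the bag behaviour, run through Python's sqlite3 module instead of Oracle (an assumption made only so the example is self-contained). The rows are sample data that mirror the lname and salary output shown earlier; pairing each name with a salary by position is also an assumption.

```python
import sqlite3

# Assumed sample data: names and salaries paired in the order listed above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (lname TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [("Smith", 30000), ("Wong", 40000), ("Zelaya", 25000), ("Wallace", 43000),
     ("Narayan", 38000), ("English", 25000), ("Jabbar", 25000), ("Borg", 55000)],
)

# A plain SELECT returns a bag: identical tuples are kept.
bag = [row[0] for row in conn.execute("SELECT salary FROM employee")]

# DISTINCT removes the duplicates, giving a set of tuples.
distinct = [row[0] for row in conn.execute(
    "SELECT DISTINCT salary FROM employee ORDER BY salary")]

print(len(bag), bag.count(25000))
print(distinct)
```

The first query keeps all eight rows, including the three identical 25000 tuples; only the DISTINCT form behaves like a relation in the strict set sense.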

Queries

• To retrieve all the attribute values of the selected tuples, a * is used:

Select * From Employee

Select Clause

Select <attribute list>
– Attribute list can be:
  • column names
  • constants
  • arithmetic expressions involving columns, etc.
  • In Oracle, can also be a select statement (but the select can only return 1 column and 1 row)
  • * lists all attributes in a table
– To rename an attribute, use ‘as’ (the keyword is optional):

Select lname as last_name
From employee

From clause

From <table list>

• Table list can be:
  – one or more table names
  – a select statement itself

Where clause

Where <search conditions>

• You can specify more than one condition in the where clause separated by:
  – and
  – or

Where clause

Where <search conditions> (σ in relational algebra)

• Search conditions can be:
  – Comparison predicate: expr § expr2
      where § is <, >, <=, etc., or in, between, like, etc.
      expr is constant, col, qual.col, aexpr op aexpr, fn(aexpr), set_fn(aexpr)
      expr2 is expr | select statement
• Note: expr can be a select statement!

• Retrieve the ssn of the employee whose name is 'Smith'

SQL> select ssn
     from employee
     where lname='Smith';

SSN
----------
123456789

Miscellaneous

• SQL is not case sensitive

Select * from employee

select * FROM EMPLOYEE

• Except when comparing character strings

• All character strings in SQL are surrounded by single quotes

where lname='Smith'

• However, table names in some RDBMSs (e.g. MySQL) are case sensitive

Select statement

• Multiple levels of select nesting are allowed
• Like predicate, Between predicate and Null predicate
• Can apply arithmetic operations to numeric values in SQL

Combining tuples using where clause

• To retrieve data that is in more than one table, you can use:
  – a cartesian product X

Select * From Empnames, Dependent

  – a join operation |X|

Select * From Empnames, Dependent Where ssn=essn

• List all info about each department and its manager

              

Combining tuples in from clause

• A cartesian product combines each tuple in one table, with all the tuples in the second table (and all columns unless specified in select clause)

• A join combines a tuple from the first table with tuple(s) in the second table if the specified (join) condition is satisfied (again, all columns included unless specified in select clause)

• A join is also referred to as an inner join
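The difference between the two operations can be checked by counting rows. The sketch below uses Python's sqlite3 module as a stand-in for Oracle, with a three-row employee table and a three-row dependent table; the column name dependent_name and all data values are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (lname TEXT, ssn TEXT);
    CREATE TABLE dependent (essn TEXT, dependent_name TEXT);
    -- Assumed sample rows
    INSERT INTO employee VALUES ('Smith', '123456789'), ('Wong', '333445555'),
                                ('Borg', '888665555');
    INSERT INTO dependent VALUES ('333445555', 'Alice'), ('333445555', 'Theodore'),
                                 ('123456789', 'Michael');
""")

# Cartesian product: every employee paired with every dependent (3 * 3 rows).
cross = conn.execute("SELECT * FROM employee, dependent").fetchall()

# Join written with a condition in the where clause.
where_join = conn.execute("""
    SELECT lname, dependent_name FROM employee, dependent
    WHERE ssn = essn ORDER BY lname, dependent_name""").fetchall()

# The same join written with the Join ... on notation.
on_join = conn.execute("""
    SELECT lname, dependent_name FROM employee JOIN dependent ON ssn = essn
    ORDER BY lname, dependent_name""").fetchall()

print(len(cross), len(where_join))
print(where_join == on_join)
```

Borg has no dependents, so he appears in the cartesian product but in neither form of the join.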

Alternative SQL notation for Join

Select lname, dname
From Employee Join Department on dno=dnumber
Where sex='M'

Select lname, relationship
From Employee Join Dependent on ssn=essn
Where dno=5

Where clause

Select * From Employee, Department Where mgrssn=ssn and sex='F'

mgrssn=ssn is a join condition
sex='F' is a select condition

Select lname, dname
From Employee Join Department on dno=dnumber
Where dno=5

Additional characteristics

• In SQL we can use the same name for 2 or more attributes in different relations. The attribute names must then be qualified:

employee.lname department.*

• Use distinct to eliminate duplicate tuples

Sample queries

• Write queries to do the following:
  – List the lname of all female employees with supervisor ssn=333445555
  – List ssn and dname of department employees work for
  – List the ssn, lname of all female employees working in the ‘Research’ department

Sample queries

• Write queries to do the following:
  – List the lname of all female employees with supervisor ssn=333445555
  – List ssn of employee and name of department they work for
  – List ssn and dname of department employees who work for a department located in Bellaire
  – List the ssn, lname of all employees who earn more than $30,000 and work in the ‘Research’ department

Predicates

• Predicates evaluate to either T or F. Many of the previous queries can be specified in an alternative form using nesting.

In predicate

• The in predicate tests set membership for a single value at a time.

• In predicate: expr [not] in (select | val {, val})

Select <attribute list> From <table list>

Where expr in (select | val {, val})

In predicate

• Select SSN of employees who work in departments located in Houston

• Select SSN of employees who work in the research department

• The outer query selects an Employee tuple if its dno value is in the result of the nested query.

Quantified predicate

• Quantified predicate compares a single value with a set according to the predicate.

• Quantified predicate: expr § [all | any] (select)

Select <attribute list> From <table list>

Where expr § [all | any] (select)

§ is < > = <> <= >=

Quantified predicate

• Write using quantified predicate:

• Select SSN of employees who work in departments located in Houston

• Select SSN of employees who work in the research department

• Which predicate should be used? = all, = any, > all, etc.?

Quantified predicate

What does the following query do?

Select * From Employee Where salary > all (Select salary From Employee Where sex = 'F')

• = any is equivalent to in

• not in is equivalent to <> all

Exists predicate

• The exists predicate tests if a set of rows is non-empty

• Exists predicate: [not] exists (select)

Select <attribute list> From <table list>

Where exists (select)

Exists predicate

• Exists is used to check whether the result of the inner query is empty or not. If the result is NOT empty, then the tuple in the outer query is in the result.

Exists predicate

• Write using exists predicate:

• Select SSN of employees who work in departments located in Houston

• Select SSN of employees who work in the research department

Exists predicate

• Exists is used to implement intersection and, as not exists, difference (an alternative to ‘not in’).

Exists predicate

• Retrieve all the names of employees who do not work in a department located in Houston.

• Retrieve all the names of employees who do not work in the research department.

• Retrieves the locations of the department Employee works for to see if one of them is Houston. If none exist (not exists is true and the inner query is empty) the Employee tuple is in the result.
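The not exists query described in the last bullet is not shown on the slide; one way it might be written is sketched below. It runs on Python's sqlite3 module rather than Oracle, and the table dept_locations(dnumber, dlocation) and all rows are assumptions modelled on the Company database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (lname TEXT, dno INTEGER);
    CREATE TABLE dept_locations (dnumber INTEGER, dlocation TEXT);
    -- Assumed sample rows
    INSERT INTO employee VALUES ('Smith', 5), ('Wong', 5), ('Zelaya', 4), ('Wallace', 4),
                                ('Narayan', 5), ('English', 5), ('Jabbar', 4), ('Borg', 1);
    INSERT INTO dept_locations VALUES (1, 'Houston'), (4, 'Stafford'), (5, 'Bellaire'),
                                      (5, 'Sugarland'), (5, 'Houston');
""")

# For each employee, the inner query looks for a Houston location of the
# employee's department; the employee is kept only if that result is empty.
not_in_houston = [row[0] for row in conn.execute("""
    SELECT lname FROM employee e
    WHERE NOT EXISTS (SELECT * FROM dept_locations l
                      WHERE l.dnumber = e.dno AND l.dlocation = 'Houston')
    ORDER BY lname""")]

print(not_in_houston)
```

With this data only department 4 has no Houston location, so only its three employees are returned.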

select * from employee

where dno in (select dnumber from department

where dname='Research');

select * from employee

where dno =any (select dnumber from department

where dname='Research')

select * from employee

where exists (select * from department

where dname='Research' and dno=dnumber);
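A small check that the in form and the exists form above select the same employees. The sketch uses Python's sqlite3 module as a stand-in for Oracle; SQLite does not accept the = any form, so that variant is left out here. The rows are assumed sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (dname TEXT, dnumber INTEGER);
    CREATE TABLE employee (lname TEXT, dno INTEGER);
    -- Assumed sample rows
    INSERT INTO department VALUES ('Research', 5), ('Administration', 4),
                                  ('Headquarters', 1);
    INSERT INTO employee VALUES ('Smith', 5), ('Wong', 5), ('Zelaya', 4), ('Wallace', 4),
                                ('Narayan', 5), ('English', 5), ('Jabbar', 4), ('Borg', 1);
""")

in_form = """
    SELECT lname FROM employee
    WHERE dno IN (SELECT dnumber FROM department WHERE dname = 'Research')
    ORDER BY lname"""

# dno is not a column of department, so it refers to the outer employee row.
exists_form = """
    SELECT lname FROM employee
    WHERE EXISTS (SELECT * FROM department
                  WHERE dname = 'Research' AND dno = dnumber)
    ORDER BY lname"""

in_rows = [row[0] for row in conn.execute(in_form)]
exists_rows = [row[0] for row in conn.execute(exists_form)]

print(in_rows)
print(in_rows == exists_rows)
```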

Correlated Nested Queries

• Correlated Nested Queries:
• If a condition in the where-clause of a nested query references an attribute of a relation declared in an outer query, the two queries are said to be correlated.

• The result of a correlated nested query is different for each tuple (or combination of tuples) of the relation in the outer query.

• Which takes longer to execute? a correlated nested query or a non-correlated nested query?

Correlated queries

List the name of employees who have dependents with the same birthday as they do.

Can this be written as correlated nested and uncorrelated nested?

Single block queries

• An Expression written using = or IN may almost always be expressed as a single block query.

• Find example where this is not true in your textbook

Join Conditions

• For every project located in 'Stafford' list the project number, the controlling department number and department manager's last name, address and birthdate.

  

• How many join conditions in the above query?
• How many selection conditions?

Additional characteristics

• Aliases are used to rename relations:

Select E.lname, D.dname
From Employee E, Department D
Where E.dno = D.dnumber

NOTE: cannot use the ‘as’ keyword here in Oracle

• List all employee names and their supervisor names

Expr as a select statement

Select lname, dno

From employee

Where dno = (select dnumber

from department

where dname = 'Research')

– You need to be careful using this. Result must be a single value

• List All Employees and the name of any department if they manage one

• The following won’t give all employees

Select Employee.*, dname

From Employee, Department

Where ssn=mgrssn

Outer Join

•  Outer Join - extension of join and union

• In a regular join, tuples in R1 or R2 that do not have matching tuples in the other relation do not appear in the result.

• Some queries require all tuples in R1 (or R2 or both) to appear in the result

• When no matching tuples are found, nulls are placed for the missing attributes.

Outer Join

• You can use the keywords left, right, full (works in Oracle)

• The following is a left outer join

Select lname, dname
From Employee Left Outer Join Department on ssn=mgrssn

• The keyword Outer is optional

LNAME      DNAME
---------- ---------------
Wong       Research
Wallace    Administration
Borg       Headquarters
Jabbar
English
Zelaya
Narayan
Smith
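The left outer join can be compared with the regular join by counting rows. This is a sketch on Python's sqlite3 module, not Oracle; the ssn values and other rows are assumed sample data chosen to match the output above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (lname TEXT, ssn TEXT);
    CREATE TABLE department (dname TEXT, mgrssn TEXT);
    -- Assumed sample rows
    INSERT INTO employee VALUES
        ('Smith', '123456789'), ('Wong', '333445555'), ('Zelaya', '999887777'),
        ('Wallace', '987654321'), ('Narayan', '666884444'), ('English', '453453453'),
        ('Jabbar', '987987987'), ('Borg', '888665555');
    INSERT INTO department VALUES ('Research', '333445555'),
        ('Administration', '987654321'), ('Headquarters', '888665555');
""")

# Regular (inner) join: only employees who manage a department.
inner_rows = conn.execute(
    "SELECT lname, dname FROM employee JOIN department ON ssn = mgrssn").fetchall()

# Left outer join: every employee, with NULL (None in Python) for dname
# when no department row matches.
left_rows = conn.execute(
    "SELECT lname, dname FROM employee LEFT OUTER JOIN department ON ssn = mgrssn"
).fetchall()
dname_of = dict(left_rows)

print(len(inner_rows), len(left_rows))
print(dname_of["Wong"], dname_of["Smith"])
```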

Outer Join

• You can also use a + to indicate an outer join
• The following example indicates a left outer join in Oracle

Select lname, dname
From Employee, Department
Where ssn=mgrssn(+)

Select lname, dname
From Employee Left Outer Join Department on ssn=mgrssn

Nested queries

• In general we can have several levels of nested queries.

• A reference to an unqualified attribute refers to the relation declared in the inner most nested query.

• An outer query cannot reference an attribute in an inner query (like scope rules in higher level languages).

• A reference to an attribute must be qualified if its name is ambiguous. 

Will this work?

Suppose you want the ssn and dname:

Select ssn, dname

from employee

where dno in (select dnumber

from department)

• Company Database


• List employees who do not work in departments located in Houston

More SQL

• Anything missing to answer typical queries?

Aggregate functions

•  Aggregate Functions (set functions, aggregates):

• Include COUNT, SUM, MAX, MIN and AVG

aggr (col)

• Find the maximum salary, the minimum salary and the average salary among all employees.

Select MAX(salary), MIN(salary), AVG(salary) From Employee

Aggregates

• Retrieve the total number of employees in the company

Select COUNT(*) From Employee

• Retrieve the number of employees in the research department.

Select COUNT(*) From Employee, Department Where dno=dnumber and

dname='Research'

Aggregates

• Note that:

Select COUNT(*) from Employee

will give you the same result as:

Select COUNT(salary) from Employee

unless there are nulls, which are not counted

• To count the number of distinct salaries:

Select COUNT(distinct salary) From Employee
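The three counts differ as soon as a null is present. The sketch below uses Python's sqlite3 module as a stand-in for Oracle; the eight salaries mirror the earlier output and the ninth row, with a null salary, is an assumed addition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (lname TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [("Smith", 30000), ("Wong", 40000), ("Zelaya", 25000), ("Wallace", 43000),
     ("Narayan", 38000), ("English", 25000), ("Jabbar", 25000), ("Borg", 55000),
     ("Marini", None)],  # assumed extra row with a NULL salary
)

# COUNT(*) counts rows, COUNT(salary) skips NULLs,
# COUNT(DISTINCT salary) also collapses repeated values.
all_rows, with_salary, distinct_salaries = conn.execute(
    "SELECT COUNT(*), COUNT(salary), COUNT(DISTINCT salary) FROM employee"
).fetchone()

print(all_rows, with_salary, distinct_salaries)
```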

Aggregates

• Additional aggregates have been added to RDBMS

• Read the Oracle documentation to see what has been added

• List average salary over all employees

• List lname, salary for employees with salaries > average salary

• List lname, salary for employees with salaries > average salary for their department

Example

SELECT dno, lname, salary

FROM employee e

WHERE salary >

(SELECT AVG(salary)

FROM employee

WHERE e.dno=dno);

What if we get rid of the ‘e’ in e.dno?
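One way to explore the question is to run both versions. The sketch below uses Python's sqlite3 module in place of Oracle, with assumed sample rows. Without the alias, dno=dno compares the inner employee row with itself, so the subquery is no longer correlated and returns the average over all employees.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (lname TEXT, salary INTEGER, dno INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",  # assumed sample rows
    [("Smith", 30000, 5), ("Wong", 40000, 5), ("Zelaya", 25000, 4),
     ("Wallace", 43000, 4), ("Narayan", 38000, 5), ("English", 25000, 5),
     ("Jabbar", 25000, 4), ("Borg", 55000, 1)],
)

# Correlated: e.dno ties the inner query to the current outer row,
# so each salary is compared with the average of its own department.
correlated = [row[0] for row in conn.execute("""
    SELECT lname FROM employee e
    WHERE salary > (SELECT AVG(salary) FROM employee WHERE e.dno = dno)
    ORDER BY lname""")]

# Alias removed: both dno references resolve to the inner table,
# so the comparison is against the company-wide average.
uncorrelated = [row[0] for row in conn.execute("""
    SELECT lname FROM employee
    WHERE salary > (SELECT AVG(salary) FROM employee WHERE dno = dno)
    ORDER BY lname""")]

print(correlated)
print(uncorrelated)
```

With this data Borg is the only member of department 1, so he never exceeds his own department's average, but he does exceed the overall average and appears in the second result.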

• List each department name and average salary

• Difficult to write?

Grouping

• We can apply the aggregate functions to subgroups of tuples in a relation.

• Each subgroup of tuples consists of the set of tuples that have the same value for the grouping attribute(s).

• The aggregate is applied to each subgroup independently.

• SQL has a group-by clause for specifying the grouping attributes.

Group By col {, col}

Grouping

• For each department, retrieve the department number, the total number of employees and their average salary.

Select dno, COUNT(*), AVG(salary)
From Employee
Group By dno

• The tuples are divided into groups with the same dno.
• COUNT and AVG are then applied to each group.

• List each department name and average salary

• In the above query, the joining of the two relations is done first, then the grouping and aggregates are applied.

Oracle group by – STANDARD SQL

• Only grouping attribute(s) and aggregate functions can be listed in the SELECT clause.

• Expressions in the GROUP BY clause can contain any columns of the tables or views in the FROM clause, regardless of whether the columns appear in the SELECT clause.

• Some DBMSs (e.g. MySQL) do not enforce the standard SQL rules for group by

• In this class everyone will use standard SQL

• Write the following SQL queries:
  – list employee name, their department name and number, and salary for employees with salary > $32,000.
  – list department name, department number and average salary
  – list department name for departments with average salary > $32,000.

Grouping

• Now try:
• list department name, average salary for departments with average salary > $32,000.
• Will this work?

Select dname, avg(salary)

From department, employee

Where dno=dnumber and avg(salary) > 32000

Group by dname;

//instead these work

select dname, avg(salary)

from department, employee

where dno=dnumber and (select avg(salary) from employee where dno=dnumber) > 32000

group by dname;

select dname, avgsal

from (select dno, avg(salary) as avgsal from employee group by dno), department

where dno=dnumber and avgsal > 32000

• Try to nest select in select clause

• //Does NOT work!! - can't recognize avgsal if inside () or outside ()

select dname, (select avg(salary) as avgsal from employee where dno=dnumber)

from department

where avgsal > 32000;

Having Clause

• Sometimes we want to retrieve those tuples with certain values for the aggregates (Group By).

• The having clause is used to specify a selection condition on a group (rather than individual tuples).

• If a having is specified, you must specify a group by.

Having search_condition

With group by / Having

select dname, avg(salary)

from department, employee

where dno=dnumber

group by dname

having avg(salary) > 32000;
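The contrast between the where attempt and the having version can be reproduced as follows. This is a sketch on Python's sqlite3 module rather than Oracle, with assumed sample rows; SQLite also rejects an aggregate in the where clause, though its error message differs from Oracle's.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (dname TEXT, dnumber INTEGER);
    CREATE TABLE employee (lname TEXT, salary INTEGER, dno INTEGER);
    -- Assumed sample rows
    INSERT INTO department VALUES ('Research', 5), ('Administration', 4),
                                  ('Headquarters', 1);
    INSERT INTO employee VALUES
        ('Smith', 30000, 5), ('Wong', 40000, 5), ('Zelaya', 25000, 4),
        ('Wallace', 43000, 4), ('Narayan', 38000, 5), ('English', 25000, 5),
        ('Jabbar', 25000, 4), ('Borg', 55000, 1);
""")

# An aggregate in the where clause is rejected: where filters individual
# tuples before any groups exist.
try:
    conn.execute("""
        SELECT dname, AVG(salary) FROM department, employee
        WHERE dno = dnumber AND AVG(salary) > 32000
        GROUP BY dname""")
    where_rejected = False
except sqlite3.Error:
    where_rejected = True

# having filters whole groups after the aggregates have been computed.
having_rows = conn.execute("""
    SELECT dname, AVG(salary) FROM department, employee
    WHERE dno = dnumber
    GROUP BY dname
    HAVING AVG(salary) > 32000
    ORDER BY dname""").fetchall()

print(where_rejected)
print(having_rows)
```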

Subselect formal definition

• Select called Subselect

Select expr {, expr}

From tablename [alias] {, tablename [alias]}

[Where search_condition]

[Group By col {, col}]

[Having search_condition]

Select

• Select is really:

Subselect {Set_Operation [all] Subselect}  [Order By col [asc | desc] {, col [asc | desc]}]

Order By

• To sort the tuples in a query result based on the values of some attribute:

Order by col_list

• Default is ascending order (asc), but can specify descending order (desc)

Order by

• Retrieve names of the employees and their department, order it by department and within each department order the employees alphabetically by last name.

Select lname, fname, dname

From department, employee

Where dno=dnumber

Order by dname, lname


Select – set operations

• Select is really:

Subselect {Set_Operation [all] Subselect}  [Order By col [asc | desc] {, col [asc | desc]}]

Set Operations

• The Set Operations are:
  – UNION, MINUS and INTERSECT

• The resulting relations are sets of tuples; duplicate tuples are eliminated.

• Operations apply only to union compatible relations. The two relations must have the same number of attributes and the attributes must be of the same type.

Union

SELECT bdate

FROM employee

UNION

SELECT bdate

FROM dependent

Minus

• Example using minus to list all employees who don’t work on a project:

Select ssn from employee

Minus

Select essn from works_on

Minus

Select employees who do not work on project 20

Select essn from works_on

Minus

Select essn from works_on

Where pno=20;

Alternatives to Minus

Select employees who do not work on project 20

Write using ‘in’ predicate

select distinct essn

from works_on

where essn not in (select essn from works_on where pno=20);

Without minus or ‘in’?

select essn

from works_on

where pno<>20;

1:1, 1:N, N:M relationships

• How about list everyone who does not work for dno=5?

• The difference is a 1:1 or 1:N versus an N:M relationship

• What are all the 1:1, 1:N, N:M relationships in the Company DB?
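The effect of the N:M relationship on the project 20 queries can be seen with a few rows. The sketch below runs on Python's sqlite3 module, which spells Oracle's MINUS as EXCEPT; the works_on rows are assumed sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE works_on (essn TEXT, pno INTEGER)")
conn.executemany(
    "INSERT INTO works_on VALUES (?, ?)",  # assumed sample rows
    [("123456789", 1), ("123456789", 2),
     ("333445555", 2), ("333445555", 20),
     ("888665555", 20)],
)

# Set difference (MINUS in Oracle, EXCEPT in SQLite):
# employees with no works_on row at all for project 20.
minus_rows = sorted(row[0] for row in conn.execute("""
    SELECT essn FROM works_on
    EXCEPT
    SELECT essn FROM works_on WHERE pno = 20"""))

# pno <> 20 only tests one row at a time, so an employee who works on
# project 20 and on some other project is still returned.
not_equal_rows = sorted(row[0] for row in conn.execute(
    "SELECT DISTINCT essn FROM works_on WHERE pno <> 20"))

print(minus_rows)
print(not_equal_rows)
```

Employee 333445555 works on projects 2 and 20, so the row-level test keeps that employee while the set difference does not. For a 1:N attribute such as dno each employee has a single value, so the row-level test is enough.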

Set operations - Union

• List all project names for projects that are worked on by an employee whose last name is Smith, or that are controlled by a department whose manager's last name is Smith

(Select pname
 From Project, Works_on, Employee
 Where pnumber=pno and essn=ssn and lname='Smith')
Union
(Select pname
 From Project, Department, Employee
 Where dnum=dnumber and mgrssn=ssn and lname='Smith')

Example - Queries

• Compute the number of dependents

• List the essn and number of dependents for employees with dependents

• List the essn and number of dependents for all employees

• Compute the average number of dependents over employees with dependents

Example

• Compute the average number of dependents over employees with dependents

• There are several ways to do this, but note that you can do:

aggr(aggr(col))
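Oracle accepts the nested form directly, for example select avg(count(*)) from dependent group by essn. One of the other ways is to aggregate in two steps with a subselect in the from clause, sketched here on Python's sqlite3 module (SQLite does not accept nested aggregates). The dependent rows and the column name dependent_name are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dependent (essn TEXT, dependent_name TEXT)")
conn.executemany(
    "INSERT INTO dependent VALUES (?, ?)",  # assumed sample rows
    [("333445555", "Alice"), ("333445555", "Theodore"), ("333445555", "Joy"),
     ("987654321", "Abner"),
     ("123456789", "Michael"), ("123456789", "Alice"), ("123456789", "Elizabeth")],
)

# Step 1: number of dependents per employee who has any.
per_employee = conn.execute("""
    SELECT essn, COUNT(*) FROM dependent
    GROUP BY essn ORDER BY essn""").fetchall()

# Step 2: average of those per-employee counts.
avg_dependents = conn.execute("""
    SELECT AVG(n)
    FROM (SELECT COUNT(*) AS n FROM dependent GROUP BY essn)""").fetchone()[0]

print(per_employee)
print(round(avg_dependents, 2))
```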

DDL – Data Definition in SQL

• Used to CREATE, DROP and ALTER the descriptions of the relations of a database

• CREATE TABLE
  – Specifies a new base relation by giving it a name, and specifying each of its attributes and their data types

CREATE TABLE name (col1 datatype, col2 datatype, ..)

Data Types

• Data types: (ANSI SQL vs. Oracle)

There are differences between SQL and Oracle, but Oracle will convert the SQL types to its own internal types

– int, smallint, integer converted to NUMBER
  • Can specify the precision and scale
– float and real converted to NUMBER
– Character is char(l) or varchar2(l), varchar(l) still works
– Have date, blob, etc.

Constraints

• Constraints are used to specify primary keys, referential integrity constraints, etc.

[CONSTRAINT constr_name] PRIMARY KEY

need to name it if you want to alter it later

CONSTRAINT constr_name REFERENCES table (col)

• The table(col) referenced must exist
• Constraint names must be unique across database
• You can also specify NOT NULL for a column
• You can also specify UNIQUE for a column

Create table – In line constraint definition

Create table Project1 (pname varchar2(9)

CONSTRAINT pk PRIMARY KEY,

pnumber int not null,

plocation varchar2(15),

dnum int CONSTRAINT fk

REFERENCES Department (dnumber),

phead int);

Create table

• To create a table with a composite primary key, you must use an out of line definition:

Create table Works_on (essn char(9), pno

int, hours number(4,1),

PRIMARY KEY (essn, pno));

Oracle Specifics

• A foreign key may also have more than one column so you need to specify an out of line definition

• There are differences with the in line definition
  – When you specify a foreign key constraint out of line, you must specify the FOREIGN KEY keywords and one or more columns.
  – When you specify a foreign key constraint inline, you need only the REFERENCES clause.

Create table – out of line constraint definition

Create table Project2 (pname varchar2(9),

pnumber int not null,

plocation varchar(15),

dnum int, phead int,

PRIMARY KEY (pname),

CONSTRAINT fk FOREIGN KEY (dnum)

REFERENCES Department (dnumber));

DROP TABLE

• Used to remove a relation and its definition

• The relation can no longer be used in queries, updates or any other commands since its description no longer exists

Drop table dependent;

ALTER TABLE

• To alter the definition of a table in the following ways:
  – to add a column
  – to add an integrity constraint
  – to redefine a column (datatype, size, default value) – there are some limits to this
  – to enable, disable or drop an integrity constraint or trigger
  – other changes relate to storage, etc.

Alter table - Oracle

• The table you modify must have been created by you, or you must have the ALTER privilege on the table.

• If used to add an attribute to one of the base relations, the new attribute will have NULLS in all the tuples of the relation after command is executed; hence, NOT NULL constraint is not allowed for such an attribute.

Alter table employee add job varchar(12);

• The database users must still enter a value for the new attribute job for each employee tuple using the update command. Oracle alter

How to create a table when?

CONSTRAINT constr_name REFERENCES table (col)

• The table(col) referenced must exist

Department mgrssn references employee ssn with mgrssn

Employee dno references department dnumber

Alter is useful when …

– You have two tables that reference each other
– Table must be defined before referenced, so how to define?
  • department mgrssn references employee ssn with mgrssn
  • Employee dno references department dnumber
– Create employee table without referential constraint for dno
– Create department table with reference to mgrssn
– Alter employee and add dno referential constraint
– Or when you specify create table you can disable the references, then enable them later

Updates (DML)

• Insert, delete and update
  – INSERT

Insert into table_name ( [(col1 {, colj})] values (val1 {, valj}) | (col1 {, colj}) subselect )

  – add a single tuple
  – attribute values must be in the same order as the CREATE table

Insert

Insert into Employee values ('Richard', 'K', 'Marini', '654298343', '30-DEC-52', '98 Oak Forest, Katy, TX', 'M', 37000, '987654321', 4);

• Use null for null values in ORACLE

Insert

• Alternative form - specify attributes and leave out the attributes that are null

Insert into Employee (fname, lname, ssn) values ('Richard', 'Marini', '654298343');

 

• Constraints specified in DDL are enforced when updates are applied.
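A minimal illustration of a constraint being enforced at insert time, using Python's sqlite3 module as a stand-in for Oracle. The cut-down employee table with ssn as its primary key is an assumption made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed cut-down schema: only three columns, ssn is the primary key.
conn.execute(
    "CREATE TABLE employee (fname TEXT, lname TEXT, ssn CHAR(9) PRIMARY KEY)")

insert = ("INSERT INTO employee (fname, lname, ssn) "
          "VALUES ('Richard', 'Marini', '654298343')")
conn.execute(insert)

# Inserting the same ssn again violates the primary key constraint,
# so the DBMS rejects the statement and the table is unchanged.
try:
    conn.execute(insert)
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
print(duplicate_rejected, row_count)
```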

Insert

• To insert multiple tuples from existing table:

create table ename (name varchar(15));

Table created.

insert into ename (select lname from employee);

8 rows created.

select * from ename;

NAME
---------------
Smith
Wong
Zelaya
Wallace
Narayan
English
Jabbar
Borg

Delete

Delete from table_name [Where search_condition]

• If a where clause is included, it selects the tuples to delete; tuples are deleted from only one table at a time

• The number of tuples deleted depends on the where clause

• If no where clause included all tuples are deleted - the table is empty

Delete

Examples:

Delete From Employee Where dno = 5;

Delete From Employee Where ssn = '123456789';

Delete from Employee Where dno in (Select dnumber From Department Where dname = 'Research');

Delete from Employee;

Update

• Modifies values of one or more tuples
• Where clause used to select tuples
• Set clause specifies the attribute and (new) value
• Only modifies tuples in one relation at a time

Update <table name>

Set attribute = value {, attribute = value}

Where <search conditions>

Update

Examples:

Update Project Set plocation = 'Bellaire', dnum = 5 Where pnumber = 10

Update Employee Set salary = salary * 1.5 Where dno = (Select dnumber From department Where dname = 'Headquarters')

Logical order of Evaluation

Select pnumber, pname, COUNT(*)
From Project, Works_on
Where pnumber=pno and hours > 5
Group By pnumber, pname
Having COUNT(*) > 2
Order by pname

– Apply Cartesian product to tables
– Apply join and select conditions
– Then group by
– Apply the select clause, compute any aggregate functions
– Apply any Having conditions
– Order the result for the display

Order of evaluation

• Actual order of evaluation?

– Which is?

– More efficient to apply join condition during Cartesian product (join operation)

– How can a DBMS implement a join?

Implementations of Join

• 3 different ways – what are they?

Equi-Join Algorithms |X|

1. Nested (inner-outer) loop
  – for each record t in R, retrieve every record s from S and test if they satisfy the join condition
  – If match, combine records and write to output file
  – CPU time: n*m
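The nested loop can be sketched in a few lines of Python over in-memory lists. This is an illustration of the idea only, not how any particular DBMS implements it; the function name, the record layout (dicts) and the sample rows are all assumptions.

```python
def nested_loop_join(r, s, r_key, s_key):
    """Equi-join two lists of dict records by testing every pair."""
    output = []
    comparisons = 0
    for t in r:          # outer loop: each record t in R
        for u in s:      # inner loop: every record in S
            comparisons += 1
            if t[r_key] == u[s_key]:
                output.append({**t, **u})  # combine the matching records
    return output, comparisons


# Assumed sample records
employees = [
    {"lname": "Smith", "dno": 5},
    {"lname": "Wong", "dno": 5},
    {"lname": "Zelaya", "dno": 4},
    {"lname": "Borg", "dno": 1},
]
departments = [
    {"dname": "Research", "dnumber": 5},
    {"dname": "Administration", "dnumber": 4},
    {"dname": "Headquarters", "dnumber": 1},
]

joined, comparisons = nested_loop_join(employees, departments, "dno", "dnumber")
pairs = [(row["lname"], row["dname"]) for row in joined]
print(pairs)
print(comparisons)
```

With n = 4 and m = 3 the loop makes n*m = 12 comparisons regardless of how many records match.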

Equi-join

2. Sort-merge join
  – records of R and S ordered by value of join attribute
  – both files scanned in order, need to scan each file only once
    • if duplicate values, have an inner loop and must back up the pointer
  – When match, combine records and write to output file
  – CPU time: n+m plus time to sort (n log n)
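A sketch of sort-merge join in Python over in-memory lists, including the inner loop for duplicate join values. It is an illustration under assumptions (dict records, hypothetical function name, sample rows), not a DBMS implementation.

```python
def sort_merge_join(r, s, r_key, s_key):
    """Equi-join two lists of dict records by sorting and merging."""
    r_sorted = sorted(r, key=lambda t: t[r_key])
    s_sorted = sorted(s, key=lambda u: u[s_key])
    output = []
    i = j = 0
    while i < len(r_sorted) and j < len(s_sorted):
        rv, sv = r_sorted[i][r_key], s_sorted[j][s_key]
        if rv < sv:
            i += 1
        elif rv > sv:
            j += 1
        else:
            # Find the end of the run of S records with this join value.
            j_end = j
            while j_end < len(s_sorted) and s_sorted[j_end][s_key] == rv:
                j_end += 1
            # For each R record with the same value, back up to the start
            # of the S run and combine with every record in it.
            while i < len(r_sorted) and r_sorted[i][r_key] == rv:
                for k in range(j, j_end):
                    output.append({**r_sorted[i], **s_sorted[k]})
                i += 1
            j = j_end
    return output


# Assumed sample records; department 5 appears twice on both sides.
employees = [
    {"lname": "Smith", "dno": 5},
    {"lname": "Wong", "dno": 5},
    {"lname": "Zelaya", "dno": 4},
    {"lname": "Borg", "dno": 1},
]
locations = [
    {"dnumber": 5, "dlocation": "Bellaire"},
    {"dnumber": 1, "dlocation": "Houston"},
    {"dnumber": 5, "dlocation": "Sugarland"},
    {"dnumber": 4, "dlocation": "Stafford"},
]

joined = sort_merge_join(employees, locations, "dno", "dnumber")
pairs = [(row["lname"], row["dlocation"]) for row in joined]
print(pairs)
```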

Equi-join

3. Hash join
  – use same hashing function on join attributes of both files R and S
  – hash smaller file first (hopefully, all fits in memory, else hash to a file)
  – single hash of second file
  – if match, combine record with matching records of first file in output file
  – CPU time (assume good hash function): n+m but no sorting
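A sketch of the in-memory case of hash join in Python, where a dict plays the role of the hash table built on the smaller file. The function name, record layout and sample rows are assumptions for illustration; spilling to a file when the smaller input does not fit in memory is not shown.

```python
from collections import defaultdict


def hash_join(small, large, small_key, large_key):
    """Equi-join: hash the smaller input, then probe once with the larger."""
    buckets = defaultdict(list)
    for t in small:                  # build phase: one pass over the small file
        buckets[t[small_key]].append(t)
    output = []
    for u in large:                  # probe phase: one pass over the large file
        for t in buckets.get(u[large_key], []):
            output.append({**t, **u})
    return output


# Assumed sample records
departments = [
    {"dname": "Research", "dnumber": 5},
    {"dname": "Administration", "dnumber": 4},
    {"dname": "Headquarters", "dnumber": 1},
]
employees = [
    {"lname": "Smith", "dno": 5},
    {"lname": "Wong", "dno": 5},
    {"lname": "Zelaya", "dno": 4},
    {"lname": "Borg", "dno": 1},
]

joined = hash_join(departments, employees, "dnumber", "dno")
pairs = [(row["lname"], row["dname"]) for row in joined]
print(pairs)
```

Each input is read once, which is where the n+m estimate comes from.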

Metadata

• To get information about a specific table:

Describe employee

Lists all attributes and type

• To get information about all user tables, can query user_tables

Select table_name from user_tables

System tables

• user_tables
• user_tab_columns
• user_constraints
• user_cons_columns
• user_triggers
• user_views
• user_tab_privs
• user_tab_privs_made (lists privileges granted to others)
• user_col_privs

Standard SQL

• What is the deal with MySQL vs. standard SQL?
  – Oracle has standard SQL
  – MySQL does not

http://dev.mysql.com/doc/refman/5.1/en/group-by-hidden-columns.html

Example Queries

• Suppose you have created a table

QtrSales (ID, Q1, Q2, Q3, Q4)

• SQL to compute the total sales for each quarter?

• SQL to compute the total sales for each ID?
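One possible pair of answers, sketched on Python's sqlite3 module with two made-up rows (the data and the choice of ID as primary key are assumptions). The first query aggregates down each column; the second adds across each row and needs no aggregate at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE QtrSales
                (ID INTEGER PRIMARY KEY, Q1 INTEGER, Q2 INTEGER,
                 Q3 INTEGER, Q4 INTEGER)""")
conn.executemany(
    "INSERT INTO QtrSales VALUES (?, ?, ?, ?, ?)",  # assumed sample rows
    [(1, 10, 20, 30, 40), (2, 5, 5, 5, 5)],
)

# Total sales for each quarter: one aggregate per column.
per_quarter = conn.execute(
    "SELECT SUM(Q1), SUM(Q2), SUM(Q3), SUM(Q4) FROM QtrSales").fetchone()

# Total sales for each ID: an arithmetic expression over the row's columns.
per_id = conn.execute(
    "SELECT ID, Q1 + Q2 + Q3 + Q4 FROM QtrSales ORDER BY ID").fetchall()

print(per_quarter)
print(per_id)
```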
