CHAPTERS 3-6: RELATIONAL DATA MODELS, RELATIONAL CONSTRAINTS, AND RELATIONAL ALGEBRA
Chapters 5-8
Flat file: A two-dimensional array of attributes or data items

ProductX          1   Bellaire    5
ProductY          2   Sugarland   5
ProductZ          3   Houston     5
Computerization  10   Stafford    4
Reorganization   20   Houston     1
Newbenefits      30   Stafford    4
Database Management System (DBMS): A generalized software system that is used to create, manage, and protect databases
Attribute: A named characteristic or property of an entity (= column header)
Entity: A “thing” in the real world with an independent existence. Physical existence: person, student, car
Domain: The set of valid atomic values for an attribute in a relation
e.g. SSN: set of 9 digits; GPA: 0 <= GPA <= 4.0
Atomic: each value in the domain is indivisible
Name (Fname, Minit, Lname): not atomic
Fname: atomic; Minit: atomic; Lname: atomic
RELATIONAL MODEL CONCEPTS
A Relation is a mathematical concept based on the ideas of sets
The model was first proposed by Dr. E.F. Codd of IBM Research in 1970 in the following paper: "A Relational Model of Data for Large Shared Data Banks," Communications of the ACM, June 1970
The above paper caused a major revolution in the field of database management and earned Dr. Codd the coveted ACM Turing Award
INFORMAL DEFINITIONS
Informally, a relation looks like a table of values.
A relation typically contains a set of rows.
The data elements in each row represent certain facts that correspond to a real-world entity or relationship. In the formal model, rows are called tuples.
Each column has a column header that gives an indication of the meaning of the data items in that column. In the formal model, the column header is called an attribute name (or just attribute).
FORMAL DEFINITIONS - SCHEMA
The Schema (or description) of a Relation:
Denoted by R(A1, A2, ..., An)
R is the name of the relation
The attributes of the relation are A1, A2, ..., An
Example:
CUSTOMER (Cust-id, Cust-name, Address, Phone#)
CUSTOMER is the relation name
Defined over the four attributes: Cust-id, Cust-name, Address, Phone#
Each attribute has a domain or a set of valid values. For example, the domain of Cust-id is 6 digit numbers.
FORMAL DEFINITIONS - TUPLE
A tuple is an ordered set of values (enclosed in angled brackets ‘< … >’)
Each value is derived from an appropriate domain.
A row in the CUSTOMER relation is a 4-tuple and would consist of four values, for example:
<632895, "John Smith", "101 Main St. Atlanta, GA 30332", "(404) 894-2000">
This is called a 4-tuple as it has 4 values.
A relation is a set of such tuples (rows).
FORMAL DEFINITIONS - DOMAIN
A domain has a logical definition:
Example: “USA_phone_numbers” are the set of 10 digit phone numbers valid in the U.S.
A domain also has a data-type or a format defined for it.
The USA_phone_numbers may have a format: (ddd)ddd-dddd where each d is a decimal digit.
Dates have various formats such as year, month, date formatted as yyyy-mm-dd, or as dd mm,yyyy etc.
The attribute name designates the role played by a domain in a relation:
Used to interpret the meaning of the data elements corresponding to that attribute
Example: The domain Date may be used to define two attributes named “Invoice-date” and “Payment-date” with different meanings
FORMAL DEFINITIONS - STATE
The relation state is a subset of the Cartesian product of the domains of its attributes; each domain contains the set of all possible values the attribute can take.
Example: attribute Cust-name is defined over the domain of character strings of maximum length 25: dom(Cust-name) is varchar(25)
The role these strings play in the CUSTOMER relation is that of the name of a customer.
FORMAL DEFINITIONS - SUMMARY
Formally, given R(A1, A2, ..., An):
r(R) ⊆ dom(A1) X dom(A2) X ... X dom(An)
R(A1, A2, …, An) is the schema of the relation
R is the name of the relation
A1, A2, …, An are the attributes of the relation
r(R): a specific state (or "value" or “population”) of relation R; this is a set of tuples (rows)
r(R) = {t1, t2, …, tm} where each ti is an n-tuple
ti = <v1, v2, …, vn> where each vj is an element of dom(Aj)
FORMAL DEFINITIONS - EXAMPLE
Let R(A1, A2) be a relation schema:
Let dom(A1) = {0,1}
Let dom(A2) = {a,b,c}
Then dom(A1) X dom(A2) is all possible combinations: {<0,a>, <0,b>, <0,c>, <1,a>, <1,b>, <1,c>}
The relation state r(R) ⊆ dom(A1) X dom(A2)
For example: r(R) could be {<0,a>, <0,b>, <1,c>}
This is one possible state (or “population” or “extension”) r of the relation R, defined over A1 and A2.
It has three 2-tuples: <0,a>, <0,b>, <1,c>
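The example above can be checked mechanically. The following is a minimal Python sketch, assuming tuples are modelled as Python tuples and a relation state as a set; the variable names are illustrative and not part of the slides.

```python
from itertools import product

# Domains taken from the example above.
dom_a1 = {0, 1}
dom_a2 = {"a", "b", "c"}

# dom(A1) X dom(A2): every possible 2-tuple over the two domains.
all_tuples = set(product(dom_a1, dom_a2))

# One possible relation state r(R); any subset of the product is a legal state.
state = {(0, "a"), (0, "b"), (1, "c")}

print(len(all_tuples))      # number of possible tuples: 2 * 3
print(state <= all_tuples)  # subset test: is the state inside the product?
```

Using a set for the state mirrors the definition: a relation has no duplicate tuples and no inherent row order.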
DEFINITION SUMMARY
Informal Terms               Formal Terms
Table                        Relation
Column Header                Attribute
All possible Column Values   Domain
Row                          Tuple
Table Definition             Schema of a Relation
Populated Table              State of the Relation
SUPER KEY: An attribute or a set of attributes that identifies an entity uniquely (may not be a minimal set)
Examples: {SSN}; {SSN, NAME}; {SSN, NAME, MAJOR}
CANDIDATE KEY: A super key such that no proper subset of its attributes is itself a super key, so a candidate key must be a minimal identifier.
Examples: STUID; SSN
PRIMARY KEY: The candidate key that is chosen to identify tuples in a relation. It must be unique and must exist (it cannot be null).
ALTERNATE KEY: A candidate key in a relation that is not selected as the primary key, e.g. if the primary key is SSN then STUID is an alternate key.
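The difference between a super key and a candidate key can be sketched in Python. This is only an illustration under stated assumptions: the sample rows are invented, rows are modelled as dictionaries, and the check runs against one sample state, whereas a real key must hold in every legal state of the relation.

```python
from itertools import combinations

def is_superkey(rows, attrs):
    """True if no two rows in this sample state agree on all of attrs."""
    seen = set()
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key in seen:
            return False
        seen.add(key)
    return True

def is_candidate_key(rows, attrs):
    """A super key none of whose non-empty proper subsets is a super key."""
    if not is_superkey(rows, attrs):
        return False
    for size in range(1, len(attrs)):
        for subset in combinations(attrs, size):
            if is_superkey(rows, subset):
                return False
    return True

# Hypothetical student rows, for illustration only.
students = [
    {"SSN": "111", "STUID": "S1", "NAME": "Ann", "MAJOR": "CS"},
    {"SSN": "222", "STUID": "S2", "NAME": "Ann", "MAJOR": "MIS"},
    {"SSN": "333", "STUID": "S3", "NAME": "Bob", "MAJOR": "CS"},
]

print(is_superkey(students, ["SSN", "NAME"]))       # unique, but not minimal
print(is_candidate_key(students, ["SSN", "NAME"]))  # SSN alone already suffices
print(is_candidate_key(students, ["SSN"]))          # minimal identifier
```

In this sample, {NAME, MAJOR} also happens to be unique, which shows why a single populated table cannot prove that an attribute set is a key.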
CONCATENATED (COMPOSITE) KEY: A primary key that is comprised of two or more attributes or data items
GRADE_REPORT(STUID, COURSE#, GRADE)
FOREIGN KEY: A non-key attribute in one relation that appears as the primary key (or part of the key) in another relation
EMPLOYEE(SSN, FNAME, MINIT, DNO)
DEPARTMENT(DNUMBER, DNAME, MANAGER)
Here DNO in EMPLOYEE is a foreign key that refers to DNUMBER, the primary key of DEPARTMENT.
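A foreign key implies a referential integrity check: every DNO value in EMPLOYEE should match some DNUMBER in DEPARTMENT. The sketch below assumes rows are dictionaries and that a null (None) foreign key is allowed; the function name `dangling_references` and the sample rows are illustrative, not from the slides.

```python
def dangling_references(child, fk, parent, pk):
    """Return child rows whose foreign key value matches no parent key.

    A null (None) foreign key is treated as allowed in this sketch.
    """
    parent_keys = {row[pk] for row in parent}
    return [row for row in child
            if row[fk] is not None and row[fk] not in parent_keys]

# Illustrative sample rows.
department = [
    {"DNUMBER": 1, "DNAME": "Headquarters"},
    {"DNUMBER": 5, "DNAME": "Research"},
]
employee = [
    {"SSN": "123456789", "FNAME": "John", "DNO": 5},
    {"SSN": "333445555", "FNAME": "Franklin", "DNO": 5},
    {"SSN": "888665555", "FNAME": "James", "DNO": 1},
]

print(dangling_references(employee, "DNO", department, "DNUMBER"))

# An employee assigned to a department that does not exist violates the constraint.
bad = employee + [{"SSN": "000000000", "FNAME": "Test", "DNO": 9}]
print(dangling_references(bad, "DNO", department, "DNUMBER"))
```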
SECONDARY KEY: A field that can have duplicate values, and that can be used as a search path by the users
Referential Integrity Constraints for COMPANY database
RELATIONAL ALGEBRA OVERVIEW
Relational algebra is the basic set of operations for the relational model
These operations enable a user to specify basic retrieval requests (or queries)
The result of an operation is a new relation, which may have been formed from one or more input relations
This property makes the algebra “closed” (all objects in relational algebra are relations)
RELATIONAL ALGEBRA OVERVIEW (CONTINUED)
The algebra operations thus produce new relations
These can be further manipulated using operations of the same algebra
A sequence of relational algebra operations forms a relational algebra expression
The result of a relational algebra expression is also a relation that represents the result of a database query (or retrieval request)
RELATIONAL ALGEBRA OVERVIEW
Relational Algebra consists of several groups of operations
Unary Relational Operations
  SELECT (symbol: σ (sigma))
  PROJECT (symbol: π (pi))
Relational Algebra Operations From Set Theory
  UNION (∪), INTERSECTION (∩), DIFFERENCE (or MINUS, -)
  CARTESIAN PRODUCT (x)
Binary Relational Operations
  JOIN (several variations of JOIN exist)
  DIVISION
Additional Relational Operations
  OUTER JOINS, OUTER UNION
  AGGREGATE FUNCTIONS (These compute a summary of information: for example, SUM, COUNT, AVG, MIN, MAX)
Unary Relational Operations: SELECT
The SELECT operation (denoted by σ (sigma)) is used to select a subset of the tuples from a relation based on a selection condition.
The selection condition acts as a filter
Keeps only those tuples that satisfy the qualifying condition
Tuples satisfying the condition are selected whereas the other tuples are discarded (filtered out)
Examples:
Select the EMPLOYEE tuples whose department number is 4:
σ DNO = 4 (EMPLOYEE)
Select the employee tuples whose salary is greater than $30,000:
σ SALARY > 30,000 (EMPLOYEE)
UNARY RELATIONAL OPERATIONS: SELECT
In general, the select operation is denoted by σ<selection condition>(R) where
the symbol σ (sigma) is used to denote the select operator
the selection condition is a Boolean (conditional) expression specified on the attributes of relation R
tuples that make the condition true are selected and appear in the result of the operation
tuples that make the condition false are filtered out and discarded from the result of the operation
UNARY RELATIONAL OPERATIONS: SELECT (CONTD.)
SELECT Operation Properties
The SELECT operation σ<selection condition>(R) produces a relation S that has the same schema (same attributes) as R
SELECT is commutative:
σ<condition1>(σ<condition2>(R)) = σ<condition2>(σ<condition1>(R))
Because of the commutativity property, a cascade (sequence) of SELECT operations may be applied in any order:
σ<cond1>(σ<cond2>(σ<cond3>(R))) = σ<cond2>(σ<cond3>(σ<cond1>(R)))
A cascade of SELECT operations may be replaced by a single selection with a conjunction of all the conditions:
σ<cond1>(σ<cond2>(σ<cond3>(R))) = σ<cond1> AND <cond2> AND <cond3>(R)
The number of tuples in the result of a SELECT is less than (or equal to) the number of tuples in the input relation R
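These properties can be tried out on a small sample. The sketch below assumes a relation is a list of dictionaries and a selection condition is a Python function returning True or False; the `select` helper and the four employee rows are illustrative.

```python
def select(rows, condition):
    """sigma<condition>(rows): keep only the tuples that satisfy condition."""
    return [row for row in rows if condition(row)]

# Illustrative employee rows.
employee = [
    {"SSN": "1", "DNO": 4, "SALARY": 25000},
    {"SSN": "2", "DNO": 4, "SALARY": 43000},
    {"SSN": "3", "DNO": 5, "SALARY": 40000},
    {"SSN": "4", "DNO": 5, "SALARY": 25000},
]

def in_dept_4(t):
    return t["DNO"] == 4

def high_paid(t):
    return t["SALARY"] > 30000

# Commutativity: the order of two selections does not matter.
a = select(select(employee, high_paid), in_dept_4)
b = select(select(employee, in_dept_4), high_paid)

# Cascade: two selections equal one selection with the conditions ANDed.
c = select(employee, lambda t: in_dept_4(t) and high_paid(t))

print(a == b == c)
print(len(c) <= len(employee))
```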
Select
Works on a single table: takes the rows that meet a specified condition and copies them into a new table
σ Condition(s) (Table name)
SQL (Structured Query Language)
SELECT * FROM (table name) WHERE condition 1 AND condition 2 AND condition 3…
Find employees who work for department number 5.
σ DNO = 5 (employee), where employee is the table and DNO = 5 is the condition
SQL: SELECT * FROM employee WHERE dno = 5;
Query tree: σ DNO=5 applied to the Employee table
σ(DNO=4 AND SALARY>25000) OR (DNO=5 AND SALARY>30000)(EMPLOYEE)
σ<cond1>(σ<cond2>(R)) = σ<cond2>(σ<cond1>(R))
σ<cond1>(σ<cond2>(. . .(σ<condn>(R)) . . .)) = σ<cond1> AND <cond2> AND . . . AND <condn>(R)
Project
Operates on a single table and produces a vertical subset of the table:
extracts the values of the specified columns
eliminates duplicate rows
places the values in a new table
π column1, column2, column3, … (table name)
SQL: SELECT column1, column2, column3, … FROM (table name)
E.g. Show the names of all employees
π fname, minit, lname (employee), where employee is the table and fname, minit, lname are the columns
SELECT fname, minit, lname FROM employee;
Query tree: π fname,minit,lname applied to the Employee table
Select & project
Show the names of all employees who work for department number 5
π fname, minit, lname (σ dno = 5 (employee))
SELECT fname, minit, lname FROM employee WHERE dno = 5;
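The composition of the two operators can be sketched in Python. This sketch assumes a relation is a list of dictionaries; the `select` and `project` helpers and the sample rows (including the middle initials) are illustrative. Note that the algebraic PROJECT removes duplicate rows, whereas a plain SQL SELECT keeps them unless DISTINCT is written.

```python
def select(rows, condition):
    """sigma<condition>(rows): keep the rows that satisfy condition."""
    return [row for row in rows if condition(row)]

def project(rows, attrs):
    """pi<attrs>(rows): keep only attrs and drop duplicate result rows."""
    result = []
    for row in rows:
        reduced = {a: row[a] for a in attrs}
        if reduced not in result:
            result.append(reduced)
    return result

# Illustrative employee rows.
employee = [
    {"fname": "John", "minit": "B", "lname": "Smith", "dno": 5},
    {"fname": "Franklin", "minit": "T", "lname": "Wong", "dno": 5},
    {"fname": "Alicia", "minit": "J", "lname": "Zelaya", "dno": 4},
]

# pi fname, minit, lname (sigma dno = 5 (employee))
names_in_5 = project(select(employee, lambda t: t["dno"] == 5),
                     ["fname", "minit", "lname"])
print(names_in_5)

# Projecting on dno alone shows duplicate elimination: two rows share dno 5.
departments = project(employee, ["dno"])
print(departments)
```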
Query tree: π fname,minit,lname applied to σ DNO = 5 applied to the Employee table
EXAMPLES OF APPLYING SELECT AND PROJECT OPERATIONS
PRODUCT (or Cartesian product) R1 x R2
R1 X R2 is a table whose width is the width of R1 plus the width of R2 and whose columns are the columns of R1 followed by the columns of R2. Each row of R1 is paired with every row of R2.
If R1 has X rows and M columns and R2 has Y rows and N columns, then R1 X R2 has X * Y rows and M + N columns
Student
ID   Fname   Lname
101  Jim     Smith
102  Tim     Brown
103  Babara  Houston

Credit_Hours
STUID  Hours
101    60
102    85
Cartesian Product: Student X Credit_Hours
ID   Fname   Lname    Stuid  Hours
101  Jim     Smith    101    60
101  Jim     Smith    102    85
102  Tim     Brown    101    60
102  Tim     Brown    102    85
103  Babara  Houston  101    60
103  Babara  Houston  102    85
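The row and column counts can be confirmed with a short Python sketch. It assumes rows are plain tuples and uses the Student and Credit_Hours data shown above; the variable names are illustrative.

```python
from itertools import product

student = [(101, "Jim", "Smith"), (102, "Tim", "Brown"), (103, "Babara", "Houston")]
credit_hours = [(101, 60), (102, 85)]

# Pair every Student row with every Credit_Hours row and concatenate the columns.
cartesian = [s + c for s, c in product(student, credit_hours)]

print(len(cartesian))     # 3 rows * 2 rows
print(len(cartesian[0]))  # 3 columns + 2 columns
```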
QUERY TREE FOR CARTESIAN PRODUCT: the X operator at the root, with Table1 and Table2 as its two inputs
Example of Query Tree
Theta Join
The result of performing a SELECT operation using a comparison operator theta (=, <, <=, >, >=, <>) on the product
Theta Join (>): rows of Student X Credit_Hours to be tested against the condition ID > STUID
ID   Fname   Lname    Stuid  Hours
101  Jim     Smith    101    60
101  Jim     Smith    102    85
102  Tim     Brown    101    60
102  Tim     Brown    102    85
103  Babara  Houston  101    60
103  Babara  Houston  102    85
Theta Join (ID > STUID)

Student
ID   Fname   Lname
101  Jim     Smith
102  Tim     Brown
103  Babara  Houston

Credit_Hours
STUID  Hours
101    60
102    85
103    50

Student X Credit_Hours (ID > STUID)
ID   Fname   Lname    Stuid  Hours
102  Tim     Brown    101    60
103  Babara  Houston  101    60
103  Babara  Houston  102    85
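A theta join can be sketched in Python as a product followed by a filter. The sketch assumes rows are tuples with the ID or STUID in position 0, and uses the three-row Credit_Hours table from this slide; the `theta_join` helper is illustrative.

```python
def theta_join(left, right, theta):
    """Pair every left row with every right row and keep pairs where theta holds."""
    return [l + r for l in left for r in right if theta(l, r)]

student = [(101, "Jim", "Smith"), (102, "Tim", "Brown"), (103, "Babara", "Houston")]
credit_hours = [(101, 60), (102, 85), (103, 50)]

# Theta is ">" on Student.ID versus Credit_Hours.STUID.
greater = theta_join(student, credit_hours, lambda s, c: s[0] > c[0])
for row in greater:
    print(row)

# With "=" as theta the same function computes an equijoin.
equal = theta_join(student, credit_hours, lambda s, c: s[0] == c[0])
```

Every pair whose ID exceeds the STUID qualifies, so student 103 is paired with both 101 and 102.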
QUERY TREE FOR THETA JOIN: X with the condition ID > STUID at the root, with Student and Credit_Hours as its two inputs
Equijoin
A theta join in which the comparison operator “theta” is equality (=)
Equijoin: rows of Student X Credit_Hours to be tested against the condition ID = STUID
ID   Fname   Lname    Stuid  Hours
101  Jim     Smith    101    60
101  Jim     Smith    102    85
102  Tim     Brown    101    60
102  Tim     Brown    102    85
103  Babara  Houston  101    60
103  Babara  Houston  102    85

Equijoin result: Student X Credit_Hours (ID = STUID)
ID   Fname   Lname    Stuid  Hours
101  Jim     Smith    101    60
102  Tim     Brown    102    85
QUERY TREE FOR EQUIJOIN: X with the condition ID = STUID at the root, with Student and Credit_Hours as its two inputs
Natural Join |X|
Is an equijoin in which the repeated column is eliminated
Usually the join is performed over columns with the same names
Student
ID   Fname   Lname
101  Jim     Smith
102  Tim     Brown
103  Babara  Houston

Credit_Hours (the key column is labelled ID on the first slide and STUID on the following ones)
STUID  Hours
101    60
102    85
103    50

Step 1: Student X Credit_Hours, with the rows that fail ID = STUID still to be removed
ID   Fname   Lname    Stuid  Hours
101  Jim     Smith    101    60
101  Jim     Smith    102    85
101  Jim     Smith    103    50
102  Tim     Brown    101    60
102  Tim     Brown    102    85
102  Tim     Brown    103    50
103  Babara  Houston  101    60
103  Babara  Houston  102    85
103  Babara  Houston  103    50

Step 2: Equi-join Student X Credit_Hours (ID = STUID), with the repeated Stuid column still to be removed
ID   Fname   Lname    Stuid  Hours
101  Jim     Smith    101    60
102  Tim     Brown    102    85
103  Babara  Houston  103    50

Step 3: Natural join Student |X| Credit_Hours
ID   Fname   Lname    Hours
101  Jim     Smith    60
102  Tim     Brown    85
103  Babara  Houston  50
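A natural join can be sketched in Python by matching on whatever attribute names the two relations share. The sketch assumes rows are dictionaries and that Credit_Hours names its key column ID, so that the two tables share exactly one column name; the `natural_join` helper is illustrative.

```python
def natural_join(left, right):
    """Join dict rows on all attribute names the two relations share.

    The shared attributes appear once in each result row.
    """
    result = []
    for l in left:
        for r in right:
            shared = set(l) & set(r)
            if all(l[a] == r[a] for a in shared):
                result.append({**l, **r})
    return result

student = [
    {"ID": 101, "Fname": "Jim", "Lname": "Smith"},
    {"ID": 102, "Fname": "Tim", "Lname": "Brown"},
    {"ID": 103, "Fname": "Babara", "Lname": "Houston"},
]
credit_hours = [
    {"ID": 101, "Hours": 60},
    {"ID": 102, "Hours": 85},
    {"ID": 103, "Hours": 50},
]

for row in natural_join(student, credit_hours):
    print(row)
```

Merging the two dictionaries is what removes the repeated column: both rows carry the same ID value, so it is stored once.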
QUERY TREE FOR NATURAL JOIN: |X| at the root, with Student and Credit_Hours as its two inputs
Semi-join: If R1 and R2 are tables
The semijoin of R1 and R2 is the natural join of R1 and R2 followed by a projection of the result onto the attributes of R1
Semijoin is not commutative
Create tables
create table student1 (id char(3) primary key, fname char(10), lname char(10));
insert into student1 values('101','Jim','Smith');
insert into student1 values('102','Tim','Brown');
insert into student1 values('103','Babara','Houston');

create table credit_hours (stuid char(3) primary key, hours number(3));
insert into credit_hours values('101',60);
insert into credit_hours values('102',85);
Left Semi-Join

Student
ID   Fname   Lname
101  Jim     Smith
102  Tim     Brown
103  Babara  Houston

Credit_Hours
STUID  Hours
101    60
102    85

Student |X Credit_Hours
ID   Fname   Lname
101  Jim     Smith
102  Tim     Brown
Right Semi-Join

Student
ID   Fname   Lname
101  Jim     Smith
102  Tim     Brown
103  Babara  Houston

Credit_Hours
STUID  Hours
101    60
102    85
104    100

Student X| Credit_Hours
ID   Hours
101  60
102  85
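Both semi-joins can be sketched with a single Python helper, since the right semi-join is the left semi-join with the operands swapped. The sketch assumes rows are dictionaries and that the join columns are named explicitly; the `semijoin` helper is illustrative.

```python
def semijoin(left, right, left_attr, right_attr):
    """Keep the left rows that have at least one matching right row."""
    keys = {r[right_attr] for r in right}
    return [l for l in left if l[left_attr] in keys]

student = [
    {"ID": 101, "Fname": "Jim", "Lname": "Smith"},
    {"ID": 102, "Fname": "Tim", "Lname": "Brown"},
    {"ID": 103, "Fname": "Babara", "Lname": "Houston"},
]
credit_hours = [
    {"STUID": 101, "Hours": 60},
    {"STUID": 102, "Hours": 85},
    {"STUID": 104, "Hours": 100},
]

# Left semi-join: Student rows that have credit hours recorded.
left_result = semijoin(student, credit_hours, "ID", "STUID")

# Right semi-join: Credit_Hours rows that belong to a known student.
right_result = semijoin(credit_hours, student, "STUID", "ID")

print(left_result)
print(right_result)
```

The two results have different columns and come from different tables, which is why the semijoin is not commutative.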
Outer Join:
Is an extension of a THETA JOIN, an EQUIJOIN, or a NATURAL JOIN
An outer join consists of all rows that appear in the usual theta join, plus an additional row for each tuple of the original tables that does not participate in the theta join.
Each such unmatched tuple is extended by assigning null values to the attributes of the other table.
Left outer join: unmatched rows from the first (left) table appear in the resulting table
Right outer join: unmatched rows from the second (right) table appear in the resulting table
Student
ID   Fname   Lname
101  Jim     Smith
102  Tim     Brown
103  Babara  Houston

Credit_Hours
STUID  Hours
101    60
102    85
104    100

Natural join: Student |X| Credit_Hours
ID   Fname   Lname    Hours
101  Jim     Smith    60
102  Tim     Brown    85

Left Outer Join of Student and Credit_Hours
ID   Fname   Lname    Hours
101  Jim     Smith    60
102  Tim     Brown    85
103  Babara  Houston  null

Right Outer Join of Student and Credit_Hours
ID    Fname   Lname   Stuid  Hours
101   Jim     Smith   101    60
102   Tim     Brown   102    85
null  null    null    104    100
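The padding with nulls can be sketched in Python. The sketch assumes rows are dictionaries, uses None for null, and keeps both join columns in the result as an equijoin would; the `left_outer_join` helper is illustrative. A right outer join is obtained by swapping the operands.

```python
def left_outer_join(left, right, left_attr, right_attr):
    """Equijoin that also keeps unmatched left rows, padded with None (null)."""
    right_cols = list(right[0].keys()) if right else []
    result = []
    for l in left:
        matches = [r for r in right if l[left_attr] == r[right_attr]]
        if matches:
            result.extend({**l, **r} for r in matches)
        else:
            # No partner: extend the row with nulls for the other table's columns.
            result.append({**l, **{c: None for c in right_cols}})
    return result

student = [
    {"ID": 101, "Fname": "Jim", "Lname": "Smith"},
    {"ID": 102, "Fname": "Tim", "Lname": "Brown"},
    {"ID": 103, "Fname": "Babara", "Lname": "Houston"},
]
credit_hours = [
    {"STUID": 101, "Hours": 60},
    {"STUID": 102, "Hours": 85},
    {"STUID": 104, "Hours": 100},
]

left_result = left_outer_join(student, credit_hours, "ID", "STUID")
right_result = left_outer_join(credit_hours, student, "STUID", "ID")

for row in left_result:
    print(row)
for row in right_result:
    print(row)
```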
Outer Join -- Oracle
Left-outer join
select * from student, credit_hours where id = stuid(+);
SELECT E.FNAME, E.LNAME, dependent_name
FROM EMPLOYEE E, DEPENDENT D
WHERE E.SSN = D.ESSN(+);
RIGHT-OUTER JOIN
select * from student, credit_hours where id(+) = stuid;
Sample SQL
Create view:
create view v_emp_dno as select fname, lname, dno from employee;
select * from v_emp_dno;
create view v_department as select dnumber, dname from department;
select * from v_department;
Cartesian product:
select * from v_emp_dno, v_department;
Equijoin (a natural join would also drop one of the two matching columns, dno or dnumber):
select * from v_emp_dno, v_department where dno = dnumber;
Left Outer join:
select fname, lname, ssn, essn, dependent_name from employee, dependent where ssn = essn (+);
Right Outer join:
select essn, dependent_name, fname, lname, ssn from employee, dependent where essn = ssn (+);
Set operations: Union, Difference, Intersection, Division
Union (U): the tables must be compatible, that is, they must have the same basic structure: the same number of attributes, with corresponding attributes drawn from the same domains.
The union of two relations is the set of tuples in either or both relations
EXAMPLE TO ILLUSTRATE THE RESULT OF UNION, INTERSECT, AND DIFFERENCE
SQL--Union
select ssn from employee where dno = 5
union
select distinct(essn) from dependent;
SSN
---------
123456789
333445555
666884444
453453453
4 rows selected

U

ESSN
---------
123456789
333445555
987654321
3 rows selected

=

SSN
---------
123456789
333445555
453453453
666884444
987654321
5 rows selected
Difference (-)
The difference between two relations is the set of tuples that belong to the first relation but not to the second relation.
SQL--Minus
select ssn from employee
minus
select distinct(essn) from dependent;
SSN
---------
123456789
333445555
999887777
987654321
666884444
453453453
987987987
888665555
8 rows selected

-

ESSN
---------
123456789
333445555
987654321
3 rows selected

=

SSN
---------
453453453
666884444
888665555
987987987
999887777
5 rows selected
Intersection (∩)
The intersection of two relations is the set of tuples that belong to both relations simultaneously.
Student1
ID   Fname   Lname
101  Jim     Smith
102  Tim     Brown
103  Babara  Houston

Student2
ID   Fname   Lname
101  Jim     Smith
102  Tim     Brown
105  Kim     Lee
110  Mike    Moore

Intersection: Student1 ∩ Student2
ID   Fname   Lname
101  Jim     Smith
102  Tim     Brown
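The three set operations map directly onto Python's built-in set type. The sketch below reuses the SSN, ESSN, and Student values shown above and assumes each relation is a set of values or tuples; the variable names are illustrative.

```python
# SSNs of employees in department 5 and ESSNs from DEPENDENT, as listed above.
dept5_ssn = {"123456789", "333445555", "666884444", "453453453"}
dependent_essn = {"123456789", "333445555", "987654321"}

# All eight employee SSNs, as listed in the difference example.
all_ssn = {"123456789", "333445555", "999887777", "987654321",
           "666884444", "453453453", "987987987", "888665555"}

union = dept5_ssn | dependent_essn    # tuples in either or both relations
difference = all_ssn - dependent_essn  # in the first relation but not the second

student1 = {(101, "Jim", "Smith"), (102, "Tim", "Brown"), (103, "Babara", "Houston")}
student2 = {(101, "Jim", "Smith"), (102, "Tim", "Brown"),
            (105, "Kim", "Lee"), (110, "Mike", "Moore")}
intersection = student1 & student2    # tuples that belong to both relations

print(sorted(union))
print(sorted(difference))
print(sorted(intersection))
```

Python only requires hashable elements, so union compatibility (same number of attributes, same domains) is the caller's responsibility in this sketch.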
Division (÷)
A binary operation that can be defined on two relations where the entire structure of one (the divisor) is a portion of the structure of the other (the dividend)
Division

S
C1  C2
a   1
b   1
c   2
d   4
a   3
b   3
e   6
a   9
b   9

R
C1
a
b

S / R
C2
1
3
9
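Division keeps the C2 values that are paired in S with every C1 value of R. A minimal Python sketch, assuming S is a set of (C1, C2) pairs and R is a set of C1 values; the `divide` helper is illustrative.

```python
def divide(s, r):
    """S / R for S(C1, C2) and R(C1): C2 values paired with every C1 in R."""
    candidates = {c2 for _, c2 in s}
    return {c2 for c2 in candidates if all((c1, c2) in s for c1 in r)}

s = {("a", 1), ("b", 1), ("c", 2), ("d", 4), ("a", 3),
     ("b", 3), ("e", 6), ("a", 9), ("b", 9)}
r = {"a", "b"}

print(sorted(divide(s, r)))
```

The values 2, 4, and 6 drop out because none of them appears together with both a and b.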
EXAMPLE OF DIVISION
AGGREGATE FUNCTIONS AND GROUPING
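Script F (ℱ): <group attributes> ℱ <function, attribute> (R), where the group attributes are written before ℱ and each function is paired with the attribute it is applied to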
Functions = sum, average, maximum, minimum, count
ALL EMPLOYEES (NO GROUP BY)
SELECT sum(salary), max(salary), min(salary), avg(salary)
FROM employee;

SUM(SALARY) MAX(SALARY) MIN(SALARY) AVG(SALARY)
----------- ----------- ----------- -----------
     281000       55000       25000       35125
EXAMPLE: RETRIEVE THE DEPARTMENT NUMBER, NUMBER OF EMPLOYEES, AND AVERAGE SALARY IN THE DEPARTMENT – GROUP BY DNO
RESULT(DNO, NUMBER_OF_EMPLOYEES, AVG_SAL) = DNO ℱ COUNT SSN, AVERAGE SALARY (EMPLOYEE)

SELECT dno, count(ssn), avg(salary)
FROM employee
GROUP BY dno
ORDER BY dno;

DNO COUNT(SSN) AVG(SALARY)
--- ---------- -----------
  1          1       55000
  4          3       31000
  5          4       33250
GROUP BY
SELECT dno, sum(salary), max(salary), min(salary), avg(salary)
FROM employee
GROUP BY dno;

DNO SUM(SALARY) MAX(SALARY) MIN(SALARY) AVG(SALARY)
--- ----------- ----------- ----------- -----------
  1       55000       55000       55000       55000
  5      133000       40000       25000       33250
  4       93000       43000       25000       31000
DNO ℱ COUNT SSN, AVERAGE SALARY (EMPLOYEE)

RESULT
DNO  COUNT_SSN  AVERAGE_SALARY
5    4          33250
4    3          31000
1    1          55000
IF GROUPING ATTRIBUTES ARE NOT SPECIFIED
ℱ COUNT SSN, AVERAGE SALARY (EMPLOYEE)

RESULT
COUNT_SSN  AVERAGE_SALARY
8          35125
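Grouping and aggregation can be sketched in Python with a dictionary of groups. The individual salaries below are an assumption: they are not listed on the slides and were chosen only so that the per-department counts, sums, and averages match the results shown above. The `aggregate` helper is illustrative.

```python
from collections import defaultdict

def aggregate(rows, group_attr=None):
    """Return {group value: (count, average salary)}.

    With group_attr=None all rows fall into one group keyed by None,
    which corresponds to the case where no grouping attributes are specified.
    """
    groups = defaultdict(list)
    for row in rows:
        key = row[group_attr] if group_attr is not None else None
        groups[key].append(row["SALARY"])
    return {key: (len(sal), sum(sal) / len(sal)) for key, sal in groups.items()}

# Assumed salaries, consistent with the totals shown in the slides.
employee = [
    {"DNO": 5, "SALARY": 30000}, {"DNO": 5, "SALARY": 40000},
    {"DNO": 5, "SALARY": 38000}, {"DNO": 5, "SALARY": 25000},
    {"DNO": 4, "SALARY": 25000}, {"DNO": 4, "SALARY": 43000},
    {"DNO": 4, "SALARY": 25000}, {"DNO": 1, "SALARY": 55000},
]

print(aggregate(employee, "DNO"))  # one (count, average) pair per department
print(aggregate(employee))         # a single group covering all employees
```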
SELECT sum(salary), max(salary), min(salary), avg(salary)
FROM employee, department
WHERE dno = dnumber
AND dname = 'Research';

SUM(SALARY) MAX(SALARY) MIN(SALARY) AVG(SALARY)
----------- ----------- ----------- -----------
     133000       40000       25000       33250
View
Create View V_Dno5 as (select fname, lname, dno from employee where dno = 5)
view V_DNO5 created.

Select * from V_DNO5;

FNAME           LNAME           DNO
--------------- --------------- ---
John            Smith             5
Franklin        Wong              5
Ramesh          Narayan           5
Joyce           English           5