seven cs of communication
DESCRIPTION
Usefull LectureTRANSCRIPT
ISOM
Structure of this semester
Database Fundamentals
Relational Model
Normalization
ConceptualModeling Query
Languages
AdvancedSQL
Transaction Management
Java DB Applications –JDBC
DataMining
0. Intro 1. Design 3. Applications 4. AdvancedTopics
Newbie Users ProfessionalsDesigners
MIS710
2. Querying
Developers
ISOM
Today’s Buzzwords
• Relational Model• Superkey, Candidate Key, Primary Key
and Foreign Key• Entity Integrity Rule• Referential Integrity Rule• Normalization• First, Second, Third, and Boyce-Codd
Normal Forms• Unnormalization
ISOM
Objectives of this lecture
• Understand the Relational Model and its properties• Understand the notion of keys• Understand the use and importance of referential
integrity• Provide an alternative way to design relations using
semantics rather than concepts• Take an existing “flat file” design and creating a
relational design from it through the process of Normalization
• Identify sources of problems (or anomalies) within a given relational design
• Argue about improvements to designs created by others
ISOM
Relational Data Model
• Originally proposed by Codd in 1970• Based on mathematical set theory
ID Name Age Address GPAS1 Jose 21 Stoned Hill 3.1S2 Alice 18 BigHead 3.2S3 Lin 32 Done-Audy 2.9S4 Joyce 20 Atlanta 3.7S5 Sunil 27 Mare-iota 3.2Tuples
AttributesAttributeValues
Attribute NamesRelation
ISOM
Relation: Properties
• A relation is a set of tuples• A tuple is a set of attribute-value properties
(relations) Ordering of attributes is immaterial Ordering of Tuples is immaterial
• Tuples are distinct from one another• Attributes contain atomic values only
Emp# Name AddressE1 Jose' 'M.' 'Smith' 3413 Main Street', 'Atlanta', GA
ISOM
Attributes
• Attribute nameAttribute names are unique within a relation
• Attribute domainSet of all possible values an attribute may
takeDomain (GPA) = Domain (name) =Domain (DateOfBirth) = Domain (year)
• Number of attributes: degree of the relation
ISOM
Tuples
• Aggregation of attribute valuesS1 = (s1, ‘Jose’, 21, ‘StonedHill’, 3.1)S2 = (s2, ‘Alice’, 18, ‘BigHead’, 3.2)
• Cardinality: Number of tuples in a relation
• What is the difference between the cardinality and the degree?
ID Name Age Address GPAS1 Jose 21 Stoned Hill 3.1S2 Alice 18 BigHead 3.2S3 Lin 32 Done-Audy 2.9S4 Joyce 20 Atlanta 3.7S5 Sunil 27 Mare-iota 3.2
ISOM
Primary Keys
• Superkey: SK, a subset of attributes of R, satisfying Uniqueness, that is, no two tuples have the same combination of values for these attributes
• Candidate Key: K, a superkey SK, satisfying minimality, that is, no component of K can be eliminated without destroying the uniqueness property.
• Primary Key: PK, the selected Candidate key, K.
• Can a primary key be composed of multiple attributes?• Can a relation have multiple primary keys?
ISOM
Keys - example
• Superkeys?
• Candidate keys?
• Primary key?
Disk: (ISBN#, Artist_name, Album_name, Year, Producer, Genre, time, price)
ISOM
Entity Integrity Rule
• The primary key of a base relation cannot contain a NULL value.
• Enforcement of the rule:An update which results in a NULL value
in the primary key must be rejected.
• Are the following ok?
Course Section Meets Enrolled201 1 MW 20201 NULL TTh 25
NULL NULL MWF 18
Primary Key
ISOM
Foreign Key
Physician (ID, Name, …) Patient (ID, Name, PhysID*, …)
Club (ID, Name, …) Player (ID, Name, ?*, …)
Order (OrdID, Date, …, ?*) Customer (ID, Name, …, ?*)
Dept (DeptID, Name, …, ?*) Employee (EID, Name, …, ?*)
• Attribute(s) of one relation that reference(s) the PK of another relation
• FK may or may not be (a part of) the PK of this relation
Course (CourseID, Name, …, ?*) Class (ClassID, Meets, …, ?*)Student (SID, Name, …, ?*) Registration (?)
• Can an FK refer to a part of the PK of another relation?• Can an FK refer to a PK of the same relation?
ISOM
Foreign Key ..
• FK and referenced PK may have different names
• The values of FK must draw from the value set of PK
• How do we define the Domain of an FK?• Can an FK have a NULL value?• What can we enforce with PKs and FKs?
Domain
Value Set Domain
Primary Key Foreign Key
ISOM
Referential Integrity Rule
• If FK is the foreign key of a relation R2, which matches the primary key PK of the relation R1, then: the FK value must match the PK value in some tuple of R1,
or the FK value may be NULL, but only if the FK is not (a part
of) the PK of R2.
• Enforcement of the Rule An update on either a referenced PK or an FK must satisfy
the rule. Otherwise, the operation is rejected.
• Which operation on the primary key may violate this rule?• Which operation on the foreign key may violate this rule?
ISOM
Referential Integrity Enforcement
• If an operation violates referential integrity:Restrict
• reject the operation
Cascade• try to propagate the operation to all dependent FK
values, if it is not possible, reject the operation
Nullify (or Default)• set all dependent FK values to NULL (or a default
value), if that is not possible, reject the operation
• Cases for each of the above situations?
ISOM
Creating Relations
create table STUDENT (ID char (11) not null primary key,Name char(30) not null,age int,GPA number (2,1));
create table COURSE (courseno char (6) not null primary key,coursename char(30) not null,credithours number (2,1));
create table REGISTRATION (ID references STUDENT (ID)
on delete cascade,CourseNum references COURSE (courseno),primary key (ID, CourseNum) );
ISOM
Normalization - Motivating Example
• Is there any redundant data?
• Can we insert a new course# with a new textbook?
• What should be done if ‘CIS’ is changed to ‘MIS’?
• What would happen if we remove all CIS 800 students?
SID Name Grade Course# Text Major Depts1 Joseph A CIS800 b1 CIS CISs1 Joseph B CIS820 b2 CIS CISs1 Joseph A CIS872 b5 CIS CISs2 Alice A CIS800 b1 CS MCSs2 Alice A CIS872 b5 CS MCSs3 Tom B CIS800 b1 Acct Accts3 Tom B CIS872 b5 Acct Accts3 Tom A CIS860 b1 Acct Acct
ISOM
Why Normalization?
• Poor Relation Design causes Anomalies Insertion anomalies - Insertion of some piece of
information cannot be performed unless other irrelevant information is added to it.
Update anomalies - Update of a single piece of information requires updates to multiple tuples.
Deletion anomalies - Deletion of a piece of information removes other unrelated but necessary information.
• Normalization improves the design to remove these anomalies
ISOM
Why Normalization?
• Benefitscontain minimum amount of redundancyallow users to insert, delete and modify tuples
in the relation without errors or inconsistencies. improve quality of information in the databasedecrease storage space for the database
• Costsmay contribute to performance problemsmay require more storage in some cases
ISOM
Unnormalized Relation
• Create a ‘Definition’ for this relation.• Do you see any problems in the definition?• Do you see any anomalies in the data?
STUDENT STUDENT COURSE COURSE INSTR ROOM CREDITS GRADEID NAME ID NAME NAME
224 Waters CIS20 Intro CBIS Greene 205G 5 ACIS40 Database Mgt Hong 311S 5 BCIS50 Sys.Analysis Purao 139S 5 B
351 Byron CIS30 COBOL Brown 629G 3 BCIS50 Sys.Analysis Purao 139S 5 C
421 Smith CIS20 Intro CBIS Greene 205G 5 BCIS30 COBOL Brown 629G 3 BCIS50 Sys.Analysis Purao 139S 5 B
ISOM
Normal Forms
Unnormalized Relation
First Normal Form
Second Normal Form
Third Normal Form
Higher Order Forms
Only atomic attributes
Remove nonkey dependency
Remove transitive dependency
Dependency preservation: BCNFRemove Multi-valued Dependencies: 4NFRemove Join Dependencies: 5NF
NF2
1NF
2NF
3NF
BCNF
ISOM
The Basis of Normalization
• Functional Dependency (FD)Consider two attributes, X and Y, and two
arbitrary tuples r1 and r2 of a relation R.
• Y is functionally dependent on X iff:
value of x in r1 = value of x in r2implies
value of Y in r1 = value of Y in r2
• Also stated as: R.X R.Y or X Y
ISOM
Properties of FDs
• If R.X R.Y or X Y X is called the determinant of Y. X may or may not be the key attribute of R. A FD changes with its semantic meaning
• Name Address?
X and Y may be composite X and Y may be mutually dependent on each other
• Husband Wife, Wife Husband
The same Y value may occur in multiple tuples• Course# Text
ISOM
Fully Functional Dependencies
• When is X Y a FFD?When Y is not functionally dependent on any proper subset
of X
• X Y is a fully functional dependency ( FFD )( SID, Course# ) Name? ( SID, Course# )
Grade?
( SID, Name ) Major? ( SID, Name ) SID?
• By default, the term FD refers to FFD
ISOM
Transitive Dependencies
• Given attributes X, Y, and Z of a relation R,• Z is transitively dependent on X (X Z)
iff X Y and Y Z
• For example:SID Dept, SID Major,
Dept School, Major Dept
• Do you see any Transitive Functional Dependencies?
ISOM
Some Inference Rules for FDs
• An FD is redundant if it can be derived from other FDs based on a set of inference rules. Some of these rules are:
• Reflexive rule: If X Y, then X Y X always determines a subset of itself.
• Augmentation rule: If X Y, then XZ YZ Adding an attribute(s) on both side does not change the FD.
• Transitive rule: If X Y & Y Z, then X Z Functional dependencies can be ‘chained’.
• Decomposition rule: If X YZ, then X Y and X Z• Given: { SID Name, SID Major, Major Dept }, which ones
is/are redundant?SID School, SID Dept, Dept SchoolSID ( Name, Major ), (SID, Name) (Major, Name)SID SID, SID (Name, SID)
ISOM
First Normal Form
• DEFINITIONA relation R is in first normal form (1NF) if and
only if all underlying domains contain atomic values only.
• TranslationTo be in first normal form the table must not
contain any repeating attributes.
• ImplicationAre all ‘relations’ in First Normal Form (1NF) ?
ISOM
Example - 1NF
The ‘unnormalized’ relation has been decomposed in two.
• What are the PKs?
StudentID Course# Course Title Instrname ROOM CREDITS GRADE224 CIS20 Intro CBIS Greene 205G 5 A224 CIS40 Database Mgt Hong 311S 5 B224 CIS50 Sys.Analysis Purao 139S 5 B351 CIS30 COBOL Brown 629G 3 B351 CIS50 Sys.Analysis Purao 139S 5 C421 CIS20 Intro CBIS Greene 205G 5 B421 CIS30 COBOL Brown 629G 3 B421 CIS50 Sys.Analysis Purao 139S 5 B
StudentID StudentName224 Waters251 Byron421 Smith
Relation: Student-CourseRelation: Student
ISOM
Anomalies (with only 1NF)
• Insertion Anomaly A new course cannot be inserted in the database (relation
Student-Course) until a student registers for that course.
• Update Anomaly If the instructor of a course is changed, this fact would have
to be noted at many places in the database (many tuples of the relation Student-Course).
• Deletion Anomaly Withdrawal of all students from an existing course (that is,
deletion of related tuples from the relation Student-Course) will result in unwarranted removal of that course from the database.
ISOM
Anomalies in 1NF
Course (SID, Name, Grade, Course#, Text, Major, Dept)
• 1NF Relations have anomaliesRedundant Information ?Update Anomalies ? Insertion Anomalies ?Deletion Anomalies ?
Major
Dept
SID
Course#
Name
Grade
Text
ISOM
Second Normal Form
• DEFINITIONA relation R is in second normal form (2NF) if
and only if it is in 1NF and every nonkey attribute is dependent on the full primary key.
• TranslationA table is in second normal form if there are no
partial dependencies.
• ImplicationWhat kinds of primary keys may lead to a
violation of the Second Normal Form (2NF) ?
ISOM
Bubble Chart
• Reconsider the example ..
StudentId+CourseId
StudentName
CourseTitle
Credits
Instructor
Classroom
Grade
ISOM
Dealing with Compound Keys
• Revised Bubble Chart
StudentId
StudentName
CourseTitle
Credits
Instructor
Classroom
Grade
CourseId
ISOM
Example - 2NF
STUDENT STUDENTID NAME
224 Waters251 Byron421 Smith
STUDENT COURSE GRADEID ID
224 CIS20 A224 CIS40 B224 CIS50 B351 CIS30 B351 CIS50 C421 CIS20 B421 CIS30 B421 CIS50 BCOURSE COURSE CREDITS
ID TITLECIS20 Intro to CIS 5CIS30 Java 3CIS40 DBMS 5CIS50 Systems Analysis 5
ISOM
Anomalies with (only) 2NF
• Insertion anomaly Information about a faculty (potential advisor) cannot be
added to the database unless a student is assigned to him/her.
• Update anomaly If the advisor’s office location or phone were changed, many
tuples would need to be changed.• Deletion anomaly
If all students assigned to an advisor graduate, information about the advisor will disappear from the database.
STUDENT STUDENT STATUS ADVISOR ADVISOR ADVISOR TOTALID NAME OFFICE PHONE CREDITS
224 Waters Junior Young CBA221 726104 105351 Byron Soph Greene CBA215 718434 77421 Smith Junior Young CBA221 726104 97
ISOM
Third Normal Form
• DEFINITION A relation R is in third normal form (3NF) if and only if
it is in 2NF and every nonkey attribute is non-transitively dependent on the primary key.
• Translation A table is in Third Normal Form if every non-key
attribute is determined by the key, and nothing else.
• Implication How many total attributes must the relation have for a
possible violation of the Third Normal Form (3NF) ?
ISOM
3NF Example
• Chalk out the relations.
How do you maintain student-advisor relation?
StudentName
Status
TotalCredits
AdvisorOffice
AdvisorPhone
StudentId
Advisor
Advisor
ISOM
Boyce-Codd Normal Form (BCNF)
• Update anomalies occur in an 3NF relation R ifR has multiple candidate keys,Those candidate keys are composite, andThe candidate keys are overlapped.
Computer-Lab (SID, Account, Class, Hours)
• A relation R is in BCNF iff every determinant is a candidate key.
ISOM
The Normalization Process
1. Flatten the Table Completely (no composite columns)
2. Find the Key and “all” FDs (well as many as you can possibly detect)
3. Find Partial Dependencies and decompose relation using them (2NF)
4. Find Transitive dependencies and decompose using them (3NF)
5. Remember – this is not a deterministic method – depends on the order in which FDs are chosen, so same Relation, same set of FDs can lead to different decompositions!
ISOM
Lossless Decomposition
• A bad decomposition loses information
• In a good decomposition The join of decomposed relations restores the original
relation Decomposed relations can be maintained independently
• Rissanen’s rule for non-loss decomposition: Two projections R1 and R2 of a relation R are independent iff: Every FD in R can be logically deduced from those in R 1
and R 2 , and The common attributes of R 1 and R 2 form a candidate
key for at least one of the pair.
ISOM
Higher Normal Forms
• Fourth Normal FormMultivalued Dependencies (Fagin 1977)
• Fifth Normal FormJoin Dependencies (Fagin 1979)
• Other Dependencies Inclusion Dependencies (Casanova 1981)Template Dependencies (Sadri 1982)Domain-Key Normal Form (Fagin 1981)