Copyright © 2007 Robinson College of Business, Georgia State University
David S. McDonald, Ph.D. – Director of Emerging Technologies
Tel: 404-413-7368; e-mail: [email protected]
Relation Normalization
Why Normalization?
Functional Dependencies.
First, Second, and Third Normal Forms.
Denormalization
Why Normalization?
An ill-structured relation contains redundant data.
Data redundancy causes modification anomalies:
Insertion anomalies -- Suppose we want to enter SCUBA as an activity that costs $100; we can't until a student signs up for it
Update anomalies -- If we change the price of swimming for student 150, there is no guarantee that student 200 will pay the new price
Deletion anomalies -- If we delete Student 100, we lose not only the fact that he/she is a skier, but also the fact that skiing costs $200
Normalization is the process used to remove modification anomalies
ACTIVITY
SID   Activity   Fee
100   Skiing     200
150   Swimming    50
175   Squash      50
200   Swimming    50
How can this table be changed to fix these problems?
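One possible answer, sketched in Python: split ACTIVITY so that each activity's fee is stored exactly once. The relation names STUDENT_ACTIVITY and ACTIVITY_FEE are assumptions made for illustration and do not come from the slides.

```python
# Rows of the ACTIVITY relation from the slide: (SID, Activity, Fee).
activity = [
    (100, "Skiing", 200),
    (150, "Swimming", 50),
    (175, "Squash", 50),
    (200, "Swimming", 50),
]

# Hypothetical decomposition (names are assumptions, not from the slides):
# STUDENT_ACTIVITY(SID, Activity) and ACTIVITY_FEE(Activity, Fee).
student_activity = {(sid, act) for sid, act, _ in activity}
activity_fee = {act: fee for _, act, fee in activity}

# Insertion: SCUBA can be priced before any student signs up for it.
activity_fee["SCUBA"] = 100

# Update: the swimming fee is changed in one place for every swimmer.
activity_fee["Swimming"] = 60

# Deletion: removing student 100 no longer loses the skiing fee.
student_activity.discard((100, "Skiing"))

# Rejoin the two relations to see what each remaining student pays.
rejoined = sorted((sid, act, activity_fee[act]) for sid, act in student_activity)
print(rejoined)                # [(150, 'Swimming', 60), (175, 'Squash', 50), (200, 'Swimming', 60)]
print(activity_fee["Skiing"])  # 200
```

Each of the three anomalies disappears because the fact "activity costs fee" is no longer repeated per student.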
Why Normalization...
Course
SID  Name    Grade  Course#  Text  Major  Dept
s1   Joseph  A      CIS8110  b1    CIS    CIS
s1   Joseph  B      CIS8120  b2    CIS    CIS
s1   Joseph  A      CIS8140  b5    CIS    CIS
s2   Alice   A      CIS8110  b1    MCS    CS
s2   Alice   A      CIS8140  b5    MCS    CS
s3   Tom     B      CIS8110  b1    Acct   Acct
s3   Tom     B      CIS8140  b5    Acct   Acct
s3   Tom     A      CIS8680  b1    Acct   Acct
Is there any redundant data?
Insertion anomalies?
Update anomalies?
Deletion anomalies?
Functional Dependencies
Given two attributes, X and Y, of a Table T, attribute Y is functionally dependent on attribute X iff each attribute X value must always occur with the same attribute Y value in T.
Employee.ID -> Employee.LastName
List all FDs in the Course relation:
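A minimal Python sketch of testing a candidate FD against the rows of a table. The helper name holds_fd is an assumption for illustration; the rows are the Course instance shown earlier, restricted to five of its columns. Note that sample rows can only refute an FD, never prove one, because FDs follow from attribute semantics rather than from the data at hand.

```python
def holds_fd(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # Remember the first rhs value seen for each lhs value.
        if seen.setdefault(x, y) != y:
            return False
    return True


columns = ("SID", "Name", "Grade", "Course#", "Text")
data = [
    ("s1", "Joseph", "A", "CIS8110", "b1"),
    ("s1", "Joseph", "B", "CIS8120", "b2"),
    ("s1", "Joseph", "A", "CIS8140", "b5"),
    ("s2", "Alice", "A", "CIS8110", "b1"),
    ("s2", "Alice", "A", "CIS8140", "b5"),
    ("s3", "Tom", "B", "CIS8110", "b1"),
    ("s3", "Tom", "B", "CIS8140", "b5"),
    ("s3", "Tom", "A", "CIS8680", "b1"),
]
course = [dict(zip(columns, r)) for r in data]

print(holds_fd(course, ["SID"], ["Name"]))              # True
print(holds_fd(course, ["Course#"], ["Text"]))          # True
print(holds_fd(course, ["SID"], ["Grade"]))             # False
print(holds_fd(course, ["SID", "Course#"], ["Grade"]))  # True
```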
Functional Dependencies...
Attribute X is called the determinant of attribute Y.
X and Y may be composite (made up of more than one attribute).
Dependency relationships change with attribute semantics.
Attribute X and Attribute Y could be mutually dependent on each other.
Husband --> Wife, Wife --> Husband, Husband <--> Wife
Attribute X may or may not be the primary key of the table.
Attribute Y value can occur in more than one field in a table.
Course# --> Text
Fully Functional Dependencies
A fully functional dependence (FFD) exists between attributes X and Y if attribute Y is functionally dependent on X but not on any proper subset of attribute(s) X.
( SID, Course# ) --> Name?
( SID, Course# ) --> Grade?
( SID, Name ) --> Major?
( SID, Name ) --> SID?
Note that if X is not composite, then X --> Y is always an FFD.
By default, the term FD refers to FFD.
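The FFD definition can be sketched in Python as follows: X --> Y must hold, and no non-empty proper subset of X may determine Y. The helper names holds_fd and is_full_fd and the four-column sample (taken from the Course instance shown earlier) are assumptions for illustration. As with any instance-based check, the result describes this sample only.

```python
from itertools import combinations


def holds_fd(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False
    return True


def is_full_fd(rows, lhs, rhs):
    """True if lhs --> rhs holds and no non-empty proper subset of lhs determines rhs."""
    if not holds_fd(rows, lhs, rhs):
        return False
    for size in range(1, len(lhs)):
        for subset in combinations(lhs, size):
            if holds_fd(rows, subset, rhs):
                return False  # a partial dependency exists
    return True


columns = ("SID", "Course#", "Name", "Grade")
data = [
    ("s1", "CIS8110", "Joseph", "A"),
    ("s1", "CIS8120", "Joseph", "B"),
    ("s1", "CIS8140", "Joseph", "A"),
    ("s2", "CIS8110", "Alice", "A"),
    ("s2", "CIS8140", "Alice", "A"),
    ("s3", "CIS8110", "Tom", "B"),
    ("s3", "CIS8140", "Tom", "B"),
    ("s3", "CIS8680", "Tom", "A"),
]
rows = [dict(zip(columns, r)) for r in data]

print(is_full_fd(rows, ("SID", "Course#"), ("Name",)))   # False: SID alone determines Name
print(is_full_fd(rows, ("SID", "Course#"), ("Grade",)))  # True
print(is_full_fd(rows, ("SID",), ("Name",)))             # True: a single attribute has no proper subset to test
```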
Transitively Functional Dependencies
Given attributes X, Y, and Z of a Table T, attribute Z is transitively dependent on attribute(s) X iff
Attribute X --> Attribute Y and Attribute Y --> Attribute Z.
Given SID --> Dept and Dept --> College,
SID --> ?
Given SID --> Major and Major --> Dept,
SID --> ?
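Chains like these can be followed mechanically by computing an attribute closure: start from X and keep adding every attribute that some FD lets you derive. The sketch below is a standard closure loop; the function name closure and the (lhs, rhs) tuple encoding of FDs are assumptions for illustration.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply an FD once its whole left side is already derived.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# FDs from the second example: SID --> Major and Major --> Dept.
fds = [(("SID",), ("Major",)), (("Major",), ("Dept",))]

print(sorted(closure({"SID"}, fds)))    # ['Dept', 'Major', 'SID']
print(sorted(closure({"Major"}, fds)))  # ['Dept', 'Major']
```

Dept appears in the closure of SID only by way of Major, which is exactly what makes the dependency transitive.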
A Graphical Representation
Course (SID, Name, Grade, Course#, Text, Major, Dept)
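[Figure: dependency diagram of the Course relation, showing the attributes SID, Name, Major, Dept, Course#, Grade, and Text and marking the primary key.]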
First Normal Form (1NF)
A Table T is in 1NF iff all attribute domains contain atomic (single) values only.
A table in 1NF can still have modification anomalies; in other words, the process of normalization must continue.
INVENTORY (Part#, WHouse#, WAddress, QTY)
[Figure: dependency diagram of the INVENTORY relation, showing the attributes Part#, WHouse#, WAddress, and QTY.]
Second Normal Form (2NF)
A table T is in 2NF iff T is in 1NF and every non key attribute is fully dependent on the primary key (i.e. has no partial functional dependencies).
The term non key attribute refers to any attribute that does not belong to any candidate key.
INVENTORY (Part#, WHouse#, WAddress, QTY)
[Figure: dependency diagram of the INVENTORY relation, showing the attributes Part#, WHouse#, WAddress, and QTY.]
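A rough Python sketch of removing partial dependencies at the schema level. It assumes (this is not stated explicitly on the slide) that the primary key of INVENTORY is (Part#, WHouse#) and that the FDs are (Part#, WHouse#) --> QTY and WHouse# --> WAddress. The function name to_2nf is hypothetical, and the check looks only at the FDs as listed rather than at everything they imply.

```python
def to_2nf(attributes, key, fds):
    """Split off attributes that depend on only part of the key.

    Returns a list of schemas (tuples of attribute names); the first one
    is what remains of the original relation.
    """
    key = frozenset(key)
    remaining = list(attributes)
    relations = []
    for lhs, rhs in fds:
        lhs_set = frozenset(lhs)
        # A partial dependency: the determinant is a proper subset of the key.
        if lhs_set and lhs_set < key:
            moved = [a for a in rhs if a not in key and a in remaining]
            if moved:
                relations.append(tuple(lhs) + tuple(moved))
                for a in moved:
                    remaining.remove(a)
    relations.insert(0, tuple(remaining))
    return relations


attributes = ("Part#", "WHouse#", "WAddress", "QTY")
key = ("Part#", "WHouse#")                 # assumed primary key
fds = [
    (("Part#", "WHouse#"), ("QTY",)),      # assumed FD
    (("WHouse#",), ("WAddress",)),         # assumed FD
]

print(to_2nf(attributes, key, fds))
# [('Part#', 'WHouse#', 'QTY'), ('WHouse#', 'WAddress')]
```

Under those assumptions the warehouse address moves to its own relation keyed by WHouse#, and only QTY stays with the composite key.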
Modification Anomalies in 2NF
2NF tables have modification anomalies:
Redundant Information?
Update anomalies?
Insertion anomalies?
Deletion anomalies?
Which FD causes the redundant data?
INVENTORY
Part#  WHouse#  WAddress    QTY
123    4        Atlanta      10
456    5        Birmingham    6
456    2        Columbus     10
123    7        Oakland       8
235    1        Denver        2
Third Normal Form (3NF)
A table T is in 3NF iff T is in 2NF and every non key attribute is non transitively dependent on the primary key.
Student (SID, Name, Major, Dept)
Discussion:
If a relation does not have any non-key attribute, would it automatically be in 3NF?
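A small Python sketch of decomposing Student into 3NF, using the FDs SID --> Major and Major --> Dept given earlier. The sample rows and the relation names student_major and major_dept are hypothetical, chosen only to illustrate the idea. The final check confirms that joining the two projections on Major gives back exactly the original rows.

```python
# Hypothetical Student rows: (SID, Name, Major, Dept).
student = [
    ("s1", "Joseph", "CIS", "CIS"),
    ("s2", "Alice", "MCS", "CS"),
    ("s3", "Tom", "Acct", "Acct"),
    ("s4", "Maria", "MCS", "CS"),  # second MCS student: Dept is stored redundantly
]

# Project onto the two 3NF schemas (names are assumptions):
# student_major(SID, Name, Major) and major_dept(Major, Dept).
student_major = {(sid, name, major) for sid, name, major, _ in student}
major_dept = {(major, dept) for _, _, major, dept in student}

# Natural join on Major to rebuild the original relation.
rejoined = {
    (sid, name, major, dept)
    for sid, name, major in student_major
    for m, dept in major_dept
    if m == major
}

print(rejoined == set(student))  # True: no rows lost, none invented
print(len(major_dept))           # 3: each Major --> Dept fact is stored once
```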
Modification Anomalies in 3NF
LOCATION (Employee, Department, Location)
Redundant Information?
Update anomalies?
Insertion anomalies?
Deletion anomalies?
All determinants?
[Figure: dependency diagram of the LOCATION relation, showing the attributes Employee, Department, and Location.]
Denormalization
Denormalization is needed if
Relations in higher normal form (not mentioned here) cause a performance problem,
Denormalization can speed up the data retrieval, and
Denormalization does not introduce severe update anomalies.
Is a Zipcode attribute normalized when kept with street, city, and state attributes?
Exercise
The relation Lab_Usage has been defined as follows.
Lab_Usage( SID, Class#, Course, SName, Account#, Lab_Hours )
where
SID is a unique student ID,
Class# is a unique class number,
SName is a student name,
Lab_Hours is the maximum laboratory hours assigned to each student in a class,
Account# is a unique computer account. A student is assigned an account for each class he/she takes. Assume that no student takes the same course twice.
Exercise...
1. Determine all candidate keys and select a primary key.
2. List all FFDs.
3. Discuss update anomalies found in the relation.
4. Decompose the relation into 2NF relations.
5. Discuss update anomalies found in the 2NF relations.
6. Decompose the relation into 3NF relations.
7. Discuss update anomalies found in the 3NF relations, if any.