database normalization. designing good schemas we know how to create schemas, but... how do we...
TRANSCRIPT
Database
Normalization
Designing Good Schemas
We know how to create schemas, but ... how do we create good schemas? what does good mean?
Schema quality measurements: semantics of the attributes minimal redundancy minimal frequency of null values
Functional Dependences
A column Y of relational table R is functionally dependent up on column X of relational table R if and only if:
Each value of X in R associated with each value of Y at any given time
Functional dependences
Y is functional dependent up on X same as values of X identify values of Y
If X Y then XZYZ IF XY and Y Z then XZ X Y means that Y depend on X or
X identify Y
Examples
S# Ename {S#, P#} Hours If for each value of S#, there are exactly one
corresponding value for sname, state, city then:
Sname Sate CityS#
Example
If {S#, p#} Qty
P# QTYS#
Redundancy Example Where’s the redundancy?
Redundancy Example
Example FDs
Transitive FDsProper FDs
Partial Key FDs
Partial Key FD
10
Normal Forms
Each normal form is a set of conditions on a schema that guarantees certain properties (relating to redundancy and update anomalies)
The two commonly used normal forms are third normal form (3NF) and Boyce-Codd normal form (BCNF)
Normalization0NF
removemulti-valued
attributes1NF 2NF 3NF
removepartial
dependencies
removetransitive
dependencies
BCNF 4NF 5NFremove
remainingFD anomal
dependencies
removemultivalue
dependencies
removeremaininganomalies
1 NF
First normal form is
NO multi-valued attributes
No composite attribute
No nested relation
We create new table or new field (telephone, visiting)
1NF Normalization
Proper translation from ER multi-value attributes will achieve 1NF.
Still not a good solution,since we have redundancy in Dnumber and Dmgr_ssn.(This will be handled by 2NF.)
2 NF form
Second normal form that if primary key is multiple attribute and non-key attribute depend on part of primary key
P# HoursS# Cname pname Loc
2NF Normalization
Move the partial key and dependent attributes to a new relation.
Transitive Dependencies
X → Y is a transitive dependency (PD)
if there exists Z ⊈ any key
such that X → Z → Y TDs can cause redundancy if there are multiple
values of X that determine the same value of Z the value of Y for that value of Z is stored multiple times
3NF normalization: move (Z,Y) to new relation in which Z is the primary key
3 NF
The relation in 3NF if it is 2 NF and every
non-key attribute is non-transitively
dependent on primary key
3NF Normalization
Create new relation to hold the attributes in the transitive FD.
LHS of transitive FD becomes PK of new relation.
Transitive Dependency Example
I_OFFICE (instructor's office) is determinedby the non-PK attribute INSTR
DEPT COURSE SECTION ROOM INSTR I_OFFICE
DEPT COURSE SECTIONCOMP 51 1COMP 51 2COMP 163 1COMP 53 1COMP 53 2
ROOMWPC122WPC219WPC122WPC130WPC130
INSTRDOHERTYCLIBURNDOHERTYBOWRINGCARMAN
I_OFFICECSB109CSB107CSB109CSB108CSB104
NF Decomposition: Foreign Keys
DEPT COURSE SECTION ROOM INSTR I_OFFICE
DEPT COURSE SECTION ROOM INSTR
INSTR I_OFFICE
Decomposition:
Arthur Keller – CS 180
NormalizationGoal = BCNF = Boyce-Codd Normal Form =all FD’s follow from the fact “key
everything.” Formally, R is in BCNF if for every nontrivial
FD for R, say X A, then X is a superkey. “Nontrivial” = right-side attribute not in left side.
Why?1. Guarantees no redundancy due to FD’s.
2. Guarantees no update anomalies = one occurrence of a fact is updated, not all.
3. Guarantees no deletion anomalies = valid fact is lost when tuple is deleted.
Boyce-Codd Normal Form Sample data for Course Section table
Because Prefix Department, we know that (Prefix, Num, SecNum) could also be a primary key for this table.
Department Prefix Num SecNum CourseName Instructor
Mathematics Math 101 1 Algebra I Al Jeebra
Mathematics Math 101 2 Algebra I Al Jeebra
Mathematics Math 201 1 Calculus I Kal Kuelus
Philosophy Phil 201 1 Greek Thought Arie Stottle
Philosophy Phil 202 1 Euro Thought Mike Angelo
Marketing Mktg 410 1 Marketing Strategy
Marc Ekking
Marketing SpMkg 401 1 Advanced Sports Marketing
Hulk Hogan
23
Example
Students(name, addr, phones, CarLiked) A student’s phones are independent of the cars
they like. Thus, each of a student’s phones appears with
each of the cars they like in all combinations. This repetition is unlike redundancy due to
FD’s, of which name->addr is the only one.
24
Example
Students(name, addr, CarLiked, manf, favCar) FD’s: name->addr favCar, carsLiked->manf
Only key is {name, CarsLiked}. In each FD, the left side is not a superkey. Any one of these FD’s shows Students is not
in BCNF
25
Boyce-Codd Normal Form
We say a relation R is in BCNF if whenever X ->A is a nontrivial FD that holds in R, X is a superkey. Remember: nontrivial means A is not a member
of set X. Remember, a superkey is any superset of a key
(not necessarily a proper superset).
26
Example
Students(name, addr, CarsLiked, manf, favCar) F = name->addr, name -> favCar, CarsLiked->manf Pick BCNF violation name->addr. Close the left side: {name}+ = {name, addr, favCar}. Decomposed relations:
1. Students1(name, addr, favCar)
2. Students2(name, CarsLiked, manf)
27
3NF and BCNF
3rd Normal Form (3NF) modifies the BCNF condition so we do not have to decompose in this problem situation.
X ->A violates 3NF if and only if X is not a superkey, and also A is not prime.
ExercisesThe following relation schema
is not in third normal form (3NF).
Is this an example of a transitive dependency or a partial key dependency?
Give an equivalent schema that is in 3NF.
SID FROM_CITY TO_CITY DISTANCE
SHIPMENT
WEIGHT
Exercises
This relation has been proposed to track Pacific alumni:
Alumni( SID, LastName, FirstName, Degree, YearAwarded, Phone).
Pacific allows students to receive multiple degrees,possibly in different years. Identify all FDs.
Give a new schema that is in third normal form.
ExercisesConsider the following relation schema:
Movie(title, genre, length, actor, sag_id, studio, studio_addr)
Every movie has a unique title. A movie may have multiple actors. Each actor has a unique sag_id. An actor may appear in multiple movies. A movie has exactly one studio,
but a studio may produce more than one movie. Each studio has exactly one address.
Identify all functional dependencies.
Normalize the schema to 3NF.