murali mani normalization. murali mani what and why normalization? to remove potential redundancy in...
TRANSCRIPT
Murali Mani
What and Why Normalization? To remove potential redundancy in design
Redundancy causes several anomalies: insert, delete and update
Normalization uses concept of dependencies Functional Dependencies
Idea used: Decomposition Break R (A, B, C, D) into R1 (A, B) and R2 (B, C,
D) Use decomposition judiciously.
Murali Mani
Insert Anomaly
sNumber sName pNumber pName
s1 Dave p1 MM
s2 Greg p2 ER
Student
Note: We cannot insert a professor who has no students.
Insert Anomaly: We are not able to insert “valid” value/(s)
Murali Mani
Delete Anomaly
sNumber sName pNumber pName
s1 Dave p1 MM
s2 Greg p2 ER
Student
Note: We cannot delete a student that is the only student of a professor.
Note: In both cases, minimum cardinality of Professor in the correspondingER schema is 0
Student
sNumber
sName
Professor
pNumber
pName
HasAdvisor
(1,1) (0,1)
years
Delete Anomaly: We are not able to perform a delete without losing some “valid” information.
Murali Mani
Update Anomaly
sNumber sName pNumber pName
s1 Dave p1 MM
s2 Greg p1 MM
Student
Note: To update the name of a professor, we have to update in multiple tuples.Update anomalies are due to redundancy.
Student
sNumber
sName
Professor
pNumber
pName
HasAdvisor
(1,1) (0,*)
years
Note the maximum cardinality of Professor in the corresponding ER schema is *
Update Anomaly: To update a value, we have to update multiple rows.
Murali Mani
Keys : Revisited
A key for a relation R (a1, a2, …, an) is a set of attributes, K that uniquely determine the values for all attributes of R.
A key K is minimal: no proper subset of K is a key.
A superkey need not be minimal Prime Attribute: An attribute of a key
Murali Mani
Keys: Example
sNumber sName address
1 Dave 144FL
2 Greg 320FL
Student
Primary Key: <sNumber>Candidate key: <sName>Some superkeys: {<sNumber, address>,
<sName>, <sNumber>, <sNumber, sName>, <sNumber, sName, address>}
Prime Attribute: {sNumber, sName}
Murali Mani
Functional Dependencies (FDs)
sNumber sName address
1 Dave 144FL
2 Greg 320FL
Student
Suppose we have the FD sName address• for any two rows in the Student relation with the same value for sName, the value for address must be the same• i.e., there is a function from sName to address
Note: • We will assume no null values.• Any key (primary or candidate) or superkey of a relation R functionally determines all attributes of R
Murali Mani
Properties of FDs
Consider A, B, C, Z are sets of attributes
Reflexive (also called trivial FD): if A B, then A B
Transitive: if A B, and B C, then A C Augmentation: if A B, then AZ BZUnion: if A B, A C, then A BC Decomposition: if A BC, then A B, A C
Murali Mani
Inferring FDs
Why? Suppose we have a relation R (A, B, C) and we
have functional dependencies A B, B C, C A what is a key for R? Should we split R into multiple relations?
We can infer A ABC, B ABC, C ABC. Hence A, B, C are all keys.
Murali Mani
Algorithm for inference of FDs
Computing the closure of set of attributes {A1, A2, …, An}, denoted {A1, A2, …, An}+
1. Let X = {A1, A2, …, An}
2. If there exists a FD B1, B2, …, Bm C, such that every Bi X, then X = X C
3. Repeat step 2 till no more attributes can be added.
4. {A1, A2, …, An}+ = X
Murali Mani
Inferring FDs: Example 1
Given R (A, B, C), and FDs A B, B C, C A, what are possible keys for R
Compute the closure of attributes: {A}+ = {A, B, C} {B}+
= {A, B, C}
{C}+ = {A, B, C}
So keys for R are <A>, <B>, <C>
Murali Mani
Inferring FDs: Example 2
Consider R (A, B, C, D, E) with FDs A B, B C, CD E, does A E?
Let us compute {A}+
{A}+ = {A, B, C} Therefore A E is false
Murali Mani
Decomposing Relations
sNumber sName pNumber pName
s1 Dave p1 MM
s2 Greg p2 MM
StudentProf
FDs: pNumber pName
sNumber sName pNumber
s1 Dave p1
s2 Greg p2
Student
pNumber pName
p1 MM
p2 MM
Professor
Murali Mani
Decomposition: Lossless Join Property
Generating spurious tuples
sNumber sName pName
S1 Dave MM
S2 Greg MM
Student
pNumber pName
p1 MM
p2 MM
Professor
sNumber sName pNumber pName
s1 Dave p1 MM
s1 Dave p2 MM
s2 Greg p1 MM
s2 Greg p2 MM
StudentProf
Murali Mani
Normalization Step Consider relation R with set of attributes AR.
Consider a FD A B (such that no other attribute in (AR – A – B) is functionally determined by A).
If A is not a superkey for R, we may decompose R as: Create R’ (AR – B) Create R’’ with attributes A B Key for R’’ = A
Murali Mani
Normal Forms: BCNF
Boyce Codd Normal Form (BCNF): For every non-trivial FD X a in R, X is a superkey of R.
Murali Mani
BCNF example
SCI (student, course, instructor)
FDs: student, course instructorinstructor course
Decomposition:
SI (student, instructor)Instructor (instructor, course)
Murali Mani
Dependency PreservationWe might want to ensure that all specified FDs are captured.BCNF does not necessarily preserve FDs.2NF, 3NF preserve FDs.
student instructor
Dave MM
Dave ER
SI
instructor course
MM DB 1
ER DB 1
Instructor
student instructor course
Dave MM DB 1
Dave ER DB 1
SCI (from SI and Instructor)
SCI violates the FDstudent, course instructor
Murali Mani
Normal Forms: 3NF
Third Normal Form (3NF): For every non-trivial FD X a in R, either a is a prime attribute or X is a superkey of R.
Murali Mani
3NF - exampleLot (propNo, county, lotNum, area, price, taxRate)
Candidate key: <county, lotNum>
FDs: county taxRatearea price
Decomposition:
Lot (propNo, county, lotNum, area, price)County (county, taxRate)
Murali Mani
3NF - exampleLot (propNo, county, lotNum, area, price)County (county, taxRate)
Candidate key for Lot: <county, lotNum>
FDs: county taxRatearea price
Decomposition:
Lot (propNo, county, lotNum, area)County (county, taxRate)Area (area, price)
Murali Mani
Extreme ExampleConsider relation R (A, B, C, D) with primary key (A, B, C), and FDs B D, and C D. R violates 3NF. Decomposing it, we get 3 relations as:
R1 (A, B, C), R2 (B, D), R3 (C, D)
Let us consider an instance where we need these 3 relations and how we do a natural join ⋈
A B C
a1 b1 c1
a2 b2 c1
a3 b1 c2
B D
b1 d1
b2 d2
R1R2
C D
c1 d1
c2 d2
R3
A B C D
a1 b1 c1 d1
a2 b2 c1 d2
a3 b1 c2 d1
R1 R2: violates ⋈ C D
A B C D
a1 b1 c1 d1
R1 R2 R3: no FD is violated⋈ ⋈
Murali Mani
Define Foreign Keys? Consider the normalization step: A relation R
with set of attributes AR, and a FD A B, and A is not a key for R, we decompose R as: Create R’ (AR – B) Create R’’ with attributes A B Key for R’’ = A
We can also define foreign key R’ (A) references R’’ (A)
Question: What is key for R’?
Murali Mani
How does Normalization Help?
Employee DeptWorksFor
ssn
name
lotdid dName
(1, 1) (0, *)
Employee (ssn, name, lot, dept)Dept (did, dName)FK: Employee (dept) REFERENCES Dept (did)
Suppose: employees of a dept are in the same lotFD: dept lot
Decomposing:Employee (ssn, name, dept)Dept (did, dName)DeptLot (dept, lot)
(or) Employee (ssn, name, dept)Dept (did, dName, lot)
Employee DeptWorksFor
ssn
name
did dName
(1, 1) (0, *)
lot