software school of hunan university 2006.08 database systems design part iii normalization
TRANSCRIPT
Software School of Hunan University
200608
Database Systems Design Part III
Normalization
Database Design Flow Diagram
Individual Part 1Individual Part 1
Intelligence Identify Summary AbstractDeduce Refine
Individual Part nIndividual Part n
ER model ER model
Relations Relations Rational Relations
Rational Relations
User view 1User view 1
User view nUser view n
Transformation
Normalization
Requirements analyzing
Objectives
To answer
What is normalization (规范化) of relation
Why is normalization is necessary
What is general principle ( 基本原则 ) of normalization
What is the theory basic ( 理论基础 ) of normalization
How to normalize
Contents
① redundancy( 冗余 ) and update anomalies ( 更新异常) with improper relation design
② lossless-join ( 无损联接 ) property and dependency p
reservation ( 依赖保留 ) property
③ functional dependency (函数依赖) concept and it
applications
④Five Normalization forms ( 5 个范式)
1NF 2NF 3NF BCNF 4NF 5NF
Normalization
Given a collection of relations are they accurate rational and true representation of the enterprise data
Normalization
So we need a technology to validate our design
What is meaning of ldquoaccuraterdquo ldquorationalrdquo ldquoturerdquoMeet the business requirementsConsistencyExtensibilityLess redundancy PerformanceCorrectness
Motivation ( 动机 ) of Normalization
When the design of database is improper it leads to the problems of redundancies( 冗余 ) and update anomalies( 更新异常 )
Redundancy occurs when the same data value is stored more than once in a relation Redundancy wastes space and reduces performa
nceUpdate anomalies are problems that arise when tr
ying to insert delete or update tuples and are often caused by redundancy
Motivation of Normalization
Dream GameDepartment Staff Card
Department No CR74 Department Name Engineering
Staff No E2 Name MSmith
Title Database Designer Salary 66000
Remark
Department Manager Yes No⊙
Supervisor E5
Example of improper relation
bull DeptEmp Relation
ENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5
E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7
E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5
E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5
E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7
E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8
E8 J Jones 10-11-72 SA 50000 null D1 Management E8
Insertion Anomaly Example
Consider these two types of insertion anomalies1) Insert a new employee E9 working in department D2
ndash You have to redundantly insert the department name and manager when adding this record
2) Insert a department D4 that has no current employeesndash This insertion is not possible without creating a dummy employee id a
nd record because ENo is the primary key of the relation
ENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5
E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7
E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5
E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5
E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7
E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8
E8 J Jones 10-11-72 SA 50000 null D1 Management E8
Deletion Anomaly Example
Consider this deletion anomaly- Delete employees E3 and E6 from the database
- Deleting those two employees removes them from the database and we now have lost information about department D2
ENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5
E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7
E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5
E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5
E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7
E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8
E8 J Jones 10-11-72 SA 50000 null D1 Management E8
Modification Anomaly Example
Consider these modification anomalies
1) Change the name of department D3 to ldquoEngineeringrdquo You must update the department name in 3 different records
2) Change the manager of D1 to E4 You must update the superno field in 2 different records
ENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5
E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7
E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5
E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5
E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7
E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8
E8 J Jones 10-11-72 SA 50000 null D1 Management E8
Update Anomalies
bull There are three major types of update anomalies Insertion Anomalies - Insertion of a tuple into the r
elation either requires insertion of redundant information or cannot be performed without setting key values to NULL
Deletion Anomalies - Deletion of a tuple may lose information that is still required to be stored
Modification Anomalies - Changing an attribute of a tuple may require changing multiple attribute values in other tuples
Overview to Normalization
Normalization is a validation technique for producing relations with desirable properties
Normalization is a bottom-up design technique for producing relations
Normalization decomposes relations into smaller relations that contain less redundancy This decomposition requires reconstruction of the original relations from the smaller relations and no information is lost
It was developed by Codd in 1972 preceding ER modeling and extended by others over the years
Example Normalization to DeptEmp
bull DeptEmp Relation
ENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5
E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7
E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5
E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5
E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7
E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8
E8 J Jones 10-11-72 SA 50000 null D1 Management E8
Normalization to DeptEmp
Emp Relation Dept Relation
dno dname mgrno
D1 Management E8
D2 consulting E7
D3 Accounting E5
eno ename bdate title salary superNo dno
E1 J Doe 01-05-75 EE 30000 E2 null
E2 M Smith 06-04-66 SA 50000 E5 D3
E3 A Lee 07-05-66 ME 40000 E7 D2
E4 J Miller 09-01-50 PR 20000 E6 D3
E5 B Casey 12-25-71 SA 50000 E8 D3
E6 L Chu 11-30-65 EE 30000 E7 D2
E7 R Davis 09-08-77 ME 40000 E8 D1
E8 J Jones 10-11-72 SA 50000 null D1
Dept Emp= Emp dno Dept
Overview to Normalization
Input a relation schema and constraints on attributes
Output one or more relation schemas
Processing Idea examining the relationships between attributes and organize the data according to its functional dependencies( 函数依赖)
Significance
1) Validate our design correct our improper design
2) Help for understanding the nature and the meaning of the data in the enterprise for learning more about the data semantics
When to use normalization
Normalization can be used after mapping ER model into relation model or independently
Normalization may be especially useful for databases that have already been designed without using formal techniques
Theory basic of normalization
Functional Dependencies ( 函数依赖) Functional dependencies represent integrity constraints ( 完整
性约束) on the values of attributes in a relation and are used in normalization
A functional dependency (FD) describes the relationship between attributes in a relation We say a set of attributes X functionally determines an attribute Y if given the values of X we always know the only possible value of Y
Notation X rarr Y X functionally determines Y Y is functionally dependent on XExample
eno rarr ename eno pno rarr Duration
Notation for Functional Dependencies
A functional dependency has a left-hand side called the determinant which is a set of attributes and one attribute on the right-hand side
eno pno rarr duration
determinant determined attribute
Strictly speaking there is always only one attribute on the RHS but we can combine several functional dependencies into one
eno pno rarr durationeno pno rarr respeno pno rarr duration resp
Remember that this is really two functional dependencies
The Semantics ofFunctional Dependencies
Similar to keys Functional dependencies are a property of a relation schema (intension) and not a property of a particular instance of the schema (extension)
Functional dependencies are directional
eno rarr ename ename rarr eno
ExampleFDs in EmpDept relation
bull EmpDept RelationENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8E8 J Jones 10-11-72 SA 50000 null D1 Management E8
Eno rarr ename bdate title salary superno dno
dnorarrdname mgrno
titlerarr salary
Trivial Functional Dependencies ( 平凡函数依赖 )
A functional dependency is trivialif the attributes on its left-hand side are a superset of the attributes on its right-hand side
Examples eno rarr eno
eno ename rarr eno
eno pno duration rarr eno Duration
Trivial functional dependencies are not interesting because they do not tell us anything
We are only interested in nontrivial FDs
Full Functional Dependencies
A functional dependency A rarr B is a full functional dependency if removal of any attribute from A results in the dependency not existing any more We say that B is fully functionally dependent on A If remove an attribute from A and the functional dependency still exists we say that B is partially dependent on A
Example
eno rarr ename (full FD)
eno ename rarr salary title (partial FD - can remove ename)
eno pno rarr hours resp (full FD)
How to Identify FDs in a relation
Examining the semantics of the attributes of relation
Identify all non-trivial functional dependencies in the Universal Relation
Universal (eno pno resp duration ename bdate title salary superNo dno dname mgrNo pname budget)
Eno rarr ename bdate title salary superno dnodnorarrdname mgrnotitlerarr salary
Formal Definition ofFunctional Dependencies
Given relation R defined over U = A1 A2 An where X U Y U If for all pairs of tuples t1 and t2 in any legal instance of relation schema R
t1[X] = t2[X] implies t1[Y] = t2[Y]
then the functional dependency X rarr Y holds in R
Functional Dependencies and Keys
Functional dependencies can be used to determine the candidate keys of a relation
What is superkey candidate key primary key
For example if an attribute functionally determines all other attributes in the relation that attribute can be a key
eno rarr eno ename bdate title supereno dno eno is a candidate key for the Emp relation
Functional Dependencies and Keys
ENo rarr EName BDate Title SuperNo DNoDNo rarr DName MgreNotitle rarr salary
Question List the candidate key(s) of this relation
ENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5
E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7
E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5
E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5
E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7
E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8
E8 J Jones 10-11-72 SA 50000 null D1 Management E8
EmpDept relation
Applications of FDs
1) given a set of FDs F in relation R FD X rarr Y is true or not
2) given a set of FDs F in relation R a set of attributes X is a candidate key of R or not
3) given a set of FDs F in relation R which candidate keys are in R
4) given a set of FDs F in relation R some FDs in F is Excessive or not
Rules for reasoning about functional dependencies
Armstrongs Axioms are three inference rules for functional dependenciesLet A B C and D be subsets of the attributes of relation R
IR1 Reflexive Rule( 自反性 ) If AB then A rarr B
A set of attributes always trivially functionally determines any subset of itself
IR2 Augmentation Rule( 增加性 ) If A rarr B then AC rarr BC
Adding a set of attributes to both the LHS and RHS results in another valid functional dependency
IR3 Transitive Rule( 传递性 ) If A rarr B and B rarr C then A rarr C
Transitivity holds with FDs
Derived Rules from Armstrongs Axioms
There are three other well-known rules that can be derived from the basic three
IR4 Decomposition( 可分解性 ) If A rarr BC then A rarr B and A rarr C A FD with two attributes on RHS can be decomposed into two FDs each with on attribute on the RHS
IR5 Union( 合并性 ) If A rarr B and A rarr C then A rarr BC Reverse of decomposition rule
IR6 Composition( 组合性 ) If A rarr B and C rarr D then AC rarr BD More general than union - can combine two non-overlapping FDs into one
Characteristics of Armstrongs Axioms
The rules are sound( 健壮的 ) and complete( 完备的 )
1) Sound The rules do not infer( 推断 ) any incorrect FDs
2) Complete The rules can be used to infer all correct FDs (no FDs are missing)
The axioms (called IR1 IR2 and IR3) can be used to determine a set of all FDs F+ from an initial set of FDs F
F+ is referred to as the closure of F
Strategy of dealing with questions
The basic idea is that given any set of attributes X we can compute the set of all attributes X+ that can be functionally determined using F This is called the closure of X under F
For question 1 given a set of FDs F in R FD X rarr Y is true or not we know that X rarr Y holds if Y is in X+
For question 2 a set of attributes X is a candidate key of R or not X is a superkey of R if X+ is all attributes of R
For question 3 how to computing F+ we can compute X+ for all possible subsets X of R
Computing the Attribute Closure
The algorithm is as follows
Given a set of attributes X
Let X+ = X
Repeat Find a FD in F whose left side is a subset of X+ Add the right side of F to X+
Until (X+ does not change or X+ contains all attributes of R)
After the algorithm completes you have a set of attributes X+ that can be functionally determined by X This allows you to produce FDs of the form
X rarr A where A is in X+
Computing Attribute Closure Example
R (ABCDEG)
F = A rarr BC C rarr D D rarr G
Compute A+
A+ = A (initial step)
A+ = ABC (by using FD A rarr BC)
A+ = ABCD (by using FD C rarr D)
A+ = ABCDG (by using FD D rarr G)
Similarly we can compute C+ and EG+
C+ = CDG
EG+ = EG
Using Attribute Closure to Detect a FD
The attribute closure algorithm can be used to determine if a particular FD X rarr Y holds based on the set of given FDs F Compute X+ and see if Y is in X+
Example
R = (ABCDEG)
F = A rarr BC C rarr D D rarr G
Does the FD CD rarr G hold Compute CD+ = CDG Yes the FD CD rarr G holds
Does the FD BC rarr E hold Compute BC+ = BCDG No the FD BC rarr E does n
ot hold
Using Attribute Closure to Find all FDs
The attribute closure algorithm can be used to find all FDs of a given relation R if perform the attribute closure algorithm for all subsets X of the attributes R
This is an exponential algorithm in the number of attributes in R Although there are a few optimizations Do not compute the closure of the empty set or of the set of all attributes If we find X+ = all attributes do not compute the closure of any supersets of X Note that even though you do not have to compute the closure (because you know the answer) you still must use the known closure to infer new FDs
Using Attribute Closureto Find all FDs Example
Compute F+ from F where
R (ABCD)
F = A rarr BC C rarr D
Must compute all these attribute closures
A+ B+ C+ D+
AB+ AC+ AD+ BC+ BD+ CD+
ABC+ ABD+ ACD+ BCD+
Do not have to compute ABCD+
Start with the single sets
A+ = ABCD B+ = B C+ = CD D+ = D
New FD A rarr D
Using Attribute Closureto Find all FDs Example (cont)
Next the two element sets Do not compute AB+ AC+ AD+ because A+ contained all attributes of R New FDs AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C BC+ = BCD BD+ = BD CD+ = CD New FD BC rarr DThree element sets Do not compute ABC+ ABD+ ACD+ New FDs ABC rarr D ABD rarr C ACD rarr B BCD+ = BCD No new FDsAll FDs F+ = A rarr BC C rarr D A rarr D BC rarr D AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C ABC rarr D ABD rarrC ACD rarr B
FD Closure Questions
1) Compute all nontrivial FDs and list all keys of R
R = (ABCD)
F = AB rarr C C rarr D D rarr A
2) Compute all nontrivial FDs and list all candidate keys of R
R = (ABCD)
F = AB rarr C BC rarr D CD rarr A AD rarr B
Equivalent Sets of FDs
A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+ We can determine if F covers E by calculating X+ with respect to F for each FD X rarr Y in E and checking whether X+ includes Y
Two sets of FDs E and F are equivalent if E covers F and F covers E That is E+= F+
Minimal Set of FDs
A set of FDs F is minimal if it satisfies these conditions Every FD in F has a single attribute on its right-hand side We cannot replace any FD X rarr A in F with Y rarr A where Y
X and still have a set of dependencies that is equivalent to F
We cannot remove any FD and still have a set that is equivalent to F
A minimal sets of FDs is like a canonical( 规范的 ) form of FDs
Note that there may be many minimal covers for a given set of functional dependencies F
Normal Forms
A relation is in a particular normal form if it satisfies certain
normalization properties
There are several normal forms defined 1NF - First Normal Form 2NF - Second Normal Form 3NF - Third Normal Form BCNF - Boyce-Codd Normal Form 4NF - Fourth Normal Form 5NF - Fifth Normal Form
Each of these normal forms are stricter than the next
For example 3NF is better than 2NF because it removes more redundancyanomalies from the schema than 2NF
Normal Forms
All relations (non-normalized)
1NF (First Normal Form)
2NF (Second Normal Form)
3NF (Third Normal Form)
BCNF (Boyce-Codd NF)
4NF (Fourth NF)
5NF (Fifth NF)
First Normal Form (1NF)
A relation is in first normal form (1NF) if all its attribute values are atomic
That is a 1NF relation cannot have an attribute value that is a set of values (multi-valued attribute) a set of tuples (nested relation)
1NF is a standard assumption in relational DBMSs However object-oriented DBMSs and nested relational DBMSs relax t
his constraint
A relation that is not in 1NF is an unnormalized relation
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1P2
AnalystAnalyst
246
E3 A Lee P3P4
ConsultantEngineer
1048
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1 P2 Analyst Analyst 24 6
E3 A Lee P3P4 Consultant Engineer 10 48
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
A non-1NF Relation
One way to convert a non-1NF relation to a 1NF relation
1) Splitting Method - Divide the existing relation into two relations one with non-repeating attributes and the other with repeating attributes
Make a new relation consisting of the primary key of the original relation and the repeating attributes Determine a primary key for this new relation
Remove the repeating attributes from the original relation
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1P2
AnalystAnalyst
246
E3 A Lee P3P4
ConsultantEngineer
1048
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
ENo EName
E1 J Doe
E2 M Smith
E3 A Lee
E4 J Miller
E5 B Casey
E6 L Chu
E7 R Davis
ENo PNo Resp DurationE1 P1 Manager 12E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36
Second Normal Form (2NF)
A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key
If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation
Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key
Second Normal Form (2NF) Example
fd1 and fd4 are partial functional dependencies Normalize to
Emp (eno ename title bdate salary supereno dno)
WorksOn (eno pno resp hours)
Proj (pno pname budget)
ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration
EmpProj relation
fd1 fd2
fd3fd4
Second Normal Form (2NF) Example (2)
ENo EName
BDate Title Salary SuperNo
DNo
Emp relation
fd1
fd2
fd3fd4
PNo PName Budget
Proj relationENo PNo Resp Duration
WorkOn relation
Third Normal Form (3NF)
Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th
an 2 FDs
A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key
Converting a relation to 3NF from 2NF involves the removal of transitive dependencies
Third Normal Form (3NF) Example
fd2 results in a transitive dependency eno rarr salary Remove it
ENo EName BDate Title Salary SuperNo DNo
Emp relation
fd1
fd2
fd2
Title Salary
Pay relation
ENo EName BDate Title SuperNo DNo
Emp relation
fd1
General Definitions of 2NF and 3NF
We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)
General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key
General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key
Note that a prime attribute is an attribute that is in any key (candidate or primary)
General Definition of 3NF Example
The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute
However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key
Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key
ENo EName BDate Title Salary SuperNo DNo SSN
Emp relation
fd1
fd2
Normalization Question
Consider the universal relation
R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ
List the keys for R
Decompose R into 2NF and then 3NF relations
Boyce-Codd Normal Form (BCNF)
A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key
To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key
The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
ClientInterview relation
Relation is in 3NF but not BCNF
fd4
fd3
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
Note that we lose the FD fd2
fd4
fd3
ClientNo InterviewDate
InterviwTime
StaffNo StaffNo InterviwDate
RoomNo
Interview relation StaffRoom relation
BCNF versus 3NF
We can decompose to BCNF but sometimes we do not want to if we lose a FD
The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency
Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)
In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)
Conversion to BCNF
There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y
Note that if any FD in F+ violates BCNF then a FD in F violates BCNF
2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey
3) Replace R by relations with schemas R1 = X+
R2 = (R ndash X+) U X
4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib
utes are all in R1 or R2
Normalization Question
Given this database schema normalize into BCNF
fd1
PID Country Lot Area Price TaxRate
fd2
fd4fd3
Multi-Valued Dependencies
A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema
A MVD occurs when two independent 1N relationships are in the relational schema
Multi-Valued Dependencies Example
Eno ename Title Salary Pno dno
E1 J Doe EE 30000 P1P2
E2 M Smith SA 50000 P1P4 D1D2D3
E3 A Lee ME 40000 P1P3 D2D3
E4 J Miller PR 20000 P3 D3
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1
E1 J Doe EE 30000 P2
E2 M Smith SA 50000 P1 D1
E2 M Smith SA 50000 P1 D2
E2 M Smith SA 50000 P1 D3
E2 M Smith SA 50000 P4 D1
E2 M Smith SA 50000 P4 D2
E2 M Smith SA 50000 P4 D3
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P3 D3
E3 A Lee ME 40000 P3 D3
E4 J Miller PR 20000 P3 D3
Multi-Valued Dependencies (MVDs)
A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C
A trivial MVD A mdashgtgt B occurs when either B is a subset of A or
A cup B = R
Fourth Normal Form (4NF)
Fourth normal form (4NF) is based on the idea of multi-valued dependencies
A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies
Fourth Normal Form (4NF) Example
eno mdashgtgtpno
eno mdashgtgtdno
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3
eno ename title salary
E1 J Doe EE 30000
E2 MSmith SA 50000
E3 A Lee ME 40000
E4 J Miller PR 20000
eno pno
E1 P1
E1 P2
E2 P1
E2 P4
E3 P1
E3 P3
E4 P3
eno dno
E2 D1
E2 D2
E2 D3
E3 D2
E3 D3
E4 D3
Lossless-join Dependency
The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation
A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined
Fifth Normal Form (5NF)
bull Fifth normal form (5NF) is based on join dependencies
A relation is in fifth normal form (5NF) if the relation has no join dependency
bull 5NF is also called project-join normal form (PJNF)
A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R
The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have
ΠR1(r) ΠR2(r) hellip ΠRn(r) = r
Fifth Normal Form (5NF) Example
Consider a relation Supply (sname partName projName)
1048715Add the additional constraint that
If project j requires part p
and supplier s supplies part p
and supplier s supplies at least one item to project j
Then supplier s also supplies part p to project j
Original Supply Relation Supply Relation with new constraintsname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
Fifth Normal Form (5NF) Example
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
sname partName
Smith Bolt
Smith Nut
Adam Bolt
Walton Nut
Adam Nail
sname projName
Smith X
Smith Y
Adam Y
Walton Z
Adam X
partName projName
Bolt X
Nut Y
Bolt Y
Nut Z
Nail X
Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples
Desirable Properties in normalization
Relational schemas that are well-designed have several
important properties
1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or
relationship
2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins
3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations
Normal Forms in Practice
Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used
For example query execution time may increase because of normalization as more joins become necessary to answer queries
Normal Forms in Practice Example
For example street and city uniquely determine a zipcode
In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed
When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly
fd1
ENo City Street ZipCode
fd2
ENo ZipCode
ZipCode City Street
Normal Forms in Practice Summary
In summary normalization is typically used in two ways
To improve on relational schemas after ER design
To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies
There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance
This is the classic time versus space tradeoff
Conclusion
Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies
Normal forms indicate when relations satisfy certain properties
1NF - All attributes are atomic
2NF - All attributes are fully functionally dependent on a key
3NF - There are no transitive dependencies in the relation
4NF - There are no multi-valued dependencies in the relation
5NF - The relation has no join dependency
In practice normalization is used to improve schemas produced after ER design and existing relational schemas
It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time
Objectives
Explain process of normalization and why it is beneficial
Describe the concept of the Universal Relation
What are the motivations for normalization
What are the 3 types of update anomalies Be able to give examples
Describe the lossless-join property and dependency preservation property
Define and explain the concept of a functional dependency
What is a trivial FD
Describe the relationship between FDs and keys
What are Armstrongs Axioms When are they used
What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs
Objectives (2)
Define minimal and covers for a set of FDs
Define normal forms be able to list the normal forms and draw a diagram showing their relationship
Define 1NF Know how to convert non-1NF relation to 1NF
Define 2NF Convert 1NF relation to 2NF
Define 3NF Convert 2NF relation to 3NF
Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example
Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm
Describe multi-valued dependencies When do they occur
What is a trivial MVD
Define 4NF Convert 3NFBCNF relation to 4NF
Objectives (3)
Describe lossless-join property and how it relates to 5NF
Define 5NF
Explain normalization in terms of the time versus space tradeoff
Explain the relationship between normalization and ER modeling
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Slide 45
- Slide 46
- Slide 47
- Slide 48
- Slide 49
- Slide 50
- Slide 51
- Slide 52
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- Slide 60
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
-
Database Design Flow Diagram
Individual Part 1Individual Part 1
Intelligence Identify Summary AbstractDeduce Refine
Individual Part nIndividual Part n
ER model ER model
Relations Relations Rational Relations
Rational Relations
User view 1User view 1
User view nUser view n
Transformation
Normalization
Requirements analyzing
Objectives
To answer
What is normalization (规范化) of relation
Why is normalization is necessary
What is general principle ( 基本原则 ) of normalization
What is the theory basic ( 理论基础 ) of normalization
How to normalize
Contents
① redundancy( 冗余 ) and update anomalies ( 更新异常) with improper relation design
② lossless-join ( 无损联接 ) property and dependency p
reservation ( 依赖保留 ) property
③ functional dependency (函数依赖) concept and it
applications
④Five Normalization forms ( 5 个范式)
1NF 2NF 3NF BCNF 4NF 5NF
Normalization
Given a collection of relations are they accurate rational and true representation of the enterprise data
Normalization
So we need a technology to validate our design
What is meaning of ldquoaccuraterdquo ldquorationalrdquo ldquoturerdquoMeet the business requirementsConsistencyExtensibilityLess redundancy PerformanceCorrectness
Motivation ( 动机 ) of Normalization
When the design of database is improper it leads to the problems of redundancies( 冗余 ) and update anomalies( 更新异常 )
Redundancy occurs when the same data value is stored more than once in a relation Redundancy wastes space and reduces performa
nceUpdate anomalies are problems that arise when tr
ying to insert delete or update tuples and are often caused by redundancy
Motivation of Normalization
Dream GameDepartment Staff Card
Department No CR74 Department Name Engineering
Staff No E2 Name MSmith
Title Database Designer Salary 66000
Remark
Department Manager Yes No⊙
Supervisor E5
Example of improper relation
bull DeptEmp Relation
ENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5
E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7
E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5
E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5
E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7
E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8
E8 J Jones 10-11-72 SA 50000 null D1 Management E8
Insertion Anomaly Example
Consider these two types of insertion anomalies1) Insert a new employee E9 working in department D2
ndash You have to redundantly insert the department name and manager when adding this record
2) Insert a department D4 that has no current employeesndash This insertion is not possible without creating a dummy employee id a
nd record because ENo is the primary key of the relation
ENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5
E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7
E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5
E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5
E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7
E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8
E8 J Jones 10-11-72 SA 50000 null D1 Management E8
Deletion Anomaly Example
Consider this deletion anomaly- Delete employees E3 and E6 from the database
- Deleting those two employees removes them from the database and we now have lost information about department D2
ENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5
E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7
E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5
E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5
E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7
E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8
E8 J Jones 10-11-72 SA 50000 null D1 Management E8
Modification Anomaly Example
Consider these modification anomalies
1) Change the name of department D3 to ldquoEngineeringrdquo You must update the department name in 3 different records
2) Change the manager of D1 to E4 You must update the superno field in 2 different records
ENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5
E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7
E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5
E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5
E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7
E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8
E8 J Jones 10-11-72 SA 50000 null D1 Management E8
Update Anomalies
bull There are three major types of update anomalies Insertion Anomalies - Insertion of a tuple into the r
elation either requires insertion of redundant information or cannot be performed without setting key values to NULL
Deletion Anomalies - Deletion of a tuple may lose information that is still required to be stored
Modification Anomalies - Changing an attribute of a tuple may require changing multiple attribute values in other tuples
Overview to Normalization
Normalization is a validation technique for producing relations with desirable properties
Normalization is a bottom-up design technique for producing relations
Normalization decomposes relations into smaller relations that contain less redundancy This decomposition requires reconstruction of the original relations from the smaller relations and no information is lost
It was developed by Codd in 1972 preceding ER modeling and extended by others over the years
Example Normalization to DeptEmp
bull DeptEmp Relation
ENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5
E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7
E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5
E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5
E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7
E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8
E8 J Jones 10-11-72 SA 50000 null D1 Management E8
Normalization to DeptEmp
Emp Relation Dept Relation
dno dname mgrno
D1 Management E8
D2 consulting E7
D3 Accounting E5
eno ename bdate title salary superNo dno
E1 J Doe 01-05-75 EE 30000 E2 null
E2 M Smith 06-04-66 SA 50000 E5 D3
E3 A Lee 07-05-66 ME 40000 E7 D2
E4 J Miller 09-01-50 PR 20000 E6 D3
E5 B Casey 12-25-71 SA 50000 E8 D3
E6 L Chu 11-30-65 EE 30000 E7 D2
E7 R Davis 09-08-77 ME 40000 E8 D1
E8 J Jones 10-11-72 SA 50000 null D1
Dept Emp= Emp dno Dept
Overview to Normalization
Input a relation schema and constraints on attributes
Output one or more relation schemas
Processing Idea examining the relationships between attributes and organize the data according to its functional dependencies( 函数依赖)
Significance
1) Validate our design correct our improper design
2) Help for understanding the nature and the meaning of the data in the enterprise for learning more about the data semantics
When to use normalization
Normalization can be used after mapping ER model into relation model or independently
Normalization may be especially useful for databases that have already been designed without using formal techniques
Theory basic of normalization
Functional Dependencies ( 函数依赖) Functional dependencies represent integrity constraints ( 完整
性约束) on the values of attributes in a relation and are used in normalization
A functional dependency (FD) describes the relationship between attributes in a relation We say a set of attributes X functionally determines an attribute Y if given the values of X we always know the only possible value of Y
Notation X rarr Y X functionally determines Y Y is functionally dependent on XExample
eno rarr ename eno pno rarr Duration
Notation for Functional Dependencies
A functional dependency has a left-hand side called the determinant which is a set of attributes and one attribute on the right-hand side
eno pno rarr duration
determinant determined attribute
Strictly speaking there is always only one attribute on the RHS but we can combine several functional dependencies into one
eno pno rarr durationeno pno rarr respeno pno rarr duration resp
Remember that this is really two functional dependencies
The Semantics ofFunctional Dependencies
Similar to keys Functional dependencies are a property of a relation schema (intension) and not a property of a particular instance of the schema (extension)
Functional dependencies are directional
eno rarr ename ename rarr eno
ExampleFDs in EmpDept relation
bull EmpDept RelationENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8E8 J Jones 10-11-72 SA 50000 null D1 Management E8
Eno rarr ename bdate title salary superno dno
dnorarrdname mgrno
titlerarr salary
Trivial Functional Dependencies ( 平凡函数依赖 )
A functional dependency is trivialif the attributes on its left-hand side are a superset of the attributes on its right-hand side
Examples eno rarr eno
eno ename rarr eno
eno pno duration rarr eno Duration
Trivial functional dependencies are not interesting because they do not tell us anything
We are only interested in nontrivial FDs
Full Functional Dependencies
A functional dependency A rarr B is a full functional dependency if removal of any attribute from A results in the dependency not existing any more We say that B is fully functionally dependent on A If remove an attribute from A and the functional dependency still exists we say that B is partially dependent on A
Example
eno rarr ename (full FD)
eno ename rarr salary title (partial FD - can remove ename)
eno pno rarr hours resp (full FD)
How to Identify FDs in a relation
Examining the semantics of the attributes of relation
Identify all non-trivial functional dependencies in the Universal Relation
Universal (eno pno resp duration ename bdate title salary superNo dno dname mgrNo pname budget)
Eno rarr ename bdate title salary superno dnodnorarrdname mgrnotitlerarr salary
Formal Definition ofFunctional Dependencies
Given relation R defined over U = A1 A2 An where X U Y U If for all pairs of tuples t1 and t2 in any legal instance of relation schema R
t1[X] = t2[X] implies t1[Y] = t2[Y]
then the functional dependency X rarr Y holds in R
Functional Dependencies and Keys
Functional dependencies can be used to determine the candidate keys of a relation
What is superkey candidate key primary key
For example if an attribute functionally determines all other attributes in the relation that attribute can be a key
eno rarr eno ename bdate title supereno dno eno is a candidate key for the Emp relation
Functional Dependencies and Keys
ENo rarr EName BDate Title SuperNo DNoDNo rarr DName MgreNotitle rarr salary
Question List the candidate key(s) of this relation
ENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5
E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7
E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5
E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5
E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7
E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8
E8 J Jones 10-11-72 SA 50000 null D1 Management E8
EmpDept relation
Applications of FDs
1) given a set of FDs F in relation R FD X rarr Y is true or not
2) given a set of FDs F in relation R a set of attributes X is a candidate key of R or not
3) given a set of FDs F in relation R which candidate keys are in R
4) given a set of FDs F in relation R some FDs in F is Excessive or not
Rules for reasoning about functional dependencies
Armstrongs Axioms are three inference rules for functional dependenciesLet A B C and D be subsets of the attributes of relation R
IR1 Reflexive Rule( 自反性 ) If AB then A rarr B
A set of attributes always trivially functionally determines any subset of itself
IR2 Augmentation Rule( 增加性 ) If A rarr B then AC rarr BC
Adding a set of attributes to both the LHS and RHS results in another valid functional dependency
IR3 Transitive Rule( 传递性 ) If A rarr B and B rarr C then A rarr C
Transitivity holds with FDs
Derived Rules from Armstrongs Axioms
There are three other well-known rules that can be derived from the basic three
IR4 Decomposition( 可分解性 ) If A rarr BC then A rarr B and A rarr C A FD with two attributes on RHS can be decomposed into two FDs each with on attribute on the RHS
IR5 Union( 合并性 ) If A rarr B and A rarr C then A rarr BC Reverse of decomposition rule
IR6 Composition( 组合性 ) If A rarr B and C rarr D then AC rarr BD More general than union - can combine two non-overlapping FDs into one
Characteristics of Armstrongs Axioms
The rules are sound( 健壮的 ) and complete( 完备的 )
1) Sound The rules do not infer( 推断 ) any incorrect FDs
2) Complete The rules can be used to infer all correct FDs (no FDs are missing)
The axioms (called IR1 IR2 and IR3) can be used to determine a set of all FDs F+ from an initial set of FDs F
F+ is referred to as the closure of F
Strategy of dealing with questions
The basic idea is that given any set of attributes X we can compute the set of all attributes X+ that can be functionally determined using F This is called the closure of X under F
For question 1 given a set of FDs F in R FD X rarr Y is true or not we know that X rarr Y holds if Y is in X+
For question 2 a set of attributes X is a candidate key of R or not X is a superkey of R if X+ is all attributes of R
For question 3 how to computing F+ we can compute X+ for all possible subsets X of R
Computing the Attribute Closure
The algorithm is as follows
Given a set of attributes X
Let X+ = X
Repeat Find a FD in F whose left side is a subset of X+ Add the right side of F to X+
Until (X+ does not change or X+ contains all attributes of R)
After the algorithm completes you have a set of attributes X+ that can be functionally determined by X This allows you to produce FDs of the form
X rarr A where A is in X+
Computing Attribute Closure Example
R (ABCDEG)
F = A rarr BC C rarr D D rarr G
Compute A+
A+ = A (initial step)
A+ = ABC (by using FD A rarr BC)
A+ = ABCD (by using FD C rarr D)
A+ = ABCDG (by using FD D rarr G)
Similarly we can compute C+ and EG+
C+ = CDG
EG+ = EG
Using Attribute Closure to Detect a FD
The attribute closure algorithm can be used to determine if a particular FD X rarr Y holds based on the set of given FDs F Compute X+ and see if Y is in X+
Example
R = (ABCDEG)
F = A rarr BC C rarr D D rarr G
Does the FD CD rarr G hold Compute CD+ = CDG Yes the FD CD rarr G holds
Does the FD BC rarr E hold Compute BC+ = BCDG No the FD BC rarr E does n
ot hold
Using Attribute Closure to Find all FDs
The attribute closure algorithm can be used to find all FDs of a given relation R if perform the attribute closure algorithm for all subsets X of the attributes R
This is an exponential algorithm in the number of attributes in R Although there are a few optimizations Do not compute the closure of the empty set or of the set of all attributes If we find X+ = all attributes do not compute the closure of any supersets of X Note that even though you do not have to compute the closure (because you know the answer) you still must use the known closure to infer new FDs
Using Attribute Closureto Find all FDs Example
Compute F+ from F where
R (ABCD)
F = A rarr BC C rarr D
Must compute all these attribute closures
A+ B+ C+ D+
AB+ AC+ AD+ BC+ BD+ CD+
ABC+ ABD+ ACD+ BCD+
Do not have to compute ABCD+
Start with the single sets
A+ = ABCD B+ = B C+ = CD D+ = D
New FD A rarr D
Using Attribute Closureto Find all FDs Example (cont)
Next the two element sets Do not compute AB+ AC+ AD+ because A+ contained all attributes of R New FDs AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C BC+ = BCD BD+ = BD CD+ = CD New FD BC rarr DThree element sets Do not compute ABC+ ABD+ ACD+ New FDs ABC rarr D ABD rarr C ACD rarr B BCD+ = BCD No new FDsAll FDs F+ = A rarr BC C rarr D A rarr D BC rarr D AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C ABC rarr D ABD rarrC ACD rarr B
FD Closure Questions
1) Compute all nontrivial FDs and list all keys of R
R = (ABCD)
F = AB rarr C C rarr D D rarr A
2) Compute all nontrivial FDs and list all candidate keys of R
R = (ABCD)
F = AB rarr C BC rarr D CD rarr A AD rarr B
Equivalent Sets of FDs
A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+ We can determine if F covers E by calculating X+ with respect to F for each FD X rarr Y in E and checking whether X+ includes Y
Two sets of FDs E and F are equivalent if E covers F and F covers E That is E+= F+
Minimal Set of FDs
A set of FDs F is minimal if it satisfies these conditions Every FD in F has a single attribute on its right-hand side We cannot replace any FD X rarr A in F with Y rarr A where Y
X and still have a set of dependencies that is equivalent to F
We cannot remove any FD and still have a set that is equivalent to F
A minimal sets of FDs is like a canonical( 规范的 ) form of FDs
Note that there may be many minimal covers for a given set of functional dependencies F
Normal Forms
A relation is in a particular normal form if it satisfies certain
normalization properties
There are several normal forms defined 1NF - First Normal Form 2NF - Second Normal Form 3NF - Third Normal Form BCNF - Boyce-Codd Normal Form 4NF - Fourth Normal Form 5NF - Fifth Normal Form
Each of these normal forms are stricter than the next
For example 3NF is better than 2NF because it removes more redundancyanomalies from the schema than 2NF
Normal Forms
All relations (non-normalized)
1NF (First Normal Form)
2NF (Second Normal Form)
3NF (Third Normal Form)
BCNF (Boyce-Codd NF)
4NF (Fourth NF)
5NF (Fifth NF)
First Normal Form (1NF)
A relation is in first normal form (1NF) if all its attribute values are atomic
That is a 1NF relation cannot have an attribute value that is a set of values (multi-valued attribute) a set of tuples (nested relation)
1NF is a standard assumption in relational DBMSs However object-oriented DBMSs and nested relational DBMSs relax t
his constraint
A relation that is not in 1NF is an unnormalized relation
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1P2
AnalystAnalyst
246
E3 A Lee P3P4
ConsultantEngineer
1048
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1 P2 Analyst Analyst 24 6
E3 A Lee P3P4 Consultant Engineer 10 48
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
A non-1NF Relation
One way to convert a non-1NF relation to a 1NF relation
1) Splitting Method - Divide the existing relation into two relations one with non-repeating attributes and the other with repeating attributes
Make a new relation consisting of the primary key of the original relation and the repeating attributes Determine a primary key for this new relation
Remove the repeating attributes from the original relation
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1P2
AnalystAnalyst
246
E3 A Lee P3P4
ConsultantEngineer
1048
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
ENo EName
E1 J Doe
E2 M Smith
E3 A Lee
E4 J Miller
E5 B Casey
E6 L Chu
E7 R Davis
ENo PNo Resp DurationE1 P1 Manager 12E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36
Second Normal Form (2NF)
A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key
If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation
Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key
Second Normal Form (2NF) Example
fd1 and fd4 are partial functional dependencies Normalize to
Emp (eno ename title bdate salary supereno dno)
WorksOn (eno pno resp hours)
Proj (pno pname budget)
ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration
EmpProj relation
fd1 fd2
fd3fd4
Second Normal Form (2NF) Example (2)
ENo EName
BDate Title Salary SuperNo
DNo
Emp relation
fd1
fd2
fd3fd4
PNo PName Budget
Proj relationENo PNo Resp Duration
WorkOn relation
Third Normal Form (3NF)
Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th
an 2 FDs
A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key
Converting a relation to 3NF from 2NF involves the removal of transitive dependencies
Third Normal Form (3NF) Example
fd2 results in a transitive dependency eno rarr salary Remove it
ENo EName BDate Title Salary SuperNo DNo
Emp relation
fd1
fd2
fd2
Title Salary
Pay relation
ENo EName BDate Title SuperNo DNo
Emp relation
fd1
General Definitions of 2NF and 3NF
We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)
General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key
General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key
Note that a prime attribute is an attribute that is in any key (candidate or primary)
General Definition of 3NF Example
The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute
However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key
Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key
ENo EName BDate Title Salary SuperNo DNo SSN
Emp relation
fd1
fd2
Normalization Question
Consider the universal relation
R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ
List the keys for R
Decompose R into 2NF and then 3NF relations
Boyce-Codd Normal Form (BCNF)
A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key
To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key
The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
ClientInterview relation
Relation is in 3NF but not BCNF
fd4
fd3
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
Note that we lose the FD fd2
fd4
fd3
ClientNo InterviewDate
InterviwTime
StaffNo StaffNo InterviwDate
RoomNo
Interview relation StaffRoom relation
BCNF versus 3NF
We can decompose to BCNF but sometimes we do not want to if we lose a FD
The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency
Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)
In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)
Conversion to BCNF
There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y
Note that if any FD in F+ violates BCNF then a FD in F violates BCNF
2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey
3) Replace R by relations with schemas R1 = X+
R2 = (R ndash X+) U X
4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib
utes are all in R1 or R2
Normalization Question
Given this database schema normalize into BCNF
fd1
PID Country Lot Area Price TaxRate
fd2
fd4fd3
Multi-Valued Dependencies
A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema
A MVD occurs when two independent 1N relationships are in the relational schema
Multi-Valued Dependencies Example
Eno ename Title Salary Pno dno
E1 J Doe EE 30000 P1P2
E2 M Smith SA 50000 P1P4 D1D2D3
E3 A Lee ME 40000 P1P3 D2D3
E4 J Miller PR 20000 P3 D3
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1
E1 J Doe EE 30000 P2
E2 M Smith SA 50000 P1 D1
E2 M Smith SA 50000 P1 D2
E2 M Smith SA 50000 P1 D3
E2 M Smith SA 50000 P4 D1
E2 M Smith SA 50000 P4 D2
E2 M Smith SA 50000 P4 D3
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P3 D3
E3 A Lee ME 40000 P3 D3
E4 J Miller PR 20000 P3 D3
Multi-Valued Dependencies (MVDs)
A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C
A trivial MVD A mdashgtgt B occurs when either B is a subset of A or
A cup B = R
Fourth Normal Form (4NF)
Fourth normal form (4NF) is based on the idea of multi-valued dependencies
A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies
Fourth Normal Form (4NF) Example
eno mdashgtgtpno
eno mdashgtgtdno
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3
eno ename title salary
E1 J Doe EE 30000
E2 MSmith SA 50000
E3 A Lee ME 40000
E4 J Miller PR 20000
eno pno
E1 P1
E1 P2
E2 P1
E2 P4
E3 P1
E3 P3
E4 P3
eno dno
E2 D1
E2 D2
E2 D3
E3 D2
E3 D3
E4 D3
Lossless-join Dependency
The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation
A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined
Fifth Normal Form (5NF)
bull Fifth normal form (5NF) is based on join dependencies
A relation is in fifth normal form (5NF) if the relation has no join dependency
bull 5NF is also called project-join normal form (PJNF)
A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R
The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have
ΠR1(r) ΠR2(r) hellip ΠRn(r) = r
Fifth Normal Form (5NF) Example
Consider a relation Supply (sname partName projName)
1048715Add the additional constraint that
If project j requires part p
and supplier s supplies part p
and supplier s supplies at least one item to project j
Then supplier s also supplies part p to project j
Original Supply Relation Supply Relation with new constraintsname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
Fifth Normal Form (5NF) Example
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
sname partName
Smith Bolt
Smith Nut
Adam Bolt
Walton Nut
Adam Nail
sname projName
Smith X
Smith Y
Adam Y
Walton Z
Adam X
partName projName
Bolt X
Nut Y
Bolt Y
Nut Z
Nail X
Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples
Desirable Properties in normalization
Relational schemas that are well-designed have several
important properties
1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or
relationship
2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins
3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations
Normal Forms in Practice
Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used
For example query execution time may increase because of normalization as more joins become necessary to answer queries
Normal Forms in Practice Example
For example street and city uniquely determine a zipcode
In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed
When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly
fd1
ENo City Street ZipCode
fd2
ENo ZipCode
ZipCode City Street
Normal Forms in Practice Summary
In summary normalization is typically used in two ways
To improve on relational schemas after ER design
To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies
There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance
This is the classic time versus space tradeoff
Conclusion
Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies
Normal forms indicate when relations satisfy certain properties
1NF - All attributes are atomic
2NF - All attributes are fully functionally dependent on a key
3NF - There are no transitive dependencies in the relation
4NF - There are no multi-valued dependencies in the relation
5NF - The relation has no join dependency
In practice normalization is used to improve schemas produced after ER design and existing relational schemas
It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time
Objectives
Explain process of normalization and why it is beneficial
Describe the concept of the Universal Relation
What are the motivations for normalization
What are the 3 types of update anomalies Be able to give examples
Describe the lossless-join property and dependency preservation property
Define and explain the concept of a functional dependency
What is a trivial FD
Describe the relationship between FDs and keys
What are Armstrongs Axioms When are they used
What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs
Objectives (2)
Define minimal and covers for a set of FDs
Define normal forms be able to list the normal forms and draw a diagram showing their relationship
Define 1NF Know how to convert non-1NF relation to 1NF
Define 2NF Convert 1NF relation to 2NF
Define 3NF Convert 2NF relation to 3NF
Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example
Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm
Describe multi-valued dependencies When do they occur
What is a trivial MVD
Define 4NF Convert 3NFBCNF relation to 4NF
Objectives (3)
Describe lossless-join property and how it relates to 5NF
Define 5NF
Explain normalization in terms of the time versus space tradeoff
Explain the relationship between normalization and ER modeling
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Slide 45
- Slide 46
- Slide 47
- Slide 48
- Slide 49
- Slide 50
- Slide 51
- Slide 52
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- Slide 60
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
-
Objectives
To answer
What is normalization (规范化) of relation
Why is normalization is necessary
What is general principle ( 基本原则 ) of normalization
What is the theory basic ( 理论基础 ) of normalization
How to normalize
Contents
① redundancy( 冗余 ) and update anomalies ( 更新异常) with improper relation design
② lossless-join ( 无损联接 ) property and dependency p
reservation ( 依赖保留 ) property
③ functional dependency (函数依赖) concept and it
applications
④Five Normalization forms ( 5 个范式)
1NF 2NF 3NF BCNF 4NF 5NF
Normalization
Given a collection of relations are they accurate rational and true representation of the enterprise data
Normalization
So we need a technology to validate our design
What is meaning of ldquoaccuraterdquo ldquorationalrdquo ldquoturerdquoMeet the business requirementsConsistencyExtensibilityLess redundancy PerformanceCorrectness
Motivation ( 动机 ) of Normalization
When the design of database is improper it leads to the problems of redundancies( 冗余 ) and update anomalies( 更新异常 )
Redundancy occurs when the same data value is stored more than once in a relation Redundancy wastes space and reduces performa
nceUpdate anomalies are problems that arise when tr
ying to insert delete or update tuples and are often caused by redundancy
Motivation of Normalization
Dream GameDepartment Staff Card
Department No CR74 Department Name Engineering
Staff No E2 Name MSmith
Title Database Designer Salary 66000
Remark
Department Manager Yes No⊙
Supervisor E5
Example of improper relation
bull DeptEmp Relation
ENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5
E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7
E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5
E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5
E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7
E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8
E8 J Jones 10-11-72 SA 50000 null D1 Management E8
Insertion Anomaly Example
Consider these two types of insertion anomalies1) Insert a new employee E9 working in department D2
ndash You have to redundantly insert the department name and manager when adding this record
2) Insert a department D4 that has no current employeesndash This insertion is not possible without creating a dummy employee id a
nd record because ENo is the primary key of the relation
ENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5
E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7
E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5
E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5
E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7
E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8
E8 J Jones 10-11-72 SA 50000 null D1 Management E8
Deletion Anomaly Example
Consider this deletion anomaly- Delete employees E3 and E6 from the database
- Deleting those two employees removes them from the database and we now have lost information about department D2
ENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5
E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7
E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5
E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5
E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7
E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8
E8 J Jones 10-11-72 SA 50000 null D1 Management E8
Modification Anomaly Example
Consider these modification anomalies
1) Change the name of department D3 to ldquoEngineeringrdquo You must update the department name in 3 different records
2) Change the manager of D1 to E4 You must update the superno field in 2 different records
ENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5
E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7
E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5
E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5
E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7
E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8
E8 J Jones 10-11-72 SA 50000 null D1 Management E8
Update Anomalies
bull There are three major types of update anomalies Insertion Anomalies - Insertion of a tuple into the r
elation either requires insertion of redundant information or cannot be performed without setting key values to NULL
Deletion Anomalies - Deletion of a tuple may lose information that is still required to be stored
Modification Anomalies - Changing an attribute of a tuple may require changing multiple attribute values in other tuples
Overview to Normalization
Normalization is a validation technique for producing relations with desirable properties
Normalization is a bottom-up design technique for producing relations
Normalization decomposes relations into smaller relations that contain less redundancy This decomposition requires reconstruction of the original relations from the smaller relations and no information is lost
It was developed by Codd in 1972 preceding ER modeling and extended by others over the years
Example Normalization to DeptEmp
bull DeptEmp Relation
ENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5
E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7
E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5
E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5
E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7
E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8
E8 J Jones 10-11-72 SA 50000 null D1 Management E8
Normalization to DeptEmp
Emp Relation Dept Relation
dno dname mgrno
D1 Management E8
D2 consulting E7
D3 Accounting E5
eno ename bdate title salary superNo dno
E1 J Doe 01-05-75 EE 30000 E2 null
E2 M Smith 06-04-66 SA 50000 E5 D3
E3 A Lee 07-05-66 ME 40000 E7 D2
E4 J Miller 09-01-50 PR 20000 E6 D3
E5 B Casey 12-25-71 SA 50000 E8 D3
E6 L Chu 11-30-65 EE 30000 E7 D2
E7 R Davis 09-08-77 ME 40000 E8 D1
E8 J Jones 10-11-72 SA 50000 null D1
Dept Emp= Emp dno Dept
Overview to Normalization
Input a relation schema and constraints on attributes
Output one or more relation schemas
Processing Idea examining the relationships between attributes and organize the data according to its functional dependencies( 函数依赖)
Significance
1) Validate our design correct our improper design
2) Help for understanding the nature and the meaning of the data in the enterprise for learning more about the data semantics
When to use normalization
Normalization can be used after mapping ER model into relation model or independently
Normalization may be especially useful for databases that have already been designed without using formal techniques
Theory basic of normalization
Functional Dependencies ( 函数依赖) Functional dependencies represent integrity constraints ( 完整
性约束) on the values of attributes in a relation and are used in normalization
A functional dependency (FD) describes the relationship between attributes in a relation We say a set of attributes X functionally determines an attribute Y if given the values of X we always know the only possible value of Y
Notation X rarr Y X functionally determines Y Y is functionally dependent on XExample
eno rarr ename eno pno rarr Duration
Notation for Functional Dependencies
A functional dependency has a left-hand side called the determinant which is a set of attributes and one attribute on the right-hand side
eno pno rarr duration
determinant determined attribute
Strictly speaking there is always only one attribute on the RHS but we can combine several functional dependencies into one
eno pno rarr durationeno pno rarr respeno pno rarr duration resp
Remember that this is really two functional dependencies
The Semantics ofFunctional Dependencies
Similar to keys Functional dependencies are a property of a relation schema (intension) and not a property of a particular instance of the schema (extension)
Functional dependencies are directional
eno rarr ename ename rarr eno
ExampleFDs in EmpDept relation
bull EmpDept RelationENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8E8 J Jones 10-11-72 SA 50000 null D1 Management E8
Eno rarr ename bdate title salary superno dno
dnorarrdname mgrno
titlerarr salary
Trivial Functional Dependencies ( 平凡函数依赖 )
A functional dependency is trivialif the attributes on its left-hand side are a superset of the attributes on its right-hand side
Examples eno rarr eno
eno ename rarr eno
eno pno duration rarr eno Duration
Trivial functional dependencies are not interesting because they do not tell us anything
We are only interested in nontrivial FDs
Full Functional Dependencies
A functional dependency A rarr B is a full functional dependency if removal of any attribute from A results in the dependency not existing any more We say that B is fully functionally dependent on A If remove an attribute from A and the functional dependency still exists we say that B is partially dependent on A
Example
eno rarr ename (full FD)
eno ename rarr salary title (partial FD - can remove ename)
eno pno rarr hours resp (full FD)
How to Identify FDs in a relation
Examining the semantics of the attributes of relation
Identify all non-trivial functional dependencies in the Universal Relation
Universal (eno pno resp duration ename bdate title salary superNo dno dname mgrNo pname budget)
Eno rarr ename bdate title salary superno dnodnorarrdname mgrnotitlerarr salary
Formal Definition ofFunctional Dependencies
Given relation R defined over U = A1 A2 An where X U Y U If for all pairs of tuples t1 and t2 in any legal instance of relation schema R
t1[X] = t2[X] implies t1[Y] = t2[Y]
then the functional dependency X rarr Y holds in R
Functional Dependencies and Keys
Functional dependencies can be used to determine the candidate keys of a relation
What is superkey candidate key primary key
For example if an attribute functionally determines all other attributes in the relation that attribute can be a key
eno rarr eno ename bdate title supereno dno eno is a candidate key for the Emp relation
Functional Dependencies and Keys
ENo rarr EName BDate Title SuperNo DNoDNo rarr DName MgreNotitle rarr salary
Question List the candidate key(s) of this relation
ENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5
E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7
E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5
E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5
E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7
E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8
E8 J Jones 10-11-72 SA 50000 null D1 Management E8
EmpDept relation
Applications of FDs
1) given a set of FDs F in relation R FD X rarr Y is true or not
2) given a set of FDs F in relation R a set of attributes X is a candidate key of R or not
3) given a set of FDs F in relation R which candidate keys are in R
4) given a set of FDs F in relation R some FDs in F is Excessive or not
Rules for reasoning about functional dependencies
Armstrongs Axioms are three inference rules for functional dependenciesLet A B C and D be subsets of the attributes of relation R
IR1 Reflexive Rule( 自反性 ) If AB then A rarr B
A set of attributes always trivially functionally determines any subset of itself
IR2 Augmentation Rule( 增加性 ) If A rarr B then AC rarr BC
Adding a set of attributes to both the LHS and RHS results in another valid functional dependency
IR3 Transitive Rule( 传递性 ) If A rarr B and B rarr C then A rarr C
Transitivity holds with FDs
Derived Rules from Armstrongs Axioms
There are three other well-known rules that can be derived from the basic three
IR4 Decomposition( 可分解性 ) If A rarr BC then A rarr B and A rarr C A FD with two attributes on RHS can be decomposed into two FDs each with on attribute on the RHS
IR5 Union( 合并性 ) If A rarr B and A rarr C then A rarr BC Reverse of decomposition rule
IR6 Composition( 组合性 ) If A rarr B and C rarr D then AC rarr BD More general than union - can combine two non-overlapping FDs into one
Characteristics of Armstrongs Axioms
The rules are sound( 健壮的 ) and complete( 完备的 )
1) Sound The rules do not infer( 推断 ) any incorrect FDs
2) Complete The rules can be used to infer all correct FDs (no FDs are missing)
The axioms (called IR1 IR2 and IR3) can be used to determine a set of all FDs F+ from an initial set of FDs F
F+ is referred to as the closure of F
Strategy of dealing with questions
The basic idea is that given any set of attributes X we can compute the set of all attributes X+ that can be functionally determined using F This is called the closure of X under F
For question 1 given a set of FDs F in R FD X rarr Y is true or not we know that X rarr Y holds if Y is in X+
For question 2 a set of attributes X is a candidate key of R or not X is a superkey of R if X+ is all attributes of R
For question 3 how to computing F+ we can compute X+ for all possible subsets X of R
Computing the Attribute Closure
The algorithm is as follows
Given a set of attributes X
Let X+ = X
Repeat Find a FD in F whose left side is a subset of X+ Add the right side of F to X+
Until (X+ does not change or X+ contains all attributes of R)
After the algorithm completes you have a set of attributes X+ that can be functionally determined by X This allows you to produce FDs of the form
X rarr A where A is in X+
Computing Attribute Closure Example
R (ABCDEG)
F = A rarr BC C rarr D D rarr G
Compute A+
A+ = A (initial step)
A+ = ABC (by using FD A rarr BC)
A+ = ABCD (by using FD C rarr D)
A+ = ABCDG (by using FD D rarr G)
Similarly we can compute C+ and EG+
C+ = CDG
EG+ = EG
Using Attribute Closure to Detect a FD
The attribute closure algorithm can be used to determine if a particular FD X rarr Y holds based on the set of given FDs F Compute X+ and see if Y is in X+
Example
R = (ABCDEG)
F = A rarr BC C rarr D D rarr G
Does the FD CD rarr G hold Compute CD+ = CDG Yes the FD CD rarr G holds
Does the FD BC rarr E hold Compute BC+ = BCDG No the FD BC rarr E does n
ot hold
Using Attribute Closure to Find all FDs
The attribute closure algorithm can be used to find all FDs of a given relation R if perform the attribute closure algorithm for all subsets X of the attributes R
This is an exponential algorithm in the number of attributes in R Although there are a few optimizations Do not compute the closure of the empty set or of the set of all attributes If we find X+ = all attributes do not compute the closure of any supersets of X Note that even though you do not have to compute the closure (because you know the answer) you still must use the known closure to infer new FDs
Using Attribute Closureto Find all FDs Example
Compute F+ from F where
R (ABCD)
F = A rarr BC C rarr D
Must compute all these attribute closures
A+ B+ C+ D+
AB+ AC+ AD+ BC+ BD+ CD+
ABC+ ABD+ ACD+ BCD+
Do not have to compute ABCD+
Start with the single sets
A+ = ABCD B+ = B C+ = CD D+ = D
New FD A rarr D
Using Attribute Closureto Find all FDs Example (cont)
Next the two element sets Do not compute AB+ AC+ AD+ because A+ contained all attributes of R New FDs AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C BC+ = BCD BD+ = BD CD+ = CD New FD BC rarr DThree element sets Do not compute ABC+ ABD+ ACD+ New FDs ABC rarr D ABD rarr C ACD rarr B BCD+ = BCD No new FDsAll FDs F+ = A rarr BC C rarr D A rarr D BC rarr D AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C ABC rarr D ABD rarrC ACD rarr B
FD Closure Questions
1) Compute all nontrivial FDs and list all keys of R
R = (ABCD)
F = AB rarr C C rarr D D rarr A
2) Compute all nontrivial FDs and list all candidate keys of R
R = (ABCD)
F = AB rarr C BC rarr D CD rarr A AD rarr B
Equivalent Sets of FDs
A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+ We can determine if F covers E by calculating X+ with respect to F for each FD X rarr Y in E and checking whether X+ includes Y
Two sets of FDs E and F are equivalent if E covers F and F covers E That is E+= F+
Minimal Set of FDs
A set of FDs F is minimal if it satisfies these conditions Every FD in F has a single attribute on its right-hand side We cannot replace any FD X rarr A in F with Y rarr A where Y
X and still have a set of dependencies that is equivalent to F
We cannot remove any FD and still have a set that is equivalent to F
A minimal sets of FDs is like a canonical( 规范的 ) form of FDs
Note that there may be many minimal covers for a given set of functional dependencies F
Normal Forms
A relation is in a particular normal form if it satisfies certain
normalization properties
There are several normal forms defined 1NF - First Normal Form 2NF - Second Normal Form 3NF - Third Normal Form BCNF - Boyce-Codd Normal Form 4NF - Fourth Normal Form 5NF - Fifth Normal Form
Each of these normal forms are stricter than the next
For example 3NF is better than 2NF because it removes more redundancyanomalies from the schema than 2NF
Normal Forms
All relations (non-normalized)
1NF (First Normal Form)
2NF (Second Normal Form)
3NF (Third Normal Form)
BCNF (Boyce-Codd NF)
4NF (Fourth NF)
5NF (Fifth NF)
First Normal Form (1NF)
A relation is in first normal form (1NF) if all its attribute values are atomic
That is a 1NF relation cannot have an attribute value that is a set of values (multi-valued attribute) a set of tuples (nested relation)
1NF is a standard assumption in relational DBMSs However object-oriented DBMSs and nested relational DBMSs relax t
his constraint
A relation that is not in 1NF is an unnormalized relation
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1P2
AnalystAnalyst
246
E3 A Lee P3P4
ConsultantEngineer
1048
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1 P2 Analyst Analyst 24 6
E3 A Lee P3P4 Consultant Engineer 10 48
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
A non-1NF Relation
One way to convert a non-1NF relation to a 1NF relation
1) Splitting Method - Divide the existing relation into two relations one with non-repeating attributes and the other with repeating attributes
Make a new relation consisting of the primary key of the original relation and the repeating attributes Determine a primary key for this new relation
Remove the repeating attributes from the original relation
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1P2
AnalystAnalyst
246
E3 A Lee P3P4
ConsultantEngineer
1048
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
ENo EName
E1 J Doe
E2 M Smith
E3 A Lee
E4 J Miller
E5 B Casey
E6 L Chu
E7 R Davis
ENo PNo Resp DurationE1 P1 Manager 12E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36
Second Normal Form (2NF)
A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key
If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation
Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key
Second Normal Form (2NF) Example
fd1 and fd4 are partial functional dependencies Normalize to
Emp (eno ename title bdate salary supereno dno)
WorksOn (eno pno resp hours)
Proj (pno pname budget)
ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration
EmpProj relation
fd1 fd2
fd3fd4
Second Normal Form (2NF) Example (2)
ENo EName
BDate Title Salary SuperNo
DNo
Emp relation
fd1
fd2
fd3fd4
PNo PName Budget
Proj relationENo PNo Resp Duration
WorkOn relation
Third Normal Form (3NF)
Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th
an 2 FDs
A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key
Converting a relation to 3NF from 2NF involves the removal of transitive dependencies
Third Normal Form (3NF) Example
fd2 results in a transitive dependency eno rarr salary Remove it
ENo EName BDate Title Salary SuperNo DNo
Emp relation
fd1
fd2
fd2
Title Salary
Pay relation
ENo EName BDate Title SuperNo DNo
Emp relation
fd1
General Definitions of 2NF and 3NF
We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)
General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key
General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key
Note that a prime attribute is an attribute that is in any key (candidate or primary)
General Definition of 3NF Example
The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute
However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key
Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key
ENo EName BDate Title Salary SuperNo DNo SSN
Emp relation
fd1
fd2
Normalization Question
Consider the universal relation
R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ
List the keys for R
Decompose R into 2NF and then 3NF relations
Boyce-Codd Normal Form (BCNF)
A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key
To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key
The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
ClientInterview relation
Relation is in 3NF but not BCNF
fd4
fd3
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
Note that we lose the FD fd2
fd4
fd3
ClientNo InterviewDate
InterviwTime
StaffNo StaffNo InterviwDate
RoomNo
Interview relation StaffRoom relation
BCNF versus 3NF
We can decompose to BCNF but sometimes we do not want to if we lose a FD
The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency
Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)
In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)
Conversion to BCNF
There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y
Note that if any FD in F+ violates BCNF then a FD in F violates BCNF
2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey
3) Replace R by relations with schemas R1 = X+
R2 = (R ndash X+) U X
4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib
utes are all in R1 or R2
Normalization Question
Given this database schema normalize into BCNF
fd1
PID Country Lot Area Price TaxRate
fd2
fd4fd3
Multi-Valued Dependencies
A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema
A MVD occurs when two independent 1N relationships are in the relational schema
Multi-Valued Dependencies Example
Eno ename Title Salary Pno dno
E1 J Doe EE 30000 P1P2
E2 M Smith SA 50000 P1P4 D1D2D3
E3 A Lee ME 40000 P1P3 D2D3
E4 J Miller PR 20000 P3 D3
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1
E1 J Doe EE 30000 P2
E2 M Smith SA 50000 P1 D1
E2 M Smith SA 50000 P1 D2
E2 M Smith SA 50000 P1 D3
E2 M Smith SA 50000 P4 D1
E2 M Smith SA 50000 P4 D2
E2 M Smith SA 50000 P4 D3
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P3 D3
E3 A Lee ME 40000 P3 D3
E4 J Miller PR 20000 P3 D3
Multi-Valued Dependencies (MVDs)
A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C
A trivial MVD A mdashgtgt B occurs when either B is a subset of A or
A cup B = R
Fourth Normal Form (4NF)
Fourth normal form (4NF) is based on the idea of multi-valued dependencies
A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies
Fourth Normal Form (4NF) Example
eno mdashgtgtpno
eno mdashgtgtdno
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3
eno ename title salary
E1 J Doe EE 30000
E2 MSmith SA 50000
E3 A Lee ME 40000
E4 J Miller PR 20000
eno pno
E1 P1
E1 P2
E2 P1
E2 P4
E3 P1
E3 P3
E4 P3
eno dno
E2 D1
E2 D2
E2 D3
E3 D2
E3 D3
E4 D3
Lossless-join Dependency
The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation
A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined
Fifth Normal Form (5NF)
bull Fifth normal form (5NF) is based on join dependencies
A relation is in fifth normal form (5NF) if the relation has no join dependency
bull 5NF is also called project-join normal form (PJNF)
A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R
The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have
ΠR1(r) ΠR2(r) hellip ΠRn(r) = r
Fifth Normal Form (5NF) Example
Consider a relation Supply (sname partName projName)
1048715Add the additional constraint that
If project j requires part p
and supplier s supplies part p
and supplier s supplies at least one item to project j
Then supplier s also supplies part p to project j
Original Supply Relation Supply Relation with new constraintsname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
Fifth Normal Form (5NF) Example
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
sname partName
Smith Bolt
Smith Nut
Adam Bolt
Walton Nut
Adam Nail
sname projName
Smith X
Smith Y
Adam Y
Walton Z
Adam X
partName projName
Bolt X
Nut Y
Bolt Y
Nut Z
Nail X
Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples
Desirable Properties in normalization
Relational schemas that are well-designed have several
important properties
1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or
relationship
2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins
3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations
Normal Forms in Practice
Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used
For example query execution time may increase because of normalization as more joins become necessary to answer queries
Normal Forms in Practice Example
For example street and city uniquely determine a zipcode
In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed
When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly
fd1
ENo City Street ZipCode
fd2
ENo ZipCode
ZipCode City Street
Normal Forms in Practice Summary
In summary normalization is typically used in two ways
To improve on relational schemas after ER design
To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies
There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance
This is the classic time versus space tradeoff
Conclusion
Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies
Normal forms indicate when relations satisfy certain properties
1NF - All attributes are atomic
2NF - All attributes are fully functionally dependent on a key
3NF - There are no transitive dependencies in the relation
4NF - There are no multi-valued dependencies in the relation
5NF - The relation has no join dependency
In practice normalization is used to improve schemas produced after ER design and existing relational schemas
It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time
Objectives
Explain process of normalization and why it is beneficial
Describe the concept of the Universal Relation
What are the motivations for normalization
What are the 3 types of update anomalies Be able to give examples
Describe the lossless-join property and dependency preservation property
Define and explain the concept of a functional dependency
What is a trivial FD
Describe the relationship between FDs and keys
What are Armstrongs Axioms When are they used
What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs
Objectives (2)
Define minimal and covers for a set of FDs
Define normal forms be able to list the normal forms and draw a diagram showing their relationship
Define 1NF Know how to convert non-1NF relation to 1NF
Define 2NF Convert 1NF relation to 2NF
Define 3NF Convert 2NF relation to 3NF
Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example
Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm
Describe multi-valued dependencies When do they occur
What is a trivial MVD
Define 4NF Convert 3NFBCNF relation to 4NF
Objectives (3)
Describe lossless-join property and how it relates to 5NF
Define 5NF
Explain normalization in terms of the time versus space tradeoff
Explain the relationship between normalization and ER modeling
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Slide 45
- Slide 46
- Slide 47
- Slide 48
- Slide 49
- Slide 50
- Slide 51
- Slide 52
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- Slide 60
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
-
Contents
① redundancy( 冗余 ) and update anomalies ( 更新异常) with improper relation design
② lossless-join ( 无损联接 ) property and dependency p
reservation ( 依赖保留 ) property
③ functional dependency (函数依赖) concept and it
applications
④Five Normalization forms ( 5 个范式)
1NF 2NF 3NF BCNF 4NF 5NF
Normalization
Given a collection of relations are they accurate rational and true representation of the enterprise data
Normalization
So we need a technology to validate our design
What is meaning of ldquoaccuraterdquo ldquorationalrdquo ldquoturerdquoMeet the business requirementsConsistencyExtensibilityLess redundancy PerformanceCorrectness
Motivation ( 动机 ) of Normalization
When the design of database is improper it leads to the problems of redundancies( 冗余 ) and update anomalies( 更新异常 )
Redundancy occurs when the same data value is stored more than once in a relation Redundancy wastes space and reduces performa
nceUpdate anomalies are problems that arise when tr
ying to insert delete or update tuples and are often caused by redundancy
Motivation of Normalization
Dream GameDepartment Staff Card
Department No CR74 Department Name Engineering
Staff No E2 Name MSmith
Title Database Designer Salary 66000
Remark
Department Manager Yes No⊙
Supervisor E5
Example of improper relation
bull DeptEmp Relation
ENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5
E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7
E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5
E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5
E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7
E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8
E8 J Jones 10-11-72 SA 50000 null D1 Management E8
Insertion Anomaly Example
Consider these two types of insertion anomalies1) Insert a new employee E9 working in department D2
ndash You have to redundantly insert the department name and manager when adding this record
2) Insert a department D4 that has no current employeesndash This insertion is not possible without creating a dummy employee id a
nd record because ENo is the primary key of the relation
ENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5
E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7
E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5
E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5
E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7
E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8
E8 J Jones 10-11-72 SA 50000 null D1 Management E8
Deletion Anomaly Example
Consider this deletion anomaly- Delete employees E3 and E6 from the database
- Deleting those two employees removes them from the database and we now have lost information about department D2
ENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5
E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7
E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5
E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5
E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7
E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8
E8 J Jones 10-11-72 SA 50000 null D1 Management E8
Modification Anomaly Example
Consider these modification anomalies
1) Change the name of department D3 to ldquoEngineeringrdquo You must update the department name in 3 different records
2) Change the manager of D1 to E4 You must update the superno field in 2 different records
ENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5
E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7
E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5
E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5
E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7
E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8
E8 J Jones 10-11-72 SA 50000 null D1 Management E8
Update Anomalies
bull There are three major types of update anomalies Insertion Anomalies - Insertion of a tuple into the r
elation either requires insertion of redundant information or cannot be performed without setting key values to NULL
Deletion Anomalies - Deletion of a tuple may lose information that is still required to be stored
Modification Anomalies - Changing an attribute of a tuple may require changing multiple attribute values in other tuples
Overview to Normalization
Normalization is a validation technique for producing relations with desirable properties
Normalization is a bottom-up design technique for producing relations
Normalization decomposes relations into smaller relations that contain less redundancy This decomposition requires reconstruction of the original relations from the smaller relations and no information is lost
It was developed by Codd in 1972 preceding ER modeling and extended by others over the years
Example Normalization to DeptEmp
bull DeptEmp Relation
ENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5
E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7
E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5
E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5
E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7
E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8
E8 J Jones 10-11-72 SA 50000 null D1 Management E8
Normalization to DeptEmp
Emp Relation Dept Relation
dno dname mgrno
D1 Management E8
D2 consulting E7
D3 Accounting E5
eno ename bdate title salary superNo dno
E1 J Doe 01-05-75 EE 30000 E2 null
E2 M Smith 06-04-66 SA 50000 E5 D3
E3 A Lee 07-05-66 ME 40000 E7 D2
E4 J Miller 09-01-50 PR 20000 E6 D3
E5 B Casey 12-25-71 SA 50000 E8 D3
E6 L Chu 11-30-65 EE 30000 E7 D2
E7 R Davis 09-08-77 ME 40000 E8 D1
E8 J Jones 10-11-72 SA 50000 null D1
Dept Emp= Emp dno Dept
Overview to Normalization
Input a relation schema and constraints on attributes
Output one or more relation schemas
Processing Idea examining the relationships between attributes and organize the data according to its functional dependencies( 函数依赖)
Significance
1) Validate our design correct our improper design
2) Help for understanding the nature and the meaning of the data in the enterprise for learning more about the data semantics
When to use normalization
Normalization can be used after mapping ER model into relation model or independently
Normalization may be especially useful for databases that have already been designed without using formal techniques
Theory basic of normalization
Functional Dependencies ( 函数依赖) Functional dependencies represent integrity constraints ( 完整
性约束) on the values of attributes in a relation and are used in normalization
A functional dependency (FD) describes the relationship between attributes in a relation We say a set of attributes X functionally determines an attribute Y if given the values of X we always know the only possible value of Y
Notation X rarr Y X functionally determines Y Y is functionally dependent on XExample
eno rarr ename eno pno rarr Duration
Notation for Functional Dependencies
A functional dependency has a left-hand side called the determinant which is a set of attributes and one attribute on the right-hand side
eno pno rarr duration
determinant determined attribute
Strictly speaking there is always only one attribute on the RHS but we can combine several functional dependencies into one
eno pno rarr durationeno pno rarr respeno pno rarr duration resp
Remember that this is really two functional dependencies
The Semantics ofFunctional Dependencies
Similar to keys Functional dependencies are a property of a relation schema (intension) and not a property of a particular instance of the schema (extension)
Functional dependencies are directional
eno rarr ename ename rarr eno
ExampleFDs in EmpDept relation
bull EmpDept RelationENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8E8 J Jones 10-11-72 SA 50000 null D1 Management E8
Eno rarr ename bdate title salary superno dno
dnorarrdname mgrno
titlerarr salary
Trivial Functional Dependencies ( 平凡函数依赖 )
A functional dependency is trivialif the attributes on its left-hand side are a superset of the attributes on its right-hand side
Examples eno rarr eno
eno ename rarr eno
eno pno duration rarr eno Duration
Trivial functional dependencies are not interesting because they do not tell us anything
We are only interested in nontrivial FDs
Full Functional Dependencies
A functional dependency A rarr B is a full functional dependency if removal of any attribute from A results in the dependency not existing any more We say that B is fully functionally dependent on A If remove an attribute from A and the functional dependency still exists we say that B is partially dependent on A
Example
eno rarr ename (full FD)
eno ename rarr salary title (partial FD - can remove ename)
eno pno rarr hours resp (full FD)
How to Identify FDs in a relation
Examining the semantics of the attributes of relation
Identify all non-trivial functional dependencies in the Universal Relation
Universal (eno pno resp duration ename bdate title salary superNo dno dname mgrNo pname budget)
Eno rarr ename bdate title salary superno dnodnorarrdname mgrnotitlerarr salary
Formal Definition ofFunctional Dependencies
Given relation R defined over U = A1 A2 An where X U Y U If for all pairs of tuples t1 and t2 in any legal instance of relation schema R
t1[X] = t2[X] implies t1[Y] = t2[Y]
then the functional dependency X rarr Y holds in R
Functional Dependencies and Keys
Functional dependencies can be used to determine the candidate keys of a relation
What is superkey candidate key primary key
For example if an attribute functionally determines all other attributes in the relation that attribute can be a key
eno rarr eno ename bdate title supereno dno eno is a candidate key for the Emp relation
Functional Dependencies and Keys
ENo rarr EName BDate Title SuperNo DNoDNo rarr DName MgreNotitle rarr salary
Question List the candidate key(s) of this relation
ENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5
E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7
E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5
E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5
E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7
E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8
E8 J Jones 10-11-72 SA 50000 null D1 Management E8
EmpDept relation
Applications of FDs
1) given a set of FDs F in relation R FD X rarr Y is true or not
2) given a set of FDs F in relation R a set of attributes X is a candidate key of R or not
3) given a set of FDs F in relation R which candidate keys are in R
4) given a set of FDs F in relation R some FDs in F is Excessive or not
Rules for reasoning about functional dependencies
Armstrongs Axioms are three inference rules for functional dependenciesLet A B C and D be subsets of the attributes of relation R
IR1 Reflexive Rule( 自反性 ) If AB then A rarr B
A set of attributes always trivially functionally determines any subset of itself
IR2 Augmentation Rule( 增加性 ) If A rarr B then AC rarr BC
Adding a set of attributes to both the LHS and RHS results in another valid functional dependency
IR3 Transitive Rule( 传递性 ) If A rarr B and B rarr C then A rarr C
Transitivity holds with FDs
Derived Rules from Armstrongs Axioms
There are three other well-known rules that can be derived from the basic three
IR4 Decomposition( 可分解性 ) If A rarr BC then A rarr B and A rarr C A FD with two attributes on RHS can be decomposed into two FDs each with on attribute on the RHS
IR5 Union( 合并性 ) If A rarr B and A rarr C then A rarr BC Reverse of decomposition rule
IR6 Composition( 组合性 ) If A rarr B and C rarr D then AC rarr BD More general than union - can combine two non-overlapping FDs into one
Characteristics of Armstrongs Axioms
The rules are sound( 健壮的 ) and complete( 完备的 )
1) Sound The rules do not infer( 推断 ) any incorrect FDs
2) Complete The rules can be used to infer all correct FDs (no FDs are missing)
The axioms (called IR1 IR2 and IR3) can be used to determine a set of all FDs F+ from an initial set of FDs F
F+ is referred to as the closure of F
Strategy of dealing with questions
The basic idea is that given any set of attributes X we can compute the set of all attributes X+ that can be functionally determined using F This is called the closure of X under F
For question 1 given a set of FDs F in R FD X rarr Y is true or not we know that X rarr Y holds if Y is in X+
For question 2 a set of attributes X is a candidate key of R or not X is a superkey of R if X+ is all attributes of R
For question 3 how to computing F+ we can compute X+ for all possible subsets X of R
Computing the Attribute Closure
The algorithm is as follows
Given a set of attributes X
Let X+ = X
Repeat Find a FD in F whose left side is a subset of X+ Add the right side of F to X+
Until (X+ does not change or X+ contains all attributes of R)
After the algorithm completes you have a set of attributes X+ that can be functionally determined by X This allows you to produce FDs of the form
X rarr A where A is in X+
Computing Attribute Closure Example
R (ABCDEG)
F = A rarr BC C rarr D D rarr G
Compute A+
A+ = A (initial step)
A+ = ABC (by using FD A rarr BC)
A+ = ABCD (by using FD C rarr D)
A+ = ABCDG (by using FD D rarr G)
Similarly we can compute C+ and EG+
C+ = CDG
EG+ = EG
Using Attribute Closure to Detect a FD
The attribute closure algorithm can be used to determine if a particular FD X rarr Y holds based on the set of given FDs F Compute X+ and see if Y is in X+
Example
R = (ABCDEG)
F = A rarr BC C rarr D D rarr G
Does the FD CD rarr G hold Compute CD+ = CDG Yes the FD CD rarr G holds
Does the FD BC rarr E hold Compute BC+ = BCDG No the FD BC rarr E does n
ot hold
Using Attribute Closure to Find all FDs
The attribute closure algorithm can be used to find all FDs of a given relation R if perform the attribute closure algorithm for all subsets X of the attributes R
This is an exponential algorithm in the number of attributes in R Although there are a few optimizations Do not compute the closure of the empty set or of the set of all attributes If we find X+ = all attributes do not compute the closure of any supersets of X Note that even though you do not have to compute the closure (because you know the answer) you still must use the known closure to infer new FDs
Using Attribute Closureto Find all FDs Example
Compute F+ from F where
R (ABCD)
F = A rarr BC C rarr D
Must compute all these attribute closures
A+ B+ C+ D+
AB+ AC+ AD+ BC+ BD+ CD+
ABC+ ABD+ ACD+ BCD+
Do not have to compute ABCD+
Start with the single sets
A+ = ABCD B+ = B C+ = CD D+ = D
New FD A rarr D
Using Attribute Closureto Find all FDs Example (cont)
Next the two element sets Do not compute AB+ AC+ AD+ because A+ contained all attributes of R New FDs AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C BC+ = BCD BD+ = BD CD+ = CD New FD BC rarr DThree element sets Do not compute ABC+ ABD+ ACD+ New FDs ABC rarr D ABD rarr C ACD rarr B BCD+ = BCD No new FDsAll FDs F+ = A rarr BC C rarr D A rarr D BC rarr D AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C ABC rarr D ABD rarrC ACD rarr B
FD Closure Questions
1) Compute all nontrivial FDs and list all keys of R
R = (ABCD)
F = AB rarr C C rarr D D rarr A
2) Compute all nontrivial FDs and list all candidate keys of R
R = (ABCD)
F = AB rarr C BC rarr D CD rarr A AD rarr B
Equivalent Sets of FDs
A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+ We can determine if F covers E by calculating X+ with respect to F for each FD X rarr Y in E and checking whether X+ includes Y
Two sets of FDs E and F are equivalent if E covers F and F covers E That is E+= F+
Minimal Set of FDs
A set of FDs F is minimal if it satisfies these conditions Every FD in F has a single attribute on its right-hand side We cannot replace any FD X rarr A in F with Y rarr A where Y
X and still have a set of dependencies that is equivalent to F
We cannot remove any FD and still have a set that is equivalent to F
A minimal sets of FDs is like a canonical( 规范的 ) form of FDs
Note that there may be many minimal covers for a given set of functional dependencies F
Normal Forms
A relation is in a particular normal form if it satisfies certain
normalization properties
There are several normal forms defined 1NF - First Normal Form 2NF - Second Normal Form 3NF - Third Normal Form BCNF - Boyce-Codd Normal Form 4NF - Fourth Normal Form 5NF - Fifth Normal Form
Each of these normal forms are stricter than the next
For example 3NF is better than 2NF because it removes more redundancyanomalies from the schema than 2NF
Normal Forms
All relations (non-normalized)
1NF (First Normal Form)
2NF (Second Normal Form)
3NF (Third Normal Form)
BCNF (Boyce-Codd NF)
4NF (Fourth NF)
5NF (Fifth NF)
First Normal Form (1NF)
A relation is in first normal form (1NF) if all its attribute values are atomic
That is a 1NF relation cannot have an attribute value that is a set of values (multi-valued attribute) a set of tuples (nested relation)
1NF is a standard assumption in relational DBMSs However object-oriented DBMSs and nested relational DBMSs relax t
his constraint
A relation that is not in 1NF is an unnormalized relation
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1P2
AnalystAnalyst
246
E3 A Lee P3P4
ConsultantEngineer
1048
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1 P2 Analyst Analyst 24 6
E3 A Lee P3P4 Consultant Engineer 10 48
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
A non-1NF Relation
One way to convert a non-1NF relation to a 1NF relation
1) Splitting Method - Divide the existing relation into two relations one with non-repeating attributes and the other with repeating attributes
Make a new relation consisting of the primary key of the original relation and the repeating attributes Determine a primary key for this new relation
Remove the repeating attributes from the original relation
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1P2
AnalystAnalyst
246
E3 A Lee P3P4
ConsultantEngineer
1048
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
ENo EName
E1 J Doe
E2 M Smith
E3 A Lee
E4 J Miller
E5 B Casey
E6 L Chu
E7 R Davis
ENo PNo Resp DurationE1 P1 Manager 12E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36
Second Normal Form (2NF)
A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key
If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation
Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key
Second Normal Form (2NF) Example
fd1 and fd4 are partial functional dependencies Normalize to
Emp (eno ename title bdate salary supereno dno)
WorksOn (eno pno resp hours)
Proj (pno pname budget)
ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration
EmpProj relation
fd1 fd2
fd3fd4
Second Normal Form (2NF) Example (2)
ENo EName
BDate Title Salary SuperNo
DNo
Emp relation
fd1
fd2
fd3fd4
PNo PName Budget
Proj relationENo PNo Resp Duration
WorkOn relation
Third Normal Form (3NF)
Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th
an 2 FDs
A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key
Converting a relation to 3NF from 2NF involves the removal of transitive dependencies
Third Normal Form (3NF) Example
fd2 results in a transitive dependency eno rarr salary Remove it
ENo EName BDate Title Salary SuperNo DNo
Emp relation
fd1
fd2
fd2
Title Salary
Pay relation
ENo EName BDate Title SuperNo DNo
Emp relation
fd1
General Definitions of 2NF and 3NF
We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)
General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key
General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key
Note that a prime attribute is an attribute that is in any key (candidate or primary)
General Definition of 3NF Example
The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute
However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key
Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key
ENo EName BDate Title Salary SuperNo DNo SSN
Emp relation
fd1
fd2
Normalization Question
Consider the universal relation
R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ
List the keys for R
Decompose R into 2NF and then 3NF relations
Boyce-Codd Normal Form (BCNF)
A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key
To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key
The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
ClientInterview relation
Relation is in 3NF but not BCNF
fd4
fd3
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
Note that we lose the FD fd2
fd4
fd3
ClientNo InterviewDate
InterviwTime
StaffNo StaffNo InterviwDate
RoomNo
Interview relation StaffRoom relation
BCNF versus 3NF
We can decompose to BCNF but sometimes we do not want to if we lose a FD
The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency
Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)
In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)
Conversion to BCNF
There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y
Note that if any FD in F+ violates BCNF then a FD in F violates BCNF
2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey
3) Replace R by relations with schemas R1 = X+
R2 = (R ndash X+) U X
4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib
utes are all in R1 or R2
Normalization Question
Given this database schema normalize into BCNF
fd1
PID Country Lot Area Price TaxRate
fd2
fd4fd3
Multi-Valued Dependencies
A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema
A MVD occurs when two independent 1N relationships are in the relational schema
Multi-Valued Dependencies Example
Eno ename Title Salary Pno dno
E1 J Doe EE 30000 P1P2
E2 M Smith SA 50000 P1P4 D1D2D3
E3 A Lee ME 40000 P1P3 D2D3
E4 J Miller PR 20000 P3 D3
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1
E1 J Doe EE 30000 P2
E2 M Smith SA 50000 P1 D1
E2 M Smith SA 50000 P1 D2
E2 M Smith SA 50000 P1 D3
E2 M Smith SA 50000 P4 D1
E2 M Smith SA 50000 P4 D2
E2 M Smith SA 50000 P4 D3
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P3 D3
E3 A Lee ME 40000 P3 D3
E4 J Miller PR 20000 P3 D3
Multi-Valued Dependencies (MVDs)
A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C
A trivial MVD A mdashgtgt B occurs when either B is a subset of A or
A cup B = R
Fourth Normal Form (4NF)
Fourth normal form (4NF) is based on the idea of multi-valued dependencies
A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies
Fourth Normal Form (4NF) Example
eno mdashgtgtpno
eno mdashgtgtdno
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3
eno ename title salary
E1 J Doe EE 30000
E2 MSmith SA 50000
E3 A Lee ME 40000
E4 J Miller PR 20000
eno pno
E1 P1
E1 P2
E2 P1
E2 P4
E3 P1
E3 P3
E4 P3
eno dno
E2 D1
E2 D2
E2 D3
E3 D2
E3 D3
E4 D3
Lossless-join Dependency
The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation
A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined
Fifth Normal Form (5NF)
bull Fifth normal form (5NF) is based on join dependencies
A relation is in fifth normal form (5NF) if the relation has no join dependency
bull 5NF is also called project-join normal form (PJNF)
A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R
The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have
ΠR1(r) ΠR2(r) hellip ΠRn(r) = r
Fifth Normal Form (5NF) Example
Consider a relation Supply (sname partName projName)
1048715Add the additional constraint that
If project j requires part p
and supplier s supplies part p
and supplier s supplies at least one item to project j
Then supplier s also supplies part p to project j
Original Supply Relation Supply Relation with new constraintsname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
Fifth Normal Form (5NF) Example
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
sname partName
Smith Bolt
Smith Nut
Adam Bolt
Walton Nut
Adam Nail
sname projName
Smith X
Smith Y
Adam Y
Walton Z
Adam X
partName projName
Bolt X
Nut Y
Bolt Y
Nut Z
Nail X
Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples
Desirable Properties in normalization
Relational schemas that are well-designed have several
important properties
1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or
relationship
2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins
3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations
Normal Forms in Practice
Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used
For example query execution time may increase because of normalization as more joins become necessary to answer queries
Normal Forms in Practice Example
For example street and city uniquely determine a zipcode
In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed
When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly
fd1
ENo City Street ZipCode
fd2
ENo ZipCode
ZipCode City Street
Normal Forms in Practice Summary
In summary normalization is typically used in two ways
To improve on relational schemas after ER design
To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies
There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance
This is the classic time versus space tradeoff
Conclusion
Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies
Normal forms indicate when relations satisfy certain properties
1NF - All attributes are atomic
2NF - All attributes are fully functionally dependent on a key
3NF - There are no transitive dependencies in the relation
4NF - There are no multi-valued dependencies in the relation
5NF - The relation has no join dependency
In practice normalization is used to improve schemas produced after ER design and existing relational schemas
It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time
Objectives
Explain process of normalization and why it is beneficial
Describe the concept of the Universal Relation
What are the motivations for normalization
What are the 3 types of update anomalies Be able to give examples
Describe the lossless-join property and dependency preservation property
Define and explain the concept of a functional dependency
What is a trivial FD
Describe the relationship between FDs and keys
What are Armstrongs Axioms When are they used
What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs
Objectives (2)
Define minimal and covers for a set of FDs
Define normal forms be able to list the normal forms and draw a diagram showing their relationship
Define 1NF Know how to convert non-1NF relation to 1NF
Define 2NF Convert 1NF relation to 2NF
Define 3NF Convert 2NF relation to 3NF
Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example
Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm
Describe multi-valued dependencies When do they occur
What is a trivial MVD
Define 4NF Convert 3NFBCNF relation to 4NF
Objectives (3)
Describe lossless-join property and how it relates to 5NF
Define 5NF
Explain normalization in terms of the time versus space tradeoff
Explain the relationship between normalization and ER modeling
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Slide 45
- Slide 46
- Slide 47
- Slide 48
- Slide 49
- Slide 50
- Slide 51
- Slide 52
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- Slide 60
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
-
Normalization
Given a collection of relations are they accurate rational and true representation of the enterprise data
Normalization
So we need a technology to validate our design
What is meaning of ldquoaccuraterdquo ldquorationalrdquo ldquoturerdquoMeet the business requirementsConsistencyExtensibilityLess redundancy PerformanceCorrectness
Motivation ( 动机 ) of Normalization
When the design of database is improper it leads to the problems of redundancies( 冗余 ) and update anomalies( 更新异常 )
Redundancy occurs when the same data value is stored more than once in a relation Redundancy wastes space and reduces performa
nceUpdate anomalies are problems that arise when tr
ying to insert delete or update tuples and are often caused by redundancy
Motivation of Normalization
Dream GameDepartment Staff Card
Department No CR74 Department Name Engineering
Staff No E2 Name MSmith
Title Database Designer Salary 66000
Remark
Department Manager Yes No⊙
Supervisor E5
Example of improper relation
bull DeptEmp Relation
ENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5
E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7
E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5
E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5
E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7
E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8
E8 J Jones 10-11-72 SA 50000 null D1 Management E8
Insertion Anomaly Example
Consider these two types of insertion anomalies1) Insert a new employee E9 working in department D2
ndash You have to redundantly insert the department name and manager when adding this record
2) Insert a department D4 that has no current employeesndash This insertion is not possible without creating a dummy employee id a
nd record because ENo is the primary key of the relation
ENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5
E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7
E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5
E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5
E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7
E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8
E8 J Jones 10-11-72 SA 50000 null D1 Management E8
Deletion Anomaly Example
Consider this deletion anomaly- Delete employees E3 and E6 from the database
- Deleting those two employees removes them from the database and we now have lost information about department D2
ENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5
E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7
E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5
E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5
E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7
E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8
E8 J Jones 10-11-72 SA 50000 null D1 Management E8
Modification Anomaly Example
Consider these modification anomalies
1) Change the name of department D3 to ldquoEngineeringrdquo You must update the department name in 3 different records
2) Change the manager of D1 to E4 You must update the superno field in 2 different records
ENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5
E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7
E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5
E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5
E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7
E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8
E8 J Jones 10-11-72 SA 50000 null D1 Management E8
Update Anomalies
bull There are three major types of update anomalies Insertion Anomalies - Insertion of a tuple into the r
elation either requires insertion of redundant information or cannot be performed without setting key values to NULL
Deletion Anomalies - Deletion of a tuple may lose information that is still required to be stored
Modification Anomalies - Changing an attribute of a tuple may require changing multiple attribute values in other tuples
Overview to Normalization
Normalization is a validation technique for producing relations with desirable properties
Normalization is a bottom-up design technique for producing relations
Normalization decomposes relations into smaller relations that contain less redundancy This decomposition requires reconstruction of the original relations from the smaller relations and no information is lost
It was developed by Codd in 1972 preceding ER modeling and extended by others over the years
Example Normalization to DeptEmp
bull DeptEmp Relation
ENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5
E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7
E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5
E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5
E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7
E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8
E8 J Jones 10-11-72 SA 50000 null D1 Management E8
Normalization to DeptEmp
Emp Relation Dept Relation
dno dname mgrno
D1 Management E8
D2 consulting E7
D3 Accounting E5
eno ename bdate title salary superNo dno
E1 J Doe 01-05-75 EE 30000 E2 null
E2 M Smith 06-04-66 SA 50000 E5 D3
E3 A Lee 07-05-66 ME 40000 E7 D2
E4 J Miller 09-01-50 PR 20000 E6 D3
E5 B Casey 12-25-71 SA 50000 E8 D3
E6 L Chu 11-30-65 EE 30000 E7 D2
E7 R Davis 09-08-77 ME 40000 E8 D1
E8 J Jones 10-11-72 SA 50000 null D1
Dept Emp= Emp dno Dept
Overview to Normalization
Input a relation schema and constraints on attributes
Output one or more relation schemas
Processing Idea examining the relationships between attributes and organize the data according to its functional dependencies( 函数依赖)
Significance
1) Validate our design correct our improper design
2) Help for understanding the nature and the meaning of the data in the enterprise for learning more about the data semantics
When to use normalization
Normalization can be used after mapping ER model into relation model or independently
Normalization may be especially useful for databases that have already been designed without using formal techniques
Theory basic of normalization
Functional Dependencies ( 函数依赖) Functional dependencies represent integrity constraints ( 完整
性约束) on the values of attributes in a relation and are used in normalization
A functional dependency (FD) describes the relationship between attributes in a relation We say a set of attributes X functionally determines an attribute Y if given the values of X we always know the only possible value of Y
Notation X rarr Y X functionally determines Y Y is functionally dependent on XExample
eno rarr ename eno pno rarr Duration
Notation for Functional Dependencies
A functional dependency has a left-hand side called the determinant which is a set of attributes and one attribute on the right-hand side
eno pno rarr duration
determinant determined attribute
Strictly speaking there is always only one attribute on the RHS but we can combine several functional dependencies into one
eno pno rarr durationeno pno rarr respeno pno rarr duration resp
Remember that this is really two functional dependencies
The Semantics ofFunctional Dependencies
Similar to keys Functional dependencies are a property of a relation schema (intension) and not a property of a particular instance of the schema (extension)
Functional dependencies are directional
eno rarr ename ename rarr eno
ExampleFDs in EmpDept relation
bull EmpDept RelationENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8E8 J Jones 10-11-72 SA 50000 null D1 Management E8
Eno rarr ename bdate title salary superno dno
dnorarrdname mgrno
titlerarr salary
Trivial Functional Dependencies ( 平凡函数依赖 )
A functional dependency is trivialif the attributes on its left-hand side are a superset of the attributes on its right-hand side
Examples eno rarr eno
eno ename rarr eno
eno pno duration rarr eno Duration
Trivial functional dependencies are not interesting because they do not tell us anything
We are only interested in nontrivial FDs
Full Functional Dependencies
A functional dependency A rarr B is a full functional dependency if removal of any attribute from A results in the dependency not existing any more We say that B is fully functionally dependent on A If remove an attribute from A and the functional dependency still exists we say that B is partially dependent on A
Example
eno rarr ename (full FD)
eno ename rarr salary title (partial FD - can remove ename)
eno pno rarr hours resp (full FD)
How to Identify FDs in a relation
Examining the semantics of the attributes of relation
Identify all non-trivial functional dependencies in the Universal Relation
Universal (eno pno resp duration ename bdate title salary superNo dno dname mgrNo pname budget)
Eno rarr ename bdate title salary superno dnodnorarrdname mgrnotitlerarr salary
Formal Definition ofFunctional Dependencies
Given relation R defined over U = A1 A2 An where X U Y U If for all pairs of tuples t1 and t2 in any legal instance of relation schema R
t1[X] = t2[X] implies t1[Y] = t2[Y]
then the functional dependency X rarr Y holds in R
Functional Dependencies and Keys
Functional dependencies can be used to determine the candidate keys of a relation
What is superkey candidate key primary key
For example if an attribute functionally determines all other attributes in the relation that attribute can be a key
eno rarr eno ename bdate title supereno dno eno is a candidate key for the Emp relation
Functional Dependencies and Keys
ENo rarr EName BDate Title SuperNo DNoDNo rarr DName MgreNotitle rarr salary
Question List the candidate key(s) of this relation
ENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5
E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7
E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5
E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5
E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7
E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8
E8 J Jones 10-11-72 SA 50000 null D1 Management E8
EmpDept relation
Applications of FDs
1) given a set of FDs F in relation R FD X rarr Y is true or not
2) given a set of FDs F in relation R a set of attributes X is a candidate key of R or not
3) given a set of FDs F in relation R which candidate keys are in R
4) given a set of FDs F in relation R some FDs in F is Excessive or not
Rules for reasoning about functional dependencies
Armstrongs Axioms are three inference rules for functional dependenciesLet A B C and D be subsets of the attributes of relation R
IR1 Reflexive Rule( 自反性 ) If AB then A rarr B
A set of attributes always trivially functionally determines any subset of itself
IR2 Augmentation Rule( 增加性 ) If A rarr B then AC rarr BC
Adding a set of attributes to both the LHS and RHS results in another valid functional dependency
IR3 Transitive Rule( 传递性 ) If A rarr B and B rarr C then A rarr C
Transitivity holds with FDs
Derived Rules from Armstrongs Axioms
There are three other well-known rules that can be derived from the basic three
IR4 Decomposition( 可分解性 ) If A rarr BC then A rarr B and A rarr C A FD with two attributes on RHS can be decomposed into two FDs each with on attribute on the RHS
IR5 Union( 合并性 ) If A rarr B and A rarr C then A rarr BC Reverse of decomposition rule
IR6 Composition( 组合性 ) If A rarr B and C rarr D then AC rarr BD More general than union - can combine two non-overlapping FDs into one
Characteristics of Armstrongs Axioms
The rules are sound( 健壮的 ) and complete( 完备的 )
1) Sound The rules do not infer( 推断 ) any incorrect FDs
2) Complete The rules can be used to infer all correct FDs (no FDs are missing)
The axioms (called IR1 IR2 and IR3) can be used to determine a set of all FDs F+ from an initial set of FDs F
F+ is referred to as the closure of F
Strategy of dealing with questions
The basic idea is that given any set of attributes X we can compute the set of all attributes X+ that can be functionally determined using F This is called the closure of X under F
For question 1 given a set of FDs F in R FD X rarr Y is true or not we know that X rarr Y holds if Y is in X+
For question 2 a set of attributes X is a candidate key of R or not X is a superkey of R if X+ is all attributes of R
For question 3 how to computing F+ we can compute X+ for all possible subsets X of R
Computing the Attribute Closure
The algorithm is as follows
Given a set of attributes X
Let X+ = X
Repeat Find a FD in F whose left side is a subset of X+ Add the right side of F to X+
Until (X+ does not change or X+ contains all attributes of R)
After the algorithm completes you have a set of attributes X+ that can be functionally determined by X This allows you to produce FDs of the form
X rarr A where A is in X+
Computing Attribute Closure Example
R (ABCDEG)
F = A rarr BC C rarr D D rarr G
Compute A+
A+ = A (initial step)
A+ = ABC (by using FD A rarr BC)
A+ = ABCD (by using FD C rarr D)
A+ = ABCDG (by using FD D rarr G)
Similarly we can compute C+ and EG+
C+ = CDG
EG+ = EG
Using Attribute Closure to Detect a FD
The attribute closure algorithm can be used to determine if a particular FD X rarr Y holds based on the set of given FDs F Compute X+ and see if Y is in X+
Example
R = (ABCDEG)
F = A rarr BC C rarr D D rarr G
Does the FD CD rarr G hold Compute CD+ = CDG Yes the FD CD rarr G holds
Does the FD BC rarr E hold Compute BC+ = BCDG No the FD BC rarr E does n
ot hold
Using Attribute Closure to Find all FDs
The attribute closure algorithm can be used to find all FDs of a given relation R if perform the attribute closure algorithm for all subsets X of the attributes R
This is an exponential algorithm in the number of attributes in R Although there are a few optimizations Do not compute the closure of the empty set or of the set of all attributes If we find X+ = all attributes do not compute the closure of any supersets of X Note that even though you do not have to compute the closure (because you know the answer) you still must use the known closure to infer new FDs
Using Attribute Closureto Find all FDs Example
Compute F+ from F where
R (ABCD)
F = A rarr BC C rarr D
Must compute all these attribute closures
A+ B+ C+ D+
AB+ AC+ AD+ BC+ BD+ CD+
ABC+ ABD+ ACD+ BCD+
Do not have to compute ABCD+
Start with the single sets
A+ = ABCD B+ = B C+ = CD D+ = D
New FD A rarr D
Using Attribute Closureto Find all FDs Example (cont)
Next the two element sets Do not compute AB+ AC+ AD+ because A+ contained all attributes of R New FDs AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C BC+ = BCD BD+ = BD CD+ = CD New FD BC rarr DThree element sets Do not compute ABC+ ABD+ ACD+ New FDs ABC rarr D ABD rarr C ACD rarr B BCD+ = BCD No new FDsAll FDs F+ = A rarr BC C rarr D A rarr D BC rarr D AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C ABC rarr D ABD rarrC ACD rarr B
FD Closure Questions
1) Compute all nontrivial FDs and list all keys of R
R = (ABCD)
F = AB rarr C C rarr D D rarr A
2) Compute all nontrivial FDs and list all candidate keys of R
R = (ABCD)
F = AB rarr C BC rarr D CD rarr A AD rarr B
Equivalent Sets of FDs
A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+ We can determine if F covers E by calculating X+ with respect to F for each FD X rarr Y in E and checking whether X+ includes Y
Two sets of FDs E and F are equivalent if E covers F and F covers E That is E+= F+
Minimal Set of FDs
A set of FDs F is minimal if it satisfies these conditions Every FD in F has a single attribute on its right-hand side We cannot replace any FD X rarr A in F with Y rarr A where Y
X and still have a set of dependencies that is equivalent to F
We cannot remove any FD and still have a set that is equivalent to F
A minimal sets of FDs is like a canonical( 规范的 ) form of FDs
Note that there may be many minimal covers for a given set of functional dependencies F
Normal Forms
A relation is in a particular normal form if it satisfies certain
normalization properties
There are several normal forms defined 1NF - First Normal Form 2NF - Second Normal Form 3NF - Third Normal Form BCNF - Boyce-Codd Normal Form 4NF - Fourth Normal Form 5NF - Fifth Normal Form
Each of these normal forms are stricter than the next
For example 3NF is better than 2NF because it removes more redundancyanomalies from the schema than 2NF
Normal Forms
All relations (non-normalized)
1NF (First Normal Form)
2NF (Second Normal Form)
3NF (Third Normal Form)
BCNF (Boyce-Codd NF)
4NF (Fourth NF)
5NF (Fifth NF)
First Normal Form (1NF)
A relation is in first normal form (1NF) if all its attribute values are atomic
That is a 1NF relation cannot have an attribute value that is a set of values (multi-valued attribute) a set of tuples (nested relation)
1NF is a standard assumption in relational DBMSs However object-oriented DBMSs and nested relational DBMSs relax t
his constraint
A relation that is not in 1NF is an unnormalized relation
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1P2
AnalystAnalyst
246
E3 A Lee P3P4
ConsultantEngineer
1048
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1 P2 Analyst Analyst 24 6
E3 A Lee P3P4 Consultant Engineer 10 48
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
A non-1NF Relation
One way to convert a non-1NF relation to a 1NF relation
1) Splitting Method - Divide the existing relation into two relations one with non-repeating attributes and the other with repeating attributes
Make a new relation consisting of the primary key of the original relation and the repeating attributes Determine a primary key for this new relation
Remove the repeating attributes from the original relation
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1P2
AnalystAnalyst
246
E3 A Lee P3P4
ConsultantEngineer
1048
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
ENo EName
E1 J Doe
E2 M Smith
E3 A Lee
E4 J Miller
E5 B Casey
E6 L Chu
E7 R Davis
ENo PNo Resp DurationE1 P1 Manager 12E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36
Second Normal Form (2NF)
A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key
If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation
Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key
Second Normal Form (2NF) Example
fd1 and fd4 are partial functional dependencies Normalize to
Emp (eno ename title bdate salary supereno dno)
WorksOn (eno pno resp hours)
Proj (pno pname budget)
ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration
EmpProj relation
fd1 fd2
fd3fd4
Second Normal Form (2NF) Example (2)
ENo EName
BDate Title Salary SuperNo
DNo
Emp relation
fd1
fd2
fd3fd4
PNo PName Budget
Proj relationENo PNo Resp Duration
WorkOn relation
Third Normal Form (3NF)
Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th
an 2 FDs
A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key
Converting a relation to 3NF from 2NF involves the removal of transitive dependencies
Third Normal Form (3NF) Example
fd2 results in a transitive dependency eno rarr salary Remove it
ENo EName BDate Title Salary SuperNo DNo
Emp relation
fd1
fd2
fd2
Title Salary
Pay relation
ENo EName BDate Title SuperNo DNo
Emp relation
fd1
General Definitions of 2NF and 3NF
We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)
General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key
General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key
Note that a prime attribute is an attribute that is in any key (candidate or primary)
General Definition of 3NF Example
The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute
However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key
Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key
ENo EName BDate Title Salary SuperNo DNo SSN
Emp relation
fd1
fd2
Normalization Question
Consider the universal relation
R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ
List the keys for R
Decompose R into 2NF and then 3NF relations
Boyce-Codd Normal Form (BCNF)
A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key
To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key
The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
ClientInterview relation
Relation is in 3NF but not BCNF
fd4
fd3
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
Note that we lose the FD fd2
fd4
fd3
ClientNo InterviewDate
InterviwTime
StaffNo StaffNo InterviwDate
RoomNo
Interview relation StaffRoom relation
BCNF versus 3NF
We can decompose to BCNF but sometimes we do not want to if we lose a FD
The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency
Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)
In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)
Conversion to BCNF
There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y
Note that if any FD in F+ violates BCNF then a FD in F violates BCNF
2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey
3) Replace R by relations with schemas R1 = X+
R2 = (R ndash X+) U X
4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib
utes are all in R1 or R2
Normalization Question
Given this database schema normalize into BCNF
fd1
PID Country Lot Area Price TaxRate
fd2
fd4fd3
Multi-Valued Dependencies
A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema
A MVD occurs when two independent 1N relationships are in the relational schema
Multi-Valued Dependencies Example
Eno ename Title Salary Pno dno
E1 J Doe EE 30000 P1P2
E2 M Smith SA 50000 P1P4 D1D2D3
E3 A Lee ME 40000 P1P3 D2D3
E4 J Miller PR 20000 P3 D3
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1
E1 J Doe EE 30000 P2
E2 M Smith SA 50000 P1 D1
E2 M Smith SA 50000 P1 D2
E2 M Smith SA 50000 P1 D3
E2 M Smith SA 50000 P4 D1
E2 M Smith SA 50000 P4 D2
E2 M Smith SA 50000 P4 D3
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P3 D3
E3 A Lee ME 40000 P3 D3
E4 J Miller PR 20000 P3 D3
Multi-Valued Dependencies (MVDs)
A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C
A trivial MVD A mdashgtgt B occurs when either B is a subset of A or
A cup B = R
Fourth Normal Form (4NF)
Fourth normal form (4NF) is based on the idea of multi-valued dependencies
A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies
Fourth Normal Form (4NF) Example
eno mdashgtgtpno
eno mdashgtgtdno
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3
eno ename title salary
E1 J Doe EE 30000
E2 MSmith SA 50000
E3 A Lee ME 40000
E4 J Miller PR 20000
eno pno
E1 P1
E1 P2
E2 P1
E2 P4
E3 P1
E3 P3
E4 P3
eno dno
E2 D1
E2 D2
E2 D3
E3 D2
E3 D3
E4 D3
Lossless-join Dependency
The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation
A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined
Fifth Normal Form (5NF)
bull Fifth normal form (5NF) is based on join dependencies
A relation is in fifth normal form (5NF) if the relation has no join dependency
bull 5NF is also called project-join normal form (PJNF)
A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R
The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have
ΠR1(r) ΠR2(r) hellip ΠRn(r) = r
Fifth Normal Form (5NF) Example
Consider a relation Supply (sname partName projName)
1048715Add the additional constraint that
If project j requires part p
and supplier s supplies part p
and supplier s supplies at least one item to project j
Then supplier s also supplies part p to project j
Original Supply Relation Supply Relation with new constraintsname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
Fifth Normal Form (5NF) Example
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
sname partName
Smith Bolt
Smith Nut
Adam Bolt
Walton Nut
Adam Nail
sname projName
Smith X
Smith Y
Adam Y
Walton Z
Adam X
partName projName
Bolt X
Nut Y
Bolt Y
Nut Z
Nail X
Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples
Desirable Properties in normalization
Relational schemas that are well-designed have several
important properties
1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or
relationship
2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins
3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations
Normal Forms in Practice
Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used
For example query execution time may increase because of normalization as more joins become necessary to answer queries
Normal Forms in Practice Example
For example street and city uniquely determine a zipcode
In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed
When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly
fd1
ENo City Street ZipCode
fd2
ENo ZipCode
ZipCode City Street
Normal Forms in Practice Summary
In summary normalization is typically used in two ways
To improve on relational schemas after ER design
To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies
There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance
This is the classic time versus space tradeoff
Conclusion
Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies
Normal forms indicate when relations satisfy certain properties
1NF - All attributes are atomic
2NF - All attributes are fully functionally dependent on a key
3NF - There are no transitive dependencies in the relation
4NF - There are no multi-valued dependencies in the relation
5NF - The relation has no join dependency
In practice normalization is used to improve schemas produced after ER design and existing relational schemas
It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time
Objectives
Explain process of normalization and why it is beneficial
Describe the concept of the Universal Relation
What are the motivations for normalization
What are the 3 types of update anomalies Be able to give examples
Describe the lossless-join property and dependency preservation property
Define and explain the concept of a functional dependency
What is a trivial FD
Describe the relationship between FDs and keys
What are Armstrongs Axioms When are they used
What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs
Objectives (2)
Define minimal and covers for a set of FDs
Define normal forms be able to list the normal forms and draw a diagram showing their relationship
Define 1NF Know how to convert non-1NF relation to 1NF
Define 2NF Convert 1NF relation to 2NF
Define 3NF Convert 2NF relation to 3NF
Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example
Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm
Describe multi-valued dependencies When do they occur
What is a trivial MVD
Define 4NF Convert 3NFBCNF relation to 4NF
Objectives (3)
Describe lossless-join property and how it relates to 5NF
Define 5NF
Explain normalization in terms of the time versus space tradeoff
Explain the relationship between normalization and ER modeling
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Slide 45
- Slide 46
- Slide 47
- Slide 48
- Slide 49
- Slide 50
- Slide 51
- Slide 52
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- Slide 60
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
-
Normalization
So we need a technology to validate our design
What is meaning of ldquoaccuraterdquo ldquorationalrdquo ldquoturerdquoMeet the business requirementsConsistencyExtensibilityLess redundancy PerformanceCorrectness
Motivation ( 动机 ) of Normalization
When the design of database is improper it leads to the problems of redundancies( 冗余 ) and update anomalies( 更新异常 )
Redundancy occurs when the same data value is stored more than once in a relation Redundancy wastes space and reduces performa
nceUpdate anomalies are problems that arise when tr
ying to insert delete or update tuples and are often caused by redundancy
Motivation of Normalization
Dream GameDepartment Staff Card
Department No CR74 Department Name Engineering
Staff No E2 Name MSmith
Title Database Designer Salary 66000
Remark
Department Manager Yes No⊙
Supervisor E5
Example of improper relation
bull DeptEmp Relation
ENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5
E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7
E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5
E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5
E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7
E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8
E8 J Jones 10-11-72 SA 50000 null D1 Management E8
Insertion Anomaly Example
Consider these two types of insertion anomalies1) Insert a new employee E9 working in department D2
ndash You have to redundantly insert the department name and manager when adding this record
2) Insert a department D4 that has no current employeesndash This insertion is not possible without creating a dummy employee id a
nd record because ENo is the primary key of the relation
ENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5
E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7
E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5
E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5
E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7
E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8
E8 J Jones 10-11-72 SA 50000 null D1 Management E8
Deletion Anomaly Example
Consider this deletion anomaly- Delete employees E3 and E6 from the database
- Deleting those two employees removes them from the database and we now have lost information about department D2
ENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5
E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7
E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5
E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5
E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7
E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8
E8 J Jones 10-11-72 SA 50000 null D1 Management E8
Modification Anomaly Example
Consider these modification anomalies
1) Change the name of department D3 to ldquoEngineeringrdquo You must update the department name in 3 different records
2) Change the manager of D1 to E4 You must update the superno field in 2 different records
ENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5
E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7
E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5
E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5
E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7
E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8
E8 J Jones 10-11-72 SA 50000 null D1 Management E8
Update Anomalies
bull There are three major types of update anomalies Insertion Anomalies - Insertion of a tuple into the r
elation either requires insertion of redundant information or cannot be performed without setting key values to NULL
Deletion Anomalies - Deletion of a tuple may lose information that is still required to be stored
Modification Anomalies - Changing an attribute of a tuple may require changing multiple attribute values in other tuples
Overview to Normalization
Normalization is a validation technique for producing relations with desirable properties
Normalization is a bottom-up design technique for producing relations
Normalization decomposes relations into smaller relations that contain less redundancy This decomposition requires reconstruction of the original relations from the smaller relations and no information is lost
It was developed by Codd in 1972 preceding ER modeling and extended by others over the years
Example Normalization to DeptEmp
bull DeptEmp Relation
ENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5
E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7
E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5
E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5
E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7
E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8
E8 J Jones 10-11-72 SA 50000 null D1 Management E8
Normalization to DeptEmp
Emp Relation Dept Relation
dno dname mgrno
D1 Management E8
D2 consulting E7
D3 Accounting E5
eno ename bdate title salary superNo dno
E1 J Doe 01-05-75 EE 30000 E2 null
E2 M Smith 06-04-66 SA 50000 E5 D3
E3 A Lee 07-05-66 ME 40000 E7 D2
E4 J Miller 09-01-50 PR 20000 E6 D3
E5 B Casey 12-25-71 SA 50000 E8 D3
E6 L Chu 11-30-65 EE 30000 E7 D2
E7 R Davis 09-08-77 ME 40000 E8 D1
E8 J Jones 10-11-72 SA 50000 null D1
Dept Emp= Emp dno Dept
Overview to Normalization
Input a relation schema and constraints on attributes
Output one or more relation schemas
Processing Idea examining the relationships between attributes and organize the data according to its functional dependencies( 函数依赖)
Significance
1) Validate our design correct our improper design
2) Help for understanding the nature and the meaning of the data in the enterprise for learning more about the data semantics
When to use normalization
Normalization can be used after mapping ER model into relation model or independently
Normalization may be especially useful for databases that have already been designed without using formal techniques
Theory basic of normalization
Functional Dependencies ( 函数依赖) Functional dependencies represent integrity constraints ( 完整
性约束) on the values of attributes in a relation and are used in normalization
A functional dependency (FD) describes the relationship between attributes in a relation We say a set of attributes X functionally determines an attribute Y if given the values of X we always know the only possible value of Y
Notation X rarr Y X functionally determines Y Y is functionally dependent on XExample
eno rarr ename eno pno rarr Duration
Notation for Functional Dependencies
A functional dependency has a left-hand side called the determinant which is a set of attributes and one attribute on the right-hand side
eno pno rarr duration
determinant determined attribute
Strictly speaking there is always only one attribute on the RHS but we can combine several functional dependencies into one
eno pno rarr durationeno pno rarr respeno pno rarr duration resp
Remember that this is really two functional dependencies
The Semantics ofFunctional Dependencies
Similar to keys Functional dependencies are a property of a relation schema (intension) and not a property of a particular instance of the schema (extension)
Functional dependencies are directional
eno rarr ename ename rarr eno
ExampleFDs in EmpDept relation
bull EmpDept RelationENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8E8 J Jones 10-11-72 SA 50000 null D1 Management E8
Eno rarr ename bdate title salary superno dno
dnorarrdname mgrno
titlerarr salary
Trivial Functional Dependencies ( 平凡函数依赖 )
A functional dependency is trivialif the attributes on its left-hand side are a superset of the attributes on its right-hand side
Examples eno rarr eno
eno ename rarr eno
eno pno duration rarr eno Duration
Trivial functional dependencies are not interesting because they do not tell us anything
We are only interested in nontrivial FDs
Full Functional Dependencies
A functional dependency A rarr B is a full functional dependency if removal of any attribute from A results in the dependency not existing any more We say that B is fully functionally dependent on A If remove an attribute from A and the functional dependency still exists we say that B is partially dependent on A
Example
eno rarr ename (full FD)
eno ename rarr salary title (partial FD - can remove ename)
eno pno rarr hours resp (full FD)
How to Identify FDs in a relation
Examining the semantics of the attributes of relation
Identify all non-trivial functional dependencies in the Universal Relation
Universal (eno pno resp duration ename bdate title salary superNo dno dname mgrNo pname budget)
Eno rarr ename bdate title salary superno dnodnorarrdname mgrnotitlerarr salary
Formal Definition ofFunctional Dependencies
Given relation R defined over U = A1 A2 An where X U Y U If for all pairs of tuples t1 and t2 in any legal instance of relation schema R
t1[X] = t2[X] implies t1[Y] = t2[Y]
then the functional dependency X rarr Y holds in R
Functional Dependencies and Keys
Functional dependencies can be used to determine the candidate keys of a relation
What is superkey candidate key primary key
For example if an attribute functionally determines all other attributes in the relation that attribute can be a key
eno rarr eno ename bdate title supereno dno eno is a candidate key for the Emp relation
Functional Dependencies and Keys
ENo rarr EName BDate Title SuperNo DNoDNo rarr DName MgreNotitle rarr salary
Question List the candidate key(s) of this relation
ENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5
E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7
E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5
E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5
E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7
E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8
E8 J Jones 10-11-72 SA 50000 null D1 Management E8
EmpDept relation
Applications of FDs
1) given a set of FDs F in relation R FD X rarr Y is true or not
2) given a set of FDs F in relation R a set of attributes X is a candidate key of R or not
3) given a set of FDs F in relation R which candidate keys are in R
4) given a set of FDs F in relation R some FDs in F is Excessive or not
Rules for reasoning about functional dependencies
Armstrongs Axioms are three inference rules for functional dependenciesLet A B C and D be subsets of the attributes of relation R
IR1 Reflexive Rule( 自反性 ) If AB then A rarr B
A set of attributes always trivially functionally determines any subset of itself
IR2 Augmentation Rule( 增加性 ) If A rarr B then AC rarr BC
Adding a set of attributes to both the LHS and RHS results in another valid functional dependency
IR3 Transitive Rule( 传递性 ) If A rarr B and B rarr C then A rarr C
Transitivity holds with FDs
Derived Rules from Armstrongs Axioms
There are three other well-known rules that can be derived from the basic three
IR4 Decomposition( 可分解性 ) If A rarr BC then A rarr B and A rarr C A FD with two attributes on RHS can be decomposed into two FDs each with on attribute on the RHS
IR5 Union( 合并性 ) If A rarr B and A rarr C then A rarr BC Reverse of decomposition rule
IR6 Composition( 组合性 ) If A rarr B and C rarr D then AC rarr BD More general than union - can combine two non-overlapping FDs into one
Characteristics of Armstrongs Axioms
The rules are sound( 健壮的 ) and complete( 完备的 )
1) Sound The rules do not infer( 推断 ) any incorrect FDs
2) Complete The rules can be used to infer all correct FDs (no FDs are missing)
The axioms (called IR1 IR2 and IR3) can be used to determine a set of all FDs F+ from an initial set of FDs F
F+ is referred to as the closure of F
Strategy of dealing with questions
The basic idea is that given any set of attributes X we can compute the set of all attributes X+ that can be functionally determined using F This is called the closure of X under F
For question 1 given a set of FDs F in R FD X rarr Y is true or not we know that X rarr Y holds if Y is in X+
For question 2 a set of attributes X is a candidate key of R or not X is a superkey of R if X+ is all attributes of R
For question 3 how to computing F+ we can compute X+ for all possible subsets X of R
Computing the Attribute Closure
The algorithm is as follows
Given a set of attributes X
Let X+ = X
Repeat Find a FD in F whose left side is a subset of X+ Add the right side of F to X+
Until (X+ does not change or X+ contains all attributes of R)
After the algorithm completes you have a set of attributes X+ that can be functionally determined by X This allows you to produce FDs of the form
X rarr A where A is in X+
Computing Attribute Closure Example
R (ABCDEG)
F = A rarr BC C rarr D D rarr G
Compute A+
A+ = A (initial step)
A+ = ABC (by using FD A rarr BC)
A+ = ABCD (by using FD C rarr D)
A+ = ABCDG (by using FD D rarr G)
Similarly we can compute C+ and EG+
C+ = CDG
EG+ = EG
Using Attribute Closure to Detect a FD
The attribute closure algorithm can be used to determine if a particular FD X rarr Y holds based on the set of given FDs F Compute X+ and see if Y is in X+
Example
R = (ABCDEG)
F = A rarr BC C rarr D D rarr G
Does the FD CD rarr G hold Compute CD+ = CDG Yes the FD CD rarr G holds
Does the FD BC rarr E hold Compute BC+ = BCDG No the FD BC rarr E does n
ot hold
Using Attribute Closure to Find all FDs
The attribute closure algorithm can be used to find all FDs of a given relation R if perform the attribute closure algorithm for all subsets X of the attributes R
This is an exponential algorithm in the number of attributes in R Although there are a few optimizations Do not compute the closure of the empty set or of the set of all attributes If we find X+ = all attributes do not compute the closure of any supersets of X Note that even though you do not have to compute the closure (because you know the answer) you still must use the known closure to infer new FDs
Using Attribute Closureto Find all FDs Example
Compute F+ from F where
R (ABCD)
F = A rarr BC C rarr D
Must compute all these attribute closures
A+ B+ C+ D+
AB+ AC+ AD+ BC+ BD+ CD+
ABC+ ABD+ ACD+ BCD+
Do not have to compute ABCD+
Start with the single sets
A+ = ABCD B+ = B C+ = CD D+ = D
New FD A rarr D
Using Attribute Closureto Find all FDs Example (cont)
Next the two element sets Do not compute AB+ AC+ AD+ because A+ contained all attributes of R New FDs AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C BC+ = BCD BD+ = BD CD+ = CD New FD BC rarr DThree element sets Do not compute ABC+ ABD+ ACD+ New FDs ABC rarr D ABD rarr C ACD rarr B BCD+ = BCD No new FDsAll FDs F+ = A rarr BC C rarr D A rarr D BC rarr D AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C ABC rarr D ABD rarrC ACD rarr B
FD Closure Questions
1) Compute all nontrivial FDs and list all keys of R
R = (ABCD)
F = AB rarr C C rarr D D rarr A
2) Compute all nontrivial FDs and list all candidate keys of R
R = (ABCD)
F = AB rarr C BC rarr D CD rarr A AD rarr B
Equivalent Sets of FDs
A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+ We can determine if F covers E by calculating X+ with respect to F for each FD X rarr Y in E and checking whether X+ includes Y
Two sets of FDs E and F are equivalent if E covers F and F covers E That is E+= F+
Minimal Set of FDs
A set of FDs F is minimal if it satisfies these conditions Every FD in F has a single attribute on its right-hand side We cannot replace any FD X rarr A in F with Y rarr A where Y
X and still have a set of dependencies that is equivalent to F
We cannot remove any FD and still have a set that is equivalent to F
A minimal sets of FDs is like a canonical( 规范的 ) form of FDs
Note that there may be many minimal covers for a given set of functional dependencies F
Normal Forms
A relation is in a particular normal form if it satisfies certain
normalization properties
There are several normal forms defined 1NF - First Normal Form 2NF - Second Normal Form 3NF - Third Normal Form BCNF - Boyce-Codd Normal Form 4NF - Fourth Normal Form 5NF - Fifth Normal Form
Each of these normal forms are stricter than the next
For example 3NF is better than 2NF because it removes more redundancyanomalies from the schema than 2NF
Normal Forms
All relations (non-normalized)
1NF (First Normal Form)
2NF (Second Normal Form)
3NF (Third Normal Form)
BCNF (Boyce-Codd NF)
4NF (Fourth NF)
5NF (Fifth NF)
First Normal Form (1NF)
A relation is in first normal form (1NF) if all its attribute values are atomic
That is a 1NF relation cannot have an attribute value that is a set of values (multi-valued attribute) a set of tuples (nested relation)
1NF is a standard assumption in relational DBMSs However object-oriented DBMSs and nested relational DBMSs relax t
his constraint
A relation that is not in 1NF is an unnormalized relation
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1P2
AnalystAnalyst
246
E3 A Lee P3P4
ConsultantEngineer
1048
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1 P2 Analyst Analyst 24 6
E3 A Lee P3P4 Consultant Engineer 10 48
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
A non-1NF Relation
One way to convert a non-1NF relation to a 1NF relation
1) Splitting Method - Divide the existing relation into two relations one with non-repeating attributes and the other with repeating attributes
Make a new relation consisting of the primary key of the original relation and the repeating attributes Determine a primary key for this new relation
Remove the repeating attributes from the original relation
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1P2
AnalystAnalyst
246
E3 A Lee P3P4
ConsultantEngineer
1048
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
ENo EName
E1 J Doe
E2 M Smith
E3 A Lee
E4 J Miller
E5 B Casey
E6 L Chu
E7 R Davis
ENo PNo Resp DurationE1 P1 Manager 12E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36
Second Normal Form (2NF)
A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key
If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation
Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key
Second Normal Form (2NF) Example
fd1 and fd4 are partial functional dependencies Normalize to
Emp (eno ename title bdate salary supereno dno)
WorksOn (eno pno resp hours)
Proj (pno pname budget)
ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration
EmpProj relation
fd1 fd2
fd3fd4
Second Normal Form (2NF) Example (2)
ENo EName
BDate Title Salary SuperNo
DNo
Emp relation
fd1
fd2
fd3fd4
PNo PName Budget
Proj relationENo PNo Resp Duration
WorkOn relation
Third Normal Form (3NF)
Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th
an 2 FDs
A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key
Converting a relation to 3NF from 2NF involves the removal of transitive dependencies
Third Normal Form (3NF) Example
fd2 results in a transitive dependency eno rarr salary Remove it
ENo EName BDate Title Salary SuperNo DNo
Emp relation
fd1
fd2
fd2
Title Salary
Pay relation
ENo EName BDate Title SuperNo DNo
Emp relation
fd1
General Definitions of 2NF and 3NF
We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)
General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key
General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key
Note that a prime attribute is an attribute that is in any key (candidate or primary)
General Definition of 3NF Example
The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute
However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key
Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key
ENo EName BDate Title Salary SuperNo DNo SSN
Emp relation
fd1
fd2
Normalization Question
Consider the universal relation
R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ
List the keys for R
Decompose R into 2NF and then 3NF relations
Boyce-Codd Normal Form (BCNF)
A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key
To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key
The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
ClientInterview relation
Relation is in 3NF but not BCNF
fd4
fd3
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
Note that we lose the FD fd2
fd4
fd3
ClientNo InterviewDate
InterviwTime
StaffNo StaffNo InterviwDate
RoomNo
Interview relation StaffRoom relation
BCNF versus 3NF
We can decompose to BCNF but sometimes we do not want to if we lose a FD
The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency
Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)
In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)
Conversion to BCNF
There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y
Note that if any FD in F+ violates BCNF then a FD in F violates BCNF
2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey
3) Replace R by relations with schemas R1 = X+
R2 = (R ndash X+) U X
4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib
utes are all in R1 or R2
Normalization Question
Given this database schema normalize into BCNF
fd1
PID Country Lot Area Price TaxRate
fd2
fd4fd3
Multi-Valued Dependencies
A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema
A MVD occurs when two independent 1N relationships are in the relational schema
Multi-Valued Dependencies Example
Eno ename Title Salary Pno dno
E1 J Doe EE 30000 P1P2
E2 M Smith SA 50000 P1P4 D1D2D3
E3 A Lee ME 40000 P1P3 D2D3
E4 J Miller PR 20000 P3 D3
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1
E1 J Doe EE 30000 P2
E2 M Smith SA 50000 P1 D1
E2 M Smith SA 50000 P1 D2
E2 M Smith SA 50000 P1 D3
E2 M Smith SA 50000 P4 D1
E2 M Smith SA 50000 P4 D2
E2 M Smith SA 50000 P4 D3
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P3 D3
E3 A Lee ME 40000 P3 D3
E4 J Miller PR 20000 P3 D3
Multi-Valued Dependencies (MVDs)
A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C
A trivial MVD A mdashgtgt B occurs when either B is a subset of A or
A cup B = R
Fourth Normal Form (4NF)
Fourth normal form (4NF) is based on the idea of multi-valued dependencies
A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies
Fourth Normal Form (4NF) Example
eno mdashgtgtpno
eno mdashgtgtdno
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3
eno ename title salary
E1 J Doe EE 30000
E2 MSmith SA 50000
E3 A Lee ME 40000
E4 J Miller PR 20000
eno pno
E1 P1
E1 P2
E2 P1
E2 P4
E3 P1
E3 P3
E4 P3
eno dno
E2 D1
E2 D2
E2 D3
E3 D2
E3 D3
E4 D3
Lossless-join Dependency
The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation
A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined
Fifth Normal Form (5NF)
bull Fifth normal form (5NF) is based on join dependencies
A relation is in fifth normal form (5NF) if the relation has no join dependency
bull 5NF is also called project-join normal form (PJNF)
A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R
The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have
ΠR1(r) ΠR2(r) hellip ΠRn(r) = r
Fifth Normal Form (5NF) Example
Consider a relation Supply (sname partName projName)
1048715Add the additional constraint that
If project j requires part p
and supplier s supplies part p
and supplier s supplies at least one item to project j
Then supplier s also supplies part p to project j
Original Supply Relation Supply Relation with new constraintsname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
Fifth Normal Form (5NF) Example
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
sname partName
Smith Bolt
Smith Nut
Adam Bolt
Walton Nut
Adam Nail
sname projName
Smith X
Smith Y
Adam Y
Walton Z
Adam X
partName projName
Bolt X
Nut Y
Bolt Y
Nut Z
Nail X
Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples
Desirable Properties in normalization
Relational schemas that are well-designed have several
important properties
1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or
relationship
2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins
3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations
Normal Forms in Practice
Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used
For example query execution time may increase because of normalization as more joins become necessary to answer queries
Normal Forms in Practice Example
For example street and city uniquely determine a zipcode
In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed
When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly
fd1
ENo City Street ZipCode
fd2
ENo ZipCode
ZipCode City Street
Normal Forms in Practice Summary
In summary normalization is typically used in two ways
To improve on relational schemas after ER design
To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies
There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance
This is the classic time versus space tradeoff
Conclusion
Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies
Normal forms indicate when relations satisfy certain properties
1NF - All attributes are atomic
2NF - All attributes are fully functionally dependent on a key
3NF - There are no transitive dependencies in the relation
4NF - There are no multi-valued dependencies in the relation
5NF - The relation has no join dependency
In practice normalization is used to improve schemas produced after ER design and existing relational schemas
It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time
Objectives
Explain process of normalization and why it is beneficial
Describe the concept of the Universal Relation
What are the motivations for normalization
What are the 3 types of update anomalies Be able to give examples
Describe the lossless-join property and dependency preservation property
Define and explain the concept of a functional dependency
What is a trivial FD
Describe the relationship between FDs and keys
What are Armstrongs Axioms When are they used
What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs
Objectives (2)
Define minimal and covers for a set of FDs
Define normal forms be able to list the normal forms and draw a diagram showing their relationship
Define 1NF Know how to convert non-1NF relation to 1NF
Define 2NF Convert 1NF relation to 2NF
Define 3NF Convert 2NF relation to 3NF
Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example
Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm
Describe multi-valued dependencies When do they occur
What is a trivial MVD
Define 4NF Convert 3NFBCNF relation to 4NF
Objectives (3)
Describe lossless-join property and how it relates to 5NF
Define 5NF
Explain normalization in terms of the time versus space tradeoff
Explain the relationship between normalization and ER modeling
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Slide 45
- Slide 46
- Slide 47
- Slide 48
- Slide 49
- Slide 50
- Slide 51
- Slide 52
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- Slide 60
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
-
Motivation ( 动机 ) of Normalization
When the design of database is improper it leads to the problems of redundancies( 冗余 ) and update anomalies( 更新异常 )
Redundancy occurs when the same data value is stored more than once in a relation Redundancy wastes space and reduces performa
nceUpdate anomalies are problems that arise when tr
ying to insert delete or update tuples and are often caused by redundancy
Motivation of Normalization
Dream GameDepartment Staff Card
Department No CR74 Department Name Engineering
Staff No E2 Name MSmith
Title Database Designer Salary 66000
Remark
Department Manager Yes No⊙
Supervisor E5
Example of improper relation
bull DeptEmp Relation
ENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5
E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7
E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5
E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5
E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7
E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8
E8 J Jones 10-11-72 SA 50000 null D1 Management E8
Insertion Anomaly Example
Consider these two types of insertion anomalies1) Insert a new employee E9 working in department D2
ndash You have to redundantly insert the department name and manager when adding this record
2) Insert a department D4 that has no current employeesndash This insertion is not possible without creating a dummy employee id a
nd record because ENo is the primary key of the relation
ENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5
E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7
E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5
E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5
E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7
E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8
E8 J Jones 10-11-72 SA 50000 null D1 Management E8
Deletion Anomaly Example
Consider this deletion anomaly- Delete employees E3 and E6 from the database
- Deleting those two employees removes them from the database and we now have lost information about department D2
ENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5
E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7
E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5
E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5
E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7
E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8
E8 J Jones 10-11-72 SA 50000 null D1 Management E8
Modification Anomaly Example
Consider these modification anomalies
1) Change the name of department D3 to ldquoEngineeringrdquo You must update the department name in 3 different records
2) Change the manager of D1 to E4 You must update the superno field in 2 different records
ENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5
E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7
E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5
E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5
E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7
E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8
E8 J Jones 10-11-72 SA 50000 null D1 Management E8
Update Anomalies
bull There are three major types of update anomalies Insertion Anomalies - Insertion of a tuple into the r
elation either requires insertion of redundant information or cannot be performed without setting key values to NULL
Deletion Anomalies - Deletion of a tuple may lose information that is still required to be stored
Modification Anomalies - Changing an attribute of a tuple may require changing multiple attribute values in other tuples
Overview to Normalization
Normalization is a validation technique for producing relations with desirable properties
Normalization is a bottom-up design technique for producing relations
Normalization decomposes relations into smaller relations that contain less redundancy This decomposition requires reconstruction of the original relations from the smaller relations and no information is lost
It was developed by Codd in 1972 preceding ER modeling and extended by others over the years
Example Normalization to DeptEmp
bull DeptEmp Relation
ENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5
E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7
E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5
E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5
E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7
E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8
E8 J Jones 10-11-72 SA 50000 null D1 Management E8
Normalization to DeptEmp
Emp Relation Dept Relation
dno dname mgrno
D1 Management E8
D2 consulting E7
D3 Accounting E5
eno ename bdate title salary superNo dno
E1 J Doe 01-05-75 EE 30000 E2 null
E2 M Smith 06-04-66 SA 50000 E5 D3
E3 A Lee 07-05-66 ME 40000 E7 D2
E4 J Miller 09-01-50 PR 20000 E6 D3
E5 B Casey 12-25-71 SA 50000 E8 D3
E6 L Chu 11-30-65 EE 30000 E7 D2
E7 R Davis 09-08-77 ME 40000 E8 D1
E8 J Jones 10-11-72 SA 50000 null D1
Dept Emp= Emp dno Dept
Overview to Normalization
Input a relation schema and constraints on attributes
Output one or more relation schemas
Processing Idea examining the relationships between attributes and organize the data according to its functional dependencies( 函数依赖)
Significance
1) Validate our design correct our improper design
2) Help for understanding the nature and the meaning of the data in the enterprise for learning more about the data semantics
When to use normalization
Normalization can be used after mapping ER model into relation model or independently
Normalization may be especially useful for databases that have already been designed without using formal techniques
Theory basic of normalization
Functional Dependencies ( 函数依赖) Functional dependencies represent integrity constraints ( 完整
性约束) on the values of attributes in a relation and are used in normalization
A functional dependency (FD) describes the relationship between attributes in a relation We say a set of attributes X functionally determines an attribute Y if given the values of X we always know the only possible value of Y
Notation X rarr Y X functionally determines Y Y is functionally dependent on XExample
eno rarr ename eno pno rarr Duration
Notation for Functional Dependencies
A functional dependency has a left-hand side called the determinant which is a set of attributes and one attribute on the right-hand side
eno pno rarr duration
determinant determined attribute
Strictly speaking there is always only one attribute on the RHS but we can combine several functional dependencies into one
eno pno rarr durationeno pno rarr respeno pno rarr duration resp
Remember that this is really two functional dependencies
The Semantics ofFunctional Dependencies
Similar to keys Functional dependencies are a property of a relation schema (intension) and not a property of a particular instance of the schema (extension)
Functional dependencies are directional
eno rarr ename ename rarr eno
ExampleFDs in EmpDept relation
bull EmpDept RelationENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8E8 J Jones 10-11-72 SA 50000 null D1 Management E8
Eno rarr ename bdate title salary superno dno
dnorarrdname mgrno
titlerarr salary
Trivial Functional Dependencies ( 平凡函数依赖 )
A functional dependency is trivialif the attributes on its left-hand side are a superset of the attributes on its right-hand side
Examples eno rarr eno
eno ename rarr eno
eno pno duration rarr eno Duration
Trivial functional dependencies are not interesting because they do not tell us anything
We are only interested in nontrivial FDs
Full Functional Dependencies
A functional dependency A rarr B is a full functional dependency if removal of any attribute from A results in the dependency not existing any more We say that B is fully functionally dependent on A If remove an attribute from A and the functional dependency still exists we say that B is partially dependent on A
Example
eno rarr ename (full FD)
eno ename rarr salary title (partial FD - can remove ename)
eno pno rarr hours resp (full FD)
How to Identify FDs in a relation
Examining the semantics of the attributes of relation
Identify all non-trivial functional dependencies in the Universal Relation
Universal (eno pno resp duration ename bdate title salary superNo dno dname mgrNo pname budget)
Eno rarr ename bdate title salary superno dnodnorarrdname mgrnotitlerarr salary
Formal Definition ofFunctional Dependencies
Given relation R defined over U = A1 A2 An where X U Y U If for all pairs of tuples t1 and t2 in any legal instance of relation schema R
t1[X] = t2[X] implies t1[Y] = t2[Y]
then the functional dependency X rarr Y holds in R
Functional Dependencies and Keys
Functional dependencies can be used to determine the candidate keys of a relation
What is superkey candidate key primary key
For example if an attribute functionally determines all other attributes in the relation that attribute can be a key
eno rarr eno ename bdate title supereno dno eno is a candidate key for the Emp relation
Functional Dependencies and Keys
ENo rarr EName BDate Title SuperNo DNoDNo rarr DName MgreNotitle rarr salary
Question List the candidate key(s) of this relation
ENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5
E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7
E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5
E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5
E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7
E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8
E8 J Jones 10-11-72 SA 50000 null D1 Management E8
EmpDept relation
Applications of FDs
1) given a set of FDs F in relation R FD X rarr Y is true or not
2) given a set of FDs F in relation R a set of attributes X is a candidate key of R or not
3) given a set of FDs F in relation R which candidate keys are in R
4) given a set of FDs F in relation R some FDs in F is Excessive or not
Rules for reasoning about functional dependencies
Armstrongs Axioms are three inference rules for functional dependenciesLet A B C and D be subsets of the attributes of relation R
IR1 Reflexive Rule( 自反性 ) If AB then A rarr B
A set of attributes always trivially functionally determines any subset of itself
IR2 Augmentation Rule( 增加性 ) If A rarr B then AC rarr BC
Adding a set of attributes to both the LHS and RHS results in another valid functional dependency
IR3 Transitive Rule( 传递性 ) If A rarr B and B rarr C then A rarr C
Transitivity holds with FDs
Derived Rules from Armstrongs Axioms
There are three other well-known rules that can be derived from the basic three
IR4 Decomposition( 可分解性 ) If A rarr BC then A rarr B and A rarr C A FD with two attributes on RHS can be decomposed into two FDs each with on attribute on the RHS
IR5 Union( 合并性 ) If A rarr B and A rarr C then A rarr BC Reverse of decomposition rule
IR6 Composition( 组合性 ) If A rarr B and C rarr D then AC rarr BD More general than union - can combine two non-overlapping FDs into one
Characteristics of Armstrongs Axioms
The rules are sound( 健壮的 ) and complete( 完备的 )
1) Sound The rules do not infer( 推断 ) any incorrect FDs
2) Complete The rules can be used to infer all correct FDs (no FDs are missing)
The axioms (called IR1 IR2 and IR3) can be used to determine a set of all FDs F+ from an initial set of FDs F
F+ is referred to as the closure of F
Strategy of dealing with questions
The basic idea is that given any set of attributes X we can compute the set of all attributes X+ that can be functionally determined using F This is called the closure of X under F
For question 1 given a set of FDs F in R FD X rarr Y is true or not we know that X rarr Y holds if Y is in X+
For question 2 a set of attributes X is a candidate key of R or not X is a superkey of R if X+ is all attributes of R
For question 3 how to computing F+ we can compute X+ for all possible subsets X of R
Computing the Attribute Closure
The algorithm is as follows
Given a set of attributes X
Let X+ = X
Repeat Find a FD in F whose left side is a subset of X+ Add the right side of F to X+
Until (X+ does not change or X+ contains all attributes of R)
After the algorithm completes you have a set of attributes X+ that can be functionally determined by X This allows you to produce FDs of the form
X rarr A where A is in X+
Computing Attribute Closure Example
R (ABCDEG)
F = A rarr BC C rarr D D rarr G
Compute A+
A+ = A (initial step)
A+ = ABC (by using FD A rarr BC)
A+ = ABCD (by using FD C rarr D)
A+ = ABCDG (by using FD D rarr G)
Similarly we can compute C+ and EG+
C+ = CDG
EG+ = EG
Using Attribute Closure to Detect a FD
The attribute closure algorithm can be used to determine if a particular FD X rarr Y holds based on the set of given FDs F Compute X+ and see if Y is in X+
Example
R = (ABCDEG)
F = A rarr BC C rarr D D rarr G
Does the FD CD rarr G hold Compute CD+ = CDG Yes the FD CD rarr G holds
Does the FD BC rarr E hold Compute BC+ = BCDG No the FD BC rarr E does n
ot hold
Using Attribute Closure to Find all FDs
The attribute closure algorithm can be used to find all FDs of a given relation R if perform the attribute closure algorithm for all subsets X of the attributes R
This is an exponential algorithm in the number of attributes in R Although there are a few optimizations Do not compute the closure of the empty set or of the set of all attributes If we find X+ = all attributes do not compute the closure of any supersets of X Note that even though you do not have to compute the closure (because you know the answer) you still must use the known closure to infer new FDs
Using Attribute Closureto Find all FDs Example
Compute F+ from F where
R (ABCD)
F = A rarr BC C rarr D
Must compute all these attribute closures
A+ B+ C+ D+
AB+ AC+ AD+ BC+ BD+ CD+
ABC+ ABD+ ACD+ BCD+
Do not have to compute ABCD+
Start with the single sets
A+ = ABCD B+ = B C+ = CD D+ = D
New FD A rarr D
Using Attribute Closureto Find all FDs Example (cont)
Next the two element sets Do not compute AB+ AC+ AD+ because A+ contained all attributes of R New FDs AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C BC+ = BCD BD+ = BD CD+ = CD New FD BC rarr DThree element sets Do not compute ABC+ ABD+ ACD+ New FDs ABC rarr D ABD rarr C ACD rarr B BCD+ = BCD No new FDsAll FDs F+ = A rarr BC C rarr D A rarr D BC rarr D AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C ABC rarr D ABD rarrC ACD rarr B
FD Closure Questions
1) Compute all nontrivial FDs and list all keys of R
R = (ABCD)
F = AB rarr C C rarr D D rarr A
2) Compute all nontrivial FDs and list all candidate keys of R
R = (ABCD)
F = AB rarr C BC rarr D CD rarr A AD rarr B
Equivalent Sets of FDs
A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+ We can determine if F covers E by calculating X+ with respect to F for each FD X rarr Y in E and checking whether X+ includes Y
Two sets of FDs E and F are equivalent if E covers F and F covers E That is E+= F+
Minimal Set of FDs
A set of FDs F is minimal if it satisfies these conditions Every FD in F has a single attribute on its right-hand side We cannot replace any FD X rarr A in F with Y rarr A where Y
X and still have a set of dependencies that is equivalent to F
We cannot remove any FD and still have a set that is equivalent to F
A minimal sets of FDs is like a canonical( 规范的 ) form of FDs
Note that there may be many minimal covers for a given set of functional dependencies F
Normal Forms
A relation is in a particular normal form if it satisfies certain
normalization properties
There are several normal forms defined 1NF - First Normal Form 2NF - Second Normal Form 3NF - Third Normal Form BCNF - Boyce-Codd Normal Form 4NF - Fourth Normal Form 5NF - Fifth Normal Form
Each of these normal forms are stricter than the next
For example 3NF is better than 2NF because it removes more redundancyanomalies from the schema than 2NF
Normal Forms
All relations (non-normalized)
1NF (First Normal Form)
2NF (Second Normal Form)
3NF (Third Normal Form)
BCNF (Boyce-Codd NF)
4NF (Fourth NF)
5NF (Fifth NF)
First Normal Form (1NF)
A relation is in first normal form (1NF) if all its attribute values are atomic
That is a 1NF relation cannot have an attribute value that is a set of values (multi-valued attribute) a set of tuples (nested relation)
1NF is a standard assumption in relational DBMSs However object-oriented DBMSs and nested relational DBMSs relax t
his constraint
A relation that is not in 1NF is an unnormalized relation
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1P2
AnalystAnalyst
246
E3 A Lee P3P4
ConsultantEngineer
1048
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1 P2 Analyst Analyst 24 6
E3 A Lee P3P4 Consultant Engineer 10 48
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
A non-1NF Relation
One way to convert a non-1NF relation to a 1NF relation
1) Splitting Method - Divide the existing relation into two relations one with non-repeating attributes and the other with repeating attributes
Make a new relation consisting of the primary key of the original relation and the repeating attributes Determine a primary key for this new relation
Remove the repeating attributes from the original relation
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1P2
AnalystAnalyst
246
E3 A Lee P3P4
ConsultantEngineer
1048
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
ENo EName
E1 J Doe
E2 M Smith
E3 A Lee
E4 J Miller
E5 B Casey
E6 L Chu
E7 R Davis
ENo PNo Resp DurationE1 P1 Manager 12E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36
Second Normal Form (2NF)
A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key
If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation
Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key
Second Normal Form (2NF) Example
fd1 and fd4 are partial functional dependencies Normalize to
Emp (eno ename title bdate salary supereno dno)
WorksOn (eno pno resp hours)
Proj (pno pname budget)
ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration
EmpProj relation
fd1 fd2
fd3fd4
Second Normal Form (2NF) Example (2)
ENo EName
BDate Title Salary SuperNo
DNo
Emp relation
fd1
fd2
fd3fd4
PNo PName Budget
Proj relationENo PNo Resp Duration
WorkOn relation
Third Normal Form (3NF)
Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th
an 2 FDs
A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key
Converting a relation to 3NF from 2NF involves the removal of transitive dependencies
Third Normal Form (3NF) Example
fd2 results in a transitive dependency eno rarr salary Remove it
ENo EName BDate Title Salary SuperNo DNo
Emp relation
fd1
fd2
fd2
Title Salary
Pay relation
ENo EName BDate Title SuperNo DNo
Emp relation
fd1
General Definitions of 2NF and 3NF
We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)
General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key
General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key
Note that a prime attribute is an attribute that is in any key (candidate or primary)
General Definition of 3NF Example
The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute
However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key
Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key
ENo EName BDate Title Salary SuperNo DNo SSN
Emp relation
fd1
fd2
Normalization Question
Consider the universal relation
R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ
List the keys for R
Decompose R into 2NF and then 3NF relations
Boyce-Codd Normal Form (BCNF)
A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key
To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key
The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
ClientInterview relation
Relation is in 3NF but not BCNF
fd4
fd3
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
Note that we lose the FD fd2
fd4
fd3
ClientNo InterviewDate
InterviwTime
StaffNo StaffNo InterviwDate
RoomNo
Interview relation StaffRoom relation
BCNF versus 3NF
We can decompose to BCNF but sometimes we do not want to if we lose a FD
The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency
Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)
In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)
Conversion to BCNF
There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y
Note that if any FD in F+ violates BCNF then a FD in F violates BCNF
2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey
3) Replace R by relations with schemas R1 = X+
R2 = (R ndash X+) U X
4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib
utes are all in R1 or R2
Normalization Question
Given this database schema normalize into BCNF
fd1
PID Country Lot Area Price TaxRate
fd2
fd4fd3
Multi-Valued Dependencies
A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema
A MVD occurs when two independent 1N relationships are in the relational schema
Multi-Valued Dependencies Example
Eno ename Title Salary Pno dno
E1 J Doe EE 30000 P1P2
E2 M Smith SA 50000 P1P4 D1D2D3
E3 A Lee ME 40000 P1P3 D2D3
E4 J Miller PR 20000 P3 D3
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1
E1 J Doe EE 30000 P2
E2 M Smith SA 50000 P1 D1
E2 M Smith SA 50000 P1 D2
E2 M Smith SA 50000 P1 D3
E2 M Smith SA 50000 P4 D1
E2 M Smith SA 50000 P4 D2
E2 M Smith SA 50000 P4 D3
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P3 D3
E3 A Lee ME 40000 P3 D3
E4 J Miller PR 20000 P3 D3
Multi-Valued Dependencies (MVDs)
A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C
A trivial MVD A mdashgtgt B occurs when either B is a subset of A or
A cup B = R
Fourth Normal Form (4NF)
Fourth normal form (4NF) is based on the idea of multi-valued dependencies
A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies
Fourth Normal Form (4NF) Example
eno mdashgtgtpno
eno mdashgtgtdno
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3
eno ename title salary
E1 J Doe EE 30000
E2 MSmith SA 50000
E3 A Lee ME 40000
E4 J Miller PR 20000
eno pno
E1 P1
E1 P2
E2 P1
E2 P4
E3 P1
E3 P3
E4 P3
eno dno
E2 D1
E2 D2
E2 D3
E3 D2
E3 D3
E4 D3
Lossless-join Dependency
The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation
A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined
Fifth Normal Form (5NF)
bull Fifth normal form (5NF) is based on join dependencies
A relation is in fifth normal form (5NF) if the relation has no join dependency
bull 5NF is also called project-join normal form (PJNF)
A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R
The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have
ΠR1(r) ΠR2(r) hellip ΠRn(r) = r
Fifth Normal Form (5NF) Example
Consider a relation Supply (sname partName projName)
1048715Add the additional constraint that
If project j requires part p
and supplier s supplies part p
and supplier s supplies at least one item to project j
Then supplier s also supplies part p to project j
Original Supply Relation Supply Relation with new constraintsname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
Fifth Normal Form (5NF) Example
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
sname partName
Smith Bolt
Smith Nut
Adam Bolt
Walton Nut
Adam Nail
sname projName
Smith X
Smith Y
Adam Y
Walton Z
Adam X
partName projName
Bolt X
Nut Y
Bolt Y
Nut Z
Nail X
Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples
Desirable Properties in normalization
Relational schemas that are well-designed have several
important properties
1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or
relationship
2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins
3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations
Normal Forms in Practice
Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used
For example query execution time may increase because of normalization as more joins become necessary to answer queries
Normal Forms in Practice Example
For example street and city uniquely determine a zipcode
In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed
When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly
fd1
ENo City Street ZipCode
fd2
ENo ZipCode
ZipCode City Street
Normal Forms in Practice Summary
In summary normalization is typically used in two ways
To improve on relational schemas after ER design
To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies
There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance
This is the classic time versus space tradeoff
Conclusion
Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies
Normal forms indicate when relations satisfy certain properties
1NF - All attributes are atomic
2NF - All attributes are fully functionally dependent on a key
3NF - There are no transitive dependencies in the relation
4NF - There are no multi-valued dependencies in the relation
5NF - The relation has no join dependency
In practice normalization is used to improve schemas produced after ER design and existing relational schemas
It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time
Objectives
Explain process of normalization and why it is beneficial
Describe the concept of the Universal Relation
What are the motivations for normalization
What are the 3 types of update anomalies Be able to give examples
Describe the lossless-join property and dependency preservation property
Define and explain the concept of a functional dependency
What is a trivial FD
Describe the relationship between FDs and keys
What are Armstrongs Axioms When are they used
What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs
Objectives (2)
Define minimal and covers for a set of FDs
Define normal forms be able to list the normal forms and draw a diagram showing their relationship
Define 1NF Know how to convert non-1NF relation to 1NF
Define 2NF Convert 1NF relation to 2NF
Define 3NF Convert 2NF relation to 3NF
Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example
Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm
Describe multi-valued dependencies When do they occur
What is a trivial MVD
Define 4NF Convert 3NFBCNF relation to 4NF
Objectives (3)
Describe lossless-join property and how it relates to 5NF
Define 5NF
Explain normalization in terms of the time versus space tradeoff
Explain the relationship between normalization and ER modeling
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Slide 45
- Slide 46
- Slide 47
- Slide 48
- Slide 49
- Slide 50
- Slide 51
- Slide 52
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- Slide 60
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
-
Motivation of Normalization
Dream GameDepartment Staff Card
Department No CR74 Department Name Engineering
Staff No E2 Name MSmith
Title Database Designer Salary 66000
Remark
Department Manager Yes No⊙
Supervisor E5
Example of improper relation
bull DeptEmp Relation
ENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5
E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7
E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5
E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5
E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7
E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8
E8 J Jones 10-11-72 SA 50000 null D1 Management E8
Insertion Anomaly Example
Consider these two types of insertion anomalies1) Insert a new employee E9 working in department D2
ndash You have to redundantly insert the department name and manager when adding this record
2) Insert a department D4 that has no current employeesndash This insertion is not possible without creating a dummy employee id a
nd record because ENo is the primary key of the relation
ENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5
E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7
E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5
E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5
E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7
E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8
E8 J Jones 10-11-72 SA 50000 null D1 Management E8
Deletion Anomaly Example
Consider this deletion anomaly- Delete employees E3 and E6 from the database
- Deleting those two employees removes them from the database and we now have lost information about department D2
ENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5
E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7
E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5
E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5
E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7
E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8
E8 J Jones 10-11-72 SA 50000 null D1 Management E8
Modification Anomaly Example
Consider these modification anomalies
1) Change the name of department D3 to ldquoEngineeringrdquo You must update the department name in 3 different records
2) Change the manager of D1 to E4 You must update the superno field in 2 different records
ENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5
E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7
E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5
E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5
E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7
E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8
E8 J Jones 10-11-72 SA 50000 null D1 Management E8
Update Anomalies
bull There are three major types of update anomalies Insertion Anomalies - Insertion of a tuple into the r
elation either requires insertion of redundant information or cannot be performed without setting key values to NULL
Deletion Anomalies - Deletion of a tuple may lose information that is still required to be stored
Modification Anomalies - Changing an attribute of a tuple may require changing multiple attribute values in other tuples
Overview to Normalization
Normalization is a validation technique for producing relations with desirable properties
Normalization is a bottom-up design technique for producing relations
Normalization decomposes relations into smaller relations that contain less redundancy This decomposition requires reconstruction of the original relations from the smaller relations and no information is lost
It was developed by Codd in 1972 preceding ER modeling and extended by others over the years
Example Normalization to DeptEmp
bull DeptEmp Relation
ENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5
E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7
E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5
E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5
E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7
E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8
E8 J Jones 10-11-72 SA 50000 null D1 Management E8
Normalization to DeptEmp
Emp Relation Dept Relation
dno dname mgrno
D1 Management E8
D2 consulting E7
D3 Accounting E5
eno ename bdate title salary superNo dno
E1 J Doe 01-05-75 EE 30000 E2 null
E2 M Smith 06-04-66 SA 50000 E5 D3
E3 A Lee 07-05-66 ME 40000 E7 D2
E4 J Miller 09-01-50 PR 20000 E6 D3
E5 B Casey 12-25-71 SA 50000 E8 D3
E6 L Chu 11-30-65 EE 30000 E7 D2
E7 R Davis 09-08-77 ME 40000 E8 D1
E8 J Jones 10-11-72 SA 50000 null D1
Dept Emp= Emp dno Dept
Overview to Normalization
Input a relation schema and constraints on attributes
Output one or more relation schemas
Processing Idea examining the relationships between attributes and organize the data according to its functional dependencies( 函数依赖)
Significance
1) Validate our design correct our improper design
2) Help for understanding the nature and the meaning of the data in the enterprise for learning more about the data semantics
When to use normalization
Normalization can be used after mapping ER model into relation model or independently
Normalization may be especially useful for databases that have already been designed without using formal techniques
Theory basic of normalization
Functional Dependencies ( 函数依赖) Functional dependencies represent integrity constraints ( 完整
性约束) on the values of attributes in a relation and are used in normalization
A functional dependency (FD) describes the relationship between attributes in a relation We say a set of attributes X functionally determines an attribute Y if given the values of X we always know the only possible value of Y
Notation X rarr Y X functionally determines Y Y is functionally dependent on XExample
eno rarr ename eno pno rarr Duration
Notation for Functional Dependencies
A functional dependency has a left-hand side called the determinant which is a set of attributes and one attribute on the right-hand side
eno pno rarr duration
determinant determined attribute
Strictly speaking there is always only one attribute on the RHS but we can combine several functional dependencies into one
eno pno rarr durationeno pno rarr respeno pno rarr duration resp
Remember that this is really two functional dependencies
The Semantics ofFunctional Dependencies
Similar to keys Functional dependencies are a property of a relation schema (intension) and not a property of a particular instance of the schema (extension)
Functional dependencies are directional
eno rarr ename ename rarr eno
ExampleFDs in EmpDept relation
bull EmpDept RelationENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8E8 J Jones 10-11-72 SA 50000 null D1 Management E8
Eno rarr ename bdate title salary superno dno
dnorarrdname mgrno
titlerarr salary
Trivial Functional Dependencies ( 平凡函数依赖 )
A functional dependency is trivialif the attributes on its left-hand side are a superset of the attributes on its right-hand side
Examples eno rarr eno
eno ename rarr eno
eno pno duration rarr eno Duration
Trivial functional dependencies are not interesting because they do not tell us anything
We are only interested in nontrivial FDs
Full Functional Dependencies
A functional dependency A rarr B is a full functional dependency if removal of any attribute from A results in the dependency not existing any more We say that B is fully functionally dependent on A If remove an attribute from A and the functional dependency still exists we say that B is partially dependent on A
Example
eno rarr ename (full FD)
eno ename rarr salary title (partial FD - can remove ename)
eno pno rarr hours resp (full FD)
How to Identify FDs in a relation
Examining the semantics of the attributes of relation
Identify all non-trivial functional dependencies in the Universal Relation
Universal (eno pno resp duration ename bdate title salary superNo dno dname mgrNo pname budget)
Eno rarr ename bdate title salary superno dnodnorarrdname mgrnotitlerarr salary
Formal Definition ofFunctional Dependencies
Given relation R defined over U = A1 A2 An where X U Y U If for all pairs of tuples t1 and t2 in any legal instance of relation schema R
t1[X] = t2[X] implies t1[Y] = t2[Y]
then the functional dependency X rarr Y holds in R
Functional Dependencies and Keys
Functional dependencies can be used to determine the candidate keys of a relation
What is superkey candidate key primary key
For example if an attribute functionally determines all other attributes in the relation that attribute can be a key
eno rarr eno ename bdate title supereno dno eno is a candidate key for the Emp relation
Functional Dependencies and Keys
ENo rarr EName BDate Title SuperNo DNoDNo rarr DName MgreNotitle rarr salary
Question List the candidate key(s) of this relation
ENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5
E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7
E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5
E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5
E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7
E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8
E8 J Jones 10-11-72 SA 50000 null D1 Management E8
EmpDept relation
Applications of FDs
1) given a set of FDs F in relation R FD X rarr Y is true or not
2) given a set of FDs F in relation R a set of attributes X is a candidate key of R or not
3) given a set of FDs F in relation R which candidate keys are in R
4) given a set of FDs F in relation R some FDs in F is Excessive or not
Rules for reasoning about functional dependencies
Armstrongs Axioms are three inference rules for functional dependenciesLet A B C and D be subsets of the attributes of relation R
IR1 Reflexive Rule( 自反性 ) If AB then A rarr B
A set of attributes always trivially functionally determines any subset of itself
IR2 Augmentation Rule( 增加性 ) If A rarr B then AC rarr BC
Adding a set of attributes to both the LHS and RHS results in another valid functional dependency
IR3 Transitive Rule( 传递性 ) If A rarr B and B rarr C then A rarr C
Transitivity holds with FDs
Derived Rules from Armstrongs Axioms
There are three other well-known rules that can be derived from the basic three
IR4 Decomposition( 可分解性 ) If A rarr BC then A rarr B and A rarr C A FD with two attributes on RHS can be decomposed into two FDs each with on attribute on the RHS
IR5 Union( 合并性 ) If A rarr B and A rarr C then A rarr BC Reverse of decomposition rule
IR6 Composition( 组合性 ) If A rarr B and C rarr D then AC rarr BD More general than union - can combine two non-overlapping FDs into one
Characteristics of Armstrongs Axioms
The rules are sound( 健壮的 ) and complete( 完备的 )
1) Sound The rules do not infer( 推断 ) any incorrect FDs
2) Complete The rules can be used to infer all correct FDs (no FDs are missing)
The axioms (called IR1 IR2 and IR3) can be used to determine a set of all FDs F+ from an initial set of FDs F
F+ is referred to as the closure of F
Strategy of dealing with questions
The basic idea is that given any set of attributes X we can compute the set of all attributes X+ that can be functionally determined using F This is called the closure of X under F
For question 1 given a set of FDs F in R FD X rarr Y is true or not we know that X rarr Y holds if Y is in X+
For question 2 a set of attributes X is a candidate key of R or not X is a superkey of R if X+ is all attributes of R
For question 3 how to computing F+ we can compute X+ for all possible subsets X of R
Computing the Attribute Closure
The algorithm is as follows
Given a set of attributes X
Let X+ = X
Repeat Find a FD in F whose left side is a subset of X+ Add the right side of F to X+
Until (X+ does not change or X+ contains all attributes of R)
After the algorithm completes you have a set of attributes X+ that can be functionally determined by X This allows you to produce FDs of the form
X rarr A where A is in X+
Computing Attribute Closure Example
R (ABCDEG)
F = A rarr BC C rarr D D rarr G
Compute A+
A+ = A (initial step)
A+ = ABC (by using FD A rarr BC)
A+ = ABCD (by using FD C rarr D)
A+ = ABCDG (by using FD D rarr G)
Similarly we can compute C+ and EG+
C+ = CDG
EG+ = EG
Using Attribute Closure to Detect a FD
The attribute closure algorithm can be used to determine if a particular FD X rarr Y holds based on the set of given FDs F Compute X+ and see if Y is in X+
Example
R = (ABCDEG)
F = A rarr BC C rarr D D rarr G
Does the FD CD rarr G hold Compute CD+ = CDG Yes the FD CD rarr G holds
Does the FD BC rarr E hold Compute BC+ = BCDG No the FD BC rarr E does n
ot hold
Using Attribute Closure to Find all FDs
The attribute closure algorithm can be used to find all FDs of a given relation R if perform the attribute closure algorithm for all subsets X of the attributes R
This is an exponential algorithm in the number of attributes in R Although there are a few optimizations Do not compute the closure of the empty set or of the set of all attributes If we find X+ = all attributes do not compute the closure of any supersets of X Note that even though you do not have to compute the closure (because you know the answer) you still must use the known closure to infer new FDs
Using Attribute Closureto Find all FDs Example
Compute F+ from F where
R (ABCD)
F = A rarr BC C rarr D
Must compute all these attribute closures
A+ B+ C+ D+
AB+ AC+ AD+ BC+ BD+ CD+
ABC+ ABD+ ACD+ BCD+
Do not have to compute ABCD+
Start with the single sets
A+ = ABCD B+ = B C+ = CD D+ = D
New FD A rarr D
Using Attribute Closureto Find all FDs Example (cont)
Next the two element sets Do not compute AB+ AC+ AD+ because A+ contained all attributes of R New FDs AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C BC+ = BCD BD+ = BD CD+ = CD New FD BC rarr DThree element sets Do not compute ABC+ ABD+ ACD+ New FDs ABC rarr D ABD rarr C ACD rarr B BCD+ = BCD No new FDsAll FDs F+ = A rarr BC C rarr D A rarr D BC rarr D AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C ABC rarr D ABD rarrC ACD rarr B
FD Closure Questions
1) Compute all nontrivial FDs and list all keys of R
R = (ABCD)
F = AB rarr C C rarr D D rarr A
2) Compute all nontrivial FDs and list all candidate keys of R
R = (ABCD)
F = AB rarr C BC rarr D CD rarr A AD rarr B
Equivalent Sets of FDs
A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+ We can determine if F covers E by calculating X+ with respect to F for each FD X rarr Y in E and checking whether X+ includes Y
Two sets of FDs E and F are equivalent if E covers F and F covers E That is E+= F+
Minimal Set of FDs
A set of FDs F is minimal if it satisfies these conditions Every FD in F has a single attribute on its right-hand side We cannot replace any FD X rarr A in F with Y rarr A where Y
X and still have a set of dependencies that is equivalent to F
We cannot remove any FD and still have a set that is equivalent to F
A minimal sets of FDs is like a canonical( 规范的 ) form of FDs
Note that there may be many minimal covers for a given set of functional dependencies F
Normal Forms
A relation is in a particular normal form if it satisfies certain
normalization properties
There are several normal forms defined 1NF - First Normal Form 2NF - Second Normal Form 3NF - Third Normal Form BCNF - Boyce-Codd Normal Form 4NF - Fourth Normal Form 5NF - Fifth Normal Form
Each of these normal forms are stricter than the next
For example 3NF is better than 2NF because it removes more redundancyanomalies from the schema than 2NF
Normal Forms
All relations (non-normalized)
1NF (First Normal Form)
2NF (Second Normal Form)
3NF (Third Normal Form)
BCNF (Boyce-Codd NF)
4NF (Fourth NF)
5NF (Fifth NF)
First Normal Form (1NF)
A relation is in first normal form (1NF) if all its attribute values are atomic
That is a 1NF relation cannot have an attribute value that is a set of values (multi-valued attribute) a set of tuples (nested relation)
1NF is a standard assumption in relational DBMSs However object-oriented DBMSs and nested relational DBMSs relax t
his constraint
A relation that is not in 1NF is an unnormalized relation
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1P2
AnalystAnalyst
246
E3 A Lee P3P4
ConsultantEngineer
1048
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1 P2 Analyst Analyst 24 6
E3 A Lee P3P4 Consultant Engineer 10 48
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
A non-1NF Relation
One way to convert a non-1NF relation to a 1NF relation
1) Splitting Method - Divide the existing relation into two relations one with non-repeating attributes and the other with repeating attributes
Make a new relation consisting of the primary key of the original relation and the repeating attributes Determine a primary key for this new relation
Remove the repeating attributes from the original relation
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1P2
AnalystAnalyst
246
E3 A Lee P3P4
ConsultantEngineer
1048
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
ENo EName
E1 J Doe
E2 M Smith
E3 A Lee
E4 J Miller
E5 B Casey
E6 L Chu
E7 R Davis
ENo PNo Resp DurationE1 P1 Manager 12E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36
Second Normal Form (2NF)
A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key
If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation
Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key
Second Normal Form (2NF) Example
fd1 and fd4 are partial functional dependencies Normalize to
Emp (eno ename title bdate salary supereno dno)
WorksOn (eno pno resp hours)
Proj (pno pname budget)
ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration
EmpProj relation
fd1 fd2
fd3fd4
Second Normal Form (2NF) Example (2)
ENo EName
BDate Title Salary SuperNo
DNo
Emp relation
fd1
fd2
fd3fd4
PNo PName Budget
Proj relationENo PNo Resp Duration
WorkOn relation
Third Normal Form (3NF)
Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th
an 2 FDs
A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key
Converting a relation to 3NF from 2NF involves the removal of transitive dependencies
Third Normal Form (3NF) Example
fd2 results in a transitive dependency eno rarr salary Remove it
ENo EName BDate Title Salary SuperNo DNo
Emp relation
fd1
fd2
fd2
Title Salary
Pay relation
ENo EName BDate Title SuperNo DNo
Emp relation
fd1
General Definitions of 2NF and 3NF
We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)
General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key
General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key
Note that a prime attribute is an attribute that is in any key (candidate or primary)
General Definition of 3NF Example
The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute
However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key
Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key
ENo EName BDate Title Salary SuperNo DNo SSN
Emp relation
fd1
fd2
Normalization Question
Consider the universal relation
R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ
List the keys for R
Decompose R into 2NF and then 3NF relations
Boyce-Codd Normal Form (BCNF)
A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key
To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key
The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
ClientInterview relation
Relation is in 3NF but not BCNF
fd4
fd3
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
Note that we lose the FD fd2
fd4
fd3
ClientNo InterviewDate
InterviwTime
StaffNo StaffNo InterviwDate
RoomNo
Interview relation StaffRoom relation
BCNF versus 3NF
We can decompose to BCNF but sometimes we do not want to if we lose a FD
The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency
Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)
In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)
Conversion to BCNF
There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y
Note that if any FD in F+ violates BCNF then a FD in F violates BCNF
2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey
3) Replace R by relations with schemas R1 = X+
R2 = (R ndash X+) U X
4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib
utes are all in R1 or R2
Normalization Question
Given this database schema normalize into BCNF
fd1
PID Country Lot Area Price TaxRate
fd2
fd4fd3
Multi-Valued Dependencies
A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema
A MVD occurs when two independent 1N relationships are in the relational schema
Multi-Valued Dependencies Example
Eno ename Title Salary Pno dno
E1 J Doe EE 30000 P1P2
E2 M Smith SA 50000 P1P4 D1D2D3
E3 A Lee ME 40000 P1P3 D2D3
E4 J Miller PR 20000 P3 D3
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1
E1 J Doe EE 30000 P2
E2 M Smith SA 50000 P1 D1
E2 M Smith SA 50000 P1 D2
E2 M Smith SA 50000 P1 D3
E2 M Smith SA 50000 P4 D1
E2 M Smith SA 50000 P4 D2
E2 M Smith SA 50000 P4 D3
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P3 D3
E3 A Lee ME 40000 P3 D3
E4 J Miller PR 20000 P3 D3
Multi-Valued Dependencies (MVDs)
A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C
A trivial MVD A mdashgtgt B occurs when either B is a subset of A or
A cup B = R
Fourth Normal Form (4NF)
Fourth normal form (4NF) is based on the idea of multi-valued dependencies
A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies
Fourth Normal Form (4NF) Example
eno mdashgtgtpno
eno mdashgtgtdno
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3
eno ename title salary
E1 J Doe EE 30000
E2 MSmith SA 50000
E3 A Lee ME 40000
E4 J Miller PR 20000
eno pno
E1 P1
E1 P2
E2 P1
E2 P4
E3 P1
E3 P3
E4 P3
eno dno
E2 D1
E2 D2
E2 D3
E3 D2
E3 D3
E4 D3
Lossless-join Dependency
The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation
A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined
Fifth Normal Form (5NF)
bull Fifth normal form (5NF) is based on join dependencies
A relation is in fifth normal form (5NF) if the relation has no join dependency
bull 5NF is also called project-join normal form (PJNF)
A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R
The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have
ΠR1(r) ΠR2(r) hellip ΠRn(r) = r
Fifth Normal Form (5NF) Example
Consider a relation Supply (sname partName projName)
1048715Add the additional constraint that
If project j requires part p
and supplier s supplies part p
and supplier s supplies at least one item to project j
Then supplier s also supplies part p to project j
Original Supply Relation Supply Relation with new constraintsname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
Fifth Normal Form (5NF) Example
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
sname partName
Smith Bolt
Smith Nut
Adam Bolt
Walton Nut
Adam Nail
sname projName
Smith X
Smith Y
Adam Y
Walton Z
Adam X
partName projName
Bolt X
Nut Y
Bolt Y
Nut Z
Nail X
Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples
Desirable Properties in normalization
Relational schemas that are well-designed have several
important properties
1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or
relationship
2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins
3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations
Normal Forms in Practice
Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used
For example query execution time may increase because of normalization as more joins become necessary to answer queries
Normal Forms in Practice Example
For example street and city uniquely determine a zipcode
In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed
When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly
fd1
ENo City Street ZipCode
fd2
ENo ZipCode
ZipCode City Street
Normal Forms in Practice Summary
In summary normalization is typically used in two ways
To improve on relational schemas after ER design
To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies
There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance
This is the classic time versus space tradeoff
Conclusion
Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies
Normal forms indicate when relations satisfy certain properties
1NF - All attributes are atomic
2NF - All attributes are fully functionally dependent on a key
3NF - There are no transitive dependencies in the relation
4NF - There are no multi-valued dependencies in the relation
5NF - The relation has no join dependency
In practice normalization is used to improve schemas produced after ER design and existing relational schemas
It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time
Objectives
Explain process of normalization and why it is beneficial
Describe the concept of the Universal Relation
What are the motivations for normalization
What are the 3 types of update anomalies Be able to give examples
Describe the lossless-join property and dependency preservation property
Define and explain the concept of a functional dependency
What is a trivial FD
Describe the relationship between FDs and keys
What are Armstrongs Axioms When are they used
What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs
Objectives (2)
Define minimal and covers for a set of FDs
Define normal forms be able to list the normal forms and draw a diagram showing their relationship
Define 1NF Know how to convert non-1NF relation to 1NF
Define 2NF Convert 1NF relation to 2NF
Define 3NF Convert 2NF relation to 3NF
Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example
Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm
Describe multi-valued dependencies When do they occur
What is a trivial MVD
Define 4NF Convert 3NFBCNF relation to 4NF
Objectives (3)
Describe lossless-join property and how it relates to 5NF
Define 5NF
Explain normalization in terms of the time versus space tradeoff
Explain the relationship between normalization and ER modeling
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Slide 45
- Slide 46
- Slide 47
- Slide 48
- Slide 49
- Slide 50
- Slide 51
- Slide 52
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- Slide 60
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
-
Example of improper relation
bull DeptEmp Relation
ENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5
E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7
E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5
E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5
E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7
E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8
E8 J Jones 10-11-72 SA 50000 null D1 Management E8
Insertion Anomaly Example
Consider these two types of insertion anomalies1) Insert a new employee E9 working in department D2
ndash You have to redundantly insert the department name and manager when adding this record
2) Insert a department D4 that has no current employeesndash This insertion is not possible without creating a dummy employee id a
nd record because ENo is the primary key of the relation
ENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5
E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7
E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5
E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5
E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7
E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8
E8 J Jones 10-11-72 SA 50000 null D1 Management E8
Deletion Anomaly Example
Consider this deletion anomaly- Delete employees E3 and E6 from the database
- Deleting those two employees removes them from the database and we now have lost information about department D2
ENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5
E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7
E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5
E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5
E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7
E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8
E8 J Jones 10-11-72 SA 50000 null D1 Management E8
Modification Anomaly Example
Consider these modification anomalies
1) Change the name of department D3 to ldquoEngineeringrdquo You must update the department name in 3 different records
2) Change the manager of D1 to E4 You must update the superno field in 2 different records
ENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5
E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7
E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5
E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5
E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7
E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8
E8 J Jones 10-11-72 SA 50000 null D1 Management E8
Update Anomalies
bull There are three major types of update anomalies Insertion Anomalies - Insertion of a tuple into the r
elation either requires insertion of redundant information or cannot be performed without setting key values to NULL
Deletion Anomalies - Deletion of a tuple may lose information that is still required to be stored
Modification Anomalies - Changing an attribute of a tuple may require changing multiple attribute values in other tuples
Overview to Normalization
Normalization is a validation technique for producing relations with desirable properties
Normalization is a bottom-up design technique for producing relations
Normalization decomposes relations into smaller relations that contain less redundancy This decomposition requires reconstruction of the original relations from the smaller relations and no information is lost
It was developed by Codd in 1972 preceding ER modeling and extended by others over the years
Example Normalization to DeptEmp
bull DeptEmp Relation
ENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5
E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7
E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5
E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5
E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7
E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8
E8 J Jones 10-11-72 SA 50000 null D1 Management E8
Normalization to DeptEmp
Emp Relation Dept Relation
dno dname mgrno
D1 Management E8
D2 consulting E7
D3 Accounting E5
eno ename bdate title salary superNo dno
E1 J Doe 01-05-75 EE 30000 E2 null
E2 M Smith 06-04-66 SA 50000 E5 D3
E3 A Lee 07-05-66 ME 40000 E7 D2
E4 J Miller 09-01-50 PR 20000 E6 D3
E5 B Casey 12-25-71 SA 50000 E8 D3
E6 L Chu 11-30-65 EE 30000 E7 D2
E7 R Davis 09-08-77 ME 40000 E8 D1
E8 J Jones 10-11-72 SA 50000 null D1
Dept Emp= Emp dno Dept
Overview to Normalization
Input a relation schema and constraints on attributes
Output one or more relation schemas
Processing Idea examining the relationships between attributes and organize the data according to its functional dependencies( 函数依赖)
Significance
1) Validate our design correct our improper design
2) Help for understanding the nature and the meaning of the data in the enterprise for learning more about the data semantics
When to use normalization
Normalization can be used after mapping ER model into relation model or independently
Normalization may be especially useful for databases that have already been designed without using formal techniques
Theory basic of normalization
Functional Dependencies ( 函数依赖) Functional dependencies represent integrity constraints ( 完整
性约束) on the values of attributes in a relation and are used in normalization
A functional dependency (FD) describes the relationship between attributes in a relation We say a set of attributes X functionally determines an attribute Y if given the values of X we always know the only possible value of Y
Notation X rarr Y X functionally determines Y Y is functionally dependent on XExample
eno rarr ename eno pno rarr Duration
Notation for Functional Dependencies
A functional dependency has a left-hand side called the determinant which is a set of attributes and one attribute on the right-hand side
eno pno rarr duration
determinant determined attribute
Strictly speaking there is always only one attribute on the RHS but we can combine several functional dependencies into one
eno pno rarr durationeno pno rarr respeno pno rarr duration resp
Remember that this is really two functional dependencies
The Semantics ofFunctional Dependencies
Similar to keys Functional dependencies are a property of a relation schema (intension) and not a property of a particular instance of the schema (extension)
Functional dependencies are directional
eno rarr ename ename rarr eno
ExampleFDs in EmpDept relation
bull EmpDept RelationENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8E8 J Jones 10-11-72 SA 50000 null D1 Management E8
Eno rarr ename bdate title salary superno dno
dnorarrdname mgrno
titlerarr salary
Trivial Functional Dependencies ( 平凡函数依赖 )
A functional dependency is trivialif the attributes on its left-hand side are a superset of the attributes on its right-hand side
Examples eno rarr eno
eno ename rarr eno
eno pno duration rarr eno Duration
Trivial functional dependencies are not interesting because they do not tell us anything
We are only interested in nontrivial FDs
Full Functional Dependencies
A functional dependency A rarr B is a full functional dependency if removal of any attribute from A results in the dependency not existing any more We say that B is fully functionally dependent on A If remove an attribute from A and the functional dependency still exists we say that B is partially dependent on A
Example
eno rarr ename (full FD)
eno ename rarr salary title (partial FD - can remove ename)
eno pno rarr hours resp (full FD)
How to Identify FDs in a relation
Examining the semantics of the attributes of relation
Identify all non-trivial functional dependencies in the Universal Relation
Universal (eno pno resp duration ename bdate title salary superNo dno dname mgrNo pname budget)
Eno rarr ename bdate title salary superno dnodnorarrdname mgrnotitlerarr salary
Formal Definition ofFunctional Dependencies
Given relation R defined over U = A1 A2 An where X U Y U If for all pairs of tuples t1 and t2 in any legal instance of relation schema R
t1[X] = t2[X] implies t1[Y] = t2[Y]
then the functional dependency X rarr Y holds in R
Functional Dependencies and Keys
Functional dependencies can be used to determine the candidate keys of a relation
What is superkey candidate key primary key
For example if an attribute functionally determines all other attributes in the relation that attribute can be a key
eno rarr eno ename bdate title supereno dno eno is a candidate key for the Emp relation
Functional Dependencies and Keys
ENo rarr EName BDate Title SuperNo DNoDNo rarr DName MgreNotitle rarr salary
Question List the candidate key(s) of this relation
ENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5
E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7
E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5
E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5
E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7
E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8
E8 J Jones 10-11-72 SA 50000 null D1 Management E8
EmpDept relation
Applications of FDs
1) given a set of FDs F in relation R FD X rarr Y is true or not
2) given a set of FDs F in relation R a set of attributes X is a candidate key of R or not
3) given a set of FDs F in relation R which candidate keys are in R
4) given a set of FDs F in relation R some FDs in F is Excessive or not
Rules for reasoning about functional dependencies
Armstrongs Axioms are three inference rules for functional dependenciesLet A B C and D be subsets of the attributes of relation R
IR1 Reflexive Rule( 自反性 ) If AB then A rarr B
A set of attributes always trivially functionally determines any subset of itself
IR2 Augmentation Rule( 增加性 ) If A rarr B then AC rarr BC
Adding a set of attributes to both the LHS and RHS results in another valid functional dependency
IR3 Transitive Rule( 传递性 ) If A rarr B and B rarr C then A rarr C
Transitivity holds with FDs
Derived Rules from Armstrongs Axioms
There are three other well-known rules that can be derived from the basic three
IR4 Decomposition( 可分解性 ) If A rarr BC then A rarr B and A rarr C A FD with two attributes on RHS can be decomposed into two FDs each with on attribute on the RHS
IR5 Union( 合并性 ) If A rarr B and A rarr C then A rarr BC Reverse of decomposition rule
IR6 Composition( 组合性 ) If A rarr B and C rarr D then AC rarr BD More general than union - can combine two non-overlapping FDs into one
Characteristics of Armstrongs Axioms
The rules are sound( 健壮的 ) and complete( 完备的 )
1) Sound The rules do not infer( 推断 ) any incorrect FDs
2) Complete The rules can be used to infer all correct FDs (no FDs are missing)
The axioms (called IR1 IR2 and IR3) can be used to determine a set of all FDs F+ from an initial set of FDs F
F+ is referred to as the closure of F
Strategy of dealing with questions
The basic idea is that given any set of attributes X we can compute the set of all attributes X+ that can be functionally determined using F This is called the closure of X under F
For question 1 given a set of FDs F in R FD X rarr Y is true or not we know that X rarr Y holds if Y is in X+
For question 2 a set of attributes X is a candidate key of R or not X is a superkey of R if X+ is all attributes of R
For question 3 how to computing F+ we can compute X+ for all possible subsets X of R
Computing the Attribute Closure
The algorithm is as follows
Given a set of attributes X
Let X+ = X
Repeat Find a FD in F whose left side is a subset of X+ Add the right side of F to X+
Until (X+ does not change or X+ contains all attributes of R)
After the algorithm completes you have a set of attributes X+ that can be functionally determined by X This allows you to produce FDs of the form
X rarr A where A is in X+
Computing Attribute Closure Example
R (ABCDEG)
F = A rarr BC C rarr D D rarr G
Compute A+
A+ = A (initial step)
A+ = ABC (by using FD A rarr BC)
A+ = ABCD (by using FD C rarr D)
A+ = ABCDG (by using FD D rarr G)
Similarly we can compute C+ and EG+
C+ = CDG
EG+ = EG
Using Attribute Closure to Detect a FD
The attribute closure algorithm can be used to determine if a particular FD X rarr Y holds based on the set of given FDs F Compute X+ and see if Y is in X+
Example
R = (ABCDEG)
F = A rarr BC C rarr D D rarr G
Does the FD CD rarr G hold Compute CD+ = CDG Yes the FD CD rarr G holds
Does the FD BC rarr E hold Compute BC+ = BCDG No the FD BC rarr E does n
ot hold
Using Attribute Closure to Find all FDs
The attribute closure algorithm can be used to find all FDs of a given relation R if perform the attribute closure algorithm for all subsets X of the attributes R
This is an exponential algorithm in the number of attributes in R Although there are a few optimizations Do not compute the closure of the empty set or of the set of all attributes If we find X+ = all attributes do not compute the closure of any supersets of X Note that even though you do not have to compute the closure (because you know the answer) you still must use the known closure to infer new FDs
Using Attribute Closureto Find all FDs Example
Compute F+ from F where
R (ABCD)
F = A rarr BC C rarr D
Must compute all these attribute closures
A+ B+ C+ D+
AB+ AC+ AD+ BC+ BD+ CD+
ABC+ ABD+ ACD+ BCD+
Do not have to compute ABCD+
Start with the single sets
A+ = ABCD B+ = B C+ = CD D+ = D
New FD A rarr D
Using Attribute Closureto Find all FDs Example (cont)
Next the two element sets Do not compute AB+ AC+ AD+ because A+ contained all attributes of R New FDs AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C BC+ = BCD BD+ = BD CD+ = CD New FD BC rarr DThree element sets Do not compute ABC+ ABD+ ACD+ New FDs ABC rarr D ABD rarr C ACD rarr B BCD+ = BCD No new FDsAll FDs F+ = A rarr BC C rarr D A rarr D BC rarr D AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C ABC rarr D ABD rarrC ACD rarr B
FD Closure Questions
1) Compute all nontrivial FDs and list all keys of R
R = (ABCD)
F = AB rarr C C rarr D D rarr A
2) Compute all nontrivial FDs and list all candidate keys of R
R = (ABCD)
F = AB rarr C BC rarr D CD rarr A AD rarr B
Equivalent Sets of FDs
A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+ We can determine if F covers E by calculating X+ with respect to F for each FD X rarr Y in E and checking whether X+ includes Y
Two sets of FDs E and F are equivalent if E covers F and F covers E That is E+= F+
Minimal Set of FDs
A set of FDs F is minimal if it satisfies these conditions Every FD in F has a single attribute on its right-hand side We cannot replace any FD X rarr A in F with Y rarr A where Y
X and still have a set of dependencies that is equivalent to F
We cannot remove any FD and still have a set that is equivalent to F
A minimal sets of FDs is like a canonical( 规范的 ) form of FDs
Note that there may be many minimal covers for a given set of functional dependencies F
Normal Forms
A relation is in a particular normal form if it satisfies certain
normalization properties
There are several normal forms defined 1NF - First Normal Form 2NF - Second Normal Form 3NF - Third Normal Form BCNF - Boyce-Codd Normal Form 4NF - Fourth Normal Form 5NF - Fifth Normal Form
Each of these normal forms are stricter than the next
For example 3NF is better than 2NF because it removes more redundancyanomalies from the schema than 2NF
Normal Forms
All relations (non-normalized)
1NF (First Normal Form)
2NF (Second Normal Form)
3NF (Third Normal Form)
BCNF (Boyce-Codd NF)
4NF (Fourth NF)
5NF (Fifth NF)
First Normal Form (1NF)
A relation is in first normal form (1NF) if all its attribute values are atomic
That is a 1NF relation cannot have an attribute value that is a set of values (multi-valued attribute) a set of tuples (nested relation)
1NF is a standard assumption in relational DBMSs However object-oriented DBMSs and nested relational DBMSs relax t
his constraint
A relation that is not in 1NF is an unnormalized relation
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1P2
AnalystAnalyst
246
E3 A Lee P3P4
ConsultantEngineer
1048
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1 P2 Analyst Analyst 24 6
E3 A Lee P3P4 Consultant Engineer 10 48
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
A non-1NF Relation
One way to convert a non-1NF relation to a 1NF relation
1) Splitting Method - Divide the existing relation into two relations one with non-repeating attributes and the other with repeating attributes
Make a new relation consisting of the primary key of the original relation and the repeating attributes Determine a primary key for this new relation
Remove the repeating attributes from the original relation
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1P2
AnalystAnalyst
246
E3 A Lee P3P4
ConsultantEngineer
1048
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
ENo EName
E1 J Doe
E2 M Smith
E3 A Lee
E4 J Miller
E5 B Casey
E6 L Chu
E7 R Davis
ENo PNo Resp DurationE1 P1 Manager 12E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36
Second Normal Form (2NF)
A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key
If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation
Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key
Second Normal Form (2NF) Example
fd1 and fd4 are partial functional dependencies Normalize to
Emp (eno ename title bdate salary supereno dno)
WorksOn (eno pno resp hours)
Proj (pno pname budget)
ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration
EmpProj relation
fd1 fd2
fd3fd4
Second Normal Form (2NF) Example (2)
ENo EName
BDate Title Salary SuperNo
DNo
Emp relation
fd1
fd2
fd3fd4
PNo PName Budget
Proj relationENo PNo Resp Duration
WorkOn relation
Third Normal Form (3NF)
Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th
an 2 FDs
A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key
Converting a relation to 3NF from 2NF involves the removal of transitive dependencies
Third Normal Form (3NF) Example
fd2 results in a transitive dependency eno rarr salary Remove it
ENo EName BDate Title Salary SuperNo DNo
Emp relation
fd1
fd2
fd2
Title Salary
Pay relation
ENo EName BDate Title SuperNo DNo
Emp relation
fd1
General Definitions of 2NF and 3NF
We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)
General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key
General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key
Note that a prime attribute is an attribute that is in any key (candidate or primary)
General Definition of 3NF Example
The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute
However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key
Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key
ENo EName BDate Title Salary SuperNo DNo SSN
Emp relation
fd1
fd2
Normalization Question
Consider the universal relation
R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ
List the keys for R
Decompose R into 2NF and then 3NF relations
Boyce-Codd Normal Form (BCNF)
A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key
To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key
The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
ClientInterview relation
Relation is in 3NF but not BCNF
fd4
fd3
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
Note that we lose the FD fd2
fd4
fd3
ClientNo InterviewDate
InterviwTime
StaffNo StaffNo InterviwDate
RoomNo
Interview relation StaffRoom relation
BCNF versus 3NF
We can decompose to BCNF but sometimes we do not want to if we lose a FD
The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency
Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)
In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)
Conversion to BCNF
There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y
Note that if any FD in F+ violates BCNF then a FD in F violates BCNF
2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey
3) Replace R by relations with schemas R1 = X+
R2 = (R ndash X+) U X
4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib
utes are all in R1 or R2
Normalization Question
Given this database schema normalize into BCNF
fd1
PID Country Lot Area Price TaxRate
fd2
fd4fd3
Multi-Valued Dependencies
A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema
A MVD occurs when two independent 1N relationships are in the relational schema
Multi-Valued Dependencies Example
Eno ename Title Salary Pno dno
E1 J Doe EE 30000 P1P2
E2 M Smith SA 50000 P1P4 D1D2D3
E3 A Lee ME 40000 P1P3 D2D3
E4 J Miller PR 20000 P3 D3
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1
E1 J Doe EE 30000 P2
E2 M Smith SA 50000 P1 D1
E2 M Smith SA 50000 P1 D2
E2 M Smith SA 50000 P1 D3
E2 M Smith SA 50000 P4 D1
E2 M Smith SA 50000 P4 D2
E2 M Smith SA 50000 P4 D3
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P3 D3
E3 A Lee ME 40000 P3 D3
E4 J Miller PR 20000 P3 D3
Multi-Valued Dependencies (MVDs)
A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C
A trivial MVD A mdashgtgt B occurs when either B is a subset of A or
A cup B = R
Fourth Normal Form (4NF)
Fourth normal form (4NF) is based on the idea of multi-valued dependencies
A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies
Fourth Normal Form (4NF) Example
eno mdashgtgtpno
eno mdashgtgtdno
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3
eno ename title salary
E1 J Doe EE 30000
E2 MSmith SA 50000
E3 A Lee ME 40000
E4 J Miller PR 20000
eno pno
E1 P1
E1 P2
E2 P1
E2 P4
E3 P1
E3 P3
E4 P3
eno dno
E2 D1
E2 D2
E2 D3
E3 D2
E3 D3
E4 D3
Lossless-join Dependency
The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation
A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined
Fifth Normal Form (5NF)
bull Fifth normal form (5NF) is based on join dependencies
A relation is in fifth normal form (5NF) if the relation has no join dependency
bull 5NF is also called project-join normal form (PJNF)
A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R
The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have
ΠR1(r) ΠR2(r) hellip ΠRn(r) = r
Fifth Normal Form (5NF) Example
Consider a relation Supply (sname partName projName)
1048715Add the additional constraint that
If project j requires part p
and supplier s supplies part p
and supplier s supplies at least one item to project j
Then supplier s also supplies part p to project j
Original Supply Relation Supply Relation with new constraintsname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
Fifth Normal Form (5NF) Example
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
sname partName
Smith Bolt
Smith Nut
Adam Bolt
Walton Nut
Adam Nail
sname projName
Smith X
Smith Y
Adam Y
Walton Z
Adam X
partName projName
Bolt X
Nut Y
Bolt Y
Nut Z
Nail X
Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples
Desirable Properties in normalization
Relational schemas that are well-designed have several
important properties
1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or
relationship
2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins
3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations
Normal Forms in Practice
Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used
For example query execution time may increase because of normalization as more joins become necessary to answer queries
Normal Forms in Practice Example
For example street and city uniquely determine a zipcode
In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed
When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly
fd1
ENo City Street ZipCode
fd2
ENo ZipCode
ZipCode City Street
Normal Forms in Practice Summary
In summary normalization is typically used in two ways
To improve on relational schemas after ER design
To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies
There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance
This is the classic time versus space tradeoff
Conclusion
Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies
Normal forms indicate when relations satisfy certain properties
1NF - All attributes are atomic
2NF - All attributes are fully functionally dependent on a key
3NF - There are no transitive dependencies in the relation
4NF - There are no multi-valued dependencies in the relation
5NF - The relation has no join dependency
In practice normalization is used to improve schemas produced after ER design and existing relational schemas
It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time
Objectives
Explain process of normalization and why it is beneficial
Describe the concept of the Universal Relation
What are the motivations for normalization
What are the 3 types of update anomalies Be able to give examples
Describe the lossless-join property and dependency preservation property
Define and explain the concept of a functional dependency
What is a trivial FD
Describe the relationship between FDs and keys
What are Armstrongs Axioms When are they used
What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs
Objectives (2)
Define minimal and covers for a set of FDs
Define normal forms be able to list the normal forms and draw a diagram showing their relationship
Define 1NF Know how to convert non-1NF relation to 1NF
Define 2NF Convert 1NF relation to 2NF
Define 3NF Convert 2NF relation to 3NF
Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example
Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm
Describe multi-valued dependencies When do they occur
What is a trivial MVD
Define 4NF Convert 3NFBCNF relation to 4NF
Objectives (3)
Describe lossless-join property and how it relates to 5NF
Define 5NF
Explain normalization in terms of the time versus space tradeoff
Explain the relationship between normalization and ER modeling
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Slide 45
- Slide 46
- Slide 47
- Slide 48
- Slide 49
- Slide 50
- Slide 51
- Slide 52
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- Slide 60
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
-
Insertion Anomaly Example
Consider these two types of insertion anomalies1) Insert a new employee E9 working in department D2
ndash You have to redundantly insert the department name and manager when adding this record
2) Insert a department D4 that has no current employeesndash This insertion is not possible without creating a dummy employee id a
nd record because ENo is the primary key of the relation
ENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5
E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7
E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5
E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5
E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7
E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8
E8 J Jones 10-11-72 SA 50000 null D1 Management E8
Deletion Anomaly Example
Consider this deletion anomaly- Delete employees E3 and E6 from the database
- Deleting those two employees removes them from the database and we now have lost information about department D2
ENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5
E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7
E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5
E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5
E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7
E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8
E8 J Jones 10-11-72 SA 50000 null D1 Management E8
Modification Anomaly Example
Consider these modification anomalies
1) Change the name of department D3 to ldquoEngineeringrdquo You must update the department name in 3 different records
2) Change the manager of D1 to E4 You must update the superno field in 2 different records
ENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5
E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7
E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5
E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5
E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7
E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8
E8 J Jones 10-11-72 SA 50000 null D1 Management E8
Update Anomalies
bull There are three major types of update anomalies Insertion Anomalies - Insertion of a tuple into the r
elation either requires insertion of redundant information or cannot be performed without setting key values to NULL
Deletion Anomalies - Deletion of a tuple may lose information that is still required to be stored
Modification Anomalies - Changing an attribute of a tuple may require changing multiple attribute values in other tuples
Overview to Normalization
Normalization is a validation technique for producing relations with desirable properties
Normalization is a bottom-up design technique for producing relations
Normalization decomposes relations into smaller relations that contain less redundancy This decomposition requires reconstruction of the original relations from the smaller relations and no information is lost
It was developed by Codd in 1972 preceding ER modeling and extended by others over the years
Example Normalization to DeptEmp
bull DeptEmp Relation
ENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5
E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7
E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5
E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5
E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7
E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8
E8 J Jones 10-11-72 SA 50000 null D1 Management E8
Normalization to DeptEmp
Emp Relation Dept Relation
dno dname mgrno
D1 Management E8
D2 consulting E7
D3 Accounting E5
eno ename bdate title salary superNo dno
E1 J Doe 01-05-75 EE 30000 E2 null
E2 M Smith 06-04-66 SA 50000 E5 D3
E3 A Lee 07-05-66 ME 40000 E7 D2
E4 J Miller 09-01-50 PR 20000 E6 D3
E5 B Casey 12-25-71 SA 50000 E8 D3
E6 L Chu 11-30-65 EE 30000 E7 D2
E7 R Davis 09-08-77 ME 40000 E8 D1
E8 J Jones 10-11-72 SA 50000 null D1
Dept Emp= Emp dno Dept
Overview to Normalization
Input a relation schema and constraints on attributes
Output one or more relation schemas
Processing Idea examining the relationships between attributes and organize the data according to its functional dependencies( 函数依赖)
Significance
1) Validate our design correct our improper design
2) Help for understanding the nature and the meaning of the data in the enterprise for learning more about the data semantics
When to use normalization
Normalization can be used after mapping ER model into relation model or independently
Normalization may be especially useful for databases that have already been designed without using formal techniques
Theory basic of normalization
Functional Dependencies ( 函数依赖) Functional dependencies represent integrity constraints ( 完整
性约束) on the values of attributes in a relation and are used in normalization
A functional dependency (FD) describes the relationship between attributes in a relation We say a set of attributes X functionally determines an attribute Y if given the values of X we always know the only possible value of Y
Notation X rarr Y X functionally determines Y Y is functionally dependent on XExample
eno rarr ename eno pno rarr Duration
Notation for Functional Dependencies
A functional dependency has a left-hand side called the determinant which is a set of attributes and one attribute on the right-hand side
eno pno rarr duration
determinant determined attribute
Strictly speaking there is always only one attribute on the RHS but we can combine several functional dependencies into one
eno pno rarr durationeno pno rarr respeno pno rarr duration resp
Remember that this is really two functional dependencies
The Semantics ofFunctional Dependencies
Similar to keys Functional dependencies are a property of a relation schema (intension) and not a property of a particular instance of the schema (extension)
Functional dependencies are directional
eno rarr ename ename rarr eno
ExampleFDs in EmpDept relation
bull EmpDept RelationENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8E8 J Jones 10-11-72 SA 50000 null D1 Management E8
Eno rarr ename bdate title salary superno dno
dnorarrdname mgrno
titlerarr salary
Trivial Functional Dependencies ( 平凡函数依赖 )
A functional dependency is trivialif the attributes on its left-hand side are a superset of the attributes on its right-hand side
Examples eno rarr eno
eno ename rarr eno
eno pno duration rarr eno Duration
Trivial functional dependencies are not interesting because they do not tell us anything
We are only interested in nontrivial FDs
Full Functional Dependencies
A functional dependency A rarr B is a full functional dependency if removal of any attribute from A results in the dependency not existing any more We say that B is fully functionally dependent on A If remove an attribute from A and the functional dependency still exists we say that B is partially dependent on A
Example
eno rarr ename (full FD)
eno ename rarr salary title (partial FD - can remove ename)
eno pno rarr hours resp (full FD)
How to Identify FDs in a relation
Examining the semantics of the attributes of relation
Identify all non-trivial functional dependencies in the Universal Relation
Universal (eno pno resp duration ename bdate title salary superNo dno dname mgrNo pname budget)
Eno rarr ename bdate title salary superno dnodnorarrdname mgrnotitlerarr salary
Formal Definition ofFunctional Dependencies
Given relation R defined over U = A1 A2 An where X U Y U If for all pairs of tuples t1 and t2 in any legal instance of relation schema R
t1[X] = t2[X] implies t1[Y] = t2[Y]
then the functional dependency X rarr Y holds in R
Functional Dependencies and Keys
Functional dependencies can be used to determine the candidate keys of a relation
What is superkey candidate key primary key
For example if an attribute functionally determines all other attributes in the relation that attribute can be a key
eno rarr eno ename bdate title supereno dno eno is a candidate key for the Emp relation
Functional Dependencies and Keys
ENo rarr EName BDate Title SuperNo DNoDNo rarr DName MgreNotitle rarr salary
Question List the candidate key(s) of this relation
ENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5
E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7
E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5
E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5
E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7
E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8
E8 J Jones 10-11-72 SA 50000 null D1 Management E8
EmpDept relation
Applications of FDs
1) given a set of FDs F in relation R FD X rarr Y is true or not
2) given a set of FDs F in relation R a set of attributes X is a candidate key of R or not
3) given a set of FDs F in relation R which candidate keys are in R
4) given a set of FDs F in relation R some FDs in F is Excessive or not
Rules for reasoning about functional dependencies
Armstrongs Axioms are three inference rules for functional dependenciesLet A B C and D be subsets of the attributes of relation R
IR1 Reflexive Rule( 自反性 ) If AB then A rarr B
A set of attributes always trivially functionally determines any subset of itself
IR2 Augmentation Rule( 增加性 ) If A rarr B then AC rarr BC
Adding a set of attributes to both the LHS and RHS results in another valid functional dependency
IR3 Transitive Rule( 传递性 ) If A rarr B and B rarr C then A rarr C
Transitivity holds with FDs
Derived Rules from Armstrongs Axioms
There are three other well-known rules that can be derived from the basic three
IR4 Decomposition( 可分解性 ) If A rarr BC then A rarr B and A rarr C A FD with two attributes on RHS can be decomposed into two FDs each with on attribute on the RHS
IR5 Union( 合并性 ) If A rarr B and A rarr C then A rarr BC Reverse of decomposition rule
IR6 Composition( 组合性 ) If A rarr B and C rarr D then AC rarr BD More general than union - can combine two non-overlapping FDs into one
Characteristics of Armstrongs Axioms
The rules are sound( 健壮的 ) and complete( 完备的 )
1) Sound The rules do not infer( 推断 ) any incorrect FDs
2) Complete The rules can be used to infer all correct FDs (no FDs are missing)
The axioms (called IR1 IR2 and IR3) can be used to determine a set of all FDs F+ from an initial set of FDs F
F+ is referred to as the closure of F
Strategy of dealing with questions
The basic idea is that given any set of attributes X we can compute the set of all attributes X+ that can be functionally determined using F This is called the closure of X under F
For question 1 given a set of FDs F in R FD X rarr Y is true or not we know that X rarr Y holds if Y is in X+
For question 2 a set of attributes X is a candidate key of R or not X is a superkey of R if X+ is all attributes of R
For question 3 how to computing F+ we can compute X+ for all possible subsets X of R
Computing the Attribute Closure
The algorithm is as follows
Given a set of attributes X
Let X+ = X
Repeat Find a FD in F whose left side is a subset of X+ Add the right side of F to X+
Until (X+ does not change or X+ contains all attributes of R)
After the algorithm completes you have a set of attributes X+ that can be functionally determined by X This allows you to produce FDs of the form
X rarr A where A is in X+
Computing Attribute Closure Example
R (ABCDEG)
F = A rarr BC C rarr D D rarr G
Compute A+
A+ = A (initial step)
A+ = ABC (by using FD A rarr BC)
A+ = ABCD (by using FD C rarr D)
A+ = ABCDG (by using FD D rarr G)
Similarly we can compute C+ and EG+
C+ = CDG
EG+ = EG
Using Attribute Closure to Detect a FD
The attribute closure algorithm can be used to determine if a particular FD X rarr Y holds based on the set of given FDs F Compute X+ and see if Y is in X+
Example
R = (ABCDEG)
F = A rarr BC C rarr D D rarr G
Does the FD CD rarr G hold Compute CD+ = CDG Yes the FD CD rarr G holds
Does the FD BC rarr E hold Compute BC+ = BCDG No the FD BC rarr E does n
ot hold
Using Attribute Closure to Find all FDs
The attribute closure algorithm can be used to find all FDs of a given relation R if perform the attribute closure algorithm for all subsets X of the attributes R
This is an exponential algorithm in the number of attributes in R Although there are a few optimizations Do not compute the closure of the empty set or of the set of all attributes If we find X+ = all attributes do not compute the closure of any supersets of X Note that even though you do not have to compute the closure (because you know the answer) you still must use the known closure to infer new FDs
Using Attribute Closureto Find all FDs Example
Compute F+ from F where
R (ABCD)
F = A rarr BC C rarr D
Must compute all these attribute closures
A+ B+ C+ D+
AB+ AC+ AD+ BC+ BD+ CD+
ABC+ ABD+ ACD+ BCD+
Do not have to compute ABCD+
Start with the single sets
A+ = ABCD B+ = B C+ = CD D+ = D
New FD A rarr D
Using Attribute Closureto Find all FDs Example (cont)
Next the two element sets Do not compute AB+ AC+ AD+ because A+ contained all attributes of R New FDs AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C BC+ = BCD BD+ = BD CD+ = CD New FD BC rarr DThree element sets Do not compute ABC+ ABD+ ACD+ New FDs ABC rarr D ABD rarr C ACD rarr B BCD+ = BCD No new FDsAll FDs F+ = A rarr BC C rarr D A rarr D BC rarr D AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C ABC rarr D ABD rarrC ACD rarr B
FD Closure Questions
1) Compute all nontrivial FDs and list all keys of R
R = (ABCD)
F = AB rarr C C rarr D D rarr A
2) Compute all nontrivial FDs and list all candidate keys of R
R = (ABCD)
F = AB rarr C BC rarr D CD rarr A AD rarr B
Equivalent Sets of FDs
A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+ We can determine if F covers E by calculating X+ with respect to F for each FD X rarr Y in E and checking whether X+ includes Y
Two sets of FDs E and F are equivalent if E covers F and F covers E That is E+= F+
Minimal Set of FDs
A set of FDs F is minimal if it satisfies these conditions Every FD in F has a single attribute on its right-hand side We cannot replace any FD X rarr A in F with Y rarr A where Y
X and still have a set of dependencies that is equivalent to F
We cannot remove any FD and still have a set that is equivalent to F
A minimal sets of FDs is like a canonical( 规范的 ) form of FDs
Note that there may be many minimal covers for a given set of functional dependencies F
Normal Forms
A relation is in a particular normal form if it satisfies certain
normalization properties
There are several normal forms defined 1NF - First Normal Form 2NF - Second Normal Form 3NF - Third Normal Form BCNF - Boyce-Codd Normal Form 4NF - Fourth Normal Form 5NF - Fifth Normal Form
Each of these normal forms are stricter than the next
For example 3NF is better than 2NF because it removes more redundancyanomalies from the schema than 2NF
Normal Forms
All relations (non-normalized)
1NF (First Normal Form)
2NF (Second Normal Form)
3NF (Third Normal Form)
BCNF (Boyce-Codd NF)
4NF (Fourth NF)
5NF (Fifth NF)
First Normal Form (1NF)
A relation is in first normal form (1NF) if all its attribute values are atomic
That is a 1NF relation cannot have an attribute value that is a set of values (multi-valued attribute) a set of tuples (nested relation)
1NF is a standard assumption in relational DBMSs However object-oriented DBMSs and nested relational DBMSs relax t
his constraint
A relation that is not in 1NF is an unnormalized relation
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1P2
AnalystAnalyst
246
E3 A Lee P3P4
ConsultantEngineer
1048
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1 P2 Analyst Analyst 24 6
E3 A Lee P3P4 Consultant Engineer 10 48
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
A non-1NF Relation
One way to convert a non-1NF relation to a 1NF relation
1) Splitting Method - Divide the existing relation into two relations one with non-repeating attributes and the other with repeating attributes
Make a new relation consisting of the primary key of the original relation and the repeating attributes Determine a primary key for this new relation
Remove the repeating attributes from the original relation
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1P2
AnalystAnalyst
246
E3 A Lee P3P4
ConsultantEngineer
1048
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
ENo EName
E1 J Doe
E2 M Smith
E3 A Lee
E4 J Miller
E5 B Casey
E6 L Chu
E7 R Davis
ENo PNo Resp DurationE1 P1 Manager 12E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36
Second Normal Form (2NF)
A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key
If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation
Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key
Second Normal Form (2NF) Example
fd1 and fd4 are partial functional dependencies Normalize to
Emp (eno ename title bdate salary supereno dno)
WorksOn (eno pno resp hours)
Proj (pno pname budget)
ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration
EmpProj relation
fd1 fd2
fd3fd4
Second Normal Form (2NF) Example (2)
ENo EName
BDate Title Salary SuperNo
DNo
Emp relation
fd1
fd2
fd3fd4
PNo PName Budget
Proj relationENo PNo Resp Duration
WorkOn relation
Third Normal Form (3NF)
Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th
an 2 FDs
A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key
Converting a relation to 3NF from 2NF involves the removal of transitive dependencies
Third Normal Form (3NF) Example
fd2 results in a transitive dependency eno rarr salary Remove it
ENo EName BDate Title Salary SuperNo DNo
Emp relation
fd1
fd2
fd2
Title Salary
Pay relation
ENo EName BDate Title SuperNo DNo
Emp relation
fd1
General Definitions of 2NF and 3NF
We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)
General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key
General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key
Note that a prime attribute is an attribute that is in any key (candidate or primary)
General Definition of 3NF Example
The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute
However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key
Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key
ENo EName BDate Title Salary SuperNo DNo SSN
Emp relation
fd1
fd2
Normalization Question
Consider the universal relation
R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ
List the keys for R
Decompose R into 2NF and then 3NF relations
Boyce-Codd Normal Form (BCNF)
A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key
To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key
The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
ClientInterview relation
Relation is in 3NF but not BCNF
fd4
fd3
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
Note that we lose the FD fd2
fd4
fd3
ClientNo InterviewDate
InterviwTime
StaffNo StaffNo InterviwDate
RoomNo
Interview relation StaffRoom relation
BCNF versus 3NF
We can decompose to BCNF but sometimes we do not want to if we lose a FD
The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency
Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)
In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)
Conversion to BCNF
There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y
Note that if any FD in F+ violates BCNF then a FD in F violates BCNF
2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey
3) Replace R by relations with schemas R1 = X+
R2 = (R ndash X+) U X
4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib
utes are all in R1 or R2
Normalization Question
Given this database schema normalize into BCNF
fd1
PID Country Lot Area Price TaxRate
fd2
fd4fd3
Multi-Valued Dependencies
A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema
A MVD occurs when two independent 1N relationships are in the relational schema
Multi-Valued Dependencies Example
Eno ename Title Salary Pno dno
E1 J Doe EE 30000 P1P2
E2 M Smith SA 50000 P1P4 D1D2D3
E3 A Lee ME 40000 P1P3 D2D3
E4 J Miller PR 20000 P3 D3
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1
E1 J Doe EE 30000 P2
E2 M Smith SA 50000 P1 D1
E2 M Smith SA 50000 P1 D2
E2 M Smith SA 50000 P1 D3
E2 M Smith SA 50000 P4 D1
E2 M Smith SA 50000 P4 D2
E2 M Smith SA 50000 P4 D3
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P3 D3
E3 A Lee ME 40000 P3 D3
E4 J Miller PR 20000 P3 D3
Multi-Valued Dependencies (MVDs)
A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C
A trivial MVD A mdashgtgt B occurs when either B is a subset of A or
A cup B = R
Fourth Normal Form (4NF)
Fourth normal form (4NF) is based on the idea of multi-valued dependencies
A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies
Fourth Normal Form (4NF) Example
eno mdashgtgtpno
eno mdashgtgtdno
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3
eno ename title salary
E1 J Doe EE 30000
E2 MSmith SA 50000
E3 A Lee ME 40000
E4 J Miller PR 20000
eno pno
E1 P1
E1 P2
E2 P1
E2 P4
E3 P1
E3 P3
E4 P3
eno dno
E2 D1
E2 D2
E2 D3
E3 D2
E3 D3
E4 D3
Lossless-join Dependency
The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation
A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined
Fifth Normal Form (5NF)
bull Fifth normal form (5NF) is based on join dependencies
A relation is in fifth normal form (5NF) if the relation has no join dependency
bull 5NF is also called project-join normal form (PJNF)
A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R
The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have
ΠR1(r) ΠR2(r) hellip ΠRn(r) = r
Fifth Normal Form (5NF) Example
Consider a relation Supply (sname partName projName)
1048715Add the additional constraint that
If project j requires part p
and supplier s supplies part p
and supplier s supplies at least one item to project j
Then supplier s also supplies part p to project j
Original Supply Relation Supply Relation with new constraintsname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
Fifth Normal Form (5NF) Example
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
sname partName
Smith Bolt
Smith Nut
Adam Bolt
Walton Nut
Adam Nail
sname projName
Smith X
Smith Y
Adam Y
Walton Z
Adam X
partName projName
Bolt X
Nut Y
Bolt Y
Nut Z
Nail X
Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples
Desirable Properties in normalization
Relational schemas that are well-designed have several
important properties
1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or
relationship
2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins
3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations
Normal Forms in Practice
Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used
For example query execution time may increase because of normalization as more joins become necessary to answer queries
Normal Forms in Practice Example
For example street and city uniquely determine a zipcode
In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed
When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly
fd1
ENo City Street ZipCode
fd2
ENo ZipCode
ZipCode City Street
Normal Forms in Practice Summary
In summary normalization is typically used in two ways
To improve on relational schemas after ER design
To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies
There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance
This is the classic time versus space tradeoff
Conclusion
Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies
Normal forms indicate when relations satisfy certain properties
1NF - All attributes are atomic
2NF - All attributes are fully functionally dependent on a key
3NF - There are no transitive dependencies in the relation
4NF - There are no multi-valued dependencies in the relation
5NF - The relation has no join dependency
In practice normalization is used to improve schemas produced after ER design and existing relational schemas
It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time
Objectives
Explain process of normalization and why it is beneficial
Describe the concept of the Universal Relation
What are the motivations for normalization
What are the 3 types of update anomalies Be able to give examples
Describe the lossless-join property and dependency preservation property
Define and explain the concept of a functional dependency
What is a trivial FD
Describe the relationship between FDs and keys
What are Armstrongs Axioms When are they used
What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs
Objectives (2)
Define minimal and covers for a set of FDs
Define normal forms be able to list the normal forms and draw a diagram showing their relationship
Define 1NF Know how to convert non-1NF relation to 1NF
Define 2NF Convert 1NF relation to 2NF
Define 3NF Convert 2NF relation to 3NF
Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example
Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm
Describe multi-valued dependencies When do they occur
What is a trivial MVD
Define 4NF Convert 3NFBCNF relation to 4NF
Objectives (3)
Describe lossless-join property and how it relates to 5NF
Define 5NF
Explain normalization in terms of the time versus space tradeoff
Explain the relationship between normalization and ER modeling
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Slide 45
- Slide 46
- Slide 47
- Slide 48
- Slide 49
- Slide 50
- Slide 51
- Slide 52
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- Slide 60
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
-
Deletion Anomaly Example
Consider this deletion anomaly- Delete employees E3 and E6 from the database
- Deleting those two employees removes them from the database and we now have lost information about department D2
ENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5
E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7
E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5
E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5
E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7
E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8
E8 J Jones 10-11-72 SA 50000 null D1 Management E8
Modification Anomaly Example
Consider these modification anomalies
1) Change the name of department D3 to ldquoEngineeringrdquo You must update the department name in 3 different records
2) Change the manager of D1 to E4 You must update the superno field in 2 different records
ENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5
E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7
E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5
E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5
E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7
E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8
E8 J Jones 10-11-72 SA 50000 null D1 Management E8
Update Anomalies
bull There are three major types of update anomalies Insertion Anomalies - Insertion of a tuple into the r
elation either requires insertion of redundant information or cannot be performed without setting key values to NULL
Deletion Anomalies - Deletion of a tuple may lose information that is still required to be stored
Modification Anomalies - Changing an attribute of a tuple may require changing multiple attribute values in other tuples
Overview to Normalization
Normalization is a validation technique for producing relations with desirable properties
Normalization is a bottom-up design technique for producing relations
Normalization decomposes relations into smaller relations that contain less redundancy This decomposition requires reconstruction of the original relations from the smaller relations and no information is lost
It was developed by Codd in 1972 preceding ER modeling and extended by others over the years
Example Normalization to DeptEmp
bull DeptEmp Relation
ENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5
E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7
E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5
E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5
E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7
E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8
E8 J Jones 10-11-72 SA 50000 null D1 Management E8
Normalization to DeptEmp
Emp Relation Dept Relation
dno dname mgrno
D1 Management E8
D2 consulting E7
D3 Accounting E5
eno ename bdate title salary superNo dno
E1 J Doe 01-05-75 EE 30000 E2 null
E2 M Smith 06-04-66 SA 50000 E5 D3
E3 A Lee 07-05-66 ME 40000 E7 D2
E4 J Miller 09-01-50 PR 20000 E6 D3
E5 B Casey 12-25-71 SA 50000 E8 D3
E6 L Chu 11-30-65 EE 30000 E7 D2
E7 R Davis 09-08-77 ME 40000 E8 D1
E8 J Jones 10-11-72 SA 50000 null D1
Dept Emp= Emp dno Dept
Overview to Normalization
Input a relation schema and constraints on attributes
Output one or more relation schemas
Processing Idea examining the relationships between attributes and organize the data according to its functional dependencies( 函数依赖)
Significance
1) Validate our design correct our improper design
2) Help for understanding the nature and the meaning of the data in the enterprise for learning more about the data semantics
When to use normalization
Normalization can be used after mapping ER model into relation model or independently
Normalization may be especially useful for databases that have already been designed without using formal techniques
Theory basic of normalization
Functional Dependencies ( 函数依赖) Functional dependencies represent integrity constraints ( 完整
性约束) on the values of attributes in a relation and are used in normalization
A functional dependency (FD) describes the relationship between attributes in a relation We say a set of attributes X functionally determines an attribute Y if given the values of X we always know the only possible value of Y
Notation X rarr Y X functionally determines Y Y is functionally dependent on XExample
eno rarr ename eno pno rarr Duration
Notation for Functional Dependencies
A functional dependency has a left-hand side called the determinant which is a set of attributes and one attribute on the right-hand side
eno pno rarr duration
determinant determined attribute
Strictly speaking there is always only one attribute on the RHS but we can combine several functional dependencies into one
eno pno rarr durationeno pno rarr respeno pno rarr duration resp
Remember that this is really two functional dependencies
The Semantics ofFunctional Dependencies
Similar to keys Functional dependencies are a property of a relation schema (intension) and not a property of a particular instance of the schema (extension)
Functional dependencies are directional
eno rarr ename ename rarr eno
ExampleFDs in EmpDept relation
bull EmpDept RelationENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8E8 J Jones 10-11-72 SA 50000 null D1 Management E8
Eno rarr ename bdate title salary superno dno
dnorarrdname mgrno
titlerarr salary
Trivial Functional Dependencies ( 平凡函数依赖 )
A functional dependency is trivialif the attributes on its left-hand side are a superset of the attributes on its right-hand side
Examples eno rarr eno
eno ename rarr eno
eno pno duration rarr eno Duration
Trivial functional dependencies are not interesting because they do not tell us anything
We are only interested in nontrivial FDs
Full Functional Dependencies
A functional dependency A rarr B is a full functional dependency if removal of any attribute from A results in the dependency not existing any more We say that B is fully functionally dependent on A If remove an attribute from A and the functional dependency still exists we say that B is partially dependent on A
Example
eno rarr ename (full FD)
eno ename rarr salary title (partial FD - can remove ename)
eno pno rarr hours resp (full FD)
How to Identify FDs in a relation
Examining the semantics of the attributes of relation
Identify all non-trivial functional dependencies in the Universal Relation
Universal (eno pno resp duration ename bdate title salary superNo dno dname mgrNo pname budget)
Eno rarr ename bdate title salary superno dnodnorarrdname mgrnotitlerarr salary
Formal Definition ofFunctional Dependencies
Given relation R defined over U = A1 A2 An where X U Y U If for all pairs of tuples t1 and t2 in any legal instance of relation schema R
t1[X] = t2[X] implies t1[Y] = t2[Y]
then the functional dependency X rarr Y holds in R
Functional Dependencies and Keys
Functional dependencies can be used to determine the candidate keys of a relation
What is superkey candidate key primary key
For example if an attribute functionally determines all other attributes in the relation that attribute can be a key
eno rarr eno ename bdate title supereno dno eno is a candidate key for the Emp relation
Functional Dependencies and Keys
ENo rarr EName BDate Title SuperNo DNoDNo rarr DName MgreNotitle rarr salary
Question List the candidate key(s) of this relation
ENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5
E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7
E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5
E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5
E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7
E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8
E8 J Jones 10-11-72 SA 50000 null D1 Management E8
EmpDept relation
Applications of FDs
1) given a set of FDs F in relation R FD X rarr Y is true or not
2) given a set of FDs F in relation R a set of attributes X is a candidate key of R or not
3) given a set of FDs F in relation R which candidate keys are in R
4) given a set of FDs F in relation R some FDs in F is Excessive or not
Rules for reasoning about functional dependencies
Armstrongs Axioms are three inference rules for functional dependenciesLet A B C and D be subsets of the attributes of relation R
IR1 Reflexive Rule( 自反性 ) If AB then A rarr B
A set of attributes always trivially functionally determines any subset of itself
IR2 Augmentation Rule( 增加性 ) If A rarr B then AC rarr BC
Adding a set of attributes to both the LHS and RHS results in another valid functional dependency
IR3 Transitive Rule( 传递性 ) If A rarr B and B rarr C then A rarr C
Transitivity holds with FDs
Derived Rules from Armstrongs Axioms
There are three other well-known rules that can be derived from the basic three
IR4 Decomposition( 可分解性 ) If A rarr BC then A rarr B and A rarr C A FD with two attributes on RHS can be decomposed into two FDs each with on attribute on the RHS
IR5 Union( 合并性 ) If A rarr B and A rarr C then A rarr BC Reverse of decomposition rule
IR6 Composition( 组合性 ) If A rarr B and C rarr D then AC rarr BD More general than union - can combine two non-overlapping FDs into one
Characteristics of Armstrongs Axioms
The rules are sound( 健壮的 ) and complete( 完备的 )
1) Sound The rules do not infer( 推断 ) any incorrect FDs
2) Complete The rules can be used to infer all correct FDs (no FDs are missing)
The axioms (called IR1 IR2 and IR3) can be used to determine a set of all FDs F+ from an initial set of FDs F
F+ is referred to as the closure of F
Strategy of dealing with questions
The basic idea is that given any set of attributes X we can compute the set of all attributes X+ that can be functionally determined using F This is called the closure of X under F
For question 1 given a set of FDs F in R FD X rarr Y is true or not we know that X rarr Y holds if Y is in X+
For question 2 a set of attributes X is a candidate key of R or not X is a superkey of R if X+ is all attributes of R
For question 3 how to computing F+ we can compute X+ for all possible subsets X of R
Computing the Attribute Closure
The algorithm is as follows
Given a set of attributes X
Let X+ = X
Repeat Find a FD in F whose left side is a subset of X+ Add the right side of F to X+
Until (X+ does not change or X+ contains all attributes of R)
After the algorithm completes you have a set of attributes X+ that can be functionally determined by X This allows you to produce FDs of the form
X rarr A where A is in X+
Computing Attribute Closure Example
R (ABCDEG)
F = A rarr BC C rarr D D rarr G
Compute A+
A+ = A (initial step)
A+ = ABC (by using FD A rarr BC)
A+ = ABCD (by using FD C rarr D)
A+ = ABCDG (by using FD D rarr G)
Similarly we can compute C+ and EG+
C+ = CDG
EG+ = EG
Using Attribute Closure to Detect a FD
The attribute closure algorithm can be used to determine if a particular FD X rarr Y holds based on the set of given FDs F Compute X+ and see if Y is in X+
Example
R = (ABCDEG)
F = A rarr BC C rarr D D rarr G
Does the FD CD rarr G hold Compute CD+ = CDG Yes the FD CD rarr G holds
Does the FD BC rarr E hold Compute BC+ = BCDG No the FD BC rarr E does n
ot hold
Using Attribute Closure to Find all FDs
The attribute closure algorithm can be used to find all FDs of a given relation R if perform the attribute closure algorithm for all subsets X of the attributes R
This is an exponential algorithm in the number of attributes in R Although there are a few optimizations Do not compute the closure of the empty set or of the set of all attributes If we find X+ = all attributes do not compute the closure of any supersets of X Note that even though you do not have to compute the closure (because you know the answer) you still must use the known closure to infer new FDs
Using Attribute Closureto Find all FDs Example
Compute F+ from F where
R (ABCD)
F = A rarr BC C rarr D
Must compute all these attribute closures
A+ B+ C+ D+
AB+ AC+ AD+ BC+ BD+ CD+
ABC+ ABD+ ACD+ BCD+
Do not have to compute ABCD+
Start with the single sets
A+ = ABCD B+ = B C+ = CD D+ = D
New FD A rarr D
Using Attribute Closureto Find all FDs Example (cont)
Next the two element sets Do not compute AB+ AC+ AD+ because A+ contained all attributes of R New FDs AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C BC+ = BCD BD+ = BD CD+ = CD New FD BC rarr DThree element sets Do not compute ABC+ ABD+ ACD+ New FDs ABC rarr D ABD rarr C ACD rarr B BCD+ = BCD No new FDsAll FDs F+ = A rarr BC C rarr D A rarr D BC rarr D AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C ABC rarr D ABD rarrC ACD rarr B
FD Closure Questions
1) Compute all nontrivial FDs and list all keys of R
R = (ABCD)
F = AB rarr C C rarr D D rarr A
2) Compute all nontrivial FDs and list all candidate keys of R
R = (ABCD)
F = AB rarr C BC rarr D CD rarr A AD rarr B
Equivalent Sets of FDs
A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+ We can determine if F covers E by calculating X+ with respect to F for each FD X rarr Y in E and checking whether X+ includes Y
Two sets of FDs E and F are equivalent if E covers F and F covers E That is E+= F+
Minimal Set of FDs
A set of FDs F is minimal if it satisfies these conditions Every FD in F has a single attribute on its right-hand side We cannot replace any FD X rarr A in F with Y rarr A where Y
X and still have a set of dependencies that is equivalent to F
We cannot remove any FD and still have a set that is equivalent to F
A minimal sets of FDs is like a canonical( 规范的 ) form of FDs
Note that there may be many minimal covers for a given set of functional dependencies F
Normal Forms
A relation is in a particular normal form if it satisfies certain
normalization properties
There are several normal forms defined 1NF - First Normal Form 2NF - Second Normal Form 3NF - Third Normal Form BCNF - Boyce-Codd Normal Form 4NF - Fourth Normal Form 5NF - Fifth Normal Form
Each of these normal forms are stricter than the next
For example 3NF is better than 2NF because it removes more redundancyanomalies from the schema than 2NF
Normal Forms
All relations (non-normalized)
1NF (First Normal Form)
2NF (Second Normal Form)
3NF (Third Normal Form)
BCNF (Boyce-Codd NF)
4NF (Fourth NF)
5NF (Fifth NF)
First Normal Form (1NF)
A relation is in first normal form (1NF) if all its attribute values are atomic
That is a 1NF relation cannot have an attribute value that is a set of values (multi-valued attribute) a set of tuples (nested relation)
1NF is a standard assumption in relational DBMSs However object-oriented DBMSs and nested relational DBMSs relax t
his constraint
A relation that is not in 1NF is an unnormalized relation
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1P2
AnalystAnalyst
246
E3 A Lee P3P4
ConsultantEngineer
1048
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1 P2 Analyst Analyst 24 6
E3 A Lee P3P4 Consultant Engineer 10 48
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
A non-1NF Relation
One way to convert a non-1NF relation to a 1NF relation
1) Splitting Method - Divide the existing relation into two relations one with non-repeating attributes and the other with repeating attributes
Make a new relation consisting of the primary key of the original relation and the repeating attributes Determine a primary key for this new relation
Remove the repeating attributes from the original relation
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1P2
AnalystAnalyst
246
E3 A Lee P3P4
ConsultantEngineer
1048
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
ENo EName
E1 J Doe
E2 M Smith
E3 A Lee
E4 J Miller
E5 B Casey
E6 L Chu
E7 R Davis
ENo PNo Resp DurationE1 P1 Manager 12E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36
Second Normal Form (2NF)
A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key
If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation
Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key
Second Normal Form (2NF) Example
fd1 and fd4 are partial functional dependencies Normalize to
Emp (eno ename title bdate salary supereno dno)
WorksOn (eno pno resp hours)
Proj (pno pname budget)
ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration
EmpProj relation
fd1 fd2
fd3fd4
Second Normal Form (2NF) Example (2)
ENo EName
BDate Title Salary SuperNo
DNo
Emp relation
fd1
fd2
fd3fd4
PNo PName Budget
Proj relationENo PNo Resp Duration
WorkOn relation
Third Normal Form (3NF)
Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th
an 2 FDs
A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key
Converting a relation to 3NF from 2NF involves the removal of transitive dependencies
Third Normal Form (3NF) Example
fd2 results in a transitive dependency eno rarr salary Remove it
ENo EName BDate Title Salary SuperNo DNo
Emp relation
fd1
fd2
fd2
Title Salary
Pay relation
ENo EName BDate Title SuperNo DNo
Emp relation
fd1
General Definitions of 2NF and 3NF
We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)
General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key
General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key
Note that a prime attribute is an attribute that is in any key (candidate or primary)
General Definition of 3NF Example
The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute
However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key
Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key
ENo EName BDate Title Salary SuperNo DNo SSN
Emp relation
fd1
fd2
Normalization Question
Consider the universal relation
R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ
List the keys for R
Decompose R into 2NF and then 3NF relations
Boyce-Codd Normal Form (BCNF)
A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key
To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key
The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
ClientInterview relation
Relation is in 3NF but not BCNF
fd4
fd3
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
Note that we lose the FD fd2
fd4
fd3
ClientNo InterviewDate
InterviwTime
StaffNo StaffNo InterviwDate
RoomNo
Interview relation StaffRoom relation
BCNF versus 3NF
We can decompose to BCNF but sometimes we do not want to if we lose a FD
The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency
Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)
In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)
Conversion to BCNF
There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y
Note that if any FD in F+ violates BCNF then a FD in F violates BCNF
2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey
3) Replace R by relations with schemas R1 = X+
R2 = (R ndash X+) U X
4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib
utes are all in R1 or R2
Normalization Question
Given this database schema normalize into BCNF
fd1
PID Country Lot Area Price TaxRate
fd2
fd4fd3
Multi-Valued Dependencies
A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema
A MVD occurs when two independent 1N relationships are in the relational schema
Multi-Valued Dependencies Example
Eno ename Title Salary Pno dno
E1 J Doe EE 30000 P1P2
E2 M Smith SA 50000 P1P4 D1D2D3
E3 A Lee ME 40000 P1P3 D2D3
E4 J Miller PR 20000 P3 D3
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1
E1 J Doe EE 30000 P2
E2 M Smith SA 50000 P1 D1
E2 M Smith SA 50000 P1 D2
E2 M Smith SA 50000 P1 D3
E2 M Smith SA 50000 P4 D1
E2 M Smith SA 50000 P4 D2
E2 M Smith SA 50000 P4 D3
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P3 D3
E3 A Lee ME 40000 P3 D3
E4 J Miller PR 20000 P3 D3
Multi-Valued Dependencies (MVDs)
A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C
A trivial MVD A mdashgtgt B occurs when either B is a subset of A or
A cup B = R
Fourth Normal Form (4NF)
Fourth normal form (4NF) is based on the idea of multi-valued dependencies
A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies
Fourth Normal Form (4NF) Example
eno mdashgtgtpno
eno mdashgtgtdno
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3
eno ename title salary
E1 J Doe EE 30000
E2 MSmith SA 50000
E3 A Lee ME 40000
E4 J Miller PR 20000
eno pno
E1 P1
E1 P2
E2 P1
E2 P4
E3 P1
E3 P3
E4 P3
eno dno
E2 D1
E2 D2
E2 D3
E3 D2
E3 D3
E4 D3
Lossless-join Dependency
The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation
A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined
Fifth Normal Form (5NF)
bull Fifth normal form (5NF) is based on join dependencies
A relation is in fifth normal form (5NF) if the relation has no join dependency
bull 5NF is also called project-join normal form (PJNF)
A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R
The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have
ΠR1(r) ΠR2(r) hellip ΠRn(r) = r
Fifth Normal Form (5NF) Example
Consider a relation Supply (sname partName projName)
1048715Add the additional constraint that
If project j requires part p
and supplier s supplies part p
and supplier s supplies at least one item to project j
Then supplier s also supplies part p to project j
Original Supply Relation Supply Relation with new constraintsname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
Fifth Normal Form (5NF) Example
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
sname partName
Smith Bolt
Smith Nut
Adam Bolt
Walton Nut
Adam Nail
sname projName
Smith X
Smith Y
Adam Y
Walton Z
Adam X
partName projName
Bolt X
Nut Y
Bolt Y
Nut Z
Nail X
Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples
Desirable Properties in normalization
Relational schemas that are well-designed have several
important properties
1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or
relationship
2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins
3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations
Normal Forms in Practice
Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used
For example query execution time may increase because of normalization as more joins become necessary to answer queries
Normal Forms in Practice Example
For example street and city uniquely determine a zipcode
In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed
When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly
fd1
ENo City Street ZipCode
fd2
ENo ZipCode
ZipCode City Street
Normal Forms in Practice Summary
In summary normalization is typically used in two ways
To improve on relational schemas after ER design
To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies
There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance
This is the classic time versus space tradeoff
Conclusion
Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies
Normal forms indicate when relations satisfy certain properties
1NF - All attributes are atomic
2NF - All attributes are fully functionally dependent on a key
3NF - There are no transitive dependencies in the relation
4NF - There are no multi-valued dependencies in the relation
5NF - The relation has no join dependency
In practice normalization is used to improve schemas produced after ER design and existing relational schemas
It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time
Objectives
Explain process of normalization and why it is beneficial
Describe the concept of the Universal Relation
What are the motivations for normalization
What are the 3 types of update anomalies Be able to give examples
Describe the lossless-join property and dependency preservation property
Define and explain the concept of a functional dependency
What is a trivial FD
Describe the relationship between FDs and keys
What are Armstrongs Axioms When are they used
What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs
Objectives (2)
Define minimal and covers for a set of FDs
Define normal forms be able to list the normal forms and draw a diagram showing their relationship
Define 1NF Know how to convert non-1NF relation to 1NF
Define 2NF Convert 1NF relation to 2NF
Define 3NF Convert 2NF relation to 3NF
Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example
Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm
Describe multi-valued dependencies When do they occur
What is a trivial MVD
Define 4NF Convert 3NFBCNF relation to 4NF
Objectives (3)
Describe lossless-join property and how it relates to 5NF
Define 5NF
Explain normalization in terms of the time versus space tradeoff
Explain the relationship between normalization and ER modeling
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Slide 45
- Slide 46
- Slide 47
- Slide 48
- Slide 49
- Slide 50
- Slide 51
- Slide 52
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- Slide 60
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
-
Modification Anomaly Example
Consider these modification anomalies
1) Change the name of department D3 to ldquoEngineeringrdquo You must update the department name in 3 different records
2) Change the manager of D1 to E4 You must update the superno field in 2 different records
ENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5
E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7
E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5
E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5
E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7
E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8
E8 J Jones 10-11-72 SA 50000 null D1 Management E8
Update Anomalies
bull There are three major types of update anomalies Insertion Anomalies - Insertion of a tuple into the r
elation either requires insertion of redundant information or cannot be performed without setting key values to NULL
Deletion Anomalies - Deletion of a tuple may lose information that is still required to be stored
Modification Anomalies - Changing an attribute of a tuple may require changing multiple attribute values in other tuples
Overview to Normalization
Normalization is a validation technique for producing relations with desirable properties
Normalization is a bottom-up design technique for producing relations
Normalization decomposes relations into smaller relations that contain less redundancy This decomposition requires reconstruction of the original relations from the smaller relations and no information is lost
It was developed by Codd in 1972 preceding ER modeling and extended by others over the years
Example Normalization to DeptEmp
bull DeptEmp Relation
ENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5
E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7
E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5
E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5
E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7
E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8
E8 J Jones 10-11-72 SA 50000 null D1 Management E8
Normalization to DeptEmp
Emp Relation Dept Relation
dno dname mgrno
D1 Management E8
D2 consulting E7
D3 Accounting E5
eno ename bdate title salary superNo dno
E1 J Doe 01-05-75 EE 30000 E2 null
E2 M Smith 06-04-66 SA 50000 E5 D3
E3 A Lee 07-05-66 ME 40000 E7 D2
E4 J Miller 09-01-50 PR 20000 E6 D3
E5 B Casey 12-25-71 SA 50000 E8 D3
E6 L Chu 11-30-65 EE 30000 E7 D2
E7 R Davis 09-08-77 ME 40000 E8 D1
E8 J Jones 10-11-72 SA 50000 null D1
Dept Emp= Emp dno Dept
Overview to Normalization
Input a relation schema and constraints on attributes
Output one or more relation schemas
Processing Idea examining the relationships between attributes and organize the data according to its functional dependencies( 函数依赖)
Significance
1) Validate our design correct our improper design
2) Help for understanding the nature and the meaning of the data in the enterprise for learning more about the data semantics
When to use normalization
Normalization can be used after mapping ER model into relation model or independently
Normalization may be especially useful for databases that have already been designed without using formal techniques
Theory basic of normalization
Functional Dependencies ( 函数依赖) Functional dependencies represent integrity constraints ( 完整
性约束) on the values of attributes in a relation and are used in normalization
A functional dependency (FD) describes the relationship between attributes in a relation We say a set of attributes X functionally determines an attribute Y if given the values of X we always know the only possible value of Y
Notation X rarr Y X functionally determines Y Y is functionally dependent on XExample
eno rarr ename eno pno rarr Duration
Notation for Functional Dependencies
A functional dependency has a left-hand side called the determinant which is a set of attributes and one attribute on the right-hand side
eno pno rarr duration
determinant determined attribute
Strictly speaking there is always only one attribute on the RHS but we can combine several functional dependencies into one
eno pno rarr durationeno pno rarr respeno pno rarr duration resp
Remember that this is really two functional dependencies
The Semantics ofFunctional Dependencies
Similar to keys Functional dependencies are a property of a relation schema (intension) and not a property of a particular instance of the schema (extension)
Functional dependencies are directional
eno rarr ename ename rarr eno
ExampleFDs in EmpDept relation
bull EmpDept RelationENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8E8 J Jones 10-11-72 SA 50000 null D1 Management E8
Eno rarr ename bdate title salary superno dno
dnorarrdname mgrno
titlerarr salary
Trivial Functional Dependencies ( 平凡函数依赖 )
A functional dependency is trivialif the attributes on its left-hand side are a superset of the attributes on its right-hand side
Examples eno rarr eno
eno ename rarr eno
eno pno duration rarr eno Duration
Trivial functional dependencies are not interesting because they do not tell us anything
We are only interested in nontrivial FDs
Full Functional Dependencies
A functional dependency A rarr B is a full functional dependency if removal of any attribute from A results in the dependency not existing any more We say that B is fully functionally dependent on A If remove an attribute from A and the functional dependency still exists we say that B is partially dependent on A
Example
eno rarr ename (full FD)
eno ename rarr salary title (partial FD - can remove ename)
eno pno rarr hours resp (full FD)
How to Identify FDs in a relation
Examining the semantics of the attributes of relation
Identify all non-trivial functional dependencies in the Universal Relation
Universal (eno pno resp duration ename bdate title salary superNo dno dname mgrNo pname budget)
Eno rarr ename bdate title salary superno dnodnorarrdname mgrnotitlerarr salary
Formal Definition ofFunctional Dependencies
Given relation R defined over U = A1 A2 An where X U Y U If for all pairs of tuples t1 and t2 in any legal instance of relation schema R
t1[X] = t2[X] implies t1[Y] = t2[Y]
then the functional dependency X rarr Y holds in R
Functional Dependencies and Keys
Functional dependencies can be used to determine the candidate keys of a relation
What is superkey candidate key primary key
For example if an attribute functionally determines all other attributes in the relation that attribute can be a key
eno rarr eno ename bdate title supereno dno eno is a candidate key for the Emp relation
Functional Dependencies and Keys
ENo rarr EName BDate Title SuperNo DNoDNo rarr DName MgreNotitle rarr salary
Question List the candidate key(s) of this relation
ENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5
E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7
E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5
E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5
E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7
E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8
E8 J Jones 10-11-72 SA 50000 null D1 Management E8
EmpDept relation
Applications of FDs
1) given a set of FDs F in relation R FD X rarr Y is true or not
2) given a set of FDs F in relation R a set of attributes X is a candidate key of R or not
3) given a set of FDs F in relation R which candidate keys are in R
4) given a set of FDs F in relation R some FDs in F is Excessive or not
Rules for reasoning about functional dependencies
Armstrongs Axioms are three inference rules for functional dependenciesLet A B C and D be subsets of the attributes of relation R
IR1 Reflexive Rule( 自反性 ) If AB then A rarr B
A set of attributes always trivially functionally determines any subset of itself
IR2 Augmentation Rule( 增加性 ) If A rarr B then AC rarr BC
Adding a set of attributes to both the LHS and RHS results in another valid functional dependency
IR3 Transitive Rule( 传递性 ) If A rarr B and B rarr C then A rarr C
Transitivity holds with FDs
Derived Rules from Armstrongs Axioms
There are three other well-known rules that can be derived from the basic three
IR4 Decomposition( 可分解性 ) If A rarr BC then A rarr B and A rarr C A FD with two attributes on RHS can be decomposed into two FDs each with on attribute on the RHS
IR5 Union( 合并性 ) If A rarr B and A rarr C then A rarr BC Reverse of decomposition rule
IR6 Composition( 组合性 ) If A rarr B and C rarr D then AC rarr BD More general than union - can combine two non-overlapping FDs into one
Characteristics of Armstrongs Axioms
The rules are sound( 健壮的 ) and complete( 完备的 )
1) Sound The rules do not infer( 推断 ) any incorrect FDs
2) Complete The rules can be used to infer all correct FDs (no FDs are missing)
The axioms (called IR1 IR2 and IR3) can be used to determine a set of all FDs F+ from an initial set of FDs F
F+ is referred to as the closure of F
Strategy of dealing with questions
The basic idea is that given any set of attributes X we can compute the set of all attributes X+ that can be functionally determined using F This is called the closure of X under F
For question 1 given a set of FDs F in R FD X rarr Y is true or not we know that X rarr Y holds if Y is in X+
For question 2 a set of attributes X is a candidate key of R or not X is a superkey of R if X+ is all attributes of R
For question 3 how to computing F+ we can compute X+ for all possible subsets X of R
Computing the Attribute Closure
The algorithm is as follows
Given a set of attributes X
Let X+ = X
Repeat Find a FD in F whose left side is a subset of X+ Add the right side of F to X+
Until (X+ does not change or X+ contains all attributes of R)
After the algorithm completes you have a set of attributes X+ that can be functionally determined by X This allows you to produce FDs of the form
X rarr A where A is in X+
Computing Attribute Closure Example
R (ABCDEG)
F = A rarr BC C rarr D D rarr G
Compute A+
A+ = A (initial step)
A+ = ABC (by using FD A rarr BC)
A+ = ABCD (by using FD C rarr D)
A+ = ABCDG (by using FD D rarr G)
Similarly we can compute C+ and EG+
C+ = CDG
EG+ = EG
Using Attribute Closure to Detect a FD
The attribute closure algorithm can be used to determine if a particular FD X rarr Y holds based on the set of given FDs F Compute X+ and see if Y is in X+
Example
R = (ABCDEG)
F = A rarr BC C rarr D D rarr G
Does the FD CD rarr G hold Compute CD+ = CDG Yes the FD CD rarr G holds
Does the FD BC rarr E hold Compute BC+ = BCDG No the FD BC rarr E does n
ot hold
Using Attribute Closure to Find all FDs
The attribute closure algorithm can be used to find all FDs of a given relation R if perform the attribute closure algorithm for all subsets X of the attributes R
This is an exponential algorithm in the number of attributes in R Although there are a few optimizations Do not compute the closure of the empty set or of the set of all attributes If we find X+ = all attributes do not compute the closure of any supersets of X Note that even though you do not have to compute the closure (because you know the answer) you still must use the known closure to infer new FDs
Using Attribute Closureto Find all FDs Example
Compute F+ from F where
R (ABCD)
F = A rarr BC C rarr D
Must compute all these attribute closures
A+ B+ C+ D+
AB+ AC+ AD+ BC+ BD+ CD+
ABC+ ABD+ ACD+ BCD+
Do not have to compute ABCD+
Start with the single sets
A+ = ABCD B+ = B C+ = CD D+ = D
New FD A rarr D
Using Attribute Closureto Find all FDs Example (cont)
Next the two element sets Do not compute AB+ AC+ AD+ because A+ contained all attributes of R New FDs AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C BC+ = BCD BD+ = BD CD+ = CD New FD BC rarr DThree element sets Do not compute ABC+ ABD+ ACD+ New FDs ABC rarr D ABD rarr C ACD rarr B BCD+ = BCD No new FDsAll FDs F+ = A rarr BC C rarr D A rarr D BC rarr D AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C ABC rarr D ABD rarrC ACD rarr B
FD Closure Questions
1) Compute all nontrivial FDs and list all keys of R
R = (ABCD)
F = AB rarr C C rarr D D rarr A
2) Compute all nontrivial FDs and list all candidate keys of R
R = (ABCD)
F = AB rarr C BC rarr D CD rarr A AD rarr B
Equivalent Sets of FDs
A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+ We can determine if F covers E by calculating X+ with respect to F for each FD X rarr Y in E and checking whether X+ includes Y
Two sets of FDs E and F are equivalent if E covers F and F covers E That is E+= F+
Minimal Set of FDs
A set of FDs F is minimal if it satisfies these conditions Every FD in F has a single attribute on its right-hand side We cannot replace any FD X rarr A in F with Y rarr A where Y
X and still have a set of dependencies that is equivalent to F
We cannot remove any FD and still have a set that is equivalent to F
A minimal sets of FDs is like a canonical( 规范的 ) form of FDs
Note that there may be many minimal covers for a given set of functional dependencies F
Normal Forms
A relation is in a particular normal form if it satisfies certain
normalization properties
There are several normal forms defined 1NF - First Normal Form 2NF - Second Normal Form 3NF - Third Normal Form BCNF - Boyce-Codd Normal Form 4NF - Fourth Normal Form 5NF - Fifth Normal Form
Each of these normal forms are stricter than the next
For example 3NF is better than 2NF because it removes more redundancyanomalies from the schema than 2NF
Normal Forms
All relations (non-normalized)
1NF (First Normal Form)
2NF (Second Normal Form)
3NF (Third Normal Form)
BCNF (Boyce-Codd NF)
4NF (Fourth NF)
5NF (Fifth NF)
First Normal Form (1NF)
A relation is in first normal form (1NF) if all its attribute values are atomic
That is a 1NF relation cannot have an attribute value that is a set of values (multi-valued attribute) a set of tuples (nested relation)
1NF is a standard assumption in relational DBMSs However object-oriented DBMSs and nested relational DBMSs relax t
his constraint
A relation that is not in 1NF is an unnormalized relation
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1P2
AnalystAnalyst
246
E3 A Lee P3P4
ConsultantEngineer
1048
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1 P2 Analyst Analyst 24 6
E3 A Lee P3P4 Consultant Engineer 10 48
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
A non-1NF Relation
One way to convert a non-1NF relation to a 1NF relation
1) Splitting Method - Divide the existing relation into two relations one with non-repeating attributes and the other with repeating attributes
Make a new relation consisting of the primary key of the original relation and the repeating attributes Determine a primary key for this new relation
Remove the repeating attributes from the original relation
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1P2
AnalystAnalyst
246
E3 A Lee P3P4
ConsultantEngineer
1048
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
ENo EName
E1 J Doe
E2 M Smith
E3 A Lee
E4 J Miller
E5 B Casey
E6 L Chu
E7 R Davis
ENo PNo Resp DurationE1 P1 Manager 12E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36
Second Normal Form (2NF)
A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key
If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation
Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key
Second Normal Form (2NF) Example
fd1 and fd4 are partial functional dependencies Normalize to
Emp (eno ename title bdate salary supereno dno)
WorksOn (eno pno resp hours)
Proj (pno pname budget)
ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration
EmpProj relation
fd1 fd2
fd3fd4
Second Normal Form (2NF) Example (2)
ENo EName
BDate Title Salary SuperNo
DNo
Emp relation
fd1
fd2
fd3fd4
PNo PName Budget
Proj relationENo PNo Resp Duration
WorkOn relation
Third Normal Form (3NF)
Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th
an 2 FDs
A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key
Converting a relation to 3NF from 2NF involves the removal of transitive dependencies
Third Normal Form (3NF) Example
fd2 results in a transitive dependency eno rarr salary Remove it
ENo EName BDate Title Salary SuperNo DNo
Emp relation
fd1
fd2
fd2
Title Salary
Pay relation
ENo EName BDate Title SuperNo DNo
Emp relation
fd1
General Definitions of 2NF and 3NF
We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)
General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key
General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key
Note that a prime attribute is an attribute that is in any key (candidate or primary)
General Definition of 3NF Example
The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute
However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key
Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key
ENo EName BDate Title Salary SuperNo DNo SSN
Emp relation
fd1
fd2
Normalization Question
Consider the universal relation
R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ
List the keys for R
Decompose R into 2NF and then 3NF relations
Boyce-Codd Normal Form (BCNF)
A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key
To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key
The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
ClientInterview relation
Relation is in 3NF but not BCNF
fd4
fd3
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
Note that we lose the FD fd2
fd4
fd3
ClientNo InterviewDate
InterviwTime
StaffNo StaffNo InterviwDate
RoomNo
Interview relation StaffRoom relation
BCNF versus 3NF
We can decompose to BCNF but sometimes we do not want to if we lose a FD
The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency
Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)
In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)
Conversion to BCNF
There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y
Note that if any FD in F+ violates BCNF then a FD in F violates BCNF
2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey
3) Replace R by relations with schemas R1 = X+
R2 = (R ndash X+) U X
4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib
utes are all in R1 or R2
Normalization Question
Given this database schema normalize into BCNF
fd1
PID Country Lot Area Price TaxRate
fd2
fd4fd3
Multi-Valued Dependencies
A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema
A MVD occurs when two independent 1N relationships are in the relational schema
Multi-Valued Dependencies Example
Eno ename Title Salary Pno dno
E1 J Doe EE 30000 P1P2
E2 M Smith SA 50000 P1P4 D1D2D3
E3 A Lee ME 40000 P1P3 D2D3
E4 J Miller PR 20000 P3 D3
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1
E1 J Doe EE 30000 P2
E2 M Smith SA 50000 P1 D1
E2 M Smith SA 50000 P1 D2
E2 M Smith SA 50000 P1 D3
E2 M Smith SA 50000 P4 D1
E2 M Smith SA 50000 P4 D2
E2 M Smith SA 50000 P4 D3
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P3 D3
E3 A Lee ME 40000 P3 D3
E4 J Miller PR 20000 P3 D3
Multi-Valued Dependencies (MVDs)
A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C
A trivial MVD A mdashgtgt B occurs when either B is a subset of A or
A cup B = R
Fourth Normal Form (4NF)
Fourth normal form (4NF) is based on the idea of multi-valued dependencies
A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies
Fourth Normal Form (4NF) Example
eno mdashgtgtpno
eno mdashgtgtdno
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3
eno ename title salary
E1 J Doe EE 30000
E2 MSmith SA 50000
E3 A Lee ME 40000
E4 J Miller PR 20000
eno pno
E1 P1
E1 P2
E2 P1
E2 P4
E3 P1
E3 P3
E4 P3
eno dno
E2 D1
E2 D2
E2 D3
E3 D2
E3 D3
E4 D3
Lossless-join Dependency
The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation
A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined
Fifth Normal Form (5NF)
bull Fifth normal form (5NF) is based on join dependencies
A relation is in fifth normal form (5NF) if the relation has no join dependency
bull 5NF is also called project-join normal form (PJNF)
A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R
The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have
ΠR1(r) ΠR2(r) hellip ΠRn(r) = r
Fifth Normal Form (5NF) Example
Consider a relation Supply (sname partName projName)
1048715Add the additional constraint that
If project j requires part p
and supplier s supplies part p
and supplier s supplies at least one item to project j
Then supplier s also supplies part p to project j
Original Supply Relation Supply Relation with new constraintsname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
Fifth Normal Form (5NF) Example
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
sname partName
Smith Bolt
Smith Nut
Adam Bolt
Walton Nut
Adam Nail
sname projName
Smith X
Smith Y
Adam Y
Walton Z
Adam X
partName projName
Bolt X
Nut Y
Bolt Y
Nut Z
Nail X
Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples
Desirable Properties in normalization
Relational schemas that are well-designed have several
important properties
1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or
relationship
2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins
3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations
Normal Forms in Practice
Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used
For example query execution time may increase because of normalization as more joins become necessary to answer queries
Normal Forms in Practice Example
For example street and city uniquely determine a zipcode
In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed
When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly
fd1
ENo City Street ZipCode
fd2
ENo ZipCode
ZipCode City Street
Normal Forms in Practice Summary
In summary normalization is typically used in two ways
To improve on relational schemas after ER design
To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies
There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance
This is the classic time versus space tradeoff
Conclusion
Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies
Normal forms indicate when relations satisfy certain properties
1NF - All attributes are atomic
2NF - All attributes are fully functionally dependent on a key
3NF - There are no transitive dependencies in the relation
4NF - There are no multi-valued dependencies in the relation
5NF - The relation has no join dependency
In practice normalization is used to improve schemas produced after ER design and existing relational schemas
It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time
Objectives
Explain process of normalization and why it is beneficial
Describe the concept of the Universal Relation
What are the motivations for normalization
What are the 3 types of update anomalies Be able to give examples
Describe the lossless-join property and dependency preservation property
Define and explain the concept of a functional dependency
What is a trivial FD
Describe the relationship between FDs and keys
What are Armstrongs Axioms When are they used
What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs
Objectives (2)
Define minimal and covers for a set of FDs
Define normal forms be able to list the normal forms and draw a diagram showing their relationship
Define 1NF Know how to convert non-1NF relation to 1NF
Define 2NF Convert 1NF relation to 2NF
Define 3NF Convert 2NF relation to 3NF
Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example
Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm
Describe multi-valued dependencies When do they occur
What is a trivial MVD
Define 4NF Convert 3NFBCNF relation to 4NF
Objectives (3)
Describe lossless-join property and how it relates to 5NF
Define 5NF
Explain normalization in terms of the time versus space tradeoff
Explain the relationship between normalization and ER modeling
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Slide 45
- Slide 46
- Slide 47
- Slide 48
- Slide 49
- Slide 50
- Slide 51
- Slide 52
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- Slide 60
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
-
Update Anomalies
bull There are three major types of update anomalies Insertion Anomalies - Insertion of a tuple into the r
elation either requires insertion of redundant information or cannot be performed without setting key values to NULL
Deletion Anomalies - Deletion of a tuple may lose information that is still required to be stored
Modification Anomalies - Changing an attribute of a tuple may require changing multiple attribute values in other tuples
Overview to Normalization
Normalization is a validation technique for producing relations with desirable properties
Normalization is a bottom-up design technique for producing relations
Normalization decomposes relations into smaller relations that contain less redundancy This decomposition requires reconstruction of the original relations from the smaller relations and no information is lost
It was developed by Codd in 1972 preceding ER modeling and extended by others over the years
Example Normalization to DeptEmp
bull DeptEmp Relation
ENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5
E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7
E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5
E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5
E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7
E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8
E8 J Jones 10-11-72 SA 50000 null D1 Management E8
Normalization to DeptEmp
Emp Relation Dept Relation
dno dname mgrno
D1 Management E8
D2 consulting E7
D3 Accounting E5
eno ename bdate title salary superNo dno
E1 J Doe 01-05-75 EE 30000 E2 null
E2 M Smith 06-04-66 SA 50000 E5 D3
E3 A Lee 07-05-66 ME 40000 E7 D2
E4 J Miller 09-01-50 PR 20000 E6 D3
E5 B Casey 12-25-71 SA 50000 E8 D3
E6 L Chu 11-30-65 EE 30000 E7 D2
E7 R Davis 09-08-77 ME 40000 E8 D1
E8 J Jones 10-11-72 SA 50000 null D1
Dept Emp= Emp dno Dept
Overview to Normalization
Input a relation schema and constraints on attributes
Output one or more relation schemas
Processing Idea examining the relationships between attributes and organize the data according to its functional dependencies( 函数依赖)
Significance
1) Validate our design correct our improper design
2) Help for understanding the nature and the meaning of the data in the enterprise for learning more about the data semantics
When to use normalization
Normalization can be used after mapping ER model into relation model or independently
Normalization may be especially useful for databases that have already been designed without using formal techniques
Theory basic of normalization
Functional Dependencies ( 函数依赖) Functional dependencies represent integrity constraints ( 完整
性约束) on the values of attributes in a relation and are used in normalization
A functional dependency (FD) describes the relationship between attributes in a relation We say a set of attributes X functionally determines an attribute Y if given the values of X we always know the only possible value of Y
Notation X rarr Y X functionally determines Y Y is functionally dependent on XExample
eno rarr ename eno pno rarr Duration
Notation for Functional Dependencies
A functional dependency has a left-hand side called the determinant which is a set of attributes and one attribute on the right-hand side
eno pno rarr duration
determinant determined attribute
Strictly speaking there is always only one attribute on the RHS but we can combine several functional dependencies into one
eno pno rarr durationeno pno rarr respeno pno rarr duration resp
Remember that this is really two functional dependencies
The Semantics ofFunctional Dependencies
Similar to keys Functional dependencies are a property of a relation schema (intension) and not a property of a particular instance of the schema (extension)
Functional dependencies are directional
eno rarr ename ename rarr eno
ExampleFDs in EmpDept relation
bull EmpDept RelationENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8E8 J Jones 10-11-72 SA 50000 null D1 Management E8
Eno rarr ename bdate title salary superno dno
dnorarrdname mgrno
titlerarr salary
Trivial Functional Dependencies ( 平凡函数依赖 )
A functional dependency is trivialif the attributes on its left-hand side are a superset of the attributes on its right-hand side
Examples eno rarr eno
eno ename rarr eno
eno pno duration rarr eno Duration
Trivial functional dependencies are not interesting because they do not tell us anything
We are only interested in nontrivial FDs
Full Functional Dependencies
A functional dependency A rarr B is a full functional dependency if removal of any attribute from A results in the dependency not existing any more We say that B is fully functionally dependent on A If remove an attribute from A and the functional dependency still exists we say that B is partially dependent on A
Example
eno rarr ename (full FD)
eno ename rarr salary title (partial FD - can remove ename)
eno pno rarr hours resp (full FD)
How to Identify FDs in a relation
Examining the semantics of the attributes of relation
Identify all non-trivial functional dependencies in the Universal Relation
Universal (eno pno resp duration ename bdate title salary superNo dno dname mgrNo pname budget)
Eno rarr ename bdate title salary superno dnodnorarrdname mgrnotitlerarr salary
Formal Definition ofFunctional Dependencies
Given relation R defined over U = A1 A2 An where X U Y U If for all pairs of tuples t1 and t2 in any legal instance of relation schema R
t1[X] = t2[X] implies t1[Y] = t2[Y]
then the functional dependency X rarr Y holds in R
Functional Dependencies and Keys
Functional dependencies can be used to determine the candidate keys of a relation
What is superkey candidate key primary key
For example if an attribute functionally determines all other attributes in the relation that attribute can be a key
eno rarr eno ename bdate title supereno dno eno is a candidate key for the Emp relation
Functional Dependencies and Keys
ENo rarr EName BDate Title SuperNo DNoDNo rarr DName MgreNotitle rarr salary
Question List the candidate key(s) of this relation
ENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5
E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7
E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5
E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5
E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7
E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8
E8 J Jones 10-11-72 SA 50000 null D1 Management E8
EmpDept relation
Applications of FDs
1) given a set of FDs F in relation R FD X rarr Y is true or not
2) given a set of FDs F in relation R a set of attributes X is a candidate key of R or not
3) given a set of FDs F in relation R which candidate keys are in R
4) given a set of FDs F in relation R some FDs in F is Excessive or not
Rules for reasoning about functional dependencies
Armstrongs Axioms are three inference rules for functional dependenciesLet A B C and D be subsets of the attributes of relation R
IR1 Reflexive Rule( 自反性 ) If AB then A rarr B
A set of attributes always trivially functionally determines any subset of itself
IR2 Augmentation Rule( 增加性 ) If A rarr B then AC rarr BC
Adding a set of attributes to both the LHS and RHS results in another valid functional dependency
IR3 Transitive Rule( 传递性 ) If A rarr B and B rarr C then A rarr C
Transitivity holds with FDs
Derived Rules from Armstrongs Axioms
There are three other well-known rules that can be derived from the basic three
IR4 Decomposition( 可分解性 ) If A rarr BC then A rarr B and A rarr C A FD with two attributes on RHS can be decomposed into two FDs each with on attribute on the RHS
IR5 Union( 合并性 ) If A rarr B and A rarr C then A rarr BC Reverse of decomposition rule
IR6 Composition( 组合性 ) If A rarr B and C rarr D then AC rarr BD More general than union - can combine two non-overlapping FDs into one
Characteristics of Armstrongs Axioms
The rules are sound( 健壮的 ) and complete( 完备的 )
1) Sound The rules do not infer( 推断 ) any incorrect FDs
2) Complete The rules can be used to infer all correct FDs (no FDs are missing)
The axioms (called IR1 IR2 and IR3) can be used to determine a set of all FDs F+ from an initial set of FDs F
F+ is referred to as the closure of F
Strategy of dealing with questions
The basic idea is that given any set of attributes X we can compute the set of all attributes X+ that can be functionally determined using F This is called the closure of X under F
For question 1 given a set of FDs F in R FD X rarr Y is true or not we know that X rarr Y holds if Y is in X+
For question 2 a set of attributes X is a candidate key of R or not X is a superkey of R if X+ is all attributes of R
For question 3 how to computing F+ we can compute X+ for all possible subsets X of R
Computing the Attribute Closure
The algorithm is as follows
Given a set of attributes X
Let X+ = X
Repeat Find a FD in F whose left side is a subset of X+ Add the right side of F to X+
Until (X+ does not change or X+ contains all attributes of R)
After the algorithm completes you have a set of attributes X+ that can be functionally determined by X This allows you to produce FDs of the form
X rarr A where A is in X+
Computing Attribute Closure Example
R (ABCDEG)
F = A rarr BC C rarr D D rarr G
Compute A+
A+ = A (initial step)
A+ = ABC (by using FD A rarr BC)
A+ = ABCD (by using FD C rarr D)
A+ = ABCDG (by using FD D rarr G)
Similarly we can compute C+ and EG+
C+ = CDG
EG+ = EG
Using Attribute Closure to Detect a FD
The attribute closure algorithm can be used to determine if a particular FD X rarr Y holds based on the set of given FDs F Compute X+ and see if Y is in X+
Example
R = (ABCDEG)
F = A rarr BC C rarr D D rarr G
Does the FD CD rarr G hold Compute CD+ = CDG Yes the FD CD rarr G holds
Does the FD BC rarr E hold Compute BC+ = BCDG No the FD BC rarr E does n
ot hold
Using Attribute Closure to Find all FDs
The attribute closure algorithm can be used to find all FDs of a given relation R if perform the attribute closure algorithm for all subsets X of the attributes R
This is an exponential algorithm in the number of attributes in R Although there are a few optimizations Do not compute the closure of the empty set or of the set of all attributes If we find X+ = all attributes do not compute the closure of any supersets of X Note that even though you do not have to compute the closure (because you know the answer) you still must use the known closure to infer new FDs
Using Attribute Closureto Find all FDs Example
Compute F+ from F where
R (ABCD)
F = A rarr BC C rarr D
Must compute all these attribute closures
A+ B+ C+ D+
AB+ AC+ AD+ BC+ BD+ CD+
ABC+ ABD+ ACD+ BCD+
Do not have to compute ABCD+
Start with the single sets
A+ = ABCD B+ = B C+ = CD D+ = D
New FD A rarr D
Using Attribute Closureto Find all FDs Example (cont)
Next the two element sets Do not compute AB+ AC+ AD+ because A+ contained all attributes of R New FDs AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C BC+ = BCD BD+ = BD CD+ = CD New FD BC rarr DThree element sets Do not compute ABC+ ABD+ ACD+ New FDs ABC rarr D ABD rarr C ACD rarr B BCD+ = BCD No new FDsAll FDs F+ = A rarr BC C rarr D A rarr D BC rarr D AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C ABC rarr D ABD rarrC ACD rarr B
FD Closure Questions
1) Compute all nontrivial FDs and list all keys of R
R = (ABCD)
F = AB rarr C C rarr D D rarr A
2) Compute all nontrivial FDs and list all candidate keys of R
R = (ABCD)
F = AB rarr C BC rarr D CD rarr A AD rarr B
Equivalent Sets of FDs
A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+ We can determine if F covers E by calculating X+ with respect to F for each FD X rarr Y in E and checking whether X+ includes Y
Two sets of FDs E and F are equivalent if E covers F and F covers E That is E+= F+
Minimal Set of FDs
A set of FDs F is minimal if it satisfies these conditions Every FD in F has a single attribute on its right-hand side We cannot replace any FD X rarr A in F with Y rarr A where Y
X and still have a set of dependencies that is equivalent to F
We cannot remove any FD and still have a set that is equivalent to F
A minimal sets of FDs is like a canonical( 规范的 ) form of FDs
Note that there may be many minimal covers for a given set of functional dependencies F
Normal Forms
A relation is in a particular normal form if it satisfies certain
normalization properties
There are several normal forms defined 1NF - First Normal Form 2NF - Second Normal Form 3NF - Third Normal Form BCNF - Boyce-Codd Normal Form 4NF - Fourth Normal Form 5NF - Fifth Normal Form
Each of these normal forms are stricter than the next
For example 3NF is better than 2NF because it removes more redundancyanomalies from the schema than 2NF
Normal Forms
All relations (non-normalized)
1NF (First Normal Form)
2NF (Second Normal Form)
3NF (Third Normal Form)
BCNF (Boyce-Codd NF)
4NF (Fourth NF)
5NF (Fifth NF)
First Normal Form (1NF)
A relation is in first normal form (1NF) if all its attribute values are atomic
That is a 1NF relation cannot have an attribute value that is a set of values (multi-valued attribute) a set of tuples (nested relation)
1NF is a standard assumption in relational DBMSs However object-oriented DBMSs and nested relational DBMSs relax t
his constraint
A relation that is not in 1NF is an unnormalized relation
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1P2
AnalystAnalyst
246
E3 A Lee P3P4
ConsultantEngineer
1048
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1 P2 Analyst Analyst 24 6
E3 A Lee P3P4 Consultant Engineer 10 48
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
A non-1NF Relation
One way to convert a non-1NF relation to a 1NF relation
1) Splitting Method - Divide the existing relation into two relations one with non-repeating attributes and the other with repeating attributes
Make a new relation consisting of the primary key of the original relation and the repeating attributes Determine a primary key for this new relation
Remove the repeating attributes from the original relation
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1P2
AnalystAnalyst
246
E3 A Lee P3P4
ConsultantEngineer
1048
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
ENo EName
E1 J Doe
E2 M Smith
E3 A Lee
E4 J Miller
E5 B Casey
E6 L Chu
E7 R Davis
ENo PNo Resp DurationE1 P1 Manager 12E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36
Second Normal Form (2NF)
A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key
If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation
Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key
Second Normal Form (2NF) Example
fd1 and fd4 are partial functional dependencies Normalize to
Emp (eno ename title bdate salary supereno dno)
WorksOn (eno pno resp hours)
Proj (pno pname budget)
ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration
EmpProj relation
fd1 fd2
fd3fd4
Second Normal Form (2NF) Example (2)
ENo EName
BDate Title Salary SuperNo
DNo
Emp relation
fd1
fd2
fd3fd4
PNo PName Budget
Proj relationENo PNo Resp Duration
WorkOn relation
Third Normal Form (3NF)
Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th
an 2 FDs
A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key
Converting a relation to 3NF from 2NF involves the removal of transitive dependencies
Third Normal Form (3NF) Example
fd2 results in a transitive dependency eno rarr salary Remove it
ENo EName BDate Title Salary SuperNo DNo
Emp relation
fd1
fd2
fd2
Title Salary
Pay relation
ENo EName BDate Title SuperNo DNo
Emp relation
fd1
General Definitions of 2NF and 3NF
We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)
General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key
General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key
Note that a prime attribute is an attribute that is in any key (candidate or primary)
General Definition of 3NF Example
The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute
However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key
Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key
ENo EName BDate Title Salary SuperNo DNo SSN
Emp relation
fd1
fd2
Normalization Question
Consider the universal relation
R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ
List the keys for R
Decompose R into 2NF and then 3NF relations
Boyce-Codd Normal Form (BCNF)
A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key
To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key
The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
ClientInterview relation
Relation is in 3NF but not BCNF
fd4
fd3
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
Note that we lose the FD fd2
fd4
fd3
ClientNo InterviewDate
InterviwTime
StaffNo StaffNo InterviwDate
RoomNo
Interview relation StaffRoom relation
BCNF versus 3NF
We can decompose to BCNF but sometimes we do not want to if we lose a FD
The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency
Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)
In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)
Conversion to BCNF
There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y
Note that if any FD in F+ violates BCNF then a FD in F violates BCNF
2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey
3) Replace R by relations with schemas R1 = X+
R2 = (R ndash X+) U X
4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib
utes are all in R1 or R2
Normalization Question
Given this database schema normalize into BCNF
fd1
PID Country Lot Area Price TaxRate
fd2
fd4fd3
Multi-Valued Dependencies
A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema
A MVD occurs when two independent 1N relationships are in the relational schema
Multi-Valued Dependencies Example
Eno ename Title Salary Pno dno
E1 J Doe EE 30000 P1P2
E2 M Smith SA 50000 P1P4 D1D2D3
E3 A Lee ME 40000 P1P3 D2D3
E4 J Miller PR 20000 P3 D3
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1
E1 J Doe EE 30000 P2
E2 M Smith SA 50000 P1 D1
E2 M Smith SA 50000 P1 D2
E2 M Smith SA 50000 P1 D3
E2 M Smith SA 50000 P4 D1
E2 M Smith SA 50000 P4 D2
E2 M Smith SA 50000 P4 D3
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P3 D3
E3 A Lee ME 40000 P3 D3
E4 J Miller PR 20000 P3 D3
Multi-Valued Dependencies (MVDs)
A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C
A trivial MVD A mdashgtgt B occurs when either B is a subset of A or
A cup B = R
Fourth Normal Form (4NF)
Fourth normal form (4NF) is based on the idea of multi-valued dependencies
A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies
Fourth Normal Form (4NF) Example
eno mdashgtgtpno
eno mdashgtgtdno
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3
eno ename title salary
E1 J Doe EE 30000
E2 MSmith SA 50000
E3 A Lee ME 40000
E4 J Miller PR 20000
eno pno
E1 P1
E1 P2
E2 P1
E2 P4
E3 P1
E3 P3
E4 P3
eno dno
E2 D1
E2 D2
E2 D3
E3 D2
E3 D3
E4 D3
Lossless-join Dependency
The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation
A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined
Fifth Normal Form (5NF)
bull Fifth normal form (5NF) is based on join dependencies
A relation is in fifth normal form (5NF) if the relation has no join dependency
bull 5NF is also called project-join normal form (PJNF)
A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R
The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have
ΠR1(r) ΠR2(r) hellip ΠRn(r) = r
Fifth Normal Form (5NF) Example
Consider a relation Supply (sname partName projName)
1048715Add the additional constraint that
If project j requires part p
and supplier s supplies part p
and supplier s supplies at least one item to project j
Then supplier s also supplies part p to project j
Original Supply Relation Supply Relation with new constraintsname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
Fifth Normal Form (5NF) Example
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
sname partName
Smith Bolt
Smith Nut
Adam Bolt
Walton Nut
Adam Nail
sname projName
Smith X
Smith Y
Adam Y
Walton Z
Adam X
partName projName
Bolt X
Nut Y
Bolt Y
Nut Z
Nail X
Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples
Desirable Properties in normalization
Relational schemas that are well-designed have several
important properties
1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or
relationship
2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins
3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations
Normal Forms in Practice
Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used
For example query execution time may increase because of normalization as more joins become necessary to answer queries
Normal Forms in Practice Example
For example street and city uniquely determine a zipcode
In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed
When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly
fd1
ENo City Street ZipCode
fd2
ENo ZipCode
ZipCode City Street
Normal Forms in Practice Summary
In summary normalization is typically used in two ways
To improve on relational schemas after ER design
To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies
There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance
This is the classic time versus space tradeoff
Conclusion
Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies
Normal forms indicate when relations satisfy certain properties
1NF - All attributes are atomic
2NF - All attributes are fully functionally dependent on a key
3NF - There are no transitive dependencies in the relation
4NF - There are no multi-valued dependencies in the relation
5NF - The relation has no join dependency
In practice normalization is used to improve schemas produced after ER design and existing relational schemas
It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time
Objectives
Explain process of normalization and why it is beneficial
Describe the concept of the Universal Relation
What are the motivations for normalization
What are the 3 types of update anomalies Be able to give examples
Describe the lossless-join property and dependency preservation property
Define and explain the concept of a functional dependency
What is a trivial FD
Describe the relationship between FDs and keys
What are Armstrongs Axioms When are they used
What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs
Objectives (2)
Define minimal and covers for a set of FDs
Define normal forms be able to list the normal forms and draw a diagram showing their relationship
Define 1NF Know how to convert non-1NF relation to 1NF
Define 2NF Convert 1NF relation to 2NF
Define 3NF Convert 2NF relation to 3NF
Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example
Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm
Describe multi-valued dependencies When do they occur
What is a trivial MVD
Define 4NF Convert 3NFBCNF relation to 4NF
Objectives (3)
Describe lossless-join property and how it relates to 5NF
Define 5NF
Explain normalization in terms of the time versus space tradeoff
Explain the relationship between normalization and ER modeling
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Slide 45
- Slide 46
- Slide 47
- Slide 48
- Slide 49
- Slide 50
- Slide 51
- Slide 52
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- Slide 60
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
-
Overview to Normalization
Normalization is a validation technique for producing relations with desirable properties
Normalization is a bottom-up design technique for producing relations
Normalization decomposes relations into smaller relations that contain less redundancy This decomposition requires reconstruction of the original relations from the smaller relations and no information is lost
It was developed by Codd in 1972 preceding ER modeling and extended by others over the years
Example Normalization to DeptEmp
bull DeptEmp Relation
ENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5
E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7
E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5
E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5
E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7
E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8
E8 J Jones 10-11-72 SA 50000 null D1 Management E8
Normalization to DeptEmp
Emp Relation Dept Relation
dno dname mgrno
D1 Management E8
D2 consulting E7
D3 Accounting E5
eno ename bdate title salary superNo dno
E1 J Doe 01-05-75 EE 30000 E2 null
E2 M Smith 06-04-66 SA 50000 E5 D3
E3 A Lee 07-05-66 ME 40000 E7 D2
E4 J Miller 09-01-50 PR 20000 E6 D3
E5 B Casey 12-25-71 SA 50000 E8 D3
E6 L Chu 11-30-65 EE 30000 E7 D2
E7 R Davis 09-08-77 ME 40000 E8 D1
E8 J Jones 10-11-72 SA 50000 null D1
Dept Emp= Emp dno Dept
Overview to Normalization
Input a relation schema and constraints on attributes
Output one or more relation schemas
Processing Idea examining the relationships between attributes and organize the data according to its functional dependencies( 函数依赖)
Significance
1) Validate our design correct our improper design
2) Help for understanding the nature and the meaning of the data in the enterprise for learning more about the data semantics
When to use normalization
Normalization can be used after mapping ER model into relation model or independently
Normalization may be especially useful for databases that have already been designed without using formal techniques
Theory basic of normalization
Functional Dependencies ( 函数依赖) Functional dependencies represent integrity constraints ( 完整
性约束) on the values of attributes in a relation and are used in normalization
A functional dependency (FD) describes the relationship between attributes in a relation We say a set of attributes X functionally determines an attribute Y if given the values of X we always know the only possible value of Y
Notation X rarr Y X functionally determines Y Y is functionally dependent on XExample
eno rarr ename eno pno rarr Duration
Notation for Functional Dependencies
A functional dependency has a left-hand side called the determinant which is a set of attributes and one attribute on the right-hand side
eno pno rarr duration
determinant determined attribute
Strictly speaking there is always only one attribute on the RHS but we can combine several functional dependencies into one
eno pno rarr durationeno pno rarr respeno pno rarr duration resp
Remember that this is really two functional dependencies
The Semantics ofFunctional Dependencies
Similar to keys Functional dependencies are a property of a relation schema (intension) and not a property of a particular instance of the schema (extension)
Functional dependencies are directional
eno rarr ename ename rarr eno
ExampleFDs in EmpDept relation
bull EmpDept RelationENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8E8 J Jones 10-11-72 SA 50000 null D1 Management E8
Eno rarr ename bdate title salary superno dno
dnorarrdname mgrno
titlerarr salary
Trivial Functional Dependencies ( 平凡函数依赖 )
A functional dependency is trivialif the attributes on its left-hand side are a superset of the attributes on its right-hand side
Examples eno rarr eno
eno ename rarr eno
eno pno duration rarr eno Duration
Trivial functional dependencies are not interesting because they do not tell us anything
We are only interested in nontrivial FDs
Full Functional Dependencies
A functional dependency A rarr B is a full functional dependency if removal of any attribute from A results in the dependency not existing any more We say that B is fully functionally dependent on A If remove an attribute from A and the functional dependency still exists we say that B is partially dependent on A
Example
eno rarr ename (full FD)
eno ename rarr salary title (partial FD - can remove ename)
eno pno rarr hours resp (full FD)
How to Identify FDs in a relation
Examining the semantics of the attributes of relation
Identify all non-trivial functional dependencies in the Universal Relation
Universal (eno pno resp duration ename bdate title salary superNo dno dname mgrNo pname budget)
Eno rarr ename bdate title salary superno dnodnorarrdname mgrnotitlerarr salary
Formal Definition ofFunctional Dependencies
Given relation R defined over U = A1 A2 An where X U Y U If for all pairs of tuples t1 and t2 in any legal instance of relation schema R
t1[X] = t2[X] implies t1[Y] = t2[Y]
then the functional dependency X rarr Y holds in R
Functional Dependencies and Keys
Functional dependencies can be used to determine the candidate keys of a relation
What is superkey candidate key primary key
For example if an attribute functionally determines all other attributes in the relation that attribute can be a key
eno rarr eno ename bdate title supereno dno eno is a candidate key for the Emp relation
Functional Dependencies and Keys
ENo rarr EName BDate Title SuperNo DNoDNo rarr DName MgreNotitle rarr salary
Question List the candidate key(s) of this relation
ENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5
E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7
E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5
E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5
E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7
E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8
E8 J Jones 10-11-72 SA 50000 null D1 Management E8
EmpDept relation
Applications of FDs
1) given a set of FDs F in relation R FD X rarr Y is true or not
2) given a set of FDs F in relation R a set of attributes X is a candidate key of R or not
3) given a set of FDs F in relation R which candidate keys are in R
4) given a set of FDs F in relation R some FDs in F is Excessive or not
Rules for reasoning about functional dependencies
Armstrongs Axioms are three inference rules for functional dependenciesLet A B C and D be subsets of the attributes of relation R
IR1 Reflexive Rule( 自反性 ) If AB then A rarr B
A set of attributes always trivially functionally determines any subset of itself
IR2 Augmentation Rule( 增加性 ) If A rarr B then AC rarr BC
Adding a set of attributes to both the LHS and RHS results in another valid functional dependency
IR3 Transitive Rule( 传递性 ) If A rarr B and B rarr C then A rarr C
Transitivity holds with FDs
Derived Rules from Armstrongs Axioms
There are three other well-known rules that can be derived from the basic three
IR4 Decomposition( 可分解性 ) If A rarr BC then A rarr B and A rarr C A FD with two attributes on RHS can be decomposed into two FDs each with on attribute on the RHS
IR5 Union( 合并性 ) If A rarr B and A rarr C then A rarr BC Reverse of decomposition rule
IR6 Composition( 组合性 ) If A rarr B and C rarr D then AC rarr BD More general than union - can combine two non-overlapping FDs into one
Characteristics of Armstrongs Axioms
The rules are sound( 健壮的 ) and complete( 完备的 )
1) Sound The rules do not infer( 推断 ) any incorrect FDs
2) Complete The rules can be used to infer all correct FDs (no FDs are missing)
The axioms (called IR1 IR2 and IR3) can be used to determine a set of all FDs F+ from an initial set of FDs F
F+ is referred to as the closure of F
Strategy of dealing with questions
The basic idea is that given any set of attributes X we can compute the set of all attributes X+ that can be functionally determined using F This is called the closure of X under F
For question 1 given a set of FDs F in R FD X rarr Y is true or not we know that X rarr Y holds if Y is in X+
For question 2 a set of attributes X is a candidate key of R or not X is a superkey of R if X+ is all attributes of R
For question 3 how to computing F+ we can compute X+ for all possible subsets X of R
Computing the Attribute Closure
The algorithm is as follows
Given a set of attributes X
Let X+ = X
Repeat Find a FD in F whose left side is a subset of X+ Add the right side of F to X+
Until (X+ does not change or X+ contains all attributes of R)
After the algorithm completes you have a set of attributes X+ that can be functionally determined by X This allows you to produce FDs of the form
X rarr A where A is in X+
Computing Attribute Closure Example
R (ABCDEG)
F = A rarr BC C rarr D D rarr G
Compute A+
A+ = A (initial step)
A+ = ABC (by using FD A rarr BC)
A+ = ABCD (by using FD C rarr D)
A+ = ABCDG (by using FD D rarr G)
Similarly we can compute C+ and EG+
C+ = CDG
EG+ = EG
Using Attribute Closure to Detect a FD
The attribute closure algorithm can be used to determine if a particular FD X rarr Y holds based on the set of given FDs F Compute X+ and see if Y is in X+
Example
R = (ABCDEG)
F = A rarr BC C rarr D D rarr G
Does the FD CD rarr G hold Compute CD+ = CDG Yes the FD CD rarr G holds
Does the FD BC rarr E hold Compute BC+ = BCDG No the FD BC rarr E does n
ot hold
Using Attribute Closure to Find all FDs
The attribute closure algorithm can be used to find all FDs of a given relation R if perform the attribute closure algorithm for all subsets X of the attributes R
This is an exponential algorithm in the number of attributes in R Although there are a few optimizations Do not compute the closure of the empty set or of the set of all attributes If we find X+ = all attributes do not compute the closure of any supersets of X Note that even though you do not have to compute the closure (because you know the answer) you still must use the known closure to infer new FDs
Using Attribute Closureto Find all FDs Example
Compute F+ from F where
R (ABCD)
F = A rarr BC C rarr D
Must compute all these attribute closures
A+ B+ C+ D+
AB+ AC+ AD+ BC+ BD+ CD+
ABC+ ABD+ ACD+ BCD+
Do not have to compute ABCD+
Start with the single sets
A+ = ABCD B+ = B C+ = CD D+ = D
New FD A rarr D
Using Attribute Closureto Find all FDs Example (cont)
Next the two element sets Do not compute AB+ AC+ AD+ because A+ contained all attributes of R New FDs AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C BC+ = BCD BD+ = BD CD+ = CD New FD BC rarr DThree element sets Do not compute ABC+ ABD+ ACD+ New FDs ABC rarr D ABD rarr C ACD rarr B BCD+ = BCD No new FDsAll FDs F+ = A rarr BC C rarr D A rarr D BC rarr D AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C ABC rarr D ABD rarrC ACD rarr B
FD Closure Questions
1) Compute all nontrivial FDs and list all keys of R
R = (ABCD)
F = AB rarr C C rarr D D rarr A
2) Compute all nontrivial FDs and list all candidate keys of R
R = (ABCD)
F = AB rarr C BC rarr D CD rarr A AD rarr B
Equivalent Sets of FDs
A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+ We can determine if F covers E by calculating X+ with respect to F for each FD X rarr Y in E and checking whether X+ includes Y
Two sets of FDs E and F are equivalent if E covers F and F covers E That is E+= F+
Minimal Set of FDs
A set of FDs F is minimal if it satisfies these conditions Every FD in F has a single attribute on its right-hand side We cannot replace any FD X rarr A in F with Y rarr A where Y
X and still have a set of dependencies that is equivalent to F
We cannot remove any FD and still have a set that is equivalent to F
A minimal sets of FDs is like a canonical( 规范的 ) form of FDs
Note that there may be many minimal covers for a given set of functional dependencies F
Normal Forms
A relation is in a particular normal form if it satisfies certain
normalization properties
There are several normal forms defined 1NF - First Normal Form 2NF - Second Normal Form 3NF - Third Normal Form BCNF - Boyce-Codd Normal Form 4NF - Fourth Normal Form 5NF - Fifth Normal Form
Each of these normal forms are stricter than the next
For example 3NF is better than 2NF because it removes more redundancyanomalies from the schema than 2NF
Normal Forms
All relations (non-normalized)
1NF (First Normal Form)
2NF (Second Normal Form)
3NF (Third Normal Form)
BCNF (Boyce-Codd NF)
4NF (Fourth NF)
5NF (Fifth NF)
First Normal Form (1NF)
A relation is in first normal form (1NF) if all its attribute values are atomic
That is a 1NF relation cannot have an attribute value that is a set of values (multi-valued attribute) a set of tuples (nested relation)
1NF is a standard assumption in relational DBMSs However object-oriented DBMSs and nested relational DBMSs relax t
his constraint
A relation that is not in 1NF is an unnormalized relation
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1P2
AnalystAnalyst
246
E3 A Lee P3P4
ConsultantEngineer
1048
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1 P2 Analyst Analyst 24 6
E3 A Lee P3P4 Consultant Engineer 10 48
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
A non-1NF Relation
One way to convert a non-1NF relation to a 1NF relation
1) Splitting Method - Divide the existing relation into two relations one with non-repeating attributes and the other with repeating attributes
Make a new relation consisting of the primary key of the original relation and the repeating attributes Determine a primary key for this new relation
Remove the repeating attributes from the original relation
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1P2
AnalystAnalyst
246
E3 A Lee P3P4
ConsultantEngineer
1048
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
ENo EName
E1 J Doe
E2 M Smith
E3 A Lee
E4 J Miller
E5 B Casey
E6 L Chu
E7 R Davis
ENo PNo Resp DurationE1 P1 Manager 12E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36
Second Normal Form (2NF)
A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key
If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation
Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key
Second Normal Form (2NF) Example
fd1 and fd4 are partial functional dependencies Normalize to
Emp (eno ename title bdate salary supereno dno)
WorksOn (eno pno resp hours)
Proj (pno pname budget)
ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration
EmpProj relation
fd1 fd2
fd3fd4
Second Normal Form (2NF) Example (2)
ENo EName
BDate Title Salary SuperNo
DNo
Emp relation
fd1
fd2
fd3fd4
PNo PName Budget
Proj relationENo PNo Resp Duration
WorkOn relation
Third Normal Form (3NF)
Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th
an 2 FDs
A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key
Converting a relation to 3NF from 2NF involves the removal of transitive dependencies
Third Normal Form (3NF) Example
fd2 results in a transitive dependency eno rarr salary Remove it
ENo EName BDate Title Salary SuperNo DNo
Emp relation
fd1
fd2
fd2
Title Salary
Pay relation
ENo EName BDate Title SuperNo DNo
Emp relation
fd1
General Definitions of 2NF and 3NF
We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)
General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key
General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key
Note that a prime attribute is an attribute that is in any key (candidate or primary)
General Definition of 3NF Example
The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute
However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key
Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key
ENo EName BDate Title Salary SuperNo DNo SSN
Emp relation
fd1
fd2
Normalization Question
Consider the universal relation
R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ
List the keys for R
Decompose R into 2NF and then 3NF relations
Boyce-Codd Normal Form (BCNF)
A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key
To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key
The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
ClientInterview relation
Relation is in 3NF but not BCNF
fd4
fd3
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
Note that we lose the FD fd2
fd4
fd3
ClientNo InterviewDate
InterviwTime
StaffNo StaffNo InterviwDate
RoomNo
Interview relation StaffRoom relation
BCNF versus 3NF
We can decompose to BCNF but sometimes we do not want to if we lose a FD
The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency
Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)
In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)
Conversion to BCNF
There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y
Note that if any FD in F+ violates BCNF then a FD in F violates BCNF
2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey
3) Replace R by relations with schemas R1 = X+
R2 = (R ndash X+) U X
4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib
utes are all in R1 or R2
Normalization Question
Given this database schema normalize into BCNF
fd1
PID Country Lot Area Price TaxRate
fd2
fd4fd3
Multi-Valued Dependencies
A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema
A MVD occurs when two independent 1N relationships are in the relational schema
Multi-Valued Dependencies Example
Eno ename Title Salary Pno dno
E1 J Doe EE 30000 P1P2
E2 M Smith SA 50000 P1P4 D1D2D3
E3 A Lee ME 40000 P1P3 D2D3
E4 J Miller PR 20000 P3 D3
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1
E1 J Doe EE 30000 P2
E2 M Smith SA 50000 P1 D1
E2 M Smith SA 50000 P1 D2
E2 M Smith SA 50000 P1 D3
E2 M Smith SA 50000 P4 D1
E2 M Smith SA 50000 P4 D2
E2 M Smith SA 50000 P4 D3
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P3 D3
E3 A Lee ME 40000 P3 D3
E4 J Miller PR 20000 P3 D3
Multi-Valued Dependencies (MVDs)
A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C
A trivial MVD A mdashgtgt B occurs when either B is a subset of A or
A cup B = R
Fourth Normal Form (4NF)
Fourth normal form (4NF) is based on the idea of multi-valued dependencies
A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies
Fourth Normal Form (4NF) Example
eno mdashgtgtpno
eno mdashgtgtdno
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3
eno ename title salary
E1 J Doe EE 30000
E2 MSmith SA 50000
E3 A Lee ME 40000
E4 J Miller PR 20000
eno pno
E1 P1
E1 P2
E2 P1
E2 P4
E3 P1
E3 P3
E4 P3
eno dno
E2 D1
E2 D2
E2 D3
E3 D2
E3 D3
E4 D3
Lossless-join Dependency
The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation
A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined
Fifth Normal Form (5NF)
bull Fifth normal form (5NF) is based on join dependencies
A relation is in fifth normal form (5NF) if the relation has no join dependency
bull 5NF is also called project-join normal form (PJNF)
A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R
The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have
ΠR1(r) ΠR2(r) hellip ΠRn(r) = r
Fifth Normal Form (5NF) Example
Consider a relation Supply (sname partName projName)
1048715Add the additional constraint that
If project j requires part p
and supplier s supplies part p
and supplier s supplies at least one item to project j
Then supplier s also supplies part p to project j
Original Supply Relation Supply Relation with new constraintsname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
Fifth Normal Form (5NF) Example
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
sname partName
Smith Bolt
Smith Nut
Adam Bolt
Walton Nut
Adam Nail
sname projName
Smith X
Smith Y
Adam Y
Walton Z
Adam X
partName projName
Bolt X
Nut Y
Bolt Y
Nut Z
Nail X
Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples
Desirable Properties in normalization
Relational schemas that are well-designed have several
important properties
1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or
relationship
2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins
3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations
Normal Forms in Practice
Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used
For example query execution time may increase because of normalization as more joins become necessary to answer queries
Normal Forms in Practice Example
For example street and city uniquely determine a zipcode
In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed
When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly
fd1
ENo City Street ZipCode
fd2
ENo ZipCode
ZipCode City Street
Normal Forms in Practice Summary
In summary normalization is typically used in two ways
To improve on relational schemas after ER design
To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies
There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance
This is the classic time versus space tradeoff
Conclusion
Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies
Normal forms indicate when relations satisfy certain properties
1NF - All attributes are atomic
2NF - All attributes are fully functionally dependent on a key
3NF - There are no transitive dependencies in the relation
4NF - There are no multi-valued dependencies in the relation
5NF - The relation has no join dependency
In practice normalization is used to improve schemas produced after ER design and existing relational schemas
It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time
Objectives
Explain process of normalization and why it is beneficial
Describe the concept of the Universal Relation
What are the motivations for normalization
What are the 3 types of update anomalies Be able to give examples
Describe the lossless-join property and dependency preservation property
Define and explain the concept of a functional dependency
What is a trivial FD
Describe the relationship between FDs and keys
What are Armstrongs Axioms When are they used
What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs
Objectives (2)
Define minimal and covers for a set of FDs
Define normal forms be able to list the normal forms and draw a diagram showing their relationship
Define 1NF Know how to convert non-1NF relation to 1NF
Define 2NF Convert 1NF relation to 2NF
Define 3NF Convert 2NF relation to 3NF
Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example
Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm
Describe multi-valued dependencies When do they occur
What is a trivial MVD
Define 4NF Convert 3NFBCNF relation to 4NF
Objectives (3)
Describe lossless-join property and how it relates to 5NF
Define 5NF
Explain normalization in terms of the time versus space tradeoff
Explain the relationship between normalization and ER modeling
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Slide 45
- Slide 46
- Slide 47
- Slide 48
- Slide 49
- Slide 50
- Slide 51
- Slide 52
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- Slide 60
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
-
Example Normalization to DeptEmp
bull DeptEmp Relation
ENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5
E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7
E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5
E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5
E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7
E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8
E8 J Jones 10-11-72 SA 50000 null D1 Management E8
Normalization to DeptEmp
Emp Relation Dept Relation
dno dname mgrno
D1 Management E8
D2 consulting E7
D3 Accounting E5
eno ename bdate title salary superNo dno
E1 J Doe 01-05-75 EE 30000 E2 null
E2 M Smith 06-04-66 SA 50000 E5 D3
E3 A Lee 07-05-66 ME 40000 E7 D2
E4 J Miller 09-01-50 PR 20000 E6 D3
E5 B Casey 12-25-71 SA 50000 E8 D3
E6 L Chu 11-30-65 EE 30000 E7 D2
E7 R Davis 09-08-77 ME 40000 E8 D1
E8 J Jones 10-11-72 SA 50000 null D1
Dept Emp= Emp dno Dept
Overview to Normalization
Input a relation schema and constraints on attributes
Output one or more relation schemas
Processing Idea examining the relationships between attributes and organize the data according to its functional dependencies( 函数依赖)
Significance
1) Validate our design correct our improper design
2) Help for understanding the nature and the meaning of the data in the enterprise for learning more about the data semantics
When to use normalization
Normalization can be used after mapping ER model into relation model or independently
Normalization may be especially useful for databases that have already been designed without using formal techniques
Theory basic of normalization
Functional Dependencies ( 函数依赖) Functional dependencies represent integrity constraints ( 完整
性约束) on the values of attributes in a relation and are used in normalization
A functional dependency (FD) describes the relationship between attributes in a relation We say a set of attributes X functionally determines an attribute Y if given the values of X we always know the only possible value of Y
Notation X rarr Y X functionally determines Y Y is functionally dependent on XExample
eno rarr ename eno pno rarr Duration
Notation for Functional Dependencies
A functional dependency has a left-hand side called the determinant which is a set of attributes and one attribute on the right-hand side
eno pno rarr duration
determinant determined attribute
Strictly speaking there is always only one attribute on the RHS but we can combine several functional dependencies into one
eno pno rarr durationeno pno rarr respeno pno rarr duration resp
Remember that this is really two functional dependencies
The Semantics ofFunctional Dependencies
Similar to keys Functional dependencies are a property of a relation schema (intension) and not a property of a particular instance of the schema (extension)
Functional dependencies are directional
eno rarr ename ename rarr eno
ExampleFDs in EmpDept relation
bull EmpDept RelationENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8E8 J Jones 10-11-72 SA 50000 null D1 Management E8
Eno rarr ename bdate title salary superno dno
dnorarrdname mgrno
titlerarr salary
Trivial Functional Dependencies ( 平凡函数依赖 )
A functional dependency is trivialif the attributes on its left-hand side are a superset of the attributes on its right-hand side
Examples eno rarr eno
eno ename rarr eno
eno pno duration rarr eno Duration
Trivial functional dependencies are not interesting because they do not tell us anything
We are only interested in nontrivial FDs
Full Functional Dependencies
A functional dependency A rarr B is a full functional dependency if removal of any attribute from A results in the dependency not existing any more We say that B is fully functionally dependent on A If remove an attribute from A and the functional dependency still exists we say that B is partially dependent on A
Example
eno rarr ename (full FD)
eno ename rarr salary title (partial FD - can remove ename)
eno pno rarr hours resp (full FD)
How to Identify FDs in a relation
Examining the semantics of the attributes of relation
Identify all non-trivial functional dependencies in the Universal Relation
Universal (eno pno resp duration ename bdate title salary superNo dno dname mgrNo pname budget)
Eno rarr ename bdate title salary superno dnodnorarrdname mgrnotitlerarr salary
Formal Definition ofFunctional Dependencies
Given relation R defined over U = A1 A2 An where X U Y U If for all pairs of tuples t1 and t2 in any legal instance of relation schema R
t1[X] = t2[X] implies t1[Y] = t2[Y]
then the functional dependency X rarr Y holds in R
Functional Dependencies and Keys
Functional dependencies can be used to determine the candidate keys of a relation
What is superkey candidate key primary key
For example if an attribute functionally determines all other attributes in the relation that attribute can be a key
eno rarr eno ename bdate title supereno dno eno is a candidate key for the Emp relation
Functional Dependencies and Keys
ENo rarr EName BDate Title SuperNo DNoDNo rarr DName MgreNotitle rarr salary
Question List the candidate key(s) of this relation
ENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5
E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7
E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5
E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5
E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7
E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8
E8 J Jones 10-11-72 SA 50000 null D1 Management E8
EmpDept relation
Applications of FDs
1) given a set of FDs F in relation R FD X rarr Y is true or not
2) given a set of FDs F in relation R a set of attributes X is a candidate key of R or not
3) given a set of FDs F in relation R which candidate keys are in R
4) given a set of FDs F in relation R some FDs in F is Excessive or not
Rules for reasoning about functional dependencies
Armstrongs Axioms are three inference rules for functional dependenciesLet A B C and D be subsets of the attributes of relation R
IR1 Reflexive Rule( 自反性 ) If AB then A rarr B
A set of attributes always trivially functionally determines any subset of itself
IR2 Augmentation Rule( 增加性 ) If A rarr B then AC rarr BC
Adding a set of attributes to both the LHS and RHS results in another valid functional dependency
IR3 Transitive Rule( 传递性 ) If A rarr B and B rarr C then A rarr C
Transitivity holds with FDs
Derived Rules from Armstrongs Axioms
There are three other well-known rules that can be derived from the basic three
IR4 Decomposition( 可分解性 ) If A rarr BC then A rarr B and A rarr C A FD with two attributes on RHS can be decomposed into two FDs each with on attribute on the RHS
IR5 Union( 合并性 ) If A rarr B and A rarr C then A rarr BC Reverse of decomposition rule
IR6 Composition( 组合性 ) If A rarr B and C rarr D then AC rarr BD More general than union - can combine two non-overlapping FDs into one
Characteristics of Armstrongs Axioms
The rules are sound( 健壮的 ) and complete( 完备的 )
1) Sound The rules do not infer( 推断 ) any incorrect FDs
2) Complete The rules can be used to infer all correct FDs (no FDs are missing)
The axioms (called IR1 IR2 and IR3) can be used to determine a set of all FDs F+ from an initial set of FDs F
F+ is referred to as the closure of F
Strategy of dealing with questions
The basic idea is that given any set of attributes X we can compute the set of all attributes X+ that can be functionally determined using F This is called the closure of X under F
For question 1 given a set of FDs F in R FD X rarr Y is true or not we know that X rarr Y holds if Y is in X+
For question 2 a set of attributes X is a candidate key of R or not X is a superkey of R if X+ is all attributes of R
For question 3 how to computing F+ we can compute X+ for all possible subsets X of R
Computing the Attribute Closure
The algorithm is as follows
Given a set of attributes X
Let X+ = X
Repeat Find a FD in F whose left side is a subset of X+ Add the right side of F to X+
Until (X+ does not change or X+ contains all attributes of R)
After the algorithm completes you have a set of attributes X+ that can be functionally determined by X This allows you to produce FDs of the form
X rarr A where A is in X+
Computing Attribute Closure Example
R (ABCDEG)
F = A rarr BC C rarr D D rarr G
Compute A+
A+ = A (initial step)
A+ = ABC (by using FD A rarr BC)
A+ = ABCD (by using FD C rarr D)
A+ = ABCDG (by using FD D rarr G)
Similarly we can compute C+ and EG+
C+ = CDG
EG+ = EG
Using Attribute Closure to Detect a FD
The attribute closure algorithm can be used to determine if a particular FD X rarr Y holds based on the set of given FDs F Compute X+ and see if Y is in X+
Example
R = (ABCDEG)
F = A rarr BC C rarr D D rarr G
Does the FD CD rarr G hold Compute CD+ = CDG Yes the FD CD rarr G holds
Does the FD BC rarr E hold Compute BC+ = BCDG No the FD BC rarr E does n
ot hold
Using Attribute Closure to Find all FDs
The attribute closure algorithm can be used to find all FDs of a given relation R if perform the attribute closure algorithm for all subsets X of the attributes R
This is an exponential algorithm in the number of attributes in R Although there are a few optimizations Do not compute the closure of the empty set or of the set of all attributes If we find X+ = all attributes do not compute the closure of any supersets of X Note that even though you do not have to compute the closure (because you know the answer) you still must use the known closure to infer new FDs
Using Attribute Closureto Find all FDs Example
Compute F+ from F where
R (ABCD)
F = A rarr BC C rarr D
Must compute all these attribute closures
A+ B+ C+ D+
AB+ AC+ AD+ BC+ BD+ CD+
ABC+ ABD+ ACD+ BCD+
Do not have to compute ABCD+
Start with the single sets
A+ = ABCD B+ = B C+ = CD D+ = D
New FD A rarr D
Using Attribute Closureto Find all FDs Example (cont)
Next the two element sets Do not compute AB+ AC+ AD+ because A+ contained all attributes of R New FDs AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C BC+ = BCD BD+ = BD CD+ = CD New FD BC rarr DThree element sets Do not compute ABC+ ABD+ ACD+ New FDs ABC rarr D ABD rarr C ACD rarr B BCD+ = BCD No new FDsAll FDs F+ = A rarr BC C rarr D A rarr D BC rarr D AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C ABC rarr D ABD rarrC ACD rarr B
FD Closure Questions
1) Compute all nontrivial FDs and list all keys of R
R = (ABCD)
F = AB rarr C C rarr D D rarr A
2) Compute all nontrivial FDs and list all candidate keys of R
R = (ABCD)
F = AB rarr C BC rarr D CD rarr A AD rarr B
Equivalent Sets of FDs
A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+ We can determine if F covers E by calculating X+ with respect to F for each FD X rarr Y in E and checking whether X+ includes Y
Two sets of FDs E and F are equivalent if E covers F and F covers E That is E+= F+
Minimal Set of FDs
A set of FDs F is minimal if it satisfies these conditions Every FD in F has a single attribute on its right-hand side We cannot replace any FD X rarr A in F with Y rarr A where Y
X and still have a set of dependencies that is equivalent to F
We cannot remove any FD and still have a set that is equivalent to F
A minimal sets of FDs is like a canonical( 规范的 ) form of FDs
Note that there may be many minimal covers for a given set of functional dependencies F
Normal Forms
A relation is in a particular normal form if it satisfies certain
normalization properties
There are several normal forms defined 1NF - First Normal Form 2NF - Second Normal Form 3NF - Third Normal Form BCNF - Boyce-Codd Normal Form 4NF - Fourth Normal Form 5NF - Fifth Normal Form
Each of these normal forms are stricter than the next
For example 3NF is better than 2NF because it removes more redundancyanomalies from the schema than 2NF
Normal Forms
All relations (non-normalized)
1NF (First Normal Form)
2NF (Second Normal Form)
3NF (Third Normal Form)
BCNF (Boyce-Codd NF)
4NF (Fourth NF)
5NF (Fifth NF)
First Normal Form (1NF)
A relation is in first normal form (1NF) if all its attribute values are atomic
That is a 1NF relation cannot have an attribute value that is a set of values (multi-valued attribute) a set of tuples (nested relation)
1NF is a standard assumption in relational DBMSs However object-oriented DBMSs and nested relational DBMSs relax t
his constraint
A relation that is not in 1NF is an unnormalized relation
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1P2
AnalystAnalyst
246
E3 A Lee P3P4
ConsultantEngineer
1048
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1 P2 Analyst Analyst 24 6
E3 A Lee P3P4 Consultant Engineer 10 48
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
A non-1NF Relation
One way to convert a non-1NF relation to a 1NF relation
1) Splitting Method - Divide the existing relation into two relations one with non-repeating attributes and the other with repeating attributes
Make a new relation consisting of the primary key of the original relation and the repeating attributes Determine a primary key for this new relation
Remove the repeating attributes from the original relation
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1P2
AnalystAnalyst
246
E3 A Lee P3P4
ConsultantEngineer
1048
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
ENo EName
E1 J Doe
E2 M Smith
E3 A Lee
E4 J Miller
E5 B Casey
E6 L Chu
E7 R Davis
ENo PNo Resp DurationE1 P1 Manager 12E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36
Second Normal Form (2NF)
A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key
If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation
Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key
Second Normal Form (2NF) Example
fd1 and fd4 are partial functional dependencies Normalize to
Emp (eno ename title bdate salary supereno dno)
WorksOn (eno pno resp hours)
Proj (pno pname budget)
ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration
EmpProj relation
fd1 fd2
fd3fd4
Second Normal Form (2NF) Example (2)
ENo EName
BDate Title Salary SuperNo
DNo
Emp relation
fd1
fd2
fd3fd4
PNo PName Budget
Proj relationENo PNo Resp Duration
WorkOn relation
Third Normal Form (3NF)
Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th
an 2 FDs
A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key
Converting a relation to 3NF from 2NF involves the removal of transitive dependencies
Third Normal Form (3NF) Example
fd2 results in a transitive dependency eno rarr salary Remove it
ENo EName BDate Title Salary SuperNo DNo
Emp relation
fd1
fd2
fd2
Title Salary
Pay relation
ENo EName BDate Title SuperNo DNo
Emp relation
fd1
General Definitions of 2NF and 3NF
We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)
General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key
General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key
Note that a prime attribute is an attribute that is in any key (candidate or primary)
General Definition of 3NF Example
The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute
However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key
Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key
ENo EName BDate Title Salary SuperNo DNo SSN
Emp relation
fd1
fd2
Normalization Question
Consider the universal relation
R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ
List the keys for R
Decompose R into 2NF and then 3NF relations
Boyce-Codd Normal Form (BCNF)
A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key
To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key
The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
ClientInterview relation
Relation is in 3NF but not BCNF
fd4
fd3
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
Note that we lose the FD fd2
fd4
fd3
ClientNo InterviewDate
InterviwTime
StaffNo StaffNo InterviwDate
RoomNo
Interview relation StaffRoom relation
BCNF versus 3NF
We can decompose to BCNF but sometimes we do not want to if we lose a FD
The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency
Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)
In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)
Conversion to BCNF
There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y
Note that if any FD in F+ violates BCNF then a FD in F violates BCNF
2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey
3) Replace R by relations with schemas R1 = X+
R2 = (R ndash X+) U X
4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib
utes are all in R1 or R2
Normalization Question
Given this database schema normalize into BCNF
fd1
PID Country Lot Area Price TaxRate
fd2
fd4fd3
Multi-Valued Dependencies
A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema
A MVD occurs when two independent 1N relationships are in the relational schema
Multi-Valued Dependencies Example
Eno ename Title Salary Pno dno
E1 J Doe EE 30000 P1P2
E2 M Smith SA 50000 P1P4 D1D2D3
E3 A Lee ME 40000 P1P3 D2D3
E4 J Miller PR 20000 P3 D3
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1
E1 J Doe EE 30000 P2
E2 M Smith SA 50000 P1 D1
E2 M Smith SA 50000 P1 D2
E2 M Smith SA 50000 P1 D3
E2 M Smith SA 50000 P4 D1
E2 M Smith SA 50000 P4 D2
E2 M Smith SA 50000 P4 D3
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P3 D3
E3 A Lee ME 40000 P3 D3
E4 J Miller PR 20000 P3 D3
Multi-Valued Dependencies (MVDs)
A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C
A trivial MVD A mdashgtgt B occurs when either B is a subset of A or
A cup B = R
Fourth Normal Form (4NF)
Fourth normal form (4NF) is based on the idea of multi-valued dependencies
A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies
Fourth Normal Form (4NF) Example
eno mdashgtgtpno
eno mdashgtgtdno
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3
eno ename title salary
E1 J Doe EE 30000
E2 MSmith SA 50000
E3 A Lee ME 40000
E4 J Miller PR 20000
eno pno
E1 P1
E1 P2
E2 P1
E2 P4
E3 P1
E3 P3
E4 P3
eno dno
E2 D1
E2 D2
E2 D3
E3 D2
E3 D3
E4 D3
Lossless-join Dependency
The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation
A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined
Fifth Normal Form (5NF)
bull Fifth normal form (5NF) is based on join dependencies
A relation is in fifth normal form (5NF) if the relation has no join dependency
bull 5NF is also called project-join normal form (PJNF)
A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R
The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have
ΠR1(r) ΠR2(r) hellip ΠRn(r) = r
Fifth Normal Form (5NF) Example
Consider a relation Supply (sname partName projName)
1048715Add the additional constraint that
If project j requires part p
and supplier s supplies part p
and supplier s supplies at least one item to project j
Then supplier s also supplies part p to project j
Original Supply Relation Supply Relation with new constraintsname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
Fifth Normal Form (5NF) Example
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
sname partName
Smith Bolt
Smith Nut
Adam Bolt
Walton Nut
Adam Nail
sname projName
Smith X
Smith Y
Adam Y
Walton Z
Adam X
partName projName
Bolt X
Nut Y
Bolt Y
Nut Z
Nail X
Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples
Desirable Properties in normalization
Relational schemas that are well-designed have several
important properties
1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or
relationship
2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins
3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations
Normal Forms in Practice
Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used
For example query execution time may increase because of normalization as more joins become necessary to answer queries
Normal Forms in Practice Example
For example street and city uniquely determine a zipcode
In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed
When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly
fd1
ENo City Street ZipCode
fd2
ENo ZipCode
ZipCode City Street
Normal Forms in Practice Summary
In summary normalization is typically used in two ways
To improve on relational schemas after ER design
To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies
There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance
This is the classic time versus space tradeoff
Conclusion
Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies
Normal forms indicate when relations satisfy certain properties
1NF - All attributes are atomic
2NF - All attributes are fully functionally dependent on a key
3NF - There are no transitive dependencies in the relation
4NF - There are no multi-valued dependencies in the relation
5NF - The relation has no join dependency
In practice normalization is used to improve schemas produced after ER design and existing relational schemas
It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time
Objectives
Explain process of normalization and why it is beneficial
Describe the concept of the Universal Relation
What are the motivations for normalization
What are the 3 types of update anomalies Be able to give examples
Describe the lossless-join property and dependency preservation property
Define and explain the concept of a functional dependency
What is a trivial FD
Describe the relationship between FDs and keys
What are Armstrongs Axioms When are they used
What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs
Objectives (2)
Define minimal and covers for a set of FDs
Define normal forms be able to list the normal forms and draw a diagram showing their relationship
Define 1NF Know how to convert non-1NF relation to 1NF
Define 2NF Convert 1NF relation to 2NF
Define 3NF Convert 2NF relation to 3NF
Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example
Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm
Describe multi-valued dependencies When do they occur
What is a trivial MVD
Define 4NF Convert 3NFBCNF relation to 4NF
Objectives (3)
Describe lossless-join property and how it relates to 5NF
Define 5NF
Explain normalization in terms of the time versus space tradeoff
Explain the relationship between normalization and ER modeling
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Slide 45
- Slide 46
- Slide 47
- Slide 48
- Slide 49
- Slide 50
- Slide 51
- Slide 52
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- Slide 60
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
-
Normalization to DeptEmp
Emp Relation Dept Relation
dno dname mgrno
D1 Management E8
D2 consulting E7
D3 Accounting E5
eno ename bdate title salary superNo dno
E1 J Doe 01-05-75 EE 30000 E2 null
E2 M Smith 06-04-66 SA 50000 E5 D3
E3 A Lee 07-05-66 ME 40000 E7 D2
E4 J Miller 09-01-50 PR 20000 E6 D3
E5 B Casey 12-25-71 SA 50000 E8 D3
E6 L Chu 11-30-65 EE 30000 E7 D2
E7 R Davis 09-08-77 ME 40000 E8 D1
E8 J Jones 10-11-72 SA 50000 null D1
Dept Emp= Emp dno Dept
Overview to Normalization
Input a relation schema and constraints on attributes
Output one or more relation schemas
Processing Idea examining the relationships between attributes and organize the data according to its functional dependencies( 函数依赖)
Significance
1) Validate our design correct our improper design
2) Help for understanding the nature and the meaning of the data in the enterprise for learning more about the data semantics
When to use normalization
Normalization can be used after mapping ER model into relation model or independently
Normalization may be especially useful for databases that have already been designed without using formal techniques
Theory basic of normalization
Functional Dependencies ( 函数依赖) Functional dependencies represent integrity constraints ( 完整
性约束) on the values of attributes in a relation and are used in normalization
A functional dependency (FD) describes the relationship between attributes in a relation We say a set of attributes X functionally determines an attribute Y if given the values of X we always know the only possible value of Y
Notation X rarr Y X functionally determines Y Y is functionally dependent on XExample
eno rarr ename eno pno rarr Duration
Notation for Functional Dependencies
A functional dependency has a left-hand side called the determinant which is a set of attributes and one attribute on the right-hand side
eno pno rarr duration
determinant determined attribute
Strictly speaking there is always only one attribute on the RHS but we can combine several functional dependencies into one
eno pno rarr durationeno pno rarr respeno pno rarr duration resp
Remember that this is really two functional dependencies
The Semantics ofFunctional Dependencies
Similar to keys Functional dependencies are a property of a relation schema (intension) and not a property of a particular instance of the schema (extension)
Functional dependencies are directional
eno rarr ename ename rarr eno
ExampleFDs in EmpDept relation
bull EmpDept RelationENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8E8 J Jones 10-11-72 SA 50000 null D1 Management E8
Eno rarr ename bdate title salary superno dno
dnorarrdname mgrno
titlerarr salary
Trivial Functional Dependencies ( 平凡函数依赖 )
A functional dependency is trivialif the attributes on its left-hand side are a superset of the attributes on its right-hand side
Examples eno rarr eno
eno ename rarr eno
eno pno duration rarr eno Duration
Trivial functional dependencies are not interesting because they do not tell us anything
We are only interested in nontrivial FDs
Full Functional Dependencies
A functional dependency A rarr B is a full functional dependency if removal of any attribute from A results in the dependency not existing any more We say that B is fully functionally dependent on A If remove an attribute from A and the functional dependency still exists we say that B is partially dependent on A
Example
eno rarr ename (full FD)
eno ename rarr salary title (partial FD - can remove ename)
eno pno rarr hours resp (full FD)
How to Identify FDs in a relation
Examining the semantics of the attributes of relation
Identify all non-trivial functional dependencies in the Universal Relation
Universal (eno pno resp duration ename bdate title salary superNo dno dname mgrNo pname budget)
Eno rarr ename bdate title salary superno dnodnorarrdname mgrnotitlerarr salary
Formal Definition ofFunctional Dependencies
Given relation R defined over U = A1 A2 An where X U Y U If for all pairs of tuples t1 and t2 in any legal instance of relation schema R
t1[X] = t2[X] implies t1[Y] = t2[Y]
then the functional dependency X rarr Y holds in R
Functional Dependencies and Keys
Functional dependencies can be used to determine the candidate keys of a relation
What is superkey candidate key primary key
For example if an attribute functionally determines all other attributes in the relation that attribute can be a key
eno rarr eno ename bdate title supereno dno eno is a candidate key for the Emp relation
Functional Dependencies and Keys
ENo rarr EName BDate Title SuperNo DNoDNo rarr DName MgreNotitle rarr salary
Question List the candidate key(s) of this relation
ENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5
E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7
E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5
E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5
E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7
E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8
E8 J Jones 10-11-72 SA 50000 null D1 Management E8
EmpDept relation
Applications of FDs
1) given a set of FDs F in relation R FD X rarr Y is true or not
2) given a set of FDs F in relation R a set of attributes X is a candidate key of R or not
3) given a set of FDs F in relation R which candidate keys are in R
4) given a set of FDs F in relation R some FDs in F is Excessive or not
Rules for reasoning about functional dependencies
Armstrongs Axioms are three inference rules for functional dependenciesLet A B C and D be subsets of the attributes of relation R
IR1 Reflexive Rule( 自反性 ) If AB then A rarr B
A set of attributes always trivially functionally determines any subset of itself
IR2 Augmentation Rule( 增加性 ) If A rarr B then AC rarr BC
Adding a set of attributes to both the LHS and RHS results in another valid functional dependency
IR3 Transitive Rule( 传递性 ) If A rarr B and B rarr C then A rarr C
Transitivity holds with FDs
Derived Rules from Armstrongs Axioms
There are three other well-known rules that can be derived from the basic three
IR4 Decomposition( 可分解性 ) If A rarr BC then A rarr B and A rarr C A FD with two attributes on RHS can be decomposed into two FDs each with on attribute on the RHS
IR5 Union( 合并性 ) If A rarr B and A rarr C then A rarr BC Reverse of decomposition rule
IR6 Composition( 组合性 ) If A rarr B and C rarr D then AC rarr BD More general than union - can combine two non-overlapping FDs into one
Characteristics of Armstrongs Axioms
The rules are sound( 健壮的 ) and complete( 完备的 )
1) Sound The rules do not infer( 推断 ) any incorrect FDs
2) Complete The rules can be used to infer all correct FDs (no FDs are missing)
The axioms (called IR1 IR2 and IR3) can be used to determine a set of all FDs F+ from an initial set of FDs F
F+ is referred to as the closure of F
Strategy of dealing with questions
The basic idea is that given any set of attributes X we can compute the set of all attributes X+ that can be functionally determined using F This is called the closure of X under F
For question 1 given a set of FDs F in R FD X rarr Y is true or not we know that X rarr Y holds if Y is in X+
For question 2 a set of attributes X is a candidate key of R or not X is a superkey of R if X+ is all attributes of R
For question 3 how to computing F+ we can compute X+ for all possible subsets X of R
Computing the Attribute Closure
The algorithm is as follows
Given a set of attributes X
Let X+ = X
Repeat Find a FD in F whose left side is a subset of X+ Add the right side of F to X+
Until (X+ does not change or X+ contains all attributes of R)
After the algorithm completes you have a set of attributes X+ that can be functionally determined by X This allows you to produce FDs of the form
X rarr A where A is in X+
Computing Attribute Closure Example
R (ABCDEG)
F = A rarr BC C rarr D D rarr G
Compute A+
A+ = A (initial step)
A+ = ABC (by using FD A rarr BC)
A+ = ABCD (by using FD C rarr D)
A+ = ABCDG (by using FD D rarr G)
Similarly we can compute C+ and EG+
C+ = CDG
EG+ = EG
Using Attribute Closure to Detect a FD
The attribute closure algorithm can be used to determine if a particular FD X rarr Y holds based on the set of given FDs F Compute X+ and see if Y is in X+
Example
R = (ABCDEG)
F = A rarr BC C rarr D D rarr G
Does the FD CD rarr G hold Compute CD+ = CDG Yes the FD CD rarr G holds
Does the FD BC rarr E hold Compute BC+ = BCDG No the FD BC rarr E does n
ot hold
Using Attribute Closure to Find all FDs
The attribute closure algorithm can be used to find all FDs of a given relation R if perform the attribute closure algorithm for all subsets X of the attributes R
This is an exponential algorithm in the number of attributes in R Although there are a few optimizations Do not compute the closure of the empty set or of the set of all attributes If we find X+ = all attributes do not compute the closure of any supersets of X Note that even though you do not have to compute the closure (because you know the answer) you still must use the known closure to infer new FDs
Using Attribute Closureto Find all FDs Example
Compute F+ from F where
R (ABCD)
F = A rarr BC C rarr D
Must compute all these attribute closures
A+ B+ C+ D+
AB+ AC+ AD+ BC+ BD+ CD+
ABC+ ABD+ ACD+ BCD+
Do not have to compute ABCD+
Start with the single sets
A+ = ABCD B+ = B C+ = CD D+ = D
New FD A rarr D
Using Attribute Closureto Find all FDs Example (cont)
Next the two element sets Do not compute AB+ AC+ AD+ because A+ contained all attributes of R New FDs AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C BC+ = BCD BD+ = BD CD+ = CD New FD BC rarr DThree element sets Do not compute ABC+ ABD+ ACD+ New FDs ABC rarr D ABD rarr C ACD rarr B BCD+ = BCD No new FDsAll FDs F+ = A rarr BC C rarr D A rarr D BC rarr D AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C ABC rarr D ABD rarrC ACD rarr B
FD Closure Questions
1) Compute all nontrivial FDs and list all keys of R
R = (ABCD)
F = AB rarr C C rarr D D rarr A
2) Compute all nontrivial FDs and list all candidate keys of R
R = (ABCD)
F = AB rarr C BC rarr D CD rarr A AD rarr B
Equivalent Sets of FDs
A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+ We can determine if F covers E by calculating X+ with respect to F for each FD X rarr Y in E and checking whether X+ includes Y
Two sets of FDs E and F are equivalent if E covers F and F covers E That is E+= F+
Minimal Set of FDs
A set of FDs F is minimal if it satisfies these conditions Every FD in F has a single attribute on its right-hand side We cannot replace any FD X rarr A in F with Y rarr A where Y
X and still have a set of dependencies that is equivalent to F
We cannot remove any FD and still have a set that is equivalent to F
A minimal sets of FDs is like a canonical( 规范的 ) form of FDs
Note that there may be many minimal covers for a given set of functional dependencies F
Normal Forms
A relation is in a particular normal form if it satisfies certain
normalization properties
There are several normal forms defined 1NF - First Normal Form 2NF - Second Normal Form 3NF - Third Normal Form BCNF - Boyce-Codd Normal Form 4NF - Fourth Normal Form 5NF - Fifth Normal Form
Each of these normal forms are stricter than the next
For example 3NF is better than 2NF because it removes more redundancyanomalies from the schema than 2NF
Normal Forms
All relations (non-normalized)
1NF (First Normal Form)
2NF (Second Normal Form)
3NF (Third Normal Form)
BCNF (Boyce-Codd NF)
4NF (Fourth NF)
5NF (Fifth NF)
First Normal Form (1NF)
A relation is in first normal form (1NF) if all its attribute values are atomic
That is a 1NF relation cannot have an attribute value that is a set of values (multi-valued attribute) a set of tuples (nested relation)
1NF is a standard assumption in relational DBMSs However object-oriented DBMSs and nested relational DBMSs relax t
his constraint
A relation that is not in 1NF is an unnormalized relation
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1P2
AnalystAnalyst
246
E3 A Lee P3P4
ConsultantEngineer
1048
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1 P2 Analyst Analyst 24 6
E3 A Lee P3P4 Consultant Engineer 10 48
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
A non-1NF Relation
One way to convert a non-1NF relation to a 1NF relation
1) Splitting Method - Divide the existing relation into two relations one with non-repeating attributes and the other with repeating attributes
Make a new relation consisting of the primary key of the original relation and the repeating attributes Determine a primary key for this new relation
Remove the repeating attributes from the original relation
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1P2
AnalystAnalyst
246
E3 A Lee P3P4
ConsultantEngineer
1048
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
ENo EName
E1 J Doe
E2 M Smith
E3 A Lee
E4 J Miller
E5 B Casey
E6 L Chu
E7 R Davis
ENo PNo Resp DurationE1 P1 Manager 12E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36
Second Normal Form (2NF)
A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key
If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation
Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key
Second Normal Form (2NF) Example
fd1 and fd4 are partial functional dependencies Normalize to
Emp (eno ename title bdate salary supereno dno)
WorksOn (eno pno resp hours)
Proj (pno pname budget)
ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration
EmpProj relation
fd1 fd2
fd3fd4
Second Normal Form (2NF) Example (2)
ENo EName
BDate Title Salary SuperNo
DNo
Emp relation
fd1
fd2
fd3fd4
PNo PName Budget
Proj relationENo PNo Resp Duration
WorkOn relation
Third Normal Form (3NF)
Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th
an 2 FDs
A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key
Converting a relation to 3NF from 2NF involves the removal of transitive dependencies
Third Normal Form (3NF) Example
fd2 results in a transitive dependency eno rarr salary Remove it
ENo EName BDate Title Salary SuperNo DNo
Emp relation
fd1
fd2
fd2
Title Salary
Pay relation
ENo EName BDate Title SuperNo DNo
Emp relation
fd1
General Definitions of 2NF and 3NF
We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)
General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key
General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key
Note that a prime attribute is an attribute that is in any key (candidate or primary)
General Definition of 3NF Example
The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute
However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key
Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key
ENo EName BDate Title Salary SuperNo DNo SSN
Emp relation
fd1
fd2
Normalization Question
Consider the universal relation
R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ
List the keys for R
Decompose R into 2NF and then 3NF relations
Boyce-Codd Normal Form (BCNF)
A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key
To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key
The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
ClientInterview relation
Relation is in 3NF but not BCNF
fd4
fd3
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
Note that we lose the FD fd2
fd4
fd3
ClientNo InterviewDate
InterviwTime
StaffNo StaffNo InterviwDate
RoomNo
Interview relation StaffRoom relation
BCNF versus 3NF
We can decompose to BCNF but sometimes we do not want to if we lose a FD
The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency
Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)
In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)
Conversion to BCNF
There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y
Note that if any FD in F+ violates BCNF then a FD in F violates BCNF
2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey
3) Replace R by relations with schemas R1 = X+
R2 = (R ndash X+) U X
4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib
utes are all in R1 or R2
Normalization Question
Given this database schema normalize into BCNF
fd1
PID Country Lot Area Price TaxRate
fd2
fd4fd3
Multi-Valued Dependencies
A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema
A MVD occurs when two independent 1N relationships are in the relational schema
Multi-Valued Dependencies Example
Eno ename Title Salary Pno dno
E1 J Doe EE 30000 P1P2
E2 M Smith SA 50000 P1P4 D1D2D3
E3 A Lee ME 40000 P1P3 D2D3
E4 J Miller PR 20000 P3 D3
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1
E1 J Doe EE 30000 P2
E2 M Smith SA 50000 P1 D1
E2 M Smith SA 50000 P1 D2
E2 M Smith SA 50000 P1 D3
E2 M Smith SA 50000 P4 D1
E2 M Smith SA 50000 P4 D2
E2 M Smith SA 50000 P4 D3
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P3 D3
E3 A Lee ME 40000 P3 D3
E4 J Miller PR 20000 P3 D3
Multi-Valued Dependencies (MVDs)
A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C
A trivial MVD A mdashgtgt B occurs when either B is a subset of A or
A cup B = R
Fourth Normal Form (4NF)
Fourth normal form (4NF) is based on the idea of multi-valued dependencies
A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies
Fourth Normal Form (4NF) Example
eno mdashgtgtpno
eno mdashgtgtdno
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3
eno ename title salary
E1 J Doe EE 30000
E2 MSmith SA 50000
E3 A Lee ME 40000
E4 J Miller PR 20000
eno pno
E1 P1
E1 P2
E2 P1
E2 P4
E3 P1
E3 P3
E4 P3
eno dno
E2 D1
E2 D2
E2 D3
E3 D2
E3 D3
E4 D3
Lossless-join Dependency
The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation
A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined
Fifth Normal Form (5NF)
bull Fifth normal form (5NF) is based on join dependencies
A relation is in fifth normal form (5NF) if the relation has no join dependency
bull 5NF is also called project-join normal form (PJNF)
A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R
The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have
ΠR1(r) ΠR2(r) hellip ΠRn(r) = r
Fifth Normal Form (5NF) Example
Consider a relation Supply (sname partName projName)
1048715Add the additional constraint that
If project j requires part p
and supplier s supplies part p
and supplier s supplies at least one item to project j
Then supplier s also supplies part p to project j
Original Supply Relation Supply Relation with new constraintsname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
Fifth Normal Form (5NF) Example
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
sname partName
Smith Bolt
Smith Nut
Adam Bolt
Walton Nut
Adam Nail
sname projName
Smith X
Smith Y
Adam Y
Walton Z
Adam X
partName projName
Bolt X
Nut Y
Bolt Y
Nut Z
Nail X
Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples
Desirable Properties in normalization
Relational schemas that are well-designed have several
important properties
1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or
relationship
2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins
3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations
Normal Forms in Practice
Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used
For example query execution time may increase because of normalization as more joins become necessary to answer queries
Normal Forms in Practice Example
For example street and city uniquely determine a zipcode
In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed
When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly
fd1
ENo City Street ZipCode
fd2
ENo ZipCode
ZipCode City Street
Normal Forms in Practice Summary
In summary normalization is typically used in two ways
To improve on relational schemas after ER design
To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies
There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance
This is the classic time versus space tradeoff
Conclusion
Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies
Normal forms indicate when relations satisfy certain properties
1NF - All attributes are atomic
2NF - All attributes are fully functionally dependent on a key
3NF - There are no transitive dependencies in the relation
4NF - There are no multi-valued dependencies in the relation
5NF - The relation has no join dependency
In practice normalization is used to improve schemas produced after ER design and existing relational schemas
It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time
Objectives
Explain process of normalization and why it is beneficial
Describe the concept of the Universal Relation
What are the motivations for normalization
What are the 3 types of update anomalies Be able to give examples
Describe the lossless-join property and dependency preservation property
Define and explain the concept of a functional dependency
What is a trivial FD
Describe the relationship between FDs and keys
What are Armstrongs Axioms When are they used
What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs
Objectives (2)
Define minimal and covers for a set of FDs
Define normal forms be able to list the normal forms and draw a diagram showing their relationship
Define 1NF Know how to convert non-1NF relation to 1NF
Define 2NF Convert 1NF relation to 2NF
Define 3NF Convert 2NF relation to 3NF
Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example
Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm
Describe multi-valued dependencies When do they occur
What is a trivial MVD
Define 4NF Convert 3NFBCNF relation to 4NF
Objectives (3)
Describe lossless-join property and how it relates to 5NF
Define 5NF
Explain normalization in terms of the time versus space tradeoff
Explain the relationship between normalization and ER modeling
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Slide 45
- Slide 46
- Slide 47
- Slide 48
- Slide 49
- Slide 50
- Slide 51
- Slide 52
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- Slide 60
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
-
Overview to Normalization
Input a relation schema and constraints on attributes
Output one or more relation schemas
Processing Idea examining the relationships between attributes and organize the data according to its functional dependencies( 函数依赖)
Significance
1) Validate our design correct our improper design
2) Help for understanding the nature and the meaning of the data in the enterprise for learning more about the data semantics
When to use normalization
Normalization can be used after mapping ER model into relation model or independently
Normalization may be especially useful for databases that have already been designed without using formal techniques
Theory basic of normalization
Functional Dependencies ( 函数依赖) Functional dependencies represent integrity constraints ( 完整
性约束) on the values of attributes in a relation and are used in normalization
A functional dependency (FD) describes the relationship between attributes in a relation We say a set of attributes X functionally determines an attribute Y if given the values of X we always know the only possible value of Y
Notation X rarr Y X functionally determines Y Y is functionally dependent on XExample
eno rarr ename eno pno rarr Duration
Notation for Functional Dependencies
A functional dependency has a left-hand side called the determinant which is a set of attributes and one attribute on the right-hand side
eno pno rarr duration
determinant determined attribute
Strictly speaking there is always only one attribute on the RHS but we can combine several functional dependencies into one
eno pno rarr durationeno pno rarr respeno pno rarr duration resp
Remember that this is really two functional dependencies
The Semantics ofFunctional Dependencies
Similar to keys Functional dependencies are a property of a relation schema (intension) and not a property of a particular instance of the schema (extension)
Functional dependencies are directional
eno rarr ename ename rarr eno
ExampleFDs in EmpDept relation
bull EmpDept RelationENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8E8 J Jones 10-11-72 SA 50000 null D1 Management E8
Eno rarr ename bdate title salary superno dno
dnorarrdname mgrno
titlerarr salary
Trivial Functional Dependencies ( 平凡函数依赖 )
A functional dependency is trivialif the attributes on its left-hand side are a superset of the attributes on its right-hand side
Examples eno rarr eno
eno ename rarr eno
eno pno duration rarr eno Duration
Trivial functional dependencies are not interesting because they do not tell us anything
We are only interested in nontrivial FDs
Full Functional Dependencies
A functional dependency A rarr B is a full functional dependency if removal of any attribute from A results in the dependency not existing any more We say that B is fully functionally dependent on A If remove an attribute from A and the functional dependency still exists we say that B is partially dependent on A
Example
eno rarr ename (full FD)
eno ename rarr salary title (partial FD - can remove ename)
eno pno rarr hours resp (full FD)
How to Identify FDs in a relation
Examining the semantics of the attributes of relation
Identify all non-trivial functional dependencies in the Universal Relation
Universal (eno pno resp duration ename bdate title salary superNo dno dname mgrNo pname budget)
Eno rarr ename bdate title salary superno dnodnorarrdname mgrnotitlerarr salary
Formal Definition ofFunctional Dependencies
Given relation R defined over U = A1 A2 An where X U Y U If for all pairs of tuples t1 and t2 in any legal instance of relation schema R
t1[X] = t2[X] implies t1[Y] = t2[Y]
then the functional dependency X rarr Y holds in R
Functional Dependencies and Keys
Functional dependencies can be used to determine the candidate keys of a relation
What is superkey candidate key primary key
For example if an attribute functionally determines all other attributes in the relation that attribute can be a key
eno rarr eno ename bdate title supereno dno eno is a candidate key for the Emp relation
Functional Dependencies and Keys
ENo rarr EName BDate Title SuperNo DNoDNo rarr DName MgreNotitle rarr salary
Question List the candidate key(s) of this relation
ENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5
E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7
E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5
E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5
E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7
E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8
E8 J Jones 10-11-72 SA 50000 null D1 Management E8
EmpDept relation
Applications of FDs
1) given a set of FDs F in relation R FD X rarr Y is true or not
2) given a set of FDs F in relation R a set of attributes X is a candidate key of R or not
3) given a set of FDs F in relation R which candidate keys are in R
4) given a set of FDs F in relation R some FDs in F is Excessive or not
Rules for reasoning about functional dependencies
Armstrongs Axioms are three inference rules for functional dependenciesLet A B C and D be subsets of the attributes of relation R
IR1 Reflexive Rule( 自反性 ) If AB then A rarr B
A set of attributes always trivially functionally determines any subset of itself
IR2 Augmentation Rule( 增加性 ) If A rarr B then AC rarr BC
Adding a set of attributes to both the LHS and RHS results in another valid functional dependency
IR3 Transitive Rule( 传递性 ) If A rarr B and B rarr C then A rarr C
Transitivity holds with FDs
Derived Rules from Armstrongs Axioms
There are three other well-known rules that can be derived from the basic three
IR4 Decomposition( 可分解性 ) If A rarr BC then A rarr B and A rarr C A FD with two attributes on RHS can be decomposed into two FDs each with on attribute on the RHS
IR5 Union( 合并性 ) If A rarr B and A rarr C then A rarr BC Reverse of decomposition rule
IR6 Composition( 组合性 ) If A rarr B and C rarr D then AC rarr BD More general than union - can combine two non-overlapping FDs into one
Characteristics of Armstrongs Axioms
The rules are sound( 健壮的 ) and complete( 完备的 )
1) Sound The rules do not infer( 推断 ) any incorrect FDs
2) Complete The rules can be used to infer all correct FDs (no FDs are missing)
The axioms (called IR1 IR2 and IR3) can be used to determine a set of all FDs F+ from an initial set of FDs F
F+ is referred to as the closure of F
Strategy of dealing with questions
The basic idea is that given any set of attributes X we can compute the set of all attributes X+ that can be functionally determined using F This is called the closure of X under F
For question 1 given a set of FDs F in R FD X rarr Y is true or not we know that X rarr Y holds if Y is in X+
For question 2 a set of attributes X is a candidate key of R or not X is a superkey of R if X+ is all attributes of R
For question 3 how to computing F+ we can compute X+ for all possible subsets X of R
Computing the Attribute Closure
The algorithm is as follows
Given a set of attributes X
Let X+ = X
Repeat Find a FD in F whose left side is a subset of X+ Add the right side of F to X+
Until (X+ does not change or X+ contains all attributes of R)
After the algorithm completes you have a set of attributes X+ that can be functionally determined by X This allows you to produce FDs of the form
X rarr A where A is in X+
Computing Attribute Closure Example
R (ABCDEG)
F = A rarr BC C rarr D D rarr G
Compute A+
A+ = A (initial step)
A+ = ABC (by using FD A rarr BC)
A+ = ABCD (by using FD C rarr D)
A+ = ABCDG (by using FD D rarr G)
Similarly we can compute C+ and EG+
C+ = CDG
EG+ = EG
Using Attribute Closure to Detect a FD
The attribute closure algorithm can be used to determine if a particular FD X rarr Y holds based on the set of given FDs F Compute X+ and see if Y is in X+
Example
R = (ABCDEG)
F = A rarr BC C rarr D D rarr G
Does the FD CD rarr G hold Compute CD+ = CDG Yes the FD CD rarr G holds
Does the FD BC rarr E hold Compute BC+ = BCDG No the FD BC rarr E does n
ot hold
Using Attribute Closure to Find all FDs
The attribute closure algorithm can be used to find all FDs of a given relation R if perform the attribute closure algorithm for all subsets X of the attributes R
This is an exponential algorithm in the number of attributes in R Although there are a few optimizations Do not compute the closure of the empty set or of the set of all attributes If we find X+ = all attributes do not compute the closure of any supersets of X Note that even though you do not have to compute the closure (because you know the answer) you still must use the known closure to infer new FDs
Using Attribute Closureto Find all FDs Example
Compute F+ from F where
R (ABCD)
F = A rarr BC C rarr D
Must compute all these attribute closures
A+ B+ C+ D+
AB+ AC+ AD+ BC+ BD+ CD+
ABC+ ABD+ ACD+ BCD+
Do not have to compute ABCD+
Start with the single sets
A+ = ABCD B+ = B C+ = CD D+ = D
New FD A rarr D
Using Attribute Closureto Find all FDs Example (cont)
Next the two element sets Do not compute AB+ AC+ AD+ because A+ contained all attributes of R New FDs AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C BC+ = BCD BD+ = BD CD+ = CD New FD BC rarr DThree element sets Do not compute ABC+ ABD+ ACD+ New FDs ABC rarr D ABD rarr C ACD rarr B BCD+ = BCD No new FDsAll FDs F+ = A rarr BC C rarr D A rarr D BC rarr D AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C ABC rarr D ABD rarrC ACD rarr B
FD Closure Questions
1) Compute all nontrivial FDs and list all keys of R
R = (ABCD)
F = AB rarr C C rarr D D rarr A
2) Compute all nontrivial FDs and list all candidate keys of R
R = (ABCD)
F = AB rarr C BC rarr D CD rarr A AD rarr B
Equivalent Sets of FDs
A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+ We can determine if F covers E by calculating X+ with respect to F for each FD X rarr Y in E and checking whether X+ includes Y
Two sets of FDs E and F are equivalent if E covers F and F covers E That is E+= F+
Minimal Set of FDs
A set of FDs F is minimal if it satisfies these conditions Every FD in F has a single attribute on its right-hand side We cannot replace any FD X rarr A in F with Y rarr A where Y
X and still have a set of dependencies that is equivalent to F
We cannot remove any FD and still have a set that is equivalent to F
A minimal sets of FDs is like a canonical( 规范的 ) form of FDs
Note that there may be many minimal covers for a given set of functional dependencies F
Normal Forms
A relation is in a particular normal form if it satisfies certain
normalization properties
There are several normal forms defined 1NF - First Normal Form 2NF - Second Normal Form 3NF - Third Normal Form BCNF - Boyce-Codd Normal Form 4NF - Fourth Normal Form 5NF - Fifth Normal Form
Each of these normal forms are stricter than the next
For example 3NF is better than 2NF because it removes more redundancyanomalies from the schema than 2NF
Normal Forms
All relations (non-normalized)
1NF (First Normal Form)
2NF (Second Normal Form)
3NF (Third Normal Form)
BCNF (Boyce-Codd NF)
4NF (Fourth NF)
5NF (Fifth NF)
First Normal Form (1NF)
A relation is in first normal form (1NF) if all its attribute values are atomic
That is a 1NF relation cannot have an attribute value that is a set of values (multi-valued attribute) a set of tuples (nested relation)
1NF is a standard assumption in relational DBMSs However object-oriented DBMSs and nested relational DBMSs relax t
his constraint
A relation that is not in 1NF is an unnormalized relation
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1P2
AnalystAnalyst
246
E3 A Lee P3P4
ConsultantEngineer
1048
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1 P2 Analyst Analyst 24 6
E3 A Lee P3P4 Consultant Engineer 10 48
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
A non-1NF Relation
One way to convert a non-1NF relation to a 1NF relation
1) Splitting Method - Divide the existing relation into two relations one with non-repeating attributes and the other with repeating attributes
Make a new relation consisting of the primary key of the original relation and the repeating attributes Determine a primary key for this new relation
Remove the repeating attributes from the original relation
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1P2
AnalystAnalyst
246
E3 A Lee P3P4
ConsultantEngineer
1048
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
ENo EName
E1 J Doe
E2 M Smith
E3 A Lee
E4 J Miller
E5 B Casey
E6 L Chu
E7 R Davis
ENo PNo Resp DurationE1 P1 Manager 12E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36
Second Normal Form (2NF)
A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key
If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation
Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key
Second Normal Form (2NF) Example
fd1 and fd4 are partial functional dependencies Normalize to
Emp (eno ename title bdate salary supereno dno)
WorksOn (eno pno resp hours)
Proj (pno pname budget)
ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration
EmpProj relation
fd1 fd2
fd3fd4
Second Normal Form (2NF) Example (2)
ENo EName
BDate Title Salary SuperNo
DNo
Emp relation
fd1
fd2
fd3fd4
PNo PName Budget
Proj relationENo PNo Resp Duration
WorkOn relation
Third Normal Form (3NF)
Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th
an 2 FDs
A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key
Converting a relation to 3NF from 2NF involves the removal of transitive dependencies
Third Normal Form (3NF) Example
fd2 results in a transitive dependency eno rarr salary Remove it
ENo EName BDate Title Salary SuperNo DNo
Emp relation
fd1
fd2
fd2
Title Salary
Pay relation
ENo EName BDate Title SuperNo DNo
Emp relation
fd1
General Definitions of 2NF and 3NF
We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)
General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key
General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key
Note that a prime attribute is an attribute that is in any key (candidate or primary)
General Definition of 3NF Example
The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute
However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key
Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key
ENo EName BDate Title Salary SuperNo DNo SSN
Emp relation
fd1
fd2
Normalization Question
Consider the universal relation
R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ
List the keys for R
Decompose R into 2NF and then 3NF relations
Boyce-Codd Normal Form (BCNF)
A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key
To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key
The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
ClientInterview relation
Relation is in 3NF but not BCNF
fd4
fd3
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
Note that we lose the FD fd2
fd4
fd3
ClientNo InterviewDate
InterviwTime
StaffNo StaffNo InterviwDate
RoomNo
Interview relation StaffRoom relation
BCNF versus 3NF
We can decompose to BCNF but sometimes we do not want to if we lose a FD
The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency
Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)
In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)
Conversion to BCNF
There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y
Note that if any FD in F+ violates BCNF then a FD in F violates BCNF
2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey
3) Replace R by relations with schemas R1 = X+
R2 = (R ndash X+) U X
4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib
utes are all in R1 or R2
Normalization Question
Given this database schema normalize into BCNF
fd1
PID Country Lot Area Price TaxRate
fd2
fd4fd3
Multi-Valued Dependencies
A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema
A MVD occurs when two independent 1N relationships are in the relational schema
Multi-Valued Dependencies Example
Eno ename Title Salary Pno dno
E1 J Doe EE 30000 P1P2
E2 M Smith SA 50000 P1P4 D1D2D3
E3 A Lee ME 40000 P1P3 D2D3
E4 J Miller PR 20000 P3 D3
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1
E1 J Doe EE 30000 P2
E2 M Smith SA 50000 P1 D1
E2 M Smith SA 50000 P1 D2
E2 M Smith SA 50000 P1 D3
E2 M Smith SA 50000 P4 D1
E2 M Smith SA 50000 P4 D2
E2 M Smith SA 50000 P4 D3
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P3 D3
E3 A Lee ME 40000 P3 D3
E4 J Miller PR 20000 P3 D3
Multi-Valued Dependencies (MVDs)
A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C
A trivial MVD A mdashgtgt B occurs when either B is a subset of A or
A cup B = R
Fourth Normal Form (4NF)
Fourth normal form (4NF) is based on the idea of multi-valued dependencies
A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies
Fourth Normal Form (4NF) Example
eno mdashgtgtpno
eno mdashgtgtdno
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3
eno ename title salary
E1 J Doe EE 30000
E2 MSmith SA 50000
E3 A Lee ME 40000
E4 J Miller PR 20000
eno pno
E1 P1
E1 P2
E2 P1
E2 P4
E3 P1
E3 P3
E4 P3
eno dno
E2 D1
E2 D2
E2 D3
E3 D2
E3 D3
E4 D3
Lossless-join Dependency
The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation
A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined
Fifth Normal Form (5NF)
bull Fifth normal form (5NF) is based on join dependencies
A relation is in fifth normal form (5NF) if the relation has no join dependency
bull 5NF is also called project-join normal form (PJNF)
A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R
The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have
ΠR1(r) ΠR2(r) hellip ΠRn(r) = r
Fifth Normal Form (5NF) Example
Consider a relation Supply (sname partName projName)
1048715Add the additional constraint that
If project j requires part p
and supplier s supplies part p
and supplier s supplies at least one item to project j
Then supplier s also supplies part p to project j
Original Supply Relation Supply Relation with new constraintsname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
Fifth Normal Form (5NF) Example
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
sname partName
Smith Bolt
Smith Nut
Adam Bolt
Walton Nut
Adam Nail
sname projName
Smith X
Smith Y
Adam Y
Walton Z
Adam X
partName projName
Bolt X
Nut Y
Bolt Y
Nut Z
Nail X
Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples
Desirable Properties in normalization
Relational schemas that are well-designed have several
important properties
1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or
relationship
2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins
3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations
Normal Forms in Practice
Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used
For example query execution time may increase because of normalization as more joins become necessary to answer queries
Normal Forms in Practice Example
For example street and city uniquely determine a zipcode
In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed
When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly
fd1
ENo City Street ZipCode
fd2
ENo ZipCode
ZipCode City Street
Normal Forms in Practice Summary
In summary normalization is typically used in two ways
To improve on relational schemas after ER design
To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies
There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance
This is the classic time versus space tradeoff
Conclusion
Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies
Normal forms indicate when relations satisfy certain properties
1NF - All attributes are atomic
2NF - All attributes are fully functionally dependent on a key
3NF - There are no transitive dependencies in the relation
4NF - There are no multi-valued dependencies in the relation
5NF - The relation has no join dependency
In practice normalization is used to improve schemas produced after ER design and existing relational schemas
It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time
Objectives
Explain process of normalization and why it is beneficial
Describe the concept of the Universal Relation
What are the motivations for normalization
What are the 3 types of update anomalies Be able to give examples
Describe the lossless-join property and dependency preservation property
Define and explain the concept of a functional dependency
What is a trivial FD
Describe the relationship between FDs and keys
What are Armstrongs Axioms When are they used
What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs
Objectives (2)
Define minimal and covers for a set of FDs
Define normal forms be able to list the normal forms and draw a diagram showing their relationship
Define 1NF Know how to convert non-1NF relation to 1NF
Define 2NF Convert 1NF relation to 2NF
Define 3NF Convert 2NF relation to 3NF
Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example
Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm
Describe multi-valued dependencies When do they occur
What is a trivial MVD
Define 4NF Convert 3NFBCNF relation to 4NF
Objectives (3)
Describe lossless-join property and how it relates to 5NF
Define 5NF
Explain normalization in terms of the time versus space tradeoff
Explain the relationship between normalization and ER modeling
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Slide 45
- Slide 46
- Slide 47
- Slide 48
- Slide 49
- Slide 50
- Slide 51
- Slide 52
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- Slide 60
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
-
When to use normalization
Normalization can be used after mapping ER model into relation model or independently
Normalization may be especially useful for databases that have already been designed without using formal techniques
Theory basic of normalization
Functional Dependencies ( 函数依赖) Functional dependencies represent integrity constraints ( 完整
性约束) on the values of attributes in a relation and are used in normalization
A functional dependency (FD) describes the relationship between attributes in a relation We say a set of attributes X functionally determines an attribute Y if given the values of X we always know the only possible value of Y
Notation X rarr Y X functionally determines Y Y is functionally dependent on XExample
eno rarr ename eno pno rarr Duration
Notation for Functional Dependencies
A functional dependency has a left-hand side called the determinant which is a set of attributes and one attribute on the right-hand side
eno pno rarr duration
determinant determined attribute
Strictly speaking there is always only one attribute on the RHS but we can combine several functional dependencies into one
eno pno rarr durationeno pno rarr respeno pno rarr duration resp
Remember that this is really two functional dependencies
The Semantics ofFunctional Dependencies
Similar to keys Functional dependencies are a property of a relation schema (intension) and not a property of a particular instance of the schema (extension)
Functional dependencies are directional
eno rarr ename ename rarr eno
ExampleFDs in EmpDept relation
bull EmpDept RelationENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8E8 J Jones 10-11-72 SA 50000 null D1 Management E8
Eno rarr ename bdate title salary superno dno
dnorarrdname mgrno
titlerarr salary
Trivial Functional Dependencies ( 平凡函数依赖 )
A functional dependency is trivialif the attributes on its left-hand side are a superset of the attributes on its right-hand side
Examples eno rarr eno
eno ename rarr eno
eno pno duration rarr eno Duration
Trivial functional dependencies are not interesting because they do not tell us anything
We are only interested in nontrivial FDs
Full Functional Dependencies
A functional dependency A rarr B is a full functional dependency if removal of any attribute from A results in the dependency not existing any more We say that B is fully functionally dependent on A If remove an attribute from A and the functional dependency still exists we say that B is partially dependent on A
Example
eno rarr ename (full FD)
eno ename rarr salary title (partial FD - can remove ename)
eno pno rarr hours resp (full FD)
How to Identify FDs in a relation
Examining the semantics of the attributes of relation
Identify all non-trivial functional dependencies in the Universal Relation
Universal (eno pno resp duration ename bdate title salary superNo dno dname mgrNo pname budget)
Eno rarr ename bdate title salary superno dnodnorarrdname mgrnotitlerarr salary
Formal Definition ofFunctional Dependencies
Given relation R defined over U = A1 A2 An where X U Y U If for all pairs of tuples t1 and t2 in any legal instance of relation schema R
t1[X] = t2[X] implies t1[Y] = t2[Y]
then the functional dependency X rarr Y holds in R
Functional Dependencies and Keys
Functional dependencies can be used to determine the candidate keys of a relation
What is superkey candidate key primary key
For example if an attribute functionally determines all other attributes in the relation that attribute can be a key
eno rarr eno ename bdate title supereno dno eno is a candidate key for the Emp relation
Functional Dependencies and Keys
ENo rarr EName BDate Title SuperNo DNoDNo rarr DName MgreNotitle rarr salary
Question List the candidate key(s) of this relation
ENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5
E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7
E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5
E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5
E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7
E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8
E8 J Jones 10-11-72 SA 50000 null D1 Management E8
EmpDept relation
Applications of FDs
1) given a set of FDs F in relation R FD X rarr Y is true or not
2) given a set of FDs F in relation R a set of attributes X is a candidate key of R or not
3) given a set of FDs F in relation R which candidate keys are in R
4) given a set of FDs F in relation R some FDs in F is Excessive or not
Rules for reasoning about functional dependencies
Armstrongs Axioms are three inference rules for functional dependenciesLet A B C and D be subsets of the attributes of relation R
IR1 Reflexive Rule( 自反性 ) If AB then A rarr B
A set of attributes always trivially functionally determines any subset of itself
IR2 Augmentation Rule( 增加性 ) If A rarr B then AC rarr BC
Adding a set of attributes to both the LHS and RHS results in another valid functional dependency
IR3 Transitive Rule( 传递性 ) If A rarr B and B rarr C then A rarr C
Transitivity holds with FDs
Derived Rules from Armstrongs Axioms
There are three other well-known rules that can be derived from the basic three
IR4 Decomposition( 可分解性 ) If A rarr BC then A rarr B and A rarr C A FD with two attributes on RHS can be decomposed into two FDs each with on attribute on the RHS
IR5 Union( 合并性 ) If A rarr B and A rarr C then A rarr BC Reverse of decomposition rule
IR6 Composition( 组合性 ) If A rarr B and C rarr D then AC rarr BD More general than union - can combine two non-overlapping FDs into one
Characteristics of Armstrongs Axioms
The rules are sound( 健壮的 ) and complete( 完备的 )
1) Sound The rules do not infer( 推断 ) any incorrect FDs
2) Complete The rules can be used to infer all correct FDs (no FDs are missing)
The axioms (called IR1 IR2 and IR3) can be used to determine a set of all FDs F+ from an initial set of FDs F
F+ is referred to as the closure of F
Strategy of dealing with questions
The basic idea is that given any set of attributes X we can compute the set of all attributes X+ that can be functionally determined using F This is called the closure of X under F
For question 1 given a set of FDs F in R FD X rarr Y is true or not we know that X rarr Y holds if Y is in X+
For question 2 a set of attributes X is a candidate key of R or not X is a superkey of R if X+ is all attributes of R
For question 3 how to computing F+ we can compute X+ for all possible subsets X of R
Computing the Attribute Closure
The algorithm is as follows
Given a set of attributes X
Let X+ = X
Repeat Find a FD in F whose left side is a subset of X+ Add the right side of F to X+
Until (X+ does not change or X+ contains all attributes of R)
After the algorithm completes you have a set of attributes X+ that can be functionally determined by X This allows you to produce FDs of the form
X rarr A where A is in X+
Computing Attribute Closure Example
R (ABCDEG)
F = A rarr BC C rarr D D rarr G
Compute A+
A+ = A (initial step)
A+ = ABC (by using FD A rarr BC)
A+ = ABCD (by using FD C rarr D)
A+ = ABCDG (by using FD D rarr G)
Similarly we can compute C+ and EG+
C+ = CDG
EG+ = EG
Using Attribute Closure to Detect a FD
The attribute closure algorithm can be used to determine if a particular FD X rarr Y holds based on the set of given FDs F Compute X+ and see if Y is in X+
Example
R = (ABCDEG)
F = A rarr BC C rarr D D rarr G
Does the FD CD rarr G hold Compute CD+ = CDG Yes the FD CD rarr G holds
Does the FD BC rarr E hold Compute BC+ = BCDG No the FD BC rarr E does n
ot hold
Using Attribute Closure to Find all FDs
The attribute closure algorithm can be used to find all FDs of a given relation R if perform the attribute closure algorithm for all subsets X of the attributes R
This is an exponential algorithm in the number of attributes in R Although there are a few optimizations Do not compute the closure of the empty set or of the set of all attributes If we find X+ = all attributes do not compute the closure of any supersets of X Note that even though you do not have to compute the closure (because you know the answer) you still must use the known closure to infer new FDs
Using Attribute Closureto Find all FDs Example
Compute F+ from F where
R (ABCD)
F = A rarr BC C rarr D
Must compute all these attribute closures
A+ B+ C+ D+
AB+ AC+ AD+ BC+ BD+ CD+
ABC+ ABD+ ACD+ BCD+
Do not have to compute ABCD+
Start with the single sets
A+ = ABCD B+ = B C+ = CD D+ = D
New FD A rarr D
Using Attribute Closureto Find all FDs Example (cont)
Next the two element sets Do not compute AB+ AC+ AD+ because A+ contained all attributes of R New FDs AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C BC+ = BCD BD+ = BD CD+ = CD New FD BC rarr DThree element sets Do not compute ABC+ ABD+ ACD+ New FDs ABC rarr D ABD rarr C ACD rarr B BCD+ = BCD No new FDsAll FDs F+ = A rarr BC C rarr D A rarr D BC rarr D AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C ABC rarr D ABD rarrC ACD rarr B
FD Closure Questions
1) Compute all nontrivial FDs and list all keys of R
R = (ABCD)
F = AB rarr C C rarr D D rarr A
2) Compute all nontrivial FDs and list all candidate keys of R
R = (ABCD)
F = AB rarr C BC rarr D CD rarr A AD rarr B
Equivalent Sets of FDs
A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+ We can determine if F covers E by calculating X+ with respect to F for each FD X rarr Y in E and checking whether X+ includes Y
Two sets of FDs E and F are equivalent if E covers F and F covers E That is E+= F+
Minimal Set of FDs
A set of FDs F is minimal if it satisfies these conditions Every FD in F has a single attribute on its right-hand side We cannot replace any FD X rarr A in F with Y rarr A where Y
X and still have a set of dependencies that is equivalent to F
We cannot remove any FD and still have a set that is equivalent to F
A minimal sets of FDs is like a canonical( 规范的 ) form of FDs
Note that there may be many minimal covers for a given set of functional dependencies F
Normal Forms
A relation is in a particular normal form if it satisfies certain
normalization properties
There are several normal forms defined 1NF - First Normal Form 2NF - Second Normal Form 3NF - Third Normal Form BCNF - Boyce-Codd Normal Form 4NF - Fourth Normal Form 5NF - Fifth Normal Form
Each of these normal forms are stricter than the next
For example 3NF is better than 2NF because it removes more redundancyanomalies from the schema than 2NF
Normal Forms
All relations (non-normalized)
1NF (First Normal Form)
2NF (Second Normal Form)
3NF (Third Normal Form)
BCNF (Boyce-Codd NF)
4NF (Fourth NF)
5NF (Fifth NF)
First Normal Form (1NF)
A relation is in first normal form (1NF) if all its attribute values are atomic
That is a 1NF relation cannot have an attribute value that is a set of values (multi-valued attribute) a set of tuples (nested relation)
1NF is a standard assumption in relational DBMSs However object-oriented DBMSs and nested relational DBMSs relax t
his constraint
A relation that is not in 1NF is an unnormalized relation
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1P2
AnalystAnalyst
246
E3 A Lee P3P4
ConsultantEngineer
1048
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1 P2 Analyst Analyst 24 6
E3 A Lee P3P4 Consultant Engineer 10 48
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
A non-1NF Relation
One way to convert a non-1NF relation to a 1NF relation
1) Splitting Method - Divide the existing relation into two relations one with non-repeating attributes and the other with repeating attributes
Make a new relation consisting of the primary key of the original relation and the repeating attributes Determine a primary key for this new relation
Remove the repeating attributes from the original relation
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1P2
AnalystAnalyst
246
E3 A Lee P3P4
ConsultantEngineer
1048
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
ENo EName
E1 J Doe
E2 M Smith
E3 A Lee
E4 J Miller
E5 B Casey
E6 L Chu
E7 R Davis
ENo PNo Resp DurationE1 P1 Manager 12E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36
Second Normal Form (2NF)
A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key
If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation
Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key
Second Normal Form (2NF) Example
fd1 and fd4 are partial functional dependencies Normalize to
Emp (eno ename title bdate salary supereno dno)
WorksOn (eno pno resp hours)
Proj (pno pname budget)
ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration
EmpProj relation
fd1 fd2
fd3fd4
Second Normal Form (2NF) Example (2)
ENo EName
BDate Title Salary SuperNo
DNo
Emp relation
fd1
fd2
fd3fd4
PNo PName Budget
Proj relationENo PNo Resp Duration
WorkOn relation
Third Normal Form (3NF)
Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th
an 2 FDs
A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key
Converting a relation to 3NF from 2NF involves the removal of transitive dependencies
Third Normal Form (3NF) Example
fd2 results in a transitive dependency eno rarr salary Remove it
ENo EName BDate Title Salary SuperNo DNo
Emp relation
fd1
fd2
fd2
Title Salary
Pay relation
ENo EName BDate Title SuperNo DNo
Emp relation
fd1
General Definitions of 2NF and 3NF
We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)
General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key
General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key
Note that a prime attribute is an attribute that is in any key (candidate or primary)
General Definition of 3NF Example
The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute
However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key
Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key
ENo EName BDate Title Salary SuperNo DNo SSN
Emp relation
fd1
fd2
Normalization Question
Consider the universal relation
R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ
List the keys for R
Decompose R into 2NF and then 3NF relations
Boyce-Codd Normal Form (BCNF)
A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key
To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key
The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
ClientInterview relation
Relation is in 3NF but not BCNF
fd4
fd3
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
Note that we lose the FD fd2
fd4
fd3
ClientNo InterviewDate
InterviwTime
StaffNo StaffNo InterviwDate
RoomNo
Interview relation StaffRoom relation
BCNF versus 3NF
We can decompose to BCNF but sometimes we do not want to if we lose a FD
The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency
Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)
In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)
Conversion to BCNF
There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y
Note that if any FD in F+ violates BCNF then a FD in F violates BCNF
2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey
3) Replace R by relations with schemas R1 = X+
R2 = (R ndash X+) U X
4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib
utes are all in R1 or R2
Normalization Question
Given this database schema normalize into BCNF
fd1
PID Country Lot Area Price TaxRate
fd2
fd4fd3
Multi-Valued Dependencies
A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema
A MVD occurs when two independent 1N relationships are in the relational schema
Multi-Valued Dependencies Example
Eno ename Title Salary Pno dno
E1 J Doe EE 30000 P1P2
E2 M Smith SA 50000 P1P4 D1D2D3
E3 A Lee ME 40000 P1P3 D2D3
E4 J Miller PR 20000 P3 D3
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1
E1 J Doe EE 30000 P2
E2 M Smith SA 50000 P1 D1
E2 M Smith SA 50000 P1 D2
E2 M Smith SA 50000 P1 D3
E2 M Smith SA 50000 P4 D1
E2 M Smith SA 50000 P4 D2
E2 M Smith SA 50000 P4 D3
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P3 D3
E3 A Lee ME 40000 P3 D3
E4 J Miller PR 20000 P3 D3
Multi-Valued Dependencies (MVDs)
A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C
A trivial MVD A mdashgtgt B occurs when either B is a subset of A or
A cup B = R
Fourth Normal Form (4NF)
Fourth normal form (4NF) is based on the idea of multi-valued dependencies
A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies
Fourth Normal Form (4NF) Example
eno mdashgtgtpno
eno mdashgtgtdno
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3
eno ename title salary
E1 J Doe EE 30000
E2 MSmith SA 50000
E3 A Lee ME 40000
E4 J Miller PR 20000
eno pno
E1 P1
E1 P2
E2 P1
E2 P4
E3 P1
E3 P3
E4 P3
eno dno
E2 D1
E2 D2
E2 D3
E3 D2
E3 D3
E4 D3
Lossless-join Dependency
The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation
A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined
Fifth Normal Form (5NF)
bull Fifth normal form (5NF) is based on join dependencies
A relation is in fifth normal form (5NF) if the relation has no join dependency
bull 5NF is also called project-join normal form (PJNF)
A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R
The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have
ΠR1(r) ΠR2(r) hellip ΠRn(r) = r
Fifth Normal Form (5NF) Example
Consider a relation Supply (sname partName projName)
1048715Add the additional constraint that
If project j requires part p
and supplier s supplies part p
and supplier s supplies at least one item to project j
Then supplier s also supplies part p to project j
Original Supply Relation Supply Relation with new constraintsname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
Fifth Normal Form (5NF) Example
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
sname partName
Smith Bolt
Smith Nut
Adam Bolt
Walton Nut
Adam Nail
sname projName
Smith X
Smith Y
Adam Y
Walton Z
Adam X
partName projName
Bolt X
Nut Y
Bolt Y
Nut Z
Nail X
Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples
Desirable Properties in normalization
Relational schemas that are well-designed have several
important properties
1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or
relationship
2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins
3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations
Normal Forms in Practice
Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used
For example query execution time may increase because of normalization as more joins become necessary to answer queries
Normal Forms in Practice Example
For example street and city uniquely determine a zipcode
In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed
When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly
fd1
ENo City Street ZipCode
fd2
ENo ZipCode
ZipCode City Street
Normal Forms in Practice Summary
In summary normalization is typically used in two ways
To improve on relational schemas after ER design
To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies
There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance
This is the classic time versus space tradeoff
Conclusion
Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies
Normal forms indicate when relations satisfy certain properties
1NF - All attributes are atomic
2NF - All attributes are fully functionally dependent on a key
3NF - There are no transitive dependencies in the relation
4NF - There are no multi-valued dependencies in the relation
5NF - The relation has no join dependency
In practice normalization is used to improve schemas produced after ER design and existing relational schemas
It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time
Objectives
Explain process of normalization and why it is beneficial
Describe the concept of the Universal Relation
What are the motivations for normalization
What are the 3 types of update anomalies Be able to give examples
Describe the lossless-join property and dependency preservation property
Define and explain the concept of a functional dependency
What is a trivial FD
Describe the relationship between FDs and keys
What are Armstrongs Axioms When are they used
What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs
Objectives (2)
Define minimal and covers for a set of FDs
Define normal forms be able to list the normal forms and draw a diagram showing their relationship
Define 1NF Know how to convert non-1NF relation to 1NF
Define 2NF Convert 1NF relation to 2NF
Define 3NF Convert 2NF relation to 3NF
Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example
Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm
Describe multi-valued dependencies When do they occur
What is a trivial MVD
Define 4NF Convert 3NFBCNF relation to 4NF
Objectives (3)
Describe lossless-join property and how it relates to 5NF
Define 5NF
Explain normalization in terms of the time versus space tradeoff
Explain the relationship between normalization and ER modeling
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Slide 45
- Slide 46
- Slide 47
- Slide 48
- Slide 49
- Slide 50
- Slide 51
- Slide 52
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- Slide 60
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
-
Theory basic of normalization
Functional Dependencies ( 函数依赖) Functional dependencies represent integrity constraints ( 完整
性约束) on the values of attributes in a relation and are used in normalization
A functional dependency (FD) describes the relationship between attributes in a relation We say a set of attributes X functionally determines an attribute Y if given the values of X we always know the only possible value of Y
Notation X rarr Y X functionally determines Y Y is functionally dependent on XExample
eno rarr ename eno pno rarr Duration
Notation for Functional Dependencies
A functional dependency has a left-hand side called the determinant which is a set of attributes and one attribute on the right-hand side
eno pno rarr duration
determinant determined attribute
Strictly speaking there is always only one attribute on the RHS but we can combine several functional dependencies into one
eno pno rarr durationeno pno rarr respeno pno rarr duration resp
Remember that this is really two functional dependencies
The Semantics ofFunctional Dependencies
Similar to keys Functional dependencies are a property of a relation schema (intension) and not a property of a particular instance of the schema (extension)
Functional dependencies are directional
eno rarr ename ename rarr eno
ExampleFDs in EmpDept relation
bull EmpDept RelationENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8E8 J Jones 10-11-72 SA 50000 null D1 Management E8
Eno rarr ename bdate title salary superno dno
dnorarrdname mgrno
titlerarr salary
Trivial Functional Dependencies ( 平凡函数依赖 )
A functional dependency is trivialif the attributes on its left-hand side are a superset of the attributes on its right-hand side
Examples eno rarr eno
eno ename rarr eno
eno pno duration rarr eno Duration
Trivial functional dependencies are not interesting because they do not tell us anything
We are only interested in nontrivial FDs
Full Functional Dependencies
A functional dependency A rarr B is a full functional dependency if removal of any attribute from A results in the dependency not existing any more We say that B is fully functionally dependent on A If remove an attribute from A and the functional dependency still exists we say that B is partially dependent on A
Example
eno rarr ename (full FD)
eno ename rarr salary title (partial FD - can remove ename)
eno pno rarr hours resp (full FD)
How to Identify FDs in a relation
Examining the semantics of the attributes of relation
Identify all non-trivial functional dependencies in the Universal Relation
Universal (eno pno resp duration ename bdate title salary superNo dno dname mgrNo pname budget)
Eno rarr ename bdate title salary superno dnodnorarrdname mgrnotitlerarr salary
Formal Definition ofFunctional Dependencies
Given relation R defined over U = A1 A2 An where X U Y U If for all pairs of tuples t1 and t2 in any legal instance of relation schema R
t1[X] = t2[X] implies t1[Y] = t2[Y]
then the functional dependency X rarr Y holds in R
Functional Dependencies and Keys
Functional dependencies can be used to determine the candidate keys of a relation
What is superkey candidate key primary key
For example if an attribute functionally determines all other attributes in the relation that attribute can be a key
eno rarr eno ename bdate title supereno dno eno is a candidate key for the Emp relation
Functional Dependencies and Keys
ENo rarr EName BDate Title SuperNo DNoDNo rarr DName MgreNotitle rarr salary
Question List the candidate key(s) of this relation
ENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5
E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7
E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5
E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5
E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7
E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8
E8 J Jones 10-11-72 SA 50000 null D1 Management E8
EmpDept relation
Applications of FDs
1) given a set of FDs F in relation R FD X rarr Y is true or not
2) given a set of FDs F in relation R a set of attributes X is a candidate key of R or not
3) given a set of FDs F in relation R which candidate keys are in R
4) given a set of FDs F in relation R some FDs in F is Excessive or not
Rules for reasoning about functional dependencies
Armstrongs Axioms are three inference rules for functional dependenciesLet A B C and D be subsets of the attributes of relation R
IR1 Reflexive Rule( 自反性 ) If AB then A rarr B
A set of attributes always trivially functionally determines any subset of itself
IR2 Augmentation Rule( 增加性 ) If A rarr B then AC rarr BC
Adding a set of attributes to both the LHS and RHS results in another valid functional dependency
IR3 Transitive Rule( 传递性 ) If A rarr B and B rarr C then A rarr C
Transitivity holds with FDs
Derived Rules from Armstrongs Axioms
There are three other well-known rules that can be derived from the basic three
IR4 Decomposition( 可分解性 ) If A rarr BC then A rarr B and A rarr C A FD with two attributes on RHS can be decomposed into two FDs each with on attribute on the RHS
IR5 Union( 合并性 ) If A rarr B and A rarr C then A rarr BC Reverse of decomposition rule
IR6 Composition( 组合性 ) If A rarr B and C rarr D then AC rarr BD More general than union - can combine two non-overlapping FDs into one
Characteristics of Armstrongs Axioms
The rules are sound( 健壮的 ) and complete( 完备的 )
1) Sound The rules do not infer( 推断 ) any incorrect FDs
2) Complete The rules can be used to infer all correct FDs (no FDs are missing)
The axioms (called IR1 IR2 and IR3) can be used to determine a set of all FDs F+ from an initial set of FDs F
F+ is referred to as the closure of F
Strategy of dealing with questions
The basic idea is that given any set of attributes X we can compute the set of all attributes X+ that can be functionally determined using F This is called the closure of X under F
For question 1 given a set of FDs F in R FD X rarr Y is true or not we know that X rarr Y holds if Y is in X+
For question 2 a set of attributes X is a candidate key of R or not X is a superkey of R if X+ is all attributes of R
For question 3 how to computing F+ we can compute X+ for all possible subsets X of R
Computing the Attribute Closure
The algorithm is as follows
Given a set of attributes X
Let X+ = X
Repeat Find a FD in F whose left side is a subset of X+ Add the right side of F to X+
Until (X+ does not change or X+ contains all attributes of R)
After the algorithm completes you have a set of attributes X+ that can be functionally determined by X This allows you to produce FDs of the form
X rarr A where A is in X+
Computing Attribute Closure Example
R (ABCDEG)
F = A rarr BC C rarr D D rarr G
Compute A+
A+ = A (initial step)
A+ = ABC (by using FD A rarr BC)
A+ = ABCD (by using FD C rarr D)
A+ = ABCDG (by using FD D rarr G)
Similarly we can compute C+ and EG+
C+ = CDG
EG+ = EG
Using Attribute Closure to Detect a FD
The attribute closure algorithm can be used to determine if a particular FD X rarr Y holds based on the set of given FDs F Compute X+ and see if Y is in X+
Example
R = (ABCDEG)
F = A rarr BC C rarr D D rarr G
Does the FD CD rarr G hold Compute CD+ = CDG Yes the FD CD rarr G holds
Does the FD BC rarr E hold Compute BC+ = BCDG No the FD BC rarr E does n
ot hold
Using Attribute Closure to Find all FDs
The attribute closure algorithm can be used to find all FDs of a given relation R if perform the attribute closure algorithm for all subsets X of the attributes R
This is an exponential algorithm in the number of attributes in R Although there are a few optimizations Do not compute the closure of the empty set or of the set of all attributes If we find X+ = all attributes do not compute the closure of any supersets of X Note that even though you do not have to compute the closure (because you know the answer) you still must use the known closure to infer new FDs
Using Attribute Closureto Find all FDs Example
Compute F+ from F where
R (ABCD)
F = A rarr BC C rarr D
Must compute all these attribute closures
A+ B+ C+ D+
AB+ AC+ AD+ BC+ BD+ CD+
ABC+ ABD+ ACD+ BCD+
Do not have to compute ABCD+
Start with the single sets
A+ = ABCD B+ = B C+ = CD D+ = D
New FD A rarr D
Using Attribute Closureto Find all FDs Example (cont)
Next the two element sets Do not compute AB+ AC+ AD+ because A+ contained all attributes of R New FDs AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C BC+ = BCD BD+ = BD CD+ = CD New FD BC rarr DThree element sets Do not compute ABC+ ABD+ ACD+ New FDs ABC rarr D ABD rarr C ACD rarr B BCD+ = BCD No new FDsAll FDs F+ = A rarr BC C rarr D A rarr D BC rarr D AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C ABC rarr D ABD rarrC ACD rarr B
FD Closure Questions
1) Compute all nontrivial FDs and list all keys of R
R = (ABCD)
F = AB rarr C C rarr D D rarr A
2) Compute all nontrivial FDs and list all candidate keys of R
R = (ABCD)
F = AB rarr C BC rarr D CD rarr A AD rarr B
Equivalent Sets of FDs
A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+ We can determine if F covers E by calculating X+ with respect to F for each FD X rarr Y in E and checking whether X+ includes Y
Two sets of FDs E and F are equivalent if E covers F and F covers E That is E+= F+
Minimal Set of FDs
A set of FDs F is minimal if it satisfies these conditions Every FD in F has a single attribute on its right-hand side We cannot replace any FD X rarr A in F with Y rarr A where Y
X and still have a set of dependencies that is equivalent to F
We cannot remove any FD and still have a set that is equivalent to F
A minimal sets of FDs is like a canonical( 规范的 ) form of FDs
Note that there may be many minimal covers for a given set of functional dependencies F
Normal Forms
A relation is in a particular normal form if it satisfies certain
normalization properties
There are several normal forms defined 1NF - First Normal Form 2NF - Second Normal Form 3NF - Third Normal Form BCNF - Boyce-Codd Normal Form 4NF - Fourth Normal Form 5NF - Fifth Normal Form
Each of these normal forms are stricter than the next
For example 3NF is better than 2NF because it removes more redundancyanomalies from the schema than 2NF
Normal Forms
All relations (non-normalized)
1NF (First Normal Form)
2NF (Second Normal Form)
3NF (Third Normal Form)
BCNF (Boyce-Codd NF)
4NF (Fourth NF)
5NF (Fifth NF)
First Normal Form (1NF)
A relation is in first normal form (1NF) if all its attribute values are atomic
That is a 1NF relation cannot have an attribute value that is a set of values (multi-valued attribute) a set of tuples (nested relation)
1NF is a standard assumption in relational DBMSs However object-oriented DBMSs and nested relational DBMSs relax t
his constraint
A relation that is not in 1NF is an unnormalized relation
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1P2
AnalystAnalyst
246
E3 A Lee P3P4
ConsultantEngineer
1048
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1 P2 Analyst Analyst 24 6
E3 A Lee P3P4 Consultant Engineer 10 48
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
A non-1NF Relation
One way to convert a non-1NF relation to a 1NF relation
1) Splitting Method - Divide the existing relation into two relations one with non-repeating attributes and the other with repeating attributes
Make a new relation consisting of the primary key of the original relation and the repeating attributes Determine a primary key for this new relation
Remove the repeating attributes from the original relation
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1P2
AnalystAnalyst
246
E3 A Lee P3P4
ConsultantEngineer
1048
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
ENo EName
E1 J Doe
E2 M Smith
E3 A Lee
E4 J Miller
E5 B Casey
E6 L Chu
E7 R Davis
ENo PNo Resp DurationE1 P1 Manager 12E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36
Second Normal Form (2NF)
A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key
If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation
Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key
Second Normal Form (2NF) Example
fd1 and fd4 are partial functional dependencies Normalize to
Emp (eno ename title bdate salary supereno dno)
WorksOn (eno pno resp hours)
Proj (pno pname budget)
ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration
EmpProj relation
fd1 fd2
fd3fd4
Second Normal Form (2NF) Example (2)
ENo EName
BDate Title Salary SuperNo
DNo
Emp relation
fd1
fd2
fd3fd4
PNo PName Budget
Proj relationENo PNo Resp Duration
WorkOn relation
Third Normal Form (3NF)
Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th
an 2 FDs
A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key
Converting a relation to 3NF from 2NF involves the removal of transitive dependencies
Third Normal Form (3NF) Example
fd2 results in a transitive dependency eno rarr salary Remove it
ENo EName BDate Title Salary SuperNo DNo
Emp relation
fd1
fd2
fd2
Title Salary
Pay relation
ENo EName BDate Title SuperNo DNo
Emp relation
fd1
General Definitions of 2NF and 3NF
We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)
General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key
General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key
Note that a prime attribute is an attribute that is in any key (candidate or primary)
General Definition of 3NF Example
The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute
However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key
Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key
ENo EName BDate Title Salary SuperNo DNo SSN
Emp relation
fd1
fd2
Normalization Question
Consider the universal relation
R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ
List the keys for R
Decompose R into 2NF and then 3NF relations
Boyce-Codd Normal Form (BCNF)
A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key
To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key
The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
ClientInterview relation
Relation is in 3NF but not BCNF
fd4
fd3
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
Note that we lose the FD fd2
fd4
fd3
ClientNo InterviewDate
InterviwTime
StaffNo StaffNo InterviwDate
RoomNo
Interview relation StaffRoom relation
BCNF versus 3NF
We can decompose to BCNF but sometimes we do not want to if we lose a FD
The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency
Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)
In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)
Conversion to BCNF
There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y
Note that if any FD in F+ violates BCNF then a FD in F violates BCNF
2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey
3) Replace R by relations with schemas R1 = X+
R2 = (R ndash X+) U X
4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib
utes are all in R1 or R2
Normalization Question
Given this database schema normalize into BCNF
fd1
PID Country Lot Area Price TaxRate
fd2
fd4fd3
Multi-Valued Dependencies
A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema
A MVD occurs when two independent 1N relationships are in the relational schema
Multi-Valued Dependencies Example
Eno ename Title Salary Pno dno
E1 J Doe EE 30000 P1P2
E2 M Smith SA 50000 P1P4 D1D2D3
E3 A Lee ME 40000 P1P3 D2D3
E4 J Miller PR 20000 P3 D3
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1
E1 J Doe EE 30000 P2
E2 M Smith SA 50000 P1 D1
E2 M Smith SA 50000 P1 D2
E2 M Smith SA 50000 P1 D3
E2 M Smith SA 50000 P4 D1
E2 M Smith SA 50000 P4 D2
E2 M Smith SA 50000 P4 D3
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P3 D3
E3 A Lee ME 40000 P3 D3
E4 J Miller PR 20000 P3 D3
Multi-Valued Dependencies (MVDs)
A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C
A trivial MVD A mdashgtgt B occurs when either B is a subset of A or
A cup B = R
Fourth Normal Form (4NF)
Fourth normal form (4NF) is based on the idea of multi-valued dependencies
A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies
Fourth Normal Form (4NF) Example
eno mdashgtgtpno
eno mdashgtgtdno
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3
eno ename title salary
E1 J Doe EE 30000
E2 MSmith SA 50000
E3 A Lee ME 40000
E4 J Miller PR 20000
eno pno
E1 P1
E1 P2
E2 P1
E2 P4
E3 P1
E3 P3
E4 P3
eno dno
E2 D1
E2 D2
E2 D3
E3 D2
E3 D3
E4 D3
Lossless-join Dependency
The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation
A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined
Fifth Normal Form (5NF)
bull Fifth normal form (5NF) is based on join dependencies
A relation is in fifth normal form (5NF) if the relation has no join dependency
bull 5NF is also called project-join normal form (PJNF)
A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R
The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have
ΠR1(r) ΠR2(r) hellip ΠRn(r) = r
Fifth Normal Form (5NF) Example
Consider a relation Supply (sname partName projName)
1048715Add the additional constraint that
If project j requires part p
and supplier s supplies part p
and supplier s supplies at least one item to project j
Then supplier s also supplies part p to project j
Original Supply Relation Supply Relation with new constraintsname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
Fifth Normal Form (5NF) Example
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
sname partName
Smith Bolt
Smith Nut
Adam Bolt
Walton Nut
Adam Nail
sname projName
Smith X
Smith Y
Adam Y
Walton Z
Adam X
partName projName
Bolt X
Nut Y
Bolt Y
Nut Z
Nail X
Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples
Desirable Properties in normalization
Relational schemas that are well-designed have several
important properties
1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or
relationship
2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins
3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations
Normal Forms in Practice
Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used
For example query execution time may increase because of normalization as more joins become necessary to answer queries
Normal Forms in Practice Example
For example street and city uniquely determine a zipcode
In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed
When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly
fd1
ENo City Street ZipCode
fd2
ENo ZipCode
ZipCode City Street
Normal Forms in Practice Summary
In summary normalization is typically used in two ways
To improve on relational schemas after ER design
To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies
There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance
This is the classic time versus space tradeoff
Conclusion
Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies
Normal forms indicate when relations satisfy certain properties
1NF - All attributes are atomic
2NF - All attributes are fully functionally dependent on a key
3NF - There are no transitive dependencies in the relation
4NF - There are no multi-valued dependencies in the relation
5NF - The relation has no join dependency
In practice normalization is used to improve schemas produced after ER design and existing relational schemas
It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time
Objectives
Explain process of normalization and why it is beneficial
Describe the concept of the Universal Relation
What are the motivations for normalization
What are the 3 types of update anomalies Be able to give examples
Describe the lossless-join property and dependency preservation property
Define and explain the concept of a functional dependency
What is a trivial FD
Describe the relationship between FDs and keys
What are Armstrongs Axioms When are they used
What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs
Objectives (2)
Define minimal and covers for a set of FDs
Define normal forms be able to list the normal forms and draw a diagram showing their relationship
Define 1NF Know how to convert non-1NF relation to 1NF
Define 2NF Convert 1NF relation to 2NF
Define 3NF Convert 2NF relation to 3NF
Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example
Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm
Describe multi-valued dependencies When do they occur
What is a trivial MVD
Define 4NF Convert 3NFBCNF relation to 4NF
Objectives (3)
Describe lossless-join property and how it relates to 5NF
Define 5NF
Explain normalization in terms of the time versus space tradeoff
Explain the relationship between normalization and ER modeling
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Slide 45
- Slide 46
- Slide 47
- Slide 48
- Slide 49
- Slide 50
- Slide 51
- Slide 52
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- Slide 60
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
-
Notation for Functional Dependencies
A functional dependency has a left-hand side called the determinant which is a set of attributes and one attribute on the right-hand side
eno pno rarr duration
determinant determined attribute
Strictly speaking there is always only one attribute on the RHS but we can combine several functional dependencies into one
eno pno rarr durationeno pno rarr respeno pno rarr duration resp
Remember that this is really two functional dependencies
The Semantics ofFunctional Dependencies
Similar to keys Functional dependencies are a property of a relation schema (intension) and not a property of a particular instance of the schema (extension)
Functional dependencies are directional
eno rarr ename ename rarr eno
ExampleFDs in EmpDept relation
bull EmpDept RelationENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8E8 J Jones 10-11-72 SA 50000 null D1 Management E8
Eno rarr ename bdate title salary superno dno
dnorarrdname mgrno
titlerarr salary
Trivial Functional Dependencies ( 平凡函数依赖 )
A functional dependency is trivialif the attributes on its left-hand side are a superset of the attributes on its right-hand side
Examples eno rarr eno
eno ename rarr eno
eno pno duration rarr eno Duration
Trivial functional dependencies are not interesting because they do not tell us anything
We are only interested in nontrivial FDs
Full Functional Dependencies
A functional dependency A rarr B is a full functional dependency if removal of any attribute from A results in the dependency not existing any more We say that B is fully functionally dependent on A If remove an attribute from A and the functional dependency still exists we say that B is partially dependent on A
Example
eno rarr ename (full FD)
eno ename rarr salary title (partial FD - can remove ename)
eno pno rarr hours resp (full FD)
How to Identify FDs in a relation
Examining the semantics of the attributes of relation
Identify all non-trivial functional dependencies in the Universal Relation
Universal (eno pno resp duration ename bdate title salary superNo dno dname mgrNo pname budget)
Eno rarr ename bdate title salary superno dnodnorarrdname mgrnotitlerarr salary
Formal Definition ofFunctional Dependencies
Given relation R defined over U = A1 A2 An where X U Y U If for all pairs of tuples t1 and t2 in any legal instance of relation schema R
t1[X] = t2[X] implies t1[Y] = t2[Y]
then the functional dependency X rarr Y holds in R
Functional Dependencies and Keys
Functional dependencies can be used to determine the candidate keys of a relation
What is superkey candidate key primary key
For example if an attribute functionally determines all other attributes in the relation that attribute can be a key
eno rarr eno ename bdate title supereno dno eno is a candidate key for the Emp relation
Functional Dependencies and Keys
ENo rarr EName BDate Title SuperNo DNoDNo rarr DName MgreNotitle rarr salary
Question List the candidate key(s) of this relation
ENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5
E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7
E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5
E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5
E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7
E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8
E8 J Jones 10-11-72 SA 50000 null D1 Management E8
EmpDept relation
Applications of FDs
1) given a set of FDs F in relation R FD X rarr Y is true or not
2) given a set of FDs F in relation R a set of attributes X is a candidate key of R or not
3) given a set of FDs F in relation R which candidate keys are in R
4) given a set of FDs F in relation R some FDs in F is Excessive or not
Rules for reasoning about functional dependencies
Armstrongs Axioms are three inference rules for functional dependenciesLet A B C and D be subsets of the attributes of relation R
IR1 Reflexive Rule( 自反性 ) If AB then A rarr B
A set of attributes always trivially functionally determines any subset of itself
IR2 Augmentation Rule( 增加性 ) If A rarr B then AC rarr BC
Adding a set of attributes to both the LHS and RHS results in another valid functional dependency
IR3 Transitive Rule( 传递性 ) If A rarr B and B rarr C then A rarr C
Transitivity holds with FDs
Derived Rules from Armstrongs Axioms
There are three other well-known rules that can be derived from the basic three
IR4 Decomposition( 可分解性 ) If A rarr BC then A rarr B and A rarr C A FD with two attributes on RHS can be decomposed into two FDs each with on attribute on the RHS
IR5 Union( 合并性 ) If A rarr B and A rarr C then A rarr BC Reverse of decomposition rule
IR6 Composition( 组合性 ) If A rarr B and C rarr D then AC rarr BD More general than union - can combine two non-overlapping FDs into one
Characteristics of Armstrongs Axioms
The rules are sound( 健壮的 ) and complete( 完备的 )
1) Sound The rules do not infer( 推断 ) any incorrect FDs
2) Complete The rules can be used to infer all correct FDs (no FDs are missing)
The axioms (called IR1 IR2 and IR3) can be used to determine a set of all FDs F+ from an initial set of FDs F
F+ is referred to as the closure of F
Strategy of dealing with questions
The basic idea is that given any set of attributes X we can compute the set of all attributes X+ that can be functionally determined using F This is called the closure of X under F
For question 1 given a set of FDs F in R FD X rarr Y is true or not we know that X rarr Y holds if Y is in X+
For question 2 a set of attributes X is a candidate key of R or not X is a superkey of R if X+ is all attributes of R
For question 3 how to computing F+ we can compute X+ for all possible subsets X of R
Computing the Attribute Closure
The algorithm is as follows
Given a set of attributes X
Let X+ = X
Repeat Find a FD in F whose left side is a subset of X+ Add the right side of F to X+
Until (X+ does not change or X+ contains all attributes of R)
After the algorithm completes you have a set of attributes X+ that can be functionally determined by X This allows you to produce FDs of the form
X rarr A where A is in X+
Computing Attribute Closure Example
R (ABCDEG)
F = A rarr BC C rarr D D rarr G
Compute A+
A+ = A (initial step)
A+ = ABC (by using FD A rarr BC)
A+ = ABCD (by using FD C rarr D)
A+ = ABCDG (by using FD D rarr G)
Similarly we can compute C+ and EG+
C+ = CDG
EG+ = EG
Using Attribute Closure to Detect a FD
The attribute closure algorithm can be used to determine if a particular FD X rarr Y holds based on the set of given FDs F Compute X+ and see if Y is in X+
Example
R = (ABCDEG)
F = A rarr BC C rarr D D rarr G
Does the FD CD rarr G hold Compute CD+ = CDG Yes the FD CD rarr G holds
Does the FD BC rarr E hold Compute BC+ = BCDG No the FD BC rarr E does n
ot hold
Using Attribute Closure to Find all FDs
The attribute closure algorithm can be used to find all FDs of a given relation R if perform the attribute closure algorithm for all subsets X of the attributes R
This is an exponential algorithm in the number of attributes in R Although there are a few optimizations Do not compute the closure of the empty set or of the set of all attributes If we find X+ = all attributes do not compute the closure of any supersets of X Note that even though you do not have to compute the closure (because you know the answer) you still must use the known closure to infer new FDs
Using Attribute Closureto Find all FDs Example
Compute F+ from F where
R (ABCD)
F = A rarr BC C rarr D
Must compute all these attribute closures
A+ B+ C+ D+
AB+ AC+ AD+ BC+ BD+ CD+
ABC+ ABD+ ACD+ BCD+
Do not have to compute ABCD+
Start with the single sets
A+ = ABCD B+ = B C+ = CD D+ = D
New FD A rarr D
Using Attribute Closureto Find all FDs Example (cont)
Next the two element sets Do not compute AB+ AC+ AD+ because A+ contained all attributes of R New FDs AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C BC+ = BCD BD+ = BD CD+ = CD New FD BC rarr DThree element sets Do not compute ABC+ ABD+ ACD+ New FDs ABC rarr D ABD rarr C ACD rarr B BCD+ = BCD No new FDsAll FDs F+ = A rarr BC C rarr D A rarr D BC rarr D AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C ABC rarr D ABD rarrC ACD rarr B
FD Closure Questions
1) Compute all nontrivial FDs and list all keys of R
R = (ABCD)
F = AB rarr C C rarr D D rarr A
2) Compute all nontrivial FDs and list all candidate keys of R
R = (ABCD)
F = AB rarr C BC rarr D CD rarr A AD rarr B
Equivalent Sets of FDs
A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+ We can determine if F covers E by calculating X+ with respect to F for each FD X rarr Y in E and checking whether X+ includes Y
Two sets of FDs E and F are equivalent if E covers F and F covers E That is E+= F+
Minimal Set of FDs
A set of FDs F is minimal if it satisfies these conditions Every FD in F has a single attribute on its right-hand side We cannot replace any FD X rarr A in F with Y rarr A where Y
X and still have a set of dependencies that is equivalent to F
We cannot remove any FD and still have a set that is equivalent to F
A minimal sets of FDs is like a canonical( 规范的 ) form of FDs
Note that there may be many minimal covers for a given set of functional dependencies F
Normal Forms
A relation is in a particular normal form if it satisfies certain
normalization properties
There are several normal forms defined 1NF - First Normal Form 2NF - Second Normal Form 3NF - Third Normal Form BCNF - Boyce-Codd Normal Form 4NF - Fourth Normal Form 5NF - Fifth Normal Form
Each of these normal forms are stricter than the next
For example 3NF is better than 2NF because it removes more redundancyanomalies from the schema than 2NF
Normal Forms
All relations (non-normalized)
1NF (First Normal Form)
2NF (Second Normal Form)
3NF (Third Normal Form)
BCNF (Boyce-Codd NF)
4NF (Fourth NF)
5NF (Fifth NF)
First Normal Form (1NF)
A relation is in first normal form (1NF) if all its attribute values are atomic
That is a 1NF relation cannot have an attribute value that is a set of values (multi-valued attribute) a set of tuples (nested relation)
1NF is a standard assumption in relational DBMSs However object-oriented DBMSs and nested relational DBMSs relax t
his constraint
A relation that is not in 1NF is an unnormalized relation
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1P2
AnalystAnalyst
246
E3 A Lee P3P4
ConsultantEngineer
1048
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1 P2 Analyst Analyst 24 6
E3 A Lee P3P4 Consultant Engineer 10 48
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
A non-1NF Relation
One way to convert a non-1NF relation to a 1NF relation
1) Splitting Method - Divide the existing relation into two relations one with non-repeating attributes and the other with repeating attributes
Make a new relation consisting of the primary key of the original relation and the repeating attributes Determine a primary key for this new relation
Remove the repeating attributes from the original relation
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1P2
AnalystAnalyst
246
E3 A Lee P3P4
ConsultantEngineer
1048
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
ENo EName
E1 J Doe
E2 M Smith
E3 A Lee
E4 J Miller
E5 B Casey
E6 L Chu
E7 R Davis
ENo PNo Resp DurationE1 P1 Manager 12E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36
Second Normal Form (2NF)
A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key
If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation
Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key
Second Normal Form (2NF) Example
fd1 and fd4 are partial functional dependencies Normalize to
Emp (eno ename title bdate salary supereno dno)
WorksOn (eno pno resp hours)
Proj (pno pname budget)
ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration
EmpProj relation
fd1 fd2
fd3fd4
Second Normal Form (2NF) Example (2)
ENo EName
BDate Title Salary SuperNo
DNo
Emp relation
fd1
fd2
fd3fd4
PNo PName Budget
Proj relationENo PNo Resp Duration
WorkOn relation
Third Normal Form (3NF)
Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th
an 2 FDs
A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key
Converting a relation to 3NF from 2NF involves the removal of transitive dependencies
Third Normal Form (3NF) Example
fd2 results in a transitive dependency eno rarr salary Remove it
ENo EName BDate Title Salary SuperNo DNo
Emp relation
fd1
fd2
fd2
Title Salary
Pay relation
ENo EName BDate Title SuperNo DNo
Emp relation
fd1
General Definitions of 2NF and 3NF
We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)
General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key
General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key
Note that a prime attribute is an attribute that is in any key (candidate or primary)
General Definition of 3NF Example
The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute
However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key
Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key
ENo EName BDate Title Salary SuperNo DNo SSN
Emp relation
fd1
fd2
Normalization Question
Consider the universal relation
R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ
List the keys for R
Decompose R into 2NF and then 3NF relations
Boyce-Codd Normal Form (BCNF)
A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key
To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key
The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
ClientInterview relation
Relation is in 3NF but not BCNF
fd4
fd3
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
Note that we lose the FD fd2
fd4
fd3
ClientNo InterviewDate
InterviwTime
StaffNo StaffNo InterviwDate
RoomNo
Interview relation StaffRoom relation
BCNF versus 3NF
We can decompose to BCNF but sometimes we do not want to if we lose a FD
The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency
Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)
In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)
Conversion to BCNF
There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y
Note that if any FD in F+ violates BCNF then a FD in F violates BCNF
2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey
3) Replace R by relations with schemas R1 = X+
R2 = (R ndash X+) U X
4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib
utes are all in R1 or R2
Normalization Question
Given this database schema normalize into BCNF
fd1
PID Country Lot Area Price TaxRate
fd2
fd4fd3
Multi-Valued Dependencies
A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema
A MVD occurs when two independent 1N relationships are in the relational schema
Multi-Valued Dependencies Example
Eno ename Title Salary Pno dno
E1 J Doe EE 30000 P1P2
E2 M Smith SA 50000 P1P4 D1D2D3
E3 A Lee ME 40000 P1P3 D2D3
E4 J Miller PR 20000 P3 D3
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1
E1 J Doe EE 30000 P2
E2 M Smith SA 50000 P1 D1
E2 M Smith SA 50000 P1 D2
E2 M Smith SA 50000 P1 D3
E2 M Smith SA 50000 P4 D1
E2 M Smith SA 50000 P4 D2
E2 M Smith SA 50000 P4 D3
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P3 D3
E3 A Lee ME 40000 P3 D3
E4 J Miller PR 20000 P3 D3
Multi-Valued Dependencies (MVDs)
A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C
A trivial MVD A mdashgtgt B occurs when either B is a subset of A or
A cup B = R
Fourth Normal Form (4NF)
Fourth normal form (4NF) is based on the idea of multi-valued dependencies
A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies
Fourth Normal Form (4NF) Example
eno mdashgtgtpno
eno mdashgtgtdno
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3
eno ename title salary
E1 J Doe EE 30000
E2 MSmith SA 50000
E3 A Lee ME 40000
E4 J Miller PR 20000
eno pno
E1 P1
E1 P2
E2 P1
E2 P4
E3 P1
E3 P3
E4 P3
eno dno
E2 D1
E2 D2
E2 D3
E3 D2
E3 D3
E4 D3
Lossless-join Dependency
The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation
A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined
Fifth Normal Form (5NF)
bull Fifth normal form (5NF) is based on join dependencies
A relation is in fifth normal form (5NF) if the relation has no join dependency
bull 5NF is also called project-join normal form (PJNF)
A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R
The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have
ΠR1(r) ΠR2(r) hellip ΠRn(r) = r
Fifth Normal Form (5NF) Example
Consider a relation Supply (sname partName projName)
1048715Add the additional constraint that
If project j requires part p
and supplier s supplies part p
and supplier s supplies at least one item to project j
Then supplier s also supplies part p to project j
Original Supply Relation Supply Relation with new constraintsname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
Fifth Normal Form (5NF) Example
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
sname partName
Smith Bolt
Smith Nut
Adam Bolt
Walton Nut
Adam Nail
sname projName
Smith X
Smith Y
Adam Y
Walton Z
Adam X
partName projName
Bolt X
Nut Y
Bolt Y
Nut Z
Nail X
Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples
Desirable Properties in normalization
Relational schemas that are well-designed have several
important properties
1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or
relationship
2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins
3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations
Normal Forms in Practice
Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used
For example query execution time may increase because of normalization as more joins become necessary to answer queries
Normal Forms in Practice Example
For example street and city uniquely determine a zipcode
In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed
When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly
fd1
ENo City Street ZipCode
fd2
ENo ZipCode
ZipCode City Street
Normal Forms in Practice Summary
In summary normalization is typically used in two ways
To improve on relational schemas after ER design
To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies
There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance
This is the classic time versus space tradeoff
Conclusion
Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies
Normal forms indicate when relations satisfy certain properties
1NF - All attributes are atomic
2NF - All attributes are fully functionally dependent on a key
3NF - There are no transitive dependencies in the relation
4NF - There are no multi-valued dependencies in the relation
5NF - The relation has no join dependency
In practice normalization is used to improve schemas produced after ER design and existing relational schemas
It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time
Objectives
Explain process of normalization and why it is beneficial
Describe the concept of the Universal Relation
What are the motivations for normalization
What are the 3 types of update anomalies Be able to give examples
Describe the lossless-join property and dependency preservation property
Define and explain the concept of a functional dependency
What is a trivial FD
Describe the relationship between FDs and keys
What are Armstrongs Axioms When are they used
What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs
Objectives (2)
Define minimal and covers for a set of FDs
Define normal forms be able to list the normal forms and draw a diagram showing their relationship
Define 1NF Know how to convert non-1NF relation to 1NF
Define 2NF Convert 1NF relation to 2NF
Define 3NF Convert 2NF relation to 3NF
Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example
Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm
Describe multi-valued dependencies When do they occur
What is a trivial MVD
Define 4NF Convert 3NFBCNF relation to 4NF
Objectives (3)
Describe lossless-join property and how it relates to 5NF
Define 5NF
Explain normalization in terms of the time versus space tradeoff
Explain the relationship between normalization and ER modeling
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Slide 45
- Slide 46
- Slide 47
- Slide 48
- Slide 49
- Slide 50
- Slide 51
- Slide 52
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- Slide 60
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
-
The Semantics ofFunctional Dependencies
Similar to keys Functional dependencies are a property of a relation schema (intension) and not a property of a particular instance of the schema (extension)
Functional dependencies are directional
eno rarr ename ename rarr eno
ExampleFDs in EmpDept relation
bull EmpDept RelationENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8E8 J Jones 10-11-72 SA 50000 null D1 Management E8
Eno rarr ename bdate title salary superno dno
dnorarrdname mgrno
titlerarr salary
Trivial Functional Dependencies ( 平凡函数依赖 )
A functional dependency is trivialif the attributes on its left-hand side are a superset of the attributes on its right-hand side
Examples eno rarr eno
eno ename rarr eno
eno pno duration rarr eno Duration
Trivial functional dependencies are not interesting because they do not tell us anything
We are only interested in nontrivial FDs
Full Functional Dependencies
A functional dependency A rarr B is a full functional dependency if removal of any attribute from A results in the dependency not existing any more We say that B is fully functionally dependent on A If remove an attribute from A and the functional dependency still exists we say that B is partially dependent on A
Example
eno rarr ename (full FD)
eno ename rarr salary title (partial FD - can remove ename)
eno pno rarr hours resp (full FD)
How to Identify FDs in a relation
Examining the semantics of the attributes of relation
Identify all non-trivial functional dependencies in the Universal Relation
Universal (eno pno resp duration ename bdate title salary superNo dno dname mgrNo pname budget)
Eno rarr ename bdate title salary superno dnodnorarrdname mgrnotitlerarr salary
Formal Definition ofFunctional Dependencies
Given relation R defined over U = A1 A2 An where X U Y U If for all pairs of tuples t1 and t2 in any legal instance of relation schema R
t1[X] = t2[X] implies t1[Y] = t2[Y]
then the functional dependency X rarr Y holds in R
Functional Dependencies and Keys
Functional dependencies can be used to determine the candidate keys of a relation
What is superkey candidate key primary key
For example if an attribute functionally determines all other attributes in the relation that attribute can be a key
eno rarr eno ename bdate title supereno dno eno is a candidate key for the Emp relation
Functional Dependencies and Keys
ENo rarr EName BDate Title SuperNo DNoDNo rarr DName MgreNotitle rarr salary
Question List the candidate key(s) of this relation
ENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5
E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7
E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5
E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5
E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7
E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8
E8 J Jones 10-11-72 SA 50000 null D1 Management E8
EmpDept relation
Applications of FDs
1) given a set of FDs F in relation R FD X rarr Y is true or not
2) given a set of FDs F in relation R a set of attributes X is a candidate key of R or not
3) given a set of FDs F in relation R which candidate keys are in R
4) given a set of FDs F in relation R some FDs in F is Excessive or not
Rules for reasoning about functional dependencies
Armstrongs Axioms are three inference rules for functional dependenciesLet A B C and D be subsets of the attributes of relation R
IR1 Reflexive Rule( 自反性 ) If AB then A rarr B
A set of attributes always trivially functionally determines any subset of itself
IR2 Augmentation Rule( 增加性 ) If A rarr B then AC rarr BC
Adding a set of attributes to both the LHS and RHS results in another valid functional dependency
IR3 Transitive Rule( 传递性 ) If A rarr B and B rarr C then A rarr C
Transitivity holds with FDs
Derived Rules from Armstrongs Axioms
There are three other well-known rules that can be derived from the basic three
IR4 Decomposition( 可分解性 ) If A rarr BC then A rarr B and A rarr C A FD with two attributes on RHS can be decomposed into two FDs each with on attribute on the RHS
IR5 Union( 合并性 ) If A rarr B and A rarr C then A rarr BC Reverse of decomposition rule
IR6 Composition( 组合性 ) If A rarr B and C rarr D then AC rarr BD More general than union - can combine two non-overlapping FDs into one
Characteristics of Armstrongs Axioms
The rules are sound( 健壮的 ) and complete( 完备的 )
1) Sound The rules do not infer( 推断 ) any incorrect FDs
2) Complete The rules can be used to infer all correct FDs (no FDs are missing)
The axioms (called IR1 IR2 and IR3) can be used to determine a set of all FDs F+ from an initial set of FDs F
F+ is referred to as the closure of F
Strategy of dealing with questions
The basic idea is that given any set of attributes X we can compute the set of all attributes X+ that can be functionally determined using F This is called the closure of X under F
For question 1 given a set of FDs F in R FD X rarr Y is true or not we know that X rarr Y holds if Y is in X+
For question 2 a set of attributes X is a candidate key of R or not X is a superkey of R if X+ is all attributes of R
For question 3 how to computing F+ we can compute X+ for all possible subsets X of R
Computing the Attribute Closure
The algorithm is as follows
Given a set of attributes X
Let X+ = X
Repeat Find a FD in F whose left side is a subset of X+ Add the right side of F to X+
Until (X+ does not change or X+ contains all attributes of R)
After the algorithm completes you have a set of attributes X+ that can be functionally determined by X This allows you to produce FDs of the form
X rarr A where A is in X+
Computing Attribute Closure Example
R (ABCDEG)
F = A rarr BC C rarr D D rarr G
Compute A+
A+ = A (initial step)
A+ = ABC (by using FD A rarr BC)
A+ = ABCD (by using FD C rarr D)
A+ = ABCDG (by using FD D rarr G)
Similarly we can compute C+ and EG+
C+ = CDG
EG+ = EG
Using Attribute Closure to Detect a FD
The attribute closure algorithm can be used to determine if a particular FD X rarr Y holds based on the set of given FDs F Compute X+ and see if Y is in X+
Example
R = (ABCDEG)
F = A rarr BC C rarr D D rarr G
Does the FD CD rarr G hold Compute CD+ = CDG Yes the FD CD rarr G holds
Does the FD BC rarr E hold Compute BC+ = BCDG No the FD BC rarr E does n
ot hold
Using Attribute Closure to Find all FDs
The attribute closure algorithm can be used to find all FDs of a given relation R if perform the attribute closure algorithm for all subsets X of the attributes R
This is an exponential algorithm in the number of attributes in R Although there are a few optimizations Do not compute the closure of the empty set or of the set of all attributes If we find X+ = all attributes do not compute the closure of any supersets of X Note that even though you do not have to compute the closure (because you know the answer) you still must use the known closure to infer new FDs
Using Attribute Closureto Find all FDs Example
Compute F+ from F where
R (ABCD)
F = A rarr BC C rarr D
Must compute all these attribute closures
A+ B+ C+ D+
AB+ AC+ AD+ BC+ BD+ CD+
ABC+ ABD+ ACD+ BCD+
Do not have to compute ABCD+
Start with the single sets
A+ = ABCD B+ = B C+ = CD D+ = D
New FD A rarr D
Using Attribute Closureto Find all FDs Example (cont)
Next the two element sets Do not compute AB+ AC+ AD+ because A+ contained all attributes of R New FDs AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C BC+ = BCD BD+ = BD CD+ = CD New FD BC rarr DThree element sets Do not compute ABC+ ABD+ ACD+ New FDs ABC rarr D ABD rarr C ACD rarr B BCD+ = BCD No new FDsAll FDs F+ = A rarr BC C rarr D A rarr D BC rarr D AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C ABC rarr D ABD rarrC ACD rarr B
FD Closure Questions
1) Compute all nontrivial FDs and list all keys of R
R = (ABCD)
F = AB rarr C C rarr D D rarr A
2) Compute all nontrivial FDs and list all candidate keys of R
R = (ABCD)
F = AB rarr C BC rarr D CD rarr A AD rarr B
Equivalent Sets of FDs
A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+ We can determine if F covers E by calculating X+ with respect to F for each FD X rarr Y in E and checking whether X+ includes Y
Two sets of FDs E and F are equivalent if E covers F and F covers E That is E+= F+
Minimal Set of FDs
A set of FDs F is minimal if it satisfies these conditions Every FD in F has a single attribute on its right-hand side We cannot replace any FD X rarr A in F with Y rarr A where Y
X and still have a set of dependencies that is equivalent to F
We cannot remove any FD and still have a set that is equivalent to F
A minimal sets of FDs is like a canonical( 规范的 ) form of FDs
Note that there may be many minimal covers for a given set of functional dependencies F
Normal Forms
A relation is in a particular normal form if it satisfies certain
normalization properties
There are several normal forms defined 1NF - First Normal Form 2NF - Second Normal Form 3NF - Third Normal Form BCNF - Boyce-Codd Normal Form 4NF - Fourth Normal Form 5NF - Fifth Normal Form
Each of these normal forms are stricter than the next
For example 3NF is better than 2NF because it removes more redundancyanomalies from the schema than 2NF
Normal Forms
All relations (non-normalized)
1NF (First Normal Form)
2NF (Second Normal Form)
3NF (Third Normal Form)
BCNF (Boyce-Codd NF)
4NF (Fourth NF)
5NF (Fifth NF)
First Normal Form (1NF)
A relation is in first normal form (1NF) if all its attribute values are atomic
That is a 1NF relation cannot have an attribute value that is a set of values (multi-valued attribute) a set of tuples (nested relation)
1NF is a standard assumption in relational DBMSs However object-oriented DBMSs and nested relational DBMSs relax t
his constraint
A relation that is not in 1NF is an unnormalized relation
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1P2
AnalystAnalyst
246
E3 A Lee P3P4
ConsultantEngineer
1048
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1 P2 Analyst Analyst 24 6
E3 A Lee P3P4 Consultant Engineer 10 48
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
A non-1NF Relation
One way to convert a non-1NF relation to a 1NF relation
1) Splitting Method - Divide the existing relation into two relations one with non-repeating attributes and the other with repeating attributes
Make a new relation consisting of the primary key of the original relation and the repeating attributes Determine a primary key for this new relation
Remove the repeating attributes from the original relation
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1P2
AnalystAnalyst
246
E3 A Lee P3P4
ConsultantEngineer
1048
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
ENo EName
E1 J Doe
E2 M Smith
E3 A Lee
E4 J Miller
E5 B Casey
E6 L Chu
E7 R Davis
ENo PNo Resp DurationE1 P1 Manager 12E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36
Second Normal Form (2NF)
A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key
If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation
Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key
Second Normal Form (2NF) Example
fd1 and fd4 are partial functional dependencies Normalize to
Emp (eno ename title bdate salary supereno dno)
WorksOn (eno pno resp hours)
Proj (pno pname budget)
ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration
EmpProj relation
fd1 fd2
fd3fd4
Second Normal Form (2NF) Example (2)
ENo EName
BDate Title Salary SuperNo
DNo
Emp relation
fd1
fd2
fd3fd4
PNo PName Budget
Proj relationENo PNo Resp Duration
WorkOn relation
Third Normal Form (3NF)
Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th
an 2 FDs
A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key
Converting a relation to 3NF from 2NF involves the removal of transitive dependencies
Third Normal Form (3NF) Example
fd2 results in a transitive dependency eno rarr salary Remove it
ENo EName BDate Title Salary SuperNo DNo
Emp relation
fd1
fd2
fd2
Title Salary
Pay relation
ENo EName BDate Title SuperNo DNo
Emp relation
fd1
General Definitions of 2NF and 3NF
We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)
General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key
General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key
Note that a prime attribute is an attribute that is in any key (candidate or primary)
General Definition of 3NF Example
The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute
However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key
Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key
ENo EName BDate Title Salary SuperNo DNo SSN
Emp relation
fd1
fd2
Normalization Question
Consider the universal relation
R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ
List the keys for R
Decompose R into 2NF and then 3NF relations
Boyce-Codd Normal Form (BCNF)
A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key
To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key
The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
ClientInterview relation
Relation is in 3NF but not BCNF
fd4
fd3
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
Note that we lose the FD fd2
fd4
fd3
ClientNo InterviewDate
InterviwTime
StaffNo StaffNo InterviwDate
RoomNo
Interview relation StaffRoom relation
BCNF versus 3NF
We can decompose to BCNF but sometimes we do not want to if we lose a FD
The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency
Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)
In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)
Conversion to BCNF
There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y
Note that if any FD in F+ violates BCNF then a FD in F violates BCNF
2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey
3) Replace R by relations with schemas R1 = X+
R2 = (R ndash X+) U X
4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib
utes are all in R1 or R2
Normalization Question
Given this database schema normalize into BCNF
fd1
PID Country Lot Area Price TaxRate
fd2
fd4fd3
Multi-Valued Dependencies
A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema
A MVD occurs when two independent 1N relationships are in the relational schema
Multi-Valued Dependencies Example
Eno ename Title Salary Pno dno
E1 J Doe EE 30000 P1P2
E2 M Smith SA 50000 P1P4 D1D2D3
E3 A Lee ME 40000 P1P3 D2D3
E4 J Miller PR 20000 P3 D3
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1
E1 J Doe EE 30000 P2
E2 M Smith SA 50000 P1 D1
E2 M Smith SA 50000 P1 D2
E2 M Smith SA 50000 P1 D3
E2 M Smith SA 50000 P4 D1
E2 M Smith SA 50000 P4 D2
E2 M Smith SA 50000 P4 D3
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P3 D3
E3 A Lee ME 40000 P3 D3
E4 J Miller PR 20000 P3 D3
Multi-Valued Dependencies (MVDs)
A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C
A trivial MVD A mdashgtgt B occurs when either B is a subset of A or
A cup B = R
Fourth Normal Form (4NF)
Fourth normal form (4NF) is based on the idea of multi-valued dependencies
A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies
Fourth Normal Form (4NF) Example
eno mdashgtgtpno
eno mdashgtgtdno
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3
eno ename title salary
E1 J Doe EE 30000
E2 MSmith SA 50000
E3 A Lee ME 40000
E4 J Miller PR 20000
eno pno
E1 P1
E1 P2
E2 P1
E2 P4
E3 P1
E3 P3
E4 P3
eno dno
E2 D1
E2 D2
E2 D3
E3 D2
E3 D3
E4 D3
Lossless-join Dependency
The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation
A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined
Fifth Normal Form (5NF)
bull Fifth normal form (5NF) is based on join dependencies
A relation is in fifth normal form (5NF) if the relation has no join dependency
bull 5NF is also called project-join normal form (PJNF)
A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R
The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have
ΠR1(r) ΠR2(r) hellip ΠRn(r) = r
Fifth Normal Form (5NF) Example
Consider a relation Supply (sname partName projName)
1048715Add the additional constraint that
If project j requires part p
and supplier s supplies part p
and supplier s supplies at least one item to project j
Then supplier s also supplies part p to project j
Original Supply Relation Supply Relation with new constraintsname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
Fifth Normal Form (5NF) Example
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
sname partName
Smith Bolt
Smith Nut
Adam Bolt
Walton Nut
Adam Nail
sname projName
Smith X
Smith Y
Adam Y
Walton Z
Adam X
partName projName
Bolt X
Nut Y
Bolt Y
Nut Z
Nail X
Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples
Desirable Properties in normalization
Relational schemas that are well-designed have several
important properties
1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or
relationship
2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins
3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations
Normal Forms in Practice
Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used
For example query execution time may increase because of normalization as more joins become necessary to answer queries
Normal Forms in Practice Example
For example street and city uniquely determine a zipcode
In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed
When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly
fd1
ENo City Street ZipCode
fd2
ENo ZipCode
ZipCode City Street
Normal Forms in Practice Summary
In summary normalization is typically used in two ways
To improve on relational schemas after ER design
To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies
There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance
This is the classic time versus space tradeoff
Conclusion
Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies
Normal forms indicate when relations satisfy certain properties
1NF - All attributes are atomic
2NF - All attributes are fully functionally dependent on a key
3NF - There are no transitive dependencies in the relation
4NF - There are no multi-valued dependencies in the relation
5NF - The relation has no join dependency
In practice normalization is used to improve schemas produced after ER design and existing relational schemas
It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time
Objectives
Explain process of normalization and why it is beneficial
Describe the concept of the Universal Relation
What are the motivations for normalization
What are the 3 types of update anomalies Be able to give examples
Describe the lossless-join property and dependency preservation property
Define and explain the concept of a functional dependency
What is a trivial FD
Describe the relationship between FDs and keys
What are Armstrongs Axioms When are they used
What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs
Objectives (2)
Define minimal and covers for a set of FDs
Define normal forms be able to list the normal forms and draw a diagram showing their relationship
Define 1NF Know how to convert non-1NF relation to 1NF
Define 2NF Convert 1NF relation to 2NF
Define 3NF Convert 2NF relation to 3NF
Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example
Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm
Describe multi-valued dependencies When do they occur
What is a trivial MVD
Define 4NF Convert 3NFBCNF relation to 4NF
Objectives (3)
Describe lossless-join property and how it relates to 5NF
Define 5NF
Explain normalization in terms of the time versus space tradeoff
Explain the relationship between normalization and ER modeling
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Slide 45
- Slide 46
- Slide 47
- Slide 48
- Slide 49
- Slide 50
- Slide 51
- Slide 52
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- Slide 60
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
-
ExampleFDs in EmpDept relation
bull EmpDept RelationENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8E8 J Jones 10-11-72 SA 50000 null D1 Management E8
Eno rarr ename bdate title salary superno dno
dnorarrdname mgrno
titlerarr salary
Trivial Functional Dependencies ( 平凡函数依赖 )
A functional dependency is trivialif the attributes on its left-hand side are a superset of the attributes on its right-hand side
Examples eno rarr eno
eno ename rarr eno
eno pno duration rarr eno Duration
Trivial functional dependencies are not interesting because they do not tell us anything
We are only interested in nontrivial FDs
Full Functional Dependencies
A functional dependency A rarr B is a full functional dependency if removal of any attribute from A results in the dependency not existing any more We say that B is fully functionally dependent on A If remove an attribute from A and the functional dependency still exists we say that B is partially dependent on A
Example
eno rarr ename (full FD)
eno ename rarr salary title (partial FD - can remove ename)
eno pno rarr hours resp (full FD)
How to Identify FDs in a relation
Examining the semantics of the attributes of relation
Identify all non-trivial functional dependencies in the Universal Relation
Universal (eno pno resp duration ename bdate title salary superNo dno dname mgrNo pname budget)
Eno rarr ename bdate title salary superno dnodnorarrdname mgrnotitlerarr salary
Formal Definition ofFunctional Dependencies
Given relation R defined over U = A1 A2 An where X U Y U If for all pairs of tuples t1 and t2 in any legal instance of relation schema R
t1[X] = t2[X] implies t1[Y] = t2[Y]
then the functional dependency X rarr Y holds in R
Functional Dependencies and Keys
Functional dependencies can be used to determine the candidate keys of a relation
What is superkey candidate key primary key
For example if an attribute functionally determines all other attributes in the relation that attribute can be a key
eno rarr eno ename bdate title supereno dno eno is a candidate key for the Emp relation
Functional Dependencies and Keys
ENo rarr EName BDate Title SuperNo DNoDNo rarr DName MgreNotitle rarr salary
Question List the candidate key(s) of this relation
ENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5
E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7
E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5
E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5
E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7
E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8
E8 J Jones 10-11-72 SA 50000 null D1 Management E8
EmpDept relation
Applications of FDs
1) given a set of FDs F in relation R FD X rarr Y is true or not
2) given a set of FDs F in relation R a set of attributes X is a candidate key of R or not
3) given a set of FDs F in relation R which candidate keys are in R
4) given a set of FDs F in relation R some FDs in F is Excessive or not
Rules for reasoning about functional dependencies
Armstrongs Axioms are three inference rules for functional dependenciesLet A B C and D be subsets of the attributes of relation R
IR1 Reflexive Rule( 自反性 ) If AB then A rarr B
A set of attributes always trivially functionally determines any subset of itself
IR2 Augmentation Rule( 增加性 ) If A rarr B then AC rarr BC
Adding a set of attributes to both the LHS and RHS results in another valid functional dependency
IR3 Transitive Rule( 传递性 ) If A rarr B and B rarr C then A rarr C
Transitivity holds with FDs
Derived Rules from Armstrongs Axioms
There are three other well-known rules that can be derived from the basic three
IR4 Decomposition( 可分解性 ) If A rarr BC then A rarr B and A rarr C A FD with two attributes on RHS can be decomposed into two FDs each with on attribute on the RHS
IR5 Union( 合并性 ) If A rarr B and A rarr C then A rarr BC Reverse of decomposition rule
IR6 Composition( 组合性 ) If A rarr B and C rarr D then AC rarr BD More general than union - can combine two non-overlapping FDs into one
Characteristics of Armstrongs Axioms
The rules are sound( 健壮的 ) and complete( 完备的 )
1) Sound The rules do not infer( 推断 ) any incorrect FDs
2) Complete The rules can be used to infer all correct FDs (no FDs are missing)
The axioms (called IR1 IR2 and IR3) can be used to determine a set of all FDs F+ from an initial set of FDs F
F+ is referred to as the closure of F
Strategy of dealing with questions
The basic idea is that given any set of attributes X we can compute the set of all attributes X+ that can be functionally determined using F This is called the closure of X under F
For question 1 given a set of FDs F in R FD X rarr Y is true or not we know that X rarr Y holds if Y is in X+
For question 2 a set of attributes X is a candidate key of R or not X is a superkey of R if X+ is all attributes of R
For question 3 how to computing F+ we can compute X+ for all possible subsets X of R
Computing the Attribute Closure
The algorithm is as follows
Given a set of attributes X
Let X+ = X
Repeat Find a FD in F whose left side is a subset of X+ Add the right side of F to X+
Until (X+ does not change or X+ contains all attributes of R)
After the algorithm completes you have a set of attributes X+ that can be functionally determined by X This allows you to produce FDs of the form
X rarr A where A is in X+
Computing Attribute Closure Example
R (ABCDEG)
F = A rarr BC C rarr D D rarr G
Compute A+
A+ = A (initial step)
A+ = ABC (by using FD A rarr BC)
A+ = ABCD (by using FD C rarr D)
A+ = ABCDG (by using FD D rarr G)
Similarly we can compute C+ and EG+
C+ = CDG
EG+ = EG
Using Attribute Closure to Detect a FD
The attribute closure algorithm can be used to determine if a particular FD X rarr Y holds based on the set of given FDs F Compute X+ and see if Y is in X+
Example
R = (ABCDEG)
F = A rarr BC C rarr D D rarr G
Does the FD CD rarr G hold Compute CD+ = CDG Yes the FD CD rarr G holds
Does the FD BC rarr E hold Compute BC+ = BCDG No the FD BC rarr E does n
ot hold
Using Attribute Closure to Find all FDs
The attribute closure algorithm can be used to find all FDs of a given relation R if perform the attribute closure algorithm for all subsets X of the attributes R
This is an exponential algorithm in the number of attributes in R Although there are a few optimizations Do not compute the closure of the empty set or of the set of all attributes If we find X+ = all attributes do not compute the closure of any supersets of X Note that even though you do not have to compute the closure (because you know the answer) you still must use the known closure to infer new FDs
Using Attribute Closureto Find all FDs Example
Compute F+ from F where
R (ABCD)
F = A rarr BC C rarr D
Must compute all these attribute closures
A+ B+ C+ D+
AB+ AC+ AD+ BC+ BD+ CD+
ABC+ ABD+ ACD+ BCD+
Do not have to compute ABCD+
Start with the single sets
A+ = ABCD B+ = B C+ = CD D+ = D
New FD A rarr D
Using Attribute Closureto Find all FDs Example (cont)
Next the two element sets Do not compute AB+ AC+ AD+ because A+ contained all attributes of R New FDs AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C BC+ = BCD BD+ = BD CD+ = CD New FD BC rarr DThree element sets Do not compute ABC+ ABD+ ACD+ New FDs ABC rarr D ABD rarr C ACD rarr B BCD+ = BCD No new FDsAll FDs F+ = A rarr BC C rarr D A rarr D BC rarr D AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C ABC rarr D ABD rarrC ACD rarr B
FD Closure Questions
1) Compute all nontrivial FDs and list all keys of R
R = (ABCD)
F = AB rarr C C rarr D D rarr A
2) Compute all nontrivial FDs and list all candidate keys of R
R = (ABCD)
F = AB rarr C BC rarr D CD rarr A AD rarr B
Equivalent Sets of FDs
A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+ We can determine if F covers E by calculating X+ with respect to F for each FD X rarr Y in E and checking whether X+ includes Y
Two sets of FDs E and F are equivalent if E covers F and F covers E That is E+= F+
Minimal Set of FDs
A set of FDs F is minimal if it satisfies these conditions Every FD in F has a single attribute on its right-hand side We cannot replace any FD X rarr A in F with Y rarr A where Y
X and still have a set of dependencies that is equivalent to F
We cannot remove any FD and still have a set that is equivalent to F
A minimal sets of FDs is like a canonical( 规范的 ) form of FDs
Note that there may be many minimal covers for a given set of functional dependencies F
Normal Forms
A relation is in a particular normal form if it satisfies certain
normalization properties
There are several normal forms defined 1NF - First Normal Form 2NF - Second Normal Form 3NF - Third Normal Form BCNF - Boyce-Codd Normal Form 4NF - Fourth Normal Form 5NF - Fifth Normal Form
Each of these normal forms are stricter than the next
For example 3NF is better than 2NF because it removes more redundancyanomalies from the schema than 2NF
Normal Forms
All relations (non-normalized)
1NF (First Normal Form)
2NF (Second Normal Form)
3NF (Third Normal Form)
BCNF (Boyce-Codd NF)
4NF (Fourth NF)
5NF (Fifth NF)
First Normal Form (1NF)
A relation is in first normal form (1NF) if all its attribute values are atomic
That is a 1NF relation cannot have an attribute value that is a set of values (multi-valued attribute) a set of tuples (nested relation)
1NF is a standard assumption in relational DBMSs However object-oriented DBMSs and nested relational DBMSs relax t
his constraint
A relation that is not in 1NF is an unnormalized relation
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1P2
AnalystAnalyst
246
E3 A Lee P3P4
ConsultantEngineer
1048
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1 P2 Analyst Analyst 24 6
E3 A Lee P3P4 Consultant Engineer 10 48
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
A non-1NF Relation
One way to convert a non-1NF relation to a 1NF relation
1) Splitting Method - Divide the existing relation into two relations one with non-repeating attributes and the other with repeating attributes
Make a new relation consisting of the primary key of the original relation and the repeating attributes Determine a primary key for this new relation
Remove the repeating attributes from the original relation
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1P2
AnalystAnalyst
246
E3 A Lee P3P4
ConsultantEngineer
1048
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
ENo EName
E1 J Doe
E2 M Smith
E3 A Lee
E4 J Miller
E5 B Casey
E6 L Chu
E7 R Davis
ENo PNo Resp DurationE1 P1 Manager 12E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36
Second Normal Form (2NF)
A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key
If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation
Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key
Second Normal Form (2NF) Example
fd1 and fd4 are partial functional dependencies Normalize to
Emp (eno ename title bdate salary supereno dno)
WorksOn (eno pno resp hours)
Proj (pno pname budget)
ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration
EmpProj relation
fd1 fd2
fd3fd4
Second Normal Form (2NF) Example (2)
ENo EName
BDate Title Salary SuperNo
DNo
Emp relation
fd1
fd2
fd3fd4
PNo PName Budget
Proj relationENo PNo Resp Duration
WorkOn relation
Third Normal Form (3NF)
Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th
an 2 FDs
A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key
Converting a relation to 3NF from 2NF involves the removal of transitive dependencies
Third Normal Form (3NF) Example
fd2 results in a transitive dependency eno rarr salary Remove it
ENo EName BDate Title Salary SuperNo DNo
Emp relation
fd1
fd2
fd2
Title Salary
Pay relation
ENo EName BDate Title SuperNo DNo
Emp relation
fd1
General Definitions of 2NF and 3NF
We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)
General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key
General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key
Note that a prime attribute is an attribute that is in any key (candidate or primary)
General Definition of 3NF Example
The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute
However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key
Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key
ENo EName BDate Title Salary SuperNo DNo SSN
Emp relation
fd1
fd2
Normalization Question
Consider the universal relation
R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ
List the keys for R
Decompose R into 2NF and then 3NF relations
Boyce-Codd Normal Form (BCNF)
A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key
To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key
The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
ClientInterview relation
Relation is in 3NF but not BCNF
fd4
fd3
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
Note that we lose the FD fd2
fd4
fd3
ClientNo InterviewDate
InterviwTime
StaffNo StaffNo InterviwDate
RoomNo
Interview relation StaffRoom relation
BCNF versus 3NF
We can decompose to BCNF but sometimes we do not want to if we lose a FD
The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency
Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)
In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)
Conversion to BCNF
There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y
Note that if any FD in F+ violates BCNF then a FD in F violates BCNF
2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey
3) Replace R by relations with schemas R1 = X+
R2 = (R ndash X+) U X
4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib
utes are all in R1 or R2
Normalization Question
Given this database schema normalize into BCNF
fd1
PID Country Lot Area Price TaxRate
fd2
fd4fd3
Multi-Valued Dependencies
A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema
A MVD occurs when two independent 1N relationships are in the relational schema
Multi-Valued Dependencies Example
Eno ename Title Salary Pno dno
E1 J Doe EE 30000 P1P2
E2 M Smith SA 50000 P1P4 D1D2D3
E3 A Lee ME 40000 P1P3 D2D3
E4 J Miller PR 20000 P3 D3
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1
E1 J Doe EE 30000 P2
E2 M Smith SA 50000 P1 D1
E2 M Smith SA 50000 P1 D2
E2 M Smith SA 50000 P1 D3
E2 M Smith SA 50000 P4 D1
E2 M Smith SA 50000 P4 D2
E2 M Smith SA 50000 P4 D3
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P3 D3
E3 A Lee ME 40000 P3 D3
E4 J Miller PR 20000 P3 D3
Multi-Valued Dependencies (MVDs)
A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C
A trivial MVD A mdashgtgt B occurs when either B is a subset of A or
A cup B = R
Fourth Normal Form (4NF)
Fourth normal form (4NF) is based on the idea of multi-valued dependencies
A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies
Fourth Normal Form (4NF) Example
eno mdashgtgtpno
eno mdashgtgtdno
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3
eno ename title salary
E1 J Doe EE 30000
E2 MSmith SA 50000
E3 A Lee ME 40000
E4 J Miller PR 20000
eno pno
E1 P1
E1 P2
E2 P1
E2 P4
E3 P1
E3 P3
E4 P3
eno dno
E2 D1
E2 D2
E2 D3
E3 D2
E3 D3
E4 D3
Lossless-join Dependency
The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation
A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined
Fifth Normal Form (5NF)
bull Fifth normal form (5NF) is based on join dependencies
A relation is in fifth normal form (5NF) if the relation has no join dependency
bull 5NF is also called project-join normal form (PJNF)
A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R
The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have
ΠR1(r) ΠR2(r) hellip ΠRn(r) = r
Fifth Normal Form (5NF) Example
Consider a relation Supply (sname partName projName)
1048715Add the additional constraint that
If project j requires part p
and supplier s supplies part p
and supplier s supplies at least one item to project j
Then supplier s also supplies part p to project j
Original Supply Relation Supply Relation with new constraintsname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
Fifth Normal Form (5NF) Example
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
sname partName
Smith Bolt
Smith Nut
Adam Bolt
Walton Nut
Adam Nail
sname projName
Smith X
Smith Y
Adam Y
Walton Z
Adam X
partName projName
Bolt X
Nut Y
Bolt Y
Nut Z
Nail X
Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples
Desirable Properties in normalization
Relational schemas that are well-designed have several
important properties
1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or
relationship
2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins
3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations
Normal Forms in Practice
Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used
For example query execution time may increase because of normalization as more joins become necessary to answer queries
Normal Forms in Practice Example
For example street and city uniquely determine a zipcode
In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed
When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly
fd1
ENo City Street ZipCode
fd2
ENo ZipCode
ZipCode City Street
Normal Forms in Practice Summary
In summary normalization is typically used in two ways
To improve on relational schemas after ER design
To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies
There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance
This is the classic time versus space tradeoff
Conclusion
Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies
Normal forms indicate when relations satisfy certain properties
1NF - All attributes are atomic
2NF - All attributes are fully functionally dependent on a key
3NF - There are no transitive dependencies in the relation
4NF - There are no multi-valued dependencies in the relation
5NF - The relation has no join dependency
In practice normalization is used to improve schemas produced after ER design and existing relational schemas
It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time
Objectives
Explain process of normalization and why it is beneficial
Describe the concept of the Universal Relation
What are the motivations for normalization
What are the 3 types of update anomalies Be able to give examples
Describe the lossless-join property and dependency preservation property
Define and explain the concept of a functional dependency
What is a trivial FD
Describe the relationship between FDs and keys
What are Armstrongs Axioms When are they used
What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs
Objectives (2)
Define minimal and covers for a set of FDs
Define normal forms be able to list the normal forms and draw a diagram showing their relationship
Define 1NF Know how to convert non-1NF relation to 1NF
Define 2NF Convert 1NF relation to 2NF
Define 3NF Convert 2NF relation to 3NF
Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example
Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm
Describe multi-valued dependencies When do they occur
What is a trivial MVD
Define 4NF Convert 3NFBCNF relation to 4NF
Objectives (3)
Describe lossless-join property and how it relates to 5NF
Define 5NF
Explain normalization in terms of the time versus space tradeoff
Explain the relationship between normalization and ER modeling
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Slide 45
- Slide 46
- Slide 47
- Slide 48
- Slide 49
- Slide 50
- Slide 51
- Slide 52
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- Slide 60
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
-
Trivial Functional Dependencies ( 平凡函数依赖 )
A functional dependency is trivialif the attributes on its left-hand side are a superset of the attributes on its right-hand side
Examples eno rarr eno
eno ename rarr eno
eno pno duration rarr eno Duration
Trivial functional dependencies are not interesting because they do not tell us anything
We are only interested in nontrivial FDs
Full Functional Dependencies
A functional dependency A rarr B is a full functional dependency if removal of any attribute from A results in the dependency not existing any more We say that B is fully functionally dependent on A If remove an attribute from A and the functional dependency still exists we say that B is partially dependent on A
Example
eno rarr ename (full FD)
eno ename rarr salary title (partial FD - can remove ename)
eno pno rarr hours resp (full FD)
How to Identify FDs in a relation
Examining the semantics of the attributes of relation
Identify all non-trivial functional dependencies in the Universal Relation
Universal (eno pno resp duration ename bdate title salary superNo dno dname mgrNo pname budget)
Eno rarr ename bdate title salary superno dnodnorarrdname mgrnotitlerarr salary
Formal Definition ofFunctional Dependencies
Given relation R defined over U = A1 A2 An where X U Y U If for all pairs of tuples t1 and t2 in any legal instance of relation schema R
t1[X] = t2[X] implies t1[Y] = t2[Y]
then the functional dependency X rarr Y holds in R
Functional Dependencies and Keys
Functional dependencies can be used to determine the candidate keys of a relation
What is superkey candidate key primary key
For example if an attribute functionally determines all other attributes in the relation that attribute can be a key
eno rarr eno ename bdate title supereno dno eno is a candidate key for the Emp relation
Functional Dependencies and Keys
ENo rarr EName BDate Title SuperNo DNoDNo rarr DName MgreNotitle rarr salary
Question List the candidate key(s) of this relation
ENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5
E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7
E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5
E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5
E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7
E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8
E8 J Jones 10-11-72 SA 50000 null D1 Management E8
EmpDept relation
Applications of FDs
1) given a set of FDs F in relation R FD X rarr Y is true or not
2) given a set of FDs F in relation R a set of attributes X is a candidate key of R or not
3) given a set of FDs F in relation R which candidate keys are in R
4) given a set of FDs F in relation R some FDs in F is Excessive or not
Rules for reasoning about functional dependencies
Armstrongs Axioms are three inference rules for functional dependenciesLet A B C and D be subsets of the attributes of relation R
IR1 Reflexive Rule( 自反性 ) If AB then A rarr B
A set of attributes always trivially functionally determines any subset of itself
IR2 Augmentation Rule( 增加性 ) If A rarr B then AC rarr BC
Adding a set of attributes to both the LHS and RHS results in another valid functional dependency
IR3 Transitive Rule( 传递性 ) If A rarr B and B rarr C then A rarr C
Transitivity holds with FDs
Derived Rules from Armstrongs Axioms
There are three other well-known rules that can be derived from the basic three
IR4 Decomposition( 可分解性 ) If A rarr BC then A rarr B and A rarr C A FD with two attributes on RHS can be decomposed into two FDs each with on attribute on the RHS
IR5 Union( 合并性 ) If A rarr B and A rarr C then A rarr BC Reverse of decomposition rule
IR6 Composition( 组合性 ) If A rarr B and C rarr D then AC rarr BD More general than union - can combine two non-overlapping FDs into one
Characteristics of Armstrongs Axioms
The rules are sound( 健壮的 ) and complete( 完备的 )
1) Sound The rules do not infer( 推断 ) any incorrect FDs
2) Complete The rules can be used to infer all correct FDs (no FDs are missing)
The axioms (called IR1 IR2 and IR3) can be used to determine a set of all FDs F+ from an initial set of FDs F
F+ is referred to as the closure of F
Strategy of dealing with questions
The basic idea is that given any set of attributes X we can compute the set of all attributes X+ that can be functionally determined using F This is called the closure of X under F
For question 1 given a set of FDs F in R FD X rarr Y is true or not we know that X rarr Y holds if Y is in X+
For question 2 a set of attributes X is a candidate key of R or not X is a superkey of R if X+ is all attributes of R
For question 3 how to computing F+ we can compute X+ for all possible subsets X of R
Computing the Attribute Closure
The algorithm is as follows
Given a set of attributes X
Let X+ = X
Repeat Find a FD in F whose left side is a subset of X+ Add the right side of F to X+
Until (X+ does not change or X+ contains all attributes of R)
After the algorithm completes you have a set of attributes X+ that can be functionally determined by X This allows you to produce FDs of the form
X rarr A where A is in X+
Computing Attribute Closure Example
R (ABCDEG)
F = A rarr BC C rarr D D rarr G
Compute A+
A+ = A (initial step)
A+ = ABC (by using FD A rarr BC)
A+ = ABCD (by using FD C rarr D)
A+ = ABCDG (by using FD D rarr G)
Similarly we can compute C+ and EG+
C+ = CDG
EG+ = EG
Using Attribute Closure to Detect a FD
The attribute closure algorithm can be used to determine if a particular FD X rarr Y holds based on the set of given FDs F Compute X+ and see if Y is in X+
Example
R = (ABCDEG)
F = A rarr BC C rarr D D rarr G
Does the FD CD rarr G hold Compute CD+ = CDG Yes the FD CD rarr G holds
Does the FD BC rarr E hold Compute BC+ = BCDG No the FD BC rarr E does n
ot hold
Using Attribute Closure to Find all FDs
The attribute closure algorithm can be used to find all FDs of a given relation R if perform the attribute closure algorithm for all subsets X of the attributes R
This is an exponential algorithm in the number of attributes in R Although there are a few optimizations Do not compute the closure of the empty set or of the set of all attributes If we find X+ = all attributes do not compute the closure of any supersets of X Note that even though you do not have to compute the closure (because you know the answer) you still must use the known closure to infer new FDs
Using Attribute Closureto Find all FDs Example
Compute F+ from F where
R (ABCD)
F = A rarr BC C rarr D
Must compute all these attribute closures
A+ B+ C+ D+
AB+ AC+ AD+ BC+ BD+ CD+
ABC+ ABD+ ACD+ BCD+
Do not have to compute ABCD+
Start with the single sets
A+ = ABCD B+ = B C+ = CD D+ = D
New FD A rarr D
Using Attribute Closureto Find all FDs Example (cont)
Next the two element sets Do not compute AB+ AC+ AD+ because A+ contained all attributes of R New FDs AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C BC+ = BCD BD+ = BD CD+ = CD New FD BC rarr DThree element sets Do not compute ABC+ ABD+ ACD+ New FDs ABC rarr D ABD rarr C ACD rarr B BCD+ = BCD No new FDsAll FDs F+ = A rarr BC C rarr D A rarr D BC rarr D AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C ABC rarr D ABD rarrC ACD rarr B
FD Closure Questions
1) Compute all nontrivial FDs and list all keys of R
R = (ABCD)
F = AB rarr C C rarr D D rarr A
2) Compute all nontrivial FDs and list all candidate keys of R
R = (ABCD)
F = AB rarr C BC rarr D CD rarr A AD rarr B
Equivalent Sets of FDs
A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+ We can determine if F covers E by calculating X+ with respect to F for each FD X rarr Y in E and checking whether X+ includes Y
Two sets of FDs E and F are equivalent if E covers F and F covers E That is E+= F+
Minimal Set of FDs
A set of FDs F is minimal if it satisfies these conditions Every FD in F has a single attribute on its right-hand side We cannot replace any FD X rarr A in F with Y rarr A where Y
X and still have a set of dependencies that is equivalent to F
We cannot remove any FD and still have a set that is equivalent to F
A minimal sets of FDs is like a canonical( 规范的 ) form of FDs
Note that there may be many minimal covers for a given set of functional dependencies F
Normal Forms
A relation is in a particular normal form if it satisfies certain
normalization properties
There are several normal forms defined 1NF - First Normal Form 2NF - Second Normal Form 3NF - Third Normal Form BCNF - Boyce-Codd Normal Form 4NF - Fourth Normal Form 5NF - Fifth Normal Form
Each of these normal forms are stricter than the next
For example 3NF is better than 2NF because it removes more redundancyanomalies from the schema than 2NF
Normal Forms
All relations (non-normalized)
1NF (First Normal Form)
2NF (Second Normal Form)
3NF (Third Normal Form)
BCNF (Boyce-Codd NF)
4NF (Fourth NF)
5NF (Fifth NF)
First Normal Form (1NF)
A relation is in first normal form (1NF) if all its attribute values are atomic
That is a 1NF relation cannot have an attribute value that is a set of values (multi-valued attribute) a set of tuples (nested relation)
1NF is a standard assumption in relational DBMSs However object-oriented DBMSs and nested relational DBMSs relax t
his constraint
A relation that is not in 1NF is an unnormalized relation
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1P2
AnalystAnalyst
246
E3 A Lee P3P4
ConsultantEngineer
1048
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1 P2 Analyst Analyst 24 6
E3 A Lee P3P4 Consultant Engineer 10 48
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
A non-1NF Relation
One way to convert a non-1NF relation to a 1NF relation
1) Splitting Method - Divide the existing relation into two relations one with non-repeating attributes and the other with repeating attributes
Make a new relation consisting of the primary key of the original relation and the repeating attributes Determine a primary key for this new relation
Remove the repeating attributes from the original relation
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1P2
AnalystAnalyst
246
E3 A Lee P3P4
ConsultantEngineer
1048
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
ENo EName
E1 J Doe
E2 M Smith
E3 A Lee
E4 J Miller
E5 B Casey
E6 L Chu
E7 R Davis
ENo PNo Resp DurationE1 P1 Manager 12E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36
Second Normal Form (2NF)
A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key
If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation
Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key
Second Normal Form (2NF) Example
fd1 and fd4 are partial functional dependencies Normalize to
Emp (eno ename title bdate salary supereno dno)
WorksOn (eno pno resp hours)
Proj (pno pname budget)
ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration
EmpProj relation
fd1 fd2
fd3fd4
Second Normal Form (2NF) Example (2)
ENo EName
BDate Title Salary SuperNo
DNo
Emp relation
fd1
fd2
fd3fd4
PNo PName Budget
Proj relationENo PNo Resp Duration
WorkOn relation
Third Normal Form (3NF)
Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th
an 2 FDs
A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key
Converting a relation to 3NF from 2NF involves the removal of transitive dependencies
Third Normal Form (3NF) Example
fd2 results in a transitive dependency eno rarr salary Remove it
ENo EName BDate Title Salary SuperNo DNo
Emp relation
fd1
fd2
fd2
Title Salary
Pay relation
ENo EName BDate Title SuperNo DNo
Emp relation
fd1
General Definitions of 2NF and 3NF
We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)
General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key
General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key
Note that a prime attribute is an attribute that is in any key (candidate or primary)
General Definition of 3NF Example
The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute
However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key
Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key
ENo EName BDate Title Salary SuperNo DNo SSN
Emp relation
fd1
fd2
Normalization Question
Consider the universal relation
R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ
List the keys for R
Decompose R into 2NF and then 3NF relations
Boyce-Codd Normal Form (BCNF)
A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key
To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key
The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
ClientInterview relation
Relation is in 3NF but not BCNF
fd4
fd3
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
Note that we lose the FD fd2
fd4
fd3
ClientNo InterviewDate
InterviwTime
StaffNo StaffNo InterviwDate
RoomNo
Interview relation StaffRoom relation
BCNF versus 3NF
We can decompose to BCNF but sometimes we do not want to if we lose a FD
The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency
Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)
In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)
Conversion to BCNF
There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y
Note that if any FD in F+ violates BCNF then a FD in F violates BCNF
2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey
3) Replace R by relations with schemas R1 = X+
R2 = (R ndash X+) U X
4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib
utes are all in R1 or R2
Normalization Question
Given this database schema normalize into BCNF
fd1
PID Country Lot Area Price TaxRate
fd2
fd4fd3
Multi-Valued Dependencies
A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema
A MVD occurs when two independent 1N relationships are in the relational schema
Multi-Valued Dependencies Example
Eno ename Title Salary Pno dno
E1 J Doe EE 30000 P1P2
E2 M Smith SA 50000 P1P4 D1D2D3
E3 A Lee ME 40000 P1P3 D2D3
E4 J Miller PR 20000 P3 D3
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1
E1 J Doe EE 30000 P2
E2 M Smith SA 50000 P1 D1
E2 M Smith SA 50000 P1 D2
E2 M Smith SA 50000 P1 D3
E2 M Smith SA 50000 P4 D1
E2 M Smith SA 50000 P4 D2
E2 M Smith SA 50000 P4 D3
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P3 D3
E3 A Lee ME 40000 P3 D3
E4 J Miller PR 20000 P3 D3
Multi-Valued Dependencies (MVDs)
A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C
A trivial MVD A mdashgtgt B occurs when either B is a subset of A or
A cup B = R
Fourth Normal Form (4NF)
Fourth normal form (4NF) is based on the idea of multi-valued dependencies
A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies
Fourth Normal Form (4NF) Example
eno mdashgtgtpno
eno mdashgtgtdno
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3
eno ename title salary
E1 J Doe EE 30000
E2 MSmith SA 50000
E3 A Lee ME 40000
E4 J Miller PR 20000
eno pno
E1 P1
E1 P2
E2 P1
E2 P4
E3 P1
E3 P3
E4 P3
eno dno
E2 D1
E2 D2
E2 D3
E3 D2
E3 D3
E4 D3
Lossless-join Dependency
The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation
A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined
Fifth Normal Form (5NF)
bull Fifth normal form (5NF) is based on join dependencies
A relation is in fifth normal form (5NF) if the relation has no join dependency
bull 5NF is also called project-join normal form (PJNF)
A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R
The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have
ΠR1(r) ΠR2(r) hellip ΠRn(r) = r
Fifth Normal Form (5NF) Example
Consider a relation Supply (sname partName projName)
1048715Add the additional constraint that
If project j requires part p
and supplier s supplies part p
and supplier s supplies at least one item to project j
Then supplier s also supplies part p to project j
Original Supply Relation Supply Relation with new constraintsname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
Fifth Normal Form (5NF) Example
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
sname partName
Smith Bolt
Smith Nut
Adam Bolt
Walton Nut
Adam Nail
sname projName
Smith X
Smith Y
Adam Y
Walton Z
Adam X
partName projName
Bolt X
Nut Y
Bolt Y
Nut Z
Nail X
Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples
Desirable Properties in normalization
Relational schemas that are well-designed have several
important properties
1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or
relationship
2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins
3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations
Normal Forms in Practice
Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used
For example query execution time may increase because of normalization as more joins become necessary to answer queries
Normal Forms in Practice Example
For example street and city uniquely determine a zipcode
In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed
When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly
fd1
ENo City Street ZipCode
fd2
ENo ZipCode
ZipCode City Street
Normal Forms in Practice Summary
In summary normalization is typically used in two ways
To improve on relational schemas after ER design
To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies
There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance
This is the classic time versus space tradeoff
Conclusion
Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies
Normal forms indicate when relations satisfy certain properties
1NF - All attributes are atomic
2NF - All attributes are fully functionally dependent on a key
3NF - There are no transitive dependencies in the relation
4NF - There are no multi-valued dependencies in the relation
5NF - The relation has no join dependency
In practice normalization is used to improve schemas produced after ER design and existing relational schemas
It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time
Objectives
Explain process of normalization and why it is beneficial
Describe the concept of the Universal Relation
What are the motivations for normalization
What are the 3 types of update anomalies Be able to give examples
Describe the lossless-join property and dependency preservation property
Define and explain the concept of a functional dependency
What is a trivial FD
Describe the relationship between FDs and keys
What are Armstrongs Axioms When are they used
What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs
Objectives (2)
Define minimal and covers for a set of FDs
Define normal forms be able to list the normal forms and draw a diagram showing their relationship
Define 1NF Know how to convert non-1NF relation to 1NF
Define 2NF Convert 1NF relation to 2NF
Define 3NF Convert 2NF relation to 3NF
Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example
Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm
Describe multi-valued dependencies When do they occur
What is a trivial MVD
Define 4NF Convert 3NFBCNF relation to 4NF
Objectives (3)
Describe lossless-join property and how it relates to 5NF
Define 5NF
Explain normalization in terms of the time versus space tradeoff
Explain the relationship between normalization and ER modeling
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Slide 45
- Slide 46
- Slide 47
- Slide 48
- Slide 49
- Slide 50
- Slide 51
- Slide 52
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- Slide 60
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
-
Full Functional Dependencies
A functional dependency A rarr B is a full functional dependency if removal of any attribute from A results in the dependency not existing any more We say that B is fully functionally dependent on A If remove an attribute from A and the functional dependency still exists we say that B is partially dependent on A
Example
eno rarr ename (full FD)
eno ename rarr salary title (partial FD - can remove ename)
eno pno rarr hours resp (full FD)
How to Identify FDs in a relation
Examining the semantics of the attributes of relation
Identify all non-trivial functional dependencies in the Universal Relation
Universal (eno pno resp duration ename bdate title salary superNo dno dname mgrNo pname budget)
Eno rarr ename bdate title salary superno dnodnorarrdname mgrnotitlerarr salary
Formal Definition ofFunctional Dependencies
Given relation R defined over U = A1 A2 An where X U Y U If for all pairs of tuples t1 and t2 in any legal instance of relation schema R
t1[X] = t2[X] implies t1[Y] = t2[Y]
then the functional dependency X rarr Y holds in R
Functional Dependencies and Keys
Functional dependencies can be used to determine the candidate keys of a relation
What is superkey candidate key primary key
For example if an attribute functionally determines all other attributes in the relation that attribute can be a key
eno rarr eno ename bdate title supereno dno eno is a candidate key for the Emp relation
Functional Dependencies and Keys
ENo rarr EName BDate Title SuperNo DNoDNo rarr DName MgreNotitle rarr salary
Question List the candidate key(s) of this relation
ENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5
E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7
E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5
E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5
E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7
E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8
E8 J Jones 10-11-72 SA 50000 null D1 Management E8
EmpDept relation
Applications of FDs
1) given a set of FDs F in relation R FD X rarr Y is true or not
2) given a set of FDs F in relation R a set of attributes X is a candidate key of R or not
3) given a set of FDs F in relation R which candidate keys are in R
4) given a set of FDs F in relation R some FDs in F is Excessive or not
Rules for reasoning about functional dependencies
Armstrongs Axioms are three inference rules for functional dependenciesLet A B C and D be subsets of the attributes of relation R
IR1 Reflexive Rule( 自反性 ) If AB then A rarr B
A set of attributes always trivially functionally determines any subset of itself
IR2 Augmentation Rule( 增加性 ) If A rarr B then AC rarr BC
Adding a set of attributes to both the LHS and RHS results in another valid functional dependency
IR3 Transitive Rule( 传递性 ) If A rarr B and B rarr C then A rarr C
Transitivity holds with FDs
Derived Rules from Armstrongs Axioms
There are three other well-known rules that can be derived from the basic three
IR4 Decomposition( 可分解性 ) If A rarr BC then A rarr B and A rarr C A FD with two attributes on RHS can be decomposed into two FDs each with on attribute on the RHS
IR5 Union( 合并性 ) If A rarr B and A rarr C then A rarr BC Reverse of decomposition rule
IR6 Composition( 组合性 ) If A rarr B and C rarr D then AC rarr BD More general than union - can combine two non-overlapping FDs into one
Characteristics of Armstrongs Axioms
The rules are sound( 健壮的 ) and complete( 完备的 )
1) Sound The rules do not infer( 推断 ) any incorrect FDs
2) Complete The rules can be used to infer all correct FDs (no FDs are missing)
The axioms (called IR1 IR2 and IR3) can be used to determine a set of all FDs F+ from an initial set of FDs F
F+ is referred to as the closure of F
Strategy of dealing with questions
The basic idea is that given any set of attributes X we can compute the set of all attributes X+ that can be functionally determined using F This is called the closure of X under F
For question 1 given a set of FDs F in R FD X rarr Y is true or not we know that X rarr Y holds if Y is in X+
For question 2 a set of attributes X is a candidate key of R or not X is a superkey of R if X+ is all attributes of R
For question 3 how to computing F+ we can compute X+ for all possible subsets X of R
Computing the Attribute Closure
The algorithm is as follows
Given a set of attributes X
Let X+ = X
Repeat Find a FD in F whose left side is a subset of X+ Add the right side of F to X+
Until (X+ does not change or X+ contains all attributes of R)
After the algorithm completes you have a set of attributes X+ that can be functionally determined by X This allows you to produce FDs of the form
X rarr A where A is in X+
Computing Attribute Closure Example
R (ABCDEG)
F = A rarr BC C rarr D D rarr G
Compute A+
A+ = A (initial step)
A+ = ABC (by using FD A rarr BC)
A+ = ABCD (by using FD C rarr D)
A+ = ABCDG (by using FD D rarr G)
Similarly we can compute C+ and EG+
C+ = CDG
EG+ = EG
Using Attribute Closure to Detect a FD
The attribute closure algorithm can be used to determine if a particular FD X rarr Y holds based on the set of given FDs F Compute X+ and see if Y is in X+
Example
R = (ABCDEG)
F = A rarr BC C rarr D D rarr G
Does the FD CD rarr G hold Compute CD+ = CDG Yes the FD CD rarr G holds
Does the FD BC rarr E hold Compute BC+ = BCDG No the FD BC rarr E does n
ot hold
Using Attribute Closure to Find all FDs
The attribute closure algorithm can be used to find all FDs of a given relation R if perform the attribute closure algorithm for all subsets X of the attributes R
This is an exponential algorithm in the number of attributes in R Although there are a few optimizations Do not compute the closure of the empty set or of the set of all attributes If we find X+ = all attributes do not compute the closure of any supersets of X Note that even though you do not have to compute the closure (because you know the answer) you still must use the known closure to infer new FDs
Using Attribute Closureto Find all FDs Example
Compute F+ from F where
R (ABCD)
F = A rarr BC C rarr D
Must compute all these attribute closures
A+ B+ C+ D+
AB+ AC+ AD+ BC+ BD+ CD+
ABC+ ABD+ ACD+ BCD+
Do not have to compute ABCD+
Start with the single sets
A+ = ABCD B+ = B C+ = CD D+ = D
New FD A rarr D
Using Attribute Closureto Find all FDs Example (cont)
Next the two element sets Do not compute AB+ AC+ AD+ because A+ contained all attributes of R New FDs AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C BC+ = BCD BD+ = BD CD+ = CD New FD BC rarr DThree element sets Do not compute ABC+ ABD+ ACD+ New FDs ABC rarr D ABD rarr C ACD rarr B BCD+ = BCD No new FDsAll FDs F+ = A rarr BC C rarr D A rarr D BC rarr D AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C ABC rarr D ABD rarrC ACD rarr B
FD Closure Questions
1) Compute all nontrivial FDs and list all keys of R
R = (ABCD)
F = AB rarr C C rarr D D rarr A
2) Compute all nontrivial FDs and list all candidate keys of R
R = (ABCD)
F = AB rarr C BC rarr D CD rarr A AD rarr B
Equivalent Sets of FDs
A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+ We can determine if F covers E by calculating X+ with respect to F for each FD X rarr Y in E and checking whether X+ includes Y
Two sets of FDs E and F are equivalent if E covers F and F covers E That is E+= F+
Minimal Set of FDs
A set of FDs F is minimal if it satisfies these conditions Every FD in F has a single attribute on its right-hand side We cannot replace any FD X rarr A in F with Y rarr A where Y
X and still have a set of dependencies that is equivalent to F
We cannot remove any FD and still have a set that is equivalent to F
A minimal sets of FDs is like a canonical( 规范的 ) form of FDs
Note that there may be many minimal covers for a given set of functional dependencies F
Normal Forms
A relation is in a particular normal form if it satisfies certain
normalization properties
There are several normal forms defined 1NF - First Normal Form 2NF - Second Normal Form 3NF - Third Normal Form BCNF - Boyce-Codd Normal Form 4NF - Fourth Normal Form 5NF - Fifth Normal Form
Each of these normal forms are stricter than the next
For example 3NF is better than 2NF because it removes more redundancyanomalies from the schema than 2NF
Normal Forms
All relations (non-normalized)
1NF (First Normal Form)
2NF (Second Normal Form)
3NF (Third Normal Form)
BCNF (Boyce-Codd NF)
4NF (Fourth NF)
5NF (Fifth NF)
First Normal Form (1NF)
A relation is in first normal form (1NF) if all its attribute values are atomic
That is a 1NF relation cannot have an attribute value that is a set of values (multi-valued attribute) a set of tuples (nested relation)
1NF is a standard assumption in relational DBMSs However object-oriented DBMSs and nested relational DBMSs relax t
his constraint
A relation that is not in 1NF is an unnormalized relation
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1P2
AnalystAnalyst
246
E3 A Lee P3P4
ConsultantEngineer
1048
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1 P2 Analyst Analyst 24 6
E3 A Lee P3P4 Consultant Engineer 10 48
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
A non-1NF Relation
One way to convert a non-1NF relation to a 1NF relation
1) Splitting Method - Divide the existing relation into two relations one with non-repeating attributes and the other with repeating attributes
Make a new relation consisting of the primary key of the original relation and the repeating attributes Determine a primary key for this new relation
Remove the repeating attributes from the original relation
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1P2
AnalystAnalyst
246
E3 A Lee P3P4
ConsultantEngineer
1048
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
ENo EName
E1 J Doe
E2 M Smith
E3 A Lee
E4 J Miller
E5 B Casey
E6 L Chu
E7 R Davis
ENo PNo Resp DurationE1 P1 Manager 12E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36
Second Normal Form (2NF)
A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key
If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation
Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key
Second Normal Form (2NF) Example
fd1 and fd4 are partial functional dependencies Normalize to
Emp (eno ename title bdate salary supereno dno)
WorksOn (eno pno resp hours)
Proj (pno pname budget)
ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration
EmpProj relation
fd1 fd2
fd3fd4
Second Normal Form (2NF) Example (2)
ENo EName
BDate Title Salary SuperNo
DNo
Emp relation
fd1
fd2
fd3fd4
PNo PName Budget
Proj relationENo PNo Resp Duration
WorkOn relation
Third Normal Form (3NF)
Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th
an 2 FDs
A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key
Converting a relation to 3NF from 2NF involves the removal of transitive dependencies
Third Normal Form (3NF) Example
fd2 results in a transitive dependency eno rarr salary Remove it
ENo EName BDate Title Salary SuperNo DNo
Emp relation
fd1
fd2
fd2
Title Salary
Pay relation
ENo EName BDate Title SuperNo DNo
Emp relation
fd1
General Definitions of 2NF and 3NF
We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)
General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key
General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key
Note that a prime attribute is an attribute that is in any key (candidate or primary)
General Definition of 3NF Example
The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute
However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key
Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key
ENo EName BDate Title Salary SuperNo DNo SSN
Emp relation
fd1
fd2
Normalization Question
Consider the universal relation
R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ
List the keys for R
Decompose R into 2NF and then 3NF relations
Boyce-Codd Normal Form (BCNF)
A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key
To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key
The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
ClientInterview relation
Relation is in 3NF but not BCNF
fd4
fd3
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
Note that we lose the FD fd2
fd4
fd3
ClientNo InterviewDate
InterviwTime
StaffNo StaffNo InterviwDate
RoomNo
Interview relation StaffRoom relation
BCNF versus 3NF
We can decompose to BCNF but sometimes we do not want to if we lose a FD
The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency
Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)
In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)
Conversion to BCNF
There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y
Note that if any FD in F+ violates BCNF then a FD in F violates BCNF
2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey
3) Replace R by relations with schemas R1 = X+
R2 = (R ndash X+) U X
4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib
utes are all in R1 or R2
Normalization Question
Given this database schema normalize into BCNF
fd1
PID Country Lot Area Price TaxRate
fd2
fd4fd3
Multi-Valued Dependencies
A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema
A MVD occurs when two independent 1N relationships are in the relational schema
Multi-Valued Dependencies Example
Eno ename Title Salary Pno dno
E1 J Doe EE 30000 P1P2
E2 M Smith SA 50000 P1P4 D1D2D3
E3 A Lee ME 40000 P1P3 D2D3
E4 J Miller PR 20000 P3 D3
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1
E1 J Doe EE 30000 P2
E2 M Smith SA 50000 P1 D1
E2 M Smith SA 50000 P1 D2
E2 M Smith SA 50000 P1 D3
E2 M Smith SA 50000 P4 D1
E2 M Smith SA 50000 P4 D2
E2 M Smith SA 50000 P4 D3
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P3 D3
E3 A Lee ME 40000 P3 D3
E4 J Miller PR 20000 P3 D3
Multi-Valued Dependencies (MVDs)
A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C
A trivial MVD A mdashgtgt B occurs when either B is a subset of A or
A cup B = R
Fourth Normal Form (4NF)
Fourth normal form (4NF) is based on the idea of multi-valued dependencies
A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies
Fourth Normal Form (4NF) Example
eno mdashgtgtpno
eno mdashgtgtdno
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3
eno ename title salary
E1 J Doe EE 30000
E2 MSmith SA 50000
E3 A Lee ME 40000
E4 J Miller PR 20000
eno pno
E1 P1
E1 P2
E2 P1
E2 P4
E3 P1
E3 P3
E4 P3
eno dno
E2 D1
E2 D2
E2 D3
E3 D2
E3 D3
E4 D3
Lossless-join Dependency
The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation
A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined
Fifth Normal Form (5NF)
bull Fifth normal form (5NF) is based on join dependencies
A relation is in fifth normal form (5NF) if the relation has no join dependency
bull 5NF is also called project-join normal form (PJNF)
A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R
The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have
ΠR1(r) ΠR2(r) hellip ΠRn(r) = r
Fifth Normal Form (5NF) Example
Consider a relation Supply (sname partName projName)
1048715Add the additional constraint that
If project j requires part p
and supplier s supplies part p
and supplier s supplies at least one item to project j
Then supplier s also supplies part p to project j
Original Supply Relation Supply Relation with new constraintsname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
Fifth Normal Form (5NF) Example
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
sname partName
Smith Bolt
Smith Nut
Adam Bolt
Walton Nut
Adam Nail
sname projName
Smith X
Smith Y
Adam Y
Walton Z
Adam X
partName projName
Bolt X
Nut Y
Bolt Y
Nut Z
Nail X
Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples
Desirable Properties in normalization
Relational schemas that are well-designed have several
important properties
1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or
relationship
2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins
3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations
Normal Forms in Practice
Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used
For example query execution time may increase because of normalization as more joins become necessary to answer queries
Normal Forms in Practice Example
For example street and city uniquely determine a zipcode
In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed
When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly
fd1
ENo City Street ZipCode
fd2
ENo ZipCode
ZipCode City Street
Normal Forms in Practice Summary
In summary normalization is typically used in two ways
To improve on relational schemas after ER design
To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies
There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance
This is the classic time versus space tradeoff
Conclusion
Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies
Normal forms indicate when relations satisfy certain properties
1NF - All attributes are atomic
2NF - All attributes are fully functionally dependent on a key
3NF - There are no transitive dependencies in the relation
4NF - There are no multi-valued dependencies in the relation
5NF - The relation has no join dependency
In practice normalization is used to improve schemas produced after ER design and existing relational schemas
It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time
Objectives
Explain process of normalization and why it is beneficial
Describe the concept of the Universal Relation
What are the motivations for normalization
What are the 3 types of update anomalies Be able to give examples
Describe the lossless-join property and dependency preservation property
Define and explain the concept of a functional dependency
What is a trivial FD
Describe the relationship between FDs and keys
What are Armstrongs Axioms When are they used
What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs
Objectives (2)
Define minimal and covers for a set of FDs
Define normal forms be able to list the normal forms and draw a diagram showing their relationship
Define 1NF Know how to convert non-1NF relation to 1NF
Define 2NF Convert 1NF relation to 2NF
Define 3NF Convert 2NF relation to 3NF
Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example
Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm
Describe multi-valued dependencies When do they occur
What is a trivial MVD
Define 4NF Convert 3NFBCNF relation to 4NF
Objectives (3)
Describe lossless-join property and how it relates to 5NF
Define 5NF
Explain normalization in terms of the time versus space tradeoff
Explain the relationship between normalization and ER modeling
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Slide 45
- Slide 46
- Slide 47
- Slide 48
- Slide 49
- Slide 50
- Slide 51
- Slide 52
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- Slide 60
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
-
How to Identify FDs in a relation
Examining the semantics of the attributes of relation
Identify all non-trivial functional dependencies in the Universal Relation
Universal (eno pno resp duration ename bdate title salary superNo dno dname mgrNo pname budget)
Eno rarr ename bdate title salary superno dnodnorarrdname mgrnotitlerarr salary
Formal Definition ofFunctional Dependencies
Given relation R defined over U = A1 A2 An where X U Y U If for all pairs of tuples t1 and t2 in any legal instance of relation schema R
t1[X] = t2[X] implies t1[Y] = t2[Y]
then the functional dependency X rarr Y holds in R
Functional Dependencies and Keys
Functional dependencies can be used to determine the candidate keys of a relation
What is superkey candidate key primary key
For example if an attribute functionally determines all other attributes in the relation that attribute can be a key
eno rarr eno ename bdate title supereno dno eno is a candidate key for the Emp relation
Functional Dependencies and Keys
ENo rarr EName BDate Title SuperNo DNoDNo rarr DName MgreNotitle rarr salary
Question List the candidate key(s) of this relation
ENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5
E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7
E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5
E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5
E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7
E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8
E8 J Jones 10-11-72 SA 50000 null D1 Management E8
EmpDept relation
Applications of FDs
1) given a set of FDs F in relation R FD X rarr Y is true or not
2) given a set of FDs F in relation R a set of attributes X is a candidate key of R or not
3) given a set of FDs F in relation R which candidate keys are in R
4) given a set of FDs F in relation R some FDs in F is Excessive or not
Rules for reasoning about functional dependencies
Armstrongs Axioms are three inference rules for functional dependenciesLet A B C and D be subsets of the attributes of relation R
IR1 Reflexive Rule( 自反性 ) If AB then A rarr B
A set of attributes always trivially functionally determines any subset of itself
IR2 Augmentation Rule( 增加性 ) If A rarr B then AC rarr BC
Adding a set of attributes to both the LHS and RHS results in another valid functional dependency
IR3 Transitive Rule( 传递性 ) If A rarr B and B rarr C then A rarr C
Transitivity holds with FDs
Derived Rules from Armstrongs Axioms
There are three other well-known rules that can be derived from the basic three
IR4 Decomposition( 可分解性 ) If A rarr BC then A rarr B and A rarr C A FD with two attributes on RHS can be decomposed into two FDs each with on attribute on the RHS
IR5 Union( 合并性 ) If A rarr B and A rarr C then A rarr BC Reverse of decomposition rule
IR6 Composition( 组合性 ) If A rarr B and C rarr D then AC rarr BD More general than union - can combine two non-overlapping FDs into one
Characteristics of Armstrongs Axioms
The rules are sound( 健壮的 ) and complete( 完备的 )
1) Sound The rules do not infer( 推断 ) any incorrect FDs
2) Complete The rules can be used to infer all correct FDs (no FDs are missing)
The axioms (called IR1 IR2 and IR3) can be used to determine a set of all FDs F+ from an initial set of FDs F
F+ is referred to as the closure of F
Strategy of dealing with questions
The basic idea is that given any set of attributes X we can compute the set of all attributes X+ that can be functionally determined using F This is called the closure of X under F
For question 1 given a set of FDs F in R FD X rarr Y is true or not we know that X rarr Y holds if Y is in X+
For question 2 a set of attributes X is a candidate key of R or not X is a superkey of R if X+ is all attributes of R
For question 3 how to computing F+ we can compute X+ for all possible subsets X of R
Computing the Attribute Closure
The algorithm is as follows
Given a set of attributes X
Let X+ = X
Repeat Find a FD in F whose left side is a subset of X+ Add the right side of F to X+
Until (X+ does not change or X+ contains all attributes of R)
After the algorithm completes you have a set of attributes X+ that can be functionally determined by X This allows you to produce FDs of the form
X rarr A where A is in X+
Computing Attribute Closure Example
R (ABCDEG)
F = A rarr BC C rarr D D rarr G
Compute A+
A+ = A (initial step)
A+ = ABC (by using FD A rarr BC)
A+ = ABCD (by using FD C rarr D)
A+ = ABCDG (by using FD D rarr G)
Similarly we can compute C+ and EG+
C+ = CDG
EG+ = EG
Using Attribute Closure to Detect a FD
The attribute closure algorithm can be used to determine if a particular FD X rarr Y holds based on the set of given FDs F Compute X+ and see if Y is in X+
Example
R = (ABCDEG)
F = A rarr BC C rarr D D rarr G
Does the FD CD rarr G hold Compute CD+ = CDG Yes the FD CD rarr G holds
Does the FD BC rarr E hold Compute BC+ = BCDG No the FD BC rarr E does n
ot hold
Using Attribute Closure to Find all FDs
The attribute closure algorithm can be used to find all FDs of a given relation R if perform the attribute closure algorithm for all subsets X of the attributes R
This is an exponential algorithm in the number of attributes in R Although there are a few optimizations Do not compute the closure of the empty set or of the set of all attributes If we find X+ = all attributes do not compute the closure of any supersets of X Note that even though you do not have to compute the closure (because you know the answer) you still must use the known closure to infer new FDs
Using Attribute Closureto Find all FDs Example
Compute F+ from F where
R (ABCD)
F = A rarr BC C rarr D
Must compute all these attribute closures
A+ B+ C+ D+
AB+ AC+ AD+ BC+ BD+ CD+
ABC+ ABD+ ACD+ BCD+
Do not have to compute ABCD+
Start with the single sets
A+ = ABCD B+ = B C+ = CD D+ = D
New FD A rarr D
Using Attribute Closureto Find all FDs Example (cont)
Next the two element sets Do not compute AB+ AC+ AD+ because A+ contained all attributes of R New FDs AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C BC+ = BCD BD+ = BD CD+ = CD New FD BC rarr DThree element sets Do not compute ABC+ ABD+ ACD+ New FDs ABC rarr D ABD rarr C ACD rarr B BCD+ = BCD No new FDsAll FDs F+ = A rarr BC C rarr D A rarr D BC rarr D AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C ABC rarr D ABD rarrC ACD rarr B
FD Closure Questions
1) Compute all nontrivial FDs and list all keys of R
R = (ABCD)
F = AB rarr C C rarr D D rarr A
2) Compute all nontrivial FDs and list all candidate keys of R
R = (ABCD)
F = AB rarr C BC rarr D CD rarr A AD rarr B
Equivalent Sets of FDs
A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+ We can determine if F covers E by calculating X+ with respect to F for each FD X rarr Y in E and checking whether X+ includes Y
Two sets of FDs E and F are equivalent if E covers F and F covers E That is E+= F+
Minimal Set of FDs
A set of FDs F is minimal if it satisfies these conditions Every FD in F has a single attribute on its right-hand side We cannot replace any FD X rarr A in F with Y rarr A where Y
X and still have a set of dependencies that is equivalent to F
We cannot remove any FD and still have a set that is equivalent to F
A minimal sets of FDs is like a canonical( 规范的 ) form of FDs
Note that there may be many minimal covers for a given set of functional dependencies F
Normal Forms
A relation is in a particular normal form if it satisfies certain
normalization properties
There are several normal forms defined 1NF - First Normal Form 2NF - Second Normal Form 3NF - Third Normal Form BCNF - Boyce-Codd Normal Form 4NF - Fourth Normal Form 5NF - Fifth Normal Form
Each of these normal forms are stricter than the next
For example 3NF is better than 2NF because it removes more redundancyanomalies from the schema than 2NF
Normal Forms
All relations (non-normalized)
1NF (First Normal Form)
2NF (Second Normal Form)
3NF (Third Normal Form)
BCNF (Boyce-Codd NF)
4NF (Fourth NF)
5NF (Fifth NF)
First Normal Form (1NF)
A relation is in first normal form (1NF) if all its attribute values are atomic
That is a 1NF relation cannot have an attribute value that is a set of values (multi-valued attribute) a set of tuples (nested relation)
1NF is a standard assumption in relational DBMSs However object-oriented DBMSs and nested relational DBMSs relax t
his constraint
A relation that is not in 1NF is an unnormalized relation
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1P2
AnalystAnalyst
246
E3 A Lee P3P4
ConsultantEngineer
1048
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1 P2 Analyst Analyst 24 6
E3 A Lee P3P4 Consultant Engineer 10 48
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
A non-1NF Relation
One way to convert a non-1NF relation to a 1NF relation
1) Splitting Method - Divide the existing relation into two relations one with non-repeating attributes and the other with repeating attributes
Make a new relation consisting of the primary key of the original relation and the repeating attributes Determine a primary key for this new relation
Remove the repeating attributes from the original relation
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1P2
AnalystAnalyst
246
E3 A Lee P3P4
ConsultantEngineer
1048
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
ENo EName
E1 J Doe
E2 M Smith
E3 A Lee
E4 J Miller
E5 B Casey
E6 L Chu
E7 R Davis
ENo PNo Resp DurationE1 P1 Manager 12E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36
Second Normal Form (2NF)
A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key
If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation
Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key
Second Normal Form (2NF) Example
fd1 and fd4 are partial functional dependencies Normalize to
Emp (eno ename title bdate salary supereno dno)
WorksOn (eno pno resp hours)
Proj (pno pname budget)
ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration
EmpProj relation
fd1 fd2
fd3fd4
Second Normal Form (2NF) Example (2)
ENo EName
BDate Title Salary SuperNo
DNo
Emp relation
fd1
fd2
fd3fd4
PNo PName Budget
Proj relationENo PNo Resp Duration
WorkOn relation
Third Normal Form (3NF)
Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th
an 2 FDs
A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key
Converting a relation to 3NF from 2NF involves the removal of transitive dependencies
Third Normal Form (3NF) Example
fd2 results in a transitive dependency eno rarr salary Remove it
ENo EName BDate Title Salary SuperNo DNo
Emp relation
fd1
fd2
fd2
Title Salary
Pay relation
ENo EName BDate Title SuperNo DNo
Emp relation
fd1
General Definitions of 2NF and 3NF
We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)
General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key
General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key
Note that a prime attribute is an attribute that is in any key (candidate or primary)
General Definition of 3NF Example
The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute
However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key
Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key
ENo EName BDate Title Salary SuperNo DNo SSN
Emp relation
fd1
fd2
Normalization Question
Consider the universal relation
R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ
List the keys for R
Decompose R into 2NF and then 3NF relations
Boyce-Codd Normal Form (BCNF)
A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key
To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key
The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
ClientInterview relation
Relation is in 3NF but not BCNF
fd4
fd3
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
Note that we lose the FD fd2
fd4
fd3
ClientNo InterviewDate
InterviwTime
StaffNo StaffNo InterviwDate
RoomNo
Interview relation StaffRoom relation
BCNF versus 3NF
We can decompose to BCNF but sometimes we do not want to if we lose a FD
The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency
Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)
In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)
Conversion to BCNF
There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y
Note that if any FD in F+ violates BCNF then a FD in F violates BCNF
2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey
3) Replace R by relations with schemas R1 = X+
R2 = (R ndash X+) U X
4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib
utes are all in R1 or R2
Normalization Question
Given this database schema normalize into BCNF
fd1
PID Country Lot Area Price TaxRate
fd2
fd4fd3
Multi-Valued Dependencies
A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema
A MVD occurs when two independent 1N relationships are in the relational schema
Multi-Valued Dependencies Example
Eno ename Title Salary Pno dno
E1 J Doe EE 30000 P1P2
E2 M Smith SA 50000 P1P4 D1D2D3
E3 A Lee ME 40000 P1P3 D2D3
E4 J Miller PR 20000 P3 D3
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1
E1 J Doe EE 30000 P2
E2 M Smith SA 50000 P1 D1
E2 M Smith SA 50000 P1 D2
E2 M Smith SA 50000 P1 D3
E2 M Smith SA 50000 P4 D1
E2 M Smith SA 50000 P4 D2
E2 M Smith SA 50000 P4 D3
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P3 D3
E3 A Lee ME 40000 P3 D3
E4 J Miller PR 20000 P3 D3
Multi-Valued Dependencies (MVDs)
A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C
A trivial MVD A mdashgtgt B occurs when either B is a subset of A or
A cup B = R
Fourth Normal Form (4NF)
Fourth normal form (4NF) is based on the idea of multi-valued dependencies
A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies
Fourth Normal Form (4NF) Example
eno mdashgtgtpno
eno mdashgtgtdno
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3
eno ename title salary
E1 J Doe EE 30000
E2 MSmith SA 50000
E3 A Lee ME 40000
E4 J Miller PR 20000
eno pno
E1 P1
E1 P2
E2 P1
E2 P4
E3 P1
E3 P3
E4 P3
eno dno
E2 D1
E2 D2
E2 D3
E3 D2
E3 D3
E4 D3
Lossless-join Dependency
The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation
A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined
Fifth Normal Form (5NF)
bull Fifth normal form (5NF) is based on join dependencies
A relation is in fifth normal form (5NF) if the relation has no join dependency
bull 5NF is also called project-join normal form (PJNF)
A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R
The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have
ΠR1(r) ΠR2(r) hellip ΠRn(r) = r
Fifth Normal Form (5NF) Example
Consider a relation Supply (sname partName projName)
1048715Add the additional constraint that
If project j requires part p
and supplier s supplies part p
and supplier s supplies at least one item to project j
Then supplier s also supplies part p to project j
Original Supply Relation Supply Relation with new constraintsname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
Fifth Normal Form (5NF) Example
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
sname partName
Smith Bolt
Smith Nut
Adam Bolt
Walton Nut
Adam Nail
sname projName
Smith X
Smith Y
Adam Y
Walton Z
Adam X
partName projName
Bolt X
Nut Y
Bolt Y
Nut Z
Nail X
Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples
Desirable Properties in normalization
Relational schemas that are well-designed have several
important properties
1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or
relationship
2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins
3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations
Normal Forms in Practice
Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used
For example query execution time may increase because of normalization as more joins become necessary to answer queries
Normal Forms in Practice Example
For example street and city uniquely determine a zipcode
In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed
When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly
fd1
ENo City Street ZipCode
fd2
ENo ZipCode
ZipCode City Street
Normal Forms in Practice Summary
In summary normalization is typically used in two ways
To improve on relational schemas after ER design
To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies
There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance
This is the classic time versus space tradeoff
Conclusion
Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies
Normal forms indicate when relations satisfy certain properties
1NF - All attributes are atomic
2NF - All attributes are fully functionally dependent on a key
3NF - There are no transitive dependencies in the relation
4NF - There are no multi-valued dependencies in the relation
5NF - The relation has no join dependency
In practice normalization is used to improve schemas produced after ER design and existing relational schemas
It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time
Objectives
Explain process of normalization and why it is beneficial
Describe the concept of the Universal Relation
What are the motivations for normalization
What are the 3 types of update anomalies Be able to give examples
Describe the lossless-join property and dependency preservation property
Define and explain the concept of a functional dependency
What is a trivial FD
Describe the relationship between FDs and keys
What are Armstrongs Axioms When are they used
What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs
Objectives (2)
Define minimal and covers for a set of FDs
Define normal forms be able to list the normal forms and draw a diagram showing their relationship
Define 1NF Know how to convert non-1NF relation to 1NF
Define 2NF Convert 1NF relation to 2NF
Define 3NF Convert 2NF relation to 3NF
Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example
Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm
Describe multi-valued dependencies When do they occur
What is a trivial MVD
Define 4NF Convert 3NFBCNF relation to 4NF
Objectives (3)
Describe lossless-join property and how it relates to 5NF
Define 5NF
Explain normalization in terms of the time versus space tradeoff
Explain the relationship between normalization and ER modeling
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Slide 45
- Slide 46
- Slide 47
- Slide 48
- Slide 49
- Slide 50
- Slide 51
- Slide 52
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- Slide 60
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
-
Formal Definition ofFunctional Dependencies
Given relation R defined over U = A1 A2 An where X U Y U If for all pairs of tuples t1 and t2 in any legal instance of relation schema R
t1[X] = t2[X] implies t1[Y] = t2[Y]
then the functional dependency X rarr Y holds in R
Functional Dependencies and Keys
Functional dependencies can be used to determine the candidate keys of a relation
What is superkey candidate key primary key
For example if an attribute functionally determines all other attributes in the relation that attribute can be a key
eno rarr eno ename bdate title supereno dno eno is a candidate key for the Emp relation
Functional Dependencies and Keys
ENo rarr EName BDate Title SuperNo DNoDNo rarr DName MgreNotitle rarr salary
Question List the candidate key(s) of this relation
ENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5
E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7
E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5
E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5
E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7
E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8
E8 J Jones 10-11-72 SA 50000 null D1 Management E8
EmpDept relation
Applications of FDs
1) given a set of FDs F in relation R FD X rarr Y is true or not
2) given a set of FDs F in relation R a set of attributes X is a candidate key of R or not
3) given a set of FDs F in relation R which candidate keys are in R
4) given a set of FDs F in relation R some FDs in F is Excessive or not
Rules for reasoning about functional dependencies
Armstrongs Axioms are three inference rules for functional dependenciesLet A B C and D be subsets of the attributes of relation R
IR1 Reflexive Rule( 自反性 ) If AB then A rarr B
A set of attributes always trivially functionally determines any subset of itself
IR2 Augmentation Rule( 增加性 ) If A rarr B then AC rarr BC
Adding a set of attributes to both the LHS and RHS results in another valid functional dependency
IR3 Transitive Rule( 传递性 ) If A rarr B and B rarr C then A rarr C
Transitivity holds with FDs
Derived Rules from Armstrongs Axioms
There are three other well-known rules that can be derived from the basic three
IR4 Decomposition( 可分解性 ) If A rarr BC then A rarr B and A rarr C A FD with two attributes on RHS can be decomposed into two FDs each with on attribute on the RHS
IR5 Union( 合并性 ) If A rarr B and A rarr C then A rarr BC Reverse of decomposition rule
IR6 Composition( 组合性 ) If A rarr B and C rarr D then AC rarr BD More general than union - can combine two non-overlapping FDs into one
Characteristics of Armstrongs Axioms
The rules are sound( 健壮的 ) and complete( 完备的 )
1) Sound The rules do not infer( 推断 ) any incorrect FDs
2) Complete The rules can be used to infer all correct FDs (no FDs are missing)
The axioms (called IR1 IR2 and IR3) can be used to determine a set of all FDs F+ from an initial set of FDs F
F+ is referred to as the closure of F
Strategy of dealing with questions
The basic idea is that given any set of attributes X we can compute the set of all attributes X+ that can be functionally determined using F This is called the closure of X under F
For question 1 given a set of FDs F in R FD X rarr Y is true or not we know that X rarr Y holds if Y is in X+
For question 2 a set of attributes X is a candidate key of R or not X is a superkey of R if X+ is all attributes of R
For question 3 how to computing F+ we can compute X+ for all possible subsets X of R
Computing the Attribute Closure
The algorithm is as follows
Given a set of attributes X
Let X+ = X
Repeat Find a FD in F whose left side is a subset of X+ Add the right side of F to X+
Until (X+ does not change or X+ contains all attributes of R)
After the algorithm completes you have a set of attributes X+ that can be functionally determined by X This allows you to produce FDs of the form
X rarr A where A is in X+
Computing Attribute Closure Example
R (ABCDEG)
F = A rarr BC C rarr D D rarr G
Compute A+
A+ = A (initial step)
A+ = ABC (by using FD A rarr BC)
A+ = ABCD (by using FD C rarr D)
A+ = ABCDG (by using FD D rarr G)
Similarly we can compute C+ and EG+
C+ = CDG
EG+ = EG
Using Attribute Closure to Detect a FD
The attribute closure algorithm can be used to determine if a particular FD X rarr Y holds based on the set of given FDs F Compute X+ and see if Y is in X+
Example
R = (ABCDEG)
F = A rarr BC C rarr D D rarr G
Does the FD CD rarr G hold Compute CD+ = CDG Yes the FD CD rarr G holds
Does the FD BC rarr E hold Compute BC+ = BCDG No the FD BC rarr E does n
ot hold
Using Attribute Closure to Find all FDs
The attribute closure algorithm can be used to find all FDs of a given relation R if perform the attribute closure algorithm for all subsets X of the attributes R
This is an exponential algorithm in the number of attributes in R Although there are a few optimizations Do not compute the closure of the empty set or of the set of all attributes If we find X+ = all attributes do not compute the closure of any supersets of X Note that even though you do not have to compute the closure (because you know the answer) you still must use the known closure to infer new FDs
Using Attribute Closureto Find all FDs Example
Compute F+ from F where
R (ABCD)
F = A rarr BC C rarr D
Must compute all these attribute closures
A+ B+ C+ D+
AB+ AC+ AD+ BC+ BD+ CD+
ABC+ ABD+ ACD+ BCD+
Do not have to compute ABCD+
Start with the single sets
A+ = ABCD B+ = B C+ = CD D+ = D
New FD A rarr D
Using Attribute Closureto Find all FDs Example (cont)
Next the two element sets Do not compute AB+ AC+ AD+ because A+ contained all attributes of R New FDs AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C BC+ = BCD BD+ = BD CD+ = CD New FD BC rarr DThree element sets Do not compute ABC+ ABD+ ACD+ New FDs ABC rarr D ABD rarr C ACD rarr B BCD+ = BCD No new FDsAll FDs F+ = A rarr BC C rarr D A rarr D BC rarr D AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C ABC rarr D ABD rarrC ACD rarr B
FD Closure Questions
1) Compute all nontrivial FDs and list all keys of R
R = (ABCD)
F = AB rarr C C rarr D D rarr A
2) Compute all nontrivial FDs and list all candidate keys of R
R = (ABCD)
F = AB rarr C BC rarr D CD rarr A AD rarr B
Equivalent Sets of FDs
A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+ We can determine if F covers E by calculating X+ with respect to F for each FD X rarr Y in E and checking whether X+ includes Y
Two sets of FDs E and F are equivalent if E covers F and F covers E That is E+= F+
Minimal Set of FDs
A set of FDs F is minimal if it satisfies these conditions Every FD in F has a single attribute on its right-hand side We cannot replace any FD X rarr A in F with Y rarr A where Y
X and still have a set of dependencies that is equivalent to F
We cannot remove any FD and still have a set that is equivalent to F
A minimal sets of FDs is like a canonical( 规范的 ) form of FDs
Note that there may be many minimal covers for a given set of functional dependencies F
Normal Forms
A relation is in a particular normal form if it satisfies certain
normalization properties
There are several normal forms defined 1NF - First Normal Form 2NF - Second Normal Form 3NF - Third Normal Form BCNF - Boyce-Codd Normal Form 4NF - Fourth Normal Form 5NF - Fifth Normal Form
Each of these normal forms are stricter than the next
For example 3NF is better than 2NF because it removes more redundancyanomalies from the schema than 2NF
Normal Forms
All relations (non-normalized)
1NF (First Normal Form)
2NF (Second Normal Form)
3NF (Third Normal Form)
BCNF (Boyce-Codd NF)
4NF (Fourth NF)
5NF (Fifth NF)
First Normal Form (1NF)
A relation is in first normal form (1NF) if all its attribute values are atomic
That is a 1NF relation cannot have an attribute value that is a set of values (multi-valued attribute) a set of tuples (nested relation)
1NF is a standard assumption in relational DBMSs However object-oriented DBMSs and nested relational DBMSs relax t
his constraint
A relation that is not in 1NF is an unnormalized relation
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1P2
AnalystAnalyst
246
E3 A Lee P3P4
ConsultantEngineer
1048
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1 P2 Analyst Analyst 24 6
E3 A Lee P3P4 Consultant Engineer 10 48
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
A non-1NF Relation
One way to convert a non-1NF relation to a 1NF relation
1) Splitting Method - Divide the existing relation into two relations one with non-repeating attributes and the other with repeating attributes
Make a new relation consisting of the primary key of the original relation and the repeating attributes Determine a primary key for this new relation
Remove the repeating attributes from the original relation
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1P2
AnalystAnalyst
246
E3 A Lee P3P4
ConsultantEngineer
1048
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
ENo EName
E1 J Doe
E2 M Smith
E3 A Lee
E4 J Miller
E5 B Casey
E6 L Chu
E7 R Davis
ENo PNo Resp DurationE1 P1 Manager 12E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36
Second Normal Form (2NF)
A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key
If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation
Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key
Second Normal Form (2NF) Example
fd1 and fd4 are partial functional dependencies Normalize to
Emp (eno ename title bdate salary supereno dno)
WorksOn (eno pno resp hours)
Proj (pno pname budget)
ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration
EmpProj relation
fd1 fd2
fd3fd4
Second Normal Form (2NF) Example (2)
ENo EName
BDate Title Salary SuperNo
DNo
Emp relation
fd1
fd2
fd3fd4
PNo PName Budget
Proj relationENo PNo Resp Duration
WorkOn relation
Third Normal Form (3NF)
Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th
an 2 FDs
A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key
Converting a relation to 3NF from 2NF involves the removal of transitive dependencies
Third Normal Form (3NF) Example
fd2 results in a transitive dependency eno rarr salary Remove it
ENo EName BDate Title Salary SuperNo DNo
Emp relation
fd1
fd2
fd2
Title Salary
Pay relation
ENo EName BDate Title SuperNo DNo
Emp relation
fd1
General Definitions of 2NF and 3NF
We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)
General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key
General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key
Note that a prime attribute is an attribute that is in any key (candidate or primary)
General Definition of 3NF Example
The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute
However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key
Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key
ENo EName BDate Title Salary SuperNo DNo SSN
Emp relation
fd1
fd2
Normalization Question
Consider the universal relation
R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ
List the keys for R
Decompose R into 2NF and then 3NF relations
Boyce-Codd Normal Form (BCNF)
A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key
To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key
The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
ClientInterview relation
Relation is in 3NF but not BCNF
fd4
fd3
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
Note that we lose the FD fd2
fd4
fd3
ClientNo InterviewDate
InterviwTime
StaffNo StaffNo InterviwDate
RoomNo
Interview relation StaffRoom relation
BCNF versus 3NF
We can decompose to BCNF but sometimes we do not want to if we lose a FD
The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency
Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)
In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)
Conversion to BCNF
There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y
Note that if any FD in F+ violates BCNF then a FD in F violates BCNF
2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey
3) Replace R by relations with schemas R1 = X+
R2 = (R ndash X+) U X
4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib
utes are all in R1 or R2
Normalization Question
Given this database schema normalize into BCNF
fd1
PID Country Lot Area Price TaxRate
fd2
fd4fd3
Multi-Valued Dependencies
A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema
A MVD occurs when two independent 1N relationships are in the relational schema
Multi-Valued Dependencies Example
Eno ename Title Salary Pno dno
E1 J Doe EE 30000 P1P2
E2 M Smith SA 50000 P1P4 D1D2D3
E3 A Lee ME 40000 P1P3 D2D3
E4 J Miller PR 20000 P3 D3
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1
E1 J Doe EE 30000 P2
E2 M Smith SA 50000 P1 D1
E2 M Smith SA 50000 P1 D2
E2 M Smith SA 50000 P1 D3
E2 M Smith SA 50000 P4 D1
E2 M Smith SA 50000 P4 D2
E2 M Smith SA 50000 P4 D3
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P3 D3
E3 A Lee ME 40000 P3 D3
E4 J Miller PR 20000 P3 D3
Multi-Valued Dependencies (MVDs)
A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C
A trivial MVD A mdashgtgt B occurs when either B is a subset of A or
A cup B = R
Fourth Normal Form (4NF)
Fourth normal form (4NF) is based on the idea of multi-valued dependencies
A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies
Fourth Normal Form (4NF) Example
eno mdashgtgtpno
eno mdashgtgtdno
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3
eno ename title salary
E1 J Doe EE 30000
E2 MSmith SA 50000
E3 A Lee ME 40000
E4 J Miller PR 20000
eno pno
E1 P1
E1 P2
E2 P1
E2 P4
E3 P1
E3 P3
E4 P3
eno dno
E2 D1
E2 D2
E2 D3
E3 D2
E3 D3
E4 D3
Lossless-join Dependency
The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation
A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined
Fifth Normal Form (5NF)
bull Fifth normal form (5NF) is based on join dependencies
A relation is in fifth normal form (5NF) if the relation has no join dependency
bull 5NF is also called project-join normal form (PJNF)
A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R
The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have
ΠR1(r) ΠR2(r) hellip ΠRn(r) = r
Fifth Normal Form (5NF) Example
Consider a relation Supply (sname partName projName)
1048715Add the additional constraint that
If project j requires part p
and supplier s supplies part p
and supplier s supplies at least one item to project j
Then supplier s also supplies part p to project j
Original Supply Relation Supply Relation with new constraintsname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
Fifth Normal Form (5NF) Example
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
sname partName
Smith Bolt
Smith Nut
Adam Bolt
Walton Nut
Adam Nail
sname projName
Smith X
Smith Y
Adam Y
Walton Z
Adam X
partName projName
Bolt X
Nut Y
Bolt Y
Nut Z
Nail X
Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples
Desirable Properties in normalization
Relational schemas that are well-designed have several
important properties
1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or
relationship
2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins
3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations
Normal Forms in Practice
Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used
For example query execution time may increase because of normalization as more joins become necessary to answer queries
Normal Forms in Practice Example
For example street and city uniquely determine a zipcode
In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed
When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly
fd1
ENo City Street ZipCode
fd2
ENo ZipCode
ZipCode City Street
Normal Forms in Practice Summary
In summary normalization is typically used in two ways
To improve on relational schemas after ER design
To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies
There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance
This is the classic time versus space tradeoff
Conclusion
Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies
Normal forms indicate when relations satisfy certain properties
1NF - All attributes are atomic
2NF - All attributes are fully functionally dependent on a key
3NF - There are no transitive dependencies in the relation
4NF - There are no multi-valued dependencies in the relation
5NF - The relation has no join dependency
In practice normalization is used to improve schemas produced after ER design and existing relational schemas
It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time
Objectives
Explain process of normalization and why it is beneficial
Describe the concept of the Universal Relation
What are the motivations for normalization
What are the 3 types of update anomalies Be able to give examples
Describe the lossless-join property and dependency preservation property
Define and explain the concept of a functional dependency
What is a trivial FD
Describe the relationship between FDs and keys
What are Armstrongs Axioms When are they used
What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs
Objectives (2)
Define minimal and covers for a set of FDs
Define normal forms be able to list the normal forms and draw a diagram showing their relationship
Define 1NF Know how to convert non-1NF relation to 1NF
Define 2NF Convert 1NF relation to 2NF
Define 3NF Convert 2NF relation to 3NF
Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example
Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm
Describe multi-valued dependencies When do they occur
What is a trivial MVD
Define 4NF Convert 3NFBCNF relation to 4NF
Objectives (3)
Describe lossless-join property and how it relates to 5NF
Define 5NF
Explain normalization in terms of the time versus space tradeoff
Explain the relationship between normalization and ER modeling
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Slide 45
- Slide 46
- Slide 47
- Slide 48
- Slide 49
- Slide 50
- Slide 51
- Slide 52
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- Slide 60
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
-
Functional Dependencies and Keys
Functional dependencies can be used to determine the candidate keys of a relation
What is superkey candidate key primary key
For example if an attribute functionally determines all other attributes in the relation that attribute can be a key
eno rarr eno ename bdate title supereno dno eno is a candidate key for the Emp relation
Functional Dependencies and Keys
ENo rarr EName BDate Title SuperNo DNoDNo rarr DName MgreNotitle rarr salary
Question List the candidate key(s) of this relation
ENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5
E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7
E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5
E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5
E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7
E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8
E8 J Jones 10-11-72 SA 50000 null D1 Management E8
EmpDept relation
Applications of FDs
1) given a set of FDs F in relation R FD X rarr Y is true or not
2) given a set of FDs F in relation R a set of attributes X is a candidate key of R or not
3) given a set of FDs F in relation R which candidate keys are in R
4) given a set of FDs F in relation R some FDs in F is Excessive or not
Rules for reasoning about functional dependencies
Armstrongs Axioms are three inference rules for functional dependenciesLet A B C and D be subsets of the attributes of relation R
IR1 Reflexive Rule( 自反性 ) If AB then A rarr B
A set of attributes always trivially functionally determines any subset of itself
IR2 Augmentation Rule( 增加性 ) If A rarr B then AC rarr BC
Adding a set of attributes to both the LHS and RHS results in another valid functional dependency
IR3 Transitive Rule( 传递性 ) If A rarr B and B rarr C then A rarr C
Transitivity holds with FDs
Derived Rules from Armstrongs Axioms
There are three other well-known rules that can be derived from the basic three
IR4 Decomposition( 可分解性 ) If A rarr BC then A rarr B and A rarr C A FD with two attributes on RHS can be decomposed into two FDs each with on attribute on the RHS
IR5 Union( 合并性 ) If A rarr B and A rarr C then A rarr BC Reverse of decomposition rule
IR6 Composition( 组合性 ) If A rarr B and C rarr D then AC rarr BD More general than union - can combine two non-overlapping FDs into one
Characteristics of Armstrongs Axioms
The rules are sound( 健壮的 ) and complete( 完备的 )
1) Sound The rules do not infer( 推断 ) any incorrect FDs
2) Complete The rules can be used to infer all correct FDs (no FDs are missing)
The axioms (called IR1 IR2 and IR3) can be used to determine a set of all FDs F+ from an initial set of FDs F
F+ is referred to as the closure of F
Strategy of dealing with questions
The basic idea is that given any set of attributes X we can compute the set of all attributes X+ that can be functionally determined using F This is called the closure of X under F
For question 1 given a set of FDs F in R FD X rarr Y is true or not we know that X rarr Y holds if Y is in X+
For question 2 a set of attributes X is a candidate key of R or not X is a superkey of R if X+ is all attributes of R
For question 3 how to computing F+ we can compute X+ for all possible subsets X of R
Computing the Attribute Closure
The algorithm is as follows
Given a set of attributes X
Let X+ = X
Repeat Find a FD in F whose left side is a subset of X+ Add the right side of F to X+
Until (X+ does not change or X+ contains all attributes of R)
After the algorithm completes you have a set of attributes X+ that can be functionally determined by X This allows you to produce FDs of the form
X rarr A where A is in X+
Computing Attribute Closure Example
R (ABCDEG)
F = A rarr BC C rarr D D rarr G
Compute A+
A+ = A (initial step)
A+ = ABC (by using FD A rarr BC)
A+ = ABCD (by using FD C rarr D)
A+ = ABCDG (by using FD D rarr G)
Similarly we can compute C+ and EG+
C+ = CDG
EG+ = EG
Using Attribute Closure to Detect a FD
The attribute closure algorithm can be used to determine if a particular FD X rarr Y holds based on the set of given FDs F Compute X+ and see if Y is in X+
Example
R = (ABCDEG)
F = A rarr BC C rarr D D rarr G
Does the FD CD rarr G hold Compute CD+ = CDG Yes the FD CD rarr G holds
Does the FD BC rarr E hold Compute BC+ = BCDG No the FD BC rarr E does n
ot hold
Using Attribute Closure to Find all FDs
The attribute closure algorithm can be used to find all FDs of a given relation R if perform the attribute closure algorithm for all subsets X of the attributes R
This is an exponential algorithm in the number of attributes in R Although there are a few optimizations Do not compute the closure of the empty set or of the set of all attributes If we find X+ = all attributes do not compute the closure of any supersets of X Note that even though you do not have to compute the closure (because you know the answer) you still must use the known closure to infer new FDs
Using Attribute Closureto Find all FDs Example
Compute F+ from F where
R (ABCD)
F = A rarr BC C rarr D
Must compute all these attribute closures
A+ B+ C+ D+
AB+ AC+ AD+ BC+ BD+ CD+
ABC+ ABD+ ACD+ BCD+
Do not have to compute ABCD+
Start with the single sets
A+ = ABCD B+ = B C+ = CD D+ = D
New FD A rarr D
Using Attribute Closureto Find all FDs Example (cont)
Next the two element sets Do not compute AB+ AC+ AD+ because A+ contained all attributes of R New FDs AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C BC+ = BCD BD+ = BD CD+ = CD New FD BC rarr DThree element sets Do not compute ABC+ ABD+ ACD+ New FDs ABC rarr D ABD rarr C ACD rarr B BCD+ = BCD No new FDsAll FDs F+ = A rarr BC C rarr D A rarr D BC rarr D AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C ABC rarr D ABD rarrC ACD rarr B
FD Closure Questions
1) Compute all nontrivial FDs and list all keys of R
R = (ABCD)
F = AB rarr C C rarr D D rarr A
2) Compute all nontrivial FDs and list all candidate keys of R
R = (ABCD)
F = AB rarr C BC rarr D CD rarr A AD rarr B
Equivalent Sets of FDs
A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+ We can determine if F covers E by calculating X+ with respect to F for each FD X rarr Y in E and checking whether X+ includes Y
Two sets of FDs E and F are equivalent if E covers F and F covers E That is E+= F+
Minimal Set of FDs
A set of FDs F is minimal if it satisfies these conditions Every FD in F has a single attribute on its right-hand side We cannot replace any FD X rarr A in F with Y rarr A where Y
X and still have a set of dependencies that is equivalent to F
We cannot remove any FD and still have a set that is equivalent to F
A minimal sets of FDs is like a canonical( 规范的 ) form of FDs
Note that there may be many minimal covers for a given set of functional dependencies F
Normal Forms
A relation is in a particular normal form if it satisfies certain
normalization properties
There are several normal forms defined 1NF - First Normal Form 2NF - Second Normal Form 3NF - Third Normal Form BCNF - Boyce-Codd Normal Form 4NF - Fourth Normal Form 5NF - Fifth Normal Form
Each of these normal forms are stricter than the next
For example 3NF is better than 2NF because it removes more redundancyanomalies from the schema than 2NF
Normal Forms
All relations (non-normalized)
1NF (First Normal Form)
2NF (Second Normal Form)
3NF (Third Normal Form)
BCNF (Boyce-Codd NF)
4NF (Fourth NF)
5NF (Fifth NF)
First Normal Form (1NF)
A relation is in first normal form (1NF) if all its attribute values are atomic
That is a 1NF relation cannot have an attribute value that is a set of values (multi-valued attribute) a set of tuples (nested relation)
1NF is a standard assumption in relational DBMSs However object-oriented DBMSs and nested relational DBMSs relax t
his constraint
A relation that is not in 1NF is an unnormalized relation
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1P2
AnalystAnalyst
246
E3 A Lee P3P4
ConsultantEngineer
1048
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1 P2 Analyst Analyst 24 6
E3 A Lee P3P4 Consultant Engineer 10 48
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
A non-1NF Relation
One way to convert a non-1NF relation to a 1NF relation
1) Splitting Method - Divide the existing relation into two relations one with non-repeating attributes and the other with repeating attributes
Make a new relation consisting of the primary key of the original relation and the repeating attributes Determine a primary key for this new relation
Remove the repeating attributes from the original relation
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1P2
AnalystAnalyst
246
E3 A Lee P3P4
ConsultantEngineer
1048
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
ENo EName
E1 J Doe
E2 M Smith
E3 A Lee
E4 J Miller
E5 B Casey
E6 L Chu
E7 R Davis
ENo PNo Resp DurationE1 P1 Manager 12E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36
Second Normal Form (2NF)
A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key
If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation
Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key
Second Normal Form (2NF) Example
fd1 and fd4 are partial functional dependencies Normalize to
Emp (eno ename title bdate salary supereno dno)
WorksOn (eno pno resp hours)
Proj (pno pname budget)
ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration
EmpProj relation
fd1 fd2
fd3fd4
Second Normal Form (2NF) Example (2)
ENo EName
BDate Title Salary SuperNo
DNo
Emp relation
fd1
fd2
fd3fd4
PNo PName Budget
Proj relationENo PNo Resp Duration
WorkOn relation
Third Normal Form (3NF)
Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th
an 2 FDs
A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key
Converting a relation to 3NF from 2NF involves the removal of transitive dependencies
Third Normal Form (3NF) Example
fd2 results in a transitive dependency eno rarr salary Remove it
ENo EName BDate Title Salary SuperNo DNo
Emp relation
fd1
fd2
fd2
Title Salary
Pay relation
ENo EName BDate Title SuperNo DNo
Emp relation
fd1
General Definitions of 2NF and 3NF
We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)
General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key
General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key
Note that a prime attribute is an attribute that is in any key (candidate or primary)
General Definition of 3NF Example
The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute
However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key
Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key
ENo EName BDate Title Salary SuperNo DNo SSN
Emp relation
fd1
fd2
Normalization Question
Consider the universal relation
R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ
List the keys for R
Decompose R into 2NF and then 3NF relations
Boyce-Codd Normal Form (BCNF)
A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key
To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key
The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
ClientInterview relation
Relation is in 3NF but not BCNF
fd4
fd3
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
Note that we lose the FD fd2
fd4
fd3
ClientNo InterviewDate
InterviwTime
StaffNo StaffNo InterviwDate
RoomNo
Interview relation StaffRoom relation
BCNF versus 3NF
We can decompose to BCNF but sometimes we do not want to if we lose a FD
The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency
Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)
In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)
Conversion to BCNF
There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y
Note that if any FD in F+ violates BCNF then a FD in F violates BCNF
2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey
3) Replace R by relations with schemas R1 = X+
R2 = (R ndash X+) U X
4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib
utes are all in R1 or R2
Normalization Question
Given this database schema normalize into BCNF
fd1
PID Country Lot Area Price TaxRate
fd2
fd4fd3
Multi-Valued Dependencies
A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema
A MVD occurs when two independent 1N relationships are in the relational schema
Multi-Valued Dependencies Example
Eno ename Title Salary Pno dno
E1 J Doe EE 30000 P1P2
E2 M Smith SA 50000 P1P4 D1D2D3
E3 A Lee ME 40000 P1P3 D2D3
E4 J Miller PR 20000 P3 D3
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1
E1 J Doe EE 30000 P2
E2 M Smith SA 50000 P1 D1
E2 M Smith SA 50000 P1 D2
E2 M Smith SA 50000 P1 D3
E2 M Smith SA 50000 P4 D1
E2 M Smith SA 50000 P4 D2
E2 M Smith SA 50000 P4 D3
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P3 D3
E3 A Lee ME 40000 P3 D3
E4 J Miller PR 20000 P3 D3
Multi-Valued Dependencies (MVDs)
A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C
A trivial MVD A mdashgtgt B occurs when either B is a subset of A or
A cup B = R
Fourth Normal Form (4NF)
Fourth normal form (4NF) is based on the idea of multi-valued dependencies
A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies
Fourth Normal Form (4NF) Example
eno mdashgtgtpno
eno mdashgtgtdno
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3
eno ename title salary
E1 J Doe EE 30000
E2 MSmith SA 50000
E3 A Lee ME 40000
E4 J Miller PR 20000
eno pno
E1 P1
E1 P2
E2 P1
E2 P4
E3 P1
E3 P3
E4 P3
eno dno
E2 D1
E2 D2
E2 D3
E3 D2
E3 D3
E4 D3
Lossless-join Dependency
The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation
A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined
Fifth Normal Form (5NF)
bull Fifth normal form (5NF) is based on join dependencies
A relation is in fifth normal form (5NF) if the relation has no join dependency
bull 5NF is also called project-join normal form (PJNF)
A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R
The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have
ΠR1(r) ΠR2(r) hellip ΠRn(r) = r
Fifth Normal Form (5NF) Example
Consider a relation Supply (sname partName projName)
1048715Add the additional constraint that
If project j requires part p
and supplier s supplies part p
and supplier s supplies at least one item to project j
Then supplier s also supplies part p to project j
Original Supply Relation Supply Relation with new constraintsname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
Fifth Normal Form (5NF) Example
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
sname partName
Smith Bolt
Smith Nut
Adam Bolt
Walton Nut
Adam Nail
sname projName
Smith X
Smith Y
Adam Y
Walton Z
Adam X
partName projName
Bolt X
Nut Y
Bolt Y
Nut Z
Nail X
Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples
Desirable Properties in normalization
Relational schemas that are well-designed have several
important properties
1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or
relationship
2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins
3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations
Normal Forms in Practice
Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used
For example query execution time may increase because of normalization as more joins become necessary to answer queries
Normal Forms in Practice Example
For example street and city uniquely determine a zipcode
In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed
When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly
fd1
ENo City Street ZipCode
fd2
ENo ZipCode
ZipCode City Street
Normal Forms in Practice Summary
In summary normalization is typically used in two ways
To improve on relational schemas after ER design
To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies
There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance
This is the classic time versus space tradeoff
Conclusion
Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies
Normal forms indicate when relations satisfy certain properties
1NF - All attributes are atomic
2NF - All attributes are fully functionally dependent on a key
3NF - There are no transitive dependencies in the relation
4NF - There are no multi-valued dependencies in the relation
5NF - The relation has no join dependency
In practice normalization is used to improve schemas produced after ER design and existing relational schemas
It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time
Objectives
Explain process of normalization and why it is beneficial
Describe the concept of the Universal Relation
What are the motivations for normalization
What are the 3 types of update anomalies Be able to give examples
Describe the lossless-join property and dependency preservation property
Define and explain the concept of a functional dependency
What is a trivial FD
Describe the relationship between FDs and keys
What are Armstrongs Axioms When are they used
What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs
Objectives (2)
Define minimal and covers for a set of FDs
Define normal forms be able to list the normal forms and draw a diagram showing their relationship
Define 1NF Know how to convert non-1NF relation to 1NF
Define 2NF Convert 1NF relation to 2NF
Define 3NF Convert 2NF relation to 3NF
Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example
Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm
Describe multi-valued dependencies When do they occur
What is a trivial MVD
Define 4NF Convert 3NFBCNF relation to 4NF
Objectives (3)
Describe lossless-join property and how it relates to 5NF
Define 5NF
Explain normalization in terms of the time versus space tradeoff
Explain the relationship between normalization and ER modeling
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Slide 45
- Slide 46
- Slide 47
- Slide 48
- Slide 49
- Slide 50
- Slide 51
- Slide 52
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- Slide 60
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
-
Functional Dependencies and Keys
ENo rarr EName BDate Title SuperNo DNoDNo rarr DName MgreNotitle rarr salary
Question List the candidate key(s) of this relation
ENo EName BDate Title Salary SuperNo DNo
DName MgrNo
E1 J Doe 01-05-75 EE 30000 E2 null null null
E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5
E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7
E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5
E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5
E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7
E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8
E8 J Jones 10-11-72 SA 50000 null D1 Management E8
EmpDept relation
Applications of FDs
1) given a set of FDs F in relation R FD X rarr Y is true or not
2) given a set of FDs F in relation R a set of attributes X is a candidate key of R or not
3) given a set of FDs F in relation R which candidate keys are in R
4) given a set of FDs F in relation R some FDs in F is Excessive or not
Rules for reasoning about functional dependencies
Armstrongs Axioms are three inference rules for functional dependenciesLet A B C and D be subsets of the attributes of relation R
IR1 Reflexive Rule( 自反性 ) If AB then A rarr B
A set of attributes always trivially functionally determines any subset of itself
IR2 Augmentation Rule( 增加性 ) If A rarr B then AC rarr BC
Adding a set of attributes to both the LHS and RHS results in another valid functional dependency
IR3 Transitive Rule( 传递性 ) If A rarr B and B rarr C then A rarr C
Transitivity holds with FDs
Derived Rules from Armstrongs Axioms
There are three other well-known rules that can be derived from the basic three
IR4 Decomposition( 可分解性 ) If A rarr BC then A rarr B and A rarr C A FD with two attributes on RHS can be decomposed into two FDs each with on attribute on the RHS
IR5 Union( 合并性 ) If A rarr B and A rarr C then A rarr BC Reverse of decomposition rule
IR6 Composition( 组合性 ) If A rarr B and C rarr D then AC rarr BD More general than union - can combine two non-overlapping FDs into one
Characteristics of Armstrongs Axioms
The rules are sound( 健壮的 ) and complete( 完备的 )
1) Sound The rules do not infer( 推断 ) any incorrect FDs
2) Complete The rules can be used to infer all correct FDs (no FDs are missing)
The axioms (called IR1 IR2 and IR3) can be used to determine a set of all FDs F+ from an initial set of FDs F
F+ is referred to as the closure of F
Strategy of dealing with questions
The basic idea is that given any set of attributes X we can compute the set of all attributes X+ that can be functionally determined using F This is called the closure of X under F
For question 1 given a set of FDs F in R FD X rarr Y is true or not we know that X rarr Y holds if Y is in X+
For question 2 a set of attributes X is a candidate key of R or not X is a superkey of R if X+ is all attributes of R
For question 3 how to computing F+ we can compute X+ for all possible subsets X of R
Computing the Attribute Closure
The algorithm is as follows
Given a set of attributes X
Let X+ = X
Repeat Find a FD in F whose left side is a subset of X+ Add the right side of F to X+
Until (X+ does not change or X+ contains all attributes of R)
After the algorithm completes you have a set of attributes X+ that can be functionally determined by X This allows you to produce FDs of the form
X rarr A where A is in X+
Computing Attribute Closure Example
R (ABCDEG)
F = A rarr BC C rarr D D rarr G
Compute A+
A+ = A (initial step)
A+ = ABC (by using FD A rarr BC)
A+ = ABCD (by using FD C rarr D)
A+ = ABCDG (by using FD D rarr G)
Similarly we can compute C+ and EG+
C+ = CDG
EG+ = EG
Using Attribute Closure to Detect a FD
The attribute closure algorithm can be used to determine if a particular FD X rarr Y holds based on the set of given FDs F Compute X+ and see if Y is in X+
Example
R = (ABCDEG)
F = A rarr BC C rarr D D rarr G
Does the FD CD rarr G hold Compute CD+ = CDG Yes the FD CD rarr G holds
Does the FD BC rarr E hold Compute BC+ = BCDG No the FD BC rarr E does n
ot hold
Using Attribute Closure to Find all FDs
The attribute closure algorithm can be used to find all FDs of a given relation R if perform the attribute closure algorithm for all subsets X of the attributes R
This is an exponential algorithm in the number of attributes in R Although there are a few optimizations Do not compute the closure of the empty set or of the set of all attributes If we find X+ = all attributes do not compute the closure of any supersets of X Note that even though you do not have to compute the closure (because you know the answer) you still must use the known closure to infer new FDs
Using Attribute Closureto Find all FDs Example
Compute F+ from F where
R (ABCD)
F = A rarr BC C rarr D
Must compute all these attribute closures
A+ B+ C+ D+
AB+ AC+ AD+ BC+ BD+ CD+
ABC+ ABD+ ACD+ BCD+
Do not have to compute ABCD+
Start with the single sets
A+ = ABCD B+ = B C+ = CD D+ = D
New FD A rarr D
Using Attribute Closureto Find all FDs Example (cont)
Next the two element sets Do not compute AB+ AC+ AD+ because A+ contained all attributes of R New FDs AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C BC+ = BCD BD+ = BD CD+ = CD New FD BC rarr DThree element sets Do not compute ABC+ ABD+ ACD+ New FDs ABC rarr D ABD rarr C ACD rarr B BCD+ = BCD No new FDsAll FDs F+ = A rarr BC C rarr D A rarr D BC rarr D AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C ABC rarr D ABD rarrC ACD rarr B
FD Closure Questions
1) Compute all nontrivial FDs and list all keys of R
R = (ABCD)
F = AB rarr C C rarr D D rarr A
2) Compute all nontrivial FDs and list all candidate keys of R
R = (ABCD)
F = AB rarr C BC rarr D CD rarr A AD rarr B
Equivalent Sets of FDs
A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+ We can determine if F covers E by calculating X+ with respect to F for each FD X rarr Y in E and checking whether X+ includes Y
Two sets of FDs E and F are equivalent if E covers F and F covers E That is E+= F+
Minimal Set of FDs
A set of FDs F is minimal if it satisfies these conditions Every FD in F has a single attribute on its right-hand side We cannot replace any FD X rarr A in F with Y rarr A where Y
X and still have a set of dependencies that is equivalent to F
We cannot remove any FD and still have a set that is equivalent to F
A minimal sets of FDs is like a canonical( 规范的 ) form of FDs
Note that there may be many minimal covers for a given set of functional dependencies F
Normal Forms
A relation is in a particular normal form if it satisfies certain
normalization properties
There are several normal forms defined 1NF - First Normal Form 2NF - Second Normal Form 3NF - Third Normal Form BCNF - Boyce-Codd Normal Form 4NF - Fourth Normal Form 5NF - Fifth Normal Form
Each of these normal forms are stricter than the next
For example 3NF is better than 2NF because it removes more redundancyanomalies from the schema than 2NF
Normal Forms
All relations (non-normalized)
1NF (First Normal Form)
2NF (Second Normal Form)
3NF (Third Normal Form)
BCNF (Boyce-Codd NF)
4NF (Fourth NF)
5NF (Fifth NF)
First Normal Form (1NF)
A relation is in first normal form (1NF) if all its attribute values are atomic
That is a 1NF relation cannot have an attribute value that is a set of values (multi-valued attribute) a set of tuples (nested relation)
1NF is a standard assumption in relational DBMSs However object-oriented DBMSs and nested relational DBMSs relax t
his constraint
A relation that is not in 1NF is an unnormalized relation
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1P2
AnalystAnalyst
246
E3 A Lee P3P4
ConsultantEngineer
1048
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1 P2 Analyst Analyst 24 6
E3 A Lee P3P4 Consultant Engineer 10 48
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
A non-1NF Relation
One way to convert a non-1NF relation to a 1NF relation
1) Splitting Method - Divide the existing relation into two relations one with non-repeating attributes and the other with repeating attributes
Make a new relation consisting of the primary key of the original relation and the repeating attributes Determine a primary key for this new relation
Remove the repeating attributes from the original relation
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1P2
AnalystAnalyst
246
E3 A Lee P3P4
ConsultantEngineer
1048
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
ENo EName
E1 J Doe
E2 M Smith
E3 A Lee
E4 J Miller
E5 B Casey
E6 L Chu
E7 R Davis
ENo PNo Resp DurationE1 P1 Manager 12E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36
Second Normal Form (2NF)
A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key
If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation
Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key
Second Normal Form (2NF) Example
fd1 and fd4 are partial functional dependencies Normalize to
Emp (eno ename title bdate salary supereno dno)
WorksOn (eno pno resp hours)
Proj (pno pname budget)
ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration
EmpProj relation
fd1 fd2
fd3fd4
Second Normal Form (2NF) Example (2)
ENo EName
BDate Title Salary SuperNo
DNo
Emp relation
fd1
fd2
fd3fd4
PNo PName Budget
Proj relationENo PNo Resp Duration
WorkOn relation
Third Normal Form (3NF)
Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th
an 2 FDs
A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key
Converting a relation to 3NF from 2NF involves the removal of transitive dependencies
Third Normal Form (3NF) Example
fd2 results in a transitive dependency eno rarr salary Remove it
ENo EName BDate Title Salary SuperNo DNo
Emp relation
fd1
fd2
fd2
Title Salary
Pay relation
ENo EName BDate Title SuperNo DNo
Emp relation
fd1
General Definitions of 2NF and 3NF
We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)
General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key
General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key
Note that a prime attribute is an attribute that is in any key (candidate or primary)
General Definition of 3NF Example
The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute
However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key
Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key
ENo EName BDate Title Salary SuperNo DNo SSN
Emp relation
fd1
fd2
Normalization Question
Consider the universal relation
R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ
List the keys for R
Decompose R into 2NF and then 3NF relations
Boyce-Codd Normal Form (BCNF)
A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key
To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key
The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
ClientInterview relation
Relation is in 3NF but not BCNF
fd4
fd3
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
Note that we lose the FD fd2
fd4
fd3
ClientNo InterviewDate
InterviwTime
StaffNo StaffNo InterviwDate
RoomNo
Interview relation StaffRoom relation
BCNF versus 3NF
We can decompose to BCNF but sometimes we do not want to if we lose a FD
The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency
Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)
In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)
Conversion to BCNF
There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y
Note that if any FD in F+ violates BCNF then a FD in F violates BCNF
2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey
3) Replace R by relations with schemas R1 = X+
R2 = (R ndash X+) U X
4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib
utes are all in R1 or R2
Normalization Question
Given this database schema normalize into BCNF
fd1
PID Country Lot Area Price TaxRate
fd2
fd4fd3
Multi-Valued Dependencies
A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema
A MVD occurs when two independent 1N relationships are in the relational schema
Multi-Valued Dependencies Example
Eno ename Title Salary Pno dno
E1 J Doe EE 30000 P1P2
E2 M Smith SA 50000 P1P4 D1D2D3
E3 A Lee ME 40000 P1P3 D2D3
E4 J Miller PR 20000 P3 D3
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1
E1 J Doe EE 30000 P2
E2 M Smith SA 50000 P1 D1
E2 M Smith SA 50000 P1 D2
E2 M Smith SA 50000 P1 D3
E2 M Smith SA 50000 P4 D1
E2 M Smith SA 50000 P4 D2
E2 M Smith SA 50000 P4 D3
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P3 D3
E3 A Lee ME 40000 P3 D3
E4 J Miller PR 20000 P3 D3
Multi-Valued Dependencies (MVDs)
A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C
A trivial MVD A mdashgtgt B occurs when either B is a subset of A or
A cup B = R
Fourth Normal Form (4NF)
Fourth normal form (4NF) is based on the idea of multi-valued dependencies
A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies
Fourth Normal Form (4NF) Example
eno mdashgtgtpno
eno mdashgtgtdno
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3
eno ename title salary
E1 J Doe EE 30000
E2 MSmith SA 50000
E3 A Lee ME 40000
E4 J Miller PR 20000
eno pno
E1 P1
E1 P2
E2 P1
E2 P4
E3 P1
E3 P3
E4 P3
eno dno
E2 D1
E2 D2
E2 D3
E3 D2
E3 D3
E4 D3
Lossless-join Dependency
The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation
A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined
Fifth Normal Form (5NF)
bull Fifth normal form (5NF) is based on join dependencies
A relation is in fifth normal form (5NF) if the relation has no join dependency
bull 5NF is also called project-join normal form (PJNF)
A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R
The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have
ΠR1(r) ΠR2(r) hellip ΠRn(r) = r
Fifth Normal Form (5NF) Example
Consider a relation Supply (sname partName projName)
1048715Add the additional constraint that
If project j requires part p
and supplier s supplies part p
and supplier s supplies at least one item to project j
Then supplier s also supplies part p to project j
Original Supply Relation Supply Relation with new constraintsname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
Fifth Normal Form (5NF) Example
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
sname partName
Smith Bolt
Smith Nut
Adam Bolt
Walton Nut
Adam Nail
sname projName
Smith X
Smith Y
Adam Y
Walton Z
Adam X
partName projName
Bolt X
Nut Y
Bolt Y
Nut Z
Nail X
Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples
Desirable Properties in normalization
Relational schemas that are well-designed have several
important properties
1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or
relationship
2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins
3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations
Normal Forms in Practice
Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used
For example query execution time may increase because of normalization as more joins become necessary to answer queries
Normal Forms in Practice Example
For example street and city uniquely determine a zipcode
In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed
When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly
fd1
ENo City Street ZipCode
fd2
ENo ZipCode
ZipCode City Street
Normal Forms in Practice Summary
In summary normalization is typically used in two ways
To improve on relational schemas after ER design
To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies
There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance
This is the classic time versus space tradeoff
Conclusion
Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies
Normal forms indicate when relations satisfy certain properties
1NF - All attributes are atomic
2NF - All attributes are fully functionally dependent on a key
3NF - There are no transitive dependencies in the relation
4NF - There are no multi-valued dependencies in the relation
5NF - The relation has no join dependency
In practice normalization is used to improve schemas produced after ER design and existing relational schemas
It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time
Objectives
Explain process of normalization and why it is beneficial
Describe the concept of the Universal Relation
What are the motivations for normalization
What are the 3 types of update anomalies Be able to give examples
Describe the lossless-join property and dependency preservation property
Define and explain the concept of a functional dependency
What is a trivial FD
Describe the relationship between FDs and keys
What are Armstrongs Axioms When are they used
What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs
Objectives (2)
Define minimal and covers for a set of FDs
Define normal forms be able to list the normal forms and draw a diagram showing their relationship
Define 1NF Know how to convert non-1NF relation to 1NF
Define 2NF Convert 1NF relation to 2NF
Define 3NF Convert 2NF relation to 3NF
Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example
Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm
Describe multi-valued dependencies When do they occur
What is a trivial MVD
Define 4NF Convert 3NFBCNF relation to 4NF
Objectives (3)
Describe lossless-join property and how it relates to 5NF
Define 5NF
Explain normalization in terms of the time versus space tradeoff
Explain the relationship between normalization and ER modeling
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Slide 45
- Slide 46
- Slide 47
- Slide 48
- Slide 49
- Slide 50
- Slide 51
- Slide 52
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- Slide 60
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
-
Applications of FDs
1) given a set of FDs F in relation R FD X rarr Y is true or not
2) given a set of FDs F in relation R a set of attributes X is a candidate key of R or not
3) given a set of FDs F in relation R which candidate keys are in R
4) given a set of FDs F in relation R some FDs in F is Excessive or not
Rules for reasoning about functional dependencies
Armstrongs Axioms are three inference rules for functional dependenciesLet A B C and D be subsets of the attributes of relation R
IR1 Reflexive Rule( 自反性 ) If AB then A rarr B
A set of attributes always trivially functionally determines any subset of itself
IR2 Augmentation Rule( 增加性 ) If A rarr B then AC rarr BC
Adding a set of attributes to both the LHS and RHS results in another valid functional dependency
IR3 Transitive Rule( 传递性 ) If A rarr B and B rarr C then A rarr C
Transitivity holds with FDs
Derived Rules from Armstrongs Axioms
There are three other well-known rules that can be derived from the basic three
IR4 Decomposition( 可分解性 ) If A rarr BC then A rarr B and A rarr C A FD with two attributes on RHS can be decomposed into two FDs each with on attribute on the RHS
IR5 Union( 合并性 ) If A rarr B and A rarr C then A rarr BC Reverse of decomposition rule
IR6 Composition( 组合性 ) If A rarr B and C rarr D then AC rarr BD More general than union - can combine two non-overlapping FDs into one
Characteristics of Armstrongs Axioms
The rules are sound( 健壮的 ) and complete( 完备的 )
1) Sound The rules do not infer( 推断 ) any incorrect FDs
2) Complete The rules can be used to infer all correct FDs (no FDs are missing)
The axioms (called IR1 IR2 and IR3) can be used to determine a set of all FDs F+ from an initial set of FDs F
F+ is referred to as the closure of F
Strategy of dealing with questions
The basic idea is that given any set of attributes X we can compute the set of all attributes X+ that can be functionally determined using F This is called the closure of X under F
For question 1 given a set of FDs F in R FD X rarr Y is true or not we know that X rarr Y holds if Y is in X+
For question 2 a set of attributes X is a candidate key of R or not X is a superkey of R if X+ is all attributes of R
For question 3 how to computing F+ we can compute X+ for all possible subsets X of R
Computing the Attribute Closure
The algorithm is as follows
Given a set of attributes X
Let X+ = X
Repeat Find a FD in F whose left side is a subset of X+ Add the right side of F to X+
Until (X+ does not change or X+ contains all attributes of R)
After the algorithm completes you have a set of attributes X+ that can be functionally determined by X This allows you to produce FDs of the form
X rarr A where A is in X+
Computing Attribute Closure Example
R (ABCDEG)
F = A rarr BC C rarr D D rarr G
Compute A+
A+ = A (initial step)
A+ = ABC (by using FD A rarr BC)
A+ = ABCD (by using FD C rarr D)
A+ = ABCDG (by using FD D rarr G)
Similarly we can compute C+ and EG+
C+ = CDG
EG+ = EG
Using Attribute Closure to Detect a FD
The attribute closure algorithm can be used to determine if a particular FD X rarr Y holds based on the set of given FDs F Compute X+ and see if Y is in X+
Example
R = (ABCDEG)
F = A rarr BC C rarr D D rarr G
Does the FD CD rarr G hold Compute CD+ = CDG Yes the FD CD rarr G holds
Does the FD BC rarr E hold Compute BC+ = BCDG No the FD BC rarr E does n
ot hold
Using Attribute Closure to Find all FDs
The attribute closure algorithm can be used to find all FDs of a given relation R if perform the attribute closure algorithm for all subsets X of the attributes R
This is an exponential algorithm in the number of attributes in R Although there are a few optimizations Do not compute the closure of the empty set or of the set of all attributes If we find X+ = all attributes do not compute the closure of any supersets of X Note that even though you do not have to compute the closure (because you know the answer) you still must use the known closure to infer new FDs
Using Attribute Closureto Find all FDs Example
Compute F+ from F where
R (ABCD)
F = A rarr BC C rarr D
Must compute all these attribute closures
A+ B+ C+ D+
AB+ AC+ AD+ BC+ BD+ CD+
ABC+ ABD+ ACD+ BCD+
Do not have to compute ABCD+
Start with the single sets
A+ = ABCD B+ = B C+ = CD D+ = D
New FD A rarr D
Using Attribute Closureto Find all FDs Example (cont)
Next the two element sets Do not compute AB+ AC+ AD+ because A+ contained all attributes of R New FDs AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C BC+ = BCD BD+ = BD CD+ = CD New FD BC rarr DThree element sets Do not compute ABC+ ABD+ ACD+ New FDs ABC rarr D ABD rarr C ACD rarr B BCD+ = BCD No new FDsAll FDs F+ = A rarr BC C rarr D A rarr D BC rarr D AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C ABC rarr D ABD rarrC ACD rarr B
FD Closure Questions
1) Compute all nontrivial FDs and list all keys of R
R = (ABCD)
F = AB rarr C C rarr D D rarr A
2) Compute all nontrivial FDs and list all candidate keys of R
R = (ABCD)
F = AB rarr C BC rarr D CD rarr A AD rarr B
Equivalent Sets of FDs
A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+ We can determine if F covers E by calculating X+ with respect to F for each FD X rarr Y in E and checking whether X+ includes Y
Two sets of FDs E and F are equivalent if E covers F and F covers E That is E+= F+
Minimal Set of FDs
A set of FDs F is minimal if it satisfies these conditions Every FD in F has a single attribute on its right-hand side We cannot replace any FD X rarr A in F with Y rarr A where Y
X and still have a set of dependencies that is equivalent to F
We cannot remove any FD and still have a set that is equivalent to F
A minimal sets of FDs is like a canonical( 规范的 ) form of FDs
Note that there may be many minimal covers for a given set of functional dependencies F
Normal Forms
A relation is in a particular normal form if it satisfies certain
normalization properties
There are several normal forms defined 1NF - First Normal Form 2NF - Second Normal Form 3NF - Third Normal Form BCNF - Boyce-Codd Normal Form 4NF - Fourth Normal Form 5NF - Fifth Normal Form
Each of these normal forms are stricter than the next
For example 3NF is better than 2NF because it removes more redundancyanomalies from the schema than 2NF
Normal Forms
All relations (non-normalized)
1NF (First Normal Form)
2NF (Second Normal Form)
3NF (Third Normal Form)
BCNF (Boyce-Codd NF)
4NF (Fourth NF)
5NF (Fifth NF)
First Normal Form (1NF)
A relation is in first normal form (1NF) if all its attribute values are atomic
That is a 1NF relation cannot have an attribute value that is a set of values (multi-valued attribute) a set of tuples (nested relation)
1NF is a standard assumption in relational DBMSs However object-oriented DBMSs and nested relational DBMSs relax t
his constraint
A relation that is not in 1NF is an unnormalized relation
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1P2
AnalystAnalyst
246
E3 A Lee P3P4
ConsultantEngineer
1048
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1 P2 Analyst Analyst 24 6
E3 A Lee P3P4 Consultant Engineer 10 48
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
A non-1NF Relation
One way to convert a non-1NF relation to a 1NF relation
1) Splitting Method - Divide the existing relation into two relations one with non-repeating attributes and the other with repeating attributes
Make a new relation consisting of the primary key of the original relation and the repeating attributes Determine a primary key for this new relation
Remove the repeating attributes from the original relation
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1P2
AnalystAnalyst
246
E3 A Lee P3P4
ConsultantEngineer
1048
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
ENo EName
E1 J Doe
E2 M Smith
E3 A Lee
E4 J Miller
E5 B Casey
E6 L Chu
E7 R Davis
ENo PNo Resp DurationE1 P1 Manager 12E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36
Second Normal Form (2NF)
A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key
If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation
Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key
Second Normal Form (2NF) Example
fd1 and fd4 are partial functional dependencies Normalize to
Emp (eno ename title bdate salary supereno dno)
WorksOn (eno pno resp hours)
Proj (pno pname budget)
ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration
EmpProj relation
fd1 fd2
fd3fd4
Second Normal Form (2NF) Example (2)
ENo EName
BDate Title Salary SuperNo
DNo
Emp relation
fd1
fd2
fd3fd4
PNo PName Budget
Proj relationENo PNo Resp Duration
WorkOn relation
Third Normal Form (3NF)
Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th
an 2 FDs
A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key
Converting a relation to 3NF from 2NF involves the removal of transitive dependencies
Third Normal Form (3NF) Example
fd2 results in a transitive dependency eno rarr salary Remove it
ENo EName BDate Title Salary SuperNo DNo
Emp relation
fd1
fd2
fd2
Title Salary
Pay relation
ENo EName BDate Title SuperNo DNo
Emp relation
fd1
General Definitions of 2NF and 3NF
We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)
General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key
General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key
Note that a prime attribute is an attribute that is in any key (candidate or primary)
General Definition of 3NF Example
The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute
However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key
Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key
ENo EName BDate Title Salary SuperNo DNo SSN
Emp relation
fd1
fd2
Normalization Question
Consider the universal relation
R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ
List the keys for R
Decompose R into 2NF and then 3NF relations
Boyce-Codd Normal Form (BCNF)
A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key
To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key
The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
ClientInterview relation
Relation is in 3NF but not BCNF
fd4
fd3
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
Note that we lose the FD fd2
fd4
fd3
ClientNo InterviewDate
InterviwTime
StaffNo StaffNo InterviwDate
RoomNo
Interview relation StaffRoom relation
BCNF versus 3NF
We can decompose to BCNF but sometimes we do not want to if we lose a FD
The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency
Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)
In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)
Conversion to BCNF
There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y
Note that if any FD in F+ violates BCNF then a FD in F violates BCNF
2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey
3) Replace R by relations with schemas R1 = X+
R2 = (R ndash X+) U X
4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib
utes are all in R1 or R2
Normalization Question
Given this database schema normalize into BCNF
fd1
PID Country Lot Area Price TaxRate
fd2
fd4fd3
Multi-Valued Dependencies
A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema
A MVD occurs when two independent 1N relationships are in the relational schema
Multi-Valued Dependencies Example
Eno ename Title Salary Pno dno
E1 J Doe EE 30000 P1P2
E2 M Smith SA 50000 P1P4 D1D2D3
E3 A Lee ME 40000 P1P3 D2D3
E4 J Miller PR 20000 P3 D3
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1
E1 J Doe EE 30000 P2
E2 M Smith SA 50000 P1 D1
E2 M Smith SA 50000 P1 D2
E2 M Smith SA 50000 P1 D3
E2 M Smith SA 50000 P4 D1
E2 M Smith SA 50000 P4 D2
E2 M Smith SA 50000 P4 D3
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P3 D3
E3 A Lee ME 40000 P3 D3
E4 J Miller PR 20000 P3 D3
Multi-Valued Dependencies (MVDs)
A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C
A trivial MVD A mdashgtgt B occurs when either B is a subset of A or
A cup B = R
Fourth Normal Form (4NF)
Fourth normal form (4NF) is based on the idea of multi-valued dependencies
A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies
Fourth Normal Form (4NF) Example
eno mdashgtgtpno
eno mdashgtgtdno
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3
eno ename title salary
E1 J Doe EE 30000
E2 MSmith SA 50000
E3 A Lee ME 40000
E4 J Miller PR 20000
eno pno
E1 P1
E1 P2
E2 P1
E2 P4
E3 P1
E3 P3
E4 P3
eno dno
E2 D1
E2 D2
E2 D3
E3 D2
E3 D3
E4 D3
Lossless-join Dependency
The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation
A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined
Fifth Normal Form (5NF)
bull Fifth normal form (5NF) is based on join dependencies
A relation is in fifth normal form (5NF) if the relation has no join dependency
bull 5NF is also called project-join normal form (PJNF)
A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R
The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have
ΠR1(r) ΠR2(r) hellip ΠRn(r) = r
Fifth Normal Form (5NF) Example
Consider a relation Supply (sname partName projName)
1048715Add the additional constraint that
If project j requires part p
and supplier s supplies part p
and supplier s supplies at least one item to project j
Then supplier s also supplies part p to project j
Original Supply Relation Supply Relation with new constraintsname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
Fifth Normal Form (5NF) Example
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
sname partName
Smith Bolt
Smith Nut
Adam Bolt
Walton Nut
Adam Nail
sname projName
Smith X
Smith Y
Adam Y
Walton Z
Adam X
partName projName
Bolt X
Nut Y
Bolt Y
Nut Z
Nail X
Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples
Desirable Properties in normalization
Relational schemas that are well-designed have several
important properties
1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or
relationship
2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins
3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations
Normal Forms in Practice
Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used
For example query execution time may increase because of normalization as more joins become necessary to answer queries
Normal Forms in Practice Example
For example street and city uniquely determine a zipcode
In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed
When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly
fd1
ENo City Street ZipCode
fd2
ENo ZipCode
ZipCode City Street
Normal Forms in Practice Summary
In summary normalization is typically used in two ways
To improve on relational schemas after ER design
To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies
There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance
This is the classic time versus space tradeoff
Conclusion
Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies
Normal forms indicate when relations satisfy certain properties
1NF - All attributes are atomic
2NF - All attributes are fully functionally dependent on a key
3NF - There are no transitive dependencies in the relation
4NF - There are no multi-valued dependencies in the relation
5NF - The relation has no join dependency
In practice normalization is used to improve schemas produced after ER design and existing relational schemas
It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time
Objectives
Explain process of normalization and why it is beneficial
Describe the concept of the Universal Relation
What are the motivations for normalization
What are the 3 types of update anomalies Be able to give examples
Describe the lossless-join property and dependency preservation property
Define and explain the concept of a functional dependency
What is a trivial FD
Describe the relationship between FDs and keys
What are Armstrongs Axioms When are they used
What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs
Objectives (2)
Define minimal and covers for a set of FDs
Define normal forms be able to list the normal forms and draw a diagram showing their relationship
Define 1NF Know how to convert non-1NF relation to 1NF
Define 2NF Convert 1NF relation to 2NF
Define 3NF Convert 2NF relation to 3NF
Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example
Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm
Describe multi-valued dependencies When do they occur
What is a trivial MVD
Define 4NF Convert 3NFBCNF relation to 4NF
Objectives (3)
Describe lossless-join property and how it relates to 5NF
Define 5NF
Explain normalization in terms of the time versus space tradeoff
Explain the relationship between normalization and ER modeling
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Slide 45
- Slide 46
- Slide 47
- Slide 48
- Slide 49
- Slide 50
- Slide 51
- Slide 52
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- Slide 60
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
-
Rules for reasoning about functional dependencies
Armstrongs Axioms are three inference rules for functional dependenciesLet A B C and D be subsets of the attributes of relation R
IR1 Reflexive Rule( 自反性 ) If AB then A rarr B
A set of attributes always trivially functionally determines any subset of itself
IR2 Augmentation Rule( 增加性 ) If A rarr B then AC rarr BC
Adding a set of attributes to both the LHS and RHS results in another valid functional dependency
IR3 Transitive Rule( 传递性 ) If A rarr B and B rarr C then A rarr C
Transitivity holds with FDs
Derived Rules from Armstrongs Axioms
There are three other well-known rules that can be derived from the basic three
IR4 Decomposition( 可分解性 ) If A rarr BC then A rarr B and A rarr C A FD with two attributes on RHS can be decomposed into two FDs each with on attribute on the RHS
IR5 Union( 合并性 ) If A rarr B and A rarr C then A rarr BC Reverse of decomposition rule
IR6 Composition( 组合性 ) If A rarr B and C rarr D then AC rarr BD More general than union - can combine two non-overlapping FDs into one
Characteristics of Armstrongs Axioms
The rules are sound( 健壮的 ) and complete( 完备的 )
1) Sound The rules do not infer( 推断 ) any incorrect FDs
2) Complete The rules can be used to infer all correct FDs (no FDs are missing)
The axioms (called IR1 IR2 and IR3) can be used to determine a set of all FDs F+ from an initial set of FDs F
F+ is referred to as the closure of F
Strategy of dealing with questions
The basic idea is that given any set of attributes X we can compute the set of all attributes X+ that can be functionally determined using F This is called the closure of X under F
For question 1 given a set of FDs F in R FD X rarr Y is true or not we know that X rarr Y holds if Y is in X+
For question 2 a set of attributes X is a candidate key of R or not X is a superkey of R if X+ is all attributes of R
For question 3 how to computing F+ we can compute X+ for all possible subsets X of R
Computing the Attribute Closure
The algorithm is as follows
Given a set of attributes X
Let X+ = X
Repeat Find a FD in F whose left side is a subset of X+ Add the right side of F to X+
Until (X+ does not change or X+ contains all attributes of R)
After the algorithm completes you have a set of attributes X+ that can be functionally determined by X This allows you to produce FDs of the form
X rarr A where A is in X+
Computing Attribute Closure Example
R (ABCDEG)
F = A rarr BC C rarr D D rarr G
Compute A+
A+ = A (initial step)
A+ = ABC (by using FD A rarr BC)
A+ = ABCD (by using FD C rarr D)
A+ = ABCDG (by using FD D rarr G)
Similarly we can compute C+ and EG+
C+ = CDG
EG+ = EG
Using Attribute Closure to Detect a FD
The attribute closure algorithm can be used to determine if a particular FD X rarr Y holds based on the set of given FDs F Compute X+ and see if Y is in X+
Example
R = (ABCDEG)
F = A rarr BC C rarr D D rarr G
Does the FD CD rarr G hold Compute CD+ = CDG Yes the FD CD rarr G holds
Does the FD BC rarr E hold Compute BC+ = BCDG No the FD BC rarr E does n
ot hold
Using Attribute Closure to Find all FDs
The attribute closure algorithm can be used to find all FDs of a given relation R if perform the attribute closure algorithm for all subsets X of the attributes R
This is an exponential algorithm in the number of attributes in R Although there are a few optimizations Do not compute the closure of the empty set or of the set of all attributes If we find X+ = all attributes do not compute the closure of any supersets of X Note that even though you do not have to compute the closure (because you know the answer) you still must use the known closure to infer new FDs
Using Attribute Closureto Find all FDs Example
Compute F+ from F where
R (ABCD)
F = A rarr BC C rarr D
Must compute all these attribute closures
A+ B+ C+ D+
AB+ AC+ AD+ BC+ BD+ CD+
ABC+ ABD+ ACD+ BCD+
Do not have to compute ABCD+
Start with the single sets
A+ = ABCD B+ = B C+ = CD D+ = D
New FD A rarr D
Using Attribute Closureto Find all FDs Example (cont)
Next the two element sets Do not compute AB+ AC+ AD+ because A+ contained all attributes of R New FDs AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C BC+ = BCD BD+ = BD CD+ = CD New FD BC rarr DThree element sets Do not compute ABC+ ABD+ ACD+ New FDs ABC rarr D ABD rarr C ACD rarr B BCD+ = BCD No new FDsAll FDs F+ = A rarr BC C rarr D A rarr D BC rarr D AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C ABC rarr D ABD rarrC ACD rarr B
FD Closure Questions
1) Compute all nontrivial FDs and list all keys of R
R = (ABCD)
F = AB rarr C C rarr D D rarr A
2) Compute all nontrivial FDs and list all candidate keys of R
R = (ABCD)
F = AB rarr C BC rarr D CD rarr A AD rarr B
Equivalent Sets of FDs
A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+ We can determine if F covers E by calculating X+ with respect to F for each FD X rarr Y in E and checking whether X+ includes Y
Two sets of FDs E and F are equivalent if E covers F and F covers E That is E+= F+
Minimal Set of FDs
A set of FDs F is minimal if it satisfies these conditions Every FD in F has a single attribute on its right-hand side We cannot replace any FD X rarr A in F with Y rarr A where Y
X and still have a set of dependencies that is equivalent to F
We cannot remove any FD and still have a set that is equivalent to F
A minimal sets of FDs is like a canonical( 规范的 ) form of FDs
Note that there may be many minimal covers for a given set of functional dependencies F
Normal Forms
A relation is in a particular normal form if it satisfies certain
normalization properties
There are several normal forms defined 1NF - First Normal Form 2NF - Second Normal Form 3NF - Third Normal Form BCNF - Boyce-Codd Normal Form 4NF - Fourth Normal Form 5NF - Fifth Normal Form
Each of these normal forms are stricter than the next
For example 3NF is better than 2NF because it removes more redundancyanomalies from the schema than 2NF
Normal Forms
All relations (non-normalized)
1NF (First Normal Form)
2NF (Second Normal Form)
3NF (Third Normal Form)
BCNF (Boyce-Codd NF)
4NF (Fourth NF)
5NF (Fifth NF)
First Normal Form (1NF)
A relation is in first normal form (1NF) if all its attribute values are atomic
That is a 1NF relation cannot have an attribute value that is a set of values (multi-valued attribute) a set of tuples (nested relation)
1NF is a standard assumption in relational DBMSs However object-oriented DBMSs and nested relational DBMSs relax t
his constraint
A relation that is not in 1NF is an unnormalized relation
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1P2
AnalystAnalyst
246
E3 A Lee P3P4
ConsultantEngineer
1048
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1 P2 Analyst Analyst 24 6
E3 A Lee P3P4 Consultant Engineer 10 48
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
A non-1NF Relation
One way to convert a non-1NF relation to a 1NF relation
1) Splitting Method - Divide the existing relation into two relations one with non-repeating attributes and the other with repeating attributes
Make a new relation consisting of the primary key of the original relation and the repeating attributes Determine a primary key for this new relation
Remove the repeating attributes from the original relation
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1P2
AnalystAnalyst
246
E3 A Lee P3P4
ConsultantEngineer
1048
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
ENo EName
E1 J Doe
E2 M Smith
E3 A Lee
E4 J Miller
E5 B Casey
E6 L Chu
E7 R Davis
ENo PNo Resp DurationE1 P1 Manager 12E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36
Second Normal Form (2NF)
A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key
If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation
Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key
Second Normal Form (2NF) Example
fd1 and fd4 are partial functional dependencies Normalize to
Emp (eno ename title bdate salary supereno dno)
WorksOn (eno pno resp hours)
Proj (pno pname budget)
ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration
EmpProj relation
fd1 fd2
fd3fd4
Second Normal Form (2NF) Example (2)
ENo EName
BDate Title Salary SuperNo
DNo
Emp relation
fd1
fd2
fd3fd4
PNo PName Budget
Proj relationENo PNo Resp Duration
WorkOn relation
Third Normal Form (3NF)
Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th
an 2 FDs
A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key
Converting a relation to 3NF from 2NF involves the removal of transitive dependencies
Third Normal Form (3NF) Example
fd2 results in a transitive dependency eno rarr salary Remove it
ENo EName BDate Title Salary SuperNo DNo
Emp relation
fd1
fd2
fd2
Title Salary
Pay relation
ENo EName BDate Title SuperNo DNo
Emp relation
fd1
General Definitions of 2NF and 3NF
We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)
General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key
General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key
Note that a prime attribute is an attribute that is in any key (candidate or primary)
General Definition of 3NF Example
The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute
However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key
Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key
ENo EName BDate Title Salary SuperNo DNo SSN
Emp relation
fd1
fd2
Normalization Question
Consider the universal relation
R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ
List the keys for R
Decompose R into 2NF and then 3NF relations
Boyce-Codd Normal Form (BCNF)
A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key
To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key
The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
ClientInterview relation
Relation is in 3NF but not BCNF
fd4
fd3
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
Note that we lose the FD fd2
fd4
fd3
ClientNo InterviewDate
InterviwTime
StaffNo StaffNo InterviwDate
RoomNo
Interview relation StaffRoom relation
BCNF versus 3NF
We can decompose to BCNF but sometimes we do not want to if we lose a FD
The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency
Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)
In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)
Conversion to BCNF
There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y
Note that if any FD in F+ violates BCNF then a FD in F violates BCNF
2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey
3) Replace R by relations with schemas R1 = X+
R2 = (R ndash X+) U X
4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib
utes are all in R1 or R2
Normalization Question
Given this database schema normalize into BCNF
fd1
PID Country Lot Area Price TaxRate
fd2
fd4fd3
Multi-Valued Dependencies
A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema
A MVD occurs when two independent 1N relationships are in the relational schema
Multi-Valued Dependencies Example
Eno ename Title Salary Pno dno
E1 J Doe EE 30000 P1P2
E2 M Smith SA 50000 P1P4 D1D2D3
E3 A Lee ME 40000 P1P3 D2D3
E4 J Miller PR 20000 P3 D3
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1
E1 J Doe EE 30000 P2
E2 M Smith SA 50000 P1 D1
E2 M Smith SA 50000 P1 D2
E2 M Smith SA 50000 P1 D3
E2 M Smith SA 50000 P4 D1
E2 M Smith SA 50000 P4 D2
E2 M Smith SA 50000 P4 D3
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P3 D3
E3 A Lee ME 40000 P3 D3
E4 J Miller PR 20000 P3 D3
Multi-Valued Dependencies (MVDs)
A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C
A trivial MVD A mdashgtgt B occurs when either B is a subset of A or
A cup B = R
Fourth Normal Form (4NF)
Fourth normal form (4NF) is based on the idea of multi-valued dependencies
A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies
Fourth Normal Form (4NF) Example
eno mdashgtgtpno
eno mdashgtgtdno
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3
eno ename title salary
E1 J Doe EE 30000
E2 MSmith SA 50000
E3 A Lee ME 40000
E4 J Miller PR 20000
eno pno
E1 P1
E1 P2
E2 P1
E2 P4
E3 P1
E3 P3
E4 P3
eno dno
E2 D1
E2 D2
E2 D3
E3 D2
E3 D3
E4 D3
Lossless-join Dependency
The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation
A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined
Fifth Normal Form (5NF)
bull Fifth normal form (5NF) is based on join dependencies
A relation is in fifth normal form (5NF) if the relation has no join dependency
bull 5NF is also called project-join normal form (PJNF)
A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R
The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have
ΠR1(r) ΠR2(r) hellip ΠRn(r) = r
Fifth Normal Form (5NF) Example
Consider a relation Supply (sname partName projName)
1048715Add the additional constraint that
If project j requires part p
and supplier s supplies part p
and supplier s supplies at least one item to project j
Then supplier s also supplies part p to project j
Original Supply Relation Supply Relation with new constraintsname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
Fifth Normal Form (5NF) Example
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
sname partName
Smith Bolt
Smith Nut
Adam Bolt
Walton Nut
Adam Nail
sname projName
Smith X
Smith Y
Adam Y
Walton Z
Adam X
partName projName
Bolt X
Nut Y
Bolt Y
Nut Z
Nail X
Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples
Desirable Properties in normalization
Relational schemas that are well-designed have several
important properties
1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or
relationship
2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins
3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations
Normal Forms in Practice
Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used
For example query execution time may increase because of normalization as more joins become necessary to answer queries
Normal Forms in Practice Example
For example street and city uniquely determine a zipcode
In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed
When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly
fd1
ENo City Street ZipCode
fd2
ENo ZipCode
ZipCode City Street
Normal Forms in Practice Summary
In summary normalization is typically used in two ways
To improve on relational schemas after ER design
To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies
There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance
This is the classic time versus space tradeoff
Conclusion
Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies
Normal forms indicate when relations satisfy certain properties
1NF - All attributes are atomic
2NF - All attributes are fully functionally dependent on a key
3NF - There are no transitive dependencies in the relation
4NF - There are no multi-valued dependencies in the relation
5NF - The relation has no join dependency
In practice normalization is used to improve schemas produced after ER design and existing relational schemas
It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time
Objectives
Explain process of normalization and why it is beneficial
Describe the concept of the Universal Relation
What are the motivations for normalization
What are the 3 types of update anomalies Be able to give examples
Describe the lossless-join property and dependency preservation property
Define and explain the concept of a functional dependency
What is a trivial FD
Describe the relationship between FDs and keys
What are Armstrongs Axioms When are they used
What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs
Objectives (2)
Define minimal and covers for a set of FDs
Define normal forms be able to list the normal forms and draw a diagram showing their relationship
Define 1NF Know how to convert non-1NF relation to 1NF
Define 2NF Convert 1NF relation to 2NF
Define 3NF Convert 2NF relation to 3NF
Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example
Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm
Describe multi-valued dependencies When do they occur
What is a trivial MVD
Define 4NF Convert 3NFBCNF relation to 4NF
Objectives (3)
Describe lossless-join property and how it relates to 5NF
Define 5NF
Explain normalization in terms of the time versus space tradeoff
Explain the relationship between normalization and ER modeling
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Slide 45
- Slide 46
- Slide 47
- Slide 48
- Slide 49
- Slide 50
- Slide 51
- Slide 52
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- Slide 60
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
-
Derived Rules from Armstrongs Axioms
There are three other well-known rules that can be derived from the basic three
IR4 Decomposition( 可分解性 ) If A rarr BC then A rarr B and A rarr C A FD with two attributes on RHS can be decomposed into two FDs each with on attribute on the RHS
IR5 Union( 合并性 ) If A rarr B and A rarr C then A rarr BC Reverse of decomposition rule
IR6 Composition( 组合性 ) If A rarr B and C rarr D then AC rarr BD More general than union - can combine two non-overlapping FDs into one
Characteristics of Armstrongs Axioms
The rules are sound( 健壮的 ) and complete( 完备的 )
1) Sound The rules do not infer( 推断 ) any incorrect FDs
2) Complete The rules can be used to infer all correct FDs (no FDs are missing)
The axioms (called IR1 IR2 and IR3) can be used to determine a set of all FDs F+ from an initial set of FDs F
F+ is referred to as the closure of F
Strategy of dealing with questions
The basic idea is that given any set of attributes X we can compute the set of all attributes X+ that can be functionally determined using F This is called the closure of X under F
For question 1 given a set of FDs F in R FD X rarr Y is true or not we know that X rarr Y holds if Y is in X+
For question 2 a set of attributes X is a candidate key of R or not X is a superkey of R if X+ is all attributes of R
For question 3 how to computing F+ we can compute X+ for all possible subsets X of R
Computing the Attribute Closure
The algorithm is as follows
Given a set of attributes X
Let X+ = X
Repeat Find a FD in F whose left side is a subset of X+ Add the right side of F to X+
Until (X+ does not change or X+ contains all attributes of R)
After the algorithm completes you have a set of attributes X+ that can be functionally determined by X This allows you to produce FDs of the form
X rarr A where A is in X+
Computing Attribute Closure Example
R (ABCDEG)
F = A rarr BC C rarr D D rarr G
Compute A+
A+ = A (initial step)
A+ = ABC (by using FD A rarr BC)
A+ = ABCD (by using FD C rarr D)
A+ = ABCDG (by using FD D rarr G)
Similarly we can compute C+ and EG+
C+ = CDG
EG+ = EG
Using Attribute Closure to Detect a FD
The attribute closure algorithm can be used to determine if a particular FD X rarr Y holds based on the set of given FDs F Compute X+ and see if Y is in X+
Example
R = (ABCDEG)
F = A rarr BC C rarr D D rarr G
Does the FD CD rarr G hold Compute CD+ = CDG Yes the FD CD rarr G holds
Does the FD BC rarr E hold Compute BC+ = BCDG No the FD BC rarr E does n
ot hold
Using Attribute Closure to Find all FDs
The attribute closure algorithm can be used to find all FDs of a given relation R if perform the attribute closure algorithm for all subsets X of the attributes R
This is an exponential algorithm in the number of attributes in R Although there are a few optimizations Do not compute the closure of the empty set or of the set of all attributes If we find X+ = all attributes do not compute the closure of any supersets of X Note that even though you do not have to compute the closure (because you know the answer) you still must use the known closure to infer new FDs
Using Attribute Closureto Find all FDs Example
Compute F+ from F where
R (ABCD)
F = A rarr BC C rarr D
Must compute all these attribute closures
A+ B+ C+ D+
AB+ AC+ AD+ BC+ BD+ CD+
ABC+ ABD+ ACD+ BCD+
Do not have to compute ABCD+
Start with the single sets
A+ = ABCD B+ = B C+ = CD D+ = D
New FD A rarr D
Using Attribute Closureto Find all FDs Example (cont)
Next the two element sets Do not compute AB+ AC+ AD+ because A+ contained all attributes of R New FDs AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C BC+ = BCD BD+ = BD CD+ = CD New FD BC rarr DThree element sets Do not compute ABC+ ABD+ ACD+ New FDs ABC rarr D ABD rarr C ACD rarr B BCD+ = BCD No new FDsAll FDs F+ = A rarr BC C rarr D A rarr D BC rarr D AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C ABC rarr D ABD rarrC ACD rarr B
FD Closure Questions
1) Compute all nontrivial FDs and list all keys of R
R = (ABCD)
F = AB rarr C C rarr D D rarr A
2) Compute all nontrivial FDs and list all candidate keys of R
R = (ABCD)
F = AB rarr C BC rarr D CD rarr A AD rarr B
Equivalent Sets of FDs
A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+ We can determine if F covers E by calculating X+ with respect to F for each FD X rarr Y in E and checking whether X+ includes Y
Two sets of FDs E and F are equivalent if E covers F and F covers E That is E+= F+
Minimal Set of FDs
A set of FDs F is minimal if it satisfies these conditions Every FD in F has a single attribute on its right-hand side We cannot replace any FD X rarr A in F with Y rarr A where Y
X and still have a set of dependencies that is equivalent to F
We cannot remove any FD and still have a set that is equivalent to F
A minimal sets of FDs is like a canonical( 规范的 ) form of FDs
Note that there may be many minimal covers for a given set of functional dependencies F
Normal Forms
A relation is in a particular normal form if it satisfies certain
normalization properties
There are several normal forms defined 1NF - First Normal Form 2NF - Second Normal Form 3NF - Third Normal Form BCNF - Boyce-Codd Normal Form 4NF - Fourth Normal Form 5NF - Fifth Normal Form
Each of these normal forms are stricter than the next
For example 3NF is better than 2NF because it removes more redundancyanomalies from the schema than 2NF
Normal Forms
All relations (non-normalized)
1NF (First Normal Form)
2NF (Second Normal Form)
3NF (Third Normal Form)
BCNF (Boyce-Codd NF)
4NF (Fourth NF)
5NF (Fifth NF)
First Normal Form (1NF)
A relation is in first normal form (1NF) if all its attribute values are atomic
That is a 1NF relation cannot have an attribute value that is a set of values (multi-valued attribute) a set of tuples (nested relation)
1NF is a standard assumption in relational DBMSs However object-oriented DBMSs and nested relational DBMSs relax t
his constraint
A relation that is not in 1NF is an unnormalized relation
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1P2
AnalystAnalyst
246
E3 A Lee P3P4
ConsultantEngineer
1048
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1 P2 Analyst Analyst 24 6
E3 A Lee P3P4 Consultant Engineer 10 48
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
A non-1NF Relation
One way to convert a non-1NF relation to a 1NF relation
1) Splitting Method - Divide the existing relation into two relations one with non-repeating attributes and the other with repeating attributes
Make a new relation consisting of the primary key of the original relation and the repeating attributes Determine a primary key for this new relation
Remove the repeating attributes from the original relation
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1P2
AnalystAnalyst
246
E3 A Lee P3P4
ConsultantEngineer
1048
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
ENo EName
E1 J Doe
E2 M Smith
E3 A Lee
E4 J Miller
E5 B Casey
E6 L Chu
E7 R Davis
ENo PNo Resp DurationE1 P1 Manager 12E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36
Second Normal Form (2NF)
A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key
If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation
Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key
Second Normal Form (2NF) Example
fd1 and fd4 are partial functional dependencies Normalize to
Emp (eno ename title bdate salary supereno dno)
WorksOn (eno pno resp hours)
Proj (pno pname budget)
ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration
EmpProj relation
fd1 fd2
fd3fd4
Second Normal Form (2NF) Example (2)
ENo EName
BDate Title Salary SuperNo
DNo
Emp relation
fd1
fd2
fd3fd4
PNo PName Budget
Proj relationENo PNo Resp Duration
WorkOn relation
Third Normal Form (3NF)
Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th
an 2 FDs
A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key
Converting a relation to 3NF from 2NF involves the removal of transitive dependencies
Third Normal Form (3NF) Example
fd2 results in a transitive dependency eno rarr salary Remove it
ENo EName BDate Title Salary SuperNo DNo
Emp relation
fd1
fd2
fd2
Title Salary
Pay relation
ENo EName BDate Title SuperNo DNo
Emp relation
fd1
General Definitions of 2NF and 3NF
We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)
General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key
General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key
Note that a prime attribute is an attribute that is in any key (candidate or primary)
General Definition of 3NF Example
The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute
However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key
Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key
ENo EName BDate Title Salary SuperNo DNo SSN
Emp relation
fd1
fd2
Normalization Question
Consider the universal relation
R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ
List the keys for R
Decompose R into 2NF and then 3NF relations
Boyce-Codd Normal Form (BCNF)
A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key
To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key
The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
ClientInterview relation
Relation is in 3NF but not BCNF
fd4
fd3
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
Note that we lose the FD fd2
fd4
fd3
ClientNo InterviewDate
InterviwTime
StaffNo StaffNo InterviwDate
RoomNo
Interview relation StaffRoom relation
BCNF versus 3NF
We can decompose to BCNF but sometimes we do not want to if we lose a FD
The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency
Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)
In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)
Conversion to BCNF
There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y
Note that if any FD in F+ violates BCNF then a FD in F violates BCNF
2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey
3) Replace R by relations with schemas R1 = X+
R2 = (R ndash X+) U X
4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib
utes are all in R1 or R2
Normalization Question
Given this database schema normalize into BCNF
fd1
PID Country Lot Area Price TaxRate
fd2
fd4fd3
Multi-Valued Dependencies
A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema
A MVD occurs when two independent 1N relationships are in the relational schema
Multi-Valued Dependencies Example
Eno ename Title Salary Pno dno
E1 J Doe EE 30000 P1P2
E2 M Smith SA 50000 P1P4 D1D2D3
E3 A Lee ME 40000 P1P3 D2D3
E4 J Miller PR 20000 P3 D3
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1
E1 J Doe EE 30000 P2
E2 M Smith SA 50000 P1 D1
E2 M Smith SA 50000 P1 D2
E2 M Smith SA 50000 P1 D3
E2 M Smith SA 50000 P4 D1
E2 M Smith SA 50000 P4 D2
E2 M Smith SA 50000 P4 D3
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P3 D3
E3 A Lee ME 40000 P3 D3
E4 J Miller PR 20000 P3 D3
Multi-Valued Dependencies (MVDs)
A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C
A trivial MVD A mdashgtgt B occurs when either B is a subset of A or
A cup B = R
Fourth Normal Form (4NF)
Fourth normal form (4NF) is based on the idea of multi-valued dependencies
A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies
Fourth Normal Form (4NF) Example
eno mdashgtgtpno
eno mdashgtgtdno
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3
eno ename title salary
E1 J Doe EE 30000
E2 MSmith SA 50000
E3 A Lee ME 40000
E4 J Miller PR 20000
eno pno
E1 P1
E1 P2
E2 P1
E2 P4
E3 P1
E3 P3
E4 P3
eno dno
E2 D1
E2 D2
E2 D3
E3 D2
E3 D3
E4 D3
Lossless-join Dependency
The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation
A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined
Fifth Normal Form (5NF)
bull Fifth normal form (5NF) is based on join dependencies
A relation is in fifth normal form (5NF) if the relation has no join dependency
bull 5NF is also called project-join normal form (PJNF)
A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R
The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have
ΠR1(r) ΠR2(r) hellip ΠRn(r) = r
Fifth Normal Form (5NF) Example
Consider a relation Supply (sname partName projName)
1048715Add the additional constraint that
If project j requires part p
and supplier s supplies part p
and supplier s supplies at least one item to project j
Then supplier s also supplies part p to project j
Original Supply Relation Supply Relation with new constraintsname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
Fifth Normal Form (5NF) Example
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
sname partName
Smith Bolt
Smith Nut
Adam Bolt
Walton Nut
Adam Nail
sname projName
Smith X
Smith Y
Adam Y
Walton Z
Adam X
partName projName
Bolt X
Nut Y
Bolt Y
Nut Z
Nail X
Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples
Desirable Properties in normalization
Relational schemas that are well-designed have several
important properties
1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or
relationship
2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins
3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations
Normal Forms in Practice
Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used
For example query execution time may increase because of normalization as more joins become necessary to answer queries
Normal Forms in Practice Example
For example street and city uniquely determine a zipcode
In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed
When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly
fd1
ENo City Street ZipCode
fd2
ENo ZipCode
ZipCode City Street
Normal Forms in Practice Summary
In summary normalization is typically used in two ways
To improve on relational schemas after ER design
To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies
There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance
This is the classic time versus space tradeoff
Conclusion
Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies
Normal forms indicate when relations satisfy certain properties
1NF - All attributes are atomic
2NF - All attributes are fully functionally dependent on a key
3NF - There are no transitive dependencies in the relation
4NF - There are no multi-valued dependencies in the relation
5NF - The relation has no join dependency
In practice normalization is used to improve schemas produced after ER design and existing relational schemas
It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time
Objectives
Explain process of normalization and why it is beneficial
Describe the concept of the Universal Relation
What are the motivations for normalization
What are the 3 types of update anomalies Be able to give examples
Describe the lossless-join property and dependency preservation property
Define and explain the concept of a functional dependency
What is a trivial FD
Describe the relationship between FDs and keys
What are Armstrongs Axioms When are they used
What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs
Objectives (2)
Define minimal and covers for a set of FDs
Define normal forms be able to list the normal forms and draw a diagram showing their relationship
Define 1NF Know how to convert non-1NF relation to 1NF
Define 2NF Convert 1NF relation to 2NF
Define 3NF Convert 2NF relation to 3NF
Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example
Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm
Describe multi-valued dependencies When do they occur
What is a trivial MVD
Define 4NF Convert 3NFBCNF relation to 4NF
Objectives (3)
Describe lossless-join property and how it relates to 5NF
Define 5NF
Explain normalization in terms of the time versus space tradeoff
Explain the relationship between normalization and ER modeling
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Slide 45
- Slide 46
- Slide 47
- Slide 48
- Slide 49
- Slide 50
- Slide 51
- Slide 52
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- Slide 60
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
-
Characteristics of Armstrongs Axioms
The rules are sound( 健壮的 ) and complete( 完备的 )
1) Sound The rules do not infer( 推断 ) any incorrect FDs
2) Complete The rules can be used to infer all correct FDs (no FDs are missing)
The axioms (called IR1 IR2 and IR3) can be used to determine a set of all FDs F+ from an initial set of FDs F
F+ is referred to as the closure of F
Strategy of dealing with questions
The basic idea is that given any set of attributes X we can compute the set of all attributes X+ that can be functionally determined using F This is called the closure of X under F
For question 1 given a set of FDs F in R FD X rarr Y is true or not we know that X rarr Y holds if Y is in X+
For question 2 a set of attributes X is a candidate key of R or not X is a superkey of R if X+ is all attributes of R
For question 3 how to computing F+ we can compute X+ for all possible subsets X of R
Computing the Attribute Closure
The algorithm is as follows
Given a set of attributes X
Let X+ = X
Repeat Find a FD in F whose left side is a subset of X+ Add the right side of F to X+
Until (X+ does not change or X+ contains all attributes of R)
After the algorithm completes you have a set of attributes X+ that can be functionally determined by X This allows you to produce FDs of the form
X rarr A where A is in X+
Computing Attribute Closure Example
R (ABCDEG)
F = A rarr BC C rarr D D rarr G
Compute A+
A+ = A (initial step)
A+ = ABC (by using FD A rarr BC)
A+ = ABCD (by using FD C rarr D)
A+ = ABCDG (by using FD D rarr G)
Similarly we can compute C+ and EG+
C+ = CDG
EG+ = EG
Using Attribute Closure to Detect a FD
The attribute closure algorithm can be used to determine if a particular FD X rarr Y holds based on the set of given FDs F Compute X+ and see if Y is in X+
Example
R = (ABCDEG)
F = A rarr BC C rarr D D rarr G
Does the FD CD rarr G hold Compute CD+ = CDG Yes the FD CD rarr G holds
Does the FD BC rarr E hold Compute BC+ = BCDG No the FD BC rarr E does n
ot hold
Using Attribute Closure to Find all FDs
The attribute closure algorithm can be used to find all FDs of a given relation R if perform the attribute closure algorithm for all subsets X of the attributes R
This is an exponential algorithm in the number of attributes in R Although there are a few optimizations Do not compute the closure of the empty set or of the set of all attributes If we find X+ = all attributes do not compute the closure of any supersets of X Note that even though you do not have to compute the closure (because you know the answer) you still must use the known closure to infer new FDs
Using Attribute Closureto Find all FDs Example
Compute F+ from F where
R (ABCD)
F = A rarr BC C rarr D
Must compute all these attribute closures
A+ B+ C+ D+
AB+ AC+ AD+ BC+ BD+ CD+
ABC+ ABD+ ACD+ BCD+
Do not have to compute ABCD+
Start with the single sets
A+ = ABCD B+ = B C+ = CD D+ = D
New FD A rarr D
Using Attribute Closureto Find all FDs Example (cont)
Next the two element sets Do not compute AB+ AC+ AD+ because A+ contained all attributes of R New FDs AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C BC+ = BCD BD+ = BD CD+ = CD New FD BC rarr DThree element sets Do not compute ABC+ ABD+ ACD+ New FDs ABC rarr D ABD rarr C ACD rarr B BCD+ = BCD No new FDsAll FDs F+ = A rarr BC C rarr D A rarr D BC rarr D AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C ABC rarr D ABD rarrC ACD rarr B
FD Closure Questions
1) Compute all nontrivial FDs and list all keys of R
R = (ABCD)
F = AB rarr C C rarr D D rarr A
2) Compute all nontrivial FDs and list all candidate keys of R
R = (ABCD)
F = AB rarr C BC rarr D CD rarr A AD rarr B
Equivalent Sets of FDs
A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+ We can determine if F covers E by calculating X+ with respect to F for each FD X rarr Y in E and checking whether X+ includes Y
Two sets of FDs E and F are equivalent if E covers F and F covers E That is E+= F+
Minimal Set of FDs
A set of FDs F is minimal if it satisfies these conditions Every FD in F has a single attribute on its right-hand side We cannot replace any FD X rarr A in F with Y rarr A where Y
X and still have a set of dependencies that is equivalent to F
We cannot remove any FD and still have a set that is equivalent to F
A minimal sets of FDs is like a canonical( 规范的 ) form of FDs
Note that there may be many minimal covers for a given set of functional dependencies F
Normal Forms
A relation is in a particular normal form if it satisfies certain
normalization properties
There are several normal forms defined 1NF - First Normal Form 2NF - Second Normal Form 3NF - Third Normal Form BCNF - Boyce-Codd Normal Form 4NF - Fourth Normal Form 5NF - Fifth Normal Form
Each of these normal forms are stricter than the next
For example 3NF is better than 2NF because it removes more redundancyanomalies from the schema than 2NF
Normal Forms
All relations (non-normalized)
1NF (First Normal Form)
2NF (Second Normal Form)
3NF (Third Normal Form)
BCNF (Boyce-Codd NF)
4NF (Fourth NF)
5NF (Fifth NF)
First Normal Form (1NF)
A relation is in first normal form (1NF) if all its attribute values are atomic
That is a 1NF relation cannot have an attribute value that is a set of values (multi-valued attribute) a set of tuples (nested relation)
1NF is a standard assumption in relational DBMSs However object-oriented DBMSs and nested relational DBMSs relax t
his constraint
A relation that is not in 1NF is an unnormalized relation
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1P2
AnalystAnalyst
246
E3 A Lee P3P4
ConsultantEngineer
1048
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1 P2 Analyst Analyst 24 6
E3 A Lee P3P4 Consultant Engineer 10 48
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
A non-1NF Relation
One way to convert a non-1NF relation to a 1NF relation
1) Splitting Method - Divide the existing relation into two relations one with non-repeating attributes and the other with repeating attributes
Make a new relation consisting of the primary key of the original relation and the repeating attributes Determine a primary key for this new relation
Remove the repeating attributes from the original relation
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1P2
AnalystAnalyst
246
E3 A Lee P3P4
ConsultantEngineer
1048
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
ENo EName
E1 J Doe
E2 M Smith
E3 A Lee
E4 J Miller
E5 B Casey
E6 L Chu
E7 R Davis
ENo PNo Resp DurationE1 P1 Manager 12E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36
Second Normal Form (2NF)
A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key
If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation
Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key
Second Normal Form (2NF) Example
fd1 and fd4 are partial functional dependencies Normalize to
Emp (eno ename title bdate salary supereno dno)
WorksOn (eno pno resp hours)
Proj (pno pname budget)
ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration
EmpProj relation
fd1 fd2
fd3fd4
Second Normal Form (2NF) Example (2)
ENo EName
BDate Title Salary SuperNo
DNo
Emp relation
fd1
fd2
fd3fd4
PNo PName Budget
Proj relationENo PNo Resp Duration
WorkOn relation
Third Normal Form (3NF)
Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th
an 2 FDs
A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key
Converting a relation to 3NF from 2NF involves the removal of transitive dependencies
Third Normal Form (3NF) Example
fd2 results in a transitive dependency eno rarr salary Remove it
ENo EName BDate Title Salary SuperNo DNo
Emp relation
fd1
fd2
fd2
Title Salary
Pay relation
ENo EName BDate Title SuperNo DNo
Emp relation
fd1
General Definitions of 2NF and 3NF
We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)
General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key
General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key
Note that a prime attribute is an attribute that is in any key (candidate or primary)
General Definition of 3NF Example
The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute
However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key
Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key
ENo EName BDate Title Salary SuperNo DNo SSN
Emp relation
fd1
fd2
Normalization Question
Consider the universal relation
R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ
List the keys for R
Decompose R into 2NF and then 3NF relations
Boyce-Codd Normal Form (BCNF)
A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key
To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key
The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
ClientInterview relation
Relation is in 3NF but not BCNF
fd4
fd3
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
Note that we lose the FD fd2
fd4
fd3
ClientNo InterviewDate
InterviwTime
StaffNo StaffNo InterviwDate
RoomNo
Interview relation StaffRoom relation
BCNF versus 3NF
We can decompose to BCNF but sometimes we do not want to if we lose a FD
The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency
Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)
In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)
Conversion to BCNF
There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y
Note that if any FD in F+ violates BCNF then a FD in F violates BCNF
2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey
3) Replace R by relations with schemas R1 = X+
R2 = (R ndash X+) U X
4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib
utes are all in R1 or R2
Normalization Question
Given this database schema normalize into BCNF
fd1
PID Country Lot Area Price TaxRate
fd2
fd4fd3
Multi-Valued Dependencies
A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema
A MVD occurs when two independent 1N relationships are in the relational schema
Multi-Valued Dependencies Example
Eno ename Title Salary Pno dno
E1 J Doe EE 30000 P1P2
E2 M Smith SA 50000 P1P4 D1D2D3
E3 A Lee ME 40000 P1P3 D2D3
E4 J Miller PR 20000 P3 D3
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1
E1 J Doe EE 30000 P2
E2 M Smith SA 50000 P1 D1
E2 M Smith SA 50000 P1 D2
E2 M Smith SA 50000 P1 D3
E2 M Smith SA 50000 P4 D1
E2 M Smith SA 50000 P4 D2
E2 M Smith SA 50000 P4 D3
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P3 D3
E3 A Lee ME 40000 P3 D3
E4 J Miller PR 20000 P3 D3
Multi-Valued Dependencies (MVDs)
A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C
A trivial MVD A mdashgtgt B occurs when either B is a subset of A or
A cup B = R
Fourth Normal Form (4NF)
Fourth normal form (4NF) is based on the idea of multi-valued dependencies
A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies
Fourth Normal Form (4NF) Example
eno mdashgtgtpno
eno mdashgtgtdno
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3
eno ename title salary
E1 J Doe EE 30000
E2 MSmith SA 50000
E3 A Lee ME 40000
E4 J Miller PR 20000
eno pno
E1 P1
E1 P2
E2 P1
E2 P4
E3 P1
E3 P3
E4 P3
eno dno
E2 D1
E2 D2
E2 D3
E3 D2
E3 D3
E4 D3
Lossless-join Dependency
The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation
A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined
Fifth Normal Form (5NF)
bull Fifth normal form (5NF) is based on join dependencies
A relation is in fifth normal form (5NF) if the relation has no join dependency
bull 5NF is also called project-join normal form (PJNF)
A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R
The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have
ΠR1(r) ΠR2(r) hellip ΠRn(r) = r
Fifth Normal Form (5NF) Example
Consider a relation Supply (sname partName projName)
1048715Add the additional constraint that
If project j requires part p
and supplier s supplies part p
and supplier s supplies at least one item to project j
Then supplier s also supplies part p to project j
Original Supply Relation Supply Relation with new constraintsname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
Fifth Normal Form (5NF) Example
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
sname partName
Smith Bolt
Smith Nut
Adam Bolt
Walton Nut
Adam Nail
sname projName
Smith X
Smith Y
Adam Y
Walton Z
Adam X
partName projName
Bolt X
Nut Y
Bolt Y
Nut Z
Nail X
Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples
Desirable Properties in normalization
Relational schemas that are well-designed have several
important properties
1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or
relationship
2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins
3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations
Normal Forms in Practice
Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used
For example query execution time may increase because of normalization as more joins become necessary to answer queries
Normal Forms in Practice Example
For example street and city uniquely determine a zipcode
In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed
When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly
fd1
ENo City Street ZipCode
fd2
ENo ZipCode
ZipCode City Street
Normal Forms in Practice Summary
In summary normalization is typically used in two ways
To improve on relational schemas after ER design
To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies
There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance
This is the classic time versus space tradeoff
Conclusion
Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies
Normal forms indicate when relations satisfy certain properties
1NF - All attributes are atomic
2NF - All attributes are fully functionally dependent on a key
3NF - There are no transitive dependencies in the relation
4NF - There are no multi-valued dependencies in the relation
5NF - The relation has no join dependency
In practice normalization is used to improve schemas produced after ER design and existing relational schemas
It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time
Objectives
Explain process of normalization and why it is beneficial
Describe the concept of the Universal Relation
What are the motivations for normalization
What are the 3 types of update anomalies Be able to give examples
Describe the lossless-join property and dependency preservation property
Define and explain the concept of a functional dependency
What is a trivial FD
Describe the relationship between FDs and keys
What are Armstrongs Axioms When are they used
What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs
Objectives (2)
Define minimal and covers for a set of FDs
Define normal forms be able to list the normal forms and draw a diagram showing their relationship
Define 1NF Know how to convert non-1NF relation to 1NF
Define 2NF Convert 1NF relation to 2NF
Define 3NF Convert 2NF relation to 3NF
Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example
Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm
Describe multi-valued dependencies When do they occur
What is a trivial MVD
Define 4NF Convert 3NFBCNF relation to 4NF
Objectives (3)
Describe lossless-join property and how it relates to 5NF
Define 5NF
Explain normalization in terms of the time versus space tradeoff
Explain the relationship between normalization and ER modeling
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Slide 45
- Slide 46
- Slide 47
- Slide 48
- Slide 49
- Slide 50
- Slide 51
- Slide 52
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- Slide 60
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
-
Strategy of dealing with questions
The basic idea is that given any set of attributes X we can compute the set of all attributes X+ that can be functionally determined using F This is called the closure of X under F
For question 1 given a set of FDs F in R FD X rarr Y is true or not we know that X rarr Y holds if Y is in X+
For question 2 a set of attributes X is a candidate key of R or not X is a superkey of R if X+ is all attributes of R
For question 3 how to computing F+ we can compute X+ for all possible subsets X of R
Computing the Attribute Closure
The algorithm is as follows
Given a set of attributes X
Let X+ = X
Repeat Find a FD in F whose left side is a subset of X+ Add the right side of F to X+
Until (X+ does not change or X+ contains all attributes of R)
After the algorithm completes you have a set of attributes X+ that can be functionally determined by X This allows you to produce FDs of the form
X rarr A where A is in X+
Computing Attribute Closure Example
R (ABCDEG)
F = A rarr BC C rarr D D rarr G
Compute A+
A+ = A (initial step)
A+ = ABC (by using FD A rarr BC)
A+ = ABCD (by using FD C rarr D)
A+ = ABCDG (by using FD D rarr G)
Similarly we can compute C+ and EG+
C+ = CDG
EG+ = EG
Using Attribute Closure to Detect a FD
The attribute closure algorithm can be used to determine if a particular FD X rarr Y holds based on the set of given FDs F Compute X+ and see if Y is in X+
Example
R = (ABCDEG)
F = A rarr BC C rarr D D rarr G
Does the FD CD rarr G hold Compute CD+ = CDG Yes the FD CD rarr G holds
Does the FD BC rarr E hold Compute BC+ = BCDG No the FD BC rarr E does n
ot hold
Using Attribute Closure to Find all FDs
The attribute closure algorithm can be used to find all FDs of a given relation R if perform the attribute closure algorithm for all subsets X of the attributes R
This is an exponential algorithm in the number of attributes in R Although there are a few optimizations Do not compute the closure of the empty set or of the set of all attributes If we find X+ = all attributes do not compute the closure of any supersets of X Note that even though you do not have to compute the closure (because you know the answer) you still must use the known closure to infer new FDs
Using Attribute Closureto Find all FDs Example
Compute F+ from F where
R (ABCD)
F = A rarr BC C rarr D
Must compute all these attribute closures
A+ B+ C+ D+
AB+ AC+ AD+ BC+ BD+ CD+
ABC+ ABD+ ACD+ BCD+
Do not have to compute ABCD+
Start with the single sets
A+ = ABCD B+ = B C+ = CD D+ = D
New FD A rarr D
Using Attribute Closureto Find all FDs Example (cont)
Next the two element sets Do not compute AB+ AC+ AD+ because A+ contained all attributes of R New FDs AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C BC+ = BCD BD+ = BD CD+ = CD New FD BC rarr DThree element sets Do not compute ABC+ ABD+ ACD+ New FDs ABC rarr D ABD rarr C ACD rarr B BCD+ = BCD No new FDsAll FDs F+ = A rarr BC C rarr D A rarr D BC rarr D AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C ABC rarr D ABD rarrC ACD rarr B
FD Closure Questions
1) Compute all nontrivial FDs and list all keys of R
R = (ABCD)
F = AB rarr C C rarr D D rarr A
2) Compute all nontrivial FDs and list all candidate keys of R
R = (ABCD)
F = AB rarr C BC rarr D CD rarr A AD rarr B
Equivalent Sets of FDs
A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+ We can determine if F covers E by calculating X+ with respect to F for each FD X rarr Y in E and checking whether X+ includes Y
Two sets of FDs E and F are equivalent if E covers F and F covers E That is E+= F+
Minimal Set of FDs
A set of FDs F is minimal if it satisfies these conditions Every FD in F has a single attribute on its right-hand side We cannot replace any FD X rarr A in F with Y rarr A where Y
X and still have a set of dependencies that is equivalent to F
We cannot remove any FD and still have a set that is equivalent to F
A minimal sets of FDs is like a canonical( 规范的 ) form of FDs
Note that there may be many minimal covers for a given set of functional dependencies F
Normal Forms
A relation is in a particular normal form if it satisfies certain
normalization properties
There are several normal forms defined 1NF - First Normal Form 2NF - Second Normal Form 3NF - Third Normal Form BCNF - Boyce-Codd Normal Form 4NF - Fourth Normal Form 5NF - Fifth Normal Form
Each of these normal forms are stricter than the next
For example 3NF is better than 2NF because it removes more redundancyanomalies from the schema than 2NF
Normal Forms
All relations (non-normalized)
1NF (First Normal Form)
2NF (Second Normal Form)
3NF (Third Normal Form)
BCNF (Boyce-Codd NF)
4NF (Fourth NF)
5NF (Fifth NF)
First Normal Form (1NF)
A relation is in first normal form (1NF) if all its attribute values are atomic
That is a 1NF relation cannot have an attribute value that is a set of values (multi-valued attribute) a set of tuples (nested relation)
1NF is a standard assumption in relational DBMSs However object-oriented DBMSs and nested relational DBMSs relax t
his constraint
A relation that is not in 1NF is an unnormalized relation
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1P2
AnalystAnalyst
246
E3 A Lee P3P4
ConsultantEngineer
1048
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1 P2 Analyst Analyst 24 6
E3 A Lee P3P4 Consultant Engineer 10 48
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
A non-1NF Relation
One way to convert a non-1NF relation to a 1NF relation
1) Splitting Method - Divide the existing relation into two relations one with non-repeating attributes and the other with repeating attributes
Make a new relation consisting of the primary key of the original relation and the repeating attributes Determine a primary key for this new relation
Remove the repeating attributes from the original relation
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1P2
AnalystAnalyst
246
E3 A Lee P3P4
ConsultantEngineer
1048
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
ENo EName
E1 J Doe
E2 M Smith
E3 A Lee
E4 J Miller
E5 B Casey
E6 L Chu
E7 R Davis
ENo PNo Resp DurationE1 P1 Manager 12E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36
Second Normal Form (2NF)
A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key
If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation
Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key
Second Normal Form (2NF) Example
fd1 and fd4 are partial functional dependencies Normalize to
Emp (eno ename title bdate salary supereno dno)
WorksOn (eno pno resp hours)
Proj (pno pname budget)
ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration
EmpProj relation
fd1 fd2
fd3fd4
Second Normal Form (2NF) Example (2)
ENo EName
BDate Title Salary SuperNo
DNo
Emp relation
fd1
fd2
fd3fd4
PNo PName Budget
Proj relationENo PNo Resp Duration
WorkOn relation
Third Normal Form (3NF)
Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th
an 2 FDs
A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key
Converting a relation to 3NF from 2NF involves the removal of transitive dependencies
Third Normal Form (3NF) Example
fd2 results in a transitive dependency eno rarr salary Remove it
ENo EName BDate Title Salary SuperNo DNo
Emp relation
fd1
fd2
fd2
Title Salary
Pay relation
ENo EName BDate Title SuperNo DNo
Emp relation
fd1
General Definitions of 2NF and 3NF
We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)
General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key
General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key
Note that a prime attribute is an attribute that is in any key (candidate or primary)
General Definition of 3NF Example
The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute
However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key
Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key
ENo EName BDate Title Salary SuperNo DNo SSN
Emp relation
fd1
fd2
Normalization Question
Consider the universal relation
R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ
List the keys for R
Decompose R into 2NF and then 3NF relations
Boyce-Codd Normal Form (BCNF)
A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key
To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key
The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
ClientInterview relation
Relation is in 3NF but not BCNF
fd4
fd3
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
Note that we lose the FD fd2
fd4
fd3
ClientNo InterviewDate
InterviwTime
StaffNo StaffNo InterviwDate
RoomNo
Interview relation StaffRoom relation
BCNF versus 3NF
We can decompose to BCNF but sometimes we do not want to if we lose a FD
The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency
Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)
In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)
Conversion to BCNF
There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y
Note that if any FD in F+ violates BCNF then a FD in F violates BCNF
2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey
3) Replace R by relations with schemas R1 = X+
R2 = (R ndash X+) U X
4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib
utes are all in R1 or R2
Normalization Question
Given this database schema normalize into BCNF
fd1
PID Country Lot Area Price TaxRate
fd2
fd4fd3
Multi-Valued Dependencies
A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema
A MVD occurs when two independent 1N relationships are in the relational schema
Multi-Valued Dependencies Example
Eno ename Title Salary Pno dno
E1 J Doe EE 30000 P1P2
E2 M Smith SA 50000 P1P4 D1D2D3
E3 A Lee ME 40000 P1P3 D2D3
E4 J Miller PR 20000 P3 D3
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1
E1 J Doe EE 30000 P2
E2 M Smith SA 50000 P1 D1
E2 M Smith SA 50000 P1 D2
E2 M Smith SA 50000 P1 D3
E2 M Smith SA 50000 P4 D1
E2 M Smith SA 50000 P4 D2
E2 M Smith SA 50000 P4 D3
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P3 D3
E3 A Lee ME 40000 P3 D3
E4 J Miller PR 20000 P3 D3
Multi-Valued Dependencies (MVDs)
A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C
A trivial MVD A mdashgtgt B occurs when either B is a subset of A or
A cup B = R
Fourth Normal Form (4NF)
Fourth normal form (4NF) is based on the idea of multi-valued dependencies
A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies
Fourth Normal Form (4NF) Example
eno mdashgtgtpno
eno mdashgtgtdno
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3
eno ename title salary
E1 J Doe EE 30000
E2 MSmith SA 50000
E3 A Lee ME 40000
E4 J Miller PR 20000
eno pno
E1 P1
E1 P2
E2 P1
E2 P4
E3 P1
E3 P3
E4 P3
eno dno
E2 D1
E2 D2
E2 D3
E3 D2
E3 D3
E4 D3
Lossless-join Dependency
The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation
A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined
Fifth Normal Form (5NF)
bull Fifth normal form (5NF) is based on join dependencies
A relation is in fifth normal form (5NF) if the relation has no join dependency
bull 5NF is also called project-join normal form (PJNF)
A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R
The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have
ΠR1(r) ΠR2(r) hellip ΠRn(r) = r
Fifth Normal Form (5NF) Example
Consider a relation Supply (sname partName projName)
1048715Add the additional constraint that
If project j requires part p
and supplier s supplies part p
and supplier s supplies at least one item to project j
Then supplier s also supplies part p to project j
Original Supply Relation Supply Relation with new constraintsname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
Fifth Normal Form (5NF) Example
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
sname partName
Smith Bolt
Smith Nut
Adam Bolt
Walton Nut
Adam Nail
sname projName
Smith X
Smith Y
Adam Y
Walton Z
Adam X
partName projName
Bolt X
Nut Y
Bolt Y
Nut Z
Nail X
Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples
Desirable Properties in normalization
Relational schemas that are well-designed have several
important properties
1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or
relationship
2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins
3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations
Normal Forms in Practice
Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used
For example query execution time may increase because of normalization as more joins become necessary to answer queries
Normal Forms in Practice Example
For example street and city uniquely determine a zipcode
In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed
When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly
fd1
ENo City Street ZipCode
fd2
ENo ZipCode
ZipCode City Street
Normal Forms in Practice Summary
In summary normalization is typically used in two ways
To improve on relational schemas after ER design
To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies
There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance
This is the classic time versus space tradeoff
Conclusion
Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies
Normal forms indicate when relations satisfy certain properties
1NF - All attributes are atomic
2NF - All attributes are fully functionally dependent on a key
3NF - There are no transitive dependencies in the relation
4NF - There are no multi-valued dependencies in the relation
5NF - The relation has no join dependency
In practice normalization is used to improve schemas produced after ER design and existing relational schemas
It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time
Objectives
Explain process of normalization and why it is beneficial
Describe the concept of the Universal Relation
What are the motivations for normalization
What are the 3 types of update anomalies Be able to give examples
Describe the lossless-join property and dependency preservation property
Define and explain the concept of a functional dependency
What is a trivial FD
Describe the relationship between FDs and keys
What are Armstrongs Axioms When are they used
What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs
Objectives (2)
Define minimal and covers for a set of FDs
Define normal forms be able to list the normal forms and draw a diagram showing their relationship
Define 1NF Know how to convert non-1NF relation to 1NF
Define 2NF Convert 1NF relation to 2NF
Define 3NF Convert 2NF relation to 3NF
Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example
Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm
Describe multi-valued dependencies When do they occur
What is a trivial MVD
Define 4NF Convert 3NFBCNF relation to 4NF
Objectives (3)
Describe lossless-join property and how it relates to 5NF
Define 5NF
Explain normalization in terms of the time versus space tradeoff
Explain the relationship between normalization and ER modeling
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Slide 45
- Slide 46
- Slide 47
- Slide 48
- Slide 49
- Slide 50
- Slide 51
- Slide 52
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- Slide 60
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
-
Computing the Attribute Closure
The algorithm is as follows
Given a set of attributes X
Let X+ = X
Repeat Find a FD in F whose left side is a subset of X+ Add the right side of F to X+
Until (X+ does not change or X+ contains all attributes of R)
After the algorithm completes you have a set of attributes X+ that can be functionally determined by X This allows you to produce FDs of the form
X rarr A where A is in X+
Computing Attribute Closure Example
R (ABCDEG)
F = A rarr BC C rarr D D rarr G
Compute A+
A+ = A (initial step)
A+ = ABC (by using FD A rarr BC)
A+ = ABCD (by using FD C rarr D)
A+ = ABCDG (by using FD D rarr G)
Similarly we can compute C+ and EG+
C+ = CDG
EG+ = EG
Using Attribute Closure to Detect a FD
The attribute closure algorithm can be used to determine if a particular FD X rarr Y holds based on the set of given FDs F Compute X+ and see if Y is in X+
Example
R = (ABCDEG)
F = A rarr BC C rarr D D rarr G
Does the FD CD rarr G hold Compute CD+ = CDG Yes the FD CD rarr G holds
Does the FD BC rarr E hold Compute BC+ = BCDG No the FD BC rarr E does n
ot hold
Using Attribute Closure to Find all FDs
The attribute closure algorithm can be used to find all FDs of a given relation R if perform the attribute closure algorithm for all subsets X of the attributes R
This is an exponential algorithm in the number of attributes in R Although there are a few optimizations Do not compute the closure of the empty set or of the set of all attributes If we find X+ = all attributes do not compute the closure of any supersets of X Note that even though you do not have to compute the closure (because you know the answer) you still must use the known closure to infer new FDs
Using Attribute Closureto Find all FDs Example
Compute F+ from F where
R (ABCD)
F = A rarr BC C rarr D
Must compute all these attribute closures
A+ B+ C+ D+
AB+ AC+ AD+ BC+ BD+ CD+
ABC+ ABD+ ACD+ BCD+
Do not have to compute ABCD+
Start with the single sets
A+ = ABCD B+ = B C+ = CD D+ = D
New FD A rarr D
Using Attribute Closureto Find all FDs Example (cont)
Next the two element sets Do not compute AB+ AC+ AD+ because A+ contained all attributes of R New FDs AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C BC+ = BCD BD+ = BD CD+ = CD New FD BC rarr DThree element sets Do not compute ABC+ ABD+ ACD+ New FDs ABC rarr D ABD rarr C ACD rarr B BCD+ = BCD No new FDsAll FDs F+ = A rarr BC C rarr D A rarr D BC rarr D AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C ABC rarr D ABD rarrC ACD rarr B
FD Closure Questions
1) Compute all nontrivial FDs and list all keys of R
R = (ABCD)
F = AB rarr C C rarr D D rarr A
2) Compute all nontrivial FDs and list all candidate keys of R
R = (ABCD)
F = AB rarr C BC rarr D CD rarr A AD rarr B
Equivalent Sets of FDs
A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+ We can determine if F covers E by calculating X+ with respect to F for each FD X rarr Y in E and checking whether X+ includes Y
Two sets of FDs E and F are equivalent if E covers F and F covers E That is E+= F+
Minimal Set of FDs
A set of FDs F is minimal if it satisfies these conditions Every FD in F has a single attribute on its right-hand side We cannot replace any FD X rarr A in F with Y rarr A where Y
X and still have a set of dependencies that is equivalent to F
We cannot remove any FD and still have a set that is equivalent to F
A minimal sets of FDs is like a canonical( 规范的 ) form of FDs
Note that there may be many minimal covers for a given set of functional dependencies F
Normal Forms
A relation is in a particular normal form if it satisfies certain
normalization properties
There are several normal forms defined 1NF - First Normal Form 2NF - Second Normal Form 3NF - Third Normal Form BCNF - Boyce-Codd Normal Form 4NF - Fourth Normal Form 5NF - Fifth Normal Form
Each of these normal forms are stricter than the next
For example 3NF is better than 2NF because it removes more redundancyanomalies from the schema than 2NF
Normal Forms
All relations (non-normalized)
1NF (First Normal Form)
2NF (Second Normal Form)
3NF (Third Normal Form)
BCNF (Boyce-Codd NF)
4NF (Fourth NF)
5NF (Fifth NF)
First Normal Form (1NF)
A relation is in first normal form (1NF) if all its attribute values are atomic
That is a 1NF relation cannot have an attribute value that is a set of values (multi-valued attribute) a set of tuples (nested relation)
1NF is a standard assumption in relational DBMSs However object-oriented DBMSs and nested relational DBMSs relax t
his constraint
A relation that is not in 1NF is an unnormalized relation
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1P2
AnalystAnalyst
246
E3 A Lee P3P4
ConsultantEngineer
1048
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1 P2 Analyst Analyst 24 6
E3 A Lee P3P4 Consultant Engineer 10 48
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
A non-1NF Relation
One way to convert a non-1NF relation to a 1NF relation
1) Splitting Method - Divide the existing relation into two relations one with non-repeating attributes and the other with repeating attributes
Make a new relation consisting of the primary key of the original relation and the repeating attributes Determine a primary key for this new relation
Remove the repeating attributes from the original relation
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1P2
AnalystAnalyst
246
E3 A Lee P3P4
ConsultantEngineer
1048
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
ENo EName
E1 J Doe
E2 M Smith
E3 A Lee
E4 J Miller
E5 B Casey
E6 L Chu
E7 R Davis
ENo PNo Resp DurationE1 P1 Manager 12E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36
Second Normal Form (2NF)
A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key
If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation
Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key
Second Normal Form (2NF) Example
fd1 and fd4 are partial functional dependencies Normalize to
Emp (eno ename title bdate salary supereno dno)
WorksOn (eno pno resp hours)
Proj (pno pname budget)
ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration
EmpProj relation
fd1 fd2
fd3fd4
Second Normal Form (2NF) Example (2)
ENo EName
BDate Title Salary SuperNo
DNo
Emp relation
fd1
fd2
fd3fd4
PNo PName Budget
Proj relationENo PNo Resp Duration
WorkOn relation
Third Normal Form (3NF)
Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th
an 2 FDs
A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key
Converting a relation to 3NF from 2NF involves the removal of transitive dependencies
Third Normal Form (3NF) Example
fd2 results in a transitive dependency eno rarr salary Remove it
ENo EName BDate Title Salary SuperNo DNo
Emp relation
fd1
fd2
fd2
Title Salary
Pay relation
ENo EName BDate Title SuperNo DNo
Emp relation
fd1
General Definitions of 2NF and 3NF
We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)
General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key
General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key
Note that a prime attribute is an attribute that is in any key (candidate or primary)
General Definition of 3NF Example
The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute
However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key
Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key
ENo EName BDate Title Salary SuperNo DNo SSN
Emp relation
fd1
fd2
Normalization Question
Consider the universal relation
R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ
List the keys for R
Decompose R into 2NF and then 3NF relations
Boyce-Codd Normal Form (BCNF)
A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key
To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key
The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
ClientInterview relation
Relation is in 3NF but not BCNF
fd4
fd3
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
Note that we lose the FD fd2
fd4
fd3
ClientNo InterviewDate
InterviwTime
StaffNo StaffNo InterviwDate
RoomNo
Interview relation StaffRoom relation
BCNF versus 3NF
We can decompose to BCNF but sometimes we do not want to if we lose a FD
The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency
Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)
In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)
Conversion to BCNF
There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y
Note that if any FD in F+ violates BCNF then a FD in F violates BCNF
2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey
3) Replace R by relations with schemas R1 = X+
R2 = (R ndash X+) U X
4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib
utes are all in R1 or R2
Normalization Question
Given this database schema normalize into BCNF
fd1
PID Country Lot Area Price TaxRate
fd2
fd4fd3
Multi-Valued Dependencies
A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema
A MVD occurs when two independent 1N relationships are in the relational schema
Multi-Valued Dependencies Example
Eno ename Title Salary Pno dno
E1 J Doe EE 30000 P1P2
E2 M Smith SA 50000 P1P4 D1D2D3
E3 A Lee ME 40000 P1P3 D2D3
E4 J Miller PR 20000 P3 D3
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1
E1 J Doe EE 30000 P2
E2 M Smith SA 50000 P1 D1
E2 M Smith SA 50000 P1 D2
E2 M Smith SA 50000 P1 D3
E2 M Smith SA 50000 P4 D1
E2 M Smith SA 50000 P4 D2
E2 M Smith SA 50000 P4 D3
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P3 D3
E3 A Lee ME 40000 P3 D3
E4 J Miller PR 20000 P3 D3
Multi-Valued Dependencies (MVDs)
A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C
A trivial MVD A mdashgtgt B occurs when either B is a subset of A or
A cup B = R
Fourth Normal Form (4NF)
Fourth normal form (4NF) is based on the idea of multi-valued dependencies
A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies
Fourth Normal Form (4NF) Example
eno mdashgtgtpno
eno mdashgtgtdno
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3
eno ename title salary
E1 J Doe EE 30000
E2 MSmith SA 50000
E3 A Lee ME 40000
E4 J Miller PR 20000
eno pno
E1 P1
E1 P2
E2 P1
E2 P4
E3 P1
E3 P3
E4 P3
eno dno
E2 D1
E2 D2
E2 D3
E3 D2
E3 D3
E4 D3
Lossless-join Dependency
The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation
A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined
Fifth Normal Form (5NF)
bull Fifth normal form (5NF) is based on join dependencies
A relation is in fifth normal form (5NF) if the relation has no join dependency
bull 5NF is also called project-join normal form (PJNF)
A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R
The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have
ΠR1(r) ΠR2(r) hellip ΠRn(r) = r
Fifth Normal Form (5NF) Example
Consider a relation Supply (sname partName projName)
1048715Add the additional constraint that
If project j requires part p
and supplier s supplies part p
and supplier s supplies at least one item to project j
Then supplier s also supplies part p to project j
Original Supply Relation Supply Relation with new constraintsname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
Fifth Normal Form (5NF) Example
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
sname partName
Smith Bolt
Smith Nut
Adam Bolt
Walton Nut
Adam Nail
sname projName
Smith X
Smith Y
Adam Y
Walton Z
Adam X
partName projName
Bolt X
Nut Y
Bolt Y
Nut Z
Nail X
Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples
Desirable Properties in normalization
Relational schemas that are well-designed have several
important properties
1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or
relationship
2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins
3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations
Normal Forms in Practice
Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used
For example query execution time may increase because of normalization as more joins become necessary to answer queries
Normal Forms in Practice Example
For example street and city uniquely determine a zipcode
In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed
When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly
fd1
ENo City Street ZipCode
fd2
ENo ZipCode
ZipCode City Street
Normal Forms in Practice Summary
In summary normalization is typically used in two ways
To improve on relational schemas after ER design
To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies
There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance
This is the classic time versus space tradeoff
Conclusion
Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies
Normal forms indicate when relations satisfy certain properties
1NF - All attributes are atomic
2NF - All attributes are fully functionally dependent on a key
3NF - There are no transitive dependencies in the relation
4NF - There are no multi-valued dependencies in the relation
5NF - The relation has no join dependency
In practice normalization is used to improve schemas produced after ER design and existing relational schemas
It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time
Objectives
Explain process of normalization and why it is beneficial
Describe the concept of the Universal Relation
What are the motivations for normalization
What are the 3 types of update anomalies Be able to give examples
Describe the lossless-join property and dependency preservation property
Define and explain the concept of a functional dependency
What is a trivial FD
Describe the relationship between FDs and keys
What are Armstrongs Axioms When are they used
What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs
Objectives (2)
Define minimal and covers for a set of FDs
Define normal forms be able to list the normal forms and draw a diagram showing their relationship
Define 1NF Know how to convert non-1NF relation to 1NF
Define 2NF Convert 1NF relation to 2NF
Define 3NF Convert 2NF relation to 3NF
Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example
Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm
Describe multi-valued dependencies When do they occur
What is a trivial MVD
Define 4NF Convert 3NFBCNF relation to 4NF
Objectives (3)
Describe lossless-join property and how it relates to 5NF
Define 5NF
Explain normalization in terms of the time versus space tradeoff
Explain the relationship between normalization and ER modeling
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Slide 45
- Slide 46
- Slide 47
- Slide 48
- Slide 49
- Slide 50
- Slide 51
- Slide 52
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- Slide 60
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
-
Computing Attribute Closure Example
R (ABCDEG)
F = A rarr BC C rarr D D rarr G
Compute A+
A+ = A (initial step)
A+ = ABC (by using FD A rarr BC)
A+ = ABCD (by using FD C rarr D)
A+ = ABCDG (by using FD D rarr G)
Similarly we can compute C+ and EG+
C+ = CDG
EG+ = EG
Using Attribute Closure to Detect a FD
The attribute closure algorithm can be used to determine if a particular FD X rarr Y holds based on the set of given FDs F Compute X+ and see if Y is in X+
Example
R = (ABCDEG)
F = A rarr BC C rarr D D rarr G
Does the FD CD rarr G hold Compute CD+ = CDG Yes the FD CD rarr G holds
Does the FD BC rarr E hold Compute BC+ = BCDG No the FD BC rarr E does n
ot hold
Using Attribute Closure to Find all FDs
The attribute closure algorithm can be used to find all FDs of a given relation R if perform the attribute closure algorithm for all subsets X of the attributes R
This is an exponential algorithm in the number of attributes in R Although there are a few optimizations Do not compute the closure of the empty set or of the set of all attributes If we find X+ = all attributes do not compute the closure of any supersets of X Note that even though you do not have to compute the closure (because you know the answer) you still must use the known closure to infer new FDs
Using Attribute Closureto Find all FDs Example
Compute F+ from F where
R (ABCD)
F = A rarr BC C rarr D
Must compute all these attribute closures
A+ B+ C+ D+
AB+ AC+ AD+ BC+ BD+ CD+
ABC+ ABD+ ACD+ BCD+
Do not have to compute ABCD+
Start with the single sets
A+ = ABCD B+ = B C+ = CD D+ = D
New FD A rarr D
Using Attribute Closureto Find all FDs Example (cont)
Next the two element sets Do not compute AB+ AC+ AD+ because A+ contained all attributes of R New FDs AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C BC+ = BCD BD+ = BD CD+ = CD New FD BC rarr DThree element sets Do not compute ABC+ ABD+ ACD+ New FDs ABC rarr D ABD rarr C ACD rarr B BCD+ = BCD No new FDsAll FDs F+ = A rarr BC C rarr D A rarr D BC rarr D AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C ABC rarr D ABD rarrC ACD rarr B
FD Closure Questions
1) Compute all nontrivial FDs and list all keys of R
R = (ABCD)
F = AB rarr C C rarr D D rarr A
2) Compute all nontrivial FDs and list all candidate keys of R
R = (ABCD)
F = AB rarr C BC rarr D CD rarr A AD rarr B
Equivalent Sets of FDs
A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+ We can determine if F covers E by calculating X+ with respect to F for each FD X rarr Y in E and checking whether X+ includes Y
Two sets of FDs E and F are equivalent if E covers F and F covers E That is E+= F+
Minimal Set of FDs
A set of FDs F is minimal if it satisfies these conditions Every FD in F has a single attribute on its right-hand side We cannot replace any FD X rarr A in F with Y rarr A where Y
X and still have a set of dependencies that is equivalent to F
We cannot remove any FD and still have a set that is equivalent to F
A minimal sets of FDs is like a canonical( 规范的 ) form of FDs
Note that there may be many minimal covers for a given set of functional dependencies F
Normal Forms
A relation is in a particular normal form if it satisfies certain
normalization properties
There are several normal forms defined 1NF - First Normal Form 2NF - Second Normal Form 3NF - Third Normal Form BCNF - Boyce-Codd Normal Form 4NF - Fourth Normal Form 5NF - Fifth Normal Form
Each of these normal forms are stricter than the next
For example 3NF is better than 2NF because it removes more redundancyanomalies from the schema than 2NF
Normal Forms
All relations (non-normalized)
1NF (First Normal Form)
2NF (Second Normal Form)
3NF (Third Normal Form)
BCNF (Boyce-Codd NF)
4NF (Fourth NF)
5NF (Fifth NF)
First Normal Form (1NF)
A relation is in first normal form (1NF) if all its attribute values are atomic
That is a 1NF relation cannot have an attribute value that is a set of values (multi-valued attribute) a set of tuples (nested relation)
1NF is a standard assumption in relational DBMSs However object-oriented DBMSs and nested relational DBMSs relax t
his constraint
A relation that is not in 1NF is an unnormalized relation
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1P2
AnalystAnalyst
246
E3 A Lee P3P4
ConsultantEngineer
1048
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1 P2 Analyst Analyst 24 6
E3 A Lee P3P4 Consultant Engineer 10 48
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
A non-1NF Relation
One way to convert a non-1NF relation to a 1NF relation
1) Splitting Method - Divide the existing relation into two relations one with non-repeating attributes and the other with repeating attributes
Make a new relation consisting of the primary key of the original relation and the repeating attributes Determine a primary key for this new relation
Remove the repeating attributes from the original relation
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1P2
AnalystAnalyst
246
E3 A Lee P3P4
ConsultantEngineer
1048
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
ENo EName
E1 J Doe
E2 M Smith
E3 A Lee
E4 J Miller
E5 B Casey
E6 L Chu
E7 R Davis
ENo PNo Resp DurationE1 P1 Manager 12E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36
Second Normal Form (2NF)
A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key
If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation
Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key
Second Normal Form (2NF) Example
fd1 and fd4 are partial functional dependencies Normalize to
Emp (eno ename title bdate salary supereno dno)
WorksOn (eno pno resp hours)
Proj (pno pname budget)
ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration
EmpProj relation
fd1 fd2
fd3fd4
Second Normal Form (2NF) Example (2)
ENo EName
BDate Title Salary SuperNo
DNo
Emp relation
fd1
fd2
fd3fd4
PNo PName Budget
Proj relationENo PNo Resp Duration
WorkOn relation
Third Normal Form (3NF)
Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th
an 2 FDs
A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key
Converting a relation to 3NF from 2NF involves the removal of transitive dependencies
Third Normal Form (3NF) Example
fd2 results in a transitive dependency eno rarr salary Remove it
ENo EName BDate Title Salary SuperNo DNo
Emp relation
fd1
fd2
fd2
Title Salary
Pay relation
ENo EName BDate Title SuperNo DNo
Emp relation
fd1
General Definitions of 2NF and 3NF
We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)
General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key
General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key
Note that a prime attribute is an attribute that is in any key (candidate or primary)
General Definition of 3NF Example
The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute
However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key
Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key
ENo EName BDate Title Salary SuperNo DNo SSN
Emp relation
fd1
fd2
Normalization Question
Consider the universal relation
R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ
List the keys for R
Decompose R into 2NF and then 3NF relations
Boyce-Codd Normal Form (BCNF)
A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key
To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key
The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
ClientInterview relation
Relation is in 3NF but not BCNF
fd4
fd3
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
Note that we lose the FD fd2
fd4
fd3
ClientNo InterviewDate
InterviwTime
StaffNo StaffNo InterviwDate
RoomNo
Interview relation StaffRoom relation
BCNF versus 3NF
We can decompose to BCNF but sometimes we do not want to if we lose a FD
The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency
Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)
In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)
Conversion to BCNF
There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y
Note that if any FD in F+ violates BCNF then a FD in F violates BCNF
2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey
3) Replace R by relations with schemas R1 = X+
R2 = (R ndash X+) U X
4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib
utes are all in R1 or R2
Normalization Question
Given this database schema normalize into BCNF
fd1
PID Country Lot Area Price TaxRate
fd2
fd4fd3
Multi-Valued Dependencies
A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema
A MVD occurs when two independent 1N relationships are in the relational schema
Multi-Valued Dependencies Example
Eno ename Title Salary Pno dno
E1 J Doe EE 30000 P1P2
E2 M Smith SA 50000 P1P4 D1D2D3
E3 A Lee ME 40000 P1P3 D2D3
E4 J Miller PR 20000 P3 D3
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1
E1 J Doe EE 30000 P2
E2 M Smith SA 50000 P1 D1
E2 M Smith SA 50000 P1 D2
E2 M Smith SA 50000 P1 D3
E2 M Smith SA 50000 P4 D1
E2 M Smith SA 50000 P4 D2
E2 M Smith SA 50000 P4 D3
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P3 D3
E3 A Lee ME 40000 P3 D3
E4 J Miller PR 20000 P3 D3
Multi-Valued Dependencies (MVDs)
A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C
A trivial MVD A mdashgtgt B occurs when either B is a subset of A or
A cup B = R
Fourth Normal Form (4NF)
Fourth normal form (4NF) is based on the idea of multi-valued dependencies
A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies
Fourth Normal Form (4NF) Example
eno mdashgtgtpno
eno mdashgtgtdno
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3
eno ename title salary
E1 J Doe EE 30000
E2 MSmith SA 50000
E3 A Lee ME 40000
E4 J Miller PR 20000
eno pno
E1 P1
E1 P2
E2 P1
E2 P4
E3 P1
E3 P3
E4 P3
eno dno
E2 D1
E2 D2
E2 D3
E3 D2
E3 D3
E4 D3
Lossless-join Dependency
The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation
A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined
Fifth Normal Form (5NF)
bull Fifth normal form (5NF) is based on join dependencies
A relation is in fifth normal form (5NF) if the relation has no join dependency
bull 5NF is also called project-join normal form (PJNF)
A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R
The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have
ΠR1(r) ΠR2(r) hellip ΠRn(r) = r
Fifth Normal Form (5NF) Example
Consider a relation Supply (sname partName projName)
1048715Add the additional constraint that
If project j requires part p
and supplier s supplies part p
and supplier s supplies at least one item to project j
Then supplier s also supplies part p to project j
Original Supply Relation Supply Relation with new constraintsname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
Fifth Normal Form (5NF) Example
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
sname partName
Smith Bolt
Smith Nut
Adam Bolt
Walton Nut
Adam Nail
sname projName
Smith X
Smith Y
Adam Y
Walton Z
Adam X
partName projName
Bolt X
Nut Y
Bolt Y
Nut Z
Nail X
Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples
Desirable Properties in normalization
Relational schemas that are well-designed have several
important properties
1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or
relationship
2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins
3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations
Normal Forms in Practice
Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used
For example query execution time may increase because of normalization as more joins become necessary to answer queries
Normal Forms in Practice Example
For example street and city uniquely determine a zipcode
In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed
When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly
fd1
ENo City Street ZipCode
fd2
ENo ZipCode
ZipCode City Street
Normal Forms in Practice Summary
In summary normalization is typically used in two ways
To improve on relational schemas after ER design
To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies
There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance
This is the classic time versus space tradeoff
Conclusion
Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies
Normal forms indicate when relations satisfy certain properties
1NF - All attributes are atomic
2NF - All attributes are fully functionally dependent on a key
3NF - There are no transitive dependencies in the relation
4NF - There are no multi-valued dependencies in the relation
5NF - The relation has no join dependency
In practice normalization is used to improve schemas produced after ER design and existing relational schemas
It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time
Objectives
Explain process of normalization and why it is beneficial
Describe the concept of the Universal Relation
What are the motivations for normalization
What are the 3 types of update anomalies Be able to give examples
Describe the lossless-join property and dependency preservation property
Define and explain the concept of a functional dependency
What is a trivial FD
Describe the relationship between FDs and keys
What are Armstrongs Axioms When are they used
What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs
Objectives (2)
Define minimal and covers for a set of FDs
Define normal forms be able to list the normal forms and draw a diagram showing their relationship
Define 1NF Know how to convert non-1NF relation to 1NF
Define 2NF Convert 1NF relation to 2NF
Define 3NF Convert 2NF relation to 3NF
Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example
Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm
Describe multi-valued dependencies When do they occur
What is a trivial MVD
Define 4NF Convert 3NFBCNF relation to 4NF
Objectives (3)
Describe lossless-join property and how it relates to 5NF
Define 5NF
Explain normalization in terms of the time versus space tradeoff
Explain the relationship between normalization and ER modeling
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Slide 45
- Slide 46
- Slide 47
- Slide 48
- Slide 49
- Slide 50
- Slide 51
- Slide 52
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- Slide 60
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
-
Using Attribute Closure to Detect a FD
The attribute closure algorithm can be used to determine if a particular FD X rarr Y holds based on the set of given FDs F Compute X+ and see if Y is in X+
Example
R = (ABCDEG)
F = A rarr BC C rarr D D rarr G
Does the FD CD rarr G hold Compute CD+ = CDG Yes the FD CD rarr G holds
Does the FD BC rarr E hold Compute BC+ = BCDG No the FD BC rarr E does n
ot hold
Using Attribute Closure to Find all FDs
The attribute closure algorithm can be used to find all FDs of a given relation R if perform the attribute closure algorithm for all subsets X of the attributes R
This is an exponential algorithm in the number of attributes in R Although there are a few optimizations Do not compute the closure of the empty set or of the set of all attributes If we find X+ = all attributes do not compute the closure of any supersets of X Note that even though you do not have to compute the closure (because you know the answer) you still must use the known closure to infer new FDs
Using Attribute Closureto Find all FDs Example
Compute F+ from F where
R (ABCD)
F = A rarr BC C rarr D
Must compute all these attribute closures
A+ B+ C+ D+
AB+ AC+ AD+ BC+ BD+ CD+
ABC+ ABD+ ACD+ BCD+
Do not have to compute ABCD+
Start with the single sets
A+ = ABCD B+ = B C+ = CD D+ = D
New FD A rarr D
Using Attribute Closureto Find all FDs Example (cont)
Next the two element sets Do not compute AB+ AC+ AD+ because A+ contained all attributes of R New FDs AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C BC+ = BCD BD+ = BD CD+ = CD New FD BC rarr DThree element sets Do not compute ABC+ ABD+ ACD+ New FDs ABC rarr D ABD rarr C ACD rarr B BCD+ = BCD No new FDsAll FDs F+ = A rarr BC C rarr D A rarr D BC rarr D AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C ABC rarr D ABD rarrC ACD rarr B
FD Closure Questions
1) Compute all nontrivial FDs and list all keys of R
R = (ABCD)
F = AB rarr C C rarr D D rarr A
2) Compute all nontrivial FDs and list all candidate keys of R
R = (ABCD)
F = AB rarr C BC rarr D CD rarr A AD rarr B
Equivalent Sets of FDs
A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+ We can determine if F covers E by calculating X+ with respect to F for each FD X rarr Y in E and checking whether X+ includes Y
Two sets of FDs E and F are equivalent if E covers F and F covers E That is E+= F+
Minimal Set of FDs
A set of FDs F is minimal if it satisfies these conditions Every FD in F has a single attribute on its right-hand side We cannot replace any FD X rarr A in F with Y rarr A where Y
X and still have a set of dependencies that is equivalent to F
We cannot remove any FD and still have a set that is equivalent to F
A minimal sets of FDs is like a canonical( 规范的 ) form of FDs
Note that there may be many minimal covers for a given set of functional dependencies F
Normal Forms
A relation is in a particular normal form if it satisfies certain
normalization properties
There are several normal forms defined 1NF - First Normal Form 2NF - Second Normal Form 3NF - Third Normal Form BCNF - Boyce-Codd Normal Form 4NF - Fourth Normal Form 5NF - Fifth Normal Form
Each of these normal forms are stricter than the next
For example 3NF is better than 2NF because it removes more redundancyanomalies from the schema than 2NF
Normal Forms
All relations (non-normalized)
1NF (First Normal Form)
2NF (Second Normal Form)
3NF (Third Normal Form)
BCNF (Boyce-Codd NF)
4NF (Fourth NF)
5NF (Fifth NF)
First Normal Form (1NF)
A relation is in first normal form (1NF) if all its attribute values are atomic
That is a 1NF relation cannot have an attribute value that is a set of values (multi-valued attribute) a set of tuples (nested relation)
1NF is a standard assumption in relational DBMSs However object-oriented DBMSs and nested relational DBMSs relax t
his constraint
A relation that is not in 1NF is an unnormalized relation
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1P2
AnalystAnalyst
246
E3 A Lee P3P4
ConsultantEngineer
1048
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1 P2 Analyst Analyst 24 6
E3 A Lee P3P4 Consultant Engineer 10 48
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
A non-1NF Relation
One way to convert a non-1NF relation to a 1NF relation
1) Splitting Method - Divide the existing relation into two relations one with non-repeating attributes and the other with repeating attributes
Make a new relation consisting of the primary key of the original relation and the repeating attributes Determine a primary key for this new relation
Remove the repeating attributes from the original relation
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1P2
AnalystAnalyst
246
E3 A Lee P3P4
ConsultantEngineer
1048
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
ENo EName
E1 J Doe
E2 M Smith
E3 A Lee
E4 J Miller
E5 B Casey
E6 L Chu
E7 R Davis
ENo PNo Resp DurationE1 P1 Manager 12E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36
Second Normal Form (2NF)
A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key
If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation
Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key
Second Normal Form (2NF) Example
fd1 and fd4 are partial functional dependencies Normalize to
Emp (eno ename title bdate salary supereno dno)
WorksOn (eno pno resp hours)
Proj (pno pname budget)
ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration
EmpProj relation
fd1 fd2
fd3fd4
Second Normal Form (2NF) Example (2)
ENo EName
BDate Title Salary SuperNo
DNo
Emp relation
fd1
fd2
fd3fd4
PNo PName Budget
Proj relationENo PNo Resp Duration
WorkOn relation
Third Normal Form (3NF)
Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th
an 2 FDs
A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key
Converting a relation to 3NF from 2NF involves the removal of transitive dependencies
Third Normal Form (3NF) Example
fd2 results in a transitive dependency eno rarr salary Remove it
ENo EName BDate Title Salary SuperNo DNo
Emp relation
fd1
fd2
fd2
Title Salary
Pay relation
ENo EName BDate Title SuperNo DNo
Emp relation
fd1
General Definitions of 2NF and 3NF
We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)
General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key
General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key
Note that a prime attribute is an attribute that is in any key (candidate or primary)
General Definition of 3NF Example
The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute
However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key
Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key
ENo EName BDate Title Salary SuperNo DNo SSN
Emp relation
fd1
fd2
Normalization Question
Consider the universal relation
R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ
List the keys for R
Decompose R into 2NF and then 3NF relations
Boyce-Codd Normal Form (BCNF)
A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key
To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key
The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
ClientInterview relation
Relation is in 3NF but not BCNF
fd4
fd3
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
Note that we lose the FD fd2
fd4
fd3
ClientNo InterviewDate
InterviwTime
StaffNo StaffNo InterviwDate
RoomNo
Interview relation StaffRoom relation
BCNF versus 3NF
We can decompose to BCNF but sometimes we do not want to if we lose a FD
The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency
Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)
In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)
Conversion to BCNF
There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y
Note that if any FD in F+ violates BCNF then a FD in F violates BCNF
2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey
3) Replace R by relations with schemas R1 = X+
R2 = (R ndash X+) U X
4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib
utes are all in R1 or R2
Normalization Question
Given this database schema normalize into BCNF
fd1
PID Country Lot Area Price TaxRate
fd2
fd4fd3
Multi-Valued Dependencies
A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema
A MVD occurs when two independent 1N relationships are in the relational schema
Multi-Valued Dependencies Example
Eno ename Title Salary Pno dno
E1 J Doe EE 30000 P1P2
E2 M Smith SA 50000 P1P4 D1D2D3
E3 A Lee ME 40000 P1P3 D2D3
E4 J Miller PR 20000 P3 D3
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1
E1 J Doe EE 30000 P2
E2 M Smith SA 50000 P1 D1
E2 M Smith SA 50000 P1 D2
E2 M Smith SA 50000 P1 D3
E2 M Smith SA 50000 P4 D1
E2 M Smith SA 50000 P4 D2
E2 M Smith SA 50000 P4 D3
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P3 D3
E3 A Lee ME 40000 P3 D3
E4 J Miller PR 20000 P3 D3
Multi-Valued Dependencies (MVDs)
A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C
A trivial MVD A mdashgtgt B occurs when either B is a subset of A or
A cup B = R
Fourth Normal Form (4NF)
Fourth normal form (4NF) is based on the idea of multi-valued dependencies
A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies
Fourth Normal Form (4NF) Example
eno mdashgtgtpno
eno mdashgtgtdno
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3
eno ename title salary
E1 J Doe EE 30000
E2 MSmith SA 50000
E3 A Lee ME 40000
E4 J Miller PR 20000
eno pno
E1 P1
E1 P2
E2 P1
E2 P4
E3 P1
E3 P3
E4 P3
eno dno
E2 D1
E2 D2
E2 D3
E3 D2
E3 D3
E4 D3
Lossless-join Dependency
The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation
A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined
Fifth Normal Form (5NF)
bull Fifth normal form (5NF) is based on join dependencies
A relation is in fifth normal form (5NF) if the relation has no join dependency
bull 5NF is also called project-join normal form (PJNF)
A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R
The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have
ΠR1(r) ΠR2(r) hellip ΠRn(r) = r
Fifth Normal Form (5NF) Example
Consider a relation Supply (sname partName projName)
1048715Add the additional constraint that
If project j requires part p
and supplier s supplies part p
and supplier s supplies at least one item to project j
Then supplier s also supplies part p to project j
Original Supply Relation Supply Relation with new constraintsname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
Fifth Normal Form (5NF) Example
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
sname partName
Smith Bolt
Smith Nut
Adam Bolt
Walton Nut
Adam Nail
sname projName
Smith X
Smith Y
Adam Y
Walton Z
Adam X
partName projName
Bolt X
Nut Y
Bolt Y
Nut Z
Nail X
Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples
Desirable Properties in normalization
Relational schemas that are well-designed have several
important properties
1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or
relationship
2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins
3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations
Normal Forms in Practice
Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used
For example query execution time may increase because of normalization as more joins become necessary to answer queries
Normal Forms in Practice Example
For example street and city uniquely determine a zipcode
In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed
When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly
fd1
ENo City Street ZipCode
fd2
ENo ZipCode
ZipCode City Street
Normal Forms in Practice Summary
In summary normalization is typically used in two ways
To improve on relational schemas after ER design
To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies
There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance
This is the classic time versus space tradeoff
Conclusion
Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies
Normal forms indicate when relations satisfy certain properties
1NF - All attributes are atomic
2NF - All attributes are fully functionally dependent on a key
3NF - There are no transitive dependencies in the relation
4NF - There are no multi-valued dependencies in the relation
5NF - The relation has no join dependency
In practice normalization is used to improve schemas produced after ER design and existing relational schemas
It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time
Objectives
Explain process of normalization and why it is beneficial
Describe the concept of the Universal Relation
What are the motivations for normalization
What are the 3 types of update anomalies Be able to give examples
Describe the lossless-join property and dependency preservation property
Define and explain the concept of a functional dependency
What is a trivial FD
Describe the relationship between FDs and keys
What are Armstrongs Axioms When are they used
What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs
Objectives (2)
Define minimal and covers for a set of FDs
Define normal forms be able to list the normal forms and draw a diagram showing their relationship
Define 1NF Know how to convert non-1NF relation to 1NF
Define 2NF Convert 1NF relation to 2NF
Define 3NF Convert 2NF relation to 3NF
Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example
Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm
Describe multi-valued dependencies When do they occur
What is a trivial MVD
Define 4NF Convert 3NFBCNF relation to 4NF
Objectives (3)
Describe lossless-join property and how it relates to 5NF
Define 5NF
Explain normalization in terms of the time versus space tradeoff
Explain the relationship between normalization and ER modeling
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Slide 45
- Slide 46
- Slide 47
- Slide 48
- Slide 49
- Slide 50
- Slide 51
- Slide 52
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- Slide 60
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
-
Using Attribute Closure to Find all FDs
The attribute closure algorithm can be used to find all FDs of a given relation R if perform the attribute closure algorithm for all subsets X of the attributes R
This is an exponential algorithm in the number of attributes in R Although there are a few optimizations Do not compute the closure of the empty set or of the set of all attributes If we find X+ = all attributes do not compute the closure of any supersets of X Note that even though you do not have to compute the closure (because you know the answer) you still must use the known closure to infer new FDs
Using Attribute Closureto Find all FDs Example
Compute F+ from F where
R (ABCD)
F = A rarr BC C rarr D
Must compute all these attribute closures
A+ B+ C+ D+
AB+ AC+ AD+ BC+ BD+ CD+
ABC+ ABD+ ACD+ BCD+
Do not have to compute ABCD+
Start with the single sets
A+ = ABCD B+ = B C+ = CD D+ = D
New FD A rarr D
Using Attribute Closureto Find all FDs Example (cont)
Next the two element sets Do not compute AB+ AC+ AD+ because A+ contained all attributes of R New FDs AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C BC+ = BCD BD+ = BD CD+ = CD New FD BC rarr DThree element sets Do not compute ABC+ ABD+ ACD+ New FDs ABC rarr D ABD rarr C ACD rarr B BCD+ = BCD No new FDsAll FDs F+ = A rarr BC C rarr D A rarr D BC rarr D AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C ABC rarr D ABD rarrC ACD rarr B
FD Closure Questions
1) Compute all nontrivial FDs and list all keys of R
R = (ABCD)
F = AB rarr C C rarr D D rarr A
2) Compute all nontrivial FDs and list all candidate keys of R
R = (ABCD)
F = AB rarr C BC rarr D CD rarr A AD rarr B
Equivalent Sets of FDs
A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+ We can determine if F covers E by calculating X+ with respect to F for each FD X rarr Y in E and checking whether X+ includes Y
Two sets of FDs E and F are equivalent if E covers F and F covers E That is E+= F+
Minimal Set of FDs
A set of FDs F is minimal if it satisfies these conditions Every FD in F has a single attribute on its right-hand side We cannot replace any FD X rarr A in F with Y rarr A where Y
X and still have a set of dependencies that is equivalent to F
We cannot remove any FD and still have a set that is equivalent to F
A minimal sets of FDs is like a canonical( 规范的 ) form of FDs
Note that there may be many minimal covers for a given set of functional dependencies F
Normal Forms
A relation is in a particular normal form if it satisfies certain
normalization properties
There are several normal forms defined 1NF - First Normal Form 2NF - Second Normal Form 3NF - Third Normal Form BCNF - Boyce-Codd Normal Form 4NF - Fourth Normal Form 5NF - Fifth Normal Form
Each of these normal forms are stricter than the next
For example 3NF is better than 2NF because it removes more redundancyanomalies from the schema than 2NF
Normal Forms
All relations (non-normalized)
1NF (First Normal Form)
2NF (Second Normal Form)
3NF (Third Normal Form)
BCNF (Boyce-Codd NF)
4NF (Fourth NF)
5NF (Fifth NF)
First Normal Form (1NF)
A relation is in first normal form (1NF) if all its attribute values are atomic
That is a 1NF relation cannot have an attribute value that is a set of values (multi-valued attribute) a set of tuples (nested relation)
1NF is a standard assumption in relational DBMSs However object-oriented DBMSs and nested relational DBMSs relax t
his constraint
A relation that is not in 1NF is an unnormalized relation
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1P2
AnalystAnalyst
246
E3 A Lee P3P4
ConsultantEngineer
1048
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1 P2 Analyst Analyst 24 6
E3 A Lee P3P4 Consultant Engineer 10 48
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
A non-1NF Relation
One way to convert a non-1NF relation to a 1NF relation
1) Splitting Method - Divide the existing relation into two relations one with non-repeating attributes and the other with repeating attributes
Make a new relation consisting of the primary key of the original relation and the repeating attributes Determine a primary key for this new relation
Remove the repeating attributes from the original relation
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1P2
AnalystAnalyst
246
E3 A Lee P3P4
ConsultantEngineer
1048
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
ENo EName
E1 J Doe
E2 M Smith
E3 A Lee
E4 J Miller
E5 B Casey
E6 L Chu
E7 R Davis
ENo PNo Resp DurationE1 P1 Manager 12E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36
Second Normal Form (2NF)
A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key
If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation
Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key
Second Normal Form (2NF) Example
fd1 and fd4 are partial functional dependencies Normalize to
Emp (eno ename title bdate salary supereno dno)
WorksOn (eno pno resp hours)
Proj (pno pname budget)
ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration
EmpProj relation
fd1 fd2
fd3fd4
Second Normal Form (2NF) Example (2)
ENo EName
BDate Title Salary SuperNo
DNo
Emp relation
fd1
fd2
fd3fd4
PNo PName Budget
Proj relationENo PNo Resp Duration
WorkOn relation
Third Normal Form (3NF)
Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th
an 2 FDs
A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key
Converting a relation to 3NF from 2NF involves the removal of transitive dependencies
Third Normal Form (3NF) Example
fd2 results in a transitive dependency eno rarr salary Remove it
ENo EName BDate Title Salary SuperNo DNo
Emp relation
fd1
fd2
fd2
Title Salary
Pay relation
ENo EName BDate Title SuperNo DNo
Emp relation
fd1
General Definitions of 2NF and 3NF
We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)
General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key
General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key
Note that a prime attribute is an attribute that is in any key (candidate or primary)
General Definition of 3NF Example
The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute
However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key
Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key
ENo EName BDate Title Salary SuperNo DNo SSN
Emp relation
fd1
fd2
Normalization Question
Consider the universal relation
R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ
List the keys for R
Decompose R into 2NF and then 3NF relations
Boyce-Codd Normal Form (BCNF)
A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key
To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key
The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
ClientInterview relation
Relation is in 3NF but not BCNF
fd4
fd3
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
Note that we lose the FD fd2
fd4
fd3
ClientNo InterviewDate
InterviwTime
StaffNo StaffNo InterviwDate
RoomNo
Interview relation StaffRoom relation
BCNF versus 3NF
We can decompose to BCNF but sometimes we do not want to if we lose a FD
The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency
Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)
In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)
Conversion to BCNF
There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y
Note that if any FD in F+ violates BCNF then a FD in F violates BCNF
2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey
3) Replace R by relations with schemas R1 = X+
R2 = (R ndash X+) U X
4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib
utes are all in R1 or R2
Normalization Question
Given this database schema normalize into BCNF
fd1
PID Country Lot Area Price TaxRate
fd2
fd4fd3
Multi-Valued Dependencies
A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema
A MVD occurs when two independent 1N relationships are in the relational schema
Multi-Valued Dependencies Example
Eno ename Title Salary Pno dno
E1 J Doe EE 30000 P1P2
E2 M Smith SA 50000 P1P4 D1D2D3
E3 A Lee ME 40000 P1P3 D2D3
E4 J Miller PR 20000 P3 D3
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1
E1 J Doe EE 30000 P2
E2 M Smith SA 50000 P1 D1
E2 M Smith SA 50000 P1 D2
E2 M Smith SA 50000 P1 D3
E2 M Smith SA 50000 P4 D1
E2 M Smith SA 50000 P4 D2
E2 M Smith SA 50000 P4 D3
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P3 D3
E3 A Lee ME 40000 P3 D3
E4 J Miller PR 20000 P3 D3
Multi-Valued Dependencies (MVDs)
A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C
A trivial MVD A mdashgtgt B occurs when either B is a subset of A or
A cup B = R
Fourth Normal Form (4NF)
Fourth normal form (4NF) is based on the idea of multi-valued dependencies
A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies
Fourth Normal Form (4NF) Example
eno mdashgtgtpno
eno mdashgtgtdno
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3
eno ename title salary
E1 J Doe EE 30000
E2 MSmith SA 50000
E3 A Lee ME 40000
E4 J Miller PR 20000
eno pno
E1 P1
E1 P2
E2 P1
E2 P4
E3 P1
E3 P3
E4 P3
eno dno
E2 D1
E2 D2
E2 D3
E3 D2
E3 D3
E4 D3
Lossless-join Dependency
The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation
A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined
Fifth Normal Form (5NF)
bull Fifth normal form (5NF) is based on join dependencies
A relation is in fifth normal form (5NF) if the relation has no join dependency
bull 5NF is also called project-join normal form (PJNF)
A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R
The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have
ΠR1(r) ΠR2(r) hellip ΠRn(r) = r
Fifth Normal Form (5NF) Example
Consider a relation Supply (sname partName projName)
1048715Add the additional constraint that
If project j requires part p
and supplier s supplies part p
and supplier s supplies at least one item to project j
Then supplier s also supplies part p to project j
Original Supply Relation Supply Relation with new constraintsname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
Fifth Normal Form (5NF) Example
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
sname partName
Smith Bolt
Smith Nut
Adam Bolt
Walton Nut
Adam Nail
sname projName
Smith X
Smith Y
Adam Y
Walton Z
Adam X
partName projName
Bolt X
Nut Y
Bolt Y
Nut Z
Nail X
Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples
Desirable Properties in normalization
Relational schemas that are well-designed have several
important properties
1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or
relationship
2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins
3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations
Normal Forms in Practice
Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used
For example query execution time may increase because of normalization as more joins become necessary to answer queries
Normal Forms in Practice Example
For example street and city uniquely determine a zipcode
In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed
When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly
fd1
ENo City Street ZipCode
fd2
ENo ZipCode
ZipCode City Street
Normal Forms in Practice Summary
In summary normalization is typically used in two ways
To improve on relational schemas after ER design
To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies
There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance
This is the classic time versus space tradeoff
Conclusion
Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies
Normal forms indicate when relations satisfy certain properties
1NF - All attributes are atomic
2NF - All attributes are fully functionally dependent on a key
3NF - There are no transitive dependencies in the relation
4NF - There are no multi-valued dependencies in the relation
5NF - The relation has no join dependency
In practice normalization is used to improve schemas produced after ER design and existing relational schemas
It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time
Objectives
Explain process of normalization and why it is beneficial
Describe the concept of the Universal Relation
What are the motivations for normalization
What are the 3 types of update anomalies Be able to give examples
Describe the lossless-join property and dependency preservation property
Define and explain the concept of a functional dependency
What is a trivial FD
Describe the relationship between FDs and keys
What are Armstrongs Axioms When are they used
What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs
Objectives (2)
Define minimal and covers for a set of FDs
Define normal forms be able to list the normal forms and draw a diagram showing their relationship
Define 1NF Know how to convert non-1NF relation to 1NF
Define 2NF Convert 1NF relation to 2NF
Define 3NF Convert 2NF relation to 3NF
Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example
Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm
Describe multi-valued dependencies When do they occur
What is a trivial MVD
Define 4NF Convert 3NFBCNF relation to 4NF
Objectives (3)
Describe lossless-join property and how it relates to 5NF
Define 5NF
Explain normalization in terms of the time versus space tradeoff
Explain the relationship between normalization and ER modeling
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Slide 45
- Slide 46
- Slide 47
- Slide 48
- Slide 49
- Slide 50
- Slide 51
- Slide 52
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- Slide 60
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
-
Using Attribute Closureto Find all FDs Example
Compute F+ from F where
R (ABCD)
F = A rarr BC C rarr D
Must compute all these attribute closures
A+ B+ C+ D+
AB+ AC+ AD+ BC+ BD+ CD+
ABC+ ABD+ ACD+ BCD+
Do not have to compute ABCD+
Start with the single sets
A+ = ABCD B+ = B C+ = CD D+ = D
New FD A rarr D
Using Attribute Closureto Find all FDs Example (cont)
Next the two element sets Do not compute AB+ AC+ AD+ because A+ contained all attributes of R New FDs AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C BC+ = BCD BD+ = BD CD+ = CD New FD BC rarr DThree element sets Do not compute ABC+ ABD+ ACD+ New FDs ABC rarr D ABD rarr C ACD rarr B BCD+ = BCD No new FDsAll FDs F+ = A rarr BC C rarr D A rarr D BC rarr D AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C ABC rarr D ABD rarrC ACD rarr B
FD Closure Questions
1) Compute all nontrivial FDs and list all keys of R
R = (ABCD)
F = AB rarr C C rarr D D rarr A
2) Compute all nontrivial FDs and list all candidate keys of R
R = (ABCD)
F = AB rarr C BC rarr D CD rarr A AD rarr B
Equivalent Sets of FDs
A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+ We can determine if F covers E by calculating X+ with respect to F for each FD X rarr Y in E and checking whether X+ includes Y
Two sets of FDs E and F are equivalent if E covers F and F covers E That is E+= F+
Minimal Set of FDs
A set of FDs F is minimal if it satisfies these conditions Every FD in F has a single attribute on its right-hand side We cannot replace any FD X rarr A in F with Y rarr A where Y
X and still have a set of dependencies that is equivalent to F
We cannot remove any FD and still have a set that is equivalent to F
A minimal sets of FDs is like a canonical( 规范的 ) form of FDs
Note that there may be many minimal covers for a given set of functional dependencies F
Normal Forms
A relation is in a particular normal form if it satisfies certain
normalization properties
There are several normal forms defined 1NF - First Normal Form 2NF - Second Normal Form 3NF - Third Normal Form BCNF - Boyce-Codd Normal Form 4NF - Fourth Normal Form 5NF - Fifth Normal Form
Each of these normal forms are stricter than the next
For example 3NF is better than 2NF because it removes more redundancyanomalies from the schema than 2NF
Normal Forms
All relations (non-normalized)
1NF (First Normal Form)
2NF (Second Normal Form)
3NF (Third Normal Form)
BCNF (Boyce-Codd NF)
4NF (Fourth NF)
5NF (Fifth NF)
First Normal Form (1NF)
A relation is in first normal form (1NF) if all its attribute values are atomic
That is a 1NF relation cannot have an attribute value that is a set of values (multi-valued attribute) a set of tuples (nested relation)
1NF is a standard assumption in relational DBMSs However object-oriented DBMSs and nested relational DBMSs relax t
his constraint
A relation that is not in 1NF is an unnormalized relation
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1P2
AnalystAnalyst
246
E3 A Lee P3P4
ConsultantEngineer
1048
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1 P2 Analyst Analyst 24 6
E3 A Lee P3P4 Consultant Engineer 10 48
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
A non-1NF Relation
One way to convert a non-1NF relation to a 1NF relation
1) Splitting Method - Divide the existing relation into two relations one with non-repeating attributes and the other with repeating attributes
Make a new relation consisting of the primary key of the original relation and the repeating attributes Determine a primary key for this new relation
Remove the repeating attributes from the original relation
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1P2
AnalystAnalyst
246
E3 A Lee P3P4
ConsultantEngineer
1048
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
ENo EName
E1 J Doe
E2 M Smith
E3 A Lee
E4 J Miller
E5 B Casey
E6 L Chu
E7 R Davis
ENo PNo Resp DurationE1 P1 Manager 12E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36
Second Normal Form (2NF)
A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key
If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation
Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key
Second Normal Form (2NF) Example
fd1 and fd4 are partial functional dependencies Normalize to
Emp (eno ename title bdate salary supereno dno)
WorksOn (eno pno resp hours)
Proj (pno pname budget)
ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration
EmpProj relation
fd1 fd2
fd3fd4
Second Normal Form (2NF) Example (2)
ENo EName
BDate Title Salary SuperNo
DNo
Emp relation
fd1
fd2
fd3fd4
PNo PName Budget
Proj relationENo PNo Resp Duration
WorkOn relation
Third Normal Form (3NF)
Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th
an 2 FDs
A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key
Converting a relation to 3NF from 2NF involves the removal of transitive dependencies
Third Normal Form (3NF) Example
fd2 results in a transitive dependency eno rarr salary Remove it
ENo EName BDate Title Salary SuperNo DNo
Emp relation
fd1
fd2
fd2
Title Salary
Pay relation
ENo EName BDate Title SuperNo DNo
Emp relation
fd1
General Definitions of 2NF and 3NF
We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)
General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key
General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key
Note that a prime attribute is an attribute that is in any key (candidate or primary)
General Definition of 3NF Example
The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute
However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key
Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key
ENo EName BDate Title Salary SuperNo DNo SSN
Emp relation
fd1
fd2
Normalization Question
Consider the universal relation
R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ
List the keys for R
Decompose R into 2NF and then 3NF relations
Boyce-Codd Normal Form (BCNF)
A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key
To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key
The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
ClientInterview relation
Relation is in 3NF but not BCNF
fd4
fd3
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
Note that we lose the FD fd2
fd4
fd3
ClientNo InterviewDate
InterviwTime
StaffNo StaffNo InterviwDate
RoomNo
Interview relation StaffRoom relation
BCNF versus 3NF
We can decompose to BCNF but sometimes we do not want to if we lose a FD
The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency
Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)
In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)
Conversion to BCNF
There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y
Note that if any FD in F+ violates BCNF then a FD in F violates BCNF
2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey
3) Replace R by relations with schemas R1 = X+
R2 = (R ndash X+) U X
4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib
utes are all in R1 or R2
Normalization Question
Given this database schema normalize into BCNF
fd1
PID Country Lot Area Price TaxRate
fd2
fd4fd3
Multi-Valued Dependencies
A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema
A MVD occurs when two independent 1N relationships are in the relational schema
Multi-Valued Dependencies Example
Eno ename Title Salary Pno dno
E1 J Doe EE 30000 P1P2
E2 M Smith SA 50000 P1P4 D1D2D3
E3 A Lee ME 40000 P1P3 D2D3
E4 J Miller PR 20000 P3 D3
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1
E1 J Doe EE 30000 P2
E2 M Smith SA 50000 P1 D1
E2 M Smith SA 50000 P1 D2
E2 M Smith SA 50000 P1 D3
E2 M Smith SA 50000 P4 D1
E2 M Smith SA 50000 P4 D2
E2 M Smith SA 50000 P4 D3
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P3 D3
E3 A Lee ME 40000 P3 D3
E4 J Miller PR 20000 P3 D3
Multi-Valued Dependencies (MVDs)
A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C
A trivial MVD A mdashgtgt B occurs when either B is a subset of A or
A cup B = R
Fourth Normal Form (4NF)
Fourth normal form (4NF) is based on the idea of multi-valued dependencies
A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies
Fourth Normal Form (4NF) Example
eno mdashgtgtpno
eno mdashgtgtdno
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3
eno ename title salary
E1 J Doe EE 30000
E2 MSmith SA 50000
E3 A Lee ME 40000
E4 J Miller PR 20000
eno pno
E1 P1
E1 P2
E2 P1
E2 P4
E3 P1
E3 P3
E4 P3
eno dno
E2 D1
E2 D2
E2 D3
E3 D2
E3 D3
E4 D3
Lossless-join Dependency
The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation
A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined
Fifth Normal Form (5NF)
bull Fifth normal form (5NF) is based on join dependencies
A relation is in fifth normal form (5NF) if the relation has no join dependency
bull 5NF is also called project-join normal form (PJNF)
A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R
The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have
ΠR1(r) ΠR2(r) hellip ΠRn(r) = r
Fifth Normal Form (5NF) Example
Consider a relation Supply (sname partName projName)
1048715Add the additional constraint that
If project j requires part p
and supplier s supplies part p
and supplier s supplies at least one item to project j
Then supplier s also supplies part p to project j
Original Supply Relation Supply Relation with new constraintsname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
Fifth Normal Form (5NF) Example
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
sname partName
Smith Bolt
Smith Nut
Adam Bolt
Walton Nut
Adam Nail
sname projName
Smith X
Smith Y
Adam Y
Walton Z
Adam X
partName projName
Bolt X
Nut Y
Bolt Y
Nut Z
Nail X
Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples
Desirable Properties in normalization
Relational schemas that are well-designed have several
important properties
1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or
relationship
2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins
3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations
Normal Forms in Practice
Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used
For example query execution time may increase because of normalization as more joins become necessary to answer queries
Normal Forms in Practice Example
For example street and city uniquely determine a zipcode
In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed
When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly
fd1
ENo City Street ZipCode
fd2
ENo ZipCode
ZipCode City Street
Normal Forms in Practice Summary
In summary normalization is typically used in two ways
To improve on relational schemas after ER design
To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies
There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance
This is the classic time versus space tradeoff
Conclusion
Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies
Normal forms indicate when relations satisfy certain properties
1NF - All attributes are atomic
2NF - All attributes are fully functionally dependent on a key
3NF - There are no transitive dependencies in the relation
4NF - There are no multi-valued dependencies in the relation
5NF - The relation has no join dependency
In practice normalization is used to improve schemas produced after ER design and existing relational schemas
It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time
Objectives
Explain process of normalization and why it is beneficial
Describe the concept of the Universal Relation
What are the motivations for normalization
What are the 3 types of update anomalies Be able to give examples
Describe the lossless-join property and dependency preservation property
Define and explain the concept of a functional dependency
What is a trivial FD
Describe the relationship between FDs and keys
What are Armstrongs Axioms When are they used
What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs
Objectives (2)
Define minimal and covers for a set of FDs
Define normal forms be able to list the normal forms and draw a diagram showing their relationship
Define 1NF Know how to convert non-1NF relation to 1NF
Define 2NF Convert 1NF relation to 2NF
Define 3NF Convert 2NF relation to 3NF
Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example
Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm
Describe multi-valued dependencies When do they occur
What is a trivial MVD
Define 4NF Convert 3NFBCNF relation to 4NF
Objectives (3)
Describe lossless-join property and how it relates to 5NF
Define 5NF
Explain normalization in terms of the time versus space tradeoff
Explain the relationship between normalization and ER modeling
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Slide 45
- Slide 46
- Slide 47
- Slide 48
- Slide 49
- Slide 50
- Slide 51
- Slide 52
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- Slide 60
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
-
Using Attribute Closureto Find all FDs Example (cont)
Next the two element sets Do not compute AB+ AC+ AD+ because A+ contained all attributes of R New FDs AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C BC+ = BCD BD+ = BD CD+ = CD New FD BC rarr DThree element sets Do not compute ABC+ ABD+ ACD+ New FDs ABC rarr D ABD rarr C ACD rarr B BCD+ = BCD No new FDsAll FDs F+ = A rarr BC C rarr D A rarr D BC rarr D AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C ABC rarr D ABD rarrC ACD rarr B
FD Closure Questions
1) Compute all nontrivial FDs and list all keys of R
R = (ABCD)
F = AB rarr C C rarr D D rarr A
2) Compute all nontrivial FDs and list all candidate keys of R
R = (ABCD)
F = AB rarr C BC rarr D CD rarr A AD rarr B
Equivalent Sets of FDs
A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+ We can determine if F covers E by calculating X+ with respect to F for each FD X rarr Y in E and checking whether X+ includes Y
Two sets of FDs E and F are equivalent if E covers F and F covers E That is E+= F+
Minimal Set of FDs
A set of FDs F is minimal if it satisfies these conditions Every FD in F has a single attribute on its right-hand side We cannot replace any FD X rarr A in F with Y rarr A where Y
X and still have a set of dependencies that is equivalent to F
We cannot remove any FD and still have a set that is equivalent to F
A minimal sets of FDs is like a canonical( 规范的 ) form of FDs
Note that there may be many minimal covers for a given set of functional dependencies F
Normal Forms
A relation is in a particular normal form if it satisfies certain
normalization properties
There are several normal forms defined 1NF - First Normal Form 2NF - Second Normal Form 3NF - Third Normal Form BCNF - Boyce-Codd Normal Form 4NF - Fourth Normal Form 5NF - Fifth Normal Form
Each of these normal forms are stricter than the next
For example 3NF is better than 2NF because it removes more redundancyanomalies from the schema than 2NF
Normal Forms
All relations (non-normalized)
1NF (First Normal Form)
2NF (Second Normal Form)
3NF (Third Normal Form)
BCNF (Boyce-Codd NF)
4NF (Fourth NF)
5NF (Fifth NF)
First Normal Form (1NF)
A relation is in first normal form (1NF) if all its attribute values are atomic
That is a 1NF relation cannot have an attribute value that is a set of values (multi-valued attribute) a set of tuples (nested relation)
1NF is a standard assumption in relational DBMSs However object-oriented DBMSs and nested relational DBMSs relax t
his constraint
A relation that is not in 1NF is an unnormalized relation
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1P2
AnalystAnalyst
246
E3 A Lee P3P4
ConsultantEngineer
1048
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1 P2 Analyst Analyst 24 6
E3 A Lee P3P4 Consultant Engineer 10 48
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
A non-1NF Relation
One way to convert a non-1NF relation to a 1NF relation
1) Splitting Method - Divide the existing relation into two relations one with non-repeating attributes and the other with repeating attributes
Make a new relation consisting of the primary key of the original relation and the repeating attributes Determine a primary key for this new relation
Remove the repeating attributes from the original relation
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1P2
AnalystAnalyst
246
E3 A Lee P3P4
ConsultantEngineer
1048
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
ENo EName
E1 J Doe
E2 M Smith
E3 A Lee
E4 J Miller
E5 B Casey
E6 L Chu
E7 R Davis
ENo PNo Resp DurationE1 P1 Manager 12E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36
Second Normal Form (2NF)
A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key
If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation
Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key
Second Normal Form (2NF) Example
fd1 and fd4 are partial functional dependencies Normalize to
Emp (eno ename title bdate salary supereno dno)
WorksOn (eno pno resp hours)
Proj (pno pname budget)
ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration
EmpProj relation
fd1 fd2
fd3fd4
Second Normal Form (2NF) Example (2)
ENo EName
BDate Title Salary SuperNo
DNo
Emp relation
fd1
fd2
fd3fd4
PNo PName Budget
Proj relationENo PNo Resp Duration
WorkOn relation
Third Normal Form (3NF)
Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th
an 2 FDs
A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key
Converting a relation to 3NF from 2NF involves the removal of transitive dependencies
Third Normal Form (3NF) Example
fd2 results in a transitive dependency eno rarr salary Remove it
ENo EName BDate Title Salary SuperNo DNo
Emp relation
fd1
fd2
fd2
Title Salary
Pay relation
ENo EName BDate Title SuperNo DNo
Emp relation
fd1
General Definitions of 2NF and 3NF
We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)
General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key
General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key
Note that a prime attribute is an attribute that is in any key (candidate or primary)
General Definition of 3NF Example
The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute
However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key
Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key
ENo EName BDate Title Salary SuperNo DNo SSN
Emp relation
fd1
fd2
Normalization Question
Consider the universal relation
R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ
List the keys for R
Decompose R into 2NF and then 3NF relations
Boyce-Codd Normal Form (BCNF)
A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key
To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key
The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
ClientInterview relation
Relation is in 3NF but not BCNF
fd4
fd3
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
Note that we lose the FD fd2
fd4
fd3
ClientNo InterviewDate
InterviwTime
StaffNo StaffNo InterviwDate
RoomNo
Interview relation StaffRoom relation
BCNF versus 3NF
We can decompose to BCNF but sometimes we do not want to if we lose a FD
The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency
Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)
In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)
Conversion to BCNF
There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y
Note that if any FD in F+ violates BCNF then a FD in F violates BCNF
2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey
3) Replace R by relations with schemas R1 = X+
R2 = (R ndash X+) U X
4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib
utes are all in R1 or R2
Normalization Question
Given this database schema normalize into BCNF
fd1
PID Country Lot Area Price TaxRate
fd2
fd4fd3
Multi-Valued Dependencies
A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema
A MVD occurs when two independent 1N relationships are in the relational schema
Multi-Valued Dependencies Example
Eno ename Title Salary Pno dno
E1 J Doe EE 30000 P1P2
E2 M Smith SA 50000 P1P4 D1D2D3
E3 A Lee ME 40000 P1P3 D2D3
E4 J Miller PR 20000 P3 D3
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1
E1 J Doe EE 30000 P2
E2 M Smith SA 50000 P1 D1
E2 M Smith SA 50000 P1 D2
E2 M Smith SA 50000 P1 D3
E2 M Smith SA 50000 P4 D1
E2 M Smith SA 50000 P4 D2
E2 M Smith SA 50000 P4 D3
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P3 D3
E3 A Lee ME 40000 P3 D3
E4 J Miller PR 20000 P3 D3
Multi-Valued Dependencies (MVDs)
A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C
A trivial MVD A mdashgtgt B occurs when either B is a subset of A or
A cup B = R
Fourth Normal Form (4NF)
Fourth normal form (4NF) is based on the idea of multi-valued dependencies
A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies
Fourth Normal Form (4NF) Example
eno mdashgtgtpno
eno mdashgtgtdno
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3
eno ename title salary
E1 J Doe EE 30000
E2 MSmith SA 50000
E3 A Lee ME 40000
E4 J Miller PR 20000
eno pno
E1 P1
E1 P2
E2 P1
E2 P4
E3 P1
E3 P3
E4 P3
eno dno
E2 D1
E2 D2
E2 D3
E3 D2
E3 D3
E4 D3
Lossless-join Dependency
The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation
A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined
Fifth Normal Form (5NF)
bull Fifth normal form (5NF) is based on join dependencies
A relation is in fifth normal form (5NF) if the relation has no join dependency
bull 5NF is also called project-join normal form (PJNF)
A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R
The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have
ΠR1(r) ΠR2(r) hellip ΠRn(r) = r
Fifth Normal Form (5NF) Example
Consider a relation Supply (sname partName projName)
1048715Add the additional constraint that
If project j requires part p
and supplier s supplies part p
and supplier s supplies at least one item to project j
Then supplier s also supplies part p to project j
Original Supply Relation Supply Relation with new constraintsname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
Fifth Normal Form (5NF) Example
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
sname partName
Smith Bolt
Smith Nut
Adam Bolt
Walton Nut
Adam Nail
sname projName
Smith X
Smith Y
Adam Y
Walton Z
Adam X
partName projName
Bolt X
Nut Y
Bolt Y
Nut Z
Nail X
Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples
Desirable Properties in normalization
Relational schemas that are well-designed have several
important properties
1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or
relationship
2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins
3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations
Normal Forms in Practice
Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used
For example query execution time may increase because of normalization as more joins become necessary to answer queries
Normal Forms in Practice Example
For example street and city uniquely determine a zipcode
In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed
When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly
fd1
ENo City Street ZipCode
fd2
ENo ZipCode
ZipCode City Street
Normal Forms in Practice Summary
In summary normalization is typically used in two ways
To improve on relational schemas after ER design
To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies
There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance
This is the classic time versus space tradeoff
Conclusion
Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies
Normal forms indicate when relations satisfy certain properties
1NF - All attributes are atomic
2NF - All attributes are fully functionally dependent on a key
3NF - There are no transitive dependencies in the relation
4NF - There are no multi-valued dependencies in the relation
5NF - The relation has no join dependency
In practice normalization is used to improve schemas produced after ER design and existing relational schemas
It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time
Objectives
Explain process of normalization and why it is beneficial
Describe the concept of the Universal Relation
What are the motivations for normalization
What are the 3 types of update anomalies Be able to give examples
Describe the lossless-join property and dependency preservation property
Define and explain the concept of a functional dependency
What is a trivial FD
Describe the relationship between FDs and keys
What are Armstrongs Axioms When are they used
What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs
Objectives (2)
Define minimal and covers for a set of FDs
Define normal forms be able to list the normal forms and draw a diagram showing their relationship
Define 1NF Know how to convert non-1NF relation to 1NF
Define 2NF Convert 1NF relation to 2NF
Define 3NF Convert 2NF relation to 3NF
Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example
Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm
Describe multi-valued dependencies When do they occur
What is a trivial MVD
Define 4NF Convert 3NFBCNF relation to 4NF
Objectives (3)
Describe lossless-join property and how it relates to 5NF
Define 5NF
Explain normalization in terms of the time versus space tradeoff
Explain the relationship between normalization and ER modeling
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Slide 45
- Slide 46
- Slide 47
- Slide 48
- Slide 49
- Slide 50
- Slide 51
- Slide 52
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- Slide 60
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
-
FD Closure Questions
1) Compute all nontrivial FDs and list all keys of R
R = (ABCD)
F = AB rarr C C rarr D D rarr A
2) Compute all nontrivial FDs and list all candidate keys of R
R = (ABCD)
F = AB rarr C BC rarr D CD rarr A AD rarr B
Equivalent Sets of FDs
A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+ We can determine if F covers E by calculating X+ with respect to F for each FD X rarr Y in E and checking whether X+ includes Y
Two sets of FDs E and F are equivalent if E covers F and F covers E That is E+= F+
Minimal Set of FDs
A set of FDs F is minimal if it satisfies these conditions Every FD in F has a single attribute on its right-hand side We cannot replace any FD X rarr A in F with Y rarr A where Y
X and still have a set of dependencies that is equivalent to F
We cannot remove any FD and still have a set that is equivalent to F
A minimal sets of FDs is like a canonical( 规范的 ) form of FDs
Note that there may be many minimal covers for a given set of functional dependencies F
Normal Forms
A relation is in a particular normal form if it satisfies certain
normalization properties
There are several normal forms defined 1NF - First Normal Form 2NF - Second Normal Form 3NF - Third Normal Form BCNF - Boyce-Codd Normal Form 4NF - Fourth Normal Form 5NF - Fifth Normal Form
Each of these normal forms are stricter than the next
For example 3NF is better than 2NF because it removes more redundancyanomalies from the schema than 2NF
Normal Forms
All relations (non-normalized)
1NF (First Normal Form)
2NF (Second Normal Form)
3NF (Third Normal Form)
BCNF (Boyce-Codd NF)
4NF (Fourth NF)
5NF (Fifth NF)
First Normal Form (1NF)
A relation is in first normal form (1NF) if all its attribute values are atomic
That is a 1NF relation cannot have an attribute value that is a set of values (multi-valued attribute) a set of tuples (nested relation)
1NF is a standard assumption in relational DBMSs However object-oriented DBMSs and nested relational DBMSs relax t
his constraint
A relation that is not in 1NF is an unnormalized relation
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1P2
AnalystAnalyst
246
E3 A Lee P3P4
ConsultantEngineer
1048
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1 P2 Analyst Analyst 24 6
E3 A Lee P3P4 Consultant Engineer 10 48
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
A non-1NF Relation
One way to convert a non-1NF relation to a 1NF relation
1) Splitting Method - Divide the existing relation into two relations one with non-repeating attributes and the other with repeating attributes
Make a new relation consisting of the primary key of the original relation and the repeating attributes Determine a primary key for this new relation
Remove the repeating attributes from the original relation
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1P2
AnalystAnalyst
246
E3 A Lee P3P4
ConsultantEngineer
1048
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
ENo EName
E1 J Doe
E2 M Smith
E3 A Lee
E4 J Miller
E5 B Casey
E6 L Chu
E7 R Davis
ENo PNo Resp DurationE1 P1 Manager 12E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36
Second Normal Form (2NF)
A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key
If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation
Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key
Second Normal Form (2NF) Example
fd1 and fd4 are partial functional dependencies Normalize to
Emp (eno ename title bdate salary supereno dno)
WorksOn (eno pno resp hours)
Proj (pno pname budget)
ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration
EmpProj relation
fd1 fd2
fd3fd4
Second Normal Form (2NF) Example (2)
ENo EName
BDate Title Salary SuperNo
DNo
Emp relation
fd1
fd2
fd3fd4
PNo PName Budget
Proj relationENo PNo Resp Duration
WorkOn relation
Third Normal Form (3NF)
Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th
an 2 FDs
A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key
Converting a relation to 3NF from 2NF involves the removal of transitive dependencies
Third Normal Form (3NF) Example
fd2 results in a transitive dependency eno rarr salary Remove it
ENo EName BDate Title Salary SuperNo DNo
Emp relation
fd1
fd2
fd2
Title Salary
Pay relation
ENo EName BDate Title SuperNo DNo
Emp relation
fd1
General Definitions of 2NF and 3NF
We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)
General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key
General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key
Note that a prime attribute is an attribute that is in any key (candidate or primary)
General Definition of 3NF Example
The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute
However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key
Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key
ENo EName BDate Title Salary SuperNo DNo SSN
Emp relation
fd1
fd2
Normalization Question
Consider the universal relation
R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ
List the keys for R
Decompose R into 2NF and then 3NF relations
Boyce-Codd Normal Form (BCNF)
A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key
To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key
The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
ClientInterview relation
Relation is in 3NF but not BCNF
fd4
fd3
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
Note that we lose the FD fd2
fd4
fd3
ClientNo InterviewDate
InterviwTime
StaffNo StaffNo InterviwDate
RoomNo
Interview relation StaffRoom relation
BCNF versus 3NF
We can decompose to BCNF but sometimes we do not want to if we lose a FD
The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency
Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)
In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)
Conversion to BCNF
There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y
Note that if any FD in F+ violates BCNF then a FD in F violates BCNF
2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey
3) Replace R by relations with schemas R1 = X+
R2 = (R ndash X+) U X
4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib
utes are all in R1 or R2
Normalization Question
Given this database schema normalize into BCNF
fd1
PID Country Lot Area Price TaxRate
fd2
fd4fd3
Multi-Valued Dependencies
A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema
A MVD occurs when two independent 1N relationships are in the relational schema
Multi-Valued Dependencies Example
Eno ename Title Salary Pno dno
E1 J Doe EE 30000 P1P2
E2 M Smith SA 50000 P1P4 D1D2D3
E3 A Lee ME 40000 P1P3 D2D3
E4 J Miller PR 20000 P3 D3
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1
E1 J Doe EE 30000 P2
E2 M Smith SA 50000 P1 D1
E2 M Smith SA 50000 P1 D2
E2 M Smith SA 50000 P1 D3
E2 M Smith SA 50000 P4 D1
E2 M Smith SA 50000 P4 D2
E2 M Smith SA 50000 P4 D3
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P3 D3
E3 A Lee ME 40000 P3 D3
E4 J Miller PR 20000 P3 D3
Multi-Valued Dependencies (MVDs)
A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C
A trivial MVD A mdashgtgt B occurs when either B is a subset of A or
A cup B = R
Fourth Normal Form (4NF)
Fourth normal form (4NF) is based on the idea of multi-valued dependencies
A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies
Fourth Normal Form (4NF) Example
eno mdashgtgtpno
eno mdashgtgtdno
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3
eno ename title salary
E1 J Doe EE 30000
E2 MSmith SA 50000
E3 A Lee ME 40000
E4 J Miller PR 20000
eno pno
E1 P1
E1 P2
E2 P1
E2 P4
E3 P1
E3 P3
E4 P3
eno dno
E2 D1
E2 D2
E2 D3
E3 D2
E3 D3
E4 D3
Lossless-join Dependency
The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation
A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined
Fifth Normal Form (5NF)
bull Fifth normal form (5NF) is based on join dependencies
A relation is in fifth normal form (5NF) if the relation has no join dependency
bull 5NF is also called project-join normal form (PJNF)
A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R
The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have
ΠR1(r) ΠR2(r) hellip ΠRn(r) = r
Fifth Normal Form (5NF) Example
Consider a relation Supply (sname partName projName)
1048715Add the additional constraint that
If project j requires part p
and supplier s supplies part p
and supplier s supplies at least one item to project j
Then supplier s also supplies part p to project j
Original Supply Relation Supply Relation with new constraintsname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
Fifth Normal Form (5NF) Example
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
sname partName
Smith Bolt
Smith Nut
Adam Bolt
Walton Nut
Adam Nail
sname projName
Smith X
Smith Y
Adam Y
Walton Z
Adam X
partName projName
Bolt X
Nut Y
Bolt Y
Nut Z
Nail X
Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples
Desirable Properties in normalization
Relational schemas that are well-designed have several
important properties
1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or
relationship
2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins
3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations
Normal Forms in Practice
Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used
For example query execution time may increase because of normalization as more joins become necessary to answer queries
Normal Forms in Practice Example
For example street and city uniquely determine a zipcode
In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed
When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly
fd1
ENo City Street ZipCode
fd2
ENo ZipCode
ZipCode City Street
Normal Forms in Practice Summary
In summary normalization is typically used in two ways
To improve on relational schemas after ER design
To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies
There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance
This is the classic time versus space tradeoff
Conclusion
Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies
Normal forms indicate when relations satisfy certain properties
1NF - All attributes are atomic
2NF - All attributes are fully functionally dependent on a key
3NF - There are no transitive dependencies in the relation
4NF - There are no multi-valued dependencies in the relation
5NF - The relation has no join dependency
In practice normalization is used to improve schemas produced after ER design and existing relational schemas
It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time
Objectives
Explain process of normalization and why it is beneficial
Describe the concept of the Universal Relation
What are the motivations for normalization
What are the 3 types of update anomalies Be able to give examples
Describe the lossless-join property and dependency preservation property
Define and explain the concept of a functional dependency
What is a trivial FD
Describe the relationship between FDs and keys
What are Armstrongs Axioms When are they used
What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs
Objectives (2)
Define minimal and covers for a set of FDs
Define normal forms be able to list the normal forms and draw a diagram showing their relationship
Define 1NF Know how to convert non-1NF relation to 1NF
Define 2NF Convert 1NF relation to 2NF
Define 3NF Convert 2NF relation to 3NF
Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example
Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm
Describe multi-valued dependencies When do they occur
What is a trivial MVD
Define 4NF Convert 3NFBCNF relation to 4NF
Objectives (3)
Describe lossless-join property and how it relates to 5NF
Define 5NF
Explain normalization in terms of the time versus space tradeoff
Explain the relationship between normalization and ER modeling
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Slide 45
- Slide 46
- Slide 47
- Slide 48
- Slide 49
- Slide 50
- Slide 51
- Slide 52
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- Slide 60
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
-
Equivalent Sets of FDs
A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+ We can determine if F covers E by calculating X+ with respect to F for each FD X rarr Y in E and checking whether X+ includes Y
Two sets of FDs E and F are equivalent if E covers F and F covers E That is E+= F+
Minimal Set of FDs
A set of FDs F is minimal if it satisfies these conditions Every FD in F has a single attribute on its right-hand side We cannot replace any FD X rarr A in F with Y rarr A where Y
X and still have a set of dependencies that is equivalent to F
We cannot remove any FD and still have a set that is equivalent to F
A minimal sets of FDs is like a canonical( 规范的 ) form of FDs
Note that there may be many minimal covers for a given set of functional dependencies F
Normal Forms
A relation is in a particular normal form if it satisfies certain
normalization properties
There are several normal forms defined 1NF - First Normal Form 2NF - Second Normal Form 3NF - Third Normal Form BCNF - Boyce-Codd Normal Form 4NF - Fourth Normal Form 5NF - Fifth Normal Form
Each of these normal forms are stricter than the next
For example 3NF is better than 2NF because it removes more redundancyanomalies from the schema than 2NF
Normal Forms
All relations (non-normalized)
1NF (First Normal Form)
2NF (Second Normal Form)
3NF (Third Normal Form)
BCNF (Boyce-Codd NF)
4NF (Fourth NF)
5NF (Fifth NF)
First Normal Form (1NF)
A relation is in first normal form (1NF) if all its attribute values are atomic
That is a 1NF relation cannot have an attribute value that is a set of values (multi-valued attribute) a set of tuples (nested relation)
1NF is a standard assumption in relational DBMSs However object-oriented DBMSs and nested relational DBMSs relax t
his constraint
A relation that is not in 1NF is an unnormalized relation
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1P2
AnalystAnalyst
246
E3 A Lee P3P4
ConsultantEngineer
1048
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1 P2 Analyst Analyst 24 6
E3 A Lee P3P4 Consultant Engineer 10 48
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
A non-1NF Relation
One way to convert a non-1NF relation to a 1NF relation
1) Splitting Method - Divide the existing relation into two relations one with non-repeating attributes and the other with repeating attributes
Make a new relation consisting of the primary key of the original relation and the repeating attributes Determine a primary key for this new relation
Remove the repeating attributes from the original relation
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1P2
AnalystAnalyst
246
E3 A Lee P3P4
ConsultantEngineer
1048
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
ENo EName
E1 J Doe
E2 M Smith
E3 A Lee
E4 J Miller
E5 B Casey
E6 L Chu
E7 R Davis
ENo PNo Resp DurationE1 P1 Manager 12E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36
Second Normal Form (2NF)
A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key
If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation
Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key
Second Normal Form (2NF) Example
fd1 and fd4 are partial functional dependencies Normalize to
Emp (eno ename title bdate salary supereno dno)
WorksOn (eno pno resp hours)
Proj (pno pname budget)
ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration
EmpProj relation
fd1 fd2
fd3fd4
Second Normal Form (2NF) Example (2)
ENo EName
BDate Title Salary SuperNo
DNo
Emp relation
fd1
fd2
fd3fd4
PNo PName Budget
Proj relationENo PNo Resp Duration
WorkOn relation
Third Normal Form (3NF)
Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th
an 2 FDs
A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key
Converting a relation to 3NF from 2NF involves the removal of transitive dependencies
Third Normal Form (3NF) Example
fd2 results in a transitive dependency eno rarr salary Remove it
ENo EName BDate Title Salary SuperNo DNo
Emp relation
fd1
fd2
fd2
Title Salary
Pay relation
ENo EName BDate Title SuperNo DNo
Emp relation
fd1
General Definitions of 2NF and 3NF
We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)
General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key
General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key
Note that a prime attribute is an attribute that is in any key (candidate or primary)
General Definition of 3NF Example
The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute
However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key
Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key
ENo EName BDate Title Salary SuperNo DNo SSN
Emp relation
fd1
fd2
Normalization Question
Consider the universal relation
R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ
List the keys for R
Decompose R into 2NF and then 3NF relations
Boyce-Codd Normal Form (BCNF)
A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key
To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key
The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
ClientInterview relation
Relation is in 3NF but not BCNF
fd4
fd3
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
Note that we lose the FD fd2
fd4
fd3
ClientNo InterviewDate
InterviwTime
StaffNo StaffNo InterviwDate
RoomNo
Interview relation StaffRoom relation
BCNF versus 3NF
We can decompose to BCNF but sometimes we do not want to if we lose a FD
The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency
Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)
In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)
Conversion to BCNF
There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y
Note that if any FD in F+ violates BCNF then a FD in F violates BCNF
2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey
3) Replace R by relations with schemas R1 = X+
R2 = (R ndash X+) U X
4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib
utes are all in R1 or R2
Normalization Question
Given this database schema normalize into BCNF
fd1
PID Country Lot Area Price TaxRate
fd2
fd4fd3
Multi-Valued Dependencies
A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema
A MVD occurs when two independent 1N relationships are in the relational schema
Multi-Valued Dependencies Example
Eno ename Title Salary Pno dno
E1 J Doe EE 30000 P1P2
E2 M Smith SA 50000 P1P4 D1D2D3
E3 A Lee ME 40000 P1P3 D2D3
E4 J Miller PR 20000 P3 D3
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1
E1 J Doe EE 30000 P2
E2 M Smith SA 50000 P1 D1
E2 M Smith SA 50000 P1 D2
E2 M Smith SA 50000 P1 D3
E2 M Smith SA 50000 P4 D1
E2 M Smith SA 50000 P4 D2
E2 M Smith SA 50000 P4 D3
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P3 D3
E3 A Lee ME 40000 P3 D3
E4 J Miller PR 20000 P3 D3
Multi-Valued Dependencies (MVDs)
A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C
A trivial MVD A mdashgtgt B occurs when either B is a subset of A or
A cup B = R
Fourth Normal Form (4NF)
Fourth normal form (4NF) is based on the idea of multi-valued dependencies
A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies
Fourth Normal Form (4NF) Example
eno mdashgtgtpno
eno mdashgtgtdno
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3
eno ename title salary
E1 J Doe EE 30000
E2 MSmith SA 50000
E3 A Lee ME 40000
E4 J Miller PR 20000
eno pno
E1 P1
E1 P2
E2 P1
E2 P4
E3 P1
E3 P3
E4 P3
eno dno
E2 D1
E2 D2
E2 D3
E3 D2
E3 D3
E4 D3
Lossless-join Dependency
The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation
A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined
Fifth Normal Form (5NF)
bull Fifth normal form (5NF) is based on join dependencies
A relation is in fifth normal form (5NF) if the relation has no join dependency
bull 5NF is also called project-join normal form (PJNF)
A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R
The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have
ΠR1(r) ΠR2(r) hellip ΠRn(r) = r
Fifth Normal Form (5NF) Example
Consider a relation Supply (sname partName projName)
1048715Add the additional constraint that
If project j requires part p
and supplier s supplies part p
and supplier s supplies at least one item to project j
Then supplier s also supplies part p to project j
Original Supply Relation Supply Relation with new constraintsname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
Fifth Normal Form (5NF) Example
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
sname partName
Smith Bolt
Smith Nut
Adam Bolt
Walton Nut
Adam Nail
sname projName
Smith X
Smith Y
Adam Y
Walton Z
Adam X
partName projName
Bolt X
Nut Y
Bolt Y
Nut Z
Nail X
Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples
Desirable Properties in normalization
Relational schemas that are well-designed have several
important properties
1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or
relationship
2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins
3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations
Normal Forms in Practice
Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used
For example query execution time may increase because of normalization as more joins become necessary to answer queries
Normal Forms in Practice Example
For example street and city uniquely determine a zipcode
In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed
When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly
fd1
ENo City Street ZipCode
fd2
ENo ZipCode
ZipCode City Street
Normal Forms in Practice Summary
In summary normalization is typically used in two ways
To improve on relational schemas after ER design
To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies
There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance
This is the classic time versus space tradeoff
Conclusion
Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies
Normal forms indicate when relations satisfy certain properties
1NF - All attributes are atomic
2NF - All attributes are fully functionally dependent on a key
3NF - There are no transitive dependencies in the relation
4NF - There are no multi-valued dependencies in the relation
5NF - The relation has no join dependency
In practice normalization is used to improve schemas produced after ER design and existing relational schemas
It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time
Objectives
Explain process of normalization and why it is beneficial
Describe the concept of the Universal Relation
What are the motivations for normalization
What are the 3 types of update anomalies Be able to give examples
Describe the lossless-join property and dependency preservation property
Define and explain the concept of a functional dependency
What is a trivial FD
Describe the relationship between FDs and keys
What are Armstrongs Axioms When are they used
What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs
Objectives (2)
Define minimal and covers for a set of FDs
Define normal forms be able to list the normal forms and draw a diagram showing their relationship
Define 1NF Know how to convert non-1NF relation to 1NF
Define 2NF Convert 1NF relation to 2NF
Define 3NF Convert 2NF relation to 3NF
Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example
Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm
Describe multi-valued dependencies When do they occur
What is a trivial MVD
Define 4NF Convert 3NFBCNF relation to 4NF
Objectives (3)
Describe lossless-join property and how it relates to 5NF
Define 5NF
Explain normalization in terms of the time versus space tradeoff
Explain the relationship between normalization and ER modeling
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Slide 45
- Slide 46
- Slide 47
- Slide 48
- Slide 49
- Slide 50
- Slide 51
- Slide 52
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- Slide 60
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
-
Minimal Set of FDs
A set of FDs F is minimal if it satisfies these conditions Every FD in F has a single attribute on its right-hand side We cannot replace any FD X rarr A in F with Y rarr A where Y
X and still have a set of dependencies that is equivalent to F
We cannot remove any FD and still have a set that is equivalent to F
A minimal sets of FDs is like a canonical( 规范的 ) form of FDs
Note that there may be many minimal covers for a given set of functional dependencies F
Normal Forms
A relation is in a particular normal form if it satisfies certain
normalization properties
There are several normal forms defined 1NF - First Normal Form 2NF - Second Normal Form 3NF - Third Normal Form BCNF - Boyce-Codd Normal Form 4NF - Fourth Normal Form 5NF - Fifth Normal Form
Each of these normal forms are stricter than the next
For example 3NF is better than 2NF because it removes more redundancyanomalies from the schema than 2NF
Normal Forms
All relations (non-normalized)
1NF (First Normal Form)
2NF (Second Normal Form)
3NF (Third Normal Form)
BCNF (Boyce-Codd NF)
4NF (Fourth NF)
5NF (Fifth NF)
First Normal Form (1NF)
A relation is in first normal form (1NF) if all its attribute values are atomic
That is a 1NF relation cannot have an attribute value that is a set of values (multi-valued attribute) a set of tuples (nested relation)
1NF is a standard assumption in relational DBMSs However object-oriented DBMSs and nested relational DBMSs relax t
his constraint
A relation that is not in 1NF is an unnormalized relation
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1P2
AnalystAnalyst
246
E3 A Lee P3P4
ConsultantEngineer
1048
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1 P2 Analyst Analyst 24 6
E3 A Lee P3P4 Consultant Engineer 10 48
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
A non-1NF Relation
One way to convert a non-1NF relation to a 1NF relation
1) Splitting Method - Divide the existing relation into two relations one with non-repeating attributes and the other with repeating attributes
Make a new relation consisting of the primary key of the original relation and the repeating attributes Determine a primary key for this new relation
Remove the repeating attributes from the original relation
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1P2
AnalystAnalyst
246
E3 A Lee P3P4
ConsultantEngineer
1048
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
ENo EName
E1 J Doe
E2 M Smith
E3 A Lee
E4 J Miller
E5 B Casey
E6 L Chu
E7 R Davis
ENo PNo Resp DurationE1 P1 Manager 12E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36
Second Normal Form (2NF)
A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key
If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation
Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key
Second Normal Form (2NF) Example
fd1 and fd4 are partial functional dependencies Normalize to
Emp (eno ename title bdate salary supereno dno)
WorksOn (eno pno resp hours)
Proj (pno pname budget)
ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration
EmpProj relation
fd1 fd2
fd3fd4
Second Normal Form (2NF) Example (2)
ENo EName
BDate Title Salary SuperNo
DNo
Emp relation
fd1
fd2
fd3fd4
PNo PName Budget
Proj relationENo PNo Resp Duration
WorkOn relation
Third Normal Form (3NF)
Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th
an 2 FDs
A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key
Converting a relation to 3NF from 2NF involves the removal of transitive dependencies
Third Normal Form (3NF) Example
fd2 results in a transitive dependency eno rarr salary Remove it
ENo EName BDate Title Salary SuperNo DNo
Emp relation
fd1
fd2
fd2
Title Salary
Pay relation
ENo EName BDate Title SuperNo DNo
Emp relation
fd1
General Definitions of 2NF and 3NF
We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)
General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key
General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key
Note that a prime attribute is an attribute that is in any key (candidate or primary)
General Definition of 3NF Example
The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute
However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key
Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key
ENo EName BDate Title Salary SuperNo DNo SSN
Emp relation
fd1
fd2
Normalization Question
Consider the universal relation
R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ
List the keys for R
Decompose R into 2NF and then 3NF relations
Boyce-Codd Normal Form (BCNF)
A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key
To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key
The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
ClientInterview relation
Relation is in 3NF but not BCNF
fd4
fd3
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
Note that we lose the FD fd2
fd4
fd3
ClientNo InterviewDate
InterviwTime
StaffNo StaffNo InterviwDate
RoomNo
Interview relation StaffRoom relation
BCNF versus 3NF
We can decompose to BCNF but sometimes we do not want to if we lose a FD
The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency
Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)
In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)
Conversion to BCNF
There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y
Note that if any FD in F+ violates BCNF then a FD in F violates BCNF
2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey
3) Replace R by relations with schemas R1 = X+
R2 = (R ndash X+) U X
4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib
utes are all in R1 or R2
Normalization Question
Given this database schema normalize into BCNF
fd1
PID Country Lot Area Price TaxRate
fd2
fd4fd3
Multi-Valued Dependencies
A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema
A MVD occurs when two independent 1N relationships are in the relational schema
Multi-Valued Dependencies Example
Eno ename Title Salary Pno dno
E1 J Doe EE 30000 P1P2
E2 M Smith SA 50000 P1P4 D1D2D3
E3 A Lee ME 40000 P1P3 D2D3
E4 J Miller PR 20000 P3 D3
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1
E1 J Doe EE 30000 P2
E2 M Smith SA 50000 P1 D1
E2 M Smith SA 50000 P1 D2
E2 M Smith SA 50000 P1 D3
E2 M Smith SA 50000 P4 D1
E2 M Smith SA 50000 P4 D2
E2 M Smith SA 50000 P4 D3
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P3 D3
E3 A Lee ME 40000 P3 D3
E4 J Miller PR 20000 P3 D3
Multi-Valued Dependencies (MVDs)
A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C
A trivial MVD A mdashgtgt B occurs when either B is a subset of A or
A cup B = R
Fourth Normal Form (4NF)
Fourth normal form (4NF) is based on the idea of multi-valued dependencies
A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies
Fourth Normal Form (4NF) Example
eno mdashgtgtpno
eno mdashgtgtdno
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3
eno ename title salary
E1 J Doe EE 30000
E2 MSmith SA 50000
E3 A Lee ME 40000
E4 J Miller PR 20000
eno pno
E1 P1
E1 P2
E2 P1
E2 P4
E3 P1
E3 P3
E4 P3
eno dno
E2 D1
E2 D2
E2 D3
E3 D2
E3 D3
E4 D3
Lossless-join Dependency
The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation
A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined
Fifth Normal Form (5NF)
bull Fifth normal form (5NF) is based on join dependencies
A relation is in fifth normal form (5NF) if the relation has no join dependency
bull 5NF is also called project-join normal form (PJNF)
A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R
The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have
ΠR1(r) ΠR2(r) hellip ΠRn(r) = r
Fifth Normal Form (5NF) Example
Consider a relation Supply (sname partName projName)
1048715Add the additional constraint that
If project j requires part p
and supplier s supplies part p
and supplier s supplies at least one item to project j
Then supplier s also supplies part p to project j
Original Supply Relation Supply Relation with new constraintsname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
Fifth Normal Form (5NF) Example
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
sname partName
Smith Bolt
Smith Nut
Adam Bolt
Walton Nut
Adam Nail
sname projName
Smith X
Smith Y
Adam Y
Walton Z
Adam X
partName projName
Bolt X
Nut Y
Bolt Y
Nut Z
Nail X
Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples
Desirable Properties in normalization
Relational schemas that are well-designed have several
important properties
1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or
relationship
2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins
3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations
Normal Forms in Practice
Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used
For example query execution time may increase because of normalization as more joins become necessary to answer queries
Normal Forms in Practice Example
For example street and city uniquely determine a zipcode
In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed
When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly
fd1
ENo City Street ZipCode
fd2
ENo ZipCode
ZipCode City Street
Normal Forms in Practice Summary
In summary normalization is typically used in two ways
To improve on relational schemas after ER design
To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies
There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance
This is the classic time versus space tradeoff
Conclusion
Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies
Normal forms indicate when relations satisfy certain properties
1NF - All attributes are atomic
2NF - All attributes are fully functionally dependent on a key
3NF - There are no transitive dependencies in the relation
4NF - There are no multi-valued dependencies in the relation
5NF - The relation has no join dependency
In practice normalization is used to improve schemas produced after ER design and existing relational schemas
It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time
Objectives
Explain process of normalization and why it is beneficial
Describe the concept of the Universal Relation
What are the motivations for normalization
What are the 3 types of update anomalies Be able to give examples
Describe the lossless-join property and dependency preservation property
Define and explain the concept of a functional dependency
What is a trivial FD
Describe the relationship between FDs and keys
What are Armstrongs Axioms When are they used
What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs
Objectives (2)
Define minimal and covers for a set of FDs
Define normal forms be able to list the normal forms and draw a diagram showing their relationship
Define 1NF Know how to convert non-1NF relation to 1NF
Define 2NF Convert 1NF relation to 2NF
Define 3NF Convert 2NF relation to 3NF
Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example
Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm
Describe multi-valued dependencies When do they occur
What is a trivial MVD
Define 4NF Convert 3NFBCNF relation to 4NF
Objectives (3)
Describe lossless-join property and how it relates to 5NF
Define 5NF
Explain normalization in terms of the time versus space tradeoff
Explain the relationship between normalization and ER modeling
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Slide 45
- Slide 46
- Slide 47
- Slide 48
- Slide 49
- Slide 50
- Slide 51
- Slide 52
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- Slide 60
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
-
Normal Forms
A relation is in a particular normal form if it satisfies certain
normalization properties
There are several normal forms defined 1NF - First Normal Form 2NF - Second Normal Form 3NF - Third Normal Form BCNF - Boyce-Codd Normal Form 4NF - Fourth Normal Form 5NF - Fifth Normal Form
Each of these normal forms are stricter than the next
For example 3NF is better than 2NF because it removes more redundancyanomalies from the schema than 2NF
Normal Forms
All relations (non-normalized)
1NF (First Normal Form)
2NF (Second Normal Form)
3NF (Third Normal Form)
BCNF (Boyce-Codd NF)
4NF (Fourth NF)
5NF (Fifth NF)
First Normal Form (1NF)
A relation is in first normal form (1NF) if all its attribute values are atomic
That is a 1NF relation cannot have an attribute value that is a set of values (multi-valued attribute) a set of tuples (nested relation)
1NF is a standard assumption in relational DBMSs However object-oriented DBMSs and nested relational DBMSs relax t
his constraint
A relation that is not in 1NF is an unnormalized relation
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1P2
AnalystAnalyst
246
E3 A Lee P3P4
ConsultantEngineer
1048
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1 P2 Analyst Analyst 24 6
E3 A Lee P3P4 Consultant Engineer 10 48
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
A non-1NF Relation
One way to convert a non-1NF relation to a 1NF relation
1) Splitting Method - Divide the existing relation into two relations one with non-repeating attributes and the other with repeating attributes
Make a new relation consisting of the primary key of the original relation and the repeating attributes Determine a primary key for this new relation
Remove the repeating attributes from the original relation
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1P2
AnalystAnalyst
246
E3 A Lee P3P4
ConsultantEngineer
1048
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
ENo EName
E1 J Doe
E2 M Smith
E3 A Lee
E4 J Miller
E5 B Casey
E6 L Chu
E7 R Davis
ENo PNo Resp DurationE1 P1 Manager 12E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36
Second Normal Form (2NF)
A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key
If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation
Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key
Second Normal Form (2NF) Example
fd1 and fd4 are partial functional dependencies Normalize to
Emp (eno ename title bdate salary supereno dno)
WorksOn (eno pno resp hours)
Proj (pno pname budget)
ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration
EmpProj relation
fd1 fd2
fd3fd4
Second Normal Form (2NF) Example (2)
ENo EName
BDate Title Salary SuperNo
DNo
Emp relation
fd1
fd2
fd3fd4
PNo PName Budget
Proj relationENo PNo Resp Duration
WorkOn relation
Third Normal Form (3NF)
Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th
an 2 FDs
A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key
Converting a relation to 3NF from 2NF involves the removal of transitive dependencies
Third Normal Form (3NF) Example
fd2 results in a transitive dependency eno rarr salary Remove it
ENo EName BDate Title Salary SuperNo DNo
Emp relation
fd1
fd2
fd2
Title Salary
Pay relation
ENo EName BDate Title SuperNo DNo
Emp relation
fd1
General Definitions of 2NF and 3NF
We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)
General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key
General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key
Note that a prime attribute is an attribute that is in any key (candidate or primary)
General Definition of 3NF Example
The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute
However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key
Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key
ENo EName BDate Title Salary SuperNo DNo SSN
Emp relation
fd1
fd2
Normalization Question
Consider the universal relation
R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ
List the keys for R
Decompose R into 2NF and then 3NF relations
Boyce-Codd Normal Form (BCNF)
A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key
To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key
The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
ClientInterview relation
Relation is in 3NF but not BCNF
fd4
fd3
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
Note that we lose the FD fd2
fd4
fd3
ClientNo InterviewDate
InterviwTime
StaffNo StaffNo InterviwDate
RoomNo
Interview relation StaffRoom relation
BCNF versus 3NF
We can decompose to BCNF but sometimes we do not want to if we lose a FD
The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency
Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)
In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)
Conversion to BCNF
There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y
Note that if any FD in F+ violates BCNF then a FD in F violates BCNF
2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey
3) Replace R by relations with schemas R1 = X+
R2 = (R ndash X+) U X
4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib
utes are all in R1 or R2
Normalization Question
Given this database schema normalize into BCNF
fd1
PID Country Lot Area Price TaxRate
fd2
fd4fd3
Multi-Valued Dependencies
A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema
A MVD occurs when two independent 1N relationships are in the relational schema
Multi-Valued Dependencies Example
Eno ename Title Salary Pno dno
E1 J Doe EE 30000 P1P2
E2 M Smith SA 50000 P1P4 D1D2D3
E3 A Lee ME 40000 P1P3 D2D3
E4 J Miller PR 20000 P3 D3
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1
E1 J Doe EE 30000 P2
E2 M Smith SA 50000 P1 D1
E2 M Smith SA 50000 P1 D2
E2 M Smith SA 50000 P1 D3
E2 M Smith SA 50000 P4 D1
E2 M Smith SA 50000 P4 D2
E2 M Smith SA 50000 P4 D3
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P3 D3
E3 A Lee ME 40000 P3 D3
E4 J Miller PR 20000 P3 D3
Multi-Valued Dependencies (MVDs)
A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C
A trivial MVD A mdashgtgt B occurs when either B is a subset of A or
A cup B = R
Fourth Normal Form (4NF)
Fourth normal form (4NF) is based on the idea of multi-valued dependencies
A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies
Fourth Normal Form (4NF) Example
eno mdashgtgtpno
eno mdashgtgtdno
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3
eno ename title salary
E1 J Doe EE 30000
E2 MSmith SA 50000
E3 A Lee ME 40000
E4 J Miller PR 20000
eno pno
E1 P1
E1 P2
E2 P1
E2 P4
E3 P1
E3 P3
E4 P3
eno dno
E2 D1
E2 D2
E2 D3
E3 D2
E3 D3
E4 D3
Lossless-join Dependency
The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation
A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined
Fifth Normal Form (5NF)
bull Fifth normal form (5NF) is based on join dependencies
A relation is in fifth normal form (5NF) if the relation has no join dependency
bull 5NF is also called project-join normal form (PJNF)
A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R
The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have
ΠR1(r) ΠR2(r) hellip ΠRn(r) = r
Fifth Normal Form (5NF) Example
Consider a relation Supply (sname partName projName)
1048715Add the additional constraint that
If project j requires part p
and supplier s supplies part p
and supplier s supplies at least one item to project j
Then supplier s also supplies part p to project j
Original Supply Relation Supply Relation with new constraintsname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
Fifth Normal Form (5NF) Example
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
sname partName
Smith Bolt
Smith Nut
Adam Bolt
Walton Nut
Adam Nail
sname projName
Smith X
Smith Y
Adam Y
Walton Z
Adam X
partName projName
Bolt X
Nut Y
Bolt Y
Nut Z
Nail X
Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples
Desirable Properties in normalization
Relational schemas that are well-designed have several
important properties
1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or
relationship
2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins
3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations
Normal Forms in Practice
Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used
For example query execution time may increase because of normalization as more joins become necessary to answer queries
Normal Forms in Practice Example
For example street and city uniquely determine a zipcode
In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed
When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly
fd1
ENo City Street ZipCode
fd2
ENo ZipCode
ZipCode City Street
Normal Forms in Practice Summary
In summary normalization is typically used in two ways
To improve on relational schemas after ER design
To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies
There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance
This is the classic time versus space tradeoff
Conclusion
Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies
Normal forms indicate when relations satisfy certain properties
1NF - All attributes are atomic
2NF - All attributes are fully functionally dependent on a key
3NF - There are no transitive dependencies in the relation
4NF - There are no multi-valued dependencies in the relation
5NF - The relation has no join dependency
In practice normalization is used to improve schemas produced after ER design and existing relational schemas
It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time
Objectives
Explain process of normalization and why it is beneficial
Describe the concept of the Universal Relation
What are the motivations for normalization
What are the 3 types of update anomalies Be able to give examples
Describe the lossless-join property and dependency preservation property
Define and explain the concept of a functional dependency
What is a trivial FD
Describe the relationship between FDs and keys
What are Armstrongs Axioms When are they used
What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs
Objectives (2)
Define minimal and covers for a set of FDs
Define normal forms be able to list the normal forms and draw a diagram showing their relationship
Define 1NF Know how to convert non-1NF relation to 1NF
Define 2NF Convert 1NF relation to 2NF
Define 3NF Convert 2NF relation to 3NF
Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example
Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm
Describe multi-valued dependencies When do they occur
What is a trivial MVD
Define 4NF Convert 3NFBCNF relation to 4NF
Objectives (3)
Describe lossless-join property and how it relates to 5NF
Define 5NF
Explain normalization in terms of the time versus space tradeoff
Explain the relationship between normalization and ER modeling
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Slide 45
- Slide 46
- Slide 47
- Slide 48
- Slide 49
- Slide 50
- Slide 51
- Slide 52
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- Slide 60
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
-
Normal Forms
All relations (non-normalized)
1NF (First Normal Form)
2NF (Second Normal Form)
3NF (Third Normal Form)
BCNF (Boyce-Codd NF)
4NF (Fourth NF)
5NF (Fifth NF)
First Normal Form (1NF)
A relation is in first normal form (1NF) if all its attribute values are atomic
That is a 1NF relation cannot have an attribute value that is a set of values (multi-valued attribute) a set of tuples (nested relation)
1NF is a standard assumption in relational DBMSs However object-oriented DBMSs and nested relational DBMSs relax t
his constraint
A relation that is not in 1NF is an unnormalized relation
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1P2
AnalystAnalyst
246
E3 A Lee P3P4
ConsultantEngineer
1048
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1 P2 Analyst Analyst 24 6
E3 A Lee P3P4 Consultant Engineer 10 48
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
A non-1NF Relation
One way to convert a non-1NF relation to a 1NF relation
1) Splitting Method - Divide the existing relation into two relations one with non-repeating attributes and the other with repeating attributes
Make a new relation consisting of the primary key of the original relation and the repeating attributes Determine a primary key for this new relation
Remove the repeating attributes from the original relation
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1P2
AnalystAnalyst
246
E3 A Lee P3P4
ConsultantEngineer
1048
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
ENo EName
E1 J Doe
E2 M Smith
E3 A Lee
E4 J Miller
E5 B Casey
E6 L Chu
E7 R Davis
ENo PNo Resp DurationE1 P1 Manager 12E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36
Second Normal Form (2NF)
A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key
If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation
Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key
Second Normal Form (2NF) Example
fd1 and fd4 are partial functional dependencies Normalize to
Emp (eno ename title bdate salary supereno dno)
WorksOn (eno pno resp hours)
Proj (pno pname budget)
ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration
EmpProj relation
fd1 fd2
fd3fd4
Second Normal Form (2NF) Example (2)
ENo EName
BDate Title Salary SuperNo
DNo
Emp relation
fd1
fd2
fd3fd4
PNo PName Budget
Proj relationENo PNo Resp Duration
WorkOn relation
Third Normal Form (3NF)
Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th
an 2 FDs
A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key
Converting a relation to 3NF from 2NF involves the removal of transitive dependencies
Third Normal Form (3NF) Example
fd2 results in a transitive dependency eno rarr salary Remove it
ENo EName BDate Title Salary SuperNo DNo
Emp relation
fd1
fd2
fd2
Title Salary
Pay relation
ENo EName BDate Title SuperNo DNo
Emp relation
fd1
General Definitions of 2NF and 3NF
We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)
General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key
General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key
Note that a prime attribute is an attribute that is in any key (candidate or primary)
General Definition of 3NF Example
The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute
However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key
Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key
ENo EName BDate Title Salary SuperNo DNo SSN
Emp relation
fd1
fd2
Normalization Question
Consider the universal relation
R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ
List the keys for R
Decompose R into 2NF and then 3NF relations
Boyce-Codd Normal Form (BCNF)
A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key
To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key
The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
ClientInterview relation
Relation is in 3NF but not BCNF
fd4
fd3
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
Note that we lose the FD fd2
fd4
fd3
ClientNo InterviewDate
InterviwTime
StaffNo StaffNo InterviwDate
RoomNo
Interview relation StaffRoom relation
BCNF versus 3NF
We can decompose to BCNF but sometimes we do not want to if we lose a FD
The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency
Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)
In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)
Conversion to BCNF
There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y
Note that if any FD in F+ violates BCNF then a FD in F violates BCNF
2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey
3) Replace R by relations with schemas R1 = X+
R2 = (R ndash X+) U X
4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib
utes are all in R1 or R2
Normalization Question
Given this database schema normalize into BCNF
fd1
PID Country Lot Area Price TaxRate
fd2
fd4fd3
Multi-Valued Dependencies
A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema
A MVD occurs when two independent 1N relationships are in the relational schema
Multi-Valued Dependencies Example
Eno ename Title Salary Pno dno
E1 J Doe EE 30000 P1P2
E2 M Smith SA 50000 P1P4 D1D2D3
E3 A Lee ME 40000 P1P3 D2D3
E4 J Miller PR 20000 P3 D3
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1
E1 J Doe EE 30000 P2
E2 M Smith SA 50000 P1 D1
E2 M Smith SA 50000 P1 D2
E2 M Smith SA 50000 P1 D3
E2 M Smith SA 50000 P4 D1
E2 M Smith SA 50000 P4 D2
E2 M Smith SA 50000 P4 D3
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P3 D3
E3 A Lee ME 40000 P3 D3
E4 J Miller PR 20000 P3 D3
Multi-Valued Dependencies (MVDs)
A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C
A trivial MVD A mdashgtgt B occurs when either B is a subset of A or
A cup B = R
Fourth Normal Form (4NF)
Fourth normal form (4NF) is based on the idea of multi-valued dependencies
A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies
Fourth Normal Form (4NF) Example
eno mdashgtgtpno
eno mdashgtgtdno
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3
eno ename title salary
E1 J Doe EE 30000
E2 MSmith SA 50000
E3 A Lee ME 40000
E4 J Miller PR 20000
eno pno
E1 P1
E1 P2
E2 P1
E2 P4
E3 P1
E3 P3
E4 P3
eno dno
E2 D1
E2 D2
E2 D3
E3 D2
E3 D3
E4 D3
Lossless-join Dependency
The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation
A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined
Fifth Normal Form (5NF)
bull Fifth normal form (5NF) is based on join dependencies
A relation is in fifth normal form (5NF) if the relation has no join dependency
bull 5NF is also called project-join normal form (PJNF)
A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R
The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have
ΠR1(r) ΠR2(r) hellip ΠRn(r) = r
Fifth Normal Form (5NF) Example
Consider a relation Supply (sname partName projName)
1048715Add the additional constraint that
If project j requires part p
and supplier s supplies part p
and supplier s supplies at least one item to project j
Then supplier s also supplies part p to project j
Original Supply Relation Supply Relation with new constraintsname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
Fifth Normal Form (5NF) Example
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
sname partName
Smith Bolt
Smith Nut
Adam Bolt
Walton Nut
Adam Nail
sname projName
Smith X
Smith Y
Adam Y
Walton Z
Adam X
partName projName
Bolt X
Nut Y
Bolt Y
Nut Z
Nail X
Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples
Desirable Properties in normalization
Relational schemas that are well-designed have several
important properties
1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or
relationship
2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins
3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations
Normal Forms in Practice
Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used
For example query execution time may increase because of normalization as more joins become necessary to answer queries
Normal Forms in Practice Example
For example street and city uniquely determine a zipcode
In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed
When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly
fd1
ENo City Street ZipCode
fd2
ENo ZipCode
ZipCode City Street
Normal Forms in Practice Summary
In summary normalization is typically used in two ways
To improve on relational schemas after ER design
To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies
There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance
This is the classic time versus space tradeoff
Conclusion
Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies
Normal forms indicate when relations satisfy certain properties
1NF - All attributes are atomic
2NF - All attributes are fully functionally dependent on a key
3NF - There are no transitive dependencies in the relation
4NF - There are no multi-valued dependencies in the relation
5NF - The relation has no join dependency
In practice normalization is used to improve schemas produced after ER design and existing relational schemas
It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time
Objectives
Explain process of normalization and why it is beneficial
Describe the concept of the Universal Relation
What are the motivations for normalization
What are the 3 types of update anomalies Be able to give examples
Describe the lossless-join property and dependency preservation property
Define and explain the concept of a functional dependency
What is a trivial FD
Describe the relationship between FDs and keys
What are Armstrongs Axioms When are they used
What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs
Objectives (2)
Define minimal and covers for a set of FDs
Define normal forms be able to list the normal forms and draw a diagram showing their relationship
Define 1NF Know how to convert non-1NF relation to 1NF
Define 2NF Convert 1NF relation to 2NF
Define 3NF Convert 2NF relation to 3NF
Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example
Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm
Describe multi-valued dependencies When do they occur
What is a trivial MVD
Define 4NF Convert 3NFBCNF relation to 4NF
Objectives (3)
Describe lossless-join property and how it relates to 5NF
Define 5NF
Explain normalization in terms of the time versus space tradeoff
Explain the relationship between normalization and ER modeling
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Slide 45
- Slide 46
- Slide 47
- Slide 48
- Slide 49
- Slide 50
- Slide 51
- Slide 52
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- Slide 60
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
-
First Normal Form (1NF)
A relation is in first normal form (1NF) if all its attribute values are atomic
That is a 1NF relation cannot have an attribute value that is a set of values (multi-valued attribute) a set of tuples (nested relation)
1NF is a standard assumption in relational DBMSs However object-oriented DBMSs and nested relational DBMSs relax t
his constraint
A relation that is not in 1NF is an unnormalized relation
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1P2
AnalystAnalyst
246
E3 A Lee P3P4
ConsultantEngineer
1048
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1 P2 Analyst Analyst 24 6
E3 A Lee P3P4 Consultant Engineer 10 48
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
A non-1NF Relation
One way to convert a non-1NF relation to a 1NF relation
1) Splitting Method - Divide the existing relation into two relations one with non-repeating attributes and the other with repeating attributes
Make a new relation consisting of the primary key of the original relation and the repeating attributes Determine a primary key for this new relation
Remove the repeating attributes from the original relation
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1P2
AnalystAnalyst
246
E3 A Lee P3P4
ConsultantEngineer
1048
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
ENo EName
E1 J Doe
E2 M Smith
E3 A Lee
E4 J Miller
E5 B Casey
E6 L Chu
E7 R Davis
ENo PNo Resp DurationE1 P1 Manager 12E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36
Second Normal Form (2NF)
A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key
If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation
Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key
Second Normal Form (2NF) Example
fd1 and fd4 are partial functional dependencies Normalize to
Emp (eno ename title bdate salary supereno dno)
WorksOn (eno pno resp hours)
Proj (pno pname budget)
ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration
EmpProj relation
fd1 fd2
fd3fd4
Second Normal Form (2NF) Example (2)
ENo EName
BDate Title Salary SuperNo
DNo
Emp relation
fd1
fd2
fd3fd4
PNo PName Budget
Proj relationENo PNo Resp Duration
WorkOn relation
Third Normal Form (3NF)
Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th
an 2 FDs
A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key
Converting a relation to 3NF from 2NF involves the removal of transitive dependencies
Third Normal Form (3NF) Example
fd2 results in a transitive dependency eno rarr salary Remove it
ENo EName BDate Title Salary SuperNo DNo
Emp relation
fd1
fd2
fd2
Title Salary
Pay relation
ENo EName BDate Title SuperNo DNo
Emp relation
fd1
General Definitions of 2NF and 3NF
We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)
General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key
General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key
Note that a prime attribute is an attribute that is in any key (candidate or primary)
General Definition of 3NF Example
The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute
However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key
Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key
ENo EName BDate Title Salary SuperNo DNo SSN
Emp relation
fd1
fd2
Normalization Question
Consider the universal relation
R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ
List the keys for R
Decompose R into 2NF and then 3NF relations
Boyce-Codd Normal Form (BCNF)
A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key
To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key
The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
ClientInterview relation
Relation is in 3NF but not BCNF
fd4
fd3
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
Note that we lose the FD fd2
fd4
fd3
ClientNo InterviewDate
InterviwTime
StaffNo StaffNo InterviwDate
RoomNo
Interview relation StaffRoom relation
BCNF versus 3NF
We can decompose to BCNF but sometimes we do not want to if we lose a FD
The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency
Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)
In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)
Conversion to BCNF
There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y
Note that if any FD in F+ violates BCNF then a FD in F violates BCNF
2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey
3) Replace R by relations with schemas R1 = X+
R2 = (R ndash X+) U X
4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib
utes are all in R1 or R2
Normalization Question
Given this database schema normalize into BCNF
fd1
PID Country Lot Area Price TaxRate
fd2
fd4fd3
Multi-Valued Dependencies
A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema
A MVD occurs when two independent 1N relationships are in the relational schema
Multi-Valued Dependencies Example
Eno ename Title Salary Pno dno
E1 J Doe EE 30000 P1P2
E2 M Smith SA 50000 P1P4 D1D2D3
E3 A Lee ME 40000 P1P3 D2D3
E4 J Miller PR 20000 P3 D3
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1
E1 J Doe EE 30000 P2
E2 M Smith SA 50000 P1 D1
E2 M Smith SA 50000 P1 D2
E2 M Smith SA 50000 P1 D3
E2 M Smith SA 50000 P4 D1
E2 M Smith SA 50000 P4 D2
E2 M Smith SA 50000 P4 D3
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P3 D3
E3 A Lee ME 40000 P3 D3
E4 J Miller PR 20000 P3 D3
Multi-Valued Dependencies (MVDs)
A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C
A trivial MVD A mdashgtgt B occurs when either B is a subset of A or
A cup B = R
Fourth Normal Form (4NF)
Fourth normal form (4NF) is based on the idea of multi-valued dependencies
A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies
Fourth Normal Form (4NF) Example
eno mdashgtgtpno
eno mdashgtgtdno
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3
eno ename title salary
E1 J Doe EE 30000
E2 MSmith SA 50000
E3 A Lee ME 40000
E4 J Miller PR 20000
eno pno
E1 P1
E1 P2
E2 P1
E2 P4
E3 P1
E3 P3
E4 P3
eno dno
E2 D1
E2 D2
E2 D3
E3 D2
E3 D3
E4 D3
Lossless-join Dependency
The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation
A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined
Fifth Normal Form (5NF)
bull Fifth normal form (5NF) is based on join dependencies
A relation is in fifth normal form (5NF) if the relation has no join dependency
bull 5NF is also called project-join normal form (PJNF)
A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R
The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have
ΠR1(r) ΠR2(r) hellip ΠRn(r) = r
Fifth Normal Form (5NF) Example
Consider a relation Supply (sname partName projName)
1048715Add the additional constraint that
If project j requires part p
and supplier s supplies part p
and supplier s supplies at least one item to project j
Then supplier s also supplies part p to project j
Original Supply Relation Supply Relation with new constraintsname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
Fifth Normal Form (5NF) Example
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
sname partName
Smith Bolt
Smith Nut
Adam Bolt
Walton Nut
Adam Nail
sname projName
Smith X
Smith Y
Adam Y
Walton Z
Adam X
partName projName
Bolt X
Nut Y
Bolt Y
Nut Z
Nail X
Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples
Desirable Properties in normalization
Relational schemas that are well-designed have several
important properties
1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or
relationship
2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins
3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations
Normal Forms in Practice
Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used
For example query execution time may increase because of normalization as more joins become necessary to answer queries
Normal Forms in Practice Example
For example street and city uniquely determine a zipcode
In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed
When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly
fd1
ENo City Street ZipCode
fd2
ENo ZipCode
ZipCode City Street
Normal Forms in Practice Summary
In summary normalization is typically used in two ways
To improve on relational schemas after ER design
To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies
There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance
This is the classic time versus space tradeoff
Conclusion
Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies
Normal forms indicate when relations satisfy certain properties
1NF - All attributes are atomic
2NF - All attributes are fully functionally dependent on a key
3NF - There are no transitive dependencies in the relation
4NF - There are no multi-valued dependencies in the relation
5NF - The relation has no join dependency
In practice normalization is used to improve schemas produced after ER design and existing relational schemas
It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time
Objectives
Explain process of normalization and why it is beneficial
Describe the concept of the Universal Relation
What are the motivations for normalization
What are the 3 types of update anomalies Be able to give examples
Describe the lossless-join property and dependency preservation property
Define and explain the concept of a functional dependency
What is a trivial FD
Describe the relationship between FDs and keys
What are Armstrongs Axioms When are they used
What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs
Objectives (2)
Define minimal and covers for a set of FDs
Define normal forms be able to list the normal forms and draw a diagram showing their relationship
Define 1NF Know how to convert non-1NF relation to 1NF
Define 2NF Convert 1NF relation to 2NF
Define 3NF Convert 2NF relation to 3NF
Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example
Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm
Describe multi-valued dependencies When do they occur
What is a trivial MVD
Define 4NF Convert 3NFBCNF relation to 4NF
Objectives (3)
Describe lossless-join property and how it relates to 5NF
Define 5NF
Explain normalization in terms of the time versus space tradeoff
Explain the relationship between normalization and ER modeling
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Slide 45
- Slide 46
- Slide 47
- Slide 48
- Slide 49
- Slide 50
- Slide 51
- Slide 52
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- Slide 60
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
-
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1P2
AnalystAnalyst
246
E3 A Lee P3P4
ConsultantEngineer
1048
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1 P2 Analyst Analyst 24 6
E3 A Lee P3P4 Consultant Engineer 10 48
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
A non-1NF Relation
One way to convert a non-1NF relation to a 1NF relation
1) Splitting Method - Divide the existing relation into two relations one with non-repeating attributes and the other with repeating attributes
Make a new relation consisting of the primary key of the original relation and the repeating attributes Determine a primary key for this new relation
Remove the repeating attributes from the original relation
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1P2
AnalystAnalyst
246
E3 A Lee P3P4
ConsultantEngineer
1048
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
ENo EName
E1 J Doe
E2 M Smith
E3 A Lee
E4 J Miller
E5 B Casey
E6 L Chu
E7 R Davis
ENo PNo Resp DurationE1 P1 Manager 12E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36
Second Normal Form (2NF)
A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key
If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation
Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key
Second Normal Form (2NF) Example
fd1 and fd4 are partial functional dependencies Normalize to
Emp (eno ename title bdate salary supereno dno)
WorksOn (eno pno resp hours)
Proj (pno pname budget)
ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration
EmpProj relation
fd1 fd2
fd3fd4
Second Normal Form (2NF) Example (2)
ENo EName
BDate Title Salary SuperNo
DNo
Emp relation
fd1
fd2
fd3fd4
PNo PName Budget
Proj relationENo PNo Resp Duration
WorkOn relation
Third Normal Form (3NF)
Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th
an 2 FDs
A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key
Converting a relation to 3NF from 2NF involves the removal of transitive dependencies
Third Normal Form (3NF) Example
fd2 results in a transitive dependency eno rarr salary Remove it
ENo EName BDate Title Salary SuperNo DNo
Emp relation
fd1
fd2
fd2
Title Salary
Pay relation
ENo EName BDate Title SuperNo DNo
Emp relation
fd1
General Definitions of 2NF and 3NF
We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)
General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key
General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key
Note that a prime attribute is an attribute that is in any key (candidate or primary)
General Definition of 3NF Example
The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute
However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key
Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key
ENo EName BDate Title Salary SuperNo DNo SSN
Emp relation
fd1
fd2
Normalization Question
Consider the universal relation
R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ
List the keys for R
Decompose R into 2NF and then 3NF relations
Boyce-Codd Normal Form (BCNF)
A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key
To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key
The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
ClientInterview relation
Relation is in 3NF but not BCNF
fd4
fd3
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
Note that we lose the FD fd2
fd4
fd3
ClientNo InterviewDate
InterviwTime
StaffNo StaffNo InterviwDate
RoomNo
Interview relation StaffRoom relation
BCNF versus 3NF
We can decompose to BCNF but sometimes we do not want to if we lose a FD
The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency
Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)
In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)
Conversion to BCNF
There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y
Note that if any FD in F+ violates BCNF then a FD in F violates BCNF
2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey
3) Replace R by relations with schemas R1 = X+
R2 = (R ndash X+) U X
4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib
utes are all in R1 or R2
Normalization Question
Given this database schema normalize into BCNF
fd1
PID Country Lot Area Price TaxRate
fd2
fd4fd3
Multi-Valued Dependencies
A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema
A MVD occurs when two independent 1N relationships are in the relational schema
Multi-Valued Dependencies Example
Eno ename Title Salary Pno dno
E1 J Doe EE 30000 P1P2
E2 M Smith SA 50000 P1P4 D1D2D3
E3 A Lee ME 40000 P1P3 D2D3
E4 J Miller PR 20000 P3 D3
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1
E1 J Doe EE 30000 P2
E2 M Smith SA 50000 P1 D1
E2 M Smith SA 50000 P1 D2
E2 M Smith SA 50000 P1 D3
E2 M Smith SA 50000 P4 D1
E2 M Smith SA 50000 P4 D2
E2 M Smith SA 50000 P4 D3
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P3 D3
E3 A Lee ME 40000 P3 D3
E4 J Miller PR 20000 P3 D3
Multi-Valued Dependencies (MVDs)
A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C
A trivial MVD A mdashgtgt B occurs when either B is a subset of A or
A cup B = R
Fourth Normal Form (4NF)
Fourth normal form (4NF) is based on the idea of multi-valued dependencies
A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies
Fourth Normal Form (4NF) Example
eno mdashgtgtpno
eno mdashgtgtdno
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3
eno ename title salary
E1 J Doe EE 30000
E2 MSmith SA 50000
E3 A Lee ME 40000
E4 J Miller PR 20000
eno pno
E1 P1
E1 P2
E2 P1
E2 P4
E3 P1
E3 P3
E4 P3
eno dno
E2 D1
E2 D2
E2 D3
E3 D2
E3 D3
E4 D3
Lossless-join Dependency
The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation
A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined
Fifth Normal Form (5NF)
bull Fifth normal form (5NF) is based on join dependencies
A relation is in fifth normal form (5NF) if the relation has no join dependency
bull 5NF is also called project-join normal form (PJNF)
A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R
The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have
ΠR1(r) ΠR2(r) hellip ΠRn(r) = r
Fifth Normal Form (5NF) Example
Consider a relation Supply (sname partName projName)
1048715Add the additional constraint that
If project j requires part p
and supplier s supplies part p
and supplier s supplies at least one item to project j
Then supplier s also supplies part p to project j
Original Supply Relation Supply Relation with new constraintsname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
Fifth Normal Form (5NF) Example
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
sname partName
Smith Bolt
Smith Nut
Adam Bolt
Walton Nut
Adam Nail
sname projName
Smith X
Smith Y
Adam Y
Walton Z
Adam X
partName projName
Bolt X
Nut Y
Bolt Y
Nut Z
Nail X
Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples
Desirable Properties in normalization
Relational schemas that are well-designed have several
important properties
1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or
relationship
2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins
3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations
Normal Forms in Practice
Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used
For example query execution time may increase because of normalization as more joins become necessary to answer queries
Normal Forms in Practice Example
For example street and city uniquely determine a zipcode
In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed
When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly
fd1
ENo City Street ZipCode
fd2
ENo ZipCode
ZipCode City Street
Normal Forms in Practice Summary
In summary normalization is typically used in two ways
To improve on relational schemas after ER design
To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies
There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance
This is the classic time versus space tradeoff
Conclusion
Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies
Normal forms indicate when relations satisfy certain properties
1NF - All attributes are atomic
2NF - All attributes are fully functionally dependent on a key
3NF - There are no transitive dependencies in the relation
4NF - There are no multi-valued dependencies in the relation
5NF - The relation has no join dependency
In practice normalization is used to improve schemas produced after ER design and existing relational schemas
It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time
Objectives
Explain process of normalization and why it is beneficial
Describe the concept of the Universal Relation
What are the motivations for normalization
What are the 3 types of update anomalies Be able to give examples
Describe the lossless-join property and dependency preservation property
Define and explain the concept of a functional dependency
What is a trivial FD
Describe the relationship between FDs and keys
What are Armstrongs Axioms When are they used
What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs
Objectives (2)
Define minimal and covers for a set of FDs
Define normal forms be able to list the normal forms and draw a diagram showing their relationship
Define 1NF Know how to convert non-1NF relation to 1NF
Define 2NF Convert 1NF relation to 2NF
Define 3NF Convert 2NF relation to 3NF
Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example
Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm
Describe multi-valued dependencies When do they occur
What is a trivial MVD
Define 4NF Convert 3NFBCNF relation to 4NF
Objectives (3)
Describe lossless-join property and how it relates to 5NF
Define 5NF
Explain normalization in terms of the time versus space tradeoff
Explain the relationship between normalization and ER modeling
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Slide 45
- Slide 46
- Slide 47
- Slide 48
- Slide 49
- Slide 50
- Slide 51
- Slide 52
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- Slide 60
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
-
A non-1NF Relation
One way to convert a non-1NF relation to a 1NF relation
1) Splitting Method - Divide the existing relation into two relations one with non-repeating attributes and the other with repeating attributes
Make a new relation consisting of the primary key of the original relation and the repeating attributes Determine a primary key for this new relation
Remove the repeating attributes from the original relation
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1P2
AnalystAnalyst
246
E3 A Lee P3P4
ConsultantEngineer
1048
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
ENo EName
E1 J Doe
E2 M Smith
E3 A Lee
E4 J Miller
E5 B Casey
E6 L Chu
E7 R Davis
ENo PNo Resp DurationE1 P1 Manager 12E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36
Second Normal Form (2NF)
A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key
If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation
Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key
Second Normal Form (2NF) Example
fd1 and fd4 are partial functional dependencies Normalize to
Emp (eno ename title bdate salary supereno dno)
WorksOn (eno pno resp hours)
Proj (pno pname budget)
ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration
EmpProj relation
fd1 fd2
fd3fd4
Second Normal Form (2NF) Example (2)
ENo EName
BDate Title Salary SuperNo
DNo
Emp relation
fd1
fd2
fd3fd4
PNo PName Budget
Proj relationENo PNo Resp Duration
WorkOn relation
Third Normal Form (3NF)
Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th
an 2 FDs
A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key
Converting a relation to 3NF from 2NF involves the removal of transitive dependencies
Third Normal Form (3NF) Example
fd2 results in a transitive dependency eno rarr salary Remove it
ENo EName BDate Title Salary SuperNo DNo
Emp relation
fd1
fd2
fd2
Title Salary
Pay relation
ENo EName BDate Title SuperNo DNo
Emp relation
fd1
General Definitions of 2NF and 3NF
We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)
General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key
General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key
Note that a prime attribute is an attribute that is in any key (candidate or primary)
General Definition of 3NF Example
The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute
However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key
Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key
ENo EName BDate Title Salary SuperNo DNo SSN
Emp relation
fd1
fd2
Normalization Question
Consider the universal relation
R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ
List the keys for R
Decompose R into 2NF and then 3NF relations
Boyce-Codd Normal Form (BCNF)
A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key
To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key
The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
ClientInterview relation
Relation is in 3NF but not BCNF
fd4
fd3
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
Note that we lose the FD fd2
fd4
fd3
ClientNo InterviewDate
InterviwTime
StaffNo StaffNo InterviwDate
RoomNo
Interview relation StaffRoom relation
BCNF versus 3NF
We can decompose to BCNF but sometimes we do not want to if we lose a FD
The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency
Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)
In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)
Conversion to BCNF
There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y
Note that if any FD in F+ violates BCNF then a FD in F violates BCNF
2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey
3) Replace R by relations with schemas R1 = X+
R2 = (R ndash X+) U X
4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib
utes are all in R1 or R2
Normalization Question
Given this database schema normalize into BCNF
fd1
PID Country Lot Area Price TaxRate
fd2
fd4fd3
Multi-Valued Dependencies
A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema
A MVD occurs when two independent 1N relationships are in the relational schema
Multi-Valued Dependencies Example
Eno ename Title Salary Pno dno
E1 J Doe EE 30000 P1P2
E2 M Smith SA 50000 P1P4 D1D2D3
E3 A Lee ME 40000 P1P3 D2D3
E4 J Miller PR 20000 P3 D3
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1
E1 J Doe EE 30000 P2
E2 M Smith SA 50000 P1 D1
E2 M Smith SA 50000 P1 D2
E2 M Smith SA 50000 P1 D3
E2 M Smith SA 50000 P4 D1
E2 M Smith SA 50000 P4 D2
E2 M Smith SA 50000 P4 D3
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P3 D3
E3 A Lee ME 40000 P3 D3
E4 J Miller PR 20000 P3 D3
Multi-Valued Dependencies (MVDs)
A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C
A trivial MVD A mdashgtgt B occurs when either B is a subset of A or
A cup B = R
Fourth Normal Form (4NF)
Fourth normal form (4NF) is based on the idea of multi-valued dependencies
A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies
Fourth Normal Form (4NF) Example
eno mdashgtgtpno
eno mdashgtgtdno
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3
eno ename title salary
E1 J Doe EE 30000
E2 MSmith SA 50000
E3 A Lee ME 40000
E4 J Miller PR 20000
eno pno
E1 P1
E1 P2
E2 P1
E2 P4
E3 P1
E3 P3
E4 P3
eno dno
E2 D1
E2 D2
E2 D3
E3 D2
E3 D3
E4 D3
Lossless-join Dependency
The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation
A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined
Fifth Normal Form (5NF)
bull Fifth normal form (5NF) is based on join dependencies
A relation is in fifth normal form (5NF) if the relation has no join dependency
bull 5NF is also called project-join normal form (PJNF)
A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R
The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have
ΠR1(r) ΠR2(r) hellip ΠRn(r) = r
Fifth Normal Form (5NF) Example
Consider a relation Supply (sname partName projName)
1048715Add the additional constraint that
If project j requires part p
and supplier s supplies part p
and supplier s supplies at least one item to project j
Then supplier s also supplies part p to project j
Original Supply Relation Supply Relation with new constraintsname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
Fifth Normal Form (5NF) Example
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
sname partName
Smith Bolt
Smith Nut
Adam Bolt
Walton Nut
Adam Nail
sname projName
Smith X
Smith Y
Adam Y
Walton Z
Adam X
partName projName
Bolt X
Nut Y
Bolt Y
Nut Z
Nail X
Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples
Desirable Properties in normalization
Relational schemas that are well-designed have several
important properties
1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or
relationship
2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins
3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations
Normal Forms in Practice
Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used
For example query execution time may increase because of normalization as more joins become necessary to answer queries
Normal Forms in Practice Example
For example street and city uniquely determine a zipcode
In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed
When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly
fd1
ENo City Street ZipCode
fd2
ENo ZipCode
ZipCode City Street
Normal Forms in Practice Summary
In summary normalization is typically used in two ways
To improve on relational schemas after ER design
To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies
There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance
This is the classic time versus space tradeoff
Conclusion
Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies
Normal forms indicate when relations satisfy certain properties
1NF - All attributes are atomic
2NF - All attributes are fully functionally dependent on a key
3NF - There are no transitive dependencies in the relation
4NF - There are no multi-valued dependencies in the relation
5NF - The relation has no join dependency
In practice normalization is used to improve schemas produced after ER design and existing relational schemas
It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time
Objectives
Explain process of normalization and why it is beneficial
Describe the concept of the Universal Relation
What are the motivations for normalization
What are the 3 types of update anomalies Be able to give examples
Describe the lossless-join property and dependency preservation property
Define and explain the concept of a functional dependency
What is a trivial FD
Describe the relationship between FDs and keys
What are Armstrongs Axioms When are they used
What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs
Objectives (2)
Define minimal and covers for a set of FDs
Define normal forms be able to list the normal forms and draw a diagram showing their relationship
Define 1NF Know how to convert non-1NF relation to 1NF
Define 2NF Convert 1NF relation to 2NF
Define 3NF Convert 2NF relation to 3NF
Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example
Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm
Describe multi-valued dependencies When do they occur
What is a trivial MVD
Define 4NF Convert 3NFBCNF relation to 4NF
Objectives (3)
Describe lossless-join property and how it relates to 5NF
Define 5NF
Explain normalization in terms of the time versus space tradeoff
Explain the relationship between normalization and ER modeling
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Slide 45
- Slide 46
- Slide 47
- Slide 48
- Slide 49
- Slide 50
- Slide 51
- Slide 52
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- Slide 60
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
-
ENo EName PNo Resp Duration
E1 J Doe P1 Manager 12
E2 M Smith P1P2
AnalystAnalyst
246
E3 A Lee P3P4
ConsultantEngineer
1048
E4 J Miller P2 Programmer 18
E5 B Casey P2 Manager 24
E6 L Chu P4 Manager 48
E7 R Davis P3 Engineer 36
ENo EName
E1 J Doe
E2 M Smith
E3 A Lee
E4 J Miller
E5 B Casey
E6 L Chu
E7 R Davis
ENo PNo Resp DurationE1 P1 Manager 12E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36
Second Normal Form (2NF)
A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key
If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation
Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key
Second Normal Form (2NF) Example
fd1 and fd4 are partial functional dependencies Normalize to
Emp (eno ename title bdate salary supereno dno)
WorksOn (eno pno resp hours)
Proj (pno pname budget)
ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration
EmpProj relation
fd1 fd2
fd3fd4
Second Normal Form (2NF) Example (2)
ENo EName
BDate Title Salary SuperNo
DNo
Emp relation
fd1
fd2
fd3fd4
PNo PName Budget
Proj relationENo PNo Resp Duration
WorkOn relation
Third Normal Form (3NF)
Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th
an 2 FDs
A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key
Converting a relation to 3NF from 2NF involves the removal of transitive dependencies
Third Normal Form (3NF) Example
fd2 results in a transitive dependency eno rarr salary Remove it
ENo EName BDate Title Salary SuperNo DNo
Emp relation
fd1
fd2
fd2
Title Salary
Pay relation
ENo EName BDate Title SuperNo DNo
Emp relation
fd1
General Definitions of 2NF and 3NF
We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)
General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key
General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key
Note that a prime attribute is an attribute that is in any key (candidate or primary)
General Definition of 3NF Example
The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute
However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key
Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key
ENo EName BDate Title Salary SuperNo DNo SSN
Emp relation
fd1
fd2
Normalization Question
Consider the universal relation
R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ
List the keys for R
Decompose R into 2NF and then 3NF relations
Boyce-Codd Normal Form (BCNF)
A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key
To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key
The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
ClientInterview relation
Relation is in 3NF but not BCNF
fd4
fd3
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
Note that we lose the FD fd2
fd4
fd3
ClientNo InterviewDate
InterviwTime
StaffNo StaffNo InterviwDate
RoomNo
Interview relation StaffRoom relation
BCNF versus 3NF
We can decompose to BCNF but sometimes we do not want to if we lose a FD
The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency
Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)
In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)
Conversion to BCNF
There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y
Note that if any FD in F+ violates BCNF then a FD in F violates BCNF
2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey
3) Replace R by relations with schemas R1 = X+
R2 = (R ndash X+) U X
4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib
utes are all in R1 or R2
Normalization Question
Given this database schema normalize into BCNF
fd1
PID Country Lot Area Price TaxRate
fd2
fd4fd3
Multi-Valued Dependencies
A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema
A MVD occurs when two independent 1N relationships are in the relational schema
Multi-Valued Dependencies Example
Eno ename Title Salary Pno dno
E1 J Doe EE 30000 P1P2
E2 M Smith SA 50000 P1P4 D1D2D3
E3 A Lee ME 40000 P1P3 D2D3
E4 J Miller PR 20000 P3 D3
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1
E1 J Doe EE 30000 P2
E2 M Smith SA 50000 P1 D1
E2 M Smith SA 50000 P1 D2
E2 M Smith SA 50000 P1 D3
E2 M Smith SA 50000 P4 D1
E2 M Smith SA 50000 P4 D2
E2 M Smith SA 50000 P4 D3
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P3 D3
E3 A Lee ME 40000 P3 D3
E4 J Miller PR 20000 P3 D3
Multi-Valued Dependencies (MVDs)
A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C
A trivial MVD A mdashgtgt B occurs when either B is a subset of A or
A cup B = R
Fourth Normal Form (4NF)
Fourth normal form (4NF) is based on the idea of multi-valued dependencies
A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies
Fourth Normal Form (4NF) Example
eno mdashgtgtpno
eno mdashgtgtdno
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3
eno ename title salary
E1 J Doe EE 30000
E2 MSmith SA 50000
E3 A Lee ME 40000
E4 J Miller PR 20000
eno pno
E1 P1
E1 P2
E2 P1
E2 P4
E3 P1
E3 P3
E4 P3
eno dno
E2 D1
E2 D2
E2 D3
E3 D2
E3 D3
E4 D3
Lossless-join Dependency
The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation
A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined
Fifth Normal Form (5NF)
bull Fifth normal form (5NF) is based on join dependencies
A relation is in fifth normal form (5NF) if the relation has no join dependency
bull 5NF is also called project-join normal form (PJNF)
A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R
The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have
ΠR1(r) ΠR2(r) hellip ΠRn(r) = r
Fifth Normal Form (5NF) Example
Consider a relation Supply (sname partName projName)
1048715Add the additional constraint that
If project j requires part p
and supplier s supplies part p
and supplier s supplies at least one item to project j
Then supplier s also supplies part p to project j
Original Supply Relation Supply Relation with new constraintsname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
Fifth Normal Form (5NF) Example
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
sname partName
Smith Bolt
Smith Nut
Adam Bolt
Walton Nut
Adam Nail
sname projName
Smith X
Smith Y
Adam Y
Walton Z
Adam X
partName projName
Bolt X
Nut Y
Bolt Y
Nut Z
Nail X
Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples
Desirable Properties in normalization
Relational schemas that are well-designed have several
important properties
1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or
relationship
2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins
3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations
Normal Forms in Practice
Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used
For example query execution time may increase because of normalization as more joins become necessary to answer queries
Normal Forms in Practice Example
For example street and city uniquely determine a zipcode
In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed
When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly
fd1
ENo City Street ZipCode
fd2
ENo ZipCode
ZipCode City Street
Normal Forms in Practice Summary
In summary normalization is typically used in two ways
To improve on relational schemas after ER design
To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies
There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance
This is the classic time versus space tradeoff
Conclusion
Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies
Normal forms indicate when relations satisfy certain properties
1NF - All attributes are atomic
2NF - All attributes are fully functionally dependent on a key
3NF - There are no transitive dependencies in the relation
4NF - There are no multi-valued dependencies in the relation
5NF - The relation has no join dependency
In practice normalization is used to improve schemas produced after ER design and existing relational schemas
It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time
Objectives
Explain process of normalization and why it is beneficial
Describe the concept of the Universal Relation
What are the motivations for normalization
What are the 3 types of update anomalies Be able to give examples
Describe the lossless-join property and dependency preservation property
Define and explain the concept of a functional dependency
What is a trivial FD
Describe the relationship between FDs and keys
What are Armstrongs Axioms When are they used
What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs
Objectives (2)
Define minimal and covers for a set of FDs
Define normal forms be able to list the normal forms and draw a diagram showing their relationship
Define 1NF Know how to convert non-1NF relation to 1NF
Define 2NF Convert 1NF relation to 2NF
Define 3NF Convert 2NF relation to 3NF
Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example
Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm
Describe multi-valued dependencies When do they occur
What is a trivial MVD
Define 4NF Convert 3NFBCNF relation to 4NF
Objectives (3)
Describe lossless-join property and how it relates to 5NF
Define 5NF
Explain normalization in terms of the time versus space tradeoff
Explain the relationship between normalization and ER modeling
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Slide 45
- Slide 46
- Slide 47
- Slide 48
- Slide 49
- Slide 50
- Slide 51
- Slide 52
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- Slide 60
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
-
Second Normal Form (2NF)
A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key
If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation
Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key
Second Normal Form (2NF) Example
fd1 and fd4 are partial functional dependencies Normalize to
Emp (eno ename title bdate salary supereno dno)
WorksOn (eno pno resp hours)
Proj (pno pname budget)
ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration
EmpProj relation
fd1 fd2
fd3fd4
Second Normal Form (2NF) Example (2)
ENo EName
BDate Title Salary SuperNo
DNo
Emp relation
fd1
fd2
fd3fd4
PNo PName Budget
Proj relationENo PNo Resp Duration
WorkOn relation
Third Normal Form (3NF)
Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th
an 2 FDs
A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key
Converting a relation to 3NF from 2NF involves the removal of transitive dependencies
Third Normal Form (3NF) Example
fd2 results in a transitive dependency eno rarr salary Remove it
ENo EName BDate Title Salary SuperNo DNo
Emp relation
fd1
fd2
fd2
Title Salary
Pay relation
ENo EName BDate Title SuperNo DNo
Emp relation
fd1
General Definitions of 2NF and 3NF
We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)
General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key
General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key
Note that a prime attribute is an attribute that is in any key (candidate or primary)
General Definition of 3NF Example
The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute
However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key
Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key
ENo EName BDate Title Salary SuperNo DNo SSN
Emp relation
fd1
fd2
Normalization Question
Consider the universal relation
R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ
List the keys for R
Decompose R into 2NF and then 3NF relations
Boyce-Codd Normal Form (BCNF)
A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key
To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key
The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
ClientInterview relation
Relation is in 3NF but not BCNF
fd4
fd3
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
Note that we lose the FD fd2
fd4
fd3
ClientNo InterviewDate
InterviwTime
StaffNo StaffNo InterviwDate
RoomNo
Interview relation StaffRoom relation
BCNF versus 3NF
We can decompose to BCNF but sometimes we do not want to if we lose a FD
The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency
Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)
In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)
Conversion to BCNF
There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y
Note that if any FD in F+ violates BCNF then a FD in F violates BCNF
2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey
3) Replace R by relations with schemas R1 = X+
R2 = (R ndash X+) U X
4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib
utes are all in R1 or R2
Normalization Question
Given this database schema normalize into BCNF
fd1
PID Country Lot Area Price TaxRate
fd2
fd4fd3
Multi-Valued Dependencies
A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema
A MVD occurs when two independent 1N relationships are in the relational schema
Multi-Valued Dependencies Example
Eno ename Title Salary Pno dno
E1 J Doe EE 30000 P1P2
E2 M Smith SA 50000 P1P4 D1D2D3
E3 A Lee ME 40000 P1P3 D2D3
E4 J Miller PR 20000 P3 D3
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1
E1 J Doe EE 30000 P2
E2 M Smith SA 50000 P1 D1
E2 M Smith SA 50000 P1 D2
E2 M Smith SA 50000 P1 D3
E2 M Smith SA 50000 P4 D1
E2 M Smith SA 50000 P4 D2
E2 M Smith SA 50000 P4 D3
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P3 D3
E3 A Lee ME 40000 P3 D3
E4 J Miller PR 20000 P3 D3
Multi-Valued Dependencies (MVDs)
A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C
A trivial MVD A mdashgtgt B occurs when either B is a subset of A or
A cup B = R
Fourth Normal Form (4NF)
Fourth normal form (4NF) is based on the idea of multi-valued dependencies
A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies
Fourth Normal Form (4NF) Example
eno mdashgtgtpno
eno mdashgtgtdno
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3
eno ename title salary
E1 J Doe EE 30000
E2 MSmith SA 50000
E3 A Lee ME 40000
E4 J Miller PR 20000
eno pno
E1 P1
E1 P2
E2 P1
E2 P4
E3 P1
E3 P3
E4 P3
eno dno
E2 D1
E2 D2
E2 D3
E3 D2
E3 D3
E4 D3
Lossless-join Dependency
The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation
A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined
Fifth Normal Form (5NF)
bull Fifth normal form (5NF) is based on join dependencies
A relation is in fifth normal form (5NF) if the relation has no join dependency
bull 5NF is also called project-join normal form (PJNF)
A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R
The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have
ΠR1(r) ΠR2(r) hellip ΠRn(r) = r
Fifth Normal Form (5NF) Example
Consider a relation Supply (sname partName projName)
1048715Add the additional constraint that
If project j requires part p
and supplier s supplies part p
and supplier s supplies at least one item to project j
Then supplier s also supplies part p to project j
Original Supply Relation Supply Relation with new constraintsname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
Fifth Normal Form (5NF) Example
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
sname partName
Smith Bolt
Smith Nut
Adam Bolt
Walton Nut
Adam Nail
sname projName
Smith X
Smith Y
Adam Y
Walton Z
Adam X
partName projName
Bolt X
Nut Y
Bolt Y
Nut Z
Nail X
Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples
Desirable Properties in normalization
Relational schemas that are well-designed have several
important properties
1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or
relationship
2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins
3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations
Normal Forms in Practice
Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used
For example query execution time may increase because of normalization as more joins become necessary to answer queries
Normal Forms in Practice Example
For example street and city uniquely determine a zipcode
In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed
When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly
fd1
ENo City Street ZipCode
fd2
ENo ZipCode
ZipCode City Street
Normal Forms in Practice Summary
In summary normalization is typically used in two ways
To improve on relational schemas after ER design
To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies
There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance
This is the classic time versus space tradeoff
Conclusion
Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies
Normal forms indicate when relations satisfy certain properties
1NF - All attributes are atomic
2NF - All attributes are fully functionally dependent on a key
3NF - There are no transitive dependencies in the relation
4NF - There are no multi-valued dependencies in the relation
5NF - The relation has no join dependency
In practice normalization is used to improve schemas produced after ER design and existing relational schemas
It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time
Objectives
Explain process of normalization and why it is beneficial
Describe the concept of the Universal Relation
What are the motivations for normalization
What are the 3 types of update anomalies Be able to give examples
Describe the lossless-join property and dependency preservation property
Define and explain the concept of a functional dependency
What is a trivial FD
Describe the relationship between FDs and keys
What are Armstrongs Axioms When are they used
What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs
Objectives (2)
Define minimal and covers for a set of FDs
Define normal forms be able to list the normal forms and draw a diagram showing their relationship
Define 1NF Know how to convert non-1NF relation to 1NF
Define 2NF Convert 1NF relation to 2NF
Define 3NF Convert 2NF relation to 3NF
Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example
Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm
Describe multi-valued dependencies When do they occur
What is a trivial MVD
Define 4NF Convert 3NFBCNF relation to 4NF
Objectives (3)
Describe lossless-join property and how it relates to 5NF
Define 5NF
Explain normalization in terms of the time versus space tradeoff
Explain the relationship between normalization and ER modeling
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Slide 45
- Slide 46
- Slide 47
- Slide 48
- Slide 49
- Slide 50
- Slide 51
- Slide 52
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- Slide 60
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
-
Second Normal Form (2NF) Example
fd1 and fd4 are partial functional dependencies Normalize to
Emp (eno ename title bdate salary supereno dno)
WorksOn (eno pno resp hours)
Proj (pno pname budget)
ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration
EmpProj relation
fd1 fd2
fd3fd4
Second Normal Form (2NF) Example (2)
ENo EName
BDate Title Salary SuperNo
DNo
Emp relation
fd1
fd2
fd3fd4
PNo PName Budget
Proj relationENo PNo Resp Duration
WorkOn relation
Third Normal Form (3NF)
Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th
an 2 FDs
A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key
Converting a relation to 3NF from 2NF involves the removal of transitive dependencies
Third Normal Form (3NF) Example
fd2 results in a transitive dependency eno rarr salary Remove it
ENo EName BDate Title Salary SuperNo DNo
Emp relation
fd1
fd2
fd2
Title Salary
Pay relation
ENo EName BDate Title SuperNo DNo
Emp relation
fd1
General Definitions of 2NF and 3NF
We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)
General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key
General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key
Note that a prime attribute is an attribute that is in any key (candidate or primary)
General Definition of 3NF Example
The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute
However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key
Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key
ENo EName BDate Title Salary SuperNo DNo SSN
Emp relation
fd1
fd2
Normalization Question
Consider the universal relation
R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ
List the keys for R
Decompose R into 2NF and then 3NF relations
Boyce-Codd Normal Form (BCNF)
A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key
To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key
The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
ClientInterview relation
Relation is in 3NF but not BCNF
fd4
fd3
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
Note that we lose the FD fd2
fd4
fd3
ClientNo InterviewDate
InterviwTime
StaffNo StaffNo InterviwDate
RoomNo
Interview relation StaffRoom relation
BCNF versus 3NF
We can decompose to BCNF but sometimes we do not want to if we lose a FD
The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency
Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)
In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)
Conversion to BCNF
There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y
Note that if any FD in F+ violates BCNF then a FD in F violates BCNF
2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey
3) Replace R by relations with schemas R1 = X+
R2 = (R ndash X+) U X
4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib
utes are all in R1 or R2
Normalization Question
Given this database schema normalize into BCNF
fd1
PID Country Lot Area Price TaxRate
fd2
fd4fd3
Multi-Valued Dependencies
A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema
A MVD occurs when two independent 1N relationships are in the relational schema
Multi-Valued Dependencies Example
Eno ename Title Salary Pno dno
E1 J Doe EE 30000 P1P2
E2 M Smith SA 50000 P1P4 D1D2D3
E3 A Lee ME 40000 P1P3 D2D3
E4 J Miller PR 20000 P3 D3
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1
E1 J Doe EE 30000 P2
E2 M Smith SA 50000 P1 D1
E2 M Smith SA 50000 P1 D2
E2 M Smith SA 50000 P1 D3
E2 M Smith SA 50000 P4 D1
E2 M Smith SA 50000 P4 D2
E2 M Smith SA 50000 P4 D3
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P3 D3
E3 A Lee ME 40000 P3 D3
E4 J Miller PR 20000 P3 D3
Multi-Valued Dependencies (MVDs)
A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C
A trivial MVD A mdashgtgt B occurs when either B is a subset of A or
A cup B = R
Fourth Normal Form (4NF)
Fourth normal form (4NF) is based on the idea of multi-valued dependencies
A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies
Fourth Normal Form (4NF) Example
eno mdashgtgtpno
eno mdashgtgtdno
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3
eno ename title salary
E1 J Doe EE 30000
E2 MSmith SA 50000
E3 A Lee ME 40000
E4 J Miller PR 20000
eno pno
E1 P1
E1 P2
E2 P1
E2 P4
E3 P1
E3 P3
E4 P3
eno dno
E2 D1
E2 D2
E2 D3
E3 D2
E3 D3
E4 D3
Lossless-join Dependency
The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation
A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined
Fifth Normal Form (5NF)
bull Fifth normal form (5NF) is based on join dependencies
A relation is in fifth normal form (5NF) if the relation has no join dependency
bull 5NF is also called project-join normal form (PJNF)
A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R
The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have
ΠR1(r) ΠR2(r) hellip ΠRn(r) = r
Fifth Normal Form (5NF) Example
Consider a relation Supply (sname partName projName)
1048715Add the additional constraint that
If project j requires part p
and supplier s supplies part p
and supplier s supplies at least one item to project j
Then supplier s also supplies part p to project j
Original Supply Relation Supply Relation with new constraintsname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
Fifth Normal Form (5NF) Example
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
sname partName
Smith Bolt
Smith Nut
Adam Bolt
Walton Nut
Adam Nail
sname projName
Smith X
Smith Y
Adam Y
Walton Z
Adam X
partName projName
Bolt X
Nut Y
Bolt Y
Nut Z
Nail X
Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples
Desirable Properties in normalization
Relational schemas that are well-designed have several
important properties
1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or
relationship
2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins
3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations
Normal Forms in Practice
Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used
For example query execution time may increase because of normalization as more joins become necessary to answer queries
Normal Forms in Practice Example
For example street and city uniquely determine a zipcode
In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed
When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly
fd1
ENo City Street ZipCode
fd2
ENo ZipCode
ZipCode City Street
Normal Forms in Practice Summary
In summary normalization is typically used in two ways
To improve on relational schemas after ER design
To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies
There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance
This is the classic time versus space tradeoff
Conclusion
Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies
Normal forms indicate when relations satisfy certain properties
1NF - All attributes are atomic
2NF - All attributes are fully functionally dependent on a key
3NF - There are no transitive dependencies in the relation
4NF - There are no multi-valued dependencies in the relation
5NF - The relation has no join dependency
In practice normalization is used to improve schemas produced after ER design and existing relational schemas
It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time
Objectives
Explain process of normalization and why it is beneficial
Describe the concept of the Universal Relation
What are the motivations for normalization
What are the 3 types of update anomalies Be able to give examples
Describe the lossless-join property and dependency preservation property
Define and explain the concept of a functional dependency
What is a trivial FD
Describe the relationship between FDs and keys
What are Armstrongs Axioms When are they used
What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs
Objectives (2)
Define minimal and covers for a set of FDs
Define normal forms be able to list the normal forms and draw a diagram showing their relationship
Define 1NF Know how to convert non-1NF relation to 1NF
Define 2NF Convert 1NF relation to 2NF
Define 3NF Convert 2NF relation to 3NF
Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example
Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm
Describe multi-valued dependencies When do they occur
What is a trivial MVD
Define 4NF Convert 3NFBCNF relation to 4NF
Objectives (3)
Describe lossless-join property and how it relates to 5NF
Define 5NF
Explain normalization in terms of the time versus space tradeoff
Explain the relationship between normalization and ER modeling
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Slide 45
- Slide 46
- Slide 47
- Slide 48
- Slide 49
- Slide 50
- Slide 51
- Slide 52
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- Slide 60
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
-
Second Normal Form (2NF) Example (2)
ENo EName
BDate Title Salary SuperNo
DNo
Emp relation
fd1
fd2
fd3fd4
PNo PName Budget
Proj relationENo PNo Resp Duration
WorkOn relation
Third Normal Form (3NF)
Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th
an 2 FDs
A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key
Converting a relation to 3NF from 2NF involves the removal of transitive dependencies
Third Normal Form (3NF) Example
fd2 results in a transitive dependency eno rarr salary Remove it
ENo EName BDate Title Salary SuperNo DNo
Emp relation
fd1
fd2
fd2
Title Salary
Pay relation
ENo EName BDate Title SuperNo DNo
Emp relation
fd1
General Definitions of 2NF and 3NF
We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)
General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key
General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key
Note that a prime attribute is an attribute that is in any key (candidate or primary)
General Definition of 3NF Example
The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute
However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key
Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key
ENo EName BDate Title Salary SuperNo DNo SSN
Emp relation
fd1
fd2
Normalization Question
Consider the universal relation
R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ
List the keys for R
Decompose R into 2NF and then 3NF relations
Boyce-Codd Normal Form (BCNF)
A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key
To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key
The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
ClientInterview relation
Relation is in 3NF but not BCNF
fd4
fd3
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
Note that we lose the FD fd2
fd4
fd3
ClientNo InterviewDate
InterviwTime
StaffNo StaffNo InterviwDate
RoomNo
Interview relation StaffRoom relation
BCNF versus 3NF
We can decompose to BCNF but sometimes we do not want to if we lose a FD
The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency
Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)
In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)
Conversion to BCNF
There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y
Note that if any FD in F+ violates BCNF then a FD in F violates BCNF
2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey
3) Replace R by relations with schemas R1 = X+
R2 = (R ndash X+) U X
4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib
utes are all in R1 or R2
Normalization Question
Given this database schema normalize into BCNF
fd1
PID Country Lot Area Price TaxRate
fd2
fd4fd3
Multi-Valued Dependencies
A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema
A MVD occurs when two independent 1N relationships are in the relational schema
Multi-Valued Dependencies Example
Eno ename Title Salary Pno dno
E1 J Doe EE 30000 P1P2
E2 M Smith SA 50000 P1P4 D1D2D3
E3 A Lee ME 40000 P1P3 D2D3
E4 J Miller PR 20000 P3 D3
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1
E1 J Doe EE 30000 P2
E2 M Smith SA 50000 P1 D1
E2 M Smith SA 50000 P1 D2
E2 M Smith SA 50000 P1 D3
E2 M Smith SA 50000 P4 D1
E2 M Smith SA 50000 P4 D2
E2 M Smith SA 50000 P4 D3
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P3 D3
E3 A Lee ME 40000 P3 D3
E4 J Miller PR 20000 P3 D3
Multi-Valued Dependencies (MVDs)
A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C
A trivial MVD A mdashgtgt B occurs when either B is a subset of A or
A cup B = R
Fourth Normal Form (4NF)
Fourth normal form (4NF) is based on the idea of multi-valued dependencies
A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies
Fourth Normal Form (4NF) Example
eno mdashgtgtpno
eno mdashgtgtdno
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3
eno ename title salary
E1 J Doe EE 30000
E2 MSmith SA 50000
E3 A Lee ME 40000
E4 J Miller PR 20000
eno pno
E1 P1
E1 P2
E2 P1
E2 P4
E3 P1
E3 P3
E4 P3
eno dno
E2 D1
E2 D2
E2 D3
E3 D2
E3 D3
E4 D3
Lossless-join Dependency
The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation
A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined
Fifth Normal Form (5NF)
bull Fifth normal form (5NF) is based on join dependencies
A relation is in fifth normal form (5NF) if the relation has no join dependency
bull 5NF is also called project-join normal form (PJNF)
A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R
The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have
ΠR1(r) ΠR2(r) hellip ΠRn(r) = r
Fifth Normal Form (5NF) Example
Consider a relation Supply (sname partName projName)
1048715Add the additional constraint that
If project j requires part p
and supplier s supplies part p
and supplier s supplies at least one item to project j
Then supplier s also supplies part p to project j
Original Supply Relation Supply Relation with new constraintsname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
Fifth Normal Form (5NF) Example
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
sname partName
Smith Bolt
Smith Nut
Adam Bolt
Walton Nut
Adam Nail
sname projName
Smith X
Smith Y
Adam Y
Walton Z
Adam X
partName projName
Bolt X
Nut Y
Bolt Y
Nut Z
Nail X
Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples
Desirable Properties in normalization
Relational schemas that are well-designed have several
important properties
1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or
relationship
2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins
3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations
Normal Forms in Practice
Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used
For example query execution time may increase because of normalization as more joins become necessary to answer queries
Normal Forms in Practice Example
For example street and city uniquely determine a zipcode
In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed
When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly
fd1
ENo City Street ZipCode
fd2
ENo ZipCode
ZipCode City Street
Normal Forms in Practice Summary
In summary normalization is typically used in two ways
To improve on relational schemas after ER design
To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies
There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance
This is the classic time versus space tradeoff
Conclusion
Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies
Normal forms indicate when relations satisfy certain properties
1NF - All attributes are atomic
2NF - All attributes are fully functionally dependent on a key
3NF - There are no transitive dependencies in the relation
4NF - There are no multi-valued dependencies in the relation
5NF - The relation has no join dependency
In practice normalization is used to improve schemas produced after ER design and existing relational schemas
It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time
Objectives
Explain process of normalization and why it is beneficial
Describe the concept of the Universal Relation
What are the motivations for normalization
What are the 3 types of update anomalies Be able to give examples
Describe the lossless-join property and dependency preservation property
Define and explain the concept of a functional dependency
What is a trivial FD
Describe the relationship between FDs and keys
What are Armstrongs Axioms When are they used
What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs
Objectives (2)
Define minimal and covers for a set of FDs
Define normal forms be able to list the normal forms and draw a diagram showing their relationship
Define 1NF Know how to convert non-1NF relation to 1NF
Define 2NF Convert 1NF relation to 2NF
Define 3NF Convert 2NF relation to 3NF
Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example
Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm
Describe multi-valued dependencies When do they occur
What is a trivial MVD
Define 4NF Convert 3NFBCNF relation to 4NF
Objectives (3)
Describe lossless-join property and how it relates to 5NF
Define 5NF
Explain normalization in terms of the time versus space tradeoff
Explain the relationship between normalization and ER modeling
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Slide 45
- Slide 46
- Slide 47
- Slide 48
- Slide 49
- Slide 50
- Slide 51
- Slide 52
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- Slide 60
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
-
Third Normal Form (3NF)
Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th
an 2 FDs
A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key
Converting a relation to 3NF from 2NF involves the removal of transitive dependencies
Third Normal Form (3NF) Example
fd2 results in a transitive dependency eno rarr salary Remove it
ENo EName BDate Title Salary SuperNo DNo
Emp relation
fd1
fd2
fd2
Title Salary
Pay relation
ENo EName BDate Title SuperNo DNo
Emp relation
fd1
General Definitions of 2NF and 3NF
We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)
General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key
General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key
Note that a prime attribute is an attribute that is in any key (candidate or primary)
General Definition of 3NF Example
The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute
However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key
Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key
ENo EName BDate Title Salary SuperNo DNo SSN
Emp relation
fd1
fd2
Normalization Question
Consider the universal relation
R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ
List the keys for R
Decompose R into 2NF and then 3NF relations
Boyce-Codd Normal Form (BCNF)
A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key
To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key
The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
ClientInterview relation
Relation is in 3NF but not BCNF
fd4
fd3
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
Note that we lose the FD fd2
fd4
fd3
ClientNo InterviewDate
InterviwTime
StaffNo StaffNo InterviwDate
RoomNo
Interview relation StaffRoom relation
BCNF versus 3NF
We can decompose to BCNF but sometimes we do not want to if we lose a FD
The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency
Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)
In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)
Conversion to BCNF
There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y
Note that if any FD in F+ violates BCNF then a FD in F violates BCNF
2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey
3) Replace R by relations with schemas R1 = X+
R2 = (R ndash X+) U X
4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib
utes are all in R1 or R2
Normalization Question
Given this database schema normalize into BCNF
fd1
PID Country Lot Area Price TaxRate
fd2
fd4fd3
Multi-Valued Dependencies
A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema
A MVD occurs when two independent 1N relationships are in the relational schema
Multi-Valued Dependencies Example
Eno ename Title Salary Pno dno
E1 J Doe EE 30000 P1P2
E2 M Smith SA 50000 P1P4 D1D2D3
E3 A Lee ME 40000 P1P3 D2D3
E4 J Miller PR 20000 P3 D3
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1
E1 J Doe EE 30000 P2
E2 M Smith SA 50000 P1 D1
E2 M Smith SA 50000 P1 D2
E2 M Smith SA 50000 P1 D3
E2 M Smith SA 50000 P4 D1
E2 M Smith SA 50000 P4 D2
E2 M Smith SA 50000 P4 D3
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P3 D3
E3 A Lee ME 40000 P3 D3
E4 J Miller PR 20000 P3 D3
Multi-Valued Dependencies (MVDs)
A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C
A trivial MVD A mdashgtgt B occurs when either B is a subset of A or
A cup B = R
Fourth Normal Form (4NF)
Fourth normal form (4NF) is based on the idea of multi-valued dependencies
A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies
Fourth Normal Form (4NF) Example
eno mdashgtgtpno
eno mdashgtgtdno
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3
eno ename title salary
E1 J Doe EE 30000
E2 MSmith SA 50000
E3 A Lee ME 40000
E4 J Miller PR 20000
eno pno
E1 P1
E1 P2
E2 P1
E2 P4
E3 P1
E3 P3
E4 P3
eno dno
E2 D1
E2 D2
E2 D3
E3 D2
E3 D3
E4 D3
Lossless-join Dependency
The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation
A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined
Fifth Normal Form (5NF)
bull Fifth normal form (5NF) is based on join dependencies
A relation is in fifth normal form (5NF) if the relation has no join dependency
bull 5NF is also called project-join normal form (PJNF)
A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R
The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have
ΠR1(r) ΠR2(r) hellip ΠRn(r) = r
Fifth Normal Form (5NF) Example
Consider a relation Supply (sname partName projName)
1048715Add the additional constraint that
If project j requires part p
and supplier s supplies part p
and supplier s supplies at least one item to project j
Then supplier s also supplies part p to project j
Original Supply Relation Supply Relation with new constraintsname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
Fifth Normal Form (5NF) Example
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
sname partName
Smith Bolt
Smith Nut
Adam Bolt
Walton Nut
Adam Nail
sname projName
Smith X
Smith Y
Adam Y
Walton Z
Adam X
partName projName
Bolt X
Nut Y
Bolt Y
Nut Z
Nail X
Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples
Desirable Properties in normalization
Relational schemas that are well-designed have several
important properties
1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or
relationship
2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins
3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations
Normal Forms in Practice
Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used
For example query execution time may increase because of normalization as more joins become necessary to answer queries
Normal Forms in Practice Example
For example street and city uniquely determine a zipcode
In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed
When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly
fd1
ENo City Street ZipCode
fd2
ENo ZipCode
ZipCode City Street
Normal Forms in Practice Summary
In summary normalization is typically used in two ways
To improve on relational schemas after ER design
To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies
There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance
This is the classic time versus space tradeoff
Conclusion
Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies
Normal forms indicate when relations satisfy certain properties
1NF - All attributes are atomic
2NF - All attributes are fully functionally dependent on a key
3NF - There are no transitive dependencies in the relation
4NF - There are no multi-valued dependencies in the relation
5NF - The relation has no join dependency
In practice normalization is used to improve schemas produced after ER design and existing relational schemas
It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time
Objectives
Explain process of normalization and why it is beneficial
Describe the concept of the Universal Relation
What are the motivations for normalization
What are the 3 types of update anomalies Be able to give examples
Describe the lossless-join property and dependency preservation property
Define and explain the concept of a functional dependency
What is a trivial FD
Describe the relationship between FDs and keys
What are Armstrongs Axioms When are they used
What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs
Objectives (2)
Define minimal and covers for a set of FDs
Define normal forms be able to list the normal forms and draw a diagram showing their relationship
Define 1NF Know how to convert non-1NF relation to 1NF
Define 2NF Convert 1NF relation to 2NF
Define 3NF Convert 2NF relation to 3NF
Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example
Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm
Describe multi-valued dependencies When do they occur
What is a trivial MVD
Define 4NF Convert 3NFBCNF relation to 4NF
Objectives (3)
Describe lossless-join property and how it relates to 5NF
Define 5NF
Explain normalization in terms of the time versus space tradeoff
Explain the relationship between normalization and ER modeling
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Slide 45
- Slide 46
- Slide 47
- Slide 48
- Slide 49
- Slide 50
- Slide 51
- Slide 52
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- Slide 60
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
-
Third Normal Form (3NF) Example
fd2 results in a transitive dependency eno rarr salary Remove it
ENo EName BDate Title Salary SuperNo DNo
Emp relation
fd1
fd2
fd2
Title Salary
Pay relation
ENo EName BDate Title SuperNo DNo
Emp relation
fd1
General Definitions of 2NF and 3NF
We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)
General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key
General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key
Note that a prime attribute is an attribute that is in any key (candidate or primary)
General Definition of 3NF Example
The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute
However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key
Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key
ENo EName BDate Title Salary SuperNo DNo SSN
Emp relation
fd1
fd2
Normalization Question
Consider the universal relation
R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ
List the keys for R
Decompose R into 2NF and then 3NF relations
Boyce-Codd Normal Form (BCNF)
A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key
To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key
The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
ClientInterview relation
Relation is in 3NF but not BCNF
fd4
fd3
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
Note that we lose the FD fd2
fd4
fd3
ClientNo InterviewDate
InterviwTime
StaffNo StaffNo InterviwDate
RoomNo
Interview relation StaffRoom relation
BCNF versus 3NF
We can decompose to BCNF but sometimes we do not want to if we lose a FD
The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency
Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)
In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)
Conversion to BCNF
There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y
Note that if any FD in F+ violates BCNF then a FD in F violates BCNF
2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey
3) Replace R by relations with schemas R1 = X+
R2 = (R ndash X+) U X
4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib
utes are all in R1 or R2
Normalization Question
Given this database schema normalize into BCNF
fd1
PID Country Lot Area Price TaxRate
fd2
fd4fd3
Multi-Valued Dependencies
A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema
A MVD occurs when two independent 1N relationships are in the relational schema
Multi-Valued Dependencies Example
Eno ename Title Salary Pno dno
E1 J Doe EE 30000 P1P2
E2 M Smith SA 50000 P1P4 D1D2D3
E3 A Lee ME 40000 P1P3 D2D3
E4 J Miller PR 20000 P3 D3
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1
E1 J Doe EE 30000 P2
E2 M Smith SA 50000 P1 D1
E2 M Smith SA 50000 P1 D2
E2 M Smith SA 50000 P1 D3
E2 M Smith SA 50000 P4 D1
E2 M Smith SA 50000 P4 D2
E2 M Smith SA 50000 P4 D3
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P3 D3
E3 A Lee ME 40000 P3 D3
E4 J Miller PR 20000 P3 D3
Multi-Valued Dependencies (MVDs)
A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C
A trivial MVD A mdashgtgt B occurs when either B is a subset of A or
A cup B = R
Fourth Normal Form (4NF)
Fourth normal form (4NF) is based on the idea of multi-valued dependencies
A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies
Fourth Normal Form (4NF) Example
eno mdashgtgtpno
eno mdashgtgtdno
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3
eno ename title salary
E1 J Doe EE 30000
E2 MSmith SA 50000
E3 A Lee ME 40000
E4 J Miller PR 20000
eno pno
E1 P1
E1 P2
E2 P1
E2 P4
E3 P1
E3 P3
E4 P3
eno dno
E2 D1
E2 D2
E2 D3
E3 D2
E3 D3
E4 D3
Lossless-join Dependency
The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation
A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined
Fifth Normal Form (5NF)
bull Fifth normal form (5NF) is based on join dependencies
A relation is in fifth normal form (5NF) if the relation has no join dependency
bull 5NF is also called project-join normal form (PJNF)
A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R
The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have
ΠR1(r) ΠR2(r) hellip ΠRn(r) = r
Fifth Normal Form (5NF) Example
Consider a relation Supply (sname partName projName)
1048715Add the additional constraint that
If project j requires part p
and supplier s supplies part p
and supplier s supplies at least one item to project j
Then supplier s also supplies part p to project j
Original Supply Relation Supply Relation with new constraintsname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
Fifth Normal Form (5NF) Example
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
sname partName
Smith Bolt
Smith Nut
Adam Bolt
Walton Nut
Adam Nail
sname projName
Smith X
Smith Y
Adam Y
Walton Z
Adam X
partName projName
Bolt X
Nut Y
Bolt Y
Nut Z
Nail X
Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples
Desirable Properties in normalization
Relational schemas that are well-designed have several
important properties
1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or
relationship
2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins
3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations
Normal Forms in Practice
Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used
For example query execution time may increase because of normalization as more joins become necessary to answer queries
Normal Forms in Practice Example
For example street and city uniquely determine a zipcode
In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed
When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly
fd1
ENo City Street ZipCode
fd2
ENo ZipCode
ZipCode City Street
Normal Forms in Practice Summary
In summary normalization is typically used in two ways
To improve on relational schemas after ER design
To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies
There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance
This is the classic time versus space tradeoff
Conclusion
Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies
Normal forms indicate when relations satisfy certain properties
1NF - All attributes are atomic
2NF - All attributes are fully functionally dependent on a key
3NF - There are no transitive dependencies in the relation
4NF - There are no multi-valued dependencies in the relation
5NF - The relation has no join dependency
In practice normalization is used to improve schemas produced after ER design and existing relational schemas
It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time
Objectives
Explain process of normalization and why it is beneficial
Describe the concept of the Universal Relation
What are the motivations for normalization
What are the 3 types of update anomalies Be able to give examples
Describe the lossless-join property and dependency preservation property
Define and explain the concept of a functional dependency
What is a trivial FD
Describe the relationship between FDs and keys
What are Armstrongs Axioms When are they used
What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs
Objectives (2)
Define minimal and covers for a set of FDs
Define normal forms be able to list the normal forms and draw a diagram showing their relationship
Define 1NF Know how to convert non-1NF relation to 1NF
Define 2NF Convert 1NF relation to 2NF
Define 3NF Convert 2NF relation to 3NF
Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example
Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm
Describe multi-valued dependencies When do they occur
What is a trivial MVD
Define 4NF Convert 3NFBCNF relation to 4NF
Objectives (3)
Describe lossless-join property and how it relates to 5NF
Define 5NF
Explain normalization in terms of the time versus space tradeoff
Explain the relationship between normalization and ER modeling
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Slide 45
- Slide 46
- Slide 47
- Slide 48
- Slide 49
- Slide 50
- Slide 51
- Slide 52
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- Slide 60
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
-
General Definitions of 2NF and 3NF
We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)
General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key
General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key
Note that a prime attribute is an attribute that is in any key (candidate or primary)
General Definition of 3NF Example
The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute
However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key
Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key
ENo EName BDate Title Salary SuperNo DNo SSN
Emp relation
fd1
fd2
Normalization Question
Consider the universal relation
R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ
List the keys for R
Decompose R into 2NF and then 3NF relations
Boyce-Codd Normal Form (BCNF)
A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key
To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key
The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
ClientInterview relation
Relation is in 3NF but not BCNF
fd4
fd3
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
Note that we lose the FD fd2
fd4
fd3
ClientNo InterviewDate
InterviwTime
StaffNo StaffNo InterviwDate
RoomNo
Interview relation StaffRoom relation
BCNF versus 3NF
We can decompose to BCNF but sometimes we do not want to if we lose a FD
The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency
Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)
In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)
Conversion to BCNF
There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y
Note that if any FD in F+ violates BCNF then a FD in F violates BCNF
2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey
3) Replace R by relations with schemas R1 = X+
R2 = (R ndash X+) U X
4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib
utes are all in R1 or R2
Normalization Question
Given this database schema normalize into BCNF
fd1
PID Country Lot Area Price TaxRate
fd2
fd4fd3
Multi-Valued Dependencies
A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema
A MVD occurs when two independent 1N relationships are in the relational schema
Multi-Valued Dependencies Example
Eno ename Title Salary Pno dno
E1 J Doe EE 30000 P1P2
E2 M Smith SA 50000 P1P4 D1D2D3
E3 A Lee ME 40000 P1P3 D2D3
E4 J Miller PR 20000 P3 D3
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1
E1 J Doe EE 30000 P2
E2 M Smith SA 50000 P1 D1
E2 M Smith SA 50000 P1 D2
E2 M Smith SA 50000 P1 D3
E2 M Smith SA 50000 P4 D1
E2 M Smith SA 50000 P4 D2
E2 M Smith SA 50000 P4 D3
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P3 D3
E3 A Lee ME 40000 P3 D3
E4 J Miller PR 20000 P3 D3
Multi-Valued Dependencies (MVDs)
A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C
A trivial MVD A mdashgtgt B occurs when either B is a subset of A or
A cup B = R
Fourth Normal Form (4NF)
Fourth normal form (4NF) is based on the idea of multi-valued dependencies
A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies
Fourth Normal Form (4NF) Example
eno mdashgtgtpno
eno mdashgtgtdno
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3
eno ename title salary
E1 J Doe EE 30000
E2 MSmith SA 50000
E3 A Lee ME 40000
E4 J Miller PR 20000
eno pno
E1 P1
E1 P2
E2 P1
E2 P4
E3 P1
E3 P3
E4 P3
eno dno
E2 D1
E2 D2
E2 D3
E3 D2
E3 D3
E4 D3
Lossless-join Dependency
The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation
A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined
Fifth Normal Form (5NF)
bull Fifth normal form (5NF) is based on join dependencies
A relation is in fifth normal form (5NF) if the relation has no join dependency
bull 5NF is also called project-join normal form (PJNF)
A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R
The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have
ΠR1(r) ΠR2(r) hellip ΠRn(r) = r
Fifth Normal Form (5NF) Example
Consider a relation Supply (sname partName projName)
1048715Add the additional constraint that
If project j requires part p
and supplier s supplies part p
and supplier s supplies at least one item to project j
Then supplier s also supplies part p to project j
Original Supply Relation Supply Relation with new constraintsname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
Fifth Normal Form (5NF) Example
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
sname partName
Smith Bolt
Smith Nut
Adam Bolt
Walton Nut
Adam Nail
sname projName
Smith X
Smith Y
Adam Y
Walton Z
Adam X
partName projName
Bolt X
Nut Y
Bolt Y
Nut Z
Nail X
Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples
Desirable Properties in normalization
Relational schemas that are well-designed have several
important properties
1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or
relationship
2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins
3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations
Normal Forms in Practice
Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used
For example query execution time may increase because of normalization as more joins become necessary to answer queries
Normal Forms in Practice Example
For example street and city uniquely determine a zipcode
In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed
When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly
fd1
ENo City Street ZipCode
fd2
ENo ZipCode
ZipCode City Street
Normal Forms in Practice Summary
In summary normalization is typically used in two ways
To improve on relational schemas after ER design
To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies
There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance
This is the classic time versus space tradeoff
Conclusion
Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies
Normal forms indicate when relations satisfy certain properties
1NF - All attributes are atomic
2NF - All attributes are fully functionally dependent on a key
3NF - There are no transitive dependencies in the relation
4NF - There are no multi-valued dependencies in the relation
5NF - The relation has no join dependency
In practice normalization is used to improve schemas produced after ER design and existing relational schemas
It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time
Objectives
Explain process of normalization and why it is beneficial
Describe the concept of the Universal Relation
What are the motivations for normalization
What are the 3 types of update anomalies Be able to give examples
Describe the lossless-join property and dependency preservation property
Define and explain the concept of a functional dependency
What is a trivial FD
Describe the relationship between FDs and keys
What are Armstrongs Axioms When are they used
What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs
Objectives (2)
Define minimal and covers for a set of FDs
Define normal forms be able to list the normal forms and draw a diagram showing their relationship
Define 1NF Know how to convert non-1NF relation to 1NF
Define 2NF Convert 1NF relation to 2NF
Define 3NF Convert 2NF relation to 3NF
Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example
Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm
Describe multi-valued dependencies When do they occur
What is a trivial MVD
Define 4NF Convert 3NFBCNF relation to 4NF
Objectives (3)
Describe lossless-join property and how it relates to 5NF
Define 5NF
Explain normalization in terms of the time versus space tradeoff
Explain the relationship between normalization and ER modeling
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Slide 45
- Slide 46
- Slide 47
- Slide 48
- Slide 49
- Slide 50
- Slide 51
- Slide 52
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- Slide 60
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
-
General Definition of 3NF Example
The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute
However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key
Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key
ENo EName BDate Title Salary SuperNo DNo SSN
Emp relation
fd1
fd2
Normalization Question
Consider the universal relation
R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ
List the keys for R
Decompose R into 2NF and then 3NF relations
Boyce-Codd Normal Form (BCNF)
A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key
To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key
The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
ClientInterview relation
Relation is in 3NF but not BCNF
fd4
fd3
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
Note that we lose the FD fd2
fd4
fd3
ClientNo InterviewDate
InterviwTime
StaffNo StaffNo InterviwDate
RoomNo
Interview relation StaffRoom relation
BCNF versus 3NF
We can decompose to BCNF but sometimes we do not want to if we lose a FD
The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency
Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)
In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)
Conversion to BCNF
There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y
Note that if any FD in F+ violates BCNF then a FD in F violates BCNF
2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey
3) Replace R by relations with schemas R1 = X+
R2 = (R ndash X+) U X
4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib
utes are all in R1 or R2
Normalization Question
Given this database schema normalize into BCNF
fd1
PID Country Lot Area Price TaxRate
fd2
fd4fd3
Multi-Valued Dependencies
A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema
A MVD occurs when two independent 1N relationships are in the relational schema
Multi-Valued Dependencies Example
Eno ename Title Salary Pno dno
E1 J Doe EE 30000 P1P2
E2 M Smith SA 50000 P1P4 D1D2D3
E3 A Lee ME 40000 P1P3 D2D3
E4 J Miller PR 20000 P3 D3
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1
E1 J Doe EE 30000 P2
E2 M Smith SA 50000 P1 D1
E2 M Smith SA 50000 P1 D2
E2 M Smith SA 50000 P1 D3
E2 M Smith SA 50000 P4 D1
E2 M Smith SA 50000 P4 D2
E2 M Smith SA 50000 P4 D3
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P3 D3
E3 A Lee ME 40000 P3 D3
E4 J Miller PR 20000 P3 D3
Multi-Valued Dependencies (MVDs)
A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C
A trivial MVD A mdashgtgt B occurs when either B is a subset of A or
A cup B = R
Fourth Normal Form (4NF)
Fourth normal form (4NF) is based on the idea of multi-valued dependencies
A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies
Fourth Normal Form (4NF) Example
eno mdashgtgtpno
eno mdashgtgtdno
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3
eno ename title salary
E1 J Doe EE 30000
E2 MSmith SA 50000
E3 A Lee ME 40000
E4 J Miller PR 20000
eno pno
E1 P1
E1 P2
E2 P1
E2 P4
E3 P1
E3 P3
E4 P3
eno dno
E2 D1
E2 D2
E2 D3
E3 D2
E3 D3
E4 D3
Lossless-join Dependency
The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation
A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined
Fifth Normal Form (5NF)
bull Fifth normal form (5NF) is based on join dependencies
A relation is in fifth normal form (5NF) if the relation has no join dependency
bull 5NF is also called project-join normal form (PJNF)
A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R
The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have
ΠR1(r) ΠR2(r) hellip ΠRn(r) = r
Fifth Normal Form (5NF) Example
Consider a relation Supply (sname partName projName)
1048715Add the additional constraint that
If project j requires part p
and supplier s supplies part p
and supplier s supplies at least one item to project j
Then supplier s also supplies part p to project j
Original Supply Relation Supply Relation with new constraintsname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
Fifth Normal Form (5NF) Example
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
sname partName
Smith Bolt
Smith Nut
Adam Bolt
Walton Nut
Adam Nail
sname projName
Smith X
Smith Y
Adam Y
Walton Z
Adam X
partName projName
Bolt X
Nut Y
Bolt Y
Nut Z
Nail X
Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples
Desirable Properties in normalization
Relational schemas that are well-designed have several
important properties
1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or
relationship
2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins
3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations
Normal Forms in Practice
Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used
For example query execution time may increase because of normalization as more joins become necessary to answer queries
Normal Forms in Practice Example
For example street and city uniquely determine a zipcode
In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed
When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly
fd1
ENo City Street ZipCode
fd2
ENo ZipCode
ZipCode City Street
Normal Forms in Practice Summary
In summary normalization is typically used in two ways
To improve on relational schemas after ER design
To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies
There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance
This is the classic time versus space tradeoff
Conclusion
Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies
Normal forms indicate when relations satisfy certain properties
1NF - All attributes are atomic
2NF - All attributes are fully functionally dependent on a key
3NF - There are no transitive dependencies in the relation
4NF - There are no multi-valued dependencies in the relation
5NF - The relation has no join dependency
In practice normalization is used to improve schemas produced after ER design and existing relational schemas
It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time
Objectives
Explain process of normalization and why it is beneficial
Describe the concept of the Universal Relation
What are the motivations for normalization
What are the 3 types of update anomalies Be able to give examples
Describe the lossless-join property and dependency preservation property
Define and explain the concept of a functional dependency
What is a trivial FD
Describe the relationship between FDs and keys
What are Armstrongs Axioms When are they used
What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs
Objectives (2)
Define minimal and covers for a set of FDs
Define normal forms be able to list the normal forms and draw a diagram showing their relationship
Define 1NF Know how to convert non-1NF relation to 1NF
Define 2NF Convert 1NF relation to 2NF
Define 3NF Convert 2NF relation to 3NF
Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example
Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm
Describe multi-valued dependencies When do they occur
What is a trivial MVD
Define 4NF Convert 3NFBCNF relation to 4NF
Objectives (3)
Describe lossless-join property and how it relates to 5NF
Define 5NF
Explain normalization in terms of the time versus space tradeoff
Explain the relationship between normalization and ER modeling
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Slide 45
- Slide 46
- Slide 47
- Slide 48
- Slide 49
- Slide 50
- Slide 51
- Slide 52
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- Slide 60
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
-
Normalization Question
Consider the universal relation
R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ
List the keys for R
Decompose R into 2NF and then 3NF relations
Boyce-Codd Normal Form (BCNF)
A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key
To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key
The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
ClientInterview relation
Relation is in 3NF but not BCNF
fd4
fd3
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
Note that we lose the FD fd2
fd4
fd3
ClientNo InterviewDate
InterviwTime
StaffNo StaffNo InterviwDate
RoomNo
Interview relation StaffRoom relation
BCNF versus 3NF
We can decompose to BCNF but sometimes we do not want to if we lose a FD
The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency
Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)
In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)
Conversion to BCNF
There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y
Note that if any FD in F+ violates BCNF then a FD in F violates BCNF
2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey
3) Replace R by relations with schemas R1 = X+
R2 = (R ndash X+) U X
4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib
utes are all in R1 or R2
Normalization Question
Given this database schema normalize into BCNF
fd1
PID Country Lot Area Price TaxRate
fd2
fd4fd3
Multi-Valued Dependencies
A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema
A MVD occurs when two independent 1N relationships are in the relational schema
Multi-Valued Dependencies Example
Eno ename Title Salary Pno dno
E1 J Doe EE 30000 P1P2
E2 M Smith SA 50000 P1P4 D1D2D3
E3 A Lee ME 40000 P1P3 D2D3
E4 J Miller PR 20000 P3 D3
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1
E1 J Doe EE 30000 P2
E2 M Smith SA 50000 P1 D1
E2 M Smith SA 50000 P1 D2
E2 M Smith SA 50000 P1 D3
E2 M Smith SA 50000 P4 D1
E2 M Smith SA 50000 P4 D2
E2 M Smith SA 50000 P4 D3
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P3 D3
E3 A Lee ME 40000 P3 D3
E4 J Miller PR 20000 P3 D3
Multi-Valued Dependencies (MVDs)
A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C
A trivial MVD A mdashgtgt B occurs when either B is a subset of A or
A cup B = R
Fourth Normal Form (4NF)
Fourth normal form (4NF) is based on the idea of multi-valued dependencies
A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies
Fourth Normal Form (4NF) Example
eno mdashgtgtpno
eno mdashgtgtdno
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3
eno ename title salary
E1 J Doe EE 30000
E2 MSmith SA 50000
E3 A Lee ME 40000
E4 J Miller PR 20000
eno pno
E1 P1
E1 P2
E2 P1
E2 P4
E3 P1
E3 P3
E4 P3
eno dno
E2 D1
E2 D2
E2 D3
E3 D2
E3 D3
E4 D3
Lossless-join Dependency
The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation
A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined
Fifth Normal Form (5NF)
bull Fifth normal form (5NF) is based on join dependencies
A relation is in fifth normal form (5NF) if the relation has no join dependency
bull 5NF is also called project-join normal form (PJNF)
A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R
The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have
ΠR1(r) ΠR2(r) hellip ΠRn(r) = r
Fifth Normal Form (5NF) Example
Consider a relation Supply (sname partName projName)
1048715Add the additional constraint that
If project j requires part p
and supplier s supplies part p
and supplier s supplies at least one item to project j
Then supplier s also supplies part p to project j
Original Supply Relation Supply Relation with new constraintsname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
Fifth Normal Form (5NF) Example
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
sname partName
Smith Bolt
Smith Nut
Adam Bolt
Walton Nut
Adam Nail
sname projName
Smith X
Smith Y
Adam Y
Walton Z
Adam X
partName projName
Bolt X
Nut Y
Bolt Y
Nut Z
Nail X
Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples
Desirable Properties in normalization
Relational schemas that are well-designed have several
important properties
1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or
relationship
2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins
3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations
Normal Forms in Practice
Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used
For example query execution time may increase because of normalization as more joins become necessary to answer queries
Normal Forms in Practice Example
For example street and city uniquely determine a zipcode
In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed
When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly
fd1
ENo City Street ZipCode
fd2
ENo ZipCode
ZipCode City Street
Normal Forms in Practice Summary
In summary normalization is typically used in two ways
To improve on relational schemas after ER design
To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies
There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance
This is the classic time versus space tradeoff
Conclusion
Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies
Normal forms indicate when relations satisfy certain properties
1NF - All attributes are atomic
2NF - All attributes are fully functionally dependent on a key
3NF - There are no transitive dependencies in the relation
4NF - There are no multi-valued dependencies in the relation
5NF - The relation has no join dependency
In practice normalization is used to improve schemas produced after ER design and existing relational schemas
It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time
Objectives
Explain process of normalization and why it is beneficial
Describe the concept of the Universal Relation
What are the motivations for normalization
What are the 3 types of update anomalies Be able to give examples
Describe the lossless-join property and dependency preservation property
Define and explain the concept of a functional dependency
What is a trivial FD
Describe the relationship between FDs and keys
What are Armstrongs Axioms When are they used
What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs
Objectives (2)
Define minimal and covers for a set of FDs
Define normal forms be able to list the normal forms and draw a diagram showing their relationship
Define 1NF Know how to convert non-1NF relation to 1NF
Define 2NF Convert 1NF relation to 2NF
Define 3NF Convert 2NF relation to 3NF
Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example
Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm
Describe multi-valued dependencies When do they occur
What is a trivial MVD
Define 4NF Convert 3NFBCNF relation to 4NF
Objectives (3)
Describe lossless-join property and how it relates to 5NF
Define 5NF
Explain normalization in terms of the time versus space tradeoff
Explain the relationship between normalization and ER modeling
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Slide 45
- Slide 46
- Slide 47
- Slide 48
- Slide 49
- Slide 50
- Slide 51
- Slide 52
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- Slide 60
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
-
Boyce-Codd Normal Form (BCNF)
A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key
To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key
The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
ClientInterview relation
Relation is in 3NF but not BCNF
fd4
fd3
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
Note that we lose the FD fd2
fd4
fd3
ClientNo InterviewDate
InterviwTime
StaffNo StaffNo InterviwDate
RoomNo
Interview relation StaffRoom relation
BCNF versus 3NF
We can decompose to BCNF but sometimes we do not want to if we lose a FD
The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency
Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)
In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)
Conversion to BCNF
There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y
Note that if any FD in F+ violates BCNF then a FD in F violates BCNF
2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey
3) Replace R by relations with schemas R1 = X+
R2 = (R ndash X+) U X
4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib
utes are all in R1 or R2
Normalization Question
Given this database schema normalize into BCNF
fd1
PID Country Lot Area Price TaxRate
fd2
fd4fd3
Multi-Valued Dependencies
A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema
A MVD occurs when two independent 1N relationships are in the relational schema
Multi-Valued Dependencies Example
Eno ename Title Salary Pno dno
E1 J Doe EE 30000 P1P2
E2 M Smith SA 50000 P1P4 D1D2D3
E3 A Lee ME 40000 P1P3 D2D3
E4 J Miller PR 20000 P3 D3
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1
E1 J Doe EE 30000 P2
E2 M Smith SA 50000 P1 D1
E2 M Smith SA 50000 P1 D2
E2 M Smith SA 50000 P1 D3
E2 M Smith SA 50000 P4 D1
E2 M Smith SA 50000 P4 D2
E2 M Smith SA 50000 P4 D3
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P3 D3
E3 A Lee ME 40000 P3 D3
E4 J Miller PR 20000 P3 D3
Multi-Valued Dependencies (MVDs)
A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C
A trivial MVD A mdashgtgt B occurs when either B is a subset of A or
A cup B = R
Fourth Normal Form (4NF)
Fourth normal form (4NF) is based on the idea of multi-valued dependencies
A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies
Fourth Normal Form (4NF) Example
eno mdashgtgtpno
eno mdashgtgtdno
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3
eno ename title salary
E1 J Doe EE 30000
E2 MSmith SA 50000
E3 A Lee ME 40000
E4 J Miller PR 20000
eno pno
E1 P1
E1 P2
E2 P1
E2 P4
E3 P1
E3 P3
E4 P3
eno dno
E2 D1
E2 D2
E2 D3
E3 D2
E3 D3
E4 D3
Lossless-join Dependency
The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation
A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined
Fifth Normal Form (5NF)
bull Fifth normal form (5NF) is based on join dependencies
A relation is in fifth normal form (5NF) if the relation has no join dependency
bull 5NF is also called project-join normal form (PJNF)
A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R
The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have
ΠR1(r) ΠR2(r) hellip ΠRn(r) = r
Fifth Normal Form (5NF) Example
Consider a relation Supply (sname partName projName)
1048715Add the additional constraint that
If project j requires part p
and supplier s supplies part p
and supplier s supplies at least one item to project j
Then supplier s also supplies part p to project j
Original Supply Relation Supply Relation with new constraintsname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
Fifth Normal Form (5NF) Example
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
sname partName
Smith Bolt
Smith Nut
Adam Bolt
Walton Nut
Adam Nail
sname projName
Smith X
Smith Y
Adam Y
Walton Z
Adam X
partName projName
Bolt X
Nut Y
Bolt Y
Nut Z
Nail X
Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples
Desirable Properties in normalization
Relational schemas that are well-designed have several
important properties
1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or
relationship
2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins
3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations
Normal Forms in Practice
Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used
For example query execution time may increase because of normalization as more joins become necessary to answer queries
Normal Forms in Practice Example
For example street and city uniquely determine a zipcode
In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed
When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly
fd1
ENo City Street ZipCode
fd2
ENo ZipCode
ZipCode City Street
Normal Forms in Practice Summary
In summary normalization is typically used in two ways
To improve on relational schemas after ER design
To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies
There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance
This is the classic time versus space tradeoff
Conclusion
Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies
Normal forms indicate when relations satisfy certain properties
1NF - All attributes are atomic
2NF - All attributes are fully functionally dependent on a key
3NF - There are no transitive dependencies in the relation
4NF - There are no multi-valued dependencies in the relation
5NF - The relation has no join dependency
In practice normalization is used to improve schemas produced after ER design and existing relational schemas
It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time
Objectives
Explain process of normalization and why it is beneficial
Describe the concept of the Universal Relation
What are the motivations for normalization
What are the 3 types of update anomalies Be able to give examples
Describe the lossless-join property and dependency preservation property
Define and explain the concept of a functional dependency
What is a trivial FD
Describe the relationship between FDs and keys
What are Armstrongs Axioms When are they used
What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs
Objectives (2)
Define minimal and covers for a set of FDs
Define normal forms be able to list the normal forms and draw a diagram showing their relationship
Define 1NF Know how to convert non-1NF relation to 1NF
Define 2NF Convert 1NF relation to 2NF
Define 3NF Convert 2NF relation to 3NF
Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example
Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm
Describe multi-valued dependencies When do they occur
What is a trivial MVD
Define 4NF Convert 3NFBCNF relation to 4NF
Objectives (3)
Describe lossless-join property and how it relates to 5NF
Define 5NF
Explain normalization in terms of the time versus space tradeoff
Explain the relationship between normalization and ER modeling
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Slide 45
- Slide 46
- Slide 47
- Slide 48
- Slide 49
- Slide 50
- Slide 51
- Slide 52
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- Slide 60
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
-
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
ClientInterview relation
Relation is in 3NF but not BCNF
fd4
fd3
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
Note that we lose the FD fd2
fd4
fd3
ClientNo InterviewDate
InterviwTime
StaffNo StaffNo InterviwDate
RoomNo
Interview relation StaffRoom relation
BCNF versus 3NF
We can decompose to BCNF but sometimes we do not want to if we lose a FD
The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency
Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)
In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)
Conversion to BCNF
There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y
Note that if any FD in F+ violates BCNF then a FD in F violates BCNF
2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey
3) Replace R by relations with schemas R1 = X+
R2 = (R ndash X+) U X
4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib
utes are all in R1 or R2
Normalization Question
Given this database schema normalize into BCNF
fd1
PID Country Lot Area Price TaxRate
fd2
fd4fd3
Multi-Valued Dependencies
A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema
A MVD occurs when two independent 1N relationships are in the relational schema
Multi-Valued Dependencies Example
Eno ename Title Salary Pno dno
E1 J Doe EE 30000 P1P2
E2 M Smith SA 50000 P1P4 D1D2D3
E3 A Lee ME 40000 P1P3 D2D3
E4 J Miller PR 20000 P3 D3
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1
E1 J Doe EE 30000 P2
E2 M Smith SA 50000 P1 D1
E2 M Smith SA 50000 P1 D2
E2 M Smith SA 50000 P1 D3
E2 M Smith SA 50000 P4 D1
E2 M Smith SA 50000 P4 D2
E2 M Smith SA 50000 P4 D3
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P3 D3
E3 A Lee ME 40000 P3 D3
E4 J Miller PR 20000 P3 D3
Multi-Valued Dependencies (MVDs)
A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C
A trivial MVD A mdashgtgt B occurs when either B is a subset of A or
A cup B = R
Fourth Normal Form (4NF)
Fourth normal form (4NF) is based on the idea of multi-valued dependencies
A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies
Fourth Normal Form (4NF) Example
eno mdashgtgtpno
eno mdashgtgtdno
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3
eno ename title salary
E1 J Doe EE 30000
E2 MSmith SA 50000
E3 A Lee ME 40000
E4 J Miller PR 20000
eno pno
E1 P1
E1 P2
E2 P1
E2 P4
E3 P1
E3 P3
E4 P3
eno dno
E2 D1
E2 D2
E2 D3
E3 D2
E3 D3
E4 D3
Lossless-join Dependency
The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation
A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined
Fifth Normal Form (5NF)
bull Fifth normal form (5NF) is based on join dependencies
A relation is in fifth normal form (5NF) if the relation has no join dependency
bull 5NF is also called project-join normal form (PJNF)
A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R
The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have
ΠR1(r) ΠR2(r) hellip ΠRn(r) = r
Fifth Normal Form (5NF) Example
Consider a relation Supply (sname partName projName)
1048715Add the additional constraint that
If project j requires part p
and supplier s supplies part p
and supplier s supplies at least one item to project j
Then supplier s also supplies part p to project j
Original Supply Relation Supply Relation with new constraintsname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
Fifth Normal Form (5NF) Example
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
sname partName
Smith Bolt
Smith Nut
Adam Bolt
Walton Nut
Adam Nail
sname projName
Smith X
Smith Y
Adam Y
Walton Z
Adam X
partName projName
Bolt X
Nut Y
Bolt Y
Nut Z
Nail X
Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples
Desirable Properties in normalization
Relational schemas that are well-designed have several
important properties
1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or
relationship
2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins
3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations
Normal Forms in Practice
Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used
For example query execution time may increase because of normalization as more joins become necessary to answer queries
Normal Forms in Practice Example
For example street and city uniquely determine a zipcode
In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed
When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly
fd1
ENo City Street ZipCode
fd2
ENo ZipCode
ZipCode City Street
Normal Forms in Practice Summary
In summary normalization is typically used in two ways
To improve on relational schemas after ER design
To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies
There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance
This is the classic time versus space tradeoff
Conclusion
Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies
Normal forms indicate when relations satisfy certain properties
1NF - All attributes are atomic
2NF - All attributes are fully functionally dependent on a key
3NF - There are no transitive dependencies in the relation
4NF - There are no multi-valued dependencies in the relation
5NF - The relation has no join dependency
In practice normalization is used to improve schemas produced after ER design and existing relational schemas
It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time
Objectives
Explain process of normalization and why it is beneficial
Describe the concept of the Universal Relation
What are the motivations for normalization
What are the 3 types of update anomalies Be able to give examples
Describe the lossless-join property and dependency preservation property
Define and explain the concept of a functional dependency
What is a trivial FD
Describe the relationship between FDs and keys
What are Armstrongs Axioms When are they used
What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs
Objectives (2)
Define minimal and covers for a set of FDs
Define normal forms be able to list the normal forms and draw a diagram showing their relationship
Define 1NF Know how to convert non-1NF relation to 1NF
Define 2NF Convert 1NF relation to 2NF
Define 3NF Convert 2NF relation to 3NF
Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example
Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm
Describe multi-valued dependencies When do they occur
What is a trivial MVD
Define 4NF Convert 3NFBCNF relation to 4NF
Objectives (3)
Describe lossless-join property and how it relates to 5NF
Define 5NF
Explain normalization in terms of the time versus space tradeoff
Explain the relationship between normalization and ER modeling
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Slide 45
- Slide 46
- Slide 47
- Slide 48
- Slide 49
- Slide 50
- Slide 51
- Slide 52
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- Slide 60
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
-
Boyce-Codd Normal Form (BCNF)Example
Consider the ClientInterview relation
fd1
ClientNo InterviewDate InterviwTime StaffNo RoomNo
fd2
Note that we lose the FD fd2
fd4
fd3
ClientNo InterviewDate
InterviwTime
StaffNo StaffNo InterviwDate
RoomNo
Interview relation StaffRoom relation
BCNF versus 3NF
We can decompose to BCNF but sometimes we do not want to if we lose a FD
The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency
Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)
In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)
Conversion to BCNF
There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y
Note that if any FD in F+ violates BCNF then a FD in F violates BCNF
2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey
3) Replace R by relations with schemas R1 = X+
R2 = (R ndash X+) U X
4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib
utes are all in R1 or R2
Normalization Question
Given this database schema normalize into BCNF
fd1
PID Country Lot Area Price TaxRate
fd2
fd4fd3
Multi-Valued Dependencies
A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema
A MVD occurs when two independent 1N relationships are in the relational schema
Multi-Valued Dependencies Example
Eno ename Title Salary Pno dno
E1 J Doe EE 30000 P1P2
E2 M Smith SA 50000 P1P4 D1D2D3
E3 A Lee ME 40000 P1P3 D2D3
E4 J Miller PR 20000 P3 D3
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1
E1 J Doe EE 30000 P2
E2 M Smith SA 50000 P1 D1
E2 M Smith SA 50000 P1 D2
E2 M Smith SA 50000 P1 D3
E2 M Smith SA 50000 P4 D1
E2 M Smith SA 50000 P4 D2
E2 M Smith SA 50000 P4 D3
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P3 D3
E3 A Lee ME 40000 P3 D3
E4 J Miller PR 20000 P3 D3
Multi-Valued Dependencies (MVDs)
A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C
A trivial MVD A mdashgtgt B occurs when either B is a subset of A or
A cup B = R
Fourth Normal Form (4NF)
Fourth normal form (4NF) is based on the idea of multi-valued dependencies
A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies
Fourth Normal Form (4NF) Example
eno mdashgtgtpno
eno mdashgtgtdno
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3
eno ename title salary
E1 J Doe EE 30000
E2 MSmith SA 50000
E3 A Lee ME 40000
E4 J Miller PR 20000
eno pno
E1 P1
E1 P2
E2 P1
E2 P4
E3 P1
E3 P3
E4 P3
eno dno
E2 D1
E2 D2
E2 D3
E3 D2
E3 D3
E4 D3
Lossless-join Dependency
The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation
A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined
Fifth Normal Form (5NF)
bull Fifth normal form (5NF) is based on join dependencies
A relation is in fifth normal form (5NF) if the relation has no join dependency
bull 5NF is also called project-join normal form (PJNF)
A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R
The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have
ΠR1(r) ΠR2(r) hellip ΠRn(r) = r
Fifth Normal Form (5NF) Example
Consider a relation Supply (sname partName projName)
1048715Add the additional constraint that
If project j requires part p
and supplier s supplies part p
and supplier s supplies at least one item to project j
Then supplier s also supplies part p to project j
Original Supply Relation Supply Relation with new constraintsname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
Fifth Normal Form (5NF) Example
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
sname partName
Smith Bolt
Smith Nut
Adam Bolt
Walton Nut
Adam Nail
sname projName
Smith X
Smith Y
Adam Y
Walton Z
Adam X
partName projName
Bolt X
Nut Y
Bolt Y
Nut Z
Nail X
Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples
Desirable Properties in normalization
Relational schemas that are well-designed have several
important properties
1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or
relationship
2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins
3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations
Normal Forms in Practice
Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used
For example query execution time may increase because of normalization as more joins become necessary to answer queries
Normal Forms in Practice Example
For example street and city uniquely determine a zipcode
In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed
When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly
fd1
ENo City Street ZipCode
fd2
ENo ZipCode
ZipCode City Street
Normal Forms in Practice Summary
In summary normalization is typically used in two ways
To improve on relational schemas after ER design
To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies
There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance
This is the classic time versus space tradeoff
Conclusion
Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies
Normal forms indicate when relations satisfy certain properties
1NF - All attributes are atomic
2NF - All attributes are fully functionally dependent on a key
3NF - There are no transitive dependencies in the relation
4NF - There are no multi-valued dependencies in the relation
5NF - The relation has no join dependency
In practice normalization is used to improve schemas produced after ER design and existing relational schemas
It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time
Objectives
Explain process of normalization and why it is beneficial
Describe the concept of the Universal Relation
What are the motivations for normalization
What are the 3 types of update anomalies Be able to give examples
Describe the lossless-join property and dependency preservation property
Define and explain the concept of a functional dependency
What is a trivial FD
Describe the relationship between FDs and keys
What are Armstrongs Axioms When are they used
What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs
Objectives (2)
Define minimal and covers for a set of FDs
Define normal forms be able to list the normal forms and draw a diagram showing their relationship
Define 1NF Know how to convert non-1NF relation to 1NF
Define 2NF Convert 1NF relation to 2NF
Define 3NF Convert 2NF relation to 3NF
Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example
Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm
Describe multi-valued dependencies When do they occur
What is a trivial MVD
Define 4NF Convert 3NFBCNF relation to 4NF
Objectives (3)
Describe lossless-join property and how it relates to 5NF
Define 5NF
Explain normalization in terms of the time versus space tradeoff
Explain the relationship between normalization and ER modeling
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Slide 45
- Slide 46
- Slide 47
- Slide 48
- Slide 49
- Slide 50
- Slide 51
- Slide 52
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- Slide 60
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
-
BCNF versus 3NF
We can decompose to BCNF but sometimes we do not want to if we lose a FD
The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency
Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)
In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)
Conversion to BCNF
There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y
Note that if any FD in F+ violates BCNF then a FD in F violates BCNF
2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey
3) Replace R by relations with schemas R1 = X+
R2 = (R ndash X+) U X
4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib
utes are all in R1 or R2
Normalization Question
Given this database schema normalize into BCNF
fd1
PID Country Lot Area Price TaxRate
fd2
fd4fd3
Multi-Valued Dependencies
A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema
A MVD occurs when two independent 1N relationships are in the relational schema
Multi-Valued Dependencies Example
Eno ename Title Salary Pno dno
E1 J Doe EE 30000 P1P2
E2 M Smith SA 50000 P1P4 D1D2D3
E3 A Lee ME 40000 P1P3 D2D3
E4 J Miller PR 20000 P3 D3
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1
E1 J Doe EE 30000 P2
E2 M Smith SA 50000 P1 D1
E2 M Smith SA 50000 P1 D2
E2 M Smith SA 50000 P1 D3
E2 M Smith SA 50000 P4 D1
E2 M Smith SA 50000 P4 D2
E2 M Smith SA 50000 P4 D3
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P3 D3
E3 A Lee ME 40000 P3 D3
E4 J Miller PR 20000 P3 D3
Multi-Valued Dependencies (MVDs)
A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C
A trivial MVD A mdashgtgt B occurs when either B is a subset of A or
A cup B = R
Fourth Normal Form (4NF)
Fourth normal form (4NF) is based on the idea of multi-valued dependencies
A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies
Fourth Normal Form (4NF) Example
eno mdashgtgtpno
eno mdashgtgtdno
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3
eno ename title salary
E1 J Doe EE 30000
E2 MSmith SA 50000
E3 A Lee ME 40000
E4 J Miller PR 20000
eno pno
E1 P1
E1 P2
E2 P1
E2 P4
E3 P1
E3 P3
E4 P3
eno dno
E2 D1
E2 D2
E2 D3
E3 D2
E3 D3
E4 D3
Lossless-join Dependency
The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation
A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined
Fifth Normal Form (5NF)
bull Fifth normal form (5NF) is based on join dependencies
A relation is in fifth normal form (5NF) if the relation has no join dependency
bull 5NF is also called project-join normal form (PJNF)
A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R
The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have
ΠR1(r) ΠR2(r) hellip ΠRn(r) = r
Fifth Normal Form (5NF) Example
Consider a relation Supply (sname partName projName)
1048715Add the additional constraint that
If project j requires part p
and supplier s supplies part p
and supplier s supplies at least one item to project j
Then supplier s also supplies part p to project j
Original Supply Relation Supply Relation with new constraintsname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
Fifth Normal Form (5NF) Example
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
sname partName
Smith Bolt
Smith Nut
Adam Bolt
Walton Nut
Adam Nail
sname projName
Smith X
Smith Y
Adam Y
Walton Z
Adam X
partName projName
Bolt X
Nut Y
Bolt Y
Nut Z
Nail X
Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples
Desirable Properties in normalization
Relational schemas that are well-designed have several
important properties
1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or
relationship
2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins
3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations
Normal Forms in Practice
Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used
For example query execution time may increase because of normalization as more joins become necessary to answer queries
Normal Forms in Practice Example
For example street and city uniquely determine a zipcode
In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed
When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly
fd1
ENo City Street ZipCode
fd2
ENo ZipCode
ZipCode City Street
Normal Forms in Practice Summary
In summary normalization is typically used in two ways
To improve on relational schemas after ER design
To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies
There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance
This is the classic time versus space tradeoff
Conclusion
Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies
Normal forms indicate when relations satisfy certain properties
1NF - All attributes are atomic
2NF - All attributes are fully functionally dependent on a key
3NF - There are no transitive dependencies in the relation
4NF - There are no multi-valued dependencies in the relation
5NF - The relation has no join dependency
In practice normalization is used to improve schemas produced after ER design and existing relational schemas
It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time
Objectives
Explain process of normalization and why it is beneficial
Describe the concept of the Universal Relation
What are the motivations for normalization
What are the 3 types of update anomalies Be able to give examples
Describe the lossless-join property and dependency preservation property
Define and explain the concept of a functional dependency
What is a trivial FD
Describe the relationship between FDs and keys
What are Armstrongs Axioms When are they used
What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs
Objectives (2)
Define minimal and covers for a set of FDs
Define normal forms be able to list the normal forms and draw a diagram showing their relationship
Define 1NF Know how to convert non-1NF relation to 1NF
Define 2NF Convert 1NF relation to 2NF
Define 3NF Convert 2NF relation to 3NF
Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example
Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm
Describe multi-valued dependencies When do they occur
What is a trivial MVD
Define 4NF Convert 3NFBCNF relation to 4NF
Objectives (3)
Describe lossless-join property and how it relates to 5NF
Define 5NF
Explain normalization in terms of the time versus space tradeoff
Explain the relationship between normalization and ER modeling
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Slide 45
- Slide 46
- Slide 47
- Slide 48
- Slide 49
- Slide 50
- Slide 51
- Slide 52
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- Slide 60
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
-
Conversion to BCNF
There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y
Note that if any FD in F+ violates BCNF then a FD in F violates BCNF
2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey
3) Replace R by relations with schemas R1 = X+
R2 = (R ndash X+) U X
4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib
utes are all in R1 or R2
Normalization Question
Given this database schema normalize into BCNF
fd1
PID Country Lot Area Price TaxRate
fd2
fd4fd3
Multi-Valued Dependencies
A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema
A MVD occurs when two independent 1N relationships are in the relational schema
Multi-Valued Dependencies Example
Eno ename Title Salary Pno dno
E1 J Doe EE 30000 P1P2
E2 M Smith SA 50000 P1P4 D1D2D3
E3 A Lee ME 40000 P1P3 D2D3
E4 J Miller PR 20000 P3 D3
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1
E1 J Doe EE 30000 P2
E2 M Smith SA 50000 P1 D1
E2 M Smith SA 50000 P1 D2
E2 M Smith SA 50000 P1 D3
E2 M Smith SA 50000 P4 D1
E2 M Smith SA 50000 P4 D2
E2 M Smith SA 50000 P4 D3
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P3 D3
E3 A Lee ME 40000 P3 D3
E4 J Miller PR 20000 P3 D3
Multi-Valued Dependencies (MVDs)
A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C
A trivial MVD A mdashgtgt B occurs when either B is a subset of A or
A cup B = R
Fourth Normal Form (4NF)
Fourth normal form (4NF) is based on the idea of multi-valued dependencies
A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies
Fourth Normal Form (4NF) Example
eno mdashgtgtpno
eno mdashgtgtdno
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3
eno ename title salary
E1 J Doe EE 30000
E2 MSmith SA 50000
E3 A Lee ME 40000
E4 J Miller PR 20000
eno pno
E1 P1
E1 P2
E2 P1
E2 P4
E3 P1
E3 P3
E4 P3
eno dno
E2 D1
E2 D2
E2 D3
E3 D2
E3 D3
E4 D3
Lossless-join Dependency
The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation
A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined
Fifth Normal Form (5NF)
bull Fifth normal form (5NF) is based on join dependencies
A relation is in fifth normal form (5NF) if the relation has no join dependency
bull 5NF is also called project-join normal form (PJNF)
A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R
The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have
ΠR1(r) ΠR2(r) hellip ΠRn(r) = r
Fifth Normal Form (5NF) Example
Consider a relation Supply (sname partName projName)
1048715Add the additional constraint that
If project j requires part p
and supplier s supplies part p
and supplier s supplies at least one item to project j
Then supplier s also supplies part p to project j
Original Supply Relation Supply Relation with new constraintsname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
Fifth Normal Form (5NF) Example
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
sname partName
Smith Bolt
Smith Nut
Adam Bolt
Walton Nut
Adam Nail
sname projName
Smith X
Smith Y
Adam Y
Walton Z
Adam X
partName projName
Bolt X
Nut Y
Bolt Y
Nut Z
Nail X
Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples
Desirable Properties in normalization
Relational schemas that are well-designed have several
important properties
1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or
relationship
2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins
3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations
Normal Forms in Practice
Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used
For example query execution time may increase because of normalization as more joins become necessary to answer queries
Normal Forms in Practice Example
For example street and city uniquely determine a zipcode
In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed
When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly
fd1
ENo City Street ZipCode
fd2
ENo ZipCode
ZipCode City Street
Normal Forms in Practice Summary
In summary normalization is typically used in two ways
To improve on relational schemas after ER design
To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies
There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance
This is the classic time versus space tradeoff
Conclusion
Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies
Normal forms indicate when relations satisfy certain properties
1NF - All attributes are atomic
2NF - All attributes are fully functionally dependent on a key
3NF - There are no transitive dependencies in the relation
4NF - There are no multi-valued dependencies in the relation
5NF - The relation has no join dependency
In practice normalization is used to improve schemas produced after ER design and existing relational schemas
It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time
Objectives
Explain process of normalization and why it is beneficial
Describe the concept of the Universal Relation
What are the motivations for normalization
What are the 3 types of update anomalies Be able to give examples
Describe the lossless-join property and dependency preservation property
Define and explain the concept of a functional dependency
What is a trivial FD
Describe the relationship between FDs and keys
What are Armstrongs Axioms When are they used
What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs
Objectives (2)
Define minimal and covers for a set of FDs
Define normal forms be able to list the normal forms and draw a diagram showing their relationship
Define 1NF Know how to convert non-1NF relation to 1NF
Define 2NF Convert 1NF relation to 2NF
Define 3NF Convert 2NF relation to 3NF
Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example
Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm
Describe multi-valued dependencies When do they occur
What is a trivial MVD
Define 4NF Convert 3NFBCNF relation to 4NF
Objectives (3)
Describe lossless-join property and how it relates to 5NF
Define 5NF
Explain normalization in terms of the time versus space tradeoff
Explain the relationship between normalization and ER modeling
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Slide 45
- Slide 46
- Slide 47
- Slide 48
- Slide 49
- Slide 50
- Slide 51
- Slide 52
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- Slide 60
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
-
Normalization Question
Given this database schema normalize into BCNF
fd1
PID Country Lot Area Price TaxRate
fd2
fd4fd3
Multi-Valued Dependencies
A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema
A MVD occurs when two independent 1N relationships are in the relational schema
Multi-Valued Dependencies Example
Eno ename Title Salary Pno dno
E1 J Doe EE 30000 P1P2
E2 M Smith SA 50000 P1P4 D1D2D3
E3 A Lee ME 40000 P1P3 D2D3
E4 J Miller PR 20000 P3 D3
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1
E1 J Doe EE 30000 P2
E2 M Smith SA 50000 P1 D1
E2 M Smith SA 50000 P1 D2
E2 M Smith SA 50000 P1 D3
E2 M Smith SA 50000 P4 D1
E2 M Smith SA 50000 P4 D2
E2 M Smith SA 50000 P4 D3
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P3 D3
E3 A Lee ME 40000 P3 D3
E4 J Miller PR 20000 P3 D3
Multi-Valued Dependencies (MVDs)
A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C
A trivial MVD A mdashgtgt B occurs when either B is a subset of A or
A cup B = R
Fourth Normal Form (4NF)
Fourth normal form (4NF) is based on the idea of multi-valued dependencies
A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies
Fourth Normal Form (4NF) Example
eno mdashgtgtpno
eno mdashgtgtdno
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3
eno ename title salary
E1 J Doe EE 30000
E2 MSmith SA 50000
E3 A Lee ME 40000
E4 J Miller PR 20000
eno pno
E1 P1
E1 P2
E2 P1
E2 P4
E3 P1
E3 P3
E4 P3
eno dno
E2 D1
E2 D2
E2 D3
E3 D2
E3 D3
E4 D3
Lossless-join Dependency
The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation
A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined
Fifth Normal Form (5NF)
bull Fifth normal form (5NF) is based on join dependencies
A relation is in fifth normal form (5NF) if the relation has no join dependency
bull 5NF is also called project-join normal form (PJNF)
A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R
The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have
ΠR1(r) ΠR2(r) hellip ΠRn(r) = r
Fifth Normal Form (5NF) Example
Consider a relation Supply (sname partName projName)
1048715Add the additional constraint that
If project j requires part p
and supplier s supplies part p
and supplier s supplies at least one item to project j
Then supplier s also supplies part p to project j
Original Supply Relation Supply Relation with new constraintsname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
Fifth Normal Form (5NF) Example
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
sname partName
Smith Bolt
Smith Nut
Adam Bolt
Walton Nut
Adam Nail
sname projName
Smith X
Smith Y
Adam Y
Walton Z
Adam X
partName projName
Bolt X
Nut Y
Bolt Y
Nut Z
Nail X
Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples
Desirable Properties in normalization
Relational schemas that are well-designed have several
important properties
1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or
relationship
2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins
3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations
Normal Forms in Practice
Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used
For example query execution time may increase because of normalization as more joins become necessary to answer queries
Normal Forms in Practice Example
For example street and city uniquely determine a zipcode
In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed
When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly
fd1
ENo City Street ZipCode
fd2
ENo ZipCode
ZipCode City Street
Normal Forms in Practice Summary
In summary normalization is typically used in two ways
To improve on relational schemas after ER design
To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies
There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance
This is the classic time versus space tradeoff
Conclusion
Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies
Normal forms indicate when relations satisfy certain properties
1NF - All attributes are atomic
2NF - All attributes are fully functionally dependent on a key
3NF - There are no transitive dependencies in the relation
4NF - There are no multi-valued dependencies in the relation
5NF - The relation has no join dependency
In practice normalization is used to improve schemas produced after ER design and existing relational schemas
It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time
Objectives
Explain process of normalization and why it is beneficial
Describe the concept of the Universal Relation
What are the motivations for normalization
What are the 3 types of update anomalies Be able to give examples
Describe the lossless-join property and dependency preservation property
Define and explain the concept of a functional dependency
What is a trivial FD
Describe the relationship between FDs and keys
What are Armstrongs Axioms When are they used
What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs
Objectives (2)
Define minimal and covers for a set of FDs
Define normal forms be able to list the normal forms and draw a diagram showing their relationship
Define 1NF Know how to convert non-1NF relation to 1NF
Define 2NF Convert 1NF relation to 2NF
Define 3NF Convert 2NF relation to 3NF
Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example
Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm
Describe multi-valued dependencies When do they occur
What is a trivial MVD
Define 4NF Convert 3NFBCNF relation to 4NF
Objectives (3)
Describe lossless-join property and how it relates to 5NF
Define 5NF
Explain normalization in terms of the time versus space tradeoff
Explain the relationship between normalization and ER modeling
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Slide 45
- Slide 46
- Slide 47
- Slide 48
- Slide 49
- Slide 50
- Slide 51
- Slide 52
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- Slide 60
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
-
Multi-Valued Dependencies
A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema
A MVD occurs when two independent 1N relationships are in the relational schema
Multi-Valued Dependencies Example
Eno ename Title Salary Pno dno
E1 J Doe EE 30000 P1P2
E2 M Smith SA 50000 P1P4 D1D2D3
E3 A Lee ME 40000 P1P3 D2D3
E4 J Miller PR 20000 P3 D3
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1
E1 J Doe EE 30000 P2
E2 M Smith SA 50000 P1 D1
E2 M Smith SA 50000 P1 D2
E2 M Smith SA 50000 P1 D3
E2 M Smith SA 50000 P4 D1
E2 M Smith SA 50000 P4 D2
E2 M Smith SA 50000 P4 D3
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P3 D3
E3 A Lee ME 40000 P3 D3
E4 J Miller PR 20000 P3 D3
Multi-Valued Dependencies (MVDs)
A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C
A trivial MVD A mdashgtgt B occurs when either B is a subset of A or
A cup B = R
Fourth Normal Form (4NF)
Fourth normal form (4NF) is based on the idea of multi-valued dependencies
A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies
Fourth Normal Form (4NF) Example
eno mdashgtgtpno
eno mdashgtgtdno
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3
eno ename title salary
E1 J Doe EE 30000
E2 MSmith SA 50000
E3 A Lee ME 40000
E4 J Miller PR 20000
eno pno
E1 P1
E1 P2
E2 P1
E2 P4
E3 P1
E3 P3
E4 P3
eno dno
E2 D1
E2 D2
E2 D3
E3 D2
E3 D3
E4 D3
Lossless-join Dependency
The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation
A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined
Fifth Normal Form (5NF)
bull Fifth normal form (5NF) is based on join dependencies
A relation is in fifth normal form (5NF) if the relation has no join dependency
bull 5NF is also called project-join normal form (PJNF)
A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R
The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have
ΠR1(r) ΠR2(r) hellip ΠRn(r) = r
Fifth Normal Form (5NF) Example
Consider a relation Supply (sname partName projName)
1048715Add the additional constraint that
If project j requires part p
and supplier s supplies part p
and supplier s supplies at least one item to project j
Then supplier s also supplies part p to project j
Original Supply Relation Supply Relation with new constraintsname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
Fifth Normal Form (5NF) Example
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
sname partName
Smith Bolt
Smith Nut
Adam Bolt
Walton Nut
Adam Nail
sname projName
Smith X
Smith Y
Adam Y
Walton Z
Adam X
partName projName
Bolt X
Nut Y
Bolt Y
Nut Z
Nail X
Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples
Desirable Properties in normalization
Relational schemas that are well-designed have several
important properties
1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or
relationship
2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins
3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations
Normal Forms in Practice
Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used
For example query execution time may increase because of normalization as more joins become necessary to answer queries
Normal Forms in Practice Example
For example street and city uniquely determine a zipcode
In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed
When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly
fd1
ENo City Street ZipCode
fd2
ENo ZipCode
ZipCode City Street
Normal Forms in Practice Summary
In summary normalization is typically used in two ways
To improve on relational schemas after ER design
To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies
There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance
This is the classic time versus space tradeoff
Conclusion
Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies
Normal forms indicate when relations satisfy certain properties
1NF - All attributes are atomic
2NF - All attributes are fully functionally dependent on a key
3NF - There are no transitive dependencies in the relation
4NF - There are no multi-valued dependencies in the relation
5NF - The relation has no join dependency
In practice normalization is used to improve schemas produced after ER design and existing relational schemas
It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time
Objectives
Explain process of normalization and why it is beneficial
Describe the concept of the Universal Relation
What are the motivations for normalization
What are the 3 types of update anomalies Be able to give examples
Describe the lossless-join property and dependency preservation property
Define and explain the concept of a functional dependency
What is a trivial FD
Describe the relationship between FDs and keys
What are Armstrongs Axioms When are they used
What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs
Objectives (2)
Define minimal and covers for a set of FDs
Define normal forms be able to list the normal forms and draw a diagram showing their relationship
Define 1NF Know how to convert non-1NF relation to 1NF
Define 2NF Convert 1NF relation to 2NF
Define 3NF Convert 2NF relation to 3NF
Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example
Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm
Describe multi-valued dependencies When do they occur
What is a trivial MVD
Define 4NF Convert 3NFBCNF relation to 4NF
Objectives (3)
Describe lossless-join property and how it relates to 5NF
Define 5NF
Explain normalization in terms of the time versus space tradeoff
Explain the relationship between normalization and ER modeling
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Slide 45
- Slide 46
- Slide 47
- Slide 48
- Slide 49
- Slide 50
- Slide 51
- Slide 52
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- Slide 60
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
-
Multi-Valued Dependencies Example
Eno ename Title Salary Pno dno
E1 J Doe EE 30000 P1P2
E2 M Smith SA 50000 P1P4 D1D2D3
E3 A Lee ME 40000 P1P3 D2D3
E4 J Miller PR 20000 P3 D3
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1
E1 J Doe EE 30000 P2
E2 M Smith SA 50000 P1 D1
E2 M Smith SA 50000 P1 D2
E2 M Smith SA 50000 P1 D3
E2 M Smith SA 50000 P4 D1
E2 M Smith SA 50000 P4 D2
E2 M Smith SA 50000 P4 D3
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P1 D2
E3 A Lee ME 40000 P3 D3
E3 A Lee ME 40000 P3 D3
E4 J Miller PR 20000 P3 D3
Multi-Valued Dependencies (MVDs)
A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C
A trivial MVD A mdashgtgt B occurs when either B is a subset of A or
A cup B = R
Fourth Normal Form (4NF)
Fourth normal form (4NF) is based on the idea of multi-valued dependencies
A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies
Fourth Normal Form (4NF) Example
eno mdashgtgtpno
eno mdashgtgtdno
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3
eno ename title salary
E1 J Doe EE 30000
E2 MSmith SA 50000
E3 A Lee ME 40000
E4 J Miller PR 20000
eno pno
E1 P1
E1 P2
E2 P1
E2 P4
E3 P1
E3 P3
E4 P3
eno dno
E2 D1
E2 D2
E2 D3
E3 D2
E3 D3
E4 D3
Lossless-join Dependency
The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation
A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined
Fifth Normal Form (5NF)
bull Fifth normal form (5NF) is based on join dependencies
A relation is in fifth normal form (5NF) if the relation has no join dependency
bull 5NF is also called project-join normal form (PJNF)
A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R
The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have
ΠR1(r) ΠR2(r) hellip ΠRn(r) = r
Fifth Normal Form (5NF) Example
Consider a relation Supply (sname partName projName)
1048715Add the additional constraint that
If project j requires part p
and supplier s supplies part p
and supplier s supplies at least one item to project j
Then supplier s also supplies part p to project j
Original Supply Relation Supply Relation with new constraintsname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
Fifth Normal Form (5NF) Example
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
sname partName
Smith Bolt
Smith Nut
Adam Bolt
Walton Nut
Adam Nail
sname projName
Smith X
Smith Y
Adam Y
Walton Z
Adam X
partName projName
Bolt X
Nut Y
Bolt Y
Nut Z
Nail X
Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples
Desirable Properties in normalization
Relational schemas that are well-designed have several
important properties
1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or
relationship
2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins
3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations
Normal Forms in Practice
Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used
For example query execution time may increase because of normalization as more joins become necessary to answer queries
Normal Forms in Practice Example
For example street and city uniquely determine a zipcode
In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed
When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly
fd1
ENo City Street ZipCode
fd2
ENo ZipCode
ZipCode City Street
Normal Forms in Practice Summary
In summary normalization is typically used in two ways
To improve on relational schemas after ER design
To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies
There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance
This is the classic time versus space tradeoff
Conclusion
Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies
Normal forms indicate when relations satisfy certain properties
1NF - All attributes are atomic
2NF - All attributes are fully functionally dependent on a key
3NF - There are no transitive dependencies in the relation
4NF - There are no multi-valued dependencies in the relation
5NF - The relation has no join dependency
In practice normalization is used to improve schemas produced after ER design and existing relational schemas
It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time
Objectives
Explain process of normalization and why it is beneficial
Describe the concept of the Universal Relation
What are the motivations for normalization
What are the 3 types of update anomalies Be able to give examples
Describe the lossless-join property and dependency preservation property
Define and explain the concept of a functional dependency
What is a trivial FD
Describe the relationship between FDs and keys
What are Armstrongs Axioms When are they used
What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs
Objectives (2)
Define minimal and covers for a set of FDs
Define normal forms be able to list the normal forms and draw a diagram showing their relationship
Define 1NF Know how to convert non-1NF relation to 1NF
Define 2NF Convert 1NF relation to 2NF
Define 3NF Convert 2NF relation to 3NF
Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example
Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm
Describe multi-valued dependencies When do they occur
What is a trivial MVD
Define 4NF Convert 3NFBCNF relation to 4NF
Objectives (3)
Describe lossless-join property and how it relates to 5NF
Define 5NF
Explain normalization in terms of the time versus space tradeoff
Explain the relationship between normalization and ER modeling
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Slide 45
- Slide 46
- Slide 47
- Slide 48
- Slide 49
- Slide 50
- Slide 51
- Slide 52
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- Slide 60
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
-
Multi-Valued Dependencies (MVDs)
A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C
A trivial MVD A mdashgtgt B occurs when either B is a subset of A or
A cup B = R
Fourth Normal Form (4NF)
Fourth normal form (4NF) is based on the idea of multi-valued dependencies
A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies
Fourth Normal Form (4NF) Example
eno mdashgtgtpno
eno mdashgtgtdno
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3
eno ename title salary
E1 J Doe EE 30000
E2 MSmith SA 50000
E3 A Lee ME 40000
E4 J Miller PR 20000
eno pno
E1 P1
E1 P2
E2 P1
E2 P4
E3 P1
E3 P3
E4 P3
eno dno
E2 D1
E2 D2
E2 D3
E3 D2
E3 D3
E4 D3
Lossless-join Dependency
The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation
A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined
Fifth Normal Form (5NF)
bull Fifth normal form (5NF) is based on join dependencies
A relation is in fifth normal form (5NF) if the relation has no join dependency
bull 5NF is also called project-join normal form (PJNF)
A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R
The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have
ΠR1(r) ΠR2(r) hellip ΠRn(r) = r
Fifth Normal Form (5NF) Example
Consider a relation Supply (sname partName projName)
1048715Add the additional constraint that
If project j requires part p
and supplier s supplies part p
and supplier s supplies at least one item to project j
Then supplier s also supplies part p to project j
Original Supply Relation Supply Relation with new constraintsname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
Fifth Normal Form (5NF) Example
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
sname partName
Smith Bolt
Smith Nut
Adam Bolt
Walton Nut
Adam Nail
sname projName
Smith X
Smith Y
Adam Y
Walton Z
Adam X
partName projName
Bolt X
Nut Y
Bolt Y
Nut Z
Nail X
Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples
Desirable Properties in normalization
Relational schemas that are well-designed have several
important properties
1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or
relationship
2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins
3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations
Normal Forms in Practice
Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used
For example query execution time may increase because of normalization as more joins become necessary to answer queries
Normal Forms in Practice Example
For example street and city uniquely determine a zipcode
In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed
When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly
fd1
ENo City Street ZipCode
fd2
ENo ZipCode
ZipCode City Street
Normal Forms in Practice Summary
In summary normalization is typically used in two ways
To improve on relational schemas after ER design
To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies
There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance
This is the classic time versus space tradeoff
Conclusion
Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies
Normal forms indicate when relations satisfy certain properties
1NF - All attributes are atomic
2NF - All attributes are fully functionally dependent on a key
3NF - There are no transitive dependencies in the relation
4NF - There are no multi-valued dependencies in the relation
5NF - The relation has no join dependency
In practice normalization is used to improve schemas produced after ER design and existing relational schemas
It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time
Objectives
Explain process of normalization and why it is beneficial
Describe the concept of the Universal Relation
What are the motivations for normalization
What are the 3 types of update anomalies Be able to give examples
Describe the lossless-join property and dependency preservation property
Define and explain the concept of a functional dependency
What is a trivial FD
Describe the relationship between FDs and keys
What are Armstrongs Axioms When are they used
What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs
Objectives (2)
Define minimal and covers for a set of FDs
Define normal forms be able to list the normal forms and draw a diagram showing their relationship
Define 1NF Know how to convert non-1NF relation to 1NF
Define 2NF Convert 1NF relation to 2NF
Define 3NF Convert 2NF relation to 3NF
Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example
Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm
Describe multi-valued dependencies When do they occur
What is a trivial MVD
Define 4NF Convert 3NFBCNF relation to 4NF
Objectives (3)
Describe lossless-join property and how it relates to 5NF
Define 5NF
Explain normalization in terms of the time versus space tradeoff
Explain the relationship between normalization and ER modeling
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Slide 45
- Slide 46
- Slide 47
- Slide 48
- Slide 49
- Slide 50
- Slide 51
- Slide 52
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- Slide 60
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
-
Fourth Normal Form (4NF)
Fourth normal form (4NF) is based on the idea of multi-valued dependencies
A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies
Fourth Normal Form (4NF) Example
eno mdashgtgtpno
eno mdashgtgtdno
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3
eno ename title salary
E1 J Doe EE 30000
E2 MSmith SA 50000
E3 A Lee ME 40000
E4 J Miller PR 20000
eno pno
E1 P1
E1 P2
E2 P1
E2 P4
E3 P1
E3 P3
E4 P3
eno dno
E2 D1
E2 D2
E2 D3
E3 D2
E3 D3
E4 D3
Lossless-join Dependency
The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation
A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined
Fifth Normal Form (5NF)
bull Fifth normal form (5NF) is based on join dependencies
A relation is in fifth normal form (5NF) if the relation has no join dependency
bull 5NF is also called project-join normal form (PJNF)
A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R
The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have
ΠR1(r) ΠR2(r) hellip ΠRn(r) = r
Fifth Normal Form (5NF) Example
Consider a relation Supply (sname partName projName)
1048715Add the additional constraint that
If project j requires part p
and supplier s supplies part p
and supplier s supplies at least one item to project j
Then supplier s also supplies part p to project j
Original Supply Relation Supply Relation with new constraintsname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
Fifth Normal Form (5NF) Example
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
sname partName
Smith Bolt
Smith Nut
Adam Bolt
Walton Nut
Adam Nail
sname projName
Smith X
Smith Y
Adam Y
Walton Z
Adam X
partName projName
Bolt X
Nut Y
Bolt Y
Nut Z
Nail X
Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples
Desirable Properties in normalization
Relational schemas that are well-designed have several
important properties
1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or
relationship
2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins
3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations
Normal Forms in Practice
Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used
For example query execution time may increase because of normalization as more joins become necessary to answer queries
Normal Forms in Practice Example
For example street and city uniquely determine a zipcode
In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed
When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly
fd1
ENo City Street ZipCode
fd2
ENo ZipCode
ZipCode City Street
Normal Forms in Practice Summary
In summary normalization is typically used in two ways
To improve on relational schemas after ER design
To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies
There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance
This is the classic time versus space tradeoff
Conclusion
Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies
Normal forms indicate when relations satisfy certain properties
1NF - All attributes are atomic
2NF - All attributes are fully functionally dependent on a key
3NF - There are no transitive dependencies in the relation
4NF - There are no multi-valued dependencies in the relation
5NF - The relation has no join dependency
In practice normalization is used to improve schemas produced after ER design and existing relational schemas
It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time
Objectives
Explain process of normalization and why it is beneficial
Describe the concept of the Universal Relation
What are the motivations for normalization
What are the 3 types of update anomalies Be able to give examples
Describe the lossless-join property and dependency preservation property
Define and explain the concept of a functional dependency
What is a trivial FD
Describe the relationship between FDs and keys
What are Armstrongs Axioms When are they used
What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs
Objectives (2)
Define minimal and covers for a set of FDs
Define normal forms be able to list the normal forms and draw a diagram showing their relationship
Define 1NF Know how to convert non-1NF relation to 1NF
Define 2NF Convert 1NF relation to 2NF
Define 3NF Convert 2NF relation to 3NF
Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example
Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm
Describe multi-valued dependencies When do they occur
What is a trivial MVD
Define 4NF Convert 3NFBCNF relation to 4NF
Objectives (3)
Describe lossless-join property and how it relates to 5NF
Define 5NF
Explain normalization in terms of the time versus space tradeoff
Explain the relationship between normalization and ER modeling
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Slide 45
- Slide 46
- Slide 47
- Slide 48
- Slide 49
- Slide 50
- Slide 51
- Slide 52
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- Slide 60
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
-
Fourth Normal Form (4NF) Example
eno mdashgtgtpno
eno mdashgtgtdno
Eno Ename Title Salary Pno dno
E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3
eno ename title salary
E1 J Doe EE 30000
E2 MSmith SA 50000
E3 A Lee ME 40000
E4 J Miller PR 20000
eno pno
E1 P1
E1 P2
E2 P1
E2 P4
E3 P1
E3 P3
E4 P3
eno dno
E2 D1
E2 D2
E2 D3
E3 D2
E3 D3
E4 D3
Lossless-join Dependency
The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation
A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined
Fifth Normal Form (5NF)
bull Fifth normal form (5NF) is based on join dependencies
A relation is in fifth normal form (5NF) if the relation has no join dependency
bull 5NF is also called project-join normal form (PJNF)
A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R
The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have
ΠR1(r) ΠR2(r) hellip ΠRn(r) = r
Fifth Normal Form (5NF) Example
Consider a relation Supply (sname partName projName)
1048715Add the additional constraint that
If project j requires part p
and supplier s supplies part p
and supplier s supplies at least one item to project j
Then supplier s also supplies part p to project j
Original Supply Relation Supply Relation with new constraintsname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
Fifth Normal Form (5NF) Example
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
sname partName
Smith Bolt
Smith Nut
Adam Bolt
Walton Nut
Adam Nail
sname projName
Smith X
Smith Y
Adam Y
Walton Z
Adam X
partName projName
Bolt X
Nut Y
Bolt Y
Nut Z
Nail X
Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples
Desirable Properties in normalization
Relational schemas that are well-designed have several
important properties
1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or
relationship
2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins
3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations
Normal Forms in Practice
Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used
For example query execution time may increase because of normalization as more joins become necessary to answer queries
Normal Forms in Practice Example
For example street and city uniquely determine a zipcode
In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed
When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly
fd1
ENo City Street ZipCode
fd2
ENo ZipCode
ZipCode City Street
Normal Forms in Practice Summary
In summary normalization is typically used in two ways
To improve on relational schemas after ER design
To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies
There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance
This is the classic time versus space tradeoff
Conclusion
Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies
Normal forms indicate when relations satisfy certain properties
1NF - All attributes are atomic
2NF - All attributes are fully functionally dependent on a key
3NF - There are no transitive dependencies in the relation
4NF - There are no multi-valued dependencies in the relation
5NF - The relation has no join dependency
In practice normalization is used to improve schemas produced after ER design and existing relational schemas
It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time
Objectives
Explain process of normalization and why it is beneficial
Describe the concept of the Universal Relation
What are the motivations for normalization
What are the 3 types of update anomalies Be able to give examples
Describe the lossless-join property and dependency preservation property
Define and explain the concept of a functional dependency
What is a trivial FD
Describe the relationship between FDs and keys
What are Armstrongs Axioms When are they used
What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs
Objectives (2)
Define minimal and covers for a set of FDs
Define normal forms be able to list the normal forms and draw a diagram showing their relationship
Define 1NF Know how to convert non-1NF relation to 1NF
Define 2NF Convert 1NF relation to 2NF
Define 3NF Convert 2NF relation to 3NF
Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example
Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm
Describe multi-valued dependencies When do they occur
What is a trivial MVD
Define 4NF Convert 3NFBCNF relation to 4NF
Objectives (3)
Describe lossless-join property and how it relates to 5NF
Define 5NF
Explain normalization in terms of the time versus space tradeoff
Explain the relationship between normalization and ER modeling
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Slide 45
- Slide 46
- Slide 47
- Slide 48
- Slide 49
- Slide 50
- Slide 51
- Slide 52
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- Slide 60
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
-
Lossless-join Dependency
The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation
A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined
Fifth Normal Form (5NF)
bull Fifth normal form (5NF) is based on join dependencies
A relation is in fifth normal form (5NF) if the relation has no join dependency
bull 5NF is also called project-join normal form (PJNF)
A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R
The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have
ΠR1(r) ΠR2(r) hellip ΠRn(r) = r
Fifth Normal Form (5NF) Example
Consider a relation Supply (sname partName projName)
1048715Add the additional constraint that
If project j requires part p
and supplier s supplies part p
and supplier s supplies at least one item to project j
Then supplier s also supplies part p to project j
Original Supply Relation Supply Relation with new constraintsname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
Fifth Normal Form (5NF) Example
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
sname partName
Smith Bolt
Smith Nut
Adam Bolt
Walton Nut
Adam Nail
sname projName
Smith X
Smith Y
Adam Y
Walton Z
Adam X
partName projName
Bolt X
Nut Y
Bolt Y
Nut Z
Nail X
Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples
Desirable Properties in normalization
Relational schemas that are well-designed have several
important properties
1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or
relationship
2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins
3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations
Normal Forms in Practice
Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used
For example query execution time may increase because of normalization as more joins become necessary to answer queries
Normal Forms in Practice Example
For example street and city uniquely determine a zipcode
In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed
When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly
fd1
ENo City Street ZipCode
fd2
ENo ZipCode
ZipCode City Street
Normal Forms in Practice Summary
In summary normalization is typically used in two ways
To improve on relational schemas after ER design
To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies
There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance
This is the classic time versus space tradeoff
Conclusion
Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies
Normal forms indicate when relations satisfy certain properties
1NF - All attributes are atomic
2NF - All attributes are fully functionally dependent on a key
3NF - There are no transitive dependencies in the relation
4NF - There are no multi-valued dependencies in the relation
5NF - The relation has no join dependency
In practice normalization is used to improve schemas produced after ER design and existing relational schemas
It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time
Objectives
Explain process of normalization and why it is beneficial
Describe the concept of the Universal Relation
What are the motivations for normalization
What are the 3 types of update anomalies Be able to give examples
Describe the lossless-join property and dependency preservation property
Define and explain the concept of a functional dependency
What is a trivial FD
Describe the relationship between FDs and keys
What are Armstrongs Axioms When are they used
What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs
Objectives (2)
Define minimal and covers for a set of FDs
Define normal forms be able to list the normal forms and draw a diagram showing their relationship
Define 1NF Know how to convert non-1NF relation to 1NF
Define 2NF Convert 1NF relation to 2NF
Define 3NF Convert 2NF relation to 3NF
Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example
Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm
Describe multi-valued dependencies When do they occur
What is a trivial MVD
Define 4NF Convert 3NFBCNF relation to 4NF
Objectives (3)
Describe lossless-join property and how it relates to 5NF
Define 5NF
Explain normalization in terms of the time versus space tradeoff
Explain the relationship between normalization and ER modeling
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Slide 45
- Slide 46
- Slide 47
- Slide 48
- Slide 49
- Slide 50
- Slide 51
- Slide 52
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- Slide 60
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
-
Fifth Normal Form (5NF)
bull Fifth normal form (5NF) is based on join dependencies
A relation is in fifth normal form (5NF) if the relation has no join dependency
bull 5NF is also called project-join normal form (PJNF)
A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R
The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have
ΠR1(r) ΠR2(r) hellip ΠRn(r) = r
Fifth Normal Form (5NF) Example
Consider a relation Supply (sname partName projName)
1048715Add the additional constraint that
If project j requires part p
and supplier s supplies part p
and supplier s supplies at least one item to project j
Then supplier s also supplies part p to project j
Original Supply Relation Supply Relation with new constraintsname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
Fifth Normal Form (5NF) Example
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
sname partName
Smith Bolt
Smith Nut
Adam Bolt
Walton Nut
Adam Nail
sname projName
Smith X
Smith Y
Adam Y
Walton Z
Adam X
partName projName
Bolt X
Nut Y
Bolt Y
Nut Z
Nail X
Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples
Desirable Properties in normalization
Relational schemas that are well-designed have several
important properties
1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or
relationship
2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins
3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations
Normal Forms in Practice
Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used
For example query execution time may increase because of normalization as more joins become necessary to answer queries
Normal Forms in Practice Example
For example street and city uniquely determine a zipcode
In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed
When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly
fd1
ENo City Street ZipCode
fd2
ENo ZipCode
ZipCode City Street
Normal Forms in Practice Summary
In summary normalization is typically used in two ways
To improve on relational schemas after ER design
To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies
There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance
This is the classic time versus space tradeoff
Conclusion
Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies
Normal forms indicate when relations satisfy certain properties
1NF - All attributes are atomic
2NF - All attributes are fully functionally dependent on a key
3NF - There are no transitive dependencies in the relation
4NF - There are no multi-valued dependencies in the relation
5NF - The relation has no join dependency
In practice normalization is used to improve schemas produced after ER design and existing relational schemas
It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time
Objectives
Explain process of normalization and why it is beneficial
Describe the concept of the Universal Relation
What are the motivations for normalization
What are the 3 types of update anomalies Be able to give examples
Describe the lossless-join property and dependency preservation property
Define and explain the concept of a functional dependency
What is a trivial FD
Describe the relationship between FDs and keys
What are Armstrongs Axioms When are they used
What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs
Objectives (2)
Define minimal and covers for a set of FDs
Define normal forms be able to list the normal forms and draw a diagram showing their relationship
Define 1NF Know how to convert non-1NF relation to 1NF
Define 2NF Convert 1NF relation to 2NF
Define 3NF Convert 2NF relation to 3NF
Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example
Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm
Describe multi-valued dependencies When do they occur
What is a trivial MVD
Define 4NF Convert 3NFBCNF relation to 4NF
Objectives (3)
Describe lossless-join property and how it relates to 5NF
Define 5NF
Explain normalization in terms of the time versus space tradeoff
Explain the relationship between normalization and ER modeling
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Slide 45
- Slide 46
- Slide 47
- Slide 48
- Slide 49
- Slide 50
- Slide 51
- Slide 52
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- Slide 60
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
-
Fifth Normal Form (5NF) Example
Consider a relation Supply (sname partName projName)
1048715Add the additional constraint that
If project j requires part p
and supplier s supplies part p
and supplier s supplies at least one item to project j
Then supplier s also supplies part p to project j
Original Supply Relation Supply Relation with new constraintsname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
Fifth Normal Form (5NF) Example
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
sname partName
Smith Bolt
Smith Nut
Adam Bolt
Walton Nut
Adam Nail
sname projName
Smith X
Smith Y
Adam Y
Walton Z
Adam X
partName projName
Bolt X
Nut Y
Bolt Y
Nut Z
Nail X
Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples
Desirable Properties in normalization
Relational schemas that are well-designed have several
important properties
1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or
relationship
2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins
3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations
Normal Forms in Practice
Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used
For example query execution time may increase because of normalization as more joins become necessary to answer queries
Normal Forms in Practice Example
For example street and city uniquely determine a zipcode
In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed
When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly
fd1
ENo City Street ZipCode
fd2
ENo ZipCode
ZipCode City Street
Normal Forms in Practice Summary
In summary normalization is typically used in two ways
To improve on relational schemas after ER design
To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies
There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance
This is the classic time versus space tradeoff
Conclusion
Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies
Normal forms indicate when relations satisfy certain properties
1NF - All attributes are atomic
2NF - All attributes are fully functionally dependent on a key
3NF - There are no transitive dependencies in the relation
4NF - There are no multi-valued dependencies in the relation
5NF - The relation has no join dependency
In practice normalization is used to improve schemas produced after ER design and existing relational schemas
It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time
Objectives
Explain process of normalization and why it is beneficial
Describe the concept of the Universal Relation
What are the motivations for normalization
What are the 3 types of update anomalies Be able to give examples
Describe the lossless-join property and dependency preservation property
Define and explain the concept of a functional dependency
What is a trivial FD
Describe the relationship between FDs and keys
What are Armstrongs Axioms When are they used
What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs
Objectives (2)
Define minimal and covers for a set of FDs
Define normal forms be able to list the normal forms and draw a diagram showing their relationship
Define 1NF Know how to convert non-1NF relation to 1NF
Define 2NF Convert 1NF relation to 2NF
Define 3NF Convert 2NF relation to 3NF
Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example
Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm
Describe multi-valued dependencies When do they occur
What is a trivial MVD
Define 4NF Convert 3NFBCNF relation to 4NF
Objectives (3)
Describe lossless-join property and how it relates to 5NF
Define 5NF
Explain normalization in terms of the time versus space tradeoff
Explain the relationship between normalization and ER modeling
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Slide 45
- Slide 46
- Slide 47
- Slide 48
- Slide 49
- Slide 50
- Slide 51
- Slide 52
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- Slide 60
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
-
Fifth Normal Form (5NF) Example
sname partName projName
Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y
sname partName
Smith Bolt
Smith Nut
Adam Bolt
Walton Nut
Adam Nail
sname projName
Smith X
Smith Y
Adam Y
Walton Z
Adam X
partName projName
Bolt X
Nut Y
Bolt Y
Nut Z
Nail X
Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples
Desirable Properties in normalization
Relational schemas that are well-designed have several
important properties
1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or
relationship
2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins
3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations
Normal Forms in Practice
Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used
For example query execution time may increase because of normalization as more joins become necessary to answer queries
Normal Forms in Practice Example
For example street and city uniquely determine a zipcode
In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed
When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly
fd1
ENo City Street ZipCode
fd2
ENo ZipCode
ZipCode City Street
Normal Forms in Practice Summary
In summary normalization is typically used in two ways
To improve on relational schemas after ER design
To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies
There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance
This is the classic time versus space tradeoff
Conclusion
Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies
Normal forms indicate when relations satisfy certain properties
1NF - All attributes are atomic
2NF - All attributes are fully functionally dependent on a key
3NF - There are no transitive dependencies in the relation
4NF - There are no multi-valued dependencies in the relation
5NF - The relation has no join dependency
In practice normalization is used to improve schemas produced after ER design and existing relational schemas
It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time
Objectives
Explain process of normalization and why it is beneficial
Describe the concept of the Universal Relation
What are the motivations for normalization
What are the 3 types of update anomalies Be able to give examples
Describe the lossless-join property and dependency preservation property
Define and explain the concept of a functional dependency
What is a trivial FD
Describe the relationship between FDs and keys
What are Armstrongs Axioms When are they used
What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs
Objectives (2)
Define minimal and covers for a set of FDs
Define normal forms be able to list the normal forms and draw a diagram showing their relationship
Define 1NF Know how to convert non-1NF relation to 1NF
Define 2NF Convert 1NF relation to 2NF
Define 3NF Convert 2NF relation to 3NF
Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example
Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm
Describe multi-valued dependencies When do they occur
What is a trivial MVD
Define 4NF Convert 3NFBCNF relation to 4NF
Objectives (3)
Describe lossless-join property and how it relates to 5NF
Define 5NF
Explain normalization in terms of the time versus space tradeoff
Explain the relationship between normalization and ER modeling
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Slide 45
- Slide 46
- Slide 47
- Slide 48
- Slide 49
- Slide 50
- Slide 51
- Slide 52
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- Slide 60
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
-
Desirable Properties in normalization
Relational schemas that are well-designed have several
important properties
1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or
relationship
2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins
3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations
Normal Forms in Practice
Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used
For example query execution time may increase because of normalization as more joins become necessary to answer queries
Normal Forms in Practice Example
For example street and city uniquely determine a zipcode
In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed
When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly
fd1
ENo City Street ZipCode
fd2
ENo ZipCode
ZipCode City Street
Normal Forms in Practice Summary
In summary normalization is typically used in two ways
To improve on relational schemas after ER design
To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies
There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance
This is the classic time versus space tradeoff
Conclusion
Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies
Normal forms indicate when relations satisfy certain properties
1NF - All attributes are atomic
2NF - All attributes are fully functionally dependent on a key
3NF - There are no transitive dependencies in the relation
4NF - There are no multi-valued dependencies in the relation
5NF - The relation has no join dependency
In practice normalization is used to improve schemas produced after ER design and existing relational schemas
It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time
Objectives
Explain process of normalization and why it is beneficial
Describe the concept of the Universal Relation
What are the motivations for normalization
What are the 3 types of update anomalies Be able to give examples
Describe the lossless-join property and dependency preservation property
Define and explain the concept of a functional dependency
What is a trivial FD
Describe the relationship between FDs and keys
What are Armstrongs Axioms When are they used
What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs
Objectives (2)
Define minimal and covers for a set of FDs
Define normal forms be able to list the normal forms and draw a diagram showing their relationship
Define 1NF Know how to convert non-1NF relation to 1NF
Define 2NF Convert 1NF relation to 2NF
Define 3NF Convert 2NF relation to 3NF
Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example
Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm
Describe multi-valued dependencies When do they occur
What is a trivial MVD
Define 4NF Convert 3NFBCNF relation to 4NF
Objectives (3)
Describe lossless-join property and how it relates to 5NF
Define 5NF
Explain normalization in terms of the time versus space tradeoff
Explain the relationship between normalization and ER modeling
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Slide 45
- Slide 46
- Slide 47
- Slide 48
- Slide 49
- Slide 50
- Slide 51
- Slide 52
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- Slide 60
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
-
Normal Forms in Practice
Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used
For example query execution time may increase because of normalization as more joins become necessary to answer queries
Normal Forms in Practice Example
For example street and city uniquely determine a zipcode
In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed
When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly
fd1
ENo City Street ZipCode
fd2
ENo ZipCode
ZipCode City Street
Normal Forms in Practice Summary
In summary normalization is typically used in two ways
To improve on relational schemas after ER design
To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies
There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance
This is the classic time versus space tradeoff
Conclusion
Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies
Normal forms indicate when relations satisfy certain properties
1NF - All attributes are atomic
2NF - All attributes are fully functionally dependent on a key
3NF - There are no transitive dependencies in the relation
4NF - There are no multi-valued dependencies in the relation
5NF - The relation has no join dependency
In practice normalization is used to improve schemas produced after ER design and existing relational schemas
It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time
Objectives
Explain process of normalization and why it is beneficial
Describe the concept of the Universal Relation
What are the motivations for normalization
What are the 3 types of update anomalies Be able to give examples
Describe the lossless-join property and dependency preservation property
Define and explain the concept of a functional dependency
What is a trivial FD
Describe the relationship between FDs and keys
What are Armstrongs Axioms When are they used
What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs
Objectives (2)
Define minimal and covers for a set of FDs
Define normal forms be able to list the normal forms and draw a diagram showing their relationship
Define 1NF Know how to convert non-1NF relation to 1NF
Define 2NF Convert 1NF relation to 2NF
Define 3NF Convert 2NF relation to 3NF
Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example
Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm
Describe multi-valued dependencies When do they occur
What is a trivial MVD
Define 4NF Convert 3NFBCNF relation to 4NF
Objectives (3)
Describe lossless-join property and how it relates to 5NF
Define 5NF
Explain normalization in terms of the time versus space tradeoff
Explain the relationship between normalization and ER modeling
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Slide 45
- Slide 46
- Slide 47
- Slide 48
- Slide 49
- Slide 50
- Slide 51
- Slide 52
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- Slide 60
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
-
Normal Forms in Practice Example
For example street and city uniquely determine a zipcode
In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed
When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly
fd1
ENo City Street ZipCode
fd2
ENo ZipCode
ZipCode City Street
Normal Forms in Practice Summary
In summary normalization is typically used in two ways
To improve on relational schemas after ER design
To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies
There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance
This is the classic time versus space tradeoff
Conclusion
Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies
Normal forms indicate when relations satisfy certain properties
1NF - All attributes are atomic
2NF - All attributes are fully functionally dependent on a key
3NF - There are no transitive dependencies in the relation
4NF - There are no multi-valued dependencies in the relation
5NF - The relation has no join dependency
In practice normalization is used to improve schemas produced after ER design and existing relational schemas
It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time
Objectives
Explain process of normalization and why it is beneficial
Describe the concept of the Universal Relation
What are the motivations for normalization
What are the 3 types of update anomalies Be able to give examples
Describe the lossless-join property and dependency preservation property
Define and explain the concept of a functional dependency
What is a trivial FD
Describe the relationship between FDs and keys
What are Armstrongs Axioms When are they used
What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs
Objectives (2)
Define minimal and covers for a set of FDs
Define normal forms be able to list the normal forms and draw a diagram showing their relationship
Define 1NF Know how to convert non-1NF relation to 1NF
Define 2NF Convert 1NF relation to 2NF
Define 3NF Convert 2NF relation to 3NF
Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example
Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm
Describe multi-valued dependencies When do they occur
What is a trivial MVD
Define 4NF Convert 3NFBCNF relation to 4NF
Objectives (3)
Describe lossless-join property and how it relates to 5NF
Define 5NF
Explain normalization in terms of the time versus space tradeoff
Explain the relationship between normalization and ER modeling
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Slide 45
- Slide 46
- Slide 47
- Slide 48
- Slide 49
- Slide 50
- Slide 51
- Slide 52
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- Slide 60
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
-
Normal Forms in Practice Summary
In summary normalization is typically used in two ways
To improve on relational schemas after ER design
To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies
There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance
This is the classic time versus space tradeoff
Conclusion
Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies
Normal forms indicate when relations satisfy certain properties
1NF - All attributes are atomic
2NF - All attributes are fully functionally dependent on a key
3NF - There are no transitive dependencies in the relation
4NF - There are no multi-valued dependencies in the relation
5NF - The relation has no join dependency
In practice normalization is used to improve schemas produced after ER design and existing relational schemas
It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time
Objectives
Explain process of normalization and why it is beneficial
Describe the concept of the Universal Relation
What are the motivations for normalization
What are the 3 types of update anomalies Be able to give examples
Describe the lossless-join property and dependency preservation property
Define and explain the concept of a functional dependency
What is a trivial FD
Describe the relationship between FDs and keys
What are Armstrongs Axioms When are they used
What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs
Objectives (2)
Define minimal and covers for a set of FDs
Define normal forms be able to list the normal forms and draw a diagram showing their relationship
Define 1NF Know how to convert non-1NF relation to 1NF
Define 2NF Convert 1NF relation to 2NF
Define 3NF Convert 2NF relation to 3NF
Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example
Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm
Describe multi-valued dependencies When do they occur
What is a trivial MVD
Define 4NF Convert 3NFBCNF relation to 4NF
Objectives (3)
Describe lossless-join property and how it relates to 5NF
Define 5NF
Explain normalization in terms of the time versus space tradeoff
Explain the relationship between normalization and ER modeling
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Slide 45
- Slide 46
- Slide 47
- Slide 48
- Slide 49
- Slide 50
- Slide 51
- Slide 52
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- Slide 60
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
-
Conclusion
Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies
Normal forms indicate when relations satisfy certain properties
1NF - All attributes are atomic
2NF - All attributes are fully functionally dependent on a key
3NF - There are no transitive dependencies in the relation
4NF - There are no multi-valued dependencies in the relation
5NF - The relation has no join dependency
In practice normalization is used to improve schemas produced after ER design and existing relational schemas
It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time
Objectives
Explain process of normalization and why it is beneficial
Describe the concept of the Universal Relation
What are the motivations for normalization
What are the 3 types of update anomalies Be able to give examples
Describe the lossless-join property and dependency preservation property
Define and explain the concept of a functional dependency
What is a trivial FD
Describe the relationship between FDs and keys
What are Armstrongs Axioms When are they used
What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs
Objectives (2)
Define minimal and covers for a set of FDs
Define normal forms be able to list the normal forms and draw a diagram showing their relationship
Define 1NF Know how to convert non-1NF relation to 1NF
Define 2NF Convert 1NF relation to 2NF
Define 3NF Convert 2NF relation to 3NF
Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example
Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm
Describe multi-valued dependencies When do they occur
What is a trivial MVD
Define 4NF Convert 3NFBCNF relation to 4NF
Objectives (3)
Describe lossless-join property and how it relates to 5NF
Define 5NF
Explain normalization in terms of the time versus space tradeoff
Explain the relationship between normalization and ER modeling
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Slide 45
- Slide 46
- Slide 47
- Slide 48
- Slide 49
- Slide 50
- Slide 51
- Slide 52
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- Slide 60
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
-
Objectives
Explain process of normalization and why it is beneficial
Describe the concept of the Universal Relation
What are the motivations for normalization
What are the 3 types of update anomalies Be able to give examples
Describe the lossless-join property and dependency preservation property
Define and explain the concept of a functional dependency
What is a trivial FD
Describe the relationship between FDs and keys
What are Armstrongs Axioms When are they used
What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs
Objectives (2)
Define minimal and covers for a set of FDs
Define normal forms be able to list the normal forms and draw a diagram showing their relationship
Define 1NF Know how to convert non-1NF relation to 1NF
Define 2NF Convert 1NF relation to 2NF
Define 3NF Convert 2NF relation to 3NF
Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example
Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm
Describe multi-valued dependencies When do they occur
What is a trivial MVD
Define 4NF Convert 3NFBCNF relation to 4NF
Objectives (3)
Describe lossless-join property and how it relates to 5NF
Define 5NF
Explain normalization in terms of the time versus space tradeoff
Explain the relationship between normalization and ER modeling
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Slide 45
- Slide 46
- Slide 47
- Slide 48
- Slide 49
- Slide 50
- Slide 51
- Slide 52
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- Slide 60
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
-
Objectives (2)
Define minimal and covers for a set of FDs
Define normal forms be able to list the normal forms and draw a diagram showing their relationship
Define 1NF Know how to convert non-1NF relation to 1NF
Define 2NF Convert 1NF relation to 2NF
Define 3NF Convert 2NF relation to 3NF
Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example
Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm
Describe multi-valued dependencies When do they occur
What is a trivial MVD
Define 4NF Convert 3NFBCNF relation to 4NF
Objectives (3)
Describe lossless-join property and how it relates to 5NF
Define 5NF
Explain normalization in terms of the time versus space tradeoff
Explain the relationship between normalization and ER modeling
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Slide 45
- Slide 46
- Slide 47
- Slide 48
- Slide 49
- Slide 50
- Slide 51
- Slide 52
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- Slide 60
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
-
Objectives (3)
Describe lossless-join property and how it relates to 5NF
Define 5NF
Explain normalization in terms of the time versus space tradeoff
Explain the relationship between normalization and ER modeling
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Slide 45
- Slide 46
- Slide 47
- Slide 48
- Slide 49
- Slide 50
- Slide 51
- Slide 52
- Slide 53
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
- Slide 59
- Slide 60
- Slide 61
- Slide 62
- Slide 63
- Slide 64
- Slide 65
- Slide 66
- Slide 67
- Slide 68
- Slide 69
- Slide 70
- Slide 71
- Slide 72
- Slide 73
- Slide 74
- Slide 75
- Slide 76
- Slide 77
- Slide 78
- Slide 79
-