software school of hunan university 2006.08 database systems design part iii normalization

79
Software School of Hunan University 2006.08 Database Systems Design Part III Normalization

Upload: maria-hudson

Post on 20-Jan-2016

218 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Software School of Hunan University 2006.08 Database Systems Design Part III Normalization

Software School of Hunan University

200608

Database Systems Design Part III

Normalization

Database Design Flow Diagram

Individual Part 1Individual Part 1

Intelligence Identify Summary AbstractDeduce Refine

Individual Part nIndividual Part n

ER model ER model

Relations Relations Rational Relations

Rational Relations

User view 1User view 1

User view nUser view n

Transformation

Normalization

Requirements analyzing

Objectives

To answer

What is normalization (规范化) of relation

Why is normalization is necessary

What is general principle ( 基本原则 ) of normalization

What is the theory basic ( 理论基础 ) of normalization

How to normalize

Contents

① redundancy( 冗余 ) and update anomalies ( 更新异常) with improper relation design

② lossless-join ( 无损联接 ) property and dependency p

reservation ( 依赖保留 ) property

③ functional dependency (函数依赖) concept and it

applications

④Five Normalization forms ( 5 个范式)

1NF 2NF 3NF BCNF 4NF 5NF

Normalization

Given a collection of relations are they accurate rational and true representation of the enterprise data

Normalization

So we need a technology to validate our design

What is meaning of ldquoaccuraterdquo ldquorationalrdquo ldquoturerdquoMeet the business requirementsConsistencyExtensibilityLess redundancy PerformanceCorrectness

Motivation ( 动机 ) of Normalization

When the design of database is improper it leads to the problems of redundancies( 冗余 ) and update anomalies( 更新异常 )

Redundancy occurs when the same data value is stored more than once in a relation Redundancy wastes space and reduces performa

nceUpdate anomalies are problems that arise when tr

ying to insert delete or update tuples and are often caused by redundancy

Motivation of Normalization

Dream GameDepartment Staff Card

Department No CR74 Department Name Engineering

Staff No E2 Name MSmith

Title Database Designer Salary 66000

Remark

Department Manager Yes No⊙

Supervisor E5

Example of improper relation

bull DeptEmp Relation

ENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5

E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7

E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5

E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5

E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7

E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8

E8 J Jones 10-11-72 SA 50000 null D1 Management E8

Insertion Anomaly Example

Consider these two types of insertion anomalies1) Insert a new employee E9 working in department D2

ndash You have to redundantly insert the department name and manager when adding this record

2) Insert a department D4 that has no current employeesndash This insertion is not possible without creating a dummy employee id a

nd record because ENo is the primary key of the relation

ENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5

E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7

E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5

E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5

E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7

E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8

E8 J Jones 10-11-72 SA 50000 null D1 Management E8

Deletion Anomaly Example

Consider this deletion anomaly- Delete employees E3 and E6 from the database

- Deleting those two employees removes them from the database and we now have lost information about department D2

ENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5

E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7

E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5

E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5

E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7

E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8

E8 J Jones 10-11-72 SA 50000 null D1 Management E8

Modification Anomaly Example

Consider these modification anomalies

1) Change the name of department D3 to ldquoEngineeringrdquo You must update the department name in 3 different records

2) Change the manager of D1 to E4 You must update the superno field in 2 different records

ENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5

E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7

E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5

E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5

E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7

E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8

E8 J Jones 10-11-72 SA 50000 null D1 Management E8

Update Anomalies

bull There are three major types of update anomalies Insertion Anomalies - Insertion of a tuple into the r

elation either requires insertion of redundant information or cannot be performed without setting key values to NULL

Deletion Anomalies - Deletion of a tuple may lose information that is still required to be stored

Modification Anomalies - Changing an attribute of a tuple may require changing multiple attribute values in other tuples

Overview to Normalization

Normalization is a validation technique for producing relations with desirable properties

Normalization is a bottom-up design technique for producing relations

Normalization decomposes relations into smaller relations that contain less redundancy This decomposition requires reconstruction of the original relations from the smaller relations and no information is lost

It was developed by Codd in 1972 preceding ER modeling and extended by others over the years

Example Normalization to DeptEmp

bull DeptEmp Relation

ENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5

E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7

E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5

E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5

E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7

E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8

E8 J Jones 10-11-72 SA 50000 null D1 Management E8

Normalization to DeptEmp

Emp Relation Dept Relation

dno dname mgrno

D1 Management E8

D2 consulting E7

D3 Accounting E5

eno ename bdate title salary superNo dno

E1 J Doe 01-05-75 EE 30000 E2 null

E2 M Smith 06-04-66 SA 50000 E5 D3

E3 A Lee 07-05-66 ME 40000 E7 D2

E4 J Miller 09-01-50 PR 20000 E6 D3

E5 B Casey 12-25-71 SA 50000 E8 D3

E6 L Chu 11-30-65 EE 30000 E7 D2

E7 R Davis 09-08-77 ME 40000 E8 D1

E8 J Jones 10-11-72 SA 50000 null D1

Dept Emp= Emp dno Dept

Overview to Normalization

Input a relation schema and constraints on attributes

Output one or more relation schemas

Processing Idea examining the relationships between attributes and organize the data according to its functional dependencies( 函数依赖)

Significance

1) Validate our design correct our improper design

2) Help for understanding the nature and the meaning of the data in the enterprise for learning more about the data semantics

When to use normalization

Normalization can be used after mapping ER model into relation model or independently

Normalization may be especially useful for databases that have already been designed without using formal techniques

Theory basic of normalization

Functional Dependencies ( 函数依赖) Functional dependencies represent integrity constraints ( 完整

性约束) on the values of attributes in a relation and are used in normalization

A functional dependency (FD) describes the relationship between attributes in a relation We say a set of attributes X functionally determines an attribute Y if given the values of X we always know the only possible value of Y

Notation X rarr Y X functionally determines Y Y is functionally dependent on XExample

eno rarr ename eno pno rarr Duration

Notation for Functional Dependencies

A functional dependency has a left-hand side called the determinant which is a set of attributes and one attribute on the right-hand side

eno pno rarr duration

determinant determined attribute

Strictly speaking there is always only one attribute on the RHS but we can combine several functional dependencies into one

eno pno rarr durationeno pno rarr respeno pno rarr duration resp

Remember that this is really two functional dependencies

The Semantics ofFunctional Dependencies

Similar to keys Functional dependencies are a property of a relation schema (intension) and not a property of a particular instance of the schema (extension)

Functional dependencies are directional

eno rarr ename ename rarr eno

ExampleFDs in EmpDept relation

bull EmpDept RelationENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8E8 J Jones 10-11-72 SA 50000 null D1 Management E8

Eno rarr ename bdate title salary superno dno

dnorarrdname mgrno

titlerarr salary

Trivial Functional Dependencies ( 平凡函数依赖 )

A functional dependency is trivialif the attributes on its left-hand side are a superset of the attributes on its right-hand side

Examples eno rarr eno

eno ename rarr eno

eno pno duration rarr eno Duration

Trivial functional dependencies are not interesting because they do not tell us anything

We are only interested in nontrivial FDs

Full Functional Dependencies

A functional dependency A rarr B is a full functional dependency if removal of any attribute from A results in the dependency not existing any more We say that B is fully functionally dependent on A If remove an attribute from A and the functional dependency still exists we say that B is partially dependent on A

Example

eno rarr ename (full FD)

eno ename rarr salary title (partial FD - can remove ename)

eno pno rarr hours resp (full FD)

How to Identify FDs in a relation

Examining the semantics of the attributes of relation

Identify all non-trivial functional dependencies in the Universal Relation

Universal (eno pno resp duration ename bdate title salary superNo dno dname mgrNo pname budget)

Eno rarr ename bdate title salary superno dnodnorarrdname mgrnotitlerarr salary

Formal Definition ofFunctional Dependencies

Given relation R defined over U = A1 A2 An where X U Y U If for all pairs of tuples t1 and t2 in any legal instance of relation schema R

t1[X] = t2[X] implies t1[Y] = t2[Y]

then the functional dependency X rarr Y holds in R

Functional Dependencies and Keys

Functional dependencies can be used to determine the candidate keys of a relation

What is superkey candidate key primary key

For example if an attribute functionally determines all other attributes in the relation that attribute can be a key

eno rarr eno ename bdate title supereno dno eno is a candidate key for the Emp relation

Functional Dependencies and Keys

ENo rarr EName BDate Title SuperNo DNoDNo rarr DName MgreNotitle rarr salary

Question List the candidate key(s) of this relation

ENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5

E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7

E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5

E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5

E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7

E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8

E8 J Jones 10-11-72 SA 50000 null D1 Management E8

EmpDept relation

Applications of FDs

1) given a set of FDs F in relation R FD X rarr Y is true or not

2) given a set of FDs F in relation R a set of attributes X is a candidate key of R or not

3) given a set of FDs F in relation R which candidate keys are in R

4) given a set of FDs F in relation R some FDs in F is Excessive or not

Rules for reasoning about functional dependencies

Armstrongs Axioms are three inference rules for functional dependenciesLet A B C and D be subsets of the attributes of relation R

IR1 Reflexive Rule( 自反性 ) If AB then A rarr B

A set of attributes always trivially functionally determines any subset of itself

IR2 Augmentation Rule( 增加性 ) If A rarr B then AC rarr BC

Adding a set of attributes to both the LHS and RHS results in another valid functional dependency

IR3 Transitive Rule( 传递性 ) If A rarr B and B rarr C then A rarr C

Transitivity holds with FDs

Derived Rules from Armstrongs Axioms

There are three other well-known rules that can be derived from the basic three

IR4 Decomposition( 可分解性 ) If A rarr BC then A rarr B and A rarr C A FD with two attributes on RHS can be decomposed into two FDs each with on attribute on the RHS

IR5 Union( 合并性 ) If A rarr B and A rarr C then A rarr BC Reverse of decomposition rule

IR6 Composition( 组合性 ) If A rarr B and C rarr D then AC rarr BD More general than union - can combine two non-overlapping FDs into one

Characteristics of Armstrongs Axioms

The rules are sound( 健壮的 ) and complete( 完备的 )

1) Sound The rules do not infer( 推断 ) any incorrect FDs

2) Complete The rules can be used to infer all correct FDs (no FDs are missing)

The axioms (called IR1 IR2 and IR3) can be used to determine a set of all FDs F+ from an initial set of FDs F

F+ is referred to as the closure of F

Strategy of dealing with questions

The basic idea is that given any set of attributes X we can compute the set of all attributes X+ that can be functionally determined using F This is called the closure of X under F

For question 1 given a set of FDs F in R FD X rarr Y is true or not we know that X rarr Y holds if Y is in X+

For question 2 a set of attributes X is a candidate key of R or not X is a superkey of R if X+ is all attributes of R

For question 3 how to computing F+ we can compute X+ for all possible subsets X of R

Computing the Attribute Closure

The algorithm is as follows

Given a set of attributes X

Let X+ = X

Repeat Find a FD in F whose left side is a subset of X+ Add the right side of F to X+

Until (X+ does not change or X+ contains all attributes of R)

After the algorithm completes you have a set of attributes X+ that can be functionally determined by X This allows you to produce FDs of the form

X rarr A where A is in X+

Computing Attribute Closure Example

R (ABCDEG)

F = A rarr BC C rarr D D rarr G

Compute A+

A+ = A (initial step)

A+ = ABC (by using FD A rarr BC)

A+ = ABCD (by using FD C rarr D)

A+ = ABCDG (by using FD D rarr G)

Similarly we can compute C+ and EG+

C+ = CDG

EG+ = EG

Using Attribute Closure to Detect a FD

The attribute closure algorithm can be used to determine if a particular FD X rarr Y holds based on the set of given FDs F Compute X+ and see if Y is in X+

Example

R = (ABCDEG)

F = A rarr BC C rarr D D rarr G

Does the FD CD rarr G hold Compute CD+ = CDG Yes the FD CD rarr G holds

Does the FD BC rarr E hold Compute BC+ = BCDG No the FD BC rarr E does n

ot hold

Using Attribute Closure to Find all FDs

The attribute closure algorithm can be used to find all FDs of a given relation R if perform the attribute closure algorithm for all subsets X of the attributes R

This is an exponential algorithm in the number of attributes in R Although there are a few optimizations Do not compute the closure of the empty set or of the set of all attributes If we find X+ = all attributes do not compute the closure of any supersets of X Note that even though you do not have to compute the closure (because you know the answer) you still must use the known closure to infer new FDs

Using Attribute Closureto Find all FDs Example

Compute F+ from F where

R (ABCD)

F = A rarr BC C rarr D

Must compute all these attribute closures

A+ B+ C+ D+

AB+ AC+ AD+ BC+ BD+ CD+

ABC+ ABD+ ACD+ BCD+

Do not have to compute ABCD+

Start with the single sets

A+ = ABCD B+ = B C+ = CD D+ = D

New FD A rarr D

Using Attribute Closureto Find all FDs Example (cont)

Next the two element sets Do not compute AB+ AC+ AD+ because A+ contained all attributes of R New FDs AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C BC+ = BCD BD+ = BD CD+ = CD New FD BC rarr DThree element sets Do not compute ABC+ ABD+ ACD+ New FDs ABC rarr D ABD rarr C ACD rarr B BCD+ = BCD No new FDsAll FDs F+ = A rarr BC C rarr D A rarr D BC rarr D AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C ABC rarr D ABD rarrC ACD rarr B

FD Closure Questions

1) Compute all nontrivial FDs and list all keys of R

R = (ABCD)

F = AB rarr C C rarr D D rarr A

2) Compute all nontrivial FDs and list all candidate keys of R

R = (ABCD)

F = AB rarr C BC rarr D CD rarr A AD rarr B

Equivalent Sets of FDs

A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+ We can determine if F covers E by calculating X+ with respect to F for each FD X rarr Y in E and checking whether X+ includes Y

Two sets of FDs E and F are equivalent if E covers F and F covers E That is E+= F+

Minimal Set of FDs

A set of FDs F is minimal if it satisfies these conditions Every FD in F has a single attribute on its right-hand side We cannot replace any FD X rarr A in F with Y rarr A where Y

X and still have a set of dependencies that is equivalent to F

We cannot remove any FD and still have a set that is equivalent to F

A minimal sets of FDs is like a canonical( 规范的 ) form of FDs

Note that there may be many minimal covers for a given set of functional dependencies F

Normal Forms

A relation is in a particular normal form if it satisfies certain

normalization properties

There are several normal forms defined 1NF - First Normal Form 2NF - Second Normal Form 3NF - Third Normal Form BCNF - Boyce-Codd Normal Form 4NF - Fourth Normal Form 5NF - Fifth Normal Form

Each of these normal forms are stricter than the next

For example 3NF is better than 2NF because it removes more redundancyanomalies from the schema than 2NF

Normal Forms

All relations (non-normalized)

1NF (First Normal Form)

2NF (Second Normal Form)

3NF (Third Normal Form)

BCNF (Boyce-Codd NF)

4NF (Fourth NF)

5NF (Fifth NF)

First Normal Form (1NF)

A relation is in first normal form (1NF) if all its attribute values are atomic

That is a 1NF relation cannot have an attribute value that is a set of values (multi-valued attribute) a set of tuples (nested relation)

1NF is a standard assumption in relational DBMSs However object-oriented DBMSs and nested relational DBMSs relax t

his constraint

A relation that is not in 1NF is an unnormalized relation

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1P2

AnalystAnalyst

246

E3 A Lee P3P4

ConsultantEngineer

1048

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1 P2 Analyst Analyst 24 6

E3 A Lee P3P4 Consultant Engineer 10 48

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

A non-1NF Relation

One way to convert a non-1NF relation to a 1NF relation

1) Splitting Method - Divide the existing relation into two relations one with non-repeating attributes and the other with repeating attributes

Make a new relation consisting of the primary key of the original relation and the repeating attributes Determine a primary key for this new relation

Remove the repeating attributes from the original relation

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1P2

AnalystAnalyst

246

E3 A Lee P3P4

ConsultantEngineer

1048

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

ENo EName

E1 J Doe

E2 M Smith

E3 A Lee

E4 J Miller

E5 B Casey

E6 L Chu

E7 R Davis

ENo PNo Resp DurationE1 P1 Manager 12E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36

Second Normal Form (2NF)

A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key

If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation

Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key

Second Normal Form (2NF) Example

fd1 and fd4 are partial functional dependencies Normalize to

Emp (eno ename title bdate salary supereno dno)

WorksOn (eno pno resp hours)

Proj (pno pname budget)

ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration

EmpProj relation

fd1 fd2

fd3fd4

Second Normal Form (2NF) Example (2)

ENo EName

BDate Title Salary SuperNo

DNo

Emp relation

fd1

fd2

fd3fd4

PNo PName Budget

Proj relationENo PNo Resp Duration

WorkOn relation

Third Normal Form (3NF)

Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th

an 2 FDs

A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key

Converting a relation to 3NF from 2NF involves the removal of transitive dependencies

Third Normal Form (3NF) Example

fd2 results in a transitive dependency eno rarr salary Remove it

ENo EName BDate Title Salary SuperNo DNo

Emp relation

fd1

fd2

fd2

Title Salary

Pay relation

ENo EName BDate Title SuperNo DNo

Emp relation

fd1

General Definitions of 2NF and 3NF

We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)

General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key

General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key

Note that a prime attribute is an attribute that is in any key (candidate or primary)

General Definition of 3NF Example

The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute

However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key

Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key

ENo EName BDate Title Salary SuperNo DNo SSN

Emp relation

fd1

fd2

Normalization Question

Consider the universal relation

R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ

List the keys for R

Decompose R into 2NF and then 3NF relations

Boyce-Codd Normal Form (BCNF)

A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key

To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key

The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

ClientInterview relation

Relation is in 3NF but not BCNF

fd4

fd3

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

Note that we lose the FD fd2

fd4

fd3

ClientNo InterviewDate

InterviwTime

StaffNo StaffNo InterviwDate

RoomNo

Interview relation StaffRoom relation

BCNF versus 3NF

We can decompose to BCNF but sometimes we do not want to if we lose a FD

The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency

Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)

In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)

Conversion to BCNF

There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y

Note that if any FD in F+ violates BCNF then a FD in F violates BCNF

2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey

3) Replace R by relations with schemas R1 = X+

R2 = (R ndash X+) U X

4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib

utes are all in R1 or R2

Normalization Question

Given this database schema normalize into BCNF

fd1

PID Country Lot Area Price TaxRate

fd2

fd4fd3

Multi-Valued Dependencies

A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema

A MVD occurs when two independent 1N relationships are in the relational schema

Multi-Valued Dependencies Example

Eno ename Title Salary Pno dno

E1 J Doe EE 30000 P1P2

E2 M Smith SA 50000 P1P4 D1D2D3

E3 A Lee ME 40000 P1P3 D2D3

E4 J Miller PR 20000 P3 D3

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1

E1 J Doe EE 30000 P2

E2 M Smith SA 50000 P1 D1

E2 M Smith SA 50000 P1 D2

E2 M Smith SA 50000 P1 D3

E2 M Smith SA 50000 P4 D1

E2 M Smith SA 50000 P4 D2

E2 M Smith SA 50000 P4 D3

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P3 D3

E3 A Lee ME 40000 P3 D3

E4 J Miller PR 20000 P3 D3

Multi-Valued Dependencies (MVDs)

A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C

A trivial MVD A mdashgtgt B occurs when either B is a subset of A or

A cup B = R

Fourth Normal Form (4NF)

Fourth normal form (4NF) is based on the idea of multi-valued dependencies

A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies

Fourth Normal Form (4NF) Example

eno mdashgtgtpno

eno mdashgtgtdno

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3

eno ename title salary

E1 J Doe EE 30000

E2 MSmith SA 50000

E3 A Lee ME 40000

E4 J Miller PR 20000

eno pno

E1 P1

E1 P2

E2 P1

E2 P4

E3 P1

E3 P3

E4 P3

eno dno

E2 D1

E2 D2

E2 D3

E3 D2

E3 D3

E4 D3

Lossless-join Dependency

The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation

A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined

Fifth Normal Form (5NF)

bull Fifth normal form (5NF) is based on join dependencies

A relation is in fifth normal form (5NF) if the relation has no join dependency

bull 5NF is also called project-join normal form (PJNF)

A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R

The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have

ΠR1(r) ΠR2(r) hellip ΠRn(r) = r

Fifth Normal Form (5NF) Example

Consider a relation Supply (sname partName projName)

1048715Add the additional constraint that

If project j requires part p

and supplier s supplies part p

and supplier s supplies at least one item to project j

Then supplier s also supplies part p to project j

Original Supply Relation Supply Relation with new constraintsname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

Fifth Normal Form (5NF) Example

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

sname partName

Smith Bolt

Smith Nut

Adam Bolt

Walton Nut

Adam Nail

sname projName

Smith X

Smith Y

Adam Y

Walton Z

Adam X

partName projName

Bolt X

Nut Y

Bolt Y

Nut Z

Nail X

Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples

Desirable Properties in normalization

Relational schemas that are well-designed have several

important properties

1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or

relationship

2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins

3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations

Normal Forms in Practice

Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used

For example query execution time may increase because of normalization as more joins become necessary to answer queries

Normal Forms in Practice Example

For example street and city uniquely determine a zipcode

In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed

When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly

fd1

ENo City Street ZipCode

fd2

ENo ZipCode

ZipCode City Street

Normal Forms in Practice Summary

In summary normalization is typically used in two ways

To improve on relational schemas after ER design

To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies

There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance

This is the classic time versus space tradeoff

Conclusion

Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies

Normal forms indicate when relations satisfy certain properties

1NF - All attributes are atomic

2NF - All attributes are fully functionally dependent on a key

3NF - There are no transitive dependencies in the relation

4NF - There are no multi-valued dependencies in the relation

5NF - The relation has no join dependency

In practice normalization is used to improve schemas produced after ER design and existing relational schemas

It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time

Objectives

Explain process of normalization and why it is beneficial

Describe the concept of the Universal Relation

What are the motivations for normalization

What are the 3 types of update anomalies Be able to give examples

Describe the lossless-join property and dependency preservation property

Define and explain the concept of a functional dependency

What is a trivial FD

Describe the relationship between FDs and keys

What are Armstrongs Axioms When are they used

What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs

Objectives (2)

Define minimal and covers for a set of FDs

Define normal forms be able to list the normal forms and draw a diagram showing their relationship

Define 1NF Know how to convert non-1NF relation to 1NF

Define 2NF Convert 1NF relation to 2NF

Define 3NF Convert 2NF relation to 3NF

Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example

Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm

Describe multi-valued dependencies When do they occur

What is a trivial MVD

Define 4NF Convert 3NFBCNF relation to 4NF

Objectives (3)

Describe lossless-join property and how it relates to 5NF

Define 5NF

Explain normalization in terms of the time versus space tradeoff

Explain the relationship between normalization and ER modeling

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
Page 2: Software School of Hunan University 2006.08 Database Systems Design Part III Normalization

Database Design Flow Diagram

Individual Part 1Individual Part 1

Intelligence Identify Summary AbstractDeduce Refine

Individual Part nIndividual Part n

ER model ER model

Relations Relations Rational Relations

Rational Relations

User view 1User view 1

User view nUser view n

Transformation

Normalization

Requirements analyzing

Objectives

To answer

What is normalization (规范化) of relation

Why is normalization is necessary

What is general principle ( 基本原则 ) of normalization

What is the theory basic ( 理论基础 ) of normalization

How to normalize

Contents

① redundancy( 冗余 ) and update anomalies ( 更新异常) with improper relation design

② lossless-join ( 无损联接 ) property and dependency p

reservation ( 依赖保留 ) property

③ functional dependency (函数依赖) concept and it

applications

④Five Normalization forms ( 5 个范式)

1NF 2NF 3NF BCNF 4NF 5NF

Normalization

Given a collection of relations are they accurate rational and true representation of the enterprise data

Normalization

So we need a technology to validate our design

What is meaning of ldquoaccuraterdquo ldquorationalrdquo ldquoturerdquoMeet the business requirementsConsistencyExtensibilityLess redundancy PerformanceCorrectness

Motivation ( 动机 ) of Normalization

When the design of database is improper it leads to the problems of redundancies( 冗余 ) and update anomalies( 更新异常 )

Redundancy occurs when the same data value is stored more than once in a relation Redundancy wastes space and reduces performa

nceUpdate anomalies are problems that arise when tr

ying to insert delete or update tuples and are often caused by redundancy

Motivation of Normalization

Dream GameDepartment Staff Card

Department No CR74 Department Name Engineering

Staff No E2 Name MSmith

Title Database Designer Salary 66000

Remark

Department Manager Yes No⊙

Supervisor E5

Example of improper relation

bull DeptEmp Relation

ENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5

E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7

E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5

E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5

E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7

E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8

E8 J Jones 10-11-72 SA 50000 null D1 Management E8

Insertion Anomaly Example

Consider these two types of insertion anomalies1) Insert a new employee E9 working in department D2

ndash You have to redundantly insert the department name and manager when adding this record

2) Insert a department D4 that has no current employeesndash This insertion is not possible without creating a dummy employee id a

nd record because ENo is the primary key of the relation

ENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5

E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7

E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5

E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5

E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7

E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8

E8 J Jones 10-11-72 SA 50000 null D1 Management E8

Deletion Anomaly Example

Consider this deletion anomaly- Delete employees E3 and E6 from the database

- Deleting those two employees removes them from the database and we now have lost information about department D2

ENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5

E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7

E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5

E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5

E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7

E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8

E8 J Jones 10-11-72 SA 50000 null D1 Management E8

Modification Anomaly Example

Consider these modification anomalies

1) Change the name of department D3 to ldquoEngineeringrdquo You must update the department name in 3 different records

2) Change the manager of D1 to E4 You must update the superno field in 2 different records

ENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5

E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7

E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5

E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5

E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7

E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8

E8 J Jones 10-11-72 SA 50000 null D1 Management E8

Update Anomalies

bull There are three major types of update anomalies Insertion Anomalies - Insertion of a tuple into the r

elation either requires insertion of redundant information or cannot be performed without setting key values to NULL

Deletion Anomalies - Deletion of a tuple may lose information that is still required to be stored

Modification Anomalies - Changing an attribute of a tuple may require changing multiple attribute values in other tuples

Overview to Normalization

Normalization is a validation technique for producing relations with desirable properties

Normalization is a bottom-up design technique for producing relations

Normalization decomposes relations into smaller relations that contain less redundancy This decomposition requires reconstruction of the original relations from the smaller relations and no information is lost

It was developed by Codd in 1972 preceding ER modeling and extended by others over the years

Example Normalization to DeptEmp

bull DeptEmp Relation

ENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5

E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7

E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5

E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5

E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7

E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8

E8 J Jones 10-11-72 SA 50000 null D1 Management E8

Normalization to DeptEmp

Emp Relation Dept Relation

dno dname mgrno

D1 Management E8

D2 consulting E7

D3 Accounting E5

eno ename bdate title salary superNo dno

E1 J Doe 01-05-75 EE 30000 E2 null

E2 M Smith 06-04-66 SA 50000 E5 D3

E3 A Lee 07-05-66 ME 40000 E7 D2

E4 J Miller 09-01-50 PR 20000 E6 D3

E5 B Casey 12-25-71 SA 50000 E8 D3

E6 L Chu 11-30-65 EE 30000 E7 D2

E7 R Davis 09-08-77 ME 40000 E8 D1

E8 J Jones 10-11-72 SA 50000 null D1

Dept Emp= Emp dno Dept

Overview to Normalization

Input a relation schema and constraints on attributes

Output one or more relation schemas

Processing Idea examining the relationships between attributes and organize the data according to its functional dependencies( 函数依赖)

Significance

1) Validate our design correct our improper design

2) Help for understanding the nature and the meaning of the data in the enterprise for learning more about the data semantics

When to use normalization

Normalization can be used after mapping ER model into relation model or independently

Normalization may be especially useful for databases that have already been designed without using formal techniques

Theory basic of normalization

Functional Dependencies ( 函数依赖) Functional dependencies represent integrity constraints ( 完整

性约束) on the values of attributes in a relation and are used in normalization

A functional dependency (FD) describes the relationship between attributes in a relation We say a set of attributes X functionally determines an attribute Y if given the values of X we always know the only possible value of Y

Notation X rarr Y X functionally determines Y Y is functionally dependent on XExample

eno rarr ename eno pno rarr Duration

Notation for Functional Dependencies

A functional dependency has a left-hand side called the determinant which is a set of attributes and one attribute on the right-hand side

eno pno rarr duration

determinant determined attribute

Strictly speaking there is always only one attribute on the RHS but we can combine several functional dependencies into one

eno pno rarr durationeno pno rarr respeno pno rarr duration resp

Remember that this is really two functional dependencies

The Semantics ofFunctional Dependencies

Similar to keys Functional dependencies are a property of a relation schema (intension) and not a property of a particular instance of the schema (extension)

Functional dependencies are directional

eno rarr ename ename rarr eno

ExampleFDs in EmpDept relation

bull EmpDept RelationENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8E8 J Jones 10-11-72 SA 50000 null D1 Management E8

Eno rarr ename bdate title salary superno dno

dnorarrdname mgrno

titlerarr salary

Trivial Functional Dependencies ( 平凡函数依赖 )

A functional dependency is trivialif the attributes on its left-hand side are a superset of the attributes on its right-hand side

Examples eno rarr eno

eno ename rarr eno

eno pno duration rarr eno Duration

Trivial functional dependencies are not interesting because they do not tell us anything

We are only interested in nontrivial FDs

Full Functional Dependencies

A functional dependency A rarr B is a full functional dependency if removal of any attribute from A results in the dependency not existing any more We say that B is fully functionally dependent on A If remove an attribute from A and the functional dependency still exists we say that B is partially dependent on A

Example

eno rarr ename (full FD)

eno ename rarr salary title (partial FD - can remove ename)

eno pno rarr hours resp (full FD)

How to Identify FDs in a relation

Examining the semantics of the attributes of relation

Identify all non-trivial functional dependencies in the Universal Relation

Universal (eno pno resp duration ename bdate title salary superNo dno dname mgrNo pname budget)

Eno rarr ename bdate title salary superno dnodnorarrdname mgrnotitlerarr salary

Formal Definition ofFunctional Dependencies

Given relation R defined over U = A1 A2 An where X U Y U If for all pairs of tuples t1 and t2 in any legal instance of relation schema R

t1[X] = t2[X] implies t1[Y] = t2[Y]

then the functional dependency X rarr Y holds in R

Functional Dependencies and Keys

Functional dependencies can be used to determine the candidate keys of a relation

What is superkey candidate key primary key

For example if an attribute functionally determines all other attributes in the relation that attribute can be a key

eno rarr eno ename bdate title supereno dno eno is a candidate key for the Emp relation

Functional Dependencies and Keys

ENo rarr EName BDate Title SuperNo DNoDNo rarr DName MgreNotitle rarr salary

Question List the candidate key(s) of this relation

ENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5

E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7

E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5

E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5

E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7

E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8

E8 J Jones 10-11-72 SA 50000 null D1 Management E8

EmpDept relation

Applications of FDs

1) given a set of FDs F in relation R FD X rarr Y is true or not

2) given a set of FDs F in relation R a set of attributes X is a candidate key of R or not

3) given a set of FDs F in relation R which candidate keys are in R

4) given a set of FDs F in relation R some FDs in F is Excessive or not

Rules for reasoning about functional dependencies

Armstrongs Axioms are three inference rules for functional dependenciesLet A B C and D be subsets of the attributes of relation R

IR1 Reflexive Rule( 自反性 ) If AB then A rarr B

A set of attributes always trivially functionally determines any subset of itself

IR2 Augmentation Rule( 增加性 ) If A rarr B then AC rarr BC

Adding a set of attributes to both the LHS and RHS results in another valid functional dependency

IR3 Transitive Rule( 传递性 ) If A rarr B and B rarr C then A rarr C

Transitivity holds with FDs

Derived Rules from Armstrongs Axioms

There are three other well-known rules that can be derived from the basic three

IR4 Decomposition( 可分解性 ) If A rarr BC then A rarr B and A rarr C A FD with two attributes on RHS can be decomposed into two FDs each with on attribute on the RHS

IR5 Union( 合并性 ) If A rarr B and A rarr C then A rarr BC Reverse of decomposition rule

IR6 Composition( 组合性 ) If A rarr B and C rarr D then AC rarr BD More general than union - can combine two non-overlapping FDs into one

Characteristics of Armstrongs Axioms

The rules are sound( 健壮的 ) and complete( 完备的 )

1) Sound The rules do not infer( 推断 ) any incorrect FDs

2) Complete The rules can be used to infer all correct FDs (no FDs are missing)

The axioms (called IR1 IR2 and IR3) can be used to determine a set of all FDs F+ from an initial set of FDs F

F+ is referred to as the closure of F

Strategy of dealing with questions

The basic idea is that given any set of attributes X we can compute the set of all attributes X+ that can be functionally determined using F This is called the closure of X under F

For question 1 given a set of FDs F in R FD X rarr Y is true or not we know that X rarr Y holds if Y is in X+

For question 2 a set of attributes X is a candidate key of R or not X is a superkey of R if X+ is all attributes of R

For question 3 how to computing F+ we can compute X+ for all possible subsets X of R

Computing the Attribute Closure

The algorithm is as follows

Given a set of attributes X

Let X+ = X

Repeat Find a FD in F whose left side is a subset of X+ Add the right side of F to X+

Until (X+ does not change or X+ contains all attributes of R)

After the algorithm completes you have a set of attributes X+ that can be functionally determined by X This allows you to produce FDs of the form

X rarr A where A is in X+

Computing Attribute Closure Example

R (ABCDEG)

F = A rarr BC C rarr D D rarr G

Compute A+

A+ = A (initial step)

A+ = ABC (by using FD A rarr BC)

A+ = ABCD (by using FD C rarr D)

A+ = ABCDG (by using FD D rarr G)

Similarly we can compute C+ and EG+

C+ = CDG

EG+ = EG

Using Attribute Closure to Detect a FD

The attribute closure algorithm can be used to determine if a particular FD X rarr Y holds based on the set of given FDs F Compute X+ and see if Y is in X+

Example

R = (ABCDEG)

F = A rarr BC C rarr D D rarr G

Does the FD CD rarr G hold Compute CD+ = CDG Yes the FD CD rarr G holds

Does the FD BC rarr E hold Compute BC+ = BCDG No the FD BC rarr E does n

ot hold

Using Attribute Closure to Find all FDs

The attribute closure algorithm can be used to find all FDs of a given relation R if perform the attribute closure algorithm for all subsets X of the attributes R

This is an exponential algorithm in the number of attributes in R Although there are a few optimizations Do not compute the closure of the empty set or of the set of all attributes If we find X+ = all attributes do not compute the closure of any supersets of X Note that even though you do not have to compute the closure (because you know the answer) you still must use the known closure to infer new FDs

Using Attribute Closureto Find all FDs Example

Compute F+ from F where

R (ABCD)

F = A rarr BC C rarr D

Must compute all these attribute closures

A+ B+ C+ D+

AB+ AC+ AD+ BC+ BD+ CD+

ABC+ ABD+ ACD+ BCD+

Do not have to compute ABCD+

Start with the single sets

A+ = ABCD B+ = B C+ = CD D+ = D

New FD A rarr D

Using Attribute Closureto Find all FDs Example (cont)

Next the two element sets Do not compute AB+ AC+ AD+ because A+ contained all attributes of R New FDs AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C BC+ = BCD BD+ = BD CD+ = CD New FD BC rarr DThree element sets Do not compute ABC+ ABD+ ACD+ New FDs ABC rarr D ABD rarr C ACD rarr B BCD+ = BCD No new FDsAll FDs F+ = A rarr BC C rarr D A rarr D BC rarr D AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C ABC rarr D ABD rarrC ACD rarr B

FD Closure Questions

1) Compute all nontrivial FDs and list all keys of R

R = (ABCD)

F = AB rarr C C rarr D D rarr A

2) Compute all nontrivial FDs and list all candidate keys of R

R = (ABCD)

F = AB rarr C BC rarr D CD rarr A AD rarr B

Equivalent Sets of FDs

A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+ We can determine if F covers E by calculating X+ with respect to F for each FD X rarr Y in E and checking whether X+ includes Y

Two sets of FDs E and F are equivalent if E covers F and F covers E That is E+= F+

Minimal Set of FDs

A set of FDs F is minimal if it satisfies these conditions Every FD in F has a single attribute on its right-hand side We cannot replace any FD X rarr A in F with Y rarr A where Y

X and still have a set of dependencies that is equivalent to F

We cannot remove any FD and still have a set that is equivalent to F

A minimal sets of FDs is like a canonical( 规范的 ) form of FDs

Note that there may be many minimal covers for a given set of functional dependencies F

Normal Forms

A relation is in a particular normal form if it satisfies certain

normalization properties

There are several normal forms defined 1NF - First Normal Form 2NF - Second Normal Form 3NF - Third Normal Form BCNF - Boyce-Codd Normal Form 4NF - Fourth Normal Form 5NF - Fifth Normal Form

Each of these normal forms are stricter than the next

For example 3NF is better than 2NF because it removes more redundancyanomalies from the schema than 2NF

Normal Forms

All relations (non-normalized)

1NF (First Normal Form)

2NF (Second Normal Form)

3NF (Third Normal Form)

BCNF (Boyce-Codd NF)

4NF (Fourth NF)

5NF (Fifth NF)

First Normal Form (1NF)

A relation is in first normal form (1NF) if all its attribute values are atomic

That is a 1NF relation cannot have an attribute value that is a set of values (multi-valued attribute) a set of tuples (nested relation)

1NF is a standard assumption in relational DBMSs However object-oriented DBMSs and nested relational DBMSs relax t

his constraint

A relation that is not in 1NF is an unnormalized relation

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1P2

AnalystAnalyst

246

E3 A Lee P3P4

ConsultantEngineer

1048

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1 P2 Analyst Analyst 24 6

E3 A Lee P3P4 Consultant Engineer 10 48

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

A non-1NF Relation

One way to convert a non-1NF relation to a 1NF relation

1) Splitting Method - Divide the existing relation into two relations one with non-repeating attributes and the other with repeating attributes

Make a new relation consisting of the primary key of the original relation and the repeating attributes Determine a primary key for this new relation

Remove the repeating attributes from the original relation

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1P2

AnalystAnalyst

246

E3 A Lee P3P4

ConsultantEngineer

1048

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

ENo EName

E1 J Doe

E2 M Smith

E3 A Lee

E4 J Miller

E5 B Casey

E6 L Chu

E7 R Davis

ENo PNo Resp DurationE1 P1 Manager 12E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36

Second Normal Form (2NF)

A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key

If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation

Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key

Second Normal Form (2NF) Example

fd1 and fd4 are partial functional dependencies Normalize to

Emp (eno ename title bdate salary supereno dno)

WorksOn (eno pno resp hours)

Proj (pno pname budget)

ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration

EmpProj relation

fd1 fd2

fd3fd4

Second Normal Form (2NF) Example (2)

ENo EName

BDate Title Salary SuperNo

DNo

Emp relation

fd1

fd2

fd3fd4

PNo PName Budget

Proj relationENo PNo Resp Duration

WorkOn relation

Third Normal Form (3NF)

Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th

an 2 FDs

A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key

Converting a relation to 3NF from 2NF involves the removal of transitive dependencies

Third Normal Form (3NF) Example

fd2 results in a transitive dependency eno rarr salary Remove it

ENo EName BDate Title Salary SuperNo DNo

Emp relation

fd1

fd2

fd2

Title Salary

Pay relation

ENo EName BDate Title SuperNo DNo

Emp relation

fd1

General Definitions of 2NF and 3NF

We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)

General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key

General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key

Note that a prime attribute is an attribute that is in any key (candidate or primary)

General Definition of 3NF Example

The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute

However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key

Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key

ENo EName BDate Title Salary SuperNo DNo SSN

Emp relation

fd1

fd2

Normalization Question

Consider the universal relation

R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ

List the keys for R

Decompose R into 2NF and then 3NF relations

Boyce-Codd Normal Form (BCNF)

A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key

To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key

The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

ClientInterview relation

Relation is in 3NF but not BCNF

fd4

fd3

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

Note that we lose the FD fd2

fd4

fd3

ClientNo InterviewDate

InterviwTime

StaffNo StaffNo InterviwDate

RoomNo

Interview relation StaffRoom relation

BCNF versus 3NF

We can decompose to BCNF but sometimes we do not want to if we lose a FD

The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency

Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)

In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)

Conversion to BCNF

There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y

Note that if any FD in F+ violates BCNF then a FD in F violates BCNF

2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey

3) Replace R by relations with schemas R1 = X+

R2 = (R ndash X+) U X

4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib

utes are all in R1 or R2

Normalization Question

Given this database schema normalize into BCNF

fd1

PID Country Lot Area Price TaxRate

fd2

fd4fd3

Multi-Valued Dependencies

A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema

A MVD occurs when two independent 1N relationships are in the relational schema

Multi-Valued Dependencies Example

Eno ename Title Salary Pno dno

E1 J Doe EE 30000 P1P2

E2 M Smith SA 50000 P1P4 D1D2D3

E3 A Lee ME 40000 P1P3 D2D3

E4 J Miller PR 20000 P3 D3

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1

E1 J Doe EE 30000 P2

E2 M Smith SA 50000 P1 D1

E2 M Smith SA 50000 P1 D2

E2 M Smith SA 50000 P1 D3

E2 M Smith SA 50000 P4 D1

E2 M Smith SA 50000 P4 D2

E2 M Smith SA 50000 P4 D3

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P3 D3

E3 A Lee ME 40000 P3 D3

E4 J Miller PR 20000 P3 D3

Multi-Valued Dependencies (MVDs)

A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C

A trivial MVD A mdashgtgt B occurs when either B is a subset of A or

A cup B = R

Fourth Normal Form (4NF)

Fourth normal form (4NF) is based on the idea of multi-valued dependencies

A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies

Fourth Normal Form (4NF) Example

eno mdashgtgtpno

eno mdashgtgtdno

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3

eno ename title salary

E1 J Doe EE 30000

E2 MSmith SA 50000

E3 A Lee ME 40000

E4 J Miller PR 20000

eno pno

E1 P1

E1 P2

E2 P1

E2 P4

E3 P1

E3 P3

E4 P3

eno dno

E2 D1

E2 D2

E2 D3

E3 D2

E3 D3

E4 D3

Lossless-join Dependency

The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation

A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined

Fifth Normal Form (5NF)

bull Fifth normal form (5NF) is based on join dependencies

A relation is in fifth normal form (5NF) if the relation has no join dependency

bull 5NF is also called project-join normal form (PJNF)

A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R

The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have

ΠR1(r) ΠR2(r) hellip ΠRn(r) = r

Fifth Normal Form (5NF) Example

Consider a relation Supply (sname partName projName)

1048715Add the additional constraint that

If project j requires part p

and supplier s supplies part p

and supplier s supplies at least one item to project j

Then supplier s also supplies part p to project j

Original Supply Relation Supply Relation with new constraintsname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

Fifth Normal Form (5NF) Example

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

sname partName

Smith Bolt

Smith Nut

Adam Bolt

Walton Nut

Adam Nail

sname projName

Smith X

Smith Y

Adam Y

Walton Z

Adam X

partName projName

Bolt X

Nut Y

Bolt Y

Nut Z

Nail X

Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples

Desirable Properties in normalization

Relational schemas that are well-designed have several

important properties

1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or

relationship

2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins

3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations

Normal Forms in Practice

Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used

For example query execution time may increase because of normalization as more joins become necessary to answer queries

Normal Forms in Practice Example

For example street and city uniquely determine a zipcode

In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed

When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly

fd1

ENo City Street ZipCode

fd2

ENo ZipCode

ZipCode City Street

Normal Forms in Practice Summary

In summary normalization is typically used in two ways

To improve on relational schemas after ER design

To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies

There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance

This is the classic time versus space tradeoff

Conclusion

Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies

Normal forms indicate when relations satisfy certain properties

1NF - All attributes are atomic

2NF - All attributes are fully functionally dependent on a key

3NF - There are no transitive dependencies in the relation

4NF - There are no multi-valued dependencies in the relation

5NF - The relation has no join dependency

In practice normalization is used to improve schemas produced after ER design and existing relational schemas

It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time

Objectives

Explain process of normalization and why it is beneficial

Describe the concept of the Universal Relation

What are the motivations for normalization

What are the 3 types of update anomalies Be able to give examples

Describe the lossless-join property and dependency preservation property

Define and explain the concept of a functional dependency

What is a trivial FD

Describe the relationship between FDs and keys

What are Armstrongs Axioms When are they used

What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs

Objectives (2)

Define minimal and covers for a set of FDs

Define normal forms be able to list the normal forms and draw a diagram showing their relationship

Define 1NF Know how to convert non-1NF relation to 1NF

Define 2NF Convert 1NF relation to 2NF

Define 3NF Convert 2NF relation to 3NF

Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example

Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm

Describe multi-valued dependencies When do they occur

What is a trivial MVD

Define 4NF Convert 3NFBCNF relation to 4NF

Objectives (3)

Describe lossless-join property and how it relates to 5NF

Define 5NF

Explain normalization in terms of the time versus space tradeoff

Explain the relationship between normalization and ER modeling

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
Page 3: Software School of Hunan University 2006.08 Database Systems Design Part III Normalization

Objectives

To answer

What is normalization (规范化) of relation

Why is normalization is necessary

What is general principle ( 基本原则 ) of normalization

What is the theory basic ( 理论基础 ) of normalization

How to normalize

Contents

① redundancy( 冗余 ) and update anomalies ( 更新异常) with improper relation design

② lossless-join ( 无损联接 ) property and dependency p

reservation ( 依赖保留 ) property

③ functional dependency (函数依赖) concept and it

applications

④Five Normalization forms ( 5 个范式)

1NF 2NF 3NF BCNF 4NF 5NF

Normalization

Given a collection of relations are they accurate rational and true representation of the enterprise data

Normalization

So we need a technology to validate our design

What is meaning of ldquoaccuraterdquo ldquorationalrdquo ldquoturerdquoMeet the business requirementsConsistencyExtensibilityLess redundancy PerformanceCorrectness

Motivation ( 动机 ) of Normalization

When the design of database is improper it leads to the problems of redundancies( 冗余 ) and update anomalies( 更新异常 )

Redundancy occurs when the same data value is stored more than once in a relation Redundancy wastes space and reduces performa

nceUpdate anomalies are problems that arise when tr

ying to insert delete or update tuples and are often caused by redundancy

Motivation of Normalization

Dream GameDepartment Staff Card

Department No CR74 Department Name Engineering

Staff No E2 Name MSmith

Title Database Designer Salary 66000

Remark

Department Manager Yes No⊙

Supervisor E5

Example of improper relation

bull DeptEmp Relation

ENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5

E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7

E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5

E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5

E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7

E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8

E8 J Jones 10-11-72 SA 50000 null D1 Management E8

Insertion Anomaly Example

Consider these two types of insertion anomalies1) Insert a new employee E9 working in department D2

ndash You have to redundantly insert the department name and manager when adding this record

2) Insert a department D4 that has no current employeesndash This insertion is not possible without creating a dummy employee id a

nd record because ENo is the primary key of the relation

ENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5

E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7

E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5

E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5

E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7

E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8

E8 J Jones 10-11-72 SA 50000 null D1 Management E8

Deletion Anomaly Example

Consider this deletion anomaly- Delete employees E3 and E6 from the database

- Deleting those two employees removes them from the database and we now have lost information about department D2

ENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5

E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7

E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5

E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5

E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7

E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8

E8 J Jones 10-11-72 SA 50000 null D1 Management E8

Modification Anomaly Example

Consider these modification anomalies

1) Change the name of department D3 to ldquoEngineeringrdquo You must update the department name in 3 different records

2) Change the manager of D1 to E4 You must update the superno field in 2 different records

ENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5

E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7

E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5

E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5

E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7

E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8

E8 J Jones 10-11-72 SA 50000 null D1 Management E8

Update Anomalies

bull There are three major types of update anomalies Insertion Anomalies - Insertion of a tuple into the r

elation either requires insertion of redundant information or cannot be performed without setting key values to NULL

Deletion Anomalies - Deletion of a tuple may lose information that is still required to be stored

Modification Anomalies - Changing an attribute of a tuple may require changing multiple attribute values in other tuples

Overview to Normalization

Normalization is a validation technique for producing relations with desirable properties

Normalization is a bottom-up design technique for producing relations

Normalization decomposes relations into smaller relations that contain less redundancy This decomposition requires reconstruction of the original relations from the smaller relations and no information is lost

It was developed by Codd in 1972 preceding ER modeling and extended by others over the years

Example Normalization to DeptEmp

bull DeptEmp Relation

ENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5

E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7

E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5

E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5

E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7

E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8

E8 J Jones 10-11-72 SA 50000 null D1 Management E8

Normalization to DeptEmp

Emp Relation Dept Relation

dno dname mgrno

D1 Management E8

D2 consulting E7

D3 Accounting E5

eno ename bdate title salary superNo dno

E1 J Doe 01-05-75 EE 30000 E2 null

E2 M Smith 06-04-66 SA 50000 E5 D3

E3 A Lee 07-05-66 ME 40000 E7 D2

E4 J Miller 09-01-50 PR 20000 E6 D3

E5 B Casey 12-25-71 SA 50000 E8 D3

E6 L Chu 11-30-65 EE 30000 E7 D2

E7 R Davis 09-08-77 ME 40000 E8 D1

E8 J Jones 10-11-72 SA 50000 null D1

Dept Emp= Emp dno Dept

Overview to Normalization

Input a relation schema and constraints on attributes

Output one or more relation schemas

Processing Idea examining the relationships between attributes and organize the data according to its functional dependencies( 函数依赖)

Significance

1) Validate our design correct our improper design

2) Help for understanding the nature and the meaning of the data in the enterprise for learning more about the data semantics

When to use normalization

Normalization can be used after mapping ER model into relation model or independently

Normalization may be especially useful for databases that have already been designed without using formal techniques

Theory basic of normalization

Functional Dependencies ( 函数依赖) Functional dependencies represent integrity constraints ( 完整

性约束) on the values of attributes in a relation and are used in normalization

A functional dependency (FD) describes the relationship between attributes in a relation We say a set of attributes X functionally determines an attribute Y if given the values of X we always know the only possible value of Y

Notation X rarr Y X functionally determines Y Y is functionally dependent on XExample

eno rarr ename eno pno rarr Duration

Notation for Functional Dependencies

A functional dependency has a left-hand side called the determinant which is a set of attributes and one attribute on the right-hand side

eno pno rarr duration

determinant determined attribute

Strictly speaking there is always only one attribute on the RHS but we can combine several functional dependencies into one

eno pno rarr durationeno pno rarr respeno pno rarr duration resp

Remember that this is really two functional dependencies

The Semantics ofFunctional Dependencies

Similar to keys Functional dependencies are a property of a relation schema (intension) and not a property of a particular instance of the schema (extension)

Functional dependencies are directional

eno rarr ename ename rarr eno

ExampleFDs in EmpDept relation

bull EmpDept RelationENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8E8 J Jones 10-11-72 SA 50000 null D1 Management E8

Eno rarr ename bdate title salary superno dno

dnorarrdname mgrno

titlerarr salary

Trivial Functional Dependencies ( 平凡函数依赖 )

A functional dependency is trivialif the attributes on its left-hand side are a superset of the attributes on its right-hand side

Examples eno rarr eno

eno ename rarr eno

eno pno duration rarr eno Duration

Trivial functional dependencies are not interesting because they do not tell us anything

We are only interested in nontrivial FDs

Full Functional Dependencies

A functional dependency A rarr B is a full functional dependency if removal of any attribute from A results in the dependency not existing any more We say that B is fully functionally dependent on A If remove an attribute from A and the functional dependency still exists we say that B is partially dependent on A

Example

eno rarr ename (full FD)

eno ename rarr salary title (partial FD - can remove ename)

eno pno rarr hours resp (full FD)

How to Identify FDs in a relation

Examining the semantics of the attributes of relation

Identify all non-trivial functional dependencies in the Universal Relation

Universal (eno pno resp duration ename bdate title salary superNo dno dname mgrNo pname budget)

Eno rarr ename bdate title salary superno dnodnorarrdname mgrnotitlerarr salary

Formal Definition ofFunctional Dependencies

Given relation R defined over U = A1 A2 An where X U Y U If for all pairs of tuples t1 and t2 in any legal instance of relation schema R

t1[X] = t2[X] implies t1[Y] = t2[Y]

then the functional dependency X rarr Y holds in R

Functional Dependencies and Keys

Functional dependencies can be used to determine the candidate keys of a relation

What is superkey candidate key primary key

For example if an attribute functionally determines all other attributes in the relation that attribute can be a key

eno rarr eno ename bdate title supereno dno eno is a candidate key for the Emp relation

Functional Dependencies and Keys

ENo rarr EName BDate Title SuperNo DNoDNo rarr DName MgreNotitle rarr salary

Question List the candidate key(s) of this relation

ENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5

E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7

E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5

E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5

E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7

E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8

E8 J Jones 10-11-72 SA 50000 null D1 Management E8

EmpDept relation

Applications of FDs

1) given a set of FDs F in relation R FD X rarr Y is true or not

2) given a set of FDs F in relation R a set of attributes X is a candidate key of R or not

3) given a set of FDs F in relation R which candidate keys are in R

4) given a set of FDs F in relation R some FDs in F is Excessive or not

Rules for reasoning about functional dependencies

Armstrongs Axioms are three inference rules for functional dependenciesLet A B C and D be subsets of the attributes of relation R

IR1 Reflexive Rule( 自反性 ) If AB then A rarr B

A set of attributes always trivially functionally determines any subset of itself

IR2 Augmentation Rule( 增加性 ) If A rarr B then AC rarr BC

Adding a set of attributes to both the LHS and RHS results in another valid functional dependency

IR3 Transitive Rule( 传递性 ) If A rarr B and B rarr C then A rarr C

Transitivity holds with FDs

Derived Rules from Armstrongs Axioms

There are three other well-known rules that can be derived from the basic three

IR4 Decomposition( 可分解性 ) If A rarr BC then A rarr B and A rarr C A FD with two attributes on RHS can be decomposed into two FDs each with on attribute on the RHS

IR5 Union( 合并性 ) If A rarr B and A rarr C then A rarr BC Reverse of decomposition rule

IR6 Composition( 组合性 ) If A rarr B and C rarr D then AC rarr BD More general than union - can combine two non-overlapping FDs into one

Characteristics of Armstrongs Axioms

The rules are sound( 健壮的 ) and complete( 完备的 )

1) Sound The rules do not infer( 推断 ) any incorrect FDs

2) Complete The rules can be used to infer all correct FDs (no FDs are missing)

The axioms (called IR1 IR2 and IR3) can be used to determine a set of all FDs F+ from an initial set of FDs F

F+ is referred to as the closure of F

Strategy of dealing with questions

The basic idea is that given any set of attributes X we can compute the set of all attributes X+ that can be functionally determined using F This is called the closure of X under F

For question 1 given a set of FDs F in R FD X rarr Y is true or not we know that X rarr Y holds if Y is in X+

For question 2 a set of attributes X is a candidate key of R or not X is a superkey of R if X+ is all attributes of R

For question 3 how to computing F+ we can compute X+ for all possible subsets X of R

Computing the Attribute Closure

The algorithm is as follows

Given a set of attributes X

Let X+ = X

Repeat Find a FD in F whose left side is a subset of X+ Add the right side of F to X+

Until (X+ does not change or X+ contains all attributes of R)

After the algorithm completes you have a set of attributes X+ that can be functionally determined by X This allows you to produce FDs of the form

X rarr A where A is in X+

Computing Attribute Closure Example

R (ABCDEG)

F = A rarr BC C rarr D D rarr G

Compute A+

A+ = A (initial step)

A+ = ABC (by using FD A rarr BC)

A+ = ABCD (by using FD C rarr D)

A+ = ABCDG (by using FD D rarr G)

Similarly we can compute C+ and EG+

C+ = CDG

EG+ = EG

Using Attribute Closure to Detect a FD

The attribute closure algorithm can be used to determine if a particular FD X rarr Y holds based on the set of given FDs F Compute X+ and see if Y is in X+

Example

R = (ABCDEG)

F = A rarr BC C rarr D D rarr G

Does the FD CD rarr G hold Compute CD+ = CDG Yes the FD CD rarr G holds

Does the FD BC rarr E hold Compute BC+ = BCDG No the FD BC rarr E does n

ot hold

Using Attribute Closure to Find all FDs

The attribute closure algorithm can be used to find all FDs of a given relation R if perform the attribute closure algorithm for all subsets X of the attributes R

This is an exponential algorithm in the number of attributes in R Although there are a few optimizations Do not compute the closure of the empty set or of the set of all attributes If we find X+ = all attributes do not compute the closure of any supersets of X Note that even though you do not have to compute the closure (because you know the answer) you still must use the known closure to infer new FDs

Using Attribute Closureto Find all FDs Example

Compute F+ from F where

R (ABCD)

F = A rarr BC C rarr D

Must compute all these attribute closures

A+ B+ C+ D+

AB+ AC+ AD+ BC+ BD+ CD+

ABC+ ABD+ ACD+ BCD+

Do not have to compute ABCD+

Start with the single sets

A+ = ABCD B+ = B C+ = CD D+ = D

New FD A rarr D

Using Attribute Closureto Find all FDs Example (cont)

Next the two element sets Do not compute AB+ AC+ AD+ because A+ contained all attributes of R New FDs AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C BC+ = BCD BD+ = BD CD+ = CD New FD BC rarr DThree element sets Do not compute ABC+ ABD+ ACD+ New FDs ABC rarr D ABD rarr C ACD rarr B BCD+ = BCD No new FDsAll FDs F+ = A rarr BC C rarr D A rarr D BC rarr D AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C ABC rarr D ABD rarrC ACD rarr B

FD Closure Questions

1) Compute all nontrivial FDs and list all keys of R

R = (ABCD)

F = AB rarr C C rarr D D rarr A

2) Compute all nontrivial FDs and list all candidate keys of R

R = (ABCD)

F = AB rarr C BC rarr D CD rarr A AD rarr B

Equivalent Sets of FDs

A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+ We can determine if F covers E by calculating X+ with respect to F for each FD X rarr Y in E and checking whether X+ includes Y

Two sets of FDs E and F are equivalent if E covers F and F covers E That is E+= F+

Minimal Set of FDs

A set of FDs F is minimal if it satisfies these conditions Every FD in F has a single attribute on its right-hand side We cannot replace any FD X rarr A in F with Y rarr A where Y

X and still have a set of dependencies that is equivalent to F

We cannot remove any FD and still have a set that is equivalent to F

A minimal sets of FDs is like a canonical( 规范的 ) form of FDs

Note that there may be many minimal covers for a given set of functional dependencies F

Normal Forms

A relation is in a particular normal form if it satisfies certain

normalization properties

There are several normal forms defined 1NF - First Normal Form 2NF - Second Normal Form 3NF - Third Normal Form BCNF - Boyce-Codd Normal Form 4NF - Fourth Normal Form 5NF - Fifth Normal Form

Each of these normal forms are stricter than the next

For example 3NF is better than 2NF because it removes more redundancyanomalies from the schema than 2NF

Normal Forms

All relations (non-normalized)

1NF (First Normal Form)

2NF (Second Normal Form)

3NF (Third Normal Form)

BCNF (Boyce-Codd NF)

4NF (Fourth NF)

5NF (Fifth NF)

First Normal Form (1NF)

A relation is in first normal form (1NF) if all its attribute values are atomic

That is a 1NF relation cannot have an attribute value that is a set of values (multi-valued attribute) a set of tuples (nested relation)

1NF is a standard assumption in relational DBMSs However object-oriented DBMSs and nested relational DBMSs relax t

his constraint

A relation that is not in 1NF is an unnormalized relation

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1P2

AnalystAnalyst

246

E3 A Lee P3P4

ConsultantEngineer

1048

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1 P2 Analyst Analyst 24 6

E3 A Lee P3P4 Consultant Engineer 10 48

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

A non-1NF Relation

One way to convert a non-1NF relation to a 1NF relation

1) Splitting Method - Divide the existing relation into two relations one with non-repeating attributes and the other with repeating attributes

Make a new relation consisting of the primary key of the original relation and the repeating attributes Determine a primary key for this new relation

Remove the repeating attributes from the original relation

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1P2

AnalystAnalyst

246

E3 A Lee P3P4

ConsultantEngineer

1048

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

ENo EName

E1 J Doe

E2 M Smith

E3 A Lee

E4 J Miller

E5 B Casey

E6 L Chu

E7 R Davis

ENo PNo Resp DurationE1 P1 Manager 12E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36

Second Normal Form (2NF)

A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key

If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation

Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key

Second Normal Form (2NF) Example

fd1 and fd4 are partial functional dependencies Normalize to

Emp (eno ename title bdate salary supereno dno)

WorksOn (eno pno resp hours)

Proj (pno pname budget)

ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration

EmpProj relation

fd1 fd2

fd3fd4

Second Normal Form (2NF) Example (2)

ENo EName

BDate Title Salary SuperNo

DNo

Emp relation

fd1

fd2

fd3fd4

PNo PName Budget

Proj relationENo PNo Resp Duration

WorkOn relation

Third Normal Form (3NF)

Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th

an 2 FDs

A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key

Converting a relation to 3NF from 2NF involves the removal of transitive dependencies

Third Normal Form (3NF) Example

fd2 results in a transitive dependency eno rarr salary Remove it

ENo EName BDate Title Salary SuperNo DNo

Emp relation

fd1

fd2

fd2

Title Salary

Pay relation

ENo EName BDate Title SuperNo DNo

Emp relation

fd1

General Definitions of 2NF and 3NF

We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)

General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key

General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key

Note that a prime attribute is an attribute that is in any key (candidate or primary)

General Definition of 3NF Example

The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute

However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key

Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key

ENo EName BDate Title Salary SuperNo DNo SSN

Emp relation

fd1

fd2

Normalization Question

Consider the universal relation

R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ

List the keys for R

Decompose R into 2NF and then 3NF relations

Boyce-Codd Normal Form (BCNF)

A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key

To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key

The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

ClientInterview relation

Relation is in 3NF but not BCNF

fd4

fd3

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

Note that we lose the FD fd2

fd4

fd3

ClientNo InterviewDate

InterviwTime

StaffNo StaffNo InterviwDate

RoomNo

Interview relation StaffRoom relation

BCNF versus 3NF

We can decompose to BCNF but sometimes we do not want to if we lose a FD

The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency

Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)

In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)

Conversion to BCNF

There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y

Note that if any FD in F+ violates BCNF then a FD in F violates BCNF

2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey

3) Replace R by relations with schemas R1 = X+

R2 = (R ndash X+) U X

4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib

utes are all in R1 or R2

Normalization Question

Given this database schema normalize into BCNF

fd1

PID Country Lot Area Price TaxRate

fd2

fd4fd3

Multi-Valued Dependencies

A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema

A MVD occurs when two independent 1N relationships are in the relational schema

Multi-Valued Dependencies Example

Eno ename Title Salary Pno dno

E1 J Doe EE 30000 P1P2

E2 M Smith SA 50000 P1P4 D1D2D3

E3 A Lee ME 40000 P1P3 D2D3

E4 J Miller PR 20000 P3 D3

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1

E1 J Doe EE 30000 P2

E2 M Smith SA 50000 P1 D1

E2 M Smith SA 50000 P1 D2

E2 M Smith SA 50000 P1 D3

E2 M Smith SA 50000 P4 D1

E2 M Smith SA 50000 P4 D2

E2 M Smith SA 50000 P4 D3

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P3 D3

E3 A Lee ME 40000 P3 D3

E4 J Miller PR 20000 P3 D3

Multi-Valued Dependencies (MVDs)

A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C

A trivial MVD A mdashgtgt B occurs when either B is a subset of A or

A cup B = R

Fourth Normal Form (4NF)

Fourth normal form (4NF) is based on the idea of multi-valued dependencies

A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies

Fourth Normal Form (4NF) Example

eno mdashgtgtpno

eno mdashgtgtdno

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3

eno ename title salary

E1 J Doe EE 30000

E2 MSmith SA 50000

E3 A Lee ME 40000

E4 J Miller PR 20000

eno pno

E1 P1

E1 P2

E2 P1

E2 P4

E3 P1

E3 P3

E4 P3

eno dno

E2 D1

E2 D2

E2 D3

E3 D2

E3 D3

E4 D3

Lossless-join Dependency

The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation

A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined

Fifth Normal Form (5NF)

bull Fifth normal form (5NF) is based on join dependencies

A relation is in fifth normal form (5NF) if the relation has no join dependency

bull 5NF is also called project-join normal form (PJNF)

A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R

The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have

ΠR1(r) ΠR2(r) hellip ΠRn(r) = r

Fifth Normal Form (5NF) Example

Consider a relation Supply (sname partName projName)

1048715Add the additional constraint that

If project j requires part p

and supplier s supplies part p

and supplier s supplies at least one item to project j

Then supplier s also supplies part p to project j

Original Supply Relation Supply Relation with new constraintsname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

Fifth Normal Form (5NF) Example

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

sname partName

Smith Bolt

Smith Nut

Adam Bolt

Walton Nut

Adam Nail

sname projName

Smith X

Smith Y

Adam Y

Walton Z

Adam X

partName projName

Bolt X

Nut Y

Bolt Y

Nut Z

Nail X

Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples

Desirable Properties in normalization

Relational schemas that are well-designed have several

important properties

1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or

relationship

2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins

3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations

Normal Forms in Practice

Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used

For example query execution time may increase because of normalization as more joins become necessary to answer queries

Normal Forms in Practice Example

For example street and city uniquely determine a zipcode

In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed

When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly

fd1

ENo City Street ZipCode

fd2

ENo ZipCode

ZipCode City Street

Normal Forms in Practice Summary

In summary normalization is typically used in two ways

To improve on relational schemas after ER design

To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies

There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance

This is the classic time versus space tradeoff

Conclusion

Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies

Normal forms indicate when relations satisfy certain properties

1NF - All attributes are atomic

2NF - All attributes are fully functionally dependent on a key

3NF - There are no transitive dependencies in the relation

4NF - There are no multi-valued dependencies in the relation

5NF - The relation has no join dependency

In practice normalization is used to improve schemas produced after ER design and existing relational schemas

It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time

Objectives

Explain process of normalization and why it is beneficial

Describe the concept of the Universal Relation

What are the motivations for normalization

What are the 3 types of update anomalies Be able to give examples

Describe the lossless-join property and dependency preservation property

Define and explain the concept of a functional dependency

What is a trivial FD

Describe the relationship between FDs and keys

What are Armstrongs Axioms When are they used

What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs

Objectives (2)

Define minimal and covers for a set of FDs

Define normal forms be able to list the normal forms and draw a diagram showing their relationship

Define 1NF Know how to convert non-1NF relation to 1NF

Define 2NF Convert 1NF relation to 2NF

Define 3NF Convert 2NF relation to 3NF

Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example

Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm

Describe multi-valued dependencies When do they occur

What is a trivial MVD

Define 4NF Convert 3NFBCNF relation to 4NF

Objectives (3)

Describe lossless-join property and how it relates to 5NF

Define 5NF

Explain normalization in terms of the time versus space tradeoff

Explain the relationship between normalization and ER modeling

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
Page 4: Software School of Hunan University 2006.08 Database Systems Design Part III Normalization

Contents

① redundancy( 冗余 ) and update anomalies ( 更新异常) with improper relation design

② lossless-join ( 无损联接 ) property and dependency p

reservation ( 依赖保留 ) property

③ functional dependency (函数依赖) concept and it

applications

④Five Normalization forms ( 5 个范式)

1NF 2NF 3NF BCNF 4NF 5NF

Normalization

Given a collection of relations are they accurate rational and true representation of the enterprise data

Normalization

So we need a technology to validate our design

What is meaning of ldquoaccuraterdquo ldquorationalrdquo ldquoturerdquoMeet the business requirementsConsistencyExtensibilityLess redundancy PerformanceCorrectness

Motivation ( 动机 ) of Normalization

When the design of database is improper it leads to the problems of redundancies( 冗余 ) and update anomalies( 更新异常 )

Redundancy occurs when the same data value is stored more than once in a relation Redundancy wastes space and reduces performa

nceUpdate anomalies are problems that arise when tr

ying to insert delete or update tuples and are often caused by redundancy

Motivation of Normalization

Dream GameDepartment Staff Card

Department No CR74 Department Name Engineering

Staff No E2 Name MSmith

Title Database Designer Salary 66000

Remark

Department Manager Yes No⊙

Supervisor E5

Example of improper relation

bull DeptEmp Relation

ENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5

E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7

E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5

E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5

E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7

E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8

E8 J Jones 10-11-72 SA 50000 null D1 Management E8

Insertion Anomaly Example

Consider these two types of insertion anomalies1) Insert a new employee E9 working in department D2

ndash You have to redundantly insert the department name and manager when adding this record

2) Insert a department D4 that has no current employeesndash This insertion is not possible without creating a dummy employee id a

nd record because ENo is the primary key of the relation

ENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5

E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7

E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5

E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5

E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7

E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8

E8 J Jones 10-11-72 SA 50000 null D1 Management E8

Deletion Anomaly Example

Consider this deletion anomaly- Delete employees E3 and E6 from the database

- Deleting those two employees removes them from the database and we now have lost information about department D2

ENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5

E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7

E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5

E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5

E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7

E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8

E8 J Jones 10-11-72 SA 50000 null D1 Management E8

Modification Anomaly Example

Consider these modification anomalies

1) Change the name of department D3 to ldquoEngineeringrdquo You must update the department name in 3 different records

2) Change the manager of D1 to E4 You must update the superno field in 2 different records

ENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5

E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7

E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5

E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5

E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7

E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8

E8 J Jones 10-11-72 SA 50000 null D1 Management E8

Update Anomalies

bull There are three major types of update anomalies Insertion Anomalies - Insertion of a tuple into the r

elation either requires insertion of redundant information or cannot be performed without setting key values to NULL

Deletion Anomalies - Deletion of a tuple may lose information that is still required to be stored

Modification Anomalies - Changing an attribute of a tuple may require changing multiple attribute values in other tuples

Overview to Normalization

Normalization is a validation technique for producing relations with desirable properties

Normalization is a bottom-up design technique for producing relations

Normalization decomposes relations into smaller relations that contain less redundancy This decomposition requires reconstruction of the original relations from the smaller relations and no information is lost

It was developed by Codd in 1972 preceding ER modeling and extended by others over the years

Example Normalization to DeptEmp

bull DeptEmp Relation

ENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5

E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7

E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5

E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5

E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7

E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8

E8 J Jones 10-11-72 SA 50000 null D1 Management E8

Normalization to DeptEmp

Emp Relation Dept Relation

dno dname mgrno

D1 Management E8

D2 consulting E7

D3 Accounting E5

eno ename bdate title salary superNo dno

E1 J Doe 01-05-75 EE 30000 E2 null

E2 M Smith 06-04-66 SA 50000 E5 D3

E3 A Lee 07-05-66 ME 40000 E7 D2

E4 J Miller 09-01-50 PR 20000 E6 D3

E5 B Casey 12-25-71 SA 50000 E8 D3

E6 L Chu 11-30-65 EE 30000 E7 D2

E7 R Davis 09-08-77 ME 40000 E8 D1

E8 J Jones 10-11-72 SA 50000 null D1

Dept Emp= Emp dno Dept

Overview to Normalization

Input a relation schema and constraints on attributes

Output one or more relation schemas

Processing Idea examining the relationships between attributes and organize the data according to its functional dependencies( 函数依赖)

Significance

1) Validate our design correct our improper design

2) Help for understanding the nature and the meaning of the data in the enterprise for learning more about the data semantics

When to use normalization

Normalization can be used after mapping ER model into relation model or independently

Normalization may be especially useful for databases that have already been designed without using formal techniques

Theory basic of normalization

Functional Dependencies ( 函数依赖) Functional dependencies represent integrity constraints ( 完整

性约束) on the values of attributes in a relation and are used in normalization

A functional dependency (FD) describes the relationship between attributes in a relation We say a set of attributes X functionally determines an attribute Y if given the values of X we always know the only possible value of Y

Notation X rarr Y X functionally determines Y Y is functionally dependent on XExample

eno rarr ename eno pno rarr Duration

Notation for Functional Dependencies

A functional dependency has a left-hand side called the determinant which is a set of attributes and one attribute on the right-hand side

eno pno rarr duration

determinant determined attribute

Strictly speaking there is always only one attribute on the RHS but we can combine several functional dependencies into one

eno pno rarr durationeno pno rarr respeno pno rarr duration resp

Remember that this is really two functional dependencies

The Semantics ofFunctional Dependencies

Similar to keys Functional dependencies are a property of a relation schema (intension) and not a property of a particular instance of the schema (extension)

Functional dependencies are directional

eno rarr ename ename rarr eno

ExampleFDs in EmpDept relation

bull EmpDept RelationENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8E8 J Jones 10-11-72 SA 50000 null D1 Management E8

Eno rarr ename bdate title salary superno dno

dnorarrdname mgrno

titlerarr salary

Trivial Functional Dependencies ( 平凡函数依赖 )

A functional dependency is trivialif the attributes on its left-hand side are a superset of the attributes on its right-hand side

Examples eno rarr eno

eno ename rarr eno

eno pno duration rarr eno Duration

Trivial functional dependencies are not interesting because they do not tell us anything

We are only interested in nontrivial FDs

Full Functional Dependencies

A functional dependency A rarr B is a full functional dependency if removal of any attribute from A results in the dependency not existing any more We say that B is fully functionally dependent on A If remove an attribute from A and the functional dependency still exists we say that B is partially dependent on A

Example

eno rarr ename (full FD)

eno ename rarr salary title (partial FD - can remove ename)

eno pno rarr hours resp (full FD)

How to Identify FDs in a relation

Examining the semantics of the attributes of relation

Identify all non-trivial functional dependencies in the Universal Relation

Universal (eno pno resp duration ename bdate title salary superNo dno dname mgrNo pname budget)

Eno rarr ename bdate title salary superno dnodnorarrdname mgrnotitlerarr salary

Formal Definition ofFunctional Dependencies

Given relation R defined over U = A1 A2 An where X U Y U If for all pairs of tuples t1 and t2 in any legal instance of relation schema R

t1[X] = t2[X] implies t1[Y] = t2[Y]

then the functional dependency X rarr Y holds in R

Functional Dependencies and Keys

Functional dependencies can be used to determine the candidate keys of a relation

What is superkey candidate key primary key

For example if an attribute functionally determines all other attributes in the relation that attribute can be a key

eno rarr eno ename bdate title supereno dno eno is a candidate key for the Emp relation

Functional Dependencies and Keys

ENo rarr EName BDate Title SuperNo DNoDNo rarr DName MgreNotitle rarr salary

Question List the candidate key(s) of this relation

ENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5

E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7

E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5

E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5

E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7

E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8

E8 J Jones 10-11-72 SA 50000 null D1 Management E8

EmpDept relation

Applications of FDs

1) given a set of FDs F in relation R FD X rarr Y is true or not

2) given a set of FDs F in relation R a set of attributes X is a candidate key of R or not

3) given a set of FDs F in relation R which candidate keys are in R

4) given a set of FDs F in relation R some FDs in F is Excessive or not

Rules for reasoning about functional dependencies

Armstrongs Axioms are three inference rules for functional dependenciesLet A B C and D be subsets of the attributes of relation R

IR1 Reflexive Rule( 自反性 ) If AB then A rarr B

A set of attributes always trivially functionally determines any subset of itself

IR2 Augmentation Rule( 增加性 ) If A rarr B then AC rarr BC

Adding a set of attributes to both the LHS and RHS results in another valid functional dependency

IR3 Transitive Rule( 传递性 ) If A rarr B and B rarr C then A rarr C

Transitivity holds with FDs

Derived Rules from Armstrongs Axioms

There are three other well-known rules that can be derived from the basic three

IR4 Decomposition( 可分解性 ) If A rarr BC then A rarr B and A rarr C A FD with two attributes on RHS can be decomposed into two FDs each with on attribute on the RHS

IR5 Union( 合并性 ) If A rarr B and A rarr C then A rarr BC Reverse of decomposition rule

IR6 Composition( 组合性 ) If A rarr B and C rarr D then AC rarr BD More general than union - can combine two non-overlapping FDs into one

Characteristics of Armstrongs Axioms

The rules are sound( 健壮的 ) and complete( 完备的 )

1) Sound The rules do not infer( 推断 ) any incorrect FDs

2) Complete The rules can be used to infer all correct FDs (no FDs are missing)

The axioms (called IR1 IR2 and IR3) can be used to determine a set of all FDs F+ from an initial set of FDs F

F+ is referred to as the closure of F

Strategy of dealing with questions

The basic idea is that given any set of attributes X we can compute the set of all attributes X+ that can be functionally determined using F This is called the closure of X under F

For question 1 given a set of FDs F in R FD X rarr Y is true or not we know that X rarr Y holds if Y is in X+

For question 2 a set of attributes X is a candidate key of R or not X is a superkey of R if X+ is all attributes of R

For question 3 how to computing F+ we can compute X+ for all possible subsets X of R

Computing the Attribute Closure

The algorithm is as follows

Given a set of attributes X

Let X+ = X

Repeat Find a FD in F whose left side is a subset of X+ Add the right side of F to X+

Until (X+ does not change or X+ contains all attributes of R)

After the algorithm completes you have a set of attributes X+ that can be functionally determined by X This allows you to produce FDs of the form

X rarr A where A is in X+

Computing Attribute Closure Example

R (ABCDEG)

F = A rarr BC C rarr D D rarr G

Compute A+

A+ = A (initial step)

A+ = ABC (by using FD A rarr BC)

A+ = ABCD (by using FD C rarr D)

A+ = ABCDG (by using FD D rarr G)

Similarly we can compute C+ and EG+

C+ = CDG

EG+ = EG

Using Attribute Closure to Detect a FD

The attribute closure algorithm can be used to determine if a particular FD X rarr Y holds based on the set of given FDs F Compute X+ and see if Y is in X+

Example

R = (ABCDEG)

F = A rarr BC C rarr D D rarr G

Does the FD CD rarr G hold Compute CD+ = CDG Yes the FD CD rarr G holds

Does the FD BC rarr E hold Compute BC+ = BCDG No the FD BC rarr E does n

ot hold

Using Attribute Closure to Find all FDs

The attribute closure algorithm can be used to find all FDs of a given relation R if perform the attribute closure algorithm for all subsets X of the attributes R

This is an exponential algorithm in the number of attributes in R Although there are a few optimizations Do not compute the closure of the empty set or of the set of all attributes If we find X+ = all attributes do not compute the closure of any supersets of X Note that even though you do not have to compute the closure (because you know the answer) you still must use the known closure to infer new FDs

Using Attribute Closureto Find all FDs Example

Compute F+ from F where

R (ABCD)

F = A rarr BC C rarr D

Must compute all these attribute closures

A+ B+ C+ D+

AB+ AC+ AD+ BC+ BD+ CD+

ABC+ ABD+ ACD+ BCD+

Do not have to compute ABCD+

Start with the single sets

A+ = ABCD B+ = B C+ = CD D+ = D

New FD A rarr D

Using Attribute Closureto Find all FDs Example (cont)

Next the two element sets Do not compute AB+ AC+ AD+ because A+ contained all attributes of R New FDs AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C BC+ = BCD BD+ = BD CD+ = CD New FD BC rarr DThree element sets Do not compute ABC+ ABD+ ACD+ New FDs ABC rarr D ABD rarr C ACD rarr B BCD+ = BCD No new FDsAll FDs F+ = A rarr BC C rarr D A rarr D BC rarr D AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C ABC rarr D ABD rarrC ACD rarr B

FD Closure Questions

1) Compute all nontrivial FDs and list all keys of R

R = (ABCD)

F = AB rarr C C rarr D D rarr A

2) Compute all nontrivial FDs and list all candidate keys of R

R = (ABCD)

F = AB rarr C BC rarr D CD rarr A AD rarr B

Equivalent Sets of FDs

A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+ We can determine if F covers E by calculating X+ with respect to F for each FD X rarr Y in E and checking whether X+ includes Y

Two sets of FDs E and F are equivalent if E covers F and F covers E That is E+= F+

Minimal Set of FDs

A set of FDs F is minimal if it satisfies these conditions Every FD in F has a single attribute on its right-hand side We cannot replace any FD X rarr A in F with Y rarr A where Y

X and still have a set of dependencies that is equivalent to F

We cannot remove any FD and still have a set that is equivalent to F

A minimal sets of FDs is like a canonical( 规范的 ) form of FDs

Note that there may be many minimal covers for a given set of functional dependencies F

Normal Forms

A relation is in a particular normal form if it satisfies certain

normalization properties

There are several normal forms defined 1NF - First Normal Form 2NF - Second Normal Form 3NF - Third Normal Form BCNF - Boyce-Codd Normal Form 4NF - Fourth Normal Form 5NF - Fifth Normal Form

Each of these normal forms are stricter than the next

For example 3NF is better than 2NF because it removes more redundancyanomalies from the schema than 2NF

Normal Forms

All relations (non-normalized)

1NF (First Normal Form)

2NF (Second Normal Form)

3NF (Third Normal Form)

BCNF (Boyce-Codd NF)

4NF (Fourth NF)

5NF (Fifth NF)

First Normal Form (1NF)

A relation is in first normal form (1NF) if all its attribute values are atomic

That is a 1NF relation cannot have an attribute value that is a set of values (multi-valued attribute) a set of tuples (nested relation)

1NF is a standard assumption in relational DBMSs However object-oriented DBMSs and nested relational DBMSs relax t

his constraint

A relation that is not in 1NF is an unnormalized relation

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1P2

AnalystAnalyst

246

E3 A Lee P3P4

ConsultantEngineer

1048

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1 P2 Analyst Analyst 24 6

E3 A Lee P3P4 Consultant Engineer 10 48

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

A non-1NF Relation

One way to convert a non-1NF relation to a 1NF relation

1) Splitting Method - Divide the existing relation into two relations one with non-repeating attributes and the other with repeating attributes

Make a new relation consisting of the primary key of the original relation and the repeating attributes Determine a primary key for this new relation

Remove the repeating attributes from the original relation

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1P2

AnalystAnalyst

246

E3 A Lee P3P4

ConsultantEngineer

1048

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

ENo EName

E1 J Doe

E2 M Smith

E3 A Lee

E4 J Miller

E5 B Casey

E6 L Chu

E7 R Davis

ENo PNo Resp DurationE1 P1 Manager 12E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36

Second Normal Form (2NF)

A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key

If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation

Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key

Second Normal Form (2NF) Example

fd1 and fd4 are partial functional dependencies Normalize to

Emp (eno ename title bdate salary supereno dno)

WorksOn (eno pno resp hours)

Proj (pno pname budget)

ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration

EmpProj relation

fd1 fd2

fd3fd4

Second Normal Form (2NF) Example (2)

ENo EName

BDate Title Salary SuperNo

DNo

Emp relation

fd1

fd2

fd3fd4

PNo PName Budget

Proj relationENo PNo Resp Duration

WorkOn relation

Third Normal Form (3NF)

Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th

an 2 FDs

A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key

Converting a relation to 3NF from 2NF involves the removal of transitive dependencies

Third Normal Form (3NF) Example

fd2 results in a transitive dependency eno rarr salary Remove it

ENo EName BDate Title Salary SuperNo DNo

Emp relation

fd1

fd2

fd2

Title Salary

Pay relation

ENo EName BDate Title SuperNo DNo

Emp relation

fd1

General Definitions of 2NF and 3NF

We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)

General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key

General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key

Note that a prime attribute is an attribute that is in any key (candidate or primary)

General Definition of 3NF Example

The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute

However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key

Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key

ENo EName BDate Title Salary SuperNo DNo SSN

Emp relation

fd1

fd2

Normalization Question

Consider the universal relation

R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ

List the keys for R

Decompose R into 2NF and then 3NF relations

Boyce-Codd Normal Form (BCNF)

A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key

To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key

The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

ClientInterview relation

Relation is in 3NF but not BCNF

fd4

fd3

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

Note that we lose the FD fd2

fd4

fd3

ClientNo InterviewDate

InterviwTime

StaffNo StaffNo InterviwDate

RoomNo

Interview relation StaffRoom relation

BCNF versus 3NF

We can decompose to BCNF but sometimes we do not want to if we lose a FD

The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency

Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)

In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)

Conversion to BCNF

There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y

Note that if any FD in F+ violates BCNF then a FD in F violates BCNF

2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey

3) Replace R by relations with schemas R1 = X+

R2 = (R ndash X+) U X

4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib

utes are all in R1 or R2

Normalization Question

Given this database schema normalize into BCNF

fd1

PID Country Lot Area Price TaxRate

fd2

fd4fd3

Multi-Valued Dependencies

A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema

A MVD occurs when two independent 1N relationships are in the relational schema

Multi-Valued Dependencies Example

Eno ename Title Salary Pno dno

E1 J Doe EE 30000 P1P2

E2 M Smith SA 50000 P1P4 D1D2D3

E3 A Lee ME 40000 P1P3 D2D3

E4 J Miller PR 20000 P3 D3

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1

E1 J Doe EE 30000 P2

E2 M Smith SA 50000 P1 D1

E2 M Smith SA 50000 P1 D2

E2 M Smith SA 50000 P1 D3

E2 M Smith SA 50000 P4 D1

E2 M Smith SA 50000 P4 D2

E2 M Smith SA 50000 P4 D3

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P3 D3

E3 A Lee ME 40000 P3 D3

E4 J Miller PR 20000 P3 D3

Multi-Valued Dependencies (MVDs)

A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C

A trivial MVD A mdashgtgt B occurs when either B is a subset of A or

A cup B = R

Fourth Normal Form (4NF)

Fourth normal form (4NF) is based on the idea of multi-valued dependencies

A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies

Fourth Normal Form (4NF) Example

eno mdashgtgtpno

eno mdashgtgtdno

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3

eno ename title salary

E1 J Doe EE 30000

E2 MSmith SA 50000

E3 A Lee ME 40000

E4 J Miller PR 20000

eno pno

E1 P1

E1 P2

E2 P1

E2 P4

E3 P1

E3 P3

E4 P3

eno dno

E2 D1

E2 D2

E2 D3

E3 D2

E3 D3

E4 D3

Lossless-join Dependency

The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation

A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined

Fifth Normal Form (5NF)

bull Fifth normal form (5NF) is based on join dependencies

A relation is in fifth normal form (5NF) if the relation has no join dependency

bull 5NF is also called project-join normal form (PJNF)

A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R

The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have

ΠR1(r) ΠR2(r) hellip ΠRn(r) = r

Fifth Normal Form (5NF) Example

Consider a relation Supply (sname partName projName)

1048715Add the additional constraint that

If project j requires part p

and supplier s supplies part p

and supplier s supplies at least one item to project j

Then supplier s also supplies part p to project j

Original Supply Relation Supply Relation with new constraintsname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

Fifth Normal Form (5NF) Example

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

sname partName

Smith Bolt

Smith Nut

Adam Bolt

Walton Nut

Adam Nail

sname projName

Smith X

Smith Y

Adam Y

Walton Z

Adam X

partName projName

Bolt X

Nut Y

Bolt Y

Nut Z

Nail X

Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples

Desirable Properties in normalization

Relational schemas that are well-designed have several

important properties

1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or

relationship

2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins

3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations

Normal Forms in Practice

Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used

For example query execution time may increase because of normalization as more joins become necessary to answer queries

Normal Forms in Practice Example

For example street and city uniquely determine a zipcode

In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed

When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly

fd1

ENo City Street ZipCode

fd2

ENo ZipCode

ZipCode City Street

Normal Forms in Practice Summary

In summary normalization is typically used in two ways

To improve on relational schemas after ER design

To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies

There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance

This is the classic time versus space tradeoff

Conclusion

Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies

Normal forms indicate when relations satisfy certain properties

1NF - All attributes are atomic

2NF - All attributes are fully functionally dependent on a key

3NF - There are no transitive dependencies in the relation

4NF - There are no multi-valued dependencies in the relation

5NF - The relation has no join dependency

In practice normalization is used to improve schemas produced after ER design and existing relational schemas

It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time

Objectives

Explain process of normalization and why it is beneficial

Describe the concept of the Universal Relation

What are the motivations for normalization

What are the 3 types of update anomalies Be able to give examples

Describe the lossless-join property and dependency preservation property

Define and explain the concept of a functional dependency

What is a trivial FD

Describe the relationship between FDs and keys

What are Armstrongs Axioms When are they used

What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs

Objectives (2)

Define minimal and covers for a set of FDs

Define normal forms be able to list the normal forms and draw a diagram showing their relationship

Define 1NF Know how to convert non-1NF relation to 1NF

Define 2NF Convert 1NF relation to 2NF

Define 3NF Convert 2NF relation to 3NF

Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example

Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm

Describe multi-valued dependencies When do they occur

What is a trivial MVD

Define 4NF Convert 3NFBCNF relation to 4NF

Objectives (3)

Describe lossless-join property and how it relates to 5NF

Define 5NF

Explain normalization in terms of the time versus space tradeoff

Explain the relationship between normalization and ER modeling

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
Page 5: Software School of Hunan University 2006.08 Database Systems Design Part III Normalization

Normalization

Given a collection of relations are they accurate rational and true representation of the enterprise data

Normalization

So we need a technology to validate our design

What is meaning of ldquoaccuraterdquo ldquorationalrdquo ldquoturerdquoMeet the business requirementsConsistencyExtensibilityLess redundancy PerformanceCorrectness

Motivation ( 动机 ) of Normalization

When the design of database is improper it leads to the problems of redundancies( 冗余 ) and update anomalies( 更新异常 )

Redundancy occurs when the same data value is stored more than once in a relation Redundancy wastes space and reduces performa

nceUpdate anomalies are problems that arise when tr

ying to insert delete or update tuples and are often caused by redundancy

Motivation of Normalization

Dream GameDepartment Staff Card

Department No CR74 Department Name Engineering

Staff No E2 Name MSmith

Title Database Designer Salary 66000

Remark

Department Manager Yes No⊙

Supervisor E5

Example of improper relation

bull DeptEmp Relation

ENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5

E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7

E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5

E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5

E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7

E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8

E8 J Jones 10-11-72 SA 50000 null D1 Management E8

Insertion Anomaly Example

Consider these two types of insertion anomalies1) Insert a new employee E9 working in department D2

ndash You have to redundantly insert the department name and manager when adding this record

2) Insert a department D4 that has no current employeesndash This insertion is not possible without creating a dummy employee id a

nd record because ENo is the primary key of the relation

ENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5

E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7

E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5

E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5

E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7

E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8

E8 J Jones 10-11-72 SA 50000 null D1 Management E8

Deletion Anomaly Example

Consider this deletion anomaly- Delete employees E3 and E6 from the database

- Deleting those two employees removes them from the database and we now have lost information about department D2

ENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5

E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7

E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5

E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5

E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7

E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8

E8 J Jones 10-11-72 SA 50000 null D1 Management E8

Modification Anomaly Example

Consider these modification anomalies

1) Change the name of department D3 to ldquoEngineeringrdquo You must update the department name in 3 different records

2) Change the manager of D1 to E4 You must update the superno field in 2 different records

ENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5

E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7

E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5

E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5

E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7

E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8

E8 J Jones 10-11-72 SA 50000 null D1 Management E8

Update Anomalies

bull There are three major types of update anomalies Insertion Anomalies - Insertion of a tuple into the r

elation either requires insertion of redundant information or cannot be performed without setting key values to NULL

Deletion Anomalies - Deletion of a tuple may lose information that is still required to be stored

Modification Anomalies - Changing an attribute of a tuple may require changing multiple attribute values in other tuples

Overview to Normalization

Normalization is a validation technique for producing relations with desirable properties

Normalization is a bottom-up design technique for producing relations

Normalization decomposes relations into smaller relations that contain less redundancy This decomposition requires reconstruction of the original relations from the smaller relations and no information is lost

It was developed by Codd in 1972 preceding ER modeling and extended by others over the years

Example Normalization to DeptEmp

bull DeptEmp Relation

ENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5

E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7

E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5

E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5

E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7

E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8

E8 J Jones 10-11-72 SA 50000 null D1 Management E8

Normalization to DeptEmp

Emp Relation Dept Relation

dno dname mgrno

D1 Management E8

D2 consulting E7

D3 Accounting E5

eno ename bdate title salary superNo dno

E1 J Doe 01-05-75 EE 30000 E2 null

E2 M Smith 06-04-66 SA 50000 E5 D3

E3 A Lee 07-05-66 ME 40000 E7 D2

E4 J Miller 09-01-50 PR 20000 E6 D3

E5 B Casey 12-25-71 SA 50000 E8 D3

E6 L Chu 11-30-65 EE 30000 E7 D2

E7 R Davis 09-08-77 ME 40000 E8 D1

E8 J Jones 10-11-72 SA 50000 null D1

Dept Emp= Emp dno Dept

Overview to Normalization

Input a relation schema and constraints on attributes

Output one or more relation schemas

Processing Idea examining the relationships between attributes and organize the data according to its functional dependencies( 函数依赖)

Significance

1) Validate our design correct our improper design

2) Help for understanding the nature and the meaning of the data in the enterprise for learning more about the data semantics

When to use normalization

Normalization can be used after mapping ER model into relation model or independently

Normalization may be especially useful for databases that have already been designed without using formal techniques

Theory basic of normalization

Functional Dependencies ( 函数依赖) Functional dependencies represent integrity constraints ( 完整

性约束) on the values of attributes in a relation and are used in normalization

A functional dependency (FD) describes the relationship between attributes in a relation We say a set of attributes X functionally determines an attribute Y if given the values of X we always know the only possible value of Y

Notation X rarr Y X functionally determines Y Y is functionally dependent on XExample

eno rarr ename eno pno rarr Duration

Notation for Functional Dependencies

A functional dependency has a left-hand side called the determinant which is a set of attributes and one attribute on the right-hand side

eno pno rarr duration

determinant determined attribute

Strictly speaking there is always only one attribute on the RHS but we can combine several functional dependencies into one

eno pno rarr durationeno pno rarr respeno pno rarr duration resp

Remember that this is really two functional dependencies

The Semantics ofFunctional Dependencies

Similar to keys Functional dependencies are a property of a relation schema (intension) and not a property of a particular instance of the schema (extension)

Functional dependencies are directional

eno rarr ename ename rarr eno

ExampleFDs in EmpDept relation

bull EmpDept RelationENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8E8 J Jones 10-11-72 SA 50000 null D1 Management E8

Eno rarr ename bdate title salary superno dno

dnorarrdname mgrno

titlerarr salary

Trivial Functional Dependencies ( 平凡函数依赖 )

A functional dependency is trivialif the attributes on its left-hand side are a superset of the attributes on its right-hand side

Examples eno rarr eno

eno ename rarr eno

eno pno duration rarr eno Duration

Trivial functional dependencies are not interesting because they do not tell us anything

We are only interested in nontrivial FDs

Full Functional Dependencies

A functional dependency A rarr B is a full functional dependency if removal of any attribute from A results in the dependency not existing any more We say that B is fully functionally dependent on A If remove an attribute from A and the functional dependency still exists we say that B is partially dependent on A

Example

eno rarr ename (full FD)

eno ename rarr salary title (partial FD - can remove ename)

eno pno rarr hours resp (full FD)

How to Identify FDs in a relation

Examining the semantics of the attributes of relation

Identify all non-trivial functional dependencies in the Universal Relation

Universal (eno pno resp duration ename bdate title salary superNo dno dname mgrNo pname budget)

Eno rarr ename bdate title salary superno dnodnorarrdname mgrnotitlerarr salary

Formal Definition ofFunctional Dependencies

Given relation R defined over U = A1 A2 An where X U Y U If for all pairs of tuples t1 and t2 in any legal instance of relation schema R

t1[X] = t2[X] implies t1[Y] = t2[Y]

then the functional dependency X rarr Y holds in R

Functional Dependencies and Keys

Functional dependencies can be used to determine the candidate keys of a relation

What is superkey candidate key primary key

For example if an attribute functionally determines all other attributes in the relation that attribute can be a key

eno rarr eno ename bdate title supereno dno eno is a candidate key for the Emp relation

Functional Dependencies and Keys

ENo rarr EName BDate Title SuperNo DNoDNo rarr DName MgreNotitle rarr salary

Question List the candidate key(s) of this relation

ENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5

E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7

E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5

E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5

E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7

E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8

E8 J Jones 10-11-72 SA 50000 null D1 Management E8

EmpDept relation

Applications of FDs

1) given a set of FDs F in relation R FD X rarr Y is true or not

2) given a set of FDs F in relation R a set of attributes X is a candidate key of R or not

3) given a set of FDs F in relation R which candidate keys are in R

4) given a set of FDs F in relation R some FDs in F is Excessive or not

Rules for reasoning about functional dependencies

Armstrongs Axioms are three inference rules for functional dependenciesLet A B C and D be subsets of the attributes of relation R

IR1 Reflexive Rule( 自反性 ) If AB then A rarr B

A set of attributes always trivially functionally determines any subset of itself

IR2 Augmentation Rule( 增加性 ) If A rarr B then AC rarr BC

Adding a set of attributes to both the LHS and RHS results in another valid functional dependency

IR3 Transitive Rule( 传递性 ) If A rarr B and B rarr C then A rarr C

Transitivity holds with FDs

Derived Rules from Armstrongs Axioms

There are three other well-known rules that can be derived from the basic three

IR4 Decomposition( 可分解性 ) If A rarr BC then A rarr B and A rarr C A FD with two attributes on RHS can be decomposed into two FDs each with on attribute on the RHS

IR5 Union( 合并性 ) If A rarr B and A rarr C then A rarr BC Reverse of decomposition rule

IR6 Composition( 组合性 ) If A rarr B and C rarr D then AC rarr BD More general than union - can combine two non-overlapping FDs into one

Characteristics of Armstrongs Axioms

The rules are sound( 健壮的 ) and complete( 完备的 )

1) Sound The rules do not infer( 推断 ) any incorrect FDs

2) Complete The rules can be used to infer all correct FDs (no FDs are missing)

The axioms (called IR1 IR2 and IR3) can be used to determine a set of all FDs F+ from an initial set of FDs F

F+ is referred to as the closure of F

Strategy of dealing with questions

The basic idea is that given any set of attributes X we can compute the set of all attributes X+ that can be functionally determined using F This is called the closure of X under F

For question 1 given a set of FDs F in R FD X rarr Y is true or not we know that X rarr Y holds if Y is in X+

For question 2 a set of attributes X is a candidate key of R or not X is a superkey of R if X+ is all attributes of R

For question 3 how to computing F+ we can compute X+ for all possible subsets X of R

Computing the Attribute Closure

The algorithm is as follows

Given a set of attributes X

Let X+ = X

Repeat Find a FD in F whose left side is a subset of X+ Add the right side of F to X+

Until (X+ does not change or X+ contains all attributes of R)

After the algorithm completes you have a set of attributes X+ that can be functionally determined by X This allows you to produce FDs of the form

X rarr A where A is in X+

Computing Attribute Closure Example

R (ABCDEG)

F = A rarr BC C rarr D D rarr G

Compute A+

A+ = A (initial step)

A+ = ABC (by using FD A rarr BC)

A+ = ABCD (by using FD C rarr D)

A+ = ABCDG (by using FD D rarr G)

Similarly we can compute C+ and EG+

C+ = CDG

EG+ = EG

Using Attribute Closure to Detect a FD

The attribute closure algorithm can be used to determine if a particular FD X rarr Y holds based on the set of given FDs F Compute X+ and see if Y is in X+

Example

R = (ABCDEG)

F = A rarr BC C rarr D D rarr G

Does the FD CD rarr G hold Compute CD+ = CDG Yes the FD CD rarr G holds

Does the FD BC rarr E hold Compute BC+ = BCDG No the FD BC rarr E does n

ot hold

Using Attribute Closure to Find all FDs

The attribute closure algorithm can be used to find all FDs of a given relation R if perform the attribute closure algorithm for all subsets X of the attributes R

This is an exponential algorithm in the number of attributes in R Although there are a few optimizations Do not compute the closure of the empty set or of the set of all attributes If we find X+ = all attributes do not compute the closure of any supersets of X Note that even though you do not have to compute the closure (because you know the answer) you still must use the known closure to infer new FDs

Using Attribute Closureto Find all FDs Example

Compute F+ from F where

R (ABCD)

F = A rarr BC C rarr D

Must compute all these attribute closures

A+ B+ C+ D+

AB+ AC+ AD+ BC+ BD+ CD+

ABC+ ABD+ ACD+ BCD+

Do not have to compute ABCD+

Start with the single sets

A+ = ABCD B+ = B C+ = CD D+ = D

New FD A rarr D

Using Attribute Closureto Find all FDs Example (cont)

Next the two element sets Do not compute AB+ AC+ AD+ because A+ contained all attributes of R New FDs AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C BC+ = BCD BD+ = BD CD+ = CD New FD BC rarr DThree element sets Do not compute ABC+ ABD+ ACD+ New FDs ABC rarr D ABD rarr C ACD rarr B BCD+ = BCD No new FDsAll FDs F+ = A rarr BC C rarr D A rarr D BC rarr D AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C ABC rarr D ABD rarrC ACD rarr B

FD Closure Questions

1) Compute all nontrivial FDs and list all keys of R

R = (ABCD)

F = AB rarr C C rarr D D rarr A

2) Compute all nontrivial FDs and list all candidate keys of R

R = (ABCD)

F = AB rarr C BC rarr D CD rarr A AD rarr B

Equivalent Sets of FDs

A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+ We can determine if F covers E by calculating X+ with respect to F for each FD X rarr Y in E and checking whether X+ includes Y

Two sets of FDs E and F are equivalent if E covers F and F covers E That is E+= F+

Minimal Set of FDs

A set of FDs F is minimal if it satisfies these conditions Every FD in F has a single attribute on its right-hand side We cannot replace any FD X rarr A in F with Y rarr A where Y

X and still have a set of dependencies that is equivalent to F

We cannot remove any FD and still have a set that is equivalent to F

A minimal sets of FDs is like a canonical( 规范的 ) form of FDs

Note that there may be many minimal covers for a given set of functional dependencies F

Normal Forms

A relation is in a particular normal form if it satisfies certain

normalization properties

There are several normal forms defined 1NF - First Normal Form 2NF - Second Normal Form 3NF - Third Normal Form BCNF - Boyce-Codd Normal Form 4NF - Fourth Normal Form 5NF - Fifth Normal Form

Each of these normal forms are stricter than the next

For example 3NF is better than 2NF because it removes more redundancyanomalies from the schema than 2NF

Normal Forms

All relations (non-normalized)

1NF (First Normal Form)

2NF (Second Normal Form)

3NF (Third Normal Form)

BCNF (Boyce-Codd NF)

4NF (Fourth NF)

5NF (Fifth NF)

First Normal Form (1NF)

A relation is in first normal form (1NF) if all its attribute values are atomic

That is a 1NF relation cannot have an attribute value that is a set of values (multi-valued attribute) a set of tuples (nested relation)

1NF is a standard assumption in relational DBMSs However object-oriented DBMSs and nested relational DBMSs relax t

his constraint

A relation that is not in 1NF is an unnormalized relation

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1P2

AnalystAnalyst

246

E3 A Lee P3P4

ConsultantEngineer

1048

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1 P2 Analyst Analyst 24 6

E3 A Lee P3P4 Consultant Engineer 10 48

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

A non-1NF Relation

One way to convert a non-1NF relation to a 1NF relation

1) Splitting Method - Divide the existing relation into two relations one with non-repeating attributes and the other with repeating attributes

Make a new relation consisting of the primary key of the original relation and the repeating attributes Determine a primary key for this new relation

Remove the repeating attributes from the original relation

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1P2

AnalystAnalyst

246

E3 A Lee P3P4

ConsultantEngineer

1048

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

ENo EName

E1 J Doe

E2 M Smith

E3 A Lee

E4 J Miller

E5 B Casey

E6 L Chu

E7 R Davis

ENo PNo Resp DurationE1 P1 Manager 12E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36

Second Normal Form (2NF)

A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key

If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation

Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key

Second Normal Form (2NF) Example

fd1 and fd4 are partial functional dependencies Normalize to

Emp (eno ename title bdate salary supereno dno)

WorksOn (eno pno resp hours)

Proj (pno pname budget)

ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration

EmpProj relation

fd1 fd2

fd3fd4

Second Normal Form (2NF) Example (2)

ENo EName

BDate Title Salary SuperNo

DNo

Emp relation

fd1

fd2

fd3fd4

PNo PName Budget

Proj relationENo PNo Resp Duration

WorkOn relation

Third Normal Form (3NF)

Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th

an 2 FDs

A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key

Converting a relation to 3NF from 2NF involves the removal of transitive dependencies

Third Normal Form (3NF) Example

fd2 results in a transitive dependency eno rarr salary Remove it

ENo EName BDate Title Salary SuperNo DNo

Emp relation

fd1

fd2

fd2

Title Salary

Pay relation

ENo EName BDate Title SuperNo DNo

Emp relation

fd1

General Definitions of 2NF and 3NF

We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)

General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key

General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key

Note that a prime attribute is an attribute that is in any key (candidate or primary)

General Definition of 3NF Example

The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute

However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key

Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key

ENo EName BDate Title Salary SuperNo DNo SSN

Emp relation

fd1

fd2

Normalization Question

Consider the universal relation

R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ

List the keys for R

Decompose R into 2NF and then 3NF relations

Boyce-Codd Normal Form (BCNF)

A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key

To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key

The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

ClientInterview relation

Relation is in 3NF but not BCNF

fd4

fd3

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

Note that we lose the FD fd2

fd4

fd3

ClientNo InterviewDate

InterviwTime

StaffNo StaffNo InterviwDate

RoomNo

Interview relation StaffRoom relation

BCNF versus 3NF

We can decompose to BCNF but sometimes we do not want to if we lose a FD

The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency

Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)

In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)

Conversion to BCNF

There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y

Note that if any FD in F+ violates BCNF then a FD in F violates BCNF

2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey

3) Replace R by relations with schemas R1 = X+

R2 = (R ndash X+) U X

4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib

utes are all in R1 or R2

Normalization Question

Given this database schema normalize into BCNF

fd1

PID Country Lot Area Price TaxRate

fd2

fd4fd3

Multi-Valued Dependencies

A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema

A MVD occurs when two independent 1N relationships are in the relational schema

Multi-Valued Dependencies Example

Eno ename Title Salary Pno dno

E1 J Doe EE 30000 P1P2

E2 M Smith SA 50000 P1P4 D1D2D3

E3 A Lee ME 40000 P1P3 D2D3

E4 J Miller PR 20000 P3 D3

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1

E1 J Doe EE 30000 P2

E2 M Smith SA 50000 P1 D1

E2 M Smith SA 50000 P1 D2

E2 M Smith SA 50000 P1 D3

E2 M Smith SA 50000 P4 D1

E2 M Smith SA 50000 P4 D2

E2 M Smith SA 50000 P4 D3

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P3 D3

E3 A Lee ME 40000 P3 D3

E4 J Miller PR 20000 P3 D3

Multi-Valued Dependencies (MVDs)

A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C

A trivial MVD A mdashgtgt B occurs when either B is a subset of A or

A cup B = R

Fourth Normal Form (4NF)

Fourth normal form (4NF) is based on the idea of multi-valued dependencies

A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies

Fourth Normal Form (4NF) Example

eno mdashgtgtpno

eno mdashgtgtdno

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3

eno ename title salary

E1 J Doe EE 30000

E2 MSmith SA 50000

E3 A Lee ME 40000

E4 J Miller PR 20000

eno pno

E1 P1

E1 P2

E2 P1

E2 P4

E3 P1

E3 P3

E4 P3

eno dno

E2 D1

E2 D2

E2 D3

E3 D2

E3 D3

E4 D3

Lossless-join Dependency

The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation

A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined

Fifth Normal Form (5NF)

bull Fifth normal form (5NF) is based on join dependencies

A relation is in fifth normal form (5NF) if the relation has no join dependency

bull 5NF is also called project-join normal form (PJNF)

A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R

The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have

ΠR1(r) ΠR2(r) hellip ΠRn(r) = r

Fifth Normal Form (5NF) Example

Consider a relation Supply (sname partName projName)

1048715Add the additional constraint that

If project j requires part p

and supplier s supplies part p

and supplier s supplies at least one item to project j

Then supplier s also supplies part p to project j

Original Supply Relation Supply Relation with new constraintsname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

Fifth Normal Form (5NF) Example

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

sname partName

Smith Bolt

Smith Nut

Adam Bolt

Walton Nut

Adam Nail

sname projName

Smith X

Smith Y

Adam Y

Walton Z

Adam X

partName projName

Bolt X

Nut Y

Bolt Y

Nut Z

Nail X

Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples

Desirable Properties in normalization

Relational schemas that are well-designed have several

important properties

1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or

relationship

2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins

3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations

Normal Forms in Practice

Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used

For example query execution time may increase because of normalization as more joins become necessary to answer queries

Normal Forms in Practice Example

For example street and city uniquely determine a zipcode

In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed

When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly

fd1

ENo City Street ZipCode

fd2

ENo ZipCode

ZipCode City Street

Normal Forms in Practice Summary

In summary normalization is typically used in two ways

To improve on relational schemas after ER design

To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies

There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance

This is the classic time versus space tradeoff

Conclusion

Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies

Normal forms indicate when relations satisfy certain properties

1NF - All attributes are atomic

2NF - All attributes are fully functionally dependent on a key

3NF - There are no transitive dependencies in the relation

4NF - There are no multi-valued dependencies in the relation

5NF - The relation has no join dependency

In practice normalization is used to improve schemas produced after ER design and existing relational schemas

It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time

Objectives

Explain process of normalization and why it is beneficial

Describe the concept of the Universal Relation

What are the motivations for normalization

What are the 3 types of update anomalies Be able to give examples

Describe the lossless-join property and dependency preservation property

Define and explain the concept of a functional dependency

What is a trivial FD

Describe the relationship between FDs and keys

What are Armstrongs Axioms When are they used

What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs

Objectives (2)

Define minimal and covers for a set of FDs

Define normal forms be able to list the normal forms and draw a diagram showing their relationship

Define 1NF Know how to convert non-1NF relation to 1NF

Define 2NF Convert 1NF relation to 2NF

Define 3NF Convert 2NF relation to 3NF

Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example

Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm

Describe multi-valued dependencies When do they occur

What is a trivial MVD

Define 4NF Convert 3NFBCNF relation to 4NF

Objectives (3)

Describe lossless-join property and how it relates to 5NF

Define 5NF

Explain normalization in terms of the time versus space tradeoff

Explain the relationship between normalization and ER modeling

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
Page 6: Software School of Hunan University 2006.08 Database Systems Design Part III Normalization

Normalization

So we need a technology to validate our design

What is meaning of ldquoaccuraterdquo ldquorationalrdquo ldquoturerdquoMeet the business requirementsConsistencyExtensibilityLess redundancy PerformanceCorrectness

Motivation ( 动机 ) of Normalization

When the design of database is improper it leads to the problems of redundancies( 冗余 ) and update anomalies( 更新异常 )

Redundancy occurs when the same data value is stored more than once in a relation Redundancy wastes space and reduces performa

nceUpdate anomalies are problems that arise when tr

ying to insert delete or update tuples and are often caused by redundancy

Motivation of Normalization

Dream GameDepartment Staff Card

Department No CR74 Department Name Engineering

Staff No E2 Name MSmith

Title Database Designer Salary 66000

Remark

Department Manager Yes No⊙

Supervisor E5

Example of improper relation

bull DeptEmp Relation

ENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5

E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7

E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5

E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5

E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7

E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8

E8 J Jones 10-11-72 SA 50000 null D1 Management E8

Insertion Anomaly Example

Consider these two types of insertion anomalies1) Insert a new employee E9 working in department D2

ndash You have to redundantly insert the department name and manager when adding this record

2) Insert a department D4 that has no current employeesndash This insertion is not possible without creating a dummy employee id a

nd record because ENo is the primary key of the relation

ENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5

E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7

E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5

E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5

E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7

E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8

E8 J Jones 10-11-72 SA 50000 null D1 Management E8

Deletion Anomaly Example

Consider this deletion anomaly- Delete employees E3 and E6 from the database

- Deleting those two employees removes them from the database and we now have lost information about department D2

ENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5

E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7

E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5

E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5

E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7

E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8

E8 J Jones 10-11-72 SA 50000 null D1 Management E8

Modification Anomaly Example

Consider these modification anomalies

1) Change the name of department D3 to ldquoEngineeringrdquo You must update the department name in 3 different records

2) Change the manager of D1 to E4 You must update the superno field in 2 different records

ENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5

E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7

E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5

E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5

E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7

E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8

E8 J Jones 10-11-72 SA 50000 null D1 Management E8

Update Anomalies

bull There are three major types of update anomalies Insertion Anomalies - Insertion of a tuple into the r

elation either requires insertion of redundant information or cannot be performed without setting key values to NULL

Deletion Anomalies - Deletion of a tuple may lose information that is still required to be stored

Modification Anomalies - Changing an attribute of a tuple may require changing multiple attribute values in other tuples

Overview to Normalization

Normalization is a validation technique for producing relations with desirable properties

Normalization is a bottom-up design technique for producing relations

Normalization decomposes relations into smaller relations that contain less redundancy This decomposition requires reconstruction of the original relations from the smaller relations and no information is lost

It was developed by Codd in 1972 preceding ER modeling and extended by others over the years

Example Normalization to DeptEmp

bull DeptEmp Relation

ENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5

E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7

E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5

E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5

E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7

E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8

E8 J Jones 10-11-72 SA 50000 null D1 Management E8

Normalization to DeptEmp

Emp Relation Dept Relation

dno dname mgrno

D1 Management E8

D2 consulting E7

D3 Accounting E5

eno ename bdate title salary superNo dno

E1 J Doe 01-05-75 EE 30000 E2 null

E2 M Smith 06-04-66 SA 50000 E5 D3

E3 A Lee 07-05-66 ME 40000 E7 D2

E4 J Miller 09-01-50 PR 20000 E6 D3

E5 B Casey 12-25-71 SA 50000 E8 D3

E6 L Chu 11-30-65 EE 30000 E7 D2

E7 R Davis 09-08-77 ME 40000 E8 D1

E8 J Jones 10-11-72 SA 50000 null D1

Dept Emp= Emp dno Dept

Overview to Normalization

Input a relation schema and constraints on attributes

Output one or more relation schemas

Processing Idea examining the relationships between attributes and organize the data according to its functional dependencies( 函数依赖)

Significance

1) Validate our design correct our improper design

2) Help for understanding the nature and the meaning of the data in the enterprise for learning more about the data semantics

When to use normalization

Normalization can be used after mapping ER model into relation model or independently

Normalization may be especially useful for databases that have already been designed without using formal techniques

Theory basic of normalization

Functional Dependencies ( 函数依赖) Functional dependencies represent integrity constraints ( 完整

性约束) on the values of attributes in a relation and are used in normalization

A functional dependency (FD) describes the relationship between attributes in a relation We say a set of attributes X functionally determines an attribute Y if given the values of X we always know the only possible value of Y

Notation X rarr Y X functionally determines Y Y is functionally dependent on XExample

eno rarr ename eno pno rarr Duration

Notation for Functional Dependencies

A functional dependency has a left-hand side called the determinant which is a set of attributes and one attribute on the right-hand side

eno pno rarr duration

determinant determined attribute

Strictly speaking there is always only one attribute on the RHS but we can combine several functional dependencies into one

eno pno rarr durationeno pno rarr respeno pno rarr duration resp

Remember that this is really two functional dependencies

The Semantics ofFunctional Dependencies

Similar to keys Functional dependencies are a property of a relation schema (intension) and not a property of a particular instance of the schema (extension)

Functional dependencies are directional

eno rarr ename ename rarr eno

ExampleFDs in EmpDept relation

bull EmpDept RelationENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8E8 J Jones 10-11-72 SA 50000 null D1 Management E8

Eno rarr ename bdate title salary superno dno

dnorarrdname mgrno

titlerarr salary

Trivial Functional Dependencies ( 平凡函数依赖 )

A functional dependency is trivialif the attributes on its left-hand side are a superset of the attributes on its right-hand side

Examples eno rarr eno

eno ename rarr eno

eno pno duration rarr eno Duration

Trivial functional dependencies are not interesting because they do not tell us anything

We are only interested in nontrivial FDs

Full Functional Dependencies

A functional dependency A rarr B is a full functional dependency if removal of any attribute from A results in the dependency not existing any more We say that B is fully functionally dependent on A If remove an attribute from A and the functional dependency still exists we say that B is partially dependent on A

Example

eno rarr ename (full FD)

eno ename rarr salary title (partial FD - can remove ename)

eno pno rarr hours resp (full FD)

How to Identify FDs in a relation

Examining the semantics of the attributes of relation

Identify all non-trivial functional dependencies in the Universal Relation

Universal (eno pno resp duration ename bdate title salary superNo dno dname mgrNo pname budget)

Eno rarr ename bdate title salary superno dnodnorarrdname mgrnotitlerarr salary

Formal Definition ofFunctional Dependencies

Given relation R defined over U = A1 A2 An where X U Y U If for all pairs of tuples t1 and t2 in any legal instance of relation schema R

t1[X] = t2[X] implies t1[Y] = t2[Y]

then the functional dependency X rarr Y holds in R

Functional Dependencies and Keys

Functional dependencies can be used to determine the candidate keys of a relation

What is superkey candidate key primary key

For example if an attribute functionally determines all other attributes in the relation that attribute can be a key

eno rarr eno ename bdate title supereno dno eno is a candidate key for the Emp relation

Functional Dependencies and Keys

ENo rarr EName BDate Title SuperNo DNoDNo rarr DName MgreNotitle rarr salary

Question List the candidate key(s) of this relation

ENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5

E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7

E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5

E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5

E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7

E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8

E8 J Jones 10-11-72 SA 50000 null D1 Management E8

EmpDept relation

Applications of FDs

1) given a set of FDs F in relation R FD X rarr Y is true or not

2) given a set of FDs F in relation R a set of attributes X is a candidate key of R or not

3) given a set of FDs F in relation R which candidate keys are in R

4) given a set of FDs F in relation R some FDs in F is Excessive or not

Rules for reasoning about functional dependencies

Armstrongs Axioms are three inference rules for functional dependenciesLet A B C and D be subsets of the attributes of relation R

IR1 Reflexive Rule( 自反性 ) If AB then A rarr B

A set of attributes always trivially functionally determines any subset of itself

IR2 Augmentation Rule( 增加性 ) If A rarr B then AC rarr BC

Adding a set of attributes to both the LHS and RHS results in another valid functional dependency

IR3 Transitive Rule( 传递性 ) If A rarr B and B rarr C then A rarr C

Transitivity holds with FDs

Derived Rules from Armstrongs Axioms

There are three other well-known rules that can be derived from the basic three

IR4 Decomposition( 可分解性 ) If A rarr BC then A rarr B and A rarr C A FD with two attributes on RHS can be decomposed into two FDs each with on attribute on the RHS

IR5 Union( 合并性 ) If A rarr B and A rarr C then A rarr BC Reverse of decomposition rule

IR6 Composition( 组合性 ) If A rarr B and C rarr D then AC rarr BD More general than union - can combine two non-overlapping FDs into one

Characteristics of Armstrongs Axioms

The rules are sound( 健壮的 ) and complete( 完备的 )

1) Sound The rules do not infer( 推断 ) any incorrect FDs

2) Complete The rules can be used to infer all correct FDs (no FDs are missing)

The axioms (called IR1 IR2 and IR3) can be used to determine a set of all FDs F+ from an initial set of FDs F

F+ is referred to as the closure of F

Strategy of dealing with questions

The basic idea is that given any set of attributes X we can compute the set of all attributes X+ that can be functionally determined using F This is called the closure of X under F

For question 1 given a set of FDs F in R FD X rarr Y is true or not we know that X rarr Y holds if Y is in X+

For question 2 a set of attributes X is a candidate key of R or not X is a superkey of R if X+ is all attributes of R

For question 3 how to computing F+ we can compute X+ for all possible subsets X of R

Computing the Attribute Closure

The algorithm is as follows

Given a set of attributes X

Let X+ = X

Repeat Find a FD in F whose left side is a subset of X+ Add the right side of F to X+

Until (X+ does not change or X+ contains all attributes of R)

After the algorithm completes you have a set of attributes X+ that can be functionally determined by X This allows you to produce FDs of the form

X rarr A where A is in X+

Computing Attribute Closure Example

R (ABCDEG)

F = A rarr BC C rarr D D rarr G

Compute A+

A+ = A (initial step)

A+ = ABC (by using FD A rarr BC)

A+ = ABCD (by using FD C rarr D)

A+ = ABCDG (by using FD D rarr G)

Similarly we can compute C+ and EG+

C+ = CDG

EG+ = EG

Using Attribute Closure to Detect a FD

The attribute closure algorithm can be used to determine if a particular FD X rarr Y holds based on the set of given FDs F Compute X+ and see if Y is in X+

Example

R = (ABCDEG)

F = A rarr BC C rarr D D rarr G

Does the FD CD rarr G hold Compute CD+ = CDG Yes the FD CD rarr G holds

Does the FD BC rarr E hold Compute BC+ = BCDG No the FD BC rarr E does n

ot hold

Using Attribute Closure to Find all FDs

The attribute closure algorithm can be used to find all FDs of a given relation R if perform the attribute closure algorithm for all subsets X of the attributes R

This is an exponential algorithm in the number of attributes in R Although there are a few optimizations Do not compute the closure of the empty set or of the set of all attributes If we find X+ = all attributes do not compute the closure of any supersets of X Note that even though you do not have to compute the closure (because you know the answer) you still must use the known closure to infer new FDs

Using Attribute Closureto Find all FDs Example

Compute F+ from F where

R (ABCD)

F = A rarr BC C rarr D

Must compute all these attribute closures

A+ B+ C+ D+

AB+ AC+ AD+ BC+ BD+ CD+

ABC+ ABD+ ACD+ BCD+

Do not have to compute ABCD+

Start with the single sets

A+ = ABCD B+ = B C+ = CD D+ = D

New FD A rarr D

Using Attribute Closureto Find all FDs Example (cont)

Next the two element sets Do not compute AB+ AC+ AD+ because A+ contained all attributes of R New FDs AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C BC+ = BCD BD+ = BD CD+ = CD New FD BC rarr DThree element sets Do not compute ABC+ ABD+ ACD+ New FDs ABC rarr D ABD rarr C ACD rarr B BCD+ = BCD No new FDsAll FDs F+ = A rarr BC C rarr D A rarr D BC rarr D AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C ABC rarr D ABD rarrC ACD rarr B

FD Closure Questions

1) Compute all nontrivial FDs and list all keys of R

R = (ABCD)

F = AB rarr C C rarr D D rarr A

2) Compute all nontrivial FDs and list all candidate keys of R

R = (ABCD)

F = AB rarr C BC rarr D CD rarr A AD rarr B

Equivalent Sets of FDs

A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+ We can determine if F covers E by calculating X+ with respect to F for each FD X rarr Y in E and checking whether X+ includes Y

Two sets of FDs E and F are equivalent if E covers F and F covers E That is E+= F+

Minimal Set of FDs

A set of FDs F is minimal if it satisfies these conditions Every FD in F has a single attribute on its right-hand side We cannot replace any FD X rarr A in F with Y rarr A where Y

X and still have a set of dependencies that is equivalent to F

We cannot remove any FD and still have a set that is equivalent to F

A minimal sets of FDs is like a canonical( 规范的 ) form of FDs

Note that there may be many minimal covers for a given set of functional dependencies F

Normal Forms

A relation is in a particular normal form if it satisfies certain

normalization properties

There are several normal forms defined 1NF - First Normal Form 2NF - Second Normal Form 3NF - Third Normal Form BCNF - Boyce-Codd Normal Form 4NF - Fourth Normal Form 5NF - Fifth Normal Form

Each of these normal forms are stricter than the next

For example 3NF is better than 2NF because it removes more redundancyanomalies from the schema than 2NF

Normal Forms

All relations (non-normalized)

1NF (First Normal Form)

2NF (Second Normal Form)

3NF (Third Normal Form)

BCNF (Boyce-Codd NF)

4NF (Fourth NF)

5NF (Fifth NF)

First Normal Form (1NF)

A relation is in first normal form (1NF) if all its attribute values are atomic

That is a 1NF relation cannot have an attribute value that is a set of values (multi-valued attribute) a set of tuples (nested relation)

1NF is a standard assumption in relational DBMSs However object-oriented DBMSs and nested relational DBMSs relax t

his constraint

A relation that is not in 1NF is an unnormalized relation

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1P2

AnalystAnalyst

246

E3 A Lee P3P4

ConsultantEngineer

1048

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1 P2 Analyst Analyst 24 6

E3 A Lee P3P4 Consultant Engineer 10 48

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

A non-1NF Relation

One way to convert a non-1NF relation to a 1NF relation

1) Splitting Method - Divide the existing relation into two relations one with non-repeating attributes and the other with repeating attributes

Make a new relation consisting of the primary key of the original relation and the repeating attributes Determine a primary key for this new relation

Remove the repeating attributes from the original relation

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1P2

AnalystAnalyst

246

E3 A Lee P3P4

ConsultantEngineer

1048

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

ENo EName

E1 J Doe

E2 M Smith

E3 A Lee

E4 J Miller

E5 B Casey

E6 L Chu

E7 R Davis

ENo PNo Resp DurationE1 P1 Manager 12E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36

Second Normal Form (2NF)

A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key

If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation

Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key

Second Normal Form (2NF) Example

fd1 and fd4 are partial functional dependencies Normalize to

Emp (eno ename title bdate salary supereno dno)

WorksOn (eno pno resp hours)

Proj (pno pname budget)

ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration

EmpProj relation

fd1 fd2

fd3fd4

Second Normal Form (2NF) Example (2)

ENo EName

BDate Title Salary SuperNo

DNo

Emp relation

fd1

fd2

fd3fd4

PNo PName Budget

Proj relationENo PNo Resp Duration

WorkOn relation

Third Normal Form (3NF)

Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th

an 2 FDs

A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key

Converting a relation to 3NF from 2NF involves the removal of transitive dependencies

Third Normal Form (3NF) Example

fd2 results in a transitive dependency eno rarr salary Remove it

ENo EName BDate Title Salary SuperNo DNo

Emp relation

fd1

fd2

fd2

Title Salary

Pay relation

ENo EName BDate Title SuperNo DNo

Emp relation

fd1

General Definitions of 2NF and 3NF

We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)

General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key

General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key

Note that a prime attribute is an attribute that is in any key (candidate or primary)

General Definition of 3NF Example

The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute

However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key

Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key

ENo EName BDate Title Salary SuperNo DNo SSN

Emp relation

fd1

fd2

Normalization Question

Consider the universal relation

R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ

List the keys for R

Decompose R into 2NF and then 3NF relations

Boyce-Codd Normal Form (BCNF)

A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key

To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key

The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

ClientInterview relation

Relation is in 3NF but not BCNF

fd4

fd3

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

Note that we lose the FD fd2

fd4

fd3

ClientNo InterviewDate

InterviwTime

StaffNo StaffNo InterviwDate

RoomNo

Interview relation StaffRoom relation

BCNF versus 3NF

We can decompose to BCNF but sometimes we do not want to if we lose a FD

The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency

Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)

In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)

Conversion to BCNF

There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y

Note that if any FD in F+ violates BCNF then a FD in F violates BCNF

2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey

3) Replace R by relations with schemas R1 = X+

R2 = (R ndash X+) U X

4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib

utes are all in R1 or R2

Normalization Question

Given this database schema normalize into BCNF

fd1

PID Country Lot Area Price TaxRate

fd2

fd4fd3

Multi-Valued Dependencies

A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema

A MVD occurs when two independent 1N relationships are in the relational schema

Multi-Valued Dependencies Example

Eno ename Title Salary Pno dno

E1 J Doe EE 30000 P1P2

E2 M Smith SA 50000 P1P4 D1D2D3

E3 A Lee ME 40000 P1P3 D2D3

E4 J Miller PR 20000 P3 D3

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1

E1 J Doe EE 30000 P2

E2 M Smith SA 50000 P1 D1

E2 M Smith SA 50000 P1 D2

E2 M Smith SA 50000 P1 D3

E2 M Smith SA 50000 P4 D1

E2 M Smith SA 50000 P4 D2

E2 M Smith SA 50000 P4 D3

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P3 D3

E3 A Lee ME 40000 P3 D3

E4 J Miller PR 20000 P3 D3

Multi-Valued Dependencies (MVDs)

A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C

A trivial MVD A mdashgtgt B occurs when either B is a subset of A or

A cup B = R

Fourth Normal Form (4NF)

Fourth normal form (4NF) is based on the idea of multi-valued dependencies

A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies

Fourth Normal Form (4NF) Example

eno mdashgtgtpno

eno mdashgtgtdno

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3

eno ename title salary

E1 J Doe EE 30000

E2 MSmith SA 50000

E3 A Lee ME 40000

E4 J Miller PR 20000

eno pno

E1 P1

E1 P2

E2 P1

E2 P4

E3 P1

E3 P3

E4 P3

eno dno

E2 D1

E2 D2

E2 D3

E3 D2

E3 D3

E4 D3

Lossless-join Dependency

The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation

A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined

Fifth Normal Form (5NF)

bull Fifth normal form (5NF) is based on join dependencies

A relation is in fifth normal form (5NF) if the relation has no join dependency

bull 5NF is also called project-join normal form (PJNF)

A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R

The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have

ΠR1(r) ΠR2(r) hellip ΠRn(r) = r

Fifth Normal Form (5NF) Example

Consider a relation Supply (sname partName projName)

1048715Add the additional constraint that

If project j requires part p

and supplier s supplies part p

and supplier s supplies at least one item to project j

Then supplier s also supplies part p to project j

Original Supply Relation Supply Relation with new constraintsname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

Fifth Normal Form (5NF) Example

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

sname partName

Smith Bolt

Smith Nut

Adam Bolt

Walton Nut

Adam Nail

sname projName

Smith X

Smith Y

Adam Y

Walton Z

Adam X

partName projName

Bolt X

Nut Y

Bolt Y

Nut Z

Nail X

Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples

Desirable Properties in normalization

Relational schemas that are well-designed have several

important properties

1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or

relationship

2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins

3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations

Normal Forms in Practice

Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used

For example query execution time may increase because of normalization as more joins become necessary to answer queries

Normal Forms in Practice Example

For example street and city uniquely determine a zipcode

In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed

When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly

fd1

ENo City Street ZipCode

fd2

ENo ZipCode

ZipCode City Street

Normal Forms in Practice Summary

In summary normalization is typically used in two ways

To improve on relational schemas after ER design

To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies

There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance

This is the classic time versus space tradeoff

Conclusion

Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies

Normal forms indicate when relations satisfy certain properties

1NF - All attributes are atomic

2NF - All attributes are fully functionally dependent on a key

3NF - There are no transitive dependencies in the relation

4NF - There are no multi-valued dependencies in the relation

5NF - The relation has no join dependency

In practice normalization is used to improve schemas produced after ER design and existing relational schemas

It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time

Objectives

Explain process of normalization and why it is beneficial

Describe the concept of the Universal Relation

What are the motivations for normalization

What are the 3 types of update anomalies Be able to give examples

Describe the lossless-join property and dependency preservation property

Define and explain the concept of a functional dependency

What is a trivial FD

Describe the relationship between FDs and keys

What are Armstrongs Axioms When are they used

What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs

Objectives (2)

Define minimal and covers for a set of FDs

Define normal forms be able to list the normal forms and draw a diagram showing their relationship

Define 1NF Know how to convert non-1NF relation to 1NF

Define 2NF Convert 1NF relation to 2NF

Define 3NF Convert 2NF relation to 3NF

Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example

Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm

Describe multi-valued dependencies When do they occur

What is a trivial MVD

Define 4NF Convert 3NFBCNF relation to 4NF

Objectives (3)

Describe lossless-join property and how it relates to 5NF

Define 5NF

Explain normalization in terms of the time versus space tradeoff

Explain the relationship between normalization and ER modeling

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
Page 7: Software School of Hunan University 2006.08 Database Systems Design Part III Normalization

Motivation ( 动机 ) of Normalization

When the design of database is improper it leads to the problems of redundancies( 冗余 ) and update anomalies( 更新异常 )

Redundancy occurs when the same data value is stored more than once in a relation Redundancy wastes space and reduces performa

nceUpdate anomalies are problems that arise when tr

ying to insert delete or update tuples and are often caused by redundancy

Motivation of Normalization

Dream GameDepartment Staff Card

Department No CR74 Department Name Engineering

Staff No E2 Name MSmith

Title Database Designer Salary 66000

Remark

Department Manager Yes No⊙

Supervisor E5

Example of improper relation

bull DeptEmp Relation

ENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5

E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7

E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5

E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5

E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7

E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8

E8 J Jones 10-11-72 SA 50000 null D1 Management E8

Insertion Anomaly Example

Consider these two types of insertion anomalies1) Insert a new employee E9 working in department D2

ndash You have to redundantly insert the department name and manager when adding this record

2) Insert a department D4 that has no current employeesndash This insertion is not possible without creating a dummy employee id a

nd record because ENo is the primary key of the relation

ENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5

E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7

E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5

E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5

E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7

E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8

E8 J Jones 10-11-72 SA 50000 null D1 Management E8

Deletion Anomaly Example

Consider this deletion anomaly- Delete employees E3 and E6 from the database

- Deleting those two employees removes them from the database and we now have lost information about department D2

ENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5

E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7

E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5

E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5

E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7

E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8

E8 J Jones 10-11-72 SA 50000 null D1 Management E8

Modification Anomaly Example

Consider these modification anomalies

1) Change the name of department D3 to ldquoEngineeringrdquo You must update the department name in 3 different records

2) Change the manager of D1 to E4 You must update the superno field in 2 different records

ENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5

E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7

E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5

E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5

E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7

E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8

E8 J Jones 10-11-72 SA 50000 null D1 Management E8

Update Anomalies

bull There are three major types of update anomalies Insertion Anomalies - Insertion of a tuple into the r

elation either requires insertion of redundant information or cannot be performed without setting key values to NULL

Deletion Anomalies - Deletion of a tuple may lose information that is still required to be stored

Modification Anomalies - Changing an attribute of a tuple may require changing multiple attribute values in other tuples

Overview to Normalization

Normalization is a validation technique for producing relations with desirable properties

Normalization is a bottom-up design technique for producing relations

Normalization decomposes relations into smaller relations that contain less redundancy This decomposition requires reconstruction of the original relations from the smaller relations and no information is lost

It was developed by Codd in 1972 preceding ER modeling and extended by others over the years

Example Normalization to DeptEmp

bull DeptEmp Relation

ENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5

E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7

E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5

E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5

E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7

E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8

E8 J Jones 10-11-72 SA 50000 null D1 Management E8

Normalization to DeptEmp

Emp Relation Dept Relation

dno dname mgrno

D1 Management E8

D2 consulting E7

D3 Accounting E5

eno ename bdate title salary superNo dno

E1 J Doe 01-05-75 EE 30000 E2 null

E2 M Smith 06-04-66 SA 50000 E5 D3

E3 A Lee 07-05-66 ME 40000 E7 D2

E4 J Miller 09-01-50 PR 20000 E6 D3

E5 B Casey 12-25-71 SA 50000 E8 D3

E6 L Chu 11-30-65 EE 30000 E7 D2

E7 R Davis 09-08-77 ME 40000 E8 D1

E8 J Jones 10-11-72 SA 50000 null D1

Dept Emp= Emp dno Dept

Overview to Normalization

Input a relation schema and constraints on attributes

Output one or more relation schemas

Processing Idea examining the relationships between attributes and organize the data according to its functional dependencies( 函数依赖)

Significance

1) Validate our design correct our improper design

2) Help for understanding the nature and the meaning of the data in the enterprise for learning more about the data semantics

When to use normalization

Normalization can be used after mapping ER model into relation model or independently

Normalization may be especially useful for databases that have already been designed without using formal techniques

Theory basic of normalization

Functional Dependencies ( 函数依赖) Functional dependencies represent integrity constraints ( 完整

性约束) on the values of attributes in a relation and are used in normalization

A functional dependency (FD) describes the relationship between attributes in a relation We say a set of attributes X functionally determines an attribute Y if given the values of X we always know the only possible value of Y

Notation X rarr Y X functionally determines Y Y is functionally dependent on XExample

eno rarr ename eno pno rarr Duration

Notation for Functional Dependencies

A functional dependency has a left-hand side called the determinant which is a set of attributes and one attribute on the right-hand side

eno pno rarr duration

determinant determined attribute

Strictly speaking there is always only one attribute on the RHS but we can combine several functional dependencies into one

eno pno rarr durationeno pno rarr respeno pno rarr duration resp

Remember that this is really two functional dependencies

The Semantics ofFunctional Dependencies

Similar to keys Functional dependencies are a property of a relation schema (intension) and not a property of a particular instance of the schema (extension)

Functional dependencies are directional

eno rarr ename ename rarr eno

ExampleFDs in EmpDept relation

bull EmpDept RelationENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8E8 J Jones 10-11-72 SA 50000 null D1 Management E8

Eno rarr ename bdate title salary superno dno

dnorarrdname mgrno

titlerarr salary

Trivial Functional Dependencies ( 平凡函数依赖 )

A functional dependency is trivialif the attributes on its left-hand side are a superset of the attributes on its right-hand side

Examples eno rarr eno

eno ename rarr eno

eno pno duration rarr eno Duration

Trivial functional dependencies are not interesting because they do not tell us anything

We are only interested in nontrivial FDs

Full Functional Dependencies

A functional dependency A rarr B is a full functional dependency if removal of any attribute from A results in the dependency not existing any more We say that B is fully functionally dependent on A If remove an attribute from A and the functional dependency still exists we say that B is partially dependent on A

Example

eno rarr ename (full FD)

eno ename rarr salary title (partial FD - can remove ename)

eno pno rarr hours resp (full FD)

How to Identify FDs in a relation

Examining the semantics of the attributes of relation

Identify all non-trivial functional dependencies in the Universal Relation

Universal (eno pno resp duration ename bdate title salary superNo dno dname mgrNo pname budget)

Eno rarr ename bdate title salary superno dnodnorarrdname mgrnotitlerarr salary

Formal Definition ofFunctional Dependencies

Given relation R defined over U = A1 A2 An where X U Y U If for all pairs of tuples t1 and t2 in any legal instance of relation schema R

t1[X] = t2[X] implies t1[Y] = t2[Y]

then the functional dependency X rarr Y holds in R

Functional Dependencies and Keys

Functional dependencies can be used to determine the candidate keys of a relation

What is superkey candidate key primary key

For example if an attribute functionally determines all other attributes in the relation that attribute can be a key

eno rarr eno ename bdate title supereno dno eno is a candidate key for the Emp relation

Functional Dependencies and Keys

ENo rarr EName BDate Title SuperNo DNoDNo rarr DName MgreNotitle rarr salary

Question List the candidate key(s) of this relation

ENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5

E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7

E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5

E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5

E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7

E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8

E8 J Jones 10-11-72 SA 50000 null D1 Management E8

EmpDept relation

Applications of FDs

1) given a set of FDs F in relation R FD X rarr Y is true or not

2) given a set of FDs F in relation R a set of attributes X is a candidate key of R or not

3) given a set of FDs F in relation R which candidate keys are in R

4) given a set of FDs F in relation R some FDs in F is Excessive or not

Rules for reasoning about functional dependencies

Armstrongs Axioms are three inference rules for functional dependenciesLet A B C and D be subsets of the attributes of relation R

IR1 Reflexive Rule( 自反性 ) If AB then A rarr B

A set of attributes always trivially functionally determines any subset of itself

IR2 Augmentation Rule( 增加性 ) If A rarr B then AC rarr BC

Adding a set of attributes to both the LHS and RHS results in another valid functional dependency

IR3 Transitive Rule( 传递性 ) If A rarr B and B rarr C then A rarr C

Transitivity holds with FDs

Derived Rules from Armstrongs Axioms

There are three other well-known rules that can be derived from the basic three

IR4 Decomposition( 可分解性 ) If A rarr BC then A rarr B and A rarr C A FD with two attributes on RHS can be decomposed into two FDs each with on attribute on the RHS

IR5 Union( 合并性 ) If A rarr B and A rarr C then A rarr BC Reverse of decomposition rule

IR6 Composition( 组合性 ) If A rarr B and C rarr D then AC rarr BD More general than union - can combine two non-overlapping FDs into one

Characteristics of Armstrongs Axioms

The rules are sound( 健壮的 ) and complete( 完备的 )

1) Sound The rules do not infer( 推断 ) any incorrect FDs

2) Complete The rules can be used to infer all correct FDs (no FDs are missing)

The axioms (called IR1 IR2 and IR3) can be used to determine a set of all FDs F+ from an initial set of FDs F

F+ is referred to as the closure of F

Strategy of dealing with questions

The basic idea is that given any set of attributes X we can compute the set of all attributes X+ that can be functionally determined using F This is called the closure of X under F

For question 1 given a set of FDs F in R FD X rarr Y is true or not we know that X rarr Y holds if Y is in X+

For question 2 a set of attributes X is a candidate key of R or not X is a superkey of R if X+ is all attributes of R

For question 3 how to computing F+ we can compute X+ for all possible subsets X of R

Computing the Attribute Closure

The algorithm is as follows

Given a set of attributes X

Let X+ = X

Repeat Find a FD in F whose left side is a subset of X+ Add the right side of F to X+

Until (X+ does not change or X+ contains all attributes of R)

After the algorithm completes you have a set of attributes X+ that can be functionally determined by X This allows you to produce FDs of the form

X rarr A where A is in X+

Computing Attribute Closure Example

R (ABCDEG)

F = A rarr BC C rarr D D rarr G

Compute A+

A+ = A (initial step)

A+ = ABC (by using FD A rarr BC)

A+ = ABCD (by using FD C rarr D)

A+ = ABCDG (by using FD D rarr G)

Similarly we can compute C+ and EG+

C+ = CDG

EG+ = EG

Using Attribute Closure to Detect a FD

The attribute closure algorithm can be used to determine if a particular FD X rarr Y holds based on the set of given FDs F Compute X+ and see if Y is in X+

Example

R = (ABCDEG)

F = A rarr BC C rarr D D rarr G

Does the FD CD rarr G hold Compute CD+ = CDG Yes the FD CD rarr G holds

Does the FD BC rarr E hold Compute BC+ = BCDG No the FD BC rarr E does n

ot hold

Using Attribute Closure to Find all FDs

The attribute closure algorithm can be used to find all FDs of a given relation R if perform the attribute closure algorithm for all subsets X of the attributes R

This is an exponential algorithm in the number of attributes in R Although there are a few optimizations Do not compute the closure of the empty set or of the set of all attributes If we find X+ = all attributes do not compute the closure of any supersets of X Note that even though you do not have to compute the closure (because you know the answer) you still must use the known closure to infer new FDs

Using Attribute Closureto Find all FDs Example

Compute F+ from F where

R (ABCD)

F = A rarr BC C rarr D

Must compute all these attribute closures

A+ B+ C+ D+

AB+ AC+ AD+ BC+ BD+ CD+

ABC+ ABD+ ACD+ BCD+

Do not have to compute ABCD+

Start with the single sets

A+ = ABCD B+ = B C+ = CD D+ = D

New FD A rarr D

Using Attribute Closureto Find all FDs Example (cont)

Next the two element sets Do not compute AB+ AC+ AD+ because A+ contained all attributes of R New FDs AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C BC+ = BCD BD+ = BD CD+ = CD New FD BC rarr DThree element sets Do not compute ABC+ ABD+ ACD+ New FDs ABC rarr D ABD rarr C ACD rarr B BCD+ = BCD No new FDsAll FDs F+ = A rarr BC C rarr D A rarr D BC rarr D AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C ABC rarr D ABD rarrC ACD rarr B

FD Closure Questions

1) Compute all nontrivial FDs and list all keys of R

R = (ABCD)

F = AB rarr C C rarr D D rarr A

2) Compute all nontrivial FDs and list all candidate keys of R

R = (ABCD)

F = AB rarr C BC rarr D CD rarr A AD rarr B

Equivalent Sets of FDs

A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+ We can determine if F covers E by calculating X+ with respect to F for each FD X rarr Y in E and checking whether X+ includes Y

Two sets of FDs E and F are equivalent if E covers F and F covers E That is E+= F+

Minimal Set of FDs

A set of FDs F is minimal if it satisfies these conditions Every FD in F has a single attribute on its right-hand side We cannot replace any FD X rarr A in F with Y rarr A where Y

X and still have a set of dependencies that is equivalent to F

We cannot remove any FD and still have a set that is equivalent to F

A minimal sets of FDs is like a canonical( 规范的 ) form of FDs

Note that there may be many minimal covers for a given set of functional dependencies F

Normal Forms

A relation is in a particular normal form if it satisfies certain

normalization properties

There are several normal forms defined 1NF - First Normal Form 2NF - Second Normal Form 3NF - Third Normal Form BCNF - Boyce-Codd Normal Form 4NF - Fourth Normal Form 5NF - Fifth Normal Form

Each of these normal forms are stricter than the next

For example 3NF is better than 2NF because it removes more redundancyanomalies from the schema than 2NF

Normal Forms

All relations (non-normalized)

1NF (First Normal Form)

2NF (Second Normal Form)

3NF (Third Normal Form)

BCNF (Boyce-Codd NF)

4NF (Fourth NF)

5NF (Fifth NF)

First Normal Form (1NF)

A relation is in first normal form (1NF) if all its attribute values are atomic

That is a 1NF relation cannot have an attribute value that is a set of values (multi-valued attribute) a set of tuples (nested relation)

1NF is a standard assumption in relational DBMSs However object-oriented DBMSs and nested relational DBMSs relax t

his constraint

A relation that is not in 1NF is an unnormalized relation

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1P2

AnalystAnalyst

246

E3 A Lee P3P4

ConsultantEngineer

1048

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1 P2 Analyst Analyst 24 6

E3 A Lee P3P4 Consultant Engineer 10 48

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

A non-1NF Relation

One way to convert a non-1NF relation to a 1NF relation

1) Splitting Method - Divide the existing relation into two relations one with non-repeating attributes and the other with repeating attributes

Make a new relation consisting of the primary key of the original relation and the repeating attributes Determine a primary key for this new relation

Remove the repeating attributes from the original relation

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1P2

AnalystAnalyst

246

E3 A Lee P3P4

ConsultantEngineer

1048

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

ENo EName

E1 J Doe

E2 M Smith

E3 A Lee

E4 J Miller

E5 B Casey

E6 L Chu

E7 R Davis

ENo PNo Resp DurationE1 P1 Manager 12E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36

Second Normal Form (2NF)

A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key

If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation

Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key

Second Normal Form (2NF) Example

fd1 and fd4 are partial functional dependencies Normalize to

Emp (eno ename title bdate salary supereno dno)

WorksOn (eno pno resp hours)

Proj (pno pname budget)

ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration

EmpProj relation

fd1 fd2

fd3fd4

Second Normal Form (2NF) Example (2)

ENo EName

BDate Title Salary SuperNo

DNo

Emp relation

fd1

fd2

fd3fd4

PNo PName Budget

Proj relationENo PNo Resp Duration

WorkOn relation

Third Normal Form (3NF)

Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th

an 2 FDs

A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key

Converting a relation to 3NF from 2NF involves the removal of transitive dependencies

Third Normal Form (3NF) Example

fd2 results in a transitive dependency eno rarr salary Remove it

ENo EName BDate Title Salary SuperNo DNo

Emp relation

fd1

fd2

fd2

Title Salary

Pay relation

ENo EName BDate Title SuperNo DNo

Emp relation

fd1

General Definitions of 2NF and 3NF

We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)

General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key

General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key

Note that a prime attribute is an attribute that is in any key (candidate or primary)

General Definition of 3NF Example

The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute

However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key

Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key

ENo EName BDate Title Salary SuperNo DNo SSN

Emp relation

fd1

fd2

Normalization Question

Consider the universal relation

R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ

List the keys for R

Decompose R into 2NF and then 3NF relations

Boyce-Codd Normal Form (BCNF)

A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key

To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key

The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

ClientInterview relation

Relation is in 3NF but not BCNF

fd4

fd3

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

Note that we lose the FD fd2

fd4

fd3

ClientNo InterviewDate

InterviwTime

StaffNo StaffNo InterviwDate

RoomNo

Interview relation StaffRoom relation

BCNF versus 3NF

We can decompose to BCNF but sometimes we do not want to if we lose a FD

The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency

Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)

In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)

Conversion to BCNF

There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y

Note that if any FD in F+ violates BCNF then a FD in F violates BCNF

2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey

3) Replace R by relations with schemas R1 = X+

R2 = (R ndash X+) U X

4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib

utes are all in R1 or R2

Normalization Question

Given this database schema normalize into BCNF

fd1

PID Country Lot Area Price TaxRate

fd2

fd4fd3

Multi-Valued Dependencies

A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema

A MVD occurs when two independent 1N relationships are in the relational schema

Multi-Valued Dependencies Example

Eno ename Title Salary Pno dno

E1 J Doe EE 30000 P1P2

E2 M Smith SA 50000 P1P4 D1D2D3

E3 A Lee ME 40000 P1P3 D2D3

E4 J Miller PR 20000 P3 D3

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1

E1 J Doe EE 30000 P2

E2 M Smith SA 50000 P1 D1

E2 M Smith SA 50000 P1 D2

E2 M Smith SA 50000 P1 D3

E2 M Smith SA 50000 P4 D1

E2 M Smith SA 50000 P4 D2

E2 M Smith SA 50000 P4 D3

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P3 D3

E3 A Lee ME 40000 P3 D3

E4 J Miller PR 20000 P3 D3

Multi-Valued Dependencies (MVDs)

A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C

A trivial MVD A mdashgtgt B occurs when either B is a subset of A or

A cup B = R

Fourth Normal Form (4NF)

Fourth normal form (4NF) is based on the idea of multi-valued dependencies

A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies

Fourth Normal Form (4NF) Example

eno mdashgtgtpno

eno mdashgtgtdno

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3

eno ename title salary

E1 J Doe EE 30000

E2 MSmith SA 50000

E3 A Lee ME 40000

E4 J Miller PR 20000

eno pno

E1 P1

E1 P2

E2 P1

E2 P4

E3 P1

E3 P3

E4 P3

eno dno

E2 D1

E2 D2

E2 D3

E3 D2

E3 D3

E4 D3

Lossless-join Dependency

The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation

A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined

Fifth Normal Form (5NF)

bull Fifth normal form (5NF) is based on join dependencies

A relation is in fifth normal form (5NF) if the relation has no join dependency

bull 5NF is also called project-join normal form (PJNF)

A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R

The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have

ΠR1(r) ΠR2(r) hellip ΠRn(r) = r

Fifth Normal Form (5NF) Example

Consider a relation Supply (sname partName projName)

1048715Add the additional constraint that

If project j requires part p

and supplier s supplies part p

and supplier s supplies at least one item to project j

Then supplier s also supplies part p to project j

Original Supply Relation Supply Relation with new constraintsname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

Fifth Normal Form (5NF) Example

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

sname partName

Smith Bolt

Smith Nut

Adam Bolt

Walton Nut

Adam Nail

sname projName

Smith X

Smith Y

Adam Y

Walton Z

Adam X

partName projName

Bolt X

Nut Y

Bolt Y

Nut Z

Nail X

Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples

Desirable Properties in normalization

Relational schemas that are well-designed have several

important properties

1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or

relationship

2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins

3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations

Normal Forms in Practice

Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used

For example query execution time may increase because of normalization as more joins become necessary to answer queries

Normal Forms in Practice Example

For example street and city uniquely determine a zipcode

In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed

When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly

fd1

ENo City Street ZipCode

fd2

ENo ZipCode

ZipCode City Street

Normal Forms in Practice Summary

In summary normalization is typically used in two ways

To improve on relational schemas after ER design

To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies

There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance

This is the classic time versus space tradeoff

Conclusion

Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies

Normal forms indicate when relations satisfy certain properties

1NF - All attributes are atomic

2NF - All attributes are fully functionally dependent on a key

3NF - There are no transitive dependencies in the relation

4NF - There are no multi-valued dependencies in the relation

5NF - The relation has no join dependency

In practice normalization is used to improve schemas produced after ER design and existing relational schemas

It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time

Objectives

Explain process of normalization and why it is beneficial

Describe the concept of the Universal Relation

What are the motivations for normalization

What are the 3 types of update anomalies Be able to give examples

Describe the lossless-join property and dependency preservation property

Define and explain the concept of a functional dependency

What is a trivial FD

Describe the relationship between FDs and keys

What are Armstrongs Axioms When are they used

What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs

Objectives (2)

Define minimal and covers for a set of FDs

Define normal forms be able to list the normal forms and draw a diagram showing their relationship

Define 1NF Know how to convert non-1NF relation to 1NF

Define 2NF Convert 1NF relation to 2NF

Define 3NF Convert 2NF relation to 3NF

Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example

Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm

Describe multi-valued dependencies When do they occur

What is a trivial MVD

Define 4NF Convert 3NFBCNF relation to 4NF

Objectives (3)

Describe lossless-join property and how it relates to 5NF

Define 5NF

Explain normalization in terms of the time versus space tradeoff

Explain the relationship between normalization and ER modeling

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
Page 8: Software School of Hunan University 2006.08 Database Systems Design Part III Normalization

Motivation of Normalization

Dream GameDepartment Staff Card

Department No CR74 Department Name Engineering

Staff No E2 Name MSmith

Title Database Designer Salary 66000

Remark

Department Manager Yes No⊙

Supervisor E5

Example of improper relation

bull DeptEmp Relation

ENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5

E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7

E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5

E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5

E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7

E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8

E8 J Jones 10-11-72 SA 50000 null D1 Management E8

Insertion Anomaly Example

Consider these two types of insertion anomalies1) Insert a new employee E9 working in department D2

ndash You have to redundantly insert the department name and manager when adding this record

2) Insert a department D4 that has no current employeesndash This insertion is not possible without creating a dummy employee id a

nd record because ENo is the primary key of the relation

ENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5

E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7

E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5

E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5

E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7

E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8

E8 J Jones 10-11-72 SA 50000 null D1 Management E8

Deletion Anomaly Example

Consider this deletion anomaly- Delete employees E3 and E6 from the database

- Deleting those two employees removes them from the database and we now have lost information about department D2

ENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5

E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7

E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5

E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5

E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7

E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8

E8 J Jones 10-11-72 SA 50000 null D1 Management E8

Modification Anomaly Example

Consider these modification anomalies

1) Change the name of department D3 to ldquoEngineeringrdquo You must update the department name in 3 different records

2) Change the manager of D1 to E4 You must update the superno field in 2 different records

ENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5

E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7

E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5

E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5

E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7

E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8

E8 J Jones 10-11-72 SA 50000 null D1 Management E8

Update Anomalies

bull There are three major types of update anomalies Insertion Anomalies - Insertion of a tuple into the r

elation either requires insertion of redundant information or cannot be performed without setting key values to NULL

Deletion Anomalies - Deletion of a tuple may lose information that is still required to be stored

Modification Anomalies - Changing an attribute of a tuple may require changing multiple attribute values in other tuples

Overview to Normalization

Normalization is a validation technique for producing relations with desirable properties

Normalization is a bottom-up design technique for producing relations

Normalization decomposes relations into smaller relations that contain less redundancy This decomposition requires reconstruction of the original relations from the smaller relations and no information is lost

It was developed by Codd in 1972 preceding ER modeling and extended by others over the years

Example Normalization to DeptEmp

bull DeptEmp Relation

ENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5

E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7

E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5

E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5

E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7

E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8

E8 J Jones 10-11-72 SA 50000 null D1 Management E8

Normalization to DeptEmp

Emp Relation Dept Relation

dno dname mgrno

D1 Management E8

D2 consulting E7

D3 Accounting E5

eno ename bdate title salary superNo dno

E1 J Doe 01-05-75 EE 30000 E2 null

E2 M Smith 06-04-66 SA 50000 E5 D3

E3 A Lee 07-05-66 ME 40000 E7 D2

E4 J Miller 09-01-50 PR 20000 E6 D3

E5 B Casey 12-25-71 SA 50000 E8 D3

E6 L Chu 11-30-65 EE 30000 E7 D2

E7 R Davis 09-08-77 ME 40000 E8 D1

E8 J Jones 10-11-72 SA 50000 null D1

Dept Emp= Emp dno Dept

Overview to Normalization

Input a relation schema and constraints on attributes

Output one or more relation schemas

Processing Idea examining the relationships between attributes and organize the data according to its functional dependencies( 函数依赖)

Significance

1) Validate our design correct our improper design

2) Help for understanding the nature and the meaning of the data in the enterprise for learning more about the data semantics

When to use normalization

Normalization can be used after mapping ER model into relation model or independently

Normalization may be especially useful for databases that have already been designed without using formal techniques

Theory basic of normalization

Functional Dependencies ( 函数依赖) Functional dependencies represent integrity constraints ( 完整

性约束) on the values of attributes in a relation and are used in normalization

A functional dependency (FD) describes the relationship between attributes in a relation We say a set of attributes X functionally determines an attribute Y if given the values of X we always know the only possible value of Y

Notation X rarr Y X functionally determines Y Y is functionally dependent on XExample

eno rarr ename eno pno rarr Duration

Notation for Functional Dependencies

A functional dependency has a left-hand side called the determinant which is a set of attributes and one attribute on the right-hand side

eno pno rarr duration

determinant determined attribute

Strictly speaking there is always only one attribute on the RHS but we can combine several functional dependencies into one

eno pno rarr durationeno pno rarr respeno pno rarr duration resp

Remember that this is really two functional dependencies

The Semantics ofFunctional Dependencies

Similar to keys Functional dependencies are a property of a relation schema (intension) and not a property of a particular instance of the schema (extension)

Functional dependencies are directional

eno rarr ename ename rarr eno

ExampleFDs in EmpDept relation

bull EmpDept RelationENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8E8 J Jones 10-11-72 SA 50000 null D1 Management E8

Eno rarr ename bdate title salary superno dno

dnorarrdname mgrno

titlerarr salary

Trivial Functional Dependencies ( 平凡函数依赖 )

A functional dependency is trivialif the attributes on its left-hand side are a superset of the attributes on its right-hand side

Examples eno rarr eno

eno ename rarr eno

eno pno duration rarr eno Duration

Trivial functional dependencies are not interesting because they do not tell us anything

We are only interested in nontrivial FDs

Full Functional Dependencies

A functional dependency A rarr B is a full functional dependency if removal of any attribute from A results in the dependency not existing any more We say that B is fully functionally dependent on A If remove an attribute from A and the functional dependency still exists we say that B is partially dependent on A

Example

eno rarr ename (full FD)

eno ename rarr salary title (partial FD - can remove ename)

eno pno rarr hours resp (full FD)

How to Identify FDs in a relation

Examining the semantics of the attributes of relation

Identify all non-trivial functional dependencies in the Universal Relation

Universal (eno pno resp duration ename bdate title salary superNo dno dname mgrNo pname budget)

Eno rarr ename bdate title salary superno dnodnorarrdname mgrnotitlerarr salary

Formal Definition ofFunctional Dependencies

Given relation R defined over U = A1 A2 An where X U Y U If for all pairs of tuples t1 and t2 in any legal instance of relation schema R

t1[X] = t2[X] implies t1[Y] = t2[Y]

then the functional dependency X rarr Y holds in R

Functional Dependencies and Keys

Functional dependencies can be used to determine the candidate keys of a relation

What is superkey candidate key primary key

For example if an attribute functionally determines all other attributes in the relation that attribute can be a key

eno rarr eno ename bdate title supereno dno eno is a candidate key for the Emp relation

Functional Dependencies and Keys

ENo rarr EName BDate Title SuperNo DNoDNo rarr DName MgreNotitle rarr salary

Question List the candidate key(s) of this relation

ENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5

E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7

E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5

E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5

E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7

E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8

E8 J Jones 10-11-72 SA 50000 null D1 Management E8

EmpDept relation

Applications of FDs

1) given a set of FDs F in relation R FD X rarr Y is true or not

2) given a set of FDs F in relation R a set of attributes X is a candidate key of R or not

3) given a set of FDs F in relation R which candidate keys are in R

4) given a set of FDs F in relation R some FDs in F is Excessive or not

Rules for reasoning about functional dependencies

Armstrongs Axioms are three inference rules for functional dependenciesLet A B C and D be subsets of the attributes of relation R

IR1 Reflexive Rule( 自反性 ) If AB then A rarr B

A set of attributes always trivially functionally determines any subset of itself

IR2 Augmentation Rule( 增加性 ) If A rarr B then AC rarr BC

Adding a set of attributes to both the LHS and RHS results in another valid functional dependency

IR3 Transitive Rule( 传递性 ) If A rarr B and B rarr C then A rarr C

Transitivity holds with FDs

Derived Rules from Armstrongs Axioms

There are three other well-known rules that can be derived from the basic three

IR4 Decomposition( 可分解性 ) If A rarr BC then A rarr B and A rarr C A FD with two attributes on RHS can be decomposed into two FDs each with on attribute on the RHS

IR5 Union( 合并性 ) If A rarr B and A rarr C then A rarr BC Reverse of decomposition rule

IR6 Composition( 组合性 ) If A rarr B and C rarr D then AC rarr BD More general than union - can combine two non-overlapping FDs into one

Characteristics of Armstrongs Axioms

The rules are sound( 健壮的 ) and complete( 完备的 )

1) Sound The rules do not infer( 推断 ) any incorrect FDs

2) Complete The rules can be used to infer all correct FDs (no FDs are missing)

The axioms (called IR1 IR2 and IR3) can be used to determine a set of all FDs F+ from an initial set of FDs F

F+ is referred to as the closure of F

Strategy of dealing with questions

The basic idea is that given any set of attributes X we can compute the set of all attributes X+ that can be functionally determined using F This is called the closure of X under F

For question 1 given a set of FDs F in R FD X rarr Y is true or not we know that X rarr Y holds if Y is in X+

For question 2 a set of attributes X is a candidate key of R or not X is a superkey of R if X+ is all attributes of R

For question 3 how to computing F+ we can compute X+ for all possible subsets X of R

Computing the Attribute Closure

The algorithm is as follows

Given a set of attributes X

Let X+ = X

Repeat Find a FD in F whose left side is a subset of X+ Add the right side of F to X+

Until (X+ does not change or X+ contains all attributes of R)

After the algorithm completes you have a set of attributes X+ that can be functionally determined by X This allows you to produce FDs of the form

X rarr A where A is in X+

Computing Attribute Closure Example

R (ABCDEG)

F = A rarr BC C rarr D D rarr G

Compute A+

A+ = A (initial step)

A+ = ABC (by using FD A rarr BC)

A+ = ABCD (by using FD C rarr D)

A+ = ABCDG (by using FD D rarr G)

Similarly we can compute C+ and EG+

C+ = CDG

EG+ = EG

Using Attribute Closure to Detect a FD

The attribute closure algorithm can be used to determine if a particular FD X rarr Y holds based on the set of given FDs F Compute X+ and see if Y is in X+

Example

R = (ABCDEG)

F = A rarr BC C rarr D D rarr G

Does the FD CD rarr G hold Compute CD+ = CDG Yes the FD CD rarr G holds

Does the FD BC rarr E hold Compute BC+ = BCDG No the FD BC rarr E does n

ot hold

Using Attribute Closure to Find all FDs

The attribute closure algorithm can be used to find all FDs of a given relation R if perform the attribute closure algorithm for all subsets X of the attributes R

This is an exponential algorithm in the number of attributes in R Although there are a few optimizations Do not compute the closure of the empty set or of the set of all attributes If we find X+ = all attributes do not compute the closure of any supersets of X Note that even though you do not have to compute the closure (because you know the answer) you still must use the known closure to infer new FDs

Using Attribute Closureto Find all FDs Example

Compute F+ from F where

R (ABCD)

F = A rarr BC C rarr D

Must compute all these attribute closures

A+ B+ C+ D+

AB+ AC+ AD+ BC+ BD+ CD+

ABC+ ABD+ ACD+ BCD+

Do not have to compute ABCD+

Start with the single sets

A+ = ABCD B+ = B C+ = CD D+ = D

New FD A rarr D

Using Attribute Closureto Find all FDs Example (cont)

Next the two element sets Do not compute AB+ AC+ AD+ because A+ contained all attributes of R New FDs AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C BC+ = BCD BD+ = BD CD+ = CD New FD BC rarr DThree element sets Do not compute ABC+ ABD+ ACD+ New FDs ABC rarr D ABD rarr C ACD rarr B BCD+ = BCD No new FDsAll FDs F+ = A rarr BC C rarr D A rarr D BC rarr D AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C ABC rarr D ABD rarrC ACD rarr B

FD Closure Questions

1) Compute all nontrivial FDs and list all keys of R

R = (ABCD)

F = AB rarr C C rarr D D rarr A

2) Compute all nontrivial FDs and list all candidate keys of R

R = (ABCD)

F = AB rarr C BC rarr D CD rarr A AD rarr B

Equivalent Sets of FDs

A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+ We can determine if F covers E by calculating X+ with respect to F for each FD X rarr Y in E and checking whether X+ includes Y

Two sets of FDs E and F are equivalent if E covers F and F covers E That is E+= F+

Minimal Set of FDs

A set of FDs F is minimal if it satisfies these conditions Every FD in F has a single attribute on its right-hand side We cannot replace any FD X rarr A in F with Y rarr A where Y

X and still have a set of dependencies that is equivalent to F

We cannot remove any FD and still have a set that is equivalent to F

A minimal sets of FDs is like a canonical( 规范的 ) form of FDs

Note that there may be many minimal covers for a given set of functional dependencies F

Normal Forms

A relation is in a particular normal form if it satisfies certain

normalization properties

There are several normal forms defined 1NF - First Normal Form 2NF - Second Normal Form 3NF - Third Normal Form BCNF - Boyce-Codd Normal Form 4NF - Fourth Normal Form 5NF - Fifth Normal Form

Each of these normal forms are stricter than the next

For example 3NF is better than 2NF because it removes more redundancyanomalies from the schema than 2NF

Normal Forms

All relations (non-normalized)

1NF (First Normal Form)

2NF (Second Normal Form)

3NF (Third Normal Form)

BCNF (Boyce-Codd NF)

4NF (Fourth NF)

5NF (Fifth NF)

First Normal Form (1NF)

A relation is in first normal form (1NF) if all its attribute values are atomic

That is a 1NF relation cannot have an attribute value that is a set of values (multi-valued attribute) a set of tuples (nested relation)

1NF is a standard assumption in relational DBMSs However object-oriented DBMSs and nested relational DBMSs relax t

his constraint

A relation that is not in 1NF is an unnormalized relation

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1P2

AnalystAnalyst

246

E3 A Lee P3P4

ConsultantEngineer

1048

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1 P2 Analyst Analyst 24 6

E3 A Lee P3P4 Consultant Engineer 10 48

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

A non-1NF Relation

One way to convert a non-1NF relation to a 1NF relation

1) Splitting Method - Divide the existing relation into two relations one with non-repeating attributes and the other with repeating attributes

Make a new relation consisting of the primary key of the original relation and the repeating attributes Determine a primary key for this new relation

Remove the repeating attributes from the original relation

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1P2

AnalystAnalyst

246

E3 A Lee P3P4

ConsultantEngineer

1048

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

ENo EName

E1 J Doe

E2 M Smith

E3 A Lee

E4 J Miller

E5 B Casey

E6 L Chu

E7 R Davis

ENo PNo Resp DurationE1 P1 Manager 12E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36

Second Normal Form (2NF)

A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key

If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation

Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key

Second Normal Form (2NF) Example

fd1 and fd4 are partial functional dependencies Normalize to

Emp (eno ename title bdate salary supereno dno)

WorksOn (eno pno resp hours)

Proj (pno pname budget)

ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration

EmpProj relation

fd1 fd2

fd3fd4

Second Normal Form (2NF) Example (2)

ENo EName

BDate Title Salary SuperNo

DNo

Emp relation

fd1

fd2

fd3fd4

PNo PName Budget

Proj relationENo PNo Resp Duration

WorkOn relation

Third Normal Form (3NF)

Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th

an 2 FDs

A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key

Converting a relation to 3NF from 2NF involves the removal of transitive dependencies

Third Normal Form (3NF) Example

fd2 results in a transitive dependency eno rarr salary Remove it

ENo EName BDate Title Salary SuperNo DNo

Emp relation

fd1

fd2

fd2

Title Salary

Pay relation

ENo EName BDate Title SuperNo DNo

Emp relation

fd1

General Definitions of 2NF and 3NF

We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)

General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key

General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key

Note that a prime attribute is an attribute that is in any key (candidate or primary)

General Definition of 3NF Example

The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute

However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key

Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key

ENo EName BDate Title Salary SuperNo DNo SSN

Emp relation

fd1

fd2

Normalization Question

Consider the universal relation

R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ

List the keys for R

Decompose R into 2NF and then 3NF relations

Boyce-Codd Normal Form (BCNF)

A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key

To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key

The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

ClientInterview relation

Relation is in 3NF but not BCNF

fd4

fd3

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

Note that we lose the FD fd2

fd4

fd3

ClientNo InterviewDate

InterviwTime

StaffNo StaffNo InterviwDate

RoomNo

Interview relation StaffRoom relation

BCNF versus 3NF

We can decompose to BCNF but sometimes we do not want to if we lose a FD

The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency

Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)

In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)

Conversion to BCNF

There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y

Note that if any FD in F+ violates BCNF then a FD in F violates BCNF

2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey

3) Replace R by relations with schemas R1 = X+

R2 = (R ndash X+) U X

4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib

utes are all in R1 or R2

Normalization Question

Given this database schema normalize into BCNF

fd1

PID Country Lot Area Price TaxRate

fd2

fd4fd3

Multi-Valued Dependencies

A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema

A MVD occurs when two independent 1N relationships are in the relational schema

Multi-Valued Dependencies Example

Eno ename Title Salary Pno dno

E1 J Doe EE 30000 P1P2

E2 M Smith SA 50000 P1P4 D1D2D3

E3 A Lee ME 40000 P1P3 D2D3

E4 J Miller PR 20000 P3 D3

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1

E1 J Doe EE 30000 P2

E2 M Smith SA 50000 P1 D1

E2 M Smith SA 50000 P1 D2

E2 M Smith SA 50000 P1 D3

E2 M Smith SA 50000 P4 D1

E2 M Smith SA 50000 P4 D2

E2 M Smith SA 50000 P4 D3

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P3 D3

E3 A Lee ME 40000 P3 D3

E4 J Miller PR 20000 P3 D3

Multi-Valued Dependencies (MVDs)

A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C

A trivial MVD A mdashgtgt B occurs when either B is a subset of A or

A cup B = R

Fourth Normal Form (4NF)

Fourth normal form (4NF) is based on the idea of multi-valued dependencies

A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies

Fourth Normal Form (4NF) Example

eno mdashgtgtpno

eno mdashgtgtdno

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3

eno ename title salary

E1 J Doe EE 30000

E2 MSmith SA 50000

E3 A Lee ME 40000

E4 J Miller PR 20000

eno pno

E1 P1

E1 P2

E2 P1

E2 P4

E3 P1

E3 P3

E4 P3

eno dno

E2 D1

E2 D2

E2 D3

E3 D2

E3 D3

E4 D3

Lossless-join Dependency

The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation

A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined

Fifth Normal Form (5NF)

bull Fifth normal form (5NF) is based on join dependencies

A relation is in fifth normal form (5NF) if the relation has no join dependency

bull 5NF is also called project-join normal form (PJNF)

A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R

The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have

ΠR1(r) ΠR2(r) hellip ΠRn(r) = r

Fifth Normal Form (5NF) Example

Consider a relation Supply (sname partName projName)

1048715Add the additional constraint that

If project j requires part p

and supplier s supplies part p

and supplier s supplies at least one item to project j

Then supplier s also supplies part p to project j

Original Supply Relation Supply Relation with new constraintsname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

Fifth Normal Form (5NF) Example

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

sname partName

Smith Bolt

Smith Nut

Adam Bolt

Walton Nut

Adam Nail

sname projName

Smith X

Smith Y

Adam Y

Walton Z

Adam X

partName projName

Bolt X

Nut Y

Bolt Y

Nut Z

Nail X

Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples

Desirable Properties in normalization

Relational schemas that are well-designed have several

important properties

1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or

relationship

2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins

3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations

Normal Forms in Practice

Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used

For example query execution time may increase because of normalization as more joins become necessary to answer queries

Normal Forms in Practice Example

For example street and city uniquely determine a zipcode

In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed

When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly

fd1

ENo City Street ZipCode

fd2

ENo ZipCode

ZipCode City Street

Normal Forms in Practice Summary

In summary normalization is typically used in two ways

To improve on relational schemas after ER design

To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies

There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance

This is the classic time versus space tradeoff

Conclusion

Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies

Normal forms indicate when relations satisfy certain properties

1NF - All attributes are atomic

2NF - All attributes are fully functionally dependent on a key

3NF - There are no transitive dependencies in the relation

4NF - There are no multi-valued dependencies in the relation

5NF - The relation has no join dependency

In practice normalization is used to improve schemas produced after ER design and existing relational schemas

It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time

Objectives

Explain process of normalization and why it is beneficial

Describe the concept of the Universal Relation

What are the motivations for normalization

What are the 3 types of update anomalies Be able to give examples

Describe the lossless-join property and dependency preservation property

Define and explain the concept of a functional dependency

What is a trivial FD

Describe the relationship between FDs and keys

What are Armstrongs Axioms When are they used

What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs

Objectives (2)

Define minimal and covers for a set of FDs

Define normal forms be able to list the normal forms and draw a diagram showing their relationship

Define 1NF Know how to convert non-1NF relation to 1NF

Define 2NF Convert 1NF relation to 2NF

Define 3NF Convert 2NF relation to 3NF

Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example

Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm

Describe multi-valued dependencies When do they occur

What is a trivial MVD

Define 4NF Convert 3NFBCNF relation to 4NF

Objectives (3)

Describe lossless-join property and how it relates to 5NF

Define 5NF

Explain normalization in terms of the time versus space tradeoff

Explain the relationship between normalization and ER modeling

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
Page 9: Software School of Hunan University 2006.08 Database Systems Design Part III Normalization

Example of improper relation

bull DeptEmp Relation

ENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5

E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7

E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5

E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5

E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7

E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8

E8 J Jones 10-11-72 SA 50000 null D1 Management E8

Insertion Anomaly Example

Consider these two types of insertion anomalies1) Insert a new employee E9 working in department D2

ndash You have to redundantly insert the department name and manager when adding this record

2) Insert a department D4 that has no current employeesndash This insertion is not possible without creating a dummy employee id a

nd record because ENo is the primary key of the relation

ENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5

E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7

E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5

E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5

E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7

E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8

E8 J Jones 10-11-72 SA 50000 null D1 Management E8

Deletion Anomaly Example

Consider this deletion anomaly- Delete employees E3 and E6 from the database

- Deleting those two employees removes them from the database and we now have lost information about department D2

ENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5

E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7

E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5

E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5

E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7

E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8

E8 J Jones 10-11-72 SA 50000 null D1 Management E8

Modification Anomaly Example

Consider these modification anomalies

1) Change the name of department D3 to ldquoEngineeringrdquo You must update the department name in 3 different records

2) Change the manager of D1 to E4 You must update the superno field in 2 different records

ENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5

E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7

E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5

E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5

E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7

E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8

E8 J Jones 10-11-72 SA 50000 null D1 Management E8

Update Anomalies

bull There are three major types of update anomalies Insertion Anomalies - Insertion of a tuple into the r

elation either requires insertion of redundant information or cannot be performed without setting key values to NULL

Deletion Anomalies - Deletion of a tuple may lose information that is still required to be stored

Modification Anomalies - Changing an attribute of a tuple may require changing multiple attribute values in other tuples

Overview to Normalization

Normalization is a validation technique for producing relations with desirable properties

Normalization is a bottom-up design technique for producing relations

Normalization decomposes relations into smaller relations that contain less redundancy This decomposition requires reconstruction of the original relations from the smaller relations and no information is lost

It was developed by Codd in 1972 preceding ER modeling and extended by others over the years

Example Normalization to DeptEmp

bull DeptEmp Relation

ENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5

E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7

E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5

E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5

E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7

E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8

E8 J Jones 10-11-72 SA 50000 null D1 Management E8

Normalization to DeptEmp

Emp Relation Dept Relation

dno dname mgrno

D1 Management E8

D2 consulting E7

D3 Accounting E5

eno ename bdate title salary superNo dno

E1 J Doe 01-05-75 EE 30000 E2 null

E2 M Smith 06-04-66 SA 50000 E5 D3

E3 A Lee 07-05-66 ME 40000 E7 D2

E4 J Miller 09-01-50 PR 20000 E6 D3

E5 B Casey 12-25-71 SA 50000 E8 D3

E6 L Chu 11-30-65 EE 30000 E7 D2

E7 R Davis 09-08-77 ME 40000 E8 D1

E8 J Jones 10-11-72 SA 50000 null D1

Dept Emp= Emp dno Dept

Overview to Normalization

Input a relation schema and constraints on attributes

Output one or more relation schemas

Processing Idea examining the relationships between attributes and organize the data according to its functional dependencies( 函数依赖)

Significance

1) Validate our design correct our improper design

2) Help for understanding the nature and the meaning of the data in the enterprise for learning more about the data semantics

When to use normalization

Normalization can be used after mapping ER model into relation model or independently

Normalization may be especially useful for databases that have already been designed without using formal techniques

Theory basic of normalization

Functional Dependencies ( 函数依赖) Functional dependencies represent integrity constraints ( 完整

性约束) on the values of attributes in a relation and are used in normalization

A functional dependency (FD) describes the relationship between attributes in a relation We say a set of attributes X functionally determines an attribute Y if given the values of X we always know the only possible value of Y

Notation X rarr Y X functionally determines Y Y is functionally dependent on XExample

eno rarr ename eno pno rarr Duration

Notation for Functional Dependencies

A functional dependency has a left-hand side called the determinant which is a set of attributes and one attribute on the right-hand side

eno pno rarr duration

determinant determined attribute

Strictly speaking there is always only one attribute on the RHS but we can combine several functional dependencies into one

eno pno rarr durationeno pno rarr respeno pno rarr duration resp

Remember that this is really two functional dependencies

The Semantics ofFunctional Dependencies

Similar to keys Functional dependencies are a property of a relation schema (intension) and not a property of a particular instance of the schema (extension)

Functional dependencies are directional

eno rarr ename ename rarr eno

ExampleFDs in EmpDept relation

bull EmpDept RelationENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8E8 J Jones 10-11-72 SA 50000 null D1 Management E8

Eno rarr ename bdate title salary superno dno

dnorarrdname mgrno

titlerarr salary

Trivial Functional Dependencies ( 平凡函数依赖 )

A functional dependency is trivialif the attributes on its left-hand side are a superset of the attributes on its right-hand side

Examples eno rarr eno

eno ename rarr eno

eno pno duration rarr eno Duration

Trivial functional dependencies are not interesting because they do not tell us anything

We are only interested in nontrivial FDs

Full Functional Dependencies

A functional dependency A rarr B is a full functional dependency if removal of any attribute from A results in the dependency not existing any more We say that B is fully functionally dependent on A If remove an attribute from A and the functional dependency still exists we say that B is partially dependent on A

Example

eno rarr ename (full FD)

eno ename rarr salary title (partial FD - can remove ename)

eno pno rarr hours resp (full FD)

How to Identify FDs in a relation

Examining the semantics of the attributes of relation

Identify all non-trivial functional dependencies in the Universal Relation

Universal (eno pno resp duration ename bdate title salary superNo dno dname mgrNo pname budget)

Eno rarr ename bdate title salary superno dnodnorarrdname mgrnotitlerarr salary

Formal Definition ofFunctional Dependencies

Given relation R defined over U = A1 A2 An where X U Y U If for all pairs of tuples t1 and t2 in any legal instance of relation schema R

t1[X] = t2[X] implies t1[Y] = t2[Y]

then the functional dependency X rarr Y holds in R

Functional Dependencies and Keys

Functional dependencies can be used to determine the candidate keys of a relation

What is superkey candidate key primary key

For example if an attribute functionally determines all other attributes in the relation that attribute can be a key

eno rarr eno ename bdate title supereno dno eno is a candidate key for the Emp relation

Functional Dependencies and Keys

ENo rarr EName BDate Title SuperNo DNoDNo rarr DName MgreNotitle rarr salary

Question List the candidate key(s) of this relation

ENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5

E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7

E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5

E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5

E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7

E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8

E8 J Jones 10-11-72 SA 50000 null D1 Management E8

EmpDept relation

Applications of FDs

1) given a set of FDs F in relation R FD X rarr Y is true or not

2) given a set of FDs F in relation R a set of attributes X is a candidate key of R or not

3) given a set of FDs F in relation R which candidate keys are in R

4) given a set of FDs F in relation R some FDs in F is Excessive or not

Rules for reasoning about functional dependencies

Armstrongs Axioms are three inference rules for functional dependenciesLet A B C and D be subsets of the attributes of relation R

IR1 Reflexive Rule( 自反性 ) If AB then A rarr B

A set of attributes always trivially functionally determines any subset of itself

IR2 Augmentation Rule( 增加性 ) If A rarr B then AC rarr BC

Adding a set of attributes to both the LHS and RHS results in another valid functional dependency

IR3 Transitive Rule( 传递性 ) If A rarr B and B rarr C then A rarr C

Transitivity holds with FDs

Derived Rules from Armstrongs Axioms

There are three other well-known rules that can be derived from the basic three

IR4 Decomposition( 可分解性 ) If A rarr BC then A rarr B and A rarr C A FD with two attributes on RHS can be decomposed into two FDs each with on attribute on the RHS

IR5 Union( 合并性 ) If A rarr B and A rarr C then A rarr BC Reverse of decomposition rule

IR6 Composition( 组合性 ) If A rarr B and C rarr D then AC rarr BD More general than union - can combine two non-overlapping FDs into one

Characteristics of Armstrongs Axioms

The rules are sound( 健壮的 ) and complete( 完备的 )

1) Sound The rules do not infer( 推断 ) any incorrect FDs

2) Complete The rules can be used to infer all correct FDs (no FDs are missing)

The axioms (called IR1 IR2 and IR3) can be used to determine a set of all FDs F+ from an initial set of FDs F

F+ is referred to as the closure of F

Strategy of dealing with questions

The basic idea is that given any set of attributes X we can compute the set of all attributes X+ that can be functionally determined using F This is called the closure of X under F

For question 1 given a set of FDs F in R FD X rarr Y is true or not we know that X rarr Y holds if Y is in X+

For question 2 a set of attributes X is a candidate key of R or not X is a superkey of R if X+ is all attributes of R

For question 3 how to computing F+ we can compute X+ for all possible subsets X of R

Computing the Attribute Closure

The algorithm is as follows

Given a set of attributes X

Let X+ = X

Repeat Find a FD in F whose left side is a subset of X+ Add the right side of F to X+

Until (X+ does not change or X+ contains all attributes of R)

After the algorithm completes you have a set of attributes X+ that can be functionally determined by X This allows you to produce FDs of the form

X rarr A where A is in X+

Computing Attribute Closure Example

R (ABCDEG)

F = A rarr BC C rarr D D rarr G

Compute A+

A+ = A (initial step)

A+ = ABC (by using FD A rarr BC)

A+ = ABCD (by using FD C rarr D)

A+ = ABCDG (by using FD D rarr G)

Similarly we can compute C+ and EG+

C+ = CDG

EG+ = EG

Using Attribute Closure to Detect a FD

The attribute closure algorithm can be used to determine if a particular FD X rarr Y holds based on the set of given FDs F Compute X+ and see if Y is in X+

Example

R = (ABCDEG)

F = A rarr BC C rarr D D rarr G

Does the FD CD rarr G hold Compute CD+ = CDG Yes the FD CD rarr G holds

Does the FD BC rarr E hold Compute BC+ = BCDG No the FD BC rarr E does n

ot hold

Using Attribute Closure to Find all FDs

The attribute closure algorithm can be used to find all FDs of a given relation R if perform the attribute closure algorithm for all subsets X of the attributes R

This is an exponential algorithm in the number of attributes in R Although there are a few optimizations Do not compute the closure of the empty set or of the set of all attributes If we find X+ = all attributes do not compute the closure of any supersets of X Note that even though you do not have to compute the closure (because you know the answer) you still must use the known closure to infer new FDs

Using Attribute Closureto Find all FDs Example

Compute F+ from F where

R (ABCD)

F = A rarr BC C rarr D

Must compute all these attribute closures

A+ B+ C+ D+

AB+ AC+ AD+ BC+ BD+ CD+

ABC+ ABD+ ACD+ BCD+

Do not have to compute ABCD+

Start with the single sets

A+ = ABCD B+ = B C+ = CD D+ = D

New FD A rarr D

Using Attribute Closureto Find all FDs Example (cont)

Next the two element sets Do not compute AB+ AC+ AD+ because A+ contained all attributes of R New FDs AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C BC+ = BCD BD+ = BD CD+ = CD New FD BC rarr DThree element sets Do not compute ABC+ ABD+ ACD+ New FDs ABC rarr D ABD rarr C ACD rarr B BCD+ = BCD No new FDsAll FDs F+ = A rarr BC C rarr D A rarr D BC rarr D AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C ABC rarr D ABD rarrC ACD rarr B

FD Closure Questions

1) Compute all nontrivial FDs and list all keys of R

R = (ABCD)

F = AB rarr C C rarr D D rarr A

2) Compute all nontrivial FDs and list all candidate keys of R

R = (ABCD)

F = AB rarr C BC rarr D CD rarr A AD rarr B

Equivalent Sets of FDs

A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+ We can determine if F covers E by calculating X+ with respect to F for each FD X rarr Y in E and checking whether X+ includes Y

Two sets of FDs E and F are equivalent if E covers F and F covers E That is E+= F+

Minimal Set of FDs

A set of FDs F is minimal if it satisfies these conditions Every FD in F has a single attribute on its right-hand side We cannot replace any FD X rarr A in F with Y rarr A where Y

X and still have a set of dependencies that is equivalent to F

We cannot remove any FD and still have a set that is equivalent to F

A minimal sets of FDs is like a canonical( 规范的 ) form of FDs

Note that there may be many minimal covers for a given set of functional dependencies F

Normal Forms

A relation is in a particular normal form if it satisfies certain

normalization properties

There are several normal forms defined 1NF - First Normal Form 2NF - Second Normal Form 3NF - Third Normal Form BCNF - Boyce-Codd Normal Form 4NF - Fourth Normal Form 5NF - Fifth Normal Form

Each of these normal forms are stricter than the next

For example 3NF is better than 2NF because it removes more redundancyanomalies from the schema than 2NF

Normal Forms

All relations (non-normalized)

1NF (First Normal Form)

2NF (Second Normal Form)

3NF (Third Normal Form)

BCNF (Boyce-Codd NF)

4NF (Fourth NF)

5NF (Fifth NF)

First Normal Form (1NF)

A relation is in first normal form (1NF) if all its attribute values are atomic

That is a 1NF relation cannot have an attribute value that is a set of values (multi-valued attribute) a set of tuples (nested relation)

1NF is a standard assumption in relational DBMSs However object-oriented DBMSs and nested relational DBMSs relax t

his constraint

A relation that is not in 1NF is an unnormalized relation

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1P2

AnalystAnalyst

246

E3 A Lee P3P4

ConsultantEngineer

1048

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1 P2 Analyst Analyst 24 6

E3 A Lee P3P4 Consultant Engineer 10 48

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

A non-1NF Relation

One way to convert a non-1NF relation to a 1NF relation

1) Splitting Method - Divide the existing relation into two relations one with non-repeating attributes and the other with repeating attributes

Make a new relation consisting of the primary key of the original relation and the repeating attributes Determine a primary key for this new relation

Remove the repeating attributes from the original relation

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1P2

AnalystAnalyst

246

E3 A Lee P3P4

ConsultantEngineer

1048

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

ENo EName

E1 J Doe

E2 M Smith

E3 A Lee

E4 J Miller

E5 B Casey

E6 L Chu

E7 R Davis

ENo PNo Resp DurationE1 P1 Manager 12E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36

Second Normal Form (2NF)

A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key

If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation

Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key

Second Normal Form (2NF) Example

fd1 and fd4 are partial functional dependencies Normalize to

Emp (eno ename title bdate salary supereno dno)

WorksOn (eno pno resp hours)

Proj (pno pname budget)

ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration

EmpProj relation

fd1 fd2

fd3fd4

Second Normal Form (2NF) Example (2)

ENo EName

BDate Title Salary SuperNo

DNo

Emp relation

fd1

fd2

fd3fd4

PNo PName Budget

Proj relationENo PNo Resp Duration

WorkOn relation

Third Normal Form (3NF)

Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th

an 2 FDs

A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key

Converting a relation to 3NF from 2NF involves the removal of transitive dependencies

Third Normal Form (3NF) Example

fd2 results in a transitive dependency eno rarr salary Remove it

ENo EName BDate Title Salary SuperNo DNo

Emp relation

fd1

fd2

fd2

Title Salary

Pay relation

ENo EName BDate Title SuperNo DNo

Emp relation

fd1

General Definitions of 2NF and 3NF

We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)

General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key

General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key

Note that a prime attribute is an attribute that is in any key (candidate or primary)

General Definition of 3NF Example

The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute

However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key

Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key

ENo EName BDate Title Salary SuperNo DNo SSN

Emp relation

fd1

fd2

Normalization Question

Consider the universal relation

R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ

List the keys for R

Decompose R into 2NF and then 3NF relations

Boyce-Codd Normal Form (BCNF)

A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key

To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key

The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

ClientInterview relation

Relation is in 3NF but not BCNF

fd4

fd3

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

Note that we lose the FD fd2

fd4

fd3

ClientNo InterviewDate

InterviwTime

StaffNo StaffNo InterviwDate

RoomNo

Interview relation StaffRoom relation

BCNF versus 3NF

We can decompose to BCNF but sometimes we do not want to if we lose a FD

The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency

Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)

In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)

Conversion to BCNF

There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y

Note that if any FD in F+ violates BCNF then a FD in F violates BCNF

2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey

3) Replace R by relations with schemas R1 = X+

R2 = (R ndash X+) U X

4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib

utes are all in R1 or R2

Normalization Question

Given this database schema normalize into BCNF

fd1

PID Country Lot Area Price TaxRate

fd2

fd4fd3

Multi-Valued Dependencies

A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema

A MVD occurs when two independent 1N relationships are in the relational schema

Multi-Valued Dependencies Example

Eno ename Title Salary Pno dno

E1 J Doe EE 30000 P1P2

E2 M Smith SA 50000 P1P4 D1D2D3

E3 A Lee ME 40000 P1P3 D2D3

E4 J Miller PR 20000 P3 D3

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1

E1 J Doe EE 30000 P2

E2 M Smith SA 50000 P1 D1

E2 M Smith SA 50000 P1 D2

E2 M Smith SA 50000 P1 D3

E2 M Smith SA 50000 P4 D1

E2 M Smith SA 50000 P4 D2

E2 M Smith SA 50000 P4 D3

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P3 D3

E3 A Lee ME 40000 P3 D3

E4 J Miller PR 20000 P3 D3

Multi-Valued Dependencies (MVDs)

A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C

A trivial MVD A mdashgtgt B occurs when either B is a subset of A or

A cup B = R

Fourth Normal Form (4NF)

Fourth normal form (4NF) is based on the idea of multi-valued dependencies

A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies

Fourth Normal Form (4NF) Example

eno mdashgtgtpno

eno mdashgtgtdno

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3

eno ename title salary

E1 J Doe EE 30000

E2 MSmith SA 50000

E3 A Lee ME 40000

E4 J Miller PR 20000

eno pno

E1 P1

E1 P2

E2 P1

E2 P4

E3 P1

E3 P3

E4 P3

eno dno

E2 D1

E2 D2

E2 D3

E3 D2

E3 D3

E4 D3

Lossless-join Dependency

The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation

A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined

Fifth Normal Form (5NF)

bull Fifth normal form (5NF) is based on join dependencies

A relation is in fifth normal form (5NF) if the relation has no join dependency

bull 5NF is also called project-join normal form (PJNF)

A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R

The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have

ΠR1(r) ΠR2(r) hellip ΠRn(r) = r

Fifth Normal Form (5NF) Example

Consider a relation Supply (sname partName projName)

1048715Add the additional constraint that

If project j requires part p

and supplier s supplies part p

and supplier s supplies at least one item to project j

Then supplier s also supplies part p to project j

Original Supply Relation Supply Relation with new constraintsname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

Fifth Normal Form (5NF) Example

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

sname partName

Smith Bolt

Smith Nut

Adam Bolt

Walton Nut

Adam Nail

sname projName

Smith X

Smith Y

Adam Y

Walton Z

Adam X

partName projName

Bolt X

Nut Y

Bolt Y

Nut Z

Nail X

Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples

Desirable Properties in normalization

Relational schemas that are well-designed have several

important properties

1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or

relationship

2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins

3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations

Normal Forms in Practice

Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used

For example query execution time may increase because of normalization as more joins become necessary to answer queries

Normal Forms in Practice Example

For example street and city uniquely determine a zipcode

In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed

When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly

fd1

ENo City Street ZipCode

fd2

ENo ZipCode

ZipCode City Street

Normal Forms in Practice Summary

In summary normalization is typically used in two ways

To improve on relational schemas after ER design

To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies

There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance

This is the classic time versus space tradeoff

Conclusion

Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies

Normal forms indicate when relations satisfy certain properties

1NF - All attributes are atomic

2NF - All attributes are fully functionally dependent on a key

3NF - There are no transitive dependencies in the relation

4NF - There are no multi-valued dependencies in the relation

5NF - The relation has no join dependency

In practice normalization is used to improve schemas produced after ER design and existing relational schemas

It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time

Objectives

Explain process of normalization and why it is beneficial

Describe the concept of the Universal Relation

What are the motivations for normalization

What are the 3 types of update anomalies Be able to give examples

Describe the lossless-join property and dependency preservation property

Define and explain the concept of a functional dependency

What is a trivial FD

Describe the relationship between FDs and keys

What are Armstrongs Axioms When are they used

What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs

Objectives (2)

Define minimal and covers for a set of FDs

Define normal forms be able to list the normal forms and draw a diagram showing their relationship

Define 1NF Know how to convert non-1NF relation to 1NF

Define 2NF Convert 1NF relation to 2NF

Define 3NF Convert 2NF relation to 3NF

Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example

Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm

Describe multi-valued dependencies When do they occur

What is a trivial MVD

Define 4NF Convert 3NFBCNF relation to 4NF

Objectives (3)

Describe lossless-join property and how it relates to 5NF

Define 5NF

Explain normalization in terms of the time versus space tradeoff

Explain the relationship between normalization and ER modeling

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
Page 10: Software School of Hunan University 2006.08 Database Systems Design Part III Normalization

Insertion Anomaly Example

Consider these two types of insertion anomalies1) Insert a new employee E9 working in department D2

ndash You have to redundantly insert the department name and manager when adding this record

2) Insert a department D4 that has no current employeesndash This insertion is not possible without creating a dummy employee id a

nd record because ENo is the primary key of the relation

ENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5

E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7

E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5

E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5

E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7

E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8

E8 J Jones 10-11-72 SA 50000 null D1 Management E8

Deletion Anomaly Example

Consider this deletion anomaly- Delete employees E3 and E6 from the database

- Deleting those two employees removes them from the database and we now have lost information about department D2

ENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5

E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7

E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5

E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5

E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7

E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8

E8 J Jones 10-11-72 SA 50000 null D1 Management E8

Modification Anomaly Example

Consider these modification anomalies

1) Change the name of department D3 to ldquoEngineeringrdquo You must update the department name in 3 different records

2) Change the manager of D1 to E4 You must update the superno field in 2 different records

ENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5

E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7

E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5

E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5

E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7

E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8

E8 J Jones 10-11-72 SA 50000 null D1 Management E8

Update Anomalies

bull There are three major types of update anomalies Insertion Anomalies - Insertion of a tuple into the r

elation either requires insertion of redundant information or cannot be performed without setting key values to NULL

Deletion Anomalies - Deletion of a tuple may lose information that is still required to be stored

Modification Anomalies - Changing an attribute of a tuple may require changing multiple attribute values in other tuples

Overview to Normalization

Normalization is a validation technique for producing relations with desirable properties

Normalization is a bottom-up design technique for producing relations

Normalization decomposes relations into smaller relations that contain less redundancy This decomposition requires reconstruction of the original relations from the smaller relations and no information is lost

It was developed by Codd in 1972 preceding ER modeling and extended by others over the years

Example Normalization to DeptEmp

bull DeptEmp Relation

ENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5

E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7

E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5

E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5

E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7

E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8

E8 J Jones 10-11-72 SA 50000 null D1 Management E8

Normalization to DeptEmp

Emp Relation Dept Relation

dno dname mgrno

D1 Management E8

D2 consulting E7

D3 Accounting E5

eno ename bdate title salary superNo dno

E1 J Doe 01-05-75 EE 30000 E2 null

E2 M Smith 06-04-66 SA 50000 E5 D3

E3 A Lee 07-05-66 ME 40000 E7 D2

E4 J Miller 09-01-50 PR 20000 E6 D3

E5 B Casey 12-25-71 SA 50000 E8 D3

E6 L Chu 11-30-65 EE 30000 E7 D2

E7 R Davis 09-08-77 ME 40000 E8 D1

E8 J Jones 10-11-72 SA 50000 null D1

Dept Emp= Emp dno Dept

Overview to Normalization

Input a relation schema and constraints on attributes

Output one or more relation schemas

Processing Idea examining the relationships between attributes and organize the data according to its functional dependencies( 函数依赖)

Significance

1) Validate our design correct our improper design

2) Help for understanding the nature and the meaning of the data in the enterprise for learning more about the data semantics

When to use normalization

Normalization can be used after mapping ER model into relation model or independently

Normalization may be especially useful for databases that have already been designed without using formal techniques

Theory basic of normalization

Functional Dependencies ( 函数依赖) Functional dependencies represent integrity constraints ( 完整

性约束) on the values of attributes in a relation and are used in normalization

A functional dependency (FD) describes the relationship between attributes in a relation We say a set of attributes X functionally determines an attribute Y if given the values of X we always know the only possible value of Y

Notation X rarr Y X functionally determines Y Y is functionally dependent on XExample

eno rarr ename eno pno rarr Duration

Notation for Functional Dependencies

A functional dependency has a left-hand side called the determinant which is a set of attributes and one attribute on the right-hand side

eno pno rarr duration

determinant determined attribute

Strictly speaking there is always only one attribute on the RHS but we can combine several functional dependencies into one

eno pno rarr durationeno pno rarr respeno pno rarr duration resp

Remember that this is really two functional dependencies

The Semantics ofFunctional Dependencies

Similar to keys Functional dependencies are a property of a relation schema (intension) and not a property of a particular instance of the schema (extension)

Functional dependencies are directional

eno rarr ename ename rarr eno

ExampleFDs in EmpDept relation

bull EmpDept RelationENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8E8 J Jones 10-11-72 SA 50000 null D1 Management E8

Eno rarr ename bdate title salary superno dno

dnorarrdname mgrno

titlerarr salary

Trivial Functional Dependencies ( 平凡函数依赖 )

A functional dependency is trivialif the attributes on its left-hand side are a superset of the attributes on its right-hand side

Examples eno rarr eno

eno ename rarr eno

eno pno duration rarr eno Duration

Trivial functional dependencies are not interesting because they do not tell us anything

We are only interested in nontrivial FDs

Full Functional Dependencies

A functional dependency A rarr B is a full functional dependency if removal of any attribute from A results in the dependency not existing any more We say that B is fully functionally dependent on A If remove an attribute from A and the functional dependency still exists we say that B is partially dependent on A

Example

eno rarr ename (full FD)

eno ename rarr salary title (partial FD - can remove ename)

eno pno rarr hours resp (full FD)

How to Identify FDs in a relation

Examining the semantics of the attributes of relation

Identify all non-trivial functional dependencies in the Universal Relation

Universal (eno pno resp duration ename bdate title salary superNo dno dname mgrNo pname budget)

Eno rarr ename bdate title salary superno dnodnorarrdname mgrnotitlerarr salary

Formal Definition ofFunctional Dependencies

Given relation R defined over U = A1 A2 An where X U Y U If for all pairs of tuples t1 and t2 in any legal instance of relation schema R

t1[X] = t2[X] implies t1[Y] = t2[Y]

then the functional dependency X rarr Y holds in R

Functional Dependencies and Keys

Functional dependencies can be used to determine the candidate keys of a relation

What is superkey candidate key primary key

For example if an attribute functionally determines all other attributes in the relation that attribute can be a key

eno rarr eno ename bdate title supereno dno eno is a candidate key for the Emp relation

Functional Dependencies and Keys

ENo rarr EName BDate Title SuperNo DNoDNo rarr DName MgreNotitle rarr salary

Question List the candidate key(s) of this relation

ENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5

E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7

E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5

E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5

E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7

E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8

E8 J Jones 10-11-72 SA 50000 null D1 Management E8

EmpDept relation

Applications of FDs

1) given a set of FDs F in relation R FD X rarr Y is true or not

2) given a set of FDs F in relation R a set of attributes X is a candidate key of R or not

3) given a set of FDs F in relation R which candidate keys are in R

4) given a set of FDs F in relation R some FDs in F is Excessive or not

Rules for reasoning about functional dependencies

Armstrongs Axioms are three inference rules for functional dependenciesLet A B C and D be subsets of the attributes of relation R

IR1 Reflexive Rule( 自反性 ) If AB then A rarr B

A set of attributes always trivially functionally determines any subset of itself

IR2 Augmentation Rule( 增加性 ) If A rarr B then AC rarr BC

Adding a set of attributes to both the LHS and RHS results in another valid functional dependency

IR3 Transitive Rule( 传递性 ) If A rarr B and B rarr C then A rarr C

Transitivity holds with FDs

Derived Rules from Armstrongs Axioms

There are three other well-known rules that can be derived from the basic three

IR4 Decomposition( 可分解性 ) If A rarr BC then A rarr B and A rarr C A FD with two attributes on RHS can be decomposed into two FDs each with on attribute on the RHS

IR5 Union( 合并性 ) If A rarr B and A rarr C then A rarr BC Reverse of decomposition rule

IR6 Composition( 组合性 ) If A rarr B and C rarr D then AC rarr BD More general than union - can combine two non-overlapping FDs into one

Characteristics of Armstrongs Axioms

The rules are sound( 健壮的 ) and complete( 完备的 )

1) Sound The rules do not infer( 推断 ) any incorrect FDs

2) Complete The rules can be used to infer all correct FDs (no FDs are missing)

The axioms (called IR1 IR2 and IR3) can be used to determine a set of all FDs F+ from an initial set of FDs F

F+ is referred to as the closure of F

Strategy of dealing with questions

The basic idea is that given any set of attributes X we can compute the set of all attributes X+ that can be functionally determined using F This is called the closure of X under F

For question 1 given a set of FDs F in R FD X rarr Y is true or not we know that X rarr Y holds if Y is in X+

For question 2 a set of attributes X is a candidate key of R or not X is a superkey of R if X+ is all attributes of R

For question 3 how to computing F+ we can compute X+ for all possible subsets X of R

Computing the Attribute Closure

The algorithm is as follows

Given a set of attributes X

Let X+ = X

Repeat Find a FD in F whose left side is a subset of X+ Add the right side of F to X+

Until (X+ does not change or X+ contains all attributes of R)

After the algorithm completes you have a set of attributes X+ that can be functionally determined by X This allows you to produce FDs of the form

X rarr A where A is in X+

Computing Attribute Closure Example

R (ABCDEG)

F = A rarr BC C rarr D D rarr G

Compute A+

A+ = A (initial step)

A+ = ABC (by using FD A rarr BC)

A+ = ABCD (by using FD C rarr D)

A+ = ABCDG (by using FD D rarr G)

Similarly we can compute C+ and EG+

C+ = CDG

EG+ = EG

Using Attribute Closure to Detect a FD

The attribute closure algorithm can be used to determine if a particular FD X rarr Y holds based on the set of given FDs F Compute X+ and see if Y is in X+

Example

R = (ABCDEG)

F = A rarr BC C rarr D D rarr G

Does the FD CD rarr G hold Compute CD+ = CDG Yes the FD CD rarr G holds

Does the FD BC rarr E hold Compute BC+ = BCDG No the FD BC rarr E does n

ot hold

Using Attribute Closure to Find all FDs

The attribute closure algorithm can be used to find all FDs of a given relation R if perform the attribute closure algorithm for all subsets X of the attributes R

This is an exponential algorithm in the number of attributes in R Although there are a few optimizations Do not compute the closure of the empty set or of the set of all attributes If we find X+ = all attributes do not compute the closure of any supersets of X Note that even though you do not have to compute the closure (because you know the answer) you still must use the known closure to infer new FDs

Using Attribute Closureto Find all FDs Example

Compute F+ from F where

R (ABCD)

F = A rarr BC C rarr D

Must compute all these attribute closures

A+ B+ C+ D+

AB+ AC+ AD+ BC+ BD+ CD+

ABC+ ABD+ ACD+ BCD+

Do not have to compute ABCD+

Start with the single sets

A+ = ABCD B+ = B C+ = CD D+ = D

New FD A rarr D

Using Attribute Closureto Find all FDs Example (cont)

Next the two element sets Do not compute AB+ AC+ AD+ because A+ contained all attributes of R New FDs AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C BC+ = BCD BD+ = BD CD+ = CD New FD BC rarr DThree element sets Do not compute ABC+ ABD+ ACD+ New FDs ABC rarr D ABD rarr C ACD rarr B BCD+ = BCD No new FDsAll FDs F+ = A rarr BC C rarr D A rarr D BC rarr D AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C ABC rarr D ABD rarrC ACD rarr B

FD Closure Questions

1) Compute all nontrivial FDs and list all keys of R

R = (ABCD)

F = AB rarr C C rarr D D rarr A

2) Compute all nontrivial FDs and list all candidate keys of R

R = (ABCD)

F = AB rarr C BC rarr D CD rarr A AD rarr B

Equivalent Sets of FDs

A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+ We can determine if F covers E by calculating X+ with respect to F for each FD X rarr Y in E and checking whether X+ includes Y

Two sets of FDs E and F are equivalent if E covers F and F covers E That is E+= F+

Minimal Set of FDs

A set of FDs F is minimal if it satisfies these conditions Every FD in F has a single attribute on its right-hand side We cannot replace any FD X rarr A in F with Y rarr A where Y

X and still have a set of dependencies that is equivalent to F

We cannot remove any FD and still have a set that is equivalent to F

A minimal sets of FDs is like a canonical( 规范的 ) form of FDs

Note that there may be many minimal covers for a given set of functional dependencies F

Normal Forms

A relation is in a particular normal form if it satisfies certain

normalization properties

There are several normal forms defined 1NF - First Normal Form 2NF - Second Normal Form 3NF - Third Normal Form BCNF - Boyce-Codd Normal Form 4NF - Fourth Normal Form 5NF - Fifth Normal Form

Each of these normal forms are stricter than the next

For example 3NF is better than 2NF because it removes more redundancyanomalies from the schema than 2NF

Normal Forms

All relations (non-normalized)

1NF (First Normal Form)

2NF (Second Normal Form)

3NF (Third Normal Form)

BCNF (Boyce-Codd NF)

4NF (Fourth NF)

5NF (Fifth NF)

First Normal Form (1NF)

A relation is in first normal form (1NF) if all its attribute values are atomic

That is a 1NF relation cannot have an attribute value that is a set of values (multi-valued attribute) a set of tuples (nested relation)

1NF is a standard assumption in relational DBMSs However object-oriented DBMSs and nested relational DBMSs relax t

his constraint

A relation that is not in 1NF is an unnormalized relation

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1P2

AnalystAnalyst

246

E3 A Lee P3P4

ConsultantEngineer

1048

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1 P2 Analyst Analyst 24 6

E3 A Lee P3P4 Consultant Engineer 10 48

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

A non-1NF Relation

One way to convert a non-1NF relation to a 1NF relation

1) Splitting Method - Divide the existing relation into two relations one with non-repeating attributes and the other with repeating attributes

Make a new relation consisting of the primary key of the original relation and the repeating attributes Determine a primary key for this new relation

Remove the repeating attributes from the original relation

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1P2

AnalystAnalyst

246

E3 A Lee P3P4

ConsultantEngineer

1048

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

ENo EName

E1 J Doe

E2 M Smith

E3 A Lee

E4 J Miller

E5 B Casey

E6 L Chu

E7 R Davis

ENo PNo Resp DurationE1 P1 Manager 12E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36

Second Normal Form (2NF)

A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key

If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation

Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key

Second Normal Form (2NF) Example

fd1 and fd4 are partial functional dependencies Normalize to

Emp (eno ename title bdate salary supereno dno)

WorksOn (eno pno resp hours)

Proj (pno pname budget)

ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration

EmpProj relation

fd1 fd2

fd3fd4

Second Normal Form (2NF) Example (2)

ENo EName

BDate Title Salary SuperNo

DNo

Emp relation

fd1

fd2

fd3fd4

PNo PName Budget

Proj relationENo PNo Resp Duration

WorkOn relation

Third Normal Form (3NF)

Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th

an 2 FDs

A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key

Converting a relation to 3NF from 2NF involves the removal of transitive dependencies

Third Normal Form (3NF) Example

fd2 results in a transitive dependency eno rarr salary Remove it

ENo EName BDate Title Salary SuperNo DNo

Emp relation

fd1

fd2

fd2

Title Salary

Pay relation

ENo EName BDate Title SuperNo DNo

Emp relation

fd1

General Definitions of 2NF and 3NF

We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)

General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key

General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key

Note that a prime attribute is an attribute that is in any key (candidate or primary)

General Definition of 3NF Example

The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute

However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key

Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key

ENo EName BDate Title Salary SuperNo DNo SSN

Emp relation

fd1

fd2

Normalization Question

Consider the universal relation

R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ

List the keys for R

Decompose R into 2NF and then 3NF relations

Boyce-Codd Normal Form (BCNF)

A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key

To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key

The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

ClientInterview relation

Relation is in 3NF but not BCNF

fd4

fd3

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

Note that we lose the FD fd2

fd4

fd3

ClientNo InterviewDate

InterviwTime

StaffNo StaffNo InterviwDate

RoomNo

Interview relation StaffRoom relation

BCNF versus 3NF

We can decompose to BCNF but sometimes we do not want to if we lose a FD

The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency

Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)

In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)

Conversion to BCNF

There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y

Note that if any FD in F+ violates BCNF then a FD in F violates BCNF

2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey

3) Replace R by relations with schemas R1 = X+

R2 = (R ndash X+) U X

4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib

utes are all in R1 or R2

Normalization Question

Given this database schema normalize into BCNF

fd1

PID Country Lot Area Price TaxRate

fd2

fd4fd3

Multi-Valued Dependencies

A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema

A MVD occurs when two independent 1N relationships are in the relational schema

Multi-Valued Dependencies Example

Eno ename Title Salary Pno dno

E1 J Doe EE 30000 P1P2

E2 M Smith SA 50000 P1P4 D1D2D3

E3 A Lee ME 40000 P1P3 D2D3

E4 J Miller PR 20000 P3 D3

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1

E1 J Doe EE 30000 P2

E2 M Smith SA 50000 P1 D1

E2 M Smith SA 50000 P1 D2

E2 M Smith SA 50000 P1 D3

E2 M Smith SA 50000 P4 D1

E2 M Smith SA 50000 P4 D2

E2 M Smith SA 50000 P4 D3

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P3 D3

E3 A Lee ME 40000 P3 D3

E4 J Miller PR 20000 P3 D3

Multi-Valued Dependencies (MVDs)

A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C

A trivial MVD A mdashgtgt B occurs when either B is a subset of A or

A cup B = R

Fourth Normal Form (4NF)

Fourth normal form (4NF) is based on the idea of multi-valued dependencies

A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies

Fourth Normal Form (4NF) Example

eno mdashgtgtpno

eno mdashgtgtdno

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3

eno ename title salary

E1 J Doe EE 30000

E2 MSmith SA 50000

E3 A Lee ME 40000

E4 J Miller PR 20000

eno pno

E1 P1

E1 P2

E2 P1

E2 P4

E3 P1

E3 P3

E4 P3

eno dno

E2 D1

E2 D2

E2 D3

E3 D2

E3 D3

E4 D3

Lossless-join Dependency

The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation

A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined

Fifth Normal Form (5NF)

bull Fifth normal form (5NF) is based on join dependencies

A relation is in fifth normal form (5NF) if the relation has no join dependency

bull 5NF is also called project-join normal form (PJNF)

A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R

The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have

ΠR1(r) ΠR2(r) hellip ΠRn(r) = r

Fifth Normal Form (5NF) Example

Consider a relation Supply (sname partName projName)

1048715Add the additional constraint that

If project j requires part p

and supplier s supplies part p

and supplier s supplies at least one item to project j

Then supplier s also supplies part p to project j

Original Supply Relation Supply Relation with new constraintsname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

Fifth Normal Form (5NF) Example

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

sname partName

Smith Bolt

Smith Nut

Adam Bolt

Walton Nut

Adam Nail

sname projName

Smith X

Smith Y

Adam Y

Walton Z

Adam X

partName projName

Bolt X

Nut Y

Bolt Y

Nut Z

Nail X

Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples

Desirable Properties in normalization

Relational schemas that are well-designed have several

important properties

1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or

relationship

2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins

3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations

Normal Forms in Practice

Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used

For example query execution time may increase because of normalization as more joins become necessary to answer queries

Normal Forms in Practice Example

For example street and city uniquely determine a zipcode

In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed

When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly

fd1

ENo City Street ZipCode

fd2

ENo ZipCode

ZipCode City Street

Normal Forms in Practice Summary

In summary normalization is typically used in two ways

To improve on relational schemas after ER design

To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies

There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance

This is the classic time versus space tradeoff

Conclusion

Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies

Normal forms indicate when relations satisfy certain properties

1NF - All attributes are atomic

2NF - All attributes are fully functionally dependent on a key

3NF - There are no transitive dependencies in the relation

4NF - There are no multi-valued dependencies in the relation

5NF - The relation has no join dependency

In practice normalization is used to improve schemas produced after ER design and existing relational schemas

It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time

Objectives

Explain process of normalization and why it is beneficial

Describe the concept of the Universal Relation

What are the motivations for normalization

What are the 3 types of update anomalies Be able to give examples

Describe the lossless-join property and dependency preservation property

Define and explain the concept of a functional dependency

What is a trivial FD

Describe the relationship between FDs and keys

What are Armstrongs Axioms When are they used

What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs

Objectives (2)

Define minimal and covers for a set of FDs

Define normal forms be able to list the normal forms and draw a diagram showing their relationship

Define 1NF Know how to convert non-1NF relation to 1NF

Define 2NF Convert 1NF relation to 2NF

Define 3NF Convert 2NF relation to 3NF

Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example

Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm

Describe multi-valued dependencies When do they occur

What is a trivial MVD

Define 4NF Convert 3NFBCNF relation to 4NF

Objectives (3)

Describe lossless-join property and how it relates to 5NF

Define 5NF

Explain normalization in terms of the time versus space tradeoff

Explain the relationship between normalization and ER modeling

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
Page 11: Software School of Hunan University 2006.08 Database Systems Design Part III Normalization

Deletion Anomaly Example

Consider this deletion anomaly- Delete employees E3 and E6 from the database

- Deleting those two employees removes them from the database and we now have lost information about department D2

ENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5

E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7

E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5

E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5

E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7

E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8

E8 J Jones 10-11-72 SA 50000 null D1 Management E8

Modification Anomaly Example

Consider these modification anomalies

1) Change the name of department D3 to ldquoEngineeringrdquo You must update the department name in 3 different records

2) Change the manager of D1 to E4 You must update the superno field in 2 different records

ENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5

E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7

E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5

E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5

E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7

E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8

E8 J Jones 10-11-72 SA 50000 null D1 Management E8

Update Anomalies

bull There are three major types of update anomalies Insertion Anomalies - Insertion of a tuple into the r

elation either requires insertion of redundant information or cannot be performed without setting key values to NULL

Deletion Anomalies - Deletion of a tuple may lose information that is still required to be stored

Modification Anomalies - Changing an attribute of a tuple may require changing multiple attribute values in other tuples

Overview to Normalization

Normalization is a validation technique for producing relations with desirable properties

Normalization is a bottom-up design technique for producing relations

Normalization decomposes relations into smaller relations that contain less redundancy This decomposition requires reconstruction of the original relations from the smaller relations and no information is lost

It was developed by Codd in 1972 preceding ER modeling and extended by others over the years

Example Normalization to DeptEmp

bull DeptEmp Relation

ENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5

E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7

E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5

E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5

E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7

E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8

E8 J Jones 10-11-72 SA 50000 null D1 Management E8

Normalization to DeptEmp

Emp Relation Dept Relation

dno dname mgrno

D1 Management E8

D2 consulting E7

D3 Accounting E5

eno ename bdate title salary superNo dno

E1 J Doe 01-05-75 EE 30000 E2 null

E2 M Smith 06-04-66 SA 50000 E5 D3

E3 A Lee 07-05-66 ME 40000 E7 D2

E4 J Miller 09-01-50 PR 20000 E6 D3

E5 B Casey 12-25-71 SA 50000 E8 D3

E6 L Chu 11-30-65 EE 30000 E7 D2

E7 R Davis 09-08-77 ME 40000 E8 D1

E8 J Jones 10-11-72 SA 50000 null D1

Dept Emp= Emp dno Dept

Overview to Normalization

Input a relation schema and constraints on attributes

Output one or more relation schemas

Processing Idea examining the relationships between attributes and organize the data according to its functional dependencies( 函数依赖)

Significance

1) Validate our design correct our improper design

2) Help for understanding the nature and the meaning of the data in the enterprise for learning more about the data semantics

When to use normalization

Normalization can be used after mapping ER model into relation model or independently

Normalization may be especially useful for databases that have already been designed without using formal techniques

Theory basic of normalization

Functional Dependencies ( 函数依赖) Functional dependencies represent integrity constraints ( 完整

性约束) on the values of attributes in a relation and are used in normalization

A functional dependency (FD) describes the relationship between attributes in a relation We say a set of attributes X functionally determines an attribute Y if given the values of X we always know the only possible value of Y

Notation X rarr Y X functionally determines Y Y is functionally dependent on XExample

eno rarr ename eno pno rarr Duration

Notation for Functional Dependencies

A functional dependency has a left-hand side called the determinant which is a set of attributes and one attribute on the right-hand side

eno pno rarr duration

determinant determined attribute

Strictly speaking there is always only one attribute on the RHS but we can combine several functional dependencies into one

eno pno rarr durationeno pno rarr respeno pno rarr duration resp

Remember that this is really two functional dependencies

The Semantics ofFunctional Dependencies

Similar to keys Functional dependencies are a property of a relation schema (intension) and not a property of a particular instance of the schema (extension)

Functional dependencies are directional

eno rarr ename ename rarr eno

ExampleFDs in EmpDept relation

bull EmpDept RelationENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8E8 J Jones 10-11-72 SA 50000 null D1 Management E8

Eno rarr ename bdate title salary superno dno

dnorarrdname mgrno

titlerarr salary

Trivial Functional Dependencies ( 平凡函数依赖 )

A functional dependency is trivialif the attributes on its left-hand side are a superset of the attributes on its right-hand side

Examples eno rarr eno

eno ename rarr eno

eno pno duration rarr eno Duration

Trivial functional dependencies are not interesting because they do not tell us anything

We are only interested in nontrivial FDs

Full Functional Dependencies

A functional dependency A rarr B is a full functional dependency if removal of any attribute from A results in the dependency not existing any more We say that B is fully functionally dependent on A If remove an attribute from A and the functional dependency still exists we say that B is partially dependent on A

Example

eno rarr ename (full FD)

eno ename rarr salary title (partial FD - can remove ename)

eno pno rarr hours resp (full FD)

How to Identify FDs in a relation

Examining the semantics of the attributes of relation

Identify all non-trivial functional dependencies in the Universal Relation

Universal (eno pno resp duration ename bdate title salary superNo dno dname mgrNo pname budget)

Eno rarr ename bdate title salary superno dnodnorarrdname mgrnotitlerarr salary

Formal Definition ofFunctional Dependencies

Given relation R defined over U = A1 A2 An where X U Y U If for all pairs of tuples t1 and t2 in any legal instance of relation schema R

t1[X] = t2[X] implies t1[Y] = t2[Y]

then the functional dependency X rarr Y holds in R

Functional Dependencies and Keys

Functional dependencies can be used to determine the candidate keys of a relation

What is superkey candidate key primary key

For example if an attribute functionally determines all other attributes in the relation that attribute can be a key

eno rarr eno ename bdate title supereno dno eno is a candidate key for the Emp relation

Functional Dependencies and Keys

ENo rarr EName BDate Title SuperNo DNoDNo rarr DName MgreNotitle rarr salary

Question List the candidate key(s) of this relation

ENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5

E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7

E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5

E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5

E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7

E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8

E8 J Jones 10-11-72 SA 50000 null D1 Management E8

EmpDept relation

Applications of FDs

1) given a set of FDs F in relation R FD X rarr Y is true or not

2) given a set of FDs F in relation R a set of attributes X is a candidate key of R or not

3) given a set of FDs F in relation R which candidate keys are in R

4) given a set of FDs F in relation R some FDs in F is Excessive or not

Rules for reasoning about functional dependencies

Armstrongs Axioms are three inference rules for functional dependenciesLet A B C and D be subsets of the attributes of relation R

IR1 Reflexive Rule( 自反性 ) If AB then A rarr B

A set of attributes always trivially functionally determines any subset of itself

IR2 Augmentation Rule( 增加性 ) If A rarr B then AC rarr BC

Adding a set of attributes to both the LHS and RHS results in another valid functional dependency

IR3 Transitive Rule( 传递性 ) If A rarr B and B rarr C then A rarr C

Transitivity holds with FDs

Derived Rules from Armstrongs Axioms

There are three other well-known rules that can be derived from the basic three

IR4 Decomposition( 可分解性 ) If A rarr BC then A rarr B and A rarr C A FD with two attributes on RHS can be decomposed into two FDs each with on attribute on the RHS

IR5 Union( 合并性 ) If A rarr B and A rarr C then A rarr BC Reverse of decomposition rule

IR6 Composition( 组合性 ) If A rarr B and C rarr D then AC rarr BD More general than union - can combine two non-overlapping FDs into one

Characteristics of Armstrongs Axioms

The rules are sound( 健壮的 ) and complete( 完备的 )

1) Sound The rules do not infer( 推断 ) any incorrect FDs

2) Complete The rules can be used to infer all correct FDs (no FDs are missing)

The axioms (called IR1 IR2 and IR3) can be used to determine a set of all FDs F+ from an initial set of FDs F

F+ is referred to as the closure of F

Strategy of dealing with questions

The basic idea is that given any set of attributes X we can compute the set of all attributes X+ that can be functionally determined using F This is called the closure of X under F

For question 1 given a set of FDs F in R FD X rarr Y is true or not we know that X rarr Y holds if Y is in X+

For question 2 a set of attributes X is a candidate key of R or not X is a superkey of R if X+ is all attributes of R

For question 3 how to computing F+ we can compute X+ for all possible subsets X of R

Computing the Attribute Closure

The algorithm is as follows

Given a set of attributes X

Let X+ = X

Repeat Find a FD in F whose left side is a subset of X+ Add the right side of F to X+

Until (X+ does not change or X+ contains all attributes of R)

After the algorithm completes you have a set of attributes X+ that can be functionally determined by X This allows you to produce FDs of the form

X rarr A where A is in X+

Computing Attribute Closure Example

R (ABCDEG)

F = A rarr BC C rarr D D rarr G

Compute A+

A+ = A (initial step)

A+ = ABC (by using FD A rarr BC)

A+ = ABCD (by using FD C rarr D)

A+ = ABCDG (by using FD D rarr G)

Similarly we can compute C+ and EG+

C+ = CDG

EG+ = EG

Using Attribute Closure to Detect a FD

The attribute closure algorithm can be used to determine if a particular FD X rarr Y holds based on the set of given FDs F Compute X+ and see if Y is in X+

Example

R = (ABCDEG)

F = A rarr BC C rarr D D rarr G

Does the FD CD rarr G hold Compute CD+ = CDG Yes the FD CD rarr G holds

Does the FD BC rarr E hold Compute BC+ = BCDG No the FD BC rarr E does n

ot hold

Using Attribute Closure to Find all FDs

The attribute closure algorithm can be used to find all FDs of a given relation R if perform the attribute closure algorithm for all subsets X of the attributes R

This is an exponential algorithm in the number of attributes in R Although there are a few optimizations Do not compute the closure of the empty set or of the set of all attributes If we find X+ = all attributes do not compute the closure of any supersets of X Note that even though you do not have to compute the closure (because you know the answer) you still must use the known closure to infer new FDs

Using Attribute Closureto Find all FDs Example

Compute F+ from F where

R (ABCD)

F = A rarr BC C rarr D

Must compute all these attribute closures

A+ B+ C+ D+

AB+ AC+ AD+ BC+ BD+ CD+

ABC+ ABD+ ACD+ BCD+

Do not have to compute ABCD+

Start with the single sets

A+ = ABCD B+ = B C+ = CD D+ = D

New FD A rarr D

Using Attribute Closureto Find all FDs Example (cont)

Next the two element sets Do not compute AB+ AC+ AD+ because A+ contained all attributes of R New FDs AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C BC+ = BCD BD+ = BD CD+ = CD New FD BC rarr DThree element sets Do not compute ABC+ ABD+ ACD+ New FDs ABC rarr D ABD rarr C ACD rarr B BCD+ = BCD No new FDsAll FDs F+ = A rarr BC C rarr D A rarr D BC rarr D AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C ABC rarr D ABD rarrC ACD rarr B

FD Closure Questions

1) Compute all nontrivial FDs and list all keys of R

R = (ABCD)

F = AB rarr C C rarr D D rarr A

2) Compute all nontrivial FDs and list all candidate keys of R

R = (ABCD)

F = AB rarr C BC rarr D CD rarr A AD rarr B

Equivalent Sets of FDs

A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+ We can determine if F covers E by calculating X+ with respect to F for each FD X rarr Y in E and checking whether X+ includes Y

Two sets of FDs E and F are equivalent if E covers F and F covers E That is E+= F+

Minimal Set of FDs

A set of FDs F is minimal if it satisfies these conditions Every FD in F has a single attribute on its right-hand side We cannot replace any FD X rarr A in F with Y rarr A where Y

X and still have a set of dependencies that is equivalent to F

We cannot remove any FD and still have a set that is equivalent to F

A minimal sets of FDs is like a canonical( 规范的 ) form of FDs

Note that there may be many minimal covers for a given set of functional dependencies F

Normal Forms

A relation is in a particular normal form if it satisfies certain

normalization properties

There are several normal forms defined 1NF - First Normal Form 2NF - Second Normal Form 3NF - Third Normal Form BCNF - Boyce-Codd Normal Form 4NF - Fourth Normal Form 5NF - Fifth Normal Form

Each of these normal forms are stricter than the next

For example 3NF is better than 2NF because it removes more redundancyanomalies from the schema than 2NF

Normal Forms

All relations (non-normalized)

1NF (First Normal Form)

2NF (Second Normal Form)

3NF (Third Normal Form)

BCNF (Boyce-Codd NF)

4NF (Fourth NF)

5NF (Fifth NF)

First Normal Form (1NF)

A relation is in first normal form (1NF) if all its attribute values are atomic

That is a 1NF relation cannot have an attribute value that is a set of values (multi-valued attribute) a set of tuples (nested relation)

1NF is a standard assumption in relational DBMSs However object-oriented DBMSs and nested relational DBMSs relax t

his constraint

A relation that is not in 1NF is an unnormalized relation

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1P2

AnalystAnalyst

246

E3 A Lee P3P4

ConsultantEngineer

1048

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1 P2 Analyst Analyst 24 6

E3 A Lee P3P4 Consultant Engineer 10 48

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

A non-1NF Relation

One way to convert a non-1NF relation to a 1NF relation

1) Splitting Method - Divide the existing relation into two relations one with non-repeating attributes and the other with repeating attributes

Make a new relation consisting of the primary key of the original relation and the repeating attributes Determine a primary key for this new relation

Remove the repeating attributes from the original relation

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1P2

AnalystAnalyst

246

E3 A Lee P3P4

ConsultantEngineer

1048

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

ENo EName

E1 J Doe

E2 M Smith

E3 A Lee

E4 J Miller

E5 B Casey

E6 L Chu

E7 R Davis

ENo PNo Resp DurationE1 P1 Manager 12E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36

Second Normal Form (2NF)

A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key

If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation

Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key

Second Normal Form (2NF) Example

fd1 and fd4 are partial functional dependencies Normalize to

Emp (eno ename title bdate salary supereno dno)

WorksOn (eno pno resp hours)

Proj (pno pname budget)

ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration

EmpProj relation

fd1 fd2

fd3fd4

Second Normal Form (2NF) Example (2)

ENo EName

BDate Title Salary SuperNo

DNo

Emp relation

fd1

fd2

fd3fd4

PNo PName Budget

Proj relationENo PNo Resp Duration

WorkOn relation

Third Normal Form (3NF)

Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th

an 2 FDs

A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key

Converting a relation to 3NF from 2NF involves the removal of transitive dependencies

Third Normal Form (3NF) Example

fd2 results in a transitive dependency eno rarr salary Remove it

ENo EName BDate Title Salary SuperNo DNo

Emp relation

fd1

fd2

fd2

Title Salary

Pay relation

ENo EName BDate Title SuperNo DNo

Emp relation

fd1

General Definitions of 2NF and 3NF

We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)

General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key

General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key

Note that a prime attribute is an attribute that is in any key (candidate or primary)

General Definition of 3NF Example

The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute

However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key

Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key

ENo EName BDate Title Salary SuperNo DNo SSN

Emp relation

fd1

fd2

Normalization Question

Consider the universal relation

R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ

List the keys for R

Decompose R into 2NF and then 3NF relations

Boyce-Codd Normal Form (BCNF)

A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key

To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key

The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

ClientInterview relation

Relation is in 3NF but not BCNF

fd4

fd3

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

Note that we lose the FD fd2

fd4

fd3

ClientNo InterviewDate

InterviwTime

StaffNo StaffNo InterviwDate

RoomNo

Interview relation StaffRoom relation

BCNF versus 3NF

We can decompose to BCNF but sometimes we do not want to if we lose a FD

The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency

Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)

In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)

Conversion to BCNF

There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y

Note that if any FD in F+ violates BCNF then a FD in F violates BCNF

2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey

3) Replace R by relations with schemas R1 = X+

R2 = (R ndash X+) U X

4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib

utes are all in R1 or R2

Normalization Question

Given this database schema normalize into BCNF

fd1

PID Country Lot Area Price TaxRate

fd2

fd4fd3

Multi-Valued Dependencies

A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema

A MVD occurs when two independent 1N relationships are in the relational schema

Multi-Valued Dependencies Example

Eno ename Title Salary Pno dno

E1 J Doe EE 30000 P1P2

E2 M Smith SA 50000 P1P4 D1D2D3

E3 A Lee ME 40000 P1P3 D2D3

E4 J Miller PR 20000 P3 D3

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1

E1 J Doe EE 30000 P2

E2 M Smith SA 50000 P1 D1

E2 M Smith SA 50000 P1 D2

E2 M Smith SA 50000 P1 D3

E2 M Smith SA 50000 P4 D1

E2 M Smith SA 50000 P4 D2

E2 M Smith SA 50000 P4 D3

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P3 D3

E3 A Lee ME 40000 P3 D3

E4 J Miller PR 20000 P3 D3

Multi-Valued Dependencies (MVDs)

A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C

A trivial MVD A mdashgtgt B occurs when either B is a subset of A or

A cup B = R

Fourth Normal Form (4NF)

Fourth normal form (4NF) is based on the idea of multi-valued dependencies

A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies

Fourth Normal Form (4NF) Example

eno mdashgtgtpno

eno mdashgtgtdno

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3

eno ename title salary

E1 J Doe EE 30000

E2 MSmith SA 50000

E3 A Lee ME 40000

E4 J Miller PR 20000

eno pno

E1 P1

E1 P2

E2 P1

E2 P4

E3 P1

E3 P3

E4 P3

eno dno

E2 D1

E2 D2

E2 D3

E3 D2

E3 D3

E4 D3

Lossless-join Dependency

The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation

A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined

Fifth Normal Form (5NF)

bull Fifth normal form (5NF) is based on join dependencies

A relation is in fifth normal form (5NF) if the relation has no join dependency

bull 5NF is also called project-join normal form (PJNF)

A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R

The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have

ΠR1(r) ΠR2(r) hellip ΠRn(r) = r

Fifth Normal Form (5NF) Example

Consider a relation Supply (sname partName projName)

1048715Add the additional constraint that

If project j requires part p

and supplier s supplies part p

and supplier s supplies at least one item to project j

Then supplier s also supplies part p to project j

Original Supply Relation Supply Relation with new constraintsname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

Fifth Normal Form (5NF) Example

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

sname partName

Smith Bolt

Smith Nut

Adam Bolt

Walton Nut

Adam Nail

sname projName

Smith X

Smith Y

Adam Y

Walton Z

Adam X

partName projName

Bolt X

Nut Y

Bolt Y

Nut Z

Nail X

Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples

Desirable Properties in normalization

Relational schemas that are well-designed have several

important properties

1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or

relationship

2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins

3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations

Normal Forms in Practice

Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used

For example query execution time may increase because of normalization as more joins become necessary to answer queries

Normal Forms in Practice Example

For example street and city uniquely determine a zipcode

In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed

When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly

fd1

ENo City Street ZipCode

fd2

ENo ZipCode

ZipCode City Street

Normal Forms in Practice Summary

In summary normalization is typically used in two ways

To improve on relational schemas after ER design

To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies

There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance

This is the classic time versus space tradeoff

Conclusion

Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies

Normal forms indicate when relations satisfy certain properties

1NF - All attributes are atomic

2NF - All attributes are fully functionally dependent on a key

3NF - There are no transitive dependencies in the relation

4NF - There are no multi-valued dependencies in the relation

5NF - The relation has no join dependency

In practice normalization is used to improve schemas produced after ER design and existing relational schemas

It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time

Objectives

Explain process of normalization and why it is beneficial

Describe the concept of the Universal Relation

What are the motivations for normalization

What are the 3 types of update anomalies Be able to give examples

Describe the lossless-join property and dependency preservation property

Define and explain the concept of a functional dependency

What is a trivial FD

Describe the relationship between FDs and keys

What are Armstrongs Axioms When are they used

What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs

Objectives (2)

Define minimal and covers for a set of FDs

Define normal forms be able to list the normal forms and draw a diagram showing their relationship

Define 1NF Know how to convert non-1NF relation to 1NF

Define 2NF Convert 1NF relation to 2NF

Define 3NF Convert 2NF relation to 3NF

Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example

Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm

Describe multi-valued dependencies When do they occur

What is a trivial MVD

Define 4NF Convert 3NFBCNF relation to 4NF

Objectives (3)

Describe lossless-join property and how it relates to 5NF

Define 5NF

Explain normalization in terms of the time versus space tradeoff

Explain the relationship between normalization and ER modeling

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
Page 12: Software School of Hunan University 2006.08 Database Systems Design Part III Normalization

Modification Anomaly Example

Consider these modification anomalies

1) Change the name of department D3 to ldquoEngineeringrdquo You must update the department name in 3 different records

2) Change the manager of D1 to E4 You must update the superno field in 2 different records

ENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5

E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7

E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5

E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5

E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7

E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8

E8 J Jones 10-11-72 SA 50000 null D1 Management E8

Update Anomalies

bull There are three major types of update anomalies Insertion Anomalies - Insertion of a tuple into the r

elation either requires insertion of redundant information or cannot be performed without setting key values to NULL

Deletion Anomalies - Deletion of a tuple may lose information that is still required to be stored

Modification Anomalies - Changing an attribute of a tuple may require changing multiple attribute values in other tuples

Overview to Normalization

Normalization is a validation technique for producing relations with desirable properties

Normalization is a bottom-up design technique for producing relations

Normalization decomposes relations into smaller relations that contain less redundancy This decomposition requires reconstruction of the original relations from the smaller relations and no information is lost

It was developed by Codd in 1972 preceding ER modeling and extended by others over the years

Example Normalization to DeptEmp

bull DeptEmp Relation

ENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5

E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7

E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5

E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5

E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7

E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8

E8 J Jones 10-11-72 SA 50000 null D1 Management E8

Normalization to DeptEmp

Emp Relation Dept Relation

dno dname mgrno

D1 Management E8

D2 consulting E7

D3 Accounting E5

eno ename bdate title salary superNo dno

E1 J Doe 01-05-75 EE 30000 E2 null

E2 M Smith 06-04-66 SA 50000 E5 D3

E3 A Lee 07-05-66 ME 40000 E7 D2

E4 J Miller 09-01-50 PR 20000 E6 D3

E5 B Casey 12-25-71 SA 50000 E8 D3

E6 L Chu 11-30-65 EE 30000 E7 D2

E7 R Davis 09-08-77 ME 40000 E8 D1

E8 J Jones 10-11-72 SA 50000 null D1

Dept Emp= Emp dno Dept

Overview to Normalization

Input a relation schema and constraints on attributes

Output one or more relation schemas

Processing Idea examining the relationships between attributes and organize the data according to its functional dependencies( 函数依赖)

Significance

1) Validate our design correct our improper design

2) Help for understanding the nature and the meaning of the data in the enterprise for learning more about the data semantics

When to use normalization

Normalization can be used after mapping ER model into relation model or independently

Normalization may be especially useful for databases that have already been designed without using formal techniques

Theory basic of normalization

Functional Dependencies ( 函数依赖) Functional dependencies represent integrity constraints ( 完整

性约束) on the values of attributes in a relation and are used in normalization

A functional dependency (FD) describes the relationship between attributes in a relation We say a set of attributes X functionally determines an attribute Y if given the values of X we always know the only possible value of Y

Notation X rarr Y X functionally determines Y Y is functionally dependent on XExample

eno rarr ename eno pno rarr Duration

Notation for Functional Dependencies

A functional dependency has a left-hand side called the determinant which is a set of attributes and one attribute on the right-hand side

eno pno rarr duration

determinant determined attribute

Strictly speaking there is always only one attribute on the RHS but we can combine several functional dependencies into one

eno pno rarr durationeno pno rarr respeno pno rarr duration resp

Remember that this is really two functional dependencies

The Semantics ofFunctional Dependencies

Similar to keys Functional dependencies are a property of a relation schema (intension) and not a property of a particular instance of the schema (extension)

Functional dependencies are directional

eno rarr ename ename rarr eno

ExampleFDs in EmpDept relation

bull EmpDept RelationENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8E8 J Jones 10-11-72 SA 50000 null D1 Management E8

Eno rarr ename bdate title salary superno dno

dnorarrdname mgrno

titlerarr salary

Trivial Functional Dependencies ( 平凡函数依赖 )

A functional dependency is trivialif the attributes on its left-hand side are a superset of the attributes on its right-hand side

Examples eno rarr eno

eno ename rarr eno

eno pno duration rarr eno Duration

Trivial functional dependencies are not interesting because they do not tell us anything

We are only interested in nontrivial FDs

Full Functional Dependencies

A functional dependency A rarr B is a full functional dependency if removal of any attribute from A results in the dependency not existing any more We say that B is fully functionally dependent on A If remove an attribute from A and the functional dependency still exists we say that B is partially dependent on A

Example

eno rarr ename (full FD)

eno ename rarr salary title (partial FD - can remove ename)

eno pno rarr hours resp (full FD)

How to Identify FDs in a relation

Examining the semantics of the attributes of relation

Identify all non-trivial functional dependencies in the Universal Relation

Universal (eno pno resp duration ename bdate title salary superNo dno dname mgrNo pname budget)

Eno rarr ename bdate title salary superno dnodnorarrdname mgrnotitlerarr salary

Formal Definition ofFunctional Dependencies

Given relation R defined over U = A1 A2 An where X U Y U If for all pairs of tuples t1 and t2 in any legal instance of relation schema R

t1[X] = t2[X] implies t1[Y] = t2[Y]

then the functional dependency X rarr Y holds in R

Functional Dependencies and Keys

Functional dependencies can be used to determine the candidate keys of a relation

What is superkey candidate key primary key

For example if an attribute functionally determines all other attributes in the relation that attribute can be a key

eno rarr eno ename bdate title supereno dno eno is a candidate key for the Emp relation

Functional Dependencies and Keys

ENo rarr EName BDate Title SuperNo DNoDNo rarr DName MgreNotitle rarr salary

Question List the candidate key(s) of this relation

ENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5

E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7

E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5

E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5

E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7

E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8

E8 J Jones 10-11-72 SA 50000 null D1 Management E8

EmpDept relation

Applications of FDs

1) given a set of FDs F in relation R FD X rarr Y is true or not

2) given a set of FDs F in relation R a set of attributes X is a candidate key of R or not

3) given a set of FDs F in relation R which candidate keys are in R

4) given a set of FDs F in relation R some FDs in F is Excessive or not

Rules for reasoning about functional dependencies

Armstrongs Axioms are three inference rules for functional dependenciesLet A B C and D be subsets of the attributes of relation R

IR1 Reflexive Rule( 自反性 ) If AB then A rarr B

A set of attributes always trivially functionally determines any subset of itself

IR2 Augmentation Rule( 增加性 ) If A rarr B then AC rarr BC

Adding a set of attributes to both the LHS and RHS results in another valid functional dependency

IR3 Transitive Rule( 传递性 ) If A rarr B and B rarr C then A rarr C

Transitivity holds with FDs

Derived Rules from Armstrongs Axioms

There are three other well-known rules that can be derived from the basic three

IR4 Decomposition( 可分解性 ) If A rarr BC then A rarr B and A rarr C A FD with two attributes on RHS can be decomposed into two FDs each with on attribute on the RHS

IR5 Union( 合并性 ) If A rarr B and A rarr C then A rarr BC Reverse of decomposition rule

IR6 Composition( 组合性 ) If A rarr B and C rarr D then AC rarr BD More general than union - can combine two non-overlapping FDs into one

Characteristics of Armstrongs Axioms

The rules are sound( 健壮的 ) and complete( 完备的 )

1) Sound The rules do not infer( 推断 ) any incorrect FDs

2) Complete The rules can be used to infer all correct FDs (no FDs are missing)

The axioms (called IR1 IR2 and IR3) can be used to determine a set of all FDs F+ from an initial set of FDs F

F+ is referred to as the closure of F

Strategy of dealing with questions

The basic idea is that given any set of attributes X we can compute the set of all attributes X+ that can be functionally determined using F This is called the closure of X under F

For question 1 given a set of FDs F in R FD X rarr Y is true or not we know that X rarr Y holds if Y is in X+

For question 2 a set of attributes X is a candidate key of R or not X is a superkey of R if X+ is all attributes of R

For question 3 how to computing F+ we can compute X+ for all possible subsets X of R

Computing the Attribute Closure

The algorithm is as follows

Given a set of attributes X

Let X+ = X

Repeat Find a FD in F whose left side is a subset of X+ Add the right side of F to X+

Until (X+ does not change or X+ contains all attributes of R)

After the algorithm completes you have a set of attributes X+ that can be functionally determined by X This allows you to produce FDs of the form

X rarr A where A is in X+

Computing Attribute Closure Example

R (ABCDEG)

F = A rarr BC C rarr D D rarr G

Compute A+

A+ = A (initial step)

A+ = ABC (by using FD A rarr BC)

A+ = ABCD (by using FD C rarr D)

A+ = ABCDG (by using FD D rarr G)

Similarly we can compute C+ and EG+

C+ = CDG

EG+ = EG

Using Attribute Closure to Detect a FD

The attribute closure algorithm can be used to determine if a particular FD X rarr Y holds based on the set of given FDs F Compute X+ and see if Y is in X+

Example

R = (ABCDEG)

F = A rarr BC C rarr D D rarr G

Does the FD CD rarr G hold Compute CD+ = CDG Yes the FD CD rarr G holds

Does the FD BC rarr E hold Compute BC+ = BCDG No the FD BC rarr E does n

ot hold

Using Attribute Closure to Find all FDs

The attribute closure algorithm can be used to find all FDs of a given relation R if perform the attribute closure algorithm for all subsets X of the attributes R

This is an exponential algorithm in the number of attributes in R Although there are a few optimizations Do not compute the closure of the empty set or of the set of all attributes If we find X+ = all attributes do not compute the closure of any supersets of X Note that even though you do not have to compute the closure (because you know the answer) you still must use the known closure to infer new FDs

Using Attribute Closureto Find all FDs Example

Compute F+ from F where

R (ABCD)

F = A rarr BC C rarr D

Must compute all these attribute closures

A+ B+ C+ D+

AB+ AC+ AD+ BC+ BD+ CD+

ABC+ ABD+ ACD+ BCD+

Do not have to compute ABCD+

Start with the single sets

A+ = ABCD B+ = B C+ = CD D+ = D

New FD A rarr D

Using Attribute Closureto Find all FDs Example (cont)

Next the two element sets Do not compute AB+ AC+ AD+ because A+ contained all attributes of R New FDs AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C BC+ = BCD BD+ = BD CD+ = CD New FD BC rarr DThree element sets Do not compute ABC+ ABD+ ACD+ New FDs ABC rarr D ABD rarr C ACD rarr B BCD+ = BCD No new FDsAll FDs F+ = A rarr BC C rarr D A rarr D BC rarr D AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C ABC rarr D ABD rarrC ACD rarr B

FD Closure Questions

1) Compute all nontrivial FDs and list all keys of R

R = (ABCD)

F = AB rarr C C rarr D D rarr A

2) Compute all nontrivial FDs and list all candidate keys of R

R = (ABCD)

F = AB rarr C BC rarr D CD rarr A AD rarr B

Equivalent Sets of FDs

A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+ We can determine if F covers E by calculating X+ with respect to F for each FD X rarr Y in E and checking whether X+ includes Y

Two sets of FDs E and F are equivalent if E covers F and F covers E That is E+= F+

Minimal Set of FDs

A set of FDs F is minimal if it satisfies these conditions Every FD in F has a single attribute on its right-hand side We cannot replace any FD X rarr A in F with Y rarr A where Y

X and still have a set of dependencies that is equivalent to F

We cannot remove any FD and still have a set that is equivalent to F

A minimal sets of FDs is like a canonical( 规范的 ) form of FDs

Note that there may be many minimal covers for a given set of functional dependencies F

Normal Forms

A relation is in a particular normal form if it satisfies certain

normalization properties

There are several normal forms defined 1NF - First Normal Form 2NF - Second Normal Form 3NF - Third Normal Form BCNF - Boyce-Codd Normal Form 4NF - Fourth Normal Form 5NF - Fifth Normal Form

Each of these normal forms are stricter than the next

For example 3NF is better than 2NF because it removes more redundancyanomalies from the schema than 2NF

Normal Forms

All relations (non-normalized)

1NF (First Normal Form)

2NF (Second Normal Form)

3NF (Third Normal Form)

BCNF (Boyce-Codd NF)

4NF (Fourth NF)

5NF (Fifth NF)

First Normal Form (1NF)

A relation is in first normal form (1NF) if all its attribute values are atomic

That is a 1NF relation cannot have an attribute value that is a set of values (multi-valued attribute) a set of tuples (nested relation)

1NF is a standard assumption in relational DBMSs However object-oriented DBMSs and nested relational DBMSs relax t

his constraint

A relation that is not in 1NF is an unnormalized relation

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1P2

AnalystAnalyst

246

E3 A Lee P3P4

ConsultantEngineer

1048

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1 P2 Analyst Analyst 24 6

E3 A Lee P3P4 Consultant Engineer 10 48

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

A non-1NF Relation

One way to convert a non-1NF relation to a 1NF relation

1) Splitting Method - Divide the existing relation into two relations one with non-repeating attributes and the other with repeating attributes

Make a new relation consisting of the primary key of the original relation and the repeating attributes Determine a primary key for this new relation

Remove the repeating attributes from the original relation

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1P2

AnalystAnalyst

246

E3 A Lee P3P4

ConsultantEngineer

1048

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

ENo EName

E1 J Doe

E2 M Smith

E3 A Lee

E4 J Miller

E5 B Casey

E6 L Chu

E7 R Davis

ENo PNo Resp DurationE1 P1 Manager 12E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36

Second Normal Form (2NF)

A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key

If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation

Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key

Second Normal Form (2NF) Example

fd1 and fd4 are partial functional dependencies Normalize to

Emp (eno ename title bdate salary supereno dno)

WorksOn (eno pno resp hours)

Proj (pno pname budget)

ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration

EmpProj relation

fd1 fd2

fd3fd4

Second Normal Form (2NF) Example (2)

ENo EName

BDate Title Salary SuperNo

DNo

Emp relation

fd1

fd2

fd3fd4

PNo PName Budget

Proj relationENo PNo Resp Duration

WorkOn relation

Third Normal Form (3NF)

Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th

an 2 FDs

A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key

Converting a relation to 3NF from 2NF involves the removal of transitive dependencies

Third Normal Form (3NF) Example

fd2 results in a transitive dependency eno rarr salary Remove it

ENo EName BDate Title Salary SuperNo DNo

Emp relation

fd1

fd2

fd2

Title Salary

Pay relation

ENo EName BDate Title SuperNo DNo

Emp relation

fd1

General Definitions of 2NF and 3NF

We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)

General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key

General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key

Note that a prime attribute is an attribute that is in any key (candidate or primary)

General Definition of 3NF Example

The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute

However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key

Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key

ENo EName BDate Title Salary SuperNo DNo SSN

Emp relation

fd1

fd2

Normalization Question

Consider the universal relation

R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ

List the keys for R

Decompose R into 2NF and then 3NF relations

Boyce-Codd Normal Form (BCNF)

A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key

To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key

The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

ClientInterview relation

Relation is in 3NF but not BCNF

fd4

fd3

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

Note that we lose the FD fd2

fd4

fd3

ClientNo InterviewDate

InterviwTime

StaffNo StaffNo InterviwDate

RoomNo

Interview relation StaffRoom relation

BCNF versus 3NF

We can decompose to BCNF but sometimes we do not want to if we lose a FD

The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency

Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)

In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)

Conversion to BCNF

There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y

Note that if any FD in F+ violates BCNF then a FD in F violates BCNF

2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey

3) Replace R by relations with schemas R1 = X+

R2 = (R ndash X+) U X

4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib

utes are all in R1 or R2

Normalization Question

Given this database schema normalize into BCNF

fd1

PID Country Lot Area Price TaxRate

fd2

fd4fd3

Multi-Valued Dependencies

A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema

A MVD occurs when two independent 1N relationships are in the relational schema

Multi-Valued Dependencies Example

Eno ename Title Salary Pno dno

E1 J Doe EE 30000 P1P2

E2 M Smith SA 50000 P1P4 D1D2D3

E3 A Lee ME 40000 P1P3 D2D3

E4 J Miller PR 20000 P3 D3

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1

E1 J Doe EE 30000 P2

E2 M Smith SA 50000 P1 D1

E2 M Smith SA 50000 P1 D2

E2 M Smith SA 50000 P1 D3

E2 M Smith SA 50000 P4 D1

E2 M Smith SA 50000 P4 D2

E2 M Smith SA 50000 P4 D3

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P3 D3

E3 A Lee ME 40000 P3 D3

E4 J Miller PR 20000 P3 D3

Multi-Valued Dependencies (MVDs)

A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C

A trivial MVD A mdashgtgt B occurs when either B is a subset of A or

A cup B = R

Fourth Normal Form (4NF)

Fourth normal form (4NF) is based on the idea of multi-valued dependencies

A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies

Fourth Normal Form (4NF) Example

eno mdashgtgtpno

eno mdashgtgtdno

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3

eno ename title salary

E1 J Doe EE 30000

E2 MSmith SA 50000

E3 A Lee ME 40000

E4 J Miller PR 20000

eno pno

E1 P1

E1 P2

E2 P1

E2 P4

E3 P1

E3 P3

E4 P3

eno dno

E2 D1

E2 D2

E2 D3

E3 D2

E3 D3

E4 D3

Lossless-join Dependency

The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation

A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined

Fifth Normal Form (5NF)

bull Fifth normal form (5NF) is based on join dependencies

A relation is in fifth normal form (5NF) if the relation has no join dependency

bull 5NF is also called project-join normal form (PJNF)

A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R

The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have

ΠR1(r) ΠR2(r) hellip ΠRn(r) = r

Fifth Normal Form (5NF) Example

Consider a relation Supply (sname partName projName)

1048715Add the additional constraint that

If project j requires part p

and supplier s supplies part p

and supplier s supplies at least one item to project j

Then supplier s also supplies part p to project j

Original Supply Relation Supply Relation with new constraintsname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

Fifth Normal Form (5NF) Example

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

sname partName

Smith Bolt

Smith Nut

Adam Bolt

Walton Nut

Adam Nail

sname projName

Smith X

Smith Y

Adam Y

Walton Z

Adam X

partName projName

Bolt X

Nut Y

Bolt Y

Nut Z

Nail X

Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples

Desirable Properties in normalization

Relational schemas that are well-designed have several

important properties

1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or

relationship

2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins

3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations

Normal Forms in Practice

Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used

For example query execution time may increase because of normalization as more joins become necessary to answer queries

Normal Forms in Practice Example

For example street and city uniquely determine a zipcode

In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed

When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly

fd1

ENo City Street ZipCode

fd2

ENo ZipCode

ZipCode City Street

Normal Forms in Practice Summary

In summary normalization is typically used in two ways

To improve on relational schemas after ER design

To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies

There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance

This is the classic time versus space tradeoff

Conclusion

Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies

Normal forms indicate when relations satisfy certain properties

1NF - All attributes are atomic

2NF - All attributes are fully functionally dependent on a key

3NF - There are no transitive dependencies in the relation

4NF - There are no multi-valued dependencies in the relation

5NF - The relation has no join dependency

In practice normalization is used to improve schemas produced after ER design and existing relational schemas

It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time

Objectives

Explain process of normalization and why it is beneficial

Describe the concept of the Universal Relation

What are the motivations for normalization

What are the 3 types of update anomalies Be able to give examples

Describe the lossless-join property and dependency preservation property

Define and explain the concept of a functional dependency

What is a trivial FD

Describe the relationship between FDs and keys

What are Armstrongs Axioms When are they used

What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs

Objectives (2)

Define minimal and covers for a set of FDs

Define normal forms be able to list the normal forms and draw a diagram showing their relationship

Define 1NF Know how to convert non-1NF relation to 1NF

Define 2NF Convert 1NF relation to 2NF

Define 3NF Convert 2NF relation to 3NF

Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example

Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm

Describe multi-valued dependencies When do they occur

What is a trivial MVD

Define 4NF Convert 3NFBCNF relation to 4NF

Objectives (3)

Describe lossless-join property and how it relates to 5NF

Define 5NF

Explain normalization in terms of the time versus space tradeoff

Explain the relationship between normalization and ER modeling

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
Page 13: Software School of Hunan University 2006.08 Database Systems Design Part III Normalization

Update Anomalies

bull There are three major types of update anomalies Insertion Anomalies - Insertion of a tuple into the r

elation either requires insertion of redundant information or cannot be performed without setting key values to NULL

Deletion Anomalies - Deletion of a tuple may lose information that is still required to be stored

Modification Anomalies - Changing an attribute of a tuple may require changing multiple attribute values in other tuples

Overview to Normalization

Normalization is a validation technique for producing relations with desirable properties

Normalization is a bottom-up design technique for producing relations

Normalization decomposes relations into smaller relations that contain less redundancy This decomposition requires reconstruction of the original relations from the smaller relations and no information is lost

It was developed by Codd in 1972 preceding ER modeling and extended by others over the years

Example Normalization to DeptEmp

bull DeptEmp Relation

ENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5

E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7

E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5

E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5

E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7

E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8

E8 J Jones 10-11-72 SA 50000 null D1 Management E8

Normalization to DeptEmp

Emp Relation Dept Relation

dno dname mgrno

D1 Management E8

D2 consulting E7

D3 Accounting E5

eno ename bdate title salary superNo dno

E1 J Doe 01-05-75 EE 30000 E2 null

E2 M Smith 06-04-66 SA 50000 E5 D3

E3 A Lee 07-05-66 ME 40000 E7 D2

E4 J Miller 09-01-50 PR 20000 E6 D3

E5 B Casey 12-25-71 SA 50000 E8 D3

E6 L Chu 11-30-65 EE 30000 E7 D2

E7 R Davis 09-08-77 ME 40000 E8 D1

E8 J Jones 10-11-72 SA 50000 null D1

Dept Emp= Emp dno Dept

Overview to Normalization

Input a relation schema and constraints on attributes

Output one or more relation schemas

Processing Idea examining the relationships between attributes and organize the data according to its functional dependencies( 函数依赖)

Significance

1) Validate our design correct our improper design

2) Help for understanding the nature and the meaning of the data in the enterprise for learning more about the data semantics

When to use normalization

Normalization can be used after mapping ER model into relation model or independently

Normalization may be especially useful for databases that have already been designed without using formal techniques

Theory basic of normalization

Functional Dependencies ( 函数依赖) Functional dependencies represent integrity constraints ( 完整

性约束) on the values of attributes in a relation and are used in normalization

A functional dependency (FD) describes the relationship between attributes in a relation We say a set of attributes X functionally determines an attribute Y if given the values of X we always know the only possible value of Y

Notation X rarr Y X functionally determines Y Y is functionally dependent on XExample

eno rarr ename eno pno rarr Duration

Notation for Functional Dependencies

A functional dependency has a left-hand side called the determinant which is a set of attributes and one attribute on the right-hand side

eno pno rarr duration

determinant determined attribute

Strictly speaking there is always only one attribute on the RHS but we can combine several functional dependencies into one

eno pno rarr durationeno pno rarr respeno pno rarr duration resp

Remember that this is really two functional dependencies

The Semantics ofFunctional Dependencies

Similar to keys Functional dependencies are a property of a relation schema (intension) and not a property of a particular instance of the schema (extension)

Functional dependencies are directional

eno rarr ename ename rarr eno

ExampleFDs in EmpDept relation

bull EmpDept RelationENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8E8 J Jones 10-11-72 SA 50000 null D1 Management E8

Eno rarr ename bdate title salary superno dno

dnorarrdname mgrno

titlerarr salary

Trivial Functional Dependencies ( 平凡函数依赖 )

A functional dependency is trivialif the attributes on its left-hand side are a superset of the attributes on its right-hand side

Examples eno rarr eno

eno ename rarr eno

eno pno duration rarr eno Duration

Trivial functional dependencies are not interesting because they do not tell us anything

We are only interested in nontrivial FDs

Full Functional Dependencies

A functional dependency A rarr B is a full functional dependency if removal of any attribute from A results in the dependency not existing any more We say that B is fully functionally dependent on A If remove an attribute from A and the functional dependency still exists we say that B is partially dependent on A

Example

eno rarr ename (full FD)

eno ename rarr salary title (partial FD - can remove ename)

eno pno rarr hours resp (full FD)

How to Identify FDs in a relation

Examining the semantics of the attributes of relation

Identify all non-trivial functional dependencies in the Universal Relation

Universal (eno pno resp duration ename bdate title salary superNo dno dname mgrNo pname budget)

Eno rarr ename bdate title salary superno dnodnorarrdname mgrnotitlerarr salary

Formal Definition ofFunctional Dependencies

Given relation R defined over U = A1 A2 An where X U Y U If for all pairs of tuples t1 and t2 in any legal instance of relation schema R

t1[X] = t2[X] implies t1[Y] = t2[Y]

then the functional dependency X rarr Y holds in R

Functional Dependencies and Keys

Functional dependencies can be used to determine the candidate keys of a relation

What is superkey candidate key primary key

For example if an attribute functionally determines all other attributes in the relation that attribute can be a key

eno rarr eno ename bdate title supereno dno eno is a candidate key for the Emp relation

Functional Dependencies and Keys

ENo rarr EName BDate Title SuperNo DNoDNo rarr DName MgreNotitle rarr salary

Question List the candidate key(s) of this relation

ENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5

E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7

E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5

E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5

E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7

E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8

E8 J Jones 10-11-72 SA 50000 null D1 Management E8

EmpDept relation

Applications of FDs

1) given a set of FDs F in relation R FD X rarr Y is true or not

2) given a set of FDs F in relation R a set of attributes X is a candidate key of R or not

3) given a set of FDs F in relation R which candidate keys are in R

4) given a set of FDs F in relation R some FDs in F is Excessive or not

Rules for reasoning about functional dependencies

Armstrongs Axioms are three inference rules for functional dependenciesLet A B C and D be subsets of the attributes of relation R

IR1 Reflexive Rule( 自反性 ) If AB then A rarr B

A set of attributes always trivially functionally determines any subset of itself

IR2 Augmentation Rule( 增加性 ) If A rarr B then AC rarr BC

Adding a set of attributes to both the LHS and RHS results in another valid functional dependency

IR3 Transitive Rule( 传递性 ) If A rarr B and B rarr C then A rarr C

Transitivity holds with FDs

Derived Rules from Armstrongs Axioms

There are three other well-known rules that can be derived from the basic three

IR4 Decomposition( 可分解性 ) If A rarr BC then A rarr B and A rarr C A FD with two attributes on RHS can be decomposed into two FDs each with on attribute on the RHS

IR5 Union( 合并性 ) If A rarr B and A rarr C then A rarr BC Reverse of decomposition rule

IR6 Composition( 组合性 ) If A rarr B and C rarr D then AC rarr BD More general than union - can combine two non-overlapping FDs into one

Characteristics of Armstrongs Axioms

The rules are sound( 健壮的 ) and complete( 完备的 )

1) Sound The rules do not infer( 推断 ) any incorrect FDs

2) Complete The rules can be used to infer all correct FDs (no FDs are missing)

The axioms (called IR1 IR2 and IR3) can be used to determine a set of all FDs F+ from an initial set of FDs F

F+ is referred to as the closure of F

Strategy of dealing with questions

The basic idea is that given any set of attributes X we can compute the set of all attributes X+ that can be functionally determined using F This is called the closure of X under F

For question 1 given a set of FDs F in R FD X rarr Y is true or not we know that X rarr Y holds if Y is in X+

For question 2 a set of attributes X is a candidate key of R or not X is a superkey of R if X+ is all attributes of R

For question 3 how to computing F+ we can compute X+ for all possible subsets X of R

Computing the Attribute Closure

The algorithm is as follows

Given a set of attributes X

Let X+ = X

Repeat Find a FD in F whose left side is a subset of X+ Add the right side of F to X+

Until (X+ does not change or X+ contains all attributes of R)

After the algorithm completes you have a set of attributes X+ that can be functionally determined by X This allows you to produce FDs of the form

X rarr A where A is in X+

Computing Attribute Closure Example

R (ABCDEG)

F = A rarr BC C rarr D D rarr G

Compute A+

A+ = A (initial step)

A+ = ABC (by using FD A rarr BC)

A+ = ABCD (by using FD C rarr D)

A+ = ABCDG (by using FD D rarr G)

Similarly we can compute C+ and EG+

C+ = CDG

EG+ = EG

Using Attribute Closure to Detect a FD

The attribute closure algorithm can be used to determine if a particular FD X rarr Y holds based on the set of given FDs F Compute X+ and see if Y is in X+

Example

R = (ABCDEG)

F = A rarr BC C rarr D D rarr G

Does the FD CD rarr G hold Compute CD+ = CDG Yes the FD CD rarr G holds

Does the FD BC rarr E hold Compute BC+ = BCDG No the FD BC rarr E does n

ot hold

Using Attribute Closure to Find all FDs

The attribute closure algorithm can be used to find all FDs of a given relation R if perform the attribute closure algorithm for all subsets X of the attributes R

This is an exponential algorithm in the number of attributes in R Although there are a few optimizations Do not compute the closure of the empty set or of the set of all attributes If we find X+ = all attributes do not compute the closure of any supersets of X Note that even though you do not have to compute the closure (because you know the answer) you still must use the known closure to infer new FDs

Using Attribute Closureto Find all FDs Example

Compute F+ from F where

R (ABCD)

F = A rarr BC C rarr D

Must compute all these attribute closures

A+ B+ C+ D+

AB+ AC+ AD+ BC+ BD+ CD+

ABC+ ABD+ ACD+ BCD+

Do not have to compute ABCD+

Start with the single sets

A+ = ABCD B+ = B C+ = CD D+ = D

New FD A rarr D

Using Attribute Closureto Find all FDs Example (cont)

Next the two element sets Do not compute AB+ AC+ AD+ because A+ contained all attributes of R New FDs AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C BC+ = BCD BD+ = BD CD+ = CD New FD BC rarr DThree element sets Do not compute ABC+ ABD+ ACD+ New FDs ABC rarr D ABD rarr C ACD rarr B BCD+ = BCD No new FDsAll FDs F+ = A rarr BC C rarr D A rarr D BC rarr D AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C ABC rarr D ABD rarrC ACD rarr B

FD Closure Questions

1) Compute all nontrivial FDs and list all keys of R

R = (ABCD)

F = AB rarr C C rarr D D rarr A

2) Compute all nontrivial FDs and list all candidate keys of R

R = (ABCD)

F = AB rarr C BC rarr D CD rarr A AD rarr B

Equivalent Sets of FDs

A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+ We can determine if F covers E by calculating X+ with respect to F for each FD X rarr Y in E and checking whether X+ includes Y

Two sets of FDs E and F are equivalent if E covers F and F covers E That is E+= F+

Minimal Set of FDs

A set of FDs F is minimal if it satisfies these conditions Every FD in F has a single attribute on its right-hand side We cannot replace any FD X rarr A in F with Y rarr A where Y

X and still have a set of dependencies that is equivalent to F

We cannot remove any FD and still have a set that is equivalent to F

A minimal sets of FDs is like a canonical( 规范的 ) form of FDs

Note that there may be many minimal covers for a given set of functional dependencies F

Normal Forms

A relation is in a particular normal form if it satisfies certain

normalization properties

There are several normal forms defined 1NF - First Normal Form 2NF - Second Normal Form 3NF - Third Normal Form BCNF - Boyce-Codd Normal Form 4NF - Fourth Normal Form 5NF - Fifth Normal Form

Each of these normal forms are stricter than the next

For example 3NF is better than 2NF because it removes more redundancyanomalies from the schema than 2NF

Normal Forms

All relations (non-normalized)

1NF (First Normal Form)

2NF (Second Normal Form)

3NF (Third Normal Form)

BCNF (Boyce-Codd NF)

4NF (Fourth NF)

5NF (Fifth NF)

First Normal Form (1NF)

A relation is in first normal form (1NF) if all its attribute values are atomic

That is a 1NF relation cannot have an attribute value that is a set of values (multi-valued attribute) a set of tuples (nested relation)

1NF is a standard assumption in relational DBMSs However object-oriented DBMSs and nested relational DBMSs relax t

his constraint

A relation that is not in 1NF is an unnormalized relation

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1P2

AnalystAnalyst

246

E3 A Lee P3P4

ConsultantEngineer

1048

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1 P2 Analyst Analyst 24 6

E3 A Lee P3P4 Consultant Engineer 10 48

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

A non-1NF Relation

One way to convert a non-1NF relation to a 1NF relation

1) Splitting Method - Divide the existing relation into two relations one with non-repeating attributes and the other with repeating attributes

Make a new relation consisting of the primary key of the original relation and the repeating attributes Determine a primary key for this new relation

Remove the repeating attributes from the original relation

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1P2

AnalystAnalyst

246

E3 A Lee P3P4

ConsultantEngineer

1048

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

ENo EName

E1 J Doe

E2 M Smith

E3 A Lee

E4 J Miller

E5 B Casey

E6 L Chu

E7 R Davis

ENo PNo Resp DurationE1 P1 Manager 12E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36

Second Normal Form (2NF)

A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key

If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation

Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key

Second Normal Form (2NF) Example

fd1 and fd4 are partial functional dependencies Normalize to

Emp (eno ename title bdate salary supereno dno)

WorksOn (eno pno resp hours)

Proj (pno pname budget)

ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration

EmpProj relation

fd1 fd2

fd3fd4

Second Normal Form (2NF) Example (2)

ENo EName

BDate Title Salary SuperNo

DNo

Emp relation

fd1

fd2

fd3fd4

PNo PName Budget

Proj relationENo PNo Resp Duration

WorkOn relation

Third Normal Form (3NF)

Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th

an 2 FDs

A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key

Converting a relation to 3NF from 2NF involves the removal of transitive dependencies

Third Normal Form (3NF) Example

fd2 results in a transitive dependency eno rarr salary Remove it

ENo EName BDate Title Salary SuperNo DNo

Emp relation

fd1

fd2

fd2

Title Salary

Pay relation

ENo EName BDate Title SuperNo DNo

Emp relation

fd1

General Definitions of 2NF and 3NF

We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)

General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key

General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key

Note that a prime attribute is an attribute that is in any key (candidate or primary)

General Definition of 3NF Example

The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute

However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key

Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key

ENo EName BDate Title Salary SuperNo DNo SSN

Emp relation

fd1

fd2

Normalization Question

Consider the universal relation

R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ

List the keys for R

Decompose R into 2NF and then 3NF relations

Boyce-Codd Normal Form (BCNF)

A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key

To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key

The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

ClientInterview relation

Relation is in 3NF but not BCNF

fd4

fd3

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

Note that we lose the FD fd2

fd4

fd3

ClientNo InterviewDate

InterviwTime

StaffNo StaffNo InterviwDate

RoomNo

Interview relation StaffRoom relation

BCNF versus 3NF

We can decompose to BCNF but sometimes we do not want to if we lose a FD

The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency

Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)

In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)

Conversion to BCNF

There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y

Note that if any FD in F+ violates BCNF then a FD in F violates BCNF

2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey

3) Replace R by relations with schemas R1 = X+

R2 = (R ndash X+) U X

4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib

utes are all in R1 or R2

Normalization Question

Given this database schema normalize into BCNF

fd1

PID Country Lot Area Price TaxRate

fd2

fd4fd3

Multi-Valued Dependencies

A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema

A MVD occurs when two independent 1N relationships are in the relational schema

Multi-Valued Dependencies Example

Eno ename Title Salary Pno dno

E1 J Doe EE 30000 P1P2

E2 M Smith SA 50000 P1P4 D1D2D3

E3 A Lee ME 40000 P1P3 D2D3

E4 J Miller PR 20000 P3 D3

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1

E1 J Doe EE 30000 P2

E2 M Smith SA 50000 P1 D1

E2 M Smith SA 50000 P1 D2

E2 M Smith SA 50000 P1 D3

E2 M Smith SA 50000 P4 D1

E2 M Smith SA 50000 P4 D2

E2 M Smith SA 50000 P4 D3

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P3 D3

E3 A Lee ME 40000 P3 D3

E4 J Miller PR 20000 P3 D3

Multi-Valued Dependencies (MVDs)

A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C

A trivial MVD A mdashgtgt B occurs when either B is a subset of A or

A cup B = R

Fourth Normal Form (4NF)

Fourth normal form (4NF) is based on the idea of multi-valued dependencies

A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies

Fourth Normal Form (4NF) Example

eno mdashgtgtpno

eno mdashgtgtdno

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3

eno ename title salary

E1 J Doe EE 30000

E2 MSmith SA 50000

E3 A Lee ME 40000

E4 J Miller PR 20000

eno pno

E1 P1

E1 P2

E2 P1

E2 P4

E3 P1

E3 P3

E4 P3

eno dno

E2 D1

E2 D2

E2 D3

E3 D2

E3 D3

E4 D3

Lossless-join Dependency

The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation

A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined

Fifth Normal Form (5NF)

bull Fifth normal form (5NF) is based on join dependencies

A relation is in fifth normal form (5NF) if the relation has no join dependency

bull 5NF is also called project-join normal form (PJNF)

A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R

The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have

ΠR1(r) ΠR2(r) hellip ΠRn(r) = r

Fifth Normal Form (5NF) Example

Consider a relation Supply (sname partName projName)

1048715Add the additional constraint that

If project j requires part p

and supplier s supplies part p

and supplier s supplies at least one item to project j

Then supplier s also supplies part p to project j

Original Supply Relation Supply Relation with new constraintsname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

Fifth Normal Form (5NF) Example

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

sname partName

Smith Bolt

Smith Nut

Adam Bolt

Walton Nut

Adam Nail

sname projName

Smith X

Smith Y

Adam Y

Walton Z

Adam X

partName projName

Bolt X

Nut Y

Bolt Y

Nut Z

Nail X

Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples

Desirable Properties in normalization

Relational schemas that are well-designed have several

important properties

1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or

relationship

2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins

3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations

Normal Forms in Practice

Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used

For example query execution time may increase because of normalization as more joins become necessary to answer queries

Normal Forms in Practice Example

For example street and city uniquely determine a zipcode

In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed

When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly

fd1

ENo City Street ZipCode

fd2

ENo ZipCode

ZipCode City Street

Normal Forms in Practice Summary

In summary normalization is typically used in two ways

To improve on relational schemas after ER design

To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies

There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance

This is the classic time versus space tradeoff

Conclusion

Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies

Normal forms indicate when relations satisfy certain properties

1NF - All attributes are atomic

2NF - All attributes are fully functionally dependent on a key

3NF - There are no transitive dependencies in the relation

4NF - There are no multi-valued dependencies in the relation

5NF - The relation has no join dependency

In practice normalization is used to improve schemas produced after ER design and existing relational schemas

It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time

Objectives

Explain process of normalization and why it is beneficial

Describe the concept of the Universal Relation

What are the motivations for normalization

What are the 3 types of update anomalies Be able to give examples

Describe the lossless-join property and dependency preservation property

Define and explain the concept of a functional dependency

What is a trivial FD

Describe the relationship between FDs and keys

What are Armstrongs Axioms When are they used

What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs

Objectives (2)

Define minimal and covers for a set of FDs

Define normal forms be able to list the normal forms and draw a diagram showing their relationship

Define 1NF Know how to convert non-1NF relation to 1NF

Define 2NF Convert 1NF relation to 2NF

Define 3NF Convert 2NF relation to 3NF

Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example

Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm

Describe multi-valued dependencies When do they occur

What is a trivial MVD

Define 4NF Convert 3NFBCNF relation to 4NF

Objectives (3)

Describe lossless-join property and how it relates to 5NF

Define 5NF

Explain normalization in terms of the time versus space tradeoff

Explain the relationship between normalization and ER modeling

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
Page 14: Software School of Hunan University 2006.08 Database Systems Design Part III Normalization

Overview to Normalization

Normalization is a validation technique for producing relations with desirable properties

Normalization is a bottom-up design technique for producing relations

Normalization decomposes relations into smaller relations that contain less redundancy This decomposition requires reconstruction of the original relations from the smaller relations and no information is lost

It was developed by Codd in 1972 preceding ER modeling and extended by others over the years

Example Normalization to DeptEmp

bull DeptEmp Relation

ENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5

E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7

E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5

E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5

E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7

E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8

E8 J Jones 10-11-72 SA 50000 null D1 Management E8

Normalization to DeptEmp

Emp Relation Dept Relation

dno dname mgrno

D1 Management E8

D2 consulting E7

D3 Accounting E5

eno ename bdate title salary superNo dno

E1 J Doe 01-05-75 EE 30000 E2 null

E2 M Smith 06-04-66 SA 50000 E5 D3

E3 A Lee 07-05-66 ME 40000 E7 D2

E4 J Miller 09-01-50 PR 20000 E6 D3

E5 B Casey 12-25-71 SA 50000 E8 D3

E6 L Chu 11-30-65 EE 30000 E7 D2

E7 R Davis 09-08-77 ME 40000 E8 D1

E8 J Jones 10-11-72 SA 50000 null D1

Dept Emp= Emp dno Dept

Overview to Normalization

Input a relation schema and constraints on attributes

Output one or more relation schemas

Processing Idea examining the relationships between attributes and organize the data according to its functional dependencies( 函数依赖)

Significance

1) Validate our design correct our improper design

2) Help for understanding the nature and the meaning of the data in the enterprise for learning more about the data semantics

When to use normalization

Normalization can be used after mapping ER model into relation model or independently

Normalization may be especially useful for databases that have already been designed without using formal techniques

Theory basic of normalization

Functional Dependencies ( 函数依赖) Functional dependencies represent integrity constraints ( 完整

性约束) on the values of attributes in a relation and are used in normalization

A functional dependency (FD) describes the relationship between attributes in a relation We say a set of attributes X functionally determines an attribute Y if given the values of X we always know the only possible value of Y

Notation X rarr Y X functionally determines Y Y is functionally dependent on XExample

eno rarr ename eno pno rarr Duration

Notation for Functional Dependencies

A functional dependency has a left-hand side called the determinant which is a set of attributes and one attribute on the right-hand side

eno pno rarr duration

determinant determined attribute

Strictly speaking there is always only one attribute on the RHS but we can combine several functional dependencies into one

eno pno rarr durationeno pno rarr respeno pno rarr duration resp

Remember that this is really two functional dependencies

The Semantics ofFunctional Dependencies

Similar to keys Functional dependencies are a property of a relation schema (intension) and not a property of a particular instance of the schema (extension)

Functional dependencies are directional

eno rarr ename ename rarr eno

ExampleFDs in EmpDept relation

bull EmpDept RelationENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8E8 J Jones 10-11-72 SA 50000 null D1 Management E8

Eno rarr ename bdate title salary superno dno

dnorarrdname mgrno

titlerarr salary

Trivial Functional Dependencies ( 平凡函数依赖 )

A functional dependency is trivialif the attributes on its left-hand side are a superset of the attributes on its right-hand side

Examples eno rarr eno

eno ename rarr eno

eno pno duration rarr eno Duration

Trivial functional dependencies are not interesting because they do not tell us anything

We are only interested in nontrivial FDs

Full Functional Dependencies

A functional dependency A rarr B is a full functional dependency if removal of any attribute from A results in the dependency not existing any more We say that B is fully functionally dependent on A If remove an attribute from A and the functional dependency still exists we say that B is partially dependent on A

Example

eno rarr ename (full FD)

eno ename rarr salary title (partial FD - can remove ename)

eno pno rarr hours resp (full FD)

How to Identify FDs in a relation

Examining the semantics of the attributes of relation

Identify all non-trivial functional dependencies in the Universal Relation

Universal (eno pno resp duration ename bdate title salary superNo dno dname mgrNo pname budget)

Eno rarr ename bdate title salary superno dnodnorarrdname mgrnotitlerarr salary

Formal Definition ofFunctional Dependencies

Given relation R defined over U = A1 A2 An where X U Y U If for all pairs of tuples t1 and t2 in any legal instance of relation schema R

t1[X] = t2[X] implies t1[Y] = t2[Y]

then the functional dependency X rarr Y holds in R

Functional Dependencies and Keys

Functional dependencies can be used to determine the candidate keys of a relation

What is superkey candidate key primary key

For example if an attribute functionally determines all other attributes in the relation that attribute can be a key

eno rarr eno ename bdate title supereno dno eno is a candidate key for the Emp relation

Functional Dependencies and Keys

ENo rarr EName BDate Title SuperNo DNoDNo rarr DName MgreNotitle rarr salary

Question List the candidate key(s) of this relation

ENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5

E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7

E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5

E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5

E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7

E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8

E8 J Jones 10-11-72 SA 50000 null D1 Management E8

EmpDept relation

Applications of FDs

1) given a set of FDs F in relation R FD X rarr Y is true or not

2) given a set of FDs F in relation R a set of attributes X is a candidate key of R or not

3) given a set of FDs F in relation R which candidate keys are in R

4) given a set of FDs F in relation R some FDs in F is Excessive or not

Rules for reasoning about functional dependencies

Armstrongs Axioms are three inference rules for functional dependenciesLet A B C and D be subsets of the attributes of relation R

IR1 Reflexive Rule( 自反性 ) If AB then A rarr B

A set of attributes always trivially functionally determines any subset of itself

IR2 Augmentation Rule( 增加性 ) If A rarr B then AC rarr BC

Adding a set of attributes to both the LHS and RHS results in another valid functional dependency

IR3 Transitive Rule( 传递性 ) If A rarr B and B rarr C then A rarr C

Transitivity holds with FDs

Derived Rules from Armstrongs Axioms

There are three other well-known rules that can be derived from the basic three

IR4 Decomposition( 可分解性 ) If A rarr BC then A rarr B and A rarr C A FD with two attributes on RHS can be decomposed into two FDs each with on attribute on the RHS

IR5 Union( 合并性 ) If A rarr B and A rarr C then A rarr BC Reverse of decomposition rule

IR6 Composition( 组合性 ) If A rarr B and C rarr D then AC rarr BD More general than union - can combine two non-overlapping FDs into one

Characteristics of Armstrongs Axioms

The rules are sound( 健壮的 ) and complete( 完备的 )

1) Sound The rules do not infer( 推断 ) any incorrect FDs

2) Complete The rules can be used to infer all correct FDs (no FDs are missing)

The axioms (called IR1 IR2 and IR3) can be used to determine a set of all FDs F+ from an initial set of FDs F

F+ is referred to as the closure of F

Strategy of dealing with questions

The basic idea is that given any set of attributes X we can compute the set of all attributes X+ that can be functionally determined using F This is called the closure of X under F

For question 1 given a set of FDs F in R FD X rarr Y is true or not we know that X rarr Y holds if Y is in X+

For question 2 a set of attributes X is a candidate key of R or not X is a superkey of R if X+ is all attributes of R

For question 3 how to computing F+ we can compute X+ for all possible subsets X of R

Computing the Attribute Closure

The algorithm is as follows

Given a set of attributes X

Let X+ = X

Repeat Find a FD in F whose left side is a subset of X+ Add the right side of F to X+

Until (X+ does not change or X+ contains all attributes of R)

After the algorithm completes you have a set of attributes X+ that can be functionally determined by X This allows you to produce FDs of the form

X rarr A where A is in X+

Computing Attribute Closure Example

R (ABCDEG)

F = A rarr BC C rarr D D rarr G

Compute A+

A+ = A (initial step)

A+ = ABC (by using FD A rarr BC)

A+ = ABCD (by using FD C rarr D)

A+ = ABCDG (by using FD D rarr G)

Similarly we can compute C+ and EG+

C+ = CDG

EG+ = EG

Using Attribute Closure to Detect a FD

The attribute closure algorithm can be used to determine if a particular FD X rarr Y holds based on the set of given FDs F Compute X+ and see if Y is in X+

Example

R = (ABCDEG)

F = A rarr BC C rarr D D rarr G

Does the FD CD rarr G hold Compute CD+ = CDG Yes the FD CD rarr G holds

Does the FD BC rarr E hold Compute BC+ = BCDG No the FD BC rarr E does n

ot hold

Using Attribute Closure to Find all FDs

The attribute closure algorithm can be used to find all FDs of a given relation R if perform the attribute closure algorithm for all subsets X of the attributes R

This is an exponential algorithm in the number of attributes in R Although there are a few optimizations Do not compute the closure of the empty set or of the set of all attributes If we find X+ = all attributes do not compute the closure of any supersets of X Note that even though you do not have to compute the closure (because you know the answer) you still must use the known closure to infer new FDs

Using Attribute Closureto Find all FDs Example

Compute F+ from F where

R (ABCD)

F = A rarr BC C rarr D

Must compute all these attribute closures

A+ B+ C+ D+

AB+ AC+ AD+ BC+ BD+ CD+

ABC+ ABD+ ACD+ BCD+

Do not have to compute ABCD+

Start with the single sets

A+ = ABCD B+ = B C+ = CD D+ = D

New FD A rarr D

Using Attribute Closureto Find all FDs Example (cont)

Next the two element sets Do not compute AB+ AC+ AD+ because A+ contained all attributes of R New FDs AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C BC+ = BCD BD+ = BD CD+ = CD New FD BC rarr DThree element sets Do not compute ABC+ ABD+ ACD+ New FDs ABC rarr D ABD rarr C ACD rarr B BCD+ = BCD No new FDsAll FDs F+ = A rarr BC C rarr D A rarr D BC rarr D AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C ABC rarr D ABD rarrC ACD rarr B

FD Closure Questions

1) Compute all nontrivial FDs and list all keys of R

R = (ABCD)

F = AB rarr C C rarr D D rarr A

2) Compute all nontrivial FDs and list all candidate keys of R

R = (ABCD)

F = AB rarr C BC rarr D CD rarr A AD rarr B

Equivalent Sets of FDs

A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+ We can determine if F covers E by calculating X+ with respect to F for each FD X rarr Y in E and checking whether X+ includes Y

Two sets of FDs E and F are equivalent if E covers F and F covers E That is E+= F+

Minimal Set of FDs

A set of FDs F is minimal if it satisfies these conditions Every FD in F has a single attribute on its right-hand side We cannot replace any FD X rarr A in F with Y rarr A where Y

X and still have a set of dependencies that is equivalent to F

We cannot remove any FD and still have a set that is equivalent to F

A minimal sets of FDs is like a canonical( 规范的 ) form of FDs

Note that there may be many minimal covers for a given set of functional dependencies F

Normal Forms

A relation is in a particular normal form if it satisfies certain

normalization properties

There are several normal forms defined 1NF - First Normal Form 2NF - Second Normal Form 3NF - Third Normal Form BCNF - Boyce-Codd Normal Form 4NF - Fourth Normal Form 5NF - Fifth Normal Form

Each of these normal forms are stricter than the next

For example 3NF is better than 2NF because it removes more redundancyanomalies from the schema than 2NF

Normal Forms

All relations (non-normalized)

1NF (First Normal Form)

2NF (Second Normal Form)

3NF (Third Normal Form)

BCNF (Boyce-Codd NF)

4NF (Fourth NF)

5NF (Fifth NF)

First Normal Form (1NF)

A relation is in first normal form (1NF) if all its attribute values are atomic

That is a 1NF relation cannot have an attribute value that is a set of values (multi-valued attribute) a set of tuples (nested relation)

1NF is a standard assumption in relational DBMSs However object-oriented DBMSs and nested relational DBMSs relax t

his constraint

A relation that is not in 1NF is an unnormalized relation

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1P2

AnalystAnalyst

246

E3 A Lee P3P4

ConsultantEngineer

1048

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1 P2 Analyst Analyst 24 6

E3 A Lee P3P4 Consultant Engineer 10 48

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

A non-1NF Relation

One way to convert a non-1NF relation to a 1NF relation

1) Splitting Method - Divide the existing relation into two relations one with non-repeating attributes and the other with repeating attributes

Make a new relation consisting of the primary key of the original relation and the repeating attributes Determine a primary key for this new relation

Remove the repeating attributes from the original relation

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1P2

AnalystAnalyst

246

E3 A Lee P3P4

ConsultantEngineer

1048

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

ENo EName

E1 J Doe

E2 M Smith

E3 A Lee

E4 J Miller

E5 B Casey

E6 L Chu

E7 R Davis

ENo PNo Resp DurationE1 P1 Manager 12E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36

Second Normal Form (2NF)

A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key

If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation

Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key

Second Normal Form (2NF) Example

fd1 and fd4 are partial functional dependencies Normalize to

Emp (eno ename title bdate salary supereno dno)

WorksOn (eno pno resp hours)

Proj (pno pname budget)

ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration

EmpProj relation

fd1 fd2

fd3fd4

Second Normal Form (2NF) Example (2)

ENo EName

BDate Title Salary SuperNo

DNo

Emp relation

fd1

fd2

fd3fd4

PNo PName Budget

Proj relationENo PNo Resp Duration

WorkOn relation

Third Normal Form (3NF)

Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th

an 2 FDs

A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key

Converting a relation to 3NF from 2NF involves the removal of transitive dependencies

Third Normal Form (3NF) Example

fd2 results in a transitive dependency eno rarr salary Remove it

ENo EName BDate Title Salary SuperNo DNo

Emp relation

fd1

fd2

fd2

Title Salary

Pay relation

ENo EName BDate Title SuperNo DNo

Emp relation

fd1

General Definitions of 2NF and 3NF

We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)

General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key

General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key

Note that a prime attribute is an attribute that is in any key (candidate or primary)

General Definition of 3NF Example

The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute

However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key

Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key

ENo EName BDate Title Salary SuperNo DNo SSN

Emp relation

fd1

fd2

Normalization Question

Consider the universal relation

R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ

List the keys for R

Decompose R into 2NF and then 3NF relations

Boyce-Codd Normal Form (BCNF)

A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key

To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key

The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

ClientInterview relation

Relation is in 3NF but not BCNF

fd4

fd3

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

Note that we lose the FD fd2

fd4

fd3

ClientNo InterviewDate

InterviwTime

StaffNo StaffNo InterviwDate

RoomNo

Interview relation StaffRoom relation

BCNF versus 3NF

We can decompose to BCNF but sometimes we do not want to if we lose a FD

The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency

Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)

In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)

Conversion to BCNF

There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y

Note that if any FD in F+ violates BCNF then a FD in F violates BCNF

2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey

3) Replace R by relations with schemas R1 = X+

R2 = (R ndash X+) U X

4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib

utes are all in R1 or R2

Normalization Question

Given this database schema normalize into BCNF

fd1

PID Country Lot Area Price TaxRate

fd2

fd4fd3

Multi-Valued Dependencies

A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema

A MVD occurs when two independent 1N relationships are in the relational schema

Multi-Valued Dependencies Example

Eno ename Title Salary Pno dno

E1 J Doe EE 30000 P1P2

E2 M Smith SA 50000 P1P4 D1D2D3

E3 A Lee ME 40000 P1P3 D2D3

E4 J Miller PR 20000 P3 D3

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1

E1 J Doe EE 30000 P2

E2 M Smith SA 50000 P1 D1

E2 M Smith SA 50000 P1 D2

E2 M Smith SA 50000 P1 D3

E2 M Smith SA 50000 P4 D1

E2 M Smith SA 50000 P4 D2

E2 M Smith SA 50000 P4 D3

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P3 D3

E3 A Lee ME 40000 P3 D3

E4 J Miller PR 20000 P3 D3

Multi-Valued Dependencies (MVDs)

A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C

A trivial MVD A mdashgtgt B occurs when either B is a subset of A or

A cup B = R

Fourth Normal Form (4NF)

Fourth normal form (4NF) is based on the idea of multi-valued dependencies

A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies

Fourth Normal Form (4NF) Example

eno mdashgtgtpno

eno mdashgtgtdno

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3

eno ename title salary

E1 J Doe EE 30000

E2 MSmith SA 50000

E3 A Lee ME 40000

E4 J Miller PR 20000

eno pno

E1 P1

E1 P2

E2 P1

E2 P4

E3 P1

E3 P3

E4 P3

eno dno

E2 D1

E2 D2

E2 D3

E3 D2

E3 D3

E4 D3

Lossless-join Dependency

The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation

A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined

Fifth Normal Form (5NF)

bull Fifth normal form (5NF) is based on join dependencies

A relation is in fifth normal form (5NF) if the relation has no join dependency

bull 5NF is also called project-join normal form (PJNF)

A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R

The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have

ΠR1(r) ΠR2(r) hellip ΠRn(r) = r

Fifth Normal Form (5NF) Example

Consider a relation Supply (sname partName projName)

1048715Add the additional constraint that

If project j requires part p

and supplier s supplies part p

and supplier s supplies at least one item to project j

Then supplier s also supplies part p to project j

Original Supply Relation Supply Relation with new constraintsname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

Fifth Normal Form (5NF) Example

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

sname partName

Smith Bolt

Smith Nut

Adam Bolt

Walton Nut

Adam Nail

sname projName

Smith X

Smith Y

Adam Y

Walton Z

Adam X

partName projName

Bolt X

Nut Y

Bolt Y

Nut Z

Nail X

Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples

Desirable Properties in normalization

Relational schemas that are well-designed have several

important properties

1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or

relationship

2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins

3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations

Normal Forms in Practice

Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used

For example query execution time may increase because of normalization as more joins become necessary to answer queries

Normal Forms in Practice Example

For example street and city uniquely determine a zipcode

In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed

When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly

fd1

ENo City Street ZipCode

fd2

ENo ZipCode

ZipCode City Street

Normal Forms in Practice Summary

In summary normalization is typically used in two ways

To improve on relational schemas after ER design

To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies

There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance

This is the classic time versus space tradeoff

Conclusion

Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies

Normal forms indicate when relations satisfy certain properties

1NF - All attributes are atomic

2NF - All attributes are fully functionally dependent on a key

3NF - There are no transitive dependencies in the relation

4NF - There are no multi-valued dependencies in the relation

5NF - The relation has no join dependency

In practice normalization is used to improve schemas produced after ER design and existing relational schemas

It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time

Objectives

Explain process of normalization and why it is beneficial

Describe the concept of the Universal Relation

What are the motivations for normalization

What are the 3 types of update anomalies Be able to give examples

Describe the lossless-join property and dependency preservation property

Define and explain the concept of a functional dependency

What is a trivial FD

Describe the relationship between FDs and keys

What are Armstrongs Axioms When are they used

What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs

Objectives (2)

Define minimal and covers for a set of FDs

Define normal forms be able to list the normal forms and draw a diagram showing their relationship

Define 1NF Know how to convert non-1NF relation to 1NF

Define 2NF Convert 1NF relation to 2NF

Define 3NF Convert 2NF relation to 3NF

Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example

Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm

Describe multi-valued dependencies When do they occur

What is a trivial MVD

Define 4NF Convert 3NFBCNF relation to 4NF

Objectives (3)

Describe lossless-join property and how it relates to 5NF

Define 5NF

Explain normalization in terms of the time versus space tradeoff

Explain the relationship between normalization and ER modeling

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
Page 15: Software School of Hunan University 2006.08 Database Systems Design Part III Normalization

Example Normalization to DeptEmp

bull DeptEmp Relation

ENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5

E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7

E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5

E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5

E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7

E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8

E8 J Jones 10-11-72 SA 50000 null D1 Management E8

Normalization to DeptEmp

Emp Relation Dept Relation

dno dname mgrno

D1 Management E8

D2 consulting E7

D3 Accounting E5

eno ename bdate title salary superNo dno

E1 J Doe 01-05-75 EE 30000 E2 null

E2 M Smith 06-04-66 SA 50000 E5 D3

E3 A Lee 07-05-66 ME 40000 E7 D2

E4 J Miller 09-01-50 PR 20000 E6 D3

E5 B Casey 12-25-71 SA 50000 E8 D3

E6 L Chu 11-30-65 EE 30000 E7 D2

E7 R Davis 09-08-77 ME 40000 E8 D1

E8 J Jones 10-11-72 SA 50000 null D1

Dept Emp= Emp dno Dept

Overview to Normalization

Input a relation schema and constraints on attributes

Output one or more relation schemas

Processing Idea examining the relationships between attributes and organize the data according to its functional dependencies( 函数依赖)

Significance

1) Validate our design correct our improper design

2) Help for understanding the nature and the meaning of the data in the enterprise for learning more about the data semantics

When to use normalization

Normalization can be used after mapping ER model into relation model or independently

Normalization may be especially useful for databases that have already been designed without using formal techniques

Theory basic of normalization

Functional Dependencies ( 函数依赖) Functional dependencies represent integrity constraints ( 完整

性约束) on the values of attributes in a relation and are used in normalization

A functional dependency (FD) describes the relationship between attributes in a relation We say a set of attributes X functionally determines an attribute Y if given the values of X we always know the only possible value of Y

Notation X rarr Y X functionally determines Y Y is functionally dependent on XExample

eno rarr ename eno pno rarr Duration

Notation for Functional Dependencies

A functional dependency has a left-hand side called the determinant which is a set of attributes and one attribute on the right-hand side

eno pno rarr duration

determinant determined attribute

Strictly speaking there is always only one attribute on the RHS but we can combine several functional dependencies into one

eno pno rarr durationeno pno rarr respeno pno rarr duration resp

Remember that this is really two functional dependencies

The Semantics ofFunctional Dependencies

Similar to keys Functional dependencies are a property of a relation schema (intension) and not a property of a particular instance of the schema (extension)

Functional dependencies are directional

eno rarr ename ename rarr eno

ExampleFDs in EmpDept relation

bull EmpDept RelationENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8E8 J Jones 10-11-72 SA 50000 null D1 Management E8

Eno rarr ename bdate title salary superno dno

dnorarrdname mgrno

titlerarr salary

Trivial Functional Dependencies ( 平凡函数依赖 )

A functional dependency is trivialif the attributes on its left-hand side are a superset of the attributes on its right-hand side

Examples eno rarr eno

eno ename rarr eno

eno pno duration rarr eno Duration

Trivial functional dependencies are not interesting because they do not tell us anything

We are only interested in nontrivial FDs

Full Functional Dependencies

A functional dependency A rarr B is a full functional dependency if removal of any attribute from A results in the dependency not existing any more We say that B is fully functionally dependent on A If remove an attribute from A and the functional dependency still exists we say that B is partially dependent on A

Example

eno rarr ename (full FD)

eno ename rarr salary title (partial FD - can remove ename)

eno pno rarr hours resp (full FD)

How to Identify FDs in a relation

Examining the semantics of the attributes of relation

Identify all non-trivial functional dependencies in the Universal Relation

Universal (eno pno resp duration ename bdate title salary superNo dno dname mgrNo pname budget)

Eno rarr ename bdate title salary superno dnodnorarrdname mgrnotitlerarr salary

Formal Definition ofFunctional Dependencies

Given relation R defined over U = A1 A2 An where X U Y U If for all pairs of tuples t1 and t2 in any legal instance of relation schema R

t1[X] = t2[X] implies t1[Y] = t2[Y]

then the functional dependency X rarr Y holds in R

Functional Dependencies and Keys

Functional dependencies can be used to determine the candidate keys of a relation

What is superkey candidate key primary key

For example if an attribute functionally determines all other attributes in the relation that attribute can be a key

eno rarr eno ename bdate title supereno dno eno is a candidate key for the Emp relation

Functional Dependencies and Keys

ENo rarr EName BDate Title SuperNo DNoDNo rarr DName MgreNotitle rarr salary

Question List the candidate key(s) of this relation

ENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5

E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7

E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5

E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5

E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7

E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8

E8 J Jones 10-11-72 SA 50000 null D1 Management E8

EmpDept relation

Applications of FDs

1) given a set of FDs F in relation R FD X rarr Y is true or not

2) given a set of FDs F in relation R a set of attributes X is a candidate key of R or not

3) given a set of FDs F in relation R which candidate keys are in R

4) given a set of FDs F in relation R some FDs in F is Excessive or not

Rules for reasoning about functional dependencies

Armstrongs Axioms are three inference rules for functional dependenciesLet A B C and D be subsets of the attributes of relation R

IR1 Reflexive Rule( 自反性 ) If AB then A rarr B

A set of attributes always trivially functionally determines any subset of itself

IR2 Augmentation Rule( 增加性 ) If A rarr B then AC rarr BC

Adding a set of attributes to both the LHS and RHS results in another valid functional dependency

IR3 Transitive Rule( 传递性 ) If A rarr B and B rarr C then A rarr C

Transitivity holds with FDs

Derived Rules from Armstrongs Axioms

There are three other well-known rules that can be derived from the basic three

IR4 Decomposition( 可分解性 ) If A rarr BC then A rarr B and A rarr C A FD with two attributes on RHS can be decomposed into two FDs each with on attribute on the RHS

IR5 Union( 合并性 ) If A rarr B and A rarr C then A rarr BC Reverse of decomposition rule

IR6 Composition( 组合性 ) If A rarr B and C rarr D then AC rarr BD More general than union - can combine two non-overlapping FDs into one

Characteristics of Armstrongs Axioms

The rules are sound( 健壮的 ) and complete( 完备的 )

1) Sound The rules do not infer( 推断 ) any incorrect FDs

2) Complete The rules can be used to infer all correct FDs (no FDs are missing)

The axioms (called IR1 IR2 and IR3) can be used to determine a set of all FDs F+ from an initial set of FDs F

F+ is referred to as the closure of F

Strategy of dealing with questions

The basic idea is that given any set of attributes X we can compute the set of all attributes X+ that can be functionally determined using F This is called the closure of X under F

For question 1 given a set of FDs F in R FD X rarr Y is true or not we know that X rarr Y holds if Y is in X+

For question 2 a set of attributes X is a candidate key of R or not X is a superkey of R if X+ is all attributes of R

For question 3 how to computing F+ we can compute X+ for all possible subsets X of R

Computing the Attribute Closure

The algorithm is as follows

Given a set of attributes X

Let X+ = X

Repeat Find a FD in F whose left side is a subset of X+ Add the right side of F to X+

Until (X+ does not change or X+ contains all attributes of R)

After the algorithm completes you have a set of attributes X+ that can be functionally determined by X This allows you to produce FDs of the form

X rarr A where A is in X+

Computing Attribute Closure Example

R (ABCDEG)

F = A rarr BC C rarr D D rarr G

Compute A+

A+ = A (initial step)

A+ = ABC (by using FD A rarr BC)

A+ = ABCD (by using FD C rarr D)

A+ = ABCDG (by using FD D rarr G)

Similarly we can compute C+ and EG+

C+ = CDG

EG+ = EG

Using Attribute Closure to Detect a FD

The attribute closure algorithm can be used to determine if a particular FD X rarr Y holds based on the set of given FDs F Compute X+ and see if Y is in X+

Example

R = (ABCDEG)

F = A rarr BC C rarr D D rarr G

Does the FD CD rarr G hold Compute CD+ = CDG Yes the FD CD rarr G holds

Does the FD BC rarr E hold Compute BC+ = BCDG No the FD BC rarr E does n

ot hold

Using Attribute Closure to Find all FDs

The attribute closure algorithm can be used to find all FDs of a given relation R if perform the attribute closure algorithm for all subsets X of the attributes R

This is an exponential algorithm in the number of attributes in R Although there are a few optimizations Do not compute the closure of the empty set or of the set of all attributes If we find X+ = all attributes do not compute the closure of any supersets of X Note that even though you do not have to compute the closure (because you know the answer) you still must use the known closure to infer new FDs

Using Attribute Closureto Find all FDs Example

Compute F+ from F where

R (ABCD)

F = A rarr BC C rarr D

Must compute all these attribute closures

A+ B+ C+ D+

AB+ AC+ AD+ BC+ BD+ CD+

ABC+ ABD+ ACD+ BCD+

Do not have to compute ABCD+

Start with the single sets

A+ = ABCD B+ = B C+ = CD D+ = D

New FD A rarr D

Using Attribute Closureto Find all FDs Example (cont)

Next the two element sets Do not compute AB+ AC+ AD+ because A+ contained all attributes of R New FDs AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C BC+ = BCD BD+ = BD CD+ = CD New FD BC rarr DThree element sets Do not compute ABC+ ABD+ ACD+ New FDs ABC rarr D ABD rarr C ACD rarr B BCD+ = BCD No new FDsAll FDs F+ = A rarr BC C rarr D A rarr D BC rarr D AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C ABC rarr D ABD rarrC ACD rarr B

FD Closure Questions

1) Compute all nontrivial FDs and list all keys of R

R = (ABCD)

F = AB rarr C C rarr D D rarr A

2) Compute all nontrivial FDs and list all candidate keys of R

R = (ABCD)

F = AB rarr C BC rarr D CD rarr A AD rarr B

Equivalent Sets of FDs

A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+ We can determine if F covers E by calculating X+ with respect to F for each FD X rarr Y in E and checking whether X+ includes Y

Two sets of FDs E and F are equivalent if E covers F and F covers E That is E+= F+

Minimal Set of FDs

A set of FDs F is minimal if it satisfies these conditions Every FD in F has a single attribute on its right-hand side We cannot replace any FD X rarr A in F with Y rarr A where Y

X and still have a set of dependencies that is equivalent to F

We cannot remove any FD and still have a set that is equivalent to F

A minimal sets of FDs is like a canonical( 规范的 ) form of FDs

Note that there may be many minimal covers for a given set of functional dependencies F

Normal Forms

A relation is in a particular normal form if it satisfies certain

normalization properties

There are several normal forms defined 1NF - First Normal Form 2NF - Second Normal Form 3NF - Third Normal Form BCNF - Boyce-Codd Normal Form 4NF - Fourth Normal Form 5NF - Fifth Normal Form

Each of these normal forms are stricter than the next

For example 3NF is better than 2NF because it removes more redundancyanomalies from the schema than 2NF

Normal Forms

All relations (non-normalized)

1NF (First Normal Form)

2NF (Second Normal Form)

3NF (Third Normal Form)

BCNF (Boyce-Codd NF)

4NF (Fourth NF)

5NF (Fifth NF)

First Normal Form (1NF)

A relation is in first normal form (1NF) if all its attribute values are atomic

That is a 1NF relation cannot have an attribute value that is a set of values (multi-valued attribute) a set of tuples (nested relation)

1NF is a standard assumption in relational DBMSs However object-oriented DBMSs and nested relational DBMSs relax t

his constraint

A relation that is not in 1NF is an unnormalized relation

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1P2

AnalystAnalyst

246

E3 A Lee P3P4

ConsultantEngineer

1048

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1 P2 Analyst Analyst 24 6

E3 A Lee P3P4 Consultant Engineer 10 48

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

A non-1NF Relation

One way to convert a non-1NF relation to a 1NF relation

1) Splitting Method - Divide the existing relation into two relations one with non-repeating attributes and the other with repeating attributes

Make a new relation consisting of the primary key of the original relation and the repeating attributes Determine a primary key for this new relation

Remove the repeating attributes from the original relation

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1P2

AnalystAnalyst

246

E3 A Lee P3P4

ConsultantEngineer

1048

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

ENo EName

E1 J Doe

E2 M Smith

E3 A Lee

E4 J Miller

E5 B Casey

E6 L Chu

E7 R Davis

ENo PNo Resp DurationE1 P1 Manager 12E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36

Second Normal Form (2NF)

A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key

If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation

Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key

Second Normal Form (2NF) Example

fd1 and fd4 are partial functional dependencies Normalize to

Emp (eno ename title bdate salary supereno dno)

WorksOn (eno pno resp hours)

Proj (pno pname budget)

ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration

EmpProj relation

fd1 fd2

fd3fd4

Second Normal Form (2NF) Example (2)

ENo EName

BDate Title Salary SuperNo

DNo

Emp relation

fd1

fd2

fd3fd4

PNo PName Budget

Proj relationENo PNo Resp Duration

WorkOn relation

Third Normal Form (3NF)

Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th

an 2 FDs

A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key

Converting a relation to 3NF from 2NF involves the removal of transitive dependencies

Third Normal Form (3NF) Example

fd2 results in a transitive dependency eno rarr salary Remove it

ENo EName BDate Title Salary SuperNo DNo

Emp relation

fd1

fd2

fd2

Title Salary

Pay relation

ENo EName BDate Title SuperNo DNo

Emp relation

fd1

General Definitions of 2NF and 3NF

We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)

General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key

General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key

Note that a prime attribute is an attribute that is in any key (candidate or primary)

General Definition of 3NF Example

The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute

However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key

Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key

ENo EName BDate Title Salary SuperNo DNo SSN

Emp relation

fd1

fd2

Normalization Question

Consider the universal relation

R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ

List the keys for R

Decompose R into 2NF and then 3NF relations

Boyce-Codd Normal Form (BCNF)

A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key

To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key

The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

ClientInterview relation

Relation is in 3NF but not BCNF

fd4

fd3

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

Note that we lose the FD fd2

fd4

fd3

ClientNo InterviewDate

InterviwTime

StaffNo StaffNo InterviwDate

RoomNo

Interview relation StaffRoom relation

BCNF versus 3NF

We can decompose to BCNF but sometimes we do not want to if we lose a FD

The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency

Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)

In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)

Conversion to BCNF

There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y

Note that if any FD in F+ violates BCNF then a FD in F violates BCNF

2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey

3) Replace R by relations with schemas R1 = X+

R2 = (R ndash X+) U X

4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib

utes are all in R1 or R2

Normalization Question

Given this database schema normalize into BCNF

fd1

PID Country Lot Area Price TaxRate

fd2

fd4fd3

Multi-Valued Dependencies

A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema

A MVD occurs when two independent 1N relationships are in the relational schema

Multi-Valued Dependencies Example

Eno ename Title Salary Pno dno

E1 J Doe EE 30000 P1P2

E2 M Smith SA 50000 P1P4 D1D2D3

E3 A Lee ME 40000 P1P3 D2D3

E4 J Miller PR 20000 P3 D3

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1

E1 J Doe EE 30000 P2

E2 M Smith SA 50000 P1 D1

E2 M Smith SA 50000 P1 D2

E2 M Smith SA 50000 P1 D3

E2 M Smith SA 50000 P4 D1

E2 M Smith SA 50000 P4 D2

E2 M Smith SA 50000 P4 D3

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P3 D3

E3 A Lee ME 40000 P3 D3

E4 J Miller PR 20000 P3 D3

Multi-Valued Dependencies (MVDs)

A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C

A trivial MVD A mdashgtgt B occurs when either B is a subset of A or

A cup B = R

Fourth Normal Form (4NF)

Fourth normal form (4NF) is based on the idea of multi-valued dependencies

A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies

Fourth Normal Form (4NF) Example

eno mdashgtgtpno

eno mdashgtgtdno

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3

eno ename title salary

E1 J Doe EE 30000

E2 MSmith SA 50000

E3 A Lee ME 40000

E4 J Miller PR 20000

eno pno

E1 P1

E1 P2

E2 P1

E2 P4

E3 P1

E3 P3

E4 P3

eno dno

E2 D1

E2 D2

E2 D3

E3 D2

E3 D3

E4 D3

Lossless-join Dependency

The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation

A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined

Fifth Normal Form (5NF)

bull Fifth normal form (5NF) is based on join dependencies

A relation is in fifth normal form (5NF) if the relation has no join dependency

bull 5NF is also called project-join normal form (PJNF)

A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R

The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have

ΠR1(r) ΠR2(r) hellip ΠRn(r) = r

Fifth Normal Form (5NF) Example

Consider a relation Supply (sname partName projName)

1048715Add the additional constraint that

If project j requires part p

and supplier s supplies part p

and supplier s supplies at least one item to project j

Then supplier s also supplies part p to project j

Original Supply Relation Supply Relation with new constraintsname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

Fifth Normal Form (5NF) Example

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

sname partName

Smith Bolt

Smith Nut

Adam Bolt

Walton Nut

Adam Nail

sname projName

Smith X

Smith Y

Adam Y

Walton Z

Adam X

partName projName

Bolt X

Nut Y

Bolt Y

Nut Z

Nail X

Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples

Desirable Properties in normalization

Relational schemas that are well-designed have several

important properties

1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or

relationship

2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins

3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations

Normal Forms in Practice

Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used

For example query execution time may increase because of normalization as more joins become necessary to answer queries

Normal Forms in Practice Example

For example street and city uniquely determine a zipcode

In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed

When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly

fd1

ENo City Street ZipCode

fd2

ENo ZipCode

ZipCode City Street

Normal Forms in Practice Summary

In summary normalization is typically used in two ways

To improve on relational schemas after ER design

To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies

There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance

This is the classic time versus space tradeoff

Conclusion

Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies

Normal forms indicate when relations satisfy certain properties

1NF - All attributes are atomic

2NF - All attributes are fully functionally dependent on a key

3NF - There are no transitive dependencies in the relation

4NF - There are no multi-valued dependencies in the relation

5NF - The relation has no join dependency

In practice normalization is used to improve schemas produced after ER design and existing relational schemas

It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time

Objectives

Explain process of normalization and why it is beneficial

Describe the concept of the Universal Relation

What are the motivations for normalization

What are the 3 types of update anomalies Be able to give examples

Describe the lossless-join property and dependency preservation property

Define and explain the concept of a functional dependency

What is a trivial FD

Describe the relationship between FDs and keys

What are Armstrongs Axioms When are they used

What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs

Objectives (2)

Define minimal and covers for a set of FDs

Define normal forms be able to list the normal forms and draw a diagram showing their relationship

Define 1NF Know how to convert non-1NF relation to 1NF

Define 2NF Convert 1NF relation to 2NF

Define 3NF Convert 2NF relation to 3NF

Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example

Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm

Describe multi-valued dependencies When do they occur

What is a trivial MVD

Define 4NF Convert 3NFBCNF relation to 4NF

Objectives (3)

Describe lossless-join property and how it relates to 5NF

Define 5NF

Explain normalization in terms of the time versus space tradeoff

Explain the relationship between normalization and ER modeling

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
Page 16: Software School of Hunan University 2006.08 Database Systems Design Part III Normalization

Normalization to DeptEmp

Emp Relation Dept Relation

dno dname mgrno

D1 Management E8

D2 consulting E7

D3 Accounting E5

eno ename bdate title salary superNo dno

E1 J Doe 01-05-75 EE 30000 E2 null

E2 M Smith 06-04-66 SA 50000 E5 D3

E3 A Lee 07-05-66 ME 40000 E7 D2

E4 J Miller 09-01-50 PR 20000 E6 D3

E5 B Casey 12-25-71 SA 50000 E8 D3

E6 L Chu 11-30-65 EE 30000 E7 D2

E7 R Davis 09-08-77 ME 40000 E8 D1

E8 J Jones 10-11-72 SA 50000 null D1

Dept Emp= Emp dno Dept

Overview to Normalization

Input a relation schema and constraints on attributes

Output one or more relation schemas

Processing Idea examining the relationships between attributes and organize the data according to its functional dependencies( 函数依赖)

Significance

1) Validate our design correct our improper design

2) Help for understanding the nature and the meaning of the data in the enterprise for learning more about the data semantics

When to use normalization

Normalization can be used after mapping ER model into relation model or independently

Normalization may be especially useful for databases that have already been designed without using formal techniques

Theory basic of normalization

Functional Dependencies ( 函数依赖) Functional dependencies represent integrity constraints ( 完整

性约束) on the values of attributes in a relation and are used in normalization

A functional dependency (FD) describes the relationship between attributes in a relation We say a set of attributes X functionally determines an attribute Y if given the values of X we always know the only possible value of Y

Notation X rarr Y X functionally determines Y Y is functionally dependent on XExample

eno rarr ename eno pno rarr Duration

Notation for Functional Dependencies

A functional dependency has a left-hand side called the determinant which is a set of attributes and one attribute on the right-hand side

eno pno rarr duration

determinant determined attribute

Strictly speaking there is always only one attribute on the RHS but we can combine several functional dependencies into one

eno pno rarr durationeno pno rarr respeno pno rarr duration resp

Remember that this is really two functional dependencies

The Semantics ofFunctional Dependencies

Similar to keys Functional dependencies are a property of a relation schema (intension) and not a property of a particular instance of the schema (extension)

Functional dependencies are directional

eno rarr ename ename rarr eno

ExampleFDs in EmpDept relation

bull EmpDept RelationENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8E8 J Jones 10-11-72 SA 50000 null D1 Management E8

Eno rarr ename bdate title salary superno dno

dnorarrdname mgrno

titlerarr salary

Trivial Functional Dependencies ( 平凡函数依赖 )

A functional dependency is trivialif the attributes on its left-hand side are a superset of the attributes on its right-hand side

Examples eno rarr eno

eno ename rarr eno

eno pno duration rarr eno Duration

Trivial functional dependencies are not interesting because they do not tell us anything

We are only interested in nontrivial FDs

Full Functional Dependencies

A functional dependency A rarr B is a full functional dependency if removal of any attribute from A results in the dependency not existing any more We say that B is fully functionally dependent on A If remove an attribute from A and the functional dependency still exists we say that B is partially dependent on A

Example

eno rarr ename (full FD)

eno ename rarr salary title (partial FD - can remove ename)

eno pno rarr hours resp (full FD)

How to Identify FDs in a relation

Examining the semantics of the attributes of relation

Identify all non-trivial functional dependencies in the Universal Relation

Universal (eno pno resp duration ename bdate title salary superNo dno dname mgrNo pname budget)

Eno rarr ename bdate title salary superno dnodnorarrdname mgrnotitlerarr salary

Formal Definition ofFunctional Dependencies

Given relation R defined over U = A1 A2 An where X U Y U If for all pairs of tuples t1 and t2 in any legal instance of relation schema R

t1[X] = t2[X] implies t1[Y] = t2[Y]

then the functional dependency X rarr Y holds in R

Functional Dependencies and Keys

Functional dependencies can be used to determine the candidate keys of a relation

What is superkey candidate key primary key

For example if an attribute functionally determines all other attributes in the relation that attribute can be a key

eno rarr eno ename bdate title supereno dno eno is a candidate key for the Emp relation

Functional Dependencies and Keys

ENo rarr EName BDate Title SuperNo DNoDNo rarr DName MgreNotitle rarr salary

Question List the candidate key(s) of this relation

ENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5

E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7

E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5

E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5

E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7

E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8

E8 J Jones 10-11-72 SA 50000 null D1 Management E8

EmpDept relation

Applications of FDs

1) given a set of FDs F in relation R FD X rarr Y is true or not

2) given a set of FDs F in relation R a set of attributes X is a candidate key of R or not

3) given a set of FDs F in relation R which candidate keys are in R

4) given a set of FDs F in relation R some FDs in F is Excessive or not

Rules for reasoning about functional dependencies

Armstrongs Axioms are three inference rules for functional dependenciesLet A B C and D be subsets of the attributes of relation R

IR1 Reflexive Rule( 自反性 ) If AB then A rarr B

A set of attributes always trivially functionally determines any subset of itself

IR2 Augmentation Rule( 增加性 ) If A rarr B then AC rarr BC

Adding a set of attributes to both the LHS and RHS results in another valid functional dependency

IR3 Transitive Rule( 传递性 ) If A rarr B and B rarr C then A rarr C

Transitivity holds with FDs

Derived Rules from Armstrongs Axioms

There are three other well-known rules that can be derived from the basic three

IR4 Decomposition( 可分解性 ) If A rarr BC then A rarr B and A rarr C A FD with two attributes on RHS can be decomposed into two FDs each with on attribute on the RHS

IR5 Union( 合并性 ) If A rarr B and A rarr C then A rarr BC Reverse of decomposition rule

IR6 Composition( 组合性 ) If A rarr B and C rarr D then AC rarr BD More general than union - can combine two non-overlapping FDs into one

Characteristics of Armstrongs Axioms

The rules are sound( 健壮的 ) and complete( 完备的 )

1) Sound The rules do not infer( 推断 ) any incorrect FDs

2) Complete The rules can be used to infer all correct FDs (no FDs are missing)

The axioms (called IR1 IR2 and IR3) can be used to determine a set of all FDs F+ from an initial set of FDs F

F+ is referred to as the closure of F

Strategy of dealing with questions

The basic idea is that given any set of attributes X we can compute the set of all attributes X+ that can be functionally determined using F This is called the closure of X under F

For question 1 given a set of FDs F in R FD X rarr Y is true or not we know that X rarr Y holds if Y is in X+

For question 2 a set of attributes X is a candidate key of R or not X is a superkey of R if X+ is all attributes of R

For question 3 how to computing F+ we can compute X+ for all possible subsets X of R

Computing the Attribute Closure

The algorithm is as follows

Given a set of attributes X

Let X+ = X

Repeat Find a FD in F whose left side is a subset of X+ Add the right side of F to X+

Until (X+ does not change or X+ contains all attributes of R)

After the algorithm completes you have a set of attributes X+ that can be functionally determined by X This allows you to produce FDs of the form

X rarr A where A is in X+

Computing Attribute Closure Example

R (ABCDEG)

F = A rarr BC C rarr D D rarr G

Compute A+

A+ = A (initial step)

A+ = ABC (by using FD A rarr BC)

A+ = ABCD (by using FD C rarr D)

A+ = ABCDG (by using FD D rarr G)

Similarly we can compute C+ and EG+

C+ = CDG

EG+ = EG

Using Attribute Closure to Detect a FD

The attribute closure algorithm can be used to determine if a particular FD X rarr Y holds based on the set of given FDs F Compute X+ and see if Y is in X+

Example

R = (ABCDEG)

F = A rarr BC C rarr D D rarr G

Does the FD CD rarr G hold Compute CD+ = CDG Yes the FD CD rarr G holds

Does the FD BC rarr E hold Compute BC+ = BCDG No the FD BC rarr E does n

ot hold

Using Attribute Closure to Find all FDs

The attribute closure algorithm can be used to find all FDs of a given relation R if perform the attribute closure algorithm for all subsets X of the attributes R

This is an exponential algorithm in the number of attributes in R Although there are a few optimizations Do not compute the closure of the empty set or of the set of all attributes If we find X+ = all attributes do not compute the closure of any supersets of X Note that even though you do not have to compute the closure (because you know the answer) you still must use the known closure to infer new FDs

Using Attribute Closureto Find all FDs Example

Compute F+ from F where

R (ABCD)

F = A rarr BC C rarr D

Must compute all these attribute closures

A+ B+ C+ D+

AB+ AC+ AD+ BC+ BD+ CD+

ABC+ ABD+ ACD+ BCD+

Do not have to compute ABCD+

Start with the single sets

A+ = ABCD B+ = B C+ = CD D+ = D

New FD A rarr D

Using Attribute Closureto Find all FDs Example (cont)

Next the two element sets Do not compute AB+ AC+ AD+ because A+ contained all attributes of R New FDs AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C BC+ = BCD BD+ = BD CD+ = CD New FD BC rarr DThree element sets Do not compute ABC+ ABD+ ACD+ New FDs ABC rarr D ABD rarr C ACD rarr B BCD+ = BCD No new FDsAll FDs F+ = A rarr BC C rarr D A rarr D BC rarr D AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C ABC rarr D ABD rarrC ACD rarr B

FD Closure Questions

1) Compute all nontrivial FDs and list all keys of R

R = (ABCD)

F = AB rarr C C rarr D D rarr A

2) Compute all nontrivial FDs and list all candidate keys of R

R = (ABCD)

F = AB rarr C BC rarr D CD rarr A AD rarr B

Equivalent Sets of FDs

A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+ We can determine if F covers E by calculating X+ with respect to F for each FD X rarr Y in E and checking whether X+ includes Y

Two sets of FDs E and F are equivalent if E covers F and F covers E That is E+= F+

Minimal Set of FDs

A set of FDs F is minimal if it satisfies these conditions Every FD in F has a single attribute on its right-hand side We cannot replace any FD X rarr A in F with Y rarr A where Y

X and still have a set of dependencies that is equivalent to F

We cannot remove any FD and still have a set that is equivalent to F

A minimal sets of FDs is like a canonical( 规范的 ) form of FDs

Note that there may be many minimal covers for a given set of functional dependencies F

Normal Forms

A relation is in a particular normal form if it satisfies certain

normalization properties

There are several normal forms defined 1NF - First Normal Form 2NF - Second Normal Form 3NF - Third Normal Form BCNF - Boyce-Codd Normal Form 4NF - Fourth Normal Form 5NF - Fifth Normal Form

Each of these normal forms are stricter than the next

For example 3NF is better than 2NF because it removes more redundancyanomalies from the schema than 2NF

Normal Forms

All relations (non-normalized)

1NF (First Normal Form)

2NF (Second Normal Form)

3NF (Third Normal Form)

BCNF (Boyce-Codd NF)

4NF (Fourth NF)

5NF (Fifth NF)

First Normal Form (1NF)

A relation is in first normal form (1NF) if all its attribute values are atomic

That is a 1NF relation cannot have an attribute value that is a set of values (multi-valued attribute) a set of tuples (nested relation)

1NF is a standard assumption in relational DBMSs However object-oriented DBMSs and nested relational DBMSs relax t

his constraint

A relation that is not in 1NF is an unnormalized relation

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1P2

AnalystAnalyst

246

E3 A Lee P3P4

ConsultantEngineer

1048

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1 P2 Analyst Analyst 24 6

E3 A Lee P3P4 Consultant Engineer 10 48

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

A non-1NF Relation

One way to convert a non-1NF relation to a 1NF relation

1) Splitting Method - Divide the existing relation into two relations one with non-repeating attributes and the other with repeating attributes

Make a new relation consisting of the primary key of the original relation and the repeating attributes Determine a primary key for this new relation

Remove the repeating attributes from the original relation

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1P2

AnalystAnalyst

246

E3 A Lee P3P4

ConsultantEngineer

1048

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

ENo EName

E1 J Doe

E2 M Smith

E3 A Lee

E4 J Miller

E5 B Casey

E6 L Chu

E7 R Davis

ENo PNo Resp DurationE1 P1 Manager 12E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36

Second Normal Form (2NF)

A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key

If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation

Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key

Second Normal Form (2NF) Example

fd1 and fd4 are partial functional dependencies Normalize to

Emp (eno ename title bdate salary supereno dno)

WorksOn (eno pno resp hours)

Proj (pno pname budget)

ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration

EmpProj relation

fd1 fd2

fd3fd4

Second Normal Form (2NF) Example (2)

ENo EName

BDate Title Salary SuperNo

DNo

Emp relation

fd1

fd2

fd3fd4

PNo PName Budget

Proj relationENo PNo Resp Duration

WorkOn relation

Third Normal Form (3NF)

Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th

an 2 FDs

A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key

Converting a relation to 3NF from 2NF involves the removal of transitive dependencies

Third Normal Form (3NF) Example

fd2 results in a transitive dependency eno rarr salary Remove it

ENo EName BDate Title Salary SuperNo DNo

Emp relation

fd1

fd2

fd2

Title Salary

Pay relation

ENo EName BDate Title SuperNo DNo

Emp relation

fd1

General Definitions of 2NF and 3NF

We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)

General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key

General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key

Note that a prime attribute is an attribute that is in any key (candidate or primary)

General Definition of 3NF Example

The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute

However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key

Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key

ENo EName BDate Title Salary SuperNo DNo SSN

Emp relation

fd1

fd2

Normalization Question

Consider the universal relation

R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ

List the keys for R

Decompose R into 2NF and then 3NF relations

Boyce-Codd Normal Form (BCNF)

A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key

To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key

The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

ClientInterview relation

Relation is in 3NF but not BCNF

fd4

fd3

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

Note that we lose the FD fd2

fd4

fd3

ClientNo InterviewDate

InterviwTime

StaffNo StaffNo InterviwDate

RoomNo

Interview relation StaffRoom relation

BCNF versus 3NF

We can decompose to BCNF but sometimes we do not want to if we lose a FD

The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency

Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)

In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)

Conversion to BCNF

There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y

Note that if any FD in F+ violates BCNF then a FD in F violates BCNF

2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey

3) Replace R by relations with schemas R1 = X+

R2 = (R ndash X+) U X

4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib

utes are all in R1 or R2

Normalization Question

Given this database schema normalize into BCNF

fd1

PID Country Lot Area Price TaxRate

fd2

fd4fd3

Multi-Valued Dependencies

A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema

A MVD occurs when two independent 1N relationships are in the relational schema

Multi-Valued Dependencies Example

Eno ename Title Salary Pno dno

E1 J Doe EE 30000 P1P2

E2 M Smith SA 50000 P1P4 D1D2D3

E3 A Lee ME 40000 P1P3 D2D3

E4 J Miller PR 20000 P3 D3

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1

E1 J Doe EE 30000 P2

E2 M Smith SA 50000 P1 D1

E2 M Smith SA 50000 P1 D2

E2 M Smith SA 50000 P1 D3

E2 M Smith SA 50000 P4 D1

E2 M Smith SA 50000 P4 D2

E2 M Smith SA 50000 P4 D3

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P3 D3

E3 A Lee ME 40000 P3 D3

E4 J Miller PR 20000 P3 D3

Multi-Valued Dependencies (MVDs)

A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C

A trivial MVD A mdashgtgt B occurs when either B is a subset of A or

A cup B = R

Fourth Normal Form (4NF)

Fourth normal form (4NF) is based on the idea of multi-valued dependencies

A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies

Fourth Normal Form (4NF) Example

eno mdashgtgtpno

eno mdashgtgtdno

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3

eno ename title salary

E1 J Doe EE 30000

E2 MSmith SA 50000

E3 A Lee ME 40000

E4 J Miller PR 20000

eno pno

E1 P1

E1 P2

E2 P1

E2 P4

E3 P1

E3 P3

E4 P3

eno dno

E2 D1

E2 D2

E2 D3

E3 D2

E3 D3

E4 D3

Lossless-join Dependency

The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation

A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined

Fifth Normal Form (5NF)

bull Fifth normal form (5NF) is based on join dependencies

A relation is in fifth normal form (5NF) if the relation has no join dependency

bull 5NF is also called project-join normal form (PJNF)

A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R

The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have

ΠR1(r) ΠR2(r) hellip ΠRn(r) = r

Fifth Normal Form (5NF) Example

Consider a relation Supply (sname partName projName)

1048715Add the additional constraint that

If project j requires part p

and supplier s supplies part p

and supplier s supplies at least one item to project j

Then supplier s also supplies part p to project j

Original Supply Relation Supply Relation with new constraintsname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

Fifth Normal Form (5NF) Example

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

sname partName

Smith Bolt

Smith Nut

Adam Bolt

Walton Nut

Adam Nail

sname projName

Smith X

Smith Y

Adam Y

Walton Z

Adam X

partName projName

Bolt X

Nut Y

Bolt Y

Nut Z

Nail X

Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples

Desirable Properties in normalization

Relational schemas that are well-designed have several

important properties

1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or

relationship

2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins

3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations

Normal Forms in Practice

Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used

For example query execution time may increase because of normalization as more joins become necessary to answer queries

Normal Forms in Practice Example

For example street and city uniquely determine a zipcode

In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed

When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly

fd1

ENo City Street ZipCode

fd2

ENo ZipCode

ZipCode City Street

Normal Forms in Practice Summary

In summary normalization is typically used in two ways

To improve on relational schemas after ER design

To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies

There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance

This is the classic time versus space tradeoff

Conclusion

Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies

Normal forms indicate when relations satisfy certain properties

1NF - All attributes are atomic

2NF - All attributes are fully functionally dependent on a key

3NF - There are no transitive dependencies in the relation

4NF - There are no multi-valued dependencies in the relation

5NF - The relation has no join dependency

In practice normalization is used to improve schemas produced after ER design and existing relational schemas

It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time

Objectives

Explain process of normalization and why it is beneficial

Describe the concept of the Universal Relation

What are the motivations for normalization

What are the 3 types of update anomalies Be able to give examples

Describe the lossless-join property and dependency preservation property

Define and explain the concept of a functional dependency

What is a trivial FD

Describe the relationship between FDs and keys

What are Armstrongs Axioms When are they used

What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs

Objectives (2)

Define minimal and covers for a set of FDs

Define normal forms be able to list the normal forms and draw a diagram showing their relationship

Define 1NF Know how to convert non-1NF relation to 1NF

Define 2NF Convert 1NF relation to 2NF

Define 3NF Convert 2NF relation to 3NF

Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example

Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm

Describe multi-valued dependencies When do they occur

What is a trivial MVD

Define 4NF Convert 3NFBCNF relation to 4NF

Objectives (3)

Describe lossless-join property and how it relates to 5NF

Define 5NF

Explain normalization in terms of the time versus space tradeoff

Explain the relationship between normalization and ER modeling

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
Page 17: Software School of Hunan University 2006.08 Database Systems Design Part III Normalization

Overview to Normalization

Input a relation schema and constraints on attributes

Output one or more relation schemas

Processing Idea examining the relationships between attributes and organize the data according to its functional dependencies( 函数依赖)

Significance

1) Validate our design correct our improper design

2) Help for understanding the nature and the meaning of the data in the enterprise for learning more about the data semantics

When to use normalization

Normalization can be used after mapping ER model into relation model or independently

Normalization may be especially useful for databases that have already been designed without using formal techniques

Theory basic of normalization

Functional Dependencies ( 函数依赖) Functional dependencies represent integrity constraints ( 完整

性约束) on the values of attributes in a relation and are used in normalization

A functional dependency (FD) describes the relationship between attributes in a relation We say a set of attributes X functionally determines an attribute Y if given the values of X we always know the only possible value of Y

Notation X rarr Y X functionally determines Y Y is functionally dependent on XExample

eno rarr ename eno pno rarr Duration

Notation for Functional Dependencies

A functional dependency has a left-hand side called the determinant which is a set of attributes and one attribute on the right-hand side

eno pno rarr duration

determinant determined attribute

Strictly speaking there is always only one attribute on the RHS but we can combine several functional dependencies into one

eno pno rarr durationeno pno rarr respeno pno rarr duration resp

Remember that this is really two functional dependencies

The Semantics ofFunctional Dependencies

Similar to keys Functional dependencies are a property of a relation schema (intension) and not a property of a particular instance of the schema (extension)

Functional dependencies are directional

eno rarr ename ename rarr eno

ExampleFDs in EmpDept relation

bull EmpDept RelationENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8E8 J Jones 10-11-72 SA 50000 null D1 Management E8

Eno rarr ename bdate title salary superno dno

dnorarrdname mgrno

titlerarr salary

Trivial Functional Dependencies ( 平凡函数依赖 )

A functional dependency is trivialif the attributes on its left-hand side are a superset of the attributes on its right-hand side

Examples eno rarr eno

eno ename rarr eno

eno pno duration rarr eno Duration

Trivial functional dependencies are not interesting because they do not tell us anything

We are only interested in nontrivial FDs

Full Functional Dependencies

A functional dependency A rarr B is a full functional dependency if removal of any attribute from A results in the dependency not existing any more We say that B is fully functionally dependent on A If remove an attribute from A and the functional dependency still exists we say that B is partially dependent on A

Example

eno rarr ename (full FD)

eno ename rarr salary title (partial FD - can remove ename)

eno pno rarr hours resp (full FD)

How to Identify FDs in a relation

Examining the semantics of the attributes of relation

Identify all non-trivial functional dependencies in the Universal Relation

Universal (eno pno resp duration ename bdate title salary superNo dno dname mgrNo pname budget)

Eno rarr ename bdate title salary superno dnodnorarrdname mgrnotitlerarr salary

Formal Definition ofFunctional Dependencies

Given relation R defined over U = A1 A2 An where X U Y U If for all pairs of tuples t1 and t2 in any legal instance of relation schema R

t1[X] = t2[X] implies t1[Y] = t2[Y]

then the functional dependency X rarr Y holds in R

Functional Dependencies and Keys

Functional dependencies can be used to determine the candidate keys of a relation

What is superkey candidate key primary key

For example if an attribute functionally determines all other attributes in the relation that attribute can be a key

eno rarr eno ename bdate title supereno dno eno is a candidate key for the Emp relation

Functional Dependencies and Keys

ENo rarr EName BDate Title SuperNo DNoDNo rarr DName MgreNotitle rarr salary

Question List the candidate key(s) of this relation

ENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5

E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7

E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5

E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5

E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7

E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8

E8 J Jones 10-11-72 SA 50000 null D1 Management E8

EmpDept relation

Applications of FDs

1) given a set of FDs F in relation R FD X rarr Y is true or not

2) given a set of FDs F in relation R a set of attributes X is a candidate key of R or not

3) given a set of FDs F in relation R which candidate keys are in R

4) given a set of FDs F in relation R some FDs in F is Excessive or not

Rules for reasoning about functional dependencies

Armstrongs Axioms are three inference rules for functional dependenciesLet A B C and D be subsets of the attributes of relation R

IR1 Reflexive Rule( 自反性 ) If AB then A rarr B

A set of attributes always trivially functionally determines any subset of itself

IR2 Augmentation Rule( 增加性 ) If A rarr B then AC rarr BC

Adding a set of attributes to both the LHS and RHS results in another valid functional dependency

IR3 Transitive Rule( 传递性 ) If A rarr B and B rarr C then A rarr C

Transitivity holds with FDs

Derived Rules from Armstrongs Axioms

There are three other well-known rules that can be derived from the basic three

IR4 Decomposition( 可分解性 ) If A rarr BC then A rarr B and A rarr C A FD with two attributes on RHS can be decomposed into two FDs each with on attribute on the RHS

IR5 Union( 合并性 ) If A rarr B and A rarr C then A rarr BC Reverse of decomposition rule

IR6 Composition( 组合性 ) If A rarr B and C rarr D then AC rarr BD More general than union - can combine two non-overlapping FDs into one

Characteristics of Armstrongs Axioms

The rules are sound( 健壮的 ) and complete( 完备的 )

1) Sound The rules do not infer( 推断 ) any incorrect FDs

2) Complete The rules can be used to infer all correct FDs (no FDs are missing)

The axioms (called IR1 IR2 and IR3) can be used to determine a set of all FDs F+ from an initial set of FDs F

F+ is referred to as the closure of F

Strategy of dealing with questions

The basic idea is that given any set of attributes X we can compute the set of all attributes X+ that can be functionally determined using F This is called the closure of X under F

For question 1 given a set of FDs F in R FD X rarr Y is true or not we know that X rarr Y holds if Y is in X+

For question 2 a set of attributes X is a candidate key of R or not X is a superkey of R if X+ is all attributes of R

For question 3 how to computing F+ we can compute X+ for all possible subsets X of R

Computing the Attribute Closure

The algorithm is as follows

Given a set of attributes X

Let X+ = X

Repeat Find a FD in F whose left side is a subset of X+ Add the right side of F to X+

Until (X+ does not change or X+ contains all attributes of R)

After the algorithm completes you have a set of attributes X+ that can be functionally determined by X This allows you to produce FDs of the form

X rarr A where A is in X+

Computing Attribute Closure Example

R (ABCDEG)

F = A rarr BC C rarr D D rarr G

Compute A+

A+ = A (initial step)

A+ = ABC (by using FD A rarr BC)

A+ = ABCD (by using FD C rarr D)

A+ = ABCDG (by using FD D rarr G)

Similarly we can compute C+ and EG+

C+ = CDG

EG+ = EG

Using Attribute Closure to Detect a FD

The attribute closure algorithm can be used to determine if a particular FD X rarr Y holds based on the set of given FDs F Compute X+ and see if Y is in X+

Example

R = (ABCDEG)

F = A rarr BC C rarr D D rarr G

Does the FD CD rarr G hold Compute CD+ = CDG Yes the FD CD rarr G holds

Does the FD BC rarr E hold Compute BC+ = BCDG No the FD BC rarr E does n

ot hold

Using Attribute Closure to Find all FDs

The attribute closure algorithm can be used to find all FDs of a given relation R if perform the attribute closure algorithm for all subsets X of the attributes R

This is an exponential algorithm in the number of attributes in R Although there are a few optimizations Do not compute the closure of the empty set or of the set of all attributes If we find X+ = all attributes do not compute the closure of any supersets of X Note that even though you do not have to compute the closure (because you know the answer) you still must use the known closure to infer new FDs

Using Attribute Closureto Find all FDs Example

Compute F+ from F where

R (ABCD)

F = A rarr BC C rarr D

Must compute all these attribute closures

A+ B+ C+ D+

AB+ AC+ AD+ BC+ BD+ CD+

ABC+ ABD+ ACD+ BCD+

Do not have to compute ABCD+

Start with the single sets

A+ = ABCD B+ = B C+ = CD D+ = D

New FD A rarr D

Using Attribute Closureto Find all FDs Example (cont)

Next the two element sets Do not compute AB+ AC+ AD+ because A+ contained all attributes of R New FDs AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C BC+ = BCD BD+ = BD CD+ = CD New FD BC rarr DThree element sets Do not compute ABC+ ABD+ ACD+ New FDs ABC rarr D ABD rarr C ACD rarr B BCD+ = BCD No new FDsAll FDs F+ = A rarr BC C rarr D A rarr D BC rarr D AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C ABC rarr D ABD rarrC ACD rarr B

FD Closure Questions

1) Compute all nontrivial FDs and list all keys of R

R = (ABCD)

F = AB rarr C C rarr D D rarr A

2) Compute all nontrivial FDs and list all candidate keys of R

R = (ABCD)

F = AB rarr C BC rarr D CD rarr A AD rarr B

Equivalent Sets of FDs

A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+ We can determine if F covers E by calculating X+ with respect to F for each FD X rarr Y in E and checking whether X+ includes Y

Two sets of FDs E and F are equivalent if E covers F and F covers E That is E+= F+

Minimal Set of FDs

A set of FDs F is minimal if it satisfies these conditions Every FD in F has a single attribute on its right-hand side We cannot replace any FD X rarr A in F with Y rarr A where Y

X and still have a set of dependencies that is equivalent to F

We cannot remove any FD and still have a set that is equivalent to F

A minimal sets of FDs is like a canonical( 规范的 ) form of FDs

Note that there may be many minimal covers for a given set of functional dependencies F

Normal Forms

A relation is in a particular normal form if it satisfies certain

normalization properties

There are several normal forms defined 1NF - First Normal Form 2NF - Second Normal Form 3NF - Third Normal Form BCNF - Boyce-Codd Normal Form 4NF - Fourth Normal Form 5NF - Fifth Normal Form

Each of these normal forms are stricter than the next

For example 3NF is better than 2NF because it removes more redundancyanomalies from the schema than 2NF

Normal Forms

All relations (non-normalized)

1NF (First Normal Form)

2NF (Second Normal Form)

3NF (Third Normal Form)

BCNF (Boyce-Codd NF)

4NF (Fourth NF)

5NF (Fifth NF)

First Normal Form (1NF)

A relation is in first normal form (1NF) if all its attribute values are atomic

That is a 1NF relation cannot have an attribute value that is a set of values (multi-valued attribute) a set of tuples (nested relation)

1NF is a standard assumption in relational DBMSs However object-oriented DBMSs and nested relational DBMSs relax t

his constraint

A relation that is not in 1NF is an unnormalized relation

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1P2

AnalystAnalyst

246

E3 A Lee P3P4

ConsultantEngineer

1048

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1 P2 Analyst Analyst 24 6

E3 A Lee P3P4 Consultant Engineer 10 48

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

A non-1NF Relation

One way to convert a non-1NF relation to a 1NF relation

1) Splitting Method - Divide the existing relation into two relations one with non-repeating attributes and the other with repeating attributes

Make a new relation consisting of the primary key of the original relation and the repeating attributes Determine a primary key for this new relation

Remove the repeating attributes from the original relation

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1P2

AnalystAnalyst

246

E3 A Lee P3P4

ConsultantEngineer

1048

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

ENo EName

E1 J Doe

E2 M Smith

E3 A Lee

E4 J Miller

E5 B Casey

E6 L Chu

E7 R Davis

ENo PNo Resp DurationE1 P1 Manager 12E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36

Second Normal Form (2NF)

A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key

If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation

Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key

Second Normal Form (2NF) Example

fd1 and fd4 are partial functional dependencies Normalize to

Emp (eno ename title bdate salary supereno dno)

WorksOn (eno pno resp hours)

Proj (pno pname budget)

ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration

EmpProj relation

fd1 fd2

fd3fd4

Second Normal Form (2NF) Example (2)

ENo EName

BDate Title Salary SuperNo

DNo

Emp relation

fd1

fd2

fd3fd4

PNo PName Budget

Proj relationENo PNo Resp Duration

WorkOn relation

Third Normal Form (3NF)

Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th

an 2 FDs

A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key

Converting a relation to 3NF from 2NF involves the removal of transitive dependencies

Third Normal Form (3NF) Example

fd2 results in a transitive dependency eno rarr salary Remove it

ENo EName BDate Title Salary SuperNo DNo

Emp relation

fd1

fd2

fd2

Title Salary

Pay relation

ENo EName BDate Title SuperNo DNo

Emp relation

fd1

General Definitions of 2NF and 3NF

We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)

General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key

General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key

Note that a prime attribute is an attribute that is in any key (candidate or primary)

General Definition of 3NF Example

The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute

However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key

Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key

ENo EName BDate Title Salary SuperNo DNo SSN

Emp relation

fd1

fd2

Normalization Question

Consider the universal relation

R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ

List the keys for R

Decompose R into 2NF and then 3NF relations

Boyce-Codd Normal Form (BCNF)

A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key

To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key

The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

ClientInterview relation

Relation is in 3NF but not BCNF

fd4

fd3

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

Note that we lose the FD fd2

fd4

fd3

ClientNo InterviewDate

InterviwTime

StaffNo StaffNo InterviwDate

RoomNo

Interview relation StaffRoom relation

BCNF versus 3NF

We can decompose to BCNF but sometimes we do not want to if we lose a FD

The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency

Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)

In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)

Conversion to BCNF

There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y

Note that if any FD in F+ violates BCNF then a FD in F violates BCNF

2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey

3) Replace R by relations with schemas R1 = X+

R2 = (R ndash X+) U X

4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib

utes are all in R1 or R2

Normalization Question

Given this database schema normalize into BCNF

fd1

PID Country Lot Area Price TaxRate

fd2

fd4fd3

Multi-Valued Dependencies

A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema

A MVD occurs when two independent 1N relationships are in the relational schema

Multi-Valued Dependencies Example

Eno ename Title Salary Pno dno

E1 J Doe EE 30000 P1P2

E2 M Smith SA 50000 P1P4 D1D2D3

E3 A Lee ME 40000 P1P3 D2D3

E4 J Miller PR 20000 P3 D3

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1

E1 J Doe EE 30000 P2

E2 M Smith SA 50000 P1 D1

E2 M Smith SA 50000 P1 D2

E2 M Smith SA 50000 P1 D3

E2 M Smith SA 50000 P4 D1

E2 M Smith SA 50000 P4 D2

E2 M Smith SA 50000 P4 D3

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P3 D3

E3 A Lee ME 40000 P3 D3

E4 J Miller PR 20000 P3 D3

Multi-Valued Dependencies (MVDs)

A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C

A trivial MVD A mdashgtgt B occurs when either B is a subset of A or

A cup B = R

Fourth Normal Form (4NF)

Fourth normal form (4NF) is based on the idea of multi-valued dependencies

A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies

Fourth Normal Form (4NF) Example

eno mdashgtgtpno

eno mdashgtgtdno

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3

eno ename title salary

E1 J Doe EE 30000

E2 MSmith SA 50000

E3 A Lee ME 40000

E4 J Miller PR 20000

eno pno

E1 P1

E1 P2

E2 P1

E2 P4

E3 P1

E3 P3

E4 P3

eno dno

E2 D1

E2 D2

E2 D3

E3 D2

E3 D3

E4 D3

Lossless-join Dependency

The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation

A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined

Fifth Normal Form (5NF)

bull Fifth normal form (5NF) is based on join dependencies

A relation is in fifth normal form (5NF) if the relation has no join dependency

bull 5NF is also called project-join normal form (PJNF)

A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R

The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have

ΠR1(r) ΠR2(r) hellip ΠRn(r) = r

Fifth Normal Form (5NF) Example

Consider a relation Supply (sname partName projName)

1048715Add the additional constraint that

If project j requires part p

and supplier s supplies part p

and supplier s supplies at least one item to project j

Then supplier s also supplies part p to project j

Original Supply Relation Supply Relation with new constraintsname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

Fifth Normal Form (5NF) Example

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

sname partName

Smith Bolt

Smith Nut

Adam Bolt

Walton Nut

Adam Nail

sname projName

Smith X

Smith Y

Adam Y

Walton Z

Adam X

partName projName

Bolt X

Nut Y

Bolt Y

Nut Z

Nail X

Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples

Desirable Properties in normalization

Relational schemas that are well-designed have several

important properties

1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or

relationship

2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins

3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations

Normal Forms in Practice

Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used

For example query execution time may increase because of normalization as more joins become necessary to answer queries

Normal Forms in Practice Example

For example street and city uniquely determine a zipcode

In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed

When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly

fd1

ENo City Street ZipCode

fd2

ENo ZipCode

ZipCode City Street

Normal Forms in Practice Summary

In summary normalization is typically used in two ways

To improve on relational schemas after ER design

To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies

There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance

This is the classic time versus space tradeoff

Conclusion

Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies

Normal forms indicate when relations satisfy certain properties

1NF - All attributes are atomic

2NF - All attributes are fully functionally dependent on a key

3NF - There are no transitive dependencies in the relation

4NF - There are no multi-valued dependencies in the relation

5NF - The relation has no join dependency

In practice normalization is used to improve schemas produced after ER design and existing relational schemas

It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time

Objectives

Explain process of normalization and why it is beneficial

Describe the concept of the Universal Relation

What are the motivations for normalization

What are the 3 types of update anomalies Be able to give examples

Describe the lossless-join property and dependency preservation property

Define and explain the concept of a functional dependency

What is a trivial FD

Describe the relationship between FDs and keys

What are Armstrongs Axioms When are they used

What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs

Objectives (2)

Define minimal and covers for a set of FDs

Define normal forms be able to list the normal forms and draw a diagram showing their relationship

Define 1NF Know how to convert non-1NF relation to 1NF

Define 2NF Convert 1NF relation to 2NF

Define 3NF Convert 2NF relation to 3NF

Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example

Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm

Describe multi-valued dependencies When do they occur

What is a trivial MVD

Define 4NF Convert 3NFBCNF relation to 4NF

Objectives (3)

Describe lossless-join property and how it relates to 5NF

Define 5NF

Explain normalization in terms of the time versus space tradeoff

Explain the relationship between normalization and ER modeling

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
Page 18: Software School of Hunan University 2006.08 Database Systems Design Part III Normalization

When to use normalization

Normalization can be used after mapping ER model into relation model or independently

Normalization may be especially useful for databases that have already been designed without using formal techniques

Theory basic of normalization

Functional Dependencies ( 函数依赖) Functional dependencies represent integrity constraints ( 完整

性约束) on the values of attributes in a relation and are used in normalization

A functional dependency (FD) describes the relationship between attributes in a relation We say a set of attributes X functionally determines an attribute Y if given the values of X we always know the only possible value of Y

Notation X rarr Y X functionally determines Y Y is functionally dependent on XExample

eno rarr ename eno pno rarr Duration

Notation for Functional Dependencies

A functional dependency has a left-hand side called the determinant which is a set of attributes and one attribute on the right-hand side

eno pno rarr duration

determinant determined attribute

Strictly speaking there is always only one attribute on the RHS but we can combine several functional dependencies into one

eno pno rarr durationeno pno rarr respeno pno rarr duration resp

Remember that this is really two functional dependencies

The Semantics ofFunctional Dependencies

Similar to keys Functional dependencies are a property of a relation schema (intension) and not a property of a particular instance of the schema (extension)

Functional dependencies are directional

eno rarr ename ename rarr eno

ExampleFDs in EmpDept relation

bull EmpDept RelationENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8E8 J Jones 10-11-72 SA 50000 null D1 Management E8

Eno rarr ename bdate title salary superno dno

dnorarrdname mgrno

titlerarr salary

Trivial Functional Dependencies ( 平凡函数依赖 )

A functional dependency is trivialif the attributes on its left-hand side are a superset of the attributes on its right-hand side

Examples eno rarr eno

eno ename rarr eno

eno pno duration rarr eno Duration

Trivial functional dependencies are not interesting because they do not tell us anything

We are only interested in nontrivial FDs

Full Functional Dependencies

A functional dependency A rarr B is a full functional dependency if removal of any attribute from A results in the dependency not existing any more We say that B is fully functionally dependent on A If remove an attribute from A and the functional dependency still exists we say that B is partially dependent on A

Example

eno rarr ename (full FD)

eno ename rarr salary title (partial FD - can remove ename)

eno pno rarr hours resp (full FD)

How to Identify FDs in a relation

Examining the semantics of the attributes of relation

Identify all non-trivial functional dependencies in the Universal Relation

Universal (eno pno resp duration ename bdate title salary superNo dno dname mgrNo pname budget)

Eno rarr ename bdate title salary superno dnodnorarrdname mgrnotitlerarr salary

Formal Definition ofFunctional Dependencies

Given relation R defined over U = A1 A2 An where X U Y U If for all pairs of tuples t1 and t2 in any legal instance of relation schema R

t1[X] = t2[X] implies t1[Y] = t2[Y]

then the functional dependency X rarr Y holds in R

Functional Dependencies and Keys

Functional dependencies can be used to determine the candidate keys of a relation

What is superkey candidate key primary key

For example if an attribute functionally determines all other attributes in the relation that attribute can be a key

eno rarr eno ename bdate title supereno dno eno is a candidate key for the Emp relation

Functional Dependencies and Keys

ENo rarr EName BDate Title SuperNo DNoDNo rarr DName MgreNotitle rarr salary

Question List the candidate key(s) of this relation

ENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5

E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7

E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5

E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5

E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7

E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8

E8 J Jones 10-11-72 SA 50000 null D1 Management E8

EmpDept relation

Applications of FDs

1) given a set of FDs F in relation R FD X rarr Y is true or not

2) given a set of FDs F in relation R a set of attributes X is a candidate key of R or not

3) given a set of FDs F in relation R which candidate keys are in R

4) given a set of FDs F in relation R some FDs in F is Excessive or not

Rules for reasoning about functional dependencies

Armstrongs Axioms are three inference rules for functional dependenciesLet A B C and D be subsets of the attributes of relation R

IR1 Reflexive Rule( 自反性 ) If AB then A rarr B

A set of attributes always trivially functionally determines any subset of itself

IR2 Augmentation Rule( 增加性 ) If A rarr B then AC rarr BC

Adding a set of attributes to both the LHS and RHS results in another valid functional dependency

IR3 Transitive Rule( 传递性 ) If A rarr B and B rarr C then A rarr C

Transitivity holds with FDs

Derived Rules from Armstrongs Axioms

There are three other well-known rules that can be derived from the basic three

IR4 Decomposition( 可分解性 ) If A rarr BC then A rarr B and A rarr C A FD with two attributes on RHS can be decomposed into two FDs each with on attribute on the RHS

IR5 Union( 合并性 ) If A rarr B and A rarr C then A rarr BC Reverse of decomposition rule

IR6 Composition( 组合性 ) If A rarr B and C rarr D then AC rarr BD More general than union - can combine two non-overlapping FDs into one

Characteristics of Armstrongs Axioms

The rules are sound( 健壮的 ) and complete( 完备的 )

1) Sound The rules do not infer( 推断 ) any incorrect FDs

2) Complete The rules can be used to infer all correct FDs (no FDs are missing)

The axioms (called IR1 IR2 and IR3) can be used to determine a set of all FDs F+ from an initial set of FDs F

F+ is referred to as the closure of F

Strategy of dealing with questions

The basic idea is that given any set of attributes X we can compute the set of all attributes X+ that can be functionally determined using F This is called the closure of X under F

For question 1 given a set of FDs F in R FD X rarr Y is true or not we know that X rarr Y holds if Y is in X+

For question 2 a set of attributes X is a candidate key of R or not X is a superkey of R if X+ is all attributes of R

For question 3 how to computing F+ we can compute X+ for all possible subsets X of R

Computing the Attribute Closure

The algorithm is as follows

Given a set of attributes X

Let X+ = X

Repeat Find a FD in F whose left side is a subset of X+ Add the right side of F to X+

Until (X+ does not change or X+ contains all attributes of R)

After the algorithm completes you have a set of attributes X+ that can be functionally determined by X This allows you to produce FDs of the form

X rarr A where A is in X+

Computing Attribute Closure Example

R (ABCDEG)

F = A rarr BC C rarr D D rarr G

Compute A+

A+ = A (initial step)

A+ = ABC (by using FD A rarr BC)

A+ = ABCD (by using FD C rarr D)

A+ = ABCDG (by using FD D rarr G)

Similarly we can compute C+ and EG+

C+ = CDG

EG+ = EG

Using Attribute Closure to Detect a FD

The attribute closure algorithm can be used to determine if a particular FD X rarr Y holds based on the set of given FDs F Compute X+ and see if Y is in X+

Example

R = (ABCDEG)

F = A rarr BC C rarr D D rarr G

Does the FD CD rarr G hold Compute CD+ = CDG Yes the FD CD rarr G holds

Does the FD BC rarr E hold Compute BC+ = BCDG No the FD BC rarr E does n

ot hold

Using Attribute Closure to Find all FDs

The attribute closure algorithm can be used to find all FDs of a given relation R if perform the attribute closure algorithm for all subsets X of the attributes R

This is an exponential algorithm in the number of attributes in R Although there are a few optimizations Do not compute the closure of the empty set or of the set of all attributes If we find X+ = all attributes do not compute the closure of any supersets of X Note that even though you do not have to compute the closure (because you know the answer) you still must use the known closure to infer new FDs

Using Attribute Closureto Find all FDs Example

Compute F+ from F where

R (ABCD)

F = A rarr BC C rarr D

Must compute all these attribute closures

A+ B+ C+ D+

AB+ AC+ AD+ BC+ BD+ CD+

ABC+ ABD+ ACD+ BCD+

Do not have to compute ABCD+

Start with the single sets

A+ = ABCD B+ = B C+ = CD D+ = D

New FD A rarr D

Using Attribute Closureto Find all FDs Example (cont)

Next the two element sets Do not compute AB+ AC+ AD+ because A+ contained all attributes of R New FDs AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C BC+ = BCD BD+ = BD CD+ = CD New FD BC rarr DThree element sets Do not compute ABC+ ABD+ ACD+ New FDs ABC rarr D ABD rarr C ACD rarr B BCD+ = BCD No new FDsAll FDs F+ = A rarr BC C rarr D A rarr D BC rarr D AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C ABC rarr D ABD rarrC ACD rarr B

FD Closure Questions

1) Compute all nontrivial FDs and list all keys of R

R = (ABCD)

F = AB rarr C C rarr D D rarr A

2) Compute all nontrivial FDs and list all candidate keys of R

R = (ABCD)

F = AB rarr C BC rarr D CD rarr A AD rarr B

Equivalent Sets of FDs

A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+ We can determine if F covers E by calculating X+ with respect to F for each FD X rarr Y in E and checking whether X+ includes Y

Two sets of FDs E and F are equivalent if E covers F and F covers E That is E+= F+

Minimal Set of FDs

A set of FDs F is minimal if it satisfies these conditions Every FD in F has a single attribute on its right-hand side We cannot replace any FD X rarr A in F with Y rarr A where Y

X and still have a set of dependencies that is equivalent to F

We cannot remove any FD and still have a set that is equivalent to F

A minimal sets of FDs is like a canonical( 规范的 ) form of FDs

Note that there may be many minimal covers for a given set of functional dependencies F

Normal Forms

A relation is in a particular normal form if it satisfies certain

normalization properties

There are several normal forms defined 1NF - First Normal Form 2NF - Second Normal Form 3NF - Third Normal Form BCNF - Boyce-Codd Normal Form 4NF - Fourth Normal Form 5NF - Fifth Normal Form

Each of these normal forms are stricter than the next

For example 3NF is better than 2NF because it removes more redundancyanomalies from the schema than 2NF

Normal Forms

All relations (non-normalized)

1NF (First Normal Form)

2NF (Second Normal Form)

3NF (Third Normal Form)

BCNF (Boyce-Codd NF)

4NF (Fourth NF)

5NF (Fifth NF)

First Normal Form (1NF)

A relation is in first normal form (1NF) if all its attribute values are atomic

That is a 1NF relation cannot have an attribute value that is a set of values (multi-valued attribute) a set of tuples (nested relation)

1NF is a standard assumption in relational DBMSs However object-oriented DBMSs and nested relational DBMSs relax t

his constraint

A relation that is not in 1NF is an unnormalized relation

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1P2

AnalystAnalyst

246

E3 A Lee P3P4

ConsultantEngineer

1048

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1 P2 Analyst Analyst 24 6

E3 A Lee P3P4 Consultant Engineer 10 48

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

A non-1NF Relation

One way to convert a non-1NF relation to a 1NF relation

1) Splitting Method - Divide the existing relation into two relations one with non-repeating attributes and the other with repeating attributes

Make a new relation consisting of the primary key of the original relation and the repeating attributes Determine a primary key for this new relation

Remove the repeating attributes from the original relation

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1P2

AnalystAnalyst

246

E3 A Lee P3P4

ConsultantEngineer

1048

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

ENo EName

E1 J Doe

E2 M Smith

E3 A Lee

E4 J Miller

E5 B Casey

E6 L Chu

E7 R Davis

ENo PNo Resp DurationE1 P1 Manager 12E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36

Second Normal Form (2NF)

A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key

If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation

Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key

Second Normal Form (2NF) Example

fd1 and fd4 are partial functional dependencies Normalize to

Emp (eno ename title bdate salary supereno dno)

WorksOn (eno pno resp hours)

Proj (pno pname budget)

ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration

EmpProj relation

fd1 fd2

fd3fd4

Second Normal Form (2NF) Example (2)

ENo EName

BDate Title Salary SuperNo

DNo

Emp relation

fd1

fd2

fd3fd4

PNo PName Budget

Proj relationENo PNo Resp Duration

WorkOn relation

Third Normal Form (3NF)

Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th

an 2 FDs

A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key

Converting a relation to 3NF from 2NF involves the removal of transitive dependencies

Third Normal Form (3NF) Example

fd2 results in a transitive dependency eno rarr salary Remove it

ENo EName BDate Title Salary SuperNo DNo

Emp relation

fd1

fd2

fd2

Title Salary

Pay relation

ENo EName BDate Title SuperNo DNo

Emp relation

fd1

General Definitions of 2NF and 3NF

We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)

General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key

General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key

Note that a prime attribute is an attribute that is in any key (candidate or primary)

General Definition of 3NF Example

The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute

However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key

Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key

ENo EName BDate Title Salary SuperNo DNo SSN

Emp relation

fd1

fd2

Normalization Question

Consider the universal relation

R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ

List the keys for R

Decompose R into 2NF and then 3NF relations

Boyce-Codd Normal Form (BCNF)

A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key

To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key

The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

ClientInterview relation

Relation is in 3NF but not BCNF

fd4

fd3

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

Note that we lose the FD fd2

fd4

fd3

ClientNo InterviewDate

InterviwTime

StaffNo StaffNo InterviwDate

RoomNo

Interview relation StaffRoom relation

BCNF versus 3NF

We can decompose to BCNF but sometimes we do not want to if we lose a FD

The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency

Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)

In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)

Conversion to BCNF

There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y

Note that if any FD in F+ violates BCNF then a FD in F violates BCNF

2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey

3) Replace R by relations with schemas R1 = X+

R2 = (R ndash X+) U X

4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib

utes are all in R1 or R2

Normalization Question

Given this database schema normalize into BCNF

fd1

PID Country Lot Area Price TaxRate

fd2

fd4fd3

Multi-Valued Dependencies

A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema

A MVD occurs when two independent 1N relationships are in the relational schema

Multi-Valued Dependencies Example

Eno ename Title Salary Pno dno

E1 J Doe EE 30000 P1P2

E2 M Smith SA 50000 P1P4 D1D2D3

E3 A Lee ME 40000 P1P3 D2D3

E4 J Miller PR 20000 P3 D3

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1

E1 J Doe EE 30000 P2

E2 M Smith SA 50000 P1 D1

E2 M Smith SA 50000 P1 D2

E2 M Smith SA 50000 P1 D3

E2 M Smith SA 50000 P4 D1

E2 M Smith SA 50000 P4 D2

E2 M Smith SA 50000 P4 D3

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P3 D3

E3 A Lee ME 40000 P3 D3

E4 J Miller PR 20000 P3 D3

Multi-Valued Dependencies (MVDs)

A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C

A trivial MVD A mdashgtgt B occurs when either B is a subset of A or

A cup B = R

Fourth Normal Form (4NF)

Fourth normal form (4NF) is based on the idea of multi-valued dependencies

A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies

Fourth Normal Form (4NF) Example

eno mdashgtgtpno

eno mdashgtgtdno

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3

eno ename title salary

E1 J Doe EE 30000

E2 MSmith SA 50000

E3 A Lee ME 40000

E4 J Miller PR 20000

eno pno

E1 P1

E1 P2

E2 P1

E2 P4

E3 P1

E3 P3

E4 P3

eno dno

E2 D1

E2 D2

E2 D3

E3 D2

E3 D3

E4 D3

Lossless-join Dependency

The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation

A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined

Fifth Normal Form (5NF)

bull Fifth normal form (5NF) is based on join dependencies

A relation is in fifth normal form (5NF) if the relation has no join dependency

bull 5NF is also called project-join normal form (PJNF)

A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R

The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have

ΠR1(r) ΠR2(r) hellip ΠRn(r) = r

Fifth Normal Form (5NF) Example

Consider a relation Supply (sname partName projName)

1048715Add the additional constraint that

If project j requires part p

and supplier s supplies part p

and supplier s supplies at least one item to project j

Then supplier s also supplies part p to project j

Original Supply Relation Supply Relation with new constraintsname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

Fifth Normal Form (5NF) Example

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

sname partName

Smith Bolt

Smith Nut

Adam Bolt

Walton Nut

Adam Nail

sname projName

Smith X

Smith Y

Adam Y

Walton Z

Adam X

partName projName

Bolt X

Nut Y

Bolt Y

Nut Z

Nail X

Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples

Desirable Properties in normalization

Relational schemas that are well-designed have several

important properties

1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or

relationship

2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins

3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations

Normal Forms in Practice

Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used

For example query execution time may increase because of normalization as more joins become necessary to answer queries

Normal Forms in Practice Example

For example street and city uniquely determine a zipcode

In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed

When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly

fd1

ENo City Street ZipCode

fd2

ENo ZipCode

ZipCode City Street

Normal Forms in Practice Summary

In summary normalization is typically used in two ways

To improve on relational schemas after ER design

To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies

There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance

This is the classic time versus space tradeoff

Conclusion

Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies

Normal forms indicate when relations satisfy certain properties

1NF - All attributes are atomic

2NF - All attributes are fully functionally dependent on a key

3NF - There are no transitive dependencies in the relation

4NF - There are no multi-valued dependencies in the relation

5NF - The relation has no join dependency

In practice normalization is used to improve schemas produced after ER design and existing relational schemas

It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time

Objectives

Explain process of normalization and why it is beneficial

Describe the concept of the Universal Relation

What are the motivations for normalization

What are the 3 types of update anomalies Be able to give examples

Describe the lossless-join property and dependency preservation property

Define and explain the concept of a functional dependency

What is a trivial FD

Describe the relationship between FDs and keys

What are Armstrongs Axioms When are they used

What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs

Objectives (2)

Define minimal and covers for a set of FDs

Define normal forms be able to list the normal forms and draw a diagram showing their relationship

Define 1NF Know how to convert non-1NF relation to 1NF

Define 2NF Convert 1NF relation to 2NF

Define 3NF Convert 2NF relation to 3NF

Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example

Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm

Describe multi-valued dependencies When do they occur

What is a trivial MVD

Define 4NF Convert 3NFBCNF relation to 4NF

Objectives (3)

Describe lossless-join property and how it relates to 5NF

Define 5NF

Explain normalization in terms of the time versus space tradeoff

Explain the relationship between normalization and ER modeling

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
Page 19: Software School of Hunan University 2006.08 Database Systems Design Part III Normalization

Theory basic of normalization

Functional Dependencies ( 函数依赖) Functional dependencies represent integrity constraints ( 完整

性约束) on the values of attributes in a relation and are used in normalization

A functional dependency (FD) describes the relationship between attributes in a relation We say a set of attributes X functionally determines an attribute Y if given the values of X we always know the only possible value of Y

Notation X rarr Y X functionally determines Y Y is functionally dependent on XExample

eno rarr ename eno pno rarr Duration

Notation for Functional Dependencies

A functional dependency has a left-hand side called the determinant which is a set of attributes and one attribute on the right-hand side

eno pno rarr duration

determinant determined attribute

Strictly speaking there is always only one attribute on the RHS but we can combine several functional dependencies into one

eno pno rarr durationeno pno rarr respeno pno rarr duration resp

Remember that this is really two functional dependencies

The Semantics ofFunctional Dependencies

Similar to keys Functional dependencies are a property of a relation schema (intension) and not a property of a particular instance of the schema (extension)

Functional dependencies are directional

eno rarr ename ename rarr eno

ExampleFDs in EmpDept relation

bull EmpDept RelationENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8E8 J Jones 10-11-72 SA 50000 null D1 Management E8

Eno rarr ename bdate title salary superno dno

dnorarrdname mgrno

titlerarr salary

Trivial Functional Dependencies ( 平凡函数依赖 )

A functional dependency is trivialif the attributes on its left-hand side are a superset of the attributes on its right-hand side

Examples eno rarr eno

eno ename rarr eno

eno pno duration rarr eno Duration

Trivial functional dependencies are not interesting because they do not tell us anything

We are only interested in nontrivial FDs

Full Functional Dependencies

A functional dependency A rarr B is a full functional dependency if removal of any attribute from A results in the dependency not existing any more We say that B is fully functionally dependent on A If remove an attribute from A and the functional dependency still exists we say that B is partially dependent on A

Example

eno rarr ename (full FD)

eno ename rarr salary title (partial FD - can remove ename)

eno pno rarr hours resp (full FD)

How to Identify FDs in a relation

Examining the semantics of the attributes of relation

Identify all non-trivial functional dependencies in the Universal Relation

Universal (eno pno resp duration ename bdate title salary superNo dno dname mgrNo pname budget)

Eno rarr ename bdate title salary superno dnodnorarrdname mgrnotitlerarr salary

Formal Definition ofFunctional Dependencies

Given relation R defined over U = A1 A2 An where X U Y U If for all pairs of tuples t1 and t2 in any legal instance of relation schema R

t1[X] = t2[X] implies t1[Y] = t2[Y]

then the functional dependency X rarr Y holds in R

Functional Dependencies and Keys

Functional dependencies can be used to determine the candidate keys of a relation

What is superkey candidate key primary key

For example if an attribute functionally determines all other attributes in the relation that attribute can be a key

eno rarr eno ename bdate title supereno dno eno is a candidate key for the Emp relation

Functional Dependencies and Keys

ENo rarr EName BDate Title SuperNo DNoDNo rarr DName MgreNotitle rarr salary

Question List the candidate key(s) of this relation

ENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5

E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7

E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5

E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5

E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7

E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8

E8 J Jones 10-11-72 SA 50000 null D1 Management E8

EmpDept relation

Applications of FDs

1) given a set of FDs F in relation R FD X rarr Y is true or not

2) given a set of FDs F in relation R a set of attributes X is a candidate key of R or not

3) given a set of FDs F in relation R which candidate keys are in R

4) given a set of FDs F in relation R some FDs in F is Excessive or not

Rules for reasoning about functional dependencies

Armstrongs Axioms are three inference rules for functional dependenciesLet A B C and D be subsets of the attributes of relation R

IR1 Reflexive Rule( 自反性 ) If AB then A rarr B

A set of attributes always trivially functionally determines any subset of itself

IR2 Augmentation Rule( 增加性 ) If A rarr B then AC rarr BC

Adding a set of attributes to both the LHS and RHS results in another valid functional dependency

IR3 Transitive Rule( 传递性 ) If A rarr B and B rarr C then A rarr C

Transitivity holds with FDs

Derived Rules from Armstrongs Axioms

There are three other well-known rules that can be derived from the basic three

IR4 Decomposition( 可分解性 ) If A rarr BC then A rarr B and A rarr C A FD with two attributes on RHS can be decomposed into two FDs each with on attribute on the RHS

IR5 Union( 合并性 ) If A rarr B and A rarr C then A rarr BC Reverse of decomposition rule

IR6 Composition( 组合性 ) If A rarr B and C rarr D then AC rarr BD More general than union - can combine two non-overlapping FDs into one

Characteristics of Armstrongs Axioms

The rules are sound( 健壮的 ) and complete( 完备的 )

1) Sound The rules do not infer( 推断 ) any incorrect FDs

2) Complete The rules can be used to infer all correct FDs (no FDs are missing)

The axioms (called IR1 IR2 and IR3) can be used to determine a set of all FDs F+ from an initial set of FDs F

F+ is referred to as the closure of F

Strategy of dealing with questions

The basic idea is that given any set of attributes X we can compute the set of all attributes X+ that can be functionally determined using F This is called the closure of X under F

For question 1 given a set of FDs F in R FD X rarr Y is true or not we know that X rarr Y holds if Y is in X+

For question 2 a set of attributes X is a candidate key of R or not X is a superkey of R if X+ is all attributes of R

For question 3 how to computing F+ we can compute X+ for all possible subsets X of R

Computing the Attribute Closure

The algorithm is as follows

Given a set of attributes X

Let X+ = X

Repeat Find a FD in F whose left side is a subset of X+ Add the right side of F to X+

Until (X+ does not change or X+ contains all attributes of R)

After the algorithm completes you have a set of attributes X+ that can be functionally determined by X This allows you to produce FDs of the form

X rarr A where A is in X+

Computing Attribute Closure Example

R (ABCDEG)

F = A rarr BC C rarr D D rarr G

Compute A+

A+ = A (initial step)

A+ = ABC (by using FD A rarr BC)

A+ = ABCD (by using FD C rarr D)

A+ = ABCDG (by using FD D rarr G)

Similarly we can compute C+ and EG+

C+ = CDG

EG+ = EG

Using Attribute Closure to Detect a FD

The attribute closure algorithm can be used to determine if a particular FD X rarr Y holds based on the set of given FDs F Compute X+ and see if Y is in X+

Example

R = (ABCDEG)

F = A rarr BC C rarr D D rarr G

Does the FD CD rarr G hold Compute CD+ = CDG Yes the FD CD rarr G holds

Does the FD BC rarr E hold Compute BC+ = BCDG No the FD BC rarr E does n

ot hold

Using Attribute Closure to Find all FDs

The attribute closure algorithm can be used to find all FDs of a given relation R if perform the attribute closure algorithm for all subsets X of the attributes R

This is an exponential algorithm in the number of attributes in R Although there are a few optimizations Do not compute the closure of the empty set or of the set of all attributes If we find X+ = all attributes do not compute the closure of any supersets of X Note that even though you do not have to compute the closure (because you know the answer) you still must use the known closure to infer new FDs

Using Attribute Closureto Find all FDs Example

Compute F+ from F where

R (ABCD)

F = A rarr BC C rarr D

Must compute all these attribute closures

A+ B+ C+ D+

AB+ AC+ AD+ BC+ BD+ CD+

ABC+ ABD+ ACD+ BCD+

Do not have to compute ABCD+

Start with the single sets

A+ = ABCD B+ = B C+ = CD D+ = D

New FD A rarr D

Using Attribute Closureto Find all FDs Example (cont)

Next the two element sets Do not compute AB+ AC+ AD+ because A+ contained all attributes of R New FDs AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C BC+ = BCD BD+ = BD CD+ = CD New FD BC rarr DThree element sets Do not compute ABC+ ABD+ ACD+ New FDs ABC rarr D ABD rarr C ACD rarr B BCD+ = BCD No new FDsAll FDs F+ = A rarr BC C rarr D A rarr D BC rarr D AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C ABC rarr D ABD rarrC ACD rarr B

FD Closure Questions

1) Compute all nontrivial FDs and list all keys of R

R = (ABCD)

F = AB rarr C C rarr D D rarr A

2) Compute all nontrivial FDs and list all candidate keys of R

R = (ABCD)

F = AB rarr C BC rarr D CD rarr A AD rarr B

Equivalent Sets of FDs

A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+ We can determine if F covers E by calculating X+ with respect to F for each FD X rarr Y in E and checking whether X+ includes Y

Two sets of FDs E and F are equivalent if E covers F and F covers E That is E+= F+

Minimal Set of FDs

A set of FDs F is minimal if it satisfies these conditions Every FD in F has a single attribute on its right-hand side We cannot replace any FD X rarr A in F with Y rarr A where Y

X and still have a set of dependencies that is equivalent to F

We cannot remove any FD and still have a set that is equivalent to F

A minimal sets of FDs is like a canonical( 规范的 ) form of FDs

Note that there may be many minimal covers for a given set of functional dependencies F

Normal Forms

A relation is in a particular normal form if it satisfies certain

normalization properties

There are several normal forms defined 1NF - First Normal Form 2NF - Second Normal Form 3NF - Third Normal Form BCNF - Boyce-Codd Normal Form 4NF - Fourth Normal Form 5NF - Fifth Normal Form

Each of these normal forms are stricter than the next

For example 3NF is better than 2NF because it removes more redundancyanomalies from the schema than 2NF

Normal Forms

All relations (non-normalized)

1NF (First Normal Form)

2NF (Second Normal Form)

3NF (Third Normal Form)

BCNF (Boyce-Codd NF)

4NF (Fourth NF)

5NF (Fifth NF)

First Normal Form (1NF)

A relation is in first normal form (1NF) if all its attribute values are atomic

That is a 1NF relation cannot have an attribute value that is a set of values (multi-valued attribute) a set of tuples (nested relation)

1NF is a standard assumption in relational DBMSs However object-oriented DBMSs and nested relational DBMSs relax t

his constraint

A relation that is not in 1NF is an unnormalized relation

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1P2

AnalystAnalyst

246

E3 A Lee P3P4

ConsultantEngineer

1048

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1 P2 Analyst Analyst 24 6

E3 A Lee P3P4 Consultant Engineer 10 48

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

A non-1NF Relation

One way to convert a non-1NF relation to a 1NF relation

1) Splitting Method - Divide the existing relation into two relations one with non-repeating attributes and the other with repeating attributes

Make a new relation consisting of the primary key of the original relation and the repeating attributes Determine a primary key for this new relation

Remove the repeating attributes from the original relation

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1P2

AnalystAnalyst

246

E3 A Lee P3P4

ConsultantEngineer

1048

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

ENo EName

E1 J Doe

E2 M Smith

E3 A Lee

E4 J Miller

E5 B Casey

E6 L Chu

E7 R Davis

ENo PNo Resp DurationE1 P1 Manager 12E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36

Second Normal Form (2NF)

A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key

If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation

Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key

Second Normal Form (2NF) Example

fd1 and fd4 are partial functional dependencies Normalize to

Emp (eno ename title bdate salary supereno dno)

WorksOn (eno pno resp hours)

Proj (pno pname budget)

ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration

EmpProj relation

fd1 fd2

fd3fd4

Second Normal Form (2NF) Example (2)

ENo EName

BDate Title Salary SuperNo

DNo

Emp relation

fd1

fd2

fd3fd4

PNo PName Budget

Proj relationENo PNo Resp Duration

WorkOn relation

Third Normal Form (3NF)

Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th

an 2 FDs

A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key

Converting a relation to 3NF from 2NF involves the removal of transitive dependencies

Third Normal Form (3NF) Example

fd2 results in a transitive dependency eno rarr salary Remove it

ENo EName BDate Title Salary SuperNo DNo

Emp relation

fd1

fd2

fd2

Title Salary

Pay relation

ENo EName BDate Title SuperNo DNo

Emp relation

fd1

General Definitions of 2NF and 3NF

We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)

General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key

General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key

Note that a prime attribute is an attribute that is in any key (candidate or primary)

General Definition of 3NF Example

The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute

However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key

Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key

ENo EName BDate Title Salary SuperNo DNo SSN

Emp relation

fd1

fd2

Normalization Question

Consider the universal relation

R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ

List the keys for R

Decompose R into 2NF and then 3NF relations

Boyce-Codd Normal Form (BCNF)

A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key

To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key

The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

ClientInterview relation

Relation is in 3NF but not BCNF

fd4

fd3

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

Note that we lose the FD fd2

fd4

fd3

ClientNo InterviewDate

InterviwTime

StaffNo StaffNo InterviwDate

RoomNo

Interview relation StaffRoom relation

BCNF versus 3NF

We can decompose to BCNF but sometimes we do not want to if we lose a FD

The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency

Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)

In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)

Conversion to BCNF

There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y

Note that if any FD in F+ violates BCNF then a FD in F violates BCNF

2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey

3) Replace R by relations with schemas R1 = X+

R2 = (R ndash X+) U X

4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib

utes are all in R1 or R2

Normalization Question

Given this database schema normalize into BCNF

fd1

PID Country Lot Area Price TaxRate

fd2

fd4fd3

Multi-Valued Dependencies

A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema

A MVD occurs when two independent 1N relationships are in the relational schema

Multi-Valued Dependencies Example

Eno ename Title Salary Pno dno

E1 J Doe EE 30000 P1P2

E2 M Smith SA 50000 P1P4 D1D2D3

E3 A Lee ME 40000 P1P3 D2D3

E4 J Miller PR 20000 P3 D3

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1

E1 J Doe EE 30000 P2

E2 M Smith SA 50000 P1 D1

E2 M Smith SA 50000 P1 D2

E2 M Smith SA 50000 P1 D3

E2 M Smith SA 50000 P4 D1

E2 M Smith SA 50000 P4 D2

E2 M Smith SA 50000 P4 D3

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P3 D3

E3 A Lee ME 40000 P3 D3

E4 J Miller PR 20000 P3 D3

Multi-Valued Dependencies (MVDs)

A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C

A trivial MVD A mdashgtgt B occurs when either B is a subset of A or

A cup B = R

Fourth Normal Form (4NF)

Fourth normal form (4NF) is based on the idea of multi-valued dependencies

A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies

Fourth Normal Form (4NF) Example

eno mdashgtgtpno

eno mdashgtgtdno

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3

eno ename title salary

E1 J Doe EE 30000

E2 MSmith SA 50000

E3 A Lee ME 40000

E4 J Miller PR 20000

eno pno

E1 P1

E1 P2

E2 P1

E2 P4

E3 P1

E3 P3

E4 P3

eno dno

E2 D1

E2 D2

E2 D3

E3 D2

E3 D3

E4 D3

Lossless-join Dependency

The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation

A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined

Fifth Normal Form (5NF)

bull Fifth normal form (5NF) is based on join dependencies

A relation is in fifth normal form (5NF) if the relation has no join dependency

bull 5NF is also called project-join normal form (PJNF)

A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R

The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have

ΠR1(r) ΠR2(r) hellip ΠRn(r) = r

Fifth Normal Form (5NF) Example

Consider a relation Supply (sname partName projName)

1048715Add the additional constraint that

If project j requires part p

and supplier s supplies part p

and supplier s supplies at least one item to project j

Then supplier s also supplies part p to project j

Original Supply Relation Supply Relation with new constraintsname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

Fifth Normal Form (5NF) Example

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

sname partName

Smith Bolt

Smith Nut

Adam Bolt

Walton Nut

Adam Nail

sname projName

Smith X

Smith Y

Adam Y

Walton Z

Adam X

partName projName

Bolt X

Nut Y

Bolt Y

Nut Z

Nail X

Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples

Desirable Properties in normalization

Relational schemas that are well-designed have several

important properties

1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or

relationship

2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins

3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations

Normal Forms in Practice

Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used

For example query execution time may increase because of normalization as more joins become necessary to answer queries

Normal Forms in Practice Example

For example street and city uniquely determine a zipcode

In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed

When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly

fd1

ENo City Street ZipCode

fd2

ENo ZipCode

ZipCode City Street

Normal Forms in Practice Summary

In summary normalization is typically used in two ways

To improve on relational schemas after ER design

To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies

There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance

This is the classic time versus space tradeoff

Conclusion

Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies

Normal forms indicate when relations satisfy certain properties

1NF - All attributes are atomic

2NF - All attributes are fully functionally dependent on a key

3NF - There are no transitive dependencies in the relation

4NF - There are no multi-valued dependencies in the relation

5NF - The relation has no join dependency

In practice normalization is used to improve schemas produced after ER design and existing relational schemas

It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time

Objectives

Explain process of normalization and why it is beneficial

Describe the concept of the Universal Relation

What are the motivations for normalization

What are the 3 types of update anomalies Be able to give examples

Describe the lossless-join property and dependency preservation property

Define and explain the concept of a functional dependency

What is a trivial FD

Describe the relationship between FDs and keys

What are Armstrongs Axioms When are they used

What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs

Objectives (2)

Define minimal and covers for a set of FDs

Define normal forms be able to list the normal forms and draw a diagram showing their relationship

Define 1NF Know how to convert non-1NF relation to 1NF

Define 2NF Convert 1NF relation to 2NF

Define 3NF Convert 2NF relation to 3NF

Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example

Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm

Describe multi-valued dependencies When do they occur

What is a trivial MVD

Define 4NF Convert 3NFBCNF relation to 4NF

Objectives (3)

Describe lossless-join property and how it relates to 5NF

Define 5NF

Explain normalization in terms of the time versus space tradeoff

Explain the relationship between normalization and ER modeling

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
Page 20: Software School of Hunan University 2006.08 Database Systems Design Part III Normalization

Notation for Functional Dependencies

A functional dependency has a left-hand side called the determinant which is a set of attributes and one attribute on the right-hand side

eno pno rarr duration

determinant determined attribute

Strictly speaking there is always only one attribute on the RHS but we can combine several functional dependencies into one

eno pno rarr durationeno pno rarr respeno pno rarr duration resp

Remember that this is really two functional dependencies

The Semantics ofFunctional Dependencies

Similar to keys Functional dependencies are a property of a relation schema (intension) and not a property of a particular instance of the schema (extension)

Functional dependencies are directional

eno rarr ename ename rarr eno

ExampleFDs in EmpDept relation

bull EmpDept RelationENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8E8 J Jones 10-11-72 SA 50000 null D1 Management E8

Eno rarr ename bdate title salary superno dno

dnorarrdname mgrno

titlerarr salary

Trivial Functional Dependencies ( 平凡函数依赖 )

A functional dependency is trivialif the attributes on its left-hand side are a superset of the attributes on its right-hand side

Examples eno rarr eno

eno ename rarr eno

eno pno duration rarr eno Duration

Trivial functional dependencies are not interesting because they do not tell us anything

We are only interested in nontrivial FDs

Full Functional Dependencies

A functional dependency A rarr B is a full functional dependency if removal of any attribute from A results in the dependency not existing any more We say that B is fully functionally dependent on A If remove an attribute from A and the functional dependency still exists we say that B is partially dependent on A

Example

eno rarr ename (full FD)

eno ename rarr salary title (partial FD - can remove ename)

eno pno rarr hours resp (full FD)

How to Identify FDs in a relation

Examining the semantics of the attributes of relation

Identify all non-trivial functional dependencies in the Universal Relation

Universal (eno pno resp duration ename bdate title salary superNo dno dname mgrNo pname budget)

Eno rarr ename bdate title salary superno dnodnorarrdname mgrnotitlerarr salary

Formal Definition ofFunctional Dependencies

Given relation R defined over U = A1 A2 An where X U Y U If for all pairs of tuples t1 and t2 in any legal instance of relation schema R

t1[X] = t2[X] implies t1[Y] = t2[Y]

then the functional dependency X rarr Y holds in R

Functional Dependencies and Keys

Functional dependencies can be used to determine the candidate keys of a relation

What is superkey candidate key primary key

For example if an attribute functionally determines all other attributes in the relation that attribute can be a key

eno rarr eno ename bdate title supereno dno eno is a candidate key for the Emp relation

Functional Dependencies and Keys

ENo rarr EName BDate Title SuperNo DNoDNo rarr DName MgreNotitle rarr salary

Question List the candidate key(s) of this relation

ENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5

E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7

E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5

E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5

E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7

E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8

E8 J Jones 10-11-72 SA 50000 null D1 Management E8

EmpDept relation

Applications of FDs

1) given a set of FDs F in relation R FD X rarr Y is true or not

2) given a set of FDs F in relation R a set of attributes X is a candidate key of R or not

3) given a set of FDs F in relation R which candidate keys are in R

4) given a set of FDs F in relation R some FDs in F is Excessive or not

Rules for reasoning about functional dependencies

Armstrongs Axioms are three inference rules for functional dependenciesLet A B C and D be subsets of the attributes of relation R

IR1 Reflexive Rule( 自反性 ) If AB then A rarr B

A set of attributes always trivially functionally determines any subset of itself

IR2 Augmentation Rule( 增加性 ) If A rarr B then AC rarr BC

Adding a set of attributes to both the LHS and RHS results in another valid functional dependency

IR3 Transitive Rule( 传递性 ) If A rarr B and B rarr C then A rarr C

Transitivity holds with FDs

Derived Rules from Armstrongs Axioms

There are three other well-known rules that can be derived from the basic three

IR4 Decomposition( 可分解性 ) If A rarr BC then A rarr B and A rarr C A FD with two attributes on RHS can be decomposed into two FDs each with on attribute on the RHS

IR5 Union( 合并性 ) If A rarr B and A rarr C then A rarr BC Reverse of decomposition rule

IR6 Composition( 组合性 ) If A rarr B and C rarr D then AC rarr BD More general than union - can combine two non-overlapping FDs into one

Characteristics of Armstrongs Axioms

The rules are sound( 健壮的 ) and complete( 完备的 )

1) Sound The rules do not infer( 推断 ) any incorrect FDs

2) Complete The rules can be used to infer all correct FDs (no FDs are missing)

The axioms (called IR1 IR2 and IR3) can be used to determine a set of all FDs F+ from an initial set of FDs F

F+ is referred to as the closure of F

Strategy of dealing with questions

The basic idea is that given any set of attributes X we can compute the set of all attributes X+ that can be functionally determined using F This is called the closure of X under F

For question 1 given a set of FDs F in R FD X rarr Y is true or not we know that X rarr Y holds if Y is in X+

For question 2 a set of attributes X is a candidate key of R or not X is a superkey of R if X+ is all attributes of R

For question 3 how to computing F+ we can compute X+ for all possible subsets X of R

Computing the Attribute Closure

The algorithm is as follows

Given a set of attributes X

Let X+ = X

Repeat Find a FD in F whose left side is a subset of X+ Add the right side of F to X+

Until (X+ does not change or X+ contains all attributes of R)

After the algorithm completes you have a set of attributes X+ that can be functionally determined by X This allows you to produce FDs of the form

X rarr A where A is in X+

Computing Attribute Closure Example

R (ABCDEG)

F = A rarr BC C rarr D D rarr G

Compute A+

A+ = A (initial step)

A+ = ABC (by using FD A rarr BC)

A+ = ABCD (by using FD C rarr D)

A+ = ABCDG (by using FD D rarr G)

Similarly we can compute C+ and EG+

C+ = CDG

EG+ = EG

Using Attribute Closure to Detect a FD

The attribute closure algorithm can be used to determine if a particular FD X rarr Y holds based on the set of given FDs F Compute X+ and see if Y is in X+

Example

R = (ABCDEG)

F = A rarr BC C rarr D D rarr G

Does the FD CD rarr G hold Compute CD+ = CDG Yes the FD CD rarr G holds

Does the FD BC rarr E hold Compute BC+ = BCDG No the FD BC rarr E does n

ot hold

Using Attribute Closure to Find all FDs

The attribute closure algorithm can be used to find all FDs of a given relation R if perform the attribute closure algorithm for all subsets X of the attributes R

This is an exponential algorithm in the number of attributes in R Although there are a few optimizations Do not compute the closure of the empty set or of the set of all attributes If we find X+ = all attributes do not compute the closure of any supersets of X Note that even though you do not have to compute the closure (because you know the answer) you still must use the known closure to infer new FDs

Using Attribute Closureto Find all FDs Example

Compute F+ from F where

R (ABCD)

F = A rarr BC C rarr D

Must compute all these attribute closures

A+ B+ C+ D+

AB+ AC+ AD+ BC+ BD+ CD+

ABC+ ABD+ ACD+ BCD+

Do not have to compute ABCD+

Start with the single sets

A+ = ABCD B+ = B C+ = CD D+ = D

New FD A rarr D

Using Attribute Closureto Find all FDs Example (cont)

Next the two element sets Do not compute AB+ AC+ AD+ because A+ contained all attributes of R New FDs AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C BC+ = BCD BD+ = BD CD+ = CD New FD BC rarr DThree element sets Do not compute ABC+ ABD+ ACD+ New FDs ABC rarr D ABD rarr C ACD rarr B BCD+ = BCD No new FDsAll FDs F+ = A rarr BC C rarr D A rarr D BC rarr D AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C ABC rarr D ABD rarrC ACD rarr B

FD Closure Questions

1) Compute all nontrivial FDs and list all keys of R

R = (ABCD)

F = AB rarr C C rarr D D rarr A

2) Compute all nontrivial FDs and list all candidate keys of R

R = (ABCD)

F = AB rarr C BC rarr D CD rarr A AD rarr B

Equivalent Sets of FDs

A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+ We can determine if F covers E by calculating X+ with respect to F for each FD X rarr Y in E and checking whether X+ includes Y

Two sets of FDs E and F are equivalent if E covers F and F covers E That is E+= F+

Minimal Set of FDs

A set of FDs F is minimal if it satisfies these conditions Every FD in F has a single attribute on its right-hand side We cannot replace any FD X rarr A in F with Y rarr A where Y

X and still have a set of dependencies that is equivalent to F

We cannot remove any FD and still have a set that is equivalent to F

A minimal sets of FDs is like a canonical( 规范的 ) form of FDs

Note that there may be many minimal covers for a given set of functional dependencies F

Normal Forms

A relation is in a particular normal form if it satisfies certain

normalization properties

There are several normal forms defined 1NF - First Normal Form 2NF - Second Normal Form 3NF - Third Normal Form BCNF - Boyce-Codd Normal Form 4NF - Fourth Normal Form 5NF - Fifth Normal Form

Each of these normal forms are stricter than the next

For example 3NF is better than 2NF because it removes more redundancyanomalies from the schema than 2NF

Normal Forms

All relations (non-normalized)

1NF (First Normal Form)

2NF (Second Normal Form)

3NF (Third Normal Form)

BCNF (Boyce-Codd NF)

4NF (Fourth NF)

5NF (Fifth NF)

First Normal Form (1NF)

A relation is in first normal form (1NF) if all its attribute values are atomic

That is a 1NF relation cannot have an attribute value that is a set of values (multi-valued attribute) a set of tuples (nested relation)

1NF is a standard assumption in relational DBMSs However object-oriented DBMSs and nested relational DBMSs relax t

his constraint

A relation that is not in 1NF is an unnormalized relation

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1P2

AnalystAnalyst

246

E3 A Lee P3P4

ConsultantEngineer

1048

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1 P2 Analyst Analyst 24 6

E3 A Lee P3P4 Consultant Engineer 10 48

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

A non-1NF Relation

One way to convert a non-1NF relation to a 1NF relation

1) Splitting Method - Divide the existing relation into two relations one with non-repeating attributes and the other with repeating attributes

Make a new relation consisting of the primary key of the original relation and the repeating attributes Determine a primary key for this new relation

Remove the repeating attributes from the original relation

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1P2

AnalystAnalyst

246

E3 A Lee P3P4

ConsultantEngineer

1048

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

ENo EName

E1 J Doe

E2 M Smith

E3 A Lee

E4 J Miller

E5 B Casey

E6 L Chu

E7 R Davis

ENo PNo Resp DurationE1 P1 Manager 12E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36

Second Normal Form (2NF)

A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key

If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation

Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key

Second Normal Form (2NF) Example

fd1 and fd4 are partial functional dependencies Normalize to

Emp (eno ename title bdate salary supereno dno)

WorksOn (eno pno resp hours)

Proj (pno pname budget)

ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration

EmpProj relation

fd1 fd2

fd3fd4

Second Normal Form (2NF) Example (2)

ENo EName

BDate Title Salary SuperNo

DNo

Emp relation

fd1

fd2

fd3fd4

PNo PName Budget

Proj relationENo PNo Resp Duration

WorkOn relation

Third Normal Form (3NF)

Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th

an 2 FDs

A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key

Converting a relation to 3NF from 2NF involves the removal of transitive dependencies

Third Normal Form (3NF) Example

fd2 results in a transitive dependency eno rarr salary Remove it

ENo EName BDate Title Salary SuperNo DNo

Emp relation

fd1

fd2

fd2

Title Salary

Pay relation

ENo EName BDate Title SuperNo DNo

Emp relation

fd1

General Definitions of 2NF and 3NF

We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)

General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key

General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key

Note that a prime attribute is an attribute that is in any key (candidate or primary)

General Definition of 3NF Example

The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute

However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key

Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key

ENo EName BDate Title Salary SuperNo DNo SSN

Emp relation

fd1

fd2

Normalization Question

Consider the universal relation

R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ

List the keys for R

Decompose R into 2NF and then 3NF relations

Boyce-Codd Normal Form (BCNF)

A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key

To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key

The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

ClientInterview relation

Relation is in 3NF but not BCNF

fd4

fd3

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

Note that we lose the FD fd2

fd4

fd3

ClientNo InterviewDate

InterviwTime

StaffNo StaffNo InterviwDate

RoomNo

Interview relation StaffRoom relation

BCNF versus 3NF

We can decompose to BCNF but sometimes we do not want to if we lose a FD

The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency

Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)

In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)

Conversion to BCNF

There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y

Note that if any FD in F+ violates BCNF then a FD in F violates BCNF

2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey

3) Replace R by relations with schemas R1 = X+

R2 = (R ndash X+) U X

4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib

utes are all in R1 or R2

Normalization Question

Given this database schema normalize into BCNF

fd1

PID Country Lot Area Price TaxRate

fd2

fd4fd3

Multi-Valued Dependencies

A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema

A MVD occurs when two independent 1N relationships are in the relational schema

Multi-Valued Dependencies Example

Eno ename Title Salary Pno dno

E1 J Doe EE 30000 P1P2

E2 M Smith SA 50000 P1P4 D1D2D3

E3 A Lee ME 40000 P1P3 D2D3

E4 J Miller PR 20000 P3 D3

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1

E1 J Doe EE 30000 P2

E2 M Smith SA 50000 P1 D1

E2 M Smith SA 50000 P1 D2

E2 M Smith SA 50000 P1 D3

E2 M Smith SA 50000 P4 D1

E2 M Smith SA 50000 P4 D2

E2 M Smith SA 50000 P4 D3

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P3 D3

E3 A Lee ME 40000 P3 D3

E4 J Miller PR 20000 P3 D3

Multi-Valued Dependencies (MVDs)

A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C

A trivial MVD A mdashgtgt B occurs when either B is a subset of A or

A cup B = R

Fourth Normal Form (4NF)

Fourth normal form (4NF) is based on the idea of multi-valued dependencies

A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies

Fourth Normal Form (4NF) Example

eno mdashgtgtpno

eno mdashgtgtdno

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3

eno ename title salary

E1 J Doe EE 30000

E2 MSmith SA 50000

E3 A Lee ME 40000

E4 J Miller PR 20000

eno pno

E1 P1

E1 P2

E2 P1

E2 P4

E3 P1

E3 P3

E4 P3

eno dno

E2 D1

E2 D2

E2 D3

E3 D2

E3 D3

E4 D3

Lossless-join Dependency

The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation

A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined

Fifth Normal Form (5NF)

bull Fifth normal form (5NF) is based on join dependencies

A relation is in fifth normal form (5NF) if the relation has no join dependency

bull 5NF is also called project-join normal form (PJNF)

A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R

The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have

ΠR1(r) ΠR2(r) hellip ΠRn(r) = r

Fifth Normal Form (5NF) Example

Consider a relation Supply (sname partName projName)

1048715Add the additional constraint that

If project j requires part p

and supplier s supplies part p

and supplier s supplies at least one item to project j

Then supplier s also supplies part p to project j

Original Supply Relation Supply Relation with new constraintsname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

Fifth Normal Form (5NF) Example

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

sname partName

Smith Bolt

Smith Nut

Adam Bolt

Walton Nut

Adam Nail

sname projName

Smith X

Smith Y

Adam Y

Walton Z

Adam X

partName projName

Bolt X

Nut Y

Bolt Y

Nut Z

Nail X

Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples

Desirable Properties in normalization

Relational schemas that are well-designed have several

important properties

1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or

relationship

2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins

3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations

Normal Forms in Practice

Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used

For example query execution time may increase because of normalization as more joins become necessary to answer queries

Normal Forms in Practice Example

For example street and city uniquely determine a zipcode

In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed

When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly

fd1

ENo City Street ZipCode

fd2

ENo ZipCode

ZipCode City Street

Normal Forms in Practice Summary

In summary normalization is typically used in two ways

To improve on relational schemas after ER design

To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies

There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance

This is the classic time versus space tradeoff

Conclusion

Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies

Normal forms indicate when relations satisfy certain properties

1NF - All attributes are atomic

2NF - All attributes are fully functionally dependent on a key

3NF - There are no transitive dependencies in the relation

4NF - There are no multi-valued dependencies in the relation

5NF - The relation has no join dependency

In practice normalization is used to improve schemas produced after ER design and existing relational schemas

It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time

Objectives

Explain process of normalization and why it is beneficial

Describe the concept of the Universal Relation

What are the motivations for normalization

What are the 3 types of update anomalies Be able to give examples

Describe the lossless-join property and dependency preservation property

Define and explain the concept of a functional dependency

What is a trivial FD

Describe the relationship between FDs and keys

What are Armstrongs Axioms When are they used

What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs

Objectives (2)

Define minimal and covers for a set of FDs

Define normal forms be able to list the normal forms and draw a diagram showing their relationship

Define 1NF Know how to convert non-1NF relation to 1NF

Define 2NF Convert 1NF relation to 2NF

Define 3NF Convert 2NF relation to 3NF

Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example

Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm

Describe multi-valued dependencies When do they occur

What is a trivial MVD

Define 4NF Convert 3NFBCNF relation to 4NF

Objectives (3)

Describe lossless-join property and how it relates to 5NF

Define 5NF

Explain normalization in terms of the time versus space tradeoff

Explain the relationship between normalization and ER modeling

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
Page 21: Software School of Hunan University 2006.08 Database Systems Design Part III Normalization

The Semantics ofFunctional Dependencies

Similar to keys Functional dependencies are a property of a relation schema (intension) and not a property of a particular instance of the schema (extension)

Functional dependencies are directional

eno rarr ename ename rarr eno

ExampleFDs in EmpDept relation

bull EmpDept RelationENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8E8 J Jones 10-11-72 SA 50000 null D1 Management E8

Eno rarr ename bdate title salary superno dno

dnorarrdname mgrno

titlerarr salary

Trivial Functional Dependencies ( 平凡函数依赖 )

A functional dependency is trivialif the attributes on its left-hand side are a superset of the attributes on its right-hand side

Examples eno rarr eno

eno ename rarr eno

eno pno duration rarr eno Duration

Trivial functional dependencies are not interesting because they do not tell us anything

We are only interested in nontrivial FDs

Full Functional Dependencies

A functional dependency A rarr B is a full functional dependency if removal of any attribute from A results in the dependency not existing any more We say that B is fully functionally dependent on A If remove an attribute from A and the functional dependency still exists we say that B is partially dependent on A

Example

eno rarr ename (full FD)

eno ename rarr salary title (partial FD - can remove ename)

eno pno rarr hours resp (full FD)

How to Identify FDs in a relation

Examining the semantics of the attributes of relation

Identify all non-trivial functional dependencies in the Universal Relation

Universal (eno pno resp duration ename bdate title salary superNo dno dname mgrNo pname budget)

Eno rarr ename bdate title salary superno dnodnorarrdname mgrnotitlerarr salary

Formal Definition ofFunctional Dependencies

Given relation R defined over U = A1 A2 An where X U Y U If for all pairs of tuples t1 and t2 in any legal instance of relation schema R

t1[X] = t2[X] implies t1[Y] = t2[Y]

then the functional dependency X rarr Y holds in R

Functional Dependencies and Keys

Functional dependencies can be used to determine the candidate keys of a relation

What is superkey candidate key primary key

For example if an attribute functionally determines all other attributes in the relation that attribute can be a key

eno rarr eno ename bdate title supereno dno eno is a candidate key for the Emp relation

Functional Dependencies and Keys

ENo rarr EName BDate Title SuperNo DNoDNo rarr DName MgreNotitle rarr salary

Question List the candidate key(s) of this relation

ENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5

E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7

E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5

E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5

E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7

E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8

E8 J Jones 10-11-72 SA 50000 null D1 Management E8

EmpDept relation

Applications of FDs

1) given a set of FDs F in relation R FD X rarr Y is true or not

2) given a set of FDs F in relation R a set of attributes X is a candidate key of R or not

3) given a set of FDs F in relation R which candidate keys are in R

4) given a set of FDs F in relation R some FDs in F is Excessive or not

Rules for reasoning about functional dependencies

Armstrongs Axioms are three inference rules for functional dependenciesLet A B C and D be subsets of the attributes of relation R

IR1 Reflexive Rule( 自反性 ) If AB then A rarr B

A set of attributes always trivially functionally determines any subset of itself

IR2 Augmentation Rule( 增加性 ) If A rarr B then AC rarr BC

Adding a set of attributes to both the LHS and RHS results in another valid functional dependency

IR3 Transitive Rule( 传递性 ) If A rarr B and B rarr C then A rarr C

Transitivity holds with FDs

Derived Rules from Armstrongs Axioms

There are three other well-known rules that can be derived from the basic three

IR4 Decomposition( 可分解性 ) If A rarr BC then A rarr B and A rarr C A FD with two attributes on RHS can be decomposed into two FDs each with on attribute on the RHS

IR5 Union( 合并性 ) If A rarr B and A rarr C then A rarr BC Reverse of decomposition rule

IR6 Composition( 组合性 ) If A rarr B and C rarr D then AC rarr BD More general than union - can combine two non-overlapping FDs into one

Characteristics of Armstrongs Axioms

The rules are sound( 健壮的 ) and complete( 完备的 )

1) Sound The rules do not infer( 推断 ) any incorrect FDs

2) Complete The rules can be used to infer all correct FDs (no FDs are missing)

The axioms (called IR1 IR2 and IR3) can be used to determine a set of all FDs F+ from an initial set of FDs F

F+ is referred to as the closure of F

Strategy of dealing with questions

The basic idea is that given any set of attributes X we can compute the set of all attributes X+ that can be functionally determined using F This is called the closure of X under F

For question 1 given a set of FDs F in R FD X rarr Y is true or not we know that X rarr Y holds if Y is in X+

For question 2 a set of attributes X is a candidate key of R or not X is a superkey of R if X+ is all attributes of R

For question 3 how to computing F+ we can compute X+ for all possible subsets X of R

Computing the Attribute Closure

The algorithm is as follows

Given a set of attributes X

Let X+ = X

Repeat Find a FD in F whose left side is a subset of X+ Add the right side of F to X+

Until (X+ does not change or X+ contains all attributes of R)

After the algorithm completes you have a set of attributes X+ that can be functionally determined by X This allows you to produce FDs of the form

X rarr A where A is in X+

Computing Attribute Closure Example

R (ABCDEG)

F = A rarr BC C rarr D D rarr G

Compute A+

A+ = A (initial step)

A+ = ABC (by using FD A rarr BC)

A+ = ABCD (by using FD C rarr D)

A+ = ABCDG (by using FD D rarr G)

Similarly we can compute C+ and EG+

C+ = CDG

EG+ = EG

Using Attribute Closure to Detect a FD

The attribute closure algorithm can be used to determine if a particular FD X rarr Y holds based on the set of given FDs F Compute X+ and see if Y is in X+

Example

R = (ABCDEG)

F = A rarr BC C rarr D D rarr G

Does the FD CD rarr G hold Compute CD+ = CDG Yes the FD CD rarr G holds

Does the FD BC rarr E hold Compute BC+ = BCDG No the FD BC rarr E does n

ot hold

Using Attribute Closure to Find all FDs

The attribute closure algorithm can be used to find all FDs of a given relation R if perform the attribute closure algorithm for all subsets X of the attributes R

This is an exponential algorithm in the number of attributes in R Although there are a few optimizations Do not compute the closure of the empty set or of the set of all attributes If we find X+ = all attributes do not compute the closure of any supersets of X Note that even though you do not have to compute the closure (because you know the answer) you still must use the known closure to infer new FDs

Using Attribute Closureto Find all FDs Example

Compute F+ from F where

R (ABCD)

F = A rarr BC C rarr D

Must compute all these attribute closures

A+ B+ C+ D+

AB+ AC+ AD+ BC+ BD+ CD+

ABC+ ABD+ ACD+ BCD+

Do not have to compute ABCD+

Start with the single sets

A+ = ABCD B+ = B C+ = CD D+ = D

New FD A rarr D

Using Attribute Closureto Find all FDs Example (cont)

Next the two element sets Do not compute AB+ AC+ AD+ because A+ contained all attributes of R New FDs AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C BC+ = BCD BD+ = BD CD+ = CD New FD BC rarr DThree element sets Do not compute ABC+ ABD+ ACD+ New FDs ABC rarr D ABD rarr C ACD rarr B BCD+ = BCD No new FDsAll FDs F+ = A rarr BC C rarr D A rarr D BC rarr D AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C ABC rarr D ABD rarrC ACD rarr B

FD Closure Questions

1) Compute all nontrivial FDs and list all keys of R

R = (ABCD)

F = AB rarr C C rarr D D rarr A

2) Compute all nontrivial FDs and list all candidate keys of R

R = (ABCD)

F = AB rarr C BC rarr D CD rarr A AD rarr B

Equivalent Sets of FDs

A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+ We can determine if F covers E by calculating X+ with respect to F for each FD X rarr Y in E and checking whether X+ includes Y

Two sets of FDs E and F are equivalent if E covers F and F covers E That is E+= F+

Minimal Set of FDs

A set of FDs F is minimal if it satisfies these conditions Every FD in F has a single attribute on its right-hand side We cannot replace any FD X rarr A in F with Y rarr A where Y

X and still have a set of dependencies that is equivalent to F

We cannot remove any FD and still have a set that is equivalent to F

A minimal sets of FDs is like a canonical( 规范的 ) form of FDs

Note that there may be many minimal covers for a given set of functional dependencies F

Normal Forms

A relation is in a particular normal form if it satisfies certain

normalization properties

There are several normal forms defined 1NF - First Normal Form 2NF - Second Normal Form 3NF - Third Normal Form BCNF - Boyce-Codd Normal Form 4NF - Fourth Normal Form 5NF - Fifth Normal Form

Each of these normal forms are stricter than the next

For example 3NF is better than 2NF because it removes more redundancyanomalies from the schema than 2NF

Normal Forms

All relations (non-normalized)

1NF (First Normal Form)

2NF (Second Normal Form)

3NF (Third Normal Form)

BCNF (Boyce-Codd NF)

4NF (Fourth NF)

5NF (Fifth NF)

First Normal Form (1NF)

A relation is in first normal form (1NF) if all its attribute values are atomic

That is a 1NF relation cannot have an attribute value that is a set of values (multi-valued attribute) a set of tuples (nested relation)

1NF is a standard assumption in relational DBMSs However object-oriented DBMSs and nested relational DBMSs relax t

his constraint

A relation that is not in 1NF is an unnormalized relation

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1P2

AnalystAnalyst

246

E3 A Lee P3P4

ConsultantEngineer

1048

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1 P2 Analyst Analyst 24 6

E3 A Lee P3P4 Consultant Engineer 10 48

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

A non-1NF Relation

One way to convert a non-1NF relation to a 1NF relation

1) Splitting Method - Divide the existing relation into two relations one with non-repeating attributes and the other with repeating attributes

Make a new relation consisting of the primary key of the original relation and the repeating attributes Determine a primary key for this new relation

Remove the repeating attributes from the original relation

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1P2

AnalystAnalyst

246

E3 A Lee P3P4

ConsultantEngineer

1048

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

ENo EName

E1 J Doe

E2 M Smith

E3 A Lee

E4 J Miller

E5 B Casey

E6 L Chu

E7 R Davis

ENo PNo Resp DurationE1 P1 Manager 12E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36

Second Normal Form (2NF)

A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key

If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation

Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key

Second Normal Form (2NF) Example

fd1 and fd4 are partial functional dependencies Normalize to

Emp (eno ename title bdate salary supereno dno)

WorksOn (eno pno resp hours)

Proj (pno pname budget)

ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration

EmpProj relation

fd1 fd2

fd3fd4

Second Normal Form (2NF) Example (2)

ENo EName

BDate Title Salary SuperNo

DNo

Emp relation

fd1

fd2

fd3fd4

PNo PName Budget

Proj relationENo PNo Resp Duration

WorkOn relation

Third Normal Form (3NF)

Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th

an 2 FDs

A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key

Converting a relation to 3NF from 2NF involves the removal of transitive dependencies

Third Normal Form (3NF) Example

fd2 results in a transitive dependency eno rarr salary Remove it

ENo EName BDate Title Salary SuperNo DNo

Emp relation

fd1

fd2

fd2

Title Salary

Pay relation

ENo EName BDate Title SuperNo DNo

Emp relation

fd1

General Definitions of 2NF and 3NF

We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)

General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key

General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key

Note that a prime attribute is an attribute that is in any key (candidate or primary)

General Definition of 3NF Example

The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute

However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key

Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key

ENo EName BDate Title Salary SuperNo DNo SSN

Emp relation

fd1

fd2

Normalization Question

Consider the universal relation

R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ

List the keys for R

Decompose R into 2NF and then 3NF relations

Boyce-Codd Normal Form (BCNF)

A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key

To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key

The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

ClientInterview relation

Relation is in 3NF but not BCNF

fd4

fd3

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

Note that we lose the FD fd2

fd4

fd3

ClientNo InterviewDate

InterviwTime

StaffNo StaffNo InterviwDate

RoomNo

Interview relation StaffRoom relation

BCNF versus 3NF

We can decompose to BCNF but sometimes we do not want to if we lose a FD

The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency

Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)

In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)

Conversion to BCNF

There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y

Note that if any FD in F+ violates BCNF then a FD in F violates BCNF

2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey

3) Replace R by relations with schemas R1 = X+

R2 = (R ndash X+) U X

4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib

utes are all in R1 or R2

Normalization Question

Given this database schema normalize into BCNF

fd1

PID Country Lot Area Price TaxRate

fd2

fd4fd3

Multi-Valued Dependencies

A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema

A MVD occurs when two independent 1N relationships are in the relational schema

Multi-Valued Dependencies Example

Eno ename Title Salary Pno dno

E1 J Doe EE 30000 P1P2

E2 M Smith SA 50000 P1P4 D1D2D3

E3 A Lee ME 40000 P1P3 D2D3

E4 J Miller PR 20000 P3 D3

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1

E1 J Doe EE 30000 P2

E2 M Smith SA 50000 P1 D1

E2 M Smith SA 50000 P1 D2

E2 M Smith SA 50000 P1 D3

E2 M Smith SA 50000 P4 D1

E2 M Smith SA 50000 P4 D2

E2 M Smith SA 50000 P4 D3

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P3 D3

E3 A Lee ME 40000 P3 D3

E4 J Miller PR 20000 P3 D3

Multi-Valued Dependencies (MVDs)

A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C

A trivial MVD A mdashgtgt B occurs when either B is a subset of A or

A cup B = R

Fourth Normal Form (4NF)

Fourth normal form (4NF) is based on the idea of multi-valued dependencies

A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies

Fourth Normal Form (4NF) Example

eno mdashgtgtpno

eno mdashgtgtdno

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3

eno ename title salary

E1 J Doe EE 30000

E2 MSmith SA 50000

E3 A Lee ME 40000

E4 J Miller PR 20000

eno pno

E1 P1

E1 P2

E2 P1

E2 P4

E3 P1

E3 P3

E4 P3

eno dno

E2 D1

E2 D2

E2 D3

E3 D2

E3 D3

E4 D3

Lossless-join Dependency

The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation

A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined

Fifth Normal Form (5NF)

bull Fifth normal form (5NF) is based on join dependencies

A relation is in fifth normal form (5NF) if the relation has no join dependency

bull 5NF is also called project-join normal form (PJNF)

A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R

The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have

ΠR1(r) ΠR2(r) hellip ΠRn(r) = r

Fifth Normal Form (5NF) Example

Consider a relation Supply (sname partName projName)

1048715Add the additional constraint that

If project j requires part p

and supplier s supplies part p

and supplier s supplies at least one item to project j

Then supplier s also supplies part p to project j

Original Supply Relation Supply Relation with new constraintsname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

Fifth Normal Form (5NF) Example

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

sname partName

Smith Bolt

Smith Nut

Adam Bolt

Walton Nut

Adam Nail

sname projName

Smith X

Smith Y

Adam Y

Walton Z

Adam X

partName projName

Bolt X

Nut Y

Bolt Y

Nut Z

Nail X

Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples

Desirable Properties in normalization

Relational schemas that are well-designed have several

important properties

1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or

relationship

2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins

3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations

Normal Forms in Practice

Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used

For example query execution time may increase because of normalization as more joins become necessary to answer queries

Normal Forms in Practice Example

For example street and city uniquely determine a zipcode

In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed

When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly

fd1

ENo City Street ZipCode

fd2

ENo ZipCode

ZipCode City Street

Normal Forms in Practice Summary

In summary normalization is typically used in two ways

To improve on relational schemas after ER design

To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies

There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance

This is the classic time versus space tradeoff

Conclusion

Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies

Normal forms indicate when relations satisfy certain properties

1NF - All attributes are atomic

2NF - All attributes are fully functionally dependent on a key

3NF - There are no transitive dependencies in the relation

4NF - There are no multi-valued dependencies in the relation

5NF - The relation has no join dependency

In practice normalization is used to improve schemas produced after ER design and existing relational schemas

It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time

Objectives

Explain process of normalization and why it is beneficial

Describe the concept of the Universal Relation

What are the motivations for normalization

What are the 3 types of update anomalies Be able to give examples

Describe the lossless-join property and dependency preservation property

Define and explain the concept of a functional dependency

What is a trivial FD

Describe the relationship between FDs and keys

What are Armstrongs Axioms When are they used

What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs

Objectives (2)

Define minimal and covers for a set of FDs

Define normal forms be able to list the normal forms and draw a diagram showing their relationship

Define 1NF Know how to convert non-1NF relation to 1NF

Define 2NF Convert 1NF relation to 2NF

Define 3NF Convert 2NF relation to 3NF

Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example

Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm

Describe multi-valued dependencies When do they occur

What is a trivial MVD

Define 4NF Convert 3NFBCNF relation to 4NF

Objectives (3)

Describe lossless-join property and how it relates to 5NF

Define 5NF

Explain normalization in terms of the time versus space tradeoff

Explain the relationship between normalization and ER modeling

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
Page 22: Software School of Hunan University 2006.08 Database Systems Design Part III Normalization

ExampleFDs in EmpDept relation

bull EmpDept RelationENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8E8 J Jones 10-11-72 SA 50000 null D1 Management E8

Eno rarr ename bdate title salary superno dno

dnorarrdname mgrno

titlerarr salary

Trivial Functional Dependencies ( 平凡函数依赖 )

A functional dependency is trivialif the attributes on its left-hand side are a superset of the attributes on its right-hand side

Examples eno rarr eno

eno ename rarr eno

eno pno duration rarr eno Duration

Trivial functional dependencies are not interesting because they do not tell us anything

We are only interested in nontrivial FDs

Full Functional Dependencies

A functional dependency A rarr B is a full functional dependency if removal of any attribute from A results in the dependency not existing any more We say that B is fully functionally dependent on A If remove an attribute from A and the functional dependency still exists we say that B is partially dependent on A

Example

eno rarr ename (full FD)

eno ename rarr salary title (partial FD - can remove ename)

eno pno rarr hours resp (full FD)

How to Identify FDs in a relation

Examining the semantics of the attributes of relation

Identify all non-trivial functional dependencies in the Universal Relation

Universal (eno pno resp duration ename bdate title salary superNo dno dname mgrNo pname budget)

Eno rarr ename bdate title salary superno dnodnorarrdname mgrnotitlerarr salary

Formal Definition ofFunctional Dependencies

Given relation R defined over U = A1 A2 An where X U Y U If for all pairs of tuples t1 and t2 in any legal instance of relation schema R

t1[X] = t2[X] implies t1[Y] = t2[Y]

then the functional dependency X rarr Y holds in R

Functional Dependencies and Keys

Functional dependencies can be used to determine the candidate keys of a relation

What is superkey candidate key primary key

For example if an attribute functionally determines all other attributes in the relation that attribute can be a key

eno rarr eno ename bdate title supereno dno eno is a candidate key for the Emp relation

Functional Dependencies and Keys

ENo rarr EName BDate Title SuperNo DNoDNo rarr DName MgreNotitle rarr salary

Question List the candidate key(s) of this relation

ENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5

E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7

E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5

E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5

E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7

E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8

E8 J Jones 10-11-72 SA 50000 null D1 Management E8

EmpDept relation

Applications of FDs

1) given a set of FDs F in relation R FD X rarr Y is true or not

2) given a set of FDs F in relation R a set of attributes X is a candidate key of R or not

3) given a set of FDs F in relation R which candidate keys are in R

4) given a set of FDs F in relation R some FDs in F is Excessive or not

Rules for reasoning about functional dependencies

Armstrongs Axioms are three inference rules for functional dependenciesLet A B C and D be subsets of the attributes of relation R

IR1 Reflexive Rule( 自反性 ) If AB then A rarr B

A set of attributes always trivially functionally determines any subset of itself

IR2 Augmentation Rule( 增加性 ) If A rarr B then AC rarr BC

Adding a set of attributes to both the LHS and RHS results in another valid functional dependency

IR3 Transitive Rule( 传递性 ) If A rarr B and B rarr C then A rarr C

Transitivity holds with FDs

Derived Rules from Armstrongs Axioms

There are three other well-known rules that can be derived from the basic three

IR4 Decomposition( 可分解性 ) If A rarr BC then A rarr B and A rarr C A FD with two attributes on RHS can be decomposed into two FDs each with on attribute on the RHS

IR5 Union( 合并性 ) If A rarr B and A rarr C then A rarr BC Reverse of decomposition rule

IR6 Composition( 组合性 ) If A rarr B and C rarr D then AC rarr BD More general than union - can combine two non-overlapping FDs into one

Characteristics of Armstrongs Axioms

The rules are sound( 健壮的 ) and complete( 完备的 )

1) Sound The rules do not infer( 推断 ) any incorrect FDs

2) Complete The rules can be used to infer all correct FDs (no FDs are missing)

The axioms (called IR1 IR2 and IR3) can be used to determine a set of all FDs F+ from an initial set of FDs F

F+ is referred to as the closure of F

Strategy of dealing with questions

The basic idea is that given any set of attributes X we can compute the set of all attributes X+ that can be functionally determined using F This is called the closure of X under F

For question 1 given a set of FDs F in R FD X rarr Y is true or not we know that X rarr Y holds if Y is in X+

For question 2 a set of attributes X is a candidate key of R or not X is a superkey of R if X+ is all attributes of R

For question 3 how to computing F+ we can compute X+ for all possible subsets X of R

Computing the Attribute Closure

The algorithm is as follows

Given a set of attributes X

Let X+ = X

Repeat Find a FD in F whose left side is a subset of X+ Add the right side of F to X+

Until (X+ does not change or X+ contains all attributes of R)

After the algorithm completes you have a set of attributes X+ that can be functionally determined by X This allows you to produce FDs of the form

X rarr A where A is in X+

Computing Attribute Closure Example

R (ABCDEG)

F = A rarr BC C rarr D D rarr G

Compute A+

A+ = A (initial step)

A+ = ABC (by using FD A rarr BC)

A+ = ABCD (by using FD C rarr D)

A+ = ABCDG (by using FD D rarr G)

Similarly we can compute C+ and EG+

C+ = CDG

EG+ = EG

Using Attribute Closure to Detect a FD

The attribute closure algorithm can be used to determine if a particular FD X rarr Y holds based on the set of given FDs F Compute X+ and see if Y is in X+

Example

R = (ABCDEG)

F = A rarr BC C rarr D D rarr G

Does the FD CD rarr G hold Compute CD+ = CDG Yes the FD CD rarr G holds

Does the FD BC rarr E hold Compute BC+ = BCDG No the FD BC rarr E does n

ot hold

Using Attribute Closure to Find all FDs

The attribute closure algorithm can be used to find all FDs of a given relation R if perform the attribute closure algorithm for all subsets X of the attributes R

This is an exponential algorithm in the number of attributes in R Although there are a few optimizations Do not compute the closure of the empty set or of the set of all attributes If we find X+ = all attributes do not compute the closure of any supersets of X Note that even though you do not have to compute the closure (because you know the answer) you still must use the known closure to infer new FDs

Using Attribute Closureto Find all FDs Example

Compute F+ from F where

R (ABCD)

F = A rarr BC C rarr D

Must compute all these attribute closures

A+ B+ C+ D+

AB+ AC+ AD+ BC+ BD+ CD+

ABC+ ABD+ ACD+ BCD+

Do not have to compute ABCD+

Start with the single sets

A+ = ABCD B+ = B C+ = CD D+ = D

New FD A rarr D

Using Attribute Closureto Find all FDs Example (cont)

Next the two element sets Do not compute AB+ AC+ AD+ because A+ contained all attributes of R New FDs AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C BC+ = BCD BD+ = BD CD+ = CD New FD BC rarr DThree element sets Do not compute ABC+ ABD+ ACD+ New FDs ABC rarr D ABD rarr C ACD rarr B BCD+ = BCD No new FDsAll FDs F+ = A rarr BC C rarr D A rarr D BC rarr D AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C ABC rarr D ABD rarrC ACD rarr B

FD Closure Questions

1) Compute all nontrivial FDs and list all keys of R

R = (ABCD)

F = AB rarr C C rarr D D rarr A

2) Compute all nontrivial FDs and list all candidate keys of R

R = (ABCD)

F = AB rarr C BC rarr D CD rarr A AD rarr B

Equivalent Sets of FDs

A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+ We can determine if F covers E by calculating X+ with respect to F for each FD X rarr Y in E and checking whether X+ includes Y

Two sets of FDs E and F are equivalent if E covers F and F covers E That is E+= F+

Minimal Set of FDs

A set of FDs F is minimal if it satisfies these conditions Every FD in F has a single attribute on its right-hand side We cannot replace any FD X rarr A in F with Y rarr A where Y

X and still have a set of dependencies that is equivalent to F

We cannot remove any FD and still have a set that is equivalent to F

A minimal sets of FDs is like a canonical( 规范的 ) form of FDs

Note that there may be many minimal covers for a given set of functional dependencies F

Normal Forms

A relation is in a particular normal form if it satisfies certain

normalization properties

There are several normal forms defined 1NF - First Normal Form 2NF - Second Normal Form 3NF - Third Normal Form BCNF - Boyce-Codd Normal Form 4NF - Fourth Normal Form 5NF - Fifth Normal Form

Each of these normal forms are stricter than the next

For example 3NF is better than 2NF because it removes more redundancyanomalies from the schema than 2NF

Normal Forms

All relations (non-normalized)

1NF (First Normal Form)

2NF (Second Normal Form)

3NF (Third Normal Form)

BCNF (Boyce-Codd NF)

4NF (Fourth NF)

5NF (Fifth NF)

First Normal Form (1NF)

A relation is in first normal form (1NF) if all its attribute values are atomic

That is a 1NF relation cannot have an attribute value that is a set of values (multi-valued attribute) a set of tuples (nested relation)

1NF is a standard assumption in relational DBMSs However object-oriented DBMSs and nested relational DBMSs relax t

his constraint

A relation that is not in 1NF is an unnormalized relation

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1P2

AnalystAnalyst

246

E3 A Lee P3P4

ConsultantEngineer

1048

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1 P2 Analyst Analyst 24 6

E3 A Lee P3P4 Consultant Engineer 10 48

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

A non-1NF Relation

One way to convert a non-1NF relation to a 1NF relation

1) Splitting Method - Divide the existing relation into two relations one with non-repeating attributes and the other with repeating attributes

Make a new relation consisting of the primary key of the original relation and the repeating attributes Determine a primary key for this new relation

Remove the repeating attributes from the original relation

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1P2

AnalystAnalyst

246

E3 A Lee P3P4

ConsultantEngineer

1048

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

ENo EName

E1 J Doe

E2 M Smith

E3 A Lee

E4 J Miller

E5 B Casey

E6 L Chu

E7 R Davis

ENo PNo Resp DurationE1 P1 Manager 12E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36

Second Normal Form (2NF)

A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key

If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation

Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key

Second Normal Form (2NF) Example

fd1 and fd4 are partial functional dependencies Normalize to

Emp (eno ename title bdate salary supereno dno)

WorksOn (eno pno resp hours)

Proj (pno pname budget)

ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration

EmpProj relation

fd1 fd2

fd3fd4

Second Normal Form (2NF) Example (2)

ENo EName

BDate Title Salary SuperNo

DNo

Emp relation

fd1

fd2

fd3fd4

PNo PName Budget

Proj relationENo PNo Resp Duration

WorkOn relation

Third Normal Form (3NF)

Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th

an 2 FDs

A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key

Converting a relation to 3NF from 2NF involves the removal of transitive dependencies

Third Normal Form (3NF) Example

fd2 results in a transitive dependency eno rarr salary Remove it

ENo EName BDate Title Salary SuperNo DNo

Emp relation

fd1

fd2

fd2

Title Salary

Pay relation

ENo EName BDate Title SuperNo DNo

Emp relation

fd1

General Definitions of 2NF and 3NF

We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)

General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key

General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key

Note that a prime attribute is an attribute that is in any key (candidate or primary)

General Definition of 3NF Example

The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute

However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key

Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key

ENo EName BDate Title Salary SuperNo DNo SSN

Emp relation

fd1

fd2

Normalization Question

Consider the universal relation

R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ

List the keys for R

Decompose R into 2NF and then 3NF relations

Boyce-Codd Normal Form (BCNF)

A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key

To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key

The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

ClientInterview relation

Relation is in 3NF but not BCNF

fd4

fd3

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

Note that we lose the FD fd2

fd4

fd3

ClientNo InterviewDate

InterviwTime

StaffNo StaffNo InterviwDate

RoomNo

Interview relation StaffRoom relation

BCNF versus 3NF

We can decompose to BCNF but sometimes we do not want to if we lose a FD

The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency

Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)

In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)

Conversion to BCNF

There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y

Note that if any FD in F+ violates BCNF then a FD in F violates BCNF

2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey

3) Replace R by relations with schemas R1 = X+

R2 = (R ndash X+) U X

4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib

utes are all in R1 or R2

Normalization Question

Given this database schema normalize into BCNF

fd1

PID Country Lot Area Price TaxRate

fd2

fd4fd3

Multi-Valued Dependencies

A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema

A MVD occurs when two independent 1N relationships are in the relational schema

Multi-Valued Dependencies Example

Eno ename Title Salary Pno dno

E1 J Doe EE 30000 P1P2

E2 M Smith SA 50000 P1P4 D1D2D3

E3 A Lee ME 40000 P1P3 D2D3

E4 J Miller PR 20000 P3 D3

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1

E1 J Doe EE 30000 P2

E2 M Smith SA 50000 P1 D1

E2 M Smith SA 50000 P1 D2

E2 M Smith SA 50000 P1 D3

E2 M Smith SA 50000 P4 D1

E2 M Smith SA 50000 P4 D2

E2 M Smith SA 50000 P4 D3

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P3 D3

E3 A Lee ME 40000 P3 D3

E4 J Miller PR 20000 P3 D3

Multi-Valued Dependencies (MVDs)

A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C

A trivial MVD A mdashgtgt B occurs when either B is a subset of A or

A cup B = R

Fourth Normal Form (4NF)

Fourth normal form (4NF) is based on the idea of multi-valued dependencies

A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies

Fourth Normal Form (4NF) Example

eno mdashgtgtpno

eno mdashgtgtdno

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3

eno ename title salary

E1 J Doe EE 30000

E2 MSmith SA 50000

E3 A Lee ME 40000

E4 J Miller PR 20000

eno pno

E1 P1

E1 P2

E2 P1

E2 P4

E3 P1

E3 P3

E4 P3

eno dno

E2 D1

E2 D2

E2 D3

E3 D2

E3 D3

E4 D3

Lossless-join Dependency

The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation

A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined

Fifth Normal Form (5NF)

bull Fifth normal form (5NF) is based on join dependencies

A relation is in fifth normal form (5NF) if the relation has no join dependency

bull 5NF is also called project-join normal form (PJNF)

A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R

The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have

ΠR1(r) ΠR2(r) hellip ΠRn(r) = r

Fifth Normal Form (5NF) Example

Consider a relation Supply (sname partName projName)

1048715Add the additional constraint that

If project j requires part p

and supplier s supplies part p

and supplier s supplies at least one item to project j

Then supplier s also supplies part p to project j

Original Supply Relation Supply Relation with new constraintsname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

Fifth Normal Form (5NF) Example

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

sname partName

Smith Bolt

Smith Nut

Adam Bolt

Walton Nut

Adam Nail

sname projName

Smith X

Smith Y

Adam Y

Walton Z

Adam X

partName projName

Bolt X

Nut Y

Bolt Y

Nut Z

Nail X

Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples

Desirable Properties in normalization

Relational schemas that are well-designed have several

important properties

1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or

relationship

2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins

3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations

Normal Forms in Practice

Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used

For example query execution time may increase because of normalization as more joins become necessary to answer queries

Normal Forms in Practice Example

For example street and city uniquely determine a zipcode

In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed

When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly

fd1

ENo City Street ZipCode

fd2

ENo ZipCode

ZipCode City Street

Normal Forms in Practice Summary

In summary normalization is typically used in two ways

To improve on relational schemas after ER design

To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies

There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance

This is the classic time versus space tradeoff

Conclusion

Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies

Normal forms indicate when relations satisfy certain properties

1NF - All attributes are atomic

2NF - All attributes are fully functionally dependent on a key

3NF - There are no transitive dependencies in the relation

4NF - There are no multi-valued dependencies in the relation

5NF - The relation has no join dependency

In practice normalization is used to improve schemas produced after ER design and existing relational schemas

It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time

Objectives

Explain process of normalization and why it is beneficial

Describe the concept of the Universal Relation

What are the motivations for normalization

What are the 3 types of update anomalies Be able to give examples

Describe the lossless-join property and dependency preservation property

Define and explain the concept of a functional dependency

What is a trivial FD

Describe the relationship between FDs and keys

What are Armstrongs Axioms When are they used

What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs

Objectives (2)

Define minimal and covers for a set of FDs

Define normal forms be able to list the normal forms and draw a diagram showing their relationship

Define 1NF Know how to convert non-1NF relation to 1NF

Define 2NF Convert 1NF relation to 2NF

Define 3NF Convert 2NF relation to 3NF

Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example

Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm

Describe multi-valued dependencies When do they occur

What is a trivial MVD

Define 4NF Convert 3NFBCNF relation to 4NF

Objectives (3)

Describe lossless-join property and how it relates to 5NF

Define 5NF

Explain normalization in terms of the time versus space tradeoff

Explain the relationship between normalization and ER modeling

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
Page 23: Software School of Hunan University 2006.08 Database Systems Design Part III Normalization

Trivial Functional Dependencies ( 平凡函数依赖 )

A functional dependency is trivialif the attributes on its left-hand side are a superset of the attributes on its right-hand side

Examples eno rarr eno

eno ename rarr eno

eno pno duration rarr eno Duration

Trivial functional dependencies are not interesting because they do not tell us anything

We are only interested in nontrivial FDs

Full Functional Dependencies

A functional dependency A rarr B is a full functional dependency if removal of any attribute from A results in the dependency not existing any more We say that B is fully functionally dependent on A If remove an attribute from A and the functional dependency still exists we say that B is partially dependent on A

Example

eno rarr ename (full FD)

eno ename rarr salary title (partial FD - can remove ename)

eno pno rarr hours resp (full FD)

How to Identify FDs in a relation

Examining the semantics of the attributes of relation

Identify all non-trivial functional dependencies in the Universal Relation

Universal (eno pno resp duration ename bdate title salary superNo dno dname mgrNo pname budget)

Eno rarr ename bdate title salary superno dnodnorarrdname mgrnotitlerarr salary

Formal Definition ofFunctional Dependencies

Given relation R defined over U = A1 A2 An where X U Y U If for all pairs of tuples t1 and t2 in any legal instance of relation schema R

t1[X] = t2[X] implies t1[Y] = t2[Y]

then the functional dependency X rarr Y holds in R

Functional Dependencies and Keys

Functional dependencies can be used to determine the candidate keys of a relation

What is superkey candidate key primary key

For example if an attribute functionally determines all other attributes in the relation that attribute can be a key

eno rarr eno ename bdate title supereno dno eno is a candidate key for the Emp relation

Functional Dependencies and Keys

ENo rarr EName BDate Title SuperNo DNoDNo rarr DName MgreNotitle rarr salary

Question List the candidate key(s) of this relation

ENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5

E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7

E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5

E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5

E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7

E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8

E8 J Jones 10-11-72 SA 50000 null D1 Management E8

EmpDept relation

Applications of FDs

1) given a set of FDs F in relation R FD X rarr Y is true or not

2) given a set of FDs F in relation R a set of attributes X is a candidate key of R or not

3) given a set of FDs F in relation R which candidate keys are in R

4) given a set of FDs F in relation R some FDs in F is Excessive or not

Rules for reasoning about functional dependencies

Armstrongs Axioms are three inference rules for functional dependenciesLet A B C and D be subsets of the attributes of relation R

IR1 Reflexive Rule( 自反性 ) If AB then A rarr B

A set of attributes always trivially functionally determines any subset of itself

IR2 Augmentation Rule( 增加性 ) If A rarr B then AC rarr BC

Adding a set of attributes to both the LHS and RHS results in another valid functional dependency

IR3 Transitive Rule( 传递性 ) If A rarr B and B rarr C then A rarr C

Transitivity holds with FDs

Derived Rules from Armstrongs Axioms

There are three other well-known rules that can be derived from the basic three

IR4 Decomposition( 可分解性 ) If A rarr BC then A rarr B and A rarr C A FD with two attributes on RHS can be decomposed into two FDs each with on attribute on the RHS

IR5 Union( 合并性 ) If A rarr B and A rarr C then A rarr BC Reverse of decomposition rule

IR6 Composition( 组合性 ) If A rarr B and C rarr D then AC rarr BD More general than union - can combine two non-overlapping FDs into one

Characteristics of Armstrongs Axioms

The rules are sound( 健壮的 ) and complete( 完备的 )

1) Sound The rules do not infer( 推断 ) any incorrect FDs

2) Complete The rules can be used to infer all correct FDs (no FDs are missing)

The axioms (called IR1 IR2 and IR3) can be used to determine a set of all FDs F+ from an initial set of FDs F

F+ is referred to as the closure of F

Strategy of dealing with questions

The basic idea is that given any set of attributes X we can compute the set of all attributes X+ that can be functionally determined using F This is called the closure of X under F

For question 1 given a set of FDs F in R FD X rarr Y is true or not we know that X rarr Y holds if Y is in X+

For question 2 a set of attributes X is a candidate key of R or not X is a superkey of R if X+ is all attributes of R

For question 3 how to computing F+ we can compute X+ for all possible subsets X of R

Computing the Attribute Closure

The algorithm is as follows

Given a set of attributes X

Let X+ = X

Repeat Find a FD in F whose left side is a subset of X+ Add the right side of F to X+

Until (X+ does not change or X+ contains all attributes of R)

After the algorithm completes you have a set of attributes X+ that can be functionally determined by X This allows you to produce FDs of the form

X rarr A where A is in X+

Computing Attribute Closure Example

R (ABCDEG)

F = A rarr BC C rarr D D rarr G

Compute A+

A+ = A (initial step)

A+ = ABC (by using FD A rarr BC)

A+ = ABCD (by using FD C rarr D)

A+ = ABCDG (by using FD D rarr G)

Similarly we can compute C+ and EG+

C+ = CDG

EG+ = EG

Using Attribute Closure to Detect a FD

The attribute closure algorithm can be used to determine if a particular FD X rarr Y holds based on the set of given FDs F Compute X+ and see if Y is in X+

Example

R = (ABCDEG)

F = A rarr BC C rarr D D rarr G

Does the FD CD rarr G hold Compute CD+ = CDG Yes the FD CD rarr G holds

Does the FD BC rarr E hold Compute BC+ = BCDG No the FD BC rarr E does n

ot hold

Using Attribute Closure to Find all FDs

The attribute closure algorithm can be used to find all FDs of a given relation R if perform the attribute closure algorithm for all subsets X of the attributes R

This is an exponential algorithm in the number of attributes in R Although there are a few optimizations Do not compute the closure of the empty set or of the set of all attributes If we find X+ = all attributes do not compute the closure of any supersets of X Note that even though you do not have to compute the closure (because you know the answer) you still must use the known closure to infer new FDs

Using Attribute Closureto Find all FDs Example

Compute F+ from F where

R (ABCD)

F = A rarr BC C rarr D

Must compute all these attribute closures

A+ B+ C+ D+

AB+ AC+ AD+ BC+ BD+ CD+

ABC+ ABD+ ACD+ BCD+

Do not have to compute ABCD+

Start with the single sets

A+ = ABCD B+ = B C+ = CD D+ = D

New FD A rarr D

Using Attribute Closureto Find all FDs Example (cont)

Next the two element sets Do not compute AB+ AC+ AD+ because A+ contained all attributes of R New FDs AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C BC+ = BCD BD+ = BD CD+ = CD New FD BC rarr DThree element sets Do not compute ABC+ ABD+ ACD+ New FDs ABC rarr D ABD rarr C ACD rarr B BCD+ = BCD No new FDsAll FDs F+ = A rarr BC C rarr D A rarr D BC rarr D AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C ABC rarr D ABD rarrC ACD rarr B

FD Closure Questions

1) Compute all nontrivial FDs and list all keys of R

R = (ABCD)

F = AB rarr C C rarr D D rarr A

2) Compute all nontrivial FDs and list all candidate keys of R

R = (ABCD)

F = AB rarr C BC rarr D CD rarr A AD rarr B

Equivalent Sets of FDs

A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+ We can determine if F covers E by calculating X+ with respect to F for each FD X rarr Y in E and checking whether X+ includes Y

Two sets of FDs E and F are equivalent if E covers F and F covers E That is E+= F+

Minimal Set of FDs

A set of FDs F is minimal if it satisfies these conditions Every FD in F has a single attribute on its right-hand side We cannot replace any FD X rarr A in F with Y rarr A where Y

X and still have a set of dependencies that is equivalent to F

We cannot remove any FD and still have a set that is equivalent to F

A minimal sets of FDs is like a canonical( 规范的 ) form of FDs

Note that there may be many minimal covers for a given set of functional dependencies F

Normal Forms

A relation is in a particular normal form if it satisfies certain

normalization properties

There are several normal forms defined 1NF - First Normal Form 2NF - Second Normal Form 3NF - Third Normal Form BCNF - Boyce-Codd Normal Form 4NF - Fourth Normal Form 5NF - Fifth Normal Form

Each of these normal forms are stricter than the next

For example 3NF is better than 2NF because it removes more redundancyanomalies from the schema than 2NF

Normal Forms

All relations (non-normalized)

1NF (First Normal Form)

2NF (Second Normal Form)

3NF (Third Normal Form)

BCNF (Boyce-Codd NF)

4NF (Fourth NF)

5NF (Fifth NF)

First Normal Form (1NF)

A relation is in first normal form (1NF) if all its attribute values are atomic

That is a 1NF relation cannot have an attribute value that is a set of values (multi-valued attribute) a set of tuples (nested relation)

1NF is a standard assumption in relational DBMSs However object-oriented DBMSs and nested relational DBMSs relax t

his constraint

A relation that is not in 1NF is an unnormalized relation

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1P2

AnalystAnalyst

246

E3 A Lee P3P4

ConsultantEngineer

1048

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1 P2 Analyst Analyst 24 6

E3 A Lee P3P4 Consultant Engineer 10 48

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

A non-1NF Relation

One way to convert a non-1NF relation to a 1NF relation

1) Splitting Method - Divide the existing relation into two relations one with non-repeating attributes and the other with repeating attributes

Make a new relation consisting of the primary key of the original relation and the repeating attributes Determine a primary key for this new relation

Remove the repeating attributes from the original relation

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1P2

AnalystAnalyst

246

E3 A Lee P3P4

ConsultantEngineer

1048

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

ENo EName

E1 J Doe

E2 M Smith

E3 A Lee

E4 J Miller

E5 B Casey

E6 L Chu

E7 R Davis

ENo PNo Resp DurationE1 P1 Manager 12E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36

Second Normal Form (2NF)

A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key

If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation

Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key

Second Normal Form (2NF) Example

fd1 and fd4 are partial functional dependencies Normalize to

Emp (eno ename title bdate salary supereno dno)

WorksOn (eno pno resp hours)

Proj (pno pname budget)

ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration

EmpProj relation

fd1 fd2

fd3fd4

Second Normal Form (2NF) Example (2)

ENo EName

BDate Title Salary SuperNo

DNo

Emp relation

fd1

fd2

fd3fd4

PNo PName Budget

Proj relationENo PNo Resp Duration

WorkOn relation

Third Normal Form (3NF)

Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th

an 2 FDs

A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key

Converting a relation to 3NF from 2NF involves the removal of transitive dependencies

Third Normal Form (3NF) Example

fd2 results in a transitive dependency eno rarr salary Remove it

ENo EName BDate Title Salary SuperNo DNo

Emp relation

fd1

fd2

fd2

Title Salary

Pay relation

ENo EName BDate Title SuperNo DNo

Emp relation

fd1

General Definitions of 2NF and 3NF

We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)

General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key

General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key

Note that a prime attribute is an attribute that is in any key (candidate or primary)

General Definition of 3NF Example

The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute

However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key

Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key

ENo EName BDate Title Salary SuperNo DNo SSN

Emp relation

fd1

fd2

Normalization Question

Consider the universal relation

R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ

List the keys for R

Decompose R into 2NF and then 3NF relations

Boyce-Codd Normal Form (BCNF)

A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key

To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key

The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

ClientInterview relation

Relation is in 3NF but not BCNF

fd4

fd3

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

Note that we lose the FD fd2

fd4

fd3

ClientNo InterviewDate

InterviwTime

StaffNo StaffNo InterviwDate

RoomNo

Interview relation StaffRoom relation

BCNF versus 3NF

We can decompose to BCNF but sometimes we do not want to if we lose a FD

The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency

Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)

In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)

Conversion to BCNF

There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y

Note that if any FD in F+ violates BCNF then a FD in F violates BCNF

2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey

3) Replace R by relations with schemas R1 = X+

R2 = (R ndash X+) U X

4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib

utes are all in R1 or R2

Normalization Question

Given this database schema normalize into BCNF

fd1

PID Country Lot Area Price TaxRate

fd2

fd4fd3

Multi-Valued Dependencies

A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema

A MVD occurs when two independent 1N relationships are in the relational schema

Multi-Valued Dependencies Example

Eno ename Title Salary Pno dno

E1 J Doe EE 30000 P1P2

E2 M Smith SA 50000 P1P4 D1D2D3

E3 A Lee ME 40000 P1P3 D2D3

E4 J Miller PR 20000 P3 D3

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1

E1 J Doe EE 30000 P2

E2 M Smith SA 50000 P1 D1

E2 M Smith SA 50000 P1 D2

E2 M Smith SA 50000 P1 D3

E2 M Smith SA 50000 P4 D1

E2 M Smith SA 50000 P4 D2

E2 M Smith SA 50000 P4 D3

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P3 D3

E3 A Lee ME 40000 P3 D3

E4 J Miller PR 20000 P3 D3

Multi-Valued Dependencies (MVDs)

A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C

A trivial MVD A mdashgtgt B occurs when either B is a subset of A or

A cup B = R

Fourth Normal Form (4NF)

Fourth normal form (4NF) is based on the idea of multi-valued dependencies

A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies

Fourth Normal Form (4NF) Example

eno mdashgtgtpno

eno mdashgtgtdno

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3

eno ename title salary

E1 J Doe EE 30000

E2 MSmith SA 50000

E3 A Lee ME 40000

E4 J Miller PR 20000

eno pno

E1 P1

E1 P2

E2 P1

E2 P4

E3 P1

E3 P3

E4 P3

eno dno

E2 D1

E2 D2

E2 D3

E3 D2

E3 D3

E4 D3

Lossless-join Dependency

The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation

A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined

Fifth Normal Form (5NF)

bull Fifth normal form (5NF) is based on join dependencies

A relation is in fifth normal form (5NF) if the relation has no join dependency

bull 5NF is also called project-join normal form (PJNF)

A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R

The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have

ΠR1(r) ΠR2(r) hellip ΠRn(r) = r

Fifth Normal Form (5NF) Example

Consider a relation Supply (sname partName projName)

1048715Add the additional constraint that

If project j requires part p

and supplier s supplies part p

and supplier s supplies at least one item to project j

Then supplier s also supplies part p to project j

Original Supply Relation Supply Relation with new constraintsname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

Fifth Normal Form (5NF) Example

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

sname partName

Smith Bolt

Smith Nut

Adam Bolt

Walton Nut

Adam Nail

sname projName

Smith X

Smith Y

Adam Y

Walton Z

Adam X

partName projName

Bolt X

Nut Y

Bolt Y

Nut Z

Nail X

Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples

Desirable Properties in normalization

Relational schemas that are well-designed have several

important properties

1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or

relationship

2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins

3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations

Normal Forms in Practice

Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used

For example query execution time may increase because of normalization as more joins become necessary to answer queries

Normal Forms in Practice Example

For example street and city uniquely determine a zipcode

In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed

When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly

fd1

ENo City Street ZipCode

fd2

ENo ZipCode

ZipCode City Street

Normal Forms in Practice Summary

In summary normalization is typically used in two ways

To improve on relational schemas after ER design

To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies

There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance

This is the classic time versus space tradeoff

Conclusion

Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies

Normal forms indicate when relations satisfy certain properties

1NF - All attributes are atomic

2NF - All attributes are fully functionally dependent on a key

3NF - There are no transitive dependencies in the relation

4NF - There are no multi-valued dependencies in the relation

5NF - The relation has no join dependency

In practice normalization is used to improve schemas produced after ER design and existing relational schemas

It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time

Objectives

Explain process of normalization and why it is beneficial

Describe the concept of the Universal Relation

What are the motivations for normalization

What are the 3 types of update anomalies Be able to give examples

Describe the lossless-join property and dependency preservation property

Define and explain the concept of a functional dependency

What is a trivial FD

Describe the relationship between FDs and keys

What are Armstrongs Axioms When are they used

What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs

Objectives (2)

Define minimal and covers for a set of FDs

Define normal forms be able to list the normal forms and draw a diagram showing their relationship

Define 1NF Know how to convert non-1NF relation to 1NF

Define 2NF Convert 1NF relation to 2NF

Define 3NF Convert 2NF relation to 3NF

Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example

Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm

Describe multi-valued dependencies When do they occur

What is a trivial MVD

Define 4NF Convert 3NFBCNF relation to 4NF

Objectives (3)

Describe lossless-join property and how it relates to 5NF

Define 5NF

Explain normalization in terms of the time versus space tradeoff

Explain the relationship between normalization and ER modeling

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
Page 24: Software School of Hunan University 2006.08 Database Systems Design Part III Normalization

Full Functional Dependencies

A functional dependency A rarr B is a full functional dependency if removal of any attribute from A results in the dependency not existing any more We say that B is fully functionally dependent on A If remove an attribute from A and the functional dependency still exists we say that B is partially dependent on A

Example

eno rarr ename (full FD)

eno ename rarr salary title (partial FD - can remove ename)

eno pno rarr hours resp (full FD)

How to Identify FDs in a relation

Examining the semantics of the attributes of relation

Identify all non-trivial functional dependencies in the Universal Relation

Universal (eno pno resp duration ename bdate title salary superNo dno dname mgrNo pname budget)

Eno rarr ename bdate title salary superno dnodnorarrdname mgrnotitlerarr salary

Formal Definition ofFunctional Dependencies

Given relation R defined over U = A1 A2 An where X U Y U If for all pairs of tuples t1 and t2 in any legal instance of relation schema R

t1[X] = t2[X] implies t1[Y] = t2[Y]

then the functional dependency X rarr Y holds in R

Functional Dependencies and Keys

Functional dependencies can be used to determine the candidate keys of a relation

What is superkey candidate key primary key

For example if an attribute functionally determines all other attributes in the relation that attribute can be a key

eno rarr eno ename bdate title supereno dno eno is a candidate key for the Emp relation

Functional Dependencies and Keys

ENo rarr EName BDate Title SuperNo DNoDNo rarr DName MgreNotitle rarr salary

Question List the candidate key(s) of this relation

ENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5

E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7

E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5

E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5

E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7

E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8

E8 J Jones 10-11-72 SA 50000 null D1 Management E8

EmpDept relation

Applications of FDs

1) given a set of FDs F in relation R FD X rarr Y is true or not

2) given a set of FDs F in relation R a set of attributes X is a candidate key of R or not

3) given a set of FDs F in relation R which candidate keys are in R

4) given a set of FDs F in relation R some FDs in F is Excessive or not

Rules for reasoning about functional dependencies

Armstrongs Axioms are three inference rules for functional dependenciesLet A B C and D be subsets of the attributes of relation R

IR1 Reflexive Rule( 自反性 ) If AB then A rarr B

A set of attributes always trivially functionally determines any subset of itself

IR2 Augmentation Rule( 增加性 ) If A rarr B then AC rarr BC

Adding a set of attributes to both the LHS and RHS results in another valid functional dependency

IR3 Transitive Rule( 传递性 ) If A rarr B and B rarr C then A rarr C

Transitivity holds with FDs

Derived Rules from Armstrongs Axioms

There are three other well-known rules that can be derived from the basic three

IR4 Decomposition( 可分解性 ) If A rarr BC then A rarr B and A rarr C A FD with two attributes on RHS can be decomposed into two FDs each with on attribute on the RHS

IR5 Union( 合并性 ) If A rarr B and A rarr C then A rarr BC Reverse of decomposition rule

IR6 Composition( 组合性 ) If A rarr B and C rarr D then AC rarr BD More general than union - can combine two non-overlapping FDs into one

Characteristics of Armstrongs Axioms

The rules are sound( 健壮的 ) and complete( 完备的 )

1) Sound The rules do not infer( 推断 ) any incorrect FDs

2) Complete The rules can be used to infer all correct FDs (no FDs are missing)

The axioms (called IR1 IR2 and IR3) can be used to determine a set of all FDs F+ from an initial set of FDs F

F+ is referred to as the closure of F

Strategy of dealing with questions

The basic idea is that given any set of attributes X we can compute the set of all attributes X+ that can be functionally determined using F This is called the closure of X under F

For question 1 given a set of FDs F in R FD X rarr Y is true or not we know that X rarr Y holds if Y is in X+

For question 2 a set of attributes X is a candidate key of R or not X is a superkey of R if X+ is all attributes of R

For question 3 how to computing F+ we can compute X+ for all possible subsets X of R

Computing the Attribute Closure

The algorithm is as follows

Given a set of attributes X

Let X+ = X

Repeat Find a FD in F whose left side is a subset of X+ Add the right side of F to X+

Until (X+ does not change or X+ contains all attributes of R)

After the algorithm completes you have a set of attributes X+ that can be functionally determined by X This allows you to produce FDs of the form

X rarr A where A is in X+

Computing Attribute Closure Example

R (ABCDEG)

F = A rarr BC C rarr D D rarr G

Compute A+

A+ = A (initial step)

A+ = ABC (by using FD A rarr BC)

A+ = ABCD (by using FD C rarr D)

A+ = ABCDG (by using FD D rarr G)

Similarly we can compute C+ and EG+

C+ = CDG

EG+ = EG

Using Attribute Closure to Detect a FD

The attribute closure algorithm can be used to determine if a particular FD X rarr Y holds based on the set of given FDs F Compute X+ and see if Y is in X+

Example

R = (ABCDEG)

F = A rarr BC C rarr D D rarr G

Does the FD CD rarr G hold Compute CD+ = CDG Yes the FD CD rarr G holds

Does the FD BC rarr E hold Compute BC+ = BCDG No the FD BC rarr E does n

ot hold

Using Attribute Closure to Find all FDs

The attribute closure algorithm can be used to find all FDs of a given relation R if perform the attribute closure algorithm for all subsets X of the attributes R

This is an exponential algorithm in the number of attributes in R Although there are a few optimizations Do not compute the closure of the empty set or of the set of all attributes If we find X+ = all attributes do not compute the closure of any supersets of X Note that even though you do not have to compute the closure (because you know the answer) you still must use the known closure to infer new FDs

Using Attribute Closureto Find all FDs Example

Compute F+ from F where

R (ABCD)

F = A rarr BC C rarr D

Must compute all these attribute closures

A+ B+ C+ D+

AB+ AC+ AD+ BC+ BD+ CD+

ABC+ ABD+ ACD+ BCD+

Do not have to compute ABCD+

Start with the single sets

A+ = ABCD B+ = B C+ = CD D+ = D

New FD A rarr D

Using Attribute Closureto Find all FDs Example (cont)

Next the two element sets Do not compute AB+ AC+ AD+ because A+ contained all attributes of R New FDs AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C BC+ = BCD BD+ = BD CD+ = CD New FD BC rarr DThree element sets Do not compute ABC+ ABD+ ACD+ New FDs ABC rarr D ABD rarr C ACD rarr B BCD+ = BCD No new FDsAll FDs F+ = A rarr BC C rarr D A rarr D BC rarr D AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C ABC rarr D ABD rarrC ACD rarr B

FD Closure Questions

1) Compute all nontrivial FDs and list all keys of R

R = (ABCD)

F = AB rarr C C rarr D D rarr A

2) Compute all nontrivial FDs and list all candidate keys of R

R = (ABCD)

F = AB rarr C BC rarr D CD rarr A AD rarr B

Equivalent Sets of FDs

A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+ We can determine if F covers E by calculating X+ with respect to F for each FD X rarr Y in E and checking whether X+ includes Y

Two sets of FDs E and F are equivalent if E covers F and F covers E That is E+= F+

Minimal Set of FDs

A set of FDs F is minimal if it satisfies these conditions Every FD in F has a single attribute on its right-hand side We cannot replace any FD X rarr A in F with Y rarr A where Y

X and still have a set of dependencies that is equivalent to F

We cannot remove any FD and still have a set that is equivalent to F

A minimal sets of FDs is like a canonical( 规范的 ) form of FDs

Note that there may be many minimal covers for a given set of functional dependencies F

Normal Forms

A relation is in a particular normal form if it satisfies certain

normalization properties

There are several normal forms defined 1NF - First Normal Form 2NF - Second Normal Form 3NF - Third Normal Form BCNF - Boyce-Codd Normal Form 4NF - Fourth Normal Form 5NF - Fifth Normal Form

Each of these normal forms are stricter than the next

For example 3NF is better than 2NF because it removes more redundancyanomalies from the schema than 2NF

Normal Forms

All relations (non-normalized)

1NF (First Normal Form)

2NF (Second Normal Form)

3NF (Third Normal Form)

BCNF (Boyce-Codd NF)

4NF (Fourth NF)

5NF (Fifth NF)

First Normal Form (1NF)

A relation is in first normal form (1NF) if all its attribute values are atomic

That is a 1NF relation cannot have an attribute value that is a set of values (multi-valued attribute) a set of tuples (nested relation)

1NF is a standard assumption in relational DBMSs However object-oriented DBMSs and nested relational DBMSs relax t

his constraint

A relation that is not in 1NF is an unnormalized relation

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1P2

AnalystAnalyst

246

E3 A Lee P3P4

ConsultantEngineer

1048

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1 P2 Analyst Analyst 24 6

E3 A Lee P3P4 Consultant Engineer 10 48

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

A non-1NF Relation

One way to convert a non-1NF relation to a 1NF relation

1) Splitting Method - Divide the existing relation into two relations one with non-repeating attributes and the other with repeating attributes

Make a new relation consisting of the primary key of the original relation and the repeating attributes Determine a primary key for this new relation

Remove the repeating attributes from the original relation

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1P2

AnalystAnalyst

246

E3 A Lee P3P4

ConsultantEngineer

1048

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

ENo EName

E1 J Doe

E2 M Smith

E3 A Lee

E4 J Miller

E5 B Casey

E6 L Chu

E7 R Davis

ENo PNo Resp DurationE1 P1 Manager 12E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36

Second Normal Form (2NF)

A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key

If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation

Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key

Second Normal Form (2NF) Example

fd1 and fd4 are partial functional dependencies Normalize to

Emp (eno ename title bdate salary supereno dno)

WorksOn (eno pno resp hours)

Proj (pno pname budget)

ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration

EmpProj relation

fd1 fd2

fd3fd4

Second Normal Form (2NF) Example (2)

ENo EName

BDate Title Salary SuperNo

DNo

Emp relation

fd1

fd2

fd3fd4

PNo PName Budget

Proj relationENo PNo Resp Duration

WorkOn relation

Third Normal Form (3NF)

Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th

an 2 FDs

A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key

Converting a relation to 3NF from 2NF involves the removal of transitive dependencies

Third Normal Form (3NF) Example

fd2 results in a transitive dependency eno rarr salary Remove it

ENo EName BDate Title Salary SuperNo DNo

Emp relation

fd1

fd2

fd2

Title Salary

Pay relation

ENo EName BDate Title SuperNo DNo

Emp relation

fd1

General Definitions of 2NF and 3NF

We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)

General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key

General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key

Note that a prime attribute is an attribute that is in any key (candidate or primary)

General Definition of 3NF Example

The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute

However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key

Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key

ENo EName BDate Title Salary SuperNo DNo SSN

Emp relation

fd1

fd2

Normalization Question

Consider the universal relation

R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ

List the keys for R

Decompose R into 2NF and then 3NF relations

Boyce-Codd Normal Form (BCNF)

A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key

To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key

The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

ClientInterview relation

Relation is in 3NF but not BCNF

fd4

fd3

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

Note that we lose the FD fd2

fd4

fd3

ClientNo InterviewDate

InterviwTime

StaffNo StaffNo InterviwDate

RoomNo

Interview relation StaffRoom relation

BCNF versus 3NF

We can decompose to BCNF but sometimes we do not want to if we lose a FD

The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency

Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)

In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)

Conversion to BCNF

There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y

Note that if any FD in F+ violates BCNF then a FD in F violates BCNF

2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey

3) Replace R by relations with schemas R1 = X+

R2 = (R ndash X+) U X

4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib

utes are all in R1 or R2

Normalization Question

Given this database schema normalize into BCNF

fd1

PID Country Lot Area Price TaxRate

fd2

fd4fd3

Multi-Valued Dependencies

A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema

A MVD occurs when two independent 1N relationships are in the relational schema

Multi-Valued Dependencies Example

Eno ename Title Salary Pno dno

E1 J Doe EE 30000 P1P2

E2 M Smith SA 50000 P1P4 D1D2D3

E3 A Lee ME 40000 P1P3 D2D3

E4 J Miller PR 20000 P3 D3

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1

E1 J Doe EE 30000 P2

E2 M Smith SA 50000 P1 D1

E2 M Smith SA 50000 P1 D2

E2 M Smith SA 50000 P1 D3

E2 M Smith SA 50000 P4 D1

E2 M Smith SA 50000 P4 D2

E2 M Smith SA 50000 P4 D3

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P3 D3

E3 A Lee ME 40000 P3 D3

E4 J Miller PR 20000 P3 D3

Multi-Valued Dependencies (MVDs)

A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C

A trivial MVD A mdashgtgt B occurs when either B is a subset of A or

A cup B = R

Fourth Normal Form (4NF)

Fourth normal form (4NF) is based on the idea of multi-valued dependencies

A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies

Fourth Normal Form (4NF) Example

eno mdashgtgtpno

eno mdashgtgtdno

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3

eno ename title salary

E1 J Doe EE 30000

E2 MSmith SA 50000

E3 A Lee ME 40000

E4 J Miller PR 20000

eno pno

E1 P1

E1 P2

E2 P1

E2 P4

E3 P1

E3 P3

E4 P3

eno dno

E2 D1

E2 D2

E2 D3

E3 D2

E3 D3

E4 D3

Lossless-join Dependency

The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation

A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined

Fifth Normal Form (5NF)

bull Fifth normal form (5NF) is based on join dependencies

A relation is in fifth normal form (5NF) if the relation has no join dependency

bull 5NF is also called project-join normal form (PJNF)

A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R

The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have

ΠR1(r) ΠR2(r) hellip ΠRn(r) = r

Fifth Normal Form (5NF) Example

Consider a relation Supply (sname partName projName)

1048715Add the additional constraint that

If project j requires part p

and supplier s supplies part p

and supplier s supplies at least one item to project j

Then supplier s also supplies part p to project j

Original Supply Relation Supply Relation with new constraintsname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

Fifth Normal Form (5NF) Example

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

sname partName

Smith Bolt

Smith Nut

Adam Bolt

Walton Nut

Adam Nail

sname projName

Smith X

Smith Y

Adam Y

Walton Z

Adam X

partName projName

Bolt X

Nut Y

Bolt Y

Nut Z

Nail X

Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples

Desirable Properties in normalization

Relational schemas that are well-designed have several

important properties

1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or

relationship

2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins

3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations

Normal Forms in Practice

Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used

For example query execution time may increase because of normalization as more joins become necessary to answer queries

Normal Forms in Practice Example

For example street and city uniquely determine a zipcode

In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed

When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly

fd1

ENo City Street ZipCode

fd2

ENo ZipCode

ZipCode City Street

Normal Forms in Practice Summary

In summary normalization is typically used in two ways

To improve on relational schemas after ER design

To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies

There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance

This is the classic time versus space tradeoff

Conclusion

Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies

Normal forms indicate when relations satisfy certain properties

1NF - All attributes are atomic

2NF - All attributes are fully functionally dependent on a key

3NF - There are no transitive dependencies in the relation

4NF - There are no multi-valued dependencies in the relation

5NF - The relation has no join dependency

In practice normalization is used to improve schemas produced after ER design and existing relational schemas

It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time

Objectives

Explain process of normalization and why it is beneficial

Describe the concept of the Universal Relation

What are the motivations for normalization

What are the 3 types of update anomalies Be able to give examples

Describe the lossless-join property and dependency preservation property

Define and explain the concept of a functional dependency

What is a trivial FD

Describe the relationship between FDs and keys

What are Armstrongs Axioms When are they used

What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs

Objectives (2)

Define minimal and covers for a set of FDs

Define normal forms be able to list the normal forms and draw a diagram showing their relationship

Define 1NF Know how to convert non-1NF relation to 1NF

Define 2NF Convert 1NF relation to 2NF

Define 3NF Convert 2NF relation to 3NF

Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example

Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm

Describe multi-valued dependencies When do they occur

What is a trivial MVD

Define 4NF Convert 3NFBCNF relation to 4NF

Objectives (3)

Describe lossless-join property and how it relates to 5NF

Define 5NF

Explain normalization in terms of the time versus space tradeoff

Explain the relationship between normalization and ER modeling

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
Page 25: Software School of Hunan University 2006.08 Database Systems Design Part III Normalization

How to Identify FDs in a relation

Examining the semantics of the attributes of relation

Identify all non-trivial functional dependencies in the Universal Relation

Universal (eno pno resp duration ename bdate title salary superNo dno dname mgrNo pname budget)

Eno rarr ename bdate title salary superno dnodnorarrdname mgrnotitlerarr salary

Formal Definition ofFunctional Dependencies

Given relation R defined over U = A1 A2 An where X U Y U If for all pairs of tuples t1 and t2 in any legal instance of relation schema R

t1[X] = t2[X] implies t1[Y] = t2[Y]

then the functional dependency X rarr Y holds in R

Functional Dependencies and Keys

Functional dependencies can be used to determine the candidate keys of a relation

What is superkey candidate key primary key

For example if an attribute functionally determines all other attributes in the relation that attribute can be a key

eno rarr eno ename bdate title supereno dno eno is a candidate key for the Emp relation

Functional Dependencies and Keys

ENo rarr EName BDate Title SuperNo DNoDNo rarr DName MgreNotitle rarr salary

Question List the candidate key(s) of this relation

ENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5

E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7

E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5

E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5

E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7

E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8

E8 J Jones 10-11-72 SA 50000 null D1 Management E8

EmpDept relation

Applications of FDs

1) given a set of FDs F in relation R FD X rarr Y is true or not

2) given a set of FDs F in relation R a set of attributes X is a candidate key of R or not

3) given a set of FDs F in relation R which candidate keys are in R

4) given a set of FDs F in relation R some FDs in F is Excessive or not

Rules for reasoning about functional dependencies

Armstrongs Axioms are three inference rules for functional dependenciesLet A B C and D be subsets of the attributes of relation R

IR1 Reflexive Rule( 自反性 ) If AB then A rarr B

A set of attributes always trivially functionally determines any subset of itself

IR2 Augmentation Rule( 增加性 ) If A rarr B then AC rarr BC

Adding a set of attributes to both the LHS and RHS results in another valid functional dependency

IR3 Transitive Rule( 传递性 ) If A rarr B and B rarr C then A rarr C

Transitivity holds with FDs

Derived Rules from Armstrongs Axioms

There are three other well-known rules that can be derived from the basic three

IR4 Decomposition( 可分解性 ) If A rarr BC then A rarr B and A rarr C A FD with two attributes on RHS can be decomposed into two FDs each with on attribute on the RHS

IR5 Union( 合并性 ) If A rarr B and A rarr C then A rarr BC Reverse of decomposition rule

IR6 Composition( 组合性 ) If A rarr B and C rarr D then AC rarr BD More general than union - can combine two non-overlapping FDs into one

Characteristics of Armstrongs Axioms

The rules are sound( 健壮的 ) and complete( 完备的 )

1) Sound The rules do not infer( 推断 ) any incorrect FDs

2) Complete The rules can be used to infer all correct FDs (no FDs are missing)

The axioms (called IR1 IR2 and IR3) can be used to determine a set of all FDs F+ from an initial set of FDs F

F+ is referred to as the closure of F

Strategy of dealing with questions

The basic idea is that given any set of attributes X we can compute the set of all attributes X+ that can be functionally determined using F This is called the closure of X under F

For question 1 given a set of FDs F in R FD X rarr Y is true or not we know that X rarr Y holds if Y is in X+

For question 2 a set of attributes X is a candidate key of R or not X is a superkey of R if X+ is all attributes of R

For question 3 how to computing F+ we can compute X+ for all possible subsets X of R

Computing the Attribute Closure

The algorithm is as follows

Given a set of attributes X

Let X+ = X

Repeat Find a FD in F whose left side is a subset of X+ Add the right side of F to X+

Until (X+ does not change or X+ contains all attributes of R)

After the algorithm completes you have a set of attributes X+ that can be functionally determined by X This allows you to produce FDs of the form

X rarr A where A is in X+

Computing Attribute Closure Example

R (ABCDEG)

F = A rarr BC C rarr D D rarr G

Compute A+

A+ = A (initial step)

A+ = ABC (by using FD A rarr BC)

A+ = ABCD (by using FD C rarr D)

A+ = ABCDG (by using FD D rarr G)

Similarly we can compute C+ and EG+

C+ = CDG

EG+ = EG

Using Attribute Closure to Detect a FD

The attribute closure algorithm can be used to determine if a particular FD X rarr Y holds based on the set of given FDs F Compute X+ and see if Y is in X+

Example

R = (ABCDEG)

F = A rarr BC C rarr D D rarr G

Does the FD CD rarr G hold Compute CD+ = CDG Yes the FD CD rarr G holds

Does the FD BC rarr E hold Compute BC+ = BCDG No the FD BC rarr E does n

ot hold

Using Attribute Closure to Find all FDs

The attribute closure algorithm can be used to find all FDs of a given relation R if perform the attribute closure algorithm for all subsets X of the attributes R

This is an exponential algorithm in the number of attributes in R Although there are a few optimizations Do not compute the closure of the empty set or of the set of all attributes If we find X+ = all attributes do not compute the closure of any supersets of X Note that even though you do not have to compute the closure (because you know the answer) you still must use the known closure to infer new FDs

Using Attribute Closureto Find all FDs Example

Compute F+ from F where

R (ABCD)

F = A rarr BC C rarr D

Must compute all these attribute closures

A+ B+ C+ D+

AB+ AC+ AD+ BC+ BD+ CD+

ABC+ ABD+ ACD+ BCD+

Do not have to compute ABCD+

Start with the single sets

A+ = ABCD B+ = B C+ = CD D+ = D

New FD A rarr D

Using Attribute Closureto Find all FDs Example (cont)

Next the two element sets Do not compute AB+ AC+ AD+ because A+ contained all attributes of R New FDs AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C BC+ = BCD BD+ = BD CD+ = CD New FD BC rarr DThree element sets Do not compute ABC+ ABD+ ACD+ New FDs ABC rarr D ABD rarr C ACD rarr B BCD+ = BCD No new FDsAll FDs F+ = A rarr BC C rarr D A rarr D BC rarr D AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C ABC rarr D ABD rarrC ACD rarr B

FD Closure Questions

1) Compute all nontrivial FDs and list all keys of R

R = (ABCD)

F = AB rarr C C rarr D D rarr A

2) Compute all nontrivial FDs and list all candidate keys of R

R = (ABCD)

F = AB rarr C BC rarr D CD rarr A AD rarr B

Equivalent Sets of FDs

A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+ We can determine if F covers E by calculating X+ with respect to F for each FD X rarr Y in E and checking whether X+ includes Y

Two sets of FDs E and F are equivalent if E covers F and F covers E That is E+= F+

Minimal Set of FDs

A set of FDs F is minimal if it satisfies these conditions Every FD in F has a single attribute on its right-hand side We cannot replace any FD X rarr A in F with Y rarr A where Y

X and still have a set of dependencies that is equivalent to F

We cannot remove any FD and still have a set that is equivalent to F

A minimal sets of FDs is like a canonical( 规范的 ) form of FDs

Note that there may be many minimal covers for a given set of functional dependencies F

Normal Forms

A relation is in a particular normal form if it satisfies certain

normalization properties

There are several normal forms defined 1NF - First Normal Form 2NF - Second Normal Form 3NF - Third Normal Form BCNF - Boyce-Codd Normal Form 4NF - Fourth Normal Form 5NF - Fifth Normal Form

Each of these normal forms are stricter than the next

For example 3NF is better than 2NF because it removes more redundancyanomalies from the schema than 2NF

Normal Forms

All relations (non-normalized)

1NF (First Normal Form)

2NF (Second Normal Form)

3NF (Third Normal Form)

BCNF (Boyce-Codd NF)

4NF (Fourth NF)

5NF (Fifth NF)

First Normal Form (1NF)

A relation is in first normal form (1NF) if all its attribute values are atomic

That is a 1NF relation cannot have an attribute value that is a set of values (multi-valued attribute) a set of tuples (nested relation)

1NF is a standard assumption in relational DBMSs However object-oriented DBMSs and nested relational DBMSs relax t

his constraint

A relation that is not in 1NF is an unnormalized relation

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1P2

AnalystAnalyst

246

E3 A Lee P3P4

ConsultantEngineer

1048

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1 P2 Analyst Analyst 24 6

E3 A Lee P3P4 Consultant Engineer 10 48

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

A non-1NF Relation

One way to convert a non-1NF relation to a 1NF relation

1) Splitting Method - Divide the existing relation into two relations one with non-repeating attributes and the other with repeating attributes

Make a new relation consisting of the primary key of the original relation and the repeating attributes Determine a primary key for this new relation

Remove the repeating attributes from the original relation

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1P2

AnalystAnalyst

246

E3 A Lee P3P4

ConsultantEngineer

1048

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

ENo EName

E1 J Doe

E2 M Smith

E3 A Lee

E4 J Miller

E5 B Casey

E6 L Chu

E7 R Davis

ENo PNo Resp DurationE1 P1 Manager 12E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36

Second Normal Form (2NF)

A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key

If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation

Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key

Second Normal Form (2NF) Example

fd1 and fd4 are partial functional dependencies Normalize to

Emp (eno ename title bdate salary supereno dno)

WorksOn (eno pno resp hours)

Proj (pno pname budget)

ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration

EmpProj relation

fd1 fd2

fd3fd4

Second Normal Form (2NF) Example (2)

ENo EName

BDate Title Salary SuperNo

DNo

Emp relation

fd1

fd2

fd3fd4

PNo PName Budget

Proj relationENo PNo Resp Duration

WorkOn relation

Third Normal Form (3NF)

Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th

an 2 FDs

A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key

Converting a relation to 3NF from 2NF involves the removal of transitive dependencies

Third Normal Form (3NF) Example

fd2 results in a transitive dependency eno rarr salary Remove it

ENo EName BDate Title Salary SuperNo DNo

Emp relation

fd1

fd2

fd2

Title Salary

Pay relation

ENo EName BDate Title SuperNo DNo

Emp relation

fd1

General Definitions of 2NF and 3NF

We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)

General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key

General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key

Note that a prime attribute is an attribute that is in any key (candidate or primary)

General Definition of 3NF Example

The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute

However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key

Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key

ENo EName BDate Title Salary SuperNo DNo SSN

Emp relation

fd1

fd2

Normalization Question

Consider the universal relation

R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ

List the keys for R

Decompose R into 2NF and then 3NF relations

Boyce-Codd Normal Form (BCNF)

A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key

To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key

The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

ClientInterview relation

Relation is in 3NF but not BCNF

fd4

fd3

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

Note that we lose the FD fd2

fd4

fd3

ClientNo InterviewDate

InterviwTime

StaffNo StaffNo InterviwDate

RoomNo

Interview relation StaffRoom relation

BCNF versus 3NF

We can decompose to BCNF but sometimes we do not want to if we lose a FD

The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency

Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)

In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)

Conversion to BCNF

There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y

Note that if any FD in F+ violates BCNF then a FD in F violates BCNF

2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey

3) Replace R by relations with schemas R1 = X+

R2 = (R ndash X+) U X

4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib

utes are all in R1 or R2

Normalization Question

Given this database schema normalize into BCNF

fd1

PID Country Lot Area Price TaxRate

fd2

fd4fd3

Multi-Valued Dependencies

A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema

A MVD occurs when two independent 1N relationships are in the relational schema

Multi-Valued Dependencies Example

Eno ename Title Salary Pno dno

E1 J Doe EE 30000 P1P2

E2 M Smith SA 50000 P1P4 D1D2D3

E3 A Lee ME 40000 P1P3 D2D3

E4 J Miller PR 20000 P3 D3

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1

E1 J Doe EE 30000 P2

E2 M Smith SA 50000 P1 D1

E2 M Smith SA 50000 P1 D2

E2 M Smith SA 50000 P1 D3

E2 M Smith SA 50000 P4 D1

E2 M Smith SA 50000 P4 D2

E2 M Smith SA 50000 P4 D3

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P3 D3

E3 A Lee ME 40000 P3 D3

E4 J Miller PR 20000 P3 D3

Multi-Valued Dependencies (MVDs)

A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C

A trivial MVD A mdashgtgt B occurs when either B is a subset of A or

A cup B = R

Fourth Normal Form (4NF)

Fourth normal form (4NF) is based on the idea of multi-valued dependencies

A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies

Fourth Normal Form (4NF) Example

eno mdashgtgtpno

eno mdashgtgtdno

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3

eno ename title salary

E1 J Doe EE 30000

E2 MSmith SA 50000

E3 A Lee ME 40000

E4 J Miller PR 20000

eno pno

E1 P1

E1 P2

E2 P1

E2 P4

E3 P1

E3 P3

E4 P3

eno dno

E2 D1

E2 D2

E2 D3

E3 D2

E3 D3

E4 D3

Lossless-join Dependency

The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation

A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined

Fifth Normal Form (5NF)

bull Fifth normal form (5NF) is based on join dependencies

A relation is in fifth normal form (5NF) if the relation has no join dependency

bull 5NF is also called project-join normal form (PJNF)

A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R

The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have

ΠR1(r) ΠR2(r) hellip ΠRn(r) = r

Fifth Normal Form (5NF) Example

Consider a relation Supply (sname partName projName)

1048715Add the additional constraint that

If project j requires part p

and supplier s supplies part p

and supplier s supplies at least one item to project j

Then supplier s also supplies part p to project j

Original Supply Relation Supply Relation with new constraintsname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

Fifth Normal Form (5NF) Example

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

sname partName

Smith Bolt

Smith Nut

Adam Bolt

Walton Nut

Adam Nail

sname projName

Smith X

Smith Y

Adam Y

Walton Z

Adam X

partName projName

Bolt X

Nut Y

Bolt Y

Nut Z

Nail X

Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples

Desirable Properties in normalization

Relational schemas that are well-designed have several

important properties

1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or

relationship

2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins

3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations

Normal Forms in Practice

Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used

For example query execution time may increase because of normalization as more joins become necessary to answer queries

Normal Forms in Practice Example

For example street and city uniquely determine a zipcode

In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed

When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly

fd1

ENo City Street ZipCode

fd2

ENo ZipCode

ZipCode City Street

Normal Forms in Practice Summary

In summary normalization is typically used in two ways

To improve on relational schemas after ER design

To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies

There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance

This is the classic time versus space tradeoff

Conclusion

Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies

Normal forms indicate when relations satisfy certain properties

1NF - All attributes are atomic

2NF - All attributes are fully functionally dependent on a key

3NF - There are no transitive dependencies in the relation

4NF - There are no multi-valued dependencies in the relation

5NF - The relation has no join dependency

In practice normalization is used to improve schemas produced after ER design and existing relational schemas

It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time

Objectives

Explain process of normalization and why it is beneficial

Describe the concept of the Universal Relation

What are the motivations for normalization

What are the 3 types of update anomalies Be able to give examples

Describe the lossless-join property and dependency preservation property

Define and explain the concept of a functional dependency

What is a trivial FD

Describe the relationship between FDs and keys

What are Armstrongs Axioms When are they used

What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs

Objectives (2)

Define minimal and covers for a set of FDs

Define normal forms be able to list the normal forms and draw a diagram showing their relationship

Define 1NF Know how to convert non-1NF relation to 1NF

Define 2NF Convert 1NF relation to 2NF

Define 3NF Convert 2NF relation to 3NF

Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example

Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm

Describe multi-valued dependencies When do they occur

What is a trivial MVD

Define 4NF Convert 3NFBCNF relation to 4NF

Objectives (3)

Describe lossless-join property and how it relates to 5NF

Define 5NF

Explain normalization in terms of the time versus space tradeoff

Explain the relationship between normalization and ER modeling

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
Page 26: Software School of Hunan University 2006.08 Database Systems Design Part III Normalization

Formal Definition ofFunctional Dependencies

Given relation R defined over U = A1 A2 An where X U Y U If for all pairs of tuples t1 and t2 in any legal instance of relation schema R

t1[X] = t2[X] implies t1[Y] = t2[Y]

then the functional dependency X rarr Y holds in R

Functional Dependencies and Keys

Functional dependencies can be used to determine the candidate keys of a relation

What is superkey candidate key primary key

For example if an attribute functionally determines all other attributes in the relation that attribute can be a key

eno rarr eno ename bdate title supereno dno eno is a candidate key for the Emp relation

Functional Dependencies and Keys

ENo rarr EName BDate Title SuperNo DNoDNo rarr DName MgreNotitle rarr salary

Question List the candidate key(s) of this relation

ENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5

E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7

E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5

E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5

E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7

E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8

E8 J Jones 10-11-72 SA 50000 null D1 Management E8

EmpDept relation

Applications of FDs

1) given a set of FDs F in relation R FD X rarr Y is true or not

2) given a set of FDs F in relation R a set of attributes X is a candidate key of R or not

3) given a set of FDs F in relation R which candidate keys are in R

4) given a set of FDs F in relation R some FDs in F is Excessive or not

Rules for reasoning about functional dependencies

Armstrongs Axioms are three inference rules for functional dependenciesLet A B C and D be subsets of the attributes of relation R

IR1 Reflexive Rule( 自反性 ) If AB then A rarr B

A set of attributes always trivially functionally determines any subset of itself

IR2 Augmentation Rule( 增加性 ) If A rarr B then AC rarr BC

Adding a set of attributes to both the LHS and RHS results in another valid functional dependency

IR3 Transitive Rule( 传递性 ) If A rarr B and B rarr C then A rarr C

Transitivity holds with FDs

Derived Rules from Armstrongs Axioms

There are three other well-known rules that can be derived from the basic three

IR4 Decomposition( 可分解性 ) If A rarr BC then A rarr B and A rarr C A FD with two attributes on RHS can be decomposed into two FDs each with on attribute on the RHS

IR5 Union( 合并性 ) If A rarr B and A rarr C then A rarr BC Reverse of decomposition rule

IR6 Composition( 组合性 ) If A rarr B and C rarr D then AC rarr BD More general than union - can combine two non-overlapping FDs into one

Characteristics of Armstrongs Axioms

The rules are sound( 健壮的 ) and complete( 完备的 )

1) Sound The rules do not infer( 推断 ) any incorrect FDs

2) Complete The rules can be used to infer all correct FDs (no FDs are missing)

The axioms (called IR1 IR2 and IR3) can be used to determine a set of all FDs F+ from an initial set of FDs F

F+ is referred to as the closure of F

Strategy of dealing with questions

The basic idea is that given any set of attributes X we can compute the set of all attributes X+ that can be functionally determined using F This is called the closure of X under F

For question 1 given a set of FDs F in R FD X rarr Y is true or not we know that X rarr Y holds if Y is in X+

For question 2 a set of attributes X is a candidate key of R or not X is a superkey of R if X+ is all attributes of R

For question 3 how to computing F+ we can compute X+ for all possible subsets X of R

Computing the Attribute Closure

The algorithm is as follows

Given a set of attributes X

Let X+ = X

Repeat Find a FD in F whose left side is a subset of X+ Add the right side of F to X+

Until (X+ does not change or X+ contains all attributes of R)

After the algorithm completes you have a set of attributes X+ that can be functionally determined by X This allows you to produce FDs of the form

X rarr A where A is in X+

Computing Attribute Closure Example

R (ABCDEG)

F = A rarr BC C rarr D D rarr G

Compute A+

A+ = A (initial step)

A+ = ABC (by using FD A rarr BC)

A+ = ABCD (by using FD C rarr D)

A+ = ABCDG (by using FD D rarr G)

Similarly we can compute C+ and EG+

C+ = CDG

EG+ = EG

Using Attribute Closure to Detect a FD

The attribute closure algorithm can be used to determine if a particular FD X rarr Y holds based on the set of given FDs F Compute X+ and see if Y is in X+

Example

R = (ABCDEG)

F = A rarr BC C rarr D D rarr G

Does the FD CD rarr G hold Compute CD+ = CDG Yes the FD CD rarr G holds

Does the FD BC rarr E hold Compute BC+ = BCDG No the FD BC rarr E does n

ot hold

Using Attribute Closure to Find all FDs

The attribute closure algorithm can be used to find all FDs of a given relation R if perform the attribute closure algorithm for all subsets X of the attributes R

This is an exponential algorithm in the number of attributes in R Although there are a few optimizations Do not compute the closure of the empty set or of the set of all attributes If we find X+ = all attributes do not compute the closure of any supersets of X Note that even though you do not have to compute the closure (because you know the answer) you still must use the known closure to infer new FDs

Using Attribute Closureto Find all FDs Example

Compute F+ from F where

R (ABCD)

F = A rarr BC C rarr D

Must compute all these attribute closures

A+ B+ C+ D+

AB+ AC+ AD+ BC+ BD+ CD+

ABC+ ABD+ ACD+ BCD+

Do not have to compute ABCD+

Start with the single sets

A+ = ABCD B+ = B C+ = CD D+ = D

New FD A rarr D

Using Attribute Closureto Find all FDs Example (cont)

Next the two element sets Do not compute AB+ AC+ AD+ because A+ contained all attributes of R New FDs AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C BC+ = BCD BD+ = BD CD+ = CD New FD BC rarr DThree element sets Do not compute ABC+ ABD+ ACD+ New FDs ABC rarr D ABD rarr C ACD rarr B BCD+ = BCD No new FDsAll FDs F+ = A rarr BC C rarr D A rarr D BC rarr D AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C ABC rarr D ABD rarrC ACD rarr B

FD Closure Questions

1) Compute all nontrivial FDs and list all keys of R

R = (ABCD)

F = AB rarr C C rarr D D rarr A

2) Compute all nontrivial FDs and list all candidate keys of R

R = (ABCD)

F = AB rarr C BC rarr D CD rarr A AD rarr B

Equivalent Sets of FDs

A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+ We can determine if F covers E by calculating X+ with respect to F for each FD X rarr Y in E and checking whether X+ includes Y

Two sets of FDs E and F are equivalent if E covers F and F covers E That is E+= F+

Minimal Set of FDs

A set of FDs F is minimal if it satisfies these conditions Every FD in F has a single attribute on its right-hand side We cannot replace any FD X rarr A in F with Y rarr A where Y

X and still have a set of dependencies that is equivalent to F

We cannot remove any FD and still have a set that is equivalent to F

A minimal sets of FDs is like a canonical( 规范的 ) form of FDs

Note that there may be many minimal covers for a given set of functional dependencies F

Normal Forms

A relation is in a particular normal form if it satisfies certain

normalization properties

There are several normal forms defined 1NF - First Normal Form 2NF - Second Normal Form 3NF - Third Normal Form BCNF - Boyce-Codd Normal Form 4NF - Fourth Normal Form 5NF - Fifth Normal Form

Each of these normal forms are stricter than the next

For example 3NF is better than 2NF because it removes more redundancyanomalies from the schema than 2NF

Normal Forms

All relations (non-normalized)

1NF (First Normal Form)

2NF (Second Normal Form)

3NF (Third Normal Form)

BCNF (Boyce-Codd NF)

4NF (Fourth NF)

5NF (Fifth NF)

First Normal Form (1NF)

A relation is in first normal form (1NF) if all its attribute values are atomic

That is a 1NF relation cannot have an attribute value that is a set of values (multi-valued attribute) a set of tuples (nested relation)

1NF is a standard assumption in relational DBMSs However object-oriented DBMSs and nested relational DBMSs relax t

his constraint

A relation that is not in 1NF is an unnormalized relation

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1P2

AnalystAnalyst

246

E3 A Lee P3P4

ConsultantEngineer

1048

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1 P2 Analyst Analyst 24 6

E3 A Lee P3P4 Consultant Engineer 10 48

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

A non-1NF Relation

One way to convert a non-1NF relation to a 1NF relation

1) Splitting Method - Divide the existing relation into two relations one with non-repeating attributes and the other with repeating attributes

Make a new relation consisting of the primary key of the original relation and the repeating attributes Determine a primary key for this new relation

Remove the repeating attributes from the original relation

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1P2

AnalystAnalyst

246

E3 A Lee P3P4

ConsultantEngineer

1048

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

ENo EName

E1 J Doe

E2 M Smith

E3 A Lee

E4 J Miller

E5 B Casey

E6 L Chu

E7 R Davis

ENo PNo Resp DurationE1 P1 Manager 12E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36

Second Normal Form (2NF)

A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key

If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation

Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key

Second Normal Form (2NF) Example

fd1 and fd4 are partial functional dependencies Normalize to

Emp (eno ename title bdate salary supereno dno)

WorksOn (eno pno resp hours)

Proj (pno pname budget)

ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration

EmpProj relation

fd1 fd2

fd3fd4

Second Normal Form (2NF) Example (2)

ENo EName

BDate Title Salary SuperNo

DNo

Emp relation

fd1

fd2

fd3fd4

PNo PName Budget

Proj relationENo PNo Resp Duration

WorkOn relation

Third Normal Form (3NF)

Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th

an 2 FDs

A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key

Converting a relation to 3NF from 2NF involves the removal of transitive dependencies

Third Normal Form (3NF) Example

fd2 results in a transitive dependency eno rarr salary Remove it

ENo EName BDate Title Salary SuperNo DNo

Emp relation

fd1

fd2

fd2

Title Salary

Pay relation

ENo EName BDate Title SuperNo DNo

Emp relation

fd1

General Definitions of 2NF and 3NF

We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)

General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key

General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key

Note that a prime attribute is an attribute that is in any key (candidate or primary)

General Definition of 3NF Example

The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute

However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key

Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key

ENo EName BDate Title Salary SuperNo DNo SSN

Emp relation

fd1

fd2

Normalization Question

Consider the universal relation

R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ

List the keys for R

Decompose R into 2NF and then 3NF relations

Boyce-Codd Normal Form (BCNF)

A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key

To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key

The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

ClientInterview relation

Relation is in 3NF but not BCNF

fd4

fd3

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

Note that we lose the FD fd2

fd4

fd3

ClientNo InterviewDate

InterviwTime

StaffNo StaffNo InterviwDate

RoomNo

Interview relation StaffRoom relation

BCNF versus 3NF

We can decompose to BCNF but sometimes we do not want to if we lose a FD

The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency

Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)

In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)

Conversion to BCNF

There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y

Note that if any FD in F+ violates BCNF then a FD in F violates BCNF

2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey

3) Replace R by relations with schemas R1 = X+

R2 = (R ndash X+) U X

4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib

utes are all in R1 or R2

Normalization Question

Given this database schema normalize into BCNF

fd1

PID Country Lot Area Price TaxRate

fd2

fd4fd3

Multi-Valued Dependencies

A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema

A MVD occurs when two independent 1N relationships are in the relational schema

Multi-Valued Dependencies Example

Eno ename Title Salary Pno dno

E1 J Doe EE 30000 P1P2

E2 M Smith SA 50000 P1P4 D1D2D3

E3 A Lee ME 40000 P1P3 D2D3

E4 J Miller PR 20000 P3 D3

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1

E1 J Doe EE 30000 P2

E2 M Smith SA 50000 P1 D1

E2 M Smith SA 50000 P1 D2

E2 M Smith SA 50000 P1 D3

E2 M Smith SA 50000 P4 D1

E2 M Smith SA 50000 P4 D2

E2 M Smith SA 50000 P4 D3

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P3 D3

E3 A Lee ME 40000 P3 D3

E4 J Miller PR 20000 P3 D3

Multi-Valued Dependencies (MVDs)

A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C

A trivial MVD A mdashgtgt B occurs when either B is a subset of A or

A cup B = R

Fourth Normal Form (4NF)

Fourth normal form (4NF) is based on the idea of multi-valued dependencies

A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies

Fourth Normal Form (4NF) Example

eno mdashgtgtpno

eno mdashgtgtdno

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3

eno ename title salary

E1 J Doe EE 30000

E2 MSmith SA 50000

E3 A Lee ME 40000

E4 J Miller PR 20000

eno pno

E1 P1

E1 P2

E2 P1

E2 P4

E3 P1

E3 P3

E4 P3

eno dno

E2 D1

E2 D2

E2 D3

E3 D2

E3 D3

E4 D3

Lossless-join Dependency

The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation

A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined

Fifth Normal Form (5NF)

bull Fifth normal form (5NF) is based on join dependencies

A relation is in fifth normal form (5NF) if the relation has no join dependency

bull 5NF is also called project-join normal form (PJNF)

A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R

The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have

ΠR1(r) ΠR2(r) hellip ΠRn(r) = r

Fifth Normal Form (5NF) Example

Consider a relation Supply (sname partName projName)

1048715Add the additional constraint that

If project j requires part p

and supplier s supplies part p

and supplier s supplies at least one item to project j

Then supplier s also supplies part p to project j

Original Supply Relation Supply Relation with new constraintsname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

Fifth Normal Form (5NF) Example

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

sname partName

Smith Bolt

Smith Nut

Adam Bolt

Walton Nut

Adam Nail

sname projName

Smith X

Smith Y

Adam Y

Walton Z

Adam X

partName projName

Bolt X

Nut Y

Bolt Y

Nut Z

Nail X

Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples

Desirable Properties in normalization

Relational schemas that are well-designed have several

important properties

1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or

relationship

2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins

3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations

Normal Forms in Practice

Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used

For example query execution time may increase because of normalization as more joins become necessary to answer queries

Normal Forms in Practice Example

For example street and city uniquely determine a zipcode

In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed

When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly

fd1

ENo City Street ZipCode

fd2

ENo ZipCode

ZipCode City Street

Normal Forms in Practice Summary

In summary normalization is typically used in two ways

To improve on relational schemas after ER design

To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies

There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance

This is the classic time versus space tradeoff

Conclusion

Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies

Normal forms indicate when relations satisfy certain properties

1NF - All attributes are atomic

2NF - All attributes are fully functionally dependent on a key

3NF - There are no transitive dependencies in the relation

4NF - There are no multi-valued dependencies in the relation

5NF - The relation has no join dependency

In practice normalization is used to improve schemas produced after ER design and existing relational schemas

It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time

Objectives

Explain process of normalization and why it is beneficial

Describe the concept of the Universal Relation

What are the motivations for normalization

What are the 3 types of update anomalies Be able to give examples

Describe the lossless-join property and dependency preservation property

Define and explain the concept of a functional dependency

What is a trivial FD

Describe the relationship between FDs and keys

What are Armstrongs Axioms When are they used

What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs

Objectives (2)

Define minimal and covers for a set of FDs

Define normal forms be able to list the normal forms and draw a diagram showing their relationship

Define 1NF Know how to convert non-1NF relation to 1NF

Define 2NF Convert 1NF relation to 2NF

Define 3NF Convert 2NF relation to 3NF

Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example

Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm

Describe multi-valued dependencies When do they occur

What is a trivial MVD

Define 4NF Convert 3NFBCNF relation to 4NF

Objectives (3)

Describe lossless-join property and how it relates to 5NF

Define 5NF

Explain normalization in terms of the time versus space tradeoff

Explain the relationship between normalization and ER modeling

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
Page 27: Software School of Hunan University 2006.08 Database Systems Design Part III Normalization

Functional Dependencies and Keys

Functional dependencies can be used to determine the candidate keys of a relation

What is superkey candidate key primary key

For example if an attribute functionally determines all other attributes in the relation that attribute can be a key

eno rarr eno ename bdate title supereno dno eno is a candidate key for the Emp relation

Functional Dependencies and Keys

ENo rarr EName BDate Title SuperNo DNoDNo rarr DName MgreNotitle rarr salary

Question List the candidate key(s) of this relation

ENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5

E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7

E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5

E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5

E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7

E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8

E8 J Jones 10-11-72 SA 50000 null D1 Management E8

EmpDept relation

Applications of FDs

1) given a set of FDs F in relation R FD X rarr Y is true or not

2) given a set of FDs F in relation R a set of attributes X is a candidate key of R or not

3) given a set of FDs F in relation R which candidate keys are in R

4) given a set of FDs F in relation R some FDs in F is Excessive or not

Rules for reasoning about functional dependencies

Armstrongs Axioms are three inference rules for functional dependenciesLet A B C and D be subsets of the attributes of relation R

IR1 Reflexive Rule( 自反性 ) If AB then A rarr B

A set of attributes always trivially functionally determines any subset of itself

IR2 Augmentation Rule( 增加性 ) If A rarr B then AC rarr BC

Adding a set of attributes to both the LHS and RHS results in another valid functional dependency

IR3 Transitive Rule( 传递性 ) If A rarr B and B rarr C then A rarr C

Transitivity holds with FDs

Derived Rules from Armstrongs Axioms

There are three other well-known rules that can be derived from the basic three

IR4 Decomposition( 可分解性 ) If A rarr BC then A rarr B and A rarr C A FD with two attributes on RHS can be decomposed into two FDs each with on attribute on the RHS

IR5 Union( 合并性 ) If A rarr B and A rarr C then A rarr BC Reverse of decomposition rule

IR6 Composition( 组合性 ) If A rarr B and C rarr D then AC rarr BD More general than union - can combine two non-overlapping FDs into one

Characteristics of Armstrongs Axioms

The rules are sound( 健壮的 ) and complete( 完备的 )

1) Sound The rules do not infer( 推断 ) any incorrect FDs

2) Complete The rules can be used to infer all correct FDs (no FDs are missing)

The axioms (called IR1 IR2 and IR3) can be used to determine a set of all FDs F+ from an initial set of FDs F

F+ is referred to as the closure of F

Strategy of dealing with questions

The basic idea is that given any set of attributes X we can compute the set of all attributes X+ that can be functionally determined using F This is called the closure of X under F

For question 1 given a set of FDs F in R FD X rarr Y is true or not we know that X rarr Y holds if Y is in X+

For question 2 a set of attributes X is a candidate key of R or not X is a superkey of R if X+ is all attributes of R

For question 3 how to computing F+ we can compute X+ for all possible subsets X of R

Computing the Attribute Closure

The algorithm is as follows

Given a set of attributes X

Let X+ = X

Repeat Find a FD in F whose left side is a subset of X+ Add the right side of F to X+

Until (X+ does not change or X+ contains all attributes of R)

After the algorithm completes you have a set of attributes X+ that can be functionally determined by X This allows you to produce FDs of the form

X rarr A where A is in X+

Computing Attribute Closure Example

R (ABCDEG)

F = A rarr BC C rarr D D rarr G

Compute A+

A+ = A (initial step)

A+ = ABC (by using FD A rarr BC)

A+ = ABCD (by using FD C rarr D)

A+ = ABCDG (by using FD D rarr G)

Similarly we can compute C+ and EG+

C+ = CDG

EG+ = EG

Using Attribute Closure to Detect a FD

The attribute closure algorithm can be used to determine if a particular FD X rarr Y holds based on the set of given FDs F Compute X+ and see if Y is in X+

Example

R = (ABCDEG)

F = A rarr BC C rarr D D rarr G

Does the FD CD rarr G hold Compute CD+ = CDG Yes the FD CD rarr G holds

Does the FD BC rarr E hold Compute BC+ = BCDG No the FD BC rarr E does n

ot hold

Using Attribute Closure to Find all FDs

The attribute closure algorithm can be used to find all FDs of a given relation R if perform the attribute closure algorithm for all subsets X of the attributes R

This is an exponential algorithm in the number of attributes in R Although there are a few optimizations Do not compute the closure of the empty set or of the set of all attributes If we find X+ = all attributes do not compute the closure of any supersets of X Note that even though you do not have to compute the closure (because you know the answer) you still must use the known closure to infer new FDs

Using Attribute Closureto Find all FDs Example

Compute F+ from F where

R (ABCD)

F = A rarr BC C rarr D

Must compute all these attribute closures

A+ B+ C+ D+

AB+ AC+ AD+ BC+ BD+ CD+

ABC+ ABD+ ACD+ BCD+

Do not have to compute ABCD+

Start with the single sets

A+ = ABCD B+ = B C+ = CD D+ = D

New FD A rarr D

Using Attribute Closureto Find all FDs Example (cont)

Next the two element sets Do not compute AB+ AC+ AD+ because A+ contained all attributes of R New FDs AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C BC+ = BCD BD+ = BD CD+ = CD New FD BC rarr DThree element sets Do not compute ABC+ ABD+ ACD+ New FDs ABC rarr D ABD rarr C ACD rarr B BCD+ = BCD No new FDsAll FDs F+ = A rarr BC C rarr D A rarr D BC rarr D AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C ABC rarr D ABD rarrC ACD rarr B

FD Closure Questions

1) Compute all nontrivial FDs and list all keys of R

R = (ABCD)

F = AB rarr C C rarr D D rarr A

2) Compute all nontrivial FDs and list all candidate keys of R

R = (ABCD)

F = AB rarr C BC rarr D CD rarr A AD rarr B

Equivalent Sets of FDs

A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+ We can determine if F covers E by calculating X+ with respect to F for each FD X rarr Y in E and checking whether X+ includes Y

Two sets of FDs E and F are equivalent if E covers F and F covers E That is E+= F+

Minimal Set of FDs

A set of FDs F is minimal if it satisfies these conditions Every FD in F has a single attribute on its right-hand side We cannot replace any FD X rarr A in F with Y rarr A where Y

X and still have a set of dependencies that is equivalent to F

We cannot remove any FD and still have a set that is equivalent to F

A minimal sets of FDs is like a canonical( 规范的 ) form of FDs

Note that there may be many minimal covers for a given set of functional dependencies F

Normal Forms

A relation is in a particular normal form if it satisfies certain

normalization properties

There are several normal forms defined 1NF - First Normal Form 2NF - Second Normal Form 3NF - Third Normal Form BCNF - Boyce-Codd Normal Form 4NF - Fourth Normal Form 5NF - Fifth Normal Form

Each of these normal forms are stricter than the next

For example 3NF is better than 2NF because it removes more redundancyanomalies from the schema than 2NF

Normal Forms

All relations (non-normalized)

1NF (First Normal Form)

2NF (Second Normal Form)

3NF (Third Normal Form)

BCNF (Boyce-Codd NF)

4NF (Fourth NF)

5NF (Fifth NF)

First Normal Form (1NF)

A relation is in first normal form (1NF) if all its attribute values are atomic

That is a 1NF relation cannot have an attribute value that is a set of values (multi-valued attribute) a set of tuples (nested relation)

1NF is a standard assumption in relational DBMSs However object-oriented DBMSs and nested relational DBMSs relax t

his constraint

A relation that is not in 1NF is an unnormalized relation

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1P2

AnalystAnalyst

246

E3 A Lee P3P4

ConsultantEngineer

1048

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1 P2 Analyst Analyst 24 6

E3 A Lee P3P4 Consultant Engineer 10 48

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

A non-1NF Relation

One way to convert a non-1NF relation to a 1NF relation

1) Splitting Method - Divide the existing relation into two relations one with non-repeating attributes and the other with repeating attributes

Make a new relation consisting of the primary key of the original relation and the repeating attributes Determine a primary key for this new relation

Remove the repeating attributes from the original relation

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1P2

AnalystAnalyst

246

E3 A Lee P3P4

ConsultantEngineer

1048

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

ENo EName

E1 J Doe

E2 M Smith

E3 A Lee

E4 J Miller

E5 B Casey

E6 L Chu

E7 R Davis

ENo PNo Resp DurationE1 P1 Manager 12E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36

Second Normal Form (2NF)

A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key

If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation

Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key

Second Normal Form (2NF) Example

fd1 and fd4 are partial functional dependencies Normalize to

Emp (eno ename title bdate salary supereno dno)

WorksOn (eno pno resp hours)

Proj (pno pname budget)

ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration

EmpProj relation

fd1 fd2

fd3fd4

Second Normal Form (2NF) Example (2)

ENo EName

BDate Title Salary SuperNo

DNo

Emp relation

fd1

fd2

fd3fd4

PNo PName Budget

Proj relationENo PNo Resp Duration

WorkOn relation

Third Normal Form (3NF)

Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th

an 2 FDs

A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key

Converting a relation to 3NF from 2NF involves the removal of transitive dependencies

Third Normal Form (3NF) Example

fd2 results in a transitive dependency eno rarr salary Remove it

ENo EName BDate Title Salary SuperNo DNo

Emp relation

fd1

fd2

fd2

Title Salary

Pay relation

ENo EName BDate Title SuperNo DNo

Emp relation

fd1

General Definitions of 2NF and 3NF

We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)

General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key

General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key

Note that a prime attribute is an attribute that is in any key (candidate or primary)

General Definition of 3NF Example

The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute

However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key

Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key

ENo EName BDate Title Salary SuperNo DNo SSN

Emp relation

fd1

fd2

Normalization Question

Consider the universal relation

R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ

List the keys for R

Decompose R into 2NF and then 3NF relations

Boyce-Codd Normal Form (BCNF)

A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key

To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key

The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

ClientInterview relation

Relation is in 3NF but not BCNF

fd4

fd3

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

Note that we lose the FD fd2

fd4

fd3

ClientNo InterviewDate

InterviwTime

StaffNo StaffNo InterviwDate

RoomNo

Interview relation StaffRoom relation

BCNF versus 3NF

We can decompose to BCNF but sometimes we do not want to if we lose a FD

The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency

Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)

In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)

Conversion to BCNF

There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y

Note that if any FD in F+ violates BCNF then a FD in F violates BCNF

2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey

3) Replace R by relations with schemas R1 = X+

R2 = (R ndash X+) U X

4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib

utes are all in R1 or R2

Normalization Question

Given this database schema normalize into BCNF

fd1

PID Country Lot Area Price TaxRate

fd2

fd4fd3

Multi-Valued Dependencies

A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema

A MVD occurs when two independent 1N relationships are in the relational schema

Multi-Valued Dependencies Example

Eno ename Title Salary Pno dno

E1 J Doe EE 30000 P1P2

E2 M Smith SA 50000 P1P4 D1D2D3

E3 A Lee ME 40000 P1P3 D2D3

E4 J Miller PR 20000 P3 D3

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1

E1 J Doe EE 30000 P2

E2 M Smith SA 50000 P1 D1

E2 M Smith SA 50000 P1 D2

E2 M Smith SA 50000 P1 D3

E2 M Smith SA 50000 P4 D1

E2 M Smith SA 50000 P4 D2

E2 M Smith SA 50000 P4 D3

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P3 D3

E3 A Lee ME 40000 P3 D3

E4 J Miller PR 20000 P3 D3

Multi-Valued Dependencies (MVDs)

A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C

A trivial MVD A mdashgtgt B occurs when either B is a subset of A or

A cup B = R

Fourth Normal Form (4NF)

Fourth normal form (4NF) is based on the idea of multi-valued dependencies

A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies

Fourth Normal Form (4NF) Example

eno mdashgtgtpno

eno mdashgtgtdno

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3

eno ename title salary

E1 J Doe EE 30000

E2 MSmith SA 50000

E3 A Lee ME 40000

E4 J Miller PR 20000

eno pno

E1 P1

E1 P2

E2 P1

E2 P4

E3 P1

E3 P3

E4 P3

eno dno

E2 D1

E2 D2

E2 D3

E3 D2

E3 D3

E4 D3

Lossless-join Dependency

The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation

A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined

Fifth Normal Form (5NF)

bull Fifth normal form (5NF) is based on join dependencies

A relation is in fifth normal form (5NF) if the relation has no join dependency

bull 5NF is also called project-join normal form (PJNF)

A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R

The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have

ΠR1(r) ΠR2(r) hellip ΠRn(r) = r

Fifth Normal Form (5NF) Example

Consider a relation Supply (sname partName projName)

1048715Add the additional constraint that

If project j requires part p

and supplier s supplies part p

and supplier s supplies at least one item to project j

Then supplier s also supplies part p to project j

Original Supply Relation Supply Relation with new constraintsname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

Fifth Normal Form (5NF) Example

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

sname partName

Smith Bolt

Smith Nut

Adam Bolt

Walton Nut

Adam Nail

sname projName

Smith X

Smith Y

Adam Y

Walton Z

Adam X

partName projName

Bolt X

Nut Y

Bolt Y

Nut Z

Nail X

Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples

Desirable Properties in normalization

Relational schemas that are well-designed have several

important properties

1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or

relationship

2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins

3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations

Normal Forms in Practice

Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used

For example query execution time may increase because of normalization as more joins become necessary to answer queries

Normal Forms in Practice Example

For example street and city uniquely determine a zipcode

In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed

When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly

fd1

ENo City Street ZipCode

fd2

ENo ZipCode

ZipCode City Street

Normal Forms in Practice Summary

In summary normalization is typically used in two ways

To improve on relational schemas after ER design

To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies

There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance

This is the classic time versus space tradeoff

Conclusion

Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies

Normal forms indicate when relations satisfy certain properties

1NF - All attributes are atomic

2NF - All attributes are fully functionally dependent on a key

3NF - There are no transitive dependencies in the relation

4NF - There are no multi-valued dependencies in the relation

5NF - The relation has no join dependency

In practice normalization is used to improve schemas produced after ER design and existing relational schemas

It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time

Objectives

Explain process of normalization and why it is beneficial

Describe the concept of the Universal Relation

What are the motivations for normalization

What are the 3 types of update anomalies Be able to give examples

Describe the lossless-join property and dependency preservation property

Define and explain the concept of a functional dependency

What is a trivial FD

Describe the relationship between FDs and keys

What are Armstrongs Axioms When are they used

What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs

Objectives (2)

Define minimal and covers for a set of FDs

Define normal forms be able to list the normal forms and draw a diagram showing their relationship

Define 1NF Know how to convert non-1NF relation to 1NF

Define 2NF Convert 1NF relation to 2NF

Define 3NF Convert 2NF relation to 3NF

Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example

Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm

Describe multi-valued dependencies When do they occur

What is a trivial MVD

Define 4NF Convert 3NFBCNF relation to 4NF

Objectives (3)

Describe lossless-join property and how it relates to 5NF

Define 5NF

Explain normalization in terms of the time versus space tradeoff

Explain the relationship between normalization and ER modeling

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
Page 28: Software School of Hunan University 2006.08 Database Systems Design Part III Normalization

Functional Dependencies and Keys

ENo rarr EName BDate Title SuperNo DNoDNo rarr DName MgreNotitle rarr salary

Question List the candidate key(s) of this relation

ENo EName BDate Title Salary SuperNo DNo

DName MgrNo

E1 J Doe 01-05-75 EE 30000 E2 null null null

E2 MSmith 06-04-66 SA 50000 E5 D3 Accounting E5

E3 A Lee 07-05-66 ME 40000 E7 D2 Consulting E7

E4 J Miller 09-01-50 PR 20000 E6 D3 Accounting E5

E5 B Casey 12-25-71 SA 50000 E8 D3 Accounting E5

E6 L Chu 11-30-65 EE 30000 E7 D2 Consulting E7

E7 R Davis 09-08-77 ME 40000 E8 D1 Management E8

E8 J Jones 10-11-72 SA 50000 null D1 Management E8

EmpDept relation

Applications of FDs

1) given a set of FDs F in relation R FD X rarr Y is true or not

2) given a set of FDs F in relation R a set of attributes X is a candidate key of R or not

3) given a set of FDs F in relation R which candidate keys are in R

4) given a set of FDs F in relation R some FDs in F is Excessive or not

Rules for reasoning about functional dependencies

Armstrongs Axioms are three inference rules for functional dependenciesLet A B C and D be subsets of the attributes of relation R

IR1 Reflexive Rule( 自反性 ) If AB then A rarr B

A set of attributes always trivially functionally determines any subset of itself

IR2 Augmentation Rule( 增加性 ) If A rarr B then AC rarr BC

Adding a set of attributes to both the LHS and RHS results in another valid functional dependency

IR3 Transitive Rule( 传递性 ) If A rarr B and B rarr C then A rarr C

Transitivity holds with FDs

Derived Rules from Armstrongs Axioms

There are three other well-known rules that can be derived from the basic three

IR4 Decomposition( 可分解性 ) If A rarr BC then A rarr B and A rarr C A FD with two attributes on RHS can be decomposed into two FDs each with on attribute on the RHS

IR5 Union( 合并性 ) If A rarr B and A rarr C then A rarr BC Reverse of decomposition rule

IR6 Composition( 组合性 ) If A rarr B and C rarr D then AC rarr BD More general than union - can combine two non-overlapping FDs into one

Characteristics of Armstrongs Axioms

The rules are sound( 健壮的 ) and complete( 完备的 )

1) Sound The rules do not infer( 推断 ) any incorrect FDs

2) Complete The rules can be used to infer all correct FDs (no FDs are missing)

The axioms (called IR1 IR2 and IR3) can be used to determine a set of all FDs F+ from an initial set of FDs F

F+ is referred to as the closure of F

Strategy of dealing with questions

The basic idea is that given any set of attributes X we can compute the set of all attributes X+ that can be functionally determined using F This is called the closure of X under F

For question 1 given a set of FDs F in R FD X rarr Y is true or not we know that X rarr Y holds if Y is in X+

For question 2 a set of attributes X is a candidate key of R or not X is a superkey of R if X+ is all attributes of R

For question 3 how to computing F+ we can compute X+ for all possible subsets X of R

Computing the Attribute Closure

The algorithm is as follows

Given a set of attributes X

Let X+ = X

Repeat Find a FD in F whose left side is a subset of X+ Add the right side of F to X+

Until (X+ does not change or X+ contains all attributes of R)

After the algorithm completes you have a set of attributes X+ that can be functionally determined by X This allows you to produce FDs of the form

X rarr A where A is in X+

Computing Attribute Closure Example

R (ABCDEG)

F = A rarr BC C rarr D D rarr G

Compute A+

A+ = A (initial step)

A+ = ABC (by using FD A rarr BC)

A+ = ABCD (by using FD C rarr D)

A+ = ABCDG (by using FD D rarr G)

Similarly we can compute C+ and EG+

C+ = CDG

EG+ = EG

Using Attribute Closure to Detect a FD

The attribute closure algorithm can be used to determine if a particular FD X rarr Y holds based on the set of given FDs F Compute X+ and see if Y is in X+

Example

R = (ABCDEG)

F = A rarr BC C rarr D D rarr G

Does the FD CD rarr G hold Compute CD+ = CDG Yes the FD CD rarr G holds

Does the FD BC rarr E hold Compute BC+ = BCDG No the FD BC rarr E does n

ot hold

Using Attribute Closure to Find all FDs

The attribute closure algorithm can be used to find all FDs of a given relation R if perform the attribute closure algorithm for all subsets X of the attributes R

This is an exponential algorithm in the number of attributes in R Although there are a few optimizations Do not compute the closure of the empty set or of the set of all attributes If we find X+ = all attributes do not compute the closure of any supersets of X Note that even though you do not have to compute the closure (because you know the answer) you still must use the known closure to infer new FDs

Using Attribute Closureto Find all FDs Example

Compute F+ from F where

R (ABCD)

F = A rarr BC C rarr D

Must compute all these attribute closures

A+ B+ C+ D+

AB+ AC+ AD+ BC+ BD+ CD+

ABC+ ABD+ ACD+ BCD+

Do not have to compute ABCD+

Start with the single sets

A+ = ABCD B+ = B C+ = CD D+ = D

New FD A rarr D

Using Attribute Closureto Find all FDs Example (cont)

Next the two element sets Do not compute AB+ AC+ AD+ because A+ contained all attributes of R New FDs AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C BC+ = BCD BD+ = BD CD+ = CD New FD BC rarr DThree element sets Do not compute ABC+ ABD+ ACD+ New FDs ABC rarr D ABD rarr C ACD rarr B BCD+ = BCD No new FDsAll FDs F+ = A rarr BC C rarr D A rarr D BC rarr D AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C ABC rarr D ABD rarrC ACD rarr B

FD Closure Questions

1) Compute all nontrivial FDs and list all keys of R

R = (ABCD)

F = AB rarr C C rarr D D rarr A

2) Compute all nontrivial FDs and list all candidate keys of R

R = (ABCD)

F = AB rarr C BC rarr D CD rarr A AD rarr B

Equivalent Sets of FDs

A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+ We can determine if F covers E by calculating X+ with respect to F for each FD X rarr Y in E and checking whether X+ includes Y

Two sets of FDs E and F are equivalent if E covers F and F covers E That is E+= F+

Minimal Set of FDs

A set of FDs F is minimal if it satisfies these conditions Every FD in F has a single attribute on its right-hand side We cannot replace any FD X rarr A in F with Y rarr A where Y

X and still have a set of dependencies that is equivalent to F

We cannot remove any FD and still have a set that is equivalent to F

A minimal sets of FDs is like a canonical( 规范的 ) form of FDs

Note that there may be many minimal covers for a given set of functional dependencies F

Normal Forms

A relation is in a particular normal form if it satisfies certain

normalization properties

There are several normal forms defined 1NF - First Normal Form 2NF - Second Normal Form 3NF - Third Normal Form BCNF - Boyce-Codd Normal Form 4NF - Fourth Normal Form 5NF - Fifth Normal Form

Each of these normal forms are stricter than the next

For example 3NF is better than 2NF because it removes more redundancyanomalies from the schema than 2NF

Normal Forms

All relations (non-normalized)

1NF (First Normal Form)

2NF (Second Normal Form)

3NF (Third Normal Form)

BCNF (Boyce-Codd NF)

4NF (Fourth NF)

5NF (Fifth NF)

First Normal Form (1NF)

A relation is in first normal form (1NF) if all its attribute values are atomic

That is a 1NF relation cannot have an attribute value that is a set of values (multi-valued attribute) a set of tuples (nested relation)

1NF is a standard assumption in relational DBMSs However object-oriented DBMSs and nested relational DBMSs relax t

his constraint

A relation that is not in 1NF is an unnormalized relation

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1P2

AnalystAnalyst

246

E3 A Lee P3P4

ConsultantEngineer

1048

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1 P2 Analyst Analyst 24 6

E3 A Lee P3P4 Consultant Engineer 10 48

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

A non-1NF Relation

One way to convert a non-1NF relation to a 1NF relation

1) Splitting Method - Divide the existing relation into two relations one with non-repeating attributes and the other with repeating attributes

Make a new relation consisting of the primary key of the original relation and the repeating attributes Determine a primary key for this new relation

Remove the repeating attributes from the original relation

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1P2

AnalystAnalyst

246

E3 A Lee P3P4

ConsultantEngineer

1048

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

ENo EName

E1 J Doe

E2 M Smith

E3 A Lee

E4 J Miller

E5 B Casey

E6 L Chu

E7 R Davis

ENo PNo Resp DurationE1 P1 Manager 12E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36

Second Normal Form (2NF)

A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key

If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation

Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key

Second Normal Form (2NF) Example

fd1 and fd4 are partial functional dependencies Normalize to

Emp (eno ename title bdate salary supereno dno)

WorksOn (eno pno resp hours)

Proj (pno pname budget)

ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration

EmpProj relation

fd1 fd2

fd3fd4

Second Normal Form (2NF) Example (2)

ENo EName

BDate Title Salary SuperNo

DNo

Emp relation

fd1

fd2

fd3fd4

PNo PName Budget

Proj relationENo PNo Resp Duration

WorkOn relation

Third Normal Form (3NF)

Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th

an 2 FDs

A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key

Converting a relation to 3NF from 2NF involves the removal of transitive dependencies

Third Normal Form (3NF) Example

fd2 results in a transitive dependency eno rarr salary Remove it

ENo EName BDate Title Salary SuperNo DNo

Emp relation

fd1

fd2

fd2

Title Salary

Pay relation

ENo EName BDate Title SuperNo DNo

Emp relation

fd1

General Definitions of 2NF and 3NF

We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)

General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key

General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key

Note that a prime attribute is an attribute that is in any key (candidate or primary)

General Definition of 3NF Example

The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute

However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key

Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key

ENo EName BDate Title Salary SuperNo DNo SSN

Emp relation

fd1

fd2

Normalization Question

Consider the universal relation

R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ

List the keys for R

Decompose R into 2NF and then 3NF relations

Boyce-Codd Normal Form (BCNF)

A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key

To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key

The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

ClientInterview relation

Relation is in 3NF but not BCNF

fd4

fd3

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

Note that we lose the FD fd2

fd4

fd3

ClientNo InterviewDate

InterviwTime

StaffNo StaffNo InterviwDate

RoomNo

Interview relation StaffRoom relation

BCNF versus 3NF

We can decompose to BCNF but sometimes we do not want to if we lose a FD

The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency

Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)

In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)

Conversion to BCNF

There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y

Note that if any FD in F+ violates BCNF then a FD in F violates BCNF

2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey

3) Replace R by relations with schemas R1 = X+

R2 = (R ndash X+) U X

4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib

utes are all in R1 or R2

Normalization Question

Given this database schema normalize into BCNF

fd1

PID Country Lot Area Price TaxRate

fd2

fd4fd3

Multi-Valued Dependencies

A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema

A MVD occurs when two independent 1N relationships are in the relational schema

Multi-Valued Dependencies Example

Eno ename Title Salary Pno dno

E1 J Doe EE 30000 P1P2

E2 M Smith SA 50000 P1P4 D1D2D3

E3 A Lee ME 40000 P1P3 D2D3

E4 J Miller PR 20000 P3 D3

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1

E1 J Doe EE 30000 P2

E2 M Smith SA 50000 P1 D1

E2 M Smith SA 50000 P1 D2

E2 M Smith SA 50000 P1 D3

E2 M Smith SA 50000 P4 D1

E2 M Smith SA 50000 P4 D2

E2 M Smith SA 50000 P4 D3

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P3 D3

E3 A Lee ME 40000 P3 D3

E4 J Miller PR 20000 P3 D3

Multi-Valued Dependencies (MVDs)

A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C

A trivial MVD A mdashgtgt B occurs when either B is a subset of A or

A cup B = R

Fourth Normal Form (4NF)

Fourth normal form (4NF) is based on the idea of multi-valued dependencies

A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies

Fourth Normal Form (4NF) Example

eno mdashgtgtpno

eno mdashgtgtdno

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3

eno ename title salary

E1 J Doe EE 30000

E2 MSmith SA 50000

E3 A Lee ME 40000

E4 J Miller PR 20000

eno pno

E1 P1

E1 P2

E2 P1

E2 P4

E3 P1

E3 P3

E4 P3

eno dno

E2 D1

E2 D2

E2 D3

E3 D2

E3 D3

E4 D3

Lossless-join Dependency

The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation

A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined

Fifth Normal Form (5NF)

bull Fifth normal form (5NF) is based on join dependencies

A relation is in fifth normal form (5NF) if the relation has no join dependency

bull 5NF is also called project-join normal form (PJNF)

A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R

The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have

ΠR1(r) ΠR2(r) hellip ΠRn(r) = r

Fifth Normal Form (5NF) Example

Consider a relation Supply (sname partName projName)

1048715Add the additional constraint that

If project j requires part p

and supplier s supplies part p

and supplier s supplies at least one item to project j

Then supplier s also supplies part p to project j

Original Supply Relation Supply Relation with new constraintsname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

Fifth Normal Form (5NF) Example

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

sname partName

Smith Bolt

Smith Nut

Adam Bolt

Walton Nut

Adam Nail

sname projName

Smith X

Smith Y

Adam Y

Walton Z

Adam X

partName projName

Bolt X

Nut Y

Bolt Y

Nut Z

Nail X

Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples

Desirable Properties in normalization

Relational schemas that are well-designed have several

important properties

1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or

relationship

2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins

3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations

Normal Forms in Practice

Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used

For example query execution time may increase because of normalization as more joins become necessary to answer queries

Normal Forms in Practice Example

For example street and city uniquely determine a zipcode

In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed

When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly

fd1

ENo City Street ZipCode

fd2

ENo ZipCode

ZipCode City Street

Normal Forms in Practice Summary

In summary normalization is typically used in two ways

To improve on relational schemas after ER design

To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies

There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance

This is the classic time versus space tradeoff

Conclusion

Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies

Normal forms indicate when relations satisfy certain properties

1NF - All attributes are atomic

2NF - All attributes are fully functionally dependent on a key

3NF - There are no transitive dependencies in the relation

4NF - There are no multi-valued dependencies in the relation

5NF - The relation has no join dependency

In practice normalization is used to improve schemas produced after ER design and existing relational schemas

It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time

Objectives

Explain process of normalization and why it is beneficial

Describe the concept of the Universal Relation

What are the motivations for normalization

What are the 3 types of update anomalies Be able to give examples

Describe the lossless-join property and dependency preservation property

Define and explain the concept of a functional dependency

What is a trivial FD

Describe the relationship between FDs and keys

What are Armstrongs Axioms When are they used

What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs

Objectives (2)

Define minimal and covers for a set of FDs

Define normal forms be able to list the normal forms and draw a diagram showing their relationship

Define 1NF Know how to convert non-1NF relation to 1NF

Define 2NF Convert 1NF relation to 2NF

Define 3NF Convert 2NF relation to 3NF

Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example

Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm

Describe multi-valued dependencies When do they occur

What is a trivial MVD

Define 4NF Convert 3NFBCNF relation to 4NF

Objectives (3)

Describe lossless-join property and how it relates to 5NF

Define 5NF

Explain normalization in terms of the time versus space tradeoff

Explain the relationship between normalization and ER modeling

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
Page 29: Software School of Hunan University 2006.08 Database Systems Design Part III Normalization

Applications of FDs

1) given a set of FDs F in relation R FD X rarr Y is true or not

2) given a set of FDs F in relation R a set of attributes X is a candidate key of R or not

3) given a set of FDs F in relation R which candidate keys are in R

4) given a set of FDs F in relation R some FDs in F is Excessive or not

Rules for reasoning about functional dependencies

Armstrongs Axioms are three inference rules for functional dependenciesLet A B C and D be subsets of the attributes of relation R

IR1 Reflexive Rule( 自反性 ) If AB then A rarr B

A set of attributes always trivially functionally determines any subset of itself

IR2 Augmentation Rule( 增加性 ) If A rarr B then AC rarr BC

Adding a set of attributes to both the LHS and RHS results in another valid functional dependency

IR3 Transitive Rule( 传递性 ) If A rarr B and B rarr C then A rarr C

Transitivity holds with FDs

Derived Rules from Armstrongs Axioms

There are three other well-known rules that can be derived from the basic three

IR4 Decomposition( 可分解性 ) If A rarr BC then A rarr B and A rarr C A FD with two attributes on RHS can be decomposed into two FDs each with on attribute on the RHS

IR5 Union( 合并性 ) If A rarr B and A rarr C then A rarr BC Reverse of decomposition rule

IR6 Composition( 组合性 ) If A rarr B and C rarr D then AC rarr BD More general than union - can combine two non-overlapping FDs into one

Characteristics of Armstrongs Axioms

The rules are sound( 健壮的 ) and complete( 完备的 )

1) Sound The rules do not infer( 推断 ) any incorrect FDs

2) Complete The rules can be used to infer all correct FDs (no FDs are missing)

The axioms (called IR1 IR2 and IR3) can be used to determine a set of all FDs F+ from an initial set of FDs F

F+ is referred to as the closure of F

Strategy of dealing with questions

The basic idea is that given any set of attributes X we can compute the set of all attributes X+ that can be functionally determined using F This is called the closure of X under F

For question 1 given a set of FDs F in R FD X rarr Y is true or not we know that X rarr Y holds if Y is in X+

For question 2 a set of attributes X is a candidate key of R or not X is a superkey of R if X+ is all attributes of R

For question 3 how to computing F+ we can compute X+ for all possible subsets X of R

Computing the Attribute Closure

The algorithm is as follows

Given a set of attributes X

Let X+ = X

Repeat Find a FD in F whose left side is a subset of X+ Add the right side of F to X+

Until (X+ does not change or X+ contains all attributes of R)

After the algorithm completes you have a set of attributes X+ that can be functionally determined by X This allows you to produce FDs of the form

X rarr A where A is in X+

Computing Attribute Closure Example

R (ABCDEG)

F = A rarr BC C rarr D D rarr G

Compute A+

A+ = A (initial step)

A+ = ABC (by using FD A rarr BC)

A+ = ABCD (by using FD C rarr D)

A+ = ABCDG (by using FD D rarr G)

Similarly we can compute C+ and EG+

C+ = CDG

EG+ = EG

Using Attribute Closure to Detect a FD

The attribute closure algorithm can be used to determine if a particular FD X rarr Y holds based on the set of given FDs F Compute X+ and see if Y is in X+

Example

R = (ABCDEG)

F = A rarr BC C rarr D D rarr G

Does the FD CD rarr G hold Compute CD+ = CDG Yes the FD CD rarr G holds

Does the FD BC rarr E hold Compute BC+ = BCDG No the FD BC rarr E does n

ot hold

Using Attribute Closure to Find all FDs

The attribute closure algorithm can be used to find all FDs of a given relation R if perform the attribute closure algorithm for all subsets X of the attributes R

This is an exponential algorithm in the number of attributes in R Although there are a few optimizations Do not compute the closure of the empty set or of the set of all attributes If we find X+ = all attributes do not compute the closure of any supersets of X Note that even though you do not have to compute the closure (because you know the answer) you still must use the known closure to infer new FDs

Using Attribute Closureto Find all FDs Example

Compute F+ from F where

R (ABCD)

F = A rarr BC C rarr D

Must compute all these attribute closures

A+ B+ C+ D+

AB+ AC+ AD+ BC+ BD+ CD+

ABC+ ABD+ ACD+ BCD+

Do not have to compute ABCD+

Start with the single sets

A+ = ABCD B+ = B C+ = CD D+ = D

New FD A rarr D

Using Attribute Closureto Find all FDs Example (cont)

Next the two element sets Do not compute AB+ AC+ AD+ because A+ contained all attributes of R New FDs AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C BC+ = BCD BD+ = BD CD+ = CD New FD BC rarr DThree element sets Do not compute ABC+ ABD+ ACD+ New FDs ABC rarr D ABD rarr C ACD rarr B BCD+ = BCD No new FDsAll FDs F+ = A rarr BC C rarr D A rarr D BC rarr D AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C ABC rarr D ABD rarrC ACD rarr B

FD Closure Questions

1) Compute all nontrivial FDs and list all keys of R

R = (ABCD)

F = AB rarr C C rarr D D rarr A

2) Compute all nontrivial FDs and list all candidate keys of R

R = (ABCD)

F = AB rarr C BC rarr D CD rarr A AD rarr B

Equivalent Sets of FDs

A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+ We can determine if F covers E by calculating X+ with respect to F for each FD X rarr Y in E and checking whether X+ includes Y

Two sets of FDs E and F are equivalent if E covers F and F covers E That is E+= F+

Minimal Set of FDs

A set of FDs F is minimal if it satisfies these conditions Every FD in F has a single attribute on its right-hand side We cannot replace any FD X rarr A in F with Y rarr A where Y

X and still have a set of dependencies that is equivalent to F

We cannot remove any FD and still have a set that is equivalent to F

A minimal sets of FDs is like a canonical( 规范的 ) form of FDs

Note that there may be many minimal covers for a given set of functional dependencies F

Normal Forms

A relation is in a particular normal form if it satisfies certain

normalization properties

There are several normal forms defined 1NF - First Normal Form 2NF - Second Normal Form 3NF - Third Normal Form BCNF - Boyce-Codd Normal Form 4NF - Fourth Normal Form 5NF - Fifth Normal Form

Each of these normal forms are stricter than the next

For example 3NF is better than 2NF because it removes more redundancyanomalies from the schema than 2NF

Normal Forms

All relations (non-normalized)

1NF (First Normal Form)

2NF (Second Normal Form)

3NF (Third Normal Form)

BCNF (Boyce-Codd NF)

4NF (Fourth NF)

5NF (Fifth NF)

First Normal Form (1NF)

A relation is in first normal form (1NF) if all its attribute values are atomic

That is a 1NF relation cannot have an attribute value that is a set of values (multi-valued attribute) a set of tuples (nested relation)

1NF is a standard assumption in relational DBMSs However object-oriented DBMSs and nested relational DBMSs relax t

his constraint

A relation that is not in 1NF is an unnormalized relation

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1P2

AnalystAnalyst

246

E3 A Lee P3P4

ConsultantEngineer

1048

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1 P2 Analyst Analyst 24 6

E3 A Lee P3P4 Consultant Engineer 10 48

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

A non-1NF Relation

One way to convert a non-1NF relation to a 1NF relation

1) Splitting Method - Divide the existing relation into two relations one with non-repeating attributes and the other with repeating attributes

Make a new relation consisting of the primary key of the original relation and the repeating attributes Determine a primary key for this new relation

Remove the repeating attributes from the original relation

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1P2

AnalystAnalyst

246

E3 A Lee P3P4

ConsultantEngineer

1048

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

ENo EName

E1 J Doe

E2 M Smith

E3 A Lee

E4 J Miller

E5 B Casey

E6 L Chu

E7 R Davis

ENo PNo Resp DurationE1 P1 Manager 12E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36

Second Normal Form (2NF)

A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key

If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation

Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key

Second Normal Form (2NF) Example

fd1 and fd4 are partial functional dependencies Normalize to

Emp (eno ename title bdate salary supereno dno)

WorksOn (eno pno resp hours)

Proj (pno pname budget)

ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration

EmpProj relation

fd1 fd2

fd3fd4

Second Normal Form (2NF) Example (2)

ENo EName

BDate Title Salary SuperNo

DNo

Emp relation

fd1

fd2

fd3fd4

PNo PName Budget

Proj relationENo PNo Resp Duration

WorkOn relation

Third Normal Form (3NF)

Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th

an 2 FDs

A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key

Converting a relation to 3NF from 2NF involves the removal of transitive dependencies

Third Normal Form (3NF) Example

fd2 results in a transitive dependency eno rarr salary Remove it

ENo EName BDate Title Salary SuperNo DNo

Emp relation

fd1

fd2

fd2

Title Salary

Pay relation

ENo EName BDate Title SuperNo DNo

Emp relation

fd1

General Definitions of 2NF and 3NF

We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)

General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key

General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key

Note that a prime attribute is an attribute that is in any key (candidate or primary)

General Definition of 3NF Example

The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute

However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key

Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key

ENo EName BDate Title Salary SuperNo DNo SSN

Emp relation

fd1

fd2

Normalization Question

Consider the universal relation

R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ

List the keys for R

Decompose R into 2NF and then 3NF relations

Boyce-Codd Normal Form (BCNF)

A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key

To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key

The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

ClientInterview relation

Relation is in 3NF but not BCNF

fd4

fd3

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

Note that we lose the FD fd2

fd4

fd3

ClientNo InterviewDate

InterviwTime

StaffNo StaffNo InterviwDate

RoomNo

Interview relation StaffRoom relation

BCNF versus 3NF

We can decompose to BCNF but sometimes we do not want to if we lose a FD

The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency

Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)

In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)

Conversion to BCNF

There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y

Note that if any FD in F+ violates BCNF then a FD in F violates BCNF

2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey

3) Replace R by relations with schemas R1 = X+

R2 = (R ndash X+) U X

4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib

utes are all in R1 or R2

Normalization Question

Given this database schema normalize into BCNF

fd1

PID Country Lot Area Price TaxRate

fd2

fd4fd3

Multi-Valued Dependencies

A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema

A MVD occurs when two independent 1N relationships are in the relational schema

Multi-Valued Dependencies Example

Eno ename Title Salary Pno dno

E1 J Doe EE 30000 P1P2

E2 M Smith SA 50000 P1P4 D1D2D3

E3 A Lee ME 40000 P1P3 D2D3

E4 J Miller PR 20000 P3 D3

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1

E1 J Doe EE 30000 P2

E2 M Smith SA 50000 P1 D1

E2 M Smith SA 50000 P1 D2

E2 M Smith SA 50000 P1 D3

E2 M Smith SA 50000 P4 D1

E2 M Smith SA 50000 P4 D2

E2 M Smith SA 50000 P4 D3

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P3 D3

E3 A Lee ME 40000 P3 D3

E4 J Miller PR 20000 P3 D3

Multi-Valued Dependencies (MVDs)

A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C

A trivial MVD A mdashgtgt B occurs when either B is a subset of A or

A cup B = R

Fourth Normal Form (4NF)

Fourth normal form (4NF) is based on the idea of multi-valued dependencies

A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies

Fourth Normal Form (4NF) Example

eno mdashgtgtpno

eno mdashgtgtdno

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3

eno ename title salary

E1 J Doe EE 30000

E2 MSmith SA 50000

E3 A Lee ME 40000

E4 J Miller PR 20000

eno pno

E1 P1

E1 P2

E2 P1

E2 P4

E3 P1

E3 P3

E4 P3

eno dno

E2 D1

E2 D2

E2 D3

E3 D2

E3 D3

E4 D3

Lossless-join Dependency

The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation

A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined

Fifth Normal Form (5NF)

bull Fifth normal form (5NF) is based on join dependencies

A relation is in fifth normal form (5NF) if the relation has no join dependency

bull 5NF is also called project-join normal form (PJNF)

A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R

The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have

ΠR1(r) ΠR2(r) hellip ΠRn(r) = r

Fifth Normal Form (5NF) Example

Consider a relation Supply (sname partName projName)

1048715Add the additional constraint that

If project j requires part p

and supplier s supplies part p

and supplier s supplies at least one item to project j

Then supplier s also supplies part p to project j

Original Supply Relation Supply Relation with new constraintsname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

Fifth Normal Form (5NF) Example

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

sname partName

Smith Bolt

Smith Nut

Adam Bolt

Walton Nut

Adam Nail

sname projName

Smith X

Smith Y

Adam Y

Walton Z

Adam X

partName projName

Bolt X

Nut Y

Bolt Y

Nut Z

Nail X

Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples

Desirable Properties in normalization

Relational schemas that are well-designed have several

important properties

1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or

relationship

2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins

3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations

Normal Forms in Practice

Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used

For example query execution time may increase because of normalization as more joins become necessary to answer queries

Normal Forms in Practice Example

For example street and city uniquely determine a zipcode

In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed

When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly

fd1

ENo City Street ZipCode

fd2

ENo ZipCode

ZipCode City Street

Normal Forms in Practice Summary

In summary normalization is typically used in two ways

To improve on relational schemas after ER design

To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies

There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance

This is the classic time versus space tradeoff

Conclusion

Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies

Normal forms indicate when relations satisfy certain properties

1NF - All attributes are atomic

2NF - All attributes are fully functionally dependent on a key

3NF - There are no transitive dependencies in the relation

4NF - There are no multi-valued dependencies in the relation

5NF - The relation has no join dependency

In practice normalization is used to improve schemas produced after ER design and existing relational schemas

It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time

Objectives

Explain process of normalization and why it is beneficial

Describe the concept of the Universal Relation

What are the motivations for normalization

What are the 3 types of update anomalies Be able to give examples

Describe the lossless-join property and dependency preservation property

Define and explain the concept of a functional dependency

What is a trivial FD

Describe the relationship between FDs and keys

What are Armstrongs Axioms When are they used

What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs

Objectives (2)

Define minimal and covers for a set of FDs

Define normal forms be able to list the normal forms and draw a diagram showing their relationship

Define 1NF Know how to convert non-1NF relation to 1NF

Define 2NF Convert 1NF relation to 2NF

Define 3NF Convert 2NF relation to 3NF

Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example

Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm

Describe multi-valued dependencies When do they occur

What is a trivial MVD

Define 4NF Convert 3NFBCNF relation to 4NF

Objectives (3)

Describe lossless-join property and how it relates to 5NF

Define 5NF

Explain normalization in terms of the time versus space tradeoff

Explain the relationship between normalization and ER modeling

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
Page 30: Software School of Hunan University 2006.08 Database Systems Design Part III Normalization

Rules for reasoning about functional dependencies

Armstrongs Axioms are three inference rules for functional dependenciesLet A B C and D be subsets of the attributes of relation R

IR1 Reflexive Rule( 自反性 ) If AB then A rarr B

A set of attributes always trivially functionally determines any subset of itself

IR2 Augmentation Rule( 增加性 ) If A rarr B then AC rarr BC

Adding a set of attributes to both the LHS and RHS results in another valid functional dependency

IR3 Transitive Rule( 传递性 ) If A rarr B and B rarr C then A rarr C

Transitivity holds with FDs

Derived Rules from Armstrongs Axioms

There are three other well-known rules that can be derived from the basic three

IR4 Decomposition( 可分解性 ) If A rarr BC then A rarr B and A rarr C A FD with two attributes on RHS can be decomposed into two FDs each with on attribute on the RHS

IR5 Union( 合并性 ) If A rarr B and A rarr C then A rarr BC Reverse of decomposition rule

IR6 Composition( 组合性 ) If A rarr B and C rarr D then AC rarr BD More general than union - can combine two non-overlapping FDs into one

Characteristics of Armstrongs Axioms

The rules are sound( 健壮的 ) and complete( 完备的 )

1) Sound The rules do not infer( 推断 ) any incorrect FDs

2) Complete The rules can be used to infer all correct FDs (no FDs are missing)

The axioms (called IR1 IR2 and IR3) can be used to determine a set of all FDs F+ from an initial set of FDs F

F+ is referred to as the closure of F

Strategy of dealing with questions

The basic idea is that given any set of attributes X we can compute the set of all attributes X+ that can be functionally determined using F This is called the closure of X under F

For question 1 given a set of FDs F in R FD X rarr Y is true or not we know that X rarr Y holds if Y is in X+

For question 2 a set of attributes X is a candidate key of R or not X is a superkey of R if X+ is all attributes of R

For question 3 how to computing F+ we can compute X+ for all possible subsets X of R

Computing the Attribute Closure

The algorithm is as follows

Given a set of attributes X

Let X+ = X

Repeat Find a FD in F whose left side is a subset of X+ Add the right side of F to X+

Until (X+ does not change or X+ contains all attributes of R)

After the algorithm completes you have a set of attributes X+ that can be functionally determined by X This allows you to produce FDs of the form

X rarr A where A is in X+

Computing Attribute Closure Example

R (ABCDEG)

F = A rarr BC C rarr D D rarr G

Compute A+

A+ = A (initial step)

A+ = ABC (by using FD A rarr BC)

A+ = ABCD (by using FD C rarr D)

A+ = ABCDG (by using FD D rarr G)

Similarly we can compute C+ and EG+

C+ = CDG

EG+ = EG

Using Attribute Closure to Detect a FD

The attribute closure algorithm can be used to determine if a particular FD X rarr Y holds based on the set of given FDs F Compute X+ and see if Y is in X+

Example

R = (ABCDEG)

F = A rarr BC C rarr D D rarr G

Does the FD CD rarr G hold Compute CD+ = CDG Yes the FD CD rarr G holds

Does the FD BC rarr E hold Compute BC+ = BCDG No the FD BC rarr E does n

ot hold

Using Attribute Closure to Find all FDs

The attribute closure algorithm can be used to find all FDs of a given relation R if perform the attribute closure algorithm for all subsets X of the attributes R

This is an exponential algorithm in the number of attributes in R Although there are a few optimizations Do not compute the closure of the empty set or of the set of all attributes If we find X+ = all attributes do not compute the closure of any supersets of X Note that even though you do not have to compute the closure (because you know the answer) you still must use the known closure to infer new FDs

Using Attribute Closureto Find all FDs Example

Compute F+ from F where

R (ABCD)

F = A rarr BC C rarr D

Must compute all these attribute closures

A+ B+ C+ D+

AB+ AC+ AD+ BC+ BD+ CD+

ABC+ ABD+ ACD+ BCD+

Do not have to compute ABCD+

Start with the single sets

A+ = ABCD B+ = B C+ = CD D+ = D

New FD A rarr D

Using Attribute Closureto Find all FDs Example (cont)

Next the two element sets Do not compute AB+ AC+ AD+ because A+ contained all attributes of R New FDs AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C BC+ = BCD BD+ = BD CD+ = CD New FD BC rarr DThree element sets Do not compute ABC+ ABD+ ACD+ New FDs ABC rarr D ABD rarr C ACD rarr B BCD+ = BCD No new FDsAll FDs F+ = A rarr BC C rarr D A rarr D BC rarr D AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C ABC rarr D ABD rarrC ACD rarr B

FD Closure Questions

1) Compute all nontrivial FDs and list all keys of R

R = (ABCD)

F = AB rarr C C rarr D D rarr A

2) Compute all nontrivial FDs and list all candidate keys of R

R = (ABCD)

F = AB rarr C BC rarr D CD rarr A AD rarr B

Equivalent Sets of FDs

A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+ We can determine if F covers E by calculating X+ with respect to F for each FD X rarr Y in E and checking whether X+ includes Y

Two sets of FDs E and F are equivalent if E covers F and F covers E That is E+= F+

Minimal Set of FDs

A set of FDs F is minimal if it satisfies these conditions Every FD in F has a single attribute on its right-hand side We cannot replace any FD X rarr A in F with Y rarr A where Y

X and still have a set of dependencies that is equivalent to F

We cannot remove any FD and still have a set that is equivalent to F

A minimal sets of FDs is like a canonical( 规范的 ) form of FDs

Note that there may be many minimal covers for a given set of functional dependencies F

Normal Forms

A relation is in a particular normal form if it satisfies certain

normalization properties

There are several normal forms defined 1NF - First Normal Form 2NF - Second Normal Form 3NF - Third Normal Form BCNF - Boyce-Codd Normal Form 4NF - Fourth Normal Form 5NF - Fifth Normal Form

Each of these normal forms are stricter than the next

For example 3NF is better than 2NF because it removes more redundancyanomalies from the schema than 2NF

Normal Forms

All relations (non-normalized)

1NF (First Normal Form)

2NF (Second Normal Form)

3NF (Third Normal Form)

BCNF (Boyce-Codd NF)

4NF (Fourth NF)

5NF (Fifth NF)

First Normal Form (1NF)

A relation is in first normal form (1NF) if all its attribute values are atomic

That is a 1NF relation cannot have an attribute value that is a set of values (multi-valued attribute) a set of tuples (nested relation)

1NF is a standard assumption in relational DBMSs However object-oriented DBMSs and nested relational DBMSs relax t

his constraint

A relation that is not in 1NF is an unnormalized relation

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1P2

AnalystAnalyst

246

E3 A Lee P3P4

ConsultantEngineer

1048

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1 P2 Analyst Analyst 24 6

E3 A Lee P3P4 Consultant Engineer 10 48

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

A non-1NF Relation

One way to convert a non-1NF relation to a 1NF relation

1) Splitting Method - Divide the existing relation into two relations one with non-repeating attributes and the other with repeating attributes

Make a new relation consisting of the primary key of the original relation and the repeating attributes Determine a primary key for this new relation

Remove the repeating attributes from the original relation

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1P2

AnalystAnalyst

246

E3 A Lee P3P4

ConsultantEngineer

1048

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

ENo EName

E1 J Doe

E2 M Smith

E3 A Lee

E4 J Miller

E5 B Casey

E6 L Chu

E7 R Davis

ENo PNo Resp DurationE1 P1 Manager 12E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36

Second Normal Form (2NF)

A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key

If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation

Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key

Second Normal Form (2NF) Example

fd1 and fd4 are partial functional dependencies Normalize to

Emp (eno ename title bdate salary supereno dno)

WorksOn (eno pno resp hours)

Proj (pno pname budget)

ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration

EmpProj relation

fd1 fd2

fd3fd4

Second Normal Form (2NF) Example (2)

ENo EName

BDate Title Salary SuperNo

DNo

Emp relation

fd1

fd2

fd3fd4

PNo PName Budget

Proj relationENo PNo Resp Duration

WorkOn relation

Third Normal Form (3NF)

Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th

an 2 FDs

A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key

Converting a relation to 3NF from 2NF involves the removal of transitive dependencies

Third Normal Form (3NF) Example

fd2 results in a transitive dependency eno rarr salary Remove it

ENo EName BDate Title Salary SuperNo DNo

Emp relation

fd1

fd2

fd2

Title Salary

Pay relation

ENo EName BDate Title SuperNo DNo

Emp relation

fd1

General Definitions of 2NF and 3NF

We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)

General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key

General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key

Note that a prime attribute is an attribute that is in any key (candidate or primary)

General Definition of 3NF Example

The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute

However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key

Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key

ENo EName BDate Title Salary SuperNo DNo SSN

Emp relation

fd1

fd2

Normalization Question

Consider the universal relation

R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ

List the keys for R

Decompose R into 2NF and then 3NF relations

Boyce-Codd Normal Form (BCNF)

A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key

To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key

The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

ClientInterview relation

Relation is in 3NF but not BCNF

fd4

fd3

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

Note that we lose the FD fd2

fd4

fd3

ClientNo InterviewDate

InterviwTime

StaffNo StaffNo InterviwDate

RoomNo

Interview relation StaffRoom relation

BCNF versus 3NF

We can decompose to BCNF but sometimes we do not want to if we lose a FD

The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency

Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)

In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)

Conversion to BCNF

There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y

Note that if any FD in F+ violates BCNF then a FD in F violates BCNF

2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey

3) Replace R by relations with schemas R1 = X+

R2 = (R ndash X+) U X

4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib

utes are all in R1 or R2

Normalization Question

Given this database schema normalize into BCNF

fd1

PID Country Lot Area Price TaxRate

fd2

fd4fd3

Multi-Valued Dependencies

A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema

A MVD occurs when two independent 1N relationships are in the relational schema

Multi-Valued Dependencies Example

Eno ename Title Salary Pno dno

E1 J Doe EE 30000 P1P2

E2 M Smith SA 50000 P1P4 D1D2D3

E3 A Lee ME 40000 P1P3 D2D3

E4 J Miller PR 20000 P3 D3

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1

E1 J Doe EE 30000 P2

E2 M Smith SA 50000 P1 D1

E2 M Smith SA 50000 P1 D2

E2 M Smith SA 50000 P1 D3

E2 M Smith SA 50000 P4 D1

E2 M Smith SA 50000 P4 D2

E2 M Smith SA 50000 P4 D3

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P3 D3

E3 A Lee ME 40000 P3 D3

E4 J Miller PR 20000 P3 D3

Multi-Valued Dependencies (MVDs)

A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C

A trivial MVD A mdashgtgt B occurs when either B is a subset of A or

A cup B = R

Fourth Normal Form (4NF)

Fourth normal form (4NF) is based on the idea of multi-valued dependencies

A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies

Fourth Normal Form (4NF) Example

eno mdashgtgtpno

eno mdashgtgtdno

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3

eno ename title salary

E1 J Doe EE 30000

E2 MSmith SA 50000

E3 A Lee ME 40000

E4 J Miller PR 20000

eno pno

E1 P1

E1 P2

E2 P1

E2 P4

E3 P1

E3 P3

E4 P3

eno dno

E2 D1

E2 D2

E2 D3

E3 D2

E3 D3

E4 D3

Lossless-join Dependency

The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation

A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined

Fifth Normal Form (5NF)

bull Fifth normal form (5NF) is based on join dependencies

A relation is in fifth normal form (5NF) if the relation has no join dependency

bull 5NF is also called project-join normal form (PJNF)

A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R

The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have

ΠR1(r) ΠR2(r) hellip ΠRn(r) = r

Fifth Normal Form (5NF) Example

Consider a relation Supply (sname partName projName)

1048715Add the additional constraint that

If project j requires part p

and supplier s supplies part p

and supplier s supplies at least one item to project j

Then supplier s also supplies part p to project j

Original Supply Relation Supply Relation with new constraintsname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

Fifth Normal Form (5NF) Example

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

sname partName

Smith Bolt

Smith Nut

Adam Bolt

Walton Nut

Adam Nail

sname projName

Smith X

Smith Y

Adam Y

Walton Z

Adam X

partName projName

Bolt X

Nut Y

Bolt Y

Nut Z

Nail X

Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples

Desirable Properties in normalization

Relational schemas that are well-designed have several

important properties

1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or

relationship

2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins

3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations

Normal Forms in Practice

Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used

For example query execution time may increase because of normalization as more joins become necessary to answer queries

Normal Forms in Practice Example

For example street and city uniquely determine a zipcode

In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed

When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly

fd1

ENo City Street ZipCode

fd2

ENo ZipCode

ZipCode City Street

Normal Forms in Practice Summary

In summary normalization is typically used in two ways

To improve on relational schemas after ER design

To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies

There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance

This is the classic time versus space tradeoff

Conclusion

Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies

Normal forms indicate when relations satisfy certain properties

1NF - All attributes are atomic

2NF - All attributes are fully functionally dependent on a key

3NF - There are no transitive dependencies in the relation

4NF - There are no multi-valued dependencies in the relation

5NF - The relation has no join dependency

In practice normalization is used to improve schemas produced after ER design and existing relational schemas

It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time

Objectives

Explain process of normalization and why it is beneficial

Describe the concept of the Universal Relation

What are the motivations for normalization

What are the 3 types of update anomalies Be able to give examples

Describe the lossless-join property and dependency preservation property

Define and explain the concept of a functional dependency

What is a trivial FD

Describe the relationship between FDs and keys

What are Armstrongs Axioms When are they used

What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs

Objectives (2)

Define minimal and covers for a set of FDs

Define normal forms be able to list the normal forms and draw a diagram showing their relationship

Define 1NF Know how to convert non-1NF relation to 1NF

Define 2NF Convert 1NF relation to 2NF

Define 3NF Convert 2NF relation to 3NF

Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example

Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm

Describe multi-valued dependencies When do they occur

What is a trivial MVD

Define 4NF Convert 3NFBCNF relation to 4NF

Objectives (3)

Describe lossless-join property and how it relates to 5NF

Define 5NF

Explain normalization in terms of the time versus space tradeoff

Explain the relationship between normalization and ER modeling

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
Page 31: Software School of Hunan University 2006.08 Database Systems Design Part III Normalization

Derived Rules from Armstrongs Axioms

There are three other well-known rules that can be derived from the basic three

IR4 Decomposition( 可分解性 ) If A rarr BC then A rarr B and A rarr C A FD with two attributes on RHS can be decomposed into two FDs each with on attribute on the RHS

IR5 Union( 合并性 ) If A rarr B and A rarr C then A rarr BC Reverse of decomposition rule

IR6 Composition( 组合性 ) If A rarr B and C rarr D then AC rarr BD More general than union - can combine two non-overlapping FDs into one

Characteristics of Armstrongs Axioms

The rules are sound( 健壮的 ) and complete( 完备的 )

1) Sound The rules do not infer( 推断 ) any incorrect FDs

2) Complete The rules can be used to infer all correct FDs (no FDs are missing)

The axioms (called IR1 IR2 and IR3) can be used to determine a set of all FDs F+ from an initial set of FDs F

F+ is referred to as the closure of F

Strategy of dealing with questions

The basic idea is that given any set of attributes X we can compute the set of all attributes X+ that can be functionally determined using F This is called the closure of X under F

For question 1 given a set of FDs F in R FD X rarr Y is true or not we know that X rarr Y holds if Y is in X+

For question 2 a set of attributes X is a candidate key of R or not X is a superkey of R if X+ is all attributes of R

For question 3 how to computing F+ we can compute X+ for all possible subsets X of R

Computing the Attribute Closure

The algorithm is as follows

Given a set of attributes X

Let X+ = X

Repeat Find a FD in F whose left side is a subset of X+ Add the right side of F to X+

Until (X+ does not change or X+ contains all attributes of R)

After the algorithm completes you have a set of attributes X+ that can be functionally determined by X This allows you to produce FDs of the form

X rarr A where A is in X+

Computing Attribute Closure Example

R (ABCDEG)

F = A rarr BC C rarr D D rarr G

Compute A+

A+ = A (initial step)

A+ = ABC (by using FD A rarr BC)

A+ = ABCD (by using FD C rarr D)

A+ = ABCDG (by using FD D rarr G)

Similarly we can compute C+ and EG+

C+ = CDG

EG+ = EG

Using Attribute Closure to Detect a FD

The attribute closure algorithm can be used to determine if a particular FD X rarr Y holds based on the set of given FDs F Compute X+ and see if Y is in X+

Example

R = (ABCDEG)

F = A rarr BC C rarr D D rarr G

Does the FD CD rarr G hold Compute CD+ = CDG Yes the FD CD rarr G holds

Does the FD BC rarr E hold Compute BC+ = BCDG No the FD BC rarr E does n

ot hold

Using Attribute Closure to Find all FDs

The attribute closure algorithm can be used to find all FDs of a given relation R if perform the attribute closure algorithm for all subsets X of the attributes R

This is an exponential algorithm in the number of attributes in R Although there are a few optimizations Do not compute the closure of the empty set or of the set of all attributes If we find X+ = all attributes do not compute the closure of any supersets of X Note that even though you do not have to compute the closure (because you know the answer) you still must use the known closure to infer new FDs

Using Attribute Closureto Find all FDs Example

Compute F+ from F where

R (ABCD)

F = A rarr BC C rarr D

Must compute all these attribute closures

A+ B+ C+ D+

AB+ AC+ AD+ BC+ BD+ CD+

ABC+ ABD+ ACD+ BCD+

Do not have to compute ABCD+

Start with the single sets

A+ = ABCD B+ = B C+ = CD D+ = D

New FD A rarr D

Using Attribute Closureto Find all FDs Example (cont)

Next the two element sets Do not compute AB+ AC+ AD+ because A+ contained all attributes of R New FDs AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C BC+ = BCD BD+ = BD CD+ = CD New FD BC rarr DThree element sets Do not compute ABC+ ABD+ ACD+ New FDs ABC rarr D ABD rarr C ACD rarr B BCD+ = BCD No new FDsAll FDs F+ = A rarr BC C rarr D A rarr D BC rarr D AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C ABC rarr D ABD rarrC ACD rarr B

FD Closure Questions

1) Compute all nontrivial FDs and list all keys of R

R = (ABCD)

F = AB rarr C C rarr D D rarr A

2) Compute all nontrivial FDs and list all candidate keys of R

R = (ABCD)

F = AB rarr C BC rarr D CD rarr A AD rarr B

Equivalent Sets of FDs

A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+ We can determine if F covers E by calculating X+ with respect to F for each FD X rarr Y in E and checking whether X+ includes Y

Two sets of FDs E and F are equivalent if E covers F and F covers E That is E+= F+

Minimal Set of FDs

A set of FDs F is minimal if it satisfies these conditions Every FD in F has a single attribute on its right-hand side We cannot replace any FD X rarr A in F with Y rarr A where Y

X and still have a set of dependencies that is equivalent to F

We cannot remove any FD and still have a set that is equivalent to F

A minimal sets of FDs is like a canonical( 规范的 ) form of FDs

Note that there may be many minimal covers for a given set of functional dependencies F

Normal Forms

A relation is in a particular normal form if it satisfies certain

normalization properties

There are several normal forms defined 1NF - First Normal Form 2NF - Second Normal Form 3NF - Third Normal Form BCNF - Boyce-Codd Normal Form 4NF - Fourth Normal Form 5NF - Fifth Normal Form

Each of these normal forms are stricter than the next

For example 3NF is better than 2NF because it removes more redundancyanomalies from the schema than 2NF

Normal Forms

All relations (non-normalized)

1NF (First Normal Form)

2NF (Second Normal Form)

3NF (Third Normal Form)

BCNF (Boyce-Codd NF)

4NF (Fourth NF)

5NF (Fifth NF)

First Normal Form (1NF)

A relation is in first normal form (1NF) if all its attribute values are atomic

That is a 1NF relation cannot have an attribute value that is a set of values (multi-valued attribute) a set of tuples (nested relation)

1NF is a standard assumption in relational DBMSs However object-oriented DBMSs and nested relational DBMSs relax t

his constraint

A relation that is not in 1NF is an unnormalized relation

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1P2

AnalystAnalyst

246

E3 A Lee P3P4

ConsultantEngineer

1048

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1 P2 Analyst Analyst 24 6

E3 A Lee P3P4 Consultant Engineer 10 48

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

A non-1NF Relation

One way to convert a non-1NF relation to a 1NF relation

1) Splitting Method - Divide the existing relation into two relations one with non-repeating attributes and the other with repeating attributes

Make a new relation consisting of the primary key of the original relation and the repeating attributes Determine a primary key for this new relation

Remove the repeating attributes from the original relation

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1P2

AnalystAnalyst

246

E3 A Lee P3P4

ConsultantEngineer

1048

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

ENo EName

E1 J Doe

E2 M Smith

E3 A Lee

E4 J Miller

E5 B Casey

E6 L Chu

E7 R Davis

ENo PNo Resp DurationE1 P1 Manager 12E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36

Second Normal Form (2NF)

A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key

If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation

Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key

Second Normal Form (2NF) Example

fd1 and fd4 are partial functional dependencies Normalize to

Emp (eno ename title bdate salary supereno dno)

WorksOn (eno pno resp hours)

Proj (pno pname budget)

ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration

EmpProj relation

fd1 fd2

fd3fd4

Second Normal Form (2NF) Example (2)

ENo EName

BDate Title Salary SuperNo

DNo

Emp relation

fd1

fd2

fd3fd4

PNo PName Budget

Proj relationENo PNo Resp Duration

WorkOn relation

Third Normal Form (3NF)

Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th

an 2 FDs

A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key

Converting a relation to 3NF from 2NF involves the removal of transitive dependencies

Third Normal Form (3NF) Example

fd2 results in a transitive dependency eno rarr salary Remove it

ENo EName BDate Title Salary SuperNo DNo

Emp relation

fd1

fd2

fd2

Title Salary

Pay relation

ENo EName BDate Title SuperNo DNo

Emp relation

fd1

General Definitions of 2NF and 3NF

We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)

General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key

General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key

Note that a prime attribute is an attribute that is in any key (candidate or primary)

General Definition of 3NF Example

The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute

However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key

Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key

ENo EName BDate Title Salary SuperNo DNo SSN

Emp relation

fd1

fd2

Normalization Question

Consider the universal relation

R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ

List the keys for R

Decompose R into 2NF and then 3NF relations

Boyce-Codd Normal Form (BCNF)

A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key

To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key

The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

ClientInterview relation

Relation is in 3NF but not BCNF

fd4

fd3

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

Note that we lose the FD fd2

fd4

fd3

ClientNo InterviewDate

InterviwTime

StaffNo StaffNo InterviwDate

RoomNo

Interview relation StaffRoom relation

BCNF versus 3NF

We can decompose to BCNF but sometimes we do not want to if we lose a FD

The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency

Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)

In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)

Conversion to BCNF

There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y

Note that if any FD in F+ violates BCNF then a FD in F violates BCNF

2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey

3) Replace R by relations with schemas R1 = X+

R2 = (R ndash X+) U X

4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib

utes are all in R1 or R2

Normalization Question

Given this database schema normalize into BCNF

fd1

PID Country Lot Area Price TaxRate

fd2

fd4fd3

Multi-Valued Dependencies

A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema

A MVD occurs when two independent 1N relationships are in the relational schema

Multi-Valued Dependencies Example

Eno ename Title Salary Pno dno

E1 J Doe EE 30000 P1P2

E2 M Smith SA 50000 P1P4 D1D2D3

E3 A Lee ME 40000 P1P3 D2D3

E4 J Miller PR 20000 P3 D3

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1

E1 J Doe EE 30000 P2

E2 M Smith SA 50000 P1 D1

E2 M Smith SA 50000 P1 D2

E2 M Smith SA 50000 P1 D3

E2 M Smith SA 50000 P4 D1

E2 M Smith SA 50000 P4 D2

E2 M Smith SA 50000 P4 D3

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P3 D3

E3 A Lee ME 40000 P3 D3

E4 J Miller PR 20000 P3 D3

Multi-Valued Dependencies (MVDs)

A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C

A trivial MVD A mdashgtgt B occurs when either B is a subset of A or

A cup B = R

Fourth Normal Form (4NF)

Fourth normal form (4NF) is based on the idea of multi-valued dependencies

A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies

Fourth Normal Form (4NF) Example

eno mdashgtgtpno

eno mdashgtgtdno

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3

eno ename title salary

E1 J Doe EE 30000

E2 MSmith SA 50000

E3 A Lee ME 40000

E4 J Miller PR 20000

eno pno

E1 P1

E1 P2

E2 P1

E2 P4

E3 P1

E3 P3

E4 P3

eno dno

E2 D1

E2 D2

E2 D3

E3 D2

E3 D3

E4 D3

Lossless-join Dependency

The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation

A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined

Fifth Normal Form (5NF)

bull Fifth normal form (5NF) is based on join dependencies

A relation is in fifth normal form (5NF) if the relation has no join dependency

bull 5NF is also called project-join normal form (PJNF)

A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R

The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have

ΠR1(r) ΠR2(r) hellip ΠRn(r) = r

Fifth Normal Form (5NF) Example

Consider a relation Supply (sname partName projName)

1048715Add the additional constraint that

If project j requires part p

and supplier s supplies part p

and supplier s supplies at least one item to project j

Then supplier s also supplies part p to project j

Original Supply Relation Supply Relation with new constraintsname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

Fifth Normal Form (5NF) Example

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

sname partName

Smith Bolt

Smith Nut

Adam Bolt

Walton Nut

Adam Nail

sname projName

Smith X

Smith Y

Adam Y

Walton Z

Adam X

partName projName

Bolt X

Nut Y

Bolt Y

Nut Z

Nail X

Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples

Desirable Properties in normalization

Relational schemas that are well-designed have several

important properties

1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or

relationship

2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins

3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations

Normal Forms in Practice

Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used

For example query execution time may increase because of normalization as more joins become necessary to answer queries

Normal Forms in Practice Example

For example street and city uniquely determine a zipcode

In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed

When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly

fd1

ENo City Street ZipCode

fd2

ENo ZipCode

ZipCode City Street

Normal Forms in Practice Summary

In summary normalization is typically used in two ways

To improve on relational schemas after ER design

To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies

There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance

This is the classic time versus space tradeoff

Conclusion

Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies

Normal forms indicate when relations satisfy certain properties

1NF - All attributes are atomic

2NF - All attributes are fully functionally dependent on a key

3NF - There are no transitive dependencies in the relation

4NF - There are no multi-valued dependencies in the relation

5NF - The relation has no join dependency

In practice normalization is used to improve schemas produced after ER design and existing relational schemas

It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time

Objectives

Explain process of normalization and why it is beneficial

Describe the concept of the Universal Relation

What are the motivations for normalization

What are the 3 types of update anomalies Be able to give examples

Describe the lossless-join property and dependency preservation property

Define and explain the concept of a functional dependency

What is a trivial FD

Describe the relationship between FDs and keys

What are Armstrongs Axioms When are they used

What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs

Objectives (2)

Define minimal and covers for a set of FDs

Define normal forms be able to list the normal forms and draw a diagram showing their relationship

Define 1NF Know how to convert non-1NF relation to 1NF

Define 2NF Convert 1NF relation to 2NF

Define 3NF Convert 2NF relation to 3NF

Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example

Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm

Describe multi-valued dependencies When do they occur

What is a trivial MVD

Define 4NF Convert 3NFBCNF relation to 4NF

Objectives (3)

Describe lossless-join property and how it relates to 5NF

Define 5NF

Explain normalization in terms of the time versus space tradeoff

Explain the relationship between normalization and ER modeling

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
Page 32: Software School of Hunan University 2006.08 Database Systems Design Part III Normalization

Characteristics of Armstrongs Axioms

The rules are sound( 健壮的 ) and complete( 完备的 )

1) Sound The rules do not infer( 推断 ) any incorrect FDs

2) Complete The rules can be used to infer all correct FDs (no FDs are missing)

The axioms (called IR1 IR2 and IR3) can be used to determine a set of all FDs F+ from an initial set of FDs F

F+ is referred to as the closure of F

Strategy of dealing with questions

The basic idea is that given any set of attributes X we can compute the set of all attributes X+ that can be functionally determined using F This is called the closure of X under F

For question 1 given a set of FDs F in R FD X rarr Y is true or not we know that X rarr Y holds if Y is in X+

For question 2 a set of attributes X is a candidate key of R or not X is a superkey of R if X+ is all attributes of R

For question 3 how to computing F+ we can compute X+ for all possible subsets X of R

Computing the Attribute Closure

The algorithm is as follows

Given a set of attributes X

Let X+ = X

Repeat Find a FD in F whose left side is a subset of X+ Add the right side of F to X+

Until (X+ does not change or X+ contains all attributes of R)

After the algorithm completes you have a set of attributes X+ that can be functionally determined by X This allows you to produce FDs of the form

X rarr A where A is in X+

Computing Attribute Closure Example

R (ABCDEG)

F = A rarr BC C rarr D D rarr G

Compute A+

A+ = A (initial step)

A+ = ABC (by using FD A rarr BC)

A+ = ABCD (by using FD C rarr D)

A+ = ABCDG (by using FD D rarr G)

Similarly we can compute C+ and EG+

C+ = CDG

EG+ = EG

Using Attribute Closure to Detect a FD

The attribute closure algorithm can be used to determine if a particular FD X rarr Y holds based on the set of given FDs F Compute X+ and see if Y is in X+

Example

R = (ABCDEG)

F = A rarr BC C rarr D D rarr G

Does the FD CD rarr G hold Compute CD+ = CDG Yes the FD CD rarr G holds

Does the FD BC rarr E hold Compute BC+ = BCDG No the FD BC rarr E does n

ot hold

Using Attribute Closure to Find all FDs

The attribute closure algorithm can be used to find all FDs of a given relation R if perform the attribute closure algorithm for all subsets X of the attributes R

This is an exponential algorithm in the number of attributes in R Although there are a few optimizations Do not compute the closure of the empty set or of the set of all attributes If we find X+ = all attributes do not compute the closure of any supersets of X Note that even though you do not have to compute the closure (because you know the answer) you still must use the known closure to infer new FDs

Using Attribute Closureto Find all FDs Example

Compute F+ from F where

R (ABCD)

F = A rarr BC C rarr D

Must compute all these attribute closures

A+ B+ C+ D+

AB+ AC+ AD+ BC+ BD+ CD+

ABC+ ABD+ ACD+ BCD+

Do not have to compute ABCD+

Start with the single sets

A+ = ABCD B+ = B C+ = CD D+ = D

New FD A rarr D

Using Attribute Closureto Find all FDs Example (cont)

Next the two element sets Do not compute AB+ AC+ AD+ because A+ contained all attributes of R New FDs AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C BC+ = BCD BD+ = BD CD+ = CD New FD BC rarr DThree element sets Do not compute ABC+ ABD+ ACD+ New FDs ABC rarr D ABD rarr C ACD rarr B BCD+ = BCD No new FDsAll FDs F+ = A rarr BC C rarr D A rarr D BC rarr D AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C ABC rarr D ABD rarrC ACD rarr B

FD Closure Questions

1) Compute all nontrivial FDs and list all keys of R

R = (ABCD)

F = AB rarr C C rarr D D rarr A

2) Compute all nontrivial FDs and list all candidate keys of R

R = (ABCD)

F = AB rarr C BC rarr D CD rarr A AD rarr B

Equivalent Sets of FDs

A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+ We can determine if F covers E by calculating X+ with respect to F for each FD X rarr Y in E and checking whether X+ includes Y

Two sets of FDs E and F are equivalent if E covers F and F covers E That is E+= F+

Minimal Set of FDs

A set of FDs F is minimal if it satisfies these conditions Every FD in F has a single attribute on its right-hand side We cannot replace any FD X rarr A in F with Y rarr A where Y

X and still have a set of dependencies that is equivalent to F

We cannot remove any FD and still have a set that is equivalent to F

A minimal sets of FDs is like a canonical( 规范的 ) form of FDs

Note that there may be many minimal covers for a given set of functional dependencies F

Normal Forms

A relation is in a particular normal form if it satisfies certain

normalization properties

There are several normal forms defined 1NF - First Normal Form 2NF - Second Normal Form 3NF - Third Normal Form BCNF - Boyce-Codd Normal Form 4NF - Fourth Normal Form 5NF - Fifth Normal Form

Each of these normal forms are stricter than the next

For example 3NF is better than 2NF because it removes more redundancyanomalies from the schema than 2NF

Normal Forms

All relations (non-normalized)

1NF (First Normal Form)

2NF (Second Normal Form)

3NF (Third Normal Form)

BCNF (Boyce-Codd NF)

4NF (Fourth NF)

5NF (Fifth NF)

First Normal Form (1NF)

A relation is in first normal form (1NF) if all its attribute values are atomic

That is a 1NF relation cannot have an attribute value that is a set of values (multi-valued attribute) a set of tuples (nested relation)

1NF is a standard assumption in relational DBMSs However object-oriented DBMSs and nested relational DBMSs relax t

his constraint

A relation that is not in 1NF is an unnormalized relation

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1P2

AnalystAnalyst

246

E3 A Lee P3P4

ConsultantEngineer

1048

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1 P2 Analyst Analyst 24 6

E3 A Lee P3P4 Consultant Engineer 10 48

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

A non-1NF Relation

One way to convert a non-1NF relation to a 1NF relation

1) Splitting Method - Divide the existing relation into two relations one with non-repeating attributes and the other with repeating attributes

Make a new relation consisting of the primary key of the original relation and the repeating attributes Determine a primary key for this new relation

Remove the repeating attributes from the original relation

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1P2

AnalystAnalyst

246

E3 A Lee P3P4

ConsultantEngineer

1048

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

ENo EName

E1 J Doe

E2 M Smith

E3 A Lee

E4 J Miller

E5 B Casey

E6 L Chu

E7 R Davis

ENo PNo Resp DurationE1 P1 Manager 12E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36

Second Normal Form (2NF)

A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key

If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation

Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key

Second Normal Form (2NF) Example

fd1 and fd4 are partial functional dependencies Normalize to

Emp (eno ename title bdate salary supereno dno)

WorksOn (eno pno resp hours)

Proj (pno pname budget)

ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration

EmpProj relation

fd1 fd2

fd3fd4

Second Normal Form (2NF) Example (2)

ENo EName

BDate Title Salary SuperNo

DNo

Emp relation

fd1

fd2

fd3fd4

PNo PName Budget

Proj relationENo PNo Resp Duration

WorkOn relation

Third Normal Form (3NF)

Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th

an 2 FDs

A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key

Converting a relation to 3NF from 2NF involves the removal of transitive dependencies

Third Normal Form (3NF) Example

fd2 results in a transitive dependency eno rarr salary Remove it

ENo EName BDate Title Salary SuperNo DNo

Emp relation

fd1

fd2

fd2

Title Salary

Pay relation

ENo EName BDate Title SuperNo DNo

Emp relation

fd1

General Definitions of 2NF and 3NF

We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)

General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key

General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key

Note that a prime attribute is an attribute that is in any key (candidate or primary)

General Definition of 3NF Example

The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute

However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key

Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key

ENo EName BDate Title Salary SuperNo DNo SSN

Emp relation

fd1

fd2

Normalization Question

Consider the universal relation

R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ

List the keys for R

Decompose R into 2NF and then 3NF relations

Boyce-Codd Normal Form (BCNF)

A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key

To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key

The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

ClientInterview relation

Relation is in 3NF but not BCNF

fd4

fd3

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

Note that we lose the FD fd2

fd4

fd3

ClientNo InterviewDate

InterviwTime

StaffNo StaffNo InterviwDate

RoomNo

Interview relation StaffRoom relation

BCNF versus 3NF

We can decompose to BCNF but sometimes we do not want to if we lose a FD

The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency

Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)

In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)

Conversion to BCNF

There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y

Note that if any FD in F+ violates BCNF then a FD in F violates BCNF

2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey

3) Replace R by relations with schemas R1 = X+

R2 = (R ndash X+) U X

4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib

utes are all in R1 or R2

Normalization Question

Given this database schema normalize into BCNF

fd1

PID Country Lot Area Price TaxRate

fd2

fd4fd3

Multi-Valued Dependencies

A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema

A MVD occurs when two independent 1N relationships are in the relational schema

Multi-Valued Dependencies Example

Eno ename Title Salary Pno dno

E1 J Doe EE 30000 P1P2

E2 M Smith SA 50000 P1P4 D1D2D3

E3 A Lee ME 40000 P1P3 D2D3

E4 J Miller PR 20000 P3 D3

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1

E1 J Doe EE 30000 P2

E2 M Smith SA 50000 P1 D1

E2 M Smith SA 50000 P1 D2

E2 M Smith SA 50000 P1 D3

E2 M Smith SA 50000 P4 D1

E2 M Smith SA 50000 P4 D2

E2 M Smith SA 50000 P4 D3

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P3 D3

E3 A Lee ME 40000 P3 D3

E4 J Miller PR 20000 P3 D3

Multi-Valued Dependencies (MVDs)

A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C

A trivial MVD A mdashgtgt B occurs when either B is a subset of A or

A cup B = R

Fourth Normal Form (4NF)

Fourth normal form (4NF) is based on the idea of multi-valued dependencies

A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies

Fourth Normal Form (4NF) Example

eno mdashgtgtpno

eno mdashgtgtdno

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3

eno ename title salary

E1 J Doe EE 30000

E2 MSmith SA 50000

E3 A Lee ME 40000

E4 J Miller PR 20000

eno pno

E1 P1

E1 P2

E2 P1

E2 P4

E3 P1

E3 P3

E4 P3

eno dno

E2 D1

E2 D2

E2 D3

E3 D2

E3 D3

E4 D3

Lossless-join Dependency

The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation

A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined

Fifth Normal Form (5NF)

bull Fifth normal form (5NF) is based on join dependencies

A relation is in fifth normal form (5NF) if the relation has no join dependency

bull 5NF is also called project-join normal form (PJNF)

A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R

The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have

ΠR1(r) ΠR2(r) hellip ΠRn(r) = r

Fifth Normal Form (5NF) Example

Consider a relation Supply (sname partName projName)

1048715Add the additional constraint that

If project j requires part p

and supplier s supplies part p

and supplier s supplies at least one item to project j

Then supplier s also supplies part p to project j

Original Supply Relation Supply Relation with new constraintsname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

Fifth Normal Form (5NF) Example

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

sname partName

Smith Bolt

Smith Nut

Adam Bolt

Walton Nut

Adam Nail

sname projName

Smith X

Smith Y

Adam Y

Walton Z

Adam X

partName projName

Bolt X

Nut Y

Bolt Y

Nut Z

Nail X

Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples

Desirable Properties in normalization

Relational schemas that are well-designed have several

important properties

1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or

relationship

2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins

3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations

Normal Forms in Practice

Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used

For example query execution time may increase because of normalization as more joins become necessary to answer queries

Normal Forms in Practice Example

For example street and city uniquely determine a zipcode

In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed

When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly

fd1

ENo City Street ZipCode

fd2

ENo ZipCode

ZipCode City Street

Normal Forms in Practice Summary

In summary normalization is typically used in two ways

To improve on relational schemas after ER design

To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies

There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance

This is the classic time versus space tradeoff

Conclusion

Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies

Normal forms indicate when relations satisfy certain properties

1NF - All attributes are atomic

2NF - All attributes are fully functionally dependent on a key

3NF - There are no transitive dependencies in the relation

4NF - There are no multi-valued dependencies in the relation

5NF - The relation has no join dependency

In practice normalization is used to improve schemas produced after ER design and existing relational schemas

It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time

Objectives

Explain process of normalization and why it is beneficial

Describe the concept of the Universal Relation

What are the motivations for normalization

What are the 3 types of update anomalies Be able to give examples

Describe the lossless-join property and dependency preservation property

Define and explain the concept of a functional dependency

What is a trivial FD

Describe the relationship between FDs and keys

What are Armstrongs Axioms When are they used

What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs

Objectives (2)

Define minimal and covers for a set of FDs

Define normal forms be able to list the normal forms and draw a diagram showing their relationship

Define 1NF Know how to convert non-1NF relation to 1NF

Define 2NF Convert 1NF relation to 2NF

Define 3NF Convert 2NF relation to 3NF

Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example

Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm

Describe multi-valued dependencies When do they occur

What is a trivial MVD

Define 4NF Convert 3NFBCNF relation to 4NF

Objectives (3)

Describe lossless-join property and how it relates to 5NF

Define 5NF

Explain normalization in terms of the time versus space tradeoff

Explain the relationship between normalization and ER modeling

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
Page 33: Software School of Hunan University 2006.08 Database Systems Design Part III Normalization

Strategy of dealing with questions

The basic idea is that given any set of attributes X we can compute the set of all attributes X+ that can be functionally determined using F This is called the closure of X under F

For question 1 given a set of FDs F in R FD X rarr Y is true or not we know that X rarr Y holds if Y is in X+

For question 2 a set of attributes X is a candidate key of R or not X is a superkey of R if X+ is all attributes of R

For question 3 how to computing F+ we can compute X+ for all possible subsets X of R

Computing the Attribute Closure

The algorithm is as follows

Given a set of attributes X

Let X+ = X

Repeat Find a FD in F whose left side is a subset of X+ Add the right side of F to X+

Until (X+ does not change or X+ contains all attributes of R)

After the algorithm completes you have a set of attributes X+ that can be functionally determined by X This allows you to produce FDs of the form

X rarr A where A is in X+

Computing Attribute Closure Example

R (ABCDEG)

F = A rarr BC C rarr D D rarr G

Compute A+

A+ = A (initial step)

A+ = ABC (by using FD A rarr BC)

A+ = ABCD (by using FD C rarr D)

A+ = ABCDG (by using FD D rarr G)

Similarly we can compute C+ and EG+

C+ = CDG

EG+ = EG

Using Attribute Closure to Detect a FD

The attribute closure algorithm can be used to determine if a particular FD X rarr Y holds based on the set of given FDs F Compute X+ and see if Y is in X+

Example

R = (ABCDEG)

F = A rarr BC C rarr D D rarr G

Does the FD CD rarr G hold Compute CD+ = CDG Yes the FD CD rarr G holds

Does the FD BC rarr E hold Compute BC+ = BCDG No the FD BC rarr E does n

ot hold

Using Attribute Closure to Find all FDs

The attribute closure algorithm can be used to find all FDs of a given relation R if perform the attribute closure algorithm for all subsets X of the attributes R

This is an exponential algorithm in the number of attributes in R Although there are a few optimizations Do not compute the closure of the empty set or of the set of all attributes If we find X+ = all attributes do not compute the closure of any supersets of X Note that even though you do not have to compute the closure (because you know the answer) you still must use the known closure to infer new FDs

Using Attribute Closureto Find all FDs Example

Compute F+ from F where

R (ABCD)

F = A rarr BC C rarr D

Must compute all these attribute closures

A+ B+ C+ D+

AB+ AC+ AD+ BC+ BD+ CD+

ABC+ ABD+ ACD+ BCD+

Do not have to compute ABCD+

Start with the single sets

A+ = ABCD B+ = B C+ = CD D+ = D

New FD A rarr D

Using Attribute Closureto Find all FDs Example (cont)

Next the two element sets Do not compute AB+ AC+ AD+ because A+ contained all attributes of R New FDs AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C BC+ = BCD BD+ = BD CD+ = CD New FD BC rarr DThree element sets Do not compute ABC+ ABD+ ACD+ New FDs ABC rarr D ABD rarr C ACD rarr B BCD+ = BCD No new FDsAll FDs F+ = A rarr BC C rarr D A rarr D BC rarr D AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C ABC rarr D ABD rarrC ACD rarr B

FD Closure Questions

1) Compute all nontrivial FDs and list all keys of R

R = (ABCD)

F = AB rarr C C rarr D D rarr A

2) Compute all nontrivial FDs and list all candidate keys of R

R = (ABCD)

F = AB rarr C BC rarr D CD rarr A AD rarr B

Equivalent Sets of FDs

A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+ We can determine if F covers E by calculating X+ with respect to F for each FD X rarr Y in E and checking whether X+ includes Y

Two sets of FDs E and F are equivalent if E covers F and F covers E That is E+= F+

Minimal Set of FDs

A set of FDs F is minimal if it satisfies these conditions Every FD in F has a single attribute on its right-hand side We cannot replace any FD X rarr A in F with Y rarr A where Y

X and still have a set of dependencies that is equivalent to F

We cannot remove any FD and still have a set that is equivalent to F

A minimal sets of FDs is like a canonical( 规范的 ) form of FDs

Note that there may be many minimal covers for a given set of functional dependencies F

Normal Forms

A relation is in a particular normal form if it satisfies certain

normalization properties

There are several normal forms defined 1NF - First Normal Form 2NF - Second Normal Form 3NF - Third Normal Form BCNF - Boyce-Codd Normal Form 4NF - Fourth Normal Form 5NF - Fifth Normal Form

Each of these normal forms are stricter than the next

For example 3NF is better than 2NF because it removes more redundancyanomalies from the schema than 2NF

Normal Forms

All relations (non-normalized)

1NF (First Normal Form)

2NF (Second Normal Form)

3NF (Third Normal Form)

BCNF (Boyce-Codd NF)

4NF (Fourth NF)

5NF (Fifth NF)

First Normal Form (1NF)

A relation is in first normal form (1NF) if all its attribute values are atomic

That is a 1NF relation cannot have an attribute value that is a set of values (multi-valued attribute) a set of tuples (nested relation)

1NF is a standard assumption in relational DBMSs However object-oriented DBMSs and nested relational DBMSs relax t

his constraint

A relation that is not in 1NF is an unnormalized relation

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1P2

AnalystAnalyst

246

E3 A Lee P3P4

ConsultantEngineer

1048

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1 P2 Analyst Analyst 24 6

E3 A Lee P3P4 Consultant Engineer 10 48

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

A non-1NF Relation

One way to convert a non-1NF relation to a 1NF relation

1) Splitting Method - Divide the existing relation into two relations one with non-repeating attributes and the other with repeating attributes

Make a new relation consisting of the primary key of the original relation and the repeating attributes Determine a primary key for this new relation

Remove the repeating attributes from the original relation

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1P2

AnalystAnalyst

246

E3 A Lee P3P4

ConsultantEngineer

1048

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

ENo EName

E1 J Doe

E2 M Smith

E3 A Lee

E4 J Miller

E5 B Casey

E6 L Chu

E7 R Davis

ENo PNo Resp DurationE1 P1 Manager 12E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36

Second Normal Form (2NF)

A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key

If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation

Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key

Second Normal Form (2NF) Example

fd1 and fd4 are partial functional dependencies Normalize to

Emp (eno ename title bdate salary supereno dno)

WorksOn (eno pno resp hours)

Proj (pno pname budget)

ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration

EmpProj relation

fd1 fd2

fd3fd4

Second Normal Form (2NF) Example (2)

ENo EName

BDate Title Salary SuperNo

DNo

Emp relation

fd1

fd2

fd3fd4

PNo PName Budget

Proj relationENo PNo Resp Duration

WorkOn relation

Third Normal Form (3NF)

Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th

an 2 FDs

A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key

Converting a relation to 3NF from 2NF involves the removal of transitive dependencies

Third Normal Form (3NF) Example

fd2 results in a transitive dependency eno rarr salary Remove it

ENo EName BDate Title Salary SuperNo DNo

Emp relation

fd1

fd2

fd2

Title Salary

Pay relation

ENo EName BDate Title SuperNo DNo

Emp relation

fd1

General Definitions of 2NF and 3NF

We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)

General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key

General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key

Note that a prime attribute is an attribute that is in any key (candidate or primary)

General Definition of 3NF Example

The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute

However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key

Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key

ENo EName BDate Title Salary SuperNo DNo SSN

Emp relation

fd1

fd2

Normalization Question

Consider the universal relation

R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ

List the keys for R

Decompose R into 2NF and then 3NF relations

Boyce-Codd Normal Form (BCNF)

A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key

To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key

The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

ClientInterview relation

Relation is in 3NF but not BCNF

fd4

fd3

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

Note that we lose the FD fd2

fd4

fd3

ClientNo InterviewDate

InterviwTime

StaffNo StaffNo InterviwDate

RoomNo

Interview relation StaffRoom relation

BCNF versus 3NF

We can decompose to BCNF but sometimes we do not want to if we lose a FD

The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency

Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)

In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)

Conversion to BCNF

There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y

Note that if any FD in F+ violates BCNF then a FD in F violates BCNF

2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey

3) Replace R by relations with schemas R1 = X+

R2 = (R ndash X+) U X

4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib

utes are all in R1 or R2

Normalization Question

Given this database schema normalize into BCNF

fd1

PID Country Lot Area Price TaxRate

fd2

fd4fd3

Multi-Valued Dependencies

A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema

A MVD occurs when two independent 1N relationships are in the relational schema

Multi-Valued Dependencies Example

Eno ename Title Salary Pno dno

E1 J Doe EE 30000 P1P2

E2 M Smith SA 50000 P1P4 D1D2D3

E3 A Lee ME 40000 P1P3 D2D3

E4 J Miller PR 20000 P3 D3

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1

E1 J Doe EE 30000 P2

E2 M Smith SA 50000 P1 D1

E2 M Smith SA 50000 P1 D2

E2 M Smith SA 50000 P1 D3

E2 M Smith SA 50000 P4 D1

E2 M Smith SA 50000 P4 D2

E2 M Smith SA 50000 P4 D3

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P3 D3

E3 A Lee ME 40000 P3 D3

E4 J Miller PR 20000 P3 D3

Multi-Valued Dependencies (MVDs)

A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C

A trivial MVD A mdashgtgt B occurs when either B is a subset of A or

A cup B = R

Fourth Normal Form (4NF)

Fourth normal form (4NF) is based on the idea of multi-valued dependencies

A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies

Fourth Normal Form (4NF) Example

eno mdashgtgtpno

eno mdashgtgtdno

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3

eno ename title salary

E1 J Doe EE 30000

E2 MSmith SA 50000

E3 A Lee ME 40000

E4 J Miller PR 20000

eno pno

E1 P1

E1 P2

E2 P1

E2 P4

E3 P1

E3 P3

E4 P3

eno dno

E2 D1

E2 D2

E2 D3

E3 D2

E3 D3

E4 D3

Lossless-join Dependency

The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation

A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined

Fifth Normal Form (5NF)

bull Fifth normal form (5NF) is based on join dependencies

A relation is in fifth normal form (5NF) if the relation has no join dependency

bull 5NF is also called project-join normal form (PJNF)

A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R

The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have

ΠR1(r) ΠR2(r) hellip ΠRn(r) = r

Fifth Normal Form (5NF) Example

Consider a relation Supply (sname partName projName)

1048715Add the additional constraint that

If project j requires part p

and supplier s supplies part p

and supplier s supplies at least one item to project j

Then supplier s also supplies part p to project j

Original Supply Relation Supply Relation with new constraintsname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

Fifth Normal Form (5NF) Example

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

sname partName

Smith Bolt

Smith Nut

Adam Bolt

Walton Nut

Adam Nail

sname projName

Smith X

Smith Y

Adam Y

Walton Z

Adam X

partName projName

Bolt X

Nut Y

Bolt Y

Nut Z

Nail X

Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples

Desirable Properties in normalization

Relational schemas that are well-designed have several

important properties

1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or

relationship

2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins

3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations

Normal Forms in Practice

Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used

For example query execution time may increase because of normalization as more joins become necessary to answer queries

Normal Forms in Practice Example

For example street and city uniquely determine a zipcode

In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed

When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly

fd1

ENo City Street ZipCode

fd2

ENo ZipCode

ZipCode City Street

Normal Forms in Practice Summary

In summary normalization is typically used in two ways

To improve on relational schemas after ER design

To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies

There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance

This is the classic time versus space tradeoff

Conclusion

Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies

Normal forms indicate when relations satisfy certain properties

1NF - All attributes are atomic

2NF - All attributes are fully functionally dependent on a key

3NF - There are no transitive dependencies in the relation

4NF - There are no multi-valued dependencies in the relation

5NF - The relation has no join dependency

In practice normalization is used to improve schemas produced after ER design and existing relational schemas

It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time

Objectives

Explain process of normalization and why it is beneficial

Describe the concept of the Universal Relation

What are the motivations for normalization

What are the 3 types of update anomalies Be able to give examples

Describe the lossless-join property and dependency preservation property

Define and explain the concept of a functional dependency

What is a trivial FD

Describe the relationship between FDs and keys

What are Armstrongs Axioms When are they used

What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs

Objectives (2)

Define minimal and covers for a set of FDs

Define normal forms be able to list the normal forms and draw a diagram showing their relationship

Define 1NF Know how to convert non-1NF relation to 1NF

Define 2NF Convert 1NF relation to 2NF

Define 3NF Convert 2NF relation to 3NF

Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example

Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm

Describe multi-valued dependencies When do they occur

What is a trivial MVD

Define 4NF Convert 3NFBCNF relation to 4NF

Objectives (3)

Describe lossless-join property and how it relates to 5NF

Define 5NF

Explain normalization in terms of the time versus space tradeoff

Explain the relationship between normalization and ER modeling

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
Page 34: Software School of Hunan University 2006.08 Database Systems Design Part III Normalization

Computing the Attribute Closure

The algorithm is as follows

Given a set of attributes X

Let X+ = X

Repeat Find a FD in F whose left side is a subset of X+ Add the right side of F to X+

Until (X+ does not change or X+ contains all attributes of R)

After the algorithm completes you have a set of attributes X+ that can be functionally determined by X This allows you to produce FDs of the form

X rarr A where A is in X+

Computing Attribute Closure Example

R (ABCDEG)

F = A rarr BC C rarr D D rarr G

Compute A+

A+ = A (initial step)

A+ = ABC (by using FD A rarr BC)

A+ = ABCD (by using FD C rarr D)

A+ = ABCDG (by using FD D rarr G)

Similarly we can compute C+ and EG+

C+ = CDG

EG+ = EG

Using Attribute Closure to Detect a FD

The attribute closure algorithm can be used to determine if a particular FD X rarr Y holds based on the set of given FDs F Compute X+ and see if Y is in X+

Example

R = (ABCDEG)

F = A rarr BC C rarr D D rarr G

Does the FD CD rarr G hold Compute CD+ = CDG Yes the FD CD rarr G holds

Does the FD BC rarr E hold Compute BC+ = BCDG No the FD BC rarr E does n

ot hold

Using Attribute Closure to Find all FDs

The attribute closure algorithm can be used to find all FDs of a given relation R if perform the attribute closure algorithm for all subsets X of the attributes R

This is an exponential algorithm in the number of attributes in R Although there are a few optimizations Do not compute the closure of the empty set or of the set of all attributes If we find X+ = all attributes do not compute the closure of any supersets of X Note that even though you do not have to compute the closure (because you know the answer) you still must use the known closure to infer new FDs

Using Attribute Closureto Find all FDs Example

Compute F+ from F where

R (ABCD)

F = A rarr BC C rarr D

Must compute all these attribute closures

A+ B+ C+ D+

AB+ AC+ AD+ BC+ BD+ CD+

ABC+ ABD+ ACD+ BCD+

Do not have to compute ABCD+

Start with the single sets

A+ = ABCD B+ = B C+ = CD D+ = D

New FD A rarr D

Using Attribute Closureto Find all FDs Example (cont)

Next the two element sets Do not compute AB+ AC+ AD+ because A+ contained all attributes of R New FDs AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C BC+ = BCD BD+ = BD CD+ = CD New FD BC rarr DThree element sets Do not compute ABC+ ABD+ ACD+ New FDs ABC rarr D ABD rarr C ACD rarr B BCD+ = BCD No new FDsAll FDs F+ = A rarr BC C rarr D A rarr D BC rarr D AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C ABC rarr D ABD rarrC ACD rarr B

FD Closure Questions

1) Compute all nontrivial FDs and list all keys of R

R = (ABCD)

F = AB rarr C C rarr D D rarr A

2) Compute all nontrivial FDs and list all candidate keys of R

R = (ABCD)

F = AB rarr C BC rarr D CD rarr A AD rarr B

Equivalent Sets of FDs

A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+ We can determine if F covers E by calculating X+ with respect to F for each FD X rarr Y in E and checking whether X+ includes Y

Two sets of FDs E and F are equivalent if E covers F and F covers E That is E+= F+

Minimal Set of FDs

A set of FDs F is minimal if it satisfies these conditions Every FD in F has a single attribute on its right-hand side We cannot replace any FD X rarr A in F with Y rarr A where Y

X and still have a set of dependencies that is equivalent to F

We cannot remove any FD and still have a set that is equivalent to F

A minimal sets of FDs is like a canonical( 规范的 ) form of FDs

Note that there may be many minimal covers for a given set of functional dependencies F

Normal Forms

A relation is in a particular normal form if it satisfies certain

normalization properties

There are several normal forms defined 1NF - First Normal Form 2NF - Second Normal Form 3NF - Third Normal Form BCNF - Boyce-Codd Normal Form 4NF - Fourth Normal Form 5NF - Fifth Normal Form

Each of these normal forms are stricter than the next

For example 3NF is better than 2NF because it removes more redundancyanomalies from the schema than 2NF

Normal Forms

All relations (non-normalized)

1NF (First Normal Form)

2NF (Second Normal Form)

3NF (Third Normal Form)

BCNF (Boyce-Codd NF)

4NF (Fourth NF)

5NF (Fifth NF)

First Normal Form (1NF)

A relation is in first normal form (1NF) if all its attribute values are atomic

That is a 1NF relation cannot have an attribute value that is a set of values (multi-valued attribute) a set of tuples (nested relation)

1NF is a standard assumption in relational DBMSs However object-oriented DBMSs and nested relational DBMSs relax t

his constraint

A relation that is not in 1NF is an unnormalized relation

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1P2

AnalystAnalyst

246

E3 A Lee P3P4

ConsultantEngineer

1048

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1 P2 Analyst Analyst 24 6

E3 A Lee P3P4 Consultant Engineer 10 48

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

A non-1NF Relation

One way to convert a non-1NF relation to a 1NF relation

1) Splitting Method - Divide the existing relation into two relations one with non-repeating attributes and the other with repeating attributes

Make a new relation consisting of the primary key of the original relation and the repeating attributes Determine a primary key for this new relation

Remove the repeating attributes from the original relation

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1P2

AnalystAnalyst

246

E3 A Lee P3P4

ConsultantEngineer

1048

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

ENo EName

E1 J Doe

E2 M Smith

E3 A Lee

E4 J Miller

E5 B Casey

E6 L Chu

E7 R Davis

ENo PNo Resp DurationE1 P1 Manager 12E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36

Second Normal Form (2NF)

A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key

If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation

Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key

Second Normal Form (2NF) Example

fd1 and fd4 are partial functional dependencies Normalize to

Emp (eno ename title bdate salary supereno dno)

WorksOn (eno pno resp hours)

Proj (pno pname budget)

ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration

EmpProj relation

fd1 fd2

fd3fd4

Second Normal Form (2NF) Example (2)

ENo EName

BDate Title Salary SuperNo

DNo

Emp relation

fd1

fd2

fd3fd4

PNo PName Budget

Proj relationENo PNo Resp Duration

WorkOn relation

Third Normal Form (3NF)

Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th

an 2 FDs

A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key

Converting a relation to 3NF from 2NF involves the removal of transitive dependencies

Third Normal Form (3NF) Example

fd2 results in a transitive dependency eno rarr salary Remove it

ENo EName BDate Title Salary SuperNo DNo

Emp relation

fd1

fd2

fd2

Title Salary

Pay relation

ENo EName BDate Title SuperNo DNo

Emp relation

fd1

General Definitions of 2NF and 3NF

We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)

General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key

General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key

Note that a prime attribute is an attribute that is in any key (candidate or primary)

General Definition of 3NF Example

The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute

However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key

Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key

ENo EName BDate Title Salary SuperNo DNo SSN

Emp relation

fd1

fd2

Normalization Question

Consider the universal relation

R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ

List the keys for R

Decompose R into 2NF and then 3NF relations

Boyce-Codd Normal Form (BCNF)

A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key

To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key

The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

ClientInterview relation

Relation is in 3NF but not BCNF

fd4

fd3

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

Note that we lose the FD fd2

fd4

fd3

ClientNo InterviewDate

InterviwTime

StaffNo StaffNo InterviwDate

RoomNo

Interview relation StaffRoom relation

BCNF versus 3NF

We can decompose to BCNF but sometimes we do not want to if we lose a FD

The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency

Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)

In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)

Conversion to BCNF

There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y

Note that if any FD in F+ violates BCNF then a FD in F violates BCNF

2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey

3) Replace R by relations with schemas R1 = X+

R2 = (R ndash X+) U X

4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib

utes are all in R1 or R2

Normalization Question

Given this database schema normalize into BCNF

fd1

PID Country Lot Area Price TaxRate

fd2

fd4fd3

Multi-Valued Dependencies

A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema

A MVD occurs when two independent 1N relationships are in the relational schema

Multi-Valued Dependencies Example

Eno ename Title Salary Pno dno

E1 J Doe EE 30000 P1P2

E2 M Smith SA 50000 P1P4 D1D2D3

E3 A Lee ME 40000 P1P3 D2D3

E4 J Miller PR 20000 P3 D3

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1

E1 J Doe EE 30000 P2

E2 M Smith SA 50000 P1 D1

E2 M Smith SA 50000 P1 D2

E2 M Smith SA 50000 P1 D3

E2 M Smith SA 50000 P4 D1

E2 M Smith SA 50000 P4 D2

E2 M Smith SA 50000 P4 D3

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P3 D3

E3 A Lee ME 40000 P3 D3

E4 J Miller PR 20000 P3 D3

Multi-Valued Dependencies (MVDs)

A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C

A trivial MVD A mdashgtgt B occurs when either B is a subset of A or

A cup B = R

Fourth Normal Form (4NF)

Fourth normal form (4NF) is based on the idea of multi-valued dependencies

A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies

Fourth Normal Form (4NF) Example

eno mdashgtgtpno

eno mdashgtgtdno

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3

eno ename title salary

E1 J Doe EE 30000

E2 MSmith SA 50000

E3 A Lee ME 40000

E4 J Miller PR 20000

eno pno

E1 P1

E1 P2

E2 P1

E2 P4

E3 P1

E3 P3

E4 P3

eno dno

E2 D1

E2 D2

E2 D3

E3 D2

E3 D3

E4 D3

Lossless-join Dependency

The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation

A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined

Fifth Normal Form (5NF)

bull Fifth normal form (5NF) is based on join dependencies

A relation is in fifth normal form (5NF) if the relation has no join dependency

bull 5NF is also called project-join normal form (PJNF)

A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R

The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have

ΠR1(r) ΠR2(r) hellip ΠRn(r) = r

Fifth Normal Form (5NF) Example

Consider a relation Supply (sname partName projName)

1048715Add the additional constraint that

If project j requires part p

and supplier s supplies part p

and supplier s supplies at least one item to project j

Then supplier s also supplies part p to project j

Original Supply Relation Supply Relation with new constraintsname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

Fifth Normal Form (5NF) Example

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

sname partName

Smith Bolt

Smith Nut

Adam Bolt

Walton Nut

Adam Nail

sname projName

Smith X

Smith Y

Adam Y

Walton Z

Adam X

partName projName

Bolt X

Nut Y

Bolt Y

Nut Z

Nail X

Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples

Desirable Properties in normalization

Relational schemas that are well-designed have several

important properties

1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or

relationship

2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins

3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations

Normal Forms in Practice

Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used

For example query execution time may increase because of normalization as more joins become necessary to answer queries

Normal Forms in Practice Example

For example street and city uniquely determine a zipcode

In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed

When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly

fd1

ENo City Street ZipCode

fd2

ENo ZipCode

ZipCode City Street

Normal Forms in Practice Summary

In summary normalization is typically used in two ways

To improve on relational schemas after ER design

To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies

There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance

This is the classic time versus space tradeoff

Conclusion

Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies

Normal forms indicate when relations satisfy certain properties

1NF - All attributes are atomic

2NF - All attributes are fully functionally dependent on a key

3NF - There are no transitive dependencies in the relation

4NF - There are no multi-valued dependencies in the relation

5NF - The relation has no join dependency

In practice normalization is used to improve schemas produced after ER design and existing relational schemas

It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time

Objectives

Explain process of normalization and why it is beneficial

Describe the concept of the Universal Relation

What are the motivations for normalization

What are the 3 types of update anomalies Be able to give examples

Describe the lossless-join property and dependency preservation property

Define and explain the concept of a functional dependency

What is a trivial FD

Describe the relationship between FDs and keys

What are Armstrongs Axioms When are they used

What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs

Objectives (2)

Define minimal and covers for a set of FDs

Define normal forms be able to list the normal forms and draw a diagram showing their relationship

Define 1NF Know how to convert non-1NF relation to 1NF

Define 2NF Convert 1NF relation to 2NF

Define 3NF Convert 2NF relation to 3NF

Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example

Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm

Describe multi-valued dependencies When do they occur

What is a trivial MVD

Define 4NF Convert 3NFBCNF relation to 4NF

Objectives (3)

Describe lossless-join property and how it relates to 5NF

Define 5NF

Explain normalization in terms of the time versus space tradeoff

Explain the relationship between normalization and ER modeling

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
Page 35: Software School of Hunan University 2006.08 Database Systems Design Part III Normalization

Computing Attribute Closure Example

R (ABCDEG)

F = A rarr BC C rarr D D rarr G

Compute A+

A+ = A (initial step)

A+ = ABC (by using FD A rarr BC)

A+ = ABCD (by using FD C rarr D)

A+ = ABCDG (by using FD D rarr G)

Similarly we can compute C+ and EG+

C+ = CDG

EG+ = EG

Using Attribute Closure to Detect a FD

The attribute closure algorithm can be used to determine if a particular FD X rarr Y holds based on the set of given FDs F Compute X+ and see if Y is in X+

Example

R = (ABCDEG)

F = A rarr BC C rarr D D rarr G

Does the FD CD rarr G hold Compute CD+ = CDG Yes the FD CD rarr G holds

Does the FD BC rarr E hold Compute BC+ = BCDG No the FD BC rarr E does n

ot hold

Using Attribute Closure to Find all FDs

The attribute closure algorithm can be used to find all FDs of a given relation R if perform the attribute closure algorithm for all subsets X of the attributes R

This is an exponential algorithm in the number of attributes in R Although there are a few optimizations Do not compute the closure of the empty set or of the set of all attributes If we find X+ = all attributes do not compute the closure of any supersets of X Note that even though you do not have to compute the closure (because you know the answer) you still must use the known closure to infer new FDs

Using Attribute Closureto Find all FDs Example

Compute F+ from F where

R (ABCD)

F = A rarr BC C rarr D

Must compute all these attribute closures

A+ B+ C+ D+

AB+ AC+ AD+ BC+ BD+ CD+

ABC+ ABD+ ACD+ BCD+

Do not have to compute ABCD+

Start with the single sets

A+ = ABCD B+ = B C+ = CD D+ = D

New FD A rarr D

Using Attribute Closureto Find all FDs Example (cont)

Next the two element sets Do not compute AB+ AC+ AD+ because A+ contained all attributes of R New FDs AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C BC+ = BCD BD+ = BD CD+ = CD New FD BC rarr DThree element sets Do not compute ABC+ ABD+ ACD+ New FDs ABC rarr D ABD rarr C ACD rarr B BCD+ = BCD No new FDsAll FDs F+ = A rarr BC C rarr D A rarr D BC rarr D AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C ABC rarr D ABD rarrC ACD rarr B

FD Closure Questions

1) Compute all nontrivial FDs and list all keys of R

R = (ABCD)

F = AB rarr C C rarr D D rarr A

2) Compute all nontrivial FDs and list all candidate keys of R

R = (ABCD)

F = AB rarr C BC rarr D CD rarr A AD rarr B

Equivalent Sets of FDs

A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+ We can determine if F covers E by calculating X+ with respect to F for each FD X rarr Y in E and checking whether X+ includes Y

Two sets of FDs E and F are equivalent if E covers F and F covers E That is E+= F+

Minimal Set of FDs

A set of FDs F is minimal if it satisfies these conditions Every FD in F has a single attribute on its right-hand side We cannot replace any FD X rarr A in F with Y rarr A where Y

X and still have a set of dependencies that is equivalent to F

We cannot remove any FD and still have a set that is equivalent to F

A minimal sets of FDs is like a canonical( 规范的 ) form of FDs

Note that there may be many minimal covers for a given set of functional dependencies F

Normal Forms

A relation is in a particular normal form if it satisfies certain

normalization properties

There are several normal forms defined 1NF - First Normal Form 2NF - Second Normal Form 3NF - Third Normal Form BCNF - Boyce-Codd Normal Form 4NF - Fourth Normal Form 5NF - Fifth Normal Form

Each of these normal forms are stricter than the next

For example 3NF is better than 2NF because it removes more redundancyanomalies from the schema than 2NF

Normal Forms

All relations (non-normalized)

1NF (First Normal Form)

2NF (Second Normal Form)

3NF (Third Normal Form)

BCNF (Boyce-Codd NF)

4NF (Fourth NF)

5NF (Fifth NF)

First Normal Form (1NF)

A relation is in first normal form (1NF) if all its attribute values are atomic

That is a 1NF relation cannot have an attribute value that is a set of values (multi-valued attribute) a set of tuples (nested relation)

1NF is a standard assumption in relational DBMSs However object-oriented DBMSs and nested relational DBMSs relax t

his constraint

A relation that is not in 1NF is an unnormalized relation

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1P2

AnalystAnalyst

246

E3 A Lee P3P4

ConsultantEngineer

1048

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1 P2 Analyst Analyst 24 6

E3 A Lee P3P4 Consultant Engineer 10 48

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

A non-1NF Relation

One way to convert a non-1NF relation to a 1NF relation

1) Splitting Method - Divide the existing relation into two relations one with non-repeating attributes and the other with repeating attributes

Make a new relation consisting of the primary key of the original relation and the repeating attributes Determine a primary key for this new relation

Remove the repeating attributes from the original relation

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1P2

AnalystAnalyst

246

E3 A Lee P3P4

ConsultantEngineer

1048

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

ENo EName

E1 J Doe

E2 M Smith

E3 A Lee

E4 J Miller

E5 B Casey

E6 L Chu

E7 R Davis

ENo PNo Resp DurationE1 P1 Manager 12E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36

Second Normal Form (2NF)

A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key

If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation

Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key

Second Normal Form (2NF) Example

fd1 and fd4 are partial functional dependencies Normalize to

Emp (eno ename title bdate salary supereno dno)

WorksOn (eno pno resp hours)

Proj (pno pname budget)

ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration

EmpProj relation

fd1 fd2

fd3fd4

Second Normal Form (2NF) Example (2)

ENo EName

BDate Title Salary SuperNo

DNo

Emp relation

fd1

fd2

fd3fd4

PNo PName Budget

Proj relationENo PNo Resp Duration

WorkOn relation

Third Normal Form (3NF)

Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th

an 2 FDs

A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key

Converting a relation to 3NF from 2NF involves the removal of transitive dependencies

Third Normal Form (3NF) Example

fd2 results in a transitive dependency eno rarr salary Remove it

ENo EName BDate Title Salary SuperNo DNo

Emp relation

fd1

fd2

fd2

Title Salary

Pay relation

ENo EName BDate Title SuperNo DNo

Emp relation

fd1

General Definitions of 2NF and 3NF

We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)

General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key

General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key

Note that a prime attribute is an attribute that is in any key (candidate or primary)

General Definition of 3NF Example

The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute

However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key

Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key

ENo EName BDate Title Salary SuperNo DNo SSN

Emp relation

fd1

fd2

Normalization Question

Consider the universal relation

R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ

List the keys for R

Decompose R into 2NF and then 3NF relations

Boyce-Codd Normal Form (BCNF)

A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key

To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key

The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

ClientInterview relation

Relation is in 3NF but not BCNF

fd4

fd3

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

Note that we lose the FD fd2

fd4

fd3

ClientNo InterviewDate

InterviwTime

StaffNo StaffNo InterviwDate

RoomNo

Interview relation StaffRoom relation

BCNF versus 3NF

We can decompose to BCNF but sometimes we do not want to if we lose a FD

The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency

Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)

In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)

Conversion to BCNF

There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y

Note that if any FD in F+ violates BCNF then a FD in F violates BCNF

2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey

3) Replace R by relations with schemas R1 = X+

R2 = (R ndash X+) U X

4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib

utes are all in R1 or R2

Normalization Question

Given this database schema normalize into BCNF

fd1

PID Country Lot Area Price TaxRate

fd2

fd4fd3

Multi-Valued Dependencies

A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema

A MVD occurs when two independent 1N relationships are in the relational schema

Multi-Valued Dependencies Example

Eno ename Title Salary Pno dno

E1 J Doe EE 30000 P1P2

E2 M Smith SA 50000 P1P4 D1D2D3

E3 A Lee ME 40000 P1P3 D2D3

E4 J Miller PR 20000 P3 D3

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1

E1 J Doe EE 30000 P2

E2 M Smith SA 50000 P1 D1

E2 M Smith SA 50000 P1 D2

E2 M Smith SA 50000 P1 D3

E2 M Smith SA 50000 P4 D1

E2 M Smith SA 50000 P4 D2

E2 M Smith SA 50000 P4 D3

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P3 D3

E3 A Lee ME 40000 P3 D3

E4 J Miller PR 20000 P3 D3

Multi-Valued Dependencies (MVDs)

A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C

A trivial MVD A mdashgtgt B occurs when either B is a subset of A or

A cup B = R

Fourth Normal Form (4NF)

Fourth normal form (4NF) is based on the idea of multi-valued dependencies

A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies

Fourth Normal Form (4NF) Example

eno mdashgtgtpno

eno mdashgtgtdno

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3

eno ename title salary

E1 J Doe EE 30000

E2 MSmith SA 50000

E3 A Lee ME 40000

E4 J Miller PR 20000

eno pno

E1 P1

E1 P2

E2 P1

E2 P4

E3 P1

E3 P3

E4 P3

eno dno

E2 D1

E2 D2

E2 D3

E3 D2

E3 D3

E4 D3

Lossless-join Dependency

The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation

A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined

Fifth Normal Form (5NF)

bull Fifth normal form (5NF) is based on join dependencies

A relation is in fifth normal form (5NF) if the relation has no join dependency

bull 5NF is also called project-join normal form (PJNF)

A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R

The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have

ΠR1(r) ΠR2(r) hellip ΠRn(r) = r

Fifth Normal Form (5NF) Example

Consider a relation Supply (sname partName projName)

1048715Add the additional constraint that

If project j requires part p

and supplier s supplies part p

and supplier s supplies at least one item to project j

Then supplier s also supplies part p to project j

Original Supply Relation Supply Relation with new constraintsname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

Fifth Normal Form (5NF) Example

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

sname partName

Smith Bolt

Smith Nut

Adam Bolt

Walton Nut

Adam Nail

sname projName

Smith X

Smith Y

Adam Y

Walton Z

Adam X

partName projName

Bolt X

Nut Y

Bolt Y

Nut Z

Nail X

Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples

Desirable Properties in normalization

Relational schemas that are well-designed have several

important properties

1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or

relationship

2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins

3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations

Normal Forms in Practice

Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used

For example query execution time may increase because of normalization as more joins become necessary to answer queries

Normal Forms in Practice Example

For example street and city uniquely determine a zipcode

In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed

When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly

fd1

ENo City Street ZipCode

fd2

ENo ZipCode

ZipCode City Street

Normal Forms in Practice Summary

In summary normalization is typically used in two ways

To improve on relational schemas after ER design

To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies

There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance

This is the classic time versus space tradeoff

Conclusion

Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies

Normal forms indicate when relations satisfy certain properties

1NF - All attributes are atomic

2NF - All attributes are fully functionally dependent on a key

3NF - There are no transitive dependencies in the relation

4NF - There are no multi-valued dependencies in the relation

5NF - The relation has no join dependency

In practice normalization is used to improve schemas produced after ER design and existing relational schemas

It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time

Objectives

Explain process of normalization and why it is beneficial

Describe the concept of the Universal Relation

What are the motivations for normalization

What are the 3 types of update anomalies Be able to give examples

Describe the lossless-join property and dependency preservation property

Define and explain the concept of a functional dependency

What is a trivial FD

Describe the relationship between FDs and keys

What are Armstrongs Axioms When are they used

What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs

Objectives (2)

Define minimal and covers for a set of FDs

Define normal forms be able to list the normal forms and draw a diagram showing their relationship

Define 1NF Know how to convert non-1NF relation to 1NF

Define 2NF Convert 1NF relation to 2NF

Define 3NF Convert 2NF relation to 3NF

Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example

Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm

Describe multi-valued dependencies When do they occur

What is a trivial MVD

Define 4NF Convert 3NFBCNF relation to 4NF

Objectives (3)

Describe lossless-join property and how it relates to 5NF

Define 5NF

Explain normalization in terms of the time versus space tradeoff

Explain the relationship between normalization and ER modeling

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
Page 36: Software School of Hunan University 2006.08 Database Systems Design Part III Normalization

Using Attribute Closure to Detect a FD

The attribute closure algorithm can be used to determine if a particular FD X rarr Y holds based on the set of given FDs F Compute X+ and see if Y is in X+

Example

R = (ABCDEG)

F = A rarr BC C rarr D D rarr G

Does the FD CD rarr G hold Compute CD+ = CDG Yes the FD CD rarr G holds

Does the FD BC rarr E hold Compute BC+ = BCDG No the FD BC rarr E does n

ot hold

Using Attribute Closure to Find all FDs

The attribute closure algorithm can be used to find all FDs of a given relation R if perform the attribute closure algorithm for all subsets X of the attributes R

This is an exponential algorithm in the number of attributes in R Although there are a few optimizations Do not compute the closure of the empty set or of the set of all attributes If we find X+ = all attributes do not compute the closure of any supersets of X Note that even though you do not have to compute the closure (because you know the answer) you still must use the known closure to infer new FDs

Using Attribute Closureto Find all FDs Example

Compute F+ from F where

R (ABCD)

F = A rarr BC C rarr D

Must compute all these attribute closures

A+ B+ C+ D+

AB+ AC+ AD+ BC+ BD+ CD+

ABC+ ABD+ ACD+ BCD+

Do not have to compute ABCD+

Start with the single sets

A+ = ABCD B+ = B C+ = CD D+ = D

New FD A rarr D

Using Attribute Closureto Find all FDs Example (cont)

Next the two element sets Do not compute AB+ AC+ AD+ because A+ contained all attributes of R New FDs AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C BC+ = BCD BD+ = BD CD+ = CD New FD BC rarr DThree element sets Do not compute ABC+ ABD+ ACD+ New FDs ABC rarr D ABD rarr C ACD rarr B BCD+ = BCD No new FDsAll FDs F+ = A rarr BC C rarr D A rarr D BC rarr D AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C ABC rarr D ABD rarrC ACD rarr B

FD Closure Questions

1) Compute all nontrivial FDs and list all keys of R

R = (ABCD)

F = AB rarr C C rarr D D rarr A

2) Compute all nontrivial FDs and list all candidate keys of R

R = (ABCD)

F = AB rarr C BC rarr D CD rarr A AD rarr B

Equivalent Sets of FDs

A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+ We can determine if F covers E by calculating X+ with respect to F for each FD X rarr Y in E and checking whether X+ includes Y

Two sets of FDs E and F are equivalent if E covers F and F covers E That is E+= F+

Minimal Set of FDs

A set of FDs F is minimal if it satisfies these conditions Every FD in F has a single attribute on its right-hand side We cannot replace any FD X rarr A in F with Y rarr A where Y

X and still have a set of dependencies that is equivalent to F

We cannot remove any FD and still have a set that is equivalent to F

A minimal sets of FDs is like a canonical( 规范的 ) form of FDs

Note that there may be many minimal covers for a given set of functional dependencies F

Normal Forms

A relation is in a particular normal form if it satisfies certain

normalization properties

There are several normal forms defined 1NF - First Normal Form 2NF - Second Normal Form 3NF - Third Normal Form BCNF - Boyce-Codd Normal Form 4NF - Fourth Normal Form 5NF - Fifth Normal Form

Each of these normal forms are stricter than the next

For example 3NF is better than 2NF because it removes more redundancyanomalies from the schema than 2NF

Normal Forms

All relations (non-normalized)

1NF (First Normal Form)

2NF (Second Normal Form)

3NF (Third Normal Form)

BCNF (Boyce-Codd NF)

4NF (Fourth NF)

5NF (Fifth NF)

First Normal Form (1NF)

A relation is in first normal form (1NF) if all its attribute values are atomic

That is a 1NF relation cannot have an attribute value that is a set of values (multi-valued attribute) a set of tuples (nested relation)

1NF is a standard assumption in relational DBMSs However object-oriented DBMSs and nested relational DBMSs relax t

his constraint

A relation that is not in 1NF is an unnormalized relation

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1P2

AnalystAnalyst

246

E3 A Lee P3P4

ConsultantEngineer

1048

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1 P2 Analyst Analyst 24 6

E3 A Lee P3P4 Consultant Engineer 10 48

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

A non-1NF Relation

One way to convert a non-1NF relation to a 1NF relation

1) Splitting Method - Divide the existing relation into two relations one with non-repeating attributes and the other with repeating attributes

Make a new relation consisting of the primary key of the original relation and the repeating attributes Determine a primary key for this new relation

Remove the repeating attributes from the original relation

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1P2

AnalystAnalyst

246

E3 A Lee P3P4

ConsultantEngineer

1048

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

ENo EName

E1 J Doe

E2 M Smith

E3 A Lee

E4 J Miller

E5 B Casey

E6 L Chu

E7 R Davis

ENo PNo Resp DurationE1 P1 Manager 12E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36

Second Normal Form (2NF)

A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key

If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation

Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key

Second Normal Form (2NF) Example

fd1 and fd4 are partial functional dependencies Normalize to

Emp (eno ename title bdate salary supereno dno)

WorksOn (eno pno resp hours)

Proj (pno pname budget)

ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration

EmpProj relation

fd1 fd2

fd3fd4

Second Normal Form (2NF) Example (2)

ENo EName

BDate Title Salary SuperNo

DNo

Emp relation

fd1

fd2

fd3fd4

PNo PName Budget

Proj relationENo PNo Resp Duration

WorkOn relation

Third Normal Form (3NF)

Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th

an 2 FDs

A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key

Converting a relation to 3NF from 2NF involves the removal of transitive dependencies

Third Normal Form (3NF) Example

fd2 results in a transitive dependency eno rarr salary Remove it

ENo EName BDate Title Salary SuperNo DNo

Emp relation

fd1

fd2

fd2

Title Salary

Pay relation

ENo EName BDate Title SuperNo DNo

Emp relation

fd1

General Definitions of 2NF and 3NF

We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)

General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key

General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key

Note that a prime attribute is an attribute that is in any key (candidate or primary)

General Definition of 3NF Example

The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute

However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key

Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key

ENo EName BDate Title Salary SuperNo DNo SSN

Emp relation

fd1

fd2

Normalization Question

Consider the universal relation

R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ

List the keys for R

Decompose R into 2NF and then 3NF relations

Boyce-Codd Normal Form (BCNF)

A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key

To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key

The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

ClientInterview relation

Relation is in 3NF but not BCNF

fd4

fd3

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

Note that we lose the FD fd2

fd4

fd3

ClientNo InterviewDate

InterviwTime

StaffNo StaffNo InterviwDate

RoomNo

Interview relation StaffRoom relation

BCNF versus 3NF

We can decompose to BCNF but sometimes we do not want to if we lose a FD

The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency

Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)

In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)

Conversion to BCNF

There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y

Note that if any FD in F+ violates BCNF then a FD in F violates BCNF

2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey

3) Replace R by relations with schemas R1 = X+

R2 = (R ndash X+) U X

4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib

utes are all in R1 or R2

Normalization Question

Given this database schema normalize into BCNF

fd1

PID Country Lot Area Price TaxRate

fd2

fd4fd3

Multi-Valued Dependencies

A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema

A MVD occurs when two independent 1N relationships are in the relational schema

Multi-Valued Dependencies Example

Eno ename Title Salary Pno dno

E1 J Doe EE 30000 P1P2

E2 M Smith SA 50000 P1P4 D1D2D3

E3 A Lee ME 40000 P1P3 D2D3

E4 J Miller PR 20000 P3 D3

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1

E1 J Doe EE 30000 P2

E2 M Smith SA 50000 P1 D1

E2 M Smith SA 50000 P1 D2

E2 M Smith SA 50000 P1 D3

E2 M Smith SA 50000 P4 D1

E2 M Smith SA 50000 P4 D2

E2 M Smith SA 50000 P4 D3

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P3 D3

E3 A Lee ME 40000 P3 D3

E4 J Miller PR 20000 P3 D3

Multi-Valued Dependencies (MVDs)

A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C

A trivial MVD A mdashgtgt B occurs when either B is a subset of A or

A cup B = R

Fourth Normal Form (4NF)

Fourth normal form (4NF) is based on the idea of multi-valued dependencies

A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies

Fourth Normal Form (4NF) Example

eno mdashgtgtpno

eno mdashgtgtdno

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3

eno ename title salary

E1 J Doe EE 30000

E2 MSmith SA 50000

E3 A Lee ME 40000

E4 J Miller PR 20000

eno pno

E1 P1

E1 P2

E2 P1

E2 P4

E3 P1

E3 P3

E4 P3

eno dno

E2 D1

E2 D2

E2 D3

E3 D2

E3 D3

E4 D3

Lossless-join Dependency

The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation

A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined

Fifth Normal Form (5NF)

bull Fifth normal form (5NF) is based on join dependencies

A relation is in fifth normal form (5NF) if the relation has no join dependency

bull 5NF is also called project-join normal form (PJNF)

A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R

The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have

ΠR1(r) ΠR2(r) hellip ΠRn(r) = r

Fifth Normal Form (5NF) Example

Consider a relation Supply (sname partName projName)

1048715Add the additional constraint that

If project j requires part p

and supplier s supplies part p

and supplier s supplies at least one item to project j

Then supplier s also supplies part p to project j

Original Supply Relation Supply Relation with new constraintsname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

Fifth Normal Form (5NF) Example

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

sname partName

Smith Bolt

Smith Nut

Adam Bolt

Walton Nut

Adam Nail

sname projName

Smith X

Smith Y

Adam Y

Walton Z

Adam X

partName projName

Bolt X

Nut Y

Bolt Y

Nut Z

Nail X

Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples

Desirable Properties in normalization

Relational schemas that are well-designed have several

important properties

1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or

relationship

2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins

3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations

Normal Forms in Practice

Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used

For example query execution time may increase because of normalization as more joins become necessary to answer queries

Normal Forms in Practice Example

For example street and city uniquely determine a zipcode

In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed

When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly

fd1

ENo City Street ZipCode

fd2

ENo ZipCode

ZipCode City Street

Normal Forms in Practice Summary

In summary normalization is typically used in two ways

To improve on relational schemas after ER design

To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies

There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance

This is the classic time versus space tradeoff

Conclusion

Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies

Normal forms indicate when relations satisfy certain properties

1NF - All attributes are atomic

2NF - All attributes are fully functionally dependent on a key

3NF - There are no transitive dependencies in the relation

4NF - There are no multi-valued dependencies in the relation

5NF - The relation has no join dependency

In practice normalization is used to improve schemas produced after ER design and existing relational schemas

It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time

Objectives

Explain process of normalization and why it is beneficial

Describe the concept of the Universal Relation

What are the motivations for normalization

What are the 3 types of update anomalies Be able to give examples

Describe the lossless-join property and dependency preservation property

Define and explain the concept of a functional dependency

What is a trivial FD

Describe the relationship between FDs and keys

What are Armstrongs Axioms When are they used

What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs

Objectives (2)

Define minimal and covers for a set of FDs

Define normal forms be able to list the normal forms and draw a diagram showing their relationship

Define 1NF Know how to convert non-1NF relation to 1NF

Define 2NF Convert 1NF relation to 2NF

Define 3NF Convert 2NF relation to 3NF

Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example

Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm

Describe multi-valued dependencies When do they occur

What is a trivial MVD

Define 4NF Convert 3NFBCNF relation to 4NF

Objectives (3)

Describe lossless-join property and how it relates to 5NF

Define 5NF

Explain normalization in terms of the time versus space tradeoff

Explain the relationship between normalization and ER modeling

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
Page 37: Software School of Hunan University 2006.08 Database Systems Design Part III Normalization

Using Attribute Closure to Find all FDs

The attribute closure algorithm can be used to find all FDs of a given relation R if perform the attribute closure algorithm for all subsets X of the attributes R

This is an exponential algorithm in the number of attributes in R Although there are a few optimizations Do not compute the closure of the empty set or of the set of all attributes If we find X+ = all attributes do not compute the closure of any supersets of X Note that even though you do not have to compute the closure (because you know the answer) you still must use the known closure to infer new FDs

Using Attribute Closureto Find all FDs Example

Compute F+ from F where

R (ABCD)

F = A rarr BC C rarr D

Must compute all these attribute closures

A+ B+ C+ D+

AB+ AC+ AD+ BC+ BD+ CD+

ABC+ ABD+ ACD+ BCD+

Do not have to compute ABCD+

Start with the single sets

A+ = ABCD B+ = B C+ = CD D+ = D

New FD A rarr D

Using Attribute Closureto Find all FDs Example (cont)

Next the two element sets Do not compute AB+ AC+ AD+ because A+ contained all attributes of R New FDs AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C BC+ = BCD BD+ = BD CD+ = CD New FD BC rarr DThree element sets Do not compute ABC+ ABD+ ACD+ New FDs ABC rarr D ABD rarr C ACD rarr B BCD+ = BCD No new FDsAll FDs F+ = A rarr BC C rarr D A rarr D BC rarr D AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C ABC rarr D ABD rarrC ACD rarr B

FD Closure Questions

1) Compute all nontrivial FDs and list all keys of R

R = (ABCD)

F = AB rarr C C rarr D D rarr A

2) Compute all nontrivial FDs and list all candidate keys of R

R = (ABCD)

F = AB rarr C BC rarr D CD rarr A AD rarr B

Equivalent Sets of FDs

A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+ We can determine if F covers E by calculating X+ with respect to F for each FD X rarr Y in E and checking whether X+ includes Y

Two sets of FDs E and F are equivalent if E covers F and F covers E That is E+= F+

Minimal Set of FDs

A set of FDs F is minimal if it satisfies these conditions Every FD in F has a single attribute on its right-hand side We cannot replace any FD X rarr A in F with Y rarr A where Y

X and still have a set of dependencies that is equivalent to F

We cannot remove any FD and still have a set that is equivalent to F

A minimal sets of FDs is like a canonical( 规范的 ) form of FDs

Note that there may be many minimal covers for a given set of functional dependencies F

Normal Forms

A relation is in a particular normal form if it satisfies certain

normalization properties

There are several normal forms defined 1NF - First Normal Form 2NF - Second Normal Form 3NF - Third Normal Form BCNF - Boyce-Codd Normal Form 4NF - Fourth Normal Form 5NF - Fifth Normal Form

Each of these normal forms are stricter than the next

For example 3NF is better than 2NF because it removes more redundancyanomalies from the schema than 2NF

Normal Forms

All relations (non-normalized)

1NF (First Normal Form)

2NF (Second Normal Form)

3NF (Third Normal Form)

BCNF (Boyce-Codd NF)

4NF (Fourth NF)

5NF (Fifth NF)

First Normal Form (1NF)

A relation is in first normal form (1NF) if all its attribute values are atomic

That is a 1NF relation cannot have an attribute value that is a set of values (multi-valued attribute) a set of tuples (nested relation)

1NF is a standard assumption in relational DBMSs However object-oriented DBMSs and nested relational DBMSs relax t

his constraint

A relation that is not in 1NF is an unnormalized relation

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1P2

AnalystAnalyst

246

E3 A Lee P3P4

ConsultantEngineer

1048

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1 P2 Analyst Analyst 24 6

E3 A Lee P3P4 Consultant Engineer 10 48

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

A non-1NF Relation

One way to convert a non-1NF relation to a 1NF relation

1) Splitting Method - Divide the existing relation into two relations one with non-repeating attributes and the other with repeating attributes

Make a new relation consisting of the primary key of the original relation and the repeating attributes Determine a primary key for this new relation

Remove the repeating attributes from the original relation

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1P2

AnalystAnalyst

246

E3 A Lee P3P4

ConsultantEngineer

1048

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

ENo EName

E1 J Doe

E2 M Smith

E3 A Lee

E4 J Miller

E5 B Casey

E6 L Chu

E7 R Davis

ENo PNo Resp DurationE1 P1 Manager 12E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36

Second Normal Form (2NF)

A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key

If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation

Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key

Second Normal Form (2NF) Example

fd1 and fd4 are partial functional dependencies Normalize to

Emp (eno ename title bdate salary supereno dno)

WorksOn (eno pno resp hours)

Proj (pno pname budget)

ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration

EmpProj relation

fd1 fd2

fd3fd4

Second Normal Form (2NF) Example (2)

ENo EName

BDate Title Salary SuperNo

DNo

Emp relation

fd1

fd2

fd3fd4

PNo PName Budget

Proj relationENo PNo Resp Duration

WorkOn relation

Third Normal Form (3NF)

Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th

an 2 FDs

A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key

Converting a relation to 3NF from 2NF involves the removal of transitive dependencies

Third Normal Form (3NF) Example

fd2 results in a transitive dependency eno rarr salary Remove it

ENo EName BDate Title Salary SuperNo DNo

Emp relation

fd1

fd2

fd2

Title Salary

Pay relation

ENo EName BDate Title SuperNo DNo

Emp relation

fd1

General Definitions of 2NF and 3NF

We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)

General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key

General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key

Note that a prime attribute is an attribute that is in any key (candidate or primary)

General Definition of 3NF Example

The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute

However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key

Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key

ENo EName BDate Title Salary SuperNo DNo SSN

Emp relation

fd1

fd2

Normalization Question

Consider the universal relation

R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ

List the keys for R

Decompose R into 2NF and then 3NF relations

Boyce-Codd Normal Form (BCNF)

A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key

To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key

The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

ClientInterview relation

Relation is in 3NF but not BCNF

fd4

fd3

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

Note that we lose the FD fd2

fd4

fd3

ClientNo InterviewDate

InterviwTime

StaffNo StaffNo InterviwDate

RoomNo

Interview relation StaffRoom relation

BCNF versus 3NF

We can decompose to BCNF but sometimes we do not want to if we lose a FD

The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency

Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)

In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)

Conversion to BCNF

There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y

Note that if any FD in F+ violates BCNF then a FD in F violates BCNF

2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey

3) Replace R by relations with schemas R1 = X+

R2 = (R ndash X+) U X

4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib

utes are all in R1 or R2

Normalization Question

Given this database schema normalize into BCNF

fd1

PID Country Lot Area Price TaxRate

fd2

fd4fd3

Multi-Valued Dependencies

A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema

A MVD occurs when two independent 1N relationships are in the relational schema

Multi-Valued Dependencies Example

Eno ename Title Salary Pno dno

E1 J Doe EE 30000 P1P2

E2 M Smith SA 50000 P1P4 D1D2D3

E3 A Lee ME 40000 P1P3 D2D3

E4 J Miller PR 20000 P3 D3

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1

E1 J Doe EE 30000 P2

E2 M Smith SA 50000 P1 D1

E2 M Smith SA 50000 P1 D2

E2 M Smith SA 50000 P1 D3

E2 M Smith SA 50000 P4 D1

E2 M Smith SA 50000 P4 D2

E2 M Smith SA 50000 P4 D3

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P3 D3

E3 A Lee ME 40000 P3 D3

E4 J Miller PR 20000 P3 D3

Multi-Valued Dependencies (MVDs)

A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C

A trivial MVD A mdashgtgt B occurs when either B is a subset of A or

A cup B = R

Fourth Normal Form (4NF)

Fourth normal form (4NF) is based on the idea of multi-valued dependencies

A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies

Fourth Normal Form (4NF) Example

eno mdashgtgtpno

eno mdashgtgtdno

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3

eno ename title salary

E1 J Doe EE 30000

E2 MSmith SA 50000

E3 A Lee ME 40000

E4 J Miller PR 20000

eno pno

E1 P1

E1 P2

E2 P1

E2 P4

E3 P1

E3 P3

E4 P3

eno dno

E2 D1

E2 D2

E2 D3

E3 D2

E3 D3

E4 D3

Lossless-join Dependency

The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation

A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined

Fifth Normal Form (5NF)

bull Fifth normal form (5NF) is based on join dependencies

A relation is in fifth normal form (5NF) if the relation has no join dependency

bull 5NF is also called project-join normal form (PJNF)

A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R

The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have

ΠR1(r) ΠR2(r) hellip ΠRn(r) = r

Fifth Normal Form (5NF) Example

Consider a relation Supply (sname partName projName)

1048715Add the additional constraint that

If project j requires part p

and supplier s supplies part p

and supplier s supplies at least one item to project j

Then supplier s also supplies part p to project j

Original Supply Relation Supply Relation with new constraintsname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

Fifth Normal Form (5NF) Example

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

sname partName

Smith Bolt

Smith Nut

Adam Bolt

Walton Nut

Adam Nail

sname projName

Smith X

Smith Y

Adam Y

Walton Z

Adam X

partName projName

Bolt X

Nut Y

Bolt Y

Nut Z

Nail X

Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples

Desirable Properties in normalization

Relational schemas that are well-designed have several

important properties

1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or

relationship

2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins

3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations

Normal Forms in Practice

Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used

For example query execution time may increase because of normalization as more joins become necessary to answer queries

Normal Forms in Practice Example

For example street and city uniquely determine a zipcode

In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed

When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly

fd1

ENo City Street ZipCode

fd2

ENo ZipCode

ZipCode City Street

Normal Forms in Practice Summary

In summary normalization is typically used in two ways

To improve on relational schemas after ER design

To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies

There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance

This is the classic time versus space tradeoff

Conclusion

Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies

Normal forms indicate when relations satisfy certain properties

1NF - All attributes are atomic

2NF - All attributes are fully functionally dependent on a key

3NF - There are no transitive dependencies in the relation

4NF - There are no multi-valued dependencies in the relation

5NF - The relation has no join dependency

In practice normalization is used to improve schemas produced after ER design and existing relational schemas

It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time

Objectives

Explain process of normalization and why it is beneficial

Describe the concept of the Universal Relation

What are the motivations for normalization

What are the 3 types of update anomalies Be able to give examples

Describe the lossless-join property and dependency preservation property

Define and explain the concept of a functional dependency

What is a trivial FD

Describe the relationship between FDs and keys

What are Armstrongs Axioms When are they used

What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs

Objectives (2)

Define minimal and covers for a set of FDs

Define normal forms be able to list the normal forms and draw a diagram showing their relationship

Define 1NF Know how to convert non-1NF relation to 1NF

Define 2NF Convert 1NF relation to 2NF

Define 3NF Convert 2NF relation to 3NF

Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example

Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm

Describe multi-valued dependencies When do they occur

What is a trivial MVD

Define 4NF Convert 3NFBCNF relation to 4NF

Objectives (3)

Describe lossless-join property and how it relates to 5NF

Define 5NF

Explain normalization in terms of the time versus space tradeoff

Explain the relationship between normalization and ER modeling

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
Page 38: Software School of Hunan University 2006.08 Database Systems Design Part III Normalization

Using Attribute Closureto Find all FDs Example

Compute F+ from F where

R (ABCD)

F = A rarr BC C rarr D

Must compute all these attribute closures

A+ B+ C+ D+

AB+ AC+ AD+ BC+ BD+ CD+

ABC+ ABD+ ACD+ BCD+

Do not have to compute ABCD+

Start with the single sets

A+ = ABCD B+ = B C+ = CD D+ = D

New FD A rarr D

Using Attribute Closureto Find all FDs Example (cont)

Next the two element sets Do not compute AB+ AC+ AD+ because A+ contained all attributes of R New FDs AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C BC+ = BCD BD+ = BD CD+ = CD New FD BC rarr DThree element sets Do not compute ABC+ ABD+ ACD+ New FDs ABC rarr D ABD rarr C ACD rarr B BCD+ = BCD No new FDsAll FDs F+ = A rarr BC C rarr D A rarr D BC rarr D AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C ABC rarr D ABD rarrC ACD rarr B

FD Closure Questions

1) Compute all nontrivial FDs and list all keys of R

R = (ABCD)

F = AB rarr C C rarr D D rarr A

2) Compute all nontrivial FDs and list all candidate keys of R

R = (ABCD)

F = AB rarr C BC rarr D CD rarr A AD rarr B

Equivalent Sets of FDs

A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+ We can determine if F covers E by calculating X+ with respect to F for each FD X rarr Y in E and checking whether X+ includes Y

Two sets of FDs E and F are equivalent if E covers F and F covers E That is E+= F+

Minimal Set of FDs

A set of FDs F is minimal if it satisfies these conditions Every FD in F has a single attribute on its right-hand side We cannot replace any FD X rarr A in F with Y rarr A where Y

X and still have a set of dependencies that is equivalent to F

We cannot remove any FD and still have a set that is equivalent to F

A minimal sets of FDs is like a canonical( 规范的 ) form of FDs

Note that there may be many minimal covers for a given set of functional dependencies F

Normal Forms

A relation is in a particular normal form if it satisfies certain

normalization properties

There are several normal forms defined 1NF - First Normal Form 2NF - Second Normal Form 3NF - Third Normal Form BCNF - Boyce-Codd Normal Form 4NF - Fourth Normal Form 5NF - Fifth Normal Form

Each of these normal forms are stricter than the next

For example 3NF is better than 2NF because it removes more redundancyanomalies from the schema than 2NF

Normal Forms

All relations (non-normalized)

1NF (First Normal Form)

2NF (Second Normal Form)

3NF (Third Normal Form)

BCNF (Boyce-Codd NF)

4NF (Fourth NF)

5NF (Fifth NF)

First Normal Form (1NF)

A relation is in first normal form (1NF) if all its attribute values are atomic

That is a 1NF relation cannot have an attribute value that is a set of values (multi-valued attribute) a set of tuples (nested relation)

1NF is a standard assumption in relational DBMSs However object-oriented DBMSs and nested relational DBMSs relax t

his constraint

A relation that is not in 1NF is an unnormalized relation

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1P2

AnalystAnalyst

246

E3 A Lee P3P4

ConsultantEngineer

1048

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1 P2 Analyst Analyst 24 6

E3 A Lee P3P4 Consultant Engineer 10 48

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

A non-1NF Relation

One way to convert a non-1NF relation to a 1NF relation

1) Splitting Method - Divide the existing relation into two relations one with non-repeating attributes and the other with repeating attributes

Make a new relation consisting of the primary key of the original relation and the repeating attributes Determine a primary key for this new relation

Remove the repeating attributes from the original relation

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1P2

AnalystAnalyst

246

E3 A Lee P3P4

ConsultantEngineer

1048

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

ENo EName

E1 J Doe

E2 M Smith

E3 A Lee

E4 J Miller

E5 B Casey

E6 L Chu

E7 R Davis

ENo PNo Resp DurationE1 P1 Manager 12E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36

Second Normal Form (2NF)

A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key

If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation

Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key

Second Normal Form (2NF) Example

fd1 and fd4 are partial functional dependencies Normalize to

Emp (eno ename title bdate salary supereno dno)

WorksOn (eno pno resp hours)

Proj (pno pname budget)

ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration

EmpProj relation

fd1 fd2

fd3fd4

Second Normal Form (2NF) Example (2)

ENo EName

BDate Title Salary SuperNo

DNo

Emp relation

fd1

fd2

fd3fd4

PNo PName Budget

Proj relationENo PNo Resp Duration

WorkOn relation

Third Normal Form (3NF)

Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th

an 2 FDs

A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key

Converting a relation to 3NF from 2NF involves the removal of transitive dependencies

Third Normal Form (3NF) Example

fd2 results in a transitive dependency eno rarr salary Remove it

ENo EName BDate Title Salary SuperNo DNo

Emp relation

fd1

fd2

fd2

Title Salary

Pay relation

ENo EName BDate Title SuperNo DNo

Emp relation

fd1

General Definitions of 2NF and 3NF

We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)

General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key

General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key

Note that a prime attribute is an attribute that is in any key (candidate or primary)

General Definition of 3NF Example

The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute

However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key

Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key

ENo EName BDate Title Salary SuperNo DNo SSN

Emp relation

fd1

fd2

Normalization Question

Consider the universal relation

R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ

List the keys for R

Decompose R into 2NF and then 3NF relations

Boyce-Codd Normal Form (BCNF)

A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key

To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key

The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

ClientInterview relation

Relation is in 3NF but not BCNF

fd4

fd3

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

Note that we lose the FD fd2

fd4

fd3

ClientNo InterviewDate

InterviwTime

StaffNo StaffNo InterviwDate

RoomNo

Interview relation StaffRoom relation

BCNF versus 3NF

We can decompose to BCNF but sometimes we do not want to if we lose a FD

The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency

Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)

In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)

Conversion to BCNF

There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y

Note that if any FD in F+ violates BCNF then a FD in F violates BCNF

2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey

3) Replace R by relations with schemas R1 = X+

R2 = (R ndash X+) U X

4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib

utes are all in R1 or R2

Normalization Question

Given this database schema normalize into BCNF

fd1

PID Country Lot Area Price TaxRate

fd2

fd4fd3

Multi-Valued Dependencies

A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema

A MVD occurs when two independent 1N relationships are in the relational schema

Multi-Valued Dependencies Example

Eno ename Title Salary Pno dno

E1 J Doe EE 30000 P1P2

E2 M Smith SA 50000 P1P4 D1D2D3

E3 A Lee ME 40000 P1P3 D2D3

E4 J Miller PR 20000 P3 D3

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1

E1 J Doe EE 30000 P2

E2 M Smith SA 50000 P1 D1

E2 M Smith SA 50000 P1 D2

E2 M Smith SA 50000 P1 D3

E2 M Smith SA 50000 P4 D1

E2 M Smith SA 50000 P4 D2

E2 M Smith SA 50000 P4 D3

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P3 D3

E3 A Lee ME 40000 P3 D3

E4 J Miller PR 20000 P3 D3

Multi-Valued Dependencies (MVDs)

A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C

A trivial MVD A mdashgtgt B occurs when either B is a subset of A or

A cup B = R

Fourth Normal Form (4NF)

Fourth normal form (4NF) is based on the idea of multi-valued dependencies

A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies

Fourth Normal Form (4NF) Example

eno mdashgtgtpno

eno mdashgtgtdno

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3

eno ename title salary

E1 J Doe EE 30000

E2 MSmith SA 50000

E3 A Lee ME 40000

E4 J Miller PR 20000

eno pno

E1 P1

E1 P2

E2 P1

E2 P4

E3 P1

E3 P3

E4 P3

eno dno

E2 D1

E2 D2

E2 D3

E3 D2

E3 D3

E4 D3

Lossless-join Dependency

The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation

A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined

Fifth Normal Form (5NF)

bull Fifth normal form (5NF) is based on join dependencies

A relation is in fifth normal form (5NF) if the relation has no join dependency

bull 5NF is also called project-join normal form (PJNF)

A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R

The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have

ΠR1(r) ΠR2(r) hellip ΠRn(r) = r

Fifth Normal Form (5NF) Example

Consider a relation Supply (sname partName projName)

1048715Add the additional constraint that

If project j requires part p

and supplier s supplies part p

and supplier s supplies at least one item to project j

Then supplier s also supplies part p to project j

Original Supply Relation Supply Relation with new constraintsname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

Fifth Normal Form (5NF) Example

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

sname partName

Smith Bolt

Smith Nut

Adam Bolt

Walton Nut

Adam Nail

sname projName

Smith X

Smith Y

Adam Y

Walton Z

Adam X

partName projName

Bolt X

Nut Y

Bolt Y

Nut Z

Nail X

Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples

Desirable Properties in normalization

Relational schemas that are well-designed have several

important properties

1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or

relationship

2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins

3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations

Normal Forms in Practice

Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used

For example query execution time may increase because of normalization as more joins become necessary to answer queries

Normal Forms in Practice Example

For example street and city uniquely determine a zipcode

In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed

When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly

fd1

ENo City Street ZipCode

fd2

ENo ZipCode

ZipCode City Street

Normal Forms in Practice Summary

In summary normalization is typically used in two ways

To improve on relational schemas after ER design

To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies

There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance

This is the classic time versus space tradeoff

Conclusion

Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies

Normal forms indicate when relations satisfy certain properties

1NF - All attributes are atomic

2NF - All attributes are fully functionally dependent on a key

3NF - There are no transitive dependencies in the relation

4NF - There are no multi-valued dependencies in the relation

5NF - The relation has no join dependency

In practice normalization is used to improve schemas produced after ER design and existing relational schemas

It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time

Objectives

Explain process of normalization and why it is beneficial

Describe the concept of the Universal Relation

What are the motivations for normalization

What are the 3 types of update anomalies Be able to give examples

Describe the lossless-join property and dependency preservation property

Define and explain the concept of a functional dependency

What is a trivial FD

Describe the relationship between FDs and keys

What are Armstrongs Axioms When are they used

What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs

Objectives (2)

Define minimal and covers for a set of FDs

Define normal forms be able to list the normal forms and draw a diagram showing their relationship

Define 1NF Know how to convert non-1NF relation to 1NF

Define 2NF Convert 1NF relation to 2NF

Define 3NF Convert 2NF relation to 3NF

Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example

Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm

Describe multi-valued dependencies When do they occur

What is a trivial MVD

Define 4NF Convert 3NFBCNF relation to 4NF

Objectives (3)

Describe lossless-join property and how it relates to 5NF

Define 5NF

Explain normalization in terms of the time versus space tradeoff

Explain the relationship between normalization and ER modeling

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
Page 39: Software School of Hunan University 2006.08 Database Systems Design Part III Normalization

Using Attribute Closureto Find all FDs Example (cont)

Next the two element sets Do not compute AB+ AC+ AD+ because A+ contained all attributes of R New FDs AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C BC+ = BCD BD+ = BD CD+ = CD New FD BC rarr DThree element sets Do not compute ABC+ ABD+ ACD+ New FDs ABC rarr D ABD rarr C ACD rarr B BCD+ = BCD No new FDsAll FDs F+ = A rarr BC C rarr D A rarr D BC rarr D AB rarr C AB rarr D AC rarr B AC rarr D AD rarr B AD rarr C ABC rarr D ABD rarrC ACD rarr B

FD Closure Questions

1) Compute all nontrivial FDs and list all keys of R

R = (ABCD)

F = AB rarr C C rarr D D rarr A

2) Compute all nontrivial FDs and list all candidate keys of R

R = (ABCD)

F = AB rarr C BC rarr D CD rarr A AD rarr B

Equivalent Sets of FDs

A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+ We can determine if F covers E by calculating X+ with respect to F for each FD X rarr Y in E and checking whether X+ includes Y

Two sets of FDs E and F are equivalent if E covers F and F covers E That is E+= F+

Minimal Set of FDs

A set of FDs F is minimal if it satisfies these conditions Every FD in F has a single attribute on its right-hand side We cannot replace any FD X rarr A in F with Y rarr A where Y

X and still have a set of dependencies that is equivalent to F

We cannot remove any FD and still have a set that is equivalent to F

A minimal sets of FDs is like a canonical( 规范的 ) form of FDs

Note that there may be many minimal covers for a given set of functional dependencies F

Normal Forms

A relation is in a particular normal form if it satisfies certain

normalization properties

There are several normal forms defined 1NF - First Normal Form 2NF - Second Normal Form 3NF - Third Normal Form BCNF - Boyce-Codd Normal Form 4NF - Fourth Normal Form 5NF - Fifth Normal Form

Each of these normal forms are stricter than the next

For example 3NF is better than 2NF because it removes more redundancyanomalies from the schema than 2NF

Normal Forms

All relations (non-normalized)

1NF (First Normal Form)

2NF (Second Normal Form)

3NF (Third Normal Form)

BCNF (Boyce-Codd NF)

4NF (Fourth NF)

5NF (Fifth NF)

First Normal Form (1NF)

A relation is in first normal form (1NF) if all its attribute values are atomic

That is a 1NF relation cannot have an attribute value that is a set of values (multi-valued attribute) a set of tuples (nested relation)

1NF is a standard assumption in relational DBMSs However object-oriented DBMSs and nested relational DBMSs relax t

his constraint

A relation that is not in 1NF is an unnormalized relation

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1P2

AnalystAnalyst

246

E3 A Lee P3P4

ConsultantEngineer

1048

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1 P2 Analyst Analyst 24 6

E3 A Lee P3P4 Consultant Engineer 10 48

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

A non-1NF Relation

One way to convert a non-1NF relation to a 1NF relation

1) Splitting Method - Divide the existing relation into two relations one with non-repeating attributes and the other with repeating attributes

Make a new relation consisting of the primary key of the original relation and the repeating attributes Determine a primary key for this new relation

Remove the repeating attributes from the original relation

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1P2

AnalystAnalyst

246

E3 A Lee P3P4

ConsultantEngineer

1048

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

ENo EName

E1 J Doe

E2 M Smith

E3 A Lee

E4 J Miller

E5 B Casey

E6 L Chu

E7 R Davis

ENo PNo Resp DurationE1 P1 Manager 12E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36

Second Normal Form (2NF)

A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key

If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation

Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key

Second Normal Form (2NF) Example

fd1 and fd4 are partial functional dependencies Normalize to

Emp (eno ename title bdate salary supereno dno)

WorksOn (eno pno resp hours)

Proj (pno pname budget)

ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration

EmpProj relation

fd1 fd2

fd3fd4

Second Normal Form (2NF) Example (2)

ENo EName

BDate Title Salary SuperNo

DNo

Emp relation

fd1

fd2

fd3fd4

PNo PName Budget

Proj relationENo PNo Resp Duration

WorkOn relation

Third Normal Form (3NF)

Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th

an 2 FDs

A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key

Converting a relation to 3NF from 2NF involves the removal of transitive dependencies

Third Normal Form (3NF) Example

fd2 results in a transitive dependency eno rarr salary Remove it

ENo EName BDate Title Salary SuperNo DNo

Emp relation

fd1

fd2

fd2

Title Salary

Pay relation

ENo EName BDate Title SuperNo DNo

Emp relation

fd1

General Definitions of 2NF and 3NF

We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)

General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key

General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key

Note that a prime attribute is an attribute that is in any key (candidate or primary)

General Definition of 3NF Example

The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute

However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key

Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key

ENo EName BDate Title Salary SuperNo DNo SSN

Emp relation

fd1

fd2

Normalization Question

Consider the universal relation

R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ

List the keys for R

Decompose R into 2NF and then 3NF relations

Boyce-Codd Normal Form (BCNF)

A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key

To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key

The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

ClientInterview relation

Relation is in 3NF but not BCNF

fd4

fd3

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

Note that we lose the FD fd2

fd4

fd3

ClientNo InterviewDate

InterviwTime

StaffNo StaffNo InterviwDate

RoomNo

Interview relation StaffRoom relation

BCNF versus 3NF

We can decompose to BCNF but sometimes we do not want to if we lose a FD

The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency

Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)

In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)

Conversion to BCNF

There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y

Note that if any FD in F+ violates BCNF then a FD in F violates BCNF

2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey

3) Replace R by relations with schemas R1 = X+

R2 = (R ndash X+) U X

4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib

utes are all in R1 or R2

Normalization Question

Given this database schema normalize into BCNF

fd1

PID Country Lot Area Price TaxRate

fd2

fd4fd3

Multi-Valued Dependencies

A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema

A MVD occurs when two independent 1N relationships are in the relational schema

Multi-Valued Dependencies Example

Eno ename Title Salary Pno dno

E1 J Doe EE 30000 P1P2

E2 M Smith SA 50000 P1P4 D1D2D3

E3 A Lee ME 40000 P1P3 D2D3

E4 J Miller PR 20000 P3 D3

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1

E1 J Doe EE 30000 P2

E2 M Smith SA 50000 P1 D1

E2 M Smith SA 50000 P1 D2

E2 M Smith SA 50000 P1 D3

E2 M Smith SA 50000 P4 D1

E2 M Smith SA 50000 P4 D2

E2 M Smith SA 50000 P4 D3

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P3 D3

E3 A Lee ME 40000 P3 D3

E4 J Miller PR 20000 P3 D3

Multi-Valued Dependencies (MVDs)

A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C

A trivial MVD A mdashgtgt B occurs when either B is a subset of A or

A cup B = R

Fourth Normal Form (4NF)

Fourth normal form (4NF) is based on the idea of multi-valued dependencies

A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies

Fourth Normal Form (4NF) Example

eno mdashgtgtpno

eno mdashgtgtdno

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3

eno ename title salary

E1 J Doe EE 30000

E2 MSmith SA 50000

E3 A Lee ME 40000

E4 J Miller PR 20000

eno pno

E1 P1

E1 P2

E2 P1

E2 P4

E3 P1

E3 P3

E4 P3

eno dno

E2 D1

E2 D2

E2 D3

E3 D2

E3 D3

E4 D3

Lossless-join Dependency

The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation

A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined

Fifth Normal Form (5NF)

bull Fifth normal form (5NF) is based on join dependencies

A relation is in fifth normal form (5NF) if the relation has no join dependency

bull 5NF is also called project-join normal form (PJNF)

A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R

The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have

ΠR1(r) ΠR2(r) hellip ΠRn(r) = r

Fifth Normal Form (5NF) Example

Consider a relation Supply (sname partName projName)

1048715Add the additional constraint that

If project j requires part p

and supplier s supplies part p

and supplier s supplies at least one item to project j

Then supplier s also supplies part p to project j

Original Supply Relation Supply Relation with new constraintsname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

Fifth Normal Form (5NF) Example

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

sname partName

Smith Bolt

Smith Nut

Adam Bolt

Walton Nut

Adam Nail

sname projName

Smith X

Smith Y

Adam Y

Walton Z

Adam X

partName projName

Bolt X

Nut Y

Bolt Y

Nut Z

Nail X

Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples

Desirable Properties in normalization

Relational schemas that are well-designed have several

important properties

1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or

relationship

2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins

3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations

Normal Forms in Practice

Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used

For example query execution time may increase because of normalization as more joins become necessary to answer queries

Normal Forms in Practice Example

For example street and city uniquely determine a zipcode

In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed

When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly

fd1

ENo City Street ZipCode

fd2

ENo ZipCode

ZipCode City Street

Normal Forms in Practice Summary

In summary normalization is typically used in two ways

To improve on relational schemas after ER design

To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies

There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance

This is the classic time versus space tradeoff

Conclusion

Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies

Normal forms indicate when relations satisfy certain properties

1NF - All attributes are atomic

2NF - All attributes are fully functionally dependent on a key

3NF - There are no transitive dependencies in the relation

4NF - There are no multi-valued dependencies in the relation

5NF - The relation has no join dependency

In practice normalization is used to improve schemas produced after ER design and existing relational schemas

It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time

Objectives

Explain process of normalization and why it is beneficial

Describe the concept of the Universal Relation

What are the motivations for normalization

What are the 3 types of update anomalies Be able to give examples

Describe the lossless-join property and dependency preservation property

Define and explain the concept of a functional dependency

What is a trivial FD

Describe the relationship between FDs and keys

What are Armstrongs Axioms When are they used

What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs

Objectives (2)

Define minimal and covers for a set of FDs

Define normal forms be able to list the normal forms and draw a diagram showing their relationship

Define 1NF Know how to convert non-1NF relation to 1NF

Define 2NF Convert 1NF relation to 2NF

Define 3NF Convert 2NF relation to 3NF

Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example

Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm

Describe multi-valued dependencies When do they occur

What is a trivial MVD

Define 4NF Convert 3NFBCNF relation to 4NF

Objectives (3)

Describe lossless-join property and how it relates to 5NF

Define 5NF

Explain normalization in terms of the time versus space tradeoff

Explain the relationship between normalization and ER modeling

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
Page 40: Software School of Hunan University 2006.08 Database Systems Design Part III Normalization

FD Closure Questions

1) Compute all nontrivial FDs and list all keys of R

R = (ABCD)

F = AB rarr C C rarr D D rarr A

2) Compute all nontrivial FDs and list all candidate keys of R

R = (ABCD)

F = AB rarr C BC rarr D CD rarr A AD rarr B

Equivalent Sets of FDs

A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+ We can determine if F covers E by calculating X+ with respect to F for each FD X rarr Y in E and checking whether X+ includes Y

Two sets of FDs E and F are equivalent if E covers F and F covers E That is E+= F+

Minimal Set of FDs

A set of FDs F is minimal if it satisfies these conditions Every FD in F has a single attribute on its right-hand side We cannot replace any FD X rarr A in F with Y rarr A where Y

X and still have a set of dependencies that is equivalent to F

We cannot remove any FD and still have a set that is equivalent to F

A minimal sets of FDs is like a canonical( 规范的 ) form of FDs

Note that there may be many minimal covers for a given set of functional dependencies F

Normal Forms

A relation is in a particular normal form if it satisfies certain

normalization properties

There are several normal forms defined 1NF - First Normal Form 2NF - Second Normal Form 3NF - Third Normal Form BCNF - Boyce-Codd Normal Form 4NF - Fourth Normal Form 5NF - Fifth Normal Form

Each of these normal forms are stricter than the next

For example 3NF is better than 2NF because it removes more redundancyanomalies from the schema than 2NF

Normal Forms

All relations (non-normalized)

1NF (First Normal Form)

2NF (Second Normal Form)

3NF (Third Normal Form)

BCNF (Boyce-Codd NF)

4NF (Fourth NF)

5NF (Fifth NF)

First Normal Form (1NF)

A relation is in first normal form (1NF) if all its attribute values are atomic

That is a 1NF relation cannot have an attribute value that is a set of values (multi-valued attribute) a set of tuples (nested relation)

1NF is a standard assumption in relational DBMSs However object-oriented DBMSs and nested relational DBMSs relax t

his constraint

A relation that is not in 1NF is an unnormalized relation

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1P2

AnalystAnalyst

246

E3 A Lee P3P4

ConsultantEngineer

1048

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1 P2 Analyst Analyst 24 6

E3 A Lee P3P4 Consultant Engineer 10 48

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

A non-1NF Relation

One way to convert a non-1NF relation to a 1NF relation

1) Splitting Method - Divide the existing relation into two relations one with non-repeating attributes and the other with repeating attributes

Make a new relation consisting of the primary key of the original relation and the repeating attributes Determine a primary key for this new relation

Remove the repeating attributes from the original relation

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1P2

AnalystAnalyst

246

E3 A Lee P3P4

ConsultantEngineer

1048

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

ENo EName

E1 J Doe

E2 M Smith

E3 A Lee

E4 J Miller

E5 B Casey

E6 L Chu

E7 R Davis

ENo PNo Resp DurationE1 P1 Manager 12E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36

Second Normal Form (2NF)

A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key

If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation

Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key

Second Normal Form (2NF) Example

fd1 and fd4 are partial functional dependencies Normalize to

Emp (eno ename title bdate salary supereno dno)

WorksOn (eno pno resp hours)

Proj (pno pname budget)

ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration

EmpProj relation

fd1 fd2

fd3fd4

Second Normal Form (2NF) Example (2)

ENo EName

BDate Title Salary SuperNo

DNo

Emp relation

fd1

fd2

fd3fd4

PNo PName Budget

Proj relationENo PNo Resp Duration

WorkOn relation

Third Normal Form (3NF)

Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th

an 2 FDs

A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key

Converting a relation to 3NF from 2NF involves the removal of transitive dependencies

Third Normal Form (3NF) Example

fd2 results in a transitive dependency eno rarr salary Remove it

ENo EName BDate Title Salary SuperNo DNo

Emp relation

fd1

fd2

fd2

Title Salary

Pay relation

ENo EName BDate Title SuperNo DNo

Emp relation

fd1

General Definitions of 2NF and 3NF

We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)

General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key

General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key

Note that a prime attribute is an attribute that is in any key (candidate or primary)

General Definition of 3NF Example

The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute

However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key

Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key

ENo EName BDate Title Salary SuperNo DNo SSN

Emp relation

fd1

fd2

Normalization Question

Consider the universal relation

R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ

List the keys for R

Decompose R into 2NF and then 3NF relations

Boyce-Codd Normal Form (BCNF)

A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key

To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key

The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

ClientInterview relation

Relation is in 3NF but not BCNF

fd4

fd3

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

Note that we lose the FD fd2

fd4

fd3

ClientNo InterviewDate

InterviwTime

StaffNo StaffNo InterviwDate

RoomNo

Interview relation StaffRoom relation

BCNF versus 3NF

We can decompose to BCNF but sometimes we do not want to if we lose a FD

The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency

Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)

In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)

Conversion to BCNF

There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y

Note that if any FD in F+ violates BCNF then a FD in F violates BCNF

2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey

3) Replace R by relations with schemas R1 = X+

R2 = (R ndash X+) U X

4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib

utes are all in R1 or R2

Normalization Question

Given this database schema normalize into BCNF

fd1

PID Country Lot Area Price TaxRate

fd2

fd4fd3

Multi-Valued Dependencies

A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema

A MVD occurs when two independent 1N relationships are in the relational schema

Multi-Valued Dependencies Example

Eno ename Title Salary Pno dno

E1 J Doe EE 30000 P1P2

E2 M Smith SA 50000 P1P4 D1D2D3

E3 A Lee ME 40000 P1P3 D2D3

E4 J Miller PR 20000 P3 D3

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1

E1 J Doe EE 30000 P2

E2 M Smith SA 50000 P1 D1

E2 M Smith SA 50000 P1 D2

E2 M Smith SA 50000 P1 D3

E2 M Smith SA 50000 P4 D1

E2 M Smith SA 50000 P4 D2

E2 M Smith SA 50000 P4 D3

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P3 D3

E3 A Lee ME 40000 P3 D3

E4 J Miller PR 20000 P3 D3

Multi-Valued Dependencies (MVDs)

A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C

A trivial MVD A mdashgtgt B occurs when either B is a subset of A or

A cup B = R

Fourth Normal Form (4NF)

Fourth normal form (4NF) is based on the idea of multi-valued dependencies

A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies

Fourth Normal Form (4NF) Example

eno mdashgtgtpno

eno mdashgtgtdno

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3

eno ename title salary

E1 J Doe EE 30000

E2 MSmith SA 50000

E3 A Lee ME 40000

E4 J Miller PR 20000

eno pno

E1 P1

E1 P2

E2 P1

E2 P4

E3 P1

E3 P3

E4 P3

eno dno

E2 D1

E2 D2

E2 D3

E3 D2

E3 D3

E4 D3

Lossless-join Dependency

The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation

A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined

Fifth Normal Form (5NF)

bull Fifth normal form (5NF) is based on join dependencies

A relation is in fifth normal form (5NF) if the relation has no join dependency

bull 5NF is also called project-join normal form (PJNF)

A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R

The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have

ΠR1(r) ΠR2(r) hellip ΠRn(r) = r

Fifth Normal Form (5NF) Example

Consider a relation Supply (sname partName projName)

1048715Add the additional constraint that

If project j requires part p

and supplier s supplies part p

and supplier s supplies at least one item to project j

Then supplier s also supplies part p to project j

Original Supply Relation Supply Relation with new constraintsname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

Fifth Normal Form (5NF) Example

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

sname partName

Smith Bolt

Smith Nut

Adam Bolt

Walton Nut

Adam Nail

sname projName

Smith X

Smith Y

Adam Y

Walton Z

Adam X

partName projName

Bolt X

Nut Y

Bolt Y

Nut Z

Nail X

Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples

Desirable Properties in normalization

Relational schemas that are well-designed have several

important properties

1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or

relationship

2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins

3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations

Normal Forms in Practice

Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used

For example query execution time may increase because of normalization as more joins become necessary to answer queries

Normal Forms in Practice Example

For example street and city uniquely determine a zipcode

In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed

When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly

fd1

ENo City Street ZipCode

fd2

ENo ZipCode

ZipCode City Street

Normal Forms in Practice Summary

In summary normalization is typically used in two ways

To improve on relational schemas after ER design

To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies

There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance

This is the classic time versus space tradeoff

Conclusion

Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies

Normal forms indicate when relations satisfy certain properties

1NF - All attributes are atomic

2NF - All attributes are fully functionally dependent on a key

3NF - There are no transitive dependencies in the relation

4NF - There are no multi-valued dependencies in the relation

5NF - The relation has no join dependency

In practice normalization is used to improve schemas produced after ER design and existing relational schemas

It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time

Objectives

Explain process of normalization and why it is beneficial

Describe the concept of the Universal Relation

What are the motivations for normalization

What are the 3 types of update anomalies Be able to give examples

Describe the lossless-join property and dependency preservation property

Define and explain the concept of a functional dependency

What is a trivial FD

Describe the relationship between FDs and keys

What are Armstrongs Axioms When are they used

What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs

Objectives (2)

Define minimal and covers for a set of FDs

Define normal forms be able to list the normal forms and draw a diagram showing their relationship

Define 1NF Know how to convert non-1NF relation to 1NF

Define 2NF Convert 1NF relation to 2NF

Define 3NF Convert 2NF relation to 3NF

Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example

Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm

Describe multi-valued dependencies When do they occur

What is a trivial MVD

Define 4NF Convert 3NFBCNF relation to 4NF

Objectives (3)

Describe lossless-join property and how it relates to 5NF

Define 5NF

Explain normalization in terms of the time versus space tradeoff

Explain the relationship between normalization and ER modeling

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
Page 41: Software School of Hunan University 2006.08 Database Systems Design Part III Normalization

Equivalent Sets of FDs

A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+ We can determine if F covers E by calculating X+ with respect to F for each FD X rarr Y in E and checking whether X+ includes Y

Two sets of FDs E and F are equivalent if E covers F and F covers E That is E+= F+

Minimal Set of FDs

A set of FDs F is minimal if it satisfies these conditions Every FD in F has a single attribute on its right-hand side We cannot replace any FD X rarr A in F with Y rarr A where Y

X and still have a set of dependencies that is equivalent to F

We cannot remove any FD and still have a set that is equivalent to F

A minimal sets of FDs is like a canonical( 规范的 ) form of FDs

Note that there may be many minimal covers for a given set of functional dependencies F

Normal Forms

A relation is in a particular normal form if it satisfies certain

normalization properties

There are several normal forms defined 1NF - First Normal Form 2NF - Second Normal Form 3NF - Third Normal Form BCNF - Boyce-Codd Normal Form 4NF - Fourth Normal Form 5NF - Fifth Normal Form

Each of these normal forms are stricter than the next

For example 3NF is better than 2NF because it removes more redundancyanomalies from the schema than 2NF

Normal Forms

All relations (non-normalized)

1NF (First Normal Form)

2NF (Second Normal Form)

3NF (Third Normal Form)

BCNF (Boyce-Codd NF)

4NF (Fourth NF)

5NF (Fifth NF)

First Normal Form (1NF)

A relation is in first normal form (1NF) if all its attribute values are atomic

That is a 1NF relation cannot have an attribute value that is a set of values (multi-valued attribute) a set of tuples (nested relation)

1NF is a standard assumption in relational DBMSs However object-oriented DBMSs and nested relational DBMSs relax t

his constraint

A relation that is not in 1NF is an unnormalized relation

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1P2

AnalystAnalyst

246

E3 A Lee P3P4

ConsultantEngineer

1048

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1 P2 Analyst Analyst 24 6

E3 A Lee P3P4 Consultant Engineer 10 48

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

A non-1NF Relation

One way to convert a non-1NF relation to a 1NF relation

1) Splitting Method - Divide the existing relation into two relations one with non-repeating attributes and the other with repeating attributes

Make a new relation consisting of the primary key of the original relation and the repeating attributes Determine a primary key for this new relation

Remove the repeating attributes from the original relation

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1P2

AnalystAnalyst

246

E3 A Lee P3P4

ConsultantEngineer

1048

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

ENo EName

E1 J Doe

E2 M Smith

E3 A Lee

E4 J Miller

E5 B Casey

E6 L Chu

E7 R Davis

ENo PNo Resp DurationE1 P1 Manager 12E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36

Second Normal Form (2NF)

A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key

If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation

Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key

Second Normal Form (2NF) Example

fd1 and fd4 are partial functional dependencies Normalize to

Emp (eno ename title bdate salary supereno dno)

WorksOn (eno pno resp hours)

Proj (pno pname budget)

ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration

EmpProj relation

fd1 fd2

fd3fd4

Second Normal Form (2NF) Example (2)

ENo EName

BDate Title Salary SuperNo

DNo

Emp relation

fd1

fd2

fd3fd4

PNo PName Budget

Proj relationENo PNo Resp Duration

WorkOn relation

Third Normal Form (3NF)

Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th

an 2 FDs

A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key

Converting a relation to 3NF from 2NF involves the removal of transitive dependencies

Third Normal Form (3NF) Example

fd2 results in a transitive dependency eno rarr salary Remove it

ENo EName BDate Title Salary SuperNo DNo

Emp relation

fd1

fd2

fd2

Title Salary

Pay relation

ENo EName BDate Title SuperNo DNo

Emp relation

fd1

General Definitions of 2NF and 3NF

We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)

General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key

General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key

Note that a prime attribute is an attribute that is in any key (candidate or primary)

General Definition of 3NF Example

The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute

However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key

Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key

ENo EName BDate Title Salary SuperNo DNo SSN

Emp relation

fd1

fd2

Normalization Question

Consider the universal relation

R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ

List the keys for R

Decompose R into 2NF and then 3NF relations

Boyce-Codd Normal Form (BCNF)

A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key

To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key

The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

ClientInterview relation

Relation is in 3NF but not BCNF

fd4

fd3

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

Note that we lose the FD fd2

fd4

fd3

ClientNo InterviewDate

InterviwTime

StaffNo StaffNo InterviwDate

RoomNo

Interview relation StaffRoom relation

BCNF versus 3NF

We can decompose to BCNF but sometimes we do not want to if we lose a FD

The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency

Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)

In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)

Conversion to BCNF

There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y

Note that if any FD in F+ violates BCNF then a FD in F violates BCNF

2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey

3) Replace R by relations with schemas R1 = X+

R2 = (R ndash X+) U X

4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib

utes are all in R1 or R2

Normalization Question

Given this database schema normalize into BCNF

fd1

PID Country Lot Area Price TaxRate

fd2

fd4fd3

Multi-Valued Dependencies

A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema

A MVD occurs when two independent 1N relationships are in the relational schema

Multi-Valued Dependencies Example

Eno ename Title Salary Pno dno

E1 J Doe EE 30000 P1P2

E2 M Smith SA 50000 P1P4 D1D2D3

E3 A Lee ME 40000 P1P3 D2D3

E4 J Miller PR 20000 P3 D3

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1

E1 J Doe EE 30000 P2

E2 M Smith SA 50000 P1 D1

E2 M Smith SA 50000 P1 D2

E2 M Smith SA 50000 P1 D3

E2 M Smith SA 50000 P4 D1

E2 M Smith SA 50000 P4 D2

E2 M Smith SA 50000 P4 D3

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P3 D3

E3 A Lee ME 40000 P3 D3

E4 J Miller PR 20000 P3 D3

Multi-Valued Dependencies (MVDs)

A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C

A trivial MVD A mdashgtgt B occurs when either B is a subset of A or

A cup B = R

Fourth Normal Form (4NF)

Fourth normal form (4NF) is based on the idea of multi-valued dependencies

A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies

Fourth Normal Form (4NF) Example

eno mdashgtgtpno

eno mdashgtgtdno

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3

eno ename title salary

E1 J Doe EE 30000

E2 MSmith SA 50000

E3 A Lee ME 40000

E4 J Miller PR 20000

eno pno

E1 P1

E1 P2

E2 P1

E2 P4

E3 P1

E3 P3

E4 P3

eno dno

E2 D1

E2 D2

E2 D3

E3 D2

E3 D3

E4 D3

Lossless-join Dependency

The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation

A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined

Fifth Normal Form (5NF)

bull Fifth normal form (5NF) is based on join dependencies

A relation is in fifth normal form (5NF) if the relation has no join dependency

bull 5NF is also called project-join normal form (PJNF)

A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R

The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have

ΠR1(r) ΠR2(r) hellip ΠRn(r) = r

Fifth Normal Form (5NF) Example

Consider a relation Supply (sname partName projName)

1048715Add the additional constraint that

If project j requires part p

and supplier s supplies part p

and supplier s supplies at least one item to project j

Then supplier s also supplies part p to project j

Original Supply Relation Supply Relation with new constraintsname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

Fifth Normal Form (5NF) Example

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

sname partName

Smith Bolt

Smith Nut

Adam Bolt

Walton Nut

Adam Nail

sname projName

Smith X

Smith Y

Adam Y

Walton Z

Adam X

partName projName

Bolt X

Nut Y

Bolt Y

Nut Z

Nail X

Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples

Desirable Properties in normalization

Relational schemas that are well-designed have several

important properties

1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or

relationship

2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins

3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations

Normal Forms in Practice

Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used

For example query execution time may increase because of normalization as more joins become necessary to answer queries

Normal Forms in Practice Example

For example street and city uniquely determine a zipcode

In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed

When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly

fd1

ENo City Street ZipCode

fd2

ENo ZipCode

ZipCode City Street

Normal Forms in Practice Summary

In summary normalization is typically used in two ways

To improve on relational schemas after ER design

To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies

There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance

This is the classic time versus space tradeoff

Conclusion

Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies

Normal forms indicate when relations satisfy certain properties

1NF - All attributes are atomic

2NF - All attributes are fully functionally dependent on a key

3NF - There are no transitive dependencies in the relation

4NF - There are no multi-valued dependencies in the relation

5NF - The relation has no join dependency

In practice normalization is used to improve schemas produced after ER design and existing relational schemas

It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time

Objectives

Explain process of normalization and why it is beneficial

Describe the concept of the Universal Relation

What are the motivations for normalization

What are the 3 types of update anomalies Be able to give examples

Describe the lossless-join property and dependency preservation property

Define and explain the concept of a functional dependency

What is a trivial FD

Describe the relationship between FDs and keys

What are Armstrongs Axioms When are they used

What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs

Objectives (2)

Define minimal and covers for a set of FDs

Define normal forms be able to list the normal forms and draw a diagram showing their relationship

Define 1NF Know how to convert non-1NF relation to 1NF

Define 2NF Convert 1NF relation to 2NF

Define 3NF Convert 2NF relation to 3NF

Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example

Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm

Describe multi-valued dependencies When do they occur

What is a trivial MVD

Define 4NF Convert 3NFBCNF relation to 4NF

Objectives (3)

Describe lossless-join property and how it relates to 5NF

Define 5NF

Explain normalization in terms of the time versus space tradeoff

Explain the relationship between normalization and ER modeling

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
Page 42: Software School of Hunan University 2006.08 Database Systems Design Part III Normalization

Minimal Set of FDs

A set of FDs F is minimal if it satisfies these conditions Every FD in F has a single attribute on its right-hand side We cannot replace any FD X rarr A in F with Y rarr A where Y

X and still have a set of dependencies that is equivalent to F

We cannot remove any FD and still have a set that is equivalent to F

A minimal sets of FDs is like a canonical( 规范的 ) form of FDs

Note that there may be many minimal covers for a given set of functional dependencies F

Normal Forms

A relation is in a particular normal form if it satisfies certain

normalization properties

There are several normal forms defined 1NF - First Normal Form 2NF - Second Normal Form 3NF - Third Normal Form BCNF - Boyce-Codd Normal Form 4NF - Fourth Normal Form 5NF - Fifth Normal Form

Each of these normal forms are stricter than the next

For example 3NF is better than 2NF because it removes more redundancyanomalies from the schema than 2NF

Normal Forms

All relations (non-normalized)

1NF (First Normal Form)

2NF (Second Normal Form)

3NF (Third Normal Form)

BCNF (Boyce-Codd NF)

4NF (Fourth NF)

5NF (Fifth NF)

First Normal Form (1NF)

A relation is in first normal form (1NF) if all its attribute values are atomic

That is a 1NF relation cannot have an attribute value that is a set of values (multi-valued attribute) a set of tuples (nested relation)

1NF is a standard assumption in relational DBMSs However object-oriented DBMSs and nested relational DBMSs relax t

his constraint

A relation that is not in 1NF is an unnormalized relation

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1P2

AnalystAnalyst

246

E3 A Lee P3P4

ConsultantEngineer

1048

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1 P2 Analyst Analyst 24 6

E3 A Lee P3P4 Consultant Engineer 10 48

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

A non-1NF Relation

One way to convert a non-1NF relation to a 1NF relation

1) Splitting Method - Divide the existing relation into two relations one with non-repeating attributes and the other with repeating attributes

Make a new relation consisting of the primary key of the original relation and the repeating attributes Determine a primary key for this new relation

Remove the repeating attributes from the original relation

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1P2

AnalystAnalyst

246

E3 A Lee P3P4

ConsultantEngineer

1048

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

ENo EName

E1 J Doe

E2 M Smith

E3 A Lee

E4 J Miller

E5 B Casey

E6 L Chu

E7 R Davis

ENo PNo Resp DurationE1 P1 Manager 12E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36

Second Normal Form (2NF)

A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key

If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation

Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key

Second Normal Form (2NF) Example

fd1 and fd4 are partial functional dependencies Normalize to

Emp (eno ename title bdate salary supereno dno)

WorksOn (eno pno resp hours)

Proj (pno pname budget)

ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration

EmpProj relation

fd1 fd2

fd3fd4

Second Normal Form (2NF) Example (2)

ENo EName

BDate Title Salary SuperNo

DNo

Emp relation

fd1

fd2

fd3fd4

PNo PName Budget

Proj relationENo PNo Resp Duration

WorkOn relation

Third Normal Form (3NF)

Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th

an 2 FDs

A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key

Converting a relation to 3NF from 2NF involves the removal of transitive dependencies

Third Normal Form (3NF) Example

fd2 results in a transitive dependency eno rarr salary Remove it

ENo EName BDate Title Salary SuperNo DNo

Emp relation

fd1

fd2

fd2

Title Salary

Pay relation

ENo EName BDate Title SuperNo DNo

Emp relation

fd1

General Definitions of 2NF and 3NF

We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)

General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key

General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key

Note that a prime attribute is an attribute that is in any key (candidate or primary)

General Definition of 3NF Example

The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute

However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key

Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key

ENo EName BDate Title Salary SuperNo DNo SSN

Emp relation

fd1

fd2

Normalization Question

Consider the universal relation

R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ

List the keys for R

Decompose R into 2NF and then 3NF relations

Boyce-Codd Normal Form (BCNF)

A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key

To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key

The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

ClientInterview relation

Relation is in 3NF but not BCNF

fd4

fd3

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

Note that we lose the FD fd2

fd4

fd3

ClientNo InterviewDate

InterviwTime

StaffNo StaffNo InterviwDate

RoomNo

Interview relation StaffRoom relation

BCNF versus 3NF

We can decompose to BCNF but sometimes we do not want to if we lose a FD

The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency

Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)

In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)

Conversion to BCNF

There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y

Note that if any FD in F+ violates BCNF then a FD in F violates BCNF

2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey

3) Replace R by relations with schemas R1 = X+

R2 = (R ndash X+) U X

4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib

utes are all in R1 or R2

Normalization Question

Given this database schema normalize into BCNF

fd1

PID Country Lot Area Price TaxRate

fd2

fd4fd3

Multi-Valued Dependencies

A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema

A MVD occurs when two independent 1N relationships are in the relational schema

Multi-Valued Dependencies Example

Eno ename Title Salary Pno dno

E1 J Doe EE 30000 P1P2

E2 M Smith SA 50000 P1P4 D1D2D3

E3 A Lee ME 40000 P1P3 D2D3

E4 J Miller PR 20000 P3 D3

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1

E1 J Doe EE 30000 P2

E2 M Smith SA 50000 P1 D1

E2 M Smith SA 50000 P1 D2

E2 M Smith SA 50000 P1 D3

E2 M Smith SA 50000 P4 D1

E2 M Smith SA 50000 P4 D2

E2 M Smith SA 50000 P4 D3

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P3 D3

E3 A Lee ME 40000 P3 D3

E4 J Miller PR 20000 P3 D3

Multi-Valued Dependencies (MVDs)

A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C

A trivial MVD A mdashgtgt B occurs when either B is a subset of A or

A cup B = R

Fourth Normal Form (4NF)

Fourth normal form (4NF) is based on the idea of multi-valued dependencies

A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies

Fourth Normal Form (4NF) Example

eno mdashgtgtpno

eno mdashgtgtdno

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3

eno ename title salary

E1 J Doe EE 30000

E2 MSmith SA 50000

E3 A Lee ME 40000

E4 J Miller PR 20000

eno pno

E1 P1

E1 P2

E2 P1

E2 P4

E3 P1

E3 P3

E4 P3

eno dno

E2 D1

E2 D2

E2 D3

E3 D2

E3 D3

E4 D3

Lossless-join Dependency

The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation

A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined

Fifth Normal Form (5NF)

bull Fifth normal form (5NF) is based on join dependencies

A relation is in fifth normal form (5NF) if the relation has no join dependency

bull 5NF is also called project-join normal form (PJNF)

A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R

The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have

ΠR1(r) ΠR2(r) hellip ΠRn(r) = r

Fifth Normal Form (5NF) Example

Consider a relation Supply (sname partName projName)

1048715Add the additional constraint that

If project j requires part p

and supplier s supplies part p

and supplier s supplies at least one item to project j

Then supplier s also supplies part p to project j

Original Supply Relation Supply Relation with new constraintsname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

Fifth Normal Form (5NF) Example

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

sname partName

Smith Bolt

Smith Nut

Adam Bolt

Walton Nut

Adam Nail

sname projName

Smith X

Smith Y

Adam Y

Walton Z

Adam X

partName projName

Bolt X

Nut Y

Bolt Y

Nut Z

Nail X

Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples

Desirable Properties in normalization

Relational schemas that are well-designed have several

important properties

1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or

relationship

2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins

3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations

Normal Forms in Practice

Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used

For example query execution time may increase because of normalization as more joins become necessary to answer queries

Normal Forms in Practice Example

For example street and city uniquely determine a zipcode

In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed

When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly

fd1

ENo City Street ZipCode

fd2

ENo ZipCode

ZipCode City Street

Normal Forms in Practice Summary

In summary normalization is typically used in two ways

To improve on relational schemas after ER design

To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies

There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance

This is the classic time versus space tradeoff

Conclusion

Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies

Normal forms indicate when relations satisfy certain properties

1NF - All attributes are atomic

2NF - All attributes are fully functionally dependent on a key

3NF - There are no transitive dependencies in the relation

4NF - There are no multi-valued dependencies in the relation

5NF - The relation has no join dependency

In practice normalization is used to improve schemas produced after ER design and existing relational schemas

It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time

Objectives

Explain process of normalization and why it is beneficial

Describe the concept of the Universal Relation

What are the motivations for normalization

What are the 3 types of update anomalies Be able to give examples

Describe the lossless-join property and dependency preservation property

Define and explain the concept of a functional dependency

What is a trivial FD

Describe the relationship between FDs and keys

What are Armstrongs Axioms When are they used

What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs

Objectives (2)

Define minimal and covers for a set of FDs

Define normal forms be able to list the normal forms and draw a diagram showing their relationship

Define 1NF Know how to convert non-1NF relation to 1NF

Define 2NF Convert 1NF relation to 2NF

Define 3NF Convert 2NF relation to 3NF

Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example

Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm

Describe multi-valued dependencies When do they occur

What is a trivial MVD

Define 4NF Convert 3NFBCNF relation to 4NF

Objectives (3)

Describe lossless-join property and how it relates to 5NF

Define 5NF

Explain normalization in terms of the time versus space tradeoff

Explain the relationship between normalization and ER modeling

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
Page 43: Software School of Hunan University 2006.08 Database Systems Design Part III Normalization

Normal Forms

A relation is in a particular normal form if it satisfies certain

normalization properties

There are several normal forms defined 1NF - First Normal Form 2NF - Second Normal Form 3NF - Third Normal Form BCNF - Boyce-Codd Normal Form 4NF - Fourth Normal Form 5NF - Fifth Normal Form

Each of these normal forms are stricter than the next

For example 3NF is better than 2NF because it removes more redundancyanomalies from the schema than 2NF

Normal Forms

All relations (non-normalized)

1NF (First Normal Form)

2NF (Second Normal Form)

3NF (Third Normal Form)

BCNF (Boyce-Codd NF)

4NF (Fourth NF)

5NF (Fifth NF)

First Normal Form (1NF)

A relation is in first normal form (1NF) if all its attribute values are atomic

That is a 1NF relation cannot have an attribute value that is a set of values (multi-valued attribute) a set of tuples (nested relation)

1NF is a standard assumption in relational DBMSs However object-oriented DBMSs and nested relational DBMSs relax t

his constraint

A relation that is not in 1NF is an unnormalized relation

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1P2

AnalystAnalyst

246

E3 A Lee P3P4

ConsultantEngineer

1048

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1 P2 Analyst Analyst 24 6

E3 A Lee P3P4 Consultant Engineer 10 48

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

A non-1NF Relation

One way to convert a non-1NF relation to a 1NF relation

1) Splitting Method - Divide the existing relation into two relations one with non-repeating attributes and the other with repeating attributes

Make a new relation consisting of the primary key of the original relation and the repeating attributes Determine a primary key for this new relation

Remove the repeating attributes from the original relation

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1P2

AnalystAnalyst

246

E3 A Lee P3P4

ConsultantEngineer

1048

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

ENo EName

E1 J Doe

E2 M Smith

E3 A Lee

E4 J Miller

E5 B Casey

E6 L Chu

E7 R Davis

ENo PNo Resp DurationE1 P1 Manager 12E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36

Second Normal Form (2NF)

A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key

If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation

Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key

Second Normal Form (2NF) Example

fd1 and fd4 are partial functional dependencies Normalize to

Emp (eno ename title bdate salary supereno dno)

WorksOn (eno pno resp hours)

Proj (pno pname budget)

ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration

EmpProj relation

fd1 fd2

fd3fd4

Second Normal Form (2NF) Example (2)

ENo EName

BDate Title Salary SuperNo

DNo

Emp relation

fd1

fd2

fd3fd4

PNo PName Budget

Proj relationENo PNo Resp Duration

WorkOn relation

Third Normal Form (3NF)

Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th

an 2 FDs

A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key

Converting a relation to 3NF from 2NF involves the removal of transitive dependencies

Third Normal Form (3NF) Example

fd2 results in a transitive dependency eno rarr salary Remove it

ENo EName BDate Title Salary SuperNo DNo

Emp relation

fd1

fd2

fd2

Title Salary

Pay relation

ENo EName BDate Title SuperNo DNo

Emp relation

fd1

General Definitions of 2NF and 3NF

We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)

General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key

General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key

Note that a prime attribute is an attribute that is in any key (candidate or primary)

General Definition of 3NF Example

The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute

However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key

Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key

ENo EName BDate Title Salary SuperNo DNo SSN

Emp relation

fd1

fd2

Normalization Question

Consider the universal relation

R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ

List the keys for R

Decompose R into 2NF and then 3NF relations

Boyce-Codd Normal Form (BCNF)

A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key

To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key

The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

ClientInterview relation

Relation is in 3NF but not BCNF

fd4

fd3

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

Note that we lose the FD fd2

fd4

fd3

ClientNo InterviewDate

InterviwTime

StaffNo StaffNo InterviwDate

RoomNo

Interview relation StaffRoom relation

BCNF versus 3NF

We can decompose to BCNF but sometimes we do not want to if we lose a FD

The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency

Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)

In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)

Conversion to BCNF

There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y

Note that if any FD in F+ violates BCNF then a FD in F violates BCNF

2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey

3) Replace R by relations with schemas R1 = X+

R2 = (R ndash X+) U X

4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib

utes are all in R1 or R2

Normalization Question

Given this database schema normalize into BCNF

fd1

PID Country Lot Area Price TaxRate

fd2

fd4fd3

Multi-Valued Dependencies

A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema

A MVD occurs when two independent 1N relationships are in the relational schema

Multi-Valued Dependencies Example

Eno ename Title Salary Pno dno

E1 J Doe EE 30000 P1P2

E2 M Smith SA 50000 P1P4 D1D2D3

E3 A Lee ME 40000 P1P3 D2D3

E4 J Miller PR 20000 P3 D3

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1

E1 J Doe EE 30000 P2

E2 M Smith SA 50000 P1 D1

E2 M Smith SA 50000 P1 D2

E2 M Smith SA 50000 P1 D3

E2 M Smith SA 50000 P4 D1

E2 M Smith SA 50000 P4 D2

E2 M Smith SA 50000 P4 D3

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P3 D3

E3 A Lee ME 40000 P3 D3

E4 J Miller PR 20000 P3 D3

Multi-Valued Dependencies (MVDs)

A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C

A trivial MVD A mdashgtgt B occurs when either B is a subset of A or

A cup B = R

Fourth Normal Form (4NF)

Fourth normal form (4NF) is based on the idea of multi-valued dependencies

A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies

Fourth Normal Form (4NF) Example

eno mdashgtgtpno

eno mdashgtgtdno

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3

eno ename title salary

E1 J Doe EE 30000

E2 MSmith SA 50000

E3 A Lee ME 40000

E4 J Miller PR 20000

eno pno

E1 P1

E1 P2

E2 P1

E2 P4

E3 P1

E3 P3

E4 P3

eno dno

E2 D1

E2 D2

E2 D3

E3 D2

E3 D3

E4 D3

Lossless-join Dependency

The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation

A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined

Fifth Normal Form (5NF)

bull Fifth normal form (5NF) is based on join dependencies

A relation is in fifth normal form (5NF) if the relation has no join dependency

bull 5NF is also called project-join normal form (PJNF)

A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R

The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have

ΠR1(r) ΠR2(r) hellip ΠRn(r) = r

Fifth Normal Form (5NF) Example

Consider a relation Supply (sname partName projName)

1048715Add the additional constraint that

If project j requires part p

and supplier s supplies part p

and supplier s supplies at least one item to project j

Then supplier s also supplies part p to project j

Original Supply Relation Supply Relation with new constraintsname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

Fifth Normal Form (5NF) Example

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

sname partName

Smith Bolt

Smith Nut

Adam Bolt

Walton Nut

Adam Nail

sname projName

Smith X

Smith Y

Adam Y

Walton Z

Adam X

partName projName

Bolt X

Nut Y

Bolt Y

Nut Z

Nail X

Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples

Desirable Properties in normalization

Relational schemas that are well-designed have several

important properties

1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or

relationship

2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins

3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations

Normal Forms in Practice

Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used

For example query execution time may increase because of normalization as more joins become necessary to answer queries

Normal Forms in Practice Example

For example street and city uniquely determine a zipcode

In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed

When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly

fd1

ENo City Street ZipCode

fd2

ENo ZipCode

ZipCode City Street

Normal Forms in Practice Summary

In summary normalization is typically used in two ways

To improve on relational schemas after ER design

To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies

There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance

This is the classic time versus space tradeoff

Conclusion

Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies

Normal forms indicate when relations satisfy certain properties

1NF - All attributes are atomic

2NF - All attributes are fully functionally dependent on a key

3NF - There are no transitive dependencies in the relation

4NF - There are no multi-valued dependencies in the relation

5NF - The relation has no join dependency

In practice normalization is used to improve schemas produced after ER design and existing relational schemas

It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time

Objectives

Explain process of normalization and why it is beneficial

Describe the concept of the Universal Relation

What are the motivations for normalization

What are the 3 types of update anomalies Be able to give examples

Describe the lossless-join property and dependency preservation property

Define and explain the concept of a functional dependency

What is a trivial FD

Describe the relationship between FDs and keys

What are Armstrongs Axioms When are they used

What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs

Objectives (2)

Define minimal and covers for a set of FDs

Define normal forms be able to list the normal forms and draw a diagram showing their relationship

Define 1NF Know how to convert non-1NF relation to 1NF

Define 2NF Convert 1NF relation to 2NF

Define 3NF Convert 2NF relation to 3NF

Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example

Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm

Describe multi-valued dependencies When do they occur

What is a trivial MVD

Define 4NF Convert 3NFBCNF relation to 4NF

Objectives (3)

Describe lossless-join property and how it relates to 5NF

Define 5NF

Explain normalization in terms of the time versus space tradeoff

Explain the relationship between normalization and ER modeling

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
Page 44: Software School of Hunan University 2006.08 Database Systems Design Part III Normalization

Normal Forms

All relations (non-normalized)

1NF (First Normal Form)

2NF (Second Normal Form)

3NF (Third Normal Form)

BCNF (Boyce-Codd NF)

4NF (Fourth NF)

5NF (Fifth NF)

First Normal Form (1NF)

A relation is in first normal form (1NF) if all its attribute values are atomic

That is a 1NF relation cannot have an attribute value that is a set of values (multi-valued attribute) a set of tuples (nested relation)

1NF is a standard assumption in relational DBMSs However object-oriented DBMSs and nested relational DBMSs relax t

his constraint

A relation that is not in 1NF is an unnormalized relation

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1P2

AnalystAnalyst

246

E3 A Lee P3P4

ConsultantEngineer

1048

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1 P2 Analyst Analyst 24 6

E3 A Lee P3P4 Consultant Engineer 10 48

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

A non-1NF Relation

One way to convert a non-1NF relation to a 1NF relation

1) Splitting Method - Divide the existing relation into two relations one with non-repeating attributes and the other with repeating attributes

Make a new relation consisting of the primary key of the original relation and the repeating attributes Determine a primary key for this new relation

Remove the repeating attributes from the original relation

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1P2

AnalystAnalyst

246

E3 A Lee P3P4

ConsultantEngineer

1048

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

ENo EName

E1 J Doe

E2 M Smith

E3 A Lee

E4 J Miller

E5 B Casey

E6 L Chu

E7 R Davis

ENo PNo Resp DurationE1 P1 Manager 12E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36

Second Normal Form (2NF)

A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key

If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation

Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key

Second Normal Form (2NF) Example

fd1 and fd4 are partial functional dependencies Normalize to

Emp (eno ename title bdate salary supereno dno)

WorksOn (eno pno resp hours)

Proj (pno pname budget)

ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration

EmpProj relation

fd1 fd2

fd3fd4

Second Normal Form (2NF) Example (2)

ENo EName

BDate Title Salary SuperNo

DNo

Emp relation

fd1

fd2

fd3fd4

PNo PName Budget

Proj relationENo PNo Resp Duration

WorkOn relation

Third Normal Form (3NF)

Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th

an 2 FDs

A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key

Converting a relation to 3NF from 2NF involves the removal of transitive dependencies

Third Normal Form (3NF) Example

fd2 results in a transitive dependency eno rarr salary Remove it

ENo EName BDate Title Salary SuperNo DNo

Emp relation

fd1

fd2

fd2

Title Salary

Pay relation

ENo EName BDate Title SuperNo DNo

Emp relation

fd1

General Definitions of 2NF and 3NF

We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)

General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key

General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key

Note that a prime attribute is an attribute that is in any key (candidate or primary)

General Definition of 3NF Example

The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute

However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key

Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key

ENo EName BDate Title Salary SuperNo DNo SSN

Emp relation

fd1

fd2

Normalization Question

Consider the universal relation

R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ

List the keys for R

Decompose R into 2NF and then 3NF relations

Boyce-Codd Normal Form (BCNF)

A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key

To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key

The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

ClientInterview relation

Relation is in 3NF but not BCNF

fd4

fd3

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

Note that we lose the FD fd2

fd4

fd3

ClientNo InterviewDate

InterviwTime

StaffNo StaffNo InterviwDate

RoomNo

Interview relation StaffRoom relation

BCNF versus 3NF

We can decompose to BCNF but sometimes we do not want to if we lose a FD

The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency

Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)

In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)

Conversion to BCNF

There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y

Note that if any FD in F+ violates BCNF then a FD in F violates BCNF

2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey

3) Replace R by relations with schemas R1 = X+

R2 = (R ndash X+) U X

4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib

utes are all in R1 or R2

Normalization Question

Given this database schema normalize into BCNF

fd1

PID Country Lot Area Price TaxRate

fd2

fd4fd3

Multi-Valued Dependencies

A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema

A MVD occurs when two independent 1N relationships are in the relational schema

Multi-Valued Dependencies Example

Eno ename Title Salary Pno dno

E1 J Doe EE 30000 P1P2

E2 M Smith SA 50000 P1P4 D1D2D3

E3 A Lee ME 40000 P1P3 D2D3

E4 J Miller PR 20000 P3 D3

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1

E1 J Doe EE 30000 P2

E2 M Smith SA 50000 P1 D1

E2 M Smith SA 50000 P1 D2

E2 M Smith SA 50000 P1 D3

E2 M Smith SA 50000 P4 D1

E2 M Smith SA 50000 P4 D2

E2 M Smith SA 50000 P4 D3

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P3 D3

E3 A Lee ME 40000 P3 D3

E4 J Miller PR 20000 P3 D3

Multi-Valued Dependencies (MVDs)

A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C

A trivial MVD A mdashgtgt B occurs when either B is a subset of A or

A cup B = R

Fourth Normal Form (4NF)

Fourth normal form (4NF) is based on the idea of multi-valued dependencies

A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies

Fourth Normal Form (4NF) Example

eno mdashgtgtpno

eno mdashgtgtdno

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3

eno ename title salary

E1 J Doe EE 30000

E2 MSmith SA 50000

E3 A Lee ME 40000

E4 J Miller PR 20000

eno pno

E1 P1

E1 P2

E2 P1

E2 P4

E3 P1

E3 P3

E4 P3

eno dno

E2 D1

E2 D2

E2 D3

E3 D2

E3 D3

E4 D3

Lossless-join Dependency

The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation

A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined

Fifth Normal Form (5NF)

bull Fifth normal form (5NF) is based on join dependencies

A relation is in fifth normal form (5NF) if the relation has no join dependency

bull 5NF is also called project-join normal form (PJNF)

A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R

The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have

ΠR1(r) ΠR2(r) hellip ΠRn(r) = r

Fifth Normal Form (5NF) Example

Consider a relation Supply (sname partName projName)

1048715Add the additional constraint that

If project j requires part p

and supplier s supplies part p

and supplier s supplies at least one item to project j

Then supplier s also supplies part p to project j

Original Supply Relation Supply Relation with new constraintsname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

Fifth Normal Form (5NF) Example

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

sname partName

Smith Bolt

Smith Nut

Adam Bolt

Walton Nut

Adam Nail

sname projName

Smith X

Smith Y

Adam Y

Walton Z

Adam X

partName projName

Bolt X

Nut Y

Bolt Y

Nut Z

Nail X

Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples

Desirable Properties in normalization

Relational schemas that are well-designed have several

important properties

1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or

relationship

2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins

3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations

Normal Forms in Practice

Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used

For example query execution time may increase because of normalization as more joins become necessary to answer queries

Normal Forms in Practice Example

For example street and city uniquely determine a zipcode

In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed

When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly

fd1

ENo City Street ZipCode

fd2

ENo ZipCode

ZipCode City Street

Normal Forms in Practice Summary

In summary normalization is typically used in two ways

To improve on relational schemas after ER design

To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies

There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance

This is the classic time versus space tradeoff

Conclusion

Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies

Normal forms indicate when relations satisfy certain properties

1NF - All attributes are atomic

2NF - All attributes are fully functionally dependent on a key

3NF - There are no transitive dependencies in the relation

4NF - There are no multi-valued dependencies in the relation

5NF - The relation has no join dependency

In practice normalization is used to improve schemas produced after ER design and existing relational schemas

It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time

Objectives

Explain process of normalization and why it is beneficial

Describe the concept of the Universal Relation

What are the motivations for normalization

What are the 3 types of update anomalies Be able to give examples

Describe the lossless-join property and dependency preservation property

Define and explain the concept of a functional dependency

What is a trivial FD

Describe the relationship between FDs and keys

What are Armstrongs Axioms When are they used

What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs

Objectives (2)

Define minimal and covers for a set of FDs

Define normal forms be able to list the normal forms and draw a diagram showing their relationship

Define 1NF Know how to convert non-1NF relation to 1NF

Define 2NF Convert 1NF relation to 2NF

Define 3NF Convert 2NF relation to 3NF

Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example

Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm

Describe multi-valued dependencies When do they occur

What is a trivial MVD

Define 4NF Convert 3NFBCNF relation to 4NF

Objectives (3)

Describe lossless-join property and how it relates to 5NF

Define 5NF

Explain normalization in terms of the time versus space tradeoff

Explain the relationship between normalization and ER modeling

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
Page 45: Software School of Hunan University 2006.08 Database Systems Design Part III Normalization

First Normal Form (1NF)

A relation is in first normal form (1NF) if all its attribute values are atomic

That is a 1NF relation cannot have an attribute value that is a set of values (multi-valued attribute) a set of tuples (nested relation)

1NF is a standard assumption in relational DBMSs However object-oriented DBMSs and nested relational DBMSs relax t

his constraint

A relation that is not in 1NF is an unnormalized relation

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1P2

AnalystAnalyst

246

E3 A Lee P3P4

ConsultantEngineer

1048

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1 P2 Analyst Analyst 24 6

E3 A Lee P3P4 Consultant Engineer 10 48

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

A non-1NF Relation

One way to convert a non-1NF relation to a 1NF relation

1) Splitting Method - Divide the existing relation into two relations one with non-repeating attributes and the other with repeating attributes

Make a new relation consisting of the primary key of the original relation and the repeating attributes Determine a primary key for this new relation

Remove the repeating attributes from the original relation

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1P2

AnalystAnalyst

246

E3 A Lee P3P4

ConsultantEngineer

1048

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

ENo EName

E1 J Doe

E2 M Smith

E3 A Lee

E4 J Miller

E5 B Casey

E6 L Chu

E7 R Davis

ENo PNo Resp DurationE1 P1 Manager 12E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36

Second Normal Form (2NF)

A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key

If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation

Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key

Second Normal Form (2NF) Example

fd1 and fd4 are partial functional dependencies Normalize to

Emp (eno ename title bdate salary supereno dno)

WorksOn (eno pno resp hours)

Proj (pno pname budget)

ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration

EmpProj relation

fd1 fd2

fd3fd4

Second Normal Form (2NF) Example (2)

ENo EName

BDate Title Salary SuperNo

DNo

Emp relation

fd1

fd2

fd3fd4

PNo PName Budget

Proj relationENo PNo Resp Duration

WorkOn relation

Third Normal Form (3NF)

Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th

an 2 FDs

A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key

Converting a relation to 3NF from 2NF involves the removal of transitive dependencies

Third Normal Form (3NF) Example

fd2 results in a transitive dependency eno rarr salary Remove it

ENo EName BDate Title Salary SuperNo DNo

Emp relation

fd1

fd2

fd2

Title Salary

Pay relation

ENo EName BDate Title SuperNo DNo

Emp relation

fd1

General Definitions of 2NF and 3NF

We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)

General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key

General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key

Note that a prime attribute is an attribute that is in any key (candidate or primary)

General Definition of 3NF Example

The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute

However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key

Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key

ENo EName BDate Title Salary SuperNo DNo SSN

Emp relation

fd1

fd2

Normalization Question

Consider the universal relation

R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ

List the keys for R

Decompose R into 2NF and then 3NF relations

Boyce-Codd Normal Form (BCNF)

A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key

To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key

The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

ClientInterview relation

Relation is in 3NF but not BCNF

fd4

fd3

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

Note that we lose the FD fd2

fd4

fd3

ClientNo InterviewDate

InterviwTime

StaffNo StaffNo InterviwDate

RoomNo

Interview relation StaffRoom relation

BCNF versus 3NF

We can decompose to BCNF but sometimes we do not want to if we lose a FD

The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency

Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)

In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)

Conversion to BCNF

There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y

Note that if any FD in F+ violates BCNF then a FD in F violates BCNF

2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey

3) Replace R by relations with schemas R1 = X+

R2 = (R ndash X+) U X

4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib

utes are all in R1 or R2

Normalization Question

Given this database schema normalize into BCNF

fd1

PID Country Lot Area Price TaxRate

fd2

fd4fd3

Multi-Valued Dependencies

A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema

A MVD occurs when two independent 1N relationships are in the relational schema

Multi-Valued Dependencies Example

Eno ename Title Salary Pno dno

E1 J Doe EE 30000 P1P2

E2 M Smith SA 50000 P1P4 D1D2D3

E3 A Lee ME 40000 P1P3 D2D3

E4 J Miller PR 20000 P3 D3

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1

E1 J Doe EE 30000 P2

E2 M Smith SA 50000 P1 D1

E2 M Smith SA 50000 P1 D2

E2 M Smith SA 50000 P1 D3

E2 M Smith SA 50000 P4 D1

E2 M Smith SA 50000 P4 D2

E2 M Smith SA 50000 P4 D3

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P3 D3

E3 A Lee ME 40000 P3 D3

E4 J Miller PR 20000 P3 D3

Multi-Valued Dependencies (MVDs)

A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C

A trivial MVD A mdashgtgt B occurs when either B is a subset of A or

A cup B = R

Fourth Normal Form (4NF)

Fourth normal form (4NF) is based on the idea of multi-valued dependencies

A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies

Fourth Normal Form (4NF) Example

eno mdashgtgtpno

eno mdashgtgtdno

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3

eno ename title salary

E1 J Doe EE 30000

E2 MSmith SA 50000

E3 A Lee ME 40000

E4 J Miller PR 20000

eno pno

E1 P1

E1 P2

E2 P1

E2 P4

E3 P1

E3 P3

E4 P3

eno dno

E2 D1

E2 D2

E2 D3

E3 D2

E3 D3

E4 D3

Lossless-join Dependency

The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation

A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined

Fifth Normal Form (5NF)

bull Fifth normal form (5NF) is based on join dependencies

A relation is in fifth normal form (5NF) if the relation has no join dependency

bull 5NF is also called project-join normal form (PJNF)

A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R

The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have

ΠR1(r) ΠR2(r) hellip ΠRn(r) = r

Fifth Normal Form (5NF) Example

Consider a relation Supply (sname partName projName)

1048715Add the additional constraint that

If project j requires part p

and supplier s supplies part p

and supplier s supplies at least one item to project j

Then supplier s also supplies part p to project j

Original Supply Relation Supply Relation with new constraintsname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

Fifth Normal Form (5NF) Example

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

sname partName

Smith Bolt

Smith Nut

Adam Bolt

Walton Nut

Adam Nail

sname projName

Smith X

Smith Y

Adam Y

Walton Z

Adam X

partName projName

Bolt X

Nut Y

Bolt Y

Nut Z

Nail X

Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples

Desirable Properties in normalization

Relational schemas that are well-designed have several

important properties

1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or

relationship

2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins

3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations

Normal Forms in Practice

Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used

For example query execution time may increase because of normalization as more joins become necessary to answer queries

Normal Forms in Practice Example

For example street and city uniquely determine a zipcode

In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed

When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly

fd1

ENo City Street ZipCode

fd2

ENo ZipCode

ZipCode City Street

Normal Forms in Practice Summary

In summary normalization is typically used in two ways

To improve on relational schemas after ER design

To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies

There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance

This is the classic time versus space tradeoff

Conclusion

Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies

Normal forms indicate when relations satisfy certain properties

1NF - All attributes are atomic

2NF - All attributes are fully functionally dependent on a key

3NF - There are no transitive dependencies in the relation

4NF - There are no multi-valued dependencies in the relation

5NF - The relation has no join dependency

In practice normalization is used to improve schemas produced after ER design and existing relational schemas

It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time

Objectives

Explain process of normalization and why it is beneficial

Describe the concept of the Universal Relation

What are the motivations for normalization

What are the 3 types of update anomalies Be able to give examples

Describe the lossless-join property and dependency preservation property

Define and explain the concept of a functional dependency

What is a trivial FD

Describe the relationship between FDs and keys

What are Armstrongs Axioms When are they used

What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs

Objectives (2)

Define minimal and covers for a set of FDs

Define normal forms be able to list the normal forms and draw a diagram showing their relationship

Define 1NF Know how to convert non-1NF relation to 1NF

Define 2NF Convert 1NF relation to 2NF

Define 3NF Convert 2NF relation to 3NF

Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example

Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm

Describe multi-valued dependencies When do they occur

What is a trivial MVD

Define 4NF Convert 3NFBCNF relation to 4NF

Objectives (3)

Describe lossless-join property and how it relates to 5NF

Define 5NF

Explain normalization in terms of the time versus space tradeoff

Explain the relationship between normalization and ER modeling

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
Page 46: Software School of Hunan University 2006.08 Database Systems Design Part III Normalization

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1P2

AnalystAnalyst

246

E3 A Lee P3P4

ConsultantEngineer

1048

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1 P2 Analyst Analyst 24 6

E3 A Lee P3P4 Consultant Engineer 10 48

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

A non-1NF Relation

One way to convert a non-1NF relation to a 1NF relation

1) Splitting Method - Divide the existing relation into two relations one with non-repeating attributes and the other with repeating attributes

Make a new relation consisting of the primary key of the original relation and the repeating attributes Determine a primary key for this new relation

Remove the repeating attributes from the original relation

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1P2

AnalystAnalyst

246

E3 A Lee P3P4

ConsultantEngineer

1048

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

ENo EName

E1 J Doe

E2 M Smith

E3 A Lee

E4 J Miller

E5 B Casey

E6 L Chu

E7 R Davis

ENo PNo Resp DurationE1 P1 Manager 12E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36

Second Normal Form (2NF)

A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key

If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation

Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key

Second Normal Form (2NF) Example

fd1 and fd4 are partial functional dependencies Normalize to

Emp (eno ename title bdate salary supereno dno)

WorksOn (eno pno resp hours)

Proj (pno pname budget)

ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration

EmpProj relation

fd1 fd2

fd3fd4

Second Normal Form (2NF) Example (2)

ENo EName

BDate Title Salary SuperNo

DNo

Emp relation

fd1

fd2

fd3fd4

PNo PName Budget

Proj relationENo PNo Resp Duration

WorkOn relation

Third Normal Form (3NF)

Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th

an 2 FDs

A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key

Converting a relation to 3NF from 2NF involves the removal of transitive dependencies

Third Normal Form (3NF) Example

fd2 results in a transitive dependency eno rarr salary Remove it

ENo EName BDate Title Salary SuperNo DNo

Emp relation

fd1

fd2

fd2

Title Salary

Pay relation

ENo EName BDate Title SuperNo DNo

Emp relation

fd1

General Definitions of 2NF and 3NF

We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)

General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key

General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key

Note that a prime attribute is an attribute that is in any key (candidate or primary)

General Definition of 3NF Example

The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute

However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key

Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key

ENo EName BDate Title Salary SuperNo DNo SSN

Emp relation

fd1

fd2

Normalization Question

Consider the universal relation

R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ

List the keys for R

Decompose R into 2NF and then 3NF relations

Boyce-Codd Normal Form (BCNF)

A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key

To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key

The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

ClientInterview relation

Relation is in 3NF but not BCNF

fd4

fd3

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

Note that we lose the FD fd2

fd4

fd3

ClientNo InterviewDate

InterviwTime

StaffNo StaffNo InterviwDate

RoomNo

Interview relation StaffRoom relation

BCNF versus 3NF

We can decompose to BCNF but sometimes we do not want to if we lose a FD

The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency

Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)

In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)

Conversion to BCNF

There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y

Note that if any FD in F+ violates BCNF then a FD in F violates BCNF

2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey

3) Replace R by relations with schemas R1 = X+

R2 = (R ndash X+) U X

4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib

utes are all in R1 or R2

Normalization Question

Given this database schema normalize into BCNF

fd1

PID Country Lot Area Price TaxRate

fd2

fd4fd3

Multi-Valued Dependencies

A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema

A MVD occurs when two independent 1N relationships are in the relational schema

Multi-Valued Dependencies Example

Eno ename Title Salary Pno dno

E1 J Doe EE 30000 P1P2

E2 M Smith SA 50000 P1P4 D1D2D3

E3 A Lee ME 40000 P1P3 D2D3

E4 J Miller PR 20000 P3 D3

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1

E1 J Doe EE 30000 P2

E2 M Smith SA 50000 P1 D1

E2 M Smith SA 50000 P1 D2

E2 M Smith SA 50000 P1 D3

E2 M Smith SA 50000 P4 D1

E2 M Smith SA 50000 P4 D2

E2 M Smith SA 50000 P4 D3

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P3 D3

E3 A Lee ME 40000 P3 D3

E4 J Miller PR 20000 P3 D3

Multi-Valued Dependencies (MVDs)

A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C

A trivial MVD A mdashgtgt B occurs when either B is a subset of A or

A cup B = R

Fourth Normal Form (4NF)

Fourth normal form (4NF) is based on the idea of multi-valued dependencies

A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies

Fourth Normal Form (4NF) Example

eno mdashgtgtpno

eno mdashgtgtdno

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3

eno ename title salary

E1 J Doe EE 30000

E2 MSmith SA 50000

E3 A Lee ME 40000

E4 J Miller PR 20000

eno pno

E1 P1

E1 P2

E2 P1

E2 P4

E3 P1

E3 P3

E4 P3

eno dno

E2 D1

E2 D2

E2 D3

E3 D2

E3 D3

E4 D3

Lossless-join Dependency

The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation

A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined

Fifth Normal Form (5NF)

bull Fifth normal form (5NF) is based on join dependencies

A relation is in fifth normal form (5NF) if the relation has no join dependency

bull 5NF is also called project-join normal form (PJNF)

A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R

The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have

ΠR1(r) ΠR2(r) hellip ΠRn(r) = r

Fifth Normal Form (5NF) Example

Consider a relation Supply (sname partName projName)

1048715Add the additional constraint that

If project j requires part p

and supplier s supplies part p

and supplier s supplies at least one item to project j

Then supplier s also supplies part p to project j

Original Supply Relation Supply Relation with new constraintsname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

Fifth Normal Form (5NF) Example

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

sname partName

Smith Bolt

Smith Nut

Adam Bolt

Walton Nut

Adam Nail

sname projName

Smith X

Smith Y

Adam Y

Walton Z

Adam X

partName projName

Bolt X

Nut Y

Bolt Y

Nut Z

Nail X

Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples

Desirable Properties in normalization

Relational schemas that are well-designed have several

important properties

1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or

relationship

2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins

3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations

Normal Forms in Practice

Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used

For example query execution time may increase because of normalization as more joins become necessary to answer queries

Normal Forms in Practice Example

For example street and city uniquely determine a zipcode

In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed

When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly

fd1

ENo City Street ZipCode

fd2

ENo ZipCode

ZipCode City Street

Normal Forms in Practice Summary

In summary normalization is typically used in two ways

To improve on relational schemas after ER design

To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies

There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance

This is the classic time versus space tradeoff

Conclusion

Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies

Normal forms indicate when relations satisfy certain properties

1NF - All attributes are atomic

2NF - All attributes are fully functionally dependent on a key

3NF - There are no transitive dependencies in the relation

4NF - There are no multi-valued dependencies in the relation

5NF - The relation has no join dependency

In practice normalization is used to improve schemas produced after ER design and existing relational schemas

It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time

Objectives

Explain process of normalization and why it is beneficial

Describe the concept of the Universal Relation

What are the motivations for normalization

What are the 3 types of update anomalies Be able to give examples

Describe the lossless-join property and dependency preservation property

Define and explain the concept of a functional dependency

What is a trivial FD

Describe the relationship between FDs and keys

What are Armstrongs Axioms When are they used

What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs

Objectives (2)

Define minimal and covers for a set of FDs

Define normal forms be able to list the normal forms and draw a diagram showing their relationship

Define 1NF Know how to convert non-1NF relation to 1NF

Define 2NF Convert 1NF relation to 2NF

Define 3NF Convert 2NF relation to 3NF

Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example

Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm

Describe multi-valued dependencies When do they occur

What is a trivial MVD

Define 4NF Convert 3NFBCNF relation to 4NF

Objectives (3)

Describe lossless-join property and how it relates to 5NF

Define 5NF

Explain normalization in terms of the time versus space tradeoff

Explain the relationship between normalization and ER modeling

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
Page 47: Software School of Hunan University 2006.08 Database Systems Design Part III Normalization

A non-1NF Relation

One way to convert a non-1NF relation to a 1NF relation

1) Splitting Method - Divide the existing relation into two relations one with non-repeating attributes and the other with repeating attributes

Make a new relation consisting of the primary key of the original relation and the repeating attributes Determine a primary key for this new relation

Remove the repeating attributes from the original relation

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1P2

AnalystAnalyst

246

E3 A Lee P3P4

ConsultantEngineer

1048

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

ENo EName

E1 J Doe

E2 M Smith

E3 A Lee

E4 J Miller

E5 B Casey

E6 L Chu

E7 R Davis

ENo PNo Resp DurationE1 P1 Manager 12E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36

Second Normal Form (2NF)

A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key

If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation

Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key

Second Normal Form (2NF) Example

fd1 and fd4 are partial functional dependencies Normalize to

Emp (eno ename title bdate salary supereno dno)

WorksOn (eno pno resp hours)

Proj (pno pname budget)

ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration

EmpProj relation

fd1 fd2

fd3fd4

Second Normal Form (2NF) Example (2)

ENo EName

BDate Title Salary SuperNo

DNo

Emp relation

fd1

fd2

fd3fd4

PNo PName Budget

Proj relationENo PNo Resp Duration

WorkOn relation

Third Normal Form (3NF)

Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th

an 2 FDs

A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key

Converting a relation to 3NF from 2NF involves the removal of transitive dependencies

Third Normal Form (3NF) Example

fd2 results in a transitive dependency eno rarr salary Remove it

ENo EName BDate Title Salary SuperNo DNo

Emp relation

fd1

fd2

fd2

Title Salary

Pay relation

ENo EName BDate Title SuperNo DNo

Emp relation

fd1

General Definitions of 2NF and 3NF

We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)

General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key

General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key

Note that a prime attribute is an attribute that is in any key (candidate or primary)

General Definition of 3NF Example

The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute

However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key

Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key

ENo EName BDate Title Salary SuperNo DNo SSN

Emp relation

fd1

fd2

Normalization Question

Consider the universal relation

R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ

List the keys for R

Decompose R into 2NF and then 3NF relations

Boyce-Codd Normal Form (BCNF)

A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key

To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key

The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

ClientInterview relation

Relation is in 3NF but not BCNF

fd4

fd3

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

Note that we lose the FD fd2

fd4

fd3

ClientNo InterviewDate

InterviwTime

StaffNo StaffNo InterviwDate

RoomNo

Interview relation StaffRoom relation

BCNF versus 3NF

We can decompose to BCNF but sometimes we do not want to if we lose a FD

The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency

Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)

In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)

Conversion to BCNF

There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y

Note that if any FD in F+ violates BCNF then a FD in F violates BCNF

2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey

3) Replace R by relations with schemas R1 = X+

R2 = (R ndash X+) U X

4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib

utes are all in R1 or R2

Normalization Question

Given this database schema normalize into BCNF

fd1

PID Country Lot Area Price TaxRate

fd2

fd4fd3

Multi-Valued Dependencies

A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema

A MVD occurs when two independent 1N relationships are in the relational schema

Multi-Valued Dependencies Example

Eno ename Title Salary Pno dno

E1 J Doe EE 30000 P1P2

E2 M Smith SA 50000 P1P4 D1D2D3

E3 A Lee ME 40000 P1P3 D2D3

E4 J Miller PR 20000 P3 D3

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1

E1 J Doe EE 30000 P2

E2 M Smith SA 50000 P1 D1

E2 M Smith SA 50000 P1 D2

E2 M Smith SA 50000 P1 D3

E2 M Smith SA 50000 P4 D1

E2 M Smith SA 50000 P4 D2

E2 M Smith SA 50000 P4 D3

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P3 D3

E3 A Lee ME 40000 P3 D3

E4 J Miller PR 20000 P3 D3

Multi-Valued Dependencies (MVDs)

A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C

A trivial MVD A mdashgtgt B occurs when either B is a subset of A or

A cup B = R

Fourth Normal Form (4NF)

Fourth normal form (4NF) is based on the idea of multi-valued dependencies

A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies

Fourth Normal Form (4NF) Example

eno mdashgtgtpno

eno mdashgtgtdno

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3

eno ename title salary

E1 J Doe EE 30000

E2 MSmith SA 50000

E3 A Lee ME 40000

E4 J Miller PR 20000

eno pno

E1 P1

E1 P2

E2 P1

E2 P4

E3 P1

E3 P3

E4 P3

eno dno

E2 D1

E2 D2

E2 D3

E3 D2

E3 D3

E4 D3

Lossless-join Dependency

The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation

A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined

Fifth Normal Form (5NF)

bull Fifth normal form (5NF) is based on join dependencies

A relation is in fifth normal form (5NF) if the relation has no join dependency

bull 5NF is also called project-join normal form (PJNF)

A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R

The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have

ΠR1(r) ΠR2(r) hellip ΠRn(r) = r

Fifth Normal Form (5NF) Example

Consider a relation Supply (sname partName projName)

1048715Add the additional constraint that

If project j requires part p

and supplier s supplies part p

and supplier s supplies at least one item to project j

Then supplier s also supplies part p to project j

Original Supply Relation Supply Relation with new constraintsname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

Fifth Normal Form (5NF) Example

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

sname partName

Smith Bolt

Smith Nut

Adam Bolt

Walton Nut

Adam Nail

sname projName

Smith X

Smith Y

Adam Y

Walton Z

Adam X

partName projName

Bolt X

Nut Y

Bolt Y

Nut Z

Nail X

Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples

Desirable Properties in normalization

Relational schemas that are well-designed have several

important properties

1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or

relationship

2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins

3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations

Normal Forms in Practice

Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used

For example query execution time may increase because of normalization as more joins become necessary to answer queries

Normal Forms in Practice Example

For example street and city uniquely determine a zipcode

In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed

When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly

fd1

ENo City Street ZipCode

fd2

ENo ZipCode

ZipCode City Street

Normal Forms in Practice Summary

In summary normalization is typically used in two ways

To improve on relational schemas after ER design

To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies

There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance

This is the classic time versus space tradeoff

Conclusion

Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies

Normal forms indicate when relations satisfy certain properties

1NF - All attributes are atomic

2NF - All attributes are fully functionally dependent on a key

3NF - There are no transitive dependencies in the relation

4NF - There are no multi-valued dependencies in the relation

5NF - The relation has no join dependency

In practice normalization is used to improve schemas produced after ER design and existing relational schemas

It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time

Objectives

Explain process of normalization and why it is beneficial

Describe the concept of the Universal Relation

What are the motivations for normalization

What are the 3 types of update anomalies Be able to give examples

Describe the lossless-join property and dependency preservation property

Define and explain the concept of a functional dependency

What is a trivial FD

Describe the relationship between FDs and keys

What are Armstrongs Axioms When are they used

What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs

Objectives (2)

Define minimal and covers for a set of FDs

Define normal forms be able to list the normal forms and draw a diagram showing their relationship

Define 1NF Know how to convert non-1NF relation to 1NF

Define 2NF Convert 1NF relation to 2NF

Define 3NF Convert 2NF relation to 3NF

Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example

Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm

Describe multi-valued dependencies When do they occur

What is a trivial MVD

Define 4NF Convert 3NFBCNF relation to 4NF

Objectives (3)

Describe lossless-join property and how it relates to 5NF

Define 5NF

Explain normalization in terms of the time versus space tradeoff

Explain the relationship between normalization and ER modeling

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
Page 48: Software School of Hunan University 2006.08 Database Systems Design Part III Normalization

ENo EName PNo Resp Duration

E1 J Doe P1 Manager 12

E2 M Smith P1P2

AnalystAnalyst

246

E3 A Lee P3P4

ConsultantEngineer

1048

E4 J Miller P2 Programmer 18

E5 B Casey P2 Manager 24

E6 L Chu P4 Manager 48

E7 R Davis P3 Engineer 36

ENo EName

E1 J Doe

E2 M Smith

E3 A Lee

E4 J Miller

E5 B Casey

E6 L Chu

E7 R Davis

ENo PNo Resp DurationE1 P1 Manager 12E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36

Second Normal Form (2NF)

A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key

If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation

Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key

Second Normal Form (2NF) Example

fd1 and fd4 are partial functional dependencies Normalize to

Emp (eno ename title bdate salary supereno dno)

WorksOn (eno pno resp hours)

Proj (pno pname budget)

ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration

EmpProj relation

fd1 fd2

fd3fd4

Second Normal Form (2NF) Example (2)

ENo EName

BDate Title Salary SuperNo

DNo

Emp relation

fd1

fd2

fd3fd4

PNo PName Budget

Proj relationENo PNo Resp Duration

WorkOn relation

Third Normal Form (3NF)

Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th

an 2 FDs

A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key

Converting a relation to 3NF from 2NF involves the removal of transitive dependencies

Third Normal Form (3NF) Example

fd2 results in a transitive dependency eno rarr salary Remove it

ENo EName BDate Title Salary SuperNo DNo

Emp relation

fd1

fd2

fd2

Title Salary

Pay relation

ENo EName BDate Title SuperNo DNo

Emp relation

fd1

General Definitions of 2NF and 3NF

We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)

General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key

General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key

Note that a prime attribute is an attribute that is in any key (candidate or primary)

General Definition of 3NF Example

The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute

However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key

Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key

ENo EName BDate Title Salary SuperNo DNo SSN

Emp relation

fd1

fd2

Normalization Question

Consider the universal relation

R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ

List the keys for R

Decompose R into 2NF and then 3NF relations

Boyce-Codd Normal Form (BCNF)

A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key

To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key

The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

ClientInterview relation

Relation is in 3NF but not BCNF

fd4

fd3

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

Note that we lose the FD fd2

fd4

fd3

ClientNo InterviewDate

InterviwTime

StaffNo StaffNo InterviwDate

RoomNo

Interview relation StaffRoom relation

BCNF versus 3NF

We can decompose to BCNF but sometimes we do not want to if we lose a FD

The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency

Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)

In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)

Conversion to BCNF

There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y

Note that if any FD in F+ violates BCNF then a FD in F violates BCNF

2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey

3) Replace R by relations with schemas R1 = X+

R2 = (R ndash X+) U X

4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib

utes are all in R1 or R2

Normalization Question

Given this database schema normalize into BCNF

fd1

PID Country Lot Area Price TaxRate

fd2

fd4fd3

Multi-Valued Dependencies

A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema

A MVD occurs when two independent 1N relationships are in the relational schema

Multi-Valued Dependencies Example

Eno ename Title Salary Pno dno

E1 J Doe EE 30000 P1P2

E2 M Smith SA 50000 P1P4 D1D2D3

E3 A Lee ME 40000 P1P3 D2D3

E4 J Miller PR 20000 P3 D3

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1

E1 J Doe EE 30000 P2

E2 M Smith SA 50000 P1 D1

E2 M Smith SA 50000 P1 D2

E2 M Smith SA 50000 P1 D3

E2 M Smith SA 50000 P4 D1

E2 M Smith SA 50000 P4 D2

E2 M Smith SA 50000 P4 D3

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P3 D3

E3 A Lee ME 40000 P3 D3

E4 J Miller PR 20000 P3 D3

Multi-Valued Dependencies (MVDs)

A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C

A trivial MVD A mdashgtgt B occurs when either B is a subset of A or

A cup B = R

Fourth Normal Form (4NF)

Fourth normal form (4NF) is based on the idea of multi-valued dependencies

A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies

Fourth Normal Form (4NF) Example

eno mdashgtgtpno

eno mdashgtgtdno

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3

eno ename title salary

E1 J Doe EE 30000

E2 MSmith SA 50000

E3 A Lee ME 40000

E4 J Miller PR 20000

eno pno

E1 P1

E1 P2

E2 P1

E2 P4

E3 P1

E3 P3

E4 P3

eno dno

E2 D1

E2 D2

E2 D3

E3 D2

E3 D3

E4 D3

Lossless-join Dependency

The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation

A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined

Fifth Normal Form (5NF)

bull Fifth normal form (5NF) is based on join dependencies

A relation is in fifth normal form (5NF) if the relation has no join dependency

bull 5NF is also called project-join normal form (PJNF)

A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R

The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have

ΠR1(r) ΠR2(r) hellip ΠRn(r) = r

Fifth Normal Form (5NF) Example

Consider a relation Supply (sname partName projName)

1048715Add the additional constraint that

If project j requires part p

and supplier s supplies part p

and supplier s supplies at least one item to project j

Then supplier s also supplies part p to project j

Original Supply Relation Supply Relation with new constraintsname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

Fifth Normal Form (5NF) Example

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

sname partName

Smith Bolt

Smith Nut

Adam Bolt

Walton Nut

Adam Nail

sname projName

Smith X

Smith Y

Adam Y

Walton Z

Adam X

partName projName

Bolt X

Nut Y

Bolt Y

Nut Z

Nail X

Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples

Desirable Properties in normalization

Relational schemas that are well-designed have several

important properties

1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or

relationship

2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins

3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations

Normal Forms in Practice

Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used

For example query execution time may increase because of normalization as more joins become necessary to answer queries

Normal Forms in Practice Example

For example street and city uniquely determine a zipcode

In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed

When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly

fd1

ENo City Street ZipCode

fd2

ENo ZipCode

ZipCode City Street

Normal Forms in Practice Summary

In summary normalization is typically used in two ways

To improve on relational schemas after ER design

To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies

There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance

This is the classic time versus space tradeoff

Conclusion

Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies

Normal forms indicate when relations satisfy certain properties

1NF - All attributes are atomic

2NF - All attributes are fully functionally dependent on a key

3NF - There are no transitive dependencies in the relation

4NF - There are no multi-valued dependencies in the relation

5NF - The relation has no join dependency

In practice normalization is used to improve schemas produced after ER design and existing relational schemas

It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time

Objectives

Explain process of normalization and why it is beneficial

Describe the concept of the Universal Relation

What are the motivations for normalization

What are the 3 types of update anomalies Be able to give examples

Describe the lossless-join property and dependency preservation property

Define and explain the concept of a functional dependency

What is a trivial FD

Describe the relationship between FDs and keys

What are Armstrongs Axioms When are they used

What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs

Objectives (2)

Define minimal and covers for a set of FDs

Define normal forms be able to list the normal forms and draw a diagram showing their relationship

Define 1NF Know how to convert non-1NF relation to 1NF

Define 2NF Convert 1NF relation to 2NF

Define 3NF Convert 2NF relation to 3NF

Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example

Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm

Describe multi-valued dependencies When do they occur

What is a trivial MVD

Define 4NF Convert 3NFBCNF relation to 4NF

Objectives (3)

Describe lossless-join property and how it relates to 5NF

Define 5NF

Explain normalization in terms of the time versus space tradeoff

Explain the relationship between normalization and ER modeling

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
Page 49: Software School of Hunan University 2006.08 Database Systems Design Part III Normalization

Second Normal Form (2NF)

A relation is in second normal form (2NF) if it is in 1NF and every non-primary key (non-prime) attribute is fully functionally dependent on the primary key

If a relation is not in 2NF we will divide it into separate relations each in 2NF by insuring that the primary key of each new relation fully functionally determines all the attributes in the relation

Splitting algorithm for FD X rarr Y that violates 2NF Compute X+ Replace R by relations R1 = X+ and R2 = (R - X+) U X Note that X is a subset of the attributes of the primary key

Second Normal Form (2NF) Example

fd1 and fd4 are partial functional dependencies Normalize to

Emp (eno ename title bdate salary supereno dno)

WorksOn (eno pno resp hours)

Proj (pno pname budget)

ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration

EmpProj relation

fd1 fd2

fd3fd4

Second Normal Form (2NF) Example (2)

ENo EName

BDate Title Salary SuperNo

DNo

Emp relation

fd1

fd2

fd3fd4

PNo PName Budget

Proj relationENo PNo Resp Duration

WorkOn relation

Third Normal Form (3NF)

Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th

an 2 FDs

A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key

Converting a relation to 3NF from 2NF involves the removal of transitive dependencies

Third Normal Form (3NF) Example

fd2 results in a transitive dependency eno rarr salary Remove it

ENo EName BDate Title Salary SuperNo DNo

Emp relation

fd1

fd2

fd2

Title Salary

Pay relation

ENo EName BDate Title SuperNo DNo

Emp relation

fd1

General Definitions of 2NF and 3NF

We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)

General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key

General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key

Note that a prime attribute is an attribute that is in any key (candidate or primary)

General Definition of 3NF Example

The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute

However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key

Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key

ENo EName BDate Title Salary SuperNo DNo SSN

Emp relation

fd1

fd2

Normalization Question

Consider the universal relation

R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ

List the keys for R

Decompose R into 2NF and then 3NF relations

Boyce-Codd Normal Form (BCNF)

A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key

To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key

The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

ClientInterview relation

Relation is in 3NF but not BCNF

fd4

fd3

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

Note that we lose the FD fd2

fd4

fd3

ClientNo InterviewDate

InterviwTime

StaffNo StaffNo InterviwDate

RoomNo

Interview relation StaffRoom relation

BCNF versus 3NF

We can decompose to BCNF but sometimes we do not want to if we lose a FD

The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency

Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)

In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)

Conversion to BCNF

There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y

Note that if any FD in F+ violates BCNF then a FD in F violates BCNF

2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey

3) Replace R by relations with schemas R1 = X+

R2 = (R ndash X+) U X

4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib

utes are all in R1 or R2

Normalization Question

Given this database schema normalize into BCNF

fd1

PID Country Lot Area Price TaxRate

fd2

fd4fd3

Multi-Valued Dependencies

A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema

A MVD occurs when two independent 1N relationships are in the relational schema

Multi-Valued Dependencies Example

Eno ename Title Salary Pno dno

E1 J Doe EE 30000 P1P2

E2 M Smith SA 50000 P1P4 D1D2D3

E3 A Lee ME 40000 P1P3 D2D3

E4 J Miller PR 20000 P3 D3

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1

E1 J Doe EE 30000 P2

E2 M Smith SA 50000 P1 D1

E2 M Smith SA 50000 P1 D2

E2 M Smith SA 50000 P1 D3

E2 M Smith SA 50000 P4 D1

E2 M Smith SA 50000 P4 D2

E2 M Smith SA 50000 P4 D3

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P3 D3

E3 A Lee ME 40000 P3 D3

E4 J Miller PR 20000 P3 D3

Multi-Valued Dependencies (MVDs)

A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C

A trivial MVD A mdashgtgt B occurs when either B is a subset of A or

A cup B = R

Fourth Normal Form (4NF)

Fourth normal form (4NF) is based on the idea of multi-valued dependencies

A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies

Fourth Normal Form (4NF) Example

eno mdashgtgtpno

eno mdashgtgtdno

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3

eno ename title salary

E1 J Doe EE 30000

E2 MSmith SA 50000

E3 A Lee ME 40000

E4 J Miller PR 20000

eno pno

E1 P1

E1 P2

E2 P1

E2 P4

E3 P1

E3 P3

E4 P3

eno dno

E2 D1

E2 D2

E2 D3

E3 D2

E3 D3

E4 D3

Lossless-join Dependency

The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation

A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined

Fifth Normal Form (5NF)

bull Fifth normal form (5NF) is based on join dependencies

A relation is in fifth normal form (5NF) if the relation has no join dependency

bull 5NF is also called project-join normal form (PJNF)

A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R

The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have

ΠR1(r) ΠR2(r) hellip ΠRn(r) = r

Fifth Normal Form (5NF) Example

Consider a relation Supply (sname partName projName)

1048715Add the additional constraint that

If project j requires part p

and supplier s supplies part p

and supplier s supplies at least one item to project j

Then supplier s also supplies part p to project j

Original Supply Relation Supply Relation with new constraintsname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

Fifth Normal Form (5NF) Example

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

sname partName

Smith Bolt

Smith Nut

Adam Bolt

Walton Nut

Adam Nail

sname projName

Smith X

Smith Y

Adam Y

Walton Z

Adam X

partName projName

Bolt X

Nut Y

Bolt Y

Nut Z

Nail X

Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples

Desirable Properties in normalization

Relational schemas that are well-designed have several

important properties

1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or

relationship

2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins

3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations

Normal Forms in Practice

Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used

For example query execution time may increase because of normalization as more joins become necessary to answer queries

Normal Forms in Practice Example

For example street and city uniquely determine a zipcode

In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed

When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly

fd1

ENo City Street ZipCode

fd2

ENo ZipCode

ZipCode City Street

Normal Forms in Practice Summary

In summary normalization is typically used in two ways

To improve on relational schemas after ER design

To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies

There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance

This is the classic time versus space tradeoff

Conclusion

Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies

Normal forms indicate when relations satisfy certain properties

1NF - All attributes are atomic

2NF - All attributes are fully functionally dependent on a key

3NF - There are no transitive dependencies in the relation

4NF - There are no multi-valued dependencies in the relation

5NF - The relation has no join dependency

In practice normalization is used to improve schemas produced after ER design and existing relational schemas

It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time

Objectives

Explain process of normalization and why it is beneficial

Describe the concept of the Universal Relation

What are the motivations for normalization

What are the 3 types of update anomalies Be able to give examples

Describe the lossless-join property and dependency preservation property

Define and explain the concept of a functional dependency

What is a trivial FD

Describe the relationship between FDs and keys

What are Armstrongs Axioms When are they used

What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs

Objectives (2)

Define minimal and covers for a set of FDs

Define normal forms be able to list the normal forms and draw a diagram showing their relationship

Define 1NF Know how to convert non-1NF relation to 1NF

Define 2NF Convert 1NF relation to 2NF

Define 3NF Convert 2NF relation to 3NF

Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example

Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm

Describe multi-valued dependencies When do they occur

What is a trivial MVD

Define 4NF Convert 3NFBCNF relation to 4NF

Objectives (3)

Describe lossless-join property and how it relates to 5NF

Define 5NF

Explain normalization in terms of the time versus space tradeoff

Explain the relationship between normalization and ER modeling

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
Page 50: Software School of Hunan University 2006.08 Database Systems Design Part III Normalization

Second Normal Form (2NF) Example

fd1 and fd4 are partial functional dependencies Normalize to

Emp (eno ename title bdate salary supereno dno)

WorksOn (eno pno resp hours)

Proj (pno pname budget)

ENo EName BDate Title Salary SuperNo DNo PNo PName Budget Resp Duration

EmpProj relation

fd1 fd2

fd3fd4

Second Normal Form (2NF) Example (2)

ENo EName

BDate Title Salary SuperNo

DNo

Emp relation

fd1

fd2

fd3fd4

PNo PName Budget

Proj relationENo PNo Resp Duration

WorkOn relation

Third Normal Form (3NF)

Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th

an 2 FDs

A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key

Converting a relation to 3NF from 2NF involves the removal of transitive dependencies

Third Normal Form (3NF) Example

fd2 results in a transitive dependency eno rarr salary Remove it

ENo EName BDate Title Salary SuperNo DNo

Emp relation

fd1

fd2

fd2

Title Salary

Pay relation

ENo EName BDate Title SuperNo DNo

Emp relation

fd1

General Definitions of 2NF and 3NF

We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)

General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key

General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key

Note that a prime attribute is an attribute that is in any key (candidate or primary)

General Definition of 3NF Example

The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute

However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key

Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key

ENo EName BDate Title Salary SuperNo DNo SSN

Emp relation

fd1

fd2

Normalization Question

Consider the universal relation

R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ

List the keys for R

Decompose R into 2NF and then 3NF relations

Boyce-Codd Normal Form (BCNF)

A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key

To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key

The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

ClientInterview relation

Relation is in 3NF but not BCNF

fd4

fd3

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

Note that we lose the FD fd2

fd4

fd3

ClientNo InterviewDate

InterviwTime

StaffNo StaffNo InterviwDate

RoomNo

Interview relation StaffRoom relation

BCNF versus 3NF

We can decompose to BCNF but sometimes we do not want to if we lose a FD

The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency

Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)

In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)

Conversion to BCNF

There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y

Note that if any FD in F+ violates BCNF then a FD in F violates BCNF

2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey

3) Replace R by relations with schemas R1 = X+

R2 = (R ndash X+) U X

4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib

utes are all in R1 or R2

Normalization Question

Given this database schema normalize into BCNF

fd1

PID Country Lot Area Price TaxRate

fd2

fd4fd3

Multi-Valued Dependencies

A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema

A MVD occurs when two independent 1N relationships are in the relational schema

Multi-Valued Dependencies Example

Eno ename Title Salary Pno dno

E1 J Doe EE 30000 P1P2

E2 M Smith SA 50000 P1P4 D1D2D3

E3 A Lee ME 40000 P1P3 D2D3

E4 J Miller PR 20000 P3 D3

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1

E1 J Doe EE 30000 P2

E2 M Smith SA 50000 P1 D1

E2 M Smith SA 50000 P1 D2

E2 M Smith SA 50000 P1 D3

E2 M Smith SA 50000 P4 D1

E2 M Smith SA 50000 P4 D2

E2 M Smith SA 50000 P4 D3

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P3 D3

E3 A Lee ME 40000 P3 D3

E4 J Miller PR 20000 P3 D3

Multi-Valued Dependencies (MVDs)

A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C

A trivial MVD A mdashgtgt B occurs when either B is a subset of A or

A cup B = R

Fourth Normal Form (4NF)

Fourth normal form (4NF) is based on the idea of multi-valued dependencies

A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies

Fourth Normal Form (4NF) Example

eno mdashgtgtpno

eno mdashgtgtdno

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3

eno ename title salary

E1 J Doe EE 30000

E2 MSmith SA 50000

E3 A Lee ME 40000

E4 J Miller PR 20000

eno pno

E1 P1

E1 P2

E2 P1

E2 P4

E3 P1

E3 P3

E4 P3

eno dno

E2 D1

E2 D2

E2 D3

E3 D2

E3 D3

E4 D3

Lossless-join Dependency

The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation

A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined

Fifth Normal Form (5NF)

bull Fifth normal form (5NF) is based on join dependencies

A relation is in fifth normal form (5NF) if the relation has no join dependency

bull 5NF is also called project-join normal form (PJNF)

A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R

The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have

ΠR1(r) ΠR2(r) hellip ΠRn(r) = r

Fifth Normal Form (5NF) Example

Consider a relation Supply (sname partName projName)

1048715Add the additional constraint that

If project j requires part p

and supplier s supplies part p

and supplier s supplies at least one item to project j

Then supplier s also supplies part p to project j

Original Supply Relation Supply Relation with new constraintsname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

Fifth Normal Form (5NF) Example

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

sname partName

Smith Bolt

Smith Nut

Adam Bolt

Walton Nut

Adam Nail

sname projName

Smith X

Smith Y

Adam Y

Walton Z

Adam X

partName projName

Bolt X

Nut Y

Bolt Y

Nut Z

Nail X

Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples

Desirable Properties in normalization

Relational schemas that are well-designed have several

important properties

1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or

relationship

2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins

3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations

Normal Forms in Practice

Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used

For example query execution time may increase because of normalization as more joins become necessary to answer queries

Normal Forms in Practice Example

For example street and city uniquely determine a zipcode

In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed

When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly

fd1

ENo City Street ZipCode

fd2

ENo ZipCode

ZipCode City Street

Normal Forms in Practice Summary

In summary normalization is typically used in two ways

To improve on relational schemas after ER design

To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies

There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance

This is the classic time versus space tradeoff

Conclusion

Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies

Normal forms indicate when relations satisfy certain properties

1NF - All attributes are atomic

2NF - All attributes are fully functionally dependent on a key

3NF - There are no transitive dependencies in the relation

4NF - There are no multi-valued dependencies in the relation

5NF - The relation has no join dependency

In practice normalization is used to improve schemas produced after ER design and existing relational schemas

It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time

Objectives

Explain process of normalization and why it is beneficial

Describe the concept of the Universal Relation

What are the motivations for normalization

What are the 3 types of update anomalies Be able to give examples

Describe the lossless-join property and dependency preservation property

Define and explain the concept of a functional dependency

What is a trivial FD

Describe the relationship between FDs and keys

What are Armstrongs Axioms When are they used

What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs

Objectives (2)

Define minimal and covers for a set of FDs

Define normal forms be able to list the normal forms and draw a diagram showing their relationship

Define 1NF Know how to convert non-1NF relation to 1NF

Define 2NF Convert 1NF relation to 2NF

Define 3NF Convert 2NF relation to 3NF

Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example

Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm

Describe multi-valued dependencies When do they occur

What is a trivial MVD

Define 4NF Convert 3NFBCNF relation to 4NF

Objectives (3)

Describe lossless-join property and how it relates to 5NF

Define 5NF

Explain normalization in terms of the time versus space tradeoff

Explain the relationship between normalization and ER modeling

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
Page 51: Software School of Hunan University 2006.08 Database Systems Design Part III Normalization

Second Normal Form (2NF) Example (2)

ENo EName

BDate Title Salary SuperNo

DNo

Emp relation

fd1

fd2

fd3fd4

PNo PName Budget

Proj relationENo PNo Resp Duration

WorkOn relation

Third Normal Form (3NF)

Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th

an 2 FDs

A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key

Converting a relation to 3NF from 2NF involves the removal of transitive dependencies

Third Normal Form (3NF) Example

fd2 results in a transitive dependency eno rarr salary Remove it

ENo EName BDate Title Salary SuperNo DNo

Emp relation

fd1

fd2

fd2

Title Salary

Pay relation

ENo EName BDate Title SuperNo DNo

Emp relation

fd1

General Definitions of 2NF and 3NF

We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)

General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key

General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key

Note that a prime attribute is an attribute that is in any key (candidate or primary)

General Definition of 3NF Example

The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute

However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key

Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key

ENo EName BDate Title Salary SuperNo DNo SSN

Emp relation

fd1

fd2

Normalization Question

Consider the universal relation

R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ

List the keys for R

Decompose R into 2NF and then 3NF relations

Boyce-Codd Normal Form (BCNF)

A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key

To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key

The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

ClientInterview relation

Relation is in 3NF but not BCNF

fd4

fd3

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

Note that we lose the FD fd2

fd4

fd3

ClientNo InterviewDate

InterviwTime

StaffNo StaffNo InterviwDate

RoomNo

Interview relation StaffRoom relation

BCNF versus 3NF

We can decompose to BCNF but sometimes we do not want to if we lose a FD

The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency

Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)

In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)

Conversion to BCNF

There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y

Note that if any FD in F+ violates BCNF then a FD in F violates BCNF

2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey

3) Replace R by relations with schemas R1 = X+

R2 = (R ndash X+) U X

4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib

utes are all in R1 or R2

Normalization Question

Given this database schema normalize into BCNF

fd1

PID Country Lot Area Price TaxRate

fd2

fd4fd3

Multi-Valued Dependencies

A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema

A MVD occurs when two independent 1N relationships are in the relational schema

Multi-Valued Dependencies Example

Eno ename Title Salary Pno dno

E1 J Doe EE 30000 P1P2

E2 M Smith SA 50000 P1P4 D1D2D3

E3 A Lee ME 40000 P1P3 D2D3

E4 J Miller PR 20000 P3 D3

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1

E1 J Doe EE 30000 P2

E2 M Smith SA 50000 P1 D1

E2 M Smith SA 50000 P1 D2

E2 M Smith SA 50000 P1 D3

E2 M Smith SA 50000 P4 D1

E2 M Smith SA 50000 P4 D2

E2 M Smith SA 50000 P4 D3

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P3 D3

E3 A Lee ME 40000 P3 D3

E4 J Miller PR 20000 P3 D3

Multi-Valued Dependencies (MVDs)

A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C

A trivial MVD A mdashgtgt B occurs when either B is a subset of A or

A cup B = R

Fourth Normal Form (4NF)

Fourth normal form (4NF) is based on the idea of multi-valued dependencies

A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies

Fourth Normal Form (4NF) Example

eno mdashgtgtpno

eno mdashgtgtdno

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3

eno ename title salary

E1 J Doe EE 30000

E2 MSmith SA 50000

E3 A Lee ME 40000

E4 J Miller PR 20000

eno pno

E1 P1

E1 P2

E2 P1

E2 P4

E3 P1

E3 P3

E4 P3

eno dno

E2 D1

E2 D2

E2 D3

E3 D2

E3 D3

E4 D3

Lossless-join Dependency

The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation

A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined

Fifth Normal Form (5NF)

bull Fifth normal form (5NF) is based on join dependencies

A relation is in fifth normal form (5NF) if the relation has no join dependency

bull 5NF is also called project-join normal form (PJNF)

A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R

The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have

ΠR1(r) ΠR2(r) hellip ΠRn(r) = r

Fifth Normal Form (5NF) Example

Consider a relation Supply (sname partName projName)

1048715Add the additional constraint that

If project j requires part p

and supplier s supplies part p

and supplier s supplies at least one item to project j

Then supplier s also supplies part p to project j

Original Supply Relation Supply Relation with new constraintsname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

Fifth Normal Form (5NF) Example

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

sname partName

Smith Bolt

Smith Nut

Adam Bolt

Walton Nut

Adam Nail

sname projName

Smith X

Smith Y

Adam Y

Walton Z

Adam X

partName projName

Bolt X

Nut Y

Bolt Y

Nut Z

Nail X

Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples

Desirable Properties in normalization

Relational schemas that are well-designed have several

important properties

1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or

relationship

2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins

3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations

Normal Forms in Practice

Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used

For example query execution time may increase because of normalization as more joins become necessary to answer queries

Normal Forms in Practice Example

For example street and city uniquely determine a zipcode

In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed

When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly

fd1

ENo City Street ZipCode

fd2

ENo ZipCode

ZipCode City Street

Normal Forms in Practice Summary

In summary normalization is typically used in two ways

To improve on relational schemas after ER design

To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies

There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance

This is the classic time versus space tradeoff

Conclusion

Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies

Normal forms indicate when relations satisfy certain properties

1NF - All attributes are atomic

2NF - All attributes are fully functionally dependent on a key

3NF - There are no transitive dependencies in the relation

4NF - There are no multi-valued dependencies in the relation

5NF - The relation has no join dependency

In practice normalization is used to improve schemas produced after ER design and existing relational schemas

It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time

Objectives

Explain process of normalization and why it is beneficial

Describe the concept of the Universal Relation

What are the motivations for normalization

What are the 3 types of update anomalies Be able to give examples

Describe the lossless-join property and dependency preservation property

Define and explain the concept of a functional dependency

What is a trivial FD

Describe the relationship between FDs and keys

What are Armstrongs Axioms When are they used

What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs

Objectives (2)

Define minimal and covers for a set of FDs

Define normal forms be able to list the normal forms and draw a diagram showing their relationship

Define 1NF Know how to convert non-1NF relation to 1NF

Define 2NF Convert 1NF relation to 2NF

Define 3NF Convert 2NF relation to 3NF

Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example

Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm

Describe multi-valued dependencies When do they occur

What is a trivial MVD

Define 4NF Convert 3NFBCNF relation to 4NF

Objectives (3)

Describe lossless-join property and how it relates to 5NF

Define 5NF

Explain normalization in terms of the time versus space tradeoff

Explain the relationship between normalization and ER modeling

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
Page 52: Software School of Hunan University 2006.08 Database Systems Design Part III Normalization

Third Normal Form (3NF)

Third normal form (3NF) is based on the notion of transitive dependency A transitive dependency A rarr C is a FD that can be inferred from existing FDs A rarr B and B rarr CNote that a transitive dependency may involve more th

an 2 FDs

A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary key (non-prime) attribute that is transitively dependent on the primary key

Converting a relation to 3NF from 2NF involves the removal of transitive dependencies

Third Normal Form (3NF) Example

fd2 results in a transitive dependency eno rarr salary Remove it

ENo EName BDate Title Salary SuperNo DNo

Emp relation

fd1

fd2

fd2

Title Salary

Pay relation

ENo EName BDate Title SuperNo DNo

Emp relation

fd1

General Definitions of 2NF and 3NF

We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)

General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key

General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key

Note that a prime attribute is an attribute that is in any key (candidate or primary)

General Definition of 3NF Example

The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute

However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key

Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key

ENo EName BDate Title Salary SuperNo DNo SSN

Emp relation

fd1

fd2

Normalization Question

Consider the universal relation

R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ

List the keys for R

Decompose R into 2NF and then 3NF relations

Boyce-Codd Normal Form (BCNF)

A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key

To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key

The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

ClientInterview relation

Relation is in 3NF but not BCNF

fd4

fd3

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

Note that we lose the FD fd2

fd4

fd3

ClientNo InterviewDate

InterviwTime

StaffNo StaffNo InterviwDate

RoomNo

Interview relation StaffRoom relation

BCNF versus 3NF

We can decompose to BCNF but sometimes we do not want to if we lose a FD

The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency

Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)

In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)

Conversion to BCNF

There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y

Note that if any FD in F+ violates BCNF then a FD in F violates BCNF

2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey

3) Replace R by relations with schemas R1 = X+

R2 = (R ndash X+) U X

4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib

utes are all in R1 or R2

Normalization Question

Given this database schema normalize into BCNF

fd1

PID Country Lot Area Price TaxRate

fd2

fd4fd3

Multi-Valued Dependencies

A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema

A MVD occurs when two independent 1N relationships are in the relational schema

Multi-Valued Dependencies Example

Eno ename Title Salary Pno dno

E1 J Doe EE 30000 P1P2

E2 M Smith SA 50000 P1P4 D1D2D3

E3 A Lee ME 40000 P1P3 D2D3

E4 J Miller PR 20000 P3 D3

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1

E1 J Doe EE 30000 P2

E2 M Smith SA 50000 P1 D1

E2 M Smith SA 50000 P1 D2

E2 M Smith SA 50000 P1 D3

E2 M Smith SA 50000 P4 D1

E2 M Smith SA 50000 P4 D2

E2 M Smith SA 50000 P4 D3

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P3 D3

E3 A Lee ME 40000 P3 D3

E4 J Miller PR 20000 P3 D3

Multi-Valued Dependencies (MVDs)

A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C

A trivial MVD A mdashgtgt B occurs when either B is a subset of A or

A cup B = R

Fourth Normal Form (4NF)

Fourth normal form (4NF) is based on the idea of multi-valued dependencies

A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies

Fourth Normal Form (4NF) Example

eno mdashgtgtpno

eno mdashgtgtdno

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3

eno ename title salary

E1 J Doe EE 30000

E2 MSmith SA 50000

E3 A Lee ME 40000

E4 J Miller PR 20000

eno pno

E1 P1

E1 P2

E2 P1

E2 P4

E3 P1

E3 P3

E4 P3

eno dno

E2 D1

E2 D2

E2 D3

E3 D2

E3 D3

E4 D3

Lossless-join Dependency

The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation

A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined

Fifth Normal Form (5NF)

bull Fifth normal form (5NF) is based on join dependencies

A relation is in fifth normal form (5NF) if the relation has no join dependency

bull 5NF is also called project-join normal form (PJNF)

A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R

The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have

ΠR1(r) ΠR2(r) hellip ΠRn(r) = r

Fifth Normal Form (5NF) Example

Consider a relation Supply (sname partName projName)

1048715Add the additional constraint that

If project j requires part p

and supplier s supplies part p

and supplier s supplies at least one item to project j

Then supplier s also supplies part p to project j

Original Supply Relation Supply Relation with new constraintsname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

Fifth Normal Form (5NF) Example

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

sname partName

Smith Bolt

Smith Nut

Adam Bolt

Walton Nut

Adam Nail

sname projName

Smith X

Smith Y

Adam Y

Walton Z

Adam X

partName projName

Bolt X

Nut Y

Bolt Y

Nut Z

Nail X

Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples

Desirable Properties in normalization

Relational schemas that are well-designed have several

important properties

1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or

relationship

2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins

3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations

Normal Forms in Practice

Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used

For example query execution time may increase because of normalization as more joins become necessary to answer queries

Normal Forms in Practice Example

For example street and city uniquely determine a zipcode

In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed

When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly

fd1

ENo City Street ZipCode

fd2

ENo ZipCode

ZipCode City Street

Normal Forms in Practice Summary

In summary normalization is typically used in two ways

To improve on relational schemas after ER design

To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies

There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance

This is the classic time versus space tradeoff

Conclusion

Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies

Normal forms indicate when relations satisfy certain properties

1NF - All attributes are atomic

2NF - All attributes are fully functionally dependent on a key

3NF - There are no transitive dependencies in the relation

4NF - There are no multi-valued dependencies in the relation

5NF - The relation has no join dependency

In practice normalization is used to improve schemas produced after ER design and existing relational schemas

It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time

Objectives

Explain process of normalization and why it is beneficial

Describe the concept of the Universal Relation

What are the motivations for normalization

What are the 3 types of update anomalies Be able to give examples

Describe the lossless-join property and dependency preservation property

Define and explain the concept of a functional dependency

What is a trivial FD

Describe the relationship between FDs and keys

What are Armstrongs Axioms When are they used

What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs

Objectives (2)

Define minimal and covers for a set of FDs

Define normal forms be able to list the normal forms and draw a diagram showing their relationship

Define 1NF Know how to convert non-1NF relation to 1NF

Define 2NF Convert 1NF relation to 2NF

Define 3NF Convert 2NF relation to 3NF

Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example

Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm

Describe multi-valued dependencies When do they occur

What is a trivial MVD

Define 4NF Convert 3NFBCNF relation to 4NF

Objectives (3)

Describe lossless-join property and how it relates to 5NF

Define 5NF

Explain normalization in terms of the time versus space tradeoff

Explain the relationship between normalization and ER modeling

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
Page 53: Software School of Hunan University 2006.08 Database Systems Design Part III Normalization

Third Normal Form (3NF) Example

fd2 results in a transitive dependency eno rarr salary Remove it

ENo EName BDate Title Salary SuperNo DNo

Emp relation

fd1

fd2

fd2

Title Salary

Pay relation

ENo EName BDate Title SuperNo DNo

Emp relation

fd1

General Definitions of 2NF and 3NF

We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)

General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key

General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key

Note that a prime attribute is an attribute that is in any key (candidate or primary)

General Definition of 3NF Example

The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute

However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key

Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key

ENo EName BDate Title Salary SuperNo DNo SSN

Emp relation

fd1

fd2

Normalization Question

Consider the universal relation

R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ

List the keys for R

Decompose R into 2NF and then 3NF relations

Boyce-Codd Normal Form (BCNF)

A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key

To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key

The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

ClientInterview relation

Relation is in 3NF but not BCNF

fd4

fd3

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

Note that we lose the FD fd2

fd4

fd3

ClientNo InterviewDate

InterviwTime

StaffNo StaffNo InterviwDate

RoomNo

Interview relation StaffRoom relation

BCNF versus 3NF

We can decompose to BCNF but sometimes we do not want to if we lose a FD

The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency

Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)

In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)

Conversion to BCNF

There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y

Note that if any FD in F+ violates BCNF then a FD in F violates BCNF

2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey

3) Replace R by relations with schemas R1 = X+

R2 = (R ndash X+) U X

4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib

utes are all in R1 or R2

Normalization Question

Given this database schema normalize into BCNF

fd1

PID Country Lot Area Price TaxRate

fd2

fd4fd3

Multi-Valued Dependencies

A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema

A MVD occurs when two independent 1N relationships are in the relational schema

Multi-Valued Dependencies Example

Eno ename Title Salary Pno dno

E1 J Doe EE 30000 P1P2

E2 M Smith SA 50000 P1P4 D1D2D3

E3 A Lee ME 40000 P1P3 D2D3

E4 J Miller PR 20000 P3 D3

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1

E1 J Doe EE 30000 P2

E2 M Smith SA 50000 P1 D1

E2 M Smith SA 50000 P1 D2

E2 M Smith SA 50000 P1 D3

E2 M Smith SA 50000 P4 D1

E2 M Smith SA 50000 P4 D2

E2 M Smith SA 50000 P4 D3

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P3 D3

E3 A Lee ME 40000 P3 D3

E4 J Miller PR 20000 P3 D3

Multi-Valued Dependencies (MVDs)

A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C

A trivial MVD A mdashgtgt B occurs when either B is a subset of A or

A cup B = R

Fourth Normal Form (4NF)

Fourth normal form (4NF) is based on the idea of multi-valued dependencies

A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies

Fourth Normal Form (4NF) Example

eno mdashgtgtpno

eno mdashgtgtdno

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3

eno ename title salary

E1 J Doe EE 30000

E2 MSmith SA 50000

E3 A Lee ME 40000

E4 J Miller PR 20000

eno pno

E1 P1

E1 P2

E2 P1

E2 P4

E3 P1

E3 P3

E4 P3

eno dno

E2 D1

E2 D2

E2 D3

E3 D2

E3 D3

E4 D3

Lossless-join Dependency

The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation

A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined

Fifth Normal Form (5NF)

bull Fifth normal form (5NF) is based on join dependencies

A relation is in fifth normal form (5NF) if the relation has no join dependency

bull 5NF is also called project-join normal form (PJNF)

A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R

The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have

ΠR1(r) ΠR2(r) hellip ΠRn(r) = r

Fifth Normal Form (5NF) Example

Consider a relation Supply (sname partName projName)

1048715Add the additional constraint that

If project j requires part p

and supplier s supplies part p

and supplier s supplies at least one item to project j

Then supplier s also supplies part p to project j

Original Supply Relation Supply Relation with new constraintsname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

Fifth Normal Form (5NF) Example

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

sname partName

Smith Bolt

Smith Nut

Adam Bolt

Walton Nut

Adam Nail

sname projName

Smith X

Smith Y

Adam Y

Walton Z

Adam X

partName projName

Bolt X

Nut Y

Bolt Y

Nut Z

Nail X

Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples

Desirable Properties in normalization

Relational schemas that are well-designed have several

important properties

1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or

relationship

2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins

3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations

Normal Forms in Practice

Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used

For example query execution time may increase because of normalization as more joins become necessary to answer queries

Normal Forms in Practice Example

For example street and city uniquely determine a zipcode

In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed

When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly

fd1

ENo City Street ZipCode

fd2

ENo ZipCode

ZipCode City Street

Normal Forms in Practice Summary

In summary normalization is typically used in two ways

To improve on relational schemas after ER design

To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies

There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance

This is the classic time versus space tradeoff

Conclusion

Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies

Normal forms indicate when relations satisfy certain properties

1NF - All attributes are atomic

2NF - All attributes are fully functionally dependent on a key

3NF - There are no transitive dependencies in the relation

4NF - There are no multi-valued dependencies in the relation

5NF - The relation has no join dependency

In practice normalization is used to improve schemas produced after ER design and existing relational schemas

It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time

Objectives

Explain process of normalization and why it is beneficial

Describe the concept of the Universal Relation

What are the motivations for normalization

What are the 3 types of update anomalies Be able to give examples

Describe the lossless-join property and dependency preservation property

Define and explain the concept of a functional dependency

What is a trivial FD

Describe the relationship between FDs and keys

What are Armstrongs Axioms When are they used

What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs

Objectives (2)

Define minimal and covers for a set of FDs

Define normal forms be able to list the normal forms and draw a diagram showing their relationship

Define 1NF Know how to convert non-1NF relation to 1NF

Define 2NF Convert 1NF relation to 2NF

Define 3NF Convert 2NF relation to 3NF

Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example

Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm

Describe multi-valued dependencies When do they occur

What is a trivial MVD

Define 4NF Convert 3NFBCNF relation to 4NF

Objectives (3)

Describe lossless-join property and how it relates to 5NF

Define 5NF

Explain normalization in terms of the time versus space tradeoff

Explain the relationship between normalization and ER modeling

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
Page 54: Software School of Hunan University 2006.08 Database Systems Design Part III Normalization

General Definitions of 2NF and 3NF

We have defined 2NF and 3NF in terms of primary keys However a more general definition considers all candidate keys (just not the primary key we have chosen)

General definition of 2NFA relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on any candidate key

General definition of 3NFA relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key

Note that a prime attribute is an attribute that is in any key (candidate or primary)

General Definition of 3NF Example

The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute

However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key

Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key

ENo EName BDate Title Salary SuperNo DNo SSN

Emp relation

fd1

fd2

Normalization Question

Consider the universal relation

R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ

List the keys for R

Decompose R into 2NF and then 3NF relations

Boyce-Codd Normal Form (BCNF)

A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key

To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key

The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

ClientInterview relation

Relation is in 3NF but not BCNF

fd4

fd3

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

Note that we lose the FD fd2

fd4

fd3

ClientNo InterviewDate

InterviwTime

StaffNo StaffNo InterviwDate

RoomNo

Interview relation StaffRoom relation

BCNF versus 3NF

We can decompose to BCNF but sometimes we do not want to if we lose a FD

The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency

Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)

In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)

Conversion to BCNF

There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y

Note that if any FD in F+ violates BCNF then a FD in F violates BCNF

2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey

3) Replace R by relations with schemas R1 = X+

R2 = (R ndash X+) U X

4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib

utes are all in R1 or R2

Normalization Question

Given this database schema normalize into BCNF

fd1

PID Country Lot Area Price TaxRate

fd2

fd4fd3

Multi-Valued Dependencies

A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema

A MVD occurs when two independent 1N relationships are in the relational schema

Multi-Valued Dependencies Example

Eno ename Title Salary Pno dno

E1 J Doe EE 30000 P1P2

E2 M Smith SA 50000 P1P4 D1D2D3

E3 A Lee ME 40000 P1P3 D2D3

E4 J Miller PR 20000 P3 D3

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1

E1 J Doe EE 30000 P2

E2 M Smith SA 50000 P1 D1

E2 M Smith SA 50000 P1 D2

E2 M Smith SA 50000 P1 D3

E2 M Smith SA 50000 P4 D1

E2 M Smith SA 50000 P4 D2

E2 M Smith SA 50000 P4 D3

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P3 D3

E3 A Lee ME 40000 P3 D3

E4 J Miller PR 20000 P3 D3

Multi-Valued Dependencies (MVDs)

A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C

A trivial MVD A mdashgtgt B occurs when either B is a subset of A or

A cup B = R

Fourth Normal Form (4NF)

Fourth normal form (4NF) is based on the idea of multi-valued dependencies

A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies

Fourth Normal Form (4NF) Example

eno mdashgtgtpno

eno mdashgtgtdno

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3

eno ename title salary

E1 J Doe EE 30000

E2 MSmith SA 50000

E3 A Lee ME 40000

E4 J Miller PR 20000

eno pno

E1 P1

E1 P2

E2 P1

E2 P4

E3 P1

E3 P3

E4 P3

eno dno

E2 D1

E2 D2

E2 D3

E3 D2

E3 D3

E4 D3

Lossless-join Dependency

The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation

A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined

Fifth Normal Form (5NF)

bull Fifth normal form (5NF) is based on join dependencies

A relation is in fifth normal form (5NF) if the relation has no join dependency

bull 5NF is also called project-join normal form (PJNF)

A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R

The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have

ΠR1(r) ΠR2(r) hellip ΠRn(r) = r

Fifth Normal Form (5NF) Example

Consider a relation Supply (sname partName projName)

1048715Add the additional constraint that

If project j requires part p

and supplier s supplies part p

and supplier s supplies at least one item to project j

Then supplier s also supplies part p to project j

Original Supply Relation Supply Relation with new constraintsname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

Fifth Normal Form (5NF) Example

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

sname partName

Smith Bolt

Smith Nut

Adam Bolt

Walton Nut

Adam Nail

sname projName

Smith X

Smith Y

Adam Y

Walton Z

Adam X

partName projName

Bolt X

Nut Y

Bolt Y

Nut Z

Nail X

Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples

Desirable Properties in normalization

Relational schemas that are well-designed have several

important properties

1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or

relationship

2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins

3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations

Normal Forms in Practice

Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used

For example query execution time may increase because of normalization as more joins become necessary to answer queries

Normal Forms in Practice Example

For example street and city uniquely determine a zipcode

In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed

When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly

fd1

ENo City Street ZipCode

fd2

ENo ZipCode

ZipCode City Street

Normal Forms in Practice Summary

In summary normalization is typically used in two ways

To improve on relational schemas after ER design

To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies

There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance

This is the classic time versus space tradeoff

Conclusion

Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies

Normal forms indicate when relations satisfy certain properties

1NF - All attributes are atomic

2NF - All attributes are fully functionally dependent on a key

3NF - There are no transitive dependencies in the relation

4NF - There are no multi-valued dependencies in the relation

5NF - The relation has no join dependency

In practice normalization is used to improve schemas produced after ER design and existing relational schemas

It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time

Objectives

Explain process of normalization and why it is beneficial

Describe the concept of the Universal Relation

What are the motivations for normalization

What are the 3 types of update anomalies Be able to give examples

Describe the lossless-join property and dependency preservation property

Define and explain the concept of a functional dependency

What is a trivial FD

Describe the relationship between FDs and keys

What are Armstrongs Axioms When are they used

What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs

Objectives (2)

Define minimal and covers for a set of FDs

Define normal forms be able to list the normal forms and draw a diagram showing their relationship

Define 1NF Know how to convert non-1NF relation to 1NF

Define 2NF Convert 1NF relation to 2NF

Define 3NF Convert 2NF relation to 3NF

Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example

Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm

Describe multi-valued dependencies When do they occur

What is a trivial MVD

Define 4NF Convert 3NFBCNF relation to 4NF

Objectives (3)

Describe lossless-join property and how it relates to 5NF

Define 5NF

Explain normalization in terms of the time versus space tradeoff

Explain the relationship between normalization and ER modeling

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
Page 55: Software School of Hunan University 2006.08 Database Systems Design Part III Normalization

General Definition of 3NF Example

The relation is not in 3NF according to the basic definitionbecause SSN is not a primary key attribute

However there is nothing wrong with this schema (no anomalies) because the SSN is a candidate key and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key

Thus the general definition of 2NF and 3NF includes allcandidate keys instead of just the primary key

ENo EName BDate Title Salary SuperNo DNo SSN

Emp relation

fd1

fd2

Normalization Question

Consider the universal relation

R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ

List the keys for R

Decompose R into 2NF and then 3NF relations

Boyce-Codd Normal Form (BCNF)

A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key

To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key

The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

ClientInterview relation

Relation is in 3NF but not BCNF

fd4

fd3

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

Note that we lose the FD fd2

fd4

fd3

ClientNo InterviewDate

InterviwTime

StaffNo StaffNo InterviwDate

RoomNo

Interview relation StaffRoom relation

BCNF versus 3NF

We can decompose to BCNF but sometimes we do not want to if we lose a FD

The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency

Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)

In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)

Conversion to BCNF

There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y

Note that if any FD in F+ violates BCNF then a FD in F violates BCNF

2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey

3) Replace R by relations with schemas R1 = X+

R2 = (R ndash X+) U X

4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib

utes are all in R1 or R2

Normalization Question

Given this database schema normalize into BCNF

fd1

PID Country Lot Area Price TaxRate

fd2

fd4fd3

Multi-Valued Dependencies

A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema

A MVD occurs when two independent 1N relationships are in the relational schema

Multi-Valued Dependencies Example

Eno ename Title Salary Pno dno

E1 J Doe EE 30000 P1P2

E2 M Smith SA 50000 P1P4 D1D2D3

E3 A Lee ME 40000 P1P3 D2D3

E4 J Miller PR 20000 P3 D3

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1

E1 J Doe EE 30000 P2

E2 M Smith SA 50000 P1 D1

E2 M Smith SA 50000 P1 D2

E2 M Smith SA 50000 P1 D3

E2 M Smith SA 50000 P4 D1

E2 M Smith SA 50000 P4 D2

E2 M Smith SA 50000 P4 D3

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P3 D3

E3 A Lee ME 40000 P3 D3

E4 J Miller PR 20000 P3 D3

Multi-Valued Dependencies (MVDs)

A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C

A trivial MVD A mdashgtgt B occurs when either B is a subset of A or

A cup B = R

Fourth Normal Form (4NF)

Fourth normal form (4NF) is based on the idea of multi-valued dependencies

A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies

Fourth Normal Form (4NF) Example

eno mdashgtgtpno

eno mdashgtgtdno

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3

eno ename title salary

E1 J Doe EE 30000

E2 MSmith SA 50000

E3 A Lee ME 40000

E4 J Miller PR 20000

eno pno

E1 P1

E1 P2

E2 P1

E2 P4

E3 P1

E3 P3

E4 P3

eno dno

E2 D1

E2 D2

E2 D3

E3 D2

E3 D3

E4 D3

Lossless-join Dependency

The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation

A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined

Fifth Normal Form (5NF)

bull Fifth normal form (5NF) is based on join dependencies

A relation is in fifth normal form (5NF) if the relation has no join dependency

bull 5NF is also called project-join normal form (PJNF)

A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R

The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have

ΠR1(r) ΠR2(r) hellip ΠRn(r) = r

Fifth Normal Form (5NF) Example

Consider a relation Supply (sname partName projName)

1048715Add the additional constraint that

If project j requires part p

and supplier s supplies part p

and supplier s supplies at least one item to project j

Then supplier s also supplies part p to project j

Original Supply Relation Supply Relation with new constraintsname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

Fifth Normal Form (5NF) Example

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

sname partName

Smith Bolt

Smith Nut

Adam Bolt

Walton Nut

Adam Nail

sname projName

Smith X

Smith Y

Adam Y

Walton Z

Adam X

partName projName

Bolt X

Nut Y

Bolt Y

Nut Z

Nail X

Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples

Desirable Properties in normalization

Relational schemas that are well-designed have several

important properties

1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or

relationship

2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins

3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations

Normal Forms in Practice

Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used

For example query execution time may increase because of normalization as more joins become necessary to answer queries

Normal Forms in Practice Example

For example street and city uniquely determine a zipcode

In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed

When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly

fd1

ENo City Street ZipCode

fd2

ENo ZipCode

ZipCode City Street

Normal Forms in Practice Summary

In summary normalization is typically used in two ways

To improve on relational schemas after ER design

To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies

There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance

This is the classic time versus space tradeoff

Conclusion

Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies

Normal forms indicate when relations satisfy certain properties

1NF - All attributes are atomic

2NF - All attributes are fully functionally dependent on a key

3NF - There are no transitive dependencies in the relation

4NF - There are no multi-valued dependencies in the relation

5NF - The relation has no join dependency

In practice normalization is used to improve schemas produced after ER design and existing relational schemas

It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time

Objectives

Explain process of normalization and why it is beneficial

Describe the concept of the Universal Relation

What are the motivations for normalization

What are the 3 types of update anomalies Be able to give examples

Describe the lossless-join property and dependency preservation property

Define and explain the concept of a functional dependency

What is a trivial FD

Describe the relationship between FDs and keys

What are Armstrongs Axioms When are they used

What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs

Objectives (2)

Define minimal and covers for a set of FDs

Define normal forms be able to list the normal forms and draw a diagram showing their relationship

Define 1NF Know how to convert non-1NF relation to 1NF

Define 2NF Convert 1NF relation to 2NF

Define 3NF Convert 2NF relation to 3NF

Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example

Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm

Describe multi-valued dependencies When do they occur

What is a trivial MVD

Define 4NF Convert 3NFBCNF relation to 4NF

Objectives (3)

Describe lossless-join property and how it relates to 5NF

Define 5NF

Explain normalization in terms of the time versus space tradeoff

Explain the relationship between normalization and ER modeling

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
Page 56: Software School of Hunan University 2006.08 Database Systems Design Part III Normalization

Normalization Question

Consider the universal relation

R(ABCDEFGHIJ) and the set of functional dependenciesF= AB rarr C A rarr DE B rarr F F rarr GH D rarr IJ

List the keys for R

Decompose R into 2NF and then 3NF relations

Boyce-Codd Normal Form (BCNF)

A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key

To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key

The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

ClientInterview relation

Relation is in 3NF but not BCNF

fd4

fd3

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

Note that we lose the FD fd2

fd4

fd3

ClientNo InterviewDate

InterviwTime

StaffNo StaffNo InterviwDate

RoomNo

Interview relation StaffRoom relation

BCNF versus 3NF

We can decompose to BCNF but sometimes we do not want to if we lose a FD

The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency

Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)

In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)

Conversion to BCNF

There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y

Note that if any FD in F+ violates BCNF then a FD in F violates BCNF

2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey

3) Replace R by relations with schemas R1 = X+

R2 = (R ndash X+) U X

4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib

utes are all in R1 or R2

Normalization Question

Given this database schema normalize into BCNF

fd1

PID Country Lot Area Price TaxRate

fd2

fd4fd3

Multi-Valued Dependencies

A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema

A MVD occurs when two independent 1N relationships are in the relational schema

Multi-Valued Dependencies Example

Eno ename Title Salary Pno dno

E1 J Doe EE 30000 P1P2

E2 M Smith SA 50000 P1P4 D1D2D3

E3 A Lee ME 40000 P1P3 D2D3

E4 J Miller PR 20000 P3 D3

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1

E1 J Doe EE 30000 P2

E2 M Smith SA 50000 P1 D1

E2 M Smith SA 50000 P1 D2

E2 M Smith SA 50000 P1 D3

E2 M Smith SA 50000 P4 D1

E2 M Smith SA 50000 P4 D2

E2 M Smith SA 50000 P4 D3

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P3 D3

E3 A Lee ME 40000 P3 D3

E4 J Miller PR 20000 P3 D3

Multi-Valued Dependencies (MVDs)

A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C

A trivial MVD A mdashgtgt B occurs when either B is a subset of A or

A cup B = R

Fourth Normal Form (4NF)

Fourth normal form (4NF) is based on the idea of multi-valued dependencies

A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies

Fourth Normal Form (4NF) Example

eno mdashgtgtpno

eno mdashgtgtdno

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3

eno ename title salary

E1 J Doe EE 30000

E2 MSmith SA 50000

E3 A Lee ME 40000

E4 J Miller PR 20000

eno pno

E1 P1

E1 P2

E2 P1

E2 P4

E3 P1

E3 P3

E4 P3

eno dno

E2 D1

E2 D2

E2 D3

E3 D2

E3 D3

E4 D3

Lossless-join Dependency

The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation

A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined

Fifth Normal Form (5NF)

bull Fifth normal form (5NF) is based on join dependencies

A relation is in fifth normal form (5NF) if the relation has no join dependency

bull 5NF is also called project-join normal form (PJNF)

A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R

The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have

ΠR1(r) ΠR2(r) hellip ΠRn(r) = r

Fifth Normal Form (5NF) Example

Consider a relation Supply (sname partName projName)

1048715Add the additional constraint that

If project j requires part p

and supplier s supplies part p

and supplier s supplies at least one item to project j

Then supplier s also supplies part p to project j

Original Supply Relation Supply Relation with new constraintsname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

Fifth Normal Form (5NF) Example

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

sname partName

Smith Bolt

Smith Nut

Adam Bolt

Walton Nut

Adam Nail

sname projName

Smith X

Smith Y

Adam Y

Walton Z

Adam X

partName projName

Bolt X

Nut Y

Bolt Y

Nut Z

Nail X

Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples

Desirable Properties in normalization

Relational schemas that are well-designed have several

important properties

1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or

relationship

2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins

3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations

Normal Forms in Practice

Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used

For example query execution time may increase because of normalization as more joins become necessary to answer queries

Normal Forms in Practice Example

For example street and city uniquely determine a zipcode

In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed

When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly

fd1

ENo City Street ZipCode

fd2

ENo ZipCode

ZipCode City Street

Normal Forms in Practice Summary

In summary normalization is typically used in two ways

To improve on relational schemas after ER design

To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies

There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance

This is the classic time versus space tradeoff

Conclusion

Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies

Normal forms indicate when relations satisfy certain properties

1NF - All attributes are atomic

2NF - All attributes are fully functionally dependent on a key

3NF - There are no transitive dependencies in the relation

4NF - There are no multi-valued dependencies in the relation

5NF - The relation has no join dependency

In practice normalization is used to improve schemas produced after ER design and existing relational schemas

It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time

Objectives

Explain process of normalization and why it is beneficial

Describe the concept of the Universal Relation

What are the motivations for normalization

What are the 3 types of update anomalies Be able to give examples

Describe the lossless-join property and dependency preservation property

Define and explain the concept of a functional dependency

What is a trivial FD

Describe the relationship between FDs and keys

What are Armstrongs Axioms When are they used

What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs

Objectives (2)

Define minimal and covers for a set of FDs

Define normal forms be able to list the normal forms and draw a diagram showing their relationship

Define 1NF Know how to convert non-1NF relation to 1NF

Define 2NF Convert 1NF relation to 2NF

Define 3NF Convert 2NF relation to 3NF

Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example

Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm

Describe multi-valued dependencies When do they occur

What is a trivial MVD

Define 4NF Convert 3NFBCNF relation to 4NF

Objectives (3)

Describe lossless-join property and how it relates to 5NF

Define 5NF

Explain normalization in terms of the time versus space tradeoff

Explain the relationship between normalization and ER modeling

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
Page 57: Software School of Hunan University 2006.08 Database Systems Design Part III Normalization

Boyce-Codd Normal Form (BCNF)

A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key

To test if a relation is in BCNF we take the determinant of each FD in the relation and determine if it is a candidate key

The difference between 3NF and BCNF is that 3NF allows aFD X rarr Y to remain in the relation if Y is a primary key and X is not a candidate key BCNF only allows this FD if X is a candidate key Thus BCNF is more restrictive than 3NF However in practice most relations in 3NF are also in BCNF

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

ClientInterview relation

Relation is in 3NF but not BCNF

fd4

fd3

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

Note that we lose the FD fd2

fd4

fd3

ClientNo InterviewDate

InterviwTime

StaffNo StaffNo InterviwDate

RoomNo

Interview relation StaffRoom relation

BCNF versus 3NF

We can decompose to BCNF but sometimes we do not want to if we lose a FD

The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency

Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)

In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)

Conversion to BCNF

There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y

Note that if any FD in F+ violates BCNF then a FD in F violates BCNF

2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey

3) Replace R by relations with schemas R1 = X+

R2 = (R ndash X+) U X

4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib

utes are all in R1 or R2

Normalization Question

Given this database schema normalize into BCNF

fd1

PID Country Lot Area Price TaxRate

fd2

fd4fd3

Multi-Valued Dependencies

A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema

A MVD occurs when two independent 1N relationships are in the relational schema

Multi-Valued Dependencies Example

Eno ename Title Salary Pno dno

E1 J Doe EE 30000 P1P2

E2 M Smith SA 50000 P1P4 D1D2D3

E3 A Lee ME 40000 P1P3 D2D3

E4 J Miller PR 20000 P3 D3

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1

E1 J Doe EE 30000 P2

E2 M Smith SA 50000 P1 D1

E2 M Smith SA 50000 P1 D2

E2 M Smith SA 50000 P1 D3

E2 M Smith SA 50000 P4 D1

E2 M Smith SA 50000 P4 D2

E2 M Smith SA 50000 P4 D3

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P3 D3

E3 A Lee ME 40000 P3 D3

E4 J Miller PR 20000 P3 D3

Multi-Valued Dependencies (MVDs)

A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C

A trivial MVD A mdashgtgt B occurs when either B is a subset of A or

A cup B = R

Fourth Normal Form (4NF)

Fourth normal form (4NF) is based on the idea of multi-valued dependencies

A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies

Fourth Normal Form (4NF) Example

eno mdashgtgtpno

eno mdashgtgtdno

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3

eno ename title salary

E1 J Doe EE 30000

E2 MSmith SA 50000

E3 A Lee ME 40000

E4 J Miller PR 20000

eno pno

E1 P1

E1 P2

E2 P1

E2 P4

E3 P1

E3 P3

E4 P3

eno dno

E2 D1

E2 D2

E2 D3

E3 D2

E3 D3

E4 D3

Lossless-join Dependency

The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation

A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined

Fifth Normal Form (5NF)

bull Fifth normal form (5NF) is based on join dependencies

A relation is in fifth normal form (5NF) if the relation has no join dependency

bull 5NF is also called project-join normal form (PJNF)

A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R

The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have

ΠR1(r) ΠR2(r) hellip ΠRn(r) = r

Fifth Normal Form (5NF) Example

Consider a relation Supply (sname partName projName)

1048715Add the additional constraint that

If project j requires part p

and supplier s supplies part p

and supplier s supplies at least one item to project j

Then supplier s also supplies part p to project j

Original Supply Relation Supply Relation with new constraintsname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

Fifth Normal Form (5NF) Example

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

sname partName

Smith Bolt

Smith Nut

Adam Bolt

Walton Nut

Adam Nail

sname projName

Smith X

Smith Y

Adam Y

Walton Z

Adam X

partName projName

Bolt X

Nut Y

Bolt Y

Nut Z

Nail X

Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples

Desirable Properties in normalization

Relational schemas that are well-designed have several

important properties

1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or

relationship

2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins

3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations

Normal Forms in Practice

Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used

For example query execution time may increase because of normalization as more joins become necessary to answer queries

Normal Forms in Practice Example

For example street and city uniquely determine a zipcode

In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed

When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly

fd1

ENo City Street ZipCode

fd2

ENo ZipCode

ZipCode City Street

Normal Forms in Practice Summary

In summary normalization is typically used in two ways

To improve on relational schemas after ER design

To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies

There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance

This is the classic time versus space tradeoff

Conclusion

Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies

Normal forms indicate when relations satisfy certain properties

1NF - All attributes are atomic

2NF - All attributes are fully functionally dependent on a key

3NF - There are no transitive dependencies in the relation

4NF - There are no multi-valued dependencies in the relation

5NF - The relation has no join dependency

In practice normalization is used to improve schemas produced after ER design and existing relational schemas

It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time

Objectives

Explain process of normalization and why it is beneficial

Describe the concept of the Universal Relation

What are the motivations for normalization

What are the 3 types of update anomalies Be able to give examples

Describe the lossless-join property and dependency preservation property

Define and explain the concept of a functional dependency

What is a trivial FD

Describe the relationship between FDs and keys

What are Armstrongs Axioms When are they used

What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs

Objectives (2)

Define minimal and covers for a set of FDs

Define normal forms be able to list the normal forms and draw a diagram showing their relationship

Define 1NF Know how to convert non-1NF relation to 1NF

Define 2NF Convert 1NF relation to 2NF

Define 3NF Convert 2NF relation to 3NF

Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example

Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm

Describe multi-valued dependencies When do they occur

What is a trivial MVD

Define 4NF Convert 3NFBCNF relation to 4NF

Objectives (3)

Describe lossless-join property and how it relates to 5NF

Define 5NF

Explain normalization in terms of the time versus space tradeoff

Explain the relationship between normalization and ER modeling

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
Page 58: Software School of Hunan University 2006.08 Database Systems Design Part III Normalization

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

ClientInterview relation

Relation is in 3NF but not BCNF

fd4

fd3

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

Note that we lose the FD fd2

fd4

fd3

ClientNo InterviewDate

InterviwTime

StaffNo StaffNo InterviwDate

RoomNo

Interview relation StaffRoom relation

BCNF versus 3NF

We can decompose to BCNF but sometimes we do not want to if we lose a FD

The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency

Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)

In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)

Conversion to BCNF

There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y

Note that if any FD in F+ violates BCNF then a FD in F violates BCNF

2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey

3) Replace R by relations with schemas R1 = X+

R2 = (R ndash X+) U X

4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib

utes are all in R1 or R2

Normalization Question

Given this database schema normalize into BCNF

fd1

PID Country Lot Area Price TaxRate

fd2

fd4fd3

Multi-Valued Dependencies

A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema

A MVD occurs when two independent 1N relationships are in the relational schema

Multi-Valued Dependencies Example

Eno ename Title Salary Pno dno

E1 J Doe EE 30000 P1P2

E2 M Smith SA 50000 P1P4 D1D2D3

E3 A Lee ME 40000 P1P3 D2D3

E4 J Miller PR 20000 P3 D3

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1

E1 J Doe EE 30000 P2

E2 M Smith SA 50000 P1 D1

E2 M Smith SA 50000 P1 D2

E2 M Smith SA 50000 P1 D3

E2 M Smith SA 50000 P4 D1

E2 M Smith SA 50000 P4 D2

E2 M Smith SA 50000 P4 D3

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P3 D3

E3 A Lee ME 40000 P3 D3

E4 J Miller PR 20000 P3 D3

Multi-Valued Dependencies (MVDs)

A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C

A trivial MVD A mdashgtgt B occurs when either B is a subset of A or

A cup B = R

Fourth Normal Form (4NF)

Fourth normal form (4NF) is based on the idea of multi-valued dependencies

A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies

Fourth Normal Form (4NF) Example

eno mdashgtgtpno

eno mdashgtgtdno

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3

eno ename title salary

E1 J Doe EE 30000

E2 MSmith SA 50000

E3 A Lee ME 40000

E4 J Miller PR 20000

eno pno

E1 P1

E1 P2

E2 P1

E2 P4

E3 P1

E3 P3

E4 P3

eno dno

E2 D1

E2 D2

E2 D3

E3 D2

E3 D3

E4 D3

Lossless-join Dependency

The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation

A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined

Fifth Normal Form (5NF)

bull Fifth normal form (5NF) is based on join dependencies

A relation is in fifth normal form (5NF) if the relation has no join dependency

bull 5NF is also called project-join normal form (PJNF)

A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R

The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have

ΠR1(r) ΠR2(r) hellip ΠRn(r) = r

Fifth Normal Form (5NF) Example

Consider a relation Supply (sname partName projName)

1048715Add the additional constraint that

If project j requires part p

and supplier s supplies part p

and supplier s supplies at least one item to project j

Then supplier s also supplies part p to project j

Original Supply Relation Supply Relation with new constraintsname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

Fifth Normal Form (5NF) Example

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

sname partName

Smith Bolt

Smith Nut

Adam Bolt

Walton Nut

Adam Nail

sname projName

Smith X

Smith Y

Adam Y

Walton Z

Adam X

partName projName

Bolt X

Nut Y

Bolt Y

Nut Z

Nail X

Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples

Desirable Properties in normalization

Relational schemas that are well-designed have several

important properties

1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or

relationship

2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins

3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations

Normal Forms in Practice

Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used

For example query execution time may increase because of normalization as more joins become necessary to answer queries

Normal Forms in Practice Example

For example street and city uniquely determine a zipcode

In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed

When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly

fd1

ENo City Street ZipCode

fd2

ENo ZipCode

ZipCode City Street

Normal Forms in Practice Summary

In summary normalization is typically used in two ways

To improve on relational schemas after ER design

To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies

There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance

This is the classic time versus space tradeoff

Conclusion

Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies

Normal forms indicate when relations satisfy certain properties

1NF - All attributes are atomic

2NF - All attributes are fully functionally dependent on a key

3NF - There are no transitive dependencies in the relation

4NF - There are no multi-valued dependencies in the relation

5NF - The relation has no join dependency

In practice normalization is used to improve schemas produced after ER design and existing relational schemas

It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time

Objectives

Explain process of normalization and why it is beneficial

Describe the concept of the Universal Relation

What are the motivations for normalization

What are the 3 types of update anomalies Be able to give examples

Describe the lossless-join property and dependency preservation property

Define and explain the concept of a functional dependency

What is a trivial FD

Describe the relationship between FDs and keys

What are Armstrongs Axioms When are they used

What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs

Objectives (2)

Define minimal and covers for a set of FDs

Define normal forms be able to list the normal forms and draw a diagram showing their relationship

Define 1NF Know how to convert non-1NF relation to 1NF

Define 2NF Convert 1NF relation to 2NF

Define 3NF Convert 2NF relation to 3NF

Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example

Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm

Describe multi-valued dependencies When do they occur

What is a trivial MVD

Define 4NF Convert 3NFBCNF relation to 4NF

Objectives (3)

Describe lossless-join property and how it relates to 5NF

Define 5NF

Explain normalization in terms of the time versus space tradeoff

Explain the relationship between normalization and ER modeling

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
Page 59: Software School of Hunan University 2006.08 Database Systems Design Part III Normalization

Boyce-Codd Normal Form (BCNF)Example

Consider the ClientInterview relation

fd1

ClientNo InterviewDate InterviwTime StaffNo RoomNo

fd2

Note that we lose the FD fd2

fd4

fd3

ClientNo InterviewDate

InterviwTime

StaffNo StaffNo InterviwDate

RoomNo

Interview relation StaffRoom relation

BCNF versus 3NF

We can decompose to BCNF but sometimes we do not want to if we lose a FD

The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency

Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)

In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)

Conversion to BCNF

There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y

Note that if any FD in F+ violates BCNF then a FD in F violates BCNF

2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey

3) Replace R by relations with schemas R1 = X+

R2 = (R ndash X+) U X

4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib

utes are all in R1 or R2

Normalization Question

Given this database schema normalize into BCNF

fd1

PID Country Lot Area Price TaxRate

fd2

fd4fd3

Multi-Valued Dependencies

A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema

A MVD occurs when two independent 1N relationships are in the relational schema

Multi-Valued Dependencies Example

Eno ename Title Salary Pno dno

E1 J Doe EE 30000 P1P2

E2 M Smith SA 50000 P1P4 D1D2D3

E3 A Lee ME 40000 P1P3 D2D3

E4 J Miller PR 20000 P3 D3

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1

E1 J Doe EE 30000 P2

E2 M Smith SA 50000 P1 D1

E2 M Smith SA 50000 P1 D2

E2 M Smith SA 50000 P1 D3

E2 M Smith SA 50000 P4 D1

E2 M Smith SA 50000 P4 D2

E2 M Smith SA 50000 P4 D3

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P3 D3

E3 A Lee ME 40000 P3 D3

E4 J Miller PR 20000 P3 D3

Multi-Valued Dependencies (MVDs)

A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C

A trivial MVD A mdashgtgt B occurs when either B is a subset of A or

A cup B = R

Fourth Normal Form (4NF)

Fourth normal form (4NF) is based on the idea of multi-valued dependencies

A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies

Fourth Normal Form (4NF) Example

eno mdashgtgtpno

eno mdashgtgtdno

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3

eno ename title salary

E1 J Doe EE 30000

E2 MSmith SA 50000

E3 A Lee ME 40000

E4 J Miller PR 20000

eno pno

E1 P1

E1 P2

E2 P1

E2 P4

E3 P1

E3 P3

E4 P3

eno dno

E2 D1

E2 D2

E2 D3

E3 D2

E3 D3

E4 D3

Lossless-join Dependency

The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation

A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined

Fifth Normal Form (5NF)

bull Fifth normal form (5NF) is based on join dependencies

A relation is in fifth normal form (5NF) if the relation has no join dependency

bull 5NF is also called project-join normal form (PJNF)

A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R

The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have

ΠR1(r) ΠR2(r) hellip ΠRn(r) = r

Fifth Normal Form (5NF) Example

Consider a relation Supply (sname partName projName)

1048715Add the additional constraint that

If project j requires part p

and supplier s supplies part p

and supplier s supplies at least one item to project j

Then supplier s also supplies part p to project j

Original Supply Relation Supply Relation with new constraintsname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

Fifth Normal Form (5NF) Example

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

sname partName

Smith Bolt

Smith Nut

Adam Bolt

Walton Nut

Adam Nail

sname projName

Smith X

Smith Y

Adam Y

Walton Z

Adam X

partName projName

Bolt X

Nut Y

Bolt Y

Nut Z

Nail X

Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples

Desirable Properties in normalization

Relational schemas that are well-designed have several

important properties

1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or

relationship

2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins

3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations

Normal Forms in Practice

Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used

For example query execution time may increase because of normalization as more joins become necessary to answer queries

Normal Forms in Practice Example

For example street and city uniquely determine a zipcode

In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed

When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly

fd1

ENo City Street ZipCode

fd2

ENo ZipCode

ZipCode City Street

Normal Forms in Practice Summary

In summary normalization is typically used in two ways

To improve on relational schemas after ER design

To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies

There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance

This is the classic time versus space tradeoff

Conclusion

Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies

Normal forms indicate when relations satisfy certain properties

1NF - All attributes are atomic

2NF - All attributes are fully functionally dependent on a key

3NF - There are no transitive dependencies in the relation

4NF - There are no multi-valued dependencies in the relation

5NF - The relation has no join dependency

In practice normalization is used to improve schemas produced after ER design and existing relational schemas

It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time

Objectives

Explain process of normalization and why it is beneficial

Describe the concept of the Universal Relation

What are the motivations for normalization

What are the 3 types of update anomalies Be able to give examples

Describe the lossless-join property and dependency preservation property

Define and explain the concept of a functional dependency

What is a trivial FD

Describe the relationship between FDs and keys

What are Armstrongs Axioms When are they used

What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs

Objectives (2)

Define minimal and covers for a set of FDs

Define normal forms be able to list the normal forms and draw a diagram showing their relationship

Define 1NF Know how to convert non-1NF relation to 1NF

Define 2NF Convert 1NF relation to 2NF

Define 3NF Convert 2NF relation to 3NF

Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example

Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm

Describe multi-valued dependencies When do they occur

What is a trivial MVD

Define 4NF Convert 3NFBCNF relation to 4NF

Objectives (3)

Describe lossless-join property and how it relates to 5NF

Define 5NF

Explain normalization in terms of the time versus space tradeoff

Explain the relationship between normalization and ER modeling

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
Page 60: Software School of Hunan University 2006.08 Database Systems Design Part III Normalization

BCNF versus 3NF

We can decompose to BCNF but sometimes we do not want to if we lose a FD

The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and the willingness to lose a functional dependency

Note that we can always preserve the lossless-join (无损联接性) property (recovery) with a BCNF decomposition but we do not always get dependency preservation( 依赖保留性)

In contrast we get both recovery and dependency preservation with a 3NF decomposition (fd3)

Conversion to BCNF

There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y

Note that if any FD in F+ violates BCNF then a FD in F violates BCNF

2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey

3) Replace R by relations with schemas R1 = X+

R2 = (R ndash X+) U X

4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib

utes are all in R1 or R2

Normalization Question

Given this database schema normalize into BCNF

fd1

PID Country Lot Area Price TaxRate

fd2

fd4fd3

Multi-Valued Dependencies

A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema

A MVD occurs when two independent 1N relationships are in the relational schema

Multi-Valued Dependencies Example

Eno ename Title Salary Pno dno

E1 J Doe EE 30000 P1P2

E2 M Smith SA 50000 P1P4 D1D2D3

E3 A Lee ME 40000 P1P3 D2D3

E4 J Miller PR 20000 P3 D3

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1

E1 J Doe EE 30000 P2

E2 M Smith SA 50000 P1 D1

E2 M Smith SA 50000 P1 D2

E2 M Smith SA 50000 P1 D3

E2 M Smith SA 50000 P4 D1

E2 M Smith SA 50000 P4 D2

E2 M Smith SA 50000 P4 D3

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P3 D3

E3 A Lee ME 40000 P3 D3

E4 J Miller PR 20000 P3 D3

Multi-Valued Dependencies (MVDs)

A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C

A trivial MVD A mdashgtgt B occurs when either B is a subset of A or

A cup B = R

Fourth Normal Form (4NF)

Fourth normal form (4NF) is based on the idea of multi-valued dependencies

A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies

Fourth Normal Form (4NF) Example

eno mdashgtgtpno

eno mdashgtgtdno

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3

eno ename title salary

E1 J Doe EE 30000

E2 MSmith SA 50000

E3 A Lee ME 40000

E4 J Miller PR 20000

eno pno

E1 P1

E1 P2

E2 P1

E2 P4

E3 P1

E3 P3

E4 P3

eno dno

E2 D1

E2 D2

E2 D3

E3 D2

E3 D3

E4 D3

Lossless-join Dependency

The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation

A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined

Fifth Normal Form (5NF)

bull Fifth normal form (5NF) is based on join dependencies

A relation is in fifth normal form (5NF) if the relation has no join dependency

bull 5NF is also called project-join normal form (PJNF)

A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R

The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have

ΠR1(r) ΠR2(r) hellip ΠRn(r) = r

Fifth Normal Form (5NF) Example

Consider a relation Supply (sname partName projName)

1048715Add the additional constraint that

If project j requires part p

and supplier s supplies part p

and supplier s supplies at least one item to project j

Then supplier s also supplies part p to project j

Original Supply Relation Supply Relation with new constraintsname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

Fifth Normal Form (5NF) Example

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

sname partName

Smith Bolt

Smith Nut

Adam Bolt

Walton Nut

Adam Nail

sname projName

Smith X

Smith Y

Adam Y

Walton Z

Adam X

partName projName

Bolt X

Nut Y

Bolt Y

Nut Z

Nail X

Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples

Desirable Properties in normalization

Relational schemas that are well-designed have several

important properties

1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or

relationship

2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins

3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations

Normal Forms in Practice

Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used

For example query execution time may increase because of normalization as more joins become necessary to answer queries

Normal Forms in Practice Example

For example street and city uniquely determine a zipcode

In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed

When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly

fd1

ENo City Street ZipCode

fd2

ENo ZipCode

ZipCode City Street

Normal Forms in Practice Summary

In summary normalization is typically used in two ways

To improve on relational schemas after ER design

To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies

There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance

This is the classic time versus space tradeoff

Conclusion

Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies

Normal forms indicate when relations satisfy certain properties

1NF - All attributes are atomic

2NF - All attributes are fully functionally dependent on a key

3NF - There are no transitive dependencies in the relation

4NF - There are no multi-valued dependencies in the relation

5NF - The relation has no join dependency

In practice normalization is used to improve schemas produced after ER design and existing relational schemas

It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time

Objectives

Explain process of normalization and why it is beneficial

Describe the concept of the Universal Relation

What are the motivations for normalization

What are the 3 types of update anomalies Be able to give examples

Describe the lossless-join property and dependency preservation property

Define and explain the concept of a functional dependency

What is a trivial FD

Describe the relationship between FDs and keys

What are Armstrongs Axioms When are they used

What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs

Objectives (2)

Define minimal and covers for a set of FDs

Define normal forms be able to list the normal forms and draw a diagram showing their relationship

Define 1NF Know how to convert non-1NF relation to 1NF

Define 2NF Convert 1NF relation to 2NF

Define 3NF Convert 2NF relation to 3NF

Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example

Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm

Describe multi-valued dependencies When do they occur

What is a trivial MVD

Define 4NF Convert 3NFBCNF relation to 4NF

Objectives (3)

Describe lossless-join property and how it relates to 5NF

Define 5NF

Explain normalization in terms of the time versus space tradeoff

Explain the relationship between normalization and ER modeling

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
Page 61: Software School of Hunan University 2006.08 Database Systems Design Part III Normalization

Conversion to BCNF

There is a direct algorithm for converting to BCNF withoutgoing through 2NF and 3NF given relation R with FDs F1) Look among the given FDs for a BCNF violation X rarr Y

Note that if any FD in F+ violates BCNF then a FD in F violates BCNF

2) Compute X+ Note that X+ cannot be the set of all attributes or else X is a superkey

3) Replace R by relations with schemas R1 = X+

R2 = (R ndash X+) U X

4) Project the FDs F onto the two new relations by Computing the closure of F and then using only the FDs whose attrib

utes are all in R1 or R2

Normalization Question

Given this database schema normalize into BCNF

fd1

PID Country Lot Area Price TaxRate

fd2

fd4fd3

Multi-Valued Dependencies

A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema

A MVD occurs when two independent 1N relationships are in the relational schema

Multi-Valued Dependencies Example

Eno ename Title Salary Pno dno

E1 J Doe EE 30000 P1P2

E2 M Smith SA 50000 P1P4 D1D2D3

E3 A Lee ME 40000 P1P3 D2D3

E4 J Miller PR 20000 P3 D3

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1

E1 J Doe EE 30000 P2

E2 M Smith SA 50000 P1 D1

E2 M Smith SA 50000 P1 D2

E2 M Smith SA 50000 P1 D3

E2 M Smith SA 50000 P4 D1

E2 M Smith SA 50000 P4 D2

E2 M Smith SA 50000 P4 D3

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P3 D3

E3 A Lee ME 40000 P3 D3

E4 J Miller PR 20000 P3 D3

Multi-Valued Dependencies (MVDs)

A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C

A trivial MVD A mdashgtgt B occurs when either B is a subset of A or

A cup B = R

Fourth Normal Form (4NF)

Fourth normal form (4NF) is based on the idea of multi-valued dependencies

A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies

Fourth Normal Form (4NF) Example

eno mdashgtgtpno

eno mdashgtgtdno

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3

eno ename title salary

E1 J Doe EE 30000

E2 MSmith SA 50000

E3 A Lee ME 40000

E4 J Miller PR 20000

eno pno

E1 P1

E1 P2

E2 P1

E2 P4

E3 P1

E3 P3

E4 P3

eno dno

E2 D1

E2 D2

E2 D3

E3 D2

E3 D3

E4 D3

Lossless-join Dependency

The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation

A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined

Fifth Normal Form (5NF)

bull Fifth normal form (5NF) is based on join dependencies

A relation is in fifth normal form (5NF) if the relation has no join dependency

bull 5NF is also called project-join normal form (PJNF)

A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R

The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have

ΠR1(r) ΠR2(r) hellip ΠRn(r) = r

Fifth Normal Form (5NF) Example

Consider a relation Supply (sname partName projName)

1048715Add the additional constraint that

If project j requires part p

and supplier s supplies part p

and supplier s supplies at least one item to project j

Then supplier s also supplies part p to project j

Original Supply Relation Supply Relation with new constraintsname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

Fifth Normal Form (5NF) Example

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

sname partName

Smith Bolt

Smith Nut

Adam Bolt

Walton Nut

Adam Nail

sname projName

Smith X

Smith Y

Adam Y

Walton Z

Adam X

partName projName

Bolt X

Nut Y

Bolt Y

Nut Z

Nail X

Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples

Desirable Properties in normalization

Relational schemas that are well-designed have several

important properties

1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or

relationship

2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins

3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations

Normal Forms in Practice

Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used

For example query execution time may increase because of normalization as more joins become necessary to answer queries

Normal Forms in Practice Example

For example street and city uniquely determine a zipcode

In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed

When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly

fd1

ENo City Street ZipCode

fd2

ENo ZipCode

ZipCode City Street

Normal Forms in Practice Summary

In summary normalization is typically used in two ways

To improve on relational schemas after ER design

To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies

There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance

This is the classic time versus space tradeoff

Conclusion

Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies

Normal forms indicate when relations satisfy certain properties

1NF - All attributes are atomic

2NF - All attributes are fully functionally dependent on a key

3NF - There are no transitive dependencies in the relation

4NF - There are no multi-valued dependencies in the relation

5NF - The relation has no join dependency

In practice normalization is used to improve schemas produced after ER design and existing relational schemas

It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time

Objectives

Explain process of normalization and why it is beneficial

Describe the concept of the Universal Relation

What are the motivations for normalization

What are the 3 types of update anomalies Be able to give examples

Describe the lossless-join property and dependency preservation property

Define and explain the concept of a functional dependency

What is a trivial FD

Describe the relationship between FDs and keys

What are Armstrongs Axioms When are they used

What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs

Objectives (2)

Define minimal and covers for a set of FDs

Define normal forms be able to list the normal forms and draw a diagram showing their relationship

Define 1NF Know how to convert non-1NF relation to 1NF

Define 2NF Convert 1NF relation to 2NF

Define 3NF Convert 2NF relation to 3NF

Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example

Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm

Describe multi-valued dependencies When do they occur

What is a trivial MVD

Define 4NF Convert 3NFBCNF relation to 4NF

Objectives (3)

Describe lossless-join property and how it relates to 5NF

Define 5NF

Explain normalization in terms of the time versus space tradeoff

Explain the relationship between normalization and ER modeling

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
Page 62: Software School of Hunan University 2006.08 Database Systems Design Part III Normalization

Normalization Question

Given this database schema normalize into BCNF

fd1

PID Country Lot Area Price TaxRate

fd2

fd4fd3

Multi-Valued Dependencies

A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema

A MVD occurs when two independent 1N relationships are in the relational schema

Multi-Valued Dependencies Example

Eno ename Title Salary Pno dno

E1 J Doe EE 30000 P1P2

E2 M Smith SA 50000 P1P4 D1D2D3

E3 A Lee ME 40000 P1P3 D2D3

E4 J Miller PR 20000 P3 D3

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1

E1 J Doe EE 30000 P2

E2 M Smith SA 50000 P1 D1

E2 M Smith SA 50000 P1 D2

E2 M Smith SA 50000 P1 D3

E2 M Smith SA 50000 P4 D1

E2 M Smith SA 50000 P4 D2

E2 M Smith SA 50000 P4 D3

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P3 D3

E3 A Lee ME 40000 P3 D3

E4 J Miller PR 20000 P3 D3

Multi-Valued Dependencies (MVDs)

A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C

A trivial MVD A mdashgtgt B occurs when either B is a subset of A or

A cup B = R

Fourth Normal Form (4NF)

Fourth normal form (4NF) is based on the idea of multi-valued dependencies

A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies

Fourth Normal Form (4NF) Example

eno mdashgtgtpno

eno mdashgtgtdno

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3

eno ename title salary

E1 J Doe EE 30000

E2 MSmith SA 50000

E3 A Lee ME 40000

E4 J Miller PR 20000

eno pno

E1 P1

E1 P2

E2 P1

E2 P4

E3 P1

E3 P3

E4 P3

eno dno

E2 D1

E2 D2

E2 D3

E3 D2

E3 D3

E4 D3

Lossless-join Dependency

The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation

A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined

Fifth Normal Form (5NF)

bull Fifth normal form (5NF) is based on join dependencies

A relation is in fifth normal form (5NF) if the relation has no join dependency

bull 5NF is also called project-join normal form (PJNF)

A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R

The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have

ΠR1(r) ΠR2(r) hellip ΠRn(r) = r

Fifth Normal Form (5NF) Example

Consider a relation Supply (sname partName projName)

1048715Add the additional constraint that

If project j requires part p

and supplier s supplies part p

and supplier s supplies at least one item to project j

Then supplier s also supplies part p to project j

Original Supply Relation Supply Relation with new constraintsname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

Fifth Normal Form (5NF) Example

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

sname partName

Smith Bolt

Smith Nut

Adam Bolt

Walton Nut

Adam Nail

sname projName

Smith X

Smith Y

Adam Y

Walton Z

Adam X

partName projName

Bolt X

Nut Y

Bolt Y

Nut Z

Nail X

Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples

Desirable Properties in normalization

Relational schemas that are well-designed have several

important properties

1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or

relationship

2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins

3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations

Normal Forms in Practice

Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used

For example query execution time may increase because of normalization as more joins become necessary to answer queries

Normal Forms in Practice Example

For example street and city uniquely determine a zipcode

In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed

When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly

fd1

ENo City Street ZipCode

fd2

ENo ZipCode

ZipCode City Street

Normal Forms in Practice Summary

In summary normalization is typically used in two ways

To improve on relational schemas after ER design

To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies

There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance

This is the classic time versus space tradeoff

Conclusion

Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies

Normal forms indicate when relations satisfy certain properties

1NF - All attributes are atomic

2NF - All attributes are fully functionally dependent on a key

3NF - There are no transitive dependencies in the relation

4NF - There are no multi-valued dependencies in the relation

5NF - The relation has no join dependency

In practice normalization is used to improve schemas produced after ER design and existing relational schemas

It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time

Objectives

Explain process of normalization and why it is beneficial

Describe the concept of the Universal Relation

What are the motivations for normalization

What are the 3 types of update anomalies Be able to give examples

Describe the lossless-join property and dependency preservation property

Define and explain the concept of a functional dependency

What is a trivial FD

Describe the relationship between FDs and keys

What are Armstrongs Axioms When are they used

What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs

Objectives (2)

Define minimal and covers for a set of FDs

Define normal forms be able to list the normal forms and draw a diagram showing their relationship

Define 1NF Know how to convert non-1NF relation to 1NF

Define 2NF Convert 1NF relation to 2NF

Define 3NF Convert 2NF relation to 3NF

Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example

Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm

Describe multi-valued dependencies When do they occur

What is a trivial MVD

Define 4NF Convert 3NFBCNF relation to 4NF

Objectives (3)

Describe lossless-join property and how it relates to 5NF

Define 5NF

Explain normalization in terms of the time versus space tradeoff

Explain the relationship between normalization and ER modeling

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
Page 63: Software School of Hunan University 2006.08 Database Systems Design Part III Normalization

Multi-Valued Dependencies

A multi-valued dependency (MVD) occurs when two independent multi-valued attributes are present in the schema

A MVD occurs when two independent 1N relationships are in the relational schema

Multi-Valued Dependencies Example

Eno ename Title Salary Pno dno

E1 J Doe EE 30000 P1P2

E2 M Smith SA 50000 P1P4 D1D2D3

E3 A Lee ME 40000 P1P3 D2D3

E4 J Miller PR 20000 P3 D3

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1

E1 J Doe EE 30000 P2

E2 M Smith SA 50000 P1 D1

E2 M Smith SA 50000 P1 D2

E2 M Smith SA 50000 P1 D3

E2 M Smith SA 50000 P4 D1

E2 M Smith SA 50000 P4 D2

E2 M Smith SA 50000 P4 D3

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P3 D3

E3 A Lee ME 40000 P3 D3

E4 J Miller PR 20000 P3 D3

Multi-Valued Dependencies (MVDs)

A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C

A trivial MVD A mdashgtgt B occurs when either B is a subset of A or

A cup B = R

Fourth Normal Form (4NF)

Fourth normal form (4NF) is based on the idea of multi-valued dependencies

A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies

Fourth Normal Form (4NF) Example

eno mdashgtgtpno

eno mdashgtgtdno

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3

eno ename title salary

E1 J Doe EE 30000

E2 MSmith SA 50000

E3 A Lee ME 40000

E4 J Miller PR 20000

eno pno

E1 P1

E1 P2

E2 P1

E2 P4

E3 P1

E3 P3

E4 P3

eno dno

E2 D1

E2 D2

E2 D3

E3 D2

E3 D3

E4 D3

Lossless-join Dependency

The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation

A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined

Fifth Normal Form (5NF)

bull Fifth normal form (5NF) is based on join dependencies

A relation is in fifth normal form (5NF) if the relation has no join dependency

bull 5NF is also called project-join normal form (PJNF)

A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R

The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have

ΠR1(r) ΠR2(r) hellip ΠRn(r) = r

Fifth Normal Form (5NF) Example

Consider a relation Supply (sname partName projName)

1048715Add the additional constraint that

If project j requires part p

and supplier s supplies part p

and supplier s supplies at least one item to project j

Then supplier s also supplies part p to project j

Original Supply Relation Supply Relation with new constraintsname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

Fifth Normal Form (5NF) Example

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

sname partName

Smith Bolt

Smith Nut

Adam Bolt

Walton Nut

Adam Nail

sname projName

Smith X

Smith Y

Adam Y

Walton Z

Adam X

partName projName

Bolt X

Nut Y

Bolt Y

Nut Z

Nail X

Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples

Desirable Properties in normalization

Relational schemas that are well-designed have several

important properties

1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or

relationship

2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins

3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations

Normal Forms in Practice

Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used

For example query execution time may increase because of normalization as more joins become necessary to answer queries

Normal Forms in Practice Example

For example street and city uniquely determine a zipcode

In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed

When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly

fd1

ENo City Street ZipCode

fd2

ENo ZipCode

ZipCode City Street

Normal Forms in Practice Summary

In summary normalization is typically used in two ways

To improve on relational schemas after ER design

To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies

There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance

This is the classic time versus space tradeoff

Conclusion

Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies

Normal forms indicate when relations satisfy certain properties

1NF - All attributes are atomic

2NF - All attributes are fully functionally dependent on a key

3NF - There are no transitive dependencies in the relation

4NF - There are no multi-valued dependencies in the relation

5NF - The relation has no join dependency

In practice normalization is used to improve schemas produced after ER design and existing relational schemas

It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time

Objectives

Explain process of normalization and why it is beneficial

Describe the concept of the Universal Relation

What are the motivations for normalization

What are the 3 types of update anomalies Be able to give examples

Describe the lossless-join property and dependency preservation property

Define and explain the concept of a functional dependency

What is a trivial FD

Describe the relationship between FDs and keys

What are Armstrongs Axioms When are they used

What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs

Objectives (2)

Define minimal and covers for a set of FDs

Define normal forms be able to list the normal forms and draw a diagram showing their relationship

Define 1NF Know how to convert non-1NF relation to 1NF

Define 2NF Convert 1NF relation to 2NF

Define 3NF Convert 2NF relation to 3NF

Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example

Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm

Describe multi-valued dependencies When do they occur

What is a trivial MVD

Define 4NF Convert 3NFBCNF relation to 4NF

Objectives (3)

Describe lossless-join property and how it relates to 5NF

Define 5NF

Explain normalization in terms of the time versus space tradeoff

Explain the relationship between normalization and ER modeling

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
Page 64: Software School of Hunan University 2006.08 Database Systems Design Part III Normalization

Multi-Valued Dependencies Example

Eno ename Title Salary Pno dno

E1 J Doe EE 30000 P1P2

E2 M Smith SA 50000 P1P4 D1D2D3

E3 A Lee ME 40000 P1P3 D2D3

E4 J Miller PR 20000 P3 D3

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1

E1 J Doe EE 30000 P2

E2 M Smith SA 50000 P1 D1

E2 M Smith SA 50000 P1 D2

E2 M Smith SA 50000 P1 D3

E2 M Smith SA 50000 P4 D1

E2 M Smith SA 50000 P4 D2

E2 M Smith SA 50000 P4 D3

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P1 D2

E3 A Lee ME 40000 P3 D3

E3 A Lee ME 40000 P3 D3

E4 J Miller PR 20000 P3 D3

Multi-Valued Dependencies (MVDs)

A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C

A trivial MVD A mdashgtgt B occurs when either B is a subset of A or

A cup B = R

Fourth Normal Form (4NF)

Fourth normal form (4NF) is based on the idea of multi-valued dependencies

A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies

Fourth Normal Form (4NF) Example

eno mdashgtgtpno

eno mdashgtgtdno

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3

eno ename title salary

E1 J Doe EE 30000

E2 MSmith SA 50000

E3 A Lee ME 40000

E4 J Miller PR 20000

eno pno

E1 P1

E1 P2

E2 P1

E2 P4

E3 P1

E3 P3

E4 P3

eno dno

E2 D1

E2 D2

E2 D3

E3 D2

E3 D3

E4 D3

Lossless-join Dependency

The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation

A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined

Fifth Normal Form (5NF)

bull Fifth normal form (5NF) is based on join dependencies

A relation is in fifth normal form (5NF) if the relation has no join dependency

bull 5NF is also called project-join normal form (PJNF)

A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R

The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have

ΠR1(r) ΠR2(r) hellip ΠRn(r) = r

Fifth Normal Form (5NF) Example

Consider a relation Supply (sname partName projName)

1048715Add the additional constraint that

If project j requires part p

and supplier s supplies part p

and supplier s supplies at least one item to project j

Then supplier s also supplies part p to project j

Original Supply Relation Supply Relation with new constraintsname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

Fifth Normal Form (5NF) Example

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

sname partName

Smith Bolt

Smith Nut

Adam Bolt

Walton Nut

Adam Nail

sname projName

Smith X

Smith Y

Adam Y

Walton Z

Adam X

partName projName

Bolt X

Nut Y

Bolt Y

Nut Z

Nail X

Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples

Desirable Properties in normalization

Relational schemas that are well-designed have several

important properties

1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or

relationship

2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins

3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations

Normal Forms in Practice

Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used

For example query execution time may increase because of normalization as more joins become necessary to answer queries

Normal Forms in Practice Example

For example street and city uniquely determine a zipcode

In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed

When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly

fd1

ENo City Street ZipCode

fd2

ENo ZipCode

ZipCode City Street

Normal Forms in Practice Summary

In summary normalization is typically used in two ways

To improve on relational schemas after ER design

To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies

There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance

This is the classic time versus space tradeoff

Conclusion

Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies

Normal forms indicate when relations satisfy certain properties

1NF - All attributes are atomic

2NF - All attributes are fully functionally dependent on a key

3NF - There are no transitive dependencies in the relation

4NF - There are no multi-valued dependencies in the relation

5NF - The relation has no join dependency

In practice normalization is used to improve schemas produced after ER design and existing relational schemas

It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time

Objectives

Explain process of normalization and why it is beneficial

Describe the concept of the Universal Relation

What are the motivations for normalization

What are the 3 types of update anomalies Be able to give examples

Describe the lossless-join property and dependency preservation property

Define and explain the concept of a functional dependency

What is a trivial FD

Describe the relationship between FDs and keys

What are Armstrongs Axioms When are they used

What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs

Objectives (2)

Define minimal and covers for a set of FDs

Define normal forms be able to list the normal forms and draw a diagram showing their relationship

Define 1NF Know how to convert non-1NF relation to 1NF

Define 2NF Convert 1NF relation to 2NF

Define 3NF Convert 2NF relation to 3NF

Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example

Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm

Describe multi-valued dependencies When do they occur

What is a trivial MVD

Define 4NF Convert 3NFBCNF relation to 4NF

Objectives (3)

Describe lossless-join property and how it relates to 5NF

Define 5NF

Explain normalization in terms of the time versus space tradeoff

Explain the relationship between normalization and ER modeling

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
Page 65: Software School of Hunan University 2006.08 Database Systems Design Part III Normalization

Multi-Valued Dependencies (MVDs)

A multi-valued dependency (MVD) is a dependency between attributes A B C in a relation such that for each value of A there is a set of values B and a set of values C where the set of values B and C are independent of each otherA MVD is denoted as A mdashgtgt B and A mdashgtgt C or abbreviated as A mdashgtgt B | C

A trivial MVD A mdashgtgt B occurs when either B is a subset of A or

A cup B = R

Fourth Normal Form (4NF)

Fourth normal form (4NF) is based on the idea of multi-valued dependencies

A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies

Fourth Normal Form (4NF) Example

eno mdashgtgtpno

eno mdashgtgtdno

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3

eno ename title salary

E1 J Doe EE 30000

E2 MSmith SA 50000

E3 A Lee ME 40000

E4 J Miller PR 20000

eno pno

E1 P1

E1 P2

E2 P1

E2 P4

E3 P1

E3 P3

E4 P3

eno dno

E2 D1

E2 D2

E2 D3

E3 D2

E3 D3

E4 D3

Lossless-join Dependency

The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation

A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined

Fifth Normal Form (5NF)

bull Fifth normal form (5NF) is based on join dependencies

A relation is in fifth normal form (5NF) if the relation has no join dependency

bull 5NF is also called project-join normal form (PJNF)

A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R

The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have

ΠR1(r) ΠR2(r) hellip ΠRn(r) = r

Fifth Normal Form (5NF) Example

Consider a relation Supply (sname partName projName)

1048715Add the additional constraint that

If project j requires part p

and supplier s supplies part p

and supplier s supplies at least one item to project j

Then supplier s also supplies part p to project j

Original Supply Relation Supply Relation with new constraintsname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

Fifth Normal Form (5NF) Example

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

sname partName

Smith Bolt

Smith Nut

Adam Bolt

Walton Nut

Adam Nail

sname projName

Smith X

Smith Y

Adam Y

Walton Z

Adam X

partName projName

Bolt X

Nut Y

Bolt Y

Nut Z

Nail X

Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples

Desirable Properties in normalization

Relational schemas that are well-designed have several

important properties

1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or

relationship

2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins

3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations

Normal Forms in Practice

Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used

For example query execution time may increase because of normalization as more joins become necessary to answer queries

Normal Forms in Practice Example

For example street and city uniquely determine a zipcode

In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed

When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly

fd1

ENo City Street ZipCode

fd2

ENo ZipCode

ZipCode City Street

Normal Forms in Practice Summary

In summary normalization is typically used in two ways

To improve on relational schemas after ER design

To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies

There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance

This is the classic time versus space tradeoff

Conclusion

Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies

Normal forms indicate when relations satisfy certain properties

1NF - All attributes are atomic

2NF - All attributes are fully functionally dependent on a key

3NF - There are no transitive dependencies in the relation

4NF - There are no multi-valued dependencies in the relation

5NF - The relation has no join dependency

In practice normalization is used to improve schemas produced after ER design and existing relational schemas

It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time

Objectives

Explain process of normalization and why it is beneficial

Describe the concept of the Universal Relation

What are the motivations for normalization

What are the 3 types of update anomalies Be able to give examples

Describe the lossless-join property and dependency preservation property

Define and explain the concept of a functional dependency

What is a trivial FD

Describe the relationship between FDs and keys

What are Armstrongs Axioms When are they used

What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs

Objectives (2)

Define minimal and covers for a set of FDs

Define normal forms be able to list the normal forms and draw a diagram showing their relationship

Define 1NF Know how to convert non-1NF relation to 1NF

Define 2NF Convert 1NF relation to 2NF

Define 3NF Convert 2NF relation to 3NF

Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example

Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm

Describe multi-valued dependencies When do they occur

What is a trivial MVD

Define 4NF Convert 3NFBCNF relation to 4NF

Objectives (3)

Describe lossless-join property and how it relates to 5NF

Define 5NF

Explain normalization in terms of the time versus space tradeoff

Explain the relationship between normalization and ER modeling

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
Page 66: Software School of Hunan University 2006.08 Database Systems Design Part III Normalization

Fourth Normal Form (4NF)

Fourth normal form (4NF) is based on the idea of multi-valued dependencies

A relation is in fourth normal form (4NF) if it is in BCNF and contains no non-trivial multi-valued dependencies

Fourth Normal Form (4NF) Example

eno mdashgtgtpno

eno mdashgtgtdno

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3

eno ename title salary

E1 J Doe EE 30000

E2 MSmith SA 50000

E3 A Lee ME 40000

E4 J Miller PR 20000

eno pno

E1 P1

E1 P2

E2 P1

E2 P4

E3 P1

E3 P3

E4 P3

eno dno

E2 D1

E2 D2

E2 D3

E3 D2

E3 D3

E4 D3

Lossless-join Dependency

The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation

A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined

Fifth Normal Form (5NF)

bull Fifth normal form (5NF) is based on join dependencies

A relation is in fifth normal form (5NF) if the relation has no join dependency

bull 5NF is also called project-join normal form (PJNF)

A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R

The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have

ΠR1(r) ΠR2(r) hellip ΠRn(r) = r

Fifth Normal Form (5NF) Example

Consider a relation Supply (sname partName projName)

1048715Add the additional constraint that

If project j requires part p

and supplier s supplies part p

and supplier s supplies at least one item to project j

Then supplier s also supplies part p to project j

Original Supply Relation Supply Relation with new constraintsname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

Fifth Normal Form (5NF) Example

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

sname partName

Smith Bolt

Smith Nut

Adam Bolt

Walton Nut

Adam Nail

sname projName

Smith X

Smith Y

Adam Y

Walton Z

Adam X

partName projName

Bolt X

Nut Y

Bolt Y

Nut Z

Nail X

Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples

Desirable Properties in normalization

Relational schemas that are well-designed have several

important properties

1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or

relationship

2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins

3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations

Normal Forms in Practice

Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used

For example query execution time may increase because of normalization as more joins become necessary to answer queries

Normal Forms in Practice Example

For example street and city uniquely determine a zipcode

In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed

When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly

fd1

ENo City Street ZipCode

fd2

ENo ZipCode

ZipCode City Street

Normal Forms in Practice Summary

In summary normalization is typically used in two ways

To improve on relational schemas after ER design

To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies

There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance

This is the classic time versus space tradeoff

Conclusion

Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies

Normal forms indicate when relations satisfy certain properties

1NF - All attributes are atomic

2NF - All attributes are fully functionally dependent on a key

3NF - There are no transitive dependencies in the relation

4NF - There are no multi-valued dependencies in the relation

5NF - The relation has no join dependency

In practice normalization is used to improve schemas produced after ER design and existing relational schemas

It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time

Objectives

Explain process of normalization and why it is beneficial

Describe the concept of the Universal Relation

What are the motivations for normalization

What are the 3 types of update anomalies Be able to give examples

Describe the lossless-join property and dependency preservation property

Define and explain the concept of a functional dependency

What is a trivial FD

Describe the relationship between FDs and keys

What are Armstrongs Axioms When are they used

What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs

Objectives (2)

Define minimal and covers for a set of FDs

Define normal forms be able to list the normal forms and draw a diagram showing their relationship

Define 1NF Know how to convert non-1NF relation to 1NF

Define 2NF Convert 1NF relation to 2NF

Define 3NF Convert 2NF relation to 3NF

Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example

Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm

Describe multi-valued dependencies When do they occur

What is a trivial MVD

Define 4NF Convert 3NFBCNF relation to 4NF

Objectives (3)

Describe lossless-join property and how it relates to 5NF

Define 5NF

Explain normalization in terms of the time versus space tradeoff

Explain the relationship between normalization and ER modeling

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
Page 67: Software School of Hunan University 2006.08 Database Systems Design Part III Normalization

Fourth Normal Form (4NF) Example

eno mdashgtgtpno

eno mdashgtgtdno

Eno Ename Title Salary Pno dno

E1 J Doe EE 30000 P1 E1 J Doe EE 30000 P2 E2 M Smith SA 50000 P1 D1E2 M Smith SA 50000 P1 D2E2 M Smith SA 50000 P1 D3E2 M Smith SA 50000 P4 D1E2 M Smith SA 50000 P4 D2E2 M Smith SA 50000 P4 D3E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P1 D2E3 A Lee ME 40000 P3 D3E3 A Lee ME 40000 P3 D3E4 J Miller PR 20000 P3 D3

eno ename title salary

E1 J Doe EE 30000

E2 MSmith SA 50000

E3 A Lee ME 40000

E4 J Miller PR 20000

eno pno

E1 P1

E1 P2

E2 P1

E2 P4

E3 P1

E3 P3

E4 P3

eno dno

E2 D1

E2 D2

E2 D3

E3 D2

E3 D3

E4 D3

Lossless-join Dependency

The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation

A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined

Fifth Normal Form (5NF)

bull Fifth normal form (5NF) is based on join dependencies

A relation is in fifth normal form (5NF) if the relation has no join dependency

bull 5NF is also called project-join normal form (PJNF)

A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R

The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have

ΠR1(r) ΠR2(r) hellip ΠRn(r) = r

Fifth Normal Form (5NF) Example

Consider a relation Supply (sname partName projName)

1048715Add the additional constraint that

If project j requires part p

and supplier s supplies part p

and supplier s supplies at least one item to project j

Then supplier s also supplies part p to project j

Original Supply Relation Supply Relation with new constraintsname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

Fifth Normal Form (5NF) Example

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

sname partName

Smith Bolt

Smith Nut

Adam Bolt

Walton Nut

Adam Nail

sname projName

Smith X

Smith Y

Adam Y

Walton Z

Adam X

partName projName

Bolt X

Nut Y

Bolt Y

Nut Z

Nail X

Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples

Desirable Properties in normalization

Relational schemas that are well-designed have several

important properties

1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or

relationship

2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins

3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations

Normal Forms in Practice

Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used

For example query execution time may increase because of normalization as more joins become necessary to answer queries

Normal Forms in Practice Example

For example street and city uniquely determine a zipcode

In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed

When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly

fd1

ENo City Street ZipCode

fd2

ENo ZipCode

ZipCode City Street

Normal Forms in Practice Summary

In summary normalization is typically used in two ways

To improve on relational schemas after ER design

To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies

There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance

This is the classic time versus space tradeoff

Conclusion

Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies

Normal forms indicate when relations satisfy certain properties

1NF - All attributes are atomic

2NF - All attributes are fully functionally dependent on a key

3NF - There are no transitive dependencies in the relation

4NF - There are no multi-valued dependencies in the relation

5NF - The relation has no join dependency

In practice normalization is used to improve schemas produced after ER design and existing relational schemas

It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time

Objectives

Explain process of normalization and why it is beneficial

Describe the concept of the Universal Relation

What are the motivations for normalization

What are the 3 types of update anomalies Be able to give examples

Describe the lossless-join property and dependency preservation property

Define and explain the concept of a functional dependency

What is a trivial FD

Describe the relationship between FDs and keys

What are Armstrongs Axioms When are they used

What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs

Objectives (2)

Define minimal and covers for a set of FDs

Define normal forms be able to list the normal forms and draw a diagram showing their relationship

Define 1NF Know how to convert non-1NF relation to 1NF

Define 2NF Convert 1NF relation to 2NF

Define 3NF Convert 2NF relation to 3NF

Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example

Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm

Describe multi-valued dependencies When do they occur

What is a trivial MVD

Define 4NF Convert 3NFBCNF relation to 4NF

Objectives (3)

Describe lossless-join property and how it relates to 5NF

Define 5NF

Explain normalization in terms of the time versus space tradeoff

Explain the relationship between normalization and ER modeling

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
Page 68: Software School of Hunan University 2006.08 Database Systems Design Part III Normalization

Lossless-join Dependency

The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation

A lossless-join dependency is a property of decomposition which ensures that no spurious( 伪造的 ) tuples are generated when relations are natural joined

Fifth Normal Form (5NF)

bull Fifth normal form (5NF) is based on join dependencies

A relation is in fifth normal form (5NF) if the relation has no join dependency

bull 5NF is also called project-join normal form (PJNF)

A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R

The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have

ΠR1(r) ΠR2(r) hellip ΠRn(r) = r

Fifth Normal Form (5NF) Example

Consider a relation Supply (sname partName projName)

1048715Add the additional constraint that

If project j requires part p

and supplier s supplies part p

and supplier s supplies at least one item to project j

Then supplier s also supplies part p to project j

Original Supply Relation Supply Relation with new constraintsname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

Fifth Normal Form (5NF) Example

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

sname partName

Smith Bolt

Smith Nut

Adam Bolt

Walton Nut

Adam Nail

sname projName

Smith X

Smith Y

Adam Y

Walton Z

Adam X

partName projName

Bolt X

Nut Y

Bolt Y

Nut Z

Nail X

Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples

Desirable Properties in normalization

Relational schemas that are well-designed have several

important properties

1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or

relationship

2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins

3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations

Normal Forms in Practice

Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used

For example query execution time may increase because of normalization as more joins become necessary to answer queries

Normal Forms in Practice Example

For example street and city uniquely determine a zipcode

In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed

When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly

fd1

ENo City Street ZipCode

fd2

ENo ZipCode

ZipCode City Street

Normal Forms in Practice Summary

In summary normalization is typically used in two ways

To improve on relational schemas after ER design

To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies

There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance

This is the classic time versus space tradeoff

Conclusion

Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies

Normal forms indicate when relations satisfy certain properties

1NF - All attributes are atomic

2NF - All attributes are fully functionally dependent on a key

3NF - There are no transitive dependencies in the relation

4NF - There are no multi-valued dependencies in the relation

5NF - The relation has no join dependency

In practice normalization is used to improve schemas produced after ER design and existing relational schemas

It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time

Objectives

Explain process of normalization and why it is beneficial

Describe the concept of the Universal Relation

What are the motivations for normalization

What are the 3 types of update anomalies Be able to give examples

Describe the lossless-join property and dependency preservation property

Define and explain the concept of a functional dependency

What is a trivial FD

Describe the relationship between FDs and keys

What are Armstrongs Axioms When are they used

What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs

Objectives (2)

Define minimal and covers for a set of FDs

Define normal forms be able to list the normal forms and draw a diagram showing their relationship

Define 1NF Know how to convert non-1NF relation to 1NF

Define 2NF Convert 1NF relation to 2NF

Define 3NF Convert 2NF relation to 3NF

Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example

Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm

Describe multi-valued dependencies When do they occur

What is a trivial MVD

Define 4NF Convert 3NFBCNF relation to 4NF

Objectives (3)

Describe lossless-join property and how it relates to 5NF

Define 5NF

Explain normalization in terms of the time versus space tradeoff

Explain the relationship between normalization and ER modeling

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
Page 69: Software School of Hunan University 2006.08 Database Systems Design Part III Normalization

Fifth Normal Form (5NF)

bull Fifth normal form (5NF) is based on join dependencies

A relation is in fifth normal form (5NF) if the relation has no join dependency

bull 5NF is also called project-join normal form (PJNF)

A join dependency (JD) denoted by JD(R1 R2 hellip Rn) on relational schema R specifies a constraint on the states r of R

The constraint states that every legal state r of R should have a non-additive join decomposition into R1 R2 hellip Rn That is for every such r we have

ΠR1(r) ΠR2(r) hellip ΠRn(r) = r

Fifth Normal Form (5NF) Example

Consider a relation Supply (sname partName projName)

1048715Add the additional constraint that

If project j requires part p

and supplier s supplies part p

and supplier s supplies at least one item to project j

Then supplier s also supplies part p to project j

Original Supply Relation Supply Relation with new constraintsname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

Fifth Normal Form (5NF) Example

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

sname partName

Smith Bolt

Smith Nut

Adam Bolt

Walton Nut

Adam Nail

sname projName

Smith X

Smith Y

Adam Y

Walton Z

Adam X

partName projName

Bolt X

Nut Y

Bolt Y

Nut Z

Nail X

Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples

Desirable Properties in normalization

Relational schemas that are well-designed have several

important properties

1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or

relationship

2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins

3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations

Normal Forms in Practice

Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used

For example query execution time may increase because of normalization as more joins become necessary to answer queries

Normal Forms in Practice Example

For example street and city uniquely determine a zipcode

In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed

When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly

fd1

ENo City Street ZipCode

fd2

ENo ZipCode

ZipCode City Street

Normal Forms in Practice Summary

In summary normalization is typically used in two ways

To improve on relational schemas after ER design

To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies

There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance

This is the classic time versus space tradeoff

Conclusion

Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies

Normal forms indicate when relations satisfy certain properties

1NF - All attributes are atomic

2NF - All attributes are fully functionally dependent on a key

3NF - There are no transitive dependencies in the relation

4NF - There are no multi-valued dependencies in the relation

5NF - The relation has no join dependency

In practice normalization is used to improve schemas produced after ER design and existing relational schemas

It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time

Objectives

Explain process of normalization and why it is beneficial

Describe the concept of the Universal Relation

What are the motivations for normalization

What are the 3 types of update anomalies Be able to give examples

Describe the lossless-join property and dependency preservation property

Define and explain the concept of a functional dependency

What is a trivial FD

Describe the relationship between FDs and keys

What are Armstrongs Axioms When are they used

What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs

Objectives (2)

Define minimal and covers for a set of FDs

Define normal forms be able to list the normal forms and draw a diagram showing their relationship

Define 1NF Know how to convert non-1NF relation to 1NF

Define 2NF Convert 1NF relation to 2NF

Define 3NF Convert 2NF relation to 3NF

Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example

Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm

Describe multi-valued dependencies When do they occur

What is a trivial MVD

Define 4NF Convert 3NFBCNF relation to 4NF

Objectives (3)

Describe lossless-join property and how it relates to 5NF

Define 5NF

Explain normalization in terms of the time versus space tradeoff

Explain the relationship between normalization and ER modeling

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
Page 70: Software School of Hunan University 2006.08 Database Systems Design Part III Normalization

Fifth Normal Form (5NF) Example

Consider a relation Supply (sname partName projName)

1048715Add the additional constraint that

If project j requires part p

and supplier s supplies part p

and supplier s supplies at least one item to project j

Then supplier s also supplies part p to project j

Original Supply Relation Supply Relation with new constraintsname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail X

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

Fifth Normal Form (5NF) Example

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

sname partName

Smith Bolt

Smith Nut

Adam Bolt

Walton Nut

Adam Nail

sname projName

Smith X

Smith Y

Adam Y

Walton Z

Adam X

partName projName

Bolt X

Nut Y

Bolt Y

Nut Z

Nail X

Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples

Desirable Properties in normalization

Relational schemas that are well-designed have several

important properties

1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or

relationship

2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins

3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations

Normal Forms in Practice

Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used

For example query execution time may increase because of normalization as more joins become necessary to answer queries

Normal Forms in Practice Example

For example street and city uniquely determine a zipcode

In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed

When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly

fd1

ENo City Street ZipCode

fd2

ENo ZipCode

ZipCode City Street

Normal Forms in Practice Summary

In summary normalization is typically used in two ways

To improve on relational schemas after ER design

To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies

There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance

This is the classic time versus space tradeoff

Conclusion

Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies

Normal forms indicate when relations satisfy certain properties

1NF - All attributes are atomic

2NF - All attributes are fully functionally dependent on a key

3NF - There are no transitive dependencies in the relation

4NF - There are no multi-valued dependencies in the relation

5NF - The relation has no join dependency

In practice normalization is used to improve schemas produced after ER design and existing relational schemas

It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time

Objectives

Explain process of normalization and why it is beneficial

Describe the concept of the Universal Relation

What are the motivations for normalization

What are the 3 types of update anomalies Be able to give examples

Describe the lossless-join property and dependency preservation property

Define and explain the concept of a functional dependency

What is a trivial FD

Describe the relationship between FDs and keys

What are Armstrongs Axioms When are they used

What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs

Objectives (2)

Define minimal and covers for a set of FDs

Define normal forms be able to list the normal forms and draw a diagram showing their relationship

Define 1NF Know how to convert non-1NF relation to 1NF

Define 2NF Convert 1NF relation to 2NF

Define 3NF Convert 2NF relation to 3NF

Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example

Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm

Describe multi-valued dependencies When do they occur

What is a trivial MVD

Define 4NF Convert 3NFBCNF relation to 4NF

Objectives (3)

Describe lossless-join property and how it relates to 5NF

Define 5NF

Explain normalization in terms of the time versus space tradeoff

Explain the relationship between normalization and ER modeling

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
Page 71: Software School of Hunan University 2006.08 Database Systems Design Part III Normalization

Fifth Normal Form (5NF) Example

sname partName projName

Smith Bolt XSmith Nut YAdam Bolt YWalton Nut ZAdam Nail XAdam Bolt XSmith Bolt Y

sname partName

Smith Bolt

Smith Nut

Adam Bolt

Walton Nut

Adam Nail

sname projName

Smith X

Smith Y

Adam Y

Walton Z

Adam X

partName projName

Bolt X

Nut Y

Bolt Y

Nut Z

Nail X

Note That only joining all three relations together will get you back to the original relation Joining any two will create spurious tuples

Desirable Properties in normalization

Relational schemas that are well-designed have several

important properties

1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or

relationship

2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins

3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations

Normal Forms in Practice

Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used

For example query execution time may increase because of normalization as more joins become necessary to answer queries

Normal Forms in Practice Example

For example street and city uniquely determine a zipcode

In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed

When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly

fd1

ENo City Street ZipCode

fd2

ENo ZipCode

ZipCode City Street

Normal Forms in Practice Summary

In summary normalization is typically used in two ways

To improve on relational schemas after ER design

To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies

There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance

This is the classic time versus space tradeoff

Conclusion

Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies

Normal forms indicate when relations satisfy certain properties

1NF - All attributes are atomic

2NF - All attributes are fully functionally dependent on a key

3NF - There are no transitive dependencies in the relation

4NF - There are no multi-valued dependencies in the relation

5NF - The relation has no join dependency

In practice normalization is used to improve schemas produced after ER design and existing relational schemas

It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time

Objectives

Explain process of normalization and why it is beneficial

Describe the concept of the Universal Relation

What are the motivations for normalization

What are the 3 types of update anomalies Be able to give examples

Describe the lossless-join property and dependency preservation property

Define and explain the concept of a functional dependency

What is a trivial FD

Describe the relationship between FDs and keys

What are Armstrongs Axioms When are they used

What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs

Objectives (2)

Define minimal and covers for a set of FDs

Define normal forms be able to list the normal forms and draw a diagram showing their relationship

Define 1NF Know how to convert non-1NF relation to 1NF

Define 2NF Convert 1NF relation to 2NF

Define 3NF Convert 2NF relation to 3NF

Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example

Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm

Describe multi-valued dependencies When do they occur

What is a trivial MVD

Define 4NF Convert 3NFBCNF relation to 4NF

Objectives (3)

Describe lossless-join property and how it relates to 5NF

Define 5NF

Explain normalization in terms of the time versus space tradeoff

Explain the relationship between normalization and ER modeling

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
Page 72: Software School of Hunan University 2006.08 Database Systems Design Part III Normalization

Desirable Properties in normalization

Relational schemas that are well-designed have several

important properties

1) The most basic property is that relations consists of attributes that are logically related The attributes in a relation should belong to only one entity or

relationship

2) Lossless-join property ensures that the information decomposed across many relations can be reconstructed using natural joins

3) Dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations

Normal Forms in Practice

Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used

For example query execution time may increase because of normalization as more joins become necessary to answer queries

Normal Forms in Practice Example

For example street and city uniquely determine a zipcode

In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed

When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly

fd1

ENo City Street ZipCode

fd2

ENo ZipCode

ZipCode City Street

Normal Forms in Practice Summary

In summary normalization is typically used in two ways

To improve on relational schemas after ER design

To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies

There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance

This is the classic time versus space tradeoff

Conclusion

Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies

Normal forms indicate when relations satisfy certain properties

1NF - All attributes are atomic

2NF - All attributes are fully functionally dependent on a key

3NF - There are no transitive dependencies in the relation

4NF - There are no multi-valued dependencies in the relation

5NF - The relation has no join dependency

In practice normalization is used to improve schemas produced after ER design and existing relational schemas

It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time

Objectives

Explain process of normalization and why it is beneficial

Describe the concept of the Universal Relation

What are the motivations for normalization

What are the 3 types of update anomalies Be able to give examples

Describe the lossless-join property and dependency preservation property

Define and explain the concept of a functional dependency

What is a trivial FD

Describe the relationship between FDs and keys

What are Armstrongs Axioms When are they used

What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs

Objectives (2)

Define minimal and covers for a set of FDs

Define normal forms be able to list the normal forms and draw a diagram showing their relationship

Define 1NF Know how to convert non-1NF relation to 1NF

Define 2NF Convert 1NF relation to 2NF

Define 3NF Convert 2NF relation to 3NF

Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example

Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm

Describe multi-valued dependencies When do they occur

What is a trivial MVD

Define 4NF Convert 3NFBCNF relation to 4NF

Objectives (3)

Describe lossless-join property and how it relates to 5NF

Define 5NF

Explain normalization in terms of the time versus space tradeoff

Explain the relationship between normalization and ER modeling

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
Page 73: Software School of Hunan University 2006.08 Database Systems Design Part III Normalization

Normal Forms in Practice

Normal forms are used to prevent anomalies and redundancy However just because successive normal forms are better in reducing redundancy that does not mean they always have to be used

For example query execution time may increase because of normalization as more joins become necessary to answer queries

Normal Forms in Practice Example

For example street and city uniquely determine a zipcode

In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed

When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly

fd1

ENo City Street ZipCode

fd2

ENo ZipCode

ZipCode City Street

Normal Forms in Practice Summary

In summary normalization is typically used in two ways

To improve on relational schemas after ER design

To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies

There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance

This is the classic time versus space tradeoff

Conclusion

Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies

Normal forms indicate when relations satisfy certain properties

1NF - All attributes are atomic

2NF - All attributes are fully functionally dependent on a key

3NF - There are no transitive dependencies in the relation

4NF - There are no multi-valued dependencies in the relation

5NF - The relation has no join dependency

In practice normalization is used to improve schemas produced after ER design and existing relational schemas

It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time

Objectives

Explain process of normalization and why it is beneficial

Describe the concept of the Universal Relation

What are the motivations for normalization

What are the 3 types of update anomalies Be able to give examples

Describe the lossless-join property and dependency preservation property

Define and explain the concept of a functional dependency

What is a trivial FD

Describe the relationship between FDs and keys

What are Armstrongs Axioms When are they used

What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs

Objectives (2)

Define minimal and covers for a set of FDs

Define normal forms be able to list the normal forms and draw a diagram showing their relationship

Define 1NF Know how to convert non-1NF relation to 1NF

Define 2NF Convert 1NF relation to 2NF

Define 3NF Convert 2NF relation to 3NF

Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example

Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm

Describe multi-valued dependencies When do they occur

What is a trivial MVD

Define 4NF Convert 3NFBCNF relation to 4NF

Objectives (3)

Describe lossless-join property and how it relates to 5NF

Define 5NF

Explain normalization in terms of the time versus space tradeoff

Explain the relationship between normalization and ER modeling

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
Page 74: Software School of Hunan University 2006.08 Database Systems Design Part III Normalization

Normal Forms in Practice Example

For example street and city uniquely determine a zipcode

In this case reducing redundancy is not as important as the fact that a join is necessary every time the zipcode is needed

When a zipcode does change it is easy to scan the entire Emp relation and update it accordingly

fd1

ENo City Street ZipCode

fd2

ENo ZipCode

ZipCode City Street

Normal Forms in Practice Summary

In summary normalization is typically used in two ways

To improve on relational schemas after ER design

To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies

There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance

This is the classic time versus space tradeoff

Conclusion

Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies

Normal forms indicate when relations satisfy certain properties

1NF - All attributes are atomic

2NF - All attributes are fully functionally dependent on a key

3NF - There are no transitive dependencies in the relation

4NF - There are no multi-valued dependencies in the relation

5NF - The relation has no join dependency

In practice normalization is used to improve schemas produced after ER design and existing relational schemas

It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time

Objectives

Explain process of normalization and why it is beneficial

Describe the concept of the Universal Relation

What are the motivations for normalization

What are the 3 types of update anomalies Be able to give examples

Describe the lossless-join property and dependency preservation property

Define and explain the concept of a functional dependency

What is a trivial FD

Describe the relationship between FDs and keys

What are Armstrongs Axioms When are they used

What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs

Objectives (2)

Define minimal and covers for a set of FDs

Define normal forms be able to list the normal forms and draw a diagram showing their relationship

Define 1NF Know how to convert non-1NF relation to 1NF

Define 2NF Convert 1NF relation to 2NF

Define 3NF Convert 2NF relation to 3NF

Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example

Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm

Describe multi-valued dependencies When do they occur

What is a trivial MVD

Define 4NF Convert 3NFBCNF relation to 4NF

Objectives (3)

Describe lossless-join property and how it relates to 5NF

Define 5NF

Explain normalization in terms of the time versus space tradeoff

Explain the relationship between normalization and ER modeling

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
Page 75: Software School of Hunan University 2006.08 Database Systems Design Part III Normalization

Normal Forms in Practice Summary

In summary normalization is typically used in two ways

To improve on relational schemas after ER design

To improve on an existing relational schemas that are poorly designed and contain redundancy and potential for anomalies

There is always a tradeoff in normalization Normalization reduces redundancy but fragments the relations which makes it more expensive to query Some redundancy may help performance

This is the classic time versus space tradeoff

Conclusion

Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies

Normal forms indicate when relations satisfy certain properties

1NF - All attributes are atomic

2NF - All attributes are fully functionally dependent on a key

3NF - There are no transitive dependencies in the relation

4NF - There are no multi-valued dependencies in the relation

5NF - The relation has no join dependency

In practice normalization is used to improve schemas produced after ER design and existing relational schemas

It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time

Objectives

Explain process of normalization and why it is beneficial

Describe the concept of the Universal Relation

What are the motivations for normalization

What are the 3 types of update anomalies Be able to give examples

Describe the lossless-join property and dependency preservation property

Define and explain the concept of a functional dependency

What is a trivial FD

Describe the relationship between FDs and keys

What are Armstrongs Axioms When are they used

What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs

Objectives (2)

Define minimal and covers for a set of FDs

Define normal forms be able to list the normal forms and draw a diagram showing their relationship

Define 1NF Know how to convert non-1NF relation to 1NF

Define 2NF Convert 1NF relation to 2NF

Define 3NF Convert 2NF relation to 3NF

Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example

Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm

Describe multi-valued dependencies When do they occur

What is a trivial MVD

Define 4NF Convert 3NFBCNF relation to 4NF

Objectives (3)

Describe lossless-join property and how it relates to 5NF

Define 5NF

Explain normalization in terms of the time versus space tradeoff

Explain the relationship between normalization and ER modeling

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
Page 76: Software School of Hunan University 2006.08 Database Systems Design Part III Normalization

Conclusion

Normalization is a technique for producing relations with desirable properties Normalization reduces redundancy and update anomalies

Normal forms indicate when relations satisfy certain properties

1NF - All attributes are atomic

2NF - All attributes are fully functionally dependent on a key

3NF - There are no transitive dependencies in the relation

4NF - There are no multi-valued dependencies in the relation

5NF - The relation has no join dependency

In practice normalization is used to improve schemas produced after ER design and existing relational schemas

It is not always beneficial to go to 5NF as it may be difficult and normalization may increase query time

Objectives

Explain process of normalization and why it is beneficial

Describe the concept of the Universal Relation

What are the motivations for normalization

What are the 3 types of update anomalies Be able to give examples

Describe the lossless-join property and dependency preservation property

Define and explain the concept of a functional dependency

What is a trivial FD

Describe the relationship between FDs and keys

What are Armstrongs Axioms When are they used

What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs

Objectives (2)

Define minimal and covers for a set of FDs

Define normal forms be able to list the normal forms and draw a diagram showing their relationship

Define 1NF Know how to convert non-1NF relation to 1NF

Define 2NF Convert 1NF relation to 2NF

Define 3NF Convert 2NF relation to 3NF

Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example

Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm

Describe multi-valued dependencies When do they occur

What is a trivial MVD

Define 4NF Convert 3NFBCNF relation to 4NF

Objectives (3)

Describe lossless-join property and how it relates to 5NF

Define 5NF

Explain normalization in terms of the time versus space tradeoff

Explain the relationship between normalization and ER modeling

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
Page 77: Software School of Hunan University 2006.08 Database Systems Design Part III Normalization

Objectives

Explain process of normalization and why it is beneficial

Describe the concept of the Universal Relation

What are the motivations for normalization

What are the 3 types of update anomalies Be able to give examples

Describe the lossless-join property and dependency preservation property

Define and explain the concept of a functional dependency

What is a trivial FD

Describe the relationship between FDs and keys

What are Armstrongs Axioms When are they used

What is the closure of a set of FDs Be able to compute the closure of a set of attributes and a set of FDs

Objectives (2)

Define minimal and covers for a set of FDs

Define normal forms be able to list the normal forms and draw a diagram showing their relationship

Define 1NF Know how to convert non-1NF relation to 1NF

Define 2NF Convert 1NF relation to 2NF

Define 3NF Convert 2NF relation to 3NF

Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example

Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm

Describe multi-valued dependencies When do they occur

What is a trivial MVD

Define 4NF Convert 3NFBCNF relation to 4NF

Objectives (3)

Describe lossless-join property and how it relates to 5NF

Define 5NF

Explain normalization in terms of the time versus space tradeoff

Explain the relationship between normalization and ER modeling

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
Page 78: Software School of Hunan University 2006.08 Database Systems Design Part III Normalization

Objectives (2)

Define minimal and covers for a set of FDs

Define normal forms be able to list the normal forms and draw a diagram showing their relationship

Define 1NF Know how to convert non-1NF relation to 1NF

Define 2NF Convert 1NF relation to 2NF

Define 3NF Convert 2NF relation to 3NF

Describe why the general definitions of 2NF and 3NF are necessary and be able to give an example

Define BCNF Convert 3NF relation to BCNF Compare BCNF and 3NF Be able to convert directly to BCNF using algorithm

Describe multi-valued dependencies When do they occur

What is a trivial MVD

Define 4NF Convert 3NFBCNF relation to 4NF

Objectives (3)

Describe lossless-join property and how it relates to 5NF

Define 5NF

Explain normalization in terms of the time versus space tradeoff

Explain the relationship between normalization and ER modeling

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
Page 79: Software School of Hunan University 2006.08 Database Systems Design Part III Normalization

Objectives (3)

Describe lossless-join property and how it relates to 5NF

Define 5NF

Explain normalization in terms of the time versus space tradeoff

Explain the relationship between normalization and ER modeling

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79