Functional Dependencies and Normalization

Instructor: Mohamed Eltabakh [email protected]

Upload: reynard-butler

Post on 03-Jan-2016

What to Cover

Functional Dependencies (FDs)

Closure of Functional Dependencies

Lossy & Lossless Decomposition

Normalization

Decomposing Relations

StudentProf

sNumber  sName  pNumber  pName
s1       Dave   p1       MM
s2       Greg   p2       MM

FDs: pNumber → pName

Lossless decomposition:

Student

sNumber  sName  pNumber
s1       Dave   p1
s2       Greg   p2

Professor

pNumber  pName
p1       MM
p2       MM

Lossy decomposition:

Student

sNumber  sName  pName
s1       Dave   MM
s2       Greg   MM

Professor

pNumber  pName
p1       MM
p2       MM

Lossless vs. Lossy Decomposition

Assume R is divided into R1 and R2

Lossless Decomposition: R1 natural join R2 creates exactly R

Lossy Decomposition: R1 natural join R2 does not give back R. Joining the projections never drops a record of R, so in practice the join adds spurious records that were not in R

Lossless Decomposition

StudentProf

sNumber  sName  pNumber  pName
s1       Dave   p1       MM
s2       Greg   p2       MM

FDs: pNumber → pName

Student

sNumber  sName  pNumber
s1       Dave   p1
s2       Greg   p2

Professor

pNumber  pName
p1       MM
p2       MM

Student & Professor are a lossless decomposition of StudentProf (Student ⋈ Professor = StudentProf)

Lossy Decomposition

StudentProf

sNumber  sName  pNumber  pName
s1       Dave   p1       MM
s2       Greg   p2       MM

FDs: pNumber → pName

Student

sNumber  sName  pName
s1       Dave   MM
s2       Greg   MM

Professor

pNumber  pName
p1       MM
p2       MM

Student & Professor are a lossy decomposition of StudentProf (Student ⋈ Professor != StudentProf)
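The two decompositions can be checked mechanically by projecting StudentProf and joining the pieces back. The following is a minimal sketch, not part of the original slides: the helper names `project` and `natural_join` are hypothetical, and rows are modeled as Python dicts.

```python
def project(rows, attrs):
    """Projection with duplicate elimination; each result row is a tuple of (attr, value) pairs."""
    return {tuple((a, r[a]) for a in attrs) for r in rows}


def natural_join(left, right):
    """Natural join: combine rows that agree on every shared attribute."""
    out = set()
    for l in left:
        for r in right:
            ld, rd = dict(l), dict(r)
            common = ld.keys() & rd.keys()
            if all(ld[a] == rd[a] for a in common):
                out.add(tuple(sorted({**ld, **rd}.items())))
    return out


student_prof = [
    {"sNumber": "s1", "sName": "Dave", "pNumber": "p1", "pName": "MM"},
    {"sNumber": "s2", "sName": "Greg", "pNumber": "p2", "pName": "MM"},
]
original = {tuple(sorted(r.items())) for r in student_prof}

# Decomposition that shares pNumber (lossless in the slides).
lossless = natural_join(
    project(student_prof, ["sNumber", "sName", "pNumber"]),
    project(student_prof, ["pNumber", "pName"]),
)
# Decomposition that shares pName (lossy in the slides).
lossy = natural_join(
    project(student_prof, ["sNumber", "sName", "pName"]),
    project(student_prof, ["pNumber", "pName"]),
)

print(lossless == original)  # True
print(len(lossy))            # 4: every student is paired with both professors
```

Joining on pName keeps the two original rows but also produces two spurious ones, which is why the original relation can no longer be recovered.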

Goal: Ensure Lossless Decomposition

How to ensure lossless decomposition?

Answer: The common columns must be a candidate key (more generally, a superkey) in one of the two relations
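The rule can be tested with attribute closure: the common columns are a superkey of R1 or R2 exactly when their closure covers that relation. The sketch below assumes FDs are given as (left side, right side) pairs of sets; the function names are illustrative and not from the slides.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_lossless_binary(r1, r2, fds):
    """True if the attributes shared by r1 and r2 determine all of r1 or all of r2."""
    reach = closure(set(r1) & set(r2), fds)
    return set(r1) <= reach or set(r2) <= reach


fds = [({"pNumber"}, {"pName"})]
professor = {"pNumber", "pName"}

# Common column pNumber is a key of Professor.
print(is_lossless_binary({"sNumber", "sName", "pNumber"}, professor, fds))  # True
# Common column pName is not a key of either relation.
print(is_lossless_binary({"sNumber", "sName", "pName"}, professor, fds))    # False
```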

Back to our example

StudentProf (sNumber, sName, pNumber, pName)

FDs: pNumber → pName

Lossless: Student (sNumber, sName, pNumber) and Professor (pNumber, pName)

The common column pNumber is a candidate key (of Professor)

Lossy: Student (sNumber, sName, pName) and Professor (pNumber, pName)

The common column pName is not a candidate key


Normalization

First Normal Form (1NF)

Boyce-Codd Normal Form (BCNF)

Third Normal Form (3NF)

Canonical Cover of FDs

Normalization

Set of rules to avoid “bad” schema design

Decide whether a particular relation R is in “good” form

If not, decompose R to be in a “good” form

Several levels of normalization

First Normal Form (1NF)

BCNF

Third Normal Form (3NF)

Fourth Normal Form (4NF)

If a relation is in a certain normal form, then it is known that certain kinds of problems are avoided or minimized

First Normal Form (1NF)

An attribute domain is atomic if its elements are considered to be indivisible units (primitive attributes)

Examples of non-atomic domains are multi-valued and composite attributes

A relational schema R is in first normal form (1NF) if the domains of all attributes of R are atomic

We assume all relations are in 1NF

First Normal Form (1NF): Example

Since all attributes are primitive, the relation is in 1NF

Boyce-Codd Normal Form (BCNF): Definition

A relation schema R is in BCNF with respect to a set F of functional dependencies if, for all functional dependencies in F+ of the form

α → β

where α ⊆ R and β ⊆ R, at least one of the following holds:

α → β is trivial (i.e., β ⊆ α)

α is a superkey for R

Remember: Candidate keys are also superkeys
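The definition translates into a short check: an FD violates BCNF when it is non-trivial and its left side is not a superkey. The sketch below is an illustration under two assumptions: FDs are (lhs, rhs) pairs of sets, and only the given FDs are examined, which is enough when F is stated over the relation being tested (a projected sub-relation would need its own projected FDs).

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(relation, fds):
    """Return the FDs that are neither trivial nor have a superkey on the left."""
    rel = set(relation)
    bad = []
    for lhs, rhs in fds:
        trivial = rhs <= lhs
        superkey = closure(lhs, fds) >= rel
        if not trivial and not superkey:
            bad.append((lhs, rhs))
    return bad


fds = [({"pNumber"}, {"pName"})]
student_prof = {"sNumber", "sName", "pNumber", "pName"}
professor = {"pNumber", "pName"}

print(bcnf_violations(student_prof, fds))  # one violation: pNumber -> pName
print(bcnf_violations(professor, fds))     # []
```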

BCNF: Example

Student

sNumber  sName  pNumber  pName
s1       Dave   p1       MM
s2       Greg   p2       ER
s3       Mike   p1       MM

(sNumber, sName: student info; pNumber, pName: professor info)

Is relation Student in BCNF given pNumber → pName? NO

The FD is not trivial

pNumber is not a key in the Student relation

How to fix it and bring the relation into BCNF?

Decomposing a Schema into BCNF

If R is not in BCNF because of a non-trivial dependency α → β, then decompose R

R is decomposed into two relations

R1 = (α ∪ β) -- α is a superkey in R1

R2 = (R − (β − α)) -- R2.α is a foreign key to R1.α

Example of BCNF Decomposition

sNumber sName pNumber pName

s1 Dave p1 MM

s2 Greg p2 MM

StudentProf

FDs: pNumber → pName

sNumber sName pNumber

s1 Dave p1

s2 Greg p2

Student

pNumber pName

p1 MM

p2 MM

Professor

FOREIGN KEY: Student (pNumber) references Professor (pNumber)

What Is Nice about This Decomposition?

R is decomposed into two relations

R1 = (α ∪ β) -- α is a superkey in R1

R2 = (R − (β − α)) -- R2.α is a foreign key to R1.α

This decomposition is lossless (because R1 and R2 can be joined based on α, and α is unique in R1)

When you join R1 and R2 on α, you get R back without loss of information

StudentProf = Student ⋈ Professor

StudentProf

sNumber  sName  pNumber  pName
s1       Dave   p1       MM
s2       Greg   p2       MM

FDs: pNumber → pName

Student

sNumber  sName  pNumber
s1       Dave   p1
s2       Greg   p2

Professor

pNumber  pName
p1       MM
p2       MM

The BCNF decomposition rule creates a lossless decomposition

Multi-Step Decomposition

Relation R and set of functional dependencies F

R = (customer_name, loan_number, branch_name, branch_city, assets, amount)

F = {branch_name → assets branch_city, loan_number → amount branch_name}

Is R in BCNF? NO

Based on branch_name → assets branch_city

R1 = (branch_name, assets, branch_city)

R2 = (customer_name, loan_number, branch_name, amount)

Are R1 and R2 in BCNF? R2 is not

Divide R2 based on loan_number → amount branch_name

R3 = (loan_number, amount, branch_name)

R4 = (customer_name, loan_number)

Final schema has R1, R3, R4
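The repeated splitting can be sketched as a small recursive procedure. This is a simplified illustration, not a complete BCNF algorithm: it only looks at the FDs as given (in order) rather than at every dependency implied on a sub-relation, which happens to be enough for this example. The function name `bcnf_decompose` is hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_decompose(relation, fds):
    """Split on the first given FD that violates BCNF, then recurse on both parts."""
    rel = frozenset(relation)
    for lhs, rhs in fds:
        beta = (rhs & rel) - lhs                      # part of the R.H.S. inside rel
        if lhs <= rel and beta and not closure(lhs, fds) >= rel:
            r1 = lhs | beta                           # R1 = alpha U beta
            r2 = rel - beta                           # R2 = R - (beta - alpha)
            return bcnf_decompose(r1, fds) + bcnf_decompose(r2, fds)
    return [rel]


R = {"customer_name", "loan_number", "branch_name", "branch_city", "assets", "amount"}
F = [
    (frozenset({"branch_name"}), frozenset({"assets", "branch_city"})),
    (frozenset({"loan_number"}), frozenset({"amount", "branch_name"})),
]

result = bcnf_decompose(R, F)
for r in result:
    print(sorted(r))
```

With the FDs listed in the same order as on the slide, the sketch produces the same three relations R1, R3, and R4.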

What is NOT Nice about BCNF

Before decomposition, we had a set of functional dependencies (say F)

After decomposition, do we still have the same set of FDs, or did we lose something?

Dependency Preservation: after the decomposition, all FDs in F+ should be preserved

BCNF does not guarantee dependency preservation

Can we always find a decomposition that is both in BCNF and dependency preserving?

No. Such a decomposition may not exist

That is why we study a weaker normal form called Third Normal Form (3NF)

Dependency Preserving

Assume R is decomposed to R1 and R2

Dependencies of R1 and R2 include:

Local dependencies α → β: all columns of α and β must be in a single relation

Global dependencies: use the transitivity property to form more FDs across the R1 and R2 relations

Do these dependencies match the ones in R?

Yes: dependency preserving

No: not dependency preserving

Example of Lost FD

Assume relation R(C, S, J, D, T, Q, V)

C is key, JT → C and SD → T

C → CSJDTQV (C is key) -- Good for BCNF

JT → CSJDTQV (JT is key) -- Good for BCNF

SD → T (SD is not a key) -- Bad for BCNF

Decomposition: R1(C, S, J, D, Q, V) and R2(S, D, T) (lossless and in BCNF)

Does C → CSJDTQV still exist?

Yes: C → CSJDQV (local), SD → T (local), C → CSJDQVT (global)

Example of Lost FD (Cont’d)

Assume relation R(C, S, J, D, T, Q, V)

C is key, JT → C and SD → T

C → CSJDTQV (C is key) -- Good for BCNF

JT → CSJDTQV (JT is key) -- Good for BCNF

SD → T (SD is not a key) -- Bad for BCNF

Decomposition: R1(C, S, J, D, Q, V) and R2(S, D, T) (lossless and in BCNF)

Does SD → T still exist?

Yes: SD → T (local)

Example of Lost FD (Cont’d)

Assume relation R(C, S, J, D, T, Q, V)

C is key, JT → C and SD → T

C → CSJDTQV (C is key) -- Good for BCNF

JT → CSJDTQV (JT is key) -- Good for BCNF

SD → T (SD is not a key) -- Bad for BCNF

Decomposition: R1(C, S, J, D, Q, V) and R2(S, D, T) (lossless and in BCNF)

Does JT → CSJDTQV still exist?

No, this one is lost (there is no way to derive it from the local FDs)

Dependency Preservation Test

Assume R is decomposed into R1 and R2

The closure of FDs in R is F+

The FDs in R1 and R2 are FR1 (local dependencies in R1) and FR2 (local dependencies in R2), respectively

Then dependencies are preserved if: F+ = (FR1 ∪ FR2)+

Back to Our Example

Assume relation R(C, S, J, D, T, Q, V)

C is key, JT → C and SD → T

C → CSJDTQV (C is key) -- Good for BCNF

JT → CSJDTQV (JT is key) -- Good for BCNF

SD → T (SD is not a key) -- Bad for BCNF

Decomposition: R1(C, S, J, D, Q, V) and R2(S, D, T)

F+ = {C → CSJDTQV, JT → CSJDTQV, SD → T}

FR1 = {C → CSJDQV} (local for R1)

FR2 = {SD → T} (local for R2)

FR1 ∪ FR2 = {C → CSJDQV, SD → T}

(FR1 ∪ FR2)+ = {C → CSJDQV, SD → T, C → T}

JT → C is still missing
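Computing F+ and (FR1 ∪ FR2)+ in full is expensive, so the same test is usually run one FD at a time. The sketch below uses a well-known textbook procedure that is swapped in here and does not appear on the slides: starting from the left side of an FD, it keeps adding whatever can be derived inside each single relation, and the FD is preserved if its right side is eventually reached. The function name `preserved` is hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def preserved(fd, decomposition, fds):
    """True if fd can be derived using only dependencies local to single relations."""
    lhs, rhs = fd
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for ri in decomposition:
            # What the attributes we already have inside ri determine, restricted to ri.
            gained = closure(result & ri, fds) & ri
            if not gained <= result:
                result |= gained
                changed = True
    return rhs <= result


F = [
    (set("C"), set("CSJDTQV")),   # C is key
    (set("JT"), set("C")),
    (set("SD"), set("T")),
]
R1, R2 = set("CSJDQV"), set("SDT")

lost = [fd for fd in F if not preserved(fd, [R1, R2], F)]
print(lost)  # only JT -> C is reported as lost
```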

Dependency Preservation

BCNF does not necessarily preserve FDs. But 3NF is guaranteed to be able to preserve FDs.


Third Normal Form: Motivation

There are some situations where BCNF is not dependency preserving

Solution: Define a weaker normal form, called Third Normal Form (3NF)

Allows some redundancy (we will see examples later)

But all FDs are preserved

There is always a lossless, dependency-preserving decomposition in 3NF

Normal Form: 3NF

Relation R is in 3NF if, for every FD α → β in F+,

where α ⊆ R and β ⊆ R, at least one of the following holds:

α → β is trivial (i.e., β ⊆ α)

α is a superkey for R

Each attribute in β − α is part of a candidate key (prime attribute)

L.H.S is a superkey OR R.H.S consists of prime attributes

Testing for 3NF

Use attribute closure to check, for each dependency α → β, whether α is a superkey

If α is not a superkey, we have to verify that each attribute in (β − α) is contained in a candidate key of R

3NF: Example

Lot (ID, county, lotNum, area, price, taxRate)

Primary key: ID

Candidate key: <county, lotNum>

FDs:

county → taxRate

area → price

Is relation Lot in 3NF? NO

Decomposition based on county → taxRate

Lot (ID, county, lotNum, area, price)

County (county, taxRate)

Are relations Lot and County in 3NF? Lot is not

3NF: Example (Cont’d)

Lot (ID, county, lotNum, area, price)

County (county, taxRate)

Candidate key for Lot: <county, lotNum>

FDs:

county → taxRate

area → price

Decompose Lot based on area → price

Lot (ID, county, lotNum, area)

County (county, taxRate)

Area (area, price)

Is every relation in 3NF? YES
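The 3NF test can be sketched in code and applied to the Lot example. Two assumptions are made here and are not stated as FDs on the slides: the key statements are encoded as ID → (all other attributes) and county lotNum → ID, and candidate keys are found by brute force over attribute subsets, which is only practical for small schemas. The function names are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(relation, fds):
    """Brute-force search for minimal attribute sets whose closure covers the relation."""
    rel = set(relation)
    keys = []
    for size in range(1, len(rel) + 1):
        for combo in combinations(sorted(rel), size):
            attrs = set(combo)
            if any(k <= attrs for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(attrs, fds) >= rel:
                keys.append(attrs)
    return keys


def violations_3nf(relation, fds):
    """FDs that are non-trivial, lack a superkey L.H.S., and add a non-prime attribute."""
    rel = set(relation)
    prime = set().union(*candidate_keys(rel, fds))
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs
        and not closure(lhs, fds) >= rel
        and not (rhs - lhs) <= prime
    ]


lot = {"ID", "county", "lotNum", "area", "price", "taxRate"}
lot_fds = [
    ({"ID"}, {"county", "lotNum", "area", "price", "taxRate"}),  # assumed: ID is the primary key
    ({"county", "lotNum"}, {"ID"}),                              # assumed: candidate key
    ({"county"}, {"taxRate"}),
    ({"area"}, {"price"}),
]
print(violations_3nf(lot, lot_fds))  # county -> taxRate and area -> price

final_lot = {"ID", "county", "lotNum", "area"}
final_fds = [({"ID"}, {"county", "lotNum", "area"}), ({"county", "lotNum"}, {"ID"})]
print(violations_3nf(final_lot, final_fds))  # []
```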

Comparison between 3NF & BCNF

If R is in BCNF, then R is also in 3NF (the two BCNF conditions are among the three 3NF conditions)

If R is in 3NF, R may not be in BCNF

3NF allows some redundancy and is weaker than BCNF

3NF is a compromise to use when a BCNF decomposition that still preserves all dependencies is not achievable

Important: A lossless, dependency-preserving decomposition of R into a collection of 3NF relations is always possible!


Canonical Cover of FDs

Canonical Cover (Minimal Cover) = G

It is the smallest set of FDs that produces the same F+

There are no extra attributes in the L.H.S or R.H.S of any dependency in G

Given a set of FDs (F) with functional closure F+

The canonical cover of F is a minimal set of FDs (G), where G+ = F+

Every FD in the canonical cover is needed, otherwise some dependencies are lost

Example: Canonical Cover

Given F: A → B, ABCD → E, EF → GH, ACDF → EG

Then the canonical cover G: A → B, ACD → E, EF → GH

The smallest set (minimal) of FDs that can generate F+
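The claim G+ = F+ can be verified without enumerating either closure: two sets of FDs are equivalent when every FD of one follows from the other, and each such check is a single attribute closure. A minimal sketch, with the helper names `implies` and `equivalent` chosen here for illustration:

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def implies(fds, fd):
    """True if fd follows from fds."""
    lhs, rhs = fd
    return rhs <= closure(lhs, fds)


def equivalent(f, g):
    """True if f and g generate the same closure."""
    return all(implies(g, fd) for fd in f) and all(implies(f, fd) for fd in g)


F = [(set("A"), set("B")), (set("ABCD"), set("E")),
     (set("EF"), set("GH")), (set("ACDF"), set("EG"))]
G = [(set("A"), set("B")), (set("ACD"), set("E")), (set("EF"), set("GH"))]

print(equivalent(F, G))  # True

# Dropping any single FD from G breaks the equivalence, so every FD in G is needed.
print([equivalent(F, G[:i] + G[i + 1:]) for i in range(len(G))])  # [False, False, False]
```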

Computing the Canonical Cover

Given a set of functional dependencies F, how do we compute the canonical cover G?

Example: Canonical Cover (Let's Check L.H.S)

Given F = {A → B, ABCD → E, EF → G, EF → H, ACDF → EG}

Union Step: {A → B, ABCD → E, EF → GH, ACDF → EG}

Test ABCD → E

Check A: {BCD}+ = {B C D}, so A cannot be deleted

Check B: {ACD}+ = {A B C D E}, so B can be deleted

Now the set is: {A → B, ACD → E, EF → GH, ACDF → EG}

Test ACD → E

Check C: {AD}+ = {A B D}, so C cannot be deleted

Check D: {AC}+ = {A B C}, so D cannot be deleted

Example: Canonical Cover (Let's Check L.H.S, Cont’d)

Now the set is: {A → B, ACD → E, EF → GH, ACDF → EG}

Test EF → GH

Check E: {F}+ = {F}, so E cannot be deleted

Check F: {E}+ = {E}, so F cannot be deleted

Test ACDF → EG

None of the L.H.S attributes can be deleted

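Example: Canonical Cover (Let's Check R.H.S)

Now the set is: {A → B, ACD → E, EF → GH, ACDF → EG}

Each closure below is computed with the attribute being checked removed from the R.H.S of the tested FD

Test EF → GH

Check G: {EF}+ = {E F H}, so G cannot be deleted

Check H: {EF}+ = {E F G}, so H cannot be deleted

Test ACDF → EG

Check E: {ACDF}+ = {A B C D E F G H}, which still contains E, so E can be deleted

Now the set is: {A → B, ACD → E, EF → GH, ACDF → G}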

Example: Canonical Cover (Let's Check R.H.S, Cont’d)

Now the set is: {A → B, ACD → E, EF → GH, ACDF → G}

Test ACDF → G

Check G: {ACDF}+ = {A B C D E F G H}, which still contains G, so G can be deleted

Now the set is: {A → B, ACD → E, EF → GH}

The canonical cover is: {A → B, ACD → E, EF → GH}
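The walkthrough above (union step, L.H.S check, R.H.S check) can be sketched as one loop that repeats until nothing changes. This is an illustrative sketch under the assumption that FDs are (lhs, rhs) pairs of sets; the function name `canonical_cover` and the string-pair output format are choices made here, and a canonical cover is not unique in general, so other inputs may have several valid answers.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, an iterable of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def canonical_cover(fds):
    """Return a canonical cover as a sorted list of (lhs, rhs) strings."""
    cover = [[set(l), set(r)] for l, r in fds]
    changed = True
    while changed:
        changed = False
        # Union step: merge FDs that share the same left-hand side.
        merged = {}
        for lhs, rhs in cover:
            merged.setdefault(frozenset(lhs), set()).update(rhs)
        cover = [[set(l), r] for l, r in merged.items()]
        for fd in cover:
            # L.H.S check: drop a if the remaining attributes still determine the R.H.S.
            for a in sorted(fd[0]):
                reduced = fd[0] - {a}
                if reduced and fd[1] <= closure(reduced, cover):
                    fd[0] = reduced
                    changed = True
            # R.H.S check: drop b if it is still derivable once removed from this FD.
            for b in sorted(fd[1]):
                fd[1].discard(b)
                if b in closure(fd[0], cover):
                    changed = True
                else:
                    fd[1].add(b)
        # FDs whose right-hand side became empty carry no information.
        cover = [fd for fd in cover if fd[1]]
    return sorted(("".join(sorted(l)), "".join(sorted(r))) for l, r in cover)


F = [(set("A"), set("B")), (set("ABCD"), set("E")), (set("EF"), set("G")),
     (set("EF"), set("H")), (set("ACDF"), set("EG"))]

print(canonical_cover(F))  # [('A', 'B'), ('ACD', 'E'), ('EF', 'GH')]
```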

Canonical Cover

Used to find the smallest (minimal) set of FDs that has the same closure as the original set

Used in the decomposition of relations to be in 3NF

The resulting decomposition is lossless and dependency preserving

Done with Normalization

First Normal Form (1NF)

Boyce-Codd Normal Form (BCNF)

Third Normal Form (3NF)

Canonical Cover of FDs

Questions?

What You Learned

Data Models

Entity-Relationship Model & ERD

Relational Model

Conversion between the data models

Relational Algebra & Operators

Structured Query Language (SQL)

DML: Data Manipulation Language

DDL: Data Definition Language

What You Learned (Cont’d)

Advanced SQL

Triggers, Views, Cursors, Stored Procedures and Functions

PL/SQL

Functional Dependencies

Normalization Rules

In Advanced Courses

Things get more interesting

Indexing Techniques

Transaction Management

Query Optimization

Handling of Big Data

And many more …
