Upload: seanna

Post on 15-Jan-2016

Page 1: CS 430 Database Theory

CS 430 Database Theory

Winter 2005

Lecture 9: Fourth and Fifth Normal Forms

Page 2: CS 430 Database Theory

Decompositions

Given a relation schema $R = \{A_1, \dots, A_n\}$ (all of the $A_i$ are distinct), a set of relation schemas $D = \{R_1, \dots, R_m\}$ is a decomposition of R if R is the union of the $R_i$, or

$R = \bigcup_{i=1}^{m} R_i$

That is, all the attributes of R appear in the $R_i$.
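The union condition can be checked mechanically. A minimal Python sketch, treating each schema as a set of attribute names; the helper name `is_decomposition` and the sample attributes are illustrative assumptions, not part of the lecture:

```python
def is_decomposition(R, D):
    """Return True if the schemas in D together contain exactly the attributes of R."""
    return set().union(*map(set, D)) == set(R)

R = {"A", "B", "C", "D"}
covers = is_decomposition(R, [{"A", "B"}, {"B", "C", "D"}])  # every attribute appears
misses = is_decomposition(R, [{"A", "B"}, {"B", "C"}])       # attribute D is lost
```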

Page 3: CS 430 Database Theory

Goodness of Decomposition

When is a decomposition “good”? Two standards:

• Dependency Preservation
• Lossless (Nonadditive) Join

Page 4: CS 430 Database Theory

Dependency Preservation

Suppose we have a set of FDs F on R and a decomposition $D = \{R_1, \dots, R_m\}$ of R. The projection of F on $R_i$ is the set

$\pi_{R_i}(F) = \{X \to Y \in F^+ \mid X \cup Y \subseteq R_i\}$

That is, $\pi_{R_i}(F)$ consists of all the FDs in the closure of F which are FDs on $R_i$.
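A projection can be computed without enumerating all of the closure of F: for each nonempty subset X of the sub-schema, take the attribute closure of X under F and keep the part that lies inside the sub-schema. The Python sketch below does this. The function names and the sample FDs (A determines B, B determines C) are assumptions chosen for illustration, and the result is a cover of the projection rather than every FD in it. The loop over subsets is exponential in the size of the sub-schema.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def project_fds(fds, ri):
    """Map each nonempty X within ri to the attributes of ri that X determines (excluding X itself)."""
    projected = {}
    attrs = sorted(ri)
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            x = frozenset(combo)
            rhs = (closure(x, fds) & set(ri)) - x
            if rhs:  # skip subsets that determine nothing new
                projected[x] = frozenset(rhs)
    return projected

# Hypothetical FDs on R(A, B, C): A -> B and B -> C
F = [(frozenset("A"), frozenset("B")), (frozenset("B"), frozenset("C"))]
on_ac = project_fds(F, {"A", "C"})  # the derived FD A -> C survives on {A, C}
```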

Page 5: CS 430 Database Theory

Dependency Preservation

D is Dependency Preserving with respect to F if the closure of the union of the projections of F onto the $R_i$ is the closure of F. Or,

$(\pi_{R_1}(F) \cup \dots \cup \pi_{R_m}(F))^+ = F^+$

Or, if we project F onto the individual $R_i$, union the projections together, and compute the closure, we get the original closure of F.

Or, no information contained in F is lost by projecting F onto the individual $R_i$.
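The definition can be tested without building the projections explicitly. The Python sketch below uses a commonly taught iterative test: for each FD in F, grow a set Z from the left-hand side by repeatedly closing the part of Z that falls inside each sub-schema, then check whether the right-hand side ends up in Z. The function names and the sample schema are illustrative assumptions.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def preserves_dependencies(fds, decomposition):
    """True if every FD in fds follows from the projections of fds onto the sub-schemas."""
    for lhs, rhs in fds:
        z = set(lhs)
        changed = True
        while changed:
            changed = False
            for ri in decomposition:
                # attributes of ri determined by the part of z that lies inside ri
                gained = (closure(z & ri, fds) & ri) - z
                if gained:
                    z |= gained
                    changed = True
        if not rhs <= z:
            return False
    return True

# Hypothetical FDs on R(A, B, C): A -> B and B -> C
F = [(frozenset("A"), frozenset("B")), (frozenset("B"), frozenset("C"))]
kept = preserves_dependencies(F, [{"A", "B"}, {"B", "C"}])  # both FDs live in one sub-schema
lost = preserves_dependencies(F, [{"A", "B"}, {"A", "C"}])  # B -> C cannot be checked locally
```

Checking only the FDs in F is enough, because the projections are always contained in the closure of F.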

Page 6: CS 430 Database Theory

Dependency Preservation Notes

• Claim: It is possible to find a 3NF decomposition of R (each of the $R_i$ is in 3NF) which is dependency preserving
  • See Algorithm 11.2, page 340. (No proof.)
• Why do we want this?
  • When we update the database, we want to be able to verify FDs by verifying them on the individual relations
  • The alternative is having to do joins to verify that our update is good, slowing the system

Page 7: CS 430 Database Theory

Lossless (Nonadditive) Join Property

• D has the Lossless (Nonadditive) Join property with respect to a set of FDs F if for every relation state r of R that satisfies F:
  $\pi_{R_1}(r) \bowtie \dots \bowtie \pi_{R_m}(r) = r$
  ($\bowtie$ is the natural join)
• Lossless means no loss of information
• Nonadditive means that the natural join doesn’t add any information
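A small experiment makes "nonadditive" concrete. The Python sketch below projects one sample relation state and joins the projections back together. The helper names (`rel`, `project`, `natural_join`) and the sample data are assumptions for illustration. It checks only one state, whereas the property itself quantifies over every state that satisfies F.

```python
def rel(attrs, *tuples):
    """Build a relation state: a set of tuples, each a frozenset of (attribute, value) pairs."""
    return {frozenset(zip(attrs, t)) for t in tuples}

def project(r, attrs):
    """Projection onto attrs; duplicates disappear because the result is a set."""
    attrs = set(attrs)
    return {frozenset((a, v) for a, v in t if a in attrs) for t in r}

def natural_join(r, s):
    """Natural join: combine every pair of tuples that agree on their common attributes."""
    out = set()
    for t in r:
        for u in s:
            dt, du = dict(t), dict(u)
            if all(dt[a] == du[a] for a in dt.keys() & du.keys()):
                out.add(frozenset({**dt, **du}.items()))
    return out

# Hypothetical state over R(A, B, C) in which A is a key
r = rel("ABC", (1, "x", "p"), (2, "x", "q"))

lossless = natural_join(project(r, "AB"), project(r, "AC")) == r
spurious = natural_join(project(r, "AB"), project(r, "BC")) - r  # tuples the join adds
```

Joining on B, which is not a key of either projection, yields two tuples that were never in r.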

Page 8: CS 430 Database Theory

Lossless (Nonadditive) Join Notes

• Algorithm 11.1, page 337, provides a way to test for this property
• If D is a binary decomposition, $D = \{R_1, R_2\}$, D is nonadditive if and only if:
  $(R_1 \cap R_2) \to (R_1 - R_2)$ is in $F^+$, or
  $(R_1 \cap R_2) \to (R_2 - R_1)$ is in $F^+$
• That is, $R_1 \cap R_2$ is a superkey for (at least) one of $R_1$ or $R_2$
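The binary test reduces to one attribute closure: compute the closure of the common attributes and see whether it covers either difference. A minimal Python sketch, with function names and sample FDs that are illustrative assumptions:

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def binary_lossless(r1, r2, fds):
    """True if the common attributes determine all of r1 - r2 or all of r2 - r1."""
    reach = closure(r1 & r2, fds)
    return (r1 - r2) <= reach or (r2 - r1) <= reach

# Hypothetical FD on R(A, B, C): A -> B
F = [(frozenset("A"), frozenset("B"))]
ok = binary_lossless(set("AB"), set("AC"), F)   # common attribute A determines B
bad = binary_lossless(set("AB"), set("BC"), F)  # common attribute B determines nothing
```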

Page 9: CS 430 Database Theory

Aside: Nulls

• Problems with Nulls
  • See Figures 11.2, 11.3, Text Book
• Bottom line: If nulls are present, especially nulls in foreign keys, then
  • May have to use outer joins instead of ordinary (inner) joins
  • Have to be careful if using aggregation (e.g. sum or average)
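Both effects are easy to reproduce with an in-memory SQLite database from Python. The tables and rows below are hypothetical (they are not the textbook's figures): one employee has a null foreign key and a null salary.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE department (dno INTEGER PRIMARY KEY, dname TEXT);
CREATE TABLE employee (ename TEXT, salary INTEGER,
                       dno INTEGER REFERENCES department(dno));
INSERT INTO department VALUES (1, 'Research');
INSERT INTO employee VALUES ('Smith', 100, 1), ('Wong', NULL, NULL);
""")

# The inner join silently drops the employee whose foreign key is null.
inner = con.execute(
    "SELECT ename FROM employee JOIN department USING (dno)").fetchall()

# The left outer join keeps that employee, with nulls for the department columns.
outer = con.execute(
    "SELECT ename FROM employee LEFT JOIN department USING (dno) ORDER BY ename"
).fetchall()

# AVG and COUNT(column) skip nulls, so the average is taken over one row, not two.
avg_salary = con.execute("SELECT AVG(salary) FROM employee").fetchone()[0]
count_rows, count_salaries = con.execute(
    "SELECT COUNT(*), COUNT(salary) FROM employee").fetchone()
```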

Page 10: CS 430 Database Theory

Multi-Valued Dependencies

If X, Y are sets of attributes of R, there is a Multi-Valued Dependency (MVD) $X \twoheadrightarrow Y$ (we let $Z = R - (X \cup Y)$) if for all states r of R, and all tuples $t_1, t_2$ of r such that $t_1[X] = t_2[X]$, there exist tuples $t_3, t_4$ of r such that:

$t_3[X] = t_4[X] = t_1[X] = t_2[X]$

$t_3[Y] = t_1[Y]$, $t_4[Y] = t_2[Y]$

$t_4[Z] = t_1[Z]$, $t_3[Z] = t_2[Z]$

An MVD $X \twoheadrightarrow Y$ is trivial if $Y \subseteq X$, or $X \cup Y = R$
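The definition translates directly into a check on a single relation state. In the Python sketch below, for every pair of tuples agreeing on X, the tuple that takes X and Y from the first and Z from the second must be present; the other required tuple is covered when the pair is visited in the opposite order. The function name and the course/teacher/book data are illustrative assumptions.

```python
def satisfies_mvd(rows, x, y):
    """Check whether the state rows (a list of dicts over the same attributes) satisfies X ->> Y."""
    if not rows:
        return True
    attrs = set(rows[0])
    x, y = set(x), set(y)
    z = attrs - x - y
    present = {frozenset(t.items()) for t in rows}
    for t1 in rows:
        for t2 in rows:
            if all(t1[a] == t2[a] for a in x):
                # t3 agrees with t1 on X and Y, and with t2 on Z
                t3 = {a: (t2[a] if a in z else t1[a]) for a in attrs}
                if frozenset(t3.items()) not in present:
                    return False
    return True

# Hypothetical state: teachers and books of a course are independent of each other
rows = [
    {"course": "DB", "teacher": "Ann", "book": "B1"},
    {"course": "DB", "teacher": "Ann", "book": "B2"},
    {"course": "DB", "teacher": "Bob", "book": "B1"},
    {"course": "DB", "teacher": "Bob", "book": "B2"},
]
holds = satisfies_mvd(rows, {"course"}, {"teacher"})
broken = satisfies_mvd(rows[:3], {"course"}, {"teacher"})  # (Bob, B2) is missing
```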

Page 11: CS 430 Database Theory

Fourth Normal Form

R is in 4NF with respect to a set of FDs and MVDs F if for every non-trivial MVD $X \twoheadrightarrow Y$ in $F^+$, X is a superkey of R.

See Figure 11.4(a, b) in Text Book.

Page 12: CS 430 Database Theory

Fourth Normal Form Notes

• If a relation is not in 4NF then there are update anomalies:
  • If you add a tuple you must also add the corresponding tuples
• D is a lossless (nonadditive) decomposition of R, $D = \{R_1, R_2\}$, with respect to a set of FDs and MVDs F if and only if:
  $(R_1 \cap R_2) \twoheadrightarrow (R_1 - R_2)$, which is the same as
  $(R_1 \cap R_2) \twoheadrightarrow (R_2 - R_1)$
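Both notes can be seen on one small example. The Python sketch below splits a relation state along an MVD, confirms that the natural join restores it, and then adds a single fact to one of the pieces to show how many tuples the undecomposed relation would have needed. The helper names and the course/teacher/book data are illustrative assumptions.

```python
def rel(attrs, *tuples):
    """Build a relation state: a set of tuples, each a frozenset of (attribute, value) pairs."""
    return {frozenset(zip(attrs, t)) for t in tuples}

def project(r, attrs):
    """Projection onto attrs; duplicates disappear because the result is a set."""
    attrs = set(attrs)
    return {frozenset((a, v) for a, v in t if a in attrs) for t in r}

def natural_join(r, s):
    """Natural join: combine every pair of tuples that agree on their common attributes."""
    out = set()
    for t in r:
        for u in s:
            dt, du = dict(t), dict(u)
            if all(dt[a] == du[a] for a in dt.keys() & du.keys()):
                out.add(frozenset({**dt, **du}.items()))
    return out

# Hypothetical state in which course ->> teacher holds but course is not a superkey
r = rel(("course", "teacher", "book"),
        ("DB", "Ann", "B1"), ("DB", "Ann", "B2"),
        ("DB", "Bob", "B1"), ("DB", "Bob", "B2"))

ct = project(r, {"course", "teacher"})
cb = project(r, {"course", "book"})
restored = natural_join(ct, cb) == r  # the split along the MVD is lossless

# One new book is one tuple in the decomposed design ...
cb_new = cb | rel(("course", "book"), ("DB", "B3"))
# ... but one tuple per teacher in the undecomposed relation.
added = natural_join(ct, cb_new) - r
```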

Page 13: CS 430 Database Theory

Fifth Normal Form

• $JD(R_1, \dots, R_m)$ is a Join Dependency (JD) for a decomposition $\{R_1, \dots, R_m\}$ of R if for every legal state r of R:
  $\pi_{R_1}(r) \bowtie \dots \bowtie \pi_{R_m}(r) = r$
• A JD is trivial if some $R_i = R$
• A relation R is in Fifth Normal Form (5NF) if for every non-trivial JD of R, every $R_i$ is a superkey of R
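A join dependency with three components can hold even when no two-way split is lossless. The Python sketch below checks a JD on one hand-built state by joining all the projections. The helper names and the salesman/product/territory tuples are illustrative assumptions, and a real JD must hold for every legal state, not just this one.

```python
from functools import reduce

def rel(attrs, *tuples):
    """Build a relation state: a set of tuples, each a frozenset of (attribute, value) pairs."""
    return {frozenset(zip(attrs, t)) for t in tuples}

def project(r, attrs):
    """Projection onto attrs; duplicates disappear because the result is a set."""
    attrs = set(attrs)
    return {frozenset((a, v) for a, v in t if a in attrs) for t in r}

def natural_join(r, s):
    """Natural join: combine every pair of tuples that agree on their common attributes."""
    out = set()
    for t in r:
        for u in s:
            dt, du = dict(t), dict(u)
            if all(dt[a] == du[a] for a in dt.keys() & du.keys()):
                out.add(frozenset({**dt, **du}.items()))
    return out

def satisfies_jd(r, schemas):
    """True if joining the projections of r onto all the schemas gives back exactly r."""
    return reduce(natural_join, (project(r, s) for s in schemas)) == r

# Hypothetical state over (salesman, product, territory)
r = rel(("salesman", "product", "territory"),
        ("s1", "p1", "t2"), ("s1", "p2", "t1"),
        ("s2", "p1", "t1"), ("s1", "p1", "t1"))

SP, PT, ST = ({"salesman", "product"}, {"product", "territory"},
              {"salesman", "territory"})
three_way = satisfies_jd(r, [SP, PT, ST])  # all three projections together
two_way = satisfies_jd(r, [SP, PT])        # any two alone add a spurious tuple
```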

Page 14: CS 430 Database Theory

Notes on Fifth Normal Form

• An MVD is a JD with m = 2
• Finding all the JDs of a database of any size is probably not feasible
• Example: See Figure 11.4 (c, d) of Text Book

Page 15: CS 430 Database Theory

Products, Salesmen, Territories
A Data Design Problem

• Salesman
  • Sells specific products
  • Has specific territories
  • Has a quota: How much he is supposed to sell
• Product
  • Sold by salesmen
  • Has a price
• Territory
  • Worked by salesmen

Page 16: CS 430 Database Theory

ER Model Version 1

[ER diagram: entities Product, Salesman, Territory; relationships Sells Product, Works Territory; attributes Quota, Price]

• A Salesman can sell any Product he sells in any Territory he works.
• A Product has one Price for all Salesmen and all Territories.
• A Salesman has one Quota for all his sales.
• Note: Each Entity and Relationship becomes a relation in our database.

Page 17: CS 430 Database Theory

ER Model Version 2

[ER diagram: entities Product, Salesman, Territory; relationships Sells Product, Works Territory; attributes Quota, Price]

• A Salesman has a Quota for each Product he sells.

Page 18: CS 430 Database Theory

ER Model Version 3

[ER diagram: entities Product, Salesman, Territory; relationships Sells Product, Works Territory, Sold In; attributes Quota, Price]

• Products are only sold in specific Territories.
• A Product has a Price set for each Territory where it is sold.
• A Salesman can sell any Product he sells in any Territory he works where that Product is sold.
• Note JD between “Sells Product”, “Sold In”, and “Works Territory”.
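One way to read the Version 3 rule is that the allowed (Salesman, Product, Territory) combinations are never stored: they are derived by joining the three relationships. The Python sketch below shows that derivation on made-up data; all names and rows are hypothetical.

```python
# Hypothetical contents of the three relationships
sells_product = {("Ann", "Widget"), ("Ann", "Gadget"), ("Bob", "Widget")}
sold_in = {("Widget", "East"), ("Widget", "West"), ("Gadget", "West")}
works_territory = {("Ann", "East"), ("Bob", "West")}

# A salesman may sell a product in a territory exactly when all three facts hold.
can_sell = {
    (salesman, product, territory)
    for (salesman, product) in sells_product
    for (sold_product, territory) in sold_in
    if product == sold_product and (salesman, territory) in works_territory
}
```

Ann sells Gadgets and Gadgets are sold in the West, but Ann does not work the West, so that combination is excluded.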

Page 19: CS 430 Database Theory

ER Model Version 4

[ER diagram: entities Product, Salesman, Territory; relationships Sells Product, Sells Product in Territory, Sold In; attributes Quota, Price]

• A Salesman is assigned to sell specific Products in specific Territories.
• A Salesman has a Quota for each Product he sells in each Territory.
• Possible Integrity Constraint: Keys of “Sells Product” and “Sold In” are projections of “Sells Product in Territory”.

Page 20: CS 430 Database Theory

ER Model Version 4A

[ER diagram: entities Product, Salesman, Territory; relationships Sells Product in Territory, Sold In; attributes Quota, Price]

• Possible Integrity Constraint: Key of “Sold In” is a projection of “Sells Product in Territory”. (But I might want to assign a Price even though no Salesmen have yet been assigned that Product in that Territory.)

Page 21: CS 430 Database Theory

Sample Fields

• Employee: Employee ID Number, Employee Name, Work Location
• Manager: Manager ID Number, Manager Name
• Territory: Territory Number, Territory Name, Territory Bonus
• Product: Product Number, Product Name, Price, Actual Sales, Target Sales
• Other: Quota, Commission Rate, Commission, Manager Commission

Page 22: CS 430 Database Theory

Possible Functional Dependencies

• {Employee ID Number} $\to$ {Employee Name, Work Location, Manager ID Number, Manager Commission(?)}
• {Manager ID Number} $\to$ {Manager Name, Manager Commission(?)}
• {Territory Number} $\to$ {Territory Name, Territory Bonus(?)}
• {Product Number} $\to$ {Product Name, Price(?), Actual Sales(?), Target Sales(?)}

Page 23: CS 430 Database Theory

More Possible FDs

• {Employee ID Number, Territory Number} $\to$ {Territory Bonus(?), Quota(?), Commission Rate(?)}
• {Employee ID Number, Product Number} $\to$ {Quota(?), Commission Rate(?)}
• {Territory Number, Product Number} $\to$ {Price(?), Actual Sales(?), Target Sales(?), Territory Bonus(?), Commission Rate(?), Commission(?), Manager Commission(?)}

Page 24: CS 430 Database Theory

More Possible FDs

• {Employee ID Number, Product Number, Territory Number} $\to$ {Quota(?), Actual Sales(?), Target Sales(?), Commission Rate(?), Commission(?), Manager Commission(?)}
• {Actual Sales, Commission Rate} $\to$ {Commission}
• {Actual Sales, Manager Commission Rate} $\to$ {Manager Commission}

Page 25: CS 430 Database Theory

Proposed Solution

• Employee(Employee ID Number, Employee Name, Work Location, Manager ID Number)
• Manager(Manager ID Number, Manager Name, Manager Commission)
• Territory(Territory Number, Territory Name)
• Product(Product Number, Product Name)

Page 26: CS 430 Database Theory

More Proposed Solution

• Product_Territory(Product Number, Territory Number, Price)
• Employee_Territory(Employee ID Number, Territory Number, Territory Bonus)
• Employee_Product(Employee ID Number, Product Number, Commission Rate)
• Employee_Product_Territory(Employee ID Number, Product Number, Territory Number, Quota, Actual Sales, Target Sales, Commission, Manager Commission)
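The proposed relations could be declared as tables roughly as follows. This is only a sketch using SQLite from Python. The snake_case names and column types are assumptions, and so is the choice of primary keys: the transcript does not show which attributes are keys, so each key is guessed from the left-hand side of the matching possible FD.

```python
import sqlite3

# Keys and types below are assumptions; the slides list only attribute names.
SCHEMA = """
CREATE TABLE manager (
    manager_id INTEGER PRIMARY KEY, manager_name TEXT, manager_commission REAL);
CREATE TABLE employee (
    employee_id INTEGER PRIMARY KEY, employee_name TEXT, work_location TEXT,
    manager_id INTEGER REFERENCES manager(manager_id));
CREATE TABLE territory (
    territory_number INTEGER PRIMARY KEY, territory_name TEXT);
CREATE TABLE product (
    product_number INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE product_territory (
    product_number INTEGER REFERENCES product(product_number),
    territory_number INTEGER REFERENCES territory(territory_number),
    price REAL,
    PRIMARY KEY (product_number, territory_number));
CREATE TABLE employee_territory (
    employee_id INTEGER REFERENCES employee(employee_id),
    territory_number INTEGER REFERENCES territory(territory_number),
    territory_bonus REAL,
    PRIMARY KEY (employee_id, territory_number));
CREATE TABLE employee_product (
    employee_id INTEGER REFERENCES employee(employee_id),
    product_number INTEGER REFERENCES product(product_number),
    commission_rate REAL,
    PRIMARY KEY (employee_id, product_number));
CREATE TABLE employee_product_territory (
    employee_id INTEGER REFERENCES employee(employee_id),
    product_number INTEGER REFERENCES product(product_number),
    territory_number INTEGER REFERENCES territory(territory_number),
    quota REAL, actual_sales REAL, target_sales REAL,
    commission REAL, manager_commission REAL,
    PRIMARY KEY (employee_id, product_number, territory_number));
"""

con = sqlite3.connect(":memory:")
con.executescript(SCHEMA)  # SQLite does not enforce foreign keys unless PRAGMA foreign_keys is on
tables = sorted(
    row[0] for row in con.execute("SELECT name FROM sqlite_master WHERE type = 'table'"))
```

With the composite primary keys in place, the database itself rejects a second Price for the same Product and Territory, which is the single-relation FD check that dependency preservation is meant to allow.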