Dr. T. Y. Lin | SJSU | CS 157A | Fall 2015 Chapter 3 Database Normalization 1

Upload: suzanna-perry

Post on 28-Dec-2015


Chapter 3

Database Normalization


Database Design Theory

Different Levels of Anomaly Problems

Normalization


Anomaly Problems

Initial

S#   Salary   STATUS   CITY     P#   QTY
S1   40000    20       LONDON   P1   300
S1   40000    20       LONDON   P2   200
S1   40000    20       LONDON   P3   400
S1   40000    20       LONDON   P4   200
S1   40000    20       LONDON   P5   100
S1   40000    20       LONDON   P6   100
S2   30000    10       PARIS    P1   300
S2   30000    10       PARIS    P2   400
S3   30000    10       PARIS    P2   200
S4   40000    20       LONDON   P2   200
S4   40000    20       LONDON   P4   300
S4   40000    20       LONDON   P5   400

Deletion/insertion anomaly

S#   Salary   STATUS   CITY     P#   QTY
S1   40000    20       LONDON   P1   300
S1   40000    20       LONDON   P2   200
S1   40000    20       LONDON   P3   400
S1   40000    20       LONDON   P4   200
S1   40000    20       LONDON   P5   100
S1   40000    20       LONDON   P6   100
S2   30000    10       PARIS    P1   300
S2   30000    10       PARIS    P2   400
S3   30000    10       PARIS    P2   200
S4   40000    20       LONDON   P2   200
S4   40000    20       LONDON   P4   300
S4   40000    20       LONDON   P5   400
S5   60000    30       ATHENS   -


Insertion/update anomaly

S#   Salary   STATUS   CITY     P#   QTY
S1   40000    20       LONDON   P1   300
S1   40000    20       LONDON   P2   200
S1   40000    20       LONDON   P3   400
S1   40000    20       LONDON   P4   200
S1   40000    20       LONDON   P5   100
S1   40000    20       LONDON   P6   100
S2   30000    10       PARIS    P1   300
S2   30000    10       PARIS    P2   400
S3   30000    10       PARIS    P2   200
S4   40000    20       LONDON   P2   200
S4   40000    20       LONDON   P4   300
S4   40000    20       LONDON   P5   400

Further Normalization

Database design is the problem of choosing a suitable logical structure for the data. In other words, we must decide which relations are needed and which attributes each of them should have.

Codd defined three normal forms (1NF, 2NF, 3NF) to remove some undesirable properties from relations. Boyce and Codd later defined a stronger normal form, the Boyce-Codd Normal Form (BCNF). Fagin then introduced 4NF and finally 5NF (PJ/NF).


NORMAL FORMS (nested diagram, outermost to innermost)

UNIVERSE OF RELATIONS (NORMALIZED AND UNNORMALIZED)
  1NF RELATIONS (NORMALIZED RELATIONS)
    2NF RELATIONS
      3NF RELATIONS
        BCNF RELATIONS
          4NF RELATIONS
            5NF (PJ/NF) RELATIONS


Functional Dependencies (FD)

Given a relation R, attribute Y of R is functionally dependent on attribute X of R if each X-value in R has associated with it precisely one Y-value in R (at any one time).

(No X-value is mapped to two different Y-values.)
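The definition can be tried out on a small tabulation. The following is a minimal Python sketch, not part of the original slides; the helper name `holds_fd` is hypothetical, and the rows reuse supplier values from the sample data in this chapter.

```python
def holds_fd(rows, lhs, rhs):
    """Return True if, in `rows`, each combination of `lhs` values
    is associated with exactly one combination of `rhs` values."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # setdefault stores y the first time x is seen, then returns the stored value
        if seen.setdefault(x, y) != y:
            return False
    return True


suppliers = [
    {"S#": "S1", "CITY": "LONDON", "STATUS": 20},
    {"S#": "S2", "CITY": "PARIS", "STATUS": 10},
    {"S#": "S3", "CITY": "PARIS", "STATUS": 10},
    {"S#": "S4", "CITY": "LONDON", "STATUS": 20},
]

print(holds_fd(suppliers, ["S#"], ["CITY"]))  # True
print(holds_fd(suppliers, ["CITY"], ["S#"]))  # False: LONDON maps to S1 and S4
```

A check like this only inspects the rows it is given, so it can refute an FD but cannot prove that the FD holds for every legal state of the relation.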


Functional Dependencies (FD)

A functional dependency is a special form of integrity constraint. In other words, every legal extension (tabulation) of that relation satisfies that constraint.

An attribute Y is said to be fully functionally dependent on X if Y is functionally dependent on X but not on any proper subset of X. From now on, by FD, we mean full FD.
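As an illustration of full functional dependency, here is a hedged Python sketch (the helper names `holds_fd` and `is_full_fd` are hypothetical). It tests the FD on the whole determinant and then on every non-empty proper subset, using four rows taken from the supplier-parts sample data.

```python
from itertools import combinations


def holds_fd(rows, lhs, rhs):
    """True if each lhs value combination maps to one rhs value combination."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False
    return True


def is_full_fd(rows, lhs, rhs):
    """True if lhs -> rhs holds and no non-empty proper subset of lhs determines rhs."""
    if not holds_fd(rows, lhs, rhs):
        return False
    for size in range(1, len(lhs)):
        for subset in combinations(lhs, size):
            if holds_fd(rows, list(subset), rhs):
                return False
    return True


rows = [
    {"S#": "S1", "P#": "P1", "QTY": 300, "CITY": "LONDON"},
    {"S#": "S1", "P#": "P2", "QTY": 200, "CITY": "LONDON"},
    {"S#": "S2", "P#": "P1", "QTY": 300, "CITY": "PARIS"},
    {"S#": "S2", "P#": "P2", "QTY": 400, "CITY": "PARIS"},
]

print(is_full_fd(rows, ["S#", "P#"], ["QTY"]))   # True
print(is_full_fd(rows, ["S#", "P#"], ["CITY"]))  # False: S# alone determines CITY
```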


First Normal Form Relations (1NF)

A relation is said to be in 1NF if all underlying domains contain atomic values only, so any normalized relation is in 1NF.

Unnormalized:

G#   SNAME          STATUS   CITY
G1   SMITH, ADAMS   20, 30   LONDON, ATHENS
G2   JONES, BLAKE   10, 30   PARIS
G3   BLAKE          30       PARIS
G4   CLARK          20       LONDON
G5   ADAMS          30       ATHENS


First Normal Form Relations (1NF)

Normalized (1NF), rows for G1 and SMITH:

G#   SNAME   STATUS   CITY
G1   SMITH   20       LONDON
G1   SMITH   20       ATHENS
G1   SMITH   30       LONDON
G1   SMITH   30       ATHENS

Unnormalized:

G#   SNAME          STATUS   CITY
G1   SMITH, ADAMS   20, 30   LONDON, ATHENS
G2   JONES, BLAKE   10, 30   PARIS
G3   BLAKE          30       PARIS
G4   CLARK          20       LONDON
G5   ADAMS          30       ATHENS
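The expansion of a row with multi-valued cells into atomic rows can be sketched in Python. This is an illustration only: the function name `to_1nf` is hypothetical, and it assumes, as the G1 rows on the slide suggest, that every combination of the listed values is intended.

```python
from itertools import product


def to_1nf(row):
    """Expand one row whose cells may be lists into rows of atomic values."""
    columns = list(row)
    # Wrap atomic cells in a one-element list so every cell is iterable
    value_lists = [row[c] if isinstance(row[c], list) else [row[c]] for c in columns]
    return [dict(zip(columns, combo)) for combo in product(*value_lists)]


g1 = {
    "G#": "G1",
    "SNAME": ["SMITH", "ADAMS"],
    "STATUS": [20, 30],
    "CITY": ["LONDON", "ATHENS"],
}

atomic_rows = to_1nf(g1)
print(len(atomic_rows))  # 8 rows: 2 names x 2 statuses x 2 cities
print(atomic_rows[0])    # {'G#': 'G1', 'SNAME': 'SMITH', 'STATUS': 20, 'CITY': 'LONDON'}
```

The first four rows produced are the SMITH rows listed for G1; the remaining four are the corresponding ADAMS rows.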


First Normal Form Relations (1NF)


All relations will be in 1NF


First Normal Form Relations (1NF)

First

S#   STATUS   CITY     P#   QTY
S1   20       LONDON   P1   300
S1   20       LONDON   P2   200
S1   20       LONDON   P3   400
S1   20       LONDON   P4   200
S1   20       LONDON   P5   100
S1   20       LONDON   P6   100
S2   10       PARIS    P1   300
S2   10       PARIS    P2   400
S3   10       PARIS    P2   200
S4   20       LONDON   P2   200
S4   20       LONDON   P4   300
S4   20       LONDON   P5   400


Functional Dependencies In The Relation First

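We can check an FD against the current data with SQL, but passing the check is merely a NECESSARY condition: an FD must hold in every legal tabulation, not only in the one we happen to have (see "GROUP BY" in Ch. 6).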
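One way to run such a check is a GROUP BY query that looks for determinant values paired with more than one dependent value. The sketch below uses Python's standard `sqlite3` module with an in-memory table; the table name `first_rel`, the column names, and the function `fd_violations` are assumptions made for this illustration, and the column arguments are expected to be trusted column names.

```python
import sqlite3

# Rows of the relation First: (S#, STATUS, CITY, P#, QTY)
FIRST_ROWS = [
    ("S1", 20, "LONDON", "P1", 300), ("S1", 20, "LONDON", "P2", 200),
    ("S1", 20, "LONDON", "P3", 400), ("S1", 20, "LONDON", "P4", 200),
    ("S1", 20, "LONDON", "P5", 100), ("S1", 20, "LONDON", "P6", 100),
    ("S2", 10, "PARIS", "P1", 300), ("S2", 10, "PARIS", "P2", 400),
    ("S3", 10, "PARIS", "P2", 200), ("S4", 20, "LONDON", "P2", 200),
    ("S4", 20, "LONDON", "P4", 300), ("S4", 20, "LONDON", "P5", 400),
]


def fd_violations(lhs, rhs):
    """Return the lhs values that occur with more than one rhs value."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE first_rel "
        "(s_no TEXT, status INTEGER, city TEXT, p_no TEXT, qty INTEGER)"
    )
    conn.executemany("INSERT INTO first_rel VALUES (?, ?, ?, ?, ?)", FIRST_ROWS)
    query = (
        f"SELECT {lhs} FROM first_rel GROUP BY {lhs} "
        f"HAVING COUNT(DISTINCT {rhs}) > 1"
    )
    violations = conn.execute(query).fetchall()
    conn.close()
    return violations


print(fd_violations("s_no", "city"))         # [] : no counterexample to S# -> CITY
print(sorted(fd_violations("p_no", "qty")))  # parts shipped in differing quantities
```

An empty result means only that the current rows contain no counterexample; it does not establish the FD as a constraint.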

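(FD diagram for First over the attributes S#, P#, QTY, CITY, and STATUS: (S#, P#) → QTY, S# → CITY, S# → STATUS, CITY → STATUS.)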


Second Normal Form (2NF)

A relation is in 2NF if it is in 1NF and every non-key attribute (one that is not part of any CK) is fully functionally dependent (ffd) on the primary key.

An analogy: W = a * sin X + b * cos Y

(a and b are two parameters)

W is ffd on X and Y if both a and b are non-zero.

W is not ffd on X and Y if either a or b is zero; for example, W = 0 * sin X + b * cos Y depends on Y alone.
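The 2NF condition can also be checked mechanically from a set of FDs. The following Python sketch is an illustration under stated assumptions: the function names are hypothetical, a single candidate key is supplied by the caller, and the FD list is the one assumed here for the relation First ((S#, P#) → QTY, S# → CITY, S# → STATUS, CITY → STATUS).

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure: everything determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def partial_dependencies(key, attributes, fds):
    """List (subset, attribute) pairs where a non-key attribute
    depends on a proper subset of the key (a 2NF violation)."""
    nonkey = set(attributes) - set(key)
    found = []
    for size in range(1, len(key)):
        for subset in combinations(key, size):
            for attr in sorted(nonkey & closure(subset, fds)):
                found.append((subset, attr))
    return found


FIRST_ATTRS = ["S#", "P#", "STATUS", "CITY", "QTY"]
FIRST_FDS = [
    (("S#", "P#"), ("QTY",)),
    (("S#",), ("CITY",)),
    (("S#",), ("STATUS",)),
    (("CITY",), ("STATUS",)),
]

print(partial_dependencies(("S#", "P#"), FIRST_ATTRS, FIRST_FDS))
# [(('S#',), 'CITY'), (('S#',), 'STATUS')]
```

Under these assumed FDs, CITY and STATUS depend on S# alone, so First is not in 2NF, while QTY is fully dependent on the key.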


BCNF (Boyce-Codd Normal Form)

For relations with one or more candidate keys:

A relation R is said to be in BCNF if and only if every determinant is a candidate key.

A determinant is an attribute, possibly composite, on which some other attribute is fully functionally dependent.


2NF And SP

(FD diagrams for the two projections: one over S#, STATUS, CITY and one over S#, P#, QTY.)


2NF and SP

Second

S#   STATUS   CITY
S1   20       LONDON
S2   10       PARIS
S3   10       PARIS
S4   20       LONDON
S5   30       ATHENS

SP

S#   P#   QTY
S1   P1   300
S1   P2   200
S1   P3   400
S1   P4   200
S1   P5   100
S1   P6   100
S2   P1   300
S2   P2   400
S3   P2   200
S4   P2   200
S4   P4   300
S4   P5   400
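The decomposition of First into these two projections can be reproduced, and its losslessness checked, with a short Python sketch. The helper names (`project`, `natural_join`, `as_set`) are hypothetical, and the rows are those of the relation First.

```python
COLUMNS = ["S#", "STATUS", "CITY", "P#", "QTY"]
FIRST = [
    ("S1", 20, "LONDON", "P1", 300), ("S1", 20, "LONDON", "P2", 200),
    ("S1", 20, "LONDON", "P3", 400), ("S1", 20, "LONDON", "P4", 200),
    ("S1", 20, "LONDON", "P5", 100), ("S1", 20, "LONDON", "P6", 100),
    ("S2", 10, "PARIS", "P1", 300), ("S2", 10, "PARIS", "P2", 400),
    ("S3", 10, "PARIS", "P2", 200), ("S4", 20, "LONDON", "P2", 200),
    ("S4", 20, "LONDON", "P4", 300), ("S4", 20, "LONDON", "P5", 400),
]
first = [dict(zip(COLUMNS, values)) for values in FIRST]


def project(rows, attrs):
    """Project rows onto attrs, dropping duplicate tuples."""
    unique = {tuple(row[a] for a in attrs) for row in rows}
    return [dict(zip(attrs, values)) for values in sorted(unique)]


def natural_join(left, right):
    """Join two lists of dict rows on the attributes they share."""
    shared = [a for a in left[0] if a in right[0]]
    return [{**l, **r} for l in left for r in right
            if all(l[a] == r[a] for a in shared)]


def as_set(rows):
    """Order-independent view of a list of rows, for comparison."""
    return {tuple(sorted(row.items())) for row in rows}


second = project(first, ["S#", "STATUS", "CITY"])
sp = project(first, ["S#", "P#", "QTY"])

print(len(second), len(sp))                                # 4 12
print(as_set(natural_join(second, sp)) == as_set(first))   # True: the join restores First
```

The projection yields four supplier rows. The tabulation of Second shown above also contains S5, a supplier with no shipments, which First had no way to record.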


2NF and SP

S#   STATUS   CITY
S1   20       LONDON
S2   10       PARIS
S3   10       PARIS
S4   20       LONDON
S5   30       ATHENS

Insertion anomaly is fixed: supplier S5 (STATUS 30, ATHENS) is recorded without any shipment.

Update anomaly is fixed: changing a supplier's CITY (for example, to AMSTERDAM) touches exactly one row.


2NF and SP

S#   P#   QTY
S1   P1   300
S1   P2   200
S1   P3   400
S1   P4   200
S1   P5   100
S1   P6   100
S2   P1   300
S2   P2   400
S3   P2   200
S4   P2   200
S4   P4   300
S4   P5   400

Deletion anomaly is fixed


“Degree Two” Problems

Second (Update, deletion and insertion anomaly)

S#   STATUS   CITY
S1   20       LONDON
S2   10       PARIS
S3   10       PARIS
S4   20       LONDON
-    60       ROME

S#   P#   QTY
S1   P1   300
S1   P2   200
S1   P3   400
S1   P4   200
S1   P5   100
S1   P6   100
S2   P1   300
S2   P2   400
S3   P2   200
S4   P2   200
S4   P4   300
S4   P5   400


Functional Dependencies In The Third Normal Form (3NF)

Definition 1: A relation is in 3NF if it is in 2NF and every non-key attribute is non-transitively dependent on the candidate key.

Definition 2: A relation is in 3NF if every non-trivial FD either starts from a superkey or ends at part of a CK.
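Definition 2 lends itself to a direct test. The Python sketch below is illustrative only: the function names are hypothetical, the candidate keys are supplied by the caller, and the FDs are those assumed in this chapter for the relation over S#, STATUS, CITY.

```python
def closure(attrs, fds):
    """Attribute closure: everything determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def violations_3nf(attributes, fds, candidate_keys):
    """Return (lhs, attr) pairs for non-trivial FDs that neither start
    from a superkey nor end at an attribute that is part of a CK."""
    prime = set().union(*candidate_keys)
    bad = []
    for lhs, rhs in fds:
        is_superkey = closure(lhs, fds) >= set(attributes)
        for attr in rhs:
            if attr in lhs:
                continue  # trivial dependency
            if not is_superkey and attr not in prime:
                bad.append((lhs, attr))
    return bad


SECOND_ATTRS = ["S#", "STATUS", "CITY"]
SECOND_FDS = [(("S#",), ("CITY",)), (("S#",), ("STATUS",)), (("CITY",), ("STATUS",))]

print(violations_3nf(SECOND_ATTRS, SECOND_FDS, [("S#",)]))
# [(('CITY',), 'STATUS')]
```

The single violation is the FD CITY → STATUS: CITY is not a superkey and STATUS is not part of the key, which is the transitive dependency that Definition 1 rules out.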


Functional Dependencies In The Third Normal Form (3NF)

Definition 3: A relation R is in 3NF iff the non-key attributes of R are

a) mutually independent

b) fully dependent on the primary key of R.

Definition 3 (in other words): A relation R is in 3NF if, for all time, each tuple consists of a primary key value that identifies some entity, together with a set of zero or more mutually independent attribute values that describe that entity in some way.


Sample Tabulations Of SC and CS

SC

S#   CITY
S1   LONDON
S2   PARIS
S3   PARIS
S4   LONDON
S5   ATHENS

CS

CITY     STATUS
ATHENS   30
LONDON   20
PARIS    10


Functional Dependencies In The Relations SC and CS

SC: S# → CITY

CS: CITY → STATUS


Another set of examples (Skip 2012)

LOTS (PROPERTY_ID#, COUNTY_NAME, LOT#, AREA, PRICE, TAX_RATE), with FDs fd1, fd2, fd3, fd4

LOTS1 (PROPERTY_ID#, COUNTY_NAME, LOT#, AREA, PRICE), with FDs fd1, fd2, fd4

LOTS2 (COUNTY_NAME, TAX_RATE), with FD fd3


Another set of examples (Skip 2012)

LOTS1 (PROPERTY_ID#, COUNTY_NAME, LOT#, AREA, PRICE), with FDs fd1, fd2, fd4

LOTS2 (COUNTY_NAME, TAX_RATE), with FD fd3

LOTS1A (PROPERTY_ID#, COUNTY_NAME, LOT#, AREA), with FDs fd1, fd2

LOTS1B (AREA, PRICE), with FD fd4

Summary:

1NF: LOTS
2NF: LOTS1, LOTS2
3NF: LOTS1A, LOTS1B, LOTS2


Another set of examples (Skip 2012)

Figure 13.11 Example to illustrate normalization to 2NF and 3NF.

(a) The LOTS relation schema and its functional dependencies fd1 through fd4.

(b) Decomposing LOTS into the 2NF relations LOTS1 and LOTS2.

(c) Decomposing LOTS1 into the 3NF relations LOTS1A and LOTS1B.

(d) Summary of normalization of LOTS.


Boyce-Codd Normal Form (BCNF)

Codd's original definition of 3NF did not deal satisfactorily with the case of a relation that

(a) had multiple CKs, where

(b) the CKs were composite, and

(c) the CKs overlapped.

The original 3NF definition was subsequently replaced by the stronger BCNF.


Boyce-Codd Normal Form (BCNF)

Relations with one or more candidate keys

A relation R is said to be in BCNF iff every determinant is a candidate key.

A determinant is an attribute, possibly composite, on which some other attribute is fully functionally dependent.


Boyce-Codd Normal Form (BCNF)

Consider a relation SJT with attributes S (student), J (subject), and T (teacher). The meaning of the tuple (s, j, t) is that student s is taught subject j by teacher t. Suppose, in addition, that the following constraints apply.

1. For each subject, each student of that subject is taught by only one teacher.

2. Each teacher teaches only one subject.

3. Each subject is taught by several teachers.

The first two constraints give the FDs (S, J) → T and T → J.
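A BCNF test under these constraints can be sketched in Python. This is an illustration, not the slides' own method: the function names are hypothetical, the FDs (S, J) → T and T → J are the ones implied by the first two constraints, and the test asks whether each determinant is a superkey, which for full FDs amounts to being a candidate key.

```python
def closure(attrs, fds):
    """Attribute closure: everything determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Return the non-trivial FDs whose determinant is not a superkey."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs)
        and not closure(lhs, fds) >= set(attributes)
    ]


SJT_ATTRS = ["S", "J", "T"]
SJT_FDS = [(("S", "J"), ("T",)), (("T",), ("J",))]

print(bcnf_violations(SJT_ATTRS, SJT_FDS))
# [(('T',), ('J',))]
```

T determines J but is not a candidate key of SJT, so SJT is not in BCNF under these FDs.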


Boyce-Codd Normal Form (BCNF)

Problem: If we delete the tuple for student 'Jones' and subject 'Physics', we also lose the information that 'Brown' teaches 'Physics' (Professor gets fired?).

Solution: Split SJT into ST (S, T) and TJ (T, J).

This decomposition avoids the above problem but introduces different problems. What are they? What are the candidate keys?


Sample Tabulation Of The Relation SJT

S       J         T
SMITH   MATH      Prof. WHITE
SMITH   PHYSICS   Prof. GREEN
JONES   MATH      Prof. WHITE
JONES   PHYSICS   Prof. BROWN

(FD diagram over S, J, T: (S, J) → T and T → J.)


Sample Tabulations

TJ

J         T
MATH      Prof. WHITE
PHYSICS   Prof. GREEN
PHYSICS   Prof. BROWN

ST

S       T
SMITH   Prof. WHITE
SMITH   Prof. GREEN
JONES   Prof. WHITE
JONES   Prof. BROWN


Boyce-Codd Normal Form (BCNF)

Consider the relation EXAM with overlapping candidate keys (S, J) and (J, P), and with attributes S (student), J (subject), and P (position). The meaning of an EXAM tuple (s, j, p) is that student s was examined in subject j and achieved position p in the class list. Let us assume that the following constraint holds.

There are no ties; that is, no two students obtained the same position in the same subject.
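The two candidate keys can be derived from FDs by brute force. In the Python sketch below the function names are hypothetical, and the FDs are assumptions read off the description: (S, J) → P because a student achieves one position per subject, and (J, P) → S because there are no ties.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure: everything determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Enumerate minimal attribute sets whose closure is the whole relation."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(attributes, size):
            if any(set(k) <= set(combo) for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(combo, fds) >= set(attributes):
                keys.append(combo)
    return keys


EXAM_ATTRS = ["S", "J", "P"]
EXAM_FDS = [(("S", "J"), ("P",)), (("J", "P"), ("S",))]

print(candidate_keys(EXAM_ATTRS, EXAM_FDS))
# [('S', 'J'), ('J', 'P')]
```

The enumeration is exponential in the number of attributes, which is acceptable for relations of this size.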


Boyce-Codd Normal Form (BCNF)

Note that update anomalies such as those associated with relation SJT do not apply to relation EXAM. Why?

Overlapping candidate keys do not necessarily lead to problems. In what normal form is relation EXAM?


Sample Tabulation Of SJP

Student S was examined in subject J and achieved position P.

There are no ties; no two students obtained the same position in the same subject.

S       J         P
SMITH   MATH      FIRST (M)
SMITH   PHYSICS   FIRST (P)
JONES   MATH      SECOND (M)
JONES   PHYSICS   SECOND (P)


Boyce-Codd Normal Form (BCNF)

Illustrating BCNF: (a) BCNF normalization with the dependency of fd2 being "lost" in the decomposition. (b) A relation R in 3NF but not in BCNF.

(a) LOTS1A (PROPERTY_ID#, COUNTY_NAME, LOT#, AREA), with FDs fd1, fd2, fd5

BCNF normalization of LOTS1A yields:

LOTS1AX (PROPERTY_ID#, AREA, LOT#)

LOTS1AY (AREA, COUNTY_NAME)

(b) R (A, B, C), with FDs fd1, fd2


Good and Bad Decomposition

In decomposition (A), the two projections are independent of one another, in the following sense:

Updates can be made to either one without regard for the other, provided that the update does not violate the primary key uniqueness constraint for that projection.

Actually, if attribute CITY of relation SC is regarded as a foreign key matching the primary key CITY of relation CS, then a certain amount of cross-checking between the two projections will be required on updates after all.


Independent Components

In decomposition (B), by contrast, updates to either of the two projections must be monitored to ensure that the FD CITY → STATUS is not violated (if two suppliers have the same city, they must have the same status).

Consider, for example: what is involved in decomposition (B) in moving supplier S1 from London to Paris?

In decomposition (B), the FD CITY → STATUS has become an inter-relational constraint.


Independent Components

Rissanen shows that projections R1 and R2 of a relation R are independent if and only if

1. Every FD in R can be logically deduced from those in R1 and R2, and

2. The common attributes of R1 and R2 form a candidate key for at least one of the pair.

Recall the relation SJT with its two projections ST (S, T) and TJ (T, J).

These two projections are not independent. The FD (S, J) → T cannot be deduced from the FD T → J, so condition 1 of Rissanen's theorem fails.
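Condition 1 can be checked by projecting the FDs onto each component and testing whether every original FD follows from the union. The Python sketch below is an illustration with hypothetical function names; it assumes the FDs (S, J) → T and T → J for SJT and enumerates attribute subsets, which is only practical for small relations.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure: everything determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def project_fds(fds, attrs):
    """FDs of the original relation that hold within the projection on attrs."""
    projected = []
    for size in range(1, len(attrs) + 1):
        for lhs in combinations(attrs, size):
            rhs = (closure(lhs, fds) & set(attrs)) - set(lhs)
            if rhs:
                projected.append((lhs, tuple(sorted(rhs))))
    return projected


def is_preserved(fd, fds, parts):
    """True if fd can be deduced from the FDs projected onto the parts."""
    lhs, rhs = fd
    local = [f for part in parts for f in project_fds(fds, part)]
    return set(rhs) <= closure(lhs, local)


SJT_FDS = [(("S", "J"), ("T",)), (("T",), ("J",))]
PARTS = [("S", "T"), ("T", "J")]

print(is_preserved((("S", "J"), ("T",)), SJT_FDS, PARTS))  # False
print(is_preserved((("T",), ("J",)), SJT_FDS, PARTS))      # True
```

Only T → J survives in the projections, so (S, J) → T has to be enforced across ST and TJ together.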


Independent Components

Relations which cannot be decomposed into independent components are said to be atomic. Thus, SJT is atomic, even though it is not in BCNF.

Unfortunately, we are forced to the unpleasant conclusion that the two objectives of decomposing a relation into BCNF components and decomposing it into independent components may occasionally be in conflict.