normalization for relational databaseepl242/lectures/normalization_theory_1.pdfand update anomalies...

NORMALIZATION FOR RELATIONAL DATABASE

Post on 25-Mar-2021


Normalization:

• Is a formal process developed to help designers define (choose) “good” relational schemas.

• Is a formal process to help designers choose between “bad” and “good” designs.

But: What is a “GOOD” design?

Design Guidelines

• Relations should have a simple meaning.

• No insertion, deletion, or modification anomalies.

• Avoid requiring NULLs in relation columns.

• Beware of joins that create spurious tuples.

Relations should have simple meaning

Figure 1: Good Design

Redundant Information in Tuples and Update Anomalies

• One goal of “good” design is to minimize the storage that the base relations occupy.
- Compare the storage needed for the two designs.

• Insertion Anomalies:
- It is difficult to add an employee who has not been assigned to a department.
- It is difficult to add a department that doesn’t have any employees yet.

Deletion Anomalies:
• Information about a department is lost when its last employee is deleted.

Modification Anomalies:
• Updating might create an inconsistent database. For example, changing the manager of department 5 requires updating every tuple that repeats that department's data.

• Therefore, design the base relation schemas so that no insertion, deletion, or modification anomalies occur in the relations.
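
The deletion and modification anomalies above can be reproduced on a tiny in-memory relation. The following is only a sketch: the flat relation `emp_dept`, its attribute names (`ssn`, `ename`, `dnumber`, `dname`, `dmgr_ssn`), and all values are assumptions made for illustration, not taken from the lecture's figures.

```python
# Hypothetical flat relation: department data is repeated in every employee tuple.
emp_dept = [
    {"ssn": "111", "ename": "Smith", "dnumber": 5, "dname": "Research", "dmgr_ssn": "333"},
    {"ssn": "222", "ename": "Wong", "dnumber": 5, "dname": "Research", "dmgr_ssn": "333"},
    {"ssn": "444", "ename": "Zelaya", "dnumber": 4, "dname": "Administration", "dmgr_ssn": "987"},
]


def managers_of(relation, dnumber):
    """Return every manager value recorded for one department."""
    return {t["dmgr_ssn"] for t in relation if t["dnumber"] == dnumber}


def departments(relation):
    """Return the department numbers that appear in the relation."""
    return {t["dnumber"] for t in relation}


# Modification anomaly: only one of the redundant copies is updated,
# so department 5 now appears to have two different managers.
modified = [dict(t) for t in emp_dept]
modified[0]["dmgr_ssn"] = "999"
inconsistent = managers_of(modified, 5)

# Deletion anomaly: deleting the last employee of department 4
# also removes every trace of the department itself.
after_delete = [t for t in emp_dept if t["ssn"] != "444"]
lost = departments(emp_dept) - departments(after_delete)

print(inconsistent, lost)
```

Storing department data in its own relation, keyed by department number, would leave one copy of each manager value and would keep a department on record even when it has no employees.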

• Avoid Requiring NULLs in Relation Columns:
- If many of the attributes do not apply to all tuples in the relation, many tuples end up holding NULLs.
- NULLs cause problems with aggregate operations such as COUNT or SUM.
- NULLs have multiple interpretations:
  – The attribute does not apply to this tuple.
  – The attribute value is ‘unknown’.
  – The value is known but absent (it has not been recorded yet).

Example: “If only 10% of the employees have individual offices”, don’t include an “office_number” attribute in the EMPLOYEE relation; rather, create a new relation.
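
To illustrate the aggregate problem and the suggested fix, here is a small sketch using Python's built-in `sqlite3` module. The table names `employee` and `emp_office`, the column names, and the rows are hypothetical; only the `office_number` attribute comes from the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Design 1 (discouraged): office_number is NULL for most employees.
conn.execute("CREATE TABLE employee (ssn TEXT PRIMARY KEY, office_number INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [("111", 101), ("222", None), ("333", None), ("444", 205)],
)

# COUNT(*) counts rows, while COUNT(column) silently skips NULLs,
# so the two aggregates disagree on the same table.
count_all, count_office = conn.execute(
    "SELECT COUNT(*), COUNT(office_number) FROM employee"
).fetchone()

# Design 2: a separate relation that holds only employees who have an office.
conn.execute(
    "CREATE TABLE emp_office (ssn TEXT PRIMARY KEY, office_number INTEGER NOT NULL)"
)
conn.executemany("INSERT INTO emp_office VALUES (?, ?)", [("111", 101), ("444", 205)])
offices = conn.execute("SELECT COUNT(*) FROM emp_office").fetchone()[0]

print(count_all, count_office, offices)
```

In the second design no NULL is ever stored, so a plain row count already gives the number of employees with an office.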

Be aware of joins that create spurious tuples

• Consider the following relation schemas, which are derived from the EMP_PROJ relation (itself a very bad schema).

Using those schemas we cannot recover the information that was originally in the EMP_PROJ relation.

- If we decompose EMP_PROJ into EMP_PROJ1 and EMP_LOCS and then recombine them with a NATURAL JOIN, we do not get the correct original information.

Page 14: NORMALIZATION FOR RELATIONAL DATABASEepl242/lectures/Normalization_Theory_1.pdfand Update Anomalies • One goal of “good” design is to minimize the storage that the base relation

We should design relation schemas so that they can be joined with equality conditions on attributes that are either primary keys or foreign keys, in a way that guarantees that no spurious tuples are generated.
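
The effect can be sketched in a few lines of Python. The relation names follow the text, but the attributes chosen for each relation, the extra relations `emp` and `works_on`, the helper functions `project` and `natural_join`, and the two sample tuples are all assumptions made for illustration.

```python
def project(rows, attrs):
    """Relational projection: keep attrs and drop duplicate tuples."""
    distinct = {tuple(r[a] for a in attrs) for r in rows}
    return [dict(zip(attrs, values)) for values in sorted(distinct)]


def natural_join(left, right):
    """Natural join: equate all attributes the two relations share."""
    common = [a for a in left[0] if a in right[0]]  # assumes non-empty inputs
    return [
        {**l, **r}
        for l in left
        for r in right
        if all(l[a] == r[a] for a in common)
    ]


# Hypothetical instance of EMP_PROJ.
emp_proj = [
    {"ssn": "111", "ename": "Smith", "pnumber": 1, "plocation": "Houston"},
    {"ssn": "222", "ename": "Wong", "pnumber": 2, "plocation": "Houston"},
]

# Bad decomposition: the only shared attribute, plocation, is not a key.
emp_locs = project(emp_proj, ["ename", "plocation"])
emp_proj1 = project(emp_proj, ["ssn", "pnumber", "plocation"])
joined = natural_join(emp_proj1, emp_locs)
spurious = [t for t in joined if t not in emp_proj]

# Better decomposition: the shared attribute, ssn, is a key of emp.
emp = project(emp_proj, ["ssn", "ename"])
works_on = project(emp_proj, ["ssn", "pnumber", "plocation"])
rejoined = natural_join(works_on, emp)

print(len(joined), len(spurious), len(rejoined))
```

Joining on `plocation` pairs every employee name with every project at the same location, which is where the extra tuples come from. Joining on the key `ssn` returns exactly the original tuples for this instance.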

Page 15: NORMALIZATION FOR RELATIONAL DATABASEepl242/lectures/Normalization_Theory_1.pdfand Update Anomalies • One goal of “good” design is to minimize the storage that the base relation

Normalization Theory

• Helps a designer define a relation schema without the previous anomalies.

• Provides formal concepts that may be used to define the “goodness” and “badness” of individual relation schemas.

• Relational normalization is a process for identifying “stable” attribute groupings with high interdependency and affinity.

• Normalization is based on concepts of dependencies among attributes.
- These dependencies are called “Functional Dependencies”.
- They are used to identify “stable” groupings.

Page 16: NORMALIZATION FOR RELATIONAL DATABASEepl242/lectures/Normalization_Theory_1.pdfand Update Anomalies • One goal of “good” design is to minimize the storage that the base relation

• Normalization theory uses the term “normal form” to describe the extent to which attributes have been grouped into stable relations.

• Numerous normal forms have been proposed, each trying to achieve a more stable grouping of attributes.

Figure: Normal Forms

Page 17: NORMALIZATION FOR RELATIONAL DATABASEepl242/lectures/Normalization_Theory_1.pdfand Update Anomalies • One goal of “good” design is to minimize the storage that the base relation

Functional Dependencies (FD)

• A functional dependency is a constraint between two sets of attributes from the database.

Definition:
• Given a relation R, attribute Y of R is functionally dependent on attribute X of R, denoted X → Y, if and only if each X-value in R has associated with it precisely one Y-value in R (at any one time). Attributes X and Y may be composite.

Page 18: NORMALIZATION FOR RELATIONAL DATABASEepl242/lectures/Normalization_Theory_1.pdfand Update Anomalies • One goal of “good” design is to minimize the storage that the base relation

Example:

• Using the EMPLOYEE relation:

Page 19: NORMALIZATION FOR RELATIONAL DATABASEepl242/lectures/Normalization_Theory_1.pdfand Update Anomalies • One goal of “good” design is to minimize the storage that the base relation

• An alternate definition of FD: Given a relation R, attribute Y of R is functionally dependent on attribute X of R if and only if whenever two tuples of R, t1 and t2, agree on their X-value, t1[X] = t2[X], they must necessarily agree on their Y-value, t1[Y] = t2[Y].

Example:
– Relation EMP_PROJ of Figure 4 satisfies the FD

Page 20: NORMALIZATION FOR RELATIONAL DATABASEepl242/lectures/Normalization_Theory_1.pdfand Update Anomalies • One goal of “good” design is to minimize the storage that the base relation

• A functional dependency is a property of the meaning, or semantics, of the attributes in a relation schema.

• We use our understanding of the semantics of the attributes of R – that is, how they relate to one another – to specify the FDs that should hold on all relation instances.

• Functional dependence is a semantic notion.
– Recognizing the FDs is part of the process of understanding what the data means.
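
The tuple-based definition (two tuples that agree on X must agree on Y) translates directly into a check over one relation instance. The sketch below assumes a hypothetical helper `satisfies_fd` and made-up rows and attribute names. Because an FD is a semantic constraint, such a check can only show that a particular instance violates an FD; passing the check does not prove that the FD holds for the schema.

```python
def satisfies_fd(rows, lhs, rhs):
    """Return True if no two tuples agree on lhs but differ on rhs."""
    seen = {}
    for t in rows:
        x = tuple(t[a] for a in lhs)
        y = tuple(t[a] for a in rhs)
        # setdefault stores y the first time x is seen, then returns the stored value.
        if seen.setdefault(x, y) != y:
            return False
    return True


# Hypothetical instance; attribute names and values are illustrative only.
rows = [
    {"ssn": "111", "ename": "Smith", "pnumber": 1, "hours": 32.5},
    {"ssn": "111", "ename": "Smith", "pnumber": 2, "hours": 7.5},
    {"ssn": "222", "ename": "Wong", "pnumber": 2, "hours": 10.0},
]

name_by_ssn = satisfies_fd(rows, ["ssn"], ["ename"])
hours_by_pair = satisfies_fd(rows, ["ssn", "pnumber"], ["hours"])  # composite X
hours_by_project = satisfies_fd(rows, ["pnumber"], ["hours"])

print(name_by_ssn, hours_by_pair, hours_by_project)
```

The last check fails because two tuples share `pnumber` 2 but record different hours, which is enough to rule out that dependency for this instance.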