NORMALIZATION FOR
RELATIONAL DATABASE
• Is a formal process developed to help designers define (choose) “good” relational schemas.
• Is a formal process to help designers choose between “bad” and “good” designs.
But: What is a “GOOD” design?
Normalization:
Design Guidelines
• Relations should have a simple meaning.
• No insertion, deletion, or modification anomalies.
• Avoid requiring NULLs in relation columns.
• Beware of JOINs creating spurious tuples.
Relations should have a simple meaning
Figure 1: Good Design
Redundant Information in Tuples and Update Anomalies
• One goal of “good” design is to minimize the storage that the base relations occupy.
- Compare the storage needed for the two designs.
• Insertion Anomalies:
- It is difficult to add an employee who has not been assigned to a department.
- It is difficult to add a department that doesn’t have any employees yet.
• Deletion Anomalies:
- Information about a department is lost when its last employee is deleted.
• Modification Anomalies:
- Updating might create an inconsistent database, for example when changing the manager of department 5.
• Therefore, design the base relation schemas so that no insertion, deletion, or modification anomalies occur in the relations.
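The anomalies above can be reproduced on a small in-memory table. The following is a minimal Python sketch, assuming a hypothetical EMP_DEPT relation that stores employee and department facts together; the attribute names and values are made up for illustration and are not taken from the lecture figures.

```python
# Hypothetical EMP_DEPT relation: employee and department facts in one table.
# All attribute names and values are illustrative assumptions.
emp_dept = [
    {"ename": "Smith",   "dnumber": 5, "dname": "Research",       "dmgr": "Wong"},
    {"ename": "Narayan", "dnumber": 5, "dname": "Research",       "dmgr": "Wong"},
    {"ename": "Zelaya",  "dnumber": 4, "dname": "Administration", "dmgr": "Wallace"},
]


def managers_of(rows, dnumber):
    """Return the set of distinct managers recorded for one department."""
    return {r["dmgr"] for r in rows if r["dnumber"] == dnumber}


# Modification anomaly: the manager of department 5 is stored once per
# employee, so updating only one tuple leaves the table inconsistent.
rows = [dict(r) for r in emp_dept]
rows[0]["dmgr"] = "Borg"
inconsistent = managers_of(rows, 5)  # two different managers for one department

# Deletion anomaly: removing the last employee of department 4 also
# removes everything the table knew about that department.
remaining = [r for r in emp_dept if r["ename"] != "Zelaya"]
known_departments = {r["dnumber"] for r in remaining}

print(inconsistent)
print(known_departments)
```

Storing the department facts in a relation of their own would keep the manager in exactly one tuple, so neither problem could occur.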
• Avoid Requiring NULLs in Relation Columns:
- NULLs arise if many of the attributes do not apply to all tuples in the relation.
- NULLs cause problems when using aggregate operations such as COUNT or SUM.
- NULLs have multiple interpretations:
  – The attribute does not apply to this tuple.
  – The attribute value is “unknown”.
  – The value is known but absent.
Example:
“If only 10% of the employees have individual offices”, don’t include an “office_number” attribute in the EMPLOYEE relation; rather, create a new relation.
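The aggregate problem can be seen with an in-memory SQLite database. This is a small sketch using Python's standard `sqlite3` module; the table names (`employee`, `emp_office`) and the rows are assumptions chosen to mirror the office example, not part of the lecture.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Design with a mostly-NULL column (hypothetical data).
conn.execute("CREATE TABLE employee (name TEXT, office_number INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [("A", 101), ("B", None), ("C", None), ("D", None)],
)

# COUNT(*) counts rows, COUNT(column) silently skips NULLs.
count_all, count_office = conn.execute(
    "SELECT COUNT(*), COUNT(office_number) FROM employee"
).fetchone()

# Alternative design: a separate relation holding only employees with offices.
conn.execute(
    "CREATE TABLE emp_office (name TEXT PRIMARY KEY, office_number INTEGER NOT NULL)"
)
conn.execute("INSERT INTO emp_office VALUES ('A', 101)")
offices = conn.execute("SELECT COUNT(*) FROM emp_office").fetchone()[0]

print(count_all, count_office, offices)
conn.close()
```

In the second design no NULL is ever stored, so the two forms of COUNT cannot disagree.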
Be aware of joins that create spurious tuples
• Consider the following relation schema, which is derived from the EMP_PROJ relation (itself a very bad schema).
• Using that schema we cannot recover the information that was originally in the EMP_PROJ relation.
- After decomposing EMP_PROJ into EMP_PROJ1 and EMP_LOCS, a NATURAL JOIN of the two does not give back exactly the original information; it also produces spurious tuples.
• We should design relation schemas so that they can be JOINed with equality conditions on attributes that are either primary keys or foreign keys, in a way that guarantees that no spurious tuples are generated.
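How a bad decomposition produces spurious tuples can be sketched in a few lines of Python. The relation below is a simplified, hypothetical stand-in for EMP_PROJ (only three made-up attributes and two made-up rows), and `project` and `natural_join` are illustrative helpers, not a database API.

```python
def project(rows, attrs):
    """Projection with duplicate elimination; rows are dicts."""
    seen = {tuple(r[a] for a in attrs) for r in rows}
    return [dict(zip(attrs, t)) for t in sorted(seen)]


def natural_join(left, right):
    """Join two relations on every attribute name they share."""
    if not left or not right:
        return []
    common = sorted(set(left[0]) & set(right[0]))
    return [
        {**l, **r}
        for l in left
        for r in right
        if all(l[a] == r[a] for a in common)
    ]


def as_set(rows):
    """Represent a relation as a set of tuples so relations can be compared."""
    return {tuple(sorted(r.items())) for r in rows}


# Hypothetical original relation: who works on which project, and where.
original = [
    {"ename": "Smith", "pname": "ProductX", "plocation": "Houston"},
    {"ename": "Wong",  "pname": "ProductZ", "plocation": "Houston"},
]

# Decompose on plocation, which is neither a key nor a foreign key.
emp_locs = project(original, ["ename", "plocation"])
proj_locs = project(original, ["pname", "plocation"])

joined = natural_join(emp_locs, proj_locs)
spurious = as_set(joined) - as_set(original)

print(len(joined), len(spurious))
```

The join returns every original tuple plus combinations that were never true (for instance, Smith on ProductZ), and nothing in the result distinguishes the two kinds.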
Normalization Theory
• Helps a designer define a relation schema without the previous anomalies.
• Provides formal concepts that may be used to define the “goodness” and “badness” of individual relation schemas.
• Relational normalization is a process for identifying “stable” attribute groupings with high interdependency and affinity.
• Normalization is based on concepts of dependencies among attributes.
- These dependencies are called “Functional Dependencies”.
- They are used to identify “stable” groupings.
• Normalization theory uses the term “normal form” to describe the extent to which attributes have been grouped into stable relations.
• Numerous normal forms have been proposed, each trying to achieve a more stable grouping of attributes.
Figure: Normal Forms
Functional Dependencies (FD)
• A functional dependency is a constraint between two sets of attributes from the database.
Definition:
• Given a relation R, attribute Y of R is functionally dependent on attribute X of R, denoted X → Y, if and only if each X-value in R has associated with it precisely one Y-value in R (at any one time). Attributes X and Y may be composite.
Example:
• Using the EMPLOYEE relation:
• An alternate definition of FD:
Given a relation R, attribute Y of R is functionally dependent on attribute X of R if and only if, whenever two tuples t1 and t2 of R agree on their X-value (t1[X] = t2[X]), they must necessarily agree on their Y-value (t1[Y] = t2[Y]).
Example:
– Relation EMP_PROJ of Figure 4 satisfies the FD
• A functional dependency is a property of the meaning, or semantics, of the attributes in a relation schema.
• We use our understanding of the semantics of the attributes of R (that is, how they relate to one another) to specify the FDs that should hold on all relation instances.
• Functional dependence is a semantic notion.
– Recognizing the FDs is part of the process of understanding what the data means.
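The tuple-based definition of an FD given earlier translates directly into a check over one relation instance. The sketch below is an illustration in Python; `fd_holds` is a hypothetical helper, and the rows are made-up sample data rather than the contents of the lecture's figures. Because an FD is a semantic property of the schema, such a check can only refute a proposed FD on a given instance; passing it never proves that the FD holds in general.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two tuples agree on lhs but differ on rhs.

    rows: list of dicts (one relation instance)
    lhs, rhs: lists of attribute names (both may be composite)
    """
    seen = {}
    for t in rows:
        x = tuple(t[a] for a in lhs)
        y = tuple(t[a] for a in rhs)
        # setdefault stores y the first time x is seen, otherwise
        # returns the y-value recorded earlier for the same x.
        if seen.setdefault(x, y) != y:
            return False
    return True


# Hypothetical instance: hours worked by employees on projects.
emp_proj = [
    {"ssn": "111", "pnumber": 1, "hours": 32.5, "ename": "Smith"},
    {"ssn": "111", "pnumber": 2, "hours": 7.5,  "ename": "Smith"},
    {"ssn": "222", "pnumber": 2, "hours": 10.0, "ename": "Wong"},
]

print(fd_holds(emp_proj, ["ssn"], ["ename"]))             # not violated here
print(fd_holds(emp_proj, ["ssn"], ["hours"]))             # violated by ssn "111"
print(fd_holds(emp_proj, ["ssn", "pnumber"], ["hours"]))  # composite left-hand side
```

In this sample, ssn alone does not determine hours, while the composite {ssn, pnumber} does; whether that should hold for every instance is a design decision based on what the attributes mean.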