CS 377 Database Systems
Database Design Theory and Normalization
Li Xiong
Department of Mathematics and Computer Science
Emory University
Relational database design
• So far
  • Conceptual database design - ER Model
  • Logical database design - relational model
  • Mapping from ER to Relational Model
  • Relational Algebra
  • SQL
• Relational database design - relational model
  • goodness measures of relational schemas
Anomalies
• Insert anomaly:
  • an insert operation that adds ONE item of information must insert
    multiple tuples into some relation or must use NULL values
• Delete anomaly:
  • a delete operation that removes ONE item of information must delete
    multiple tuples from some relation or causes "additional" (unintended)
    information loss
• Update anomaly:
  • an update operation that changes ONE item of information must update
    multiple tuples and may result in logical inconsistencies
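The three anomalies can be seen on a small denormalized relation. The sketch below is only an illustration: the relation emp_dept, its attribute names, and its rows are hypothetical and are not taken from the slides.

```python
# Hypothetical denormalized relation: every employee row repeats the manager
# of the employee's department, so one fact is stored in several tuples.
emp_dept = [
    {"ename": "Smith",   "dnumber": 5, "dmgr": "Wong"},
    {"ename": "Narayan", "dnumber": 5, "dmgr": "Wong"},
    {"ename": "Zelaya",  "dnumber": 4, "dmgr": "Wallace"},
]


def update_manager(rows, dnumber, new_mgr):
    """Change ONE fact (a department's manager); return how many tuples changed."""
    touched = 0
    for row in rows:
        if row["dnumber"] == dnumber:
            row["dmgr"] = new_mgr
            touched += 1
    return touched


def delete_employee(rows, ename):
    """Delete ONE employee; return the tuples that remain."""
    return [row for row in rows if row["ename"] != ename]


def departments(rows):
    """Set of department numbers still recorded in the relation."""
    return {row["dnumber"] for row in rows}


# Update anomaly: one changed fact touches several tuples.
touched = update_manager(emp_dept, 5, "Borg")

# Delete anomaly: removing the only employee of department 4 also loses
# the information that department 4 exists and who manages it.
remaining = delete_employee(emp_dept, "Zelaya")
lost_departments = departments(emp_dept) - departments(remaining)

# Insert anomaly: a department without employees can only be recorded
# by using a NULL (None) in the employee attribute.
new_dept_row = {"ename": None, "dnumber": 1, "dmgr": "Borg"}
```

Here the single manager change touches two tuples, and the single deletion loses department 4 entirely.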
Generation of Spurious Tuples
• Figure 15.5(a)
  • Relation schemas EMP_LOCS and EMP_PROJ1
• NATURAL JOIN
  • Result produces many more tuples than the original set of tuples in
    EMP_PROJ
  • Called spurious tuples
  • Represent spurious information that is not valid
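A minimal sketch of how such a join goes wrong, assuming EMP_PROJ is decomposed into EMP_LOCS(ename, plocation) and EMP_PROJ1(ssn, pnumber, plocation). The attribute lists are shortened, the rows are made up for illustration, and the helper names natural_join and project are hypothetical.

```python
def project(rows, attrs):
    """Relational projection with duplicate elimination."""
    result = []
    for row in rows:
        t = {a: row[a] for a in attrs}
        if t not in result:
            result.append(t)
    return result


def natural_join(r, s):
    """Join two lists of dicts on all attribute names they have in common."""
    if not r or not s:
        return []
    common = set(r[0]) & set(s[0])
    return [
        {**t1, **t2}
        for t1 in r
        for t2 in s
        if all(t1[a] == t2[a] for a in common)
    ]


# Made-up state of EMP_PROJ: two different projects share one location.
emp_proj = [
    {"ssn": "111", "ename": "Smith",  "pnumber": 1, "plocation": "Bellaire"},
    {"ssn": "222", "ename": "Wong",   "pnumber": 2, "plocation": "Bellaire"},
    {"ssn": "333", "ename": "Zelaya", "pnumber": 3, "plocation": "Stafford"},
]

# Decomposition whose only shared attribute is plocation (not a key of either).
emp_locs = project(emp_proj, ["ename", "plocation"])
emp_proj1 = project(emp_proj, ["ssn", "pnumber", "plocation"])

joined = natural_join(emp_locs, emp_proj1)
spurious = [t for t in joined if t not in emp_proj]
```

Because plocation is not a key of either projection, the join pairs Smith with Wong's project and vice versa: joined holds five tuples where emp_proj had three, and the two extra ones are spurious.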
Problematic Designs
• Anomalies cause redundant work to be done
• Waste of storage space due to NULLs
• Difficulty of performing operations and joins due to NULL values
• Generation of invalid and spurious data during joins
Informal Design Guidelines for “Good” Relation Schemas
• Clear schema and attribute semantics
• No insertion, deletion, or update anomalies are present
• Reducing redundant information in tuples
• Reducing NULL values in tuples
• Relations can be joined with equality conditions on related attributes,
  with a guarantee that no spurious tuples are generated
Database Design Theory
• Normal forms
  • Each normal form defines a set of properties that relations must satisfy
  • When relations possess these properties, they exhibit fewer anomalies
  • Successively higher degrees of stringency
• Database normalization
  • Certify whether a database design satisfies a certain normal form
  • Correct a database design to achieve a certain normal form
• Additional properties
  • Nonadditive join property
  • Dependency preservation property
History
• Relational database model
  • 1970, Codd
• 1NF, 2NF and 3NF (first, second, and third normal form)
  • 1972, Codd
  • Based on the concept of functional dependency
• BCNF (Boyce-Codd Normal Form)
  • 1974, Boyce & Codd
  • new and stronger 3NF
• 4NF
  • 1977, Fagin
  • multi-valued dependencies
• 5NF (projection-join normal form)
  • 1979, Fagin
First Normal Form
• Part of the formal definition of a relation in the basic (flat)
  relational model
• Only attribute values permitted are single atomic (or indivisible) values
• Techniques to achieve first normal form
  • Remove attribute violating 1NF and place in separate relation
  • Expand the key
  • Use several atomic attributes if maximum number of values is known
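The first two techniques can be sketched on a relation with a multivalued attribute. The DEPARTMENT-style data, the attribute names, and the helper names split_out and expand_key are assumptions made for this illustration.

```python
# Hypothetical relation that violates 1NF: dlocations holds a set of values.
department = [
    {"dnumber": 5, "dname": "Research",
     "dlocations": ["Bellaire", "Sugarland", "Houston"]},
    {"dnumber": 4, "dname": "Administration", "dlocations": ["Stafford"]},
]


def split_out(rows, key, multi):
    """Technique 1: move the multivalued attribute into a separate relation
    that carries the original key alongside each single value."""
    base = [{a: v for a, v in row.items() if a != multi} for row in rows]
    separate = [{key: row[key], multi: v} for row in rows for v in row[multi]]
    return base, separate


def expand_key(rows, multi):
    """Technique 2: keep one relation, one tuple per value; the key then
    becomes (old key, multivalued attribute), at the cost of redundancy."""
    return [{**row, multi: v} for row in rows for v in row[multi]]


dept, dept_locations = split_out(department, "dnumber", "dlocations")
expanded = expand_key(department, "dlocations")
```

The first technique leaves dname stored once per department, while the second repeats it for every location, which is why the separate relation is usually preferred.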
Functional Dependency
• Constraint between two sets of attributes X and Y, written X → Y
  • X functionally determines Y: any two tuples that agree on X must also
    agree on Y
  • Y is functionally dependent on X
• Notes
  • If X is a candidate key of R, then X → R
  • X → Y does not necessarily imply Y → X
Functional Dependency
• An FD is a property of the semantics or meaning of the attributes
• An FD is a property of the relational schema, not of a particular legal
  relation state
  • An FD must be defined based on the semantics of the attributes
  • An FD cannot be inferred automatically from a given populated relation
  • A populated relation can only suggest that an FD may exist
  • Can state that an FD does not hold if the relation state contains
    violations of that FD
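The asymmetry in the last points can be sketched with a small checker: a relation state can refute an FD, but an empty list of violations only means the FD may exist. The function name fd_violations and the TEACH-style sample rows are hypothetical.

```python
def fd_violations(rows, lhs, rhs):
    """Return pairs of tuples that agree on lhs but differ on rhs.

    A non-empty result proves that lhs -> rhs does NOT hold.
    An empty result only shows the FD is not contradicted by this state.
    """
    first_seen = {}
    violations = []
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if x in first_seen and first_seen[x][0] != y:
            violations.append((first_seen[x][1], row))
        first_seen.setdefault(x, (y, row))
    return violations


# Made-up relation state.
teach = [
    {"teacher": "Smith", "course": "Data Structures", "text": "Bartram"},
    {"teacher": "Smith", "course": "Data Management", "text": "Martin"},
    {"teacher": "Hall",  "course": "Compilers",       "text": "Hoffman"},
    {"teacher": "Brown", "course": "Data Structures", "text": "Horowitz"},
]

teacher_course = fd_violations(teach, ["teacher"], ["course"])
text_course = fd_violations(teach, ["text"], ["course"])
```

The two Smith tuples refute teacher → course. No tuples contradict text → course, so that FD may exist, but only the semantics of the attributes can confirm it.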
Definitions of Keys and Attributes Participating in Keys
• Definition of superkey and key
• Candidate key
  • If more than one key in a relation schema
    • One is primary key
    • Others are secondary keys
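Given a set of FDs, superkeys and candidate keys can be computed from attribute closures. The sketch below uses a brute-force search that is only practical for small schemas; the CAR-style schema, its FDs, and the function names are assumptions for illustration.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure: everything functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_superkey(attrs, schema, fds):
    """A superkey determines every attribute of the schema."""
    return closure(attrs, fds) >= set(schema)


def candidate_keys(schema, fds):
    """Minimal superkeys, found by trying attribute sets from small to large."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            candidate = set(combo)
            if any(k <= candidate for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if is_superkey(candidate, schema, fds):
                keys.append(candidate)
    return keys


# Hypothetical schema with two candidate keys.
schema = {"state", "license_no", "serial_no", "make"}
fds = [
    (("state", "license_no"), ("serial_no", "make")),
    (("serial_no",), ("state", "license_no", "make")),
]

keys = candidate_keys(schema, fds)
```

Either of the two keys found here could be chosen as the primary key; the other would then be a secondary key.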
Second Normal Form
• Full functional dependency vs. partial functional dependency
  • X → Y is a full functional dependency if for any attribute A in X,
    (X - {A}) does not functionally determine Y
  • X → Y is a partial functional dependency if for some attribute A in X,
    (X - {A}) functionally determines Y
• Second normal form (2NF)
• Problematic FD
  • Left-hand side is part of primary key
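A rough way to spot the problematic FDs is to take every proper subset of the primary key and see which non-key attributes it already determines. The sketch considers only the given primary key, not other candidate keys; the EMP_PROJ-style attribute names and the function name partial_dependencies are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure: everything functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def partial_dependencies(schema, primary_key, fds):
    """List (part of key, attributes it determines) for every proper subset
    of the primary key that determines some attribute outside the key."""
    key = set(primary_key)
    nonkey = set(schema) - key
    found = []
    for size in range(1, len(key)):
        for part in combinations(sorted(key), size):
            determined = closure(part, fds) & nonkey
            if determined:
                found.append((set(part), determined))
    return found


# Assumed schema and FDs for illustration.
schema = {"ssn", "pnumber", "hours", "ename", "pname", "plocation"}
primary_key = {"ssn", "pnumber"}
fds = [
    (("ssn", "pnumber"), ("hours",)),
    (("ssn",), ("ename",)),
    (("pnumber",), ("pname", "plocation")),
]

partial = partial_dependencies(schema, primary_key, fds)
```

Only hours depends on the whole key; the attributes reported in partial are the ones a 2NF decomposition would move into relations keyed by ssn and by pnumber.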
Third Normal Form
• Transitive dependency
  • X → Y is a transitive dependency if for some Z that is not a prime
    attribute, both X → Z and Z → Y hold
• Third normal form
• Problematic FD
  • Left-hand side is part of primary key
  • Left-hand side is a nonkey attribute
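One way to test for 3NF, assuming the common general definition (for every nontrivial FD X → A, either X is a superkey or A is a prime attribute), is sketched below. It inspects only the FDs that are listed, and the EMP_DEPT-style schema and function names are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure: everything functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(schema, fds):
    """Minimal superkeys by brute force (fine for small schemas only)."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            candidate = set(combo)
            if any(k <= candidate for k in keys):
                continue
            if closure(candidate, fds) >= set(schema):
                keys.append(candidate)
    return keys


def violations_3nf(schema, fds):
    """Return (X, A) for each listed nontrivial FD X -> A where X is not a
    superkey and A is not a prime attribute."""
    prime = set().union(*candidate_keys(schema, fds))
    bad = []
    for lhs, rhs in fds:
        lhs_is_superkey = closure(lhs, fds) >= set(schema)
        for a in sorted(set(rhs) - set(lhs)):  # skip trivial parts
            if not lhs_is_superkey and a not in prime:
                bad.append((set(lhs), a))
    return bad


# Assumed schema: ssn -> dnumber -> dname is a transitive dependency.
schema = {"ssn", "ename", "dnumber", "dname", "dmgr_ssn"}
fds = [
    (("ssn",), ("ename", "dnumber")),
    (("dnumber",), ("dname", "dmgr_ssn")),
]

bad = violations_3nf(schema, fds)
```

Both violations have the nonkey attribute dnumber on the left-hand side, which matches the second kind of problematic FD listed above.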
Boyce-Codd Normal Form
• BCNF
• Difference from 3NF:
  • 3NF allows A to be prime in a nontrivial FD X → A whose left-hand side X
    is not a superkey; BCNF does not
• Every relation in BCNF is also in 3NF
• Most relation schemas that are in 3NF are also in BCNF, but not all
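A schema that is in 3NF but not in BCNF can be checked with the sketch below. It assumes the usual definitions (BCNF: the left-hand side of every nontrivial FD is a superkey; 3NF additionally tolerates a prime right-hand side) and inspects only the listed FDs. The TEACH-style schema and the function names are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure: everything functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(schema, fds):
    """Minimal superkeys by brute force (fine for small schemas only)."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            candidate = set(combo)
            if any(k <= candidate for k in keys):
                continue
            if closure(candidate, fds) >= set(schema):
                keys.append(candidate)
    return keys


def violations(schema, fds, allow_prime_rhs):
    """Listed nontrivial FDs X -> A whose left side is not a superkey.
    With allow_prime_rhs=True (3NF), a prime A is tolerated;
    with allow_prime_rhs=False (BCNF), it is not."""
    prime = set().union(*candidate_keys(schema, fds))
    bad = []
    for lhs, rhs in fds:
        if closure(lhs, fds) >= set(schema):
            continue  # left-hand side is a superkey
        for a in sorted(set(rhs) - set(lhs)):
            if allow_prime_rhs and a in prime:
                continue
            bad.append((set(lhs), a))
    return bad


# Assumed schema: each instructor teaches exactly one course.
schema = {"student", "course", "instructor"}
fds = [
    (("student", "course"), ("instructor",)),
    (("instructor",), ("course",)),
]

keys = candidate_keys(schema, fds)
bad_3nf = violations(schema, fds, allow_prime_rhs=True)
bad_bcnf = violations(schema, fds, allow_prime_rhs=False)
```

Every attribute is prime here, so the 3NF check reports nothing, while instructor → course fails the BCNF check because instructor alone is not a superkey.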