db03 normalization full
TRANSCRIPT
-
8/13/2019 DB03 Normalization Full
1/27
5
Chapter 5
Normalization of Database Tables
Database Systems: Design, Implementation, and
Management, Rob and Coronel
Special adaptation for INFS-3200
-
Database Tables and Normalization
Table is the basic building block in database design
Normalization is the process for assigning attributes to entities. Why:
Reduces data redundancies
Helps eliminate data anomalies
Produces controlled redundancies to link tables and, therefore, establish relationships
GENERAL GUIDELINES:
Define business rules
Define level of detail (granularity): details & aggregates
Each table must represent one and only one subject
All attributes in the table must be fully dependent on the PK, the entire PK, and nothing but the PK.
-
Database Tables and Normalization
Normalization stages
1NF - First normal form
Put data in table format.
Eliminate repeating groups.
Select a suitable primary key.
2NF - Second normal form
Eliminate partial dependencies.
3NF - Third normal form
Eliminate transitive dependencies.
BCNF - Boyce-Codd normal form
Every determinant in the table is a candidate key.
4NF - Fourth normal form
Eliminate independent multivalued sets of facts.
-
Database Tables and Normalization
Original Report Data
-
Sample Data for Project Report
Report Data in Table Format (incomplete)
Observations
PROJ_NUM intended to be primary key
Table entries invite data inconsistencies
Table tends to create data anomalies:
Update: modifying JOB_CLASS for an employee may require changes in many rows
Insertion: a new employee must be assigned to a project
Deletion: if an employee is deleted, other vital data is lost
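The three anomalies can be reproduced with a small sketch. The rows below are hypothetical sample values (not the figures from the original report), held in one flat list the way the report table is laid out.

```python
# Hypothetical rows in the flat report layout; values are illustrative only.
report = [
    {"PROJ_NUM": 1, "PROJ_NAME": "Alpha", "EMP_NUM": 101, "EMP_NAME": "Ann Lee",
     "JOB_CLASS": "Programmer", "CHG_HOUR": 35.75, "HOURS": 10.0},
    {"PROJ_NUM": 2, "PROJ_NAME": "Beta", "EMP_NUM": 101, "EMP_NAME": "Ann Lee",
     "JOB_CLASS": "Programmer", "CHG_HOUR": 35.75, "HOURS": 4.5},
    {"PROJ_NUM": 2, "PROJ_NAME": "Beta", "EMP_NUM": 102, "EMP_NAME": "Bo Cruz",
     "JOB_CLASS": "Analyst", "CHG_HOUR": 48.10, "HOURS": 7.0},
]

# Update anomaly: changing employee 101's job class in only one row
# leaves two conflicting job classes for the same employee.
report[0]["JOB_CLASS"] = "Senior Programmer"
classes_for_101 = {r["JOB_CLASS"] for r in report if r["EMP_NUM"] == 101}

# Deletion anomaly: removing employee 102's only row also removes
# the only record of the Analyst job class and its rate.
remaining = [r for r in report if r["EMP_NUM"] != 102]
analyst_rate_known = any(r["JOB_CLASS"] == "Analyst" for r in remaining)

# Insertion anomaly: a new employee with no project has no PROJ_NUM,
# so the row cannot be keyed without inventing a placeholder project.
new_hire = {"PROJ_NUM": None, "EMP_NUM": 103, "EMP_NAME": "Cy Diaz"}
can_insert = new_hire["PROJ_NUM"] is not None

print(classes_for_101, analyst_rate_known, can_insert)
```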
-
Conversion to 1NF
Repeating groups must be eliminated (duplicated column types/multi-valued columns)
Tabular format
Each cell has a single value - No repeating groups
In our case: PROJ_NUM, EMP_NUM, etc.
Proper primary key developed
Uniquely determines (identifies) attribute values (in each row)
In our case: combination of PROJ_NUM and EMP_NUM
Identify Dependencies
Desirable dependencies based on primary key
Less desirable dependencies
Partial
based on part of composite primary key
Transitive
one nonprime attribute depends on another nonprime attribute
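As a minimal sketch of the 1NF conversion, the snippet below flattens a repeating group into one row per project/employee pair and then checks which attribute combination identifies each row. The nested layout, the sample values, and the helper names `to_1nf` and `is_key` are assumptions made for illustration.

```python
# Hypothetical report data with a repeating group: each project carries
# a nested list of (EMP_NUM, EMP_NAME, HOURS) entries.
nested = [
    {"PROJ_NUM": 1, "PROJ_NAME": "Alpha",
     "EMPLOYEES": [(101, "Ann Lee", 10.0), (102, "Bo Cruz", 7.0)]},
    {"PROJ_NUM": 2, "PROJ_NAME": "Beta",
     "EMPLOYEES": [(101, "Ann Lee", 4.5)]},
]

def to_1nf(projects):
    """Emit one flat row per (project, employee) pair, one value per cell."""
    rows = []
    for p in projects:
        for emp_num, emp_name, hours in p["EMPLOYEES"]:
            rows.append({"PROJ_NUM": p["PROJ_NUM"], "PROJ_NAME": p["PROJ_NAME"],
                         "EMP_NUM": emp_num, "EMP_NAME": emp_name, "HOURS": hours})
    return rows

def is_key(rows, attrs):
    """True if the attribute combination is unique across all rows."""
    seen = {tuple(r[a] for a in attrs) for r in rows}
    return len(seen) == len(rows)

flat = to_1nf(nested)
print(is_key(flat, ["PROJ_NUM"]))             # PROJ_NUM alone repeats
print(is_key(flat, ["PROJ_NUM", "EMP_NUM"]))  # the composite identifies each row
```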
-
Data Organization: 1NF
-
Dependency Diagram (1NF)
[Diagram: PROJ_NUM, EMP_NUM, PROJ_NAME, EMP_NAME, JOB_CLASS, CHG_HOUR, HOURS, with arrows marking two partial dependencies and one transitive dependency]
1NF (PROJ_NUM, EMP_NUM, PROJ_NAME, EMP_NAME, JOB_CLASS, CHG_HOUR, HOURS)
Partial Dependencies:
(PROJ_NUM -> PROJ_NAME)
(EMP_NUM -> EMP_NAME, JOB_CLASS, CHG_HOUR)
Transitive Dependencies:
(JOB_CLASS -> CHG_HOUR)
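The labels in the dependency diagram can be derived mechanically from the determinant of each dependency. The sketch below is a simplified classifier, not a general normalization algorithm: it assumes a single candidate key, and the function name `classify` is hypothetical. The dependency of HOURS on the full key is read from the diagram rather than from the listed dependencies.

```python
PRIMARY_KEY = {"PROJ_NUM", "EMP_NUM"}

# Functional dependencies: determinant -> dependents.
FDS = [
    ({"PROJ_NUM"}, {"PROJ_NAME"}),
    ({"EMP_NUM"}, {"EMP_NAME", "JOB_CLASS", "CHG_HOUR"}),
    ({"JOB_CLASS"}, {"CHG_HOUR"}),
    ({"PROJ_NUM", "EMP_NUM"}, {"HOURS"}),
]

def classify(determinant, key):
    """Label a dependency by how its determinant relates to the primary key."""
    if determinant == key:
        return "full"          # depends on the entire PK (desirable)
    if determinant < key:
        return "partial"       # depends on a proper subset of a composite PK
    if not (determinant & key):
        return "transitive"    # determinant consists of nonprime attributes
    return "other"

labels = {tuple(sorted(d)): classify(d, PRIMARY_KEY) for d, _ in FDS}
for det, label in labels.items():
    print(det, "->", label)
```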
-
1NF Summarized
Data in a tabular format
No repeating groups in table (repeated attributes or multi-valued attributes)
Primary key attribute(s) identified
All dependent attributes depend on the primary key
Identify all partial and transitive dependencies
Partial
Attributes that depend on part of PK
Can only exist if the table has a composite PK
Transitive
Non-PK attribute(s) determine other attributes
JOB_CLASS determines CHG_HOUR
JOB_CLASS is the determinant attribute
CHG_HOUR is the dependent attribute
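Whether a determinant really determines a dependent attribute can be tested against sample rows. The helper `holds` and the rows below are hypothetical. Note that data can only refute a dependency: a passing check is consistent with the business rule, but does not prove it.

```python
def holds(rows, determinant, dependent):
    """True if each determinant value maps to exactly one dependent value."""
    seen = {}
    for r in rows:
        lhs = tuple(r[a] for a in determinant)
        rhs = tuple(r[a] for a in dependent)
        # setdefault stores the first rhs seen for this lhs and returns it after.
        if seen.setdefault(lhs, rhs) != rhs:
            return False
    return True

# Hypothetical employee rows.
rows = [
    {"EMP_NUM": 101, "JOB_CLASS": "Programmer", "CHG_HOUR": 35.75},
    {"EMP_NUM": 102, "JOB_CLASS": "Analyst", "CHG_HOUR": 48.10},
    {"EMP_NUM": 103, "JOB_CLASS": "Programmer", "CHG_HOUR": 35.75},
]

job_determines_rate = holds(rows, ["JOB_CLASS"], ["CHG_HOUR"])
rate_determines_emp = holds(rows, ["CHG_HOUR"], ["EMP_NUM"])
print(job_determines_rate, rate_determines_emp)
```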
-
Conversion to 2NF
Start with 1NF format:
Write each key component on a separate line
Write original key on last line
Each component is a new table
Write dependent attributes after each key
PROJECT (PROJ_NUM, PROJ_NAME)
EMPLOYEE (EMP_NUM, EMP_NAME, JOB_CLASS, CHG_HOUR)
ASSIGN (PROJ_NUM, EMP_NUM, ASSIGN_HOURS)
Transitive Dependencies:
(JOB_CLASS -> CHG_HOUR)
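The 2NF split amounts to projecting the 1NF rows onto each key component plus its dependent attributes. The sketch below assumes hypothetical sample rows and a hypothetical helper `project`; the transitive dependency JOB_CLASS -> CHG_HOUR is still present in EMPLOYEE afterwards.

```python
def project(rows, attrs):
    """Project rows onto attrs, dropping duplicate tuples (order preserved)."""
    out, seen = [], set()
    for r in rows:
        t = tuple(r[a] for a in attrs)
        if t not in seen:
            seen.add(t)
            out.append(dict(zip(attrs, t)))
    return out

# Hypothetical 1NF rows keyed by (PROJ_NUM, EMP_NUM).
flat = [
    {"PROJ_NUM": 1, "PROJ_NAME": "Alpha", "EMP_NUM": 101, "EMP_NAME": "Ann Lee",
     "JOB_CLASS": "Programmer", "CHG_HOUR": 35.75, "HOURS": 10.0},
    {"PROJ_NUM": 2, "PROJ_NAME": "Beta", "EMP_NUM": 101, "EMP_NAME": "Ann Lee",
     "JOB_CLASS": "Programmer", "CHG_HOUR": 35.75, "HOURS": 4.5},
    {"PROJ_NUM": 2, "PROJ_NAME": "Beta", "EMP_NUM": 102, "EMP_NAME": "Bo Cruz",
     "JOB_CLASS": "Analyst", "CHG_HOUR": 48.10, "HOURS": 7.0},
]

# One table per key component, plus one for the original composite key.
PROJECT = project(flat, ["PROJ_NUM", "PROJ_NAME"])
EMPLOYEE = project(flat, ["EMP_NUM", "EMP_NAME", "JOB_CLASS", "CHG_HOUR"])
ASSIGN = [{"PROJ_NUM": r["PROJ_NUM"], "EMP_NUM": r["EMP_NUM"],
           "ASSIGN_HOURS": r["HOURS"]} for r in flat]

print(len(PROJECT), len(EMPLOYEE), len(ASSIGN))
```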
-
2NF Conversion Results
Table name: PROJECT
PROJ_NUM PROJ_NAME
Table name: EMPLOYEE
EMP_NUM EMP_NAME JOB_CLASS CHG_HOUR
Transitive dependency: JOB_CLASS -> CHG_HOUR
Table name: ASSIGN
PROJ_NUM EMP_NUM ASSIGN_HOURS
-
2NF Summarized
In 1NF
Includes no partial dependencies
No attribute dependent on a portion of primary key
Still possible to exhibit transitive dependency
Attributes may be functionally dependent on non-key attributes
-
Conversion to 3NF
Create separate table(s) to eliminate transitive functional dependencies
For every transitive dependency:
Write determinant as PK of new table
Write all dependent attributes for each determinant
Delete the dependent attribute(s) from the original table(s)
Leave the determinant attribute(s) in the original table(s)
PROJECT (PROJ_NUM, PROJ_NAME)
ASSIGN (PROJ_NUM, EMP_NUM, ASSIGN_HOURS)
EMPLOYEE (EMP_NUM, EMP_NAME, JOB_CLASS)
JOB (JOB_CLASS, CHG_HOUR)
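The four 3NF relations can be sketched as an SQLite schema using Python's standard `sqlite3` module. Column types, NOT NULL constraints, and the sample rows are assumptions; the slides specify only the attribute lists. With CHG_HOUR stored once per job class, a single UPDATE changes the rate for every assignment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE JOB (
    JOB_CLASS TEXT PRIMARY KEY,
    CHG_HOUR  REAL NOT NULL
);
CREATE TABLE EMPLOYEE (
    EMP_NUM   INTEGER PRIMARY KEY,
    EMP_NAME  TEXT NOT NULL,
    JOB_CLASS TEXT NOT NULL REFERENCES JOB (JOB_CLASS)
);
CREATE TABLE PROJECT (
    PROJ_NUM  INTEGER PRIMARY KEY,
    PROJ_NAME TEXT NOT NULL
);
CREATE TABLE ASSIGN (
    PROJ_NUM     INTEGER NOT NULL REFERENCES PROJECT (PROJ_NUM),
    EMP_NUM      INTEGER NOT NULL REFERENCES EMPLOYEE (EMP_NUM),
    ASSIGN_HOURS REAL NOT NULL,
    PRIMARY KEY (PROJ_NUM, EMP_NUM)
);
-- Hypothetical sample rows.
INSERT INTO JOB VALUES ('Programmer', 35.75), ('Analyst', 48.10);
INSERT INTO EMPLOYEE VALUES (101, 'Ann Lee', 'Programmer'), (102, 'Bo Cruz', 'Analyst');
INSERT INTO PROJECT VALUES (1, 'Alpha'), (2, 'Beta');
INSERT INTO ASSIGN VALUES (1, 101, 10.0), (2, 101, 4.5), (2, 102, 7.0);
""")

# The rate lives in exactly one row, so one UPDATE changes it everywhere.
conn.execute("UPDATE JOB SET CHG_HOUR = 37.00 WHERE JOB_CLASS = 'Programmer'")
rates = conn.execute("""
    SELECT a.PROJ_NUM, e.EMP_NUM, j.CHG_HOUR
    FROM ASSIGN a
    JOIN EMPLOYEE e ON e.EMP_NUM = a.EMP_NUM
    JOIN JOB j ON j.JOB_CLASS = e.JOB_CLASS
    ORDER BY a.PROJ_NUM, e.EMP_NUM
""").fetchall()
print(rates)
```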
-
Conversion to 3NF
Table name: PROJECT
PROJ_NUM PROJ_NAME
Table name: EMPLOYEE
EMP_NUM EMP_NAME JOB_CLASS
Table name: JOB
JOB_CLASS CHG_HOUR
Table name: ASSIGN
PROJ_NUM EMP_NUM ASSIGN_HOURS
-
3NF Summarized
In 2NF
Contains no transitive dependencies
-
Improving the Design
1. Evaluate PK Assignment (meet PK guidelines)
2. Evaluate Naming Conventions
3. Refine Attribute Atomicity (simple, single-valued)
4. Identify New Attributes
5. Identify New Relationships (decompose M:M)
6. Refine Primary Keys (as required for data granularity)
7. Maintain Historical Transactional Accuracy
8. Identify Use of Derived Attributes
-
Improving the Design
(Job Table)
Table name: JOB
JOB_CODE JOB_DESCRIPTION JOB_CHG_HOUR
1. Evaluate PK Assignment
Introduce a better-suited PK free of semantic content (non-intelligent PK)
Add JOB_CODE as surrogate PK.
To reduce data entry errors
Repeat for other tables (see #6, Assign)
2. Evaluate Naming Conventions
JOB_CLASS is actually a description of the job; change to JOB_DESCRIPTION.
CHG_HOUR should be JOB_CHG_HOUR
-
Improving the Design
(Employee Table)
3. Refine Attribute Atomicity
Decompose composite attributes into simple attributes
EMP_NAME should be split into EMP_LNAME, EMP_FNAME, and EMP_INITIAL
4. Identify New Attributes
Add new attributes that describe real-world entity characteristics
EMP_HIREDATE
Table name: EMPLOYEE
EMP_NUM EMP_LNAME EMP_FNAME EMP_INITIAL EMP_HIREDATE JOB_CODE
-
Improving the Design
(Project Table)
5. Identify New Relationships
Add new relationships as required by business rules.
A project is managed by an employee; an employee can be the manager of only one project.
Add EMP_NUM as FK in PROJECT
Table name: PROJECT
PROJ_NUM PROJ_NAME EMP_NUM
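One way to enforce the manager rule is a foreign key with a UNIQUE constraint, sketched below in SQLite via Python's `sqlite3`. Using UNIQUE for "manager of only one project" is an assumption about how the rule might be implemented, not something the slides prescribe; the sample rows and the helper `try_insert` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE EMPLOYEE (EMP_NUM INTEGER PRIMARY KEY, EMP_LNAME TEXT NOT NULL);
CREATE TABLE PROJECT (
    PROJ_NUM  INTEGER PRIMARY KEY,
    PROJ_NAME TEXT NOT NULL,
    -- FK to the managing employee; UNIQUE allows each employee to manage one project.
    EMP_NUM   INTEGER UNIQUE REFERENCES EMPLOYEE (EMP_NUM)
);
INSERT INTO EMPLOYEE VALUES (101, 'Lee'), (102, 'Cruz');
INSERT INTO PROJECT VALUES (1, 'Alpha', 101);
""")

def try_insert(row):
    """Return True if the PROJECT row is accepted, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO PROJECT VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

second_project_same_manager = try_insert((2, "Beta", 101))  # violates UNIQUE
project_unknown_manager = try_insert((3, "Gamma", 999))     # violates the FK
project_new_manager = try_insert((4, "Delta", 102))         # accepted
print(second_project_same_manager, project_unknown_manager, project_new_manager)
```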
-
Improving the Design
(Assign Table)
4. Identify New Attributes
Add ASSIGN_DATE
6. Refine Primary Keys
Consider the granularity of the data being represented in order to determine the PK.
Can an employee have multiple hours-worked entries for a given day in a given project?
If yes, add ASSIGN_NUM as surrogate PK.
7. Maintain Historical Transactional Accuracy
Add ASSIGN_CHG_HOUR
8. Identify Use of Derived Attributes
Add ASSIGN_CHARGE
Table name: ASSIGN
ASSIGN_NUM ASSIGN_DATE PROJ_NUM EMP_NUM ASSIGN_HOURS ASSIGN_CHG_HOUR ASSIGN_CHARGE
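The reason for copying the rate into ASSIGN can be shown with a short sketch. It assumes ASSIGN_CHARGE is computed as ASSIGN_HOURS times ASSIGN_CHG_HOUR; the function `record_assignment`, the rate lookup, and all values are hypothetical.

```python
from dataclasses import dataclass
from datetime import date

# Current rate per JOB_CODE; hypothetical values.
job_chg_hour = {511: 35.75}

@dataclass(frozen=True)
class Assign:
    assign_num: int          # surrogate PK
    assign_date: date
    proj_num: int
    emp_num: int
    assign_hours: float
    assign_chg_hour: float   # rate copied at the time of the assignment
    assign_charge: float     # derived: hours * rate, stored for reporting

def record_assignment(assign_num, assign_date, proj_num, emp_num, job_code, hours):
    """Snapshot the current rate so later rate changes do not rewrite history."""
    rate = job_chg_hour[job_code]
    return Assign(assign_num, assign_date, proj_num, emp_num, hours,
                  rate, round(hours * rate, 2))

first = record_assignment(1001, date(2019, 8, 12), 1, 101, 511, 4.0)
job_chg_hour[511] = 40.00   # the job's rate changes afterwards
second = record_assignment(1002, date(2019, 8, 13), 1, 101, 511, 4.0)

print(first.assign_charge, second.assign_charge)
```

Because each row carries its own ASSIGN_NUM, the same employee can also log several entries for the same project without a key collision.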
-
Limitations of System Assigned PK
Surrogate PK ensures that each row has a unique ID, not that the row's dependent values are unique.
JOB_CODE is a system-assigned PK
We still could have duplicate values:
511 Programmer 35.75
512 Programmer 35.75
Clearly, entries are duplicated!
To ensure unique values we must create a unique index on all candidate keys.
Unique index on JOB_DESCRIPTION
This will still not avoid data entry errors!
513 Progranmer 35.75
Table name: JOB
JOB_CODE JOB_DESCRIPTION JOB_CHG_HOUR
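Both points can be checked in SQLite through Python's `sqlite3`: a unique index on JOB_DESCRIPTION rejects the duplicate row 512 but accepts the misspelled row 513. The index name `JOB_DESCRIPTION_UQ`, the column types, and the helper `try_insert` are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE JOB (
    JOB_CODE        INTEGER PRIMARY KEY,
    JOB_DESCRIPTION TEXT NOT NULL,
    JOB_CHG_HOUR    REAL NOT NULL
);
CREATE UNIQUE INDEX JOB_DESCRIPTION_UQ ON JOB (JOB_DESCRIPTION);
INSERT INTO JOB VALUES (511, 'Programmer', 35.75);
""")

def try_insert(row):
    """Return True if the JOB row is accepted, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO JOB VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

duplicate_accepted = try_insert((512, "Programmer", 35.75))  # blocked by the index
typo_accepted = try_insert((513, "Progranmer", 35.75))       # a misspelling is still unique
print(duplicate_accepted, typo_accepted)
```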
-
Normalization and Database Design
First, business rules must be determined
Determine the granularity of the data in each entity.
Normalization should be part of the design process
E-R Diagram provides macro view (conceptual)
Normalization provides micro view of entities (logical)
Focuses on characteristics of specific entities
May yield additional entities/relationships
Difficult to separate normalization from E-R diagramming; the two are complementary
-
Initial ERD for Contracting Company
1. A company has many projects.
2. Each project requires the services of many employees. An employee may be assigned to several different projects.
3. Some employees are not assigned to a project.
4. Each employee has a single primary job classification. Many employees can have the same job classification.
5. The job classification determines the hourly billing rate.
-
Modified ERD for
Contracting Company
Figure 4.11
-
Final ERD for
Contracting Company
Figure 4.12
-
Denormalization
Normalization is one of many database design goals
Normalization creates many small tables with PK/FK
Reporting over normalized tables requires multiple table joins to get complete data:
Additional processing (join operations)
Additional I/O operations
Design must find right balance among:
Design integrity requirements
Information requirements
Performance requirements
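The trade-off can be made concrete with a sketch in SQLite via Python's `sqlite3`. A denormalized reporting table (the name `PROJECT_REPORT` and all sample rows are hypothetical) pays for the three joins once, when it is built, but stores the job rate redundantly again.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE JOB (JOB_CLASS TEXT PRIMARY KEY, CHG_HOUR REAL);
CREATE TABLE EMPLOYEE (EMP_NUM INTEGER PRIMARY KEY, EMP_NAME TEXT, JOB_CLASS TEXT);
CREATE TABLE PROJECT (PROJ_NUM INTEGER PRIMARY KEY, PROJ_NAME TEXT);
CREATE TABLE ASSIGN (PROJ_NUM INTEGER, EMP_NUM INTEGER, ASSIGN_HOURS REAL,
                     PRIMARY KEY (PROJ_NUM, EMP_NUM));
-- Hypothetical sample rows.
INSERT INTO JOB VALUES ('Programmer', 35.75), ('Analyst', 48.10);
INSERT INTO EMPLOYEE VALUES (101, 'Ann Lee', 'Programmer'), (102, 'Bo Cruz', 'Analyst');
INSERT INTO PROJECT VALUES (1, 'Alpha'), (2, 'Beta');
INSERT INTO ASSIGN VALUES (1, 101, 10.0), (2, 101, 4.5), (2, 102, 7.0);

-- Denormalized copy for reporting: the joins run once, at build time.
CREATE TABLE PROJECT_REPORT AS
SELECT p.PROJ_NUM, p.PROJ_NAME, e.EMP_NUM, e.EMP_NAME,
       j.JOB_CLASS, j.CHG_HOUR, a.ASSIGN_HOURS
FROM ASSIGN a
JOIN PROJECT p  ON p.PROJ_NUM = a.PROJ_NUM
JOIN EMPLOYEE e ON e.EMP_NUM = a.EMP_NUM
JOIN JOB j      ON j.JOB_CLASS = e.JOB_CLASS;
""")

report_rows = conn.execute("SELECT COUNT(*) FROM PROJECT_REPORT").fetchone()[0]
# Redundancy returns: the Programmer rate is stored once per assignment row.
programmer_rate_copies = conn.execute(
    "SELECT COUNT(*) FROM PROJECT_REPORT WHERE JOB_CLASS = 'Programmer'"
).fetchone()[0]
print(report_rows, programmer_rate_copies)
```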
-
Unnormalized Table Defects
Data updates less efficient
Normalization ensures that data is updated (insert/update/delete) only once, in one place.
Indexing more cumbersome
No simple strategies for creating views