db03 normalization full

Upload: karelle-bhoorasingh

Post on 04-Jun-2018


  • 8/13/2019 DB03 Normalization Full

    1/27

    5

    Chapter 5

    Normalization of Database Tables

    Database Systems: Design, Implementation, and Management, Rob and Coronel

    Special adaptation for INFS-3200


    Database Tables and Normalization

    A table is the basic building block in database design.

    Normalization is the process of assigning attributes to entities. Why normalize:

        Reduces data redundancies

        Helps eliminate data anomalies

        Produces controlled redundancies to link tables and thereby establish relationships

    GENERAL GUIDELINES:

        Define business rules

        Define the level of detail (granularity): details and aggregates

        Each table must represent one and only one subject

        All attributes in a table must be fully dependent on the PK, the entire PK, and nothing but the PK


    Database Tables and Normalization

    Normalization stages:

    1NF - First normal form

        Put data in table format.

        Eliminate repeating groups.

        Select a suitable primary key.

    2NF - Second normal form

        Eliminate partial dependencies.

    3NF - Third normal form

        Eliminate transitive dependencies.

    BCNF - Boyce-Codd normal form

        Every determinant in the table is a candidate key.

    4NF - Fourth normal form

        Eliminate independent multivalued sets of facts.


    Database Tables and Normalization

    Original Report Data


    Sample Data for Project Report

    Report Data in Table Format (incomplete)

    Observations

        PROJ_NUM is intended to be the primary key

        Table entries invite data inconsistencies

        The table tends to create data anomalies:

            Update: modifying an employee's JOB_CLASS may require changes in many rows

            Insertion: a new employee must be assigned to a project before being entered

            Deletion: if an employee is deleted, other vital data is lost
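The three anomalies can be reproduced on a small scale. The sketch below uses Python's built-in sqlite3 module and assumes a flat table keyed on PROJ_NUM plus EMP_NUM; the table name `report` and every row value are invented for illustration and are not the slide's sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE report (
        PROJ_NUM  INTEGER NOT NULL,
        PROJ_NAME TEXT,
        EMP_NUM   INTEGER NOT NULL,
        EMP_NAME  TEXT,
        JOB_CLASS TEXT,
        CHG_HOUR  REAL,
        HOURS     REAL,
        PRIMARY KEY (PROJ_NUM, EMP_NUM)
    )
""")
# Hypothetical rows: Ann works on two projects, Bob on one.
conn.executemany(
    "INSERT INTO report VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "Alpha", 101, "Ann", "Programmer", 35.75, 10.0),
        (2, "Beta", 101, "Ann", "Programmer", 35.75, 4.0),
        (2, "Beta", 102, "Bob", "Analyst", 48.10, 7.5),
    ],
)

# Update anomaly: changing Ann's job class in one row only leaves two versions.
conn.execute(
    "UPDATE report SET JOB_CLASS = 'Senior Programmer' "
    "WHERE EMP_NUM = 101 AND PROJ_NUM = 1"
)
job_classes_101 = {
    row[0]
    for row in conn.execute("SELECT JOB_CLASS FROM report WHERE EMP_NUM = 101")
}

# Insertion anomaly: an employee with no project has no PROJ_NUM to store.
try:
    conn.execute(
        "INSERT INTO report (EMP_NUM, EMP_NAME, JOB_CLASS, CHG_HOUR) "
        "VALUES (103, 'Cy', 'Analyst', 48.10)"
    )
    insert_rejected = False
except sqlite3.IntegrityError:
    insert_rejected = True

# Deletion anomaly: deleting Bob's only row also erases the Analyst rate.
conn.execute("DELETE FROM report WHERE EMP_NUM = 102")
analyst_rows = conn.execute(
    "SELECT COUNT(*) FROM report WHERE JOB_CLASS = 'Analyst'"
).fetchone()[0]
```

After the script runs, `job_classes_101` holds two conflicting job classes for the same employee, `insert_rejected` is true, and `analyst_rows` is zero.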


    Conversion to 1NF

    Repeating groups must be eliminated (duplicated column types or multivalued columns)

        Tabular format

        Each cell has a single value; no repeating groups

        In our case: PROJ_NUM, EMP_NUM, etc.

    A proper primary key must be developed

        Uniquely determines (identifies) the attribute values in each row

        In our case: the combination of PROJ_NUM and EMP_NUM

    Identify dependencies

        Desirable dependencies are based on the primary key

        Less desirable dependencies:

            Partial: based on part of a composite primary key

            Transitive: one nonprime attribute depends on another nonprime attribute
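As a rough illustration of the 1NF step, the sketch below flattens a report that stores employees as a repeating group under each project, then checks that PROJ_NUM plus EMP_NUM identifies every row. The nested layout, the function name `to_1nf`, and the data values are assumptions made for this example.

```python
# Hypothetical report: one entry per project, with a repeating group of employees.
report = [
    {
        "PROJ_NUM": 1,
        "PROJ_NAME": "Alpha",
        "employees": [
            {"EMP_NUM": 101, "EMP_NAME": "Ann", "JOB_CLASS": "Programmer",
             "CHG_HOUR": 35.75, "HOURS": 10.0},
            {"EMP_NUM": 102, "EMP_NAME": "Bob", "JOB_CLASS": "Analyst",
             "CHG_HOUR": 48.10, "HOURS": 7.5},
        ],
    },
    {
        "PROJ_NUM": 2,
        "PROJ_NAME": "Beta",
        "employees": [
            {"EMP_NUM": 101, "EMP_NAME": "Ann", "JOB_CLASS": "Programmer",
             "CHG_HOUR": 35.75, "HOURS": 4.0},
        ],
    },
]


def to_1nf(nested_report):
    """Emit one flat row per (project, employee) pair, so every cell is single-valued."""
    rows = []
    for project in nested_report:
        for employee in project["employees"]:
            rows.append({
                "PROJ_NUM": project["PROJ_NUM"],
                "PROJ_NAME": project["PROJ_NAME"],
                **employee,
            })
    return rows


rows_1nf = to_1nf(report)

# The composite key must identify each row uniquely.
keys = [(row["PROJ_NUM"], row["EMP_NUM"]) for row in rows_1nf]
key_is_unique = len(keys) == len(set(keys))
```

The project name is now repeated on every row for that project, which is exactly the redundancy that the later normal forms remove.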


    Data Organization: 1NF


    Dependency Diagram (1NF)

    [Diagram: PROJ_NUM and EMP_NUM form the composite primary key. Partial dependencies run from PROJ_NUM to PROJ_NAME and from EMP_NUM to EMP_NAME, JOB_CLASS, and CHG_HOUR. A transitive dependency runs from JOB_CLASS to CHG_HOUR.]

    1NF (PROJ_NUM, EMP_NUM, PROJ_NAME, EMP_NAME, JOB_CLASS, CHG_HOUR, HOURS)

    Partial Dependencies:

        (PROJ_NUM -> PROJ_NAME)

        (EMP_NUM -> EMP_NAME, JOB_CLASS, CHG_HOUR)

    Transitive Dependencies:

        (JOB_CLASS -> CHG_HOUR)
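A dependency such as JOB_CLASS -> CHG_HOUR can be tested against sample rows. The helper below is a minimal sketch: the name `holds` and the data are hypothetical, and a passing check only means the sample does not contradict the dependency. Whether it really holds is decided by the business rules.

```python
COLUMNS = ("PROJ_NUM", "PROJ_NAME", "EMP_NUM", "EMP_NAME",
           "JOB_CLASS", "CHG_HOUR", "HOURS")

# Hypothetical 1NF rows, one per (PROJ_NUM, EMP_NUM) pair.
DATA = [
    (1, "Alpha", 101, "Ann", "Programmer", 35.75, 10.0),
    (1, "Alpha", 102, "Bob", "Analyst", 48.10, 7.5),
    (2, "Beta", 101, "Ann", "Programmer", 35.75, 4.0),
]
rows = [dict(zip(COLUMNS, values)) for values in DATA]


def holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        left = tuple(row[name] for name in lhs)
        right = tuple(row[name] for name in rhs)
        # setdefault stores the first right-hand value seen for this left-hand value.
        if seen.setdefault(left, right) != right:
            return False
    return True


partial_project = holds(rows, ["PROJ_NUM"], ["PROJ_NAME"])
partial_employee = holds(rows, ["EMP_NUM"], ["EMP_NAME", "JOB_CLASS", "CHG_HOUR"])
transitive_job = holds(rows, ["JOB_CLASS"], ["CHG_HOUR"])
# HOURS needs the whole key: neither key component determines it alone.
hours_by_project = holds(rows, ["PROJ_NUM"], ["HOURS"])
hours_by_employee = holds(rows, ["EMP_NUM"], ["HOURS"])
hours_by_full_key = holds(rows, ["PROJ_NUM", "EMP_NUM"], ["HOURS"])
```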


    1NF Summarized

    Data in a tabular format

    No repeating groups in the table (repeated attributes or multivalued attributes)

    Primary key attribute(s) identified

    All dependent attributes depend on the primary key

    Identify all partial and transitive dependencies

        Partial

            Attributes that depend on part of the PK

            Can exist only if the table has a composite PK

        Transitive

            Non-PK attribute(s) determine other attributes

            JOB_CLASS determines CHG_HOUR

            JOB_CLASS is the determinant attribute

            CHG_HOUR is the dependent attribute
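One way to see which determinants are keys is to compute an attribute closure, a standard technique that this deck does not itself cover. The sketch below assumes the dependencies of the project example (PROJ_NUM -> PROJ_NAME; EMP_NUM -> EMP_NAME, JOB_CLASS, CHG_HOUR; JOB_CLASS -> CHG_HOUR) plus the key dependency PROJ_NUM, EMP_NUM -> HOURS; the function name `closure` is hypothetical.

```python
ALL_ATTRIBUTES = {"PROJ_NUM", "EMP_NUM", "PROJ_NAME", "EMP_NAME",
                  "JOB_CLASS", "CHG_HOUR", "HOURS"}

# Each entry is (determinant attributes, dependent attributes).
DEPENDENCIES = [
    (("PROJ_NUM",), ("PROJ_NAME",)),
    (("EMP_NUM",), ("EMP_NAME", "JOB_CLASS", "CHG_HOUR")),
    (("JOB_CLASS",), ("CHG_HOUR",)),
    (("PROJ_NUM", "EMP_NUM"), ("HOURS",)),
]


def closure(attributes, dependencies):
    """Return every attribute determined, directly or indirectly, by `attributes`."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in dependencies:
            # Apply a dependency once its whole left side is already in the result.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


key_closure = closure({"PROJ_NUM", "EMP_NUM"}, DEPENDENCIES)
job_closure = closure({"JOB_CLASS"}, DEPENDENCIES)
emp_closure = closure({"EMP_NUM"}, DEPENDENCIES)

composite_is_key = key_closure == ALL_ATTRIBUTES
job_class_is_key = job_closure == ALL_ATTRIBUTES
```

Here the composite key reaches every attribute, while JOB_CLASS reaches only itself and CHG_HOUR, so JOB_CLASS is a determinant that is not a key.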


    Conversion to 2NF

    Start with the 1NF format:

        Write each key component on a separate line

        Write the original key on the last line

        Each component becomes a new table

        Write the dependent attributes after each key

    PROJECT (PROJ_NUM, PROJ_NAME)

    EMPLOYEE (EMP_NUM, EMP_NAME, JOB_CLASS, CHG_HOUR)

    ASSIGN (PROJ_NUM, EMP_NUM, ASSIGN_HOURS)

    Transitive Dependencies:

        (JOB_CLASS -> CHG_HOUR)
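The split into PROJECT, EMPLOYEE, and ASSIGN amounts to projecting the 1NF rows onto each key and its dependent attributes, then dropping duplicates. A minimal sketch follows, with hypothetical data and a hypothetical helper named `projection`; the HOURS column plays the role of ASSIGN_HOURS.

```python
COLUMNS = ("PROJ_NUM", "PROJ_NAME", "EMP_NUM", "EMP_NAME",
           "JOB_CLASS", "CHG_HOUR", "HOURS")

# Hypothetical 1NF rows.
DATA = [
    (1, "Alpha", 101, "Ann", "Programmer", 35.75, 10.0),
    (1, "Alpha", 102, "Bob", "Analyst", 48.10, 7.5),
    (2, "Beta", 101, "Ann", "Programmer", 35.75, 4.0),
]
rows_1nf = [dict(zip(COLUMNS, values)) for values in DATA]


def projection(rows, attributes):
    """Keep only `attributes` and drop duplicate tuples, preserving first-seen order."""
    tuples = (tuple(row[name] for name in attributes) for row in rows)
    return list(dict.fromkeys(tuples))


# One table per key component, plus one for the original composite key.
PROJECT = projection(rows_1nf, ("PROJ_NUM", "PROJ_NAME"))
EMPLOYEE = projection(rows_1nf, ("EMP_NUM", "EMP_NAME", "JOB_CLASS", "CHG_HOUR"))
ASSIGN = projection(rows_1nf, ("PROJ_NUM", "EMP_NUM", "HOURS"))
```

Each project name and each employee's details are now stored once. EMPLOYEE still carries CHG_HOUR alongside JOB_CLASS, so the transitive dependency remains.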


    2NF Conversion Results

    Table name: PROJECT

        PROJ_NUM -> PROJ_NAME

    Table name: EMPLOYEE

        EMP_NUM -> EMP_NAME, JOB_CLASS, CHG_HOUR

        Transitive dependency: JOB_CLASS -> CHG_HOUR

    Table name: ASSIGN

        PROJ_NUM, EMP_NUM -> ASSIGN_HOURS


    2NF Summarized

    In 1NF

    Includes no partial dependencies

        No attribute is dependent on a portion of the primary key

    May still exhibit transitive dependencies

        Attributes may be functionally dependent on non-key attributes


    Conversion to 3NF

    Create separate table(s) to eliminate transitive functional dependencies

    For every transitive dependency:

        Write the determinant as the PK of a new table

        Write all dependent attributes for each determinant

        Delete the dependent attribute(s) from the original table(s)

        Leave the determinant attribute(s) in the original table(s)

    PROJECT (PROJ_NUM, PROJ_NAME)

    ASSIGN (PROJ_NUM, EMP_NUM, ASSIGN_HOURS)

    EMPLOYEE (EMP_NUM, EMP_NAME, JOB_CLASS)

    JOB (JOB_CLASS, CHG_HOUR)
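The four 3NF relations could be declared as follows. This is a sketch using Python's sqlite3 module; the column types, NOT NULL choices, and sample rows are assumptions, since the slides give only table and attribute names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
    CREATE TABLE JOB (
        JOB_CLASS TEXT PRIMARY KEY,
        CHG_HOUR  REAL NOT NULL
    );
    CREATE TABLE EMPLOYEE (
        EMP_NUM   INTEGER PRIMARY KEY,
        EMP_NAME  TEXT NOT NULL,
        JOB_CLASS TEXT NOT NULL REFERENCES JOB (JOB_CLASS)
    );
    CREATE TABLE PROJECT (
        PROJ_NUM  INTEGER PRIMARY KEY,
        PROJ_NAME TEXT NOT NULL
    );
    CREATE TABLE ASSIGN (
        PROJ_NUM     INTEGER NOT NULL REFERENCES PROJECT (PROJ_NUM),
        EMP_NUM      INTEGER NOT NULL REFERENCES EMPLOYEE (EMP_NUM),
        ASSIGN_HOURS REAL,
        PRIMARY KEY (PROJ_NUM, EMP_NUM)
    );
""")

# Hypothetical rows: two employees share one job classification.
conn.execute("INSERT INTO JOB VALUES ('Programmer', 35.75)")
conn.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?, ?)",
    [(101, "Ann", "Programmer"), (103, "Cy", "Programmer")],
)

# The hourly rate now lives in one row, so one UPDATE changes it for everyone.
conn.execute("UPDATE JOB SET CHG_HOUR = 37.00 WHERE JOB_CLASS = 'Programmer'")
rates = [
    row[0]
    for row in conn.execute(
        "SELECT j.CHG_HOUR FROM EMPLOYEE e "
        "JOIN JOB j ON j.JOB_CLASS = e.JOB_CLASS ORDER BY e.EMP_NUM"
    )
]

# The determinant left in EMPLOYEE acts as a foreign key into JOB.
try:
    conn.execute("INSERT INTO EMPLOYEE VALUES (104, 'Di', 'Astronaut')")
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True
```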


    Conversion to 3NF

    Table name: PROJECT

        PROJ_NUM -> PROJ_NAME

    Table name: EMPLOYEE

        EMP_NUM -> EMP_NAME, JOB_CLASS

    Table name: JOB

        JOB_CLASS -> CHG_HOUR

    Table name: ASSIGN

        PROJ_NUM, EMP_NUM -> ASSIGN_HOURS


    3NF Summarized

    In 2NF

    Contains no transitive dependencies


    Improving the Design

    1. Evaluate PK Assignment (meet PK guidelines)

    2. Evaluate Naming Conventions

    3. Refine Attribute Atomicity (simple, single-valued)

    4. Identify New Attributes

    5. Identify New Relationships (decompose M:M)

    6. Refine Primary Keys (as required for data granularity)

    7. Maintain Historical Transactional Accuracy

    8. Identify Use of Derived Attributes


    Improving the Design (Job Table)

    Table name: JOB

        JOB_CODE, JOB_DESCRIPTION, JOB_CHG_HOUR

    1. Evaluate PK Assignment

        Introduce a better-suited PK that is free of semantic content (a non-intelligent PK)

        Add JOB_CODE as a surrogate PK to reduce data entry errors

        Repeat for other tables (see #6, Assign table)

    2. Evaluate Naming Conventions

        JOB_CLASS is actually a description of the job; change it to JOB_DESCRIPTION

        CHG_HOUR should be JOB_CHG_HOUR


    Improving the Design (Employee Table)

    3. Refine Attribute Atomicity

        Decompose composite attributes into simple attributes

        EMP_NAME should be split into EMP_LNAME, EMP_FNAME, and EMP_INITIAL

    4. Identify New Attributes

        Add new attributes that describe real-world entity characteristics

        EMP_HIREDATE

    Table name: EMPLOYEE

        EMP_NUM, EMP_LNAME, EMP_FNAME, EMP_INITIAL, EMP_HIREDATE, JOB_CODE


    Improving the Design (Project Table)

    5. Identify New Relationships

        Add new relationships as required by business rules

        A project is managed by an employee; an employee can be the manager of only one project

        Add EMP_NUM as FK in PROJECT

    Table name: PROJECT

        PROJ_NUM, PROJ_NAME, EMP_NUM
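The rule that an employee manages at most one project is not enforced by the foreign key alone. One possible way to enforce it, shown here as an assumption rather than something stated on the slide, is a UNIQUE constraint on the FK column. The sketch uses Python's sqlite3 module with invented rows and a trimmed-down EMPLOYEE table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
    CREATE TABLE EMPLOYEE (
        EMP_NUM   INTEGER PRIMARY KEY,
        EMP_LNAME TEXT NOT NULL
    );
    CREATE TABLE PROJECT (
        PROJ_NUM  INTEGER PRIMARY KEY,
        PROJ_NAME TEXT NOT NULL,
        -- FK to the managing employee; UNIQUE limits each employee to one project.
        EMP_NUM   INTEGER UNIQUE REFERENCES EMPLOYEE (EMP_NUM)
    );
""")
conn.execute("INSERT INTO EMPLOYEE VALUES (101, 'Ramos')")
conn.execute("INSERT INTO PROJECT VALUES (1, 'Alpha', 101)")


def try_insert(sql):
    """Return True if the statement is accepted, False if a constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


# Employee 101 already manages project 1, so a second project is rejected.
second_project_accepted = try_insert("INSERT INTO PROJECT VALUES (2, 'Beta', 101)")
# Employee 999 does not exist, so the foreign key rejects the row.
unknown_manager_accepted = try_insert("INSERT INTO PROJECT VALUES (3, 'Gamma', 999)")
```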


    Improving the Design (Assign Table)

    4. Identify New Attributes

        Add ASSIGN_DATE

    6. Refine Primary Keys

        Consider the granularity of the data being represented in order to determine the PK

        Can an employee have multiple hours-worked entries for a given day in a given project?

        If yes, add ASSIGN_NUM as a surrogate PK

    7. Maintain Historical Transactional Accuracy

        Add ASSIGN_CHG_HOUR

    8. Identify Use of Derived Attributes

        Add ASSIGN_CHARGE

    Table name: ASSIGN

        ASSIGN_NUM, ASSIGN_DATE, PROJ_NUM, EMP_NUM, ASSIGN_HOURS, ASSIGN_CHARGE, ASSIGN_CHG_HOUR
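Steps 7 and 8 can be sketched in a few lines: ASSIGN_CHG_HOUR copies the rate in force when the entry is made, and ASSIGN_CHARGE is derived as hours times that copied rate. The function name `record_assignment`, the `job_chg_hour` lookup, the dates, and the rates below are hypothetical.

```python
from datetime import date
from decimal import Decimal

# Current hourly rate per job description (stands in for JOB.JOB_CHG_HOUR).
job_chg_hour = {"Programmer": Decimal("35.75")}


def record_assignment(assign_num, assign_date, proj_num, emp_num, job, hours):
    """Build an ASSIGN row that freezes the rate at the time of entry."""
    rate = job_chg_hour[job]
    return {
        "ASSIGN_NUM": assign_num,
        "ASSIGN_DATE": assign_date,
        "PROJ_NUM": proj_num,
        "EMP_NUM": emp_num,
        "ASSIGN_HOURS": hours,
        "ASSIGN_CHG_HOUR": rate,        # historical copy of the rate
        "ASSIGN_CHARGE": rate * hours,  # derived attribute
    }


first = record_assignment(1, date(2024, 3, 1), 1, 101, "Programmer", Decimal("10"))

# The job's rate changes later; the earlier entry keeps its original figures.
job_chg_hour["Programmer"] = Decimal("37.00")

# Same employee, project, and day: only ASSIGN_NUM tells the two entries apart.
second = record_assignment(2, date(2024, 3, 1), 1, 101, "Programmer", Decimal("2"))
```

Storing the derived charge trades a little redundancy for cheaper reporting; it could instead be computed whenever it is needed.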


    Limitations of System Assigned PK

    A surrogate PK ensures that each row has a unique ID, not that the row's dependent values are unique.

    JOB_CODE is a system-assigned PK

    We could still have duplicate values:

        511 Programmer 35.75

        512 Programmer 35.75

    Clearly, the entries are duplicated!

    To ensure unique values, we must create a unique index on all candidate keys.

        Unique index on JOB_DESCRIPTION

    This still will not prevent data entry errors!

        513 Progranmer 35.75

    Table name: JOB

        JOB_CODE, JOB_DESCRIPTION, JOB_CHG_HOUR
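The behaviour described above can be checked with Python's sqlite3 module. The sketch reuses the slide's three rows; the column types and the index name `job_description_uq` are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE JOB (
        JOB_CODE        INTEGER PRIMARY KEY,
        JOB_DESCRIPTION TEXT NOT NULL,
        JOB_CHG_HOUR    REAL NOT NULL
    )
""")

# With only the surrogate PK, the duplicate description is accepted.
conn.execute("INSERT INTO JOB VALUES (511, 'Programmer', 35.75)")
conn.execute("INSERT INTO JOB VALUES (512, 'Programmer', 35.75)")
duplicates_before = conn.execute(
    "SELECT COUNT(*) FROM JOB WHERE JOB_DESCRIPTION = 'Programmer'"
).fetchone()[0]

# Remove the duplicate, then protect the candidate key with a unique index.
conn.execute("DELETE FROM JOB WHERE JOB_CODE = 512")
conn.execute("CREATE UNIQUE INDEX job_description_uq ON JOB (JOB_DESCRIPTION)")

try:
    conn.execute("INSERT INTO JOB VALUES (512, 'Programmer', 35.75)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# A misspelled description is a different value, so the index lets it through.
conn.execute("INSERT INTO JOB VALUES (513, 'Progranmer', 35.75)")
descriptions = [
    row[0]
    for row in conn.execute("SELECT JOB_DESCRIPTION FROM JOB ORDER BY JOB_CODE")
]
```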


    Normalization and Database Design

    First, business rules must be determined

        Determine the granularity of the data in each entity

    Normalization should be part of the design process

    The E-R diagram provides the macro view (conceptual)

    Normalization provides the micro view of entities (logical)

        Focuses on the characteristics of specific entities

        May yield additional entities/relationships

    It is difficult to separate normalization from E-R diagramming; the two are complementary


    Initial ERD for Contracting Company

    1. A company has many projects.

    2. Each project requires the services of many employees. An employee may be assigned to several different projects.

    3. Some employees are not assigned to a project.

    4. Each employee has a single primary job classification. Many employees can have the same job classification.

    5. The job classification determines the hourly billing rate.


    Modified ERD for Contracting Company

    Figure 4.11


    Final ERD for Contracting Company

    Figure 4.12


    Denormalization

    Normalization is one of many database design goals

    Normalization creates many small tables linked by PK/FK

    Reporting requirements over normalized tables require multiple table joins to get complete data:

        Additional processing (join operations)

        Additional I/O operations

    The design must find the right balance among:

        Design integrity requirements

        Information requirements

        Performance requirements
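To show the join cost concretely, the sketch below rebuilds a project report from four normalized tables using Python's sqlite3 module. The table layouts follow the 3NF relations PROJECT, EMPLOYEE, JOB, and ASSIGN; the column types and all row values are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE JOB (JOB_CLASS TEXT PRIMARY KEY, CHG_HOUR REAL);
    CREATE TABLE EMPLOYEE (EMP_NUM INTEGER PRIMARY KEY, EMP_NAME TEXT, JOB_CLASS TEXT);
    CREATE TABLE PROJECT (PROJ_NUM INTEGER PRIMARY KEY, PROJ_NAME TEXT);
    CREATE TABLE ASSIGN (
        PROJ_NUM INTEGER, EMP_NUM INTEGER, ASSIGN_HOURS REAL,
        PRIMARY KEY (PROJ_NUM, EMP_NUM)
    );
    INSERT INTO JOB VALUES ('Programmer', 35.75), ('Analyst', 48.10);
    INSERT INTO EMPLOYEE VALUES (101, 'Ann', 'Programmer'), (102, 'Bob', 'Analyst');
    INSERT INTO PROJECT VALUES (1, 'Alpha'), (2, 'Beta');
    INSERT INTO ASSIGN VALUES (1, 101, 10.0), (1, 102, 7.5), (2, 101, 4.0);
""")

# Three joins are needed to reassemble one report line per assignment.
report = conn.execute("""
    SELECT p.PROJ_NUM, p.PROJ_NAME, e.EMP_NUM, e.EMP_NAME,
           e.JOB_CLASS, j.CHG_HOUR, a.ASSIGN_HOURS
    FROM ASSIGN a
    JOIN PROJECT p ON p.PROJ_NUM = a.PROJ_NUM
    JOIN EMPLOYEE e ON e.EMP_NUM = a.EMP_NUM
    JOIN JOB j ON j.JOB_CLASS = e.JOB_CLASS
    ORDER BY p.PROJ_NUM, e.EMP_NUM
""").fetchall()
```

A denormalized reporting table would avoid these joins at read time, at the price of the redundancy and anomalies that normalization removed.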


    Unnormalized Table Defects

    Data updates are less efficient

        Normalization ensures that data is updated (insert/update/delete) only once, in one place

    Indexing is more cumbersome

    No simple strategies for creating views