schema refinement and normal formskrunapon/courses/178370/slides/lecture4.pdf · to bcnf, 4nf, and...

13
Electricite Du Laos (EDL) Dr. Kanda Runapongsa Saikaew, Computer Engineering, KKU 1 1 Asst. Prof. Dr. Kanda Runapongsa Saikaew ([email protected] ) Department of Computer Engineering Khon Kaen University Database Normalization 2 Overview What and why normalization Background to normalization Functional dependency Superkey Candidate key Primary key Normal forms and their purposes 1NF 2NF 3NF 3 What and why normalization Normalization is a systematic way of ensuring that a database structure is suitable for general-purpose querying and free of certain undesirable characteristics insertion, update, and deletion anomaliesthat could lead to a loss of data integrity Integrity constraints, in particular functional dependencies, can be used to identify schemas with such problems and to suggest refinements.

Upload: vuongthuan

Post on 21-Mar-2018

219 views

Category:

Documents


2 download

TRANSCRIPT

Electricite Du Laos (EDL)

Dr. Kanda Runapongsa Saikaew, Computer Engineering, KKU 1

1

Asst. Prof. Dr. Kanda

Runapongsa Saikaew

([email protected])

Department of Computer Engineering

Khon Kaen University

Database Normalization

2

Overview

What and why normalization

Background to normalization

Functional dependency

Superkey

Candidate key

Primary key

Normal forms and their purposes

1NF

2NF

3NF

3

What and why normalization

Normalization is a systematic way of ensuring that a

database structure is suitable for general-purpose

querying and free of certain undesirable

characteristics —insertion, update, and deletion

anomalies—that could lead to a loss of data integrity

Integrity constraints, in particular functional dependencies, can be used to identify schemas with

such problems and to suggest refinements.

Electricite Du Laos (EDL)

Dr. Kanda Runapongsa Saikaew, Computer Engineering, KKU 2

Normalized tables

Informally, a relational database table

is often described as "normalized" if it

is in the third normal form (3NF)

Most 3NF tables are free of insertion,

update, and deletion anomalies, i.e. in

most cases 3NF tables adhere

to BCNF, 4NF, and 5NF (but typically

not 6NF)

4

When we apply normalization

A standard piece of database design

guidance is that the designer should

begin by fully normalizing the design

Selectively denormalize only in places

where doing so is absolutely necessary

to address performance issues

Some modeling disciplines, such as the

dimensional modeling approach to data

warehouse design, explicitly

recommend non-normalized designs 5

6

Functional dependency

Functional dependency: Attribute B has a

functional dependency on attribute B (i.e., A ->

B)

If, for each value of attribute A, there is exactly one

value of attribute B

If value of A is repeating in tuples then value of B

will also repeat

In our example, branch address and

branch phone number has a functional

dependency on branch no.

Electricite Du Laos (EDL)

Dr. Kanda Runapongsa Saikaew, Computer Engineering, KKU 3

7

All instance that satisfies AB C

A B C D

a1 b1 c1 d1

a1 b1 c1 d2

a1 b2 c2 d1

a2 b1 c3 d1

8

Trivial functional dependency

Trivial functional dependency

A trivial functional dependency is a

functional dependency of attribute of a

superset of itself

Examples

{Branch No, Branch Address} -> {Branch

Address}

{Branch Address} -> {Branch Address}

Full functional dependency

An attribute is fully functional dependency

on a set of attributes X if it is

Functionally dependent on X, and

Not functionally dependent on any

proper subset of X

Example:

{Employee Address} has a functional

dependency on {Employee ID, Skill}, but

not a full functional dependency

because it is also dependent on

{Employee ID} 9

Electricite Du Laos (EDL)

Dr. Kanda Runapongsa Saikaew, Computer Engineering, KKU 4

Transitive dependency

Transitive dependency is an indirect

functional dependency, one in

which X→Z only by virtue

of X→Y and Y→Z

Example:

Employee ID -> Employee office

Employee office -> Employee phone

Employee ID -> Employee phone

10

Superkey

Superkey

A superkey is an attribute or set of

attributes that uniquely identifies rows

within a table

Two distinct rows are always guaranteed to

have distinct superkeys

Example

{Employee ID, Salary, Year} would be a

superkey for the {Employee ID, Salary,

Bonus, Year} table

{Employee ID, Year} would also be a

superkey11

Candidate key

Candidate key is a minimal superkey,

that is, a superkey for which we can

say that no proper subset of it is also

a superkey

Example:

{Employee ID, Year} would be a

candidate key for the table {Employee

ID, Salary, Bonus, Year} table.

12

Electricite Du Laos (EDL)

Dr. Kanda Runapongsa Saikaew, Computer Engineering, KKU 5

Non-prime attribute

A non-prime attribute is an attribute

that does not occur in any candidate

key

Bonus would be a non-prime attribute

in the "Employees' Bonus" table.

13

Primary key

Most DBMSs require a table to be

defined as having a single unique

key, rather than a number of possible

unique keys

A primary key is a candidate key

which the database designer has

designated for this purpose

14

15

Example: Constraints on Entity Set

Consider relation obtained from Hourly_Emps:

Hourly_Emps (ssn, name, lot, rating, hrly_wages,

hrs_worked)

Notation: We will denote this relation schema by

listing the attributes: SNLRWH

This is really the set of attributes {S,N,L,R,W,H}.

Sometimes, we will refer to all attributes of a relation by

using the relation name. (e.g., Hourly_Emps for SNLRWH)

Some FDs on Hourly_Emps:

ssn is the key: S SNLRWH

rating determines hrly_wages: R W

Electricite Du Laos (EDL)

Dr. Kanda Runapongsa Saikaew, Computer Engineering, KKU 6

16

Data redundancy example1 (1/3)

ssn name lot rating Hr_Wages Hr_Worked

1 A 48 8 10 40

2 B 22 8 10 30

3 C 35 5 7 30

4 D 35 5 7 32

5 E 35 8 10 40

17

Data redundancy example1 (2/3)

Redundant Storage: The rating value

8 corresponds to the hourly wage 10,

and this association is repeated three

times

Update Anomalies: The

hourly_wages in the first tuple could

be updated without making a similar

change in the second tuple

18

Data redundancy example1 (3/3)

Insertion Anomalies: We cannot insert a

tuple for an employee unless we know the

hourly wage for the employee‟s rating

value

Deletion Anomalies: If we delete all tuples

with a given rating value (e.g., we delete

the tuples for C and D) we lose the

association between the rating value and

its hourly_wage value

Electricite Du Laos (EDL)

Dr. Kanda Runapongsa Saikaew, Computer Engineering, KKU 7

19

Decompositions (1/2)

Redundancy arises when a relational

schema forces an association

between attributes that is not natural.

Functional dependencies can be used

to identify such situations and

suggest refinement to the schema

20

Decompositions (2/2)

The essential idea is that many problems

arising from redundancy can be addressed

by replacing a relation with a collection of

„smaller‟ relations

R consists of replacing the relation

schema by two (or more) relation schemas

that each contain a subset of attributes of

R and together include all attributes in R

21

Examples

We can decompose Hourly_Emps into two relations:

Hourly_Emps2(ssn, name, lot, rating, hours_worked)

Wages(rating, hourly_wages)

This is more efficient than updating several tuples (as in the original design), and it eliminates the potential for inconsistency

Electricite Du Laos (EDL)

Dr. Kanda Runapongsa Saikaew, Computer Engineering, KKU 8

Data redundancy example2

22

Redundant Storage: the branchNo B001 and B002 corresponds to their branchAddress and telNo are repeated

Example2 Problem

Update Anomalies: If we change address of

Tom Daniels, we also need to change

address of Sally Adams

Insertion Anomalies: We cannot insert a

tuple for a staff unless we know the address

and phone number of the branch of that stuff

Deletion Anomalies: If we delete all tuples

with a given branch value (e.g., we delete the

tuples for branch B002) we lose the

association between the branch and its

address and phone number23

Example2 Solution

24

Electricite Du Laos (EDL)

Dr. Kanda Runapongsa Saikaew, Computer Engineering, KKU 9

25

Normal Forms (1/2)

Given a relation schema, we need to

decide whether it is a good design or

we need to decompose it into smaller

relations

To provide such guidance, several

normal forms have been proposed

26

Normal Forms (2/2)

The normal forms based on FDs are

first normal form (1NF), second

normal form (2NF), and third normal

form (3NF)

Every relation in 3NF is also in 2NF

Every relation in 2NF is also in 1NF

First Normal Form (1NF)

A relation is in first normal form if

every field contains only atomic

values, that is, no lists or sets

This requirement is implicit in our

definition of the relation model

Table faithfully represents

a relation and has no "repeating

groups"

27

Electricite Du Laos (EDL)

Dr. Kanda Runapongsa Saikaew, Computer Engineering, KKU 10

Branch Table is not in 1NF

28

Converting Branch Table into 1NF

29

Second normal form (2NF)

2NF only applies to tables with

composite primary keys.

No non-prime attribute in the table

is functionally dependent on a part

(proper subset) of a candidate key

30

Electricite Du Laos (EDL)

Dr. Kanda Runapongsa Saikaew, Computer Engineering, KKU 11

TempStaffAllocation Table is not in 2NF

31

Formal Definition of 2NF

Formal definition of 2NF is a table

that is in 1NF and every non-prime

attribute is fully functional dependent on the primary key

Full functional dependency indicates

that if A and B are columns of a table,

B is fully dependent on A if B is

functionally dependent on A but not

on any proper subset of A

32

Converting TempStaffAllocation table to 2NF

33

Electricite Du Laos (EDL)

Dr. Kanda Runapongsa Saikaew, Computer Engineering, KKU 12

Third Normal Form (3NF)

A table that is in 1NF and 2NF and in

which all non-prime attributes can be

derived from only the primary key column(s) and no other columns

The formal definition of 3NF is a table

that is in 1NF and 2NF and in which

no non-prime attributes is transitively

dependent on the primary key

34

StaffBranch table is not in 3NF

35

Converting StaffBranch table to 3NF

36

Electricite Du Laos (EDL)

Dr. Kanda Runapongsa Saikaew, Computer Engineering, KKU 13

Summary (1/2)

Normalization is a systematic way of

ensuring that a database structure is

suitable for general-purpose querying

Free of certain undesirable

characteristics—insertion, update, and

deletion anomalies—that could lead to a

loss of data integrity

Informally, a relational database table

is often described as "normalized" if it

is in the 3NF37

Summary (2/2)

1NF: Table faithfully represents

a relation and has no "repeating

groups"

2NF: No non-prime attribute in the

table is functionally dependent on a

part (proper subset) of a candidate

key

3NF: Every non-prime attribute is

non-transitively dependent on every

key of the table 38

References

Connolly and Begg, “Database

Systems: A Practical Approach to

Design, Implementation and

Management”, Pearson, 2004

http://en.wikipedia.org/wiki/Database_

normalization

Ramakrishnan and Gehrke,

“Database Management Systems”,

McGraw-Hill

Science/Engineering/Math, 2003 39