IS 263 – Database Concepts
Department of Computer Science and Engineering
Lecture 4: Normalization
Instructor: Henry Kalisti


Post on 22-Apr-2020


Limitations of E-R Designs
• E-R modeling provides a set of guidelines; it does not result in a unique database schema.
• It does not provide a way of evaluating alternative schemas.
• Normalization theory provides a mechanism for analyzing and refining the schema produced by an E-R design.

Normalization
• Normalization is a technique for producing a set of relations with desirable properties, given the data requirements of the enterprise being modeled.
• The process of normalization was first developed by Codd in 1972.
• Normalization is often performed as a series of tests on a relation to determine whether it satisfies or violates the requirements of a given normal form.
• Codd initially defined three normal forms called first (1NF), second (2NF), and third (3NF). Boyce and Codd together introduced a stronger definition of 3NF called Boyce-Codd Normal Form (BCNF) in 1974.

Normalization
• All four of these normal forms are based on functional dependencies among the attributes of a relation.
• A functional dependency describes the relationship between attributes in a relation.
• For example, if A and B are attributes or sets of attributes of relation R, B is functionally dependent on A (denoted A → B) if each value of A is associated with exactly one value of B.
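One way to make the definition concrete is to test a candidate dependency against sample rows. The sketch below is illustrative only: the helper name `holds` is an assumption and not part of the lecture, and the rows are the staff table used later in these slides. Note that a sample can only refute a dependency; whether A → B really holds depends on the rules of the enterprise, not on one snapshot of the data.

```python
def holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs.

    rows: list of dicts, one per tuple of the relation.
    lhs, rhs: tuples of attribute names (the determinant and its object).
    """
    seen = {}
    for row in rows:
        a = tuple(row[attr] for attr in lhs)
        b = tuple(row[attr] for attr in rhs)
        # setdefault stores b the first time a is seen, then returns the stored value.
        if seen.setdefault(a, b) != b:
            return False
    return True


staff = [
    {"staffNo": "SL10", "job": "Salesman", "dept": 10, "dname": "Sales"},
    {"staffNo": "SA51", "job": "Manager", "dept": 20, "dname": "Accounts"},
    {"staffNo": "DS40", "job": "Clerk", "dept": 20, "dname": "Accounts"},
    {"staffNo": "OS45", "job": "Clerk", "dept": 30, "dname": "Operations"},
]

print(holds(staff, ("dept",), ("dname",)))  # True: each dept has one name
print(holds(staff, ("job",), ("dept",)))    # False: Clerk appears in dept 20 and 30
```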

Normalization
• In 1977 and 1979, a fourth (4NF) and fifth (5NF) normal form were introduced which go beyond BCNF. However, they deal with situations which are quite rare. Other higher normal forms have been subsequently introduced, but all of them are based on dependencies more involved than functional dependencies.

Normalization
• A relational schema consists of a number of attributes, and a relational database schema consists of a number of relations.
• Attributes may be grouped together to form a relational schema based largely on the common sense of the database designer, or by mapping the relational schema from an ER model.
• Whatever approach is taken, a formal method is often required to help the database designer identify the optimal grouping of attributes for each relation in the database schema.

Normalization
• The process of normalization is a formal method that identifies relations based on their primary or candidate keys and the functional dependencies among their attributes.
• Normalization supports database designers through a series of tests, which can be applied to individual relations so that a relational schema can be normalized to a specific form to prevent the possible occurrence of update anomalies.

Purpose of Normalization
• To avoid redundancy by storing each ‘fact’ within the database only once.
• To put data into a form that conforms to relational principles (e.g., single-valued attributes, each relation represents one entity), with no repeating groups.
• To put the data into a form that can accommodate change more accurately.
• To avoid certain updating ‘anomalies’.
• To facilitate the enforcement of data constraints.

Redundancy and Data Anomalies

Redundant data is where we have stored the same ‘information’ more than once, i.e., the redundant data could be removed without the loss of information.

Example: We have the following relation that contains staff and department details:

staffNo  job       dept  dname       city
SL10     Salesman  10    Sales       Stratford
SA51     Manager   20    Accounts    Barking
DS40     Clerk     20    Accounts    Barking
OS45     Clerk     30    Operations  Barking

Such ‘redundancy’ could lead to the following ‘anomalies’:

Insert Anomaly: We can’t insert a dept without inserting a member of staff who works in that department.

Update Anomaly: We could change the name of the dept that SA51 works in without simultaneously changing it for DS40, who works in the same dept, leaving one department with two different names.

Deletion Anomaly: By removing employee SL10 we have removed all information pertaining to the Sales dept.
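The update and deletion anomalies can be reproduced on the staff relation with a few lines of Python. This is only a sketch: the replacement department name "Finance" is a hypothetical value chosen for illustration, and the variable names are not from the lecture.

```python
import copy

staff = [
    {"staffNo": "SL10", "job": "Salesman", "dept": 10, "dname": "Sales", "city": "Stratford"},
    {"staffNo": "SA51", "job": "Manager", "dept": 20, "dname": "Accounts", "city": "Barking"},
    {"staffNo": "DS40", "job": "Clerk", "dept": 20, "dname": "Accounts", "city": "Barking"},
    {"staffNo": "OS45", "job": "Clerk", "dept": 30, "dname": "Operations", "city": "Barking"},
]

# Update anomaly: rename the department on SA51's row only.
updated = copy.deepcopy(staff)
for row in updated:
    if row["staffNo"] == "SA51":
        row["dname"] = "Finance"  # hypothetical new name
names_for_dept_20 = {row["dname"] for row in updated if row["dept"] == 20}
print(names_for_dept_20)  # dept 20 now carries two different names

# Deletion anomaly: removing SL10 also removes the only record of dept 10.
remaining = [row for row in staff if row["staffNo"] != "SL10"]
sales_still_known = any(row["dept"] == 10 for row in remaining)
print(sales_still_known)
```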

Repeating Groups

A repeating group is an attribute (or set of attributes) that can have more than one value for a primary key value.

Example: We have the following relation that contains staff and department details and a list of telephone contact numbers for each member of staff.

staffNo  job       dept  dname       city       contact number
SL10     Salesman  10    Sales       Stratford  018111777, 018111888, 079311122
SA51     Manager   20    Accounts    Barking    017111777
DS40     Clerk     20    Accounts    Barking
OS45     Clerk     30    Operations  Barking    079311555

Repeating groups are not allowed in a relational design, since all attributes have to be ‘atomic’, i.e., there can only be one value per cell in a table!
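A minimal sketch of how the contact-number repeating group could be moved into its own relation, with one atomic value per row. The relation name `staff_contact` and the dictionary layout are assumptions made for illustration; the lecture itself does not name this relation.

```python
# Unnormalised rows: "contacts" holds several values per staff member.
staff_unf = [
    {"staffNo": "SL10", "contacts": ["018111777", "018111888", "079311122"]},
    {"staffNo": "SA51", "contacts": ["017111777"]},
    {"staffNo": "DS40", "contacts": []},
    {"staffNo": "OS45", "contacts": ["079311555"]},
]

# One (staffNo, contact number) pair per row; the pair is the key of the new relation.
staff_contact = [
    (member["staffNo"], number)
    for member in staff_unf
    for number in member["contacts"]
]

for row in staff_contact:
    print(row)
```

A member of staff without a contact number (DS40 here) simply has no rows in the new relation, so no empty cell is needed.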

Redundancy and Other Problems
• Set-valued attributes in the E-R diagram result in multiple rows in the corresponding table.
• Example: Person (SSN, Name, Address, Hobbies)
• A person entity with multiple hobbies yields multiple rows in table Person.
• Hence, the association between Name and Address for the same person is stored redundantly.
• SSN is the key of the entity set, but (SSN, Hobby) is the key of the corresponding relation.
• The relation Person can’t describe people without hobbies.


Anomalies
• Redundancy leads to anomalies:
  • Update anomaly: A change in Address must be made in several places.
  • Deletion anomaly: Suppose a person gives up all hobbies. Do we:
    • Set the Hobby attribute to null? No, since Hobby is part of the key.
    • Delete the entire row? No, since we lose other information in the row.
  • Insertion anomaly: A Hobby value must be supplied for any inserted row, since Hobby is part of the key.

Functional Dependency

Formal Definition: Attribute B is functionally dependent upon attribute A (or a collection of attributes) if a value of A determines a single value of attribute B at any one time.

Formal Notation: A → B. This should be read as ‘A determines B’ or ‘B is functionally dependent on A’. A is called the determinant and B is called the object of the determinant.

Example:

staffNo  job       dept  dname
SL10     Salesman  10    Sales
SA51     Manager   20    Accounts
DS40     Clerk     20    Accounts
OS45     Clerk     30    Operations

Functional Dependencies:
staffNo → job
staffNo → dept
staffNo → dname
dept → dname

Functional Dependency

Compound Determinants: If more than one attribute is necessary to determine another attribute in an entity, then such a determinant is termed a composite determinant.

Full Functional Dependency: Only of relevance with composite determinants. This is the situation when it is necessary to use all the attributes of the composite determinant to identify its object uniquely.

Example:

order#  line#  qty  price
A001    001    10   200
A002    001    20   400
A002    002    20   800
A004    001    15   300

Full Functional Dependencies:
(order#, line#) → qty
(order#, line#) → price

Functional Dependency

Partial Functional Dependency: This is the situation that exists if only a subset of the attributes of the composite determinant is needed to identify its object uniquely.

Example:

student#  unit#  room   grade
9900100   A01    TH224  2
9900010   A01    TH224  14
9901011   A02    JS075  3
9900001   A01    TH224  16

Full Functional Dependencies:
(student#, unit#) → grade

Partial Functional Dependencies:
unit# → room

Repetition of data! The room for unit A01 is stored once for every student taking that unit.
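A partial dependency can be searched for by testing every proper subset of the composite key against each non-key attribute. The sketch below assumes hypothetical helper names (`holds`, `partial_dependencies`) and adds one hypothetical row, marked in the code, in which student 9900100 also takes unit A02. Without that row every student appears only once in the sample, so student# alone would appear to determine room and grade, which shows why a small data sample is weak evidence for a dependency.

```python
from itertools import combinations


def holds(rows, lhs, rhs):
    """True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        a = tuple(row[attr] for attr in lhs)
        b = tuple(row[attr] for attr in rhs)
        if seen.setdefault(a, b) != b:
            return False
    return True


def partial_dependencies(rows, key, non_key):
    """List (subset, attribute) pairs where a proper subset of key determines attribute."""
    found = []
    for attr in non_key:
        for size in range(1, len(key)):
            for subset in combinations(key, size):
                if holds(rows, subset, (attr,)):
                    found.append((subset, attr))
    return found


enrolment = [
    {"student": "9900100", "unit": "A01", "room": "TH224", "grade": 2},
    {"student": "9900010", "unit": "A01", "room": "TH224", "grade": 14},
    {"student": "9901011", "unit": "A02", "room": "JS075", "grade": 3},
    {"student": "9900001", "unit": "A01", "room": "TH224", "grade": 16},
    # Hypothetical extra row, not in the lecture table.
    {"student": "9900100", "unit": "A02", "room": "JS075", "grade": 7},
]

print(partial_dependencies(enrolment, ("student", "unit"), ("room", "grade")))
# [(('unit',), 'room')]
```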

Transitive Dependency

Definition: A transitive dependency exists when there is an intermediate functional dependency.

Formal Notation: If A → B and B → C, then it can be stated that the following transitive dependency exists: A → B → C

Example:

staffNo  job       dept  dname
SL10     Salesman  10    Sales
SA51     Manager   20    Accounts
DS40     Clerk     20    Accounts
OS45     Clerk     30    Operations

staffNo → dept
dept → dname

Transitive Dependencies:
staffNo → dept → dname

Repetition of data! The name of dept 20 is stored once for every member of staff in that dept.
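The chain staffNo → dept → dname can be checked on the sample rows as follows. This is a sketch with a hypothetical helper `holds`; it also checks that dept does not determine staffNo, since the usual textbook definition of a transitive dependency excludes the case where the intermediate attribute is itself a key.

```python
def holds(rows, lhs, rhs):
    """True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        a = tuple(row[attr] for attr in lhs)
        b = tuple(row[attr] for attr in rhs)
        if seen.setdefault(a, b) != b:
            return False
    return True


staff = [
    {"staffNo": "SL10", "job": "Salesman", "dept": 10, "dname": "Sales"},
    {"staffNo": "SA51", "job": "Manager", "dept": 20, "dname": "Accounts"},
    {"staffNo": "DS40", "job": "Clerk", "dept": 20, "dname": "Accounts"},
    {"staffNo": "OS45", "job": "Clerk", "dept": 30, "dname": "Operations"},
]

chain = holds(staff, ("staffNo",), ("dept",)) and holds(staff, ("dept",), ("dname",))
dept_is_key = holds(staff, ("dept",), ("staffNo",))  # False: dept 20 has two staff
transitive = chain and not dept_is_key
print(transitive)  # True
```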

Normalisation - Relational Model

Relational Database Design: All attributes in a table must be atomic, and solely dependent upon the full primary key of that table.

THE KEY, THE WHOLE KEY, AND NOTHING BUT THE KEY!

In order to comply with the relational model it is necessary to 1) remove repeating groups and 2) avoid redundancy and data anomalies by removing partial and transitive functional dependencies.

Stages of Normalisation

Unnormalised form (UNF)
  ↓ Remove repeating groups
First normal form (1NF)
  ↓ Remove partial dependencies
Second normal form (2NF)
  ↓ Remove transitive dependencies
Third normal form (3NF)
  ↓ Remove remaining functional dependency anomalies
Boyce-Codd normal form (BCNF)
  ↓ Remove multivalued dependencies
Fourth normal form (4NF)
  ↓ Remove remaining anomalies
Fifth normal form (5NF)

Unnormalised Normal Form (UNF)

Definition: A relation is unnormalised when it has not had any normalisation rules applied to it, and it suffers from various anomalies.

This only tends to occur where the relation has been designed using a ‘bottom-up approach’, i.e., the capturing of attributes to a ‘Universal Relation’ from a screen layout, manual report, manual document, etc.

Unnormalised Normal Form (UNF)

• Below is a form that is used by Customers to order different products from Demashi Supplies. Normalize this form into 3NF.

ORDER
Customer No: 001964          Order Number: 00012345
Name:        Mark Campbell   Order Date:   14-Feb-2002
Address:     1 The House
             Leytonstone
             E11 9ZZ

Product Number  Product Description  Unit Price  Order Quantity  Line Total
T5060           Hook                  5.00         5             25.00
PT42            Bolt                  2.50        10             20.50
QZE48           Spanner              20.00         1             20.00

Order Total: 65.50

First Normal Form (1NF)

Definition: A relation is in 1NF if, and only if, all its underlying attributes contain atomic values only.

Remove repeating groups into a new relation.

A repeating group is shown by a pair of brackets within the relational schema:

ORDER (order-no, order-date, cust-no, cust-name, cust-add, (prod-no, prod-desc, unit-price, ord-qty, line-total)*, order-total)

Steps from UNF to 1NF:
• Remove the outermost repeating group (and any nested repeated groups it may contain) and create a new relation to contain it.
• Add to this relation a copy of the PK of the relation immediately enclosing it.
• Name the new entity (appending the number 1 to indicate 1NF).
• Determine the PK of the new entity.
• Repeat steps until no more repeating groups remain.

Example - UNF to 1NF

ORDER (order-no, order-date, cust-no, cust-name, cust-add, (prod-no, prod-desc, unit-price, ord-qty, line-total)*, order-total)

1. Remove the outermost repeating group (and any nested repeated groups it may contain) and create a new relation to contain it. (Rename the original to indicate 1NF.)

ORDER-1 (order-no, order-date, cust-no, cust-name, cust-add, order-total)

(prod-no, prod-desc, unit-price, ord-qty, line-total)

2. Add to this relation a copy of the PK of the relation immediately enclosing it.

ORDER-1 (order-no, order-date, cust-no, cust-name, cust-add, order-total)

(order-no, prod-no, prod-desc, unit-price, ord-qty, line-total)

3. Name the new entity (appending the number 1 to indicate 1NF).

ORDER-LINE-1 (order-no, prod-no, prod-desc, unit-price, ord-qty, line-total)

4. Determine the PK of the new entity.

ORDER-LINE-1 (order-no, prod-no, prod-desc, unit-price, ord-qty, line-total), PK: (order-no, prod-no)
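The UNF to 1NF steps can be sketched in Python by treating the repeating group as a nested list. The function name `to_1nf`, the `"lines"` key, and the single-string address are assumptions made for this illustration; the values are copied verbatim from the order form and kept as strings so that nothing is recomputed.

```python
order_unf = {
    "order-no": "00012345",
    "order-date": "14-Feb-2002",
    "cust-no": "001964",
    "cust-name": "Mark Campbell",
    "cust-add": "1 The House, Leytonstone, E11 9ZZ",
    "order-total": "65.50",
    # The repeating group: one nested entry per product line on the form.
    "lines": [
        {"prod-no": "T5060", "prod-desc": "Hook", "unit-price": "5.00", "ord-qty": 5, "line-total": "25.00"},
        {"prod-no": "PT42", "prod-desc": "Bolt", "unit-price": "2.50", "ord-qty": 10, "line-total": "20.50"},
        {"prod-no": "QZE48", "prod-desc": "Spanner", "unit-price": "20.00", "ord-qty": 1, "line-total": "20.00"},
    ],
}


def to_1nf(order):
    """Split one unnormalised order into ORDER-1 and ORDER-LINE-1 rows."""
    # Step 1: the original relation without its repeating group.
    order_1 = {k: v for k, v in order.items() if k != "lines"}
    # Steps 2-3: each line gets a copy of the enclosing PK (order-no).
    order_line_1 = [{"order-no": order["order-no"], **line} for line in order["lines"]]
    return order_1, order_line_1


order_1, order_line_1 = to_1nf(order_unf)

# Step 4: (order-no, prod-no) identifies each ORDER-LINE-1 row uniquely.
keys = {(row["order-no"], row["prod-no"]) for row in order_line_1}
print(len(order_line_1), len(keys))  # 3 3
```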

Second Normal Form (2NF)

Definition: A relation is in 2NF if, and only if, it is in 1NF and every non-key attribute is fully functionally dependent on the primary key.

Remove partial functional dependencies into a new relation.

Steps from 1NF to 2NF:
• Remove the offending attributes that are only partially functionally dependent on the composite key, and place them in a new relation.
• Add to this relation a copy of the attribute(s) which are the determinants of these offending attributes. These will automatically become the primary key of this new relation.
• Name the new entity (appending the number 2 to indicate 2NF).
• Rename the original entity (ending with a 2 to indicate 2NF).

Example - 1NF to 2NF

ORDER-LINE-1 (order-no, prod-no, prod-desc, unit-price, ord-qty, line-total)

1. Remove the offending attributes that are only partially functionally dependent on the composite key, and place them in a new relation.

ORDER-LINE-1 (order-no, prod-no, ord-qty, line-total)

(prod-desc, unit-price)

2. Add to this relation a copy of the attribute(s) which determine these offending attributes. These will automatically become the primary key of this new relation.

(prod-no, prod-desc, unit-price)

ORDER-LINE-1 (order-no, prod-no, ord-qty, line-total)

3. Name the new entity (appending the number 2 to indicate 2NF).

PRODUCT-2 (prod-no, prod-desc, unit-price)

4. Rename the original entity (ending with a 2 to indicate 2NF).

ORDER-LINE-2 (order-no, prod-no, ord-qty, line-total)
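A sketch of the same 1NF to 2NF step on data. The first three rows come from the order form; the fourth row (order 00012346) is a hypothetical second order, added only so that the repeated product description becomes visible. Variable names are illustrative assumptions.

```python
order_line_1 = [
    {"order-no": "00012345", "prod-no": "T5060", "prod-desc": "Hook", "unit-price": "5.00", "ord-qty": 5, "line-total": "25.00"},
    {"order-no": "00012345", "prod-no": "PT42", "prod-desc": "Bolt", "unit-price": "2.50", "ord-qty": 10, "line-total": "20.50"},
    {"order-no": "00012345", "prod-no": "QZE48", "prod-desc": "Spanner", "unit-price": "20.00", "ord-qty": 1, "line-total": "20.00"},
    # Hypothetical row, not on the lecture's form: the same product ordered again.
    {"order-no": "00012346", "prod-no": "T5060", "prod-desc": "Hook", "unit-price": "5.00", "ord-qty": 1, "line-total": "5.00"},
]

product_2 = {}     # prod-no -> attributes that depend on prod-no alone
order_line_2 = []  # attributes that depend on the whole key (order-no, prod-no)

for row in order_line_1:
    product_2[row["prod-no"]] = {
        "prod-desc": row["prod-desc"],
        "unit-price": row["unit-price"],
    }
    order_line_2.append(
        {attr: row[attr] for attr in ("order-no", "prod-no", "ord-qty", "line-total")}
    )

print(len(order_line_1), len(product_2))  # 4 3: "Hook" is now stored once
```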

Third Normal Form (3NF)

Definition: A relation is in 3NF if, and only if, it is in 2NF and every non-key attribute is non-transitively dependent on the primary key.

Remove transitive dependencies into a new relation.

Steps from 2NF to 3NF:
• Remove the offending attributes that are transitively dependent on non-key attribute(s), and place them in a new relation.
• Add to this relation a copy of the attribute(s) which are the determinants of these offending attributes. These will automatically become the primary key of this new relation.
• Name the new entity (appending the number 3 to indicate 3NF).
• Rename the original entity (ending with a 3 to indicate 3NF).

Example - 2NF to 3NF

ORDER-1 has a single-attribute primary key, so it has no partial dependencies and is already in 2NF; it is carried forward as ORDER-2.

ORDER-2 (order-no, order-date, cust-no, cust-name, cust-add, order-total)

1. Remove the offending attributes that are transitively dependent on non-key attributes, and place them in a new relation.

(cust-name, cust-add)

ORDER-2 (order-no, order-date, cust-no, order-total)

2. Add to this relation a copy of the attribute(s) which determine these offending attributes. These will automatically become the primary key of this new relation.

(cust-no, cust-name, cust-add)

ORDER-2 (order-no, order-date, cust-no, order-total)

3. Name the new entity (appending the number 3 to indicate 3NF).

CUSTOMER-3 (cust-no, cust-name, cust-add)

4. Rename the original entity (ending with a 3 to indicate 3NF).

ORDER-3 (order-no, order-date, cust-no, order-total)
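The 2NF to 3NF step follows the same pattern as before, so it can be written as a small general helper. The name `split_off` is an assumption for this sketch, and the second order (00012346) is a hypothetical row added so that the repeated customer details are visible.

```python
def split_off(rows, determinant, dependents):
    """Move `dependents` out of rows into a new relation keyed by `determinant`."""
    new_relation = {}
    reduced = []
    for row in rows:
        key = tuple(row[attr] for attr in determinant)
        new_relation[key] = {attr: row[attr] for attr in dependents}
        reduced.append({attr: value for attr, value in row.items() if attr not in dependents})
    return reduced, new_relation


order_2 = [
    {"order-no": "00012345", "order-date": "14-Feb-2002", "cust-no": "001964",
     "cust-name": "Mark Campbell", "cust-add": "1 The House, Leytonstone, E11 9ZZ",
     "order-total": "65.50"},
    # Hypothetical second order by the same customer, not in the lecture.
    {"order-no": "00012346", "order-date": "15-Feb-2002", "cust-no": "001964",
     "cust-name": "Mark Campbell", "cust-add": "1 The House, Leytonstone, E11 9ZZ",
     "order-total": "5.00"},
]

# cust-no -> cust-name, cust-add is the transitive part of order-no -> cust-no -> ...
order_3, customer_3 = split_off(order_2, ("cust-no",), ("cust-name", "cust-add"))

print(len(order_3), len(customer_3))  # 2 1: customer details are stored once
```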

Example - Relations in 3NF

CUSTOMER-3 (cust-no, cust-name, cust-add)

ORDER-3 (order-no, order-date, cust-no, order-total)

ORDER-LINE-2 (order-no, prod-no, ord-qty, line-total)

PRODUCT-2 (prod-no, prod-desc, unit-price)

Diagram: CUSTOMER (cust-no), places / placed by, ORDER (order-no), contains / part of, ORDER-LINE (order-no, prod-no), shows / belongs to, PRODUCT (prod-no)
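Decomposition should not lose information: the original order form can be rebuilt by following the keys between the four relations. The sketch below is one possible illustration; the function name `rebuild_order` and the dictionary layout are assumptions, and the values are those of the lecture's order form.

```python
customer_3 = {
    "001964": {"cust-name": "Mark Campbell", "cust-add": "1 The House, Leytonstone, E11 9ZZ"},
}
order_3 = {
    "00012345": {"order-date": "14-Feb-2002", "cust-no": "001964", "order-total": "65.50"},
}
product_2 = {
    "T5060": {"prod-desc": "Hook", "unit-price": "5.00"},
    "PT42": {"prod-desc": "Bolt", "unit-price": "2.50"},
    "QZE48": {"prod-desc": "Spanner", "unit-price": "20.00"},
}
order_line_2 = [
    {"order-no": "00012345", "prod-no": "T5060", "ord-qty": 5, "line-total": "25.00"},
    {"order-no": "00012345", "prod-no": "PT42", "ord-qty": 10, "line-total": "20.50"},
    {"order-no": "00012345", "prod-no": "QZE48", "ord-qty": 1, "line-total": "20.00"},
]


def rebuild_order(order_no):
    """Join the four relations back into one order form."""
    order = order_3[order_no]
    customer = customer_3[order["cust-no"]]  # follow cust-no to CUSTOMER-3
    lines = [
        {**line, **product_2[line["prod-no"]]}  # follow prod-no to PRODUCT-2
        for line in order_line_2
        if line["order-no"] == order_no
    ]
    return {"order-no": order_no, **order, **customer, "lines": lines}


form = rebuild_order("00012345")
print(form["cust-name"], len(form["lines"]))  # Mark Campbell 3
```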

Exercise

1) Consider the following table.
• Give an example of an update anomaly, an example of a deletion anomaly and an example of an insertion anomaly, knowing that:
  • A product has many suppliers and can have many other products as a substitute (i.e., a product can be replaced by its substitute).
  • The purchase price is determined by a supplier for a product, while the sale price is for a given product regardless of the supplier.
  • The quantity is for a given product, again regardless of the supplier.


Solution

Update Anomaly:
• Changing the quantity of a product implies updating the quantity once for every supplier and every substitute of the product.

Deletion Anomaly:
• By deleting the only substitute of a product, the whole product entry needs to be removed.

Insertion Anomaly:
• We can’t add a substitute of a product if we do not know the supplier of the product.

Exercise continued

2) Give a schema of a decomposition that avoids such anomalies.

Solution

Product (ProductID, Quantity, SalePrice)
Suppliers (ProductID, SupplierID, PurchasePrice)
Substitutes (ProductID, Substitute)
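The decomposition can be tried out with Python's built-in `sqlite3` module. This is a sketch under assumptions: the primary keys, column types, and all sample values (P1, P2, S1, S2, the prices) are hypothetical, since the exercise's original table is not reproduced here. The `REFERENCES` clauses are descriptive only, because SQLite does not enforce foreign keys unless `PRAGMA foreign_keys = ON` is set.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product (
    ProductID TEXT PRIMARY KEY,
    Quantity  INTEGER,
    SalePrice REAL
);
CREATE TABLE Suppliers (
    ProductID     TEXT REFERENCES Product(ProductID),
    SupplierID    TEXT,
    PurchasePrice REAL,
    PRIMARY KEY (ProductID, SupplierID)
);
CREATE TABLE Substitutes (
    ProductID  TEXT REFERENCES Product(ProductID),
    Substitute TEXT REFERENCES Product(ProductID),
    PRIMARY KEY (ProductID, Substitute)
);
""")

# Insertion: P2 can be recorded with no supplier and no substitute.
conn.execute("INSERT INTO Product VALUES ('P1', 10, 4.50)")
conn.execute("INSERT INTO Product VALUES ('P2', 0, 5.00)")
conn.executemany(
    "INSERT INTO Suppliers VALUES (?, ?, ?)",
    [("P1", "S1", 3.00), ("P1", "S2", 3.20)],
)
conn.execute("INSERT INTO Substitutes VALUES ('P1', 'P2')")

# Update: the quantity lives in one row, however many suppliers P1 has.
rows_updated = conn.execute(
    "UPDATE Product SET Quantity = 8 WHERE ProductID = 'P1'"
).rowcount

# Deletion: removing P1's only substitute leaves the product itself in place.
conn.execute("DELETE FROM Substitutes WHERE ProductID = 'P1'")
products_left = conn.execute("SELECT COUNT(*) FROM Product").fetchone()[0]

print(rows_updated, products_left)  # 1 2
```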

Home Work

• Briefly explain the disadvantages of normalization.