Database Design: FUNCTIONAL DEPENDENCIES, NORMAL FORMS
D. Christozov / G. Tuparov, INF 280 Database Systems: DB design: Normal Forms

Upload: sylvia-harper

Post on 29-Jan-2016



Objectives
• Purpose of normalization.
• Problems associated with redundant data.
• Identification of various types of update anomalies, such as insertion, deletion, and modification anomalies.
• How to recognize the appropriateness or quality of the design of relations.
• Use of functional dependencies to group attributes.
• The process of normalization.
• Normal forms: 1NF, 2NF, 3NF.
• Boyce–Codd normal form (BCNF).


Database Design
• What is relational database design?
– The grouping of attributes to form "good" relation schemas.
• Two levels of relation schemas:
– the logical "user view" level;
– the storage "base relation" level.
• Design is concerned mainly with base relations.
• What are the criteria for "good" base relations?


Data Redundancy


Database Design
• Informal guidelines for good relational design.
• Formal concepts of functional dependencies and normal forms:
– 1NF (First Normal Form);
– 2NF (Second Normal Form);
– 3NF (Third Normal Form);
– BCNF (Boyce-Codd Normal Form).
• Normalization vs. de-normalization.


Database Design: Guideline 1
• GUIDELINE 1: Informally, each tuple in a relation should represent one entity or relationship instance. (Applies to individual relations and their attributes.)
– Attributes of different entities (EMPLOYEEs, DEPARTMENTs, PROJECTs) should not be mixed in the same relation.
– Only foreign keys should be used to refer to other entities.
– Entity and relationship attributes should be kept apart as much as possible.
• Bottom line: Design a schema that can be explained easily, relation by relation. The semantics of the attributes should be easy to interpret.


Database Design: Guideline 1


[Figure: Employee-Department (Ename, SSN, Bdate, Address, DNum, DName, DMGRSSN) and Employee-Project (SSN, PNumber, Hours, Ename, Pname, PLocation)]


Database Design: Anomalies (1)

Redundant Information in Tuples and Update Anomalies
• Mixing attributes of multiple entities may cause problems.
• Information is stored redundantly, wasting storage.
• Problems with update anomalies:
– insertion anomalies;
– deletion anomalies;
– modification anomalies.


Database Design: Anomalies (2)


Database Design: Anomalies (3)
Consider the relation: EMP_PROJ (SSN, PNumber, Hours, Ename, Pname, ...)
• Update anomaly: Changing the name of project P1 from “ProjectX” to “ProjectM” requires the update to be made in the tuple of every employee working on project P1. If any of these tuples is missed, the relation holds inconsistent names for the same project.


[Figure: EMP_PROJ (SSN, PNumber, Hours, Ename, Pname, PLocation)]
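The update anomaly can be reproduced on a toy instance. The following is a minimal Python sketch, assuming a hypothetical EMP_PROJ instance stored as a list of dicts; the employee data and the helpers `rename_project_partially` and `names_for` are made up for illustration. It simulates an update that reaches only one of the two tuples for P1.

```python
# Hypothetical EMP_PROJ instance (made-up data): Pname is repeated per employee.
emp_proj = [
    {"SSN": "111", "PNumber": "P1", "Hours": 10, "Ename": "Smith", "Pname": "ProjectX"},
    {"SSN": "222", "PNumber": "P1", "Hours": 20, "Ename": "Wong", "Pname": "ProjectX"},
    {"SSN": "333", "PNumber": "P2", "Hours": 5, "Ename": "Zelaya", "Pname": "ProjectY"},
]


def rename_project_partially(rows, pnumber, new_name, limit):
    """Rename a project but touch at most `limit` of its tuples (a missed update)."""
    updated = 0
    for row in rows:
        if row["PNumber"] == pnumber and updated < limit:
            row["Pname"] = new_name
            updated += 1
    return updated


def names_for(rows, pnumber):
    """Every Pname stored for one project; more than one value means inconsistency."""
    return {row["Pname"] for row in rows if row["PNumber"] == pnumber}


rename_project_partially(emp_proj, "P1", "ProjectM", limit=1)
print(sorted(names_for(emp_proj, "P1")))  # ['ProjectM', 'ProjectX']
```

If Pname were stored once per project in a separate relation, a single-tuple update would be enough and this inconsistency could not arise.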


Database Design: Anomalies (4)
• Insert anomaly: A project cannot be inserted unless an employee is assigned to it.
• Conversely, an employee cannot be inserted unless he/she is assigned to a project.
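The insertion anomaly shows up as soon as EMP_PROJ is created with {SSN, PNumber} as its primary key. The sketch below uses Python's built-in `sqlite3` module with a hypothetical table definition; the column types, sample values, and helper functions are assumptions. The key columns are declared NOT NULL explicitly because SQLite, unlike the SQL standard, otherwise tolerates NULLs in non-integer primary-key columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical DDL for EMP_PROJ; the composite key {SSN, PNumber} must be non-NULL.
conn.execute(
    """CREATE TABLE emp_proj (
           ssn       TEXT NOT NULL,
           pnumber   TEXT NOT NULL,
           hours     REAL,
           ename     TEXT,
           pname     TEXT,
           plocation TEXT,
           PRIMARY KEY (ssn, pnumber)
       )"""
)


def insert_project_without_employee(conn, pnumber, pname, plocation):
    """Try to record a project that has no employee yet; True on success."""
    try:
        conn.execute(
            "INSERT INTO emp_proj (ssn, pnumber, pname, plocation) VALUES (NULL, ?, ?, ?)",
            (pnumber, pname, plocation),
        )
        return True
    except sqlite3.IntegrityError:
        return False  # NOT NULL on ssn (part of the key) rejects the tuple


def insert_employee_without_project(conn, ssn, ename):
    """Try to record an employee who works on no project; True on success."""
    try:
        conn.execute(
            "INSERT INTO emp_proj (ssn, pnumber, ename) VALUES (?, NULL, ?)",
            (ssn, ename),
        )
        return True
    except sqlite3.IntegrityError:
        return False  # NOT NULL on pnumber (part of the key) rejects the tuple


print(insert_project_without_employee(conn, "P9", "ProjectZ", "Sofia"))  # False
print(insert_employee_without_project(conn, "444", "Borg"))  # False
```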


Database Design: Anomalies (5)
• Delete anomaly: When a project is deleted, all the employees who work on that project are deleted with it. Conversely, if an employee is the sole employee on a project, deleting that employee also deletes the corresponding project.
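Both directions of the delete anomaly can be seen on a small instance. This is a minimal sketch, assuming a hypothetical EMP_PROJ instance as a list of dicts; the data and the helpers `delete_where`, `projects`, and `employees` are made up for illustration.

```python
# Hypothetical EMP_PROJ instance (made-up data).
emp_proj = [
    {"SSN": "111", "PNumber": "P1", "Ename": "Smith", "Pname": "ProjectX"},
    {"SSN": "222", "PNumber": "P1", "Ename": "Wong", "Pname": "ProjectX"},
    {"SSN": "333", "PNumber": "P2", "Ename": "Zelaya", "Pname": "ProjectY"},
]


def delete_where(rows, attr, value):
    """Delete every tuple whose `attr` equals `value` (returns a new list)."""
    return [row for row in rows if row[attr] != value]


def projects(rows):
    """Projects still recorded anywhere in the relation."""
    return {row["PNumber"]: row["Pname"] for row in rows}


def employees(rows):
    """Employees still recorded anywhere in the relation."""
    return {row["SSN"]: row["Ename"] for row in rows}


# Employee 333 is the only one on P2: deleting the employee loses the project.
print(projects(delete_where(emp_proj, "SSN", "333")))  # {'P1': 'ProjectX'}
# Deleting project P2 loses the only record of employee 333.
print(employees(delete_where(emp_proj, "PNumber", "P2")))  # {'111': 'Smith', '222': 'Wong'}
```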


Database Design: Anomalies (6)


[Figure: EMP_DEPT (Ename, SSN, Bdate, Address, DNum, DName, DMGRSSN)]


Database Design: Guideline 2

• GUIDELINE 2: Design a schema that does not suffer from insertion, deletion, and update anomalies. If any are present, note them so that applications can take them into account.


Database Design: Guideline 3

• GUIDELINE 3: Relations should be designed so that their tuples have as few NULL values as possible.
• Attributes that are frequently NULL could be placed in separate relations (together with the primary key).
• Reasons for NULLs:
– the attribute is not applicable or invalid;
– the attribute value is unknown (it may exist);
– the value is known to exist but is unavailable.


Database Design: Guideline 4
• Bad designs for a relational database may produce erroneous results for certain JOIN operations.
• The "lossless join" property is used to guarantee meaningful results for join operations.
• GUIDELINE 4: Relations should be designed to satisfy the lossless join condition. No spurious tuples should be generated by a natural join of any relations.
• There are two important properties of decompositions:
– the non-additive (lossless) join property;
– preservation of the functional dependencies.
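Spurious tuples are easy to produce when relations are joined on an attribute that is a key of neither side. The sketch below is a minimal Python illustration with hypothetical data and helper names (`project`, `natural_join`, `works_on`); it decomposes a three-attribute relation into two projections that share only PLocation and then rejoins them.

```python
def project(rows, attrs):
    """Relational projection: keep only `attrs` and drop duplicate tuples."""
    result = []
    for row in rows:
        t = {a: row[a] for a in attrs}
        if t not in result:
            result.append(t)
    return result


def natural_join(r1, r2):
    """Natural join on all attributes the two relations share."""
    if not r1 or not r2:
        return []
    common = set(r1[0]) & set(r2[0])
    result = []
    for t1 in r1:
        for t2 in r2:
            if all(t1[a] == t2[a] for a in common):
                merged = {**t1, **t2}
                if merged not in result:
                    result.append(merged)
    return result


# Hypothetical instance: two employees on different projects at one location.
works_on = [
    {"SSN": "111", "PNumber": "P1", "PLocation": "Sofia"},
    {"SSN": "222", "PNumber": "P2", "PLocation": "Sofia"},
]

# Bad decomposition: the shared attribute PLocation is a key of neither part.
r1 = project(works_on, ["SSN", "PLocation"])
r2 = project(works_on, ["PNumber", "PLocation"])
joined = natural_join(r1, r2)
spurious = [t for t in joined if t not in works_on]
print(len(joined), len(spurious))  # 4 2
```

The join returns all four SSN/PNumber combinations, two of which never existed, which is exactly what Guideline 4 rules out.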


Database Design: Guideline 4
• The lossless-join property enables us to reconstruct any instance of the original relation from the corresponding instances of the smaller relations. It must be achieved at any cost.
• The dependency preservation property enables us to enforce a constraint on the original relation by enforcing some constraint on each of the smaller relations.
• See 15.4, 15.5 and 15.6 in the textbook.
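For a decomposition of R into exactly two relations, a standard test checks losslessness from the FDs alone: {R1, R2} is lossless if and only if (R1 ∩ R2) → (R1 − R2) or (R1 ∩ R2) → (R2 − R1) follows from the FDs. The following is a minimal Python sketch of that test, assuming FDs are given as (left-hand side, right-hand side) pairs of attribute sets; the function names are hypothetical, and the FDs are the EMP_PROJ examples from the Functional Dependencies slides.

```python
def closure(attrs, fds):
    """Attribute closure X+: everything X determines under the FDs in `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_lossless_binary(r1, r2, fds):
    """{R1, R2} is lossless iff (R1 & R2) -> (R1 - R2) or (R1 & R2) -> (R2 - R1)."""
    common_closure = closure(r1 & r2, fds)
    return (r1 - r2) <= common_closure or (r2 - r1) <= common_closure


FDS = [
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
    ({"SSN", "PNUMBER"}, {"HOURS"}),
]
rest = {"SSN", "PNUMBER", "HOURS", "PNAME", "PLOCATION"}
print(is_lossless_binary({"SSN", "ENAME"}, rest, FDS))        # True: SSN -> ENAME
print(is_lossless_binary({"ENAME", "PLOCATION"}, rest, FDS))  # False: PLOCATION determines nothing else
```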


Functional Dependencies (1)
• A functional dependency is a property of the meaning (or semantics) of the attributes in a relation:
– it describes a relationship between attributes in a relation;
– if A and B are attributes of relation R, B is functionally dependent on A (denoted A → B) if each value of A in R is associated with exactly one value of B in R.
• The determinant of a functional dependency is the attribute or group of attributes on the left-hand side of the arrow.


Functional Dependencies (2)


Functional Dependencies (3)
Examples of FD constraints:
• The social security number determines the employee name:
SSN → ENAME
• The project number determines the project name and location:
PNUMBER → {PNAME, PLOCATION}
• The employee SSN and the project number together determine the hours per week that the employee works on the project:
{SSN, PNUMBER} → HOURS
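A convenient way to reason with such FDs is the attribute closure X+, the set of all attributes that X determines. The sketch below is a minimal Python implementation of the usual fixed-point closure algorithm, assuming FDs are written as (left-hand side, right-hand side) pairs of attribute sets; `closure` is a hypothetical helper name. It encodes exactly the three FDs above and confirms that {SSN, PNUMBER} determines every attribute of EMP_PROJ.

```python
def closure(attrs, fds):
    """Attribute closure X+: repeatedly apply every FD whose LHS is already covered."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


# The three example FDs above, as (left-hand side, right-hand side) pairs.
FDS = [
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
    ({"SSN", "PNUMBER"}, {"HOURS"}),
]
EMP_PROJ = {"SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"}

print(sorted(closure({"SSN"}, FDS)))                 # ['ENAME', 'SSN']
print(sorted(closure({"PNUMBER"}, FDS)))             # ['PLOCATION', 'PNAME', 'PNUMBER']
print(closure({"SSN", "PNUMBER"}, FDS) == EMP_PROJ)  # True: {SSN, PNUMBER} is a superkey
```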


Functional Dependencies (4)
1. X → Y holds if, whenever two tuples have the same value for X, they must have the same value for Y.
2. For any two tuples t1 and t2 in any relation instance r(R): if t1[X] = t2[X], then t1[Y] = t2[Y].
3. X → Y in R specifies a constraint on all relation instances r(R).
4. Written as X → Y; it can be displayed graphically on a relation schema, as in the figures (denoted by an arrow).
5. FDs are derived from the real-world constraints on the attributes.
6. An FD is a property of the attributes in the schema R.
7. The constraint must hold on every relation instance r(R).
8. If K is a key of R, then K functionally determines all attributes in R (since we never have two distinct tuples with t1[K] = t2[K]).
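Point 2 translates directly into a check on one relation instance. The sketch below is a minimal Python version, with a hypothetical helper `fd_holds` and made-up data. Note the limitation implied by points 5 and 7: a single instance can only refute an FD; a True result does not prove that the FD holds on every instance r(R).

```python
def fd_holds(rows, x, y):
    """Test X -> Y on one instance: tuples agreeing on X must agree on Y."""
    seen = {}  # X-value -> the Y-value first observed with it
    for row in rows:
        x_val = tuple(row[a] for a in x)
        y_val = tuple(row[a] for a in y)
        if seen.setdefault(x_val, y_val) != y_val:
            return False  # found t1[X] = t2[X] but t1[Y] != t2[Y]
    return True


# Hypothetical instance of part of EMP_PROJ (made-up data).
rows = [
    {"SSN": "111", "PNUMBER": "P1", "HOURS": 10, "ENAME": "Smith"},
    {"SSN": "111", "PNUMBER": "P2", "HOURS": 5, "ENAME": "Smith"},
    {"SSN": "222", "PNUMBER": "P1", "HOURS": 20, "ENAME": "Wong"},
]
print(fd_holds(rows, ["SSN"], ["ENAME"]))             # True
print(fd_holds(rows, ["SSN", "PNUMBER"], ["HOURS"]))  # True
print(fd_holds(rows, ["PNUMBER"], ["HOURS"]))         # False
```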


The Process of Normalization
• A formal technique for analyzing a relation based on its primary key and the functional dependencies between its attributes.

• Often executed as a series of steps. Each step corresponds to a specific normal form, which has known properties.

• As normalization proceeds, relations become progressively more restricted (stronger) in format and also less vulnerable to update anomalies.


Relationship Between Normal Forms


Normalization: 1NF
Definitions:
• A relation is in 1NF if all attribute values are atomic.
• Atomic: cannot be broken down further:
– NO multivalued attributes;
– NO nested attributes.
• Cure for non-1NF relations:
– A multivalued attribute is eliminated by projection: {DNUM, DNAME, MSSN, DLOC} becomes {DNUM, DNAME, MSSN} and {DNUM, DLOC}.
– A nested attribute is eliminated by flattening: ENAME(FNAME MI LNAME) becomes {FNAME, MI, LNAME}, and ENAME is discarded.
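Both cures can be sketched in a few lines. The following minimal Python example assumes hypothetical, made-up department and employee data in which DLOC is a list and ENAME is a nested dict; `split_multivalued` and `flatten_name` are illustrative names, not part of any library.

```python
# Hypothetical non-1NF data (made up): DLOC is multivalued, ENAME is nested.
departments = [
    {"DNUM": 5, "DNAME": "Research", "MSSN": "333", "DLOC": ["Bellaire", "Sugarland", "Houston"]},
    {"DNUM": 4, "DNAME": "Administration", "MSSN": "987", "DLOC": ["Stafford"]},
]
employee = {"SSN": "123", "ENAME": {"FNAME": "John", "MI": "B", "LNAME": "Smith"}}


def split_multivalued(rows):
    """Project {DNUM, DNAME, MSSN, DLOC} into {DNUM, DNAME, MSSN} and {DNUM, DLOC}."""
    dept = [{k: r[k] for k in ("DNUM", "DNAME", "MSSN")} for r in rows]
    # One tuple per (department, location) pair, so every value is atomic.
    dept_locations = [{"DNUM": r["DNUM"], "DLOC": loc} for r in rows for loc in r["DLOC"]]
    return dept, dept_locations


def flatten_name(emp):
    """Replace the nested ENAME(FNAME MI LNAME) by its components; ENAME is discarded."""
    flat = {k: v for k, v in emp.items() if k != "ENAME"}
    flat.update(emp["ENAME"])
    return flat


dept, dept_locations = split_multivalued(departments)
print(len(dept), len(dept_locations))  # 2 4
print(flatten_name(employee))  # {'SSN': '123', 'FNAME': 'John', 'MI': 'B', 'LNAME': 'Smith'}
```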


Normalization: 2NF
Definitions:
• Prime attribute: an attribute that is a member of the primary key K.
• Non-prime attribute: an attribute that is not a member of the primary key.
• Full functional dependency: an FD Y → Z where removal of any attribute from Y means the FD no longer holds.
Examples:
1) {SSN, PNUMBER} → HOURS is a full FD, since neither SSN → HOURS nor PNUMBER → HOURS holds.
2) {SSN, PNUMBER} → ENAME is not a full FD (it is called a partial dependency), since SSN → ENAME also holds.
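The definition of a full FD can be checked mechanically with attribute closure. This is a minimal Python sketch, assuming the three EMP_PROJ FDs from the earlier examples and hypothetical helper names. It only tries removing one attribute at a time, which is enough: any smaller subset of Y lies inside some Y − {a}, and closure can only grow when attributes are added.

```python
def closure(attrs, fds):
    """Attribute closure X+ under the FDs in `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_full_fd(lhs, rhs, fds):
    """Y -> Z is full if it holds and no Y - {a} still determines Z."""
    if not rhs <= closure(lhs, fds):
        return False  # Y -> Z does not hold at all
    return all(not rhs <= closure(lhs - {a}, fds) for a in lhs)


FDS = [
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
    ({"SSN", "PNUMBER"}, {"HOURS"}),
]
print(is_full_fd({"SSN", "PNUMBER"}, {"HOURS"}, FDS))  # True: full FD
print(is_full_fd({"SSN", "PNUMBER"}, {"ENAME"}, FDS))  # False: partial (SSN -> ENAME)
```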


Normalization: 2NF
• A relation R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key.


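[Figure: Employee-Project (SSN, PNumber, Hours, EName, PName, PLocation) with fd1: {SSN, PNumber} → Hours; fd2: SSN → EName; fd3: PNumber → {PName, PLocation}. 2NF decomposition: (SSN, PNumber, Hours), (SSN, EName), (PNumber, PName, PLocation).]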
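The 2NF decomposition in the figure can be derived by looking for non-prime attributes that depend on a proper subset of the primary key. The following is a simplified Python sketch, assuming a single primary key and FDs as (LHS, RHS) set pairs; `decompose_2nf` is a hypothetical name, and the sketch does not go on to check the new relations for transitive dependencies (that is the 3NF step).

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure X+ under the FDs in `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def decompose_2nf(attrs, pk, fds):
    """Move non-prime attributes that depend on part of `pk` into their own relations."""
    nonprime = attrs - pk
    assigned = set()
    parts = []
    for size in range(1, len(pk)):  # proper, non-empty subsets of the key
        for subset in combinations(sorted(pk), size):
            deps = (closure(set(subset), fds) & nonprime) - assigned
            if deps:
                parts.append(frozenset(subset) | deps)
                assigned |= deps
    # Whatever is left depends fully on the whole key and stays with it.
    parts.insert(0, frozenset(pk | (nonprime - assigned)))
    return parts


FDS = [
    ({"SSN", "PNUMBER"}, {"HOURS"}),      # fd1
    ({"SSN"}, {"ENAME"}),                 # fd2
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),  # fd3
]
EMP_PROJ = {"SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"}
for part in decompose_2nf(EMP_PROJ, {"SSN", "PNUMBER"}, FDS):
    print(sorted(part))
```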


Normalization: 3NF
Definition: Transitive functional dependency: an FD X → Z that can be derived from two FDs X → Y and Y → Z.
Examples:
1. SSN → DMGRSSN is a transitive FD, since SSN → DNUMBER and DNUMBER → DMGRSSN hold.
2. SSN → ENAME is non-transitive, since there is no set of attributes X where SSN → X and X → ENAME.
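Both examples can be checked by searching the listed FDs for a one-step chain X → Y → Z. This is a minimal Python sketch with a hypothetical helper `intermediates`; to keep it short, every FD has a single attribute on each side, and SSN → BDATE, SSN → ADDRESS, and DNUMBER → DNAME are assumptions taken from the Employee-Department schema rather than FDs stated on this slide. It only inspects listed FDs, not every FD derivable from them.

```python
# EMP_DEPT dependencies, one attribute per side (partly assumed, see above).
FDS = [
    ("SSN", "ENAME"), ("SSN", "BDATE"), ("SSN", "ADDRESS"), ("SSN", "DNUMBER"),
    ("DNUMBER", "DNAME"), ("DNUMBER", "DMGRSSN"),
]


def intermediates(fds, x, z):
    """Attributes Y such that X -> Y and Y -> Z are both listed FDs."""
    return {y for (a, y) in fds if a == x and y != z and (y, z) in fds}


print(intermediates(FDS, "SSN", "DMGRSSN"))  # {'DNUMBER'}: transitive
print(intermediates(FDS, "SSN", "ENAME"))    # set(): non-transitive
```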


Normalization: 3NF

• A relation R is in third normal form (3NF) if it is in 2NF and no non-prime attribute A in R is transitively dependent on the primary key.

NOTE: In X → Y and Y → Z, with X as the primary key, we consider this a problem only if Y is not a candidate key. When Y is a candidate key, there is no problem with the transitive dependency. E.g., consider EMP (SSN, Emp#, Salary). Here SSN → Emp# → Salary, and Emp# is a candidate key.
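The NOTE matches the general form of the 3NF test: an FD X → A is acceptable if X is a superkey or A is prime. The sketch below is a minimal Python version under that formulation, assuming FDs as (LHS, RHS) set pairs and brute-force candidate-key search (fine for small schemas). The helper names are hypothetical, and "EMPNO" stands in for Emp#; the EMP FDs assume Emp# determines SSN and Salary, as being a candidate key requires.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure X+ under the FDs in `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attrs, fds):
    """Minimal attribute sets whose closure is the whole relation (brute force)."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            c = set(combo)
            if closure(c, fds) >= attrs and not any(k <= c for k in keys):
                keys.append(c)
    return keys


def violations_3nf(attrs, fds):
    """(determinant, attribute) pairs where X is no superkey and A is non-prime."""
    prime = set().union(*candidate_keys(attrs, fds))
    bad = []
    for lhs, rhs in fds:
        if closure(lhs, fds) >= attrs:
            continue  # X is a superkey, so X -> A is fine
        for a in sorted(rhs - lhs - prime):
            bad.append((tuple(sorted(lhs)), a))
    return bad


EMP_DEPT = {"ENAME", "SSN", "BDATE", "ADDRESS", "DNUMBER", "DNAME", "DMGRSSN"}
EMP_DEPT_FDS = [
    ({"SSN"}, {"ENAME", "BDATE", "ADDRESS", "DNUMBER"}),
    ({"DNUMBER"}, {"DNAME", "DMGRSSN"}),
]
EMP = {"SSN", "EMPNO", "SALARY"}
EMP_FDS = [({"SSN"}, {"EMPNO", "SALARY"}), ({"EMPNO"}, {"SSN", "SALARY"})]

print(violations_3nf(EMP_DEPT, EMP_DEPT_FDS))  # DNUMBER -> DMGRSSN, DNUMBER -> DNAME
print(violations_3nf(EMP, EMP_FDS))            # []: Emp# is a candidate key
```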


Normalization: 3NF


[Figure: Employee-Department (Ename, SSN, BDate, Address, DNum, DName, DMGRSSN), with SSN → DNum and DNum → {DName, DMGRSSN}, decomposed into (Ename, SSN, BDate, Address, DNum) and (DNum, DName, DMGRSSN).]


Normalization: 3NF

• Converting from 2NF to 3NF:
– Identify the primary key in the 2NF relation.
– Identify the functional dependencies in the relation.
– If transitive dependencies on the primary key exist, remove them by placing the dependent attributes in a new relation along with a copy of their determinant.
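The last step can be sketched as a single split: the dependent attributes move to a new relation together with their determinant, and the determinant stays behind as a foreign key. The split is lossless because the shared attributes (the determinant) determine the rest of the new relation. This minimal Python sketch uses a hypothetical `split_transitive` helper and the EMP_DEPT FDs; it refuses to split when the claimed determinant does not actually determine the dependents.

```python
def closure(attrs, fds):
    """Attribute closure X+ under the FDs in `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def split_transitive(attrs, fds, determinant, dependents):
    """Move `dependents` to a new relation with a copy of their determinant."""
    if not dependents <= closure(determinant, fds):
        raise ValueError("determinant does not functionally determine dependents")
    remaining = frozenset(attrs - dependents)  # keeps the determinant as a foreign key
    new_relation = frozenset(determinant | dependents)
    return remaining, new_relation


EMP_DEPT = {"ENAME", "SSN", "BDATE", "ADDRESS", "DNUMBER", "DNAME", "DMGRSSN"}
FDS = [
    ({"SSN"}, {"ENAME", "BDATE", "ADDRESS", "DNUMBER"}),
    ({"DNUMBER"}, {"DNAME", "DMGRSSN"}),
]
ed1, ed2 = split_transitive(EMP_DEPT, FDS, {"DNUMBER"}, {"DNAME", "DMGRSSN"})
print(sorted(ed1))  # ['ADDRESS', 'BDATE', 'DNUMBER', 'ENAME', 'SSN']
print(sorted(ed2))  # ['DMGRSSN', 'DNAME', 'DNUMBER']
```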


General Definitions of 2NF and 3NF

• Second normal form (2NF): A relation R that is in 1NF is in 2NF if no non-prime attribute A (an attribute that is not part of any candidate key) is partially functionally dependent on any candidate key of R.

• Third normal form (3NF): A relation R that is in 2NF is in 3NF if no non-prime attribute A is transitively dependent on any candidate key of R.


Normalization: BCNF
• A relation R is in Boyce-Codd Normal Form (BCNF) if, whenever a nontrivial FD X → A holds in R, X is a superkey of R.

Notes:
• Each normal form is strictly stronger than the previous one:
– every 2NF relation is in 1NF;
– every 3NF relation is in 2NF;
– every BCNF relation is in 3NF.
• There exist relations that are in 3NF but not in BCNF.
• The goal is to have each relation in BCNF (or 3NF).
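The BCNF condition is a direct superkey test on each nontrivial FD. The sketch below is a minimal Python version with hypothetical helper names, applied to a hypothetical TEACH (STUDENT, COURSE, INSTRUCTOR) relation with {STUDENT, COURSE} → INSTRUCTOR and INSTRUCTOR → COURSE. That relation is in 3NF, because COURSE belongs to the candidate key {STUDENT, COURSE}, but INSTRUCTOR → COURSE violates BCNF since INSTRUCTOR is not a superkey.

```python
def closure(attrs, fds):
    """Attribute closure X+ under the FDs in `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(attrs, fds):
    """Nontrivial FDs X -> A whose determinant X is not a superkey."""
    return [
        (tuple(sorted(lhs)), tuple(sorted(rhs)))
        for lhs, rhs in fds
        if not rhs <= lhs and not closure(lhs, fds) >= attrs
    ]


TEACH = {"STUDENT", "COURSE", "INSTRUCTOR"}
TEACH_FDS = [
    ({"STUDENT", "COURSE"}, {"INSTRUCTOR"}),
    ({"INSTRUCTOR"}, {"COURSE"}),
]
print(bcnf_violations(TEACH, TEACH_FDS))  # [(('INSTRUCTOR',), ('COURSE',))]
```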


BCNF
• A relation R is in BCNF if, whenever a nontrivial functional dependency X → A holds in R, X is a superkey of R.
• BCNF is based on functional dependencies that take into account all candidate keys in a relation.
• For a relation with only one candidate key, 3NF and BCNF are equivalent.
• A relation is in BCNF if and only if every determinant is a candidate key.
• Violation of BCNF may occur in a relation that:
– contains two (or more) composite candidate keys;
– whose candidate keys overlap, sharing at least one attribute.
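The overlapping-keys situation can be made concrete by enumerating candidate keys and checking every determinant against them. This minimal Python sketch reuses the hypothetical TEACH (STUDENT, COURSE, INSTRUCTOR) relation with {STUDENT, COURSE} → INSTRUCTOR and INSTRUCTOR → COURSE; the helper names are illustrative, and the brute-force key search is only practical for small schemas.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure X+ under the FDs in `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attrs, fds):
    """Minimal attribute sets whose closure is the whole relation (brute force)."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            c = set(combo)
            if closure(c, fds) >= attrs and not any(k <= c for k in keys):
                keys.append(c)
    return keys


def non_key_determinants(attrs, fds):
    """Determinants that are not candidate keys (each one breaks BCNF)."""
    keys = candidate_keys(attrs, fds)
    return [sorted(lhs) for lhs, _ in fds if set(lhs) not in keys]


TEACH = {"STUDENT", "COURSE", "INSTRUCTOR"}
TEACH_FDS = [
    ({"STUDENT", "COURSE"}, {"INSTRUCTOR"}),
    ({"INSTRUCTOR"}, {"COURSE"}),
]
keys = candidate_keys(TEACH, TEACH_FDS)
print([sorted(k) for k in keys])  # [['COURSE', 'STUDENT'], ['INSTRUCTOR', 'STUDENT']]
print(keys[0] & keys[1])          # {'STUDENT'}: two composite keys that overlap
print(non_key_determinants(TEACH, TEACH_FDS))  # [['INSTRUCTOR']]
```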


Example 1


Attribute Description

Eqpt_number Order number of a piece of equipment of the given type

Location Place where the piece of equipment operates

Oprtr_ID_1 Identifier of the first-shift operator (the person who operates that equipment)

Oprtr_Name_1 Name of the first-shift operator

Oprtr_Phone_1 Phone number of the first-shift operator

Oprtr_ID_2 Identifier of the second-shift operator (the person who operates that equipment)

Oprtr_Name_2 Name of the second-shift operator

Oprtr_Phone_2 Phone number of the second-shift operator

Oprtr_ID_3 Identifier of the third-shift operator (the person who operates that equipment)

Oprtr_Name_3 Name of the third-shift operator

Oprtr_Phone_3 Phone number of the third-shift operator

Eqpt_Type Type of the equipment

Producer Producer of this type of equipment

Inst_Date Date of installation of the piece of equipment

Cons_Power Electricity consumed by this type of equipment

Maint_Time Required maintenance period for this type of equipment

Maint_Lst_Date Date of the last maintenance

Maint_ID Identifier of the person responsible for the last maintenance

Maint_Name Name of the person responsible for the last maintenance

Maint_Phone Phone of the person responsible for the last maintenance

Maint_Act_1 Description of the first activity in the last maintenance

Maint_Act_2 Description of the second activity in the last maintenance

Maint_Act_3 Description of the third activity in the last maintenance

“Equipment” holds data about the usage and maintenance activities of equipment in an enterprise.

Notes:

• Eqpt_Number, Eqpt_Type, Location, and Maint_Lst_Date compose the primary key; together they identify any piece of equipment and its last maintenance activities.

• Producer, Cons_Power, and Maint_Time are specific to an equipment type, identified by Eqpt_Type.

• Operators (from any shift) operate a particular piece of equipment and are identified by Oprtr_ID. The relation keeps track of their names and phones.

• For any piece of equipment, maintenance is done periodically or when it fails.

• Maintenance activities are a predefined list for a given type of equipment. For the maintenance of a particular piece of equipment, up to three activities are listed.

• The relation holds information about the person who performed the maintenance activities, identified by Maint_ID, including their name and phone.


Example 2: Chains’ supplier


Attribute Description

Code Identifier of the product

Name Supplier’s name of the product

Price Supplier’s regular price

Dealer Identifier of the Dealer

DName Dealer’s name

DCode Dealer’s identifier of the product

Dprice Dealer’s price of the product

DPhone Dealer’s phone

DPName Dealer’s name of the product

DQuantity Quantity of the product ordered by the Dealer

Date Date of the order


Example 3: Home-delivery pizzas


Attribute Description

Order_id Identifier of the order (generated automatically)

Pizza_id Identifier of the product

Quantity Quantity of the product purchased by the customer

Ingredient_id Identifier of an additional ingredient ordered (one or more)

Description Description of the ingredient

Unit Measurement unit used for the ingredient

Amount Quantity per unit: quantity of the ingredient used in one unit of the product

I_price Price of the ingredient

Name Name of the customer

Address Address of the customer

Price Price of the product


Q & A

Attention! Next class: Quiz 3: Normal Forms
