Part 6, Chapter 15: Normalization of Relational Database (CSCI455)


Posted on 20-Dec-2015




Objectives

• Design methodologies
• Goodness of design
• Functional dependencies
• The normalization process and normal forms
  – First, second, third, BCNF
• Pros and cons of normalization

Design Methodology

• A database system can be designed via
  – Bottom-up design (design by synthesis)
  – Top-down design (design by analysis)

Bottom-up Design

• Starts with the basic relationships between pairs of attributes
• Uses this information to construct the relations
• Neither scalable nor practical

Top-down Design

• The design process
  – Starts with one relation (the set of all attributes)
  – Decomposes it into groups
• Use ER to model the conceptual schema
• Use existing design knowledge or experience
  – Map each entity into a table schema
  – Analyze each table schema for goodness
• Possible refinement and/or decomposition

Informal Design Guidelines for Relational Schemas

• Informal design metrics
  – Semantics of the related attributes
  – Reducing the redundant values in tuples
  – Minimizing the NULL values
  – Disallowing spurious tuples

Semantics of the Relation Attributes

• Based on the semantics of attributes, i.e., how the attribute values in a tuple relate to one another
  – A schema should capture facts about one entity type or one relationship type

[Figure 10-2]

Guideline 1

• Design a relation schema so that it is easy to explain its meaning
  – Do not combine attributes from multiple entity types and relationship types into a single relation

[Figure 10-3]

Considered POOR designs! Why?

Redundant Information in Tuples and Update Anomalies

• An important objective of schema design is
  – to minimize storage space and effort
  – to minimize problems resulting from updates
• Example
  – Compare the relations in Fig. 15.2 with those in Fig. 15.4

[Figure 10-2]

[Figure 10-4]

Update Anomalies

• Update anomalies
  – Insertion anomalies
  – Deletion anomalies
  – Modification anomalies

Insertion Anomalies

• Insertion anomalies
  • Consistency:
    – E.g., to insert a new employee
      » we need to insert ALL attributes of the employee's department consistently,
      » or insert NULLs if the employee does not work for a department
  • NULL values:
    – E.g., to insert a new department with no employees, SSN would have to be NULL
      » This violates entity integrity, because SSN (the primary key) cannot be NULL
  • E.g., EMP_DEPT in Fig. 15.4
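The NULL-value insertion anomaly can be reproduced in a minimal sketch with Python's built-in sqlite3 module. The EMP_DEPT columns below are an assumed subset of the relation in Fig. 15.4, the rows are hypothetical, and insert_department_without_employee is a hypothetical helper; the NOT NULL primary key stands in for entity integrity.

```python
import sqlite3

# Hypothetical, simplified EMP_DEPT: employee facts and department facts in one table.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE EMP_DEPT (
        SSN     TEXT NOT NULL PRIMARY KEY,  -- entity integrity: the PK may not be NULL
        ENAME   TEXT,
        DNUMBER INTEGER,
        DNAME   TEXT,
        DMGRSSN TEXT
    )
""")
conn.execute("INSERT INTO EMP_DEPT VALUES ('111', 'Smith', 5, 'Research', '333')")


def insert_department_without_employee(dnumber, dname, mgrssn):
    """Try to record a department that has no employees yet (hypothetical helper)."""
    try:
        conn.execute(
            "INSERT INTO EMP_DEPT (SSN, DNUMBER, DNAME, DMGRSSN) VALUES (NULL, ?, ?, ?)",
            (dnumber, dname, mgrssn),
        )
        return True
    except sqlite3.IntegrityError:
        # Rejected: the new tuple would have SSN = NULL.
        return False


accepted = insert_department_without_employee(6, "Sales", "444")
print(accepted)  # False
```

NOT NULL is spelled out because SQLite, unlike most systems, accepts NULL in a non-INTEGER PRIMARY KEY column unless that column is also declared NOT NULL.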

[Figure 10-4]

Deletion Anomalies

• Deletion anomalies
  – Loss of information
• E.g., deleting the very last employee who works for DNUMBER = 1 from EMP_DEPT also deletes all information about department 1
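A small sketch, using hypothetical EMP_DEPT tuples held in a Python list, shows how deleting the last employee of department 1 also removes the only record of that department. The helper departments is hypothetical.

```python
# Hypothetical EMP_DEPT tuples: (SSN, ENAME, DNUMBER, DNAME, DMGRSSN)
emp_dept = [
    ("111", "Smith", 5, "Research", "333"),
    ("222", "Wong", 5, "Research", "333"),
    ("444", "Borg", 1, "Headquarters", "444"),
]


def departments(rows):
    """Department facts that can still be recovered from EMP_DEPT."""
    return {(dnumber, dname, mgr) for _, _, dnumber, dname, mgr in rows}


before = departments(emp_dept)
# Delete the very last employee who works for DNUMBER = 1.
emp_dept = [row for row in emp_dept if row[0] != "444"]
after = departments(emp_dept)

print(before - after)  # {(1, 'Headquarters', '444')}: department 1 is gone too
```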

[Figure 10-4]

Modification Anomalies

• Modification anomalies
  – Change one, change all
• E.g., changing a department's manager or department number requires updating the tuple of every employee in that department
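The modification anomaly can be sketched the same way: the department manager is stored once per employee, so a partial update leaves EMP_DEPT inconsistent. The tuples and the helpers managers_of and change_manager are hypothetical.

```python
# Hypothetical EMP_DEPT tuples; DMGRSSN is repeated for every employee of a department.
emp_dept = [
    {"SSN": "111", "DNUMBER": 5, "DMGRSSN": "333"},
    {"SSN": "222", "DNUMBER": 5, "DMGRSSN": "333"},
    {"SSN": "555", "DNUMBER": 5, "DMGRSSN": "333"},
]


def managers_of(rows, dnumber):
    """All manager SSNs recorded for one department."""
    return {row["DMGRSSN"] for row in rows if row["DNUMBER"] == dnumber}


def change_manager(rows, dnumber, new_mgr):
    """Correct update: every tuple of the department must change ("change all")."""
    for row in rows:
        if row["DNUMBER"] == dnumber:
            row["DMGRSSN"] = new_mgr


# Careless update: only ONE tuple of department 5 is changed.
emp_dept[0]["DMGRSSN"] = "999"
print(sorted(managers_of(emp_dept, 5)))  # ['333', '999'] -> inconsistent

change_manager(emp_dept, 5, "999")
print(sorted(managers_of(emp_dept, 5)))  # ['999']
```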

[Figure 10-4]

Guideline 2

• Design anomaly-free base relation schemas
  – How? Use formal approaches to validate the design against these guidelines

Null Values in Tuples

• NULLs arise when a set of attributes does not apply to all tuples
  – E.g., a student's phone number
    • Not every student has a cell phone or a work phone
• Guideline 3
  – Stay away from attributes with NULL values in the base tables
    • NULLs waste storage, are difficult to interpret, and complicate aggregate functions and operations involving comparisons (e.g., the join operation)
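The effect of NULLs on aggregates and comparisons can be observed with Python's sqlite3 module. The STUDENT table and its WORK_PHONE column are hypothetical, modeled on the student phone example above; the behavior shown (COUNT(column) skipping NULLs, and NULL = NULL not evaluating to true) is standard SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STUDENT (SID INTEGER PRIMARY KEY, NAME TEXT, WORK_PHONE TEXT)")
conn.executemany(
    "INSERT INTO STUDENT VALUES (?, ?, ?)",
    [(1, "Ann", "555-0100"), (2, "Bob", None), (3, "Cy", None)],
)

# COUNT(*) counts rows; COUNT(WORK_PHONE) silently skips NULLs.
total, with_phone = conn.execute(
    "SELECT COUNT(*), COUNT(WORK_PHONE) FROM STUDENT"
).fetchone()

# NULL = NULL is not true, so tuples with NULL never match in an equijoin.
matches = conn.execute("""
    SELECT COUNT(*) FROM STUDENT AS a JOIN STUDENT AS b
    ON a.WORK_PHONE = b.WORK_PHONE
""").fetchone()[0]

print(total, with_phone, matches)  # 3 1 1
```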

Generation of Spurious (or Invalid) Tuples

• Refers to the undesirable decomposition of a relation
  – E.g., EMP_LOC and EMP_PROJ1
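A minimal sketch of how spurious tuples appear, assuming a simplified EMP_PROJ with only SSN, ENAME, PNUMBER and PLOCATION (the textbook relations carry more attributes) and hypothetical data. The relation is split into EMP_LOC and EMP_PROJ1 that share only the non-key attribute PLOCATION; joining them back on PLOCATION produces tuples that were never in the original.

```python
# Hypothetical EMP_PROJ tuples: (SSN, ENAME, PNUMBER, PLOCATION).
# Two different projects happen to share the location "Bellaire".
emp_proj = {
    ("123", "Smith", 1, "Bellaire"),
    ("453", "English", 2, "Bellaire"),
}

# Decompose so that the only common attribute is PLOCATION (a non-key attribute).
emp_loc = {(ename, ploc) for _, ename, _, ploc in emp_proj}
emp_proj1 = {(ssn, pno, ploc) for ssn, _, pno, ploc in emp_proj}

# Natural join of EMP_LOC and EMP_PROJ1 on PLOCATION.
joined = {
    (ssn, ename, pno, ploc)
    for ename, ploc in emp_loc
    for ssn, pno, ploc2 in emp_proj1
    if ploc == ploc2
}

spurious = joined - emp_proj
print(len(joined), len(spurious))  # 4 2
```

Joining on a PK/FK pair instead of a non-key attribute is what Guideline 4 asks for.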

[Figure 10-5]

[Figure 10-6]

Guideline 4

• Design relation schemas so that they can be JOINed with equality conditions on attributes that are either PKs or FKs

Summary and Discussion of Design Guidelines

• Following the guidelines above helps avoid these problems:
  1. Anomalies that cause redundant work during insertion, deletion, and modification
  2. Waste of storage space due to NULLs
  3. Generation of invalid and spurious data during joins on base relations using non-key attributes

Functional Dependencies

• A constraint between two sets of attributes X and Y of R such that
  – for any two tuples $t_1$ and $t_2$ in r(R),
    • if $t_1[X] = t_2[X]$, then $t_1[Y] = t_2[Y]$
• Used to define normal forms
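The definition translates directly into a check over one relation state. The function fd_holds is a hypothetical helper and the rows are made up. Note that an instance can show that an FD is violated, but a True result only means this particular instance does not violate it.

```python
def fd_holds(rows, X, Y):
    """Return False if some t1, t2 in rows have t1[X] = t2[X] but t1[Y] != t2[Y].

    rows: list of dicts mapping attribute name -> value; X, Y: tuples of names.
    """
    y_for_x = {}
    for t in rows:
        x_val = tuple(t[a] for a in X)
        y_val = tuple(t[a] for a in Y)
        # First time this X value is seen, remember its Y value; afterwards compare.
        if y_for_x.setdefault(x_val, y_val) != y_val:
            return False
    return True


# Hypothetical WORKS_ON-style instance.
rows = [
    {"SSN": "123", "PNUMBER": 1, "HOURS": 32.5},
    {"SSN": "123", "PNUMBER": 2, "HOURS": 7.5},
    {"SSN": "453", "PNUMBER": 1, "HOURS": 20.0},
]
print(fd_holds(rows, ("SSN", "PNUMBER"), ("HOURS",)))  # True
print(fd_holds(rows, ("SSN",), ("HOURS",)))             # False
```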

Functional Dependencies (FD): Formal Definition

• Represented by $X \rightarrow Y$
  – X functionally determines Y, or
  – Y functionally depends on X
  – For each X value, there is ONLY one Y value
• If $X \rightarrow R$ (X determines every attribute of R) and no proper subset of X does, then X is a candidate key (CK)
• Note: an FD is a property of the semantics, or meaning, of the attributes
  – It constrains the legal relation states (legal extensions) of R

Properties of Functional Dependencies (FDs)

• A functional dependency is a schema-based dependency
  – It is a semantic notion
  – Identifying FDs is part of the process of understanding what the data means

[Figure 10-3]

(b) EMP_PROJ
  SSN → ENAME
  PNUMBER → {PNAME, PLOCATION}
  {SSN, PNUMBER} → HOURS
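Given the EMP_PROJ dependencies listed for Fig. 10-3(b), the standard attribute-closure algorithm (repeatedly add the right-hand side of any FD whose left-hand side is already covered) shows which attributes a set determines. The function closure and the encoding of FDs as pairs of frozensets are illustrative choices, not part of the source.

```python
def closure(attrs, fds):
    """Compute X+, all attributes determined by attrs under fds.

    fds: list of (lhs, rhs) pairs of frozensets of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


F = [
    (frozenset({"SSN"}), frozenset({"ENAME"})),
    (frozenset({"PNUMBER"}), frozenset({"PNAME", "PLOCATION"})),
    (frozenset({"SSN", "PNUMBER"}), frozenset({"HOURS"})),
]
R = {"SSN", "ENAME", "PNUMBER", "PNAME", "PLOCATION", "HOURS"}

print(closure({"SSN", "PNUMBER"}, F) == R)  # True: {SSN, PNUMBER} is a superkey
print(sorted(closure({"SSN"}, F)))          # ['ENAME', 'SSN']
```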

Important Notes on FDs

• Legal extensions (or legal relation states)
  – The extensions r(R) that satisfy the functional dependency constraints
• An FD is a property of the relation schema, not of a particular relation extension
  – A given extension can show that an FD does not hold, but cannot by itself prove that it does

[Figure 10-7]

FD1: TEXT → COURSE? Yes or no

FD2: TEACHER → COURSE? No

FD3: COURSE → TEACHER? No

Normalization

• Normalization theory
  – Is built around the concept of normal forms
  – Is used in the design process
• A relation is in a particular normal form if it satisfies a specified set of requirements
  – E.g., 1NF: all underlying domains MUST contain only atomic values

Normal Forms

• Types of normal forms
  – 1NF
  – 2NF
  – 3NF
  – BCNF
  – 4NF
  – 5NF (PJ/NF)
  – DKNF (absolute normal form)

Relationships of Normal Forms

[Figure: nested normal forms 1NF, 2NF, 3NF/BCNF, 4NF, 5NF, DKNF, each contained in the one before it]

First Normal Form (1NF)

• 1NF disallows
  – multivalued attributes
  – composite attributes
  – combinations of the above
• See Fig. 15.8
• See Fig. 15.9
  – nested relations, or multivalued composite attributes
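A sketch of one common way to bring a relation into 1NF: move the multivalued attribute into its own relation together with the key, so every value is atomic. The DEPARTMENT rows, the DLOCATIONS set, and the helper to_1nf are hypothetical.

```python
# Hypothetical non-1NF DEPARTMENT: DLOCATIONS holds a set of values.
department = [
    {"DNUMBER": 5, "DNAME": "Research", "DLOCATIONS": {"Bellaire", "Sugarland", "Houston"}},
    {"DNUMBER": 4, "DNAME": "Administration", "DLOCATIONS": {"Stafford"}},
]


def to_1nf(rows):
    """Split the multivalued attribute into DEPARTMENT and DEPT_LOCATIONS."""
    dept = [{"DNUMBER": r["DNUMBER"], "DNAME": r["DNAME"]} for r in rows]
    # One atomic (DNUMBER, DLOCATION) tuple per location.
    dept_locations = sorted((r["DNUMBER"], loc) for r in rows for loc in r["DLOCATIONS"])
    return dept, dept_locations


dept, dept_locations = to_1nf(department)
print(dept_locations)
# [(4, 'Stafford'), (5, 'Bellaire'), (5, 'Houston'), (5, 'Sugarland')]
```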

[Figure 10-8]

[Figure 10-9]

Second Normal Form (2NF)

• Based on the concept of full functional dependency
• Analogy to the traditional courtroom oath ("the truth, the whole truth, and nothing but the truth"):
  – Every non-key attribute depends on the key, the whole key, and nothing but the key
• R is in 2NF iff
  – R is in 1NF, and
  – Every non-key attribute is fully functionally dependent on the PK
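Applied to EMP_PROJ, the 2NF test asks whether any proper subset of the primary key determines a non-key attribute. The sketch reuses an attribute-closure helper; the function names are hypothetical and the FDs are the ones listed for Fig. 10-3(b). A non-empty result means the relation is not in 2NF.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure X+ under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(pk, nonkey, fds):
    """Map each proper subset of the PK to the non-key attributes it determines."""
    found = {}
    for size in range(1, len(pk)):
        for part in combinations(sorted(pk), size):
            determined = closure(part, fds) & nonkey
            if determined:
                found[part] = sorted(determined)
    return found


F = [
    (frozenset({"SSN"}), frozenset({"ENAME"})),
    (frozenset({"PNUMBER"}), frozenset({"PNAME", "PLOCATION"})),
    (frozenset({"SSN", "PNUMBER"}), frozenset({"HOURS"})),
]
PK = {"SSN", "PNUMBER"}
NONKEY = {"ENAME", "PNAME", "PLOCATION", "HOURS"}

print(partial_dependencies(PK, NONKEY, F))
# {('PNUMBER',): ['PLOCATION', 'PNAME'], ('SSN',): ['ENAME']}
```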

Normalization into 2NF and 3NF

[Figure 10-10]

Third Normal Form

• Based on the concept of transitive dependency
• Relation R is in 3NF iff
  – R is in 2NF, and
  – Every non-key attribute is non-transitively dependent on the PK

[Figure 10-10]

Interpretation of 3NF

• Formal definition
  – R is in 3NF if, whenever a nontrivial functional dependency $X \rightarrow Y$ holds in R, either
    • X is a superkey of R, or
    • Y is a prime attribute of R
• E.g.,
  – LOTS2 in Fig. 15.12(b) is in 3NF
  – LOTS1 in Fig. 15.12(b) is NOT in 3NF, because of FD4
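The formal definition can be checked mechanically: compute the candidate keys (and hence the prime attributes), then flag any FD whose left-hand side is not a superkey and whose right-hand-side attribute is not prime. This is a brute-force sketch that only examines the FDs it is given. The LOTS1 dependencies (FD1, FD2, FD4) and the LOTS2 dependency (FD3) are an assumption based on the textbook's LOTS example in Fig. 15.12, and attribute names such as Lot_num are simplified.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure X+ under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(R, fds):
    """Brute force: minimal attribute sets whose closure is all of R."""
    keys = []
    for size in range(1, len(R) + 1):
        for combo in combinations(sorted(R), size):
            s = set(combo)
            if any(k <= s for k in keys):
                continue  # a superset of a known key is not minimal
            if closure(s, fds) >= R:
                keys.append(s)
    return keys


def third_nf_violations(R, fds):
    """Given FDs X -> A (A not in X) where X is not a superkey and A is not prime."""
    prime = set().union(*candidate_keys(R, fds))
    return [
        (sorted(lhs), a)
        for lhs, rhs in fds
        for a in sorted(rhs - lhs)
        if not closure(lhs, fds) >= R and a not in prime
    ]


R_LOTS1 = {"Property_id", "County_name", "Lot_num", "Area", "Price"}
F_LOTS1 = [
    (frozenset({"Property_id"}), frozenset({"County_name", "Lot_num", "Area", "Price"})),  # FD1
    (frozenset({"County_name", "Lot_num"}), frozenset({"Property_id", "Area", "Price"})),  # FD2
    (frozenset({"Area"}), frozenset({"Price"})),  # FD4
]
R_LOTS2 = {"County_name", "Tax_rate"}
F_LOTS2 = [(frozenset({"County_name"}), frozenset({"Tax_rate"}))]  # FD3

print(third_nf_violations(R_LOTS1, F_LOTS1))  # [(['Area'], 'Price')]
print(third_nf_violations(R_LOTS2, F_LOTS2))  # []
```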


Alternative Definition of 3NF

• A relation schema R is in 3NF if every nonprime attribute of R satisfies both of the following conditions:
  – It is fully functionally dependent on every key of R
  – It is non-transitively dependent on every key of R

Boyce/Codd NF

• Boyce-Codd normal form
  – A stricter normal form than 3NF
    • If R is in BCNF, then R is also in 3NF
    • R being in 3NF does not imply that R is in BCNF
  – Attempts to eliminate redundancy that 3NF cannot detect

Example

• Suppose we have thousands of lots in the relation, but the lots are from only two counties
  – DeKalb and Fulton
• Let's say the lot sizes in
  – DeKalb are 0.5, …, 1.0 acres
  – Fulton are 1.1, 1.2, …, 1.9, 2.0 acres
• Also assume that
  – FD5: Area → County_Name

[Figure 10-12]

Boyce/Codd NF (Cont.)

• A relation R is in Boyce/Codd normal form (BCNF) iff
  – Every determinant is a candidate key (CK)
• (i.e., each attribute MUST describe the key, the whole key, and nothing but the key)
• Eliminates all redundancy that can be detected using FDs (GOOD)
• Considered the most desirable NF

Example

• Consider a relation TEACH with
  – FD1: {Student, Course} → Instructor
  – FD2: Instructor → Course
• The relation is in 3NF
• Is it in BCNF? No: Instructor is a determinant (FD2), but it is not a candidate key
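For TEACH, a BCNF check only needs the closure of each left-hand side: any nontrivial FD whose left-hand side does not determine all of R is a violation. The helper names are hypothetical; FD1 and FD2 are taken from the example.

```python
def closure(attrs, fds):
    """Attribute closure X+ under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(R, fds):
    """Nontrivial given FDs whose left-hand side is not a superkey of R."""
    return [
        (sorted(lhs), sorted(rhs - lhs))
        for lhs, rhs in fds
        if not rhs <= lhs and not closure(lhs, fds) >= R
    ]


R = {"Student", "Course", "Instructor"}
F = [
    (frozenset({"Student", "Course"}), frozenset({"Instructor"})),  # FD1
    (frozenset({"Instructor"}), frozenset({"Course"})),             # FD2
]
print(bcnf_violations(R, F))  # [(['Instructor'], ['Course'])]
```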

BCNF Example: Semantics

• A student can take more than one course
• But for each course, a student has exactly one instructor
• Each instructor (non-key) teaches only one course (part of the key)

[Figure 10-13]

More on Example

• Possible decompositions are
  1. {Student, Instructor} and {Student, Course}
  2. {Course, Instructor} and {Course, Student}
  3. {Instructor, Course} and {Instructor, Student}
• Which of the decompositions is better? Justify your answer.
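One way to justify the answer is the binary lossless (nonadditive) join test: a decomposition of R into R1 and R2 is lossless iff the common attributes functionally determine R1 - R2 or R2 - R1. The sketch applies that test to the three candidate decompositions under FD1 and FD2; the function names are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure X+ under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_lossless(r1, r2, fds):
    """Binary nonadditive-join test: (R1 ∩ R2) must determine R1 - R2 or R2 - R1."""
    common_plus = closure(r1 & r2, fds)
    return (r1 - r2) <= common_plus or (r2 - r1) <= common_plus


F = [
    (frozenset({"Student", "Course"}), frozenset({"Instructor"})),  # FD1
    (frozenset({"Instructor"}), frozenset({"Course"})),             # FD2
]
decompositions = [
    ({"Student", "Instructor"}, {"Student", "Course"}),     # 1
    ({"Course", "Instructor"}, {"Course", "Student"}),      # 2
    ({"Instructor", "Course"}, {"Instructor", "Student"}),  # 3
]
for n, (r1, r2) in enumerate(decompositions, start=1):
    print(n, is_lossless(r1, r2, F))
# 1 False
# 2 False
# 3 True
```

Under these two FDs only decomposition 3 passes, because Instructor → Course. It does not preserve FD1, which is the usual trade-off when moving this relation from 3NF to BCNF.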

Instructor-Course Table

Instructor     Course
Mark           Database
Navathe        Database
Schulman       Theory
Ahmand         OS
Omiecinski     Database
Ammar          OS

Instructor-Student Table

Instructor     Student
Mark           Narayan
Mark           Wallace
Navathe        Smith
Navathe        Zelaya
Ammar          Smith
Ammar          Narayan
Schulman       Smith
Ahmand         Wallace
Omiecinski     Wong

To Decompose or Not to Decompose?

• Decomposition: pros and cons
  – Makes answering complex queries less efficient, because additional joins must be performed during the query (BAD)
  – May increase storage requirements if the degree of redundancy is very low (BAD)
  – May decrease storage requirements if the degree of redundancy is very high (GOOD)
  – Makes simple update transactions more efficient (GOOD)

Multivalued Dependency and Fourth Normal Form

• We have discussed the concept of functional dependency (FD)
• Another kind of constraint that cannot be specified as a functional dependency is the
  – multivalued dependency (MVD); fourth normal form (4NF) is defined based on this dependency
• MVDs are a direct consequence of first normal form (1NF), which disallows an attribute in a tuple from having a set of values
• They arise when a relation schema has two or more independent multivalued attributes
  – i.e., when one relation combines multiple independent 1:N relationships

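Formal Definition of Multivalued Dependency

• A multivalued dependency (MVD) $X \twoheadrightarrow Y$ on R, where $X \subseteq R$, $Y \subseteq R$, and $Z = R - (X \cup Y)$, specifies the following constraint on r(R):
  – If two tuples $t_1$ and $t_2$ exist in r(R) such that $t_1[X] = t_2[X]$, then two tuples $t_3$ and $t_4$ must also exist in r(R) such that
    • $t_3[X] = t_4[X] = t_1[X] = t_2[X]$
    • $t_3[Y] = t_1[Y]$ and $t_4[Y] = t_2[Y]$
    • $t_3[Z] = t_2[Z]$ and $t_4[Z] = t_1[Z]$
• Normalizing into 4NF typically involves eliminating MVDs by repeated binary decompositions as well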
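The formal definition can be turned into a check over a relation instance. The function mvd_holds and the EMP(ENAME, PNAME, DNAME) tuples are hypothetical; the example models an employee whose projects and dependents are independent multivalued facts.

```python
def mvd_holds(rows, X, Y, R):
    """Check the MVD X ->> Y on an instance.

    rows: set of tuples ordered as the attribute list R; X, Y: lists of names.
    For every t1, t2 agreeing on X, the tuple t3 with t3[X] = t1[X],
    t3[Y] = t1[Y] and t3[Z] = t2[Z] must be present. Looping over all ordered
    pairs also covers t4, which is t3 for the pair (t2, t1).
    """
    pos = {a: i for i, a in enumerate(R)}
    Z = [a for a in R if a not in X and a not in Y]

    def proj(t, attrs):
        return tuple(t[pos[a]] for a in attrs)

    for t1 in rows:
        for t2 in rows:
            if proj(t1, X) != proj(t2, X):
                continue
            t3 = tuple(t2[pos[a]] if a in Z else t1[pos[a]] for a in R)
            if t3 not in rows:
                return False
    return True


R = ["ENAME", "PNAME", "DNAME"]
emp = {
    ("Smith", "X", "John"),
    ("Smith", "Y", "Anna"),
    ("Smith", "X", "Anna"),
    ("Smith", "Y", "John"),
}
print(mvd_holds(emp, ["ENAME"], ["PNAME"], R))                              # True
print(mvd_holds(emp - {("Smith", "Y", "John")}, ["ENAME"], ["PNAME"], R))  # False
```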

Join Dependencies (JD) and Fifth Normal Form (Project-Join)

• Join dependency
  – A constraint on the set of legal relations over a database schema
  – A table T is subject to a join dependency if T can always be recreated by joining multiple tables, each having a subset of the attributes of T
  – The join operation must satisfy the lossless (or nonadditive) join property
• A very specific semantic constraint that is very difficult to detect in practice
  – There is no sound and complete axiomatization for join dependencies

Example (JD)

• Suppose that the following additional constraint always holds:
  – Whenever a supplier s supplies part p,
  – and a project j uses part p,
  – and supplier s supplies at least one part pi to project j,
  – then supplier s will also be supplying part p to project j.
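The constraint above corresponds to a three-way join dependency on SUPPLY(S, P, J): SUPPLY must equal the join of its projections on (S, P), (S, J) and (P, J). A sketch with hypothetical tuples: s1 supplies p1 (to j2), p1 is used by j1 (supplied by s2), and s1 supplies p2 to j1, so the constraint forces the tuple (s1, p1, j1). The function satisfies_jd is a hypothetical helper.

```python
def satisfies_jd(supply):
    """Check JD(R1(S, P), R2(S, J), R3(P, J)) on an instance of SUPPLY(S, P, J).

    The join of the three projections always contains the original tuples,
    so the JD holds exactly when the join adds no spurious tuple.
    """
    r1 = {(s, p) for s, p, _ in supply}
    r2 = {(s, j) for s, _, j in supply}
    r3 = {(p, j) for _, p, j in supply}
    joined = {
        (s, p, j)
        for s, p in r1
        for s2, j in r2
        if s == s2 and (p, j) in r3
    }
    return joined == set(supply)


supply = {("s1", "p1", "j2"), ("s1", "p2", "j1"), ("s2", "p1", "j1")}
print(satisfies_jd(supply))                         # False
print(satisfies_jd(supply | {("s1", "p1", "j1")}))  # True
```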

Quiz: March 10, 2015