Normalization continued
CMSC 461
Michael Wilson




Normalization clarification
- Normalization is simply a way of reducing anomalous database behavior
- It’s not a required or programmatically necessary concept
- A database will function perfectly fine without normalized tables
- The table design will just suck


First normal form (1NF)
- Each attribute only has atomic values
  - No attribute of a relation in 1NF holds a set of values
  - The values cannot be further broken down
- An attribute that violates 1NF, with an example value:
  - phoneNumberAndFirstName: 555-555-5555,Jason
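A minimal sketch of how the combined value above could be split into atomic attributes. The function name `split_phone_and_name` and the attribute names `phoneNumber` and `firstName` are assumptions made for illustration; they do not appear in the slides.

```python
def split_phone_and_name(value):
    """Split a combined 'phone,name' value into two atomic attributes."""
    # Split on the first comma only, in case the name itself contains a comma.
    phone, first_name = value.split(",", 1)
    return {"phoneNumber": phone.strip(), "firstName": first_name.strip()}


row = split_phone_and_name("555-555-5555,Jason")
print(row)
```

After the split, each attribute can be queried, indexed, and updated on its own, which is the point of requiring atomic values.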


First normal form (1NF)
- Furthermore
  - There are no duplicate rows
  - This means that there must be a key
- This is important for higher normal forms


Second normal form (2NF)
- Must be in 1NF
- Non-prime attributes are dependent on the whole of a candidate key
  - Dependent on only part of a candidate key: not 2NF
  - Non-prime = attributes not part of any candidate key
- One thing to keep in mind
  - Multiple candidate keys may occur within one table
  - The table is in 2NF as long as no non-prime attribute depends on only part of any of those candidate keys


Reminder
- Candidate key = minimal uniquely identifying set of attributes
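To make "minimal uniquely identifying" concrete, here is a small sketch that searches a table's rows for the smallest attribute sets with no repeated values. The table contents and the helper names `is_unique` and `candidate_keys` are hypothetical. Note the limitation: uniqueness in the current rows only suggests a key; a real candidate key follows from the rules of the data, not from one sample.

```python
from itertools import combinations


def is_unique(rows, attrs):
    """True if no two rows share the same values for attrs."""
    seen = set()
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key in seen:
            return False
        seen.add(key)
    return True


def candidate_keys(rows, attributes):
    """Return the minimal attribute sets that uniquely identify every row."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(attributes, size):
            # A superset of a key already found is a superkey, but not minimal.
            if any(set(k) <= set(combo) for k in keys):
                continue
            if is_unique(rows, combo):
                keys.append(combo)
    return keys


# Hypothetical sample data: id and email are each unique, name is not.
rows = [
    {"id": 1, "email": "ann@example.edu", "name": "Ann"},
    {"id": 2, "email": "bob@example.edu", "name": "Bob"},
    {"id": 3, "email": "ann2@example.edu", "name": "Ann"},
]
keys = candidate_keys(rows, ["id", "email", "name"])
print(keys)
```

Sets such as {id, name} also identify every row, but they are skipped because they contain a smaller key; they are superkeys rather than candidate keys.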


2NF example

Employee | Skill          | Work Location
Brown    | Light Cleaning | 73 Industrial Way
Brown    | Typing         | 73 Industrial Way
Harrison | Light Cleaning | 73 Industrial Way
Jones    | Shorthand      | 114 Main Street
Jones    | Typing         | 114 Main Street
Jones    | Whittling      | 114 Main Street


2NF example
- Jacked shamelessly from Wikipedia
  - Good example, though
- Neither Employee nor Skill alone can be a key here
  - Key must be {Employee, Skill}
- Here, the work location depends on the employee alone
  - That is a dependency on only part of the key, so the table is not in 2NF
  - How to solve this?
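One way to answer the question above is to move the work location into its own table keyed by employee, and keep a second table for the employee/skill pairs. The sketch below does this with Python's built-in `sqlite3` module; the table and column names (`employees`, `employee_skills`, `work_location`) are assumptions chosen for illustration.

```python
import sqlite3

# Rows of the original table: (employee, skill, work location).
rows = [
    ("Brown", "Light Cleaning", "73 Industrial Way"),
    ("Brown", "Typing", "73 Industrial Way"),
    ("Harrison", "Light Cleaning", "73 Industrial Way"),
    ("Jones", "Shorthand", "114 Main Street"),
    ("Jones", "Typing", "114 Main Street"),
    ("Jones", "Whittling", "114 Main Street"),
]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (
        employee      TEXT PRIMARY KEY,
        work_location TEXT NOT NULL
    );
    CREATE TABLE employee_skills (
        employee TEXT NOT NULL REFERENCES employees(employee),
        skill    TEXT NOT NULL,
        PRIMARY KEY (employee, skill)
    );
""")

# Work location depends on the employee alone, so store it once per employee.
# OR IGNORE skips the repeats; it would also hide conflicting locations,
# which is acceptable only because this sample data is consistent.
conn.executemany(
    "INSERT OR IGNORE INTO employees VALUES (?, ?)",
    [(emp, loc) for emp, _skill, loc in rows],
)
conn.executemany(
    "INSERT INTO employee_skills VALUES (?, ?)",
    [(emp, skill) for emp, skill, _loc in rows],
)

# Joining the two tables reproduces the original rows.
rejoined = conn.execute("""
    SELECT s.employee, s.skill, e.work_location
    FROM employee_skills AS s
    JOIN employees AS e ON e.employee = s.employee
    ORDER BY s.employee, s.skill
""").fetchall()
employee_count = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]
print(employee_count, len(rejoined))
```

Each work location is now stored once per employee, so changing where Jones works is a single-row update instead of three.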


Third normal form (3NF)
- Must be in 2NF
- Every non-prime attribute must be directly (non-transitively) dependent on every superkey in a relation
  - X→A where X is a superkey and A is a non-prime attribute
  - Must hold for every superkey and every non-prime attribute


Reminder
- Superkey = uniquely identifying set of attributes (not necessarily minimal)


Third normal form (3NF)
- Another definition:
  - For every functional dependency X→A, one of the following must hold:
    - X→A is trivial
    - X is a superkey
    - Every element of the set difference A − X is a prime attribute, i.e. part of a candidate key


3NF example

Tournament           | Year | Winner          | Winner DOB
Indiana Invitational | 1998 | Al Frederickson | 21 July 1975
Cleveland Open       | 1999 | Bob Albertson   | 28 September 1968
Des Moines Masters   | 1999 | Al Frederickson | 21 July 1975
Indiana Invitational | 1999 | Chip Masterson  | 14 March 1977


3NF example
- Also jacked from Wikipedia
- This table is in 2NF
  - What are the candidate keys?
  - What are the superkeys?


3NF example
- The winner functionally determines the winner’s date of birth
  - Transitive dependency of a non-prime attribute
  - Therefore, 3NF violation
- How do we fix this?
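A sketch of one possible fix: split the table so that the date of birth is stored once per winner. Plain Python dictionaries stand in for the two resulting tables; the names `tournament_winners`, `winner_dobs`, and `lookup` are assumptions for illustration.

```python
# Rows of the original table: (tournament, year, winner, winner DOB).
rows = [
    ("Indiana Invitational", 1998, "Al Frederickson", "21 July 1975"),
    ("Cleveland Open", 1999, "Bob Albertson", "28 September 1968"),
    ("Des Moines Masters", 1999, "Al Frederickson", "21 July 1975"),
    ("Indiana Invitational", 1999, "Chip Masterson", "14 March 1977"),
]

# Table 1: the key {tournament, year} determines the winner.
tournament_winners = {(t, y): w for t, y, w, _dob in rows}

# Table 2: the key {winner} determines the date of birth, stored once.
winner_dobs = {}
for _t, _y, winner, dob in rows:
    # setdefault keeps the first value seen; the assert guards against
    # two rows disagreeing about the same winner's date of birth.
    assert winner_dobs.setdefault(winner, dob) == dob


def lookup(tournament, year):
    """Rebuild one row of the original table by joining the two tables."""
    winner = tournament_winners[(tournament, year)]
    return (tournament, year, winner, winner_dobs[winner])


print(lookup("Des Moines Masters", 1999))
```

In the split design, Al Frederickson's date of birth exists in exactly one place, so two rows can no longer disagree about it.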


Boyce-Codd Normal Form (BCNF)
- Often called 3.5NF
- Only states two things
  - For every functional dependency of the form X→A, one of the following must hold:
    - X→A is trivial
    - X is a superkey for the relation


Difference between 3NF and BCNF?
- It’s actually pretty straightforward
  - 3NF says that non-prime attributes must be dependent on a key
  - However, it does not say anything about prime attributes
    - A prime attribute can still depend on a set of attributes that is not a superkey
- BCNF tables satisfy 3NF, but not necessarily the reverse


3NF and BCNF
- BCNF is only slightly more strict than 3NF
- Only time you run into issues is when candidate keys overlap in 3NF
  - Possible to have a 3NF relation that is not BCNF when candidate keys overlap
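The overlap case can be checked mechanically. The sketch below tests a relation against the 3NF and BCNF conditions using attribute closure. The relation (street, city, zip) and its two dependencies are a hypothetical example that is not from the slides, and the helper names are assumptions. Only the listed dependencies are tested.

```python
from itertools import combinations


def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(all_attrs, fds):
    """Minimal attribute sets whose closure covers the whole relation."""
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            combo = frozenset(combo)
            if any(k <= combo for k in keys):
                continue  # contains a smaller key, so not minimal
            if closure(combo, fds) == all_attrs:
                keys.append(combo)
    return keys


def is_bcnf(all_attrs, fds):
    """Each X -> A must be trivial, or X must be a superkey."""
    return all(
        rhs <= lhs or closure(lhs, fds) == all_attrs for lhs, rhs in fds
    )


def is_3nf(all_attrs, fds):
    """As BCNF, but X -> A also passes if A - X contains only prime attributes."""
    prime = set().union(*candidate_keys(all_attrs, fds))
    return all(
        rhs <= lhs or closure(lhs, fds) == all_attrs or (rhs - lhs) <= prime
        for lhs, rhs in fds
    )


# Hypothetical relation: {street, city} -> zip and zip -> city.
attrs = frozenset({"street", "city", "zip"})
fds = [
    (frozenset({"street", "city"}), frozenset({"zip"})),
    (frozenset({"zip"}), frozenset({"city"})),
]
keys = candidate_keys(attrs, fds)
print(sorted(sorted(k) for k in keys), is_3nf(attrs, fds), is_bcnf(attrs, fds))
```

The two candidate keys, {city, street} and {street, zip}, overlap on street, and every attribute is prime, so the 3NF test passes. The dependency zip→city has a left side that is not a superkey, so the BCNF test fails.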


What to use?
- 3NF is very popular, most common
- BCNF is also very popular
- Recommendation
  - Shoot for 3NF to begin with
  - Very sensible way of organizing your data
  - Tables only have information that describes the key


Denormalization
- Though normalization helps us rely on our data, denormalization is sometimes required for performance reasons
- Often, one will need to re-add redundant data
  - Minimizes joins, selects, views, etc.
- In high performance applications, one extra select could cause crippling response issues
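A small sketch of what re-adding redundant data can look like, using Python's built-in `sqlite3` module. The `customers` and `orders` schema is hypothetical: `orders.customer_name` is a deliberate redundant copy of `customers.name`. The example shows only the shape of the trade-off; it makes no claim about actual performance, which has to be measured.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id            INTEGER PRIMARY KEY,
        customer_id   INTEGER NOT NULL REFERENCES customers(id),
        total         REAL NOT NULL,
        customer_name TEXT NOT NULL  -- redundant copy of customers.name
    );
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO orders VALUES (10, 1, 25.0, 'Ann'), (11, 2, 40.0, 'Bob');
""")

# Normalized read: needs a join to get the customer name.
joined = conn.execute("""
    SELECT o.id, c.name, o.total
    FROM orders AS o
    JOIN customers AS c ON c.id = o.customer_id
    ORDER BY o.id
""").fetchall()

# Denormalized read: one table, no join.
flat = conn.execute(
    "SELECT id, customer_name, total FROM orders ORDER BY id"
).fetchall()

# The price: a rename must now touch both tables, or the copies drift apart.
conn.execute("UPDATE customers SET name = 'Anne' WHERE id = 1")
conn.execute("UPDATE orders SET customer_name = 'Anne' WHERE customer_id = 1")

# Count orders whose redundant copy no longer matches the customers table.
stale = conn.execute("""
    SELECT COUNT(*)
    FROM orders AS o
    JOIN customers AS c ON c.id = o.customer_id
    WHERE o.customer_name <> c.name
""").fetchone()[0]
print(joined == flat, stale)
```

The read path gets simpler, but every write to the duplicated value now carries the extra update shown at the end, which is the kind of anomaly normalization was meant to prevent.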


When to denormalize?
- Not at first!
  - If you don’t know that you’re going to run into performance issues, then don’t denormalize
  - Always try to keep things in a normalized form if possible
- Later
  - Once you’ve identified issues through testing and statistics, denormalize if necessary