Normalization continued
CMSC 461
Michael Wilson
Normalization clarification
  Normalization is simply a way of reducing anomalous database behavior
  It’s not a required or programmatically necessary concept
  A database will function perfectly fine without normalized tables
    The table design will just suck: redundant data and update anomalies creep in
First normal form (1NF)
  Each attribute only has atomic values
    No attribute value in a 1NF relation is itself a set
    The values cannot be broken down further
  An attribute that violates 1NF, with an example value:
    phoneNumberAndFirstName: 555-555-5555,Jason
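
As a minimal sketch of what "atomic" means here, the combined value from the slide can be split into two single-valued attributes. The names `phoneNumber` and `firstName` and the helper `to_1nf` are hypothetical, chosen only for illustration.

```python
# Hypothetical row holding the slide's non-atomic attribute.
bad_row = {"phoneNumberAndFirstName": "555-555-5555,Jason"}


def to_1nf(row):
    """Split the combined value into two atomic attributes."""
    phone_number, first_name = row["phoneNumberAndFirstName"].split(",", 1)
    return {"phoneNumber": phone_number, "firstName": first_name}


good_row = to_1nf(bad_row)
print(good_row)
```

Each resulting attribute now holds exactly one value that has no further internal structure the database needs to know about.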
First normal form (1NF)
  Furthermore
    There are no duplicate rows
    This means that there must be a key
      This is important for higher normal forms
Second normal form (2NF)
  Must be in 1NF
  Non-prime attributes are dependent on the whole of a candidate key
    Dependent on only part of a candidate key: not 2NF
    Non-prime = attributes not part of any candidate key
  One thing to keep in mind
    Multiple candidate keys may occur within one table
    Every non-prime attribute must depend on the whole of each candidate key, never on a proper subset of one
Reminder
  Candidate key = minimal uniquely identifying set of attributes
2NF example
  Employee  Skill           Work Location
  Brown     Light Cleaning  73 Industrial Way
  Brown     Typing          73 Industrial Way
  Harrison  Light Cleaning  73 Industrial Way
  Jones     Shorthand       114 Main Street
  Jones     Typing          114 Main Street
  Jones     Whittling       114 Main Street
2NF example
  Lifted shamelessly from Wikipedia
    Good example, though
  Neither Employee nor Skill can be a key here on its own
    Key must be {Employee, Skill}
  Here, the work location depends on the employee alone
    That is a dependency on part of the key, so the table is not in 2NF
  How to solve this?
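
One common answer is to split the table so that Work Location lives in a table keyed by Employee alone. The sketch below uses the rows from the slide; the names `employee_locations` and `employee_skills` are hypothetical, and plain Python tuples stand in for real tables.

```python
# Rows from the 2NF example: (Employee, Skill, Work Location).
rows = [
    ("Brown", "Light Cleaning", "73 Industrial Way"),
    ("Brown", "Typing", "73 Industrial Way"),
    ("Harrison", "Light Cleaning", "73 Industrial Way"),
    ("Jones", "Shorthand", "114 Main Street"),
    ("Jones", "Typing", "114 Main Street"),
    ("Jones", "Whittling", "114 Main Street"),
]

# Work Location depends on Employee alone, so it moves to a table keyed by Employee.
employee_locations = sorted({(emp, loc) for emp, _skill, loc in rows})

# What remains is keyed by the whole of {Employee, Skill}.
employee_skills = sorted({(emp, skill) for emp, skill, _loc in rows})

# Joining the two tables on Employee gives back the original rows.
location_of = dict(employee_locations)
rejoined = sorted((emp, skill, location_of[emp]) for emp, skill in employee_skills)

print(employee_locations)
print(rejoined == sorted(rows))
```

Each employee's location is now stored once, so changing it cannot leave some of that employee's rows out of date.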
Third normal form (3NF)
  Must be in 2NF
  Every non-prime attribute must be directly (non-transitively) dependent on every candidate key in a relation
    X→A where X is a candidate key and A is a non-prime attribute, with no chain X→Y→A through a non-key Y
    Must hold for every candidate key and every non-prime attribute
Reminder
  Superkey = uniquely identifying set of attributes (not necessarily minimal)
Third normal form (3NF)
  Another definition:
    For every functional dependency X→A, one of the following must hold:
      X→A is trivial
      X is a superkey
      Every element of the set difference of A and X (the attributes in A but not in X) is a prime attribute, i.e. part of a candidate key
3NF example
  Tournament            Year  Winner           Winner DOB
  Indiana Invitational  1998  Al Frederickson  21 July 1975
  Cleveland Open        1999  Bob Albertson    28 September 1968
  Des Moines Masters    1999  Al Frederickson  21 July 1975
  Indiana Invitational  1999  Chip Masterson   14 March 1977
3NF example
  Also lifted from Wikipedia
  This table is in 2NF
    What are the candidate keys?
    What are the superkeys?
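
One way to answer the two questions above is to compute attribute closures. The sketch below assumes two functional dependencies for the tournament table: {Tournament, Year} → Winner and Winner → WinnerDOB. The slides state the second; the first is an assumption read off the data. All function names are hypothetical.

```python
from itertools import combinations

ATTRS = ("Tournament", "Year", "Winner", "WinnerDOB")

# Assumed functional dependencies as (left side, right side) pairs.
FDS = [
    ({"Tournament", "Year"}, {"Winner"}),
    ({"Winner"}, {"WinnerDOB"}),
]


def closure(attrs, fds):
    """Return every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def superkeys(attrs, fds):
    """All attribute sets whose closure covers the whole relation."""
    everything = set(attrs)
    return [
        set(combo)
        for size in range(1, len(attrs) + 1)
        for combo in combinations(attrs, size)
        if closure(combo, fds) == everything
    ]


def candidate_keys(attrs, fds):
    """Superkeys that contain no smaller superkey."""
    found = superkeys(attrs, fds)
    return [key for key in found if not any(other < key for other in found)]


print(candidate_keys(ATTRS, FDS))
print(len(superkeys(ATTRS, FDS)))
```

Under these assumed dependencies the only candidate key is {Tournament, Year}, and every superkey is that pair plus any combination of the remaining attributes.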
3NF example
  The winner functionally determines the winner date of birth
    Transitive dependency of a non-prime attribute: {Tournament, Year} → Winner → Winner DOB
    Therefore, 3NF violation
  How do we fix this?
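
A usual fix is to move the winner's date of birth into its own table keyed by the winner. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the table and column names (`winners`, `tournament_winners`, `dob`) are hypothetical, and dates are kept as plain text exactly as they appear on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE winners (
    winner TEXT PRIMARY KEY,
    dob    TEXT NOT NULL
);
CREATE TABLE tournament_winners (
    tournament TEXT    NOT NULL,
    year       INTEGER NOT NULL,
    winner     TEXT    NOT NULL REFERENCES winners(winner),
    PRIMARY KEY (tournament, year)
);
""")

# Each date of birth is stored exactly once.
conn.executemany("INSERT INTO winners VALUES (?, ?)", [
    ("Al Frederickson", "21 July 1975"),
    ("Bob Albertson", "28 September 1968"),
    ("Chip Masterson", "14 March 1977"),
])
conn.executemany("INSERT INTO tournament_winners VALUES (?, ?, ?)", [
    ("Indiana Invitational", 1998, "Al Frederickson"),
    ("Cleveland Open", 1999, "Bob Albertson"),
    ("Des Moines Masters", 1999, "Al Frederickson"),
    ("Indiana Invitational", 1999, "Chip Masterson"),
])

# A join rebuilds the original four-column table when it is needed.
rows = conn.execute("""
    SELECT t.tournament, t.year, t.winner, w.dob
    FROM tournament_winners AS t
    JOIN winners AS w ON w.winner = t.winner
    ORDER BY t.year, t.tournament
""").fetchall()

for row in rows:
    print(row)
```

In both tables every non-key column now describes the key directly, which is the property 3NF asks for.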
Boyce-Codd Normal Form
  Often called 3.5NF
  Only states two things
    For every functional dependency of the form X→A, one of the following must hold:
      X→A is trivial
      X is a superkey for the relation
Difference between 3NF and BCNF?
  It’s actually pretty straightforward
    3NF says that non-prime attributes must be dependent on a key
    However, it does not say anything about prime attributes
      In 3NF, a prime attribute can still depend on something that is not a superkey; BCNF rules that out
  BCNF tables satisfy 3NF, but not necessarily the reverse
3NF and BCNF
  BCNF is only slightly more strict than 3NF
  Only time you run into issues is when candidate keys overlap in 3NF
    Possible to have a 3NF relation that is not BCNF when candidate keys overlap
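
To make the overlapping-key case concrete, here is a sketch that checks both definitions against a small relation that is not from the slides: a hypothetical (Student, Course, Instructor) table where {Student, Course} → Instructor and Instructor → Course are assumed. The helper names are also hypothetical. Only the listed dependencies are checked, which is enough for testing the relation as given.

```python
from itertools import combinations

# Hypothetical relation and assumed dependencies, for illustration only.
ATTRS = ("Student", "Course", "Instructor")
FDS = [
    ({"Student", "Course"}, {"Instructor"}),
    ({"Instructor"}, {"Course"}),
]


def closure(attrs, fds):
    """Return every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_superkey(attrs, all_attrs, fds):
    return closure(attrs, fds) == set(all_attrs)


def candidate_keys(all_attrs, fds):
    """Minimal superkeys, found by trying smaller sets first."""
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(all_attrs, size):
            candidate = set(combo)
            if is_superkey(candidate, all_attrs, fds) and not any(k <= candidate for k in keys):
                keys.append(candidate)
    return keys


def violations(all_attrs, fds, form):
    """Dependencies that break the given normal form ("3NF" or "BCNF")."""
    prime = set().union(*candidate_keys(all_attrs, fds))
    bad = []
    for lhs, rhs in fds:
        if rhs <= lhs:
            continue  # trivial dependency
        if is_superkey(lhs, all_attrs, fds):
            continue  # left side is a superkey
        if form == "3NF" and (rhs - lhs) <= prime:
            continue  # extra 3NF escape clause: right side is all prime
        bad.append((lhs, rhs))
    return bad


print(candidate_keys(ATTRS, FDS))
print(violations(ATTRS, FDS, "3NF"))
print(violations(ATTRS, FDS, "BCNF"))
```

The two candidate keys, {Student, Course} and {Student, Instructor}, overlap on Student. Instructor → Course passes 3NF only because Course is prime, and it fails BCNF because Instructor is not a superkey.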
What to use?
  3NF is very popular, most common
  BCNF is also very popular
  Recommendation
    Shoot for 3NF to begin with
      Very sensible way of organizing your data
      Tables only have information that describes the key
Denormalization
  Though normalization helps us rely on our data, denormalization is sometimes required for performance reasons
    Often, one will need to re-add redundant data
    Minimizes joins, selects, views, etc.
  In high performance applications, one extra select could cause crippling response issues
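
As a rough sketch of that trade-off, the tournament data can carry a redundant copy of the date of birth so that reads need no join. The schema, the `winner_dob` column, the `correct_dob` helper, and the corrected date are all hypothetical; the example shows only the structure, not any measured performance gain.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE winners (
    winner TEXT PRIMARY KEY,
    dob    TEXT NOT NULL
);
CREATE TABLE tournament_winners (
    tournament TEXT    NOT NULL,
    year       INTEGER NOT NULL,
    winner     TEXT    NOT NULL REFERENCES winners(winner),
    winner_dob TEXT    NOT NULL,  -- redundant copy of winners.dob
    PRIMARY KEY (tournament, year)
);
""")
conn.executemany("INSERT INTO winners VALUES (?, ?)", [
    ("Al Frederickson", "21 July 1975"),
    ("Bob Albertson", "28 September 1968"),
])
conn.executemany("INSERT INTO tournament_winners VALUES (?, ?, ?, ?)", [
    ("Indiana Invitational", 1998, "Al Frederickson", "21 July 1975"),
    ("Cleveland Open", 1999, "Bob Albertson", "28 September 1968"),
    ("Des Moines Masters", 1999, "Al Frederickson", "21 July 1975"),
])

# Read path: a single-table select, no join required.
no_join = conn.execute(
    "SELECT tournament, year, winner_dob FROM tournament_winners WHERE winner = ?",
    ("Al Frederickson",),
).fetchall()


def correct_dob(conn, winner, dob):
    """Write path: every redundant copy has to be updated together."""
    with conn:  # one transaction, so the copies cannot drift apart midway
        conn.execute("UPDATE winners SET dob = ? WHERE winner = ?", (dob, winner))
        cur = conn.execute(
            "UPDATE tournament_winners SET winner_dob = ? WHERE winner = ?",
            (dob, winner),
        )
    return cur.rowcount


# Hypothetical correction, to show how many extra rows a single fix touches.
updated = correct_dob(conn, "Al Frederickson", "22 July 1975")
print(no_join)
print(updated)
```

Reads get simpler, but each correction now has to touch every copy, which is exactly the anomaly normalization was meant to remove.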
When to denormalize?
  Not at first!
    If you don’t know that you’re going to run into performance issues, then don’t denormalize
    Always try to keep things in a normalized form if possible
  Later
    Once you’ve identified issues through testing and statistics, denormalize if necessary