your name here. improving schemas and normalization what are redundancies and anomalies? what are...
Post on 31-Dec-2015
220 Views
Preview:
TRANSCRIPT
Your name here
Improving Schemas and Normalization
• What are redundancies and anomalies?• What are functional dependencies and how are they
related to schema quality?• What is a superkey?• What is a inference rule and how can we infer functional
dependencies• How are keys determined by functional dependencies• How can we modify a schema to improve it?• What are normal forms and why are they important?
Redundancy and Anomalies in Relation Schemas
videoId date Acquired
movieId title genre length rating
115 1/ 25/ 98 101 The Thirty-Nine Steps
mystery 101 PG
90987 2/ 5/ 97 450 Elizabeth costume drama 123 PG-13 145 12/ 31/ 95 145 Lady and the Tramp animated drama 93 G 8034 4/ 5/ 98 145 Lady and the Tramp animated drama 93 G 90988 4/ 5/ 98 450 Elizabeth costume drama 123 PG-13 90989 3/ 25/ 86 450 Elizabeth costume drama 123 PG-13 543 5/ 12/ 95 101 The Thirty-Nine
Steps mystery 101 R
1243 4/ 29/ 91 123 Annie Hall romantic comedy 110 R
• Anomalies occur when data is inconsistent• Redundancy of values is the source of anomalies• Update anomaly occurs when values are inconsistent
– if title, genre, length or rating changed in any one or two of the green rows
Redundancy and Anomalies in Relation Schemas
• Anomalies occur when data is inconsistent• Redundancy of values is the source of anomalies• Deletion anomaly caused by deletion of row with videoId1243 (pink)
– Information about movie is deleted along with video• Insertion anomaly caused by last row (blue)
– Length and rating are inconsistent with other rows
videoId date Acquired
movieId title genre length rating
115 1/ 25/ 98 101 The Thirty-Nine Steps
mystery 101 PG
90987 2/ 5/ 97 450 Elizabeth costume drama 123 PG-13 145 12/ 31/ 95 145 Lady and the Tramp animated drama 93 G 8034 4/ 5/ 98 145 Lady and the Tramp animated drama 93 G 90988 4/ 5/ 98 450 Elizabeth costume drama 123 PG-13 90989 3/ 25/ 86 450 Elizabeth costume drama 123 PG-13 543 5/ 12/ 95 101 The Thirty-Nine
Steps mystery 101 R
1243 4/ 29/ 91 123 Annie Hall romantic comedy 110 R 114 6/ 5/ 98 450 Elizabeth costume drama 110 R
Functional Dependencies Between Attributes
• A functional dependency is a strong connection between two or more attributes in a table. – one attribute is functionally dependent on another attribute when
any two rows of the table that have the same value of the second attribute must have the same value for the first
• Example: movieId determines title, genre, length, rating– Each row with movieId 123 has the same values for other
attributes– FD2: movieId {title, genre, length, rating}
VideoMovie:(videoId, dateAcquired, movieId, title, genre, length, rating)
FD2
City, State, Zipcode Dependencies
• FD4: zipcode {city, state}• FD5: {street, city, state} zipcode
accountId lastName firstName street city state zipcode
101 Block Jane 123 Main St. Apopka FL 30458
102 Hamilton Cherry 3230 Dade St. Dade City FL 30555
103 Harrison Kate 103 Dodd Hall Apopka FL 30457
104 Breaux Carroll 76 Main St. Apopka FL 30457
106 Morehouse Anita 9501 Lafayette St. Houma LA 44099
111 Deaux Jane 123 Main St. Apopka FL 30458
201 Greaves Joseph 14325 N. Bankside St. Godfrey IL 43580
Customer:(accountId, lastName, firstName, street, city, state, zipcode)
FD4
FD5
Superkeys and Keys
• A key constraint is a functional dependency• Example: accountId is key of Customer
– FD6: accountId {lastName, firstName, street, city, state, zipcode}
• A superkey is a set of attributes that determine the rest of the attributes of a schema– FD7: {accountId, lastName} (firstName, street, city, state, zipcode}
Customer:(accountId, lastName, firstName, street, city, state, zipcode)
FD6
Customer:(accountId, lastName, firstName, street, city, state, zipcode)
FD7
Using Functional Dependencies
• Functional dependencies are used for– Determining keys– Finding sources of redundancy and hence trouble
• Functional dependencies are declared– Designer defines FDs based on the semantics of the schemas– Additional dependencies can be found from those that are
declared
• Keys and redundancies are based on the full set of FDs– All declared FDs– FDs inferred by applying inference rules
Inferring Additional Functional Dependencies
• Main inference rules– Rule 1: Reflexivity, a set of attributes X determines a subset Y of itself: – If X Y, then X Y. – Rule 2: Augmentation, a set of attributes Z can be added to both sides of X
Y:– If X Y, then XZ YZ.– Rule 3: Transitivity, we can follow chains of dependencies from X to Y to Z: If X
Y and Y Z, then X Z.• Additional rules for convenience
– Rule 4: Decomposition, we can remove a set of attributes Z from the right side of X YZ:
– if X YZ, then X Y.– Rule 5: Union, we can put two dependencies X Y and X Z together if they
have the same left side Z:– if X Y and X Z then X YZ– Rule 6: Pseudo-transitivity, a combination of augmentation by adding W to both
sides of X Y and transitivity in going from WX to WY to Z: – if X Y and WY Z, then WX Z.
• Apply rules to FDs to find new rules– Closure is the set of all FDs that can be inferred
Example of Inference
• Consider these how to infer FD7 from FD6– FD6: accountId {lastName, firstName, street, city, state,
zipcode}– FD7: {accountId, lastName} (firstName, street, city, state,
zipcode}
• Infer FD8 with augmentation: add lastName to left side– FD8: {accountId, lastName} (firstName, street, city, state,
zipcode, lastName}
• Use decomposition: remove lastName from right side– FD7: {accountId, lastName} (firstName, street, city, state,
zipcode}
Determining Keys from Functional Dependencies
• Start with closure of functional dependencies• Any functional dependency that includes all attributes
has a superkey as the left side• If no subset of the left side is a super key
– The left side is a key
• A set of attributes is a key if and only if the above holds• Some terminology
– Key is a set of attributes that determine all other attributes– Key attribute is an attribute that is part of a key– Non-key attribute is an attribute that is not part of any key – Primary key is one of the keys that has been selected to identify
the objects of the schema– Secondary key is a key that is not the primary key
Normalization
• Normalization is the process of transforming some objects into a structural form that satisfies some collection of rules
• Any schema that is in normal form is guaranteed to have certain quality characteristics
• Each normal form has a rule that describes what kinds of functional dependencies the normal form allows. – Normalization is the process of transforming schemas in order to
remove violations of the normal form rules. – Normalization is applied independently to each relation schema
in a database schema. – A a database schema is said to be in normal form if each of its
relation schemas is in the normal form.
Third Normal Form• A relation schema is in third normal form (3NF) if for every functional
dependency– The left side (determinant) is a superkey or – The right side attributes are all key attributes
• A functional dependency is a 3NF violation if– The left side is not a superkey and– The right side attributes are all non-key attributes
• Consider the schema and FDs– VideoMovie:(videoId, dateAcquired, movieId, title, genre, length, rating)– FD1: movieId title– FD2: movieId {title, genre, length, rating}– FD9: videoId (dateAcquired, movieId}– FD10: videoId movieId– FD11: videoId (title, genre, length, rating}– FD12: videoId (dateAcquired, movieId, title, genre, length, rating}
• FD1, FD2 are 3NF violations• FD9, FD10, FD11, FD12 are not 3NF violations because videoId (left side)
is a key
Decomposition
• Remove violations by decomposition– Create a new schema from FD– Remove right hand attributes of FD from original schema– Left side of FD becomes foreign key in original schema
• Consider the schema and 3NF violations– VideoMovie:(videoId, dateAcquired, movieId, title, genre, length,
rating)– FD1: movieId title– FD2: movieId {title, genre, length, rating}
• Can decompose by either FD1 or FD2– Better to use the larger FD
• New schemas– Video: (videoId, dateAcquired, movieId references Movie)– Movie: (movieId, title, genre, length, rating)
First and Second Normal Form
• The traditional presentation of 3NF includes two other normal forms: 1NF and 2NF– E.F. Codd (1970) defined several normal forms– Subsequent analysis simplified the definition of 3NF
• 1NF specifies that every attribute must be single valued– 1NF has been incorporated into definition of relational model
• 2NF makes a technical distinction about why an FD is a violation– The goal of normalization is to achieve 3NF– 2NF is an intermediate step and never a goal of normalization
Boyce Codd Normal Form• A schema is in BCNF if every functional dependency has a
superkey as its determinant – No exclusion for key attributes in left side
• Important in the context of multi-attribute keys• Consider the example of schema and FD
– R6: (street, city, state, zipcode, secondary key {street, zipcode})– FD4: zipcode {city, state}
• FD4 has BCNF violation even though city and state are key attributes– FD4 is not a 3NF violation
• Decomposition of R6 by FD4 into R7 and R8 in BCNF– R7: (street, zipcode references R8)– R8: (zipcode, city, state)
• Note that the schemas have one key each.– Multiple keys have been removed by decomposition
Case in Point: Normalizing a Car Registration Schema
• Example from text – Illustrates the way
that normalization can be a source of schema definition
• Process of design– Define relevant FDs– Apply inference rules– Normalize– Rename resulting
schemas
10583256
DECAL NUMBER
9
BIRTHDATEMO. DAY YR.YR.
347113M 9911
SEX
EXPIRESMO. DAY YR.
N
TAGISSUED
LW1 43K
TAG NUMBER
38543630
TITLE NUMBER
KLA4KA4667MCO19042
VEHICLE IDENTIFICATION NO. YR. MAKE
01
WT/LENGTH
3179
CLASS
02
MAKE
ACUR
TYPE
4D
GREGORY, MATILDA A OR ALAN M2006 W BRANOCH CIRTALLAHASSEE FL 32301
OWNER NAME AND ADDRESS
G26438156171
1ST OWNER DL. NO.
G26351353447
2ND OWNER DL. NO.
C
INSURANCEPIP LIABILITY
12 9910
DATE ISSUEDMO. DAY YR.
0.00 0.00CREDIT$ REFUND$
TAG MONEY
TITLE MONEY
1.60
MOS. TAX $
0.00
B.T. MOS. BACK TAX $
2.50
SVC. CHG. $
0.00
OTHER $
4.10
TAG TOTAL $
25.00
TITLE FEE $
0.00 0.00
LATE FEE $ LIEN $
4.25
SVC. CHG. $
29.25TITLE TOTAL $
0.00
SALES TAX $
33.35
GRAND TOTAL $
COLOR CPR
State of Florida Department of Motor VehiclesVehicle Registration Certificate
top related