CS542: Schema Refinement, Chapter 19 (Part 1): Functional Dependencies
The Evils of Redundancy
Redundancy is at the root of several problems associated with relational schemas: redundant storage and insert/delete/update anomalies.
Functional dependency constraints are used to identify schemas with such problems and to suggest refinements.
Main refinement technique: decomposition. Example: replacing ABCD with, say, AB and BCD.
Decomposition should be used judiciously: Is there a reason to decompose a relation? What problems (if any) does the decomposition cause?
CS542 1
Schema Refinement
Chapter 19 (part 1)
Functional Dependencies
CS542 2
The Evils of Redundancy
Redundancy is at the root of several problems associated with relational schemas: redundant storage and insert/delete/update anomalies.
Functional dependency constraints are used to identify schemas with such problems and to suggest refinements.
Main refinement technique: decomposition. Example: replacing ABCD with, say, AB and BCD.
CS542 4
Insert Anomaly
sNumber  sName  pNumber  pName
s1       Dave   p1       prof1
s2       Greg   p2       prof2
Student
Question: How do we insert a professor who has no students?
Insert Anomaly: We are not able to insert “valid” values.
CS542 5
Delete Anomaly
sNumber  sName  pNumber  pName
s1       Dave   p1       MM
s2       Greg   p2       ER
Student
Question: Can we delete a student who is the only student of a professor?
Delete Anomaly: We are not able to perform a delete without losing some “valid” information.
CS542 6
Update Anomaly
sNumber  sName  pNumber  pName
s1       Dave   p1       MM
s2       Greg   p1       MM
Student
Question: How do we update the name of a professor?
Update Anomaly: To update a value, we have to update multiple rows. Update anomalies are due to redundancy.
CS542 7
Functional Dependencies (FDs)
A functional dependency X → Y holds over relation R if, for every allowable instance r of R:
t1 ∈ r, t2 ∈ r, π_X(t1) = π_X(t2)
implies
π_Y(t1) = π_Y(t2)
Here π_X(t) denotes the projection of tuple t onto the attributes in X.
Given two tuples in r, if the X values agree, then the Y values must also agree.
CS542 8
FD Example
sNumber  sName  address
1        Dave   144FL
2        Greg   320FL
Student
Suppose we have FD sName → address
• for any two rows in the Student relation with the same value for sName, the value for address must be the same
• i.e., there is a function from sName to address
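The "same X values imply same Y values" reading can be checked mechanically on a given instance. The following is a minimal sketch, assuming rows are stored as Python dictionaries; the helper name `fd_holds` is hypothetical and not part of the lecture. Note that a single instance can only refute an FD: the FD itself is a constraint on every allowable instance.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}  # maps each lhs value combination to the rhs values first seen with it
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # setdefault stores y the first time x appears, then returns the stored value
        if seen.setdefault(x, y) != y:
            return False
    return True


# The Student instance from the slide.
student = [
    {"sNumber": 1, "sName": "Dave", "address": "144FL"},
    {"sNumber": 2, "sName": "Greg", "address": "320FL"},
]

# A made-up extra row (assumption) that reuses the name Dave with another address.
violating = student + [{"sNumber": 3, "sName": "Dave", "address": "99XX"}]

print(fd_holds(student, ["sName"], ["address"]))    # instance is consistent with the FD
print(fd_holds(violating, ["sName"], ["address"]))  # instance violates the FD
```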
CS542 10
Keys + Functional Dependencies
Assume K is a candidate key for R
What does this imply about FDs between K and R?
It means that K → R!
Does K → R require K to be minimal?
No. Any superkey of R also functionally determines all attributes of R.
CS542 11
Example: Constraints on Entity Set
Consider the relation obtained from Hourly_Emps:
Hourly_Emps (ssn, name, lot, rating, hrly_wages, hrs_worked)
Notation:
We denote the relation schema by its attributes: SNLRWH. This is really the set of attributes {S,N,L,R,W,H}.
Some FDs on Hourly_Emps:
ssn is the key: S → SNLRWH
rating determines hrly_wages: R → W
CS542 12
Problems Caused by FD
Problems due to the example FD:
rating determines hrly_wages: R → W
CS542 13
Example Problems due to R → W:
Update anomaly: Can we change W in just the 1st tuple of SNLRWH?
Insertion anomaly: What if we want to insert an employee and don’t know the hourly wage for his rating?
Deletion anomaly: If we delete all employees with rating 5, we lose the information about the wage for rating 5!
S            N          L   R  W   H
123-22-3666  Attishoo   48  8  10  40
231-31-5368  Smiley     22  8  10  30
131-24-3650  Smethurst  35  5  7   30
434-26-3751  Guldu      35  5  7   32
612-67-4134  Madayan    35  8  10  40
Hourly_Emps
rating (R) determines hrly_wages (W)
CS542 14
Same Example Problems due to R → W!

S            N          L   R  W   H
123-22-3666  Attishoo   48  8  10  40
231-31-5368  Smiley     22  8  10  30
131-24-3650  Smethurst  35  5  7   30
434-26-3751  Guldu      35  5  7   32
612-67-4134  Madayan    35  8  10  40
Hourly_Emps

S            N          L   R  H
123-22-3666  Attishoo   48  8  40
231-31-5368  Smiley     22  8  30
131-24-3650  Smethurst  35  5  30
434-26-3751  Guldu      35  5  32
612-67-4134  Madayan    35  8  40
Hourly_Emps2

R  W
8  10
5  7
Wages

Solution: 2 smaller tables instead of one big one!
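The split into Hourly_Emps2 and Wages can be reproduced as two projections, and joining them back on R recovers the original rows. The sketch below assumes the relation is held as a list of Python tuples in the column order S, N, L, R, W, H; the variable names are illustrative only.

```python
# Hourly_Emps rows from the slide, in the order (S, N, L, R, W, H).
hourly_emps = [
    ("123-22-3666", "Attishoo", 48, 8, 10, 40),
    ("231-31-5368", "Smiley", 22, 8, 10, 30),
    ("131-24-3650", "Smethurst", 35, 5, 7, 30),
    ("434-26-3751", "Guldu", 35, 5, 7, 32),
    ("612-67-4134", "Madayan", 35, 8, 10, 40),
]

# Projection onto SNLRH: drop the wage column.
hourly_emps2 = {(s, n, l, r, h) for (s, n, l, r, w, h) in hourly_emps}

# Projection onto RW: each rating/wage pair is now stored exactly once.
wages = {(r, w) for (_, _, _, r, w, _) in hourly_emps}

# Natural join of the two projections on the shared attribute R.
rejoined = {
    (s, n, l, r, w, h)
    for (s, n, l, r, h) in hourly_emps2
    for (r2, w) in wages
    if r == r2
}

print(sorted(wages))                 # the two rating/wage pairs
print(rejoined == set(hourly_emps))  # the join reproduces the original instance
```

With the wage stored once per rating, changing the wage for rating 8 touches a single Wages row rather than three Hourly_Emps rows.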
CS542 15
Same Example Problems due to R → W!
Will 2 smaller tables be better than one big one?
Update anomaly: Can we change W in just the 1st tuple of SNLRWH?
Insertion anomaly: What if we want to insert an employee and don’t know the hourly wage for his rating?
Deletion anomaly: If we delete all employees with rating 5, do we lose the information about the wage for rating 5?
CS542 17
Reasoning About FDs
Given some FDs, we can usually infer additional FDs:
ssn → did, did → lot implies ssn → lot
CS542 18
Properties of FDs
Consider A, B, C, Z to be sets of attributes.
Armstrong’s Axioms (AA):
Reflexive (also trivial FD): if A ⊇ B, then A → B
Transitive: if A → B and B → C, then A → C
Augmentation: if A → B, then AZ → BZ
These are sound and complete inference rules for FDs!
Additional rules (that follow from AA):
Union: if A → B and A → C, then A → BC
Decomposition: if A → BC, then A → B and A → C
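To see why the additional rules follow from Armstrong's Axioms, here is one possible derivation of each, written with the same attribute sets A, B, C. The step numbering is only for this sketch.

```latex
\begin{align*}
&\textbf{Union: given } A \to B \text{ and } A \to C \\
&1.\ A \to AB   && \text{augment } A \to B \text{ with } A \text{ (note } AA = A\text{)} \\
&2.\ AB \to BC  && \text{augment } A \to C \text{ with } B \\
&3.\ A \to BC   && \text{transitivity on 1 and 2} \\[6pt]
&\textbf{Decomposition: given } A \to BC \\
&1.\ BC \to B   && \text{reflexivity, since } BC \supseteq B \\
&2.\ A \to B    && \text{transitivity on } A \to BC \text{ and 1} \\
&3.\ A \to C    && \text{same argument with } BC \to C
\end{align*}
```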
CS542 21
Closure of FDs
An FD f is implied by a set of FDs F if f holds whenever all FDs in F hold.
F+ = closure of F is the set of all FDs that are implied by F.
Computing the closure of a set of FDs can be expensive. The size of the closure is exponential in the number of attributes!
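One way to get a feel for the exponential size: even when F is empty, reflexivity alone puts every trivial FD X → Y with Y ⊆ X into F+. The sketch below counts those trivial FDs for a small schema. It assumes that empty attribute sets are allowed on either side, which gives exactly 3^n such FDs for n attributes; the function names are hypothetical.

```python
from itertools import combinations


def subsets(attrs):
    """Yield every subset of attrs as a frozenset, including the empty set."""
    items = sorted(attrs)
    for k in range(len(items) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)


def count_trivial_fds(attrs):
    """Count pairs (X, Y) with Y a subset of X, i.e. the FDs given by reflexivity."""
    return sum(1 for x in subsets(attrs) for _ in subsets(x))


for schema in ("AB", "ABC", "ABCD"):
    print(schema, count_trivial_fds(schema))
```

Any non-empty F only adds to this count, so enumerating F+ explicitly is impractical for realistic schemas.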
CS542 23
Reasoning About FDs (Contd.)
Computing the full closure F+ of a set of FDs is too expensive.
Typically, we just need to know whether a given FD X → Y is in the closure of a set of FDs F.
Algorithm for an efficient check:
Compute the attribute closure of X (denoted X+) wrt F:
• X+ is the set of all attributes A such that X → A is in F+
• There is a linear-time algorithm to compute this.
Check whether Y is contained in X+. If yes, then X → Y is in F+.
CS542 24
Algorithm for Attribute Closure
Computing the closure of a set of attributes {A1, A2, …, An}, denoted {A1, A2, …, An}+:
1. Let X = {A1, A2, …, An}.
2. If there exists an FD B1, B2, …, Bm → C such that every Bi ∈ X, then X = X ∪ C.
3. Repeat step 2 until no more attributes can be added.
4. Output X; this is {A1, A2, …, An}+.
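The four steps above can be sketched in a few lines of Python. This is the simple repeat-until-no-change loop, not the linear-time variant; the representation of an FD as a (left side, right side) pair of attribute strings and the name `attribute_closure` are assumptions made for this example, as is the small FD set over attributes A to E.

```python
def attribute_closure(attrs, fds):
    """Return the closure of attrs under fds, a list of (lhs, rhs) pairs."""
    closure = set(attrs)                       # step 1: start with X itself
    changed = True
    while changed:                             # step 3: repeat until nothing is added
        changed = False
        for lhs, rhs in fds:
            # step 2: if every attribute of lhs is already in X, add rhs
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure                             # step 4: output the closure


# Illustrative FD set: A -> B, B -> C, CD -> E (one character per attribute).
fds = [("A", "B"), ("B", "C"), ("CD", "E")]

print(sorted(attribute_closure("A", fds)))   # ['A', 'B', 'C']
print(sorted(attribute_closure("AD", fds)))  # ['A', 'B', 'C', 'D', 'E']
```

Testing whether X → Y is implied then reduces to checking that every attribute of Y appears in the returned set.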
CS542 26
Another Example : Inferring FDs
Consider R (A, B, C, D, E) with FDs F = { A → B, B → C, CD → E }. Does A → E hold?
Rephrase as: Is A → E in the closure F+? Equivalently, is E in A+?
Let us compute {A}+:
{A}+ = {A, B, C}
Conclude: E is not in A+, therefore A → E is not in F+ (it does not follow from F).
CS542 27
Recap: So Far.
Functional Dependencies : Relationships across Attributes of Relations
Redundancy : Arises due to certain relationships (FDs) holding.
So far : Reasoning with FDs.
Approach : Establish certain “normal forms” with respect to dependencies