Advanced DB
CHAPTER 7 RELATIONAL DB DESIGN
Chapter 7: Relational Database Design
First Normal Form
Pitfalls in Relational Database Design
Functional Dependencies
Decomposition
Boyce-Codd Normal Form
Third Normal Form
Multivalued Dependencies and Fourth Normal Form
Overall Database Design Process
Original Slides:© Silberschatz, Korth, & Sudarshan
Advanced DB (2008-1)Copyright © 2006 - 2008 by S.-g. Lee Silberschatz Chap 7 - 2
First Normal Form
A domain is atomic if its elements are considered to be indivisible units
Examples of non-atomic domains:
  sets of names, composite attributes
  identification numbers like CS101 that can be broken up into parts
A relational schema is in first normal form (1NF) if the domains of all attributes are atomic
Atomicity is actually a property of how the elements of the domain are used
  Student ID numbers: CS0012, EE1127, …
  If the first two characters are extracted to find the department => the domain is not atomic
Non-atomic attributes
  lead to encoding of information in the application program rather than in the database
  complicate storage and query processing
We assume all relations are in first normal form
Example
Lending-schema = (branch-name, branch-city, assets, customer-name, loan-number, amount)
Redundancy:
  Data for branch-name, branch-city, assets are repeated for each loan that a branch makes
  Wastes space
  Complicates updating, introducing the possibility of inconsistency of the assets value
Null values
  Can use null values, but they are difficult to handle.
Redundancy creates problems
Anomalies (by Codd)
  Insertion anomaly: cannot store information about a branch if no loans exist
  Deletion anomaly: lose branch info when the last loan for the branch is deleted
  Update anomaly: what happens when you modify the assets for a branch in only a single record?
The problems are caused by redundancy!
Solution => decompose the schema so that each information content is represented only once (later)
  information content: relationship between attributes
Relational Theory
Goal: Devise a theory for the following
  Decide whether a particular relation R is in “good” form.
  In the case that a relation R is not in “good” form, decompose it into a set of relations {R1, R2, ..., Rn} such that
    each relation is in good form
    the decomposition is a lossless-join decomposition
Our theory is based on:
  functional dependencies
  multivalued dependencies (not covered in this semester)
Functional Dependencies
Constraints on the set of legal relations.
Require that the value for a certain set of attributes determines uniquely the value for another set of attributes.
A functional dependency is a generalization of the notion of a key.
Functional Dependencies (Cont.)
Let R be a relation schema, α ⊆ R and β ⊆ R
The functional dependency α → β holds on R if and only if, for any legal relation r(R), whenever any two tuples t1 and t2 of r agree on the attributes α, they also agree on the attributes β.
That is,
  t1[α] = t2[α] ⇒ t1[β] = t2[β]
Example
Consider r(A, B) with the following instance of r:
  A  B
  1  4
  1  5
  3  7
On this instance, A → B does NOT hold, but B → A does hold
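The definition can be checked mechanically on a single instance. The sketch below is only an illustration: the function name fd_holds and the list-of-dicts representation of r are assumptions, not part of the slides. Note that it tests one instance, whereas an FD on a schema must hold for every legal instance.

```python
from itertools import combinations

def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs holds on this instance (rows is a list of dicts)."""
    for t1, t2 in combinations(rows, 2):
        agree_lhs = all(t1[a] == t2[a] for a in lhs)
        agree_rhs = all(t1[b] == t2[b] for b in rhs)
        # A violation is a pair of tuples that agree on lhs but differ on rhs.
        if agree_lhs and not agree_rhs:
            return False
    return True

# The instance r(A, B) from the slide.
r = [{"A": 1, "B": 4}, {"A": 1, "B": 5}, {"A": 3, "B": 7}]

print(fd_holds(r, ["A"], ["B"]))  # False: tuples (1,4) and (1,5) agree on A but not on B
print(fd_holds(r, ["B"], ["A"]))  # True: all B values are distinct
```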
Trivial FD
A functional dependency is trivial if it is satisfied by all instances of a relation
  E.g.
    customer-name, loan-number → customer-name
    customer-name → customer-name
Lemma: α → β is trivial if β ⊆ α
Closure of a Set of FDs
Given a set F of FDs, there are certain other FDs that are logically implied by F
  E.g., if A → B and B → C, then we can infer that A → C
The set of all functional dependencies logically implied by F is the closure of F.
We denote the closure of F by F+
We can find all of F+ by applying Armstrong’s Axioms:
  if β ⊆ α, then α → β (reflexivity)
  if α → β, then γα → γβ (augmentation)
  if α → β and β → γ, then α → γ (transitivity)
These rules are sound (generate only functional dependencies that actually hold) and complete (generate all functional dependencies that hold).
Example
R = (A, B, C, G, H, I)
F = { A → B
      A → C
      CG → H
      CG → I
      B → H }
Some members of F+:
  A → H
    by transitivity from A → B and B → H
  AG → I
    by augmenting A → C with G, to get AG → CG, and then transitivity with CG → I
  CG → HI
    from CG → H and CG → I: the “union rule” can be inferred from the definition of functional dependencies, or
    augmentation of CG → I to infer CG → CGI, augmentation of CG → H to infer CGI → HI, and then transitivity
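Enumerating all of F+ by hand is tedious. A common shortcut, not shown on this slide, is to compute the closure of an attribute set: α → β is in F+ exactly when β is contained in the closure of α under F. The sketch below assumes single-letter attribute names so that a string such as "CG" can stand for the set {C, G}; the function names are illustrative.

```python
def attribute_closure(attrs, fds):
    """Return the set of attributes determined by attrs under fds.

    fds is a list of (lhs, rhs) pairs; each side is an iterable of attribute names.
    """
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already know all of lhs, we also know all of rhs.
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure

def implies(fds, lhs, rhs):
    """Return True if lhs -> rhs is in the closure of fds."""
    return set(rhs) <= attribute_closure(lhs, fds)

# F from the slide, with single-letter attributes written as strings.
F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]

print(implies(F, "A", "H"))    # True
print(implies(F, "AG", "I"))   # True
print(implies(F, "CG", "HI"))  # True
print(implies(F, "A", "I"))    # False: G is needed to reach I
```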
Decomposition
Redundancy causes problems: Solution => decompose the schema so that each information content is represented only once
Definition: Let R be a relation scheme. {R1, ..., Rn} is a decomposition of R if R = R1 ∪ ... ∪ Rn
  (i.e., all of R’s attributes are represented)
We will deal mostly with binary decomposition: R into {R1, R2} where R = R1 ∪ R2
student(s_id, name, dept, dept_head, dept_phone, grade)
=> student’(s_id, name, dept, grade)
   department(dept, dept_head, dept_phone)
Lending = (b_name, asset, b_city, loan#, c_name, amount)
=> Branch = (b_name, asset, b_city)
   Loan = (loan#, c_name, amount)
Lossy Decomposition
Careless decomposition leads to loss of information: lossy decomposition
Lending = (b_name, asset, b_city, loan#, c_name, amount): anomalies due to repetition of information
=> Branch = (b_name, asset, b_city)
   Loan = (loan#, c_name, amount)
   - problem: relationship between loan and branch is lost
   - loss of information
=> Branch = (b_name, asset, b_city)
   Loan = (loan#, c_name, amount, b_city)
   - more tuples in natural join
   - but we have lost the relationship
   - loss of information
Lossy Decomposition (cont.)
Decomposition of R = (A, B) into R1 = (A) and R2 = (B)

  r           ∏A(r)    ∏B(r)
  A  B        A        B
  α  1        α        1
  α  2        β        2
  β  1

Can we recover the original information content?

  ∏A(r) ⋈ ∏B(r)
  A  B
  α  1
  α  2
  β  1
  β  2    Lossy!
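The same experiment can be run in a few lines. This is a minimal sketch: project, natural_join and as_set are illustrative helpers (not from the slides), tuples are modelled as dicts, and the strings "alpha" and "beta" stand in for α and β.

```python
def project(rows, attrs):
    """Projection with duplicate elimination; each tuple becomes a sorted tuple of (attr, value) pairs."""
    return {tuple(sorted((a, t[a]) for a in attrs)) for t in rows}

def natural_join(left, right):
    """Natural join of two sets of tuples produced by project()."""
    result = set()
    for l in left:
        for r in right:
            ld, rd = dict(l), dict(r)
            # Tuples combine only if they agree on every shared attribute.
            if all(ld[a] == rd[a] for a in ld.keys() & rd.keys()):
                result.add(tuple(sorted({**ld, **rd}.items())))
    return result

def as_set(rows):
    """Put a list of dict tuples into the same representation used by project()."""
    return {tuple(sorted(t.items())) for t in rows}

# The instance r(A, B) from the slide.
r = [{"A": "alpha", "B": 1}, {"A": "alpha", "B": 2}, {"A": "beta", "B": 1}]

joined = natural_join(project(r, ["A"]), project(r, ["B"]))
print(len(as_set(r)), len(joined))  # 3 4
print(as_set(r) < joined)           # True: the join contains a spurious tuple (beta, 2)
```

With no common attribute the natural join degenerates to a Cartesian product, which is why the extra tuple appears.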
Lossless-join Decomposition
For r(R) and decomposition {R1, R2}, it is always the case that
  r ⊆ ∏R1(r) ⋈ ∏R2(r)
Definition: Decomposition {R1, R2} is a lossless-join decomposition of R if
  r = ∏R1(r) ⋈ ∏R2(r)
The information content of the original relation r is always the basis
Lemma: {R1, R2} is a lossless decomposition if
  R1 ∩ R2 → R1, or
  R1 ∩ R2 → R2
i.e., if one of the two subschemas holds the key of the other subschema
(Figure: example relation r, its projections r1 and r2, and the join r1 ⋈ r2)
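The lemma suggests a simple test for a binary decomposition: compute the closure of the common attributes and see whether it covers R1 or R2. The sketch below uses attribute closure, a standard technique the slide does not show. The schema R = (A, B, C) with F = {A → B, B → C} is chosen for illustration, the function names are assumptions, and attributes are single letters so that strings can stand for attribute sets.

```python
def attribute_closure(attrs, fds):
    """Return the set of attributes determined by attrs under fds."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure

def is_lossless(r1, r2, fds):
    """Return True if R1 ∩ R2 -> R1 or R1 ∩ R2 -> R2 follows from fds."""
    common = set(r1) & set(r2)
    closure = attribute_closure(common, fds)
    return set(r1) <= closure or set(r2) <= closure

F = [("A", "B"), ("B", "C")]

print(is_lossless("AB", "BC", F))  # True: B -> BC
print(is_lossless("AB", "AC", F))  # True: A -> AB
print(is_lossless("AC", "BC", F))  # False: C determines nothing else
```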
Goal for decomposition
When we decompose a relation schema R with a set of functional dependencies F into R1, R2, …, Rn we want
  1. Lossless decomposition
  2. No redundancy
  3. Dependency preservation
Boyce-Codd Normal Form
We want a way to decide whether a particular relation R is in “good” form.
Definition:
A relation schema R is in BCNF (with respect to a set F of FDs) if for each FD α → β in F+ (α ⊆ R and β ⊆ R), at least one of the following holds:
  α → β is trivial (i.e., β ⊆ α)
  α is a superkey for R
Example
R = (A, B, C), F = {A → B ; B → C}, Key = {A}
  R is not in BCNF
  Decompose into R1 = (A, B), R2 = (B, C)
    R1 and R2 in BCNF
    Lossless-join decomposition
    Dependency preserving
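The BCNF test for the example can be sketched as follows. For the original schema R it is enough to examine the FDs in F rather than all of F+. For the sub-schemas the sketch simply assumes the caller passes the FDs that apply to each one (here A → B for R1 and B → C for R2). Function names are illustrative and attributes are single letters.

```python
def attribute_closure(attrs, fds):
    """Return the set of attributes determined by attrs under fds."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure

def bcnf_violations(schema, fds):
    """Return the FDs in fds that are nontrivial and whose left side is not a superkey."""
    violations = []
    for lhs, rhs in fds:
        trivial = set(rhs) <= set(lhs)
        superkey = attribute_closure(lhs, fds) >= set(schema)
        if not trivial and not superkey:
            violations.append((lhs, rhs))
    return violations

print(bcnf_violations("ABC", [("A", "B"), ("B", "C")]))  # [('B', 'C')]
print(bcnf_violations("AB", [("A", "B")]))               # []
print(bcnf_violations("BC", [("B", "C")]))               # []
```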
Third Normal Form
Third Normal Form
  Allows some redundancy (with resultant problems)
  But FDs can be checked on individual relations without computing a join
  There is always a lossless-join, dependency-preserving decomposition into 3NF
A relation schema R is in third normal form (3NF) if for all α → β in F+ at least one of the following holds:
  α → β is trivial (i.e., β ⊆ α)
  α is a superkey for R
  Each attribute A in β – α is contained in a candidate key for R.
  (NOTE: each attribute may be in a different candidate key)
If a relation is in BCNF it is in 3NF (since in BCNF one of the first two conditions above must hold).
Example
R = (J, K, L)
F = {JK → L, L → K}
Two candidate keys: JK and JL
R is in 3NF
  JK → L   JK is a superkey
  L → K    K is contained in a candidate key
There is some redundancy in this schema
BCNF decomposition has (JL) and (LK)
=> Testing for JK → L requires a join
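The example can be verified with a brute-force sketch: enumerate candidate keys by trying attribute subsets in order of size, then apply the three 3NF conditions to each FD in F. The enumeration is exponential and only meant for small schemas like this one. Function names are illustrative, and attributes are single letters so strings can stand for sets.

```python
from itertools import combinations

def attribute_closure(attrs, fds):
    """Return the set of attributes determined by attrs under fds."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure

def candidate_keys(schema, fds):
    """Return all minimal attribute sets whose closure is the whole schema."""
    attrs = sorted(set(schema))
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            if any(k <= set(combo) for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if attribute_closure(combo, fds) >= set(attrs):
                keys.append(frozenset(combo))
    return keys

def is_3nf(schema, fds):
    """Check the three 3NF conditions for every FD in fds."""
    prime = set().union(*candidate_keys(schema, fds))
    for lhs, rhs in fds:
        if set(rhs) <= set(lhs):
            continue  # trivial
        if attribute_closure(lhs, fds) >= set(schema):
            continue  # left side is a superkey
        if set(rhs) - set(lhs) <= prime:
            continue  # every remaining attribute lies in some candidate key
        return False
    return True

F = [("JK", "L"), ("L", "K")]
print(sorted("".join(sorted(k)) for k in candidate_keys("JKL", F)))  # ['JK', 'JL']
print(is_3nf("JKL", F))  # True
```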
Comparison of BCNF and 3NF
It is always possible to decompose a relation into relations in 3NF and
  the decomposition is lossless
  the dependencies are preserved
It is always possible to decompose a relation into relations in BCNF and
  the decomposition is lossless
  it may not be possible to preserve dependencies.
Comparison of BCNF and 3NF (cont.)
R(street, city, zip)
  street city → zip
  zip → city
/* 3NF but not in BCNF (zip → city is nontrivial & zip is not a key) */

  St    Zp  C
  s1    z1  c1
  s2    z1  c1
  s3    z2  c1
  null  z3  c2

repetition of information (e.g., the relationship z1, c1)
need to use null values (e.g., to represent the relationship z3, c2 where there is no corresponding value for St)

R1(street, zip)      R2(city, zip)
  St  Zp               Zp  C
  s1  z1               z1  c1
  s2  z1               z2  c1
  s3  z2               z3  c2

/* make one relation out of the attributes of the violating FD */
Now R1, R2 are in BCNF but not dependency-preserving.
Multivalued Dependencies
Basic Concept
R = XYZ: relation scheme. An MVD X →→ Y holds iff each X-value in R is associated with a set of Y-values in a way that does not depend on Z-values.
Formally, for any pair of tuples t1, t2 of r(R) such that t1[X] = t2[X], there exist t3, t4 in r such that
  t1[X] = t2[X] = t3[X] = t4[X]
  t3[Y] = t1[Y]
  t3[Z] = t2[Z]
  t4[Y] = t2[Y]
  t4[Z] = t1[Z]
Example: Name →→ Child, Name → Salary

  Name  Child  Salary
  A     B      1000
  A     C      1000
  A     D      1000
  F     G      1200
  F     H      1200
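The formal definition translates directly into a check on one instance. In the sketch below (names are illustrative, tuples are dicts), iterating over ordered pairs (t1, t2) means that only t3 has to be constructed; the tuple t4 is covered when the pair is visited in the opposite order.

```python
def mvd_holds(rows, x, y):
    """Return True if X ->> Y holds on this non-empty instance (rows is a list of dicts)."""
    attrs = set(rows[0])
    z = attrs - set(x) - set(y)
    present = {tuple(sorted(t.items())) for t in rows}
    for t1 in rows:
        for t2 in rows:
            if all(t1[a] == t2[a] for a in x):
                # t3 takes its X and Y values from t1 and its Z values from t2.
                t3 = {a: (t2[a] if a in z else t1[a]) for a in attrs}
                if tuple(sorted(t3.items())) not in present:
                    return False
    return True

salary = [
    {"Name": "A", "Child": "B", "Salary": 1000},
    {"Name": "A", "Child": "C", "Salary": 1000},
    {"Name": "A", "Child": "D", "Salary": 1000},
    {"Name": "F", "Child": "G", "Salary": 1200},
    {"Name": "F", "Child": "H", "Salary": 1200},
]
print(mvd_holds(salary, ["Name"], ["Child"]))  # True

# A hypothetical two-tuple instance where children and phones are paired up.
phones = [
    {"Name": "A", "Child": "B", "Phone": 1234},
    {"Name": "A", "Child": "C", "Phone": 1235},
]
print(mvd_holds(phones, ["Name"], ["Child"]))  # False: (A, B, 1235) is missing
```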
MVD
Note
  If X →→ Y then X →→ Z, where R = XYZ
  X →→ Y is a trivial MVD if Y ⊆ X or Y ∪ X = R
  We can always make r satisfy a given MVD by adding more tuples

  Name  Child  Phone
  A     B      1234
  A     C      1235
  A     B      1235
  A     C      1234
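The last point, adding tuples until an MVD holds, can be sketched as a small fixpoint loop. The starting instance with only two tuples is an assumption made for illustration; the function name and the tuple layout (attributes in sorted order) are likewise illustrative.

```python
def complete_for_mvd(rows, x, y):
    """Return a set of tuples containing rows and satisfying X ->> Y.

    rows is a non-empty list of dicts; result tuples list values in sorted attribute order.
    """
    attrs = sorted(rows[0])
    z = [a for a in attrs if a not in x and a not in y]
    result = {tuple(t[a] for a in attrs) for t in rows}
    changed = True
    while changed:
        changed = False
        current = [dict(zip(attrs, t)) for t in result]
        for t1 in current:
            for t2 in current:
                if all(t1[a] == t2[a] for a in x):
                    # Swap in the Z values of t2; add the tuple if it is missing.
                    t3 = tuple(t2[a] if a in z else t1[a] for a in attrs)
                    if t3 not in result:
                        result.add(t3)
                        changed = True
    return result

start = [
    {"Name": "A", "Child": "B", "Phone": 1234},
    {"Name": "A", "Child": "C", "Phone": 1235},
]
# Attribute order in each tuple is (Child, Name, Phone).
completed = complete_for_mvd(start, ["Name"], ["Child"])
print(sorted(completed))
# [('B', 'A', 1234), ('B', 'A', 1235), ('C', 'A', 1234), ('C', 'A', 1235)]
```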
MVD (Cont.)
Tabular representation of α →→ β
Example
Let R be a relation schema with a set of attributes that are partitioned into 3 nonempty subsets:
  Y, Z, W
We say that Y →→ Z (Y multidetermines Z) if and only if for all possible relations r(R):
  if < y1, z1, w1 > ∈ r and < y1, z2, w2 > ∈ r
  then < y1, z1, w2 > ∈ r and < y1, z2, w1 > ∈ r
Note that since the behavior of Z and W is identical, it follows that
  Y →→ Z if Y →→ W
Examples & Exercises
1. List all nontrivial MVDs

  A   B   C
  a1  b1  c1
  a1  b1  c2
  a2  b1  c1
  a2  b1  c3

2. Add tuples to the following table so that it will satisfy X →→ Y

  X   Y   Z   W
  x1  y1  z1  w1
  x1  y1  z1  w2
  x1  y2  z2  w1

3. Prove that α → β implies α →→ β.
MVDs and Redundancies
Relations with nontrivial MVDs introduce redundancies
  no nontrivial FDs => BCNF
  MVD C_name →→ C_street, C_city holds
  street and city info needs to be repeated for each occurrence of C_name

  Loan#  C_name  C_street  C_city
  L-23   Smith   North     Rye
  L-23   Smith   Main      Manchester
  L-93   Curry   Lake      Horseneck
Use of Multivalued Dependencies
We use multivalued dependencies in two ways:
  1. To test relations to determine whether they are legal under a given set of functional and multivalued dependencies
  2. To specify constraints on the set of legal relations. We shall thus concern ourselves only with relations that satisfy a given set of functional and multivalued dependencies.
If a relation r fails to satisfy a given multivalued dependency, we can construct a relation r′ that does satisfy the multivalued dependency by adding tuples to r.
Fourth Normal Form (4NF)
Relation schema R is in 4NF w.r.t. a set of dependencies D (FDs & MVDs) if for all MVDs α →→ β in D+:
  α →→ β is a trivial MVD, OR
  α is a superkey for R
Effect: if there is any nontrivial MVD => it must be an FD
Lemma: If R is in 4NF then it is in BCNF
Decomposition
No FD
MVD: Name →→ Child
     Name →→ Phone

  Name  Child  Phone
  A     B      1234
  A     C      1235
  A     B      1235
  A     C      1234

Decompose
MVDs become trivial

  Name  Child        Name  Phone
  A     B            A     1234
  A     C            A     1235
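A quick way to convince oneself that this decomposition loses nothing is to project the four-tuple relation onto (Name, Child) and (Name, Phone) and join the results on Name. The sketch below hard-codes that one case; the variable names and the tuple-based representation are illustrative.

```python
# The relation from the slide as (Name, Child, Phone) tuples.
r = {
    ("A", "B", 1234),
    ("A", "C", 1235),
    ("A", "B", 1235),
    ("A", "C", 1234),
}

# Projections with duplicate elimination (sets drop repeated tuples).
name_child = {(name, child) for name, child, _ in r}
name_phone = {(name, phone) for name, _, phone in r}

# Natural join on the shared attribute Name.
joined = {
    (n1, child, phone)
    for n1, child in name_child
    for n2, phone in name_phone
    if n1 == n2
}

print(sorted(name_child))  # [('A', 'B'), ('A', 'C')]
print(sorted(name_phone))  # [('A', 1234), ('A', 1235)]
print(joined == r)         # True: the decomposition is lossless on this instance
```

Four tuples shrink to two plus two, and the join restores exactly the original relation because Name →→ Child holds.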
Lossless Join Decomposition
R: relation scheme; D: set of FDs & MVDs
{R1, R2} is a lossless decomposition of R iff
  R1 ∩ R2 →→ R1, or
  R1 ∩ R2 →→ R2
a more general statement for lossless-join
not always possible to obtain a dependency-preserving lossless-join decomposition into 4NF
Example
R = (A, B, C, G, H, I)
F = { A →→ B
      B →→ HI
      CG →→ H }
R is not in 4NF since A →→ B and A is not a superkey for R
Decomposition
  a) R1 = (A, B) (R1 is in 4NF)
  b) R2 = (A, C, G, H, I) (R2 is not in 4NF)
  c) R3 = (C, G, H) (R3 is in 4NF)
  d) R4 = (A, C, G, I) (R4 is not in 4NF)
     Since A →→ B and B →→ HI, A →→ HI, A →→ I
  e) R5 = (A, I) (R5 is in 4NF)
  f) R6 = (A, C, G) (R6 is in 4NF)
Example
Example: library information system
Each book has
  title,
  a set of authors,
  publisher, and
  a set of keywords
MVDs
  title →→ author
  title →→ keyword
  title → day, month, year
Non-1NF relation books
1NF Version
1NF version of books: flat-books
4NF Decomposition
Remove awkwardness of flat-books by assuming that the following multivalued dependencies hold:
  title →→ author
  title →→ keyword
  title →→ pub-name, pub-branch
Decompose flat-books into 4NF using the schemas:
  (title, author)
  (title, keyword)
  (title, pub-name, pub-branch)
END OF CHAPTER 7