Database Principles
Relational Database Design I
Good Tables versus Bad Tables:
• A table in a relational database is good if it is about one thing. A table that is not good is bad.
• Problem of RDB Design: Build good tables and convert bad tables into good tables.
• What is a table “about”? The key to the answer is the key.
• The key to a table is the identifier of whatever the table is about.
Good Table Examples:
Part
Pno  Pdesc   Colour
p1   screw   red
p2   bolt    yellow
p3   nut     green
p4   washer  red
Supplier
Sno  Sname  Location
s1   Acme   NY
s2   Ajax   Bos
s3   Apex   Chi
s4   Ace    LA
s5   A-1    Phil
Supplies
Sno  Pno  O_date
s1   p1   nov 3
s2   p2   nov 4
s3   p1   nov 5
s3   p3   nov 6
s4   p1   nov 7
s4   p2   nov 8
s4   p4   nov 9
Supplier is good because its key is Sno, which identifies different suppliers, and each of its other columns, Sname and Location, is a piece of information about suppliers.
Exercise: Explain why Part is a good table.
Supplies is good because its key is (Sno,Pno), which identifies individual orders, and the only other column in the table, O_date, is a piece of information about individual orders.
Bad Table Example:
Supplier Part Supplies
Sno  Sname  Location  O_date  Pno  Pdesc   Color
s1   Acme   NY        nov 3   p1   screw   red
s2   Ajax   Bos       nov 4   p2   bolt    yellow
s3   Apex   Chi       nov 5   p1   screw   red
s3   Apex   Chi       nov 6   p3   nut     green
s4   Ace    LA        nov 7   p1   screw   red
s4   Ace    LA        nov 8   p2   bolt    yellow
s4   Ace    LA        nov 9   p4   washer  red
Even though this table has info about Suppliers, Parts and Supplies, its key is (Sno,Pno). And so this table is “about” whatever its key identifies, namely Supplies. But the table contains various columns (Sname, Location, Pdesc, Color) that are not about Supplies but about Supplier and Part respectively.
So this table is “bad” and the process of RDB Design would be to reform this table into the three tables on the previous slide.
Quick Observation: If your tables come from an ERD they are normally pretty “good”.
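The reform of the bad table into three good tables can be sketched in Python as projection followed by duplicate removal. This is only an illustration: the helper name project and the tuple layout are assumptions, not part of the slides.

```python
# Rows of the single "bad" table, copied from the slide.
COLUMNS = ("Sno", "Sname", "Location", "O_date", "Pno", "Pdesc", "Color")
bad_table = [
    ("s1", "Acme", "NY", "nov 3", "p1", "screw", "red"),
    ("s2", "Ajax", "Bos", "nov 4", "p2", "bolt", "yellow"),
    ("s3", "Apex", "Chi", "nov 5", "p1", "screw", "red"),
    ("s3", "Apex", "Chi", "nov 6", "p3", "nut", "green"),
    ("s4", "Ace", "LA", "nov 7", "p1", "screw", "red"),
    ("s4", "Ace", "LA", "nov 8", "p2", "bolt", "yellow"),
    ("s4", "Ace", "LA", "nov 9", "p4", "washer", "red"),
]


def project(rows, columns, keep):
    """Hypothetical helper: keep only the named columns, drop duplicate rows."""
    idx = [columns.index(c) for c in keep]
    result = []
    for row in rows:
        sub = tuple(row[i] for i in idx)
        if sub not in result:  # preserve first-seen order
            result.append(sub)
    return result


supplier = project(bad_table, COLUMNS, ("Sno", "Sname", "Location"))
part = project(bad_table, COLUMNS, ("Pno", "Pdesc", "Color"))
supplies = project(bad_table, COLUMNS, ("Sno", "Pno", "O_date"))
```

Each fact about a supplier or a part now appears once. Supplier s5 from the Supplier table on the earlier slide does not appear, because the bad table has no row for it.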
Bad Tables can be Useful, if not Good:
• The previous table, called bad, is so only if it is a permanent table. As part of an on-going query it is not considered bad since data is not stored permanently in this format.
• Suppose you never, ever expect to look at the supplies table information without knowing the name of the supplier and the part supplied. If the join table does not exist then you will always have to construct the join. This can be time consuming and to save that time you might keep the table in pre-joined, permanent form.
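As a rough sketch of what "constructing the join" involves, the pre-joined table can be rebuilt from the three good tables by looking up the supplier and part for each Supplies row. The dictionary layout and the function name build_join are assumptions made for this example.

```python
# The three good tables, keyed by their primary keys where that is convenient.
supplier = {"s1": ("Acme", "NY"), "s2": ("Ajax", "Bos"), "s3": ("Apex", "Chi"),
            "s4": ("Ace", "LA"), "s5": ("A-1", "Phil")}
part = {"p1": ("screw", "red"), "p2": ("bolt", "yellow"),
        "p3": ("nut", "green"), "p4": ("washer", "red")}
supplies = [("s1", "p1", "nov 3"), ("s2", "p2", "nov 4"), ("s3", "p1", "nov 5"),
            ("s3", "p3", "nov 6"), ("s4", "p1", "nov 7"), ("s4", "p2", "nov 8"),
            ("s4", "p4", "nov 9")]


def build_join(supplier, part, supplies):
    """Hypothetical helper: one joined row per Supplies row."""
    joined = []
    for sno, pno, o_date in supplies:
        sname, location = supplier[sno]  # look up the supplier facts
        pdesc, colour = part[pno]        # look up the part facts
        joined.append((sno, sname, location, o_date, pno, pdesc, colour))
    return joined


joined = build_join(supplier, part, supplies)
```

This work is repeated on every query unless the joined form is stored, which is the time saving the slide refers to.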
What’s so Bad about a Bad Table?
• Suppose instead of three tables
Part
Pno  Pdesc   Colour
p1   screw   red
p2   bolt    yellow
p3   nut     green
p4   washer  red
Supplier
Sno  Sname  Location
s1   Acme   NY
s2   Ajax   Bos
s3   Apex   Chi
s4   Ace    LA
s5   A-1    Phil
Supplies
Sno  Pno  O_date
s1   p1   nov 3
s2   p2   nov 4
s3   p1   nov 5
s3   p3   nov 6
s4   p1   nov 7
s4   p2   nov 8
s4   p4   nov 9
we only have one table
Supplier Part Supplies
Sno  Sname  Location  O_date  Pno  Pdesc   Color
s1   Acme   NY        nov 3   p1   screw   red
s2   Ajax   Bos       nov 4   p2   bolt    yellow
s3   Apex   Chi       nov 5   p1   screw   red
s3   Apex   Chi       nov 6   p3   nut     green
s4   Ace    LA        nov 7   p1   screw   red
s4   Ace    LA        nov 8   p2   bolt    yellow
s4   Ace    LA        nov 9   p4   washer  red
Insert Anomaly:
• We want to add supplier A-1 to the database but for now we have no parts that A-1 supplies. Since the key to the table is (Sno,Pno) we can’t add a row until we have values for both Sno and Pno.
Supplier Part Supplies
Sno  Sname  Location  O_date  Pno   Pdesc   Color
s1   Acme   NY        nov 3   p1    screw   red
s2   Ajax   Bos       nov 4   p2    bolt    yellow
s3   Apex   Chi       nov 5   p1    screw   red
s3   Apex   Chi       nov 6   p3    nut     green
s4   Ace    LA        nov 7   p1    screw   red
s4   Ace    LA        nov 8   p2    bolt    yellow
s4   Ace    LA        nov 9   p4    washer  red
s5   A-1    Phil      null    null  null    null     <- not permitted
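The rejected insert can be reproduced with Python's built-in sqlite3 module. This is a sketch under assumptions: the table name SupplierPartSupplies is invented here, and NOT NULL is written out explicitly on the key columns so that the database refuses a missing Pno.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE SupplierPartSupplies (
        Sno TEXT NOT NULL, Sname TEXT, Location TEXT,
        O_date TEXT,
        Pno TEXT NOT NULL, Pdesc TEXT, Color TEXT,
        PRIMARY KEY (Sno, Pno)
    )
""")

try:
    # Supplier A-1 has no parts yet, so Pno (part of the key) is unknown.
    conn.execute(
        "INSERT INTO SupplierPartSupplies VALUES (?, ?, ?, ?, ?, ?, ?)",
        ("s5", "A-1", "Phil", None, None, None, None),
    )
    insert_rejected = False
except sqlite3.IntegrityError:
    insert_rejected = True  # the key column Pno may not be null
```

With a separate Supplier table whose key is Sno alone, the same supplier could be added without mentioning any part.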
Update Anomaly:
• What happens if the part, p2, changes its color from yellow to purple?
• We must search every row of the join-table and change every instance of yellow to purple in rows involving the supplying of part, p2.
Supplier Part Supplies
Sno  Sname  Location  O_date  Pno  Pdesc   Color
s1   Acme   NY        nov 3   p1   screw   red
s2   Ajax   Bos       nov 4   p2   bolt    yellow   <- change
s3   Apex   Chi       nov 5   p1   screw   red
s3   Apex   Chi       nov 6   p3   nut     green
s4   Ace    LA        nov 7   p1   screw   red
s4   Ace    LA        nov 8   p2   bolt    yellow   <- change
s4   Ace    LA        nov 9   p4   washer  red
multiple changes
Update Anomaly (cont):
• This one change in the real world makes for many changes in the database.
• What if we mess up and end up not making all changes?
• Now, what color is p2?
Supplier Part Supplies
Sno  Sname  Location  O_date  Pno  Pdesc   Color
s1   Acme   NY        nov 3   p1   screw   red
s2   Ajax   Bos       nov 4   p2   bolt    yellow
s3   Apex   Chi       nov 5   p1   screw   red
s3   Apex   Chi       nov 6   p3   nut     green
s4   Ace    LA        nov 7   p1   screw   red
s4   Ace    LA        nov 8   p2   bolt    yellow
s4   Ace    LA        nov 9   p4   washer  red
If only one of the two p2 rows is changed to purple, the table shows two different colors for p2.
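A small Python sketch of the incomplete update, keeping only the columns that matter here (Sno, Pno, Color). Which p2 row gets missed is an arbitrary choice made for the illustration, and the helper name colours_of is hypothetical.

```python
# Joined table reduced to the columns relevant to the update.
joined = [
    {"Sno": "s1", "Pno": "p1", "Color": "red"},
    {"Sno": "s2", "Pno": "p2", "Color": "yellow"},
    {"Sno": "s3", "Pno": "p1", "Color": "red"},
    {"Sno": "s3", "Pno": "p3", "Color": "green"},
    {"Sno": "s4", "Pno": "p1", "Color": "red"},
    {"Sno": "s4", "Pno": "p2", "Color": "yellow"},
    {"Sno": "s4", "Pno": "p4", "Color": "red"},
]


def colours_of(rows, pno):
    """Hypothetical helper: every colour the table records for one part."""
    return {row["Color"] for row in rows if row["Pno"] == pno}


# A botched update: only the first p2 row is changed.
for row in joined:
    if row["Pno"] == "p2":
        row["Color"] = "purple"
        break

p2_colours_joined = colours_of(joined, "p2")  # now holds two colours

# In the Part table the colour is stored once, so one change is the whole job.
part_colour = {"p1": "red", "p2": "yellow", "p3": "green", "p4": "red"}
part_colour["p2"] = "purple"
```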
Delete Anomaly:
• What if we cancel the order for bolts from supplier, s2?
• A consequence is that we lose all information about the supplier, s2.
Supplier Part Supplies
Sno  Sname  Location  O_date  Pno  Pdesc   Color
s1   Acme   NY        nov 3   p1   screw   red
s2   Ajax   Bos       nov 4   p2   bolt    yellow
s3   Apex   Chi       nov 5   p1   screw   red
s3   Apex   Chi       nov 6   p3   nut     green
s4   Ace    LA        nov 7   p1   screw   red
s4   Ace    LA        nov 8   p2   bolt    yellow
s4   Ace    LA        nov 9   p4   washer  red
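The loss can be sketched in Python by deleting the one row for supplier s2 and then asking which suppliers the table still knows about. Only the Sno, Sname, Location and Pno columns are kept, and the variable names are assumptions for this example.

```python
# Joined table reduced to (Sno, Sname, Location, Pno).
joined = [
    ("s1", "Acme", "NY", "p1"), ("s2", "Ajax", "Bos", "p2"),
    ("s3", "Apex", "Chi", "p1"), ("s3", "Apex", "Chi", "p3"),
    ("s4", "Ace", "LA", "p1"), ("s4", "Ace", "LA", "p2"),
    ("s4", "Ace", "LA", "p4"),
]

# Cancel the order for bolts (p2) from supplier s2.
joined_after = [row for row in joined if (row[0], row[3]) != ("s2", "p2")]
suppliers_in_joined = {row[0] for row in joined_after}  # s2 is gone entirely

# With separate tables, cancelling the order touches Supplies only.
supplier = {"s1": ("Acme", "NY"), "s2": ("Ajax", "Bos"),
            "s3": ("Apex", "Chi"), "s4": ("Ace", "LA"), "s5": ("A-1", "Phil")}
supplies = [("s1", "p1"), ("s2", "p2"), ("s3", "p1"), ("s3", "p3"),
            ("s4", "p1"), ("s4", "p2"), ("s4", "p4")]
supplies_after = [pair for pair in supplies if pair != ("s2", "p2")]
```

After the delete, the name and location of s2 survive in Supplier even though s2 has no orders left.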
So What Can Be Done?
• Suppose we keep these tables
Part
Pno  Pdesc   Colour
p1   screw   red
p2   bolt    yellow
p3   nut     green
p4   washer  red
Supplier
Sno  Sname  Location
s1   Acme   NY
s2   Ajax   Bos
s3   Apex   Chi
s4   Ace    LA
s5   A-1    Phil
Supplies
Sno  Pno  O_date
s1   p1   nov 3
s2   p2   nov 4
s3   p1   nov 5
s3   p3   nov 6
s4   p1   nov 7
s4   p2   nov 8
s4   p4   nov 9
and we also keep this table
Supplier Part Supplies
Sno  Sname  Location  O_date  Pno  Pdesc   Color
s1   Acme   NY        nov 3   p1   screw   red
s2   Ajax   Bos       nov 4   p2   bolt    yellow
s3   Apex   Chi       nov 5   p1   screw   red
s3   Apex   Chi       nov 6   p3   nut     green
s4   Ace    LA        nov 7   p1   screw   red
s4   Ace    LA        nov 8   p2   bolt    yellow
s4   Ace    LA        nov 9   p4   washer  red
Any ideas?
So What Can Be Done? (cont)
• There is the issue of data consistency.
• Given the same information stored in several places it becomes a big job to make sure this data is consistent.
• If we lose data consistency then all the data essentially becomes “noise”.
Some Notation:
• A table is sometimes called a relation.
– We use R, S and T and nearby letters to represent tables.
• Table columns are also called attributes.
– We use A, B and C and nearby letters to represent columns.
• The possible values in a column A of table R are called the domain of A, dom(A).
• Table schemas are lists of table columns.
– We use R, S and T to represent schemas.
Some Notation (cont):
• Table rows are also called tuples.
– We use r, s and t and nearby letters to represent rows.
• Subsets of a table schema are represented by X, Y and Z and nearby letters.
– X ⊆ R is a subset of the list of all columns in a table.
• r[A] is the value in row r column A.
• r[X] is the subrow of r consisting of the values in the columns of X.
Example:
R
A   B   C   D   E   F
a1  b1  c1  d1  e1  f1
a2  b2  c2  d2  e2  f2   <- r
a3  b3  c3  d3  e3  f3
. . .
an  bn  cn  dn  en  fn
Table Name: Letter from middle of alphabet – R, S, T
Column Name: Letter from beginning of alphabet – A, B, C
R = { A, B, C, D, E, F }; the schema of R
X = { A, B, C } ⊆ R: X is a subset of the schema; a letter at the end of the alphabet
r = ( a2, b2, c2, d2, e2, f2 )
r[A] = (a2); a singleton tuple
r[X] = ( a2, b2, c2 ); a subset of r
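The r[A] and r[X] notation maps naturally onto a lookup by column name. The following Python sketch assumes a row is stored as a dictionary from column name to value; the function name subrow is hypothetical.

```python
# Schema of R and the row r from the example.
R = ("A", "B", "C", "D", "E", "F")
r = dict(zip(R, ("a2", "b2", "c2", "d2", "e2", "f2")))


def subrow(row, columns):
    """r[X]: the values of `row` in the given columns, in that order."""
    return tuple(row[col] for col in columns)


X = ("A", "B", "C")       # X is a subset of the schema R
r_A = subrow(r, ("A",))   # r[A], a singleton tuple
r_X = subrow(r, X)        # r[X]
```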
What is a Key to a Table?
• A key is a set of columns of a table whose values uniquely identify distinct rows of the table.
• A key is a set of columns of a table such that if you know the values of the columns in the key, there is at most one row in the table with these values.
Def’n: For any table R, if X is a subset of R, then X is a key to the table R if the following is true: for any two rows r and s of R, if r[X] = s[X] then r = s.
In other words, any two rows that agree on X agree everywhere; they are the same row.
Supplier
Sno  Sname  Location
s1   Acme   NY
s2   Ajax   Bos
s3   Apex   Chi
s4   Ace    LA
s5   A-1    Phil
One key to Supplier is {Sno}; are there any others?
What about {Sno, Location}?
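The definition can be checked mechanically against the rows currently in a table. The Python sketch below does that; the function name is an assumption, and passing the check for one set of rows does not by itself make X a key, since a key has to hold for every set of rows the table may ever contain.

```python
SCHEMA = ("Sno", "Sname", "Location")
supplier = [
    ("s1", "Acme", "NY"), ("s2", "Ajax", "Bos"), ("s3", "Apex", "Chi"),
    ("s4", "Ace", "LA"), ("s5", "A-1", "Phil"),
]


def agrees_with_key_definition(rows, schema, X):
    """True if any two of the given rows that agree on X are the same row."""
    idx = [schema.index(col) for col in X]
    seen = {}
    for row in rows:
        sub = tuple(row[i] for i in idx)   # r[X]
        if sub in seen and seen[sub] != row:
            return False                   # r[X] = s[X] but r != s
        seen[sub] = row
    return True


sno_ok = agrees_with_key_definition(supplier, SCHEMA, ("Sno",))
sno_location_ok = agrees_with_key_definition(supplier, SCHEMA, ("Sno", "Location"))

# Two different rows with the same Sno would violate the definition.
clash = supplier + [("s1", "Other", "LA")]
clash_ok = agrees_with_key_definition(clash, SCHEMA, ("Sno",))
```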
Not all Columns are Keys:
• Why isn’t {Location} a key to Supplier?
• Because at some point in the future we may add a new supplier who comes from Boston, for example.
Supplier
Sno  Sname  Location
s1   Acme   NY
s2   Ajax   Bos
s3   Apex   Chi
s4   Ace    LA
s5   A-1    Phil
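A short Python sketch of this point: Location values happen to be distinct in today's rows, but adding one more supplier from Boston breaks that. The supplier s6 named Atlas is invented for the illustration.

```python
supplier = [
    ("s1", "Acme", "NY"), ("s2", "Ajax", "Bos"), ("s3", "Apex", "Chi"),
    ("s4", "Ace", "LA"), ("s5", "A-1", "Phil"),
]


def all_distinct(values):
    """True if no value occurs more than once."""
    values = list(values)
    return len(set(values)) == len(values)


location_unique_now = all_distinct(row[2] for row in supplier)

# A hypothetical new supplier who also comes from Boston.
supplier.append(("s6", "Atlas", "Bos"))

location_unique_later = all_distinct(row[2] for row in supplier)
sno_unique_later = all_distinct(row[0] for row in supplier)
```

Uniqueness in the current data is therefore not enough; what matters is whether some rule guarantees it for all future data.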
Keys, Keys and More Keys:
• We saw earlier that {Sno, Location} is also a key to Supplier. It is called a superkey.
Supplier
Sno  Sname  Location
s1   Acme   NY
s2   Ajax   Bos
s3   Apex   Chi
s4   Ace    LA
s5   A-1    Phil
Def’n: For a table R, if X is a key of R and X ⊆ Y ⊆ R, then Y is a superkey of R.
Any set of columns that contains a key to a table is a superkey of the same table.
Superkeys are keys too.
Supplier has many superkeys: {Sno}, {Sno,Sname}, {Sno,Location} and {Sno,Sname,Location}, the whole schema of Supplier.
Exercise:
• Prove that any table has at least one superkey.
• Answer: The schema itself is a superkey.
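Given one known key, the superkeys are exactly the column sets that contain it, so they can be listed by brute force. The Python sketch below does this for Supplier with the key {Sno}; the function name superkeys is an assumption.

```python
from itertools import combinations


def superkeys(schema, key):
    """All column sets Y with key ⊆ Y ⊆ schema, smallest first."""
    key = frozenset(key)
    result = []
    for size in range(len(schema) + 1):
        for cols in combinations(schema, size):
            if key <= frozenset(cols):   # Y contains the key
                result.append(cols)
    return result


supplier_schema = ("Sno", "Sname", "Location")
supplier_superkeys = superkeys(supplier_schema, {"Sno"})
```

The last entry produced is the full schema, which is the superkey that every table is guaranteed to have.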
Keys, Keys and More Keys:
• Some keys are smaller (fewer columns) than others.
• Some keys can’t be made any smaller (fewer columns).
Supplier
Sno  Sname  Location
s1   Acme   NY
s2   Ajax   Bos
s3   Apex   Chi
s4   Ace    LA
s5   A-1    Phil
Def’n: For a table R, if X is a key of R and for any proper subset Y ⊂ X, we know that Y is not a key of R, then X is called a candidate key of R.
The only candidate key to Supplier is {Sno}.
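Minimality can be tested by checking every proper subset of a key. In the Python sketch below, the rule "a column set is a key of Supplier exactly when it contains Sno" is supplied as a function; that encoding and the names used are assumptions for illustration.

```python
from itertools import combinations


def is_candidate_key(X, is_key):
    """True if X is a key and no proper subset of X is a key."""
    X = tuple(X)
    if not is_key(frozenset(X)):
        return False
    for size in range(len(X)):               # sizes 0 .. len(X)-1: proper subsets
        for sub in combinations(X, size):
            if is_key(frozenset(sub)):
                return False                 # a smaller key exists
    return True


def supplier_rule(cols):
    """Assumed rule for Supplier: a column set is a key iff it contains Sno."""
    return "Sno" in cols


sno_is_candidate = is_candidate_key(("Sno",), supplier_rule)
sno_location_is_candidate = is_candidate_key(("Sno", "Location"), supplier_rule)
location_is_candidate = is_candidate_key(("Location",), supplier_rule)
```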
Candidate Keys:
• What made us decide {Sno} was a candidate key of Supplier?
– We know that no two suppliers were assigned the same number.
• What makes us say that {Location} is not a key to Supplier?
– We know there is no rule saying suppliers must come from different locations.
• What makes us decide that something is a key is a rule about the real world that makes it so. We call such a rule an Enterprise Rule.
Multiple Candidate keys (1):
• In the table below there are 2 possible candidate keys: {StudentID} and {SSN}
Student
StudentID  SSN  Fname  Lname  DOB  Address
Multiple Candidate keys (2):
• In the table below there are 2 possible candidate keys: {CourseID, SectionID} and {RoomNum, BldgID, TimeSlot}
• Candidate keys don’t need to be the same size.
Fall08RoomAssignments
CourseID  SectionID  RoomNum  BldgID  TimeSlot
Primary key:
• What do you do when you have candidates?
– Hold an election.
• The candidate key that wins the election is called the primary key. There is only one primary key in a table.
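In SQL the outcome of the election is usually written down as a PRIMARY KEY constraint, while a losing candidate key can still be protected with UNIQUE. The sketch below uses Python's sqlite3 module and the Student table from the earlier slide; choosing StudentID as the winner and the sample values are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Student (
        StudentID TEXT PRIMARY KEY,       -- the elected candidate key
        SSN       TEXT NOT NULL UNIQUE,   -- the other candidate key
        Fname TEXT, Lname TEXT, DOB TEXT, Address TEXT
    )
""")
conn.execute("INSERT INTO Student (StudentID, SSN) VALUES ('st1', '000-00-0001')")


def try_insert(student_id, ssn):
    """Return True if the row is accepted, False if a key constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO Student (StudentID, SSN) VALUES (?, ?)", (student_id, ssn)
        )
        return True
    except sqlite3.IntegrityError:
        return False


duplicate_id_ok = try_insert("st1", "000-00-0002")    # repeats the primary key
duplicate_ssn_ok = try_insert("st2", "000-00-0001")   # repeats the other key
new_student_ok = try_insert("st2", "000-00-0002")     # both keys are new
```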
Primary Key Examples:
• What are the primary keys of each of the tables below?
Supplier (pk = {Sno})
Sno  Sname  Location
s1   Acme   NY
s2   Ajax   Bos
s3   Apex   Chi
s4   Ace    LA
s5   A-1    Phil
Part (pk = {Pno})
Pno  Pdesc   Colour
p1   screw   red
p2   bolt    yellow
p3   nut     green
p4   washer  red
Supplies (pk = {Sno,Pno,O_date} or {Sno,Pno})
Sno  Pno  O_date
s1   p1   nov 3
s2   p2   nov 4
s3   p1   nov 5
s3   p3   nov 6
s4   p1   nov 7
s4   p2   nov 8
s4   p4   nov 9
Which of the two is the primary key of Supplies is determined by the Enterprise Rules; for example, {Sno,Pno} works only if a supplier never supplies the same part on more than one date.
Foreign Keys:
• What makes someone a foreigner?
– Being physically in a country other than their own.
• What makes a set of columns a foreign key?
– Columns are a foreign key if they are a primary key in some other table.
• In the Supplies table both {Sno} and {Pno} are foreign keys because they are primary keys in other tables, Supplier and Part respectively.
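A database can be told about foreign keys so that it refuses values with no matching primary key. The sketch below uses Python's sqlite3 module, where enforcement has to be switched on per connection with PRAGMA foreign_keys; choosing {Sno,Pno} as the primary key of Supplies and the supplier number s9 are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforcement is off by default in SQLite
conn.executescript("""
    CREATE TABLE Supplier (Sno TEXT PRIMARY KEY, Sname TEXT, Location TEXT);
    CREATE TABLE Part (Pno TEXT PRIMARY KEY, Pdesc TEXT, Colour TEXT);
    CREATE TABLE Supplies (
        Sno TEXT NOT NULL REFERENCES Supplier(Sno),  -- foreign key
        Pno TEXT NOT NULL REFERENCES Part(Pno),      -- foreign key
        O_date TEXT,
        PRIMARY KEY (Sno, Pno)
    );
    INSERT INTO Supplier VALUES ('s1', 'Acme', 'NY');
    INSERT INTO Part VALUES ('p1', 'screw', 'red');
""")


def try_supply(sno, pno, o_date):
    """Return True if the Supplies row is accepted, False if it is rejected."""
    try:
        conn.execute("INSERT INTO Supplies VALUES (?, ?, ?)", (sno, pno, o_date))
        return True
    except sqlite3.IntegrityError:
        return False


known_supplier_ok = try_supply("s1", "p1", "nov 3")
unknown_supplier_ok = try_supply("s9", "p1", "nov 4")  # s9 is not in Supplier
```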
What are the Foreign Keys?
Cardholder
borrowerid  b_name  b_addr  b_status  loan_limit
Reserves
borrowerid  isbn  r_date
Book
isbn  author  title  pub_name  pub_date  c_price
Copy
accession_no  isbn  p_price
Borrows
borrowerid  accession_no  l_date
(On the slide each table's primary key is marked pk and five columns are marked fk.)