DST Revision – Week 1: Entity-Relationship Modelling, Database Design, Normalisation

Uploaded by steven-bowman, posted on 01-Jan-2016


Page 1: DST Revision – Week 1

DST Revision – Week 1

Entity-Relationship Modelling, Database design

Normalisation

Page 2: DST Revision – Week 1

Assessed Components

• Assignment – due next week

• Exam – Jan 21st, two hours
  – 30 multiple choice questions (1 mark each)
  – Two long questions (35 marks each)

Page 3: DST Revision – Week 1

ER Diagrams

A tool for Conceptual Data Modelling

Page 4: DST Revision – Week 1

An Entity-Relationship Diagram

Page 5: DST Revision – Week 1

What’s wrong with this?

Page 6: DST Revision – Week 1

Discovering Data Entities

• Never confuse data entities with other elements of the problem to be solved

• A true data entity will have many possible instances, each with a distinguishing characteristic

• Treasurer is the person entering data – and data about the treasurer has nothing whatsoever to do with this problem

• Is the expense report entity necessary? No - it is only the result of extracting data from the database. Even though there will be multiple instances of expense reports given to the treasurer over time, data needed to compute the report contents each time are already represented by the ACCOUNT and EXPENSE entity types

• “Gives-to” and “Receives” are business activities, not relationships between entities.

Page 7: DST Revision – Week 1

The Correct E-R Model

Page 8: DST Revision – Week 1

Attributes and Weak Entities

Page 9: DST Revision – Week 1

Strong & Weak Entities

Most entities are classified as strong entity types [Rectangle] – ones that exist independently from other entity types (such as EMPLOYEE)

These always have a unique characteristic (an attribute or combination of attributes) that uniquely distinguishes each occurrence of that entity

A weak entity type [[Double Rectangle]] depends on some other entity type. It has no meaning in the ER diagram without the entity on which it depends (such as DEPENDENT)

The entity type on which the weak entity type depends is called the Identifying owner (or owner for short).

Page 10: DST Revision – Week 1

Weak entities

The Identifying relationship is the relationship between a weak entity type and its owner (such as ‘Has’ in the previous slide)

The weak entity identifier is its partial identifier (double underline) combined with that of its owner. During a later design stage, Dependent_Name will be combined with Employee_ID (the identifier of the owner) to form a full identifier for DEPENDENT.
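A minimal sketch of where this leads at the relational stage, assuming SQLite through Python's standard sqlite3 module (the slides do not name a DBMS). The Date_of_Birth column and the sample rows are hypothetical. The full identifier of DEPENDENT is the owner's key plus the partial identifier, and the foreign key stops a dependent existing without its owner.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE EMPLOYEE (
    Employee_ID   INTEGER PRIMARY KEY,
    Employee_Name TEXT NOT NULL
);
-- Weak entity: full identifier = owner's identifier + partial identifier.
CREATE TABLE DEPENDENT (
    Employee_ID    INTEGER NOT NULL REFERENCES EMPLOYEE(Employee_ID) ON DELETE CASCADE,
    Dependent_Name TEXT    NOT NULL,
    Date_of_Birth  TEXT,   -- hypothetical extra attribute
    PRIMARY KEY (Employee_ID, Dependent_Name)
);
""")
conn.execute("INSERT INTO EMPLOYEE VALUES (1, 'Ann')")
conn.execute("INSERT INTO DEPENDENT VALUES (1, 'Tom', '2010-05-01')")
conn.execute("INSERT INTO DEPENDENT VALUES (1, 'Sue', '2012-09-14')")

# A dependent whose owner does not exist is rejected.
try:
    conn.execute("INSERT INTO DEPENDENT VALUES (99, 'Max', NULL)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Deleting the owner also removes its dependents (they have no meaning alone).
conn.execute("DELETE FROM EMPLOYEE WHERE Employee_ID = 1")
remaining = conn.execute("SELECT COUNT(*) FROM DEPENDENT").fetchone()[0]
print(orphan_rejected, remaining)  # True 0
```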

Page 11: DST Revision – Week 1

Attributes

• An attribute is a property or characteristic of an entity type, for example the entity EMPLOYEE may have attributes Employee_Name and Employee_Address.

• In ER diagrams (drawn in this way) place the attribute name in an ellipse with a line connecting it to its associated entity

• Attributes may also be associated with relationships

• An attribute is associated with exactly one entity or relationship

Page 12: DST Revision – Week 1

A Composite Attribute

Page 13: DST Revision – Week 1

Simple and Composite Attributes

• Some attributes can be broken down into meaningful component parts, such as Address, which can be broken down into Street_Address, Town, Postcode, etc.

• The component attributes may appear above or below the composite attribute on an ER diagram

• Composite attributes provide flexibility to users: they can refer to the attribute as a single unit or to its individual components

• A simple (atomic) attribute is one that cannot be broken down into smaller components

Page 14: DST Revision – Week 1

An Entity with a Multivalued attribute (Skill) and a derived attribute (Years_Employed)

Page 15: DST Revision – Week 1

Multivalued Attributes

• An attribute that may have more than one value for a given instance, e.g. EMPLOYEE may have more than one Skill.

• A multivalued attribute is one that may take on more than one value – it is represented by an ellipse with double lines

Page 16: DST Revision – Week 1

Derived Attributes

• Some attribute values can be calculated or derived from others e.g., if Years_Employed needs to be calculated for EMPLOYEE, it can be calculated using Date_Employed and Today's_Date

• A derived attribute is one whose value can be calculated from related attribute values (plus possibly other data not in the database)

• A derived attribute is signified by an ellipse with a dashed line (see previous Fig.)
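Because a derived attribute is computed rather than stored, it can be expressed as a small function of the stored attributes. A minimal sketch in Python, assuming Years_Employed means completed whole years (the slides do not define how partial years are counted):

```python
from datetime import date

def years_employed(date_employed: date, today: date) -> int:
    """Derive Years_Employed from the stored Date_Employed and today's date.

    Counts completed years only: one is subtracted if this year's
    anniversary of Date_Employed has not yet been reached.
    """
    years = today.year - date_employed.year
    if (today.month, today.day) < (date_employed.month, date_employed.day):
        years -= 1
    return years

print(years_employed(date(2010, 9, 1), date(2016, 1, 1)))  # 5
```

Only Date_Employed would be stored; the value is recomputed whenever it is needed, so it can never go out of date.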

Page 17: DST Revision – Week 1

Simple Identifier attribute (Key)

Page 18: DST Revision – Week 1

Identifier Attribute

• Identifier attribute or Key is an attribute (or combination of attributes) that uniquely identifies individual instances of an entity type, such as Student_ID

• To be a candidate identifier, each entity instance must have a single value for the attribute, and the attribute must be associated with each entity

• The identifier attribute is underlined, such as Student_ID

Page 19: DST Revision – Week 1

Composite Identifier

• A Composite Identifier is an identifier made up of several attributes; it is used when no single (atomic) attribute can serve as an identifier

• For example, in a database that tracks flights, Flight_ID is a composite identifier that has component attributes Flight_Number and Date – this combination is required to uniquely identify individual occurrences of Flight

• Flight_ID is underlined, whilst its components are not (see next slide)
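A minimal sketch of the flight example as a table with a composite primary key, assuming SQLite through Python's sqlite3 module. The column Flight_Date stands in for the slide's Date attribute (DATE is a reserved word in standard SQL), and the flight numbers are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE FLIGHT (
    Flight_Number TEXT NOT NULL,
    Flight_Date   TEXT NOT NULL,   -- the slide's 'Date' component
    PRIMARY KEY (Flight_Number, Flight_Date)  -- neither part is unique on its own
)""")
conn.execute("INSERT INTO FLIGHT VALUES ('BA123', '2016-01-21')")
conn.execute("INSERT INTO FLIGHT VALUES ('BA123', '2016-01-22')")  # same number, new date: allowed

# Repeating the same (Flight_Number, Flight_Date) combination is rejected.
try:
    conn.execute("INSERT INTO FLIGHT VALUES ('BA123', '2016-01-21')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

flights = conn.execute("SELECT COUNT(*) FROM FLIGHT").fetchone()[0]
print(duplicate_rejected, flights)  # True 2
```

The explicit NOT NULL matters in SQLite: unlike most DBMSs, it allows NULLs in non-integer primary key columns unless told otherwise.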

Page 20: DST Revision – Week 1

Composite key attribute

Page 21: DST Revision – Week 1

Criteria for selecting identifiers

Some entities have more than one candidate identifier, so the following criteria should be used:

Choose an identifier that will not change in value over the life of each instance of the entity type

Choose an identifier that is guaranteed to have valid values and will not be null (or unknown). If it is composite, make sure all parts will have valid values

Page 22: DST Revision – Week 1

Selecting identifiers

Don’t pick identifiers whose structure indicates classifications, locations or people that might change. e.g. the first two digits of an identifier may indicate a warehouse location, but such codes are often changed as conditions change, rendering them invalid.

Consider substituting new, simple identifiers for long, composite ones, e.g. an attribute called Game_Number could be used for the entity type GAME instead of Home_Team and Away_Team

Page 23: DST Revision – Week 1

Relationships

• A relationship is an association among the instances of one or more entity types that is of interest to the organisation

• Relationship Type is a meaningful association between (or among) entities – implying that the relationship allows us to answer questions that could not be answered given only the entity types. It is denoted by a diamond symbol

Page 24: DST Revision – Week 1

Relationship types and instances

(a) Relationship type (Completes)

Page 25: DST Revision – Week 1

Attributes on relationships

• Attributes may be associated with a relationship, as well as with an entity

• For example an organisation may want to record the date when an employee completes each course

• In the following diagram, the relationship ‘Completes’ joins the EMPLOYEE and COURSE entities, and Date_Completed is joined to this as it is a property of the relationship ‘Completes’

Page 26: DST Revision – Week 1

Attribute on a Relationship

Page 27: DST Revision – Week 1

Associative entities (gerunds)

• The presence of one or more attributes on a relationship suggests that the relationship should be represented as an entity type

• An associative entity is an entity type that associates the instances of one or more entity types and contains attributes that are specific to the relationship between those entity instances.

• The associative entity type CERTIFICATE is represented with the diamond relationship symbol enclosed within the entity rectangle

Page 28: DST Revision – Week 1

Associative Entities

• The following figure shows the relationship ‘Completes’ converted to an associative entity type.

• A CERTIFICATE is awarded to each EMPLOYEE who completes a COURSE; each certificate has a Certificate_Number that serves as the identifier

Page 29: DST Revision – Week 1

(b) An associative entity (CERTIFICATE)

Page 30: DST Revision – Week 1

Associative Entities (gerunds)

• The purpose of the special symbol is to preserve the information that the entity was initially specified as a relationship on the ER diagram

• There is no relationship diamond on the line between an associative entity and a strong entity, because the associative entity represents the relationship

Page 31: DST Revision – Week 1

Associative entities

How do you know when to convert a relationship to an associative entity type? Four conditions should exist:

• All of the relationships for the participating entity types are ‘many’ relationships

• The resulting associative entity type has independent meaning to end-users, and can preferably be identified with a single-attribute identifier

• The associative entity has one or more attributes in addition to the identifier

• The associative entity participates in one or more relationships independent of the entities related in the associated relationship

Page 32: DST Revision – Week 1

Degree of a Relationship

• The degree of a relationship is the number of entity types that participate in it

• Thus ‘Completes’ has degree 2, since there are two participating entity types, EMPLOYEE and COURSE

• The three most common relationship degrees are unary (degree 1), binary (degree 2) and ternary (degree 3)

• Higher degree relationships are possible but rarely encountered in practice

Page 33: DST Revision – Week 1

Unary relationship

• A unary relationship is between the instances of a single entity type (also called a recursive relationship)

• ‘Is_Married_To’ is a one-to-one relationship between instances of the PERSON entity type

• ‘Manages’ is a one-to-many relationship between instances of the EMPLOYEE entity type

Page 34: DST Revision – Week 1

Binary relationships

• A binary relationship is between the instances of two entity types, and is the most common type of relationship encountered in data modelling. e.g. (one-to-one) an EMPLOYEE is assigned one PARKING_PLACE, and each PARKING_PLACE is assigned to one EMPLOYEE

• e.g. (one-to-many) a PRODUCT_LINE may contain many PRODUCTS, and each PRODUCT belongs to only one PRODUCT_LINE

• e.g. (many-to-many) a STUDENT may register for more than one COURSE, and each COURSE may have many STUDENTS

Page 35: DST Revision – Week 1

Ternary relationships

• A ternary relationship is a simultaneous relationship among the instances of three entity types

• Let’s see this in an E-R diagram

Page 36: DST Revision – Week 1

Ternary Relationships

Page 37: DST Revision – Week 1

Ternary Relationships

• Vendors can supply various parts to warehouses

• The relationship ‘Supplies’ is used to record the specific PARTs supplied by a given VENDOR to a particular WAREHOUSE

• There are two attributes on the relationship ‘Supplies’, Shipping_Mode and Unit_Cost

• e.g. one instance of ‘Supplies’ might record that VENDOR X can ship PART C to WAREHOUSE Y, that the Shipping_Mode is ‘next_day_air’ and that the Unit_Cost is £5.00 per unit

Page 38: DST Revision – Week 1

Ternary relationships

• We do not use diamond symbols on the lines from SUPPLY_SCHEDULE to the three entities, because these lines do not represent binary relationships

• It is recommended that all ternary (or higher) relationships are converted into associative entities (as in the slide), as it makes the representation of participation constraints (discussed later) easier

• Many CASE tools cannot represent ternary relationships, so you must represent the ternary relationship with an associative entity and three binary relationships
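A minimal sketch of SUPPLY_SCHEDULE as an associative entity, assuming SQLite through Python's sqlite3 module. The key column names (Vendor_ID, Part_ID, Warehouse_ID) and the choice of all three as a composite primary key are assumptions; the instance data is the Vendor X / Part C / Warehouse Y example from the slides. Each row must reference exactly one vendor, part and warehouse, which is the mandatory one cardinality next to each strong entity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE VENDOR    (Vendor_ID    TEXT PRIMARY KEY);
CREATE TABLE PART      (Part_ID      TEXT PRIMARY KEY);
CREATE TABLE WAREHOUSE (Warehouse_ID TEXT PRIMARY KEY);
-- Associative entity: one row per (vendor, part, warehouse) combination,
-- carrying the relationship's own attributes.
CREATE TABLE SUPPLY_SCHEDULE (
    Vendor_ID     TEXT NOT NULL REFERENCES VENDOR(Vendor_ID),
    Part_ID       TEXT NOT NULL REFERENCES PART(Part_ID),
    Warehouse_ID  TEXT NOT NULL REFERENCES WAREHOUSE(Warehouse_ID),
    Shipping_Mode TEXT,
    Unit_Cost     REAL,
    PRIMARY KEY (Vendor_ID, Part_ID, Warehouse_ID)
);
INSERT INTO VENDOR VALUES ('X');
INSERT INTO PART VALUES ('C');
INSERT INTO WAREHOUSE VALUES ('Y');
INSERT INTO SUPPLY_SCHEDULE VALUES ('X', 'C', 'Y', 'next_day_air', 5.00);
""")
row = conn.execute("""
    SELECT Shipping_Mode, Unit_Cost FROM SUPPLY_SCHEDULE
    WHERE Vendor_ID = 'X' AND Part_ID = 'C' AND Warehouse_ID = 'Y'
""").fetchone()
print(row)  # ('next_day_air', 5.0)
```

Three separate binary tables (vendor–part, part–warehouse, vendor–warehouse) could not record which vendor ships which part to which warehouse at which cost; the single three-way row does.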

Page 39: DST Revision – Week 1

Cardinality Constraints

• The number of instances of one entity that can or must be associated with each instance of another entity.

• If we have two entity types A and B, the cardinality constraint specifies the number of instances of entity B that can (or must) be associated with each instance of entity A

• e.g. a video store may stock more than one VIDEOTAPE for each MOVIE; this is a ‘one-to-many’ relationship.

Page 40: DST Revision – Week 1

Cardinality Constraints

(a) Basic relationship

Page 41: DST Revision – Week 1

(b) Relationship with cardinality constraints

Page 42: DST Revision – Week 1

Minimum Cardinality

• The minimum cardinality of a relationship is the minimum number of instances of an entity B that may be associated with each instance of an entity A

• In our example, the minimum number of VIDEOTAPES of a MOVIE is zero (entity B is an optional participant in the ‘Is_Stocked_As’ relationship)

• This is signified by the symbol zero (a circle) on the relationship line near the VIDEOTAPE entity.

Page 43: DST Revision – Week 1

(b) Relationship with cardinality constraints

Page 44: DST Revision – Week 1

Maximum cardinality

• Is the maximum number of instances of an entity B that may be associated with each instance of entity A

• In the following slide the maximum cardinality for the VIDEOTAPE entity type is ‘many’ (an unspecified number greater than 1)

• This is indicated by the ‘crow’s foot’ symbol on the relationship line next to the VIDEOTAPE entity symbol

Page 45: DST Revision – Week 1

(b) Relationship with cardinality constraints

Page 46: DST Revision – Week 1

Mandatory one cardinality

• Relationships are bi-directional, so there is also cardinality notation next to the MOVIE entity

• Notice that as the minimum and maximum are both one, this is called mandatory one cardinality (i.e., each VIDEOTAPE of a MOVIE must be a copy of exactly one movie)

• VIDEOTAPE is represented as a weak entity because it cannot exist unless its owner MOVIE also exists

Page 47: DST Revision – Week 1

Mandatory one cardinality

• The identifier of the MOVIE is ‘Movie_Name’

• VIDEOTAPE does not have a unique identifier; however, the partial identifier Copy_Number together with Movie_Name would uniquely identify an instance of VIDEOTAPE

Page 48: DST Revision – Week 1

Example of mandatory cardinality constraints

• Each PATIENT has one or more PATIENT_HISTORIES (the initial patient visit is always recorded as an instance of PATIENT_HISTORY)

• Each instance of PATIENT_HISTORY ‘Belongs to’ exactly one PATIENT (see following Fig.)

Page 49: DST Revision – Week 1

Mandatory cardinalities

Page 50: DST Revision – Week 1

Example of one optional, one mandatory cardinality constraint

• EMPLOYEE Is_Assigned_To PROJECT

• Each PROJECT has at least one EMPLOYEE assigned to it (some projects have more than one)

• Each EMPLOYEE may or (optionally) may not be assigned to any existing PROJECT, or may be assigned to one or more PROJECTs (see following Fig.)

Page 51: DST Revision – Week 1

One optional, one mandatory cardinality

Page 52: DST Revision – Week 1

An example using a ternary relationship

• PART and WAREHOUSE are mandatory participants in the relationship, whilst VENDOR is an optional participant

• The cardinality of each of the participating entities is mandatory one, since each SUPPLY_SCHEDULE instance must be related to exactly one instance of each of these participating entity types

Page 53: DST Revision – Week 1

An example using a ternary relationship

• Each VENDOR can supply many PARTs to any number of WAREHOUSES, but need not supply any parts

• Each PART can be supplied by any number of VENDORs to more than one WAREHOUSE, but each part must be supplied by at least one vendor to a warehouse

• Each WAREHOUSE can be supplied with any number of PARTS from more than one VENDOR, but each warehouse must be supplied with at least one part

Page 54: DST Revision – Week 1

Cardinality constraints in a ternary relationship

Page 55: DST Revision – Week 1

An example using a ternary relationship

• A ternary relationship is not equivalent to three binary relationships

• Unfortunately you cannot draw ternary relationships with many CASE tools

• Instead, you must represent a ternary relationship as an associative entity with three binary relationships

• If you are forced to do this, then do not draw the binary relationships with diamonds, and make sure the cardinality next to each of the three strong entities is mandatory one

Page 56: DST Revision – Week 1

Multiple Relationships

• An organisation may want to model more than one relationship between the same entity types

• The following figure shows two relationships between PROFESSOR and COURSE

• The relationship Is_Qualified associates professors with the courses they are qualified to teach

• A given course may have more than one person qualified to teach it, or (optionally) may not have any qualified instructors (such as a new course)

• Each professor should be qualified to teach at least one course (we hope!)

Page 57: DST Revision – Week 1

Multiple Relationships

• The second relationship in this figure associates professors with the courses they actually teach during a given semester (where the maximum cardinality for a given semester is 4)

• This shows how a fixed constraint (upper or lower) can be recorded

• The attribute ‘Semester’ (which could be a composite attribute with components ‘Semester_Name’ and ‘Year’) is on the relationship ‘Is_Scheduled’

Page 58: DST Revision – Week 1

(b) Professors and courses (fixed upper constraint)

Page 59: DST Revision – Week 1

Review of Basic E-R Notation

Page 60: DST Revision – Week 1
Page 61: DST Revision – Week 1

Building Relational Databases

Page 62: DST Revision – Week 1

Database Design Steps

1) Determine the purpose of the database
  • What is it going to do?
  • What data do you need to collect?
  • What information do you want to be able to extract?

2) What tables and fields do you need?
  • What fields belong in each table?
  • What properties does each field need?
  • How are the tables going to be related?
  • What are the Primary Keys that link the tables?

3) Build the tables
  • Enter test data
  • Test/review
  • Revise

Page 63: DST Revision – Week 1

Basic Design Rules of Relational Databases

• Unique Field Names
  – Keep field names unique across tables, and keep them as clear as possible in each table.

• No Calculated or Derived Fields
  – Calculations and derivations can be performed in Queries, Forms and Reports. Doing them in a table only increases the chance of data entry error.

Page 64: DST Revision – Week 1

Basic Design Rules of Relational Databases

• Data is broken down into Smallest Logical Parts
  – Smallest “Sortable” parts. Remember it’s much easier to put fields together than pull them apart.

• Unique Records
  – Each of your tables should have unique records. We ensure this by setting one field to be a Primary Key. This can be a unique datum or an AutoNumber.

Page 65: DST Revision – Week 1

One to Many Relationships

• One Birdfeeder is visited by Many Birds
• One Garden contains Many Birdfeeders
• One Patient has Many Prescriptions
• One Hospital has Many Patients
• One Student attends Many Classes

• One to Many relationships are the most common relationships.

• A matching record MUST exist in the One table before a related record can appear in the Many table.

Page 66: DST Revision – Week 1

One to Many Relationships

[Figure: Patients (primary key Medical Record #) linked one-to-many (1 to ∞) to Medications (Prescription Number, Medical Record #). Primary Key linked to Foreign Key.]

Page 67: DST Revision – Week 1

One to One Relationships

• One Garden has One Address
• One Patient has One Home Phone Number
• One Student has One Student ID

One to One relationships can often be combined into a single table.

Page 68: DST Revision – Week 1

Many to Many Relationships

• Many Students are taught by Many Teachers
• Many Patients see Many Doctors
• Many Medications are taken by Many Patients
• Many Customers buy Many Products

Many to Many relationships are also very common.

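You cannot handle these directly in an RDBMS! Each one must be broken into two One to Many relationships through a linking (junction) table.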

Page 69: DST Revision – Week 1

Sales Database

Many to Many Relationship

CUSTOMERS: Customer ID, First, Last, Address, City, State, Zip

PRODUCTS: Product ID, Product, Supplier, Description, Units, Cost, Price

Page 70: DST Revision – Week 1

Sales Database

CUSTOMERS: Customer ID, First, Last, Address, City, State, Zip

PRODUCTS: Product ID, Product, Supplier, Description, Units, Cost, Price

SALES: Sales ID, Customer, Product, Date, Quantity

CUSTOMERS (1) to SALES (many): One Customer can have many sales

PRODUCTS (1) to SALES (many): One Product can be sold many times
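A minimal sketch of the SALES linking table, assuming SQLite through Python's sqlite3 module. Columns are trimmed to a few from the slide, Date is renamed Sale_Date (DATE is a reserved word in standard SQL), the Customer and Product fields are written as Customer_ID and Product_ID, and all rows are made-up sample data. The many-to-many link between customers and products is answered by joining through SALES.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE CUSTOMERS (Customer_ID INTEGER PRIMARY KEY, First TEXT, Last TEXT);
CREATE TABLE PRODUCTS  (Product_ID  INTEGER PRIMARY KEY, Product TEXT, Price REAL);
-- Linking table: each sale references exactly one customer and one product,
-- giving two one-to-many relationships.
CREATE TABLE SALES (
    Sales_ID    INTEGER PRIMARY KEY,
    Customer_ID INTEGER NOT NULL REFERENCES CUSTOMERS(Customer_ID),
    Product_ID  INTEGER NOT NULL REFERENCES PRODUCTS(Product_ID),
    Sale_Date   TEXT,
    Quantity    INTEGER
);
INSERT INTO CUSTOMERS VALUES (1, 'Ann', 'Lee'), (2, 'Bob', 'Kerr');
INSERT INTO PRODUCTS  VALUES (10, 'Lamp', 20.0), (11, 'Desk', 150.0);
INSERT INTO SALES VALUES (100, 1, 10, '2016-01-02', 1),
                         (101, 1, 11, '2016-01-03', 1),
                         (102, 2, 10, '2016-01-03', 2);
""")
# Which customer bought which product: join through the linking table.
rows = conn.execute("""
    SELECT c.First, p.Product
    FROM SALES s
    JOIN CUSTOMERS c ON c.Customer_ID = s.Customer_ID
    JOIN PRODUCTS  p ON p.Product_ID  = s.Product_ID
    ORDER BY c.First, p.Product
""").fetchall()
print(rows)  # [('Ann', 'Desk'), ('Ann', 'Lamp'), ('Bob', 'Lamp')]
```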

Page 71: DST Revision – Week 1

Examples

Patients: Patient ID, First, Last, Address, City, State, Zip

Medications: Med ID, Medication, Description

Patient Meds: PM ID, Patient ID, Med ID, Dosage, Directions

Patients (1) to Patient Meds (many): One Patient can take many Medications

Medications (1) to Patient Meds (many): One Kind of Medication can be taken by Many Patients

Page 72: DST Revision – Week 1
Page 73: DST Revision – Week 1

Transforming E-R diagrams into Relational Databases

Page 74: DST Revision – Week 1

Relational Database

• A database modelled using:
  – Relations (properly formed tables)
  – Relationships between the Relations

Page 75: DST Revision – Week 1

Relation

• Definition: A relation is a named, two-dimensional table of data

• A table consists of rows (records) and columns (attributes or fields)

• Requirements for a table to qualify as a relation:
  – It must have a unique name
  – Every attribute value must be atomic (not multivalued, not composite)
  – Every row must be unique (can’t have two rows with exactly the same values for all their fields)
  – Attributes (columns) in tables must have unique names
  – The order of the columns must be irrelevant
  – The order of the rows must be irrelevant

NOTE: all relations are in 1st Normal form
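Two of these requirements can be checked against the data itself. A minimal Python sketch, assuming each row is a tuple and treating any list, tuple, set or dict inside a cell as a multivalued or composite value (that encoding is an assumption of the sketch). Unique names and order irrelevance are properties of the schema, so they are not checked.

```python
def is_relation(rows):
    """Return True if every cell is atomic and no row is repeated."""
    # A collection inside a cell stands in for a multivalued/composite value.
    for row in rows:
        for value in row:
            if isinstance(value, (list, tuple, set, dict)):
                return False
    # Duplicate rows would make the table a multiset, not a relation.
    return len(set(rows)) == len(rows)

print(is_relation([(100, "SPSS"), (100, "Surveys")]))  # True
print(is_relation([(100, ["SPSS", "Surveys"])]))       # False: multivalued cell
print(is_relation([(100, "SPSS"), (100, "SPSS")]))     # False: duplicate row
```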

Page 76: DST Revision – Week 1

Correspondence with E-R Model

• Relations (tables) correspond with entity types and with many-to-many relationship types

• Rows correspond with entity instances and with many-to-many relationship instances

• Columns correspond with attributes

Page 77: DST Revision – Week 1

Key Fields

• Keys are special fields that serve two main purposes:
  – Primary keys are unique identifiers of the relation in question. Examples include employee numbers, social security numbers, etc. This is how we can guarantee that all rows are unique
  – Foreign keys are identifiers that enable a dependent relation (on the many side of a relationship) to refer to its parent relation (on the one side of the relationship)

• Keys can be simple (a single field) or composite (more than one field)

• Keys usually are used as indexes to speed up the response to user queries

Page 78: DST Revision – Week 1

Primary Key

Foreign Key (implements 1:N relationship between customer and order)

Combined, these are a composite primary key (uniquely identifies the order line)…individually they are foreign keys (implement M:N relationship between order and product)

Page 79: DST Revision – Week 1

Constraints

• Reduce the chance that users will enter incorrect data

• Domain Constraints
  – The allowable values for an attribute (data types, lengths, etc.)

• Entity Integrity
  – No primary key attribute may be null. All primary key fields MUST have data

Page 80: DST Revision – Week 1

Domain definitions enforce domain integrity constraints

Page 81: DST Revision – Week 1

Integrity Constraints

• Referential Integrity – a rule stating that any foreign key value (in the relation on the many side) MUST match a primary key value in the relation on the one side (or the foreign key can be null). For example, delete rules:

• Restrict – don’t allow deletion of a “parent” side row if related rows exist in the “dependent” side

• Cascade – automatically delete “dependent” side rows that correspond with the “parent” side row to be deleted

• Set-to-Null – set the foreign key in the dependent side to null when the parent side row is deleted; not allowed for weak entities, whose foreign key is part of their primary key
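The three delete rules can be tried side by side. A minimal sketch, assuming SQLite through Python's sqlite3 module, where they are spelled ON DELETE RESTRICT, ON DELETE CASCADE and ON DELETE SET NULL. The CUSTOMER/ORDERS tables and rows are hypothetical (ORDERS rather than ORDER, since ORDER is an SQL keyword).

```python
import sqlite3

def delete_parent(rule: str):
    """Delete a parent row under the given ON DELETE rule.

    Returns (whether the delete succeeded, the dependent rows left afterwards).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # needed for SQLite to enforce the rule
    conn.executescript(f"""
    CREATE TABLE CUSTOMER (Customer_ID INTEGER PRIMARY KEY);
    CREATE TABLE ORDERS (
        Order_ID    INTEGER PRIMARY KEY,
        Customer_ID INTEGER REFERENCES CUSTOMER(Customer_ID) ON DELETE {rule}
    );
    INSERT INTO CUSTOMER VALUES (1);
    INSERT INTO ORDERS VALUES (1001, 1);
    """)
    try:
        conn.execute("DELETE FROM CUSTOMER WHERE Customer_ID = 1")
        deleted = True
    except sqlite3.IntegrityError:
        deleted = False
    children = conn.execute("SELECT Order_ID, Customer_ID FROM ORDERS").fetchall()
    conn.close()
    return deleted, children

print(delete_parent("RESTRICT"))  # (False, [(1001, 1)])
print(delete_parent("CASCADE"))   # (True, [])
print(delete_parent("SET NULL"))  # (True, [(1001, None)])
```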

Page 82: DST Revision – Week 1

Figure 5-5: Referential integrity constraints (Pine Valley Furniture)

Referential integrity constraints are drawn via arrows from dependent to parent table

Page 83: DST Revision – Week 1

Referential integrity constraints are implemented with foreign key to primary key references

Page 84: DST Revision – Week 1

Transforming E-R Diagrams into Relations

Mapping Regular Entities to Relations

1. Simple attributes: E-R attributes map directly onto the relation

2. Composite attributes: Use only their simple, component attributes

3. Multivalued Attribute - Becomes a separate relation with a foreign key taken from the superior entity
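Rule 3 can be sketched in plain Python. The EMPLOYEE records and the Skills list below are hypothetical. The multivalued Skill attribute is moved into a separate EMPLOYEE_SKILL relation, one row per (Employee_ID, Skill) pair, with Employee_ID as the foreign key taken from EMPLOYEE.

```python
# Hypothetical EMPLOYEE data; Skills is multivalued, so this is not yet a relation.
employees = [
    {"Employee_ID": 1, "Employee_Name": "Ann", "Skills": ["SQL", "Java"]},
    {"Employee_ID": 2, "Employee_Name": "Bob", "Skills": ["SQL"]},
]

# EMPLOYEE keeps only the single-valued attributes.
employee = [(e["Employee_ID"], e["Employee_Name"]) for e in employees]

# EMPLOYEE_SKILL: one row per skill, keyed by (Employee_ID, Skill);
# Employee_ID is the foreign key back to EMPLOYEE.
employee_skill = [(e["Employee_ID"], s) for e in employees for s in e["Skills"]]

print(employee)        # [(1, 'Ann'), (2, 'Bob')]
print(employee_skill)  # [(1, 'SQL'), (1, 'Java'), (2, 'SQL')]
```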

Page 85: DST Revision – Week 1

(a) CUSTOMER entity type with simple attributes

Mapping a regular entity

(b) CUSTOMER relation

Page 86: DST Revision – Week 1

(a) CUSTOMER entity type with composite attribute

Figure 5-9: Mapping a composite attribute

(b) CUSTOMER relation with address detail

Page 87: DST Revision – Week 1

Figure 5-10: Mapping a multivalued attribute, panels (a) and (b)

• Multivalued attribute becomes a separate relation (Table) with foreign key

• 1–to–many relationship between original entity and new relation

Page 88: DST Revision – Week 1

Transforming ER Diagrams into Relations (cont.)

Mapping Binary Relationships
  – One-to-Many - Primary key on the one side becomes a foreign key on the many side
  – Many-to-Many - Create a new relation with the primary keys of the two entities as its primary key
  – One-to-One - Primary key on the mandatory side becomes a foreign key on the optional side
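A minimal sketch of the one-to-many and one-to-one rules, assuming SQLite through Python's sqlite3 module. CUSTOMER/ORDERS follows the customers-and-orders example; NURSE/CARE_CENTRE and their columns are hypothetical names for an In_charge style one-to-one, and all rows are made up. NOT NULL enforces a mandatory one, and UNIQUE on the one-to-one foreign key stops it degrading into one-to-many.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- 1:M: CUSTOMER's primary key becomes a foreign key in ORDERS (the many side).
-- NOT NULL enforces the mandatory one: every order belongs to a customer.
CREATE TABLE CUSTOMER (Customer_ID INTEGER PRIMARY KEY, Customer_Name TEXT);
CREATE TABLE ORDERS (
    Order_ID    INTEGER PRIMARY KEY,
    Customer_ID INTEGER NOT NULL REFERENCES CUSTOMER(Customer_ID)
);
-- 1:1: NURSE's key goes into CARE_CENTRE (the optional side);
-- UNIQUE means a nurse is in charge of at most one centre.
CREATE TABLE NURSE (Nurse_ID INTEGER PRIMARY KEY, Nurse_Name TEXT);
CREATE TABLE CARE_CENTRE (
    Centre_Name     TEXT PRIMARY KEY,
    Nurse_In_Charge INTEGER NOT NULL UNIQUE REFERENCES NURSE(Nurse_ID)
);
INSERT INTO CUSTOMER VALUES (1, 'Customer A');
INSERT INTO ORDERS VALUES (1001, 1), (1002, 1);
INSERT INTO NURSE VALUES (7, 'Nurse B');
INSERT INTO CARE_CENTRE VALUES ('Centre 1', 7);
""")

def rejected(sql: str) -> bool:
    """Return True if the statement violates a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

order_without_customer = rejected("INSERT INTO ORDERS VALUES (1003, NULL)")
nurse_in_charge_twice = rejected("INSERT INTO CARE_CENTRE VALUES ('Centre 2', 7)")
print(order_without_customer, nurse_in_charge_twice)  # True True
```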

Page 89: DST Revision – Week 1

Figure 5-12a: Example of mapping a 1:M relationship

Relationship between customers and orders

Note the mandatory one

Page 90: DST Revision – Week 1

Figure 5-12b Mapping the relationship

Again, no null value in the foreign key…this is because of the mandatory minimum cardinality

Foreign key

Page 91: DST Revision – Week 1

Figure 5-13a: Example of mapping an M:N relationship

E-R diagram (M:N)

The Supplies relationship will need to become a separate relation

Page 92: DST Revision – Week 1

Figure 5-13b Three resulting relations

New intersection relation

Foreign key

Foreign key

Composite primary key

Page 93: DST Revision – Week 1

Figure 5-14a: Mapping a binary 1:1 relationship

In_charge relationship

Page 94: DST Revision – Week 1

Figure 5-14b Resulting relations

Page 95: DST Revision – Week 1

Transforming ER Diagrams into Relations (cont.)

Mapping Associative Entities
  – Identifier Not Assigned
    • Default primary key for the association relation is composed of the primary keys of the two entities (as in M:N relationship)
  – Identifier Assigned
    • It is natural and familiar to end-users
    • Default identifier may not be unique
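A minimal sketch of the "identifier assigned" case, using the CERTIFICATE example from the E-R section and assuming SQLite through Python's sqlite3 module. The Course_ID column, certificate numbers and dates are hypothetical. Because Certificate_Number is the primary key, the same employee can hold two certificates for the same course, which a default key of (Employee_ID, Course_ID) would reject.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE EMPLOYEE (Employee_ID INTEGER PRIMARY KEY, Employee_Name TEXT);
CREATE TABLE COURSE   (Course_ID   INTEGER PRIMARY KEY, Course_Title  TEXT);
-- Identifier assigned: Certificate_Number is the primary key; the owners'
-- keys are ordinary (non-key) foreign keys.
CREATE TABLE CERTIFICATE (
    Certificate_Number INTEGER PRIMARY KEY,
    Employee_ID    INTEGER NOT NULL REFERENCES EMPLOYEE(Employee_ID),
    Course_ID      INTEGER NOT NULL REFERENCES COURSE(Course_ID),
    Date_Completed TEXT NOT NULL
);
INSERT INTO EMPLOYEE VALUES (100, 'Employee 100');
INSERT INTO COURSE VALUES (1, 'SPSS');
-- Same employee, same course, completed twice (e.g. a refresher).
INSERT INTO CERTIFICATE VALUES (5001, 100, 1, '2015-06-01');
INSERT INTO CERTIFICATE VALUES (5002, 100, 1, '2016-01-10');
""")
count = conn.execute(
    "SELECT COUNT(*) FROM CERTIFICATE WHERE Employee_ID = 100 AND Course_ID = 1"
).fetchone()[0]
print(count)  # 2
```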

Page 96: DST Revision – Week 1
Page 97: DST Revision – Week 1
Page 98: DST Revision – Week 1

Figure 5-16a: Mapping an associative entity with an identifier

Associative entity

Page 99: DST Revision – Week 1

Figure 5-16b Three resulting relations

Page 100: DST Revision – Week 1

Transforming ER Diagrams into Relations (cont.)

Mapping Unary Relationships
  – One-to-Many - Recursive foreign key in the same relation
  – Many-to-Many - Two relations:
    • One for the entity type
    • One for an associative relation in which the primary key has two attributes, both taken from the primary key of the entity
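A minimal sketch of both unary mappings, assuming SQLite through Python's sqlite3 module. EMPLOYEE with Manages and ITEM with COMPONENT follow the figure titles; the column names Manager_ID, Item_No, Component_No, Quantity and all rows are hypothetical. The first query is a self-join on the recursive foreign key; the second lists the direct components of one item.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Unary 1:M (Manages): a recursive foreign key back into the same relation.
CREATE TABLE EMPLOYEE (
    Employee_ID   INTEGER PRIMARY KEY,
    Employee_Name TEXT NOT NULL,
    Manager_ID    INTEGER REFERENCES EMPLOYEE(Employee_ID)  -- NULL for the top manager
);
INSERT INTO EMPLOYEE VALUES (1, 'Ann', NULL), (2, 'Bob', 1), (3, 'Cat', 1);

-- Unary M:N (bill of materials): ITEM for the entity type, plus an associative
-- relation whose two key columns are both taken from ITEM's primary key.
CREATE TABLE ITEM (Item_No INTEGER PRIMARY KEY, Item_Description TEXT);
CREATE TABLE COMPONENT (
    Item_No      INTEGER NOT NULL REFERENCES ITEM(Item_No),
    Component_No INTEGER NOT NULL REFERENCES ITEM(Item_No),
    Quantity     INTEGER NOT NULL,
    PRIMARY KEY (Item_No, Component_No)
);
INSERT INTO ITEM VALUES (10, 'Desk'), (20, 'Leg'), (30, 'Screw');
INSERT INTO COMPONENT VALUES (10, 20, 4), (10, 30, 16), (20, 30, 2);
""")

# Self-join: each employee with their manager (LEFT JOIN keeps the top manager).
managers = conn.execute("""
    SELECT e.Employee_Name, m.Employee_Name
    FROM EMPLOYEE e LEFT JOIN EMPLOYEE m ON m.Employee_ID = e.Manager_ID
    ORDER BY e.Employee_ID
""").fetchall()

# Direct components of item 10 (the desk).
parts = conn.execute("""
    SELECT c.Item_Description, b.Quantity
    FROM COMPONENT b JOIN ITEM c ON c.Item_No = b.Component_No
    WHERE b.Item_No = 10
    ORDER BY c.Item_No
""").fetchall()
print(managers)  # [('Ann', None), ('Bob', 'Ann'), ('Cat', 'Ann')]
print(parts)     # [('Leg', 4), ('Screw', 16)]
```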

Page 101: DST Revision – Week 1

Figure 5-17: Mapping a unary 1:N relationship

(a) EMPLOYEE entity with Manages relationship

(b) EMPLOYEE relation with recursive foreign key

Page 102: DST Revision – Week 1

Figure 5-18: Mapping a unary M:N relationship

(a) Bill-of-materials relationships (M:N)

(b) ITEM and COMPONENT relations

Page 103: DST Revision – Week 1
Page 104: DST Revision – Week 1

Normalisation

Page 105: DST Revision – Week 1

Data Normalisation

• Primarily a tool to validate and improve a logical design so that it satisfies certain constraints that avoid unnecessary duplication of data

• The process of decomposing relations with anomalies to produce smaller, well-structured relations

Page 106: DST Revision – Week 1

Well-Structured Relations

• A relation that contains minimal data redundancy and allows users to insert, delete, and update rows without causing data inconsistencies

• Goal is to avoid anomalies
  – Insertion Anomaly – adding new rows forces user to create duplicate data
  – Deletion Anomaly – deleting rows may cause a loss of data that would be needed for other future rows
  – Modification Anomaly – changing data in a row forces changes to other rows because of duplication

A table should not contain more than one entity type

Page 107: DST Revision – Week 1

Example – Figure 5.2b

Question – Is this a relation? Answer – Yes: unique rows and no multivalued attributes

Question – What’s the primary key? Answer – Composite: Emp_ID, Course_Title

Page 108: DST Revision – Week 1

Anomalies in this Table

• Insertion – can’t enter a new employee without having the employee take a course

• Deletion – if we remove employee 140, we lose information about the existence of a Tax Acc class

• Modification – giving a salary increase to employee 100 forces us to update multiple records

Why do these anomalies exist? Because two entity types are combined into a single relation. This results in duplication, and an unnecessary dependency between the entities

Page 109: DST Revision – Week 1

Functional Dependencies and Keys

• Functional dependency: A → B means that the value of attribute A (the determinant) determines the value of attribute B

• Candidate Key:
  – A unique identifier. One of the candidate keys will become the primary key
    • E.g. perhaps there is both a credit card number and an NI number in a table; in this case both are candidate keys
  – Each non-key field should be functionally dependent on the primary key
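A functional dependency can be tested against sample rows. A minimal Python sketch; the rows imitate the Emp_ID / Course_Title example, but the salaries are made up. Note that data can only refute a dependency: whether it truly holds is a business rule, not something a sample can prove.

```python
def fd_holds(rows, lhs, rhs):
    """Return False if two rows agree on the lhs attributes but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first rhs value seen for this lhs value.
        if seen.setdefault(key, value) != value:
            return False
    return True

# Hypothetical rows (salaries made up).
rows = [
    {"Emp_ID": 100, "Salary": 48000, "Course_Title": "SPSS"},
    {"Emp_ID": 100, "Salary": 48000, "Course_Title": "Surveys"},
    {"Emp_ID": 140, "Salary": 52000, "Course_Title": "Tax Acc"},
]
print(fd_holds(rows, ["Emp_ID"], ["Salary"]))        # True
print(fd_holds(rows, ["Emp_ID"], ["Course_Title"]))  # False
```

Emp_ID → Salary holds but Emp_ID alone does not determine Course_Title, which is why the key of that table has to be the composite (Emp_ID, Course_Title).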

Page 110: DST Revision – Week 1

Steps in normalization

Page 111: DST Revision – Week 1

First Normal Form

• No multivalued attributes

• Every attribute value is atomic

• The next slide is not in 1st Normal Form (multivalued attributes), therefore it is not a relation

• All relations are in 1st Normal Form

Page 112: DST Revision – Week 1

Table with multivalued attributes, not in 1st normal form

NOT a relation – just a table!

Page 113: DST Revision – Week 1

Table with no multivalued attributes and unique rows, in 1st normal form

This is a relation, but not a well-structured one

Page 114: DST Revision – Week 1

Anomalies in this Table

• Insertion – if a new product is ordered for order 1007 of an existing customer, customer data must be re-entered, causing duplication

• Deletion – if we delete the Dining Table from Order 1006, we lose information concerning this item's finish and price

• Update – changing the price of product ID 4 requires updates in several records

Why do these anomalies exist? Because multiple entity types are combined into a single relation. This results in duplication, and an unnecessary dependency between the entities

Page 115: DST Revision – Week 1

Second Normal Form

• 1NF PLUS every non-key attribute is functionally dependent on the ENTIRE primary key
  – Every non-key attribute must be defined by the entire key, not by only part of the key

Page 116: DST Revision – Week 1

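Order_ID → Order_Date, Customer_ID, Customer_Name, Customer_Address

Customer_ID → Customer_Name, Customer_Address

Product_ID → Product_Description, Product_Finish, Unit_Price

Order_ID, Product_ID → Order_Quantity

The primary key is the composite (Order_ID, Product_ID), so the dependencies on Order_ID alone and on Product_ID alone are partial dependencies. Therefore, NOT in 2nd Normal Form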
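The dependencies on this slide can be checked mechanically with the standard attribute-closure algorithm: repeatedly add the right-hand side of any dependency whose left-hand side is already in the set. A minimal Python sketch, where the FDS list simply transcribes the four dependencies above. It confirms that (Order_ID, Product_ID) determines every attribute, and that Product_ID on its own, a proper part of the key, already determines three non-key attributes (a partial dependency).

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

FDS = [
    ({"Order_ID"}, {"Order_Date", "Customer_ID", "Customer_Name", "Customer_Address"}),
    ({"Customer_ID"}, {"Customer_Name", "Customer_Address"}),
    ({"Product_ID"}, {"Product_Description", "Product_Finish", "Unit_Price"}),
    ({"Order_ID", "Product_ID"}, {"Order_Quantity"}),
]
ALL = set().union(*(lhs | rhs for lhs, rhs in FDS))

print(closure({"Order_ID", "Product_ID"}, FDS) == ALL)  # True: the composite key
print(closure({"Order_ID"}, FDS) == ALL)                # False: not a key on its own
print(sorted(closure({"Product_ID"}, FDS) - {"Product_ID"}))
# ['Product_Description', 'Product_Finish', 'Unit_Price']
```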

Page 117: DST Revision – Week 1

Getting it into Second Normal Form

Partial Dependencies are removed, but there are still transitive dependencies

Page 118: DST Revision – Week 1

Third Normal Form

• 2NF PLUS no transitive dependencies (functional dependencies on non-primary-key attributes)

• Note: this is called transitive, because the primary key is a determinant for another attribute, which in turn is a determinant for a third

• Solution: the non-key determinant and the attributes that depend on it go into a new table; the non-key determinant becomes the primary key in the new table and stays as a foreign key in the old table
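The solution can be sketched as two relational projections in Python. The ORDER rows are hypothetical: order numbers 1006 and 1007 echo the earlier slides, while 1008, the dates and the customer details are made up. Customer_Name and Customer_Address depend on Customer_ID, a non-key attribute, so they move into CUSTOMER, and Customer_ID stays in ORDER as the foreign key.

```python
def project(rows, attrs):
    """Relational projection: keep only attrs and drop duplicate tuples."""
    seen, result = set(), []
    for row in rows:
        t = tuple(row[a] for a in attrs)
        if t not in seen:
            seen.add(t)
            result.append(t)
    return result

# Hypothetical 2NF ORDER rows with a transitive dependency:
# Order_ID -> Customer_ID -> Customer_Name, Customer_Address
orders_2nf = [
    {"Order_ID": 1006, "Order_Date": "2016-01-04", "Customer_ID": 2,
     "Customer_Name": "Customer A", "Customer_Address": "Address A"},
    {"Order_ID": 1008, "Order_Date": "2016-01-06", "Customer_ID": 2,
     "Customer_Name": "Customer A", "Customer_Address": "Address A"},
    {"Order_ID": 1007, "Order_Date": "2016-01-05", "Customer_ID": 6,
     "Customer_Name": "Customer B", "Customer_Address": "Address B"},
]
order = project(orders_2nf, ["Order_ID", "Order_Date", "Customer_ID"])
customer = project(orders_2nf, ["Customer_ID", "Customer_Name", "Customer_Address"])
print(len(order), len(customer))  # 3 2: Customer A is now stored once

# Joining back on Customer_ID reproduces the original rows (nothing is lost).
rejoined = [o + c[1:] for o in order for c in customer if o[2] == c[0]]
print(sorted(rejoined) == sorted(tuple(r.values()) for r in orders_2nf))  # True
```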

Page 119: DST Revision – Week 1

Getting it into Third Normal Form

Transitive dependencies are removed

Page 120: DST Revision – Week 1

Make sure you know how to…

1. Draw Entity-Relationship diagrams from a written specification.

2. Design a database schema using an E-R diagram

3. Normalise a database schema to the Third Normal Form

Page 121: DST Revision – Week 1

Make sure you understand and can discuss (1)...

• Entities – Strong, Weak, Associative

• Relationships
  – First, second and third degree (unary, binary, ternary)
  – One to one, one to many and many to many relationships
  – Primary and foreign keys

• Attributes
  – Simple (atomic) and Composite
  – Derived
  – Multivalued
  – Identifier attributes (keys) and how to choose them

Page 122: DST Revision – Week 1

Make sure you understand and can discuss (2)...

• Integrity constraints – Domain and referential

• Anomalies
  – Insertion, deletion and modification

• Normalisation – 1st, 2nd and 3rd forms

Page 123: DST Revision – Week 1

Next week

Revise SQL