DST Revision – Week 1
Entity-Relationship Modelling, Database design
Normalisation
Assessed Components
• Assignment – due next week
• Exam – Jan 21st, two hours
– 30 multiple choice questions (1 mark each)
– Two long questions (35 marks each)
ER Diagrams
A tool for Conceptual Data Modelling
An Entity-Relationship Diagram
What’s wrong with this?
Discovering Data Entities
• Never confuse data entities with other elements of the problem to be solved
• A true data entity will have many possible instances, each with a distinguishing characteristic
• Treasurer is the person entering data – and data about the treasurer has nothing whatsoever to do with this problem
• Is the expense report entity necessary? No - it is only the result of extracting data from the database. Even though there will be multiple instances of expense reports given to the treasurer over time, data needed to compute the report contents each time are already represented by the ACCOUNT and EXPENSE entity types
• “Gives-to” and “Receives” are business activities, not relationships between entities.
The Correct E-R Model
Attributes and Weak Entities
Strong & Weak Entities
Most entities are classified as strong entity types [Rectangle] – ones that exist independently from other entity types (such as EMPLOYEE)
These always have a unique characteristic - an attribute or combination of attributes - that uniquely distinguishes each occurrence of that entity
A weak entity type [[Double Rectangle]] depends on some other entity type. It has no meaning in the ER diagram without the entity on which it depends (such as DEPENDENT)
The entity type on which the weak entity type depends is called the Identifying owner (or owner for short).
Weak entities
The Identifying relationship is the relationship between a weak entity type and its owner (such as ‘Has’ in the previous slide)
The weak entity identifier is its partial identifier (double underline) combined with that of its owner. During a later design stage Dependent_Name will be combined with Employee_ID (the identifier of the owner) to form a full identifier for DEPENDENT.
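The full identifier described above can be sketched as a composite primary key. This is a minimal illustration, assuming SQLite through Python's standard `sqlite3` module (the slides do not name a DBMS); the table layout and the sample names are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves foreign key checks off by default
conn.executescript("""
CREATE TABLE employee (
    employee_id   INTEGER PRIMARY KEY,
    employee_name TEXT NOT NULL
);
-- Weak entity: owner's identifier + partial identifier form the full identifier
CREATE TABLE dependent (
    employee_id    INTEGER NOT NULL REFERENCES employee(employee_id),
    dependent_name TEXT NOT NULL,
    PRIMARY KEY (employee_id, dependent_name)
);
""")
conn.execute("INSERT INTO employee VALUES (1, 'Ann')")
conn.execute("INSERT INTO employee VALUES (2, 'Bob')")
conn.execute("INSERT INTO dependent VALUES (1, 'Sam')")
conn.execute("INSERT INTO dependent VALUES (2, 'Sam')")  # same partial identifier, different owner

def try_insert(employee_id, dependent_name):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO dependent VALUES (?, ?)", (employee_id, dependent_name))
        return True
    except sqlite3.IntegrityError:
        return False

duplicate_ok = try_insert(1, "Sam")  # repeats an existing full identifier
orphan_ok = try_insert(99, "Kim")    # owner 99 does not exist
```

Two dependents may share a name as long as they belong to different employees, but a dependent cannot exist without its owner.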
Attributes
• An attribute is a property or characteristic of an entity type, for example the entity EMPLOYEE may have attributes Employee_Name and Employee_Address.
• In ER diagrams (drawn in this way) place the attribute name in an ellipse with a line connecting it to its associated entity
• Attributes may also be associated with relationships
• An attribute is associated with exactly one entity or relationship
A Composite Attribute
Simple and Composite Attributes
• Some attributes can be broken down into meaningful component parts, such as Address, which can be broken down into Street_Address, Town, Postcode... etc.
• The component attributes may appear above or below the composite attribute on an ER diagram
• Composite attributes provide flexibility to users, who can refer to the attribute as a single unit or to its individual components
• A simple (atomic) attribute is one that cannot be broken down into smaller components
An Entity with a Multivalued attribute (Skill) and a derived attribute (Years_Employed)
Multivalued Attributes
• An attribute that may have more than one value for a given instance, e.g. EMPLOYEE may have more than one Skill.
• A multivalued attribute is one that may take on more than one value – it is represented by an ellipse with double lines
Derived Attributes
• Some attribute values can be calculated or derived from others e.g., if Years_Employed needs to be calculated for EMPLOYEE, it can be calculated using Date_Employed and Today's_Date
• A derived attribute is one whose value can be calculated from related attribute values (plus possibly other data not in the database)
• A derived attribute is signified by an ellipse with a dashed line (see previous Fig.)
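As a small illustration of a derived attribute, Years_Employed can be computed on demand instead of being stored. The function name and the choice to count whole years are assumptions made for this sketch, not part of the slides.

```python
from datetime import date

def years_employed(date_employed: date, today: date) -> int:
    """Whole years between date_employed and today."""
    years = today.year - date_employed.year
    # Subtract one if this year's anniversary has not been reached yet.
    if (today.month, today.day) < (date_employed.month, date_employed.day):
        years -= 1
    return years

example = years_employed(date(2015, 9, 1), date(2024, 1, 21))
```

Only Date_Employed needs to be stored; today's date comes from outside the database, which is the "other data not in the database" case mentioned above.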
Simple Identifier attribute (Key)
Identifier Attribute
• Identifier attribute or Key is an attribute (or combination of attributes) that uniquely identifies individual instances of an entity type, such as Student_ID
• To be a candidate identifier, each entity instance must have a single value for the attribute, and the attribute must be associated with each entity
• The identifier attribute is underlined, such as Student_ID
Composite Identifier
• A Composite Identifier is used when there is no single (or atomic) attribute that can serve as an identifier, so a combination of attributes is needed
• For example, in a database that tracks flights, Flight_ID is a composite identifier that has component attributes Flight_Number and Date – this combination is required to uniquely identify individual occurrences of Flight
• Flight_ID is underlined, whilst its components are not (see next slide)
Composite key attribute
Criteria for selecting identifiers
Some entities have more than one candidate identifier, so the following criteria should be used:
Choose an identifier that will not change in value over the life of each instance of the entity type
Choose an identifier that is guaranteed to have valid values and will not be null (or unknown). If composite, make sure all parts will have valid values
Selecting identifiers
Don’t pick identifiers whose structure indicates classifications, locations or people that might change. e.g. the first two digits of an identifier may indicate a warehouse location, but such codes are often changed as conditions change, rendering them invalid.
Consider substituting new, simple identifiers for long, composite ones, e.g. an attribute called Game_Number could be used for the entity type GAME instead of Home_Team and Away_Team
Relationships
• A relationship is an association among the instances of one or more entity types that is of interest to the organisation
• Relationship Type is a meaningful association between (or among) entities – implying that the relationship allows us to answer questions that could not be answered given only the entity types. It is denoted by a diamond symbol
Relationship types and instances
(a) Relationship type (Completes)
Attributes on relationships
• Attributes may be associated with a relationship, as well as with an entity
• For example an organisation may want to record the date when an employee completes each course
• In the following diagram, the relationship ‘Completes’ joins the EMPLOYEE and COURSE entities, and Date_Completed is joined to this as it is a property of the
relationship ‘Completes’
Attribute on a Relationship
Associative entities (gerunds)
• The presence of one or more attributes on a relationship suggests that the relationship should be represented as an entity type
• An associative entity is an entity type that associates the instances of one or more entity types and contains attributes that are specific to the relationship between those entity instances.
• The associative entity type CERTIFICATE is represented with the diamond relationship symbol enclosed within the entity rectangle
Associative Entities
• The following figure shows the relationship ‘Completes’ converted to an associative entity type.
• A CERTIFICATE is awarded to each EMPLOYEE who completes a COURSE, each certificate has a Certificate_Number that serves as the identifier
(b) An associative entity (CERTIFICATE)
Associative Entities (gerunds)
• The purpose of the special symbol is to preserve the information that the entity was initially specified as a relationship on the ER diagram
• There is no relationship diamond on the line between an associative entity and a strong entity, because the associative entity represents the relationship
Associative entities
How do you know when to convert a relationship to an associative entity type? Four conditions should exist:
• All of the relationships are ‘many’ relationships
• The resulting associative entity type has independent meaning to end-users, and can preferably be identified with a single-attribute identifier
• The associative entity has one or more attributes in addition to the identifier
• The associative entity participates in one or more relationships independent of the entities related in the associated relationship
Degree of a Relationship
• Is the number of entity types that participate in it.
• Thus ‘Completes’ has degree 2, since there are two participating entity types, EMPLOYEE and COURSE
• The three most common relationship degrees are unary (degree 1), binary (degree 2) and ternary (degree 3)
• Higher degree relationships are possible but rarely encountered in practice
Unary relationship
• Is between the instances of a single entity type (also called recursive relationships)
• ‘Is_Married_To’ is a one-to-one relationship between instances of the PERSON entity type
• ‘Manages’ is a one-to-many relationship between instances of the EMPLOYEE entity type
Binary relationships
• Between the instances of two entity types, and is the most common type of relationship encountered in data modelling. e.g. (one-to-one) an EMPLOYEE is assigned one PARKING_PLACE, and each PARKING_PLACE is assigned to one EMPLOYEE
• e.g. (one to many) a PRODUCT_LINE may contain many PRODUCTS, and each PRODUCT belongs to only one PRODUCT_LINE
• e.g. (many-to-many) a STUDENT may register for more than one COURSE, and each COURSE may have many STUDENTS
Ternary relationships
• A ternary relationship is a simultaneous relationship among the instances of three entity types
• Let’s see this in an E-R diagram
Ternary Relationships
Ternary Relationships
• Vendors can supply various parts to warehouses
• The relationship ‘Supplies’ is used to record the specific PARTs supplied by a given VENDOR to a particular WAREHOUSE
• There are two attributes on the relationship ‘Supplies’, Shipping_Mode and Unit_Cost
• e.g. one instance of ‘Supplies’ might record that VENDOR X can ship PART C to WAREHOUSE Y, that the Shipping_Mode is ‘next_day_air’ and the Unit_Cost is £5.00 per unit
Ternary relationships
• We do not use diamond symbols on the lines from SUPPLY_SCHEDULE to the three entities, because these lines do not represent binary relationships
• It is recommended that all ternary (or higher) relationships are converted into associative entities (as in the slide), as it makes the representation of participation constraints (discussed later) easier
• Many CASE tools cannot represent ternary relationships, so you must represent the ternary relationship with an associative entity and three binary relationships
Cardinality Constraints
• The number of instances of one entity that can or must be associated with each instance of another entity.
• If we have two entity types A and B, the cardinality constraint specifies the number of instances of entity B that can (or must) be associated with entity A
• e.g. a video store may stock more than one VIDEOTAPE for each MOVIE, this is a ‘one-to-many’ relationship.
Cardinality Constraints
(a) Basic relationship
(b) Relationship with cardinality constraints
Minimum Cardinality
• The minimum cardinality of a relationship is the minimum number of instances of an entity B that may be associated with each instance of an entity A
• In our example, the minimum number of VIDEOTAPES of a MOVIE is zero (entity B is an optional participant in the ‘Is_Stocked_As’ relationship)
• This is signified by the symbol zero through the arrow near the VIDEOTAPE entity.
(b) Relationship with cardinality constraints
Maximum cardinality
• Is the maximum number of instances of an entity B that may be associated with each instance of entity A
• In the following slide the maximum cardinality for the VIDEOTAPE entity type is ‘many’ (an unspecified number greater than 1)
• This is indicated by the ‘crow’s foot’ symbol on the arrow next to the VIDEOTAPE entity symbol
(b) Relationship with cardinality constraints
Mandatory one cardinality
• Relationships are bi-directional, so there is also cardinality notation next to the MOVIE entity
• Notice that as the minimum and maximum are both one, this is called mandatory one cardinality (i.e., each VIDEOTAPE of a MOVIE must be a copy of exactly one movie)
• VIDEOTAPE is represented as a weak entity because it cannot exist unless the original owner movie also exists
Mandatory one cardinality
• The identifier of the MOVIE is ‘Movie_Name’
• VIDEOTAPE does not have a unique identifier, however the partial identifier Copy_Number together with Movie_Name would uniquely identify an instance of VIDEOTAPE
Example of mandatory cardinality constraints
• Each PATIENT has one or more PATIENT_HISTORIES (the initial patient visit is always recorded as an instance of PATIENT_HISTORY)
• Each instance of PATIENT_HISTORY ‘Belongs to’ exactly one PATIENT (see following Fig.)
Mandatory cardinalities
Example of one optional, one mandatory cardinality constraint
• EMPLOYEE Is_Assigned_To PROJECT
• Each PROJECT has at least one EMPLOYEE assigned to it (some projects have more than one)
• Each EMPLOYEE may or (optionally) may not be assigned to any existing PROJECT, or may be assigned to one or more PROJECTs (see following Fig.)
One optional, one mandatory cardinality
An example using a ternary relationship
• PART and WAREHOUSE are mandatory participants in the relationship, whilst VENDOR is an optional participant
• The cardinality of each of the participating entities is mandatory one, since each SUPPLY_SCHEDULE instance must be related to exactly one instance of each of these participating entity types
An example using a ternary relationship
• Each VENDOR can supply many PARTs to any number of WAREHOUSES, but need not supply any parts
• Each PART can be supplied by any number of VENDORs to more than one WAREHOUSE, but each part must be supplied by at least one vendor to a warehouse
• Each WAREHOUSE can be supplied with any number of PARTS from more than one VENDOR, but each warehouse must be supplied with at least one part
Cardinality constraints in a ternary relationship
An example using a ternary relationship
• A ternary relationship is not equivalent to three binary relationships
• Unfortunately you cannot draw ternary relationships with many CASE tools
• Instead you must represent the ternary relationship as an associative entity with three binary relationships
• If you are forced to do this, then do not draw the binary relationships with diamonds, and make sure the cardinalities next to the three strong entities are mandatory one
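The associative-entity form of the ‘Supplies’ relationship can be sketched as one table with three mandatory foreign keys. This is a hedged example assuming SQLite via Python's `sqlite3`; the column names, the key columns of the three strong entities, and the choice of a three-part composite primary key are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE vendor    (vendor_id    TEXT PRIMARY KEY);
CREATE TABLE part      (part_id      TEXT PRIMARY KEY);
CREATE TABLE warehouse (warehouse_id TEXT PRIMARY KEY);
-- Each SUPPLY_SCHEDULE row relates exactly one vendor, one part, one warehouse
CREATE TABLE supply_schedule (
    vendor_id     TEXT NOT NULL REFERENCES vendor(vendor_id),
    part_id       TEXT NOT NULL REFERENCES part(part_id),
    warehouse_id  TEXT NOT NULL REFERENCES warehouse(warehouse_id),
    shipping_mode TEXT,
    unit_cost     REAL,
    PRIMARY KEY (vendor_id, part_id, warehouse_id)
);
INSERT INTO vendor VALUES ('X');
INSERT INTO part VALUES ('C');
INSERT INTO warehouse VALUES ('Y');
INSERT INTO supply_schedule VALUES ('X', 'C', 'Y', 'next_day_air', 5.00);
""")
row = conn.execute(
    "SELECT shipping_mode, unit_cost FROM supply_schedule "
    "WHERE vendor_id = 'X' AND part_id = 'C' AND warehouse_id = 'Y'"
).fetchone()

# Mandatory one: a schedule row without a warehouse is rejected.
try:
    conn.execute("INSERT INTO supply_schedule VALUES ('X', 'C', NULL, 'ground', 4.00)")
    null_rejected = False
except sqlite3.IntegrityError:
    null_rejected = True
```

The `NOT NULL` foreign keys correspond to the mandatory one cardinality next to each of the three strong entities.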
Multiple Relationships
• An organisation may want to model more than one relationship between the same entity types
• The following figure shows two relationships between PROFESSOR and COURSE
• The relationship Is_Qualified associates professors with the courses they are qualified to teach
• A given course may have more than one person qualified to teach it, or (optionally) may not have any qualified instructors (such as a new course)
• Each professor should be qualified to teach at least one course (we hope!)
Multiple Relationships
• The second relationship in this figure associates professors with the courses they actually teach during a given semester (where the maximum cardinality for a given semester is 4)
• This shows how a fixed constraint (upper or lower) can be recorded
• The attribute ‘Semester’ (which could be a composite attribute with components ‘Semester_Name’ and ‘Year’) is on the relationship Is_Scheduled
(b) Professors and courses (fixed upon constraint)
Review of Basic E-R Notation
Building Relational Databases
Database Design Steps
1) Determine the purpose of the database
• What is it going to do?
• What data do you need to collect?
• What information do you want to be able to extract?
2) What tables and fields do you need?
• What fields belong in each table?
• What properties does each field need?
• How are the tables going to be related?
• What are the Primary Keys that link the tables?
3) Build the tables
• Enter test data
• Test/review
• Revise
Basic Design Rules of Relational Databases
• Unique Field Names
– Keep field names unique across tables, and keep them as clear as possible in each table.
• No Calculated or Derived Fields
– Calculations and derivations can be performed in Queries, Forms and Reports. Doing them in a table only increases the chance of data entry error.
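To illustrate the "no calculated fields" rule, a line total can be computed in a query instead of being stored. The sketch below assumes SQLite via Python's `sqlite3`; the `order_line` table and its data are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE order_line (
    order_id   INTEGER,
    product    TEXT,
    quantity   INTEGER,
    unit_price REAL,
    PRIMARY KEY (order_id, product)
);
INSERT INTO order_line VALUES (1, 'Desk', 2, 150.0);
INSERT INTO order_line VALUES (1, 'Chair', 4, 25.0);
""")
# The total is derived at query time, so it can never disagree with its inputs.
totals = conn.execute(
    "SELECT product, quantity * unit_price AS line_total "
    "FROM order_line ORDER BY product"
).fetchall()
```

If `line_total` were a stored column, every change to quantity or price would need a matching manual update.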
Basic Design Rules of Relational Databases
• Data is broken down into Smallest Logical Parts
– Smallest “Sortable” parts. Remember it’s much easier to put fields together than pull them apart.
• Unique Records
– Each of your tables should have unique records. We ensure this by setting one field to be a Primary Key. This can be a unique datum or an AutoNumber.
One to Many Relationships
• One Birdfeeder is visited by Many Birds
• One Garden contains Many Birdfeeders
• One Patient has Many Prescriptions
• One Hospital has Many Patients
• One Student attends Many Classes
• One to Many relationships are the most common relationships.
• A record MUST be in the One table in order to appear in the Many table.
One to Many Relationships
[Figure: Patients (Medical Record #) 1 to ∞ Medications (Prescription Number, Medical Record #). Primary Key linked to Foreign Key]
One to One Relationships
• One Garden has One Address
• One Patient has One Home Phone Number
• One Student has One Student ID
One to One relationships can often be combined into a single table.
Many to Many Relationships
• Many Students are taught by Many Teachers
• Many Patients see Many Doctors
• Many Medications are taken by Many Patients
• Many Customers buy Many Products
Many to Many relationships are also very common.
You cannot represent these directly in an RDBMS! Each one must be broken into two One to Many relationships through a linking table.
Sales Database
Many to Many Relationship
CUSTOMERS (Customer ID, First, Last, Address, City, State, Zip)
PRODUCTS (Product ID, Product, Supplier, Description, Units, Cost, Price)
Sales Database
CUSTOMERS (Customer ID, First, Last, Address, City, State, Zip)
PRODUCTS (Product ID, Product, Supplier, Description, Units, Cost, Price)
SALES (Sales ID, Customer, Product, Date, Quantity)
• CUSTOMERS 1 to many SALES: One Customer can have many sales
• PRODUCTS 1 to many SALES: One Product can be sold many times
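The Sales Database layout above can be sketched as three tables, with SALES carrying one foreign key to each side. This is a minimal illustration assuming SQLite via Python's `sqlite3`; the column names are shortened versions of the slide's fields and the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, last TEXT);
CREATE TABLE products  (product_id  INTEGER PRIMARY KEY, product TEXT);
-- Linking table: two one-to-many relationships replace one many-to-many
CREATE TABLE sales (
    sales_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
    product_id  INTEGER NOT NULL REFERENCES products(product_id),
    sale_date   TEXT,
    quantity    INTEGER
);
INSERT INTO customers VALUES (1, 'Jones'), (2, 'Patel');
INSERT INTO products  VALUES (10, 'Lamp'), (20, 'Rug');
INSERT INTO sales VALUES (1, 1, 10, '2024-01-05', 1),
                         (2, 1, 20, '2024-01-06', 2),
                         (3, 2, 10, '2024-01-07', 1);
""")
who_bought_what = conn.execute("""
    SELECT c.last, p.product
    FROM sales s
    JOIN customers c ON c.customer_id = s.customer_id
    JOIN products  p ON p.product_id  = s.product_id
    ORDER BY c.last, p.product
""").fetchall()
```

One customer appears against several products and one product against several customers, yet each SALES row points at exactly one of each.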
Examples
Patients (Patient ID, First, Last, Address, City, State, Zip)
Medications (Med ID, Medication, Description)
Patient Meds (PM ID, Patient ID, Med ID, Dosage, Directions)
• Patients 1 to many Patient Meds: One Patient can take many Medications
• Medications 1 to many Patient Meds: One Kind of Medication can be taken by Many Patients
Transforming E-R diagrams into Relational Databases
Relational Database
• A database modelled using:
– Relations (properly formed tables)
– Relationships between the Relations
Relation
• Definition: A relation is a named, two-dimensional table of data
• Table consists of rows (records), and columns (attribute or field)
• Requirements for a table to qualify as a relation:
– It must have a unique name.
– Every attribute value must be atomic (not multivalued, not composite)
– Every row must be unique (can’t have two rows with exactly the same values for all their fields)
– Attributes (columns) in tables must have unique names
– The order of the columns must be irrelevant
– The order of the rows must be irrelevant
NOTE: all relations are in 1st Normal form
Correspondence with E-R Model
• Relations (tables) correspond with entity types and with many-to-many relationship types
• Rows correspond with entity instances and with many-to-many relationship instances
• Columns correspond with attributes
Key Fields
• Keys are special fields that serve two main purposes:
– Primary keys are unique identifiers of the relation in question. Examples include employee numbers, social security numbers, etc. This is how we can guarantee that all rows are unique
– Foreign keys are identifiers that enable a dependent relation (on the many side of a relationship) to refer to its parent relation (on the one side of the relationship)
• Keys can be simple (a single field) or composite (more than one field)
• Keys usually are used as indexes to speed up the response to user queries
Primary Key
Foreign Key (implements 1:N relationship between customer and order)
Combined, these are a composite primary key (uniquely identifies the order line)…individually they are foreign keys (implement M:N relationship between order and product)
Constraints
• Reduce the chance that users will enter incorrect data
• Domain Constraints
– The allowable values for an attribute (types, lengths etc.)
– Assist with the integrity of the entity
– Entity integrity: no primary key attribute may be null. All primary key fields MUST have data
Domain definitions enforce domain integrity constraints
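Domain constraints can be approximated in SQL with column types, `NOT NULL` and `CHECK` clauses. The sketch below assumes SQLite via Python's `sqlite3`; the `customer` table, the length limits and the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE customer (
    customer_id   INTEGER PRIMARY KEY NOT NULL,
    customer_name TEXT NOT NULL CHECK (length(customer_name) <= 25),
    state         TEXT CHECK (length(state) = 2)
)
""")

def accepted(row):
    """Return True if the row satisfies every constraint, False otherwise."""
    try:
        conn.execute("INSERT INTO customer VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

results = {
    "valid row": accepted((1, "Acme Ltd", "FL")),
    "state too long": accepted((2, "Bolt Bros", "Texas")),
    "missing name": accepted((3, None, "TX")),
}
```

Rows that fall outside the declared domain are refused by the DBMS itself, so the check does not depend on every application getting it right.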
Integrity Constraints
• Referential Integrity – rule that states that any foreign key value (on the relation of the many side) MUST match a primary key value in the relation of the one side. (Or the foreign key can be null)
– For example: Delete Rules
• Restrict – don’t allow delete of “parent” side if related rows exist in “dependent” side
• Cascade – automatically delete “dependent” side rows that correspond with the “parent” side row to be deleted
• Set-to-Null – set the foreign key in the dependent side to null if deleting from the parent side; not allowed for weak entities
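The three delete rules above map onto SQL's `ON DELETE` clause. The following sketch assumes SQLite via Python's `sqlite3`; the `customer` and `orders` tables and the helper name `delete_parent` are hypothetical, chosen only to show the behaviour of each rule.

```python
import sqlite3

def delete_parent(on_delete_rule):
    """Build parent/dependent tables with the given rule, delete the parent row,
    and report what happened to the dependent row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.executescript(f"""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY);
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer(customer_id)
                    ON DELETE {on_delete_rule}
    );
    INSERT INTO customer VALUES (1);
    INSERT INTO orders VALUES (1001, 1);
    """)
    try:
        conn.execute("DELETE FROM customer WHERE customer_id = 1")
    except sqlite3.IntegrityError:
        return "delete refused"
    return conn.execute("SELECT order_id, customer_id FROM orders").fetchall()

outcomes = {rule: delete_parent(rule) for rule in ("RESTRICT", "CASCADE", "SET NULL")}
```

Restrict blocks the delete, Cascade removes the dependent order, and Set-to-Null keeps the order but clears its foreign key.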
Figure 5-5: Referential integrity constraints (Pine Valley Furniture)
Referential integrity constraints are drawn via arrows from dependent to parent table
Referential integrity constraints are implemented with foreign key to primary key references
Transforming E-R Diagrams into Relations
Mapping Regular Entities to Relations
1. Simple attributes: E-R attributes map directly onto the relation
2. Composite attributes: Use only their simple, component attributes
3. Multivalued Attribute - Becomes a separate relation with a foreign key taken from the superior entity
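Rule 3 can be illustrated with the multivalued Skill attribute of EMPLOYEE from earlier in the slides. This is a sketch assuming SQLite via Python's `sqlite3`; the table name `employee_skill` and the sample values are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE employee (
    employee_id   INTEGER PRIMARY KEY,
    employee_name TEXT NOT NULL
);
-- One row per (employee, skill) pair instead of a multivalued column
CREATE TABLE employee_skill (
    employee_id INTEGER NOT NULL REFERENCES employee(employee_id),
    skill       TEXT NOT NULL,
    PRIMARY KEY (employee_id, skill)
);
INSERT INTO employee VALUES (1, 'Ann');
INSERT INTO employee_skill VALUES (1, 'SQL'), (1, 'Java');
""")
skills = [s for (s,) in conn.execute(
    "SELECT skill FROM employee_skill WHERE employee_id = 1 ORDER BY skill"
)]
```

Every value in both tables stays atomic, and an employee can gain another skill by adding a row without changing the table design.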
(a) CUSTOMER entity type with simple attributes
Mapping a regular entity
(b) CUSTOMER relation
(a) CUSTOMER entity type with composite attribute
Figure 5-9: Mapping a composite attribute
(b) CUSTOMER relation with address detail
Figure 5-10: Mapping a multivalued attribute
1–to–many relationship between original entity and new relation
(a)
Multivalued attribute becomes a separate relation (Table) with foreign key
(b)
Transforming ER Diagrams into Relations (cont.)
Mapping Binary Relationships
– One-to-Many - Primary key on the one side becomes a foreign key on the many side
– Many-to-Many - Create a new relation with the primary keys of the two entities as its primary key
– One-to-One - Primary key on the mandatory side becomes a foreign key on the optional side
Figure 5-12a: Example of mapping a 1:M relationship
Relationship between customers and orders
Note the mandatory one
Figure 5-12b Mapping the relationship
Again, no null value in the foreign key…this is because of the mandatory minimum cardinality
Foreign key
Figure 5-13a: Example of mapping an M:N relationship
E-R diagram (M:N)
The Supplies relationship will need to become a separate relation
Figure 5-13b Three resulting relations
New intersection relation
Foreign key
Foreign key
Composite primary key
Figure 5-14a: Mapping a binary 1:1 relationship
In_charge relationship
Figure 5-14b Resulting relations
Transforming ER Diagrams into Relations (cont.)
Mapping Associative Entities
– Identifier Not Assigned
• Default primary key for the association relation is composed of the primary keys of the two entities (as in M:N relationship)
– Identifier Assigned
• It is natural and familiar to end-users
• Default identifier may not be unique
Figure 5-16a: Mapping an associative entity with an identifier
Associative entity
Figure 5-16b Three resulting relations
Transforming ER Diagrams into Relations (cont.)
Mapping Unary Relationships
– One-to-Many - Recursive foreign key in the same relation
– Many-to-Many - Two relations:
• One for the entity type
• One for an associative relation in which the primary key has two attributes, both taken from the primary key of the entity
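The one-to-many unary case can be sketched with the ‘Manages’ relationship on EMPLOYEE: the table references itself. The example assumes SQLite via Python's `sqlite3`; the column name `manager_id` and the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE employee (
    employee_id   INTEGER PRIMARY KEY,
    employee_name TEXT NOT NULL,
    -- Recursive foreign key: points back at the same relation
    manager_id    INTEGER REFERENCES employee(employee_id)
);
INSERT INTO employee VALUES (1, 'Ann', NULL);
INSERT INTO employee VALUES (2, 'Bob', 1);
INSERT INTO employee VALUES (3, 'Cat', 1);
""")
# Join the table to itself: m is the manager row, e is the managed employee.
reports = conn.execute("""
    SELECT m.employee_name, e.employee_name
    FROM employee e
    JOIN employee m ON m.employee_id = e.manager_id
    ORDER BY e.employee_name
""").fetchall()
```

The foreign key is nullable here so that the employee at the top of the hierarchy can exist without a manager.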
Figure 5-17: Mapping a unary 1:N relationship
(a) EMPLOYEE entity with Manages relationship
(b) EMPLOYEE relation with recursive foreign key
Figure 5-18: Mapping a unary M:N relationship
(a) Bill-of-materials relationships (M:N)
(b) ITEM and COMPONENT relations
Normalisation
Data Normalization
• Primarily a tool to validate and improve a logical design so that it satisfies certain constraints that avoid unnecessary duplication of data
• The process of decomposing relations with anomalies to produce smaller, well-structured relations
Well-Structured Relations
• A relation that contains minimal data redundancy and allows users to insert, delete, and update rows without causing data inconsistencies
• Goal is to avoid anomalies
– Insertion Anomaly – adding new rows forces user to create duplicate data
– Deletion Anomaly – deleting rows may cause a loss of data that would be needed for other future rows
– Modification Anomaly – changing data in a row forces changes to other rows because of duplication
A table should not contain more than one entity type
Example – Figure 5.2b
Question – Is this a relation? Answer – Yes: unique rows and no multivalued attributes
Question – What’s the primary key? Answer – Composite: Emp_ID, Course_Title
Anomalies in this Table
• Insertion – can’t enter a new employee without having the employee take a course
• Deletion – if we remove employee 140, we lose information about the existence of a Tax Acc class
• Modification – giving a salary increase to employee 100 forces us to update multiple records
Why do these anomalies exist? Because two entity types are combined into a single relation. This results in duplication, and an unnecessary dependency between the entities
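The modification and deletion anomalies can be seen in a few lines of plain Python. The rows below are invented stand-ins for the figure's table (which is not reproduced in this transcript); only employee numbers 100 and 140 and the Tax Acc course come from the slide.

```python
# One row per (employee, course): employee data is repeated for each course.
rows = [
    {"emp_id": 100, "salary": 48000, "course_title": "SPSS"},
    {"emp_id": 100, "salary": 48000, "course_title": "Surveys"},
    {"emp_id": 140, "salary": 52000, "course_title": "Tax Acc"},
]

# Modification anomaly: one salary change touches every row for that employee.
rows_to_update = [r for r in rows if r["emp_id"] == 100]

# Deletion anomaly: removing employee 140 also removes the only mention of a course.
courses_before = {r["course_title"] for r in rows}
courses_after = {r["course_title"] for r in rows if r["emp_id"] != 140}
lost_courses = courses_before - courses_after
```

Splitting the data into separate employee and course relations removes both effects, because each fact is then stored once.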
Functional Dependencies and Keys
Functional Dependency:
– The value of one attribute (the determinant) determines the value of another attribute
Candidate Key:
– A unique identifier. One of the candidate keys will become the primary key
• E.g. perhaps there is both credit card number and NI number in a table…in this case both are candidate keys
– Each non-key field should be functionally dependent on the primary key
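A functional dependency can be tested against sample data: whenever two rows agree on the determinant, they must agree on the dependent attributes. The helper `holds` and the sample rows are hypothetical, written only to illustrate the idea. Note that data can refute a dependency but cannot prove that it holds for all future rows.

```python
def holds(rows, determinant, dependent):
    """True if equal determinant values always come with equal dependent values."""
    seen = {}
    for row in rows:
        left = tuple(row[a] for a in determinant)
        right = tuple(row[a] for a in dependent)
        # Remember the first dependent value seen for this determinant value.
        if seen.setdefault(left, right) != right:
            return False
    return True

rows = [
    {"emp_id": 100, "name": "Ann", "course_title": "SPSS"},
    {"emp_id": 100, "name": "Ann", "course_title": "Surveys"},
    {"emp_id": 140, "name": "Bob", "course_title": "Tax Acc"},
]

emp_determines_name = holds(rows, ["emp_id"], ["name"])
emp_determines_course = holds(rows, ["emp_id"], ["course_title"])
```

In this sample `emp_id` determines `name` but not `course_title`, which is why the course title has to be part of the key.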
Steps in normalization
First Normal Form
• No multivalued attributes
• Every attribute value is atomic
• The next slide is not in 1st Normal Form (multivalued attributes) therefore it is not a relation
• All relations are in 1st Normal Form
Table with multivalued attributes, not in 1st normal form
NOT a relation – just a table!
Table with no multivalued attributes and unique rows, in 1st normal form
This is a relation, but not a well-structured one
Anomalies in this Table
• Insertion – if new product is ordered for order 1007 of existing customer, customer data must be re-entered, causing duplication
• Deletion – if we delete the Dining Table from Order 1006, we lose information concerning this item's finish and price
• Update – changing the price of product ID 4 requires update in several records
Why do these anomalies exist? Because multiple entity types are combined into a single relation. This results in duplication, and an unnecessary dependency between the entities
Second Normal Form
• 1NF PLUS every non-key attribute is functionally dependent on the ENTIRE primary key
– Every non-key attribute must be defined by the entire key, not by only part of the key
Order_ID → Order_Date, Customer_ID, Customer_Name, Customer_Address
Customer_ID → Customer_Name, Customer_Address
Product_ID → Product_Description, Product_Finish, Unit_Price
Order_ID, Product_ID → Order_Quantity
Therefore, NOT in 2nd Normal Form
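The 2NF test can be sketched by listing the dependencies from this slide and picking out those whose determinant is only part of the key. The composite primary key (Order_ID, Product_ID) is an assumption inferred from the last dependency; the variable names are illustrative.

```python
primary_key = {"Order_ID", "Product_ID"}

# (determinant, dependent attributes) pairs taken from the slide
functional_dependencies = [
    ({"Order_ID"}, {"Order_Date", "Customer_ID", "Customer_Name", "Customer_Address"}),
    ({"Customer_ID"}, {"Customer_Name", "Customer_Address"}),
    ({"Product_ID"}, {"Product_Description", "Product_Finish", "Unit_Price"}),
    ({"Order_ID", "Product_ID"}, {"Order_Quantity"}),
]

# A partial dependency has a determinant that is a proper subset of the primary key.
partial = [(lhs, rhs) for lhs, rhs in functional_dependencies if lhs < primary_key]
partial_determinants = sorted(", ".join(sorted(lhs)) for lhs, _ in partial)
in_second_normal_form = not partial
```

Order_ID and Product_ID each determine attributes on their own, so those attributes move to their own relations. The Customer_ID dependency is not partial (Customer_ID is not part of the key); it is the transitive dependency that 3NF deals with.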
Getting it into Second Normal Form
Partial Dependencies are removed, but there are still transitive dependencies
Third Normal Form
• 2NF PLUS no transitive dependencies (functional dependencies on non-primary-key attributes)
• Note: this is called transitive, because the primary key is a determinant for another attribute, which in turn is a determinant for a third
• Solution: a non-key determinant and the attributes that depend on it go into a new table; the non-key determinant becomes the primary key in the new table and stays as a foreign key in the old table
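The solution above can be sketched as a small decomposition step. The order relation used here (key Order_ID, with customer details alongside) is assumed from the dependencies listed on the 2NF slide; the variable names are illustrative, and this is not a general normalisation algorithm.

```python
primary_key = {"Order_ID"}
attributes = {"Order_ID", "Order_Date", "Customer_ID", "Customer_Name", "Customer_Address"}
functional_dependencies = [
    ({"Order_ID"}, {"Order_Date", "Customer_ID", "Customer_Name", "Customer_Address"}),
    ({"Customer_ID"}, {"Customer_Name", "Customer_Address"}),
]

new_relations = []          # (primary key, attributes) of each relation split off
remaining = set(attributes)
for determinant, dependents in functional_dependencies:
    # A determinant made up only of non-key attributes signals a transitive dependency.
    if determinant.isdisjoint(primary_key):
        new_relations.append((determinant, determinant | dependents))
        remaining -= dependents  # the determinant itself stays behind as a foreign key
```

After the loop, `remaining` holds the order relation with Customer_ID kept as a foreign key, and `new_relations` holds a customer relation keyed on Customer_ID.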
Getting it into Third Normal Form
Transitive dependencies are removed
Make sure you know how to…
1. Draw Entity-Relationship diagrams from a written specification.
2. Design a database schema using an E-R diagram
3. Normalise a database schema to the Third Normal Form
Make sure you understand and can discuss(1)...
• Entities – Strong, Weak, Associative
• Relationships
– First, second and third degree (unary, binary, ternary)
– One to one, one to many and many to many relationships
– Primary and foreign keys
• Attributes
– Simple (atomic) and Composite
– Derived
– Multivalued
– Identifier attributes (keys) and how to choose them
Make sure you understand and can discuss(2)...
• Integrity constraints – Domain and referential
• Anomalies
– Insertion, deletion and modification
• Normalisation – 1st, 2nd and 3rd forms
Next week
Revise SQL