Chapter 1
Introduction to Database system
Chapter 1 - Objectives
Some common uses of database systems.
Characteristics of file-based systems.
Problems with file-based approach.
Meaning of the term database.
Meaning of the term Database Management System (DBMS).
2 Original Slides by T. Connolly
Chapter 1 - Objectives
Typical functions of a DBMS.
Major components of the DBMS environment.
Personnel involved in the DBMS environment.
History of the development of DBMSs.
Advantages and disadvantages of DBMSs.
3 Original Slides by T. Connolly
Chapter 1 - Objectives
Purpose of three-level database architecture.
Contents of external, conceptual, and internal levels.
Purpose of external/conceptual and conceptual/internal mappings.
Meaning of logical and physical data independence.
Distinction between DDL and DML.
A classification of data models.
4 Original Slides by T. Connolly
Chapter 2 - Objectives
Purpose/importance of conceptual modeling.
Typical functions and services a DBMS should provide.
Function and importance of system catalog.
Software components of a DBMS.
Function and uses of Transaction Processing Monitors.
5 Original Slides by T. Connolly
Examples of Database Applications
Purchases from the supermarket
Purchases using your credit card
Booking a holiday at the travel agents
Using the local library
Taking out insurance
Renting a video
Using the Internet
Studying at university
6 Original Slides by T. Connolly
241-212 INTRODUCTION TO DATABASE AND INFORMATION SYSTEM
Dept of Computer Engineering, PSU 2/54 1
File-Based Systems
Collection of application programs that perform services for the end users (e.g. reports).
Each program defines and manages its own data.
7 Original Slides by T. Connolly
File-Based Processing
8 Original Slides by T. Connolly
Limitations of File-Based Approach
Separation and isolation of data
  Each program maintains its own set of data.
  Users of one program may be unaware of potentially useful data held by other programs.
Duplication of data
  Same data is held by different programs.
  Wasted space and potentially different values and/or different formats for the same item.
9 Original Slides by T. Connolly
Limitations of File-Based Approach
Data dependence
  File structure is defined in the program code.
Incompatible file formats
  Programs are written in different languages, and so cannot easily access each other’s files.
Fixed Queries/Proliferation of application programs
  Programs are written to satisfy particular functions.
  Any new requirement needs a new program.
10 Original Slides by T. Connolly
Database Approach
Arose because:
  Definition of data was embedded in application programs, rather than being stored separately and independently.
  No control over access and manipulation of data beyond that imposed by application programs.
Result:
  the database and Database Management System (DBMS).
11 Original Slides by T. Connolly
Database
Shared collection of logically related data (and a description of this data), designed to meet the information needs of an organization.
System catalog (metadata) provides description of data to enable program–data independence.
Logically related data comprises entities, attributes, and relationships of an organization’s information.
12 Original Slides by T. Connolly
Database Management System (DBMS)
A software system that enables users to define, create, maintain, and control access to the database.
(Database) application program: a computer program that interacts with database by issuing an appropriate request (SQL statement) to the DBMS.
13 Original Slides by T. Connolly
Database Management System (DBMS)
14 Original Slides by T. Connolly
Database Approach
Data definition language (DDL).
  Permits specification of data types, structures and any data constraints.
  All specifications are stored in the database.
Data manipulation language (DML).
  General enquiry facility (query language) of the data.
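As a minimal sketch of the DDL/DML split, the following uses Python's built-in sqlite3 module. The staff table, its columns, and the sample rows are hypothetical and are not taken from the slides. The CREATE TABLE statement plays the DDL role (types and constraints), the INSERT/UPDATE/SELECT statements play the DML role, and the final query shows that SQLite keeps the definition in its own catalog table (sqlite_master).

```python
import sqlite3

# In-memory database; table, columns, and rows are hypothetical examples.
conn = sqlite3.connect(":memory:")

# DDL: define the structure, data types, and constraints.
conn.execute(
    """
    CREATE TABLE staff (
        staff_no TEXT PRIMARY KEY,
        name     TEXT NOT NULL,
        salary   REAL CHECK (salary >= 0)
    )
    """
)

# DML: insert, update, and query the data.
conn.execute("INSERT INTO staff VALUES ('S1', 'Ann', 30000)")
conn.execute("INSERT INTO staff VALUES ('S2', 'Bob', 24000)")
conn.execute("UPDATE staff SET salary = 26000 WHERE staff_no = 'S2'")
well_paid = conn.execute(
    "SELECT name FROM staff WHERE salary > 25000 ORDER BY name"
).fetchall()

# The table definition itself is stored in the database catalog.
catalog_entry = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
```

Because the CHECK constraint is part of the stored definition, an attempt to insert a negative salary is rejected by the DBMS rather than by application code.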
15 Original Slides by T. Connolly
Database Approach
Controlled access to database may include:
  a security system
  an integrity system
  a concurrency control system
  a recovery control system
  a user-accessible catalog.
16 Original Slides by T. Connolly
Views
Allows each user to have his or her own view of the database.
A view is essentially some subset of the database.
17 Original Slides by T. Connolly
Views - Benefits
Reduce complexity
Provide a level of security
Provide a mechanism to customize the appearance of the database
Present a consistent, unchanging picture of the structure of the database, even if the underlying database is changed
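The security and "unchanging picture" benefits can be illustrated with a small sketch, again using sqlite3 with a hypothetical staff table. The view staff_public exposes only two columns, and it returns the same rows before and after a column is added to the underlying table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical base table; salary is the column we want to hide.
conn.execute("CREATE TABLE staff (staff_no TEXT, name TEXT, salary REAL)")
conn.execute("INSERT INTO staff VALUES ('S1', 'Ann', 30000)")

# A view is a named subset of the database: here, two of three columns.
conn.execute("CREATE VIEW staff_public AS SELECT staff_no, name FROM staff")
before = conn.execute("SELECT * FROM staff_public").fetchall()

# The underlying table changes (a new column is added)...
conn.execute("ALTER TABLE staff ADD COLUMN branch TEXT")

# ...but users of the view still see the same picture.
after = conn.execute("SELECT * FROM staff_public").fetchall()
```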
18 Original Slides by T. Connolly
Components of DBMS Environment
19 Original Slides by T. Connolly
Components of DBMS Environment
Hardware
  Can range from a PC to a network of computers.
Software
  DBMS, operating system, network software (if necessary) and also the application programs.
Data
  Used by the organization and a description of this data called the schema.
20 Original Slides by T. Connolly
Components of DBMS Environment
Procedures
  Instructions and rules that should be applied to the design and use of the database and DBMS.
People
21 Original Slides by T. Connolly
Roles in the Database Environment
Data Administrator (DA)
Database Administrator (DBA)
Database Designers (Logical and Physical)
Application Programmers
End Users (naive and sophisticated)
22 Original Slides by T. Connolly
History of Database Systems
First-generation Hierarchical and Network
Second generation Relational
Third generation Object-Relational Object-Oriented
23 Original Slides by T. Connolly
Advantages of DBMSs
Control of data redundancy
Data consistency
More information from the same amount of data
Sharing of data
Improved data integrity
Improved security
Enforcement of standards
Economy of scale
24 Original Slides by T. Connolly
Advantages of DBMSs
Balance conflicting requirements
Improved data accessibility and responsiveness
Increased productivity
Improved maintenance through data independence
Increased concurrency
Improved backup and recovery services
25 Original Slides by T. Connolly
Disadvantages of DBMSs
Complexity
Size
Cost of DBMS
Additional hardware costs
Cost of conversion
Performance
Higher impact of a failure
26 Original Slides by T. Connolly
Objectives of Three-Level Architecture
All users should be able to access same data.
A user’s view is immune to changes made in other views.
Users should not need to know physical database storage details.
27 Original Slides by T. Connolly
Objectives of Three-Level Architecture
DBA should be able to change database storage structures without affecting the users’ views.
Internal structure of database should be unaffected by changes to physical aspects of storage.
DBA should be able to change conceptual structure of database without affecting all users.
28 Original Slides by T. Connolly
ANSI-SPARC Three-Level Architecture
29 Original Slides by T. Connolly
ANSI-SPARC Three-Level Architecture
External Level
  Users’ view of the database.
  Describes that part of database that is relevant to a particular user.
Conceptual Level
  Community view of the database.
  Describes what data is stored in database and relationships among the data.
30 Original Slides by T. Connolly
ANSI-SPARC Three-Level Architecture
Internal Level
  Physical representation of the database on the computer.
  Describes how the data is stored in the database.
31 Original Slides by T. Connolly
Differences between Three Levels of ANSI-SPARC Architecture
32 Original Slides by T. Connolly
Data Independence
Logical Data Independence
  Refers to immunity of external schemas to changes in conceptual schema.
  Conceptual schema changes (e.g. addition/removal of entities).
  Should not require changes to external schema or rewrites of application programs.
33 Original Slides by T. Connolly
Data Independence
Physical Data Independence
  Refers to immunity of conceptual schema to changes in the internal schema.
  Internal schema changes (e.g. using different file organizations, storage structures/devices).
  Should not require change to conceptual or external schemas.
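One way to see physical data independence in practice is to change a storage structure and confirm that an unchanged query still returns the same answer. The sketch below assumes a hypothetical account table in SQLite and treats adding an index as the internal-schema change. It illustrates the idea only and is not a statement about how any particular DBMS organizes storage.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and rows, used only for illustration.
conn.execute("CREATE TABLE account (account_number TEXT, balance REAL)")
conn.executemany(
    "INSERT INTO account VALUES (?, ?)",
    [("A-101", 500.0), ("A-102", 400.0), ("A-215", 700.0)],
)

# The application query is written once against the conceptual schema.
query = (
    "SELECT account_number FROM account "
    "WHERE balance > 450 ORDER BY account_number"
)
before = conn.execute(query).fetchall()

# Internal-schema change: add an index (a different access structure).
conn.execute("CREATE INDEX idx_account_balance ON account (balance)")

# The same query text still works and yields the same rows.
after = conn.execute(query).fetchall()
```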
34 Original Slides by T. Connolly
Data Independence and the ANSI-SPARC Three-Level Architecture
35 Original Slides by T. Connolly
Database Languages
Data Definition Language (DDL) Allows the DBA or user to describe and name entities,
attributes, and relationships required for the application
plus any associated integrity and security constraints.
36 Original Slides by T. Connolly
Database Languages
Data Manipulation Language (DML)
  Provides basic data manipulation operations on data held in the database.
Procedural DML
  allows user to tell system exactly how to manipulate data.
Non-Procedural DML
  allows user to state what data is needed rather than how it is to be retrieved.
Fourth Generation Languages (4GLs)
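The contrast between procedural and non-procedural manipulation can be sketched by computing the same result in both styles. In this hypothetical example the account rows and function names are invented for illustration. The first function spells out how to walk the records, while the second only states what is wanted in SQL and leaves the retrieval strategy to the DBMS.

```python
import sqlite3

# Hypothetical (account_number, balance) records.
rows = [("A-101", 500.0), ("A-102", 400.0), ("A-215", 700.0)]


def total_procedural(records, minimum):
    """Procedural style: say exactly how to scan and accumulate."""
    total = 0.0
    for _, balance in records:
        if balance >= minimum:
            total += balance
    return total


def total_declarative(records, minimum):
    """Non-procedural style: state what is needed, not how to get it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (account_number TEXT, balance REAL)")
    conn.executemany("INSERT INTO account VALUES (?, ?)", records)
    (total,) = conn.execute(
        "SELECT COALESCE(SUM(balance), 0.0) FROM account WHERE balance >= ?",
        (minimum,),
    ).fetchone()
    conn.close()
    return total
```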
37 Original Slides by T. Connolly
Data Model
Integrated collection of concepts for describing data, relationships between data, and constraints on the data in an organization.
Data Model comprises: a structural part; a manipulative part; possibly a set of integrity rules.
38 Original Slides by T. Connolly
Data Model
Purpose To represent data in an understandable way.
Categories of data models include: Object-based Record-based Physical.
39 Original Slides by T. Connolly
Data Models
Object-Based Data Models
  Entity-Relationship
  Semantic
  Functional
  Object-Oriented.
Record-Based Data Models
  Relational Data Model
  Network Data Model
  Hierarchical Data Model.
Physical Data Models
40 Original Slides by T. Connolly
Relational Data Model
41 Original Slides by T. Connolly
Network Data Model
42 Original Slides by T. Connolly
Hierarchical Data Model
43 Original Slides by T. Connolly
Conceptual Modeling
Conceptual schema is the core of a system supporting all user views.
Should be complete and accurate representation of an organization’s data requirements.
Conceptual modeling is process of developing a model of information use that is independent of implementation details.
Result is a conceptual data model.
44 Original Slides by T. Connolly
Functions of a DBMS
Data Storage, Retrieval, and Update.
A User-Accessible Catalog.
Transaction Support.
Concurrency Control Services.
Recovery Services.
45 Original Slides by T. Connolly
Functions of a DBMS
Authorization Services.
Support for Data Communication.
Integrity Services.
Services to Promote Data Independence.
Utility Services.
46 Original Slides by T. Connolly
Chapter 2
Entity-Relationship Model
Chapter 2 - Objectives
Entity Sets
Relationship Sets
Design Issues
Mapping Constraints
Keys
E-R Diagram
Extended E-R Features
Design of an E-R Database Schema
Reduction of an E-R Schema to Tables
2 Original Slides by Avi Silberschatz
Entity Sets
A database can be modeled as:
  a collection of entities,
  relationship among entities.
An entity is an object that exists and is distinguishable from other objects.
  Example: specific person, company, event, plant
Entities have attributes
  Example: people have names and addresses
An entity set is a set of entities of the same type that share the same properties.
  Example: set of all persons, companies, trees, holidays
3 Original Slides by Avi Silberschatz
Entity Sets customer and loan
[Figure: sample customer entities with attributes customer-id, customer-name, customer-street, customer-city, and sample loan entities with attributes loan-number, amount]
4 Original Slides by Avi Silberschatz
Attributes
An entity is represented by a set of attributes, that is, descriptive properties possessed by all members of an entity set.
Domain – the set of permitted values for each attribute
Attribute types:
  Simple and composite attributes.
  Single-valued and multi-valued attributes
    E.g. multivalued attribute: phone-numbers
  Derived attributes
    Can be computed from other attributes
    E.g. age, given date of birth
Example:
  customer = (customer-id, customer-name, customer-street, customer-city)
  loan = (loan-number, amount)
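The derived attribute mentioned above (age, given date of birth) can be sketched as a function rather than a stored value. The function name age_on and the explicit today parameter are assumptions made so that the example is deterministic.

```python
from datetime import date


def age_on(date_of_birth: date, today: date) -> int:
    """Derive age in whole years from the stored date of birth."""
    years = today.year - date_of_birth.year
    # Subtract one year if this year's birthday has not been reached yet.
    if (today.month, today.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years
```

Storing only the date of birth and computing age on demand avoids a value that would otherwise go stale every year.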
5 Original Slides by Avi Silberschatz
Composite Attributes
6 Original Slides by Avi Silberschatz
Relationship Sets
A relationship is an association among several entities
  Example:
    Hayes             depositor           A-102
    customer entity   relationship set    account entity
A relationship set is a mathematical relation among n ≥ 2 entities, each taken from entity sets
  {(e1, e2, …, en) | e1 ∈ E1, e2 ∈ E2, …, en ∈ En}
  where (e1, e2, …, en) is a relationship
  Example:
    (Hayes, A-102) ∈ depositor
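The set-theoretic definition above maps directly onto Python sets of tuples. In this sketch only Hayes and A-102 come from the slide; the other customer names and account numbers are hypothetical, as is the helper is_relationship_set.

```python
# Entity sets (Hayes and A-102 are from the slide; the rest are made up).
customer = {"Hayes", "Jones", "Smith"}
account = {"A-101", "A-102", "A-215"}

# A relationship set is a set of tuples, one component per entity set.
depositor = {("Hayes", "A-102"), ("Jones", "A-101")}


def is_relationship_set(relation, *entity_sets):
    """Check that every tuple takes its i-th component from the i-th entity set."""
    n = len(entity_sets)
    if n < 2:
        return False  # the definition requires n >= 2
    return all(
        len(t) == n and all(e in s for e, s in zip(t, entity_sets))
        for t in relation
    )
```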
7 Original Slides by Avi Silberschatz
Relationship Set borrower
8 Original Slides by Avi Silberschatz
Relationship Sets (Cont.)
9
An attribute can also be property of a relationship set. For instance, the depositor relationship set between entity sets
customer and account may have the attribute access-date
Original Slides by Avi Silberschatz
Degree of a Relationship Set
Refers to number of entity sets that participate in a relationship set.
Relationship sets that involve two entity sets are binary (or degree two). Generally, most relationship sets in a database system are binary.
Relationship sets may involve more than two entity sets.
Relationships between more than two entity sets are rare. Most relationships are binary. (More on this later.)
  E.g. Suppose employees of a bank may have jobs (responsibilities) at multiple branches, with different jobs at different branches. Then there is a ternary relationship set between entity sets employee, job and branch
10 Original Slides by Avi Silberschatz
Mapping Cardinalities
11
Express the number of entities to which another entity can be associated via a relationship set.
Most useful in describing binary relationship sets.
For a binary relationship set the mapping cardinality must be one of the following types:
  One to one
  One to many
  Many to one
  Many to many
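To make the four types concrete, the following sketch classifies a sample of (a, b) pairs from a binary relationship set from A to B. The function name and the sample data in the usage note are hypothetical. A mapping cardinality is a constraint declared on the schema, so a check like this can only report what a particular instance exhibits; it cannot prove the constraint holds in general.

```python
from collections import Counter


def mapping_cardinality(pairs):
    """Classify the (a, b) pairs of one instance of a binary relationship set."""
    pairs = set(pairs)
    a_counts = Counter(a for a, _ in pairs)
    b_counts = Counter(b for _, b in pairs)
    a_to_many = any(c > 1 for c in a_counts.values())  # some a has several b
    b_to_many = any(c > 1 for c in b_counts.values())  # some b has several a
    if a_to_many and b_to_many:
        return "many to many"
    if a_to_many:
        return "one to many"
    if b_to_many:
        return "many to one"
    return "one to one"
```

For example, a sample in which one customer holds two loans and no loan is shared, such as {("c1", "l1"), ("c1", "l2")}, is reported as one to many from customer to loan.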
Original Slides by Avi Silberschatz
Mapping Cardinalities
12
[Figure: one to one; one to many]
Note: Some elements in A and B may not be mapped to any elements in the other set
Original Slides by Avi Silberschatz
Mapping Cardinalities
[Figure: many to one; many to many]
Note: Some elements in A and B may not be mapped to any elements in the other set
13 Original Slides by Avi Silberschatz
Mapping Cardinalities affect ER Design
14
Can make access-date an attribute of account, instead of a relationship attribute, if each account can have only one customer
  I.e., the relationship from account to customer is many to one, or equivalently, customer to account is one to many
Original Slides by Avi Silberschatz
E-R Diagrams
15
Rectangles represent entity sets. Diamonds represent relationship sets. Lines link attributes to entity sets and entity sets to relationship sets. Ellipses represent attributes
Double ellipses represent multivalued attributes. Dashed ellipses denote derived attributes.
Underline indicates primary key attributes (will study later)
Original Slides by Avi Silberschatz
E-R Diagram With Composite, Multivalued, and Derived Attributes
16 Original Slides by Avi Silberschatz
Relationship Sets with Attributes
17 Original Slides by Avi Silberschatz
Roles
18
Entity sets of a relationship need not be distinct
The labels “manager” and “worker” are called roles; they specify how employee entities interact via the works-for relationship set.
Roles are indicated in E-R diagrams by labeling the lines that connect diamonds to rectangles.
Role labels are optional, and are used to clarify semantics of the relationship
Original Slides by Avi Silberschatz
Cardinality Constraints
19
We express cardinality constraints by drawing either a directed line (→), signifying “one,” or an undirected line (—), signifying “many,” between the relationship set and the entity set.
E.g.: One-to-one relationship:
  A customer is associated with at most one loan via the relationship borrower
  A loan is associated with at most one customer via borrower
Original Slides by Avi Silberschatz
One-To-Many Relationship
20
In the one-to-many relationship a loan is associated with at most one customer via borrower, a customer is associated with several (including 0) loans via borrower
Original Slides by Avi Silberschatz
Many-To-One Relationships
In a many-to-one relationship a loan is associated with several (including 0) customers via borrower, a customer is associated with at most one loan via borrower
21 Original Slides by Avi Silberschatz
Many-To-Many Relationship
A customer is associated with several (possibly 0) loans via borrower
A loan is associated with several (possibly 0) customers via borrower
22 Original Slides by Avi Silberschatz
Participation of an Entity Set in a Relationship Set
23
Total participation (indicated by double line): every entity in the entity set participates in at least one relationship in the relationship set
  E.g. participation of loan in borrower is total
    every loan must have a customer associated to it via borrower
Partial participation: some entities may not participate in any relationship in the relationship set
  E.g. participation of customer in borrower is partial
Original Slides by Avi Silberschatz
Alternative Notation for Cardinality Limits
Cardinality limits can also express participation constraints
24 Original Slides by Avi Silberschatz
Keys
25
A super key of an entity set is a set of one or more attributes whose values uniquely determine each entity.
A candidate key of an entity set is a minimal super key
  Customer-id is candidate key of customer
  account-number is candidate key of account
Although several candidate keys may exist, one of the candidate keys is selected to be the primary key.
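The definitions of super key and candidate key can be explored against sample data with the sketch below. The function names and the sample rows are hypothetical. Keys are constraints chosen from the meaning of the data, so a check against one instance can only rule candidates out. In the sample in the usage note, (customer-name, customer-city) happens to be unique, which is an accident of the data and not a real key.

```python
from itertools import combinations


def is_superkey(rows, attrs):
    """True if no two rows in this instance agree on all attributes in attrs."""
    seen = set()
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key in seen:
            return False
        seen.add(key)
    return True


def candidate_keys(rows, attributes):
    """Minimal attribute sets that are super keys of this instance."""
    found = []
    # Smaller sets are tried first, so any superset of a found key is skipped.
    for size in range(1, len(attributes) + 1):
        for attrs in combinations(attributes, size):
            if any(set(key) <= set(attrs) for key in found):
                continue  # contains a smaller key, so it is not minimal
            if is_superkey(rows, attrs):
                found.append(attrs)
    return found


# Hypothetical sample instance of the customer entity set.
customers = [
    {"customer-id": "1", "customer-name": "Hayes", "customer-city": "Harrison"},
    {"customer-id": "2", "customer-name": "Hayes", "customer-city": "Rye"},
    {"customer-id": "3", "customer-name": "Jones", "customer-city": "Harrison"},
]
attributes = ("customer-id", "customer-name", "customer-city")
keys = candidate_keys(customers, attributes)
```

Here keys contains ("customer-id",) and also ("customer-name", "customer-city"); a designer would still pick customer-id as the primary key because only it is guaranteed unique by the semantics of the enterprise.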
Original Slides by Avi Silberschatz
Keys for Relationship Sets
The combination of primary keys of the participating entity sets forms a super key of a relationship set.
  (customer-id, account-number) is the super key of depositor
  NOTE: this means a pair of entities can have at most one relationship in a particular relationship set.
    E.g. if we wish to track all access-dates to each account by each customer, we cannot assume a relationship for each access. We can use a multivalued attribute though
Must consider the mapping cardinality of the relationship set when deciding what the candidate keys are
Need to consider semantics of relationship set in selecting the primary key in case of more than one candidate key
26 Original Slides by Avi Silberschatz
E-R Diagram with a Ternary Relationship
27 Original Slides by Avi Silberschatz
Cardinality Constraints on Ternary Relationship
We allow at most one arrow out of a ternary (or greater degree) relationship to indicate a cardinality constraint
E.g. an arrow from works-on to job indicates each employee works on at most one job at any branch.
If there is more than one arrow, there are two ways of defining the meaning.
  E.g. a ternary relationship R between A, B and C with arrows to B and C could mean
    1. each A entity is associated with a unique entity from B and C, or
    2. each pair of entities from (A, B) is associated with a unique C entity, and each pair (A, C) is associated with a unique B
  Each alternative has been used in different formalisms
  To avoid confusion we outlaw more than one arrow
28 Original Slides by Avi Silberschatz
Binary Vs. Non-Binary Relationships
29
Some relationships that appear to be non-binary may be better represented using binary relationships
  E.g. A ternary relationship parents, relating a child to his/her father and mother, is best replaced by two binary relationships, father and mother
    Using two binary relationships allows partial information (e.g. only mother being known)
But there are some relationships that are naturally non-binary
  E.g. works-on
Original Slides by Avi Silberschatz
Converting Non-Binary Relationships to Binary Form
30
In general, any non-binary relationship can be represented using binary relationships by creating an artificial entity set.
  Replace R between entity sets A, B and C by an entity set E, and three relationship sets:
    1. RA, relating E and A
    2. RB, relating E and B
    3. RC, relating E and C
  Create a special identifying attribute for E
  Add any attributes of R to E
  For each relationship (ai, bi, ci) in R, create
    1. a new entity ei in the entity set E
    2. add (ei, ai) to RA
    3. add (ei, bi) to RB
    4. add (ei, ci) to RC
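The conversion steps above can be sketched in a few lines. The function names, the generated identifiers e1, e2, …, and the sample works_on tuples are assumptions made for illustration. The second function rebuilds the ternary relationship set from the binary ones to show that no information is lost for a well-formed translation.

```python
def to_binary(r):
    """Replace ternary relationship set R over (A, B, C) by E, RA, RB, RC."""
    e_set, ra, rb, rc = [], set(), set(), set()
    for i, (a, b, c) in enumerate(sorted(r), start=1):
        e = f"e{i}"  # special identifying attribute for the new entity
        e_set.append(e)
        ra.add((e, a))
        rb.add((e, b))
        rc.add((e, c))
    return e_set, ra, rb, rc


def to_ternary(e_set, ra, rb, rc):
    """Rebuild R, assuming each entity of E appears exactly once in RA, RB, RC."""
    a_of, b_of, c_of = dict(ra), dict(rb), dict(rc)
    return {(a_of[e], b_of[e], c_of[e]) for e in e_set}


# Hypothetical (employee, job, branch) relationships.
works_on = {
    ("Jones", "teller", "Downtown"),
    ("Jones", "auditor", "Uptown"),
}
e_set, ra, rb, rc = to_binary(works_on)
```

The assumption stated in to_ternary is exactly the kind of constraint that the translated schema needs but cannot always express.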
Original Slides by Avi Silberschatz
Converting Non-Binary Relationships (Cont.)
31
Also need to translate constraints
  Translating all constraints may not be possible
  There may be instances in the translated schema that cannot correspond to any instance of R
    Exercise: add constraints to the relationships RA, RB and RC to ensure that a newly created entity corresponds to exactly one entity in each of entity sets A, B and C
  We can avoid creating an identifying attribute by making E a weak entity set (described shortly) identified by the three relationship sets
Original Slides by Avi Silberschatz
Design Issues
32
Use of entity sets vs. attributes
  Choice mainly depends on the structure of the enterprise being modeled, and on the semantics associated with the attribute in question.
Use of entity sets vs. relationship sets
  Possible guideline is to designate a relationship set to describe an action that occurs between entities
Binary versus n-ary relationship sets
  Although it is possible to replace any nonbinary (n-ary, for n > 2) relationship set by a number of distinct binary relationship sets, an n-ary relationship set shows more clearly that several entities participate in a single relationship.
Placement of relationship attributes
Original Slides by Avi Silberschatz
Weak Entity Sets
An entity set that does not have a primary key is referred to as a weak entity set.
The existence of a weak entity set depends on the existence of an identifying entity set
  it must relate to the identifying entity set via a total, one-to-many relationship set from the identifying to the weak entity set
  Identifying relationship depicted using a double diamond
The discriminator (or partial key) of a weak entity set is the set of attributes that distinguishes among all the entities of a weak entity set.
The primary key of a weak entity set is formed by the primary key of the strong entity set on which the weak entity set is existence dependent, plus the weak entity set’s discriminator.
33 Original Slides by Avi Silberschatz
Weak Entity Sets (Cont.)
34
We depict a weak entity set by double rectangles.
We underline the discriminator of a weak entity set with a dashed line.
payment-number – discriminator of the payment entity set
Primary key for payment – (loan-number, payment-number)
Original Slides by Avi Silberschatz
Weak Entity Sets (Cont.)
35
Note: the primary key of the strong entity set is not explicitly stored with the weak entity set, since it is implicit in the identifying relationship.
If loan-number were explicitly stored, payment could be made a strong entity, but then the relationship between payment and loan would be duplicated by an implicit relationship defined by the attribute loan-number common to payment and loan
Original Slides by Avi Silberschatz
More Weak Entity Set Examples
36
In a university, a course is a strong entity and a course-offering can be modeled as a weak entity
The discriminator of course-offering would be semester (including year) and section-number (if there is more than one section)
If we model course-offering as a strong entity we would model course-number as an attribute. Then the relationship with course would be implicit in the course-number attribute
Original Slides by Avi Silberschatz
Specialization
37
Top-down design process; we designate subgroupings within an entity set that are distinctive from other entities in the set.
These subgroupings become lower-level entity sets that have attributes or participate in relationships that do not apply to the higher-level entity set.
Depicted by a triangle component labeled ISA (E.g. customer “is a” person).
Attribute inheritance – a lower-level entity set inherits all the attributes and relationship participation of the higher-level entity set to which it is linked.
Original Slides by Avi Silberschatz
38
Specialization Example
Original Slides by Avi Silberschatz
Generalization
39
A bottom-up design process – combine a number of entity sets that share the same features into a higher-level entity set.
Specialization and generalization are simple inversions of each other; they are represented in an E-R diagram in the same way.
The terms specialization and generalization are used interchangeably.
Original Slides by Avi Silberschatz
Specialization and Generalization (Contd.)
40
Can have multiple specializations of an entity set based on different features.
E.g. permanent-employee vs. temporary-employee, in addition to officer vs. secretary vs. teller
Each particular employee would be
  a member of one of permanent-employee or temporary-employee,
  and also a member of one of officer, secretary, or teller
The ISA relationship is also referred to as superclass-subclass relationship
Original Slides by Avi Silberschatz
Design Constraints on a Specialization/Generalization
41
Constraint on which entities can be members of a given lower-level entity set.
  condition-defined
    E.g. all customers over 65 years are members of senior-citizen entity set; senior-citizen ISA person.
  user-defined
Constraint on whether or not entities may belong to more than one lower-level entity set within a single generalization.
  Disjoint
    an entity can belong to only one lower-level entity set
    Noted in E-R diagram by writing disjoint next to the ISA triangle
  Overlapping
    an entity can belong to more than one lower-level entity set
Original Slides by Avi Silberschatz
Design Constraints on a Specialization/Generalization (Contd.)
42
Completeness constraint -- specifies whether or not an entity in the higher-level entity set must belong to at least one of the lower-level entity sets within a generalization.
  total: an entity must belong to one of the lower-level entity sets
  partial: an entity need not belong to one of the lower-level entity sets
Original Slides by Avi Silberschatz
Aggregation
43
Consider the ternary relationship works-on, which we saw earlier
Suppose we want to record managers for tasks performed by an employee at a branch
Original Slides by Avi Silberschatz
Aggregation (Cont.)
44
Relationship sets works-on and manages represent overlapping information
  Every manages relationship corresponds to a works-on relationship
  However, some works-on relationships may not correspond to any manages relationships
    So we can’t discard the works-on relationship
Eliminate this redundancy via aggregation
  Treat relationship as an abstract entity
  Allows relationships between relationships
  Abstraction of relationship into new entity
Without introducing redundancy, the following diagram represents:
  An employee works on a particular job at a particular branch
  An employee, branch, job combination may have an associated manager
Original Slides by Avi Silberschatz
E-R Diagram With Aggregation
45 Original Slides by Avi Silberschatz
E-R Design Decisions
46
The use of an attribute or entity set to represent an object.
Whether a real-world concept is best expressed by an entity set or a relationship set.
The use of a ternary relationship versus a pair of binary relationships.
The use of a strong or weak entity set.
The use of specialization/generalization – contributes to modularity in the design.
The use of aggregation – can treat the aggregate entity set as a single unit without concern for the details of its internal structure.
Original Slides by Avi Silberschatz
47
E-R Diagram for a Banking Enterprise
Original Slides by Avi Silberschatz
Summary of Symbols Used in E-R Notation
48 Original Slides by Avi Silberschatz
Summary of Symbols (Cont.)
49 Original Slides by Avi Silberschatz
Alternative E-R Notations
50 Original Slides by Avi Silberschatz
Reduction of an E-R Schema to Tables
51
Primary keys allow entity sets and relationship sets to be expressed uniformly as tables which represent the contents of the database.
A database which conforms to an E-R diagram can be represented by a collection of tables.
For each entity set and relationship set there is a unique table which is assigned the name of the corresponding entity set or relationship set.
Each table has a number of columns (generally corresponding to attributes), which have unique names.
Converting an E-R diagram to a table format is the basis for deriving a relational database design from an E-R diagram.
Original Slides by Avi Silberschatz
Representing Entity Sets as Tables
52
A strong entity set reduces to a table with the same attributes.
Original Slides by Avi Silberschatz
Composite and Multivalued Attributes
53
Composite attributes are flattened out by creating a separate attribute for each component attribute
  E.g. given entity set customer with composite attribute name with component attributes first-name and last-name the table corresponding to the entity set has two attributes name.first-name and name.last-name
A multivalued attribute M of an entity E is represented by a separate table EM
  Table EM has attributes corresponding to the primary key of E and an attribute corresponding to multivalued attribute M
  E.g. Multivalued attribute dependent-names of employee is represented by a table
    employee-dependent-names(employee-id, dname)
  Each value of the multivalued attribute maps to a separate row of the table EM
    E.g., an employee entity with primary key John and dependents Johnson and Johndotir maps to two rows: (John, Johnson) and (John, Johndotir)
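The two reduction rules above can be sketched as one function that turns an entity into a flat row plus extra rows for each multivalued attribute. The function name reduce_entity, the dictionary representation, and the last-name value are assumptions; the employee key John and the two dependents come from the slide.

```python
def reduce_entity(entity, key, composite=(), multivalued=()):
    """Return (flat_row, extra_rows) for one entity.

    Composite attributes become one column per component, named
    "<attribute>.<component>". Each multivalued attribute yields a list of
    (primary key value, single value) rows for its own separate table.
    """
    row, extra = {}, {}
    for attr, value in entity.items():
        if attr in composite:
            for part, part_value in value.items():
                row[f"{attr}.{part}"] = part_value
        elif attr in multivalued:
            extra[attr] = [(entity[key], v) for v in value]
        else:
            row[attr] = value
    return row, extra


employee = {
    "employee-id": "John",
    "name": {"first-name": "John", "last-name": "Doe"},  # last name is made up
    "dependent-names": ["Johnson", "Johndotir"],
}
row, extra = reduce_entity(
    employee,
    key="employee-id",
    composite=("name",),
    multivalued=("dependent-names",),
)
```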
Original Slides by Avi Silberschatz
Representing Weak Entity Sets
54
A weak entity set becomes a table that includes a column for the primary key of the identifying strong entity set
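A minimal sketch of this rule for the loan/payment example uses sqlite3. The column payment_amount and the sample rows are hypothetical. The payment table carries loan_number from the identifying strong entity set, and its primary key is that column plus the discriminator payment_number. The foreign key with ON DELETE CASCADE models existence dependence, which is one possible design choice rather than something the slides prescribe.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

conn.execute("CREATE TABLE loan (loan_number TEXT PRIMARY KEY, amount REAL)")
conn.execute(
    """
    CREATE TABLE payment (
        loan_number    TEXT NOT NULL
                       REFERENCES loan (loan_number) ON DELETE CASCADE,
        payment_number INTEGER NOT NULL,   -- discriminator
        payment_amount REAL,
        PRIMARY KEY (loan_number, payment_number)
    )
    """
)

# Hypothetical rows: payment number 1 exists under two different loans.
conn.executemany("INSERT INTO loan VALUES (?, ?)", [("L-17", 1000), ("L-23", 2000)])
conn.executemany(
    "INSERT INTO payment VALUES (?, ?, ?)",
    [("L-17", 1, 100), ("L-23", 1, 150)],
)

# Removing a loan removes its payments, since they cannot exist on their own.
conn.execute("DELETE FROM loan WHERE loan_number = 'L-17'")
remaining = conn.execute(
    "SELECT loan_number, payment_number FROM payment ORDER BY loan_number"
).fetchall()
```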
Original Slides by Avi Silberschatz
Representing Relationship Sets as Tables
55
A many-to-many relationship set is represented as a table with columns for the primary keys of the two participating entity sets, and any descriptive attributes of the relationship set.
E.g.: table for relationship set borrower
Original Slides by Avi Silberschatz
Redundancy of Tables
56
Many-to-one and one-to-many relationship sets that are total on the many-side can be represented by adding an extra attribute to the many side, containing the primary key of the one side
E.g.: Instead of creating a table for relationship account-branch, add an attribute branch to the entity set account
Original Slides by Avi Silberschatz
Redundancy of Tables (Cont.)
57
For one-to-one relationship sets, either side can be chosen to act as the “many” side
  That is, extra attribute can be added to either of the tables corresponding to the two entity sets
If participation is partial on the many side, replacing a table by an extra attribute in the relation corresponding to the “many” side could result in null values
The table corresponding to a relationship set linking a weak entity set to its identifying strong entity set is redundant.
  E.g. The payment table already contains the information that would appear in the loan-payment table (i.e., the columns loan-number and payment-number).
Original Slides by Avi Silberschatz
Representing Specialization as Tables
58
Method 1:
Form a table for the higher level entity.
Form a table for each lower level entity set, including the primary key of the higher level entity set and local attributes.
table       table attributes
person      name, street, city
customer    name, credit-rating
employee    name, salary
Drawback: getting information about, e.g., employee requires accessing two tables.
Original Slides by Avi Silberschatz
Representing Specialization as Tables (Cont.)
59
Method 2:
Form a table for each entity set with all local and inherited attributes.
table       table attributes
person      name, street, city
customer    name, street, city, credit-rating
employee    name, street, city, salary
If specialization is total, the table for the generalized entity (person) is not required to store information.
It can be defined as a “view” relation containing the union of the specialization tables.
But an explicit table may still be needed for foreign key constraints.
Drawback: street and city may be stored redundantly for persons who are both customers and employees.
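A minimal sketch of Method 2 with person defined as a view, using Python's sqlite3 module. The column types, the underscore in credit_rating, and all sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (name TEXT PRIMARY KEY, street TEXT, city TEXT, credit_rating INTEGER);
    CREATE TABLE employee (name TEXT PRIMARY KEY, street TEXT, city TEXT, salary INTEGER);
    -- With total specialization, person can be derived instead of stored.
    CREATE VIEW person AS
        SELECT name, street, city FROM customer
        UNION
        SELECT name, street, city FROM employee;
    -- Hypothetical rows: Hayes is both a customer and an employee.
    INSERT INTO customer VALUES ('Hayes', 'Main', 'Harrison', 7);
    INSERT INTO employee VALUES ('Hayes', 'Main', 'Harrison', 50000);
    INSERT INTO employee VALUES ('Jones', 'Park', 'Brooklyn', 42000);
    """
)
people = conn.execute(
    "SELECT name, street, city FROM person ORDER BY name"
).fetchall()
print(people)
```

Hayes's street and city are stored twice (the drawback noted above), but the UNION in the view returns the person only once.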
Original Slides by Avi Silberschatz
Relations Corresponding to Aggregation
60
To represent aggregation, create a table containing the primary key of the aggregated relationship, the primary key of the associated entity set, and any descriptive attributes.
Original Slides by Avi Silberschatz
Relations Corresponding to Aggregation (Cont.)
61
E.g. to represent aggregation manages between relationship works-on and entity set manager, create a table manages(employee-id, branch-name, title, manager-name).
Table works-on is redundant provided we are willing to store null values for attribute manager-name in table manages
Original Slides by Avi Silberschatz
Chapter 3
The Relational database design
Chapter 3 - Objectives
2
Terminology of relational model.
How tables are used to represent data.
Connection between mathematical relations and relations in the relational model.
Properties of database relations.
How to identify CK, PK, and FKs.
Meaning of entity integrity and referential integrity.
Original Slides by T. Connolly
Chapter 3 - Objectives
3
The purpose of normalization.
How normalization can be used when designing a relational database.
The potential problems associated with redundant data in base relations.
The concept of functional dependency, which describes the relationship between attributes.
The characteristics of functional dependencies used in normalization.
Original Slides by T. Connolly
Chapter 3 - Objectives
4
How to identify functional dependencies for a given relation.
How functional dependencies identify the primary key for a relation.
How to undertake the process of normalization.
How normalization uses functional dependencies to group attributes into relations that are in a known normal form.
Original Slides by T. Connolly
Chapter 3 - Objectives
5
How to identify the most commonly used normal forms, namely First Normal Form (1NF), Second Normal Form (2NF), and Third Normal Form (3NF).
The problems associated with relations that break the rules of 1NF, 2NF, or 3NF.
How to represent attributes shown on a form as 3NF relations using normalization.
Original Slides by T. Connolly
Relational Model Terminology
6
A relation is a table with columns and rows.
Only applies to logical structure of the database, not the physical structure.
Attribute is a named column of a relation.
Domain is the set of allowable values for one or more attributes.
Original Slides by T. Connolly
Relational Model Terminology
7
Tuple is a row of a relation.
Degree is the number of attributes in a relation.
Cardinality is the number of tuples in a relation.
Relational Database is a collection of normalized relations with distinct relation names.
Original Slides by T. Connolly
Instances of Branch and Staff Relations
8 Original Slides by T. Connolly
Examples of Attribute Domains
9 Original Slides by T. Connolly
Alternative Terminology for Relational Model
10 Original Slides by T. Connolly
Mathematical Definition of Relation
11
Consider two sets, $D_1$ and $D_2$, where $D_1 = \{2, 4\}$ and $D_2 = \{1, 3, 5\}$.
Cartesian product, $D_1 \times D_2$, is the set of all ordered pairs, where the first element is a member of $D_1$ and the second element is a member of $D_2$.
$D_1 \times D_2 = \{(2, 1), (2, 3), (2, 5), (4, 1), (4, 3), (4, 5)\}$
Alternative way is to find all combinations of elements with first from $D_1$ and second from $D_2$.
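The Cartesian product of the two example sets can be reproduced with `itertools.product` from the Python standard library. Lists are used here only to keep the printed order the same as on the slide.

```python
from itertools import product

D1 = [2, 4]
D2 = [1, 3, 5]

# product() yields every ordered pair: first element from D1, second from D2.
cartesian = list(product(D1, D2))
print(cartesian)
```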
Original Slides by T. Connolly
Mathematical Definition of Relation
12
Any subset of the Cartesian product is a relation; e.g. $R = \{(2, 1), (4, 1)\}$
May specify which pairs are in the relation using some condition for selection; e.g.
second element is 1: $R = \{(x, y) \mid x \in D_1, y \in D_2, \text{ and } y = 1\}$
first element is always twice the second: $S = \{(x, y) \mid x \in D_1, y \in D_2, \text{ and } x = 2y\}$
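The two selection conditions translate directly into Python set comprehensions. The sketch below reuses the slide's sets D1 = {2, 4} and D2 = {1, 3, 5}; nothing else is assumed.

```python
D1 = {2, 4}
D2 = {1, 3, 5}

# R: pairs whose second element is 1.
R = {(x, y) for x in D1 for y in D2 if y == 1}
# S: pairs whose first element is twice the second.
S = {(x, y) for x in D1 for y in D2 if x == 2 * y}

# Both are relations because both are subsets of the Cartesian product.
full_product = {(x, y) for x in D1 for y in D2}
print(R, S, R <= full_product, S <= full_product)
```

With these particular sets, only the pair (2, 1) satisfies the second condition.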
Original Slides by T. Connolly
Mathematical Definition of Relation
13
Consider three sets $D_1$, $D_2$, $D_3$ with Cartesian product $D_1 \times D_2 \times D_3$; e.g.
$D_1 = \{1, 3\}$, $D_2 = \{2, 4\}$, $D_3 = \{5, 6\}$
$D_1 \times D_2 \times D_3 = \{(1,2,5), (1,2,6), (1,4,5), (1,4,6), (3,2,5), (3,2,6), (3,4,5), (3,4,6)\}$
Any subset of these ordered triples is a relation.
Original Slides by T. Connolly
Mathematical Definition of Relation
14
Cartesian product of n sets ($D_1, D_2, \ldots, D_n$) is:
$D_1 \times D_2 \times \cdots \times D_n = \{(d_1, d_2, \ldots, d_n) \mid d_1 \in D_1, d_2 \in D_2, \ldots, d_n \in D_n\}$
usually written as: $\mathop{\times}_{i=1}^{n} D_i$
Any set of n-tuples from this Cartesian product is a relation on the n sets.
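The n-set definition can be illustrated by passing any number of domains to `itertools.product`. The sketch reuses the three example sets from the previous slide; the relation chosen at the end is an arbitrary example.

```python
from itertools import product

# Any number of domains can be listed here; these are D1, D2, D3 from the example.
domains = [{1, 3}, {2, 4}, {5, 6}]

# Unpacking the list gives the n-ary Cartesian product D1 x D2 x ... x Dn.
tuples = set(product(*domains))

# Any set of n-tuples drawn from the product is a relation on the n sets.
relation = {(1, 2, 5), (3, 4, 6)}
print(len(tuples), relation <= tuples)
```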
Original Slides by T. Connolly
Database Relations
15
Relation schema: named relation defined by a set of attribute and domain name pairs.
Relational database schema: set of relation schemas, each with a distinct name.
Original Slides by T. Connolly
Properties of Relations
16
Relation name is distinct from all other relation names in relational schema.
Each cell of relation contains exactly one atomic (single) value.
Each attribute has a distinct name.
Values of an attribute are all from the same domain.
Original Slides by T. Connolly
Properties of Relations
17
Each tuple is distinct; there are no duplicate tuples.
Order of attributes has no significance.
Order of tuples has no significance, theoretically.
Original Slides by T. Connolly
Relational Keys
18
Superkey: an attribute, or set of attributes, that uniquely identifies a tuple within a relation.
Candidate Key: superkey (K) such that no proper subset is a superkey within the relation.
In each tuple of R, values of K uniquely identify that tuple (uniqueness).
No proper subset of K has the uniqueness property (irreducibility).
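The uniqueness and irreducibility properties can be tested against a given set of rows, as in the sketch below. The function names and sample rows are hypothetical. A key is a property of the relation's meaning, not of one instance, so a check like this can only rule candidates out; it cannot prove that a key holds for all time.

```python
from itertools import combinations

def is_superkey(rows, attrs):
    """Uniqueness: no two rows agree on every attribute in attrs."""
    seen = set()
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key in seen:
            return False
        seen.add(key)
    return True

def is_candidate_key(rows, attrs):
    """Irreducibility: a superkey with no proper (non-empty) subset that is a superkey."""
    if not is_superkey(rows, attrs):
        return False
    return not any(
        is_superkey(rows, subset)
        for size in range(1, len(attrs))
        for subset in combinations(attrs, size)
    )

# Hypothetical instance of a Staff relation.
staff = [
    {"staffNo": "S1", "sName": "Ann", "branchNo": "B1"},
    {"staffNo": "S2", "sName": "Bob", "branchNo": "B1"},
    {"staffNo": "S3", "sName": "Ann", "branchNo": "B2"},
]
print(is_superkey(staff, ("staffNo", "sName")))       # unique, but reducible
print(is_candidate_key(staff, ("staffNo", "sName")))
print(is_candidate_key(staff, ("staffNo",)))
```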
Original Slides by T. Connolly
Relational Keys
19
Primary Key: candidate key selected to identify tuples uniquely within relation.
Alternate Keys: candidate keys that are not selected to be primary key.
Foreign Key: attribute, or set of attributes, within one relation that matches candidate key of some (possibly same) relation.
Original Slides by T. Connolly
Integrity Constraints
20
Null: represents a value for an attribute that is currently unknown or not applicable for the tuple.
Deals with incomplete or exceptional data.
Represents the absence of a value and is not the same as zero or spaces, which are values.
Original Slides by T. Connolly
Integrity Constraints
21
Entity Integrity: in a base relation, no attribute of a primary key can be null.
Referential Integrity: if a foreign key exists in a relation, either the foreign key value must match a candidate key value of some tuple in its home relation or the foreign key value must be wholly null.
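Both rules can be observed in a small SQLite session driven from Python. This is a sketch under stated assumptions: the two-table schema and the sample values are hypothetical, `NOT NULL` is written explicitly on the primary keys, and foreign-key enforcement is switched on with `PRAGMA foreign_keys = ON` because SQLite leaves it off by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE Branch (branchNo TEXT PRIMARY KEY NOT NULL)")
conn.execute(
    """CREATE TABLE Staff (
           staffNo  TEXT PRIMARY KEY NOT NULL,
           branchNo TEXT REFERENCES Branch(branchNo)
       )"""
)
conn.execute("INSERT INTO Branch VALUES ('B005')")
conn.execute("INSERT INTO Staff VALUES ('SL21', 'B005')")  # matches a Branch tuple
conn.execute("INSERT INTO Staff VALUES ('SG37', NULL)")    # wholly null foreign key is allowed

def rejected(sql):
    """Return True if the statement violates an integrity constraint."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False

# Entity integrity: a null primary key is refused.
null_pk_rejected = rejected("INSERT INTO Staff VALUES (NULL, 'B005')")
# Referential integrity: a foreign key with no matching Branch tuple is refused.
dangling_fk_rejected = rejected("INSERT INTO Staff VALUES ('SA9', 'B999')")
staff_count = conn.execute("SELECT COUNT(*) FROM Staff").fetchone()[0]
print(null_pk_rejected, dangling_fk_rejected, staff_count)
```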
Original Slides by T. Connolly
Integrity Constraints
22
General Constraints: additional rules specified by users or database administrators that define or constrain some aspect of the enterprise.
Original Slides by T. Connolly
Purpose of Normalization
23
Normalization is a technique for producing a set of suitable relations that support the data requirements of an enterprise.
Original Slides by T. Connolly
Purpose of Normalization
24
Characteristics of a suitable set of relations include:
the minimal number of attributes necessary to support the data requirements of the enterprise;
attributes with a close logical relationship are found in the same relation;
minimal redundancy, with each attribute represented only once, with the important exception of attributes that form all or part of foreign keys.
Original Slides by T. Connolly
Purpose of Normalization
25
The benefits of using a database that has a suitable set of relations are that the database will:
be easier for the user to access and maintain;
take up minimal storage space on the computer.
Original Slides by T. Connolly
How Normalization Supports Database Design
26
How normalization can be used to support database design.
Original Slides by T. Connolly
Data Redundancy and Update Anomalies
27
Major aim of relational database design is to group attributes into relations to minimize data redundancy.
Original Slides by T. Connolly
Data Redundancy and Update Anomalies
28
Potential benefits for the implemented database include:
Updates to the data stored in the database are achieved with a minimal number of operations, thus reducing the opportunities for data inconsistencies.
Reduction in the file storage space required by the base relations, thus minimizing costs.
Original Slides by T. Connolly
Data Redundancy and Update Anomalies
29
Problems associated with data redundancy are illustrated by comparing the Staff and Branch relations with the StaffBranch relation.
Original Slides by T. Connolly
Data Redundancy and Update Anomalies
30 Original Slides by T. Connolly
Data Redundancy and Update Anomalies
31
StaffBranch relation has redundant data; the details of a branch are repeated for every member of staff.
In contrast, the branch information appears only once for each branch in the Branch relation and only the branch number (branchNo) is repeated in the Staff relation, to represent where each member of staff is located.
Original Slides by T. Connolly
Data Redundancy and Update Anomalies
32
Relations that contain redundant information may potentially suffer from update anomalies.
Types of update anomalies include:
Insertion
Deletion
Modification
Original Slides by T. Connolly
Lossless-join and Dependency Preservation Properties
33
Two important properties of decomposition:
Lossless-join property enables us to find any instance of the original relation from corresponding instances in the smaller relations.
Dependency preservation property enables us to enforce a constraint on the original relation by enforcing some constraint on each of the smaller relations.
Original Slides by T. Connolly
Functional Dependencies
34
Important concept associated with normalization.
Functional dependency describes relationship between attributes.
For example, if A and B are attributes of relation R, B is functionally dependent on A (denoted A → B), if each value of A in R is associated with exactly one value of B in R.
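The definition can be turned into a small check over a set of rows. The function name `fd_holds` and the sample rows are hypothetical. The check only shows whether a dependency is violated by the rows given, not that it holds for every possible state of R.

```python
def fd_holds(rows, lhs, rhs):
    """Check lhs -> rhs: rows that agree on lhs must also agree on rhs."""
    seen = {}
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[a] for a in rhs)
        # setdefault records the first rhs value seen for this lhs value.
        if seen.setdefault(left, right) != right:
            return False
    return True

# Hypothetical rows of a relation R with attributes A and B.
R = [
    {"A": 1, "B": "x"},
    {"A": 2, "B": "x"},
    {"A": 1, "B": "x"},
]
print(fd_holds(R, ["A"], ["B"]))  # each A value has exactly one B value
print(fd_holds(R, ["B"], ["A"]))  # B = "x" is associated with two A values
```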
Original Slides by T. Connolly
Characteristics of Functional Dependencies
35
Property of the meaning or semantics of the attributes in a relation.
Diagrammatic representation.
The determinant of a functional dependency refers to the attribute or group of attributes on the left-hand side of the arrow.
Original Slides by T. Connolly
An Example Functional Dependency
36 Original Slides by T. Connolly
Example Functional Dependency that holds for all Time
37
Consider the values shown in staffNo and sName attributes of the Staff relation (see Slide 12).
Based on sample data, the following functional dependencies appear to hold.
staffNo → sName
sName → staffNo
Original Slides by T. Connolly
Example Functional Dependency that holds for all Time
38
However, the only functional dependency that remains true for all possible values for the staffNo and sName attributes of the Staff relation is:
staffNo → sName
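The difference between a dependency that merely appears in sample data and one that holds for all time can be illustrated as follows. The staff rows are hypothetical: both dependencies hold on the first two rows, and adding a second member of staff with the same name breaks sName → staffNo only.

```python
def fd_holds(rows, lhs, rhs):
    """Check a single-attribute dependency lhs -> rhs on the given rows."""
    mapping = {}
    for row in rows:
        if mapping.setdefault(row[lhs], row[rhs]) != row[rhs]:
            return False
    return True

# Hypothetical sample data.
staff = [
    {"staffNo": "SL21", "sName": "John White"},
    {"staffNo": "SG37", "sName": "Ann Beech"},
]
before = (fd_holds(staff, "staffNo", "sName"), fd_holds(staff, "sName", "staffNo"))

# A second, hypothetical member of staff who happens to share a name.
staff.append({"staffNo": "SA99", "sName": "John White"})
after = (fd_holds(staff, "staffNo", "sName"), fd_holds(staff, "sName", "staffNo"))
print(before, after)
```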
Original Slides by T. Connolly
Characteristics of Functional Dependencies
39
Determinants should have the minimal number of attributes necessary to maintain the functional dependency with the attribute(s) on the right-hand side.
This requirement is called full functional dependency.
Original Slides by T. Connolly
Characteristics of Functional Dependencies
40
Full functional dependency indicates that if A and B are attributes of a relation, B is fully functionally dependent on A, if B is functionally dependent on A, but not on any proper subset of A.
Original Slides by T. Connolly
Example Full Functional Dependency
41
Exists in the Staff relation (see Slide 12).
staffNo, sName → branchNo
True - each value of (staffNo, sName) is associated with a single value of branchNo.
However, branchNo is also functionally dependent on a subset of (staffNo, sName), namely staffNo. Example above is a partial dependency.
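A sketch of the full-versus-partial test: a dependency is full when it holds and no proper subset of the determinant also determines the right-hand side. The function names and the Staff rows are hypothetical, and the result only reflects the rows supplied.

```python
from itertools import combinations

def fd_holds(rows, lhs, rhs):
    """Check lhs -> rhs: rows that agree on lhs must also agree on rhs."""
    seen = {}
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[a] for a in rhs)
        if seen.setdefault(left, right) != right:
            return False
    return True

def is_full_dependency(rows, lhs, rhs):
    """lhs -> rhs holds and no proper, non-empty subset of lhs determines rhs."""
    if not fd_holds(rows, lhs, rhs):
        return False
    return not any(
        fd_holds(rows, subset, rhs)
        for size in range(1, len(lhs))
        for subset in combinations(lhs, size)
    )

# Hypothetical rows of the Staff relation.
staff = [
    {"staffNo": "SL21", "sName": "John White", "branchNo": "B005"},
    {"staffNo": "SG37", "sName": "Ann Beech", "branchNo": "B003"},
    {"staffNo": "SG14", "sName": "David Ford", "branchNo": "B003"},
]
partial = is_full_dependency(staff, ("staffNo", "sName"), ("branchNo",))
full = is_full_dependency(staff, ("staffNo",), ("branchNo",))
print(partial, full)
```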
Original Slides by T. Connolly
Characteristics of Functional Dependencies
42
Main characteristics of functional dependencies used in normalization:
There is a one-to-one relationship between the attribute(s) on the left-hand side (determinant) and those on the right-hand side of a functional dependency.
Holds for all time.
The determinant has the minimal number of attributes necessary to maintain the dependency with the attribute(s) on the right-hand side.
Original Slides by T. Connolly
Transitive Dependencies
43
Important to recognize a transitive dependency because its existence in a relation can potentially cause update anomalies.
Transitive dependency describes a condition where A, B, and C are attributes of a relation such that if A → B and B → C, then C is transitively dependent on A via B (provided that A is not functionally dependent on B or C).
Original Slides by T. Connolly
Example Transitive Dependency
44
Consider functional dependencies in the StaffBranch relation (see Slide 12).
staffNo → sName, position, salary, branchNo, bAddress
branchNo → bAddress
Transitive dependency: bAddress is transitively dependent on staffNo via branchNo (staffNo → branchNo and branchNo → bAddress).
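The condition for a transitive dependency can be checked mechanically from a list of functional dependencies. The sketch below uses the two StaffBranch dependencies from this slide; the helper names `closure` and `transitively_dependent` are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def transitively_dependent(a, b, c, fds):
    """C depends on A via B: A -> B and B -> C, and A is not determined by B or C."""
    return (
        b in closure({a}, fds)
        and c in closure({b}, fds)
        and a not in closure({b}, fds)
        and a not in closure({c}, fds)
    )

fds = [
    (("staffNo",), ("sName", "position", "salary", "branchNo", "bAddress")),
    (("branchNo",), ("bAddress",)),
]
via_branch = transitively_dependent("staffNo", "branchNo", "bAddress", fds)
via_name = transitively_dependent("staffNo", "sName", "bAddress", fds)
print(via_branch, via_name)
```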
Original Slides by T. Connolly
The Process of Normalization
45
Formal technique for analyzing a relation based on its primary key and the functional dependencies between the attributes of that relation.
Often executed as a series of steps. Each step corresponds to a specific normal form, which has known properties.
Original Slides by T. Connolly
Identifying Functional Dependencies
46
Identifying all functional dependencies between a set of attributes is relatively simple if the meaning of each attribute and the relationships between the attributes are well understood.
This information should be provided by the enterprise in the form of discussions with users and/or documentation such as the users’ requirements specification.
Original Slides by T. Connolly
Identifying Functional Dependencies
47
However, if the users are unavailable for consultation and/or the documentation is incomplete then depending on the database application it may be necessary for the database designer to use their common sense and/or experience to provide the missing information.
Original Slides by T. Connolly
Example - Identifying a set of functional dependencies for the StaffBranch relation
48
Examine semantics of attributes in StaffBranch relation (see Slide 12). Assume that position held and branch determine a member of staff’s salary.
Original Slides by T. Connolly
Example - Identifying a set of functional dependencies for the StaffBranch relation
49
With sufficient information available, identify the functional dependencies for the StaffBranch relation as:
staffNo → sName, position, salary, branchNo, bAddress
branchNo → bAddress
bAddress → branchNo
branchNo, position → salary
bAddress, position → salary
Original Slides by T. Connolly
Example - Using sample data to identify functional dependencies.
50
Consider the data for attributes denoted A, B, C, D, and E in the Sample relation (see Slide 33).
Important to establish that sample data values shown in relation are representative of all possible values that can be held by attributes A, B, C, D, and E. Assume true despite the relatively small amount of data shown in this relation.
Original Slides by T. Connolly
Example - Using sample data to identify functional dependencies.
51 Original Slides by T. Connolly
Example - Using sample data to identify functional dependencies.
52
Functional dependencies between attributes A to E in the Sample relation.
A → C (fd1)
C → A (fd2)
B → D (fd3)
A, B → E (fd4)
Original Slides by T. Connolly
Identifying the Primary Key for a Relation using Functional Dependencies
53
Main purpose of identifying a set of functional dependencies for a relation is to specify the set of integrity constraints that must hold on a relation.
An important integrity constraint to consider first is the identification of candidate keys, one of which is selected to be the primary key for the relation.
Original Slides by T. Connolly
Example - Identify Primary Key for StaffBranch Relation
54
StaffBranch relation has five functional dependencies (see Slide 31).
The determinants are staffNo, branchNo, bAddress, (branchNo, position), and (bAddress, position).
To identify all candidate key(s), identify the attribute (or group of attributes) that uniquely identifies each tuple in this relation.
Original Slides by T. Connolly
Example - Identifying Primary Key for StaffBranch Relation
55
All attributes that are not part of a candidate key should be functionally dependent on the key.
The only candidate key and therefore primary key for StaffBranch relation, is staffNo, as all other attributes of the relation are functionally dependent on staffNo.
Original Slides by T. Connolly
Example - Identifying Primary Key for Sample Relation
56
Sample relation has four functional dependencies (see Slide 31).
The determinants in the Sample relation are A, B, C, and (A, B). However, the only determinant that functionally determines all the other attributes of the relation is (A, B).
(A, B) is identified as the primary key for this relation.
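The claim that only (A, B) determines every attribute can be checked by computing the attribute closure of each determinant under fd1 to fd4. The `closure` helper is a hypothetical name, and attribute sets are written as strings of single letters for brevity.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

# fd1 to fd4 of the Sample relation.
fds = [("A", "C"), ("C", "A"), ("B", "D"), ("AB", "E")]
all_attrs = set("ABCDE")
determinants = ["A", "B", "C", "AB"]

closures = {d: closure(d, fds) for d in determinants}
keys = [d for d in determinants if closures[d] == all_attrs]
print(closures)
print(keys)
```

The check is limited to the four determinants listed on the slide; other attribute combinations are not examined.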
Original Slides by T. Connolly
The Process of Normalization
57
As normalization proceeds, the relations become progressively more restricted (stronger) in format and also less vulnerable to update anomalies.
Original Slides by T. Connolly
The Process of Normalization
58 Original Slides by T. Connolly
The Process of Normalization
59
Unnormalized Form (UNF)
60
A table that contains one or more repeating groups.
To create an unnormalized table:
Transform the data from the information source (e.g. form) into table format with columns and rows.
Original Slides by T. Connolly
First Normal Form (1NF)
61
A relation in which the intersection of each row and column contains one and only one value.
Original Slides by T. Connolly
UNF to 1NF
62
Nominate an attribute or group of attributes to act as the key for the unnormalized table.
Identify the repeating group(s) in the unnormalized table which repeats for the key attribute(s).
Original Slides by T. Connolly
UNF to 1NF
63
Remove the repeating group by:
Entering appropriate data into the empty columns of rows containing the repeating data (‘flattening’ the table).
Or by
Placing the repeating data along with a copy of the original key attribute(s) into a separate relation.
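The two ways of removing a repeating group can be sketched on a small unnormalized table held as nested Python data. The clients, properties, and rents below are hypothetical sample values, and only a few attributes are shown.

```python
# Unnormalized: each client row carries a repeating group of rentals.
unnormalized = [
    {"clientNo": "CR76", "cName": "John Kay",
     "rentals": [{"propertyNo": "PG4", "rent": 350},
                 {"propertyNo": "PG16", "rent": 450}]},
    {"clientNo": "CR56", "cName": "Aline Stewart",
     "rentals": [{"propertyNo": "PG4", "rent": 350}]},
]

# Option 1: flatten the table, repeating the client data on every row.
flat = [
    {"clientNo": c["clientNo"], "cName": c["cName"], **r}
    for c in unnormalized
    for r in c["rentals"]
]

# Option 2: move the repeating group to its own relation with a copy of the key.
clients = [{"clientNo": c["clientNo"], "cName": c["cName"]} for c in unnormalized]
rentals = [
    {"clientNo": c["clientNo"], **r}
    for c in unnormalized
    for r in c["rentals"]
]
print(len(flat), len(clients), len(rentals))
```

Either way, every cell now holds a single value; the first option keeps one relation with redundant client names, while the second splits the data into two relations linked by clientNo.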
Original Slides by T. Connolly
Second Normal Form (2NF)
64
Based on the concept of full functional dependency.
Full functional dependency indicates that if A and B are attributes of a relation, B is fully dependent on A if B is functionally dependent
on A but not on any proper subset of A.
Original Slides by T. Connolly
Second Normal Form (2NF)
65
A relation that is in 1NF and in which every non-primary-key attribute is fully functionally dependent on the primary key.
Original Slides by T. Connolly
1NF to 2NF
66
Identify the primary key for the 1NF relation.
Identify the functional dependencies in the relation.
If partial dependencies exist on the primary key, remove them by placing them in a new relation along with a copy of their determinant.
Original Slides by T. Connolly
Third Normal Form (3NF)
67
Based on the concept of transitive dependency.
Transitive Dependency is a condition where A, B and C are attributes of a relation such that if A → B and B → C, then C is transitively dependent on A through B.
(Provided that A is not functionally dependent on B or C).
Original Slides by T. Connolly
Third Normal Form (3NF)
68
A relation that is in 1NF and 2NF and in which no non-primary-key attribute is transitively dependent on the primary key.
Original Slides by T. Connolly
2NF to 3NF
69
Identify the primary key in the 2NF relation.
Identify functional dependencies in the relation.
If transitive dependencies exist on the primary key, remove them by placing them in a new relation along with a copy of their determinant.
Original Slides by T. Connolly
General Definitions of 2NF and 3NF
70
Second normal form (2NF): a relation that is in first normal form and in which every non-primary-key attribute is fully functionally dependent on any candidate key.
Third normal form (3NF): a relation that is in first and second normal form and in which no non-primary-key attribute is transitively dependent on any candidate key.
Original Slides by T. Connolly
Example of Normalization
71 Original Slides by T. Connolly
Unnormalized table
72
Figure annotations: More than one value; Primary key
Original Slides by T. Connolly
Repeating group
73
(propertyNo, pAddress, rentStart, rentFinish, rent, ownerNo, oName)
Original Slides by T. Connolly
First normal form (1NF)
74
ClientRental (clientNo, propertyNo, cName, pAddress, rentStart, rentFinish, rent, ownerNo, oName)
Original Slides by T. Connolly
Alternative First normal form (1NF) (optional)
75
ClientRental (clientNo, cName)
PropertyRentalOwner (clientNo, propertyNo, pAddress, rentStart, rentFinish, rent, ownerNo, oName)
Original Slides by T. Connolly
Functional dependencies of the ClientRental relation
76 Original Slides by T. Connolly
Second normal form (2NF)
77
Client (clientNo, cName)
Rental (clientNo, propertyNo, rentStart, rentFinish)
PropertyOwner (propertyNo, pAddress, rent, ownerNo, oName)
Original Slides by T. Connolly
Functional dependencies for the Client, Rental, and PropertyOwner relations
78 Original Slides by T. Connolly
Third normal form (3NF)
79
Client (clientNo, cName)
Rental (clientNo, propertyNo, rentStart, rentFinish)
PropertyOwner (propertyNo, pAddress, rent, ownerNo)
Owner (ownerNo, oName)
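A sketch of this decomposition on sample data: the 1NF ClientRental rows are projected onto the four 3NF relations and then rejoined, to check that no information is lost for these rows. The tuples are hypothetical, and `project`, `natural_join`, and `as_set` are illustrative helpers rather than part of any library.

```python
COLUMNS = ["clientNo", "propertyNo", "cName", "pAddress", "rentStart",
           "rentFinish", "rent", "ownerNo", "oName"]
# Hypothetical sample tuples for the 1NF ClientRental relation.
DATA = [
    ("CR76", "PG4", "John Kay", "6 Lawrence St", "2000-07-01", "2001-08-31", 350, "CO40", "Tina Murphy"),
    ("CR76", "PG16", "John Kay", "5 Novar Dr", "2001-09-01", "2002-09-01", 450, "CO93", "Tony Shaw"),
    ("CR56", "PG4", "Aline Stewart", "6 Lawrence St", "1999-09-01", "2000-06-10", 350, "CO40", "Tina Murphy"),
]
client_rental = [dict(zip(COLUMNS, t)) for t in DATA]

def project(rows, attrs):
    """Projection with duplicate elimination, keeping first-seen order."""
    seen, out = set(), []
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out

def natural_join(left, right):
    """Combine rows that agree on every attribute the two relations share."""
    if not left or not right:
        return []
    shared = set(left[0]) & set(right[0])
    return [{**l, **r} for l in left for r in right
            if all(l[a] == r[a] for a in shared)]

def as_set(rows):
    """Order-independent form of a relation, for comparison."""
    return {tuple(sorted(r.items())) for r in rows}

client = project(client_rental, ["clientNo", "cName"])
rental = project(client_rental, ["clientNo", "propertyNo", "rentStart", "rentFinish"])
property_owner = project(client_rental, ["propertyNo", "pAddress", "rent", "ownerNo"])
owner = project(client_rental, ["ownerNo", "oName"])

rejoined = natural_join(natural_join(natural_join(rental, client), property_owner), owner)
lossless = as_set(rejoined) == as_set(client_rental)
print(len(client), len(rental), len(property_owner), len(owner), lossless)
```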
Original Slides by T. Connolly
The decomposition of the ClientRental 1NF relation into 3NF relations (optional)
80 Original Slides by T. Connolly
Chapter 4
Introduction to Database system
Chapter 4 - Objectives
Overview of Physical Storage Media
Magnetic Disks
RAID
Tertiary Storage
Storage Access
File Organization
Organization of Records in Files
Data-Dictionary Storage
Storage Structures for Object-Oriented Databases
Original Slides by Avi Silberschatz 2
Classification of Physical Storage Media
Speed with which data can be accessed
Cost per unit of data
Reliability
    data loss on power failure or system crash
    physical failure of the storage device
Can differentiate storage into:
    volatile storage: loses contents when power is switched off
    non-volatile storage: contents persist even when power is switched off. Includes secondary and tertiary storage, as well as battery-backed-up main memory.
Original Slides by Avi Silberschatz 3
Physical Storage Media
Cache – fastest and most costly form of storage; volatile; managed by the computer system hardware.
Main memory:
    fast access (10s to 100s of nanoseconds; 1 nanosecond = $10^{-9}$ seconds)
    generally too small (or too expensive) to store the entire database
        capacities of up to a few Gigabytes widely used currently
        capacities have gone up and per-byte costs have decreased steadily and rapidly (roughly factor of 2 every 2 to 3 years)
    volatile: contents of main memory are usually lost if a power failure or system crash occurs.
Original Slides by Avi Silberschatz 4
Physical Storage Media (Cont.)
Flash memory
    Data survives power failure
    Data can be written at a location only once, but location can be erased and written to again
        Can support only a limited number of write/erase cycles.
        Erasing of memory has to be done to an entire bank of memory
    Reads are roughly as fast as main memory
    But writes are slow (few microseconds), erase is slower
    Cost per unit of storage roughly similar to main memory
    Widely used in embedded devices such as digital cameras
    Also known as EEPROM (Electrically Erasable Programmable Read-Only Memory)
Original Slides by Avi Silberschatz 5
Physical Storage Media (Cont.)
Magnetic-disk
    Data is stored on spinning disk, and read/written magnetically
    Primary medium for the long-term storage of data; typically stores entire database.
    Data must be moved from disk to main memory for access, and written back for storage
        Much slower access than main memory (more on this later)
    direct-access – possible to read data on disk in any order, unlike magnetic tape
    Hard disks vs floppy disks
    Capacities range up to roughly 100 GB currently
        Much larger capacity and lower cost/byte than main memory/flash memory
        Growing constantly and rapidly with technology improvements (factor of 2 to 3 every 2 years)
    Survives power failures and system crashes
        disk failure can destroy data, but is very rare
Original Slides by Avi Silberschatz 6
Physical Storage Media (Cont.)
Optical storage
    non-volatile, data is read optically from a spinning disk using a laser
    CD-ROM (640 MB) and DVD (4.7 to 17 GB) most popular forms
    Write-once, read-many (WORM) optical disks used for archival storage (CD-R and DVD-R)
    Multiple write versions also available (CD-RW, DVD-RW, and DVD-RAM)
    Reads and writes are slower than with magnetic disk
    Juke-box systems, with large numbers of removable disks, a few drives, and a mechanism for automatic loading/unloading of disks available for storing large volumes of data
Original Slides by Avi Silberschatz 7
Physical Storage Media (Cont.)
Tape storage
    non-volatile, used primarily for backup (to recover from disk failure), and for archival data
    sequential-access – much slower than disk
    very high capacity (40 to 300 GB tapes available)
    tape can be removed from drive; storage costs much cheaper than disk, but drives are expensive
    Tape jukeboxes available for storing massive amounts of data
        hundreds of terabytes (1 terabyte = $10^{12}$ bytes) to even a petabyte (1 petabyte = $10^{15}$ bytes)
Original Slides by Avi Silberschatz 8
Storage Hierarchy
Original Slides by Avi Silberschatz 9
Storage Hierarchy (Cont.)
primary storage: fastest media but volatile (cache, main memory).
secondary storage: next level in hierarchy, non-volatile, moderately fast access time
    also called on-line storage
    E.g. flash memory, magnetic disks
tertiary storage: lowest level in hierarchy, non-volatile, slow access time
    also called off-line storage
    E.g. magnetic tape, optical storage
Original Slides by Avi Silberschatz 10
Magnetic Hard Disk Mechanism
NOTE: Diagram is schematic, and simplifies the structure of actual disk drives
Original Slides by Avi Silberschatz 11
Magnetic Disks (optional)
Read-write head
    Positioned very close to the platter surface (almost touching it)
    Reads or writes magnetically encoded information.
Surface of platter divided into circular tracks
    Over 16,000 tracks per platter on typical hard disks
Each track is divided into sectors.
    A sector is the smallest unit of data that can be read or written.
    Sector size typically 512 bytes
    Typical sectors per track: 200 (on inner tracks) to 400 (on outer tracks)
To read/write a sector
    disk arm swings to position head on right track
    platter spins continually; data is read/written as sector passes under head
Head-disk assemblies
    multiple disk platters on a single spindle (typically 2 to 4)
    one head per platter, mounted on a common arm.
Cylinder i consists of ith track of all the platters
Original Slides by Avi Silberschatz 12
Magnetic Disks (Cont.) (optional)
Earlier generation disks were susceptible to head-crashes
    Surface of earlier generation disks had metal-oxide coatings which would disintegrate on head crash and damage all data on disk
    Current generation disks are less susceptible to such disastrous failures, although individual sectors may get corrupted
Disk controller – interfaces between the computer system and the disk drive hardware.
    accepts high-level commands to read or write a sector
    initiates actions such as moving the disk arm to the right track and actually reading or writing the data
    Computes and attaches checksums to each sector to verify that data is read back correctly
        If data is corrupted, with very high probability stored checksum won’t match recomputed checksum
    Ensures successful writing by reading back sector after writing it
    Performs remapping of bad sectors
Original Slides by Avi Silberschatz 13
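The checksum idea can be sketched in a few lines. The slides do not say which checksum a controller uses, so CRC-32 from Python's zlib module is an assumption here, and `write_sector` / `read_sector` are hypothetical names standing in for the controller's behaviour.

```python
import zlib

def write_sector(data: bytes) -> bytes:
    """Attach a 4-byte CRC-32 checksum to the sector payload."""
    return data + zlib.crc32(data).to_bytes(4, "big")

def read_sector(stored: bytes) -> bytes:
    """Return the payload, or raise if stored and recomputed checksums differ."""
    data, stored_sum = stored[:-4], int.from_bytes(stored[-4:], "big")
    if zlib.crc32(data) != stored_sum:
        raise ValueError("checksum mismatch: sector is corrupted")
    return data

payload = b"\x00" * 512                 # one 512-byte sector
sector = write_sector(payload)
read_back_ok = read_sector(sector) == payload

# Simulate corruption by flipping a single bit in the stored sector.
corrupted = bytes([sector[0] ^ 0x01]) + sector[1:]
try:
    read_sector(corrupted)
    corruption_detected = False
except ValueError:
    corruption_detected = True
print(read_back_ok, corruption_detected)
```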
Disk Subsystem
Multiple disks connected to a computer system through a controller
    Controller functionality (checksum, bad sector remapping) often carried out by individual disks; reduces load on controller
Disk interface standards families
    ATA (AT Attachment) range of standards
    SCSI (Small Computer System Interface) range of standards
    Several variants of each standard (different speeds and capabilities)
Original Slides by Avi Silberschatz 14
Performance Measures of Disks
Access time – the time it takes from when a read or write request is issued to when data transfer begins. Consists of:
Seek time – time it takes to reposition the arm over the correct track.
Average seek time is 1/2 the worst case seek time. Would be 1/3 if all tracks had the same number of sectors, and we ignore the time to start and stop arm movement
4 to 10 milliseconds on typical disks
Rotational latency – time it takes for the sector to be accessed to appear under the head.
Average latency is 1/2 of the worst case latency.
4 to 11 milliseconds on typical disks (5400 to 15000 r.p.m.)
Data-transfer rate – the rate at which data can be retrieved from or stored to the disk.
4 to 8 MB per second is typical
Multiple disks may share a controller, so rate that controller can handle is also important
E.g. ATA-5: 66 MB/second, SCSI-3: 40 MB/s, Fiber Channel: 256 MB/s
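The rotational latency figures above follow directly from the spindle speed. A minimal sketch (the function names are illustrative, not from the slides) shows that the quoted 4 to 11 ms range matches the time for one full rotation at 15000 and 5400 r.p.m., with the average latency being half of that:

```python
def rotation_time_ms(rpm: float) -> float:
    """Time for one full rotation, i.e. the worst-case rotational latency."""
    return 60_000.0 / rpm  # 60,000 ms per minute divided by rotations per minute


def average_latency_ms(rpm: float) -> float:
    """Average rotational latency: half of the worst case."""
    return rotation_time_ms(rpm) / 2.0


for rpm in (5400, 15000):
    print(rpm, round(rotation_time_ms(rpm), 1), round(average_latency_ms(rpm), 1))
```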
Original Slides by Avi Silberschatz15
Performance Measures (Cont.) Mean time to failure (MTTF) – the average time the disk
is expected to run continuously without any failure. Typically 3 to 5 years Probability of failure of new disks is quite low, corresponding to a
“theoretical MTTF” of 30,000 to 1,200,000 hours for a new disk E.g., an MTTF of 1,200,000 hours for a new disk means that given 1000
relatively new disks, on an average one will fail every 1200 hours
MTTF decreases as disk ages
Original Slides by Avi Silberschatz16
RAID RAID: Redundant Arrays of Independent Disks
disk organization techniques that manage a large number of disks, providing a view of a single disk of
high capacity and high speed by using multiple disks in parallel, and
high reliability by storing data redundantly, so that data can be recovered even if a disk fails
The chance that some disk out of a set of N disks will fail is much higher than the chance that a specific single disk will fail. E.g., a system with 100 disks, each with MTTF of 100,000 hours (approx. 11 years), will
have a system MTTF of 1000 hours (approx. 41 days)
Techniques for using redundancy to avoid data loss are critical with large numbers of disks
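The system MTTF figure above can be checked with a short calculation. This is a sketch that assumes disks fail independently at a constant rate, so the expected time to the first failure among N disks is the single-disk MTTF divided by N; the function name is illustrative.

```python
def array_mttf_hours(disk_mttf_hours: float, n_disks: int) -> float:
    """Expected time until some disk in the set fails (independent failures assumed)."""
    return disk_mttf_hours / n_disks


system_mttf = array_mttf_hours(100_000, 100)
print(system_mttf, "hours, about", round(system_mttf / 24), "days")

# The same rule gives the earlier figure of one failure every 1200 hours
# for 1000 disks with an MTTF of 1,200,000 hours each.
print(array_mttf_hours(1_200_000, 1000), "hours")
```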
Originally a cost-effective alternative to large, expensive disks
I in RAID originally stood for “inexpensive”
Today RAIDs are used for their higher reliability and bandwidth.
The “I” is interpreted as independent
Original Slides by Avi Silberschatz17
Improvement of Reliability via Redundancy Redundancy – store extra information that can be used to rebuild
information lost in a disk failure E.g., Mirroring (or shadowing)
Duplicate every disk. Logical disk consists of two physical disks. Every write is carried out on both disks
Reads can take place from either disk If one disk in a pair fails, data still available in the other
Data loss would occur only if a disk fails, and its mirror disk also fails before the system is repaired Probability of combined event is very small
Except for dependent failure modes such as fire or building collapse or electrical power surges
Mean time to data loss depends on mean time to failure, and mean time to repair
E.g. MTTF of 100,000 hours, mean time to repair of 10 hours gives mean time to data loss of 500 × 10^6 hours (or 57,000 years) for a mirrored pair of disks (ignoring dependent failure modes)
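The mirrored-pair figure above is consistent with the usual approximation MTTF^2 / (2 × MTTR), which assumes independent failures. A minimal sketch of that arithmetic (names are illustrative):

```python
HOURS_PER_YEAR = 24 * 365


def mean_time_to_data_loss_hours(mttf_hours: float, mttr_hours: float) -> float:
    """Approximate mean time to data loss for a mirrored pair of disks.

    Data is lost only if the second disk fails while the first is being
    repaired; dependent failure modes are ignored.
    """
    return mttf_hours ** 2 / (2 * mttr_hours)


mttdl = mean_time_to_data_loss_hours(100_000, 10)
print(mttdl, "hours, about", round(mttdl / HOURS_PER_YEAR), "years")
```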
Original Slides by Avi Silberschatz18
Improvement in Performance via Parallelism
Two main goals of parallelism in a disk system:
1. Load balance multiple small accesses to increase throughput
2. Parallelize large accesses to reduce response time.
Improve transfer rate by striping data across multiple disks. Bit-level striping – split the bits of each byte across multiple disks
In an array of eight disks, write bit i of each byte to disk i. Each access can read data at eight times the rate of a single disk. But seek/access time worse than for a single disk
Bit level striping is not used much any more
Block-level striping – with n disks, block i of a file goes to disk (i mod n) + 1
Requests for different blocks can run in parallel if the blocks reside on different disks
A request for a long sequence of blocks can utilize all disks in parallel
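The block placement rule above can be sketched in a few lines. The function name is an assumption for illustration; disks are numbered from 1 as in the slide.

```python
def disk_for_block(i: int, n: int) -> int:
    """Disk (numbered 1..n) that holds block i under block-level striping."""
    return (i % n) + 1


# A request for eight consecutive blocks on four disks touches every disk twice,
# so all four disks can work on the request in parallel.
print([disk_for_block(i, 4) for i in range(8)])
```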
Original Slides by Avi Silberschatz19
RAID Levels
Schemes to provide redundancy at lower cost by using disk striping combined with parity bits
Different RAID organizations, or RAID levels, have differing cost, performance and reliability characteristics
RAID Level 0: Block striping; non-redundant.
Used in high-performance applications where data loss is not critical.
RAID Level 1: Mirrored disks with block striping
Offers best write performance. Popular for applications such as storing log files in a database system.
Original Slides by Avi Silberschatz20
RAID Levels (Cont.)
RAID Level 2: Memory-Style Error-Correcting-Codes (ECC) with bit striping.
RAID Level 3: Bit-Interleaved Parity a single parity bit is enough for error correction, not just detection, since we
know which disk has failed When writing data, corresponding parity bits must also be computed and written to a
parity bit disk
To recover data in a damaged disk, compute XOR of bits from other disks (including parity bit disk)
Original Slides by Avi Silberschatz21
RAID Levels (Cont.) RAID Level 3 (Cont.)
Faster data transfer than with a single disk, but fewer I/Os per second since every disk has to participate in every I/O.
Subsumes Level 2 (provides all its benefits, at lower cost).
RAID Level 4: Block-Interleaved Parity; uses block-level striping, and keeps a parity block on a separate disk for corresponding blocks from N other disks.
When writing data block, corresponding block of parity bits must also be computed and written to parity disk
To find value of a damaged block, compute XOR of bits from corresponding blocks (including parity block) from other disks.
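The XOR recovery described above can be illustrated with a small sketch. The block contents are made-up sample bytes and the helper name is hypothetical; the point is only that XOR-ing the surviving blocks with the parity block reproduces the lost block.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of a list of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks (sample values) and their parity block.
data = [b"\x0f\xf0", b"\x33\x55", b"\xaa\x01"]
parity = xor_blocks(data)

# Suppose the disk holding data[1] fails: XOR the remaining data blocks
# with the parity block to rebuild it.
recovered = xor_blocks([data[0], data[2], parity])
print(recovered == data[1])
```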
Original Slides by Avi Silberschatz22
RAID Levels (Cont.) RAID Level 4 (Cont.)
Provides higher I/O rates for independent block reads than Level 3 block read goes to a single disk, so blocks stored on different disks can be
read in parallel
Provides higher transfer rates for reads of multiple blocks than no-striping
Before writing a block, parity data must be computed
Can be done by using old parity block, old value of current block and new value of current block (2 block reads + 2 block writes)
Or by recomputing the parity value using the new values of blocks corresponding to the parity block
More efficient for writing large amounts of data sequentially
Parity block becomes a bottleneck for independent block writes since every block write also writes to parity disk
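The first of the two parity computations above (old parity, old block value, new block value) can be sketched as follows. The sample bytes and function names are illustrative; the check at the end confirms that the shortcut agrees with recomputing parity from all blocks.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equally sized blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def updated_parity(old_parity: bytes, old_block: bytes, new_block: bytes) -> bytes:
    """New parity after overwriting one block: 2 reads (old block, old parity),
    then 2 writes (new block, new parity)."""
    return xor_bytes(xor_bytes(old_parity, old_block), new_block)


blocks = [b"\x01\x02", b"\x10\x20", b"\x0a\x0b"]   # sample data blocks
old_parity = xor_bytes(xor_bytes(blocks[0], blocks[1]), blocks[2])

new_block = b"\xff\x00"                             # overwrite blocks[1]
shortcut = updated_parity(old_parity, blocks[1], new_block)
recomputed = xor_bytes(xor_bytes(blocks[0], new_block), blocks[2])
print(shortcut == recomputed)
```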
Original Slides by Avi Silberschatz23
RAID Levels (Cont.) RAID Level 5: Block-Interleaved Distributed Parity; partitions data and
parity among all N + 1 disks, rather than storing data in N disks and parity in 1 disk. E.g., with 5 disks, parity block for nth set of blocks is stored on disk (n mod 5) +
1, with the data blocks stored on the other 4 disks.
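The parity placement rule in the 5-disk example above can be written out directly. This is a sketch with an assumed function name; disks are numbered 1 to 5 as in the slide.

```python
def raid5_layout(n: int, num_disks: int = 5):
    """Return (parity_disk, data_disks) for the nth set of blocks."""
    parity_disk = (n % num_disks) + 1
    data_disks = [d for d in range(1, num_disks + 1) if d != parity_disk]
    return parity_disk, data_disks


# Parity rotates across the disks, so no single disk handles every parity write.
for n in range(6):
    print(n, raid5_layout(n))
```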
Original Slides by Avi Silberschatz24
RAID Levels (Cont.) RAID Level 5 (Cont.)
Higher I/O rates than Level 4. Block writes occur in parallel if the blocks and their parity blocks are on different disks.
Subsumes Level 4: provides same benefits, but avoids bottleneck of parity disk.
RAID Level 6: P+Q Redundancy scheme; similar to Level 5, but stores extra redundant information to guard against multiple disk failures. Better reliability than Level 5 at a higher cost; not used as widely.
Original Slides by Avi Silberschatz25 Original Slides by Avi Silberschatz26
Choice of RAID Level (optional) Factors in choosing RAID level
Monetary cost Performance: Number of I/O operations per second, and bandwidth during normal
operation Performance during failure Performance during rebuild of failed disk
Including time taken to rebuild failed disk
RAID 0 is used only when data safety is not important E.g. data can be recovered quickly from other sources
Level 2 and 4 never used since they are subsumed by 3 and 5 Level 3 is not used anymore since bit-striping forces single block reads to access all
disks, wasting disk arm movement, which block striping (level 5) avoids Level 6 is rarely used since levels 1 and 5 offer adequate safety for almost all
applications So competition is between 1 and 5 only
Original Slides by Avi Silberschatz27
Choice of RAID Level (Cont.) (optional) Level 1 provides much better write performance than level 5
Level 5 requires at least 2 block reads and 2 block writes to write a single block, whereas Level 1 only requires 2 block writes
Level 1 preferred for high update environments such as log disks
Level 1 has higher storage cost than level 5
disk drive capacities increasing rapidly (50%/year) whereas disk access times have decreased much less (x 3 in 10 years)
I/O requirements have increased greatly, e.g. for Web servers
When enough disks have been bought to satisfy required rate of I/O, they often have spare storage capacity
so there is often no extra monetary cost for Level 1!
Level 5 is preferred for applications with low update rate,and large amounts of data
Level 1 is preferred for all other applications
Original Slides by Avi Silberschatz28
Hardware Issues Software RAID: RAID implementations done entirely in
software, with no special hardware support Hardware RAID: RAID implementations with special
hardware Use non-volatile RAM to record writes that are being executed Beware: power failure during write can result in corrupted disk
E.g. failure after writing one block but before writing the second in a mirrored system
Such corrupted data must be detected when power is restored
Recovery from corruption is similar to recovery from failed disk
NV-RAM helps to efficiently detect potentially corrupted blocks
Otherwise all blocks of disk must be read and compared with mirror/parity block
Original Slides by Avi Silberschatz29
Hardware Issues (Cont.) (optional) Hot swapping: replacement of disk while system is running, without
power down Supported by some hardware RAID systems, reduces time to recovery, and improves availability greatly
Many systems maintain spare disks which are kept online, and used as replacements for failed disks immediately on detection of failure Reduces time to recovery greatly
Many hardware RAID systems ensure that a single point of failure will not stop the functioning of the system by using Redundant power supplies with battery backup Multiple controllers and multiple interconnections to guard against
controller/interconnection failures
Original Slides by Avi Silberschatz30
Optical Disks
Compact disk-read only memory (CD-ROM)
Disks can be loaded into or removed from a drive
High storage capacity (640 MB per disk)
High seek times of about 100 msec (optical read head is heavier and slower)
Higher latency (3000 RPM) and lower data-transfer rates (3-6 MB/s) compared to magnetic disks
Digital Video Disk (DVD) DVD-5 holds 4.7 GB , and DVD-9 holds 8.5 GB DVD-10 and DVD-18 are double sided formats with capacities of 9.4 GB and 17
GB Other characteristics similar to CD-ROM
Record once versions (CD-R and DVD-R) are becoming popular data can only be written once, and cannot be erased. high capacity and long lifetime; used for archival storage Multi-write versions (CD-RW, DVD-RW and DVD-RAM) also available
Original Slides by Avi Silberschatz31
Magnetic Tapes Hold large volumes of data and provide high transfer rates
Few GB for DAT (Digital Audio Tape) format, 10-40 GB with DLT (Digital Linear Tape) format, 100 GB+ with Ultrium format, and 330 GB with Ampex helical scan format
Transfer rates from few to 10s of MB/s
Currently the cheapest storage medium Tapes are cheap, but cost of drives is very high
Very slow access time in comparison to magnetic disks and optical disks limited to sequential access. Some formats (Accelis) provide faster seek (10s of seconds) at cost of lower capacity
Used mainly for backup, for storage of infrequently used information, and as an off-line medium for transferring information from one system to another.
Tape jukeboxes used for very large capacity storage (terabyte (10^12 bytes) to petabyte (10^15 bytes))
Original Slides by Avi Silberschatz32
Data Dictionary Storage
Data dictionary (also called system catalog) stores metadata: that is, data about data, such as
Information about relations
names of relations
names and types of attributes of each relation
names and definitions of views
integrity constraints
User and accounting information, including passwords
Statistical and descriptive data
number of tuples in each relation
Physical file organization information
How relation is stored (sequential/hash/…)
Physical location of relation
operating system file name or disk addresses of blocks containing records of the relation
Information about indices (Chapter 12)
Original Slides by Avi Silberschatz33
Data Dictionary Storage (Cont.) Catalog structure: can use either
specialized data structures designed for efficient access
a set of relations, with existing system features used to ensure efficient access
The latter alternative is usually preferred
A possible catalog representation:
Relation-metadata = (relation-name, number-of-attributes, storage-organization, location)
Attribute-metadata = (attribute-name, relation-name, domain-type, position, length)
User-metadata = (user-name, encrypted-password, group)
Index-metadata = (index-name, relation-name, index-type, index-attributes)
View-metadata = (view-name, definition)
Original Slides by Avi Silberschatz34
Chapter 5
Database Development
Chapter 5 - Objectives
2
Main components of an information system.
Main stages of database system development lifecycle.
Main phases of database design: conceptual, logical, and physical design.
2Original Slides by T. Connolly
Software Depression
Original Slides by T. Connolly3
Last few decades have seen proliferation of software applications, many requiring constant maintenance involving: correcting faults, implementing new user requirements, modifying software to run on new or upgraded
platforms. Effort spent on maintenance began to absorb resources
at an alarming rate.
Software Depression
Original Slides by T. Connolly4
As a result, many major software projects were late, over budget, unreliable, difficult to maintain, performed poorly.
In the late 1960s, this led to the ‘software crisis’, now referred to as the ‘software depression’.
Software Depression
Original Slides by T. Connolly5
Major reasons for failure of software projects include:
- lack of a complete requirements specification;
- lack of appropriate development methodology;
- poor decomposition of design into manageable components.
Structured approach to development was proposed called Information Systems Lifecycle (ISLC).
Information System
Original Slides by T. Connolly6
Resources that enable collection, management, control, and dissemination of information throughout an organization.
Database is fundamental component of IS, and its development/usage should be viewed from perspective of the wider requirements of the organization.
Database System Development Lifecycle
Original Slides by T. Connolly7
Database planning
System definition
Requirements collection and analysis
Database design
DBMS selection (optional)
Database System Development Lifecycle
Original Slides by T. Connolly8
Application design
Prototyping (optional)
Implementation
Data conversion and loading
Testing
Operational maintenance
[Figure: Stages of the Database System Development Lifecycle. Stages shown, in order: Database planning; System definition; Requirements collection and analysis; Database design (conceptual, logical, and physical database design), alongside DBMS selection (optional), Application design, and Prototyping (optional); Implementation; Data conversion and loading; Testing; Operational maintenance.]
Database Planning
Original Slides by T. Connolly10
Management activities that allow stages of database system development lifecycle to be realized as efficiently and effectively as possible.
Must be integrated with overall IS strategy of the organization.
Database Planning
Original Slides by T. Connolly11
Database planning should also include development of standards that govern: how data will be collected, how the format should be specified, what necessary documentation will be needed, how design and implementation should proceed.
System Definition
Original Slides by T. Connolly12
Describes scope and boundaries of database system and the major user views.
User view defines what is required of a database system from perspective of: a particular job role (such as Manager or Supervisor) or enterprise application area (such as marketing, personnel,
or stock control).
System Definition
Original Slides by T. Connolly13
Database application may have one or more user views.
Identifying user views helps ensure that no major users of the database are forgotten when developing requirements for new system.
User views also help in development of complex database system allowing requirements to be broken down into manageable pieces.
Representation of a Database System with Multiple User Views
Original Slides by T. Connolly14
Representation of a database system with multiple user views: user views (1,2, and 3) and (5 and 6) have overlapping requirements (shown as hatched areas), whereas user view 4 has distinct requirements.
Requirements Collection and Analysis
Original Slides by T. Connolly15
Process of collecting and analyzing information about the part of organization to be supported by the database system, and using this information to identify users’ requirements of new system.
Requirements Collection and Analysis
Original Slides by T. Connolly16
Information is gathered for each major user view including: a description of data used or generated; details of how data is to be used/generated; any additional requirements for new database system.
Information is analyzed to identify requirements to be included in new database system. Described in the requirements specification.
Database Design
Original Slides by T. Connolly17
Process of creating a design for a database that will support the enterprise’s mission statement and mission objectives for the required database system.
Database Design
Original Slides by T. Connolly18
Main approaches include: Top-down Bottom-up Inside-out Mixed
Database Design
Original Slides by T. Connolly19
Main purposes of data modeling include: to assist in understanding the meaning (semantics) of the
data; to facilitate communication about the information
requirements.
Building data model requires answering questions about entities, relationships, and attributes.
Database Design
Original Slides by T. Connolly20
A data model ensures we understand:
- each user’s perspective of the data;
- nature of the data itself, independent of its physical representations;
- use of data across user views.
Database Design
Original Slides by T. Connolly21
Three phases of database design:
Conceptual database design Logical database design Physical database design.
Conceptual Database Design
Original Slides by T. Connolly22
Process of constructing a model of the data used in an enterprise, independent of all physical considerations.
Data model is built using the information in users’ requirements specification.
Conceptual data model is source of information for logical design phase.
Logical Database Design
Original Slides by T. Connolly23
Process of constructing a model of the data used in an enterprise based on a specific data model (e.g. relational), but independent of a particular DBMS and other physical considerations.
Conceptual data model is refined and mapped on to a logical data model.
Physical Database Design
Original Slides by T. Connolly24
Process of producing a description of the database implementation on secondary storage.
Describes base relations, file organizations, and indexes used to achieve efficient access to data. Also describes any associated integrity constraints and security measures.
Tailored to a specific DBMS system.
Three-Level ANSI-SPARC Architecture and Phases of Database Design
Original Slides by T. Connolly25
DBMS Selection (optional)
Original Slides by T. Connolly26
Selection of an appropriate DBMS to support the database system.
Undertaken at any time prior to logical design provided sufficient information is available regarding system requirements.
Main steps to selecting a DBMS: define Terms of Reference of study; shortlist two or three products; evaluate products; recommend selection and produce report.
Example - Evaluation of DBMS Product (optional)
Original Slides by T. Connolly27
Analysis of features for DBMS product evaluation.
Application Design
Original Slides by T. Connolly28
Design of user interface and application programs that use and process the database.
Database design and application design are parallel activities.
Includes two important activities: transaction design; user interface design.
Application Design – Transactions (optional)
Original Slides by T. Connolly29
An action, or series of actions, carried out by a single user or application program, which accesses or changes content of the database.
Should define and document the high-level characteristics of the transactions required.
Application Design – Transactions (optional)
Original Slides by T. Connolly30
Important characteristics of transactions: data to be used by the transaction; functional characteristics of the transaction; output of the transaction; importance to the users; expected rate of usage.
Three main types of transactions: retrieval, update, and mixed.
Prototyping (optional)
Original Slides by T. Connolly31
Building working model of a database system.
Purpose to identify features of a system that work well, or are
inadequate; to suggest improvements or even new features; to clarify the users’ requirements; to evaluate feasibility of a particular system design.
Implementation
Original Slides by T. Connolly32
Physical realization of the database and application designs. Use DDL to create database schemas and empty
database files. Use DDL to create any specified user views. Use 3GL or 4GL to create the application programs.
This will include the database transactions implemented using the DML, possibly embedded in a host programming language.
Data Conversion and Loading
Original Slides by T. Connolly33
Transferring any existing data into new database and converting any existing applications to run on new database.
Only required when new database system is replacing an old system. DBMS normally has utility that loads existing files into
new database. May be possible to convert and use application
programs from old system for use by new system.
Testing
Original Slides by T. Connolly34
Process of running the database system with intent of finding errors.
Use carefully planned test strategies and realistic data.
Testing cannot show absence of faults; it can show only that software faults are present.
Demonstrates that database and application programs appear to be working according to requirements.
Testing (optional)
Original Slides by T. Connolly35
Should also test usability of system. Evaluation conducted against a usability
specification.
Examples of criteria include: Learnability; Performance; Robustness; Recoverability; Adaptability.
Operational Maintenance
Original Slides by T. Connolly36
Process of monitoring and maintaining database system following installation.
Monitoring performance of system.
If performance falls, may require tuning or reorganization of the database.
Maintaining and upgrading database application (when required).
Incorporating new requirements into database application.
Chapter 6
Relational Algebra
Chapter 6 - Objectives
2
Meaning of the term relational completeness.
How to form queries in relational algebra.
Original Slides by T. Connolly
Introduction
3
Relational algebra and relational calculus are formal languages associated with the relational model.
Informally, relational algebra is a (high-level) procedural language and relational calculus a non-procedural language.
However, formally both are equivalent to one another.
Original Slides by T. Connolly
Relational Algebra
4
Relational algebra operations work on one or more relations to define another relation without changing the original relations.
Both operands and results are relations, so output from one operation can become input to another operation.
Allows expressions to be nested, just as in arithmetic. This property is called closure.
Original Slides by T. Connolly
Relational Algebra
5
Five basic operations in relational algebra: Selection, Projection, Cartesian product, Union, and Set Difference.
These perform most of the data retrieval operations needed.
Also have Join, Intersection, and Division operations, which can be expressed in terms of 5 basic operations.
Original Slides by T. Connolly
Relational Algebra Operations
6 Original Slides by T. Connolly
Relational Algebra Operations
7 Original Slides by T. Connolly
Selection (or Restriction)
8
σpredicate(R)
Works on a single relation R and defines a relation that contains only those tuples (rows) of R that satisfy the specified condition (predicate).
Original Slides by T. Connolly
Example - Selection (or Restriction)
9
List all staff with a salary greater than £10,000.
σsalary > 10000(Staff)
Original Slides by T. Connolly
Projection
10
Πcol1, . . . , coln(R)
Works on a single relation R and defines a relation that contains a vertical subset of R, extracting the values of specified attributes and eliminating duplicates.
Original Slides by T. Connolly
Example - Projection
11
Produce a list of salaries for all staff, showing only staffNo, fName, lName, and salary details.
ΠstaffNo, fName, lName, salary(Staff)
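Selection and Projection can be sketched on a relation held as a list of rows. The sample Staff rows and the helper names below are made up for illustration; note that the projection removes duplicate rows, as the definition requires.

```python
# Hypothetical sample rows for the Staff relation.
staff = [
    {"staffNo": "S1", "fName": "Ann", "lName": "Lee", "position": "Manager", "salary": 30000},
    {"staffNo": "S2", "fName": "Bob", "lName": "Ray", "position": "Assistant", "salary": 9000},
    {"staffNo": "S3", "fName": "Cal", "lName": "Orr", "position": "Assistant", "salary": 12000},
]


def select(relation, predicate):
    """Selection: keep only the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]


def project(relation, attributes):
    """Projection: keep only the named attributes and eliminate duplicates."""
    seen, result = set(), []
    for t in relation:
        key = tuple(t[a] for a in attributes)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(attributes, key)))
    return result


well_paid = select(staff, lambda t: t["salary"] > 10000)
salaries = project(staff, ["staffNo", "fName", "lName", "salary"])
positions = project(staff, ["position"])   # duplicates collapse to two rows
print(len(well_paid), len(salaries), positions)
```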
Original Slides by T. Connolly
Union
12
R ∪ S
Union of two relations R and S defines a relation that contains all the tuples of R, or S, or both R and S, duplicate tuples being eliminated.
R and S must be union-compatible.
If R and S have I and J tuples, respectively, union is obtained by concatenating them into one relation with a maximum of (I + J) tuples.
Original Slides by T. Connolly
Example - Union
13
List all cities where there is either a branch office or a property for rent.
Πcity(Branch) ∪ Πcity(PropertyForRent)
Original Slides by T. Connolly
Set Difference
14
R – S Defines a relation consisting of the tuples that are in
relation R, but not in S. R and S must be union-compatible.
Original Slides by T. Connolly
Example - Set Difference
15
List all cities where there is a branch office but no properties for rent.
Πcity(Branch) – Πcity(PropertyForRent)
Original Slides by T. Connolly
Intersection
16
R ∩ S
Defines a relation consisting of the set of all tuples that are in both R and S.
R and S must be union-compatible.
Expressed using basic operations: R ∩ S = R – (R – S)
Original Slides by T. Connolly
Example - Intersection
17
List all cities where there is both a branch office and at least one property for rent.
Πcity(Branch) ∩ Πcity(PropertyForRent)
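The three set operations on the city projections map directly onto Python sets. The city values below are hypothetical sample data; the last line checks the identity that expresses Intersection through Set Difference.

```python
# Hypothetical results of projecting city from Branch and PropertyForRent.
branch_cities = {"London", "Aberdeen", "Glasgow", "Bristol"}
property_cities = {"London", "Aberdeen", "Glasgow"}

union = branch_cities | property_cities          # branch office or property
difference = branch_cities - property_cities     # branch office but no property
intersection = branch_cities & property_cities   # both

# Intersection expressed with the basic operations: R - (R - S)
via_difference = branch_cities - (branch_cities - property_cities)
print(sorted(union), sorted(difference), intersection == via_difference)
```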
Original Slides by T. Connolly
Cartesian product
18
R X S Defines a relation that is the concatenation of every
tuple of relation R with every tuple of relation S.
Original Slides by T. Connolly
Example - Cartesian product
19
List the names and comments of all clients who have viewed a property for rent.
(ΠclientNo, fName, lName(Client)) X (ΠclientNo, propertyNo, comment(Viewing))
Original Slides by T. Connolly
Example - Cartesian product and Selection
20
Use selection operation to extract those tuples where Client.clientNo = Viewing.clientNo.
σClient.clientNo = Viewing.clientNo((ΠclientNo, fName, lName(Client)) X (ΠclientNo, propertyNo, comment(Viewing)))
Cartesian product and Selection can be reduced to a single operation called a Join.
Original Slides by T. Connolly
Join Operations
21
Join is a derivative of Cartesian product.
Equivalent to performing a Selection, using join predicate as selection formula, over Cartesian product of the two operand relations.
One of the most difficult operations to implement efficiently in an RDBMS and one reason why RDBMSs have intrinsic performance problems.
Original Slides by T. Connolly
Join Operations
22
Various forms of join operation Theta join Equijoin (a particular type of Theta join) Natural join Outer join Semijoin
Original Slides by T. Connolly
Theta join (θ-join)
23
R ⋈F S
Defines a relation that contains tuples satisfying the predicate F from the Cartesian product of R and S.
The predicate F is of the form R.ai θ S.bi where θ may be one of the comparison operators (<, ≤, >, ≥, =, ≠).
Original Slides by T. Connolly
Theta join (θ-join)
24
Can rewrite Theta join using basic Selection and Cartesian product operations.
R ⋈F S = σF(R X S)
Degree of a Theta join is sum of degrees of the operand relations R and S. If predicate F contains only equality (=), the term Equijoin is used.
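The rewrite of a Theta join as a Selection over a Cartesian product can be sketched as below. The rows are hypothetical, and attribute names are qualified (for example Client.clientNo) so that the two operands have no names in common; the degree of the result is then the sum of the operand degrees.

```python
from itertools import product

# Hypothetical sample rows; names are qualified to keep them distinct.
client = [
    {"Client.clientNo": "CR56", "fName": "Aline", "lName": "Stewart"},
    {"Client.clientNo": "CR76", "fName": "John", "lName": "Kay"},
]
viewing = [
    {"Viewing.clientNo": "CR56", "propertyNo": "PA14", "comment": "too small"},
    {"Viewing.clientNo": "CR56", "propertyNo": "PG4", "comment": ""},
]


def theta_join(r, s, predicate):
    """Selection with predicate F over the Cartesian product of r and s."""
    return [{**a, **b} for a, b in product(r, s) if predicate(a, b)]


# An Equijoin: the predicate uses only equality.
equijoin = theta_join(
    client, viewing,
    lambda c, v: c["Client.clientNo"] == v["Viewing.clientNo"],
)
print(len(equijoin), len(equijoin[0]))
```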
Original Slides by T. Connolly
Example - Equijoin
25
List the names and comments of all clients who have viewed a property for rent.
(ΠclientNo, fName, lName(Client)) ⋈Client.clientNo = Viewing.clientNo (ΠclientNo, propertyNo, comment(Viewing))
Original Slides by T. Connolly
Natural join
26
R ⋈ S
An Equijoin of the two relations R and S over all common attributes x. One occurrence of each common attribute is eliminated from the result.
Original Slides by T. Connolly
Example - Natural join
27
List the names and comments of all clients who have viewed a property for rent.
(ΠclientNo, fName, lName(Client)) ⋈ (ΠclientNo, propertyNo, comment(Viewing))
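A Natural join can be sketched by matching rows on every attribute name the two relations share and keeping one copy of each shared attribute. The rows and the helper name are illustrative assumptions.

```python
# Hypothetical sample rows; clientNo is the only common attribute.
client = [
    {"clientNo": "CR56", "fName": "Aline", "lName": "Stewart"},
    {"clientNo": "CR76", "fName": "John", "lName": "Kay"},
]
viewing = [
    {"clientNo": "CR56", "propertyNo": "PA14", "comment": "too small"},
    {"clientNo": "CR76", "propertyNo": "PG4", "comment": "too remote"},
    {"clientNo": "CR56", "propertyNo": "PG4", "comment": ""},
]


def natural_join(r, s):
    """Equijoin over all common attributes; each common attribute appears once."""
    if not r or not s:
        return []
    common = [a for a in r[0] if a in s[0]]
    return [
        {**a, **b}                      # merging dicts keeps one clientNo column
        for a in r for b in s
        if all(a[c] == b[c] for c in common)
    ]


joined = natural_join(client, viewing)
print(len(joined), sorted(joined[0]))
```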
Original Slides by T. Connolly
Outer join
28
To display rows in the result that do not have matching values in the join column, use Outer join.
R ⟕ S
(Left) outer join is join in which tuples from R that do not have matching values in common columns of S are also included in result relation.
Original Slides by T. Connolly
Example - Left Outer join
29
Produce a status report on property viewings.
ΠpropertyNo, street, city(PropertyForRent) ⟕ Viewing
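A Left outer join can be sketched as a Natural join that also keeps unmatched left-hand rows, filling the missing right-hand attributes with nulls (None here). The rows and helper name are hypothetical, and both relations are assumed to be non-empty.

```python
# Hypothetical sample rows; propertyNo is the common attribute.
properties = [
    {"propertyNo": "PA14", "street": "16 Holhead", "city": "Aberdeen"},
    {"propertyNo": "PG21", "street": "18 Dale Rd", "city": "Glasgow"},
]
viewing = [
    {"clientNo": "CR56", "propertyNo": "PA14", "comment": "too small"},
]


def left_outer_join(r, s):
    """Natural join of r and s that also keeps rows of r with no match in s."""
    common = [a for a in r[0] if a in s[0]]
    s_only = [a for a in s[0] if a not in common]
    result = []
    for a in r:
        matches = [b for b in s if all(a[c] == b[c] for c in common)]
        if matches:
            result.extend({**a, **b} for b in matches)
        else:
            result.append({**a, **{k: None for k in s_only}})  # pad with nulls
    return result


report = left_outer_join(properties, viewing)
print(report)
```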
Original Slides by T. Connolly
Semijoin
30
R ⊳F S
Defines a relation that contains the tuples of R that participate in the join of R with S.
Can rewrite Semijoin using Projection and Join:
R ⊳F S = ΠA(R ⋈F S), where A is the set of all attributes of R
Original Slides by T. Connolly
Example - Semijoin
31
List complete details of all staff who work at the branch in Glasgow.
Staff ⊳Staff.branchNo = Branch.branchNo (σcity = ‘Glasgow’(Branch))
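A Semijoin keeps only the attributes of the left-hand relation, so it can be sketched as a filter: a Staff row survives if at least one Branch row satisfies the join predicate with it. The rows and names below are hypothetical.

```python
# Hypothetical sample rows.
staff = [
    {"staffNo": "SG37", "lName": "Beech", "branchNo": "B003"},
    {"staffNo": "SL21", "lName": "White", "branchNo": "B005"},
    {"staffNo": "SG14", "lName": "Ford", "branchNo": "B003"},
]
branch = [
    {"branchNo": "B003", "city": "Glasgow"},
    {"branchNo": "B005", "city": "London"},
]


def semijoin(r, s, predicate):
    """Tuples of r that participate in the join of r with s."""
    return [a for a in r if any(predicate(a, b) for b in s)]


glasgow_branches = [b for b in branch if b["city"] == "Glasgow"]   # the selection
glasgow_staff = semijoin(
    staff, glasgow_branches,
    lambda st, br: st["branchNo"] == br["branchNo"],
)
print([t["staffNo"] for t in glasgow_staff])
```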
Original Slides by T. Connolly
Division
32
R ÷ S
Defines a relation over the attributes C that consists of set of tuples from R that match combination of every tuple in S.
Expressed using basic operations:
T1 ← ΠC(R)
T2 ← ΠC((S X T1) – R)
T ← T1 – T2
Original Slides by T. Connolly
Example - Division
33
Identify all clients who have viewed all properties with three rooms.
(ΠclientNo, propertyNo(Viewing)) ÷ (ΠpropertyNo(σrooms = 3 (PropertyForRent)))
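The three-step expression for Division can be traced with sets. In this sketch R holds (clientNo, propertyNo) pairs, S holds propertyNo values, and C is taken to be clientNo, that is, the attributes of R that are not in S; the data is hypothetical.

```python
def divide(r, s):
    """R ÷ S for R as a set of (c, p) pairs and S as a set of p values."""
    t1 = {c for (c, _) in r}                            # T1: project C from R
    # Pair every candidate in T1 with every S value; any pair missing from R
    # disqualifies that candidate.
    t2 = {c for c in t1 for p in s if (c, p) not in r}  # T2
    return t1 - t2                                      # T = T1 - T2


# Hypothetical data: which client viewed which property, and the 3-room properties.
viewing = {("CR56", "PG4"), ("CR56", "PG36"), ("CR76", "PG4"), ("CR62", "PA14")}
three_room = {"PG4", "PG36"}

print(divide(viewing, three_room))
```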
Original Slides by T. Connolly
Aggregate Operations
34
ℑAL(R)
Applies aggregate function list, AL, to R to define a relation over the aggregate list.
AL contains one or more (<aggregate_function>, <attribute>) pairs.
Main aggregate functions are: COUNT, SUM, AVG, MIN, and MAX.
Original Slides by T. Connolly
Example – Aggregate Operations
35
How many properties cost more than £350 per month to rent?
ρR(myCount) ℑ COUNT propertyNo (σrent > 350(PropertyForRent))
Grouping Operation
36
GAℑAL(R)
Groups tuples of R by grouping attributes, GA, and then applies aggregate function list, AL, to define a new relation.
AL contains one or more (<aggregate_function>, <attribute>) pairs.
Resulting relation contains the grouping attributes, GA, along with results of each of the aggregate functions.
Original Slides by T. Connolly
241-212 INTRODUCTION TO DATABASE AND INFORMATION SYSTEM
Dept of Computer Engineering, PSU 2/54 51
Example – Grouping Operation
Find the number of staff working in each branch and the sum of their salaries.
ρ_R(branchNo, myCount, mySum) (_{branchNo}ℑ_{COUNT staffNo, SUM salary}(Staff))
Other Languages
Transform-oriented languages are non-procedural languages that use relations to transform input data into required outputs (e.g. SQL).
Graphical languages provide user with picture of the structure of the relation. User fills in example of what is wanted and system returns required data in that format (e.g. QBE).
4GLs can create complete customized application using limited set of commands in a user-friendly, often menu-driven environment.
Some systems accept a form of natural language, sometimes called a 5GL, although this development is still at an early stage.
Chapter 7
SQL: Data Manipulation and Data Definition
Chapter 7 - Objectives
Purpose and importance of SQL.
How to retrieve data from database using SELECT and:
- Use compound WHERE conditions.
- Sort query results using ORDER BY.
- Use aggregate functions.
- Group data using GROUP BY and HAVING.
- Use subqueries.
- Join tables together.
- Perform set operations (UNION, INTERSECT, EXCEPT).
How to update database using INSERT, UPDATE, and DELETE.
Data types supported by SQL standard.
Purpose of integrity enhancement feature of SQL.
How to define integrity constraints using SQL.
How to use the integrity enhancement feature in the CREATE and ALTER TABLE statements.
Objectives of SQL
Ideally, database language should allow user to:
- create the database and relation structures;
- perform insertion, modification, deletion of data from relations;
- perform simple and complex queries.
Must perform these tasks with minimal user effort and command structure/syntax must be easy to learn.
It must be portable.
SQL is a transform-oriented language with 2 major components:
- A DDL for defining database structure.
- A DML for retrieving and updating data.
Until SQL:1999, SQL did not contain flow of control commands. These had to be implemented using a programming or job-control language, or interactively by the decisions of user.
SQL is relatively easy to learn:
- it is non-procedural - you specify what information you require, rather than how to get it;
- it is essentially free-format.
Consists of standard English words:
1) CREATE TABLE Staff(staffNo VARCHAR(5), lName VARCHAR(15), salary DECIMAL(7,2));
2) INSERT INTO Staff VALUES ('SG16', 'Brown', 8300);
3) SELECT staffNo, lName, salary
   FROM Staff
   WHERE salary > 10000;
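The three statements above can be run end to end. The following is a minimal sketch using Python's built-in sqlite3 module with an in-memory database; SQLite is only a convenient stand-in for a DBMS here, and the second inserted row ('SL21') is a hypothetical addition so that the SELECT returns something.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# 1) DDL: define the relation structure.
conn.execute(
    "CREATE TABLE Staff (staffNo VARCHAR(5), lName VARCHAR(15), salary DECIMAL(7,2))"
)

# 2) DML: insert rows. The second row is a made-up example, not from the slides.
conn.execute("INSERT INTO Staff VALUES ('SG16', 'Brown', 8300)")
conn.execute("INSERT INTO Staff VALUES ('SL21', 'White', 30000)")

# 3) DML: retrieve the rows that satisfy the condition.
rows = conn.execute(
    "SELECT staffNo, lName, salary FROM Staff WHERE salary > 10000"
).fetchall()
print(rows)
```

Note that SQL string literals need plain single quotes; typographic quotes copied from slides are rejected by a real parser.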
Can be used by range of users including DBAs, management, application developers, and other types of end users.
An ISO standard now exists for SQL, making it both the formal and de facto standard language for relational databases.
History of SQL
In 1974, D. Chamberlin (IBM San Jose Laboratory) defined language called ‘Structured English Query Language’ (SEQUEL).
A revised version, SEQUEL/2, was defined in 1976 but name was subsequently changed to SQL for legal reasons.
Still pronounced ‘see-quel’, though official pronunciation is ‘S-Q-L’.
IBM subsequently produced a prototype DBMS called System R, based on SEQUEL/2.
Roots of SQL, however, are in SQUARE (Specifying Queries as Relational Expressions), which predates System R project.
In late 70s, ORACLE appeared and was probably first commercial RDBMS based on SQL.
In 1987, ANSI and ISO published an initial standard for SQL.
In 1989, ISO published an addendum that defined an ‘Integrity Enhancement Feature’.
In 1992, first major revision to ISO standard occurred, referred to as SQL2 or SQL/92.
In 1999, SQL:1999 was released with support for object-oriented data management.
In late 2003, SQL:2003 was released.
Importance of SQL
SQL has become part of application architectures such as IBM’s Systems Application Architecture.
It is strategic choice of many large and influential organizations (e.g. X/OPEN).
SQL is Federal Information Processing Standard (FIPS) to which conformance is required for all sales of databases to American Government.
SQL is used in other standards and even influences development of other standards as a definitional tool. Examples include:
- ISO’s Information Resource Directory System (IRDS) Standard
- Remote Data Access (RDA) Standard.
Writing SQL Commands
SQL statement consists of reserved words and user-defined words.
- Reserved words are a fixed part of SQL and must be spelt exactly as required and cannot be split across lines.
- User-defined words are made up by user and represent names of various database objects such as relations, columns, views.
Most components of an SQL statement are case insensitive, except for literal character data.
More readable with indentation and lineation:
- Each clause should begin on a new line.
- Start of a clause should line up with start of other clauses.
- If clause has several parts, should each appear on a separate line and be indented under start of clause.
Use extended form of BNF notation:
- Upper-case letters represent reserved words.
- Lower-case letters represent user-defined words.
- | indicates a choice among alternatives.
- Curly braces indicate a required element.
- Square brackets indicate an optional element.
- … indicates optional repetition (0 or more).
Literals
Literals are constants used in SQL statements.
All non-numeric literals must be enclosed in single quotes (e.g. 'London').
All numeric literals must not be enclosed in quotes (e.g. 650.00).
SELECT Statement
SELECT [DISTINCT | ALL] {* | [columnExpression [AS newName]] [,...] }
FROM TableName [alias] [, ...]
[WHERE condition]
[GROUP BY columnList] [HAVING condition]
[ORDER BY columnList]
FROM      Specifies table(s) to be used.
WHERE     Filters rows.
GROUP BY  Forms groups of rows with same column value.
HAVING    Filters groups subject to some condition.
SELECT    Specifies which columns are to appear in output.
ORDER BY  Specifies the order of the output.
Order of the clauses cannot be changed.
Only SELECT and FROM are mandatory.
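To see how the clauses cooperate, the sketch below runs one query that uses all six of them. It assumes Python's built-in sqlite3 module and a cut-down, hypothetical Staff table; the rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Staff (staffNo TEXT, branchNo TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO Staff VALUES (?, ?, ?)",
    [("S1", "B003", 12000), ("S2", "B003", 18000), ("S3", "B005", 30000),
     ("S4", "B005", 9000), ("S5", "B007", 24000)],
)

rows = conn.execute(
    """
    SELECT branchNo, COUNT(staffNo) AS myCount  -- columns that appear in the output
    FROM Staff                                  -- table to be used
    WHERE salary > 10000                        -- filters rows (S4 is dropped)
    GROUP BY branchNo                           -- one group per branch
    HAVING COUNT(staffNo) > 1                   -- filters groups
    ORDER BY branchNo                           -- order of the output
    """
).fetchall()
print(rows)
```

Branch B005 is excluded because only one of its rows survives the WHERE filter before the groups are formed.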
Example 1 All Columns, All Rows
List full details of all staff.
SELECT staffNo, fName, lName, address, position, sex, DOB, salary, branchNo
FROM Staff;
Can use * as an abbreviation for ‘all columns’:
SELECT *
FROM Staff;
[Result table for Example 1]
Example 2 Specific Columns, All Rows
Produce a list of salaries for all staff, showing only staff number, first and last names, and salary.
SELECT staffNo, fName, lName, salary
FROM Staff;
[Result table for Example 2]
Example 3 Use of DISTINCT
List the property numbers of all properties that have been viewed.
SELECT propertyNo
FROM Viewing;
Use DISTINCT to eliminate duplicates:
SELECT DISTINCT propertyNo
FROM Viewing;
Example 4 Calculated Fields
Produce list of monthly salaries for all staff, showing staff number, first/last name, and salary.
SELECT staffNo, fName, lName, salary/12
FROM Staff;
[Result table for Example 4]
To name column, use AS clause:
SELECT staffNo, fName, lName, salary/12 AS monthlySalary
FROM Staff;
Example 5 Comparison Search Condition
List all staff with a salary greater than 10,000.
SELECT staffNo, fName, lName, position, salary
FROM Staff
WHERE salary > 10000;
[Result table for Example 5]
Example 6 Compound Comparison Search Condition
List addresses of all branch offices in London or Glasgow.
SELECT *
FROM Branch
WHERE city = 'London' OR city = 'Glasgow';
[Result table for Example 6]
Example 7 Range Search Condition
List all staff with a salary between 20,000 and 30,000.
SELECT staffNo, fName, lName, position, salary
FROM Staff
WHERE salary BETWEEN 20000 AND 30000;
BETWEEN test includes the endpoints of range.
[Result table for Example 7]
Also a negated version NOT BETWEEN.
BETWEEN does not add much to SQL’s expressive power. Could also write:
SELECT staffNo, fName, lName, position, salary
FROM Staff
WHERE salary >= 20000 AND salary <= 30000;
Useful, though, for a range of values.
Example 8 Set Membership
List all managers and supervisors.
SELECT staffNo, fName, lName, position
FROM Staff
WHERE position IN ('Manager', 'Supervisor');
[Result table for Example 8]
There is a negated version (NOT IN).
IN does not add much to SQL’s expressive power. Could have expressed this as:
SELECT staffNo, fName, lName, position
FROM Staff
WHERE position = 'Manager' OR position = 'Supervisor';
IN is a more concise way to express the condition when set contains many values.
Example 9 Pattern Matching
Find all owners with the string ‘Glasgow’ in their address.
SELECT ownerNo, fName, lName, address, telNo
FROM PrivateOwner
WHERE address LIKE '%Glasgow%';
[Result table for Example 9]
SQL has two special pattern matching symbols:
- %: sequence of zero or more characters;
- _ (underscore): any single character.
LIKE '%Glasgow%' means a sequence of characters of any length containing ‘Glasgow’.
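Both pattern symbols can be checked quickly. This is a minimal sketch with Python's built-in sqlite3 module; the PrivateOwner rows are invented sample data, and SQLite's LIKE is case-insensitive for ASCII by default, which other DBMSs may not share.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PrivateOwner (ownerNo TEXT, address TEXT)")
conn.executemany(
    "INSERT INTO PrivateOwner VALUES (?, ?)",
    [("CO46", "2 Fergus Dr, Aberdeen AB2 7SX"),
     ("CO87", "6 Achray St, Glasgow G32 9DX"),
     ("CO40", "63 Well St, Glasgow G42")],
)

# % matches any sequence of zero or more characters.
contains = conn.execute(
    "SELECT ownerNo FROM PrivateOwner WHERE address LIKE '%Glasgow%' ORDER BY ownerNo"
).fetchall()

# _ matches exactly one character: 'CO4' followed by any single character.
one_char = conn.execute(
    "SELECT ownerNo FROM PrivateOwner WHERE ownerNo LIKE 'CO4_' ORDER BY ownerNo"
).fetchall()

print(contains, one_char)
```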
Example 10 NULL Search Condition
List details of all viewings on property PG4 where a comment has not been supplied.
There are 2 viewings for property PG4, one with and one without a comment.
Have to test for null explicitly using special keyword IS NULL:
SELECT clientNo, viewDate
FROM Viewing
WHERE propertyNo = 'PG4' AND comment IS NULL;
Negated version (IS NOT NULL) can test for non-null values.
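Why the explicit IS NULL test is needed can be shown by comparing it with an ordinary equality test. A minimal sketch with Python's built-in sqlite3 module follows; the two Viewing rows and the comment text are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Viewing (clientNo TEXT, propertyNo TEXT, comment TEXT)")
conn.executemany(
    "INSERT INTO Viewing VALUES (?, ?, ?)",
    [("CR56", "PG4", None),            # no comment supplied (NULL)
     ("CR76", "PG4", "too remote")],
)

base = "SELECT clientNo FROM Viewing WHERE propertyNo = 'PG4' AND "

# Comparing with NULL using '=' is never true, so no rows are returned.
eq_null = conn.execute(base + "comment = NULL").fetchall()
# IS NULL and IS NOT NULL are the explicit tests.
is_null = conn.execute(base + "comment IS NULL").fetchall()
not_null = conn.execute(base + "comment IS NOT NULL").fetchall()

print(eq_null, is_null, not_null)
```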
Example 11 Single Column Ordering
List salaries for all staff, arranged in descending order of salary.
SELECT staffNo, fName, lName, salary
FROM Staff
ORDER BY salary DESC;
[Result table for Example 11]
Example 12 Multiple Column Ordering
Produce abbreviated list of properties in order of property type.
SELECT propertyNo, type, rooms, rent
FROM PropertyForRent
ORDER BY type;
[Result table for Example 12 with one sort key]
Four flats in this list - as no minor sort key specified, system arranges these rows in any order it chooses.
To arrange in order of rent, specify minor order:
SELECT propertyNo, type, rooms, rent
FROM PropertyForRent
ORDER BY type, rent DESC;
[Result table for Example 12 with two sort keys]
SELECT Statement - Aggregates
ISO standard defines five aggregate functions:
COUNT returns number of values in specified column.
SUM returns sum of values in specified column.
AVG returns average of values in specified column.
MIN returns smallest value in specified column.
MAX returns largest value in specified column.
Each operates on a single column of a table and returns a single value.
COUNT, MIN, and MAX apply to numeric and non-numeric fields, but SUM and AVG may be used on numeric fields only.
Apart from COUNT(*), each function eliminates nulls first and operates only on remaining non-null values.
COUNT(*) counts all rows of a table, regardless of whether nulls or duplicate values occur.
Can use DISTINCT before column name to eliminate duplicates.
DISTINCT has no effect with MIN/MAX, but may have with SUM/AVG.
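The null and DISTINCT rules for aggregates can be checked on four rows. This is a minimal sketch with Python's built-in sqlite3 module; the single-column Staff table and its values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Staff (salary INTEGER)")
# Two duplicate salaries, one NULL, one distinct value.
conn.executemany("INSERT INTO Staff VALUES (?)", [(10000,), (10000,), (None,), (20000,)])

row = conn.execute(
    """
    SELECT COUNT(*),                -- all rows, nulls and duplicates included
           COUNT(salary),           -- nulls eliminated first
           COUNT(DISTINCT salary),  -- nulls and duplicates eliminated
           SUM(salary),
           SUM(DISTINCT salary),    -- DISTINCT changes SUM
           MIN(salary),
           MIN(DISTINCT salary)     -- DISTINCT has no effect on MIN
    FROM Staff
    """
).fetchone()
print(row)
```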
Aggregate functions can be used only in SELECT list and in HAVING clause.
If SELECT list includes an aggregate function and there is no GROUP BY clause, SELECT list cannot reference a column outside an aggregate function. For example, the following is illegal:
SELECT staffNo, COUNT(salary)
FROM Staff;
Example 13 Use of COUNT(*)
How many properties cost more than £350 per month to rent?
SELECT COUNT(*) AS myCount
FROM PropertyForRent
WHERE rent > 350;
Example 14 Use of COUNT(DISTINCT)
How many different properties viewed in May ‘04?
SELECT COUNT(DISTINCT propertyNo) AS myCount
FROM Viewing
WHERE viewDate BETWEEN '1-May-04' AND '31-May-04';
Example 15 Use of COUNT and SUM
Find number of Managers and sum of their salaries.
SELECT COUNT(staffNo) AS myCount, SUM(salary) AS mySum
FROM Staff
WHERE position = 'Manager';
Example 16 Use of MIN, MAX, AVG
Find minimum, maximum, and average staff salary.
SELECT MIN(salary) AS myMin, MAX(salary) AS myMax, AVG(salary) AS myAvg
FROM Staff;
SELECT Statement - Grouping
Use GROUP BY clause to get sub-totals.
SELECT and GROUP BY closely integrated: each item in SELECT list must be single-valued per group, and SELECT clause may only contain:
- column names
- aggregate functions
- constants
- expression involving combinations of the above.
All column names in SELECT list must appear in GROUP BY clause unless name is used only in an aggregate function.
If WHERE is used with GROUP BY, WHERE is applied first, then groups are formed from remaining rows satisfying predicate.
ISO considers two nulls to be equal for purposes of GROUP BY.
Example 17 Use of GROUP BY
Find number of staff in each branch and their total salaries.
SELECT branchNo, COUNT(staffNo) AS myCount, SUM(salary) AS mySum
FROM Staff
GROUP BY branchNo
ORDER BY branchNo;
Restricted Groupings – HAVING clause
HAVING clause is designed for use with GROUP BY to restrict groups that appear in final result table.
Similar to WHERE, but WHERE filters individual rows whereas HAVING filters groups.
Column names in HAVING clause must also appear in the GROUP BY list or be contained within an aggregate function.
Example 18 Use of HAVING
For each branch with more than 1 member of staff, find number of staff in each branch and sum of their salaries.
SELECT branchNo, COUNT(staffNo) AS myCount, SUM(salary) AS mySum
FROM Staff
GROUP BY branchNo
HAVING COUNT(staffNo) > 1
ORDER BY branchNo;
Subqueries
Some SQL statements can have a SELECT embedded within them.
A subselect can be used in WHERE and HAVING clauses of an outer SELECT, where it is called a subquery or nested query.
Subselects may also appear in INSERT, UPDATE, and DELETE statements.
Example 19 Subquery with Equality
List staff who work in branch at ‘163 Main St’.
SELECT staffNo, fName, lName, position
FROM Staff
WHERE branchNo =
  (SELECT branchNo
   FROM Branch
   WHERE street = '163 Main St');
Inner SELECT finds branch number for branch at ‘163 Main St’ (‘B003’).
Outer SELECT then retrieves details of all staff who work at this branch.
Outer SELECT then becomes:
SELECT staffNo, fName, lName, position
FROM Staff
WHERE branchNo = 'B003';
[Result table for Example 19]
Example 20 Subquery with Aggregate
List all staff whose salary is greater than the average salary, and show by how much.
SELECT staffNo, fName, lName, position,
       salary - (SELECT AVG(salary) FROM Staff) AS salDiff
FROM Staff
WHERE salary >
  (SELECT AVG(salary)
   FROM Staff);
Cannot write ‘WHERE salary > AVG(salary)’.
Instead, use subquery to find average salary (17000), and then use outer SELECT to find those staff with salary greater than this:
SELECT staffNo, fName, lName, position, salary - 17000 AS salDiff
FROM Staff
WHERE salary > 17000;
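The difference between the illegal form and the subquery form can be tried directly. Below is a minimal sketch with Python's built-in sqlite3 module; the six Staff rows are hypothetical values chosen so that the average is 17000, and the exact error raised for the illegal query is SQLite-specific.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Staff (staffNo TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO Staff VALUES (?, ?)",
    [("SL21", 30000), ("SG37", 12000), ("SG14", 18000),
     ("SA9", 9000), ("SG5", 24000), ("SL41", 9000)],   # average is 17000
)

# An aggregate function is not allowed directly in WHERE.
try:
    conn.execute("SELECT staffNo FROM Staff WHERE salary > AVG(salary)")
    error_raised = False
except sqlite3.Error:
    error_raised = True

# The subquery computes the average once; the outer SELECT compares against it.
rows = conn.execute(
    """
    SELECT staffNo, salary - (SELECT AVG(salary) FROM Staff) AS salDiff
    FROM Staff
    WHERE salary > (SELECT AVG(salary) FROM Staff)
    ORDER BY staffNo
    """
).fetchall()
print(error_raised, rows)
```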
[Result table for Example 20]
Subquery Rules
ORDER BY clause may not be used in a subquery (although it may be used in outermost SELECT).
Subquery SELECT list must consist of a single column name or expression, except for subqueries that use EXISTS.
By default, column names refer to table name in FROM clause of subquery. Can refer to a table in FROM using an alias.
When subquery is an operand in a comparison, subquery must appear on right-hand side.
A subquery may not be used as an operand in an expression.
Example 21 Nested subquery: use of IN
List properties handled by staff at ‘163 Main St’.
SELECT propertyNo, street, city, postcode, type, rooms, rent
FROM PropertyForRent
WHERE staffNo IN
  (SELECT staffNo
   FROM Staff
   WHERE branchNo =
     (SELECT branchNo
      FROM Branch
      WHERE street = '163 Main St'));
[Result table for Example 21]
ANY and ALL
ANY and ALL may be used with subqueries that produce a single column of numbers.
With ALL, condition will only be true if it is satisfied by all values produced by subquery.
With ANY, condition will be true if it is satisfied by any values produced by subquery.
If subquery is empty, ALL returns true, ANY returns false.
SOME may be used in place of ANY.
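The ANY/ALL rules, including the empty-subquery case, mirror Python's built-in `any` and `all`. The sketch below models a subquery result as a plain list of numbers; the helper names are made up, and nulls (three-valued logic) are deliberately left out of this model.

```python
def greater_than_some(value, subquery_values):
    """Models: value > SOME (subquery). True if any produced value satisfies it."""
    return any(value > v for v in subquery_values)


def greater_than_all(value, subquery_values):
    """Models: value > ALL (subquery). True only if every produced value satisfies it."""
    return all(value > v for v in subquery_values)


salaries = [12000, 18000, 24000]   # hypothetical single-column subquery result

checks = (
    greater_than_some(15000, salaries),  # beats 12000
    greater_than_all(15000, salaries),   # does not beat 18000 or 24000
    greater_than_all(30000, salaries),   # beats every value
    greater_than_some(15000, []),        # empty subquery: ANY/SOME is false
    greater_than_all(15000, []),         # empty subquery: ALL is true
)
print(checks)
```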
Example 22 Use of ANY/SOME
Find staff whose salary is larger than salary of at least one member of staff at branch B003.
SELECT staffNo, fName, lName, position, salary
FROM Staff
WHERE salary > SOME
  (SELECT salary
   FROM Staff
   WHERE branchNo = 'B003');
Inner query produces set {12000, 18000, 24000} and outer query selects those staff whose salaries are greater than any of the values in this set (that is, greater than the minimum value, 12000).
[Result table for Example 22]
Example 23 Use of ALL
Find staff whose salary is larger than salary of every member of staff at branch B003.
SELECT staffNo, fName, lName, position, salary
FROM Staff
WHERE salary > ALL
  (SELECT salary
   FROM Staff
   WHERE branchNo = 'B003');
[Result table for Example 23]
Multi-Table Queries
Can use subqueries provided result columns come from same table.
If result columns come from more than one table must use a join.
To perform join, include more than one table in FROM clause.
Use comma as separator and typically include WHERE clause to specify join column(s).
Also possible to use an alias for a table named in FROM clause.
Alias is separated from table name with a space.
Alias can be used to qualify column names when there is ambiguity.
Example 24 Simple Join
List names of all clients who have viewed a property along with any comment supplied.
SELECT c.clientNo, fName, lName, propertyNo, comment
FROM Client c, Viewing v
WHERE c.clientNo = v.clientNo;
Only those rows from both tables that have identical values in the clientNo columns (c.clientNo = v.clientNo) are included in result.
Equivalent to equi-join in relational algebra.
Alternative JOIN Constructs
SQL provides alternative ways to specify joins:
FROM Client c JOIN Viewing v ON c.clientNo = v.clientNo
FROM Client JOIN Viewing USING (clientNo)
FROM Client NATURAL JOIN Viewing
In each case, FROM replaces original FROM and WHERE. However, first produces table with two identical clientNo columns.
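The remark about the two identical clientNo columns can be observed by listing the output columns of each form. This is a minimal sketch with Python's built-in sqlite3 module; the cut-down Client and Viewing tables and the helper `columns` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Client (clientNo TEXT, lName TEXT)")
conn.execute("CREATE TABLE Viewing (clientNo TEXT, propertyNo TEXT)")
conn.execute("INSERT INTO Client VALUES ('CR56', 'Stewart'), ('CR74', 'Ritchie')")
conn.execute("INSERT INTO Viewing VALUES ('CR56', 'PG4')")


def columns(from_clause):
    """Return the output column names of SELECT * over the given FROM clause."""
    cur = conn.execute("SELECT * FROM " + from_clause)
    return [d[0] for d in cur.description]


on_cols = columns("Client c JOIN Viewing v ON c.clientNo = v.clientNo")
using_cols = columns("Client JOIN Viewing USING (clientNo)")
natural_cols = columns("Client NATURAL JOIN Viewing")
print(on_cols, using_cols, natural_cols)
```

With ON, clientNo appears once per table; USING and NATURAL JOIN keep a single copy of the join column.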
Example 25 Sorting a join
For each branch, list numbers and names of staff who manage properties, and properties they manage.
SELECT s.branchNo, s.staffNo, fName, lName, propertyNo
FROM Staff s, PropertyForRent p
WHERE s.staffNo = p.staffNo
ORDER BY s.branchNo, s.staffNo, propertyNo;
[Result table for Example 25]
Example 26 Three Table Join
For each branch, list staff who manage properties, including city in which branch is located and properties they manage.
SELECT b.branchNo, b.city, s.staffNo, fName, lName, propertyNo
FROM Branch b, Staff s, PropertyForRent p
WHERE b.branchNo = s.branchNo AND
      s.staffNo = p.staffNo
ORDER BY b.branchNo, s.staffNo, propertyNo;
Alternative formulation for FROM and WHERE:
FROM (Branch b JOIN Staff s USING (branchNo)) AS bs
     JOIN PropertyForRent p USING (staffNo)
Example 27 Multiple Grouping Columns
Find number of properties handled by each staff member.
SELECT s.branchNo, s.staffNo, COUNT(*) AS myCount
FROM Staff s, PropertyForRent p
WHERE s.staffNo = p.staffNo
GROUP BY s.branchNo, s.staffNo
ORDER BY s.branchNo, s.staffNo;
Computing a Join
Procedure for generating results of a join is:
1. Form Cartesian product of the tables named in FROM clause.
2. If there is a WHERE clause, apply the search condition to each row of the product table, retaining those rows that satisfy the condition.
3. For each remaining row, determine value of each item in SELECT list to produce a single row in result table.
4. If DISTINCT has been specified, eliminate any duplicate rows from the result table.
5. If there is an ORDER BY clause, sort result table as required.
SQL provides special format of SELECT for Cartesian product:
SELECT [DISTINCT | ALL] {* | columnList}
FROM Table1 CROSS JOIN Table2
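The five-step procedure can be followed literally on small in-memory tables. The sketch below is a conceptual model in plain Python, not how a DBMS actually evaluates joins; the rows are invented, and the query it mimics is SELECT DISTINCT c.clientNo FROM Client c, Viewing v WHERE c.clientNo = v.clientNo ORDER BY c.clientNo.

```python
from itertools import product

# Hypothetical rows: Client(clientNo, lName) and Viewing(clientNo, propertyNo).
client = [("CR56", "Stewart"), ("CR74", "Ritchie")]
viewing = [("CR56", "PG4"), ("CR56", "PA14"), ("CR62", "PA14")]

# 1. Form the Cartesian product of the tables named in FROM.
prod = list(product(client, viewing))

# 2. Apply the WHERE search condition to each row of the product.
matched = [(c, v) for (c, v) in prod if c[0] == v[0]]

# 3. Evaluate the SELECT list for each remaining row (here: c.clientNo).
selected = [(c[0],) for (c, v) in matched]

# 4. DISTINCT: eliminate duplicate rows, keeping first occurrences.
distinct = list(dict.fromkeys(selected))

# 5. ORDER BY: sort the result table.
result = sorted(distinct)
print(len(prod), len(matched), result)
```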
Outer Joins
If one row of a joined table is unmatched, row is omitted from result table.
Outer join operations retain rows that do not satisfy the join condition.
Consider following tables:
[Tables Branch1 and PropertyForRent1]
The (inner) join of these two tables:
SELECT b.*, p.*
FROM Branch1 b, PropertyForRent1 p
WHERE b.bCity = p.pCity;
[Result table for inner join of Branch1 and PropertyForRent1 tables]
Result table has two rows where cities are same.
There are no rows corresponding to branches in Bristol and Aberdeen.
To include unmatched rows in result table, use an Outer join.
Example 28 Left Outer Join
List branches and properties that are in same city along with any unmatched branches.
SELECT b.*, p.*
FROM Branch1 b LEFT JOIN
     PropertyForRent1 p ON b.bCity = p.pCity;
Includes those rows of first (left) table unmatched with rows from second (right) table.
Columns from second table are filled with NULLs.
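The NULL filling for unmatched rows is easy to see on a small run. This is a minimal sketch with Python's built-in sqlite3 module; the table contents are illustrative sample rows (the original tables were figures that did not survive extraction), and NULL comes back as Python `None`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Branch1 (branchNo TEXT, bCity TEXT)")
conn.execute("CREATE TABLE PropertyForRent1 (propertyNo TEXT, pCity TEXT)")
conn.execute(
    "INSERT INTO Branch1 VALUES ('B003', 'Glasgow'), ('B004', 'Bristol'), ('B002', 'London')"
)
conn.execute(
    "INSERT INTO PropertyForRent1 VALUES "
    "('PA14', 'Aberdeen'), ('PL94', 'London'), ('PG4', 'Glasgow')"
)

rows = conn.execute(
    """
    SELECT b.*, p.*
    FROM Branch1 b LEFT JOIN PropertyForRent1 p ON b.bCity = p.pCity
    ORDER BY b.branchNo
    """
).fetchall()
for r in rows:
    print(r)
```

The Bristol branch has no matching property, so its property columns are NULL. RIGHT and FULL joins follow the same pattern, although older SQLite builds do not accept them.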
Example 29 Right Outer Join
List branches and properties in same city and any unmatched properties.
SELECT b.*, p.*
FROM Branch1 b RIGHT JOIN
     PropertyForRent1 p ON b.bCity = p.pCity;
Right Outer join includes those rows of second (right) table that are unmatched with rows from first (left) table.
Columns from first table are filled with NULLs.
Example 30 Full Outer Join
List branches and properties in same city and any unmatched branches or properties.
SELECT b.*, p.*
FROM Branch1 b FULL JOIN
     PropertyForRent1 p ON b.bCity = p.pCity;
Includes rows that are unmatched in both tables.
Unmatched columns are filled with NULLs.
EXISTS and NOT EXISTS
EXISTS and NOT EXISTS are for use only with subqueries.
Produce a simple true/false result.
True if and only if there exists at least one row in result table returned by subquery.
False if subquery returns an empty result table.
NOT EXISTS is the opposite of EXISTS.
As (NOT) EXISTS check only for existence or non-existence of rows in subquery result table, subquery can contain any number of columns.
Common for subqueries following (NOT) EXISTS to be of form:
(SELECT * ...)
Example 31 Query using EXISTS
Find all staff who work in a London branch.
SELECT staffNo, fName, lName, position
FROM Staff s
WHERE EXISTS
  (SELECT *
   FROM Branch b
   WHERE s.branchNo = b.branchNo AND
         city = 'London');
[Result table for Example 31]
Note, search condition s.branchNo = b.branchNo is necessary to consider correct branch record for each member of staff.
If omitted, would get all staff records listed out because subquery:
SELECT * FROM Branch WHERE city = 'London'
would always be true and query would be:
SELECT staffNo, fName, lName, position FROM Staff WHERE true;
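The effect of dropping the correlating condition can be reproduced with two staff rows. This is a minimal sketch with Python's built-in sqlite3 module; the cut-down Staff and Branch tables and their rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Staff (staffNo TEXT, branchNo TEXT)")
conn.execute("CREATE TABLE Branch (branchNo TEXT, city TEXT)")
conn.execute("INSERT INTO Staff VALUES ('SL21', 'B005'), ('SG37', 'B003')")
conn.execute("INSERT INTO Branch VALUES ('B005', 'London'), ('B003', 'Glasgow')")

# Correlated: the subquery is evaluated against each staff member's own branch.
correlated = conn.execute(
    """
    SELECT staffNo FROM Staff s
    WHERE EXISTS (SELECT * FROM Branch b
                  WHERE s.branchNo = b.branchNo AND city = 'London')
    ORDER BY staffNo
    """
).fetchall()

# Uncorrelated: the subquery is true whenever any London branch exists,
# so every staff row passes.
uncorrelated = conn.execute(
    """
    SELECT staffNo FROM Staff
    WHERE EXISTS (SELECT * FROM Branch WHERE city = 'London')
    ORDER BY staffNo
    """
).fetchall()
print(correlated, uncorrelated)
```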
Could also write this query using join construct:
SELECT staffNo, fName, lName, position
FROM Staff s, Branch b
WHERE s.branchNo = b.branchNo AND
      city = 'London';
Union, Intersect, and Difference (Except)
Can use normal set operations of Union, Intersection, and Difference to combine results of two or more queries into a single result table.
Union of two tables, A and B, is table containing all rows in either A or B or both.
Intersection is table containing all rows common to both A and B.
Difference is table containing all rows in A but not in B.
Two tables must be union compatible.
Format of set operator clause in each case is:
op [ALL] [CORRESPONDING [BY {column1 [, ...]}]]
If CORRESPONDING BY specified, set operation performed on the named column(s).
If CORRESPONDING specified but not BY clause, operation performed on common columns.
If ALL specified, result can include duplicate rows.
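The three set operators and the effect of ALL can be compared side by side. This is a minimal sketch with Python's built-in sqlite3 module and invented city values; SQLite does not accept parentheses around the operand queries or the CORRESPONDING clause, so the statements are written without them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Branch (city TEXT)")
conn.execute("CREATE TABLE PropertyForRent (city TEXT)")
conn.executemany("INSERT INTO Branch VALUES (?)",
                 [("London",), ("Glasgow",), ("Bristol",), ("London",)])
conn.executemany("INSERT INTO PropertyForRent VALUES (?)",
                 [("London",), ("Glasgow",), ("Aberdeen",)])


def cities(op):
    """Run 'Branch op PropertyForRent' on the city column and return a sorted list."""
    sql = f"SELECT city FROM Branch {op} SELECT city FROM PropertyForRent"
    return sorted(row[0] for row in conn.execute(sql))


union = cities("UNION")          # rows in either table, duplicates removed
union_all = cities("UNION ALL")  # duplicates kept
intersect = cities("INTERSECT")  # rows common to both
except_ = cities("EXCEPT")       # rows in Branch but not in PropertyForRent
print(union, len(union_all), intersect, except_)
```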
Example 32 Use of UNION
List all cities where there is either a branch office or a property.
(SELECT city
 FROM Branch
 WHERE city IS NOT NULL)
UNION
(SELECT city
 FROM PropertyForRent
 WHERE city IS NOT NULL);
Or
(SELECT *
 FROM Branch
 WHERE city IS NOT NULL)
UNION CORRESPONDING BY city
(SELECT *
 FROM PropertyForRent
 WHERE city IS NOT NULL);
Produces result tables from both queries and merges both tables together.
Example 33 Use of INTERSECT
List all cities where there is both a branch office and a property.
(SELECT city FROM Branch)
INTERSECT
(SELECT city FROM PropertyForRent);
Or
(SELECT * FROM Branch)
INTERSECT CORRESPONDING BY city
(SELECT * FROM PropertyForRent);
Could rewrite this query without INTERSECT operator:
SELECT DISTINCT b.city
FROM Branch b, PropertyForRent p
WHERE b.city = p.city;
Or:
SELECT DISTINCT city
FROM Branch b
WHERE EXISTS
  (SELECT *
   FROM PropertyForRent p
   WHERE p.city = b.city);
Example 34 Use of EXCEPT
List of all cities where there is a branch office but no properties.
(SELECT city FROM Branch)
EXCEPT
(SELECT city FROM PropertyForRent);
Or
(SELECT * FROM Branch)
EXCEPT CORRESPONDING BY city
(SELECT * FROM PropertyForRent);
Could rewrite this query without EXCEPT:
SELECT DISTINCT city
FROM Branch
WHERE city NOT IN
  (SELECT city
   FROM PropertyForRent);
Or
SELECT DISTINCT city
FROM Branch b
WHERE NOT EXISTS
  (SELECT *
   FROM PropertyForRent p
   WHERE p.city = b.city);
INSERT
INSERT INTO TableName [ (columnList) ]
VALUES (dataValueList)
columnList is optional; if omitted, SQL assumes a list of all columns in their original CREATE TABLE order.
Any columns omitted must have been declared as NULL when table was created, unless DEFAULT was specified when creating column.
dataValueList must match columnList as follows:
- number of items in each list must be same;
- must be direct correspondence in position of items in two lists;
- data type of each item in dataValueList must be compatible with data type of corresponding column.
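The rules for omitted columns and list lengths can be exercised directly. This is a minimal sketch with Python's built-in sqlite3 module; the four-column Staff table, its DEFAULT, and the row values are hypothetical, and the exact exception types are SQLite-specific.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Staff (
        staffNo TEXT NOT NULL,
        lName   TEXT NOT NULL,
        sex     TEXT,                -- nullable: may be omitted
        salary  INTEGER DEFAULT 0    -- has a default: may be omitted
    )
    """
)

# Omitted columns take NULL or their DEFAULT.
conn.execute("INSERT INTO Staff (staffNo, lName) VALUES ('SG44', 'Jones')")
row = conn.execute("SELECT * FROM Staff").fetchone()

# Without a columnList, the number of values must match the number of columns.
try:
    conn.execute("INSERT INTO Staff VALUES ('SG45', 'Smith')")
    mismatch_rejected = False
except sqlite3.Error:
    mismatch_rejected = True

# A NOT NULL column without a default cannot be omitted.
try:
    conn.execute("INSERT INTO Staff (staffNo) VALUES ('SG46')")
    missing_rejected = False
except sqlite3.Error:
    missing_rejected = True

print(row, mismatch_rejected, missing_rejected)
```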
Example 35 INSERT … VALUES
Insert a new row into Staff table supplying data for all columns.
INSERT INTO Staff
VALUES ('SG16', 'Alan', 'Brown', 'Assistant', 'M', DATE '1957-05-25', 8300, 'B003');
Example 36 INSERT using Defaults
119
Insert a new row into Staff table supplying data forall mandatory columns.
INSERT INTO Staff (staffNo, fName, lName,position, salary, branchNo)
VALUES (‘SG44’,‘Anne’,‘Jones’,‘Assistant’, 8100,‘B003’);
OrINSERT INTO StaffVALUES (‘SG44’,‘Anne’,‘Jones’,‘Assistant’, NULL,
NULL, 8100,‘B003’);
Original Slides by T. Connolly
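The two forms in Example 36 can be tried out as follows. This is a sketch using Python's sqlite3 module with a simplified Staff table: the column types, the column name DOB, and the rows for SG45 and SG46 are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified Staff table; column types and the name DOB are assumptions.
conn.execute("""
    CREATE TABLE Staff (
        staffNo  TEXT PRIMARY KEY,
        fName    TEXT NOT NULL,
        lName    TEXT NOT NULL,
        position TEXT NOT NULL,
        sex      TEXT,
        DOB      TEXT,
        salary   REAL NOT NULL,
        branchNo TEXT NOT NULL
    )
""")

# Form 1: name only the mandatory columns; sex and DOB are left NULL.
conn.execute(
    "INSERT INTO Staff (staffNo, fName, lName, position, salary, branchNo) "
    "VALUES ('SG44', 'Anne', 'Jones', 'Assistant', 8100, 'B003')"
)
# Form 2: omit the column list and pass NULL explicitly, in CREATE TABLE order.
# (SG45 is a hypothetical second row, used to avoid a duplicate primary key.)
conn.execute(
    "INSERT INTO Staff "
    "VALUES ('SG45', 'Ann', 'Smith', 'Assistant', NULL, NULL, 8100, 'B003')"
)
rows = conn.execute(
    "SELECT staffNo, sex, DOB FROM Staff ORDER BY staffNo").fetchall()

# Omitting a NOT NULL column that has no DEFAULT is rejected.
try:
    conn.execute("INSERT INTO Staff (staffNo, fName) VALUES ('SG46', 'Bob')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(rows, rejected)
```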
INSERT … SELECT
Second form of INSERT allows multiple rows to be copied from one or more tables to another:
    INSERT INTO TableName [ (columnList) ]
    SELECT ...
Example 37 INSERT … SELECT
Assume there is a table StaffPropCount that contains names of staff and number of properties they manage:
    StaffPropCount(staffNo, fName, lName, propCnt)
Populate StaffPropCount using Staff and PropertyForRent tables.
    INSERT INTO StaffPropCount
    (SELECT s.staffNo, fName, lName, COUNT(*)
     FROM Staff s, PropertyForRent p
     WHERE s.staffNo = p.staffNo
     GROUP BY s.staffNo, fName, lName)
    UNION
    (SELECT staffNo, fName, lName, 0
     FROM Staff
     WHERE staffNo NOT IN
         (SELECT DISTINCT staffNo FROM PropertyForRent));
If the second part of the UNION is omitted, the result excludes those staff who currently do not manage any properties.
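The effect of the two halves of the UNION in Example 37 can be seen on a few rows. This sketch uses Python's sqlite3 module; the sample staff and properties are hypothetical, and the parentheses around each SELECT are dropped because SQLite does not accept them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data; table and column names follow the slide.
conn.executescript("""
    CREATE TABLE Staff (staffNo TEXT PRIMARY KEY, fName TEXT, lName TEXT);
    CREATE TABLE PropertyForRent (propertyNo TEXT PRIMARY KEY, staffNo TEXT);
    CREATE TABLE StaffPropCount (staffNo TEXT, fName TEXT, lName TEXT,
                                 propCnt INTEGER);
    INSERT INTO Staff VALUES ('SG14', 'David', 'Ford'), ('SG37', 'Ann', 'Beech'),
                             ('SL21', 'John', 'White');
    INSERT INTO PropertyForRent VALUES ('PG4', 'SG14'), ('PG16', 'SG14'),
                                       ('PG21', 'SG37');

    -- First SELECT: staff who manage at least one property.
    -- Second SELECT: staff who manage none, with a count of 0.
    INSERT INTO StaffPropCount
        SELECT s.staffNo, fName, lName, COUNT(*)
        FROM Staff s, PropertyForRent p
        WHERE s.staffNo = p.staffNo
        GROUP BY s.staffNo, fName, lName
        UNION
        SELECT staffNo, fName, lName, 0
        FROM Staff
        WHERE staffNo NOT IN (SELECT DISTINCT staffNo FROM PropertyForRent);
""")

counts = conn.execute(
    "SELECT staffNo, propCnt FROM StaffPropCount ORDER BY staffNo").fetchall()
print(counts)
```

Without the second SELECT, the row for SL21 would be missing, because the join in the first SELECT only produces staff who appear in PropertyForRent.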
UPDATE
    UPDATE TableName
    SET columnName1 = dataValue1
        [, columnName2 = dataValue2...]
    [WHERE searchCondition]
TableName can be name of a base table or an updatable view.
SET clause specifies names of one or more columns that are to be updated.
WHERE clause is optional:
  if omitted, named columns are updated for all rows in table;
  if specified, only those rows that satisfy searchCondition are updated.
New dataValue(s) must be compatible with data type for corresponding column.
Example 38/39 UPDATE All Rows and Specific Rows
Give all staff a 3% pay increase.
    UPDATE Staff
    SET salary = salary*1.03;
Give all Managers a 5% pay increase.
    UPDATE Staff
    SET salary = salary*1.05
    WHERE position = 'Manager';
Example 40 UPDATE Multiple Columns
Promote David Ford (staffNo = 'SG14') to Manager and change his salary to £18,000.
    UPDATE Staff
    SET position = 'Manager', salary = 18000
    WHERE staffNo = 'SG14';
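Examples 38 to 40 can be run in sequence to see which rows each statement touches. This is a sketch using Python's sqlite3 module; the three starting rows and their salaries are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical starting rows; only the column names come from the slides.
conn.executescript("""
    CREATE TABLE Staff (staffNo TEXT PRIMARY KEY, position TEXT, salary REAL);
    INSERT INTO Staff VALUES ('SG14', 'Supervisor', 10000),
                             ('SG5',  'Manager',    20000),
                             ('SG37', 'Assistant',  12000);
""")

# No WHERE clause: every row is updated.
conn.execute("UPDATE Staff SET salary = salary * 1.03")
# WHERE clause: only rows for managers are updated.
conn.execute("UPDATE Staff SET salary = salary * 1.05 WHERE position = 'Manager'")
# Several columns can be changed in one SET clause.
conn.execute(
    "UPDATE Staff SET position = 'Manager', salary = 18000 WHERE staffNo = 'SG14'")

# Round to two decimals to hide floating-point noise from the multiplications.
result = {
    staff_no: (position, round(salary, 2))
    for staff_no, position, salary in conn.execute(
        "SELECT staffNo, position, salary FROM Staff")
}
print(result)
```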
DELETE
    DELETE FROM TableName
    [WHERE searchCondition]
TableName can be name of a base table or an updatable view.
searchCondition is optional; if omitted, all rows are deleted from table. This does not delete the table itself. If searchCondition is specified, only those rows that satisfy the condition are deleted.
Example 41/42 DELETE Specific Rows and All Rows
Delete all viewings that relate to property PG4.
    DELETE FROM Viewing
    WHERE propertyNo = 'PG4';
Delete all records from the Viewing table.
    DELETE FROM Viewing;
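The difference between deleting rows and deleting the table can be verified directly. This sketch uses Python's sqlite3 module with hypothetical Viewing rows; the check against sqlite_master is SQLite-specific.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical rows; clientNo is an assumed column name.
conn.executescript("""
    CREATE TABLE Viewing (clientNo TEXT, propertyNo TEXT);
    INSERT INTO Viewing VALUES ('CR56', 'PG4'), ('CR76', 'PG4'), ('CR56', 'PG36');
""")

# Example 41: only rows that satisfy the search condition are deleted.
deleted_pg4 = conn.execute(
    "DELETE FROM Viewing WHERE propertyNo = 'PG4'").rowcount

# Example 42: no WHERE clause, so every remaining row is deleted.
conn.execute("DELETE FROM Viewing")
remaining = conn.execute("SELECT COUNT(*) FROM Viewing").fetchone()[0]

# The table definition itself is still in the catalog.
table_still_exists = conn.execute(
    "SELECT COUNT(*) FROM sqlite_master WHERE type = 'table' AND name = 'Viewing'"
).fetchone()[0] == 1

print(deleted_pg4, remaining, table_still_exists)
```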
[Figure: ISO SQL Data Types]
Integrity Enhancement Feature
Consider five types of integrity constraints:
  required data
  domain constraints
  entity integrity
  referential integrity
  general constraints.
Required Data
    position VARCHAR(10) NOT NULL
Domain Constraints
(a) CHECK
    sex CHAR NOT NULL
        CHECK (sex IN ('M', 'F'))
(b) CREATE DOMAIN
    CREATE DOMAIN DomainName [AS] dataType
    [DEFAULT defaultOption]
    [CHECK (searchCondition)]
For example:
    CREATE DOMAIN SexType AS CHAR
        CHECK (VALUE IN ('M', 'F'));
    sex SexType NOT NULL
searchCondition can involve a table lookup:
    CREATE DOMAIN BranchNo AS CHAR(4)
        CHECK (VALUE IN (SELECT branchNo FROM Branch));
Domains can be removed using DROP DOMAIN:
    DROP DOMAIN DomainName [RESTRICT | CASCADE]
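The required-data and CHECK constraints above can be seen rejecting bad rows in a small test. This is a sketch using Python's sqlite3 module; SQLite has no CREATE DOMAIN, so only the column-level NOT NULL and CHECK forms are shown, and the helper try_insert and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Staff (
        staffNo  TEXT PRIMARY KEY,
        position VARCHAR(10) NOT NULL,
        sex      CHAR NOT NULL CHECK (sex IN ('M', 'F'))
    )
""")

def try_insert(staff_no, position, sex):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO Staff VALUES (?, ?, ?)",
                     (staff_no, position, sex))
        return True
    except sqlite3.IntegrityError:
        return False

results = [
    try_insert('SG16', 'Assistant', 'M'),  # satisfies both constraints
    try_insert('SG17', None, 'F'),         # violates NOT NULL on position
    try_insert('SG18', 'Assistant', 'X'),  # violates CHECK on sex
]
print(results)
```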
IEF - Entity Integrity
Primary key of a table must contain a unique, non-null value for each row.
ISO standard supports PRIMARY KEY clause in CREATE and ALTER TABLE statements:
    PRIMARY KEY(staffNo)
    PRIMARY KEY(clientNo, propertyNo)
Can only have one PRIMARY KEY clause per table. Can still ensure uniqueness for alternate keys using UNIQUE:
    UNIQUE(telNo)
IEF - Referential Integrity
FK is column or set of columns that links each row in child table containing the FK to row of parent table containing matching PK.
Referential integrity means that, if FK contains a value, that value must refer to existing row in parent table.
ISO standard supports definition of FKs with FOREIGN KEY clause in CREATE and ALTER TABLE:
    FOREIGN KEY(branchNo) REFERENCES Branch
Any INSERT/UPDATE attempting to create FK value in child table without matching candidate key (CK) value in parent is rejected.
Action taken attempting to update/delete a CK value in parent table with matching rows in child is dependent on referential action specified using ON UPDATE and ON DELETE subclauses:
    CASCADE
    SET NULL
    SET DEFAULT
    NO ACTION
CASCADE: Delete row from parent and delete matching rows in child, and so on in cascading manner.
SET NULL: Delete row from parent and set FK column(s) in child to NULL. Only valid if FK columns are not declared NOT NULL.
SET DEFAULT: Delete row from parent and set each component of FK in child to specified default. Only valid if DEFAULT specified for FK columns.
NO ACTION: Reject delete from parent. Default.
    FOREIGN KEY (staffNo) REFERENCES Staff ON DELETE SET NULL
    FOREIGN KEY (ownerNo) REFERENCES Owner ON UPDATE CASCADE
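The two referential actions in the clauses above can be observed on a single child row. This is a sketch using Python's sqlite3 module with hypothetical rows; note that SQLite only enforces foreign keys after PRAGMA foreign_keys = ON, which is SQLite-specific and not part of the ISO standard.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
# Hypothetical rows; the FOREIGN KEY clauses are the ones shown on the slide.
conn.executescript("""
    CREATE TABLE Staff (staffNo TEXT PRIMARY KEY);
    CREATE TABLE Owner (ownerNo TEXT PRIMARY KEY);
    CREATE TABLE PropertyForRent (
        propertyNo TEXT PRIMARY KEY,
        staffNo    TEXT,
        ownerNo    TEXT NOT NULL,
        FOREIGN KEY (staffNo) REFERENCES Staff ON DELETE SET NULL,
        FOREIGN KEY (ownerNo) REFERENCES Owner ON UPDATE CASCADE
    );
    INSERT INTO Staff VALUES ('SG14');
    INSERT INTO Owner VALUES ('CO46');
    INSERT INTO PropertyForRent VALUES ('PG4', 'SG14', 'CO46');
""")

# An FK value with no matching row in the parent table is rejected.
try:
    conn.execute("INSERT INTO PropertyForRent VALUES ('PG9', 'SG99', 'CO46')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# ON DELETE SET NULL: deleting the staff row nulls the child's staffNo.
conn.execute("DELETE FROM Staff WHERE staffNo = 'SG14'")
# ON UPDATE CASCADE: changing the owner's key is propagated to the child.
conn.execute("UPDATE Owner SET ownerNo = 'CO99' WHERE ownerNo = 'CO46'")

child = conn.execute(
    "SELECT staffNo, ownerNo FROM PropertyForRent WHERE propertyNo = 'PG4'"
).fetchone()
print(orphan_rejected, child)
```

SET NULL works here because staffNo in the child table is not declared NOT NULL.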
IEF - General Constraints
Could use CHECK/UNIQUE in CREATE and ALTER TABLE.
Similar to the CHECK clause, also have:
    CREATE ASSERTION AssertionName
    CHECK (searchCondition)
For example:
    CREATE ASSERTION StaffNotHandlingTooMuch
    CHECK (NOT EXISTS (SELECT staffNo
                       FROM PropertyForRent
                       GROUP BY staffNo
                       HAVING COUNT(*) > 100))
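SQLite does not implement CREATE ASSERTION, but the search condition of StaffNotHandlingTooMuch can still be evaluated as an ordinary query. The sketch below does this with Python's sqlite3 module; the function assertion_holds and the generated property numbers are hypothetical, and a DBMS that supported assertions would run the check automatically rather than on request.

```python
import sqlite3

LIMIT = 100  # threshold used in the slide's assertion

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE PropertyForRent (propertyNo TEXT PRIMARY KEY, staffNo TEXT)")

def assertion_holds():
    """Evaluate the assertion's NOT EXISTS condition as a plain query."""
    row = conn.execute(
        """SELECT NOT EXISTS (SELECT staffNo
                              FROM PropertyForRent
                              GROUP BY staffNo
                              HAVING COUNT(*) > ?)""",
        (LIMIT,),
    ).fetchone()
    return bool(row[0])

# Give one member of staff exactly 100 properties: the condition still holds.
conn.executemany("INSERT INTO PropertyForRent VALUES (?, 'SG14')",
                 [(f"P{i}",) for i in range(100)])
holds_at_100 = assertion_holds()

# One more property takes the count to 101 and violates the condition.
conn.execute("INSERT INTO PropertyForRent VALUES ('P100', 'SG14')")
holds_at_101 = assertion_holds()

print(holds_at_100, holds_at_101)
```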
Data Definition
SQL DDL allows database objects such as schemas, domains, tables, views, and indexes to be created and destroyed.
Main SQL DDL statements are:
    CREATE SCHEMA                DROP SCHEMA
    CREATE/ALTER DOMAIN          DROP DOMAIN
    CREATE/ALTER TABLE           DROP TABLE
    CREATE VIEW                  DROP VIEW
Many DBMSs also provide:
    CREATE INDEX                 DROP INDEX
Relations and other database objects exist in an environment.
Each environment contains one or more catalogs, and each catalog consists of a set of schemas.
Schema is a named collection of related database objects.
Objects in a schema can be tables, views, domains, assertions, collations, translations, and character sets. All have same owner.
CREATE SCHEMA
    CREATE SCHEMA [Name | AUTHORIZATION CreatorId]
    DROP SCHEMA Name [RESTRICT | CASCADE]
With RESTRICT (default), schema must be empty or operation fails.
With CASCADE, operation cascades to drop all objects associated with schema in order defined above. If any of these operations fail, DROP SCHEMA fails.
CREATE TABLE
    CREATE TABLE TableName
    {(colName dataType [NOT NULL] [UNIQUE]
    [DEFAULT defaultOption]
    [CHECK searchCondition] [,...]}
    [PRIMARY KEY (listOfColumns),]
    {[UNIQUE (listOfColumns),] […,]}
    {[FOREIGN KEY (listOfFKColumns)
      REFERENCES ParentTableName [(listOfCKColumns)],
      [ON UPDATE referentialAction]
      [ON DELETE referentialAction ]] [,…]}
    {[CHECK (searchCondition)] [,…] })
Creates a table with one or more columns of the specified dataType.
With NOT NULL, system rejects any attempt to insert a null in the column.
Can specify a DEFAULT value for the column.
Primary keys should always be specified as NOT NULL.
FOREIGN KEY clause specifies FK along with the referential action.
Example 43 - CREATE TABLE
    CREATE DOMAIN OwnerNumber AS VARCHAR(5)
        CHECK (VALUE IN (SELECT ownerNo FROM PrivateOwner));
    CREATE DOMAIN StaffNumber AS VARCHAR(5)
        CHECK (VALUE IN (SELECT staffNo FROM Staff));
    CREATE DOMAIN PNumber AS VARCHAR(5);
    CREATE DOMAIN PRooms AS SMALLINT
        CHECK (VALUE BETWEEN 1 AND 15);
    CREATE DOMAIN PRent AS DECIMAL(6,2)
        CHECK (VALUE BETWEEN 0 AND 9999.99);
    CREATE TABLE PropertyForRent (
        propertyNo PNumber NOT NULL, ….
        rooms PRooms NOT NULL DEFAULT 4,
        rent PRent NOT NULL DEFAULT 600,
        ownerNo OwnerNumber NOT NULL,
        staffNo StaffNumber
            CONSTRAINT StaffNotHandlingTooMuch ….
        branchNo BranchNumber NOT NULL,
        PRIMARY KEY (propertyNo),
        FOREIGN KEY (staffNo) REFERENCES Staff
            ON DELETE SET NULL ON UPDATE CASCADE ….);
ALTER TABLE
Add a new column to a table.
Drop a column from a table.
Add a new table constraint.
Drop a table constraint.
Set a default for a column.
Drop a default for a column.
Example 44(a) - ALTER TABLE
Change Staff table by removing default of ‘Assistant’ for position column and setting default for sex column to female (‘F’).
    ALTER TABLE Staff
        ALTER position DROP DEFAULT;
    ALTER TABLE Staff
        ALTER sex SET DEFAULT 'F';
Example 44(b) - ALTER TABLE
Remove constraint from PropertyForRent that staff are not allowed to handle more than 100 properties at a time. Add new column to Client table.
    ALTER TABLE PropertyForRent
        DROP CONSTRAINT StaffNotHandlingTooMuch;
    ALTER TABLE Client
        ADD prefNoRooms PRooms;
DROP TABLE
    DROP TABLE TableName [RESTRICT | CASCADE]
e.g.
    DROP TABLE PropertyForRent;
Removes named table and all rows within it.
With RESTRICT, if any other objects depend for their existence on continued existence of this table, SQL does not allow request.
With CASCADE, SQL drops all dependent objects (and objects dependent on these objects).
Chapter 8
Fundamental Database and Information System
Chapter 8 - Objectives
Function and importance of transactions.
Properties of transactions.
Concurrency Control
Recovery Control
Distributed DBMS
Data Warehouse
Business Intelligence
OLAP
Data Mining
Transaction Support
Transaction
  Action, or series of actions, carried out by user or application, which reads or updates contents of database.
Logical unit of work on the database.
Application program is series of transactions with non-database processing in between.
Transforms database from one consistent state to another, although consistency may be violated during transaction.
[Figure: Example Transaction]
Transaction Support
Can have one of two outcomes:
  Success - transaction commits and database reaches a new consistent state.
  Failure - transaction aborts, and database must be restored to consistent state before it started. Such a transaction is rolled back or undone.
Committed transaction cannot be aborted.
Aborted transaction that is rolled back can be restarted later.
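The two outcomes can be illustrated with a transfer between two accounts. This is a sketch using Python's sqlite3 module; the Account table, the transfer function, and the balances are hypothetical and are not the example shown in the slide's figure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: the CHECK constraint defines what "consistent" means here.
conn.executescript("""
    CREATE TABLE Account (id TEXT PRIMARY KEY,
                          balance INTEGER CHECK (balance >= 0));
    INSERT INTO Account VALUES ('A', 100), ('B', 50);
""")

def transfer(src, dst, amount):
    """Move amount from src to dst as one transaction; True means committed."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE Account SET balance = balance + ? WHERE id = ?",
                (amount, dst))
            conn.execute(
                "UPDATE Account SET balance = balance - ? WHERE id = ?",
                (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

def balances():
    """Return the current balances as a dict of id -> balance."""
    return dict(conn.execute("SELECT id, balance FROM Account"))

committed = transfer('A', 'B', 30)    # success: both updates are kept
after_commit = balances()
aborted = not transfer('A', 'B', 500)  # failure: the credit to B is undone too
after_abort = balances()

print(committed, after_commit, aborted, after_abort)
```

In the failed transfer the credit to B is applied first and is consistent on its own, but it is still undone when the debit from A violates the constraint, so the database returns to the state before the transaction started.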
[Figure: State Transition Diagram for Transaction]
Properties of Transactions
Four basic (ACID) properties of a transaction are:
  Atomicity: ‘All or nothing’ property.
  Consistency: Must transform database from one consistent state to another.
  Isolation: Partial effects of incomplete transactions should not be visible to other transactions.
  Durability: Effects of a committed transaction are permanent and must not be lost because of later failure.
[Figure: DBMS Transaction Subsystem]
Concurrency Control
Process of managing simultaneous operations on the database without having them interfere with one another.
Prevents interference when two or more users are accessing database simultaneously and at least one is updating data.
Although two transactions may be correct in themselves, interleaving of operations may produce an incorrect result.
Concurrency Control Techniques
Two basic concurrency control techniques:
  Locking,
  Timestamping.
Both are conservative approaches: delay transactions in case they conflict with other transactions.
Optimistic methods assume conflict is rare and only check for conflicts at commit.
Locking
Transaction uses locks to deny access to other transactions and so prevent incorrect updates.
Most widely used approach to ensure serializability.
Generally, a transaction must claim a shared (read) or exclusive (write) lock on a data item before read or write.
Lock prevents another transaction from modifying item or even reading it, in the case of a write lock.
Deadlock might occur.
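The shared/exclusive rule can be sketched as a small in-memory lock table. The LockManager class below is hypothetical and deliberately simplified: it only answers whether a request could be granted immediately and does not queue waiting transactions or detect deadlock.

```python
from collections import defaultdict

class LockManager:
    """Grants shared (S) and exclusive (X) locks on data items; no queuing."""

    def __init__(self):
        # item -> {transaction id: lock mode held on that item}
        self.held = defaultdict(dict)

    def request(self, txn, item, mode):
        """Return True if the lock is granted, False if txn would have to wait."""
        others = {t: m for t, m in self.held[item].items() if t != txn}
        if mode == "S":
            # A shared lock is compatible only with other shared locks.
            compatible = all(m == "S" for m in others.values())
        else:
            # An exclusive lock is compatible with no lock held by others.
            compatible = not others
        if compatible and self.held[item].get(txn) != "X":
            # Record the lock (or upgrade S to X); never downgrade an X lock.
            self.held[item][txn] = mode
        return compatible

    def release_all(self, txn):
        """Release every lock held by txn, e.g. at commit or abort."""
        for locks in self.held.values():
            locks.pop(txn, None)

lm = LockManager()
trace = [
    lm.request("T1", "x", "S"),  # first reader
    lm.request("T2", "x", "S"),  # second reader shares the item
    lm.request("T2", "x", "X"),  # writer blocked while T1 still reads
]
lm.release_all("T1")
trace.append(lm.request("T2", "x", "X"))  # upgrade succeeds once T1 is gone
trace.append(lm.request("T1", "x", "S"))  # reader blocked by the write lock
print(trace)
```

Deadlock arises in a real lock manager when two transactions each hold a lock the other is waiting for; since this sketch returns False instead of waiting, it cannot deadlock itself.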
Timestamping
Transactions ordered globally so that older transactions, transactions with smaller timestamps, get priority in the event of conflict.
Conflict is resolved by rolling back and restarting transaction.
No locks so no deadlock.
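The slide does not spell out the conflict rule, so the following sketch assumes the commonly described basic timestamp ordering rule: each item records the largest timestamps that have read and written it, and an operation that arrives after a younger transaction has already touched the item in a conflicting way is refused, so its transaction must be rolled back and restarted. The class and method names are hypothetical.

```python
class TimestampOrdering:
    """Basic timestamp ordering over single data items (simplified sketch)."""

    def __init__(self):
        self.read_ts = {}   # item -> largest timestamp that has read it
        self.write_ts = {}  # item -> largest timestamp that has written it

    def read(self, ts, item):
        """Return True if the read is allowed, False if the reader must restart."""
        if ts < self.write_ts.get(item, 0):
            return False  # a younger transaction has already overwritten the item
        self.read_ts[item] = max(self.read_ts.get(item, 0), ts)
        return True

    def write(self, ts, item):
        """Return True if the write is allowed, False if the writer must restart."""
        if ts < self.read_ts.get(item, 0) or ts < self.write_ts.get(item, 0):
            return False  # a younger transaction has already read or written it
        self.write_ts[item] = ts
        return True

to = TimestampOrdering()
trace = [
    to.write(2, "x"),  # allowed: nothing has touched x yet
    to.read(1, "x"),   # refused: x was written by a younger transaction (ts 2)
    to.read(3, "x"),   # allowed: reader is younger than the last writer
    to.write(2, "x"),  # refused: x was read by a younger transaction (ts 3)
    to.write(4, "x"),  # allowed
]
print(trace)
```

No operation ever waits in this scheme, which is why deadlock cannot occur; the cost is that refused transactions have to be restarted.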
Database Recovery
Process of restoring database to a correct state in the event of a failure.
Need for Recovery Control
  Two types of storage: volatile (main memory) and nonvolatile.
  Volatile storage does not survive system crashes.
  Stable storage represents information that has been replicated in several nonvolatile storage media with independent failure modes.
Types of Failures
System crashes, resulting in loss of main memory.
Media failures, resulting in loss of parts of secondary storage.
Application software errors.
Natural physical disasters.
Carelessness or unintentional destruction of data or facilities.
Sabotage.
Transactions and Recovery
Transactions represent basic unit of recovery.
Recovery manager responsible for atomicity and durability.
If failure occurs between commit and database buffers being flushed to secondary storage then, to ensure durability, recovery manager has to redo (rollforward) transaction’s updates.
If transaction had not committed at failure time, recovery manager has to undo (rollback) any effects of that transaction for atomicity.
Partial undo - only one transaction has to be undone.
Global undo - all transactions have to be undone.
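Redo and undo can be sketched with a log that records the old and new value of every write. The log format and the recover function below are assumptions made for illustration, not a description of any particular DBMS; the sketch also assumes that no transaction overwrites an item written by another transaction that has not yet committed.

```python
def recover(log, disk):
    """Rebuild a consistent state from a log after a crash (simplified).

    log:  list of records, ("write", txn, item, old, new) or ("commit", txn)
    disk: dict of item -> value found on nonvolatile storage after the crash
    """
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    state = dict(disk)
    # Redo (rollforward): reapply writes of committed transactions in log order,
    # in case their updates had not been flushed before the failure.
    for rec in log:
        if rec[0] == "write" and rec[1] in committed:
            _, _, item, _, new = rec
            state[item] = new
    # Undo (rollback): restore old values of uncommitted writes, newest first.
    for rec in reversed(log):
        if rec[0] == "write" and rec[1] not in committed:
            _, _, item, old, _ = rec
            state[item] = old
    return state

# T1 committed, but its update to x never reached the disk.
# T2 had not committed, but its update to y had already been flushed.
log = [
    ("write", "T1", "x", 1, 5),
    ("commit", "T1"),
    ("write", "T2", "y", 10, 20),
    ("write", "T2", "x", 5, 7),
]
disk_after_crash = {"x": 1, "y": 20}
recovered = recover(log, disk_after_crash)
print(recovered)
```

After recovery x holds T1's committed value (durability) and y is back to its value before T2 started (atomicity).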
Distributed Database Concepts
Distributed Database
  A logically interrelated collection of shared data (and a description of this data), physically distributed over a computer network.
Distributed DBMS
  Software system that permits the management of the distributed database and makes the distribution transparent to users.
Concepts
Collection of logically-related shared data.
Data split into fragments.
Fragments may be replicated.
Fragments/replicas allocated to sites.
Sites linked by a communications network.
Data at each site is under control of a DBMS.
DBMSs handle local applications autonomously.
Each DBMS participates in at least one global application.
[Figure: Distributed DBMS]
Distributed Processing
A centralized database that can be accessed over a computer network.
Advantages of DDBMSs
Reflects organizational structure
Improved shareability and local autonomy
Improved availability
Improved reliability
Improved performance
Economics
Modular growth
Disadvantages of DDBMSs
Complexity
Cost
Security
Integrity control more difficult
Lack of standards
Lack of experience
Database design more complex
Data Warehousing Concepts
A subject-oriented, integrated, time-variant, and non-volatile collection of data in support of management’s decision-making process (Inmon, 1993).
Benefits of Data Warehousing
Potential high returns on investment
Competitive advantage
Increased productivity of corporate decision-makers
[Figure: Comparison of OLTP Systems and Data Warehousing]
Data Mart
A subset of a data warehouse that supports the requirements of a particular department or business function.
Characteristics include:
  Focuses on only the requirements of one department or business function.
  Does not normally contain detailed operational data, unlike data warehouses.
  More easily understood and navigated.
[Figure: Typical Data Warehouse and Data Mart Architecture]
Business Intelligence Technologies
Accompanying the growth in data warehousing is an ever-increasing demand by users for more powerful access tools that provide advanced analytical capabilities.
There are two main types of access tools available to meet this demand, namely Online Analytical Processing (OLAP) and data mining.
OLAP and data mining differ in what they offer the user and because of this they are complementary technologies.
An environment that includes a data warehouse (or more commonly one or more data marts) together with tools such as OLAP and/or data mining is collectively referred to as Business Intelligence (BI) technologies.
Online Analytical Processing (OLAP)
The dynamic synthesis, analysis, and consolidation of large volumes of multi-dimensional data (Codd, 1993).
Describes a technology that uses a multi-dimensional view of aggregate data to provide quick access to strategic information for the purposes of advanced analysis.
Enables users to gain a deeper understanding and knowledge about various aspects of their corporate data through fast, consistent, interactive access to a wide variety of possible views of the data.
Allows users to view corporate data in such a way that it is a better model of the true dimensionality of the enterprise.
Can easily answer ‘who?’ and ‘what?’ questions; however, the ability to answer ‘what if?’ and ‘why?’ type questions distinguishes OLAP from general-purpose query tools.
Types of analysis range from basic navigation and browsing (slicing and dicing) to calculations, to more complex analyses such as time series and complex modeling.
[Figure: Examples of OLAP applications in various functional areas]
[Figure: Multi-dimensional Data as Three-field Table versus Two-dimensional Matrix]
[Figure: Multi-dimensional Data as Four-field Table versus Three-dimensional Cube]
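The relationship between a three-field table and a two-dimensional matrix can be sketched in a few lines. The figures from the slides are not available in this transcript, so the dimensions (city and quarter) and all values below are hypothetical.

```python
# Hypothetical revenue data as a three-field table: (city, quarter, revenue).
rows = [
    ("Glasgow", "Q1", 10), ("Glasgow", "Q2", 20),
    ("London",  "Q1", 30), ("London",  "Q2", 40),
]

def to_matrix(records):
    """Pivot (row_key, col_key, value) records into a two-dimensional matrix."""
    row_keys = sorted({r for r, _, _ in records})
    col_keys = sorted({c for _, c, _ in records})
    cell = {(r, c): v for r, c, v in records}
    # Missing combinations appear as None in the matrix.
    matrix = [[cell.get((r, c)) for c in col_keys] for r in row_keys]
    return row_keys, col_keys, matrix

cities, quarters, matrix = to_matrix(rows)

# Consolidation along one dimension: total revenue per city.
totals_by_city = {city: sum(values) for city, values in zip(cities, matrix)}

print(quarters)
for city, values in zip(cities, matrix):
    print(city, values)
print(totals_by_city)
```

Adding a third key field, such as property type, would turn the matrix into a three-dimensional cube in the same way.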
OLAP Benefits
Increased productivity of end-users.
Reduced backlog of applications development for IT staff.
Retention of organizational control over the integrity of corporate data.
Reduced query drag and network traffic on OLTP systems or on the data warehouse.
Improved potential revenue and profitability.
Data Mining
The process of extracting valid, previously unknown, comprehensible, and actionable information from large databases and using it to make crucial business decisions (Simoudis, 1996).
Involves the analysis of data and the use of software techniques for finding hidden and unexpected patterns and relationships in sets of data.
Reveals information that is hidden and unexpected, as there is little value in finding patterns and relationships that are already intuitive.
Patterns and relationships are identified by examining the underlying rules and features in the data.
Tends to work from the data up, and the most accurate results normally require large volumes of data to deliver reliable conclusions.
Starts by developing an optimal representation of structure of sample data, during which time knowledge is acquired and extended to larger sets of data.
Data mining can provide huge paybacks for companies who have made a significant investment in data warehousing.
Relatively new technology, but already used in a number of industries.
Examples of Applications of Data Mining
Retail / Marketing
  Identifying buying patterns of customers
  Finding associations among customer demographic characteristics
  Predicting response to mailing campaigns
  Market basket analysis
Insurance
  Claims analysis
  Predicting which customers will buy new policies
Medicine
  Characterizing patient behavior to predict surgery visits
  Identifying successful medical therapies for different illnesses
Data Mining and Data Warehousing
A data warehouse is well equipped for providing data for mining.
Data quality and consistency are a prerequisite for mining to ensure the accuracy of the predictive models. Data warehouses are populated with clean, consistent data.
[Figure: Data Mining Operations and Associated Techniques]
[Figure: Example of Classification using Tree Induction]
[Figure: Example of Database Segmentation using a Scatterplot]
[Figure: Example of Database Segmentation using a Visualization]