
Chapter 1

Introduction to Database system

Chapter 1 - Objectives

Some common uses of database systems.
Characteristics of file-based systems.
Problems with file-based approach.
Meaning of the term database.
Meaning of the term Database Management System (DBMS).
Typical functions of a DBMS.
Major components of the DBMS environment.
Personnel involved in the DBMS environment.
History of the development of DBMSs.
Advantages and disadvantages of DBMSs.
Purpose of three-level database architecture.
Contents of external, conceptual, and internal levels.
Purpose of external/conceptual and conceptual/internal mappings.
Meaning of logical and physical data independence.
Distinction between DDL and DML.
A classification of data models.
Purpose/importance of conceptual modeling.
Typical functions and services a DBMS should provide.
Function and importance of system catalog.
Software components of a DBMS.
Function and uses of Transaction Processing Monitors.

Original Slides by T. Connolly


Examples of Database Applications

Purchases from the supermarket
Purchases using your credit card
Booking a holiday at the travel agents
Using the local library
Taking out insurance
Renting a video
Using the Internet
Studying at university


241-212 INTRODUCTION TO DATABASE AND INFORMATION SYSTEM

Dept of Computer Engineering, PSU 2/54 1

File-Based Systems

Collection of application programs that perform services for the end users (e.g. reports).

Each program defines and manages its own data.


File-Based Processing


Limitations of File-Based Approach

Separation and isolation of data
Each program maintains its own set of data.
Users of one program may be unaware of potentially useful data held by other programs.

Duplication of data
Same data is held by different programs.
Wasted space and potentially different values and/or different formats for the same item.


Limitations of File-Based Approach

Data dependence
File structure is defined in the program code.

Incompatible file formats
Programs are written in different languages, and so cannot easily access each other’s files.

Fixed Queries/Proliferation of application programs
Programs are written to satisfy particular functions.
Any new requirement needs a new program.


Database Approach

Arose because:
Definition of data was embedded in application programs, rather than being stored separately and independently.
No control over access and manipulation of data beyond that imposed by application programs.

Result: the database and Database Management System (DBMS).


Database

Shared collection of logically related data (and a description of this data), designed to meet the information needs of an organization.

System catalog (metadata) provides description of data to enable program–data independence.

Logically related data comprises entities, attributes, and relationships of an organization’s information.




Database Management System (DBMS)

A software system that enables users to define, create, maintain, and control access to the database.

(Database) application program: a computer program that interacts with database by issuing an appropriate request (SQL statement) to the DBMS.


Database Management System (DBMS)


Database Approach

Data definition language (DDL).
Permits specification of data types, structures and any data constraints.
All specifications are stored in the database.

Data manipulation language (DML).
General enquiry facility (query language) of the data.


Database Approach

Controlled access to database may include:
a security system
an integrity system
a concurrency control system
a recovery control system
a user-accessible catalog.


Views

Allows each user to have his or her own view of the database.

A view is essentially some subset of the database.


Views - Benefits

Reduce complexity
Provide a level of security
Provide a mechanism to customize the appearance of the database
Present a consistent, unchanging picture of the structure of the database, even if the underlying database is changed
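The security and "unchanging picture" benefits can be sketched with Python's standard-library `sqlite3` module. This is only an illustration under assumed names: the table `staff`, its columns, and the view `staff_public` are hypothetical and do not come from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (staff_no TEXT PRIMARY KEY, name TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO staff VALUES (?, ?, ?)",
    [("S1", "Ann", 30000.0), ("S2", "Bob", 12000.0)],
)

# The view exposes only a subset of the columns: salary stays hidden.
conn.execute("CREATE VIEW staff_public AS SELECT staff_no, name FROM staff")

# The underlying table changes, but the view keeps the same shape.
conn.execute("ALTER TABLE staff ADD COLUMN branch_no TEXT")

cursor = conn.execute("SELECT * FROM staff_public ORDER BY staff_no")
view_columns = [column[0] for column in cursor.description]
view_rows = cursor.fetchall()
print(view_columns, view_rows)
```

A user who is given access to `staff_public` alone never sees `salary`, and a program written against the view is unaffected by the added `branch_no` column.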




Components of DBMS Environment



Hardware
Can range from a PC to a network of computers.

Software
DBMS, operating system, network software (if necessary) and also the application programs.

Data
Used by the organization and a description of this data called the schema.

Procedures
Instructions and rules that should be applied to the design and use of the database and DBMS.

People


Roles in the Database Environment

Data Administrator (DA)
Database Administrator (DBA)
Database Designers (Logical and Physical)
Application Programmers
End Users (naive and sophisticated)


History of Database Systems

First generation: Hierarchical and Network

Second generation: Relational

Third generation: Object-Relational, Object-Oriented


Advantages of DBMSs

Control of data redundancy
Data consistency
More information from the same amount of data
Sharing of data
Improved data integrity
Improved security
Enforcement of standards
Economy of scale




Advantages of DBMSs

Balance conflicting requirements
Improved data accessibility and responsiveness
Increased productivity
Improved maintenance through data independence
Increased concurrency
Improved backup and recovery services


Disadvantages of DBMSs

Complexity
Size
Cost of DBMS
Additional hardware costs
Cost of conversion
Performance
Higher impact of a failure


Objectives of Three-Level Architecture

All users should be able to access same data.

A user’s view is immune to changes made in other views.

Users should not need to know physical database storage details.


Objectives of Three-Level Architecture

DBA should be able to change database storage structures without affecting the users’ views.

Internal structure of database should be unaffected by changes to physical aspects of storage.

DBA should be able to change conceptual structure of database without affecting all users.


ANSI-SPARC Three-Level Architecture



External Level
Users’ view of the database.
Describes that part of database that is relevant to a particular user.

Conceptual Level
Community view of the database.
Describes what data is stored in database and relationships among the data.

Internal Level
Physical representation of the database on the computer.
Describes how the data is stored in the database.


Differences between Three Levels of ANSI-SPARC Architecture


Data Independence

Logical Data Independence
Refers to immunity of external schemas to changes in conceptual schema.
Conceptual schema changes (e.g. addition/removal of entities) should not require changes to external schema or rewrites of application programs.


Data Independence

Physical Data Independence
Refers to immunity of conceptual schema to changes in the internal schema.
Internal schema changes (e.g. using different file organizations, storage structures/devices) should not require change to conceptual or external schemas.
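A small sketch of physical data independence, using Python's standard-library `sqlite3` module. The table `account`, its rows, and the index name are assumptions made for illustration. Adding an index changes a storage structure at the internal level, yet the same query returns the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_number TEXT PRIMARY KEY, balance REAL)")
conn.executemany(
    "INSERT INTO account VALUES (?, ?)",
    [("A-101", 500.0), ("A-102", 400.0), ("A-201", 900.0)],
)

query = "SELECT account_number FROM account WHERE balance > 450 ORDER BY account_number"
before = conn.execute(query).fetchall()

# Internal-level change: add an index (a different storage structure).
conn.execute("CREATE INDEX idx_account_balance ON account (balance)")

# The query text and the table definition are untouched.
after = conn.execute(query).fetchall()
print(before == after, after)
```

Only the access path may change; neither the conceptual schema (the table definition) nor the application query had to be rewritten.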


Data Independence and the ANSI-SPARC Three-Level Architecture


Database Languages

Data Definition Language (DDL)
Allows the DBA or user to describe and name entities, attributes, and relationships required for the application, plus any associated integrity and security constraints.




Database Languages

Data Manipulation Language (DML)
Provides basic data manipulation operations on data held in the database.
Procedural DML allows user to tell system exactly how to manipulate data.
Non-Procedural DML allows user to state what data is needed rather than how it is to be retrieved.
Fourth Generation Languages (4GLs)
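The DDL/DML distinction can be illustrated with SQL issued through Python's standard-library `sqlite3` module. The `loan` table and its rows are hypothetical. The Python loop only imitates a procedural style (it still relies on SQL to scan the table), so treat it as a contrast in spirit rather than a true procedural DML.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: describe the structure and its integrity constraints.
conn.execute(
    "CREATE TABLE loan ("
    " loan_number TEXT PRIMARY KEY,"
    " amount REAL NOT NULL CHECK (amount > 0))"
)

# DML: manipulate the data held in the database.
conn.executemany(
    "INSERT INTO loan VALUES (?, ?)",
    [("L-11", 900.0), ("L-14", 1500.0), ("L-15", 1500.0)],
)

# Non-procedural: state WHAT is needed; the DBMS decides how to retrieve it.
declarative = conn.execute(
    "SELECT loan_number FROM loan WHERE amount > 1000 ORDER BY loan_number"
).fetchall()

# Procedural style: spell out HOW, one record at a time.
procedural = []
for loan_number, amount in conn.execute("SELECT loan_number, amount FROM loan"):
    if amount > 1000:
        procedural.append((loan_number,))
procedural.sort()

print(declarative == procedural, declarative)
```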


Data Model

Integrated collection of concepts for describing data, relationships between data, and constraints on the data in an organization.

Data Model comprises: a structural part; a manipulative part; possibly a set of integrity rules.


Data Model

Purpose To represent data in an understandable way.

Categories of data models include:
Object-based
Record-based
Physical.


Data Models

Object-Based Data Models
Entity-Relationship
Semantic
Functional
Object-Oriented.

Record-Based Data Models
Relational Data Model
Network Data Model
Hierarchical Data Model.

Physical Data Models


Relational Data Model


Network Data Model




Hierarchical Data Model


Conceptual Modeling

Conceptual schema is the core of a system supporting all user views.

Should be complete and accurate representation of an organization’s data requirements.

Conceptual modeling is process of developing a model of information use that is independent of implementation details.

Result is a conceptual data model.


Functions of a DBMS

Data Storage, Retrieval, and Update.

A User-Accessible Catalog.

Transaction Support.

Concurrency Control Services.

Recovery Services.


Functions of a DBMS

Authorization Services.

Support for Data Communication.

Integrity Services.

Services to Promote Data Independence.

Utility Services.




Chapter 2

Entity-Relationship Model

Chapter 2 - Objectives

Entity Sets
Relationship Sets
Design Issues
Mapping Constraints
Keys
E-R Diagram
Extended E-R Features
Design of an E-R Database Schema
Reduction of an E-R Schema to Tables

Original Slides by Avi Silberschatz

Entity Sets

A database can be modeled as:
a collection of entities,
relationships among entities.

An entity is an object that exists and is distinguishable from other objects.
Example: specific person, company, event, plant

Entities have attributes.
Example: people have names and addresses

An entity set is a set of entities of the same type that share the same properties.
Example: set of all persons, companies, trees, holidays


Entity Sets customer and loan

[Figure: sample instances of entity sets customer (customer-id, customer-name, customer-street, customer-city) and loan (loan-number, amount)]


Attributes

An entity is represented by a set of attributes, that is, descriptive properties possessed by all members of an entity set.

Domain – the set of permitted values for each attribute

Attribute types:
Simple and composite attributes.
Single-valued and multi-valued attributes. E.g. multivalued attribute: phone-numbers
Derived attributes. Can be computed from other attributes. E.g. age, given date of birth

Example:
customer = (customer-id, customer-name, customer-street, customer-city)
loan = (loan-number, amount)
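The attribute types above can be sketched as a Python dataclass. The field `date_of_birth`, the sample values, and the `age` method are assumptions added for illustration. `phone_numbers` plays the role of a multivalued attribute, and `age` is derived (computed on demand, never stored).

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Customer:
    customer_id: str                      # simple, single-valued
    customer_name: str
    customer_street: str
    customer_city: str
    date_of_birth: date
    phone_numbers: list[str] = field(default_factory=list)  # multivalued

    def age(self, today: date) -> int:
        """Derived attribute: computed from date_of_birth, not stored."""
        had_birthday = (today.month, today.day) >= (
            self.date_of_birth.month,
            self.date_of_birth.day,
        )
        return today.year - self.date_of_birth.year - (0 if had_birthday else 1)


hayes = Customer(
    "C-1", "Hayes", "Main", "Harrison", date(1980, 6, 15), ["555-0100", "555-0101"]
)
print(hayes.age(date(2024, 6, 14)), hayes.age(date(2024, 6, 15)))
```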


Composite Attributes




Relationship Sets

A relationship is an association among several entities.
Example:
Hayes (customer entity), depositor (relationship set), A-102 (account entity)

A relationship set is a mathematical relation among $n \geq 2$ entities, each taken from entity sets

$\{(e_1, e_2, \ldots, e_n) \mid e_1 \in E_1, e_2 \in E_2, \ldots, e_n \in E_n\}$

where $(e_1, e_2, \ldots, e_n)$ is a relationship.
Example:
$(\text{Hayes}, \text{A-102}) \in \textit{depositor}$
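The definition can be checked directly with Python sets: a relationship set is a subset of the Cartesian product of the participating entity sets. The entity "Jones" and account "A-101" are made-up sample data added to give the sets more than one member.

```python
from itertools import product

# Entity sets, reduced to their identifying values for brevity.
customer = {"Hayes", "Jones"}
account = {"A-101", "A-102"}

# Relationship set depositor: some of the possible (customer, account) pairs.
depositor = {("Hayes", "A-102"), ("Jones", "A-101")}

all_pairs = set(product(customer, account))
is_relationship_set = depositor <= all_pairs  # subset test

print(is_relationship_set, ("Hayes", "A-102") in depositor)
```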


Relationship Set borrower


Relationship Sets (Cont.)

An attribute can also be a property of a relationship set.
For instance, the depositor relationship set between entity sets customer and account may have the attribute access-date.

Degree of a Relationship Set

Refers to number of entity sets that participate in a relationship set.
Relationship sets that involve two entity sets are binary (or degree two). Generally, most relationship sets in a database system are binary.

Relationship sets may involve more than two entity sets.

Relationships between more than two entity sets are rare. Most relationships are binary. (More on this later.)

E.g. Suppose employees of a bank may have jobs (responsibilities) at multiple branches, with different jobs at different branches. Then there is a ternary relationship set between entity sets employee, job and branch.


Mapping Cardinalities

Express the number of entities to which another entity can be associated via a relationship set.

Most useful in describing binary relationship sets.
For a binary relationship set the mapping cardinality must be one of the following types:
One to one
One to many
Many to one
Many to many



[Figure: one-to-one and one-to-many mappings between sets A and B]
Note: Some elements in A and B may not be mapped to any elements in the other set.

[Figure: many-to-one and many-to-many mappings between sets A and B]
Note: Some elements in A and B may not be mapped to any elements in the other set.
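The four mapping cardinalities can be told apart by counting, for each entity, how many partners it has. The function below is a sketch with a hypothetical name, `mapping_cardinality`. It classifies one particular instance of a binary relationship set from A to B, whereas a real cardinality constraint is a statement about every legal instance, so an instance can only show which constraints are not violated.

```python
from collections import Counter


def mapping_cardinality(pairs):
    """Classify a binary relationship instance, given as (a, b) pairs, from A to B."""
    pairs = set(pairs)
    per_a = Counter(a for a, _ in pairs)  # number of B partners of each a
    per_b = Counter(b for _, b in pairs)  # number of A partners of each b
    a_has_many = any(count > 1 for count in per_a.values())
    b_has_many = any(count > 1 for count in per_b.values())
    if a_has_many and b_has_many:
        return "many to many"
    if a_has_many:
        return "one to many"
    if b_has_many:
        return "many to one"
    return "one to one"


print(mapping_cardinality([("a1", "b1"), ("a1", "b2")]))
print(mapping_cardinality([("a1", "b1"), ("a2", "b1")]))
```

Entities that appear in no pair are simply absent from the counters, which matches the note that some elements of A and B may be unmapped.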


Mapping Cardinalities affect ER Design

Can make access-date an attribute of account, instead of a relationship attribute, if each account can have only one customer.
I.e., the relationship from account to customer is many to one, or equivalently, customer to account is one to many.


E-R Diagrams

Rectangles represent entity sets.
Diamonds represent relationship sets.
Lines link attributes to entity sets and entity sets to relationship sets.
Ellipses represent attributes.
Double ellipses represent multivalued attributes.
Dashed ellipses denote derived attributes.
Underline indicates primary key attributes (will study later).


E-R Diagram With Composite, Multivalued, and Derived Attributes


Relationship Sets with Attributes


Roles

Entity sets of a relationship need not be distinct.
The labels “manager” and “worker” are called roles; they specify how employee entities interact via the works-for relationship set.
Roles are indicated in E-R diagrams by labeling the lines that connect diamonds to rectangles.
Role labels are optional, and are used to clarify semantics of the relationship.




Cardinality Constraints

We express cardinality constraints by drawing either a directed line (→), signifying “one,” or an undirected line (—), signifying “many,” between the relationship set and the entity set.

E.g.: One-to-one relationship:
A customer is associated with at most one loan via the relationship borrower.
A loan is associated with at most one customer via borrower.


One-To-Many Relationship

In the one-to-many relationship, a loan is associated with at most one customer via borrower, and a customer is associated with several (including 0) loans via borrower.


Many-To-One Relationships

In a many-to-one relationship, a loan is associated with several (including 0) customers via borrower, and a customer is associated with at most one loan via borrower.


Many-To-Many Relationship

A customer is associated with several (possibly 0) loans via borrower

A loan is associated with several (possibly 0) customers via borrower


Participation of an Entity Set in a Relationship Set

Total participation (indicated by double line): every entity in the entity set participates in at least one relationship in the relationship set.
E.g. participation of loan in borrower is total: every loan must have a customer associated to it via borrower.

Partial participation: some entities may not participate in any relationship in the relationship set.
E.g. participation of customer in borrower is partial.


Alternative Notation for Cardinality Limits

Cardinality limits can also express participation constraints




Keys

A super key of an entity set is a set of one or more attributes whose values uniquely determine each entity.

A candidate key of an entity set is a minimal super key.
customer-id is candidate key of customer.
account-number is candidate key of account.

Although several candidate keys may exist, one of the candidate keys is selected to be the primary key.


Keys for Relationship Sets

The combination of primary keys of the participating entity sets forms a super key of a relationship set.
(customer-id, account-number) is the super key of depositor.
NOTE: this means a pair of entities can have at most one relationship in a particular relationship set.
E.g. if we wish to track all access-dates to each account by each customer, we cannot assume a relationship for each access. We can use a multivalued attribute though.

Must consider the mapping cardinality of the relationship set when deciding what the candidate keys are.

Need to consider semantics of relationship set in selecting the primary key in case of more than one candidate key.


E-R Diagram with a Ternary Relationship


Cardinality Constraints on Ternary Relationship

We allow at most one arrow out of a ternary (or greater degree) relationship to indicate a cardinality constraint.

E.g. an arrow from works-on to job indicates each employee works on at most one job at any branch.

If there is more than one arrow, there are two ways of defining the meaning.
E.g. a ternary relationship R between A, B and C with arrows to B and C could mean
1. each A entity is associated with a unique entity from B and C, or
2. each pair of entities from (A, B) is associated with a unique C entity, and each pair (A, C) is associated with a unique B.
Each alternative has been used in different formalisms.
To avoid confusion we outlaw more than one arrow.


Binary Vs. Non-Binary Relationships

Some relationships that appear to be non-binary may be better represented using binary relationships.
E.g. a ternary relationship parents, relating a child to his/her father and mother, is best replaced by two binary relationships, father and mother.
Using two binary relationships allows partial information (e.g. only the mother being known).
But there are some relationships that are naturally non-binary.
E.g. works-on


Converting Non-Binary Relationships to Binary Form

In general, any non-binary relationship can be represented using binary relationships by creating an artificial entity set.
Replace R between entity sets A, B and C by an entity set E, and three relationship sets:
1. RA, relating E and A
2. RB, relating E and B
3. RC, relating E and C
Create a special identifying attribute for E.
Add any attributes of R to E.
For each relationship (ai, bi, ci) in R, create
1. a new entity ei in the entity set E
2. add (ei, ai) to RA
3. add (ei, bi) to RB
4. add (ei, ci) to RC
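The four steps above translate almost line for line into Python. This is a minimal sketch: the function name `to_binary`, the generated identifiers `e1`, `e2`, ..., and the sample works-on tuples are assumptions for illustration, and attributes of R are omitted.

```python
def to_binary(r):
    """Replace a ternary relationship set R by entity set E and RA, RB, RC."""
    e, ra, rb, rc = [], set(), set(), set()
    for i, (a, b, c) in enumerate(sorted(r), start=1):
        ei = f"e{i}"        # special identifying attribute for E
        e.append(ei)        # 1. a new entity ei in the entity set E
        ra.add((ei, a))     # 2. add (ei, ai) to RA
        rb.add((ei, b))     # 3. add (ei, bi) to RB
        rc.add((ei, c))     # 4. add (ei, ci) to RC
    return e, ra, rb, rc


# works-on relates (employee, job, branch); sample data is made up.
works_on = {("Jones", "teller", "Downtown"), ("Jones", "auditor", "Uptown")}
E, RA, RB, RC = to_binary(works_on)

# Following the three binary relationships from each ei recovers R.
rebuilt = {(dict(RA)[ei], dict(RB)[ei], dict(RC)[ei]) for ei in E}
print(rebuilt == works_on)
```

The round trip works here because each `ei` appears exactly once in each of RA, RB and RC; nothing in the binary schema itself enforces that, which is why constraints also need to be translated.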




Converting Non-Binary Relationships (Cont.)

Also need to translate constraints.
Translating all constraints may not be possible.
There may be instances in the translated schema that cannot correspond to any instance of R.
Exercise: add constraints to the relationships RA, RB and RC to ensure that a newly created entity corresponds to exactly one entity in each of entity sets A, B and C.

We can avoid creating an identifying attribute by making E a weak entity set (described shortly) identified by the three relationship sets.


Design Issues

Use of entity sets vs. attributes
Choice mainly depends on the structure of the enterprise being modeled, and on the semantics associated with the attribute in question.

Use of entity sets vs. relationship sets
Possible guideline is to designate a relationship set to describe an action that occurs between entities.

Binary versus n-ary relationship sets
Although it is possible to replace any nonbinary (n-ary, for n > 2) relationship set by a number of distinct binary relationship sets, an n-ary relationship set shows more clearly that several entities participate in a single relationship.

Placement of relationship attributes


Weak Entity Sets

An entity set that does not have a primary key is referred to as a weak entity set.
The existence of a weak entity set depends on the existence of an identifying entity set.
It must relate to the identifying entity set via a total, one-to-many relationship set from the identifying to the weak entity set.
The identifying relationship is depicted using a double diamond.

The discriminator (or partial key) of a weak entity set is the set of attributes that distinguishes among all the entities of a weak entity set.

The primary key of a weak entity set is formed by the primary key of the strong entity set on which the weak entity set is existence dependent, plus the weak entity set’s discriminator.


Weak Entity Sets (Cont.)

We depict a weak entity set by double rectangles.
We underline the discriminator of a weak entity set with a dashed line.
payment-number – discriminator of the payment entity set
Primary key for payment – (loan-number, payment-number)


Weak Entity Sets (Cont.)


Note: the primary key of the strong entity set is not explicitly stored with the weak entity set, since it is implicit in the identifying relationship.

If loan-number were explicitly stored, payment could be made a strong entity, but then the relationship between payment and loan would be duplicated by an implicit relationship defined by the attribute loan-number common to payment and loan


More Weak Entity Set Examples


In a university, a course is a strong entity and a course-offering can be modeled as a weak entity

The discriminator of course-offering would be semester (including year) and section-number (if there is more than one section)

If we model course-offering as a strong entity we would model course-number as an attribute. Then the relationship with course would be implicit in the course-number attribute




Specialization


Top-down design process; we designate subgroupings within an entity set that are distinctive from other entities in the set.

These subgroupings become lower-level entity sets that have attributes or participate in relationships that do not apply to the higher-level entity set.

Depicted by a triangle component labeled ISA (E.g. customer “is a” person).

Attribute inheritance – a lower-level entity set inherits all the attributes and relationship participation of the higher-level entity set to which it is linked.


Specialization Example


Generalization


A bottom-up design process – combine a number of entity sets that share the same features into a higher-level entity set.

Specialization and generalization are simple inversions of each other; they are represented in an E-R diagram in the same way.

The terms specialization and generalization are used interchangeably.


Specialization and Generalization (Contd.)

Can have multiple specializations of an entity set based on different features.

E.g. permanent-employee vs. temporary-employee, in addition to officer vs. secretary vs. teller.

Each particular employee would be a member of one of permanent-employee or temporary-employee, and also a member of one of officer, secretary, or teller.

The ISA relationship is also referred to as a superclass-subclass relationship.


Design Constraints on a Specialization/Generalization

Constraint on which entities can be members of a given lower-level entity set.
condition-defined: e.g. all customers over 65 years are members of senior-citizen entity set; senior-citizen ISA person.
user-defined

Constraint on whether or not entities may belong to more than one lower-level entity set within a single generalization.
Disjoint: an entity can belong to only one lower-level entity set. Noted in E-R diagram by writing disjoint next to the ISA triangle.
Overlapping: an entity can belong to more than one lower-level entity set.


Design Constraints on a Specialization/Generalization (Contd.)

Completeness constraint: specifies whether or not an entity in the higher-level entity set must belong to at least one of the lower-level entity sets within a generalization.
total: an entity must belong to one of the lower-level entity sets
partial: an entity need not belong to one of the lower-level entity sets




Aggregation


Consider the ternary relationship works-on, which we saw earlier

Suppose we want to record managers for tasks performed by an employee at a branch


Aggregation (Cont.)

Relationship sets works-on and manages represent overlapping information.
Every manages relationship corresponds to a works-on relationship.
However, some works-on relationships may not correspond to any manages relationships, so we can’t discard the works-on relationship.

Eliminate this redundancy via aggregation.
Treat relationship as an abstract entity.
Allows relationships between relationships.
Abstraction of relationship into new entity.

Without introducing redundancy, the following diagram represents:
An employee works on a particular job at a particular branch.
An employee, branch, job combination may have an associated manager.


E-R Diagram With Aggregation


E-R Design Decisions

The use of an attribute or entity set to represent an object.

Whether a real-world concept is best expressed by an entity set or a relationship set.

The use of a ternary relationship versus a pair of binary relationships.

The use of a strong or weak entity set.

The use of specialization/generalization – contributes to modularity in the design.

The use of aggregation – can treat the aggregate entity set as a single unit without concern for the details of its internal structure.



E-R Diagram for a Banking Enterprise


Summary of Symbols Used in E-R Notation




Summary of Symbols (Cont.)


Alternative E-R Notations


Reduction of an E-R Schema to Tables


Primary keys allow entity sets and relationship sets to be expressed uniformly as tables which represent the contents of the database.

A database which conforms to an E-R diagram can be represented by a collection of tables.

For each entity set and relationship set there is a unique table which is assigned the name of the corresponding entity set or relationship set.

Each table has a number of columns (generally corresponding to attributes), which have unique names.

Converting an E-R diagram to a table format is the basis for deriving a relational database design from an E-R diagram.


Representing Entity Sets as Tables


A strong entity set reduces to a table with the same attributes.


Composite and Multivalued Attributes

Composite attributes are flattened out by creating a separate attribute for each component attribute.
E.g. given entity set customer with composite attribute name with component attributes first-name and last-name, the table corresponding to the entity set has two attributes name.first-name and name.last-name.

A multivalued attribute M of an entity E is represented by a separate table EM.
Table EM has attributes corresponding to the primary key of E and an attribute corresponding to multivalued attribute M.
E.g. multivalued attribute dependent-names of employee is represented by a table
employee-dependent-names(employee-id, dname)

Each value of the multivalued attribute maps to a separate row of the table EM.
E.g., an employee entity with primary key John and dependents Johnson and Johndotir maps to two rows: (John, Johnson) and (John, Johndotir).
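The mapping from a multivalued attribute to rows of table EM can be sketched in a few lines of Python. The function name `flatten_multivalued` and the second employee "Mary" (who has no dependents) are hypothetical additions.

```python
def flatten_multivalued(entities):
    """Map {primary key: [values of M]} to rows (key, value) of table EM."""
    return [(key, value) for key, values in entities.items() for value in values]


# employee-id -> dependent-names (a multivalued attribute)
employees = {"John": ["Johnson", "Johndotir"], "Mary": []}

employee_dependent_names = flatten_multivalued(employees)
print(employee_dependent_names)
```

An entity with no values for M contributes no rows at all, so no null values are needed in EM.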


Representing Weak Entity Sets


A weak entity set becomes a table that includes a column for the primary key of the identifying strong entity set
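As a sketch of this rule, the payment weak entity set can be declared with Python's standard-library `sqlite3` module. The column `payment_amount` and the sample rows are assumptions. The primary key of `payment` combines the identifying loan's key with the discriminator, and the foreign key ties each payment to an existing loan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only on request
conn.execute("CREATE TABLE loan (loan_number TEXT PRIMARY KEY, amount REAL)")
conn.execute(
    "CREATE TABLE payment ("
    " loan_number TEXT NOT NULL REFERENCES loan (loan_number),"
    " payment_number INTEGER NOT NULL,"   # discriminator
    " payment_amount REAL,"
    " PRIMARY KEY (loan_number, payment_number))"
)
conn.execute("INSERT INTO loan VALUES ('L-11', 900.0)")
conn.execute("INSERT INTO loan VALUES ('L-14', 1500.0)")

# The same discriminator value under two different loans is fine.
conn.execute("INSERT INTO payment VALUES ('L-11', 1, 100.0)")
conn.execute("INSERT INTO payment VALUES ('L-14', 1, 250.0)")


def try_insert(row):
    """Return True if the payment row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO payment VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


duplicate_ok = try_insert(("L-11", 1, 50.0))  # repeats a full primary key
orphan_ok = try_insert(("L-99", 1, 50.0))     # no identifying loan exists
payment_count = conn.execute("SELECT COUNT(*) FROM payment").fetchone()[0]
print(duplicate_ok, orphan_ok, payment_count)
```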




Representing Relationship Sets as Tables


A many-to-many relationship set is represented as a table with columns for the primary keys of the two participating entity sets, and any descriptive attributes of the relationship set.

E.g.: table for relationship set borrower


Redundancy of Tables


Many-to-one and one-to-many relationship sets that are total on the many-side can be represented by adding an extra attribute to the many side, containing the primary key of the one side

E.g.: Instead of creating a table for relationship account-branch, add an attribute branch to the entity set account


Redundancy of Tables (Cont.)

For one-to-one relationship sets, either side can be chosen to act as the “many” side.
That is, extra attribute can be added to either of the tables corresponding to the two entity sets.

If participation is partial on the many side, replacing a table by an extra attribute in the relation corresponding to the “many” side could result in null values.

The table corresponding to a relationship set linking a weak entity set to its identifying strong entity set is redundant.
E.g. the payment table already contains the information that would appear in the loan-payment table (i.e., the columns loan-number and payment-number).


Representing Specialization as Tables

Method 1:
Form a table for the higher level entity set.
Form a table for each lower level entity set, include primary key of higher level entity set and local attributes.

table       table attributes
person      name, street, city
customer    name, credit-rating
employee    name, salary

Drawback: getting information about, e.g., employee requires accessing two tables.


Representing Specialization as Tables (Cont.)

Method 2:
Form a table for each entity set with all local and inherited attributes.

table       table attributes
person      name, street, city
customer    name, street, city, credit-rating
employee    name, street, city, salary

If specialization is total, table for generalized entity (person) is not required to store information.
Can be defined as a “view” relation containing union of specialization tables.
But explicit table may still be needed for foreign key constraints.

Drawback: street and city may be stored redundantly for persons who are both customers and employees.


Relations Corresponding to Aggregation

To represent aggregation, create a table containing:
the primary key of the aggregated relationship,
the primary key of the associated entity set,
any descriptive attributes.




Relations Corresponding to Aggregation (Cont.)

E.g. to represent aggregation manages between relationship works-on and entity set manager, create a table
manages(employee-id, branch-name, title, manager-name)

Table works-on is redundant provided we are willing to store null values for attribute manager-name in table manages.




Chapter 3

The Relational database design


Chapter 3 - Objectives

Terminology of relational model.
How tables are used to represent data.
Connection between mathematical relations and relations in the relational model.
Properties of database relations.
How to identify CK, PK, and FKs.
Meaning of entity integrity and referential integrity.
The purpose of normalization.
How normalization can be used when designing a relational database.
The potential problems associated with redundant data in base relations.
The concept of functional dependency, which describes the relationship between attributes.
The characteristics of functional dependencies used in normalization.
How to identify functional dependencies for a given relation.
How functional dependencies identify the primary key for a relation.
How to undertake the process of normalization.
How normalization uses functional dependencies to group attributes into relations that are in a known normal form.
How to identify the most commonly used normal forms, namely First Normal Form (1NF), Second Normal Form (2NF), and Third Normal Form (3NF).
The problems associated with relations that break the rules of 1NF, 2NF, or 3NF.
How to represent attributes shown on a form as 3NF relations using normalization.


Relational Model Terminology

A relation is a table with columns and rows.
Only applies to logical structure of the database, not the physical structure.

Attribute is a named column of a relation.

Domain is the set of allowable values for one or more attributes.




Relational Model Terminology


Tuple is a row of a relation.

Degree is the number of attributes in a relation.

Cardinality is the number of tuples in a relation.

Relational Database is a collection of normalized relations with distinct relation names.


Instances of Branch and Staff Relations


Examples of Attribute Domains


Alternative Terminology for Relational Model


Mathematical Definition of Relation

Consider two sets, $D_1$ and $D_2$, where $D_1 = \{2, 4\}$ and $D_2 = \{1, 3, 5\}$.

Cartesian product, $D_1 \times D_2$, is set of all ordered pairs, where first element is member of $D_1$ and second element is member of $D_2$.

$D_1 \times D_2 = \{(2, 1), (2, 3), (2, 5), (4, 1), (4, 3), (4, 5)\}$

Alternative way is to find all combinations of elements with first from $D_1$ and second from $D_2$.



Any subset of Cartesian product is a relation; e.g.
$R = \{(2, 1), (4, 1)\}$

May specify which pairs are in relation using some condition for selection; e.g.
second element is 1:
$R = \{(x, y) \mid x \in D_1, y \in D_2, \text{ and } y = 1\}$
first element is always twice the second:
$S = \{(x, y) \mid x \in D_1, y \in D_2, \text{ and } x = 2y\}$
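The two-set example can be reproduced with `itertools.product` and set comprehensions. This is a direct transcription of the definitions of D1, D2, R and S given above; only the variable name `cartesian` is new.

```python
from itertools import product

D1 = {2, 4}
D2 = {1, 3, 5}

# Cartesian product: every ordered pair (x, y) with x from D1 and y from D2.
cartesian = set(product(D1, D2))

# Relations are subsets of the product, selected here by a condition.
R = {(x, y) for (x, y) in cartesian if y == 1}      # second element is 1
S = {(x, y) for (x, y) in cartesian if x == 2 * y}  # first is twice the second

print(sorted(cartesian))
print(sorted(R), sorted(S))
```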





Consider three sets $D_1$, $D_2$, $D_3$ with Cartesian product $D_1 \times D_2 \times D_3$; e.g.

$D_1 = \{1, 3\}$, $D_2 = \{2, 4\}$, $D_3 = \{5, 6\}$
$D_1 \times D_2 \times D_3 = \{(1,2,5), (1,2,6), (1,4,5), (1,4,6), (3,2,5), (3,2,6), (3,4,5), (3,4,6)\}$

Any subset of these ordered triples is a relation.



14

Cartesian product of n sets (D1, D2, . . ., Dn) is:

D1 × D2 × . . . × Dn = {(d1, d2, . . . , dn) | d1 ∈ D1, d2 ∈ D2, . . . , dn ∈ Dn}

usually written as:

$\mathop{\times}_{i=1}^{n} D_i$

Any set of n-tuples from this Cartesian product is a relation on the n sets.


Database Relations

15

Relation schema: named relation defined by a set of attribute and domain name pairs.

Relational database schema: set of relation schemas, each with a distinct name.


Properties of Relations

16

Relation name is distinct from all other relation names in relational schema.

Each cell of relation contains exactly one atomic (single) value.

Each attribute has a distinct name.

Values of an attribute are all from the same domain.


Properties of Relations

17

Each tuple is distinct; there are no duplicate tuples.

Order of attributes has no significance.

Order of tuples has no significance, theoretically.


Relational Keys

18

Superkey: an attribute, or set of attributes, that uniquely identifies a tuple within a relation.

Candidate Key: superkey (K) such that no proper subset is a superkey within the relation.

In each tuple of R, values of K uniquely identify that tuple (uniqueness).

No proper subset of K has the uniqueness property (irreducibility).
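The uniqueness and irreducibility properties can be checked mechanically against a relation instance. The sketch below assumes a small, hypothetical Staff instance; the data values and the helper names `is_superkey` and `is_candidate_key` are illustrative and not part of the original slides. Note that an instance can only refute a key: a true key must be unique for every possible instance of the relation.

```python
from itertools import combinations

# Hypothetical Staff tuples; the values are illustrative only.
staff = [
    {"staffNo": "S1", "sName": "Ann", "branchNo": "B1"},
    {"staffNo": "S2", "sName": "Bob", "branchNo": "B1"},
    {"staffNo": "S3", "sName": "Ann", "branchNo": "B2"},
]

def is_superkey(rows, attrs):
    """Uniqueness: no two rows agree on every attribute in attrs."""
    seen = {tuple(row[a] for a in attrs) for row in rows}
    return len(seen) == len(rows)

def is_candidate_key(rows, attrs):
    """Irreducibility: a superkey none of whose proper subsets is a superkey."""
    if not is_superkey(rows, attrs):
        return False
    return not any(
        is_superkey(rows, subset)
        for size in range(1, len(attrs))
        for subset in combinations(attrs, size)
    )

print(is_superkey(staff, ("staffNo", "sName")))       # superkey, but reducible
print(is_candidate_key(staff, ("staffNo", "sName")))
print(is_candidate_key(staff, ("staffNo",)))
```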




Relational Keys

19

Primary Key: candidate key selected to identify tuples uniquely within relation.

Alternate Keys: candidate keys that are not selected to be primary key.

Foreign Key: attribute, or set of attributes, within one relation that matches candidate key of some (possibly same) relation.


Integrity Constraints

20

Null: represents value for an attribute that is currently unknown or not applicable for tuple.

Deals with incomplete or exceptional data.

Represents the absence of a value and is not the same as zero or spaces, which are values.



21

Entity Integrity: in a base relation, no attribute of a primary key can be null.

Referential Integrity: if foreign key exists in a relation, either foreign key value must match a candidate key value of some tuple in its home relation or foreign key value must be wholly null.
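Both rules can be expressed as simple checks over relation instances. The following is a minimal sketch, assuming hypothetical Branch and Staff data in which Staff.branchNo is a foreign key referencing Branch.branchNo; the function names are illustrative, and Python's `None` stands in for null.

```python
# Hypothetical data: Staff.branchNo is a foreign key referencing Branch.branchNo.
branch = [{"branchNo": "B1"}, {"branchNo": "B2"}]
staff = [
    {"staffNo": "S1", "branchNo": "B1"},
    {"staffNo": "S2", "branchNo": None},   # wholly null foreign key: allowed
    {"staffNo": "S3", "branchNo": "B9"},   # no matching Branch tuple: violation
    {"staffNo": None, "branchNo": "B2"},   # null in primary key: violation
]

def entity_integrity_violations(rows, primary_key):
    """Rows in which some attribute of the primary key is null."""
    return [row for row in rows if any(row[a] is None for a in primary_key)]

def referential_integrity_violations(child, fk, parent, key):
    """Rows whose foreign key is neither wholly null nor matched in the parent."""
    parent_keys = {tuple(p[a] for a in key) for p in parent}
    bad = []
    for row in child:
        value = tuple(row[a] for a in fk)
        if all(v is None for v in value):
            continue  # a wholly null foreign key satisfies the rule
        if value not in parent_keys:
            bad.append(row)
    return bad

print(entity_integrity_violations(staff, ["staffNo"]))
print(referential_integrity_violations(staff, ["branchNo"], branch, ["branchNo"]))
```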



22

General Constraints: additional rules specified by users or database administrators that define or constrain some aspect of the enterprise.


Purpose of Normalization

23

Normalization is a technique for producing a set of suitable relations that support the data requirements of an enterprise.



24

Characteristics of a suitable set of relations include:

the minimal number of attributes necessary to support the data requirements of the enterprise;

attributes with a close logical relationship are found in the same relation;

minimal redundancy with each attribute represented only once with the important exception of attributes that form all or part of foreign keys.





25

The benefits of using a database that has a suitable set of relations are that the database will be: easier for the user to access and maintain; and smaller, taking up minimal storage space on the computer.


How Normalization Supports Database Design

26

How normalization can be used to support database design.

Data Redundancy and Update Anomalies

27

Major aim of relational database design is to group attributes into relations to minimize data redundancy.



28

Potential benefits for implemented database include:

Updates to the data stored in the database are achieved with a minimal number of operations thus reducing the opportunities for data inconsistencies.

Reduction in the file storage space required by the base relations thus minimizing costs.



29

Problems associated with data redundancy are illustrated by comparing the Staff and Branch relations with the StaffBranch relation.







31

StaffBranch relation has redundant data; the details of a branch are repeated for every member of staff.

In contrast, the branch information appears only once for each branch in the Branch relation and only the branch number (branchNo) is repeated in the Staff relation, to represent where each member of staff is located.



32

Relations that contain redundant information may potentially suffer from update anomalies.

Types of update anomalies include: insertion, deletion, and modification.


Lossless-join and Dependency Preservation Properties

33

Two important properties of decomposition:

Lossless-join property enables us to find any instance of the original relation from corresponding instances in the smaller relations.

Dependency preservation property enables us to enforce a constraint on the original relation by enforcing some constraint on each of the smaller relations.
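The lossless-join property can be illustrated by projecting a relation into two smaller relations and joining them back. The sketch below assumes a hypothetical three-attribute StaffBranch instance (staffNo, branchNo, bAddress); the data values are illustrative only.

```python
# Hypothetical StaffBranch instance: (staffNo, branchNo, bAddress).
staff_branch = {
    ("S1", "B1", "22 Deer Rd"),
    ("S2", "B1", "22 Deer Rd"),
    ("S3", "B2", "16 Argyll St"),
}

# Decompose by projection into two smaller relations.
staff = {(staff_no, branch_no) for staff_no, branch_no, _ in staff_branch}
branch = {(branch_no, address) for _, branch_no, address in staff_branch}

# Natural join of the projections on the shared attribute branchNo.
rejoined = {
    (staff_no, branch_no, address)
    for staff_no, branch_no in staff
    for other_branch_no, address in branch
    if branch_no == other_branch_no
}

# The decomposition is lossless for this instance if the join gives back
# exactly the original tuples (no tuples lost, no spurious tuples added).
lossless = rejoined == staff_branch
print(lossless)
```

The branch address is now stored once per branch rather than once per member of staff, yet the original instance is still recoverable.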


Functional Dependencies

34

Important concept associated with normalization.

Functional dependency describes relationship between attributes.

For example, if A and B are attributes of relation R, B is functionally dependent on A (denoted A → B), if each value of A in R is associated with exactly one value of B in R.


Characteristics of Functional Dependencies

35

Property of the meaning or semantics of the attributes in a relation.

Diagrammatic representation.

The determinant of a functional dependency refers to the attribute or group of attributes on the left-hand side of the arrow.


An Example Functional Dependency




Example Functional Dependency that holds for all Time

37

Consider the values shown in staffNo and sName attributes of the Staff relation (see Slide 12).

Based on sample data, the following functional dependencies appear to hold.

staffNo → sName

sName → staffNo


Example Functional Dependency that holds for all Time

38

However, the only functional dependency that remains true for all possible values for the staffNo and sName attributes of the Staff relation is:

staffNo → sName
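The difference between a dependency that appears to hold in sample data and one that holds for all time can be sketched as follows. The helper `fd_holds` and the staff values are hypothetical; the check only tests a given instance, so it can refute a dependency but never prove one.

```python
def fd_holds(rows, lhs, rhs):
    """True if, in this instance, each lhs value maps to exactly one rhs value."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first rhs value seen for this lhs value.
        if seen.setdefault(key, value) != value:
            return False
    return True

# Hypothetical sample data in which every name happens to be unique.
sample = [
    {"staffNo": "S1", "sName": "Ann"},
    {"staffNo": "S2", "sName": "Bob"},
]
# A later instance in which a second member of staff shares a name.
later = sample + [{"staffNo": "S3", "sName": "Ann"}]

print(fd_holds(sample, ["staffNo"], ["sName"]), fd_holds(sample, ["sName"], ["staffNo"]))
print(fd_holds(later, ["staffNo"], ["sName"]), fd_holds(later, ["sName"], ["staffNo"]))
```

In the later instance sName → staffNo fails while staffNo → sName still holds, which matches the conclusion on the slide.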



39

Determinants should have the minimal number of attributes necessary to maintain the functional dependency with the attribute(s) on the right hand-side.

This requirement is called full functional dependency.



40

Full functional dependency indicates that if A and B are attributes of a relation, B is fully functionally dependent on A, if B is functionally dependent on A, but not on any proper subset of A.


Example Full Functional Dependency

41

Exists in the Staff relation (see Slide 12).

staffNo, sName → branchNo

True - each value of (staffNo, sName) is associated with a single value of branchNo.

However, branchNo is also functionally dependent on a subset of (staffNo, sName), namely staffNo. Example above is a partial dependency.



42

Main characteristics of functional dependencies used in normalization:

There is a one-to-one relationship between the attribute(s) on the left-hand side (determinant) and those on the right-hand side of a functional dependency.

Holds for all time.

The determinant has the minimal number of attributes necessary to maintain the dependency with the attribute(s) on the right-hand side.




Transitive Dependencies

43

Important to recognize a transitive dependency because its existence in a relation can potentially cause update anomalies.

Transitive dependency describes a condition where A, B, and C are attributes of a relation such that if A → B and B → C, then C is transitively dependent on A via B (provided that A is not functionally dependent on B or C).


Example Transitive Dependency

44

Consider functional dependencies in the StaffBranch relation (see Slide 12).

staffNo → sName, position, salary, branchNo, bAddress

branchNo → bAddress

Transitive dependency, branchNo → bAddress exists on staffNo via branchNo.


The Process of Normalization

45

Formal technique for analyzing a relation based on its primary key and the functional dependencies between the attributes of that relation.

Often executed as a series of steps. Each step corresponds to a specific normal form, which has known properties.


Identifying Functional Dependencies

46

Identifying all functional dependencies between a set of attributes is relatively simple if the meaning of each attribute and the relationships between the attributes are well understood.

This information should be provided by the enterprise in the form of discussions with users and/or documentation such as the users’ requirements specification.


Identifying Functional Dependencies

47

However, if the users are unavailable for consultation and/or the documentation is incomplete then depending on the database application it may be necessary for the database designer to use their common sense and/or experience to provide the missing information.


Example - Identifying a set of functional dependencies for the StaffBranch relation

48

Examine semantics of attributes in StaffBranch relation (see Slide 12). Assume that position held and branch determine a member of staff’s salary.




Example - Identifying a set of functional dependencies for the StaffBranch relation

49

With sufficient information available, identify the functional dependencies for the StaffBranch relation as:

staffNo → sName, position, salary, branchNo, bAddress

branchNo → bAddress

bAddress → branchNo

branchNo, position → salary

bAddress, position → salary


Example - Using sample data to identify functional dependencies.

50

Consider the data for attributes denoted A, B, C, D, and E in the Sample relation (see Slide 33).

Important to establish that sample data values shown in relation are representative of all possible values that can be held by attributes A, B, C, D, and E. Assume true despite the relatively small amount of data shown in this relation.





52

Functional dependencies between attributes A to E in the Sample relation.

A → C (fd1)

C → A (fd2)

B → D (fd3)

A, B → E (fd4)


Identifying the Primary Key for a Relation using Functional Dependencies

53

Main purpose of identifying a set of functional dependencies for a relation is to specify the set of integrity constraints that must hold on a relation.

An important integrity constraint to consider first is the identification of candidate keys, one of which is selected to be the primary key for the relation.


Example - Identify Primary Key for StaffBranch Relation

54

StaffBranch relation has five functional dependencies (see Slide 31).

The determinants are staffNo, branchNo, bAddress, (branchNo, position), and (bAddress, position).

To identify all candidate key(s), identify the attribute (or group of attributes) that uniquely identifies each tuple in this relation.




Example - Identifying Primary Key for StaffBranch Relation

55

All attributes that are not part of a candidate key should be functionally dependent on the key.

The only candidate key and therefore primary key for StaffBranch relation, is staffNo, as all other attributes of the relation are functionally dependent on staffNo.


Example - Identifying Primary Key for Sample Relation

56

Sample relation has four functional dependencies (see Slide 31).

The determinants in the Sample relation are A, B, C, and (A, B). However, the only determinant that functionally determines all the other attributes of the relation is (A, B).

(A, B) is identified as the primary key for this relation.
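The reasoning above amounts to computing the closure of each determinant, that is, the set of attributes it functionally determines. The following is a minimal sketch of that check for the Sample relation; the `closure` helper is illustrative, attributes are encoded as single characters, and only the determinants listed on the slide are tested (it does not enumerate every attribute combination).

```python
def closure(attrs, fds):
    """Attributes functionally determined by attrs under the given dependencies."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the whole left-hand side is known, the right-hand side follows.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

# fd1 to fd4 of the Sample relation, written as (determinant, dependent).
sample_fds = [("A", "C"), ("C", "A"), ("B", "D"), ("AB", "E")]
all_attrs = set("ABCDE")
determinants = ["A", "B", "C", "AB"]

# Keep the determinants whose closure covers every attribute of the relation.
keys = [d for d in determinants if closure(d, sample_fds) == all_attrs]
print(keys)
```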



57

As normalization proceeds, the relations become progressively more restricted (stronger) in format and also less vulnerable to update anomalies.





59

Unnormalized Form (UNF)

60

A table that contains one or more repeating groups.

To create an unnormalized table: transform the data from the information source (e.g. form) into table format with columns and rows.




First Normal Form (1NF)

61

A relation in which the intersection of each row and column contains one and only one value.


UNF to 1NF

62

Nominate an attribute or group of attributes to act as the key for the unnormalized table.

Identify the repeating group(s) in the unnormalized table which repeats for the key attribute(s).


UNF to 1NF

63

Remove the repeating group by:

Entering appropriate data into the empty columns of rows containing the repeating data (‘flattening’ the table).

Or by:

Placing the repeating data along with a copy of the original key attribute(s) into a separate relation.
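The two ways of removing a repeating group can be sketched as follows. The unnormalized data, with a nested `rentals` list standing in for the repeating group, is hypothetical and heavily simplified.

```python
# Hypothetical unnormalized table: each client row carries a repeating group of rentals.
unnormalized = [
    {"clientNo": "C1", "cName": "Ann", "rentals": [
        {"propertyNo": "P1", "rent": 350},
        {"propertyNo": "P2", "rent": 450},
    ]},
    {"clientNo": "C2", "cName": "Bob", "rentals": [
        {"propertyNo": "P1", "rent": 350},
    ]},
]

# Approach 1: flatten the table by repeating the client data on every rental row.
flat = [
    {"clientNo": c["clientNo"], "cName": c["cName"], **rental}
    for c in unnormalized
    for rental in c["rentals"]
]

# Approach 2: move the repeating data, with a copy of the key, to a separate relation.
client = [{"clientNo": c["clientNo"], "cName": c["cName"]} for c in unnormalized]
property_rental = [
    {"clientNo": c["clientNo"], **rental}
    for c in unnormalized
    for rental in c["rentals"]
]

print(len(flat), len(client), len(property_rental))
```

Flattening keeps a single relation but repeats cName on each row; the second approach avoids that repetition at the cost of a second relation.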


Second Normal Form (2NF)

64

Based on the concept of full functional dependency.

Full functional dependency indicates that if A and B are attributes of a relation, B is fully dependent on A if B is functionally dependent on A but not on any proper subset of A.


Second Normal Form (2NF)

65

A relation that is in 1NF and every non-primary-key attribute is fully functionally dependent on the primary key.


1NF to 2NF

66

Identify the primary key for the 1NF relation.

Identify the functional dependencies in the relation.

If partial dependencies exist on the primary key, remove them by placing them in a new relation along with a copy of their determinant.




Third Normal Form (3NF)

67

Based on the concept of transitive dependency.

Transitive Dependency is a condition where A, B and C are attributes of a relation such that if A → B and B → C, then C is transitively dependent on A through B (provided that A is not functionally dependent on B or C).


Third Normal Form (3NF)

68

A relation that is in 1NF and 2NF and in which no non-primary-key attribute is transitively dependent on the primary key.


2NF to 3NF

69

Identify the primary key in the 2NF relation.

Identify functional dependencies in the relation.

If transitive dependencies exist on the primary key, remove them by placing them in a new relation along with a copy of their determinant.


General Definitions of 2NF and 3NF

70

Second normal form (2NF): a relation that is in first normal form and every non-primary-key attribute is fully functionally dependent on any candidate key.

Third normal form (3NF): a relation that is in first and second normal form and in which no non-primary-key attribute is transitively dependent on any candidate key.


Example of Normalization


Unnormalized table

72

[Figure: unnormalized table, with callouts marking the primary key and a cell that holds more than one value]




Repeating group

73

(propertyNo, pAddress, rentStart, rentFinish, rent, ownerNo, oName)

74

clientRental ( clientNo, propertyNo, cName, pAddress, rentStart, rentFinish, rent, ownerNo, oName)

First normal form (1NF)


Alternative First normal form (1NF) (optional)

75

clientRental (clientNo, cName)

PropertyRentalOwner (clientNo, propertyNo, pAddress, rentStart, rentFinish, rent, ownerNo, oName)

Functional dependencies of the ClientRental relation


Second normal form (2NF)

77

Client (clientNo, cName)

Rental (clientNo, propertyNo, rentStart, rentFinish)

PropertyOwner (propertyNo, pAddress, rent, ownerNo, oName)


Functional dependencies for the Client, Rental, and PropertyOwner relations




Third normal form (3NF)

79

Client (clientNo, cName)

Rental (clientNo, propertyNo, rentStart, rentFinish)

PropertyOwner (propertyNo, pAddress, rent, ownerNo)

Owner (ownerNo, oName)
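The 3NF relations listed above are projections of the 1NF ClientRental relation. As a rough sketch, assuming a few hypothetical ClientRental tuples (all values invented for illustration), the decomposition can be reproduced with a duplicate-removing `project` helper:

```python
ATTRS = ["clientNo", "cName", "propertyNo", "pAddress",
         "rentStart", "rentFinish", "rent", "ownerNo", "oName"]

# Hypothetical 1NF ClientRental tuples; values are illustrative only.
RAW = [
    ("C1", "Ann", "P1", "6 Lawrence St", "2003-07-01", "2004-08-31", 350, "O1", "Tina"),
    ("C1", "Ann", "P2", "5 Novar Dr", "2004-09-01", "2005-09-01", 450, "O2", "Tony"),
    ("C2", "Bob", "P1", "6 Lawrence St", "2002-09-01", "2003-06-10", 350, "O1", "Tina"),
    ("C2", "Bob", "P3", "2 Manor Rd", "2003-10-10", "2004-12-01", 375, "O2", "Tony"),
]
client_rental = [dict(zip(ATTRS, values)) for values in RAW]

def project(rows, attrs):
    """Project rows onto attrs, dropping duplicate tuples."""
    result = []
    for row in rows:
        projected = {a: row[a] for a in attrs}
        if projected not in result:
            result.append(projected)
    return result

client = project(client_rental, ["clientNo", "cName"])
rental = project(client_rental, ["clientNo", "propertyNo", "rentStart", "rentFinish"])
property_owner = project(client_rental, ["propertyNo", "pAddress", "rent", "ownerNo"])
owner = project(client_rental, ["ownerNo", "oName"])

print(len(client), len(rental), len(property_owner), len(owner))
```

Each client name, property address, and owner name is now stored once, instead of once per rental.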


The decomposition of the ClientRental 1NF relation into 3NF relations (optional)




Chapter 4

Introduction to Database system

Chapter 4 - Objectives

Overview of Physical Storage Media
Magnetic Disks
RAID
Tertiary Storage
Storage Access
File Organization
Organization of Records in Files
Data-Dictionary Storage
Storage Structures for Object-Oriented Databases

Original Slides by Avi Silberschatz

Classification of Physical Storage Media

Speed with which data can be accessed
Cost per unit of data
Reliability
  data loss on power failure or system crash
  physical failure of the storage device

Can differentiate storage into:
  volatile storage: loses contents when power is switched off
  non-volatile storage: contents persist even when power is switched off. Includes secondary and tertiary storage, as well as battery-backed-up main memory.


Physical Storage Media

Cache: fastest and most costly form of storage; volatile; managed by the computer system hardware.

Main memory:
  fast access (10s to 100s of nanoseconds; 1 nanosecond = 10^-9 seconds)
  generally too small (or too expensive) to store the entire database
  capacities of up to a few Gigabytes widely used currently
  capacities have gone up and per-byte costs have decreased steadily and rapidly (roughly factor of 2 every 2 to 3 years)
  volatile: contents of main memory are usually lost if a power failure or system crash occurs.


Physical Storage Media (Cont.)

Flash memory
  Data survives power failure
  Data can be written at a location only once, but location can be erased and written to again
  Can support only a limited number of write/erase cycles.
  Erasing of memory has to be done to an entire bank of memory
  Reads are roughly as fast as main memory
  But writes are slow (few microseconds), erase is slower
  Cost per unit of storage roughly similar to main memory
  Widely used in embedded devices such as digital cameras
  Also known as EEPROM (Electrically Erasable Programmable Read-Only Memory)


Physical Storage Media (Cont.)

Magnetic disk
  Data is stored on spinning disk, and read/written magnetically
  Primary medium for the long-term storage of data; typically stores entire database.
  Data must be moved from disk to main memory for access, and written back for storage
  Much slower access than main memory (more on this later)
  Direct-access: possible to read data on disk in any order, unlike magnetic tape
  Hard disks vs floppy disks
  Capacities range up to roughly 100 GB currently
  Much larger capacity and much lower cost per byte than main memory or flash memory
  Capacity growing constantly and rapidly with technology improvements (factor of 2 to 3 every 2 years)
  Survives power failures and system crashes
  Disk failure can destroy data, but is very rare




Physical Storage Media (Cont.)

Optical storage
  Non-volatile; data is read optically from a spinning disk using a laser
  CD-ROM (640 MB) and DVD (4.7 to 17 GB) most popular forms
  Write-once, read-many (WORM) optical disks used for archival storage (CD-R and DVD-R)
  Multiple-write versions also available (CD-RW, DVD-RW, and DVD-RAM)
  Reads and writes are slower than with magnetic disk
  Juke-box systems, with large numbers of removable disks, a few drives, and a mechanism for automatic loading/unloading of disks, available for storing large volumes of data


Physical Storage Media (Cont.)

Tape storage
  Non-volatile; used primarily for backup (to recover from disk failure) and for archival data
  Sequential-access: much slower than disk
  Very high capacity (40 to 300 GB tapes available)
  Tape can be removed from drive; storage costs much cheaper than disk, but drives are expensive
  Tape jukeboxes available for storing massive amounts of data: hundreds of terabytes (1 terabyte = 10^12 bytes) to even a petabyte (1 petabyte = 10^15 bytes)


Storage Hierarchy


Storage Hierarchy (Cont.)

Primary storage: fastest media but volatile (cache, main memory).

Secondary storage: next level in hierarchy, non-volatile, moderately fast access time
  also called on-line storage
  E.g. flash memory, magnetic disks

Tertiary storage: lowest level in hierarchy, non-volatile, slow access time
  also called off-line storage
  E.g. magnetic tape, optical storage


Magnetic Hard Disk Mechanism

NOTE: Diagram is schematic, and simplifies the structure of actual disk drives


Magnetic Disks (optional)

Read-write head
  Positioned very close to the platter surface (almost touching it)
  Reads or writes magnetically encoded information.

Surface of platter divided into circular tracks
  Over 16,000 tracks per platter on typical hard disks

Each track is divided into sectors.
  A sector is the smallest unit of data that can be read or written.
  Sector size typically 512 bytes
  Typical sectors per track: 200 (on inner tracks) to 400 (on outer tracks)

To read/write a sector
  disk arm swings to position head on right track
  platter spins continually; data is read/written as sector passes under head

Head-disk assemblies
  multiple disk platters on a single spindle (typically 2 to 4)
  one head per platter, mounted on a common arm.

Cylinder i consists of ith track of all the platters




Magnetic Disks (Cont.) (optional)

Earlier generation disks were susceptible to head-crashes
  Surface of earlier generation disks had metal-oxide coatings which would disintegrate on head crash and damage all data on disk
  Current generation disks are less susceptible to such disastrous failures, although individual sectors may get corrupted

Disk controller: interfaces between the computer system and the disk drive hardware.
  Accepts high-level commands to read or write a sector
  Initiates actions such as moving the disk arm to the right track and actually reading or writing the data
  Computes and attaches checksums to each sector to verify that data is read back correctly
    If data is corrupted, with very high probability stored checksum won’t match recomputed checksum
  Ensures successful writing by reading back sector after writing it
  Performs remapping of bad sectors


Disk Subsystem

Multiple disks connected to a computer system through a controller
  Controller functionality (checksum, bad sector remapping) often carried out by individual disks; reduces load on controller

Disk interface standards families
  ATA (AT Attachment) range of standards
  SCSI (Small Computer System Interface) range of standards
  Several variants of each standard (different speeds and capabilities)


Performance Measures of Disks

Access time: the time it takes from when a read or write request is issued to when data transfer begins. Consists of:

Seek time: time it takes to reposition the arm over the correct track.
  Average seek time is 1/2 the worst case seek time.
    Would be 1/3 if all tracks had the same number of sectors, and we ignore the time to start and stop arm movement
  4 to 10 milliseconds on typical disks

Rotational latency: time it takes for the sector to be accessed to appear under the head.
  Average latency is 1/2 of the worst case latency.
  4 to 11 milliseconds on typical disks (5400 to 15000 r.p.m.)

Data-transfer rate: the rate at which data can be retrieved from or stored to the disk.
  4 to 8 MB per second is typical
  Multiple disks may share a controller, so rate that controller can handle is also important
    E.g. ATA-5: 66 MB/second, SCSI-3: 40 MB/s, Fiber Channel: 256 MB/s
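The rotational latency figures follow directly from the spindle speed. The sketch below is illustrative only (the function names are not from the slides); it suggests that the quoted 4 to 11 millisecond range corresponds to one full revolution at 15000 and 5400 r.p.m., with the average latency being half of that.

```python
def rotation_time_ms(rpm):
    """Time for one full revolution (worst-case rotational latency), in milliseconds."""
    return 60_000 / rpm

def average_rotational_latency_ms(rpm):
    """On average the required sector is half a revolution away from the head."""
    return rotation_time_ms(rpm) / 2

def average_access_time_ms(average_seek_ms, rpm):
    """Access time = seek time + rotational latency (averages of both)."""
    return average_seek_ms + average_rotational_latency_ms(rpm)

for rpm in (5400, 15000):
    print(rpm, round(rotation_time_ms(rpm), 2), round(average_rotational_latency_ms(rpm), 2))
```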


Performance Measures (Cont.)

Mean time to failure (MTTF): the average time the disk is expected to run continuously without any failure.
  Typically 3 to 5 years
  Probability of failure of new disks is quite low, corresponding to a “theoretical MTTF” of 30,000 to 1,200,000 hours for a new disk
    E.g., an MTTF of 1,200,000 hours for a new disk means that given 1000 relatively new disks, on an average one will fail every 1200 hours
  MTTF decreases as disk ages


RAID

RAID: Redundant Arrays of Independent Disks
  Disk organization techniques that manage a large number of disks, providing a view of a single disk of
    high capacity and high speed by using multiple disks in parallel, and
    high reliability by storing data redundantly, so that data can be recovered even if a disk fails

The chance that some disk out of a set of N disks will fail is much higher than the chance that a specific single disk will fail.
  E.g., a system with 100 disks, each with MTTF of 100,000 hours (approx. 11 years), will have a system MTTF of 1000 hours (approx. 41 days)
  Techniques for using redundancy to avoid data loss are critical with large numbers of disks

Originally a cost-effective alternative to large, expensive disks
  The I in RAID originally stood for "inexpensive"
  Today RAIDs are used for their higher reliability and bandwidth.
    The "I" is interpreted as independent
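The system MTTF figure above can be reproduced with a one-line calculation. This is a rough sketch that assumes disks fail independently at a constant rate, so the expected time to the first failure among N disks is the single-disk MTTF divided by N; the function name is illustrative.

```python
def array_mttf_hours(disk_mttf_hours, n_disks):
    """Expected time until the first failure among n independent disks."""
    return disk_mttf_hours / n_disks

system_mttf = array_mttf_hours(100_000, 100)
print(system_mttf, "hours, about", round(system_mttf / 24, 1), "days")

# Same arithmetic as the earlier MTTF example: 1000 disks, 1,200,000 hours each.
print(array_mttf_hours(1_200_000, 1000), "hours between failures")
```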


Improvement of Reliability via Redundancy

Redundancy: store extra information that can be used to rebuild information lost in a disk failure

E.g., Mirroring (or shadowing)
  Duplicate every disk. Logical disk consists of two physical disks.
  Every write is carried out on both disks
    Reads can take place from either disk
  If one disk in a pair fails, data still available in the other
    Data loss would occur only if a disk fails, and its mirror disk also fails before the system is repaired
    Probability of combined event is very small, except for dependent failure modes such as fire or building collapse or electrical power surges

Mean time to data loss depends on mean time to failure, and mean time to repair
  E.g. MTTF of 100,000 hours, mean time to repair of 10 hours gives mean time to data loss of 500 × 10^6 hours (or 57,000 years) for a mirrored pair of disks (ignoring dependent failure modes)
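The slide does not state the formula behind the 500 × 10^6 hour figure. A common approximation for a mirrored pair with independent failures, MTTF² / (2 × MTTR), reproduces it; the sketch below uses that approximation as an assumption, and the function name is illustrative.

```python
HOURS_PER_YEAR = 24 * 365

def mean_time_to_data_loss_hours(mttf_hours, mttr_hours):
    """Approximate mean time to data loss for a mirrored pair (independent failures).

    Data is lost only if the second disk fails during the repair window of the first.
    """
    return mttf_hours ** 2 / (2 * mttr_hours)

mttdl = mean_time_to_data_loss_hours(100_000, 10)
print(mttdl, "hours, about", round(mttdl / HOURS_PER_YEAR), "years")
```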




Improvement in Performance via Parallelism

Two main goals of parallelism in a disk system:
  1. Load balance multiple small accesses to increase throughput
  2. Parallelize large accesses to reduce response time.

Improve transfer rate by striping data across multiple disks.

Bit-level striping: split the bits of each byte across multiple disks
  In an array of eight disks, write bit i of each byte to disk i.
  Each access can read data at eight times the rate of a single disk.
  But seek/access time worse than for a single disk
    Bit level striping is not used much any more

Block-level striping: with n disks, block i of a file goes to disk (i mod n) + 1
  Requests for different blocks can run in parallel if the blocks reside on different disks
  A request for a long sequence of blocks can utilize all disks in parallel
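The block-level striping rule can be written as a short function. This is a minimal sketch; the function name is illustrative, and disks are numbered from 1 as in the formula on the slide.

```python
def disk_for_block(i, n):
    """1-based number of the disk holding logical block i with n disks."""
    return (i % n) + 1

# With 4 disks, consecutive blocks rotate across the disks, so a request for a
# long sequence of blocks keeps every disk busy in parallel.
placement = [disk_for_block(i, 4) for i in range(8)]
print(placement)
```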


RAID Levels

Schemes to provide redundancy at lower cost by using disk striping combined with parity bits
  Different RAID organizations, or RAID levels, have differing cost, performance and reliability characteristics

RAID Level 0: Block striping; non-redundant.
  Used in high-performance applications where data loss is not critical.

RAID Level 1: Mirrored disks with block striping
  Offers best write performance.
  Popular for applications such as storing log files in a database system.


RAID Levels (Cont.)

RAID Level 2: Memory-Style Error-Correcting-Codes (ECC) with bit striping.

RAID Level 3: Bit-Interleaved Parity
  A single parity bit is enough for error correction, not just detection, since we know which disk has failed
  When writing data, corresponding parity bits must also be computed and written to a parity bit disk
  To recover data in a damaged disk, compute XOR of bits from other disks (including parity bit disk)


RAID Levels (Cont.)

RAID Level 3 (Cont.)
  Faster data transfer than with a single disk, but fewer I/Os per second since every disk has to participate in every I/O.
  Subsumes Level 2 (provides all its benefits, at lower cost).

RAID Level 4: Block-Interleaved Parity; uses block-level striping, and keeps a parity block on a separate disk for corresponding blocks from N other disks.
  When writing data block, corresponding block of parity bits must also be computed and written to parity disk
  To find value of a damaged block, compute XOR of bits from corresponding blocks (including parity block) from other disks.
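Parity computation and recovery both reduce to a bytewise XOR. The following is a minimal sketch with three tiny hypothetical data blocks; `xor_blocks` is an illustrative helper, not an API of any RAID implementation.

```python
from functools import reduce

def xor_blocks(blocks):
    """Bytewise XOR of a list of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Hypothetical 2-byte data blocks held on three data disks.
data = [b"\x0f\xf0", b"\x33\x55", b"\xaa\x01"]

# The parity block is the XOR of the corresponding data blocks.
parity = xor_blocks(data)

# Suppose the second disk fails: XOR the surviving blocks with the parity block.
recovered = xor_blocks([data[0], data[2], parity])
print(recovered == data[1])
```

Recovery works because XOR-ing a value twice cancels it, and because the failed disk is known, so a single parity block is sufficient.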



Provides higher I/O rates for independent block reads than Level 3
  Block read goes to a single disk, so blocks stored on different disks can be read in parallel

Provides higher transfer rates for reads of multiple blocks than no-striping

Before writing a block, parity data must be computed
  Can be done by using old parity block, old value of current block and new value of current block (2 block reads + 2 block writes)
  Or by recomputing the parity value using the new values of blocks corresponding to the parity block
    More efficient for writing large amounts of data sequentially

Parity block becomes a bottleneck for independent block writes since every block write also writes to parity disk
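The two ways of computing the new parity give the same result. The sketch below, using small hypothetical blocks and illustrative helper names, compares the small-write update (old parity, old block, new block) with a full recomputation.

```python
def xor(a, b):
    """Bytewise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))

def updated_parity(old_parity, old_block, new_block):
    """Small-write update: remove the old block's contribution, add the new one."""
    return xor(xor(old_parity, old_block), new_block)

# Hypothetical stripe of three data blocks and its parity block.
blocks = [b"\x0f\xf0", b"\x33\x55", b"\xaa\x01"]
parity = xor(xor(blocks[0], blocks[1]), blocks[2])

# Overwrite the second block: 2 reads (old block, old parity) + 2 writes.
new_block = b"\x12\x34"
incremental = updated_parity(parity, blocks[1], new_block)

# Alternative: recompute the parity from all current blocks of the stripe.
recomputed = xor(xor(blocks[0], new_block), blocks[2])
print(incremental == recomputed)
```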


RAID Levels (Cont.)

RAID Level 5: Block-Interleaved Distributed Parity; partitions data and parity among all N + 1 disks, rather than storing data in N disks and parity in 1 disk.
  E.g., with 5 disks, parity block for nth set of blocks is stored on disk (n mod 5) + 1, with the data blocks stored on the other 4 disks.





Higher I/O rates than Level 4. Block writes occur in parallel if the blocks and their parity blocks are on different disks.

Subsumes Level 4: provides same benefits, but avoids bottleneck of parity disk.

RAID Level 6: P+Q Redundancy scheme; similar to Level 5, but stores extra redundant information to guard against multiple disk failures. Better reliability than Level 5 at a higher cost; not used as widely.
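The RAID Level 5 parity placement rule described above, disk (n mod 5) + 1 for the nth set of blocks, can be sketched as follows; the function name and the 1-based disk numbering are illustrative choices.

```python
def raid5_layout(n, disks=5):
    """Return (parity disk, data disks) for the nth set of blocks, disks numbered from 1."""
    parity_disk = (n % disks) + 1
    data_disks = [d for d in range(1, disks + 1) if d != parity_disk]
    return parity_disk, data_disks

# The parity block rotates across all disks, so no single disk is a write bottleneck.
for n in range(6):
    print(n, raid5_layout(n))
```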


Choice of RAID Level (optional)

Factors in choosing RAID level
  Monetary cost
  Performance: number of I/O operations per second, and bandwidth during normal operation
  Performance during failure
  Performance during rebuild of failed disk, including time taken to rebuild failed disk

RAID 0 is used only when data safety is not important
  E.g. data can be recovered quickly from other sources
Level 2 and 4 never used since they are subsumed by 3 and 5
Level 3 is not used anymore since bit-striping forces single block reads to access all disks, wasting disk arm movement, which block striping (level 5) avoids
Level 6 is rarely used since levels 1 and 5 offer adequate safety for almost all applications
So competition is between 1 and 5 only


Choice of RAID Level (Cont.) (optional)

Level 1 provides much better write performance than level 5
  Level 5 requires at least 2 block reads and 2 block writes to write a single block, whereas Level 1 only requires 2 block writes
  Level 1 preferred for high update environments such as log disks

Level 1 has higher storage cost than level 5
  Disk drive capacities increasing rapidly (50%/year) whereas disk access times have decreased much less (x 3 in 10 years)
  I/O requirements have increased greatly, e.g. for Web servers
  When enough disks have been bought to satisfy required rate of I/O, they often have spare storage capacity, so there is often no extra monetary cost for Level 1!

Level 5 is preferred for applications with low update rate, and large amounts of data
Level 1 is preferred for all other applications


Hardware Issues

Software RAID: RAID implementations done entirely in software, with no special hardware support

Hardware RAID: RAID implementations with special hardware
  Use non-volatile RAM to record writes that are being executed
  Beware: power failure during write can result in corrupted disk
    E.g. failure after writing one block but before writing the second in a mirrored system
    Such corrupted data must be detected when power is restored
      Recovery from corruption is similar to recovery from failed disk
      NV-RAM helps to efficiently detect potentially corrupted blocks
        Otherwise all blocks of disk must be read and compared with mirror/parity block


Hardware Issues (Cont.) (optional)

Hot swapping: replacement of disk while system is running, without power down
  Supported by some hardware RAID systems
  Reduces time to recovery, and improves availability greatly

Many systems maintain spare disks which are kept online, and used as replacements for failed disks immediately on detection of failure
  Reduces time to recovery greatly

Many hardware RAID systems ensure that a single point of failure will not stop the functioning of the system by using
  Redundant power supplies with battery backup
  Multiple controllers and multiple interconnections to guard against controller/interconnection failures




Optical Disks

Compact disk-read only memory (CD-ROM)
  Disks can be loaded into or removed from a drive
  High storage capacity (640 MB per disk)
  High seek times of about 100 msec (optical read head is heavier and slower)
  Higher latency (3000 RPM) and lower data-transfer rates (3-6 MB/s) compared to magnetic disks

Digital Video Disk (DVD)
  DVD-5 holds 4.7 GB, and DVD-9 holds 8.5 GB
  DVD-10 and DVD-18 are double sided formats with capacities of 9.4 GB and 17 GB
  Other characteristics similar to CD-ROM

Record once versions (CD-R and DVD-R) are becoming popular
  Data can only be written once, and cannot be erased.
  High capacity and long lifetime; used for archival storage
  Multi-write versions (CD-RW, DVD-RW and DVD-RAM) also available


Magnetic Tapes

Hold large volumes of data and provide high transfer rates.
- Few GB for DAT (Digital Audio Tape) format, 10-40 GB with DLT (Digital Linear Tape) format, 100 GB+ with Ultrium format, and 330 GB with Ampex helical scan format.
- Transfer rates from a few to 10s of MB/s.

Currently the cheapest storage medium.
- Tapes are cheap, but cost of drives is very high.

Very slow access time in comparison to magnetic disks and optical disks.
- Limited to sequential access.
- Some formats (Accelis) provide faster seek (10s of seconds) at cost of lower capacity.

Used mainly for backup, for storage of infrequently used information, and as an off-line medium for transferring information from one system to another.

Tape jukeboxes used for very large capacity storage: terabyte ($10^{12}$ bytes) to petabyte ($10^{15}$ bytes).


Data Dictionary Storage

Data dictionary (also called system catalog) stores metadata: that is, data about data, such as:

- Information about relations
  - names of relations
  - names and types of attributes of each relation
  - names and definitions of views
  - integrity constraints
- User and accounting information, including passwords
- Statistical and descriptive data
  - number of tuples in each relation
- Physical file organization information
  - how relation is stored (sequential/hash/…)
  - physical location of relation: operating system file name or disk addresses of blocks containing records of the relation
- Information about indices (Chapter 12)


Data Dictionary Storage (Cont.)

Catalog structure: can use either
- specialized data structures designed for efficient access, or
- a set of relations, with existing system features used to ensure efficient access.

The latter alternative is usually preferred.

A possible catalog representation:

Relation-metadata = (relation-name, number-of-attributes, storage-organization, location)

Attribute-metadata = (attribute-name, relation-name, domain-type, position, length)

User-metadata = (user-name, encrypted-password, group)

Index-metadata = (index-name, relation-name, index-type, index-attributes)

View-metadata = (view-name, definition)
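The "set of relations" alternative can be sketched with Python's built-in `sqlite3` module. This is only an illustration under stated assumptions: the table and column names follow the catalog representation above (hyphens replaced by underscores), while the sample rows and the helper `attributes_of` are hypothetical and not part of any real DBMS catalog.

```python
import sqlite3

# In-memory database holding two of the catalog relations from the slide.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Relation_metadata (
    relation_name        TEXT PRIMARY KEY,
    number_of_attributes INTEGER,
    storage_organization TEXT,
    location             TEXT
);
CREATE TABLE Attribute_metadata (
    attribute_name TEXT,
    relation_name  TEXT REFERENCES Relation_metadata(relation_name),
    domain_type    TEXT,
    position       INTEGER,
    length         INTEGER,
    PRIMARY KEY (attribute_name, relation_name)
);
""")

# Hypothetical metadata describing a two-attribute Staff relation.
conn.execute(
    "INSERT INTO Relation_metadata VALUES ('Staff', 2, 'sequential', 'staff.dat')"
)
conn.executemany(
    "INSERT INTO Attribute_metadata VALUES (?, ?, ?, ?, ?)",
    [("staffNo", "Staff", "VARCHAR", 1, 5), ("lName", "Staff", "VARCHAR", 2, 15)],
)

def attributes_of(relation_name):
    """Return the attribute names of a relation in column order."""
    rows = conn.execute(
        "SELECT attribute_name FROM Attribute_metadata "
        "WHERE relation_name = ? ORDER BY position",
        (relation_name,),
    )
    return [name for (name,) in rows]

print(attributes_of("Staff"))
```

Because the catalog is itself a set of relations, it is queried with ordinary SELECT statements, which is the reason the slide gives for preferring this alternative.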




Chapter 5

Database Development


Main components of an information system.

Main stages of database system development lifecycle.

Main phases of database design: conceptual, logical, and physical design.

Software Depression

Last few decades have seen proliferation of software applications, many requiring constant maintenance involving:
- correcting faults,
- implementing new user requirements,
- modifying software to run on new or upgraded platforms.

Effort spent on maintenance began to absorb resources at an alarming rate.

Software Depression


As a result, many major software projects were late, over budget, unreliable, difficult to maintain, or performed poorly.

In the late 1960s, this led to the ‘software crisis’, now referred to as the ‘software depression’.

Software Depression


Major reasons for failure of software projects include:
- lack of a complete requirements specification;
- lack of appropriate development methodology;
- poor decomposition of design into manageable components.

Structured approach to development was proposed, called the Information Systems Lifecycle (ISLC).

Information System


Resources that enable collection, management, control, and dissemination of information throughout an organization.

Database is fundamental component of IS, and its development/usage should be viewed from perspective of the wider requirements of the organization.



Database System Development Lifecycle


Database planning

System definition

Requirements collection and analysis

Database design

DBMS selection (optional)

Database System Development Lifecycle


Application design

Prototyping (optional)

Implementation

Data conversion and loading

Testing

Operational maintenance

[Figure: Stages of the Database System Development Lifecycle. Database planning; system definition; requirements collection and analysis; database design (conceptual, logical, and physical), with DBMS selection (optional) and application design alongside; implementation; data conversion and loading; testing; operational maintenance.]

Database Planning


Management activities that allow stages of database system development lifecycle to be realized as efficiently and effectively as possible.

Must be integrated with overall IS strategy of the organization.

Database Planning


Database planning should also include development of standards that govern: how data will be collected, how the format should be specified, what necessary documentation will be needed, how design and implementation should proceed.

System Definition


Describes scope and boundaries of database system and the major user views.

User view defines what is required of a database system from perspective of:
- a particular job role (such as Manager or Supervisor), or
- enterprise application area (such as marketing, personnel, or stock control).



System Definition


Database application may have one or more user views.

Identifying user views helps ensure that no major users of the database are forgotten when developing requirements for new system.

User views also help in development of complex database system allowing requirements to be broken down into manageable pieces.

Representation of a Database System with Multiple User Views


Representation of a database system with multiple user views: user views (1,2, and 3) and (5 and 6) have overlapping requirements (shown as hatched areas), whereas user view 4 has distinct requirements.

Requirements Collection and Analysis


Process of collecting and analyzing information about the part of organization to be supported by the database system, and using this information to identify users’ requirements of new system.

Requirements Collection and Analysis


Information is gathered for each major user view including: a description of data used or generated; details of how data is to be used/generated; any additional requirements for new database system.

Information is analyzed to identify requirements to be included in new database system. Described in the requirements specification.

Database Design


Process of creating a design for a database that will support the enterprise’s mission statement and mission objectives for the required database system.

Database Design


Main approaches include: Top-down Bottom-up Inside-out Mixed



Database Design


Main purposes of data modeling include:
- to assist in understanding the meaning (semantics) of the data;
- to facilitate communication about the information requirements.

Building data model requires answering questions about entities, relationships, and attributes.

Database Design


A data model ensures we understand:
- each user’s perspective of the data;
- nature of the data itself, independent of its physical representations;
- use of data across user views.

Database Design


Three phases of database design:

Conceptual database design Logical database design Physical database design.

Conceptual Database Design


Process of constructing a model of the data used in an enterprise, independent of all physical considerations.

Data model is built using the information in users’ requirements specification.

Conceptual data model is source of information for logical design phase.

Logical Database Design


Process of constructing a model of the data used in an enterprise based on a specific data model (e.g. relational), but independent of a particular DBMS and other physical considerations.

Conceptual data model is refined and mapped on to a logical data model.

Physical Database Design


Process of producing a description of the database implementation on secondary storage.

Describes base relations, file organizations, and indexes used to achieve efficient access to data. Also describes any associated integrity constraints and security measures.

Tailored to a specific DBMS.



Three-Level ANSI-SPARC Architecture and Phases of Database Design


DBMS Selection (optional)


Selection of an appropriate DBMS to support the database system.

Undertaken at any time prior to logical design provided sufficient information is available regarding system requirements.

Main steps to selecting a DBMS: define Terms of Reference of study; shortlist two or three products; evaluate products; recommend selection and produce report.

Example - Evaluation of DBMS Product (optional)


Analysis of features for DBMS product evaluation.

Application Design


Design of user interface and application programs that use and process the database.

Database design and application design are parallel activities.

Includes two important activities: transaction design; user interface design.

Application Design – Transactions (optional)


An action, or series of actions, carried out by a single user or application program, which accesses or changes content of the database.

Should define and document the high-level characteristics of the transactions required.

Application Design – Transactions (optional)


Important characteristics of transactions: data to be used by the transaction; functional characteristics of the transaction; output of the transaction; importance to the users; expected rate of usage.

Three main types of transactions: retrieval, update, and mixed.





Prototyping (optional)

Building working model of a database system.

Purpose:
- to identify features of a system that work well, or are inadequate;
- to suggest improvements or even new features;
- to clarify the users’ requirements;
- to evaluate feasibility of a particular system design.

Implementation


Physical realization of the database and application designs.
- Use DDL to create database schemas and empty database files.
- Use DDL to create any specified user views.
- Use 3GL or 4GL to create the application programs. This will include the database transactions implemented using the DML, possibly embedded in a host programming language.

Data Conversion and Loading


Transferring any existing data into new database and converting any existing applications to run on new database.

Only required when new database system is replacing an old system.
- DBMS normally has utility that loads existing files into new database.
- May be possible to convert and use application programs from old system for use by new system.

Testing


Process of running the database system with intent of finding errors.

Use carefully planned test strategies and realistic data.

Testing cannot show absence of faults; it can show only that software faults are present.

Demonstrates that database and application programs appear to be working according to requirements.

Testing (optional)


Should also test usability of system.

Evaluation conducted against a usability specification.

Examples of criteria include: Learnability; Performance; Robustness; Recoverability; Adaptability.

Operational Maintenance


Process of monitoring and maintaining database system following installation.

Monitoring performance of system.
- If performance falls, may require tuning or reorganization of the database.

Maintaining and upgrading database application (when required).

Incorporating new requirements into database application.



Chapter 6

Relational Algebra



Meaning of the term relational completeness.

How to form queries in relational algebra.


Introduction


Relational algebra and relational calculus are formal languages associated with the relational model.

Informally, relational algebra is a (high-level) procedural language and relational calculus a non-procedural language.

However, formally both are equivalent to one another.


Relational Algebra


Relational algebra operations work on one or more relations to define another relation without changing the original relations.

Both operands and results are relations, so output from one operation can become input to another operation.

Allows expressions to be nested, just as in arithmetic. This property is called closure.


Relational Algebra


Five basic operations in relational algebra: Selection, Projection, Cartesian product, Union, and Set Difference.

These perform most of the data retrieval operations needed.

Also have Join, Intersection, and Division operations, which can be expressed in terms of 5 basic operations.


Relational Algebra Operations






Selection (or Restriction)

$\sigma_{\text{predicate}}(R)$

Works on a single relation R and defines a relation that contains only those tuples (rows) of R that satisfy the specified condition (predicate).


Example - Selection (or Restriction)

List all staff with a salary greater than £10,000.

$\sigma_{\text{salary} > 10000}(\text{Staff})$
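A minimal sketch of Selection in Python, assuming a relation is held as a list of dictionaries (one per tuple). The function name `select` and the three Staff rows are hypothetical and chosen only for illustration.

```python
def select(relation, predicate):
    """Selection: return a new relation with the tuples that satisfy predicate."""
    return [row for row in relation if predicate(row)]

# Hypothetical Staff tuples.
staff = [
    {"staffNo": "SG16", "lName": "Brown", "salary": 8300},
    {"staffNo": "SL21", "lName": "White", "salary": 30000},
    {"staffNo": "SG37", "lName": "Beech", "salary": 12000},
]

# Selection with predicate salary > 10000; the original relation is not changed.
well_paid = select(staff, lambda row: row["salary"] > 10000)
print([row["staffNo"] for row in well_paid])
```

The result is again a relation, so it can be fed to another operation, which is the closure property described earlier.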


Projection

$\Pi_{\text{col}_1, \ldots, \text{col}_n}(R)$

Works on a single relation R and defines a relation that contains a vertical subset of R, extracting the values of specified attributes and eliminating duplicates.


Example - Projection

Produce a list of salaries for all staff, showing only staffNo, fName, lName, and salary details.

$\Pi_{\text{staffNo, fName, lName, salary}}(\text{Staff})$


Union

$R \cup S$

Union of two relations R and S defines a relation that contains all the tuples of R, or S, or both R and S, duplicate tuples being eliminated.

R and S must be union-compatible.

If R and S have I and J tuples, respectively, union is obtained by concatenating them into one relation with a maximum of (I + J) tuples.


Example - Union

List all cities where there is either a branch office or a property for rent.

$\Pi_{\text{city}}(\text{Branch}) \cup \Pi_{\text{city}}(\text{PropertyForRent})$


Set Difference

$R - S$

Defines a relation consisting of the tuples that are in relation R, but not in S.

R and S must be union-compatible.


Example - Set Difference

List all cities where there is a branch office but no properties for rent.

$\Pi_{\text{city}}(\text{Branch}) - \Pi_{\text{city}}(\text{PropertyForRent})$


Intersection

$R \cap S$

Defines a relation consisting of the set of all tuples that are in both R and S.

R and S must be union-compatible.

Expressed using basic operations:

$R \cap S = R - (R - S)$


Example - Intersection

List all cities where there is both a branch office and at least one property for rent.

$\Pi_{\text{city}}(\text{Branch}) \cap \Pi_{\text{city}}(\text{PropertyForRent})$
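Union, Set Difference, and Intersection map directly onto Python sets of tuples. The sketch below assumes two union-compatible, single-attribute relations with hypothetical city values, and checks the identity that expresses Intersection through Set Difference.

```python
# Hypothetical projections on city; each tuple has one attribute.
branch_cities = {("London",), ("Aberdeen",), ("Glasgow",), ("Bristol",)}
property_cities = {("Aberdeen",), ("London",), ("Glasgow",)}

union = branch_cities | property_cities          # R union S, duplicates eliminated
difference = branch_cities - property_cities     # R - S
intersection = branch_cities & property_cities   # R intersect S

# Intersection written with the basic Set Difference operation only.
derived_intersection = branch_cities - (branch_cities - property_cities)

print(sorted(difference))
print(derived_intersection == intersection)
```

The union holds at most I + J tuples; here it holds fewer because three cities appear in both relations.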


Cartesian product

$R \times S$

Defines a relation that is the concatenation of every tuple of relation R with every tuple of relation S.


Example - Cartesian product

List the names and comments of all clients who have viewed a property for rent.

$(\Pi_{\text{clientNo, fName, lName}}(\text{Client})) \times (\Pi_{\text{clientNo, propertyNo, comment}}(\text{Viewing}))$


Example - Cartesian product and Selection

Use selection operation to extract those tuples where Client.clientNo = Viewing.clientNo.

$\sigma_{\text{Client.clientNo} = \text{Viewing.clientNo}}((\Pi_{\text{clientNo, fName, lName}}(\text{Client})) \times (\Pi_{\text{clientNo, propertyNo, comment}}(\text{Viewing})))$

Cartesian product and Selection can be reduced to a single operation called a Join.
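The two steps above can be sketched in Python, assuming relations are lists of plain tuples. The Client and Viewing rows are hypothetical; positions 0 and 3 of each concatenated tuple hold Client.clientNo and Viewing.clientNo.

```python
from itertools import product

# Hypothetical tuples: Client(clientNo, fName, lName)
client = [("CR76", "John", "Kay"), ("CR56", "Aline", "Stewart")]
# Viewing(clientNo, propertyNo, comment)
viewing = [
    ("CR56", "PA14", "too small"),
    ("CR76", "PG4", "too remote"),
    ("CR56", "PG4", None),
]

# Cartesian product: every Client tuple concatenated with every Viewing tuple.
cartesian = [c + v for c, v in product(client, viewing)]

# Selection with predicate Client.clientNo = Viewing.clientNo.
joined = [row for row in cartesian if row[0] == row[3]]

print(len(cartesian), len(joined))
```

The product has 2 × 3 tuples, most of them meaningless pairings; the selection keeps only those where the client numbers agree, which is what a Join computes in one operation.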


Join Operations

Join is a derivative of Cartesian product.

Equivalent to performing a Selection, using join predicate as selection formula, over Cartesian product of the two operand relations.

One of the most difficult operations to implement efficiently in an RDBMS and one reason why RDBMSs have intrinsic performance problems.


Join Operations

Various forms of join operation:
- Theta join
- Equijoin (a particular type of Theta join)
- Natural join
- Outer join
- Semijoin


Theta join ($\theta$-join)

$R \bowtie_F S$

Defines a relation that contains tuples satisfying the predicate F from the Cartesian product of R and S.

The predicate F is of the form $R.a_i \; \theta \; S.b_i$ where $\theta$ may be one of the comparison operators ($<, \le, >, \ge, =, \ne$).


Theta join ($\theta$-join)

Can rewrite Theta join using basic Selection and Cartesian product operations.

$R \bowtie_F S = \sigma_F(R \times S)$

Degree of a Theta join is sum of degrees of the operand relations R and S. If predicate F contains only equality (=), the term Equijoin is used.


Example - Equijoin

List the names and comments of all clients who have viewed a property for rent.

$(\Pi_{\text{clientNo, fName, lName}}(\text{Client})) \bowtie_{\text{Client.clientNo} = \text{Viewing.clientNo}} (\Pi_{\text{clientNo, propertyNo, comment}}(\text{Viewing}))$


Natural join

$R \bowtie S$

An Equijoin of the two relations R and S over all common attributes x. One occurrence of each common attribute is eliminated from the result.


Example - Natural join

List the names and comments of all clients who have viewed a property for rent.

$(\Pi_{\text{clientNo, fName, lName}}(\text{Client})) \bowtie (\Pi_{\text{clientNo, propertyNo, comment}}(\text{Viewing}))$
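A minimal nested-loop sketch of the Natural join, assuming each tuple is a dictionary keyed by attribute name. The function `natural_join` and the sample rows are hypothetical; a real RDBMS would use far more efficient join algorithms.

```python
def natural_join(r, s):
    """Equijoin over all common attribute names, keeping one copy of each."""
    result = []
    for r_row in r:
        for s_row in s:
            common = r_row.keys() & s_row.keys()
            if all(r_row[a] == s_row[a] for a in common):
                # Merging the dictionaries keeps a single clientNo attribute.
                result.append({**r_row, **s_row})
    return result

# Hypothetical tuples; CR74 has no viewing and so drops out of the result.
client = [
    {"clientNo": "CR76", "fName": "John"},
    {"clientNo": "CR56", "fName": "Aline"},
    {"clientNo": "CR74", "fName": "Mike"},
]
viewing = [
    {"clientNo": "CR56", "propertyNo": "PA14"},
    {"clientNo": "CR76", "propertyNo": "PG4"},
]

joined = natural_join(client, viewing)
for row in joined:
    print(row)
```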


Outer join

To display rows in the result that do not have matching values in the join column, use Outer join.

$R ⟕ S$

(Left) outer join is join in which tuples from R that do not have matching values in common columns of S are also included in result relation.


Example - Left Outer join

Produce a status report on property viewings.

$\Pi_{\text{propertyNo, street, city}}(\text{PropertyForRent}) ⟕ \text{Viewing}$
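The left outer join can be sketched the same way, again assuming dictionary tuples. Unmatched tuples of R are kept and padded with `None` (standing in for null) for the attributes that exist only in S. The function name, its `s_attributes` parameter, and the rows are hypothetical.

```python
def left_outer_join(r, s, s_attributes):
    """Natural join of r and s that also keeps unmatched tuples of r.

    Unmatched tuples are padded with None for attributes found only in s.
    """
    result = []
    for r_row in r:
        matches = [
            s_row for s_row in s
            if all(r_row[a] == s_row[a] for a in r_row.keys() & s_row.keys())
        ]
        if matches:
            result.extend({**r_row, **s_row} for s_row in matches)
        else:
            padding = {a: None for a in s_attributes if a not in r_row}
            result.append({**r_row, **padding})
    return result

# Hypothetical tuples: PL94 has never been viewed.
properties = [
    {"propertyNo": "PA14", "city": "Aberdeen"},
    {"propertyNo": "PL94", "city": "London"},
]
viewings = [{"propertyNo": "PA14", "clientNo": "CR56"}]

report = left_outer_join(properties, viewings, ["propertyNo", "clientNo"])
for row in report:
    print(row)
```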


Semijoin

$R \ltimes_F S$

Defines a relation that contains the tuples of R that participate in the join of R with S.

Can rewrite Semijoin using Projection and Join:

$R \ltimes_F S = \Pi_A(R \bowtie_F S)$

where A is the set of all attributes of R.


Example - Semijoin

List complete details of all staff who work at the branch in Glasgow.

$\text{Staff} \ltimes_{\text{Staff.branchNo} = \text{Branch.branchNo}} (\sigma_{\text{city} = \text{'Glasgow'}}(\text{Branch}))$


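Division

$R \div S$

Defines a relation over the attributes C that consists of set of tuples from R that match combination of every tuple in S. Here C is the set of attributes of R that are not attributes of S.

Expressed using basic operations:

$T_1 \leftarrow \Pi_C(R)$

$T_2 \leftarrow \Pi_C((S \times T_1) - R)$

$T \leftarrow T_1 - T_2$


Example - Division

Identify all clients who have viewed all properties with three rooms.

$(\Pi_{\text{clientNo, propertyNo}}(\text{Viewing})) \div (\Pi_{\text{propertyNo}}(\sigma_{\text{rooms} = 3}(\text{PropertyForRent})))$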
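The three-step expression for Division can be traced with Python sets. This sketch assumes R(clientNo, propertyNo) and S(propertyNo), so C = {clientNo}; the tuples are hypothetical. The product is built with clientNo first so that its tuples line up with the attribute order of R.

```python
from itertools import product

# Hypothetical R(clientNo, propertyNo) and S(propertyNo).
r = {("CR56", "PA14"), ("CR56", "PG4"), ("CR76", "PG4"), ("CR62", "PA14")}
s = {("PA14",), ("PG4",)}

# T1 <- projection of R on C: every client that appears in R.
t1 = {(client,) for client, _ in r}

# All (client, property) pairings that would have to exist for a full match.
all_pairs = {c + p for c, p in product(t1, s)}

# T2 <- clients with at least one required pairing missing from R.
t2 = {(client,) for client, _ in all_pairs - r}

# T <- T1 - T2: clients paired with every tuple of S.
t = t1 - t2
print(t)
```

Only CR56 is paired with both PA14 and PG4, so it is the only client left after the disqualified clients in T2 are removed.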


Aggregate Operations

$\Im_{AL}(R)$

Applies aggregate function list, AL, to R to define a relation over the aggregate list.

AL contains one or more (<aggregate_function>, <attribute>) pairs.

Main aggregate functions are: COUNT, SUM, AVG, MIN, and MAX.


Example – Aggregate Operations

How many properties cost more than £350 per month to rent?

$\rho_R(\text{myCount}) \; \Im_{\text{COUNT propertyNo}}(\sigma_{\text{rent} > 350}(\text{PropertyForRent}))$

Grouping Operation

${}_{GA}\Im_{AL}(R)$

Groups tuples of R by grouping attributes, GA, and then applies aggregate function list, AL, to define a new relation.

AL contains one or more (<aggregate_function>, <attribute>) pairs.

Resulting relation contains the grouping attributes, GA, along with results of each of the aggregate functions.


Example – Grouping Operation

Find the number of staff working in each branch and the sum of their salaries.

$\rho_R(\text{branchNo, myCount, mySum}) \; {}_{\text{branchNo}}\Im_{\text{COUNT staffNo, SUM salary}}(\text{Staff})$
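The grouping operation can be sketched in Python by first partitioning tuples on the grouping attribute and then applying the aggregates per group. The Staff rows are hypothetical, and the count assumes no staffNo is null.

```python
from collections import defaultdict

# Hypothetical Staff tuples.
staff = [
    {"staffNo": "SL21", "branchNo": "B005", "salary": 30000},
    {"staffNo": "SG37", "branchNo": "B003", "salary": 12000},
    {"staffNo": "SG14", "branchNo": "B003", "salary": 18000},
]

# Step 1: group tuples by the grouping attribute branchNo.
groups = defaultdict(list)
for row in staff:
    groups[row["branchNo"]].append(row)

# Step 2: apply COUNT staffNo and SUM salary to each group.
result = [
    {
        "branchNo": branch_no,
        "myCount": len(rows),
        "mySum": sum(r["salary"] for r in rows),
    }
    for branch_no, rows in sorted(groups.items())
]
for row in result:
    print(row)
```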

Other Languages


Transform-oriented languages are non-procedural languages that use relations to transform input data into required outputs (e.g. SQL).

Graphical languages provide user with picture of the structure of the relation. User fills in example of what is wanted and system returns required data in that format (e.g. QBE).


Other Languages


4GLs can create complete customized application using limited set of commands in a user-friendly, often menu-driven environment.

Some systems accept a form of natural language, sometimes called a 5GL, although this development is still at an early stage.




Chapter 7

SQL: Data Manipulation and Data Definition


Purpose and importance of SQL.

How to retrieve data from database using SELECT and:
- Use compound WHERE conditions.
- Sort query results using ORDER BY.
- Use aggregate functions.
- Group data using GROUP BY and HAVING.
- Use subqueries.
- Join tables together.
- Perform set operations (UNION, INTERSECT, EXCEPT).

How to update database using INSERT, UPDATE, and DELETE.

Data types supported by SQL standard.

Purpose of integrity enhancement feature of SQL.

How to define integrity constraints using SQL.

How to use the integrity enhancement feature in the CREATE and ALTER TABLE statements.


Objectives of SQL

Ideally, database language should allow user to:
- create the database and relation structures;
- perform insertion, modification, deletion of data from relations;
- perform simple and complex queries.

Must perform these tasks with minimal user effort and command structure/syntax must be easy to learn.

It must be portable.


Objectives of SQL

SQL is a transform-oriented language with 2 major components:
- A DDL for defining database structure.
- A DML for retrieving and updating data.

Until SQL:1999, SQL did not contain flow of control commands. These had to be implemented using a programming or job-control language, or interactively by the decisions of user.


Objectives of SQL

SQL is relatively easy to learn:
- it is non-procedural - you specify what information you require, rather than how to get it;
- it is essentially free-format.




Objectives of SQL

Consists of standard English words:

1) CREATE TABLE Staff (staffNo VARCHAR(5), lName VARCHAR(15), salary DECIMAL(7,2));

2) INSERT INTO Staff VALUES ('SG16', 'Brown', 8300);

3) SELECT staffNo, lName, salary
   FROM Staff
   WHERE salary > 10000;
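The three statements can be tried out with Python's built-in `sqlite3` module. This is a sketch using SQLite as a stand-in DBMS (SQLite accepts the declared column types but treats them loosely); the second INSERT is a hypothetical extra row, added so that the query returns something.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# 1) DDL: create the relation structure.
conn.execute(
    "CREATE TABLE Staff (staffNo VARCHAR(5), lName VARCHAR(15), salary DECIMAL(7,2))"
)

# 2) DML: insert the row from the slide plus one hypothetical row.
conn.execute("INSERT INTO Staff VALUES ('SG16', 'Brown', 8300)")
conn.execute("INSERT INTO Staff VALUES ('SL21', 'White', 30000)")

# 3) DML: retrieve staff earning more than 10000.
rows = conn.execute(
    "SELECT staffNo, lName, salary FROM Staff WHERE salary > 10000"
).fetchall()
print(rows)
```

Brown's salary of 8300 does not satisfy the WHERE condition, so only the second row comes back.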


Objectives of SQL

Can be used by range of users including DBAs, management, application developers, and other types of end users.

An ISO standard now exists for SQL, making it both the formal and de facto standard language for relational databases.


History of SQL

In 1974, D. Chamberlin (IBM San Jose Laboratory) defined language called ‘Structured English Query Language’ (SEQUEL).

A revised version, SEQUEL/2, was defined in 1976 but name was subsequently changed to SQL for legal reasons.


History of SQL

Still pronounced ‘see-quel’, though official pronunciation is ‘S-Q-L’.

IBM subsequently produced a prototype DBMS called System R, based on SEQUEL/2.

Roots of SQL, however, are in SQUARE (Specifying Queries as Relational Expressions), which predates System R project.


History of SQL

In late 70s, ORACLE appeared and was probably first commercial RDBMS based on SQL.

In 1987, ANSI and ISO published an initial standard for SQL.

In 1989, ISO published an addendum that defined an ‘Integrity Enhancement Feature’.

In 1992, first major revision to ISO standard occurred, referred to as SQL2 or SQL/92.

In 1999, SQL:1999 was released with support for object-oriented data management.

In late 2003, SQL:2003 was released.


Importance of SQL


SQL has become part of application architectures such as IBM’s Systems Application Architecture.

It is strategic choice of many large and influential organizations (e.g. X/OPEN).

SQL is Federal Information Processing Standard (FIPS) to which conformance is required for all sales of databases to American Government.




Importance of SQL

SQL is used in other standards and even influences development of other standards as a definitional tool. Examples include:
- ISO’s Information Resource Directory System (IRDS) Standard
- Remote Data Access (RDA) Standard.


Writing SQL Commands

SQL statement consists of reserved words and user-defined words.

– Reserved words are a fixed part of SQL and must be spelt exactly as required and cannot be split across lines.

– User-defined words are made up by user and represent names of various database objects such as relations, columns, views.


Most components of an SQL statement are case insensitive, except for literal character data.

More readable with indentation and lineation:
- Each clause should begin on a new line.
- Start of a clause should line up with start of other clauses.
- If clause has several parts, should each appear on a separate line and be indented under start of clause.


Use extended form of BNF notation:

- Upper-case letters represent reserved words.
- Lower-case letters represent user-defined words.
- | indicates a choice among alternatives.
- Curly braces indicate a required element.
- Square brackets indicate an optional element.
- … indicates optional repetition (0 or more).


Literals

Literals are constants used in SQL statements.

All non-numeric literals must be enclosed in single quotes (e.g. 'London').

All numeric literals must not be enclosed in quotes (e.g. 650.00).


SELECT Statement

SELECT [DISTINCT | ALL]
    {* | [columnExpression [AS newName]] [,...] }
FROM TableName [alias] [, ...]
[WHERE condition]
[GROUP BY columnList] [HAVING condition]
[ORDER BY columnList]


SELECT Statement

FROM      Specifies table(s) to be used.
WHERE     Filters rows.
GROUP BY  Forms groups of rows with same column value.
HAVING    Filters groups subject to some condition.
SELECT    Specifies which columns are to appear in output.
ORDER BY  Specifies the order of the output.


SELECT Statement

Order of the clauses cannot be changed.

Only SELECT and FROM are mandatory.


Example 1 All Columns, All Rows

List full details of all staff.

SELECT staffNo, fName, lName, address, position, sex, DOB, salary, branchNo
FROM Staff;

Can use * as an abbreviation for ‘all columns’:

SELECT *
FROM Staff;


Example 1 All Columns, All Rows


Result table for Example 1

Example 2 Specific Columns, All Rows

Produce a list of salaries for all staff, showing only staff number, first and last names, and salary.

SELECT staffNo, fName, lName, salary
FROM Staff;
Example 3 Use of DISTINCT

List the property numbers of all properties that have been viewed.

SELECT propertyNo
FROM Viewing;


Example 3 Use of DISTINCT

Use DISTINCT to eliminate duplicates:

SELECT DISTINCT propertyNo
FROM Viewing;


Example 4 Calculated Fields

Produce list of monthly salaries for all staff, showing staff number, first/last name, and salary.

SELECT staffNo, fName, lName, salary/12
FROM Staff;


Example 4 Calculated Fields

To name column, use AS clause:

SELECT staffNo, fName, lName, salary/12 AS monthlySalary
FROM Staff;


Example 5 Comparison Search Condition

List all staff with a salary greater than 10,000.

SELECT staffNo, fName, lName, position, salary
FROM Staff
WHERE salary > 10000;


Example 6 Compound Comparison Search Condition

List addresses of all branch offices in London or Glasgow.

SELECT *
FROM Branch
WHERE city = 'London' OR city = 'Glasgow';





Example 7 Range Search Condition

List all staff with a salary between 20,000 and 30,000.

SELECT staffNo, fName, lName, position, salary
FROM Staff
WHERE salary BETWEEN 20000 AND 30000;

BETWEEN test includes the endpoints of range.


Also a negated version NOT BETWEEN.

BETWEEN does not add much to SQL’s expressive power. Could also write:

SELECT staffNo, fName, lName, position, salary
FROM Staff
WHERE salary >= 20000 AND salary <= 30000;

Useful, though, for a range of values.
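The equivalence of the two forms, and the fact that BETWEEN includes both endpoints, can be checked with a small `sqlite3` sketch. The cut-down Staff table and its five rows are hypothetical, chosen to sit just outside, on, and inside the range.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Staff (staffNo TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO Staff VALUES (?, ?)",
    [("S1", 19999), ("S2", 20000), ("S3", 25000), ("S4", 30000), ("S5", 30001)],
)

between = conn.execute(
    "SELECT staffNo FROM Staff "
    "WHERE salary BETWEEN 20000 AND 30000 ORDER BY staffNo"
).fetchall()

explicit = conn.execute(
    "SELECT staffNo FROM Staff "
    "WHERE salary >= 20000 AND salary <= 30000 ORDER BY staffNo"
).fetchall()

not_between = conn.execute(
    "SELECT staffNo FROM Staff "
    "WHERE salary NOT BETWEEN 20000 AND 30000 ORDER BY staffNo"
).fetchall()

print(between == explicit, not_between)
```

S2 and S4 sit exactly on the endpoints and are returned by both positive forms; only S1 and S5 satisfy NOT BETWEEN.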


Example 8 Set Membership

List all managers and supervisors.

SELECT staffNo, fName, lName, position
FROM Staff
WHERE position IN ('Manager', 'Supervisor');


Example 8 Set Membership

There is a negated version (NOT IN).

IN does not add much to SQL’s expressive power. Could have expressed this as:

SELECT staffNo, fName, lName, position
FROM Staff
WHERE position = 'Manager' OR position = 'Supervisor';

IN is more efficient when set contains many values.


Example 9 Pattern Matching

Find all owners with the string ‘Glasgow’ in their address.

SELECT ownerNo, fName, lName, address, telNo
FROM PrivateOwner
WHERE address LIKE '%Glasgow%';


Example 9 Pattern Matching

SQL has two special pattern matching symbols:
- %: sequence of zero or more characters;
- _ (underscore): any single character.

LIKE '%Glasgow%' means a sequence of characters of any length containing ‘Glasgow’.


Example 10 NULL Search Condition

List details of all viewings on property PG4 where a comment has not been supplied.

There are 2 viewings for property PG4, one with and one without a comment.

Have to test for null explicitly using special keyword IS NULL:

SELECT clientNo, viewDate
FROM Viewing
WHERE propertyNo = 'PG4' AND comment IS NULL;


Example 10 NULL Search Condition

Negated version (IS NOT NULL) can test for non-null values.

Example 11 Single Column Ordering

List salaries for all staff, arranged in descending order of salary.

SELECT staffNo, fName, lName, salary
FROM Staff
ORDER BY salary DESC;


Example 11 Single Column Ordering



Example 12 Multiple Column Ordering

Produce abbreviated list of properties in order of property type.

SELECT propertyNo, type, rooms, rent
FROM PropertyForRent
ORDER BY type;


Result table for Example 12 with one sort key


Four flats in this list - as no minor sort key specified, system arranges these rows in any order it chooses.

To arrange in order of rent, specify minor order:

SELECT propertyNo, type, rooms, rent
FROM PropertyForRent
ORDER BY type, rent DESC;




Result table for Example 12 with two sort keys

SELECT Statement - Aggregates

ISO standard defines five aggregate functions:

COUNT returns number of values in specified column.

SUM returns sum of values in specified column.

AVG returns average of values in specified column.

MIN returns smallest value in specified column.

MAX returns largest value in specified column.


Each operates on a single column of a table and returns a single value.

COUNT, MIN, and MAX apply to numeric and non-numeric fields, but SUM and AVG may be used on numeric fields only.

Apart from COUNT(*), each function eliminates nulls first and operates only on remaining non-null values.


COUNT(*) counts all rows of a table, regardless of whether nulls or duplicate values occur.

Can use DISTINCT before column name to eliminate duplicates.

DISTINCT has no effect with MIN/MAX, but may have with SUM/AVG.
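The handling of nulls and duplicates can be observed with a small `sqlite3` sketch. The four Staff rows are hypothetical: two share a salary and one has a null salary (`None` in Python).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Staff (staffNo TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO Staff VALUES (?, ?)",
    [("S1", 10000), ("S2", 10000), ("S3", None), ("S4", 20000)],
)

(count_all, count_salary, count_distinct,
 total, total_distinct, average) = conn.execute(
    "SELECT COUNT(*), COUNT(salary), COUNT(DISTINCT salary), "
    "       SUM(salary), SUM(DISTINCT salary), AVG(salary) "
    "FROM Staff"
).fetchone()

print(count_all, count_salary, count_distinct)
print(total, total_distinct, average)
```

COUNT(*) sees all four rows, COUNT(salary) skips the null, and DISTINCT additionally collapses the repeated 10000. AVG divides by three, the number of non-null salaries, rather than by four.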





Aggregate functions can be used only in SELECT list and in HAVING clause.

If SELECT list includes an aggregate function and there is no GROUP BY clause, SELECT list cannot reference a column outside an aggregate function. For example, the following is illegal:

SELECT staffNo, COUNT(salary)
FROM Staff;


Example 13 Use of COUNT(*)

How many properties cost more than £350 per month to rent?

SELECT COUNT(*) AS myCount
FROM PropertyForRent
WHERE rent > 350;

Example 14 Use of COUNT(DISTINCT)

How many different properties were viewed in May ‘04?

SELECT COUNT(DISTINCT propertyNo) AS myCount
FROM Viewing
WHERE viewDate BETWEEN '1-May-04' AND '31-May-04';

Example 15 Use of COUNT and SUM

Find number of Managers and sum of their salaries.

SELECT COUNT(staffNo) AS myCount, SUM(salary) AS mySum
FROM Staff
WHERE position = 'Manager';

Example 16 Use of MIN, MAX, AVG

Find minimum, maximum, and average staff salary.

SELECT MIN(salary) AS myMin, MAX(salary) AS myMax, AVG(salary) AS myAvg
FROM Staff;

SELECT Statement - Grouping

Use GROUP BY clause to get sub-totals.

SELECT and GROUP BY closely integrated: each item in SELECT list must be single-valued per group, and SELECT clause may only contain:
- column names
- aggregate functions
- constants
- expression involving combinations of the above.


SELECT Statement - Grouping

All column names in SELECT list must appear in GROUP BY clause unless name is used only in an aggregate function.

If WHERE is used with GROUP BY, WHERE is applied first, then groups are formed from remaining rows satisfying predicate.

ISO considers two nulls to be equal for purposes of GROUP BY.


Example 17 Use of GROUP BY

Find number of staff in each branch and their total salaries.

SELECT branchNo, COUNT(staffNo) AS myCount, SUM(salary) AS mySum
FROM Staff
GROUP BY branchNo
ORDER BY branchNo;


Example 17 Use of GROUP BY


Restricted Groupings – HAVING clause

HAVING clause is designed for use with GROUP BY to restrict groups that appear in final result table.

Similar to WHERE, but WHERE filters individual rows whereas HAVING filters groups.

Column names in HAVING clause must also appear in the GROUP BY list or be contained within an aggregate function.


Example 18 Use of HAVING

For each branch with more than 1 member of staff, find number of staff in each branch and sum of their salaries.

SELECT branchNo, COUNT(staffNo) AS myCount, SUM(salary) AS mySum
FROM Staff
GROUP BY branchNo
HAVING COUNT(staffNo) > 1
ORDER BY branchNo;
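The query of Example 18 can be run against a small hypothetical Staff table with `sqlite3`. The six rows below are illustrative only; branch B007 has a single member of staff, so its group is removed by the HAVING clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Staff (staffNo TEXT, branchNo TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO Staff VALUES (?, ?, ?)",
    [
        ("SL21", "B005", 30000),
        ("SL41", "B005", 9000),
        ("SG37", "B003", 12000),
        ("SG14", "B003", 18000),
        ("SG5", "B003", 24000),
        ("SA9", "B007", 9000),
    ],
)

rows = conn.execute(
    "SELECT branchNo, COUNT(staffNo) AS myCount, SUM(salary) AS mySum "
    "FROM Staff "
    "GROUP BY branchNo "
    "HAVING COUNT(staffNo) > 1 "
    "ORDER BY branchNo"
).fetchall()
for row in rows:
    print(row)
```

Groups are formed first, then HAVING discards whole groups; a WHERE clause would instead have removed individual rows before grouping.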


Example 18 Use of HAVING




Subqueries

Some SQL statements can have a SELECT embedded within them.

A subselect can be used in WHERE and HAVING clauses of an outer SELECT, where it is called a subquery or nested query.

Subselects may also appear in INSERT, UPDATE, and DELETE statements.


Example 19 Subquery with Equality

List staff who work in branch at ‘163 Main St’.

SELECT staffNo, fName, lName, position
FROM Staff
WHERE branchNo =
    (SELECT branchNo
     FROM Branch
     WHERE street = '163 Main St');

Inner SELECT finds branch number for branch at ‘163 Main St’ (‘B003’).

Outer SELECT then retrieves details of all staff who work at this branch.

Outer SELECT then becomes:

SELECT staffNo, fName, lName, position
FROM Staff
WHERE branchNo = 'B003';





Example 20 Subquery with Aggregate

List all staff whose salary is greater than the average salary, and show by how much.

SELECT staffNo, fName, lName, position,
       salary - (SELECT AVG(salary) FROM Staff) AS salDiff
FROM Staff
WHERE salary >
    (SELECT AVG(salary)
     FROM Staff);

Cannot write ‘WHERE salary > AVG(salary)’, because aggregate functions cannot be used in a WHERE clause.

Instead, use subquery to find average salary (17000), and then use outer SELECT to find those staff with salary greater than this:

SELECT staffNo, fName, lName, position,
       salary - 17000 AS salDiff
FROM Staff
WHERE salary > 17000;
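The aggregate subquery can be tried with Python's sqlite3 module. This is a sketch on hypothetical rows, chosen so that the average salary comes out at 17000 as in the slide; SQLite stands in for whatever DBMS the course uses, and its error message for the rejected form is its own.

```python
import sqlite3

# Hypothetical rows whose average salary is 17000, matching the figure on the slide.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Staff (staffNo TEXT PRIMARY KEY, salary INTEGER)")
conn.executemany(
    "INSERT INTO Staff VALUES (?, ?)",
    [("SL21", 30000), ("SL41", 9000), ("SG37", 12000),
     ("SG14", 18000), ("SG5", 24000), ("SA9", 9000)],
)

# The subquery is evaluated to a single value, which the outer SELECT then uses.
above_avg = conn.execute(
    """
    SELECT staffNo, salary - (SELECT AVG(salary) FROM Staff) AS salDiff
    FROM Staff
    WHERE salary > (SELECT AVG(salary) FROM Staff)
    ORDER BY staffNo
    """
).fetchall()

# Using the aggregate directly in WHERE is rejected by the SQL engine.
try:
    conn.execute("SELECT staffNo FROM Staff WHERE salary > AVG(salary)")
    rejected = False
except sqlite3.Error:
    rejected = True

print(above_avg)
print(rejected)
```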







Subquery Rules

ORDER BY clause may not be used in a subquery (although it may be used in outermost SELECT).

Subquery SELECT list must consist of a single column name or expression, except for subqueries that use EXISTS.

By default, column names refer to table name in FROM clause of subquery. Can refer to a table in FROM using an alias.

When subquery is an operand in a comparison, subquery must appear on right-hand side.

A subquery may not be used as an operand in an expression. (These last two restrictions come from the original ISO standard; later versions relax them, which is what allows the subquery inside the salDiff expression in Example 20.)


Example 21 Nested subquery: use of IN

List properties handled by staff at ‘163 Main St’.

SELECT propertyNo, street, city, postcode, type, rooms, rent
FROM PropertyForRent
WHERE staffNo IN
    (SELECT staffNo
     FROM Staff
     WHERE branchNo =
        (SELECT branchNo
         FROM Branch
         WHERE street = '163 Main St'));



ANY and ALL

ANY and ALL may be used with subqueries that produce a single column of numbers.

With ALL, condition will only be true if it is satisfied by all values produced by subquery.

With ANY, condition will be true if it is satisfied by any values produced by subquery.

If subquery is empty, ALL returns true, ANY returns false.

SOME may be used in place of ANY.




Example 22 Use of ANY/SOME

Find staff whose salary is larger than salary of at least one member of staff at branch B003.

SELECT staffNo, fName, lName, position, salary
FROM Staff
WHERE salary > SOME
    (SELECT salary
     FROM Staff
     WHERE branchNo = 'B003');

Inner query produces set {12000, 18000, 24000} and outer query selects those staff whose salaries are greater than at least one of the values in this set (that is, greater than the minimum, 12000).



Example 23 Use of ALL

Find staff whose salary is larger than salary of every member of staff at branch B003.

SELECT staffNo, fName, lName, position, salary
FROM Staff
WHERE salary > ALL
    (SELECT salary
     FROM Staff
     WHERE branchNo = 'B003');
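The meaning of SOME/ANY and ALL, including the empty-subquery case, maps directly onto Python's built-in any() and all(). The sketch below models the two queries in plain Python rather than running SQL; the staff rows are hypothetical (only the B003 salaries 12000, 18000 and 24000 come from the slides), the branch 'B999' is an invented branch with no staff, and NULL handling is deliberately ignored.

```python
# Hypothetical (staffNo, salary, branchNo) rows; the B003 salaries match the slide.
staff = [
    ("SL21", 30000, "B005"),
    ("SL41", 9000, "B005"),
    ("SG37", 12000, "B003"),
    ("SG14", 18000, "B003"),
    ("SG5", 24000, "B003"),
    ("SA9", 9000, "B007"),
]


def salaries_at(branch):
    """Plays the role of the subquery: SELECT salary FROM Staff WHERE branchNo = branch."""
    return [salary for _, salary, b in staff if b == branch]


def greater_than_some(branch):
    """WHERE salary > SOME (subquery): true if any comparison holds."""
    subquery = salaries_at(branch)
    return [no for no, salary, _ in staff if any(salary > s for s in subquery)]


def greater_than_all(branch):
    """WHERE salary > ALL (subquery): true only if every comparison holds."""
    subquery = salaries_at(branch)
    return [no for no, salary, _ in staff if all(salary > s for s in subquery)]


print(greater_than_some("B003"))  # staff earning more than 12000
print(greater_than_all("B003"))   # staff earning more than 24000
print(greater_than_some("B999"))  # empty subquery: SOME is false for every row
print(greater_than_all("B999"))   # empty subquery: ALL is true for every row
```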





Multi-Table Queries

Can use subqueries provided result columns come from same table.

If result columns come from more than one table must use a join.

To perform join, include more than one table in FROM clause.

Use comma as separator and typically include WHERE clause to specify join column(s).

Also possible to use an alias for a table named in FROM clause.

Alias is separated from table name with a space.

Alias can be used to qualify column names when there is ambiguity.




Example 24 Simple Join

List names of all clients who have viewed a property along with any comment supplied.

SELECT c.clientNo, fName, lName, propertyNo, comment
FROM Client c, Viewing v
WHERE c.clientNo = v.clientNo;

Only those rows from both tables that have identical values in the clientNo columns (c.clientNo = v.clientNo) are included in result.

Equivalent to equi-join in relational algebra.


Alternative JOIN Constructs

SQL provides alternative ways to specify joins:

FROM Client c JOIN Viewing v ON c.clientNo = v.clientNo
FROM Client JOIN Viewing USING (clientNo)
FROM Client NATURAL JOIN Viewing

In each case, FROM replaces original FROM and WHERE. However, first produces table with two identical clientNo columns.


Example 25 Sorting a join

For each branch, list numbers and names of staff who manage properties, and properties they manage.

SELECT s.branchNo, s.staffNo, fName, lName, propertyNo
FROM Staff s, PropertyForRent p
WHERE s.staffNo = p.staffNo
ORDER BY s.branchNo, s.staffNo, propertyNo;



Example 26 Three Table Join

For each branch, list staff who manage properties, including city in which branch is located and properties they manage.

SELECT b.branchNo, b.city, s.staffNo, fName, lName, propertyNo
FROM Branch b, Staff s, PropertyForRent p
WHERE b.branchNo = s.branchNo AND
      s.staffNo = p.staffNo
ORDER BY b.branchNo, s.staffNo, propertyNo;

Alternative formulation for FROM and WHERE:

FROM (Branch b JOIN Staff s USING (branchNo)) AS bs
     JOIN PropertyForRent p USING (staffNo)


Example 27 Multiple Grouping Columns

Find number of properties handled by each staff member.

SELECT s.branchNo, s.staffNo, COUNT(*) AS myCount
FROM Staff s, PropertyForRent p
WHERE s.staffNo = p.staffNo
GROUP BY s.branchNo, s.staffNo
ORDER BY s.branchNo, s.staffNo;


Computing a Join

Procedure for generating results of a join is:

1. Form Cartesian product of the tables named in FROM clause.

2. If there is a WHERE clause, apply the search condition to each row of the product table, retaining those rows that satisfy the condition.

3. For each remaining row, determine value of each item in SELECT list to produce a single row in result table.

4. If DISTINCT has been specified, eliminate any duplicate rows from the result table.

5. If there is an ORDER BY clause, sort result table as required.

SQL provides special format of SELECT for Cartesian product:

SELECT [DISTINCT | ALL] {* | columnList}
FROM Table1 CROSS JOIN Table2
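The five steps can be followed literally in a few lines of Python. This is a conceptual sketch only: the rows are hypothetical, and a real DBMS would normally choose a far cheaper evaluation strategy than building the full Cartesian product.

```python
from itertools import product

# Hypothetical rows for two of the tables used in the join examples.
client = [
    {"clientNo": "CR56", "fName": "Aline"},
    {"clientNo": "CR62", "fName": "Mary"},
    {"clientNo": "CR74", "fName": "Mike"},
]
viewing = [
    {"clientNo": "CR56", "propertyNo": "PA14"},
    {"clientNo": "CR56", "propertyNo": "PG4"},
    {"clientNo": "CR62", "propertyNo": "PA14"},
]

# 1. Cartesian product of the tables named in FROM.
pairs = list(product(client, viewing))

# 2. WHERE: keep only the combinations that satisfy the search condition.
matched = [(c, v) for c, v in pairs if c["clientNo"] == v["clientNo"]]

# 3. SELECT list: compute one result row per remaining combination.
projected = [(c["clientNo"], c["fName"], v["propertyNo"]) for c, v in matched]

# 4. DISTINCT: drop duplicate rows (dict keys preserve first-seen order).
distinct = list(dict.fromkeys(projected))

# 5. ORDER BY: sort the result table.
result = sorted(distinct)

print(len(pairs), len(matched))
print(result)
```

Client CR74 has no viewing, so none of its three product rows survives step 2; this is the behaviour that outer joins are designed to change.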


Outer Joins

If one row of a joined table is unmatched, row is omitted from result table.

Outer join operations retain rows that do not satisfy the join condition.

Consider following tables:




Outer Joins

The (inner) join of these two tables:

SELECT b.*, p.*
FROM Branch1 b, PropertyForRent1 p
WHERE b.bCity = p.pCity;

Result table for inner join of Branch1 and PropertyForRent1 tables

Result table has two rows where cities are same.

There are no rows corresponding to the branch in Bristol or the property in Aberdeen.

To include unmatched rows in result table, use an Outer join.


Example 28 Left Outer Join

List branches and properties that are in same city along with any unmatched branches.

SELECT b.*, p.*
FROM Branch1 b LEFT JOIN
     PropertyForRent1 p ON b.bCity = p.pCity;

Includes those rows of first (left) table unmatched with rows from second (right) table.

Columns from second table are filled with NULLs.


Example 29 Right Outer Join

List branches and properties in same city and any unmatched properties.

SELECT b.*, p.*
FROM Branch1 b RIGHT JOIN
     PropertyForRent1 p ON b.bCity = p.pCity;

Right Outer join includes those rows of second (right) table that are unmatched with rows from first (left) table.

Columns from first table are filled with NULLs.




Example 30 Full Outer Join

List branches and properties in same city and any unmatched branches or properties.

SELECT b.*, p.*
FROM Branch1 b FULL JOIN
     PropertyForRent1 p ON b.bCity = p.pCity;

Includes rows that are unmatched in both tables.

Unmatched columns are filled with NULLs.
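The NULL padding can be observed with Python's sqlite3 module. The rows below are assumed sample data, chosen only to agree with the slide text (branches in Glasgow, Bristol and London; properties in Aberdeen, London and Glasgow). Because RIGHT JOIN and FULL JOIN are only available in SQLite 3.39 and later, this sketch gets the right outer join by swapping the operands of a LEFT JOIN, which is an equivalent formulation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Branch1 (branchNo TEXT, bCity TEXT)")
conn.execute("CREATE TABLE PropertyForRent1 (propertyNo TEXT, pCity TEXT)")
# Assumed sample rows, consistent with the description on the slides.
conn.executemany(
    "INSERT INTO Branch1 VALUES (?, ?)",
    [("B003", "Glasgow"), ("B004", "Bristol"), ("B002", "London")],
)
conn.executemany(
    "INSERT INTO PropertyForRent1 VALUES (?, ?)",
    [("PA14", "Aberdeen"), ("PL94", "London"), ("PG4", "Glasgow")],
)

# Left outer join: every branch appears, unmatched ones padded with NULL (None).
left_rows = conn.execute(
    "SELECT b.*, p.* FROM Branch1 b "
    "LEFT JOIN PropertyForRent1 p ON b.bCity = p.pCity "
    "ORDER BY b.branchNo"
).fetchall()

# Right outer join, written as a LEFT JOIN with the tables swapped.
right_rows = conn.execute(
    "SELECT b.*, p.* FROM PropertyForRent1 p "
    "LEFT JOIN Branch1 b ON b.bCity = p.pCity "
    "ORDER BY p.propertyNo"
).fetchall()

for row in left_rows:
    print(row)
for row in right_rows:
    print(row)
```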


EXISTS and NOT EXISTS

EXISTS and NOT EXISTS are for use only with subqueries.

Produce a simple true/false result.

True if and only if there exists at least one row in result table returned by subquery.

False if subquery returns an empty result table.

NOT EXISTS is the opposite of EXISTS.

As (NOT) EXISTS check only for existence or non-existence of rows in subquery result table, subquery can contain any number of columns.

Common for subqueries following (NOT) EXISTS to be of form:

(SELECT * ...)


Example 31 Query using EXISTS

Find all staff who work in a London branch.

SELECT staffNo, fName, lName, position
FROM Staff s
WHERE EXISTS
    (SELECT *
     FROM Branch b
     WHERE s.branchNo = b.branchNo AND
           city = 'London');

Note, search condition s.branchNo = b.branchNo is necessary to consider correct branch record for each member of staff.

If omitted, would get all staff records listed out because subquery:

SELECT * FROM Branch WHERE city = 'London'

would always be true and query would be:

SELECT staffNo, fName, lName, position
FROM Staff
WHERE true;

Could also write this query using join construct:

SELECT staffNo, fName, lName, position
FROM Staff s, Branch b
WHERE s.branchNo = b.branchNo AND
      city = 'London';


Union, Intersect, and Difference (Except)

Can use normal set operations of Union, Intersection, and Difference to combine results of two or more queries into a single result table.

Union of two tables, A and B, is table containing all rows in either A or B or both.

Intersection is table containing all rows common to both A and B.

Difference is table containing all rows in A but not in B.

Two tables must be union compatible.

Format of set operator clause in each case is:

op [ALL] [CORRESPONDING [BY {column1 [, ...]}]]

If CORRESPONDING BY specified, set operation performed on the named column(s).

If CORRESPONDING specified but not BY clause, operation performed on common columns.

If ALL specified, result can include duplicate rows.




Example 32 Use of UNION

List all cities where there is either a branch office or a property.

(SELECT city
 FROM Branch
 WHERE city IS NOT NULL)
UNION
(SELECT city
 FROM PropertyForRent
 WHERE city IS NOT NULL);

Or

(SELECT *
 FROM Branch
 WHERE city IS NOT NULL)
UNION CORRESPONDING BY city
(SELECT *
 FROM PropertyForRent
 WHERE city IS NOT NULL);

Produces result tables from both queries and merges both tables together.


Example 33 Use of INTERSECT

List all cities where there is both a branch office and a property.

(SELECT city FROM Branch)
INTERSECT
(SELECT city FROM PropertyForRent);

Or

(SELECT * FROM Branch)
INTERSECT CORRESPONDING BY city
(SELECT * FROM PropertyForRent);

Could rewrite this query without INTERSECT operator:

SELECT DISTINCT b.city
FROM Branch b, PropertyForRent p
WHERE b.city = p.city;

Or:

SELECT DISTINCT city
FROM Branch b
WHERE EXISTS
    (SELECT *
     FROM PropertyForRent p
     WHERE p.city = b.city);


Example 34 Use of EXCEPT

List of all cities where there is a branch office but no properties.

(SELECT city FROM Branch)
EXCEPT
(SELECT city FROM PropertyForRent);

Or

(SELECT * FROM Branch)
EXCEPT CORRESPONDING BY city
(SELECT * FROM PropertyForRent);

Could rewrite this query without EXCEPT:

SELECT DISTINCT city
FROM Branch
WHERE city NOT IN
    (SELECT city FROM PropertyForRent);

Or

SELECT DISTINCT city
FROM Branch b
WHERE NOT EXISTS
    (SELECT *
     FROM PropertyForRent p
     WHERE p.city = b.city);
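The three set operators, and the NOT EXISTS rewrite of EXCEPT, can be compared side by side with Python's sqlite3 module. The city values are hypothetical. Two SQLite-specific points are assumptions of this sketch rather than part of the slides: SQLite does not accept parentheses around the operands of a set operator, and it has no CORRESPONDING clause, so the queries list the column explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Branch (branchNo TEXT, city TEXT)")
conn.execute("CREATE TABLE PropertyForRent (propertyNo TEXT, city TEXT)")
# Hypothetical rows; London appears twice among the properties.
conn.executemany(
    "INSERT INTO Branch VALUES (?, ?)",
    [("B002", "London"), ("B003", "Glasgow"), ("B004", "Bristol")],
)
conn.executemany(
    "INSERT INTO PropertyForRent VALUES (?, ?)",
    [("PA14", "Aberdeen"), ("PL94", "London"), ("PG4", "Glasgow"), ("PL21", "London")],
)


def cities(sql):
    """Run a one-column query and return its values as a list."""
    return [row[0] for row in conn.execute(sql)]


union = cities(
    "SELECT city FROM Branch UNION SELECT city FROM PropertyForRent ORDER BY city"
)
intersect = cities(
    "SELECT city FROM Branch INTERSECT SELECT city FROM PropertyForRent ORDER BY city"
)
difference = cities(
    "SELECT city FROM Branch EXCEPT SELECT city FROM PropertyForRent ORDER BY city"
)
# The rewrite without EXCEPT from the slide above.
not_exists = cities(
    "SELECT DISTINCT city FROM Branch b WHERE NOT EXISTS "
    "(SELECT * FROM PropertyForRent p WHERE p.city = b.city) ORDER BY city"
)

print(union, intersect, difference, not_exists)
```

Although London occurs twice in the property rows, it appears once in each result, because the operators remove duplicates unless ALL is specified.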


INSERT

INSERT INTO TableName [ (columnList) ]
VALUES (dataValueList)

columnList is optional; if omitted, SQL assumes a list of all columns in their original CREATE TABLE order.

Any columns omitted must have been declared as NULL when table was created, unless DEFAULT was specified when creating column.

dataValueList must match columnList as follows:
- number of items in each list must be same;
- must be direct correspondence in position of items in two lists;
- data type of each item in dataValueList must be compatible with data type of corresponding column.


Example 35 INSERT … VALUES

Insert a new row into Staff table supplying data for all columns.

INSERT INTO Staff
VALUES ('SG16', 'Alan', 'Brown', 'Assistant', 'M',
        DATE '1957-05-25', 8300, 'B003');

Example 36 INSERT using Defaults

Insert a new row into Staff table supplying data for all mandatory columns.

INSERT INTO Staff (staffNo, fName, lName, position, salary, branchNo)
VALUES ('SG44', 'Anne', 'Jones', 'Assistant', 8100, 'B003');

Or

INSERT INTO Staff
VALUES ('SG44', 'Anne', 'Jones', 'Assistant', NULL,
        NULL, 8100, 'B003');


INSERT … SELECT

Second form of INSERT allows multiple rows to be copied from one or more tables to another:

INSERT INTO TableName [ (columnList) ]
SELECT ...

Example 37 INSERT … SELECT

Assume there is a table StaffPropCount that contains names of staff and number of properties they manage:

StaffPropCount(staffNo, fName, lName, propCnt)

Populate StaffPropCount using Staff and PropertyForRent tables.

INSERT INTO StaffPropCount
(SELECT s.staffNo, fName, lName, COUNT(*)
 FROM Staff s, PropertyForRent p
 WHERE s.staffNo = p.staffNo
 GROUP BY s.staffNo, fName, lName)
UNION
(SELECT staffNo, fName, lName, 0
 FROM Staff
 WHERE staffNo NOT IN
    (SELECT DISTINCT staffNo
     FROM PropertyForRent));

If second part of UNION is omitted, excludes those staff who currently do not manage any properties.


UPDATE

UPDATE TableName
SET columnName1 = dataValue1 [, columnName2 = dataValue2...]
[WHERE searchCondition]

TableName can be name of a base table or an updatable view.

SET clause specifies names of one or more columns that are to be updated.

WHERE clause is optional:
- if omitted, named columns are updated for all rows in table;
- if specified, only those rows that satisfy searchCondition are updated.

New dataValue(s) must be compatible with data type for corresponding column.


Example 38/39 UPDATE All Rows

Give all staff a 3% pay increase.

UPDATE Staff
SET salary = salary*1.03;

Give all Managers a 5% pay increase.

UPDATE Staff
SET salary = salary*1.05
WHERE position = 'Manager';

Example 40 UPDATE Multiple Columns

Promote David Ford (staffNo=‘SG14’) to Manager and change his salary to £18,000.

UPDATE Staff
SET position = 'Manager', salary = 18000
WHERE staffNo = 'SG14';


DELETE

DELETE FROM TableName
[WHERE searchCondition]

TableName can be name of a base table or an updatable view.

searchCondition is optional; if omitted, all rows are deleted from table. This does not delete table. If searchCondition is specified, only those rows that satisfy condition are deleted.

Example 41/42 DELETE Specific Rows

Delete all viewings that relate to property PG4.

DELETE FROM Viewing
WHERE propertyNo = 'PG4';

Delete all records from the Viewing table.

DELETE FROM Viewing;


ISO SQL Data Types


Integrity Enhancement Feature

Consider five types of integrity constraints:
- required data
- domain constraints
- entity integrity
- referential integrity
- general constraints.

Required Data

position VARCHAR(10) NOT NULL

Domain Constraints

(a) CHECK

sex CHAR NOT NULL
    CHECK (sex IN ('M', 'F'))

(b) CREATE DOMAIN

CREATE DOMAIN DomainName [AS] dataType
[DEFAULT defaultOption]
[CHECK (searchCondition)]

For example:

CREATE DOMAIN SexType AS CHAR
    CHECK (VALUE IN ('M', 'F'));

sex SexType NOT NULL

searchCondition can involve a table lookup:

CREATE DOMAIN BranchNo AS CHAR(4)
    CHECK (VALUE IN (SELECT branchNo
                     FROM Branch));

Domains can be removed using DROP DOMAIN:

DROP DOMAIN DomainName [RESTRICT | CASCADE]


IEF - Entity Integrity

Primary key of a table must contain a unique, non-null value for each row.

ISO standard supports PRIMARY KEY clause in CREATE and ALTER TABLE statements:

PRIMARY KEY(staffNo)
PRIMARY KEY(clientNo, propertyNo)

Can only have one PRIMARY KEY clause per table. Can still ensure uniqueness for alternate keys using UNIQUE:

UNIQUE(telNo)


IEF - Referential Integrity

FK is column or set of columns that links each row in child table containing FK to row of parent table containing matching PK.

Referential integrity means that, if FK contains a value, that value must refer to existing row in parent table.

ISO standard supports definition of FKs with FOREIGN KEY clause in CREATE and ALTER TABLE:

FOREIGN KEY(branchNo) REFERENCES Branch

Any INSERT/UPDATE attempting to create FK value in child table without matching candidate key (CK) value in parent is rejected.

Action taken attempting to update/delete a CK value in parent table with matching rows in child is dependent on referential action specified using ON UPDATE and ON DELETE subclauses:
- CASCADE
- SET NULL
- SET DEFAULT
- NO ACTION

CASCADE: Delete row from parent and delete matching rows in child, and so on in cascading manner.

SET NULL: Delete row from parent and set FK column(s) in child to NULL. Only valid if FK columns do not have NOT NULL specified.

SET DEFAULT: Delete row from parent and set each component of FK in child to specified default. Only valid if DEFAULT specified for FK columns.

NO ACTION: Reject delete from parent. Default.

FOREIGN KEY (staffNo) REFERENCES Staff ON DELETE SET NULL

FOREIGN KEY (ownerNo) REFERENCES Owner ON UPDATE CASCADE
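The referential actions can be watched in operation with Python's sqlite3 module. This is a sketch under several assumptions: the two tables are cut down to their key columns, the rows and the replacement staff number 'SG99' are hypothetical, and SQLite only enforces foreign keys after PRAGMA foreign_keys = ON has been issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.execute("CREATE TABLE Staff (staffNo TEXT PRIMARY KEY)")
conn.execute(
    """
    CREATE TABLE PropertyForRent (
        propertyNo TEXT PRIMARY KEY,
        staffNo TEXT,
        FOREIGN KEY (staffNo) REFERENCES Staff
            ON DELETE SET NULL ON UPDATE CASCADE
    )
    """
)
# Hypothetical rows: two staff members, each managing one property.
conn.executemany("INSERT INTO Staff VALUES (?)", [("SG14",), ("SG37",)])
conn.executemany(
    "INSERT INTO PropertyForRent VALUES (?, ?)",
    [("PG4", "SG14"), ("PG16", "SG37")],
)

# An FK value with no matching parent row is rejected.
try:
    conn.execute("INSERT INTO PropertyForRent VALUES ('PG21', 'SX99')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# ON UPDATE CASCADE: changing the parent key rewrites the child's FK value.
conn.execute("UPDATE Staff SET staffNo = 'SG99' WHERE staffNo = 'SG14'")

# ON DELETE SET NULL: deleting the parent row sets the child's FK to NULL.
conn.execute("DELETE FROM Staff WHERE staffNo = 'SG37'")

properties = conn.execute(
    "SELECT propertyNo, staffNo FROM PropertyForRent ORDER BY propertyNo"
).fetchall()
print(orphan_rejected, properties)
```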


IEF - General Constraints

Could use CHECK/UNIQUE in CREATE and ALTER TABLE.

Similar to the CHECK clause, also have:

CREATE ASSERTION AssertionName
CHECK (searchCondition)

CREATE ASSERTION StaffNotHandlingTooMuch
CHECK (NOT EXISTS (SELECT staffNo
                   FROM PropertyForRent
                   GROUP BY staffNo
                   HAVING COUNT(*) > 100))


Data Definition

SQL DDL allows database objects such as schemas, domains, tables, views, and indexes to be created and destroyed.

Main SQL DDL statements are:

CREATE SCHEMA            DROP SCHEMA
CREATE/ALTER DOMAIN      DROP DOMAIN
CREATE/ALTER TABLE       DROP TABLE
CREATE VIEW              DROP VIEW

Many DBMSs also provide:

CREATE INDEX             DROP INDEX

Relations and other database objects exist in an environment.

Each environment contains one or more catalogs, and each catalog consists of set of schemas.

Schema is named collection of related database objects.

Objects in a schema can be tables, views, domains, assertions, collations, translations, and character sets. All have same owner.


CREATE SCHEMA

CREATE SCHEMA [Name | AUTHORIZATION CreatorId]

DROP SCHEMA Name [RESTRICT | CASCADE]

With RESTRICT (default), schema must be empty or operation fails.

With CASCADE, operation cascades to drop all objects associated with schema in order defined above. If any of these operations fail, DROP SCHEMA fails.




CREATE TABLE

CREATE TABLE TableName
    {(colName dataType [NOT NULL] [UNIQUE]
      [DEFAULT defaultOption]
      [CHECK searchCondition] [,...]}
    [PRIMARY KEY (listOfColumns),]
    {[UNIQUE (listOfColumns),] […,]}
    {[FOREIGN KEY (listOfFKColumns)
      REFERENCES ParentTableName [(listOfCKColumns)],
      [ON UPDATE referentialAction]
      [ON DELETE referentialAction]] [,…]}
    {[CHECK (searchCondition)] [,…]})

Creates a table with one or more columns of the specified dataType.

With NOT NULL, system rejects any attempt to insert a null in the column.

Can specify a DEFAULT value for the column.

Primary keys should always be specified as NOT NULL.

FOREIGN KEY clause specifies FK along with the referential action.


Example 43 - CREATE TABLE

CREATE DOMAIN OwnerNumber AS VARCHAR(5)
    CHECK (VALUE IN (SELECT ownerNo FROM PrivateOwner));

CREATE DOMAIN StaffNumber AS VARCHAR(5)
    CHECK (VALUE IN (SELECT staffNo FROM Staff));

CREATE DOMAIN PNumber AS VARCHAR(5);

CREATE DOMAIN PRooms AS SMALLINT
    CHECK (VALUE BETWEEN 1 AND 15);

CREATE DOMAIN PRent AS DECIMAL(6,2)
    CHECK (VALUE BETWEEN 0 AND 9999.99);

CREATE TABLE PropertyForRent (
    propertyNo PNumber NOT NULL, ….
    rooms PRooms NOT NULL DEFAULT 4,
    rent PRent NOT NULL DEFAULT 600,
    ownerNo OwnerNumber NOT NULL,
    staffNo StaffNumber
        CONSTRAINT StaffNotHandlingTooMuch ….
    branchNo BranchNumber NOT NULL,
    PRIMARY KEY (propertyNo),
    FOREIGN KEY (staffNo) REFERENCES Staff
        ON DELETE SET NULL ON UPDATE CASCADE ….);


ALTER TABLE

- Add a new column to a table.
- Drop a column from a table.
- Add a new table constraint.
- Drop a table constraint.
- Set a default for a column.
- Drop a default for a column.


Example 44(a) - ALTER TABLE

Change Staff table by removing default of ‘Assistant’ for position column and setting default for sex column to female (‘F’).

ALTER TABLE Staff
    ALTER position DROP DEFAULT;

ALTER TABLE Staff
    ALTER sex SET DEFAULT 'F';

Example 44(b) - ALTER TABLE

Remove constraint from PropertyForRent that staff are not allowed to handle more than 100 properties at a time. Add new column to Client table.

ALTER TABLE PropertyForRent
    DROP CONSTRAINT StaffNotHandlingTooMuch;

ALTER TABLE Client
    ADD prefNoRooms PRooms;


DROP TABLE

DROP TABLE TableName [RESTRICT | CASCADE]

e.g. DROP TABLE PropertyForRent;

Removes named table and all rows within it.

With RESTRICT, if any other objects depend for their existence on continued existence of this table, SQL does not allow request.

With CASCADE, SQL drops all dependent objects (and objects dependent on these objects).




Chapter 8

Fundamental Database and Information System


- Function and importance of transactions.
- Properties of transactions.
- Concurrency Control
- Recovery Control
- Distributed DBMS
- Data Warehouse
- Business Intelligence
- OLAP
- Data Mining


Transaction Support

Transaction

Action, or series of actions, carried out by user or application, which reads or updates contents of database.

Logical unit of work on the database.

Application program is series of transactions with non-database processing in between.

Transforms database from one consistent state to another, although consistency may be violated during transaction.


Example Transaction


Transaction Support

Can have one of two outcomes:
- Success: transaction commits and database reaches a new consistent state.
- Failure: transaction aborts, and database must be restored to consistent state before it started. Such a transaction is rolled back or undone.

Committed transaction cannot be aborted.

Aborted transaction that is rolled back can be restarted later.




State Transition Diagram for Transaction


Properties of Transactions

Four basic (ACID) properties of a transaction are:

Atomicity: ‘All or nothing’ property.

Consistency: Must transform database from one consistent state to another.

Isolation: Partial effects of incomplete transactions should not be visible to other transactions.

Durability: Effects of a committed transaction are permanent and must not be lost because of later failure.
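Commit, rollback and the ‘all or nothing’ property can be tried with Python's sqlite3 module. The Account table, its two rows and the transfer() helper are hypothetical names introduced for this sketch and are not taken from the slides. The connection is used as a context manager, which commits the transaction if the block finishes normally and rolls it back if an exception escapes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; the CHECK constraint defines what a consistent state is.
conn.execute(
    "CREATE TABLE Account (id TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
with conn:
    conn.executemany("INSERT INTO Account VALUES (?, ?)", [("A", 100), ("B", 50)])


def transfer(src, dst, amount):
    """Move amount from src to dst as one transaction; return True if it committed."""
    try:
        with conn:  # commit on success, rollback if an exception escapes
            conn.execute(
                "UPDATE Account SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            # If this debit violates the CHECK, the credit above is undone as well.
            conn.execute(
                "UPDATE Account SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok_small = transfer("A", "B", 30)   # both updates succeed, transaction commits
ok_large = transfer("A", "B", 500)  # debit fails, whole transaction is rolled back
final = dict(conn.execute("SELECT id, balance FROM Account"))
print(ok_small, ok_large, final)
```

After the failed transfer, account B does not keep the 500 it was briefly credited with: the database returns to the consistent state it was in before that transaction started.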


DBMS Transaction Subsystem


Concurrency Control


Process of managing simultaneous operations on the database without having them interfere with one another.

Prevents interference when two or more users are accessing database simultaneously and at least one is updating data.

Although two transactions may be correct in themselves, interleaving of operations may produce an incorrect result.


Concurrency Control Techniques

Two basic concurrency control techniques:
- Locking
- Timestamping

Both are conservative approaches: delay transactions in case they conflict with other transactions.

Optimistic methods assume conflict is rare and only check for conflicts at commit.


Locking

Transaction uses locks to deny access to other transactions and so prevent incorrect updates.

Most widely used approach to ensure serializability.

Generally, a transaction must claim a shared (read) or exclusive (write) lock on a data item before read or write.

Lock prevents another transaction from modifying item or even reading it, in the case of a write lock.

Deadlock may occur.




Timestamping


Transactions ordered globally so that older transactions, transactions with smaller timestamps, get priority in the event of conflict.

Conflict is resolved by rolling back and restarting transaction.

No locks so no deadlock.


Database Recovery

Process of restoring database to a correct state in the event of a failure.

Need for Recovery Control
- Two types of storage: volatile (main memory) and nonvolatile.
- Volatile storage does not survive system crashes.
- Stable storage represents information that has been replicated in several nonvolatile storage media with independent failure modes.


Types of Failures

- System crashes, resulting in loss of main memory.
- Media failures, resulting in loss of parts of secondary storage.
- Application software errors.
- Natural physical disasters.
- Carelessness or unintentional destruction of data or facilities.
- Sabotage.


Transactions and Recovery

Transactions represent basic unit of recovery.

Recovery manager responsible for atomicity and durability.

If failure occurs between commit and database buffers being flushed to secondary storage then, to ensure durability, recovery manager has to redo (rollforward) transaction’s updates.

If transaction had not committed at failure time, recovery manager has to undo (rollback) any effects of that transaction for atomicity.

Partial undo: only one transaction has to be undone.

Global undo: all transactions have to be undone.


Distributed Database Concepts

Distributed Database

A logically interrelated collection of shared data (and a description of this data), physically distributed over a computer network.

Distributed DBMS

Software system that permits the management of the distributed database and makes the distribution transparent to users.




Concepts

- Collection of logically-related shared data.
- Data split into fragments.
- Fragments may be replicated.
- Fragments/replicas allocated to sites.
- Sites linked by a communications network.
- Data at each site is under control of a DBMS.
- DBMSs handle local applications autonomously.
- Each DBMS participates in at least one global application.


Distributed DBMS


Distributed Processing

A centralized database that can be accessed over a computer network.


Advantages of DDBMSs

- Reflects organizational structure
- Improved shareability and local autonomy
- Improved availability
- Improved reliability
- Improved performance
- Economics
- Modular growth


Disadvantages of DDBMSs

- Complexity
- Cost
- Security
- Integrity control more difficult
- Lack of standards
- Lack of experience
- Database design more complex


Data Warehousing Concepts


A subject-oriented, integrated, time-variant, and non-volatile collection of data in support of management’s decision-making process (Inmon, 1993).




Benefits of Data Warehousing


Potential high returns on investment

Competitive advantage

Increased productivity of corporate decision-makers


Comparison of OLTP Systems and Data Warehousing


Data Mart

A subset of a data warehouse that supports the requirements of a particular department or business function.

Characteristics include:
- Focuses on only the requirements of one department or business function.
- Does not normally contain detailed operational data, unlike data warehouses.
- More easily understood and navigated.

Typical Data Warehouse and Data Mart Architecture


Business Intelligence Technologies


Accompanying the growth in data warehousing is an ever-increasing demand by users for more powerful access tools that provide advanced analytical capabilities.

There are two main types of access tools available to meet this demand, namely Online Analytical Processing (OLAP) and data mining.


Business Intelligence Technologies

OLAP and Data Mining differ in what they offer the user and because of this they are complementary technologies.

An environment that includes a data warehouse (or more commonly one or more data marts) together with tools such as OLAP and/or data mining is collectively referred to as Business Intelligence (BI) technologies.




Online Analytical Processing (OLAP)


The dynamic synthesis, analysis, and consolidation of large volumes of multi-dimensional data, Codd (1993).

Describes a technology that uses a multi-dimensional view of aggregate data to provide quick access to strategic information for the purposes of advanced analysis.




Enables users to gain a deeper understanding and knowledge about various aspects of their corporate data through fast, consistent, interactive access to a wide variety of possible views of the data.

Allows users to view corporate data in such a way that it is a better model of the true dimensionality of the enterprise.



Can easily answer ‘who?’ and ‘what?’ questions; however, ability to answer ‘what if?’ and ‘why?’ type questions distinguishes OLAP from general-purpose query tools.

Types of analysis range from basic navigation and browsing (slicing and dicing), to calculations, to more complex analyses such as time series and complex modeling.


Examples of OLAP applications in various functional areas


Multi-dimensional Data as Three-field table versus Two-dimensional Matrix


Multi-dimensional Data as Four-field Table versus Three-dimensional Cube




OLAP Benefits

- Increased productivity of end-users.
- Reduced backlog of applications development for IT staff.
- Retention of organizational control over the integrity of corporate data.
- Reduced query drag and network traffic on OLTP systems or on the data warehouse.
- Improved potential revenue and profitability.


Data Mining


The process of extracting valid, previously unknown, comprehensible, and actionable information from large databases and using it to make crucial business decisions, (Simoudis,1996).

Involves the analysis of data and the use of software techniques for finding hidden and unexpected patterns and relationships in sets of data.


Data Mining

Reveals information that is hidden and unexpected, as there is little value in finding patterns and relationships that are already intuitive.

Patterns and relationships are identified by examining the underlying rules and features in the data.


Data Mining


Tends to work from the data up and most accurate results normally require large volumes of data to deliver reliable conclusions.

Starts by developing an optimal representation of structure of sample data, during which time knowledge is acquired and extended to larger sets of data.


Data Mining


Data mining can provide huge paybacks for companies who have made a significant investment in data warehousing.

Relatively new technology, however already used in a number of industries.


Examples of Applications of Data Mining

Retail / Marketing
- Identifying buying patterns of customers
- Finding associations among customer demographic characteristics
- Predicting response to mailing campaigns
- Market basket analysis

Insurance
- Claims analysis
- Predicting which customers will buy new policies

Medicine
- Characterizing patient behavior to predict surgery visits
- Identifying successful medical therapies for different illnesses


Data Mining and Data Warehousing

A data warehouse is well equipped for providing data for mining.

Data quality and consistency are a pre-requisite for mining to ensure the accuracy of the predictive models. Data warehouses are populated with clean, consistent data.


Data Mining Operations and Associated Techniques


Example of Classification using Tree Induction


Example of Database Segmentation using a Scatterplot


Example of Database Segmentation using a Visualization



