lecture 1 database systems - walailak...
DATABASE SYSTEMS Lecture 1: Introduction to DBs 1
Lecture 1
Database Systems
ITM661 – Database Systems
• T. Connolly, and C. Begg, “Database Systems: A Practical Approach to Design, Implementation, and Management”, 5th edition,
Addison-Wesley, 2009. 6th edition, Addison-Wesley, 2014, ISBN: 0-132-94326-3, (International Edition).
• R. Elmasri and S. B. Navathe, “Fundamentals of Database Systems”, 5th ed., Pearson, 2007, ISBN: 0-321-41506-X.
Textbooks
Database Systems: A Practical Approach to Design,
Implementation, and Management
By T. Connolly, and C. Begg,
6th edition, Addison-Wesley, 2014
ISBN: 0-132-94326-3
OR
5th edition, Addison-Wesley, 2009
ISBN-10: 0-321-60110-6,
ISBN-13: 978-0-321-60110-0
Fundamentals of Database Systems
By R. Elmasri and S. B. Navathe,
6th edition, Pearson (Addison-Wesley), 2010,
ISBN: 0-136-08620-9
Course Outline
Office Hours and Grading
Content:
Database and applications;
Database system development lifecycle;
Data modeling;
Relational model;
Database languages;
Design methodology;
Normalization;
Monitoring and tuning the operational systems.
Grading:
Attendance 8%
Assignment/project 20% (TT)
Project 12% (SB)
Class activities 10% (SB)
Midterm 25% (TT)
Final 25% (SB)
Lecture 1: Introduction to
Database Systems
Objectives
Some common uses of database systems
The characteristics of file-based systems
The problems with the file-based approach
The benefits of database approach
The meaning of the terms database, database
systems, database management system (DBMS)
The typical functions of a DBMS
The advantages and disadvantages of DBMSs
Objectives
The major components of the DBMS environment
The personnel involved in the DBMS environment
Difference between data administration and database administration
Types of database systems
System Catalog and Information Resource Dictionary System (IRDS)
Purposes and the origin of the 3-level database architecture
Concepts and types of data models
Functions and components of a DBMS
Data Versus Information
Data constitute building blocks of information
Information produced by processing data
Information reveals meaning of data
Good, timely, relevant information key to
decision making
Good decision making key to organizational
survival
Where Are Databases Used?
The database (DB) is now such an integral part of our day-to-day life that often we are not aware we are using one.
Ex: supermarket, credit card, travel agent, library, insurance, security systems, university.
First applications focused on clerical tasks
Requests for information quickly followed
File systems developed to address needs
Data organized according to expected use
Data Processing (DP) specialists computerized manual file systems
Types of Databases and DB Applications
Traditional Applications:
Numeric and Textual Databases
More Recent Applications:
Multimedia Databases
Geographic Information Systems (GIS)
Data Warehouses
Real-time and Active Databases
Many other applications
File-based Systems
The file-based system is the predecessor of the
database system.
Decentralized approach:
A collection of application programs that perform
services for the end users (e.g. reports).
Each program defines and manages its own data.
File-based systems were an early attempt to
computerize the manual filing system.
The related topics: storage, security, indexing,
cross-reference, processing
Simple File-based System
File-based Processing
File-based System Critique (I)
File-based System Data Management
Requires extensive programming in third-generation
language (3GL)
Time consuming
Makes ad hoc queries impossible
Leads to islands of information
Basic file terminology:
Data: Raw facts
Field: Group of characters with specific meaning
Record: Logically connected fields that describe a person, place, or thing
File: Collection of related records
File-based System Critique (II)
Data Dependence
File structure is defined in the program code.
Change in file's data characteristics requires
modification of data access programs
Must tell program what to do and how
Makes file systems cumbersome from programming
and data management views
Structural Dependence
Change in file structure requires modification of related
programs
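To make data dependence and structural dependence concrete, here is a minimal Python sketch. The record layout (`RECORD_FORMAT`) and the sample records are hypothetical, chosen only for illustration. The point is that the file structure lives inside the program, so a change to the file forces a change to every program that reads it.

```python
import struct

# Hypothetical fixed-width layout hard-coded in the application:
# a 4-byte integer id followed by a 10-byte name field.
RECORD_FORMAT = "<i10s"
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)


def write_records(records, record_format=RECORD_FORMAT):
    """Pack (id, name) tuples into a byte layout."""
    return b"".join(
        struct.pack(record_format, rec_id, name.encode("ascii"))
        for rec_id, name in records
    )


def read_records(raw):
    """Unpack bytes using the layout embedded in this program."""
    out = []
    for offset in range(0, len(raw) - RECORD_SIZE + 1, RECORD_SIZE):
        rec_id, name = struct.unpack_from(RECORD_FORMAT, raw, offset)
        out.append((rec_id, name.rstrip(b"\x00").decode("ascii")))
    return out


sample = [(1, "Ann"), (2, "Bob")]
records = read_records(write_records(sample))

# Another program widens the name field to 20 bytes. This reader still
# assumes 10 bytes, so it misinterprets the file until it is edited too.
widened = write_records(sample, "<i20s")
misread = read_records(widened[: RECORD_SIZE * 2])
```

Here `records` matches the sample, while `misread` does not: the second record is decoded from the wrong byte offsets.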
File-based System Critique (III)
Field Definitions and Naming Conventions
Flexible record definition anticipates reporting requirements
Selection of proper field names important
Attention to length of field names
Use of unique record identifiers
Data Redundancy
Different and conflicting versions of same data
Results of uncontrolled data redundancy
Data anomalies
Data inconsistency
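A small sketch of how uncontrolled redundancy leads to inconsistency. The two "files" and the customer record are hypothetical; each dictionary stands in for a file owned by a separate application program.

```python
# Two departments keep their own copy of the same customer data.
sales_file = {"C001": {"name": "Ann Lee", "address": "12 High St"}}
billing_file = {"C001": {"name": "Ann Lee", "address": "12 High St"}}


def update_address(file_records, customer_id, new_address):
    """Update one program's private file; no other copy is touched."""
    file_records[customer_id]["address"] = new_address


def find_inconsistencies(file_a, file_b):
    """Return customer ids whose records differ between two files."""
    return sorted(
        cid for cid in file_a.keys() & file_b.keys()
        if file_a[cid] != file_b[cid]
    )


# Only the sales program learns about the move.
update_address(sales_file, "C001", "98 Low Rd")
conflicts = find_inconsistencies(sales_file, billing_file)
```

After the update, `conflicts` lists `C001`: the two files now hold different and conflicting versions of the same data.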
File-based System Critique (IV)
Separation and isolation of data
Each program maintains its own set of data. Users of
one program may be unaware of potentially useful data
held by other programs.
Incompatible file formats
Programs are written in different languages, and so
cannot easily access each other's files.
Fixed Queries/Proliferation of application
programs
Programs are written to satisfy particular functions.
Any new requirement needs a new program.
Database Approach
Arose because:
Definition of data was embedded in application
programs, rather than being stored separately and
independently.
No control over access and manipulation of data
beyond that imposed by application programs.
Result - the database and Database Management
System (DBMS).
Database Management
Database is shared, integrated computer structure
housing:
End user data
Metadata
Database Management System (DBMS)
Manages Database structure
Controls access to data
Contains query language
Database
A shared collection of logically related data (and
a description of this data), designed to meet the
information needs of an organization.
System catalog (data dictionary or metadata)
provides the description of the data to enable
program–data independence.
Logically related data comprises entities,
attributes, and relationships of an organization's
information.
Database Systems & DBMS
Database System
A system that uses a database as its basic storage
Provides the following advantages over file-based
systems
Eliminates inconsistency, data anomalies, data dependency,
and structural dependency problems
Stores data structures, relationships, and access paths
Database Management Systems (DBMS)
A software system that enables users to define, create,
and maintain the database and which provides
controlled access to this database.
Simplified Database System Environment
Database vs. File Systems
DBMS Manages Interaction
Database Management System (DBMS)
Typical DBMS Functionality
Define a particular database in terms of its data types,
structures, and constraints
Construct or Load the initial database contents on a
secondary storage medium
Manipulating the database:
Retrieval: Querying, generating reports
Modification: Insertions, deletions and updates to its content
Accessing the database through Web applications
Processing and Sharing by a set of concurrent users and
application programs – yet, keeping all data valid and
consistent
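The define, construct, and manipulate steps above can be sketched with Python's built-in `sqlite3` module. SQLite is used only because it needs no server; the `student` table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Define: data types, structure, and constraints.
conn.execute(
    "CREATE TABLE student ("
    " student_id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " major TEXT)"
)

# Construct: load the initial database contents.
conn.executemany(
    "INSERT INTO student (student_id, name, major) VALUES (?, ?, ?)",
    [(1, "Smith", "CS"), (2, "Brown", "CS"), (3, "Jones", "MATH")],
)

# Manipulate (modification): update and delete.
conn.execute("UPDATE student SET major = 'MATH' WHERE student_id = 2")
conn.execute("DELETE FROM student WHERE student_id = 3")

# Manipulate (retrieval): query the current contents.
math_students = conn.execute(
    "SELECT name FROM student WHERE major = 'MATH' ORDER BY name"
).fetchall()
conn.commit()
```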
Typical DBMS Functionality
Other features:
Protection or Security measures to prevent
unauthorized access
“Active” processing to take internal actions on data
Presentation and Visualization of data
Maintaining the database and associated programs over
the lifetime of the database application
Called database, software, and system maintenance
Functions of a DBMS (I)
Data Storage, Retrieval and Update.
Must furnish users with the ability to store, retrieve,
and update data in the database.
A User-Accessible Catalog.
Must furnish a catalog in which descriptions of data
items are stored and which is accessible to users.
Transaction Support
Must furnish a mechanism to ensure that either
(1) all the updates corresponding to a given transaction are made
or
(2) none of them are made.
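The all-or-nothing behavior of transaction support can be illustrated with a short `sqlite3` sketch. The `account` table, the amounts, and the `transfer` helper are hypothetical. The second transfer fails part-way through, and the DBMS undoes the update that had already been applied.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " id INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(connection, source, target, amount):
    """Move funds as one transaction: both updates happen or neither does."""
    try:
        with connection:  # commits on success, rolls back on an exception
            connection.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, target),
            )
            connection.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(connection):
    """Return the current balances as {account id: balance}."""
    return dict(connection.execute("SELECT id, balance FROM account"))


ok = transfer(conn, 1, 2, 30)       # both updates are applied
failed = transfer(conn, 1, 2, 500)  # the debit would overdraw account 1
final = balances(conn)
```

The credit to account 2 in the failed transfer is rolled back, so `final` reflects only the first transfer.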
Functions of a DBMS (II)
Concurrency Control Services
Must furnish a mechanism to ensure that database is
updated correctly when multiple users are updating the
database concurrently.
Recovery Services
Must furnish a mechanism for recovering the database
in the event that the database is damaged in any way.
Authorization Services & Security management
Must furnish a mechanism to ensure that only
authorized users can access the database.
Functions of a DBMS (III)
Support for Data Communication
Must be capable of integrating with communication software.
Integrity Services & Security management
Must furnish a means to ensure that both the data in the
database and changes to the data follow certain rules.
Services to Promote Data Independence
Must include facilities to support the independence of programs from the actual structure of the database.
Utility Services
Should provide a set of utility services.
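As an illustration of the integrity services listed above, the sketch below declares rules once in the schema and lets the DBMS reject any change that breaks them. The tables, columns, and the `try_insert` helper are hypothetical; SQLite is used for convenience and enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE department (code TEXT PRIMARY KEY)")
conn.execute(
    "CREATE TABLE student ("
    " student_id INTEGER PRIMARY KEY,"
    " year INTEGER CHECK (year BETWEEN 1 AND 4),"
    " major TEXT REFERENCES department(code))"
)
conn.execute("INSERT INTO department VALUES ('CS')")
conn.commit()


def try_insert(row):
    """Return True if the DBMS accepts the row, False if a rule rejects it."""
    try:
        with conn:
            conn.execute("INSERT INTO student VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    try_insert((1, 2, "CS")),    # valid row
    try_insert((2, 9, "CS")),    # violates the CHECK rule on year
    try_insert((3, 1, "HIST")),  # violates the foreign key: no such department
    try_insert((1, 3, "CS")),    # violates primary key uniqueness
]
```

Only the first row is stored; the application code contains no validation logic of its own.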
Functions of a DBMS (IV)
Data transformation and presentation
Backup and recovery management
Database language and application
programming interfaces
A view mechanism.
Provides users with only the data they want
or need to use.
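A view mechanism can be sketched as follows. The `staff` table and the `staff_directory` view are hypothetical names; the view gives a user names and branches while keeping salaries out of sight.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE staff ("
    " staff_id INTEGER PRIMARY KEY, name TEXT, branch TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO staff VALUES (?, ?, ?, ?)",
    [(1, "White", "B005", 30000), (2, "Beech", "B003", 12000)],
)

# The view exposes names and branches but hides salaries.
conn.execute("CREATE VIEW staff_directory AS SELECT name, branch FROM staff")

cursor = conn.execute("SELECT * FROM staff_directory ORDER BY name")
view_columns = [col[0] for col in cursor.description]
view_rows = cursor.fetchall()
```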
Components of a DBMS
1. Query processor
2. Database manager (DM)
3. File manager
4. DML preprocessor
5. DDL compiler
6. Catalog manager
Components of Database Manager (DM)
1. Authorization control
2. Command processor
3. Integrity checker
4. Query optimizer
5. Transaction manager
6. Scheduler
7. Recovery manager
8. Buffer manager
Advantages of Using DB Approach (I)
Controlling redundancy in data storage and in
development and maintenance efforts.
Sharing of data among multiple users.
Restricting unauthorized access to data.
Providing persistent storage for program objects
(in object-oriented DBMSs)
Providing Storage Structures (e.g. indexes) for
efficient Query Processing
Advantages of Using DB Approach (II)
Providing backup and recovery services.
Providing multiple interfaces to different classes
of users.
Representing complex relationships among data.
Enforcing integrity constraints on the database.
Drawing inferences and actions from the stored
data using deductive and active rules
Additional Implications of DB Approach (I)
Potential for enforcing standards:
This is crucial for the success of database
applications in large organizations.
Standards refer to data item names, display formats,
screens, report structures, meta-data (description of
data), Web page layouts, etc.
Reduced application development time:
Incremental time to add each new application is
reduced.
Additional Implications of DB Approach (II)
Flexibility to change data structures:
Database structure may evolve as new requirements are
defined.
Availability of current information:
Extremely important for on-line transaction systems
such as airline, hotel, car reservations.
Economies of scale:
Wasteful overlap of resources and personnel can be
avoided by consolidating data and applications across
departments.
Disadvantages of DBMS
Complexity
Size
Cost of DBMS
Additional hardware costs
Cost of conversion
Performance
Higher impact of a failure
When not to use a DBMS
Main inhibitors (costs) of using a DBMS:
High initial investment and possible need for additional hardware.
Overhead for providing generality, security, concurrency control,
recovery, and integrity functions.
When a DBMS may be unnecessary:
If the database and applications are simple, well defined, and not
expected to change.
If there are stringent real-time requirements that may not be met
because of DBMS overhead.
If access to data by multiple users is not required.
If the database system is not able to handle the complexity of data
because of modeling limitations
If the DB users need special operations not supported by the DBMS.
Example of a DB
Some mini-world relationships:
SECTIONs are of specific COURSEs
STUDENTs take SECTIONs
COURSEs have prerequisite COURSEs
INSTRUCTORs teach SECTIONs
COURSEs are offered by DEPARTMENTs
STUDENTs major in DEPARTMENTs
Note: The above entities and relationships are typically
expressed in a conceptual data model, such as the
ENTITY-RELATIONSHIP data model (learn more later)
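One possible way to realize the mini-world relationships above as relational tables is sketched below. This is an illustrative assumption, not the schema from the slides: all table names, column names, and sample rows are hypothetical, and the comments mark which relationship each foreign key represents.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (dept_code TEXT PRIMARY KEY);
CREATE TABLE course (
    course_no TEXT PRIMARY KEY,
    dept_code TEXT REFERENCES department(dept_code)   -- offered by
);
CREATE TABLE prerequisite (                           -- has prerequisite
    course_no TEXT REFERENCES course(course_no),
    prereq_no TEXT REFERENCES course(course_no),
    PRIMARY KEY (course_no, prereq_no)
);
CREATE TABLE instructor (instructor_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE section (
    section_id INTEGER PRIMARY KEY,
    course_no TEXT REFERENCES course(course_no),                -- section of
    instructor_id INTEGER REFERENCES instructor(instructor_id)  -- taught by
);
CREATE TABLE student (
    student_id INTEGER PRIMARY KEY,
    name TEXT,
    major TEXT REFERENCES department(dept_code)                 -- majors in
);
CREATE TABLE enrollment (                                       -- takes
    student_id INTEGER REFERENCES student(student_id),
    section_id INTEGER REFERENCES section(section_id),
    PRIMARY KEY (student_id, section_id)
);
INSERT INTO department VALUES ('CS');
INSERT INTO course VALUES ('CS101', 'CS'), ('CS201', 'CS');
INSERT INTO prerequisite VALUES ('CS201', 'CS101');
INSERT INTO instructor VALUES (1, 'King');
INSERT INTO section VALUES (10, 'CS201', 1);
INSERT INTO student VALUES (7, 'Smith', 'CS');
INSERT INTO enrollment VALUES (7, 10);
""")

# Which students take a section of a course that has a prerequisite,
# and who teaches that section?
rows = conn.execute("""
    SELECT st.name, se.course_no, p.prereq_no, i.name
    FROM enrollment e
    JOIN student st ON st.student_id = e.student_id
    JOIN section se ON se.section_id = e.section_id
    JOIN prerequisite p ON p.course_no = se.course_no
    JOIN instructor i ON i.instructor_id = se.instructor_id
""").fetchall()
```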
A Simple Database (I)
A Simple Database (II)
Main Characteristics of the DB Approach
Self-describing nature of a database system:
A DBMS catalog stores the description of a particular
database (e.g. data structures, types, and constraints)
The description is called meta-data.
This allows the DBMS software to work with different
database applications.
Insulation between programs and data:
Called program-data independence.
Allows changing data structures and storage organization
without having to change the DBMS access programs.
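Both characteristics above can be observed in a small `sqlite3` sketch. The `student` table and the helper functions are hypothetical. SQLite keeps its catalog in the `sqlite_master` table and exposes column metadata through `PRAGMA table_info`; other DBMSs use different catalog interfaces.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO student VALUES (1, 'Smith')")


def describe(table):
    """Read column names and declared types from the catalog (meta-data)."""
    # Each row is (cid, name, type, notnull, dflt_value, pk).
    return [(row[1], row[2]) for row in conn.execute(f"PRAGMA table_info({table})")]


def student_names():
    """An access program that names only the columns it needs."""
    return [name for (name,) in conn.execute("SELECT name FROM student")]


tables = [name for (name,) in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
before = describe("student")
names_before = student_names()

# Change the stored structure; the access program above is not edited.
conn.execute("ALTER TABLE student ADD COLUMN major TEXT")
after = describe("student")
names_after = student_names()
```

The catalog reflects the new column, while `student_names` returns the same result before and after the change.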
A Simplified Database Catalog
Main Characteristics of DB Approach (I)
Data Abstraction:
A data model is used to hide storage details and
present the users with a conceptual view of the
database.
Programs refer to the data model constructs rather
than data storage details
Support of multiple views of the data:
Each user may see a different view of the database,
which describes only the data of interest to that
user.
Main Characteristics of DB Approach (II)
Sharing of data and multi-user transaction processing:
Allowing a set of concurrent users to retrieve from and to
update the database.
Concurrency control within the DBMS guarantees that each
transaction is correctly executed or aborted
Recovery subsystem ensures each completed transaction has
its effect permanently recorded in the database
OLTP (Online Transaction Processing) is a major part of
database applications. This allows hundreds of concurrent
transactions to execute per second.
DBMS Environment (Major components)
Hardware
Can range from a PC to a network of computers.
Software
DBMS, operating system, network software (if necessary) and also
the application programs.
Data
Used by the organization and a description of this data called the
schema.
Procedures
Instructions and rules that should be applied to the design and use
of the database and DBMS.
People
Database Users
Users may be divided into:
Actors on the Scene
Those who actually use and control the database content,
and those who design, develop and maintain database
applications.
Data Administrator (DA)
Database Administrator (DBA)
Database Designers (Logical and Physical)
Application Programmers
End Users (naive and sophisticated)
Workers Behind the Scene
Those who design and develop the DBMS software and
related tools, and the computer systems operators.
Users in DB System Environment
Data Administration vs. Database Administration
Data Administration
The management of the data resource, which includes
database planning, development and maintenance of
standards, policies and procedures, and conceptual and
logical database design.
Database Administration
The management of the physical realization of a
database application, which includes physical database
design and implementation, setting security and
integrity controls, monitoring system performance, and
reorganizing the database, as necessary.
Data Administration Tasks
Database Administration Tasks
DA and DBA – Main Task Differences.
Database System Types
Single-user vs. Multi-user Database
Desktop
Workgroup
Enterprise
Centralized vs. Distributed
Usage Purpose
Production or transactional
Decision support or data warehouse
Multi-user DBMS Architecture
Teleprocessing
File-server
Client-server
Teleprocessing
Traditional architecture.
Single mainframe with a number of terminals
Trend is now towards downsizing.
File-server
File-server is connected to several workstations
across a network.
Database resides on file-server.
DBMS and applications run on each workstation.
Disadvantages include:
Significant network traffic.
Copy of DBMS on each workstation.
Concurrency, recovery and integrity control more
complex.
File-server Architecture
Client-server
Server holds the database and the DBMS.
Client manages the user interface and runs
applications.
Advantages include:
Wider access to existing databases.
Increased performance.
Possible reduction in hardware costs.
Reduction in communication costs.
Increased consistency.
Client-server Architecture
Alternative Client-server Topologies
Summary of Client-server Functions
DBMS Server
Provides database query and transaction services to the clients
Relational DBMS servers are often called SQL servers, query servers, or transaction servers
Applications running on clients utilize an Application Program Interface (API) to access server databases via standard interfaces such as:
ODBC: Open Database Connectivity standard
JDBC: for Java programming access
Client and server must install appropriate client module and server module software for ODBC or JDBC
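The idea of a standard access interface can be illustrated in Python, where the DB-API 2.0 specification (PEP 249) plays a role comparable to ODBC or JDBC: drivers expose the same `connect`, `cursor`, `execute`, and `fetchall` calls. The sketch below is an analogy, not ODBC or JDBC itself. The `run_query` helper is hypothetical, and `sqlite3` is used only because it ships with Python; it is an embedded engine rather than a client-server DBMS.

```python
import sqlite3


def run_query(connect, dsn, sql, params=()):
    """Run a query using only calls defined by the DB-API 2.0 interface."""
    conn = connect(dsn)
    try:
        cur = conn.cursor()
        cur.execute(sql, params)
        return cur.fetchall()
    finally:
        conn.close()


# The application code above does not name a specific DBMS; the driver's
# connect function and the data source string are passed in.
answer = run_query(sqlite3.connect, ":memory:", "SELECT 1 + 1")
```

Drivers still differ in details such as connection arguments and parameter placeholder style, so switching the data source is rarely a pure one-line change.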
Two Tier Client-Server Architecture
A client program may connect to several DBMSs,
sometimes called the data sources.
In general, data sources can be files or other
non-DBMS software that manages data.
Other variations of clients are possible: e.g., in
some object DBMSs, more functionality is
transferred to clients including data dictionary
functions, optimization and recovery across
multiple servers, etc.
Three Tier Client-Server Architecture
Common for Web applications
Intermediate Layer called Application Server or Web
Server:
Stores the web connectivity software and the business logic part
of the application used to access the corresponding data from the
database server
Acts like a conduit for sending partially processed data between
the database server and the client.
Three-tier Architecture Can Enhance Security:
Database server only accessible via middle tier
Clients cannot directly access database server
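The separation of tiers can be sketched in a few lines. Everything here runs in one Python process purely for illustration; in a real deployment each tier would typically run on a different machine. The class names, the `product` table, and its columns are hypothetical.

```python
import sqlite3


class DatabaseTier:
    """Data tier: the only component that touches the database."""

    def __init__(self):
        self._conn = sqlite3.connect(":memory:")
        self._conn.execute(
            "CREATE TABLE product (name TEXT PRIMARY KEY, price INTEGER, cost INTEGER)"
        )
        self._conn.execute("INSERT INTO product VALUES ('pen', 120, 45)")

    def fetch_product(self, name):
        return self._conn.execute(
            "SELECT name, price, cost FROM product WHERE name = ?", (name,)
        ).fetchone()


class ApplicationServer:
    """Middle tier: business logic; decides what the client may see."""

    def __init__(self, database):
        self._db = database

    def product_page(self, name):
        row = self._db.fetch_product(name)
        if row is None:
            return {"error": "not found"}
        product_name, price, _cost = row  # cost is never sent to clients
        return {"name": product_name, "price": price}


def client_request(server, name):
    """Client tier: talks only to the application server."""
    return server.product_page(name)


server = ApplicationServer(DatabaseTier())
page = client_request(server, "pen")
missing = client_request(server, "ink")
```

Because the client holds no database connection, it can see only what the middle tier chooses to return.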
Three-tier client-server architecture
Three-Tier Client-Server
Client side presented two problems preventing
true scalability:
'Fat' client, requiring considerable resources on client's
computer to run effectively.
Significant client side administration overhead.
By 1995, three layers proposed, each potentially
running on a different platform.
Pearson Education © 2014 66
Three-Tier Client-Server
Advantages:
'Thin' client, requiring less expensive hardware.
Application maintenance centralized.
Easier to modify or replace one tier without affecting others.
Separating business logic from database functions makes it
easier to implement load balancing.
Maps quite naturally to Web environment.
Three-Tier Client-Server
Transaction Processing Monitors
Program that controls data transfer between
clients and servers in order to provide a consistent
environment, particularly for Online Transaction
Processing (OLTP).
TPM as middle tier of 3-tier client-server
System Catalog
A repository of information (metadata) describing
the data in the database.
Typically stores:
Names of authorized users.
Names of data items in the database.
Constraints on each data item.
Data items accessible by a user and the type of access.
It is used by modules such as:
Authorization Control.
Integrity Checker.
Information Resource Dictionary System (IRDS)
Arose from an attempt to standardize data dictionary
interfaces.
An IRDS is a software tool that can be used to control and
document an organization's information resources.
It provides a definition for the tables that comprise the
data dictionary and the operations that can be used to
access these tables.
Objectives:
Extensibility of data
Integrity of data
Controlled access to data
IRDS Services Interface
Three-Level Architecture of a DB system (Objective)
All users should be able to access same data.
A user's view is immune to changes made in other views.
Users should not need to know physical database storage
details.
DBA should be able to change database storage structures
without affecting the users' views.
Internal structure of database should be unaffected by
changes to physical aspects of storage.
DBA should be able to change conceptual structure of
database without affecting all users.
ANSI-SPARC Three-level Architecture
ANSI-SPARC Three-level Architecture
External Level
Users' view of the database. Describes that part of
database that is relevant to a particular user.
Conceptual Level
Community view of the database. Describes what data
is stored in database and relationships among the data.
Internal Level
Physical representation of the database on the
computer. Describes how the data is stored in the
database.
Differences between the Three Levels
Data Independence and the ANSI-SPARC 3-level Architecture
Data Independence
Logical Data Independence
The capacity to change the conceptual schema without having to
change the external schemas and their associated application
programs.
Conceptual schema changes e.g. addition/removal of entities.
Should not require changes to external schema or rewrites of
application programs.
Physical Data Independence
The capacity to change the internal schema without having to
change the conceptual schema.
Internal schema changes e.g. using different file organizations,
storage structures/devices.
Should not require change to conceptual or external schemas.
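Both kinds of data independence can be demonstrated with a short `sqlite3` sketch. The `staff` table, the `branch_staff` view, and the index name are hypothetical. The view stands in for an external schema, the base tables for the conceptual schema, and the index for an internal-level storage structure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual level: the base table.
conn.execute(
    "CREATE TABLE staff (staff_id INTEGER PRIMARY KEY, name TEXT, branch TEXT)"
)
conn.executemany(
    "INSERT INTO staff VALUES (?, ?, ?)",
    [(1, "White", "B005"), (2, "Beech", "B003")],
)

# External level: one user's view of the data.
conn.execute(
    "CREATE VIEW branch_staff AS SELECT name FROM staff WHERE branch = 'B003'"
)


def external_query():
    """Application code written against the external view only."""
    return conn.execute("SELECT name FROM branch_staff").fetchall()


baseline = external_query()

# Physical data independence: add an index (an internal schema change).
conn.execute("CREATE INDEX idx_staff_branch ON staff(branch)")
after_index = external_query()

# Logical data independence: extend the conceptual schema.
conn.execute("ALTER TABLE staff ADD COLUMN salary INTEGER")
conn.execute("CREATE TABLE branch (branch_no TEXT PRIMARY KEY, city TEXT)")
after_schema_change = external_query()
```

All three results are identical. Removing a column or table that a view depends on would still break that view, so the independence shown here covers additions and storage changes only.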
Historical Development of DB Tech. (I)
Early Database Applications:
The Hierarchical and Network Models were introduced in mid
1960s and dominated during the seventies.
A large share of worldwide database processing still occurs using
these models, particularly the hierarchical model.
Relational Model based Systems:
Relational model was originally introduced in 1970 and was heavily
researched and experimented with at IBM Research and several
universities.
Relational DBMS Products emerged in the early 1980s.
Historical Development of DB Tech. (II)
Object-oriented and emerging applications:
Object-Oriented Database Management Systems (OODBMSs)
were introduced in late 1980s and early 1990s to cater to the need
of complex data processing in CAD and other applications.
Their use has not taken off much.
Many relational DBMSs have incorporated object database
concepts, leading to a new category called object-relational
DBMSs (ORDBMSs)
Extended relational systems add further capabilities (e.g. for
multimedia data, XML, and other data types)
Historical Development of DB Tech. (III)
Data on the Web and E-commerce Applications:
Web contains data in HTML (Hypertext Markup Language) with links among pages.
This has given rise to a new set of applications, and E-commerce is using new standards like XML (eXtensible Markup Language).
Script programming languages such as PHP and JavaScript allow generation of dynamic Web pages that are partially generated from a database.
Also allow database updates through Web pages
Extending Database Capabilities
New functionality is being added to DBMSs in the following areas:
Scientific Applications
XML (eXtensible Markup Language)
Image Storage and Management
Audio and Video Data Management
Data Warehousing and Data Mining
Spatial Data Management
Time Series and Historical Data Management
The above gives rise to new research and development in incorporating new data types, complex data structures, new operations and storage and indexing schemes in database systems.
Data Models (I)
Data Model:
A set of concepts to describe the structure of a database, the
operations for manipulating these structures, and certain
constraints that the database should obey.
Data Model Structure and Constraints:
Constructs are used to define the database structure
Constructs typically include elements (and their data types)
as well as groups of elements (e.g. entity, record, table), and
relationships among such groups
Constraints specify some restrictions on valid data; these
constraints must be enforced at all times
Data Models (II)
Collection of concepts for describing data, relationships
between data and constraints on the data in an
organization.
Data Model comprises:
A structural part
Consisting of a set of rules according to which databases can be
constructed.
A manipulative part
Defining the types of operations that are allowed on the data
(update/retrieving data from the DB or changing the DB structure).
Possibly a set of integrity rules
Ensuring that the data is accurate.
Data Models (III)
Data Model Operations:
These operations are used for specifying database
retrievals and updates by referring to the constructs of
the data model.
Operations on the data model may include basic model
operations (e.g. generic insert, delete, update) and user-
defined operations (e.g. compute_student_gpa,
update_inventory)
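The difference between basic and user-defined operations can be sketched as follows. The `grade_report` table, its columns, and the credit-weighted GPA formula are assumptions made for illustration; the slides name `compute_student_gpa` but do not define it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE grade_report ("
    " student_id INTEGER, course_no TEXT, credits INTEGER, grade_points REAL)"
)

# Basic model operation: a generic insert.
conn.executemany(
    "INSERT INTO grade_report VALUES (?, ?, ?, ?)",
    [(7, "CS101", 3, 4.0), (7, "MA101", 3, 3.0), (8, "CS101", 3, 2.0)],
)


def compute_student_gpa(student_id):
    """User-defined operation: credit-weighted average of grade points."""
    total_points, total_credits = conn.execute(
        "SELECT SUM(credits * grade_points), SUM(credits)"
        " FROM grade_report WHERE student_id = ?",
        (student_id,),
    ).fetchone()
    if not total_credits:
        return None  # no grades recorded for this student
    return total_points / total_credits


gpa_7 = compute_student_gpa(7)
gpa_none = compute_student_gpa(99)
```

The user-defined operation is built from basic retrievals but is expressed in terms of the data model's constructs (students, courses, grades) rather than storage details.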
Data Models (IV) – Levels of Data Models
Conceptual (high-level, semantic) data models:
Provide concepts that are close to the way users perceive data.
(Also called entity-based or object-based data models.)
Physical (low-level, internal) data models:
Provide concepts that describe details of how data is stored in the
computer.
These are usually specified in an ad-hoc manner through DBMS
design and administration manuals
Implementation (representational, logical) data models:
Provide concepts that fall between the above two, used by many
commercial DBMS implementations (e.g. relational data models
used in many commercial systems).
Data Models (V) – Types of Data Models
Types of Data Models
Record-based Data Models
Object-based Data Models
Physical Data Models
The first two models are used to describe data at
the conceptual and Logical levels, the latter is for
the internal level.
Data Models (VI)
Record-based Data Models:
Hierarchical Data Model
Network Data Model
Relational Data Model
Object-based Data Models:
Entity-Relationship
Object-Oriented
Semantic or Functional
Physical Data Models:
Physical data models describe how data is stored in the computer, representing information such as record structures, record orderings, and access paths.
There are not as many physical data models as logical data models; the most common ones are the unifying model and the frame memory.
Implementation Data Models
1st generation
2nd generation
3rd generation
Hierarchical Data Model (I)
Initially implemented in a joint effort by IBM and
North American Rockwell around 1965. Resulted in
the IMS family of systems.
IBM's IMS product had (and still has) a very large
customer base worldwide
Hierarchical model was formalized based on the IMS
system
Other systems based on this model: System 2k (SAS
Inc.)
Hierarchical Data Model (II)
Logically represented by an upside down tree
Each parent can have many children
Each child has only one parent
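The parent-child rules above can be expressed as a small Python sketch. The `Segment` class and the sample organization are hypothetical; this models only the tree structure, not how IMS stores or navigates records.

```python
class Segment:
    """A record in a hierarchy: many children, at most one parent."""

    def __init__(self, name):
        self.name = name
        self.parent = None
        self.children = []

    def add_child(self, child):
        if child.parent is not None:
            raise ValueError(f"{child.name} already has a parent")
        child.parent = self
        self.children.append(child)
        return child

    def path_from_root(self):
        """Follow the single parent chain up to the root."""
        node, path = self, []
        while node is not None:
            path.append(node.name)
            node = node.parent
        return list(reversed(path))


def preorder(node):
    """Visit each record before its children (preorder traversal)."""
    yield node.name
    for child in node.children:
        yield from preorder(child)


root = Segment("Company")
sales = root.add_child(Segment("Sales"))
research = root.add_child(Segment("Research"))
alice = sales.add_child(Segment("Alice"))

try:
    research.add_child(alice)  # a second parent is not allowed
    second_parent_rejected = False
except ValueError:
    second_parent_rejected = True
```

Because every record has exactly one path from the root, a relationship in which a record belongs to two parents cannot be represented directly.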
Hierarchical Data Model (III)
Advantages:
Conceptual simplicity, simple to construct and operate
Corresponds to a number of natural hierarchically
organized domains, e.g., organization (“org”) chart
Language is simple:
Uses constructs like GET, GET UNIQUE, GET NEXT,
GET NEXT WITHIN PARENT, etc.
Database security and integrity, Data independence,
Efficiency
Hierarchical Data Model (IV)
Disadvantages:
Navigational and procedural nature of processing
Database is visualized as a linear arrangement of
records
Little scope for "query optimization"
Complex implementation, programming and use
complexity
Difficult to manage and lack of standards
Lacks structural independence
Implementation limitations
Network Data Model (I)
The first network DBMS was implemented by
Honeywell in 1964-65 (IDS System).
Adopted heavily due to the support by CODASYL
(Conference on Data Systems Languages)
(CODASYL - DBTG report of 1971).
Later implemented in a large variety of systems -
IDMS (Cullinet - now Computer Associates), DMS
1100 (Unisys), IMAGE (H.P. (Hewlett-Packard)),
VAX-DBMS (Digital Equipment Corp., then
Compaq, now H.P.).
Network Data Model (II)
Each record can have multiple parents
Composed of sets
Each set has an owner record and member records
Member may have several owners
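A minimal sketch of owner/member sets, assuming a toy in-memory store. The `NetworkDB` class, its method names, and the set names `TEACHES` and `COVERS` are hypothetical; they only loosely echo the FIND owner / FIND member style of navigation and are not CODASYL syntax.

```python
class NetworkDB:
    """Toy store of named sets, each linking an owner record to member records."""

    def __init__(self):
        self._members = {}  # (set_name, owner) -> list of member records
        self._owner = {}    # (set_name, member) -> owner record

    def connect(self, set_name, owner, member):
        # Insert the member into the set occurrence owned by `owner`.
        self._members.setdefault((set_name, owner), []).append(member)
        self._owner[(set_name, member)] = owner

    def find_members(self, set_name, owner):
        return list(self._members.get((set_name, owner), []))

    def find_owner(self, set_name, member):
        return self._owner[(set_name, member)]


db = NetworkDB()
# The same course offering is a member of two different sets,
# so it has two owners (one per set).
db.connect("TEACHES", "Prof. Smith", "DB101-Fall")
db.connect("COVERS", "Databases", "DB101-Fall")
db.connect("TEACHES", "Prof. Smith", "OS201-Fall")

print(db.find_members("TEACHES", "Prof. Smith"))
print(db.find_owner("COVERS", "DB101-Fall"))
```

In this sketch a record gains several owners by participating in several sets, which is the feature that distinguishes the network model from a strict hierarchy.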
Network Data Model (III) – Another Example
Network Data Model (IV)
Advantages:
Able to model complex relationships and represent the semantics of add/delete on the relationships.
Can handle most situations for modeling using record types and relationship types.
Language is navigational; uses constructs like FIND, FIND member, FIND owner, FIND NEXT within set, GET, etc.
Programmers can do optimal navigation through the database.
Conceptual simplicity
Data access flexibility
Promotes database integrity
Data independence
Conformance to standards
Network Data Model (V)
Disadvantages:
Lack of structural independence
Navigational and procedural nature of processing
System complexity, Database contains a complex array
of pointers that thread through a set of records.
Little scope for automated “query optimization”
Relational Data Model (I)
Proposed in 1970 by E.F. Codd (IBM), first commercial
system in 1981-82.
Now in several commercial products (e.g. DB2,
ORACLE, MS SQL Server, SYBASE, INFORMIX).
Several free open source implementations, e.g. MySQL,
PostgreSQL
Currently most dominant for developing database
applications.
SQL relational standards: SQL-89 (SQL1), SQL-92
(SQL2), SQL-99 (SQL3), …
Relational Data Model (II)
Represented by a collection of tables (row/column)
Tables related by sharing common entity characteristic(s)
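As a small illustration of tables related through a shared attribute, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The `department` and `employee` tables and their contents are made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT,
        dept_id INTEGER REFERENCES department(dept_id)  -- shared attribute
    );
    INSERT INTO department VALUES (1, 'Sales'), (2, 'Research');
    INSERT INTO employee VALUES (10, 'Alice', 1), (11, 'Bob', 2), (12, 'Carol', 1);
""")

# The two tables are related only by the common dept_id values.
rows = conn.execute("""
    SELECT e.name, d.name
    FROM employee AS e JOIN department AS d ON e.dept_id = d.dept_id
    ORDER BY e.emp_id
""").fetchall()
print(rows)
```

No pointers connect the two tables; the relationship exists purely because rows hold matching `dept_id` values.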
Relational Data Model (III)
Advantages
Structural independence
Improved conceptual simplicity
Easier database design, implementation, management, and use
Ad hoc query capability with SQL
Powerful database management system
Disadvantages
Substantial hardware and system software overhead
Poor design and implementation is made easy
May promote “islands of information” problems
Entity Relationship Data Model
Complements the relational data model concepts
Represented in an entity relationship diagram (ERD)
Based on entities, attributes, and relationships
Entity Relationship Data Model
Advantages
Exceptional conceptual simplicity
Visual representation
Effective communication tool
Integrated with the relational database model
Disadvantages
Limited constraint representation
Limited relationship representation
No data manipulation language
Loss of information content
Object-Oriented Data Model (I)
Several models have been proposed for implementing in a
database system.
One set comprises models of persistent O-O
Programming Languages such as C++ (e.g., in
OBJECTSTORE or VERSANT), and Smalltalk (e.g., in
GEMSTONE).
Additionally, systems like O2, ORION (at MCC - then
ITASCA), IRIS (at H.P.- used in Open OODB).
Object Database Standard: ODMG-93, ODMG-version
2.0, ODMG-version 3.0.
Object-Oriented Data Model (II)
Objects or abstractions of real-world entities are
stored
Attributes describe properties
Collection of similar objects is a class
Methods represent real world actions of classes
Classes are organized in a class hierarchy
Inheritance is ability of object to inherit attributes and
methods of classes above it
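The terms above (attributes, methods, class hierarchy, inheritance) can be illustrated with ordinary Python classes. This is only a sketch of the concepts, not of any object database product; the `Person` and `Student` classes are hypothetical.

```python
class Person:
    """A class: a collection of similar objects."""

    def __init__(self, name):
        self.name = name  # attribute describing a property

    def describe(self):  # method representing an action
        return f"{self.name} (person)"


class Student(Person):
    """Sits below Person in the class hierarchy."""

    def __init__(self, name, program):
        super().__init__(name)
        self.program = program

    def enroll(self, course):
        return f"{self.name} enrolled in {course}"


s = Student("Somchai", "ITM")
print(s.describe())        # inherited from Person
print(s.enroll("ITM661"))  # defined on Student
```

`Student` defines no `describe` method of its own, so the call is resolved in the class above it, which is inheritance in the sense used on the slide.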
Object-Oriented Data Model (III)
Advantages
Adds semantic content
Visual presentation includes semantic content
Database integrity
Both structural and data independence
Disadvantages
Lack of OODM standards
Complex navigational data access
Steep learning curve
High system overhead slows transactions
OO Model vs. ER Model
Object-Relational Data Model
Most Recent Trend. Started with Informix
Universal Server.
Relational systems incorporate concepts from
object databases leading to object-relational.
Exemplified in the latest versions of Oracle (10g),
DB2, SQL Server, and other DBMSs.
Standards included in SQL-99 and expected to be
enhanced in future SQL standards.
Database Languages
Data definition language (DDL)
Permits specification of data types, structures and any data
constraints. All specifications are stored in the database.
Allows users to describe and name entities, attributes and
relationships required for the application.
Data manipulation language (DML)
General enquiry facility (query language) of the data.
Provides basic data manipulation operations on data held in the
database.
Procedural DML - allows user to tell system exactly how to
manipulate data.
Non-Procedural DML - allows user to state what data is needed
rather than how it is to be retrieved.
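To make the DDL/DML distinction concrete, here is a short sketch using Python's built-in `sqlite3` module. The `staff` table, its columns, and the salary figures are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: name the structure and specify its data types and constraints.
conn.execute("""
    CREATE TABLE staff (
        staff_no TEXT PRIMARY KEY,
        name     TEXT NOT NULL,
        salary   REAL CHECK (salary >= 0)
    )
""")

# DML: insert, update, and query the data held in that structure.
conn.execute("INSERT INTO staff VALUES ('S1', 'Ann', 30000)")
conn.execute("INSERT INTO staff VALUES ('S2', 'Ben', 24000)")
conn.execute("UPDATE staff SET salary = salary * 1.5 WHERE staff_no = 'S2'")
high_paid = conn.execute(
    "SELECT name FROM staff WHERE salary > 25000 ORDER BY name"
).fetchall()
print(high_paid)
```

The `CREATE TABLE` statement changes the schema, while the `INSERT`, `UPDATE`, and `SELECT` statements only touch the data. The `SELECT` is also non-procedural: it states which rows are wanted without saying how to find them.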
Database Languages
Fourth Generation Language (4GL)
Query Languages
Forms Generators
Report Generators
Graphics Generators
Application Generators
There is no consensus about what constitutes a 4GL.
Compared with a 3GL, which is procedural, a 4GL is
non-procedural.
The user defines what is to be done, not how.
DBMS Languages (DDL vs. DML)
Data Definition Language (DDL):
Used by the DBA and database designers to specify the
conceptual schema of a database.
In many DBMSs, the DDL is also used to define
internal and external schemas (views).
In some DBMSs, separate storage definition language
(SDL) and view definition language (VDL) are used to
define internal and external schemas.
SDL is typically realized via DBMS commands provided to
the DBA and database designers
DBMS Languages
Data Manipulation Language (DML):
Used to specify database retrievals and updates
DML commands (data sublanguage) can be embedded
in a general-purpose programming language (host
language), such as COBOL, C, C++, or Java.
A library of functions can also be provided to access the
DBMS from a programming language
Alternatively, stand-alone DML commands can be
applied directly (called a query language).
Types of DML
High Level or Non-procedural Language:
For example, the SQL relational language
Are “set”-oriented and specify what data to retrieve
rather than how to retrieve it.
Also called declarative languages.
Low Level or Procedural Language:
Retrieve data one record-at-a-time;
Constructs such as looping are needed to retrieve
multiple records, along with positioning pointers.
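The contrast between the two styles can be sketched with Python's built-in `sqlite3` module. Both halves run against SQLite, so the second half only imitates record-at-a-time processing: it is not a real procedural DML such as those of the hierarchical or network systems. The table and data are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO staff VALUES (?, ?)",
    [("Ann", 30000), ("Ben", 24000), ("Cat", 41000)],
)

# Non-procedural (declarative): state what is wanted as a set;
# the DBMS decides how to retrieve it.
declarative = [
    name
    for (name,) in conn.execute(
        "SELECT name FROM staff WHERE salary > 25000 ORDER BY name"
    )
]

# Procedural style: position on one record at a time and loop,
# testing each record in application code.
procedural = []
cursor = conn.execute("SELECT name, salary FROM staff")
record = cursor.fetchone()
while record is not None:
    name, salary = record
    if salary > 25000:
        procedural.append(name)
    record = cursor.fetchone()
procedural.sort()

print(declarative, procedural)
```

Both approaches produce the same names, but in the second the program carries the loop, the filtering condition, and the ordering itself.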
DBMS Interfaces
Stand-alone query language interfaces
Example: Entering SQL queries at the DBMS
interactive SQL interface (e.g. SQL*Plus in ORACLE)
Programmer interfaces for embedding DML in
programming languages
User-friendly interfaces
Menu-based, forms-based, graphics-based, etc.
Middleware
Middleware is a generic term used to describe
software that mediates with other software and
allows for communication between disparate
applications in a heterogeneous system.
The need for middleware arises when
distributed systems become too complex to
manage efficiently without a common
interface.
Transaction Processing Monitors
A TP monitor is a program that controls data transfer between
clients and servers in order to provide a consistent environment,
particularly for online transaction processing (OLTP).
Web Services and Service-Oriented Architectures
A Web service is a software system designed to
support interoperable machine-to-machine
interaction over a network.
Web services share business logic, data, and
processes through a programmatic interface
across a network.
Developers can add the Web service to a Web
page (or an executable program) to offer
specific functionality to users.
Web Services/Service-Oriented Architectures
Web services approach uses accepted
technologies and standards, such as:
XML (Extensible Markup Language).
SOAP (Simple Object Access Protocol) is a
communication protocol for exchanging structured
information over the Internet and uses a message
format based on XML. It is both platform- and
language-independent.
WSDL (Web Services Description Language) protocol,
again based on XML, is used to describe and locate a
Web service.
Web Services/Service-Oriented Architectures
UDDI (Universal Description, Discovery, and
Integration) protocol is a platform independent, XML-
based registry for businesses to list themselves on the
Internet.
Service-Oriented Architectures (SOA)
A business-centric software architecture for
building applications that implement business
processes as sets of services published at a
granularity relevant to the service consumer.
Services can be invoked, published, and
discovered, and are abstracted away from the
implementation using a single standards-based
form of interface.
Distributed DBMSs
A distributed database is a logically
interrelated collection of shared data (and a
description of this data), physically distributed
over a computer network.
A distributed DBMS is the software system
that permits the management of the
distributed database and makes the
distribution transparent to users.
Distributed DBMSs
A DDBMS consists of a single logical database
split into a number of fragments.
Each fragment is stored on one or more computers
(replicas) under the control of a separate DBMS,
with the computers connected by a network.
Each site is capable of independently processing
user requests that require access to local data (that
is, each site has some degree of local autonomy)
and is also capable of processing data stored on
other computers in the network.
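The idea of fragments stored at separate sites, each able to answer local requests on its own, can be sketched as follows. This is a toy model with no network, replication, or real DBMS involved; the `Site` and `DistributedDB` classes and the branch data are hypothetical.

```python
class Site:
    """One computer with its own local DBMS, holding one fragment (toy model)."""

    def __init__(self, name, rows):
        self.name = name
        self.rows = rows

    def local_query(self, predicate):
        # Local autonomy: answered using only the data stored at this site.
        return [r for r in self.rows if predicate(r)]


class DistributedDB:
    """Presents the fragments as a single logical database."""

    def __init__(self, sites):
        self.sites = sites

    def query(self, predicate):
        # Global request: ask every site and merge the answers, so the
        # caller does not need to know where each row is stored.
        result = []
        for site in self.sites:
            result.extend(site.local_query(predicate))
        return result


bangkok = Site("Bangkok", [{"staff": "Ann", "branch": "Bangkok"}])
nakhon = Site("Nakhon", [
    {"staff": "Ben", "branch": "Nakhon"},
    {"staff": "Cat", "branch": "Nakhon"},
])
ddb = DistributedDB([bangkok, nakhon])

local = nakhon.local_query(lambda r: True)  # uses local data only
everyone = ddb.query(lambda r: True)        # spans all fragments
print(len(local), [r["staff"] for r in everyone])
```

The caller of `ddb.query` never names a site, which is a very reduced picture of what making the distribution transparent to users means.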
Data Warehousing
A data warehouse was deemed the solution to meet the
requirements of a system capable of supporting decision
making, receiving data from multiple operational data sources.
Cloud Computing
The National Institute of Standards and
Technology (NIST) provided a definition.
Defined as “A model for enabling ubiquitous,
convenient, on-demand network access to a
shared pool of configurable computing
resources (e.g. networks, servers, storage,
applications, and services) that can be rapidly
provisioned and released with minimal
management effort or service provider
interaction.”
Cloud Computing – Key Characteristics
On-demand self-service
Consumers can obtain, configure and deploy cloud
services without help from provider.
Broad network access
Accessible from anywhere, from any standardized
platform (e.g. desktop computers, laptops, mobile
devices).
Cloud Computing – Key Characteristics
Resource pooling
Provider’s computing resources are pooled to serve
multiple consumers, with different physical and
virtual resources dynamically assigned and
reassigned according to consumer demand.
Examples of resources include storage, processing,
memory, and network bandwidth.
Cloud Computing – Key Characteristics
Rapid elasticity
Provider’s capacity caters for customer’s spikes in
demand and reduces risk of outages and service
interruptions. Capacity can be automated to scale
rapidly based on demand.
Measured service
Provider uses a metering capability to measure
usage of service (e.g. storage, processing,
bandwidth, and active user accounts).
Cloud Computing – Service Models
Software as a Service (SaaS):
Software and data hosted on cloud. Accessed
through using thin client interface (e.g. web
browser). Consumer may be offered limited user
specific application configuration settings.
Examples include Salesforce.com sales management
applications, NetSuite’s integrated business
management software, Google’s Gmail and
Cornerstone OnDemand.
Cloud Computing – Service Models
Platform as a Service (PaaS)
Allows creation of web applications without
buying/maintaining the software and underlying
infrastructure. Provider manages the
infrastructure including network, servers, OS and
storage, while customer controls deployment of
applications and possibly configuration.
Examples include Salesforce.com’s Force.com,
Google’s App Engine, and Microsoft’s Azure.
Cloud Computing – Service Models
Infrastructure as a Service (IaaS)
Providers offer servers, storage, network and
operating systems – typically a platform
virtualization environment – to consumers as an on-
demand service, in a single bundle and billed
according to usage.
A popular use of IaaS is in hosting websites.
Examples include Amazon’s Elastic Compute Cloud (EC2),
Rackspace and GoGrid.
Cloud Computing – Comparison of Service Models
Benefits of Cloud Computing
Cost-Reduction: Avoid up-front capital expenditure.
Scalability/Agility: Organisations set up resources on an as-
needs basis.
Improved Security: Providers can devote expertise &
resources to security; not affordable by customer.
Improved Reliability: Providers can devote expertise &
resources on reliability of systems; not affordable by
customer.
Access to new technologies: Through use of provider’s
systems, customers may access latest technology.
Benefits of Cloud Computing
Faster development: Provider’s platforms can
provide many of the core services to accelerate
development cycle.
Large scale prototyping/load testing: Providers
have the resources to enable this.
More flexible working practices: Staff can access
files using mobile devices.
Increased competitiveness: Allows organizations to
focus on their core competencies rather than their
IT infrastructures.
Risks of Cloud Computing
Network Dependency: Power outages, bandwidth issues and
service interruptions.
System Dependency: Customer’s dependency on
availability and reliability of provider’s systems.
Cloud Provider Dependency: Provider could become
insolvent or be acquired by a competitor, resulting in the
service suddenly terminating.
Lack of control: Customers unable to deploy technical or
organisational measures to safeguard the data. May result
in reduced availability, integrity, confidentiality,
intervenability and isolation.
Lack of information on processing transparency
Cloud-based database solutions
As a type of Software as a Service (SaaS),
cloud-based database solutions fall into two
basic categories:
Data as a Service (DaaS) and
Database as a Service (DBaaS).
The key difference between the two options is
mainly how the data is managed.
Cloud-based database solutions
DBaaS
Offers full database functionality to application
developers.
Provides a management layer that continuously
monitors and configures the database to
achieve optimized scaling, high availability,
multi-tenancy (that is, serving multiple client
organizations), and effective resource allocation in
the cloud, thereby sparing the developer from
ongoing database administration tasks.
Cloud-based database solutions
DaaS:
Service enables data definition in the cloud and
subsequent querying.
Does not implement typical DBMS interfaces (e.g.
SQL) but instead data is accessed via common APIs.
Enables organizations with valuable data to offer
access to others. Examples include Urban Mapping
(geography data service), Xignite (financial data
service), and Hoovers (business data service).
Cloud-based database solutions
Multi-tenant cloud database – shared server,
separate database server process architecture.
Cloud-based database solutions
Multi-tenant cloud database – shared DBMS
server, separate databases.
Cloud-based database solutions
Multi-tenant cloud database – shared database,
separate schema architecture.
Components of a DBMS
A DBMS is partitioned into several software
components (or modules), each of which is
assigned a specific operation. As stated
previously, some of the functions of the DBMS
are supported by the underlying operating
system.
The DBMS interfaces with other software
components, such as user queries and access
methods (file management techniques for
storing and retrieving data records).
Components of a DBMS
Components of a DBMS (Continued)
Query processor is a major DBMS component
that transforms queries into a series of low-
level instructions directed to the database
manager.
Database manager (DM) interfaces with user-
submitted application programs and queries.
The DM examines the external and conceptual
schemas to determine what conceptual records
are required to satisfy the request. The DM
then places a call to the file manager to
perform the request.
Components of a DBMS (Continued)
File manager manipulates the underlying
storage files and manages the allocation of
storage space on disk. It establishes and
maintains the list of structures and indexes
defined in the internal schema.
DML preprocessor converts DML statements
embedded in an application program into
standard function calls in the host language.
The DML preprocessor must interact with the
query processor to generate the appropriate
code.
Components of a DBMS (Continued)
DDL compiler converts DDL statements into a
set of tables containing metadata. These tables
are then stored in the system catalog while
control information is stored in data file
headers.
Catalog manager manages access to and
maintains the system catalog. The system
catalog is accessed by most DBMS components.
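One way to see DDL statements turning into catalog metadata is to inspect SQLite's own catalog from Python. This shows SQLite's behaviour specifically (its `sqlite_master` table and `PRAGMA table_info`); other DBMSs organize their system catalogs differently. The `branch` table is invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A DDL statement: SQLite processes it and records the definition.
conn.execute("CREATE TABLE branch (branch_no TEXT PRIMARY KEY, city TEXT)")

# SQLite keeps schema metadata in a catalog table named sqlite_master.
catalog = conn.execute(
    "SELECT type, name FROM sqlite_master WHERE type = 'table'"
).fetchall()

# PRAGMA table_info reports the column metadata recorded for one table;
# the second field of each row is the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(branch)")]

print(catalog, columns)
```

The catalog is itself queried with ordinary SQL here, which hints at why most DBMS components can consult it easily.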
Components of DB Manager (DM)
Components of the DB Manager
Authorization control confirms whether the
user has the necessary permission to carry out
the required operation.
Command processor receives control once the
user’s authority has been confirmed.
Integrity checker ensures that requested
operation satisfies all necessary integrity
constraints (e.g. key constraints) for an
operation that changes the database.
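The order of these three checks can be sketched as a small pipeline. The `PERMISSIONS` table and the `run` function are hypothetical and exist only in this example; the integrity check itself is delegated to SQLite, which raises `sqlite3.IntegrityError` when a key constraint is violated. No real DBMS is claimed to be organized exactly this way.

```python
import sqlite3

# Hypothetical permission table: user -> operations the user may perform.
PERMISSIONS = {"ann": {"INSERT", "SELECT"}, "guest": {"SELECT"}}


def run(conn, user, operation, sql, params=()):
    # Authorization control: check permission before anything else.
    if operation not in PERMISSIONS.get(user, set()):
        return "denied"
    # Command processor: control passes here once the user is authorized.
    try:
        conn.execute(sql, params)
    except sqlite3.IntegrityError:
        # Integrity checker: the change would break a constraint.
        return "integrity violation"
    return "ok"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (staff_no TEXT PRIMARY KEY, name TEXT)")
insert = "INSERT INTO staff VALUES (?, ?)"

results = [
    run(conn, "guest", "INSERT", insert, ("S1", "Ann")),  # no INSERT permission
    run(conn, "ann", "INSERT", insert, ("S1", "Ann")),    # accepted
    run(conn, "ann", "INSERT", insert, ("S1", "Bob")),    # duplicate primary key
]
print(results)
```

The guest's request never reaches the database, while the second insert by an authorized user is rejected later, at the constraint check.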
Components of the DB Manager
Query optimizer determines an optimal strategy
for the query execution.
Transaction manager performs the required
processing of operations that it receives from
transactions.
Scheduler ensures that concurrent operations
on the database proceed without conflicting
with one another. It controls the relative order
in which transaction operations are executed.
Components of the DB Manager
Recovery manager ensures that the database
remains in a consistent state in the presence of
failures. It is responsible for transaction
commit and abort.
Buffer manager is responsible for the transfer of
data between main memory and secondary
storage, such as disk and tape.
The recovery manager and the buffer manager
are sometimes referred to collectively as the data
manager. The buffer manager is also known as the cache manager.
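Transaction commit and abort, as seen from an application, can be sketched with Python's built-in `sqlite3` module. This shows only the visible effect (a failed transaction leaves no partial change); it does not show how a recovery manager is implemented internally. The `account` table, the `transfer` function, and the simulated failure are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES ('A', 100), ('B', 0)")
conn.commit()


def transfer(conn, amount, fail_midway=False):
    """Move `amount` from account A to account B as one transaction."""
    try:
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE id = 'A'", (amount,)
        )
        if fail_midway:
            raise RuntimeError("simulated failure between the two updates")
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE id = 'B'", (amount,)
        )
        conn.commit()    # commit: both changes become permanent together
    except RuntimeError:
        conn.rollback()  # abort: the partial change is undone


def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM account"))


transfer(conn, 30, fail_midway=True)
after_abort = balances(conn)
transfer(conn, 30)
after_commit = balances(conn)
print(after_abort, after_commit)
```

After the aborted transfer the balances are unchanged, so the database never exposes the state in which money has left A but not reached B.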