© michael lang, national university of ireland, galway 1 introduction to database systems
Post on 11-Jan-2016
214 Views
Preview:
TRANSCRIPT
© Michael Lang, National University of Ireland, Galway1
Introduction to
Database Systems
© Michael Lang, National University of Ireland, Galway2
Today’s session• Some history
• Origins of DB
• Organisational structure
• Organisational information systems
• Data storage
• DBMS
• Data models
© Michael Lang, National University of Ireland, Galway3
What is “Data”?• ANSI definition:
Data A representation of facts, concepts, or instructions in a
formalized manner suitable for communication, interpretation, or processing by humans or by automatic means.
Any representation such as characters or analog quantities to which meaning is or might be assigned. Generally, we perform operations on data or data items to supply some information about an entity.
• Volatile vs. persistent data Our concern is primarily with persistent data
© Michael Lang, National University of Ireland, Galway4
Early Data Management - Ancient History
• Data are not stored on disk• Programmer defines both logical data structure and physical
structure (storage structure, access methods, I/O modes, etc)• One data set per program. High data redundancy.
PROGRAM 2
DataManagement
PROGRAM 3
DataManagement
PROGRAM 1
DataManagement
DATA SET 1
DATA SET 2
DATA SET 3
© Michael Lang, National University of Ireland, Galway5
Problems• There is no persistence.
All data is transient and disappears when the program terminates.
• Random access memory (RAM) is expensive and limited All data may not fit available memory
• Programmer productivity low The programmer has to do a lot of tedious work.
© Michael Lang, National University of Ireland, Galway6
File Processing - Recent (and Current)
History• Data are stored in files with interface between programs and files.• Various access methods exist (e.g., sequential, indexed, random)• One file corresponds to one or several programs.
PROGRAM 1
DataManagement FILE 1
FILE 2 Red
unda
nt D
ata
PROGRAM 2
DataManagement
PROGRAM 3
DataManagement
FileSystemServices
© Michael Lang, National University of Ireland, Galway7
File System Functions
• Mapping between logical files and physical files Logical files: a file viewed by users and programs.
• Data may be viewed as a collection of bytes or as a collection of records (collection of bytes with a particular structure)
• Programs manipulate logical files Physical files: a file as it actually exists on a storage
device.
• Data usually viewed as a collection of bytes located at a physical address on the device
• Operating systems manipulate physical files.
• A set of services and an interface (usually called application independent interface – API)
© Michael Lang, National University of Ireland, Galway8
Problems With File Systems
• Data are highly redundant sharing limited and at the file level
• Data is unstructured “flat” files
• High maintenance costs data dependence; accessing data is difficult ensuring data consistency and controlling access
to data
• Sharing granularity is very coarse
• Difficulties in developing new applications
© Michael Lang, National University of Ireland, Galway9
PROGRAM 1
PROGRAM 1
PROGRAM 2
IntegratedDatabase
DBMS
Query Processor
Transaction Mgr
…
Database Approach
© Michael Lang, National University of Ireland, Galway10
Origins of Databases• c1455: Gutenberg invents printing. Explosive interest
in publication of books (analogous with explosive growth of Web in early 1990s) leads to public libraries
• Libraries were first to introduce standards for information storage and retrieval
• These paper-based systems were extended and enhanced, and filing, indexing and classification schemes were developed
• Second World War: accelerated R&D in computing technologies spawned capability to computerise maintenance of records
© Michael Lang, National University of Ireland, Galway11
Computerised Data Storage
• Advantages of computerised data storage over paper-based systems include: ability to store data compactly (e.g Britannica CD) enhanced data retrieval ability to access data remotely, e.g. from a mobile
workstation, off-site location, or distant branch ability to share data amongst multiple users with
concurrent access facility to automate regular, speedy back-ups enhanced data editing
• Most significant disadvantage is vulnerability e.g system crash, corrupted data, viruses, hackers, etc.
© Michael Lang, National University of Ireland, Galway12
Information: A Corporate Asset
• Information is a vital corporate asset. Without accurate, current, relevant information, mistakes and misjudgements may be made
• Data management is an essential capability in the modern business environment / information society
• A knowledge organisation is one in which the primary asset is information; its competitive advantage is derived from effective use of documented knowledge. Examples: accounting firms, marketing companies, software houses
• Organisational memory extends and amplifies information / knowledge by capturing, organising, disseminating, and reusing it
© Michael Lang, National University of Ireland, Galway13
Organisational MemoryOrganisational Memory
Field(part of a record reserved for a particular data item)
File / Table(collection of related records)
Database(logical grouping of related files)
Record(collection of related fields, bound together as single units)
Byte(group of eight binary bits - can represent a single character)
© Michael Lang, National University of Ireland, Galway14
Organisational Memory
Organisational memory
Data
Improved products and
services
Informed decisions
© Michael Lang, National University of Ireland, Galway15
Attributes of Organisational Memory
• Current
• Timely real-time systems are commonplace in modern
business environment
• Relevant data is only useful if relevant to task in hand
• Shareable
• Complete
• Accurate and consistent
© Michael Lang, National University of Ireland, Galway16
Attributes of Organisational Memory
• Transportable authorised personnel should have access to data
anywhere, anytime
• Secure against unauthorised access
© Michael Lang, National University of Ireland, Galway17
Organisational Information Systems
• Generally, there is an alignment between business units and core operational systems
• Typical core systems are: Sales & Marketing Department: Customer
management system, Order processing system
Operations Unit: Purchasing system, Inventory control system
Finance Department: Accounts payable and receivable systems, Credit Management system
• There are interdependencies between these systems; hence the need for an integrated data management approach
© Michael Lang, National University of Ireland, Galway18
Data Storage• Information systems create, read, update and
delete data
• Data can be stored in conventional files or databases Files are collections of similar records
Databases are collections of interrelated files. • Records can be linked through specified relationships to
records in other files
© Michael Lang, National University of Ireland, Galway19
Conventional Files• In the file environment, data
storage is built around the applications that will use the files
• Essentially, the file “belongs” to a specific application. This is termed program-data dependence
• As applications are developed, customised files are created which may be unusable by other applications
InformationSystem
FileFile
File
File
© Michael Lang, National University of Ireland, Galway20
Conventional Files• First attempts at computerised storage of records
followed traditional paper-based metaphors (“Flat file” systems)
• Flat files were inefficient for data retrieval: it might be necessary to search entire file for a record (which, it may transpire, does not exist). Remedy: index files
• Indexing improved data retrieval, but conventional files have other disadvantages: Program-Data dependence
Proprietary file formats (closed systems)
Poor scalability
© Michael Lang, National University of Ireland, Galway21
Conventional Files• Disadvantages (Cont’d):
Duplicated and redundant data• ambiguity: same thing being referred to by different names in different
places• inconsistency: conflicting / unsynchronised data• wasted effort
Separation and isolation of data• data dispersed amongst many files complicates processing
Inflexibility• cumbersome data structures and report layouts• not responsive to ad hoc queries• excessive program maintenance
Development environment• procedural -v- non-procedural (3GL -v- 4GL)
© Michael Lang, National University of Ireland, Galway22
Conventional Files• Advantages:
Historically, conventional files have been faster to process than DBMS applications
• As legacy file-based systems become candidates for reengineering, the trend is to replace them with database systems
© Michael Lang, National University of Ireland, Galway23
Database Management Systems
• A database is a large, integrated collection of data which models a real-world enterprise
• A Database Management System (DBMS) is a software package designed to store and manage databases
• In a DBMS environment, applications are built around an integrated adaptable database
Information System
Database
Information System
Information System
Information System
© Michael Lang, National University of Ireland, Galway24
• Advantages: ability to share the same data across multiple applications and
systems
data independence
control of redundancy
enforced data integrity
improved data security
uniform data administration
concurrent access
improved backup and recovery facility
flexible data structures
Database Management Systems
© Michael Lang, National University of Ireland, Galway25
Database Management Systems
• Advantages (Cont’d): databases allow the use of the data in ways not
originally specified by the end-users (ad hoc queries)
database definition can be extended without impacting existing programs that use it
economies of scale
© Michael Lang, National University of Ireland, Galway26
• Disadvantages: database technology is more complex than file
technology• requires more sophisticated hardware and software (DBMS)
DBMS’s can still be slower than file-based systems
database technology requires a significant investment• database administration• operating costs and ongoing maintenance• end-user training
higher impact of system failure
Database Management Systems
© Michael Lang, National University of Ireland, Galway27
Database Management Systems
• Roles in a DBMS environment Data Administrator
Database Administrator
Database Designer
Application Programmer
End-User
© Michael Lang, National University of Ireland, Galway28
Database Architectures
• Hierarchical Data Model
• Network Data Model
• Relational Data Model
• Object-Relational Model
© Michael Lang, National University of Ireland, Galway29
Data Relationships• One-to-One (1:1)
Example: Bank Manager manages one and only one Bank Branch; Bank Branch is managed by one and only one Bank Manager
• One-to-Many (1:M) Example: An Employee works in a designated
Department; in any Department, there may be many Employees
• Many-to-Many (M:M) Example: A Student registers for one or more
Courses; for any Course, there may be one or more registered Students
© Michael Lang, National University of Ireland, Galway30
Hierarchical Data Model
© Michael Lang, National University of Ireland, Galway31
Hierarchical Data Model
• Arose as a solution to a practical problem Managing millions of parts for the space program
(standard “Bill of Materials” problem)
• The basic structure is a hierarchy or tree Parent-child relationship
• Relationships are represented by pointers
• Restrictions: Each segment has at most one parent
All relationships are 1:M
© Michael Lang, National University of Ireland, Galway32
Hierarchical Data Model
• Problem: how to represent a M:M relationship ? A hierarchical structure (tree) can only support
1:M relationships. Therefore, to represent M:M, must create multiple hierarchies
… but, this means that records are duplicated in different hierarchies
duplication gives rise to data anomalies
duplication can be eliminated using pointers; must decide in which table to store data, and in which table to store pointer. This is a very awkward means of implementing M:M
© Michael Lang, National University of Ireland, Galway33
Network Data Model
© Michael Lang, National University of Ireland, Galway34
Network Data Model• Objective was to overcome shortcomings of the
hierarchical data model, in particular, representation of M:M relationship
• Like the hierarchical model, it can be likened to trees; unlike hierarchical model, several trees can share branches
• In practice, enjoyed little commercial success Too complex: suited to use by programmers as
opposed to end-users
Overtaken by the relational model
No clear theoretical base
© Michael Lang, National University of Ireland, Galway35
Network Model: Structures
• Data item a field or attribute
• Record a collection of data items
• Vectors (repeating structures) are permitted
• Relationships are represented by sets
• Sets have owners (parent) and members (children)
• A member cannot have two parents in the one set cannot directly represent M:M relationships
• Member records of a set can be ordered
© Michael Lang, National University of Ireland, Galway36
Shortcomings of Early Models
• Languages of both hierarchical and network are procedural and record-at-a-time
• To retrieve data … you must navigate (find a path) to the required
record
issue multiple statements directing the system to traverse that path
• It was necessary to issue multiple requests to the DBMS to retrieve a data item
• It was necessary to have a detailed understanding of how data was stored and structured this is contrary to data independence principle
© Michael Lang, National University of Ireland, Galway37
Relational Data Model• Relational model, developed by E. F. Codd, has a strong
theoretical basis and overcomes shortcomings of network / hierarchical models
• Relational model is non-procedural (What?, not How?), and set-at-a-time
• Navigation is automatic• Relations
2-dimensional data set consisting of N columns (fields / attributes) and M rows (records)
• All rows in a relation are unique• A relational database is a set of relations
© Michael Lang, National University of Ireland, Galway38
Relational Model: Structures
© Michael Lang, National University of Ireland, Galway39
Relational Model: Structures
© Michael Lang, National University of Ireland, Galway40
Relational Model: Structures
• Primary key A unique identifier of a row in a relation; can be
composite
• Candidate key An attribute that could be a primary key
• Alternative key A candidate key that is not selected as the primary key
• Foreign key An attribute of a relation that is the primary key of
another relation; can be composite
© Michael Lang, National University of Ireland, Galway41
Relational Algebra and SQL
• A major strength of the relational model is that it supports SQL, a flexible data retrieval language which facilitates ad hoc queries
• Relational algebra is a standard for judging a data retrieval language
Relational Algebra SQL
Restrict
ProjectProduct
Union
Difference
SELECT * FROM AWHERE condition
SELECT X FROM A
SELECT * FROM A, B
SELECT * FROM A UNIONSELECT * FROM B
SELECT * FROM AWHERE NOT EXISTS (SELECT * FROM B WHERE A.X = B.X AND A.Y=B.Y AND ...
A where condition
A times B
A union B
A minus B
A [X]
© Michael Lang, National University of Ireland, Galway42
Relational Databases• A truly relational database supports
structures (domains and relations)
integrity rules
a manipulation language
• The word “relational” is sometimes used too freely; many commercial systems are not fully relational because they do not support domains and integrity rules
• E. F. Codd has set forward 12 rules that a database must satisfy before it can be said to be truly relational
© Michael Lang, National University of Ireland, Galway43
Codd’s 12 Rules• Information representation
All data, including metadata (data definition, constraints, user names, etc.), is represented solely and explicitly as values in a table
• Guaranteed access Every value in a database is accessed by specifying a
combination of table name, column name, and value of the primary key of the row in which it is stored
No artificial paths, such as linked lists
• Systematic treatment of null values There must be a distinct representation for unknown data,
irrespective of data type
Null values are not equivalent to zero or the empty string
© Michael Lang, National University of Ireland, Galway44
Codd’s 12 Rules• Dynamic on-line catalog based on the relational model
database description is represented at the logical level in the same way as ordinary data
thus, only one manipulation language needs to be learned
• Comprehensive data sublanguage rule A relational system may support several languages and
various modes of terminal use
However, there must be at least one language that supports data definition, data manipulation, security and integrity constraints, and transaction processing operations
• View updating If the base tables of a view are updated, then the view itself
should be updated
© Michael Lang, National University of Ireland, Galway45
Codd’s 12 Rules• High-level insert, update, and delete
The system must support set-at-a-time operations; for example, a set of rows can be deleted by a single statement
• Physical data independence Changes to storage representation or access methods will
not affect application programs
• Logical data independence Implementation of changes to base tables will not affect
application programs; for example, if a table is restructured or split, applications should be immune to change (views are beneficial here)
© Michael Lang, National University of Ireland, Galway46
Codd’s 12 Rules• Integrity independence
Integrity constraints should be part of a database's definition rather than embedded within application programs
It must be possible to change integrity constraints without affecting any existing application programs
• Distribution independence Introduction of a distributed DBMS or redistributing existing
distributed data should have no impact on existing applications
• Nonsubversion It must not be possible to use low-level record-at-a-time
interface to by-pass high-level set-at-a-time security or integrity constraints
© Michael Lang, National University of Ireland, Galway47
Codd’s Rule 0• A relational DBMS must be able to manage
databases entirely through its relational capacities
• A DBMS is either totally relational or it is not relational
© Michael Lang, National University of Ireland, Galway48
Tasks in the seminars 1 - 3
• In order to do MMDB you need to be able to use SQL – self study sessions with tutor support
top related