Lecturers: Alexandra I. Cristea; Hugh Darwen ([email protected]); other invited talks?: TBA
Dr. Alexandra I. Cristea
http://www.dcs.warwick.ac.uk/~acristea/
CS 319: Theory of Databases: C1
Schedule
• Usual:
  – Mo 16-17
  – Wed 9-10
  – Thu 16-17
• Exceptions:
  – week 2: 11.10.06: no course
  – week 3: 16.10.06: 2 hours
  – week 3: 18.10.06: no course
  – Others: TBA: check forum, website, course
Slides
• Thanks to:
  – Dr. Paul Goldberg: http://www.dcs.warwick.ac.uk/people/academic/Paul.Goldberg/cs319/cs319index.html
  – Dr. Meurig Beynon: http://www.dcs.warwick.ac.uk/people/staff/Meurig.Beynon/
  – Dr. Ad Aerts: http://wwwis.win.tue.nl/~aaerts/
  – Prof. Dr. Paul De Bra: http://wwwis.win.tue.nl/~debra/
  – Others: mentioned directly
Contact
• Forum: http://forums.warwick.ac.uk/wf/browse/category.jsp?cat=24
• IF (and ONLY IF) a question is personal, you might address it to [email protected]
  – FORMAT: the subject of the email should contain ‘CS319’ and the topic of the email (otherwise it will be filtered out)
Course site(s):
• Current:
  – http://www.dcs.warwick.ac.uk/~acristea/courses/CS319/
  – Will contain current slides, as taught at the course
  – Will contain notifications: check BEFORE & AFTER the course
• Official:
  – http://www.dcs.warwick.ac.uk/undergraduate/modules/cs319.html
• Past:
  – http://www.dcs.warwick.ac.uk/people/academic/Paul.Goldberg/cs319/cs319index.html
(provisionary) Content: Topics
1. Generalities DB
2. Integrity constraints (FD revisited)
3. Temporal Data
4. Relational Algebra (revisited)
5. Query optimisation
6. Tuple calculus
7. Domain calculus
8. Query equivalence
9. LLJ, DP and applications
10. The Askew Wall
11. Datalog
Books
• Korth and Silberschatz, Database System Concepts, McGraw-Hill, 1991.
• Ullman J D, Principles of Database Systems (Vols 1 and 2), Computer Science Press, 1988.
Purpose of this course
• More in-depth information on the theory of databases
• How (some of) the existing db languages fit in the theory (and how they don’t)
Overlaps and sequencing
• Form: optional
• Prerequisites
  – CS252: Fundamentals of Database Systems
  – Optional: CS253: Topics in Database Systems
  – (actual previous CS233: Database Systems)
Organization of the course
• 15 CATS
• CS, CSE, CBS, Mathematics
• ~ 30 1-hour lectures
• Exam at the end: 3 hours
• Rules of the game:
  – Read also comments on the slides.
  – Presence is optional, but beware: slides-only are NOT ENOUGH to learn from for the exam; you need to participate, take your own notes, so self-study!
Scope of CS 319
• Expressive power of db query languages
• Algorithms for the computational problems
• Limitations to classical relational theory
• A central contribution of theory is to say what cannot be done, not just what can.
• Consider how this observation is relevant to the above topics.
A theme of CS319
• How do theory and computing practice relate, with specific reference to databases?
• Key ideas:
• Theory: based on Codd's relational theory.
  – There is an excellent correspondence between relational theory and practical database application of a certain kind.
• Relational databases can be seen as a precursor of 2 principal kinds of computer application:
  – environments for end-user programming and
  – computer-based models of real-world state.
1. Database Generalities
DB generalities : What is a database?
• Chris Date:
  – Database = computer-based record keeping system
• R.W. Engles: "A Tutorial on DB Organisation" (1974)
  – Db = collection of stored operational data used by the applications system of a particular enterprise
  – enterprise: hospital, university, bank, company etc
  – operational data:
    • data on products, accounts, patients etc
    • typically persistent (cf. conventional program I/O data)
DB generalities : Why use a db?
• Case-study: Banking (Korth & Silberschatz Chap. 1)
• ? How to meet needs using a traditional file-processing system supported by a conventional OS
• Files: permanent records of customers, accounts
• Applications programs (APs): enable user to modify files
  – to credit or debit an account
  – to add a new account
  – to find the balance in an account
  – to generate monthly statements
• APs written by systems programmers as required
• new requirements => new files + new programs
Original context for data modelling 1
• 1970s style applications
• unsophisticated computer users
• batch mode interaction
• modest response times
• no visualisation or GUI
• modest expectations for ease-of-use
• programming perceived as technical
• simple infrastructure and environment
• no PC, web etc
• no live feeds of data
• textual interaction the norm
Original context for data modelling 2
• 1970s style applications
  – Business context
    • simple business model, limited automation, access etc
    • low volume of data
    • not initially distributed
  – Computing context
    • existing/emerging DB proposals unconvincing
    • computers not very powerful
    • human and computing resources very expensive
Summary of issues for data management
1. Data redundancy and inconsistency
2. Difficulty in accessing data
3. Data isolation
4. Concurrent access anomalies
5. Security problems
6. Integrity problems
DB generalities : +’s and –’s of DB use
• Conventional file systems have certain characteristics
• we will review the key issues for data management:
  + indicates a positive impact of using a database
  – indicates a potentially negative impact of using a DB
Data redundancy & inconsistency
• programmers use different file formats, developed at different stages in the history of the enterprise
=> data duplicated:
+ in a DB rationalise and standardise data
[rationalise: shared source for data]
… rationalise doesn't necessarily mean centralise
Data redundancy & inconsistency
– compromises are needed
  • where users suit themselves => efficient results, but no perfect data organisation to suit all users
– duplication: insurance against info loss
Difficulty in accessing data
• unforeseen requests, new functionality
• new programs, possibly new data structures
+ in a DB, simplify access & manipulation by intelligent organisation of data (cf. modelling approach to requirements, e.g. in use of UML in OOSE)
Data isolation
• in APs, data has to be retrieved from many sources
+ in a DB, hide the source of the physical data: higher level of abstraction
– automation: less human interaction with data
– risk of corrupted data
Concurrent access anomalies
• would like multiple access (efficiency & faster response), e.g. simultaneous withdrawal
+ concurrency can't be managed without a form of overall control
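The simultaneous-withdrawal anomaly can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Account` class, the function names and the amounts are hypothetical, and a lock stands in for the "overall control" a DBMS would provide. The first function replays by hand a schedule in which both withdrawals read the balance before either writes; the second runs the same withdrawals under a lock.

```python
import threading


class Account:
    """Hypothetical bank account, used only for illustration."""

    def __init__(self, balance):
        self.balance = balance
        self._lock = threading.Lock()

    def withdraw_with_control(self, amount):
        # The lock makes read-modify-write one indivisible step,
        # which is the kind of overall control the slide refers to.
        with self._lock:
            self.balance = self.balance - amount


def uncontrolled_interleaving(start, first, second):
    """Replay a bad schedule: both withdrawals read before either writes."""
    balance = start
    read_a = balance            # withdrawal A reads the balance
    read_b = balance            # withdrawal B reads the same value
    balance = read_a - first    # A writes its result
    balance = read_b - second   # B overwrites it: A's update is lost
    return balance


def controlled_withdrawals(start, first, second):
    """Run the two withdrawals in threads, serialised by the account lock."""
    account = Account(start)
    threads = [
        threading.Thread(target=account.withdraw_with_control, args=(amount,))
        for amount in (first, second)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return account.balance
```

Starting from 100 and withdrawing 30 and 50, the uncontrolled schedule leaves 50 (one withdrawal has vanished), while the controlled version leaves 20.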
Security problems
• to restrict access to (un)authorised users for confidential info
+ security needs a form of overall control
– issues: where should the control be? inside or outside computer system
* e.g., non-trivial problem to determine what can be inferred from responses to queries that aren't explicit
Integrity problems
• integrity constraints
  – may arise dynamically:
  – difficult to modify programs to cope;
  – hard to guarantee if data stored in ≠ files
+ automated management demands a form of overall control
– automation reduces scope for human intervention / interpretation
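As a rough sketch of what centrally managed integrity looks like, the following Python example uses the standard `sqlite3` module. The `account` table, its `CHECK (balance >= 0)` constraint and the helper names are assumptions made for illustration; the point is only that the constraint is declared once, in the schema, instead of being re-coded in every application program.

```python
import sqlite3


def open_bank():
    """Create an in-memory database with one constrained table (illustrative)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE account (
               number  INTEGER PRIMARY KEY,
               balance INTEGER NOT NULL CHECK (balance >= 0)
           )"""
    )
    conn.execute("INSERT INTO account VALUES (1, 100)")
    conn.commit()
    return conn


def try_withdraw(conn, number, amount):
    """Return True if the withdrawal respects the constraint, False otherwise."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE number = ?",
                (amount, number),
            )
        return True
    except sqlite3.IntegrityError:
        # The DBMS, not the application program, rejected the update.
        return False


def balance(conn, number):
    row = conn.execute(
        "SELECT balance FROM account WHERE number = ?", (number,)
    ).fetchone()
    return row[0]
```

A withdrawal of 30 succeeds and leaves 70; a further withdrawal of 500 is refused and the balance stays at 70.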
Conclusions: Issues for data management
• For many commercial applications, a good solution is offered by a database management system (DBMS).
• A DBMS is an unconventional OS operating over a structured file system.
Motivating the DBMS concept
• devise an abstract model of the entire corpus of operational data that simplifies the data processing activity, so that
  – simple queries can be handled without writing new application programs
  – if APs are needed => accessing & manipulating operational data consistently and efficiently is greatly simplified
The ingredients of a database
• Data
  – integrated
  – shared
  – (possibly distributed)
• Hardware
  – storage
• Software
  – database management system: DBMS
  – protects users from hardware level detail
  – serves the needs of all users
DB Users
• end-user:
  – non-specialist accessing data via a query language
  – naïve user accessing data via a special-purpose interface
  performs data retrieval and update (extend / modify)
• applications programmer:
  – writes programs that use the DB by embedding queries to the DB in a HLL
  – develops interfaces for the naïve user
DB Users
• Database Administrator (DBA):
  – responsible for overall control
  – decides what data is to be stored
  – designs the conceptual scheme used to represent the operational data
  – implements authorisation checks
  – decides strategy for backup and recovery
  – monitors performance
  – oversees modification to suit user requirements
Data abstraction in a db
• addresses issues of design, use, management and implementation
• Data model describes (formally) 3 different levels of abstraction:
• physical level
• conceptual level
• view level
3 Levels
• physical level:
  – how is the data actually represented in the hardware?
  – bits, bytes
• conceptual level:
  – what meaningful relationships are expressed by the physical data?
  – entities, and relationships between entities
• view level:
  – what particular relationships are required by users?
  – more abstract because partial: typically very high-level knowledge constitutes the view
Illustrating data abstraction:
• DB w. date of birth of a client (bit string).
• senior citizens?: clients aged > 65
• Representations at ≠ abstraction levels:
– Conceptual: date of birth
– Physical: bit string
– View: age (not stored in DB!!)
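The date-of-birth illustration can be sketched with Python's standard `sqlite3` module. Everything named here is an assumption for the sake of the example: the `client` table, the one-row `reference_day` table (standing in for "today" so that the result is reproducible) and the `client_age` view. Only the date of birth is stored; age exists solely at the view level.

```python
import sqlite3


def build_demo_db(reference_date, clients):
    """Store dates of birth only; expose age through a view (illustrative)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE client (name TEXT PRIMARY KEY, date_of_birth TEXT NOT NULL);
        CREATE TABLE reference_day (day TEXT NOT NULL);
        CREATE VIEW client_age AS
            SELECT c.name AS name,
                   CAST(strftime('%Y', r.day) AS INTEGER)
                   - CAST(strftime('%Y', c.date_of_birth) AS INTEGER)
                   -- subtract 1 if this year's birthday has not yet occurred
                   - (strftime('%m-%d', r.day) < strftime('%m-%d', c.date_of_birth))
                   AS age
            FROM client AS c CROSS JOIN reference_day AS r;
        """
    )
    conn.execute("INSERT INTO reference_day VALUES (?)", (reference_date,))
    conn.executemany("INSERT INTO client VALUES (?, ?)", clients)
    conn.commit()
    return conn


def senior_citizens(conn):
    """View-level query: clients aged over 65."""
    rows = conn.execute("SELECT name FROM client_age WHERE age > 65 ORDER BY name")
    return [name for (name,) in rows]


def stored_columns(conn):
    """Conceptual-level columns actually stored for a client."""
    return [row[1] for row in conn.execute("PRAGMA table_info(client)")]
```

The physical level (how SQLite lays the text out in bytes on disk) never appears in the code, which is the point of the abstraction.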
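Data abstraction in a database
[Diagram: three levels, with their schemes and roles]
• views: sub-schemes: USE
• conceptual model: conceptual scheme: DESIGN & MANAGEMENT
• physical layout: physical scheme: IMPLEMENTATION
• external schemas link views and conceptual model; internal schemas link conceptual model and physical layout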
Role of abstraction (1)
• Internal & external translation schemas:
* protect conceptual model from change
• when physical organisation changes / new views are required
* protect user from a need to change views
* protect against a need to alter physical organisation
• if conceptual model is modified
Role of data abstraction (2)
• physical data independence:
protecting conceptual model from change when
physical organisation changes
• logical data independence:
protecting user from need to change views when
conceptual model changes
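Logical data independence can be illustrated with a small Python `sqlite3` sketch. The tables `client`, `person` and `birth`, the view `client_view` and the migration itself are hypothetical, chosen only to show the idea: the conceptual model is restructured, the view is redefined over the new tables, and the user's query text does not change.

```python
import sqlite3

# The user's query is written once, against the view only.
USER_QUERY = "SELECT name, birth_year FROM client_view ORDER BY name"


def schema_v1(conn):
    """First conceptual model: a single client table."""
    conn.executescript(
        """
        CREATE TABLE client (name TEXT, date_of_birth TEXT);
        INSERT INTO client VALUES ('Ann', '1930-05-01');
        INSERT INTO client VALUES ('Bob', '1975-02-14');
        CREATE VIEW client_view AS
            SELECT name,
                   CAST(substr(date_of_birth, 1, 4) AS INTEGER) AS birth_year
            FROM client;
        """
    )


def migrate_to_v2(conn):
    """Second conceptual model: the data is split over two tables."""
    conn.executescript(
        """
        DROP VIEW client_view;
        CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE birth (person_id INTEGER REFERENCES person(id),
                            date_of_birth TEXT);
        INSERT INTO person (name) SELECT name FROM client;
        INSERT INTO birth
            SELECT p.id, c.date_of_birth
            FROM person AS p JOIN client AS c ON c.name = p.name;
        DROP TABLE client;
        -- Only the view definition (the external schema) is rewritten.
        CREATE VIEW client_view AS
            SELECT p.name AS name,
                   CAST(substr(b.date_of_birth, 1, 4) AS INTEGER) AS birth_year
            FROM person AS p JOIN birth AS b ON b.person_id = p.id;
        """
    )


def run_user_query(conn):
    return conn.execute(USER_QUERY).fetchall()
```

Running `run_user_query` before and after `migrate_to_v2` gives the same rows. Physical data independence is the analogous property one level down, and is not shown here.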
Characteristics of electronic data 1970
• “Abstract model of the entire corpus of operational data”
• Demands of the abstract model in 1970 quite low …
  – small volumes of data, modest performance
  – limited levels of volatility and automation tolerated
• Today different, BUT
  – (subject to viewing human agency as a metaphor for any agency)
  – key issues of a classical database are still vital
• Any DB modelling paradigm must handle 70s problems
db models
• 2 principal kinds of abstract data model:
  – object-based models
  – record-based models
    • earliest
    • reflects file system culture they displaced
Object-based models
• main models:
  – entity-relationship models
  – object-oriented data models
• Others: semantic & functional data models.
• E-R model widely used to model data abstractly
• OO model gaining acceptance in practice: effectively represents data + operations on it.
Record-based Logical Models
• Used at the conceptual and view levels. Specify both
  – overall logical structure of the database
  – higher-level description of the implementation.
• Record-based: uses fixed-format records of several types. This simplifies implementation, in contrast to the trend towards richness and variety in structures used to implement OODBs
Varieties of record-based logical model
• hierarchical model
• records & links organised as a family of trees
• network model
• records & links organised as a family of graphs
• relational model
• uses tables to record relationships between data
Physical Data Models
• not our primary concern in this module.
• Relevant issues:
  – are data tables stored using hashing?
  – how are data tables indexed?
  – how are entries in data tables encoded and ordered?
  – what algorithms are used to retrieve and update?
Classical database features
• Instances and Schemes
• State of DB changes over time: structure vs. current state.
  – overall design of DB = database schema
  – current content of DB = instance of the DB
• Useful analogy with procedural variables:
  – database schema ~ type definition for variable
  – instance of database ~ value of the variable
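The analogy with procedural variables can be made concrete in Python. This is a sketch only: the `AccountRow` type and the sample rows are invented for illustration. The class plays the role of the schema (fixed structure), while the list bound to `accounts` plays the role of the instance (content that changes over time).

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class AccountRow:
    """Schema: the type definition, which stays fixed."""
    number: int
    owner: str
    balance: int


def schema_of(row_type):
    """Names of the attributes declared by the 'schema'."""
    return [field.name for field in fields(row_type)]


# Instance: the value of the variable at one moment in time.
accounts = [AccountRow(1, "Ann", 100)]
instance_before = list(accounts)

# The state changes; the schema does not.
accounts.append(AccountRow(2, "Bob", 250))
instance_after = list(accounts)
```

After the update, `instance_before` and `instance_after` differ, while `schema_of(AccountRow)` is unchanged.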
Data abstraction and schemas
• physical scheme at lowest level
• conceptual scheme at intermediate level
• several sub-schemes (possibly user-defined) at highest level (views of the DB)
Data Definition Language (DDL)
• for database schema
• compiling DDL: produces the Data Dictionary
• the storage & access methods: specified in storage & definition language
• Implementation details usually hidden from users
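One way to see a data dictionary in practice is SQLite's built-in `sqlite_master` table, which records every object created by DDL statements. The sketch below uses Python's standard `sqlite3` module; the `account` table and the `rich` view are hypothetical examples, and other DBMSs expose their dictionaries differently.

```python
import sqlite3


def data_dictionary(conn):
    """List (type, name) for every schema object recorded by SQLite."""
    rows = conn.execute("SELECT type, name FROM sqlite_master ORDER BY name")
    return rows.fetchall()


conn = sqlite3.connect(":memory:")

# Two DDL statements: each one adds an entry to the data dictionary.
conn.execute("CREATE TABLE account (number INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("CREATE VIEW rich AS SELECT number FROM account WHERE balance > 1000")

dictionary = data_dictionary(conn)
```

Here `dictionary` holds meta-data (one table, one view) and no account data at all.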
Data Manipulation Language (DML)
• data manipulation: accessing DB to retrieve, insert, delete, or modify data
• data retrieval
  – most common
  – "querying the DB"
• retrieval component of DML = query language (loosely, ‘query language’ is often used as a synonym for DML)
Varieties of Data Manipulation Language
• There is a tension between
  – efficiency at physical level
  – intelligibility / ease of use at higher level
• Have both procedural and non-procedural DMLs
  – procedural: requires knowledge of data implementation
  – non-procedural: need only specify what data is needed
Data Manipulation Languages (DML) for typical data models
• Procedural:
  – object-based, hierarchical, network models
  – user explicit responsibility: optimising queries
    • needs knowledge of data organisation
• Non-procedural:
  – relational models
  – formulate queries without above
    • implementation has to be optimised
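The contrast between the two styles can be sketched in Python. The record layout, the `account` table and both function names are assumptions for illustration: the first function walks the records explicitly, so the caller decides how the data is traversed; the second states only what is wanted in SQL and leaves the access path to the system (here SQLite, via the standard `sqlite3` module).

```python
import sqlite3

# Hypothetical records: (owner, account number, balance).
ROWS = [("Ann", 1, 100), ("Ann", 2, 900), ("Bob", 3, 250)]


def procedural_total(records, owner):
    """Procedural style: the caller spells out how to walk the records."""
    total = 0
    for record_owner, _number, balance in records:
        if record_owner == owner:
            total += balance
    return total


def load(rows):
    """Load the same records into a relational table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (owner TEXT, number INTEGER, balance INTEGER)")
    conn.executemany("INSERT INTO account VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def declarative_total(conn, owner):
    """Non-procedural style: say what is needed, not how to fetch it."""
    row = conn.execute(
        "SELECT COALESCE(SUM(balance), 0) FROM account WHERE owner = ?",
        (owner,),
    ).fetchone()
    return row[0]
```

Both functions give 1000 for Ann. If an index were later added on `owner`, only the procedural version would need rewriting to benefit from it.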
Optimisation
• Database Manager (program module!)
  – interfaces between low-level data in DB and application programs & user queries.
• Large volumes of data (available technology)
  – gigabytes: thousand megabytes = 1 billion bytes [!]
  – terabytes: million megabytes = 1 trillion bytes
• Requires auxiliary storage media, such as disk, CD etc.
• Optimisation is primarily concerned with eliminating data transfers between main and auxiliary memory.
Functions of the DB manager program module
• query processing
  – interacting with the file manager modules doing actual operations on physical data
• integrity enforcement
  – checking that data in the DB conforms to specified constraints
• security enforcement
  – ensuring that authorisation is given for access to data
• backup and recovery
  – coping with failure, and recovery to consistent DB state
• concurrency control
  – ensuring that simultaneous transactions do not interfere.
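Recovery to a consistent state can be hinted at with a transaction that fails half-way. The sketch below uses Python's standard `sqlite3` module; the `account` table, the `transfer` helper and the `fail_midway` flag are hypothetical, and a raised exception only simulates a failure (it is not a real crash or a restore from backup).

```python
import sqlite3


def open_bank():
    """Two accounts whose combined balance should stay constant (illustrative)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (number INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
    conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
    conn.commit()
    return conn


def transfer(conn, source, target, amount, fail_midway=False):
    """Move money as one transaction; undo everything if it fails part-way."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE number = ?",
                (amount, source),
            )
            if fail_midway:
                raise RuntimeError("simulated failure between the two updates")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE number = ?",
                (amount, target),
            )
        return True
    except RuntimeError:
        return False


def balances(conn):
    return conn.execute("SELECT number, balance FROM account ORDER BY number").fetchall()
```

After a failed transfer the balances are exactly as before the attempt; the debit that had already been applied is rolled back.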
Role of Database Administrator
• scheme definition using DDL
• specification of storage structure & access method
• modification of schemes and physical storage as required
• granting authorisation for data access
• specifying integrity constraints
Overall DB system structure
• Processing components
  – file manager
  – database manager
  – query processor
  – DML precompiler (to process DML embedded in APs)
  – DDL compiler
• Data structures
  – data files: actual content of db
  – data dictionary: meta-data
  – indices: auxiliary files to assist fast access
Summary
• We have revisited the generalities of database systems
  – Background of appearance
  – Main problems they try to solve
  – Components & languages
… to follow
How to reason with Integrity Enforcements (Constraints): Functional Dependencies (FDs) applied