computing & information sciences kansas state university cis 560: database system concepts...
TRANSCRIPT
Computing & Information SciencesKansas State University
CIS 560: Database System Concepts
Lecture 01 of 42
Wednesday, 27 August 2008
William H. Hsu
Department of Computing and Information Sciences, KSU
KSOL course page: http://snipurl.com/va60
Course web site: http://www.kddresearch.org/Courses/Fall-2008/CIS560
Instructor home page: http://www.cis.ksu.edu/~bhsu
Reading for Next Class:
Sections 2.1 – 2.3, Silberschatz et al., 5th edition
Syllabus and Introductory Handouts
Database Architecture:Client-Server Models, Relational DB Definitions
Wedneday, 27 Aug 2008
Computing & Information SciencesKansas State University
Monday, 25 Aug 2008CIS 560: Database System Concepts
Lecture Outline
Reading for Next Class: Sections 2.1 – 2.3, Silberschatz et al. 5e
Today and Friday: Basic Relational DB Principles Relations
Database definitionsRecords
Fields
Tables
Relations
Next Week: Relational Algebra and SQL Intro Relational operators: PROJECT, SELECT, JOIN
Variations on relational joins
Implementation
Examples and Exercises Look at Chapter 2 examples
Exercises: relational algebra formulas
Computing & Information SciencesKansas State University
CIS 560: Database System Concepts
Course Administration
Official Course Page (KSOL): http://snipurl.com/va60
Class Web Page: http://www.kddresearch.org/Courses/Fall-2008/CIS560
Instructional E-Mail Addresses [email protected] (always use this to reach instructor)
[email protected] (this goes to everyone)
Instructor: William Hsu, Nichols 213 Office phone: +1 785 532 7905; home phone: +1 785 539 7180
IM: AIM/YIM/MSN hsuwh & rizanabsith, ICQ 28651394 & 191317559
Office hours: after class Mon/Wed/Fri; Tue; other times by appointment
Graduate Teaching Assistant: TBD Office location: Nichols 124
Office hours: to be announced on class web board
Grading Policy Hour exams: 15% each (in-class, closed-book); final (open-book): 30%
Machine problems, problem sets (8 of 10): 16%; term project: 17%
Class participation: 7% (3% attendance, 2% questions, 2% answers)
Computing & Information SciencesKansas State University
CIS 560: Database System Concepts
Review: Relational Databases,Architectures, Platforms
This Course: Relational DBs, Queries, Indexing, Concurrency Tools: (My)SQL & ORACLE, PHP & JSP Database Management Systems (DBMS)
Storage and retrieval of information Accessed using queries
Data Manipulation Languages (DMLs) Data Description Languages (DDLs) Client-Server Architecture Relational Databases
Based on relations (tuples of entities) Entities: objects organized into sets Relationships
Links between entitiesCorrespond to mathematical relations
Computing & Information SciencesKansas State University
CIS 560: Database System Concepts
Review:Overall System Structure
Review:Overall System Structure
Computing & Information SciencesKansas State University
CIS 560: Database System Concepts
Review:Relations Defined
Review:Relations Defined
• A relation on sets S1, S2, …, Sn is a subset of S1 S2 … Sn
• It consists of those tuples (s1, s2, …, sn) that are related
• Finite example: set of pairs (a, b) in {1, 2, 3) such that a > b• Infinite example: >, set of pairs (a, b) of nonnegative integers
aka natural numbers such that a > b
1
2
3
A
1
2
3
B
Computing & Information SciencesKansas State University
CIS 560: Database System Concepts
Review:Functions Defined
Review:Functions Defined
R Zf
4.3 4
DomainCo-domain
(range)
Pre-image of 4 Image of 4.3
f maps R to Z
f(4.3)
Adapted from slides © 2005 A. Bloomfield, University of VirginiaCS202 Discrete Mathematicshttp://www.cs.virginia.edu/~asb/teaching/cs202-spring07/
A function is a relation that takes an element from a set and maps it to a unique element in another set
Computing & Information SciencesKansas State University
CIS 560: Database System Concepts
Review:Function Example
Review:Function Example
1
2
3
4
5
“a”
“bb“
“cccc”
“dd”
“e”
A string length function
A
B
C
D
F
Alice
Bob
Chris
Dave
Emma
A class grade function
Domain Co-domainA pre-image
of 1
The imageof A
Adapted from slides © 2005 A. Bloomfield, University of VirginiaCS202 Discrete Mathematicshttp://www.cs.virginia.edu/~asb/teaching/cs202-spring07/
Computing & Information SciencesKansas State University
Monday, 25 Aug 2008CIS 560: Database System Concepts
Review:Functions and Non-Function Relations
Review:Functions and Non-Function Relations
1
2
3
4
5
“a”
“bb“
“cccc”
“dd”
“e”
Not a valid function!Also not a valid function!
1
2
3
4
5
a
e
i
o
u
Some function…
Range
Adapted from slides © 2005 A. Bloomfield, University of VirginiaCS202 Discrete Mathematicshttp://www.cs.virginia.edu/~asb/teaching/cs202-spring07/
Computing & Information SciencesKansas State University
Monday, 25 Aug 2008CIS 560: Database System Concepts
Review:One-to-One (Injective) Functions
Review:One-to-One (Injective) Functions
1
2
3
4
5
a
e
i
o
A one-to-one function
1
2
3
4
5
a
e
i
o
A function that is not one-to-one
• A function is one-to-one if each element in the co-domain (range) has a unique pre-image
• aka into function
• Every element y that is mapped into has an inverse f-1(y)… but this is not necessarily every element in the co-domain
Adapted from slides © 2005 A. Bloomfield, University of VirginiaCS202 Discrete Mathematicshttp://www.cs.virginia.edu/~asb/teaching/cs202-spring07/
Computing & Information SciencesKansas State University
11
More on one-to-oneMore on one-to-one
Injective is synonymous with one-to-one “A function is injective”
A function is an injection if it is one-to-one
Note that there can be un-used elements in the co-domain
1
2
3
4
5
a
e
i
o
A one-to-one functionAdapted from slides © 2005 A. Bloomfield, University of VirginiaCS202 Discrete Mathematicshttp://www.cs.virginia.edu/~asb/teaching/cs202-spring07/
Computing & Information SciencesKansas State University
Monday, 25 Aug 2008CIS 560: Database System Concepts
Review:Onto (Surjective) Functions
Review:Onto (Surjective) Functions
1
2
3
4
5
a
e
i
o
A function that is not
onto
1
2
3
4
a
e
i
o
u
An onto function
• A function is onto if each element in the co-domain (range) is an image of some pre-image
• Every element y with a unique pre-image has an inverse f-1(y)… but this is not necessarily every element in the co-domain
Adapted from slides © 2005 A. Bloomfield, University of VirginiaCS202 Discrete Mathematicshttp://www.cs.virginia.edu/~asb/teaching/cs202-spring07/
Computing & Information SciencesKansas State University
13
1
2
3
4
a
e
i
o
u
An onto function
More on ontoMore on onto
Surjective is synonymous with onto “A function is surjective”
A function is an surjection if it is onto
Note that there can be multiply used elements in the co-domain
Adapted from slides © 2005 A. Bloomfield, University of VirginiaCS202 Discrete Mathematicshttp://www.cs.virginia.edu/~asb/teaching/cs202-spring07/
Computing & Information SciencesKansas State University
14
Onto vs. one-to-oneOnto vs. one-to-one Are the following functions onto, one-to-one, both, or neither?
1
2
3
4
a
b
c
1
2
3
a
b
c
d
1
2
3
4
a
b
c
d
1
2
3
4
a
b
c
d
1
2
3
4
a
b
c
1-to-1, not onto
Onto, not 1-to-1
Both 1-to-1 and onto Not a valid function
Neither 1-to-1 nor onto
Adapted from slides © 2005 A. Bloomfield, University of VirginiaCS202 Discrete Mathematicshttp://www.cs.virginia.edu/~asb/teaching/cs202-spring07/
Computing & Information SciencesKansas State University
Monday, 25 Aug 2008CIS 560: Database System Concepts
Review:BijectionsReview:
Bijections
Consider a function that is both one-to-one and onto
Such a function is a one-to-one correspondence, or a bijection
aka permutation aka invertible function
1
2
3
4
a
b
c
d
Adapted from slides © 2005 A. Bloomfield, University of VirginiaCS202 Discrete Mathematicshttp://www.cs.virginia.edu/~asb/teaching/cs202-spring07/
Computing & Information SciencesKansas State University
16
Inverse functionsInverse functions
R Rf
4.3 8.6
Let f(x) = 2*x
f-1
f(4.3)
f-1(8.6)
Then f-1(x) = x/2
Adapted from slides © 2005 A. Bloomfield, University of VirginiaCS202 Discrete Mathematicshttp://www.cs.virginia.edu/~asb/teaching/cs202-spring07/
Computing & Information SciencesKansas State University
17
More on inverse functionsMore on inverse functions
Can we define the inverse of the following functions?
An inverse function can ONLY be defined on a bijection
1
2
3
4
a
b
c
1
2
3
a
b
c
d
What is f-1(2)?Not onto!
What is f-1(2)?Not 1-to-1!
Adapted from slides © 2005 A. Bloomfield, University of VirginiaCS202 Discrete Mathematicshttp://www.cs.virginia.edu/~asb/teaching/cs202-spring07/
Computing & Information SciencesKansas State University
Wednesday, 24 Jan 2008CIS 560: Database System Concepts
Query ProcessingQuery Processing
1. Parsing and translation
2. Optimization
3. Evaluation
Computing & Information SciencesKansas State University
Wednesday, 24 Jan 2008CIS 560: Database System Concepts
Query Processing (Cont.)Query Processing (Cont.)
Alternative ways of evaluating a given query Equivalent expressions Different algorithms for each operation
Cost difference between a good and a bad way of evaluating a query can be enormous
Need to estimate the cost of operations Depends critically on statistical information about relations which the
database must maintain Need to estimate statistics for intermediate results to compute cost of
complex expressions
Computing & Information SciencesKansas State University
Wednesday, 24 Jan 2008CIS 560: Database System Concepts
Transaction ManagementTransaction Management
A transaction is a collection of operations that performs a single logical function in a database application
Transaction-management component ensures that the database remains in a consistent (correct) state despite system failures (e.g., power failures and operating system crashes) and transaction failures.
Concurrency-control manager controls the interaction among the concurrent transactions, to ensure the consistency of the database.
Computing & Information SciencesKansas State University
Wednesday, 24 Jan 2008CIS 560: Database System Concepts
Database ArchitectureDatabase Architecture
The architecture of a database systems is greatly influenced by
the underlying computer system on which the database is running: Centralized Client-server Parallel (multi-processor) Distributed
Computing & Information SciencesKansas State University
Wednesday, 24 Jan 2008CIS 560: Database System Concepts
History of Database SystemsHistory of Database Systems
1950s and early 1960s: Data processing using magnetic tapes for storage
Tapes provide only sequential access
Punched cards for input
Late 1960s and 1970s: Hard disks allow direct access to data Network and hierarchical data models in widespread use Ted Codd defines the relational data model
Would win the ACM Turing Award for this work IBM Research begins System R prototypeUC Berkeley begins Ingres prototype
High-performance (for the era) transaction processing
Computing & Information SciencesKansas State University
Wednesday, 24 Jan 2008CIS 560: Database System Concepts
History (cont.)History (cont.)
1980s: Research relational prototypes evolve into commercial systems
SQL becomes industrial standard
Parallel and distributed database systems Object-oriented database systems
1990s: Large decision support and data-mining applications Large multi-terabyte data warehouses Emergence of Web commerce
2000s: XML and XQuery standards Automated database administration
Computing & Information SciencesKansas State University
Wednesday, 24 Jan 2008CIS 560: Database System Concepts
Chapter 2: Relational ModelChapter 2: Relational Model
Structure of Relational Databases Fundamental Relational-Algebra-Operations Additional Relational-Algebra-Operations Extended Relational-Algebra-Operations Null Values Modification of the Database
Computing & Information SciencesKansas State University
Wednesday, 24 Jan 2008CIS 560: Database System Concepts
Example of a RelationExample of a Relation
Computing & Information SciencesKansas State University
CIS 560: Database System Concepts
Basic StructureBasic Structure Formally, given sets D1, D2, …. Dn a relation r is a subset of
D1 x D2 x … x Dn
Thus, a relation is a set of n-tuples (a1, a2, …, an) where each ai Di
Example: If
customer_name = {Jones, Smith, Curry, Lindsay}
customer_street = {Main, North, Park}
customer_city = {Harrison, Rye, Pittsfield}
Then r = { (Jones, Main, Harrison),
(Smith, North, Rye),
(Curry, North, Rye),
(Lindsay, Park, Pittsfield) }
is a relation over customer_name x customer_street x customer_city
Computing & Information SciencesKansas State University
CIS 560: Database System Concepts
Attribute TypesAttribute Types
Each attribute of a relation has a name The set of allowed values for each attribute is called the domain
of the attribute Attribute values are (normally) required to be atomic; that is,
indivisible Note: multivalued attribute values are not atomic Note: composite attribute values are not atomic
The special value null is a member of every domain The null value causes complications in the definition of many
operations We shall ignore the effect of null values in our main presentation
and consider their effect later
Computing & Information SciencesKansas State University
Wednesday, 24 Jan 2008CIS 560: Database System Concepts
Relation InstanceRelation Instance
The current values (relation instance) of a relation are specified by a table
An element t of r is a tuple, represented by a row in a table
JonesSmithCurryLindsay
customer_name
MainNorthNorthPark
customer_street
HarrisonRyeRyePittsfield
customer_city
customer
attributes(or columns)
tuples(or rows)
Computing & Information SciencesKansas State University
CIS 560: Database System Concepts
Relations are UnorderedRelations are Unordered
Order of tuples is irrelevant (tuples may be stored in an arbitrary order)
Example: account relation with unordered tuples
Computing & Information SciencesKansas State University
CIS 560: Database System Concepts
DatabaseDatabase
A database consists of multiple relations
Information about an enterprise is broken up into parts, with each relation storing one part of the information
account : stores information about accounts depositor : stores information about which customer owns which account customer : stores information about customers
Storing all information as a single relation such as bank(account_number, balance, customer_name, ..)results in
repetition of information (e.g., two customers own an account)
the need for null values (e.g., represent a customer without an account)
Normalization theory (Chapter 7) deals with how to design relational schemas
Computing & Information SciencesKansas State University
CIS 560: Database System Concepts
SummarySummary
RDB Overview Mathematical Relations
Based on sets (of entities) Relation R: subset of Cartesian product of sets Each member of R: one tuple R specifies “which combinations are related”
Relational Algebra Based on mathematical relations
Computing & Information SciencesKansas State University
CIS 560: Database System Concepts
TerminologyTerminology
Database Management Systems (DBMS) Data Manipulation Languages (DMLs) Data Description Languages (DDLs) Client-Server Architecture Relational Databases
Entity Relationship
Relations Subsets of Cartesian product of two or more sets Functions
Functions One-to-one (into function, injection) Onto (surjection) One-to-one & onto (bijection, permutation, invertible function)