Source: m.mr-pc.org/t/cisc7610/2018fa/lecture02.pdf
CISC 7610 Lecture 2: Review of relational databases
Topics:
● Relational database management systems
● Example data modeling problem
● Schema normalization
● SQL queries
Relational database management systems
A relational database management system (RDBMS)
● Uses relational data structures
● Has a declarative data manipulation language at least as powerful as the relational algebra
● Not required, but typically also
– Supports ACID transactions
– Uses SQL as the data manipulation language
Uses relational data structures
● Relation: table with rows and columns
● Attribute: column
● Tuple: row
● Key: combination of attributes that uniquely identifies each row
● Integrity rules: Constraints imposed upon the database
Uses relational data structures
Artist      | Album                | Track Num | Released | Track            | Dur
David Bowie | Space Oddity         | 1         | 1969     | Space Oddity     | 5:15
David Bowie | … Ziggy Stardust ... | 10        | 1972     | Suffragette city | 3:25
David Bowie | Best of Bowie        | 1         | 2002     | Space Oddity     | 5:15
David Bowie | Best of Bowie        | 8         | 2002     | Suffragette city | 3:25
Queen       | Hot space            | 11        | 1982     | Under pressure   | 4:02
[Figure: the table above, annotated to show that the whole table is a Relation, each row a Tuple, each column an Attribute, and the identifying columns a Key]
Has a declarative data manipulation language
● Declarative: says what, not how to manipulate data
– Other examples of declarative programming languages?
● Relational algebra
– Selection: extract a subset of tuples
– Projection: extract a subset of attributes
– Cartesian product: extract all combinations of pairs of tuples from two relations
– Union: combine two sets of tuples
– Set difference: remove one set of tuples from another
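The five relational algebra operations listed above can be sketched in a few lines of Python by modeling a relation as a set of tuples. This is only an illustration under stated assumptions: the helper names (select, project, product), the attribute order (artist, album, released), and the second relation "formats" are hypothetical and not part of the lecture.

```python
# A relation is modeled as a set of tuples; attribute order is an assumption
# of this sketch: (artist, album, released).
albums = {
    ("David Bowie", "Space Oddity", 1969),
    ("David Bowie", "Best of Bowie", 2002),
    ("Queen", "Hot Space", 1982),
}
formats = {("LP",), ("CD",)}  # hypothetical second relation, for the product

def select(relation, predicate):
    """Selection: keep the tuples that satisfy the predicate."""
    return {t for t in relation if predicate(t)}

def project(relation, indices):
    """Projection: keep only the attributes at the given positions."""
    return {tuple(t[i] for i in indices) for t in relation}

def product(r1, r2):
    """Cartesian product: pair every tuple of r1 with every tuple of r2."""
    return {a + b for a in r1 for b in r2}

bowie = select(albums, lambda t: t[0] == "David Bowie")
recent = select(albums, lambda t: t[2] >= 1980)
artists = project(albums, [0])   # duplicates collapse because relations are sets
pairs = product(albums, formats)
either = bowie | recent          # union
old_bowie = bowie - recent       # set difference
```

Because relations are sets, projecting onto the artist attribute yields two tuples rather than three. SQL departs from this by keeping duplicates unless DISTINCT is requested.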
Supports ACID transactions
● Transaction: A sequence of DB operations that represents a single real-world operation
● ACID properties, guaranteed by RDBMSs
– Atomicity: all operations happen or none
– Consistency: transaction moves DB from one state that meets integrity constraints to another
– Isolation: concurrent transactions have the same effect as if they ran serially
– Durability: once committed, transaction’s effects are permanent
● Example: bank account transfer
● NoSQL databases relax these properties in various combinations
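The bank account transfer example can be sketched with Python's built-in sqlite3 module. The table name accounts, its columns, the starting balances, and the transfer helper are all assumptions made for illustration. The sketch mainly shows atomicity: the credit is applied first, and when the debit violates the CHECK constraint, the whole transaction is rolled back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: the CHECK constraint is the integrity rule to preserve.
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, "
             "balance INTEGER NOT NULL CHECK (balance >= 0))")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move amount from src to dst; both updates commit or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                         (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?",
                         (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False  # the credit to dst above was undone by the rollback

ok = transfer(conn, 1, 2, 30)       # succeeds
failed = transfer(conn, 1, 2, 500)  # would overdraw account 1, so it is rolled back
balances = dict(conn.execute("SELECT id, balance FROM accounts"))
```

After both calls, only the first transfer is reflected in the balances; the failed one leaves no partial credit behind.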
Structured query language (SQL)
● Data definition language
– Define relational schemata (plural of “schema”)
– Create/alter/delete tables and their attributes
● Data manipulation language
– Insert/delete/modify tuples in relations
– Query one or more tables
● Can implement relational algebra, but also takes some liberties with it
Example data modeling problem
Example data: Music collection
● Artists: Name
● Albums: Name, Release date
● Tracks: Name, Duration, Number
● Each album has one artist
● Tracks can appear on multiple albums (compilations)
Entity-relationship diagrams
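[Figure: ER diagram notation. An Entity with an Attribute is linked through a Relationship to Entity2, with Cardinality and Cardinality2 marked at the two ends of the relationship]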
Do: Draw an ER diagram for the example data
● Artists: Name
● Albums: Name, Release date
● Tracks: Name, Duration, Number
● Each album has one artist
● Tracks can appear on multiple albums (compilations)
Entity-relationship diagrams
[Figure: ER diagram for the music collection]
● Artist (Name) Created Album (Name, Release date): one artist, many albums
● Album Contains Track (Name, Duration, Track number): many albums, many tracks
Translating ER diagrams to schema
● Entities become tables
● Attributes become their attributes
● Many-to-many relationships become join tables
– Can have additional attributes
● Other relationships become foreign keys
– One-to-one, many-to-one, one-to-many
– Attributes added to table
Do: Translate ER diagram to schema for example data
Translating ER diagrams to schema
Artists: Id, Name
Albums: Id, Name, Release, ArtistId
AlbumsHaveTracks: AlbumId, TrackId, Number
Track: Id, Name, Duration
SQL CREATE statement
CREATE TABLE table_name
(
column_name1 data_type(size),
column_name2 data_type(size),
column_name3 data_type(size),
....
);
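As a concrete instance of the CREATE template, the music collection schema (Artists, Albums, Track, AlbumsHaveTracks) might be declared as follows, run here through Python's sqlite3 module. The column types, the NOT NULL and REFERENCES constraints, and the composite primary key on the join table are assumptions of this sketch; the lecture only gives the table and column names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Artists (
    Id   INTEGER PRIMARY KEY,
    Name TEXT NOT NULL
);
CREATE TABLE Albums (
    Id        INTEGER PRIMARY KEY,
    Name      TEXT NOT NULL,
    "Release" INTEGER,  -- quoted because RELEASE is also an SQL keyword
    ArtistId  INTEGER NOT NULL REFERENCES Artists(Id)  -- many-to-one foreign key
);
CREATE TABLE Track (
    Id       INTEGER PRIMARY KEY,
    Name     TEXT NOT NULL,
    Duration TEXT
);
-- Join table for the many-to-many relationship, with Number as its own attribute
CREATE TABLE AlbumsHaveTracks (
    AlbumId INTEGER NOT NULL REFERENCES Albums(Id),
    TrackId INTEGER NOT NULL REFERENCES Track(Id),
    Number  INTEGER,
    PRIMARY KEY (AlbumId, TrackId)
);
""")

# List the tables that now exist in the database
tables = sorted(row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"))
```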
SQL INSERT statement
INSERT INTO table_name
(column1,column2,column3,...)
VALUES
(value1,value2,value3,...);
Do: Populate tables with example data
Artists
Id | Name
1  | David Bowie
2  | Queen
Albums
Id | Name                 | Release | ArtistId
1  | Space oddity         | 1969    | 1
2  | … Ziggy Stardust ... | 1972    | 1
3  | Best of Bowie        | 2002    | 1
4  | Hot space            | 1982    | 2
AlbumsHaveTracks
AlbumId | TrackId | Number
1       | 1       | 1
2       | 2       | 10
3       | 1       | 1
3       | 2       | 8
4       | 3       | 11
Track
Id | Name             | Duration
1  | Space oddity     | 5:15
2  | Suffragette city | 3:25
3  | Under pressure   | 4:02
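One way to populate these four tables is with parameterized INSERT statements, sketched here with Python's sqlite3 module. The compact CREATE statements and the column types are assumptions; the rows are the ones from the example data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Compact, assumed version of the schema; "Release" is quoted because
# RELEASE is also an SQL keyword.
conn.executescript("""
CREATE TABLE Artists (Id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Albums (Id INTEGER PRIMARY KEY, Name TEXT, "Release" INTEGER, ArtistId INTEGER);
CREATE TABLE Track (Id INTEGER PRIMARY KEY, Name TEXT, Duration TEXT);
CREATE TABLE AlbumsHaveTracks (AlbumId INTEGER, TrackId INTEGER, Number INTEGER);
""")

# executemany runs the same INSERT once per tuple of values
conn.executemany("INSERT INTO Artists (Id, Name) VALUES (?, ?)",
                 [(1, "David Bowie"), (2, "Queen")])
conn.executemany(
    'INSERT INTO Albums (Id, Name, "Release", ArtistId) VALUES (?, ?, ?, ?)',
    [(1, "Space oddity", 1969, 1), (2, "… Ziggy Stardust ...", 1972, 1),
     (3, "Best of Bowie", 2002, 1), (4, "Hot space", 1982, 2)])
conn.executemany("INSERT INTO Track (Id, Name, Duration) VALUES (?, ?, ?)",
                 [(1, "Space oddity", "5:15"), (2, "Suffragette city", "3:25"),
                  (3, "Under pressure", "4:02")])
conn.executemany(
    "INSERT INTO AlbumsHaveTracks (AlbumId, TrackId, Number) VALUES (?, ?, ?)",
    [(1, 1, 1), (2, 2, 10), (3, 1, 1), (3, 2, 8), (4, 3, 11)])
conn.commit()

# Row counts per table, as a quick check that everything was inserted
counts = {table: conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
          for table in ("Artists", "Albums", "Track", "AlbumsHaveTracks")}
```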
Schema normalization
Schema normalization: Unnormalized data
Artist      | Album                | Released | Track Num | Track            | Dur
David Bowie | Space Oddity         | 1969     | 1         | Space Oddity     | 5:15
David Bowie | … Ziggy Stardust ... | 1972     | 10        | Suffragette city | 3:25
David Bowie | Best of Bowie        | 2002     | 1         | Space Oddity     | 5:15
David Bowie | Best of Bowie        | 2002     | 8         | Suffragette city | 3:25
Queen       | Hot space            | 1982     | 11        | Under pressure   | 4:02
Schema normalization: Anomalies in unnormalized data
● The above example unnormalized schema can suffer from three types of “anomalies”
– Update anomaly: repeated data could be inconsistent between rows
– Insertion anomaly: can’t add info on artist or album that doesn’t have a track
– Deletion anomaly: deleting the last track deletes an album or artist
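The update and deletion anomalies can be reproduced on the unnormalized table with a short sqlite3 sketch. The table name Collection, the short column names, and the incorrect year 2003 are hypothetical choices made only to trigger the anomalies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Collection (Artist TEXT, Album TEXT, Released INTEGER, "
             "Num INTEGER, Track TEXT, Dur TEXT)")
conn.executemany("INSERT INTO Collection VALUES (?, ?, ?, ?, ?, ?)", [
    ("David Bowie", "Space Oddity", 1969, 1, "Space Oddity", "5:15"),
    ("David Bowie", "… Ziggy Stardust ...", 1972, 10, "Suffragette city", "3:25"),
    ("David Bowie", "Best of Bowie", 2002, 1, "Space Oddity", "5:15"),
    ("David Bowie", "Best of Bowie", 2002, 8, "Suffragette city", "3:25"),
    ("Queen", "Hot space", 1982, 11, "Under pressure", "4:02"),
])

# Update anomaly: the release year is stored once per track, so changing it
# in only one row leaves the album with two different years.
conn.execute("UPDATE Collection SET Released = 2003 "
             "WHERE Album = 'Best of Bowie' AND Num = 1")
years = sorted(row[0] for row in conn.execute(
    "SELECT DISTINCT Released FROM Collection WHERE Album = 'Best of Bowie'"))

# Deletion anomaly: removing Queen's only track also removes every trace
# of the artist and the album.
conn.execute("DELETE FROM Collection WHERE Track = 'Under pressure'")
queen_rows = conn.execute(
    "SELECT COUNT(*) FROM Collection WHERE Artist = 'Queen'").fetchone()[0]
```

The insertion anomaly is the mirror image of the last step: with this schema there is no row in which to record an artist or album until at least one of its tracks is known.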
Schema normalization: Normal forms
● Schema normalization factors logically independent data into independent relations
● And links them using foreign key relationships
● Projection is the process of factoring an unnormalized relation into separate normalized relations
● Boyce-Codd normal form: the only non-trivial functional dependencies are from superkeys (sets of attributes that uniquely identify entities) to other attributes
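To make the Boyce-Codd condition concrete, the sketch below checks a functional dependency and a superkey candidate against the rows of the unnormalized example. The helper names holds and is_superkey and the short column names are assumptions. Checking a single data instance can only show that a dependency is not contradicted by the current rows; whether it truly holds is a property of the real-world domain.

```python
columns = ("Artist", "Album", "Released", "Num", "Track", "Dur")
rows = [dict(zip(columns, values)) for values in [
    ("David Bowie", "Space Oddity", 1969, 1, "Space Oddity", "5:15"),
    ("David Bowie", "… Ziggy Stardust ...", 1972, 10, "Suffragette city", "3:25"),
    ("David Bowie", "Best of Bowie", 2002, 1, "Space Oddity", "5:15"),
    ("David Bowie", "Best of Bowie", 2002, 8, "Suffragette city", "3:25"),
    ("Queen", "Hot space", 1982, 11, "Under pressure", "4:02"),
]]

def holds(rows, lhs, rhs):
    """True if no two rows agree on lhs but disagree on rhs (lhs -> rhs)."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True

def is_superkey(rows, attrs):
    """True if the values of attrs are unique across all rows."""
    keys = [tuple(row[a] for a in attrs) for row in rows]
    return len(set(keys)) == len(keys)

album_determines_year = holds(rows, ["Album"], ["Released"])
album_is_superkey = is_superkey(rows, ["Album"])
album_num_is_superkey = is_superkey(rows, ["Album", "Num"])
# Album -> Released is a non-trivial dependency from a non-superkey,
# so this relation violates Boyce-Codd normal form.
violates_bcnf = album_determines_year and not album_is_superkey
```

Factoring Album and Released into their own relation, as the normalized schema does, removes this violation.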
Schema normalization: Normalized data
Artists
Id | Name
1  | David Bowie
2  | Queen
Albums
Id | Name                 | Release | ArtistId
1  | Space oddity         | 1969    | 1
2  | … Ziggy Stardust ... | 1972    | 1
3  | Best of Bowie        | 2002    | 1
4  | Hot space            | 1982    | 2
AlbumsHaveTracks
AlbumId | TrackId | Number
1       | 1       | 1
2       | 2       | 10
3       | 1       | 1
3       | 2       | 8
4       | 3       | 11
Track
Id | Name             | Duration
1  | Space oddity     | 5:15
2  | Suffragette city | 3:25
3  | Under pressure   | 4:02
Reminder: Main question of the course
How can systems process and store multimedia data so that users can find what they are looking for in the future?
SQL queries
Queries: find what they are looking for
● Search through the data
● Search through complex relationships
● Aggregate over the data for reporting
● And do all of this efficiently...
SQL SELECT, single table
SELECT attribute1, attribute2
FROM relation
WHERE attribute1 = 'condition'
ORDER BY attribute2;
Do: Write a select query to answer
What is the duration of “Suffragette City”?
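One possible answer, run against the Track table through Python's sqlite3 module. The table is rebuilt here so the sketch is self-contained. COLLATE NOCASE is an added assumption: the question capitalizes “City” while the stored name does not, and string equality in SQLite is case-sensitive by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Track (Id INTEGER PRIMARY KEY, Name TEXT, Duration TEXT)")
conn.executemany("INSERT INTO Track VALUES (?, ?, ?)",
                 [(1, "Space oddity", "5:15"), (2, "Suffragette city", "3:25"),
                  (3, "Under pressure", "4:02")])

# Single-table SELECT: project Duration, select the row by track name
query = """
SELECT Duration
FROM Track
WHERE Name = ? COLLATE NOCASE
"""
durations = [row[0] for row in conn.execute(query, ("Suffragette City",))]
```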
SQL SELECT, multiple tables
SELECT r1.attribute1, r2.attribute1
FROM relation1 AS r1,
relation2 AS r2
WHERE r1.attribute1 = 'condition'
AND r1.attribute1 = r2.attribute2
ORDER BY r1.attribute1;
Do: Write a select query to answer
Find the AlbumIds of all of David Bowie's albums
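A possible answer joins Artists to Albums on the foreign key, following the multi-table template. The sketch rebuilds only the two tables it needs and leaves out the Release column, which this query does not use; the aliases ar and al are arbitrary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Artists (Id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Albums (Id INTEGER PRIMARY KEY, Name TEXT, ArtistId INTEGER);
INSERT INTO Artists VALUES (1, 'David Bowie'), (2, 'Queen');
INSERT INTO Albums VALUES (1, 'Space oddity', 1), (2, '… Ziggy Stardust ...', 1),
                          (3, 'Best of Bowie', 1), (4, 'Hot space', 2);
""")

# Filter on the artist's name, then match albums through Albums.ArtistId
query = """
SELECT al.Id
FROM Artists AS ar,
     Albums AS al
WHERE ar.Name = 'David Bowie'
  AND al.ArtistId = ar.Id
ORDER BY al.Id
"""
album_ids = [row[0] for row in conn.execute(query)]
```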
Do: Write a select query to answer
Find the TrackIds of all of David Bowie's tracks
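A possible answer extends the same join through the AlbumsHaveTracks join table. Only the columns needed for the query are recreated in this sketch. DISTINCT matters here because a track that appears on both an original album and a compilation would otherwise be listed twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Artists (Id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Albums (Id INTEGER PRIMARY KEY, ArtistId INTEGER);
CREATE TABLE AlbumsHaveTracks (AlbumId INTEGER, TrackId INTEGER, Number INTEGER);
INSERT INTO Artists VALUES (1, 'David Bowie'), (2, 'Queen');
INSERT INTO Albums VALUES (1, 1), (2, 1), (3, 1), (4, 2);
INSERT INTO AlbumsHaveTracks VALUES (1, 1, 1), (2, 2, 10), (3, 1, 1), (3, 2, 8), (4, 3, 11);
""")

# Three-way join: artist -> albums -> tracks on those albums
query = """
SELECT {distinct} aht.TrackId
FROM Artists AS ar,
     Albums AS al,
     AlbumsHaveTracks AS aht
WHERE ar.Name = 'David Bowie'
  AND al.ArtistId = ar.Id
  AND aht.AlbumId = al.Id
ORDER BY aht.TrackId
"""
track_ids = [row[0] for row in conn.execute(query.format(distinct="DISTINCT"))]
with_duplicates = [row[0] for row in conn.execute(query.format(distinct=""))]
```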
How would you write a select query to answer
● Find all songs containing David Bowie's vocals
● Find all songs at 120 beats per minute
● Find all songs sampled by other artists
– These all require further modeling or analysis of the audio...
How do we make databases that are
● Effective (correct, durable, coherent, ...)
– Transactions
● Efficient
– Concurrency
– Memory hierarchy
– Indexing
– Query optimization