CISC 7610 Lecture 2: Review of relational databases (m.mr-pc.org/t/cisc7610/2018fa/lecture02.pdf)


Upload: duongdung

Post on 07-Sep-2018



CISC 7610 Lecture 2

Review of relational databases

Topics:

● Relational database management systems

● Example data modeling problem

● Schema normalization

● SQL queries


Relational database management systems


A relational database management system (RDBMS)

● Uses relational data structures

● Has a declarative data manipulation language at least as powerful as the relational algebra

● Not required, but typically also

– Supports ACID transactions

– Uses SQL as the data manipulation language


Uses relational data structures

● Relation: table with rows and columns

● Attribute: column

● Tuple: row

● Key: combination of attributes that uniquely identifies each row

● Integrity rules: Constraints imposed upon the database


Uses relational data structures

Artist       Album                 Track Num  Released  Track             Dur
David Bowie  Space Oddity          1          1969      Space Oddity      5:15
David Bowie  … Ziggy Stardust ...  10         1972      Suffragette city  3:25
David Bowie  Best of Bowie         1          2002      Space Oddity      5:15
David Bowie  Best of Bowie         8          2002      Suffragette city  3:25
Queen        Hot space             11         1982      Under pressure    4:02


Uses relational data structures

[Figure: the music collection table from the previous slide, annotated to label the relation (the whole table), a tuple (one row), an attribute (one column), and a key.]


Has a declarative data manipulation language

● Declarative: says what result is wanted, not how to compute it

– Other examples of declarative programming languages?

● Relational algebra

– Selection: extract a subset of tuples

– Projection: extract a subset of attributes

– Cartesian product: extract all combinations of pairs of tuples from two relations

– Union: combine two sets of tuples

– Set difference: remove one set of tuples from another
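As a rough illustration of the five operators above, the sketch below expresses each one as a declarative SQL query, run through Python's standard sqlite3 module. The table name `Collection`, the column names, and the particular conditions are assumptions chosen for this example; the rows are the music collection table shown earlier.

```python
import sqlite3

# The flat example relation; the table name "Collection" is an assumption.
ROWS = [
    ("David Bowie", "Space Oddity", 1, 1969, "Space Oddity", "5:15"),
    ("David Bowie", "… Ziggy Stardust ...", 10, 1972, "Suffragette city", "3:25"),
    ("David Bowie", "Best of Bowie", 1, 2002, "Space Oddity", "5:15"),
    ("David Bowie", "Best of Bowie", 8, 2002, "Suffragette city", "3:25"),
    ("Queen", "Hot space", 11, 1982, "Under pressure", "4:02"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Collection "
    "(Artist TEXT, Album TEXT, Num INTEGER, Released INTEGER, Track TEXT, Dur TEXT)"
)
conn.executemany("INSERT INTO Collection VALUES (?, ?, ?, ?, ?, ?)", ROWS)


def rows(sql):
    """Run a query and return all result tuples."""
    return conn.execute(sql).fetchall()


# Selection: a subset of tuples (rows released before 1980).
selection = rows("SELECT * FROM Collection WHERE Released < 1980")

# Projection: a subset of attributes; DISTINCT restores set semantics.
projection = rows("SELECT DISTINCT Artist FROM Collection")

# Cartesian product: every distinct artist paired with every distinct album.
product = rows(
    "SELECT a.Artist, b.Album "
    "FROM (SELECT DISTINCT Artist FROM Collection) AS a "
    "CROSS JOIN (SELECT DISTINCT Album FROM Collection) AS b"
)

# Union: tracks released before 1980 combined with tracks by Queen.
union = rows(
    "SELECT Track FROM Collection WHERE Released < 1980 "
    "UNION SELECT Track FROM Collection WHERE Artist = 'Queen'"
)

# Set difference: all tracks minus those on 'Best of Bowie'.
difference = rows(
    "SELECT Track FROM Collection "
    "EXCEPT SELECT Track FROM Collection WHERE Album = 'Best of Bowie'"
)

print(len(selection), projection, len(product), union, difference)
```

Each query states which tuples are wanted and leaves the evaluation strategy to the database engine. The need for `DISTINCT` in the projection is one place where SQL departs from the set-based relational algebra.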


Supports ACID transactions

● Transaction: A sequence of DB operations that represents a single real-world operation

● ACID properties (guaranteed by RDBMSs)

– Atomicity: all operations happen or none do

– Consistency: transaction moves DB from one state that meets integrity constraints to another

– Isolation: concurrent transactions have the same effect as if they had run serially

– Durability: once committed, transaction’s effects are permanent

● Example: bank account transfer

● Relaxed by NoSQL databases in various combinations
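The bank account transfer can be sketched as follows, as a minimal illustration of atomicity and consistency using Python's standard sqlite3 module. The `Accounts` table, the balances, and the `transfer` helper are all hypothetical; the `CHECK` constraint stands in for the integrity rule that a balance may not go negative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: the CHECK constraint is the integrity rule to preserve.
conn.execute(
    "CREATE TABLE Accounts ("
    "Id INTEGER PRIMARY KEY, "
    "Balance INTEGER NOT NULL CHECK (Balance >= 0))"
)
conn.executemany("INSERT INTO Accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst; both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE Accounts SET Balance = Balance + ? WHERE Id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE Accounts SET Balance = Balance - ? WHERE Id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The debit violated the CHECK constraint, so the credit was undone too.
        return False


def balances(conn):
    """Return {account id: balance} for all accounts."""
    return dict(conn.execute("SELECT Id, Balance FROM Accounts ORDER BY Id"))


first_ok = transfer(conn, 1, 2, 30)    # succeeds
after_first = balances(conn)
second_ok = transfer(conn, 1, 2, 500)  # would overdraw account 1
after_second = balances(conn)

print(first_ok, after_first, second_ok, after_second)
```

In the failed transfer the credit to the destination account is executed first and then rolled back, so no money appears without a matching debit.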


Structured query language (SQL)

● Data definition language

– Define relational schemata (plural of “schema”)

– Create/alter/delete tables and their attributes

● Data manipulation language

– Insert/delete/modify tuples in relations

– Query one or more tables

● Can implement relational algebra, but also takes some liberties with it


Example data modeling problem


Example data: Music collection

● Artists: Name

● Albums: Name, Release date

● Tracks: Name, Duration, Number

● Each album has one artist

● Tracks can appear on multiple albums (compilations)


Entity-relationship diagrams

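[Diagram: notation legend showing an entity with an attribute, connected through a relationship to a second entity (Entity2), with a cardinality label at each end (Cardinality, Cardinality2).]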


Do: Draw an ER diagram for the example data

● Artists: Name

● Albums: Name, Release date

● Tracks: Name, Duration, Number

● Each album has one artist

● Tracks can appear on multiple albums (compilations)


Entity-relationship diagrams

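[Diagram: Artist (Name) is linked by a “Created” relationship to Album (Name, Release date), with cardinality one artist to many albums. Album is linked by a “Contains” relationship to Track (Name, Duration, Track number), with cardinality many to many.]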


Translating ER diagrams to schema

● Entities become tables

● Attributes become their attributes

● Many-to-many relationships become join tables

– Can have additional attributes

● Other relationships become foreign keys

– One-to-one, many-to-one, one-to-many

– Implemented as attributes added to one of the entity tables


Do: Translate ER diagram to schema for example data


Translating ER diagrams to schema

Artists: Id, Name

Albums: Id, Name, Release, ArtistId

AlbumsHaveTracks: AlbumId, TrackId, Number

Track: Id, Name, Duration


SQL CREATE statement

CREATE TABLE table_name
(
    column_name1 data_type(size),
    column_name2 data_type(size),
    column_name3 data_type(size),
    ....
);
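As one possible instance of this template, the sketch below creates the four tables of the music collection schema in an in-memory SQLite database through Python's sqlite3 module. The column types, sizes, `NOT NULL` choices, and key declarations are assumptions for illustration and are not part of the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Types and sizes are assumptions. "Release" is quoted as a precaution
# because RELEASE is also an SQL keyword in SQLite.
conn.executescript("""
CREATE TABLE Artists (
    Id   INTEGER PRIMARY KEY,
    Name VARCHAR(100) NOT NULL
);
CREATE TABLE Albums (
    Id        INTEGER PRIMARY KEY,
    Name      VARCHAR(100) NOT NULL,
    "Release" INTEGER,
    ArtistId  INTEGER NOT NULL REFERENCES Artists(Id)
);
CREATE TABLE Track (
    Id       INTEGER PRIMARY KEY,
    Name     VARCHAR(100) NOT NULL,
    Duration VARCHAR(8)
);
CREATE TABLE AlbumsHaveTracks (
    AlbumId INTEGER NOT NULL REFERENCES Albums(Id),
    TrackId INTEGER NOT NULL REFERENCES Track(Id),
    Number  INTEGER,
    PRIMARY KEY (AlbumId, TrackId)
);
""")

# List the tables that now exist.
tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]

# An album that points at a non-existent artist should be rejected.
try:
    conn.execute(
        "INSERT INTO Albums VALUES (?, ?, ?, ?)",
        (1, "Hypothetical album", 2000, 99),
    )
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True

print(tables, fk_enforced)
```

The many-to-one relationship from albums to artists appears as the `ArtistId` foreign key, and the many-to-many relationship between albums and tracks appears as the `AlbumsHaveTracks` join table with its extra `Number` attribute.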


SQL INSERT statement

INSERT INTO table_name
    (column1, column2, column3, ...)
VALUES
    (value1, value2, value3, ...);


Do: Populate tables with example data

Artists

Id  Name
1   David Bowie
2   Queen

Albums

Id  Name                  Release  ArtistId
1   Space oddity          1969     1
2   … Ziggy stardust ...  1972     1
3   Best of Bowie         2002     1
4   Hot space             1982     2

AlbumsHaveTracks

AlbumId  TrackId  Number
1        1        1
2        2        10
3        1        1
3        2        8
4        3        11

Track

Id  Name              Duration
1   Space oddity      5:15
2   Suffragette city  3:25
3   Under pressure    4:02
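One way to load these rows is with parameterized INSERT statements, sketched here with Python's sqlite3 module. The column types are assumptions, and `executemany` is simply a convenient way to repeat the same INSERT for each tuple.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column types are assumptions; "Release" is quoted because it is also a keyword.
conn.executescript("""
CREATE TABLE Artists (Id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Albums (Id INTEGER PRIMARY KEY, Name TEXT, "Release" INTEGER,
                     ArtistId INTEGER);
CREATE TABLE Track (Id INTEGER PRIMARY KEY, Name TEXT, Duration TEXT);
CREATE TABLE AlbumsHaveTracks (AlbumId INTEGER, TrackId INTEGER, Number INTEGER);
""")

# Each "?" is a placeholder filled from one tuple of the list.
conn.executemany(
    "INSERT INTO Artists (Id, Name) VALUES (?, ?)",
    [(1, "David Bowie"), (2, "Queen")],
)
conn.executemany(
    'INSERT INTO Albums (Id, Name, "Release", ArtistId) VALUES (?, ?, ?, ?)',
    [
        (1, "Space oddity", 1969, 1),
        (2, "… Ziggy stardust ...", 1972, 1),
        (3, "Best of Bowie", 2002, 1),
        (4, "Hot space", 1982, 2),
    ],
)
conn.executemany(
    "INSERT INTO Track (Id, Name, Duration) VALUES (?, ?, ?)",
    [(1, "Space oddity", "5:15"), (2, "Suffragette city", "3:25"),
     (3, "Under pressure", "4:02")],
)
conn.executemany(
    "INSERT INTO AlbumsHaveTracks (AlbumId, TrackId, Number) VALUES (?, ?, ?)",
    [(1, 1, 1), (2, 2, 10), (3, 1, 1), (3, 2, 8), (4, 3, 11)],
)
conn.commit()

# Row counts per table, as a quick sanity check.
counts = {
    name: conn.execute(f"SELECT COUNT(*) FROM {name}").fetchone()[0]
    for name in ("Artists", "Albums", "Track", "AlbumsHaveTracks")
}
print(counts)
```

Using placeholders rather than string formatting for the values keeps quoting correct and avoids SQL injection when the values come from user input.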


Schema normalization


Schema normalization: Unnormalized data

Artist       Album                 Released  Track Num  Track             Dur
David Bowie  Space Oddity          1969      1          Space Oddity      5:15
David Bowie  … Ziggy Stardust ...  1972      10         Suffragette city  3:25
David Bowie  Best of Bowie         2002      1          Space Oddity      5:15
David Bowie  Best of Bowie         2002      8          Suffragette city  3:25
Queen        Hot space             1982      11         Under pressure    4:02




Schema normalization: Anomalies in unnormalized data

● The unnormalized example schema above can suffer from three types of “anomalies”

– Update anomaly: repeated data could be inconsistent between rows

– Insertion anomaly: can’t add info on an artist or album that doesn’t have a track

– Deletion anomaly: deleting the last track deletes an album or artist
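The three anomalies can be reproduced on the unnormalized table with a short sqlite3 sketch. The table name `Collection`, the `NOT NULL` constraints, and the placeholder artist name are assumptions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed constraints: every row must describe a complete track.
conn.execute(
    "CREATE TABLE Collection ("
    "Artist TEXT NOT NULL, Album TEXT NOT NULL, Released INTEGER, "
    "Num INTEGER NOT NULL, Track TEXT NOT NULL, Dur TEXT)"
)
conn.executemany(
    "INSERT INTO Collection VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("David Bowie", "Space Oddity", 1969, 1, "Space Oddity", "5:15"),
        ("David Bowie", "… Ziggy Stardust ...", 1972, 10, "Suffragette city", "3:25"),
        ("David Bowie", "Best of Bowie", 2002, 1, "Space Oddity", "5:15"),
        ("David Bowie", "Best of Bowie", 2002, 8, "Suffragette city", "3:25"),
        ("Queen", "Hot space", 1982, 11, "Under pressure", "4:02"),
    ],
)

# Update anomaly: changing the release year in one row only leaves the
# same album with two different release years.
conn.execute(
    "UPDATE Collection SET Released = 2003 "
    "WHERE Album = 'Best of Bowie' AND Num = 1"
)
release_years = sorted(
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT Released FROM Collection WHERE Album = 'Best of Bowie'"
    )
)

# Insertion anomaly: an artist with no album or track cannot be stored.
try:
    conn.execute("INSERT INTO Collection (Artist) VALUES ('Hypothetical artist')")
    insertion_blocked = False
except sqlite3.IntegrityError:
    insertion_blocked = True

# Deletion anomaly: removing Queen's only track removes Queen entirely.
conn.execute("DELETE FROM Collection WHERE Track = 'Under pressure'")
queen_rows = conn.execute(
    "SELECT COUNT(*) FROM Collection WHERE Artist = 'Queen'"
).fetchone()[0]

print(release_years, insertion_blocked, queen_rows)
```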


Schema normalization: Normal forms

● Schema normalization factors logically independent data into independent relations

● And links them using foreign key relationships

● Projection is the process of factoring an unnormalized relation into separate normalized relations

● Boyce-Codd normal form: the only non-trivial functional dependencies are from superkeys (sets of attributes that uniquely identify each tuple) to other attributes
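To make the Boyce-Codd condition concrete, the sketch below checks functional dependencies against the unnormalized example rows in plain Python. The helper names `holds` and `is_superkey` are hypothetical. A check on sample data can only refute a dependency, never prove that it holds for every possible database state, so the results are illustrative only.

```python
ATTRS = ("Artist", "Album", "Released", "Num", "Track", "Dur")
DATA = [
    ("David Bowie", "Space Oddity", 1969, 1, "Space Oddity", "5:15"),
    ("David Bowie", "… Ziggy Stardust ...", 1972, 10, "Suffragette city", "3:25"),
    ("David Bowie", "Best of Bowie", 2002, 1, "Space Oddity", "5:15"),
    ("David Bowie", "Best of Bowie", 2002, 8, "Suffragette city", "3:25"),
    ("Queen", "Hot space", 1982, 11, "Under pressure", "4:02"),
]
rows = [dict(zip(ATTRS, values)) for values in DATA]


def holds(rows, lhs, rhs):
    """True if, in these rows, equal lhs values always imply equal rhs values."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True


def is_superkey(rows, attrs):
    """True if attrs determine every attribute, i.e. identify each row."""
    return holds(rows, attrs, ATTRS)


# Album -> Released holds in the sample, but Album alone is not a superkey,
# so this dependency violates Boyce-Codd normal form.
album_determines_release = holds(rows, ("Album",), ("Released",))
album_is_superkey = is_superkey(rows, ("Album",))

# Track -> Dur is a second violating dependency of the same kind.
track_determines_dur = holds(rows, ("Track",), ("Dur",))
track_is_superkey = is_superkey(rows, ("Track",))

# (Album, Num) identifies each row of the sample.
album_num_is_superkey = is_superkey(rows, ("Album", "Num"))

print(album_determines_release, album_is_superkey,
      track_determines_dur, track_is_superkey, album_num_is_superkey)
```

Factoring out the attributes determined by `Album` and by `Track` into their own relations is what leads to separate album and track tables.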




Schema normalization: Normalized data

Artists

Id  Name
1   David Bowie
2   Queen

Albums

Id  Name                  Release  ArtistId
1   Space oddity          1969     1
2   … Ziggy stardust ...  1972     1
3   Best of Bowie         2002     1
4   Hot space             1982     2

AlbumsHaveTracks

AlbumId  TrackId  Number
1        1        1
2        2        10
3        1        1
3        2        8
4        3        11

Track

Id  Name              Duration
1   Space oddity      5:15
2   Suffragette city  3:25
3   Under pressure    4:02


Reminder: Main question of course

How can systems process and store multimedia data so that users can find what they are looking for in the future?


SQL queries


Queries: find what they are looking for

● Search through the data

● Search through complex relationships

● Aggregate over the data for reporting

● And do all of this efficiently...
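As a small illustration of aggregating for reporting, the sketch below counts albums per artist with `GROUP BY`, using Python's sqlite3 module and the normalized example tables. The column types and the choice of report are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Artists (Id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Albums (Id INTEGER PRIMARY KEY, Name TEXT, "Release" INTEGER,
                     ArtistId INTEGER);
""")
conn.executemany("INSERT INTO Artists VALUES (?, ?)",
                 [(1, "David Bowie"), (2, "Queen")])
conn.executemany(
    "INSERT INTO Albums VALUES (?, ?, ?, ?)",
    [
        (1, "Space oddity", 1969, 1),
        (2, "… Ziggy stardust ...", 1972, 1),
        (3, "Best of Bowie", 2002, 1),
        (4, "Hot space", 1982, 2),
    ],
)

# One output row per artist, with the number of albums in each group.
albums_per_artist = conn.execute(
    "SELECT ar.Name, COUNT(*) "
    "FROM Artists AS ar, Albums AS al "
    "WHERE al.ArtistId = ar.Id "
    "GROUP BY ar.Id, ar.Name "
    "ORDER BY ar.Name"
).fetchall()

print(albums_per_artist)
```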


SQL SELECT, single table

SELECT attribute1, attribute2
FROM relation
WHERE attribute1 = 'condition'
ORDER BY attribute2;


Do: Write a select query to answer

What is the duration of “Suffragette City”?
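One possible answer is sketched below with Python's sqlite3 module, assuming the `Track` table from the example schema. Note that the question writes “Suffragette City” while the stored name is “Suffragette city”; SQLite compares text case-sensitively by default, so the sketch uses `COLLATE NOCASE`, which is a SQLite-specific choice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Track (Id INTEGER PRIMARY KEY, Name TEXT, Duration TEXT)")
conn.executemany(
    "INSERT INTO Track VALUES (?, ?, ?)",
    [(1, "Space oddity", "5:15"), (2, "Suffragette city", "3:25"),
     (3, "Under pressure", "4:02")],
)

title = "Suffragette City"

# Exact comparison: the capital "C" does not match the stored name.
exact_match = conn.execute(
    "SELECT Duration FROM Track WHERE Name = ?", (title,)
).fetchone()

# Case-insensitive comparison finds the track.
duration = conn.execute(
    "SELECT Duration FROM Track WHERE Name = ? COLLATE NOCASE", (title,)
).fetchone()[0]

print(exact_match, duration)
```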


SQL SELECT, multiple tables

SELECT r1.attribute1, r2.attribute1
FROM relation1 AS r1,
     relation2 AS r2
WHERE r2.attribute1 = 'condition'
  AND r1.attribute1 = r2.attribute2
ORDER BY r1.attribute1;


Do: Write a select query to answer

Find the AlbumIds of all of David Bowie's albums
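One possible answer joins `Artists` to `Albums` on the foreign key, sketched here with Python's sqlite3 module. The column types are assumptions; the aliases `ar` and `al` are arbitrary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Artists (Id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Albums (Id INTEGER PRIMARY KEY, Name TEXT, "Release" INTEGER,
                     ArtistId INTEGER);
""")
conn.executemany("INSERT INTO Artists VALUES (?, ?)",
                 [(1, "David Bowie"), (2, "Queen")])
conn.executemany(
    "INSERT INTO Albums VALUES (?, ?, ?, ?)",
    [
        (1, "Space oddity", 1969, 1),
        (2, "… Ziggy stardust ...", 1972, 1),
        (3, "Best of Bowie", 2002, 1),
        (4, "Hot space", 1982, 2),
    ],
)

# Filter on the artist's name, then follow the foreign key to the albums.
album_ids = [
    row[0]
    for row in conn.execute(
        "SELECT al.Id "
        "FROM Artists AS ar, Albums AS al "
        "WHERE ar.Name = ? AND al.ArtistId = ar.Id "
        "ORDER BY al.Id",
        ("David Bowie",),
    )
]

print(album_ids)
```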


Do: Write a select query to answer

Find the TrackIds of all of David Bowie's tracks
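One possible answer extends the join through the `AlbumsHaveTracks` join table, sketched here with Python's sqlite3 module under the same assumed column types. Because a track can appear on several albums, `DISTINCT` is needed to list each TrackId once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Artists (Id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Albums (Id INTEGER PRIMARY KEY, Name TEXT, "Release" INTEGER,
                     ArtistId INTEGER);
CREATE TABLE AlbumsHaveTracks (AlbumId INTEGER, TrackId INTEGER, Number INTEGER);
""")
conn.executemany("INSERT INTO Artists VALUES (?, ?)",
                 [(1, "David Bowie"), (2, "Queen")])
conn.executemany(
    "INSERT INTO Albums VALUES (?, ?, ?, ?)",
    [
        (1, "Space oddity", 1969, 1),
        (2, "… Ziggy stardust ...", 1972, 1),
        (3, "Best of Bowie", 2002, 1),
        (4, "Hot space", 1982, 2),
    ],
)
conn.executemany(
    "INSERT INTO AlbumsHaveTracks VALUES (?, ?, ?)",
    [(1, 1, 1), (2, 2, 10), (3, 1, 1), (3, 2, 8), (4, 3, 11)],
)

JOIN = (
    "FROM Artists AS ar, Albums AS al, AlbumsHaveTracks AS aht "
    "WHERE ar.Name = 'David Bowie' "
    "AND al.ArtistId = ar.Id "
    "AND aht.AlbumId = al.Id "
    "ORDER BY aht.TrackId"
)

# Without DISTINCT, tracks on the compilation album are listed twice.
with_duplicates = [r[0] for r in conn.execute("SELECT aht.TrackId " + JOIN)]
track_ids = [r[0] for r in conn.execute("SELECT DISTINCT aht.TrackId " + JOIN)]

print(with_duplicates, track_ids)
```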


How would you write a select query to answer

● Find all songs containing David Bowie's vocals

● Find all songs at 120 beats per minute

● Find all songs sampled by other artists

– These all require further modeling or analysis of the audio...


How do we make databases that are

● Effective (correct, durable, coherent, ...)

– Transactions

● Efficient

– Concurrency

– Memory hierarchy

– Indexing

– Query optimization
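As a small preview of indexing and query optimization, the sketch below asks SQLite how it plans to run a query before and after an index is created, using Python's sqlite3 module. The index name `TrackByName` is hypothetical, and the exact wording of the plan text varies between SQLite versions, so the sketch only looks for the index name in it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Track (Id INTEGER PRIMARY KEY, Name TEXT, Duration TEXT)")
conn.executemany(
    "INSERT INTO Track VALUES (?, ?, ?)",
    [(1, "Space oddity", "5:15"), (2, "Suffragette city", "3:25"),
     (3, "Under pressure", "4:02")],
)

query = "SELECT Duration FROM Track WHERE Name = 'Suffragette city'"


def plan(conn, sql):
    """Return the human-readable plan; the description is the last column."""
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


# Without an index the engine has to scan every row of the table.
plan_before = plan(conn, query)

# With an index on Name the optimizer can look the row up directly.
conn.execute("CREATE INDEX TrackByName ON Track (Name)")
plan_after = plan(conn, query)

print(plan_before)
print(plan_after)
```

The query text is identical in both cases; only the physical plan chosen by the optimizer changes, which is the practical benefit of a declarative query language.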