CSC443 Database Management Course Introduction Professor Pepper adapted from presentations given by Professor Juliana Freire & Karl Aberer & Yan Chen & Silberschatz, Korth and Sudarshan

Post on 19-Dec-2015


CSC443 Database Management

Course Introduction

Professor Pepper
adapted from presentations given by Professor Juliana Freire & Karl Aberer & Yan Chen & Silberschatz, Korth and Sudarshan

Today’s Goals

Course Overview
Why study databases?
Why use databases?
Intro to Databases

Major Course Objectives

Design and diagram relational databases
Create Access and Oracle databases
Use SQL commands
Be able to design a good relational database
Know how to get information out of a database to answer any question

Diagramming

Use Case
Class Diagram
Entity Relationship Diagram
Algebraic Relation Model

Tools

Panther Unix Oracle 9.2.0.1.0

FTP Explorer – register for trial
MS Access

Books

Database System Concepts 5th Ed
• Theory
• Cross Reference for fourth ed
Oracle 9i Programming - A Primer
• Practical examples
See course syllabus
Available in Library

Learning Resources

Blackboard: my.adelphi.edu
Web site Database System Concepts: www.db-book.com/
My office hours: Tuesday & Thursday 12:15-1:30; Wed 12-12:30
• Alumni 114 or Science Lab
My email: [email protected]
My phone: 516-747-2362
My Web: www.adelphi.edu/~pepperk

Adelphi Account Setup

Panther
Oracle
Blackboard
E-mail
Sign-in Sheet

Projects / Grading

Projects: 40%
• Access – 15
• Oracle – 25
Homework assignments: 20%
Midterm: 20%
Final: 20%

Assignments

2% dropped for anything 1 day late.
10% dropped for anything 2 weeks late.

Delivering assignments

Email
ftp
drop box
discussion board
mailbox in math department
E-mail me if making a change in delivery place.
Forward your email from Adelphi.

What is a Database Management System?

Database Management System = DBMS
• A collection of files that store the data
• A big program written by someone else that accesses and updates those files for you
Relational DBMS = RDBMS
• Data files are structured as relations (tables)

Why Study Databases?

What is behind this Web Site?

http://www.ticketmaster.com/
Search on a large database
Specify search conditions
Many users
Updates
Access through a web interface

Central to Modern Computer Science

Database Systems: Then

Database Systems: Today

From Friendster.com on-line tour

Field is developing quickly

Current Commercial Outlook

A major part of the software industry:
• Oracle, IBM, Microsoft, Sybase
• also Informix (now IBM), Teradata
• smaller players: java-based dbms, devices, OO, …
Well-known benchmarks (esp. TPC)
Lots of related industries
• data warehouse, document management, storage, backup, reporting, business intelligence, app integration
Relational products dominant and evolving
• adapting for extensibility (user-defined types), adding native XML support.
Open Source coming on strong
• MySQL, PostgreSQL, BerkeleyDB

Why Study Databases??

Need exploded
• Corporate: retail swipe/clickstreams, “customer relationship mgmt”, “supply chain mgmt”, “data warehouses”, etc.
• Scientific: digital libraries, Human Genome project, NASA Mission to Planet Earth, physical sensors, grid physics network

?

Why study databases?

Data is valuable:
• bank account records, tax records, student records…
Protect It! - no matter what
• Hurricane
• Flood
• Human error

Why study databases?

Data often structured:
• Example: Bank account records all follow the same structure
We can exploit this regular structure
• To retrieve data in useful ways (that is, we can use a query language)
• To store data efficiently

Why Study Databases Summary

Central to modern computer science
Databases are everywhere
Commercially successful
Fast moving technology
Plethora of structured data that business and people need

What is a database?

Whiteboard Exercise

Database Definition

Database – a very large, integrated collection of data. (the stuff)

Models a real-world enterprise
• Entities (e.g., teams, games)
• Relationships (e.g., The Forty-Niners are playing in The Superbowl)

Database Management System – software that stores and manages databases (the tools)

A database is better than a simple file system because a file system suffers from:

Data redundancy, inconsistency and isolation
Difficult access to data
Integrity problems
No atomicity of updates (change one file and die before the other completes)
Multiple user issues

So a Database Has:

representing information
• data modeling
languages and systems for querying data
• complex queries with real semantics* over massive data sets
concurrency control for data manipulation
• controlling concurrent access
• ensuring transactional semantics
reliable data storage
• maintain data semantics even if you pull the plug
* semantics: the meaning or relationship of meanings of a sign or set of signs

Why Use a Database

Why use a database presentation

What is in a database?

Describing Data: Data Models

A data model is a collection of concepts for describing data.
A schema is a description of a particular collection of data, using a given data model.
A relation is the data stored in a certain schema.
The relational model of data is the most widely used model today.
• Entities and relations among them
• Integrity constraints and business rules
• Perspective dependent (warehouse & sales view item differently)

Database Design

The process of designing the general structure of the database:
Logical Design – Deciding on the database schema.
• Business decision – What attributes
• Computer Science decision – What relation schemas
Physical Design – Deciding on the physical layout of the database

Data Models

A collection of tools for describing
• Data
• Data relationships
• Data semantics
• Data constraints
Relational model
Entity-Relationship data model (mainly for database design)
Object-based data models (Object-oriented and Object-relational)
Semistructured data model (XML)
Other older models:
• Network model
• Hierarchical model

The Entity-Relationship Model

Models an enterprise as a collection of entities and relationships
• Entity: a “thing” or “object” in the enterprise that is distinguishable from other objects
  • Described by a set of attributes
• Relationship: an association among several entities
Represented diagrammatically by an entity-relationship diagram:

Relational Model

ER for concept; map to Algebraic Relational Model
Relations (tables of possible data)
Instance (actual data at a given time)
Schema (description of those tables, their relations)

Relational Model Terminology

Relational Model Look

Notation: $\sigma_p(r)$
$p$ is called the selection predicate
Defined as: $\sigma_p(r) = \{t \mid t \in r \text{ and } p(t)\}$
Where $p$ is a formula in propositional calculus consisting of terms connected by: $\wedge$ (and), $\vee$ (or), $\neg$ (not)
Each term is one of: <attribute> op <attribute> or <constant>, where op is one of: $=, \neq, >, \geq, <, \leq$
Example of selection: $\sigma_{branch\_name = \text{``Perryridge''}}(account)$
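The selection operation can be tried out in a few lines of Python. This is only a sketch: the relation is modeled as a list of dictionaries, and the sample rows and the helper name `select` are invented for illustration, not taken from the slides.

```python
# A relation modeled as a list of tuples (dicts). The rows are made-up sample data.
account = [
    {"account_number": "A-101", "branch_name": "Downtown", "balance": 500},
    {"account_number": "A-102", "branch_name": "Perryridge", "balance": 400},
    {"account_number": "A-201", "branch_name": "Perryridge", "balance": 900},
]


def select(predicate, relation):
    """Selection: keep each tuple t of the relation for which predicate(t) holds."""
    return [t for t in relation if predicate(t)]


# Selection with the predicate branch_name = "Perryridge" over account.
perryridge = select(lambda t: t["branch_name"] == "Perryridge", account)
```

The predicate is an ordinary boolean function, so terms can be combined with `and`, `or` and `not` just as the definition combines them with the logical connectives.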

Object-Relational Data Models

Extend the relational data model by including object orientation and constructs to deal with added data types.
Allow attributes of tuples to have complex types, including non-atomic values such as nested relations.
Preserve relational foundations, in particular the declarative access to data, while extending modeling power.
Provide upward compatibility with existing relational languages.

Design Goals

Avoid redundant data
Ensure that relationships among attributes are represented
Ensure constraints are properly modeled: updates check for violation of database integrity constraints.

Bad Design

Queries

What the programmer sees

Some Basic SQL Commands

Select – Get rows of data
* - everything
From – the name of the table (relation) will follow
Where – Only get the stuff that matches
Example: Select * from movies where theater = 'Loews'
Exercise – Write down the query to select all of your friends that live in NY State
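To see the movies example run end to end, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for the course's Access and Oracle systems. The column names and sample rows are assumptions; the slide only names the table and the theater column.

```python
import sqlite3


def loews_movies():
    """Build a tiny movies table in memory and run the slide's query against it."""
    conn = sqlite3.connect(":memory:")
    # Assumed columns: the slide only mentions a table "movies" with a "theater" column.
    conn.execute("CREATE TABLE movies (title TEXT, theater TEXT)")
    conn.executemany(
        "INSERT INTO movies VALUES (?, ?)",
        [("Movie A", "Loews"), ("Movie B", "Odeon"), ("Movie C", "Loews")],
    )
    # String constants in SQL go in single quotes.
    rows = conn.execute(
        "SELECT * FROM movies WHERE theater = 'Loews' ORDER BY title"
    ).fetchall()
    conn.close()
    return rows
```

Only the rows whose theater matches are returned; the `ORDER BY` is added so the result order is predictable.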

Example: University Database

Conceptual schema:
• Students(sid: string, name: string, login: string, age: integer, gpa: real)
• Courses(cid: string, cname: string, credits: integer)
• Enrolled(sid: string, cid: string, grade: string)
External Schema (View):
• Course_info(cid: string, enrollment: integer)
Physical schema:
• Relations stored as unordered files.
• Index on first column of Students.
Key to good performance
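The three schema levels of the university example can be sketched with Python's sqlite3 module (an assumption; the course itself uses Access and Oracle). The tables follow the conceptual schema, the view plays the role of the external schema, and declaring `sid` as a primary key is one assumed way to get an index on the first column of Students. The sample rows are invented.

```python
import sqlite3


def build_university():
    """Create the conceptual schema, one external view, and a few sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Conceptual schema (PRIMARY KEY also gives Students an index on sid)
        CREATE TABLE Students (sid TEXT PRIMARY KEY, name TEXT, login TEXT,
                               age INTEGER, gpa REAL);
        CREATE TABLE Courses (cid TEXT PRIMARY KEY, cname TEXT, credits INTEGER);
        CREATE TABLE Enrolled (sid TEXT, cid TEXT, grade TEXT);

        -- External schema: a view computed from Enrolled
        CREATE VIEW Course_info AS
            SELECT cid, COUNT(*) AS enrollment FROM Enrolled GROUP BY cid;

        -- Made-up sample data
        INSERT INTO Students VALUES ('s1', 'Ann', 'ann', 20, 3.5),
                                    ('s2', 'Bob', 'bob', 21, 3.1);
        INSERT INTO Courses VALUES ('CSC443', 'Database Management', 3);
        INSERT INTO Enrolled VALUES ('s1', 'CSC443', 'A'), ('s2', 'CSC443', 'B');
        """
    )
    return conn
```

A user of `Course_info` sees only course ids and enrollment counts, never the individual student rows behind them.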

[Figure: levels of abstraction, with View 1, View 2 and View 3 above the Conceptual Schema, the Physical Schema, and the DB]

Data Independence (levels of abstraction)

Applications insulated from how data is structured and stored.

Logical data independence: Protection from changes in logical structure of data – stabilize views.

Physical data independence: Protection from changes in physical structure of data.

Q: Why are these particularly important for DBMS?
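Physical data independence can be observed directly: changing how the data is stored should not change what a query returns. The sketch below uses Python's sqlite3 module as an assumed stand-in DBMS; the table contents and the index name are invented for illustration.

```python
import sqlite3


def gpa_query(conn):
    """The application's query; it never mentions how Students is stored."""
    return conn.execute(
        "SELECT name FROM Students WHERE gpa > 3.0 ORDER BY name"
    ).fetchall()


def demo_physical_independence():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Students (sid TEXT, name TEXT, gpa REAL)")
    conn.executemany(
        "INSERT INTO Students VALUES (?, ?, ?)",
        [("s1", "Ann", 3.5), ("s2", "Bob", 2.8), ("s3", "Cid", 3.2)],
    )
    before = gpa_query(conn)
    # Change the physical schema only: add an index on gpa.
    conn.execute("CREATE INDEX idx_students_gpa ON Students (gpa)")
    after = gpa_query(conn)
    conn.close()
    return before, after
```

The query text is identical before and after the index is created; only the access path available to the DBMS changes.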


Queries

Change and get data from a database
Run over data model
Easy & efficient
Not good for complex calculations
DML and DDL

Data Manipulation Language (DML)

Language for accessing and manipulating the data organized by the appropriate data model

DML also known as query language
Two classes of languages

Procedural – user specifies what data is required and how to get those data

Declarative (nonprocedural) – user specifies what data is required without specifying how to get those data

SQL is the most widely used query language
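The difference between the two classes can be illustrated by computing the same answer both ways. This is a sketch under assumptions: Python stands in for a procedural language, SQL is run through Python's sqlite3 module, and the sample rows and the 450 threshold are invented.

```python
import sqlite3

# Made-up sample data: (account_number, balance)
ROWS = [("A-101", 500), ("A-102", 400), ("A-201", 900)]


def procedural_total(rows):
    """Procedural: we spell out HOW to get the data (scan, test, accumulate)."""
    total = 0
    for _account_number, balance in rows:
        if balance >= 450:
            total += balance
    return total


def declarative_total(rows):
    """Declarative: we state WHAT we want; the DBMS decides how to compute it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (account_number TEXT, balance INTEGER)")
    conn.executemany("INSERT INTO account VALUES (?, ?)", rows)
    total = conn.execute(
        "SELECT SUM(balance) FROM account WHERE balance >= 450"
    ).fetchone()[0]
    conn.close()
    return total
```

Both functions return the same number; only the second leaves the choice of scan order and access method to the system.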

Data Definition Language (DDL)

Specification notation for defining the database schema
Example: create table account (account_number char(10), balance integer)
DDL compiler generates a set of tables stored in a data dictionary
Data dictionary contains metadata (i.e., data about data)
• Database schema
• Data storage and definition language
  • Specifies the storage structure and access methods used
• Integrity constraints
  • Domain constraints
  • Referential integrity (references constraint in SQL)
  • Assertions
• Authorization
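A small sketch of DDL, the data dictionary, and integrity constraints, using Python's sqlite3 module as an assumed stand-in for Oracle. The `branch` table, the `CHECK` rule, and the helper names are invented for illustration. In SQLite the catalog table `sqlite_master` plays the data dictionary role, and foreign keys are only enforced after `PRAGMA foreign_keys = ON`.

```python
import sqlite3


def build_bank():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when enabled
    conn.executescript(
        """
        CREATE TABLE branch (branch_name TEXT PRIMARY KEY);
        CREATE TABLE account (
            account_number CHAR(10) PRIMARY KEY,
            branch_name TEXT REFERENCES branch(branch_name),  -- referential integrity
            balance INTEGER CHECK (balance >= 0)              -- domain constraint
        );
        INSERT INTO branch VALUES ('Perryridge');
        """
    )
    return conn


def table_names(conn):
    """Read metadata (data about data) from SQLite's catalog."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [r[0] for r in rows]


def try_insert(conn, row):
    """Return True if the DBMS accepts the row, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO account VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False
```

A negative balance or an unknown branch is rejected by the DBMS itself, so no application code has to repeat those checks.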

Queries - What does it look like?

System handles query plan generation & optimization; ensures correct execution.

SELECT eid, ename, title
FROM Emp E
WHERE E.sal > 50000

SELECT E.loc, AVG(E.sal)
FROM Emp E
GROUP BY E.loc
HAVING COUNT(*) > 5

SELECT COUNT(DISTINCT E.eid)
FROM Emp E, Proj P, Asgn A
WHERE E.eid = A.eid
  AND P.pid = A.pid
  AND E.loc <> P.loc

Issues: view reconciliation, operator ordering, physical operator choice, memory management, access path (index) use, …

[Figure: query plans over Employees, Projects and Assignments: Select on Emp; Group(agg) and Having on Emp; Count distinct over Join of Emp, Asgn and Proj]
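The three kinds of query on this slide (a selection, a grouped aggregate with a HAVING filter, and a three-way join with a distinct count) can be run on a toy database. This sketch assumes Python's sqlite3 module; the column lists and sample rows are guesses from the names used in the queries, and the HAVING threshold is a parameter so that the tiny data set still produces a group.

```python
import sqlite3


def build_company():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE Emp (eid INTEGER, ename TEXT, title TEXT, sal INTEGER, loc TEXT);
        CREATE TABLE Proj (pid INTEGER, loc TEXT);
        CREATE TABLE Asgn (eid INTEGER, pid INTEGER);
        -- Made-up sample data
        INSERT INTO Emp VALUES (1, 'Ann', 'Engineer', 60000, 'NY'),
                               (2, 'Bob', 'Clerk',    40000, 'NY'),
                               (3, 'Cid', 'Manager',  70000, 'LA');
        INSERT INTO Proj VALUES (10, 'NY'), (20, 'LA');
        INSERT INTO Asgn VALUES (1, 10), (1, 20), (2, 10), (3, 10);
        """
    )
    return conn


def high_earners(conn):
    """Employees earning more than 50000."""
    return conn.execute(
        "SELECT eid, ename, title FROM Emp E WHERE E.sal > 50000 ORDER BY eid"
    ).fetchall()


def avg_salary_by_loc(conn, min_count):
    """Average salary per location, keeping locations with more than min_count employees."""
    return conn.execute(
        "SELECT E.loc, AVG(E.sal) FROM Emp E GROUP BY E.loc HAVING COUNT(*) > ?",
        (min_count,),
    ).fetchall()


def remote_workers(conn):
    """Number of distinct employees assigned to a project at a different location."""
    return conn.execute(
        """SELECT COUNT(DISTINCT E.eid)
           FROM Emp E, Proj P, Asgn A
           WHERE E.eid = A.eid AND P.pid = A.pid AND E.loc <> P.loc"""
    ).fetchone()[0]
```

In each case the caller states only the result wanted; plan generation and operator ordering are left to the DBMS.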

SQL

SQL: widely used non-procedural language

Example: Find the name of the customer with customer-id 192-83-7465
select customer.customer_name
from customer
where customer.customer_id = '192-83-7465'

Example: Find the balances of all accounts held by the customer with customer-id 192-83-7465
select account.balance
from depositor, account
where depositor.customer_id = '192-83-7465' and
      depositor.account_number = account.account_number

Application programs generally access databases through one of
• Language extensions to allow embedded SQL
• Application program interface (e.g., ODBC/JDBC) which allows SQL queries to be sent to a database
For us: Oracle and Access SQL languages
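As a rough illustration of the application program interface route, the balance query can be sent from a program. This sketch uses Python's sqlite3 module in place of ODBC/JDBC, which is an assumption; the sample rows, the second customer id, and the function names are invented. The customer id is passed as a parameter instead of being pasted into the SQL text.

```python
import sqlite3


def build_bank_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customer (customer_id TEXT, customer_name TEXT);
        CREATE TABLE account (account_number TEXT, balance INTEGER);
        CREATE TABLE depositor (customer_id TEXT, account_number TEXT);
        -- Made-up sample data
        INSERT INTO customer VALUES ('192-83-7465', 'Customer One');
        INSERT INTO account VALUES ('A-101', 500), ('A-201', 900), ('A-305', 350);
        INSERT INTO depositor VALUES ('192-83-7465', 'A-101'),
                                     ('192-83-7465', 'A-201'),
                                     ('000-00-0000', 'A-305');
        """
    )
    return conn


def balances_for(conn, customer_id):
    """Send the join query through the API, binding the customer id as a parameter."""
    rows = conn.execute(
        """SELECT account.balance
           FROM depositor, account
           WHERE depositor.customer_id = ?
             AND depositor.account_number = account.account_number
           ORDER BY account.balance""",
        (customer_id,),
    )
    return [r[0] for r in rows]
```

Calling `balances_for(build_bank_db(), "192-83-7465")` returns the balances of the two accounts that the depositor table links to that customer.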

A Look underneath

Concurrency Control

Concurrent execution of user programs: key to good DBMS performance.
• Disk accesses frequent, pretty slow
• Keep the CPU working on several programs concurrently.
Interleaving actions of different programs: trouble!
• e.g., account-transfer & print statement at same time
DBMS ensures such problems don’t arise.
• Users/programmers can pretend they are using a single-user system. (called “Isolation”)
• Thank goodness! Don’t have to program “very, very carefully”.

Transactions: ACID Properties

Key concept is a transaction: a sequence of database actions (reads/writes).

DBMS ensures atomicity (all-or-nothing property) even if system crashes in the middle.

Each transaction, executed completely, must take the DB between consistent states or must not run at all.

DBMS ensures that concurrent transactions appear to run in isolation.

DBMS ensures durability of committed Xacts even if system crashes.

DBMS can enforce simple integrity constraints on the data.
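Atomicity and a simple integrity constraint can be seen together in a money transfer. The sketch below assumes Python's sqlite3 module, where `with conn:` commits the transaction on success and rolls it back if an exception escapes the block. The account numbers, balances, and the non-negative balance rule are invented for illustration.

```python
import sqlite3


def make_accounts():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account ("
        "account_number TEXT PRIMARY KEY, "
        "balance INTEGER CHECK (balance >= 0))"  # simple integrity constraint
    )
    conn.executemany(
        "INSERT INTO account VALUES (?, ?)", [("A-101", 500), ("A-102", 100)]
    )
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one transaction: both updates or neither."""
    try:
        with conn:  # commit if the block succeeds, roll back if it raises
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE account_number = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE account_number = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT account_number, balance FROM account"))
```

When the debit would make a balance negative, the constraint fails and the credit that already ran is undone, so the database moves only between consistent states.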

Structure of a DBMS

A typical DBMS has a layered architecture.

The figure does not show the concurrency control and recovery components.

Each database system has its own variations.

Query Optimization and Execution

Relational Operators

Files and Access Methods

Buffer Management

Disk Space Management

DB

These layers must consider concurrency control and recovery

Overall System Structure

…must understand how a DBMS works

Databases make these folks happy ...

DBMS vendors, programmers
• $20 million industry
• Oracle, IBM, MS, Sybase, …
End users
• Business, education, science, …
DB application programmers
• E.g., smart webmasters
• Build web services that run off DBMSs
Database administrators (DBAs)
• Design logical/physical schemas
• Handle security and authorization
• Data availability, crash recovery
• Database tuning as needs evolve

Summary

What is a database? – lots of data organized into entities and schemes with a manager

Why study databases? – common use, needed for programming apps

Why use databases? – all the advantages over flat file systems

Intro to Databases

Logical layer:

Query language, data models, transactions

Physical layer

Actual files with indexes, query processing, concurrency, recovery & logs