Page 1: IT420: Database Management and Organization Adina Crăiniceanu adina@usna.edu

IT420: Database Management and Organization

Adina Crăiniceanu

adina@usna.edu

Page 2

Instructor

Adina Crăiniceanu

M.S. and Ph.D., Cornell University
Area of Specialization: Databases
Research: search in peer-to-peer systems

Page 3

Database Management and Organization

How does Wal-Mart manage its 200 TB data warehouse?
What is the database technology behind eBay’s website?
How do you build an Oracle 9i, IBM DB2 or Microsoft SQL Server database?

Page 4

Course Goals

Understand the functionality of modern database systems

Understand where database systems fit into an enterprise data management infrastructure

Design and build data-driven applications and websites

Learn several important technologies: SQL, PHP, XML, XQuery, web services

Page 5

Course Workload

Labs + Lectures
Grade:
  25%: Final Exam
  30%: 6-Week and 12-Week Exams
  20%: Homeworks, Labs, Quizzes
  20%: Projects
  5%: Class Participation

Page 6

Evaluation Policies

Assignments: No late submissions

Exams: comprehensive, closed book/closed notes

Re-grade requests: up to 7 days after grade

Page 7

Academic Integrity - Honor

Honor Concept of the Brigade of Midshipmen
Policies Concerning Graded Academic Work
  USNA CS
  http://www.cs.usna.edu/academics/honor.htm
Collaboration on homeworks is possible, but submitted work should be your own. Cite any assistance, from any sources
Collaboration on projects, exams, quizzes is prohibited

Page 8

Resources

Textbook: Database Processing by David Kroenke
Database Management Systems by R. Ramakrishnan and J. Gehrke
MySQL/PHP Database Applications by B. Bulger
Microsoft Access reference book
Lecture slides
Course website: www.cs.usna.edu/~adina/teaching/it420spring2006

Page 9

Classroom

No food permitted in classroom
No use of computer equipment for any purpose other than as outlined in the class activity

Page 10

Course Topics

Database design
Relational model
SQL
Normalization
Database administration
PHP, MySQL
XML
Three-tier concepts

Page 11

Database Management Systems (DBMS)

Information is one of the most valuable resources in this information age
How do we effectively and efficiently manage this information?
Relational database management systems
  Dominant data management paradigm today
  A 6-billion-dollar-a-year industry!

Page 12

Why not Files?

Page 13

Classes

string JobName
class Contractor
class Equipment
class Date
double charge

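Page 14

‘Query Processing’

// Contractor, Equipment and Date are the classes listed on the previous slide.
class Rental {
public:
    string job;
    Contractor Con_data;
    Equipment Equip_data;
    Date rent_data;
    double charge;
};

Q: All jobs with charge > x?
A:

Rental allRentals[10];

// Scan every rental and print the job of each one whose charge exceeds x.
// The loop bounds and the printed field were elided on the slide; they are filled in here as assumptions.
void chargesGreaterThan(double x) {
    for (int i = 0; i < 10; i++) {
        if (allRentals[i].charge > x)
            cout << allRentals[i].job << endl;
    }
}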

Page 15

Problems

Changes to Data
Data inconsistencies
Access Control
Security of information (views)
Loss of info due to deletion
“on the fly” Queries?

Page 16

Why Database Management Systems?

Benefits
  Transactions (concurrent data access, recovery from system crashes)
  High-level abstractions for data access, manipulation, and administration
  Data integrity and security
  Performance and scalability

Page 17

What is a Transaction?

The execution of a program that performs a function by accessing a database.

Examples:
  Reserve an airline seat.
  Buy an airline ticket.
  Withdraw money from an ATM.
  Verify a credit card sale.
  Order an item from an Internet retailer.

Page 18

Transactions

A transaction is an atomic sequence of actions
Each transaction must leave the system in a consistent state (if system is consistent when the transaction starts).
The ACID Properties:
  Atomicity
  Consistency
  Isolation
  Durability

Page 19

Example Transaction: Online Store

Your purchase transaction:
  Atomicity: Either the complete purchase happens, or nothing
  Consistency: The inventory and internal accounts are updated correctly
  Isolation: It does not matter whether other customers are also currently making a purchase
  Durability: Once you have received the order confirmation number, your order information is permanent, even if the site crashes

Page 20

Transactions (cont.)

A transaction will commit after completing all its actions, or it could abort (or be aborted by the DBMS) after executing some actions.
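Commit versus abort can be sketched with Python's standard sqlite3 module. This is only an illustration under stated assumptions: the Accounts table, its CHECK constraint, and the transfer function are hypothetical and not part of the course material. The transfer credits one account and debits another inside a single transaction; if the debit would make a balance negative, the DBMS rejects it and the whole transaction aborts, so the credit is undone as well.

```python
import sqlite3

# Hypothetical schema and data, chosen only to illustrate commit versus abort.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Accounts ("
    "aid INTEGER PRIMARY KEY, "
    "balance REAL NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO Accounts VALUES (1, 100.0), (2, 50.0)")
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one transaction; return True on commit."""
    try:
        # The connection context manager commits if the block finishes
        # and rolls back if an exception escapes from it.
        with conn:
            conn.execute(
                "UPDATE Accounts SET balance = balance + ? WHERE aid = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE Accounts SET balance = balance - ? WHERE aid = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint failed: the transaction was aborted,
        # so the earlier credit to dst has been rolled back too.
        return False


def balances(conn):
    """Return {account id: balance} for all accounts."""
    return dict(conn.execute("SELECT aid, balance FROM Accounts ORDER BY aid"))
```

Calling transfer(conn, 1, 2, 30.0) commits and changes both balances; calling transfer(conn, 1, 2, 500.0) aborts and leaves both balances exactly as they were.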

Page 21

Example Transactions: ATM

You withdraw money from the ATM
  Atomicity
  Consistency
  Isolation
  Durability
Commit versus Abort?

Page 22

What Makes Transaction Processing Hard?

Reliability - system should rarely fail
Availability - system must be up all the time
Response time - within a few seconds
Throughput - thousands of transactions/second
Scalability - start small, ramp up to Internet-scale
Security - for confidentiality and high finance
Configurability - for above requirements + low cost
Atomicity - no partial results
Durability - a transaction is a legal contract
Distribution - of users and data

Page 23

What Makes TP Important?

It is at the core of electronic commerce
Most medium-to-large businesses use TP for their production systems.
It is a huge slice of the computer system market

Page 24

Why Database Management Systems?

Benefits
  Transactions (concurrent data access, recovery from system crashes)
  High-level abstractions for data access, manipulation, and administration
  Data integrity and security
  Performance and scalability

Page 25

Data Model

A data model is a collection of concepts for describing data.

Examples:
  ER model (used for conceptual modeling)
  Relational model, object-oriented model, object-relational model (actually implemented in current DBMS)

Page 26

The Relational Data Model

A relational database is a set of relations.
Turing Award (“Nobel Prize” in CS) for Codd in 1981
Example relation:
  Student(cid: integer, name: string, byear: integer, state: string)

Page 27

The Relational Model: Terminology

Relation instance and schema (table)
Field (column)
Record or tuple (row)
Primary key
Foreign key

Page 28

The Object-Oriented Data Model

Richer data model. Goal: Bridge mismatch between programming languages and the database system.
Example components of the data model: Relationships between objects directly as pointers.
Result: Can store abstract data types directly in the DBMS
  Pictures
  Geographic coordinates
  Movies
  CAD objects

Page 29

Object-Oriented DBMS

Advantages: Engineering applications (CAD, CAM, and CASE, i.e. computer-aided software engineering), multimedia applications.
Disadvantages: Querying is much harder

Page 30

Object-Relational DBMS

Mixture between the object-oriented and the relational data model
  Combines ease of querying with ability to store abstract data types
  Conceptually, the relational model, but every field
All major relational vendors are currently extending their relational DBMS to the object-relational model

Page 31

Query Languages

We need a high-level language to describe and manipulate the data

Requirements:
  Precise semantics
  Easy integration into applications written in C++/Java/Visual Basic/etc.
  Easy to learn
  DBMS needs to be able to efficiently evaluate queries written in the language

Page 32

SQL: Structured Query Language

Developed by IBM (System R) in the 1970s

ANSI standard since 1986:
  SQL-86
  SQL-89 (minor revision)
  SQL-92 (major revision, current standard)
  SQL-99 (major extensions)

More about SQL in later lectures

Page 33

Example Query

SELECT Customers.cid, Customers.name, Customers.byear, Customers.state
FROM Customers
WHERE Customers.cid = 3
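To try the query, here is a minimal sketch using Python's standard sqlite3 module. Only the table and column names come from the slide; the sample rows and the function customer_by_id are assumptions made for illustration. The constant 3 is replaced by a bound parameter so the same query can look up any customer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customers ("
    "cid INTEGER PRIMARY KEY, name TEXT, byear INTEGER, state TEXT)"
)
# Hypothetical rows, made up for this example.
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?, ?, ?)",
    [(1, "Ann", 1985, "MD"), (2, "Bob", 1986, "VA"), (3, "Cy", 1984, "NY")],
)


def customer_by_id(conn, cid):
    """Run the slide's query with the customer id passed as a parameter."""
    return conn.execute(
        """SELECT Customers.cid, Customers.name, Customers.byear, Customers.state
           FROM Customers
           WHERE Customers.cid = ?""",
        (cid,),
    ).fetchall()
```

The query states what is wanted (the customer whose cid is 3), not how to find it; the DBMS decides how to locate the matching row.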

Page 34

Why Database Management Systems?

Benefits
  Transactions (concurrent data access, recovery from system crashes)
  High-level abstractions for data access, manipulation, and administration
  Data integrity and security
  Performance and scalability

Page 35

Integrity Constraints

Integrity Constraints (ICs): Condition that must be true for any instance of the database.
ICs are specified when schema is defined.
ICs are checked when relations are modified.
A legal instance of a relation is one that satisfies all specified ICs.
DBMS should only allow legal instances.
Example: Domain constraints.
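A domain constraint can be sketched with Python's standard sqlite3 module. The particular constraints below (a birth year between 1900 and 2010, a two-letter state code) and the helper try_insert are assumptions chosen for illustration. The constraints are declared once, when the schema is defined, and the DBMS checks them on every modification.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# ICs are specified as part of the schema definition.
# The allowed ranges are hypothetical.
conn.execute("""
    CREATE TABLE Student (
        cid   INTEGER PRIMARY KEY,
        name  TEXT NOT NULL,
        byear INTEGER CHECK (byear BETWEEN 1900 AND 2010),
        state TEXT CHECK (length(state) = 2)
    )""")


def try_insert(row):
    """Attempt an insert; the DBMS only accepts rows that satisfy every IC."""
    try:
        conn.execute("INSERT INTO Student VALUES (?, ?, ?, ?)", row)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"
```

A row such as (1, 'Ann', 1985, 'MD') is accepted, while a row with byear 3000 or state 'Maryland' is rejected, so the stored instance always stays legal.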

Page 36

Security

Secrecy: Users should not be able to see things they are not supposed to.
  E.g., A student can’t see other students’ grades.
Integrity: Users should not be able to modify things they are not supposed to.
  E.g., Only instructors can assign grades.
Availability: Users should be able to see and modify things they are allowed to.

Page 37

Why Database Management Systems?

Benefits
  Transactions (concurrent data access, recovery from system crashes)
  High-level abstractions for data access, manipulation, and administration
  Data integrity and security
  Performance and scalability

Page 38

DBMS and Performance

Efficient implementation of all database operations
Indexes
Query optimization
Automatic high-performance concurrent query execution, query parallelization
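The effect of an index on query optimization can be observed with a small sketch in Python's standard sqlite3 module. The table contents, the index name idx_customers_byear, and the helper plan are assumptions for illustration; the exact wording of the plan text differs between SQLite versions, so the sketch only looks at whether the plan is a full scan or names the index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customers ("
    "cid INTEGER PRIMARY KEY, name TEXT, byear INTEGER, state TEXT)"
)
# 1000 hypothetical customers with birth years 1950..1999.
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?, ?, ?)",
    [(i, f"name{i}", 1950 + i % 50, "MD" if i % 2 else "VA") for i in range(1000)],
)

QUERY = "SELECT name FROM Customers WHERE byear = 1985"


def plan(conn, sql):
    """Return the optimizer's plan; the last column of each row is the description."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


# Without an index the only option is to scan the whole table.
plan_before = plan(conn, QUERY)

# After creating an index on byear, the optimizer can search the index instead.
conn.execute("CREATE INDEX idx_customers_byear ON Customers (byear)")
plan_after = plan(conn, QUERY)

matching = conn.execute(QUERY).fetchall()
```

The query text is identical before and after; only the plan chosen by the DBMS changes, which is the point of separating what is asked from how it is evaluated.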

Page 39

Summary Of DBMS Benefits

Transactions
  ACID properties, concurrency control, recovery
High-level abstractions for data access
  Data models
Data integrity and security
  Key constraints, foreign key constraints, access control
Performance and scalability
  Parallel DBMS, distributed DBMS, performance tuning

Page 40

The Three-Tier Architecture

Presentation tier: Client Program (Web Browser)
Middle tier: Application Server
Data management tier: Database Management System

Page 41

Presentation Tier

Primary interface to the user
Needs to adapt to different display devices (PC, PDA, cell phone, voice access?)

Page 42

Middle Tier

Application Programs:
  Create and process forms
  Create and transmit queries
  Create and process reports
  Execute application logic: implement complex actions, maintain state between different steps of a workflow
  Access different data management systems

Page 43

Database Management Tier

One or more standard database management systems: Oracle, DB2, SQL Server, MySQL

Page 44

Example 1: Airline reservations

Build a system for making airline reservations

Database System

Application Server

Client Program

Page 45

Example 1: Airline reservations

Build a system for making airline reservations
Database System
  Airline info, available seats, customer info, etc.
Application Server
  Logic to make reservations, cancel reservations, add new airlines, etc.
Client Program
  Log in different users, display forms and human readable output
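The division of work between the three tiers can be sketched in a few lines of Python with the standard sqlite3 module. Everything here is a simplified assumption: the Flights and Reservations tables, the function names, and the fact that all three tiers run in one process (in a real deployment they would be a browser, an application server, and a separate DBMS).

```python
import sqlite3


# --- Data management tier: stores flights and reservations (hypothetical schema) ---
def open_database():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Flights (fid INTEGER PRIMARY KEY, seats_left INTEGER NOT NULL)"
    )
    conn.execute("CREATE TABLE Reservations (fid INTEGER, customer TEXT)")
    conn.execute("INSERT INTO Flights VALUES (100, 1)")  # one seat left on flight 100
    conn.commit()
    return conn


# --- Middle tier: application logic, the only code that talks to the database ---
def make_reservation(conn, fid, customer):
    with conn:  # one transaction: take a seat and record the booking together
        cur = conn.execute(
            "UPDATE Flights SET seats_left = seats_left - 1 "
            "WHERE fid = ? AND seats_left > 0",
            (fid,),
        )
        if cur.rowcount == 0:
            return False  # unknown flight or no seats left
        conn.execute("INSERT INTO Reservations VALUES (?, ?)", (fid, customer))
    return True


# --- Presentation tier: turns results into human-readable output ---
def render(customer, fid, ok):
    status = "confirmed" if ok else "not available"
    return f"{customer}: flight {fid} {status}"
```

The presentation function never sees SQL and the database never sees display text, so either end could be replaced without touching the reservation logic.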

Page 46

Three-Tier Architecture: Advantages

Heterogeneous systems
  Tiers can be independently maintained, modified, and replaced
Thin clients
  Only presentation layer at clients (web browsers)
Integrated data access
  Several database systems can be handled transparently at the middle tier
  Central management of connections
Scalability
  Replication at middle tier permits scalability of business logic
Software development
  Code for business logic is centralized
  Interaction between tiers through well-defined APIs: Can reuse standard components at each tier

Page 47

Technologies

Client Program (Web Browser): HTML, Javascript, XSLT
Application Server: XML, C#, Cookies, XPath, web services
Database Management System: SQL, Stored Procedures

Page 48

Next: Microsoft Access

DBMS + Application Server

Page 49

Page 50

Relational DB => “relate tables”

Tables are related by “keys” which uniquely identify a record in a table
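How keys relate two tables can be sketched with Python's standard sqlite3 module. The Orders table, the sample rows, and the join query are hypothetical and only illustrate the idea: each order stores the cid key of the customer it belongs to, and a join follows that key back to the customer record.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# cid uniquely identifies a record in Customers (its primary key).
conn.execute("CREATE TABLE Customers (cid INTEGER PRIMARY KEY, name TEXT)")
# Hypothetical Orders table: its cid column refers to a customer's key.
conn.execute(
    "CREATE TABLE Orders ("
    "oid INTEGER PRIMARY KEY, cid INTEGER REFERENCES Customers(cid), item TEXT)"
)
conn.executemany("INSERT INTO Customers VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?)",
    [(10, 1, "book"), (11, 1, "pen"), (12, 2, "lamp")],
)

# Relate the two tables by matching the key values.
rows = conn.execute("""
    SELECT Customers.name, Orders.item
    FROM Customers JOIN Orders ON Orders.cid = Customers.cid
    ORDER BY Orders.oid
""").fetchall()
```

Customer names are stored once in Customers and reached from Orders through the key, which avoids repeating the name in every order.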