introduction to dbms.pdf


Upload: nishima13

Post on 01-Jan-2016


DESCRIPTION

Gives an introduction to the subject of database management systems (DBMS): what a database management system is. An introduction for beginners.

TRANSCRIPT

Page 1: Introduction to dbms.pdf

What is a database?

data            database
information     information base
knowledge       knowledge base
wisdom          ???
philosophy      !!!
???             ...

020-Intro: 1HKU CSIS0278[AB] 2002-2003

Introduction to Database Systems

Page 2: Introduction to dbms.pdf

What is a database

◆ A database is a collection of data items
• Usually owned by a single enterprise or organization
• Contains facts the enterprise or organization cares about
◆ The data items can be text, numbers, dates, sound files, music, and video, among others
◆ Searched by using a key


Page 3: Introduction to dbms.pdf

Database application example: Using an ATM

◆ A database in the bank keeps data about your account
◆ Passwords are verified to allow transactions to be done on your account
◆ Transactions are recorded in the central database of the bank
◆ The database ensures that no two transactions can be done in parallel in a way that creates anomalies


Page 4: Introduction to dbms.pdf

Database application example: Searching books in a library

◆ The library INNOPAC system keeps data such as book titles, call numbers, locations, tables of contents, and user loan records
◆ The content of the database is searched when you query it for records of the title of a book
◆ The loan status of a book and the borrowing status of the user are changed when you check out a book
◆ The system allows multiple database transactions to be carried out at the same time


Page 5: Introduction to dbms.pdf

Database application example: Purchasing from a supermarket

◆ The supermarket database keeps data such as product bar codes, product names, and prices
◆ Products are scanned at the checkout counter and their prices are looked up
◆ Promotion discount information is also kept
◆ The database is also used in the acquisition of products by the supermarket


Page 6: Introduction to dbms.pdf

Database application example: ICQ (uh oh!)

◆ An ICQ server is a database containing user information, contact lists (from 2001b on), and online status
◆ When you connect, your ICQ number is sent to an ICQ server
◆ The server checks the online status of those in your contact list and shows them in your list
◆ The server also informs those in your contact list of your online status


Page 7: Introduction to dbms.pdf

Database application example: Hostname resolution

◆ Host names (e.g., virtue.csis.hku.hk) need to be resolved into IP addresses (e.g., 147.8.176.10)
◆ Each machine may keep a host table in which host-name-to-IP-address mappings are kept
◆ The table can be seen as a local database
◆ If a name is not found in the table, the machine may consult a Domain Name System (DNS) server
◆ The DNS server may be located on a remote machine
◆ The DNS server may refer your request to a higher-level DNS server for address resolution
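The lookup order described on this slide can be sketched as follows. This is only an illustration: `resolve` and `fake_dns_lookup` are hypothetical names, and the "remote" server is a stand-in dictionary rather than a real DNS query.

```python
def resolve(host_name, host_table, dns_lookup):
    """Look in the local host table first; fall back to DNS if not found."""
    if host_name in host_table:
        return host_table[host_name]
    return dns_lookup(host_name)


# Local host table: a tiny local database keyed by host name.
host_table = {"virtue.csis.hku.hk": "147.8.176.10"}


def fake_dns_lookup(host_name):
    # Hypothetical stand-in for a remote DNS server (assumption, not real DNS).
    remote = {"www.example.org": "192.0.2.1"}
    return remote.get(host_name)


local_answer = resolve("virtue.csis.hku.hk", host_table, fake_dns_lookup)
remote_answer = resolve("www.example.org", host_table, fake_dns_lookup)
print(local_answer, remote_answer)
```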


Page 8: Introduction to dbms.pdf

Advanced database applications: Deriving data from databases

◆ Aggregate query:

• Given a database about your favorite singer:

◦ album titles

◦ album release date

◦ song titles in each album

• How many songs does her/his latest album contain?

• How many songs has he/she released in total?
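As a rough sketch, the two questions above can be answered with aggregate queries in SQL, here run through Python's built-in `sqlite3` module. The table names, column names, and sample rows are all made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE album (id INTEGER PRIMARY KEY, title TEXT, release_date TEXT);
    CREATE TABLE song  (album_id INTEGER REFERENCES album(id), title TEXT);
    -- Hypothetical sample data
    INSERT INTO album VALUES (1, 'First Album',  '2000-05-01'),
                             (2, 'Latest Album', '2002-09-15');
    INSERT INTO song VALUES (1, 'Song A'), (1, 'Song B'), (1, 'Song C'),
                            (2, 'Song D'), (2, 'Song E');
""")

# How many songs does the latest album contain?
latest_count = conn.execute("""
    SELECT COUNT(*) FROM song
    WHERE album_id = (SELECT id FROM album ORDER BY release_date DESC LIMIT 1)
""").fetchone()[0]

# How many songs have been released in total?
total_count = conn.execute("SELECT COUNT(*) FROM song").fetchone()[0]

print(latest_count, total_count)
```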


Page 9: Introduction to dbms.pdf

Advanced database applications: Discovering information

◆ Discovering information. Remember the spectrum from data to philosophy?
◆ Association rule mining:
• Given: a supermarket database containing transaction information about the sets of items bought together by customers
• Which combinations of items are most frequently bought together?
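A minimal sketch of the first step of this idea, counting how often each pair of items appears in the same transaction. The baskets are invented sample data, and a full association rule miner would go further (larger item sets, confidence thresholds).

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: each set is one customer's basket.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "chips"},
    {"bread", "butter"},
    {"bread", "milk", "chips"},
]

pair_counts = Counter()
for basket in transactions:
    # Sort so that each pair is always counted under the same key.
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

most_common_pair, support = pair_counts.most_common(1)[0]
print(most_common_pair, support)
```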


Page 10: Introduction to dbms.pdf

What are the main issues of database design?


Page 11: Introduction to dbms.pdf

{effective, efficient} × {storage, retrieval} of data items


Page 12: Introduction to dbms.pdf

Effective retrieval

◆ Convenient and painless retrieval of data
◆ Special programs are available to suit the application
◆ Example: ATMs retrieve your account information effectively: only a card and a few keypresses are needed. (By the way, what does ATM stand for?)


Page 13: Introduction to dbms.pdf

Efficient retrieval

◆ Fast response to retrieval requests
◆ Enterprises maintain huge databases
• How many credit cards do you have?
• How many mobile phone numbers are there in Hong Kong? (Visit the OFTA site for an answer)
◆ Indexes on data are required for efficient data access
◆ Concurrent access to databases


Page 14: Introduction to dbms.pdf

Effective storage

◆ Convenient creation and modification of data
◆ Retrieved data is consistent with stored data
◆ Special programs are available to suit the application
◆ Example: file systems are effective in the storage of digital data
◆ What will happen if one process deletes a file while another is accessing it in Unix? In Windows?


Page 15: Introduction to dbms.pdf

Efficient storage

◆ Data items use up only a limited amount of storage space
◆ Enterprises maintain huge databases
• How much disk space is needed for a credit card database in which each record is 64 KB in size?
◆ Reduction of redundant information is needed
◆ Sharing of information via suitable database design
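The disk space question comes down to one multiplication once a record count is chosen. The slide gives no count, so the figure of one million records below is purely an assumption, as is reading "64k" as 64 × 1024 bytes.

```python
RECORD_SIZE_BYTES = 64 * 1024   # record size from the slide, read as 64 KiB
ASSUMED_RECORDS = 1_000_000     # hypothetical number of credit card records

total_bytes = RECORD_SIZE_BYTES * ASSUMED_RECORDS
total_gib = total_bytes / 2**30

print(f"{total_bytes:,} bytes = {total_gib:.1f} GiB")
```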


Page 16: Introduction to dbms.pdf

Database management systems

◆ To achieve the effectiveness and efficiency goals, we need database management systems (DBMSes)
◆ A DBMS should:
• Hide low-level implementation details of the database from most users
• Provide database operations
• Implement database operations efficiently
• Allow multiple users to access the database concurrently


Page 17: Introduction to dbms.pdf

File systems store data effectively, so why not use flat files for database storage?


Page 18: Introduction to dbms.pdf

Flat file

◆ General-purpose operating systems support file systems of one kind or another
◆ A file can be seen as a stream of bytes
◆ Data items can be serialized and modeled as a stream of bytes
◆ Files can be used to implement databases


Page 19: Introduction to dbms.pdf

Flat file example

Savings Account:
array of record
    accountNo: char(10);       (* unique account number *)
    balance: integer;          (* balance *)
    name: char(18);            (* customer name *)
    address: char(64);         (* customer address *)
end record;

Current Account:
array of record
    accountNo: char(10);       (* unique account number *)
    balance: integer;          (* balance *)
    overdraftLimit: integer;   (* overdraft limit *)
    name: char(18);            (* customer name *)
    address: char(64);         (* customer address *)
end record;


Page 20: Introduction to dbms.pdf

Flat files as byte streams

Savings Account:
    accountNo: char(10);
    balance: integer;
    name: char(18);
    address: char(64);

0102000001????Ogino Chihiro Somewhere in Japan
0102000002????Haku Aburaya
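To make the byte-stream view concrete, here is a small sketch that serializes two Savings Account records with Python's `struct` module. The balances (shown as `????` on the slide) are invented, and the big-endian byte order and zero-byte padding of the text fields are assumptions about the file format.

```python
import struct

# accountNo: char(10), balance: 4-byte integer, name: char(18), address: char(64)
# ">" means big-endian with no alignment padding (an assumption about the format).
SAVINGS = struct.Struct(">10si18s64s")


def pack_savings(account_no, balance, name, address):
    """Serialize one record; short strings are padded with zero bytes."""
    return SAVINGS.pack(account_no.encode("ascii"), balance,
                        name.encode("ascii"), address.encode("ascii"))


# Hypothetical balances: the slide does not give them.
stream = (pack_savings("0102000001", 1000, "Ogino Chihiro", "Somewhere in Japan")
          + pack_savings("0102000002", 2500, "Haku", "Aburaya"))

record_size = SAVINGS.size                       # bytes per record
second = SAVINGS.unpack_from(stream, record_size)  # record 2 starts at offset record_size
second_name = second[2].rstrip(b"\x00").decode("ascii")
print(record_size, len(stream), second_name, second[1])
```

Because every record has the same fixed size, the n-th record can be located by multiplying its index by `record_size`, which is exactly the kind of format knowledge a program reading the file must have.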


Page 21: Introduction to dbms.pdf

Flat files as byte streams

Current Account:
    accountNo: char(10);
    balance: integer;
    overdraftLimit: integer;
    name: char(18);
    address: char(64);

0102000001????????Ogino Chihiro Somewhere in Japan
0102000002????????Haku Aburaya


Page 22: Introduction to dbms.pdf

So what?

◆ The file formats are different
◆ Discrepancy is possible because different people may handle different parts of the database
◆ How do you synchronize data in different address books?
• Mobile phone book
• Society member/classmate list
• Little handy cards prepared by friends
• The one in your spreadsheet file
• The one in your PDA


Page 23: Introduction to dbms.pdf

Problems!

◆ Data redundancy

• Wastes storage space

• May cause data inconsistency

◆ Data dependence

• Causes proliferation of application programs

• Data correctness depends on file formats

◆ Data isolation

◆ Atomicity problem

◆ Concurrent access anomalies

◆ Data access control problem


Page 24: Introduction to dbms.pdf

Data redundancy

Savings Account:
0102000001????Ogino Chihiro Somewhere in Japan
0102000002????Haku Aburaya

Current Account:
0102000001????????Ogino Chihiro Somewhere in Japan
0102000002????????Haku Aburaya

◆ The same piece of information may be recorded multiple times
• Names of account owners
• Addresses of account owners
◆ Wastes storage space
◆ May cause data inconsistency


Page 25: Introduction to dbms.pdf

Data inconsistency

Savings Account:
0102000001????Ogino Chihiro Somewhere in Japan
0102000002????Haku Aburaya

Current Account:
0102000001????????OGINO Chihiro Somewhere in Japan
0102000002????????Haku Aburaya

◆ Chihiro’s name is not consistent across the two files
◆ Suppose Chihiro has changed her address to Tokyo. Under what circumstances would the two files become inconsistent?


Page 26: Introduction to dbms.pdf

Data dependence

◆ List the names of all customers who live in Pokfulam: write a special program
◆ List the names of all customers who live in Pokfulam and have more than 10000 dollars in their balance: write another special program
◆ A special program needs to be written for every query


Page 27: Introduction to dbms.pdf

File format dependence

◆ Correctness of data may depend on the file format
◆ For example, the integer 291 decimal, which is 123 hexadecimal, is stored as follows:
    Big-endian mode:    00 00 01 23
    Little-endian mode: 23 01 00 00
◆ C and C++ storage depends on the byte order (endianness) of the CPU
◆ Java stores integers in big-endian order
◆ Correctly interpreting the integer depends on the CPU and programming language used
◆ 587268096 (the bytes 00 00 01 23 read in the wrong byte order) is very different from 291!
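The numbers on this slide can be checked with a few lines of Python. This sketch uses the standard `struct` module with explicit byte-order prefixes, so the result does not depend on the CPU it runs on.

```python
import struct

big = struct.pack(">i", 291)      # 4-byte signed integer, big-endian
little = struct.pack("<i", 291)   # same value, little-endian

print(big.hex(" "))     # 00 00 01 23
print(little.hex(" "))  # 23 01 00 00

# Reading the big-endian bytes as if they were little-endian gives the wrong value.
misread = struct.unpack("<i", big)[0]
print(misread)          # 587268096
```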


Page 28: Introduction to dbms.pdf

Data access: what and how

◆ Programs tell the computer how to obtain the required data
◆ Example: the 11th to 14th bytes of the Savings Account file contain the balance as an integer stored in big-endian format
◆ User queries specify what is needed
◆ Example: what is the address of Haku?
◆ Programmers who know the file format can transform the “what” to the “how”
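A sketch of the "how" side, assuming the Savings Account layout from the earlier slides (10-byte account number, 4-byte big-endian balance, 18-byte name, 64-byte address). The file is simulated in memory with `io.BytesIO`, and the balance of 2500 is invented.

```python
import io
import struct

# Build one hypothetical record so there is something to read.
record = struct.pack(">10si18s64s", b"0102000002", 2500, b"Haku", b"Aburaya")
savings_file = io.BytesIO(record)

# "How": the program must know exactly where each field lives.
savings_file.seek(10)                  # skip accountNo (bytes 1 to 10)
raw_balance = savings_file.read(4)     # bytes 11 to 14
balance = int.from_bytes(raw_balance, byteorder="big", signed=True)

savings_file.seek(10 + 4 + 18)         # skip accountNo, balance, and name
address = savings_file.read(64).rstrip(b"\x00").decode("ascii")

print(balance, address)
```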


Page 29: Introduction to dbms.pdf

More “what”, less “how”

◆ A way out: hide implementation details (file formats, byte order issues, etc.) from users
◆ Query languages are designed to do that: users only need to write statements that tell a DBMS what they want, rather than programs that contain instructions on how to obtain the required data
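For comparison, here is how the question "what is the address of Haku?" might look as a query, using SQL through Python's built-in `sqlite3` module. The table layout and balances are assumptions made for this sketch; the point is that the statement names the data wanted and says nothing about byte offsets or byte order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE savings (account_no TEXT PRIMARY KEY, balance INTEGER,"
    " name TEXT, address TEXT)"
)
# Hypothetical balances: the slides show them only as "????".
conn.executemany("INSERT INTO savings VALUES (?, ?, ?, ?)", [
    ("0102000001", 1000, "Ogino Chihiro", "Somewhere in Japan"),
    ("0102000002", 2500, "Haku", "Aburaya"),
])

# "What": state the data wanted; the DBMS works out how to fetch it.
row = conn.execute(
    "SELECT address FROM savings WHERE name = ?", ("Haku",)
).fetchone()
haku_address = row[0]
print(haku_address)
```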


Page 30: Introduction to dbms.pdf

Data isolation

◆ Find the names of all customers who have both savings and current accounts: how?
◆ What if there are more account types?
◆ Scattering data in different files (probably handled by different people) makes programs that require access to more than one file difficult to write
◆ DBMSes provide a central repository of data shared among different users, so that the data isolation problem can be avoided
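Once both account types live in one repository, the "both accounts" question becomes a single query. The sketch below uses `sqlite3`; the table names are made up, matching customers by name is a simplification, and the customer "Lin" is a hypothetical extra row added so that the result is not simply everyone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE savings_account (account_no TEXT, name TEXT);
    CREATE TABLE current_account (account_no TEXT, name TEXT);
    INSERT INTO savings_account VALUES ('0102000001', 'Ogino Chihiro'),
                                       ('0102000002', 'Haku'),
                                       ('0102000003', 'Lin');   -- hypothetical
    INSERT INTO current_account VALUES ('0102000001', 'Ogino Chihiro'),
                                       ('0102000002', 'Haku');
""")

# Names that appear in both tables.
both = [row[0] for row in conn.execute("""
    SELECT name FROM savings_account
    INTERSECT
    SELECT name FROM current_account
    ORDER BY name
""")]
print(both)
```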


Page 31: Introduction to dbms.pdf

Atomicity problem: A fund transfer scenario

◆ Suppose Chihiro wants to transfer HKD 500 from her Savings account to Haku’s Current account
◆ A program for fund transfer can be used to handle that
◆ The program has to modify the contents of files for both accounts
◆ Assume that the files used are Savings account and Current account


Page 32: Introduction to dbms.pdf

Atomicity problem: Fund transfer steps

1. Open the file Savings account
2. Open the file Current account
3. Retrieve record for Chihiro’s savings account
4. Deduct HKD 500 from Chihiro’s record
5. Write the updated record to the Savings account file
6. Retrieve record for Haku’s current account
7. Add HKD 500 to Haku’s record
8. Write the updated record to the Current account file
9. Close the Current account file
10. Close the Savings account file


Page 33: Introduction to dbms.pdf

Fund transfer troubles

1. Open the file Savings account
2. Open the file Current account
3. Retrieve record for Chihiro’s savings account
4. Deduct HKD 500 from Chihiro’s record
5. Write the updated record to the Savings account file
6. Retrieve record for Haku’s current account
7. Add HKD 500 to Haku’s record
8. Write the updated record to the Current account file
9. Close the Current account file
10. Close the Savings account file

◆ Assume that the file system doesn’t do buffering; write operations are immediately reflected in files. What if the system crashes after Step ??? Step ??? Step ??? Step ???

Page 34: Introduction to dbms.pdf

Fund transfer troubles

◆ What if there is buffering?


Page 35: Introduction to dbms.pdf

Atomicity

◆ Computers may fail: power failure, disk crash, virus infection, hacker intrusion, ...
◆ These should not cause data corruption
◆ Half-executed transactions (e.g., fund transfer operations) should be completed or undone
◆ Transactions should be atomic
◆ All-or-nothing property needed
◆ DBMSes handle database transactions atomically
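The all-or-nothing behaviour can be illustrated with SQLite transactions from Python. This is only a sketch: the table, the starting balances, and the `crash_midway` flag are invented, and an exception raised inside the transaction stands in for a real system crash.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (owner TEXT PRIMARY KEY, balance INTEGER)")
# Hypothetical starting balances.
conn.executemany("INSERT INTO account VALUES (?, ?)",
                 [("Chihiro", 1000), ("Haku", 200)])
conn.commit()


def transfer(conn, src, dst, amount, crash_midway=False):
    """Move money between accounts inside a single transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE account SET balance = balance - ? WHERE owner = ?",
                         (amount, src))
            if crash_midway:
                raise RuntimeError("simulated failure between the two updates")
            conn.execute("UPDATE account SET balance = balance + ? WHERE owner = ?",
                         (amount, dst))
    except RuntimeError:
        pass  # the deduction has already been undone by the rollback


def balances(conn):
    return dict(conn.execute("SELECT owner, balance FROM account"))


transfer(conn, "Chihiro", "Haku", 500, crash_midway=True)
after_failure = balances(conn)   # unchanged: the half-done transfer was undone

transfer(conn, "Chihiro", "Haku", 500)
after_success = balances(conn)   # both updates applied together
print(after_failure, after_success)
```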


Page 36: Introduction to dbms.pdf

Concurrent access: Handling deposits

◆ Suppose the following method is used in a bank to handle deposits:

void deposit(Acct acct, double sum) {
    acct.open();
    double bal = acct.getBalance();
    bal += sum;
    acct.setBalance(bal);
    acct.close();
}


Page 37: Introduction to dbms.pdf

A serial deposit scenario

Deadline is far, only early birds pay the tuition fee:

Time↓

// in HKU branch                  HKUAcct.   // in Central branch
deposit(Account.HKUAcct,21050)    balance    deposit(Account.HKUAcct,21050)
------------------------------    --------   ------------------------------
acct.open();                      0
double bal=acct.getBalance();     0
bal+=sum;                         0
acct.setBalance(bal);             21050
acct.close();                     21050
                                  21050      acct.open();
                                  21050      double bal=acct.getBalance();
                                  21050      bal+=sum;
                                  42100      acct.setBalance(bal);
                                  42100      acct.close();


Page 38: Introduction to dbms.pdf

A parallel deposit scenario

Today is the deadline, people rush to pay the tuition fee:

Time↓

// in HKU branch                  HKUAcct.   // in Central branch
deposit(Account.HKUAcct,21050)    balance    deposit(Account.HKUAcct,21050)
------------------------------    --------   ------------------------------
acct.open();                      42100
double bal=acct.getBalance();     42100
bal+=sum;                         42100
                                  42100      acct.open();
                                  42100      double bal=acct.getBalance();
                                  42100      bal+=sum;
                                  63150      acct.setBalance(bal);
                                  63150      acct.close();
acct.setBalance(bal);             63150
acct.close();                     63150

Two people have paid, why isn’t the balance 84200?
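The interleaving in the table can be replayed step by step in a single thread, which makes the lost update easy to see. The `Account` class here is a hypothetical stand-in for the slide's `Acct`.

```python
class Account:
    """Hypothetical stand-in for the slide's Acct type."""

    def __init__(self, balance):
        self.balance = balance

    def get_balance(self):
        return self.balance

    def set_balance(self, value):
        self.balance = value


acct = Account(42100)

# HKU branch reads the balance and updates its local copy.
hku_bal = acct.get_balance()        # 42100
hku_bal += 21050                    # local copy: 63150

# Central branch runs its whole deposit before HKU writes back.
central_bal = acct.get_balance()    # still 42100
central_bal += 21050                # local copy: 63150
acct.set_balance(central_bal)       # stored balance: 63150

# HKU branch now writes its stale result, overwriting Central's deposit.
acct.set_balance(hku_bal)           # stored balance: 63150, not 84200

final_balance = acct.get_balance()
print(final_balance)
```

Both branches computed their new balance from the same old value, so the second write silently replaces the first instead of adding to it.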


Page 39: Introduction to dbms.pdf

Concurrency control

◆ Changes in local copies of data items do not affect others using the data item
◆ Read/write access to files is not restricted
◆ Concurrency control needed: disallow conflicting accesses to the same data item by different transactions
◆ A simple example: lock the record whenever it needs to be accessed
◆ Transactions may need to wait, abort, or restart
◆ Deadlock/livelock problem
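A minimal sketch of the locking idea, using a `threading.Lock` so that the read-modify-write of a deposit cannot interleave with another deposit. `LockedAccount` is a hypothetical class; a real DBMS uses far more elaborate locking schemes.

```python
import threading


class LockedAccount:
    """Hypothetical account whose deposits are serialized by a lock."""

    def __init__(self, balance):
        self.balance = balance
        self._lock = threading.Lock()

    def deposit(self, amount):
        # Only one thread at a time may run the read-modify-write sequence;
        # any other depositor waits here until the lock is released.
        with self._lock:
            bal = self.balance
            bal += amount
            self.balance = bal


acct = LockedAccount(42100)
threads = [threading.Thread(target=acct.deposit, args=(21050,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(acct.balance)
```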


Page 40: Introduction to dbms.pdf

Data access control

◆ The direct sales department of the bank only needs to know customer names and addresses
◆ Account balance is sensitive information
◆ With flat files, there is no way to restrict access to only these two fields
◆ Difficult to limit access to part of a flat file
◆ DBMSes provide different views of databases (subsets of data in the database) to different users
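A view that exposes only names and addresses might be defined as below. The table and view names are invented for this sketch. SQLite itself has no user accounts, so the example shows only the view; in a multi-user DBMS the sales department would additionally be granted access to the view but not to the underlying table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE savings (account_no TEXT PRIMARY KEY, balance INTEGER,"
    " name TEXT, address TEXT)"
)
# Hypothetical balances.
conn.executemany("INSERT INTO savings VALUES (?, ?, ?, ?)", [
    ("0102000001", 1000, "Ogino Chihiro", "Somewhere in Japan"),
    ("0102000002", 2500, "Haku", "Aburaya"),
])

# The view is a subset of the data: balances and account numbers are left out.
conn.execute("CREATE VIEW direct_sales AS SELECT name, address FROM savings")

cursor = conn.execute("SELECT * FROM direct_sales ORDER BY name")
visible_columns = [col[0] for col in cursor.description]
rows = cursor.fetchall()
print(visible_columns, rows)
```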


Page 41: Introduction to dbms.pdf

How can DBMSes handle all those problems?

◆ Data redundancy
• Wastes storage space
• May cause data inconsistency
◆ Data dependence
• Causes proliferation of application programs
• Data correctness depends on file formats
◆ Data isolation
◆ Atomicity problem
◆ Concurrent access anomalies
◆ Data access control problem

Let’s see what DBMSes can offer.
