Instructor: Peng Zhiyong, Luojia Scholar Distinguished Professor, School of Computer Science, Wuhan University...


Upload: daniela-pope

Post on 16-Jan-2016


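Page 1: Instructor: Peng Zhiyong, Luojia Scholar Distinguished Professor, School of Computer Science, Wuhan University; State Key Laboratory of Software Engineering; Tel: 87653196; Email: peng@whu.edu.cn

Instructor

Peng Zhiyong, Luojia Scholar Distinguished Professor, School of Computer Science, Wuhan University

State Key Laboratory of Software Engineering, Tel: 87653196

Email: peng@whu.edu.cn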

Page 2:

Book

Classic Original Editions Series: Database System Implementation

(USA) Hector Garcia-Molina, Jeffrey D. Ullman, Jennifer Widom

(Stanford University)

China Machine Press

Page 3:

Marking Scheme

• Assignments (3): 15%
• Small Tests (3): 15%
• Final Examination: 70%
• Total: 100%

Page 4:

Practice

• Install the PostgreSQL system
• Analyze the PostgreSQL source code
• Improve the PostgreSQL system

http://www.postgresql.org

Page 5:

Database System Implementation

Hector Garcia-Molina

Jeffrey D. Ullman

Jennifer Widom

Page 6:

Chapter 1

Introduction to DBMS Implementation

Page 7:

Database Management System

A database management system (DBMS) is a powerful tool for creating and managing large amounts of data efficiently and allowing it to persist over long periods of time, safely.

Page 8:

Capabilities of a DBMS

• Persistent storage
• Programming interface: allowing the user to access and modify data through a powerful query language.
• Transaction management: supporting concurrent access to data and resiliency (i.e., recovering from failures or errors).

Page 9:

Terminology Review

• Data
• Database: a collection of data, well organized for access and modification, preserved over a long period.
• Query
• Relation: an organization of data into a two-dimensional table.
• Schema (Metadata): a description of the structure of the data.

Page 10:

A Simple DBMS: Megatron 2000

Megatron 2000 is a relational database management system which supports the SQL query language.

Page 11:

Megatron 2000 Implementation

The Relation Students(name, id, dept)

Data: /usr/db/students

Smith#123#CS
Johnson#522#EE
……

Schema: /usr/db/schema

Students#name#STR#id#INT#dept#STR
Depts#name#STR#office#STR
……
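The file layout above can be read with a few lines of code. The following is a minimal Python sketch, not Megatron 2000's actual implementation: the helper names `parse_schema` and `parse_tuple` are hypothetical, and the sample lines are the ones shown on the slide.

```python
# Hypothetical helpers illustrating the '#'-separated layout shown above.

def parse_schema(lines):
    """Map each relation name to a list of (attribute, type) pairs."""
    schema = {}
    for line in lines:
        fields = line.rstrip("\n").split("#")
        name, rest = fields[0], fields[1:]
        # Attribute names and types alternate: name#STR#id#INT#...
        schema[name] = list(zip(rest[0::2], rest[1::2]))
    return schema


def parse_tuple(line, attributes):
    """Split one data line on '#' and convert INT fields to Python ints."""
    values = line.rstrip("\n").split("#")
    return {
        attr: int(value) if typ == "INT" else value
        for (attr, typ), value in zip(attributes, values)
    }


schema = parse_schema([
    "Students#name#STR#id#INT#dept#STR",
    "Depts#name#STR#office#STR",
])
students = [
    parse_tuple(line, schema["Students"])
    for line in ["Smith#123#CS", "Johnson#522#EE"]
]
print(students[1])
```

Note that a value containing the `#` character would break this format, which is one reason a plain delimited text file is a fragile tuple layout.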

Page 12:

Execution of Megatron 2000 DBMS

dbhost> megatron2000

WELCOME TO MEGATRON 2000 !

& SELECT *
  FROM Students #

name      id    dept

Smith     123   CS
Johnson   522   EE

Page 13:

& SELECT *
  FROM Students
  WHERE id >= 500 | HighID #

/usr/db/HighID

Johnson#522#EE

Page 14:

How Megatron 2000 Executes Queries

SELECT * FROM R WHERE <Condition>

1. Read the file schema to determine the attributes of relation R and their types.
2. Check that the <Condition> is semantically valid for R.
3. Display each of the attribute names as the header of a column, and draw a line.
4. Read the file named R, and for each line:
   (a) Check the condition, and
   (b) Display the line as a tuple, if the condition is true.
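The four steps for a displayed SELECT can be sketched in Python. This is an illustration under stated assumptions, not the Megatron 2000 code: the files are replaced by in-memory lists of lines, the condition is restricted to a hypothetical `(attribute, operator, constant)` triple, and the function name `select_display` is made up for this example.

```python
import operator

# Comparison operators the sketch accepts in a condition (an assumption).
OPS = {"=": operator.eq, "<": operator.lt, ">": operator.gt,
       "<=": operator.le, ">=": operator.ge}


def select_display(schema_lines, data_lines, relation, condition):
    """Return the display lines for SELECT * FROM relation WHERE condition."""
    # Step 1: read the schema to find the attributes of the relation and their types.
    attrs = None
    for line in schema_lines:
        fields = line.split("#")
        if fields[0] == relation:
            attrs = list(zip(fields[1::2], fields[2::2]))
    if attrs is None:
        raise ValueError(f"unknown relation {relation}")

    # Step 2: check that the condition is semantically valid for the relation.
    attr, op, const = condition
    types = dict(attrs)
    if attr not in types or op not in OPS:
        raise ValueError("invalid condition")
    if (types[attr] == "INT") != isinstance(const, int):
        raise ValueError("type mismatch in condition")

    # Step 3: column header followed by a line.
    names = [name for name, _ in attrs]
    out = ["\t".join(names), "-" * 24]

    # Step 4: scan every line, test the condition, keep the matching tuples.
    pos = names.index(attr)
    for line in data_lines:
        values = line.split("#")
        value = int(values[pos]) if types[attr] == "INT" else values[pos]
        if OPS[op](value, const):
            out.append("\t".join(values))
    return out


lines = select_display(
    ["Students#name#STR#id#INT#dept#STR", "Depts#name#STR#office#STR"],
    ["Smith#123#CS", "Johnson#522#EE"],
    "Students",
    ("id", ">=", 500),
)
print("\n".join(lines))
```

Every query scans the whole relation in step 4, regardless of how few tuples qualify.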

Page 15:

SELECT * FROM R WHERE <Condition> | T

1. Read the file schema to determine the attributes of relation R and their types.
2. Check that the <Condition> is semantically valid for R.
3. Read the file named R, and for each line:
   (a) Check the condition, and
   (b) Write the result to a new file /usr/db/T, if the condition is true.
4. Add to the file /usr/db/schema an entry for T that looks just like the entry for R, except that relation name T replaces R. That is, the schema for T is the same as the schema for R.
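A minimal Python sketch of the "pipe into T" variant follows. It is an assumption-laden illustration: a temporary directory stands in for /usr/db, the data file is named after the relation, the condition is passed as a Python callable named `predicate` (so the semantic check of step 2 is omitted), and the function name `select_into` is hypothetical.

```python
import os
import tempfile


def select_into(db_dir, relation, predicate, target):
    """Write the matching lines of `relation` to `target` and register its schema."""
    schema_path = os.path.join(db_dir, "schema")
    with open(schema_path) as f:
        entries = [line.rstrip("\n") for line in f]

    # Step 1: find the schema entry for the source relation.
    entry = next(e for e in entries if e.split("#")[0] == relation)
    names = entry.split("#")[1::2]

    # Step 3: copy each line that satisfies the condition into the new file.
    with open(os.path.join(db_dir, relation)) as src, \
            open(os.path.join(db_dir, target), "w") as dst:
        for line in src:
            row = dict(zip(names, line.rstrip("\n").split("#")))
            if predicate(row):
                dst.write(line)

    # Step 4: the new relation gets the same schema under its own name.
    with open(schema_path, "a") as f:
        f.write("#".join([target] + entry.split("#")[1:]) + "\n")


with tempfile.TemporaryDirectory() as db_dir:
    with open(os.path.join(db_dir, "schema"), "w") as f:
        f.write("Students#name#STR#id#INT#dept#STR\n")
    with open(os.path.join(db_dir, "Students"), "w") as f:
        f.write("Smith#123#CS\nJohnson#522#EE\n")

    select_into(db_dir, "Students", lambda r: int(r["id"]) >= 500, "HighID")

    with open(os.path.join(db_dir, "HighID")) as f:
        high_id = f.read()
    with open(os.path.join(db_dir, "schema")) as f:
        schema_text = f.read()

print(high_id, end="")
```

If the process stops between step 3 and step 4, the data file for T exists without a schema entry, which is a small instance of leaving an operation half done.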

Page 16:

Example 1.2

SELECT office
FROM Students, Depts
WHERE Students.name = 'Smith' AND Students.dept = Depts.name #

for (each tuple s in Students)
    for (each tuple d in Depts)
        if (s and d satisfy the WHERE-condition)
            display the office value from Depts;
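The nested-loop evaluation of this join can be written out directly. The Python sketch below mirrors the pseudocode; the `Depts` rows and office values are made-up sample data (the slides list none), and the function name `smith_offices` is hypothetical.

```python
students = [
    {"name": "Smith", "id": 123, "dept": "CS"},
    {"name": "Johnson", "id": 522, "dept": "EE"},
]
# Made-up sample data: the slides do not show the contents of Depts.
depts = [
    {"name": "CS", "office": "Room 101"},
    {"name": "EE", "office": "Room 202"},
]


def smith_offices(students, depts):
    """Examine every (student, department) pair, as the pseudocode does."""
    offices = []
    pairs_examined = 0
    for s in students:
        for d in depts:
            pairs_examined += 1
            if s["name"] == "Smith" and s["dept"] == d["name"]:
                offices.append(d["office"])
    return offices, pairs_examined


offices, pairs_examined = smith_offices(students, depts)
print(offices, pairs_examined)
```

The number of pairs examined is always the product of the two relation sizes, even though only the tuples for Smith can ever contribute to the answer.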

Page 17:

Problem (1) of Megatron 2000

Tuple layout on disk

The data layout on disk is not flexible, e.g.:

- Changing a string from 'Cat' to 'Cats' forces us to rewrite the file

- ASCII storage is expensive

- Deletions are expensive

Page 18:

Problem (2) of Megatron 2000

Search is expensive; there are no indexes, e.g.:

- Cannot find tuple with given key quickly

- Always have to read full relation
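To make the cost difference concrete, the Python sketch below compares a full scan with a lookup through an index on `id`. It is only an illustration: the in-memory dictionary stands in for an index (a real DBMS would typically keep a disk-based structure), and the synthetic rows and function names are assumptions.

```python
# Synthetic relation: (name, id, dept) tuples with ids 0..9999.
rows = [(f"student{i}", i, "CS") for i in range(10_000)]


def scan_lookup(rows, key):
    """Without an index: read tuples one by one until the key is found."""
    tuples_read = 0
    for row in rows:
        tuples_read += 1
        if row[1] == key:
            return row, tuples_read
    return None, tuples_read


# With an index: one pass to build it, then each lookup touches one entry.
index_on_id = {row[1]: row for row in rows}


def index_lookup(index, key):
    """Return the tuple with the given key, or None if it is absent."""
    return index.get(key)


found_by_scan, tuples_read = scan_lookup(rows, 9_999)
found_by_index = index_lookup(index_on_id, 9_999)
print(tuples_read, found_by_index)
```

The scan reads every tuple to find the last key, while the index goes straight to it; the price is that the index must be kept up to date on every insert, delete, and update.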

Page 19:

Problem (3) of Megatron 2000

Brute-force query processing: query processing is not clever.

e.g., select *
      from R, S
      where R.A = S.A and S.B > 1000

- Do select first?

- More efficient join?
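Both questions can be explored with a small experiment. The Python sketch below evaluates the same query twice: once by brute force over all pairs, and once by applying the selection `S.B > 1000` first and then joining through a hash table on `A`. The hash join is one possible "more efficient join", chosen here as an assumption; the data and function names are made up.

```python
from collections import defaultdict


def brute_force(R, S):
    """Test the full WHERE condition on every pair of tuples."""
    result, pairs = [], 0
    for r in R:
        for s in S:
            pairs += 1
            if r["A"] == s["A"] and s["B"] > 1000:
                result.append((r, s))
    return result, pairs


def select_then_hash_join(R, S):
    """Filter S first, then probe a hash table keyed on A."""
    by_a = defaultdict(list)
    for s in S:
        if s["B"] > 1000:          # selection applied before the join
            by_a[s["A"]].append(s)
    result = []
    for r in R:                    # one probe per tuple of R
        for s in by_a.get(r["A"], []):
            result.append((r, s))
    return result


# Synthetic data, purely for illustration.
R = [{"A": i} for i in range(100)]
S = [{"A": i % 100, "B": i * 10} for i in range(200)]

slow, pairs = brute_force(R, S)
fast = select_then_hash_join(R, S)
print(len(slow), len(fast), pairs)
```

Both strategies return the same tuples, but the second makes one pass over each relation instead of examining every pair.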

Page 20:

Problem (4) of Megatron 2000

• No buffer manager

There is no buffer in main memory, so nothing is cached between accesses; caching is needed.
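A buffer manager keeps recently used pages in main memory so that repeated accesses avoid the disk. The Python sketch below is a toy version under stated assumptions: the class name `BufferPool`, the least-recently-used eviction policy, and the `read_page` callback standing in for a disk read are all choices made for this example, not something the slides specify.

```python
from collections import OrderedDict


class BufferPool:
    """Toy page cache with least-recently-used (LRU) eviction."""

    def __init__(self, read_page, capacity):
        self.read_page = read_page    # stand-in for reading a page from disk
        self.capacity = capacity      # number of pages that fit in memory
        self.pages = OrderedDict()    # page_id -> contents, oldest first
        self.disk_reads = 0

    def get(self, page_id):
        if page_id in self.pages:
            # Hit: mark the page as most recently used.
            self.pages.move_to_end(page_id)
            return self.pages[page_id]
        # Miss: fetch from "disk" and evict the oldest page if over capacity.
        self.disk_reads += 1
        data = self.read_page(page_id)
        self.pages[page_id] = data
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)
        return data


pool = BufferPool(read_page=lambda pid: f"contents of page {pid}", capacity=2)
for pid in [1, 2, 1, 3, 1, 2]:
    pool.get(pid)
print(pool.disk_reads, list(pool.pages))
```

Six page requests cause only four disk reads here; without any buffering, each request would go to disk.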

Page 21:

Problem (5) of Megatron 2000

There is no concurrency control.

Page 22:

Problem (6) of Megatron 2000

• No reliability, e.g.:

- Can lose data

- Can leave operations half done

Page 23:

Problem (7) of Megatron 2000

No security, e.g.:

- File system insecure

- File system security is coarse

Page 24:

Problem (8) of Megatron 2000

• No application program interface (API), e.g., how can a payroll program get at the data?

Page 25:

Problem (9) of Megatron 2000

• Cannot interact with other DBMSs.

Page 26:

Problem (10) of Megatron 2000

• Poor dictionary facilities

Page 27:

Problem (11) of Megatron 2000

• No GUI

Page 28:

Overview of a Database Management System

[Figure: DBMS architecture diagram. Components: User/application, Database administrator, Query compiler, DDL compiler, Execution engine, Transaction manager, Logging and recovery, Concurrency control, Lock table, Index/file/record manager, Buffer manager, Buffers, Storage manager, Storage. Arrow labels: queries and updates; transaction commands; DDL commands; query plan; index, file, and record requests; page commands; read/write pages; log pages; data, metadata, indexes; metadata; statistics.]

Page 29:

Storage Management

Storage management is responsible for storing data, metadata, indexes, and logs. An important storage management component is the buffer manager, which keeps portions of the stored data in main memory.

Page 30:

Query Processing

A user or an application program initiates a query to extract data from the database. The query is parsed and optimized by a query compiler. The resulting query plan is executed by the execution engine.

Page 31:

Transaction Management

• Logging and Recovery

• Concurrency Control

• Deadlock Resolution

Page 32:

Course Overview

• Storage-Management Overview
  C2 Memory hierarchy
  C3 Storage of data elements
  C4 One-dimensional indexes
  C5 Multidimensional indexes

• Query Processing
  C6 Query execution
  C7 Query compiler and optimizer

• Transaction Processing
  C8 System failures
  C9 Concurrency control
  C10 More about transaction management

• Information integration

Page 33:

The course teaches students better ways of building a database management system.