Post on 12-Jan-2016

Physical Database Design

Barry Floyd

BUS 498: Advanced Database Management Systems

Introduction

The Physical Database Design Process

Goal is to translate our conceptual designs into physical reality

Draw on requirements analysis and our conceptual data model

Agenda

* Data Volume and Usage Analysis
* Data Distribution Strategy (discuss this later in the quarter)
* Indexes
* Denormalization

Overview

Important step in the database design process (also the last step)

Decisions made here impact ...
* data accessibility
* response times
* usability

Vocabulary

Data volume - how many records

Data usage - how often and in what manner are the records used

Data Volume Analysis

Use volume analysis to
* select physical storage devices
* estimate costs of storage

The example model has six entities. Three record counts are given and three are derived:

Entity      Records   Source
PHYSICIAN   50        GIVEN
ITEM        500       GIVEN
LOCATION    100       GIVEN
PATIENT     1000      DERIVED
TREATMENT   4000      DERIVED
CHARGE      10,000    DERIVED

PATIENT
* Keep patient record active for 30 days
* Average length of stay for a patient is 3 days
100 X 30 / 3 => 1000

TREATMENT
* Each patient has 4 treatments on average.
1000 X 4 => 4000

CHARGE
* Each patient has 10 charges on average.
1000 X 10 => 10,000

[Figure: data model diagram showing the six entities with their record counts; the relationships carry the numbers (10), (20), (4), (20), (10).]

KNOW ... Number of records and relationships
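The volume derivations in this example can be sketched in a few lines of Python. The variable names are illustrative. Reading the 100 in "100 X 30 / 3" as the LOCATION count is an assumption; the slides show only the arithmetic.

```python
# Record counts stated as GIVEN on the slides.
given = {"PHYSICIAN": 50, "ITEM": 500, "LOCATION": 100}

# Planning figures stated on the slides.
days_record_active = 30      # keep patient record active for 30 days
avg_length_of_stay = 3       # average stay, in days
treatments_per_patient = 4   # average
charges_per_patient = 10     # average

# Assumption: the 100 in "100 X 30 / 3" is the LOCATION count.
patients = given["LOCATION"] * days_record_active // avg_length_of_stay

volumes = dict(given)
volumes["PATIENT"] = patients
volumes["TREATMENT"] = patients * treatments_per_patient
volumes["CHARGE"] = patients * charges_per_patient

for entity, count in sorted(volumes.items()):
    print(f"{entity:<10} {count:>7,}")
```

Keeping the planning figures as named values makes it easy to redo the estimate when, say, the retention period changes.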

Data Usage Analysis

Identify the major transactions and processes that hit the database

Analyze each transaction and process to determine access paths used and frequency of use

Create a composite usage map from the individual analyses

Transaction Analysis Form

TRANSACTION NUMBER: MVCH-4
TRANSACTION NAME: CREATE PATIENT BILL
TRANSACTION VOLUME: AVERAGE 2/HR, PEAK 10/HR

NO.  NAME            ACCESS TYPE  TRAN REF  PERIOD REF
(1)  ENTRY-PATIENT   READ         1         10
(2)  PATIENT-CHARGE  READ         10        100
(3)  CHARGE-ITEM     READ         10        100

[Figure: access paths for this transaction. (1) enters at PATIENT (1000 records), (2) runs from PATIENT to CHARGE (10,000 records), and (3) runs from CHARGE to ITEM (500 records).]
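The PERIOD REF figures on the form match the references per transaction multiplied by the peak volume of 10 transactions per hour. A minimal sketch of that calculation follows; the function and variable names are illustrative, and the multiplication rule is inferred from the form's numbers rather than stated on the slides.

```python
PEAK_TRANSACTIONS_PER_HOUR = 10  # MVCH-4 peak volume from the form

# (path number, access path, access type, references per transaction)
access_paths = [
    (1, "ENTRY-PATIENT", "READ", 1),
    (2, "PATIENT-CHARGE", "READ", 10),
    (3, "CHARGE-ITEM", "READ", 10),
]

def references_per_period(refs_per_tran, trans_per_period):
    """References on one access path during the period (here, a peak hour)."""
    return refs_per_tran * trans_per_period

period_refs = {
    name: references_per_period(refs, PEAK_TRANSACTIONS_PER_HOUR)
    for _, name, _, refs in access_paths
}

for name, refs in period_refs.items():
    print(f"{name:<15} {refs:>4} references per peak hour")
```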

Composite Usage Map

Determine how the data structures are accessed for each transaction and process. Include programs and standard queries, both programmed and ad hoc.

[Figure: composite usage map. The data model (TREATMENT 4000, PATIENT 1000, PHYSICIAN 50, CHARGE 10,000, ITEM 500, LOCATION 100) is annotated with an access frequency on each path. The annotations are (75), (25), (30), (25), (200), (20), (50), (50), (50), (50), (50), (100).]

NUMBER IS PER HOUR AT PEAK VOLUME
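One way to build a composite map is to add up, path by path, the peak-hour references from each transaction analysis. The sketch below assumes that approach. MVCH-4 uses the figures from the transaction analysis form; HYPO-1 is a made-up transaction, included only so there is something to sum.

```python
from collections import Counter

# Per-transaction analyses: access path -> references per peak hour.
# MVCH-4 comes from the form in the slides; HYPO-1 is hypothetical.
analyses = {
    "MVCH-4": {"ENTRY-PATIENT": 10, "PATIENT-CHARGE": 100, "CHARGE-ITEM": 100},
    "HYPO-1": {"ENTRY-PATIENT": 5, "PATIENT-TREATMENT": 20},
}

composite = Counter()
for paths in analyses.values():
    composite.update(paths)  # adds the counts path by path

for path, refs in composite.most_common():
    print(f"{path:<18} {refs:>4} per hour at peak volume")
```

Sorting by frequency puts the busiest access paths first, which are the first candidates for indexes or denormalization.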

Summary

Given volume and usage knowledge we can consider different physical implementation strategies, including ...
* INDEXES
* DENORMALIZATION
* CLUSTERING

Indexes

Purpose: To speed up access to a particular row or a group of rows in a table.

Also used to enforce uniqueness

Eliminates the necessity of re-sorting the table each time we need to create a sequenced list

Indexes

Index on name (name, row number):

Allen   3
Brian   6
Carole  7
John    2
Karen   5
Marvin  1
Sharon  8
Sue     4

Table (row number, name, ...):

1  Marvin ...
2  John ...
3  Allen ...
4  Sue ...
5  Karen ...
6  Brian ...
7  Carole ...
8  Sharon ...
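The name index in this example can be imitated with a sorted list and a binary search. This is only a sketch: a sorted Python list stands in for whatever index structure the DBMS actually uses, and the function name is illustrative.

```python
import bisect

# Table rows in storage order: row number -> name (from the slide).
table = {1: "Marvin", 2: "John", 3: "Allen", 4: "Sue",
         5: "Karen", 6: "Brian", 7: "Carole", 8: "Sharon"}

# The index: (name, row number) pairs kept sorted by name.
index = sorted((name, row) for row, name in table.items())
keys = [name for name, _ in index]

def find_row(name):
    """Binary-search the index; return the row number or None."""
    pos = bisect.bisect_left(keys, name)
    if pos < len(keys) and keys[pos] == name:
        return index[pos][1]
    return None

# A sequenced list comes straight from the index, with no re-sort.
names_in_order = [name for name, _ in index]
```

Looking up "Karen" goes through the index to row 5 without reading the other rows.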

Example

SELECT NAME, DEPT, RATING FROM EMP WHERE RATING = 10;

Indexing on RATING improves performance. Without an index, must do a full table scan.
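The difference can be observed by asking a database for its query plan before and after creating the index. The sketch below uses SQLite through Python's standard library purely for convenience; the slides do not name a product for this example, and the sample rows and the index name idx_emp_rating are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, dept TEXT, rating INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [(f"emp{i}", "D1", i % 11) for i in range(1000)],  # made-up rows
)

query = "SELECT name, dept, rating FROM emp WHERE rating = 10"

def plan(sql):
    """Return SQLite's plan description for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)

plan_without_index = plan(query)   # expected to describe a table scan
conn.execute("CREATE INDEX idx_emp_rating ON emp (rating)")
plan_with_index = plan(query)      # expected to mention idx_emp_rating

print(plan_without_index)
print(plan_with_index)
```

The exact wording of the plan text varies between SQLite versions, so the script prints it rather than assuming a fixed string.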

Costs of an index?

* Storage space
* Maintenance

Indexes must be changed for each add/delete or change in value on an indexed field.

One benchmark: insert into a table without indexes, 0.11 seconds; with 8 indexes, 0.94 seconds.
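A rough way to repeat this kind of measurement yourself is sketched below, using SQLite from Python's standard library. The table layout, row count, and function name are assumptions, and the timings will not match the benchmark quoted on the slide, which comes from a different system.

```python
import sqlite3
import time

def timed_inserts(num_indexes, num_rows=2000):
    """Insert rows into a fresh 8-column table carrying num_indexes indexes."""
    conn = sqlite3.connect(":memory:")
    cols = [f"c{i}" for i in range(8)]
    conn.execute(f"CREATE TABLE t ({', '.join(c + ' INTEGER' for c in cols)})")
    for col in cols[:num_indexes]:
        conn.execute(f"CREATE INDEX idx_{col} ON t ({col})")
    rows = [tuple((i * 7919 + k) % 10007 for k in range(8))
            for i in range(num_rows)]
    start = time.perf_counter()
    conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)
    conn.commit()
    elapsed = time.perf_counter() - start
    count = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
    conn.close()
    return elapsed, count

no_index_time, no_index_rows = timed_inserts(0)
eight_index_time, eight_index_rows = timed_inserts(8)
print(f"0 indexes: {no_index_time:.4f}s, 8 indexes: {eight_index_time:.4f}s")
```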

Access Indexes

Automatically created on primary key.

You must create other indexes as needed.

Note, creating a unique index on a foreign key turns the relationship into a 1 - 1 relationship rather than a 1 - m relationship.
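The effect of a unique index on a foreign key can be shown with a small experiment. The slide is about Access; the sketch below uses SQLite instead because it ships with Python, and the table and index names are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parent (parent_id INTEGER PRIMARY KEY);
    CREATE TABLE child (
        child_id  INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES parent (parent_id)
    );
    CREATE UNIQUE INDEX idx_child_parent ON child (parent_id);
    INSERT INTO parent VALUES (1);
    INSERT INTO child VALUES (10, 1);
""")

# A second child for the same parent now violates the unique index,
# so each parent can have at most one child (1 - 1 instead of 1 - m).
try:
    conn.execute("INSERT INTO child VALUES (11, 1)")
    second_child_allowed = True
except sqlite3.IntegrityError:
    second_child_allowed = False

print("second child allowed:", second_child_allowed)
```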

Let’s consider Oracle indexes and performance ...

Oracle Indexes

Elapsed seconds by percentage of the file read, for three access methods:

INDEX ONLY:     SELECT COUNT(*) FROM EMP WHERE EMP_NO>0
INDEX + TABLE:  SELECT EMP_NAME FROM EMP WHERE EMP_NO>0
FULL TABLE SCAN

26,000 Rows, 7 Rows per Block

% OF FILE READ   INDEX ONLY   INDEX + TABLE   FULL TABLE SCAN
8.5              0.66         12.03           35.70
15.5             1.04         16.21           35.70
25.2             1.54         25.45           35.70
50.7             2.80         33.89           35.70
100              5.72         87.23           35.70

BREAK-EVEN: between 50.7% and 100% of the file read, INDEX + TABLE becomes slower than FULL TABLE SCAN.

26,000 Rows, 258 Rows per Block

% OF FILE READ   INDEX ONLY   INDEX + TABLE   FULL TABLE SCAN
8.5              0.66         2.31            4.52
15.5             1.05         4.01            4.52
25.2             1.59         6.37            4.52
50.7             2.91         12.69           4.52
100              6.01         25.37           4.52

BREAK-EVEN: between 15.5% and 25.2% of the file read, INDEX + TABLE becomes slower than FULL TABLE SCAN.
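The break-even points can be estimated from the benchmark figures by interpolating between the measured percentages. The sketch below reads the second timing column as INDEX + TABLE and the constant column as FULL TABLE SCAN, and it assumes the time grows linearly between measurements, which the slides do not claim. The function name is illustrative.

```python
def break_even(percents, index_table_secs, full_scan_secs):
    """Estimate the % of file read at which INDEX + TABLE time
    equals FULL TABLE SCAN time, by linear interpolation."""
    points = list(zip(percents, index_table_secs))
    for (p0, t0), (p1, t1) in zip(points, points[1:]):
        if t0 <= full_scan_secs <= t1:
            return p0 + (full_scan_secs - t0) / (t1 - t0) * (p1 - p0)
    return None  # no crossing inside the measured range

percents = [8.5, 15.5, 25.2, 50.7, 100]

# 26,000 rows, 7 rows per block
sparse = break_even(percents, [12.03, 16.21, 25.45, 33.89, 87.23], 35.70)
# 26,000 rows, 258 rows per block
dense = break_even(percents, [2.31, 4.01, 6.37, 12.69, 25.37], 4.52)

print(f"7 rows per block:   about {sparse:.0f}% of the file")
print(f"258 rows per block: about {dense:.0f}% of the file")
```

Under these assumptions the estimates come out at roughly 52% and 18%. Packing more rows into each block makes the full table scan competitive much sooner.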

Oracle Indexes

Rules of thumb

Use indexes generously for applications which are decision support/retrieval based.

Use indexes judiciously for transaction processing applications.

Places to use indexes

* PRIMARY KEY
* FOREIGN KEYS
* Non Key attributes that are referred to in qualification, sorting, and grouping (WHERE, ORDER BY, GROUP BY)

Denormalization

Goal is to reduce the number of physical reads to the storage devices by reducing the number of joins.

Costs of Denormalization

* Makes coding more complex
* Often sacrifices flexibility
* Will speed up retrieval but slow updates

Including children in the parent record

* Multiple addresses in the personnel record
* Absolute number of children for a parent is known (e.g., 2 addresses)
* The number won’t change over time
* The number is not very large

Clusters in Oracle

* Clustering stores records from two tables into the same physical storage space
* Only useful for EQUI-JOINS
* Improves performance by 2-3 times

Storing most recent child data in the parent record

* Multiple children, but children have an ordering (e.g., date of order)
* For example, perhaps storing amount of last order.
* Amount of last dividend paid to a particular account
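One possible way to keep the amount of the last order in the parent record is a trigger that copies it over whenever an order is inserted. The sketch below uses SQLite from Python's standard library; the customer and orders tables, their columns, the trigger name, and the sample rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        customer_id       INTEGER PRIMARY KEY,
        last_order_date   TEXT,   -- denormalized copy of newest order
        last_order_amount REAL
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer (customer_id),
        order_date  TEXT,
        amount      REAL
    );
    -- Every order insert now also pays for a parent update.
    CREATE TRIGGER trg_last_order AFTER INSERT ON orders
    BEGIN
        UPDATE customer
           SET last_order_date = NEW.order_date,
               last_order_amount = NEW.amount
         WHERE customer_id = NEW.customer_id
           AND (last_order_date IS NULL
                OR NEW.order_date >= last_order_date);
    END;
    INSERT INTO customer (customer_id) VALUES (1);
    INSERT INTO orders VALUES (100, 1, '2016-01-05', 40.0);
    INSERT INTO orders VALUES (101, 1, '2016-01-09', 75.0);
    INSERT INTO orders VALUES (102, 1, '2016-01-02', 10.0);
""")

# Reading the latest amount needs no join and no search of orders.
last_amount = conn.execute(
    "SELECT last_order_amount FROM customer WHERE customer_id = 1"
).fetchone()[0]
print(last_amount)
```

The date check in the trigger keeps a late-arriving older order from overwriting the stored value.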

Store running totals / Create extract tables

* Store summary data from a child record (e.g., year to date sales)
* Create a summary table which contains aggregate values over some period (say, one month)
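A monthly summary table of this kind might be built as in the sketch below, which uses SQLite from Python's standard library. The sales and monthly_sales tables, their columns, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (sale_date TEXT, account_id INTEGER, amount REAL);
    INSERT INTO sales VALUES ('2016-01-04', 1, 100.0);
    INSERT INTO sales VALUES ('2016-01-20', 1, 50.0);
    INSERT INTO sales VALUES ('2016-02-03', 1, 25.0);
    INSERT INTO sales VALUES ('2016-01-11', 2, 80.0);

    -- Extract table: one row per account per month.
    CREATE TABLE monthly_sales AS
        SELECT account_id,
               substr(sale_date, 1, 7) AS sale_month,
               SUM(amount)             AS total_amount
          FROM sales
         GROUP BY account_id, substr(sale_date, 1, 7);
""")

summary = conn.execute(
    "SELECT account_id, sale_month, total_amount "
    "FROM monthly_sales ORDER BY account_id, sale_month"
).fetchall()
for row in summary:
    print(row)
```

Reports then read the small extract table instead of re-aggregating the detail rows, at the cost of refreshing the extract whenever the detail changes.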

Duplicating a key beyond an immediate child record

CLASS:   CLASS_ID

PARTS:   PART_ID, CLASS_ID

ORDERS:  ORDER_ID, PART_ID, CLASS_ID  (ADD THIS KEY: CLASS_ID)

Consider the SQL statements for the previous example

Without the duplicated key (two joins):

SELECT O.PART_NO, O.ORDER_NO, C.CLASS, C.CLASS_DESC
FROM CLASS C, PARTS P, ORDERS O
WHERE O.PART_NO = P.PART_NO
AND P.CLASS = C.CLASS;

With CLASS duplicated in ORDERS (one join):

SELECT O.PART_NO, O.ORDER_NO, C.CLASS, C.CLASS_DESC
FROM CLASS C, ORDERS O
WHERE O.CLASS = C.CLASS;
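A quick check that the two statements return the same rows once the class key is carried in the order record is sketched below, using SQLite from Python's standard library. The order table is named orders because ORDER is a reserved word in SQL; the sample rows and class descriptions are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE class  (class TEXT PRIMARY KEY, class_desc TEXT);
    CREATE TABLE part   (part_no INTEGER PRIMARY KEY, class TEXT);
    -- orders.class is the duplicated key; it must be kept in step
    -- with part.class by whatever code maintains these tables.
    CREATE TABLE orders (order_no INTEGER PRIMARY KEY,
                         part_no INTEGER, class TEXT);
    INSERT INTO class  VALUES ('A', 'Fasteners'), ('B', 'Bearings');
    INSERT INTO part   VALUES (1, 'A'), (2, 'B');
    INSERT INTO orders VALUES (500, 1, 'A'), (501, 2, 'B'), (502, 1, 'A');
""")

# Normalized form: reach class through part (two joins).
two_joins = conn.execute("""
    SELECT o.part_no, o.order_no, c.class, c.class_desc
      FROM class c, part p, orders o
     WHERE o.part_no = p.part_no AND p.class = c.class
     ORDER BY o.order_no
""").fetchall()

# Denormalized form: join orders straight to class (one join).
one_join = conn.execute("""
    SELECT o.part_no, o.order_no, c.class, c.class_desc
      FROM class c, orders o
     WHERE o.class = c.class
     ORDER BY o.order_no
""").fetchall()

print(two_joins == one_join)
```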

Record Partitioning

Breaking up a record into two parts

A,B,C,D,E,F,G

A,B,C,D

E,F,G

Summary

Logical design gives you information about the ‘how’ to build the system.

Good physical design takes into account the performance of the final design … to know how best to do this task, you must understand how the system is being used!
