chapter 8 physical database design. mcgraw-hill/irwin © 2004 the mcgraw-hill companies, inc. all...

62
Chapter 8 Chapter 8 Physical Database Design

Post on 22-Dec-2015

218 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Chapter 8 Physical Database Design. McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Outline Overview of Physical Database

Chapter 8Chapter 8

Physical Database Design

Page 2: Chapter 8 Physical Database Design. McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Outline Overview of Physical Database

McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved.

Outline Outline

Overview of Physical Database DesignInputs of Physical Database Design File Structures Query Optimization Index SelectionAdditional Choices in Physical Database

Design

Page 3: Chapter 8 Physical Database Design. McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Outline Overview of Physical Database

McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved.

Overview of Physical Database Overview of Physical Database DesignDesign Importance of the process and environment of

physical database design– Process: inputs, outputs, objectives

– Environment: file structures and query optimization Physical Database Design is characterized as a

series of decision-making processes. Decisions involve the storage level of a database:

file structure and optimization choices.

Page 4: Chapter 8 Physical Database Design. McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Outline Overview of Physical Database

McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved.

Storage Level of DatabasesStorage Level of Databases

The storage level is closest to the hardware and operating system.

At the storage level, a database consists of physical records organized into files.

A file is a collection of physical records organized for efficient access.

The number of physical record accesses is an important measure of database performance.

Page 5: Chapter 8 Physical Database Design. McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Outline Overview of Physical Database

McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved.

Relationships between Logical Relationships between Logical Records (LR) and Physical Records (LR) and Physical Records (PR)Records (PR)

LR

LR

LR

PR

PR

LR T1

LR T2

LR T2

(a) Multiple LRs per PR (b) LR split across PRs (c) PR containing LRs from different tables

PR LR PR

Page 6: Chapter 8 Physical Database Design. McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Outline Overview of Physical Database

McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved.

Transferring Physical RecordsTransferring Physical Records

LR1

LR2

LR3

Application buffers:Logical records (LRs)

DBMS Buffers:Logical records (LRs) insideof physical records (PRs)

LR1

LR2

LR3

LR4

LR4

Operating system:Physical records(PRs) on disk

read read

writewrite

PR1

PR2

PR1

PR2

Page 7: Chapter 8 Physical Database Design. McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Outline Overview of Physical Database

McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved.

Objectives Objectives

Minimize response time to access and change a database.

Minimizing computing resources is a substitute measure for response time.

Database resources– Physical record transfers– CPU operations– Communication network usage (distributed

processing)

Page 8: Chapter 8 Physical Database Design. McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Outline Overview of Physical Database

McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved.

ConstraintsConstraints

Main memory and disk space are considered as constraints rather than resources to minimize.

Minimizing main memory and disk space can lead to high response times.

Thus, reducing the number of physical record accesses can improve response time.

CPU usage also can be a factor in some database applications.

Page 9: Chapter 8 Physical Database Design. McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Outline Overview of Physical Database

McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved.

Combined Measure of Database Performance : PRA + W * CPU-OP

Combined Measure of Combined Measure of Database PerformanceDatabase Performance To accommodate both physical record

accesses and CPU usage, a weight can be used to combine them into one measure.

The weight is usually close to 0 because many CPU operations can be performed in the time to perform one physical record transfer.

.

Page 10: Chapter 8 Physical Database Design. McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Outline Overview of Physical Database

McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved.

Inputs, Outputs, and Inputs, Outputs, and EnvironmentEnvironment

Physicaldatabasedesign

Table design(from logical database design)

Tableprofiles

Applicationprofiles

File structures

Data placement

Data formatting

Denormalization

Knowledge about file structures andquery optimization

Page 11: Chapter 8 Physical Database Design. McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Outline Overview of Physical Database

McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved.

Difficulty of physical database Difficulty of physical database designdesignNumber of decisionsRelationship among decisionsDetailed inputsComplex environmentUncertainty in predicting physical record

accesses

Page 12: Chapter 8 Physical Database Design. McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Outline Overview of Physical Database

McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved.

Inputs of Physical Database Inputs of Physical Database DesignDesign Physical database design requires inputs

specified in sufficient detail.Table profiles and application profiles are

important and sometimes difficult-to-define inputs.

Page 13: Chapter 8 Physical Database Design. McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Outline Overview of Physical Database

McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved.

Table ProfileTable Profile

A table profile summarizes a table as a whole, the columns within a table, and the relationships between tables.

Typical Components of a Table Profile

Component Statistics Table Number of rows and physical records

Column Number of unique values, distribution of values

Relationship Distribution of the number of related rows

Page 14: Chapter 8 Physical Database Design. McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Outline Overview of Physical Database

McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved.

Application profiles Application profiles

Application profiles summarize the queries, forms, and reports that access a database.

Typical Components of an Application Profile

Application Type Statistics Query Frequency; distribution of parameter values

Form Frequency of insert, update, delete, and retrieval operations to the main form and the subform

Report Frequency; distribution of parameter values

Page 15: Chapter 8 Physical Database Design. McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Outline Overview of Physical Database

McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved.

File structuresFile structures

Selecting among alternative file structures is one of the most important choices in physical database design.

In order to choose intelligently, you must understand characteristics of available file structures.

Page 16: Chapter 8 Physical Database Design. McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Outline Overview of Physical Database

McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved.

Sequential FilesSequential Files

Simplest kind of file structure Unordered: insertion orderOrdered: key orderSimple to maintainProvide good performance for processing

large numbers of records

Page 17: Chapter 8 Physical Database Design. McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Outline Overview of Physical Database

McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved.

Unordered Sequential FileUnordered Sequential File

123-45-6789 Joe Abbot ...

788-45-1235 Sue Peters ...

122-44-8655 Pat Heldon ...

466-55-3299 Bill Harper ...

323-97-3787 Mary Grant ...

PR1

PRn

543-01-9593 Tom Adtkins

Insert a new logicalrecord in the lastphysical record.

StdSSN Name ...

...

Page 18: Chapter 8 Physical Database Design. McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Outline Overview of Physical Database

McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved.

Ordered Sequential FileOrdered Sequential File

123-45-6789 Joe Abbot ...

122-44-8655 Pat Heldon ...

788-45-1235 Sue Peters ...

466-55-3299 Bill Harper ...

323-97-3787 Mary Grant ...

PR1

PRn

543-01-9593 Tom Adtkins

Rearrange physical recordto insert new logical record.

StdSSN Name ...

...

Page 19: Chapter 8 Physical Database Design. McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Outline Overview of Physical Database

McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved.

Hash FilesHash Files

Support fast access unique key value Converts a key value into a physical record

addressMod function: typical hash function

– Divisor: large prime number close to the file capacity

– Physical record number: hash function plus the starting physical record number

Page 20: Chapter 8 Physical Database Design. McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Outline Overview of Physical Database

McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved.

Example: Hash Function Example: Hash Function Calculations for StdSSN KeyCalculations for StdSSN Key

.

StdSSN StdSSN Mod 97 PR Number 122448655 26 176 123456789 39 189 323973787 92 242 466553299 80 230 788451235 24 174 543019593 13 163

Page 21: Chapter 8 Physical Database Design. McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Outline Overview of Physical Database

McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved.

Hash File after InsertionsHash File after Insertions

122-44-8655 Pat Heldon

123-45-6789 Joe Abbot

PR176

...

PR189PR163 543-01-9593 Tom Adtkins

788-45-1235 Sue PetersPR174

......

466-55-3299 Bill HarperPR230

...

323-97-3787 Mary GrantPR242

Page 22: Chapter 8 Physical Database Design. McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Outline Overview of Physical Database

McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved.

Linear Probe Collision Handling Linear Probe Collision Handling During an Insert OperationDuring an Insert Operation

122-44-8752 Joe Bishop ...

122-44-8655 Pat Heldon ...

122-44-8753 Bill Hayes ...

122-44-8849 Mary Wyatt ...

PR 176

122-44-8946 Tom Atkins adtkins

Home address (176) is full.

...

PR 177

...

Linear probe to find a physical record with space

Home address = Hash function value + Base address

(122448946 mod 97 = 26) + 150

Page 23: Chapter 8 Physical Database Design. McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Outline Overview of Physical Database

McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved.

Multi-Way Tree (Btrees) FilesMulti-Way Tree (Btrees) Files

A popular file structure supported by most DBMSs.

Btree provides good performance on both sequential search and key search.

Btree characteristics:– Balanced

– Bushy: multi-way tree

– Block-oriented

– Dynamic

Page 24: Chapter 8 Physical Database Design. McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Outline Overview of Physical Database

McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved.

Structure of a Btree of Height 3 Structure of a Btree of Height 3

...

...

...

Level0

Level1

Level2

Rootnode

Leaf nodes

Page 25: Chapter 8 Physical Database Design. McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Outline Overview of Physical Database

McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved.

Btree Node Containing Keys Btree Node Containing Keys and Pointersand Pointers

Key1 Key2 ... Keyd ... Key2d

Pointer 1 Pointer 2 Pointer 3 Pointer d+1 Pointer 2d+1

Each non root node contains at least half capacity(d keys and d+1 pointers).

Each non root node contains at most full capacity(2d keys and 2d+1 pointers).

...

Page 26: Chapter 8 Physical Database Design. McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Outline Overview of Physical Database

McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved.

Btree Insertion ExamplesBtree Insertion Examples 20 45 70

22 28 35 40 50 60 65

20 45 70

22 28 35 40 50 55 60 65

(a) Initial Btree

(b) After inserting 55

20 45 58 70

22 28 35 40 50 55

(c) After inserting 58

60 65

Node split

Middle key value(58) moved up

Page 27: Chapter 8 Physical Database Design. McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Outline Overview of Physical Database

McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved.

Btree Deletion ExamplesBtree Deletion Examples 20 45 70

22 28 35 50 60 65

20 45 70

22 28 35 50 65

(a) Initial Btree

(b) After deleting 60

20 35 70

22 28 45 50

(c) After deleting 65Borrowing a key

Page 28: Chapter 8 Physical Database Design. McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Outline Overview of Physical Database

McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved.

Cost of OperationsCost of Operations

The height of Btree dominates the number of physical record accesses operation.

Logarithmic search cost– Upper bound of height: log function’– Log base: minimum number of keys in a node

The cost to insert a key = [the cost to locate the nearest key] + [the cost to change nodes].

Page 29: Chapter 8 Physical Database Design. McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Outline Overview of Physical Database

McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved.

B+TreeB+Tree

Provides improved performance on sequential and range searches.

In a B+tree, all keys are redundantly stored in the leaf nodes.

To ensure that physical records are not replaced, the B+tree variation is usually implemented.

Page 30: Chapter 8 Physical Database Design. McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Outline Overview of Physical Database

McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved.

Index MatchingIndex Matching

Determining usage of an index for a queryComplexity of condition determines match.Single column indexes: =, <, >, <=, >=, IN

<list of values>, BETWEEN, IS NULL, LIKE ‘Pattern’ (meta character not the first symbol)

Composite indexes: more complex and restrictive rules

Page 31: Chapter 8 Physical Database Design. McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Outline Overview of Physical Database

McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved.

Bitmap IndexBitmap Index

Can be useful for stable columns with few values Bitmap:

– String of bits: 0 (no match) or 1 (match)

– One bit for each row Bitmap index record

– Column value

– Bitmap

– DBMS converts bit position into row identifier.

Page 32: Chapter 8 Physical Database Design. McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Outline Overview of Physical Database

McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved.

Bitmap Index ExampleBitmap Index Example

RowId FacSSN … FacRank 1 098-55-1234 Asst 2 123-45-6789 Asst 3 456-89-1243 Assc 4 111-09-0245 Prof 5 931-99-2034 Asst 6 998-00-1245 Prof 7 287-44-3341 Assc 8 230-21-9432 Asst 9 321-44-5588 Prof 10 443-22-3356 Assc 11 559-87-3211 Prof 12 220-44-5688 Asst

FacRank Bitmap Asst 110010010001 Assc 001000100100 Prof 000101001010

Faculty Table

Bitmap Index on FacRank

Page 33: Chapter 8 Physical Database Design. McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Outline Overview of Physical Database

McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved.

Bitmap Join IndexBitmap Join Index

Bitmap identifies rows of a related table.Represents a precomputed join Can define for a join column or a non-join

columnTypically used in query dominated

environments such as data warehouses (Chapter 16)

Page 34: Chapter 8 Physical Database Design. McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Outline Overview of Physical Database

McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved.

Summary of File StructuresSummary of File Structures

Unordered Ordered Hash B+tree Bitmap

Sequential search

Y Y Extra PRs Y N

Key search

Linear Linear Constant time

Logarithmic Y

Range search

N Y N Y Y

Usage Primary only

Primary only

Primary or secondary

Primary or secondary

Secondary only

Page 35: Chapter 8 Physical Database Design. McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Outline Overview of Physical Database

McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved.

Query OptimizationQuery Optimization

Query optimizer determines implementation of queries.

Major improvement in software productivity

You can sometimes improve the optimization result through knowledge of the optimization process.

Page 36: Chapter 8 Physical Database Design. McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Outline Overview of Physical Database

McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved.

Translation TasksTranslation Tasks

Syntax and semanticanalysis

Query transformation

Access planevaluation

Access planinterpretation

Code generation

Parsed query

Relational algebra query

AccessPlan

AccessPlan

Query

Query results Machine code

Page 37: Chapter 8 Physical Database Design. McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Outline Overview of Physical Database

McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved.

Access PlansAccess Plans

Enrollment Offering

Btree(OfferNo) BTree(OfferNo)

Sort Merge Join Faculty

BTree(FacSSN)

Sort Merge Join

Sort(FacSSN)

Page 38: Chapter 8 Physical Database Design. McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Outline Overview of Physical Database

McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved.

Access Plan EvaluationAccess Plan Evaluation

Optimizer evaluates thousands of access plans

Access plans vary by join order, file structures, and join algorithm.

Some optimizers can use multiple indexes on the same table.

Access plan evaluation can consume significant resources

Page 39: Chapter 8 Physical Database Design. McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Outline Overview of Physical Database

McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved.

Join AlgorithmsJoin Algorithms

Nested loops Sort merge Hybrid joinHash joinStar join

Page 40: Chapter 8 Physical Database Design. McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Outline Overview of Physical Database

McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved.

Optimization Tips IOptimization Tips I

Detailed and current statistics neededSave access plans for repetitive queries Review access plans to determine

problemsUse hints carefully to improve results

Page 41: Chapter 8 Physical Database Design. McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Outline Overview of Physical Database

McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved.

Optimization Tips IIOptimization Tips II

Replace Type II nested queries with separate queries.

For conditions on join columns, test the condition on the parent table.

Do not use the HAVING clause for row conditions.

Page 42: Chapter 8 Physical Database Design. McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Outline Overview of Physical Database

McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved.

Index SelectionIndex Selection

Most important decisionDifficult decisionChoice of clustered and nonclustered

indexes

Page 43: Chapter 8 Physical Database Design. McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Outline Overview of Physical Database

McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved.

Clustering Index ExampleClustering Index Example

Index set

<Abe, 1> <Adam, 2> <Bill, 4> <Bob, 3> <Carl, 5> <Carol, 6>

1. Abe, Denver, ...2. Adam, Boulder, ...

3. Bob, Denver, ...4. Bill, Aspen, ...

5. Carl, Denver, ...6. Carol, Golden, ...

...

Physical recordscontaining rows

Sequence set

Page 44: Chapter 8 Physical Database Design. McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Outline Overview of Physical Database

McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved.

Nonclustering Index ExampleNonclustering Index Example

Index set

<Abe, 6> <Adam, 2> <Bill, 4> <Bob, 5> <Carl, 1> <Carol, 3>

1. Carl,Denver, ...2. Adam, Boulder, ...

3. Carol, Golden, ...4. Bill, Aspen, ...

5. Bob, Denver, ...6. Abe, Denver, ...

...

Physicalrecordscontaining rows

Sequence set

Page 45: Chapter 8 Physical Database Design. McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Outline Overview of Physical Database

McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved.

Inputs and Outputs of Index Inputs and Outputs of Index SelectionSelection

IndexSelection

SQL statementsand weights

Table profiles

Clustered indexchoices

Nonclusteredindex choices

Page 46: Chapter 8 Physical Database Design. McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Outline Overview of Physical Database

McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved.

Trade-offs in Index SelectionTrade-offs in Index Selection

Balance retrieval against update performance Nonclustering index usage:

– Few rows satisfy the condition in the query– Join column usage if a small number of rows result in

child table Clustering index usage:

– Larger number of rows satisfy a condition than for nonclustering index

– Use in sort merge join algorithm to avoid sorting– More expensive to maintain

Page 47: Chapter 8 Physical Database Design. McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Outline Overview of Physical Database

McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved.

Difficulties of Index SelectionDifficulties of Index Selection

Application weights are difficult to specify.Distribution of parameter values needed Behavior of the query optimization

component must be known. The number of choices is large. Index choices can be interrelated.

Page 48: Chapter 8 Physical Database Design. McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Outline Overview of Physical Database

McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved.

Selection RulesSelection Rules Rule 1: A primary key is a good candidate for a

clustering index. Rule 2: To support joins, consider indexes on

foreign keys.Rule 3: A column with many values may be a good

choice for a non-clustering index if it is used in equality conditions.

Rule 4: A column used in highly selective range conditions is a good candidate for a non-clustering index.

Page 49: Chapter 8 Physical Database Design. McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Outline Overview of Physical Database

McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved.

Selection Rules (Cont.) Selection Rules (Cont.) Rule 5: A frequently updated column is not a

good index candidate.Rule 6: Volatile tables (lots of insertions and

deletions) should not have many indexes.Rule 7: Stable columns with few values are good

candidates for bitmap indexes if the columns appear in WHERE conditions.

Rule 8: Avoid indexes on combinations of columns. Most optimization components can use multiple indexes on the same table.

Page 50: Chapter 8 Physical Database Design. McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Outline Overview of Physical Database

McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved.

Index CreationIndex Creation

To create the indexes, the CREATE INDEX statement can be used.

The word following the INDEX keyword is the name of the index.

CREATE INDEX is not part of SQL:1999.

Example:

CREATE INDEX StdGPAIndex ON Student (StdGPA) CREATE UNIQUE INDEX OfferNoIndex ON Offering (OfferNo) CREATE BITMAP INDEX OffYearIndex ON Offering (OffYear)

Page 51: Chapter 8 Physical Database Design. McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Outline Overview of Physical Database

McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved.

Denormalization Denormalization

Additional choice in physical database design

Denormalization combines tables so that they are easier to query.

Use carefully because normalized designs have important advantages.

Page 52: Chapter 8 Physical Database Design. McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Outline Overview of Physical Database

McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved.

Normalized designsNormalized designs

Better update performance Require less coding to enforce integrity

constraintsSupport more indexes to improve query

performance

Page 53: Chapter 8 Physical Database Design. McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Outline Overview of Physical Database

McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved.

Repeating Groups Repeating Groups

A repeating group is a collection of associated values.

The rules of normalization force repeating groups to be stored in an M table separate from an associated one table.

If a repeating group is always accessed with its associated one table, denormalization may be a reasonable alternative.

Page 54: Chapter 8 Physical Database Design. McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Outline Overview of Physical Database

McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved.

Denormalizing a Repeating Denormalizing a Repeating GroupGroup

TerrNoTerrNameTerrLoc

Territory

TerrNoTerrQtrTerrSales

TerritorySales

1

M

TerrNoTerrNameTerrLocQtr1SalesQtr2SalesQtr3SalesQtr4Sales

Territory

Normalized Denormalized

Page 55: Chapter 8 Physical Database Design. McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Outline Overview of Physical Database

McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved.

Denormalizing a Generalization Denormalizing a Generalization Hierarchy Hierarchy

EmpNoEmpNameEmpHireDate

Emp

EmpNoEmpSalary

SalaryEmp

1

1

Normalized Denormalized

EmpNoEmpRate

HourlyEmp

EmpNoEmpNameEmpHireDateEmpSalaryEmpRate

Emp

1

Page 56: Chapter 8 Physical Database Design. McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Outline Overview of Physical Database

McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved.

Codes and MeaningsCodes and Meanings

DeptNoDeptNameDeptLoc

Dept

EmpNoEmpNameDeptNo

Emp

1

M

Normalized Denormalized

DeptNoDeptNameDeptLoc

Dept

EmpNoEmpNameDeptNoDeptName

Emp

1

M

Page 57: Chapter 8 Physical Database Design. McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Outline Overview of Physical Database

McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved.

Record FormattingRecord Formatting

Record formatting decisions involve compression and derived data.

Compression is a trade-off between input-output and processing effort.

Derived data is a trade-offs between query and update operations.

Page 58: Chapter 8 Physical Database Design. McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Outline Overview of Physical Database

McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved.

Storing Derived Data to Storing Derived Data to Improve Query PerformanceImprove Query Performance

OrdNoOrdDateOrdAmt

Order

OrdNoProdNoQty

OrdLine

1

M

ProdNoProdNameProdPrice

Product

1

M

deriveddata

Page 59: Chapter 8 Physical Database Design. McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Outline Overview of Physical Database

McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved.

Parallel ProcessingParallel Processing

Parallel processing can improve retrieval and modification performance.

Retrieving many records can be improved by reading physical records in parallel.

Many DBMSs provide parallel processing capabilities with RAID systems.

RAID is a collection of disks (a disk array) that operates as a single disk.

Page 60: Chapter 8 Physical Database Design. McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Outline Overview of Physical Database

McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved.

Striping in RAID Storage SystemsStriping in RAID Storage Systems Each stripe consists of four adjacent physical records. Three stripes areshown separated by dotted lines.

PR1 PR2 PR3 PR4

PR5 PR6 PR7 PR8

PR9 PR10 PR11 PR12

Page 61: Chapter 8 Physical Database Design. McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Outline Overview of Physical Database

McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved.

Other Ways to Improve Other Ways to Improve PerformancePerformance

Transaction processing: add computing capacity and improve transaction design.

Data warehouses: add computing capacity and store derived data.

Distributed databases: allocate processing and data to various computing locations.

Page 62: Chapter 8 Physical Database Design. McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Outline Overview of Physical Database

McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved.

SummarySummaryGoal: minimize computing resourcesTable profiles and application profiles must be

specified in sufficient detail.Environment: file structures and query

optimizationMonitor and possibly improve query optimization

resultsIndex selection: most important decisionOther techniques: denormalization, record

formatting, and parallel processing