Database Performance Part 1—Topics

Post on 07-Jan-2016

IMS 6217: Database Performance Part 1

Dr. Lawrence West, Management Dept., University of Central Florida [email protected]

Database Performance Part 1—Topics

• Storing Data

• Retrieving Data

• Costs of Retrieving Data

• Reasons for Concern

• Data Volume Analysis

• Data Usage Analysis

• Enhancement Mechanisms

• Indexes

Default SQL Server Data Storage

• Data in tables is stored on pages and there are eight pages per extent.

• When more space is needed an entire extent is added to the database

• Each row (record) in the database is physically stored on a page and in an extent

• Each row has a RowID and PageOffset that identify it and its location in the page

[Figure: one data page holding 13 rows, each stored as a RowID followed by its data: RowID1 Data for Row 1, RowID2 Data for Row 2, and so on through RowID13 Data for Row 13]

SQL Server Data Storage (cont.)

• Without a clustered index (covered later) rows are added to pages in the order of insertion.

• When pages are full rows are added to the next page in the extent.

• When extents are full new extents are created

• Tables keep track of the sequence of extents that contain their contents to create a logical sequence

Data Retrieval

• By default, queries of tables require that each page be loaded into memory in sequence and each row examined to see if it meets the query conditions

• "Full Table Scan"

Data Retrieval (cont.)

• The Page is the basic unit of IO

– Entire page is moved from physical storage to RAM for evaluation

• In a pure table scan (the default method of retrieval) each record is examined to see if it matches the WHERE clause conditions (if any)

– Test value and column value moved to CPU for testing

– Records where condition is TRUE are added to result set

• Pages are cached and the cached copy will be read if available and needed
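
The scan-and-cache behavior described above can be sketched in a few lines of Python. This is a toy model, not SQL Server's actual buffer manager: the page contents, the BufferCache class, and the function names are all made up for illustration.

```python
# Toy storage (hypothetical): each "page" is a list of rows.
PAGES_ON_DISK = [
    [{"id": 1, "name": "Jones"}, {"id": 2, "name": "Adams"}],
    [{"id": 3, "name": "Smith"}, {"id": 4, "name": "Williams"}],
    [{"id": 5, "name": "Flintstone"}],
]


class BufferCache:
    """Keeps pages that were already read so repeat requests skip the disk."""

    def __init__(self, disk):
        self.disk = disk
        self.cached = {}
        self.disk_reads = 0

    def get_page(self, page_no):
        if page_no not in self.cached:
            # Cache miss: the whole page moves from storage to RAM.
            self.disk_reads += 1
            self.cached[page_no] = self.disk[page_no]
        return self.cached[page_no]


def full_table_scan(cache, predicate):
    """Load every page in sequence and test every row against the condition."""
    result = []
    for page_no in range(len(cache.disk)):
        for row in cache.get_page(page_no):
            if predicate(row):
                result.append(row)
    return result


cache = BufferCache(PAGES_ON_DISK)
matches = full_table_scan(cache, lambda r: r["name"] == "Smith")
reads_first = cache.disk_reads                 # every page had to be read
full_table_scan(cache, lambda r: r["name"] == "Adams")
reads_second = cache.disk_reads - reads_first  # second scan is served from cache
```

Note that finding one row still touched all three pages; only the second scan avoided disk reads, and only because the pages were still cached.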

Data Retrieval (cont.)

• In SQL Server page sizes are fixed at 8 KB

– (Entire extent is 64 KB)

– Some DBMS have different sizes

– Some DBMS allow tuning on a table by table basis

– About 8 KB (one page) is also the maximum record size

• Number of Records on a page depends on record size

– Sum of data sizes of each column

• IO time for a pure scan increases with

– Number of records

– Record size
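
A rough worked example of how record size and record count drive scan IO. The sketch assumes the full 8 KB of each page holds row data; real pages lose some space to headers and per-row overhead, so actual counts will be a little lower. The function names and the sample row sizes are illustrative.

```python
import math

PAGE_SIZE_BYTES = 8 * 1024   # 8 KB page, from the slide
PAGES_PER_EXTENT = 8         # so one extent is 64 KB


def rows_per_page(row_size_bytes, page_size=PAGE_SIZE_BYTES):
    """Whole rows that fit on one page (simplified: ignores page overhead)."""
    if row_size_bytes <= 0 or row_size_bytes > page_size:
        raise ValueError("row must be between 1 byte and one page in size")
    return page_size // row_size_bytes


def pages_for_scan(row_count, row_size_bytes):
    """Pages a pure table scan must read to examine every row."""
    return math.ceil(row_count / rows_per_page(row_size_bytes))


extent_kb = PAGE_SIZE_BYTES * PAGES_PER_EXTENT // 1024
small_rows = pages_for_scan(1_000_000, 200)   # 40 rows per page
large_rows = pages_for_scan(1_000_000, 400)   # 20 rows per page
```

With one million records, doubling the record size from 200 to 400 bytes doubles the pages a scan must read, from 25,000 to 50,000.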

Data Retrieval Costs

• Two levels of costs associated with data retrieval

– Most Important: IO moving page from disk storage to RAM

– Less Important: CPU effort to evaluate records

– In default mode records cannot be evaluated until they have been moved into RAM

• We also care about physical storage space

– Less important as a performance issue

• We also care about costs of reorganizing data as it is added to the DB or updated (later)

Data Retrieval Costs (cont.)

• ALL Retrieval Enhancement mechanisms must be evaluated on the dimensions from the previous slide

• None of the enhancements come without cost

• Decisions affected by use of the data, not just pure database characteristics

– Understanding organizational tasks and priorities key

– Requires balance between technical and organizational knowledge

– MIS graduates ideally positioned to participate in this analysis

Data Retrieval Costs (cont.)

• Degree of the cost changes with many factors

– Table sizes

– Access mechanisms (paths; more later)

– Nature of query

– Number of tables needed in query

– Nature of the enhancement approach

• Remember that our DB design goal of minimizing storage space and redundancy (normalization) spreads data around the database

– More tables containing transaction logic

– More complicated queries

Reasons for Concern

• Write the SQL query to calculate your GPA

• How is query executed (if no enhancements)?

[ERD: five entities joined by "Has" relationships]

COURSE: DeptCode, CourseNo, Name, CreditHrs, LabHrs

SECTION: SectionID, DeptCode <AK>, CourseNo <AK>, SecNo <AK>, Term <AK>, Year <AK>, Room <FK1>, Days, Time, InstructorID <FK2>

ENROLLMENT: SectionID <FK1>, StudentID <FK2>, Grade <FK3>

STUDENT: StudentID, LastName, FirstName, ...

GRADE: Grade, GradePts, ...
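
One possible answer to the GPA question, sketched with Python's built-in sqlite3 module standing in for SQL Server. Several things here are assumptions: that GPA is grade points weighted by credit hours, the trimmed-down column lists, and the sample rows (course names, student 100, and the grades are invented for the demo).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE COURSE (DeptCode TEXT, CourseNo TEXT, Name TEXT, CreditHrs INTEGER,
                     PRIMARY KEY (DeptCode, CourseNo));
CREATE TABLE SECTION (SectionID INTEGER PRIMARY KEY, DeptCode TEXT, CourseNo TEXT);
CREATE TABLE GRADE (Grade TEXT PRIMARY KEY, GradePts REAL);
CREATE TABLE ENROLLMENT (SectionID INTEGER, StudentID INTEGER, Grade TEXT);

-- Hypothetical sample data
INSERT INTO COURSE VALUES ('ISM', '4212', 'Sample course A', 3),
                          ('ISM', '3011', 'Sample course B', 4);
INSERT INTO SECTION VALUES (1, 'ISM', '4212'), (2, 'ISM', '3011');
INSERT INTO GRADE VALUES ('A', 4.0), ('B', 3.0);
INSERT INTO ENROLLMENT VALUES (1, 100, 'A'), (2, 100, 'B');
""")

# Four tables are needed: ENROLLMENT supplies the grades, GRADE converts them
# to points, and SECTION leads to COURSE for the credit hours.
GPA_SQL = """
SELECT SUM(g.GradePts * c.CreditHrs) * 1.0 / SUM(c.CreditHrs)
FROM ENROLLMENT e
JOIN GRADE g   ON g.Grade = e.Grade
JOIN SECTION s ON s.SectionID = e.SectionID
JOIN COURSE c  ON c.DeptCode = s.DeptCode AND c.CourseNo = s.CourseNo
WHERE e.StudentID = ?
"""

gpa = conn.execute(GPA_SQL, (100,)).fetchone()[0]   # (4.0*3 + 3.0*4) / 7
```

With no enhancements, each of the four tables would be scanned page by page to resolve the WHERE clause and the three join conditions.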

Data Volume Analysis

• We don’t have retrieval problems with small tables

• Need to know how big a table will get over the life of the system to understand the potential magnitude of the problem

• Q: How many records are expected in the ENROLLMENT table?

• Document in the data dictionary

– Estimate of number of records expected

– How estimate was computed

[Figure: the same COURSE, SECTION, ENROLLMENT, STUDENT, and GRADE entity-relationship diagram shown on the "Reasons for Concern" slide]

Data Volume Analysis (cont.)

• Estimating DV

– Absolute count: We know there are 12 possible grades that can be contained in the GRADE table

– Estimate: We think that we will have 32,000 students next year (use your statistics!)

– Derived: Each enrolled student takes an average of four sections per semester

– Historical trends: Enrollment is growing at 2% per year

– System Lifetime: Specify the expected useful life of the system, which puts a cap on the number of records
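
The estimating steps above can be combined into a quick calculation for the ENROLLMENT table. The 32,000 students, four sections per semester, and 2% growth come from the slide; two semesters per year, a ten-year system life, and never discarding records are assumptions made only for this sketch.

```python
def estimate_enrollment_rows(students_year1=32_000,
                             sections_per_term=4,
                             terms_per_year=2,      # assumption
                             annual_growth=0.02,
                             lifetime_years=10):    # assumption
    """Total ENROLLMENT rows accumulated if no records are ever discarded."""
    total = 0
    students = students_year1
    for _ in range(lifetime_years):
        # Derived volume for one year: students x sections x terms
        total += round(students) * sections_per_term * terms_per_year
        students *= 1 + annual_growth   # historical growth trend
    return total


first_year = estimate_enrollment_rows(lifetime_years=1)
no_growth = estimate_enrollment_rows(annual_growth=0.0)
with_growth = estimate_enrollment_rows()
```

Under these assumptions the first year alone adds 256,000 rows, and ten years without growth would add 2,560,000. Whatever figures are used, the data dictionary should record both the estimate and the arithmetic behind it.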

Data Volume Analysis (cont.)

• Don’t forget historical data!

– Are graduated or withdrawn student records retained in the STUDENT and ENROLLMENT tables?

– How long will they be kept?

– What is the potential size of the ENROLLMENT table if records are never discarded?

• Precise entity definitions are critical in DVA

• Document where or how you came up with volume estimates

Data Usage Analysis

• DUA is concerned with three factors

– How frequently are tables accessed?

– How urgent are the table accesses?

– What is the access path into the table?

• Usually means what fields are being compared in a WHERE clause

• Including Join ON expressions

• Goal is to find the high frequency, important retrievals and to put enhancements on the path used by the retrieval

Data Usage Analysis (cont.)

• Many frequency and urgency estimates will come from an analysis of the organization’s business practices and needs

– What is max time a customer can be allowed to wait for a response?

– How many sales take place a day?

– Can this transaction take place in batch overnight?

• How many sales are made per hour? Do we expect it to grow?

• Consider electronic credit card clearing from retail stores

Data Usage Analysis (cont.)

• The access path is the set of fields searched to find the appropriate records in a transaction

• What is the path taken through the sample ERD to:

– Calculate your GPA?

– Determine if you have met a course prerequisite?

• Don’t forget checks of operational business rules made in conjunction with a transaction

– What if we had a business rule that said only students with a 3.0 GPA could take ISM 4212?

– How about checking prerequisites?

For Your Group's Project…

• Which business transaction will be conducted the most frequently?

– What SQL does it require?

• Include triggers

– What tables are used?

– How large are the tables?

– How time sensitive is the transaction?

• Identify a report your organization will need

– What SQL does it require?

– Is it needed near-real-time or can it wait?

Enhancement Mechanisms

• Indices

• Denormalizing tables

• Partitioning tables

• Hardware enhancements

Indexes

• If SQL Server knows the extent address, page address, and RowID of desired data it can go directly to the page in question (one page read into memory) and directly to the desired record

• Indexes are separate storage structures that map from values in columns of tables to the location of the row from which the value was taken

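Table (rows in insertion order):

RowID  LastName    FirstName
0001   Jones       Sally...
0002   Adams       Fred...
0003   Smith       Jerry...
0004   Williams    Bill...
0005   Flintstone  Sue...
0006   Roberts     Louise...
0007   Gaskin      Bob...
0008   Adamski     Joe...
0009   Hawkins     Joe...
0010   Jefferson   Frank...
0011   Meckley     Linda...

Index on LastName (sorted, each entry pointing to a RowID):

LastName    RowID
Adams       0002
Adamski     0008
Flintstone  0005
Gaskin      0007
Hawkins     0009
Jefferson   0010
Jones       0001
Meckley     0011
Roberts     0006
Smith       0003
Williams    0004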
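
The mapping in the figure can be mimicked with a sorted list and a binary search. This is only a sketch of the idea, using the names and RowIDs from the slide; a real DBMS stores the index on pages and uses a tree rather than a Python list, and the function name is made up.

```python
from bisect import bisect_left

# Table rows in insertion order: RowID -> (LastName, FirstName)
TABLE = {
    "0001": ("Jones", "Sally"),
    "0002": ("Adams", "Fred"),
    "0003": ("Smith", "Jerry"),
    "0004": ("Williams", "Bill"),
    "0005": ("Flintstone", "Sue"),
    "0006": ("Roberts", "Louise"),
    "0007": ("Gaskin", "Bob"),
    "0008": ("Adamski", "Joe"),
    "0009": ("Hawkins", "Joe"),
    "0010": ("Jefferson", "Frank"),
    "0011": ("Meckley", "Linda"),
}

# The index is a separate structure: (LastName, RowID) pairs kept in key order.
INDEX = sorted((last, row_id) for row_id, (last, _first) in TABLE.items())
INDEX_KEYS = [last for last, _row_id in INDEX]


def find_row_id(last_name):
    """Binary-search the small index instead of scanning the whole table."""
    i = bisect_left(INDEX_KEYS, last_name)
    if i < len(INDEX_KEYS) and INDEX_KEYS[i] == last_name:
        return INDEX[i][1]
    return None


row_id = find_row_id("Hawkins")
row = TABLE[row_id]   # go straight to the one row that is needed
```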

Indexes (cont.)

• Indexes let the system search a small record to find the exact address of a large record

[Figure: table rows 1 to 7 beside a smaller index whose entries map each key to a row: A 5, B 3, C 6, D 1, E 2, F 7, G 4. Caption: More records per page than the main table]

Indexes (cont.)

• There are a multitude of algorithms and techniques for implementing indexes

• Computer scientists develop, test, and evaluate various indexing methods

• Our indexing techniques will usually be determined by our choice of RDBMS

The B-Tree (Balanced Tree) Index

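[Figure: B-tree with three rows of pages, labeled top to bottom Root Page, Leaf Pages, Data Pages]

Top row: 1 120 300 480 610

Middle row: 1 27 39 72 91 | 610 622 647 679 725

Bottom row: 1 5 12 19 25 | 27 29 32 34 35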

The B-Tree Index (cont.)

• Rows in each index page are in order according to the column(s) on which the index was created

• Upper level pages have sparse populations of index values

– Not all values listed

– Each entry points to the page with denser values

• Leaf pages (nodes) contain all values within a range

• Leaf pages point to the actual data page and Row ID from which the index value came

[Figure: the B-tree diagram from the previous slide, repeated]
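
A sketch of how a lookup descends through sparse upper pages to a dense lower page, using the key values from the figure. Only the branches the figure shows are modeled, so keys that fall in other branches raise KeyError here. The dictionary layout and function names are assumptions for illustration, not how SQL Server lays out its pages.

```python
from bisect import bisect_right

# Each upper-level entry is the lowest key found on the page it points to.
ROOT = [1, 120, 300, 480, 610]
MIDDLE = {1: [1, 27, 39, 72, 91], 610: [610, 622, 647, 679, 725]}
BOTTOM = {1: [1, 5, 12, 19, 25], 27: [27, 29, 32, 34, 35]}


def pick_entry(page_keys, search_key):
    """Return the largest key on the page that is <= search_key."""
    i = bisect_right(page_keys, search_key) - 1
    if i < 0:
        raise KeyError(search_key)
    return page_keys[i]


def find(search_key):
    """Follow one entry per level; three page reads in total."""
    first = pick_entry(ROOT, search_key)            # read the top page
    second = pick_entry(MIDDLE[first], search_key)  # read one middle page
    bottom_page = BOTTOM[second]                    # read one bottom page
    return search_key in bottom_page, [first, second]


found_32 = find(32)
found_30 = find(30)
```

Each level discards every branch except one, so both the hit (32) and the miss (30) are resolved after reading three pages.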

Clustered Index

• In a clustered index the data rows are physically in the order specified by the index key

• Leaf Nodes in the index are actually the data pages

CustomerID  CompanyName
----------  ----------------------------------------
ALFKI       Alfreds Futterkiste
ANATR       Ana Trujillo Emparedados y helados
ANTON       Antonio Moreno Taquería
AROUT       Around the Horn
BERGS       Berglunds snabbköp
BLAUS       Blauer See Delikatessen
BLONP       Blondesddsl père et fils
BOLID       Bólido Comidas preparadas
BONAP       Bon app'
BOTTM       Bottom-Dollar Markets

Clustered Index (cont.)

• Because data rows are physically ordered by the index value records must be moved around to allow insertions

CustomerID  CompanyName
----------  ----------------------------------------
ALFKI       Alfreds Futterkiste
ANATR       Ana Trujillo Emparedados y helados
ANTON       Antonio Moreno Taquería
AROUT       Around the Horn
BERGS       Berglunds snabbköp
BERNI       Bernie’s Fish-O-Rama                      <- Insertion
BLAUS       Blauer See Delikatessen
BLONP       Blondesddsl père et fils
BOLID       Bólido Comidas preparadas
BONAP       Bon app'
BOTTM       Bottom-Dollar Markets

(Other records must be moved)

Clustered Indexes (cont.)

• When a clustered index page is full it must “split”

– Half of records are moved to new page and half remain in place

– New pages may end up in new extents

– Pointers must link pages in the logical order of the data

• Tables with many insertions that do not arrive in clustered index order can take extensive processing time

– E.g., adding Employees when the PK is the SSN

• Page splits may cascade upwards to splits of index pages
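
A toy model of a page split, using the customer IDs from the clustered index example. The page capacity of four rows and the list-of-lists layout are assumptions chosen to keep the example small; real pages are linked by pointers, and splits can also propagate to the index pages above.

```python
from bisect import insort

PAGE_CAPACITY = 4   # hypothetical; tiny so that a split is easy to trigger


def insert_key(pages, key, capacity=PAGE_CAPACITY):
    """Insert key into the right page (pages are sorted lists kept in logical
    order) and split that page in half if it overflows."""
    # Target page: the last page whose first key is <= the new key.
    target = 0
    for i, page in enumerate(pages):
        if page and page[0] <= key:
            target = i
    insort(pages[target], key)   # records after the new key shift over

    if len(pages[target]) > capacity:
        page = pages[target]
        mid = len(page) // 2
        # Half the records stay in place, half move to a new page that is
        # linked in right after the original.
        pages[target:target + 1] = [page[:mid], page[mid:]]
    return pages


pages = [
    ["ALFKI", "ANATR", "ANTON", "AROUT"],
    ["BERGS", "BLAUS", "BLONP", "BOLID"],
    ["BONAP", "BOTTM"],
]
insert_key(pages, "BERNI")
```

Inserting BERNI lands on a full page, so that page splits into BERGS, BERNI and BLAUS, BLONP, BOLID. A key that sorts after every existing key would simply be appended to the last page, which is why naturally ascending keys are cheaper to insert.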

Clustered Indexes (cont.)

• Clustered indexes have significant advantages when performing range queries or when the desired index value is a ‘natural’ sequence for the data

– Timestamp

– CustomerID

• There can only be one clustered index per table (Why?)

• Nonclustered indexes on a table that has a clustered index point to the clustered index (they store the clustered index key instead of a RowID)

Implementing Indexes

• Use the Manage Indexes & Keys window in Enterprise Manager

• Default for PK index is to make it clustered

– Override if you don’t want this

– Do not automatically accept the default

Using Indexes

• SQL Server will automatically select indices to use in queries

– Where clauses

– Inner Join clauses

• First column of the index must match the criteria

• Additional columns will be used if available
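
Why the first column has to match can be seen with a composite index modeled as a sorted list of tuples. This is an illustration of the ordering argument only, not of SQL Server's optimizer; the index on (LastName, FirstName) and the function names are assumptions, and the rows are a subset of the earlier name example.

```python
from bisect import bisect_left

# Hypothetical composite index on (LastName, FirstName), each entry with its RowID.
INDEX = sorted([
    ("Jones", "Sally", "0001"),
    ("Adams", "Fred", "0002"),
    ("Smith", "Jerry", "0003"),
    ("Adamski", "Joe", "0008"),
    ("Hawkins", "Joe", "0009"),
])


def seek_by_last_name(last):
    """Criteria on the first column: binary search jumps to the matching range."""
    i = bisect_left(INDEX, (last,))
    row_ids = []
    while i < len(INDEX) and INDEX[i][0] == last:
        row_ids.append(INDEX[i][2])
        i += 1
    return row_ids


def scan_by_first_name(first):
    """Criteria only on the second column: the sort order does not help,
    so every index entry has to be examined."""
    return [row_id for _last, f, row_id in INDEX if f == first]


by_last = seek_by_last_name("Hawkins")
by_first = scan_by_first_name("Joe")
```

The two "Joe" entries are scattered through the index because it is ordered by last name first, which is why a search on first name alone cannot seek.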

Indexes (cont.)

• Places to consider implementing indexes

– Primary Keys (required in most RDBMS)

– Foreign Keys

– Other ‘access fields’

• E.g., Customer phone number if used as a lookup field

• Look at data usage analysis for other potential targets

– Fields in WHERE clause of SQL statements

– Fields in ORDER BY clause of SQL query

Indexes (cont.)

• Contraindications for indexes

– Very little variation among the attribute values in the indexed field(s)

• Class (Freshman, Sophomore, etc.)

• Gender

– Many null values in the indexed field(s)

– Small tables (Index may be as large as the table)

Index Benefits

• Avoid table scan

• Quick location of record address, then one page read to get data

– Small row size for each index entry, so many fewer page reads to find record address

– B-tree algorithm discards high percentage of records with each level of the index pages evaluated

• SQL stops looking when it knows it has finished; indices can determine this

• Indexes may be used for IF EXISTS queries without accessing data pages
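
A back-of-the-envelope comparison of page reads for one lookup. All numbers are hypothetical: one million rows, 40 data rows per page, and 400 index entries per page. The model also assumes a perfectly packed tree, so treat the result as an order-of-magnitude illustration only.

```python
import math


def table_scan_page_reads(row_count, rows_per_page):
    """A pure scan reads every data page."""
    return math.ceil(row_count / rows_per_page)


def index_levels(row_count, entries_per_page):
    """Index levels needed so that a single root page can reach every row."""
    levels, reach = 1, entries_per_page
    while reach < row_count:
        reach *= entries_per_page
        levels += 1
    return levels


def index_lookup_page_reads(row_count, entries_per_page):
    """One page per index level, plus one read of the data page itself."""
    return index_levels(row_count, entries_per_page) + 1


scan_reads = table_scan_page_reads(1_000_000, 40)
index_reads = index_lookup_page_reads(1_000_000, 400)
```

Under these assumptions the scan reads 25,000 pages while the index lookup reads 4, because each index level narrows the search to one page out of hundreds.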

Index Costs

• Extra storage space

• Each table index must be updated with each data modification to the table

– Increased processing time

• Easy to implement and sometimes overused

Index Tricks and Techniques

• Consider dropping and then rebuilding indices when bulk updates are required

• Nonclustered indices can have additional data included in the leaf node

– Avoid retrieval of main data page

– Increases index size and therefore reduces efficiency