
David Simpson, Themis Inc.

SQL Optimization & Access Paths: What’s Old & New

Part 1

© 2008 Themis, Inc. All rights reserved.

David Simpson is currently a Senior Technical Advisor at Themis Inc. He teaches courses on SQL, Application Programming, DB2 Administration as well as performance and tuning. He has supported transactional systems that use DB2 for z/OS databases in excess of 10 terabytes. David has worked with DB2 for 14 years as an application programmer, DBA and technical instructor. David is a certified DB2 DBA on both z/OS and LUW. David was voted Best User Speaker and Best Overall Speaker at IDUG North America 2006. He was also voted Best User Speaker at IDUG Europe 2006.


Disclaimer

In other words….IT DEPENDS!

The content of this presentation reflects my personal experience and is a product of the systems and applications I have worked with. Your results may vary. My results may or may not be typical.

“Themis makes no representation, warranties or guarantees whatsoever in relationship to the information contained in this presentation. This presentation is provided solely to share information with the audience relative to the subject matter contained in the presentation and is not intended by the presenter or Themis to be relied upon by the audience of this presentation.”


Goals of Optimization

• Improve overall application performance by:
   • Reducing CPU used by DB2
   • Reducing I/O done by DB2
   • Reducing contention

Optimization is essentially about these three things. Accomplishing any of them will likely improve the performance of an application.


Reducing I/O in DB2

• Appropriate use of indexes

• System tuning

• Bufferpool tuning

• Early elimination of data from consideration (i.e. get a good access path)

Reducing I/O within DB2 is one of the easiest ways to improve the performance of SQL within an application.

Creating appropriate indexes on columns or groups of columns that are commonly used to identify needed data can significantly reduce the I/O and CPU needed to retrieve a result.

It is also important that the DB2 system itself be configured optimally for the workload that must be supported. One important component of system tuning is the bufferpools. Bufferpools exist to reduce the amount of I/O needed by applications. Bufferpools should be configured to allow for as much reuse of cached data as possible. Objects may be grouped by sequential and random access patterns and the settings of the pools adjusted accordingly. In some cases a system tuning effort can reap significant rewards.

No amount of system tuning, however, can recover the resources wasted by a poor database design or poor access paths generated by the DB2 optimizer. This course focuses on optimization and tuning at the SQL level. In general, we want the optimizer to generate an access path that eliminates as much data from consideration as early as possible in the process.


Determining What to Tune

• Trace Data
• Accounting and Statistics Reports
• Online Monitor “Top 10” Lists

• Critical Path Queries
   • May execute only once per day, but run for 2 hours

• Ad Hoc Queries

Determining which SQL statements should be targets of optimization is always a challenge. Most monitoring tools have reports that will generate a top “n” list of statements and their cost for a time period. This can be done either with the monitor’s own historical data store, or by summarizing trace data from SMF or GTF.

Some queries require tuning even if they don’t run very often. Batch SQL in the critical path may warrant a tuning effort even if it only executes once a day. Ad Hoc queries as part of a reporting system or data warehouse may also need to be optimized as they enter the system.


The DB2 Optimizer

[Diagram: Catalog Statistics, Object Definitions and an optional Access Path Hint feed the optimizer, which produces the Access Path.]

Through the Data Manipulation Language (DML) the user of a DB2 database supplies the “WHAT”; that is, the data that is needed from the database to satisfy the business requirements. DB2 then uses the information in the DB2 Catalog to resolve “WHERE” the data resides. The DB2 Optimizer is then responsible for determining the all-important “HOW”: the most efficient way to access the data.

Ideally, the user of a relational database is not concerned with how the system accesses data. This is probably true for an end user of DB2, who writes SQL queries quickly for one-time or occasional use. It is less true for developers who write application programs and transactions, some of which will be executed thousands of times a day. For these cases, some attention to DB2 access methods can significantly improve performance. DB2’s access paths can be influenced in four ways:

♦ By rewriting a query in a more efficient form.
♦ By creating, altering, or dropping indexes.
♦ By updating the catalog statistics that DB2 uses to estimate access costs.
♦ By utilizing Optimizer Hints.


Explain

EXPLAIN PLAN SET QUERYNO = 10 FOR
SELECT LASTNAME, SALARY
FROM EMP
WHERE EMPNO BETWEEN '000000' AND '099999'
AND SALARY < 40000

OR

BIND PACKAGE with option EXPLAIN(YES)

[Diagram: the Optimizer writes its output to the following tables]

PLAN_TABLE
DSN_STATEMNT_TABLE
DSN_FUNCTION_TABLE
& a bunch of super-secret “hidden” tables

The process of asking the DB2 optimizer to describe an access path that was chosen (or will be chosen) for a query is called an explain. When we run an explain, the output is placed in DB2 tables that we may then view.

A PLAN_TABLE is a regular DB2 table that holds the results of an EXPLAIN. IBM’s SQL Reference Guide contains the format of the PLAN_TABLE and a description of all its columns. Each user running an explain needs access to a plan table, either by owning one directly or through a secondary authid. In DB2 Version 8, aliases may also be used to allow users to share a single set of explain tables.
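As a minimal sketch of that round trip, the statements below explain the query from the slide and then read its rows back. It assumes a PLAN_TABLE exists under the current SQLID and that QUERYNO 10 is not already used for another statement; the column list is only a convenient subset of the table.

```sql
-- Explain one statement; QUERYNO = 10 is the tag used to find its rows later.
EXPLAIN PLAN SET QUERYNO = 10 FOR
  SELECT LASTNAME, SALARY
  FROM EMP
  WHERE EMPNO BETWEEN '000000' AND '099999'
    AND SALARY < 40000;

-- Read the access path back from the plan table (assumed to be owned
-- by the current SQLID, so it is left unqualified here).
SELECT QUERYNO, QBLOCKNO, PLANNO, METHOD, TNAME,
       ACCESSTYPE, MATCHCOLS, ACCESSNAME, INDEXONLY, PREFETCH
FROM PLAN_TABLE
WHERE QUERYNO = 10
ORDER BY TIMESTAMP, QBLOCKNO, PLANNO, MIXOPSEQ;
```

If the same QUERYNO is explained more than once, the TIMESTAMP column separates the runs.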

Although the plan table is required to run explains, there are also several other explain tables which may optionally be created to hold explain data. These extra tables will be populated during an explain if they exist.

The DSN_STATEMNT_TABLE contains information about the total perceived cost of the query being explained. This cost data may be compared for several iterations of refinement for a query to see if it might improve the performance. Costs may also be tracked over time.
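A hedged example of reading those cost estimates follows. It assumes a DSN_STATEMNT_TABLE exists under the current SQLID and that the statement of interest was explained with QUERYNO = 10 (an illustrative value).

```sql
-- One row per explained statement per EXPLAIN run.
-- PROCMS is the estimated processor cost in milliseconds and
-- PROCSU the estimate in service units; COST_CATEGORY 'B' means
-- DB2 had to fall back on default values for part of the estimate,
-- and REASON says why.
SELECT QUERYNO, EXPLAIN_TIME, STMT_TYPE,
       COST_CATEGORY, PROCMS, PROCSU, REASON
FROM DSN_STATEMNT_TABLE
WHERE QUERYNO = 10
ORDER BY EXPLAIN_TIME;
```

Ordering by EXPLAIN_TIME puts successive rewrites of the same query side by side. The numbers are estimates, so they are best used for relative comparison rather than as predictions of elapsed time.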

Additionally, there are many more “hidden” explain tables that will be populated if they exist. IBM has chosen not to document the contents of these tables, but they are used by Visual Explain in DB2 Version 8 to display much more detailed analysis of a query.


Retrieving Rows From a Plan Table

SELECT * FROM MY.PLAN_TABLE
WHERE PROGNAME = 'PACK1'
AND COLLID = 'COLL1'
AND VERSION = 'PROD1'
ORDER BY TIMESTAMP, QUERYNO, QBLOCKNO, PLANNO, MIXOPSEQ;

Several processes can insert rows into the same plan table. To understand access paths, you must retrieve the rows for a particular query in an appropriate order.

Retrieving Rows for a Plan or Package

The rows for a particular plan are identified by the value of APPLNAME; the rows for a particular package are identified by PROGNAME, COLLID and VERSION. The PLAN_TABLE query above returns the rows for all the explainable statements in a package in their logical order.

What the ORDER BY Clause Illustrates

It is important to order the rows selected from the PLAN_TABLE so that the steps of the access path appear in their logical sequence. The result of the ORDER BY clause shows whether there are:

• Multiple QBLOCKNOs within a QUERYNO
• Multiple PLANNOs within a QBLOCKNO
• Multiple MIXOPSEQs within a PLANNO

All rows with the same non-zero value for QBLOCKNO and the same value for QUERYNO relate to a step within the query. QBLOCKNOs are not necessarily executed in the order shown in PLAN_TABLE. But within a QBLOCKNO, the PLANNO column gives the substeps in the order they execute. For each substep, the TNAME column identifies the table accessed. Sorts can be shown as part of a table access or as a separate step.
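The sketch below applies the same ordering to a plan rather than a package and selects only the columns discussed here, including the sort indicators. The owner MY and the plan name PLAN1 are placeholders.

```sql
-- Rows for one plan are selected by APPLNAME.
-- The SORTN_* / SORTC_* flags show sorts performed on the new or
-- composite table as part of a step.
SELECT QUERYNO, QBLOCKNO, PLANNO, MIXOPSEQ,
       METHOD, TNAME, ACCESSTYPE, MATCHCOLS, ACCESSNAME,
       SORTN_JOIN, SORTC_JOIN, SORTC_ORDERBY, SORTC_GROUPBY
FROM MY.PLAN_TABLE
WHERE APPLNAME = 'PLAN1'
ORDER BY TIMESTAMP, QUERYNO, QBLOCKNO, PLANNO, MIXOPSEQ;
```

Reading the result top to bottom, each change in QBLOCKNO starts a new query block, and PLANNO counts the steps within it.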


Visual Explain for Version 8

• Download from http://www-306.ibm.com/software/data/db2/zos/osc/ve

• Requires DB2 Connect access either via Personal Edition or the DB2 client and a DB2 Connect EE gateway.

• VE Version 8 will work with DB2 Version 7 with loss of some functionality.

• Requires all external and “hidden” explain tables be created. VE will prompt you to create them if they don’t exist for your SQLID. DBA authority will be needed to do this.

Visual Explain is a free tool provided by IBM to help with optimization and SQL tuning. Visual Explain was first introduced in Version 7. The original tool essentially displayed the data from the plan table in a graphical format with some additional data provided from the DSN_STATEMNT_TABLE and the DSN_FUNCTION_TABLE. Visual Explain is a Windows-based tool that requires DB2 Connect (either directly or through a gateway) to connect to DB2 on z/OS.

In Version 8, Visual Explain has been re-written and enhanced to provide additional information that cannot be obtained by looking at the plan table alone. Visual Explain uses additional explain tables that provide enhanced data for optimization. These additional tables are not documented, but may be viewed after an explain is done. These additional tables are populated when present even if the explain is not performed with Visual Explain.

All 12 tables are required for Visual Explain to perform an explain. The user must have the tables created under their own ID or be part of a secondary authid that has the tables available.


Starting Visual Explain

After launching Visual Explain, the “List Databases” window should come up with a list of all datasources configured for your client. Select the one you wish to use and click the “Connect” button. Log on with your userid and password.

Once connected, you may use any of the functionality of Visual Explain against the subsystem. The most commonly used function is the “Tune SQL” button, which is on the button bar and may also be accessed through the “Tools” menu option.


Tuning SQL with Visual Explain

[Screenshot callouts: Connected; Open SQL Tuning Window]


Enter SQL Statement

[Screenshot callouts: Explain Tables to Use; Creator for Unqualified Tables; SQL]

The SQL Tuning window provides a place to enter an SQL statement. The SQLID should be set to the owner of the explain tables to be used. This must either be the userid or a secondary authid to which the user is connected. The schema box may be edited to provide a qualifier to be used for any references in the SQL statement to unqualified tables. The Current Degree dropdown box specifies whether parallelism should be considered when optimizing the statement. A value of “1” means no parallelism will be used. A value of “ANY” means that parallelism will be considered. If “System Default” is specified then the system’s default parallel mode will be used.

The “Execute” button will actually run the query and present results in a grid window. The “Analyze” button will provide statistics recommendations for the query. The “Explain” button will explain the query and present the access path graph as a result.
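Outside Visual Explain, the same parallelism choice can be sketched with the CURRENT DEGREE special register before a dynamic EXPLAIN. The QUERYNO value is arbitrary, the query is borrowed from the earlier EXPLAIN slide, and a PLAN_TABLE under the current SQLID is assumed.

```sql
-- Let the optimizer consider parallelism for dynamic SQL in this session.
SET CURRENT DEGREE = 'ANY';

EXPLAIN PLAN SET QUERYNO = 20 FOR
  SELECT LASTNAME, SALARY
  FROM EMP
  WHERE SALARY < 40000;

-- ACCESS_DEGREE and PARALLELISM_MODE are null when no parallelism was planned.
SELECT QUERYNO, PLANNO, TNAME, ACCESSTYPE,
       ACCESS_DEGREE, PARALLELISM_MODE
FROM PLAN_TABLE
WHERE QUERYNO = 20
ORDER BY TIMESTAMP, QBLOCKNO, PLANNO;

-- Switch parallelism back off.
SET CURRENT DEGREE = '1';
```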


Enter SQL Statement

[Screenshot callouts: Get Stats Recommendation; Explain the Query; Run the Query; Parallelism?]


Visual Explain Access Path Graph

The access path graph shows the explain data for the query. Access path graphs are read left to right, bottom to top. Each node on the graph represents a source of data or an operation on data as it moves toward the result set. Clicking a node displays details about that node on the left side of the screen.


Visual Explain Access Path Graph

Sources of Data

In this graph, there are two nodes that indicate they are a source of data. Data is accessed by doing an index scan of the XEMP03 index. Once appropriate rows are identified then the data in the EMP table is retrieved using the identifiers from the index.

If one of these nodes is highlighted by clicking on it, the catalog data about the object is displayed on the left side of the screen. Statistical information is displayed, as well as the timestamp when statistics were last gathered.

By navigating the tree at the top left of the screen, information may be viewed about the table, tablespace and any other indexes that exist on the referenced table.


Visual Explain Access Path Graph

The other nodes on the diagram represent operations on the data. If one of these nodes is selected the left side of the screen will show the metrics that DB2 used in determining that this was the appropriate access method. Predicate level data is shown as well as row estimates for how many rows will be passed to the next operation. These estimates may then be compared to reality to determine if the optimizer made a good choice.


Tablespace Scan

SELECT . . . FROM EMP WHERE HIREDATE > ? ;

Plan Table

PLANNO  METHOD  TNAME  ACCESSTYPE  MATCHCOLS  ACCESSNAME  INDEXONLY  PREFETCH
1       0       EMP    R           0                      N          S

[Visual Explain access path graph]

Tablespace scans are illustrated through EXPLAIN by ACCESSTYPE = R and PREFETCH = S.

The query above illustrates a tablespace scan. The query has a predicate; however, there is no index on the HIREDATE column that the predicate could match.

When Tablespace Scans are Appropriate

Tablespace scan access is typically selected by DB2 when:

♦ A matching index scan is not possible because there are no indexes or there are no predicates that match the index columns.
♦ A high percentage of the rows in the table qualify.
♦ The indexes that have matching predicates have low cluster ratios, making them efficient only when a small number of rows qualify.

The Visual Explain access path graph for a tablespace scan is also shown. Notice that Visual Explain gives estimates of how many rows will remain at each level.
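To find every tablespace scan in an explained package, a filter on ACCESSTYPE can be added to the plan table query. This is a sketch only; the owner, package, collection and version names are the placeholder values used earlier in this presentation.

```sql
-- List the steps of one package that use a tablespace scan.
-- PREFETCH = 'S' on these rows indicates sequential prefetch.
SELECT QUERYNO, QBLOCKNO, PLANNO, TNAME, ACCESSTYPE, PREFETCH
FROM MY.PLAN_TABLE
WHERE PROGNAME = 'PACK1'
  AND COLLID = 'COLL1'
  AND VERSION = 'PROD1'
  AND ACCESSTYPE = 'R'
ORDER BY TIMESTAMP, QUERYNO, QBLOCKNO, PLANNO;
```

A row in this list is not automatically a problem. As the criteria above suggest, a scan can be the right choice when most of the table qualifies.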


Tablespace Scan Detail – Visual Explain

[Screenshot callouts: Limited Part Scan; Prefetch]

If Visual Explain is used, more detail is provided by clicking on the TBSCAN node in the graph. The left side of the screen will show that sequential prefetch has been selected as well as a limited partition scan. The details also show which partitions will be scanned. Estimates of how many rows will actually qualify for the predicates involved are supplied along with the filter factors used by DB2 in making these estimates.

Filter factors and their importance in the optimization process will be discussed in detail in an upcoming chapter.


Index Structures

[Diagram: a three-level index tree above a table. The root page (level 2) holds the highest key of each nonleaf page (Page A, Page B). Each nonleaf page (level 1) holds the highest key of each leaf page beneath it (Page 1 through Page X under Nonleaf Page A, Page Z under Nonleaf Page B). Each leaf page (level 0) holds key and record-ID pairs that point to rows in the table.]

Indexes can have multiple levels of pages. Index pages that point directly to the table data are called leaf pages. If an index has more than one leaf page, it will have at least one nonleaf page, containing the entries that point to leaf pages. If an index has more than one nonleaf page, the nonleaf pages that point to the leaf pages are referred to as level 1. A second level of nonleaf pages must point to level 1, and so on. The highest level contains a single page, called the root page. This page is created by DB2 when the index is initially built. This index tree then points directly to the table data through the key and the RID (record identifier).

Typically, the larger the key data component of the index, the more levels there will be in the index tree. This is due to the page structure. There are a fixed number of bytes of data that can be stored on any given page. Typically, the more levels to an index, the less likely DB2 will use the index for matching index access.
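The number of levels and leaf pages DB2 has recorded for an index can be checked in the catalog. The query below is a sketch; MYSCHEMA is a placeholder for the table's creator, and the values are only as current as the last statistics collection (see STATSTIME).

```sql
-- Index size statistics for every index on EMP.
-- NLEVELS = number of levels in the index tree, NLEAF = number of leaf pages.
SELECT NAME, UNIQUERULE, COLCOUNT,
       NLEVELS, NLEAF, FIRSTKEYCARDF, FULLKEYCARDF, STATSTIME
FROM SYSIBM.SYSINDEXES
WHERE TBCREATOR = 'MYSCHEMA'
  AND TBNAME = 'EMP'
ORDER BY NAME;
```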

Indexes provide many performance advantages, such as direct access to data, avoiding sorts, enforcing uniqueness, clustering, speeding RI checks and assisting in joins. However, it is important to remember some of the costs associated with indexes:

• A row insert requires an insert to every index on that table.
• A row delete requires a delete to every index on that table.
• An update of an indexed column requires a delete and an insert on indexes referencing that column.
• When tables are reorganized or loaded, each index on the table must be rebuilt.


Index Structures – Leaf Pages

Index on (LASTNAME, FIRSTNME, MIDINIT)

[Diagram: two leaf pages of the index. Entries are stored in key order. The first page holds Smith, Abby, A; Smith, Bubba, Z; Smith, David, C; Smith, Ed, B; Smith, Joe, A. The second page holds Smith, Mary, N; Smith, Nancy, Z; Smith, Nate, C; Smith, Olivia, B; Smith, Traci, A. Each entry is paired with a Record ID (RID), the physical location of the row.]


Index Scan - Matching

SELECT * FROM EMP
WHERE LASTNAME = 'Coldsmith'
AND FIRSTNME = 'Nichelle';

Index XEMP03 = (LASTNAME, FIRSTNME, MIDINIT)

[Diagram: Root Page, Non-Leaf Pages, Leaf Pages, Data Pages]

1) Root page is read to determine corresponding non-leaf page
2) Non-leaf page is read to determine corresponding leaf page
3) Leaf page(s) are read to determine RID of corresponding data row(s)
4) Data Pages are returned

Index Structure is utilized by reading some Index Pages and their corresponding Data Pages.

A matching index scan means there are column(s) specified in the predicate(s) that match the leading column(s) specified in the index. These predicates provide filtering capabilities. The higher the degree of filtering, the more efficient the matching index access becomes. The general rules for determining the number of matching columns are fairly straightforward, although there are a few exceptions.

♦ The index columns are examined from leading to trailing. For each index column, DB2 will search for an indexable predicate on that column. If this predicate exists, then it can be used as a matching predicate.
♦ If no matching predicate is found for a column, the search for matching predicates stops.
♦ If a matching predicate is a range predicate, there can be no more matching columns.

The example above illustrates a composite index on LASTNAME, FIRSTNME and MIDINIT. Because the first column of the index is referenced in an indexable predicate, DB2 is able (if it chooses) to use the index in a matching mode. Also, the existence of FIRSTNME in an indexable predicate enables DB2 to use two columns of the index for matching.
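The following sketch works through the matching rules against the XEMP03 index with a few illustrative predicates. The MATCHCOLS values in the comments are what the general rules predict, not captured output, and the optimizer may still choose a different access path altogether. The QUERYNO values are arbitrary and a PLAN_TABLE under the current SQLID is assumed.

```sql
-- Index assumed: XEMP03 on (LASTNAME, FIRSTNME, MIDINIT)

-- Equal predicates on all three columns: the rules predict MATCHCOLS = 3.
EXPLAIN PLAN SET QUERYNO = 31 FOR
  SELECT * FROM EMP
  WHERE LASTNAME = ? AND FIRSTNME = ? AND MIDINIT = ?;

-- A range predicate on the second column ends matching there: MATCHCOLS = 2.
EXPLAIN PLAN SET QUERYNO = 32 FOR
  SELECT * FROM EMP
  WHERE LASTNAME = ? AND FIRSTNME > ? AND MIDINIT = ?;

-- No predicate on FIRSTNME, so matching stops after LASTNAME: MATCHCOLS = 1.
EXPLAIN PLAN SET QUERYNO = 33 FOR
  SELECT * FROM EMP
  WHERE LASTNAME = ? AND MIDINIT = ?;

-- No predicate on the leading column: MATCHCOLS = 0.
EXPLAIN PLAN SET QUERYNO = 34 FOR
  SELECT * FROM EMP
  WHERE FIRSTNME = ? AND MIDINIT = ?;

-- Compare the predictions with what the optimizer actually chose.
SELECT QUERYNO, ACCESSTYPE, MATCHCOLS, ACCESSNAME
FROM PLAN_TABLE
WHERE QUERYNO BETWEEN 31 AND 34
ORDER BY QUERYNO, QBLOCKNO, PLANNO;
```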


Index Scan - Matching

SELECT * FROM EMP
WHERE LASTNAME = 'Coldsmith'
AND FIRSTNME = 'Nichelle';

PLAN_TABLE

PLANNO  METHOD  TNAME  ACCESSTYPE  MATCHCOLS  ACCESSNAME  INDEXONLY  PREFETCH
1       0       EMP    I           2          XEMP03      N

Matching index scans are depicted in the PLAN_TABLE by ACCESSTYPE = I, I1, N, or MX and MATCHCOLS > 0. For a matching index scan, DB2 has determined that the query uses predicates that match index columns. In general, the matching predicates on the leading index columns are equal or IN predicates. The predicate that matches the final index column can be an equal, IN, or a range predicate (<, <=, >, >=, LIKE, or BETWEEN).

The query above illustrates matching index access. Assume the table EMP has an index, XEMP03, on (LASTNAME, FIRSTNME, MIDINIT). The index XEMP03 is the chosen access path for this query, with MATCHCOLS = 2. There are two equal predicates on the first two columns of the index.

In Visual Explain the IXSCAN detail shows which predicates were used to match columns along with their filter factors. Row estimates are computed and displayed based on the available statistics for the table and index. The value of MATCHCOLS shows the number of index columns DB2 was able to match to predicates in the query. Typically, index access will be more efficient the greater the number of matching columns.

Effort spent on proper index design can yield a large return on investment, because it determines how well DB2 can match indexes to query predicates.
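For reference, the index assumed throughout these examples could be defined roughly as follows. This is a sketch: table and index qualifiers and all storage clauses are omitted, and the column order is the design decision that matters for matching.

```sql
-- Composite index used in the examples. Columns that most often appear
-- in equal predicates lead the key, so more columns can be matched
-- before a missing or range predicate stops the matching.
CREATE INDEX XEMP03
  ON EMP (LASTNAME ASC, FIRSTNME ASC, MIDINIT ASC);

-- After creating an index, catalog statistics should be refreshed
-- (for example with the RUNSTATS utility) so the optimizer has
-- current information about it.
```

With a different leading column, the query above would match fewer columns or none, so the order should follow the predicates the workload actually uses.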


Index Screening

SELECT * FROM EMP
WHERE LASTNAME = ?
AND MIDINIT = ?

INDEX XEMP03 on (LASTNAME, FIRSTNME, MIDINIT)

[Callout on AND MIDINIT = ?: Index Screening Predicate]

PLAN_TABLE

PLANNO  METHOD  TNAME  ACCESSTYPE  MATCHCOLS  ACCESSNAME  INDEXONLY  PREFETCH
1       0       EMP    I           1          XEMP03      N

Index screening predicates are specified on index key columns, but are not part of the matching columns used to scan the index structure. These screening predicates improve index access by reducing the number of rows that qualify while searching the index.

Assume the table EMP has an index, XEMP03, on (LASTNAME, FIRSTNME, MIDINIT).

The query above illustrates DB2’s ability to use only one of the two predicates for matching against the index, i.e. MATCHCOLS = 1. Once DB2 determines that a key entry matches the predicate LASTNAME = ?, the predicate MIDINIT = ? can be applied during the index scan to further qualify rows. This is the process known as index screening. If a row meets the criteria of these screening predicates, the row will be retrieved. Once the data row has been retrieved, predicates for columns not in the index can be applied.

When Index Screening is used

♦ When there are predicates available to apply against columns in the index to further qualify rows.
♦ The PLAN_TABLE does not directly tell when an index is screened. However, if MATCHCOLS is less than the number of index key columns, index screening is possible. Visual Explain does flag predicates where index screening is used.
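Since the PLAN_TABLE alone does not flag screening, one way to find candidate steps is to compare MATCHCOLS with the number of key columns recorded in the catalog. The query below is a sketch of that comparison; MY is a placeholder owner, and a qualifying row means only that screening is possible, not that it happened.

```sql
-- Index access steps that match fewer columns than the index contains.
SELECT P.QUERYNO, P.QBLOCKNO, P.PLANNO, P.TNAME,
       P.ACCESSNAME, P.MATCHCOLS, I.COLCOUNT
FROM MY.PLAN_TABLE P
JOIN SYSIBM.SYSINDEXES I
  ON I.CREATOR = P.ACCESSCREATOR
 AND I.NAME = P.ACCESSNAME
WHERE P.ACCESSTYPE = 'I'
  AND P.MATCHCOLS < I.COLCOUNT
ORDER BY P.TIMESTAMP, P.QUERYNO, P.QBLOCKNO, P.PLANNO;
```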


Index Scan - Nonmatching

SELECT * FROM EMP
WHERE FIRSTNME = ?
AND MIDINIT = ?;

Index XEMP03 = (LASTNAME, FIRSTNME, MIDINIT)

[Diagram: Root Page, Non-Leaf Pages, Leaf Pages, Data Pages]

1) Leaf Pages are scanned to acquire corresponding RIDs
2) Data Pages are returned in index order

Index Leaf Pages are read and their corresponding Data Pages are read

A nonmatching index scan means there are no matching columns in the index. Because a nonmatching index scan does not utilize the index structure, it is sometimes referred to as relative positioning.

The example above illustrates a composite index on LASTNAME, FIRSTNME and MIDINIT. Because the first column of the index is not referenced in the WHERE clause, DB2 is unable to use the index in a matching mode. However, the existence of FIRSTNME and MIDINIT in the WHERE clause does give DB2 the ability to use index screening. Through a screening process, DB2 can use a nonmatching index scan to “pick off” the data rows associated with the desired criteria.

The physical order in which a table's data pages are stored is important. DB2 uses the CLUSTERRATIOF statistic to determine the effectiveness of the index for such access.

DB2 might also choose to scan a nonmatching index, in order to avoid a sort operation or to evaluate a stage 1 predicate. Typically, the CLUSTERRATIOF must be fairly high for this type of access strategy.


Index Scan - Nonmatching

SELECT * FROM EMP
WHERE FIRSTNME = ?
AND MIDINIT = ?;

PLAN_TABLE

PLANNO  METHOD  TNAME  ACCESSTYPE  MATCHCOLS  ACCESSNAME  INDEXONLY  PREFETCH
1       0       EMP    I           0          XEMP03      N

Nonmatching Index scans are described through EXPLAIN by ACCESSTYPE = I and MATCHCOLS = 0.

In Visual Explain it is possible to see which columns are used as screening predicates in a nonmatching index scan. Notice that MATCHCOLS shows up as zero in both the PLAN_TABLE and Visual Explain.

When Nonmatching Index Access is used

Because there is little or no filtering, a nonmatching index scan is used in only a few special cases.

♦ When index screening is provided. In this case not all the data pages are accessed, only those that DB2 has determined qualify based on the screening.
♦ When the OPTIMIZE FOR n ROWS clause is used in conjunction with an ORDER BY clause and the index can support the ordering.
♦ When there is more than one table in a nonsegmented tablespace, the nonmatching index scan can provide access to rows of that table.
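The OPTIMIZE FOR n ROWS case can be sketched as below. The query has no predicate on any index column, but its ORDER BY follows the XEMP03 key order. Whether DB2 actually picks a nonmatching scan depends on statistics and other factors, so the expectation in the comments is an assumption to verify; QUERYNO 40 and the value 10 are arbitrary.

```sql
-- The application intends to fetch only the first few rows in name order.
EXPLAIN PLAN SET QUERYNO = 40 FOR
  SELECT EMPNO, LASTNAME, FIRSTNME, MIDINIT, SALARY
  FROM EMP
  ORDER BY LASTNAME, FIRSTNME, MIDINIT
  OPTIMIZE FOR 10 ROWS;

-- A nonmatching scan of XEMP03 would appear as ACCESSTYPE = 'I' with
-- MATCHCOLS = 0, and SORTC_ORDERBY = 'N' would show that the sort
-- was avoided.
SELECT QUERYNO, PLANNO, TNAME, ACCESSTYPE, MATCHCOLS,
       ACCESSNAME, SORTC_ORDERBY
FROM PLAN_TABLE
WHERE QUERYNO = 40
ORDER BY TIMESTAMP, QBLOCKNO, PLANNO;
```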


Index Structures – Clustered Index

[Diagram: an index tree (root page, non-leaf pages, leaf pages) over four data pages of a table in a tablespace. Key values shown: 8, 13, 33, 45, 75, 86 and 25, 61.]

When a table has a clustering index, DB2 will, during an INSERT, place the data row as closely as possible to the order of the index values in the index structure. Because the order of the rows reflects the order of the index, significant performance advantages exist when performing certain operations such as grouping, ordering, and comparisons other than equal. DB2 uses a catalog statistic, CLUSTERRATIOF, to keep track of how closely the order of the index entries on the index leaf pages matches the actual order of the data on the data pages. In general, the closer the value of CLUSTERRATIOF is to 100%, the more closely the index entries and data rows are in the same clustered sequence.

The index structure above illustrates DB2’s access through a clustering index structure. This illustration depicts a CLUSTERRATIOF of 100%. Note that to access the data in index order, the data pages are read in sequential order.

Index Structures – NonClustered Index

[Figure: index tree with a root page, non-leaf pages, and leaf pages pointing to rows scattered across four data pages of the table in the table space. Key values shown: 25, 61 and 8, 13, 33, 45, 75, 86.]

The index structure above illustrates DB2’s access through a nonclustering index structure. This illustration depicts a CLUSTERRATIOF far less than 100%. Note that to access the data in index order, the data pages are read not in sequential order, but in random order, and in many cases a data page must be reread to access data containing the next key value.

Nonclustered indexes are typically used by DB2 for random access to data rows.
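The difference between the two structures can be made concrete with a rough Python sketch. The key values come from the slides, but the page assignments and the one-page buffer are assumptions chosen for illustration; this is not how DB2 computes CLUSTERRATIOF or manages its buffer pool.

```python
def page_reads_in_index_order(key_to_page):
    """Count data page reads when rows are fetched in ascending key order.

    A read is counted whenever the next row sits on a different page than
    the row just fetched (a one-page buffer, a deliberate simplification).
    """
    reads = 0
    current_page = None
    for key in sorted(key_to_page):
        page = key_to_page[key]
        if page != current_page:
            reads += 1
            current_page = page
    return reads


# Key values taken from the slides; page numbers are hypothetical.
clustered = {8: 1, 13: 1, 33: 2, 45: 2, 75: 3, 86: 3}
nonclustered = {8: 3, 13: 1, 33: 3, 45: 2, 75: 1, 86: 2}

clustered_reads = page_reads_in_index_order(clustered)
nonclustered_reads = page_reads_in_index_order(nonclustered)
print(clustered_reads, nonclustered_reads)
```

With these assumed layouts the clustered case reads each of the three pages once, while the nonclustered case reads every page twice.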

Index Only Access

SELECT LASTNAME, FIRSTNME, MIDINIT
FROM EMP
WHERE LASTNAME LIKE 'JO%'

If all the columns needed for a particular table in a query are available in an index, the optimizer may be able to qualify and retrieve the columns from the index without going to the tablespace at all. This is called index only access and may provide a significant performance improvement, particularly when many rows need to be evaluated using a non-clustered index.

In this example only 1 column in XEMP03 is being used to qualify rows, but placing the FIRSTNME and MIDINIT columns in the index provides significant benefit to this query.

Note that there is no table node for the EMP table in this diagram since only the index is used.
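As a minimal sketch of why no table access is needed, the Python model below treats XEMP03 as a sorted list of key tuples. The key order (LASTNAME, FIRSTNME, MIDINIT), the sample names, and the function index_only_like are assumptions for the example; the notes only state that the three columns are in the index.

```python
from bisect import bisect_left

# Assumed key order for XEMP03: (LASTNAME, FIRSTNME, MIDINIT).
# The entries are hypothetical sample data.
XEMP03 = sorted([
    ("ADAMS", "ANNE", "L"),
    ("JOHNSON", "SYBIL", "P"),
    ("JONES", "WILLIAM", "T"),
    ("JOYCE", "MARY", "K"),
    ("SMITH", "DANIEL", "S"),
])


def index_only_like(index_entries, prefix):
    """Return (LASTNAME, FIRSTNME, MIDINIT) rows for LASTNAME LIKE 'prefix%'.

    Every selected column is part of the index entry, so the result is
    built from the index alone and no data page is read.
    """
    # Position on the first entry whose leading column is >= the prefix.
    start = bisect_left(index_entries, (prefix,))
    rows = []
    for entry in index_entries[start:]:
        if not entry[0].startswith(prefix):
            break  # keys are ordered, so no later entry can qualify
        rows.append(entry)
    return rows


rows = index_only_like(XEMP03, "JO")
print(rows)
```

Only LASTNAME is used to position the scan, yet FIRSTNME and MIDINIT come back without any lookup in the table, which is the benefit described above.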

Index Scan – List Prefetch

SELECT . . . .
FROM EMP
WHERE DEPTNO = 'P01';

INDEX XEMP02 on DEPTNO

1) A list of RIDs for data pages is accessed by a matching index scan
2) RID list is sorted in ascending sequence by data page number
3) Data pages are prefetched in order of the sorted RID list

[Figure: index XEMP02 (root page, non-leaf pages, leaf pages) feeding a RID sort, followed by the data pages.]

Qualifying RIDs are sorted in ascending order by data page number prior to row retrieval

PLANNO  METHOD  TNAME  ACCESSTYPE  MATCHCOLS  ACCESSNAME  INDEXONLY  PREFETCH
1       0       EMP    I           1          XEMP02      N          L

List Prefetch reads a set of data pages determined by a list of RIDs taken from a matching scan of one or more indexes. The data pages need not be contiguous. The maximum number of pages that can be read in a single list prefetch is 32.

The illustration above depicts List Prefetch during matching index access, with a single index. As with any matching index access the index structure is utilized to find the RIDs that qualify based on the indexable predicate(s). Once the RIDs have been determined at the Leaf Page level, the RIDs are sorted in Data Page sequence. The purpose of the RID sort is to avoid the rereading of data pages because the CLUSTERRATIOF value is very low.
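The three steps can be sketched in Python as follows. This is a simplified model under stated assumptions: RIDs are represented as hypothetical (page, slot) tuples, the function list_prefetch_batches is invented for the example, and the only figure taken from the notes is the limit of 32 pages per list prefetch.

```python
MAX_PAGES_PER_PREFETCH = 32  # limit quoted in the notes above


def list_prefetch_batches(rids, max_pages=MAX_PAGES_PER_PREFETCH):
    """Sort RIDs by data page, then group distinct pages into batches.

    rids is a list of (page, slot) tuples as collected from the index
    leaf pages, in index key order.
    """
    sorted_rids = sorted(rids)  # step 2: ascending data page number
    pages = []
    for page, _slot in sorted_rids:
        if not pages or pages[-1] != page:
            pages.append(page)  # each page appears once, so it is read once
    # Step 3: pages are requested in sorted order, max_pages at a time.
    batches = [pages[i:i + max_pages] for i in range(0, len(pages), max_pages)]
    return sorted_rids, batches


# Step 1: RIDs as they might come back from a matching index scan.
rids_from_index = [(40, 2), (7, 1), (40, 5), (12, 3), (7, 4), (95, 1)]

# A small max_pages is used here only to make the batching visible.
sorted_rids, batches = list_prefetch_batches(rids_from_index, max_pages=3)
print(sorted_rids)
print(batches)
```

Pages 7 and 40 each hold two qualifying rows but appear only once in the batches, which is the rereading that the RID sort is meant to avoid.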

Index Scan – List Prefetch

It should be noted that while the qualifying data pages are not necessarily contiguous, since the RIDs have been sorted in ascending sequence, the prefetch process is able to access qualifying data pages in an efficient and orderly manner.

When List Prefetch is used

♦ With a single index that has a cluster ratio less than 80%.
♦ Sometimes with indexes that have a high cluster ratio, if the estimated amount of data to be accessed is too small to make sequential prefetch efficient, but large enough to require more than one I/O.
♦ Always in conjunction with multiple index access.
♦ Always in conjunction with the inner table of a hybrid join.

When List Prefetch is not used

♦ Matching IN-list predicates cannot be used in conjunction with List Prefetch.
♦ The OPTIMIZE FOR 1 ROW clause will discourage List Prefetch for access
