
© 2012 IBM Corporation

September 13, 2012

Mark Rader

IBM ATS - DB2 for z/OS

[email protected]

DB2 Subsystem Health: Performance and Availability

Based on Statistics and Stories


Abstract

■ Title: DB2 Subsystem Health: Performance and Availability Based on Statistics and Stories

■ Abstract: Statistics reports and statistics traces can be used to analyze DB2 subsystem behavior and identify problems or risks for future problems. This session uses the OMPE Statistics Long report as a point of departure to discuss key subsystem indicators that you should be tracking. The statistics discussion leads to stories of real customer experiences, problems and solutions.


Statistics Long – Report and Traces

■ Used as prime indicator(s) for DB2 subsystem-related problems

■ Based on DB2 Statistics records (IFCIDs 1, 2 and 225)

■ Many of the DB2 statistics counters are running counters

■ So you need at least 2 statistics records for a trace or report

– Traces show delta values between 2 subsequent DB2 statistics records

– Reports summarize statistics over user-defined intervals
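
Because most counters are running totals, the value for an interval is the difference between two consecutive records. The following is a minimal Python sketch of that idea, assuming each statistics record has already been parsed into a dictionary; the counter names and values are made up for illustration, and counter wrap or a DB2 restart between records is not handled.

```python
def interval_deltas(records):
    """Yield per-interval deltas from consecutive snapshots of running counters."""
    for previous, current in zip(records, records[1:]):
        # Only counters present in both snapshots can be differenced.
        yield {name: current[name] - previous[name]
               for name in current if name in previous}


# Hypothetical snapshots of two running counters (illustrative values only).
snapshots = [
    {"commits": 1000, "prepares": 400},
    {"commits": 1600, "prepares": 650},
    {"commits": 2500, "prepares": 700},
]
deltas = list(interval_deltas(snapshots))
```

Three snapshots give two intervals; a single snapshot gives none, which is why at least two statistics records are needed.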


Statistics Long – Report and Traces

■ Contain numerous sections:

– SQL usage (DML, DCL, DDL, direct row access)

– Stored proc, triggers, UDFs

– EDM pool

– Subsystem services

– Open/Close activity

– Log activity

– Plan/package processing

– DB2 commands

– RID list processing

– Dynamic statement cache

– Authorization management

– Locking activity / Data-sharing locking

– Query parallelism

– CPU times

– DB2 IFI requests, IFC, and data capture

– DB2 latch counters

– Buffer pool and Group buffer pool activity

– DDF activity

– Storage statistics


Statistics Long – Report and Traces

■ Why should you sometimes run the statistics with REPORT and at other times with TRACE?

■ Answer: It depends on the situation. Some key questions are:

– What is the SMF interval?

– Are you looking for specific problems?

– If so, are the related fields accumulated, or are they a snapshot view taken when the record was cut?

• If the latter, a field probably does not contain information from the entire interval.

■ REPORT can work very well for the big picture

■ TRACE may be required to capture significant changes


Statistics Long – Report and Traces: Notes

■ The remainder of the material focuses on a few of the sections available in the statistics report

– Note: these reflect a particular version of OMPE, so some formatting may differ from what you will see.

– Other vendor monitors should be able to display similar data; formatting is likely to vary and field meanings may also vary

■ Keep in mind that some values are accumulated and some are the values at the time the statistics record is cut.

■ Be careful regarding values in relation to the interval period. For some items a wide interval period is very helpful for a bird's-eye view of the environment; for other items a wide interval period is worthless.

■ Be careful when looking at High Water Marks (HWM). When was the HWM set? Was it a second ago? A week ago? Several months ago? We cannot tell just by looking at the report.

■ HWM FAQ – What are some circumstances where the HWM is irrelevant because it was not set recently?

– The environment could have changed. For example, did you dynamically add or remove CPUs, memory, disk, etc.?

– Or maybe dynamically alter some ZPARM values? We can even move disk or data sets without taking a DB2 outage. Investigating HWM issues takes time and effort.


Statistics Long Report - Header

■ Key fields

– Interval

– Total threads

– Total commits


Statistics Long Report – SQL DML


■ How much SQL are you executing in this interval, and across how many threads?

■ Are you taking advantage of multi-row fetch or multi-row insert?

■ These values are important for comparing one reporting interval to another

– Did the mix of SQL DML change with the increase in thread volume, or was the ratio the same?


Statistics Long Report – EDM Pool, part 1


■ Pool full conditions?

– Consider increasing sizes

■ Some reports show ‘stealable pages’

– If stealable pages < 50% of the pool, consider increasing the allocation.


Statistics Long Report – EDM Pool, part 2


■ Hit ratios should approach 100%


Statistics Long Report – Dynamic Statement Cache


Dynamic Statement Caching

■ Avoids the high CPU cost of fully preparing a dynamic SQL statement for every occurrence

■ Used by dynamic SQL applications to reuse and share prepared statements

■ Conditions for reuse of SQL statement from dynamic statement cache

– SQL is dynamically prepared SELECT, UPDATE, DELETE or INSERT

– The statement text is identical, character for character (literals are problematic in V9)

– The authorization ID is the same

– REOPT(VARS) disables use of cache for that plan/package

■ Two levels of caching

– Global Dynamic Statement Cache (GDSC)

– Local Dynamic Statement Cache (LDSC)
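
The reuse conditions above explain why embedded literals defeat the cache. The following Python sketch is a toy model only, not DB2's implementation: it assumes a cache keyed on the authorization ID and the exact statement text, and the class, table and user names are hypothetical.

```python
class StatementCacheModel:
    """Toy model: a cache hit needs identical statement text and the same authorization ID."""

    def __init__(self):
        self._cache = {}
        self.full_prepares = 0
        self.short_prepares = 0

    def prepare(self, auth_id, statement_text):
        key = (auth_id, statement_text)  # character-for-character match, no normalization
        if key in self._cache:
            self.short_prepares += 1
        else:
            self._cache[key] = object()  # stand-in for the prepared statement
            self.full_prepares += 1


model = StatementCacheModel()
# Parameter marker: identical text every time, so only the first prepare is full.
for _ in range(3):
    model.prepare("APPUSER", "SELECT NAME FROM EMP WHERE ID = ?")
# Literals: each distinct value changes the text and forces another full prepare.
for emp_id in (1, 2, 3):
    model.prepare("APPUSER", f"SELECT NAME FROM EMP WHERE ID = {emp_id}")
```

In this model the parameter-marker statement costs one full prepare for three requests, while the literal form costs a full prepare for every distinct value.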


Dynamic Statement Caching

■ DB2 10 for z/OS provides literals replacement

– New PREPARE attribute ‘CONCENTRATE STATEMENTS WITH LITERALS’ (CSWL) treats literals like parameter markers

– JCC 9.7 Fix Pack 3: new connection property ‘enableLiteralReplacement=yes’

– ODBC CLI ini file: LITERALREPLACEMENT


DB2 Statement Caching – Overview: NO CACHING

[Figure: Program A issues PREPARE S, EXECUTE S, COMMIT, EXECUTE S, PREPARE S, EXECUTE S. Program B issues PREPARE S, EXECUTE S, EXECUTE S. Labels inside DB2 z/OS: full prepare (three times), prepared statement S (three times), delete, -514/-518.]


Global Dynamic Statement Cache: CACHEDYN=YES

[Figure: Program A issues PREPARE S, EXECUTE S, COMMIT, EXECUTE S, PREPARE S, EXECUTE S. Program B issues PREPARE S, EXECUTE S, EXECUTE S. Labels inside DB2 z/OS: SKDS S, full prepare (once), short prepare (twice), prepared statement S (three times), delete, -514/-518.]


Global Dynamic Statement Cache

■ Enabled by ZPARM CACHEDYN = YES

■ ZPARM EDMSTMTC specifies the global statement cache size above the bar

■ Statement text and executable of the prepared statement (SKDS) are cached for reuse across all threads

■ Only the first prepare is a full prepare; later prepares are short prepares, which copy the statement from the global cache into thread storage


Local Dynamic Statement Cache: CACHEDYN=YES, KEEPDYNAMIC(YES)

[Figure: Program A issues PREPARE S, EXECUTE S, COMMIT, EXECUTE S. Program B issues PREPARE S, EXECUTE S, EXECUTE S. Labels inside DB2 z/OS: SKDS S, full prepare, short prepare, avoided prepare, prepared statement S (twice).]


Local Dynamic Statement Cache

■ Enabled by zparm CACHEDYN=YES and BIND option KEEPDYNAMIC(YES)

■ Zparm MAXKEEPD: maximum number of prepared statements that will be kept in the local cache (thread storage) beyond commit

■ Prepared statements are kept in thread storage across commit so that prepares can be avoided

– Application need not reissue Prepares

■ Same prepared SQL statement can be stored in several threads

– Older prepared statements are thrown away from the LDSC at commit based on MAXKEEPD

■ Zparm CACHEDYN_FREELOCAL frees storage based on internal thresholds

■ V10: All Local Dynamic Statement Cache and MAXKEEPD storage above the bar

– MAXKEEPD can be made larger to avoid more Prepares and improve performance


Dynamic Statement Cache Statistics

■ GDSC hit ratio = [Short Prepares] / [Short Prepares + Full Prepares]

■ LDSC hit ratio = [Prepares Avoided] / [Prepares Avoided + Implicit Prepares]

– If the statement is not found in the LDSC >> implicit prepare

• Can result in either Short or Full Prepare

– If ‘Cache Limit Exceeded’ is non-zero, increase the MAXKEEPD value

Field Name  Description
----------  ----------------------------------
QXPREP      NUMBER OF SQL PREPARE STATEMENTS
QISEDSI     FULL PREPARE REQUESTS
QXSTIPRP    IMPLICIT PREPARES
QXSTNPRP    PREPARES AVOIDED
QXSTDEXP    PREP STMT DISCARDED - MAXKEEPD
QXSTDINV    PREP STMT DISCARDED - INVALIDATION

DYNAMIC SQL STMT            QUANTITY  /SECOND  /THREAD  /COMMIT
--------------------------  --------  -------  -------  -------
PREPARE REQUESTS              124.5K     5.78     0.75     0.25
FULL PREPARES               17446.00     0.81     0.10     0.04
SHORT PREPARES                108.1K     5.02     0.65     0.22
GLOBAL CACHE HIT RATIO (%)     86.10      N/A      N/A      N/A
IMPLICIT PREPARES               0.00     0.00     0.00     0.00
PREPARES AVOIDED             5603.00     0.26     0.03     0.01
CACHE LIMIT EXCEEDED            0.00     0.00     0.00     0.00
PREP STMT PURGED                3.00     0.00     0.00     0.00
LOCAL CACHE HIT RATIO (%)     100.00      N/A      N/A      N/A

GDSC hit ratio should be > 90-95%

LDSC hit ratio should be > 70%
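
The two hit ratios can be recomputed from the counters in the sample report. This Python sketch is illustrative only: the function names are hypothetical, and the rounded report value 108.1K is taken as 108,100.

```python
def gdsc_hit_ratio(short_prepares, full_prepares):
    """Global cache hit ratio in percent, or None when there were no prepares."""
    total = short_prepares + full_prepares
    return 100.0 * short_prepares / total if total else None


def ldsc_hit_ratio(prepares_avoided, implicit_prepares):
    """Local cache hit ratio in percent, or None when there is nothing to measure."""
    total = prepares_avoided + implicit_prepares
    return 100.0 * prepares_avoided / total if total else None


# Counters from the sample report (SHORT PREPARES 108.1K is a rounded value).
gdsc = gdsc_hit_ratio(short_prepares=108_100, full_prepares=17_446)
ldsc = ldsc_hit_ratio(prepares_avoided=5_603, implicit_prepares=0)

gdsc_below_guideline = gdsc < 90.0  # guideline above: > 90-95%
ldsc_below_guideline = ldsc < 70.0  # guideline above: > 70%
```

With these inputs the global ratio comes out at about 86.1%, matching the report and falling short of the guideline, while the local ratio is 100%.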


Dynamic Statement Cache Recommendations

■ Global Dynamic Statement Cache

– Should be turned on if dynamic SQL is executed in the DB2 system

– Best trade-off between storage and CPU consumption for applications executing dynamic SQL

■ Local Dynamic Statement Cache

– Should only be used selectively for application with a limited number of SQL statements that are executed very frequently (V9)

– Should NOT be used for DB2 systems that are constrained in DBM1 31-bit virtual storage (V9)

– V10 MAXKEEPD storage above the bar. It can be set to a higher value for better performance.

– In a distributed data sharing environment, KEEPDYNAMIC YES keeps the DBAT active, so the connection cannot use dynamic workload balancing across the data sharing members.


Monitoring Dynamic Statement Cache

■ Create DSN_STATEMENT_CACHE_TABLE to hold the statistics

– Sample job DSNx10.SDSNSAMP(DSNTESC)

■ START TRACE(P) CLASS(30) IFCID(316,317,318)

– IFCID 316 contains the first 60 bytes of SQL text and execution statistics

– IFCID 317 captures the full text of the SQL statement

– IFCID 318 enables collecting the statistics

■ Run the workload

■ Issue statement EXPLAIN STMTCACHE ALL

– Puts all the statements from the global cache and statistics information into DSN_STATEMENT_CACHE_TABLE

■ Stop the performance trace

■ Evaluate the performance of the cached dynamic statements by selecting the inserted rows from DSN_STATEMENT_CACHE_TABLE.

■ Helpful to identify dynamic SQL statements with performance issues

■ Workstation-based tools (Data Studio Admin Client, Optim Query Workload Tuner) can be used for capture and analysis.


Statistics Long Report – Open/Close Activity


Open/Close Activity

■ DSETS CONVERTED R/W->R/O

– Keep under 10-15 / minute

– This example: 130 / minute


Statistics Long Report – Log Activity


Log Activity

■ READS SATISFIED-OUTP.BUF (%)

– Should be close to 100%

■ READS SATISFIED-ACTV.LOG (%)

– Keep low

■ READS SATISFIED-ARCH.LOG (%)

– Should be zero

■ UNAVAILABLE OUTPUT LOG BUFF

– Should be zero

■ OUTPUT LOG BUFFER PAGED IN

– Should be zero


Statistics Long Report – RID List Processing


RID List Processing

■ EXCEED RDS LIMIT

– RIDs > 25% of table rows

• According to RUNSTATS

– Prior to DB2 10 for z/OS, generally switches to a table space scan

– Usually caused by old statistics: keep RUNSTATS up to date


Statistics Long Report – Latch Counters


Internal DB2 Latch Contention

■ Typical high latch contention classes

– LC06 = Index split latch

– LC14 = Buffer pool LRU and hash chain latch

– LC19 = Log latch

– LC24 = Prefetch latch or EDM LRU chain latch

■ Latch Class details in IFCID 51/52 (Share) and 56/57 (Exclusive) performance trace

■ Disabling Acct Class 3 trace can help reduce CPU time due to high latch contention

Field Name            Description
--------------------  ---------------------------------------
QVLSLC01 to QVLSLC32  INTERNAL LATCH CONTENTION BY CLASS 1-32

LATCH CNT   /SECOND   /SECOND   /SECOND   /SECOND
---------  --------  --------  --------  --------
LC01-LC04      0.00      0.00      0.00      0.00
LC05-LC08      0.00     75.62      0.00      0.01
LC09-LC12      0.00      0.79      0.00      1.25
LC13-LC16      0.01    676.17      0.00      0.00
LC17-LC20      0.00      0.00    105.58      0.00
LC21-LC24      0.08      0.00      6.01   4327.87
LC25-LC28      4.18      0.00      0.02      0.00
LC29-LC32      0.00      0.20      0.57     25.46

Typically not a concern < 10K per second
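
Each row of the latch report packs four latch classes into one line, which makes the busiest classes easy to miss. The Python sketch below unpacks the sample rows into one rate per class. It assumes the four values on a row map, in order, to the four classes named in the row label; the function names are hypothetical.

```python
def latch_rates(rows):
    """Map 'LC05-LC08'-style rows to per-class rates, e.g. {'LC05': 0.0, 'LC06': 75.62}."""
    rates = {}
    for label, values in rows:
        first = int(label.split("-")[0][2:])  # 'LC05-LC08' -> 5
        for offset, rate in enumerate(values):
            rates[f"LC{first + offset:02d}"] = rate
    return rates


def busiest(rates, count=4):
    """Return the `count` latch classes with the highest contention rate."""
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)[:count]


# Rows copied from the sample report (contentions per second).
report_rows = [
    ("LC01-LC04", (0.00, 0.00, 0.00, 0.00)),
    ("LC05-LC08", (0.00, 75.62, 0.00, 0.01)),
    ("LC09-LC12", (0.00, 0.79, 0.00, 1.25)),
    ("LC13-LC16", (0.01, 676.17, 0.00, 0.00)),
    ("LC17-LC20", (0.00, 0.00, 105.58, 0.00)),
    ("LC21-LC24", (0.08, 0.00, 6.01, 4327.87)),
    ("LC25-LC28", (4.18, 0.00, 0.02, 0.00)),
    ("LC29-LC32", (0.00, 0.20, 0.57, 25.46)),
]
rates = latch_rates(report_rows)
top = busiest(rates)
over_guideline = [name for name, rate in rates.items() if rate >= 10_000]
```

Under that mapping the four busiest classes in the sample are LC24, LC14, LC19 and LC06, the same classes listed as typical, and none reaches the 10K per second guideline.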


Internal DB2 Latch Contention …

■ LC06 for index tree P-lock by index split

– Index split is particularly painful in data sharing

• Results in two forced physical log writes

– Index split time can be significantly reduced by using faster active log device

– Options to reduce index split

• Index freespace tuning for random Insert

• Minimum index key size especially if unique index

• V8 : NOT PADDED index for large varchar columns

• V9 : Large index page size, Asymmetric leaf-page split

• Number of index splits in LEAFNEAR/FAR in SYSINDEXPART and RTS REORGLEAFNEAR/FAR

– Function Code (FC) X’46’ in IFCID 57 performance trace

– FC X’FE’ index tree latch in non data sharing - in LC07

– V10 : IFCID 359 for Index split


Internal DB2 Latch Contention …

■ LC14 Buffer Pool latch

– If many tablespaces and indexes, assign to separate buffer pools with an even Getpage frequency

– If objects bigger than buffer pool, try enlarging buffer pool if possible

– If high LC14 contention, use buffer pool with at least 3000 buffers

– Use FIFO rather than LRU buffer steal (PGSTEAL) algorithm if there is no read I/O, i.e. object(s) entirely in buffer pool

• LRU = Least Recently Used buffer steal algorithm (default)

• FIFO = First In First Out buffer steal algorithm

– Eliminates a need to maintain LRU chain which in turn

> Reduces CPU time for LRU chain maintenance

> Reduces CPU time for LC14 contention processing

– V10 : PGSTEAL NONE for in memory data / index


Internal DB2 Latch Contention …

■ LC19 Log latch

– Minimise #log records created via

• LOAD RESUME/REPLACE with LOG NO instead of massive INSERT/UPDATE/DELETE

• Segmented or UTS tablespace if mass delete occurs

– Increase size of output log buffer if non-zero unavailable count

• When unavailable, first agent waits for log write

• All subsequent agents wait for LC19

– Reduce size of output log buffer if non-zero output log buffer paging

– Reduced LC19 Log latch contention in DB2 9

• Log latch not held while spinning for unique LRSN

– V10 improvements :

• Log latch time held minimized

• Conditional attempts to get log latch before unconditional request


Internal DB2 Latch Contention …

■ LC24 latch

– Used for multiple functions :

• EDM LRU latch – FC X’18’ in IFCID 57 performance trace

– Use EDMBFIT zparm of NO (V9)

– Thread reuse with RELEASE DEALLOCATE instead of RELEASE COMMIT for frequently executed packages

• Prefetch scheduling – FC X’38’ in IFCID 57 performance trace

– Higher contention possible with many concurrent prefetches related to Sort, Workfile, Parallel query processing

– Disable dynamic prefetch for in memory data/index BP by setting VPSEQT=0 (V9)

– V10 : PGSTEAL NONE for in memory data/index BP

– Use more partitions

– V10 :

• Moving CT,PT from EDM pool to thread pools reduced LC24 significantly

• Latch no longer used for Buffer Manager page latch/unlatch


Additional V10 Latch contention reductions

■ LC12 – Latch to coordinate Global Transactions

– Increased number of hash entries and improved hashing algorithm

■ LC27 – Stored Procedure Queue Latch

– Increased number of latch entries

■ LC32 – Storage Pool Latch

– Moved shared storage to private thread storage pool

– Improved storage pool space management algorithm

■ DB2 10 : High volume, many concurrent transactions with high latch contentions expected to benefit significantly with V10 latch contention reduction improvements


Statistics Long Report – Locking Activity


LOCK Tuning

■ Take advantage of data page lock avoidance

– Use Isolation CS (Cursor Stability) and Bind option CURRENTDATA NO

• Use Isolation UR (Uncommitted Read) where acceptable

■ REORG to reduce locks on

– Variable/Compressed Overflow record/pointer lock

– Locks related to pseudo deleted index entries

■ LOCK Size Page vs Row :

– Page Lock (Default) :

• Better CPU time, lower concurrency, better for sequential Insert / Update / Delete

• Fewer locked resources – lower probability for lock escalation, less IRLM latch contention, false contention

– Row Lock :

• Recommended for workloads with deadlock, timeouts, high lock contention

• Lock avoidance more critical

• Data page P-lock overhead in data sharing environment

– Page lock with Maxrows 1 will avoid page P-lock and page latch contention on data pages


Lock Tuning ……

■ In V10 all DB2 Catalog & Directory Tablespaces except those with MAXROWS=1 changed to Row Level locking

– Much less DB2 Catalog / Directory contention

– Supports better concurrency for Bind, Prepare, DDL

• More locks acquired due to Row Level Locking


Statistics Long Report – Data Sharing Locking


DB2 Data Sharing Architecture

[Figure: members DB2A, DB2B, ... DB2n, each shown with IRLM and buffer pools; Coupling Facilities holding Locks, Group Buffer Pools and the SCA; Sysplex timers; Shared DASD holding the DB2A, DB2B, ... DB2n logs, the DB2 Cat/Dir and the DB2 DBs.]


Data Sharing Lock Tuning

■ Types of Data Sharing Lock Contentions :

– IRLM Contention = IRLM resource contention

– XES Contention = XES-level resource contention as XES only understands S or X

• Member 1 asking for IX and member 2 for IS

• Big relief in V8 with IRLM Protocol 2 support

– False Contention = false hash contention on lock table hash anchor point

• Minimized by increasing the # of Lock entries in the CF Lock table

■ Keep Global Contention Rate < 5%

– Global Contention Rate = CONT / (CONT + sum of SYNCH XES Lock, Unlock, and Change requests)

– CONT = sum of IRLM, XES, and False Suspends

■ Lock avoidance more important in data sharing
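
The global contention rate formula can be written out directly. This is a Python sketch of the arithmetic only; the function name is hypothetical and the input values are made-up round numbers, not measurements.

```python
def global_contention_rate(irlm_suspends, xes_suspends, false_suspends,
                           sync_xes_requests):
    """Global contention rate in percent.

    sync_xes_requests is the sum of the synchronous XES Lock, Unlock and
    Change requests for the same interval.
    """
    cont = irlm_suspends + xes_suspends + false_suspends
    total = cont + sync_xes_requests
    return 100.0 * cont / total if total else 0.0


# Illustrative interval counts (not taken from a real report).
rate = global_contention_rate(irlm_suspends=120, xes_suspends=60,
                              false_suspends=20, sync_xes_requests=9_800)
within_guideline = rate < 5.0
```

Here 200 suspends against 9,800 synchronous requests give a rate of 2%, inside the 5% guideline.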


Data Sharing Lock Tuning ….

■ Page P-lock used for inter-system serialization in data sharing

– Page latch for intra-system serialization (non data sharing)

– Page P-lock held at member level

• Typical Page P-locks for : Spacemap, Data Page, Index Leaf Page

■ P-lock contention and negotiation can cause IRLM latch contention, Page latch contention, Asynchronous GBP write, Active log write, GBP read

– Breakdown by page P-lock type in GBP statistics

• Spacemap page, Data page, Index Leaf page

■ ‘Other’ P-lock negotiation for

– Index tree P-lock, Castout P-lock, SKCT/SKPT P-lock

■ Page P-lock contention by one thread causes Page Latch contention for all other threads in the same member trying to get to the same page


Data Sharing Lock Tuning ….

■ To reduce Data and Spacemap Page P-Lock contention

– Member Cluster Option for high Insert environment

• Separate Spacemap page for 199 pages for each member

– Trackmod No Option when not using Incremental Image Copy

• Avoids Spacemap page update for changed pages

– Spread Inserts over partitions

• Use random key for partitioning if possible


Data Sharing GBP Tuning

■ Applicable to GBPCACHE=CHANGED

■ GBP Thresholds for CASTOUT processing :

– GBPOOLT - GBP level – default 30%

– CLASST – Dataset / castout class level – default 5%

– GBPCHKPT – GBP Checkpoint interval – default 4 minutes

■ Avoid Write Failed or DSNB325A Critical Shortage in GBPxx

– Use bigger GBP and/or smaller CLASST threshold

– CLASST threshold Castout processing is more efficient

■ Increase the GBP size (more data pages) if the Sync Read XI miss ratio (SyncReadXInodata/SynchReadXI) > 10%

■ Avoid cross invalidations due to directory reclaims

– Use XES Auto Alter (CF ALLOWAUTOALT(YES)) to avoid directory full and directory entry reclaim condition
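
The Sync Read XI miss ratio check can be sketched as below. This Python example assumes that the denominator is the total number of synchronous reads caused by cross-invalidation, with the "no data returned" reads as the numerator; the function name and the sample counts are hypothetical.

```python
def sync_read_xi_miss_ratio(no_data_reads, total_xi_reads):
    """Percentage of synchronous reads due to cross-invalidation that found no data in the GBP."""
    return 100.0 * no_data_reads / total_xi_reads if total_xi_reads else 0.0


# Illustrative counts for one group buffer pool (not from a real report).
miss_ratio = sync_read_xi_miss_ratio(no_data_reads=150, total_xi_reads=1_000)
consider_larger_gbp = miss_ratio > 10.0
```

With these counts the miss ratio is 15%, above the 10% threshold, so a larger GBP would be worth evaluating.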


Statistics Long Report – Global DDF Activity


Statistics Long Report – CPU Times


DB2 CPU Times – major contributors

[Figure: matrix of CPU contributors by address space (Application, DBM1, MSTR, IRLM) and by TCB versus SRB time, with columns grouped under ACCOUNTING and STATISTICS. Items marked (*) are data sharing specific.]

Contributors shown in the figure:

– SQL processing, Synch I/O, Global lock requests*, Buffer updates, Lock requests, Logical logging, GBP reads*

– "The same as in TCB case, but only in enclave preemptible SRB mode. Reported in TCB instrumentation."

– Dataset Open/Close, Extend, Preformat, DBM1 Full System Contraction

– Deferred write, Castout*, P-lock negotiation*, Async GBP write*, GBP checkpoints*, Prefetch read, Parallel child tasks, Delete Name*, Notify Exit*

– Archiving, BSDS processing

– Physical log write, Checkpoints, Backouts, Thread deallocation, Update commit incl. page P-lock unlock*

– Error checking, Management

– Deadlock detection, IRLM and XES global contention*, Async XES request*, Local IRLM latch contention, P-lock negotiation*


Address Spaces CPU Time

CPU TIMES                               TCB TIME      PREEMPT SRB   NONPREEMPT SRB       TOTAL TIME  PREEMPT IIP SRB          /COMMIT
-------------------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------
SYSTEM SERVICES ADDRESS SPACE           0.234282        10.509478         0.040913        10.784673              N/A         0.000014
DATABASE SERVICES ADDRESS SPACE         0.045010         2.819815         0.014957         2.879781        32.869961         0.000004
IRLM                                    0.000030         0.000000         0.387322         0.387352              N/A         0.000001
DDF ADDRESS SPACE                       0.073110      9:47.165313        54.447805     10:41.686227     10:03.853060         0.000859

TOTAL                                   0.352431     10:00.494605        54.890997     10:55.738033     10:36.723021         0.000877

• All TCB times should be low relative to MSTR and DBM1 SRB times.

• IRLM SRB time should be low relative to MSTR and DBM1 SRB times.

• For distributed application, DDF SRB is typically the highest as it includes Accounting TCB time also.

• PREEMPT IIP SRB shows the portion of SRB time consumed on a zIIP processor.

• Per Commit value does not include zIIP CPU time.

• If IRLM CPU is high, look into issues with locking

• MSTR CPU is mostly for logging – typically not an issue.

• If DBM1 TCB time is high look into dataset open / close (zparm DSMAX) issues

• If DDF TCB time is high look into connection authorization issues.

V10 :

• Most of the DBM1 and MSTR SRB processing are pre-emptible

• Prefetch and Deferred write processing run under enclave SRB and hence zIIP eligible.
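
The CPU times in the report mix plain seconds with a minutes:seconds form, so comparing address spaces needs a common unit first. The Python sketch below converts the DDF row to seconds and checks that TOTAL TIME matches the sum of the TCB, PREEMPT SRB and NONPREEMPT SRB columns. The function name and dictionary keys are hypothetical, and handling an hours field the same way is an assumption.

```python
def to_seconds(text):
    """Convert a report time such as '9:47.165313' or '54.447805' to seconds."""
    seconds = 0.0
    for part in text.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


# DDF ADDRESS SPACE row from the sample report.
ddf = {
    "tcb": "0.073110",
    "preempt_srb": "9:47.165313",
    "nonpreempt_srb": "54.447805",
    "total": "10:41.686227",
}
components = sum(to_seconds(ddf[key])
                 for key in ("tcb", "preempt_srb", "nonpreempt_srb"))
# The report rounds each column, so allow a small tolerance.
total_matches = abs(components - to_seconds(ddf["total"])) < 1e-5
```

For this row the three components add up to the reported total of about 641.69 seconds, nearly all of it preemptible SRB time.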


Statistics Long Report – DBM1 Virtual Storage

■ Single view from one interval

■ Statistics trace gives better view – track changes

■ Best is to use MEMU2 or MEMUSAGE to track virtual storage use over time

– Download from DB2 for z/OS Exchange:

– http://www.ibm.com/developerworks/software/exchange/db2zos


DB2 10 DBM1 Storage improvement and scalability

[Figure: DB2 10 DBM1 storage layout. Labels: Thread, Stack, Working memory; SKCT / SKPT; Global DSC; DBD; CT/PT; Local DSC; Thread, Stack; Buffer Pools, Sort, RID pools. Callout: "Use this space to Reduce CPU!"]

■ EDM storage - 100% above the 2 GB bar

■ Thread + Stack - 70-90% less usage in DB2 10 compared to DB2 9

• xPROC (SPROC, IPROC, UPROC, etc) loaded below the 2 GB bar

• Built at BIND time, shared at runtime

■ Can support more concurrent threads

– Zparms CTHREAD and MAXDBAT limit increased

– Enables consolidation of data sharing members

■ Reduce CPU time at the expense of virtual storage

– More thread reuse to avoid allocate/deallocate

– Wider usage for bind option RELEASE(DEALLOCATE)

– High Performance DBATs

– Larger MAXKEEPD values for KEEPDYNAMIC=YES users to avoid Prepares

■ In V10 virtual storage is no longer an issue. Need to monitor real storage use.

– Zparms : REALSTORAGE_MAX, REALSTORAGE_MANAGEMENT


Statistics Long Report – Real and Aux Storage


Statistics Long Report – Buffer Pools

■ Buffer pool monitoring and tuning is probably a separate topic

– And many of you already know about it

■ Setting VDWQT to 0 is good if the probability to re-write the page is low

– DB2 waits for up to 40 changed pages for 4K BP (24 for 8K, 16 for 16K, 12 for 32K) and writes out 32 pages for 4K BP (16 for 8K, 8 for 16K, 4 for 32K)

■ Setting VDWQT and DWQT to 90 is good for objects that reside entirely in the buffer pool and are updated frequently

■ In other cases, set VDWQT and DWQT low enough to achieve a "trickle" write effect in between successive system checkpoints

– Setting VDWQT and DWQT too low may result in poor write caching, writing the same page out many times, short deferred write I/Os, and increased DBM1 SRB CPU resource consumption

• If you want to set VDWQT in pages, do not specify anything below 128

■ PGSTEAL algorithm

– LRU is default

– FIFO for in memory objects + VPSEQT=0 (V9)

– NONE for in memory objects (V10)
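
The VDWQT=0 page counts quoted above for each buffer pool page size can be kept in a small lookup table. This Python sketch only restates the slide's numbers; the names are hypothetical. It also shows that the amount written per trigger works out the same for every page size.

```python
# Page size in KB -> (changed pages that trigger a write, pages written), for VDWQT=0.
VDWQT_ZERO_PAGES = {
    4: (40, 32),
    8: (24, 16),
    16: (16, 8),
    32: (12, 4),
}


def write_size_kb(page_size_kb):
    """Kilobytes written per deferred-write trigger when VDWQT is 0."""
    _trigger_pages, pages_written = VDWQT_ZERO_PAGES[page_size_kb]
    return pages_written * page_size_kb


write_sizes = {size: write_size_kb(size) for size in VDWQT_ZERO_PAGES}
```

Each page size yields 128 KB per write (32 x 4K, 16 x 8K, 8 x 16K, 4 x 32K).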


Questions

■ If you think of a question later, just send me an e-mail.

■ Thank you for your attention