db2 performance tuning: tips & techniques
TRANSCRIPT
#IDUG
DB2 Performance Tuning: Tips & Techniques
Sweta Singh IBM
Session 6 16:15 – 17:15 – Bangalore 25 March 2014 - Chennai 27 March 2014 Platform: DB2 LUW
#IDUG
Agenda
• Performance Basics
• Diagnosing Performance Problems • Sanity Check
• First Aid toolkit
• OS tools
• New Monitoring in DB2 9.7
• OS & DB2 Monitoring Combo
• RCA Decision Tree
• Interesting Case Study
2
#IDUG
3
Basics: Orchestration matters
Hardware: CPU, Memory, Storage and Network
• CPU cores and RAM available continue to grow
• 12 cores/chip for both Intel Ivybridge EP and IBM POWER8
• 8GB RAM/core is a good Rule of Thumb
• Think of IOPS and MB/sec read/write for sizing storage
• Include SSDs in your storage strategy– they can make an astonishing difference!
o Random I/O, especially reads, benefit significantly
o Internal SSDs have best and most cost-effective performance but
• They often don't have a write cache
• Consider HA and DR
Software: OS, DB2 and Application
• Use best practice recommendations
• Stay current on maintenance
#IDUG
Basics: Understand the physical layout of your database
4
• Disk spindles still matter – With sophisticated storage subsystems and storage
virtualization it just requires more sleuthing than ever to find them
– Drives keep getting bigger, 146GB-300GB now the norm
• “Spread everything everywhere” strategy works reasonably well • For OLTP, make sure that the transaction logs
reside on a separate array
• Be leery of Storage Administrators that
tell you – “Don’t worry, it doesn’t matter”
– “The cache will take care of it”
#IDUG
Basics: Storage best practices
5
#IDUG
Basics: Best Practices (DB2 configuration)
6
Use DB2 automatic storage – A modest number of storage paths that map sensibly to the SAN storage
Minimize the use of distinct page sizes – 8K is a good default (no more than 2)
– Specify the smaller one at create database time for the catalog
Minimize the number of bufferpools – 1 for data/index, 1 for temp is usually sufficient
Splitt tablespaces for logical groups – Often helpful to split data/index/LOB (especially LOB) into separate tablespaces
– No need to put EACH table in its own tablespace
Use the Autonomics – Autoconfigure & STMM
• But give DB2 a budget via instance_memory or database_memory
Compression – Turn on compression at create table time for top ~50 large tables
#IDUG
Agenda
• Performance Basics
• Diagnosing Performance Problems • Sanity Check
• First Aid toolkit
• OS tools
• New Monitoring in DB2 9.7
• How to identify Resource Bottleneck
• RCA Decision Tree
• Interesting Case Studies
7
#IDUG
Performance problem has occurred: Sanity Check
• Did something change since when performance met expectations?
• New database applications?
• Other non-database loads on the system?
• More users?
• More data?
• Software configuration changes?
• DB, DBM configuration, registry variables, schema changes, etc.
• Hardware configuration changes?
• If you can identify the change at this stage, can it / should it be undone? Or is this something you have to live with?
• If this is a real problem, a methodical approach is key to finding the root cause
8
#IDUG
First Aid toolkit: OS tools
• Make vmstat your best friend • It will give you the first hint at where your bottleneck is
• iostat gives valuable insights into I/O activity
• Use “iostat –D” on AIX, “iostat –x” on Linux to get I/O response time
9
kthr memory page faults cpu time
----- ----------- ------------------------ ------------ ----------- --------
r b avm fre re pi po fr sr cy in sy cs us sy id wa hr mi se
12 1 17339981 47266810 0 0 0 0 0 0 24924 82462 72310 22 6 55 16 00:08:51
8 0 17339986 47266800 0 0 0 0 0 0 24454 78758 70106 21 6 55 17 00:09:01
6 0 17339986 47266797 0 0 0 0 0 0 22510 64438 58757 17 6 59 18 00:09:11
13 0 17339987 47266793 0 0 0 0 0 0 26469 88133 78037 24 7 50 19 00:09:21
11 0 17340019 47266759 0 0 0 0 0 0 25557 84203 74263 23 7 54 17 00:09:31
9 0 17340019 47266756 0 0 0 0 0 0 25454 82285 73301 22 7 54 17 00:09:41
hdisk77 xfer: %tm_act bps tps bread bwrtn
99.0 10.6M 1293.7 8.8M 1.8M
read: rps avgserv minserv maxserv timeouts fails
1072.9 12.3 0.1 307.3 0 0
write: wps avgserv minserv maxserv timeouts fails
220.8 0.8 0.1 515.2 0 0
queue: avgtime mintime maxtime avgwqsz avgsqsz sqfull
0.0 0.0 0.1 0.0 13.0 0.0
hdisk78 xfer: %tm_act bps tps bread bwrtn
99.3 10.5M 1282.2 8.8M 1.7M
read: rps avgserv minserv maxserv timeouts fails
1069.5 13.0 0.1 398.4 0 0
write: wps avgserv minserv maxserv timeouts fails
212.7 1.0 0.1 589.6 0 0
queue: avgtime mintime maxtime avgwqsz avgsqsz sqfull
0.0 0.0 0.0 0.0 14.0 0.0
#IDUG
First Aid toolkit: New Monitoring in DB2 9.7
• DB2 9.7 brings many big improvements in monitoring
• Component & wait times
• Static SQL / SQL procedure monitoring
• Improvements in low-overhead activity monitoring via event monitors
• db2pd sorting & currently committed statistics
• Section actuals
• SQL Access to performance data
• A good set of monitor queries makes diagnosing problems much easier
• Choosing & tracking the most important metrics
• Calculating derived values (hit ratio, time per IO, etc.)
• Comparing against baseline values
• Transitioning from snapshots to SQL monitoring made easy
• MONREPORT module
10
#IDUG
Glimpse of New Monitoring: monreport.dbsummary
11
Monitoring report - database summary -------------------------------------------------------------------------------- Database: DTW Generated: 07/14/2010 22:59:05 Interval monitored: 10 ================================================================================ Part 1 - System performance Work volume and throughput -------------------------------------------------------------------------------- Per second Total --------------------- ----------------------- TOTAL_APP_COMMITS 137 1377 ACT_COMPLETED_TOTAL 3696 36963 APP_RQSTS_COMPLETED_TOTAL 275 2754 TOTAL_CPU_TIME = 23694526 TOTAL_CPU_TIME per request = 8603 Row processing ROWS_READ/ROWS_RETURNED = 1 (25061/20767) ROWS_MODIFIED = 22597 Wait times -------------------------------------------------------------------------------- -- Wait time as a percentage of elapsed time -- % Wait time/Total time --- ---------------------------------- For requests 70 286976/406856 For activities 70 281716/401015 -- Time waiting for next client request -- CLIENT_IDLE_WAIT_TIME = 13069 CLIENT_IDLE_WAIT_TIME per second = 1306 -- Detailed breakdown of TOTAL_WAIT_TIME -- % Total --- --------------------------------------------- TOTAL_WAIT_TIME 100 286976 I/O wait time POOL_READ_TIME 88 253042 POOL_WRITE_TIME 6 18114 DIRECT_READ_TIME 0 100 DIRECT_WRITE_TIME 0 0 LOG_DISK_WAIT_TIME 1 4258 LOCK_WAIT_TIME 3 11248 AGENT_WAIT_TIME 0 0 Network and FCM TCPIP_SEND_WAIT_TIME 0 0 TCPIP_RECV_WAIT_TIME 0 0 IPC_SEND_WAIT_TIME 0 198 IPC_RECV_WAIT_TIME 0 15 FCM_SEND_WAIT_TIME 0 0 FCM_RECV_WAIT_TIME 0 0 WLM_QUEUE_TIME_TOTAL 0 0 Component times -------------------------------------------------------------------------------- -- Detailed breakdown of processing time -- % Total ---------------- -------------------------- Total processing 100 119880 Section execution TOTAL_SECTION_PROC_TIME 11 13892 TOTAL_SECTION_SORT_PROC_TIME 0 47 Compile TOTAL_COMPILE_PROC_TIME 22 27565 TOTAL_IMPLICIT_COMPILE_PROC_TIME 2 3141 Transaction end processing TOTAL_COMMIT_PROC_TIME 0 230 TOTAL_ROLLBACK_PROC_TIME 0 0 Utilities TOTAL_RUNSTATS_PROC_TIME 0 0 TOTAL_REORGS_PROC_TIME 0 0 TOTAL_LOAD_PROC_TIME 0 0 Buffer pool -------------------------------------------------------------------------------- Buffer pool hit ratios Type Ratio Reads (Logical/Physical) --------------- --------------- ---------------------------------------------- Data 72 54568/14951 Index 79 223203/45875 XDA 0 0/0 Temp data 0 0/0 Temp index 0 0/0 Temp XDA 0 0/0 I/O -------------------------------------------------------------------------------- Buffer pool writes POOL_DATA_WRITES = 817 POOL_XDA_WRITES = 0 POOL_INDEX_WRITES = 824 Direct I/O DIRECT_READS = 122 DIRECT_READ_REQS = 15 DIRECT_WRITES = 0 DIRECT_WRITE_REQS = 0 Log I/O LOG_DISK_WAITS_TOTAL = 1275 Locking -------------------------------------------------------------------------------- Per activity Total ------------------------------ ---------------------- LOCK_WAIT_TIME 0 11248 LOCK_WAITS 0 54 LOCK_TIMEOUTS 0 0 DEADLOCKS 0 0 LOCK_ESCALS 0 0 Routines -------------------------------------------------------------------------------- Per activity Total ------------------------ ------------------------ TOTAL_ROUTINE_INVOCATIONS 0 1377 TOTAL_ROUTINE_TIME 2 105407 TOTAL_ROUTINE_TIME per invocation = 76 Sort -------------------------------------------------------------------------------- TOTAL_SORTS = 55 SORT_OVERFLOWS = 0 POST_THRESHOLD_SORTS = 0 POST_SHRTHRESHOLD_SORTS = 0 Network -------------------------------------------------------------------------------- Communications with remote clients TCPIP_SEND_VOLUME per send = 0 (0/0) TCPIP_RECV_VOLUME per receive = 0 (0/0) Communications with local clients IPC_SEND_VOLUME per send = 367 (1012184/2754) IPC_RECV_VOLUME per receive = 317 (874928/2754) Fast communications manager FCM_SEND_VOLUME per send = 0 (0/0) FCM_RECV_VOLUME per receive = 0 (0/0) Other -------------------------------------------------------------------------------- Compilation TOTAL_COMPILATIONS = 5426 PKG_CACHE_INSERTS = 6033 PKG_CACHE_LOOKUPS = 24826 Catalog cache CAT_CACHE_INSERTS = 0 CAT_CACHE_LOOKUPS = 14139 Transaction processing TOTAL_APP_COMMITS = 1377 INT_COMMITS = 0 TOTAL_APP_ROLLBACKS = 0 INT_ROLLBACKS = 0 Log buffer NUM_LOG_BUFFER_FULL = 0 Activities aborted/rejected ACT_ABORTED_TOTAL = 0 ACT_REJECTED_TOTAL = 0 Workload management controls WLM_QUEUE_ASSIGNMENTS_TOTAL = 0 WLM_QUEUE_TIME_TOTAL = 0 DB2 utility operations -------------------------------------------------------------------------------- TOTAL_RUNSTATS = 0 TOTAL_REORGS = 0 TOTAL_LOADS = 0 ================================================================================ Part 2 - Application performance drill down Application performance database-wide -------------------------------------------------------------------------------- TOTAL_CPU_TIME TOTAL_ TOTAL_APP_ ROWS_READ + per request WAIT_TIME % COMMITS ROWS_MODIFIED ---------------------- ----------- ------------- ---------------------------- 8603 70 1377 47658 Application performance by connection -------------------------------------------------------------------------------- APPLICATION_ TOTAL_CPU_TIME TOTAL_ TOTAL_APP_ ROWS_READ + HANDLE per request WAIT_TIME % COMMITS ROWS_MODIFIED ------------- ------------------- ----------- ------------- ------------- 7 0 0 0 0 22 10945 70 30 1170 23 7977 68 46 1117 24 12173 65 33 1257 58 14376 73 28 1419 59 8252 69 43 1337 60 12344 71 32 1569 61 9941 71 37 1558 63 0 0 0 0 Application performance by service class -------------------------------------------------------------------------------- SERVICE_ TOTAL_CPU_TIME TOTAL_ TOTAL_APP_ ROWS_READ + CLASS_ID per request WAIT_TIME % COMMITS ROWS_MODIFIED -------- ------------------- ----------- ------------- ------------- 11 0 0 0 0 12 0 0 0 0 13 8642 68 1379 47836 Application performance by workload -------------------------------------------------------------------------------- WORKLOAD_ TOTAL_CPU_TIME TOTAL_ TOTAL_APP_ ROWS_READ + NAME per request WAIT_TIME % COMMITS ROWS_MODIFIED ------------- ---------------------- ----------- ------------- ------------- SYSDEFAULTADM 0 0 0 0 SYSDEFAULTUSE 11108 68 1376 47782 ================================================================================ Part 3 - Member level information - I/O wait time is (POOL_READ_TIME + POOL_WRITE_TIME + DIRECT_READ_TIME + DIRECT_WRITE_TIME). TOTAL_CPU_TIME TOTAL_ RQSTS_COMPLETED_ I/O MEMBER per request WAIT_TIME % TOTAL wait time ------ ---------------------- ----------- ---------------- ----------------- 0 8654 68 2760 271870
Work volume and throughput
-------------------------------------------
Per second Total
------------------------ ----------- -------
TOTAL_APP_COMMITS 137 1377
ACT_COMPLETED_TOTAL 3696 36963
APP_RQSTS_COMPLETED_TOTAL 275 2754
TOTAL_CPU_TIME = 23694526
TOTAL_CPU_TIME per request = 8603
Row processing
ROWS_READ/ROWS_RETURNED = 1 (25061/20767)
ROWS_MODIFIED = 22597
Wait times
---------------------------------------
-- Wait time as a percentage of elapsed time
% Wait time/Total time
--- --------------------
For requests 70 286976/406856
For activities 70 281716/401015
-- Time waiting for next client request --
CLIENT_IDLE_WAIT_TIME = 13069
CLIENT_IDLE_WAIT_TIME per second = 1306
-- Detailed breakdown of TOTAL_WAIT_TIME
% Total
--- -------------
TOTAL_WAIT_TIME 100 286976
I/O wait time
POOL_READ_TIME 88 253042
POOL_WRITE_TIME 6 18114
DIRECT_READ_TIME 0 100
DIRECT_WRITE_TIME 0 0
LOG_DISK_WAIT_TIME 1 4258
LOCK_WAIT_TIME 3 11248
AGENT_WAIT_TIME 0 0
:
Component times
-----------------------------------------------
-- Detailed breakdown of processing time --
% Total
---- --------
Total processing 100 119880
Section execution
TOTAL_SECTION_PROC_TIME 11 13892
TOTAL_SECTION_SORT_PROC_TIME 0 47
Compile
TOTAL_COMPILE_PROC_TIME 22 27565
TOTAL_IMPLICIT_COMPILE_PROC_TIME 2 3141
Transaction end processing
TOTAL_COMMIT_PROC_TIME 0 230
TOTAL_ROLLBACK_PROC_TIME 0 0
:
Buffer pool
---------------------------------------------
Buffer pool hit ratios
Type Ratio Reads (Logical/Physical)
---------- -------- ------------------------
Data 72 54568/14951
Index 79 223203/45875
XDA 0 0/0
Temp data 0 0/0
Temp index 0 0/0
Temp XDA 0 0/0
#IDUG
First Aid toolkit Queries: Core Activity
• Transactions, statements, rows
12
select
current timestamp as "Timestamp",
substr(workload_name,1,32) as "Workload",
sum(TOTAL_APP_COMMITS) as "Total app. commits",
sum(ACT_COMPLETED_TOTAL) as "Total activities",
case when sum(TOTAL_APP_COMMITS) < 100 then null else
cast( sum(ACT_COMPLETED_TOTAL) / sum(TOTAL_APP_COMMITS) as decimal(6,1)) end
as "Activities / UOW",
case when sum(TOTAL_APP_COMMITS) = 0 then null else
cast( 1000.0 * sum(DEADLOCKS)/ sum(TOTAL_APP_COMMITS) as decimal(8,3)) end
as "Deadlocks / 1000 UOW",
case when sum(ROWS_RETURNED) < 1000 then null else
sum(ROWS_READ)/sum(ROWS_RETURNED) end as "Rows read/Rows ret",
case when sum(ROWS_READ+ROWS_MODIFIED) < 1000 then null else
cast(100.0 * sum(ROWS_READ)/sum(ROWS_READ+ROWS_MODIFIED) as decimal(4,1)) end
as "Pct read act. by rows"
from table(mon_get_workload(null,-2)) as t
group by rollup ( substr(workload_name,1,32) );
#IDUG
First Aid toolkit: Time-spent Query #1
• TOTAL_RQST_TIME is “Time Spent in DB2” • Wait Time vs Processing Time
13
select
current timestamp as "Timestamp",
TOTAL_RQST_TIME,
TOTAL_RQST_TIME / TOTAL_APP_COMMITS “Time Spent per txn”,
(TOTAL_CPU_TIME+500) / 1000 as TOTAL_CPU_TIME ,
TOTAL_WAIT_TIME,
case when TOTAL_RQST_TIME < 100 then null else
cast(100.0 * TOTAL_WAIT_TIME/TOTAL_RQST_TIME as decimal(4,1)) end
as Wait_time_pct,
cast(100.0 * ((TOTAL_CPU_TIME+500) / 1000) / TOTAL_RQST_TIME as decimal(4,1))
as CPU_time_pct
from table(mon_get_workload(null,-2)) as t
group by rollup ( substr(workload_name,1,32) );
#IDUG
First Aid toolkit Queries: Time-spent Query #2
14
select current timestamp as "Timestamp",
substr(workload_name,1,32) as "Workload", ...
case when sum(TOTAL_SECTION_SORTS) < 1000 then null else cast(
100.0 * sum(SORT_OVERFLOWS)/sum(TOTAL_SECTION_SORTS)
as decimal(4,1)) end as "Pct spilled sorts",
case when sum(TOTAL_SECTION_TIME) < 100 then null else cast(
100.0 * sum(TOTAL_SECTION_SORT_TIME)/sum(TOTAL_SECTION_TIME)
as decimal(4,1)) end as "Pct section time sorting",
case when sum(TOTAL_SECTION_SORTS) < 100 then null else cast(
float(sum(TOTAL_SECTION_SORT_TIME))/sum(TOTAL_SECTION_SORTS)
as decimal(6,1)) end as "Avg sort time",
case when sum(TOTAL_RQST_TIME) < 100 then null else cast(
100.0 * sum(TOTAL_COMPILE_TIME)/sum(TOTAL_RQST_TIME)
as decimal(4,1)) end as "Pct request time compiling”,
case when sum(PKG_CACHE_LOOKUPS) < 1000 then null else cast(
100.0 * sum(PKG_CACHE_LOOKUPS-PKG_CACHE_INSERTS)/sum(PKG_CACHE_LOOKUPS)
as decimal(4,1)) end as "Pkg cache h/r",
case when sum(CAT_CACHE_LOOKUPS) < 1000 then null else cast(
100.0 * sum(CAT_CACHE_LOOKUPS-CAT_CACHE_INSERTS)/sum(CAT_CACHE_LOOKUPS)
as decimal(4,1)) end as "Cat cache h/r"
• Sorting, Compilation Time
#IDUG
15
First Aid toolkit : Time-spent Query #3
select current timestamp as "Timestamp",substr(workload_name,1,32) as "Workload",
sum(TOTAL_RQST_TIME) as "Total request time",
sum(CLIENT_IDLE_WAIT_TIME) as "Client idle wait time",
case when sum(TOTAL_RQST_TIME) < 100 then null else
cast(float(sum(CLIENT_IDLE_WAIT_TIME))/sum(TOTAL_RQST_TIME) as decimal(10,2)) end
as "Ratio of client wt to request time",
case when sum(TOTAL_RQST_TIME) < 100 then null else
cast(100.0 * sum(TOTAL_WAIT_TIME)/sum(TOTAL_RQST_TIME) as decimal(4,1)) end
as "Wait time pct of request time",
case when sum(TOTAL_WAIT_TIME) < 100 then null else
cast(100.0*sum(LOCK_WAIT_TIME)/sum(TOTAL_WAIT_TIME) as decimal(4,1)) end
as "Lock wait time pct of Total Wait",
... sum(POOL_READ_TIME+POOL_WRITE_TIME)/ ... as "Pool I/O pct of Total Wait",
... sum(DIRECT_READ_TIME+DIRECT_WRITE_TIME)/ ... as "Direct I/O pct of Total Wait",
... sum(LOG_DISK_WAIT_TIME)/ ... as "Log disk wait pct of Total Wait",
... sum(TCPIP_RECV_WAIT_TIME+TCPIP_SEND_WAIT_TIME)/ ... as "TCP/IP wait pct ...",
... sum(IPC_RECV_WAIT_TIME+IPC_SEND_WAIT_TIME)/ ... as "IPC wait pct of Total Wait",
... sum(FCM_RECV_WAIT_TIME+FCM_SEND_WAIT_TIME)/ ... as "FCM wait pct of Total Wait",
... sum(WLM_QUEUE_TIME_TOTAL)/ ... as "WLM queue time pct of Total Wait",
... sum(XML_DIAGLOG_WRITE_WAIT_TIME)/ ... as "diaglog write pct of Total Wait"
• Lock wait, I/O Wait, Network wait, Log wait….
#IDUG
First Aid toolkit : Hot queries
• Sorted by CPU, ROWS_READ, NUM_EXECUTIONS
16
select MEMBER, TOTAL_ACT_TIME, TOTAL_CPU_TIME,
(TOTAL_CPU_TIME+500)/1000 as "TOTAL_CPU_TIME (ms)",
TOTAL_SECTION_SORT_PROC_TIME,
NUM_EXEC_WITH_METRICS, substr(STMT_TEXT,1,40) as stmt_text
from table(mon_get_pkg_cache_stmt(null,null,null,-2)) as t
order by TOTAL_CPU_TIME desc fetch first 20 rows only;
select ROWS_READ, ROWS_RETURNED,
case when ROWS_RETURNED = 0 then null
else ROWS_READ / ROWS_RETURNED end as "Read / Returned",
TOTAL_SECTION_SORTS, SORT_OVERFLOWS, TOTAL_SECTION_SORT_TIME,
case when TOTAL_SECTION_SORTS = 0 then null
else TOTAL_SECTION_SORT_TIME / TOTAL_SECTION_SORTS end as "Time / sort",
NUM_EXECUTIONS, substr(STMT_TEXT,1,40) as stmt_text ...
select TOTAL_ACT_TIME, TOTAL_ACT_WAIT_TIME, LOCK_WAIT_TIME,
FCM_SEND_WAIT_TIME+FCM_RECV_WAIT_TIME as "FCM wait time",
LOCK_TIMEOUTS, LOG_BUFFER_WAIT_TIME, LOG_DISK_WAIT_TIME,
TOTAL_SECTION_SORT_TIME-TOTAL_SECTION_SORT_PROC_TIME as "Sort wait time",
NUM_EXECUTIONS, substr(STMT_TEXT,1,40) as stmt_text ...
#IDUG
Finding your resource bottleneck
17
“OS & DB2 Monitoring” Combo
Bottleneck OS System Monitoring DB2 Monitoring
Disk • High I/O wait seen in vmstat / iostat / perfmon
• Following symptoms in iostat • Disk > 80% busy • Average I/O Response time > 10 ms • Long disk queues
• Disk I/O Wait time pct of wait time > 10% - 30% • ms per Bufferpool Read > 10 ms • ms per Bufferpool Write > 8 ms • ms per Log Write > 6 ms
CPU • Total CPU utilization near 100% in vmstat / perfmon
• Majority of time spent in processing (TOTAL_RQST_TIME – TOTAL_WAIT_TIME)
• Percent of time spent in sorting • > 5% for OLTP • > 25% for DSS
• Percent of time spent in compilation • > 1% for OLTP • > 5-10% for DSS
#IDUG
18
Finding your resource bottleneck Bottleneck OS System Monitoring DB2 Monitoring
Memory Low free memory seen in vmstat / perfmon
Swapping reported in vmstat / perfmon
Activity on swap disks seen in iostat
Higher-than-normal system CPU time seen in vmstat
• Db2pd –memsets • Db2pd -mempools
Lazy System • Lower than expected CPU utilization in vmstat
• Lock Wait time pct of wait time > 10% • Latch Wait time pct of wait time > 10%
#IDUG
RCA Decision Tree(*)
19
Bottleneck
Type?
Disk
Bottleneck
Data
Tablespace
Temp
Tablespace
Index
Tablespace
Log
Devices
Bad plan giving tablescan?
• Old statistics?
• Need more indexes?
Insufficient prefetchers?
Over-aggressive cleaning?
LOB reads/writes?
Bad plan(s) giving excessive
index scanning?
• Need more/different indexes?
Insufficient sortheap?
Missing indexes?
Anything sharing the disks?
High transaction rate
• Too-frequent commits?
• Mincommit too low?
High data volume
• Logging too much data?
CPU
Bottleneck Network
Bottleneck
“Lazy
System”
Data
Tablespace
High User
Time
High
System
Time
SQL w/o parameter markers?
Too small dyn SQL cache?
Apps connecting/disconnecting?
Non-parallelize application?
Old device drivers
Creating/destroying agents
Too many connections
Excessive data flow?
• LOBs
• intermediate data
Shared n/w conflict
Lock escalation?
Lock contention?
Deadlocks?
Too few prefetchers?
Too few cleaners?
Append contention?
Mincommit too high?
Inadequate disk configuration / subsystem?
(*) Steve Rees
#IDUG
Agenda
• Performance Basics
• Diagnosing Performance Problems
• Sanity Check
• RCA Decision Tree
• First Aid toolkit
• OS tools
• New Monitoring in DB2 9.7
• How to identify Resource Bottleneck
• Interesting Case Study
20
#IDUG
Case study
Problem Statement: During month end processing of a financial application, end users complain of very poor response time. Sometimes it gets so bad that DBAs have to ask end users to log off to bring performance under control
21
BTIMESTAMP ACT_COMPLETED_TOTAL TOTAL_RQST_TIME TOTAL_CPU_TIME TOTAL_WAIT_TIME WAIT_TIME_PCT
------------------------------------------------- -------------------- -------------------- -------------------- ------------------------------------------------------------------------------------------------
2013-12-2709.56.06.953792 6529869 10849020 4013850 2824043 26.0
2013-12-2710.26.40.357524 7745399 54753448 5276679 19553857 35.7
2013-12-2710.57.22.446657 10858202 62094470 7724838 24864248 40.0
2013-12-2711.28.20.821182 8357401 170340138 7457519 91122840 53.4
2013-12-2711.59.16.159745 11007690 55956823 9997584 22008190 39.3
2013-12-2712.29.57.777965 9404492 58226702 10549689 26355559 45.2
2013-12-2713.00.47.218433 7274839 39615706 7158033 12774956 32.2
2013-12-2713.31.32.355720 8404885 36547009 6592374 18233963 49.8
2013-12-2714.02.19.522536 7663970 92621096 7499419 56746389 61.2
2013-12-2714.33.06.331026 8458188 182307546 8925829 105314269 57.7
2013-12-2715.07.57.484561 8062552 213516694 7368941 130611517 61.1
2013-12-2715.43.00.412277 9089660 169469181 9180359 115169806 67.9
2013-12-2716.16.16.403637 10209457 133897425 10349969 87745314 65.5
2013-12-2716.49.40.864305 12077339 99660976 13295425 55573696 55.7
2013-12-2717.21.12.302235 11608401 62731089 12411983 31806858 50.7
2013-12-2717.54.08.895777 11497929 78053625 11681713 42326863 54.2
2013-12-2718.27.15.545434 12356058 94755516 10579364 53262337 56.2
Time Spent Query #1 : Where is DB2 spending time?
#IDUG
22
Time Spent Query : Wait time breakup & I/O response time BTIMESTAMP TOTAL_RQST_TIME CLIENT_IDLE_WAIT_TIME TOTAL_WAIT_TIME Wait time pct of
request time Lock wait time pct of Total Wait Pool I/O pct of Total Wait Direct I/O pct of Total Wait Log
disk wait pct of Total Wait TCP/IP wait pct of Total Wait IPC wait pct of Total Wait FCM wait
pct of Total Wait Lock waits Lock timeouts Deadlocks Lock escalations
-------------------------- -------------------- --------------------- --------------------
2013-12-2714.33.06.331026 182307546 751154075 105314269
57.7 11.1 83.9 3.5
1.0 0.3 0.0 0.0
5044 8 3 0
2013-12-2715.07.57.484561 213516694 681158134 130611517
61.1 6.8 86.6 4.9
1.3 0.2 0.0 0.0
14553 18 1 0
2013-12-2715.43.00.412277 169469181 770049577 115169806
67.9 15.7 79.2 3.4
1.2 0.3 0.0 0.0
9546 22 5 0
2013-12-2716.16.16.403637 133897425 802722190 87745314
65.5 7.8 86.3 3.5
1.7 0.5 0.0 0.0
10540 0 1 0
BTIMESTAMP BPName Total Reads Total Writes Async writes
Page cleaning ratio Avg read time Avg write time
-------------------------- --------------------
2013-12-2714.02.19.485416 IBMDEFAULTBP 16920842 663812
660660 99.5 3.18 24.50
2013-12-2714.33.06.293863 IBMDEFAULTBP 20670203 738442
734670 99.4 9.24 44.02
2013-12-2715.07.57.393521 IBMDEFAULTBP 21575971 384557
381150 99.1 17.88 45.06
2013-12-2715.43.00.216328 IBMDEFAULTBP 20366286 567330
566939 99.9 14.44 29.56
2013-12-2716.16.16.374577 IBMDEFAULTBP 22007743 596890
596750 99.9 9.72 20.10
#IDUG
OS tools
23
xfer: %tm_act bps tps bread bwrtn rps avgserv minserv maxserv timeouts fails wps avgserv
minserv maxserv timeouts fails avgtime mintime maxtime avgwqsz avgsqsz sqfull
hdisk434 95.1 2.4M 242.5 74.5K 2.3M 14.7 10.4 0.1 2.5S 0 0 227.8 12.3 0.1
5.0S 0 0 5.6 0.0 250.3 1.0 2.0 27.6 15:00:59
hdisk434 94.5 1.8M 258.3 110.5K 1.7M 21.8 11.7 0.1 2.5S 0 0 236.5 12.3 0.1
5.0S 0 0 7.2 0.0 250.3 1.0 3.0 41.0 15:01:59
hdisk434 94.3 1.8M 258.9 108.5K 1.7M 21.3 10.2 0.1 2.5S 0 0 237.7 12.5 0.1
5.0S 0 0 7.6 0.0 250.3 2.0 3.0 44.3 15:02:59
hdisk434 93.3 1.8M 251.9 103.4K 1.7M 20.2 10.2 0.1 2.5S 0 0 231.7 11.8 0.1
5.0S 0 0 6.6 0.0 250.3 1.0 2.0 35.7 15:03:59
2 disks at ~100%
disk utilization
#IDUG
Hot Queries
24
TIME ROWNUM RR/RS Rows Read Rows Returned TOTAL_ACT_WAIT_TIME
NUM_EXECUTIONS SQL_TEXT
---------------------------------------------------------------------------------------------------------
2012-12-27-14.32.45.822962 1 112 32599 291 1600321 294
select a.percentage,b.descr,d.cts_rm_billability from X a
2012-12-27-14.32.45.822962 2 - 1309 0 871785 1309
UPDATE PSIBQUEUEINST SET
2012-12-27-14.32.45.822962 3 1 679 679 819480 2401
SELECT POL_TOTAL FROM YL WHERE PERIOD_END_DT=? AND EMPLID = ?
2012-12-27-14.32.45.822962 4 1 1177 1177 563161 523
SELECT SHEET_ID, SEQ_NUM, ATTACHSYSFILENAME, ATTACHUSERFILE, LAST_UPDATE_DTTM,
2012-12-27-14.32.45.822962 5 1 537 536 510173 556
SELECT CUST_ID, PROJ_ROLE, DESCR FROM XXX WHERE CUST_ID=?
2012-12-27-14.32.45.822962 6 0 0 949 352744 1025
SELECT 'Y' FROM YYY WHERE EMPLID = ? FETCH FIRST 1 ROWS ONLY
2012-12-27-14.32.45.822962 7 85 8391 98 340375 109
SELECT SHEET_ID, SEQ_NBR_3, EXPENSE_TYPE, DESCR, TXN_AMOUNT,
2012-12-27-14.32.45.822962 8 1 268 268 333748 131
SELECT CUST_ID, PROJ_ROLE, DESCR FROM..
2012-12-27-14.32.45.822962 9 1 3964 3859 325703 36
select b.emplid from x a, y b where a.oprid = ?
2012-12-27-14.32.45.822962 10 3 925 293 288198 293
SELECT BUSINESS_UNIT, PROJECT_ID, EFFDT, RESOURCE_CATEGORY
2012-12-27-14.32.45.822962 11 1 181 181 285040 112
SELECT FILL.PROJECT_MANAGER,FILL APPROVAL_STS,FILL.COUNT_7
• Explain plans for the hot queries shows lot of I/O happening due to large index-scans and
table-scans
#IDUG
Performance Tuning
25
• Two pronged approach • Query Tuning: Create indexes to reduce large index scans and table
scans
• DB level Tuning :
• Compressing the top 50 large tables
• Increasing the Bufferpool size
• Storage team advised to add I/O capacity
#IDUG
Sweta Singh IBM [email protected]
DB2 Performance Tuning: Tips & Techniques Session 6
Please fill out your session
evaluation before leaving!