TRANSCRIPT
www.metron-athene.com
Top 5 Performance and Capacity Challenges for z/OS
Commercial in Confidence
Agenda
• What has happened in the past 12 months with the mainframe
• How organizations are making better use of their mainframe capacity
• The Top 5 performance and capacity challenges facing organizations
Happy 50th Anniversary
• IBM announced the new System/360 family of computers in 1964 as a replacement for its existing machines in what was, at the time, the low- to mid-size market.
Difference between Performance Management and Capacity Management
Capacity versus Performance
Performance Management deals with how quickly a resource responds to a request for service, and with mitigating any existing issues. Performance can be affected by capacity constraints.
Capacity Management deals with planned enhancements to existing systems, growth of existing systems, or the design of new systems, evaluating each to find the resources required to provide adequate performance at a reasonable cost.
05/03/2023
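Figure: Capacity Management activities: Business, Service and Component capacity management; Capacity Plan; CMIS (Capacity Management Information System); Demand Management; Modelling; Application Sizing; and the Monitor, Analyze, Tune, Implement cycle.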
What has happened in the past 12 months
• The zEC12 machine builds on z10 / zEnterprise
  – Hexa-core 5.5 GHz CPUs, up to 101 cores, maximum 3 TB memory, 1 TB per LPAR, 2 GB page support
  – zManager controls the zEC12 and zBX (AIX, Windows, Linux)
  – CFCC Level 19: Flash Express for CF resilience
• z/OS continues to provide innovation within the enterprise
  – z/OS V2.1: better cross-memory communications, zEnterprise Data Compression (zEDC); "Data Ready, Cloud Ready, Security Ready"
  – PL/I compiler exploits the Decimal-Floating-Point Zoned-Conversion Facility
How are organizations making better use of the mainframe
• Cloud
• Big Data
• New and exciting buzzwords for non-mainframers; kind of business as usual for us
• z/OS, z/VM and z/Linux make a winning combination
Top 5 Performance and Capacity Challenges
• z/OS in the enterprise
• zIIP Capacity Planning and DB2
• MQ on z/OS
• Likely changes to the mainframe in the next 12 months
z/OS in the Enterprise
• Continues to be a critical IT asset housing mission-critical applications
• Greying of the mainframe workforce is causing issues within some organizations
• z/VM enables consolidation of hardware within the Enterprise
• Parallel Sysplex upgrades improve clustering of mainframes
Critical IMS Business Application
Figure: IMS0 logical units, top 10: average response time (sec), scale 0.00 to 2.00. Reporting period: January 27, 2013. Series (logical unit, terminal, all on IMS0): I00EE027/IIMDDKU2, I00EE046/IIMDDKU3, I00EE047/IIMDD202, I00EE089/IIMDDKU1, I00EE116/IIMDD201, N078/IIMDD201, V535/IIMDMU01, V572/IIMDMU01, V578/IIMDMU01, Y009/IIMCDB03.
Tools for the Greying Workforce
• Tuning Hints
  – Provide automated analysis of SMF data
  – Point out anomalies
  – Provide focus on real issues
• Knowledge base of problems
  – Provides information about the component in question
  – Allows for training of less experienced personnel
z/OS Tuning Hints
Summary
Legend: ◎ No problem; ○ No problem, but there is some concern; △ Possible issue in the future; × There is a problem
CWJ2
PROCESSOR ○
・ Total host processor usage reaches about 90% in some intervals (10:00 to 10:15 and 10:45 to 11:00). Online response time is fast, so there is no problem at present.
If transaction volume increases in future, online response time may be affected by increasing processor wait.
・ CPU usage exceeds the share defined by the LPAR weight (21.43%), but this is not a problem because processor wait is small. Workloads of note: ONLINE_ONLHI_01 and ONLINE_ONLVHI_01.
・ Processors are being used efficiently with HiperDispatch (high processor share: 2 CPs out of 12 CPs).
CENTRAL STORAGE ◎
・ Central storage has adequate capacity; usage was less than 80% (central storage size is 19 GB, and about 3 GB was not used).
・ No page-ins were seen, and the average minimum UIC was 65535.
VIRTUAL STORAGE △
・ An ESQA shortage occurred, and ECSA was used instead. The maximum amount used was about 28 MB.
This is an emergency measure to avoid a system outage and should not normally occur. Increase the size of ESQA to avoid it.
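The LPAR weight share quoted in the report is the LPAR's weight divided by the total weight of all LPARs sharing the processor pool. A minimal sketch, assuming hypothetical LPAR names and weights chosen so the share comes out at the report's 21.43% (the real weights are not shown); the function names are illustrative, not part of any product:

```python
from typing import Dict

def weight_share(weights: Dict[str, int], lpar: str) -> float:
    """Percentage of the shared CP pool guaranteed to `lpar` by its weight."""
    return 100.0 * weights[lpar] / sum(weights.values())

def entitled_cps(weights: Dict[str, int], lpar: str, physical_cps: int) -> float:
    """The same share expressed as a number of physical CPs."""
    return weight_share(weights, lpar) / 100.0 * physical_cps

def over_share(used_pct_of_pool: float, weights: Dict[str, int], lpar: str) -> bool:
    """True if the LPAR used more of the pool than its weight guarantees."""
    return used_pct_of_pool > weight_share(weights, lpar)

# Hypothetical weights: 300 / 1400 = 21.43%
weights = {"PROD1": 300, "PROD2": 700, "TEST1": 400}
print(f"{weight_share(weights, 'PROD1'):.2f}%")            # 21.43%
print(f"{entitled_cps(weights, 'PROD1', 12):.2f} CPs")     # 2.57 CPs of the 12
print(over_share(30.0, weights, "PROD1"))                  # True
```

An uncapped LPAR can consume more than its weight share whenever other LPARs leave capacity unused, which is the situation the report describes; the weight only decides who wins when the pool is contended.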
Parallel Sysplex Upgrades
• Cross-system Coupling Facility (XCF) watches for dead or dying LPARs and takes action
• Enhanced detection when a coupling facility (CF) structure takes too long to respond
• Enhanced support for XCF usage across multiple sites and over high-speed communication lines
Coupling Facility within Parallel Sysplex
Figure: Coupling Facility send message count (total), scale 0 to 1,800,000, by system and direction (INBOUND, OUTBOUND, and LOCAL for BBL2 and BBL3) for systems BBK1, BBK2, BBL1 to BBL8, BBLA, GLL3, GLL6, GLL8, WYL2 and WYL5. Date shown: Feb/04/2014. Reporting period: February 1 to February 28.
Understanding the data usage profile to monitor peaks and spikes
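One simple way to separate routine peaks from spikes in a profile like this is to compare each interval against a robust baseline for the series. A minimal sketch, assuming the interval values have already been extracted into a list (the sample counts are hypothetical); the median baseline and the 3x factor are illustrative choices, not a Metron or IBM method:

```python
from statistics import median
from typing import List, Sequence

def find_spikes(values: Sequence[float], factor: float = 3.0) -> List[int]:
    """Return indices of intervals whose value exceeds `factor` x the median.

    The median is used as the baseline because, unlike the mean, it is not
    dragged upward by the very spikes we are trying to find.
    """
    if not values:
        return []
    baseline = median(values)
    return [i for i, v in enumerate(values) if v > factor * baseline]

# Hypothetical interval send-message counts for one system and direction
counts = [120_000, 110_000, 130_000, 125_000, 900_000, 118_000, 122_000, 1_400_000]
print(find_spikes(counts))  # [4, 7]
```

Intervals flagged this way are the ones worth drilling into by system and direction; everything else is the normal profile.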
zIIP Capacity Planning
• A couple of zIIP myths
  – "Don't run a zIIP more than 30% busy" vs. "you can run a zIIP at 100% busy"... nope!
• The truth is somewhere in between, although archetypal usage of zIIP processors is low with occasional bursts
  – "Only DB2 with DDF uses zIIP"... nope! Other zIIP users include:
• DRDA over TCP/IP
• Parallel Query
• DB2 Utilities: Load, Reorg, Rebuild
• DB2 V9 z/OS remote native SQL procedures
• TCP/IP IPSec
• zIIP Assisted HiperSockets Multiple Write
• OMEGAMON XE for CICS on z/OS 510 and later
• SORT / SYNCSORT / CA-SORT
• Shadow Direct
• And others...
Figure: zIIP usage, 4 systems stacked, by hour (00:00 to 23:00), March 24; scale 0 to 70; configuration 4 GCPs : 1 zIIP.
Typical usage profile: mostly low, with some peaks and a couple of "spikes".
• Measure everywhere you can to see the fuller picture
  – LPARs
  – Service/Report Classes
  – Jobs
  – Transactions
• Decide what %busy is "full" for you
  – If you see zIIP-on-CP in your peak period, consider tuning first; the last resort is to buy more hardware or just live with it!
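Once you have decided what "full" means, the check can be applied mechanically to interval data. A minimal sketch, assuming you have already extracted, per interval, the zIIP %busy and the zIIP-eligible CPU seconds that ran on general-purpose CPs; the record layout, the 70% "full" threshold and the 09:00 to 17:00 peak window are hypothetical placeholders for your own values:

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Interval:
    hour: int              # hour of day, 0-23
    ziip_busy_pct: float   # zIIP utilization for the interval
    ziip_on_cp_sec: float  # zIIP-eligible CPU seconds that ran on GCPs

def review(intervals: List[Interval], full_pct: float = 70.0,
           peak_hours: Tuple[int, int] = (9, 17)) -> List[str]:
    """Report intervals above your 'full' level, and any zIIP-on-CP in the peak."""
    findings = []
    start, end = peak_hours
    for iv in intervals:
        if iv.ziip_busy_pct > full_pct:
            findings.append(f"{iv.hour:02d}:00 zIIP {iv.ziip_busy_pct:.0f}% busy (> {full_pct:.0f}%)")
        if start <= iv.hour < end and iv.ziip_on_cp_sec > 0:
            findings.append(f"{iv.hour:02d}:00 {iv.ziip_on_cp_sec:.1f}s zIIP-on-CP in peak: tune first")
    return findings

sample = [Interval(3, 12.0, 0.0), Interval(10, 75.0, 4.2), Interval(14, 40.0, 0.0)]
for line in review(sample):
    print(line)
```

zIIP-on-CP outside the peak window is deliberately ignored here, on the reasoning that GCP capacity is usually less contended then.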
• LPAR configuration
  – Marketing rules have changed: now 1 GCP : 2 zIIPs, previously 1 GCP : 1 zIIP
  – EC12 is the last machine to have zAAP hardware; only zAAP-on-zIIP after this
  – Plan LPAR configuration weights and examine the effect of processor upgrades, especially across families
  – Watch your logical : physical ratios; the queuing characteristics of a uniprocessor are much worse than even a two-way system...
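A planned configuration can be sanity-checked against the ratio rule and the logical : physical ratio before it goes to the configurator. A minimal sketch, assuming the 2 zIIPs per GCP rule (previously 1 : 1) quoted above applies per machine; the LPAR names and function names are hypothetical, and IBM's configurator remains the authority:

```python
from typing import Dict

def max_ziips(gcps: int, ziips_per_gcp: int = 2) -> int:
    """Most zIIPs allowed for `gcps` general-purpose CPs under an N:1 rule."""
    return gcps * ziips_per_gcp

def ziip_config_ok(gcps: int, ziips: int, ziips_per_gcp: int = 2) -> bool:
    """True if the planned zIIP count fits the ratio rule."""
    return 0 <= ziips <= max_ziips(gcps, ziips_per_gcp)

def logical_to_physical(logical_by_lpar: Dict[str, int], physical: int) -> float:
    """Total logical processors defined across LPARs per physical processor."""
    return sum(logical_by_lpar.values()) / physical

print(ziip_config_ok(4, 1))                   # True: the 4 GCP : 1 zIIP setup charted above
print(ziip_config_ok(4, 8))                   # True under the 1 GCP : 2 zIIP rule
print(ziip_config_ok(4, 8, ziips_per_gcp=1))  # False under the old 1 : 1 rule
print(logical_to_physical({"LPAR1": 6, "LPAR2": 6}, 4))  # 3.0
```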
Figure: Response time vs. utilization (0% to 100%), relative to service time, for a single CPU, a dual CPU and a 16-way CPU.
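The shape of these curves follows from basic queueing theory. A minimal sketch, assuming an M/M/c model (random arrivals, exponential service times, c identical processors sharing one queue), which is a textbook simplification rather than how z/OS dispatching actually works; it uses the Erlang C formula to compute mean response time as a multiple of service time:

```python
from math import factorial

def erlang_c(c: int, rho: float) -> float:
    """Probability an arriving request must queue, for c servers each rho busy."""
    if not 0 <= rho < 1:
        raise ValueError("per-server utilization must be in [0, 1)")
    a = c * rho  # offered load in Erlangs
    top = a ** c / factorial(c) / (1 - rho)
    bottom = sum(a ** k / factorial(k) for k in range(c)) + top
    return top / bottom

def response_time(c: int, rho: float, service: float = 1.0) -> float:
    """Mean response time = service time + mean queueing delay (M/M/c)."""
    wait = erlang_c(c, rho) * service / (c * (1 - rho))
    return service + wait

# Response time as a multiple of service time at 50%, 80% and 90% busy
for c in (1, 2, 16):
    print(c, [round(response_time(c, u), 2) for u in (0.5, 0.8, 0.9)])
```

For one processor this reduces to S / (1 - rho), so response time doubles at 50% busy and is five times the service time at 80%; the 16-way stays close to the service time at the same utilization, which is why a single zIIP or a uniprocessor LPAR needs more headroom.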
• LPAR configuration (cont'd)
  – You can't choose what uses zIIP (or zAAP): if it's there and work is z**P-eligible, the system will try to use it. The only control you have is not to configure it to an LPAR
  – With a low-weight LPAR where only DB2 "DBM1" is likely to use zIIP, maybe it's better not to configure one rather than dilute the available processing power?
• Upgrading DB2 to V10
  – In V10, 100% of Deferred Write and Prefetch Engine work is zIIP-eligible; most of the work of the DBM1 address space becomes zIIP-eligible
  – Highly performance sensitive
  – You need to protect access to the CPU for critical address spaces in WLM
  – Velocity of 90% or better to maintain throughput
• Falling back to CP
  – If a z**P can't handle a piece of work within a given time, the work may be rescheduled to the CP queue: can work in MSTR or DBM1 afford to wait at least that long (wait for z**P + wait for CP)? Remember: a z**P may not be free to buy, but it is free to use; CPs are neither
  – CCCAWMT= in IEAOPTxx controls the wait time; it has different defaults depending on whether HIPERDISPATCH is on or off
  – Look at z**P-on-CP time, number of transactions, etc., and make an educated guess at how much it may delay things... there's no easy way to report on this
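Since there is no direct report, a back-of-the-envelope bound is about the best available, using the "wait for z**P + wait for CP" reasoning above. A minimal sketch, assuming you can estimate how many units of work fell back to a CP in an interval and the average CP queue delay; the inputs are hypothetical, and the 3,200 microsecond wait is an illustrative value, not a recommendation or a documented default:

```python
def fallback_delay_bound(fallbacks: int, awm_wait_us: float, cp_wait_us: float) -> float:
    """Upper-bound extra delay (seconds) from work falling back from z**P to CP.

    Each fallback is assumed to wait up to awm_wait_us for a z**P before being
    offered to the CPs, then cp_wait_us in the CP dispatch queue.
    """
    return fallbacks * (awm_wait_us + cp_wait_us) / 1_000_000

def per_transaction_ms(total_delay_s: float, transactions: int) -> float:
    """Spread the total delay evenly over the interval's transactions."""
    return 1000.0 * total_delay_s / transactions if transactions else 0.0

# Hypothetical interval: 50,000 fallbacks, 3,200 us z**P wait, 500 us CP queue delay
delay = fallback_delay_bound(50_000, 3_200, 500)
print(f"{delay:.0f} s total, {per_transaction_ms(delay, 400_000):.3f} ms per transaction")
```

Spreading the delay evenly is optimistic: in practice it lands on the transactions whose work happened to fall back, so treat the per-transaction figure as a floor on what the unlucky ones see.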
• zAAP-on-zIIP
  – Newer mainframes will not have zAAPs, just zIIPs
  – zAAP-on-zIIP runs zAAP work on a zIIP
  – You can't see what's "zAAP" work when it runs on a zIIP
  – Users of zAAP include CICS, WebSphere, XMLSS
• The dog that didn't bark
  – Some software checks whether a z**P is available and, if none is present, won't generate z**P-eligible work
  – PROJECTCPU= is great for showing what was requested to run on z**P, but it will show zeros until you turn one on, and then it may be too late...
WMQ on z/OS
• WMQ can only perform as well as the underlying resources that are used
• There are a number of things that can be done with the queue managers to help control performance issues, and the WMQ SMF data exposes information that helps in finding tuning opportunities
• The area with the largest effect on performance is bufferpools
• Often upgrading to the latest version will unlock performance improvements
WMQ Queue Manager Performance - Bufferpools
• Virtual Storage
  – Largest users are bufferpools
• Bufferpools
  – For private queues, messages are put into bufferpools
  – Bufferpool allocation and tuning are critical for private queue performance
  – MQ Statistics track use of time
    – Real-time messages are indicators that the pool is currently constrained
    – Bufferpool thrashing
      » Caused by messages being put to the queue and read into a constrained bufferpool at the same time
      » MQ page data sets back messages on DASD
• The best way to tune WMQ on z/OS is by evaluating bufferpool usage
• Know your environment
  – Understand what is normal within your environment; otherwise it can be difficult to work out what may be a problem
  – Don't assume that what looks normal is right
Detailed information can be found at: http://pic.dhe.ibm.com/infocenter/wmqv7/v7r1/index.jsp?topic=%2Fcom.ibm.mq.doc%2Fzc10680_.htm
WMQ Bufferpool Use - Recommendations
• Reserve Bufferpool 0 and Pageset 0 for WMQ
• Bufferpool 1 for system queues that do not go deep
• Bufferpool 2 for system queues that may go deep
• Reserve a separate bufferpool for the SYSTEM.CLUSTER.TRANSMIT.QUEUE
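These recommendations can be captured as a small plan and checked before being turned into queue manager initialization input. A minimal sketch, assuming a hypothetical pageset-to-bufferpool plan: the buffer counts mirror the MQP1 statistics that follow, but the pageset assignments, roles and helper names are illustrative. The generated lines are meant to resemble the WMQ for z/OS DEFINE BUFFPOOL and DEFINE PSID initialization commands; check the exact syntax for your release:

```python
from typing import Dict, List

# Hypothetical plan: bufferpool -> buffer count, role, pagesets it backs
PLAN: Dict[int, dict] = {
    0: {"buffers": 50000, "role": "WMQ system objects", "psids": [0]},
    1: {"buffers": 20000, "role": "system queues that do not go deep", "psids": [1]},
    2: {"buffers": 50000, "role": "system queues that may go deep", "psids": [2]},
    3: {"buffers": 20000, "role": "SYSTEM.CLUSTER.TRANSMIT.QUEUE only", "psids": [3]},
    4: {"buffers": 50000, "role": "application queues", "psids": [4, 5]},
}

def problems(plan: Dict[int, dict], cluster_xmitq_bp: int = 3) -> List[str]:
    """Check the plan against the bufferpool recommendations above."""
    issues = []
    if plan.get(0, {}).get("psids") != [0]:
        issues.append("bufferpool 0 should back pageset 0 only")
    for bp, cfg in plan.items():
        if bp != 0 and 0 in cfg["psids"]:
            issues.append(f"pageset 0 is mapped to bufferpool {bp}, not 0")
    if len(plan.get(cluster_xmitq_bp, {}).get("psids", [])) != 1:
        issues.append("cluster transmit queue bufferpool should back a single pageset")
    return issues

def to_commands(plan: Dict[int, dict]) -> List[str]:
    """Render the plan as initialization-style DEFINE statements."""
    cmds = [f"DEFINE BUFFPOOL({bp}) BUFFERS({cfg['buffers']})" for bp, cfg in sorted(plan.items())]
    for bp, cfg in sorted(plan.items()):
        cmds += [f"DEFINE PSID({ps:02d}) BUFFPOOL({bp})" for ps in cfg["psids"]]
    return cmds

print(problems(PLAN))
print("\n".join(to_commands(PLAN)))
```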
WMQ Bufferpool Statistics
--- BUFFER MANAGER STATISTICS ( MQP1 ) ---
                 BUFFER STEALABLE-BUFFER GETPAGE-REQUESTS READ SETWRITE WRITE WRITE WRITE THRESHOLD
YY/MM/DD HHMM BP    NUM  FREE%  SOS   MISS  CONT    OLD    NEW   IO REQUEST  PAGE   IO  SYNC  SYNC  ASYN REMARKS
13/07/20 0006  0  50000  99.89    0      0     0  48608   9707    0   39533     0    0     0     0     0
               1  20000  100.0    0      0     0      0      0    0       0     0    0     0     0     0
               2  50000  99.96    0      0     0   2996    537    0    2475     0    0     0     0     0
               3  20000  100.0    0      0     0      0      0    0       0     0    0     0     0     0
               4  50000  100.0    0      0     0      0      0    0       0     0    0     0     0     0
13/07/20 0036  0  50000  99.87    0      0     0  51923   7498    0   44212     0    0     0     0     0
               1  20000  100.0    0      0     0      0      0    0       0     0    0     0     0     0
               2  50000  74.00    0   7890     0  26731  13099    0   39448     0    0     0     0     0
               3  20000  100.0    0      0     0      0      0    0       0     0    0     0     0     0
               4  50000  100.0    0      0     0      0      0    0       0     0    0     0     0     0
13/07/20 0106  0  50000  99.87    0      0     0  43213   5533    0   37795     0    0     0     0     0
               1  20000  100.0    0      0     0      0      0    0       0     0    0     0     0     0
               2  50000  44.77    0  14413     0  30001  14720    0   44319     0    0     0     0     0
               3  20000  100.0    0      0     0      0      0    0       0     0    0     0     0     0
               4  50000  100.0    0      0     0      0      0    0       0     0    0     0     0     0
Monitor FREE% and watch for bufferpools heading toward SOS (Short on Storage)
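FREE% is the column to trend: in the report above, bufferpool 2 drops from 99.96% to 44.77% in an hour while the others stay near 100%. A minimal sketch, assuming the (interval, bufferpool, FREE%) values have been pulled from a report like this one for a single day; the values below are copied from it, while the 80% warning level and the "falling in every interval" rule are illustrative assumptions, not IBM thresholds:

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# (HHMM, bufferpool, FREE%) from the MQP1 statistics above
ROWS: List[Tuple[str, int, float]] = [
    ("0006", 0, 99.89), ("0006", 1, 100.0), ("0006", 2, 99.96), ("0006", 3, 100.0), ("0006", 4, 100.0),
    ("0036", 0, 99.87), ("0036", 1, 100.0), ("0036", 2, 74.00), ("0036", 3, 100.0), ("0036", 4, 100.0),
    ("0106", 0, 99.87), ("0106", 1, 100.0), ("0106", 2, 44.77), ("0106", 3, 100.0), ("0106", 4, 100.0),
]

def at_risk(rows: List[Tuple[str, int, float]], warn_below: float = 80.0) -> Set[int]:
    """Bufferpools whose latest FREE% is below warn_below, or that fell every interval."""
    series: Dict[int, List[float]] = defaultdict(list)
    for _, bp, free in sorted(rows):  # HHMM sorts chronologically within one day
        series[bp].append(free)
    risky = set()
    for bp, frees in series.items():
        falling = len(frees) > 1 and all(b < a for a, b in zip(frees, frees[1:]))
        if frees[-1] < warn_below or falling:
            risky.add(bp)
    return risky

print(at_risk(ROWS))  # {2}
```

The trend rule catches a pool on its way down before it crosses any fixed threshold, which is the point of watching "where you are going".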
Likely changes in the mainframe in the next 12 months
• "You canny break the laws of physics, Captain..."
• Top-end System z CPU clock speed is 5.5 GHz, and electricity does take a finite time to move from one place to another
• Don't expect many more increases in "clock speed" (not that System z executes only one instruction per tick anyway, but...)
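The physics point can be made concrete with a little arithmetic: at 5.5 GHz one clock cycle lasts about 0.18 ns, and even light in a vacuum covers only about 5.5 cm in that time. A minimal sketch; the 0.5 factor for signal speed in real conductors is a rough assumption for illustration, since actual propagation speeds vary and on-chip wiring is slower still:

```python
C_M_PER_S = 299_792_458  # speed of light in a vacuum, metres per second

def cycle_time_ns(ghz: float) -> float:
    """Duration of one clock cycle in nanoseconds (1 / GHz)."""
    return 1.0 / ghz

def reach_per_cycle_cm(ghz: float, fraction_of_c: float = 1.0) -> float:
    """Distance a signal can travel in one clock cycle, in centimetres."""
    seconds = cycle_time_ns(ghz) * 1e-9
    return C_M_PER_S * fraction_of_c * seconds * 100

print(f"{cycle_time_ns(5.5):.3f} ns per cycle")             # 0.182 ns per cycle
print(f"{reach_per_cycle_cm(5.5):.1f} cm at light speed")   # 5.5 cm at light speed
print(f"{reach_per_cycle_cm(5.5, 0.5):.1f} cm at 0.5c")     # 2.7 cm at 0.5c
```

Doubling the clock halves that reach, which is why further throughput gains are expected to come from parallelism rather than frequency.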
• IBM may announce a form of “hyperthreading for System z” to facilitate better throughput, like in AIX, Solaris, Windows, VMware, Xen etc.
• Newly announced POWER8 chips: 8 threads/core
• New top-end System z machines already support over 100 physical processors. Imagine that doubling overnight to 200... or quadrupling... or...
• You may need to rework applications to survive in a more parallel world
  – Big single-stream batch suites won't see any benefit; break apart, run in parallel, re-combine later
  – Big single-TCB batch jobs won't see any benefit; break apart, run in parallel, re-combine later
  – Big monolithic transactions may need looking at to see how to make them exploit more TCBs
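The "break apart, run in parallel, re-combine later" pattern has the same shape in any language. A minimal sketch, assuming a hypothetical batch step that totals record values; on z/OS the pieces would more likely be separate jobs or TCBs, and in CPython the threads here illustrate the structure without speeding up CPU-bound work, because of the global interpreter lock:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence

def process_chunk(records: Sequence[int]) -> int:
    """Stand-in for the work one slice of the batch step does."""
    return sum(records)

def split(records: Sequence[int], parts: int) -> List[Sequence[int]]:
    """Break the input into at most `parts` contiguous, roughly equal slices."""
    if not records:
        return []
    size = -(-len(records) // parts)  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]

def run_parallel(records: Sequence[int], parts: int = 4) -> int:
    chunks = split(records, parts)
    with ThreadPoolExecutor(max_workers=parts) as pool:
        partials = list(pool.map(process_chunk, chunks))  # run in parallel
    return sum(partials)  # re-combine

data = list(range(1, 1001))
print(run_parallel(data))  # 500500
```

Contiguous slices and the order-preserving `pool.map` keep the re-combine step simple when output order matters; a sum does not care, but a sorted or sequenced batch output would.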
• Watch out for how that looks in reporting! For example, Windows doesn’t tell you how busy your cores are
• Hopefully IBM will do a better job, as they have on AIX, where physical core usage is recorded regardless of how the system is virtualized
Questions