
Page 1: Smarter z/OS Infrastructure Management

1

Smarter z/OS Infrastructure Management with IntelliMagic Vision v8.8

Brent Phillips – Managing Director, Americas

Todd Havekost – Sr. IntelliMagic Consultant

August 24, 2016

Page 2: Smarter z/OS Infrastructure Management

2

Topics

• Modernizing how we process and understand RMF/SMF
• IntelliMagic Vision v8.8 new feature highlights
• How to identify unrealized MLC reduction opportunities

Page 3: Smarter z/OS Infrastructure Management

3

Commonly Held Assumptions

1. After 30+ years, the RMF/SMF reporting process is mature
Reality – the greatest potential value in the data is still unutilized

2. SMF is for forensics and trending, with no value for real-time
Reality – it can provide “sooner than real-time” visibility

3. CPUs should be run at 100% to maximize cost efficiency
Reality – you can optimize cost without compromising on quality

Page 4: Smarter z/OS Infrastructure Management

4

At IntelliMagic we create new intelligence out of your performance and configuration data.

“Any sufficiently advanced technology is indistinguishable from magic”

Arthur C. Clarke, 1962

Page 5: Smarter z/OS Infrastructure Management

5

Your goals?

Keep the z/OS infrastructure running the application workloads without service disruptions and as efficiently as possible:
• Prevent: See issues before they can disrupt service and availability
• Resolve: Quickly identify and solve underlying problems
• Optimize: Save cost without risk to performance or availability
• Elevate: Amplify the strengths of the IT team

Page 6: Smarter z/OS Infrastructure Management

6

Modernizing with Embedded Expert Knowledge

• Realize these goals with intelligence from the SMF data, but only if interpreted using embedded expert knowledge:
‒ Derive new, meaningful metrics out of the raw SMF data
‒ Throughput limits for internal components
‒ Best practices for configuration and performance management
‒ Balance and redundancy identification
‒ Relationship and interaction of logical and physical resources

Page 7: Smarter z/OS Infrastructure Management

7

The Difference

• Far more powerful than statistical tools that look at anomalies:
‒ Their lack of interpretation means they cannot make predictions
‒ They can suffer from a lot of false positives
‒ They do not help create higher efficiency

• Far more efficient than a human looking through all the data:
‒ Human intelligence is powerful, but there simply isn’t enough time
‒ Very ‘boring’ to review every metric/result to find the needle

Page 8: Smarter z/OS Infrastructure Management

8

Performance Today: Either Good or Bad

Unconcerned

Strategic Focus

Panic - Hard to focus

What’s wrong with this picture??

Page 9: Smarter z/OS Infrastructure Management

9

Performance using Better Intelligence

Unconcerned

Strategic Focus

Panic - Hard to focus

Page 10: Smarter z/OS Infrastructure Management

10

Real-time Performance Monitoring

[Chart: Response Time plotted against Time, with an SLA threshold on the Performance axis]

Your existing monitors look at symptoms here, only after users experience problems

Easy metric to get, but is an effect, not a cause

© IntelliMagic 2014

Page 11: Smarter z/OS Infrastructure Management

11

Monitoring with Availability Intelligence

[Chart: Response Time and Sub-component Saturation plotted against Time, with an SLA threshold on the Performance axis]

Availability Intelligence identifies risk here, before response time suffers

Requires evaluating every data point with expert domain knowledge about every component

Page 12: Smarter z/OS Infrastructure Management

12

Avoiding Disruptions

[Chart: Response Time and Sub-component Saturation plotted against Time, with an SLA threshold on the Performance axis]

Most infrastructure “fires” can be prevented by intervening here

Page 13: Smarter z/OS Infrastructure Management

13

I/O Performance Example

Lag Measure:
• Storage Array Response Times

Lead Measures:
• Imbalance? (Within Array, Between Arrays)
• Application Workloads
• Config or Failure Changes?
• Disk Device Loads
• FW Bypass, etc.
• Back-end, Cache
• Adapter Utilization
• Fibre Switch Errors
• Front-end

Page 14: Smarter z/OS Infrastructure Management

14

Automation & the Power of Knowing

• Automatically identify risk in every interval, every device, every data center

• Like a “thousand pairs of eyes”, automated interpretation of what the data means is the only way to continuously achieve the ITIL v3 definition of capacity management:

– ensuring…the IT Infrastructure is able to deliver agreed Service Level Targets in a cost effective and timely manner…considers all Resources required to deliver the IT Service...

Page 15: Smarter z/OS Infrastructure Management

15

IntelliMagic Vision for z/OS Systems

Prevent, Resolve, Optimize, Elevate

z/OS System
• Processor
• CECs, LPARs
• Specialty Engines (zIIP, zAAP)
• Etc.

z/OS System
• Processor Cache
• Relative Nest Intensity
• SMF 113 Records
• Etc.

z/OS System
• Paging Reports
• Virtual Storage
• zEDC / PCI Express
• Etc.

z/OS System
• Workload Manager
• Several MIPS/MSU reports
• Trending/Comparison
• Etc.

Coupling Facility/XCF
• CF / XCF Health
• CF / XCF Analysis
• Trending
• Etc.

Jobs and Datasets
• Data Sets
• Address Spaces
• Trending
• Etc.

Page 16: Smarter z/OS Infrastructure Management

16

IntelliMagic for z/OS Disk & Replication

Prevent, Resolve, Optimize, Elevate

Disk Storage
• Front End
• Back End
• Channels & zHPF
• Etc.

Disk Replication
• Replication Status
• Rating Over Time
• Verify Balance
• Etc.

FICON Directors
• Director Health
• Service Statistics
• Channel Health
• Etc.

Jobs and Datasets
• Data Sets
• Address Spaces
• Trending
• Etc.

Intelligent Trending
• Selected Statistics
• Summarized Hourly
• Summarized Daily
• Etc.

Comparison
• All Reports
• Compare Daily
• Compare Weekly, Monthly
• Etc.

Page 17: Smarter z/OS Infrastructure Management

17

IntelliMagic Vision for z/OS Virtual Tape

Prevent, Resolve, Optimize, Elevate

Tape System
• Cache
• Throttling
• Balance
• Etc.

Tape Replication
• Send/Receive
• Grid Transfers/queues
• Balance
• Etc.

Intelligent Trending
• Selected Statistics
• Summarized Hourly
• Summarized Daily
• Etc.

Host Activity
• Systems/Jobs/Programs
• Volume Groups
• Device Groups
• Etc.

Front End
• Throughput
• Virtual Devices/Mounts
• Balance
• Etc.

Back End
• Pools
• Migration/Recall/Reclaim
• Balance
• Etc.

Detailed webinar overview on Virtual Tape Analytics: Tuesday August 30: bit.ly/zvtape

Page 18: Smarter z/OS Infrastructure Management

18

IntelliMagic Vision as a Service

• Good problem to solve with Software as a Service
• Easy access to intelligence relevant to different roles
• Access to IntelliMagic experts for knowledge transfer, analysis
• Solution infrastructure is managed for you, creating more focus

Page 19: Smarter z/OS Infrastructure Management

19

IntelliMagic Vision Homepage

Page 20: Smarter z/OS Infrastructure Management

20

Embedded Expertise: Infrastructure KRIs

Focus area: Disk Subsystem, LPAR, CPU, WLM, Virtual Tape, etc.

Performance Metrics

Key Risk Indicators

Automatically identify and rate performance risks and efficiency opportunities with 1000s of automated health checks

Page 21: Smarter z/OS Infrastructure Management

21

Embedded Expertise: Quantify good vs bad

Automatically rate existing and new metrics using embedded expert knowledge about z/OS and your infrastructure to derive intelligence about performance threats and efficiency opportunities

No Border, Opinion N/A

Green Border, Good

Yellow Border, Early Warning

Red Border, Exceptions
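
The four border colors above amount to a threshold-based rating of each metric. A minimal sketch of that idea follows. The rating labels come from the slide, but the `rate_metric` function, its thresholds, and the sample values are hypothetical illustrations and are not IntelliMagic Vision's actual rating logic.

```python
from enum import Enum
from typing import Optional


class Rating(Enum):
    NOT_RATED = "No Border, Opinion N/A"
    GOOD = "Green Border, Good"
    WARNING = "Yellow Border, Early Warning"
    EXCEPTION = "Red Border, Exceptions"


def rate_metric(value: float,
                warn: Optional[float],
                exc: Optional[float]) -> Rating:
    """Rate a metric for which higher values are worse.

    warn and exc are hypothetical thresholds. When either is missing,
    no opinion is given, which corresponds to "No Border" on the slide.
    """
    if warn is None or exc is None:
        return Rating.NOT_RATED
    if value >= exc:
        return Rating.EXCEPTION
    if value >= warn:
        return Rating.WARNING
    return Rating.GOOD


if __name__ == "__main__":
    # Assumed example: utilization percentages rated against 70% / 90%.
    for pct in (45.0, 75.0, 95.0):
        print(pct, rate_metric(pct, warn=70.0, exc=90.0).value)
```

In practice the thresholds would depend on the component being rated, which is the role the deck assigns to embedded expert knowledge.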

Page 22: Smarter z/OS Infrastructure Management

22

Embedded Expertise: Rate Exception Severity

A three-level rating system based on hardware capabilities

A three-level, dynamic rating based on both workload characteristics and hardware

Page 23: Smarter z/OS Infrastructure Management

23

IntelliMagic Vision v8.8 Highlights

Page 24: Smarter z/OS Infrastructure Management

24

8.8 z/OS Reporting Enhancements

• Application Groups

• Real and Virtual Storage

• Processor Reporting

Page 25: Smarter z/OS Infrastructure Management

25

Application Groups

Page 26: Smarter z/OS Infrastructure Management

26

Application Groups

Page 27: Smarter z/OS Infrastructure Management

27

Application Groups – Data Sources

• Data sets

• Coupling Facility structures

• Disk volumes

• XCF members

• XCF transmission groups

• Jobs

• Service Classes

• Report Classes

Page 28: Smarter z/OS Infrastructure Management

28

Page 29: Smarter z/OS Infrastructure Management

29

Real and Virtual Storage Reporting

• Virtual storage

• Real storage

• LFAREA / 1MB Pages

• High Virtual Common (HVCOMMON) and High Virtual Shared (HVSHARE) areas

• Storage Class Memory / Flash Storage

Page 30: Smarter z/OS Infrastructure Management

30

Page 31: Smarter z/OS Infrastructure Management

31

Processor Reporting – Guiding Principles

• Separate reporting of CPU on general purpose CPs from zIIPs and zAAPs

• Use MIPS to refer to the processor capacity rating metric (not in its classic sense of millions of instructions per second)

Page 32: Smarter z/OS Infrastructure Management

32

Processor Reporting – New Report Sets

• Tables and CP Usage

• 4 Hr Rolling Avg and Capping

• LPAR Mgmt and Capture Ratio

• Priority Raised

• Mobile

Page 33: Smarter z/OS Infrastructure Management

33

Page 34: Smarter z/OS Infrastructure Management

34

Tables and CP Usage

• Overview tables and multicharts from “All Processors” report set

• WLM Parms table from “WLM Constants” report set

• 2 sets of reports for CPU on general purpose CPs
‒ Sequenced by CEC, LPAR, Workload, Service Class
‒ First set in units of MIPS
‒ Second set in units of Processors

Page 35: Smarter z/OS Infrastructure Management

35

Page 36: Smarter z/OS Infrastructure Management

36

4 Hr Rolling Avg and Capping (1 of 5)

• New CECs table with data and drilldowns focused on Vertical CP configurations (“polarization”) and LPAR Topology (99.14)

• New LPAR Config table with key metrics for Vertical CP configuration and MLC analysis
‒ Physical & Logical CPs, Vertical CP configuration, LPAR % Weight, Capacity Group, Soft Cap
‒ Polarity drilldown shows MIPS by Polarity (VH, VM, VL)
‒ Changes drilldown lists changes in Vertical CP config

Page 37: Smarter z/OS Infrastructure Management

37

4 Hr Rolling Avg and Capping (2 of 5)

• Rolling 4 Hour Average charts - by CEC and LPAR
‒ Drilldown compares 4HRA vs. interval RMF CPU usage
‒ Further drilldowns by Workload, Service Class, Address Space

• % WLM Capping – critical aspect of MLC Reduction
‒ Capping Limited Processor Resources (%)
‒ Capping Processor Resources Considered by WLM (%)

• This indicates the time interval when the LPAR's access to CPU was limited and the vertical CP configuration was disrupted
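
The comparison of 4HRA against interval CPU usage can be illustrated with a small calculation. The sketch below is only an approximation under stated assumptions: the function names, the per-interval MSU input, and the 15-minute default interval are hypothetical, and averaging over the intervals seen so far during the first four hours is a simplification rather than the way WLM itself maintains the value.

```python
from collections import deque
from typing import Iterable, List


def rolling_4hr_average(msu_per_interval: Iterable[float],
                        interval_minutes: int = 15) -> List[float]:
    """Return the rolling 4-hour average after each interval.

    msu_per_interval holds the average MSU consumption of each
    measurement interval, oldest first (assumed input format).
    """
    window = (4 * 60) // interval_minutes  # intervals in four hours
    recent = deque(maxlen=window)
    total = 0.0
    averages = []
    for msu in msu_per_interval:
        if len(recent) == window:
            total -= recent[0]  # value about to drop out of the window
        recent.append(msu)
        total += msu
        averages.append(total / len(recent))
    return averages


def intervals_over_limit(averages: List[float], limit_msu: float) -> List[int]:
    """Indexes of intervals whose rolling average exceeds a given limit."""
    return [i for i, avg in enumerate(averages) if avg > limit_msu]


if __name__ == "__main__":
    hourly = [100.0, 200.0, 300.0, 400.0, 500.0]  # made-up hourly samples
    avgs = rolling_4hr_average(hourly, interval_minutes=60)
    print(avgs)
    print(intervals_over_limit(avgs, limit_msu=220.0))
```

Because the rolling average lags the interval values, a short peak can pass without pushing the average over a limit, while a sustained plateau will. That gap is what the drilldown comparison makes visible.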

Page 38: Smarter z/OS Infrastructure Management

38

4 Hr Rolling Avg and Capping (3 of 5)

• % Logical CP Utilization
‒ Low utilizations can indicate surplus logical CPs
‒ Online Processor drilldown helpful to confirm LPARs where logical CPs may be over-specified

• CP Weight – CP Usage vs. LPAR Weight (rated)
‒ Low values can help identify LPARs that could “donate” weight to increase Vertical Highs for high-use LPARs

• Logical CP tuning can help optimize Vertical CP configuration and minimize PR/SM overhead
‒ 2:1 Logical/Physical is general Rule of Thumb
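
The 2:1 rule of thumb can be checked with simple arithmetic. The sketch below assumes the ratio is taken as the total number of logical CPs defined across the LPARs on a CEC divided by that CEC's physical CPs. The function names, LPAR names, and counts are hypothetical.

```python
from typing import Dict


def logical_to_physical_ratio(logical_cps_by_lpar: Dict[str, int],
                              physical_cps: int) -> float:
    """Total logical CPs across LPARs divided by physical CPs on the CEC."""
    if physical_cps <= 0:
        raise ValueError("physical_cps must be positive")
    return sum(logical_cps_by_lpar.values()) / physical_cps


def exceeds_rule_of_thumb(ratio: float, limit: float = 2.0) -> bool:
    """True when the ratio is above the 2:1 rule of thumb from the slide."""
    return ratio > limit


if __name__ == "__main__":
    # Made-up configuration: three LPARs sharing six physical CPs.
    lpars = {"PRD1": 8, "PRD2": 6, "TST1": 4}
    ratio = logical_to_physical_ratio(lpars, physical_cps=6)
    print(f"{ratio:.1f}:1", exceeds_rule_of_thumb(ratio))
```

A ratio above the guideline is only a prompt to look at the logical CP utilization and Online Processor drilldowns for the LPARs involved, not a finding in itself.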

Page 39: Smarter z/OS Infrastructure Management

39

4 Hr Rolling Avg and Capping (4 of 5)

• RNI by LPAR
‒ Also appears here (in addition to Processor Hardware focal point) because it is a critical metric for tuning
‒ Processor drilldown very helpful to show RNI impact of work executing on VMs & VLs (after filtering out zIIPs and parked VLs)

• % CPs Vert High – % Physical CPs Defined as Vertical Highs for CEC
‒ Workloads executing on Vertical Highs experience improved processor cache efficiency

Page 40: Smarter z/OS Infrastructure Management

40

4 Hr Rolling Avg and Capping (5 of 5)

• Polarity CEC - % CP Time Dispatched on Vertical Highs
‒ Work executing on Vertical Highs typically has a lower RNI and thus executes more efficiently
‒ Drilldowns by System and MIPS by Polarity

• Dispatch Pol. – Dispatched CP MIPS by LPAR by Polarity
‒ Shows MIPS executing on VHs, VMs and VLs for all LPARs
‒ Extremely helpful to identify opportunities for RNI tuning
‒ By Time and Changes drilldowns also very useful

• WLM Nodes – LPAR Topology from 99.14 records
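
As an illustration of the MIPS-by-polarity view, the share of work dispatched on each polarity can be derived from per-polarity MIPS totals. This is a hedged sketch: the `polarity_shares` function and the sample figures are hypothetical, and the product's own metric is described on the slide in terms of CP time rather than this exact calculation.

```python
from typing import Dict


def polarity_shares(mips_by_polarity: Dict[str, float]) -> Dict[str, float]:
    """Percentage of dispatched MIPS on each polarity (VH, VM, VL)."""
    total = sum(mips_by_polarity.values())
    if total == 0:
        return {polarity: 0.0 for polarity in mips_by_polarity}
    return {polarity: 100.0 * mips / total
            for polarity, mips in mips_by_polarity.items()}


if __name__ == "__main__":
    # Made-up MIPS for one LPAR in one interval.
    sample = {"VH": 600.0, "VM": 300.0, "VL": 100.0}
    for polarity, pct in polarity_shares(sample).items():
        print(f"{polarity}: {pct:.1f}%")
```

A low VH share for a busy LPAR is the kind of pattern the slide points to as an opportunity for RNI tuning.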

Page 41: Smarter z/OS Infrastructure Management

41

Page 42: Smarter z/OS Infrastructure Management

42

LPAR Mgmt and Capture Ratio (1 of 2)

• Improves reporting on PR/SM LPAR overhead

• Phys CP % - Unattributed LPAR Overhead for Physical Partition
‒ RMF collects overhead that cannot be attributed to a specific LPAR and reports in *PHYSICAL* LPAR
‒ Expressed as % of entire CEC

Page 43: Smarter z/OS Infrastructure Management

43

LPAR Mgmt and Capture Ratio (2 of 2)

• CP LPAR Mgmt % - Overhead assigned to an LPAR
‒ Expressed as % of that LPAR's total utilization (e.g., attributed LPAR overhead of 0.5% of CEC for an LPAR that consumed 10% of CEC would be 5% on this report)
‒ Capture Ratio drilldown compares LPAR Mgmt % vs. RMF capture ratio (typically an inverse relationship: lower LPAR Mgmt % correlates to higher RMF capture ratio)

• CP Capt % - CP Capture Ratio
‒ General purpose CPU time captured in RMF 72.3 (and assigned to service classes) as % of total general purpose CPU consumption per RMF 70, by system
‒ Drilldown to compare against LPAR Mgmt %
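
Both percentages on this slide are simple ratios, and the worked example (0.5% of CEC overhead for an LPAR using 10% of CEC giving 5%) can be reproduced directly. The function and parameter names below are hypothetical, and the CPU-seconds figures in the usage example are made up.

```python
def lpar_mgmt_pct(overhead_pct_of_cec: float, lpar_pct_of_cec: float) -> float:
    """LPAR management overhead as a % of that LPAR's total utilization."""
    if lpar_pct_of_cec <= 0:
        raise ValueError("lpar_pct_of_cec must be positive")
    return 100.0 * overhead_pct_of_cec / lpar_pct_of_cec


def capture_ratio_pct(captured_cpu_seconds: float,
                      total_cpu_seconds: float) -> float:
    """CPU time captured in service classes as a % of total CPU time.

    captured_cpu_seconds stands for the general purpose CPU time from
    RMF 72.3, total_cpu_seconds for the consumption per RMF 70.
    """
    if total_cpu_seconds <= 0:
        raise ValueError("total_cpu_seconds must be positive")
    return 100.0 * captured_cpu_seconds / total_cpu_seconds


if __name__ == "__main__":
    # The slide's own example: 0.5% of CEC overhead, LPAR at 10% of CEC.
    print(lpar_mgmt_pct(0.5, 10.0))
    # Made-up figures for one system and one interval.
    print(capture_ratio_pct(captured_cpu_seconds=810.0,
                            total_cpu_seconds=900.0))
```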

Page 44: Smarter z/OS Infrastructure Management

44

Processor Reporting – New Report Sets

• Tables and CP Usage

• 4 Hr Rolling Avg and Capping

• LPAR Mgmt and Capture Ratio

• Priority Raised – WLM raising priority of tasks holding resources other tasks are waiting for

• Mobile – New WLM capability to classify activity originating from mobile devices

Page 45: Smarter z/OS Infrastructure Management

45

Page 46: Smarter z/OS Infrastructure Management

46

Page 47: Smarter z/OS Infrastructure Management

47

8.8 z/OS Reporting Enhancements

• Application Groups

• Real and Virtual Storage

• Processor Reporting

Page 48: Smarter z/OS Infrastructure Management

48

How to Identify Unrealized MLC Opportunities

Page 49: Smarter z/OS Infrastructure Management

49

MLC Assessment Service

• FTP RMF/SMF data to IntelliMagic
• IntelliMagic will analyze and identify opportunities
• You get access to log on and explore your data