SmartCloud Monitoring and Capacity Planning
© 2013 IBM Corporation
SmartCloud Monitoring
Simon Coote
29 May 2013, Copenhagen
Agenda
Introduction
What is SmartCloud Monitoring?
Demonstration
– Health Dashboards
– Predictive Analytics
– Capacity Planning
– Reporting
Summary
Monitoring – Holistic view of performance and availability
Operating Systems
Applications
Hypervisors
Hardware
Transactions
SmartCloud Monitoring
What is SmartCloud Monitoring?
Health dashboards to provide an instant, consolidated glimpse into cloud health
Topology views of the key interrelated components of the cloud
Reports on the health trends of cloud components and workloads, powered by Cognos
What-If capacity planning scenarios
Policy-Based optimization to put workloads where they’ll perform best, not just where they’ll fit
Performance Analytics for right-sizing of virtual machines
Integration with industry-leading IBM service management portfolio
SmartCloud Monitoring Broad Hypervisor & Platform Support
• x86 Hypervisors: VMware, KVM (IBM, Red Hat), Citrix XenServer, Citrix XenApp, Citrix XenDesktop, Hyper-V
• IBM Power: monitoring of IBM PowerVM (CEC, VIOS, LPARs (AIX, Linux), HMC)
• IBM z/VM: monitoring of z/VM and Linux on System z
• Other Hypervisors: Solaris/Zones
• Transactions: transaction tracking, response time, robotic transactions
• Middleware: WebSphere, MQ, WebLogic, etc.
• Databases: DB2, Oracle, SQL, etc.
• Applications: SAP, Exchange, Lotus Notes, PeopleSoft, etc.
• OS: Windows, Linux, AIX, Linux on p, i5, z/OS, z/VM, Solaris, HP-UX
• Storage/Network: TPC (IBM Storage, EMC, Hitachi, NetApp), DFM (NetApp), Ethernet switches, etc.
• Server Platforms/OS: x86 (IBM and non-IBM), IBM Power 5, 6, 7, IBM System z, zEnterprise, Sun (SPARC), HP, Cisco UCS, etc.
• Integrate with ITM/ITCAM and other tools to extend scope beyond hypervisors
Monitor the Virtualization/Cloud Ecosystem
(Diagram: ITM, vCenter Server, and ESX/ESXi hosts)
Holistic Approach to Monitoring including storage, networking, hypervisor, etc.
NetApp Storage Agent:
– Provides Monitoring data in ITM
– Integrates into Health Dashboard
TADDM Integration
– TADDM DLA discovers the vCenter environment/topology
– TADDM provides change data to VMware Health Dashboard
IBM Director Integration
– ITM Agent provides integration with the Director Server
– Allows for Management of VMware resources
– Historical Collection of HW data
Tivoli Storage Productivity Center (TPC):
– Agent provides storage metrics in TEP
– Integrates into Health Dashboard
– Warehouse storage metrics for reporting and analysis
Network Monitoring Agent
– Monitor switches used by VMware
– Integrate Network Events into Dashboard
Consider adding application monitoring to the ecosystem
(Diagram: NetApp with NetApp Agent and DFM; TADDM; Health Dashboard; TPC covering IBM Storage, Hitachi, EMC, and NetApp; network switches; VI Agent; apps/middleware)
SmartCloud Monitoring 7.2 – What's new?
Additional VMware Metrics: − Orphaned VMDK files
Completely rewritten VMware Health Dashboard: − Lighter weight/Faster Response Time − More intuitive and easy to navigate − Fewer clicks to drill down to root cause
New DASH user interface: − Single user interface where multiple Tivoli products are integrated − Includes TCR 3.1, which includes Active Reporting − Self-service dashboarding capabilities: build dashboards using any ITM data and data from Tivoli Directory Integrator − Support for tablet devices
Improved VMware Capacity Planning: − Can save existing customization − Can do partial loads of the VMware environment − New VMware Expense Reduction Report and other reports − Improved benchmark matching − Evaluates CPU, Mem, Network I/O, Storage, and Storage Topology
SmartCloud Monitoring 7.2 – What's new?
Power Systems: − Capacity Planning: what-if scenarios, server sizing, etc. for Power Systems − Enhanced Power Systems agents, including consolidation of the UNIX OS Agent and the Premium AIX Agent
Enhancements to other hypervisors: − Other hypervisors, such as Citrix and Cisco UCS, have been enhanced
ITM 6.3 Enhancements: − OS Dashboard in DASH − OSLC Linked Data for integration with TBSM and other products − 64-bit TEPS that doubles the number of concurrent users and improves scale − Warehouse Range Partitioning, which can dramatically improve performance by eliminating the need to prune historical data − Authorization Policy Server, which restricts dashboard users' access to managed system groups and to individual agent managed systems; it grants role-based access control in addition to access control lists, making access control easier and safer, with role inheritance for scalable management
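The role-inheritance idea can be illustrated with a minimal sketch. It assumes a simple model in which each role is granted a set of managed system groups and also inherits the grants of its parent roles. The Role class, the group names, and the functions below are hypothetical and do not reflect the Authorization Policy Server's actual API or policy format.

```python
from dataclasses import dataclass, field

# Hypothetical model: a role holds direct grants on managed system groups
# and a list of parent roles whose grants it inherits.
@dataclass
class Role:
    name: str
    parents: list = field(default_factory=list)  # roles inherited from
    groups: set = field(default_factory=set)     # groups granted directly

def effective_groups(role, seen=None):
    """Collect groups granted to a role and, transitively, to its parents."""
    if seen is None:
        seen = set()
    if role.name in seen:  # guard against inheritance cycles
        return set()
    seen.add(role.name)
    result = set(role.groups)
    for parent in role.parents:
        result |= effective_groups(parent, seen)
    return result

def can_view(role, group):
    """True if the role may see the given managed system group."""
    return group in effective_groups(role)

operator = Role("operator", groups={"Linux_Servers"})
vmware_admin = Role("vmware_admin", parents=[operator], groups={"VMware_Clusters"})
```

Here vmware_admin can view Linux_Servers because it inherits from operator, while operator cannot view VMware_Clusters. Resolving inheritance at check time keeps each role definition small, which is the scalability benefit role inheritance is meant to provide.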
Health Dashboards
High-level VMware dashboard showing all clusters, events, and key KPIs
Click to drill down
Single Cluster view showing events and KPIs
Can be real-time or historical
Select any of the links below to go to Servers, VMs, or Datastores
Single Cluster view showing vSphere servers
Click on a link to drill down to a single server
Single Server showing historical data
Actions allow you to launch in context to TEP or TCR
Select any of the links below to go to Cluster, Servers, VMs, or Datastores
Configuration tab shows config data from TADDM
Breadcrumbs allow for easy navigation
Change History data from TADDM
Networking KPIs for the vSphere server
Select any of the links below to go to Cluster, VMs, or Datastores
Virtual Machine page showing real-time or historical KPIs and events
Link to OS Dashboard for the VM
OS Dashboard with metrics and Events from the OS Agent
Detailed OS Dashboard Page
VMware Datastores
Select to drill down
Single Datastore shows real-time or historical data and Events
Select any of the links below to go to connected Clusters, Servers, or VMs
Predictive Analytics
Dynamic Thresholding & Adaptive Monitoring
• View real-time and time-aligned historical data
• Analyze trends and distinguish trends from anomalies
• Use Avg, Max, Min, Percentile, Mode
• Monitors can be defined for shifts or adjusted for seasonal differences
Agents provide static monitoring thresholds
IBM Tivoli Monitoring provides static thresholds and Dynamic Thresholds/Adaptive Monitoring
The system learns the “normal” behaviour for a resource and sets the threshold based on historical data
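Here is a minimal sketch of how a threshold could be learned from historical data. It assumes a percentile rule applied separately for each hour of the day, so that regular daily peaks are not flagged as problems. The percentile choice, the per-hour grouping, and the function names are illustrative assumptions; the slides do not describe ITM's actual baselining algorithm.

```python
import statistics

def learn_threshold(history, percentile=95):
    """Return the given percentile of historical samples as the 'normal' upper bound."""
    # quantiles(n=100) returns 99 cut points; index p-1 is the p-th percentile.
    cuts = statistics.quantiles(history, n=100, method="inclusive")
    return cuts[percentile - 1]

def learn_hourly_thresholds(samples, percentile=95):
    """samples: list of (hour_of_day, value) pairs.

    Learns a separate threshold per hour, so a busy 09:00 peak is judged
    against other 09:00 samples rather than against quiet 03:00 samples.
    """
    by_hour = {}
    for hour, value in samples:
        by_hour.setdefault(hour, []).append(value)
    return {hour: learn_threshold(vals, percentile) for hour, vals in by_hour.items()}

def is_anomaly(thresholds, hour, value):
    """Flag a value that exceeds the learned threshold for its hour."""
    return value > thresholds[hour]
```

With ten samples from 10 to 19 at 03:00 and ten from 60 to 69 at 09:00, the learned thresholds are about 18.55 and 68.55, so a reading of 40 is an anomaly at 03:00 but normal at 09:00.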
Performance Analyzer: Predictive Trending
• Hands-off capacity monitoring
• Automates performance analysis and reporting
• Prediction of application bottlenecks
• Creation of alerts for potential service threats
• “What will my resources look like tomorrow, next week, next month, or next year?”
• “What IT resources should I worry about next?”
• “Will I have enough capacity to get me through Monday?”
Leverage collected data to spot trends and highlight emerging concerns
(Chart: metric over time, showing actual monitor data, the predicted trend, the threshold, and the predicted metric violation)
Key Capabilities of Performance Analyzer
Out-of-the-box analytics tasks for:
– OS Agents (CPU, Memory, Disk, Network)
– DB2 (Pool Read/Write, Sort Time, Memory, Tablespaces, etc.)
– Oracle (Cache Hits, Archive Space, Tablespaces, Transactions, etc.)
– VMware (Physical Server, VMs, Memory, CPU, Datastores, etc.)
– Power Systems (SEA, Network, CPU, Memory, I/O, Entitlement, etc.)
– Response Time (Web, Robotic, and Client Response Time)
Easily customized to analyze any numeric data
– Arithmetic Modules for calculating data and normalizing data
– Analytic Modules for performing predictive analytics
– Predict the following:
• How many days until I reach a Warning or Critical Threshold
• Predict the value 7, 30, 90 days into the future (customizable)
When defining Situations in Performance Analyzer, define the number of days' notice you need
– Take into account the statistical values such as the strength, number of data points, etc.
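The two predictions listed above, days until a threshold is reached and the value N days ahead, can be sketched with an ordinary least-squares trend line. This assumes a purely linear model over daily samples. The function names are hypothetical, and Performance Analyzer's own models and statistical checks (such as trend strength) are not reproduced here.

```python
def fit_linear(days, values):
    """Ordinary least-squares fit: value ≈ slope * day + intercept."""
    n = len(days)
    mean_x = sum(days) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(slope, intercept, day):
    """Value of the trend line on a given day (e.g. today + 7, 30, or 90)."""
    return slope * day + intercept

def days_until(slope, intercept, today, threshold):
    """Days from `today` until the trend crosses `threshold`; None if it never rises to it."""
    if slope <= 0:
        return None
    crossing_day = (threshold - intercept) / slope
    return max(0.0, crossing_day - today)
```

For a daily CPU series that starts at 40% and grows by 0.5 points a day, the sketch predicts 59.5% thirty days after day 9 and gives 71 days of notice before an 80% Warning threshold is crossed. A Situation could then be raised when that figure drops below the notice period you configured.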
Key Capabilities of Performance Analyzer
Select Agent Type
Select Attribute Group
Select Hourly, Daily, Weekly, etc.
Analyze all or a subset of your agents
Define Warning and Critical Thresholds
Define prediction durations
Performance Analyzer for VMware and Power Systems (CPU trends, disk utilization, memory, network) out of the box
For VMware, we recommend setting up Analytic Tasks for Cluster Percent Effective CPU Utilization, Cluster Percent Effective Memory Utilization, and CPU Percent Ready
Set up Analytic Tasks for other hypervisors (Hyper-V, KVM, Solaris Zones)
Linear and non-linear trending: the tool picks the model that best fits the data
Model Chosen
Time to Critical and Warning
Bar Chart represents the historical data and the line graph represents the statistical trend line
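The slides state that the tool picks whichever trend model best fits the data. Here is a minimal sketch of that idea. It assumes just two candidate models, linear and exponential (fitted by regressing log values), compared by sum of squared errors. The candidate set and the scoring rule are assumptions, not the product's documented method.

```python
import math

def ols(xs, ys):
    """Least-squares slope and intercept for ys ≈ slope * xs + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def sse(xs, ys, f):
    """Sum of squared errors of predictor f against the observed data."""
    return sum((y - f(x)) ** 2 for x, y in zip(xs, ys))

def best_model(xs, ys):
    """Fit linear and exponential trends; return (name, predictor) with the lower SSE."""
    b, a = ols(xs, ys)
    candidates = {"linear": lambda x: a + b * x}
    if all(y > 0 for y in ys):  # an exponential fit needs positive data
        k, c = ols(xs, [math.log(y) for y in ys])
        candidates["exponential"] = lambda x: math.exp(c + k * x)
    name = min(candidates, key=lambda m: sse(xs, ys, candidates[m]))
    return name, candidates[name]
```

Both candidates have two parameters, so comparing raw error is reasonable here. With models of differing complexity, a penalized score such as AIC would be a fairer basis for the choice.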
Capacity Planning
Why is Capacity Management Critical to Cloud Management?
Helps consolidate and reduce IT costs
– Reduces HW and labor costs
– Fewer physical servers needed
– Reduces hypervisor license costs
– Increases VM density to drive cloud ROI
– Predicts how many more customers/VMs can be serviced
Helps ensure application availability and reduce risk
– Are any resources overloaded? When will physical resources reach their limits?
– Have there been any significant changes in my environment recently?
– Identify trends to predict bottlenecks, or free up space and balance workloads
– Ensure supply can meet demand
– Ensure technical and business policies are met to reduce risk
Helps optimize resource utilization
– Right-size virtual machines and allocate based on usage; over-commit within known risk limits
– Pack VMs onto the infrastructure to optimize resources
Capacity Planning & Analytics - Architecture
Data is copied or federated from Servers, Hypervisors, Storage, Network, and …Platforms into the Capacity Planning Database
Workload Characterization
– Establish patterns using historical data
– Capture workload attributes to enable optimization policies
Inputs: usage profiles and workload relationships; benchmarking data; custom tags that enhance config profiles and workload relationships; business and technical policies
Optimization Engine to size and place VMs
Plan Recommendation (minimize systems, license, balance)
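The optimization engine's job of placing VMs on as few systems as possible can be approximated with first-fit decreasing bin packing. This standard heuristic is used here only as a stand-in, since the slides do not describe the actual engine or how it weighs licensing and balance. The sketch assumes two resources, CPU and memory, and a fixed headroom fraction reserved on every host. All names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Host:
    name: str
    cpu: float                 # CPU capacity (any consistent unit)
    mem: float                 # memory capacity in GB
    vms: list = field(default_factory=list)
    used_cpu: float = 0.0
    used_mem: float = 0.0

def place(vms, hosts, headroom=0.2):
    """First-fit decreasing placement.

    vms: list of (name, cpu, mem) demands. Each VM goes on the first host
    that can take it while keeping `headroom` of CPU and memory unused.
    Returns the names of VMs that fit on no host.
    """
    max_cpu = max(h.cpu for h in hosts)
    max_mem = max(h.mem for h in hosts)
    # Place the "largest" VMs first, sizing each by the bigger of its
    # CPU and memory demand relative to the largest host.
    ordered = sorted(vms, key=lambda v: max(v[1] / max_cpu, v[2] / max_mem),
                     reverse=True)
    unplaced = []
    for name, cpu, mem in ordered:
        for h in hosts:
            if (h.used_cpu + cpu <= h.cpu * (1 - headroom)
                    and h.used_mem + mem <= h.mem * (1 - headroom)):
                h.vms.append(name)
                h.used_cpu += cpu
                h.used_mem += mem
                break
        else:
            unplaced.append(name)
    return unplaced
```

Hosts left empty after placement are candidates for elimination or reallocation. A real planner would also weigh balance and license costs, which this greedy sketch ignores.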
The Case for Holistic Capacity Analytics
Unbalanced
Can’t move workload to this cluster because it’s almost out of datastore space
On one screen, I can check all of the key resources to see if my workload is balanced
Unbalanced across clusters and within a cluster
Need to look at all key attributes to look for bottlenecks and imbalance
Disk I/O Metrics also available
SmartCloud Monitoring Capacity Planning Center
Planning Centre – applying parameters
Policy Driven Capacity Planning
Opens a new tab with pre-built rules or a rule editor for custom rules
Choose rules for “what-if” scenario
Out-of-the-box rules
• Colocation/Anti-colocation
– Place Win2003 32-bit on the same ESX server
• Boundary Rules
– Place Win2003 32-bit on the same ESX server
• Utilization Rules
– Provide 20% growth for key business applications
Create custom rules
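The out-of-the-box rule types can be thought of as checks on a proposed placement. Here is a minimal sketch. It assumes colocation and anti-colocation rules are expressed as VM pairs, and a utilization rule is expressed as a growth fraction reserved on top of current CPU demand (for example 0.20 for the 20% growth rule). The function, data shapes, and VM names are hypothetical.

```python
def violations(placement, demand, capacity, colocate=(), separate=(), growth=None):
    """Check a proposed placement against simple what-if rules.

    placement: {vm: host}; demand: {vm: cpu}; capacity: {host: cpu}
    colocate:  VM pairs that must share a host
    separate:  VM pairs that must not share a host
    growth:    {vm: fraction} of extra demand to reserve, e.g. {"erp": 0.20}
    Returns a list of human-readable rule violations.
    """
    growth = growth or {}
    problems = []
    for a, b in colocate:
        if placement[a] != placement[b]:
            problems.append(f"colocation: {a} and {b} are on different hosts")
    for a, b in separate:
        if placement[a] == placement[b]:
            problems.append(f"anti-colocation: {a} and {b} share {placement[a]}")
    # Sum each host's demand, inflated by any reserved growth.
    load = {}
    for vm, host in placement.items():
        load[host] = load.get(host, 0.0) + demand[vm] * (1 + growth.get(vm, 0.0))
    for host, used in load.items():
        if used > capacity[host]:
            problems.append(f"utilization: {host} needs {used:.1f} of {capacity[host]:.1f}")
    return problems
```

A what-if scenario then becomes a loop of proposing placements and keeping only those for which the list comes back empty.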
SPECint data used for analysis
SPEC data provides granular capacity planning
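Benchmark data matters because raw utilization percentages cannot be compared directly across different servers. Here is a minimal sketch of the arithmetic. It assumes each server has a single SPECint-style throughput score and that CPU work scales linearly with that score, which is a simplification. The scores and function names are hypothetical.

```python
import math

def to_benchmark_units(cpu_util_pct, host_score):
    """CPU work in benchmark units: utilization share times the host's score."""
    return cpu_util_pct / 100.0 * host_score

def projected_util_pct(units, target_score):
    """Utilization the same work would produce on a host with `target_score`."""
    return units / target_score * 100.0

def min_hosts_needed(units, target_score, headroom=0.2):
    """Lower bound on target hosts: total work divided by usable capacity per host.

    Ignores packing effects and memory, so the real answer can be higher.
    """
    usable = target_score * (1 - headroom)
    return math.ceil(sum(units) / usable)
```

A VM using 40% of a server scored 200 represents 80 units of work, which would be only about 10% of a server scored 800.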
Reduce from 9 ESX servers down to 4 while lowering memory and CPU utilization
Utilization projections accommodated growth
Started with 9 ESX servers and 24 VMs
The tool always includes headroom to ensure you don’t run out of capacity
Capacity Planning Results
Recommendations to optimize workloads; reduce risk while eliminating or reallocating servers
Utilization after optimization
Headroom
Reduce both CPU and Memory Utilization reservations to free up resources
ROI Case Study – IBM Test & Development Cloud
Cluster consisting of 18 servers and 1802 virtual machines
Goal: Analyze an existing production virtual environment in search of further optimization, and show ROI using management and capacity planning
Solution: Used IBM SmartCloud Monitoring to analyze the current environment and perform “what-if” analysis
Results: The more optimized environment uses fewer physical servers, which yields savings in hardware, administrator/support, energy, data center floor space, and license costs, resulting in an additional ROI of 14.4% over a year and the ability to accommodate an additional 113 virtual machines.
Optimization of an IBM internal development and test cloud using IBM SmartCloud Monitoring results in an additional ROI of 14.4%
“In order to realize true cost savings from a virtualization or cloud investment, customers need to be able to run virtual machines densely enough to maximize consolidation, yet be assured that their workloads are still running as well as they were before being virtualized, with room for expansion.”
14.4% Annual ROI
VMware Expense Reduction Report - Example
SmartCloud Monitoring
Virtual Machines | Storage | Networks
Provides greater visibility into cloud health
• Track cloud service levels & performance, and predict cloud problems before clients are impacted
• Understand performance and capacity today, and know what it will look like months from now
Lowers total cost of operations
• Optimize workload placement to wring maximum capacity and performance out of your cloud investment
• Freedom from expensive hypervisor or OS lock-in with a heterogeneous cloud infrastructure monitoring solution
Optimizes cloud performance
• Built-in performance analytics for right-sizing of virtual machines and resource optimization in the cloud
• Real-time proactive & predictive alerts help identify and fix problems quickly
Health Dashboards | Capacity Analytics | Performance Optimization
Increased Density
Reduced Risk
Minimized Outages
Optimized Workload Placement
Improved Service Levels
Years of experience managing mission-critical workloads in the world’s largest enterprises yield peerless best-practices advice
IBM SmartCloud Monitoring: Optimize your cloud performance and maximize ROI
Kiitos! Tack! Thank you! Takk! Tak! (“Thank you” in Finnish, Swedish, English, Norwegian, and Danish)