2 © 2014 CA. ALL RIGHTS RESERVED.
Agenda
Why so many metrics with APM?
– “Big Data”?
What we are learning with CA-ABA (analytics)
How to find KPIs
What’s new for CA-APM 9.6 Release
Typical APM Cluster
Dozens to hundreds of applications
– 2800 JVMs/CLRs
Up to 5M metrics, every 15 seconds
Large applications span multiple data centers
– 2-8 APM clusters, typical
– 30-70 EM Collectors for a nationwide portal application
12M to 28M metrics, every 15 seconds
… certainly sounds like big data!!!
What is Big Data???
APM information is “big”… but it is not “big data” without enrichment
[Figure: 5M metrics that you don’t fully understand, OR the same 5M metrics combined with ENRICHMENT sources (Trouble Management, Version Control, Time of ____ Constraints, Air Traffic Advisories, Weather Forecast, AP News Updates, Marketing Campaigns) to yield Correlation, Trends, Insights, and Anomalies]
Challenges for Big Data
Data Variety – different sources give different perspectives. Does your data have a significant perspective?
Validation – is the data source meaningful/predictive?
Consistency – are the values trustworthy?
Data Structure and Nomenclature – Mapping, Transformation
Temporal Impedance Mismatch
– APM: real-time with 15 second reporting interval
– Trouble Management: +15-30 minutes later
– Stock Ticker: +15-30 minutes later
– Air Traffic Advisories: +30-60 minutes later
– Version Control: days to weeks in advance
– Marketing Campaign Assessment: 2-4 weeks later
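One way to picture the mismatch is to ask which APM reporting intervals an external event could refer to. The following is a minimal sketch only: the lag ranges and the 15-second interval come from the list above, while the function names, the dictionary keys, and the ticket timestamp are hypothetical and chosen for illustration.

```python
from datetime import datetime, timedelta

APM_INTERVAL = timedelta(seconds=15)  # APM reporting interval from the slide

# Reporting lag (min, max) for sources that report after the fact,
# taken from the list above. The keys are hypothetical labels.
REPORTING_LAG = {
    "trouble_management": (timedelta(minutes=15), timedelta(minutes=30)),
    "stock_ticker": (timedelta(minutes=15), timedelta(minutes=30)),
    "air_traffic_advisories": (timedelta(minutes=30), timedelta(minutes=60)),
}


def candidate_window(source, reported_at):
    """Return the (start, end) span of APM time an external event may refer to."""
    min_lag, max_lag = REPORTING_LAG[source]
    return reported_at - max_lag, reported_at - min_lag


def intervals_in_window(start, end):
    """Count the 15-second APM reporting intervals covered by the window."""
    return int((end - start) / APM_INTERVAL)


# Hypothetical trouble ticket opened at noon.
ticket_time = datetime(2014, 6, 1, 12, 0, 0)
start, end = candidate_window("trouble_management", ticket_time)
print(start.time(), end.time(), intervals_in_window(start, end))
```

Under these assumptions, a single ticket maps to a 15-minute span of APM data, i.e. 60 reporting intervals per metric, which is why correlation across sources needs a window rather than a timestamp match.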
KPI Management Maturity
SGCM: Stalls, GC Settings, Concurrency, Memory Management Trends
APC: Availability, Performance, Capacity
EKB: Errors, Key Resource Performance, Business Transaction Survey
[Figure: VALUE plotted against KPI MATURITY, with stages labeled (Platform), (Application), (Transaction)]
What We are Learning with CA-ABA
ABA Logical Architecture
[Diagram: APM Cluster (5M Metrics) → 100k Metrics (via RegEx) → Anomaly Engine → Anomalies, Alerts]
Why only 100k Metrics??? Why not 5M???
RegEx == Regular Expression
analytics.metricfeed.process.3 =
Custom Metric Host (Virtual) \\|Custom Metric Process (Virtual)\\|Custom Business Application Agent (Virtual)
analytics.metricfeed.metric.3 =
By Business Service\\|[^|]+\\|[^|]+\\|[^|]+:.+
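The metric expression above can be checked offline against a handful of metric paths before it is deployed. This is a sketch under stated assumptions: it assumes the settings live in a Java-style properties file (where `\\|` loads as the regex `\|`, a literal pipe), it assumes the whole metric path must match, and the sample paths are invented for illustration rather than taken from a real cluster.

```python
import re

# Assumption: Java-properties escaping doubles the backslash, so "\\|" in the
# file becomes the regex "\|" (a literal pipe) once loaded.
METRIC_FEED_3 = r"By Business Service\|[^|]+\|[^|]+\|[^|]+:.+"
pattern = re.compile(METRIC_FEED_3)

# Hypothetical metric paths, invented for this check.
samples = [
    "By Business Service|Checkout|Place Order|Browser:Average Response Time (ms)",
    "By Business Service|Checkout|Place Order:Average Response Time (ms)",
    "Frontends|Apps|Checkout:Average Response Time (ms)",
]


def validate(regex, paths):
    """Map each path to True if the whole path matches the expression."""
    return {path: regex.fullmatch(path) is not None for path in paths}


results = validate(pattern, samples)
for path, matched in results.items():
    print("MATCH   " if matched else "NO MATCH", path)
```

Only the first sample has the three pipe-separated segments after "By Business Service" that the expression requires, so it is the only one expected to match.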
RegEx is hard… but easy to validate
[Chart: metricfeed.3]
Broader collection of metrics but only 87/500 == 17.4% are generally known as useful
Suspects Identified via Baseline Technique
[Chart: Suspects via Baseline Techniques, Average RT only. Categories: SiteMinder, Backends, JSP, Frontends, JMX, Custom]
100% Useful metrics, ready for validation: 47/43625 == 0.1%
Metric Count TypeView
What is an Application?
Front-ends
– Browser? Webservice? Messaging?
Back-ends
– Databases, Webservices, Messaging, Mainframes, Trading_Partners
Muck-in-the-Middle
– Software quality, stability and scalability
We want to identify KPIs for each of these elements
– helps us build a useful dashboard for Operations
– helps expose what the resources are really doing
– helps us define acceptance criteria, to act proactively
– helps us to triage really effectively
How to Find KPIs
Capacity KPIs – “Tree Rings”
Performance KPIs
High Volume + Significant Response Time
Create a Simple Alert and Threshold (ConnectionStatus)
Create a Simple Alert, Find Restart and threshold (MetricCount)
“UP” – but not actually doing anything!!!
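The two alerts above can be combined into one availability state, so that an agent that is connected but reporting almost nothing is not counted as healthy. This is a minimal sketch: the `AgentSample` type, the state names, the agent names, and the threshold of 50 live metrics are all hypothetical, and nothing here uses a real CA APM API.

```python
from dataclasses import dataclass


@dataclass
class AgentSample:
    name: str
    connected: bool    # stands in for the agent ConnectionStatus metric
    metric_count: int  # stands in for the number of live metrics


def availability(sample, min_metric_count):
    """Classify an agent using both connection status and metric count."""
    if not sample.connected:
        return "DOWN"
    if sample.metric_count < min_metric_count:
        # Connected, but not actually doing anything.
        return "UP_BUT_IDLE"
    return "UP"


# Hypothetical agents and threshold, for illustration only.
agents = [
    AgentSample("portal-01", connected=True, metric_count=1800),
    AgentSample("portal-02", connected=True, metric_count=12),
    AgentSample("portal-03", connected=False, metric_count=0),
]
states = {agent.name: availability(agent, min_metric_count=50) for agent in agents}
print(states)
```

In practice the metric-count threshold would come from the agent's own baseline rather than a fixed number.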
Understanding Your Environment
Identify the KPIs
– Availability: Agent ConnectionStatus, Number of Live Metrics (Metric Count)
– Performance: High Volume components with significant response time
– NOT “Top 10 Response Time”
– Capacity: Highest Volume Components
Don’t Wait for Production!!!
– Make it part of your pre-production review
– Manage the application lifecycle by trending KPIs
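The performance and capacity rules above (high volume with significant response time, rather than a “Top 10 Response Time” list) can be sketched as a simple filter. Everything specific here is an assumption made for illustration: the `Component` type, the component names and numbers, the thresholds, and the choice to rank performance candidates by volume times response time.

```python
from typing import NamedTuple


class Component(NamedTuple):
    name: str
    invocations: int  # responses per interval
    avg_rt_ms: float  # average response time in milliseconds


def performance_candidates(components, min_invocations, min_rt_ms):
    """High-volume components that also have a significant response time."""
    picked = [
        c for c in components
        if c.invocations >= min_invocations and c.avg_rt_ms >= min_rt_ms
    ]
    # Ranking by total time spent is an illustrative choice, not a rule from the slides.
    return sorted(picked, key=lambda c: c.invocations * c.avg_rt_ms, reverse=True)


def capacity_candidates(components, top_n):
    """Highest-volume components, regardless of response time."""
    return sorted(components, key=lambda c: c.invocations, reverse=True)[:top_n]


# Hypothetical data for one reporting period.
components = [
    Component("ReportServlet", 3, 9000.0),   # slowest, but rarely called
    Component("LoginServlet", 5000, 450.0),
    Component("StaticContent", 20000, 4.0),  # busiest, but trivially fast
    Component("OrderService", 1200, 800.0),
]
perf = [c.name for c in performance_candidates(components, 1000, 100.0)]
cap = [c.name for c in capacity_candidates(components, 2)]
print("Performance KPIs:", perf)
print("Capacity KPIs:", cap)
```

In this sample the slowest component (ReportServlet) is excluded from the performance candidates because it is rarely invoked, which is the difference from a plain top-N response-time list.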
Good                      | Better (additional)             | Best (additional)
Stalls                    | Availability – Connected Status | Errors
GC Settings               | Availability – Metric Count     | Key Resource Performance
Concurrency               | Suspect Performance             | Business Transaction Survey
Memory Management (graph) | Suspect Capacity                |
Platform: Coarse information... but not really APM
Application, Transactions, Resources: The APM Advantage
KPI Evolution
What’s New in CA APM 9.6
Simplified, automated, and built on CA APM strengths.
Faster, Easier APM
• Intelligent Deep Transaction Trace is now dynamic and automated, and requires less developer involvement for deep dives into the apps supporting the transactions
• Simplified triage with easier drill-down in the Application Triage Map, including Socket Grouping
• Improved response times with software-based Transaction Impact Monitor (end-user experience)
• Expanding APM's scope with Java 7 EM & Agents
Seamless Mainframe Awareness
• Increased insight by adding DB2 details to transaction traces
• Greater awareness with CA SYSVIEW MQ alerts & complete status in APM
• Driving further cross-enterprise depth with CTG traces to fully expand backend calls
• Other mainframe-based enhancements
Preparing to Upgrade
HealthCheck the existing cluster prior to any upgrade
Good:
– Do a clean install of the APM Cluster, alongside the existing cluster version
– Manually duplicate management modules, domains.xml, etc.
– Bring down the old version, then bring up the new
Better:
– Install the new version in a separate environment, reduced size
– Migrate a few applications to the new environment for validation
– Upgrade the primary environment after validation is achieved
Best:
– Install a new GOLD environment in production, separate from the original cluster
– Migrate agents, as schedules permit, until the original cluster can be decommissioned
– This provides an opportunity to introduce pre-production review and generally correct any bad deployment habits
Resources
APM Community Site (https://communities.ca.com/web/ca-wily-global-user-community)
– Cookbook: APM HealthCheck
– Understanding Which Metrics Matter (KPI discussion)
– Cookbook: Application Audit
  - more details on the baseline techniques and process
APM best practices – Realizing Application Performance Management
– available on Amazon.com and Apress.com
  - Baselines, Test Plans, App Audits, Triage, Firefighting
  - Organizational Models, Service Catalogs
APM Web Page: Ca.com/apm