The Northwestern Mutual Life Insurance Company – Milwaukee, WI
Application Monitoring
Jeremy Kalsow
Why Application Monitoring
• Majority of all corporations
• Northwestern Mutual
• Total 1,000+ servers
• Team is 6 people
• Team uses 16 servers
• Average 50 applications per server
• Need a way to know status fast
What is it?
• The ability to monitor performance and availability
• Gather metrics
• Show trends
• Pretty pictures for management
Why?
• Trends predict future problems
• Solve application issues faster
• Uptime relates directly to profit for many companies
• View all applications, servers, databases and other items being monitored with a single dashboard
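As a rough illustration of how a trend can predict a future problem, the sketch below fits a straight line to daily disk-usage samples and estimates when the disk would fill. The function name, the sample data, and the assumption of linear growth are hypothetical and are not taken from the presentation.

```python
from typing import Optional, Sequence


def days_until_full(used_gb: Sequence[float], capacity_gb: float) -> Optional[float]:
    """Estimate days until a disk fills, given one usage sample per day.

    Fits a least-squares line through the samples (an assumed linear trend)
    and extrapolates from the most recent day. Returns None when usage is
    flat or shrinking, since no fill date can be predicted.
    """
    n = len(used_gb)
    if n < 2:
        raise ValueError("need at least two samples to see a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(used_gb) / n
    slope_num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(used_gb))
    slope_den = sum((x - mean_x) ** 2 for x in range(n))
    slope = slope_num / slope_den  # growth in GB per day
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    day_full = (capacity_gb - intercept) / slope
    # Count from the last sample, never returning a negative number of days.
    return max(0.0, day_full - (n - 1))


# Hypothetical samples: usage grows 10 GB per day on a 500 GB disk.
print(days_until_full([400, 410, 420, 430], capacity_gb=500))  # prints 7.0
```

A real deployment would likely use more samples and a more robust model; the point is only that stored metrics make this kind of early warning possible.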
Types of Monitoring
• Fault
• Performance
• Configuration
• Security
• Accounting
Fault
• Detects major errors
• Easy to implement
• Examples
  – Network loss
  – Database connectivity
• Very Important
Fault
Area         | Type of Monitoring    | What to Monitor      | When to Monitor
Hardware     | CPU utilization       | CPU load             | Load > 99% for x minutes
Hardware     | Memory utilization    | Memory load          | Load > 99% for x minutes
Storage      | System                | Available space      | System out of space
Applications | Application available | Application working  | Working or error
Applications | Application logs      | Error log monitoring | If error occurred
Databases    | Database online       | Database is online   | Database is up/down
Network      | Latency               | Latency              | Latency > acceptable range
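The "Load > 99% for x minutes" rule in the table above can be sketched as a sustained-threshold check: a single spike is ignored, and the alert fires only when the value stays above the limit for the whole window. The function name, the sample format, and the readings below are assumptions made for illustration.

```python
from typing import Iterable, Tuple


def sustained_breach(
    samples: Iterable[Tuple[float, float]],
    threshold: float,
    min_seconds: float,
) -> bool:
    """Return True if the value stays above `threshold` for at least
    `min_seconds` without interruption.

    `samples` is an iterable of (timestamp_seconds, value) pairs in
    chronological order.
    """
    breach_start = None
    for timestamp, value in samples:
        if value > threshold:
            if breach_start is None:
                breach_start = timestamp
            if timestamp - breach_start >= min_seconds:
                return True
        else:
            breach_start = None  # the streak was broken; start over
    return False


# Hypothetical CPU readings, one per minute: (seconds, percent busy).
cpu = [(0, 50.0), (60, 99.5), (120, 99.8), (180, 99.9), (240, 100.0)]
print(sustained_breach(cpu, threshold=99.0, min_seconds=180))  # prints True
```

The same function would apply to the memory and latency rows by changing the threshold and window.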
Performance
• Slow performance
• Service Level Agreements
• Metrics
• Old and new metrics
• Visual display
Performance
http://www.ibm.com/developerworks/websphere/library/techarticles/0304_polozoff/polozoff.html
Configuration
• Configuration variables
• Connectivity
• Speed
• Performance
• Proactive
• Servers and Applications
Configuration
• Why would the configuration change?
• Hardware
• Storage
• Service packs
• Hot fixes
• Windows Updates
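One way to notice that a configuration has changed is to compare a stored snapshot against the current one. The sketch below assumes the configuration has already been collected into a dictionary; the keys and values shown are hypothetical examples, not a real inventory format.

```python
from typing import Any, Dict, Tuple


def diff_config(
    baseline: Dict[str, Any], current: Dict[str, Any]
) -> Dict[str, Tuple[Any, Any]]:
    """Compare two configuration snapshots.

    Returns {key: (old_value, new_value)} for every key that was added,
    removed, or changed. A missing value is reported as None.
    """
    changes = {}
    for key in sorted(baseline.keys() | current.keys()):
        old, new = baseline.get(key), current.get(key)
        if old != new:
            changes[key] = (old, new)
    return changes


# Hypothetical snapshots taken before and after a maintenance window.
before = {"service_pack": "SP1", "disk_gb": 500, "hotfixes": ["KB-A"]}
after = {"service_pack": "SP2", "disk_gb": 500, "hotfixes": ["KB-A", "KB-B"]}
for key, (old, new) in diff_config(before, after).items():
    print(f"{key}: {old!r} -> {new!r}")
```

Running the comparison on a schedule turns an unexpected service pack or hot fix into an alert instead of a surprise.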
Security
• Attempts to access the system
• Open ports
• Inventories
• Firewall
• Packets
• System events
• Blocked exploits
Accounting
• Monitors usage
• Generally used for fees
• Profit/Loss
• Examples
  – Electric company
  – Northwestern Mutual
Types of Monitoring Recap
• Fault
• Performance
• Configuration
• Security
• Accounting
Types of Monitoring Recap
• Historical data
• Baseline test
• Current test
• Performance disagreements
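A baseline test and a current test can be compared mechanically so that a performance disagreement is settled by numbers. The following sketch assumes response times in milliseconds and a 10% tolerance; both the function names and the tolerance are illustrative choices.

```python
from statistics import mean
from typing import Sequence


def regression_ratio(baseline_ms: Sequence[float], current_ms: Sequence[float]) -> float:
    """Return the current mean response time divided by the baseline mean."""
    return mean(current_ms) / mean(baseline_ms)


def is_regression(
    baseline_ms: Sequence[float],
    current_ms: Sequence[float],
    tolerance: float = 0.10,  # assumed acceptable slowdown: 10%
) -> bool:
    """Flag the current test when it is more than `tolerance` slower."""
    return regression_ratio(baseline_ms, current_ms) > 1.0 + tolerance


# Hypothetical response times from two test runs.
baseline = [200, 210, 190]
current = [250, 260, 240]
print(regression_ratio(baseline, current))  # prints 1.25
print(is_regression(baseline, current))     # prints True
```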
Types of Monitoring Recap
• Allows for trends to be seen
• Modifications can be made
• Trends over multiple releases
Types of Monitoring Recap
• Monitoring is important
• Not enough time is given
• Implemented after discovery of an issue
• Monitoring only in areas of known problems
• Adding monitoring requires time and money
Challenges of Application Monitoring
• Various types of systems
• Shared
• Clustered
• Virtualized
• Production logging
Shared Systems
• 1 server / multiple applications
• System resources are shared
• Tracking individual usage is difficult
• Many applications may be impacted
• Server without access (production)
Clustered Systems
• Applications on more than one server
• Avoid single point of failure
• May be hard to target the issue
Production Logging
• Generally limited
• Most errors repeated in test
• Application downtime
• Use of company resources
Implement Application Monitoring
• Plan early
• Monitor proactively
• Create a recovery plan
• Create and use SLAs
Plan Early
• Planning stage
• Add monitoring during development
• Late additions cover known issues
Monitor Proactively
• Harder to implement
• Issues are dealt with before the end user knows
Monitor Proactively
• Tools-based approach
• Easy and relatively fast setup
• No code
• Multiple applications
Monitor Proactively
• Logging is directly in the code
• Less efficient
• More specific
• Developers have less time
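To make the code-based approach concrete, here is a minimal sketch of logging placed directly in the application. It uses Python's standard `logging` module; the decorator name, the logger name, and the `lookup_policy` function are hypothetical.

```python
import functools
import logging
import time

logger = logging.getLogger("app.monitoring")  # hypothetical logger name


def monitored(func):
    """Log the duration of each call, and log the traceback when it fails."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        except Exception:
            logger.exception("%s failed", func.__name__)
            raise  # monitoring must not hide the error from the caller
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000
            logger.info("%s took %.1f ms", func.__name__, elapsed_ms)

    return wrapper


@monitored
def lookup_policy(policy_id: str) -> dict:  # hypothetical application function
    return {"id": policy_id, "status": "active"}


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    lookup_policy("P-1001")
```

This shows the trade-off from the slide: the log entries are specific to each function, but every monitored function has to be touched by a developer.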
Create a Recovery Plan
• Fast resolution
• Knowledge management
Recovery Plan Template
Service Level Agreements
• What percentage of time the services will be up (uptime)
• How many people can use the application at once without performance issues
• Performance metrics and benchmarks to be used with performance monitoring alerts
• The rules for notification announcements
• What statistics will be monitored, and when and where they will be available
• Acceptable response time
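An uptime percentage in an SLA translates directly into a downtime budget. The sketch below does that arithmetic; the 30-day period and the 99.9% target are example values, not figures from the presentation.

```python
def allowed_downtime_minutes(uptime_percent: float, period_days: int = 30) -> float:
    """Return how many minutes of downtime an uptime target permits."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    total_minutes = period_days * 24 * 60
    return total_minutes * (100 - uptime_percent) / 100


def measured_uptime_percent(downtime_minutes: float, period_days: int = 30) -> float:
    """Return the uptime percentage actually achieved over the period."""
    total_minutes = period_days * 24 * 60
    return 100 * (total_minutes - downtime_minutes) / total_minutes


# Example: a 99.9% target over 30 days leaves about 43 minutes of downtime.
budget = allowed_downtime_minutes(99.9)
print(round(budget, 1))

# Example: 90 minutes of outage in the same period misses that target.
print(measured_uptime_percent(90) >= 99.9)
```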
Service Level Agreements
Using the Statistics
• Visual display
• Alerts
• Tickets
Visual (Dashboard)
• Easily view statistics
• Comparison results
• Trend comparison
• Cross-platform
• Auto-generated management reports
Dashboard
Alerts and Tickets
• Auto-generated alerts
• Tickets for queue system
• Vital information in each
Alerts and Tickets
• Most common: email
• Text, popup, printout, recording and more
• Tickets: auto-generated
• Knowledge databases
• Common fixes and resolutions
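An auto-generated ticket can carry the vital information and a common fix from a knowledge database. The sketch below is one possible shape for that; the `Ticket` fields, the `KNOWN_FIXES` entries, and the server and application names are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, Optional


@dataclass
class Ticket:
    server: str
    application: str
    check: str
    detail: str
    suggested_fix: Optional[str]
    opened_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


# Hypothetical knowledge database mapping a failed check to a common fix.
KNOWN_FIXES: Dict[str, str] = {
    "database_online": "Restart the database service and verify the connection string.",
    "disk_space": "Archive or purge old log files.",
}


def open_ticket(server: str, application: str, check: str, detail: str) -> Ticket:
    """Create a ticket for a failed check, attaching a known fix if one exists."""
    return Ticket(server, application, check, detail, KNOWN_FIXES.get(check))


def format_alert(ticket: Ticket) -> str:
    """Render the vital information for an email or text alert."""
    lines = [
        f"[{ticket.check}] {ticket.application} on {ticket.server}",
        f"Detail: {ticket.detail}",
        f"Opened: {ticket.opened_at.isoformat(timespec='seconds')}",
    ]
    if ticket.suggested_fix:
        lines.append(f"Common fix: {ticket.suggested_fix}")
    return "\n".join(lines)


print(format_alert(open_ticket("srv01", "claims", "disk_space", "0 GB free")))
```

Delivering the text by email, or pushing the ticket into a queue system, is left out because it depends on the tools in use.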
Application Monitoring
• Maximize application uptime
• Higher end user satisfaction
• Higher profit
References
• Polozoff, A. (2003, April 9). Proactive Application Monitoring. IBM - United States. Retrieved October 20, 2011, from http://www.ibm.com/developerworks/websphere/library/techarticles/0304_polozoff/polozoff.html
• Choice. (2009, December 20). Application Monitoring. Adminschoice - Unix Made Easy. Retrieved October 31, 2011, from http://adminschoice.com/application-monitoring
• Application Monitoring Software - uptime software. (n.d.). Server Monitoring Software - IT Systems Management, Capacity Planning, Application and Server Monitoring Tool by uptime software. Retrieved October 31, 2011, from http://www.uptimesoftware.com/application-monitoring.php
• Marko, K. (2005, December 30). Proactive Application Monitoring. Processor.com: Data Center IT Equipment at Processor, Routers, Storage, Rackmount Servers, Computer Room Cabling and Flooring. Retrieved October 29, 2011, from http://www.processor.com/editorial/article.asp?article=articles%2Fp2752%2F43p52%2F43p52.asp
• IT Service Level Agreement Templates | ContinuityPlanTemplates. (n.d.). ContinuityPlanTemplates | Free Business Continuity Plan (BCP) Templates. Retrieved October 30, 2011, from http://www.continuityplantemplates.com/it-service-level-agreement-templates
XML
Upcoming events with Dashboard
• Ability to display visualized graphs and other pertinent information
• Ability to click a failed component and have the system auto-generate a ticket
• Ability to alert others of the issue found
• Performance monitoring as well as fault