© 2013 Cloud Technology Partners, Inc. / Confidential
Cloud Technology Partners / April 2014 / www.cloudtp.com
Monitoring in the DevOps Era
About the Presenter
Mike Kavis
VP/Principal Architect @ Cloud Technology Partners
@madgreek65
mikekavis
The Virtualization Practice
DevOps.com
Topics of Discussion
1. Service Centric Ops
2. Logging Strategies
3. Monitoring Strategies
Service Centric Ops
What needs to Change?
Shift thinking away from product-centric to service-centric
Operating a Service 24x7x365
Shipping Product
What needs to Change?
Traditional Challenge – Dev needs speed, Ops needs control
Dev: Speed, APIs
Ops: Security, Compliance, Availability, Auditing
The Great Balancing Act
What needs to Change?
Shift thinking away from product-centric to service-centric
Old Way | New Way
Software is built and shipped | Services are running and managed
Development of features is done | Services are never done until they are turned off
Product owner focuses only on features | Product owner owns operational results along with the product feature set
Each silo owns its own area | All groups focus on end user satisfaction
Dev must go through Ops to get work done | Ops enables Dev to get work done
Ops monitors Apps | Ops provides Dev with tools to operate Apps
Reactive monitoring/Ops | Proactive monitoring/Ops
Dev, Ops, Security and Product owners must work together throughout the SDLC and share responsibility for the overall quality and reliability of the services
What needs to Change?
Whoever prioritizes the backlog must be accountable for reliability and quality, not just speed to market
Don’t be a crash test dummy
Speed to market should not negatively impact customer satisfaction!
Logging Strategies
Top Log Use Cases
– Troubleshooting – debugging information and error messages are collected to analyze what is occurring in the production environment
– Security – tracking all user access attempts, both successful and unsuccessful. Intrusion detection leverages this information
– Auditing – providing a trail of data is extremely important for audits. It is one thing to have a process flow on paper; it is another to show real data in the logs
– Monitoring – proactively identifying trends, anomalies, thresholds and other variables allows companies to resolve issues before they become noticeable and/or critical to the end users
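The monitoring use case above can be sketched in a few lines. This is a minimal illustration, not a production detector: it assumes per-minute error counts have already been extracted from the logs, and the function name `find_anomalies`, the window size and the 3-sigma limit are all hypothetical choices.

```python
from statistics import mean, stdev

def find_anomalies(samples, window=5, max_sigma=3.0):
    """Return (index, value) pairs that deviate from the trailing baseline."""
    anomalies = []
    for i in range(window, len(samples)):
        baseline = samples[i - window:i]
        mu = mean(baseline)
        sigma = stdev(baseline)
        # With a perfectly flat baseline (sigma == 0) any change is flagged.
        limit = max_sigma * sigma
        if abs(samples[i] - mu) > limit:
            anomalies.append((i, samples[i]))
    return anomalies

# Hypothetical error counts per minute, derived from log lines.
errors_per_minute = [4, 5, 3, 4, 5, 4, 30, 5, 4]
print(find_anomalies(errors_per_minute))  # [(6, 30)]
```

A trailing window keeps the baseline current, so a gradual rise in traffic does not trigger alerts while a sudden spike does.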
Centralized Logs
– Pipe logs to Sysout (standard output) and direct them to a logging service
– Consider SaaS solutions (e.g. Splunk) so the logging service does not go down with the apps
Best Practices
– Block all developer access to servers
– Direct developers to logging app instead
– Standard log message codes and severity codes
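A minimal sketch of these practices using Python's standard `logging` module: the app writes to standard output only and leaves shipping to the platform or a log service. The logger name and the message codes (`ORD-1001`, `PAY-5003`, `GEN-0000`) are hypothetical, not part of any standard.

```python
import logging
import sys

class DefaultCode(logging.Filter):
    """Give every record a message code so the format string never fails."""
    def filter(self, record):
        if not hasattr(record, "code"):
            record.code = "GEN-0000"  # hypothetical catch-all code
        return True

def build_logger(name="orders"):
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.StreamHandler(sys.stdout)
        handler.addFilter(DefaultCode())
        handler.setFormatter(logging.Formatter(
            "%(asctime)s %(levelname)s %(name)s code=%(code)s %(message)s"))
        logger.addHandler(handler)
    logger.propagate = False
    return logger

log = build_logger()
log.info("order accepted", extra={"code": "ORD-1001"})
log.error("payment gateway timeout", extra={"code": "PAY-5003"})
log.warning("no code supplied")  # falls back to GEN-0000
```

Because nothing is written to local files, developers have no reason to log in to the server; they search the central logging app instead.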
Without standards, Logs are “Garbage in, Garbage out”
– Things to consider
• Logs need to be easy to search
• Logs must be easy to use or people won’t use them
• External consumers of APIs expect standards
– Standard codes
• HTTP Status codes (200, 404, 503, etc.)
• RFC 5424 Severity Levels
– Standard Message Formats
• Settle on a standard format
• Build an API
Source Wikipedia http://en.wikipedia.org/wiki/Syslog#Severity_levels
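One way to combine standard codes with a standard message format is a small helper that every service calls. The sketch below uses the eight RFC 5424 severity levels; the JSON field names, the `format_event` helper and the `PAY-5003` code are assumptions made for illustration.

```python
import json
from datetime import datetime, timezone
from enum import IntEnum

class Severity(IntEnum):
    """RFC 5424 severity levels (lower number = more severe)."""
    EMERGENCY = 0
    ALERT = 1
    CRITICAL = 2
    ERROR = 3
    WARNING = 4
    NOTICE = 5
    INFORMATIONAL = 6
    DEBUG = 7

def format_event(severity, code, message, http_status=None, timestamp=None):
    """Render one log event as a single searchable JSON line."""
    ts = timestamp or datetime.now(timezone.utc)
    event = {
        "timestamp": ts.isoformat(),
        "severity": int(severity),
        "severity_name": severity.name,
        "code": code,  # hypothetical application message code
        "message": message,
    }
    if http_status is not None:
        event["http_status"] = http_status
    return json.dumps(event, sort_keys=True)

print(format_event(Severity.ERROR, "PAY-5003",
                   "payment gateway unavailable", http_status=503))
```

Keeping the format behind one function (or one internal API) means a change to the standard happens in a single place.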
What to collect
Best Practices
– Log Everything, Monitor Everything
• Infrastructure logs
• App Stack Logs (OS, app server, database, programming language)
• API logs
• Application logs
• Security logs
• Events, notifications, alerts
• Changes, config mgmt., deployment
• Access
• Patching history, machine images
Common Logging Solutions
Open Source | Commercial
Monitoring Strategies
Nagios is not a Monitoring Strategy
Blind spots can kill you
What needs to be Monitored?
Data Category | Description
Performance | Page loads, query times, response times, upload/download speeds, etc.
Capacity | Disk space, memory, CPU, bandwidth, etc.
Uptime | Availability (e.g. four 9’s)
Throughput | Every layer (web, cache, database, network, app stack, etc.)
SLAs | Availability, reliability, security, etc.
KPIs | Examples: Revenue per minute, Avg concurrent users, etc.
User Metrics | Registrations, page views, bounce rates, click rates, etc.
Governance/Compliance | Access, permissions, intrusion detection, intrusion prevention, cost containment, etc.
Log file analysis | Predictive analytics, pattern recognition, etc.
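To make the uptime row concrete, the arithmetic behind "four 9’s" can be worked out directly. The helper name `allowed_downtime_minutes` is hypothetical, and the sketch assumes a 365-day year.

```python
def allowed_downtime_minutes(availability_pct, period_days=365):
    """Minutes of downtime permitted by an availability target."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)

for label, pct in [("two 9's", 99.0), ("three 9's", 99.9), ("four 9's", 99.99)]:
    print(f"{label}: {allowed_downtime_minutes(pct):.1f} minutes per year")
```

Four 9’s leaves roughly 52.6 minutes of downtime per year, which is why reactive monitoring alone is rarely enough to meet it.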
End to end Monitoring is Required
There is no ONE tool that does it all
Layer | What to monitor
Application | User Metrics, KPIs
Presentation | Web, Browser Metrics
Session | Sessions, Transactions
Transport | App Svr, Database, Cache
Network | Packets, Access, Data Transfer
Data Link | Bandwidth, Trace routes, Requests
Physical | CPU, Memory, Disk
Infrastructure Monitoring
Who needs Monitoring/Logging Data?
Actor | Purpose
Product Manager | Owns features, reliability, and quality of product
Developers | Trace transactions, understand performance/bottlenecks, troubleshoot issues
Testers | Performance and regression testing, requirements traceability for the “ilities”
Operations | Support infrastructure
NOC and Help Desk | First level support and customer support
Business Stakeholders | Manage key business metrics, understand user behavior, forecasting, profitability
Deployment team | Validate deployment, ensure no negative impact of deployments
Security team | Enforcement of policies, intrusion detection & prevention
Compliance team | SLA Management, auditing, customer requests for information
Customers/Users | Account information, real time billing, application specific metrics
Synthesized Production Data and Monitoring
What is Synthetic Data?
Production data that is artificially created to simulate real users within a system in order to test and monitor system features, performance, reliability, and/or scalability
Example Use Cases:
1. Test customer in a live production environment
2. Test user ID in a live production account
3. Netflix’s Simian Army (purposely creating failures to test resiliency)
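A synthetic transaction check for the first two use cases might look like the sketch below. Everything here is hypothetical: `run_probe`, `ProbeResult` and the two stand-in actions are illustrative names, and a real probe would drive the production system with a dedicated test customer instead of returning canned results.

```python
import time
from dataclasses import dataclass
from typing import Callable

@dataclass
class ProbeResult:
    name: str
    ok: bool
    latency_ms: float
    detail: str = ""

def run_probe(name: str, action: Callable[[], bool],
              max_latency_ms: float = 500.0) -> ProbeResult:
    """Run one synthetic transaction and record success and latency."""
    start = time.perf_counter()
    try:
        succeeded = bool(action())
        detail = "" if succeeded else "action returned a failure"
    except Exception as exc:  # a probe must never crash the monitor
        succeeded = False
        detail = f"{type(exc).__name__}: {exc}"
    latency_ms = (time.perf_counter() - start) * 1000
    if succeeded and latency_ms > max_latency_ms:
        succeeded = False
        detail = f"too slow: {latency_ms:.0f} ms"
    return ProbeResult(name, succeeded, latency_ms, detail)

def fake_login() -> bool:
    # Stand-in for logging in as a synthetic test user.
    return True

def fake_checkout() -> bool:
    # Stand-in for a failing step in the transaction.
    raise TimeoutError("payment service did not respond")

for result in (run_probe("login", fake_login),
               run_probe("checkout", fake_checkout)):
    print(result.name, "OK" if result.ok else f"FAIL ({result.detail})")
```

Run on a schedule, probes like these catch a broken login or checkout path even when no real user is active.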
Summary
Think ahead: Create strategies for logging & monitoring
– Log and monitor everything
– Create standards to prevent “Garbage in, Garbage out” in your logs
– Put both reactive and proactive monitors in place
– Know what your baseline metrics are and raise alerts when they change
– Be prepared before auditors walk in the door
– Make sure everyone is accountable for reliability and quality
Thank you for your time and interest.